I would like to run my function (my_func2) more efficiently by parallelizing it in R across multiple imputed datasets, using all 8 cores on my computer. Each imputed dataset has about 1.7 million rows, so running it across 25 imputed datasets takes a while. How can I minimize the computation time?

Below is some sample data:

library(haven)
library(dplyr)
library(mstate)
impute1 <- data.frame(unique_ID = c(1,2,3,4), 
              DIED_INDICATOR = c(0,1,1,1), 
              CVD_ANY = c(0,1,1,0), 
              YEARS_CVD_DEATH = c(15.9, 23.6, 22.7, 3.4), 
              YEARS_CVD_HOSP = c(15.9, 11.4, 20.7, 3.4), 
              TOBACCO = c(0, 0, 0, 1), 
              MARRIED = c(1,0,1,0), 
              PARITY = c(2,1,1,2)) 

impute2 <- data.frame(unique_ID = c(1,2,3,4), 
              DIED_INDICATOR = c(0,1,1,1), 
              CVD_ANY = c(0,1,1,0), 
              YEARS_CVD_DEATH = c(15.9, 23.6, 22.7, 3.4), 
              YEARS_CVD_HOSP = c(15.9, 11.4, 21.7, 3.4), 
              TOBACCO = c(0, 1, 0, 1), 
              MARRIED = c(1,0,1,1), 
              PARITY = c(1,1,1,2)) 


test_list <- list(impute1, impute2)

covs <- c("TOBACCO", "MARRIED", "PARITY")

tmat <- trans.illdeath()




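my_func2 <- function(x) {
  # Reshape to long format: one row per subject per possible transition
  cohort1 <- msprep(data = x, trans = tmat,
    time = c(NA, "YEARS_CVD_HOSP", "YEARS_CVD_DEATH"),
    status = c(NA, "CVD_ANY", "DIED_INDICATOR"),
    keep = covs, id = x$unique_ID)

  # Add transition-specific covariate columns (TOBACCO.1, TOBACCO.2, ...)
  cohort_expand <- expand.covs(cohort1, covs, append = TRUE, longnames = FALSE)

  # Cox model stratified by transition, with transition-specific TOBACCO effects
  c1 <- coxph(Surv(Tstart, Tstop, status)
    ~ TOBACCO.1 + TOBACCO.2 + TOBACCO.3 + strata(trans),
    data = cohort_expand, method = "breslow")

  summary(c1)
}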

I currently apply it to the list with lapply(test_list, my_func2). What would you recommend for running this lapply call in parallel?

1 Answer

You could use foreach with %dopar% and a doSNOW backend to run the jobs in parallel in R. There is a similar answer here:

Calling external program in parallel using foreach and doSNOW: How to import results?
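
As an alternative to foreach/doSNOW, base R's parallel package offers parLapply, which is close to a drop-in replacement for lapply. The following is only a minimal sketch under some assumptions: toy_list and fit_one are hypothetical stand-ins for test_list and my_func2, so that the snippet runs without mstate. The point it illustrates is that worker processes start as fresh R sessions, so any globals the function relies on (covs and tmat in the question) have to be exported, and any packages it uses have to be loaded on each worker.

```r
library(parallel)

# Toy stand-ins for the imputed datasets (hypothetical data, not from the question)
toy_list <- list(
  data.frame(y = c(1, 2, 3, 4), TOBACCO = c(0, 0, 0, 1)),
  data.frame(y = c(1, 2, 3, 4), TOBACCO = c(0, 1, 0, 1))
)

# A global the worker function depends on, like covs and tmat in my_func2
covs <- "TOBACCO"

# Hypothetical stand-in for my_func2: fit one model per dataset
fit_one <- function(x) {
  coef(lm(reformulate(covs, response = "y"), data = x))
}

# Leave one core free, and never start more workers than there are datasets
n_workers <- max(1, min(length(toy_list), detectCores() - 1, na.rm = TRUE))
cl <- makeCluster(n_workers)

# Workers are fresh R sessions: copy the globals the function needs to them
clusterExport(cl, "covs")
# For my_func2 you would presumably also need something like:
#   clusterExport(cl, c("covs", "tmat"))
#   clusterEvalQ(cl, library(mstate))

# Same call shape as lapply(toy_list, fit_one), but spread across the workers
results <- parLapply(cl, toy_list, fit_one)

stopCluster(cl)
```

With the question's objects, the last call would become parLapply(cl, test_list, my_func2). Each dataset is serialized and copied to a worker, so with 1.7 million rows per dataset it may be worth watching memory use and starting fewer than 8 workers. On Linux or macOS, mclapply(test_list, my_func2, mc.cores = ...) from the same package is a fork-based alternative that does not need the export step; it does not run in parallel on Windows.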

Very best regards.
