As seen in this question, in Windows it is not possible to run parallel processes with shared memory in R. Therefore, I have devised the following methodology, using a series of set.seed()
, to simulate a "shared memory" between my cores.
First, I want to simulate a random variable (RV) matrix:
- The lines represent the fixed number of loans.
- The columns represent the number of Monte Calro simulations.
Second, I want to repeat the above a certain number of times, precisely nruns
times. At each run
in the nruns
any metric (mean, quantile) can be computed from this matrix.
nruns <- 4
nsim.runs <- c(5000, 10000)
nport.total <- 10
n.sectors <- 2
K <- n.sectors
ISCorrMatrix <- matrix(data=0.5, nrow=n.sectors, ncol=n.sectors, byrow=TRUE)
diag(ISCorrMatrix) <- 1
Alpha.Matrix <- t(chol(ISCorrMatrix))
Alpha.Matrix.extended <- NULL
for (i in 1:n.sectors) {
new <- matrix(data=Alpha.Matrix[i, ], nrow=nport.total/2, ncol=n.sectors,
byrow=TRUE)
Alpha.Matrix.extended <- rbind(Alpha.Matrix.extended, new)
}
beta.star <- runif(n.sectors, min=0.1, max=0.9) ## so just draw them at random.
beta.star <- beta.star/sqrt(sum(beta.star^2)) ## to normalise the draw.
sum(beta.star^2)
## Initialize parallel backend
num_cores <- length(nsim.runs)
library(doParallel)
cl <- makeCluster(num_cores)
registerDoParallel(cl)
## Set up a reproducible RNG stream
clusterSetRNGStream(cl)
parallel_results <- foreach(core=1:length(nsim.runs), .combine="c") %dopar% {
nsim <- nsim.runs[core]
results <- list()
set.seed(123)
for (run in 1:nruns) {
Z.systematic <- matrix(data=rnorm(nsim*K), nrow=K, ncol=nsim, byrow=FALSE)
Y.systematic <- Alpha.Matrix %*% Z.systematic
Y.systematic.star <- beta.star %*% Z.systematic
Y.systematic.extended <- c()
for (i in 1:n.sectors) {
new <- matrix(data=rep(Y.systematic[i, ], nport.total/2),
nrow=nport.total/2, ncol=nsim, byrow=TRUE)
Y.systematic.extended <- rbind(Y.systematic.extended, new)
}
Y.systematic.extended.star <-
matrix(data=Y.systematic.star, nrow=nport.total, ncol=nsim, byrow=TRUE)
set.seed(123 + run*100000)
Z.Default <- matrix(data=rnorm(nsim*nport.total), nrow=nport.total,
ncol=nsim, byrow=FALSE)
set.seed(123 + run*100000)
LGD.Basel <- matrix(data=rbeta(n=nsim*nport.total, shape1=2, shape2=3),
nrow=nport.total, ncol=nsim, byrow=FALSE)
results[[paste0("run_", run, "_Nsim_", nsim)]] <-
list(Y.systematic=Y.systematic.extended.star,
Z.Default=Z.Default,
LGD=LGD.Basel)
}
return(results) ## The return call of the parallelisation.
}
## Stop the parallel backend
stopCluster(cl)
If we focus on one run
across the cores, I want that:
- The first core simulates 5000 columns (Monte Carlo simulations).
- The second core simulates 10000 columns (Monte Carlo simulations). To avoid simulation noise, the first 5000 columns shall contain the same simulated RV as in the first core, and so on, as more cores are added to the parallelisation.
This methodology aims to replicate a situation in which you first simulated a large matrix of 10000 Monte Carlo simulations, and then assigned the first 5000 columns to the first core, the full 10000 columns to the second core, and so on (if there are more simulations).
If we focus on multiple run
(s) across the same core, I want that if I rerun its nsim
, to get a newly generated random matrix, independent of the one generated in the previous run
, and so on.
Visually, when the above methodology is applied to compute one quantile, per run
, across a series of cores each having a different number of nsim
this yields:
My code seems to achieve that (here I show only Y and Z, but also works for the LGD):
Stability for same run across nsims: Y
> parallel_results$run_1_Nsim_5000$Y.systematic[1:2,4998:5000]
[,1] [,2] [,3]
[1,] -0.2397531 1.548914 0.2781637
[2,] -0.2397531 1.548914 0.2781637
> parallel_results$run_1_Nsim_10000$Y.systematic[1:2,4998:5002]
[,1] [,2] [,3] [,4] [,5]
[1,] -0.2397531 1.548914 0.2781637 2.23994 -0.6209843
[2,] -0.2397531 1.548914 0.2781637 2.23994 -0.6209843
Variability for different runs across same nsim: Y
> parallel_results$run_1_Nsim_5000$Y.systematic[1:2,1:2]
[,1] [,2]
[1,] -0.3996827 0.2019272
[2,] -0.3996827 0.2019272
> parallel_results$run_2_Nsim_5000$Y.systematic[1:2,1:2]
[,1] [,2]
[1,] -0.4580132 -0.08322681
[2,] -0.4580132 -0.08322681
Stability for same run across nsims: Z
> parallel_results$run_1_Nsim_5000$Z.Default[1:2,4998:5000]
[,1] [,2] [,3]
[1,] -1.077092 -0.006225626 -0.6152041
[2,] 2.033250 -1.350659878 -0.7040274
> parallel_results$run_1_Nsim_10000$Z.Default[1:2,4998:5002]
[,1] [,2] [,3] [,4] [,5]
[1,] -1.077092 -0.006225626 -0.6152041 0.57472772 2.593232
[2,] 2.033250 -1.350659878 -0.7040274 -0.08696669 -1.456689
Variability for different runs across same nsim: Z
> parallel_results$run_1_Nsim_5000$Z.Default[1:2,1:2]
[,1] [,2]
[1,] 0.0008058532 -1.0539683
[2,] -1.3029004815 -0.2031874
> parallel_results$run_2_Nsim_5000$Z.Default[1:2,1:2]
[,1] [,2]
[1,] 0.50913790 -1.120769
[2,] -0.03775884 0.894895
Question:
My question is now, are the nruns
draws in the same core truly independent (I have checked the summary stats and the corr, they seem to be...)? Or have I inserted some undesired link through the set.seed()
setting?
Are there considerations linked to the RNG in R (and its interaction with the parallel
package) I have missed? I have extensively searched across stack but was unable to find something matching my application.