Solved: Hi Ali, I have nothing to add - Page 3

Anwar_Ludin · ‎06-13-2011

I would like to use the VSL random number generators in a parallel Monte Carlo simulation, i.e. distribute the simulation across all processor cores. I have two different cases:

- In the first case I would like to accelerate a single simulation by distributing it across multiple processor cores. For example, let's say I need to simulate 10000 runs, each containing 5000 timesteps. That means I need to generate 10000*5000 random variates.

My simulation would look something like this:

#include "mkl_vsl.h"

#define SEED   1
#define BRNG   VSL_BRNG_MCG31
#define METHOD VSL_RNG_METHOD_GAUSSIAN_ICDF
#define NRUNS  10000   // number of simulated paths
#define N      5000    // timesteps (variates) per path

// initialize vsl random generator stream
VSLStreamStatePtr stream;
double a=0.0,sigma=0.3;
int errcode = vslNewStream( &stream, BRNG, SEED );

for(int i=0; i<NRUNS; i++){
    // simulate one path by generating N variates.
    double r[N];
    errcode = vdRngGaussian( METHOD, stream, N, r, a, sigma );
    for (int j=0;j<N;j++){
        // simulate random walk using the variates
    }
}

I would like to parallelize the outer loop. My question is: is it safe to call vdRngGaussian from multiple threads? And am I guaranteed to get independent variates?

The second scenario would be to parallelize multiple simulations. In this case I would like to run one full simulation per thread, and I need to generate independent variates for all simulations. My question here is: what is the right approach to generating the random variates? Should I use one RNG per thread and initialize them with different seeds? I have been told that this is not the best way of getting independent variates. Another method would be to use the leapfrog method. Which is best?

anwar

Andrey_N_Intel · ‎06-14-2011

Hello Anwar,

Intel MKL Random Number Generators support parallel Monte Carlo simulations by means of the following methodologies:

1. Block-splitting, which allows you to split the original sequence into k non-overlapping blocks, where k is the number of independent streams. The first stream generates the random numbers x(1),...,x(nskip), the second stream generates the numbers x(nskip+1),...,x(2nskip), etc. The service function vslSkipAheadStream( stream, nskip ) is the way to use this methodology in your application.

VSLStreamStatePtr stream[nstreams];
int k;
for ( k=0; k<nstreams; k++ )
{
vslNewStream( &stream[k], brng, seed );
vslSkipAheadStream( stream[k], nskip*k );
}
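
A minimal sketch of the idea behind block-splitting, written in plain C so that it runs without MKL. It assumes a simple multiplicative congruential generator x(n+1) = A*x(n) mod M. The constants MCG_A and MCG_M and the function names (mcg_next, mcg_powmod, mcg_skip_ahead, mcg_block_start) are hypothetical and chosen for illustration only; they are not the MKL implementation. For a generator of this form, skipping nskip steps amounts to multiplying the state by A^nskip mod M, which takes O(log nskip) operations instead of nskip.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative generator x(n+1) = A * x(n) mod M.
   A and M are assumptions for this sketch, not values read from MKL. */
#define MCG_A 1132489760ULL
#define MCG_M 2147483647ULL   /* 2^31 - 1 */

/* Advance the state by exactly one step. */
static uint64_t mcg_next(uint64_t x) {
    return (MCG_A * x) % MCG_M;   /* product stays below 2^62 */
}

/* Compute base^exp mod M by repeated squaring. */
static uint64_t mcg_powmod(uint64_t base, uint64_t exp) {
    uint64_t result = 1;
    base %= MCG_M;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % MCG_M;
        base = (base * base) % MCG_M;
        exp >>= 1;
    }
    return result;
}

/* Jump nskip steps ahead: x(n + nskip) = A^nskip * x(n) mod M. */
static uint64_t mcg_skip_ahead(uint64_t x, uint64_t nskip) {
    assert(x > 0 && x < MCG_M);
    return (mcg_powmod(MCG_A, nskip) * x) % MCG_M;
}

/* State at the start of block k when every block holds nskip numbers. */
static uint64_t mcg_block_start(uint64_t seed, uint64_t k, uint64_t nskip) {
    return mcg_skip_ahead(seed, k * nskip);
}
```

For example, mcg_block_start(seed, 3, 5000) gives the same state as calling mcg_next 15000 times starting from seed, so the fourth thread can start its block without generating the numbers that belong to the first three.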

2. Leapfrogging, which allows you to split the original sequence into k disjoint subsequences, where k is the number of independent streams. The first stream generates the random numbers x(1), x(k+1), x(2k+1),..., the second stream generates the numbers x(2), x(k+2), x(2k+2),..., etc. The service function vslLeapfrogStream( stream, k, nstreams ) helps you parallelize the application by means of this approach (in the call, k is the index of the stream and nstreams is the number of streams).

VSLStreamStatePtr stream[nstreams];
int k;
for ( k=0; k<nstreams; k++ )
{
vslNewStream( &stream[k], brng, seed );
vslLeapfrogStream( stream[k], k, nstreams );
}
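
A minimal sketch of what leapfrogging does to the underlying sequence, again in plain C with a stand-in multiplicative congruential generator rather than MKL. The constants, the mcg_stream type, and the function names are hypothetical and serve only as illustration. Stream k of nstreams (k is zero-based here) starts at x(k+1) and then advances nstreams positions per call, which for this kind of generator means using the multiplier A^nstreams mod M.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative generator x(n+1) = A * x(n) mod M (assumed constants). */
#define MCG_A 1132489760ULL
#define MCG_M 2147483647ULL   /* 2^31 - 1 */

typedef struct {
    uint64_t next;   /* value that the next call will return */
    uint64_t mult;   /* multiplier applied after each call */
} mcg_stream;

/* Compute base^exp mod M by repeated squaring. */
static uint64_t mcg_powmod(uint64_t base, uint64_t exp) {
    uint64_t result = 1;
    base %= MCG_M;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % MCG_M;
        base = (base * base) % MCG_M;
        exp >>= 1;
    }
    return result;
}

/* Plain stream: returns x(1), x(2), x(3), ... for seed x(0). */
static mcg_stream mcg_new(uint64_t seed) {
    assert(seed > 0 && seed < MCG_M);
    mcg_stream s = { (MCG_A * seed) % MCG_M, MCG_A };
    return s;
}

/* Leapfrog stream k: returns x(k+1), x(k+1+nstreams), x(k+1+2*nstreams), ... */
static mcg_stream mcg_leapfrog(uint64_t seed, uint64_t k, uint64_t nstreams) {
    assert(seed > 0 && seed < MCG_M);
    assert(nstreams > 0 && k < nstreams);
    mcg_stream s;
    s.next = (mcg_powmod(MCG_A, k + 1) * seed) % MCG_M;
    s.mult = mcg_powmod(MCG_A, nstreams);
    return s;
}

/* Return the current value and advance the stream by its stride. */
static uint64_t mcg_draw(mcg_stream *s) {
    uint64_t out = s->next;
    s->next = (s->mult * s->next) % MCG_M;
    return out;
}
```

Drawing from streams 0, 1, ..., nstreams-1 in round-robin order reproduces the original sequence x(1), x(2), x(3), ..., which shows that the subsequences are disjoint and together cover the whole sequence.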

3. A generator family that supports parallel Monte Carlo simulations by design: the Wichmann-Hill Basic Random Number Generator (BRNG) gives you up to 273 independent random streams, and Mersenne Twister MT2203 up to 6024 independent random streams.

#define nstreams 6024
VSLStreamStatePtr stream[nstreams];

int k;
for ( k=0; k< nstreams; k++ )
{
vslNewStream( &stream[k], VSL_BRNG_MT2203+k, seed );
}

All of these techniques give you streams of independent variates. Once you have created nstreams random streams, you can safely call the MKL Gaussian generator, or any other RNG, with the k-th stream in a thread-safe way. Below are additional notes on parallelizing Monte Carlo simulations:

1. To parallelize one simulation in your code, you might use the block-splitting or leapfrog methodologies with MCG31, MCG59, or another MKL BRNG that supports them. Before running the simulations, please check that the BRNG period covers your application's need for random numbers.

2. Please avoid using the same VSL BRNG in every thread initialized with different seeds. It can result in biased output of the Monte Carlo simulation. Instead, please use one of the methodologies described above.

3. To parallelize multiple simulations you can use one of the methodologies above. The considerations related to the BRNG period apply to this scenario too. Also, please keep in mind the following aspects:

- To apply the SkipAhead technique you will need to compute the number of variates per simulation, the parameter nskip. If this number is not available (or is difficult to compute) in advance, you might want to use the Leapfrog technique.

- The Leapfrog technique, however, is recommended only when the number of independent streams k is fairly small.

- Use of the WH or MT2203 BRNG family could probably be a suitable option for your needs.
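
The period check and the nskip computation mentioned in the notes above can be sketched as plain arithmetic. This is only an illustration: the function names are hypothetical, the period is supplied by the caller (look it up for your BRNG in VSLNotes), and the sketch assumes that each variate consumes exactly one number from the underlying sequence, which should be verified for the distribution method you use.

```c
#include <assert.h>
#include <stdint.h>

/* Total variates needed for `runs` paths of `steps` timesteps each. */
static uint64_t total_variates(uint64_t runs, uint64_t steps) {
    assert(steps == 0 || runs <= UINT64_MAX / steps);   /* guard overflow */
    return runs * steps;
}

/* nskip for block-splitting: variates consumed by one thread when the
   runs are divided evenly across nthreads (rounded up). */
static uint64_t nskip_per_thread(uint64_t runs, uint64_t steps,
                                 uint64_t nthreads) {
    assert(nthreads > 0);
    uint64_t runs_per_thread = (runs + nthreads - 1) / nthreads;
    return total_variates(runs_per_thread, steps);
}

/* Returns 1 if nthreads blocks of nskip numbers fit inside the period. */
static int fits_in_period(uint64_t nskip, uint64_t nthreads,
                          uint64_t period) {
    assert(nthreads > 0);
    return nskip <= period / nthreads;
}
```

For the example in the question, total_variates(10000, 5000) is 50000000, and with 8 threads nskip_per_thread(10000, 5000, 8) is 6250000. Whether that fits depends on the period passed to fits_in_period for the chosen BRNG.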

You can find the details behind each methodology in VSLNotes, http://software.intel.com/sites/products/documentation/hpc/mkl/vsl/vslnotes.pdf, section 7.3.5, and in the Intel MKL Manual, chapter Statistical Functions, section Random Number Generators, Service Routines, LeapfrogStream and SkipAheadStream. From there you can evaluate which of the methodologies fits your environment best.

Also, it might be helpful to have a look at the KB article that considers aspects of seed choice for MKL BRNG initialization, http://software.intel.com/en-us/articles/initializing-Intel-MKL-BRNGs

Please let me know if this addresses your questions. Feel free to ask more questions on parallelization of Monte Carlo simulations with MKL RNGs; we will be happy to help.

Best,
Andrey


ali_m_2 · ‎11-11-2015

Hi Andrey,

What do you think about my observation on different outputs between Cilkscreen and inspxe-cl?

Thanks

Ali

Andrey_N_Intel · ‎11-12-2015

Hi Ali, I have nothing to add to Gennady's suggestion. Thanks, Andrey

Dmitry_V_ · ‎02-07-2017

Hello, Andrey!

Thank you for the detailed explanations in this forum! I have a question on parallel random generation in an MPI based application.

There is an application based on Microsoft MPI that performs calculations on multiple hosts. I'd like to add a Mersenne Twister MT2203 independent random stream (your option #3) to each MPI process of this application using Intel MKL.

What would be the best place in the code to create the streams? Should it be done before setting up all MPI processes or at the beginning of each MPI process?

Thanks much in advance,

Dmitry

Andrey_N_Intel · ‎02-07-2017

Hi Dmitry, I would start with initialization of MT2203 streams at the beginning of each MPI process. Thanks, Andrey

Dmitry_V_ · ‎02-08-2017

Andrey, thank you for the quick response!
