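#include "mkl_vsl.h"
#define SEED 1
#define BRNG VSL_BRNG_MCG31
#define METHOD VSL_RNG_METHOD_GAUSSIAN_ICDF
#define N 5000      /* variates per path */
#define NSIM 1000   /* number of paths: placeholder, the original loop bound was lost */
int main(void) {
    VSLStreamStatePtr stream;
    double a=0.0,sigma=0.3;
    int errcode = vslNewStream( &stream, BRNG, SEED );
    if (errcode) return 1;
    for (int i=0;i<NSIM;i++){   /* outer loop over simulations */
        // simulate one path by generating 5000 variates.
        double r[N];
        vdRngGaussian( METHOD, stream, N, r, a, sigma );
        for (int j=0;j<N-1;j++){
            // simulate random walk using the variates
        }
    }
    vslDeleteStream( &stream );
    return 0;
}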
I would like to parallelize the outer loop. My question is: is it safe to call vdRngGaussian from multiple threads, and am I guaranteed to get independent variates?
The second scenario would be to parallelize multiple simulations. In this case I would like to run one full simulation per thread, and I need to generate independent variates for all simulations. My question here is: what is the right approach to generating the random variates? Should I use one RNG per thread and initialize each with a different seed? I have been told that this is not the best way to get independent variates. Another method would be the leapfrog method. Which is best?
Anwar
Hello Anwar,
Intel MKL Random Number Generators support parallel Monte Carlo simulations by means of the following methodologies:
1. Block-splitting, which splits the original sequence into k non-overlapping blocks, where k is the number of independent streams. The first stream generates the random numbers x(1),...,x(nskip), the second stream generates x(nskip+1),...,x(2*nskip), and so on. The service function vslSkipAheadStream( stream, nskip ) lets you apply this methodology in your application.
VSLStreamStatePtr stream[nstreams];
int k;
for ( k=0; k<nstreams; k++ )
{
    vslNewStream( &stream[k], brng, seed );
    vslSkipAheadStream( stream[k], nskip*k );
}
2. Leap-Frogging, which splits the original sequence into k disjoint subsequences, where k is the number of independent streams. The first stream generates the random numbers x(1), x(k+1), x(2k+1),..., the second stream generates x(2), x(k+2), x(2k+2),..., and so on. The service function vslLeapfrogStream( stream, k, nstreams ) helps you parallelize the application with this approach.
VSLStreamStatePtr stream[nstreams];
int k;
for ( k=0; k<nstreams; k++ )
{
    vslNewStream( &stream[k], brng, seed );
    vslLeapfrogStream( stream[k], k, nstreams );
}
3. A generator family that supports parallel Monte Carlo simulations by design: the Wichmann-Hill Basic Random Number Generator (BRNG) family provides up to 273 independent random streams, and the Mersenne Twister MT2203 family up to 6024 independent random streams.
#define nstreams 6024
VSLStreamStatePtr stream[nstreams];
int k;
for ( k=0; k< nstreams; k++ )
{
    vslNewStream( &stream[k], VSL_BRNG_MT2203+k, seed );
}
All of these techniques give you streams of independent variates. Once you have created nstreams random streams, you can safely call the MKL Gaussian generator, or any other MKL RNG, with the k-th stream, stream[k], from the k-th thread.
1. To parallelize one simulation, you might use the Block-Splitting or Leapfrog methodology with MCG31, MCG59, or another MKL BRNG that supports them. Before running the simulations, please check that the BRNG period covers the amount of random numbers your application needs.
2. Please avoid giving each thread its own instance of the same VSL BRNG initialized with a different seed. This can bias the output of the Monte Carlo simulations. Instead, please use one of the methodologies described above.
3. To parallelize multiple simulations, you can use any of the methodologies above. The BRNG period considerations apply to this scenario too. Also, please keep in mind the following:
- To apply the SkipAhead technique, you need the number of variates per simulation, the parameter nskip. If this number is not available in advance (or is difficult to compute), you might want to use the Leapfrog technique instead.
- The Leapfrog technique, however, is recommended only when the number of independent streams k is fairly small.
- The WH or MT2203 BRNG families could well be a suitable option for your needs.
You can find the details behind each methodology in the VSL Notes, http://software.intel.com/sites/products/documentation/hpc/mkl/vsl/vslnotes.pdf, section 7.3.5, and in the Intel MKL Manual (chapter Statistical Functions, section Random Number Generators, Service Routines: LeapfrogStream and SkipAheadStream). Use them to evaluate which methodology fits your environment best.
It might also help to look at the KB article on choosing seeds for MKL BRNG initialization, http://software.intel.com/en-us/articles/initializing-Intel-MKL-BRNGs
Please, let me know if this addresses your questions. Feel free to ask more questions on parallelization of Monte Carlo simulations with MKL RNGs, we will be happy to help.
Best,
Andrey
Andrey,
Thanks a lot for your feedback and please forgive me for this delayed response. Indeed your explanations clarified things a lot. I will now play around with the various generators using Intel TBB and come back to you with further questions :)
Anwar
Sure, please let us know when you have further questions on the Statistical Component of Intel Math Kernel Library; we will be glad to help.
Best,
Andrey
To share the same random stream among several threads in a thread-safe and "correlation-free" way, you would need to control access to random number generation with thread synchronization primitives, so that at any time only one thread uses the stream to produce random numbers. Otherwise, you would be using the stream in a non-thread-safe manner, which can potentially introduce correlations into your results.
To split one simulation across cores, I would suggest using one of the parallelization approaches supported by Intel MKL Random Number Generators and briefly described above. These methodologies also make more effective use of multi-core resources and avoid unnecessary thread synchronization.
Thanks,
Andrey
OK, thanks for your clarifications, and thanks a lot for your feedback!
I'm hitting another stumbling block. I'm trying to use the random generators with Intel TBB, and I'm not sure how or when to initialize them correctly. As an example, let's consider the calculation of pi through Monte Carlo. I would like to use a TBB parallel_reduce algorithm for this. Here's an example I've taken from the TRNG (Tina's Random Number Generator library) manual that I would like to adapt to MKL. One issue is that I can't know in advance how many streams I will need, because the splitting is handled by TBB. My guess would be that I need to pass a different stream to each splitting constructor, but how can I know in advance how many streams I will need? Any help would be greatly appreciated!
Regards,
Anwar
#include <cstdlib>
#include <iostream>
#include <trng/yarn2.hpp>
#include <trng/uniform01_dist.hpp>
#include <tbb/task_scheduler_init.h>
#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
class parallel_pi {
    trng::uniform01_dist<> u; // random number distribution
    long in;
    const trng::yarn2 &r;
public:
    void operator()(const tbb::blocked_range<long> &range) {
        trng::yarn2 r_local(r); // local copy of random number engine
        r_local.jump(2*range.begin()); // jump ahead: point i uses numbers 2i and 2i+1
        for (long i=range.begin(); i!=range.end(); ++i) {
            double x=u(r_local), y=u(r_local); // choose random x and y coordinates
            if (x*x+y*y<=1.0) // is point in circle?
                ++in; // increase thread local counter
        } //for
    }//operator
    // join threads and counters
    void join(const parallel_pi &other) {
        in+=other.in;
    }
    long in_circle() const {
        return in;
    }
    parallel_pi(const trng::yarn2 &r) : in(0), r(r) {
    }
    parallel_pi(const parallel_pi &other, tbb::split) : in(0), r(other.r) {
    }
};
int main(int argc, char *argv[]) {
    tbb::task_scheduler_init init;
    const long samples=1000000l; // total number of points in square
    trng::yarn2 r; // random number engine
    parallel_pi pi(r); // functor for parallel reduce
    // parallel MC computation of pi
    tbb::parallel_reduce(tbb::blocked_range<long>(0, samples), pi);
    // print result
    std::cout << "pi = " << 4.0*pi.in_circle()/samples << std::endl;
    return EXIT_SUCCESS;
}
Hi Anwar,
I have quick suggestions you might want to have a look at.
1. You have a number of samples, samples = 1M, whose processing you want to split across threads. I assume that you can query the maximal number of cores on your CPU, say ncore. You can then assign the processing of a range of size k = samples/ncore to each core.
For a TBB blocked_range you can specify the grainsize parameter: a blocked_range is not split into two sub-ranges if its size does not exceed grainsize (please see Section 4.2.1 of the Intel TBB Manual for more details).
If you set the grain size to k, you will avoid undersubscription (the number of logical threads is not enough to keep the physical threads busy) and oversubscription (the number of logical threads exceeds the number of physical threads). As the next step, you would associate a VSL random stream with a thread in operator()(const tbb::blocked_range<long>& range).
a) Determine the number of cores on the CPU, ncore.
b) Create ncore MKL random streams by applying one of the VSL parallelization techniques.
c) Construct an object of type parallel_pi, providing the number of cores, the number of samples, and the array of random streams.
d) Compute the index idx of the block being processed and obtain random numbers from the stream indexed idx. Use nsamples/ncore as the grainsize in blocked_range.
2. Also, please have a look at the chapter "Catalog of Recommended task Patterns". The methodology described there lets you create and spawn k tasks, where each task processes the random numbers obtained from a specific VSL random stream created with one of the parallelization techniques.
Please, let me know how those approaches work for you.
Best,
Andrey
#include <iostream>
#include "mkl_vsl.h"
#include "tbb/task_scheduler_init.h"
#include "tbb/task_group.h"
class PiCalculator {
public:
long numpoints;
long in;
VSLStreamStatePtr stream;
PiCalculator(long numpoints, long& in, VSLStreamStatePtr stream) :
numpoints(numpoints), in(in), stream(stream) {
}
void operator()(){
double variates[2*numpoints]; //we need 2 random variates per point
// crashes here EXC_BAD_ACCESS: Could not access memory
vdRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, 2*numpoints, variates, 0.0, 1.0);
for(int i=0; i<numpoints; i++){
double x=variates[i];
double y=variates[numpoints+i];
if(x*x+y*y<=1.0) ++in;
}
};
};
int main() {
int errorcode;
const long samples = 1000000l;
int seed = 1;
int nstreams = 2;
VSLStreamStatePtr stream[nstreams];
for (int i=0; i<nstreams; i++)
{
errorcode = vslNewStream( &stream[i], VSL_BRNG_MCG31, seed );
if(errorcode){
return 1;
}
errorcode = vslLeapfrogStream( stream[i], i, nstreams );
if(errorcode){
return 1;
}
}
tbb::task_scheduler_init init;
tbb::task_group group;
long result1 = 0;
long result2 = 0;
group.run(PiCalculator(samples/2, result1, stream[0]));
group.run(PiCalculator(samples/2, result2, stream[1]));
group.wait();
std::cout << "pi = " << 4.0*(result1+result2)/samples << std::endl;
for(int i=0;i<nstreams;i++)
vslDeleteStream(&stream[i]);
return 0;
}
The period of the Intel MKL MCG31m1 BRNG should be sufficient for the goals of this demo application. However, if the application requests more random variates than its period, you would see repeated random numbers in the sequence.
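The root of the crash seems to be in operator(): the buffer for the random numbers is allocated on the stack, and its size is 2 * numpoints * sizeof( double ) = 2 * 500,000 * 8 bytes = 8 MB (each of the two tasks gets numpoints = samples/2), which is likely more than the stack available to a worker thread.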
To avoid the buffer-size issue and to improve the performance of the application, I modified your code as shown below. The essence of the change is a fixed-size buffer for the random numbers. The buffer size is chosen to give the best performance of the application (you would probably need several experiments to find the best size).
void operator()( )
{
const int block_size = 1024;
double variates[2*block_size];
int nblocks, ntail, i, j;
nblocks = numpoints / block_size;
ntail = numpoints - nblocks * block_size;
for( j = 0; j < nblocks; j++ )
{
vdRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, 2*block_size, variates, 0.0, 1.0 );
for( i = 0; i < block_size; i++ )
{
double x = variates[2*i + 0];
double y = variates[2*i + 1];
if(x*x+y*y<=1.0) ++(in);
}
}
vdRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, 2*ntail, variates, 0.0, 1.0 );
for( i = 0; i < ntail; i++ )
{
double x = variates[2*i + 0];
double y = variates[2*i + 1];
if(x*x+y*y<=1.0) ++(in);
}
}
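Also, please declare the member in as a reference, so that the count is written back to result1 and result2 in main():
public:
long& in;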
Thanks,
Andrey
#include <iostream>
#include "mkl_vsl.h"
#include "tbb/task_scheduler_init.h"
#include "tbb/task_group.h"
#include "tbb/tick_count.h"
class PiCalculator {
public:
long numpoints;
long& in;
VSLStreamStatePtr stream;
PiCalculator(long numpoints, long& in, VSLStreamStatePtr stream) :
numpoints(numpoints), in(in), stream(stream) {
in = 0; // make sure to initialize to zero.
}
void operator()() {
const int block_size = 2048;
double variates[2 * block_size];
int nblocks, ntail, i, j;
nblocks = numpoints / block_size;
ntail = numpoints - nblocks * block_size;
for (j = 0; j < nblocks; j++) {
vdRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, 2 * block_size,variates, 0.0, 1.0);
for (i = 0; i < block_size; i++) {
double x = variates[2 * i + 0];
double y = variates[2 * i + 1];
if (x * x + y * y <= 1.0)
++(in);
}
}
vdRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, 2 * ntail, variates,0.0, 1.0);
for (i = 0; i < ntail; i++) {
double x = variates[2 * i + 0];
double y = variates[2 * i + 1];
if (x * x + y * y <= 1.0)
++(in);
}
};
};
int main() {
int errorcode;
const long samples = 10000000000l;
int seed = 1;
int tasks = 50;
VSLStreamStatePtr stream[tasks];
for (int i = 0; i < tasks; i++) {
errorcode = vslNewStream(&stream[i], VSL_BRNG_MCG59, seed);
if (errorcode) {
return 1;
}
errorcode = vslLeapfrogStream(stream[i], i, tasks);
if (errorcode) {
return 1;
}
}
tbb::task_scheduler_init init;
tbb::task_group group;
long results[tasks];
long samplesPerTasks = samples/tasks;
tbb::tick_count t0 = tbb::tick_count::now();
for (int i = 0; i < tasks; i++) {
group.run(PiCalculator(samplesPerTasks, results[i], stream[i]));
}
group.wait();
tbb::tick_count t1 = tbb::tick_count::now();
long result = 0;
for(int i=0;i<tasks;i++){
result += results[i];
}
std::cout << "pi = " << 4.0 * result / samples << std::endl;
std::cout << "time : " << (t1-t0).seconds();
for (int i = 0; i < tasks; i++)
vslDeleteStream(&stream[i]);
return 0;
}
At first glance, the problem you are solving is similar to the one Anwar described earlier: you need to parallelize Monte Carlo simulations and process the random numbers with some algorithm (say, compute some statistics).
The dimensions you mentioned in your post indicate that MKL Random Number Generators together with OpenMP* look like a suitable choice for your simulations. The methodology for parallelizing simulations with Intel MKL Random Number Generators is language independent. Before choosing the type of basic RNG and the parallelization methodology, you would need to better understand your requirements for the generator:
1. How many random numbers will you need (especially if the number of simulations may grow)?
2. What are the performance requirements? The RNG performance data available at
http://software.intel.com/sites/products/documentation/hpc/mkl/vsl/vsl_performance_data.htm
could be useful for making performance estimates for your code with different types of RNGs.
3. How many cores/threads do you plan to use today (and tomorrow)?
4. Any other aspects that reflect the specifics of your problem.
This list of requirements will help you choose the MKL BRNG that fits your problem. If you choose the MT2203 BRNG, which supports up to 6024 independent streams (in MKL 10.3.3) and has a period of ~10^663, you might want an array of ncore VSL random streams (ncore is 12 in your case). Each of them is initialized in the usual way with vslnewstream, the Fortran version of NewStream. You then need to split NSIM simulations across ncore threads, that is, assign a block of simulations to each thread. Assume, for simplicity, NSIM=120. Then random stream #1 would serve the block of simulations 1-10, stream #2 the block 11-20, and so on; that is, each core runs 10 simulations. Using the RNG, you generate the arrays of observations in each block. At a high level it would look like this:
do i=1,ncore
    status = vslnewstream( stream(i), VSL_BRNG_MT2203+i, seed )
    do j=1,sim_per_core
        status = vdrnggaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream(i), n, r, a, sigma )
        ! process array r of size n=1000
    end do
    status = vsldeletestream( stream(i) )
end do
Please note that, for performance, it is more effective to call VSL RNGs on vector lengths of a few thousand. If the number of observations is 100, you might want to group several vectors of observations into one call to the generator.
If you choose a BRNG with the skip-ahead or leapfrog feature for your simulations, e.g. the MCG59 BRNG, the computations would look similar. The only change is the initialization with the service functions vslleapfrogstream or vslskipaheadstream. The MKL installation directory contains example/vslf with Fortran examples that demonstrate RNG use; the Skip-Ahead and Leap-Frog features are among them.
If you need to compute simple statistical estimates like mean or covariance, you might want to use the Summary Statistics component of VSL. Like the VSL RNGs, it provides a Fortran API and can be called from your application. Please note that the Summary Statistics routines use threading when processing a dataset of size p x n where appropriate.
Please, let me know when/if you parallelize the computations or ask more questions on Statistical Feature of MKL.
Best,
Andrey
It is great to know that you got a speed-up of the app with MKL RNGs and TBB on a multi-core CPU.
Please, feel free to ask questions on Stat Component of Intel MKL, we would be ready to discuss and help.
Best,
Andrey
I use omp_get_thread_num() to determine the thread id. This gives me the index of the corresponding stream in the array of previously initialized streams. I'm also doing a toy reduction, but this could really be anything you decide to calculate using the variates. The result of the reduction is available at the end of the parallel region and is printed to the console.
#include <stdio.h>
#include "omp.h"
#include "mkl_vsl.h"
int main() {
    int seed = 1;
    int tid;
    int numthreads = omp_get_num_procs();
    VSLStreamStatePtr stream[numthreads];
    int variatesPerThread = 5000;
    double variates[variatesPerThread];
    double a = 1.0;
    double sigma = 0.2;
    int result = 0;
    for (int i = 0; i < numthreads; i++) {
        int errorCode = 0;
        errorCode = vslNewStream(&stream[i], VSL_BRNG_MCG59, seed);
        if (errorCode) {
            printf("vslNewStream failed\n");
            return 1;
        }
        errorCode = vslLeapfrogStream(stream[i], i, numthreads);
        if (errorCode) {
            printf("vslLeapfrogStream failed\n");
            return 1;
        }
    }
    /* num_threads caps the team at numthreads, so tid always indexes a valid stream */
    #pragma omp parallel num_threads(numthreads) private(tid, variates) reduction(+:result)
    {
        tid = omp_get_thread_num();
        // generate the random samples and do something interesting.
        vdRngGaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF, stream[tid], variatesPerThread, variates, a, sigma);
        // reduce the result
        result = result + tid;
    }
    printf("result is: %d\n", result);
    for (int i = 0; i < numthreads; i++)
        vslDeleteStream(&stream[i]);
    return 0;
}
1. I do not expect any issues if seed is a shared scalar: in each thread the library reads this value once to initialize the state of the basic random generator. As another option, you could initialize the array of streams before your OpenMP directives, in the serial part of the program.
2. It is important that the i-th entry of the stream array is assigned to the thread indexed i, and that the i-th thread/core uses only the i-th entry (stream) of the array to produce random numbers, never the j-th entry. From this perspective, the array can be shared among the threads.
3. Yes, you can delete the i-th stream in the i-th thread independently. You are correct; I modified my previous post to have status=vsldeletestream( stream(i) ).
4. The choice of parallelization methodology (Skip-Ahead, Leap-Frog, BRNG family) is entirely up to you and the requirements of your problem. In some cases (due to the specifics of the problem) Leap-Frog or Skip-Ahead may be more suitable for parallelizing Monte Carlo simulations. How many streams (or blocks) to use in your environment is also your decision: you initialize as many Intel MKL random streams as you need. In the 12-core example we are considering, you could set the number of VSL random streams to 6 if, say, you plan to use the remaining cores for other computations.
Please, let me know if this answers your questions.
Best,
Andrey