- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
#define SEED 1
#define BRNG VSL_BRNG_MCG31
#define METHOD VSL_RNG_METHOD_GAUSSIAN_ICDF
VSLStreamStatePtr stream;
double a=0.0,sigma=0.3;
errcode = vslNewStream( &stream, BRNG, SEED );
// simulate one path by generating 5000 variates.
double r[5000];
vdRngGaussian( METHOD, stream, N, r, a, sigma );
for (int j=0;j<4999;j++){
// simulate random walk using the variates
}
}
I would like to parallelize the outer loop. My question is: is it safe to call vdRngGaussian from multiple threads? And am I guaranteed to have independant variates?
The second scenario would be to parallelize multiple simulations. In this case I would like to do one full simulation per thread and I need to to generate independant variates for all simulations. In this case my question would be what is the approach to generating the random variates? Should I use one rng per thread and initialize them with different seeds? I have been told that this is not the best way of getting independant variates. Another method would be to use the leapfrog method. What is best?
anwar
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello Anwar,
Intel MKL Random Number Generators support parallel Monte Carlo simulations by means of the following methodologies:
1. Block-splitting which allows you to split the original sequence into k non-overlapping blocks, where k - number of independent streams. The first stream would generate the random numbers x(1),...,x(nskip), the second stream would generate the numbers x(nskip+1),...,x(2nskip), etc. Service function vslSkipAheadStream( stream, nskip ) is way to use this methodology in your app.
VSLStreamStatePtr stream[nstreams];
int k;
for ( k=0; k
vslNewStream( &stream
vslSkipAheadStream( stream
}
2. Leap-Frogging that allows you to split the original sequence into k disjoint subsequences, where k - number of independent streams. The first stream would generate the random numbers x(1), x(k+1),x(2k+1),... the second stream would generate the numbers x(2), x(k+2),x(2k+2),...,etc. Service function vslLeapfrogStream( stream, k, nstreams ) will help you to parallelize the application by means of this approach.
VSLStreamStatePtr stream[nstreams];
int k;
for ( k=0; k
vslNewStream( &stream
vslLeapfrogStream( stream
}
3. Generator family which supports parallel Monte Carlo simulations by design: Whichmann-Hill Basic Random Number Generator (BRNG) helps you to get up to 273 independent random streams, and Mersenne Twsiter MT2203 - up to 6024 independent random streams
#define nstreams 6024
VSLStreamStatePtr stream[nstreams];
int k;
for ( k=0; k< nstreams; k++ )
{
vslNewStream( &stream
}
All those techniques will help you to have streams of indepedent variates. As soon as you create nstreams random streams you can call safely call MKL Gaussian or any other RNG with k-th stream stream
1. To parallelize one simulation in your code you might use Block-Splitting or LeapFrog methodologies with MCG31, MCG59 or other MKL BRNG which supports them. Before the simulations please check that BRNG period addresses needs of your application in random numbers.
2. Please, avoid using the same VSL RNG per one thread initialized with different seeds. It can results in the biased output of the Monte-Carlo simulations. Instead, please, use one of the methodologies described above.
3. To parallelize multiple simulations you can use one of the methodologies above. Considerations related to BRNG period are applicable to this scenario too. Also, please, keep inmind the following aspects:
- to apply SkipAhead technique you will need to compute number of variates per one simulation, parameter nskip. If this number is not available (or difficult to compute) for some reasons in advance you might want to use Leapfrog technique
- Leapfrog technique, however, is recommended when number of independent streams k is fairly small
- Use of WH or MT2203 BRNG family could probably be the suitable option for your needs
You can find the details behind each methodology in VSLNotes, http://software.intel.com/sites/products/documentation/hpc/mkl/vsl/vslnotes.pdf, section 7.3.5 and in Intel MKL Manual, Chapter Statistical Functions, Section Random Number Generators, Service Routines, LeapFroagStream and SkipAheadStream and evaluate which of the methodologies fits your environment in the best way.
Also, it might be helpful to have a look at KB article that considers aspects for seed choice for of MKL BRNG initialization, http://software.intel.com/en-us/articles/initializing-Intel-MKL-BRNGs
Please, let me know if this addresses your questions. Feel free to ask more questions on parallelization of Monte Carlo simulations with MKL RNGs, we will be happy to help.
Best,
Andrey
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Best,
Andrey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Andrey,
thank you so much for your very helpful reply. I followed some of your instructions and finally divided the number of simulations into chunks of 100,000 and stored all random numbers needed for one chunk in vector<double> data types.
Furthermore, I introduced another serial for loop (serial_iter) to avoid memory problems...
Code:
const int number_iterations = number_sims / 100000;
const int processors = 20;
const int number_sims_chunk = number_sims / processors / number_iterations;
VSLStreamStatePtr stream[6024];
for (int i = 0; i < 6024; ++i) {
vslNewStream( &stream, VSL_BRNG_MT2203+i, seed );
}
for (int serial_iter = 0; serial_iter < number_iterations; ++serial_iter) {
#pragma omp parallel for
for (int p = 0; p < processors; ++p) {
//Random numbers used for standard case (99% of these random numbers are used in code)
vector<double> vecUniform(number_sims_chunk * number_days, 0.0);
vector<double> vecGaussian(number_sims_chunk * number_days, 0.0);
//Random numbers needed for some check
//(< 1% of these random numbers are used in code)
//--> need to be improved
vector<double> vecUniform2(number_sims_chunk * number_days, 0.0);
//Some other Random numbers only needed for postprocessing of default (again < 1% )
vector<double> vecGaussian3(number_sims_chunk * liq_period, 0.0);
vector<double> vecUniform3(number_sims_chunk * liq_period, 0.0);
vdRngUniform( VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, stream[serial_iter * processors + p], vecUniform.size(), &vecUniform[0], 0.0, 1.0 );
vdRngGaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream[serial_iter * processors + p], vecGaussian.size(), &vecGaussian[0], 0.0, 1.0 );
vdRngUniform( VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, stream[serial_iter * processors + p], vecUniform2.size(), &vecUniform2[0], 0.0, 1.0 );
vdRngUniform( VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, stream[serial_iter * processors + p], vecUniform3.size(), &vecUniform3[0], 0.0, 1.0 );
vdRngGaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream[serial_iter * processors + p], vecGaussian3.size(), &vecGaussian3[0], 0.0, 1.0 );
for (int z1 = 1; z1 <= number_sims_chunk; ++z1) {
serial code with variable loss changed from double data type to tbb::atomic<double> to sum up losses
...
The output was a nice Excel library (xll) containing one function. Calling the function from Excel gave me a speedup of 5.5 on two Xeon X5650 in comparison to the serial code.
I was happy to see Excel using 11 cores at rougly 80% for a couple of minutes :-)
Unfortunately, I cannot use the Intel(R) C++ compiler in my company (we only have the MKL library avaialble) and couldn't figure out how to do the memory allignment needed for SIMD (SSE) in VS2010.
Do you have any ideas how to vectorize my code?
Can I make the following run faster using other data types having memory allignment?
vector<double> vecUniform(number_sims_chunk * number_days, 0.0);
vdRngUniform( VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, stream[serial_iter * processors + p], vecUniform.size(), &vecUniform[0], 0.0, 1.0 );
Anyways, thank you so much again for your help.
All the best, Matt
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Matt,
I'm glad to know that you developed the algorithm which uses HW resources in the effective way, excellent progress!
To my understanding, in many (not all) cases Monte Carlo simulations can demonstrate close to linear scalability over number of cores. You mentioned about 5.5x speed-up on ~11 cores. Does it mean that your code still spends visible amount of time in serial sections? Which scalability do you expect for your algorithm, say 10x for 11 cores or different?
One possible way to vectorize the code in your environment is to use SSE2 intrinsics, see, for example, http://msdn.microsoft.com/en-us/library/9b07190d(v=vs.80).aspx . However, intrinics based approach is not "scalable": once you want to migrate your code to Intel(R) AVX with 256 bit registers or try Intel(R) Xeon Phi with 512 bit vector registers, you would need to re-develop your vectorized code again to make effective use of wider vector registers. Intel(R) C++ compiler does this work for you.
Yes, please, apply memory alingment to your arrays when ever possible (however, I'm not sure that this would make the application faster in that specific part of the code).
You also might want to try VSL_RNG_METHOD_UNIFORM_STD method for generation of uniform random numbers only if you do not have reasons for use of the accurate method.
Please, let me know if you have more questions.
Thanks,
Andrey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Andrey
I tried running an example with independent streams in Intel Fortran 12.1.5, and I'm a little surprised by the results. Why are the STARTING bolded values from streams 1 and 5, and the italicized values from streams 2 and 4 so similar? Is this normal? The output and the code are given below:
Thanks,
Ray
stream 1
0.36630
0.78807
0.40149
1.61842
1.04421
stream 2
-0.50169
0.86743
-1.32459
-0.29372
-0.16414
stream 3
-0.52681
0.84276
0.85682
-0.25308
-0.17656
stream 4
-0.50169
0.86565
0.92212
-0.09324
-0.00586
stream 5
0.36630
0.78807
0.23478
1.61870
0.69967
And here's the code
[fortran]
include 'mkl_vsl.f90'
include "errcheck.inc"
program MKL_VSL_TEST
USE MKL_VSL_TYPE
USE MKL_VSL
integer, parameter :: n = 1000, nstreams = 5
integer(kind=4) errcode
real(kind=4) :: a,sigma,r(n)
integer :: brng,method,seed
integer :: loopCount
TYPE (VSL_STREAM_STATE), dimension(nstreams) :: stream
brng=VSL_BRNG_MT2203
method=VSL_RNG_METHOD_GAUSSIAN_ICDF
seed=777
a=0.0
sigma=1.0
do loopCount = 1,nstreams
! ***** Initialize *****
errcode=vslnewstream( stream(loopCount), brng + loopCount-1, seed )
call CheckVslError(errcode)
! ***** Call RNG *****
errcode=vsrnggaussian( method, stream(loopCount), n, r, a, sigma)
call CheckVslError(errcode)
! *******Print Stuff *****
print '(A,i2)','stream',loopCount
print '(f8.5)', (r(i), i = 1,5)
print *
! ***** Deinitialize *****
errcode=vsldeletestream( stream(loopCount) )
call CheckVslError(errcode)
end do
end program MKL_VSL_TEST
[/fortran]
Further, I need to run one indepedent Markov Chain Monte Carlo sequence per stream. The problem is that beforehand, I do not know exactly how many random numbers I will need (perhaps I can calculate a minimum number of random numbers needed) in each chain (stream). Is using the vector statistical library still the right approach?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ray,
Your first question on repeated random numbers returned by MT2203 BRNGs is discussed in the article http://software.intel.com/en-us/articles/initializing-Intel-MKL-BRNGs
Answer to your second question - yes: you can use streams of independent random numbers generated by MT2203 BRNGs to produce your MCMC. If, for some reasons, MT2203 approach does not work, you might want to try, for example, MCG59 properly initialized with leapfrog service function. The specifics of leapfrog blocking approach is that you do not need to know in advance how many random numbers would be necessary for one path/chain. The parameter you need to provide to the function is a stride which is basically a number of threads you are going to use to parallelize your application.
Please let me know if you need more details.
Andrey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Andrey,
Thanks for your reply and the link explaining the specific nature of certain seeds like 777. I can initialize the streams with a different seed to get widely dispersed starting values. I will proceed now based on your advice on the following matter. For a Markov chain decisions to make a move to the next sampling point is made by the use of ONE random number, based on a ratio of probabilities of the previous to the current sampling point. What scheme would you recommend I use to generate ONE random number at a time, for each independent sequence? Also, I intend to use openMPI or something like it to run the code in parallel on at least 56 cores on a cluster.
This is a most useful thread. Many thanks in advance for your help.
Ray
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ray,
I would not recommend to use Intel(R) MKL RNG in a scalar mode, that is when vector size is one (see, for example, the graph http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/vsl/functions/uniform_cont.htm which shows how performance of the generator depends on vector size). In other words, to benefit from performance of Intel(R) MKL RNGs you need to generate array r[] of at least several hundreed random numbers:
vdRngUniform( method, stream, n, r, a, b );
Once random numbers are available in array r you can process them in vector way:
for ( i = 0; i < n; i++ ) {
if ( r < p[k-1] / p
}
If you need N chains (say, N > Ncores) split them between cores by assigning ~ N / Ncores chains to each core. In its turn, each core should be supplied by source of random numbers, which are used to generate those chains. One possible option is to use Ncore streams of MT2203, each of them would serve a different core (maximal number of streams supported by MT2203 is ~6K). Use of random numbers and blocking technique described above you would be able to generate chains, one by one on each core, and all cores would produce blocks of the chains in parallel.
Of course, there should be more options to parallelize your MCMC code, choice of the most suitable would be defined by requirements and details of your app, for example
- would number of nodes/cores and/or number of paths increase in future?
- do you expect the the output would not change numerically when you add additional cores/nodes to your cluster?
- which properties (in particular, period) do you need from source fo random numbers?
- etc...
Please, let me know, if this helps or more details are necessary to discuss.
Andrey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Andrey,
From what I understand then, I should pre-initialize the random numbers for each stream (one stream per Markov chain) in an array, and then use these numbers in each step of the MCMC chain. I do indeed intend to use Ncore streams of the Mersenne Twister. It is just that I will have to be a little smart about how many numbers per stream (where Ncores = Nstreams = N_parallel_chains) to initialize in each chain, because I have more than one probable move per step - and sometimes the ratio does not need to be calculated if the proposed candidate sample is outisde the prior probability bounds. However, this is not a major issue as I can safely calculate the maximum number of random numbers per chain (~10^7) I will need, and can have them pre-initialized in an array (per chain) as you suggested.
About the questions you raise:
1) I do not expect the number of cores to go up significantly beyond 96 cores
2) I do not expect the output to change significantly on adding more than 96 cores
3) I am not sure I need an extremely large period, as this is not purely random Monte Carlo sampling, but Markov Chain Monte Carlo sampling which is a directed random walk, the directions being given by a model likelihood function. If the starting models are dispersed widely enough, then a period as low as 10^7 (per chain) is fine for me.
Thanks,
Ray
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ray,
Based on the additional details you provided MT2203 streams appear to be reasonable way to start parallelization of your MCMC. I wonder if you would able to apply the blocking similar to what I described earlier? Also, another question is about number of chains - what are the reasons you correlate number of MCMC chains and number of cores, Ncores = N_chains? What if number of chains you would need is much greater than number of available cores?
Andrey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Andrey,
Perhaps the blocking technique will be useful, but I just tried the separate streams approach you outlined above (pre-initialize the random numbers before MCMC): Separate streams are quite easy to implement. For each chain, I implemented 2 separate streams, one with vRngUniform, and the other with vRngGaussian, as I needed both uniform and normally distributed Gaussian numbers (see code below)
I correlate the number of chains and number of cores (N_cores=N_chains) because we're working on massive clusters where a lot of cores are available at any one time. Also, we're working with geophysical problems where one independent chain on one core takes a day to finish. Running more than one chain on one core will take too long, I suspect.
Thanks for all your help,
Ray
[fortran]
subroutine init_intel_random(brng,seed,nProcs,streamU,streamN,methodU,methodN,rU,rN)
integer, intent(in) :: nProcs
real(RealPrec), intent(out), dimension(:,:) :: rU, rN
integer, intent(out) :: brng,seed,methodU,methodN
TYPE (VSL_STREAM_STATE), intent(out), dimension(nProcs) :: streamU, streamN
integer :: loopCount, nRandsU, nRandsN
integer(kind=4) :: errcode
brng=VSL_BRNG_MT2203 !starting stream
methodN=VSL_RNG_METHOD_GAUSSIAN_ICDF !Method for Normal variates
methodU=VSL_RNG_METHOD_UNIFORM_STD !Method for Uniform variates
seed=1
nRandsU = size(rU,1) !Number of Uniform variates
nRandsN = size(rN,1) !Number of Normal variates
!for the Uniform random variables
do loopCount = 1,nProcs
!Initialize nProc Streams
errcode = vslnewstream( streamU(loopCount), brng + loopCount-1, seed )
call CheckVslError(errcode)
!Initialize in each column of rU, nRandsU uniform variates
errcode=vdrnguniform( methodU, streamU(loopCount), nRandsU, rU(:,loopCount), 0d0, 1d0)
call CheckVslError(errcode)
end do
!not the smartest way, but least confusing
!now shift the stream number nProcs higher for the Normal Random Variates
brng=VSL_BRNG_MT2203 + nProcs
!for the Normal random variables
do loopCount = 1,nProcs
!Initialize the next nProc Streams
errcode = vslnewstream( streamN(loopCount), brng + loopCount-1, seed )
call CheckVslError(errcode)
!Initialize in each column of rN, nRandsN standard normal uniform variates
errcode=vdrngGaussian( methodN, streamN(loopCount), nRandsN, rN(:,loopCount), 0d0, 1d0)
call CheckVslError(errcode)
end do
end subroutine init_intel_random
[/fortran]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ray, it is good to know that you implemented separate stream approach. Please, let me know when/if further help on RNGs/other Intel(R) MKL functionality is necessary from our side. Andrey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I ran into exactly the same problem. Thanks for both the question and the great answer!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have tried the following simple program to generate random numbers for parallel threads using Cilk.
#include "mkl.h"
#include <cilk/cilk.h>
#define NUM_THREADS 2
VSLStreamStatePtr stream[NUM_THREADS];
void cal(int sid) {
double buff[1024];
vdRngUniform(VSL_METHOD_DUNIFORM_STD, stream[sid], 1024, buff, -1.0, 1.0);
}
int main() {
for (int i = 0; i < NUM_THREADS; i++) {
vslNewStream(&stream, VSL_BRNG_WH, 1);
}
for (int i = 0; i < NUM_THREADS; i++) {
cilk_spawn cal(i);
}
cilk_sync;
for (int i = 0; i < NUM_THREADS; i++)
vslDeleteStream(&stream);
return 0;
}
The I compile it with icpc compiler icpc (ICC) 14.0.1 20131008.
icpc -std=c++11 -fopenmp -mkl -I/opt/intel/include/ -g -o pi_mont pi_mont.cpp
If I run it inside cilkscreen, it shows the race condition error.
$ cilkscreen ./pi_mont
Cilkscreen Race Detector V2.0.0, Build 4225
Race condition on location 0x7fc000ee8d00
write access at 0x7fc000b877e3: (vdRngUniform+0xf3)
read access at 0x7fc000b87761: (vdRngUniform+0x71)
called by 0x40172c: (/localstore/theorie/amirsol/paralleluct/pi_mont.cpp:10, cal+0x60)
called by 0x401a3a: (/localstore/theorie/amirsol/paralleluct/pi_mont.cpp:20, main+0x2fc)
1 error found by Cilkscreen
Cilkscreen suppressed 1 duplicate error messages
Ali
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ali,
while we would have a look at your question, I have a clarifying question for you.
WH is a set of 273 Wichmann-Hill’s combined multiplicative congruential generators. Individual generators in this set that produce independent random number sequences are intended to be used in separate threads. You can set-up WH for use in i-th thread as shown below: vslNewStream (&stream, VSL_BRNG_WH+i, seed); More details on the generator are available in Vector Statistical Library Notes at https://software.intel.com/en-us/node/590404.
In your experiment you use the same (WH+0) generator initialized with the same seed=1 in both threads what results into generation of identical sequences. Did you do it for testing purposes only?
Thanks,
Andrey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Andrey,
The purpose of this program is to show the possible race condition. I have also tried vslNewStream(&stream, VSL_BRNG_WH+i, i+100); .This is also shows the race condition. Just out of curiosity, shouldn't we have a identical seed for all threads?
Ali
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ali,
I checked the case you provided with the latest MKL 11.3 and with latest version of Intel(R) Inspector. I was playing with win64 and lin64 versions of MKL. Inspector detects no problem.
>inspxe-cl.exe -collect ti3 pi_mont.exe
....... TEST PASSED .......
0 new problem(s) found
We couldn't apply the Cilkscreen with the latest version of MKL - 11.3 because of Cilkscreen is compatible with code generated by Intel® C++ Composer XE 2013 Update 5 or later ( the version of MKL bundled with this package is 11.1 which was released more than 2 years ago).
could you check how this case works on your side with the latest versions of Intel products ( MKL, Inspector ) and let us know if any further problems.
thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ali,
just to add to Gennady's comment and answer your question on the seeding of the generators in different threads: if you use different WHs from the pool of 273 WH BRNGs, the seed can be the same in the threads. Anyway, the choice of seed (the same or different) is dictated by the different aspects and requirements of your application.
Thanks,
Andrey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Genaddy,
Now, I'm using composer_xe_2015.0.090. This is most recent version that I have on my machine. Trying to compile the code I seethe following error:
pi_mont.cpp(10): error: identifier "VSL_METHOD_DUNIFORM_STD" is undefined
vdRngUniform(VSL_METHOD_DUNIFORM_STD, stream[sid], 1024, buff, -1.0, 1.0);
^
compilation aborted for pi_mont.cpp (code 2)
I have changed VSL_METHOD_DUNIFORM_STD to 0 and compile it. The ldd command shows following dependencies:
linux-vdso.so.1 => (0x00007fff8cfbe000)
libmkl_intel_lp64.so => /opt/intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007f08f2db3000)
libmkl_intel_thread.so => /opt/intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_intel_thread.so (0x00007f08f1a18000)
libmkl_core.so => /opt/intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_core.so (0x00007f08eff73000)
libiomp5.so => /opt/intel/composer_xe_2015.0.090/compiler/lib/intel64/libiomp5.so (0x00007f08efc48000)
libm.so.6 => /lib64/libm.so.6 (0x00007f08ef9b5000)
libcilkrts.so.5 => /opt/intel/composer_xe_2015.0.090/compiler/lib/intel64/libcilkrts.so.5 (0x00007f08ef776000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f08ef470000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f08ef259000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f08ef03c000)
libc.so.6 => /lib64/libc.so.6 (0x00007f08eeca8000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f08eeaa3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f08f36c1000)
I have printed INTEL_MKL_VERSION and it shows 110200.
Inspxe-cl output is:
0 new problem(s) found
I have also tried Cilkscreen. The output still shows race condition:
Cilkscreen Race Detector V2.0.0, Build 4225
Race condition on location 0x7f7f7e21e040
write access at 0x7f7f7de4cef3: (vdRngUniform+0xf3)
read access at 0x7f7f7de4ce71: (vdRngUniform+0x71)
called by 0x40164c: (/localstore/theorie/amirsol/paralleluct/pi_mont.cpp:11, cal+0x60)
called by 0x401914: (/localstore/theorie/amirsol/paralleluct/pi_mont.cpp:22, main+0x2b7)
1 error found by Cilkscreen
Cilkscreen suppressed 1 duplicate error messages
Which one should I thrust? I do not have MKL 11.3 to check it.
Ali
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ali,
on your observation "pi_mont.cpp(10): error: identifier "VSL_METHOD_DUNIFORM_STD" is undefined": this method was deprecated and removed from the library. Use VSL_RNG_METHOD_UNIFORM_STD instead of it. Please, consult with the table in the section "New method names" available at https://software.intel.com/en-us/node/521870 that maps the legacy names into new ones.
Thanks,
Andrey
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page