Hey there,
I recently ran into trouble using MKL as the pseudorandom number generator in my Monte Carlo particle solver. Background: I switched from the intrinsic Fortran RNG to the MKL RNG. My code uses both MPI and OpenMP, and I want to generate pseudorandom numbers on each MPI process independently.
When implementing the routines, I mainly relied on information from another discussion thread here in the forum (https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/283349). To generate the random numbers in a thread-safe fashion, I chose the Mersenne Twister MT2203 method, which lets me run up to 6024 independent streams. This is absolutely sufficient, since each MPI process uses at most 28 threads (cores) on our compute nodes.
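A minimal sketch of one way to give every (MPI rank, OpenMP thread) pair its own MT2203 member, assuming 0-based rank and thread numbers and the same thread count on every rank. The module `stream_index` and the function `mt2203_offset` are hypothetical names, not part of MKL. The returned offset k would be passed to `vslNewStream` as `VSL_BRNG_MT2203 + k`. If every rank used the same offsets with the same seed, all ranks would generate identical sequences; with 28 threads per rank, the 6024 generators cover 215 complete ranks.

```fortran
! Hypothetical helper: map (rank, thread) to a distinct MT2203 member.
module stream_index
  implicit none
  private
  public :: mt2203_offset

  ! Number of generators in the MT2203 family (VSL_BRNG_MT2203 + 0 .. 6023)
  integer, parameter, public :: MT2203_COUNT = 6024

contains

  ! 0-based stream offset k for a 0-based MPI rank and OpenMP thread,
  ! assuming every rank runs nthreads threads. Returns -1 if the inputs
  ! are out of range or the pair would need more than MT2203_COUNT streams.
  pure integer function mt2203_offset(irank, ithread, nthreads) result(k)
    integer, intent(in) :: irank, ithread, nthreads
    if (irank < 0 .or. ithread < 0 .or. ithread >= nthreads) then
      k = -1
      return
    end if
    k = irank * nthreads + ithread
    if (k >= MT2203_COUNT) k = -1
  end function mt2203_offset

end module stream_index
```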
I tested my implementation with Intel Inspector XE, and it detected a memory leak in the vslnewstream() function as well as data races in the vdRngUniform() function. Since the latter could be caused by the memory leak, I concentrated on the former, i.e., the memory leak. To rule out bugs in my own code and to simplify debugging, I took the vdrnguniform.f code from the MKL examples. Even in this example the memory leak was detected! Now I am wondering what the problem is and how to fix it.
My setup:
CPU:
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
ifort --version
ifort (IFORT) 16.0.3 20160415
Copyright (C) 1985-2016 Intel Corporation. All rights reserved.
inspector-cl --version
Intel(R) Inspector XE 2016 Update 3 (build 460803) Command Line tool
Copyright (C) 2009-2016 Intel Corporation. All rights reserved.
Code (from the MKL examples):
!===============================================================================
! Copyright 2003-2016 Intel Corporation All Rights Reserved.
!
! The source code, information and material ("Material") contained herein is
! owned by Intel Corporation or its suppliers or licensors, and title to such
! Material remains with Intel Corporation or its suppliers or licensors. The
! Material contains proprietary information of Intel or its suppliers and
! licensors. The Material is protected by worldwide copyright laws and treaty
! provisions. No part of the Material may be used, copied, reproduced,
! modified, published, uploaded, posted, transmitted, distributed or disclosed
! in any way without Intel's prior express written permission. No license under
! any patent, copyright or other intellectual property rights in the Material
! is granted to or conferred upon you, either expressly, by implication,
! inducement, estoppel or otherwise. Any license under such intellectual
! property rights must be express and approved by Intel in writing.
!
! Unless otherwise agreed by Intel in writing, you may not remove or alter this
! notice or any other notice embedded in Materials by Intel or Intel's
! suppliers or licensors in any way.
!===============================================================================
!  Content:
!      vdRngUniform  Example Program Text
!*******************************************************************************

      include 'mkl_vsl.f90'
      include "errcheck.inc"

      program MKL_VSL_TEST

      USE MKL_VSL_TYPE
      USE MKL_VSL

      integer(kind=4) i,nn
      integer n
      integer(kind=4) errcode

      real(kind=8) a,b
      real(kind=8) r(1000)

      integer brng,method,seed

      real(kind=8) tM,tD,tQ,tD2
      real(kind=8) sM,sD
      real(kind=8) sum, sum2
      real(kind=8) s
      real(kind=8) DeltaM,DeltaD

      TYPE (VSL_STREAM_STATE) :: stream

      n=1000
      nn=10

      brng=VSL_BRNG_MCG31
      method=VSL_RNG_METHOD_UNIFORM_STD
      seed=1

      a=0.0
      b=1.0

!     ***** Initialize *****
      errcode=vslnewstream( stream, brng, seed )
      call CheckVslError(errcode)

!     ***** Call RNG *****
      errcode=vdrnguniform( method, stream, n, r, a, b )
      call CheckVslError(errcode)

!     ***** Theoretical moments *****
      tM=(b+a)/2.0
      tD=((b-a)*(b-a))/12.0
      tQ=((b-a)*(b-a)*(b-a)*(b-a))/80.0

!     ***** Sample moments *****
      sum=0.0
      sum2=0.0
      do i=1,n
        sum=sum+r(i)
        sum2=sum2+r(i)*r(i)
      end do
      sM=sum/n
      sD=sum2/n-sM*sM

!     ***** Comparison of theoretical and sample moments *****
      tD2=tD*tD
      s=((tQ-tD2)/n)-(2*(tQ-2*tD2)/(n*n))+((tQ-3*tD2)/(n*n*n))

      DeltaM=(tM-sM)/sqrt(tD/n)
      DeltaD=(tD-sD)/sqrt(s)

!     ***** Printing results *****
      print *,"Sample of vdRngUniform."
      print *,"-----------------------"
      print *,""
      print *,"Parameters:"
      print 11," a=",a
      print 11," b=",b
      print *,""
      print *,"Results (first 10 of 1000):"
      print *,"---------------------------"
      do i=1,nn
        print 10,r(i)
      end do

      print *,""
      if (abs(DeltaM)>3.0 .OR. abs(DeltaD)>3.0) then
        print 12,"Error: sample moments (mean=",                        &
     &    sM,", variance=",sD,                                          &
     &    ") disagree with theory (mean=",                              &
     &    tM,", variance=",tD,")."
        stop 1
      else
        print 12,"Sample moments (mean=",sM,                            &
     &    ", variance=",sD,") agree with theory (mean=",                &
     &    tM,", variance=",tD,")."
      end if

!     ***** Deinitialize *****
      errcode=vsldeletestream( stream )
      call CheckVslError(errcode)

10    format(F7.3)
11    format(A,F5.3)
12    format(A,F5.2,A,F5.2,A,F5.2,A,F5.2,A)

      end
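The pass/fail test at the end of the example compares the sample mean and the biased sample variance with the theoretical values for U(a, b), in units of their standard errors: tD = (b-a)^2/12 is the variance, tQ = (b-a)^4/80 the fourth central moment, and s the variance of the sample variance. The following sketch restates the same check as a reusable function that needs no MKL, so it can be applied to the output of any generator; the module `uniform_moment_check` and the function `moments_agree` are hypothetical names.

```fortran
! Hypothetical helper: the 3-sigma moment test used by the vdRngUniform example.
module uniform_moment_check
  implicit none
  private
  public :: moments_agree
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Returns .true. if the sample mean and the biased sample variance of r
  ! lie within 3 standard errors of the U(a,b) values; also returns the
  ! standardized deviations delta_m and delta_d.
  logical function moments_agree(r, a, b, delta_m, delta_d) result(ok)
    real(dp), intent(in)  :: r(:), a, b
    real(dp), intent(out) :: delta_m, delta_d
    real(dp) :: n, tm, td, tq, td2, sm, sd, s
    n = real(size(r), dp)
    ! Theoretical mean, variance and fourth central moment of U(a,b)
    tm = (a + b) / 2.0_dp
    td = (b - a)**2 / 12.0_dp
    tq = (b - a)**4 / 80.0_dp
    ! Sample mean and biased (divide-by-n) sample variance
    sm = sum(r) / n
    sd = sum(r*r) / n - sm*sm
    ! Variance of the biased sample variance
    td2 = td*td
    s = (tq - td2)/n - 2.0_dp*(tq - 2.0_dp*td2)/n**2 + (tq - 3.0_dp*td2)/n**3
    delta_m = (tm - sm) / sqrt(td / n)
    delta_d = (td - sd) / sqrt(s)
    ok = abs(delta_m) <= 3.0_dp .and. abs(delta_d) <= 3.0_dp
  end function moments_agree
end module uniform_moment_check
```

For example, an evenly spaced sample on [0, 1] passes, while an all-zero array fails with delta_m of about 55.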
I compiled the code with:
ifort -g -O0 -mkl vdrnguniform.f -o test
Intel Inspector XE reports the following problems:
ID  Type                         Sources             Modules      Object Size  State
P1  Memory leak                  vsrnguniform.f      test         70816        New
    Memory leak                  vsrnguniform.f:61   test         70816        New
P2  Invalid memory access        [Unknown]           libc.so.6                 New
    Invalid memory access        libc.so.6:0x11a090  libc.so.6                 New
P3  Invalid memory access        [Unknown]           libc.so.6                 New
    Invalid memory access        libc.so.6:0x11a127  libc.so.6                 New
P4  Invalid memory access        vsrnguniform.f      test                      New
    Invalid memory access        vsrnguniform.f:61   test                      New
P5  Uninitialized memory access  kmp_itt.inl         libiomp5.so               New
    Uninitialized memory access  kmp_itt.inl:1078    libiomp5.so               New
P6  Uninitialized memory access  start.S             test                      New
    Uninitialized memory access  start.S:122         test                      New
P1 points to the following allocation site:
Description Source Function Module Object Size Offset
Allocation site vsrnguniform.f:61 mkl_vsl_test test 184
59
60 ! ***** Initialize *****
>61 errcode=vslnewstream( stream, brng, seed )
62 call CheckVslError(errcode)
63
The statistics produced by both my implementation and the MKL example are correct, but I cannot live with memory leaks and data races...
Tobias, could you call mkl_free_buffers() at the end of the program and check whether the memory leaks still exist?
Hi, some invalid memory accesses and the memory leak in the vslnewstream() function have disappeared:
ID  Type                         Sources             Modules      Object Size  State
P1  Invalid memory access        [Unknown]           libc.so.6                 New
    Invalid memory access        libc.so.6:0x11a090  libc.so.6                 New
P2  Invalid memory access        [Unknown]           libc.so.6                 New
    Invalid memory access        libc.so.6:0x11a127  libc.so.6                 New
P3  Invalid memory access        vdrnguniform.f      test                      New
    Invalid memory access        vdrnguniform.f:61   test                      New
The invalid memory access in the vslnewstream() function (P3) is still there:
Description Source Function Module Object Size Offset
Read vdrnguniform.f:61 mkl_vsl_test test
59
60 ! ***** Initialize *****
>61 errcode=vslnewstream( stream, brng, seed )
62 call CheckVslError(errcode)
63
Here are the collector messages:
Analysis started...
Result file: $USER/Misc/Test/insp_xe/mkl_rng/new/new.inspxe
Analysis started for $USER/Misc/Test/insp_xe/mkl_rng/test (pid = 5469)
Loaded module: $USER/Misc/Test/insp_xe/mkl_rng/test, address range [0x400000-0x70b397]
Loaded module: /lib64/ld-linux-x86-64.so.2, address range [0x7f6c515ae000-0x7f6c517d0147], minimal analysis
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_lp64.so, address range [0x7f6c3c0b4000-0x7f6c3cbc3dcf]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_thread.so, address range [0x7f6c3a6ec000-0x7f6c3c0176af]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_core.so, address range [0x7f6c38cc1000-0x7f6c3a6d10c7]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libiomp5.so, address range [0x7f6c38924000-0x7f6c38c6735f], minimal analysis
Loaded module: /lib64/libm.so.6, address range [0x7f6c38597000-0x7f6c38897157]
Loaded module: /lib64/libpthread.so.0, address range [0x7f6c38340000-0x7f6c3855c48f]
Loaded module: /lib64/libc.so.6, address range [0x7f6c37f88000-0x7f6c3832fa1f]
Loaded module: /lib64/libgcc_s.so.1, address range [0x7f6c37d47000-0x7f6c37f5d42f]
Loaded module: /lib64/libdl.so.2, address range [0x7f6c37b29000-0x7f6c37d2c10f], minimal analysis
Loaded module: $USER/Misc/Tools/local/tools/inspector_xe_2016.1.3.460803/lib64/runtime/libittnotify.so, address range [0x7f6c34f63000-0x7f6c3516df7f], minimal analysis
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_avx2.so, address range [0x7f6c31b68000-0x7f6c33ddc69f]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_vml_avx2.so, address range [0x7f6c30d26000-0x7f6c319bf107]
Process $USER/Misc/Test/insp_xe/mkl_rng/test exited with code 0. Leak analysis starting. Please wait...
Unloaded module: $USER/Misc/Test/insp_xe/mkl_rng/test
Unloaded module: /lib64/ld-linux-x86-64.so.2
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_lp64.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_thread.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_core.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libiomp5.so
Unloaded module: /lib64/libm.so.6
Unloaded module: /lib64/libpthread.so.0
Unloaded module: /lib64/libc.so.6
Unloaded module: /lib64/libgcc_s.so.1
Unloaded module: /lib64/libdl.so.2
Unloaded module: $USER/Misc/Tools/local/tools/inspector_xe_2016.1.3.460803/lib64/runtime/libittnotify.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_avx2.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_vml_avx2.so
Completed analysis for $USER/Misc/Test/insp_xe/mkl_rng/test
Application exit code: 0
Result file: $USER/Misc/Test/insp_xe/mkl_rng/new/new.inspxe
Analysis completed
3 new problem(s) found
3 Invalid memory access problem(s) detected
FYI: here is my own implementation of the problem, for which Intel Inspector XE reports several problems:
$ inspxe-cl -collect mi3 -result-dir new1 ./test
4 new problem(s) found
3 Invalid memory access problem(s) detected
1 Memory not deallocated problem(s) detected
$ inspxe-cl -collect ti3 -result-dir new2 ./test
1 new problem(s) found
1 Data race problem(s) detected
This is the code:
include "mkl_vsl.f90"

program invoke_PRNG_MKL_uniform

  use omp_lib
  use mkl_vsl
  use mkl_vsl_type

  implicit none

  ! transfer variables
  integer :: NVariates, NThreads
  real(kind=8) :: rLeft, rRight
  real(kind=8) :: rAvg, rVar
  real(kind=8), dimension(:), allocatable :: arVariates
  type(VSL_STREAM_STATE), dimension(:), allocatable :: aaStreams

  ! local variables
  integer :: iErr, i, iMySeed
  integer :: iThread, iRest, NVariatesPThMin, NVariatesPTh, NVariatesPThAcc

  ! +----------------+
  ! | Initialization |
  ! +----------------+

  ! allocate memory
  NThreads  = 4
  NVariates = 400000

  allocate( aaStreams( NThreads ), stat=iErr )
  if(iErr /= 0) then
    write(*,*) "Streams allocation error!"
    stop
  end if

  allocate( arVariates( NVariates ), stat=iErr )
  if(iErr /= 0) then
    write(*,*) "Variates allocation error!"
    stop
  end if

  ! initialize streams
  do i=1,NThreads
    ! create seed (if desired)
    iMySeed = 0
    ! create VSL stream
    iErr = vslNewStream( aaStreams(i), VSL_BRNG_MT2203+i-1, iMySeed )
    if(iErr /= 0) then
      write(*,*) "MKL Error: ",iErr
      stop
    end if
  end do

  ! initialize OpenMP
  call omp_set_num_threads( NThreads )
  write(*,*) " Number of OpenMP Threads: ",NThreads

  ! +-------------+
  ! | Computation |
  ! +-------------+

  ! set problem
  rLeft  = -2.5d0
  rRight =  1.5d0

  ! set minimum number of variates and remainder
  iRest = mod(NVariates, NThreads)
  NVariatesPThMin = int(NVariates / NThreads)
  write(*,*) " setup: ",iRest,NVariatesPThMin

  ! invoke MKL random number generator
  !$OMP PARALLEL PRIVATE(iErr,iThread,NVariatesPth,NVariatesPThAcc)
  iThread = omp_get_thread_num()

  ! determine number of variates to generate and the offset into arVariates
  if(iThread < iRest) then
    NVariatesPth    = NVariatesPthMin + 1
    NVariatesPthAcc = iThread*NVariatesPthMin + iThread
  else
    NVariatesPth    = NVariatesPthMin
    ! threads past the remainder are shifted by all iRest extra variates
    NVariatesPthAcc = iThread*NVariatesPthMin + iRest
  end if

  !$OMP CRITICAL
  write(*,*) " thread: ",iThread,NVariatesPTh,NVariatesPThAcc
  !$OMP END CRITICAL

  if( NVariatesPth > 0 ) then
    ! generate random samples
    iErr = vdRngUniform( VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, &
                       & aaStreams(iThread+1),                &
                       & NVariatesPTh,                        &
                       & arVariates(NVariatesPthAcc+1),       &
                       & rLeft, rRight )
    if(iErr /= 0) then
      write(*,*) "MKL Error: ",iErr
      stop
    end if
  end if
  !$OMP END PARALLEL

  ! +---------+
  ! | Results |
  ! +---------+
  rAvg = 0.0d0
  do i=1,NVariates
    rAvg = rAvg + arVariates(i)
  end do
  rAvg = rAvg / NVariates
  write(*,'(a,f8.5,a,f8.5,a)') " avg: ",rAvg," (expected: ",0.5d0 * (rLeft + rRight),")"

  rAvg = 0.5d0 * (rLeft + rRight)
  rVar = 0.0d0
  do i=1,NVariates
    rVar = rVar + (arVariates(i) - rAvg)*(arVariates(i) - rAvg)
  end do
  rVar = rVar / NVariates
  write(*,'(a,f8.5,a,f8.5,a)') " var: ",rVar," (expected: ",1.d0/12.d0 * (rRight - rLeft)**2,")"

  ! +--------------+
  ! | Finalization |
  ! +--------------+

  ! destroy streams
  do i=1,NThreads
    iErr = vslDeleteStream( aaStreams(i) )
    if(iErr /= 0) then
      write(*,*) "MKL Error: ",iErr
      stop
    end if
  end do
  call mkl_free_buffers()

  ! deallocate memory
  deallocate( aaStreams, stat=iErr )
  if(iErr /= 0) then
    write(*,*) "Streams deallocation error!"
    stop
  end if

  deallocate( arVariates, stat=iErr )
  if(iErr /= 0) then
    write(*,*) "Variates deallocation error!"
    stop
  end if

end program invoke_PRNG_MKL_uniform
The code is compiled as described above.
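The per-thread split in the parallel region can be factored into a small helper. A minimal sketch with hypothetical names (`block_partition`, `thread_block`): threads whose index is below the remainder get one extra variate, and threads at or above it must be shifted by the whole remainder, otherwise neighbouring blocks overlap whenever NVariates is not a multiple of NThreads. With NVariates = 400000 and NThreads = 4 the remainder is zero, giving offsets 0, 100000, 200000 and 300000, as in the `thread:` lines printed later in this thread.

```fortran
! Hypothetical helper: contiguous block partition of nvariates items over
! nthreads threads; the first mod(nvariates, nthreads) threads get one extra.
module block_partition
  implicit none
  private
  public :: thread_block
contains
  ! For 0-based thread index ithread, returns the number of items (nitems)
  ! and the 0-based offset of its first item in the shared array.
  pure subroutine thread_block(nvariates, nthreads, ithread, nitems, offset)
    integer, intent(in)  :: nvariates, nthreads, ithread
    integer, intent(out) :: nitems, offset
    integer :: nmin, nrest
    nmin  = nvariates / nthreads
    nrest = mod(nvariates, nthreads)
    if (ithread < nrest) then
      nitems = nmin + 1
      offset = ithread * nmin + ithread
    else
      nitems = nmin
      ! threads past the remainder are shifted by all nrest extra items
      offset = ithread * nmin + nrest
    end if
  end subroutine thread_block
end module block_partition
```

Inside the parallel region, the if/else block could then be replaced by `call thread_block(NVariates, NThreads, iThread, NVariatesPTh, NVariatesPThAcc)`.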
Regarding the data race: I see no problem with MKL 2017 (the latest release) and Inspector 2017 (also the latest) using the latest version of the code you posted.
inspxe-cl -collect ti3 -result-dir ti3 ./a.out
Number of OpenMP Threads: 4
setup: 0 100000
thread: 0 100000 0
avg: -0.12553 (expected: -0.50000)
var: 0.52144 (expected: 1.33333)
0 new problem(s) found
However, we do see some memory issues in the same environment:
inspxe-cl -collect mi3 -result-dir mi3 ./a.out
Number of OpenMP Threads: 4
setup: 0 100000
thread: 0 100000 0
avg: -0.12553 (expected: -0.50000)
var: 0.52144 (expected: 1.33333)
6 new problem(s) found
5 Invalid memory access problem(s) detected
1 Memory not deallocated problem(s) detected
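The avg and var values above are consistent with only thread 0's block having been written. Assume, as the single `thread: 0` line suggests, that the parallel region ran on one thread only, so that just the first 100000 of the N = 400000 entries of arVariates were filled, and assume the remaining 300000 entries happened to be zero (allocate does not guarantee this). With a = rLeft = -2.5, b = rRight = 1.5 and mu = (a+b)/2 = -0.5, and recalling that the code's variance loop measures the spread around mu rather than around the sample mean:

$$
\bar{x} \approx \frac{1}{4}\,\mu + \frac{3}{4}\cdot 0 = \frac{1}{4}\,(-0.5) = -0.125
$$

$$
\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 \approx \frac{1}{4}\cdot\frac{(b-a)^2}{12} + \frac{3}{4}\,(0-\mu)^2 = \frac{1}{4}\cdot\frac{4}{3} + \frac{3}{4}\cdot\frac{1}{4} = \frac{25}{48} \approx 0.521
$$

These agree with the printed -0.12553 and 0.52144 up to sampling noise, whereas a run in which all four threads report their blocks yields values close to the expected -0.5 and 1.33333.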
Forgot to add: the problem has been escalated, and we will keep you updated on its status.
Could it be that you forgot to add the -qopenmp flag? This could be the reason for the wrong results.
I compiled it with: ifort -g -O0 -mkl -qopenmp test_mkl.f90 -o test
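One way to catch a missing OpenMP flag early is the OpenMP conditional-compilation sentinel: in free-form source, a line starting with `!$ ` (followed by a blank) is compiled only when OpenMP is enabled, e.g. with `-qopenmp` for ifort or `-fopenmp` for gfortran, and is treated as an ordinary comment otherwise. A minimal sketch; the module `openmp_check` and the function `openmp_enabled` are hypothetical names.

```fortran
! Hypothetical helper: report whether this file was compiled with OpenMP.
module openmp_check
  implicit none
  private
  public :: openmp_enabled
contains
  ! Returns .true. only if OpenMP was enabled at compile time:
  ! the sentinel line below is compiled only in that case.
  logical function openmp_enabled() result(enabled)
    enabled = .false.
    !$ enabled = .true.
  end function openmp_enabled
end module openmp_check
```

At the start of invoke_PRNG_MKL_uniform, a line such as `if (.not. openmp_enabled()) stop "compiled without OpenMP"` would turn a silent single-thread run into an explicit error.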
I tried the 2017 tools (Composer, MKL, Inspector) and you are right, the data races seem to be gone:
$ inspxe-cl -collect ti3 -result-dir new1 ./test
Number of OpenMP Threads: 4
setup: 0 100000
Warning: One or more threads in the application accessed the stack of another thread. This may indicate one or more bugs in your application. Setting the Inspector to detect data races on stack accesses and running another analysis may help you locate these and other bugs.
thread: 1 100000 100000
thread: 2 100000 200000
thread: 0 100000 0
thread: 3 100000 300000
avg: -0.50326 (expected: -0.50000)
var: 1.33160 (expected: 1.33333)
0 new problem(s) found
$ inspxe-cl -collect mi3 -result-dir new2 ./test
Number of OpenMP Threads: 4
setup: 0 100000
thread: 0 100000 0
thread: 1 100000 100000
thread: 3 100000 300000
thread: 2 100000 200000
avg: -0.50326 (expected: -0.50000)
var: 1.33160 (expected: 1.33333)
6 new problem(s) found
5 Invalid memory access problem(s) detected
1 Memory not deallocated problem(s) detected
Why is there still a warning?
Any news on this matter?
Yes, with the -qopenmp option we see the same warning. At first glance the code is correct, and the warning may be a false positive from Inspector.