Hey there,
I recently ran into trouble when using MKL as the pseudo-random number generator in my Monte Carlo particle solver. Background: I switched from the intrinsic Fortran RNG to the MKL RNG. My code uses both MPI and OpenMP, and I want to generate pseudo-random numbers on each MPI process independently.
When I implemented the routines, I mainly took the information from another discussion thread (https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/283349) here in the forum. In order to generate the random numbers in a thread-safe fashion, I chose the Mersenne Twister MT2203 method, which lets me run up to 6024 streams independently (which is absolutely sufficient, since each MPI process uses up to 28 threads (cores) on our computing nodes).
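To make the stream budget concrete, here is a minimal sketch of one way to give every (MPI rank, OpenMP thread) pair its own MT2203 sub-generator index. The rank count, the helper name stream_index, and the rank-major numbering are assumptions for illustration only; they are not taken from the solver discussed here. In a real code the resulting index would be added to VSL_BRNG_MT2203 in the vslNewStream call.

```fortran
program stream_index_sketch
  implicit none
  ! Size of the MT2203 family as quoted in the post above.
  integer, parameter :: max_streams = 6024
  integer :: n_ranks, n_threads, rank, thread, idx

  n_ranks   = 4    ! hypothetical number of MPI processes
  n_threads = 28   ! threads per process, as mentioned in the post

  ! Refuse to run if two (rank, thread) pairs would have to share a stream.
  if (n_ranks * n_threads > max_streams) then
    write(*,*) "Not enough MT2203 sub-generators for this layout"
    stop 1
  end if

  do rank = 0, n_ranks - 1
    do thread = 0, n_threads - 1
      idx = stream_index(rank, thread, n_threads)
      ! Print only the first and last thread of each rank to keep output short.
      if (thread == 0 .or. thread == n_threads - 1) then
        write(*,'(a,i0,a,i0,a,i0)') " rank ", rank, ", thread ", thread, &
                                    " -> sub-generator ", idx
      end if
    end do
  end do

contains

  ! Hypothetical helper: rank-major numbering, unique for thread < n_threads.
  pure integer function stream_index(rank, thread, n_threads)
    integer, intent(in) :: rank, thread, n_threads
    stream_index = rank * n_threads + thread
  end function stream_index

end program stream_index_sketch
```

With 28 threads per process this layout would exhaust the 6024 sub-generators only beyond 215 processes, so the guard at the top matters mainly for large jobs.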
I tested my implementation with Intel Inspector XE, and it detected a memory leak in the vslnewstream() function as well as data races in the vdRngUniform() function. Since the latter could be caused by the memory leak, I concentrated on the former, i.e. the memory leak. In order to exclude coding bugs and ease the debugging procedure, I took the vdrnguniform.f code from the MKL examples. Even in this example the memory leak was detected! Now I am wondering what the problem is and how to fix it.
My setup:
CPU:
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
ifort --version
ifort (IFORT) 16.0.3 20160415
Copyright (C) 1985-2016 Intel Corporation. All rights reserved.
inspector-cl --version
Intel(R) Inspector XE 2016 Update 3 (build 460803) Command Line tool
Copyright (C) 2009-2016 Intel Corporation. All rights reserved.
Code (from the MKL examples):
!===============================================================================
! Copyright 2003-2016 Intel Corporation All Rights Reserved.
!
! The source code, information and material ("Material") contained herein is
! owned by Intel Corporation or its suppliers or licensors, and title to such
! Material remains with Intel Corporation or its suppliers or licensors. The
! Material contains proprietary information of Intel or its suppliers and
! licensors. The Material is protected by worldwide copyright laws and treaty
! provisions. No part of the Material may be used, copied, reproduced,
! modified, published, uploaded, posted, transmitted, distributed or disclosed
! in any way without Intel's prior express written permission. No license under
! any patent, copyright or other intellectual property rights in the Material
! is granted to or conferred upon you, either expressly, by implication,
! inducement, estoppel or otherwise. Any license under such intellectual
! property rights must be express and approved by Intel in writing.
!
! Unless otherwise agreed by Intel in writing, you may not remove or alter this
! notice or any other notice embedded in Materials by Intel or Intel's
! suppliers or licensors in any way.
!===============================================================================
! Content:
! vdRngUniform Example Program Text
!*******************************************************************************
include 'mkl_vsl.f90'
include "errcheck.inc"
program MKL_VSL_TEST
USE MKL_VSL_TYPE
USE MKL_VSL
integer(kind=4) i,nn
integer n
integer(kind=4) errcode
real(kind=8) a,b
real(kind=8) r(1000)
integer brng,method,seed
real(kind=8) tM,tD,tQ,tD2
real(kind=8) sM,sD
real(kind=8) sum, sum2
real(kind=8) s
real(kind=8) DeltaM,DeltaD
TYPE (VSL_STREAM_STATE) :: stream
n=1000
nn=10
brng=VSL_BRNG_MCG31
method=VSL_RNG_METHOD_UNIFORM_STD
seed=1
a=0.0
b=1.0
! ***** Initialize *****
errcode=vslnewstream( stream, brng, seed )
call CheckVslError(errcode)
! ***** Call RNG *****
errcode=vdrnguniform( method, stream, n, r, a, b )
call CheckVslError(errcode)
! ***** Theoretical moments *****
tM=(b+a)/2.0
tD=((b-a)*(b-a))/12.0
tQ=((b-a)*(b-a)*(b-a)*(b-a))/80.0
! ***** Sample moments *****
sum=0.0
sum2=0.0
do i=1,n
sum=sum+r(i)
sum2=sum2+r(i)*r(i)
end do
sM=sum/n
sD=sum2/n-sM*sM
! ***** Comparison of theoretical and sample moments *****
tD2=tD*tD
s=((tQ-tD2)/n)-(2*(tQ-2*tD2)/(n*n))+((tQ-3*tD2)/(n*n*n))
DeltaM=(tM-sM)/sqrt(tD/n)
DeltaD=(tD-sD)/sqrt(s)
! ***** Printing results *****
print *,"Sample of vdRngUniform."
print *,"-----------------------"
print *,""
print *,"Parameters:"
print 11," a=",a
print 11," b=",b
print *,""
print *,"Results (first 10 of 1000):"
print *,"---------------------------"
do i=1,nn
print 10,r(i)
end do
print *,""
if (abs(DeltaM)>3.0 .OR. abs(DeltaD)>3.0) then
print 12,"Error: sample moments (mean=", &
& sM,", variance=",sD, &
& ") disagree with theory (mean=", &
& tM,", variance=",tD,")."
stop 1
else
print 12,"Sample moments (mean=",sM, &
& ", variance=",sD,") agree with theory (mean=", &
& tM,", variance=",tD,")."
end if
! ***** Deinitialize *****
errcode=vsldeletestream( stream )
call CheckVslError(errcode)
10 format(F7.3)
11 format(A,F5.3)
12 format(A,F5.2,A,F5.2,A,F5.2,A,F5.2,A)
end
I compiled the code with:
ifort -g -O0 -mkl vdrnguniform.f -o test
Intel Inspector XE reports the following problems:
ID  Type                         Sources            Modules      Object Size  State
P1  Memory leak                  vsrnguniform.f     test         70816        New
    Memory leak                  vsrnguniform.f:61  test         70816        New
P2  Invalid memory access        [Unknown]          libc.so.6                 New
    Invalid memory access        libc.so.6:0x11a090 libc.so.6                 New
P3  Invalid memory access        [Unknown]          libc.so.6                 New
    Invalid memory access        libc.so.6:0x11a127 libc.so.6                 New
P4  Invalid memory access        vsrnguniform.f     test                      New
    Invalid memory access        vsrnguniform.f:61  test                      New
P5  Uninitialized memory access  kmp_itt.inl        libiomp5.so               New
    Uninitialized memory access  kmp_itt.inl:1078   libiomp5.so               New
P6  Uninitialized memory access  start.S            test                      New
    Uninitialized memory access  start.S:122        test                      New
P1 points to the following allocation site:
Description Source Function Module Object Size Offset
Allocation site vsrnguniform.f:61 mkl_vsl_test test 184
59
60 ! ***** Initialize *****
>61 errcode=vslnewstream( stream, brng, seed )
62 call CheckVslError(errcode)
63
The statistics in my own implementation and in the MKL example are correct, but I cannot live with memory leaks and data races ...
Tobias, could you call the mkl_free_buffers() function at the end of the program and check whether the memory leaks still exist?
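For reference, a minimal sketch of where that call would go, cut down from the MKL example quoted above. It assumes the same build line as in the thread (ifort with -mkl) and that mkl_vsl.f90 is on the include path. The background, as far as I understand the MKL documentation, is that MKL's memory manager keeps internally allocated buffers until mkl_free_buffers() is called or the program ends, which a leak checker may report as a leak. Whether that explains the report here is an assumption to be checked with Inspector.

```fortran
include 'mkl_vsl.f90'

program free_buffers_sketch
  use mkl_vsl_type
  use mkl_vsl
  implicit none
  type(VSL_STREAM_STATE) :: stream
  integer :: errcode
  real(kind=8) :: r(1000)

  ! Create the stream and draw some numbers, as in the MKL example.
  errcode = vslnewstream(stream, VSL_BRNG_MCG31, 1)
  if (errcode /= 0) stop 1

  errcode = vdrnguniform(VSL_RNG_METHOD_UNIFORM_STD, stream, 1000, r, &
                         0.0d0, 1.0d0)
  if (errcode /= 0) stop 1

  ! Release the stream first ...
  errcode = vsldeletestream(stream)
  if (errcode /= 0) stop 1

  ! ... then ask MKL to release the buffers held by its memory manager.
  ! This should be the last MKL-related call in the program.
  call mkl_free_buffers()
end program free_buffers_sketch
```

The order matters in this sketch: the stream is deleted before the buffers are freed, so that no live VSL object still refers to MKL-managed memory.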
Hi, some invalid memory accesses and the memory leak in the vslnewstream() function have disappeared:
ID  Type                   Sources            Modules    Object Size  State
P1  Invalid memory access  [Unknown]          libc.so.6               New
    Invalid memory access  libc.so.6:0x11a090 libc.so.6               New
P2  Invalid memory access  [Unknown]          libc.so.6               New
    Invalid memory access  libc.so.6:0x11a127 libc.so.6               New
P3  Invalid memory access  vdrnguniform.f     test                    New
    Invalid memory access  vdrnguniform.f:61  test                    New
The invalid memory access in the vslnewstream function (P3) is still reported:
Description Source Function Module Object Size Offset
Read vdrnguniform.f:61 mkl_vsl_test test
59
60 ! ***** Initialize *****
>61 errcode=vslnewstream( stream, brng, seed )
62 call CheckVslError(errcode)
63
Here are the collector messages:
Analysis started...
Result file: $USER/Misc/Test/insp_xe/mkl_rng/new/new.inspxe
Analysis started for $USER/Misc/Test/insp_xe/mkl_rng/test (pid = 5469)
Loaded module: $USER/Misc/Test/insp_xe/mkl_rng/test, address range [0x400000-0x70b397]
Loaded module: /lib64/ld-linux-x86-64.so.2, address range [0x7f6c515ae000-0x7f6c517d0147], minimal analysis
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_lp64.so, address range [0x7f6c3c0b4000-0x7f6c3cbc3dcf]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_thread.so, address range [0x7f6c3a6ec000-0x7f6c3c0176af]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_core.so, address range [0x7f6c38cc1000-0x7f6c3a6d10c7]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libiomp5.so, address range [0x7f6c38924000-0x7f6c38c6735f], minimal analysis
Loaded module: /lib64/libm.so.6, address range [0x7f6c38597000-0x7f6c38897157]
Loaded module: /lib64/libpthread.so.0, address range [0x7f6c38340000-0x7f6c3855c48f]
Loaded module: /lib64/libc.so.6, address range [0x7f6c37f88000-0x7f6c3832fa1f]
Loaded module: /lib64/libgcc_s.so.1, address range [0x7f6c37d47000-0x7f6c37f5d42f]
Loaded module: /lib64/libdl.so.2, address range [0x7f6c37b29000-0x7f6c37d2c10f], minimal analysis
Loaded module: $USER/Misc/Tools/local/tools/inspector_xe_2016.1.3.460803/lib64/runtime/libittnotify.so, address range [0x7f6c34f63000-0x7f6c3516df7f], minimal analysis
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_avx2.so, address range [0x7f6c31b68000-0x7f6c33ddc69f]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_vml_avx2.so, address range [0x7f6c30d26000-0x7f6c319bf107]
Process $USER/Misc/Test/insp_xe/mkl_rng/test exited with code 0. Leak analysis starting. Please wait...
Unloaded module: $USER/Misc/Test/insp_xe/mkl_rng/test
Unloaded module: /lib64/ld-linux-x86-64.so.2
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_lp64.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_thread.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_core.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libiomp5.so
Unloaded module: /lib64/libm.so.6
Unloaded module: /lib64/libpthread.so.0
Unloaded module: /lib64/libc.so.6
Unloaded module: /lib64/libgcc_s.so.1
Unloaded module: /lib64/libdl.so.2
Unloaded module: $USER/Misc/Tools/local/tools/inspector_xe_2016.1.3.460803/lib64/runtime/libittnotify.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_avx2.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_vml_avx2.so
Completed analysis for $USER/Misc/Test/insp_xe/mkl_rng/test
Application exit code: 0
Result file: $USER/Misc/Test/insp_xe/mkl_rng/new/new.inspxe
Analysis completed
3 new problem(s) found
3 Invalid memory access problem(s) detected
FYI: here is my own implementation, for which Intel Inspector XE also reports several problems:
$ inspxe-cl -collect mi3 -result-dir new1 ./test
4 new problem(s) found
3 Invalid memory access problem(s) detected
1 Memory not deallocated problem(s) detected
$ inspxe-cl -collect ti3 -result-dir new2 ./test
1 new problem(s) found
1 Data race problem(s) detected
This is the code:
include "mkl_vsl.f90"
program invoke_PRNG_MKL_uniform
use omp_lib
use mkl_vsl
use mkl_vsl_type
implicit none
! transfer variables
integer :: NVariates, NThreads
real(kind=8) :: rLeft, rRight
real(kind=8) :: rAvg, rVar
real(kind=8), dimension(:), allocatable :: arVariates
type(VSL_STREAM_STATE), dimension(:), allocatable :: aaStreams
! local variables
integer :: iErr, i, iMySeed
integer :: iThread, iRest, NVariatesPThMin, NVariatesPTh, NVariatesPThAcc
! +----------------+
! | Initialization |
! +----------------+
! allocate memory
NThreads = 4
NVariates = 400000
allocate( aaStreams( NThreads ), stat=iErr )
if(iErr /= 0) then
write(*,*) "Streams allocation error!"
stop
end if
allocate( arVariates( NVariates ), stat=iErr )
if(iErr /= 0) then
write(*,*) "Variates allocation error!"
stop
end if
! initialize streams
do i=1,NThreads
! create seed (if desired)
iMySeed = 0
! create VSL stream
iErr = vslNewStream( aaStreams(i), VSL_BRNG_MT2203+i-1, iMySeed )
if(iErr /= 0) then
write(*,*) "MKL Error: ",iErr
stop
end if
end do
! initialize OpenMP
call omp_set_num_threads( NThreads )
write(*,*) " Number of OpenMP Threads: ",NThreads
! +-------------+
! | Computation |
! +-------------+
! set problem
rLeft = -2.5d0
rRight = 1.5d0
! set minimum number of variates and remainder
iRest = mod(NVariates, NThreads)
NVariatesPThMin = int(NVariates / NThreads)
write(*,*) " setup: ",iRest,NVariatesPThMin
! invoke MKL random number generator
!$OMP PARALLEL PRIVATE(iErr,iThread,NVariatesPth,NVariatesPThAcc)
iThread = omp_get_thread_num()
! determine number of variates to generate
if(iThread < iRest) then
NVariatesPth = NVariatesPthMin + 1
NVariatesPthAcc = iThread*NVariatesPthMin + iThread
else
NVariatesPth = NVariatesPthMin
NVariatesPthAcc = iThread*NVariatesPthMin + iRest
end if
!$OMP CRITICAL
write(*,*) " thread: ",iThread,NVariatesPTh,NVariatesPThAcc
!$OMP END CRITICAL
if( NVariatesPth > 0 ) then
! generate random samples
iErr = vdRngUniform( VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, &
& aaStreams(iThread+1), &
& NVariatesPTh, &
& arVariates(NVariatesPthAcc+1), &
& rLeft, rRight )
if(iErr /= 0) then
write(*,*) "MKL Error: ",iErr
stop
end if
end if
!$OMP END PARALLEL
! +---------+
! | Results |
! +---------+
rAvg = 0.0d0
do i=1,NVariates
rAvg = rAvg + arVariates(i)
end do
rAvg = rAvg / NVariates
write(*,'(a,f8.5,a,f8.5,a)') " avg: ",rAvg," (expected: ",0.5d0 * (rLeft + rRight),")"
rAvg = 0.5d0 * (rLeft + rRight)
rVar = 0.0d0
do i=1,NVariates
rVar = rVar + (arVariates(i) - rAvg)*(arVariates(i) - rAvg)
end do
rVar = rVar / NVariates
write(*,'(a,f8.5,a,f8.5,a)') " var: ",rVar," (expected: ",1.d0/12.d0 * (rRight - rLeft)**2,")"
! +--------------+
! | Finalization |
! +--------------+
! destroy streams
do i=1,NThreads
iErr = vslDeleteStream( aaStreams(i) )
if(iErr /= 0) then
write(*,*) "MKL Error: ",iErr
stop
end if
end do
call mkl_free_buffers()
! deallocate memory
deallocate( aaStreams, stat=iErr )
if(iErr /= 0) then
write(*,*) "Streams deallocation error!"
stop
end if
deallocate( arVariates, stat=iErr )
if(iErr /= 0) then
write(*,*) "Variates deallocation error!"
stop
end if
end program invoke_PRNG_MKL_uniform
The code is compiled as described above.
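The per-thread counts and offsets used in the parallel region can be checked in isolation, without MKL or OpenMP. The sketch below reproduces the same block partitioning with hypothetical variable names and a deliberately awkward problem size (400003, which is not a multiple of 4), then verifies that the chunks are contiguous and cover the whole array. For thread indices at or above the remainder, the offset has to include the remainder (iThread*NVariatesPThMin + iRest). With NVariates = 400000 and four threads the remainder is zero, so that term has no effect in the run shown in this thread.

```fortran
program partition_sketch
  implicit none
  integer :: n_variates, n_threads, n_min, n_rest
  integer :: t, n_local, offset, covered

  n_variates = 400003   ! deliberately not divisible by n_threads
  n_threads  = 4
  n_min  = n_variates / n_threads
  n_rest = mod(n_variates, n_threads)

  covered = 0
  do t = 0, n_threads - 1
    if (t < n_rest) then
      ! The first n_rest threads take one extra variate each.
      n_local = n_min + 1
      offset  = t * n_min + t
    else
      ! Later threads start after all the enlarged chunks.
      n_local = n_min
      offset  = t * n_min + n_rest
    end if

    ! Each chunk must start exactly where the previous one ended.
    if (offset /= covered) stop 1
    covered = covered + n_local

    write(*,'(a,i0,a,i0,a,i0)') " thread ", t, ": count ", n_local, &
                                ", offset ", offset
  end do

  ! All chunks together must cover the array exactly once.
  if (covered /= n_variates) stop 1
end program partition_sketch
```

If the remainder term is left out of the second branch, the contiguity check fails for the first thread at or above the remainder, which corresponds to two threads writing overlapping parts of the output array.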
Regarding the data race: I see no problem with MKL 2017 (latest) and Inspector 2017 (also latest) when running the latest code example you gave.
inspxe-cl -collect ti3 -result-dir ti3 ./a.out
Number of OpenMP Threads: 4
setup: 0 100000
thread: 0 100000 0
avg: -0.12553 (expected: -0.50000)
var: 0.52144 (expected: 1.33333)
0 new problem(s) found
However, we do see some memory issues in the same environment:
inspxe-cl -collect mi3 -result-dir mi3 ./a.out
Number of OpenMP Threads: 4
setup: 0 100000
thread: 0 100000 0
avg: -0.12553 (expected: -0.50000)
var: 0.52144 (expected: 1.33333)
6 new problem(s) found
5 Invalid memory access problem(s) detected
1 Memory not deallocated problem(s) detected
I forgot to add: the problem has been escalated, and we will keep you updated on its status.
Could it be that you forgot to add the -qopenmp flag? This could be the reason for the wrong results.
I compiled it with: ifort -g -O0 -mkl -qopenmp test_mkl.f90 -o test
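The numbers in the earlier output, which has only a single "thread: 0" line, fit this explanation. If only the first quarter of the array is filled and the remaining entries happen to be zero (an assumption, since they are never written), the mean is 0.25 * (-0.5) = -0.125, and the variance about the expected mean is 0.25 * 1.3333 + 0.75 * 0.25 = 0.5208, close to the reported -0.12553 and 0.52144. A small guard like the following sketch can make a missing OpenMP flag visible at run time. It relies on the standard OpenMP conditional-compilation sentinel "!$", and the program and variable names are hypothetical.

```fortran
program openmp_check_sketch
  !$ use omp_lib
  implicit none
  logical :: have_openmp
  integer :: n_threads

  have_openmp = .false.
  n_threads   = 1

  ! Lines starting with "!$ " are compiled only when OpenMP is enabled;
  ! otherwise they are ordinary comments and the defaults above remain.
  !$ have_openmp = .true.

  !$omp parallel
  !$omp single
  !$ n_threads = omp_get_num_threads()
  !$omp end single
  !$omp end parallel

  if (have_openmp) then
    write(*,'(a,i0)') " OpenMP enabled, threads in parallel region: ", n_threads
  else
    write(*,*) "OpenMP directives were ignored (compiled without OpenMP)"
  end if
end program openmp_check_sketch
```

Built with -qopenmp, this reports the actual team size. Built without it, the directives are treated as comments and the second message is printed.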
I tried the 2017 tools (Composer, MKL, Inspector) and you are right, the data races seem to be gone:
$ inspxe-cl -collect ti3 -result-dir new1 ./test
Number of OpenMP Threads: 4
setup: 0 100000
Warning: One or more threads in the application accessed the stack of another thread. This may indicate one or more bugs in your application. Setting the Inspector to detect data races on stack accesses and running another analysis may help you locate these and other bugs.
thread: 1 100000 100000
thread: 2 100000 200000
thread: 0 100000 0
thread: 3 100000 300000
avg: -0.50326 (expected: -0.50000)
var: 1.33160 (expected: 1.33333)
0 new problem(s) found
$ inspxe-cl -collect mi3 -result-dir new2 ./test
Number of OpenMP Threads: 4
setup: 0 100000
thread: 0 100000 0
thread: 1 100000 100000
thread: 3 100000 300000
thread: 2 100000 200000
avg: -0.50326 (expected: -0.50000)
var: 1.33160 (expected: 1.33333)
6 new problem(s) found
5 Invalid memory access problem(s) detected
1 Memory not deallocated problem(s) detected
Why is there still a warning?
Any news on this matter?
Yes, with the -qopenmp option we see the same warning. At first glance the code is correct, and the warning may be a false positive from Inspector.