Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Problems using MKL as PRNG in a MC code (memory leak and data races)

Tobias_D_
Beginner
1,159 Views

Hey there,

I recently ran into trouble using MKL as the pseudo-random number generator in my Monte Carlo particle solver. Background: I switched from the intrinsic Fortran RNG to the MKL RNG. My code uses both MPI and OpenMP, and I want to generate pseudo-random numbers on each MPI process independently.

When I implemented the routines, I mainly took the information from another discussion thread here in the forum (https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/283349). To generate the random numbers in a thread-safe fashion, I chose the Mersenne Twister MT2203 generator, which lets me run up to 6024 independent streams. That is more than sufficient, since each MPI process uses up to 28 threads (one per core) on our computing nodes.

I tested my implementation with Intel Inspector XE, and it detected a memory leak in the vslnewstream() function and data races in the vdRngUniform() function. Since the latter could be a consequence of the memory leak, I concentrated on the former, i.e. the memory leak. To exclude coding bugs and ease debugging, I took the vdrnguniform.f code from the MKL examples. Even in this example the memory leak was detected! Now I am wondering what the problem is and how to fix it.

My setup:
CPU:
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz

ifort --version
ifort (IFORT) 16.0.3 20160415
Copyright (C) 1985-2016 Intel Corporation.  All rights reserved.

inspector-cl --version
Intel(R) Inspector XE 2016 Update 3 (build 460803) Command Line tool
Copyright (C) 2009-2016 Intel Corporation. All rights reserved.

Code (from the MKL examples):

!===============================================================================
! Copyright 2003-2016 Intel Corporation All Rights Reserved.
!
! The source code,  information  and material  ("Material") contained  herein is
! owned by Intel Corporation or its  suppliers or licensors,  and  title to such
! Material remains with Intel  Corporation or its  suppliers or  licensors.  The
! Material  contains  proprietary  information  of  Intel or  its suppliers  and
! licensors.  The Material is protected by  worldwide copyright  laws and treaty
! provisions.  No part  of  the  Material   may  be  used,  copied,  reproduced,
! modified, published,  uploaded, posted, transmitted,  distributed or disclosed
! in any way without Intel's prior express written permission.  No license under
! any patent,  copyright or other  intellectual property rights  in the Material
! is granted to  or  conferred  upon  you,  either   expressly,  by implication,
! inducement,  estoppel  or  otherwise.  Any  license   under such  intellectual
! property rights must be express and approved by Intel in writing.
!
! Unless otherwise agreed by Intel in writing,  you may not remove or alter this
! notice or  any  other  notice   embedded  in  Materials  by  Intel  or Intel's
! suppliers or licensors in any way.
!===============================================================================

!  Content:
!    vdRngUniform  Example Program Text
!*******************************************************************************

      include 'mkl_vsl.f90'
      include "errcheck.inc"

      program MKL_VSL_TEST

      USE MKL_VSL_TYPE
      USE MKL_VSL

      integer(kind=4) i,nn
      integer n
      integer(kind=4) errcode

      real(kind=8) a,b
      real(kind=8) r(1000)
      integer brng,method,seed

      real(kind=8) tM,tD,tQ,tD2
      real(kind=8) sM,sD
      real(kind=8) sum, sum2
      real(kind=8) s
      real(kind=8) DeltaM,DeltaD

      TYPE (VSL_STREAM_STATE) :: stream

      n=1000
      nn=10

      brng=VSL_BRNG_MCG31
      method=VSL_RNG_METHOD_UNIFORM_STD
      seed=1

      a=0.0
      b=1.0

!     ***** Initialize *****
      errcode=vslnewstream( stream, brng,  seed )
      call CheckVslError(errcode)

!     ***** Call RNG *****
      errcode=vdrnguniform( method, stream, n, r, a, b )
      call CheckVslError(errcode)

!     ***** Theoretical moments *****
      tM=(b+a)/2.0
      tD=((b-a)*(b-a))/12.0
      tQ=((b-a)*(b-a)*(b-a)*(b-a))/80.0

!     ***** Sample moments *****
      sum=0.0
      sum2=0.0
      do i=1,n
        sum=sum+r(i)
        sum2=sum2+r(i)*r(i)
      end do
      sM=sum/n
      sD=sum2/n-sM*sM

!     ***** Comparison of theoretical and sample moments *****
      tD2=tD*tD
      s=((tQ-tD2)/n)-(2*(tQ-2*tD2)/(n*n))+((tQ-3*tD2)/(n*n*n))
      DeltaM=(tM-sM)/sqrt(tD/n)
      DeltaD=(tD-sD)/sqrt(s)

!     ***** Printing results *****
      print *,"Sample of vdRngUniform."
      print *,"-----------------------"
      print *,""
      print *,"Parameters:"
      print 11,"    a=",a
      print 11,"    b=",b

      print *,""
      print *,"Results (first 10 of 1000):"
      print *,"---------------------------"
      do i=1,nn
        print 10,r(i)
      end do

      print *,""
      if (abs(DeltaM)>3.0 .OR. abs(DeltaD)>3.0) then
        print 12,"Error: sample moments (mean=",                        &
     &    sM,", variance=",sD,                                          &
     &    ") disagree with theory (mean=",                              &
     &    tM,", variance=",tD,")."
        stop 1
      else
        print 12,"Sample moments (mean=",sM,                            &
     &    ", variance=",sD,") agree with theory (mean=",                &
     &    tM,", variance=",tD,")."
      end if

!     ***** Deinitialize *****
      errcode=vsldeletestream( stream )
      call CheckVslError(errcode)

10    format(F7.3)
11    format(A,F5.3)
12    format(A,F5.2,A,F5.2,A,F5.2,A,F5.2,A)

      end

I compiled the code with:
ifort -g -O0 -mkl vdrnguniform.f -o test

Intel Inspector XE reports the following problems:
ID  Type Sources Modules Object Size State
P1  Memory leak vsrnguniform.f test 70816 New
      Memory leak vsrnguniform.f:61 test 70816 New

P2  Invalid memory access [Unknown] libc.so.6  New
      Invalid memory access libc.so.6:0x11a090 libc.so.6  New
P3  Invalid memory access [Unknown] libc.so.6  New
      Invalid memory access libc.so.6:0x11a127 libc.so.6  New
P4  Invalid memory access vsrnguniform.f test  New
      Invalid memory access vsrnguniform.f:61 test  New

P5  Uninitialized memory access kmp_itt.inl libiomp5.so  New
      Uninitialized memory access kmp_itt.inl:1078 libiomp5.so  New
P6  Uninitialized memory access start.S test  New
      Uninitialized memory access start.S:122 test  New

P1 points to the following allocation site:
Description Source Function Module Object Size Offset
Allocation site vsrnguniform.f:61 mkl_vsl_test test 184
     59
     60   !     ***** Initialize *****
    >61         errcode=vslnewstream( stream, brng,  seed )
     62         call CheckVslError(errcode)
     63

The statistics in my own implementation and in the MKL example are correct, but I cannot live with memory leaks and data races ...

 

Gennady_F_Intel
Moderator

Tobias, could you call the mkl_free_buffers() function at the end of the program and check whether the memory leaks still exist?
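As an illustration of where that call would go, here is a minimal sketch, assuming a stripped-down variant of the example above (the program name and the simplified error handling are hypothetical, not part of the MKL example). The stream is deleted first; mkl_free_buffers() is then called as the last MKL-related statement so that the library releases its internal buffers before the program exits.

```fortran
include 'mkl_vsl.f90'

program free_buffers_sketch
  use mkl_vsl_type
  use mkl_vsl
  implicit none

  type(VSL_STREAM_STATE) :: stream
  real(kind=8)           :: r(1000)
  integer                :: errcode

  ! create the stream and draw some uniform variates on [0,1)
  errcode = vslnewstream( stream, VSL_BRNG_MCG31, 1 )
  if (errcode /= 0) stop 1
  errcode = vdrnguniform( VSL_RNG_METHOD_UNIFORM_STD, stream, 1000, r, 0.0d0, 1.0d0 )
  if (errcode /= 0) stop 1

  ! release the stream itself
  errcode = vsldeletestream( stream )
  if (errcode /= 0) stop 1

  ! ask MKL to release its internal memory buffers before exit
  call mkl_free_buffers()

end program free_buffers_sketch
```

This should build the same way as the example, e.g. with ifort -g -O0 -mkl.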

Tobias_D_
Beginner

Hi, some invalid memory accesses and the memory leak in the vslnewstream() function have disappeared:

ID  Type Sources Modules Object Size State
P1  Invalid memory access [Unknown] libc.so.6  New
      Invalid memory access libc.so.6:0x11a090 libc.so.6  New
P2  Invalid memory access [Unknown] libc.so.6  New
      Invalid memory access libc.so.6:0x11a127 libc.so.6  New
P3  Invalid memory access vdrnguniform.f test  New
      Invalid memory access vdrnguniform.f:61 test  New

There is still the invalid memory access problem in the vslnewstream function (P3):

Description Source Function Module Object Size Offset
Read vdrnguniform.f:61 mkl_vsl_test test  
     59
     60   !     ***** Initialize *****
    >61         errcode=vslnewstream( stream, brng,  seed )
     62         call CheckVslError(errcode)
     63

Here are the collector messages:

Analysis started...
Result file: $USER/Misc/Test/insp_xe/mkl_rng/new/new.inspxe
Analysis started for $USER/Misc/Test/insp_xe/mkl_rng/test (pid = 5469)
Loaded module: $USER/Misc/Test/insp_xe/mkl_rng/test, address range [0x400000-0x70b397]
Loaded module: /lib64/ld-linux-x86-64.so.2, address range [0x7f6c515ae000-0x7f6c517d0147], minimal analysis
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_lp64.so, address range [0x7f6c3c0b4000-0x7f6c3cbc3dcf]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_thread.so, address range [0x7f6c3a6ec000-0x7f6c3c0176af]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_core.so, address range [0x7f6c38cc1000-0x7f6c3a6d10c7]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libiomp5.so, address range [0x7f6c38924000-0x7f6c38c6735f], minimal analysis
Loaded module: /lib64/libm.so.6, address range [0x7f6c38597000-0x7f6c38897157]
Loaded module: /lib64/libpthread.so.0, address range [0x7f6c38340000-0x7f6c3855c48f]
Loaded module: /lib64/libc.so.6, address range [0x7f6c37f88000-0x7f6c3832fa1f]
Loaded module: /lib64/libgcc_s.so.1, address range [0x7f6c37d47000-0x7f6c37f5d42f]
Loaded module: /lib64/libdl.so.2, address range [0x7f6c37b29000-0x7f6c37d2c10f], minimal analysis
Loaded module: $USER/Misc/Tools/local/tools/inspector_xe_2016.1.3.460803/lib64/runtime/libittnotify.so, address range [0x7f6c34f63000-0x7f6c3516df7f], minimal analysis
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_avx2.so, address range [0x7f6c31b68000-0x7f6c33ddc69f]
Loaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_vml_avx2.so, address range [0x7f6c30d26000-0x7f6c319bf107]
Process $USER/Misc/Test/insp_xe/mkl_rng/test exited with code 0. Leak analysis starting. Please wait...
Unloaded module: $USER/Misc/Test/insp_xe/mkl_rng/test
Unloaded module: /lib64/ld-linux-x86-64.so.2
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_lp64.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_thread.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_core.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libiomp5.so
Unloaded module: /lib64/libm.so.6
Unloaded module: /lib64/libpthread.so.0
Unloaded module: /lib64/libc.so.6
Unloaded module: /lib64/libgcc_s.so.1
Unloaded module: /lib64/libdl.so.2
Unloaded module: $USER/Misc/Tools/local/tools/inspector_xe_2016.1.3.460803/lib64/runtime/libittnotify.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_avx2.so
Unloaded module: /media/LDS/module/software/compiler/intel/16.0.3/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_vml_avx2.so
Completed analysis for $USER/Misc/Test/insp_xe/mkl_rng/test
Application exit code: 0
Result file: $USER/Misc/Test/insp_xe/mkl_rng/new/new.inspxe
Analysis completed
 
3 new problem(s) found
    3 Invalid memory access problem(s) detected

Tobias_D_
Beginner

FYI: This is my own implementation, for which Intel Inspector XE reports several problems:

$ inspxe-cl -collect mi3 -result-dir new1 ./test
4 new problem(s) found
    3 Invalid memory access problem(s) detected
    1 Memory not deallocated problem(s) detected

$ inspxe-cl -collect ti3 -result-dir new2 ./test
1 new problem(s) found
    1 Data race problem(s) detected

This is the code:
 

include "mkl_vsl.f90"

program invoke_PRNG_MKL_uniform
 
  use omp_lib
  use mkl_vsl
  use mkl_vsl_type
 
  implicit none
 
  ! transfer variables
  integer                ::  NVariates, NThreads
  real(kind=8)           ::  rLeft, rRight
  real(kind=8)           ::  rAvg, rVar
  real(kind=8), dimension(:), allocatable            ::  arVariates
  type(VSL_STREAM_STATE), dimension(:), allocatable  ::  aaStreams
 
  ! local variables
  integer       ::  iErr, i, iMySeed
  integer       ::  iThread, iRest, NVariatesPThMin, NVariatesPTh, NVariatesPThAcc
 
 
  ! +----------------+
  ! | Initialization |
  ! +----------------+
 
  ! allocate memory
  NThreads  = 4
  NVariates = 400000
  allocate( aaStreams( NThreads ), stat=iErr )
  if(iErr /= 0) then
    write(*,*) "Streams allocation error!"
    stop
  end if
  allocate( arVariates( NVariates ), stat=iErr )
  if(iErr /= 0) then
    write(*,*) "Variates allocation error!"
    stop
  end if
 
  ! initialize streams
  do i=1,NThreads
   
    ! create seed (if desired)
      iMySeed = 0
   
    ! create VSL stream
    iErr = vslNewStream( aaStreams(i), VSL_BRNG_MT2203+i-1, iMySeed )
    if(iErr /= 0) then
      write(*,*) "MKL Error: ",iErr
      stop
    end if
   
  end do
 
  ! initialize OpenMP
  call omp_set_num_threads( NThreads )
  write(*,*) " Number of OpenMP Threads: ",NThreads
 
 
  ! +-------------+
  ! | Computation |
  ! +-------------+
 
  ! set problem
  rLeft  = -2.5d0 
  rRight =  1.5d0 

  ! set minimum number of variates and remainder
  iRest           = mod(NVariates,  NThreads)
  NVariatesPThMin = int(NVariates / NThreads)
  write(*,*) " setup: ",iRest,NVariatesPThMin
 
  ! invoke MKL random number generator
  !$OMP PARALLEL PRIVATE(iErr,iThread,NVariatesPth,NVariatesPThAcc)
   
    iThread = omp_get_thread_num()
   
    ! determine number of variates to generate
    if(iThread < iRest) then
      NVariatesPth    = NVariatesPthMin + 1
      NVariatesPthAcc = iThread*NVariatesPthMin + iThread
    else
      NVariatesPth    = NVariatesPthMin
      NVariatesPthAcc = iThread*NVariatesPthMin + iRest
    end if
    !$OMP CRITICAL
    write(*,*) " thread: ",iThread,NVariatesPTh,NVariatesPThAcc
    !$OMP END CRITICAL
   
    if( NVariatesPth > 0 ) then
   
      ! generate random samples
      iErr = vdRngUniform( VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, &
                         & aaStreams(iThread+1),                &
                         & NVariatesPTh,                        &
                         & arVariates(NVariatesPthAcc+1),       &
                         & rLeft, rRight                        );
      if(iErr /= 0) then
        write(*,*) "MKL Error: ",iErr
        stop
      end if
   
    end if
  !$OMP END PARALLEL
 
 
  ! +---------+
  ! | Results |
  ! +---------+
 
  rAvg = 0.0d0
  do i=1,NVariates
    rAvg = rAvg + arVariates(i)
  end do
  rAvg = rAvg / NVariates
  write(*,'(a,f8.5,a,f8.5,a)') " avg: ",rAvg," (expected: ",0.5d0 * (rLeft + rRight),")"
  rAvg = 0.5d0 * (rLeft + rRight)
  rVar = 0.0d0
  do i=1,NVariates
    rVar = rVar + (arVariates(i) - rAvg)*(arVariates(i) - rAvg)
  end do
  rVar = rVar / NVariates
  write(*,'(a,f8.5,a,f8.5,a)') " var: ",rVar," (expected: ",1.d0/12.d0 * (rRight - rLeft)**2,")"
 
 
  ! +--------------+
  ! | Finalization |
  ! +--------------+
 
  ! destroy streams
  do i=1,NThreads
   
    iErr = vslDeleteStream( aaStreams(i) )
    if(iErr /= 0) then
      write(*,*) "MKL Error: ",iErr
      stop
    end if
   
  end do
  call mkl_free_buffers()
 
  ! deallocate memory
  deallocate( aaStreams, stat=iErr )
  if(iErr /= 0) then
    write(*,*) "Streams deallocation error!"
    stop
  end if
  deallocate( arVariates, stat=iErr )
  if(iErr /= 0) then
    write(*,*) "Variates deallocation error!"
    stop
  end if
 

end program invoke_PRNG_MKL_uniform

The code is compiled as described above.
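The block partition used in the parallel region can also be checked in isolation, without MKL or OpenMP. The following is a minimal sketch, assuming a hypothetical standalone program (check_partition and its variable names are illustrative only). It uses a total that is deliberately not a multiple of the thread count and adds the remainder to the offset of the threads that receive the smaller chunk, which is what the ranges need in order to tile the array without gaps or overlaps.

```fortran
program check_partition
  implicit none
  integer :: nTotal, nThr, iRest, nMin
  integer :: iThread, nCount, nOffset, nNext

  nTotal = 10        ! deliberately not a multiple of nThr
  nThr   = 4
  iRest  = mod(nTotal, nThr)
  nMin   = nTotal / nThr

  nNext = 0          ! offset at which the next chunk is expected to start
  do iThread = 0, nThr - 1
    if (iThread < iRest) then
      ! the first iRest threads take one extra variate each
      nCount  = nMin + 1
      nOffset = iThread*nMin + iThread
    else
      ! the remaining threads start after all the enlarged chunks
      nCount  = nMin
      nOffset = iThread*nMin + iRest
    end if
    if (nOffset /= nNext) then
      write(*,*) "gap or overlap at thread ", iThread
      stop 1
    end if
    write(*,*) " thread: ", iThread, nCount, nOffset
    nNext = nOffset + nCount
  end do

  ! the last chunk must end exactly at the end of the array
  if (nNext /= nTotal) stop 1
  write(*,*) "chunks tile the array exactly"
end program check_partition
```

With 400000 variates and 4 threads the remainder is zero, so the offsets are simply multiples of 100000.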

Gennady_F_Intel
Moderator

Regarding the data race: I see no problem with MKL 2017 (latest) and Inspector 2017 (latest as well) with the latest code example you gave.

inspxe-cl -collect ti3 -result-dir ti3 ./a.out
  Number of OpenMP Threads:            4
  setup:            0      100000
  thread:            0      100000           0
 avg: -0.12553 (expected: -0.50000)
 var:  0.52144 (expected:  1.33333)
0 new problem(s) found

But we do see some memory issues in the same environment:

inspxe-cl -collect mi3 -result-dir mi3 ./a.out
  Number of OpenMP Threads:            4
  setup:            0      100000
  thread:            0      100000           0
 avg: -0.12553 (expected: -0.50000)
 var:  0.52144 (expected:  1.33333)

6 new problem(s) found
    5 Invalid memory access problem(s) detected
    1 Memory not deallocated problem(s) detected

Gennady_F_Intel
Moderator

I forgot to add: the problem has been escalated, and we will keep you updated on the status.

Tobias_D_
Beginner

Could it be that you forgot to add the -qopenmp flag? Your output shows only thread 0, so three quarters of the array were never filled, which would explain the wrong results.

I compiled it with: ifort -g -O0 -mkl -qopenmp test_mkl.f90 -o test
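A quick way to see whether a build actually has OpenMP enabled is the OpenMP conditional compilation sentinel `!$`: lines starting with it are compiled only when OpenMP is switched on and are plain comments otherwise. A minimal sketch, assuming a hypothetical standalone test program (check_openmp and lOpenMP are illustrative names):

```fortran
program check_openmp
  !$ use omp_lib
  implicit none
  logical :: lOpenMP

  ! stays .false. unless the compiler processes OpenMP sentinels
  lOpenMP = .false.
  !$ lOpenMP = .true.

  if (lOpenMP) then
    !$ write(*,*) "OpenMP enabled, max threads: ", omp_get_max_threads()
  else
    write(*,*) "OpenMP directives are ignored (serial build)"
  end if
end program check_openmp
```

Built with ifort -qopenmp this should take the first branch; built without the flag it should take the second.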

I tried the 2017 tools (Composer, MKL, Inspector) and you are right, the data races seem to be gone:

$ inspxe-cl -collect ti3 -result-dir new1 ./test
  Number of OpenMP Threads:            4
  setup:            0      100000
Warning: One or more threads in the application accessed the stack of another thread. This may indicate one or more bugs in your application. Setting the Inspector to detect data races on stack accesses and running another analysis may help you locate these and other bugs.
  thread:            1      100000      100000
  thread:            2      100000      200000
  thread:            0      100000           0
  thread:            3      100000      300000
 avg: -0.50326 (expected: -0.50000)
 var:  1.33160 (expected:  1.33333)
 
0 new problem(s) found

$ inspxe-cl -collect mi3 -result-dir new2 ./test
  Number of OpenMP Threads:            4
  setup:            0      100000
  thread:            0      100000           0
  thread:            1      100000      100000
  thread:            3      100000      300000
  thread:            2      100000      200000
 avg: -0.50326 (expected: -0.50000)
 var:  1.33160 (expected:  1.33333)
 
6 new problem(s) found
    5 Invalid memory access problem(s) detected
    1 Memory not deallocated problem(s) detected

Why is there still a warning?

Tobias_D_
Beginner

Any news on this matter?

Gennady_F_Intel
Moderator

Yes, with the -qopenmp option we see the same warning. At first glance the code is correct, and the warning may be a false positive from Inspector.
