Software Archive
Read-only legacy content

Help determining the cause of a memory leak

conor_p_
Hi everyone, I am analyzing my code with valgrind, and I seem to be incurring memory leaks whenever I offload to the coprocessor. Specifically, valgrind flags the first offload_transfer line below as a memory leak. The snippet below is where I allocate my arrays on the coprocessor. All of these arrays are declared as globals and are visible everywhere. The integers nlistsize, ourmic, cache_size and np are also global and have been initialized beforehand. Can anyone help me make sense of these valgrind errors? My code is an embarrassingly parallel MD code. The MPI ranks run on different sockets, and each rank interacts only with its own MIC, identified by the variable ourmic. My initial guess is that either the variables being global or something related to MPI causes this error. Any help is appreciated.
      ! Allocate persistent copies on MIC ourmic; free_if(.false.) keeps them alive for later offloads
      !dir$ offload_transfer target(mic:ourmic) in(position: alloc_if(.true.) free_if(.false.)), &
      !dir$ in(nlist: alloc_if(.true.) free_if(.false.)),                                   &
      !dir$ in(numneigh: alloc_if(.true.) free_if(.false.)),                                &
      !dir$ in(dr2array: alloc_if(.true.) free_if(.false.)),                                &
      !dir$ in(specbond: alloc_if(.true.) free_if(.false.)),                                &
      !dir$ in(num3bond: alloc_if(.true.) free_if(.false.)),                                &
      !dir$ in(atombin: alloc_if(.true.) free_if(.false.)),                                 &
      !dir$ in(start: alloc_if(.true.) free_if(.false.)),                                   &
      !dir$ in(endposit: alloc_if(.true.) free_if(.false.)),                                &
      !dir$ in(cnum: alloc_if(.true.) free_if(.false.)),                                    &
      !dir$ in(ff: alloc_if(.true.) free_if(.false.)),                                      &
      !dir$ in(q: alloc_if(.true.) free_if(.false.)),                                       &
      !dir$ in(lj1: alloc_if(.true.) free_if(.false.)),                                     &
      !dir$ in(lj2: alloc_if(.true.) free_if(.false.)),                                     &
      !dir$ in(lj3: alloc_if(.true.) free_if(.false.)),                                     &
      !dir$ in(lj4: alloc_if(.true.) free_if(.false.)),                                     &
      !dir$ in(nlistsize,cache_size,np,ncellT)
      print*,'we managed to allocate that'
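For comparison, a minimal sketch of the usual pairing for this allocation pattern: every array sent with alloc_if(.true.) free_if(.false.) is eventually released with a nocopy transfer using alloc_if(.false.) free_if(.true.). This is a hypothetical toy program, not the real mod_memory.f90: the module name offload_globals, the program name release_sketch, the array sizes and the choice of only two arrays (position and nlist) are placeholders, and the remaining arrays would follow the same pattern. It needs an offload-enabled Intel compiler and a coprocessor to actually exercise the transfers. Whether releasing the buffers changes what valgrind reports is untested, since the "definitely lost" block is allocated inside liboffload itself.

```fortran
! Hypothetical toy module standing in for the globals in mod_memory.f90.
module offload_globals
  implicit none
  double precision, allocatable :: position(:,:)
  integer, allocatable :: nlist(:)
  integer :: ourmic = 0        ! placeholder coprocessor index
  ! Make the global arrays known to the coprocessor side of the program.
  !dir$ attributes offload:mic :: position, nlist
end module offload_globals

program release_sketch
  use offload_globals
  implicit none

  allocate(position(3, 1000), nlist(1000))   ! placeholder sizes
  position = 0.0d0
  nlist = 0

  ! Create persistent device copies (same pattern as the snippet above).
  !dir$ offload_transfer target(mic:ourmic) in(position: alloc_if(.true.) free_if(.false.)), &
  !dir$ in(nlist: alloc_if(.true.) free_if(.false.))

  ! ... offload regions here would reuse the device copies with
  ! ... alloc_if(.false.) free_if(.false.) ...

  ! Release the device copies: nocopy moves no data, alloc_if(.false.) reuses
  ! the existing device buffers, and free_if(.true.) frees them afterwards.
  !dir$ offload_transfer target(mic:ourmic) nocopy(position: alloc_if(.false.) free_if(.true.)), &
  !dir$ nocopy(nlist: alloc_if(.false.) free_if(.true.))

  deallocate(position, nlist)
  print *, 'device and host copies released'
end program release_sketch
```

Using nocopy for the release step avoids a pointless host-to-device copy when the only goal is to free the coprocessor buffers.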

Valgrind is giving me two kinds of errors that I don't understand. The first kind involves uninitialised values, shown below. mod_memory.f90:273 refers specifically to the first offload line, which seems to be the troublesome one:

!dir$ offload_transfer target(mic:ourmic) in(position: alloc_if(.true.) free_if(.false.)), 

==39354== Warning: set address range perms: large range [0x12c7b040, 0x2b3972f0) (undefined)
==39354== Warning: set address range perms: large range [0x51291040, 0x7fb3b040) (undefined)
==39354== Syscall param ioctl(generic) points to uninitialised byte(s)
==39354==    at 0x31F80E08C7: ioctl (in /lib64/libc-2.12.so)
==39354==    by 0x31F9000E5B: scif_bind (in /usr/lib64/libscif.so.0.0.1)
==39354==    by 0x31F984CBB7: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F9830EB8: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F9831564: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F9831655: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F981183F: COIEngineGetHandle (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x81B6FF2: Engine::init_process() (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x81B6EED: Engine::init() (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x81C4F8E: __offload_target_acquire (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x410FF8: mod_memory_mp_xeon_phi_memory_neighbor_alloc_full_ (mod_memory.f90:273)
==39354==    by 0x40E053: mod_memory_mp_memory_ (mod_memory.f90:154)
==39354==  Address 0x7feffbb30 is on thread 1's stack
==39354== 
==39354== Syscall param ioctl(generic) points to uninitialised byte(s)
==39354==    at 0x31F80E08C7: ioctl (in /lib64/libc-2.12.so)
==39354==    by 0x31F9001128: scif_connect (in /usr/lib64/libscif.so.0.0.1)
==39354==    by 0x31F984CABA: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F984CB46: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F984CBC7: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F9830EB8: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F9831564: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F9831655: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F981183F: COIEngineGetHandle (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x81B6FF2: Engine::init_process() (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x81B6EED: Engine::init() (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x81C4F8E: __offload_target_acquire (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==  Address 0x7feffba60 is on thread 1's stack
==39354== Conditional jump or move depends on uninitialised value(s)
==39354==    at 0x31F984D072: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F984D1CB: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F983115E: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F9831564: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F9831655: ??? (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x31F981183F: COIEngineGetHandle (in /usr/lib64/libcoi_host.so.0)
==39354==    by 0x81B6FF2: Engine::init_process() (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x81B6EED: Engine::init() (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x81C4F8E: __offload_target_acquire (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x410FF8: mod_memory_mp_xeon_phi_memory_neighbor_alloc_full_ (mod_memory.f90:273)
==39354==    by 0x40E053: mod_memory_mp_memory_ (mod_memory.f90:154)
==39354==    by 0x4D4D43: mod_restart_datadump_mp_restart_datadump_ (mod_restart_datadump.f90:100)

The second kind is a memory leak report:

==39354== 672 (360 direct, 312 indirect) bytes in 1 blocks are definitely lost in loss record 369 of 612
==39354==    at 0x4A074CC: operator new(unsigned long) (vg_replace_malloc.c:298)
==39354==    by 0x31F981A545: ???
==39354==    by 0x31F9810196: ???
==39354==    by 0x81B892F: Engine::create_buffer_from_memory(unsigned long, unsigned int, void*, bool) (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x81BAB66: OffloadDescriptor::init_static_ptr_data(PtrData*) (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x81BA9D1: OffloadDescriptor::find_ptr_data(void*, long, long, bool) (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x81BB852: OffloadDescriptor::offload(char const*, bool, VarDesc*, VarDesc2*, int, void**, int, void*) (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x81C551F: __offload_offload (in /apps/rhel6/intel/composer_xe_2013.3.163/compiler/lib/intel64/liboffload.so.5)
==39354==    by 0x41175B: mod_memory_mp_xeon_phi_memory_neighbor_alloc_full_ (mod_memory.f90:273)
==39354==    by 0x40E053: mod_memory_mp_memory_ (mod_memory.f90:154)
==39354==    by 0x4D4D43: mod_restart_datadump_mp_restart_datadump_ (mod_restart_datadump.f90:100)
==39354==    by 0x4E0F4E: MAIN__ (MD.f90:60)
