Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Memory leak with MKL

O_Rourke__Conn

Hi there,

I have a problem with a code I'm using (the VASP electronic structure code, v5.4.1), which I have modified to run multiple times within a loop. I noticed I was getting memory leaks, so I figured some allocatable was not being cleaned up somewhere.

I ran it through Valgrind and got the output at the bottom of this message. It looks like the memory leaks are coming from MKL, not from the main code itself.

I was wondering if anyone has seen this type of thing before, and if so knows how to solve it.

I'm using Parallel Studio XE Cluster: intel_2020/compilers_and_libraries_2020.0.166

Thanks,

Conn

==19089== 264 bytes in 1 blocks are possibly lost in loss record 18 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB92709: mkl_serv_malloc (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xBB91A90: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA170576: MKL_BLACS_ALLOCATE (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0xA153629: blacs_gridmap_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0xA1531FD: blacs_gridinit_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0x4369C1: scala_mp_pdssyex_zheevx_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0xCC665B: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0xCE4762: elmin_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x1234378: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 10,464 bytes in 1 blocks are possibly lost in loss record 21 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB91916: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA170576: MKL_BLACS_ALLOCATE (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0xA16AC53: BI_GetBuff (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0xA14C987: zgsum2d_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0x9B34EFD: pzlarfb_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089==    by 0x9B6CAD0: pzunmql_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089==    by 0x9B70323: pzunmtr_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089==    by 0x9AE2099: mkl_pzheevx0_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089==    by 0x9ADE9C6: pzheevx_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089==    by 0x4374A8: scala_mp_pdssyex_zheevx_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0xCC665B: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 69,664 bytes in 1 blocks are possibly lost in loss record 22 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB933B7: mm_account_ptr_by_tid..0 (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xBB914C4: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xC580A66: mkl_lapack_zhseqr (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xC508660: mkl_lapack_zgeev (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0x8E935D7: ZGEEV (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089==    by 0x56BA9F: spinsym_mp_set_spinrot_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x58016F: ibzkpt_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x5336AD: mkpoints_mp_rd_kpoints_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x53180E: mkpoints_mp_setup_kpoints_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x1216A13: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 4,309,536 bytes in 1 blocks are possibly lost in loss record 23 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089==    by 0x8C21576: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089==    by 0x60698D: orth1_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x6090BF: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 4,364,832 bytes in 1 blocks are possibly lost in loss record 24 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089==    by 0x8C2149E: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089==    by 0x60698D: orth1_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x6090BF: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 4,647,456 bytes in 1 blocks are possibly lost in loss record 25 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154CFD48: mkl_blas_avx2_xztrmm_right_upper_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154CE0D6: mkl_blas_avx2_xztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0xBBCE07F: mkl_blas_xztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA4FB178: mkl_blas_ztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089==    by 0x8C25205: ZTRMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089==    by 0x6048CA: linbas_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x60400E: dfast_mp_lincom_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x609ED8: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 6,076,960 bytes in 1 blocks are possibly lost in loss record 26 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB91916: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089==    by 0x8C2149E: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089==    by 0xCC49FE: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0xCE4762: elmin_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x1234378: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== LEAK SUMMARY:
==19089==    definitely lost: 111 bytes in 8 blocks
==19089==    indirectly lost: 0 bytes in 0 blocks
==19089==      possibly lost: 19,479,432 bytes in 8 blocks
==19089==    still reachable: 5,129 bytes in 15 blocks
==19089==         suppressed: 0 bytes in 0 blocks
==19089== Reachable blocks (those to which a pointer was found) are not shown.
==19089== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==19089== 
==19089== For counts of detected and suppressed errors, rerun with: -v
==19089== Use --track-origins=yes to see where uninitialised values come from
==19089== ERROR SUMMARY: 201 errors from 41 contexts (suppressed: 0 from 0)
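
A rough way to separate a genuine per-iteration leak from MKL's one-off internal buffers: run Valgrind once with a single loop iteration and once with several, then compare the "possibly lost" bytes per calling routine. Cached buffers should stay roughly flat, whereas a real leak should grow with the number of iterations. Below is a minimal Python sketch (the function name `lost_bytes_by_caller` is made up for illustration) that attributes each loss record to the first stack frame outside the libmkl_* libraries. It assumes frames are printed as `(in <object>)`, as in the output above, i.e. a binary built without debug info.

```python
import re
from collections import defaultdict

# A bare "==pid==" line separates loss records.
SEPARATOR_RE = re.compile(r"==\d+==\s*$")
# Loss-record header, e.g.
# "==19089== 264 bytes in 1 blocks are possibly lost in loss record 18 of 26"
RECORD_RE = re.compile(r"==\d+==\s+([\d,]+) bytes in [\d,]+ blocks are (\w+) lost")
# Stack frame located in a shared object, e.g.
# "==19089==    by 0x60698D: orth1_ (in /path/to/vasp_std)"
FRAME_RE = re.compile(r"==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (\S+) \(in ([^)]+)\)")


def lost_bytes_by_caller(log_text, kind="possibly"):
    """Sum lost bytes of one kind per first stack frame outside libmkl_* objects."""
    totals = defaultdict(int)
    pending = None  # size of the current record until its caller is found
    for line in log_text.splitlines():
        if SEPARATOR_RE.match(line):
            pending = None  # end of the current loss record
            continue
        record = RECORD_RE.search(line)
        if record:
            size = int(record.group(1).replace(",", ""))
            pending = size if record.group(2) == kind else None
            continue
        frame = FRAME_RE.search(line)
        if frame and pending is not None:
            library = frame.group(2).rsplit("/", 1)[-1]
            if not library.startswith("libmkl"):
                totals[frame.group(1)] += pending
                pending = None  # attribute each record only once
    return dict(totals)
```

For example, save each run with `valgrind --leak-check=full --log-file=<file>` and compare `lost_bytes_by_caller(open(<file>).read())` between the two logs. On the log above it would, for instance, attribute records 23 and 24 to `orth1_` and record 26 to `david_mp_eddav_`.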

Kirill_V_Intel
Employee

Hi Conn,

I believe you should call mkl_free_buffers() after the last call to MKL, see https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/support-functions/memory-management/mkl-free-buffers.html

MKL (oneMKL) can create and use internal buffers, which in some cases are crucial for getting optimal performance. While the existence of these buffers is transparent to users, this API (mkl_free_buffers) allows the user to explicitly tell MKL to release them.
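
A minimal sketch of releasing the buffers after every iteration, written in Python and assuming the MKL work happens in that same Python process (for example through NumPy/SciPy linked against MKL). It also assumes libmkl_rt.so exports `MKL_Free_Buffers`, the C symbol behind `mkl_free_buffers`. In this thread the MKL calls are made inside the modified VASP binary, so the release would have to happen there; the sketch only shows the pattern. The context manager makes sure the buffers are released even if an iteration raises.

```python
import ctypes
from contextlib import contextmanager


def load_mkl_free_buffers(soname="libmkl_rt.so"):
    """Return MKL's buffer-release function via ctypes, or None if unavailable.

    Assumption: the library exports MKL_Free_Buffers, the C symbol that
    mkl_service.h maps the name mkl_free_buffers onto.
    """
    try:
        mkl = ctypes.CDLL(soname)
    except OSError:
        return None  # library not found on this system
    free_buffers = getattr(mkl, "MKL_Free_Buffers", None)
    if free_buffers is None:
        return None  # symbol not exported under this name
    free_buffers.argtypes = []
    free_buffers.restype = None
    return free_buffers


@contextmanager
def release_mkl_buffers(free_buffers):
    """Wrap one loop iteration and always release MKL's internal buffers afterwards."""
    try:
        yield
    finally:
        free_buffers()  # runs even if the iteration raised
```

Inside the loop this would read `with release_mkl_buffers(free): run_one_calculation(cfg)`, where `free = load_mkl_free_buffers()` and `run_one_calculation` is a hypothetical stand-in for one iteration.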

Let us know if this solves the problem.

Best,
Kirill

O_Rourke__Conn

Hi Kirill,

I already call mkl_free_buffers() at the end of each loop iteration (I have modified the code to loop and to communicate with a Python parent process over MPI, rather than being spawned multiple times).

Calling mkl_free_buffers() has no effect on the memory issue, and the code eventually falls over anyway.

Do you have any other suggestions?

Thanks,

Conn

Kirill_V_Intel
Employee

Hi Conn,

[Just to make it clear, I am not directly working on the MKL service layer but on other MKL components.]

OK, one more suggestion: call mkl_mem_stat right before and after mkl_free_buffers at the end of the loop and tell us what you see (in terms of the number of allocated buffers and allocated bytes).

One more thing to check is what each of the MPI processes does. Could it be that not all of them call mkl_free_buffers? Could BLACS init/gridinit be called repeatedly? I would also try (if possible) running on a single MPI process and checking whether the issue is still present.
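
The before/after measurement could be scripted along these lines. `mem_stat` and `free_buffers` are assumed callables wrapping `mkl_mem_stat` (which returns the number of bytes allocated by MKL's memory manager) and `mkl_free_buffers`, and `step` is a hypothetical stand-in for one loop iteration; all function names here are illustrative. If the post-free byte count keeps rising, something accumulates across iterations. Running the same check on every MPI rank and gathering the results (e.g. with mpi4py's `comm.gather`) would show whether only some ranks are affected.

```python
def sample_iterations(step, n, mem_stat, free_buffers):
    """Run n iterations, recording MKL-allocated bytes right before and after each free.

    step(i)        : hypothetical stand-in for one loop iteration
    mem_stat()     : assumed wrapper around mkl_mem_stat, returning allocated bytes
    free_buffers() : assumed wrapper around mkl_free_buffers
    """
    samples = []
    for i in range(n):
        step(i)
        before = mem_stat()
        free_buffers()
        after = mem_stat()
        samples.append((before, after))
    return samples


def baseline_grows(samples):
    """True if the post-free byte count at the end exceeds the one after the first iteration."""
    return len(samples) >= 2 and samples[-1][1] > samples[0][1]


def ranks_with_growth(samples_by_rank):
    """Given {rank: samples}, list the MPI ranks whose post-free baseline grew."""
    return sorted(rank for rank, samples in samples_by_rank.items() if baseline_grows(samples))
```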

Best,
Kirill

Kirill_V_Intel
Employee

Conn, one more suggestion:

Try turning off the fast memory manager by calling mkl_disable_fast_mm (or by setting the equivalent environment variable, MKL_DISABLE_FAST_MM):

https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/support-functions/memory-management/mkl-disable-fast-mm.html

I suspect the leak will go away, but it would be good to confirm it.
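
Since the switch has to be in place before MKL makes its first allocation, a convenient route when a parent script launches the calculation is to put `MKL_DISABLE_FAST_MM` into the child's environment. A small Python sketch; the helper name is hypothetical, and only the variable name comes from MKL's documentation.

```python
import os


def env_without_fast_mm(base=None):
    """Return a copy of an environment with MKL's fast memory manager disabled.

    Only the variable name MKL_DISABLE_FAST_MM comes from MKL's documentation;
    it needs to be present when the MKL-using process starts.
    """
    env = dict(os.environ if base is None else base)  # copy, never mutate the input
    env["MKL_DISABLE_FAST_MM"] = "1"
    return env
```

For example `subprocess.run(cmd, env=env_without_fast_mm())`, where `cmd` is whatever launches the calculation; with Intel MPI, `mpirun -genv MKL_DISABLE_FAST_MM 1 ...` should have a similar effect.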

Best,
Kirill

O_Rourke__Conn

Hi Kirill,

Thanks for the suggestions. I have been testing a standalone version of the code with the loop (no spawn from Python) on a single MPI process.

The number of allocated bytes reported before and after the call to mkl_free_buffers() stays the same (11532) with the fast memory manager turned off. I suppose that suggests MKL's memory usage should be constant rather than growing. Perhaps the problem is not with MKL at all, as I tested with non-MKL versions of the libraries and saw the same thing happening.

I'll investigate further; feel free to close the ticket. I'll open another one if I am sure it is MKL.

Thanks,

Conn

Gennady_F_Intel
Moderator

This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.