Memory leak with MKL

Hi There, 

 

I have a problem with a code I'm using (the VASP electronic structure code, v5.4.1), which I have modified to run multiple times within a loop. I noticed memory leaks, so I figured some allocatable array was not being deallocated somewhere.

I ran it through valgrind and got the output at the bottom of this message. The memory leaks seem to come from MKL, not from the main code itself.

I was wondering if anyone has seen this type of thing before, and if so knows how to solve it.

I'm using Parallel Studio XE Cluster Edition: intel_2020/compilers_and_libraries_2020.0.166

Thanks,

Conn

 

 

==19089== 264 bytes in 1 blocks are possibly lost in loss record 18 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB92709: mkl_serv_malloc (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xBB91A90: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA170576: MKL_BLACS_ALLOCATE (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0xA153629: blacs_gridmap_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0xA1531FD: blacs_gridinit_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0x4369C1: scala_mp_pdssyex_zheevx_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0xCC665B: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0xCE4762: elmin_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x1234378: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 10,464 bytes in 1 blocks are possibly lost in loss record 21 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB91916: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA170576: MKL_BLACS_ALLOCATE (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0xA16AC53: BI_GetBuff (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0xA14C987: zgsum2d_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089==    by 0x9B34EFD: pzlarfb_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089==    by 0x9B6CAD0: pzunmql_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089==    by 0x9B70323: pzunmtr_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089==    by 0x9AE2099: mkl_pzheevx0_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089==    by 0x9ADE9C6: pzheevx_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089==    by 0x4374A8: scala_mp_pdssyex_zheevx_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0xCC665B: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 69,664 bytes in 1 blocks are possibly lost in loss record 22 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB933B7: mm_account_ptr_by_tid..0 (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xBB914C4: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xC580A66: mkl_lapack_zhseqr (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xC508660: mkl_lapack_zgeev (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0x8E935D7: ZGEEV (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089==    by 0x56BA9F: spinsym_mp_set_spinrot_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x58016F: ibzkpt_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x5336AD: mkpoints_mp_rd_kpoints_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x53180E: mkpoints_mp_setup_kpoints_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x1216A13: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 4,309,536 bytes in 1 blocks are possibly lost in loss record 23 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089==    by 0x8C21576: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089==    by 0x60698D: orth1_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x6090BF: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 4,364,832 bytes in 1 blocks are possibly lost in loss record 24 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089==    by 0x8C2149E: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089==    by 0x60698D: orth1_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x6090BF: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 4,647,456 bytes in 1 blocks are possibly lost in loss record 25 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154CFD48: mkl_blas_avx2_xztrmm_right_upper_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154CE0D6: mkl_blas_avx2_xztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0xBBCE07F: mkl_blas_xztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA4FB178: mkl_blas_ztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089==    by 0x8C25205: ZTRMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089==    by 0x6048CA: linbas_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x60400E: dfast_mp_lincom_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x609ED8: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== 6,076,960 bytes in 1 blocks are possibly lost in loss record 26 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB91916: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089==    by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==    by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089==    by 0x8C2149E: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089==    by 0xCC49FE: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0xCE4762: elmin_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x1234378: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==    by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== 
==19089== LEAK SUMMARY:
==19089==    definitely lost: 111 bytes in 8 blocks
==19089==    indirectly lost: 0 bytes in 0 blocks
==19089==      possibly lost: 19,479,432 bytes in 8 blocks
==19089==    still reachable: 5,129 bytes in 15 blocks
==19089==         suppressed: 0 bytes in 0 blocks
==19089== Reachable blocks (those to which a pointer was found) are not shown.
==19089== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==19089== 
==19089== For counts of detected and suppressed errors, rerun with: -v
==19089== Use --track-origins=yes to see where uninitialised values come from
==19089== ERROR SUMMARY: 201 errors from 41 contexts (suppressed: 0 from 0)
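
For a long log like the one above, it can help to total the loss records by the library that made the allocation. The following is a minimal sketch, not a valgrind feature: the function name `summarize_by_library` and the regular expressions are assumptions written against the line format shown in this post. It attributes each record to the first stack frame that names an object file, which skips the `malloc` frame.

```python
import re
from collections import defaultdict

# Header of a loss record, e.g.
# "==19089== 10,464 bytes in 1 blocks are possibly lost in loss record 21 of 26"
RECORD_RE = re.compile(
    r"^==\d+==\s+([\d,]+) bytes in [\d,]+ blocks are (.+?) in loss record"
)
# Stack frame that names an object file, e.g. "... by 0x...: func (in /path/lib.so)"
FRAME_RE = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: .+? \(in (.+)\)\s*$")

# Two records copied from the log in this thread (first frames only).
SAMPLE_LOG = """\
==19089== 264 bytes in 1 blocks are possibly lost in loss record 18 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB92709: mkl_serv_malloc (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089==
==19089== 10,464 bytes in 1 blocks are possibly lost in loss record 21 of 26
==19089==    at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089==    by 0xBB91916: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
"""


def summarize_by_library(log_text):
    """Sum leaked bytes per (leak kind, library of the first named frame)."""
    totals = defaultdict(int)
    pending = None  # (kind, nbytes) of a record still waiting for its first frame
    for line in log_text.splitlines():
        header = RECORD_RE.match(line)
        if header:
            nbytes = int(header.group(1).replace(",", ""))
            pending = (header.group(2), nbytes)
            continue
        frame = FRAME_RE.match(line)
        if frame and pending:
            kind, nbytes = pending
            library = frame.group(1).rsplit("/", 1)[-1]  # keep the file name only
            totals[(kind, library)] += nbytes
            pending = None  # later frames of this record are ignored
    return dict(totals)


if __name__ == "__main__":
    for (kind, library), nbytes in sorted(summarize_by_library(SAMPLE_LOG).items()):
        print(f"{kind:>16}  {library:<28} {nbytes:>12,d} bytes")
```

The totals show where memory was allocated, not which code was supposed to free it. Note also that valgrind's "possibly lost" category only means that a pointer into the interior of a block was found at exit, which can happen when a library hands out aligned pointers inside a larger allocation, so these records alone do not prove a growing leak.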

 

 

 

6 Replies
Kirill_V_Intel
Employee
108 Views

Hi Conn,

I believe you should call mkl_free_buffers() after the last call to MKL, see https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/...

MKL (oneMKL) can create and use internal buffers, which in some cases are crucial for optimal performance. These buffers are normally transparent to the user, but this API (mkl_free_buffers) lets you explicitly tell MKL to release them.

Let us know if this solves the problem.

Best,
Kirill

 

 

102 Views

Hi Kirill, 

 

I already call mkl_free_buffers() at the end of each loop iteration (I have modified the code to loop and to communicate with a Python parent over MPI, rather than spawning it multiple times).

Calling mkl_free_buffers() has no effect on the memory issues, and the code eventually falls over anyway.

 

Do you have any other suggestions?

 

Thanks, 

Conn

Kirill_V_Intel
Employee
99 Views

Hi Conn,

[Just to make it clear, I am not directly working on the MKL service layer but on other MKL components.]

OK, one more suggestion: call mkl_mem_stat right before and right after mkl_free_buffers at the end of the loop, and tell us what you see (the number of allocated buffers and the number of allocated bytes).

Another thing to check is what each MPI process does. Could it be that not all of them call mkl_free_buffers? Could it be that BLACS init/gridinit is called repeatedly? I would also try, if possible, to run on a single MPI process and check whether the issue is still present.

Best,
Kirill

Kirill_V_Intel
Employee
97 Views

Conn, one more suggestion:

Try turning the fast memory manager off by calling mkl_disable_fast_mm (or by setting the equivalent environment variable, MKL_DISABLE_FAST_MM):

https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top...

I suspect that the leak will go away but it would be great to have it as a fact.
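
Since the run is driven from a Python parent, the environment-variable route can be sketched as below. This is only an illustration under assumptions: the helper name `env_without_fast_mm` is made up for this example, and the child command is a placeholder rather than the real launch line. `MKL_DISABLE_FAST_MM` is the environment-variable counterpart of mkl_disable_fast_mm, and it should be in place before the process makes its first MKL call.

```python
import os
import subprocess
import sys


def env_without_fast_mm(base_env=None):
    """Return a copy of the environment with MKL's fast memory manager disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Environment-variable counterpart of mkl_disable_fast_mm().
    env["MKL_DISABLE_FAST_MM"] = "1"
    return env


if __name__ == "__main__":
    # Placeholder child that only echoes the variable as the child sees it.
    # Replace it with the real launch command (for example an mpirun line).
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['MKL_DISABLE_FAST_MM'])",
    ]
    subprocess.run(child, env=env_without_fast_mm(), check=True)
```

Passing a copied environment to the child keeps the setting out of the parent process, so the same parent can launch runs with and without the fast memory manager for comparison.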

Best,
Kirill

91 Views

Hi Kirill, 

 

Thanks for the suggestions. I have been running a standalone version of the code with the loop (no spawn from Python) on a single MPI process for testing.

The number of allocated bytes reported before and after the call to mkl_free_buffers() is the same (11532), with the fast memory manager turned off. I suppose that suggests MKL's memory usage is constant rather than growing. Perhaps the problem is not with MKL at all: I tested with non-MKL versions of the libraries and saw the same thing happening.
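
One way to test the "constant, not growing" expectation independently of MKL's own counters is to sample the resident set size of the running process once per loop iteration. The sketch below is Linux-only and every name in it (`parse_vmrss_kib`, `rss_kib`, `growth_per_iteration`) is hypothetical; it reads the `VmRSS` field from `/proc/<pid>/status`, so it can watch another process by PID.

```python
import re


def parse_vmrss_kib(status_text):
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_kib(pid="self"):
    """Current resident set size in KiB of the given process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kib(status_file.read())


def growth_per_iteration(samples):
    """Differences between consecutive RSS samples, one per iteration."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


if __name__ == "__main__":
    samples = []
    hoard = []  # stand-in for a leak: memory that is never released
    for _ in range(5):
        hoard.append(bytearray(8 * 1024 * 1024))  # simulate one loop iteration
        samples.append(rss_kib())
    print("RSS per iteration (KiB):", samples)
    print("growth per iteration (KiB):", growth_per_iteration(samples))
```

If the per-iteration growth stays positive while the byte count from mkl_mem_stat stays flat, the growth is coming from allocations that MKL's memory manager does not track, which would be consistent with the non-MKL builds showing the same behaviour.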

 

I'll investigate further, so feel free to close the ticket. I'll open another one if I become sure it is MKL.

 

Thanks, 

Conn

Gennady_F_Intel
Moderator
83 Views

This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only. 


