Hi There,
I have a problem with a code I'm using (the VASP electronic structure code, v5.4.1), which I have modified to run multiple times within a loop. I noticed memory leaks, so I figured some allocatable array was not being cleaned up somewhere.
I ran it through valgrind and got the output at the bottom of this message. The memory leaks seem to come from MKL, not from the main code itself.
I was wondering if anyone has seen this type of thing before and, if so, knows how to solve it.
I'm using Parallel Studio XE cluster: intel_2020/compilers_and_libraries_2020.0.166
Thanks,
Conn
==19089== 264 bytes in 1 blocks are possibly lost in loss record 18 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB92709: mkl_serv_malloc (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xBB91A90: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA170576: MKL_BLACS_ALLOCATE (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0xA153629: blacs_gridmap_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0xA1531FD: blacs_gridinit_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0x4369C1: scala_mp_pdssyex_zheevx_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0xCC665B: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0xCE4762: elmin_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x1234378: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 10,464 bytes in 1 blocks are possibly lost in loss record 21 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB91916: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA170576: MKL_BLACS_ALLOCATE (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0xA16AC53: BI_GetBuff (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0xA14C987: zgsum2d_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0x9B34EFD: pzlarfb_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089== by 0x9B6CAD0: pzunmql_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089== by 0x9B70323: pzunmtr_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089== by 0x9AE2099: mkl_pzheevx0_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089== by 0x9ADE9C6: pzheevx_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089== by 0x4374A8: scala_mp_pdssyex_zheevx_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0xCC665B: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 69,664 bytes in 1 blocks are possibly lost in loss record 22 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB933B7: mm_account_ptr_by_tid..0 (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xBB914C4: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xC580A66: mkl_lapack_zhseqr (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xC508660: mkl_lapack_zgeev (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0x8E935D7: ZGEEV (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089== by 0x56BA9F: spinsym_mp_set_spinrot_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x58016F: ibzkpt_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x5336AD: mkpoints_mp_rd_kpoints_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x53180E: mkpoints_mp_setup_kpoints_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x1216A13: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 4,309,536 bytes in 1 blocks are possibly lost in loss record 23 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089== by 0x8C21576: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089== by 0x60698D: orth1_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x6090BF: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 4,364,832 bytes in 1 blocks are possibly lost in loss record 24 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089== by 0x8C2149E: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089== by 0x60698D: orth1_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x6090BF: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 4,647,456 bytes in 1 blocks are possibly lost in loss record 25 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154CFD48: mkl_blas_avx2_xztrmm_right_upper_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154CE0D6: mkl_blas_avx2_xztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0xBBCE07F: mkl_blas_xztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA4FB178: mkl_blas_ztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089== by 0x8C25205: ZTRMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089== by 0x6048CA: linbas_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x60400E: dfast_mp_lincom_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x609ED8: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 6,076,960 bytes in 1 blocks are possibly lost in loss record 26 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB91916: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089== by 0x8C2149E: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089== by 0xCC49FE: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0xCE4762: elmin_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x1234378: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== LEAK SUMMARY:
==19089== definitely lost: 111 bytes in 8 blocks
==19089== indirectly lost: 0 bytes in 0 blocks
==19089== possibly lost: 19,479,432 bytes in 8 blocks
==19089== still reachable: 5,129 bytes in 15 blocks
==19089== suppressed: 0 bytes in 0 blocks
==19089== Reachable blocks (those to which a pointer was found) are not shown.
==19089== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==19089==
==19089== For counts of detected and suppressed errors, rerun with: -v
==19089== Use --track-origins=yes to see where uninitialised values come from
==19089== ERROR SUMMARY: 201 errors from 41 contexts (suppressed: 0 from 0)
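A log like the one above is easier to read once each loss record is attributed to the first frame that belongs to the application rather than to a library. The following is a minimal sketch of such a summary, assuming the standard Memcheck text format shown above; the function name `summarize` and the rule "anything ending in .so is a library frame" are illustrative assumptions, not part of valgrind.

```python
import re
from collections import defaultdict

# Header of a loss record, e.g.
# "==19089== 264 bytes in 1 blocks are possibly lost in loss record 18 of 26"
RECORD = re.compile(
    r"==\d+==\s+([\d,]+) bytes in [\d,]+ blocks are (.+?) in loss record"
)
# A stack frame, e.g.
# "==19089== by 0x8C21576: ZGEMM (in /opt/.../libmkl_intel_lp64.so)"
FRAME = re.compile(
    r"==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (\S+) \((?:in )?([^)]+)\)"
)
# Assumption: frames located in shared objects are library frames.
SHARED_OBJECT = re.compile(r"\.so(\.\d+)*$")


def summarize(log_text):
    """Sum bytes per (leak kind, first application frame) over all records."""
    totals = defaultdict(int)
    size = kind = None
    for line in log_text.splitlines():
        header = RECORD.match(line)
        if header:
            size = int(header.group(1).replace(",", ""))
            kind = header.group(2)
            continue
        frame = FRAME.match(line)
        if frame is None or size is None:
            continue
        func, obj = frame.groups()
        # Skip the allocator wrapper and every frame inside a shared library.
        if obj.startswith("vg_replace") or SHARED_OBJECT.search(obj):
            continue
        totals[(kind, func)] += size
        size = None  # record attributed; ignore the rest of its stack
    return dict(totals)


if __name__ == "__main__":
    import sys

    for (kind, func), nbytes in sorted(summarize(sys.stdin.read()).items()):
        print(f"{nbytes:>12,d}  {kind:<16}  {func}")
```

Saved under a hypothetical name such as `summarize_leaks.py`, it could be run as `python summarize_leaks.py < valgrind.log`. Note that every large record in the log above is "possibly lost", which Memcheck reports when it finds only a pointer into the interior of a block, so these records alone do not prove a leak.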
This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Hi Conn,
I believe you should call mkl_free_buffers() after the last call to MKL; see https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/...
MKL (oneMKL) can create and use internal buffers, which in some cases are crucial for optimal performance. While the existence of these buffers is transparent to the user, this API (mkl_free_buffers) lets the user explicitly tell MKL to release them.
Let us know if this solves the problem.
Best,
Kirill
Hi Kirill,
I already call mkl_free_buffers() at the end of each loop iteration (I have modified the code to loop and to communicate with a Python parent over MPI, rather than having multiple spawns).
Calling mkl_free_buffers() has no effect on the memory issues, and the code eventually falls over anyway.
Do you have any other suggestions?
Thanks,
Conn
Hi Conn,
[Just to make it clear, I am not directly working on the MKL service layer but on other MKL components.]
OK, one more suggestion: call mkl_mem_stat right before and right after mkl_free_buffers at the end of the loop and tell us what you see (in terms of # allocated buffers and # allocated bytes).
One more thing to check is what each of the MPI processes does. Could it be that not all of them are calling mkl_free_buffers? Could it be that BLACS init/gridinit is called repeatedly? I would also try (if possible) to run on a single MPI process and check whether the issue is still present.
Best,
Kirill
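The before/after check suggested above can be sketched as follows. In VASP itself these would be Fortran calls at the end of the loop body; the Python form below is only a quick in-process illustration, and it measures the MKL instance loaded into the Python process, not a separately spawned VASP process. It assumes the MKL single dynamic library (`mkl_rt`) is discoverable on the system and exports the C entry points `MKL_Mem_Stat` and `MKL_Free_Buffers`; the helper names `load_mkl`, `mem_stat`, and `free_buffers_report` are hypothetical.

```python
import ctypes
import ctypes.util


def load_mkl():
    """Return a handle to the MKL runtime library, or None if it is not found."""
    name = ctypes.util.find_library("mkl_rt")
    if name is None:
        return None
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


def mem_stat(mkl):
    """Call mkl_mem_stat and return (allocated bytes, allocated buffers)."""
    fn = mkl.MKL_Mem_Stat
    fn.restype = ctypes.c_int64
    fn.argtypes = [ctypes.POINTER(ctypes.c_int)]
    nbuffers = ctypes.c_int(0)
    nbytes = fn(ctypes.byref(nbuffers))
    return nbytes, nbuffers.value


def free_buffers_report(mkl):
    """Return MKL memory statistics before and after mkl_free_buffers."""
    before = mem_stat(mkl)
    mkl.MKL_Free_Buffers.restype = None
    mkl.MKL_Free_Buffers.argtypes = []
    mkl.MKL_Free_Buffers()
    after = mem_stat(mkl)
    return before, after


if __name__ == "__main__":
    mkl = load_mkl()
    if mkl is None:
        print("MKL runtime library not found; nothing to report")
    else:
        before, after = free_buffers_report(mkl)
        print("before: %d bytes in %d buffers" % before)
        print("after:  %d bytes in %d buffers" % after)
```

If the "after" figures stay constant from one iteration to the next, MKL's own buffers are not what is growing.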
Conn, one more suggestion:
Try turning the fast memory manager off by calling mkl_disable_fast_mm (or by setting the equivalent environment variable).
I suspect that the leak will go away, but it would be great to have that confirmed.
Best,
Kirill
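The environment-variable route is convenient when the solver is launched from a driver script, because no recompilation is needed. A minimal sketch, assuming the variable in question is `MKL_DISABLE_FAST_MM` and that the child process is started with `subprocess`; the helper names and the example command are hypothetical.

```python
import os
import subprocess


def env_without_fast_mm(base=None):
    """Copy an environment and switch off MKL's fast memory manager in it."""
    env = dict(os.environ if base is None else base)
    env["MKL_DISABLE_FAST_MM"] = "1"
    return env


def run_solver(command, workdir):
    """Run the solver with the fast memory manager disabled for that process."""
    return subprocess.run(
        command, cwd=workdir, env=env_without_fast_mm(), check=False
    )


# Hypothetical usage; the command line and directory are placeholders:
# run_solver(["mpirun", "-np", "1", "vasp_std"], "config_1")
```

The setting has to be in place before the process makes its first MKL call, which is why it is applied to the child's environment at launch rather than changed afterwards.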
Hi Kirill,
Thanks for the suggestions. I have been running a standalone version of the code with the loop (no spawn from Python) on a single MPI process for testing.
With the fast memory manager turned off, the # of allocated bytes is the same before and after the call to mkl_free_buffers() (11532). I suppose that suggests MKL's memory usage is constant, not bloating. Perhaps the problem is not with MKL at all, as I tested with non-MKL versions of the libraries and found the same thing happening.
I'll investigate further - feel free to close the ticket. I'll open another if I am sure it is MKL.
Thanks,
Conn
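Since the MKL statistics stay constant while the process still grows, one library-independent way to narrow this down is to sample the resident set size of the running process once per loop iteration. A minimal sketch, assuming Linux (`/proc/<pid>/status`); the function names are hypothetical, and the PID would be that of the running solver.

```python
import os


def parse_vm_rss(status_text):
    """Return the VmRSS value (in kB, as printed by the kernel) or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def rss_kb(pid):
    """Read the current resident set size of a process on Linux."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss(handle.read())


def growth_per_iteration(samples):
    """Average change in RSS per iteration between first and last sample."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)


if __name__ == "__main__":
    # Demonstration on the current process; in practice, pass the solver's PID
    # and take one sample at the end of every loop iteration.
    print(rss_kb(os.getpid()), "kB resident")
```

A steadily positive `growth_per_iteration` with both MKL and non-MKL builds would point at the application's own allocations rather than at the library.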