Hi there,
I have a problem with a code I'm using (the VASP electronic structure code, v5.4.1), which I have modified to run multiple times within a loop. I noticed memory leaks, so I figured some allocatable array was not being deallocated somewhere.
I ran it through valgrind and got the output at the bottom of this message. It seems the memory leaks are coming from MKL, not from the main code itself.
I was wondering whether anyone has seen this kind of thing before and, if so, knows how to solve it.
I'm using Intel Parallel Studio XE Cluster Edition: intel_2020/compilers_and_libraries_2020.0.166
Thanks,
Conn
==19089== 264 bytes in 1 blocks are possibly lost in loss record 18 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB92709: mkl_serv_malloc (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xBB91A90: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA170576: MKL_BLACS_ALLOCATE (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0xA153629: blacs_gridmap_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0xA1531FD: blacs_gridinit_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0x4369C1: scala_mp_pdssyex_zheevx_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0xCC665B: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0xCE4762: elmin_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x1234378: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 10,464 bytes in 1 blocks are possibly lost in loss record 21 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB91916: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA170576: MKL_BLACS_ALLOCATE (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0xA16AC53: BI_GetBuff (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0xA14C987: zgsum2d_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
==19089== by 0x9B34EFD: pzlarfb_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089== by 0x9B6CAD0: pzunmql_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089== by 0x9B70323: pzunmtr_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089== by 0x9AE2099: mkl_pzheevx0_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089== by 0x9ADE9C6: pzheevx_ (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_scalapack_lp64.so)
==19089== by 0x4374A8: scala_mp_pdssyex_zheevx_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0xCC665B: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 69,664 bytes in 1 blocks are possibly lost in loss record 22 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB933B7: mm_account_ptr_by_tid..0 (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xBB914C4: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xC580A66: mkl_lapack_zhseqr (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xC508660: mkl_lapack_zgeev (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0x8E935D7: ZGEEV (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089== by 0x56BA9F: spinsym_mp_set_spinrot_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x58016F: ibzkpt_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x5336AD: mkpoints_mp_rd_kpoints_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x53180E: mkpoints_mp_setup_kpoints_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x1216A13: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 4,309,536 bytes in 1 blocks are possibly lost in loss record 23 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089== by 0x8C21576: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089== by 0x60698D: orth1_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x6090BF: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 4,364,832 bytes in 1 blocks are possibly lost in loss record 24 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089== by 0x8C2149E: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089== by 0x60698D: orth1_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x6090BF: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 4,647,456 bytes in 1 blocks are possibly lost in loss record 25 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB91E13: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154CFD48: mkl_blas_avx2_xztrmm_right_upper_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154CE0D6: mkl_blas_avx2_xztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0xBBCE07F: mkl_blas_xztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA4FB178: mkl_blas_ztrmm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089== by 0x8C25205: ZTRMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089== by 0x6048CA: linbas_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x60400E: dfast_mp_lincom_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x609ED8: choleski_mp_orthch_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x123092B: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== 6,076,960 bytes in 1 blocks are possibly lost in loss record 26 of 26
==19089== at 0x887CB6B: malloc (vg_replace_malloc.c:299)
==19089== by 0xBB91916: mkl_serv_allocate (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0x154B4845: mkl_blas_avx2_zgemm_get_bufs (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154B6174: mkl_blas_avx2_z_generic_fullacopybcopy (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0x154C7CD5: mkl_blas_avx2_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_avx2.so)
==19089== by 0xBBCA6EF: mkl_blas_xzgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so)
==19089== by 0xA4FA875: mkl_blas_zgemm (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_sequential.so)
==19089== by 0x8C2149E: ZGEMM (in /opt/intel_2020/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
==19089== by 0xCC49FE: david_mp_eddav_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0xCE4762: elmin_ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x1234378: MAIN__ (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089== by 0x4066C1: main (in /home/cor22/Scratch/optyU/TEST/LiCoO2/configs/config_1/INTEL_TEST/TEST2/MEM_LEAK/vasp.5.4.1/bin/vasp_std)
==19089==
==19089== LEAK SUMMARY:
==19089== definitely lost: 111 bytes in 8 blocks
==19089== indirectly lost: 0 bytes in 0 blocks
==19089== possibly lost: 19,479,432 bytes in 8 blocks
==19089== still reachable: 5,129 bytes in 15 blocks
==19089== suppressed: 0 bytes in 0 blocks
==19089== Reachable blocks (those to which a pointer was found) are not shown.
==19089== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==19089==
==19089== For counts of detected and suppressed errors, rerun with: -v
==19089== Use --track-origins=yes to see where uninitialised values come from
==19089== ERROR SUMMARY: 201 errors from 41 contexts (suppressed: 0 from 0)
This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Hi Conn,
I believe you should call mkl_free_buffers() after the last call to MKL; see https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/support-functions/memory-management/mkl-free-buffers.html
MKL (oneMKL) can create and use internal buffers, which in some cases are crucial for getting optimal performance. While the existence of these buffers is transparent to the user, this API (mkl_free_buffers) lets the user explicitly tell MKL to release them.
Let us know if this solves the problem.
Best,
Kirill
Hi Kirill,
I already call mkl_free_buffers() at the end of each loop iteration (I have modified the code to loop and to communicate with a Python parent process over MPI, rather than being spawned multiple times).
Calling mkl_free_buffers() has no effect on the memory problem, and the code eventually falls over anyway.
Do you have any other suggestions?
Thanks,
Conn
Hi Conn,
[Just to make it clear, I am not directly working on the MKL service layer but on other MKL components.]
OK, one more suggestion: call mkl_mem_stat right before and right after mkl_free_buffers at the end of the loop, and tell us what you see (in terms of the number of allocated buffers and the number of allocated bytes).
One more thing to check is what each of the MPI processes does. Could it be that not all of them call mkl_free_buffers? Could BLACS init/gridinit be called repeatedly? I would also try (if possible) running on a single MPI process to check whether the issue is still present.
Best,
Kirill
Conn, one more suggestion:
Try turning the fast memory manager off by calling mkl_disable_fast_mm (or by setting the equivalent environment variable).
I suspect the leak will go away, but it would be great to confirm that.
Best,
Kirill
Hi Kirill,
Thanks for the suggestions. For testing, I have been running a standalone version of the code with the loop (not spawned from Python) on a single MPI process.
The number of allocated bytes before and after the call to mkl_free_buffers() stays the same (11532) with the fast memory manager turned off. I suppose that suggests MKL's memory usage should be constant rather than growing. Perhaps the problem is not with MKL at all, since I tested with non-MKL versions of the libraries and saw the same thing happen.
I'll investigate further; feel free to close the ticket. I'll open another one if I'm sure it's MKL.
Thanks,
Conn