Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Memory leak in D&C eigensolver parallel

nickpapior
Beginner

Hi,

 

We have encountered several major memory leaks in Siesta, CP2K, and QE when using the D&C eigensolver.

 

See e.g. this thread: https://gitlab.com/siesta-project/siesta/-/issues/29#note_735026816

which shows explicitly that MPI causes some of the problems.

Whether this is related to the MKL implementation or the MPI implementation is not clear to me.

 

Elsewhere I found these topics:
https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Memory-leak-in-dpotrf-and-dpotri/m-p...

https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Memory-Leak-in-MKL/m-p/1153543

 

In those threads it is suggested to insert a call:

mkl_free_buffers
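A related workaround, which avoids modifying the source at all, is to disable MKL's internal memory manager through an environment variable so that work buffers are freed after each call instead of being cached. A minimal sketch (the mpirun command line is illustrative, not taken from this thread):

```shell
# Disable MKL's fast memory manager so internal work buffers are
# released after each call instead of being cached per thread.
# This can cost performance; it is mainly useful for diagnosing
# whether cached MKL buffers are behind the apparent "leak".
export MKL_DISABLE_FAST_MM=1

# Then launch the application as usual, e.g. (illustrative):
#   mpirun -n 4 ./siesta < input.fdf
echo "MKL_DISABLE_FAST_MM=$MKL_DISABLE_FAST_MM"
```

Whether this removes the growth can help distinguish cached-but-reachable MKL buffers from a genuine leak.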

 

I have to say that this is not really a good solution. Why can't MKL automatically free buffers that are no longer reachable?

 

/ Nick

VidyalathaB_Intel
Moderator

Hi,

 

Thanks for reaching out to us.

 

Could you please let us know which oneMKL version you have tried, along with your OS details?

 

The issue you mentioned regarding memory leaks in dpotrf & dpotri has been fixed in the 2021.3.0 release.

Here is the article where you can get information regarding the bug fixes

https://www.intel.com/content/www/us/en/developer/articles/troubleshooting/intel-oneapi-math-kernel-...

 

>> memory leaks in D&C eigensolver parallel.

 

You can try the latest oneMKL version which is 2021.4.0 and see if it resolves the issue regarding memory leaks.

 

If the issue still persists even with the latest release, please provide us with a minimal reproducer (and steps, if any) so that we can work on it from our end.

 

Please also confirm the observed memory leaks using Intel Inspector.

 

It would be helpful if you could summarize once more the versions of MKL & MPI with which no memory leaks were observed.

 

Regards,

Vidya.

 

gsamsonidze
Beginner

OS details: Red Hat Enterprise Linux Server release 7.6 (Maipo)

 

Summary of the versions of MKL & MPI for Siesta with D&C eigensolver:

intel/2018u4 + intel-mpi/2018.4.274 + mkl/2018u4: no memory leaks
intel/2019u4 + intel-mpi/2019.5.281 + mkl/2019u4: memory leaks
oneapi/compiler/2021.3 + oneapi/mpi/2021.3.0 + oneapi/mkl/2021.3.0: memory leaks
intel/2018u4 + intel-mpi/2018.4.274 + openblas-0.3.17 + scalapack-2.1.0: no memory leaks
intel/2019u4 + intel-mpi/2019.5.281 + openblas-0.3.17 + scalapack-2.1.0: memory leaks
oneapi/compiler/2021.3 + oneapi/mpi/2021.3.0 + openblas-0.3.17 + scalapack-2.1.0: memory leaks

 

It looks like the problem is not in MKL, since the same behavior is observed with OpenBLAS + ScaLAPACK as with MKL.

 

I will try the latest release (oneAPI 2021.4.0) and if the issue persists I will provide the input files to reproduce it.

VidyalathaB_Intel
Moderator

Hi,

Reminder:

Could you please provide us with the above-mentioned details (in my previous post) so that we can work on it from our end?

Regards,

Vidya.

 

gsamsonidze
Beginner

Hi Vidya,

Here are the answers to your previous post:

> Could you please let us know which oneMKL version you have tried, along with your OS details?

oneMKL version: 2021.3.0
OS details: Red Hat Enterprise Linux Server release 7.6 (Maipo)

> You can try the latest oneMKL version which is 2021.4.0 and see if it resolves the issue regarding memory leaks.

I'm working on installing 2021.4.0 and recompiling and testing Siesta.

> If the issue still persists even with the latest release, please provide us with a minimal reproducer (and steps, if any) so that we can work on it from our end.

The minimal reproducer is attached. It contains Siesta makefile (arch.make), job submission script (submit.sh), and Siesta input files (WATER.fdf, O.psf, H.psf). Siesta source code can be downloaded here:

https://gitlab.com/siesta-project/siesta/-/releases/v4.1.5/downloads/siesta-4.1.5.tar.gz

> Please also confirm the observed memory leaks using Intel Inspector.

So far I've been using /proc/PID/smaps files in Linux to track memory usage of Siesta. I will confirm with Intel Inspector as well.
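For reference, the smaps-based check can be reduced to a one-liner that sums the resident set size of all mappings of a process; a minimal sketch (inspecting the current shell's own PID for illustration):

```shell
# Sum the Rss fields of every mapping in /proc/<pid>/smaps (values in kB).
# Sampling this periodically while the job runs exposes monotonic growth.
pid=$$   # illustrative: replace with the PID of the running Siesta process
awk '/^Rss:/ { sum += $2 } END { print sum " kB" }' "/proc/$pid/smaps"
```

Logging this value once per minute alongside the simulation step gives a simple growth curve without attaching a profiler.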

> It would be helpful if you could summarize once more the versions of MKL & MPI with which no memory leaks were observed.

There were no memory leaks with Intel Parallel Studio XE Cluster Edition 2018.4. There were memory leaks with Intel Parallel Studio XE Cluster Edition 2019.5 and with Intel oneAPI HPC Toolkit 2021.3.0.

Regards,

-Georgy

VidyalathaB_Intel
Moderator

Hi,


Thanks for providing the details.

We have reported this issue to the development team; they are looking into it. We will get back to you soon.


Regards,

Vidya.


VidyalathaB_Intel
Moderator

Hi,

 

>>I'm working on installing 2021.4.0 and recompiling and testing Siesta

The issue is fixed in oneAPI 2021.4.0.

Could you please try with 2021.4.0 Intel MPI and let us know if your issue is resolved?

 

Regards,

Vidya.

 

gsamsonidze
Beginner

Hi Vidya, thank you for the update. I just finished testing 2021.4 and I can confirm that the memory leaks are gone.

There is one more issue though. CP2K built with 2021.4 freezes after running for several hours, while CP2K built with 2018.4 works fine and reaches the walltime limit. This is not a memory leak, as the memory consumption is low. Not sure if this is specific to my OS. I will run some more tests with 2021.4 and if the issue persists I will provide a minimal reproducer (or open a new ticket).

VidyalathaB_Intel
Moderator

Hi,


>> .....(or open a new ticket)


Yes, you can raise a new thread if you face any issues with respect to the CP2K application (as it would be easier to keep track of the issue).


>>I can confirm that the memory leaks are gone


Thanks for the confirmation.


As your issue regarding the memory leaks in the Siesta application is resolved, we are closing this thread. Please post a new question if you need any additional information from Intel, as this thread will no longer be monitored.


Regards,

Vidya.


