Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
6981 Discussions

memory leak or buffers are not freed in mkl?

Jize_Zhao
Beginner
541 Views
I am doing matrix calculations with the newest MKL, icc 11.1, and OpenMP.
The calculation is iterative, and the memory should be freed after each iteration.
But I found that if the matrices are large (occupying more than 40 GB of memory), some memory
is not freed, even after calling MKL_Free_Buffers(). The unfreed memory is about 40 GB,
and it does not seem to grow with the number of iterations.

I have checked my code with valgrind, and no memory leak is found in my code.
6 Replies
Gennady_F_Intel
Moderator
To check how much memory MKL's routines are holding, you can do the following:

int nBuffers;
AllocatedBuffersBefore = MKL_Mem_Stat(&nBuffers);

MKL_Free_Buffers();
AllocatedBuffersAfter = MKL_Mem_Stat(&nBuffers);
if ( AllocatedBuffersAfter > 0) ==> memory leakage.
Could you please check this and let us know?
--Gennady
Jize_Zhao
Beginner
The CPU is an "Intel Xeon CPU X5670 @ 2.93GHz", with 96 GB of memory.


Gennady_F_Intel
Moderator

Thanks for the info, but what about the values of AllocatedBuffersBefore and AllocatedBuffersAfter around mkl_free_buffers()?

--Gennady

Jize_Zhao
Beginner
I checked it:

Memory used by mkl after free is 0 byte in 0 buffers.

Strange problem.

Thank you!

Gennady_F_Intel
Moderator
I just wonder whether this behavior depends on the size of the problem.
Say, what happens if the problem size is 4 GB rather than 40 GB?
--Gennady
Jize_Zhao
Beginner
Thank you, Gennady.

I will describe my problem in more detail.

In my program, there is a global matrix, double Coeff[48][48][48][48], and a matrix class Matrix. The code can be written simply as follows:

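....
#include

double Coeff[48][48][48][48];

int main(void)
{
    int state = 12000;
    for (int index = 0; index < 64; index++)
    {
        for (int k = 0; k < 48; k++)
        {
            // The constructor prints /proc/self/status, then loads data from disk.
            Matrix* mt = new Matrix(state, k, ....);

            mt->diagonalizeMatrix();
            mt->saveMatrixToDisk();

            // All memory owned by the Matrix object should be released here.
            delete mt;
        }

        // Grow the problem size on every eighth outer iteration.
        if (index % 8 == 0) state += 4000;
    }

    return 1;
}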

The Matrix class is quite complicated. It diagonalizes a Hamiltonian of size (state*2*state*2)^2. Since the Hamiltonian is a block matrix, the actual problem is much smaller.

After "delete mt", the memory used is expected to be just a little larger than 48*48*48*48*8 = 42467328 bytes, about 42M.
In the constructor "Matrix(....)", it will read and print the memory information from "/proc/self/status" before loading data from disks.

When state=12000, here are some samples. For example, at k=10,

"VmPeak: 28396820 kB
VmSize: 911108 kB
VmLck: 0 kB
VmHWM: 25515396 kB
VmRSS: 53640 kB <===================== this is just a little larger than 42M.
VmData: 858328 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 1732 kB
",
at k=20,


VmPeak: 28396820 kB
VmSize: 911108 kB
VmLck: 0 kB
VmHWM: 25515396 kB
VmRSS: 53640 kB
VmData: 858328 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 1732 kB

When state=16000, at k=0,

"
VmPeak: 28396820 kB
VmSize: 911108 kB
VmLck: 0 kB
VmHWM: 26683804 kB
VmRSS: 53648 kB
VmData: 858328 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 1732 kB
"

at k=1,

"
VmPeak: 34412208 kB
VmSize: 1831832 kB
VmLck: 0 kB
VmHWM: 31453736 kB
VmRSS: 974220 kB
VmData: 1779052 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 3528 kB

",

at k=47,

"VmPeak: 35346244 kB
VmSize: 1994660 kB
VmLck: 0 kB
VmHWM: 34490032 kB
VmRSS: 1137164 kB
VmData: 1941880 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 3860 kB

",

when state=20000;

at k=0,

"
VmPeak: 35446708 kB
VmSize: 2217204 kB
VmLck: 0 kB
VmHWM: 34498692 kB
VmRSS: 1359616 kB
VmData: 2164424 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 4296 kB

",

when state=24000;

at k=0,

"VmPeak: 35446708 kB
VmSize: 2217204 kB
VmLck: 0 kB
VmHWM: 34498692 kB
VmRSS: 1359616 kB
VmData: 2164424 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 4296 kB

",

at k=1,

"VmPeak: 48898748 kB
VmSize: 2226456 kB
VmLck: 0 kB
VmHWM: 43931232 kB
VmRSS: 1368952 kB
VmData: 2173676 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 4320 kB

",

at k=2,

"VmPeak: 58133264 kB
VmSize: 40184236 kB
VmLck: 0 kB
VmHWM: 53270568 kB
VmRSS: 39326660 kB
VmData: 40131456 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 78664 kB

",

Now it becomes serious.

When state=28000, my process stopped after running several steps. Just before it stopped,

"
VmPeak: 100349956 kB
VmSize: 74397700 kB
VmLck: 0 kB
VmHWM: 91141796 kB
VmRSS: 65545192 kB <====== this is quite large.
VmData: 74329900 kB
VmStk: 84 kB
VmExe: 568 kB
VmLib: 48336 kB
VmPTE: 149148 kB

"
I then restarted my program from the point where it stopped. Even with all the data loaded into memory, the memory usage is much smaller:

"
VmPeak: 51814344 kB <=====this seems to be the maximum memory my code should use when state=28000.
VmSize: 43797400 kB
VmLck: 0 kB
VmHWM: 51218832 kB
VmRSS: 43201932 kB
VmData: 43729600 kB
VmStk: 84 kB
VmExe: 568 kB
VmLib: 48336 kB
VmPTE: 86328 kB
"

I use Linux 2.6.18 x86_64 with OpenMP. The compiler flags are:

"-O3 -openmp -xSSE4.2"

and the link line is " -lmkl_lapack -lmkl_intel_lp64 -lmkl_core -lmkl_sequential".



Do you have any ideas or suggestions?

Thank you very much!