Intel® oneAPI Math Kernel Library

memory leak or buffers are not freed in mkl?

Jize_Zhao
Beginner
I am doing matrix calculations with the newest MKL, icc 11.1, and OpenMP.
The calculation is iterative, and after each iteration the memory should be freed.
But I found that if the matrices are large (occupying more than 40 GB of memory), some memory
is not freed, even after calling MKL_Free_Buffers(). The unfreed memory is about 40 GB,
and it does not seem to grow with the number of iterations.

I have checked my code with valgrind, and no memory leak was found in it.

Gennady_F_Intel
Moderator
To check how much memory MKL routines are using, you can do the following:

int N_AllocatedBuffers;
AllocatedBuffersBefore = MKL_Mem_Stat(&N_AllocatedBuffers);

MKL_Free_Buffers();
AllocatedBuffersAfter = MKL_Mem_Stat(&N_AllocatedBuffers);
if (AllocatedBuffersAfter > 0) ==> memory leak.
Could you please check this and let us know?
--Gennady

Jize_Zhao
Beginner
The CPU is an "Intel Xeon CPU X5670 @ 2.93GHz", with 96 GB of memory.


Gennady_F_Intel
Moderator

Thanks for the info, but what are the values of AllocatedBuffersBefore and AllocatedBuffersAfter around the call to mkl_free_buffers()?

--Gennady

Jize_Zhao
Beginner
I checked it:

Memory used by mkl after free is 0 byte in 0 buffers.

A strange problem.

Thank you!

Gennady_F_Intel
Moderator
I wonder whether this behavior depends on the size of the problem.
Say, what happens if the problem size is 4 GB rather than 40?
--Gennady

Jize_Zhao
Beginner
Thank you, Gennady.

I will describe my problem in more detail.

In my program there is a global array, double Coeff[48][48][48][48], and a matrix class, Matrix. In simplified form, the code looks like this:

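....
#include

double Coeff[48][48][48][48];

int main(void)
{
    int state = 12000;
    for (int index = 0; index < 64; index++)
    {
        for (int k = 0; k < 48; k++)
        {
            // Build, diagonalize, and save one matrix, then release it.
            Matrix* mt = new Matrix(state, k, ....);

            mt->diagonalizeMatrix();
            mt->saveMatrixToDisk();

            delete mt;
        }

        // Grow the problem whenever index is a multiple of 8 (including 0).
        if (index % 8 == 0) state += 4000;
    }

    return 0;
}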

The Matrix class is quite complicated: it diagonalizes a Hamiltonian of size (state*2*state*2)^2. Because the Hamiltonian is a block matrix, the matrices actually diagonalized are much smaller.

After "delete mt", the memory used is expected to be just a little larger than 48*48*48*48*8 = 42467328 bytes, about 42 MB.
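As a sanity check of that baseline, the size of the global array can be verified at compile time. This is only a sketch of the arithmetic; the names CoeffArray, kCoeffBytes, and kCoeffKiB are illustrative and do not come from the original program, and it assumes an 8-byte double as on x86_64.

```cpp
#include <cstddef>

// Same shape as the global Coeff array in the post.
using CoeffArray = double[48][48][48][48];

// Size in bytes, and in the 1024-byte "kB" units that /proc/self/status reports.
constexpr std::size_t kCoeffBytes = sizeof(CoeffArray);
constexpr std::size_t kCoeffKiB = kCoeffBytes / 1024;

// Assumption: double is 8 bytes (true on x86_64).
static_assert(sizeof(double) == 8, "expected an 8-byte double");
static_assert(kCoeffBytes == 42467328, "48^4 elements * 8 bytes");
static_assert(kCoeffKiB == 41472, "array size in kB");
```

In these units, a VmRSS of 53640 kB would be the 41472 kB array plus about 12168 kB for everything else in the process.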
In the constructor "Matrix(....)", the program reads and prints the memory information from "/proc/self/status" before loading data from disk.
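For readers who want to reproduce this kind of measurement, here is a minimal sketch of reading one field from /proc/self/status. It is not the code from my Matrix class; the function names readStatusKb and currentStatusKb are made up for illustration, and the wrapper only works on Linux.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns the value in kB for a field such as "VmRSS" or "VmHWM",
// or -1 if the field is missing or its value cannot be parsed.
long readStatusKb(std::istream& in, const std::string& field) {
    const std::string prefix = field + ":";
    std::string line;
    while (std::getline(in, line)) {
        // Match lines that start with "<field>:".
        if (line.compare(0, prefix.size(), prefix) == 0) {
            std::istringstream rest(line.substr(prefix.size()));
            long kb = 0;
            // operator>> skips leading whitespace; the trailing "kB" is ignored.
            if (!(rest >> kb)) return -1;
            return kb;
        }
    }
    return -1;
}

// Convenience wrapper for the current process (Linux only).
// Returns -1 if /proc/self/status cannot be opened.
long currentStatusKb(const std::string& field) {
    std::ifstream status("/proc/self/status");
    return readStatusKb(status, field);
}
```

Calling currentStatusKb("VmRSS") before the constructor and again after "delete mt" gives two numbers that can be compared directly, instead of comparing whole dumps by eye.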

When state=12000, here are some samples. For example, at k=10,

"VmPeak: 28396820 kB
VmSize: 911108 kB
VmLck: 0 kB
VmHWM: 25515396 kB
VmRSS: 53640 kB <===================== this is just a little larger than 42M.
VmData: 858328 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 1732 kB
",
at k=20,


VmPeak: 28396820 kB
VmSize: 911108 kB
VmLck: 0 kB
VmHWM: 25515396 kB
VmRSS: 53640 kB
VmData: 858328 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 1732 kB

When state=16000, at k=0,

"
VmPeak: 28396820 kB
VmSize: 911108 kB
VmLck: 0 kB
VmHWM: 26683804 kB
VmRSS: 53648 kB
VmData: 858328 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 1732 kB
"

at k=1,

"
VmPeak: 34412208 kB
VmSize: 1831832 kB
VmLck: 0 kB
VmHWM: 31453736 kB
VmRSS: 974220 kB
VmData: 1779052 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 3528 kB

",

at k=47,

"VmPeak: 35346244 kB
VmSize: 1994660 kB
VmLck: 0 kB
VmHWM: 34490032 kB
VmRSS: 1137164 kB
VmData: 1941880 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 3860 kB

",

when state=20000;

at k=0,

"
VmPeak: 35446708 kB
VmSize: 2217204 kB
VmLck: 0 kB
VmHWM: 34498692 kB
VmRSS: 1359616 kB
VmData: 2164424 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 4296 kB

",

when state=24000;

at k=0,

"VmPeak: 35446708 kB
VmSize: 2217204 kB
VmLck: 0 kB
VmHWM: 34498692 kB
VmRSS: 1359616 kB
VmData: 2164424 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 4296 kB

",

at k=1,

"VmPeak: 48898748 kB
VmSize: 2226456 kB
VmLck: 0 kB
VmHWM: 43931232 kB
VmRSS: 1368952 kB
VmData: 2173676 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 4320 kB

",

at k=2,

"VmPeak: 58133264 kB
VmSize: 40184236 kB
VmLck: 0 kB
VmHWM: 53270568 kB
VmRSS: 39326660 kB
VmData: 40131456 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 78664 kB

",

Now it becomes serious.

When state=28000, my process stopped after running several steps. Just before it stopped,

"
VmPeak: 100349956 kB
VmSize: 74397700 kB
VmLck: 0 kB
VmHWM: 91141796 kB
VmRSS: 65545192 kB <====== this is quite large.
VmData: 74329900 kB
VmStk: 84 kB
VmExe: 568 kB
VmLib: 48336 kB
VmPTE: 149148 kB

"
I then restarted my program from the point where it stopped. Even with all the data loaded into memory, the memory usage is much smaller,

"
VmPeak: 51814344 kB <===== this seems to be the maximum memory my code should use when state=28000.
VmSize: 43797400 kB
VmLck: 0 kB
VmHWM: 51218832 kB
VmRSS: 43201932 kB
VmData: 43729600 kB
VmStk: 84 kB
VmExe: 568 kB
VmLib: 48336 kB
VmPTE: 86328 kB
"

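To put a number on the difference between the two state=28000 snapshots, the VmRSS values can be subtracted directly. The sketch below only restates figures already shown above; the constant names are illustrative.

```cpp
// VmRSS values copied from the two state=28000 snapshots, in kB.
constexpr long long kRssBeforeStopKb = 65545192;    // just before the process stopped
constexpr long long kRssAfterRestartKb = 43201932;  // after restarting with all data loaded

// Resident memory that the restarted run does not need.
constexpr long long kExcessKb = kRssBeforeStopKb - kRssAfterRestartKb;

// /proc reports kB as 1024-byte units, so 1024 * 1024 kB make one GiB.
constexpr double kExcessGiB = kExcessKb / (1024.0 * 1024.0);

static_assert(kExcessKb == 22343260, "difference in kB");
static_assert(kExcessGiB > 21.3 && kExcessGiB < 21.4, "roughly 21.3 GiB");
```

So the long-running process held roughly 21 GiB more resident memory than a fresh run needs at the same point.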
I use Linux 2.6.18 x86_64 with OpenMP. Compiler flags:

"-O3 -openmp -xSSE4.2"

Link flags: "-lmkl_lapack -lmkl_intel_lp64 -lmkl_core -lmkl_sequential"



Do you have any ideas or suggestions?

Thank you very much!