I am doing matrix calculations with the newest MKL, icc 11.1, and OpenMP.
The calculation is iterative. After each iteration, the memory should be freed.
But I found that if the matrices are large (occupying more than 40 GB of memory), some memory
is not freed, even after calling MKL_Free_Buffers(). The unfreed memory is about 40 GB
and does not seem to grow with the number of iterations.
I have checked my code with valgrind, and no memory leak was found.
6 Replies
To check the amount of memory used by MKL routines, you can do the following:
int nBuffers;
AllocatedBuffersBefore = MKL_Mem_Stat(&nBuffers);
MKL_Free_Buffers();
AllocatedBuffersAfter = MKL_Mem_Stat(&nBuffers);
if (AllocatedBuffersAfter > 0) ==> memory leakage.
Could you please check this and let us know?
--Gennady
The CPU is an "Intel Xeon CPU X5670 @ 2.93GHz", with 96 GB of memory.
Thanks for the info,
but what about AllocatedBuffersBefore and AllocatedBuffersAfter around mkl_free_buffers()?
--Gennady
I checked it:
Memory used by mkl after free is 0 byte in 0 buffers.
A strange problem.
Thank you!
I just wonder whether this behavior depends on the size of the problem.
Say, what happens if the problem size is 4 GB rather than 40?
--Gennady
Thank you, Gennady.
I will describe my problem in more detail.
In my program there is a global array, double Coeff[48][48][48][48], and a matrix class Matrix. The code can be simplified as follows:
....
#include
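double Coeff[48][48][48][48];  // global coefficient table
int main(void)
{
    int state = 12000;
    for (int index = 0; index < 64; index++)
    {
        for (int k = 0; k < 48; k++)
        {
            Matrix* mt = new Matrix(state, k, ....);
            mt->diagonalizeMatrix();  // mt is a pointer, so use ->
            mt->saveMatrixToDisk();
            delete mt;  // all memory owned by mt should be released here
        }
        if (index % 8 == 0) state += 4000;
    }
    return 0;
}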
The Matrix class is quite complicated. It diagonalizes a Hamiltonian of size (state*2*state*2)^2. Because the Hamiltonian is block-structured, the matrices actually handled are much smaller.
After "delete mt", the memory used is expected to be just a little larger than 48*48*48*48*8 = 42467328 bytes (about 42 MB).
The constructor "Matrix(....)" reads and prints the memory information from "/proc/self/status" before loading data from disk.
When state=12000, here are some samples. For example, at k=10,
"VmPeak: 28396820 kB
VmSize: 911108 kB
VmLck: 0 kB
VmHWM: 25515396 kB
VmRSS: 53640 kB <===================== this is just a little larger than 42 MB.
VmData: 858328 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 1732 kB
",
at k=20,
VmPeak: 28396820 kB
VmSize: 911108 kB
VmLck: 0 kB
VmHWM: 25515396 kB
VmRSS: 53640 kB
VmData: 858328 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 1732 kB
When state=16000, at k=0,
"
VmPeak: 28396820 kB
VmSize: 911108 kB
VmLck: 0 kB
VmHWM: 26683804 kB
VmRSS: 53648 kB
VmData: 858328 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 1732 kB
"
at k=1,
"
VmPeak: 34412208 kB
VmSize: 1831832 kB
VmLck: 0 kB
VmHWM: 31453736 kB
VmRSS: 974220 kB
VmData: 1779052 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 3528 kB
",
at k=47,
"VmPeak: 35346244 kB
VmSize: 1994660 kB
VmLck: 0 kB
VmHWM: 34490032 kB
VmRSS: 1137164 kB
VmData: 1941880 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 3860 kB
",
When state=20000,
at k=0,
"
VmPeak: 35446708 kB
VmSize: 2217204 kB
VmLck: 0 kB
VmHWM: 34498692 kB
VmRSS: 1359616 kB
VmData: 2164424 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 4296 kB
",
When state=24000,
at k=0,
"VmPeak: 35446708 kB
VmSize: 2217204 kB
VmLck: 0 kB
VmHWM: 34498692 kB
VmRSS: 1359616 kB
VmData: 2164424 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 4296 kB
",
at k=1,
"VmPeak: 48898748 kB
VmSize: 2226456 kB
VmLck: 0 kB
VmHWM: 43931232 kB
VmRSS: 1368952 kB
VmData: 2173676 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 4320 kB
",
at k=2,
"VmPeak: 58133264 kB
VmSize: 40184236 kB
VmLck: 0 kB
VmHWM: 53270568 kB
VmRSS: 39326660 kB
VmData: 40131456 kB
VmStk: 84 kB
VmExe: 536 kB
VmLib: 32612 kB
VmPTE: 78664 kB
",
Now it becomes serious.
When state=28000, my process stops after running several steps. Just before it stopped,
"
VmPeak: 100349956 kB
VmSize: 74397700 kB
VmLck: 0 kB
VmHWM: 91141796 kB
VmRSS: 65545192 kB <====== this is quite large.
VmData: 74329900 kB
VmStk: 84 kB
VmExe: 568 kB
VmLib: 48336 kB
VmPTE: 149148 kB
"
I then restarted my program from the point where it stopped. Even with all the data loaded into memory, the memory usage is much smaller,
"
VmPeak: 51814344 kB <===== this seems to be the maximum memory my code should use when state=28000.
VmSize: 43797400 kB
VmLck: 0 kB
VmHWM: 51218832 kB
VmRSS: 43201932 kB
VmData: 43729600 kB
VmStk: 84 kB
VmExe: 568 kB
VmLib: 48336 kB
VmPTE: 86328 kB
"
I use Linux 2.6.18 x86_64 with OpenMP. Compiler flags:
"-O3 -openmp -xSSE4.2"
Link flags: "-lmkl_lapack -lmkl_intel_lp64 -lmkl_core -lmkl_sequential"
Do you have any ideas or suggestions?
Thank you very much!