Hi,
I updated my Composer from composer_xe_2011_sp1.7 to sp1.9, and my previously working code now fails with memory errors.
The first errors occurred when deleting large arrays (for CRS-stored matrices). The delete[] statement caused the error:
if (values) delete []values; values = NULL;
(I always NULL my deleted pointers/arrays.)
Playing around with ulimit (stack size) and KMP_STACKSIZE did not help, but moved the error from my own routine to an MKL subroutine:
0x00002aaaafc560a4 in mkl_spblas_lp64_dcsr0tg__c__mvout_par () from /opt/common/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64/libmkl_mc3.so
Unfortunately, I cannot provide a "minimal working example" of this problem.
Any ideas? Or shall I switch back to sp1.7? Btw, sp1.8 does not work either. Every time I try a new version, I get new problems, usually somehow related to PARDISO...
Somewhere I read that in a threaded environment, releasing (shared) memory is sometimes a problem. I removed ALL OpenMP clauses, "omp.h" and the corresponding compiler flags. No change.
Intel Compiler Version 1210, Build Date 20120212, compatible with GNU Compiler Version 4.5.2
Intel Math Kernel Library Version 10.3.9 Product Build 20120131 for Intel 64 architecture applications
AVX-optimizations : enabled.
Processor optimization : Intel Core i7 Processor
Any idea is appreciated!
15 Replies
What you are observing is typical of corruption of the heap allocation header(s) of the array "values" (and/or of the objects/arrays deleted in dtors, if "values" is an array of objects). Stack size is not the issue here: you do not delete stack variables.
This assumes values was properly allocated (as opposed to uninitialized junk in the pointer).
Try compiling with subscript-out-of-bounds (and uninitialized-variable) runtime checks enabled. If that doesn't expose anything, then try valgrind or something equivalent.
Jim Dempsey
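As a rough illustration of the kind of error such checks are meant to catch: a write just past the end of a heap array can overwrite the allocator's bookkeeping and only show up much later, at delete[]. The sketch below uses a hypothetical CheckedArray wrapper (the name and interface are assumptions, not part of the original code or of MKL) that throws on an out-of-range index instead of writing to foreign memory.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical debugging aid: an array whose element access is range-checked.
template <typename T>
class CheckedArray {
public:
    explicit CheckedArray(std::size_t n) : data_(n) {}

    // at() throws std::out_of_range instead of silently touching foreign memory.
    T& operator[](std::size_t i) { return data_.at(i); }
    const T& operator[](std::size_t i) const { return data_.at(i); }

    std::size_t size() const { return data_.size(); }

    // Raw pointer for C-style library routines; accesses through it are unchecked.
    T* raw() { return data_.data(); }

private:
    std::vector<T> data_;
};

// Returns true if a write to index i was rejected as out of bounds.
inline bool writeIsRejected(CheckedArray<double>& a, std::size_t i) {
    try {
        a[i] = 1.0;
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Such a wrapper only covers accesses made through it; writes performed by a library through the raw pointer still need a tool like valgrind.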
Quoting fabi.k
...
The delete[]-command caused the error:
if (values) delete []values; values = NULL;
Any idea is appreciated!
Two possible reasons are as follows:
1. The variable/member 'values' has already been released
2. Memory corruption happened earlier (I agree with Jim)
A release like this is better:
...
if( pSomeData != NULL )
{
delete [] pSomeData;
pSomeData = NULL;
}
...
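The release pattern above can be wrapped in a small helper so that it is written only once. This is a minimal sketch; releaseArray is a hypothetical name, not something from the original code. Since delete[] on a null pointer is a no-op, the explicit NULL test is not strictly needed.

```cpp
#include <cstddef>

// Hypothetical helper: frees an array allocated with new[] and nulls the pointer.
// delete[] on a null pointer does nothing, so calling this twice is harmless.
template <typename T>
void releaseArray(T*& p) {
    delete[] p;
    p = NULL;
}
```

Note that this only protects against a double delete through the same pointer variable. It does not help if a second pointer still refers to the freed block, or if the heap was corrupted beforehand.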
Hello,
As Jim and Sergey already mentioned, this is very likely a dangling pointer issue that may have been there for quite some time; a small change in the build system has finally exposed it.
I'm not excluding other root causes, but it's better to analyze invalid pointers first.
Hence I'd recommend using Intel Inspector XE 2011 to run a memory analysis. Afterwards, or alternatively, you can debug the problem manually using the Intel Debugger (IDB) or GDB.
In other cases it also helps to reduce the problem to a smaller reproducer.
Best regards,
Georg Zitzlsberger
Thank you very much for your suggestions and help, I will try and report here later.
My compiler warning settings are VERY pedantic; in fact I enabled almost everything possible... in some older versions I even got warnings in your own MKL headers ;-)
CFLAGS_ICPC12_WARNINGS = -w2 -Wall -Wcheck -Wabi -Wcomment -Wdeprecated -Wformat -Wformat-security -Wmain -Wmissing-declarations -Wmissing-prototypes -Wnon-virtual-dtor -Wpointer-arith -Wremarks -Wreturn-type -Wreorder -Wshadow -Wstrict-aliasing -Wstrict-prototypes -Wsign-compare -Wtrigraphs -Wuninitialized -Wunused-function -Wunused-variable -Wwrite-strings -std=c++0x
Usually I'm very disciplined about uninitialized pointers and the like, and valgrind did not find any "related" memory leaks so far. I will check out this Intel Inspector XE 2011 thing, but I cannot imagine it will show more than valgrind.
Quoting fabi.k
...
My compiler warnings settings are VERY pedantic, in fact I enabled almost everything possible... in some older versions I even got warnings in your own MKL-headers ;-)
...
Even with many warnings enabled, the compiler cannot detect or eliminate a logical error that results in a crash of the application.
>>I always NULL my deletes pointers/arrays.
And what about your uninitialized pointers/arrays?
And, when the pointers/arrays are not NULL, are you making an incorrect assumption as to the size of the allocation(s)?
Jim Dempsey
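One way to test the assumption about allocation sizes is to validate the CRS arrays before handing them to any library routine. The sketch below is an illustration only: crsIsConsistent and its parameters are hypothetical, it uses std::vector instead of the raw arrays from the original code, and it assumes the 4-array layout (values, columns, pointerB, pointerE) with a selectable index base.

```cpp
#include <cstddef>
#include <vector>

// Checks a 4-array CRS description for internal consistency.
// 'base' is 0 for C-style and 1 for Fortran-style indexing.
// Returns false if any row range or column index would lead outside the arrays.
inline bool crsIsConsistent(const std::vector<int>& pointerB,
                            const std::vector<int>& pointerE,
                            const std::vector<int>& columns,
                            std::size_t valuesSize,
                            std::size_t nRows, int nCols, int base) {
    if (pointerB.size() < nRows || pointerE.size() < nRows) return false;
    if (columns.size() != valuesSize) return false;
    const long nnz = static_cast<long>(columns.size());
    for (std::size_t row = 0; row < nRows; ++row) {
        const long begin = static_cast<long>(pointerB[row]) - base;
        const long end = static_cast<long>(pointerE[row]) - base;
        // A row must describe a valid, non-negative range inside the arrays.
        if (begin < 0 || end < begin || end > nnz) return false;
        for (long k = begin; k < end; ++k) {
            const long col =
                static_cast<long>(columns[static_cast<std::size_t>(k)]) - base;
            if (col < 0 || col >= nCols) return false;
        }
    }
    return true;
}
```

If such a check fails right before a sparse routine is called, the out-of-bounds access would originate from the matrix description rather than from the library.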
@Sergey: thx, but I'm aware of that.
Uninitialized pointers in the code are - imho - not the problem. Things go wrong when I start using PARDISO for the second time. Without this, everything is fine.
The sequence leading to the error is as follows:
- (huge) memory allocation (pointer=new...)
- (huge) memory release (delete[] and pointer=NULL)
- (huge) memory allocation (pointer=new...)
- (huge) memory release (delete[] and pointer=NULL)
...
- (huge) memory allocation (pointer=new...)
- PARDISO
- (huge) memory release (delete[]... *error*)
7ffff6adf000-7ffff6cdf000 ---p 00d03000 00:18 61539763 /opt/common/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64/libmkl_intel_thread.so
The pointers are private members of another class, which is NOT connected to PARDISO at all (or at least should not be). Of course I'm aware there could be logical errors, but I wouldn't ask here if I had not already spent days trying to resolve them.
@Jim:
What about that heap allocation thing?
This memory allocation/release happens in a method and is repeated a couple of times before PARDISO starts. But it is just in a method; no objects are deleted at this time.
The problem seems to be related to MKL 10.3.9, since builds with both g++ and the Intel compiler (Version 1210, Build Date 20120212) fail. Using MKL 10.3.7 (instead of 10.3.9 or 10.3.8), everything is fine.
Btw, the error does not occur (neither MKL 10.3.7, 8 nor 9) if I use
mkl_set_num_threads(1);
at the beginning.
OpenMP is not used (at least not by me, but I guess it is used somewhere inside MKL).
Doesn't that support my idea that "maybe something's wrong in MKL"?
Hello,
Yes, things seem to point towards MKL.
Would it be possible to provide a small reproducer? I'm aware that it means some (big) work on your side, but otherwise we're searching for a needle in a haystack. I highly appreciate your efforts!
Thank you & best regards,
Georg Zitzlsberger
Hello Georg,
this is really a lot of work, and a first cut-and-paste attempt to implement the sequence from above does not reproduce the error. I don't think I can provide a small reproducer: it would take days, or it would not be small, and I don't want to give away our code.
I'm switching back to 10.3.7 and hope this works for me.
Btw, it would be really nice to have a PARDISO interface like pardiso(pt, .... blah blah..., const pointerE, const values, const input, output).
"const" is a really helpful tool to avoid logical errors.
Best regards,
Fabian
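Until a library offers such an interface, a thin const-correct wrapper on the caller's side gives part of the benefit. The sketch below is only an illustration under stated assumptions: legacySolve is a made-up stand-in for a C-style routine with non-const parameters and is NOT the PARDISO signature; solveConst is likewise a hypothetical name.

```cpp
// Stand-in for a legacy C-style routine whose input arrays are declared
// non-const. Purely hypothetical; this is NOT the PARDISO signature.
// It writes 2 * values[i] into output[i] and leaves its inputs untouched.
inline void legacySolve(int* n, double* values, double* output) {
    for (int i = 0; i < *n; ++i) output[i] = 2.0 * values[i];
}

// Const-correct front end: the signature shows which arguments are inputs only.
// The const_cast is safe only as long as the wrapped routine really leaves
// those arrays unmodified, which has to be taken from its documentation.
inline void solveConst(const int n, const double* values, double* output) {
    int nCopy = n;  // scalar inputs can simply be copied
    legacySolve(&nCopy, const_cast<double*>(values), output);
}
```

The wrapper cannot stop the wrapped routine from writing through the pointer; it only keeps the rest of the program from doing so by accident.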
Quoting fabi.k
...The first errors occured when deleting large arrays (for CRS-stored matrices). The delete[]-command caused the error:
if (values) delete [] values; values = NULL;
...
Did you try commenting out the 'delete [] ...' part(s) of your code? If yes, did you still get any errors?
Best regards,
Sergey
[cpp]void MatrixCRS::reallocateMemory(const int _newDim, const int _newNonZeros) {
    // Releases commented out for this test.
    /* if (values!=NULL) delete[] values; values = NULL;
    if (columns!=NULL) delete[] columns; columns = NULL;
    if (pointerB3!=NULL) delete[] pointerB3; pointerB3 = NULL;
    if (pointerE!=NULL) delete[] pointerE; pointerE = NULL;*/

    // Estimate of the memory needed for the four CRS arrays.
    const long needed = (sizeof(REAL)+sizeof(MKL_INT))*_newNonZeros + 2*sizeof(MKL_INT)*_newDim;
    if (memcheck && !System::checkRAM(needed)) {
        cout << _MEMORYFEHLER << " name=" << getName() << endl;
        exit(EXIT_FAILURE);
    }

    try {
        values = new REAL[_newNonZeros];
        columns = new MKL_INT[_newNonZeros];
        pointerB3 = new MKL_INT[_newDim+1];
        pointerE = new MKL_INT[_newDim];
    }
    catch (exception& e) {
        cout << _CATCHIT(e) << "name=" << getName() << ", _newNonZeros=" << _newNonZeros << ", _newDim=" << _newDim << endl;
        throw;
    }
    nonZeros = _newNonZeros;
}[/cpp]
Like that?
First of all, it works (or at least the error has not occurred yet).
But: ??? Isn't NOT freeing allocated memory one of the DON'TS of C++? Of course I can try to minimize reallocation, for performance reasons. But shouldn't the example above work with the deletes? And with ALL versions of MKL, not only those before 10.3.8?
I'm not implementing vital ISS software, but it would be nice to know that the code block above does not affect other parts of my program, and is not itself affected by some strange MKL/OMP subroutines in versions after 10.3.7...
Thx for helping me out here.
(btw: Ubuntu 11.04, 24x Xeon X5660, 48 GB mem, 10% of mem in usage during typical computation)
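For comparison, the manual new[]/delete[] bookkeeping in reallocateMemory could be handed over to std::vector, which avoids both leaks and double deletes in the caller's own code. The sketch below is an assumption-laden illustration, not the original class: CrsStorage, Real and Index are hypothetical names standing in for MatrixCRS, REAL and MKL_INT, and unlike new REAL[n] the vectors zero-initialize their elements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the REAL and MKL_INT types used in the post.
typedef double Real;
typedef int Index;

// Sketch of CRS storage that owns its arrays through std::vector.
class CrsStorage {
public:
    CrsStorage() : nonZeros_(0) {}

    // Resizes all four arrays; no manual delete[] is needed.
    // May throw std::bad_alloc, like the new[] calls it replaces.
    void reallocate(std::size_t newDim, std::size_t newNonZeros) {
        values_.assign(newNonZeros, Real(0));
        columns_.assign(newNonZeros, Index(0));
        pointerB_.assign(newDim + 1, Index(0));
        pointerE_.assign(newDim, Index(0));
        nonZeros_ = newNonZeros;
    }

    // Raw pointers for C-style library calls; valid until the next reallocate().
    Real* values() { return values_.data(); }
    Index* columns() { return columns_.data(); }
    Index* pointerB() { return pointerB_.data(); }
    Index* pointerE() { return pointerE_.data(); }

    std::size_t nonZeros() const { return nonZeros_; }
    std::size_t dim() const { return pointerE_.size(); }

private:
    std::vector<Real> values_;
    std::vector<Index> columns_;
    std::vector<Index> pointerB_;
    std::vector<Index> pointerE_;
    std::size_t nonZeros_;
};
```

This does not protect against a library routine writing outside the arrays; it only removes the caller's own delete[] calls as a source of errors.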
Hello Fabian,
even though you might be aware of this already I'd like to mention it here for completeness:
Intel MKL Memory Management Software
Intel MKL has memory management software that controls memory buffers for the use by the library functions. New buffers that the library allocates when your application calls Intel MKL are not deallocated until the program ends. To get the amount of memory allocated by the memory management software, call the mkl_mem_stat() function. If your program needs to free memory, call mkl_free_buffers(). If another call is made to a library function that needs a memory buffer, the memory manager again allocates the buffers and they again remain allocated until either the program ends or the program deallocates the memory. This behavior facilitates better performance. However, some tools may report this behavior as a memory leak.
The memory management software is turned on by default. To turn it off, set the MKL_DISABLE_FAST_MM environment variable to any value or call the mkl_disable_fast_mm() function. Be aware that this change may negatively impact performance of some Intel MKL routines, especially for small problem sizes.
(from the Intel Math Kernel Library for Linux* OS users guide for 10.3.9)
Does it make sense for your example to call "mkl_free_buffers()" before deleting the arrays? Also, just for testing, do you see a change when setting $MKL_DISABLE_FAST_MM?
Best regards,
Georg Zitzlsberger
>>...
>>Like that?
Yes, and the purpose of that test is to verify that there are no problems in other parts of your code.
>>First of all, it works (or at least the error has not occured yet).
It seems to me that as soon as these pointers are passed to MKL functions you are no longer responsible for releasing them. Almost the same approach is used in COM programming.
>>But: ??? Isn't NOT freeing allocated memory one of the DON'TS of C++?
No. Of course the memory must be released. The question is who is responsible for this.
>>But shouldn't the upper example work with the deletes?
Yes, it should work if you don't use any MKL functions and don't pass any pointers to already allocated memory to any MKL functions.
Disabling the Intel MKL memory manager via mkl_disable_fast_mm() does not help; it only moves the error to an MKL multiplication routine. Thanks for the hint anyway! Good to know for debugging.
>> It seems to me that as soon as these pointers passed toMKL functions you are no longer
>> responsible for releasing them. Almost the same approach is used in COM programming.
So, once I have used my pointerE/pointerB3/etc. arrays in any (or some) MKL functions, somebody else takes care of releasing MY memory, but does not ask me WHEN this should happen? Did I get that right? (If yes, is there any documentation about that? Or is it that snippet about mkl_disable_fast_mm()?)
Btw, the error (or its pseudo-random behaviour) is NOT restricted to the machine I'm using, but it is restricted to MKL 10.3.8 and 10.3.9.