I'm experiencing a random malloc error in d_commit_trig_transform. It's persistent, but happens at different times during the code execution as this routine is called repeatedly. I'm currently testing on a Mac using the latest available Intel C++ compiler and MKL versions. The routine that contains the d_commit_trig_transform call is given below. The error occurs regardless of the number of cores used, but usually after a few thousand calls. Does anything look suspicious in the code below? Any advice would be appreciated.
This method performs the discrete sine transform of the first kind (DST-I) on
rhsvector in preparation to solve the linear tridiagonal system. The transform
is performed in chunk sizes of nz. Due to the manner in which the DST is
calculated, an input array (a) of size nz+2 must be used with a[0]=a[nz+1]=0,
and a[1 to nz]=data. A normalization factor of sqrt(2/(nz+1)) must be applied
when copying the transformed data back into rhsvector.
#pragma omp parallel for
for(int i=0; i<nrad; i++)
int error, ipar,n=nz+1,tt_type=0;
DFTI_DESCRIPTOR_HANDLE handle = 0; //data structures used in transform
//copy data from rhsvector
for(int j=0; j<nz; j++)
//copy transformed data back to rhsvector
for(int j=0; j<nz; j++)
if(error != 0)
cout<<"Error = "<<error<<" in free_trig_transform in method RHSVectorDST."<<endl;
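For reference, the padding and normalization described in the comment at the top of the routine can be checked independently of MKL. The sketch below is a naive O(nz^2) DST-I written from the textbook definition, not the MKL call sequence. The function name dst1_normalized is hypothetical, and the unnormalized sum it starts from is an assumption; MKL's own scaling convention may differ. With the sqrt(2/(nz+1)) factor the transform is its own inverse, which gives a simple sanity check for one chunk.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference routine: naive DST-I of one chunk of length nz.
// Layout follows the routine's comment: a has nz+2 entries,
// a[0] = a[nz+1] = 0 and a[1..nz] = data.
std::vector<double> dst1_normalized(const std::vector<double>& data) {
    assert(!data.empty());
    const std::size_t nz = data.size();
    const double pi = std::acos(-1.0);

    // Padded work array with zero end points.
    std::vector<double> a(nz + 2, 0.0);
    for (std::size_t j = 0; j < nz; ++j) a[j + 1] = data[j];

    // Normalization applied when copying the result back out.
    const double norm = std::sqrt(2.0 / static_cast<double>(nz + 1));

    std::vector<double> out(nz, 0.0);
    for (std::size_t k = 1; k <= nz; ++k) {
        double sum = 0.0;
        for (std::size_t j = 1; j <= nz; ++j) {
            sum += a[j] * std::sin(pi * static_cast<double>(j) *
                                   static_cast<double>(k) /
                                   static_cast<double>(nz + 1));
        }
        out[k - 1] = norm * sum;
    }
    return out;
}
```

Applying dst1_normalized twice to the same chunk should return the original data to within rounding error, so it can serve as a slow cross-check for a single chunk of rhsvector.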
Thanks for the reply. I constructed a sample case (attached here) but it doesn't seem to fail. However, in the code snippet I originally posted, almost all of the variables are created in the for loop scope, so it isn't very different. One additional test I tried in the original code was to comment out the free_trig_transform() call. If I do that, the error also doesn't occur, but the memory usage grows continually, as expected. An example error output is:
WI_EIBT(57555,0x101d86dc0) malloc: Incorrect checksum for freed object 0x7fd8014064e8: probably modified after being freed.
Corrupt value: 0xbd9018dc3302c455
WI_EIBT(57555,0x101d86dc0) malloc: *** set a breakpoint in malloc_error_break to debug
I see no problem on my side: win64, statically linked with MKL 2020, OpenMP linking.
Starting loop #1
Starting loop #2
Starting loop #3
Starting loop #4
Starting loop #1353966
Starting loop #1353967
Starting loop #1353968
Starting loop #1353969
Starting loop #1353970
The full code fails (after some time) on the two platforms I've tested on, Mac (MKL 2020) and Linux (MKL 2019), dynamically linked with OpenMP. Another interesting behavior I've just found is that if I declare ipar as int* ipar = new int and DON'T delete it at the end of the loop, the code runs (with the expected memory growth) until I kill it. If I "delete ipar" at the end of the loop, then the code fails as before. ipar should not be touched by anything outside of the loop it's declared in.
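One way to test whether something is writing outside a per-iteration buffer such as ipar is to surround it with guard words and check them before the buffer is released. The following is only a debugging sketch under that assumption; GuardedInts, kGuard and kCanary are hypothetical names, not part of MKL or of the posted code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical debugging constants: guard width and marker value.
constexpr std::size_t kGuard = 8;
constexpr int kCanary = 0x5AFEC0DE;

// Hypothetical helper: an int buffer with guard words on both sides
// of the usable payload.
class GuardedInts {
public:
    explicit GuardedInts(std::size_t count)
        : count_(count), storage_(count + 2 * kGuard, kCanary) {
        // Zero the payload; the guard words keep the marker value.
        for (std::size_t i = 0; i < count_; ++i) storage_[kGuard + i] = 0;
    }

    // Pointer to the usable payload, to be passed where an int* is expected.
    int* data() { return storage_.data() + kGuard; }
    std::size_t size() const { return count_; }

    // Returns true if no guard word on either side has been overwritten.
    bool intact() const {
        for (std::size_t i = 0; i < kGuard; ++i) {
            if (storage_[i] != kCanary) return false;
            if (storage_[kGuard + count_ + i] != kCanary) return false;
        }
        return true;
    }

private:
    std::size_t count_;
    std::vector<int> storage_;
};
```

Inside the loop, the buffer's data() pointer would stand in for ipar, and intact() would be checked just before the end of each iteration. A failed check points to a write just before or after the buffer; it cannot detect writes that land further away or writes made after the buffer has been freed.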
In case you have the chance to try the full code, I'm attaching it here. WI_EIBT.cpp is the main file. Once compiled you place it in the same directory as the other files included in the archive (*.config, *.dat). Execute the code via "./WI_EIBT WI_EIBT_100k_SC_dt2_6480V.config". It will create a directory named "WI_EIBT_100k_SC_dt2_6480V" and start writing multiple datafiles in that directory and will also print status updates to the terminal. It errors shortly after starting.