Greetings,
I'm experiencing a random malloc error in d_commit_trig_transform. It's persistent, but happens at different times during the code execution as this routine is called repeatedly. I'm currently testing on a Mac using the latest available Intel C++ compiler and MKL versions. The routine that contains the d_commit_trig_transform call is given below. The error occurs regardless of the number of cores used, but usually after a few thousand calls. Does anything look suspicious in the code below? Any advice would be appreciated.
void TwoDCylRZPotSolver::RHSVectorDST()
{
    /*
    This method performs the discrete sine transform of the first kind (DST-I) on
    rhsvector in preparation to solve the linear tridiagonal system. The transform
    is performed in chunk sizes of nz. Due to the manner in which the DST is
    calculated, an input array (a) of size nz+2 must be used with a[0]=a[nz+1]=0,
    and a[1 to nz]=data. A normalization factor of sqrt(2/(nz+1)) must be applied
    when copying the transformed data back into rhsvector.
    */
    double normfac=sqrt(2/double(nz+1));
    #pragma omp parallel for
    for(int i=0; i<nrad; i++)
    {
        int error, ipar[128],n=nz+1,tt_type=0;
        double dpar[5*(nz+2)/2+2];
        DFTI_DESCRIPTOR_HANDLE handle = 0; //data structures used in transform
        double datatemp[nz+2];
        datatemp[0]=0;
        datatemp[nz+1]=0;
        d_init_trig_transform(&n,&tt_type,ipar,dpar,&error);
        d_commit_trig_transform(datatemp,&handle,ipar,dpar,&error);
        //copy data from rhsvector
        for(int j=0; j<nz; j++)
            datatemp[j+1]=rhsvector[i*nz+j];
        //perform transformation
        d_backward_trig_transform(datatemp,&handle,ipar,dpar,&error);
        //copy transformed data back to rhsvector
        for(int j=0; j<nz; j++)
            rhsvector[i*nz+j]=normfac*datatemp[j+1];
        free_trig_transform(&handle,ipar,&error);
        if(error != 0)
            cout<<"Error = "<<error<<" in free_trig_transform in method RHSVectorDST."<<endl;
    }
}
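For cross-checking the MKL output on a small chunk, the transform described in the method's comment can also be computed by direct summation. The following is only a minimal sketch, assuming the intended result is the orthonormal DST-I (which is what the sqrt(2/(nz+1)) factor suggests) and that an O(nz^2) cost is acceptable for testing. The name naiveDST1 is hypothetical and is not part of MKL or of the original code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of MKL): orthonormal DST-I of one
// chunk of length nz, computed by direct summation in O(nz^2).
std::vector<double> naiveDST1(const std::vector<double>& data)
{
    const std::size_t nz = data.size();
    const double pi = std::acos(-1.0);
    const double normfac = std::sqrt(2.0 / double(nz + 1));

    // Padded copy as described in the comment: a[0] = a[nz+1] = 0, a[1..nz] = data.
    std::vector<double> a(nz + 2, 0.0);
    for (std::size_t j = 0; j < nz; j++)
        a[j + 1] = data[j];

    std::vector<double> out(nz);
    for (std::size_t k = 1; k <= nz; k++) {
        double sum = 0.0;
        for (std::size_t j = 1; j <= nz; j++)
            sum += a[j] * std::sin(pi * double(j * k) / double(nz + 1));
        out[k - 1] = normfac * sum;  // same normalization as in RHSVectorDST
    }
    return out;
}
```

With this normalization the transform is its own inverse, so applying naiveDST1 twice should return the input up to rounding. That gives a quick sanity check that does not depend on MKL.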
We haven't seen such an issue with this version of MKL. Could you provide a standalone example so that we can reproduce and investigate the problem on our side?
Hello Gennady,
Thanks for the reply. I constructed a sample case (attached here), but it doesn't seem to fail. However, in the code snippet I originally posted, almost all of the variables are created within the for-loop scope, so the sample isn't very different. One additional test I tried in the original code was to comment out the free_trig_transform() call. With that change the error no longer occurs, but the memory usage grows continually, as expected. An example of the error output:
WI_EIBT(57555,0x101d86dc0) malloc: Incorrect checksum for freed object 0x7fd8014064e8: probably modified after being freed.
Corrupt value: 0xbd9018dc3302c455
WI_EIBT(57555,0x101d86dc0) malloc: *** set a breakpoint in malloc_error_break to debug
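A message like this generally means that heap memory was written after it was freed, or that a write ran past the end of a neighbouring block. One way to narrow that down is to place guard words around the work arrays and check them after each library call. The sketch below is only an illustration under that assumption. GuardedBuffer is a hypothetical helper, not an MKL facility, and it only detects writes that land in the guard region directly before or after the array.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical debugging aid: a work array surrounded by guard words.
// Any write that strays into the guards can be detected with intact().
template <typename T>
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t n, std::size_t guard = 16)
        : n_(n), guard_(guard), storage_(n + 2 * guard, pattern())
    {
        // Zero the usable part; the guards keep the fill pattern.
        for (std::size_t i = 0; i < n_; i++)
            storage_[guard_ + i] = T();
    }

    // Pointer to the usable part, suitable for passing to a C-style API.
    T* data() { return storage_.data() + guard_; }
    std::size_t size() const { return n_; }

    // True if no guard word before or after the usable part was modified.
    bool intact() const
    {
        for (std::size_t i = 0; i < guard_; i++) {
            if (storage_[i] != pattern()) return false;
            if (storage_[guard_ + n_ + i] != pattern()) return false;
        }
        return true;
    }

private:
    // Fill value for the guards; exactly representable in int and double.
    static T pattern() { return static_cast<T>(1515870810); }

    std::size_t n_;
    std::size_t guard_;
    std::vector<T> storage_;
};
```

In the routine above, ipar and dpar could be declared as GuardedBuffer<int> and GuardedBuffer<double>, with data() passed to the transform calls and intact() checked after each one. A failed check would point at the call that wrote outside the array. A passing check does not rule out a use-after-free elsewhere.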
Thanks for the example. Does the problem occur only on specific CPU and OS types?
I see no problem on my side: win64, statically linked with MKL 2020, OpenMP linking, AVX2 system.
Starting loop #1
Starting loop #2
Starting loop #3
Starting loop #4
....
Starting loop #1353966
Starting loop #1353967
Starting loop #1353968
Starting loop #1353969
Starting loop #1353970
The full code fails (after some time) on both platforms I've tested, Mac (MKL 2020) and Linux (MKL 2019), dynamically linked with OpenMP. A more interesting behavior I've just found: if I declare ipar as int* ipar = new int[128] and DON'T delete it at the end of the loop, the code runs (with the expected memory growth) until I kill it. If I "delete[] ipar" at the end of the loop, the code fails as before. ipar should not be touched by anything outside of the loop it's declared in.
In case you have the chance to try the full code, I'm attaching it here. WI_EIBT.cpp is the main file. Once compiled you place it in the same directory as the other files included in the archive (*.config, *.dat). Execute the code via "./WI_EIBT WI_EIBT_100k_SC_dt2_6480V.config". It will create a directory named "WI_EIBT_100k_SC_dt2_6480V" and start writing multiple datafiles in that directory and will also print status updates to the terminal. It errors shortly after starting.
