Intel® oneAPI Math Kernel Library

Problems with MKL using new versions of Intel compiler

DanielRuiz
Beginner

Hi, 

I have a code that gives different results when compiled with different versions of the Intel compiler. I have seen the problem in several similar codes, and I attach the most reduced version that still reproduces it. The code is written in Fortran and uses the Math Kernel Library (MKL) to perform Fourier transforms while integrating a partial differential equation.

How to identify the problem: When executed, the code writes two columns of data. The first is the integration time and the second is the value of one element in the middle of the array being integrated. In the failing cases, the second column grows rapidly (accumulating error) and ends in NaNs. My interpretation is that the integration error grows dramatically until every value of the array is NaN, which shows up in the value written to the second column.
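
As a minimal sketch of how the failure can be spotted automatically, the following shell function scans two-column output like the one described above and reports the first row whose second column is no longer finite. The function name `first_bad_row` is hypothetical, and the match assumes that non-finite values are printed as text containing "NaN" or "Inf"; that is an assumption about the program's output format.

```shell
# Hypothetical helper: report the first row of two-column output
# (time, value) whose second column is NaN or Infinity.
first_bad_row() {
    awk '
        # Match on the text of column 2 so the check does not depend on
        # how a particular awk compares NaN values numerically.
        tolower($2) ~ /nan|inf/ {
            print "line " NR ": t = " $1 ", value = " $2
            found = 1
            exit
        }
        END { if (!found) print "all values finite" }
    '
}

# Usage (assumes the program is built as myprogram.x):
#   ./myprogram.x | first_bad_row
```

Comparing the reported line number between compiler versions gives a quick measure of how early the blow-up starts in each build.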

The error appears with versions of the ifort compiler after 2019. According to the release notes, MKL includes the following update from version 2019 on:

 - Improved performance for non-power of 2 sizes on Intel® AVX-512

which may be related to the problem. Moreover, changing the array sizes to values that are not powers of 2 produced correct results in some cases. This suggests the problem is related to the Fourier transforms/MKL.

The problem seems to be independent of compilation options.

Compilation of the MKL DFTI module:

ifort -c /opt/intel/oneapi/mkl/latest/include/mkl_dfti.f90 -o mkl_dfti.o

Different compilations used:

 - ifort -ipo -O3 -no-prec-div -fp-model fast=2 -march=sandybridge -mtune=core-avx2 -o myprogram.x -mkl code.f90

 - ifort -fast -o myprogram.x -mkl code.f90

 - ifort -o myprogram.x -mkl code.f90


Summary of the code:

1 - Variables declaration

2 - Fourier transform settings

3 - Compute temporal propagators (l_prop, nl_prop)

4 - Initialize 2D data array (e) 

    The array is computed as e(i) = mod(i,3) + 1d0. Usually e contains noisy data, but with this initialization no external libraries are needed and the problem still appears.

5 - Fourier transform (e --> e_fourier)

6 - Integration loop

    6.1 Nonlinear term computed (field = -e**3)

    6.2 Fourier transform (field --> field_fourier)

    6.3 Matrix computed at t+dt in Fourier (e_fourier = l_prop*e_fourier + nl_prop*field_fourier)

    6.4 Fourier transform (e_fourier --> e)

 

Possible origin of the problem related to the calculations performed: Since all values of the array approach 3 in real space (e), in Fourier space all values of the array (e_fourier) become very close to zero except the homogeneous mode. The problem may therefore be related to doing calculations with small values close to the precision limit in Fortran. Optimizations in the newer compiler versions that increase speed at the cost of precision might then trigger it.

Potentially related: I have also tried setting the environment variables described in the article Conditional Numerical Reproducibility, which solved the problem when a smaller array size was used (32x16; this can be changed in lines 3 and 4 of the attached code). For 32x16 the problem appears to depend on the machine used: AMD processors produce correct results while Intel processors do not. The only Intel machine that produced correct results did not support AVX-512. However, for a bigger array size (256x256) the problem persisted regardless of the machine used, and I only got correct results with an old version of ifort (version 2013.0.079).
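
For reference, a minimal sketch of how a Conditional Numerical Reproducibility setting can be applied per run. `MKL_CBWR` is the environment variable that controls the MKL code branch; the helper name `run_with_cbwr` is hypothetical, and which branch values a given MKL version and CPU accept should be checked against the MKL documentation. The values actually used in the tests above are not stated, so the ones in the usage comment are only examples.

```shell
# Hypothetical helper: run a command with a given MKL CNR code branch.
# $1 is the value for MKL_CBWR (e.g. COMPATIBLE, AVX2); the rest is the command.
run_with_cbwr() {
    branch=$1
    shift
    env MKL_CBWR="$branch" "$@"
}

# Usage: compare the last output line under different branches.
#   for b in COMPATIBLE AVX2 AVX512; do
#       echo "MKL_CBWR=$b: $(run_with_cbwr "$b" ./myprogram.x | tail -n 1)"
#   done
```

Setting the variable per command, rather than exporting it in the shell session, keeps each run's setting explicit when several branches are compared side by side.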

I haven't been able to identify the origin of the problem any further or to find a solution. It would be really helpful to know whether there is something wrong with the code I use or with the compilation, or whether it is a problem with the MKL.

 

Best regards,
Daniel 

4 Replies
VidyalathaB_Intel
Moderator

Hi,


Thanks for reaching out to us.


The issue is reproducible from our end also.

We are working on this issue internally and will get back to you soon.


Regards,

Vidya.


Khang_N_Intel
Employee

Hi Daniel,

I will take a look at this issue and will let you know how we will proceed next.

Best regards,

Khang


Khang_N_Intel
Employee

Hi Daniel,


I confirmed that this issue does occur when linking to oneMKL 2021.4. The issue has been escalated to the developers.

We will let you know when we have found the root cause.


Best regards,

Khang


Khang_N_Intel
Employee

Hi Daniel,


The parameter field = -e**3 passed to DftiComputeForward(punt_f,field,field_fourier) leads to floating-point overflow, which later results in NaN (e.g., Inf-Inf).


This is an issue in your code, not in oneMKL FFT.
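
One way to check for this kind of overflow, offered as a sketch rather than a confirmed procedure for this code: build an unoptimized variant with the Intel Fortran options `-fpe0` (trap invalid, divide-by-zero, and overflow exceptions) and `-traceback`, so that the run stops at the first trapped operation instead of carrying Inf and NaN forward. The file name `code.f90` and the `-mkl` option are taken from the original post; the output name `myprogram_fpe.x` and the variable `FPE_FLAGS` are hypothetical.

```shell
# Diagnostic flags: no optimization, debug info, source traceback,
# and trapping of invalid / divide-by-zero / overflow exceptions.
FPE_FLAGS="-O0 -g -traceback -fpe0"

# Build and run only if ifort is available on this machine.
if command -v ifort >/dev/null 2>&1; then
    # $FPE_FLAGS is left unquoted on purpose so it splits into separate options.
    ifort $FPE_FLAGS -o myprogram_fpe.x -mkl code.f90 && ./myprogram_fpe.x
else
    echo "ifort not found; skipping diagnostic build" >&2
fi
```

The trap may fire inside a library call rather than in user code, so the reported location is a starting point for the search, not necessarily the statement that first produced the large values.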


Best regards,

Khang

