Solved: Re: Problems with MKL using new versions of Intel compiler

DanielRuiz · 10-06-2021

Hi,

I have a code that gives different results when it is built with different versions of the Intel compiler. I have seen the problem in several similar codes, and I attach the most reduced version that still reproduces it. The code is written in Fortran and uses the Math Kernel Library (MKL) for the Fourier transforms needed to integrate a partial differential equation.

How to identify the problem: When the code is executed, it writes two columns of data. The first is the integration time and the second is the value of one element in the middle of the array that is integrated in time. In certain cases the second column grows rapidly (accumulating error) and ends in NaNs. My interpretation is that the integration error increases dramatically until every value of the array is NaN, and that the value written in the second column reflects this.

The error appears for versions of the ifort compiler after 2019. The MKL release notes list the following update starting with version 2019:

- Improved performance for non-power of 2 sizes on Intel® AVX-512

which may be related to the problem. Moreover, changing the array sizes to values that are not a power of 2 produced correct results in some cases. This seems to indicate that the problem is related to the Fourier transforms/MKL.

The problem seems to be independent of compilation options.

Compilation of the MKL DFTI module:

ifort -c /opt/intel/oneapi/mkl/latest/include/mkl_dfti.f90 -o mkl_dfti.o

Different compilations used:

- ifort -ipo -O3 -no-prec-div -fp-model fast=2 -march=sandybridge -mtune=core-avx2 -o myprogram.x -mkl code.f90

- ifort -fast -o myprogram.x -mkl code.f90

- ifort -o myprogram.x -mkl code.f90

Summary of the code:

1 - Variables declaration

2 - Fourier transform settings

3 - Compute temporal propagators (l_prop, nl_prop)

4 - Initialize 2D data array (e)

The array is computed as e(i) = mod(i,3) + 1d0. Usually e contains noisy data, but with this deterministic initialization no external libraries are needed and the problem still appears.

5 - Fourier transform (e --> e_fourier)

6 - Integration loop

6.1 Nonlinear term computed (field = -e**3)

6.2 Fourier transform (field --> field_fourier)

6.3 Array advanced to t+dt in Fourier space (e_fourier = l_prop*e_fourier + nl_prop*field_fourier)

6.4 Inverse Fourier transform (e_fourier --> e)
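The structure of steps 3 to 6 can be sketched as follows. This is only an illustration in Python and not the attached Fortran code: it assumes a 1D array of 8 points instead of the 2D array, a naive DFT in place of the MKL DFTI calls, and a hypothetical model equation de/dt = (mu - k^2) e - e^3 to build the propagators, since the post does not give the actual equation. The names dft, integrate, mu, dt and steps are made up for the sketch; e, e_fourier, field, field_fourier, l_prop and nl_prop follow the summary above.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for the MKL DFTI calls)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        s = sum(x[j] * cmath.exp(sign * 2j * math.pi * j * k / n) for j in range(n))
        out.append(s / n if inverse else s)
    return out


def integrate(n=8, dt=0.01, steps=200, mu=1.0):
    """Return (time, value in the middle of the array) for every time step."""
    # Step 3: propagators for the ASSUMED model de/dt = (mu - k^2) e - e^3
    ks = [min(k, n - k) for k in range(n)]          # integer wavenumbers
    lin = [mu - k * k for k in ks]                  # linear operator per mode
    l_prop = [math.exp(l * dt) for l in lin]
    nl_prop = [dt if l == 0 else (math.exp(l * dt) - 1.0) / l for l in lin]

    # Step 4: deterministic initial data, e(i) = mod(i,3) + 1 with i = 1..n
    e = [(i % 3) + 1.0 for i in range(1, n + 1)]

    # Step 5: forward transform of the initial data
    e_fourier = dft(e)

    history = []
    for step in range(1, steps + 1):
        field = [-(v * v * v) for v in e]                        # 6.1
        field_fourier = dft(field)                               # 6.2
        e_fourier = [lp * ef + nlp * ff                          # 6.3
                     for lp, ef, nlp, ff in zip(l_prop, e_fourier, nl_prop, field_fourier)]
        e = [z.real for z in dft(e_fourier, inverse=True)]       # 6.4
        history.append((step * dt, e[n // 2]))
    return history


if __name__ == "__main__":
    for t, value in integrate()[:5]:
        print(t, value)
```

The two printed columns mirror the output described above: the integration time and one value from the middle of the array. With these assumed parameters the values stay finite, so the sketch shows the loop structure only and does not reproduce the NaNs.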

Possible origin of the problem related to the calculations performed: Since all the values of the array approach 3 in real space (e), in Fourier space all values of the array (e_fourier) become very close to zero except the homogeneous mode. The problem may therefore be related to doing calculations with small values close to the precision limit in Fortran. Optimizations in the newer compiler versions that increase speed at the cost of precision might then lead to the problem.

Potentially related: I also tried setting the environment variables described in the article Conditional Numerical Reproducibility. This solved the problem for a smaller array size (32x16; the size can be changed in lines 3 and 4 of the attached code). For 32x16 the problem appears to depend on the machine used: AMD processors produce correct results while Intel processors do not, and the only Intel machine that produced correct results did not support AVX-512. For a bigger array size (256x256), however, the problem persisted on every machine, and I only got correct results with an old version of ifort (version 2013.0.079).
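The statement about Fourier space can be checked with a small sketch. It assumes a 1D array of 16 points whose values are all exactly 3.0 and uses a naive DFT written in Python instead of MKL; the names dft, homogeneous and largest_other are made up for the illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) forward discrete Fourier transform (not the MKL routine)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]


n = 16
e = [3.0] * n                      # uniform field in real space
e_fourier = dft(e)

homogeneous = abs(e_fourier[0])                     # k = 0 mode, equal to n * 3.0
largest_other = max(abs(z) for z in e_fourier[1:])  # rounding-level residue

print("homogeneous mode:", homogeneous)
print("largest other mode:", largest_other)
```

All modes other than k = 0 are mathematically zero, so whatever the second line prints is pure rounding error, many orders of magnitude below the homogeneous mode. Values this small depend on the order of the floating-point operations, which is one reason results can differ between library versions and instruction sets.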

I haven't been able to further identify the origin of the problem or find a solution. It would be really helpful to know whether there is something wrong with the code I use or with the compilation, or whether it is a problem with the MKL.

Best regards,
Daniel

Khang_N_Intel · 02-17-2022

Hi Daniel,

The parameter field = -e**3 passed to DftiComputeForward(punt_f,field,field_fourier) leads to a floating-point overflow, which later results in NaN (e.g., Inf-Inf).

This is an issue in your code, not in oneMKL FFT.

Best regards,

Khang
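The mechanism named in this reply can be illustrated with a short sketch. It is written in Python and is not taken from the attached code; the amplitudes 1.0e120 are hypothetical and were chosen only so that the cube exceeds the double-precision range (about 1.8e308).

```python
import math


def cube_negated(values):
    """Compute field = -e**3 element by element.

    Repeated multiplication is used because Python's ** operator raises
    OverflowError on float overflow instead of returning inf.
    """
    return [-(v * v * v) for v in values]


e = [1.0e120, -1.0e120]        # hypothetical large amplitudes
field = cube_negated(e)        # overflows to [-inf, +inf]
total = field[0] + field[1]    # -inf + inf is NaN

print(field, total)
```

A Fourier transform sums the input entries with complex weights, so one +Inf and one -Inf in field are enough to turn the transformed array into NaN. The sketch shows only how an overflow becomes NaN; it does not explain why the growth that precedes the overflow depends on the compiler version.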


VidyalathaB_Intel · 10-07-2021

Hi,

Thanks for reaching out to us.

The issue is reproducible on our end as well.

We are working on this issue internally and will get back to you soon.

Regards,

Vidya.

Khang_N_Intel · 10-07-2021

Hi Daniel,

I will take a look at this issue and will let you know how we will proceed next.

Best regards,

Khang

Khang_N_Intel · 10-07-2021

Hi Daniel,

I confirmed that this issue does occur when linking to oneMKL 2021.4. The issue has been escalated to the developers.

We will let you know once we have found the root cause.

Best regards,

Khang
