Beginner

Wrong eigenvalues of zheevd when OMP_DYNAMIC="TRUE"

I have used MKL and IFORT for about 20 years. I currently have version 19.0.4.243.

Two days ago I spotted that zheevd of MKL produces wrong eigenvalues. 

I could reproduce the error, and it looks like this:

A.

export OMP_NUM_THREADS=8

export OMP_NESTED="TRUE"

export OMP_DYNAMIC="TRUE"

export OMP_SCHEDULE="dynamic"

export KMP_AFFINITY="verbose,compact,0,0"

produces a WRONG result

 

B.

setting

export OMP_NUM_THREADS=1

or

export MKL_NUM_THREADS=1

produces the correct result with the other settings left unchanged.

 

C.

setting

export OMP_DYNAMIC="FALSE"

produces the correct result, whatever the number of threads and the other settings are.

 

My workaround is to set OMP_DYNAMIC="FALSE" in all applications, but I do not understand why the problem occurs. I have an old machine with version 11.1 where the problem does not occur.
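The three configurations A, B and C above can be compared side by side with a small wrapper. This is only a sketch: the function name run_case is made up for illustration, and the commented usage line assumes an executable ./diag that writes its eigenvalues to stdout. Each case runs in a subshell, so the settings do not leak into the calling shell.

```shell
#!/bin/sh
# run_case A|B|C command [args...]
# Runs the command under one of the three environments described above.
run_case() {
    case_name=$1
    shift
    (
        # Baseline settings, copied from case A.
        export OMP_NUM_THREADS=8
        export OMP_NESTED="TRUE"
        export OMP_DYNAMIC="TRUE"
        export OMP_SCHEDULE="dynamic"
        export KMP_AFFINITY="verbose,compact,0,0"
        case "$case_name" in
            A) ;;                             # reported to give wrong eigenvalues
            B) export MKL_NUM_THREADS=1 ;;    # MKL restricted to one thread
            C) export OMP_DYNAMIC="FALSE" ;;  # the workaround
            *) echo "unknown case: $case_name" >&2; exit 2 ;;
        esac
        "$@"
    )
}

# Hypothetical usage (assumes ./diag prints eigenvalues to stdout):
# for c in A B C; do run_case "$c" ./diag > "eig-$c.txt"; done
```

Keeping the baseline identical and changing one variable per case makes it easier to see which setting actually matters.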

 

Moderator

Juergen, this is an unknown issue. Could you give us a test case that we can build to reproduce and investigate the cause of the problem on our end?


Beginner

Dear Gennady,

thank you for taking care of the case so quickly. You can download everything from

https://obelix.physik.uni-bielefeld.de/~schnack/intel/diag-intel.tgz

You will probably have to modify the makefile. Running run.sh should then do everything. The program diag reads a huge matrix, of which only a part is diagonalized. The size of that part can be set with ThisDim in diag.f. The error does not occur if the matrix is small, say smaller than 100x100, but it always occurs with 1000x1000. I usually treat much bigger matrices.

My omp-output is in run.out. The various eigenvalues are in diag-omp-*

I have more news:

- On my system (CentOS 7, ifort version 19.0.4.243) I can reproduce the problem again and again.

- On a second system (CentOS 8), onto which I copied the binary, it does not occur.

- On one of Germany's supercomputers (SuperMUC-NG, Garching) it also does not occur; they run SUSE and ifort version 19.0.5.281.

It may be specific to my system or to the version ...

Thanks for investigating. Cheers, Juergen
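To decide whether two of the eigenvalue files (the diag-omp-* outputs mentioned above) really differ, a numerical comparison is more robust than a textual diff. The following is a sketch under assumptions: both helper names are invented here, each file is assumed to hold one real eigenvalue per line in plain decimal or E notation (Fortran D exponents would need converting first), and the default tolerance of 1e-10 is arbitrary.

```shell
#!/bin/sh
# max_abs_diff FILE1 FILE2
# Prints the largest |a - b| over corresponding lines of the two files.
max_abs_diff() {
    paste "$1" "$2" | awk '
        NF == 2 { d = $1 - $2; if (d < 0) d = -d; if (d > max) max = d }
        END     { printf "%.6e\n", max + 0 }'
}

# same_eigenvalues FILE1 FILE2 [TOL]
# Succeeds (exit status 0) if the largest difference is at most TOL.
same_eigenvalues() {
    tol=${3:-1e-10}
    diff_val=$(max_abs_diff "$1" "$2")
    awk -v d="$diff_val" -v t="$tol" 'BEGIN { exit !((d + 0) <= (t + 0)) }'
}
```

Lines present in only one of the files are ignored by this sketch, so the line counts should be checked separately if the outputs can differ in length.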

Moderator

Hi Juergen,

Thanks for the test case. We will check it and get back to you as soon as possible.



Moderator

Juergen,

Could you please give us the verbose log from running this test? Please set the environment variable MKL_VERBOSE=1 (export MKL_VERBOSE=1) and run the executable.
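One way to capture the requested log is sketched below. The function name run_verbose and the log file name are made up for illustration; MKL_VERBOSE itself is the variable named in the request. Both stdout and stderr go into the log so that the OpenMP affinity messages end up in the same file.

```shell
#!/bin/sh
# run_verbose LOGFILE command [args...]
# Runs the command with MKL_VERBOSE=1 and stores all of its output in LOGFILE.
run_verbose() {
    log=$1
    shift
    env MKL_VERBOSE=1 "$@" > "$log" 2>&1
}

# Hypothetical usage (assumes the test executable is ./diag):
# run_verbose mkl-verbose.log ./diag
# grep 'MKL_VERBOSE' mkl-verbose.log   # verbose lines are expected to start with this tag
```

Using env keeps the variable local to that one run, so later commands in the same shell are unaffected.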


Beginner

Dear Gennady,

you can download the result here:

https://obelix.physik.uni-bielefeld.de/~schnack/intel/diag-intel-2.tgz

A funny thing happened: after setting the variable in run.sh, the outcome is sometimes wrong, sometimes right. I repeated the run 4 times, and the version I am sending you is one where the result is wrong.

Cheers, Juergen
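Because the result is sometimes wrong and sometimes right, it helps to repeat the run and count how often it deviates from a known-good output. A rough sketch, with assumptions: the function name count_mismatches is invented, the reference file is assumed to come from a run with OMP_DYNAMIC="FALSE", and the comparison is byte-for-byte, which is stricter than a numerical tolerance.

```shell
#!/bin/sh
# count_mismatches RUNS REFERENCE command [args...]
# Runs the command RUNS times and prints how many runs produced stdout
# that differs from the REFERENCE file.
count_mismatches() {
    runs=$1
    ref=$2
    shift 2
    bad=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # stderr is discarded so that runtime diagnostics do not affect the comparison
        if ! "$@" 2>/dev/null | cmp -s - "$ref"; then
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "$bad"
}

# Hypothetical usage (assumes ./diag prints eigenvalues to stdout):
# count_mismatches 20 eig-reference.txt ./diag
```

A count strictly between 0 and the number of runs would confirm the intermittent behavior described above.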

 

Moderator

We were not able to reproduce the problem with the current version of MKL 2020u4.

The only difference we see is the optimal workspace size, which changed from 130.0 to 149.0, but that is expected behavior.

The environments we tested: RH7.2 and CentOS 7.7, on two systems (one AVX512 based, one AVX2 based): Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz and Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz.

MKL was used in LP64, intel_threaded mode.

-Gennady


Beginner

Dear Gennady,

 

Thank you for trying. It might be a peculiar coincidence on my system.

Since I know that OMP_DYNAMIC="FALSE" cures the problem, I can run my calculations with this setting.

I might as well download a new version of MKL.

 

With best regards, Juergen

 
