
SVD produces wrong results in mkl=parallel (2013 sp1)


I have run into a strange bug in MKL: zgesvd produces different results (some of them wrong) depending on the number of threads MKL uses. With more than 2 threads, the singular values all become NaN, even though the matrix is perfectly diagonalizable. I would appreciate some help, as this is critical for my simulations at work.
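Since the symptom is that every singular value comes back as NaN, one cheap safeguard for simulations is to validate the output vector right after each zgesvd call. The sketch below is a hypothetical helper (the name `singular_values_look_sane` is made up for illustration and is not an MKL function); it relies only on the documented LAPACK behaviour that S holds non-negative singular values sorted in decreasing order.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the k singular values in s look sane
 * (all finite, non-negative, and sorted in non-increasing order, as LAPACK
 * documents for S), and 0 otherwise, e.g. when any entry is NaN. */
int singular_values_look_sane(const double *s, size_t k)
{
    assert(s != NULL || k == 0);
    for (size_t i = 0; i < k; ++i) {
        if (!isfinite(s[i]) || s[i] < 0.0)
            return 0;                 /* NaN, Inf or negative value */
        if (i > 0 && s[i] > s[i - 1])
            return 0;                 /* decreasing order violated */
    }
    return 1;
}
```

Called as `singular_values_look_sane(s, min(m, n))`, a return value of 0 flags the run so it can be logged together with the thread setting instead of feeding NaNs into the rest of the simulation.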

I have placed a copy of a reproducible example here:
https://www.dropbox.com/sh/0fejoblyv7w6t30/AABcD9jW3KZRR0z5BLJXA0KLa?dl=0

The code is intended to be run on a cluster with a varying number of threads, so the Makefile should serve only as a guide. The compiler is the one from Composer 2013 SP1 2.144, and the Intel MKL library is the one that ships with it.

Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.2.144 Build 20140120
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.


Accepted Solutions
Employee

Hi Juan, 

I recall that we had a similar SVD bug report for MKL 11.0.x (https://software.intel.com/en-us/articles/svd-multithreading-bug-in-mkl), and that bug was fixed in MKL 11.0 Update 4.

You are seeing the issue with MKL 11.1.2, which ships with Composer 2013 SP1 2.144 (see https://software.intel.com/en-us/articles/which-version-of-the-intel-ipp-intel-mkl-and-intel-tbb-lib...), and mecej4 reports that his results with the current version are OK (thank you, mecej4).

If possible, could you please try the latest version, MKL 11.2 Update 1, to see whether anything changes? (It seems I can't access your reproducible example at that URL.)

Best Regards,

Ying 

P.S. https://software.intel.com/en-us/articles/intel-mkl-112-bug-fixes

Black Belt

I have a somewhat vague recollection of an older version of MKL having a bug in the value returned by the workspace query. With your code, the current version of MKL returns 16770 for lwork. (This is on a 4-core i7 Sandy Bridge CPU; the output singular values look normal, although a few are as small as 10^-26. The output did not change with the number of threads used.)

Please check what your installation gives for lwork after the first call to zgesvd.
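For reference, a workspace query means calling zgesvd once with lwork = -1; the optimal size is then written to the real part of work[0] (a complex double), and the reference LAPACK documentation gives the minimum as max(1, 2*min(m,n) + max(m,n)). Below is a minimal sketch of turning the queried value into a checked integer, assuming the LP64 interface where lwork is a 32-bit int. The helper names are hypothetical, and the actual zgesvd call is omitted so the example runs without MKL.

```c
#include <assert.h>
#include <complex.h>
#include <limits.h>
#include <math.h>

/* Documented minimum LWORK for zgesvd: max(1, 2*min(m,n) + max(m,n)).
 * (Hypothetical helper; overflow for huge m, n is not handled.) */
int zgesvd_min_lwork(int m, int n)
{
    assert(m >= 0 && n >= 0);
    int mn = m < n ? m : n;
    int mx = m < n ? n : m;
    int w = 2 * mn + mx;
    return w > 1 ? w : 1;
}

/* After a workspace query (lwork = -1), zgesvd stores the optimal LWORK in
 * the real part of work[0]. Converts it to an int, rounding up, and returns
 * -1 if it is not finite, does not fit in an int, or is below the minimum. */
int lwork_from_query(double complex work0, int m, int n)
{
    double q = creal(work0);
    if (!isfinite(q) || q < 0.0 || q > (double)INT_MAX)
        return -1;
    int lwork = (int)q;          /* truncates toward zero */
    if ((double)lwork < q)
        ++lwork;                 /* round a fractional value up */
    if (lwork < zgesvd_min_lwork(m, n))
        return -1;
    return lwork;
}
```

A return of -1 would point to the kind of broken workspace query described above; a sensible value such as 16770 rules it out.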


The value of lwork is the correct one, 16770. Indeed, many of the singular values will be close to zero: we use the SVD to extract the actually useful portion of a matrix, discarding singular values that are close to zero. However, this has never been a problem, and the code's output is fine when MKL_NUM_THREADS=1.
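Discarding singular values that are close to zero is usually done with a relative cutoff: keep s[i] only if s[i] > rtol * s[0]. A minimal sketch follows; the helper name and the choice of rtol are assumptions (a common choice is on the order of max(m,n) * DBL_EPSILON, but the right value depends on the application). It deliberately refuses NaN input, so a run hit by the multithreading bug cannot be silently truncated to a plausible-looking rank.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper. Given singular values s[0] >= s[1] >= ... >= 0,
 * returns how many are larger than rtol * s[0]; the rest are treated as
 * numerically zero and discarded. Returns -1 if any value is NaN. */
long significant_rank(const double *s, size_t k, double rtol)
{
    assert(rtol >= 0.0);
    for (size_t i = 0; i < k; ++i)
        if (isnan(s[i]))
            return -1;           /* corrupted SVD output */
    if (k == 0 || s[0] == 0.0)
        return 0;                /* empty or all-zero spectrum */
    double cutoff = rtol * s[0];
    long r = 0;
    while ((size_t)r < k && s[r] > cutoff)
        ++r;
    return r;
}
```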

Black Belt

Does the error occur only when you run on a cluster, or does it happen on a single (but multicore) CPU?


The "cluster" part is irrelevant; I only mentioned it because the Makefile uses the queue manager to send the processes to the compute nodes. These nodes are independent computers with a fixed number of cores (which I do not remember), and I do not use any multiprocessing capabilities on them (no MPI, nothing similar). So the problem relates strictly to the multithreading routines in MKL: the different outcomes depend on the number of threads MKL uses within a single computer. Note, however, that I do not have ssh access to the compute nodes and cannot run the code on them manually, so my information in this respect is very limited.
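When comparing runs with different MKL_NUM_THREADS values, it helps to separate two cases: small rounding-level differences, which multithreaded BLAS/LAPACK can produce in general because the order of floating-point operations may change, and outright failures such as NaN. The sketch below uses a hypothetical helper name; it scales differences by the largest reference singular value so that values around 10^-26 do not dominate, and it reports NaN as an infinite difference.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: largest difference between two vectors of k singular
 * values (e.g. a 1-thread run and an n-thread run), divided by the largest
 * absolute value in the reference run. Returns INFINITY if either vector
 * contains a NaN. */
double max_scaled_diff(const double *ref, const double *other, size_t k)
{
    double scale = 0.0;
    for (size_t i = 0; i < k; ++i) {
        if (isnan(ref[i]) || isnan(other[i]))
            return INFINITY;
        if (fabs(ref[i]) > scale)
            scale = fabs(ref[i]);
    }
    if (scale == 0.0)
        scale = 1.0;             /* all-zero reference: absolute difference */
    double worst = 0.0;
    for (size_t i = 0; i < k; ++i) {
        double d = fabs(ref[i] - other[i]) / scale;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

For example, one could flag a run when `max_scaled_diff(s_1thread, s_nthreads, k)` exceeds something like 1e-12; that threshold is an assumption, not a value from this thread.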


Hi Ying, indeed this is version 11.1.2. I assumed that the version numbers of Composer and MKL were correlated and that Composer 2013 corresponded to MKL 13.*. Sadly, I was wrong. I do not have access to more recent versions of the library on this cluster. On other clusters, MKL 13.* does not give rise to the same bug, so you are probably right that it was corrected in later updates.
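Because the Composer and MKL version numbers are independent, it may be worth having the program print the MKL version it is actually linked against. MKL's service functions include mkl_get_version(), which fills an MKLVersion struct (MajorVersion, MinorVersion, UpdateVersion), and mkl_get_version_string(); check the MKL reference for your release. The sketch below does not call MKL; it only shows a hypothetical comparison helper for (major, minor, update) triples, e.g. to tell whether the linked library is at least 11.2 Update 1.

```c
#include <assert.h>

/* Hypothetical type: a library version such as MKL 11.1 Update 2. */
struct lib_version {
    int major;
    int minor;
    int update;
};

/* Returns -1, 0 or 1 as a is older than, equal to, or newer than b,
 * comparing major, then minor, then update. */
int version_cmp(struct lib_version a, struct lib_version b)
{
    assert(a.major >= 0 && b.major >= 0);   /* versions are non-negative */
    if (a.major != b.major)
        return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor)
        return a.minor < b.minor ? -1 : 1;
    if (a.update != b.update)
        return a.update < b.update ? -1 : 1;
    return 0;
}
```

With the numbers from this thread, 11.1.2 compares as newer than 11.0.4 (where the earlier SVD bug was fixed) but older than the suggested 11.2.1.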
