I have run into a strange bug in MKL: zgesvd produces different results, some of them wrong, depending on the number of threads that MKL uses. With more than 2 threads, all singular values become NaN, even though the matrix is perfectly diagonalizable. I would appreciate some help, as this is critical for my simulations at work.
I have placed a copy of a reproducible example here:
https://www.dropbox.com/sh/0fejoblyv7w6t30/AABcD9jW3KZRR0z5BLJXA0KLa?dl=0
The code is intended to be run on a cluster with a varying number of threads, so the "Makefile" should serve only as a guide. The compiler is the one from Composer 2013 sp1 2.144, and the Intel MKL library is the one that ships with it.
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.2.144 Build 20140120
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
I have a somewhat vague recollection of an older version of MKL having a bug in the returned value after a workspace query. The current version of MKL gives 16770 for the value of lwork with your code. (This is on a 4-core i7 Sandy Bridge CPU; the output singular values look normal, although there are a few as small as 10^-26. The output did not change with the number of threads used.)
Please check what your installation gives for lwork after the first call to zgesvd.
The value of lwork is the right one, 16770. Indeed, many of the singular values will be close to zero: we use the SVD to extract the actually useful portion of a matrix, discarding singular values that are close to zero. However, this has never been a problem, and the code's output is fine when MKL_NUM_THREADS=1.
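As an illustration of the truncation step described above, here is a minimal sketch in C. It is not taken from the reproducer: the function name `numerical_rank` and the relative tolerance parameter are hypothetical. It relies only on the fact that zgesvd returns real singular values sorted in descending order, and it also flags the NaN symptom before any truncation is attempted.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: count the singular values that exceed
 * rel_tol * s[0]. Assumes s[] holds n singular values sorted in
 * descending order, which is how zgesvd returns them.
 * Returns -1 if any entry is NaN or infinite, so a failed SVD is
 * reported instead of being silently truncated. */
long numerical_rank(const double *s, size_t n, double rel_tol)
{
    size_t i;
    long rank = 0;
    double cutoff;

    /* First pass: reject non-finite output outright. */
    for (i = 0; i < n; ++i) {
        if (!isfinite(s[i])) {
            return -1;
        }
    }
    if (n == 0 || s[0] <= 0.0) {
        return 0; /* empty input or zero matrix */
    }

    /* Second pass: keep values above the relative cutoff. */
    cutoff = rel_tol * s[0];
    for (i = 0; i < n; ++i) {
        if (s[i] > cutoff) {
            ++rank;
        } else {
            break; /* descending order: the rest are smaller still */
        }
    }
    return rank;
}
```

Calling `numerical_rank(s, n, 1e-12)` on the array of singular values would give the number of columns worth keeping; the tolerance 1e-12 is only an example and would need to match what the simulation actually requires.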
Does the error occur only when you run on a cluster, or does it happen on a single (but multicore) CPU?
The "cluster" part is irrelevant; I only mentioned it because the Makefile uses the queue manager to send the processes to the computation nodes. These nodes are independent computers with a fixed number of cores (which I do not remember), and I do not use any multiprocessing capabilities on them (no MPI, nothing similar). So the problem relates strictly to the multithreaded routines in MKL, and the different outcomes depend on the number of threads that MKL uses within a single computer. Note, however, that I do not have ssh access to the computation nodes and cannot run the code on them manually, so my information in this respect is very limited.
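Since the nodes cannot be inspected interactively, one way to quantify the thread dependence is to save the singular values from a single-threaded run and compare them with those from a multithreaded run. The following is a rough sketch in C under that assumption; the function name `max_scaled_diff` is hypothetical and not part of MKL or of the reproducer.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: compare singular values from a reference run
 * (for example MKL_NUM_THREADS=1) against another run.
 * Returns the largest absolute difference divided by the largest
 * reference value, or INFINITY if either array contains a NaN or
 * an infinite entry. */
double max_scaled_diff(const double *ref, const double *test, size_t n)
{
    size_t i;
    double scale = 0.0;
    double worst = 0.0;

    for (i = 0; i < n; ++i) {
        if (!isfinite(ref[i]) || !isfinite(test[i])) {
            return INFINITY; /* NaN output counts as total disagreement */
        }
        if (fabs(ref[i]) > scale) {
            scale = fabs(ref[i]);
        }
    }
    if (scale == 0.0) {
        scale = 1.0; /* all-zero reference: fall back to absolute difference */
    }

    for (i = 0; i < n; ++i) {
        double d = fabs(ref[i] - test[i]) / scale;
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

Small differences near machine precision are expected between thread counts, because the order of floating-point operations can change; a result of INFINITY or of order one would point to the kind of failure reported here.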
Hi Juan,
I recall that we had a similar SVD bug report for MKL 11.0.x: https://software.intel.com/en-us/articles/svd-multithreading-bug-in-mkl
That bug was fixed in MKL 11.0 update 4.
You mentioned that you see the issue in MKL 11.1.2 (Composer 2013 sp1 2.144, https://software.intel.com/en-us/articles/which-version-of-the-intel-ipp-intel-mkl-and-intel-tbb-libraries-are-included-in-the-intel), and mecej4 mentioned that his result with the current version is OK (thank you, mecej4).
If possible, could you please try the latest version, MKL 11.2 update 1, to see if there is any change? (It seems I can't access your reproducible example via the URL.)
Best Regards,
Ying
P.S. https://software.intel.com/en-us/articles/intel-mkl-112-bug-fixes
Hi Ying, indeed this is version 11.1.2. I assumed that the version numbers of Composer and MKL were correlated and that Composer 2013 corresponded to MKL 13.*. Sad to be wrong. I do not have access to more recent versions of the library on this cluster. On other clusters, MKL 13.* does not give rise to the same bug, so you are probably right that this bug was corrected in later updates.