After installing MKL 11.0 Update 3 on Win64 (upgrading from 11.0 Update 2), all our automated QA related to ARPACK-based eigenvalue extraction started failing.
The code was not recompiled. We call mkl_dcsrsymv as part of the ARPACK reverse-communication interface.
I reverted to Update 2, and all problems went away.
Sorry I cannot be more specific.
Hey, thanks for reporting the problem. To help us zero in on the problem:
- Is mkl_dcsrsymv the only MKL function you call in your code?
- How do you link your code with MKL, dynamic or static?
- What compiler do you use?
- If you try re-linking with MKL 11.0 Update 3, does the problem go away?
Our code calls many, many MKL BLAS and LAPACK functions. However, the eigensolver in particular showed **serious numerical** QA failures. This part of the QA has been robust through all MKL versions 9.0->11.1 Update 2. We have a very robust and stable QA process.
Tracking down exactly which MKL function is the problem will be very difficult. I mention mkl_dcsrsymv only because it is used extensively in the ARPACK callback of the eigensolver, and I noticed that the release notes mention related functions.
We use Intel 13.1 C++ Compiler/Visual Studio 2010.
I link dynamically with MKL, so it should not be necessary to re-link or recompile. I can just swap DLLs. But for completeness....
- Running against MKL Update 3 (no re-link, no re-compile) -> FAIL
- Running against MKL Update 3 (re-compile all source, re-link) -> FAIL
- Uninstall MKL Update 3, re-install MKL Update 2 -> SUCCESS
The MKL team is able to reproduce the problem. It's a bug introduced in MKL 11.0 Update 3 in the parallel implementation of the ?LACPY routine. We will fix it in our next update release. Meanwhile, we are working on a temporary workaround for the current release. We'll communicate this workaround to all users soon. Please stay tuned.
We run an exhaustive nightly QA on the "development" branch and that is how we found the issue. I am just glad the root cause has been found before I had to spend my time debugging it! I think Intel has been pretty responsive.
A quick update on the root cause and how Intel MKL is going to address this issue.
The problem lies in ARPACK's use of the DLACPY routine to copy arrays whose source and destination overlap. This caused no problem when DLACPY was implemented as a serial operation. Intel MKL 11.0 Update 3 introduces a multithreaded implementation of DLACPY, which exposes the issue of copying overlapping arrays. Note that the Fortran standard requires array arguments passed to a routine to be non-aliased. MKL's threaded DLACPY takes advantage of this rule, and it is therefore a valid implementation. We have contacted the ARPACK developers and made them aware of this issue.
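To illustrate why the order of element copies matters once source and destination overlap, here is a minimal sketch in Python (chosen only for brevity; it is not MKL or ARPACK code). The function names, the buffer contents, and the two-chunk "schedule" are all assumptions made up for this illustration. The chunked variant deterministically mimics one possible outcome of splitting the copy across two threads.

```python
def copy_serial(buf, src, dst, n):
    """Copy n elements in place, one by one, in ascending index order."""
    for i in range(n):
        buf[dst + i] = buf[src + i]
    return buf


def copy_in_chunks(buf, src, dst, n, chunk_order):
    """Same copy, split into len(chunk_order) contiguous chunks.

    The chunks are processed in the order given by chunk_order, which
    stands in for one possible thread schedule (hypothetical).
    """
    nchunks = len(chunk_order)
    size = (n + nchunks - 1) // nchunks  # ceiling division
    for c in chunk_order:
        lo = c * size
        hi = min(n, lo + size)
        for i in range(lo, hi):
            buf[dst + i] = buf[src + i]
    return buf


# Shift six elements two places to the left inside one buffer, so the
# source range [2, 8) overlaps the destination range [0, 6).
serial = copy_serial(list(range(8)), src=2, dst=0, n=6)
chunked = copy_in_chunks(list(range(8)), src=2, dst=0, n=6, chunk_order=[1, 0])

print(serial)   # [2, 3, 4, 5, 6, 7, 6, 7]
print(chunked)  # [2, 5, 6, 5, 6, 7, 6, 7]
```

With the ascending serial loop, every source element is read before it is overwritten, so the shift comes out as intended. When the second chunk runs first, it overwrites elements that the first chunk still needs to read, and the result differs. With non-overlapping source and destination, both variants give the same answer regardless of order.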
In order to continue to provide compatibility with the existing ARPACK implementation, we will fully resolve this problem in our next release (MKL 11.0 Update 4). For the time being, there are a few options:
- Users can choose to use the default DLACPY implementation included in the ARPACK source, while still using MKL 11.0 Update 3 for other LAPACK functions. This requires minor modifications to the ARPACK makefile so that the default DLACPY is linked before MKL. This KB details the steps.
- Users can also choose to modify the ARPACK source code to avoid using DLACPY on overlapped arrays. For example, replace the DLACPY call with a loop that copies element by element. This requires minor modifications to the ARPACK source. See the KB for detailed instructions.
- We are working on a patch that can be applied to an MKL 11.0 Update 3 installation ahead of the Update 4 release. Let us know if you would like to receive this patch when ready.
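As a rough sketch of the element-by-element option, the loop below copies an m-by-n column-major block within a single flat buffer, column by column and in ascending order. It is written in Python purely for illustration; the real change would be made in the ARPACK Fortran source as described in the KB. The function name, the offsets, and the example matrix are assumptions, and the actual ARPACK call sites are not shown in this thread.

```python
def copy_matrix_elementwise(buf, m, n, a_off, lda, b_off, ldb):
    """Copy an m-by-n column-major block inside one flat buffer.

    The source block starts at a_off with leading dimension lda, the
    destination block at b_off with leading dimension ldb. Elements are
    copied one at a time, columns in ascending order and rows in
    ascending order within each column, so the order is fixed even if
    the two blocks overlap.
    """
    for j in range(n):
        for i in range(m):
            buf[b_off + i + j * ldb] = buf[a_off + i + j * lda]
    return buf


# A 3-by-3 column-major matrix with columns (1,2,3), (4,5,6), (7,8,9).
v = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Move rows 2..3 up by one row: the source block starts at offset 1 and
# the destination block at offset 0, so the two blocks overlap.
copy_matrix_elementwise(v, m=2, n=3, a_off=1, lda=3, b_off=0, ldb=3)

print(v)  # [2, 3, 3, 5, 6, 6, 8, 9, 9]
```

An explicit ascending loop is only safe when each source element is read before it gets overwritten, as in this upward shift. For an overlap in the other direction the loop would have to run in descending order or go through a temporary array.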
Hopefully, this can help our users to move forward with using MKL 11.0 Update 3. Thanks.