Thank you for reading this post.
I got these error messages when calling zgelsd in MKL 15.0 on a fairly large matrix:
Intel MKL INTERNAL ERROR: Condition 1 detected in function DLASD4.
Intel MKL INTERNAL ERROR: Condition 1 detected in function DLASD8.
I searched online and found what looks like the same issue at https://software.intel.com/en-us/forums/topic/373673, where it was reported that the bug had been fixed in MKL 11 update 5.
The matrix is 23066 * 23068, so it contains more than 500 million complex numbers. At first I thought it might be an overflow issue, because I only encounter this problem with matrices of this size or larger. However, 500 million is still far less than 2^31-1. I also have a case, in which the coefficients are computed in a slightly different way, where the solver works and gives the correct result. (I am developing a code used in our group, and the coefficients computed in the two ways are mostly the same, with slight differences in a few places.)
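To make the size arithmetic above concrete, here is a minimal sketch in C. The function names are made up for illustration, and the 16 bytes per entry assumes double-precision complex storage (which is what the "z" routines use). It only checks the numbers; it says nothing about what MKL does internally.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Number of entries in an m-by-n matrix, computed in 64-bit arithmetic. */
int64_t element_count(int64_t m, int64_t n)
{
    return m * n;
}

/* Storage in bytes, assuming 16 bytes per double-precision complex entry. */
int64_t complex_double_bytes(int64_t m, int64_t n)
{
    return element_count(m, n) * 16;
}

/* Returns 1 if v is representable as a signed 32-bit integer, else 0. */
int fits_in_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Prints the entry count and byte size, and whether each fits in 32 bits. */
void print_size_report(int64_t m, int64_t n)
{
    int64_t count = element_count(m, n);
    int64_t bytes = complex_double_bytes(m, n);

    printf("entries: %" PRId64 " (fits in int32: %s)\n",
           count, fits_in_int32(count) ? "yes" : "no");
    printf("bytes:   %" PRId64 " (fits in int32: %s)\n",
           bytes, fits_in_int32(bytes) ? "yes" : "no");
}
```

For 23066 * 23068 this gives 532,086,488 entries, which fits in a signed 32-bit integer, and 8,513,383,808 bytes, which does not. So the entry count alone is below 2^31-1, but byte sizes (and possibly workspace sizes, which depend on the routine) can exceed it. That by itself does not show that an overflow is the cause of the errors.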
Could you please tell us which platform you are working on? It would also help if you could provide the test code and the test matrix.
I tried the code and test matrix from https://software.intel.com/en-us/forums/topic/373673 on a 64-bit Linux machine, with dynamic linking, using composer_xe_2015.2.164:
[yhu5@prc-mic01 ~]$ source /opt/intel/composer_xe_2015.2.164/mkl/bin/mklvars.sh intel64
[yhu5@prc-mic01 F373673_zgelsd]$ gcc -Wall -g -O0 -o zgelsd_bug zgelsd-bug.c -fno-strict-aliasing -L $LD_LIBRARY_PATH -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lm
[yhu5@prc-mic01 F373673_zgelsd]$ ./zgelsd_bug
Reading the input matrix...
Reading the input RHS...
info = 0
[yhu5@prc-mic01 F373673_zgelsd]$ gcc --version
gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
Thank you for the prompt reply.
I'm working on Linux. Yes, I can send you the solver part of the test code. However, the matrix saved as a formatted file is around 30 GB. I'll convert it to unformatted form and see if there is more I can do to compress it.
We are able to reproduce the errors. There is an optimization-related issue inside MKL that causes some loss of precision, so the algorithm could not converge on that particular matrix. I have recorded the problem in our bug list, and our developers will fix it later.
As a temporary workaround, would you please try zgelss? (It works fine for this matrix.)
We have two SVD based algorithms for solving Least Squares problems:
1. ?gelsd – using SVD based on Divide and Conquer (D&C),
2. ?gelss – using SVD based on QR.
D&C algorithms are faster and use fewer flops, but they are less stable, and there are some matrices they are unable to solve. QR based algorithms are more robust but slower. Basically, we try solving with the D&C algorithm first, and if it reports an error, we rerun the same task with the QR based one.
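The "try D&C first, then rerun with QR" approach could be sketched roughly as below. This is only an illustration under stated assumptions: `lsq_solver_fn`, `solve_with_fallback`, `LSQ_ALLOC_FAILED` and the two `demo_` solvers are hypothetical names, not part of MKL. In real code the two function pointers would be thin wrappers around `LAPACKE_zgelsd` and `LAPACKE_zgelss`. Since both routines overwrite A and B, the sketch keeps copies of the inputs so the second attempt sees the original data.

```c
#include <complex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical solver signature: column-major A (m-by-n, lda = m) and
 * B (ldb = max(m, n), nrhs columns), both overwritten in place.
 * Returns a LAPACK-style info code: 0 = success, > 0 = did not converge. */
typedef int (*lsq_solver_fn)(int m, int n, int nrhs,
                             double complex *a, double complex *b);

#define LSQ_ALLOC_FAILED (-1000) /* hypothetical error code for this sketch */

/* Try the fast solver first; on a convergence failure (info > 0), restore
 * the original A and B and rerun with the robust solver. */
int solve_with_fallback(lsq_solver_fn fast, lsq_solver_fn robust,
                        int m, int n, int nrhs,
                        double complex *a, double complex *b)
{
    size_t a_len = (size_t)m * (size_t)n;
    size_t b_len = (size_t)(m > n ? m : n) * (size_t)nrhs;
    double complex *a_copy = malloc(a_len * sizeof *a_copy);
    double complex *b_copy = malloc(b_len * sizeof *b_copy);
    int info;

    if (a_copy == NULL || b_copy == NULL) {
        free(a_copy);
        free(b_copy);
        return LSQ_ALLOC_FAILED;
    }
    memcpy(a_copy, a, a_len * sizeof *a_copy);
    memcpy(b_copy, b, b_len * sizeof *b_copy);

    info = fast(m, n, nrhs, a, b);
    if (info > 0) {
        /* The failed attempt may have destroyed A and B: restore them. */
        memcpy(a, a_copy, a_len * sizeof *a_copy);
        memcpy(b, b_copy, b_len * sizeof *b_copy);
        info = robust(m, n, nrhs, a, b);
    }

    free(a_copy);
    free(b_copy);
    return info;
}

/* Stand-in for a D&C solver that fails: clobbers its inputs, returns 1. */
int demo_fast_fails(int m, int n, int nrhs,
                    double complex *a, double complex *b)
{
    (void)m; (void)n; (void)nrhs;
    a[0] = 0.0;
    b[0] = 0.0;
    return 1;
}

/* Stand-in for a robust solver: solves the 1-by-1 system a*x = b in place. */
int demo_robust(int m, int n, int nrhs,
                double complex *a, double complex *b)
{
    (void)m; (void)n; (void)nrhs;
    b[0] = b[0] / a[0];
    return 0;
}
```

Keeping copies doubles the memory needed for A and B, which may not be practical for a matrix of the size discussed in this thread. Rereading the inputs from disk before the second attempt is one possible alternative.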