Hi there,
Thank you for reading this post.
I got these error messages when calling zgelsd in MKL 15.0 to solve a least-squares problem with a fairly large matrix:
Intel MKL INTERNAL ERROR: Condition 1 detected in function DLASD4.
Intel MKL INTERNAL ERROR: Condition 1 detected in function DLASD8.
I searched online and found the same issue reported at https://software.intel.com/en-us/forums/topic/373673, where it says the bug was fixed in MKL 11 update 5.
The matrix is 23066 * 23068, that is, more than 500 million complex numbers. At first I thought it might be an overflow issue, because I only encounter it with matrices of this size or larger. However, 500 million is still far less than 2^31-1. I also have a case, in which the coefficients are computed in a slightly different way, where the solver works and gives the correct result. (I am developing a code used in our group, and the coefficients computed in the two ways are mostly the same, with slight differences in minor places.)
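As a quick check of the sizes quoted above, here is a minimal sketch in C. The helper names (`element_count`, `zmatrix_bytes`, `fits_int32`) are made up for illustration, and 16 bytes per entry assumes double-precision complex storage. It only checks the arithmetic; it does not show whether any index or workspace size inside the library overflows.

```c
#include <stdint.h>

/* Number of entries in an m-by-n matrix, computed in 64-bit arithmetic. */
int64_t element_count(int64_t m, int64_t n)
{
    return m * n;
}

/* Bytes needed to store an m-by-n double-complex matrix
   (assumption: 16 bytes per entry, i.e. two 8-byte doubles). */
int64_t zmatrix_bytes(int64_t m, int64_t n)
{
    return element_count(m, n) * 16;
}

/* Returns 1 if v is representable as a signed 32-bit integer, else 0. */
int fits_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

For m = 23066 and n = 23068, `element_count` gives 532086488, which fits in a signed 32-bit integer, while `zmatrix_bytes` gives 8513383808, which does not. So the entry count is below 2^31-1, but any quantity measured in bytes, or any workspace several times larger than the matrix, may not be.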
Thank you,
Yue
Sorry, I'm not sure about the MKL version I gave above. I was using the one shipped in composer_xe_2015.2.164.
Hi Yue,
Could you please tell me which platform you are working on? It would help if you could provide the test code and the test matrix.
I tried the code and test matrix from https://software.intel.com/en-us/forums/topic/373673 on a 64-bit Linux machine, with dynamic linking, using composer_xe_2015.2.164:
[yhu5@prc-mic01 ~]$ source /opt/intel/composer_xe_2015.2.164/mkl/bin/mklvars.sh intel64
[yhu5@prc-mic01 F373673_zgelsd]$ gcc -Wall -g -O0 -o zgelsd_bug zgelsd-bug.c -fno-strict-aliasing -L $LD_LIBRARY_PATH -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lm
[yhu5@prc-mic01 F373673_zgelsd]$ ./zgelsd_bug
Reading the input matrix...
Reading the input RHS...
Done
info = 0
[yhu5@prc-mic01 F373673_zgelsd]$ gcc --version
gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
Best Regards,
Ying
Hi Ying,
Thank you for the prompt reply.
I'm working on Linux. Yes, I can send you the solver part of the test code. However, the matrix saved as a formatted file is around 30 GB. I'll convert it to unformatted form and see if there is more I can do to compress it.
Thank you,
Yue
Hi Ying,
I made two tarballs (~13 GB together) containing two sets of data and the test code. One set of data works, while the other fails with these error messages. Is there any way I can send you the data?
Thank you,
Yue
Hi Yue,
Thanks for the test package. We are trying it now. The run seems to take a very long time; I will keep you updated when we have a result.
Thanks
Ying
Hi Ying,
Thank you for the update. Yes, the run takes a while (should be around 10 to 15 hours). Thanks!
Regards,
Yue
Hi Yue,
We are able to reproduce the errors. There is an optimization-related issue inside MKL that causes some loss of precision, so the algorithm fails to converge on that particular matrix. I have recorded the problem in our bug list, and our developers will fix it later.
As a temporary workaround, would you please try zgelss? (It works fine for this matrix.)
We have two SVD based algorithms for solving Least Squares problems:
1. ?gelsd – using SVD based on Divide and Conquer (D&C),
2. ?gelss – using SVD based on QR.
D&C algorithms are faster and use fewer flops, but they are less stable, and there are some matrices they are unable to solve. QR-based algorithms are more robust but slower. Basically, we try solving with the D&C algorithm first and, if it reports an error, rerun the same task with QR.
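The try-D&C-first, rerun-with-QR pattern described above can be sketched as follows. This is only an illustration under stated assumptions, not MKL code: the callback type `lsq_solver`, the function `solve_with_fallback`, the constant `LSQ_BACKUP_ALLOC_FAILED`, and the two stub solvers are all hypothetical names. In a real program the callbacks would be thin wrappers around zgelsd and zgelss, and the sketch assumes a failure is reported as a positive LAPACK-style info value. Because both routines overwrite the matrix and the right-hand side, the inputs are saved first and restored before the second attempt.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: solves the least-squares problem in place,
   overwriting both buffers, and returns a LAPACK-style info code
   (0 = success, > 0 = the SVD did not converge). */
typedef int (*lsq_solver)(void *a, void *b, void *ctx);

/* Hypothetical error code for a failed backup allocation. */
#define LSQ_BACKUP_ALLOC_FAILED (-9999)

/* Run `primary`; if it reports info > 0, restore A and b from saved
   copies and run `fallback`. Returns the info code of the last solver
   that ran. *used_fallback is set to 1 only if `fallback` was called. */
int solve_with_fallback(lsq_solver primary, lsq_solver fallback,
                        void *a, size_t a_bytes,
                        void *b, size_t b_bytes,
                        void *ctx, int *used_fallback)
{
    *used_fallback = 0;

    void *a_copy = malloc(a_bytes);
    void *b_copy = malloc(b_bytes);
    if (a_copy == NULL || b_copy == NULL) {
        free(a_copy);
        free(b_copy);
        return LSQ_BACKUP_ALLOC_FAILED;
    }
    memcpy(a_copy, a, a_bytes);
    memcpy(b_copy, b, b_bytes);

    int info = primary(a, b, ctx);
    if (info > 0) {
        /* The first attempt destroyed its inputs; put them back. */
        memcpy(a, a_copy, a_bytes);
        memcpy(b, b_copy, b_bytes);
        *used_fallback = 1;
        info = fallback(a, b, ctx);
    }

    free(a_copy);
    free(b_copy);
    return info;
}

/* Stand-in for a D&C wrapper: clobbers its inputs, then reports
   non-convergence. For illustration only. */
int stub_diverging(void *a, void *b, void *ctx)
{
    (void)ctx;
    ((double *)a)[0] = -1.0;
    ((double *)b)[0] = -1.0;
    return 1;
}

/* Stand-in for a QR wrapper: "solves" the 1-by-1 system a*x = b,
   leaving x in b. For illustration only. */
int stub_robust(void *a, void *b, void *ctx)
{
    (void)ctx;
    double *x = b;
    x[0] = x[0] / ((double *)a)[0];
    return 0;
}
```

The backup copy doubles the memory needed for the matrix, which matters for very large problems; re-reading the inputs from disk before the second attempt is an alternative if memory is tight.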
Best regards,
Ying
Hi Ying,
Thank you for the update. I'll try zgelss and do some tests.
Thanks,
Yue