Yue_W_

Beginner

03-23-2015
04:03 PM

A bug in zgelsd in MKL 15.0

Hi there,

Thank you for reading this post.

I got these error messages when calling *zgelsd* in MKL 15.0 to solve a least-squares problem with a fairly large matrix:

*Intel MKL INTERNAL ERROR: Condition 1 detected in function DLASD4.*

*Intel MKL INTERNAL ERROR: Condition 1 detected in function DLASD8.*

I searched online and found the same issue reported at https://software.intel.com/en-us/forums/topic/373673, where it was said to have been fixed in MKL 11 Update 5.

The matrix is 23066 * 23068, i.e. more than 500 million complex numbers. At first I thought it might be an integer overflow, because I only encounter this error with matrices of this size or larger. However, 500 million is still well below 2^31 - 1. I also have a case in which the coefficients are computed in a slightly different way, and there the solver works and gives the correct result. (I am developing code used in our group, and the coefficients computed the two ways are mostly the same, differing slightly in minor places.)
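The overflow reasoning above can be checked with a few lines of arithmetic. The sketch below, in C to match the test program used later in the thread, computes the entry count and the storage size of a double-complex matrix in 64-bit arithmetic and compares each against the signed 32-bit range (`INT32_MAX` = 2^31 - 1). The helper names are hypothetical, chosen for illustration only.

```c
#include <stdint.h>

/* Number of entries in an m-by-n matrix, computed in 64-bit arithmetic. */
int64_t matrix_entries(int64_t m, int64_t n)
{
    return m * n;
}

/* Bytes needed for an m-by-n matrix of double complex values
 * (two 8-byte doubles, i.e. 16 bytes, per entry). */
int64_t complex_double_bytes(int64_t m, int64_t n)
{
    return m * n * 16;
}

/* Nonzero if v fits in a signed 32-bit integer. */
int fits_in_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

For 23066 * 23068 the entry count (532,086,488) does fit in a signed 32-bit integer, but the storage size (8,513,383,808 bytes) does not, so a 32-bit overflow could still arise in code that computes byte offsets rather than element indices. The build shown in Ying's reply links the ILP64 interface (`-lmkl_intel_ilp64`), in which MKL's integer arguments are 64-bit.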

Thank you,

Yue

Yue_W_

Beginner

03-23-2015
04:09 PM

Sorry, I'm not sure about the MKL version I mentioned. I was using the one in *composer_xe_2015.2.164*.

Ying_H_Intel

Employee

03-23-2015
06:43 PM

Hi Yue,

Could you please tell us which platform you are working on? It would help if you could provide the test code and the test matrix.

I tried the code and test matrix from https://software.intel.com/en-us/forums/topic/373673 on a 64-bit Linux machine, with dynamic linking and composer_xe_2015.2.164:

```console
[yhu5@prc-mic01 ~]$ source /opt/intel/composer_xe_2015.2.164/mkl/bin/mklvars.sh intel64
[yhu5@prc-mic01 F373673_zgelsd]$ gcc -Wall -g -O0 -o zgelsd_bug zgelsd-bug.c -fno-strict-aliasing -L $LD_LIBRARY_PATH -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lm
[yhu5@prc-mic01 F373673_zgelsd]$ ./zgelsd_bug
Reading the input matrix...
Reading the input RHS...
Done
info = 0
[yhu5@prc-mic01 F373673_zgelsd]$ gcc --version
gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
```

Best Regards,

Ying

Yue_W_

Beginner

03-23-2015
07:08 PM

Hi Ying,

Thank you for the prompt reply.

I'm working on Linux. Yes, I can send you the solver part of the test code. However, the matrix saved as a formatted (text) file is around 30 GB. I'll convert it to unformatted (binary) form and see what more I can do to compress it.
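Converting from formatted text to raw binary removes the per-number text encoding: each double-complex entry then takes exactly 16 bytes. The thread does not say which language the group's code uses (the formatted/unformatted terminology suggests Fortran, whose unformatted sequential files also add record markers), so the following is only a C illustration, with hypothetical helper names, that writes and reads a column-major matrix as headerless raw bytes.

```c
#include <complex.h>
#include <stdio.h>

/* Hypothetical helper: write an m-by-n column-major double complex matrix
 * as raw bytes (16 per entry, no header). Returns 0 on success, -1 on a
 * short write. */
int write_matrix_raw(FILE *f, const double complex *a, size_t m, size_t n)
{
    size_t count = m * n;
    return fwrite(a, sizeof *a, count, f) == count ? 0 : -1;
}

/* Hypothetical helper: read back a matrix written by write_matrix_raw.
 * The caller must already know m and n. Returns 0 on success, -1 on a
 * short read. */
int read_matrix_raw(FILE *f, double complex *a, size_t m, size_t n)
{
    size_t count = m * n;
    return fread(a, sizeof *a, count, f) == count ? 0 : -1;
}
```

Because no dimensions are stored, the reader has to know m and n in advance, and a file written this way is only portable between machines with the same byte order and floating-point format.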

Thank you,

Yue

Yue_W_

Beginner

03-24-2015
03:26 PM

Hi Ying,

I made two tarballs (~13 GB together) containing two sets of data and the test code. One data set works; the other fails with the error messages above. Is there any way I can send you the data?

Thank you,

Yue

Ying_H_Intel

Employee

03-29-2015
06:37 PM

Hi Yue,

Thanks for the test package. We are trying it now. The run seems to take a very long time; I will keep you updated when we have results.

Thanks

Ying

Yue_W_

Beginner

03-29-2015
06:40 PM

Hi Ying,

Thank you for the update. Yes, the run takes a while (should be around 10 to 15 hours). Thanks!

Regards,

Yue

Ying_H_Intel

Employee

03-31-2015
07:19 PM

Hi Yue,

We were able to reproduce the errors. There is an optimization-related issue inside MKL that causes some loss of precision, so the algorithm fails to converge on that particular matrix. I have recorded the problem in our bug list, and our developers will fix it later.

As a temporary workaround, could you please try **zgelss**? (It works fine for this matrix.)

We have two SVD-based algorithms for solving least-squares problems:

1. ?gelsd – SVD based on divide and conquer (D&C),

2. ?gelss – SVD based on QR iteration.

D&C algorithms are faster and need fewer flops, but they are less stable, and there are some matrices they cannot solve. QR-based algorithms are more robust but slower. Basically, try the D&C algorithm first; if it reports an error, rerun the same task with the QR-based one.

Best regards,

Ying

Yue_W_

Beginner

03-31-2015
07:33 PM

Hi Ying,

Thank you for the update. I'll try zgelss and do some tests.

Thanks,

Yue
