Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Scalapack BUG in PZUNGQR (Again)

John_Young
New Contributor I

Hi,

Previously, we reported a possible ScaLAPACK bug in the PZUNGQR function:

             http://software.intel.com/en-us/forums/topic/473803

That issue has still not been resolved, but it was stated that it involved zero-sized matrices on some nodes. However, we have encountered a somewhat similar issue with PZUNGQR even when the local matrices do not have zero size. In the attached test case, the PZGEMM call that follows the PZUNGQR call either hangs or produces an Irecv error, even though both the QR matrices and the PZGEMM matrices have non-zero local sizes on all nodes. Interestingly, if the matrices used in the PZGEMM call have a global size less than the block size (so that only one node holds a non-zero-sized local matrix), the call completes fine.
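To make the zero-size condition concrete, here is a minimal sketch of how local extents come out of a block-cyclic distribution. It is a Python transcription of the arithmetic in the ScaLAPACK tool routine NUMROC, not code from the attached test case; the helper `local_sizes` and the sizes used (200 or 10 rows, block size 64, 4 processes along one grid dimension) are assumptions chosen only for illustration.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of a block-cyclic matrix owned by process `iproc`.

    Follows the arithmetic of the ScaLAPACK tool routine NUMROC:
    n        global number of rows/columns
    nb       block size
    iproc    coordinate of this process along the grid dimension
    isrcproc coordinate of the process that owns the first block
    nprocs   number of processes along the grid dimension
    """
    # Distance of this process from the one holding the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    # Number of complete blocks in the global dimension.
    nblocks = n // nb
    # Every process gets at least this many rows from full rounds of blocks.
    local = (nblocks // nprocs) * nb
    # Complete blocks left over after the full rounds.
    extrablks = nblocks % nprocs
    if mydist < extrablks:
        local += nb          # this process receives one extra full block
    elif mydist == extrablks:
        local += n % nb      # this process receives the trailing partial block
    return local


def local_sizes(n, nb, nprocs, isrcproc=0):
    """Hypothetical helper: local extents for every process along one dimension."""
    return [numroc(n, nb, p, isrcproc, nprocs) for p in range(nprocs)]


if __name__ == "__main__":
    # Illustrative sizes only, not taken from the test case.
    print(local_sizes(200, 64, 4))  # global size larger than the block size
    print(local_sizes(10, 64, 4))   # global size smaller than the block size
```

With these illustrative numbers, the first call gives every process a non-zero extent (64, 64, 64, 8), while the second leaves only the first process with data (10, 0, 0, 0), which corresponds to the "global size less than the block size" case described above.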

In the attached test case, the bug only occurs if single-node matrices call ZUNGQR and multi-node matrices call PZUNGQR. If all nodes call PZUNGQR, it does not occur. However, in our full code the bug seems to occur sometimes even if all nodes call PZUNGQR. Unfortunately, I was not able to reduce this particular behavior to a simple test case.

Thanks, John

John_Young
New Contributor I

It just occurred to me that mixing PZUNGQR and ZUNGQR is not the issue. The primary issue is that some nodes call PZUNGQR and some don't. If the ZGEQRF/ZUNGQR call in the attached test case is commented out, the bug still occurs, since only the matrices that are really distributed call PZUNGQR and the single-node matrices do nothing for the QR.
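As an analogy only, the general hazard of a collective step that some participants skip can be sketched with a thread barrier. This is not a model of MKL or ScaLAPACK internals and does not reproduce the reported hang; the function `run_collective`, the use of threads in place of MPI processes, and the timeout value are all assumptions made for illustration.

```python
import threading


def run_collective(n_procs, callers, timeout=1.0):
    """Simulate a collective step that expects all `n_procs` participants.

    `callers` is the set of ranks that actually enter the collective.
    Ranks outside `callers` skip it, as if they never made the call.
    Returns a dict mapping rank -> "completed", "stalled", or "skipped".
    """
    barrier = threading.Barrier(n_procs)  # needs every rank to arrive
    results = {}

    def worker(rank):
        if rank not in callers:
            results[rank] = "skipped"
            return
        try:
            # Blocks until all n_procs ranks arrive or the timeout expires.
            barrier.wait(timeout=timeout)
            results[rank] = "completed"
        except threading.BrokenBarrierError:
            # Raised when the wait times out or another waiter timed out.
            results[rank] = "stalled"

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_collective(4, callers={0, 1, 2, 3}))  # every rank participates
    print(run_collective(4, callers={0, 1, 2}))     # rank 3 never calls
```

When every rank participates, all of them complete; when one rank skips the step, the others wait until the timeout and report a stall. A real MPI-based library has no such timeout, so a mismatch of this kind would typically show up as a hang or as unmatched messages in a later call.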

Zhang_Z_Intel
Employee

John,

The first issue you reported earlier is still being investigated by the MKL team. Thank you very much for the additional information. We will make sure our fix covers this new scenario as well.

 

Stephen_G_
Beginner

Zhang,

We appreciate that you are working on this. Do you have an approximate time estimate for a fix? This bug in MKL has been holding up a deliverable to our customer. I know you can't put a firm date on it. However, if you could estimate in days, weeks, or months, it would be helpful. Thank you!

 

Zhang_Z_Intel
Employee

Thanks for letting us know the impact of this issue on your deliveries. I'll send you a note with an estimate as soon as I have a solid idea of where we are, hopefully in 1-2 business days.

Stephen_G_
Beginner

Zhang,

We really appreciate that.  Thank you!
