Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Scalapack BUG in PZUNGQR (Again)

John_Young
New Contributor I
1,036 Views

Hi,

Previously, we reported a possible ScaLAPACK bug in the PZUNGQR function:

             http://software.intel.com/en-us/forums/topic/473803

That issue has still not been resolved, but it was stated that it was an issue with zero-sized matrices on some nodes. However, we have encountered a somewhat similar issue with PZUNGQR even when the local matrices do not have zero size. In the attached test case, the PZGEMM call that follows the PZUNGQR call will either hang or produce an Irecv error, even though the QR matrices and the PZGEMM matrices are non-zero sized on all nodes. Interestingly, if the matrices used in the PZGEMM call have a global size less than the block size (so only one node has non-zero sized matrices), then it completes fine.
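For anyone wanting to check which processes end up with zero-sized local pieces for a given global size and block size, here is a minimal sketch. It re-implements the arithmetic of the ScaLAPACK tool routine NUMROC in Python purely for illustration. The helper name `local_shapes`, the 2x2 grid, and the sizes used are assumptions, not values from the attached test case, and the source process is assumed to be row 0, column 0.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows or columns of a block-cyclically distributed dimension owned by iproc.

    Mirrors the arithmetic of ScaLAPACK's NUMROC tool routine.
    """
    # Distance of this process from the process that owns the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of whole blocks
    count = (nblocks // nprocs) * nb       # whole blocks every process gets
    extrablks = nblocks % nprocs           # leftover whole blocks
    if mydist < extrablks:
        count += nb                        # one extra whole block
    elif mydist == extrablks:
        count += n % nb                    # the trailing partial block
    return count


def local_shapes(m, n, mb, nb, nprow, npcol):
    """Hypothetical helper: local (rows, cols) per grid coordinate."""
    return {
        (pr, pc): (numroc(m, mb, pr, 0, nprow), numroc(n, nb, pc, 0, npcol))
        for pr in range(nprow)
        for pc in range(npcol)
    }


if __name__ == "__main__":
    # Assumed example: 3x3 global matrix, 4x4 blocks, 2x2 process grid.
    for coord, shape in sorted(local_shapes(3, 3, 4, 4, 2, 2).items()):
        print(coord, shape)
```

With a global size smaller than the block size, only grid position (0, 0) owns any data, which is the "only one node has non-zero sized matrices" situation described above.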

In the attached test case, the bug only occurs if single-node matrices call ZUNGQR and multi-node matrices call PZUNGQR. If all nodes call PZUNGQR, it does not occur. However, in our full code the bug seems to occur sometimes even if all nodes call PZUNGQR. Unfortunately, I was not able to reduce this particular behavior down to a simple test case.

Thanks, John

0 Kudos
5 Replies
John_Young
New Contributor I

In the attached test case, the bug only occurs if single-node matrices call ZUNGQR and multi-node matrices call PZUNGQR. If all nodes call PZUNGQR, it does not occur. However, in our full code the bug seems to occur sometimes even if all nodes call PZUNGQR. Unfortunately, I was not able to reduce this particular behavior down to a simple test case.

It just occurred to me that mixing PZUNGQR and ZUNGQR is not the issue. The primary issue is that some nodes call PZUNGQR and some don't. If the ZGEQRF/ZUNGQR call in the attached test case is commented out, the bug still occurs, since only the matrices that are really distributed call PZUNGQR and the single-node matrices do nothing for the QR.

Zhang_Z_Intel
Employee

John,

The first issue you reported earlier is still being investigated by the MKL team. Thank you very much for the additional information. We will make sure our fix covers this new scenario as well.

 

Stephen_G_
Beginner

Zhang,

We appreciate that you are working on this. Do you have an approximate time estimate for a fix? This bug in MKL has been holding up a deliverable to our customer. I know you can't put a firm date on it. However, if you could estimate in days, weeks, or months, it would be helpful. Thank you!

 

Zhang_Z_Intel
Employee

gedney@engr.uky.edu wrote:

Zhang,

We appreciate that you are working on this. Do you have an approximate time estimate for a fix? This bug in MKL has been holding up a deliverable to our customer. I know you can't put a firm date on it. However, if you could estimate in days, weeks, or months, it would be helpful. Thank you!

 

Thanks for letting us know the impact of this issue on your deliveries. I'll send you a note with an estimate as soon as I have a solid idea of where we are now, hopefully in 1-2 business days.

Stephen_G_
Beginner

Zhang,

We really appreciate that.  Thank you!
