Paul_C_2
Beginner
145 Views

zgetrf error handling regression

Hello,

We are upgrading MKL from 11.2.2.1 to 2017 Update 2 and noticed a change in how zgetrf handles a singular matrix. In 11.2.2.1, zgetrf returned a positive number (via info) identifying the problem pivot. In 2017.2 it raises a floating-point division-by-zero exception instead, so we no longer learn the problem pivot number. Was this an intentional change? If so, how do we find out the problem pivot number?

A simple test case is a 4x4 complex matrix represented by:

+        [0]    {d_re=0.00000000000000000 d_im=1000000.0000000000 }    complex
+        [1]    {d_re=0.00000000000000000 d_im=1000000.0000000000 }    complex
+        [2]    {d_re=0.00000000000000000 d_im=0.00000000000000000 }    complex
+        [3]    {d_re=0.00000000000000000 d_im=0.00000000000000000 }    complex
+        [4]    {d_re=0.00000000000000000 d_im=1000000.0000000000 }    complex
+        [5]    {d_re=0.00000000000000000 d_im=1000000.0000000000 }    complex
+        [6]    {d_re=0.00000000000000000 d_im=0.00000000000000000 }    complex
+        [7]    {d_re=0.00000000000000000 d_im=0.00000000000000000 }    complex
+        [8]    {d_re=0.00000000000000000 d_im=0.00000000000000000 }    complex
+        [9]    {d_re=0.00000000000000000 d_im=0.00000000000000000 }    complex
+        [10]    {d_re=1000000.0005243536 d_im=-16.191775445209515 }    complex
+        [11]    {d_re=-1000000.0000000000 d_im=0.00000000000000000 }    complex
+        [12]    {d_re=0.00000000000000000 d_im=0.00000000000000000 }    complex
+        [13]    {d_re=0.00000000000000000 d_im=0.00000000000000000 }    complex
+        [14]    {d_re=-1000000.0000000000 d_im=0.00000000000000000 }    complex
+        [15]    {d_re=1000000.0005243536 d_im=-16.191775445209515 }    complex

Exception info:

First-chance exception at 0x00007FFBFAFEC926 (mkl_avx2.dll) in blah.exe: 0xC000008E: Floating-point division by zero (parameters: 0x0000000000000000).

In our exe, we translate select structured exceptions like this into C++ exceptions so we can deal with computation errors at a higher level.

 

Thanks,

Paul

 

9 Replies
Gennady_F_Intel
Moderator

Hello Paul, at first glance this might be caused by the inexactness of floating-point arithmetic and the FMA instruction set. We need to check this more carefully.

Gennady_F_Intel
Moderator

Paul, I see no exceptions from the LU routine with the data you gave. The example code is attached. Here is the output I see on my side with the latest MKL 2017 u2:

..\mkl_Forums\u731589>2017.exe

 ZGESVD Example Program Results
Major version:           2017
Minor version:           0
Update version:          2
Product status:          Product
Build:                   20170126
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

 info, zgetrf = 2
 ipiv:
 [0] = 1
, [1] = 2
, [2] = 3
, [3] = 4

Paul_C_2
Beginner

We finally have a test case. I have a zip file of a VS solution with three projects: a command-line exe that loads a win32 dll, which in turn depends on a FORTRAN project. It seems that loading the FORTRAN runtime libs triggers the problem. The zip file is 92 MB if I include the MKL libs, and that file fails to upload. Should I remove the MKL libs and try again?

Gennady_F_Intel
Moderator

Yes, you may remove the MKL libs and upload the project.

You may also try to check the problem with the latest MKL 2017 u3, which we released one week ago. The announcement is at the top of the forum.

Paul_C_2
Beginner

OK, a version without the binary libs is attached. We'll try Update 3 later this week.

Gennady_F_Intel
Moderator

I checked with MKL 2017 u3 on three different CPUs. I only added the mkl_version routine, and I see the same behavior and no exceptions.

Windows 8.1, 64 bit.

 ZGESVD Example Program Results
Major version:           2017
Minor version:           0
Update version:          3
Product status:          Product
Build:                   20170413
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors
================================================================

 info, zgetrf = 2
Press any key to continue . . .

 ZGESVD Example Program Results
Major version:           2017
Minor version:           0
Update version:          3
Product status:          Product
Build:                   20170413
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors
================================================================

 info, zgetrf = 2
Press any key to continue . . .

 ZGESVD Example Program Results
Major version:           2017
Minor version:           0
Update version:          3
Product status:          Product
Build:                   20170413
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

 info, zgetrf = 2
Press any key to continue . . .

Paul_C_2
Beginner

We just verified that 2017 u3 seems to work. Can you do us a favor and try it on u2 just to confirm our findings and maybe let us know what changed? It seemed like the FORTRAN runtime libs were putting the exception mask into a bad state.

Paul_C_2
Beginner

Any update on what changed between update 2 and update 3 that fixed this issue?

 

Eugene_C_Intel1
Employee

Hi Paul,

You are right, it was a bug in the LU implementation for small sizes (2x2, 3x3 and 4x4). It was introduced in MKL 2017 Update 1 and fixed in MKL 2017 Update 3. A column was scaled even when the pivot was zero, which caused a division by zero and NaNs in the matrix.
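
To make the description above concrete, here is a minimal sketch of an unblocked LU with partial pivoting that guards the column scaling. It is not MKL's code: `lu_small`, `cabs1`, and `factor_thread_matrix` are hypothetical names chosen for illustration, and the routine only mirrors the documented meaning of a positive info (the 1-based index of the first zero pivot). The part that matters is the `if` around the scaling loop, which skips the division when the pivot is zero.

```c
#include <assert.h>
#include <complex.h>
#include <math.h>

/* |re| + |im|: an inexpensive magnitude for comparing pivot candidates. */
static double cabs1(double complex z)
{
    return fabs(creal(z)) + fabs(cimag(z));
}

/*
 * Hypothetical helper (not an MKL routine): unblocked LU with partial
 * pivoting of an n-by-n column-major matrix, overwriting a with L and U.
 * Returns 0, or the 1-based index of the first zero pivot.
 */
static int lu_small(int n, double complex *a, int lda, int *ipiv)
{
    int info = 0;
    assert(n >= 0 && lda >= n);

    for (int k = 0; k < n; ++k) {
        /* Find the largest entry on or below the diagonal in column k. */
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (cabs1(a[i + k * lda]) > cabs1(a[p + k * lda]))
                p = i;
        ipiv[k] = p + 1; /* 1-based pivot row */

        if (cabs1(a[p + k * lda]) != 0.0) {
            /* Swap rows k and p across all columns. */
            if (p != k) {
                for (int j = 0; j < n; ++j) {
                    double complex t = a[k + j * lda];
                    a[k + j * lda] = a[p + j * lda];
                    a[p + j * lda] = t;
                }
            }
            /* Scale the subdiagonal of column k by the nonzero pivot. */
            for (int i = k + 1; i < n; ++i)
                a[i + k * lda] /= a[k + k * lda];
        } else if (info == 0) {
            /* Zero pivot: record it and skip the scaling entirely. */
            info = k + 1;
        }

        /* Rank-1 update of the trailing submatrix. */
        for (int j = k + 1; j < n; ++j)
            for (int i = k + 1; i < n; ++i)
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
    }
    return info;
}

/* Factor the 4x4 matrix from the first post; fills ipiv and returns info. */
static int factor_thread_matrix(int ipiv[4])
{
    const double complex d = 1000000.0005243536 - 16.191775445209515 * I;
    const double complex j6 = 1000000.0 * I;
    double complex a[16] = {
        j6, j6, 0, 0,
        j6, j6, 0, 0,
        0, 0, d, -1000000.0,
        0, 0, -1000000.0, d
    };
    return lu_small(4, a, 4, ipiv);
}
```

Calling `factor_thread_matrix` on the matrix from the first post gives info = 2 with pivots 1, 2, 3, 4, which agrees with the output Gennady posted. Removing the guard would divide the second column by a zero pivot, which is the kind of division by zero described above.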
