
Paul_C_2 · ‎04-21-2017

Hello,

We are in the process of trying to upgrade our MKL from 11.2.2.1 to 2017 update 2. We noticed a change in how a singular matrix is handled by zgetrf. In 11.2.2.1, zgetrf would return (via info) a positive number indicating the problem pivot point. Now 2017.2 throws a floating point division by zero exception and we do not know the problem pivot number. Was this an intentional change? If so, how do we find out the problem pivot number?

A simple test case is a 4x4 complex matrix represented by:

+       [0]   {d_re=0.00000000000000000 d_im=1000000.0000000000 }   complex
+       [1]   {d_re=0.00000000000000000 d_im=1000000.0000000000 }   complex
+       [2]   {d_re=0.00000000000000000 d_im=0.00000000000000000 }   complex
+       [3]   {d_re=0.00000000000000000 d_im=0.00000000000000000 }   complex
+       [4]   {d_re=0.00000000000000000 d_im=1000000.0000000000 }   complex
+       [5]   {d_re=0.00000000000000000 d_im=1000000.0000000000 }   complex
+       [6]   {d_re=0.00000000000000000 d_im=0.00000000000000000 }   complex
+       [7]   {d_re=0.00000000000000000 d_im=0.00000000000000000 }   complex
+       [8]   {d_re=0.00000000000000000 d_im=0.00000000000000000 }   complex
+       [9]   {d_re=0.00000000000000000 d_im=0.00000000000000000 }   complex
+       [10]   {d_re=1000000.0005243536 d_im=-16.191775445209515 }   complex
+       [11]   {d_re=-1000000.0000000000 d_im=0.00000000000000000 }   complex
+       [12]   {d_re=0.00000000000000000 d_im=0.00000000000000000 }   complex
+       [13]   {d_re=0.00000000000000000 d_im=0.00000000000000000 }   complex
+       [14]   {d_re=-1000000.0000000000 d_im=0.00000000000000000 }   complex
+       [15]   {d_re=1000000.0005243536 d_im=-16.191775445209515 }   complex

Exception info:

First-chance exception at 0x00007FFBFAFEC926 (mkl_avx2.dll) in blah.exe: 0xC000008E: Floating-point division by zero (parameters: 0x0000000000000000).

In our exe, we translate select structured exceptions like this to C++ exceptions so we can deal with computation errors at a higher level.

Thanks,

Paul

Gennady_F_Intel · ‎04-23-2017

Hello Paul, at first glance this might be caused by the inexactness of floating-point arithmetic and the FMA instruction set. We need to check this more carefully.

Gennady_F_Intel · ‎04-23-2017

Paul, I see no exceptions from the LU routine with the data you gave. The example code is attached. Here is the output I see on my side with the latest MKL 2017 u2:

..\mkl_Forums\u731589>2017.exe

ZGESVD Example Program Results
Major version: 2017
Minor version: 0
Update version: 2
Product status: Product
Build: 20170126
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

info, zgetrf = 2
ipiv:
[0] = 1, [1] = 2, [2] = 3, [3] = 4
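
For readers interpreting the `info` value in the output above: the reference LAPACK documentation for the `getrf` family defines three cases for the return code. The helper below is only an illustrative sketch of that convention; the name `describe_getrf_info` is hypothetical and is not part of MKL or LAPACK.

```python
def describe_getrf_info(info):
    """Explain a getrf-style return code (hypothetical helper, not an MKL API)."""
    if info == 0:
        return "factorization completed with no zero pivot"
    if info < 0:
        # A negative value points at a bad argument, counted from 1.
        return f"argument {-info} had an illegal value"
    # A positive value is the 1-based index of the first exactly zero pivot.
    # The factorization still completes, but U is exactly singular.
    return (f"U({info},{info}) is exactly zero; the factorization completed, "
            "but using it to solve a system would divide by zero")


print(describe_getrf_info(2))
```

Under this convention, `info = 2` means the second pivot is exactly zero, which is the "problem pivot number" asked about in the first post.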

Paul_C_2 · ‎05-16-2017

We finally have a test case. I have a zip file of a VS solution with three projects: A command line exe that loads a win32 dll which in turn depends on a FORTRAN project. It seems loading the FORTRAN runtime libs triggers the problem. The zip file is 92 MB if I include the MKL libs and that file fails to upload. Should I remove the MKL libs and try again?

Gennady_F_Intel · ‎05-16-2017

Yes, you may remove the MKL libs and upload the project.

You may also check whether the problem persists with the latest MKL 2017 u3, which we released a week ago. The announcement is at the top of the forum.

Paul_C_2 · ‎05-17-2017

Ok, version without bin libs is attached. We'll try update 3 later this week.

Gennady_F_Intel · ‎05-17-2017

I checked with MKL 2017 u3 on three different CPUs. I only added the mkl_version routine, and I see the same behavior and no exceptions.

Windows 8.1, 64 bit.

ZGESVD Example Program Results
Major version: 2017
Minor version: 0
Update version: 3
Product status: Product
Build: 20170413
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors
================================================================

info, zgetrf = 2
Press any key to continue . . .

ZGESVD Example Program Results
Major version: 2017
Minor version: 0
Update version: 3
Product status: Product
Build: 20170413
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors
================================================================

info, zgetrf = 2
Press any key to continue . . .

ZGESVD Example Program Results
Major version: 2017
Minor version: 0
Update version: 3
Product status: Product
Build: 20170413
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

info, zgetrf = 2
Press any key to continue . . .

Paul_C_2 · ‎05-18-2017

We just verified that 2017 u3 seems to work. Can you do us a favor and try it on u2 just to confirm our findings and maybe let us know what changed? It seemed like the FORTRAN runtime libs were putting the exception mask into a bad state.

Paul_C_2 · ‎07-18-2017

Any update on what changed between update 2 and update 3 that fixed this issue?

Eugene_C_Intel1 · ‎07-18-2017

Hi Paul,

You are right, it was a bug in the LU implementation for small sizes (2x2, 3x3 and 4x4). It was introduced in MKL 2017 Update 1 and fixed in MKL 2017 Update 3. A column was scaled even when the pivot was zero, which caused a division by zero and NaNs in the matrix.
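
To illustrate the zero-pivot guard described above, here is a minimal pure-Python sketch of an unblocked LU factorization with partial pivoting. It is not MKL's code: the function names `cabs1` and `getrf_unblocked` are made up for this example, and the pivot search using |re| + |im| is an assumption modeled on the reference BLAS `izamax` behavior. The matrix is the 4x4 test case from the first post, read as column-major.

```python
def cabs1(z):
    # Pivot magnitude as |re| + |im| (assumed, following reference izamax).
    return abs(z.real) + abs(z.imag)


def getrf_unblocked(a):
    """LU-factor a square list-of-rows matrix in place.

    Returns (ipiv, info), both 1-based like the LAPACK convention:
    info > 0 is the index of the first exactly zero pivot.
    """
    n = len(a)
    ipiv = [0] * n
    info = 0
    for j in range(n):
        # Largest entry in column j, on or below the diagonal.
        p = max(range(j, n), key=lambda i: cabs1(a[i][j]))
        ipiv[j] = p + 1
        if a[p][j] == 0:
            # Zero pivot: remember the first one and skip the scaling.
            # Scaling here would divide by zero, which is the failure
            # mode described in the reply above.
            if info == 0:
                info = j + 1
            continue
        a[j], a[p] = a[p], a[j]
        for i in range(j + 1, n):
            a[i][j] /= a[j][j]              # scale the column by the pivot
            for k in range(j + 1, n):
                a[i][k] -= a[i][j] * a[j][k]  # update the trailing block
    return ipiv, info


# Test matrix from the first post, flat and column-major.
d = complex(1000000.0005243536, -16.191775445209515)
m = 1000000.0j
flat = [m, m, 0, 0,
        m, m, 0, 0,
        0, 0, d, -1000000.0,
        0, 0, -1000000.0, d]
a = [[complex(flat[i + 4 * j]) for j in range(4)] for i in range(4)]

ipiv, info = getrf_unblocked(a)
print("info =", info)
print("ipiv =", ipiv)
```

With the guard in place, this sketch reports `info = 2` and pivots 1, 2, 3, 4 for the test matrix, in line with the MKL output posted earlier in the thread, and leaves no NaNs in the factored matrix.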

zgetrf error handling regression