Hello,
We are upgrading MKL from 11.2.2.1 to 2017 Update 2 and noticed a change in how zgetrf handles a singular matrix. In 11.2.2.1, zgetrf returned a positive value in info identifying the problem pivot. In 2017 Update 2 it instead raises a floating-point division-by-zero exception, so we no longer learn which pivot failed. Was this change intentional? If so, how do we find the problem pivot number?
A simple test case is a 4x4 complex matrix represented by:
+ [0] {d_re=0.00000000000000000 d_im=1000000.0000000000 } complex
+ [1] {d_re=0.00000000000000000 d_im=1000000.0000000000 } complex
+ [2] {d_re=0.00000000000000000 d_im=0.00000000000000000 } complex
+ [3] {d_re=0.00000000000000000 d_im=0.00000000000000000 } complex
+ [4] {d_re=0.00000000000000000 d_im=1000000.0000000000 } complex
+ [5] {d_re=0.00000000000000000 d_im=1000000.0000000000 } complex
+ [6] {d_re=0.00000000000000000 d_im=0.00000000000000000 } complex
+ [7] {d_re=0.00000000000000000 d_im=0.00000000000000000 } complex
+ [8] {d_re=0.00000000000000000 d_im=0.00000000000000000 } complex
+ [9] {d_re=0.00000000000000000 d_im=0.00000000000000000 } complex
+ [10] {d_re=1000000.0005243536 d_im=-16.191775445209515 } complex
+ [11] {d_re=-1000000.0000000000 d_im=0.00000000000000000 } complex
+ [12] {d_re=0.00000000000000000 d_im=0.00000000000000000 } complex
+ [13] {d_re=0.00000000000000000 d_im=0.00000000000000000 } complex
+ [14] {d_re=-1000000.0000000000 d_im=0.00000000000000000 } complex
+ [15] {d_re=1000000.0005243536 d_im=-16.191775445209515 } complex
Exception info:
First-chance exception at 0x00007FFBFAFEC926 (mkl_avx2.dll) in blah.exe: 0xC000008E: Floating-point division by zero (parameters: 0x0000000000000000). In our exe, we translate select structured exceptions like this to C++ exceptions so we can deal with computation errors at a higher level.
Thanks,
Paul
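For readers who want to reproduce the expected result without MKL, here is a minimal plain-Python sketch of unblocked LU with partial pivoting that follows the info convention documented for LAPACK's getrf (info = k > 0 means U(k,k) is exactly zero). It is not MKL's implementation; the function name `lu_partial_pivot` and the flat column-major layout are assumptions made for illustration. The matrix is the 4x4 test case from the post above.

```python
def lu_partial_pivot(a, n):
    """In-place LU with partial pivoting on a flat column-major n x n matrix.

    Returns (ipiv, info): ipiv holds 1-based pivot rows, info is 0 on success
    or the 1-based index of the first exactly-zero pivot.
    """
    ipiv, info = [], 0
    for k in range(n):
        # Pivot search: first row at or below the diagonal with the largest |re| + |im|.
        p = max(range(k, n), key=lambda r: abs(a[r + k * n].real) + abs(a[r + k * n].imag))
        ipiv.append(p + 1)
        if a[p + k * n] == 0:
            if info == 0:
                info = k + 1
            # Zero pivot: skip the scaling. The whole sub-column is zero,
            # so the trailing update would be a no-op anyway.
            continue
        if p != k:
            for c in range(n):
                a[k + c * n], a[p + c * n] = a[p + c * n], a[k + c * n]
        pivot = a[k + k * n]
        for r in range(k + 1, n):
            a[r + k * n] /= pivot          # multipliers of L
        for c in range(k + 1, n):
            for r in range(k + 1, n):
                a[r + c * n] -= a[r + k * n] * a[k + c * n]
    return ipiv, info


z = 0j
d = complex(1000000.0005243536, -16.191775445209515)
matrix = [
    1e6j, 1e6j, z, z,
    1e6j, 1e6j, z, z,
    z, z, d, -1e6 + 0j,
    z, z, -1e6 + 0j, d,
]
print(lu_partial_pivot(list(matrix), 4))  # ([1, 2, 3, 4], 2)
```

The first two columns are identical, so eliminating column 1 leaves an exactly zero pivot in column 2, which is why the sketch reports info = 2.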
Hello Paul, at first glance this might be caused by the inexactness of floating-point arithmetic and the FMA instruction set. We need to check this more carefully.
Paul, I see no exceptions from the LU routine with the data you gave. The example code is attached. Here is the output I see on my side with the latest MKL 2017 u2:
..\mkl_Forums\u731589>2017.exe
ZGESVD Example Program Results
Major version: 2017
Minor version: 0
Update version: 2
Product status: Product
Build: 20170126
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================
info, zgetrf = 2
ipiv: [0] = 1, [1] = 2, [2] = 3, [3] = 4
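As a reading aid for this output: under the documented LAPACK getrf convention, info = 2 means U(2,2) is exactly zero, and ipiv(i) records that row i was interchanged with row ipiv(i). The small sketch below converts such a 1-based ipiv array into a 0-based row order; the helper name `ipiv_to_permutation` is hypothetical and not part of MKL.

```python
def ipiv_to_permutation(ipiv):
    """Turn LAPACK-style 1-based row interchanges into a 0-based row order."""
    order = list(range(len(ipiv)))
    for i, p in enumerate(ipiv):
        # Step i swapped row i with row p - 1 (p is 1-based).
        order[i], order[p - 1] = order[p - 1], order[i]
    return order


print(ipiv_to_permutation([1, 2, 3, 4]))  # [0, 1, 2, 3]: no rows were swapped
```

For the ipiv shown above, every row was "swapped" with itself, so the factorization used the rows in their original order.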
We finally have a test case. I have a zip file of a VS solution with three projects: a command-line exe that loads a Win32 DLL, which in turn depends on a FORTRAN project. It seems that loading the FORTRAN runtime libs triggers the problem. The zip file is 92 MB if I include the MKL libs, and a file of that size fails to upload. Should I remove the MKL libs and try again?
Yes, you may remove the MKL libs and upload the project.
You may also check the problem against the latest MKL 2017 u3, which we released one week ago; the announcement is at the top of the forum.
I checked with MKL 2017 u3 on three different CPUs. I only added the mkl_version routine, and I see the same behavior and no exceptions.
Windows 8.1, 64 bit.
ZGESVD Example Program Results
Major version: 2017
Minor version: 0
Update version: 3
Product status: Product
Build: 20170413
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors
================================================================
info, zgetrf = 2
Press any key to continue . . .
ZGESVD Example Program Results
Major version: 2017
Minor version: 0
Update version: 3
Product status: Product
Build: 20170413
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors
================================================================
info, zgetrf = 2
Press any key to continue . . .
ZGESVD Example Program Results
Major version: 2017
Minor version: 0
Update version: 3
Product status: Product
Build: 20170413
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================
info, zgetrf = 2
Press any key to continue . . .
We just verified that 2017 u3 seems to work. Can you do us a favor and try it on u2 just to confirm our findings and maybe let us know what changed? It seemed like the FORTRAN runtime libs were putting the exception mask into a bad state.
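To illustrate what an exception mask changes, here is a loose analogy using Python's `decimal` module, whose contexts have per-signal traps. This is not the x87/SSE control word or anything MKL or the FORTRAN runtime does; it only shows how the same division either raises or quietly returns infinity and sets a flag, depending on whether the trap is enabled. The function name `divide_with_trap` is hypothetical.

```python
import decimal


def divide_with_trap(trap_enabled):
    """Compute 1/0 with the DivisionByZero trap enabled or disabled."""
    with decimal.localcontext() as ctx:
        ctx.traps[decimal.DivisionByZero] = trap_enabled
        ctx.clear_flags()
        try:
            result = decimal.Decimal(1) / decimal.Decimal(0)
        except decimal.DivisionByZero:
            return "raised", None
        status = "flag set" if ctx.flags[decimal.DivisionByZero] else "no flag"
        return status, result


print(divide_with_trap(True))   # trap enabled: the division raises
print(divide_with_trap(False))  # trap disabled: result is Infinity and a flag is set
```

By analogy, a stray division by zero inside a library may go unnoticed while traps are masked and surface as a hard exception once something in the process unmasks them.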
Any update on what changed between update 2 and update 3 that fixed this issue?
Hi Paul,
You are right, it was a bug in the LU implementation for small sizes (2x2, 3x3 and 4x4). It was introduced in MKL 2017 Update 1 and fixed in MKL 2017 Update 3. A column was scaled even when the pivot was zero, which caused a division by zero and NaNs in the matrix.
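The difference between the two behaviors can be sketched in a few lines of plain Python. This is only an illustration of the guard described above, not MKL's code; `scale_subcolumn` and its `skip_zero_pivot` flag are hypothetical names. Python always raises on division by zero, which loosely stands in for the unmasked floating-point trap seen in the original report; with traps masked, the same step would produce Inf or NaN values instead.

```python
def scale_subcolumn(entries, pivot, skip_zero_pivot):
    """Divide the entries below the diagonal by the pivot to form L's multipliers."""
    if skip_zero_pivot and pivot == 0:
        # Guarded path: leave the column untouched; the caller reports info instead.
        return list(entries)
    return [x / pivot for x in entries]


below_diagonal = [0j, 0j]  # column 2 of the 4x4 test matrix after the first elimination step
print(scale_subcolumn(below_diagonal, 0j, skip_zero_pivot=True))  # [0j, 0j]
try:
    scale_subcolumn(below_diagonal, 0j, skip_zero_pivot=False)
except ZeroDivisionError as exc:
    print("unguarded scaling failed:", exc)
```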