Intel® oneAPI Math Kernel Library

MKL DSS fails to solve regular system

Peter_Zajac
Beginner

I have recently integrated the sequential version of MKL DSS as a direct solver backend into our finite element software, and our regression test system reported a failure because MKL DSS failed to solve the linear test system on a Windows machine.

I have extracted the CSR matrix and the right-hand-side vector from our regression test and built a small example that triggers the erroneous behaviour; the code is attached. The matrix of the linear system is a symmetric indefinite saddle-point matrix. I have made sure that all main diagonal entries exist in the sparsity pattern, although the last few dozen rows store zeros on their main diagonal because of the saddle-point nature of the matrix; the sparsity pattern also contains many other explicitly stored zeros. I have analyzed this matrix in a linear algebra software package and found that all eigenvalues are real (as expected); the minimum absolute eigenvalue is roughly 1.166e-4 and the maximum absolute eigenvalue is 393.185, which gives a condition number of about 3.37e+6, so the matrix is definitely regular.
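The diagonal check described above can be sketched as follows. This is a minimal illustration, not taken from the attached test code: the helper name check_diagonal and the struct DiagonalReport are hypothetical, and the sketch assumes 0-based CSR indexing with a square matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of scanning the main diagonal of a square CSR matrix.
struct DiagonalReport {
    std::size_t missing = 0;     // rows with no diagonal entry in the sparsity pattern
    std::size_t stored_zero = 0; // rows whose stored diagonal entry equals 0.0
};

// Hypothetical helper (not part of MKL); assumes 0-based CSR indexing.
DiagonalReport check_diagonal(const std::vector<int>& row_ptr,
                              const std::vector<int>& col_idx,
                              const std::vector<double>& values)
{
    assert(!row_ptr.empty());
    assert(col_idx.size() == values.size());
    const std::size_t n = row_ptr.size() - 1;
    DiagonalReport report;
    for (std::size_t i = 0; i < n; ++i) {
        bool found = false;
        // Scan row i for an entry whose column index equals the row index.
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            if (static_cast<std::size_t>(col_idx[k]) == i) {
                found = true;
                if (values[k] == 0.0) ++report.stored_zero;
                break;
            }
        }
        if (!found) ++report.missing;
    }
    return report;
}
```

For a saddle-point matrix like the one described, one would expect missing to be 0 and stored_zero to equal the number of rows in the zero block.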

The attached test code uses MKL DSS to solve the system and afterwards computes the maximum norm of the defect by hand, which should be close to 0 if the system was solved correctly. I have set up MKL DSS with MKL_DSS_NON_SYMMETRIC for the structure definition, MKL_DSS_AUTO_ORDER for the reordering step and MKL_DSS_INDEFINITE for the factorization; all MKL DSS function calls returned MKL_DSS_SUCCESS.
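For readers without the attachment, a hand-computed defect check of this kind might look like the sketch below. It is an assumption about the general shape of such a check, not the original test code: the function name defect_max_norm is hypothetical, and 0-based CSR indexing is assumed. It evaluates the maximum norm of b - A*x without calling MKL, so the check is independent of the solver under test.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns max_i |b_i - (A*x)_i| for a square CSR matrix A.
// Assumes 0-based row pointers and column indices.
double defect_max_norm(const std::vector<int>& row_ptr,
                       const std::vector<int>& col_idx,
                       const std::vector<double>& values,
                       const std::vector<double>& x,
                       const std::vector<double>& b)
{
    const std::size_t n = b.size();
    assert(row_ptr.size() == n + 1);
    assert(x.size() == n);
    assert(col_idx.size() == values.size());
    double norm = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Accumulate the i-th component of the defect b - A*x.
        double r = b[i];
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            r -= values[k] * x[col_idx[k]];
        }
        norm = std::max(norm, std::fabs(r));
    }
    return norm;
}
```

In this thread, values around 1e-14 indicate a correctly solved system, whereas the reported value of about 48 indicates that the returned vector is not a solution at all.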

As mentioned before, the test code fails on x64 Windows 10 machines, giving a defect norm of about 48. The test code was compiled under Visual Studio 2022 and the installed MKL version is 2024.2. I am not sure whether this matters, but the two Windows machines that I have tested this on run AMD Ryzen 5 1600X and AMD Ryzen 7 5800X CPUs, respectively; unfortunately, these are the only Windows machines with MKL installed that I currently have access to. It does not seem to matter whether I use the ILP64 interface or not, or whether I use the sequential or threaded MKL libraries; the result is always the same.

It is important to note that the code works perfectly fine on our AMD EPYC Red Hat 9 Linux servers with MKL 2024.0, where the defect norm is 4.26326e-14, so this seems to be either a Windows issue, an issue with some AMD CPUs, or a combination of both.

For whatever reason, this website does not allow me to attach the source code as a cpp file, because it says that the content type does not match the file extension, so here it comes: 

 

 

 

Fengrui
Moderator

Hello,


Thank you for posting in the forum!

I tried to run the code on both Intel and AMD machines with Ubuntu, and on an Intel machine with Windows (using both the Intel and MSVC compilers). All tests showed a defect norm of ~10^-14. I'm trying to find an AMD machine with Windows to test on.


In the meantime, could you please build the code in a Command Prompt with "icpx -qmkl test.cpp" on the AMD Windows machine and run it?


Thanks,

Fengrui


Peter_Zajac
Beginner

Hello,

I have opened a command prompt and compiled from the command line using icpx, but I still get the incorrect result:

 

D:\Desktop\mkl_test>icpx --version
Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files (x86)\Intel\oneAPI\compiler\2024.2\bin\compiler
Configuration file: C:\Program Files (x86)\Intel\oneAPI\compiler\2024.2\bin\compiler\..\icpx.cfg

D:\Desktop\mkl_test>icpx -qmkl mkl_stokes_test.cpp

D:\Desktop\mkl_test>a.exe
Defect Norm: 48.7579

WARNING: defect is not even close to zero!


Also, I have a small request: for some reason, I cannot edit my original post, but my browser is struggling heavily to display this topic, probably because of the source code formatting. Could you please disable the source code formatting in my original post?

On a side note: I will be on holidays from tomorrow until the end of next week, so I won't be able to reply until the second week of September.

Best regards,
- Peter

Fengrui
Moderator

Hi Peter,

 

I finally found an AMD machine with Windows to test the code on. Yes, I can reproduce the issue. I will escalate it to the developer team for a fix. In the meantime, please try oneMKL Pardiso as a workaround. I tried oneMKL Pardiso on the same machine and got an error of ~10^-14.

 

I wasn't able to change your post. But let me consult my colleagues regarding this.

 

Thanks,

Fengrui

 

Fengrui
Moderator

I just moved the original source code into a zip file.

jirina
New Contributor I

Hello Fengrui,

As I am experiencing problems with DSS as well, similar to those reported by Peter, I implemented Pardiso and tried using it in my application, which solves a linear system of equations A*x=b repeatedly (let me call these repeated calls to the matrix solver iterations), with both the matrix A and the vector b changing. At the same time, I tested what Jörn suggested in his post in this thread, i.e. replacing mkl_def.2.dll from 2024.2 with version 2023.1.1.0.

This is what I found out:

  • DSS with MKL 2024.2: incorrect solution in the first iteration, i.e. the first call to DSS; the calculation then diverges.
  • DSS with MKL 2023.1: correct converged solution in several iterations.
  • Pardiso with MKL 2024.2: oscillating solution which does not converge to the correct result.
  • Pardiso with MKL 2023.1: correct converged solution in several iterations.

To summarize, switching from DSS to Pardiso does not resolve the problem. The workaround of using the older mkl_def.2.dll resolves the problem for both DSS and Pardiso; I am just not sure whether replacing a single DLL might break other functionality of my application.

Best regards,

Jiri

Fengrui
Moderator

Hi Jiri,

 

Thank you for providing this information. The cause of this issue is being investigated. Could you please share more details (the matrix A and vector/matrix b) about the case that Pardiso with oneMKL 2024.2 doesn't work either?

 

Thanks,

Fengrui

jirina
New Contributor I

Hi Fengrui,

 

Unfortunately, it would be difficult for me to create a reproducer for Pardiso. While in the case of DSS I can clearly see that the solution vector is wrong if MKL 2024.2 is used, it is not as straightforward in the case of Pardiso:

  • I am solving a problem that requires iterations to reach a solution.
  • In each iteration, the matrix A and the vector b change.

If I find some time, I will try creating a reproducer, but chances that I succeed are probably low.

What is important for me is that both DSS and Pardiso with MKL 2023.1.1.0 yield the same, correct solution of the problem I am solving, while with MKL 2024.2 the calculation with DSS diverges and the calculation with Pardiso oscillates. The fact that Pardiso does not diverge makes it difficult to create a reproducer.

jirina
New Contributor I

I am currently fighting a similar, if not the same, problem.

I have a console application written in C, and it calls DSS. If I run the app on a computer with an Intel processor, everything works fine and DSS yields a correct solution. If I run the app on a computer with an AMD processor, DSS yields a different solution, even though the matrix and the right hand side of the linear system of equations are the same (I double checked this by writing the matrix and the right hand side vector to text files and comparing them before calling dss_solve_real).
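When comparing solver inputs through text files as described above, the values need to be written with enough digits that two different doubles never print identically. A minimal sketch of such a dump follows. It is an illustration only, not the code used in this application: the helper names dump_vector and dump_to_string are hypothetical, and it is written in C++ although the application mentioned here is in C.

```cpp
#include <iomanip>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes one value per line using max_digits10
// significant digits (17 for IEEE 754 double), which is enough for
// distinct doubles to produce distinct text.
// Note: this leaves the stream's precision changed after the call.
void dump_vector(std::ostream& out, const std::vector<double>& v)
{
    out << std::setprecision(std::numeric_limits<double>::max_digits10);
    for (double x : v) {
        out << x << '\n';
    }
}

// Convenience wrapper that returns the dump as a string, e.g. for
// comparing two dumps in memory instead of diffing files.
std::string dump_to_string(const std::vector<double>& v)
{
    std::ostringstream oss;
    dump_vector(oss, v);
    return oss.str();
}
```

The same function can be applied to the CSR value array and to the right-hand side; if the dumps from the Intel and AMD machines are identical, differing solutions cannot be attributed to the input data.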

Also, the code calling DSS and the data it uses have not changed in the past few years, and the app worked well (DSS yielded correct solutions) with the MKL version from a few years ago on computers with an AMD processor.

The way I am testing the app is that I copy it to the folder with data along with needed MKL libraries, and I run the app in the command prompt in that folder.

I tested this on 2 computers with an Intel processor (i7), one of them running Windows 10 and the other Windows 11, and on 2 computers with an AMD processor (Threadripper 2950X, 7950X3D), one of them running Windows 10 and the other Windows 11.

Jörn
Novice

Hi jirina,

 

I've compiled the OP's test CPP file and have the same problem. Works on my Intel CPU, but fails on our server which has an AMD (EPYC) processor.

 

Since we've reported a similar problem with DGESVD in this forum, I tried the same "fix" for DSS and it seems to work: All you have to do is to replace mkl_def.2.dll by an older version - I used 2023.1.1.0.

 

So, in plain numbers:

mkl_def.2.dll version 2024.2.1.0: "Defect Norm: 48.7579"

mkl_def.2.dll version 2023.1.1.0: "Defect Norm: 6.14508e-14"

 

I'm quite concerned that DGESVD and DSS might only be the tip of the iceberg...

 

Best,

Jörn

jirina
New Contributor I

Hi Jörn,

I have just tried replacing the latest version of mkl_def.2.dll with an older version, namely 2023.1.1.0. This simple replacement resolved my problem and I get the same solution to A*x=b on computers with Intel and AMD processors!

Thank you very much for this temporary workaround; I am just afraid that I might be breaking other functionality of my app that uses MKL.

I am wondering where to go from here. I could submit a support ticket, but that would require me to create a sample reproducer, which would take more time than I can afford.

Best regards and thanks a lot again for sharing the workaround.

Jiri

 

Fengrui
Moderator

Dear customers,


This issue has been fixed and the fix will be included in the upcoming 2025.0.1 patch release.


Thanks,

Fengrui


Peter_Zajac
Beginner

Hi,

 

thank you, I will try it out once it is released.

 

Best regards,

 - Peter

Fengrui
Moderator

Hi Peter,


Did you get a chance to verify the fix in oneMKL 2025.0.1?

It is available now: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-download.html


Thanks,

Fengrui


Peter_Zajac
Beginner

Hello,

 

yes, I can confirm that the test code from the original post (as well as our original unit test) works now with MKL v2025.0.1 and the test code now gives me a defect norm of 1.57463e-10.

 

Thanks,

 - Peter

jirina
New Contributor I

Hi Fengrui,

Let me also confirm that the fix in oneMKL 2025.0.1 resolved my issues with both DSS and Pardiso.

Thank you,

Jiri

Fengrui
Moderator

Thank you all for the confirmation!

