Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Possible MKL Pardiso bug

xian-zhong_guous_cd-

In our application we use the Pardiso solver inside a nonlinear solver based on Newton's method. The primary computation involves repeated factorizations of the Jacobian matrix and repeated solves using the factored matrix. The application also solves a series of these problems, each independent of the others. The dimensions of these Jacobians are modest, well under 100k, but the speedup from Pardiso is substantial and increases with problem size.

However, we have observed failures in the solution process that appear to be random. Occasionally the errors do not occur at all. When a failure does occur, the Newton solver halts with failure messages that point to the factorization. One clue is that whenever these failures occur there is a spike in the memory use reported in the iparms array. Three numbers are reported: iparms[15], iparms[16] and iparms[17]. It is the last number that spikes. For example, from a recent run that failed:

Pardiso Factor Jacobian: icalls=329157 n= 1372 LUnz= 33900 Mflops= 1 memused = 682 237 381

Pardiso fbsolve BDS Jacobian: icalls=570727 n= 1372 RHS memuse= 682 237 381

Pardiso fbsolve BDS Jacobian: icalls=570728 n= 1372 RHS memuse= 682 237 381

Pardiso Factor Jacobian: icalls=329159 Circalls=1170 n= 1372 LUnz= 33900 Mflops= 1 memused = 682 237 645

Pardiso fbsolve BDS Jacobian: icalls=570730 n= 1372 RHS memuse= 682 237 645

Pardiso fbsolve BDS Jacobian: icalls=570731 n= 1372 RHS memuse= 682 237 645

Pardiso Factor Jacobian: icalls=329163 n= 1372 LUnz= 33900 Mflops= 1 memused = 682 237 15926505

Pardiso Factor Jacobian: icalls=329164 n= 1372 LUnz= 33900 Mflops= 1 memused = 682 237 15926505

Pardiso Factor Jacobian: icalls=329165 n= 1372 LUnz= 33900 Mflops= 1 memused = 682 237 15926505

Pardiso Factor Jacobian: icalls=329166 n= 1372 LUnz= 33900 Mflops= 1 memused = 682 237 15926505

Pardiso Factor Jacobian: icalls=329167 n= 1372 LUnz= 33900 Mflops= 1 memused = 682 237 15926505

In this case the run almost finished; typically the error occurs much sooner. Icalls is the number of times the factor or solve routine has been called. Yes, that is 329,167 factor calls and 570,731 solves.

The memuse numbers are in Kbytes, and this problem has only 1372 equations and 33,900 nonzeros in the factor. The input matrices are good and the structure is always the same. So the question is: why does the memory use spike to 15926505?
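To pin down exactly which call the jump first appears on, the log could be scanned automatically. The following is only a minimal sketch: it assumes the line format shown in the excerpt above, and the helper name `find_memory_spikes` and the 10x threshold are made up for illustration. The conversion to GiB assumes 1 Kbyte = 1024 bytes.

```python
import re

# Matches the three trailing memory figures in lines such as
# "... memused = 682 237 381" or "... RHS memuse= 682 237 381".
MEM_RE = re.compile(r"memused?\s*=\s*(\d+)\s+(\d+)\s+(\d+)")
CALL_RE = re.compile(r"\bicalls=(\d+)")


def find_memory_spikes(lines, factor=10):
    """Return (icalls, previous, current) wherever the third memory figure
    grows by more than `factor` relative to the previous logged line.

    Hypothetical helper; the threshold is an arbitrary choice.
    """
    spikes = []
    previous = None
    for line in lines:
        mem = MEM_RE.search(line)
        call = CALL_RE.search(line)
        if not mem or not call:
            continue  # not a Pardiso log line in the expected format
        current = int(mem.group(3))
        if previous is not None and current > factor * previous:
            spikes.append((int(call.group(1)), previous, current))
        previous = current
    return spikes


# Lines copied from the log excerpt in this post.
log = [
    "Pardiso Factor Jacobian: icalls=329159 Circalls=1170 n= 1372 LUnz= 33900 Mflops= 1 memused = 682 237 645",
    "Pardiso fbsolve BDS Jacobian: icalls=570731 n= 1372 RHS memuse= 682 237 645",
    "Pardiso Factor Jacobian: icalls=329163 n= 1372 LUnz= 33900 Mflops= 1 memused = 682 237 15926505",
    "Pardiso Factor Jacobian: icalls=329164 n= 1372 LUnz= 33900 Mflops= 1 memused = 682 237 15926505",
]

spikes = find_memory_spikes(log)
for icalls, before, after in spikes:
    # Kbytes -> GiB, assuming 1 Kbyte = 1024 bytes.
    print(f"icalls={icalls}: {before} KB -> {after} KB (~{after / 1024**2:.1f} GiB)")
```

Run over the full log, this would show whether the jump always lands on a factorization call and whether it coincides with a restart of the solver.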

This exact problem can be solved repeatedly with a dense solver with no errors, leading us to suspect that there is some issue inside Pardiso.

In this application the Pardiso solver starts over many times. That is, the Jacobian structure is always the same but independent problems are solved at several hundred time steps.

Sergey_Solovev__Inte

Could you provide us with the iparm() array (both before and after the reordering step) and the output statistics (msglvl=1) after the reordering, factorization and solve steps? That would help us reproduce your problem.

And what version of MKL do you use?
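One way to present the before/after iparm comparison compactly is to report only the entries that changed. This is a rough sketch, not part of MKL: `diff_iparm` is a hypothetical helper, and the snapshot values below are invented purely for illustration. The only fact assumed about PARDISO is that iparm has 64 entries.

```python
def diff_iparm(before, after):
    """Return {index: (old, new)} for entries that differ (zero-based indices)."""
    if len(before) != len(after):
        raise ValueError("iparm snapshots must have the same length")
    return {
        i: (old, new)
        for i, (old, new) in enumerate(zip(before, after))
        if old != new
    }


# Invented snapshots for illustration; real values come from the application.
iparm_before = [0] * 64
iparm_after = list(iparm_before)
iparm_after[16] = 645

changed = diff_iparm(iparm_before, iparm_after)
for index, (old, new) in sorted(changed.items()):
    # Show both zero-based (C) and one-based (Fortran) numbering.
    print(f"iparm[{index}] / iparm({index + 1}): {old} -> {new}")
```

Printing both numberings avoids ambiguity about whether an index such as 17 refers to C or Fortran indexing.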
