Intel® oneAPI Math Kernel Library

Problem with MKL PARDISO performance

Antoine__A_
Beginner

Hello,

I'm using PARDISO to solve a real unsymmetric problem with a very large matrix
(n = 40 000), about 3% non-zeros (45 986 096), and 1 right-hand side.

I'm quite disappointed with the computation time: on my computer (8 CPUs, 16432032 KB total memory, 1595 MHz), it takes 35 min. Does that seem normal to you?
I haven't found any performance comparison related to the size of the matrix...
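
To put the sizes quoted above in perspective, here is a minimal sketch in C that computes the non-zero fraction of the input matrix and a rough storage estimate. It assumes CSR storage with 8-byte double values and 4-byte integer indices; the function names are hypothetical and nothing here calls MKL.

```c
#include <assert.h>

/* Fraction of non-zero entries in an n-by-n matrix (between 0.0 and 1.0). */
static double nonzero_fraction(long long n, long long nnz) {
    assert(n > 0 && nnz >= 0);
    return (double)nnz / ((double)n * (double)n);
}

/* Bytes for CSR storage: nnz values, nnz column indices, n + 1 row pointers.
   Assumption: 8-byte doubles and 4-byte integer indices. */
static long long csr_bytes(long long n, long long nnz) {
    assert(n > 0 && nnz >= 0);
    return nnz * 8 + nnz * 4 + (n + 1) * 4;
}
```

For n = 40000 and 45986096 non-zeros, `nonzero_fraction` gives about 0.0287 (2.87%, which the post rounds to 3%), and `csr_bytes` gives 551993156 bytes, roughly 552 MB under the stated assumptions.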

Maybe my iparm settings are not optimized for my problem:
        iparm[0] = 1; /* No solver default */
        iparm[1] = 2; /* Fill-in reordering from METIS */
        iparm[2] = 8;
        iparm[3] = 31; /* CGS */
        iparm[4] = 0; /* No user fill-in reducing permutation */
        iparm[5] = 0; /* Write solution into x */
        iparm[6] = 0; /* Not in use */
        iparm[7] = 0; /* Max numbers of iterative refinement steps */
        iparm[8] = 0; /* Not in use */
        iparm[9] = 13; /* Perturb the pivot elements with 1E-13 */
        iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */
        iparm[11] = 0; /* Not in use */
        iparm[12] = 0; /* Not in use */
        iparm[13] = 0; /* Output: Number of perturbed pivots */
        iparm[14] = 0; /* Not in use */
        iparm[15] = 0; /* Not in use */
        iparm[16] = 0; /* Not in use */
        iparm[17] = -1; /* Output: Number of nonzeros in the factor LU */
        iparm[18] = -1; /* Output: Mflops for LU factorization */
        iparm[19] = 0; /* Output: Numbers of CG Iterations */
        iparm[31] = 1; /* Iterative solver */
        iparm[60] = 2; /* Out-of-Core resolution */
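
The listing above uses zero-based C indices, while PARDISO documentation often numbers the same entries from one, as iparm(1) to iparm(64). The sketch below shows one way to keep the two conventions apart. It assumes, following the current oneMKL documentation (worth checking against the manual for version 10.0), that the in-core/out-of-core switch is iparm(60), which is iparm[59] from C. The helper names are hypothetical and the code does not call MKL.

```c
#include <assert.h>
#include <string.h>

#define IPARM_LEN 64  /* PARDISO takes a 64-entry integer control array */

/* Hypothetical helper: map a one-based iparm(k) number to its zero-based C index. */
static int iparm_index(int one_based) {
    assert(one_based >= 1 && one_based <= IPARM_LEN);
    return one_based - 1;
}

/* Hypothetical helper: zero the array, then set a few entries from the
   listing by their one-based documentation numbers. */
static void set_basic_iparm(int iparm[IPARM_LEN], int ooc_mode) {
    memset(iparm, 0, IPARM_LEN * sizeof iparm[0]);
    iparm[iparm_index(1)]  = 1;        /* do not use solver defaults */
    iparm[iparm_index(2)]  = 2;        /* METIS fill-in reducing ordering */
    iparm[iparm_index(10)] = 13;       /* perturb pivots with 1E-13 */
    iparm[iparm_index(11)] = 1;        /* nonsymmetric permutation and scaling */
    iparm[iparm_index(60)] = ooc_mode; /* assumption: 0 = in-core, 2 = out-of-core */
}
```

Under that assumption, `set_basic_iparm(iparm, 2)` writes the mode to `iparm[59]` and leaves `iparm[60]` at zero, so it is worth confirming which entry actually controls out-of-core mode in the installed version.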


Thanks for any help you can give me
Yannick

P.S.: I don't know where to find the PARDISO version (it is not in mkl_pardiso.h).

Gennady_F_Intel
Moderator
Yannick, your case is pretty big, and you are working in out-of-core (OOC) mode, which is non-threaded, so this time actually looks reasonable for that case. What is the size of the RAM on your system, and what CPU type do you use?
Regarding the PARDISO version: in the latest version of MKL, PARDISO prints its version when msglvl == 1, or you can find the MKL version in ..\Documentation\mklsupport.txt.
--Gennady
Antoine__A_
Beginner
Thanks for your reply.
The CPUs are: dual-processor, quad-core Intel Xeon E5335, 2x4MB cache, 2.0GHz, 1333MHz.
The size of my RAM is 16 432 032 KB, and the package ID of MKL is l_mkl_p_10.0.011.
I turned off out-of-core mode, but the performance is the same.
Gennady_F_Intel
Moderator
OK, thanks. I see you are working on a modest CPU, but the MKL version is pretty old. We have made many improvements since 10.0, especially in OOC mode. Can you evaluate the latest version, 11.0? It is free for 30 days.
Alexander_K_Intel2
Hi Antoine,
Could you provide the PARDISO output with msglvl=1? It could help us understand the reason for the poor performance of your test case. In your iparm I see only one thing that looks strange to me: why did you set iparm[3]=31?
With best regards,
Alexander Kalinkin
Antoine__A_
Beginner
iparm[31] was just an experiment; I tried both iparm[3]=31 and iparm[3]=0, and it makes little difference to performance.

        ================ PARDISO: solving a real nonsymmetric system ================

        Summary PARDISO: ( reorder to reorder )
        ================

        Times:
        ======
          Time fulladj: 2.646652 s
          Time reorder: 14.776046 s
          Time symbfct: 5.390644 s
          Time parlist: 0.193419 s
          Time malloc : -0.334824 s
          Time total  : 37.119933 s total - sum: 14.447996 s

        Statistics:
        ===========
        < Parallel Direct Factorization with #processors: > 8
        < Hybrid Solver PARDISO with CGS/CG Iteration >

        < Linear system Ax = b>
          #equations: 40000
          #non-zeros in A: 45986096
          non-zeros in A (%): 2.874131
          #right-hand sides: 1

        < Factors L and U >
          #columns for each panel: 72
          #independent subgraphs: 0
        < Preprocessing with state of the art partitioning metis>
          #supernodes: 3069
          size of largest supernode: 19757
          number of nonzeros in L 462436285
          number of nonzeros in U 460178265
          number of nonzeros in L+U 922614550

        Reordering completed ...
        Number of nonzeros in factors = 922614550
        Number of factorization MFLOPS = 13695630

        ================ PARDISO: solving a real nonsymmetric system ================

        Summary PARDISO: ( factorize to factorize )
        ================

        Times:
        ======
          Time A to LU: 0.000000 s
          Time numfct : 2071.496263 s
          Time malloc : -0.000183 s
          Time total  : 2071.496571 s total - sum: 0.000491 s

        Statistics:
        ===========
        < Parallel Direct Factorization with #processors: > 8
        < Hybrid Solver PARDISO with CGS/CG Iteration >

        < Linear system Ax = b>
          #equations: 40000
          #non-zeros in A: 45986096
          non-zeros in A (%): 2.874131
          #right-hand sides: 1

        < Factors L and U >
          #columns for each panel: 72
          #independent subgraphs: 0
        < Preprocessing with state of the art partitioning metis>
          #supernodes: 3069
          size of largest supernode: 19757
          number of nonzeros in L 462436285
          number of nonzeros in U 460178265
          number of nonzeros in L+U 922614550
          gflop for the numerical factorization: 13695.630000
          gflop/s for the numerical factorization: 6.611467

        Factorization completed ...

        ================ PARDISO: solving a real nonsymmetric system ================

        Summary PARDISO: ( solve to solve )
        ================

        Times:
        ======
          Time cgs : 6.895049 s cgx iterations 1
          Time malloc : -0.000012 s
          Time total  : 6.895151 s total - sum: 0.000113 s

        Statistics:
        ===========
        < Parallel Direct Factorization with #processors: > 8
        < Hybrid Solver PARDISO with CGS/CG Iteration >

        < Linear system Ax = b>
          #equations: 40000
          #non-zeros in A: 45986096
          non-zeros in A (%): 2.874131
          #right-hand sides: 1

        < Factors L and U >
          #columns for each panel: 72
          #independent subgraphs: 0
        < Preprocessing with state of the art partitioning metis>
          #supernodes: 3069
          size of largest supernode: 19757
          number of nonzeros in L 462436285
          number of nonzeros in U 460178265
          number of nonzeros in L+U 922614550
          gflop for the numerical factorization: 13695.630000
          gflop/s for the numerical factorization: 6.611467

        Solve completed ...
        ===========
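
The timings in this log show that numerical factorization (about 2071 s) accounts for almost all of the roughly 35 minutes. As a rough cross-check, the sketch below recomputes the achieved rate from the reported work and time, and compares it with a theoretical peak. The peak figure assumes 4 double-precision floating-point operations per cycle per core for this CPU generation, which is an assumption and not something stated in the thread; the function names are hypothetical.

```c
#include <assert.h>

/* Achieved rate in GFLOP/s from total work (GFLOP) and wall time (seconds). */
static double achieved_gflops(double gflop, double seconds) {
    assert(gflop >= 0.0 && seconds > 0.0);
    return gflop / seconds;
}

/* Theoretical peak in GFLOP/s.
   flops_per_cycle is an assumption about the CPU, not a measured value. */
static double peak_gflops(int cores, double ghz, double flops_per_cycle) {
    assert(cores > 0 && ghz > 0.0 && flops_per_cycle > 0.0);
    return cores * ghz * flops_per_cycle;
}
```

With the values from the log, `achieved_gflops(13695.63, 2071.496263)` is about 6.61, matching the reported gflop/s. `peak_gflops(8, 2.0, 4.0)` is 64, so under the stated assumption the factorization runs at roughly 10% of peak.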
Alexander_K_Intel2
Hi Antoine,
Your output confused me a bit: the factorized matrix is almost dense! Could you send this matrix to me (for example, in a private thread) so that I can understand how this situation arose?
With best regards,
Alexander Kalinkin
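
The "almost dense" observation can be quantified from the numbers PARDISO reported. The sketch below is a minimal C illustration with hypothetical function names: it computes how full the factors are relative to a dense n-by-n matrix, how much they grew relative to the input, and how many bytes the factor values alone occupy (assuming 8-byte doubles and ignoring index storage).

```c
#include <assert.h>

/* Fraction of an n-by-n dense matrix occupied by the L+U non-zeros. */
static double fill_fraction(long long n, long long nnz_factors) {
    assert(n > 0 && nnz_factors >= 0);
    return (double)nnz_factors / ((double)n * (double)n);
}

/* Growth of non-zeros from the input matrix A to the factors L+U. */
static double fill_ratio(long long nnz_a, long long nnz_factors) {
    assert(nnz_a > 0 && nnz_factors >= 0);
    return (double)nnz_factors / (double)nnz_a;
}

/* Bytes needed for the factor values only (index arrays not counted). */
static long long factor_value_bytes(long long nnz_factors, int bytes_per_value) {
    assert(nnz_factors >= 0 && bytes_per_value > 0);
    return nnz_factors * bytes_per_value;
}
```

For n = 40000 and 922614550 non-zeros in L+U, `fill_fraction` is about 0.577, so the factors hold roughly 58% of what a fully dense matrix would. `fill_ratio(45986096, 922614550)` is about 20, and `factor_value_bytes(922614550, 8)` is 7380916400 bytes, about 7.4 GB, which is below the 16 432 032 KB of RAM reported earlier in the thread.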
Antoine__A_
Beginner
The file is 790 MB... How can I send it to you?