iparm(1) = 1 ! no solver default
!iparm(2) = 2 ! fill-in reordering from METIS
iparm(2) = 3 ! parallel (OpenMP) version of the nested dissection algorithm
iparm(4) = 0 ! no iterative-direct algorithm
iparm(5) = 0 ! no user fill-in reducing permutation
iparm(6) = 0 ! =0 solution on the first n compoments of x
iparm(8) = 9 ! numbers of iterative refinement steps
iparm(10) = 13 ! perturbe the pivot elements with 1E-13
iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS
iparm(13) = 0 ! maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm(13) = 1 in case of inappropriate accuracy
iparm(14) = 0 ! Output: number of perturbed pivots
iparm(18) = -1 ! Output: number of nonzeros in the factor LU
iparm(19) = -1 ! Output: Mflops for LU factorization
iparm(20) = 0 ! Output: Numbers of CG Iterations
iparm(21) = 1 ! Apply 1x1 and 2x2 Bunch and Kaufman pivoting during the factorization process
iparm(24) = 1 ! PARDISO uses new two-level factorization algorithm
In addition, I have set the following environment variables:
However, when running the program with one process, the CPU usage is 50% when solving, when running with two processes, the CPU usage of every process is 25% when solving, and so on. The final result is the total CPU usage is only 50%.
What have I missed to do?
For hyperthreading system, it is recommended to set the application threading number to half of the total physical threadings
You can check a few of the detail on this post:
You can use the following setting to enforce more threading:
MKL_NUM_THREADS= number of the threadings.
but it may not increase the application performance.