MKL Solvers Parallel Performance

Idefix · 07-08-2009

Dear all,
we are using MKL to solve transient water flow and solute transport in porous media. We apply both the conjugate gradient and dfgmres solvers because the matrices are symmetric or asymmetric depending on the problem category. We use ILU(0) preconditioning in both cases. The codes run well, and we observe a significant speedup compared to the unparallelized solvers we used before. Great! There is, however, something that took us by surprise. The speedup observed on a dual-core machine was about 2, and the speedup on a quad-core machine was 2 as well, although we expected it to be higher. In both cases, CPU usage amounts to more than 90%. We have also played around with the environment variable MKL_NUM_THREADS. On the quad-core machine, setting it to one resulted in faster execution than not specifying it explicitly and letting MKL assign it dynamically! CPU usage was still more than 90%. We would be glad if someone could explain these results and suggest ways to obtain a further speedup on the quad-core machine.

Thank you very much in advance

Idefix

Alexander_K_Intel2 · 07-08-2009

Hi Idefix,
As I understand it, your program is fairly complex and uses many different parts of MKL. Could you measure how much time your code spends on computing the preconditioner (ILU(0)), on matrix multiplication, and in the CG subroutines on the different machines? With these data we could understand the situation better.
With best regards,
Alexander
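
The per-phase measurement requested above could be collected along the following lines. This is only a minimal sketch in Python, not code from the thread: the names `timed`, `report`, `build_preconditioner`, `multiply` and `cg_step` are hypothetical, and the three placeholder functions merely stand in for the real ILU(0), matrix multiplication and CG calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named phase.
phase_seconds = defaultdict(float)


@contextmanager
def timed(phase):
    """Add the wall-clock time of the enclosed block to phase_seconds[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_seconds[phase] += time.perf_counter() - start


def report(timings):
    """Return one formatted line per phase, slowest phase first."""
    total = sum(timings.values())
    lines = []
    for phase, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        share = 100.0 * seconds / total if total > 0 else 0.0
        lines.append(f"{phase:<16s} {seconds:10.6f} s {share:6.1f} %")
    return lines


# Hypothetical placeholders: replace with the real solver phases.
def build_preconditioner():
    return sum(i * i for i in range(20000))


def multiply():
    return sum(i for i in range(5000))


def cg_step():
    return sum(i for i in range(2000))


# Assumed structure: the preconditioner is built once, then the
# multiplication and CG phases alternate inside the iteration loop.
with timed("ilu0"):
    build_preconditioner()

for _ in range(50):
    with timed("matmul"):
        multiply()
    with timed("cg"):
        cg_step()

for line in report(phase_seconds):
    print(line)
```

The sketch records wall-clock time rather than CPU time, since CPU time is summed over all threads and so says little about the speedup actually achieved. Running the same measurement with different thread counts on each machine should show which phase stops scaling.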