PARDISO scaling inversely with number of OpenMP threads

John_N_ · 05-30-2014

Hello. After a number of failed debugging attempts and tests, I'm hoping to get some input on using PARDISO in parallel with OpenMP. The software in question is built with the Intel Fortran compiler (ifort) and calls the PARDISO solver from within a broader finite element code.

I have tried running PARDISO in parallel via OpenMP on 1, 2 and 4 processors, but the solve time systematically increases as the number of processors increases. This behavior is repeatable across:

two different computers (1 Linux desktop, 1 Linux cluster)
multiple versions of the Intel Fortran compiler/MKL (11.1_080, 2013.5.192, 2013_sp1) on the Linux desktop
repeated checks of different MKL link line options (https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/)
tests with different matrix sizes: the number of equations ranges from ~100,000 to more than 10,000,000
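One way to repeat the 1/2/4-thread comparison under controlled conditions is to launch the same run with the thread count pinned through the environment. The sketch below is a minimal Python harness, assuming the solver honours the standard `OMP_NUM_THREADS` and `MKL_NUM_THREADS` variables; the function names are illustrative only. It times the whole process, so setup and assembly are included along with the PARDISO calls.

```python
import os
import subprocess
import time


def run_with_threads(cmd, n_threads):
    """Run cmd once with OpenMP and MKL limited to n_threads; return wall time in seconds."""
    env = dict(os.environ)
    # OMP_NUM_THREADS caps the OpenMP runtime; MKL_NUM_THREADS caps MKL (including PARDISO).
    env["OMP_NUM_THREADS"] = str(n_threads)
    env["MKL_NUM_THREADS"] = str(n_threads)
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)  # raises if the run fails
    return time.perf_counter() - start


def sweep(cmd, thread_counts=(1, 2, 4)):
    """Time the same command at each thread count; return {n_threads: seconds}."""
    return {n: run_with_threads(cmd, n) for n in thread_counts}
```

For example, `sweep(["./fem_solver", "input.dat"])` (both a hypothetical executable and input file) would return a dict mapping each thread count to its wall time in seconds.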

For example, for a relatively small matrix (80,000 equations), the PARDISO solve times scale as follows: 1 processor: 0.79 seconds, 2 processors: 1.17 seconds, 4 processors: 1.91 seconds.
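To put numbers on that trend, speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p can be computed from the timings above. Ideal scaling would give S_p = p; here the speedup falls below 1 (about 0.68 on 2 processors and 0.41 on 4). A small Python sketch of the arithmetic, using only the times reported in this post:

```python
def speedup_table(times):
    """Given {threads: seconds}, return {threads: (speedup, efficiency)} relative to 1 thread."""
    t1 = times[1]
    return {p: (t1 / tp, t1 / (tp * p)) for p, tp in sorted(times.items())}


# Solve times reported above for the 80,000-equation matrix.
reported = {1: 0.79, 2: 1.17, 4: 1.91}
for p, (s, e) in speedup_table(reported).items():
    print(f"{p} thread(s): speedup {s:.2f}, efficiency {e:.0%}")
# 1 thread(s): speedup 1.00, efficiency 100%
# 2 thread(s): speedup 0.68, efficiency 34%
# 4 thread(s): speedup 0.41, efficiency 10%
```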

So it seems I'm doing something fundamentally wrong that is common to all of these systems (hardware, compiler versions, etc.).

Before posting specifics for one example (iparm input parameters, MKL link line commands, compiler version, etc.), is there any documentation, previous post, etc. I should look at that might shed some light on this issue? So far I've gone through the MKL manual and forums and haven't found any clue as to what the problem is. If there is no other documentation to look up, I'll go ahead and post whatever system/solver information is required.

Thanks in advance,

John

Alexander_K_Intel2 · 05-30-2014

Hi John,

That's really strange behavior for the PARDISO solve step, and not what we expect. Which processors are you using, and have you switched off hyper-threading?
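On Linux, whether hyper-threading is active can be checked by comparing logical CPUs with physical cores in `/proc/cpuinfo`. If there are more logical CPUs than cores, running with one thread per logical CPU may oversubscribe the physical cores. The Python helper below is a sketch that assumes the usual x86 `/proc/cpuinfo` layout (`processor`, `physical id` and `core id` fields, with a blank line between records); other platforms or VMs may format the file differently.

```python
def core_counts(cpuinfo_text):
    """Return (logical_cpus, physical_cores) parsed from Linux /proc/cpuinfo text."""
    logical = 0
    cores = set()
    phys = core = None
    for line in cpuinfo_text.splitlines() + [""]:
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            phys = value
        elif key == "core id":
            core = value
        elif not line.strip():
            # A blank line ends one logical-processor record.
            if core is not None:
                cores.add((phys, core))
            phys = core = None
    # Fall back to the logical count if core ids are absent (e.g. some VMs).
    return logical, len(cores) or logical
```

Passing the contents of `/proc/cpuinfo` returns the pair; a result such as `(8, 4)` would indicate two hardware threads per physical core.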

Thanks,

Alex

Gennady_F_Intel · 05-30-2014

>> For example, pardiso solve times might scaling in the following way for a relatively small matrix (80,000 equations): 1 processor: 0.79 seconds, 2 processors: 1.17 seconds, 4 processors: 1.91 seconds.

<< Is that the time for the solve step only, or does it include reordering and factorization too?
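Timing the phases separately answers this question directly. PARDISO is driven by its `phase` argument (11 = analysis, i.e. reordering and symbolic factorization; 22 = numerical factorization; 33 = solve with iterative refinement), so each phase can be measured on its own. The Python sketch below assumes a hypothetical wrapper `call_phase(phase)` that invokes PARDISO with the given phase. It measures wall-clock time on purpose: CPU time (for example Fortran's `cpu_time`) is typically accumulated over all threads of the process, so it can rise with the thread count even when the solve itself gets faster. In Fortran, `omp_get_wtime` or `system_clock` give wall-clock time.

```python
import time

# PARDISO phase codes: 11 = analysis (reordering + symbolic factorization),
# 22 = numerical factorization, 33 = solve (with iterative refinement).
PHASES = {11: "analysis/reordering", 22: "numerical factorization", 33: "solve"}


def time_phases(call_phase):
    """Call call_phase(phase) for 11, 22, 33 in order; return {label: wall seconds}."""
    timings = {}
    for phase, label in PHASES.items():
        start = time.perf_counter()  # wall clock, not CPU time summed over threads
        call_phase(phase)
        timings[label] = time.perf_counter() - start
    return timings
```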