Hello. After a number of failed debugging attempts and tests, I'm hoping to get some input on using pardiso in parallel with openmp. The software in question uses the intel fortran compiler (ifort) and also uses the pardiso solver within a broader finite element code.
I have attempted to run pardiso in parallel via openmp over 1, 2 and 4 processors, but the solve time systematically increases as the number of processors increases. This behavior is repeatable on:
- two different computers (1 linux desktop, 1 linux cluster)
- multiple different versions of the intel fortran compilers/mkl (11.1_080, 2013.5.192, 2013_sp1) used on the linux desktop
- repeated checking of different mkl link line options (https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/)
- tests with different matrix sizes - number of equations can vary from ~100,000 to greater than 10,000,000
For example, pardiso solve times might scale in the following way for a relatively small matrix (80,000 equations): 1 processor: 0.79 seconds, 2 processors: 1.17 seconds, 4 processors: 1.91 seconds.
So, something I'm doing across all the systems (hardware, compiler versions, etc) is fundamentally wrong.
Before posting specifics for one example (iparm input parameters, mkl link line commands, compiler version, etc), is there any documentation, previous posts, etc I should look at that might shed some light on this issue? At this point I've gone through the mkl manual and forums and haven't found any clues as to what the issue is. If there is no other documentation to look up, I'll go ahead and post whatever system/solver information is required.
Thanks in advance,
That's really strange behavior for the PARDISO solving step, and not something we expect. Which processors do you use, and have you switched off hyper-threading?
>> For example, pardiso solve times might scale in the following way for a relatively small matrix (80,000 equations): 1 processor: 0.79 seconds, 2 processors: 1.17 seconds, 4 processors: 1.91 seconds.
<< Is that the time for the solving step only, or does it include reordering + factorization too?
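To put numbers on the quoted timings, here is a minimal sketch that converts them into speedup and parallel efficiency relative to the single-processor run. It assumes each "processor" in the post corresponds to one OpenMP thread, and the helper name `scaling_table` is made up for this illustration; it is not part of MKL or PARDISO.

```python
# Wall-clock timings quoted in the post (seconds), keyed by thread count.
# Assumption: one "processor" in the post means one OpenMP thread.
timings = {1: 0.79, 2: 1.17, 4: 1.91}


def scaling_table(timings):
    """Return (threads, time, speedup, efficiency) rows.

    Speedup and efficiency are measured relative to the run with the
    smallest thread count in `timings`.
    """
    base_threads = min(timings)
    base_time = timings[base_threads]
    rows = []
    for threads in sorted(timings):
        elapsed = timings[threads]
        speedup = base_time / elapsed                   # > 1 means faster than baseline
        efficiency = speedup * base_threads / threads   # 1.0 would be ideal scaling
        rows.append((threads, elapsed, speedup, efficiency))
    return rows


for threads, elapsed, speedup, efficiency in scaling_table(timings):
    print(f"{threads} thread(s): {elapsed:.2f} s, "
          f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these figures the speedup is about 0.68x on 2 threads and about 0.41x on 4, so the runs get slower rather than merely failing to scale. Running the same calculation on each phase separately (reordering, factorization, solve) would show which phase the slowdown comes from.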