Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
7014 Discussions

Pardiso scaling with multiple processors on Linux

Rawlins__David
Beginner
455 Views
Hello,

I am running on a four core Intel machine with Suse Linux. I have a finite element code that uses Pardiso and doesn't scale well with multiple processors. Here are the timings:

1 cpu 1:36
2 cpus 0:57
3 cpus 0:53
4 cpus 1:06
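For reference, the speedup and parallel efficiency implied by those timings can be worked out with a quick sketch (times converted from mm:ss to seconds):

```python
# Speedup/efficiency for the solver timings quoted above.
def to_seconds(mmss):
    m, s = mmss.split(":")
    return int(m) * 60 + int(s)

timings = {1: "1:36", 2: "0:57", 3: "0:53", 4: "1:06"}
base = to_seconds(timings[1])
for p, t in timings.items():
    secs = to_seconds(t)
    speedup = base / secs
    efficiency = speedup / p
    print(f"{p} cpu(s): {secs:3d} s  speedup {speedup:.2f}  efficiency {efficiency:.0%}")
```

The efficiency drops below 50% at three threads and the 4-cpu run is actually slower than the 2-cpu run, which is the anomaly under discussion.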

I should note that the same problem running on an SGI Altix machine with 8 processors does scale well. Has anyone else had problems with Pardiso not scaling well on Linux?

Thanks,

Dave
0 Kudos
4 Replies
Gennady_F_Intel
Moderator
Dave, please let us know:

What is the task size?
The number of non-zero elements?
the type of matrix?
In-Core or OOC version?
MKL version?
--Gennady

Rawlins__David
Beginner

Gennady,

The task size is 202800 equations, 7847016 non-zero elements, matrix type -2 (real, symmetric, indefinite), in-core, MKL version 10.1.1.019
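A rough storage estimate for a problem of that size (a sketch only, assuming 8-byte reals, 4-byte integer indices, and upper-triangle CSR storage for the symmetric type -2 matrix) suggests the in-core choice is comfortable:

```python
# Approximate CSR input storage for the matrix Dave describes.
# Assumptions: double-precision values (8 B), 32-bit indices (4 B),
# only the upper triangle stored for the symmetric indefinite matrix.
n   = 202_800    # equations
nnz = 7_847_016  # non-zero elements
csr_bytes = nnz * 8 + nnz * 4 + (n + 1) * 4  # values + column indices + row pointers
print(f"CSR input: {csr_bytes / 2**20:.0f} MiB")  # about 91 MiB
```

The factor typically needs several times more memory than the input matrix because of fill-in, but on a machine with a few GiB of RAM the in-core version should be the right choice here.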

Dave
Sergey_K_Intel1
Employee

Dear Dave,

Is it possible to get your matrix along with typical PARDISO settings? If yes, please visit http://premier.intel.com, submit an issue, and attach your data. Please also provide the output of the command $ cat /proc/cpuinfo and your linking line.

The other alternative is to turn on printing of statistical information by setting msglvl=1 and posting the PARDISO output on the MKL forum for MKL_NUM_THREADS=1 and MKL_NUM_THREADS=4. I'd also recommend setting iparm(27)=1. This setting checks the sparse matrix representation, and if something is wrong, PARDISO prints warnings.

Thanks in advance
All the best
Sergey
Ying_H_Intel
Employee

Hello Dave,

Could you try the following commands and see whether you get better performance on the four-core machine? For example:
$ export MKL_NUM_THREADS=4
$ export KMP_AFFINITY=compact,verbose
$ ./Pardiso.lnx

Here is my test result.

Without export KMP_AFFINITY=compact,verbose:
1 cpu -- Time in Solver: 66 seconds
2 cpus -- Time in Solver: 36 seconds
3 cpus -- Time in Solver: 32 seconds
4 cpus -- Time in Solver: 34 seconds

With export KMP_AFFINITY=compact,verbose:
1 cpu -- Time in Solver: 65 seconds
2 cpus -- Time in Solver: 40 seconds
3 cpus -- Time in Solver: 29 seconds
4 cpus -- Time in Solver: 29 seconds
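Putting Ying's two runs side by side as speedups (a small sketch using the numbers quoted above) makes the effect of thread pinning visible at three and four threads:

```python
# Speedup relative to 1 thread, from the two timing sets in Ying's test.
no_affinity   = {1: 66, 2: 36, 3: 32, 4: 34}  # seconds, KMP_AFFINITY unset
with_affinity = {1: 65, 2: 40, 3: 29, 4: 29}  # seconds, KMP_AFFINITY=compact,verbose
for p in (1, 2, 3, 4):
    s_off = no_affinity[1] / no_affinity[p]
    s_on  = with_affinity[1] / with_affinity[p]
    print(f"{p} thread(s): speedup {s_off:.2f} without affinity, {s_on:.2f} with")
```

Without affinity the 4-thread run is slower than the 3-thread run; with affinity the scaling stays monotonic.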

So the key problem here is KMP_AFFINITY. There is some discussion of performance testing on multi-core systems in the MKL User Guide, Chapter 6, "Managing Performance and Memory":
/*
If you run with HT enabled, performance may be especially impacted if you run on fewer
threads than physical cores. Moreover, if, for example, there are two threads to every
physical core, the thread scheduler may assign two threads to some cores and ignore the
other ones altogether. If you are using the OpenMP* library of the Intel Compiler, read the respective User Guide on how to best set the affinity to avoid this situation. For Intel MKL, you are recommended to set KMP_AFFINITY=granularity=fine,compact,1,0.

Managing Multi-core Performance
You can obtain best performance on systems with multi-core processors by requiring that
threads do not migrate from core to core. To do this, bind threads to the CPU cores by
setting an affinity mask to threads. You can do it using any of the following options:
- OpenMP facilities (recommended, if available), for instance, the KMP_AFFINITY environment variable using the Intel OpenMP library.
*/

MKL implements dynamic parallelization (threads get their share of the work at run time) in LAPACK and PARDISO using the Intel OpenMP library. By default, the OpenMP threads are not bound to physical cores, so for processors with more than two cores we recommend setting
KMP_AFFINITY=compact

With Regards,
Ying