Hello,
I am running on a four-core Intel machine with SUSE Linux. I have a finite element code that uses Pardiso and doesn't scale well with multiple processors. Here are the timings:
1 cpu 1:36
2 cpus 0:57
3 cpus 0:53
4 cpus 1:06
I should note that the same problem running on an SGI Altix machine with 8 processors does scale well. Has anyone else had problems with Pardiso not scaling well on Linux?
Thanks,
Dave
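Converting the reported wall times to speedups makes the scaling problem concrete (a quick sketch; the times are the ones quoted above, converted to seconds):

```python
# Speedup and parallel efficiency from the timings above
# (1:36, 0:57, 0:53, 1:06 converted to seconds).
times = {1: 96, 2: 57, 3: 53, 4: 66}
for p, t in sorted(times.items()):
    speedup = times[1] / t
    print(f"{p} cpu(s): speedup {speedup:.2f}x, efficiency {speedup / p:.0%}")
```

Note that going from 3 to 4 cpus actually slows the run down (about 1.81x down to 1.45x).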
4 Replies
Dave, please let us know:
What is the task size?
The number of non-zero elements?
The type of matrix?
In-Core or OOC version?
MKL version?
--Gennady
Quoting - Gennady Fedorov (Intel)
Dave, please let us know:
What is the task size?
The number of non-zero elements?
The type of matrix?
In-Core or OOC version?
MKL version?
--Gennady
Gennady,
The task size is 202800 equations, 7847016 non-zero elements, matrix type -2 (real, symmetric, indefinite), in-core, MKL version 10.1.1.019
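For context, a bit of arithmetic on those numbers (a rough sketch, using nothing beyond the figures in the post):

```python
# Size statistics for the reported system: n = 202,800 equations,
# nnz = 7,847,016 nonzeros, mtype = -2 (real symmetric indefinite).
n = 202_800
nnz = 7_847_016
print(f"average nonzeros per row: {nnz / n:.1f}")      # roughly 38.7
print(f"matrix density: {nnz / (n * n):.1e}")          # roughly 1.9e-04
# In-core storage of the values alone, double precision:
print(f"values take about {nnz * 8 / 1e6:.0f} MB")     # roughly 63 MB
```

At ~39 nonzeros per row this is a typical sparse FE system, and with total run times around 1-1.5 minutes, threading overhead and memory bandwidth can plausibly eat much of the parallel gain.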
Dave
Quoting - rawlinssci.utah.edu
Hello,
I am running on a four-core Intel machine with SUSE Linux. I have a finite element code that uses Pardiso and doesn't scale well with multiple processors. Here are the timings:
1 cpu 1:36
2 cpus 0:57
3 cpus 0:53
4 cpus 1:06
I should note that the same problem running on an SGI Altix machine with 8 processors does scale well. Has anyone else had problems with Pardiso not scaling well on Linux?
Thanks,
Dave
Dear Dave,
Is it possible to get your matrix along with your typical PARDISO settings? If yes, please visit http://premier.intel.com, submit an issue, and attach your data. Please also provide the output of the command $ cat /proc/cpuinfo and your linking line.
The other alternative is to turn on printing of statistical information by setting msglvl=1 and provide us the PARDISO output on the MKL forum for MKL_NUM_THREADS=1 and MKL_NUM_THREADS=4. I'd also recommend setting iparm(27)=1; this setting checks the sparse matrix representation and, if something is wrong, PARDISO prints warnings.
Thanks in advance
All the best
Sergey
Hello Dave,
Could you try the following commands and see if you get better performance on the four-core machine?
for example,
$ export MKL_NUM_THREADS=4
$ export KMP_AFFINITY=compact,verbose
$ ./Pardiso.lnx
Here are my test results.
Without export KMP_AFFINITY=compact,verbose:
1 cpu -- Time in Solver: 66 seconds
2 cpus -- Time in Solver: 36 seconds
3 cpus -- Time in Solver: 32 seconds
4 cpus -- Time in Solver: 34 seconds
With export KMP_AFFINITY=compact,verbose:
1 cpu -- Time in Solver: 65 seconds
2 cpus -- Time in Solver: 40 seconds
3 cpus -- Time in Solver: 29 seconds
4 cpus -- Time in Solver: 29 seconds
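Put side by side as speedups (a quick sketch using the seconds quoted above), the affinity setting mainly rescues the 4-cpu case:

```python
# Speedup relative to 1 cpu for the two runs above.
no_affinity = {1: 66, 2: 36, 3: 32, 4: 34}
with_affinity = {1: 65, 2: 40, 3: 29, 4: 29}
for p in (1, 2, 3, 4):
    print(f"{p} cpus: {no_affinity[1] / no_affinity[p]:.2f}x without, "
          f"{with_affinity[1] / with_affinity[p]:.2f}x with KMP_AFFINITY")
```

Without affinity the 4-cpu run is slower than the 3-cpu run (about 1.94x vs 2.06x); with affinity it reaches about 2.24x.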
So the key issue here is KMP_AFFINITY. There is a description of performance testing on multi-core systems in the MKL User Guide, Chapter 6, Managing Performance and Memory:
/*
If you run with HT enabled, performance may be especially impacted if you run on fewer
threads than physical cores. Moreover, if, for example, there are two threads to every
physical core, the thread scheduler may assign two threads to some cores and ignore the
other ones altogether. If you are using the OpenMP* library of the Intel Compiler, read the respective User Guide on how to best set the affinity to avoid this situation. For Intel MKL, you are recommended to set KMP_AFFINITY=granularity=fine,compact,1,0.
Managing Multi-core Performance
You can obtain best performance on systems with multi-core processors by requiring that
threads do not migrate from core to core. To do this, bind threads to the CPU cores by
setting an affinity mask to threads. You can do it using any of the following options:
- OpenMP facilities (recommended, if available), for instance, the KMP_AFFINITY
environment variable using the Intel OpenMP library.
*/
MKL implements dynamic parallelization (meaning threads get their share of the work at run time) in LAPACK and PARDISO using the Intel OpenMP library. By default, the OpenMP threads are not bound to physical cores, so for processors with more than two cores we recommend setting
KMP_AFFINITY=compact
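One way to check whether an affinity mask actually took effect for a process (a small sketch; os.sched_getaffinity is Linux-only and not available on all platforms) is:

```python
import os

# Set of CPU cores the current process is allowed to run on (Linux only).
# Under a restrictive affinity mask this set shrinks accordingly.
allowed = os.sched_getaffinity(0)
print(f"process may run on {len(allowed)} core(s): {sorted(allowed)}")
```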
With Regards,
Ying