Hello folks,

I have a strange performance problem with PARDISO on Windows. Before I open a support call, I hope to get some feedback in this forum.

I'm using Intel® Parallel Studio XE 2018 Update 3 Composer Edition for Fortran Windows*, Version 18.0.0040.

I have noticed that **parallel processing in PARDISO in MKL version 2018.0.3 does not work at all** and processing with only one thread is significantly slower than in version 2016.

Attached are a small C++ test program and sample data that solve a small system multiple times.

When I run the program using the MKL DLLs from version 2018.0.3, I get the following result:

```
Release\pardiso.exe _data\mat.mm _data\b.mm
Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for 32-bit applications
Solving matrix file _data\mat.mm with vector data _data\b.mm.
Data: rows=445, cols=445, values=1339
MKL threads: 6
Performance: Loops=10000, Time=2.785514 sec
```

And now the funny stuff starts. The same program executed with the MKL DLLs from version 2016 (11.3.3) produces the following result:

```
Release\pardiso.exe _data\mat.mm _data\b.mm
Intel(R) Math Kernel Library Version 11.3.3 Product Build 20160413 for 32-bit applications
Solving matrix file _data\mat.mm with vector data _data\b.mm.
Data: rows=445, cols=445, values=1339
MKL threads: 6
Performance: Loops=10000, Time=1.171534 sec
```

And it gets worse: the new PARDISO in MKL 2018.0.3 burns a large amount of CPU time across multiple threads, yet it is slower than running with only a single thread!

As far as I can tell, everything is configured correctly. And as can be seen, with the old 2016 MKL libraries it works fine.

For better understanding, I have attached log files containing PARDISO diagnostic data from single-core and multi-core runs. They make it clear that 6 threads really are used, and that at the same time the gflop/s rate decreases.

This is the result from the 6-core parallel run:

```
Statistics:
===========
Parallel Direct Factorization is running on 6 OpenMP

< Linear system Ax = b >
    number of equations:           445
    number of non-zeros in A:      1339
    number of non-zeros in A (%):  0.676177
    number of right-hand sides:    1

< Factors L and U >
    number of columns for each panel: 128
    number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
    number of supernodes:                    427
    size of largest supernode:               2
    number of non-zeros in L:                1153
    number of non-zeros in U:                672
    number of non-zeros in L+U:              1825
    gflop   for the numerical factorization: 0.000015
    gflop/s for the numerical factorization: 0.000479

Matrix Performance: Loops=1, Time=0.182937 sec
```

Here comes the single-core result. It has a better gflop/s rate than the run using MKL with 6 cores:

```
Statistics:
===========
Parallel Direct Factorization is running on 1 OpenMP

< Linear system Ax = b >
    number of equations:           445
    number of non-zeros in A:      1339
    number of non-zeros in A (%):  0.676177
    number of right-hand sides:    1

< Factors L and U >
    number of columns for each panel: 128
    number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
    number of supernodes:                    427
    size of largest supernode:               2
    number of non-zeros in L:                1153
    number of non-zeros in U:                672
    number of non-zeros in L+U:              1825
    gflop   for the numerical factorization: 0.000015
    gflop/s for the numerical factorization: 0.000532

Matrix Performance: Loops=1, Time=0.164001 sec
```

The problem size is too small. Do you see a similar performance regression with a bigger problem size too?

Gennady F. (Intel) wrote:

> The problem size is too small. Do you see a similar performance regression with a bigger problem size too?

In my real application I see the same performance problem with larger systems too.

Anyway, I'll verify it with the small test program as well. Please feel free to use my attached sample with any Matrix Market data file to check it against a larger data set.

The main problem for me is that MKL 2018 seems to be about 3 times slower than MKL 2016 was. I'd be happy to get feedback on compiler options or other settings that could be changed to get better PARDISO performance in MKL 2018 (or at least the same as in the past).

Gennady F. (Intel) wrote:

> The problem size is too small. Do you see a similar performance regression with a bigger problem size too?

I am interested in the performance of PARDISO for systems with around 500 equations.

So possible solutions are:

a) do not use PARDISO for sparse systems with N < nnn.
b) use PARDISO but set the max number of cores to 1.
c) ......

Best regards

As mentioned in my previous post, I've run a test with a somewhat larger matrix. Now the matrix is 131458x131458 with 712722 non-zero values. This is the typical size for our application.

The same performance problem shows up in MKL 2018 here too:

```
Release\pardiso.exe _data\mat2.mm _data\b2.mm
Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for 32-bit applications
Solving matrix file _data\mat2.mm with vector data _data\b2.mm.
Data: rows=131458, cols=131458, values=712722
MKL threads: 6
Performance: Loops=100, Time=8.250383 sec
```

Same system solved with MKL 2016:

```
Release\pardiso.exe _data\mat2.mm _data\b2.mm
Intel(R) Math Kernel Library Version 11.3.3 Product Build 20160413 for 32-bit applications
Solving matrix file _data\mat2.mm with vector data _data\b2.mm.
Data: rows=131458, cols=131458, values=712722
MKL threads: 6
Performance: Loops=100, Time=4.823882 sec
```

As you can clearly see, the new MKL 2018 is roughly 70% slower than the older version (8.25 s vs. 4.82 s for 100 loops).

I recently did the same version upgrade. I see a similar regression in run times, BUT it now gives better accuracy on my ill-conditioned matrices and matches IMSL and SuperLU in this respect. It was quite poor before, and accuracy is as important to me as speed.

I cannot say anything about multi-core, as I gave up on that aspect of PARDISO long ago. But it might be worth another look now.

My problem sizes are between 500 and 20000 degrees of freedom.

Thanks Andrew and Michael. I managed to reproduce the issue on our side and the case has been escalated. We will keep you updated on the status.

Dear all,

Has this issue in PARDISO been fixed in any of the more recent MKL releases?

Thanks and kind regards

Dear all,

Is it possible to have an update on this?

Gennady F. (Blackbelt) wrote:

> Thanks Andrew and Michael. I managed to reproduce the issue on our side and the case has been escalated. We will keep you updated on the status.
