Hello all,

We are currently using the PARDISO solver to solve symmetric positive definite systems. Since we updated to MKL 10.2 Update 6, we have been facing a threading bug that severely degrades performance.

We have an 8-core machine (no hyper-threading, of course) and call mkl_set_num_threads(8), but when factorizing the matrix and solving a linear system, only 5 threads are fully used! In the debugger we can see that MKL has created 8 threads. We tried setting mkl_set_dynamic(0), but every attempt led to the same problem: only 5 threads are used instead of 8, and processing is slower than with MKL 10.2 Update 2.

We also took some time to test this threading bug with MKL 10.3 beta 2, but it showed the same slowness.

Are you aware of such a bug? Is there any plan to fix it soon?

Thanks in advance for your help.

Best regards.

We are using a Xeon E5420. After the PARDISO run, iparm(60)=0, which is consistent with our request for an in-core run, and iparm(64)=102000114.

Note that the system fits comfortably in memory: it takes less than 2 GB of the 16 GB of RAM available. By the way, we have seen this bug on every system we tried, whatever its size.

Below are the parameters we used for PARDISO:

(all parameters not listed are set to 0, and the indices below are 0-based C indices)

iparm_[ 0] = 1 ; /* No solver default */

iparm_[ 1] = 3 ; /* Fill-in reordering from METIS. 3==OpenMP METIS! */

We call PARDISO with the following parameters:

PARDISO( handle, 1, 0, 2, phase, N, values, row_index, columns, dummy_interger, 1, iparm_, 0, dummy_double, dummy_double, error ) ;

Thanks a lot for your help.

Best regards.

Luc B.

Hello. In fact, all parallelized phases seem to be affected. The ones that really matter for us are the factorization and solve phases.

I need to clarify and correct some things I said previously. With MKL 10.2 Update 2, factorization runs fine in parallel; the solve phase was not parallelized, so it was slow but workable. In Update 6, factorization is slower when running in parallel: only 5 threads are used even though 8 are created. The solve phase is parallelized and a little faster than in Update 2, but again only 5 threads are running in this phase!

Regarding MKL 10.3 beta 2, I made a mistake yesterday. It actually behaves exactly like 10.2 Update 2: factorization works fine in parallel, and the solve phase, which seems not to be parallelized, is slow but workable. It seems that the recent work in 10.2 Update 6 to parallelize the solve phase is the key...

As requested, please find below the output of a PARDISO run for a 217007 x 217007 matrix with exactly 3365857 nonzeros. The log seems to be completely broken, but I don't know why!

Thanks a lot for your help!

Luc.

Number of items read = 100

*** Error in PARDISO memory allocation: WORK_I0 , size to allocate: 0 bytes

================ PARDISO: solving a symm. posit. def. system =================

=============== PARDISO: solving a symmetric indef. system ==================

============== PARDISO: solving a real struct. sym. system ===================

============= PARDISO: solving a symmetric indef. system ====================

============ PARDISO: solving a compl. str. sym. system ================

================ PARDISO: solving a real nonsymmetric system ================

reorder

Summary PARDISO: (

factorize Time parlist:

solve Time parlist:

clean Time parlist:

to Time parlist:

Times: Time parlist:

Time fulladj: Time symbfct:

Time A to LU:

================ PARDISO: solving a complex nonsym. system ================

Time numfct :

Time cgs :

0.000000 s cgx iterations -16843009

0.000000 s

< Parallel Direct Factorization with #processors: > 3365857

< Hybrid Solver PARDISO with CGS/CG Iteration >

< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>

#non-zeros in A: 64

non-zeros in A (): 0.000000

#columns for each panel: 73823

#independent subgraphs: 3481

< Preprocessing with multiple minimum degree with constraints >

< no multiple minimum degree on the separator nodes >

< Preprocessing with input permutation >

Percentage of computed non-zeros for LL^T factorization

0 %

1 %

2 %

[..]

98 %

99 %

100 %

*** Error in PARDISO memory allocation: WORK_I0 , size to allocate: 4013312 bytes

================ PARDISO: solving a symm. posit. def. system =================

=============== PARDISO: solving a Herm. pos. def. system ==================

============== PARDISO: solving a real struct. sym. system ===================

============= PARDISO: solving a Herm. pos. def. system ====================

============ PARDISO: solving a compl. str. sym. system ================

================ PARDISO: solving a real nonsymmetric system ================

reorder

Summary PARDISO: (

) Time parlist:

================ Time parlist:

Times: Time parlist:

Time fulladj: Time symbfct:

Time A to LU:

================ PARDISO: solving a complex nonsym. system ================

Time numfct :

Time cgs :

0.000000 s cgx iterations -16843009

0.000000 s

< Parallel Direct Factorization with #processors: > 3365857

< Hybrid Solver PARDISO with CGS/CG Iteration >

< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>

#non-zeros in A: 64

non-zeros in A (): 0.000000

#columns for each panel: 73823

#independent subgraphs: 3481

< Preprocessing with multiple minimum degree with constraints >

< no multiple minimum degree on the separator nodes >

< Preprocessing with input permutation >

#supernodes: -2032018434

size of largest supernode: 4626015541689943357

*** Error in PARDISO memory allocation: WORK_I0 , size to allocate: 4013312 bytes

================ PARDISO: solving a symm. posit. def. system =================

=============== PARDISO: solving a Hermitian indef. system ==================

============== PARDISO: solving a real struct. sym. system ===================

============= PARDISO: solving a Hermitian indef. system ====================

============ PARDISO: solving a compl. str. sym. system ================

================ PARDISO: solving a real nonsymmetric system ================

reorder

Summary PARDISO: (

=========== Time parlist:

Time fulladj: Time symbfct:

Time A to LU:

================ PARDISO: solving a complex nonsym. system ================

Time numfct :

Time cgs :

0.000000 s cgx iterations -16843009

0.000000 s

< Parallel Direct Factorization with #processors: > 3365857

< Hybrid Solver PARDISO with CGS/CG Iteration >

< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>

#non-zeros in A: 64

non-zeros in A (): 0.000000

#columns for each panel: 73823

#independent subgraphs: 3481

< Preprocessing with multiple minimum degree with constraints >

< no multiple minimum degree on the separator nodes >

< Preprocessing with input permutation >

#supernodes: -2032018434

size of largest supernode: 4626015541689943357

Just looking at the messages:

#supernodes: -2032018434

and the memory allocation errors, it looks like you should link with the ILP64 MKL libraries (please use the correct compiler options and link line).

Thanks for your answer.

I tried linking with ILP64 using the corresponding compiler options, but I still get exactly the same broken log, the same wrong number of threads, and the same slowness, though at least the results are correct!

I have the same problems with every problem size, even very small ones. It looks like a scheduling bug.

Any other ideas?

Thanks a lot for your help!

Could you please send your compiler (C or Fortran) options and the whole link line?

Did you use the compiler option -i8 for Fortran or -DMKL_ILP64 for C?

A small test case that reproduces the problem would be very helpful.
