Luc_Buatois

Beginner

10-13-2010
06:56 AM

Threading bug in MKL 10.2 update 6

Hello all,

We are currently using the PARDISO solver to solve symmetric positive definite systems. Since updating to MKL 10.2 update 6, we have been facing a threading bug that severely degrades performance.

We have an 8-core machine (no hyperthreading, of course) and call mkl_set_num_threads(8), but when factorizing the matrix and solving a linear system, only 5 threads are fully used! In the debugger we can see that MKL has created 8 threads. We tried mkl_set_dynamic(0), among other things, but every attempt led to the same problem: only 5 threads are used instead of 8, and the processing time is longer than with MKL 10.2 update 2.

We also tested MKL 10.3 beta2, which showed the same slowdown.

Are you aware of such a bug? Is there any plan to fix it soon?

Thanks in advance for your help.

Best regards.

10 Replies

Alexander_K_Intel2

Employee

10-13-2010
07:09 AM

Could you provide us with the values of iparm(60) and iparm(64) after running PARDISO? And what kind of processor are you using?

With best regards,

Alexander Kalinkin

Gennady_F_Intel

Moderator

10-13-2010
07:52 AM

And what was the size of the problem you solved?

Luc_Buatois

Beginner

10-13-2010
08:01 AM

We are using a Xeon E5420. After running PARDISO, iparm(60)=0, which is consistent with our request for an in-core run, and iparm(64)=102000114.

Note that the system fits easily in memory: it takes less than 2 GB of the 16 GB of RAM available. By the way, we saw this bug on every linear system we tried, whatever its size.

Below are the parameters we used for PARDISO:

(all parameters not listed have been set to 0, and the indices below are 0-based "C" indices)

iparm_[ 0] = 1 ; /* No solver default */

iparm_[ 1] = 3 ; /* Fill-in reordering from METIS. 3==OpenMP METIS! */

We call PARDISO with the following parameters:

PARDISO( handle, 1, 0, 2, phase, N, values, row_index, columns, dummy_interger, 1, iparm_, 0, dummy_double, dummy_double, error ) ;
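Since this thread mixes 1-based references such as iparm(60) with 0-based C indices such as iparm_[0], here is a minimal sketch of the index mapping. The helper names iparm_slot and fill_iparm are hypothetical, made up for illustration; only the array size of 64 and the two values set above come from PARDISO and from this post.

```c
#include <assert.h>

#define IPARM_SIZE 64   /* PARDISO's iparm array has 64 entries */

/* Hypothetical helper: map a 1-based index, as written in iparm(60),
   to the 0-based slot of a C array, here iparm_[59]. */
static int iparm_slot(int one_based)
{
    assert(one_based >= 1 && one_based <= IPARM_SIZE);
    return one_based - 1;
}

/* Fill iparm_ as described in the post: every entry is 0 except the
   two that are listed explicitly. */
static void fill_iparm(int iparm_[IPARM_SIZE])
{
    for (int i = 0; i < IPARM_SIZE; ++i)
        iparm_[i] = 0;
    iparm_[iparm_slot(1)] = 1;  /* iparm_[0]: no solver default */
    iparm_[iparm_slot(2)] = 3;  /* iparm_[1]: fill-in reordering choice from the post */
}
```

With this mapping, the iparm(60) and iparm(64) values requested earlier are read from iparm_[59] and iparm_[63] in C.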

Thanks a lot for your help.

Bests.

Luc B.

Luc_Buatois

Beginner

10-13-2010
08:08 AM

Konstantin_A_Intel

Employee

10-13-2010
09:43 PM

Could you please give us a bit more information: which PARDISO phase actually slowed down? Reordering, factorization, or solving? You could set msglvl=1 and send us the output of the two runs.

Thanks a lot,

Konstantin

Luc_Buatois

Beginner

10-14-2010
02:00 AM

Hello, in fact all parallelized phases seem to have been affected. The ones that really matter to us are the factorization and solve phases.

I need to clarify and correct some of what I said earlier. With MKL 10.2 update 2, factorization works fine in parallel. The solve phase was not parallelized, so it was slow but workable. In update 6, factorization is slower when running in parallel: only 5 threads are used even though 8 are created. The solve phase is parallelized and is slightly faster than in update 2, but again only 5 threads are running in this phase! Concerning MKL 10.3 beta2, I made a mistake yesterday. In fact it behaves exactly like 10.2 update 2: factorization works fine in parallel, and the solve phase, which does not seem to be parallelized, is slow but workable. It seems that the recent work in 10.2 update 6 on parallelizing the solve phase is the key...

As requested, please find below the output of a PARDISO run for a 217007 by 217007 matrix with exactly 3365857 nonzeros. The log seems to be completely broken, but I don't know why!

Thanks a lot for your help!

Luc.

Number of items read = 100

*** Error in PARDISO memory allocation: WORK_I0 , size to allocate: 0 bytes

================ PARDISO: solving a symm. posit. def. system =================

=============== PARDISO: solving a symmetric indef. system ==================

============== PARDISO: solving a real struct. sym. system ===================

============= PARDISO: solving a symmetric indef. system ====================

============ PARDISO: solving a compl. str. sym. system ================

================ PARDISO: solving a real nonsymmetric system ================

reorder

Summary PARDISO: (

factorize Time parlist:

solve Time parlist:

clean Time parlist:

to Time parlist:

Times: Time parlist:

Time fulladj: Time symbfct:

Time A to LU:

================ PARDISO: solving a complex nonsym. system ================

Time numfct :

Time cgs :

0.000000 s cgx iterations -16843009

0.000000 s

< Parallel Direct Factorization with #processors: > 3365857

< Hybrid Solver PARDISO with CGS/CG Iteration >

< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>

#non-zeros in A: 64

non-zeros in A (): 0.000000

#columns for each panel: 73823

#independent subgraphs: 3481

< Preprocessing with multiple minimum degree with constraints >

< no multiple minimum degree on the separator nodes >

< Preprocessing with input permutation >

Percentage of computed non-zeros for LL^T factorization

0 %

1 %

2 %

[..]

98 %

99 %

100 %

*** Error in PARDISO memory allocation: WORK_I0 , size to allocate: 4013312 bytes

================ PARDISO: solving a symm. posit. def. system =================

=============== PARDISO: solving a Herm. pos. def. system ==================

============== PARDISO: solving a real struct. sym. system ===================

============= PARDISO: solving a Herm. pos. def. system ====================

============ PARDISO: solving a compl. str. sym. system ================

================ PARDISO: solving a real nonsymmetric system ================

reorder

Summary PARDISO: (

) Time parlist:

================ Time parlist:

Times: Time parlist:

Time fulladj: Time symbfct:

Time A to LU:

================ PARDISO: solving a complex nonsym. system ================

Time numfct :

Time cgs :

0.000000 s cgx iterations -16843009

0.000000 s

< Parallel Direct Factorization with #processors: > 3365857

< Hybrid Solver PARDISO with CGS/CG Iteration >

< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>

#non-zeros in A: 64

non-zeros in A (): 0.000000

#columns for each panel: 73823

#independent subgraphs: 3481

< Preprocessing with multiple minimum degree with constraints >

< no multiple minimum degree on the separator nodes >

< Preprocessing with input permutation >

#supernodes: -2032018434

size of largest supernode: 4626015541689943357

*** Error in PARDISO memory allocation: WORK_I0 , size to allocate: 4013312 bytes

================ PARDISO: solving a symm. posit. def. system =================

=============== PARDISO: solving a Hermitian indef. system ==================

============== PARDISO: solving a real struct. sym. system ===================

============= PARDISO: solving a Hermitian indef. system ====================

============ PARDISO: solving a compl. str. sym. system ================

================ PARDISO: solving a real nonsymmetric system ================

reorder

Summary PARDISO: (

=========== Time parlist:

Time fulladj: Time symbfct:

Time A to LU:

================ PARDISO: solving a complex nonsym. system ================

Time numfct :

Time cgs :

0.000000 s cgx iterations -16843009

0.000000 s

< Parallel Direct Factorization with #processors: > 3365857

< Hybrid Solver PARDISO with CGS/CG Iteration >

< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>

#non-zeros in A: 64

non-zeros in A (): 0.000000

#columns for each panel: 73823

#independent subgraphs: 3481

< Preprocessing with multiple minimum degree with constraints >

< no multiple minimum degree on the separator nodes >

< Preprocessing with input permutation >

#supernodes: -2032018434

size of largest supernode: 4626015541689943357
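For anyone trying to make sense of values such as "size of largest supernode: 4626015541689943357" or "cgx iterations -16843009" in the log above, one way to inspect them is to look at their raw bit patterns. The sketch below is only an inspection aid, not a diagnosis of the bug; the function names are hypothetical, and it assumes the platform uses 64-bit IEEE 754 doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 64-bit integer as a double.
   Assumes double is a 64-bit IEEE 754 value on this platform. */
static double bits_as_double(int64_t v)
{
    double d;
    assert(sizeof d == sizeof v);
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Reinterpret a signed 32-bit integer as its unsigned bit pattern,
   which is easier to read in hexadecimal. */
static uint32_t bits_as_u32(int32_t v)
{
    uint32_t u;
    memcpy(&u, &v, sizeof u);
    return u;
}
```

Under that assumption, bits_as_double(INT64_C(4626015541689943357)) is a value between 18 and 20, and bits_as_u32(-16843009) is 0xFEFEFEFF. This suggests the printed numbers may be ordinary data shown through the wrong type or memory that was never filled in, but that reading is a guess.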

Luc_Buatois

Beginner

10-15-2010
01:39 AM

Any idea what's going wrong?
Thanks!

barragan_villanueva_

Valued Contributor I

10-15-2010
05:24 AM

Just looking at the messages, such as

#supernodes: -2032018434

and the memory allocation errors, it looks like you should link with the ILP64 MKL libraries (please use the correct compiler options and link line).
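To illustrate why an integer-width mismatch can produce nonsense values, here is a minimal sketch. It assumes only that the ILP64 interface uses 64-bit integers while the default (LP64) interface uses 32-bit ones, selected at compile time with -DMKL_ILP64. The type solver_int and the function read_as_int64 are hypothetical stand-ins, not MKL names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the library's integer type: 64-bit when
   MKL_ILP64 is defined at compile time, 32-bit otherwise. */
#ifdef MKL_ILP64
typedef int64_t solver_int;
#else
typedef int32_t solver_int;
#endif

/* Read the first 8 bytes of a 32-bit integer array as one 64-bit
   integer, which is what a routine expecting 64-bit integers would
   do if it were handed such an array. */
static int64_t read_as_int64(const int32_t *values)
{
    int64_t fused;
    assert(values != NULL);
    memcpy(&fused, values, sizeof fused);
    return fused;
}
```

For an array starting with {1, 2}, read_as_int64 returns neither 1 nor 2 but a single large number built from both entries, so the application code and the linked libraries must agree on the integer width.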

Luc_Buatois

Beginner

10-15-2010
06:41 AM

Thanks for your answer.

I tried linking with ILP64 and the corresponding compiler options, but I still get exactly the same broken log, the wrong number of threads in use, and the same slowness. At least the results are correct!

I have the same trouble with any problem size, even very small ones. It looks like a scheduling bug.

Any other ideas?

Thanks a lot for your help!

barragan_villanueva_

Valued Contributor I

10-16-2010
08:25 AM

Could you please send your compiler (C or Fortran) options and the whole link line?

Did you use the compiler options -i8 for Fortran or -DMKL_ILP64 for C?

A small test case to reproduce the problem would be very helpful.
