Intel® oneAPI Math Kernel Library

Can not get speedup with multi-core systems when using MKL-Pardiso solver

SunnySunny
Beginner

Hi, I am using the MKL-Pardiso solver to solve an unsymmetric sparse matrix. The calculation time for the Pardiso solver does not decrease when I increase the number of threads. Here is some related information.

1. The computer has 32 cores.

2. I am using OpenMP for parallel computing. I have set 'call mkl_set_dynamic(0)' and 'call mkl_set_num_threads(number of threads)'.

3. When I increase the number of threads (e.g. from 1 to 12), the calculation time for the Pardiso solver increases a lot instead of decreasing. However, the other parts of the code that use OpenMP parallel computing do show a decrease in computation time.

I am really confused. Hope somebody can help me. Thanks a lot.
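As a first diagnostic, a sketch assuming a bash shell and that the executable is `./a.out` (both `MKL_VERBOSE` and `MKL_NUM_THREADS` are documented MKL environment variables):

```shell
# Print, for every MKL call, the routine name, thread count, and timing,
# so you can confirm how many threads Pardiso actually uses.
export MKL_VERBOSE=1

# Pin the MKL thread count explicitly (takes precedence over
# OMP_NUM_THREADS for MKL routines).
export MKL_NUM_THREADS=12

./a.out
```

If the verbose output shows Pardiso running on 1 thread, or on far more threads than requested, that points to a linking or nested-threading problem rather than the solver itself.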

8 Replies
andrew_4619
Honored Contributor I

I know nothing! I did, however, read at https://www.pardiso-project.org/ : "Important: Please note that the Intel MKL version of PARDISO is based on our version from 2006 and that a lot of new features and improvements of PARDISO are not available in the Intel MKL library." Maybe others will comment. Maybe the MKL forum is a better place?

 

SunnySunny
Beginner

Thanks a lot for your reply. According to the link you posted, it seems PARDISO 7.2 has had a lot of improvements compared to MKL Pardiso. I am using a very old version of Fortran; will it be compatible with PARDISO 7.2?

andrew_4619
Honored Contributor I

Why are you using a "very old" Fortran when the current oneAPI Fortran compiler is free to use? What version do you have?

SunnySunny
Beginner

It is Fortran XE 2013. Our senior's code is based on that version; I just don't want to modify too much.

jimdempseyatthecove
Black Belt

IF you are calling MKL from within an OpenMP parallel region, you should be linking with the MKL sequential library.

On the other hand

IF you are calling MKL from a sequential application, you should be linking with the MKL threaded library.

On the other other hand

IF you are calling MKL from the sequential portion of a threaded application, you should be linking with the MKL threaded library .AND. only call from the main thread (the same thread every time) .AND. set the environment variable KMP_BLOCKTIME=0.
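The cases above map to different link lines. A sketch for the Intel Fortran compiler (recent compilers use the `-qmkl` option, older ones spell it `-mkl`; check the MKL Link Line Advisor for your exact version, and note that `main.f90` and the output names are placeholders):

```shell
# Case 1: MKL called from inside an OpenMP parallel region
#         -> link the sequential MKL layer
ifort -qopenmp main.f90 -qmkl=sequential -o a_seq.out

# Cases 2 and 3: MKL called from sequential code
#         -> link the threaded MKL layer
ifort -qopenmp main.f90 -qmkl=parallel -o a_par.out

# Case 3 only: stop MKL's worker threads from spin-waiting between calls
export KMP_BLOCKTIME=0
```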

 

Note, IF your 12-thread OpenMP application is calling the MKL threaded library from within an OpenMP parallel region, each of the 12 threads, upon a call to MKL, will instantiate its own thread team of 12 threads. In other words, 144 threads will be in play, and performance will be awful.
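One way to avoid that oversubscription while keeping the threaded MKL library linked is to cap MKL at one thread per calling OpenMP thread. A sketch assuming a bash shell (`OMP_NUM_THREADS`, `MKL_NUM_THREADS`, and `MKL_DYNAMIC` are the documented OpenMP/MKL environment variables; `./a.out` is a placeholder):

```shell
# 12 OpenMP threads at the application level...
export OMP_NUM_THREADS=12
# ...but each MKL call stays single-threaded: 12 x 1 = 12 threads total,
# instead of 12 x 12 = 144.
export MKL_NUM_THREADS=1
export MKL_DYNAMIC=FALSE
./a.out
```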

Jim Dempsey

SunnySunny
Beginner

Thanks a lot. I am calling MKL from a sequential application. However, in VS2012 with Fortran XE 2013 there is no MKL threaded library. The attached figure shows the MKL libraries it has. Is that due to the version? Thank you.

Kirill_V_Intel
Employee

Hello!

First, there have been a lot of unsupported claims by the PARDISO Project. Second, MKL PARDISO has been developed independently from that project for quite some time already (since 2006), and many people have improved the solver at Intel MKL since then. So while the two solvers have a common past, it would not be correct to think that MKL PARDISO has the same performance as PARDISO had in 2006.

For the issue you're describing, we need to know more details. It doesn't sound right at all that you see negative scalability.

Can you share the following?

1) iparm settings

2) set msglvl = 1 and share the output produced by PARDISO,

or (and I'd prefer this option if it is possible)

3) a small working reproducer which shows how you call PARDISO for one of your matrices (so that we can check performance on our side). That is, we need code showing how you call PARDISO, plus the matrix data.

 

Best,
Kirill

segmentation_fault
New Contributor I

I have run into the same issue with cluster_sparse_solver. The problem was how I called the program and set the OMP_NUM_THREADS variable. What works for me using Intel oneAPI is:

 

On one machine:

export OMP_NUM_THREADS=number_of_physical_cpus

mpirun -np 1 -ppn 1 ./a.out

For multi-node (the example below uses two nodes; adjust -np and -hosts for your node count):

export OMP_NUM_THREADS=number_of_physical_cpus

mpirun -np 2 -ppn 1 -hosts host1,host2 ./a.out

 

 
