Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

CPardiso phase 33 scaling

Emond__Guillaume
Beginner

Hi,

We want to use Cluster Pardiso for our finite element application. To estimate its performance, we used a simple program (attached file) that reads a matrix from the SuiteSparse Matrix Collection (Matrix Market format) and then measures the execution time of each phase (11, 22 and 33).
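
For context, a minimal sketch of this kind of timing (not the attached file itself: it uses a tiny hard-coded SPD matrix instead of the Matrix Market reader, and leaves iparm at its defaults) could look like this:

// Minimal sketch: time CPardiso phases 11/22/33 with MPI_Wtime.
// A tiny hard-coded SPD matrix in 1-based CSR stands in for the Matrix Market reader,
// and iparm is left at defaults (iparm[0] = 0, centralized input on rank 0).
#include <cstdio>
#include <vector>
#include <mpi.h>
#include "mkl_cluster_sparse_solver.h"

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // 5x5 symmetric positive definite matrix, upper triangle, 1-based CSR (mtype = 2).
    MKL_INT n = 5;
    std::vector<MKL_INT> ia = {1, 3, 5, 7, 9, 10};
    std::vector<MKL_INT> ja = {1, 2,   2, 3,   3, 4,   4, 5,   5};
    std::vector<double>  a  = {4, -1,  4, -1,  4, -1,  4, -1,  4};
    std::vector<double>  b(n, 1.0), x(n, 0.0);

    void   *pt[64]    = {};  // internal solver handle, must start zeroed
    MKL_INT iparm[64] = {};  // iparm[0] = 0 -> use default settings
    MKL_INT maxfct = 1, mnum = 1, mtype = 2, nrhs = 1, msglvl = 0, error = 0;
    MKL_INT perm = 0;
    int comm = MPI_Comm_c2f(MPI_COMM_WORLD);

    for (MKL_INT phase : {11, 22, 33}) {   // analysis, factorization, solve
        double t0 = MPI_Wtime();
        cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n,
                              a.data(), ia.data(), ja.data(), &perm, &nrhs,
                              iparm, &msglvl, b.data(), x.data(), &comm, &error);
        double t1 = MPI_Wtime();
        if (rank == 0)
            std::printf("phase %d: %.4f s (error = %d)\n", (int)phase, t1 - t0, (int)error);
    }

    MKL_INT phase = -1;                    // release internal solver memory
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n,
                          a.data(), ia.data(), ja.data(), &perm, &nrhs,
                          iparm, &msglvl, b.data(), x.data(), &comm, &error);
    MPI_Finalize();
    return 0;
}

Run with mpirun -np <P> and OMP_NUM_THREADS=<T>; since each call is collective, the time reported on rank 0 is essentially the wall time of that phase.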

The factorization phase (22) scales well with MPI and OpenMP parallelization, but the performance of the solve phase (33) is not nearly as good as the factorization.

For example, the table below shows running times (in seconds) for different combinations of MPI processes and OpenMP threads (per process).

Serena.mtx:

MPI / OMP    Phase=22 (s)    Phase=33 (s)
 2 / 2          408.70          2.4668
 2 / 4          249.52          1.3382
 2 / 8          234.87          3.7524
 2 / 16          93.879         1.3181
 4 / 2          327.69          1.8661
 4 / 4          162.16          1.9664
 4 / 8           96.526         4.4899
 4 / 16          58.619         1.3763
 8 / 2          175.61          1.1638
 8 / 4           90.975         1.1006
 8 / 8           67.704         2.4264
 8 / 16          39.654         0.9049
16 / 2          127.61          1.4321
16 / 4           62.155         0.9136
16 / 8           53.761         2.0407
16 / 16          26.957         0.7122
32 / 8           36.447         2.1856
32 / 16          24.977         0.3729

 

We can observe that the solve time does not always decrease with more MPI processes or OpenMP threads. We tested other matrices (RM07R) and observed the same behaviour. Is this normal, or is it an issue? Is there a way to get better scaling?

 

Thanks a lot for any advice

Guillaume

Gennady_F_Intel
Moderator

Hi Guillaume,

This might be a scalability issue at the solution stage.

Which version of MKL are you running?

thanks, Gennady

Emond__Guillaume
Beginner

Hi,

We used MKL 2017.0.4 for the Intel 64 architecture.

Gennady_F_Intel
Moderator

Have you had a chance to try version 2019 and check the scalability with that version of MKL? You can get the latest update for free.

If not, we will check these numbers on our side.

Emond__Guillaume
Beginner

Our application runs on Graham, a cluster at Compute Canada (https://docs.computecanada.ca/wiki/Graham). At the moment, the most recent version available there is MKL 2018.0.3, and a request to install MKL 2019 could take a while to be processed.

It seems that switching to MKL 2018 does not solve the problem. Here are the results for Serena.mtx (with MKL 2018).   

MPI / OMP    Phase=22 (s)    Phase=33 (s)
 2 / 2          431.21          2.4546
 2 / 4          266.05          1.5849
 2 / 8          266.05          1.7487
 2 / 16          83.262         1.3433
 4 / 2          245.56          1.4342
 4 / 4          134.69          1.3420
 4 / 8           87.154         2.0223
 4 / 16          54.010         2.1256
 8 / 2          152.77          1.0543
 8 / 4           89.457         1.0726
 8 / 8           60.434         2.1468
 8 / 16          38.522         1.0357
16 / 2          110.03          1.5032
16 / 4           55.077         0.5922
16 / 8           40.506         0.9916
16 / 16          28.883         0.7282
32 / 8           34.848         0.7771
32 / 16          27.193         0.4550

 

I would appreciate it if you could check these results.

Thank you!

Guillaume

 

PS: Our application is compiled with these flags:

-O3 -qopenmp -mkl=parallel -std=c++11 -Wall

 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl -lmpi
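
For reference, a small standalone check like the one below (assuming mkl_get_version_string from mkl.h; it is not part of our application) can confirm which MKL version the binary actually picks up at run time:

// Print the MKL version the binary links against at run time.
#include <cstdio>
#include <mkl.h>

int main() {
    char buf[256];
    mkl_get_version_string(buf, (int)sizeof(buf));  // e.g. "Intel(R) Math Kernel Library Version 2018.0.3 ..."
    std::printf("%s\n", buf);
    return 0;
}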

Gennady_F_Intel
Moderator

Sure, I will try to check and get back to you.

Gennady_F_Intel
Moderator

Here is what I obtained with OpenMP threads only, due to a cluster access problem.

RM07R: the scalability is quite limited.

MKL version 2019 u1. 

OMP threads	phase == 22 (sec)	phase == 33 (sec)
1	        973.9	            5.04
2	        567.57	            3.01
4	        313.77	            2.37
8	        192.01	            1.94
16	        124.3	            1.76
32	        107.69	            1.72

 

 
