Scalability problem with Pardiso

Stanislav_S_ · 11-24-2016

Hello,

I am using the parallel Pardiso solver from Intel MKL (cluster_sparse_solver) and I am having trouble achieving good parallel efficiency. The computations run on Intel Xeon E5-2650 v2 8C 2.6GHz CPUs (co-processors are not used). The sparse system has 888 195 DOF; I do not go to larger systems because of the specifics of the problem (solving the sparse system is a small part of the whole algorithm, which involves both sparse and dense matrices). Because the algorithm solves several sparse systems with cluster_sparse_solver, its overall scalability ends up poor because of the MKL solver.

I would like to ask whether anyone knows if all of cluster_sparse_solver is implemented for parallel runs. From the output of cluster_sparse_solver I see that the CPU time of the first phase of the algorithm does not change as the number of processes increases:

Strong scalability of cluster_sparse_solver, mesh with 296 065 nodes, system with 888 195 DOF

number of processes                     1        2        4        8       16       32       64

11  Analysis                       5.9624   5.8961   5.8318   5.8405   5.8652   5.9099   6.0160
22  Numerical factorization       14.5922   7.7329   4.6724   2.9844   1.9513   1.5424   1.2885
33  Solve, iterative refinement    1.3819   0.8207   0.5122   0.3700   0.3089   0.2710   0.5799
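
To make the trend in the table easier to read, the timings can be converted into speedups relative to the single-process run. The following is only a small Python sketch for illustration; the names `procs`, `timings`, `speedup` and `efficiency` are made up here and are not part of MKL.

```python
# Timings copied from the table above (888 195 DOF system).
procs = [1, 2, 4, 8, 16, 32, 64]
timings = {
    "analysis":      [5.9624, 5.8961, 5.8318, 5.8405, 5.8652, 5.9099, 6.0160],
    "factorization": [14.5922, 7.7329, 4.6724, 2.9844, 1.9513, 1.5424, 1.2885],
    "solve":         [1.3819, 0.8207, 0.5122, 0.3700, 0.3089, 0.2710, 0.5799],
}


def speedup(times):
    """Speedup of each run relative to the first (single-process) entry."""
    return [times[0] / t for t in times]


def efficiency(times, nprocs):
    """Parallel efficiency: speedup divided by the number of processes."""
    return [s / p for s, p in zip(speedup(times), nprocs)]


# Sum of the three phases for each process count.
total = [sum(column) for column in zip(*timings.values())]

for name, times in list(timings.items()) + [("total", total)]:
    row = "  ".join(f"{s:6.2f}" for s in speedup(times))
    print(f"{name:14s} {row}")
```

With these numbers the factorization phase keeps speeding up through 64 processes, while the analysis phase stays at a speedup of about 1, so the total of the three phases improves by less than a factor of 3.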

I have contacted developers from Intel, but unfortunately I did not get an answer from them. I would be grateful if someone could share their experience with the scalability of cluster_sparse_solver.

Regards,

Stanley

Gennady_F_Intel · 11-24-2016

Stanley, we don't know whom you contacted previously, but this is the right place to contact us.

Which MKL version have you tried?

Stanislav_S_ · 11-25-2016

Hello Gennady, the version of MKL that I used to get these results is 11.3.2.

Gennady_F_Intel · 11-25-2016

Stanislav, fully distributed reordering has been available in CPardiso since MKL 2017 Update 1 (released Nov 1, 2016). You can take the evaluation version and check how it works on your side with those workloads.

Stanislav_S_ · 01-09-2017

Hello Gennady,

Thank you for your reply. My colleagues installed the new version of MKL (MKL 2017 Update 1) and I checked the scalability of cluster_sparse_solver again. Unfortunately the results are similar to those with MKL 2016. I also generated a larger sparse matrix, which currently has 3.5 million DOF. Here are the strong scalability results for cluster_sparse_solver (1 OpenMP thread and several MPI processes).

number of processes                     1        2        4        8       16

11  Analysis                      26.4461  26.2817  25.8782  26.5270  26.0985
22  Numerical factorization       123.822  65.1132  38.7532  23.6410  16.1666
33  Solve, iterative refinement    4.7206   3.7750   2.5918   1.9842   1.9570

Again it seems that the first phase of the algorithm is not implemented for parallel computations.
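
A rough way to see how much a non-scaling first phase limits the whole solver is Amdahl's law, S(p) = 1 / (f + (1 - f) / p), where f is the serial fraction of the single-process run time. The Python sketch below uses the single-process timings from the 3.5 million DOF table above. It assumes, purely for illustration, that the analysis phase is fully serial and that the other two phases scale perfectly; the function and variable names are invented here.

```python
def amdahl_speedup(serial_fraction, nprocs):
    """Upper bound on speedup when a fixed fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nprocs)


# Single-process timings from the 3.5 million DOF table.
analysis_1p = 26.4461
factorization_1p = 123.822
solve_1p = 4.7206

total_1p = analysis_1p + factorization_1p + solve_1p

# Assumption: analysis does not scale at all, everything else scales ideally.
serial_fraction = analysis_1p / total_1p

for p in (2, 4, 8, 16):
    print(f"{p:3d} processes: bound {amdahl_speedup(serial_fraction, p):.2f}")

# Limit of the bound as the number of processes grows without bound.
print(f"limit: {1.0 / serial_fraction:.2f}")
```

Under these assumptions the analysis phase is about 17% of the single-process time, so the total speedup cannot exceed roughly 5.9 no matter how many processes are used.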

I would like to ask whether you can provide me with an example in which the first phase scales. Then I can modify my code to obtain scalability in my algorithm as well.

Best regards,

Stanley

Alexander_K_Intel2 · 01-09-2017

Hi,

Can you set iparm[1] to 10 and rerun the test on the latest version of MKL?

Thanks,

Alex

li__wei · 04-20-2018

I have a similar problem: the analysis step does not scale.

I also tried the advice Alex gave above, with the same result.

I hope someone can help solve this problem.