Intel® oneAPI Math Kernel Library

Direct Sparse Solver for Clusters poor scaling

Emond__Guillaume
Beginner

Hi,

We are currently developing a distributed version of our C++ finite element program. We planned to use the Intel Direct Sparse Solver for Clusters, but it seems we cannot reach good scalability with our settings. The matrix is assumed non-symmetric and is built in the distributed CSR (DCSR) format.

The test case is a simple thermal diffusion problem on a square grid. Problem sizes ranging from 1M to 25M DOF have been tested with many combinations of MPI processes and OpenMP threads (usually with 1 MPI process per node or per socket). The memory allocated per process at the factorization phase scales down as expected, but we observe only a small speed-up in running time.
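For context, a typical launch in our tests looks roughly like this (the binary name and the counts are only illustrative):

export OMP_NUM_THREADS=14          # T: OpenMP threads per MPI process
export I_MPI_PIN_DOMAIN=socket     # pin each MPI process to one socket
mpirun -np 8 -ppn 2 ./fem_solver   # 4 nodes, 2 MPI processes per node (one per socket)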

So far, we have observed the following behavior:

- Symbolic factorization benefits from more MPI processes but is not affected by the number of OpenMP threads.

- Numerical factorization scales with the number of OpenMP threads and sometimes with the number of MPI processes.

- Most of the time, the solving phase shows no significant gain from either kind of parallelism.

I must be doing something wrong, but I can't seem to find the cause of the problem.

Thanks a lot for any advice.
The following iparm entries are used (zero-based C indexing; a stripped-down call sketch with these settings is shown after the list):

iparm(0) = 1;

iparm(1) = 10;

iparm(7) = 2;

iparm(9) = 13;

iparm(10) = 1;

iparm(12) = 1;

iparm(34) = 1;

iparm(39) = 2;

iparm(40), iparm(41) = first and last row of the local part of the matrix
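For completeness, here is a minimal sketch of how we drive cluster_sparse_solver with these settings (variable names such as row_first/row_last and the simplified error handling are placeholders, not our actual code):

#include <mpi.h>
#include "mkl_cluster_sparse_solver.h"

// Local DCSR arrays ia/ja/a describe rows [row_first, row_last] of the global
// n-by-n matrix; b and x hold the matching local parts of the RHS and solution.
void solve_distributed(MKL_INT n, MKL_INT row_first, MKL_INT row_last,
                       MKL_INT *ia, MKL_INT *ja, double *a, double *b, double *x)
{
    void *pt[64] = {};              // internal solver handle, must start zeroed
    MKL_INT iparm[64] = {};
    iparm[0]  = 1;                  // do not use the solver defaults
    iparm[1]  = 10;                 // MPI version of nested dissection
    iparm[7]  = 2;                  // max. number of iterative refinement steps
    iparm[9]  = 13;                 // pivot perturbation 1e-13
    iparm[10] = 1;                  // scaling
    iparm[12] = 1;                  // matching
    iparm[34] = 1;                  // zero-based indexing of ia/ja
    iparm[39] = 2;                  // matrix, RHS and solution are distributed
    iparm[40] = row_first;          // first row of the local domain
    iparm[41] = row_last;           // last row of the local domain

    MKL_INT maxfct = 1, mnum = 1, mtype = 11;   // real, non-symmetric
    MKL_INT nrhs = 1, msglvl = 1, error = 0, idum = 0, phase;
    int comm = MPI_Comm_c2f(MPI_COMM_WORLD);    // Fortran communicator handle

    // phase 11 = analysis, 22 = numerical factorization, 33 = solve
    for (phase = 11; phase <= 33 && error == 0; phase += 11) {
        cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                              &idum, &nrhs, iparm, &msglvl, b, x, &comm, &error);
    }
    phase = -1;                                 // release internal memory
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                          &idum, &nrhs, iparm, &msglvl, b, x, &comm, &error);
}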

The code is compiled with the 2017 Intel compiler and Intel MPI. The compilation flags used are: -O3 -qopenmp -mkl=parallel.

Gennady_F_Intel
Moderator

Emond, what version of MKL do you use? Is that MKL 2017 or 2019?

If you want to use the Direct Solver for Clusters, you need to link against the MPI-based MKL libraries; the -mkl=parallel option alone links the SMP version of Intel PARDISO. Please refer to the MKL Link Line Advisor to see how to link when you need the Direct Solver for Clusters.
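For example, for Intel MPI with LP64 interfaces and OpenMP threading, the Advisor typically produces something along these lines (please verify against the Advisor for your exact setup):

-L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl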

Nevertheless, what scalability results do you observe?
Emond__Guillaume
Beginner

I am using MKL 2017.

To link with the MPI and MKL libraries, I use these linking flags:

-lmkl -lmkl_intel_lp64 -lmkl_core -lmkl_blacs_intelmpi_lp64 -mkl_scalapack_lp64 -lpthread -lm -ldl -lmpi

I just realized it's not exactly the same as what the MKL Link Line Advisor suggests. I will check if it changes anything.

The attached figure shows our typical running times for a 6.5M DOF problem for the analysis, factorization, and solving phases. N and T are respectively the number of nodes (1 MPI process per node) and the number of threads per node.
Emond__Guillaume
Beginner

Hi,

It seems the linking options were not the problem, because the same issue still occurs with the exact options taken from the Link Line Advisor.
Gennady_F_Intel
Moderator

Could we ask you to try the latest MKL 2019 and check whether the scalability problem is similar? Or please give us a reproducer with these input data so we can check the problem on our side. Thanks, Gennady
iarve__endel
Beginner

Hello,

I noticed this post is about a year old and wonder what the outcome was. I have just submitted a help request ticket on a similar issue with the MKL 2019 version. I am not seeing any monotonic scaling with either MPI or OpenMP (except for MPI=1). See the attached report. The solver itself blends perfectly with our code, and I am keeping my fingers crossed.

Thanks

Endel 
