Intel® oneAPI Math Kernel Library

Problem with MKL ScaLAPACK PDGETRF

siegfriedyoung

Hi all,

I am trying to use the MKL PBLAS/ScaLAPACK routine pdgetrf to compute an LU decomposition. I wrote a simple Fortran test program, and it works well with a 2x2 process grid on the cluster. However, when I use more processes, e.g. 'mpiexec -n 16', the program hangs.

One possible cause is that the BLAS spawns too many threads, which leads to a performance disaster (see https://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=6&t=3371). So I tried exporting OMP_NUM_THREADS=1 or MKL_NUM_THREADS=1, and submitted the job with different combinations of pbs -l select=:ncpu:mpiprocs:. None of them solved the problem.
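To make the oversubscription argument concrete, here is a rough back-of-the-envelope sketch in Python (not part of the original thread). It assumes, hypothetically, 16-core nodes and that a threaded BLAS left at its defaults starts one thread per core inside every MPI rank; the function names are illustrative only.

```python
def threads_per_node(ranks_per_node: int, threads_per_rank: int) -> int:
    """Total BLAS threads running on one node."""
    return ranks_per_node * threads_per_rank


def oversubscription(ranks_per_node: int, threads_per_rank: int, cores_per_node: int) -> float:
    """Ratio of running threads to cores; a value above 1.0 means cores are oversubscribed."""
    return threads_per_node(ranks_per_node, threads_per_rank) / cores_per_node


if __name__ == "__main__":
    cores = 16  # hypothetical node size
    # Assumption: default threaded BLAS starts one thread per core in each rank.
    print(oversubscription(16, cores, cores))
    # With OMP_NUM_THREADS=1 / MKL_NUM_THREADS=1: one thread per rank.
    print(oversubscription(16, 1, cores))
```

Under these assumptions, 16 ranks on one 16-core node run 256 threads (16x oversubscription) by default, versus exactly one thread per core once the thread count is pinned to 1.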

I have no idea why it works with 2x2 processes but fails with 4x4 or more. I hope someone here can help; any suggestion would be greatly appreciated.

Cluster software:

Intel® Fortran Composer 13.0.1 and MPICH 3.0. 

Sieg

SergeyKostrov
>>...https://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=6&t=3371 )...

Here is a quote from a post on that forum:

>>Re: Scaling parallel, banded LU decomposition and solve
>>by Julien Langou » Tue May 29, 2012 7:14 pm
>>...
>>Note:
>>OMP_NUM_THREADS takes care of most of the BLAS libraries but not all, some have their own environment variable to control
>>the number of threads they are running on...
>>...

Please ask Julien Langou for more technical details about these library-specific environment variables. Also, did you do any testing in a non-MPI environment?
siegfriedyoung

Sergey Kostrov wrote:

Also, did you do any testing in Non-MPI environment?

Thanks for the reply. Yes, it works fine without MPI.

siegfriedyoung

Problem solved. It seems the MKL ScaLAPACK bundled with Composer 13.0.1 does not work with MPICH 3.0. After switching to Intel MPI, pdgetrf works fine with more processes.

I haven't tried MPICH2 or a newer release of MKL. Hopefully the issue has been fixed there.
