Speedup of PARDISO

jbakosi · 02-22-2007

Hello,

I'm trying the PARDISO solver in MKL v9.0 on a shared-memory SGI Altix machine with Itanium 2 CPUs. My matrix is symmetric positive definite, with sizes up to 350000 x 350000, and I have tried OMP_NUM_THREADS values of up to 12. The machine runs a batch system, so no other jobs share the CPUs I run on. However, performance degrades as I add processors: the fastest run uses a single CPU. The other OpenMP-parallelized parts of the code scale well; only the PARDISO calls (all phases) do not.
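When comparing runs at different OMP_NUM_THREADS values, each phase should be timed by elapsed wall-clock time. CPU time (for example from clock()) adds up across all threads, so it can make a multithreaded run look slower even when it finishes sooner. Below is a minimal sketch, assuming a POSIX system with clock_gettime. The helpers wall_seconds, speedup and efficiency are hypothetical names for illustration and are not part of MKL.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Elapsed wall-clock seconds from a monotonic clock.
   Unlike clock(), this does not sum CPU time over threads. */
static double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Speedup of a p-thread run over the 1-thread run: S(p) = T(1) / T(p). */
static double speedup(double t1, double tp) {
    return t1 / tp;
}

/* Parallel efficiency: E(p) = S(p) / p; 1.0 means perfect scaling. */
static double efficiency(double t1, double tp, int p) {
    return speedup(t1, tp) / (double)p;
}
```

A typical use wraps each phase between two wall_seconds() calls, for example `double t0 = wall_seconds(); /* phase */ double t = wall_seconds() - t0;`. It then compares t against the one-thread time. A speedup below 1.0 means the run really did get slower.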

I use -O3 -openmp -mtune=itanium2 as compiler flags and link the following libraries: -lstdc++ -lmkl_solver -lmkl_lapack -lmkl_ipf -lmkl_lapack64 -lmkl -lvml -lguide -lpthread.

I have in mkl.cfg:
MKL_SERIAL = OMP
MKL_INPUT_CHECK = OFF

Is there anything fundamental I'm missing or doing wrong? Could anyone point me in the right direction?

Thank you in advance,
Jozsef