Pardiso example terminates when using 72 or more cpus

Ferris_H_ · ‎08-03-2016

I studied the example cl_solver_sym_sp_0_based_c.c in cluster_sparse_solverc/source. I compiled it using:

make libintel64 example=cl_solver_sym_sp_0_based_c

It runs fine. However, the matrix is too small to look at performance, so I modified the example to read in a 3 million x 3 million matrix from a text file. When I run it with 24 cpus (1 host), it factors the matrix in 30 seconds. When I run it with 48 cpus (2 hosts), it factors it in 20 seconds. This is great! But when I run it with 72 or more cpus, I keep getting this after the reordering stage:

Reordering completed ...
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 106418 RUNNING AT cforge201
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
==================================================================================

The command I am using is:

mpirun -np 3 -machinefile ./hostfile ./cl_solver_sym_sp_0_based_c.exe

Where hostfile contains:

cforge200:1
cforge201:1
cforge202:1

Here are my example files, in case you want to check whether the issue is reproducible:

cl_solver_sym_sp_0_based_c.c - Edit all the occurrences of *.txt to the path where the files are on your system

https://www.dropbox.com/s/ndkzi9zojxuh1xo/cl_solver_sym_sp_0_based_c.c?dl=0

ia, ja, a, and b data in text files:

https://www.dropbox.com/s/3dkhbillyso03kc/ia_ja_a_b_data.tar.gz?dl=0
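Because the modified example loads ia and ja from text files, it can be worth ruling out malformed input before looking at the solver itself. EXIT CODE 11 in this kind of mpirun report commonly corresponds to signal 11 (a segmentation fault), and inconsistent index arrays are one possible cause among others. The following is only a minimal sketch, not code from the example: the function name check_csr_upper, its return codes, and the use of plain int indices are assumptions made for illustration. It assumes 0-based CSR storage of the upper triangle with strictly increasing column indices in each row, which is how the symmetric 0-based example is understood to store its matrix.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of MKL or the example): sanity-check the
 * structure of a 0-based CSR matrix that stores only the upper triangle.
 *   n    number of rows
 *   nnz  number of stored entries (length of ja)
 *   ia   row pointers, length n + 1
 *   ja   column indices, length nnz
 * Returns 0 if the structure looks consistent, otherwise a positive code.
 */
int check_csr_upper(int n, int nnz, const int *ia, const int *ja)
{
    if (n <= 0 || nnz < 0 || ia == NULL || ja == NULL)
        return 1;                       /* bad arguments */
    if (ia[0] != 0 || ia[n] != nnz)
        return 2;                       /* row pointers must span [0, nnz] */

    for (int i = 0; i < n; ++i) {
        if (ia[i + 1] < ia[i] || ia[i + 1] > nnz)
            return 3;                   /* row pointers decrease or overrun ja */
        for (int k = ia[i]; k < ia[i + 1]; ++k) {
            if (ja[k] < i || ja[k] >= n)
                return 4;               /* column outside the upper triangle */
            if (k > ia[i] && ja[k] <= ja[k - 1])
                return 5;               /* columns not strictly increasing */
        }
    }
    return 0;
}
```

Calling check_csr_upper(n, nnz, ia, ja) right after the files are read and before the first solver call would flag these structural problems. A return value of 0 does not show that the crash is unrelated to the input, only that these particular checks passed.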

Curious what kind of performance improvement you get when running with MPI on 12, 24, 48, and 72 cpus!
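To compare timings across core counts, the factorization times quoted above can be turned into a speedup and a relative parallel efficiency. This is a small sketch with hypothetical helper names (speedup, relative_efficiency); it only does arithmetic on wall-clock times and measures nothing itself.

```c
#include <assert.h>

/* Speedup of a run relative to a baseline run: t_base / t_run. */
double speedup(double t_base, double t_run)
{
    assert(t_base > 0.0 && t_run > 0.0);
    return t_base / t_run;
}

/*
 * Parallel efficiency relative to the baseline: the achieved speedup
 * divided by the ideal speedup (cores_run / cores_base).
 */
double relative_efficiency(int cores_base, double t_base,
                           int cores_run, double t_run)
{
    assert(cores_base > 0 && cores_run > 0);
    return speedup(t_base, t_run) * (double)cores_base / (double)cores_run;
}
```

With the numbers reported here, going from 24 cpus (30 seconds) to 48 cpus (20 seconds) gives speedup(30.0, 20.0) = 1.5 and relative_efficiency(24, 30.0, 48, 20.0) = 0.75, meaning doubling the cores recovered 75% of the ideal 2x gain.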

Gennady_F_Intel · ‎08-03-2016

The issue has been escalated, and the fix is targeted for release in the next update.

Ferris_H_ · ‎08-03-2016

Excellent. Since I had two problems with Pardiso, I was not sure which problem the fix was meant for. It sounds like it is meant for the one where it cannot run on more than 2 hosts (48 cpus).