I studied the example cl_solver_sym_sp_0_based_c.c in cluster_sparse_solverc/source. I compiled it using:
make libintel64 example=cl_solver_sym_sp_0_based_c
It runs fine. However, the matrix is too small to look at performance, so I modified the example to read a 3,000,000 x 3,000,000 matrix from a text file. When I run it with 24 CPUs (1 host), it factors the matrix in 30 seconds. When I run it with 48 CPUs (2 hosts), it factors it in 20 seconds. This is great! But when I run it with 72 or more CPUs, I keep getting this after the reordering stage:
Reordering completed ...
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 106418 RUNNING AT cforge201
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
The command I am using is:
mpirun -np 3 -machinefile ./hostfile ./cl_solver_sym_sp_0_based_c.exe
Where hostfile contains:
Here are my example files, in case the issue is reproducible:
cl_solver_sym_sp_0_based_c.c - Edit all the occurrences of *.txt to the path where the files are on your system
ia, ja, a, and b data in text files:
Curious what kind of performance improvement you get when running with MPI on 12, 24, 48, and 72 CPUs!
Excellent. Since I had two problems with Pardiso, I was not sure which problem the fix was meant for. It sounds like it is meant for the one where it cannot run on more than 2 hosts (48 CPUs).