Solved: How to use the MPI version of the reordering algorithm (phase 11 of cluster_sparse_solver)?

segmentation_fault · ‎11-01-2021

I am following the guide linked below to set iparm(2)=10 to use the MPI version of the nested dissection and symbolic factorization algorithms.

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/sparse-solver-routines/parallel-direct-sp-solver-for-clusters-iface/cluster-sparse-solver-iparm-parameter.html

However, I am confused about how to set iparm(41) and iparm(42). Say I want to run on two compute nodes (2 MPI processes). Do I set iparm(41) to 1 and iparm(42) to half the number of equations (n)? I tried this but got an error:

iparm[40] = 1;
int upper_bound = (n+1)/2;
printf ( "upper bound is %i", upper_bound );
iparm[41] = upper_bound;


*** Error in PARDISO  ( sequence_ido,parameters) error_num= 11
*** Input check: preprocessing 2 (out of bounds)
*** Input parameters: inconsistent error= 11 max_fac_store_in: 1
          matrix_number_in: 1 matrix_type_in: -2
          ido_in          : 11 neqns_in      : 3867819
          ia(neqns_in+1)-1: 89286447 nb_in         : 1

ERROR during symbolic factorization: -1
[proxy:0:0@cn406] pmi cmd from fd 6: cmd=finalize
[proxy:0:0@cn406] PMI response: cmd=finalize_ack


VidyalathaB_Intel · ‎11-02-2021

Hi,

Thanks for reaching out to us.

Could you please let us know which programming language you are working in?

We see that you are following the oneMKL guide for the Fortran language, but the error log suggests that you are applying it to C code.

>>I am following the guide here to set iparm(2)=10 to use the MPI version of the nested dissection and symbolic factorization algorithms.

In C, you need to set iparm[1]=10 to use the MPI version of the nested dissection and symbolic factorization algorithms. C arrays are 0-based, so Fortran iparm(2) corresponds to C iparm[1].
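Since the only difference between the two guides here is the index base, a minimal sketch of the shift may help when reading the Fortran pages from C. fortran_to_c_iparm and set_iparm_fortran are hypothetical helper names, not oneMKL API, and int stands in for MKL_INT (32-bit under LP64, 64-bit under ILP64).

```c
#include <stddef.h>

/* Hypothetical helpers (not part of oneMKL). The Fortran reference numbers
 * iparm from 1 and the C reference from 0, so Fortran iparm(k) is C
 * iparm[k - 1]. int stands in for MKL_INT in this sketch. */
static size_t fortran_to_c_iparm(size_t fortran_k) {
    return fortran_k - 1;          /* expects fortran_k in 1..64 */
}

/* Set an iparm entry using the 1-based number printed in the Fortran docs. */
static void set_iparm_fortran(int iparm[64], size_t fortran_k, int value) {
    iparm[fortran_to_c_iparm(fortran_k)] = value;
}
```

For example, set_iparm_fortran(iparm, 2, 10) has the same effect as iparm[1] = 10, and Fortran iparm(41)/iparm(42) are C iparm[40]/iparm[41].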

Please refer to the C version of the guide:

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/sparse-solver-routines/parallel-direct-sp-solver-for-clusters-iface/cluster-sparse-solver-iparm-parameter.html

If you still face any issues, please provide a sample reproducer (with steps, if any), along with your environment details and MKL version, so that we can investigate on our end.

Regards,

Vidya.

segmentation_fault · ‎11-02-2021

I am coding in C and incorrectly linked to the Fortran guide. In any case, my question is about understanding iparm[40] and iparm[41]. How should I set them based on the number of hosts?

However, I have read more of the documentation and think I have figured it out now. I believe there may be an error or omission in the documentation: when setting iparm[1] = 10, one must also set iparm[39] > 0. This is not mentioned in the documentation for iparm[1].

VidyalathaB_Intel · ‎11-03-2021

Hi,

>> I believe there might be an error or omission in the documentation. When setting iparm[1] = 10, one must also set iparm[39] > 0 ....This is not mentioned in the documentation for iparm[1]

We are looking into this issue and will get back to you soon.

Regards,

Vidya.

Gennady_F_Intel · ‎11-03-2021

iparm[39] == 0 means that the matrix is stored on the master node, while the computation is carried out across many MPI processes.

segmentation_fault · ‎11-06-2021

I understand that if I set iparm[39] = 1, the matrix will be distributed across the nodes. However, I am confused about how to set iparm[40] and iparm[41] (the beginning and end of the input domain). The code block below shows what I tried, but my phase 11 and factorization times doubled and the solution vector was wrong.

My reason for trying iparm[1] = 10 (the MPI version of the nested dissection and symbolic factorization algorithms) is that phase 11 is taking too long: it takes longer than my factorization (phase 22). So I am looking for anything that can reduce phase 11 time. I already tried setting iparm[1] = 3 to use the parallel version of the nested dissection algorithm, which sped up phase 11 by around 10-20%.

MKL_INT n = 1219161;
MKL_INT ndim = 46687911;

iparm[40] = 1;
int upper_bound = (n+1)/2;
printf ( "upper bound is %i", upper_bound );
iparm[41] = upper_bound;
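In the distributed input format each MPI process describes its own domain, so iparm[40] and iparm[41] would normally take different values on each rank (the rank itself would come from MPI_Comm_rank). Below is a minimal sketch of one even, non-overlapping split into contiguous row blocks, assuming 1-based row numbering (iparm[34] == 0). row_domain is a hypothetical helper, and the documentation and shipped examples may use other splits.

```c
/* Hypothetical helper: split rows 1..n (1-based) into nprocs contiguous,
 * non-overlapping blocks; the first n % nprocs ranks get one extra row.
 * On each rank one would then set (sketch, assuming iparm[39] > 0):
 *   iparm[40] = begin;  iparm[41] = end;
 * and pass only those rows of the matrix. long long stands in for MKL_INT. */
static void row_domain(long long n, int nprocs, int rank,
                       long long *begin, long long *end) {
    long long base  = n / nprocs;          /* rows every rank gets */
    long long extra = n % nprocs;          /* leftover rows */
    long long r     = rank;
    /* number of rows owned by ranks 0..rank-1 */
    long long before = r * base + (r < extra ? r : extra);
    long long count  = base + (r < extra ? 1 : 0);
    *begin = before + 1;                   /* 1-based first row */
    *end   = before + count;               /* 1-based last row, inclusive */
}
```

For the n = 1219161 case above with two ranks, this gives rows 1..609581 on rank 0 and rows 609582..1219161 on rank 1.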

Gennady_F_Intel · ‎11-09-2021

Regarding iparm[40] and iparm[41] (when iparm[39] == 2):

you might look at $MKLROOT\example\cluster_sparse_solversc\

The example cl_solver_unsym_distr_c.c shows how to properly distribute the inputs across nodes.
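Besides setting the domain bounds, each rank passes only the matrix rows of its own domain. Below is a sketch of slicing those rows out of a global CSR matrix. It assumes 1-based indexing, a local ia rebased to start at 1, and ja keeping global column numbers; slice_csr_rows is a hypothetical helper, and the exact layout should be checked against cl_solver_unsym_distr_c.c. long long stands in for MKL_INT.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Input: global CSR with 1-based indexing (ia[0] == 1,
 * ia has n + 1 entries). Output: rows begin..end (1-based, inclusive) as a
 * local CSR whose ia is rebased to start at 1 and whose ja keeps global
 * column numbers. Returns 0 on success, -1 on allocation failure.
 * The caller frees *lia, *lja and *la. */
static int slice_csr_rows(const long long *ia, const long long *ja,
                          const double *a, long long begin, long long end,
                          long long **lia, long long **lja, double **la) {
    long long nrows = end - begin + 1;
    long long first = ia[begin - 1];     /* 1-based position of first entry */
    long long nnz   = ia[end] - first;   /* entries in rows begin..end */
    size_t cap = (size_t)(nnz > 0 ? nnz : 1);

    *lia = malloc((size_t)(nrows + 1) * sizeof **lia);
    *lja = malloc(cap * sizeof **lja);
    *la  = malloc(cap * sizeof **la);
    if (!*lia || !*lja || !*la) {
        free(*lia); free(*lja); free(*la);
        *lia = NULL; *lja = NULL; *la = NULL;
        return -1;
    }
    /* Rebase row pointers so the local slice starts at position 1. */
    for (long long i = 0; i <= nrows; ++i)
        (*lia)[i] = ia[begin - 1 + i] - first + 1;
    /* Copy column indices and values of the selected rows unchanged. */
    memcpy(*lja, ja + (first - 1), (size_t)nnz * sizeof **lja);
    memcpy(*la,  a  + (first - 1), (size_t)nnz * sizeof **la);
    return 0;
}
```

For a symmetric matrix stored as its upper triangle (matrix type -2, as in the error log above), the same slicing applies to the stored entries of each row.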

Regarding the performance of the reordering stage:

in this case, if the input fits in the RAM of a single node, the SMP version of reordering will always be faster than the MPI (iparm[1] == 10) mode;

otherwise, it looks like a real problem that we could investigate.
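One quick way to check the "fits in a single node's RAM" condition is to estimate the size of the CSR input. The sketch below gives only a lower bound, because reordering and factorization need considerably more working memory than the input itself. csr_input_bytes is a hypothetical helper. The example sizes come from the error log earlier in the thread (neqns_in = 3867819, ia(neqns_in+1)-1 = 89286447 stored entries).

```c
/* Hypothetical helper: bytes needed just to hold a CSR matrix (ia, ja, a)
 * with n rows and nnz stored entries, given the index width (4 bytes for
 * LP64 MKL_INT, 8 for ILP64) and 8-byte double values. This is a lower
 * bound on memory use, not an estimate of reordering or factorization cost. */
static unsigned long long csr_input_bytes(unsigned long long n,
                                          unsigned long long nnz,
                                          unsigned long long int_bytes) {
    return (n + 1) * int_bytes     /* ia: row pointers */
         + nnz * int_bytes         /* ja: column indices */
         + nnz * 8ULL;             /* a: double values */
}
```

With 32-bit indices, csr_input_bytes(3867819, 89286447, 4) is 1,086,908,644 bytes, about 1 GiB, which fits comfortably in a typical compute node's RAM.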

segmentation_fault · ‎11-09-2021

Thanks. I was under the impression that distributed reordering (iparm[1] = 10) would be faster than the SMP version (iparm[1] = 3). Since it is not, I will pass on it. Perhaps it's a throwback to the days when machines had limited RAM.

Kirill_V_Intel · ‎11-09-2021

In fact, I slightly disagree with the statement about distributed reordering versus the SMP version. These versions are implemented in quite different ways, so I would not confidently say that the SMP version is always faster. I would suggest checking the performance of both if reordering performance is important.
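To compare iparm[1] = 3 and iparm[1] = 10 as suggested, one option is to time phase 11 alone several times and keep the best run. Below is a minimal sketch using C11 timespec_get. best_time_seconds and noop_phase are hypothetical names; noop_phase is only a placeholder for a function that would set iparm[1] and call cluster_sparse_solver with phase 11 on a fresh handle. In an MPI run, MPI_Wtime() with an MPI_Barrier before and after each call is the more usual timer.

```c
#include <time.h>

/* Hypothetical callback type: one run of the phase being measured. */
typedef void (*phase_fn)(void *ctx);

/* Placeholder for "set iparm[1], run phase 11, then release memory (e.g.
 * with phase -1)"; it does nothing in this sketch. */
static void noop_phase(void *ctx) { (void)ctx; }

/* Best (minimum) wall-clock time in seconds over `repeats` calls of fn(ctx).
 * Returns -1.0 if repeats < 1. TIME_UTC is not guaranteed to be monotonic. */
static double best_time_seconds(phase_fn fn, void *ctx, int repeats) {
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        struct timespec t0, t1;
        timespec_get(&t0, TIME_UTC);
        fn(ctx);
        timespec_get(&t1, TIME_UTC);
        double dt = (double)(t1.tv_sec - t0.tv_sec)
                  + 1e-9 * (double)(t1.tv_nsec - t0.tv_nsec);
        if (best < 0.0 || dt < best) best = dt;
    }
    return best;
}
```

Taking the minimum rather than the mean reduces the influence of one-off system noise when the two reordering modes are close in cost.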

Also, off the top of my head, I don't think that distributed reordering should require iparm[39] != 0. Are you sure it doesn't work with iparm[39] = 0?

Thanks,
Kirill

Gennady_F_Intel · ‎11-22-2021

This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.