topic Re: How to use MPI version of the reordering algorithm ( phase 11 of cluster_sparse_solver )? in Intel® oneAPI Math Kernel Library

How to use MPI version of the reordering algorithm ( phase 11 of cluster_sparse_solver )?

segmentation_fault — Mon, 01 Nov 2021 16:26:13 GMT

I am following the guide here to set iparm(2)=10 to use the MPI version of the nested dissection and symbolic factorization algorithms.

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/sparse-solver-routines/parallel-direct-sp-solver-for-clusters-iface/cluster-sparse-solver-iparm-parameter.html

However, I am confused about how to set iparm(41) and iparm(42). Say I want to run on two compute nodes (2 MPI processes). Do I set iparm(41) to one and iparm(42) to half the number of equations (n)? I tried doing this but got an error:

iparm[40] = 1;
int upper_bound = (n+1)/2;
printf ( "upper bound is %i", upper_bound );
iparm[41] = upper_bound;

*** Error in PARDISO ( sequence_ido,parameters) error_num= 11
*** Input check: preprocessing 2 (out of bounds)
*** Input parameters: inconsistent error= 11 max_fac_store_in: 1
matrix_number_in: 1 matrix_type_in: -2
ido_in : 11 neqns_in : 3867819
ia(neqns_in+1)-1: 89286447 nb_in : 1

ERROR during symbolic factorization: -1
[proxy:0:0@cn406] pmi cmd from fd 6: cmd=finalize
[proxy:0:0@cn406] PMI response: cmd=finalize_ack

Re: How to use MPI version of the reordering algorithm ( phase 11 of cluster_sparse_solver )?

VidyalathaB_Intel — Tue, 02 Nov 2021 11:44:19 GMT

Hi,

Thanks for reaching out to us.

Could you please let us know which programming language you are working in?

We see that you are following the oneMKL guide for Fortran, while the error log suggests that you are applying it to C code.

>>I am following the guide here to set iparm(2)=10 to use the MPI version of the nested dissection and symbolic factorization algorithms.

In C, you need to set iparm[1]=10 in order to use the MPI version of the nested dissection and symbolic factorization algorithms.

Kindly refer to the below guide

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/sparse-solver-routines/parallel-direct-sp-solver-for-clusters-iface/cluster-sparse-solver-iparm-parameter.html
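The index shift between the two references can be sketched as follows. This is only an illustration: the helper names iparm_c_index and set_iparm_fortran are hypothetical and not part of oneMKL, and a plain int array stands in for the MKL_INT iparm[64] array that real code would use.

```c
#include <assert.h>

/* Stand-in size for the solver's iparm array (64 entries). */
#define IPARM_SIZE 64

/* Hypothetical helper: convert a 1-based Fortran index, as in iparm(k),
   to the 0-based index used in C, as in iparm[k-1]. */
int iparm_c_index(int fortran_index)
{
    assert(fortran_index >= 1 && fortran_index <= IPARM_SIZE);
    return fortran_index - 1;
}

/* Hypothetical helper: store a value using the Fortran guide's numbering. */
void set_iparm_fortran(int *iparm, int fortran_index, int value)
{
    iparm[iparm_c_index(fortran_index)] = value;
}
```

With these helpers, set_iparm_fortran(iparm, 2, 10) writes to iparm[1], and the Fortran entries iparm(41) and iparm(42) correspond to iparm[40] and iparm[41] in C.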

If you still face any issues, please provide us a sample reproducer (and steps if any) along with your environment details and MKL version so that we can work on it from our end.

Regards,

Vidya.

Re: How to use MPI version of the reordering algorithm ( phase 11 of cluster_sparse_solver )?

segmentation_fault — Tue, 02 Nov 2021 21:28:01 GMT

I am coding in C and incorrectly linked to the Fortran guide. In any case, my question is about understanding iparm[40] and iparm[41]. How should they be set based on the number of hosts?

However, I have read more of the documentation and think I have figured it out now. I believe there might be an error or omission in the documentation. When setting iparm[1] = 10, one must also set iparm[39] > 0. This is not mentioned in the documentation for iparm[1].

Re:How to use MPI version of the reordering algorithm ( phase 11 of cluster_sparse_solver )?

VidyalathaB_Intel — Wed, 03 Nov 2021 12:44:21 GMT

Hi,

>> I believe there might be an error or omission in the documentation. When setting iparm[1] = 10, one must also set iparm[39] > 0 ....This is not mentioned in the documentation for iparm[1]

We are looking into this issue and will get back to you soon.

Regards,

Vidya.

Re:How to use MPI version of the reordering algorithm ( phase 11 of cluster_sparse_solver )?

Gennady_F_Intel — Wed, 03 Nov 2021 15:03:50 GMT

iparm[39] == 0 means that the matrix is located on the master node, but the computation happens across many MPI processes.

Re: Re:How to use MPI version of the reordering algorithm ( phase 11 of cluster_sparse_solver )?

segmentation_fault — Sun, 07 Nov 2021 00:16:48 GMT

I understand that if I set iparm[39] = 1, then the matrix will be distributed across the nodes. However, I am confused about how to set iparm[40] and iparm[41] (the beginning and end of the input domain). Here's what I tried doing in the code block below, but my phase 11 and factorization time doubled and the solution vector was wrong.

My reason for wanting to try iparm[1] = 10 (the MPI version of the nested dissection and symbolic factorization algorithms) is that my phase 11 is taking too long. It takes longer than my factorization (phase 22). So I am looking to try anything that can reduce phase 11 times. I already tried setting iparm[1] = 3 to use the parallel version of the nested dissection algorithm, which sped up phase 11 by around 10-20%.

MKL_INT n = 1219161;
MKL_INT ndim = 46687911;
iparm[40] = 1;
int upper_bound = (n+1)/2;
printf ( "upper bound is %i", upper_bound );
iparm[41] = upper_bound;

Re:How to use MPI version of the reordering algorithm ( phase 11 of cluster_sparse_solver )?

Gennady_F_Intel — Tue, 09 Nov 2021 08:19:39 GMT

With regard to iparm[40] and iparm[41] (when iparm[39]==2): the example cl_solver_unsym_distr_c.c in $MKLROOT\example\cluster_sparse_solversc\ will help you see how to properly distribute the inputs across nodes.
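As a rough illustration of a per-rank input domain, the sketch below splits the rows into one contiguous block per MPI process. It rests on assumptions not confirmed in this thread: rows are numbered from 1, the domains do not overlap, and there are at least as many rows as processes. The helper name row_domain is hypothetical and not part of oneMKL, and the shipped example may lay out its domains differently, so treat that example as the reference.

```c
#include <assert.h>

/* Hypothetical helper: split rows 1..n into nranks contiguous blocks.
   Assumes one-based row numbering and non-overlapping domains. */
void row_domain(long n, int nranks, int rank, long *begin, long *end)
{
    assert(nranks > 0 && n >= nranks && rank >= 0 && rank < nranks);

    long base  = n / nranks;   /* rows that every rank receives */
    long extra = n % nranks;   /* the first `extra` ranks get one more row */

    /* Number of rows owned by all lower-numbered ranks. */
    long start = (long)rank * base + (rank < extra ? rank : extra);
    long count = base + (rank < extra ? 1 : 0);

    *begin = start + 1;        /* first row owned by this rank */
    *end   = start + count;    /* last row owned by this rank, inclusive */
}
```

Each process would call this with its own rank (for example from MPI_Comm_rank) and store begin in iparm[40] and end in iparm[41], so that different processes describe different parts of the matrix rather than all passing the same range.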

Regarding the performance of the reordering stage:

If the input fits in the RAM of a single node, the SMP version of reordering will always be faster than the MPI (iparm[1]==10) mode. Otherwise, it looks like a real problem that we could investigate.

Re: Re:How to use MPI version of the reordering algorithm ( phase 11 of cluster_sparse_solver )?

segmentation_fault — Tue, 09 Nov 2021 22:32:27 GMT

Thanks. I guess I was under the impression the distributed reordering (iparm[1]=10) would be faster than the SMP version (iparm[1]=3). Since it is not, I will pass on it. Perhaps it's a throwback to the days when machines had limited RAM.

Re: Re:How to use MPI version of the reordering algorithm ( phase 11 of cluster_sparse_solver )?

Kirill_V_Intel — Tue, 09 Nov 2021 22:48:23 GMT

I slightly disagree with the statement about distributed reordering vs. the SMP version. These versions are implemented in quite different ways, so I would not confidently say that the SMP version is always faster. In fact, I would suggest checking the performance of both if reordering performance is important.
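One way to organize such a comparison is sketched below. It only covers the bookkeeping: the struct and function names are hypothetical, and the measurements themselves have to come from running phase 11 once per candidate iparm[1] value on the real matrix.

```c
#include <assert.h>
#include <stddef.h>

/* One measurement: the iparm[1] value used and the phase 11 wall time. */
typedef struct {
    int    reorder_mode;  /* value stored in iparm[1], e.g. 3 or 10 */
    double seconds;       /* measured phase 11 wall time */
} phase11_timing;

/* Hypothetical helper: return the measurement with the smallest time. */
phase11_timing fastest_reordering(const phase11_timing *runs, size_t count)
{
    assert(runs != NULL && count > 0);

    phase11_timing best = runs[0];
    for (size_t i = 1; i < count; ++i) {
        if (runs[i].seconds < best.seconds) {
            best = runs[i];
        }
    }
    return best;
}
```

The times could be taken with MPI_Wtime around the phase 11 call and reduced with MPI_MAX across processes, so that the slowest process determines the cost of each setting.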

Also, off the top of my head, I don't think that distributed reordering should require iparm[39] != 0. Are you sure it doesn't work with iparm[39] = 0?

Thanks,
Kirill

Re:How to use MPI version of the reordering algorithm ( phase 11 of cluster_sparse_solver )?

Gennady_F_Intel — Tue, 23 Nov 2021 03:32:00 GMT

This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.