I am following the guide here to set iparm(2)=10 to use the MPI version of the nested dissection and symbolic factorization algorithms.

However, I am confused about how to set iparm(41) and iparm(42). Say I want to run on two compute nodes (2 MPI processes). Do I set iparm(41) to 1 and iparm(42) to half the number of equations (n)? I tried this but got an error:

```
iparm[40] = 1;
int upper_bound = (n+1)/2;
printf ( "upper bound is %i", upper_bound );
iparm[41] = upper_bound;
```

Output:

```
*** Error in PARDISO ( sequence_ido,parameters) error_num= 11
*** Input check: preprocessing 2 (out of bounds)
*** Input parameters: inconsistent error= 11 max_fac_store_in: 1
matrix_number_in: 1 matrix_type_in: -2
ido_in : 11 neqns_in : 3867819
ia(neqns_in+1)-1: 89286447 nb_in : 1
ERROR during symbolic factorization: -1
[proxy:0:0@cn406] pmi cmd from fd 6: cmd=finalize
[proxy:0:0@cn406] PMI response: cmd=finalize_ack
```

Hi,

Thanks for reaching out to us.

Could you please let us know the programming language on which you are working?

We see that you are following the oneMKL guide for Fortran, while the error log suggests you are applying it to C code.

>>*I am following the guide here to set iparm(2)=10 to use the MPI version of the nested dissection and symbolic factorization algorithms.*

In C, you need to set iparm[1]=10 in order to use the MPI version of the nested dissection and symbolic factorization algorithms.

Kindly refer to the below guide

If you still face any issues, please provide us a sample reproducer (and steps if any) along with your environment details and MKL version so that we can work on it from our end.

Regards,

Vidya.

I am coding in C and mistakenly linked to the Fortran guide. In any case, my question is about iparm[40] and iparm[41]: how should I set them based on the number of hosts?

However, I have read more of the documentation and think I have figured it out now. I believe there might be an error or omission in the documentation: when setting iparm[1] = 10, one must also set iparm[39] > 0. This is not mentioned in the documentation for iparm[1].

Hi,

>>*I believe there might be an error or omission in the documentation. When setting iparm[1] = 10, one must also set iparm[39] > 0. This is not mentioned in the documentation for iparm[1].*

We are looking into this issue and will get back to you soon.

Regards,

Vidya.

iparm[39] == 0 means that the whole matrix is stored on the master node, but the computation is still performed across many MPI processes.

I understand that if I set iparm[39] = 1, the matrix will be distributed across the nodes. However, I am confused about how to set iparm[40] and iparm[41] (the beginning and end of the input domain). The code below is what I tried, but my phase 11 and factorization times doubled and the solution vector was wrong.

My reason for trying iparm[1] = 10 (the MPI version of the nested dissection and symbolic factorization algorithms) is that phase 11 is taking too long: it takes longer than my factorization (phase 22). So I am looking for anything that can reduce phase 11 time. I already tried setting iparm[1] = 3 to use the parallel version of the nested dissection algorithm, which sped up phase 11 by around 10-20%.

```
MKL_INT n = 1219161;
MKL_INT ndim = 46687911;
iparm[40] = 1;
int upper_bound = (n+1)/2;
printf ( "upper bound is %i", upper_bound );
iparm[41] = upper_bound;
```

Regarding iparm[40] and iparm[41] (when iparm[39]==2):

It might help to look at `$MKLROOT\example\cluster_sparse_solversc\cl_solver_unsym_distr_c.c`, which shows how to properly distribute the inputs across nodes.

Regarding the performance of the reordering stage: if the input fits in the RAM of a single node, the SMP version of reordering will always be faster than the MPI (iparm[1]==10) mode. Otherwise, it looks like a real problem that we could investigate.

Thanks. I was under the impression that the distributed reordering (iparm[1]=10) would be faster than the SMP version (iparm[1]=3). Since it is not, I will pass on it. Perhaps it's a throwback to the days when machines had limited RAM.

I slightly disagree with the statement about distributed reordering vs. the SMP version. The two are implemented in quite different ways, so I would not confidently say that the SMP version is always faster. If reordering performance is important, I would suggest benchmarking both.

Also, off the top of my head, I don't think that distributed reordering should require iparm[39] != 0. Are you sure it doesn't work with iparm[39] = 0?

Thanks,

Kirill

This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
