Ales_P_

Beginner

07-18-2016
06:20 AM

59 Views

Cluster Sparse Solver -- Matrix input format & MPI problem

Hello,

I am implementing the MKL Cluster Sparse Solver (11.3.3.210 on Linux). I have been able to successfully run a routine that uses the centralized input format (nonsymmetric real matrix), and I am now trying to extend it to the distributed assembled matrix input format. However, I don't understand what this format means, and I could not find any information in the document with the format definitions. How does it work? Should I define the matrix using *ia*, *ja* and *a* just as in the centralized call, except that I set them on all processes, and then set *iparm[39]=1* and provide the limits using *iparm[40]* and *iparm[41]*? I tried this, but the factorisation stalled without reporting any error (with centralized input, the factorisation took about 3 minutes). Or do I have to modify *ia*, *ja* and *a* somehow to reflect the distribution? Can anyone give me an example?

The second problem I have is with MPI -- if I run the sparse solver on one computational node, everything goes smoothly. But if I use more than one node, MPI goes crazy with messages like:

    Fatal error in PMPI_Bcast: Other MPI error, error stack:
    PMPI_Bcast(2434)........: MPI_Bcast(buf=0x2aad1690e8cc, count=757719881, MPI_INT, root=0, MPI_COMM_WORLD) failed
    MPIR_Bcast_impl(1807)...:
    MPIR_Bcast(1835)........:
    I_MPIR_Bcast_intra(2016): Failure during collective
    MPIR_Bcast_intra(1596)..:
    MPIR_Bcast_binomial(256): message sizes do not match across processes in the collective routine: Received -32766 but expected -1264087772

What's happening and how can I fix it? I am not the system administrator...

1 Reply

Alexander_K_Intel2

Employee

07-19-2016
06:42 AM

59 Views

Hi,

For an example of the distributed CSR (dcsr) input format, please see this KB article: https://software.intel.com/en-us/articles/intel-math-kernel-library-parallel-direct-sparse-solver-for-clusters
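To make the distributed layout more concrete, here is a minimal sketch of how a global CSR matrix could be cut into per-process pieces. It assumes, as I understand the distributed assembled format, that each process passes only its own rows, with a local *ia* that restarts from the index base and a *ja* that keeps global column numbers. The helpers `row_range` and `slice_csr` are hypothetical names, not MKL functions, and plain `int` stands in for `MKL_INT` (LP64 interface assumed).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of MKL): split n rows into nprocs contiguous,
   non-overlapping blocks and return the zero-based, inclusive row bounds
   owned by the given rank. A rank with no rows gets last == first - 1. */
void row_range(int n, int nprocs, int rank, int *first, int *last)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    int base = n / nprocs;   /* rows every rank receives */
    int extra = n % nprocs;  /* the first 'extra' ranks receive one more row */
    *first = rank * base + (rank < extra ? rank : extra);
    *last = *first + base + (rank < extra ? 1 : 0) - 1;
}

/* Hypothetical helper (not part of MKL): copy rows first..last of a zero-based
   global CSR matrix (ia, ja, a) into newly allocated local arrays.
   The local row pointer restarts at 0; column indices stay global.
   Returns the number of local nonzeros, or -1 if allocation fails.
   The caller frees *ia_loc, *ja_loc and *a_loc. */
int slice_csr(const int *ia, const int *ja, const double *a,
              int first, int last,
              int **ia_loc, int **ja_loc, double **a_loc)
{
    int nrows = last - first + 1;
    int offset = ia[first];            /* position of the first local nonzero */
    int nnz = ia[last + 1] - offset;   /* nonzeros in the local row block */
    size_t cap = (size_t)(nnz > 0 ? nnz : 1);

    *ia_loc = malloc((size_t)(nrows + 1) * sizeof **ia_loc);
    *ja_loc = malloc(cap * sizeof **ja_loc);
    *a_loc = malloc(cap * sizeof **a_loc);
    if (*ia_loc == NULL || *ja_loc == NULL || *a_loc == NULL) {
        free(*ia_loc);
        free(*ja_loc);
        free(*a_loc);
        return -1;
    }

    for (int i = 0; i <= nrows; ++i)
        (*ia_loc)[i] = ia[first + i] - offset;   /* rebase the row pointers */
    for (int k = 0; k < nnz; ++k) {
        (*ja_loc)[k] = ja[offset + k];           /* global column index kept */
        (*a_loc)[k] = a[offset + k];
    }
    return nnz;
}
```

Each rank would then pass its local arrays to the solver together with iparm[39]=1 and its own bounds in iparm[40] and iparm[41]. The sketch is zero-based throughout; with the default one-based indexing (iparm[34]=0) the row pointers, column indices and bounds would all need to be shifted by one, so please check the index base against the reference manual.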

As for the crash with the distributed version of the solver: can you try turning on the matrix checker? Just set iparm[26] and msglvl to 1.
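In code, those two settings could look roughly like the sketch below. The function name `enable_solver_diagnostics` is only an illustration, not an MKL routine, and plain `int` is assumed in place of `MKL_INT` (LP64 interface).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MKL): switch on the two diagnostics
   mentioned above before calling the solver. */
void enable_solver_diagnostics(int iparm[64], int *msglvl)
{
    assert(iparm != NULL && msglvl != NULL);
    iparm[26] = 1;  /* matrix checker: validate the input matrix structure */
    *msglvl = 1;    /* ask the solver to print statistical information */
}
```

Call it after filling in the rest of iparm and before the first solver phase, on every MPI process, so that all ranks run with the same settings.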

Thanks,

Alex
