Cluster Sparse Solver -- Matrix input format & MPI problem

Ales_P_ · 07-18-2016

Hello,

I am implementing MKL Cluster Sparse Solver (11.3.3.210 on Linux). I've been able to successfully run a routine that works with the centralized input format (nonsymmetric real matrix). I am trying to enhance it by using the distributed assembled matrix input format. However, I don't understand what this means, and I could not find any information in the document with the format definitions. How does it work? Should I just define the matrix using ia, ja and a as I am used to with the centralized call, except that I set them on all processes, and then set iparm[39]=1 and provide the limits using iparm[40] and iparm[41]? I've tried this, but the factorisation stalled (with centralized input, the factorisation took about 3 minutes) and no error was given. Or do I have to modify ia, ja and a somehow to reflect the distribution? Can anyone give me an example?

The second problem I have is with MPI -- if I run the sparse solver on one computational node, everything goes smoothly. But if I use more than one node, MPI goes crazy with messages like:

Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2434)........: MPI_Bcast(buf=0x2aad1690e8cc, count=757719881, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1807)...:
MPIR_Bcast(1835)........:
I_MPIR_Bcast_intra(2016): Failure during collective
MPIR_Bcast_intra(1596)..:
MPIR_Bcast_binomial(256): message sizes do not match across processes in the collective routine: Received -32766 but expected -1264087772

What's happening and how can I fix it? I am not the system administrator...

Alexander_K_Intel2 · 07-19-2016

Hi,

For an example of the distributed CSR (DCSR) format, please see this KB article: https://software.intel.com/en-us/articles/intel-math-kernel-library-parallel-direct-sparse-solver-for-clusters
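To make the idea concrete, here is a minimal sketch of how a global one-based CSR matrix could be cut into contiguous, non-overlapping row blocks, one per MPI process. It assumes that in the distributed format each process passes only its own rows, with ia rebased so that it starts at 1 and ja still holding global column indices, and that the first and last owned rows go into iparm[40] and iparm[41]. The names local_csr, row_range and slice_csr are made up for this illustration and are not part of MKL, and plain int is used instead of MKL_INT (an LP64 assumption). Please check it against the KB article before relying on it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one process's share of the matrix (not an MKL type). */
typedef struct {
    int first_row;  /* first owned global row, one-based (candidate for iparm[40]) */
    int last_row;   /* last owned global row, one-based, inclusive (candidate for iparm[41]) */
    int *ia;        /* local row pointers, one-based, length (last_row - first_row + 2) */
    int *ja;        /* column indices, still global and one-based */
    double *a;      /* nonzero values of the owned rows */
} local_csr;

/* Contiguous, non-overlapping row range for `rank` out of `nranks` (needs n >= nranks). */
void row_range(int n, int nranks, int rank, int *first, int *last)
{
    assert(nranks > 0 && n >= nranks && rank >= 0 && rank < nranks);
    int base = n / nranks, extra = n % nranks;
    int start = rank * base + (rank < extra ? rank : extra);  /* zero-based */
    int count = base + (rank < extra ? 1 : 0);
    *first = start + 1;
    *last = start + count;
}

/* Copy rows first_row..last_row of a one-based global CSR matrix into `out`.
   Returns the number of local nonzeros, or -1 if allocation fails. */
int slice_csr(const int *ia, const int *ja, const double *a,
              int first_row, int last_row, local_csr *out)
{
    assert(first_row >= 1 && last_row >= first_row);
    int nrows = last_row - first_row + 1;
    int offset = ia[first_row - 1];      /* one-based position of the first local entry */
    int nnz = ia[last_row] - offset;
    size_t cap = (size_t)(nnz > 0 ? nnz : 1);  /* avoid zero-sized allocations */

    out->first_row = first_row;
    out->last_row = last_row;
    out->ia = malloc((size_t)(nrows + 1) * sizeof *out->ia);
    out->ja = malloc(cap * sizeof *out->ja);
    out->a = malloc(cap * sizeof *out->a);
    if (!out->ia || !out->ja || !out->a) {
        free(out->ia); free(out->ja); free(out->a);
        return -1;
    }
    for (int i = 0; i <= nrows; ++i)     /* rebase so the local ia starts at 1 */
        out->ia[i] = ia[first_row - 1 + i] - offset + 1;
    memcpy(out->ja, ja + (offset - 1), (size_t)nnz * sizeof *ja);
    memcpy(out->a, a + (offset - 1), (size_t)nnz * sizeof *a);
    return nnz;
}
```

Each process would call row_range with its own rank, slice the matrix (or assemble only those rows directly), and pass the local ia, ja and a to the solver. The sketch deliberately avoids overlapping row ranges, so it does not cover the case where two processes contribute to the same row.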

As for the crash in the distributed version of the solver, can you try turning on the matrix checker? Just set iparm[26] and msglvl to 1.
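In C this amounts to two assignments made before the solver call. The helper below is only an illustrative sketch: enable_diagnostics is a made-up name, not an MKL routine, and plain int stands in for MKL_INT, which assumes the LP64 interface.

```c
/* Hypothetical helper: switch on the diagnostics suggested above.
   iparm is the 64-entry parameter array that is passed to the solver. */
void enable_diagnostics(int iparm[64], int *msglvl)
{
    iparm[26] = 1;  /* turn on the matrix checker */
    *msglvl = 1;    /* let the solver print statistical information */
}
```

Leave all other iparm entries as they were, so that the only difference between the two runs is the extra checking and output.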

Thanks,

Alex