I am working with the MKL Cluster Sparse Solver (18.104.22.168 on Linux). I have successfully run a routine that uses the centralized input format (nonsymmetric real matrix), and I am now trying to extend it to the distributed assembled matrix input format. However, I don't understand what this format means, and I could not find any explanation of it in the documentation that defines the formats. How does it work? Should I define the matrix with ia, ja and a exactly as I do for the centralized call, except that I set them on all processes, and then set iparm=1 and provide the limits using iparm a and iparm? I tried this, but the factorisation then stalled without reporting any error (with the centralized input, the factorisation takes about 3 minutes). Or do I have to modify ia, ja and a somehow to reflect the distribution? Can anyone give me an example?
My second problem is with MPI. If I run the sparse solver on a single compute node, everything works fine, but as soon as I use more than one node, MPI aborts with messages like:
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2434)........: MPI_Bcast(buf=0x2aad1690e8cc, count=757719881, MPI_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1807)...:
MPIR_Bcast(1835)........:
I_MPIR_Bcast_intra(2016): Failure during collective
MPIR_Bcast_intra(1596)..:
MPIR_Bcast_binomial(256): message sizes do not match across processes in the collective routine: Received -32766 but expected -1264087772
What's happening and how can I fix it? I am not the system administrator...
For an example of the distributed CSR (DCSR) input format, please see this KB article: https://software.intel.com/en-us/articles/intel-math-kernel-library-parallel-direct-sparse-solver-for-clusters
As for the crash with the distributed version of the solver, can you try turning on the matrix checker? Just set iparm and msglvl to 1.