Horst
Beginner
213 Views

Pardiso MPI version with fatal error


Hello,

I receive a fatal error when using the Intel MPI (impi) version of Intel Pardiso (the Cluster Sparse Solver), and it would be nice if someone could help me with it. I compile the code with (following the Intel link line advisor):

mpiifort -i8 -I${MKLROOT}/include -c -o mkl_cluster_sparse_solver.o \
  ${MKLROOT}/include/mkl_cluster_sparse_solver.f90

mpiifort -i8 -I${MKLROOT}/include -c -o MPI.o MPI.f90
mpiifort mkl_cluster_sparse_solver.o MPI.o -o MPI.out -Wl,
--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a 
${MKLROOT}/lib/intel64/libmkl_intel_thread.a 
${MKLROOT}/lib/intel64/libmkl_core.a 
${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group
 -liomp5 -lpthread -lm -ldl

and run it for instance on two nodes with

mpiexec -n 2 ./MPI.out

I use the 64-bit (ILP64) interface. The odd thing is that the reordering phase works perfectly; however, the factorisation and solve steps don't. The error message I get is the following:

Fatal error in PMPI_Bcast: Message truncated, error stack:
PMPI_Bcast(2654)..................: MPI_Bcast(buf=0x7ffe63518210, count=1, 
MPI_LONG_LONG_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1804).............: fail failed
MPIR_Bcast(1832)..................: fail failed
I_MPIR_Bcast_intra(2057)..........: Failure during collective
MPIR_Bcast_intra(1599)............: fail failed
MPIR_Bcast_binomial(247)..........: fail failed
MPIDI_CH3U_Receive_data_found(131): Message from rank 0 and tag 2 truncated;
1600 bytes received but buffer size is 8

So this seems to be a problem with a buffer size. At first I thought my problem was too large, but this is not an issue of the matrix size. I tried to fix it by setting

export I_MPI_SHM_LMT_BUFFER_SIZE=2000

but it did not change anything. The impi manual also mentions I_MPI_SHM_LMT_BUFFER_NUM, and I tried setting that to a higher value as well. The following versions are used: MKL 2017.4.256, ifort 17.0.6.256, IMPI 2017.4.239. I also tried newer versions, but it changed nothing. If I should post an example, please let me know. However, I hope it can be fixed simply by setting some buffer size (other than I_MPI_SHM_LMT_BUFFER_SIZE) to a higher value.

Thanks in advance

12 Replies
Gennady_F_Intel
Moderator

Horst, how big is the problem size you are trying to solve? Do you have enough memory (RAM) available? 

Horst
Beginner

Thank you for your fast response. The matrix has dimension 390×390 with 3478 nonzero entries. I request 12 GB on the cluster, and since the problem is not big, I think this is sufficient.

Gennady_F_Intel
Moderator

This is a very small task; the Pardiso API for SMP may be enough to solve this problem. Nevertheless, please share the example with us so we can check the problem on our side. By the way, could you try the LP64 API instead of the ILP64 one?
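For reference, a hypothetical LP64 build of the same example could look as follows. This is only a sketch mirroring the ILP64 link line from the first post: it drops -i8 and swaps the ILP64 interface and BLACS libraries for their LP64 counterparts (with LP64 one would also call cluster_sparse_solver with default integers instead of cluster_sparse_solver_64):

```shell
# Hypothetical LP64 variant of the link line from the first post
mpiifort -I${MKLROOT}/include -c -o mkl_cluster_sparse_solver.o \
  ${MKLROOT}/include/mkl_cluster_sparse_solver.f90
mpiifort -I${MKLROOT}/include -c -o MPI.o MPI.f90
mpiifort mkl_cluster_sparse_solver.o MPI.o -o MPI.out \
  -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a \
  ${MKLROOT}/lib/intel64/libmkl_intel_thread.a \
  ${MKLROOT}/lib/intel64/libmkl_core.a \
  ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group \
  -liomp5 -lpthread -lm -ldl
```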

Horst
Beginner

Thank you for your answer and suggestions. The task is deliberately very small, since I found that the problem occurs for almost every system size.

However, I tried the 32-bit (LP64) interface and it does not fix the problem, so I will stick with the 64-bit interface, since I treat very large systems.

I have attached a very simple program that produces a different MPI error, but still an error:

[proxy:0:1@n0710] HYD_pmcd_pmip_control_cmd_cb (../../pm/pmiserv/pmip_cb.c:3481): assert (!closed) failed
[proxy:0:1@n0710] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@n0710] main (../../pm/pmiserv/pmip.c:558): demux engine error waiting for event

after the symbolic factorisation. The program is as follows:

program cluster_sparse_solver_sym
use mkl_cluster_sparse_solver
implicit none
include 'mpif.h'
integer, parameter :: dp = kind(1.0D0)
!.. Internal solver memory pointer for 64-bit architectures
TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE)  :: pt(64)

integer maxfct, mnum, mtype, phase, nrhs, error, msglvl,  i, idum(1), DimensionL, Nsparse
integer*4 mpi_stat, rank, num_procs
double precision :: ddum(1)

integer :: iparm( 64 )
integer, allocatable :: IA( : ),  JA( : )
double precision, allocatable :: VAL( : ), rhodot( : ), rho( : )

integer(4) MKL_COMM


MKL_COMM=MPI_COMM_WORLD
call mpi_init(mpi_stat)
call mpi_comm_rank(MKL_COMM, rank, mpi_stat)

do i = 1, 64
  pt(i)%dummy = 0
end do

if (rank.eq.0) then

DimensionL = 3
Nsparse    = 7

allocate (VAL(Nsparse), IA(DimensionL + 1), JA(Nsparse))

VAL(1) = 1.0d0
VAL(2) = 1.0d0
VAL(3) = 1.0d0
VAL(4) = 1.0d0
VAL(5) = 1.0d0
VAL(6) = 1.0d0
VAL(7) = 1.0d0

JA(1) = 1
JA(2) = 2
JA(3) = 3
JA(4) = 2
JA(5) = 3
JA(6) = 2
JA(7) = 3

IA(1) = 1
IA(2) = 4
IA(3) = 6
IA(4) = 8

allocate(rhodot(DimensionL), rho(DimensionL))

do i=1,DimensionL
  rhodot(i)  = 1.0d0
enddo

do i = 1, 64
  iparm(i) = 0
end do

    iparm(1) = 1   ! no solver default
    iparm(2) = 3   ! parallel fill-in reducing reordering from METIS
    iparm(8) = 2   ! max number of iterative refinement steps
    iparm(10) = 13 ! perturb the pivot elements with 1E-13
    iparm(11) = 1  ! use nonsymmetric permutation and scaling (MPS)
    iparm(13) = 1  ! maximum weighted matching algorithm is switched on
    iparm(27) = 1  ! check the input matrix for errors
    error = 0      ! initialize error flag
    msglvl = 1     ! print statistical information
    mtype = 11     ! real and nonsymmetric matrix
    nrhs = 1
    maxfct = 1
    mnum = 1
endif

phase = 11
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error

phase = 22
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error

phase = 33
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error

if (rank.eq.0) write(*,*) rho

phase = -1
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'Release of memory: ', error

call mpi_finalize(mpi_stat)

end

It would be very nice if you could help me with this problem. Compiling is done with the commands described in my first post.

Thanks in advance

Kirill_V_Intel
Employee

Hello Horst,

Is it possible for you to share a complete reproducer for the failure you originally observed (MPI_Bcast), with a small matrix, together with the MPI and Intel MKL versions? I don't think I_MPI_SHM_LMT_BUFFER_NUM can help with this failure. I'd suspect that some of the settings for the Cluster Sparse Solver are wrong (or maybe some unsupported case is hit internally). If we get the reproducer, we should be able to find the root cause.

Best,
Kirill

 

Horst
Beginner

Thank you for your response.

Sorry, I mixed something up. The program above is in fact the code producing the fatal error; I had just run it with Open MPI to check whether impi is the problem. If I run it with impi, I obtain the fatal error. I used MKL 2017.4.256, ifort 17.0.6.256, and IMPI 2017.4.239; I also tried newer versions of MKL/impi/ifort, but it changed nothing. I compile with the commands from my first post. If you need anything else, let me know.

Thanks in advance.

Kirill_V_Intel
Employee

Hello Horst,

We reproduced the issue on our side. The root cause is that the maxfct variable needs to be set on all ranks (not only on the root, as in your code). This restriction can clearly be avoided, and we are going to fix it in a coming release.

So, the workaround is to set maxfct on all processes (outside the if clause in your example code).

Let us know if the workaround doesn't work for you.

Best,
Kirill 

Horst
Beginner

Thanks for your reply and suggestions. Setting maxfct alone did not solve the problem; what does seem to solve it is to also set mnum, msglvl, mtype and nrhs on all ranks. The documentation states: 'Most of the input parameters (except for the pt, phase, and comm parameters and, for the distributed format, the a, ia, and ja arrays) must be set on the master MPI process only, and ignored on other processes.' So it would be nice if you could correct that.
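In code, the workaround described above amounts to moving those scalar settings out of the rank-0 block of the reproducer posted earlier. A minimal sketch (not a full program, based on the example above):

```fortran
! Set the scalar control parameters on ALL MPI ranks,
! i.e. before (outside of) the "if (rank.eq.0) then" block:
maxfct = 1
mnum   = 1
mtype  = 11   ! real and nonsymmetric matrix
nrhs   = 1
msglvl = 1    ! print statistical information

if (rank.eq.0) then
   ! the matrix arrays (VAL, IA, JA), the right-hand side and
   ! the iparm array can remain initialized on the root only
endif
```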

Thank you for your help. I've had some issues so far and you always helped me a lot.

Kirill_V_Intel
Employee

Thanks for the catch, we'll definitely fix that!

Gennady_F_Intel
Moderator

Horst, could you please try the latest MKL 2019 Update 5, which we released last week, and let us know if the problem still exists on your side.

Horst
Beginner

Thank you for this update. Unfortunately, it will take some time until we can use the latest MKL library. I will update you once we can use it.

Gennady_F_Intel
Moderator

Horst, have you had a chance to check the issue with 2019 u5?
