Hello,
I receive a fatal error when using the Intel MPI version of Intel Pardiso (the cluster sparse solver), and it would be nice if someone could help me with it. I compile the code with (following the Intel link line advisor):
mpiifort -i8 -I${MKLROOT}/include -c -o mkl_cluster_sparse_solver.o ${MKLROOT}/include/mkl_cluster_sparse_solver.f90
mpiifort -i8 -I${MKLROOT}/include -c -o MPI.o MPI.f90
mpiifort mkl_cluster_sparse_solver.o MPI.o -o MPI.out -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl
and run it for instance on two nodes with
mpiexec -n 2 ./MPI.out
I use the 64-bit (ILP64) interface. The funny thing is that the reordering phase works perfectly; however, the factorisation and solve steps don't. The error message I get is the following:
Fatal error in PMPI_Bcast: Message truncated, error stack:
PMPI_Bcast(2654)..................: MPI_Bcast(buf=0x7ffe63518210, count=1, MPI_LONG_LONG_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1804).............: fail failed
MPIR_Bcast(1832)..................: fail failed
I_MPIR_Bcast_intra(2057)..........: Failure during collective
MPIR_Bcast_intra(1599)............: fail failed
MPIR_Bcast_binomial(247)..........: fail failed
MPIDI_CH3U_Receive_data_found(131): Message from rank 0 and tag 2 truncated; 1600 bytes received but buffer size is 8
So this seems to be a problem with the buffer size. At first I thought my problem was too large; however, this is not an issue of the matrix size. I tried to fix it by setting
export I_MPI_SHM_LMT_BUFFER_SIZE=2000
but it did not change anything. The Intel MPI manual also mentions I_MPI_SHM_LMT_BUFFER_NUM, and I tried setting that to a higher value as well. The following versions are used: MKL 2017.4.256, ifort 17.0.6.256, Intel MPI 2017.4.239. I also tried newer versions, but that changed nothing. If I should post an example, please let me know. However, I hope that this can be fixed simply by setting some buffer size (other than I_MPI_SHM_LMT_BUFFER_SIZE) to a higher value.
Thanks in advance
This is a very small task; the SMP Pardiso API may help to solve this problem. Nevertheless, please share the example with us so that we can check the problem on our side. By the way, could you try the LP64 API instead of the ILP64 one?
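For reference, trying the LP64 interface would roughly mean dropping -i8 and linking the LP64 libraries instead of the ILP64 ones. A sketch based on the link line from the first post (assuming the same standard intel64 library layout):
mpiifort -I${MKLROOT}/include -c -o mkl_cluster_sparse_solver.o ${MKLROOT}/include/mkl_cluster_sparse_solver.f90
mpiifort -I${MKLROOT}/include -c -o MPI.o MPI.f90
mpiifort mkl_cluster_sparse_solver.o MPI.o -o MPI.out -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl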
Thank you for your answer and suggestions. A very small task is enough, since I found that the problem occurs for almost every system size.
I also tried the 32-bit (LP64) interface, and it does not fix the problem, so I will stick to the 64-bit (ILP64) interface since I treat very large systems.
I have attached a very simple program that produces a different MPI error, but still an error:
[proxy:0:1@n0710] HYD_pmcd_pmip_control_cmd_cb (../../pm/pmiserv/pmip_cb.c:3481): assert (!closed) failed
[proxy:0:1@n0710] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@n0710] main (../../pm/pmiserv/pmip.c:558): demux engine error waiting for event
The error appears after the symbolic factorisation. The program is the following:
program cluster_sparse_solver_sym
use mkl_cluster_sparse_solver
implicit none
include 'mpif.h'
integer, parameter :: dp = kind(1.0D0)
!.. Internal solver memory pointer for 64-bit architectures
TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE) :: pt(64)
integer maxfct, mnum, mtype, phase, nrhs, error, msglvl, i, idum(1), DimensionL, Nsparse
integer*4 mpi_stat, rank, num_procs
double precision :: ddum(1)
integer :: iparm( 64 )
integer, allocatable :: IA( : ), JA( : )
double precision, allocatable :: VAL( : ), rhodot( : ), rho( : )
integer(4) MKL_COMM

MKL_COMM = MPI_COMM_WORLD
call mpi_init(mpi_stat)
call mpi_comm_rank(MKL_COMM, rank, mpi_stat)

do i = 1, 64
   pt(i)%dummy = 0
end do

if (rank.eq.0) then
   DimensionL = 3
   Nsparse = 7
   allocate (VAL(Nsparse), IA(DimensionL + 1), JA(Nsparse))
   VAL(1) = 1.0d0
   VAL(2) = 1.0d0
   VAL(3) = 1.0d0
   VAL(4) = 1.0d0
   VAL(5) = 1.0d0
   VAL(6) = 1.0d0
   VAL(7) = 1.0d0
   JA(1) = 1
   JA(2) = 2
   JA(3) = 3
   JA(4) = 2
   JA(5) = 3
   JA(6) = 2
   JA(7) = 3
   IA(1) = 1
   IA(2) = 4
   IA(3) = 6
   IA(4) = 8
   allocate(rhodot(DimensionL), rho(DimensionL))
   do i = 1, DimensionL
      rhodot(i) = 1.0d0
   enddo
   do i = 1, 64
      iparm(i) = 0
   end do
   iparm(1)  = 1  ! no solver default
   iparm(2)  = 3  ! fill-in reordering from METIS
   iparm(8)  = 2  ! number of iterative refinement steps
   iparm(10) = 13 ! perturb the pivot elements with 1E-13
   iparm(11) = 1  ! use nonsymmetric permutation and scaling (MPS)
   iparm(13) = 1  ! maximum weighted matching algorithm is switched on
   iparm(27) = 1  ! check the input matrix
   error  = 0     ! initialize error flag
   msglvl = 1     ! print statistical information
   mtype  = 11    ! real, nonsymmetric
   nrhs   = 1
   maxfct = 1
   mnum   = 1
endif

phase = 11 ! reordering and symbolic factorisation
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error

phase = 22 ! numerical factorisation
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error

phase = 33 ! solve and iterative refinement
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error
if (rank.eq.0) write(*,*) rho

phase = -1 ! release internal memory
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'Release of memory: ', error

call mpi_finalize(mpi_stat)
end
It would be very nice if you could help me with this problem. Compiling is done via the commands described in my first post.
Thanks in advance
Hello Horst,
Is it possible for you to share a complete reproducer for the failure you observed originally (MPI_Bcast), with a small matrix, together with the MPI and Intel MKL versions? I don't think I_MPI_SHM_LMT_BUFFER_NUM can help with the failure. I'd suspect that some of the settings for the Cluster Sparse Solver were wrong (or maybe some unsupported case was hit internally). If we get the reproducer, we should be able to find the root cause.
Best,
Kirill
Thank you for your response.
Sorry, I mixed something up. This is the code producing the fatal error. I had just run it with Open MPI to check whether Intel MPI is the problem; if I run it with Intel MPI, I obtain the fatal error. I used MKL 2017.4.256, ifort 17.0.6.256, and Intel MPI 2017.4.239. I also tried newer versions of MKL/Intel MPI/ifort, but it changed nothing. I compile with the commands in my first post. If you need to know anything else, let me know.
Thanks in advance.
Hello Horst,
We reproduced the issue on our side. The root cause is that the maxfct variable needs to be set on all ranks (not only on the root rank, as in your case). Clearly this restriction can be avoided, and we are going to fix it in a coming release.
So the workaround is to set maxfct on all processes (outside the if block in your example code), as sketched below.
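Roughly, applied to the example program above, the relevant part would look like this (just a sketch, not a complete listing):

! set maxfct on every rank, before the rank-0-only setup
maxfct = 1
if (rank.eq.0) then
   ! ... matrix arrays (VAL, IA, JA), right-hand side, iparm and the other settings stay here as before ...
endif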
Let us know if the workaround doesn't work for you.
Best,
Kirill
Thanks for your reply and suggestions. It did not solve the problem, but what seems to solve it was to also set mnum, msglvl, mtype and nrhs on all ranks (see the sketch below). The documentation states: 'Most of the input parameters (except for the pt, phase, and comm parameters and, for the distributed format, the a, ia, and ja arrays) must be set on the master MPI process only, and ignored on other processes.' So it would be nice if you could change that.
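For reference, this is roughly how the beginning of the setup looks for me now (a sketch against the example program above, same names and values as before):

! control scalars now set on every MPI rank, not only on rank 0
maxfct = 1
mnum   = 1
mtype  = 11   ! real, nonsymmetric
nrhs   = 1
msglvl = 1
error  = 0
if (rank.eq.0) then
   ! ... matrix arrays (VAL, IA, JA), right-hand side and iparm setup as in the example above ...
endif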
Thank you for your help. I've had a few issues so far, and you have always helped me a lot.
