
Hello,

I receive a fatal error when using the Intel MPI version of the MKL cluster sparse solver (Pardiso), and it would be nice if someone could help me with it. I compile the code with the following commands (from the Intel link line advisor):

mpiifort -i8 -I${MKLROOT}/include -c -o mkl_cluster_sparse_solver.o ${MKLROOT}/include/mkl_cluster_sparse_solver.f90
mpiifort -i8 -I${MKLROOT}/include -c -o MPI.o MPI.f90
mpiifort mkl_cluster_sparse_solver.o MPI.o -o MPI.out -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

and run it for instance on two nodes with

mpiexec -n 2 ./MPI.out

I use the ILP64 (64-bit integer) interface. The odd thing is that the reordering phase works perfectly; however, the factorisation and solve steps don't. The error message I get is the following:

Fatal error in PMPI_Bcast: Message truncated, error stack:
PMPI_Bcast(2654)..................: MPI_Bcast(buf=0x7ffe63518210, count=1, MPI_LONG_LONG_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1804).............: fail failed
MPIR_Bcast(1832)..................: fail failed
I_MPIR_Bcast_intra(2057)..........: Failure during collective
MPIR_Bcast_intra(1599)............: fail failed
MPIR_Bcast_binomial(247)..........: fail failed
MPIDI_CH3U_Receive_data_found(131): Message from rank 0 and tag 2 truncated; 1600 bytes received but buffer size is 8

So this seems to be a problem with the buffer size. At first I thought my problem was too large; however, this is not an issue of the matrix size. I tried to fix it by setting

export I_MPI_SHM_LMT_BUFFER_SIZE=2000

but it did not change anything. The Intel MPI manual also mentions I_MPI_SHM_LMT_BUFFER_NUM, and I tried setting that to a higher value as well. The following versions are used: MKL 2017.4.256, ifort 17.0.6.256, Intel MPI 2017.4.239. I also tried newer versions, but nothing changed. If I should post an example, please let me know. However, I hope it can be easily fixed by setting some buffer size (not I_MPI_SHM_LMT_BUFFER_SIZE) to a higher value.

Thanks in advance



Horst, how big is the problem you are trying to solve? Do you have enough memory (RAM) available?


Thank you for your fast response. The matrix has dimension 390x390 with 3478 nonzero entries. I request 12 GB on the cluster, and since the problem is not big, I think this is sufficient.


This is a very small task; the SMP Pardiso API may be enough to solve this problem. Nevertheless, please share an example with us so we can check the problem on our side. By the way, could you try the LP64 API instead of the ILP64 one?


Thank you for your answer and suggestions. The example is a very small task on purpose, since I found that the problem occurs for almost every system size.

However, I also tried the 32-bit (LP64) interface and it does not fix the problem, so I will stick to the 64-bit interface since I treat very large systems.

I have attached a very simple program that produces a different MPI error, but still an error:

[proxy:0:1@n0710] HYD_pmcd_pmip_control_cmd_cb (../../pm/pmiserv/pmip_cb.c:3481): assert (!closed) failed

[proxy:0:1@n0710] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status

[proxy:0:1@n0710] main (../../pm/pmiserv/pmip.c:558): demux engine error waiting for event

after the symbolic factorisation. Here is the program:

program cluster_sparse_solver_sym
  use mkl_cluster_sparse_solver
  implicit none
  include 'mpif.h'
  integer, parameter :: dp = kind(1.0D0)
  !.. Internal solver memory pointer for 64-bit architectures
  TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE) :: pt(64)
  integer maxfct, mnum, mtype, phase, nrhs, error, msglvl, i, idum(1), DimensionL, Nsparse
  integer*4 mpi_stat, rank, num_procs
  double precision :: ddum(1)
  integer :: iparm(64)
  integer, allocatable :: IA(:), JA(:)
  double precision, allocatable :: VAL(:), rhodot(:), rho(:)
  integer(4) MKL_COMM

  MKL_COMM = MPI_COMM_WORLD
  call mpi_init(mpi_stat)
  call mpi_comm_rank(MKL_COMM, rank, mpi_stat)
  do i = 1, 64
    pt(i)%dummy = 0
  end do
  if (rank.eq.0) then
    DimensionL = 3
    Nsparse = 7
    allocate (VAL(Nsparse), IA(DimensionL + 1), JA(Nsparse))
    VAL(1) = 1.0d0
    VAL(2) = 1.0d0
    VAL(3) = 1.0d0
    VAL(4) = 1.0d0
    VAL(5) = 1.0d0
    VAL(6) = 1.0d0
    VAL(7) = 1.0d0
    JA(1) = 1
    JA(2) = 2
    JA(3) = 3
    JA(4) = 2
    JA(5) = 3
    JA(6) = 2
    JA(7) = 3
    IA(1) = 1
    IA(2) = 4
    IA(3) = 6
    IA(4) = 8
    allocate (rhodot(DimensionL), rho(DimensionL))
    do i = 1, DimensionL
      rhodot(i) = 1.0d0
    end do
    do i = 1, 64
      iparm(i) = 0
    end do
    iparm(1) = 1   ! no solver default
    iparm(2) = 3   ! fill-in reordering from METIS
    iparm(8) = 2   ! number of iterative refinement steps
    iparm(10) = 13 ! perturb the pivot elements with 1E-13
    iparm(11) = 1  ! use nonsymmetric permutation and scaling MPS
    iparm(13) = 1  ! maximum weighted matching algorithm is switched on
    iparm(27) = 1  ! check the input matrix
    error = 0      ! initialize error flag
    msglvl = 1     ! print statistical information
    mtype = 11     ! real and nonsymmetric
    nrhs = 1
    maxfct = 1
    mnum = 1
  endif
  phase = 11 ! reordering and symbolic factorisation
  call cluster_sparse_solver_64(pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error)
  if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error
  phase = 22 ! numerical factorisation
  call cluster_sparse_solver_64(pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error)
  if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error
  phase = 33 ! solve
  call cluster_sparse_solver_64(pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, MKL_COMM, error)
  if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error
  if (rank.eq.0) write(*,*) rho
  phase = -1 ! release internal memory
  call cluster_sparse_solver_64(pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error)
  if (error.ne.0.and.rank.eq.0) write(*,*) 'Release of memory: ', error
  call mpi_finalize(mpi_stat)
end
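[Editor's note: for reference, the 1-based CSR arrays above can be expanded into dense form with the standalone helper below (no MPI, not part of the original post). Note that rows 2 and 3 contain identical entries.]

```fortran
! Illustrative sketch (not from the original post): expand the 1-based
! CSR arrays used in the example program into a dense matrix and print it.
program print_csr_dense
  implicit none
  integer, parameter :: n = 3, nnz = 7
  integer :: IA(n+1), JA(nnz), i, j
  double precision :: VAL(nnz), A(n, n)

  VAL = 1.0d0
  JA = (/ 1, 2, 3, 2, 3, 2, 3 /)
  IA = (/ 1, 4, 6, 8 /)

  ! Row i holds the entries VAL(IA(i) : IA(i+1)-1) in columns JA(...)
  A = 0.0d0
  do i = 1, n
    do j = IA(i), IA(i+1) - 1
      A(i, JA(j)) = VAL(j)
    end do
  end do
  do i = 1, n
    write (*, '(3F6.1)') A(i, :)  ! prints rows (1 1 1), (0 1 1), (0 1 1)
  end do
end program print_csr_dense
```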

It would be very nice if you could help me with this problem. Compiling is done via the commands described in my first post.

Thanks in advance


Hello Horst,

Is it possible for you to share a complete reproducer for the failure you observed originally (MPI_Bcast), with a small matrix, together with MPI and Intel MKL versions? I don't think I_MPI_SHM_LMT_BUFFER_NUM can help with the failure. I'd suspect that some of the settings for the Cluster Sparse Solver were wrong (or maybe some non-supported case has happened internally). If we get the reproducer, we should be able to find the root cause.

Best,

Kirill


Thank you for your response.

Sorry, I mixed something up: the code above is the one producing the original fatal error. I had just run it with Open MPI to check whether Intel MPI is the problem; when I run it with Intel MPI, I get the fatal error. I used MKL 2017.4.256, ifort 17.0.6.256, and Intel MPI 2017.4.239. I also tried newer versions of MKL/Intel MPI/ifort, but nothing changed. I compile with the commands in my first post. If you need to know anything else, let me know.

Thanks in advance.


Hello Horst,

We reproduced the issue on our side. The root cause is that the maxfct variable needs to be set on all ranks (not only on the root, as in your case). This restriction can clearly be avoided, and we are going to fix it in a coming release.

So, the workaround is to set maxfct for all processes (outside the if clause in your example code).

Let us know if the workaround doesn't work for you.

Best,

Kirill


Thanks for your reply and suggestions. Setting maxfct alone did not solve the problem, but what does seem to solve it is to also set mnum, msglvl, mtype, and nrhs on all ranks. The documentation says: 'Most of the input parameters (except for the pt, phase, and comm parameters and, for the distributed format, the a, ia, and ja arrays) must be set on the master MPI process only, and ignored on other processes.' So it would be nice if you could change that.
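[Editor's note: based on the thread's findings, the workaround amounts to moving the scalar solver arguments out of the rank-0 block of the example program, roughly as sketched below. This is not an official fix, just an illustration.]

```fortran
! Sketch of the workaround: initialize the scalar solver arguments on
! every rank, outside the rank-0 block, so the collectives inside
! cluster_sparse_solver see consistent values on all processes.
mtype  = 11 ! real and nonsymmetric
nrhs   = 1
maxfct = 1
mnum   = 1
msglvl = 1  ! print statistical information
error  = 0
if (rank.eq.0) then
  ! rank 0 still sets the matrix data only: DimensionL, VAL, IA, JA, iparm, ...
endif
```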

Thank you for your help. I've had some issues so far and you always helped me a lot.


Thanks for the catch, we'll definitely fix that!


Horst, could you please try the latest MKL 2019 Update 5, released last week, and let us know if the problem still exists on your side.


Thank you for this update. Unfortunately, it will take some time until we can use the latest MKL library. I will let you know once we can.


Horst, have you had a chance to check the issue with 2019 u5?
