Horst

Beginner

05-08-2019
08:35 AM

213 Views

Pardiso mpi version with fatal error

Hello,

I get a fatal error when using the Intel MPI version of Intel PARDISO (the Cluster Sparse Solver), and I would appreciate some help with it. I compile the code with the following commands (taken from the Intel Link Line Advisor):

mpiifort -i8 -I${MKLROOT}/include -c -o mkl_cluster_sparse_solver.o ${MKLROOT}/include/mkl_cluster_sparse_solver.f90
mpiifort -i8 -I${MKLROOT}/include -c -o MPI.o MPI.f90
mpiifort mkl_cluster_sparse_solver.o MPI.o -o MPI.out -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

and run it, for instance, on two processes with

mpiexec -n 2 ./MPI.out

I use the 64-bit integer (ILP64) interface. Oddly, the reordering phase works perfectly, but the factorisation and solve steps do not. The error message I get is the following:

Fatal error in PMPI_Bcast: Message truncated, error stack:
PMPI_Bcast(2654)..................: MPI_Bcast(buf=0x7ffe63518210, count=1, MPI_LONG_LONG_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1804).............: fail failed
MPIR_Bcast(1832)..................: fail failed
I_MPIR_Bcast_intra(2057)..........: Failure during collective
MPIR_Bcast_intra(1599)............: fail failed
MPIR_Bcast_binomial(247)..........: fail failed
MPIDI_CH3U_Receive_data_found(131): Message from rank 0 and tag 2 truncated; 1600 bytes received but buffer size is 8

So this seems to be a problem with the buffer size. At first I thought my problem was too large, but the error does not depend on the matrix size. I tried to fix it by setting

export I_MPI_SHM_LMT_BUFFER_SIZE=2000

but that changed nothing. The Intel MPI manual also describes I_MPI_SHM_LMT_BUFFER_NUM, and I tried setting that to a higher value as well. I use the following versions: MKL 2017.4.256, ifort 17.0.6.256, Intel MPI 2017.4.239. I also tried newer versions, with the same result. If I should post an example, please let me know. However, I hope this can be fixed easily by setting some buffer size (other than I_MPI_SHM_LMT_BUFFER_SIZE) to a higher value.

Thanks in advance

12 Replies

Gennady_F_Intel

Moderator

05-08-2019
09:53 AM

Horst

Beginner

05-08-2019
10:20 AM

Gennady_F_Intel

Moderator

05-08-2019
08:45 PM

Horst

Beginner

05-09-2019
08:11 AM

Thank you for your answer and suggestions. The task is very small: I found that the problem occurs for almost any system size.

I also tried the 32-bit interface, but it does not fix the problem, so I am staying with the 64-bit interface because I work with very large systems.

I have attached a very simple program that produces a different MPI error, but an error nonetheless:

[proxy:0:1@n0710] HYD_pmcd_pmip_control_cmd_cb (../../pm/pmiserv/pmip_cb.c:3481): assert (!closed) failed

[proxy:0:1@n0710] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status

[proxy:0:1@n0710] main (../../pm/pmiserv/pmip.c:558): demux engine error waiting for event

This happens after the symbolic factorisation. Here is the program:

program cluster_sparse_solver_sym
  use mkl_cluster_sparse_solver
  implicit none
  include 'mpif.h'
  integer, parameter :: dp = kind(1.0D0)
  !.. Internal solver memory pointer for 64-bit architectures
  TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE) :: pt(64)
  integer maxfct, mnum, mtype, phase, nrhs, error, msglvl, i, idum(1), DimensionL, Nsparse
  integer*4 mpi_stat, rank, num_procs
  double precision :: ddum(1)
  integer :: iparm( 64 )
  integer, allocatable :: IA( : ), JA( : )
  double precision, allocatable :: VAL( : ), rhodot( : ), rho( : )
  integer(4) MKL_COMM

  MKL_COMM=MPI_COMM_WORLD
  call mpi_init(mpi_stat)
  call mpi_comm_rank(MKL_COMM, rank, mpi_stat)

  do i = 1, 64
    pt(i)%dummy = 0
  end do

  if (rank.eq.0) then
    DimensionL = 3
    Nsparse = 7
    allocate (VAL(Nsparse), IA(DimensionL + 1), JA(Nsparse))
    VAL(1) = 1.0d0
    VAL(2) = 1.0d0
    VAL(3) = 1.0d0
    VAL(4) = 1.0d0
    VAL(5) = 1.0d0
    VAL(6) = 1.0d0
    VAL(7) = 1.0d0
    JA(1) = 1
    JA(2) = 2
    JA(3) = 3
    JA(4) = 2
    JA(5) = 3
    JA(6) = 2
    JA(7) = 3
    IA(1) = 1
    IA(2) = 4
    IA(3) = 6
    IA(4) = 8
    allocate(rhodot(DimensionL), rho(DimensionL))
    do i=1,DimensionL
      rhodot(i) = 1.0d0
    enddo
    do i = 1, 64
      iparm(i) = 0
    end do
    iparm(1) = 1 ! no solver default
    iparm(2) = 3 ! fill-in reordering from METIS
    iparm(8) = 2 ! numbers of iterative refinement steps
    iparm(10) = 13 ! perturb the pivot elements with 1E-13
    iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS
    iparm(13) = 1 ! maximum weighted matching algorithm is switched on
    iparm(27) = 1
    error = 0 ! initialize error flag
    msglvl = 1 ! print statistical information
    mtype = 11 ! real, nonsymmetric
    nrhs =1
    maxfct =1
    mnum =1
  endif

  phase = 11
  call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, &
                                  idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
  if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error

  phase = 22
  call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, &
                                  idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
  if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error

  phase = 33
  call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, &
                                  idum, nrhs, iparm, msglvl, rhodot, rho, MKL_COMM, error )
  if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error
  if (rank.eq.0) write(*,*) rho

  phase = -1
  call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, &
                                  idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
  if (error.ne.0.and.rank.eq.0) write(*,*) 'Release of memory: ', error

  call mpi_finalize(mpi_stat)
end

It would be very nice if you could help me with this problem. Compiling is done via the commands described in my first post.

Thanks in advance

Kirill_V_Intel

Employee

05-09-2019
12:35 PM

Hello Horst,

Is it possible for you to share a complete reproducer for the failure you observed originally (MPI_Bcast), with a small matrix, together with MPI and Intel MKL versions? I don't think I_MPI_SHM_LMT_BUFFER_NUM can help with the failure. I'd suspect that some of the settings for the Cluster Sparse Solver were wrong (or maybe some non-supported case has happened internally). If we get the reproducer, we should be able to find the root cause.

Best,

Kirill

Horst

Beginner

05-09-2019
03:58 PM

Thank you for your response.

Sorry, I mixed something up. This is the code that produces the fatal error. I had just run it with Open MPI to check whether Intel MPI is the problem. If I run it with Intel MPI, I get the fatal error. I used MKL 2017.4.256, ifort 17.0.6.256 and Intel MPI 2017.4.239. I also tried newer versions of MKL, Intel MPI and ifort, but it changed nothing. I compile with the commands in my first post. If you need to know anything else, let me know.

Thanks in advance.

Kirill_V_Intel

Employee

05-10-2019
12:11 PM

Hello Horst,

We reproduced the issue on our side. The root cause is that the maxfct variable needs to be set on all ranks (not only on the root, as in your case). This restriction can clearly be avoided, and we are going to fix it in the coming releases.

So, the workaround is to set maxfct for all processes (outside the if clause in your example code).

Let us know if the workaround doesn't work for you.

Best,

Kirill

Horst

Beginner

05-10-2019
04:12 PM

Thanks for your reply and suggestions. That alone did not solve the problem, but what does seem to solve it is to also set mnum, msglvl, mtype and nrhs on all ranks. The documentation says: 'Most of the input parameters (except for the pt, phase, and comm parameters and, for the distributed format, the a, ia, and ja arrays) must be set on the master MPI process only, and ignored on other processes.' So it would be nice if you could change that.
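For anyone landing here with the same error, the workaround that came out of this thread can be sketched roughly as below. This is only a sketch, not Intel's reference code. The program name, the 3x3 test matrix and the right-hand side are made up for illustration, and the integers are declared as integer(8) explicitly to match cluster_sparse_solver_64. Setting n and iparm on every rank as well goes beyond what was confirmed above; only maxfct, mnum, mtype, nrhs and msglvl were reported as necessary.

```fortran
! Sketch of the workaround from this thread: scalar inputs are set on
! every MPI rank, matrix data only on rank 0 (centralized input).
! Assumption: the 3x3 matrix and right-hand side are illustrative only.
program css_scalars_on_all_ranks
  use mkl_cluster_sparse_solver
  implicit none
  include 'mpif.h'

  type(MKL_CLUSTER_SPARSE_SOLVER_HANDLE) :: pt(64)
  ! Explicit 8-byte integers to match cluster_sparse_solver_64
  integer(8) :: maxfct, mnum, mtype, phase, n, nrhs, msglvl, error
  integer(8) :: iparm(64), idum(1), ia(4), ja(6)
  integer(8), parameter :: phases(4) = [11_8, 22_8, 33_8, -1_8]
  double precision :: a(6), b(3), x(3)
  integer(4) :: comm, rank, mpi_stat
  integer :: i, k

  call mpi_init(mpi_stat)
  comm = MPI_COMM_WORLD
  call mpi_comm_rank(comm, rank, mpi_stat)

  do i = 1, 64
    pt(i)%dummy = 0
  end do

  ! Set on EVERY rank, i.e. outside any "if (rank == 0)" block.
  maxfct = 1
  mnum   = 1
  mtype  = 11        ! real, nonsymmetric
  nrhs   = 1
  msglvl = 0         ! no statistics output
  ! Assumption: n, error and iparm are also set everywhere to be safe.
  n      = 3
  error  = 0
  iparm      = 0
  iparm(1)   = 1     ! do not use solver defaults
  iparm(2)   = 2     ! nested dissection reordering from METIS
  iparm(8)   = 2     ! iterative refinement steps
  iparm(10)  = 13    ! perturb pivot elements with 1E-13
  iparm(27)  = 1     ! enable the matrix checker

  ! Matrix and right-hand side only need real values on rank 0.
  idum = 0
  ia = 0
  ja = 0
  a  = 0.0d0
  b  = 0.0d0
  x  = 0.0d0
  if (rank == 0) then
    ! CSR storage, one-based, for [[2,1,0],[0,3,1],[1,0,4]]
    ia = [1_8, 3_8, 5_8, 7_8]
    ja = [1_8, 2_8, 2_8, 3_8, 1_8, 3_8]
    a  = [2.0d0, 1.0d0, 3.0d0, 1.0d0, 1.0d0, 4.0d0]
    b  = 1.0d0
  end if

  ! Reordering (11), factorisation (22), solve (33), release memory (-1)
  do k = 1, size(phases)
    phase = phases(k)
    call cluster_sparse_solver_64(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, &
                                  idum, nrhs, iparm, msglvl, b, x, comm, error)
    if (error /= 0 .and. rank == 0) write(*,*) 'ERROR in phase ', phase, ': ', error
    if (phase == 33 .and. rank == 0) write(*,*) 'x = ', x
  end do

  call mpi_finalize(mpi_stat)
end program css_scalars_on_all_ranks
```

It should build and run with the same mpiifort commands and mpiexec line as in the first post.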

Thank you for your help. I've had some issues so far and you always helped me a lot.

Kirill_V_Intel

Employee

05-10-2019
05:09 PM

Thanks for the catch, we'll definitely fix that!

Gennady_F_Intel

Moderator

09-15-2019
09:27 PM

Horst

Beginner

09-19-2019
01:35 AM

Gennady_F_Intel

Moderator

12-15-2019
07:49 PM

Horst, have you had a chance to check the issue with 2019 u5?
