Intel® MPI Library

MPI_INIT fails with -fpe0

dr_jfloyd
Beginner

Cross-posting from the Fortran forum:

We have a large MPI application (a CFD model), and we use -fpe0 when running our verification suite in debug mode. After updating to the current oneAPI compilers, the following program fails at MPI_INIT when -fpe0 is used as a compiler option, with either ifort or ifx. This means no verification cases can be run in debug mode, because they all require initializing MPI (the actual application uses MPI_INIT_THREAD, but that also fails). The error message follows the source code.

program test_mpi
  use mpi_f08
  implicit none

  integer :: i, size, rank, namelen, ierr
  character(len=MPI_MAX_PROCESSOR_NAME) :: name
  type(mpi_status) :: stat

  call MPI_INIT(ierr)

  call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_GET_PROCESSOR_NAME(name, namelen, ierr)

  if (rank == 0) then

    print *, 'Hello world: rank ', rank, ' of ', size, ' running on ', name

    ! Rank 0 receives the rank, size, and processor name from every other
    ! rank and prints them (rank, size, and name are reused as buffers).
    do i = 1, size - 1
      call MPI_RECV(rank, 1, MPI_INTEGER, i, 1, MPI_COMM_WORLD, stat, ierr)
      call MPI_RECV(size, 1, MPI_INTEGER, i, 1, MPI_COMM_WORLD, stat, ierr)
      call MPI_RECV(namelen, 1, MPI_INTEGER, i, 1, MPI_COMM_WORLD, stat, ierr)
      name = ''
      call MPI_RECV(name, namelen, MPI_CHARACTER, i, 1, MPI_COMM_WORLD, stat, ierr)
      print *, 'Hello world: rank ', rank, ' of ', size, ' running on ', name
    end do

  else

    ! All other ranks send their rank, size, and processor name to rank 0.
    call MPI_SEND(rank, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)
    call MPI_SEND(size, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)
    call MPI_SEND(namelen, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)
    call MPI_SEND(name, namelen, MPI_CHARACTER, 0, 1, MPI_COMM_WORLD, ierr)

  end if

  call MPI_FINALIZE(ierr)

end program test_mpi

forrtl: error (65): floating invalid

Image             PC                Routine            Line     Source
libc.so.6         000014D70BA54DB0  Unknown            Unknown  Unknown
libucp.so.0.0.0   000014D70C2E717E  ucp_proto_perf_en  Unknown  Unknown
libucp.so.0.0.0   000014D70C2E881D  ucp_proto_init_pa  Unknown  Unknown
libucp.so.0.0.0   000014D70C2EFDF7  ucp_proto_common_  Unknown  Unknown
libucp.so.0.0.0   000014D70C2F15F5  ucp_proto_multi_i  Unknown  Unknown
libucp.so.0.0.0   000014D70C316404  Unknown            Unknown  Unknown
libucp.so.0.0.0   000014D70C2F2D42  Unknown            Unknown  Unknown
libucp.so.0.0.0   000014D70C2F4561  ucp_proto_select_  Unknown  Unknown
libucp.so.0.0.0   000014D70C2F4A25  ucp_proto_select_  Unknown  Unknown
libucp.so.0.0.0   000014D70C2E9A89  ucp_worker_get_ep  Unknown  Unknown
libucp.so.0.0.0   000014D70C33F39C  ucp_wireup_init_l  Unknown  Unknown
libucp.so.0.0.0   000014D70C2D19CE  ucp_ep_create_to_  Unknown  Unknown
libucp.so.0.0.0   000014D70C2D2B33  ucp_ep_create      Unknown  Unknown
libmlx-fi.so      000014D709A08460  Unknown            Unknown  Unknown
libmpi.so.12.0.0  000014D70CB7295E  Unknown            Unknown  Unknown
libmpi.so.12.0.0  000014D70C71C60A  Unknown            Unknown  Unknown
libmpi.so.12.0.0  000014D70C9E414E  Unknown            Unknown  Unknown
libmpi.so.12.0.0  000014D70C9E396B  MPI_Init           Unknown  Unknown
libmpifort.so.12. 000014D70E0B90A6  mpi_init_f08_      Unknown  Unknown
a.out             00000000004052BF  Unknown            Unknown  Unknown
a.out             000000000040521D  Unknown            Unknown  Unknown
libc.so.6         000014D70BA3FEB0  Unknown            Unknown  Unknown
libc.so.6         000014D70BA3FF60  __libc_start_main  Unknown  Unknown

 
TobiasK
Moderator

@dr_jfloyd


I tested your code, and it compiles and runs fine.

Can you please post your full compilation/link line and the execution command?


Best

Tobias


dr_jfloyd
Beginner

@TobiasK

Thank you for taking the time to look at this.

Compile (ifx):

% mpiifx -fpe0 test_mpi.f90

Run:

% mpiexec -n 2 ./a.out

Compile (ifort):

% mpiifort -diag-disable=10448 -fpe0 test_mpi.f90

Run:

% mpiexec -n 2 ./a.out

Without -fpe0, both builds run. Our system is running RHEL 9.3.

Jason

TobiasK
Moderator

@dr_jfloyd


Can you please run with

export I_MPI_DEBUG=10

and then:

1) export I_MPI_FABRICS=shm

If that works, please also run with:

2) export I_MPI_FABRICS=shm:ofi FI_PROVIDER=psm3
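For readers repeating this kind of fabric sweep, the sketch below wraps one trial in a shell function. It is a hypothetical helper, not an Intel MPI tool: the name try_fabric and the fabric_*.log file names are made up for illustration, and it assumes that a failing run (such as the forrtl traps in this thread) makes the launch command exit with a non-zero status. It runs the given command with I_MPI_DEBUG=10 and one I_MPI_FABRICS/FI_PROVIDER pair, keeps the debug output in a log file, and prints PASS or FAIL.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Intel MPI): run a command under one
# I_MPI_FABRICS / FI_PROVIDER combination with I_MPI_DEBUG=10, save the
# output to a log file, and report PASS or FAIL from the exit status.
#
# Usage: try_fabric FABRICS PROVIDER COMMAND [ARGS...]
#   Pass "" as PROVIDER to leave FI_PROVIDER unset (library default).
try_fabric() {
    local fabrics="$1" provider="$2"
    shift 2
    # Log name such as fabric_shm_ofi_psm3.log (":" replaced by "_").
    local log
    log="fabric_$(printf '%s' "$fabrics" | tr ':' '_')_${provider:-default}.log"
    # The subshell keeps these exports from leaking into the caller.
    if (
        export I_MPI_DEBUG=10 I_MPI_FABRICS="$fabrics"
        if [ -n "$provider" ]; then
            export FI_PROVIDER="$provider"
        else
            unset FI_PROVIDER
        fi
        "$@"
    ) >"$log" 2>&1; then
        echo "PASS fabrics=$fabrics provider=${provider:-default} (log: $log)"
    else
        echo "FAIL fabrics=$fabrics provider=${provider:-default} (log: $log)"
        return 1
    fi
}
```

For example, the two cases suggested above would be:

try_fabric shm "" mpiexec -n 2 ./a.out
try_fabric shm:ofi psm3 mpiexec -n 2 ./a.out

Running each trial in a subshell means one combination cannot silently carry its settings into the next.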



dr_jfloyd
Beginner

Both worked with the test program.

With the full application running on the head node, both also worked.

Trying the full application on the compute nodes (InfiniBand):

-Option 1) (I_MPI_FABRICS=shm)

--worked as long as I kept the job within a single node.

--did not work across multiple nodes, giving the following error (line 89 of main.f90 is the call to MPI_INIT_THREAD):

forrtl: severe (71): integer divide by zero
Image             PC                Routine            Line     Source
libc.so.6         00001471E4454DB0  Unknown            Unknown  Unknown
libmpi.so.12.0.0  00001471E4DDCADD  Unknown            Unknown  Unknown
libmpi.so.12.0.0  00001471E4B14A64  Unknown            Unknown  Unknown
libmpi.so.12.0.0  00001471E4ABBB45  Unknown            Unknown  Unknown
libmpi.so.12.0.0  00001471E4AB9F7A  Unknown            Unknown  Unknown
libmpi.so.12.0.0  00001471E4C490E1  Unknown            Unknown  Unknown
libmpi.so.12.0.0  00001471E4C46C2B  Unknown            Unknown  Unknown
libmpi.so.12.0.0  00001471E4C4ACD5  PMPI_Init_thread   Unknown  Unknown
libmpifort.so.12. 00001471E6258110  mpi_init_thread_f  Unknown  Unknown
fds_impi_intel_li 0000000004179918  MAIN__             89       main.f90
fds_impi_intel_li 0000000000407A1D  Unknown            Unknown  Unknown
libc.so.6         00001471E443FEB0  Unknown            Unknown  Unknown
libc.so.6         00001471E443FF60  __libc_start_main  Unknown  Unknown
fds_impi_intel_li 0000000000407935  Unknown            Unknown  Unknown

 

-Option 2) did not work on the compute nodes, on either a single node or multiple nodes.

The error is below. The underlined lines were repeated multiple times.

Abort(1614479) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(189)........:
MPID_Init(1561)..............:
MPIDI_OFI_mpi_init_hook(1624):
create_vni_context(2221).....: OFI endpoint open failed (ofi_init.c:2221:create_vni_context:Invalid argument)
Abort(1614479) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:

 

TobiasK
Moderator

@dr_jfloyd


Can you please try with:

export I_MPI_FABRICS=shm:ofi FI_PROVIDER=psm3 ?


I am still not able to reproduce this issue.

Can you please provide the output of lscpu?


Best


dr_jfloyd
Beginner

I did try "export I_MPI_FABRICS=shm:ofi FI_PROVIDER=psm3"; that was the option 2) result in my prior post. But since I had some success with shm alone, I felt you had pointed me in the right direction. I started experimenting with various options and just now had success with both:

export I_MPI_FABRICS=shm:ofi FI_PROVIDER=verbs

and

export I_MPI_FABRICS=shm:ofa FI_PROVIDER=verbs

 

Thanks for all the help!
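If switching the provider is the workaround on a given cluster, it may be preferable to scope it to the debug launches rather than exporting it for the whole session. The sketch below is one way to do that, under stated assumptions: with_verbs_fabric is a hypothetical name, and it hard-codes the I_MPI_FABRICS=shm:ofi / FI_PROVIDER=verbs pair that worked in this thread, which may not be the right choice on other systems.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not an Intel MPI tool): run one command with the
# I_MPI_FABRICS / FI_PROVIDER pair that worked in this thread.
# Usage: with_verbs_fabric mpiexec -n 2 ./a.out
with_verbs_fabric() {
    # Note the settings on stderr so they show up in job logs.
    echo "with_verbs_fabric: I_MPI_FABRICS=shm:ofi FI_PROVIDER=verbs" >&2
    # env(1) sets the variables for this one command only, so the
    # caller's environment is unchanged; the command's exit status is kept.
    env I_MPI_FABRICS=shm:ofi FI_PROVIDER=verbs "$@"
}
```

Using env instead of export means other jobs launched from the same shell keep whatever fabric settings they already had.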

 

 
