Cross-posting from the Fortran forum:
We have a large MPI application (a CFD model) and use -fpe0 when running our verification suite in debug. After updating to the current oneAPI compilers, the following program fails at MPI_INIT with either ifort or ifx if -fpe0 is used as a compiler option. This means no verification cases can be run in debug, as they all require initializing MPI (the actual application uses MPI_INIT_THREAD, but that also fails). The error message follows the source code.
program test_mpi
  use mpi_f08
  implicit none
  integer i, size, rank, namelen, ierr
  character (len=MPI_MAX_PROCESSOR_NAME) :: name
  type(mpi_status) :: stat
  call MPI_INIT (ierr)
  call MPI_COMM_SIZE (MPI_COMM_WORLD, size, ierr)
  call MPI_COMM_RANK (MPI_COMM_WORLD, rank, ierr)
  call MPI_GET_PROCESSOR_NAME (name, namelen, ierr)
  if (rank.eq.0) then
    print *, 'Hello world: rank ', rank, ' of ', size, ' running on ', name
    do i = 1, size - 1
      call MPI_RECV (rank, 1, MPI_INTEGER, i, 1, MPI_COMM_WORLD, stat, ierr)
      call MPI_RECV (size, 1, MPI_INTEGER, i, 1, MPI_COMM_WORLD, stat, ierr)
      call MPI_RECV (namelen, 1, MPI_INTEGER, i, 1, MPI_COMM_WORLD, stat, ierr)
      name = ''
      call MPI_RECV (name, namelen, MPI_CHARACTER, i, 1, MPI_COMM_WORLD, stat, ierr)
      print *, 'Hello world: rank ', rank, ' of ', size, ' running on ', name
    enddo
  else
    call MPI_SEND (rank, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)
    call MPI_SEND (size, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)
    call MPI_SEND (namelen, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)
    call MPI_SEND (name, namelen, MPI_CHARACTER, 0, 1, MPI_COMM_WORLD, ierr)
  endif
  call MPI_FINALIZE (ierr)
end
forrtl: error (65): floating invalid
Image PC Routine Line Source
libc.so.6 000014D70BA54DB0 Unknown Unknown Unknown
libucp.so.0.0.0 000014D70C2E717E ucp_proto_perf_en Unknown Unknown
libucp.so.0.0.0 000014D70C2E881D ucp_proto_init_pa Unknown Unknown
libucp.so.0.0.0 000014D70C2EFDF7 ucp_proto_common_ Unknown Unknown
libucp.so.0.0.0 000014D70C2F15F5 ucp_proto_multi_i Unknown Unknown
libucp.so.0.0.0 000014D70C316404 Unknown Unknown Unknown
libucp.so.0.0.0 000014D70C2F2D42 Unknown Unknown Unknown
libucp.so.0.0.0 000014D70C2F4561 ucp_proto_select_ Unknown Unknown
libucp.so.0.0.0 000014D70C2F4A25 ucp_proto_select_ Unknown Unknown
libucp.so.0.0.0 000014D70C2E9A89 ucp_worker_get_ep Unknown Unknown
libucp.so.0.0.0 000014D70C33F39C ucp_wireup_init_l Unknown Unknown
libucp.so.0.0.0 000014D70C2D19CE ucp_ep_create_to_ Unknown Unknown
libucp.so.0.0.0 000014D70C2D2B33 ucp_ep_create Unknown Unknown
libmlx-fi.so 000014D709A08460 Unknown Unknown Unknown
libmpi.so.12.0.0 000014D70CB7295E Unknown Unknown Unknown
libmpi.so.12.0.0 000014D70C71C60A Unknown Unknown Unknown
libmpi.so.12.0.0 000014D70C9E414E Unknown Unknown Unknown
libmpi.so.12.0.0 000014D70C9E396B MPI_Init Unknown Unknown
libmpifort.so.12. 000014D70E0B90A6 mpi_init_f08_ Unknown Unknown
a.out 00000000004052BF Unknown Unknown Unknown
a.out 000000000040521D Unknown Unknown Unknown
libc.so.6 000014D70BA3FEB0 Unknown Unknown Unknown
libc.so.6 000014D70BA3FF60 __libc_start_main Unknown Unknown
I tested your code and it compiles / runs fine.
Can you please post your full compilation/link line and the execution command?
Best
Tobias
Thank you for taking the time to look at this.
Compile (ifx):
% mpiifx -fpe0 test_mpi.f90
Run:
% mpiexec -n 2 ./a.out
Compile (ifort):
% mpiifort -diag-disable=10448 -fpe0 test_mpi.f90
Run:
% mpiexec -n 2 ./a.out
Without -fpe0, both builds run. Our system is running RHEL 9.3.
Jason
Can you please run with:
export I_MPI_DEBUG=10
1)
export I_MPI_FABRICS=shm
If that works, please also run with
2)
export I_MPI_FABRICS=shm:ofi FI_PROVIDER=psm3
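The two runs requested above could be scripted along these lines. This is only a sketch: the helper name `run_with_fabric` and its argument order are made up for illustration, and only the I_MPI_DEBUG, I_MPI_FABRICS, and FI_PROVIDER settings come from this thread. The variables are set in a subshell, so they do not leak into the calling session.

```shell
#!/bin/sh
# run_with_fabric FABRICS PROVIDER CMD...
# Hypothetical helper: runs CMD with I_MPI_DEBUG=10 and the given fabric
# selection, then returns CMD's exit status.
run_with_fabric() {
    fabrics=$1
    provider=$2
    shift 2
    (
        export I_MPI_DEBUG=10
        export I_MPI_FABRICS="$fabrics"
        # An empty PROVIDER leaves FI_PROVIDER unset, as in step 1).
        if [ -n "$provider" ]; then
            export FI_PROVIDER="$provider"
        else
            unset FI_PROVIDER
        fi
        "$@"
    )
}

# Usage (assumes the reproducer was built as ./a.out):
#   run_with_fabric shm "" mpiexec -n 2 ./a.out           # step 1)
#   run_with_fabric shm:ofi psm3 mpiexec -n 2 ./a.out     # step 2)
```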
Both worked with the test program.
With the full application running on the head node, both also worked.
Trying the full application on the compute nodes (InfiniBand):
- Option 1)
-- worked as long as I kept the core count within one node.
-- did not work across multiple nodes and gives the following (line 89 is the call to MPI_INIT_THREAD):
forrtl: severe (71): integer divide by zero
Image PC Routine Line Source
libc.so.6 00001471E4454DB0 Unknown Unknown Unknown
libmpi.so.12.0.0 00001471E4DDCADD Unknown Unknown Unknown
libmpi.so.12.0.0 00001471E4B14A64 Unknown Unknown Unknown
libmpi.so.12.0.0 00001471E4ABBB45 Unknown Unknown Unknown
libmpi.so.12.0.0 00001471E4AB9F7A Unknown Unknown Unknown
libmpi.so.12.0.0 00001471E4C490E1 Unknown Unknown Unknown
libmpi.so.12.0.0 00001471E4C46C2B Unknown Unknown Unknown
libmpi.so.12.0.0 00001471E4C4ACD5 PMPI_Init_thread Unknown Unknown
libmpifort.so.12. 00001471E6258110 mpi_init_thread_f Unknown Unknown
fds_impi_intel_li 0000000004179918 MAIN__ 89 main.f90
fds_impi_intel_li 0000000000407A1D Unknown Unknown Unknown
libc.so.6 00001471E443FEB0 Unknown Unknown Unknown
libc.so.6 00001471E443FF60 __libc_start_main Unknown Unknown
fds_impi_intel_li 0000000000407935 Unknown Unknown Unknown
- Option 2) did not work either way (single node or multiple nodes).
The error is below. The underlined lines were repeated multiple times.
Abort(1614479) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(189)........: 
MPID_Init(1561)..............: 
MPIDI_OFI_mpi_init_hook(1624): 
create_vni_context(2221).....: OFI endpoint open failed (ofi_init.c:2221:create_vni_context:Invalid argument)
Abort(1614479) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
Can you please try with:
export I_MPI_FABRICS=shm:ofi FI_PROVIDER=psm3 ?
I am still not able to reproduce this issue.
Can you please provide the output of lscpu?
Best
I did try "export I_MPI_FABRICS=shm:ofi FI_PROVIDER=psm3"; that was the option 2) result in my prior post. But since I had some success with shm alone, I felt you had pointed me in the right direction. I started trying various options and just now had success with both:
export I_MPI_FABRICS=shm:ofi FI_PROVIDER=verbs
and
export I_MPI_FABRICS=shm:ofa FI_PROVIDER=verbs
Thanks for all the help!
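For anyone hitting the same failure, a sweep over the settings tried in this thread might look like the sketch below. The function name `try_fabrics` and the PASS/FAIL output format are hypothetical, and the list covers only three of the pairings mentioned above; which one works will depend on the cluster's interconnect and installed providers.

```shell
#!/bin/sh
# try_fabrics CMD...
# Hypothetical helper: runs CMD once per fabric/provider pairing and prints
# PASS or FAIL for each, based on CMD's exit status.
try_fabrics() {
    # Each entry is FABRICS,PROVIDER; an empty provider leaves FI_PROVIDER unset.
    for pair in "shm," "shm:ofi,psm3" "shm:ofi,verbs"; do
        fabrics=${pair%%,*}
        provider=${pair#*,}
        if (
            export I_MPI_FABRICS="$fabrics"
            if [ -n "$provider" ]; then
                export FI_PROVIDER="$provider"
            else
                unset FI_PROVIDER
            fi
            # Discard CMD's own output so only the summary lines are printed.
            "$@" >/dev/null 2>&1
        ); then
            echo "PASS I_MPI_FABRICS=$fabrics FI_PROVIDER=$provider"
        else
            echo "FAIL I_MPI_FABRICS=$fabrics FI_PROVIDER=$provider"
        fi
    done
}

# Usage (assumes the reproducer was built as ./a.out):
#   try_fabrics mpiexec -n 2 ./a.out
```

Each pairing runs in its own subshell, so one attempt's environment cannot affect the next.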