Hi,
I just found a segfault in the "-check_mpi" feature of Intel MPI and Trace Analyzer. The problem is related to the optional "ierror" argument in the MPI_f08 Fortran bindings.
The following small program reproduces the problem:
PROGRAM hello
   USE MPI_f08
   USE, INTRINSIC :: ISO_FORTRAN_ENV
   IMPLICIT NONE
   INTEGER(int32) :: ierror, rank, nprocs

   CALL MPI_Init()
   CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)
   CALL MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierror)
   WRITE(*, '("Hello from rank ", I0, " of ", I0)') rank, nprocs
   CALL MPI_Finalize(ierror)
END PROGRAM hello
When I compile and run it as normal, everything works:
$ mpiifx --version
ifx (IFX) 2024.0.0 20231017
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.
$ mpiifx -o hello -g -O0 hello.F90
$ mpirun -n 2 ./hello
Hello from rank 1 of 2
Hello from rank 0 of 2
When I run it with the "-check_mpi" option, it segfaults:
$ mpirun -check_mpi -n 2 ./hello
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libc.so.6 00007F11F5842520 Unknown Unknown Unknown
libmpifort.so.12. 00007F11F7EE5664 pmpi_initialized_ Unknown Unknown
libVTmc.so 00007F11F896B855 mpi_init_f08__VT Unknown Unknown
libVTmc.so 00007F11F832FD99 mpi_init_f08_ Unknown Unknown
hello 00000000004051D1 Unknown Unknown Unknown
hello 000000000040519D Unknown Unknown Unknown
libc.so.6 00007F11F5829D90 Unknown Unknown Unknown
libc.so.6 00007F11F5829E40 __libc_start_main Unknown Unknown
hello 00000000004050B5 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libc.so.6 00007EFD6B842520 Unknown Unknown Unknown
libmpifort.so.12. 00007EFD6DEE5664 pmpi_initialized_ Unknown Unknown
libVTmc.so 00007EFD6E96B855 mpi_init_f08__VT Unknown Unknown
libVTmc.so 00007EFD6E32FD99 mpi_init_f08_ Unknown Unknown
hello 00000000004051D1 Unknown Unknown Unknown
hello 000000000040519D Unknown Unknown Unknown
libc.so.6 00007EFD6B829D90 Unknown Unknown Unknown
libc.so.6 00007EFD6B829E40 __libc_start_main Unknown Unknown
hello 00000000004050B5 Unknown Unknown Unknown
Running it through GDB and Valgrind shows that the segfault is an invalid 4-byte write in PMPI_INITIALIZED, coming from libVTmc.so. The 4 bytes are the missing "ierror" argument.
When all optional "ierror" arguments are present everything works as expected.
When the "ierror" arguments are removed from the other MPI calls (MPI_Comm_rank, MPI_Comm_size, MPI_Finalize), there is no segfault, but errors are flagged. Trying to call MPI_Comm_rank without "ierror" gives:
[0] ERROR: LOCAL:MPI:CALL_FAILED: error
[0] ERROR: Unknown error class.
[0] ERROR: Error occurred at:
[0] ERROR: mpi_comm_rank_f08_(comm=MPI_COMM_WORLD, *rank=0x7ffe295d2214, *ierr=0x(nil) <<invalid>>)
[0] ERROR: (/home/hakostra/tmp/itac-test/hello.F90:10)
[0] ERROR: (/home/hakostra/tmp/itac-test/hello)
[0] ERROR: (/usr/lib/x86_64-linux-gnu/libc.so.6)
[0] ERROR: (/usr/lib/x86_64-linux-gnu/libc.so.6)
[0] ERROR: (/home/hakostra/tmp/itac-test/hello)
[0] INFO: 1 error, limit CHECK-MAX-ERRORS reached => aborting
At least there is no segfault in this case, though.
All of this is produced with the latest 2024.0 Base and HPC Toolkits, where the Trace Analyzer version is 2022.0.0.
Any comments and creative workarounds (besides the obvious inserting "ierror" everywhere) are appreciated. Thanks.
Hi,
Thanks for posting in Intel communities!
We attempted to recreate the issue on our end and successfully reproduced the segmentation fault. The difference is that in the scenario where we removed the ierr arguments from all calls, we encountered the same segmentation fault rather than the error you mentioned.
We are currently working on this internally and will get back to you soon with updates.
Regards,
Veena
Hi,
We have conducted further investigations on our end.
The variable 'ierror' needs to be passed to 'MPI_Init'. This ensures that every process/thread creates its own copy of 'ierror'. If it is not passed to 'MPI_Init', only the root process will have a copy and the other processes will not; when the '-check_mpi' flag is used, these processes may report a segmentation fault.
When using debugging tools like '-check_mpi', 'ierror' is not optional.
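Concretely, this means the reproducer from the original post needs 'ierror' on every call, including 'MPI_Init'. A minimal sketch of the adjusted program (the same code with the optional arguments filled in):

```fortran
PROGRAM hello
   USE MPI_f08
   USE, INTRINSIC :: ISO_FORTRAN_ENV
   IMPLICIT NONE
   INTEGER(int32) :: ierror, rank, nprocs

   ! Pass ierror explicitly everywhere, including MPI_Init, so the
   ! -check_mpi interposition library never touches a missing
   ! optional argument.
   CALL MPI_Init(ierror)
   CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)
   CALL MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierror)
   WRITE(*, '("Hello from rank ", I0, " of ", I0)') rank, nprocs
   CALL MPI_Finalize(ierror)
END PROGRAM hello
```

Compile and run as before: mpiifx -o hello -g -O0 hello.F90, then mpirun -check_mpi -n 2 ./hello.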
Regards,
Veena
Hi,
We have not heard back from you. Could you please confirm whether the explanation provided adequately addresses your query?
Regards,
Veena
Hi,
Since we didn't hear back from you, we assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Regards,
Veena
