
ITAC "-check_mpi" with MPI_f08 segfault

hakostra1
New Contributor II

Hi,

I just found a segfault in the "-check_mpi" feature of Intel MPI and Trace Analyzer. The problem is related to the optional "ierror" argument in the MPI_f08 Fortran bindings.

The following small program reproduces the problem:

PROGRAM hello
    USE MPI_f08
    USE, INTRINSIC :: ISO_FORTRAN_ENV
    IMPLICIT NONE

    INTEGER(int32) :: ierror, rank, nprocs

    CALL MPI_Init()   ! note: the optional ierror argument is omitted here

    CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)
    CALL MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierror)

    WRITE(*, '("Hello from rank ", I0, " of ", I0)') rank, nprocs

    CALL MPI_Finalize(ierror)
END PROGRAM hello

When I compile and run it as normal, everything works:

$ mpiifx --version
ifx (IFX) 2024.0.0 20231017
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

$ mpiifx -o hello -g -O0 hello.F90 

$ mpirun -n 2 ./hello
Hello from rank 1 of 2
Hello from rank 0 of 2

When I run it with the "-check_mpi" option, it segfaults:

$ mpirun -check_mpi -n 2 ./hello
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
libc.so.6          00007F11F5842520  Unknown               Unknown  Unknown
libmpifort.so.12.  00007F11F7EE5664  pmpi_initialized_     Unknown  Unknown
libVTmc.so         00007F11F896B855  mpi_init_f08__VT      Unknown  Unknown
libVTmc.so         00007F11F832FD99  mpi_init_f08_         Unknown  Unknown
hello              00000000004051D1  Unknown               Unknown  Unknown
hello              000000000040519D  Unknown               Unknown  Unknown
libc.so.6          00007F11F5829D90  Unknown               Unknown  Unknown
libc.so.6          00007F11F5829E40  __libc_start_main     Unknown  Unknown
hello              00000000004050B5  Unknown               Unknown  Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
libc.so.6          00007EFD6B842520  Unknown               Unknown  Unknown
libmpifort.so.12.  00007EFD6DEE5664  pmpi_initialized_     Unknown  Unknown
libVTmc.so         00007EFD6E96B855  mpi_init_f08__VT      Unknown  Unknown
libVTmc.so         00007EFD6E32FD99  mpi_init_f08_         Unknown  Unknown
hello              00000000004051D1  Unknown               Unknown  Unknown
hello              000000000040519D  Unknown               Unknown  Unknown
libc.so.6          00007EFD6B829D90  Unknown               Unknown  Unknown
libc.so.6          00007EFD6B829E40  __libc_start_main     Unknown  Unknown
hello              00000000004050B5  Unknown               Unknown  Unknown

Running it through GDB and Valgrind shows that the segfault is an invalid 4-byte write in PMPI_INITIALIZED, coming from libVTmc.so. The 4 bytes are the missing "ierror" argument.

When all optional "ierror" arguments are present, everything works as expected.
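For completeness, this is the variant that runs cleanly under "-check_mpi", with the optional "ierror" passed to every call:

PROGRAM hello
    USE MPI_f08
    USE, INTRINSIC :: ISO_FORTRAN_ENV
    IMPLICIT NONE

    INTEGER(int32) :: ierror, rank, nprocs

    ! Passing the optional ierror to every call, including MPI_Init,
    ! gives the checking library a valid address for its status write
    CALL MPI_Init(ierror)

    CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)
    CALL MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierror)

    WRITE(*, '("Hello from rank ", I0, " of ", I0)') rank, nprocs

    CALL MPI_Finalize(ierror)
END PROGRAM hello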

When "ierror" arguments are removed from the other MPI calls (MPI_Comm_rank, MPI_Comm_size, MPI_Finalize), there are no segfault, but errors are flagged. Trying to call MPI_Comm_rank without 'ierror' gives:

[0] ERROR: LOCAL:MPI:CALL_FAILED: error
[0] ERROR:    Unknown error class.
[0] ERROR:    Error occurred at:
[0] ERROR:       mpi_comm_rank_f08_(comm=MPI_COMM_WORLD, *rank=0x7ffe295d2214, *ierr=0x(nil) <<invalid>>)
[0] ERROR:       (/home/hakostra/tmp/itac-test/hello.F90:10)
[0] ERROR:       (/home/hakostra/tmp/itac-test/hello)
[0] ERROR:       (/usr/lib/x86_64-linux-gnu/libc.so.6)
[0] ERROR:       (/usr/lib/x86_64-linux-gnu/libc.so.6)
[0] ERROR:       (/home/hakostra/tmp/itac-test/hello)
[0] INFO: 1 error, limit CHECK-MAX-ERRORS reached => aborting

Luckily there is no segfault in that case, though.

All of this is produced with the latest 2024.0 Base and HPC Toolkits, where the Trace Analyzer version is 2022.0.0.

Any comments and creative workarounds (besides the obvious one of inserting "ierror" everywhere) are appreciated. Thanks.

VeenaJ_Intel
Moderator

Hi,

Thanks for posting in Intel communities!

We have attempted to recreate the issue on our end and successfully reproduced the segmentation fault. The difference is that when we removed the "ierror" arguments from all calls, we encountered the same segmentation fault rather than the error you mentioned.

We are currently working on this internally and will get back to you soon with updates.

Regards,
Veena
VeenaJ_Intel
Moderator

Hi,

We have conducted further investigations on our end.

The variable 'ierror' needs to be passed in 'MPI_Init'. This ensures that every process/thread creates its own copy of 'ierror'; if it is not passed through 'MPI_Init', only the root process will have a copy, and the other processes will not. When the '-check_mpi' flag is invoked, those processes may report a segmentation fault.

When using debugging tools like '-check_mpi', 'ierror' is not optional.
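For example, the minimal change to the reproducer above is to declare 'ierror' and pass it to 'MPI_Init'; this avoids the segfault, although the checker will still flag other calls that omit it:

    INTEGER(int32) :: ierror

    ! Passing the optional ierror gives the checking library a valid
    ! address for its 4-byte status write
    CALL MPI_Init(ierror)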

 

Regards,
Veena
VeenaJ_Intel
Moderator

Hi,

We have not heard back from you. Could you please confirm whether the explanation provided adequately addresses your query?

Regards,
Veena
VeenaJ_Intel
Moderator

Hi,

Since we didn't hear back from you, we assume that your issue is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.

Regards,
Veena