My team is working on some code that uses the `mpi_f08` Fortran interface, and we need to profile it to identify where to prioritise optimisation. Since ITAC (Intel Trace Analyzer and Collector) produces no output when run on code compiled with `use mpi_f08`, it is necessary to `use pmpi_f08` instead. However, the Intel implementation of this module seems to be broken and does not appear to provide any of the `mpi_f08` functionality.
For example, this no-op program segfaults:

    program test_pmpi_f08
      use pmpi_f08
      implicit none

      call MPI_Init
      call MPI_Finalize
    end program test_pmpi_f08

However, adding the supposedly optional `ierr` argument to `MPI_Init` allows it to run, as does swapping `pmpi_f08` for `mpi_f08`.
This second program gives the wrong result for the reduction, and then segfaults:

    program test_pmpi_f08_collectives
      use pmpi_f08
      implicit none
      integer :: ierr, i, ip, np

      call MPI_Init(ierr)
      call MPI_Comm_Rank(MPI_Comm_World, ip, ierr)
      call MPI_Comm_Size(MPI_Comm_World, np, ierr)

      i = 1
      call MPI_AllReduce(MPI_In_Place, i, 1, MPI_Integer, MPI_Sum, MPI_Comm_World, ierr)

      if (ip .eq. 0) then
        if (i .eq. np) then
          print *, "Reduction succeeded"
        else
          print *, "FAILED:", i, "!=", np
        end if
      end if

      call MPI_Finalize
    end program test_pmpi_f08_collectives

As with the previous example, this runs fine if `mpi_f08` is used instead of `pmpi_f08`. Removing `MPI_In_Place` and using a separate receive buffer instead also works, even with `pmpi_f08`.
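For reference, here is a minimal sketch of the separate-buffer variant described above. The program name and the receive variable `isum` are hypothetical names chosen for illustration, and passing `ierr` to `MPI_Finalize` is an assumption based on the `MPI_Init` observation earlier, not something confirmed to be required.

```fortran
program test_pmpi_f08_separate_buffer
  use pmpi_f08
  implicit none
  integer :: ierr, i, isum, ip, np

  call MPI_Init(ierr)
  call MPI_Comm_Rank(MPI_Comm_World, ip, ierr)
  call MPI_Comm_Size(MPI_Comm_World, np, ierr)

  i = 1
  ! Distinct send (i) and receive (isum) buffers replace MPI_In_Place.
  ! isum is a hypothetical name, not taken from the original reproducer.
  call MPI_AllReduce(i, isum, 1, MPI_Integer, MPI_Sum, MPI_Comm_World, ierr)

  if (ip .eq. 0) then
    if (isum .eq. np) then
      print *, "Reduction succeeded"
    else
      print *, "FAILED:", isum, "!=", np
    end if
  end if

  ! Assumption: ierr is passed explicitly here, mirroring the MPI_Init case.
  call MPI_Finalize(ierr)
end program test_pmpi_f08_separate_buffer
```

The only change to the reduction itself is that the contribution and the result live in different variables, so the call no longer depends on how `MPI_In_Place` is resolved.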
This looks like a bug: it is as if the `pmpi_f08` implementation falls back on the `use mpi` interface rather than the `mpi_f08` one. If we have to rewrite all of our code in `use mpi` style in order to profile it, there is little point in writing it with `mpi_f08` syntax to begin with. Has anyone else encountered this?