Intel® MPI Library

MPI_GATHER requires receive buffer to be allocated on all ranks

Axel_Foley
Beginner

I'm a bit confused by the behavior of ifort / Intel MPI when it comes to collective MPI routines.

Using version 2021.5.0 20211109 installed via OneAPI.

PROGRAM gathertests
    
    use iso_fortran_env, only: int32
    use mpi_f08
    
    implicit none
    
    INTEGER, PARAMETER :: I4B = int32
    INTEGER(I4B) :: r, nranks, error
    INTEGER(I4B) :: chunksize, nstart, nstop, n
    INTEGER(I4B), PARAMETER :: arr_length = 40 ! must be a whole multiple of nranks, e.g. run with 4 processes
    INTEGER(I4B), DIMENSION(:), ALLOCATABLE :: idx_temp_rot, receive_buffer
    
    call MPI_INIT(error)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nranks, error)
    call MPI_COMM_RANK(MPI_COMM_WORLD, r, error)
    
    allocate(idx_temp_rot(arr_length))
    
    ! calculate bounds
    chunksize = arr_length / nranks
    nstart = r * chunksize + 1
    nstop  = (r+1) * chunksize
    
    idx_temp_rot(nstart:nstop) = r
    
    ! when compiled with "-check all", the program fails at run time: "Attempt to fetch from allocatable variable RECEIVE_BUFFER when it is not allocated"
    ! when compiled without "-check all", program runs without error and produces expected result
    if(r .EQ. 0) then
        allocate(receive_buffer(arr_length))
    end if
    
    call MPI_GATHER(idx_temp_rot(nstart), chunksize, MPI_INTEGER4, receive_buffer, chunksize, MPI_INTEGER4, 0, MPI_COMM_WORLD, error)
    
    if(r .EQ. 0) then
        do n=1, arr_length
            write(*,*) n, receive_buffer(n)
        end do
    end if
    
    call MPI_FINALIZE(error)
    
END PROGRAM gathertests

This is a minimal working example of what I am looking at. Best run on 4 processes.

When compiled with the "-check all" option, this code throws a runtime error for line 33 (the MPI_GATHER call):
forrtl: severe (408): fort: (8): Attempt to fetch from allocatable variable RECEIVE_BUFFER when it is not allocated

When compiled without -check all, it runs without errors and produces the expected result.

And of course, if I allocate the receive buffer on all processes, the code runs fine no matter which flags I use.

My understanding was that the receive buffer is not required on all participating processes for MPI_GATHER, only on the receiving process. The documentation for some MPI implementations I read even has an example very similar to this.

Is the error checking too strict? Does Intel MPI implement collective routines like MPI_GATHER differently? Am I missing something?

I get that this could all be avoided with an in-place communication, but I would prefer to understand what is going on here first, since this issue does not seem to be exclusive to MPI_GATHER.

TobiasK
Moderator

@Axel_Foley you are mixing two things here.
-check all is a Fortran compiler option, and it correctly aborts because it checks correctness purely from the Fortran point of view. The compiler's run-time checks are unaware of MPI.

I would recommend allocating the buffer with a dummy size of 1 or 0 on the ranks other than root.
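
A minimal sketch of that suggestion, assuming the same setup as the program in the question (40 elements, a rank count that divides 40, the mpi_f08 module). The program name and the send_chunk array are made up for illustration. Non-root ranks allocate a zero-size receive buffer, so the actual argument is always allocated and the run-time check has nothing to complain about:

```fortran
program gather_dummy_recvbuf
    use iso_fortran_env, only: int32
    use mpi_f08
    implicit none

    ! Assumption: arr_length is a whole multiple of the number of ranks
    integer, parameter :: arr_length = 40
    integer :: r, nranks, chunksize
    integer(int32), allocatable :: send_chunk(:), receive_buffer(:)

    call MPI_Init()
    call MPI_Comm_size(MPI_COMM_WORLD, nranks)
    call MPI_Comm_rank(MPI_COMM_WORLD, r)

    ! Each rank contributes chunksize copies of its own rank number
    chunksize = arr_length / nranks
    allocate(send_chunk(chunksize))
    send_chunk = r

    if (r == 0) then
        ! Root needs room for the contributions of all ranks
        allocate(receive_buffer(arr_length))
    else
        ! Dummy allocation: MPI ignores the receive buffer on non-root ranks,
        ! but Fortran requires the actual argument to be allocated
        allocate(receive_buffer(0))
    end if

    call MPI_Gather(send_chunk, chunksize, MPI_INTEGER4, &
                    receive_buffer, chunksize, MPI_INTEGER4, 0, MPI_COMM_WORLD)

    if (r == 0) write(*,*) receive_buffer

    call MPI_Finalize()
end program gather_dummy_recvbuf
```

A size of 1 instead of 0 works the same way. The only point is that receive_buffer is in an allocated state on every rank when it is passed to MPI_Gather.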

Axel_Foley
Beginner

What I take from this is that both of these statements are correct:

1) MPI_GATHER does not require the receive buffer to be allocated on any process other than root.

2) Despite this, the compiler is right to flag the "missing" allocation as an error.

So, contrary to the code examples one might find in the documentation for other MPI libraries (https://www.open-mpi.org/doc/v3.1/man3/MPI_Gather.3.php Example 2), the receive buffer has to be allocated on all processes, even if it is just a dummy allocation. The code might still run fine without it if debug flags are not enabled, but that seems like asking for trouble.

TobiasK
Moderator

@Axel_Foley

1) is of no relevance here.

It is the Fortran standard that disallows passing an unallocated array to a procedure in almost all cases:

https://community.intel.com/t5/Intel-Fortran-Compiler/Passing-unallocated-allocatable-arrays-to-subroutines/m-p/850193#M65652
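
To illustrate the rule without any MPI involved, here is a small sketch with made-up names (unallocated_actual, takes_allocatable, takes_plain). As far as I understand the standard, an unallocated allocatable may be passed when the dummy argument is itself ALLOCATABLE, but not when the dummy is an ordinary array, which is the situation a receive buffer passed to MPI_GATHER is in:

```fortran
program unallocated_actual
    implicit none
    integer, allocatable :: buf(:)

    ! Fine: the dummy argument is ALLOCATABLE, so the allocation status
    ! is passed along and can be inspected inside the procedure
    call takes_allocatable(buf)

    ! Not standard-conforming while buf is unallocated: the dummy is an
    ! ordinary array. This is the kind of call that -check all is
    ! expected to catch at run time.
    ! call takes_plain(buf)

contains

    subroutine takes_allocatable(a)
        integer, allocatable, intent(in) :: a(:)
        write(*,*) 'allocated? ', allocated(a)
    end subroutine takes_allocatable

    subroutine takes_plain(a)
        integer, intent(inout) :: a(:)
        a = 0
    end subroutine takes_plain

end program unallocated_actual
```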

 

There is no 'MPI' compiler; it is still the Fortran compiler. The mpiifort/mpiifx/mpif90 scripts are just wrappers that add the necessary libraries and link options. That is why -check all checks for Fortran standard conformance, not for MPI conformance.

Bottom line: the code is Fortran, and in Fortran the code is illegal, no matter what the MPI standard says.
