Intel® Fortran Compiler

mpi_allgatherv with ilp64

m_sulc
Beginner

Hello, I have been playing around with a simple program within the MPI framework. The idea is to construct a row-wise partitioned matrix and then call MPI_ALLGATHERV to collect the complete matrix on all CPUs (the matrix is not particularly large, but the evaluation of its individual elements is independent and rather expensive). One way to collect the data would be to iterate over the columns of the matrix and call MPI_ALLGATHERV on each column independently. However, I tried to do it in a more MPI-like fashion: I defined a custom MPI type using MPI_TYPE_VECTOR (as shown in the minimal example at the end of the post) so that a single call of MPI_ALLGATHERV suffices.
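For comparison, the column-by-column variant I had in mind would look roughly like this (a sketch only, reusing the declarations and the setup of the full example below):
[fortran]
! sketch of the column-wise alternative: one MPI_ALLGATHERV per column;
! all variables are declared and initialized as in the full example below
DO i = 1, number_of_states
    CALL MPI_ALLGATHERV( &
        psi_local(points_start_index, i), number_of_points_per_proc(my_id), MPI_REAL8, &
        psi(1, i), number_of_points_per_proc, gather_displ_points, MPI_REAL8, &
        MPI_COMM_WORLD, ierr)
END DO
[/fortran]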

The full program, listed at the end of this post, works (or seems to work) correctly when compiled in a straightforward fashion:
[bash]
mpiifort -o gather.lp64 gather.f90
mpirun -n 2 ./gather.lp64
[/bash]
Moreover, the results are independent of the optimization level.

However, for certain reasons, I need to use the ILP64 interface. Following the instructions in the Intel MPI reference manual, I compiled and executed the program like this:
[bash]
mpiifort -f90=ifort -fc=ifort -c -warn all -O1 -i8 -I$MKLROOT/include/intel64/ilp64 -I${MKLROOT}/include -I${I_MPI_ROOT}/include64  -o gather.o gather.f90
mpiifort -f90=ifort -fc=ifort -ilp64 -warn all -i8 -o gather.ilp64 gather.o ${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm
mpirun -ilp64 -n 2 ./gather.ilp64
[/bash]

Now, the strange thing is that this produces the expected results with -O0, -O2, and -O3, but a segmentation fault pops up with -O1. Strangely enough, the segmentation fault disappears when ${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a is removed from the link line; however, I need this library for certain BLAS ILP64 calls (not used in the minimal example below).

I am using ifort Version 13.0.1.117 Build 20121010 and the Intel MPI Library 4.0.3.008.

Any ideas what might be wrong? Perhaps some arguments of the MPI calls are still supposed to be INTEGER(KIND=4) even with -i8? For comparison, in the case of MKL the manual says one should check the header files to find out the correct kinds; however, the mpif.h header (the one recommended for ILP64) didn't provide me with any additional insight...
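For reference, this is a minimal sketch of the kind of check I attempted; as far as I can tell, MPI_ADDRESS_KIND and MPI_OFFSET_KIND are the only kind parameters that mpif.h exposes:
[fortran]
PROGRAM check_kinds
    IMPLICIT NONE
    INCLUDE 'mpif.h'
!   the only kind parameters mpif.h seems to provide
    WRITE(*, *) 'MPI_ADDRESS_KIND:', MPI_ADDRESS_KIND
    WRITE(*, *) 'MPI_OFFSET_KIND: ', MPI_OFFSET_KIND
END PROGRAM check_kinds
[/fortran]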

[fortran]
PROGRAM gather
    IMPLICIT NONE
    INCLUDE 'mpif.h'
!
    INTEGER, PARAMETER :: dp = KIND(1D0)
    INTEGER, PARAMETER :: number_of_states = 2
    INTEGER, PARAMETER :: number_of_points = 7
!
    INTEGER :: i
    INTEGER :: nproc, my_id, ierr
    INTEGER(KIND = MPI_ADDRESS_KIND) :: lb, extent
    INTEGER :: ROW_TYPE, ROW_TYPE_RESIZED
!
    REAL(dp), DIMENSION(:, :), ALLOCATABLE :: psi, psi_local, psi_local_tr
    INTEGER, ALLOCATABLE :: number_of_points_per_proc(:), gather_displ_points(:)
    INTEGER :: points_start_index, points_end_index
!
    CALL MPI_INIT(ierr)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, my_id, ierr)
!
    ALLOCATE(gather_displ_points(0:nproc-1), number_of_points_per_proc(0:nproc-1))
!
!   distribute the rows as evenly as possible; the last rank takes the remainder
    IF(nproc > 1) THEN
        number_of_points_per_proc(0:nproc-2) = number_of_points / nproc
        number_of_points_per_proc(nproc-1) = number_of_points - SUM(number_of_points_per_proc(0:nproc-2))
    ELSE
        number_of_points_per_proc(0) = number_of_points
    END IF
    gather_displ_points(0) = 0
    DO i = 0, nproc - 2
        gather_displ_points(i + 1) = gather_displ_points(i) + number_of_points_per_proc(i)
    END DO
!
    points_start_index = gather_displ_points(my_id) + 1
    points_end_index = points_start_index + number_of_points_per_proc(my_id) - 1
!
    ALLOCATE(psi_local(points_start_index:points_end_index, number_of_states))
    ALLOCATE(psi_local_tr(number_of_states, points_start_index:points_end_index))
    ALLOCATE(psi(number_of_points, number_of_states))
!
!   ROW_TYPE describes one row of psi: number_of_states elements, one per
!   column, separated by the leading dimension of psi (number_of_points)
    CALL MPI_TYPE_VECTOR(number_of_states, 1, number_of_points, MPI_REAL8, ROW_TYPE, ierr)
    CALL MPI_TYPE_COMMIT(ROW_TYPE, ierr)
!   shrink the extent to one REAL8 so that consecutive rows in the receive
!   buffer start one element apart
    CALL MPI_TYPE_GET_EXTENT(MPI_REAL8, lb, extent, ierr)
    CALL MPI_TYPE_CREATE_RESIZED(ROW_TYPE, lb, extent, ROW_TYPE_RESIZED, ierr)
    CALL MPI_TYPE_COMMIT(ROW_TYPE_RESIZED, ierr)
!
    psi = 0
    psi_local = 8 + my_id
    psi_local_tr = TRANSPOSE(psi_local)
!
    IF(my_id .EQ. 0) THEN
        WRITE(*, *) "calling MPI_ALLGATHERV"
        WRITE(*, *) number_of_points_per_proc
        WRITE(*, *) gather_displ_points
    END IF
!
!   each rank contributes its rows as a contiguous block of REAL8s (hence
!   the transposed buffer); the receive side scatters them into psi row by
!   row via the resized type, with counts and displacements in units of rows
    CALL MPI_ALLGATHERV( &
        psi_local_tr, number_of_points_per_proc(my_id)*number_of_states, MPI_REAL8, &
        psi, number_of_points_per_proc, gather_displ_points, ROW_TYPE_RESIZED, MPI_COMM_WORLD, ierr)
!
    IF(my_id .EQ. 0) THEN
        DO i = 1, number_of_points
            WRITE(*, *) psi(i, :)
        END DO
    END IF
!
    DEALLOCATE(psi, psi_local, psi_local_tr)
    CALL MPI_FINALIZE(ierr)
END PROGRAM gather
[/fortran]

Ron_Green
Moderator

Your suspicion is correct: you cannot use -i8 to force integers to KIND=8 and then use those in MPI calls; MPI expects KIND=4 integers.

You will get stack corruption and odd seg faults like the ones you are seeing.
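If you need -i8 for the rest of your code, one possible workaround (an untested sketch, assuming the standard LP64 MPI library) is to declare everything that is passed to an MPI routine explicitly as KIND=4, since -i8 only changes the default INTEGER kind:
[fortran]
! untested sketch: MPI-facing integers pinned to KIND=4 so that -i8
! (which only widens default INTEGERs) does not affect them
PROGRAM pinned
    IMPLICIT NONE
    INCLUDE 'mpif.h'
    INTEGER(KIND=4) :: nproc, my_id, ierr
    CALL MPI_INIT(ierr)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, my_id, ierr)
    WRITE(*, *) 'rank', my_id, 'of', nproc
    CALL MPI_FINALIZE(ierr)
END PROGRAM pinned
[/fortran]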

ron

m_sulc
Beginner

Perhaps I am missing something, but judging by the example from the Intel MPI manual, I would say that the arguments of the MPI calls are 8-byte (in the ILP64 case), aren't they? The numeric literals, e.g., in MPI_SEND, will also be 8-byte with -i8, or am I mistaken?
http://software.intel.com/sites/products/documentation/hpc/ics/impi/41/lin/Reference_Manual/6_1_Using_ILP64.htm
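Just to illustrate that -i8 really does widen the default INTEGER (a trivial check):
[fortran]
PROGRAM i8_check
    IMPLICIT NONE
    INTEGER :: n
!   prints 64 when compiled with -i8, 32 otherwise
    WRITE(*, *) 'default INTEGER bit size:', BIT_SIZE(n)
END PROGRAM i8_check
[/fortran]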

Also, section 3.5.6.2 of the Intel MPI reference manual says: "Use the mpif.h file instead of the MPI module in Fortran90* applications. The Fortran module supports 32-bit INTEGER size only." So I was wondering what the point of that recommendation would be if the MPI calls supported exclusively 4-byte integers...
