Intel® Fortran Compiler

Issues when running in optimized mode with Intel 18.1

Munafo__Alessandro

Hello,

I am Alessandro Munafo' (post-doc at the University of Illinois at Urbana-Champaign). I recently downloaded a 30-day trial version of the Intel compiler (18.1) and tested it on a simple code solving the Euler equations for quasi-1D nozzle flow. The program uses the PETSc library from Argonne National Laboratory (version 3.8.2). I built PETSc with mpich3, openmpi2 and the MPI implementation from Intel 18.1. My OS is Fedora 24.

When I compile with the flags 'FCFLAGS=-O3 -fp-model precise' and run the program, I get an error in the call to MPI_Allreduce at the bottom of the subroutine 'update_solution' given below. As the name suggests, this subroutine updates the numerical solution after each time step. The error occurs irrespective of the MPI implementation in use.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

subroutine update_solution()

#include<petscversion.h>

#if !PETSC_VERSION_LT(3,8,0)
#include "petsc_include.h"    
    use data_kind_prec,     only: hpc_4, hpc_8
    use data_general,       only: nb_eq, nb_phys, master_node
    use data_sol,           only: global_Phys, global_U, global_dU, da_UExpl, da_UImpl, da_Phys, res_dU
    use data_mesh,          only: loc_left, loc_right
    use physics_EU_NS,      only: cons_to_phys_EU
    use petsc
#else
    use data_kind_prec,     only: hpc_4, hpc_8
    use data_general,       only: nb_eq, nb_phys, master_node
    use data_sol,           only: global_Phys, global_U, global_dU, da_UExpl, da_UImpl, da_Phys, res_dU
    use data_mesh,          only: loc_left, loc_right
    use physics_EU_NS,      only: cons_to_phys_EU
#endif

    implicit none

#if PETSC_VERSION_LT(3,8,0)
#include "petsc_include.h"    
#endif

    integer(kind=hpc_4)  :: i, e, p, proc_id
    PetscErrorCode       :: ierr
    PetscScalar, pointer :: array_Phys(:,:)
    PetscScalar, pointer :: array_dU(:,:)
    PetscScalar, pointer :: array_U(:,:)
    real(kind=hpc_8)     :: du, u
    real(kind=hpc_8), dimension(nb_eq) :: cons, res, res_tot
    real(kind=hpc_8), dimension(nb_phys) :: phys

    ! Get the current processor id
    call mpi_comm_rank(PETSC_COMM_WORLD, proc_id, ierr)

    ! Get access to arrays
    call DMDAVecGetArrayF90(da_Phys, global_Phys, array_Phys, ierr)
    call DMDAVecGetArrayF90(da_UImpl, global_dU, array_dU, ierr)
    call DMDAVecGetArrayF90(da_UExpl, global_U, array_U, ierr)

    ! Loop over cells
    res = 0.d0
    do i = loc_left,loc_right

       ! Update conservative variables
       do e = 1,nb_eq
          du     = array_dU(e - 1,i)
          res(e) = res(e) + du*du
          u = array_U(e - 1,i) + du
          array_U(e - 1,i) = u
          cons(e)          = u
       enddo

       ! Get physical variables based on the updated conservative variables
       call cons_to_phys_EU(cons, phys)
       do p = 1,nb_phys
          array_Phys(p - 1,i) = phys(p)
       enddo
       
    enddo
    
    ! Restore arrays
    call DMDAVecRestoreArrayF90(da_Phys, global_Phys, array_Phys, ierr)
    call DMDAVecRestoreArrayF90(da_UImpl, global_dU, array_dU, ierr)
    call DMDAVecRestoreArrayF90(da_UExpl, global_U, array_U, ierr)
    
    ! Sum the residual contributions from all sub-domains and store the result in the master node
    ! (strictly, a plain "reduce" to the master node would suffice, since only the master node
    ! prints the residual to screen/file)
    call mpi_allreduce(res, res_tot, nb_eq, MPI_DOUBLE_PRECISION, MPI_SUM, PETSC_COMM_WORLD, ierr)
    res_dU = res_tot    

  end subroutine update_solution

%%%%%%%%%%%%%%%%%%%%%%%%%%%

The error message I get is the following:

Fatal error in MPI_Allreduce: Invalid buffer pointer, error stack:
MPI_Allreduce(907): MPI_Allreduce(sbuf=(nil), rbuf=0x7ffc79791520, count=3, MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD) failed
MPI_Allreduce(871): Null buffer pointer
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=6770918

The message seems to suggest that there is a problem with the buffer arrays 'res' and 'res_tot' passed to 'MPI_Allreduce': note that the send buffer is reported as a null pointer (sbuf=(nil)) while the receive buffer has a valid address. The size of those arrays is 'nb_eq' (the number of governing equations), which is equal to 3, so they are very small.
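Just to spell out what the declarations imply (this is an illustrative sketch I wrote for this post, not code from the tarball): if 'nb_eq' is an ordinary module variable rather than a named constant (I cannot tell this from the excerpt above alone), then 'res' and 'res_tot' are automatic arrays whose extent is only fixed at run time, on entry to the subroutine:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

! Illustrative sketch only: arrays dimensioned with a module variable are
! automatic arrays, i.e. storage of run-time extent created on entry to the
! procedure (typically on the stack).
module data_general_like
   implicit none
   integer :: nb_eq = 3          ! run-time value standing in for data_general's nb_eq
end module data_general_like

program show_automatic_buffers
   use data_general_like, only: nb_eq
   implicit none
   integer, parameter :: dp = kind(1.d0)
   call residual_buffers()
contains
   subroutine residual_buffers()
      real(kind=dp), dimension(nb_eq) :: res, res_tot   ! automatic buffers, as in 'update_solution'
      res     = 0.d0
      res_tot = 0.d0
      print *, 'size(res) =', size(res), '  size(res_tot) =', size(res_tot)
   end subroutine residual_buffers
end program show_automatic_buffers

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%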

I tried to investigate this issue by progressively reducing the optimization level and observed the following (a stand-alone check is also sketched after this list):

FFLAGS=$(USELIB) -O0 -fp-model precise  ! WORKS

FFLAGS=$(USELIB) -O1 -fp-model precise ! WORKS

FFLAGS=$(USELIB) -O2 -fp-model precise ! does NOT work

FFLAGS=$(USELIB) -O3 -fp-model precise  ! does NOT work
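To see whether this bare call pattern alone triggers the failure, independently of PETSc and of the rest of the solver, one could compile a small stand-alone test with the same flag combinations listed above. The sketch below is something I put together for this post (it is not part of the attached tarball); it assumes only the standard 'mpi' Fortran module and replaces PETSC_COMM_WORLD with MPI_COMM_WORLD, while mimicking the automatic-array buffers of 'update_solution':

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

! Hypothetical stand-alone test (not from the attached tarball): automatic
! arrays whose extent comes from a module variable are passed directly to
! mpi_allreduce, mimicking 'res'/'res_tot' in 'update_solution'.
module repro_data
   implicit none
   integer, parameter :: dp = kind(1.d0)
   integer :: nb_eq = 3                    ! run-time value, as nb_eq in data_general
end module repro_data

program allreduce_repro
   use mpi
   use repro_data, only: dp, nb_eq
   implicit none
   integer :: ierr, proc_id

   call mpi_init(ierr)
   call mpi_comm_rank(MPI_COMM_WORLD, proc_id, ierr)
   call reduce_residual()
   call mpi_finalize(ierr)

contains

   subroutine reduce_residual()
      ! Same pattern as the end of 'update_solution': local automatic buffers
      ! passed straight to the collective call
      real(kind=dp), dimension(nb_eq) :: res, res_tot
      res = real(proc_id + 1, kind=dp)      ! dummy per-rank "residual"
      call mpi_allreduce(res, res_tot, nb_eq, MPI_DOUBLE_PRECISION, MPI_SUM, &
                         MPI_COMM_WORLD, ierr)
      if (proc_id == 0) print *, 'res_tot =', res_tot
   end subroutine reduce_residual

end program allreduce_repro

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

If this minimal test also fails at -O2/-O3 with the same "Invalid buffer pointer" message, that would point at the call pattern itself rather than at PETSc or the rest of the code.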

I have also modified the above subroutine by making the arrays 'res' and 'res_tot' allocatable as follows:

%%%%%%%%%%%%%%%%%%%%%%%%%%

subroutine update_solution()

#include<petscversion.h>

#if !PETSC_VERSION_LT(3,8,0)
#include "petsc_include.h"    
    use data_kind_prec,     only: hpc_4, hpc_8
    use data_general,       only: nb_eq, nb_phys, master_node
    use data_sol,           only: global_Phys, global_U, global_dU, da_UExpl, da_UImpl, da_Phys, res_dU
    use data_mesh,          only: loc_left, loc_right
    use physics_EU_NS,      only: cons_to_phys_EU
    use petsc
#else
    use data_kind_prec,     only: hpc_4, hpc_8
    use data_general,       only: nb_eq, nb_phys, master_node
    use data_sol,           only: global_Phys, global_U, global_dU, da_UExpl, da_UImpl, da_Phys, res_dU
    use data_mesh,          only: loc_left, loc_right
    use physics_EU_NS,      only: cons_to_phys_EU
#endif

    implicit none

#if PETSC_VERSION_LT(3,8,0)
#include "petsc_include.h"    
#endif

    integer(kind=hpc_4)  :: i, e, p, proc_id
    PetscErrorCode       :: ierr
    PetscScalar, pointer :: array_Phys(:,:)
    PetscScalar, pointer :: array_dU(:,:)
    PetscScalar, pointer :: array_U(:,:)
    real(kind=hpc_8)     :: du, u
    real(kind=hpc_8), dimension(nb_eq) :: cons
    real(kind=hpc_8), dimension(nb_phys) :: phys
    real(kind=hpc_8), dimension(:), allocatable :: res, res_tot

    allocate(res(nb_eq),res_tot(nb_eq))

    ! Get the current processor id
    call mpi_comm_rank(PETSC_COMM_WORLD, proc_id, ierr)

    ! Get access to arrays
    call DMDAVecGetArrayF90(da_Phys, global_Phys, array_Phys, ierr)
    call DMDAVecGetArrayF90(da_UImpl, global_dU, array_dU, ierr)
    call DMDAVecGetArrayF90(da_UExpl, global_U, array_U, ierr)

    ! Loop over cells
    res = 0.d0
    do i = loc_left,loc_right

       ! Update conservative variables
       do e = 1,nb_eq
          du     = array_dU(e - 1,i)
          res(e) = res(e) + du*du
          u = array_U(e - 1,i) + du
          array_U(e - 1,i) = u
          cons(e)          = u
       enddo

       ! Get physical variables based on the updated conservative variables
       call cons_to_phys_EU(cons, phys)
       do p = 1,nb_phys
          array_Phys(p - 1,i) = phys(p)
       enddo
       
    enddo
    
    ! Restore arrays
    call DMDAVecRestoreArrayF90(da_Phys, global_Phys, array_Phys, ierr)
    call DMDAVecRestoreArrayF90(da_UImpl, global_dU, array_dU, ierr)
    call DMDAVecRestoreArrayF90(da_UExpl, global_U, array_U, ierr)
    
    ! Sum the residual contributions from all sub-domains and store the result in the master node
    ! (strictly, a plain "reduce" to the master node would suffice, since only the master node
    ! prints the residual to screen/file)
    call mpi_allreduce(res, res_tot, nb_eq, MPI_DOUBLE_PRECISION, MPI_SUM, PETSC_COMM_WORLD, ierr)
    res_dU = res_tot    

    deallocate(res,res_tot)

  end subroutine update_solution

%%%%%%%%%%%%%%%%%%%%%%%%%%%

With this last implementation the program runs and produces the expected results, irrespective of the compilation flags.

I did not observe any issue when the original implementation (where 'res' and 'res_tot' are not allocatable) was run with the GCC 7.1 compiler.

Has anyone ever faced this problem when using Intel 18.1? I have checked my implementation over and over and everything seems to be correct. Could there be an issue with the compiler when aggressive optimization such as '-O3' is turned on?

The whole code (src + Makefile) is in the attached tarball. 

I thank you in advance,

Regards,

Alessandro Munafo