Hello,
I am Alessandro Munafo' (post-doc at the University of Illinois at Urbana-Champaign). I recently downloaded a 30-day trial version of the Intel compiler (18.1). I tested the compiler on a simple code solving the Euler equations for quasi-1D nozzle flow. The program uses the PETSc library from Argonne National Laboratory (version 3.8.2). I built PETSc against MPICH 3, Open MPI 2, and the MPI implementation shipped with Intel 18.1. My OS is Fedora 24.
When I compile with the flags 'FCFLAGS=-O3 -fp-model precise' and run the program, I get an error in the call to MPI_Allreduce at the bottom of the subroutine 'update_solution' given below. As the name suggests, this subroutine updates the numerical solution after each time step. The error occurs irrespective of the MPI implementation in use.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
subroutine update_solution()
#include <petscversion.h>
#if !PETSC_VERSION_LT(3,8,0)
#include "petsc_include.h"
  use data_kind_prec, only: hpc_4, hpc_8
  use data_general, only: nb_eq, nb_phys, master_node
  use data_sol, only: global_Phys, global_U, global_dU, da_UExpl, da_UImpl, da_Phys, res_dU
  use data_mesh, only: loc_left, loc_right
  use physics_EU_NS, only: cons_to_phys_EU
  use petsc
#else
  use data_kind_prec, only: hpc_4, hpc_8
  use data_general, only: nb_eq, nb_phys, master_node
  use data_sol, only: global_Phys, global_U, global_dU, da_UExpl, da_UImpl, da_Phys, res_dU
  use data_mesh, only: loc_left, loc_right
  use physics_EU_NS, only: cons_to_phys_EU
#endif
  implicit none
#if PETSC_VERSION_LT(3,8,0)
#include "petsc_include.h"
#endif
  integer(kind=hpc_4) :: i, e, p, proc_id
  PetscErrorCode :: ierr
  PetscScalar, pointer :: array_Phys(:,:)
  PetscScalar, pointer :: array_dU(:,:)
  PetscScalar, pointer :: array_U(:,:)
  real(kind=hpc_8) :: du, u
  real(kind=hpc_8), dimension(nb_eq) :: cons, res, res_tot
  real(kind=hpc_8), dimension(nb_phys) :: phys
  ! Get the current processor id
  call mpi_comm_rank(PETSC_COMM_WORLD, proc_id, ierr)
  ! Get access to arrays
  call DMDAVecGetArrayF90(da_Phys, global_Phys, array_Phys, ierr)
  call DMDAVecGetArrayF90(da_UImpl, global_dU, array_dU, ierr)
  call DMDAVecGetArrayF90(da_UExpl, global_U, array_U, ierr)
  ! Loop over cells
  res = 0.d0
  do i = loc_left,loc_right
    ! Update conservative variables
    do e = 1,nb_eq
      du = array_dU(e - 1,i)
      res(e) = res(e) + du*du
      u = array_U(e - 1,i) + du
      array_U(e - 1,i) = u
      cons(e) = u
    enddo
    ! Get physical variables based on the updated conservative variables
    call cons_to_phys_EU(cons, phys)
    do p = 1,nb_phys
      array_Phys(p - 1,i) = phys(p)
    enddo
  enddo
  ! Restore arrays
  call DMDAVecRestoreArrayF90(da_Phys, global_Phys, array_Phys, ierr)
  call DMDAVecRestoreArrayF90(da_UImpl, global_dU, array_dU, ierr)
  call DMDAVecRestoreArrayF90(da_UExpl, global_U, array_U, ierr)
  ! Sum the residual contributions from all sub-domains (an mpi_reduce to the master node
  ! would suffice, as the residual is printed to screen/file only by the master node)
  call mpi_allreduce(res, res_tot, nb_eq, MPI_DOUBLE_PRECISION, MPI_SUM, PETSC_COMM_WORLD, ierr)
  res_dU = res_tot
end subroutine update_solution
%%%%%%%%%%%%%%%%%%%%%%%%%%%
The error message I get is the following:
Fatal error in MPI_Allreduce: Invalid buffer pointer, error stack:
MPI_Allreduce(907): MPI_Allreduce(sbuf=(nil), rbuf=0x7ffc79791520, count=3, MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD) failed
MPI_Allreduce(871): Null buffer pointer
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=6770918
The message suggests a problem with the buffer arrays 'res' and 'res_tot' passed to 'MPI_Allreduce' (note 'sbuf=(nil)' in the error stack: the send-buffer address is null). The size of those arrays is 'nb_eq' (the number of governing equations), which is equal to 3, so a very small array.
I tried to investigate this issue by progressively lowering the optimization level and observed the following:
FFLAGS=$(USELIB) -O0 -fp-model precise ! WORKS
FFLAGS=$(USELIB) -O1 -fp-model precise ! WORKS
FFLAGS=$(USELIB) -O2 -fp-model precise ! does NOT work
FFLAGS=$(USELIB) -O3 -fp-model precise ! does NOT work
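Since only -O2 and above trigger the failure, one stopgap (while the root cause is investigated) is to compile just the affected file at -O1 and leave the rest of the project at -O3. A minimal sketch using GNU Make target-specific variables; the file name 'update_solution.f90' and the compiler name are assumptions, not taken from the attached Makefile:

```make
# Hypothetical Makefile fragment: file and program names are assumptions.
FC     = mpiifort
FFLAGS = $(USELIB) -O3 -fp-model precise

# Target-specific override: only this object is built at -O1.
update_solution.o: FFLAGS = $(USELIB) -O1 -fp-model precise

%.o: %.f90
	$(FC) $(FFLAGS) -c $< -o $@
```

This keeps the workaround out of the source code, which is convenient if the issue turns out to be a compiler bug fixed in a later release.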
I have also modified the above subroutine, making the arrays 'res' and 'res_tot' allocatable, as follows:
%%%%%%%%%%%%%%%%%%%%%%%%%%
subroutine update_solution()
#include <petscversion.h>
#if !PETSC_VERSION_LT(3,8,0)
#include "petsc_include.h"
  use data_kind_prec, only: hpc_4, hpc_8
  use data_general, only: nb_eq, nb_phys, master_node
  use data_sol, only: global_Phys, global_U, global_dU, da_UExpl, da_UImpl, da_Phys, res_dU
  use data_mesh, only: loc_left, loc_right
  use physics_EU_NS, only: cons_to_phys_EU
  use petsc
#else
  use data_kind_prec, only: hpc_4, hpc_8
  use data_general, only: nb_eq, nb_phys, master_node
  use data_sol, only: global_Phys, global_U, global_dU, da_UExpl, da_UImpl, da_Phys, res_dU
  use data_mesh, only: loc_left, loc_right
  use physics_EU_NS, only: cons_to_phys_EU
#endif
  implicit none
#if PETSC_VERSION_LT(3,8,0)
#include "petsc_include.h"
#endif
  integer(kind=hpc_4) :: i, e, p, proc_id
  PetscErrorCode :: ierr
  PetscScalar, pointer :: array_Phys(:,:)
  PetscScalar, pointer :: array_dU(:,:)
  PetscScalar, pointer :: array_U(:,:)
  real(kind=hpc_8) :: du, u
  real(kind=hpc_8), dimension(nb_eq) :: cons
  real(kind=hpc_8), dimension(nb_phys) :: phys
  real(kind=hpc_8), dimension(:), allocatable :: res, res_tot
  allocate(res(nb_eq), res_tot(nb_eq))
  ! Get the current processor id
  call mpi_comm_rank(PETSC_COMM_WORLD, proc_id, ierr)
  ! Get access to arrays
  call DMDAVecGetArrayF90(da_Phys, global_Phys, array_Phys, ierr)
  call DMDAVecGetArrayF90(da_UImpl, global_dU, array_dU, ierr)
  call DMDAVecGetArrayF90(da_UExpl, global_U, array_U, ierr)
  ! Loop over cells
  res = 0.d0
  do i = loc_left,loc_right
    ! Update conservative variables
    do e = 1,nb_eq
      du = array_dU(e - 1,i)
      res(e) = res(e) + du*du
      u = array_U(e - 1,i) + du
      array_U(e - 1,i) = u
      cons(e) = u
    enddo
    ! Get physical variables based on the updated conservative variables
    call cons_to_phys_EU(cons, phys)
    do p = 1,nb_phys
      array_Phys(p - 1,i) = phys(p)
    enddo
  enddo
  ! Restore arrays
  call DMDAVecRestoreArrayF90(da_Phys, global_Phys, array_Phys, ierr)
  call DMDAVecRestoreArrayF90(da_UImpl, global_dU, array_dU, ierr)
  call DMDAVecRestoreArrayF90(da_UExpl, global_U, array_U, ierr)
  ! Sum the residual contributions from all sub-domains (an mpi_reduce to the master node
  ! would suffice, as the residual is printed to screen/file only by the master node)
  call mpi_allreduce(res, res_tot, nb_eq, MPI_DOUBLE_PRECISION, MPI_SUM, PETSC_COMM_WORLD, ierr)
  res_dU = res_tot
  deallocate(res, res_tot)
end subroutine update_solution
%%%%%%%%%%%%%%%%%%%%%%%%%%%
Using this last implementation, the program runs and produces the expected results irrespective of the compilation flags.
I did not observe any issue when the original implementation (where 'res' and 'res_tot' are not allocatable) was run after compiling with GCC 7.1.
Has anyone faced this problem when using Intel 18.1? I have checked my implementation over and over, and everything seems correct. Could there be an issue with the compiler when aggressive optimization such as '-O3' is turned on?
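In case it helps anyone trying to reproduce this without building PETSc, the pattern in question boils down to a fixed-size local array accumulated in a loop and then passed to mpi_allreduce. A stripped-down sketch of that pattern is below; this is my guess at a minimal test case, not a file from the attached tarball:

```fortran
program allreduce_repro
  use mpi
  implicit none
  integer, parameter :: nb_eq = 3
  double precision   :: res(nb_eq), res_tot(nb_eq), du
  integer            :: i, e, ierr

  call mpi_init(ierr)
  ! Mimic the residual accumulation in 'update_solution'
  res = 0.d0
  do i = 1, 100
     do e = 1, nb_eq
        du = 1.d-3*real(i*e, kind(du))
        res(e) = res(e) + du*du
     enddo
  enddo
  ! The failing call: 'res' is a fixed-size local (stack) array
  call mpi_allreduce(res, res_tot, nb_eq, MPI_DOUBLE_PRECISION, &
                     MPI_SUM, MPI_COMM_WORLD, ierr)
  print *, res_tot
  call mpi_finalize(ierr)
end program allreduce_repro
```

If this small case also fails at -O2/-O3 with Intel 18.1 but passes at -O1, it would make a convenient attachment for a compiler bug report.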
The whole code (src + Makefile) is in the attached tarball.
I thank you in advance,
Regards,
Alessandro Munafo