
Possible bug with ifort/ifx 2023.2.0 and OpenMP SIMD

Gaurav-Saxena
New User

I compile the following Fortran program:

program hello
  use omp_lib
  implicit none
  integer, parameter :: ns = 300, ny = 960, nx = 360
  integer, parameter :: EXTRA = 0
  integer :: ix, iy, is
  double precision, allocatable :: b2stbr_phys_sna(:), sna0(:,:,:,:), na(:,:,:)
  double precision :: T_START, T_END

  allocate(b2stbr_phys_sna(0:ns-1 + EXTRA))
  allocate(sna0(-1:nx,-1:ny,0:1,0:ns-1 + EXTRA))
  allocate(na(-1:nx,-1:ny,0:ns-1 + EXTRA))

  b2stbr_phys_sna=0.0
  sna0 = 0.01
  na = 0.02

  T_START = omp_get_wtime()
  !$OMP PARALLEL DO DEFAULT(NONE) COLLAPSE(2) SHARED(sna0,na) PRIVATE(is,iy,ix) REDUCTION(+:b2stbr_phys_sna)
  do is=0,ns-1
    do iy=-1,ny
      !$OMP SIMD REDUCTION(+:b2stbr_phys_sna)
      do ix=-1,nx
        b2stbr_phys_sna(is)=b2stbr_phys_sna(is)+sna0(ix,iy,0,is)+sna0(ix,iy,1,is)*na(ix,iy,is)
      enddo
    enddo
  enddo
  !$OMP END PARALLEL DO
  T_END = omp_get_wtime()

  deallocate(b2stbr_phys_sna)
  deallocate(sna0)
  deallocate(na)

  PRINT *, "Work took", T_END - T_START, "seconds"

end program hello

with ifx as:

ifx -g -O2 -qopt-report=3 -qopenmp -xhost -mprefer-vector-width=512 ifx_test.f90 -o ifx_test.exe

and with ifort as:

ifort -g -O2 -qopt-report=3 -qopenmp -xhost -qopt-zmm-usage=high ifx_test.f90 -o ifx_test.exe

I then set OMP_NUM_THREADS=2 (export OMP_NUM_THREADS=2) and run the program with ./ifx_test.exe. It produces a segmentation fault.

With gfortran 13.2.0, compiling with

gfortran ifx_test.f90 -fopenmp -fopenmp-simd -O3 -o g_ifx_test.exe

and running with export OMP_NUM_THREADS=2 produces no error.

When I remove the line

!$OMP SIMD REDUCTION(+:b2stbr_phys_sna)

the program always runs successfully with ifort and ifx, with any number of threads.

Could this be an ifort/ifx compiler bug with OpenMP SIMD?

jimdempseyatthecove
Honored Contributor III

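The loop structure you have does not present multiple threads writing to the same cells of array b2stbr_phys_sna, provided each thread handles its own range of is. That is the case when only the is loop is parallelized; with COLLAPSE(2) the combined (is, iy) iterations are split across threads, so it holds only when the chunk boundaries happen to fall on whole values of is. Under that condition the REDUCTION clause on the !$OMP PARALLEL DO directive is not required.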

 

The REDUCTION clause on the !$OMP SIMD directive should not be required either. The compiler's optimizer should see that a summation into a scalar is being performed: the LHS of the = is scalar (b2stbr_phys_sna(is) does not depend on ix), the RHS can be vectorized, and there are no loop-order dependencies.
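As a minimal sketch of the two points above, the loop nest could be written with only the is loop parallelized and a private scalar accumulator in place of the array reduction. The name acc, the program name, the reduced array sizes, and the self-check at the end are assumptions added for illustration and are not part of the original post; whether this form avoids the segmentation fault with ifort/ifx 2023.2.0 has not been verified here.

```fortran
program scalar_accumulator_sketch
  implicit none
  ! Sizes reduced from the original post so the sketch runs quickly (assumption).
  integer, parameter :: ns = 30, ny = 96, nx = 36
  integer :: ix, iy, is
  double precision, allocatable :: b2stbr_phys_sna(:), sna0(:,:,:,:), na(:,:,:)
  double precision :: acc, expected

  allocate(b2stbr_phys_sna(0:ns-1))
  allocate(sna0(-1:nx,-1:ny,0:1,0:ns-1))
  allocate(na(-1:nx,-1:ny,0:ns-1))

  b2stbr_phys_sna = 0.0d0
  sna0 = 0.01d0
  na = 0.02d0

  ! Only the is loop is work-shared (no COLLAPSE), so each thread owns whole
  ! elements of b2stbr_phys_sna and no array REDUCTION clause is used.
  !$OMP PARALLEL DO DEFAULT(NONE) SHARED(sna0,na,b2stbr_phys_sna) PRIVATE(is,iy,ix,acc)
  do is = 0, ns-1
    acc = 0.0d0                 ! hypothetical per-thread scalar accumulator
    do iy = -1, ny
      ! No SIMD directive here: the sum into a scalar is left to the
      ! compiler's auto-vectorizer. If a directive is wanted, it would
      ! have to be "!$OMP SIMD REDUCTION(+:acc)" on the scalar.
      do ix = -1, nx
        acc = acc + sna0(ix,iy,0,is) + sna0(ix,iy,1,is)*na(ix,iy,is)
      enddo
    enddo
    b2stbr_phys_sna(is) = acc   ! one write per is, by the owning thread
  enddo
  !$OMP END PARALLEL DO

  ! Every element sums the same constant term over (nx+2)*(ny+2) points.
  expected = dble((nx+2)*(ny+2)) * (0.01d0 + 0.01d0*0.02d0)
  if (maxval(abs(b2stbr_phys_sna - expected)) < 1.0d-9*expected) then
    print *, "OK"
  else
    print *, "MISMATCH"
  endif

  deallocate(b2stbr_phys_sna)
  deallocate(sna0)
  deallocate(na)
end program scalar_accumulator_sketch
```

The sketch should build with the same command lines given in the question (the source file name would change). Dropping COLLAPSE(2) leaves ns outer iterations to share among the threads, which is plenty for a small thread count but may balance less evenly than the collapsed loop when the thread count does not divide ns.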

 

You can use VTune on the fully optimized code and examine the disassembly to see whether the loop was vectorized.

 

Jim Dempsey
