I compile the following Fortran program
program hello
  use omp_lib
  implicit none
  integer, parameter :: ns = 300, ny = 960, nx = 360
  integer, parameter :: EXTRA = 0
  integer :: ix, iy, is
  double precision, allocatable :: b2stbr_phys_sna(:), sna0(:,:,:,:), na(:,:,:)
  double precision :: T_START, T_END
  allocate(b2stbr_phys_sna(0:ns-1 + EXTRA))
  allocate(sna0(-1:nx,-1:ny,0:1,0:ns-1 + EXTRA))
  allocate(na(-1:nx,-1:ny,0:ns-1 + EXTRA))
  b2stbr_phys_sna = 0.0
  sna0 = 0.01
  na = 0.02
  T_START = omp_get_wtime()
  !$OMP PARALLEL DO DEFAULT(NONE) COLLAPSE(2) SHARED(sna0,na) PRIVATE(is,iy,ix) REDUCTION(+:b2stbr_phys_sna)
  do is=0,ns-1
    do iy=-1,ny
      !$OMP SIMD REDUCTION(+:b2stbr_phys_sna)
      do ix=-1,nx
        b2stbr_phys_sna(is)=b2stbr_phys_sna(is)+sna0(ix,iy,0,is)+sna0(ix,iy,1,is)*na(ix,iy,is)
      enddo
    enddo
  enddo
  !$OMP END PARALLEL DO
  T_END = omp_get_wtime()
  deallocate(b2stbr_phys_sna)
  deallocate(sna0)
  deallocate(na)
  PRINT *, "Work took", T_END - T_START, "seconds"
end program hello
with ifx as:
ifx -g -O2 -qopt-report=3 -qopenmp -xhost -mprefer-vector-width=512 ifx_test.f90 -o ifx_test.exe
and with ifort as:
ifort -g -O2 -qopt-report=3 -qopenmp -xhost -qopt-zmm-usage=high ifx_test.f90 -o ifx_test.exe
I then set the thread count and run the executable:
export OMP_NUM_THREADS=2
./ifx_test.exe
This produces a segmentation fault.
With gfortran 13.2.0, compiling with
gfortran ifx_test.f90 -fopenmp -fopenmp-simd -O3 -o g_ifx_test.exe
and running with export OMP_NUM_THREADS=2 produces no error.
When I remove the
!$OMP SIMD REDUCTION(+:b2stbr_phys_sna)
line, the program always runs successfully with ifort / ifx, regardless of the number of threads.
Could this be an ifort / ifx compiler bug with OpenMP SIMD?
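As long as the is loop alone is distributed across threads, your loop structure does not have multiple threads writing to the same cells of array b2stbr_phys_sna, and therefore the REDUCTION clause on the !$OMP PARALLEL... directive is not required. (Note that COLLAPSE(2) distributes the combined is and iy iterations, so two threads can end up sharing the same is; without COLLAPSE(2) each is belongs to exactly one thread.)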
The REDUCTION clause on the !$OMP SIMD should not be required either. The compiler optimization should see that a summation is being performed into a scalar: the LHS of the = is scalar with respect to the ix loop, the RHS can be vectorized, and there are no loop-order dependencies.
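One way to make the "summation to a scalar" explicit is to accumulate into a private scalar and store it once per is. The following is only a minimal sketch, not a confirmed fix for the segmentation fault: the name acc, the reduced array extents, and the self-check at the end are assumptions added for illustration, and COLLAPSE(2) is dropped on purpose so that each is is handled by a single thread.

```fortran
program scalar_accumulator_sketch
  implicit none
  ! Smaller extents than in the original post so the sketch runs quickly (assumption).
  integer, parameter :: ns = 30, ny = 96, nx = 36
  integer :: ix, iy, is
  double precision, allocatable :: b2stbr_phys_sna(:), sna0(:,:,:,:), na(:,:,:)
  double precision :: acc, expected   ! acc is a hypothetical helper variable

  allocate(b2stbr_phys_sna(0:ns-1))
  allocate(sna0(-1:nx,-1:ny,0:1,0:ns-1))
  allocate(na(-1:nx,-1:ny,0:ns-1))
  b2stbr_phys_sna = 0.0d0
  sna0 = 0.01d0
  na = 0.02d0

  ! No COLLAPSE: each value of "is" is processed by exactly one thread,
  ! so the shared array needs no REDUCTION clause on the parallel loop.
  !$OMP PARALLEL DO DEFAULT(NONE) SHARED(sna0,na,b2stbr_phys_sna) PRIVATE(is,iy,ix,acc)
  do is = 0, ns-1
    acc = 0.0d0
    do iy = -1, ny
      ! The SIMD reduction targets a plain scalar instead of the array.
      !$OMP SIMD REDUCTION(+:acc)
      do ix = -1, nx
        acc = acc + sna0(ix,iy,0,is) + sna0(ix,iy,1,is)*na(ix,iy,is)
      enddo
    enddo
    ! One store per "is", performed by the thread that owns this iteration.
    b2stbr_phys_sna(is) = b2stbr_phys_sna(is) + acc
  enddo
  !$OMP END PARALLEL DO

  ! Every element should equal (nx+2)*(ny+2)*(0.01 + 0.01*0.02).
  expected = dble(nx+2)*dble(ny+2)*(0.01d0 + 0.01d0*0.02d0)
  if (maxval(abs(b2stbr_phys_sna - expected)) > 1.0d-9*expected) error stop "unexpected sum"
  print *, "b2stbr_phys_sna(0) =", b2stbr_phys_sna(0)

  deallocate(b2stbr_phys_sna, sna0, na)
end program scalar_accumulator_sketch
```

The sketch should build with the same OpenMP flags as in the question. Whether it avoids the crash with ifort / ifx has not been verified here; it only removes the array-valued reductions so that the SIMD loop reduces into a scalar.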
You can use VTune on fully optimized code, and examine the Disassembly to see if the code was vectorized.
Jim Dempsey