I can write serial code that ifort/ifx can vectorise. However, as soon as I put an array into an OMP private clause, the compiler no longer recognises the array as suitable for aligned access. This makes sense, since each per-thread copy beyond the first is allocated at runtime. However, it seems to me that there must be a solution to this out there. I have spent a lot of time looking for one with no joy.
Here is my minimum working example, filename align.f90,
program openmp_align_test
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 128
  integer :: i
  real(dp), dimension(n) :: array1, array2
  !$OMP PARALLEL DEFAULT(NONE) PRIVATE(array1, array2, i)
  ! Initialize arrays
  array1 = 1.0_dp
  array2 = 2.0_dp
  ! Simple vectorizable loop
  !$OMP DO
  do i = 1, n
    array1(i) = array1(i) + array2(i)
  end do
  !$OMP END DO
  !$OMP END PARALLEL
end program openmp_align_test
First I compiled with,
ifort -O3 -xCORE-AVX512 -align array64byte -qopt-zmm-usage=high -qopt-report=5 align.f90
Note: this is without -qopenmp. In this case the three loops are fused and fully vectorised with aligned access. The optimisation report (align.optrpt) states: "estimated potential speedup: 9.450".
Then I compiled with '-qopenmp' and '-vec-threshold0',
ifort -O3 -xCORE-AVX512 -qopenmp -vec-threshold0 -align array64byte -qopt-zmm-usage=high -qopt-report=5 align.f90
Now the two initialisation loops are fused with aligned access, but the addition loop is vectorised with unaligned access (without the threshold flag the compiler chose not to vectorise it). This time the report states: "estimated potential speedup: 1.920".
I have been using ifort 2021.2.0 but would be very happy with ifx only solutions.
Finally, here are some more details on what I am trying to achieve beyond the minimum example above.
- Making array1 and array2 allocatable or on the heap in any way is desirable. Using iso_c_binding to achieve this is less desirable.
- I would prefer solutions that do not use Intel-specific directives. I am happy with OpenMP solutions to get the alignment right. However, it seems the OpenMP standard is ahead of most compilers when it comes to alignment.
- I have tried making array1 and array2 persistent threadprivate arrays, but I get worse results than when declaring them in an OMP private clause as above.
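One way to explore the constraints listed above is sketched here. It is not a confirmed fix. It assumes that each thread allocates its own heap copies inside the parallel region, and that the OpenMP `aligned` clause on `simd` is an acceptable way to state the alignment. Because `allocate` does not promise a 64-byte boundary, the sketch checks the address at run time and falls back to a plain `simd` loop otherwise. The helper `is_aligned`, the program name, and the `target` attribute are additions for illustration only. `iso_c_binding` is used solely to inspect addresses, not to allocate. The `!$OMP DO` of the original is dropped, since every thread owns a full private copy.

```fortran
program private_alignment_sketch
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 128
  integer, parameter :: boundary = 64   ! bytes, matching -align array64byte
  ! TARGET is an assumption added only so c_loc can be used in the check;
  ! it may itself influence optimisation and can be removed afterwards.
  real(dp), allocatable, target :: array1(:), array2(:)
  integer :: i
  logical :: results_ok

  results_ok = .true.

  !$omp parallel default(none) private(array1, array2, i) reduction(.and.: results_ok)
  ! Each thread allocates its own copies on the heap.
  allocate(array1(n), array2(n))
  array1 = 1.0_dp
  array2 = 2.0_dp

  if (is_aligned(array1) .and. is_aligned(array2)) then
    print '(a,i0,a)', 'thread ', omp_get_thread_num(), ': 64-byte aligned'
    ! Alignment was verified above, so asserting it here is safe.
    !$omp simd aligned(array1, array2: 64)
    do i = 1, n
      array1(i) = array1(i) + array2(i)
    end do
  else
    print '(a,i0,a)', 'thread ', omp_get_thread_num(), ': not 64-byte aligned'
    ! No alignment assertion on this path.
    !$omp simd
    do i = 1, n
      array1(i) = array1(i) + array2(i)
    end do
  end if

  ! Every element should now be exactly 3.0 in every thread's copy.
  results_ok = all(array1 == 3.0_dp)
  deallocate(array1, array2)
  !$omp end parallel

  print '(a,l1)', 'results_ok = ', results_ok

contains

  ! Returns .true. if the first element of x starts on a `boundary`-byte address.
  logical function is_aligned(x)
    real(dp), target, intent(in) :: x(:)
    integer(c_intptr_t) :: address
    address = transfer(c_loc(x(1)), address)
    is_aligned = modulo(address, int(boundary, c_intptr_t)) == 0
  end function is_aligned

end program private_alignment_sketch
```

Whether the compiler then emits aligned loads for the first branch would still need to be confirmed in the optimisation report. The per-thread printout at least shows which alignment the runtime actually delivers.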