## Data alignment for supporting more efficient vectorization

I am testing on an AVX machine. The code looks roughly like this:

```
include "Sub_Prog_1.f90"
include "Sub_Prog_2.f90"

program MyCode

use omp_lib            ! provides omp_get_wtime()
use Sub_Prog_1_Mod
use Sub_Prog_2_Mod

implicit none

integer, parameter :: dp = selected_real_kind(15,307), dp2 = selected_real_kind(15,307)
integer  :: array_size, i, j, k, l
integer, dimension(:), allocatable :: idx
real(kind = dp2) :: time1, time2
real(kind = dp), dimension(:), allocatable :: a, b, c

array_size = 100000000  ! in the real code this is read from an input file

allocate ( idx(array_size), a(array_size), b(array_size), c(array_size) )

! Initialization
do i = 1, array_size
a(i) = dble(i)   ;   b(i) = dble(i * 2)   ;   idx(i) = array_size - i + 1
end do

time1 = omp_get_wtime()

!$omp parallel
do i = 1, 10

call Sub_Prog_1 ( array_size, idx, a, b )
call Sub_Prog_2 ( array_size, a, b, c )

end do
!$omp end parallel

time2 = omp_get_wtime()

print *, c(8000000)
print *, 'Results =', time2 - time1

end program MyCode

!========================= Sub_Prog_1.f90 =========================

module Sub_Prog_1_Mod
contains

subroutine Sub_Prog_1 ( array_size, idx, a, b )

implicit none

integer, parameter :: dp = selected_real_kind(15,307), dp2 = selected_real_kind(15,307)
integer :: array_size, i, j, k, l
integer, dimension(:), allocatable :: idx
real(kind = dp), dimension(:), allocatable :: a, b, c

!$omp do private(i) schedule(runtime)
!dir$ vector aligned
!$omp simd simdlen(4)
do i = 1, array_size
a(i) = a(idx(i)) + dble(i)
if (a(i) <= 3000.0d+0) then
a(i) = dble(idx(i)) / 200.0d+0
end if
b(i) = sqrt(b(i)) + dble(i * 2)
end do
!$omp end simd
!$omp end do

end subroutine Sub_Prog_1

end module Sub_Prog_1_Mod

!========================= Sub_Prog_2.f90 =========================

module Sub_Prog_2_Mod
contains

subroutine Sub_Prog_2 ( array_size, a, b, c )

implicit none

integer, parameter :: dp = selected_real_kind(15,307), dp2 = selected_real_kind(15,307)
integer  :: array_size, i, j, k, l
real(kind = dp), dimension(:), allocatable :: a, b, c

!$omp do private(i) schedule(runtime)
!dir$ vector aligned
!$omp simd simdlen(4)
do i = 1, array_size
c(i) = a(i) + sqrt(b(i)) / 3.67d+0
if (c(i) <= 350.0d+0) then
c(i) = a(i) + sqrt(b(i)) / 8.67d+0
end if
end do
!$omp end simd
!$omp end do

end subroutine Sub_Prog_2

end module Sub_Prog_2_Mod
```

I wanted to take advantage of the Intel Compiler 19's support for aligned data access to get more efficient vectorization, so I compiled with `ifort -O3 -qopt-report5 -qopenmp -align array32byte -xAVX -o MyCode.exe Main.f90`.
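One way to check at run time whether `-align array32byte` actually placed the arrays on 32-byte boundaries is to look at their addresses. The sketch below is an illustration under stated assumptions, not part of the original code: the module `align_check` and the function `misalignment` are hypothetical names, it relies only on standard Fortran 2008 C interoperability (`c_loc`, `c_intptr_t`), and it handles only `real(dp)` elements.

```fortran
! Hypothetical helper (not from the original post): how far is an
! array element's address past the previous nbytes boundary?
module align_check
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  implicit none
  private
  public :: dp, misalignment

  integer, parameter :: dp = selected_real_kind(15, 307)

contains

  ! Returns 0 when x starts exactly on an nbytes boundary, otherwise the
  ! number of bytes it lies past that boundary. The actual argument
  ! should have the TARGET attribute so that x refers to the element
  ! itself rather than to a temporary copy.
  integer function misalignment(x, nbytes)
    real(kind = dp), intent(in), target :: x
    integer, intent(in) :: nbytes
    integer(c_intptr_t) :: addr

    ! Reinterpret the C address as an integer of the same size.
    addr = transfer(c_loc(x), 0_c_intptr_t)
    misalignment = int(modulo(addr, int(nbytes, c_intptr_t)))
  end function misalignment

end module align_check
```

To use it in `MyCode`, give `a`, `b` and `c` the `target` attribute, add `use align_check, only: misalignment`, and after the `allocate` print `misalignment(a(1), 32)` and the same for `b(1)` and `c(1)`; 0 for each would indicate that the requested alignment is in place. This checks only the element passed in: with an aligned `a(1)` and 8-byte elements, `a(i)` is itself 32-byte aligned only when `i - 1` is a multiple of 4.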

1. Why can't I combine `!dir$ vector aligned` and `!$omp simd simdlen(...)` as written above? The compiler always aborts with this message:
   ```
   Sub_Prog_1.f90(17): catastrophic error: **Internal compiler error: internal abort** Please report this error along with the circumstances in which it occurred in a Software Problem Report.  Note: File and line given may not be explicit cause of this error.
   compilation aborted for Main.f90 (code 1)
   ```
2. Since I prefer OpenMP directives to Intel-specific ones, I was previously using `!$omp simd simdlen(4) aligned(a,b,idx :32)` and `!$omp simd simdlen(4) aligned(a,b,c :32)` in the first and second subroutines, respectively. However, the vectorization reports still showed unaligned access for the arrays. The only way I found to get both aligned access and vectorization was to use `!dir$ simd vectorlength(4)` instead of `!$omp simd simdlen(4)`.
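For comparison with item 2, here is a sketch of the second subroutine's loop written with OpenMP directives only, folding the separate `!$omp do` and `!$omp simd` into the combined `!$omp do simd` construct, which accepts the clauses of both. It is offered under stated assumptions, not as a fix: the module and subroutine names are hypothetical, what ifort 19's vectorization report says for this form is not established here, and `aligned(a, b, c : 32)` is only a promise to the compiler, so it is correct only if the arrays really start on 32-byte boundaries.

```fortran
! Hypothetical sketch (not from the original post): Sub_Prog_2's loop
! with OpenMP-only directives. ALIGNED(a, b, c : 32) asserts that the
! arrays start on 32-byte boundaries (e.g. via ifort -align array32byte);
! it does not align anything by itself.
module sub_prog_2_omp_mod
  implicit none
  private
  public :: dp, sub_prog_2_omp

  integer, parameter :: dp = selected_real_kind(15, 307)

contains

  subroutine sub_prog_2_omp(array_size, a, b, c)
    integer, intent(in) :: array_size
    real(kind = dp), dimension(:), allocatable, intent(in) :: a, b
    ! INTENT(INOUT), not INTENT(OUT): an INTENT(OUT) allocatable dummy
    ! would be deallocated on entry.
    real(kind = dp), dimension(:), allocatable, intent(inout) :: c
    integer :: i

    ! Work-sharing and SIMD in one construct; the loop variable i is
    ! predetermined private. Like any work-sharing construct, it must be
    ! reached by all threads of the team or by none; outside a parallel
    ! region it runs on a single thread.
    !$omp do simd simdlen(4) aligned(a, b, c : 32) schedule(runtime)
    do i = 1, array_size
      c(i) = a(i) + sqrt(b(i)) / 3.67d+0
      if (c(i) <= 350.0d+0) then
        c(i) = a(i) + sqrt(b(i)) / 8.67d+0
      end if
    end do
    !$omp end do simd
  end subroutine sub_prog_2_omp

end module sub_prog_2_omp_mod
```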

Could someone please explain this behavior?

Many thanks.

Best wishes,

Item 1 is a compiler bug. Please report it to Intel through https://supporttickets.intel.com/?lang=en-US and provide a complete source that reproduces the problem, along with the exact command line you used to compile it.