I am testing on an AVX machine. The code looks roughly like this:
include "Sub_Prog_1.f90"
include "Sub_Prog_2.f90"

program MyCode
  use Sub_Prog_1_Mod
  use Sub_Prog_2_Mod
  implicit none
  integer, parameter :: dp = selected_real_kind(15,307), dp2 = selected_real_kind(15,307)
  integer :: array_size, i, j, k, l
  integer, dimension(:), allocatable :: idx
  real(kind = dp2) :: time1, time2, omp_get_wtime
  real(kind = dp), dimension(:), allocatable :: a, b, c

  array_size = 100000000 !assuming I read from an input file, here I just wrote like this
  allocate ( idx(array_size), a(array_size), b(array_size), c(array_size) )

  ! Initialization
  do i = 1, array_size
    a(i) = dble(i) ; b(i) = dble(i * 2) ; idx(i) = array_size - i + 1
  end do

  time1 = omp_get_wtime()
  !$omp parallel
  do i = 1, 10
    call Sub_Prog_1 ( array_size, idx, a, b )
    call Sub_Prog_2 ( array_size, a, b, c )
  end do
  !$omp end parallel
  time2 = omp_get_wtime()

  print *, c(8000000)
  print *, 'Results =', time2 - time1
end program MyCode
!==================================================================
subroutine Sub_Prog_1 ( array_size, idx, a, b )
  implicit none
  integer, parameter :: dp = selected_real_kind(15,307), dp2 = selected_real_kind(15,307)
  integer :: array_size, i, j, k, l
  integer, dimension(:), allocatable :: idx
  real(kind = dp), dimension(:), allocatable :: a, b, c

  !$omp do private(i) schedule(runtime)
  !dir$ vector aligned
  !$omp simd simdlen(4)
  do i = 1, array_size
    a(i) = a(idx(i)) + dble(i)
    if (a(i) <= 3000.0d+0) then
      a(i) = dble(idx(i)) / 200.0d+0
    end if
    b(i) = sqrt(b(i)) + dble(i * 2)
  end do
  !$omp end simd
  !$omp end do
end subroutine Sub_Prog_1
!==================================================================
subroutine Sub_Prog_2 ( array_size, a, b, c )
  implicit none
  integer, parameter :: dp = selected_real_kind(15,307), dp2 = selected_real_kind(15,307)
  integer :: array_size, i, j, k, l
  real(kind = dp), dimension(:), allocatable :: a, b, c

  !$omp do private(i) schedule(runtime)
  !dir$ vector aligned
  !$omp simd simdlen(4)
  do i = 1, array_size
    c(i) = a(i) + sqrt(b(i)) / 3.67d+0
    if (c(i) <= 350.0d+0) then
      c(i) = a(i) + sqrt(b(i)) / 8.67d+0
    end if
  end do
  !$omp end simd
  !$omp end do
end subroutine Sub_Prog_2
I wanted to take advantage of Intel Compiler 19's support for aligned data access to get efficient vectorization, so I compiled with "ifort -O3 -qopt-report5 -qopenmp -align array32byte -xAVX -o MyCode.exe Main.f90". I have two questions.
- Why can I not combine !dir$ vector aligned and !$omp simd simdlen(...) as written above? The compiler always gives me a message like this:
Sub_Prog_1.f90(17): catastrophic error: **Internal compiler error: internal abort** Please report this error along with the circumstances in which it occurred in a Software Problem Report. Note: File and line given may not be explicit cause of this error.
compilation aborted for Main.f90 (code 1)
- As I actually prefer OpenMP directives to the Intel ones, I had previously used "!$omp simd simdlen(4) aligned(a,b,idx :32)" and "!$omp simd simdlen(4) aligned(a,b,c :32)" in the first and second subroutines, respectively. However, the vectorization reports showed that the arrays were still accessed unaligned. The only way I found to get both aligned access and vectorization was to use "!dir$ simd vectorlength(4)" instead of "!$omp simd simdlen(4)".
Could someone please explain this matter?
Many thanks.
Best wishes,
Item 1 is a compiler bug. Please report it to Intel through https://supporttickets.intel.com/?lang=en-US and provide a complete source that reproduces the problem, along with the exact command line you used to compile.
I'll let someone else address your other question.
I don't know what you mean by "efficient vectorization." Restricting optimizations to those matching simdlen(4) seems likely to reduce performance, unless perhaps you hope to optimize short loop counts that are odd multiples of 4. I could understand the compiler failing to adapt, but it must not produce an internal error, even if the compiler developers think it nonsense to go about it this way. Since you are using Intel directives anyway, a loop count directive seems more apt.
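As a rough illustration of the loop count suggestion, here is a minimal sketch using the Intel `!dir$ loop count` directive on a loop like the one in Sub_Prog_2. The program name, the array size `n`, and the min/avg values are hypothetical and chosen only for the example; other compilers treat the directive as a plain comment.

```fortran
program loop_count_sketch
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer, parameter :: n = 100000      ! hypothetical size, not from the original post
  real(kind=dp), allocatable :: a(:), b(:), c(:)
  integer :: i

  allocate (a(n), b(n), c(n))

  do i = 1, n
     a(i) = real(i, dp)
     b(i) = real(2 * i, dp)
  end do

  ! Hint the expected trip count to the Intel compiler (values are assumptions).
  ! This is only a hint for optimization decisions; it does not change results.
  !dir$ loop count min(1000), avg(100000)
  do i = 1, n
     c(i) = a(i) + sqrt(b(i)) / 3.67_dp
  end do

  print *, c(n)
end program loop_count_sketch
```

The idea is to tell the compiler what trip counts to expect instead of constraining the vector length directly with simdlen.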
What are you looking for with your alignment directive? The only thing that should change is how the generated code adjusts for alignment at the beginning of the loop. I suppose this could be examined by diffing the generated code listings, if you don't trust the opt-report when it says aligned. Since it costs nothing on a CPU that supports AVX, code is generated to support unaligned access even when it is optimized for aligned data, unless you find the internal option to change this, in which case the internal error might not be a bug. You would not see a difference in timing tests except with specific short loop counts. You could also achieve that for AVX with the vector unaligned directive.
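Separately from inspecting the generated code, one can at least check at run time whether an allocated array really starts on a 32-byte boundary. The following is a minimal sketch using standard `iso_c_binding`; the program name and array size are hypothetical, and the result depends on the compiler and on flags such as -align array32byte. It only reports the base address alignment and says nothing about which load/store instructions the compiler emitted.

```fortran
program check_alignment
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t, c_double
  implicit none
  integer, parameter :: n = 1000        ! hypothetical size for the check
  real(c_double), allocatable, target :: a(:)
  integer(c_intptr_t) :: addr

  allocate (a(n))
  a = 0.0_c_double

  ! Reinterpret the address of the first element as an integer.
  addr = transfer(c_loc(a(1)), addr)

  ! A remainder of 0 means the array starts on a 32-byte boundary.
  print *, 'address mod 32 =', mod(addr, 32_c_intptr_t)
end program check_alignment
```

If the remainder is nonzero, an alignment assertion such as vector aligned or an OpenMP aligned clause with 32 would not hold for that array.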