- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi everyone,
As the title said, an array from my modules is said to have unaligned access.
It's something like this:
module some_module ... real(dp), allocatable :: a(:,:) !dir$ attributes align:64 :: a ... end module some_module
It gives me unaligned access problem:
remark #15389: vectorization support: reference some_module_mp_a_ has unaligned access
As far as I can tell from articles Data Alignment to Assist Vectorization and Fortran Array Data and Arguments and Vectorization, that is not the right thing to do. Another similar thread "Ifort array alignment problem" has the suggestion to use assume_aligned.
I tried this way, in another file:
subroutine some_subroutine use some_module ... !$omp parallel do private(...) schedule(...) !dir$ assume_aligned a:64 do i = 1, n a(:,i) = 0.0d0 do j = 1, n ! update a(:,i) here enddo enddo !$omp end parallel do ... end subroutine some_subroutine
Compiling with version 16.0 gives me this error:
error #8018: The memory reference associated with an ASSUME_ALIGNED directive must not be in COMMON, a field reference, use associated, or host associated.
While compiling with version 17.0 works but it still says unaligned access in the report.
I complied all of them with -align array64byte. As far as I understand, aligned access with allocatable arrays from modules has problems. I originally assumed it's because of the OpenMP's clauses, but removing them doesn't change the report. Did I misunderstood something?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
You're applying the VECTOR ALIGNED directives to the wrong loop. You should apply it to the array assignment(s), (the implied inner loop(s)), not the outer DO loop:
!$omp parallel do private(...) schedule(...)
do i = 1, n
!dir$ vector aligned
a(:,i) = 0.0d0
do j = 1, n
!dir$ vector aligned
! update a(:,i) here
enddo
enddo
!$omp end parallel do
For this to work, the first dimension of A needs to be a multiple of 64 bytes, as well as A being aligned.
-align array64bytes works for aligning both locally and globally declared arrays. But the alignment of dummy arguments in a subroutine depends on the alignment (and contiguity) of the actual arguments that you pass into it.
For KNC, if the compiler that you build with doesn't match (is newer than) the run-time libraries available on the coprocessor, I'd expect you to have more severe problems than alignment. For kmp_aligned_malloc(), you should make sure that libiomp5.so is available and accessible via the LD_LIBRARY_PATH for the coprocessor. I'd be surprised if the assume_aligned directive was what caused that reference, though. assume_aligned just asserts alignment to the compiler optimizer, it doesn't cause the compiler to align anything at allocation.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I just recently learned of some remaining issues related to the situation you nicely outlined.
For the loop-nest you showed, see if using a syntax where you specify the alignment of the first element of the array eliminates the unaligned accesses. (e.g. !dir$ assume_aligned a(1,1):64)
This may or may not eliminate the unaligned accesses for your case. From what I've learned, there can still be unaligned access issues related to array references in the loop. Here's an example case where the issue is not fully resolved and for which our Developers are considering how to resolve:
real(8),allocatable :: C(:,:),A(:,:),B(:,:) !DIR$ ATTRIBUTES ALIGN: 16 :: C,A,B ... !DIR$ ASSUME_ALIGNED C(1,1):16, A(1,1):16, B(1,1):16 !!DIR$ ASSUME (MOD(N,2).eq.0) do J=1,5 do I=1,N C(I,1) = C(I,1) - A(I,J )*B(I,J) ! Only C(I,1) is aligned, others unaligned C(I,2) = C(I,2) - A(I,J+ 5)*B(I,J) ! All unaligned C(I,3) = C(I,3) - A(I,J+10)*B(I,J) ! All unaligned end do end do
I asked my colleague with more knowledge about this look over your post and reply also.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Both the switch -align array64byte and the directive !DIR$ ATTRIBUTES ALIGN:64 :: A work for aligning an allocatable array at its lower bound. Unlike for static arrays, the compiler does not in general know the lower bound at compile time. It can even vary during run-time, if the array is reallocated. The problem comes in asserting the alignment to the compiler for use in vectorization, since
!DIR$ ASSUME_ALIGNED A:64 doesn’t tell the compiler which element of A is aligned.
In the 17.0 compiler, support was added (for allocatable arrays or pointers only) for the syntax
!DIR$ ASSUME_ALIGNED A(1):64
which tells the compiler that element 1 of A is aligned. (Or it could be A(-1) if the lower bound is -1).
This works for 1-dimensional allocatable arrays or pointers. However, for multi-dimensional arrays there’s another problem: even if the first column is aligned, the compiler doesn’t know whether subsequent columns are aligned, because it doesn’t know the extent of the first dimension. (It needs to be a multiple of the alignment factor, here 64 bytes, for all columns to be aligned). For static arrays, you can assert this with an ASSUME directive, as in Kevin’s example, but this doesn’t work for allocatable arrays or pointers, for the same reason that the dimensions are only defined at run-time and can vary.
In simple cases such as your examples, you can assert that all columns are aligned like this:
!DIR$ ASSUME_ALIGNED A(1,i):64 (note the second subscript is "I", not "1").
(use the actual second subscript in your loop, even if it’s something like icount+1 or 13 )
There are currently some limitations – the directive needs to be inside your outer loop over J, either just before or inside the inner loop or array assignment, and you can’t make multiple assertions for different references to the same array in one loop. We’re working on relaxing these restrictions, hopefully in a forthcoming compiler update.
Please let us know whether the above works for you.
Incidentally, nothing about this is specific to Intel Xeon Phi, apart from the alignment factor of 64 bytes for Intel AVX-512. (It would be 32 bytes when targeting Intel AVX or AVX2 and 16 bytes when targeting Intel SSE).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Incidentally, if all the array accesses in a loop are aligned, the simplest way to ask the compiler to generate aligned loads is
!DIR$ VECTOR ALIGNED immediately before the loop.
This works for array assignments as well as DO loops, and for all types of arrays, allocatable or not. You need to be sure that the data really are contiguous and aligned.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your extensive explanation.
I tried assumed_aligned directives and it gives me this error when running a native Xeon Phi application:
./my-executable: relocation error: ./my-executable: symbol kmp_aligned_malloc, version VERSION not defined in file libiomp5.so with link time reference
Most likely because I compiled it with version 17, but on the coprocessor itself only has version 15 is available. I don't think this will be fixed soon enough.
I then tried to use the vector aligned directives with -align array64byte.
subroutine some_subroutine use some_module ... !$omp parallel do private(...) schedule(...) !dir$ vector aligned do i = 1, n a(:,i) = 0.0d0 !dir$ vector aligned ! or here, tried both positions, there are other arrays involved in this calculation do j = 1, n ! update a(:,i) here enddo enddo !$omp end parallel do ... end subroutine some_subroutine
For single-threaded execution, KMP_PLACE_THREADS=1C,1t, it gives me this error:
forrtl: severe (154): array index out of bounds Image PC Routine Line Source my-excutable 000000000050272D Unknown Unknown Unknown my-excutable 00000000005004C7 Unknown Unknown Unknown my-excutable 00000000004A8464 Unknown Unknown Unknown my-excutable 00000000004A8276 Unknown Unknown Unknown my-excutable 0000000000434D65 Unknown Unknown Unknown my-excutable 0000000000438B7D Unknown Unknown Unknown libpthread.so.0 00007F075D5FE4D0 Unknown Unknown Unknown my-excutable 000000000041942E Unknown Unknown Unknown
While multi-threaded execution, KMP_PLACE_THREADS=60C,4t, gives me segmentation faults:
forrtl: severe (174): SIGSEGV, segmentation fault occurred forrtl: severe (174): SIGSEGV, segmentation fault occurred forrtl: severe (174): SIGSEGV, segmentation fault occurred forrtl: severe (174): SIGSEGV, segmentation fault occurred forrtl: severe (174): SIGSEGV, segmentation fault occurred forrtl: severe (174): SIGSEGV, segmentation fault occurred forrtl: severe (174): SIGSEGV, segmentation fault occurred forrtl: severe (174): SIGSEGV, segmentation fault occurred forrtl: severe (174): SIGSEGV, segmentation fault occurred forrtl: severe (174): SIGSEGV, segmentation fault occurred forrtl: severe (174): SIGSEGV, segmentation fault occurred
In any case, I tried adding the "attributes align:64" directives around, where I declare the allocatable array a(:,:), where I allocate the array a(3,n), and where I use the array a itself. Obviously, there are several allocatable arrays other than a(:,:) involved. And the vectorization report also complained about their unaligned access. I suspect openmp is also part of the problem.
I wonder if you have any suggestion where to go from here?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
You're applying the VECTOR ALIGNED directives to the wrong loop. You should apply it to the array assignment(s), (the implied inner loop(s)), not the outer DO loop:
!$omp parallel do private(...) schedule(...)
do i = 1, n
!dir$ vector aligned
a(:,i) = 0.0d0
do j = 1, n
!dir$ vector aligned
! update a(:,i) here
enddo
enddo
!$omp end parallel do
For this to work, the first dimension of A needs to be a multiple of 64 bytes, as well as A being aligned.
-align array64bytes works for aligning both locally and globally declared arrays. But the alignment of dummy arguments in a subroutine depends on the alignment (and contiguity) of the actual arguments that you pass into it.
For KNC, if the compiler that you build with doesn't match (is newer than) the run-time libraries available on the coprocessor, I'd expect you to have more severe problems than alignment. For kmp_aligned_malloc(), you should make sure that libiomp5.so is available and accessible via the LD_LIBRARY_PATH for the coprocessor. I'd be surprised if the assume_aligned directive was what caused that reference, though. assume_aligned just asserts alignment to the compiler optimizer, it doesn't cause the compiler to align anything at allocation.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Martyn Corden (Intel) wrote:
For this to work, the first dimension of A needs to be a multiple of 64 bytes, as well as A being aligned.
Thanks again for the extensive explanation.
In that case, at least for my code, a solution is perhaps to split the multi-dimensional array a to separate arrays.
Let me ask the admin about the mismatch of libraries on the host and xeon phi.
I'm running the code as native execution, it's possible everyone else in my place use the offload approach.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page