I am trying to vectorize my code for Xeon Phi and Xeon.
I am trying to offload a function to the Xeon Phi. Let's say I make the offload call in module A, and the function is defined in another module, module B. The optreport for module B reports unaligned access, but only for the intent(in) and intent(out) variables (these are arrays with a leading dimension of 16). In the optreport for module A, the same variables have aligned access. I then put an assume_aligned directive before those variables in module B, but it still reports unaligned access. If I make local copies of these variables in module B, the local copies have aligned access. What should I do? Should I make local copies of each of these variables, or is there some other way?
Thanks.
Does making a local copy improve the predicted performance of vectorized loops? If not, I wouldn't be concerned about the compiler saying it will accept misalignment there.
Hi Tim,
After making a local copy, there is a slight increase in the potential speedup, e.g. from 1.7 to 1.8 for one loop and from 2.9 to 3.1 for another.
Can you provide some example code? It only needs to be compilable, not executable.
Tim is right in that the difference in performance between vectorized loops and non-vectorized loops is typically large, whereas the difference between loops with aligned and unaligned data is typically very much smaller. The potential speed-ups in the optimization report are only compile time estimates. There's a lot that isn't known at compile time, so the actual impact may be different.
Hi Martyn,
Actually, it is very difficult to provide the code, because it is spread across modules. Would small sections of code be useful? If so, which kind of section should I provide, i.e., a loop that is auto-vectorized, a loop that is not, etc.?
In some dummy codes that I compiled, the vectorizable loop showed a 13X potential speedup with aligned accesses compared to 7.3X with unaligned accesses, on Xeon Phi. I didn't run them to check the actual values, though.
Also, I was able to ensure aligned accesses for the dummy argument variables as you suggested in another thread, but the problem I am facing now is that the run stops after just a few iterations. I think the alignment is causing the problem, because if I comment out those assume_aligned directives, the code runs fine.
I have more or less figured out why this is happening, but I don't know what the solution is. I will explain the problem; please help me if possible.
Say that in module A I am offloading a call to func1, and the actual arguments that I pass to it come from an array of structures of arrays. Module B contains the definition of func1, where I use assume_aligned to make all accesses to the dummy arguments aligned. But I think that the array of structures of arrays in module A, which holds the arguments to func1, is itself not aligned, and that this is what breaks the run when alignment is assumed. I tried using !dir$ attributes align : 64 :: array to ensure that array is aligned, but it gives an error ( error #6410: This name has not been declared as an array or a function. [array] ).
Example ( .. represents code in between ):
module A
  use B, only : func1
  type input
    real(kind=8) :: array1(16,27)
    real(kind=8) :: array2(16,27)
  end type input
  type (input) :: array(1:800)
  ..
contains
  subroutine some_name()
    ..
    offload directives
    omp directives
    do i = 1,800
      call func1( array(i)%array1, array(i)%array2 )
    enddo
    end omp directives
    end offload directives
    ..
  end subroutine some_name
  ..
end module A
module B
  ..
contains
  subroutine func1 ( array1, array2 )
    ! dummy arguments declared with the same shape as the actual arguments
    real(kind=8) , intent(in) :: array1(16,27)
    real(kind=8) , intent(in) :: array2(16,27)
    ..
  end subroutine func1
  ..
end module B
Hi Amiesh,
It’s true that alignment can make a bigger difference on Intel Xeon Phi than for recent Intel Xeon processors. If the compiler generates an aligned load on the basis of your directive, but the data are not aligned, you will likely get a fault on Intel Xeon Phi and with older Xeon instruction sets. Newer instruction sets on recent Xeons may not fault, but performance may be worse.
You can print out the alignment of arrays or array elements to verify it is what you think it is. For example,
  istart = loc(array(i)%array1(1,1))
  print *, 'array1 alignment ', istart, mod(istart,64)
or even make it conditional on (mod(istart,64).ne.0)
In principle, you don’t need to align the structures themselves, only the arrays within the structures. However, I don’t believe there is a way to align static arrays that are components of derived types. If you make the components into allocatable arrays, then you can align them, but you can’t currently tell the compiler that they are aligned (because you can specify a different lower bound in the ALLOCATE statement). That’s something we are currently working on.
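As a rough sketch of that check applied to the layout in the question: each component there is 16 x 27 x 8 = 3456 bytes = 54 x 64, so provided the compiler inserts no padding and the first element starts on a 64-byte boundary, every component of every element should land on one too. The module below only confirms that at run time. The names layout_check, input_t, misalign and count_misaligned are made up for illustration, the 64-byte boundary is an assumption, and loc() is a compiler extension (ifort and gfortran accept it) rather than standard Fortran.

```fortran
module layout_check
  use, intrinsic :: iso_fortran_env, only : int64
  implicit none
  integer, parameter :: balign = 64   ! assumed alignment boundary in bytes

  ! Same shape as the derived type in the question; the type name is hypothetical.
  type input_t
    real(kind=8) :: array1(16,27)
    real(kind=8) :: array2(16,27)
  end type input_t

contains

  ! Distance in bytes of an address from the previous balign-byte boundary.
  ! addr is expected to come from loc(), which is an extension.
  pure integer function misalign(addr)
    integer(int64), intent(in) :: addr
    misalign = int(modulo(addr, int(balign, int64)))
  end function misalign

  ! Number of elements of buf that have at least one component off a boundary.
  integer function count_misaligned(buf)
    type(input_t), intent(in) :: buf(:)
    integer :: i
    count_misaligned = 0
    do i = 1, size(buf)
      if (misalign(int(loc(buf(i)%array1), int64)) /= 0 .or. &
          misalign(int(loc(buf(i)%array2), int64)) /= 0) then
        count_misaligned = count_misaligned + 1
      end if
    end do
  end function count_misaligned

end module layout_check
```

Calling count_misaligned(array) once before the offloaded loop and stopping if the result is non-zero would be one way to catch a bad layout before an aligned load faults.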
For the present, the best you can do is to declare the type as a “SEQUENCE” type; align the instance of the type itself; and insert padding if necessary, to preserve the alignment of individual components. Here is an example I wrote some time ago:
! ==============================================================
!
! SAMPLE SOURCE CODE - SUBJECT TO THE TERMS OF SAMPLE CODE LICENSE AGREEMENT,
! http://software.intel.com/en-us/articles/intel-sample-source-code-license-agreement/
!
! Copyright 2015 Intel Corporation
!
! THIS FILE IS PROVIDED "AS IS" WITH NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
! NOT LIMITED TO ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
! PURPOSE, NON-INFRINGEMENT OF INTELLECTUAL PROPERTY RIGHTS.
!
! =============================================================

Module Def_Module
  implicit none
  integer, parameter :: balign= 64 ! (alignment in bytes)
  integer, parameter :: data_size = 4 ! (data type size in bytes)
  integer, parameter :: nalign = balign / data_size
  integer, parameter :: n1=1003, n2=n1, np= nalign - mod(n1,nalign)
  type myType
    sequence
    real, dimension(n1) :: array1
!   real, dimension(np) :: pad
    real, dimension(n2) :: array2
  end type
end module Def_Module

program align_in_type
  use Def_Module
  implicit none
  integer :: i, al1, al2, al3

  real, dimension(10) :: dum1
  type(myType) :: a
! !dir$ attributes align:64 :: a
  real, dimension(17) :: dum2

  a%array1 = (/(real(i),i=1,n1)/)

  al1 = mod(loc(a%array1), balign)
  al2 = mod(loc(a%array2), balign)
  al3 = mod(loc(a) , balign)
  print *, 'array1, array2, type alignments', al1, al2, al3

!dir$ noinline
  call sub(a)

  print *, a%array2(1), a%array2(n1)

end program align_in_type

subroutine sub(a)
  use Def_Module
  implicit none
  type(myType), intent(inout) :: a

!dir$ vector aligned
  a%array2 = 2.0 * a%array1

end subroutine sub

$ ifort -traceback align_in_type.f90 ; ./a.out
 array1, array2, type alignments 32 12 32
forrtl: severe (174): SIGSEGV, segmentation fault occurred
...
Next, align A by uncommenting line 35, the !dir$ attributes align:64 :: a directive (this also aligns array1):
$ ifort -traceback align_in_type.f90 ; ./a.out
 array1, array2, type alignments 0 44 0
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Now align array2 by inserting some padding (uncomment line 23, the declaration of the pad component):
$ ifort -traceback align_in_type.f90 ; ./a.out
 array1, array2, type alignments 0 0 0
 2.000000 2006.000
$
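The padding arithmetic can also be written so that it covers the case where the array already ends on a boundary. The expression nalign - mod(n1,nalign) used in the example yields a full nalign elements of padding rather than 0 when n1 is a multiple of nalign, which keeps the alignment but wastes a little space. What follows is only a sketch under that reading; pad_mod, pad_elems and np_demo are hypothetical names, and it assumes the array itself starts on a boundary.

```fortran
module pad_mod
  implicit none

  ! Inline form, usable as an array bound in a type definition because it
  ! only involves intrinsic functions: n1 = 1003 reals, 16 reals per 64 bytes.
  integer, parameter :: np_demo = modulo(16 - modulo(1003, 16), 16)

contains

  ! Elements of padding needed after an n-element array so that the next
  ! component starts on an align_bytes boundary, assuming the array itself
  ! starts on one. Returns 0 if the array already ends on a boundary.
  pure integer function pad_elems(n, elem_bytes, align_bytes)
    integer, intent(in) :: n            ! number of array elements
    integer, intent(in) :: elem_bytes   ! size of one element in bytes
    integer, intent(in) :: align_bytes  ! alignment boundary in bytes
    integer :: per_unit
    per_unit = align_bytes / elem_bytes ! elements per alignment unit
    pad_elems = modulo(per_unit - modulo(n, per_unit), per_unit)
  end function pad_elems

end module pad_mod
```

For the example above, pad_elems(1003, 4, 64) gives 5, the same as np. For the 16 x 27 real(kind=8) components in the question, pad_elems(432, 8, 64) gives 0, so no padding component would be needed there. A user-defined function cannot appear in the bound of a fixed-size component, which is why the type definition itself needs the inline form shown for np_demo.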
Now that the type array components are both aligned, the VECTOR ALIGNED directive no longer causes a fault. You can’t use ASSUME_ALIGNED directives on derived types or their components; but if I had passed the array components to SUB directly as separate arguments, I could have used ASSUME_ALIGNED on the array dummy arguments. I don’t think the fact that you are offloading to an Intel Xeon Phi coprocessor makes much difference to most of this.
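The variant with separate arguments might look roughly like the sketch below. The names scale_mod and scale2 are made up, and the explicit length argument n is an assumption added to keep the sketch general. ASSUME_ALIGNED is an Intel compiler directive; other compilers treat the line as a comment. The directive is a promise, not a check: the caller has to guarantee that both arrays really start on a 64-byte boundary, otherwise the fault described earlier can come back.

```fortran
module scale_mod
  implicit none
contains

  ! Doubles x into y. Both dummy arguments are plain arrays, so the
  ! alignment promise can be attached to them directly.
  subroutine scale2(n, x, y)
    integer, intent(in) :: n
    real, intent(in)    :: x(n)
    real, intent(out)   :: y(n)

    ! Intel directive: assert that x and y start on 64-byte boundaries.
    ! Only valid if the caller has actually arranged that alignment.
    !dir$ assume_aligned x:64, y:64
    y = 2.0 * x
  end subroutine scale2

end module scale_mod
```

With the aligned and padded instance from the example, the call would then be call scale2(n1, a%array1, a%array2) in place of call sub(a).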
Hope this helps.
Hi Martyn,
Thanks a lot for this. I will try out all these methods.
Regards.
Hi Martyn,
The padding is working for my code and I am now getting correct results. Thanks.
Glad that worked. We're working on the alternative of aligning an allocatable array component directly, and being able to specify that in a directive, so that padding would not be necessary in that case.
