Vectorization Issue with intent(in) variables

Amlesh_K_ · 10-12-2015

I am trying to vectorize my code for Xeon Phi and Xeon.

I am trying to offload a function to the Xeon Phi. Let's say the offload happens in module A, and the function is defined in another module, say module B. When I look at the optreport for module B, it reports unaligned access, but only for the intent(in) and intent(out) variables (these variables are arrays with a leading dimension of size 16). In the optreport for module A, the same intent(in) and intent(out) variables have aligned access. I then put an assume_aligned directive before those variables in module B, but the report still says unaligned access. Also, if I make local copies of these variables in module B, the local copies have aligned access. So, what should I do? Should I make local copies of each of these variables, or is there some other way?

Thanks.

TimP · 10-12-2015

Does making a local copy improve the predicted performance of vectorized loops? If not, I wouldn't be concerned about the compiler saying it will accept misalignment there.

Amlesh_K_ · 10-12-2015

Hi Tim,

After making a local copy, there is a slight increase in the potential speedup reported, e.g. from 1.7 to 1.8 for one loop and from 2.9 to 3.1 for another loop.

Martyn_C_Intel · 10-14-2015

Can you provide some example code? It only needs to be compilable, not executable.

Tim is right in that the difference in performance between vectorized loops and non-vectorized loops is typically large, whereas the difference between loops with aligned and unaligned data is typically very much smaller. The potential speed-ups in the optimization report are only compile time estimates. There's a lot that isn't known at compile time, so the actual impact may be different.

Amlesh_K_ · 10-14-2015

Hi Martyn,

Actually, it is very difficult to give the code, because it is spread across modules. Would providing small sections of code be useful? If yes, which kind of section should I provide, i.e., a loop which is auto-vectorized, a loop which is not, etc.?

In some dummy codes that I compiled, the vectorizable loop showed a 13X potential speedup when the accesses were aligned, compared to 7.3X with unaligned access, on Xeon Phi. But yeah, I didn't run the code and check the actual values.

Also, I was able to ensure aligned accesses for dummy argument variables as you had suggested in another thread, but the problem I am facing now, is that the run is stopping after just a few iterations. I think the alignment is causing some problem, because if I comment those assume_aligned directives, the code runs fine.

I have kind of figured out why this is happening, but don't know what the solution is. I will explain the problem, please help me, if possible.

So, say in module A, I am offloading a function call func1, and the actual arguments that I send to the function come from an array of (structure of arrays). In module B, I have the definition of func1. In that definition, I use assume_aligned so that all accesses to the dummy arguments are treated as aligned. But I think that the array of (structure of arrays) in module A, which holds the arguments to func1, is itself not aligned, and that this is what breaks the run when alignment is assumed. I tried using !dir$ attributes align : 64 :: array to ensure that array is aligned, but it gives an error. ( error #6410: This name has not been declared as an array or a function. [array] )

Ex. ( .. represents codes in between )

module A
  use B, only : func1

  type input
    real(kind=8) :: array1(16,27)
    real(kind=8) :: array2(16,27)
  end type input

  type(input) :: array(1:800)
  ..
  subroutine some_name()
    ..
    offload directives
    omp directives
    do i = 1,800
      call func1( array(i)%array1, array(i)%array2 )
    enddo
    end omp directives
    end offload directives
    ..
  end subroutine some_name
  ..
end module A

module B
  ..
  subroutine func1 ( array1, array2 )
    real(kind=8), intent(in) :: array1(16,27)
    real(kind=8), intent(in) :: array2(16,27)
    ..
  end subroutine func1
  ..
end module B

Martyn_C_Intel · 10-15-2015

Hi Amlesh,

It’s true that alignment can make a bigger difference on Intel Xeon Phi than for recent Intel Xeon processors. If the compiler generates an aligned load on the basis of your directive, but the data are not aligned, you will likely get a fault on Intel Xeon Phi and with older Xeon instruction sets. Newer instruction sets on recent Xeons may not fault, but performance may be worse.

You can print out the alignment of arrays or array elements to verify it is what you think it is. For example,

istart = loc(array(i)%array1(1,1))
print *, 'array1 alignment ', istart, mod(istart,64)

or even make it conditional on (mod(istart,64).ne.0)
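A self-contained version of this check might look like the sketch below. It is only an illustration under some assumptions: the program name check_alignment is made up, the array shape is borrowed from the question, and the address is obtained with the standard c_loc/transfer route rather than the loc() extension used above. The address is stored in an integer of kind c_intptr_t, since a default integer may be too small to hold a 64-bit address.

```fortran
program check_alignment
  use, intrinsic :: iso_c_binding, only : c_loc, c_intptr_t
  implicit none
  integer(c_intptr_t), parameter :: balign = 64   ! required alignment in bytes
  real(kind=8), target :: array1(16,27)           ! shape taken from the question
  integer(c_intptr_t)  :: istart, offset

  array1 = 0.0d0

  ! Address of the first element, converted to an integer wide enough
  ! to hold a pointer (standard alternative to the loc() extension).
  istart = transfer(c_loc(array1(1,1)), istart)
  offset = mod(istart, balign)

  ! Report only when the array is NOT on a 64-byte boundary.
  if (offset /= 0) then
    print *, 'array1 is not 64-byte aligned, offset = ', offset
  else
    print *, 'array1 is 64-byte aligned'
  end if
end program check_alignment
```

The same test can be applied to array(i)%array1 inside the loop that calls func1, provided the variable has the target attribute when c_loc is used.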

In principle, you don’t need to align the structures themselves, only the arrays within the structures. However, I don’t believe there is a way to align static arrays that are components of derived types. If you make the components into allocatable arrays, then you can align them, but you can’t currently tell the compiler that they are aligned (because you can specify a different lower bound in the ALLOCATE statement). That’s something we are currently working on.

For the present, the best you can do is to declare the type as a “SEQUENCE” type; align the instance of the type itself; and insert padding if necessary, to preserve the alignment of individual components. Here is an example I wrote some time ago:

! ==============================================================
!
! SAMPLE SOURCE CODE - SUBJECT TO THE TERMS OF SAMPLE CODE LICENSE AGREEMENT,
! http://software.intel.com/en-us/articles/intel-sample-source-code-license-agreement/
!
! Copyright 2015 Intel Corporation
!
! THIS FILE IS PROVIDED "AS IS" WITH NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
! NOT LIMITED TO ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
! PURPOSE, NON-INFRINGEMENT OF INTELLECTUAL PROPERTY RIGHTS.
!
! =============================================================

Module Def_Module
  implicit none
  integer, parameter :: balign= 64     ! (alignment in bytes)
  integer, parameter :: data_size = 4  ! (data type size in bytes)
  integer, parameter :: nalign = balign / data_size
  integer, parameter :: n1=1003, n2=n1, np= nalign - mod(n1,nalign)
  type myType
    sequence
    real, dimension(n1) :: array1
!   real, dimension(np) :: pad
    real, dimension(n2) :: array2 
  end type

end module Def_Module

program align_in_type
  use Def_Module
  implicit none
  integer                   :: i, al1, al2, al3
  real, dimension(10)       :: dum1
  type(myType)              :: a
! !dir$ attributes align:64 :: a 
  real, dimension(17)       :: dum2

  a%array1 = (/(real(i),i=1,n1)/)
  al1 = mod(loc(a%array1), balign)
  al2 = mod(loc(a%array2), balign)
  al3 = mod(loc(a)       , balign) 
  print *, 'array1, array2, type alignments', al1, al2, al3

!dir$ noinline
  call sub(a)

  print *, a%array2(1), a%array2(n1)
end program align_in_type

subroutine sub(a)
  use Def_Module
  implicit none
  type(myType), intent(inout) :: a

!dir$ vector aligned
  a%array2 = 2.0 * a%array1

end subroutine sub

$ ifort -traceback align_in_type.f90 ; ./a.out
 array1, array2, type alignments          32          12          32
forrtl: severe (174): SIGSEGV, segmentation fault occurred
...

Next, align A by uncommenting line 35, the "!dir$ attributes align:64 :: a" directive (this also aligns array1):

$ ifort -traceback align_in_type.f90 ; ./a.out
 array1, array2, type alignments           0          44           0
forrtl: severe (174): SIGSEGV, segmentation fault occurred

Now align array2 by inserting some padding (uncomment line 23, the "pad" component):

$ ifort -traceback align_in_type.f90 ; ./a.out
 array1, array2, type alignments           0           0           0
   2.000000       2006.000
$
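The padding length np in the listing comes from rounding the component size up to the next multiple of the alignment. The small sketch below works through that arithmetic for two cases: the real(4) array of 1003 elements from the listing, and the real(kind=8) array of shape (16,27) from the question. The helper name pad_elems is hypothetical, and the extra mod() around the result is an addition (not in the listing) so that a component that is already a multiple of 64 bytes gets zero padding instead of a full extra line.

```fortran
program padding_size
  implicit none
  integer, parameter :: balign = 64   ! alignment in bytes

  ! 1003 default reals: 1003 = 62*16 + 11, so 5 more elements reach 63*16.
  print *, 'real(4), n = 1003  : pad elements =', pad_elems(1003, 4)
  ! 16*27 = 432 double precision elements = 3456 bytes = 54*64 bytes.
  print *, 'real(8), n = 16*27 : pad elements =', pad_elems(16*27, 8)

contains

  ! Number of padding elements needed after n elements of data_size bytes
  ! so that the next component starts on a balign-byte boundary.
  pure integer function pad_elems(n, data_size) result(np)
    integer, intent(in) :: n, data_size
    integer :: nalign
    nalign = balign / data_size                  ! elements per 64 bytes
    np = mod(nalign - mod(n, nalign), nalign)    ! 0 if already a multiple
  end function pad_elems

end program padding_size
```

This reports 5 padding elements for the first case and 0 for the second. For the dimensions shown in the question, each component would therefore already occupy a whole number of 64-byte blocks, so aligning the SEQUENCE-type variable itself may be enough; other array sizes would need padding.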

Now that the type array components are both aligned, the VECTOR ALIGNED directive no longer causes a fault. You can’t use ASSUME_ALIGNED directives on derived types or their components; but if I had passed the array components to SUB directly as separate arguments, I could have used ASSUME_ALIGNED on the array dummy arguments. I don’t think the fact that you are offloading to an Intel Xeon Phi coprocessor makes much difference to most of this.
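As an illustration of that last alternative, here is a minimal sketch in which the components are passed as separate array arguments and ASSUME_ALIGNED is applied to the dummies. The names types_mod and sub2 are invented for this example; the type, padding, and align directive follow the listing above. The directive is a promise to the compiler, so it is only safe because the padded SEQUENCE type and the align attribute make both components start on 64-byte boundaries. Compilers that do not recognise !dir$ lines treat them as comments.

```fortran
module types_mod
  implicit none
  integer, parameter :: balign = 64, data_size = 4
  integer, parameter :: nalign = balign / data_size
  integer, parameter :: n1 = 1003, np = nalign - mod(n1, nalign)
  type myType
    sequence
    real, dimension(n1) :: array1
    real, dimension(np) :: pad      ! keeps array2 on a 64-byte boundary
    real, dimension(n1) :: array2
  end type myType
contains
  ! Hypothetical routine taking the components as plain array arguments.
  subroutine sub2(n, x, y)
    integer, intent(in)  :: n
    real,    intent(in)  :: x(n)
    real,    intent(out) :: y(n)
    ! Assumption stated to the compiler: x and y start on 64-byte boundaries.
    ! If a caller passes misaligned data, the generated code may fault.
    !dir$ assume_aligned x:64, y:64
    y = 2.0 * x
  end subroutine sub2
end module types_mod

program align_args
  use types_mod
  implicit none
  integer      :: i
  type(myType) :: a
  !dir$ attributes align:64 :: a

  a%array1 = [(real(i), i = 1, n1)]
  call sub2(n1, a%array1, a%array2)
  print *, a%array2(1), a%array2(n1)   ! values 2.0 and 2006.0
end program align_args
```

The design choice here is to keep the alignment promise next to the loop that benefits from it, while the caller remains responsible for making the promise true.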

Hope this helps.

Amlesh_K_ · 10-19-2015

Hi Martyn,

Thanks a lot for this. I will try out all these methods.

Regards.

Amlesh_K_ · 10-31-2015

Hi Martyn,

The padding is working for my code and I am now getting correct results. Thanks.

Martyn_C_Intel · 11-02-2015

Glad that worked. We're working on the alternative of aligning an allocatable array component directly, and being able to specify that in a directive, so that padding would not be necessary in that case.