Solved: Letting the compiler know the strides of an array from a module

kang__myeongseok · ‎07-25-2018

Hello,

While working on vectorizing some Fortran code, I encountered a message in the vectorization report which says:

"non-unit strided store was emulated for the variable <A_(:,idx)>, stride is unknown to compiler"

I have declared an allocatable array "A(:,:)" in a module and initialized it in a subroutine, with its dimensions decided at runtime.

Then, in another subroutine that accesses the module via USE ... ONLY, I use the "A" array in a loop, and that loop produces the message above in the vec report.

Since I'm not using a pointer or an assumed-shape array to refer to it, I can't think of any way to tell the compiler that it is contiguous, which seems to be the way to solve this kind of problem according to https://software.intel.com/en-us/articles/vectorization-and-array-contiguity.

If anyone knows how to deal with this issue, any advice will be deeply appreciated.

TimP · ‎07-25-2018

It shouldn't hurt to declare CONTIGUOUS as in Martyn's examples, although one would think it unnecessary for a module array. We may need a small specific working example.
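
A minimal sketch of what declaring CONTIGUOUS could look like, assuming the arrays are handed to a worker routine as assumed-shape dummy arguments. The names dot_mod and dot_columns are hypothetical and are not taken from Martyn's article; whether this changes the vectorization report would have to be checked with the compiler in use.

```fortran
module dot_mod
   implicit none
contains
   ! Hypothetical worker routine: the CONTIGUOUS attribute (Fortran 2008)
   ! asserts that the assumed-shape dummies refer to contiguous storage.
   subroutine dot_columns(a, c, b)
      double precision, contiguous, intent(in)  :: a(:,:), c(:,:)
      double precision, contiguous, intent(out) :: b(:)
      integer :: i

      do i = 1, size(b)
         b(i) = dot_product(a(:,i), c(:,i))
      end do
   end subroutine dot_columns
end module dot_mod

program demo_contiguous
   use dot_mod, only : dot_columns
   implicit none
   integer, parameter :: n = 4          ! small size, for illustration only
   double precision, allocatable :: a(:,:), c(:,:), b(:)

   allocate(a(3,n), c(3,n), b(n))
   a = 2.d0
   c = 0.5d0
   call dot_columns(a, c, b)            ! whole allocatable arrays are contiguous
   print *, b
end program demo_contiguous
```

Passing whole allocated arrays as actual arguments keeps the call free of temporary copies, since the storage is already contiguous.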

kang__myeongseok · ‎07-25-2018

Thank you for the reply!

Here I have added example code with a module, an initialization subroutine, and another subroutine containing the loop.

module mod_A
   implicit none
   save
   double precision, allocatable, dimension(:,:) :: A
   double precision, allocatable, dimension(:) :: B
   type C_type
      double precision, pointer, dimension(:,:) :: d
   end type C_type
   type(C_type) :: C

   integer :: N
end module mod_A

subroutine init_A
   use mod_A, only : A,B,C,N
   implicit none
   double precision :: a1,c1

   print*,"enter a size of array and two real values"
   read (*,*) N, a1, c1
   allocate(A(3,N), B(N), C%d(3,N))
   A = a1
   B = 0.d0
   C%d = c1
end subroutine init_A

subroutine loop_A
   use mod_A, only : A,B,C,N
   implicit none
   integer :: i

   do i = 1, N
      B(i) = dot_product(A(:,i),C%d(:,i))
   enddo 
end subroutine loop_A

program run_A
   implicit none
   
   call init_A
   call loop_A

end program run_A

After saving this file as test.f90, I compiled it with the command "ifort -O2 -qopt-report=5 -qopt-report-phase=vec test.f90".

I have attached test.f90 and the vec report as a zip file.

If you look at the optrpt file in the zip file, part of it reports:

------------------------------------------------------------------------------------------------------------------------------------------------------------

LOOP BEGIN at test.f90(37,4) inlined into test.f90(46,9)
remark #15388: vectorization support: reference b_(I) has aligned access [ test.f90(38,7) ]
remark #15328: vectorization support: non-unit strided load was emulated for the variable <c(:,I)>, stride is 3 [ test.f90(38,14) ]
remark #15328: vectorization support: non-unit strided load was emulated for the variable <a_(:,I)>, stride is unknown to compiler [ test.f90(38,14) ]

-----------------------------------------------------------------------------------------------------------------------------------------------------------

As you can see, the allocatable array "A" is problematic for vectorization.

If I were using pointers or assumed-shape array arguments to subroutines, this problem could be solved by giving the array the "contiguous" attribute.

However, whenever I use allocatable arrays from modules via USE ... ONLY, as in the example, I can't seem to resolve this issue, since I cannot give the "contiguous" attribute to allocatable arrays.

It also confuses me that the compiler sometimes has the stride information for C%d (as in this example, where the report gives stride 3) and other times does not.

Any advice will be deeply appreciated.

TimP · ‎07-25-2018

It looks like the compiler recognizes a length 3 for the first dimension, which is the worst possible for vectorization. So it wants to vectorize on the 2nd dimension, but recognizes that mixing stride 1 and 3 isn't satisfactory for vectorization, at least when you ask for Pentium 4 code.
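
To illustrate the point about the length-3 first dimension, here is a sketch that writes the three-term dot product out by hand, so that the only loop left is the one over the second dimension. The names dot3_mod and dot3_columns, and the use of explicit-shape dummy arguments with a fixed leading dimension of 3, are assumptions for illustration; this has not been checked against the vectorization report from the original test case.

```fortran
module dot3_mod
   implicit none
contains
   ! Hypothetical routine: the explicit shape (3,n) states the leading
   ! dimension in the declaration, and the 3-term sum is unrolled by hand.
   subroutine dot3_columns(n, a, d, b)
      integer, intent(in) :: n
      double precision, intent(in)  :: a(3,n), d(3,n)
      double precision, intent(out) :: b(n)
      integer :: i

      do i = 1, n
         b(i) = a(1,i)*d(1,i) + a(2,i)*d(2,i) + a(3,i)*d(3,i)
      end do
   end subroutine dot3_columns
end module dot3_mod

program demo_dot3
   use dot3_mod, only : dot3_columns
   implicit none
   integer, parameter :: n = 5          ! small size, for illustration only
   double precision :: a(3,n), d(3,n), b(n)

   a = 1.d0
   d = 2.d0
   call dot3_columns(n, a, d, b)
   print *, b
end program demo_dot3
```

The loop over i still reads a and d with a stride of 3 elements, so this only makes the stride visible in the declarations; it does not turn the accesses into unit-stride ones.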

kang__myeongseok · ‎07-25-2018

Thank you for the quick reply.

I have changed the dimensions in the code to A(N,3), so that the 3 is now the 2nd dimension, as shown below:

module mod_A
   implicit none
   save
   double precision, allocatable, dimension(:,:) :: A
   double precision, allocatable, dimension(:) :: B
   type C_type
      double precision, contiguous, pointer, dimension(:,:) :: d
   end type C_type
   type(C_type) :: C
 
   integer :: N
 
end module mod_A
 
subroutine init_A
   use mod_A, only : A,B,C,N
   implicit none
   double precision :: a1,c1
 
   print*,"enter a size of array and two real values"
   read (*,*) N, a1, c1
   allocate( B(N), A(N,3), C%d(N,3))

   A = a1
   B = 0.d0
   C%d = c1
end subroutine init_A
 
subroutine loop_A
   use mod_A, only : A,B,C,N
   implicit none
   integer :: i,j
 
   do i = 1, N
!      B(i) = dot_product(A(i,:),C%d(i,:))
      do j = 1, 3
         ! accumulate the three products (B is zeroed in init_A)
         B(i) = B(i) + A(i,j)*C%d(i,j)
      enddo
   enddo
end subroutine loop_A
 
program run_A
   implicit none
 
   call init_A
   call loop_A
end program run_A

This change essentially removed the unknown-stride problem.

However, the dimension change led to slower code, so I think staying with the first layout, where the length-3 dimension comes first, is the better choice in this case.
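
For anyone who wants to repeat the comparison, a rough timing sketch along these lines could be used. It is not the code that was actually timed here: the program name, the array names, the problem size and the use of system_clock are all assumptions, and a single pass like this is only a coarse measurement.

```fortran
program compare_layouts
   use, intrinsic :: iso_fortran_env, only : int64
   implicit none
   integer, parameter :: n = 1000000    ! hypothetical problem size
   double precision, allocatable :: a3n(:,:), c3n(:,:)   ! (3,N) layout
   double precision, allocatable :: an3(:,:), cn3(:,:)   ! (N,3) layout
   double precision, allocatable :: b(:)
   integer(int64) :: t0, t1, rate
   integer :: i

   allocate(a3n(3,n), c3n(3,n), an3(n,3), cn3(n,3), b(n))
   a3n = 1.d0; c3n = 2.d0
   an3 = 1.d0; cn3 = 2.d0

   ! Layout 1: length-3 dimension first
   call system_clock(t0, rate)
   do i = 1, n
      b(i) = a3n(1,i)*c3n(1,i) + a3n(2,i)*c3n(2,i) + a3n(3,i)*c3n(3,i)
   end do
   call system_clock(t1)
   print *, "A(3,N):", dble(t1 - t0)/dble(rate), "s  checksum", sum(b)

   ! Layout 2: length-3 dimension second
   call system_clock(t0)
   do i = 1, n
      b(i) = an3(i,1)*cn3(i,1) + an3(i,2)*cn3(i,2) + an3(i,3)*cn3(i,3)
   end do
   call system_clock(t1)
   print *, "A(N,3):", dble(t1 - t0)/dble(rate), "s  checksum", sum(b)
end program compare_layouts
```

Printing the checksum keeps the compiler from discarding the loops; repeating each loop several times would give steadier numbers.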

Thank you so much for your time!