Intel® Fortran Compiler

Letting the compiler know the strides of an array from a module

kang__myeongseok

Hello,

While working on vectorizing some Fortran code, I encountered this message in the vectorization report:

"non-unit strided store was emulated for the variable <A_(:,idx)>, stride is unknown to compiler"

I declared an allocatable array "A(:,:)" in a module, then allocated and initialized it in a subroutine, with its size decided at run time.

Then, in another subroutine that accesses the module through USE with an ONLY clause, I used the array "A" in a loop, and that loop triggers the message above.

 

Since I'm not using a pointer or an assumed-shape array to refer to it, I can't think of any way to tell the compiler that it is contiguous,

which seems to be the way to solve this kind of problem according to [ https://software.intel.com/en-us/articles/vectorization-and-array-contiguity ].

 

If anyone knows how to deal with this issue, any advice would be deeply appreciated.

TimP

It shouldn't hurt to declare CONTIGUOUS as in Martyn's examples, although one would think it unnecessary for a module array.  We may need a small specific working example.
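For readers who have not used the attribute, here is a minimal sketch of what declaring CONTIGUOUS looks like on an assumed-shape dummy argument and on a pointer. It is not taken from Martyn's article; the names contig_demo, scale_cols, and p are hypothetical and only illustrate the syntax.

```fortran
module contig_demo
   implicit none
contains
   subroutine scale_cols(x, factor)
      ! CONTIGUOUS promises the compiler that x occupies memory without gaps
      double precision, contiguous, intent(inout) :: x(:,:)
      double precision, intent(in) :: factor
      integer :: i

      do i = 1, size(x, 2)
         x(:, i) = factor * x(:, i)
      enddo
   end subroutine scale_cols
end module contig_demo

program run_contig_demo
   use contig_demo, only : scale_cols
   implicit none
   double precision, allocatable, target :: a(:,:)
   double precision, contiguous, pointer :: p(:,:)

   allocate(a(3, 4))
   a = 1.d0
   p => a                     ! a whole allocated array is contiguous
   call scale_cols(p, 2.d0)
   print *, is_contiguous(a), sum(a)
end program run_contig_demo
```

The attribute is a promise rather than a check: pointing a CONTIGUOUS pointer at a strided section such as a(1, :) would not be valid.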

kang__myeongseok

Thank you for the reply!

Here is an example code with a module, an initialization subroutine, and another subroutine containing the loop.

 

module mod_A
   implicit none
   save
   double precision, allocatable, dimension(:,:) :: A
   double precision, allocatable, dimension(:) :: B
   type C_type
      double precision, pointer, dimension(:,:) :: d
   end type C_type
   type(C_type) :: C

   integer :: N
end module mod_A

subroutine init_A
   use mod_A, only : A,B,C,N
   implicit none
   double precision :: a1,c1

   print*,"enter a size of array and two real values"
   read (*,*) N, a1, c1
   allocate(A(3,N), B(N), C%d(3,N))
   A = a1
   B = 0.d0
   C%d = c1
end subroutine init_A

subroutine loop_A
   use mod_A, only : A,B,C,N
   implicit none
   integer :: i

   do i = 1, N
      B(i) = dot_product(A(:,i),C%d(:,i))
   enddo 
end subroutine loop_A

program run_A
   implicit none
   
   call init_A
   call loop_A

end program run_A

 

I saved this file as test.f90 and compiled it with the command "ifort -O2 -qopt-report=5 -qopt-report-phase=vec test.f90".

I have attached test.f90 and the vectorization report as a zip file.

If you look at the optrpt file in the zip file, part of it reports:

------------------------------------------------------------------------------------------------------------------------------------------------------------

LOOP BEGIN at test.f90(37,4) inlined into test.f90(46,9)
   remark #15388: vectorization support: reference b_(I) has aligned access   [ test.f90(38,7) ]
   remark #15328: vectorization support: non-unit strided load was emulated for the variable <c(:,I)>, stride is 3   [ test.f90(38,14) ]
   remark #15328: vectorization support: non-unit strided load was emulated for the variable <a_(:,I)>, stride is unknown to compiler   [ test.f90(38,14) ]
 

-----------------------------------------------------------------------------------------------------------------------------------------------------------

As you can see, the allocatable array "A" is problematic for vectorization.

If I use pointers or assumed-shape array arguments in subroutines, this problem can be solved by giving the array the "contiguous" attribute.

However, whenever I use allocatable arrays from a module via USE with ONLY, as in the example, I can't seem to resolve this issue, since I cannot give the "contiguous" attribute to allocatable arrays.

It also confuses me that the compiler sometimes gets the stride information for C%d and other times does not, as in this example.

Any advice will be deeply appreciated.
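One option sometimes suggested for this situation is to pass the arrays to a kernel that declares them as explicit-shape dummies, so that the leading extent 3 is visible in the declaration. The sketch below shows only the mechanics. The names kernel_sketch and loop_kernel are hypothetical, local arrays stand in for the module variables of the example, and whether this changes the vectorization report for a given compiler version is an assumption that has not been verified here.

```fortran
module kernel_sketch
   implicit none
contains
   subroutine loop_kernel(n, a, c, b)
      integer, intent(in) :: n
      ! Explicit-shape dummies: the first extent (3) is a compile-time constant
      double precision, intent(in)  :: a(3, n), c(3, n)
      double precision, intent(out) :: b(n)
      integer :: i

      do i = 1, n
         b(i) = dot_product(a(:, i), c(:, i))
      enddo
   end subroutine loop_kernel
end module kernel_sketch

program run_kernel_sketch
   use kernel_sketch, only : loop_kernel
   implicit none
   double precision, allocatable :: A(:,:), C(:,:), B(:)
   integer :: N

   N = 8
   allocate(A(3, N), C(3, N), B(N))
   A = 2.d0
   C = 0.5d0
   call loop_kernel(N, A, C, B)
   print *, B                 ! each element is 3 * (2.0 * 0.5)
end program run_kernel_sketch
```

In the posted example the call would be something like "call loop_kernel(N, A, C%d, B)". If an actual argument is not contiguous, the compiler may have to make a temporary copy for an explicit-shape dummy, so this is a trade-off rather than a free fix.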

TimP

It looks like the compiler recognizes a length 3 for the first dimension, which is the worst possible for vectorization.  So it wants to vectorize on the 2nd dimension, but recognizes that mixing stride 1 and 3 isn't satisfactory for vectorization, at least when you ask for Pentium 4 code.
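To make the stride remark concrete, here is a small sketch (the program name stride_sketch is hypothetical). Fortran stores arrays in column-major order, so a(k,i) sits at element offset (k-1) + 3*(i-1); stepping i with k fixed therefore moves 3 elements at a time. Writing the length-3 dot product out by hand shows the three stride-3 streams that a loop vectorized over i has to read. The result is compared against dot_product for correctness only; no performance claim is implied.

```fortran
program stride_sketch
   implicit none
   integer, parameter :: n = 5
   double precision :: a(3, n), c(3, n), b(n), b_ref(n)
   integer :: i

   call random_number(a)
   call random_number(c)

   ! Hand-unrolled dot product: for fixed k, a(k,1), a(k,2), ... are
   ! 3 elements apart in memory, i.e. a stride-3 access in the i loop
   do i = 1, n
      b(i) = a(1, i)*c(1, i) + a(2, i)*c(2, i) + a(3, i)*c(3, i)
   enddo

   ! Reference result using the intrinsic on each length-3 column
   do i = 1, n
      b_ref(i) = dot_product(a(:, i), c(:, i))
   enddo

   ! Largest absolute difference between the two versions
   print *, maxval(abs(b - b_ref))
end program stride_sketch
```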


kang__myeongseok

Thank you for a quick reply.

I have swapped the dimensions in the code so that the arrays are A(N,3), with the 3 in the 2nd dimension, as shown below:

module mod_A
   implicit none
   save
   double precision, allocatable, dimension(:,:) :: A
   double precision, allocatable, dimension(:) :: B
   type C_type
      double precision, contiguous, pointer, dimension(:,:) :: d
   end type C_type
   type(C_type) :: C
 
   integer :: N
 
end module mod_A
 
subroutine init_A
   use mod_A, only : A,B,C,N
   implicit none
   double precision :: a1,c1
 
   print*,"enter a size of array and two real values"
   read (*,*) N, a1, c1
   allocate( B(N), A(N,3), C%d(N,3))

   A = a1
   B = 0.d0
   C%d = c1
end subroutine init_A
 
subroutine loop_A
   use mod_A, only : A,B,C,N
   implicit none
   integer :: i,j
 
   do i = 1, N
!      B(i) = dot_product(A(i,:),C%d(i,:))
      ! Accumulate over j so that B(i) holds the full dot product
      B(i) = 0.d0
      do j = 1, 3
         B(i) = B(i) + A(i,j)*C%d(i,j)
      enddo
   enddo
end subroutine loop_A
 
program run_A
   implicit none
 
   call init_A
   call loop_A
end program run_A

This change essentially removed the unknown-stride problem.

However, it also led to slower code, so I think staying with the first layout, where the length-3 dimension comes first, is the better choice in this case.

Thank you so much for your time!
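For anyone who wants to compare the two layouts on their own machine, the following is a rough timing sketch, not the code that was measured above. The program name, the array size, and the repetition count are arbitrary assumptions, and an optimizing compiler may transform the repeated loops, so the timings should be treated as indicative at best.

```fortran
program layout_timing_sketch
   use iso_fortran_env, only : int64
   implicit none
   integer, parameter :: n = 1000000, reps = 20
   double precision, allocatable :: a1(:,:), c1(:,:), a2(:,:), c2(:,:), b(:)
   double precision :: check
   integer(int64) :: t0, t1, rate
   integer :: i, r

   allocate(a1(3, n), c1(3, n), a2(n, 3), c2(n, 3), b(n))
   a1 = 1.d0
   c1 = 2.d0
   a2 = 1.d0
   c2 = 2.d0
   check = 0.d0

   ! Layout (3,N): the three terms of each dot product are adjacent in memory
   call system_clock(t0, rate)
   do r = 1, reps
      do i = 1, n
         b(i) = a1(1, i)*c1(1, i) + a1(2, i)*c1(2, i) + a1(3, i)*c1(3, i)
      enddo
      check = check + b(r)    ! use the result so the loop is not discarded
   enddo
   call system_clock(t1)
   print *, 'layout (3,N), seconds:', dble(t1 - t0) / dble(rate)

   ! Layout (N,3): consecutive i are adjacent in memory (unit stride in i)
   call system_clock(t0)
   do r = 1, reps
      do i = 1, n
         b(i) = a2(i, 1)*c2(i, 1) + a2(i, 2)*c2(i, 2) + a2(i, 3)*c2(i, 3)
      enddo
      check = check + b(r)
   enddo
   call system_clock(t1)
   print *, 'layout (N,3), seconds:', dble(t1 - t0) / dble(rate)
   print *, 'check value:', check
end program layout_timing_sketch
```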
