Hello,
While working on vectorizing some Fortran code, I encountered a message in the vectorization report which says,
"non-unit strided store was emulated for the variable <A_(:,idx)>, stride is unknown to compiler"
I declared an allocatable array "A(:,:)" in a module, and I allocate and initialize it in a subroutine with a size that is decided at runtime.
In another subroutine I access the module with USE, ONLY and use the "A" array in a loop, which produces the message above in the vec report.
Since I'm not using a pointer or an assumed-shape array to refer to it, I can't think of any way to tell the compiler that it is contiguous,
which seems to be the way to solve this kind of problem according to https://software.intel.com/en-us/articles/vectorization-and-array-contiguity.
If anyone knows how to deal with this issue, any advice will be deeply appreciated.
It shouldn't hurt to declare CONTIGUOUS as in Martyn's examples, although one would think it unnecessary for a module array. We may need a small specific working example.
Thank you for the reply!
Here is example code with a module, an initialization subroutine, and another subroutine containing the loop.
module mod_A
   implicit none
   save
   double precision, allocatable, dimension(:,:) :: A
   double precision, allocatable, dimension(:) :: B
   type C_type
      double precision, pointer, dimension(:,:) :: d
   end type C_type
   type(C_type) :: C
   integer :: N
end module mod_A

subroutine init_A
   use mod_A, only : A,B,C,N
   implicit none
   double precision :: a1,c1
   print*,"enter a size of array and two real values"
   read (*,*) N, a1, c1
   allocate(A(3,N), B(N), C%d(3,N))
   A = a1
   B = 0.d0
   C%d = c1
end subroutine init_A

subroutine loop_A
   use mod_A, only : A,B,C,N
   implicit none
   integer :: i
   do i = 1, N
      B(i) = dot_product(A(:,i),C%d(:,i))
   enddo
end subroutine loop_A

program run_A
   implicit none
   call init_A
   call loop_A
end program run_A
I saved this file as test.f90 and compiled it with the command "ifort -O2 -qopt-report=5 -qopt-report-phase=vec test.f90".
I have attached test.f90 and the vec report as a zip file.
If you take a look at the optrpt file in the zip file, part of it reports,
------------------------------------------------------------------------------------------------------------------------------------------------------------
LOOP BEGIN at test.f90(37,4) inlined into test.f90(46,9)
remark #15388: vectorization support: reference b_(I) has aligned access [ test.f90(38,7) ]
remark #15328: vectorization support: non-unit strided load was emulated for the variable <c(:,I)>, stride is 3 [ test.f90(38,14) ]
remark #15328: vectorization support: non-unit strided load was emulated for the variable <a_(:,I)>, stride is unknown to compiler [ test.f90(38,14) ]
-----------------------------------------------------------------------------------------------------------------------------------------------------------
As you can see, the allocatable array "A" is problematic for vectorization.
If I use pointers or assumed-shape array arguments in subroutines, this problem can be solved by giving the array the "contiguous" attribute.
Whenever I use allocatable arrays from modules via USE, ONLY as shown in the example, however, I can't seem to resolve this issue, since I cannot give the "contiguous" attribute to allocatable arrays.
I am also confused that the compiler sometimes obtains the stride of C%d, as in this example, and at other times does not.
Any advice will be deeply appreciated.
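For reference, the CONTIGUOUS route mentioned above can be sketched as follows. This is only an illustration under assumptions: the module and routine names (dot_cols_mod, dot_cols) are hypothetical and not part of test.f90, and whether the optimization report changes has not been checked here. The standard only permits CONTIGUOUS on array pointers and assumed-shape dummies; a whole allocatable array is already contiguous by definition, which the Fortran 2008 intrinsic is_contiguous can confirm.

```fortran
module dot_cols_mod
   implicit none
contains
   ! Hypothetical helper: a and c are assumed-shape dummies, so the
   ! CONTIGUOUS attribute is allowed here even though it cannot be
   ! placed on the allocatable actual arguments.
   subroutine dot_cols(a, c, b)
      double precision, contiguous, intent(in) :: a(:,:), c(:,:)
      double precision, intent(out) :: b(:)
      integer :: i
      do i = 1, size(b)
         b(i) = dot_product(a(:,i), c(:,i))
      end do
   end subroutine dot_cols
end module dot_cols_mod

program demo_contiguous
   use dot_cols_mod, only : dot_cols
   implicit none
   integer, parameter :: n = 4
   double precision, allocatable :: a(:,:), c(:,:), b(:)
   allocate(a(3,n), c(3,n), b(n))
   a = 2.d0
   c = 0.5d0
   ! A whole allocatable array is contiguous, so this prints T.
   print *, is_contiguous(a)
   call dot_cols(a, c, b)
   ! Each element is 3 * (2.0 * 0.5) = 3.0
   print *, b
end program demo_contiguous
```

Note that contiguity only says the elements are packed in memory; it does not tell the compiler the size of the first dimension, which is what determines the stride between A(1,i) and A(1,i+1).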
It looks like the compiler recognizes a length 3 for the first dimension, which is the worst possible for vectorization. So it wants to vectorize on the 2nd dimension, but recognizes that mixing stride 1 and 3 isn't satisfactory for vectorization, at least when you ask for Pentium 4 code.
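If the first dimension really is always 3, one possible way to make that visible at compile time is to pass the arrays to a routine with explicit-shape dummies whose leading extent is a named constant. The following is a minimal sketch under that assumption; mod_fixed, loop_fixed and ndim are hypothetical names, and the effect on the ifort vectorization report has not been verified here.

```fortran
module mod_fixed
   implicit none
   ! Assumption: the leading extent is known when the code is compiled.
   integer, parameter :: ndim = 3
contains
   subroutine loop_fixed(n, a, c, b)
      integer, intent(in) :: n
      ! Explicit-shape dummies: the distance between a(1,i) and
      ! a(1,i+1) is the constant ndim rather than a runtime value.
      double precision, intent(in)  :: a(ndim, n), c(ndim, n)
      double precision, intent(out) :: b(n)
      integer :: i
      do i = 1, n
         b(i) = a(1,i)*c(1,i) + a(2,i)*c(2,i) + a(3,i)*c(3,i)
      end do
   end subroutine loop_fixed
end module mod_fixed

program demo_fixed
   use mod_fixed, only : loop_fixed, ndim
   implicit none
   double precision, allocatable :: a(:,:), c(:,:), b(:)
   integer :: n
   n = 4
   allocate(a(ndim,n), c(ndim,n), b(n))
   a = 2.d0
   c = 0.5d0
   call loop_fixed(n, a, c, b)
   ! Each element is 3 * (2.0 * 0.5) = 3.0
   print *, b
end program demo_fixed
```

The sketch passes allocatable arrays only. Passing a pointer component such as C%d to an explicit-shape dummy may make the compiler create a temporary copy if it cannot prove the target is contiguous.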
Thank you for the quick reply.
I have changed the array shape in the code to A(N,3), so that the 3 is now the 2nd dimension, as shown below:
module mod_A
   implicit none
   save
   double precision, allocatable, dimension(:,:) :: A
   double precision, allocatable, dimension(:) :: B
   type C_type
      double precision, contiguous, pointer, dimension(:,:) :: d
   end type C_type
   type(C_type) :: C
   integer :: N
end module mod_A

subroutine init_A
   use mod_A, only : A,B,C,N
   implicit none
   double precision :: a1,c1
   print*,"enter a size of array and two real values"
   read (*,*) N, a1, c1
   allocate( B(N), A(N,3), C%d(N,3))
   A = a1
   B = 0.d0
   C%d = c1
end subroutine init_A

subroutine loop_A
   use mod_A, only : A,B,C,N
   implicit none
   integer :: i,j
   do i = 1, N
      ! B(i) = dot_product(A(i,:),C%d(i,:))
      do j = 1, 3
         B(i) = (A(i,j)*C%d(i,j))
      enddo
   enddo
end subroutine loop_A

program run_A
   implicit none
   call init_A
   call loop_A
end program run_A
This change essentially removed the unknown-stride problem.
It also led to slower code, though, so I think staying with the first layout, where the length-3 extent is the first dimension, is the better choice in this case.
Thank you so much for your time!
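One thing to note about the A(N,3) version: the inner j loop assigns to B(i) on every pass instead of accumulating, so B(i) ends up holding only the j = 3 product, unlike the commented-out dot_product line. A minimal sketch of an accumulating form is given below, assuming plain allocatable arrays (the pointer component C%d is replaced by an ordinary array c for brevity) and with the loops interchanged so that the inner loop runs over the unit-stride first dimension. No claim is made here about its speed relative to the A(3,N) layout.

```fortran
program demo_n_by_3
   implicit none
   integer, parameter :: n = 5
   double precision, allocatable :: a(:,:), c(:,:), b(:)
   integer :: i, j
   allocate(a(n,3), c(n,3), b(n))
   a = 2.d0
   c = 0.5d0
   b = 0.d0
   ! Outer loop over the short dimension, inner loop over i so that
   ! a(i,j), c(i,j) and b(i) are all accessed with unit stride.
   do j = 1, 3
      do i = 1, n
         b(i) = b(i) + a(i,j)*c(i,j)
      end do
   end do
   ! Each element is 3 * (2.0 * 0.5) = 3.0
   print *, b
end program demo_n_by_3
```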
