I believe the documentation is incorrect at:
http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/index.htm
in the chapter:
Optimizing Applications/Programming Guidelines/Understanding Run-time Performance/Non-Unit Stride Memory Access
In the description of the examples the indices are interchanged, and the description itself is not correct.
Since Fortran stores arrays with the leftmost index varying fastest (column-major order), the two figures are also incorrect: they are either interchanged or transposed. For a(i,k), incrementing k (holding i constant) does *not* access consecutive memory elements, whereas for b(k,j), incrementing k (holding j constant) does access consecutive memory elements.
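To make the storage-order claim easy to check, here is a minimal sketch (the program name and the array size `n = 4` are my own choices for illustration, not taken from the documentation). It fills a small array in storage order, so that each element's value is its position in memory, and then walks along each index in turn:

```fortran
program column_major_demo
  implicit none
  integer, parameter :: n = 4   ! illustrative size, an assumption
  integer :: a(n, n)
  integer :: i, k

  ! reshape fills the array in array element (storage) order, so each
  ! element's value equals its 1-based position in memory.
  a = reshape([(i, i = 1, n*n)], [n, n])

  ! Incrementing the first (leftmost) index: consecutive positions.
  print '(a, 4i4)', 'a(1:4,1) positions:', (a(i, 1), i = 1, n)

  ! Incrementing the second index: positions n elements apart.
  print '(a, 4i4)', 'a(1,1:4) positions:', (a(1, k), k = 1, n)
end program column_major_demo
```

The first line should list positions 1, 2, 3, 4 (unit stride) and the second 1, 5, 9, 13 (stride `n`), which is why incrementing k in a(i,k) skips over memory.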
Best regards
dnoack
--------
Non-Unit Stride Memory Access
Another issue that can have considerable impact on performance is accessing memory in a non-unit-stride fashion. This means that as your inner loop increments consecutively, you access memory from non-adjacent locations. For example, consider the following matrix multiplication code:
Example:

    !Non-Unit Stride Memory Access
    subroutine non_unit_stride_memory_access(a,b,c, NUM)
      implicit none
      integer :: i,j,k,NUM
      real :: a(NUM,NUM), b(NUM,NUM), c(NUM,NUM)
      ! loop before loop interchange
      do i=1,NUM
        do j=1,NUM
          do k=1,NUM
            c(j,i) = c(j,i) + a(j,k) * b(k,i)
          end do
        end do
      end do
    end subroutine non_unit_stride_memory_access
Notice that c
Loop transformation (sometimes called loop interchange) helps to address this problem. While the compiler is capable of doing loop interchange automatically, it does not always recognize the opportunity.
The memory access pattern for the example code listed above is illustrated in the following figure:
Assume you modify the example code listed above by making the following changes to introduce loop interchange:
Example:

    subroutine unit_stride_memory_access(a,b,c, NUM)
      implicit none
      integer :: i,j,k,NUM
      real :: a(NUM,NUM), b(NUM,NUM), c(NUM,NUM)
      ! loop after interchange
      do i=1,NUM
        do k=1,NUM
          do j=1,NUM
            c(j,i) = c(j,i) + a(j,k) * b(k,i)
          end do
        end do
      end do
    end subroutine unit_stride_memory_access
After the loop interchange, the memory access pattern might look like the following figure:
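The figures from the documentation are not reproduced in this post. As a rough substitute, the sketch below runs both loop orders on the same data, checks that they produce the same product, and times each with the standard `cpu_time` intrinsic. The program name, the size `num = 200`, and the use of `random_number` for test data are assumptions made for illustration. Timings depend on the compiler, flags, and machine, and an optimizing compiler may interchange the loops itself, so treat the numbers as indicative only.

```fortran
program compare_loop_orders
  implicit none
  integer, parameter :: num = 200   ! illustrative size, an assumption
  real, allocatable :: a(:,:), b(:,:), c1(:,:), c2(:,:)
  real :: t0, t1, t2
  integer :: i, j, k

  allocate(a(num,num), b(num,num), c1(num,num), c2(num,num))
  call random_number(a)
  call random_number(b)
  c1 = 0.0
  c2 = 0.0

  call cpu_time(t0)
  ! k innermost: b(k,i) is unit stride, a(j,k) jumps num elements per step
  do i = 1, num
    do j = 1, num
      do k = 1, num
        c1(j,i) = c1(j,i) + a(j,k) * b(k,i)
      end do
    end do
  end do
  call cpu_time(t1)

  ! j innermost: c2(j,i) and a(j,k) are unit stride, b(k,i) is loop-invariant
  do i = 1, num
    do k = 1, num
      do j = 1, num
        c2(j,i) = c2(j,i) + a(j,k) * b(k,i)
      end do
    end do
  end do
  call cpu_time(t2)

  print *, 'max abs difference between results:', maxval(abs(c1 - c2))
  print *, 'k innermost (seconds):', t1 - t0
  print *, 'j innermost (seconds):', t2 - t1
end program compare_loop_orders
```

The comments follow Fortran's column-major layout: with k innermost it is a(j,k), not b(k,i), that is accessed with a stride of `num` elements.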
