Intel® Fortran Compiler

Intel Fortran Compiler Documentation is not correct

dnoack
Beginner
Hello,

I believe the documentation is incorrect at:

http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/index.htm

chapter:
Optimizing Applications/Programming Guidelines/Understanding Run-time Performance/Non-Unit Stride Memory Access

In the description of the examples, the indices are interchanged and the description itself is not correct. With k as the inner loop, a(j,k) does not access consecutive memory locations, but b(k,i) does.

Since Fortran stores arrays with the leftmost index varying fastest (column-major order), the two figures are also incorrect: they are either interchanged or transposed. For a(i,k), incrementing k while holding i constant does *not* reach consecutive memory elements; for b(k,j), incrementing k while holding j constant does.
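
The storage order can be checked with a small self-contained program. This is only a sketch: the program name, the array size n, and the variable names are illustrative and do not come from the documentation. It relies on the standard rule that reshape takes elements in array element order.

```fortran
program column_major_demo
  implicit none
  integer, parameter :: n = 4        ! illustrative size (assumption)
  integer :: a(n,n), flat(n*n)
  integer :: i, k

  ! Store in each element the 1-based memory position predicted by the
  ! column-major rule: position = i + (k-1)*n
  do k = 1, n
    do i = 1, n
      a(i,k) = i + (k-1)*n
    end do
  end do

  ! reshape walks the array in storage (array element) order
  flat = reshape(a, [n*n])

  ! If the prediction is right, flat is simply 1, 2, ..., n*n
  do i = 1, n*n
    if (flat(i) /= i) error stop "unexpected storage order"
  end do

  print *, "step in memory when the first index increments: ", a(2,1) - a(1,1)  ! 1
  print *, "step in memory when the second index increments:", a(1,2) - a(1,1)  ! n
end program column_major_demo
```

The first index moves one element at a time, while the second index jumps by n elements, which is why a(j,k) inside a k loop is the non-unit-stride access.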

Best regards
dnoack

--------

Non-Unit Stride Memory Access

Another issue that can have considerable impact on performance is accessing memory in a non-Unit Stride fashion. This means that as your inner loop increments consecutively, you access memory from non adjacent locations. For example, consider the following matrix multiplication code:

Example

!Non-Unit Stride Memory Access
subroutine non_unit_stride_memory_access(a,b,c, NUM)
  implicit none
  integer :: i,j,k,NUM
  real :: a(NUM,NUM), b(NUM,NUM), c(NUM,NUM)
  ! loop before loop interchange
  do i=1,NUM
    do j=1,NUM
      do k=1,NUM
        c(j,i) = c(j,i) + a(j,k) * b(k,i)
      end do
    end do
  end do
end subroutine non_unit_stride_memory_access

Notice that c, and a both access consecutive memory locations when the inner-most loops associated with the array are incremented. The b array however, with its loops with indexes k and j, does not access Memory Unit Stride. When the loop reads b[k=0][j=0] and then the k loop increments by one to b[k=1][j=0], the loop has skipped over NUM memory locations having skipped b[1], b[2] .. b[NUM].

Loop transformation (sometimes called loop interchange) helps to address this problem. While the compiler is capable of doing loop interchange automatically, it does not always recognize the opportunity.

The memory access pattern for the example code listed above is illustrated in the following figure:


Assume you modify the example code listed above by making the following changes to introduce loop interchange:

Example

subroutine unit_stride_memory_access(a,b,c, NUM)
  implicit none
  integer :: i,j,k,NUM
  real :: a(NUM,NUM), b(NUM,NUM), c(NUM,NUM)
  ! loop after interchange
  do i=1,NUM
    do k=1,NUM
      do j=1,NUM
        c(j,i) = c(j,i) + a(j,k) * b(k,i)
      end do
    end do
  end do
end subroutine unit_stride_memory_access
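
The following driver is not part of the quoted documentation. It is a minimal sketch that runs both loop orders on the same data, checks that they produce the same result, and times each one. The program name, the size n, and the tolerance are assumptions chosen for illustration. Measured times depend on the compiler and its optimization flags, and an optimizing compiler may perform the interchange on its own.

```fortran
program compare_loop_orders
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 200          ! illustrative size (assumption)
  real, allocatable :: a(:,:), b(:,:), c1(:,:), c2(:,:)
  integer :: i, j, k
  integer(int64) :: t0, t1, t2, rate

  allocate(a(n,n), b(n,n), c1(n,n), c2(n,n))
  call random_number(a)
  call random_number(b)
  c1 = 0.0
  c2 = 0.0

  call system_clock(t0, rate)

  ! Order from the first example: k innermost, so a(j,k) is read with stride n
  do i = 1, n
    do j = 1, n
      do k = 1, n
        c1(j,i) = c1(j,i) + a(j,k) * b(k,i)
      end do
    end do
  end do

  call system_clock(t1)

  ! Order from the second example: j innermost, so c(j,i) and a(j,k) are unit stride
  do i = 1, n
    do k = 1, n
      do j = 1, n
        c2(j,i) = c2(j,i) + a(j,k) * b(k,i)
      end do
    end do
  end do

  call system_clock(t2)

  ! Both orders compute the same sums; allow a small relative rounding difference
  if (maxval(abs(c1 - c2)) > 1.0e-4 * maxval(abs(c1))) error stop "loop orders disagree"

  print *, "k innermost (s):", real(t1 - t0) / real(rate)
  print *, "j innermost (s):", real(t2 - t1) / real(rate)
end program compare_loop_orders
```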

After the loop interchange the memory access pattern might look like the following figure:




Steven_L_Intel1
Employee
You are looking at documentation from an older version. It was obviously written for C, and only the code was changed for Fortran. This section was removed in Intel Fortran Composer XE 2011.