Intel® Fortran Compiler

Row-major or column-major?

Junghwan_m_

It is common knowledge that Fortran stores arrays in column-major order, but the following code gives an unexpected result: the row-major traversal (inner loop over the second index, j) is faster.
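For reference, a minimal sketch of what column-major storage means in practice. The program name and the 2x3 array are made up for illustration. Each element encodes its own indices, and reshape reads the array out in storage order, so the first index can be seen to vary fastest. The source is laid out so that it should compile as either fixed form or free form.

```fortran
      program column_major_demo
      implicit none
      integer :: a(2,3), flat(6), i, j

      ! Each element encodes its own indices: a(i,j) = 10*i + j
      do j = 1, 3
         do i = 1, 2
            a(i,j) = 10*i + j
         end do
      end do

      ! reshape reads the source in array element (storage) order
      flat = reshape(a, (/ 6 /))

      ! Prints 11 21 12 22 13 23: the first index varies fastest
      write(*,'(6i4)') flat
      end program column_major_demo
```

Consecutive memory locations therefore hold a(1,j), a(2,j), ..., which is why an inner loop over i is normally the cache-friendly order.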

 

Tested version : ifort version 18.0.1

 

[Results]

do i= ; do j= ;  A(i,j) = something …  ==> 1.532 s

do j= ; do i= ;  A(i,j) = something …  ==> 5.095 s

 

Thanks in advance.

 

      program test_simple_calculation

      parameter (IMAX=10000,JMAX=100)
      integer i,j,k
      real s1,s2
      real a(IMAX,JMAX),b(IMAX,JMAX),c(IMAX,JMAX)
      real d(IMAX,JMAX)

      write(*,*) 'The for sum  :'
      a=1.0e-1;b=1.0e-1;c=1.0e-1;
      call time_check(s1)
      do k=1,100
      do i=1,IMAX
      do j=1,JMAX
      c(i,j)=a(i,j)+b(i,j)
      enddo
      enddo
      enddo
      call time_check(s2)
      d=c
      write(*,*) 'total time in sec.    ',s2-s1
      write(*,*) '================================='

      write(*,*) 'The for sum  :'
      a=1.0e-1;b=1.0e-1;c=1.0e-1;
      call time_check(s1)
      do k=1,100
      do j=1,JMAX
      do i=1,IMAX
      c(i,j)=a(i,j)+b(i,j)
      enddo
      enddo
      enddo
      call time_check(s2)
      d=c
      write(*,*) 'total time in sec.    ',s2-s1
      write(*,*) '================================='
      stop
      end



      subroutine time_check(s)
      integer  values(1:8)
      real s
      call DATE_AND_TIME(values=values)
      write(*,*) values(5),values(6),values(7),values(8)
      s=real(values(6))*60.+real(values(7))+real(values(8))*0.001
      return
      end

 

IanH

Compiling with the equivalent of -O3 here, I see minimal difference in timing (two milliseconds for the first set of loops, one millisecond for the second set).  Inspecting the assembly shows that the second set of loops is collapsed into a single loop, while the loops of the first set have been reordered.  Both sets drop the outer loop.

More generally - note that the value of d is not used in the program.  This means that the value of c is not used, which means that there is no point calculating a + b, which means that there is no point executing any of the loops.  Sometimes the optimiser will be smart enough to figure this sort of stuff out (at least here it appears to have figured out the k loop is not required), so you need to design your tests appropriately and always check assembly or similar to see what it is that you are actually testing.
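A minimal sketch of a test designed along those lines, not a definitive benchmark. The names loop_order_timing, nrep and check are made up for illustration, and system_clock replaces the DATE_AND_TIME helper from the question. Each pass over k produces different values, and a checksum that depends on every pass is printed, so the computed results are actually used. This makes it harder for the optimiser to discard the work, but it does not stop the compiler from interchanging or collapsing loops, so the generated assembly still needs to be checked.

```fortran
      program loop_order_timing
      implicit none
      integer, parameter :: i8 = selected_int_kind(18)
      integer, parameter :: imax = 10000, jmax = 100, nrep = 100
      integer :: i, j, k
      integer(i8) :: t0, t1, rate
      real, allocatable :: a(:,:), b(:,:), c(:,:)
      real :: check

      allocate(a(imax,jmax), b(imax,jmax), c(imax,jmax))
      a = 0.1
      b = 0.1
      c = 0.0
      check = 0.0

      ! i outer, j inner: strided access in column-major storage
      call system_clock(t0, rate)
      do k = 1, nrep
         do i = 1, imax
            do j = 1, jmax
               c(i,j) = a(i,j) + b(i,j) + real(k)
            end do
         end do
         ! Consume one element per pass so every pass is needed
         check = check + c(1 + mod(k,imax), 1 + mod(k,jmax))
      end do
      call system_clock(t1)
      write(*,*) 'i outer, j inner (s):', real(t1-t0)/real(rate)

      ! j outer, i inner: contiguous access in column-major storage
      call system_clock(t0)
      do k = 1, nrep
         do j = 1, jmax
            do i = 1, imax
               c(i,j) = a(i,j) + b(i,j) + real(k)
            end do
         end do
         check = check + c(1 + mod(k,imax), 1 + mod(k,jmax))
      end do
      call system_clock(t1)
      write(*,*) 'j outer, i inner (s):', real(t1-t0)/real(rate)

      ! Printing the checksum keeps the results live
      write(*,*) 'checksum:', check
      end program loop_order_timing
```

The arrays are allocatable here only to keep roughly 12 MB of data off the stack. With arrays this small the working set may still fit largely in cache, so larger sizes may be needed before the loop order shows a clear difference.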
 
