Strange speedup with gfortran over ifort

Micha_Gorelick · ‎12-25-2010

Hi,

I'm trying to create a custom CSHIFT function in order to easily maintain some strange boundary conditions. While playing around, I made the following test program:

[fortran]PROGRAM cshifttest
  USE IFPORT
  IMPLICIT NONE
 
  INTEGER, PARAMETER                   :: numtests = 10000
  DOUBLE PRECISION, DIMENSION(300,300) :: a, b
  REAL                                 :: starttime, endtime
  INTEGER                              :: i, j
  777 FORMAT("=== ",A," took ",F16.8," seconds to run (",E16.8," seconds per shift)")
  call srand(34533)
 
  PRINT*,"Initializing"
  DO i=1,SIZE(a,1)
    DO j=1,SIZE(a,2)
      a(i,j) = rand() * 100
    END DO
  END DO
 
  PRINT*,"Runing cshift"
  CALL cpu_time(starttime)
  DO i=1,INT(numtests/2)
    b = cshift(a, i, 1)
  END DO
  DO i=1,INT(numtests/2)
    b = cshift(a, i, 2)
  END DO
  CALL cpu_time(endtime)
  WRITE(*,777) "cshift", (endtime-starttime), (endtime-starttime)/numtests
 
  PRINT*,"Running mshift"
  CALL cpu_time(starttime)
  DO i=1,INT(numtests/2)
    b = mshift(a, i, 1)
  END DO
  DO i=1,INT(numtests/2)
    b = mshift(a, i, 2)
  END DO
  CALL cpu_time(endtime)
  WRITE(*,777) "mshift", (endtime-starttime), (endtime-starttime)/numtests
 
  CONTAINS
 
  FUNCTION mshift(array, shift, axis) result(shifted)
    IMPLICIT NONE
    DOUBLE PRECISION, DIMENSION(:,:)                         :: array
    DOUBLE PRECISION, DIMENSION(SIZE(array,1),SIZE(array,2)) :: shifted
    INTEGER                                                  :: shift, axis
 
    shifted = CSHIFT(array, shift, axis)
    shifted(1,:) = array(2,:)
    shifted(SIZE(array,1),:) = array(SIZE(array,1)-1,:)
    shifted(:,1) = array(:,2)
    shifted(:,SIZE(array,2)) = array(:,SIZE(array,2)-1)
    return
  END FUNCTION
END PROGRAM[/fortran]

When I run this with ifort (compiled with `ifort -O3 cshifttest.f90`) I get the following output:

$ ifort -O3 cshifttest.f90 && ./a.out
Initializing
Runing cshift
=== cshift took 0.3439480 seconds to run ( 0.3439480E-04 seconds per shift)
Running mshift
=== mshift took 39.9719238 seconds to run ( 0.3997192E-02 seconds per shift)

On the other hand, gfortran (compiled with `gfortran -O3 cshifttest.f90`; note that you must comment out the USE statement on line 2) gives:

$ gfortran -O3 cshifttest.f90 && ./a.out
Initializing
Runing cshift
=== cshift took 3.08553004 seconds to run ( 0.30855299E-03 seconds per shift)
Running mshift
=== mshift took 3.12652516 seconds to run ( 0.31265253E-03 seconds per shift)

I have the following versions installed:

$ ifort --version
ifort (IFORT) 11.0 20090318
Copyright (C) 1985-2009 Intel Corporation. All rights reserved.
$ gfortran --version
GNU Fortran (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44)
Copyright (C) 2007 Free Software Foundation, Inc.

Furthermore, some information about the machine:

[bash]$ uname -a
Linux xxxxxxxxxx 2.6.18-128.1.14.el5 #1 SMP Wed Jun 17 06:38:05 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

$ free -m
             total       used       free     shared    buffers     cached
Mem:         32168      10655      21513          0        848       8436
-/+ buffers/cache:       1370      30798
Swap:        16002         82      15920

$ head -n 22 /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel Xeon CPU            5160  @ 3.00GHz
stepping        : 11
cpu MHz         : 2992.509
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
                 pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc 
                 pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 5989.08
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual[/bash]

Does anyone have an explanation for why the timings are so vastly different? I can understand why the intrinsic CSHIFT is faster with ifort (simply because of all the optimizations and the fact that this machine runs on an Intel chip), but I don't get why MSHIFT is so much slower with ifort: roughly 40 seconds, versus about 3 seconds with gfortran. Can anyone recommend a more optimized way of implementing MSHIFT? Note that the boundary conditions set in the current implementation are simply for testing; in the actual code they are bound to change and be much more intricate.

TimP · ‎12-26-2010

If you want run-time tests to discover when a pair of plain vector moves will do the job, you might as well write those into your function, rather than depending on CSHIFT being implemented that way.
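
A minimal sketch of what "a pair of plain vector moves" could look like, assuming a non-empty rank-2 array. The function name myshift and the small driver program are hypothetical and not taken from the original post; the custom boundary assignments would go after the two copies.

[fortran]PROGRAM shiftsketch
  IMPLICIT NONE
  DOUBLE PRECISION, DIMENSION(4,3) :: a, b
  INTEGER                          :: i, j

  ! Fill a with distinct values so any misplaced element is detected
  DO j = 1, SIZE(a,2)
    DO i = 1, SIZE(a,1)
      a(i,j) = 10*i + j
    END DO
  END DO

  ! Compare against the intrinsic for a positive and a negative shift
  b = myshift(a, 1, 1)
  PRINT *, "axis 1 matches CSHIFT: ", ALL(b == CSHIFT(a, 1, 1))
  b = myshift(a, -2, 2)
  PRINT *, "axis 2 matches CSHIFT: ", ALL(b == CSHIFT(a, -2, 2))

  CONTAINS

  ! Hypothetical replacement for CSHIFT on a rank-2 array:
  ! one circular shift written as two array-section copies.
  ! Assumes SIZE(array, axis) > 0 and axis is 1 or 2.
  PURE FUNCTION myshift(array, shift, axis) RESULT(shifted)
    DOUBLE PRECISION, DIMENSION(:,:), INTENT(IN)             :: array
    INTEGER, INTENT(IN)                                      :: shift, axis
    DOUBLE PRECISION, DIMENSION(SIZE(array,1),SIZE(array,2)) :: shifted
    INTEGER                                                  :: n, s

    n = SIZE(array, axis)
    s = MODULO(shift, n)          ! normalize so that 0 <= s < n
    IF (axis == 1) THEN
      shifted(1:n-s, :)   = array(s+1:n, :)   ! main block moves up by s
      shifted(n-s+1:n, :) = array(1:s, :)     ! wrapped block goes to the end
    ELSE
      shifted(:, 1:n-s)   = array(:, s+1:n)
      shifted(:, n-s+1:n) = array(:, 1:s)
    END IF
    ! Custom boundary conditions would be applied to shifted here
  END FUNCTION myshift
END PROGRAM[/fortran]

When s is zero the second copy has zero size on both sides, so no special case is needed. Whether this beats the intrinsic depends on the compiler, so it is worth timing both.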

mecej4 · ‎12-26-2010

You are looking at measurements that are not meaningful. In fact, IFort 12.0 on a 3 GHz C2D E8400 gives:

[bash] Initializing
 Runing cshift
=== cshift took       0.00000000 seconds to run (  0.00000000E+00 seconds per shift)
 Running mshift
=== mshift took       5.84375000 seconds to run (  0.58437500E-03 seconds per shift)
[/bash]

This indicates that the optimization did away with the "cshift run". The same could have been done with the "mshift run" as well if the compiler could figure out that the function is PURE.

To get meaningful measurements, it would be necessary to do something more with the returned values from the function calls.
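
One way to "do something more with the returned values" is sketched below, assuming a simple checksum is enough to keep the calls alive. The program name, the checksum variable and the reduced iteration count are illustrative choices, not part of the original test. A sufficiently aggressive optimizer may still simplify the loop, so treat this as a starting point rather than a guarantee.

[fortran]PROGRAM timingsketch
  IMPLICIT NONE

  INTEGER, PARAMETER                   :: numtests = 1000
  DOUBLE PRECISION, DIMENSION(300,300) :: a, b
  DOUBLE PRECISION                     :: checksum
  REAL                                 :: starttime, endtime
  INTEGER                              :: i

  ! Standard intrinsic, so no compiler-specific module is needed
  CALL RANDOM_NUMBER(a)
  checksum = 0.0d0

  CALL CPU_TIME(starttime)
  DO i = 1, numtests
    b = CSHIFT(a, i, 1)
    ! Use one element whose position depends on i, so every
    ! iteration contributes to a value that is printed later
    checksum = checksum + b(1 + MOD(i, SIZE(b,1)), 1)
  END DO
  CALL CPU_TIME(endtime)

  PRINT *, "cshift seconds: ", endtime - starttime
  ! Printing the checksum makes the shifted data observable output
  PRINT *, "checksum: ", checksum
END PROGRAM[/fortran]

Summing the whole array with SUM(b) would tie every element to the output, but it also adds its own cost to each iteration, which is why only a single element is read here.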