TIMEF() exhibits different behaviour in Linux and Windows

Frankcombe__Kim · ‎10-31-2017

I have spent some time today trying to sort out a bottleneck in my code and have been using timef() from the portability library.

I have both Linux (Ubuntu 16) and Windows (7) versions of the compiler, both 2017 Update 5. There is only one version of the code, and the compiler switches for the two OSs are as good as the same.

It appears to me that the Linux version is rounding to an integer. Here is a snippet of the debug output on Linux:

kgsave=   2.000000     move to record=   0.000000     sizegrid=   1.000000     kgload=   1.000000
kgsave=   2.000000     move to record=   0.000000     sizegrid=   0.000000     kgload=   2.000000
kgsave=   1.000000     move to record=   0.000000     sizegrid=   1.000000     kgload=   1.000000
kgsave=   1.000000     move to record=   0.000000     sizegrid=   1.000000     kgload=   1.000000

And here is the same thing on Windows (a virtual machine on the Linux box):

kgsave= 0.8750000     move to record=   0.000000     sizegrid= 0.4687500E-01 kgload= 0.7343750
kgsave= 0.7656250     move to record=   0.000000     sizegrid= 0.4687500E-01 kgload= 0.4062500
kgsave= 0.7968750     move to record=   0.000000     sizegrid= 0.3125000E-01 kgload= 0.5312500
kgsave= 0.8437500     move to record=   0.000000     sizegrid= 0.6250000E-01 kgload= 0.7343750

The Windows libraries call the Win API, which is somewhat more efficient than the Xorg/Motif calls on Linux. That explains the faster times but not the difference in precision.

The numbers were created simply by setting timer(1)=timef() before the call to the subroutine and timer(2)=timef() after it, then printing timer(2)-timer(1). timer is declared as an array of real kind selected_real_kind(15).

Am I doing something wrong or is this problem real?

Cheers

Kim

jimdempseyatthecove · ‎10-31-2017

I haven't looked into your problem, but I suggest that for a portable (Windows<->Linux) timer, use the OpenMP function omp_get_wtime() that returns a high precision timer as a double (as seconds).

Jim Dempsey
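For readers who want to try the omp_get_wtime() suggestion, here is a minimal sketch of the before/after timing pattern described in the original post. The module name omp_wall_timer and the helpers time_call and wall_tick are made up for illustration; only omp_get_wtime and omp_get_wtick come from the OpenMP library. It assumes the code is built with OpenMP enabled (for example -qopenmp with ifort on Linux, /Qopenmp on Windows, or -fopenmp with gfortran).

```fortran
! Sketch only: times a procedure with the OpenMP wall clock.
! Module and procedure names are hypothetical, not from the thread.
module omp_wall_timer
   use omp_lib, only: omp_get_wtime, omp_get_wtick
   implicit none
   private
   public :: dp, time_call, wall_tick

   ! omp_get_wtime and omp_get_wtick return double precision
   integer, parameter :: dp = kind(1.0d0)

   ! Shape of the routine to be timed: no arguments
   abstract interface
      subroutine work_iface()
      end subroutine work_iface
   end interface

contains

   ! Elapsed wall-clock seconds spent inside one call to work()
   function time_call(work) result(elapsed)
      procedure(work_iface) :: work
      real(dp) :: elapsed
      real(dp) :: t0
      t0 = omp_get_wtime()
      call work()
      elapsed = omp_get_wtime() - t0
   end function time_call

   ! Seconds between successive clock ticks, as reported by the runtime
   function wall_tick() result(dt)
      real(dp) :: dt
      dt = omp_get_wtick()
   end function wall_tick

end module omp_wall_timer
```

To use it, pass the routine being investigated to time_call and print the result. Printing wall_tick() once on each OS shows what resolution the runtime claims, which is a quick way to check whether both platforms are reporting fractions of a second.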

Frankcombe__Kim · ‎11-01-2017

Thanks Jim

Sorry for the slow response, it's been a busy day.

As a workaround that may be a good solution, but it doesn't really answer the question of whether there is a problem with the Linux (or maybe just Ubuntu) version of timef() or whether I'm doing something wrong in using it.

Cheers
Kim

Steve_Lionel · ‎11-01-2017

I would generally prefer using SYSTEM_CLOCK for such purposes.

TimP · ‎11-01-2017

system_clock() with INT64 arguments is excellent on linux. On ifort Windows, OpenMP or MPI timers may be better. There is also a (non-portable) timer function in MKL.
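A minimal sketch of the SYSTEM_CLOCK approach with 64-bit arguments follows. The module name sysclock_timer and the helpers tick_count and seconds_between are hypothetical; SYSTEM_CLOCK and its COUNT, COUNT_RATE and COUNT_MAX arguments are standard Fortran. The count rate you get is compiler and platform dependent, so the sketch queries it rather than assuming a value.

```fortran
! Sketch only: interval timing with the standard SYSTEM_CLOCK intrinsic.
! Module and procedure names are hypothetical, not from the thread.
module sysclock_timer
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   private
   public :: tick_count, seconds_between

contains

   ! Current value of the processor clock, in counts
   function tick_count() result(c)
      integer(int64) :: c
      call system_clock(count=c)
   end function tick_count

   ! Seconds elapsed between two readings taken with tick_count().
   ! Returns -1 if the processor reports no usable clock.
   function seconds_between(c0, c1) result(s)
      integer(int64), intent(in) :: c0, c1
      real(real64) :: s
      integer(int64) :: rate, cmax, ticks
      call system_clock(count_rate=rate, count_max=cmax)
      if (rate <= 0) then
         s = -1.0_real64
         return
      end if
      ticks = c1 - c0
      ! The count restarts at zero after COUNT_MAX; allow for one wrap
      if (ticks < 0) ticks = (ticks + cmax) + 1
      s = real(ticks, real64) / real(rate, real64)
   end function seconds_between

end module sysclock_timer
```

Take one reading with tick_count() before the region of interest and one after, then pass both to seconds_between. Keeping the raw counts as integers and converting only the difference avoids losing precision in large absolute clock values.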

Frankcombe__Kim · ‎11-01-2017

Thanks Steve and Tim

Lawson Wakefield had also suggested system_clock. My routines were calling his so I had been talking to him about the linux optimisation of his routines.

It looks like none of the old hands use TIMEF() and perhaps there is a reason. The downside of system_clock and omp_get_wtime, for someone who isn't doing timing regularly, is that they don't leap out at you when you search the Intel help documentation. timef was the only time function I could find which reported time in mSec.

Cheers
Kim

Steve_Lionel · ‎11-01-2017

SYSTEM_CLOCK is a standard Fortran intrinsic subroutine. Unlike functions such as timef(), it will operate the same on all implementations. Many of the "portability library" routines vary in details across platforms.

Frankcombe__Kim · ‎11-01-2017

Thanks Steve.

I guess that redefines portable;-) I would have thought the whole point of the portability library was to provide the same experience across all platforms or if not at least acknowledge any variations in the help notes.

I did look at system_clock in help but as all the outputs were integers and I was trying for a quick fix it wasn't immediately obvious that it was able to count to mSec or even uSec as my re-reading this morning suggests. Timef appeared to give me what I wanted straight out of the box.

Cheers
Kim

Steve_Lionel · ‎11-02-2017

The resolution of SYSTEM_CLOCK varies, but you can ask what it is (COUNT_RATE). Yes, it takes a bit of extra code to convert that into numbers with fractions. But, just like timef, it depends on how often the OS updates the system clock.

The documentation of timef tells you what it does in Intel's implementation. I have seen (maybe not in the case of timef) other routines have differing interfaces and meanings across implementations, which is why I always prefer the Fortran intrinsics.

I think you're fooling yourself if you believe you'll be getting microsecond resolution out of timef.
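One way to see the difference between the resolution a clock advertises and how often it actually changes is to spin until the reading moves. The sketch below does that for SYSTEM_CLOCK; the module name clock_granularity and the functions nominal_resolution and observed_step are hypothetical names chosen for illustration. Note that when the clock is finer than the cost of one SYSTEM_CLOCK call, the observed step mostly reflects call overhead, so treat the result as an upper bound on granularity rather than a precise figure.

```fortran
! Sketch only: compare advertised and observed clock granularity.
! Module and procedure names are hypothetical, not from the thread.
module clock_granularity
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   private
   public :: nominal_resolution, observed_step

contains

   ! Seconds per count implied by COUNT_RATE alone (-1 if no clock)
   function nominal_resolution() result(dt)
      real(real64) :: dt
      integer(int64) :: rate
      call system_clock(count_rate=rate)
      if (rate > 0) then
         dt = 1.0_real64 / real(rate, real64)
      else
         dt = -1.0_real64
      end if
   end function nominal_resolution

   ! Smallest change seen between distinct readings over several trials,
   ! in seconds (-1 if no clock or no valid trial)
   function observed_step(trials) result(dt)
      integer, intent(in) :: trials
      real(real64) :: dt
      integer(int64) :: rate, c0, c1, best
      integer :: i
      dt = -1.0_real64
      call system_clock(count_rate=rate)
      if (rate <= 0 .or. trials < 1) return
      best = huge(best)
      do i = 1, trials
         call system_clock(count=c0)
         c1 = c0
         do while (c1 == c0)          ! spin until the reading changes
            call system_clock(count=c1)
         end do
         if (c1 > c0) best = min(best, c1 - c0)   ! skip a wrapped reading
      end do
      if (best < huge(best)) dt = real(best, real64) / real(rate, real64)
   end function observed_step

end module clock_granularity
```

Printing nominal_resolution() and observed_step(20) side by side on each platform gives a rough picture of what the timer can really resolve there.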

JVanB · ‎11-02-2017

RDTSC reports time in units of clock cycles (actually bus cycles), so it has a resolution of a couple of nanoseconds. It's portable across operating systems because every modern x86 processor has a Time Stamp Counter. However, the code required to set it up varies with OS and processor family. Here's what worked for me with gfortran on Ubuntu, plagiarizing code from

https://groups.google.com/d/msg/comp.lang.fortran/VJ7tpIqoz9Y/Y_UDnno1AwAJ

With the help of web pages like

http://man7.org/linux/man-pages/man2/mmap.2.html

http://man7.org/linux/man-pages/man2/mprotect.2.html

module rdtsc_mod
   use ISO_C_BINDING
   implicit none
! We will not export anything but the pointer to the rdtsc function
   private
! Interface for rdtsc function
   abstract interface
      function rdtsc_iface() bind(C)
         import
         implicit none
         integer(C_INT64_T) rdtsc_iface
      end function rdtsc_iface
   end interface
! Define pointer to rdtsc function and initialize to point
! at initialization function
   procedure(rdtsc_iface), pointer, public :: rdtsc => rdtsc_init
! Typedef for off_t
   integer, parameter :: POSIX_OFF_T = C_LONG
! Constants required for mmap and mprotect
! Values used by gcc/ubuntu
   integer(C_INT), parameter :: &
      PROT_READ = int(Z'01',C_INT), &
      PROT_WRITE = int(Z'02',C_INT), &
      PROT_EXEC = int(Z'04',C_INT), &
      MAP_PRIVATE = int(Z'0002',C_INT), &
      MAP_ANONYMOUS = int(Z'0020',C_INT)
   type(C_PTR), parameter :: MAP_FAILED = transfer(-1_C_INTPTR_T,C_NULL_PTR)
! Interfaces for mmap and mprotect
   interface
      function mmap(addr,length,prot,flags, &
         fd,offset) bind(C,name='mmap')
         import
         implicit none
         type(C_PTR) mmap
         type(C_PTR), value :: addr
         integer(C_SIZE_T), value :: length
         integer(C_INT), value :: prot
         integer(C_INT), value :: flags
         integer(C_INT), value :: fd
         integer(POSIX_OFF_T), value :: offset
      end function mmap

      function mprotect(addr,len,prot) bind(C,name='mprotect')
         import
         implicit none
         integer(C_INT) mprotect
         type(C_PTR), value :: addr
         integer(C_SIZE_T), value :: len
         integer(C_INT), value :: prot
      end function mprotect
   end interface
   contains
! Initialization procedure for rdtsc. It will be called on the
! first invocation of rdtsc and sets up our real rdtsc function
      function rdtsc_init() bind(C)
         integer(C_INT64_T) rdtsc_init
! Machine code for 32-bit function
         integer(C_INT8_T), target :: BAD_STUFF_32(3)
         data BAD_STUFF_32 / &
            Z'0F', Z'31', &               ! rdtsc
            Z'C3' /                       ! ret
! Machine code for 64-bit function
         integer(C_INT8_T), target :: BAD_STUFF_64(10)
         data BAD_STUFF_64 / &
            Z'0F', Z'31', &               ! rdtsc
            Z'48', Z'C1', Z'E2', Z'20', & ! shl rdx, 32
            Z'48', Z'09', Z'D0', &        ! or rax, rdx
            Z'C3' /                       ! ret
! Pointer to machine code appropriate to address size
         integer(C_INT8_T), pointer :: code_ptr(:)
! Size of machine code
         integer(C_SIZE_T) code_size
! Address the OS allocates for our function via mmap
         type(C_PTR) rdtsc_address
! Fortran pointer to write our function to
         integer(C_INT8_T), pointer :: rdtsc_code(:)
! Error status from mprotect
         integer(C_INT) status

! Point machine code pointer at code appropriate to
! address size and get code size
         if(bit_size(0_C_INTPTR_T) == 32) then
            code_ptr => BAD_STUFF_32
         else
            code_ptr => BAD_STUFF_64
         end if
         code_size = size(code_ptr,KIND=C_SIZE_T)
! Get writable address from OS to put our function in
         rdtsc_address = mmap( &
            addr = C_NULL_PTR, &
            length = code_size, &
            prot = iany([PROT_READ,PROT_WRITE,PROT_EXEC]), &
            flags = iany([MAP_PRIVATE,MAP_ANONYMOUS]), &
            fd = -1, &
            offset = 0_POSIX_OFF_T)
! If something goes wrong, abort
         if(transfer(rdtsc_address,0_C_INTPTR_T) == &
            transfer(MAP_FAILED,0_C_INTPTR_T)) then
            write(*,'(*(g0))') &
               'rdtsc_init failed in mmap'
            stop
         end if
! Get Fortran pointer to allocated memory and poke our
! function into it.  Then mark it as executable
         call C_F_POINTER(rdtsc_address,rdtsc_code,[code_size])
         rdtsc_code = code_ptr
         status = mprotect( &
            addr = rdtsc_address, &
            len = code_size, &
            prot = iany([PROT_READ,PROT_EXEC]))
! If something goes wrong, abort
         if(status == -1) then
            write(*,'(*(g0))') &
               'rdtsc_init failed in mprotect'
            stop
         end if
! Point the function pointer at the function we just poked into memory
         call C_F_PROCPOINTER(transfer(rdtsc_address,C_NULL_FUNPTR), &
            rdtsc)
! We still have to return the TSC value for transparency
         rdtsc_init = rdtsc()
      end function rdtsc_init
end module rdtsc_mod

program hello3
   use rdtsc_mod
   use ISO_C_BINDING, only: C_INT64_T
   implicit none
   integer(C_INT64_T) t0(-1:10), tf(-1:10)
   integer i
   integer array(100)
   integer partials(10)
   interface
      function get_sum(array,upper)
         implicit none
         integer get_sum
         integer array(*), upper
      end function get_sum
   end interface

   array = [(i,i=1,size(array))]
   t0(-1) = rdtsc()
   write(*,'(*(g0))') 'Hello, world'
   tf(-1) = rdtsc()
   t0(0) = rdtsc()
   tf(0) = rdtsc()
   do i = 1, 10
      t0(i) = rdtsc()
      partials(i) = get_sum(array,10*i)
      tf(i) = rdtsc()
   end do
   write(*,'(*(g0))') 'Time for hello = ',tf(-1)-t0(-1)
   write(*,'(*(g0))') 'Time for rdtsc = ',tf(0)-t0(0)
   do i = 1, 10
      write(*,'(*(g0))') 'Partials(',i,') = ',partials(i),', time = ',tf(i)-t0(i)
   end do
end program hello3

function get_sum(array,upper)
   implicit none
   integer get_sum
   integer array(*)
   integer upper
   get_sum = sum(array(1:upper))
end function get_sum

The output was:

Hello, world
Time for hello = 154656
Time for rdtsc = 72
Partials(1) = 55, time = 612
Partials(2) = 210, time = 288
Partials(3) = 465, time = 324
Partials(4) = 820, time = 324
Partials(5) = 1275, time = 378
Partials(6) = 1830, time = 432
Partials(7) = 2485, time = 504
Partials(8) = 3240, time = 540
Partials(9) = 4095, time = 594
Partials(10) = 5050, time = 648

So it seemed to work. Does ifort have a built-in function equivalent to RDTSC?