I have spent some time today trying to sort out a bottleneck in my code and have been using timef() from the portability library.
I have both linux (Ubuntu 16) and Windows (7) versions of the compiler - both 2017 - update 5. There is only one version of the code and the compiler switches for the two OSs are as good as the same.
It appears to me that the linux version is rounding to an integer. Here is a snip of the debug output for linux
kgsave= 2.000000 move to record= 0.000000 sizegrid= 1.000000 kgload= 1.000000
kgsave= 2.000000 move to record= 0.000000 sizegrid= 0.000000 kgload= 2.000000
kgsave= 1.000000 move to record= 0.000000 sizegrid= 1.000000 kgload= 1.000000
kgsave= 1.000000 move to record= 0.000000 sizegrid= 1.000000 kgload= 1.000000
and here is the same thing for Windows (a virtual machine on the linux box)
kgsave= 0.8750000 move to record= 0.000000 sizegrid= 0.4687500E-01 kgload= 0.7343750
kgsave= 0.7656250 move to record= 0.000000 sizegrid= 0.4687500E-01 kgload= 0.4062500
kgsave= 0.7968750 move to record= 0.000000 sizegrid= 0.3125000E-01 kgload= 0.5312500
kgsave= 0.8437500 move to record= 0.000000 sizegrid= 0.6250000E-01 kgload= 0.7343750
There are some efficiencies in the windows libraries calling the Win API versus the Xorg/Motif calls in linux which explains the faster times but not the precision differences.
The numbers were created by setting timer(1)=timef() before the call to the subroutine and timer(2)=timef() after it, then printing timer(2)-timer(1). timer is declared as a selected_real_kind(15) array.
Am I doing something wrong or is this problem real?
Cheers
Kim
I haven't looked into your problem, but I suggest that for a portable (Windows<->Linux) timer, use the OpenMP function omp_get_wtime() that returns a high precision timer as a double (as seconds).
Jim Dempsey
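For anyone following along, a minimal sketch of the omp_get_wtime() approach might look like the following. The program name and the busy_work routine are hypothetical stand-ins for the code being timed, and the program assumes OpenMP is enabled at compile time (for example -qopenmp with ifort or -fopenmp with gfortran).

```fortran
! Sketch only: wall-clock timing with omp_get_wtime().
! busy_work is a hypothetical stand-in for the routine under test.
program time_with_omp
  use omp_lib, only: omp_get_wtime, omp_get_wtick
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
  real(dp) :: timer(2), total

  timer(1) = omp_get_wtime()      ! seconds, as a double precision value
  call busy_work(total)
  timer(2) = omp_get_wtime()

  write(*,'(a,g0)') 'result      = ', total
  write(*,'(a,g0)') 'elapsed (s) = ', timer(2) - timer(1)
  ! omp_get_wtick reports the timer's tick length in seconds
  write(*,'(a,g0)') 'tick (s)    = ', omp_get_wtick()

contains

  subroutine busy_work(s)
    real(dp), intent(out) :: s
    integer :: k
    s = 0.0_dp
    do k = 1, 5000000
      s = s + sqrt(real(k, dp))
    end do
  end subroutine busy_work

end program time_with_omp
```

The pattern is the same as the timer(1)/timer(2) bracketing described in the original question, with only the timing call swapped.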
Thanks Jim
Sorry for the slow response, it's been a busy day.
As a workaround that may be a good solution, but it doesn't really answer the question of whether there is a problem with the linux (or maybe just the Ubuntu) version of timef(), or whether I'm doing something wrong in using it.
Cheers
Kim
I would generally prefer using SYSTEM_CLOCK for such purposes.
system_clock() with INT64 arguments is excellent on linux. On ifort Windows, OpenMP or MPI timers may be better. There is also a (non-portable) timer function in MKL.
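As a rough illustration of system_clock() with INT64 arguments, here is a small sketch. The module name sysclock_timer and the helpers ticks_now and elapsed_seconds are made up for this example and are not part of any library; the loop is just a placeholder workload. Wraparound at COUNT_MAX is ignored, on the assumption that a 64-bit count will not wrap during a run.

```fortran
! Sketch only: interval timing with the standard SYSTEM_CLOCK intrinsic.
! Module and procedure names are hypothetical.
module sysclock_timer
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: ticks_now, elapsed_seconds
contains

  ! Current clock count as a 64-bit integer
  function ticks_now() result(count)
    integer(int64) :: count
    call system_clock(count=count)
  end function ticks_now

  ! Convert a pair of counts to seconds; returns -1 if no clock exists
  function elapsed_seconds(t0, t1) result(seconds)
    integer(int64), intent(in) :: t0, t1
    real(real64) :: seconds
    integer(int64) :: rate
    call system_clock(count_rate=rate)
    if (rate > 0) then
      seconds = real(t1 - t0, real64) / real(rate, real64)
    else
      seconds = -1.0_real64
    end if
  end function elapsed_seconds

end module sysclock_timer

program time_with_system_clock
  use, intrinsic :: iso_fortran_env, only: int64, real64
  use sysclock_timer
  implicit none
  integer(int64) :: t0, t1
  real(real64) :: total
  integer :: k

  t0 = ticks_now()
  total = 0.0_real64
  do k = 1, 5000000                 ! placeholder workload
    total = total + sqrt(real(k, real64))
  end do
  t1 = ticks_now()

  write(*,'(a,g0)') 'result      = ', total
  write(*,'(a,g0)') 'elapsed (s) = ', elapsed_seconds(t0, t1)
end program time_with_system_clock
```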
Thanks Steve and Tim
Lawson Wakefield had also suggested system_clock. My routines were calling his so I had been talking to him about the linux optimisation of his routines.
It looks like none of the old hands use TIMEF() and perhaps there is a reason. The downside of system_clock and omp_get_wtime for someone who isn't using timing regularly is that they don't leap out at you when you search the Intel help documentation; timef was the only time function I could find which reported time in mSec.
Cheers
Kim
SYSTEM_CLOCK is a standard Fortran intrinsic subroutine. Unlike functions such as timef(), it will operate the same on all implementations. Many of the "portability library" routines vary in details across platforms.
Thanks Steve.
I guess that redefines portable;-) I would have thought the whole point of the portability library was to provide the same experience across all platforms or if not at least acknowledge any variations in the help notes.
I did look at system_clock in help but as all the outputs were integers and I was trying for a quick fix it wasn't immediately obvious that it was able to count to mSec or even uSec as my re-reading this morning suggests. Timef appeared to give me what I wanted straight out of the box.
Cheers
Kim
The resolution of SYSTEM_CLOCK varies, but you can ask what it is (COUNT_RATE). Yes, it takes a bit of extra code to convert that into numbers with fractions. But, just like timef, it depends on how often the OS updates the system clock.
The documentation of timef tells you what it does in Intel's implementation. I have seen (maybe not in the case of timef) other routines have differing interfaces and meanings across implementations, which is why I always prefer the Fortran intrinsics.
I think you're fooling yourself if you believe you'll be getting microsecond resolution out of timef.
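To see what a given platform actually delivers, one option is to compare the nominal tick (1/COUNT_RATE) with the smallest step the count is observed to take. The sketch below is an assumption about how one might check this, not anything from the Intel documentation; the program and variable names are invented for the example.

```fortran
! Sketch only: compare the nominal SYSTEM_CLOCK tick with the smallest
! step actually observed. Names are hypothetical.
program clock_resolution
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: rate, cmax, c0, c1, smallest
  integer :: trial

  call system_clock(count_rate=rate, count_max=cmax)
  if (rate <= 0) then
    write(*,'(a)') 'No system clock available'
    stop
  end if

  smallest = huge(smallest)
  do trial = 1, 20
    call system_clock(count=c0)
    do                              ! spin until the count changes
      call system_clock(count=c1)
      if (c1 /= c0) exit
    end do
    ! skip a sample if the count wrapped around
    if (c1 > c0) smallest = min(smallest, c1 - c0)
  end do

  write(*,'(a,g0)') 'count_rate (ticks/s) = ', rate
  write(*,'(a,g0)') 'count_max            = ', cmax
  write(*,'(a,g0)') 'nominal tick (s)     = ', 1.0_real64 / real(rate, real64)
  write(*,'(a,g0)') 'observed step (s)    = ', &
       real(smallest, real64) / real(rate, real64)
end program clock_resolution
```

If the observed step is much larger than the nominal tick, the count is being updated less often than COUNT_RATE suggests.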
RDTSC outputs in units of clock cycles (actually bus cycles), so it has a resolution of a couple of nanoseconds. It's portable across x86 platforms because every x86 processor since the Pentium has a Time Stamp Counter. However, the code required to set it up varies with OS and processor family. Here's what worked for me with gfortran on Ubuntu, plagiarizing code from
https://groups.google.com/d/msg/comp.lang.fortran/VJ7tpIqoz9Y/Y_UDnno1AwAJ
With the help of web pages like
http://man7.org/linux/man-pages/man2/mmap.2.html
http://man7.org/linux/man-pages/man2/mprotect.2.html
module rdtsc_mod
  use ISO_C_BINDING
  implicit none
  ! We will not export anything but the pointer to the rdtsc function
  private
  ! Interface for rdtsc function
  abstract interface
    function rdtsc_iface() bind(C)
      import
      implicit none
      integer(C_INT64_T) rdtsc_iface
    end function rdtsc_iface
  end interface
  ! Define pointer to rdtsc function and initialize to point
  ! at initialization function
  procedure(rdtsc_iface), pointer, public :: rdtsc => rdtsc_init
  ! Typedef for off_t
  integer, parameter :: POSIX_OFF_T = C_LONG
  ! Constants required for mmap and mprotect
  ! Values used by gcc/ubuntu
  integer(C_INT), parameter :: &
    PROT_READ = int(Z'01',C_INT), &
    PROT_WRITE = int(Z'02',C_INT), &
    PROT_EXEC = int(Z'04',C_INT), &
    MAP_PRIVATE = int(Z'0002',C_INT), &
    MAP_ANONYMOUS = int(Z'0020',C_INT)
  type(C_PTR), parameter :: MAP_FAILED = transfer(-1_C_INTPTR_T,C_NULL_PTR)
  ! Interfaces for mmap and mprotect
  interface
    function mmap(addr,length,prot,flags, &
        fd,offset) bind(C,name='mmap')
      import
      implicit none
      type(C_PTR) mmap
      type(C_PTR), value :: addr
      integer(C_SIZE_T), value :: length
      integer(C_INT), value :: prot
      integer(C_INT), value :: flags
      integer(C_INT), value :: fd
      integer(POSIX_OFF_T), value :: offset
    end function mmap
    function mprotect(addr,len,prot) bind(C,name='mprotect')
      import
      implicit none
      integer(C_INT) mprotect
      type(C_PTR), value :: addr
      integer(C_SIZE_T), value :: len
      integer(C_INT), value :: prot
    end function mprotect
  end interface
contains
  ! Initialization procedure for rdtsc. It will be called on the
  ! first invocation of rdtsc and sets up our real rdtsc function
  function rdtsc_init() bind(C)
    integer(C_INT64_T) rdtsc_init
    ! Machine code for 32-bit function
    integer(C_INT8_T), target :: BAD_STUFF_32(3)
    data BAD_STUFF_32 / &
      Z'0F', Z'31', & ! rdtsc
      Z'C3' / ! ret
    ! Machine code for 64-bit function
    integer(C_INT8_T), target :: BAD_STUFF_64(10)
    data BAD_STUFF_64 / &
      Z'0F', Z'31', & ! rdtsc
      Z'48', Z'C1', Z'E2', Z'20', & ! shl rdx, 32
      Z'48', Z'09', Z'D0', & ! or rax, rdx
      Z'C3' / ! ret
    ! Pointer to machine code appropriate to address size
    integer(C_INT8_T), pointer :: code_ptr(:)
    ! Size of machine code
    integer(C_SIZE_T) code_size
    ! Address the OS allocates for our function via mmap
    type(C_PTR) rdtsc_address
    ! Fortran pointer to write our function to
    integer(C_INT8_T), pointer :: rdtsc_code(:)
    ! Error status from mprotect
    integer(C_INT) status
    ! Point machine code pointer at code appropriate to
    ! address size and get code size
    if(bit_size(0_C_INTPTR_T) == 32) then
      code_ptr => BAD_STUFF_32
    else
      code_ptr => BAD_STUFF_64
    end if
    code_size = size(code_ptr,KIND=C_SIZE_T)
    ! Get writable address from OS to put our function in
    rdtsc_address = mmap( &
      addr = C_NULL_PTR, &
      length = code_size, &
      prot = iany([PROT_READ,PROT_WRITE,PROT_EXEC]), &
      flags = iany([MAP_PRIVATE,MAP_ANONYMOUS]), &
      fd = -1, &
      offset = 0_POSIX_OFF_T)
    ! If something goes wrong, abort
    if(transfer(rdtsc_address,0_C_INTPTR_T) == &
        transfer(MAP_FAILED,0_C_INTPTR_T)) then
      write(*,'(*(g0))') &
        'rdtsc_init failed in mmap'
      stop
    end if
    ! Get Fortran pointer to allocated memory and poke our
    ! function into it. Then mark it as executable
    call C_F_POINTER(rdtsc_address,rdtsc_code,[code_size])
    rdtsc_code = code_ptr
    status = mprotect( &
      addr = rdtsc_address, &
      len = code_size, &
      prot = iany([PROT_READ,PROT_EXEC]))
    ! If something goes wrong, abort
    if(status == -1) then
      write(*,'(*(g0))') &
        'rdtsc_init failed in mprotect'
      stop
    end if
    ! Point the function pointer at the function we just poked into memory
    call C_F_PROCPOINTER(transfer(rdtsc_address,C_NULL_FUNPTR), &
      rdtsc)
    ! We still have to return the TSC value for transparency
    rdtsc_init = rdtsc()
  end function rdtsc_init
end module rdtsc_mod
program hello3
  use rdtsc_mod
  use ISO_C_BINDING, only: C_INT64_T
  implicit none
  integer(C_INT64_T) t0(-1:10), tf(-1:10)
  integer i
  integer array(100)
  integer partials(10)
  interface
    function get_sum(array,upper)
      implicit none
      integer get_sum
      integer array(*), upper
    end function get_sum
  end interface
  array = [(i,i=1,size(array))]
  t0(-1) = rdtsc()
  write(*,'(*(g0))') 'Hello, world'
  tf(-1) = rdtsc()
  t0(0) = rdtsc()
  tf(0) = rdtsc()
  do i = 1, 10
    t0(i) = rdtsc()
    partials(i) = get_sum(array,10*i)
    tf(i) = rdtsc()
  end do
  write(*,'(*(g0))') 'Time for hello = ',tf(-1)-t0(-1)
  write(*,'(*(g0))') 'Time for rdtsc = ',tf(0)-t0(0)
  do i = 1, 10
    write(*,'(*(g0))') 'Partials(',i,') = ',partials(i),', time = ',tf(i)-t0(i)
  end do
end program hello3
function get_sum(array,upper)
  implicit none
  integer get_sum
  integer array(*)
  integer upper
  get_sum = sum(array(1:upper))
end function get_sum
The output was:
Hello, world
Time for hello = 154656
Time for rdtsc = 72
Partials(1) = 55, time = 612
Partials(2) = 210, time = 288
Partials(3) = 465, time = 324
Partials(4) = 820, time = 324
Partials(5) = 1275, time = 378
Partials(6) = 1830, time = 432
Partials(7) = 2485, time = 504
Partials(8) = 3240, time = 540
Partials(9) = 4095, time = 594
Partials(10) = 5050, time = 648
So it seemed to work. Does ifort have a built-in function equivalent to RDTSC?