Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Measuring the overhead of gettimeofday()

Wentao_Z_
Beginner

Hi,

I wrote the following small program to measure the overhead of gettimeofday() on my platform.

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
        const size_t num_syscalls = 1000000000;
        struct timeval begin, end, nao;
        size_t i;

        gettimeofday(&begin, NULL);
        for (i = 0; i < num_syscalls; ++i) {
                gettimeofday(&nao, NULL);       /* the call being measured */
        }
        gettimeofday(&end, NULL);

        /* time_t/suseconds_t widths vary across platforms, so cast for printf */
        printf("time = %ld.%06ld\n", (long)begin.tv_sec, (long)begin.tv_usec);
        printf("time = %ld.%06ld\n", (long)end.tv_sec, (long)end.tv_usec);
        printf("Number of Calls = %zu\n", num_syscalls);

        return 0;
}

And I got the following output:

time = 1396927460.707331
time = 1396927491.641229
Number of Calls = 1000000000

This means around 30 seconds for 1,000,000,000 calls, i.e., the overhead of gettimeofday() is around 30 ns per call.
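
For reference, a minimal sketch of the same arithmetic as a helper that could be dropped into the program above (the names match the listing):

#include <sys/time.h>

/* Per-call overhead in nanoseconds, given the begin/end timevals and the
 * call count from the program above. */
static double ns_per_call(struct timeval begin, struct timeval end, size_t n)
{
        double secs = (end.tv_sec - begin.tv_sec)
                    + (end.tv_usec - begin.tv_usec) / 1e6;
        return secs / n * 1e9;
}

/* e.g. ns_per_call(begin, end, num_syscalls) gives ~30.9 for the output above:
 * 30.93 s / 1e9 calls = ~30.9 ns per call */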

My question is: Is this small program correctly measuring the overhead of gettimeofday()? Is it possible that the function was "cached" so that its retrieval from memory was faster than if it had been called once?

Thanks for your time and kind help.

Best regards,
    Wentao

12 Replies
Bernard
Valued Contributor I

 

What is your CPU speed?

I would rather expect the result to be in the range of microseconds per single function call. In your case it probably took on average ~120 CPU cycles to complete the execution. When you take into account the transition to kernel mode, and probably a read of either an MMIO location or a machine-dependent RTC implementation (I/O ports), 30 ns seems too fast for such a lengthy operation.

On the other hand, I suppose that gettimeofday() may simply read a constantly updated timeval field mapped into the process (in your case the main thread), some Linux equivalent of the Windows TEB structure. In such a case the overhead would not be that great, because there is no ring3-to-ring0 transition.
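
One way to test this hypothesis is to force the kernel transition and time that separately; a minimal sketch, assuming Linux, where calling through syscall(SYS_gettimeofday, ...) bypasses any user-space fast path:

#include <stdio.h>
#include <sys/time.h>
#include <sys/syscall.h>        /* SYS_gettimeofday */
#include <unistd.h>             /* syscall() */

int main(void)
{
        const long iters = 10000000;
        struct timeval begin, end, nao;
        long i;

        gettimeofday(&begin, NULL);
        for (i = 0; i < iters; ++i)
                syscall(SYS_gettimeofday, &nao, NULL);  /* forced kernel entry */
        gettimeofday(&end, NULL);

        double secs = (end.tv_sec - begin.tv_sec)
                    + (end.tv_usec - begin.tv_usec) / 1e6;
        printf("%.1f ns per forced syscall\n", secs / iters * 1e9);
        return 0;
}

If the forced path is much slower than the plain gettimeofday() loop, the library call is not entering the kernel on every invocation.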

bernaske
New Contributor I

Hi

my result with this sample on an Intel i7-950 with 3.86 GHz and 24 GB RAM:

time = 1396941708.658778
time = 1396941731.133417
Number of Calls = 1000000000


Platform: openSUSE 13.2 M0, 64-bit Linux, kernel 3.14.0-5

Compiler:

icc (ICC) 14.0.2 20140120
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

have a nice day

Franz

 

Bernard
Valued Contributor I

 

It seems that the members of the timeval struct are accessed from user-mode code, and this could explain the fast execution of the posted code.

Wentao_Z_
Beginner

iliyapolak wrote:

 

What is your CPU speed?

I would rather expect the result to be in the range of microseconds per single function call. In your case it probably took on average ~120 CPU cycles to complete the execution. When you take into account the transition to kernel mode, and probably a read of either an MMIO location or a machine-dependent RTC implementation (I/O ports), 30 ns seems too fast for such a lengthy operation.

On the other hand, I suppose that gettimeofday() may simply read a constantly updated timeval field mapped into the process (in your case the main thread), some Linux equivalent of the Windows TEB structure. In such a case the overhead would not be that great, because there is no ring3-to-ring0 transition.

Thanks for your reply. Here is my CPU information:

c557-302$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 45
model name	: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
stepping	: 7
cpu MHz		: 2701.000
cache size	: 20480 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 8
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5400.20
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

Here is the platform information:

c557-302$ cat /proc/version
Linux version 2.6.32-358.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Fri Feb 22 00:31:26 UTC 2013

The compiler I am using is icc 13.1.0.

time = 1396965320.865573
time = 1396965351.810059
Number of Calls = 1000000000

Actually I was also expecting something at the microsecond level. But the results show it is around ~30 ns per call. I am a little confused about what might be wrong.

Best regards,
   Wentao

Wentao_Z_
Beginner

If the code I posted is incorrect, what would be the right way to measure the overhead of gettimeofday()? I truly appreciate your help.

Best regards,
   Wentao

TimP
Honored Contributor III

Your method seems typical; I also get an average of 30 ns for an analogous test on a Core i5-4200U using the GNU compilers, and 15 ns using the Intel compilers.

Bernard
Valued Contributor I

>>>Actually I was also expecting something at the microsecond level. But the results show it is around ~30 ns per call. I am a little confused about what might be wrong.>>>

I might be wrong in my assumption, but I suppose that gettimeofday() is accessing the timeval struct members from within user mode. If I remember correctly, on Windows the GetTickCount() function does the same, which means that thread-local storage holds a constantly updated RTC value.

Bernard
Valued Contributor I

 

As Tim said, your measuring method is typical. What was not easy to understand is the short execution time of a single call of the gettimeofday() function.

Wentao_Z_
Beginner

Tim Prince wrote:

Your method seems typical; I also get an average of 30 ns for an analogous test on a Core i5-4200U using the GNU compilers, and 15 ns using the Intel compilers.

I did some searching and found that some people measured the overhead at the microsecond level:

http://www.atl.external.lmco.com/projects/QoS/POSIX_html/index.html

http://zeromq.org/results:more-precise-0mq-tests

Is it because they were using slow CPUs? I think my platform is quite advanced.

Best regards,
   Wentao

TimP
Honored Contributor III

I suspect these function calls take advantage of a data structure automatically updated by the OS.  Both overhead and resolution of the timer are important; sub-microsecond resolution and overhead are useful and about the limit of what I hope to get from portable code.
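
A minimal sketch of querying the advertised resolution, assuming the POSIX clock_getres() interface (older glibc versions need -lrt):

#include <stdio.h>
#include <time.h>

int main(void)
{
        struct timespec res;

        /* Ask the OS for the reported resolution of the realtime clock. */
        if (clock_getres(CLOCK_REALTIME, &res) == 0)
                printf("CLOCK_REALTIME resolution: %ld ns\n", (long)res.tv_nsec);
        return 0;
}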

For Fortran applications I prefer the system_clock timer, which gives microsecond resolution with the choice of 64-bit integer data types on most Linux compiler systems, but not on Windows. You may have noted the recommendation by some of our colleagues on this forum of the omp_get_wtime() timer for portability between Windows and Linux.
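
A minimal sketch of the omp_get_wtime() approach (compile with icc -qopenmp or gcc -fopenmp; omp_get_wtick() reports the timer's tick):

#include <stdio.h>
#include <omp.h>

int main(void)
{
        double t0 = omp_get_wtime();    /* wall-clock time in seconds */
        /* ... code being timed ... */
        double t1 = omp_get_wtime();

        printf("elapsed = %f s, tick = %g s\n", t1 - t0, omp_get_wtick());
        return 0;
}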

Time stamp counters may be accessed with resolution on the order of tens of clock cycles on most current CPUs; translation to timings in seconds requires care to avoid unnecessary additional overhead. The Intel and Microsoft C++ compilers have built-in intrinsics for the time stamp counter, which take care of the difference between the 32- and 64-bit implementations.
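
For illustration, a sketch that times one call in cycles, assuming __rdtsc() from <x86intrin.h> with gcc (<intrin.h> with the Microsoft compiler) and an invariant TSC at the nominal 2.7 GHz of the E5-2680 above:

#include <stdio.h>
#include <sys/time.h>
#include <x86intrin.h>          /* __rdtsc(); use <intrin.h> on MSVC */

int main(void)
{
        struct timeval tv;

        unsigned long long start = __rdtsc();
        gettimeofday(&tv, NULL);
        unsigned long long cycles = __rdtsc() - start;

        /* At a nominal 2.7 GHz there are 2.7 cycles per nanosecond. */
        printf("gettimeofday: ~%llu cycles (~%.0f ns at 2.7 GHz)\n",
               cycles, cycles / 2.7);
        return 0;
}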

Bernard
Valued Contributor I

>>>Your method seems typical; I also get an average of 30 ns for an analogous test on a Core i5-4200U using the GNU compilers, and 15 ns using the Intel compilers>>>

I suppose that the Intel compiler probably inlined the gettimeofday() call, thus reducing the call/ret overhead.

Bernard
Valued Contributor I

Wentao Z. wrote:

Tim Prince wrote:

Your method seems typical; I also get an average of 30 ns for an analogous test on a Core i5-4200U using the GNU compilers, and 15 ns using the Intel compilers.

 

I did some searching and found that some people measured the overhead at the microsecond level:

http://www.atl.external.lmco.com/projects/QoS/POSIX_html/index.html

http://zeromq.org/results:more-precise-0mq-tests

Is it because they were using slow CPUs? I think my platform is quite advanced.

Best regards,

   Wentao

Yes, those benchmarks were run on slower CPUs, but that does not explain the 10x difference in the results for the 3.0 GHz Xeon CPU.

http://www.atl.external.lmco.com/projects/QoS/POSIX_html/index.html

 

 
