Hi,
I wrote the following small program to measure the overhead of gettimeofday() on my platform.
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    const size_t num_syscalls = 1000000000;
    struct timeval begin, end, nao;
    size_t i;

    /* time one billion back-to-back gettimeofday() calls */
    gettimeofday(&begin, NULL);
    for (i = 0; i < num_syscalls; ++i) {
        gettimeofday(&nao, NULL);
    }
    gettimeofday(&end, NULL);

    printf("time = %ld.%06ld\n", (long)begin.tv_sec, (long)begin.tv_usec);
    printf("time = %ld.%06ld\n", (long)end.tv_sec, (long)end.tv_usec);
    printf("Number of Calls = %zu\n", num_syscalls);
}
And I got the following output
time = 1396927460.707331
time = 1396927491.641229
Number of Calls = 1000000000
This means around 30 seconds for 1000000000 calls, i.e., the overhead of gettimeofday() is around 30 ns per call.
My question is: Is this small program correctly measuring the overhead of gettimeofday()? Is it possible that the function was "cached" so that its retrieval from memory was faster than if it had been called only once?
Thanks for your time and kind help.
Best regards,
Wentao
What is your CPU speed?
I would rather expect the result to be in the range of microseconds per function call. In your case it probably took on average ~120 CPU cycles to complete each call. When you take into account the transition to kernel mode and probably a read from either an MMIO location or a machine-dependent RTC implementation (I/O ports), it seems too fast to complete such a time-lengthy operation in 30 ns.
On the other hand, I suppose that gettimeofday() may just be reading a constantly updated timeval field mapped into some Linux equivalent (for your main thread) of the Windows TEB structure. In such a case the overhead would not be that great, because there is no ring3-to-ring0 transition.
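One rough way to check that guess (a sketch only; the loop count and the averaging are my own assumptions, not part of your program) is to time the ordinary library call against a forced kernel entry through syscall(SYS_gettimeofday, ...) and compare the two averages:

/* Sketch: ordinary gettimeofday() (possibly served in user mode)
 * versus a forced kernel entry via syscall(). */
#include <stdio.h>
#include <sys/time.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    const long iters = 10000000;   /* 1e7 calls is enough for a rough average */
    struct timeval begin, end, tmp;
    long i;

    gettimeofday(&begin, NULL);
    for (i = 0; i < iters; ++i)
        gettimeofday(&tmp, NULL);           /* normal library path */
    gettimeofday(&end, NULL);
    printf("library call: %.1f ns/call\n",
           ((end.tv_sec - begin.tv_sec) * 1e9 +
            (end.tv_usec - begin.tv_usec) * 1e3) / iters);

    gettimeofday(&begin, NULL);
    for (i = 0; i < iters; ++i)
        syscall(SYS_gettimeofday, &tmp, NULL);  /* forced kernel entry */
    gettimeofday(&end, NULL);
    printf("raw syscall:  %.1f ns/call\n",
           ((end.tv_sec - begin.tv_sec) * 1e9 +
            (end.tv_usec - begin.tv_usec) * 1e3) / iters);

    return 0;
}

If the plain call is much cheaper than the raw syscall, the fast path most likely never leaves user mode.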
Hi
my result with this sample on an Intel i7-950 at 3.86 GHz with 24 GB of RAM:
time = 1396941708.658778
time = 1396941731.133417
Number of Calls = 1000000000
Platform: openSUSE 13.2 M0, 64-bit Linux, kernel 3.14.0-5
Compiler:
icc (ICC) 14.0.2 20140120
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
have a nice day
Franz
It seems that the members of the timeval struct are accessed from user-mode code, and this could explain the fast execution of the posted code.
iliyapolak wrote:
What is your CPU speed?
I would rather expect the result to be in the range of microseconds per function call. In your case it probably took on average ~120 CPU cycles to complete each call. When you take into account the transition to kernel mode and probably a read from either an MMIO location or a machine-dependent RTC implementation (I/O ports), it seems too fast to complete such a time-lengthy operation in 30 ns.
On the other hand, I suppose that gettimeofday() may just be reading a constantly updated timeval field mapped into some Linux equivalent (for your main thread) of the Windows TEB structure. In such a case the overhead would not be that great, because there is no ring3-to-ring0 transition.
Thanks for your reply. Here is my CPU information:
c557-302$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
stepping        : 7
cpu MHz         : 2701.000
cache size      : 20480 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5400.20
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
Here is the platform information:
c557-302$ cat /proc/version Linux version 2.6.32-358.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Fri Feb 22 00:31:26 UTC 2013
The compiler I am using is icc 13.1.0.
time = 1396965320.865573
time = 1396965351.810059
Number of Calls = 1000000000
Actually I was also expecting something at the microsecond level. But the results show it is around ~30 ns per call. I am a little confused about what might be wrong.
Best regards,
Wentao
If the code I posted is incorrect, what will be the right way to measure the overhead of gettimeofday()? I truly appreciate your help.
Best regards,
Wentao
Your method seems typical; I also get an average of 30 ns for an analogous test on core i5-4200U using gnu compilers; 15 ns using Intel compilers.
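For reference, a minimal sketch of such a test, using clock_gettime(CLOCK_MONOTONIC) for the outer measurement so the function under test is not also the measuring clock (the iteration count is arbitrary, and very old glibc versions may need -lrt):

#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int main(void)
{
    const long iters = 100000000;   /* 1e8 calls; illustrative only */
    struct timespec t0, t1;
    struct timeval tv;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; ++i)
        gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* elapsed time in nanoseconds, averaged over all calls */
    double elapsed_ns = (t1.tv_sec - t0.tv_sec) * 1e9 +
                        (t1.tv_nsec - t0.tv_nsec);
    printf("average: %.1f ns per gettimeofday() call\n", elapsed_ns / iters);
    return 0;
}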
>>>Actually I was also expecting something at the microsecond level. But the results show it is around ~30 ns per call. I am a little confused about what might be wrong.>>>
I might be wrong in my assumption, but I suppose that gettimeofday() is accessing the timeval struct members from within user mode. If I remember correctly, on Windows the GetTickCount() function does the same; that means thread-local storage holds a constantly updated RTC value.
As Tim said, your measuring method is typical. What was not easy to understand is the short execution time of a single gettimeofday() call.
Tim Prince wrote:
Your method seems typical; I also get an average of 30 ns for an analogous test on core i5-4200U using gnu compilers; 15 ns using Intel compilers.
I just did some searching and found that some people measured the overhead at the microsecond level:
http://www.atl.external.lmco.com/projects/QoS/POSIX_html/index.html
http://zeromq.org/results:more-precise-0mq-tests
Is it because they were using slow CPUs? I think my platform is quite advanced.
Best regards,
Wentao
I suspect these function calls take advantage of a data structure automatically updated by the OS. Both overhead and resolution of the timer are important; sub-microsecond resolution and overhead are useful and about the limit of what I hope to get from portable code.
For Fortran applications I prefer the system_clock timer, which gives microsecond resolution with the choice of 64-bit integer data types on most Linux compiler systems, but not on Windows. You may have noted the recommendation by some of our colleagues on this forum of the omp_get_wtime() timer for portability between Windows and Linux.
Time stamp counters may be accessed with resolution on the order of 10's of clock cycles on most current CPUs; translation to timings in seconds requires care to avoid unnecessary additional overhead. Intel and Microsoft C++ compilers have built-in intrinsics for the time stamp counter, which take care of the difference between 32- and 64-bit implementations.
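A rough illustration of that approach, using the __rdtsc() intrinsic (header names differ by compiler, rdtsc is not a serializing instruction, and the cycles-to-seconds conversion, which needs your TSC frequency, is only noted in a comment; treat this as a sketch, not a polished benchmark):

#include <stdio.h>
#include <sys/time.h>
#ifdef _MSC_VER
#include <intrin.h>      /* __rdtsc() with the Microsoft compiler */
#else
#include <x86intrin.h>   /* __rdtsc() with gcc; icc provides the same intrinsic */
#endif

int main(void)
{
    const long iters = 10000000;   /* illustrative iteration count */
    struct timeval tv;
    long i;

    unsigned long long c0 = __rdtsc();
    for (i = 0; i < iters; ++i)
        gettimeofday(&tv, NULL);
    unsigned long long c1 = __rdtsc();

    /* Average cycles per call; divide by the TSC frequency to get seconds
     * (roughly 2.7e9 Hz on the E5-2680 above, which is an assumption here). */
    printf("average: %.1f cycles per gettimeofday() call\n",
           (double)(c1 - c0) / iters);
    return 0;
}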
>>>Your method seems typical; I also get an average of 30 ns for an analogous test on core i5-4200U using gnu compilers; 15 ns using Intel compilers>>>
I suppose that the Intel compiler probably inlined the gettimeofday() call, thus reducing the call/ret overhead.
Wentao Z. wrote:
Tim Prince wrote: Your method seems typical; I also get an average of 30 ns for an analogous test on core i5-4200U using gnu compilers; 15 ns using Intel compilers.
I just did some searching and found that some people measured the overhead at the microsecond level:
http://www.atl.external.lmco.com/projects/QoS/POSIX_html/index.html
http://zeromq.org/results:more-precise-0mq-tests
Is it because they were using slow CPUs? I think my platform is quite advanced.
Best regards,
Wentao
Yes, those benchmarks were run on slower CPUs, but that does not explain the roughly 10x difference in results in the case of the Xeon 3.0 GHz CPU.
http://www.atl.external.lmco.com/projects/QoS/POSIX_html/index.html
