I have written a simple example which differentiates between the apparent precision of SYSTEM_CLOCK, as given by COUNT_RATE, and the actual precision available from the distinct values it returns.
I ran this example on ifort 11.1, which I have installed.
It shows that both CPU_TIME and SYSTEM_CLOCK advance only 64 times per second, which is very poor precision from the Fortran standard intrinsic routines.
Better precision is available (see QueryPerformanceCounter) and should be provided in these intrinsics.
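For reference, a minimal sketch of the approach (not my attached program) is shown below: it polls SYSTEM_CLOCK, counts how often the returned value actually changes over about one second, and compares that with the rate promised by COUNT_RATE.
[fortran]
! Minimal sketch: contrast the apparent precision reported by
! COUNT_RATE with the actual tick rate observed by polling.
program tick_rate
   implicit none
   integer :: count, rate, last, ticks, start
   real    :: seconds
   call system_clock (count, rate)
   print *, 'Apparent precision (COUNT_RATE):', rate, 'ticks/sec'
   call system_clock (start)
   last  = start
   ticks = 0
   do                                    ! poll for about one second
      call system_clock (count)
      if (count /= last) then            ! the returned value changed
         ticks = ticks + 1
         last  = count
      end if
      if (count - start >= rate) exit    ! one second has elapsed
   end do
   seconds = real (count - start) / real (rate)
   print *, 'Actual precision:', nint (ticks/seconds), 'ticks/sec'
end program tick_rate
[/fortran]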
John
>>...Over the last 20 years, processor rates have improved by over 1,000 times, from 1 MHz to 3 GHz. Unfortunately, the precision
>>of some timers has not matched this improvement, to the extent that they now give poor performance for what program
>>developers require of them.
I think that John is talking mainly about the system timers like RTC.
iliyapolak,
Of the 6 timers I tested, 4 are updated 64 times per second, which includes both Fortran intrinsics. That is once every 42 million processor cycles, which contrasts dramatically with the CPU clock rates reported in Sergey's recent posts. The following table summarises the performance I obtained from the program I attached recently, where:
Routine : the name of the timing routine being tested (Fortran intrinsic or API routine)
Elapse : the duration of the test
Ticks : the number of changes of time reported during the test
Calls : the number of times the routine was called during the test
Time1 : the time between tick cycles, timed with QueryPerformanceCounter
Time2 : the time between tick cycles, timed with the test routine
inc : the average increment in the timing value reported by the test routine
ticks/sec : the average ticks per second (time changes per second)
cpu : the estimated number of processor cycles the routine takes to return the time value
Note: my processor runs at 2.67 GHz, so the slowest (CPU_Time) returns its estimate in 0.15 microseconds, or 6.8 million calls per second. This is not a significant overhead for polling the time at a frequency of more than 64 cycles per second.
RDTSC looks like a much better timer for elapsed time estimates.
[plain]
Test Routine               Elapse      Ticks      Calls      time1      time2     inc  ticks/sec  cpu
   1 system_clock           5.070        324   40845007  1.565E-02  1.565E-02   15647         64  331
   2 CPU_Time               5.101        326   34510251  1.565E-02  1.565E-02   15648         64  394
   3 GetTickCount           6.287        402  887790725  1.564E-02  1.564E-02      15         64   18
   4 QueryPerformanceCoun   5.632   14625772  283723396  3.851E-07  3.853E-07       1    2595206   52
   5 GetProcessTimes        5.038        322   34090182  1.565E-02  1.565E-02  156486         64  394
   6 RDTSC API              5.289  459827584  459827584  1.150E-08  1.149E-08      30   87008169   30
[/plain]
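For anyone wanting to reproduce the Time1 column, here is a hedged sketch of binding QueryPerformanceCounter and QueryPerformanceFrequency from Fortran via ISO_C_BINDING. It assumes 64-bit Windows (where the stdcall/cdecl distinction disappears), and the wrapper qp_seconds is my own helper, not part of any API.
[fortran]
! Hedged sketch: bind the Windows high-resolution counter from Fortran.
! Assumes 64-bit Windows; kernel32 is linked by default with ifort.
module qpc_timer
   use iso_c_binding, only: c_int, c_int64_t
   implicit none
   interface
      function QueryPerformanceCounter (count) bind(C, name='QueryPerformanceCounter')
         import :: c_int, c_int64_t
         integer(c_int64_t), intent(out) :: count
         integer(c_int) :: QueryPerformanceCounter
      end function
      function QueryPerformanceFrequency (freq) bind(C, name='QueryPerformanceFrequency')
         import :: c_int, c_int64_t
         integer(c_int64_t), intent(out) :: freq
         integer(c_int) :: QueryPerformanceFrequency
      end function
   end interface
contains
   real(8) function qp_seconds ()        ! current counter value, in seconds
      integer(c_int64_t) :: count, freq
      integer(c_int) :: ok
      ok = QueryPerformanceFrequency (freq)
      ok = QueryPerformanceCounter (count)
      qp_seconds = real (count, 8) / real (freq, 8)
   end function qp_seconds
end module qpc_timer
[/fortran]
Timing the gap between two successive changes of a test routine's value with qp_seconds gives figures comparable to the Time1 column above.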
iliyapolak wrote:
>>>Threading forum is not necessarily restricted to Intel software if it concerns Intel platforms>>>
Tim, do you mean the Threading Building Blocks forum?
http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures
TimP (Intel) wrote:
Quote:
iliyapolak wrote: >>>Threading forum is not necessarily restricted to Intel software if it concerns Intel platforms>>> Tim, do you mean the Threading Building Blocks forum?
http://software.intel.com/en-us/forums/threading-on-intel-parallel-archi...
Thanks Tim.
This forum seems a good place for my thread. :)
>>>Nanosecond-precision Test
Web-link: zeromq.org/results:more-precise-0mq-tests>>>
Great article. Thanks for the link.
Hi John,
I think the 64 Hz rate of those four timers is directly related to real-time multimedia needs and to OS thread scheduling. For example, the thread quantum on client versions of Windows runs for two timer intervals, ~20 ms (it can be measured with the Clockres tool), so at that granularity the resolution of RDTSC is not needed. I think RDTSC is the better option when small portions of code, like loops, are measured.
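As an aside, and assuming I read the documentation correctly, the multimedia API timeBeginPeriod (from winmm) is the usual way an application raises that 64 Hz rate. A minimal interop sketch, assuming 64-bit Windows:
[fortran]
! Hedged sketch: request ~1 ms timer resolution via the winmm
! multimedia API. Link against winmm.lib; every timeBeginPeriod
! call must be paired with a matching timeEndPeriod.
program raise_timer_rate
   use iso_c_binding, only: c_int
   implicit none
   interface
      function timeBeginPeriod (uPeriod) bind(C, name='timeBeginPeriod')
         import :: c_int
         integer(c_int), value :: uPeriod
         integer(c_int) :: timeBeginPeriod
      end function
      function timeEndPeriod (uPeriod) bind(C, name='timeEndPeriod')
         import :: c_int
         integer(c_int), value :: uPeriod
         integer(c_int) :: timeEndPeriod
      end function
   end interface
   integer(c_int) :: rc
   rc = timeBeginPeriod (1_c_int)   ! ask for ~1000 ticks/sec
   ! ... timing-sensitive work here ...
   rc = timeEndPeriod (1_c_int)     ! restore the default rate
end program raise_timer_rate
[/fortran]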
Sergey Kostrov wrote:
In the article ( link is posted above ) there is a number for RDTSC overhead on an Intel Pentium 4 CPU:
...
Intel(R) Pentium(R) 4 3 GHz ... 33 ns
and if I normalize my result:
Intel(R) Pentium(R) 4 1.60GHz ... 84 clock cycles
to theirs ( with conversion from clock cycles to ns and GHz normalization ) then this is what I get:
Normalized: ( 1 / 1600000000 * 84 ) / ( 3.0 / 1.6 ) = ( 52.5 ns ) / ( 1.875 ) = 28 ns
There is ~15% difference in accuracy of measurements. Unfortunately, it is not clear how exactly they measured these numbers ( I could post my codes and they are very simple ).
RDTSC is so different on CPUs of recent years from what it was on the P4 that I don't see much relevance in comparisons with such old data. Evidently, on recent CPUs, RDTSC measures baseline clock ticks and multiplies them by the nominal multiplier, so it can be expected not to increment during a baseline clock interval. RDTSC may access a local core clock so as not to have an access latency dependent on which core accesses it, but of course if you take care to return to the same core for consistency you incur greater overhead. The latter might account for the behavior of Windows QueryPerformanceCounter.
Intel CPUs which have an ETC counter shared by all cores show a latency many times that of RDTSC.
Sergey,
If you are asking about the observed variability of:
- the call duration, or
- the increment in the time value between successive changes,
then yes, I have noted significant variability in my measurements of both. For example, in the following code, modified from my earlier attachment, the mean and standard deviation of the change in NUM can be calculated, and I have reported the cases where the d_num value deviates significantly from the average. These instances occur with all the timing routines, and some of the deviations are many standard deviations from the average.
I think there are multiple reasons for this, related to the system running many processes.
[fortran] do n = 1, n_lim
    call use_system_clock (time, num)   ! wrapper returning time (sec) and tick count
    if (num == last_num) cycle          ! wait until the tick value changes
    last_num = num                      ! last tick
    i = i + 1
    list(i) = num                       ! record the tick count at each change
    sec(i)  = time                      ! and the corresponding time
    if (i == size(list)) exit           ! stop when the buffer is full
 end do [/fortran]
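The follow-up analysis I describe is straightforward. A sketch of the mean/standard-deviation test, assuming list(1:i) has been filled as above; the 3-sigma cutoff is illustrative only:
[fortran]
! Sketch of the outlier test described above. Assumes list(1:i)
! holds the recorded tick counts.
real, allocatable :: d(:)
real    :: mean, sd
integer :: k

d    = real (list(2:i) - list(1:i-1))   ! increments between ticks (d_num)
mean = sum (d) / size (d)
sd   = sqrt (sum ((d - mean)**2) / size (d))
do k = 1, size (d)
   if (abs (d(k) - mean) > 3.0*sd) then
      write (*,*) 'tick', k, ': d_num =', d(k), ' mean =', mean, ' sd =', sd
   end if
end do
[/fortran]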
What I do not know is whether these variations affect the long-term accuracy of the time measure. In the case of elapsed time, that would depend on how the timer is calibrated and what the ultimate reference clock is. RDTSC could be a reference clock, but there might be others. Any attempt to identify better accuracy is probably not justified, as the elapsed-time performance of a process is probably more affected by other background processes, so it is all part of the noise.
In the case of CPU time, the value is an accumulation of estimates of when the process is running. This has been most relevant when trying to identify lost time, such as I/O interrupt delays, when I have tried to better manage buffered I/O performance. It is becoming less significant than elapsed time.
The optimisation problems I am attempting to solve are not at the level of a few processor cycles, but they are certainly shorter than 42 million processor cycles, which is the precision you can get from the Fortran intrinsics.
The short answer is: I have seen variability of performance in a number of areas, and I think most of it is due to shared processes. I am not convinced this variability is a significant problem.
John
PS: I hope this post gets delivered soon; my previous post was held for review, which made it less relevant to the later posts by the time it was finally released. Not sure what made it qualify for being stopped?
Also, why can't this Windows forum accept Windows cut and paste?
>>>I think there are multiple reasons for this, related to the system running many processes.>>>
I think such behaviour can be related to thread context switching, which may explain the variation in your measurements. Try running your thread at a higher priority level to reduce the frequency of context switches, and observe how the variation changes; a hedged sketch follows.
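A minimal interop sketch of raising the priority, assuming 64-bit Windows (the constant THREAD_PRIORITY_HIGHEST = 2 comes from winbase.h):
[fortran]
! Hedged sketch: raise the current thread's priority via the Win32 API.
! GetCurrentThread returns a pseudo-handle, so no CloseHandle is needed.
program raise_priority
   use iso_c_binding, only: c_int, c_intptr_t
   implicit none
   interface
      function GetCurrentThread () bind(C, name='GetCurrentThread')
         import :: c_intptr_t
         integer(c_intptr_t) :: GetCurrentThread
      end function
      function SetThreadPriority (hThread, nPriority) bind(C, name='SetThreadPriority')
         import :: c_int, c_intptr_t
         integer(c_intptr_t), value :: hThread
         integer(c_int), value :: nPriority
         integer(c_int) :: SetThreadPriority
      end function
   end interface
   integer(c_int), parameter :: THREAD_PRIORITY_HIGHEST = 2_c_int
   integer(c_int) :: ok
   ok = SetThreadPriority (GetCurrentThread (), THREAD_PRIORITY_HIGHEST)
   ! ... run the timing loop here, then repeat the variance analysis ...
end program raise_priority
[/fortran]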
>>>Also, why can't this windows forum accept windows cut and paste ?>>>
I have never experienced such an issue in my browser. By the way, I am using Firefox.
>>>I think this is expected because Windows interrupts my processing to do something else.>>>
I think this is related to servicing interrupts and to the scheduler code itself running.