Intel® Fortran Compiler

precision of CPU_Time and System_Clock

John_Campbell
New Contributor II
There have been a number of comments about the precision of the standard timing routines available in ifort.
I have written a simple example that differentiates between the apparent precision of SYSTEM_CLOCK, as reported by COUNT_RATE, and the actual precision available from the values it returns.
I have run this example on ifort 11.1, which I have installed.
It shows that both CPU_TIME and SYSTEM_CLOCK tick only 64 times per second, which is very poor precision for the standard Fortran intrinsic timing routines.
Better precision is available ( see QueryPerformanceCounter ) and should be provided by these intrinsic routines.

John
87 Replies
SergeyKostrov
Valued Contributor II
[ Intel(R) Pentium(R) 4 CPU 1.60GHz ] [ Microsoft C++ compiler - DEBUG ]

Test-Case 1.1 - Overhead of RDTSC instruction
  Min RDTSC Overhead Value  : 84.000 clock cycles
Test-Case 1.2 - Overhead of RDTSC instruction
  Total Delta Value         : 791947072 clock cycles
  Avg RDTSC Overhead Value  : 79.195 clock cycles
Test-Case 1.3 - Overhead of Assignment of a Value from RDTSC instruction
  Min Overhead of Assignment: 1.273 clock cycles
  Final RDTSC Overhead Value: 82.727 clock cycles
SergeyKostrov
Valued Contributor II
I finally have very consistent results for RDTSC Overhead ( Latency ) obtained with the Intel C++ compiler.

[ Intel(R) Pentium(R) 4 CPU 1.60GHz ]

[ Intel C++ compiler - DEBUG ]
Test-Case 1.1 - Overhead of RDTSC instruction
  Min RDTSC Overhead Value  : 84.000 clock cycles
Test-Case 1.2 - Overhead of RDTSC instruction
  Total Delta Value         : 885178944 clock cycles
  Avg RDTSC Overhead Value  : 88.518 clock cycles
Test-Case 1.3 - Overhead of Assignment of a Value from RDTSC instruction
  Min Overhead of Assignment: 1.091 clock cycles
  Final RDTSC Overhead Value: 82.909 clock cycles

[ Intel C++ compiler - RELEASE ]
Test-Case 1.1 - Overhead of RDTSC instruction
  Min RDTSC Overhead Value  : 84.000 clock cycles
Test-Case 1.2 - Overhead of RDTSC instruction
  Total Delta Value         : 791483712 clock cycles
  Avg RDTSC Overhead Value  : 79.148 clock cycles
Test-Case 1.3 - Overhead of Assignment of a Value from RDTSC instruction
  Min Overhead of Assignment: 1.191 clock cycles
  Final RDTSC Overhead Value: 82.809 clock cycles

Notes:
- Results for the RDTSC Overhead Value differ by ~0.12% between the DEBUG and RELEASE configurations.
- Results for the Overhead of Assignment differ by ~8.5% - ~9% between the DEBUG and RELEASE configurations ( accuracy of measurement decreases for really small time intervals, like a couple of nanoseconds ).

I will also provide some additional information on how I measured the Overhead of Assignment; it is tricky because the Intel and Microsoft C++ compilers don't generate the same number of CPU instructions, as the disassembled code demonstrates.
SergeyKostrov
Valued Contributor II
There is no latency number for the RDTSC instruction in the following manual:

Intel(R) 64 and IA-32 Architectures Optimization Reference Manual
Order Number: 248966-026, April 2012
Page 767, Table C-16a. General Purpose Instructions (Contd.)

Throughput of RDTSC instruction:
  ~28 - 06_2A, 06_2D
  ~31 - 06_25/2C/1A/1E/1F/2E/2F
  ~31 - 06_17, 06_1D

Note 1: CPUID Signature Values of DisplayFamily_DisplayModel:
  06_3AH - Microarchitecture Ivy Bridge
  06_2AH - Microarchitecture Sandy Bridge
  06_2DH - Microarchitecture Sandy Bridge ( Xeon )
  06_25H - Microarchitecture Westmere
  06_2CH - Microarchitecture Westmere
  06_1AH - Microarchitecture Nehalem
  06_1EH - Microarchitecture Nehalem
  06_1FH - Microarchitecture Nehalem
  06_2EH - Microarchitecture Nehalem
  06_2FH - Microarchitecture Westmere
  06_17H - Microarchitecture Enhanced Intel Core
  06_1DH - Microarchitecture Enhanced Intel Core

Note 2: Throughput - the number of clock cycles required to wait before the issue ports are free to accept the same instruction again. For many instructions, the throughput of an instruction can be significantly less than its latency.
John_Campbell
New Contributor II

Sergey,

I updated the program testing variability. As the SYSTEM_CLOCK and CPU_TIME intrinsics tick only 64 times per second, I also tested QueryPerformanceCounter, which has a much higher rate, so the variability is more noticeable. The intrinsics do still show some variability in the elapsed-time test.

The results for my processor show the accuracy of the 3 timing routines tested ( using RDTSC as the reference) are:

 CPU_TIME Variability Test
 Calls per second         =   6629241.52383241    
 Cycles per second        =   64.1021553426797    
 cpu_time accuracy        =  1.560009947643979E-002  seconds
   average RDTSC ticks per cycle   41599668.8534031  
   standard deviation (in ticks)   135195.923755431   
   variability                    3.249927883605512E-003

 System_Clock Variability Test
 Calls per second         =   8672321.84910597    
 Cycles per second        =   64.1081552551243   
 System_Clock accuracy    =  1.559863945578231E-002  seconds  
  average RDTSC ticks per cycle   41599764.0136054  
  standard deviation (in ticks)   80004.3281514126    
  variability                    1.923191874964643E-003

 Query_Perform Variability Test
 Calls per second         =   46191928.3624866    
 Cycles per second        =   2589364.74907529    
 Query_Perform accuracy   =  3.861951084168883E-007  seconds 
  average RDTSC ticks per cycle   1029.87607752155   
  standard deviation (in ticks)   1072.71121520382   
  variability                     1.04159251643689    

The key variability measures which measure the accuracy of when each routine ticks over are:
 cpu_time   :   standard deviation (in ticks)   135,195.     ( 51 microseconds)
 System_Clock :  standard deviation (in ticks)   80,004.  ( 30 microseconds)
 Query_Perform :  standard deviation (in ticks)   1,072.  ( 0.4 microseconds)

This test shows there is some variation in QueryPerformanceCounter, but it is less effective for CPU_TIME and SYSTEM_CLOCK at showing the outliers (significant variation in the time between ticks), because of their long tick duration. However, it does show a large variation in their tick rate in comparison to QueryPerformanceCounter when measured as time. All this assumes that RDTSC itself is accurate.
QueryPerformanceCounter shows the effect of other system interruptions in its reported tick interval.

The purpose of this test was to try to estimate the reliability and accuracy of the tick intervals.

John

 
