Performance monitor time elapsed not accurate

Zhuoran · 07-23-2019

Hello all,

I am using pcm-core to monitor several counters. I have set the sampling interval to, e.g., 1 s, but the reported "Time elapsed" is always slightly different from 1 s. It may be 1005 ms or 999 ms.

I checked the time function that pcm-core.cpp uses, which is "getTickCount()". I searched a little about that function, and what I found said "the resolution of getTickCount() is at most 10 ms". Is this the reason why the time elapsed is not exactly 1 s, and is there any way to set an exact 1 s interval for reading those counters? Thank you very much for your time!

Best regards,

Thomas_W_Intel · 07-24-2019

PCM is at the mercy of the Linux scheduler as to when it can access the performance counters on each core. That's why the measured interval is not always exactly the requested time. This is actually the reason why PCM outputs the time stamp: it allows you to compute metrics like bandwidth more precisely.
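To illustrate the point about the time stamp, here is a minimal sketch of normalizing a counter delta by the measured interval instead of the nominal one. The function name ratePerSecond and its parameters are made up for this example and are not part of PCM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a PCM function): converts a counter delta
// observed over `elapsedMs` milliseconds into a per-second rate.
// Dividing by the measured time, not the requested 1000 ms, keeps the
// rate correct even when the interval came out as 1005 ms or 999 ms.
double ratePerSecond(std::uint64_t countDelta, double elapsedMs) {
    assert(elapsedMs > 0.0);
    return static_cast<double>(countDelta) * 1000.0 / elapsedMs;
}
```

For example, a delta collected over 1005 ms would be overstated by about 0.5% if it were treated as a full-second value; passing the printed "Time elapsed" as elapsedMs removes that error.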

HadiBrais · 07-24-2019

I've taken a quick look at the code you were referring to. Here are the most relevant parts:

while ((ic <= numberOfIterations) || (numberOfIterations == 0)) {
   ...
   MySleepMs(calibrated_delay_ms);
   ...
   AfterTime = m->getTickCount();
   m->getAllCounterStates(SysAfterState, DummySocketStates, AfterState);
   cout << "Time elapsed: "<<dec<<fixed<<AfterTime-BeforeTime<<" ms" << endl;
   ...
   swap(BeforeTime, AfterTime);
   ...
}

There are at least two problems here. Windows (and mainstream Linux) provide no real-time guarantees regarding the Sleep function. The thread could be woken up before or after the specified time and the system does not offer any guarantees even on the bounds of the error. The Sleep API is implemented based on the system timer interrupt, which is used by the thread scheduler. You can change the timer period using the timeBeginPeriod API. One thing you can try is increasing the timer frequency by calling timeBeginPeriod somewhere at the top of the main function.
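A rough sketch of the timeBeginPeriod suggestion follows. It is not taken from pcm-core: TimerResolutionGuard is a hypothetical helper class, the 1 ms period is only an example value, and the Windows calls are compiled only when _WIN32 is defined (where you also need to link against winmm.lib). On other platforms the class does nothing.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>  // timeBeginPeriod / timeEndPeriod (link with winmm.lib)
#endif

// Hypothetical RAII helper: requests a finer system timer period for the
// lifetime of the object and restores it in the destructor, because every
// timeBeginPeriod call must be matched by a timeEndPeriod call.
class TimerResolutionGuard {
public:
    explicit TimerResolutionGuard(unsigned periodMs)
        : periodMs_(periodMs), active_(false) {
        assert(periodMs > 0);
#ifdef _WIN32
        active_ = (timeBeginPeriod(periodMs_) == TIMERR_NOERROR);
#endif
    }
    ~TimerResolutionGuard() {
#ifdef _WIN32
        if (active_) timeEndPeriod(periodMs_);
#endif
    }
    TimerResolutionGuard(const TimerResolutionGuard&) = delete;
    TimerResolutionGuard& operator=(const TimerResolutionGuard&) = delete;

    bool active() const { return active_; }       // true only if the request succeeded
    unsigned periodMs() const { return periodMs_; }

private:
    unsigned periodMs_;
    bool active_;
};
```

Declaring something like `TimerResolutionGuard guard(1);` as the first statement in main would keep the request in effect until the program exits. This only makes Sleep wake-ups finer grained; it still gives no real-time guarantee.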

If I remember correctly, getTickCount is also implemented on top of the timer interrupt (or something like it), which is probably why its resolution is no finer than about 10 ms. The documentation says the resolution is *typically* between 10 and 16 ms. You can measure it by calling getTickCount in a loop a sufficient number of times (e.g., 100), storing the results in an array, and then, after the loop, printing them. The granularity at which the returned values increment is the resolution of getTickCount.

Making the value passed to Sleep a multiple of the timer period may not actually give the most accurate results. Unfortunately, the delay parameter of pcm-core seems to be in seconds, not milliseconds, so you have little control over it unless you modify the source code to accept milliseconds.
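The measurement loop could look roughly like the sketch below. It differs slightly from the description above: instead of a fixed number of calls, it keeps polling until it has seen a given number of distinct values, so that the result does not depend on how fast the loop runs. The names tickMs and smallestStep are made up for this example, and tickMs is a portable stand-in built on std::chrono::steady_clock. To test PCM you would pass a callable that wraps m->getTickCount() instead.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

// Stand-in millisecond tick source (assumption: replaces getTickCount()
// so that the sketch builds anywhere).
std::uint64_t tickMs() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count());
}

// Polls `tick` until `samples` distinct values have been stored, then
// returns the smallest difference between consecutive stored values.
// That smallest step is the observed resolution of the tick source.
std::uint64_t smallestStep(const std::function<std::uint64_t()>& tick,
                           std::size_t samples) {
    assert(samples >= 2);
    std::vector<std::uint64_t> seen;
    seen.reserve(samples);
    seen.push_back(tick());
    while (seen.size() < samples) {
        const std::uint64_t now = tick();
        if (now != seen.back()) seen.push_back(now);  // store only when the value changes
    }
    std::uint64_t step = std::numeric_limits<std::uint64_t>::max();
    for (std::size_t i = 1; i < seen.size(); ++i)
        step = std::min(step, seen[i] - seen[i - 1]);
    return step;
}
```

Printing `smallestStep(tickMs, 100)` gives the step in milliseconds. With a tick source driven by the timer interrupt, I would expect a value in the 10 to 16 ms range mentioned above, but that is something to check on your machine rather than assume.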

Remember that when a timer interrupt occurs, there is a lot of code that gets executed until the thread is woken up. It is difficult to measure this time and it depends on how exactly the Windows scheduler decides to wake up a sleeping thread, the thread priority, and the priorities of all other runnable threads in the system. The other issue is that there is also a lot of code between the two calls to getTickCount in pcm-core itself and the time it takes to execute that code has to be factored in.

It will probably be better to use a busy-wait loop based on the timestamp counter instead of Sleep, which eliminates much of the nondeterministic timing behavior of the operating system scheduler. See also: How to start two CPU cores to run instructions at the same time?
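Here is a minimal sketch of the busy-wait idea. Note that it swaps in std::chrono::steady_clock for the raw timestamp counter so that it stays portable, and that it waits for absolute deadlines (start + 1 interval, start + 2 intervals, and so on) so that the time spent reading counters does not accumulate as drift. The names spinUntil and sampleAtFixedRate are invented for this example, and the sample callback stands in for the counter-reading code in pcm-core.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Spins (without sleeping) until `deadline` has passed and returns the
// time that was actually observed when the loop exited.
Clock::time_point spinUntil(Clock::time_point deadline) {
    Clock::time_point now = Clock::now();
    while (now < deadline) now = Clock::now();
    return now;
}

// Calls sample(elapsed) `iterations` times. Each deadline is computed from
// the previous deadline, not from the current time, so work done inside
// `sample` shortens the next wait instead of lengthening the period.
template <typename Fn>
void sampleAtFixedRate(Clock::duration interval, int iterations, Fn sample) {
    assert(iterations >= 0);
    Clock::time_point deadline = Clock::now();
    Clock::time_point before = deadline;
    for (int i = 0; i < iterations; ++i) {
        deadline += interval;
        const Clock::time_point after = spinUntil(deadline);
        sample(after - before);  // measured interval, analogous to "Time elapsed"
        before = after;
    }
}
```

The cost is that one core is kept fully busy for the whole run, and the thread can still be preempted while spinning, so this reduces the jitter but does not make the interval exact.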

Zhuoran · 07-24-2019

Hello Thomas and Hadi,

Thank you very much for your help with this! Much appreciated!