VTune Events: CPU_CLK_UNHALTED.CORE and INST_RETIRED.ANY

prashanthr · 12-03-2009

Sir,

I am analyzing an application with VTune, but I observed that the CPU_CLK_UNHALTED.CORE and INST_RETIRED.ANY sample counts vary from run to run, even though I am using the same binary and the same setup.

Then I found that Tim suggested in one of his replies:

"You can't compare runs when calibration is repeated. Turn it off and set your own sample-after values. A change as large as you mention would appear to mean you have a run which failed to complete."

I turned off calibration and set the SAV (sample-after value) to 3000000, but I still get different results each time.

Please help me get consistent results (the same numbers every time I run the same binary).

Thanks,
Prashanth

Vladimir_T_Intel · 12-03-2009

How different are the results? And how long does the collection run?

prashanthr · 12-03-2009

Vladimir,

I collected 500 samples. The results differ by 10% to 100%.

Thanks,
Prashanth

Vladimir_T_Intel · 12-03-2009

Hi Prashanth,

Assuming your CPU clock rate is 3 GHz, a sample-after value of 3,000,000 gives 1000 samples per second. So with 500 samples, the program ran for only about 0.5 s.
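The arithmetic behind that estimate can be written out as a small C sketch. The function names are made up for illustration, and the 3 GHz clock is the assumption stated above; the idea is only that a clock-cycle event produces one sample every SAV cycles.

```c
#include <assert.h>
#include <stdio.h>

/* Samples per second for a clock-cycle event: one sample every `sav` cycles. */
double samples_per_second(double clock_hz, double sav)
{
    assert(clock_hz > 0.0 && sav > 0.0);
    return clock_hz / sav;
}

/* Approximate busy run time implied by a given number of samples. */
double run_seconds(double samples, double clock_hz, double sav)
{
    return samples / samples_per_second(clock_hz, sav);
}

/* Prints the figures from this thread: 3 GHz, SAV = 3,000,000, 500 samples. */
void print_estimate(void)
{
    const double clock_hz = 3.0e9;   /* assumed CPU clock rate */
    const double sav = 3.0e6;        /* sample-after value set by the poster */

    printf("rate: %.0f samples/s\n", samples_per_second(clock_hz, sav));
    printf("run time for 500 samples: %.2f s\n",
           run_seconds(500.0, clock_hz, sav));
}
```

Calling print_estimate() from main reproduces the 1000 samples per second and 0.5 s figures; with so few samples, a handful of extra or missing ones is already a large percentage change.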

I'd recommend extending the collection time to several seconds, as the program run duration might be affected by other activities in the system.

To verify this, you can set up a high-resolution counter (one that counts in clock ticks) and start and stop it at the beginning and end of the _main function. You will be surprised how much the results vary.
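A rough sketch of such a measurement in C is below. It assumes a POSIX system and uses clock_gettime with CLOCK_MONOTONIC as the high-resolution counter (on Windows, QueryPerformanceCounter would play the same role). The work() function is a hypothetical stand-in for the code being profiled; in a real program the two now_ns() calls would go at the start and end of main.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime on strict compilers */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical stand-in for the small program being measured. */
uint64_t work(void)
{
    volatile uint64_t sum = 0;    /* volatile keeps the loop from being removed */
    for (uint64_t i = 0; i < 1000000; ++i)
        sum += i;
    return sum;
}

/* Times `runs` calls of work() and reports the fastest and slowest run. */
void measure_spread(int runs, uint64_t *min_ns, uint64_t *max_ns)
{
    assert(runs > 0);
    *min_ns = UINT64_MAX;
    *max_ns = 0;
    for (int r = 0; r < runs; ++r) {
        uint64_t t0 = now_ns();
        work();
        uint64_t dt = now_ns() - t0;
        if (dt < *min_ns) *min_ns = dt;
        if (dt > *max_ns) *max_ns = dt;
    }
}
```

Printing the minimum and maximum after, say, 20 runs usually shows a visible gap between them even though every run does identical work, which is the run-to-run variation being described.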

prashanthr · 12-08-2009

Hi Vladimir,

Yes, my CPU is running at 3 GHz. I extended the collection time to 10 sec. I am still not getting consistent results. I am profiling a small function.
Is there a problem in getting the results for small programs?

Thanks,
Prashanth

Vladimir_T_Intel · 12-08-2009

Hi Prashanth,

Quoting - prashanthr
Is there a problem in getting the results for small programs?

VTune sampling uses a statistical approach to measurement, which implies some variation in the results. With a small program, the system itself may be affecting consistency (I/O, paging, running services, high-priority processes). I highly recommend checking your small program with a high-resolution counter; you will easily see the inconsistency in the results.

Quoting - prashanthr
Yes, my CPU is running at 3 GHz. I extended the collection time to 10 sec. I am still not getting consistent results. I am profiling a small function.

If I understood correctly, you just extended the sampling collection time to 10 s but are still profiling a small function, right? If so, that is not the right approach. You should extend the run time of the function itself to 10 s or more (call the function in a loop) and collect samples for the whole loop run.
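A minimal sketch of such a loop in C follows. Here small_function is a hypothetical stand-in (a xorshift step) for the real function, and the iteration count is something you would tune on your own machine until the run lasts around 10 s. Using a fixed iteration count rather than a fixed time keeps the amount of work identical from run to run.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the small function under test (xorshift32 step). */
uint32_t small_function(uint32_t x)
{
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Calls small_function() `iterations` times. Each call consumes the previous
 * result, so the compiler cannot drop or reorder the calls as long as the
 * caller uses the returned value. */
uint32_t run_loop(uint64_t iterations)
{
    assert(iterations > 0);
    uint32_t state = 2463534242u;   /* arbitrary nonzero seed */
    for (uint64_t i = 0; i < iterations; ++i)
        state = small_function(state);
    return state;
}
```

In main you would call run_loop with a large count and print the returned value; the printed value also serves as a quick check that every run performed exactly the same computation.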

You can also isolate the loop/function measurement by using the sampling collection API: wrap the piece of code in VTune API calls and collect sampling data for that code region only, for more consistency.
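As an illustration only, here is how wrapping a region might look with the ITT API (`__itt_resume` / `__itt_pause` from ittnotify.h) that ships with current VTune releases. This is an assumption on my side: the VTune version discussed in this thread dates from 2009 and its collection API used different names. The USE_ITT macro, COLLECTION_RESUME/COLLECTION_PAUSE, and region_of_interest are hypothetical names; without USE_ITT the calls compile to no-ops so the sketch builds without the library.

```c
#include <assert.h>
#include <stdint.h>

#ifdef USE_ITT
#include <ittnotify.h>                    /* requires linking the ittnotify library */
#define COLLECTION_RESUME() __itt_resume()
#define COLLECTION_PAUSE()  __itt_pause()
#else
#define COLLECTION_RESUME() ((void)0)     /* no-op fallback when built without ITT */
#define COLLECTION_PAUSE()  ((void)0)
#endif

/* Hypothetical stand-in for the code region to be measured. */
uint64_t region_of_interest(uint64_t n)
{
    uint64_t sum = 0;
    for (uint64_t i = 1; i <= n; ++i)
        sum += i * i;
    return sum;
}

/* Resumes collection just before the region and pauses it right after,
 * so samples are attributed to this region only. */
uint64_t profile_region(uint64_t n)
{
    assert(n > 0);
    COLLECTION_RESUME();
    uint64_t result = region_of_interest(n);
    COLLECTION_PAUSE();
    return result;
}
```

For this to restrict sampling to the region, the collection is assumed to be started in a paused state, so that startup and teardown code outside the wrapped calls is not sampled.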