Estimating CPU utilization using Performance Counters

tim_kiefer
Beginner

Hi,

this may be a stupid question... but what would be a good method/counter to estimate the (per core/per socket) utilization of the CPU? I would like to get a percentage like the one many OS tools (e.g., nmon, ...) provide, but since I am collecting counters in my experiment framework anyway, it would be easier to use them instead of OS tools.

Any suggestions are appreciated

Thanks!

- tim

14 Replies
SergeyKostrov
Valued Contributor II
>>...what would be a good method/counter to estimate the (per core/per socket) utilization of the CPU...

On a Windows platform you can use the Windows Management Instrumentation (WMI) COM interfaces. There are lots of examples on MSDN of how to use WMI, and the WMI COM interfaces are relatively simple to use. You can also use the wbemtest.exe utility to evaluate WMI COM interfaces without any programming.
Bernard
Valued Contributor I
>>>this may be a stupid question... but what would be a good method/counter to estimate the (per core/per socket) utilization of the CPU. I would like to get a percentage like many OS tools (e.g., nmon, ...) provide them - but since I am collecting counters in my experiment framework anyway, it would be easier to use them instead of OS tools. Any suggestions are appreciated Thanks!>>>

You can use performance counter consumers and providers programmatically. Link: //msdn.microsoft.com/en-us/library/windows/desktop/aa373088(v=vs.85).aspx
SergeyKostrov
Valued Contributor II
>>..., it would be easier to use them instead of OS tools...

By the way, what OS tools did you consider?
tim_kiefer
Beginner

Sergey, I am on Linux... sorry - forgot to mention.

iliyapolak: your link leads me to the performance counter reference... I don't see how that helps.

To clarify: I know how to use Intel PCM and how to collect (any) hardware performance counter. I don't want to use this functionality in my own code. Instead, I start the PCM - run some test - stop PCM - look at collected counter values (usually collected per second).

What I would like to know is which counter(s) will tell me the CPU utilization... something like UOP_RETIRED_ANY??? * TSC.

thanks

Bernard
Valued Contributor I
>>>iliyapolak: your link leads me to the performance counter reference... I don't see how that helps.>>>

Sorry, I did not know that you are on Linux :)

>>>What I would like to know is, which counter(s) will tell me the CPU utilization... like UOP_RETIRED_ANY??? * TSC>>>

Now I understand your question properly.
For the total number of uops issued by the front end to the out-of-order back end, you can use: UOPS_ISSUED
For the number of retired instructions, use: INST_RETIRED.ANY
For the instruction breakdown by type, use: ARITH, FP_COMP_OPS_EXE or SIMD_INT_128
tim_kiefer
Beginner

Thanks for your help :)

I see how I can count the uops/instructions that are actually executed. But to calculate the utilization, I would also need the maximum possible number of instructions that can be executed so that I can output "actual"/"maximum". Comparable to link utilization... sending 6GB/s and having a maximum of 12.8GB/s I get about 50% link utilization (give or take) ;).
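The "actual"/"maximum" idea can be written down as a tiny helper. This is only a sketch of the arithmetic in the link example above; the function name `utilization_pct` is made up for illustration.

```python
def utilization_pct(actual: float, maximum: float) -> float:
    """Return actual/maximum as a percentage."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return 100.0 * actual / maximum


# Link example from the post: 6 GB/s on a link with a 12.8 GB/s maximum.
print(f"{utilization_pct(6.0, 12.8):.1f}%")  # prints 46.9%
```

The same ratio applies to any counter for which a meaningful per-interval maximum is known; the hard part for the CPU is choosing that maximum.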

I'll think about it for another while...

Bernard
Valued Contributor I
Hi Tim, please read this article about CPU utilization. Link: http://software.intel.com/en-us/articles/performance-insights-to-intel-hyper-threading-technology/
Patrick_F_Intel1
Employee

Hello Tim.Kiefer

The term 'cpu utilization' is pretty vague, and there are many ways to measure it. 'Utilization' implies a resource that has a maximum, and you'd like to know what % of that maximum is getting used.

For instance, you can look at %idle (or %halted): http://software.intel.com/en-us/articles/measuring-the-halted-state/

Or the average unhalted frequency: http://software.intel.com/en-us/articles/measuring-the-average-unhalted-frequency/

Or the IPC (instructions per clocktick) where the max is 4 or 5 instructions per clocktick.
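A minimal sketch of how these three measures might be computed from per-interval counter deltas that have already been collected for one logical core (for example with PCM). It assumes that CPU_CLK_UNHALTED.REF ticks at the TSC rate, which should be checked for the processor in question. The names `CoreSample` and `core_metrics`, the default maximum IPC of 4, and the sample numbers are all illustrative and not taken from any tool.

```python
from dataclasses import dataclass


@dataclass
class CoreSample:
    """Counter deltas for one logical core over one sampling interval."""
    tsc: int                  # elapsed TSC ticks
    clk_unhalted_ref: int     # CPU_CLK_UNHALTED.REF (assumed to tick at the TSC rate)
    clk_unhalted_thread: int  # CPU_CLK_UNHALTED.THREAD (core cycles while not halted)
    inst_retired: int         # INST_RETIRED.ANY


def core_metrics(s: CoreSample, tsc_hz: float, max_ipc: float = 4.0) -> dict:
    """Derive %unhalted, %halted, average unhalted frequency and IPC."""
    if s.tsc <= 0:
        raise ValueError("TSC delta must be positive")
    pct_unhalted = 100.0 * s.clk_unhalted_ref / s.tsc
    # Average frequency while the core was running (not halted).
    if s.clk_unhalted_ref > 0:
        avg_unhalted_hz = tsc_hz * s.clk_unhalted_thread / s.clk_unhalted_ref
    else:
        avg_unhalted_hz = 0.0
    # Instructions retired per unhalted core cycle.
    if s.clk_unhalted_thread > 0:
        ipc = s.inst_retired / s.clk_unhalted_thread
    else:
        ipc = 0.0
    return {
        "pct_unhalted": pct_unhalted,
        "pct_halted": 100.0 - pct_unhalted,
        "avg_unhalted_hz": avg_unhalted_hz,
        "ipc": ipc,
        "pct_of_max_ipc": 100.0 * ipc / max_ipc,
    }


# Hypothetical one-second sample on a core with a 2.0 GHz TSC.
sample = CoreSample(tsc=2_000_000_000, clk_unhalted_ref=1_000_000_000,
                    clk_unhalted_thread=1_200_000_000, inst_retired=1_800_000_000)
for name, value in core_metrics(sample, tsc_hz=2.0e9).items():
    print(f"{name}: {value:g}")
```

Note that %unhalted and IPC answer different questions: a core can be unhalted 100% of the time while retiring far fewer instructions per cycle than the maximum.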

But in my experience, I've found that trying to dig deeper into the micro-architecture statistics should only be undertaken when you are sure that your application is cpu-bound, not waiting on cache misses, disk IO, network IO, etc. You can see the 'top down' methodology here: http://software.intel.com/en-us/blogs/2011/05/04/top-down-methodology-for-software-performance-analysis

If the IPC is low or your app isn't near 100% of the cpu, you are probably not bottlenecked in the cpu. 'Time' is the critical factor. I usually look to see where in the code the time is being spent with something like VTune.

If your code is bottlenecked by the cpu, you can use this analysis methodology to understand better where the bottleneck is: http://software.intel.com/sites/products/documentation/hpc/amplifierxe/en-us/2011Update/lin/ug_docs/GUID-8FCE6EF8-301B-4D62-B09E-EF79FE7CC33D.htm

Pat

Bernard
Valued Contributor I
>>>Comparable to link utilization... sending 6GB/s and having a maximum of 12.8GB/s I get about 50% link utilization (give or take) ;).>>>

If you are also interested in measuring bus utilization (this relates to the old FSB concept), please look at this link: //software.intel.com/en-us/forums/topic/281625
tim_kiefer
Beginner

Sorry, I didn't get to this earlier. Thanks to both of you for the pointers to the material - quite some interesting reading!

iliyapolak: Thanks for the Hyperthreading document - that made a few things clear, although I still can't figure out what the OS is showing me. What does Windows show in the Task Manager performance view as CPU utilization?

Pat: Thanks for the pointers - all the methods you pointed out seem to work in my case. To clarify, I am not hunting the last bit of performance in my application. My applications are synthetic and sometimes CPU-bound, sometimes memory controller bound and sometimes bound by the QPI link bandwidth - all on purpose. What I am interested in right now is rather a relative measure between different cores/sockets - say cores 4-8 are significantly busier than cores 9-12. I guess that I can get this information with the methods that you pointed me to. And ultimately, I will be careful when I interpret any of the results.

Last question to anybody: what do OS tools (Windows Task Manager, Linux nmon) show me as CPU utilization? There I often see 100% usage, but I doubt that the application is finishing 4-5 instructions per cycle (which would be the ultimate maximum, if I understood correctly ;)).

Thanks

- tim

Bernard
Valued Contributor I
The best Windows tool for measuring CPU load broken down by process, thread, DPC and ISR is Xperf. You can also access performance counters programmatically through the Windows API, although this approach requires extensive coding to allocate the performance data objects and access the counters. Link: //msdn.microsoft.com/en-us/performance/cc825801.aspx
Bernard
Valued Contributor I
For measuring CPU performance at the hardware level, the best tool is VTune.

>>>Although I still can't figure out, what the OS is showing me. What is Windows showing in the Task Manager performance view as CPU utilization?>>>

A tool like Task Manager shows the percentage of time the CPU spent executing threads other than the so-called idle thread. The data is probably obtained from performance counters (% kernel time, % user time).
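On Linux, the same time-based notion of utilization can be reproduced from the per-core tick counters in /proc/stat, which is where tools such as nmon typically get their numbers. The sketch below computes the busy percentage between two snapshots; the function names and the sample snapshot strings are made up for illustration, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
def parse_proc_stat(text: str) -> dict:
    """Map each per-core 'cpuN' line of /proc/stat to (busy_ticks, total_ticks)."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep "cpu0", "cpu1", ...; skip the aggregate "cpu" line and other entries.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal (guest is already in user)
        ticks = [int(v) for v in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        cores[fields[0]] = (total - idle, total)
    return cores


def utilization(before: str, after: str) -> dict:
    """Percentage of non-idle time per core between two /proc/stat snapshots."""
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    result = {}
    for core, (busy1, total1) in second.items():
        if core not in first:
            continue
        busy0, total0 = first[core]
        elapsed = total1 - total0
        result[core] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


# Hypothetical snapshots: cpu0 was never idle in between, cpu1 was mostly idle.
BEFORE = "cpu 110 0 60 1830 0 0 0 0 0 0\ncpu0 100 0 50 850 0 0 0 0 0 0\ncpu1 10 0 10 980 0 0 0 0 0 0\n"
AFTER = "cpu 210 0 80 1910 0 0 0 0 0 0\ncpu0 190 0 60 850 0 0 0 0 0 0\ncpu1 20 0 20 1060 0 0 0 0 0 0\n"
print(utilization(BEFORE, AFTER))  # {'cpu0': 100.0, 'cpu1': 20.0}
```

For real data, read /proc/stat twice with a pause in between and pass both strings to `utilization`. A result of 100% here only means the core was never idle during the interval; it says nothing about how many instructions per cycle were retired.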