Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

## Q&A: How to determine performance headroom with HT-enabled processors

Here is a question submitted to Intel Software Network Support, along with the response provided by our Application Engineers. The following should be treated as a sample determination. Users should characterize the HT performance benefit on their own workload and arrive at their own constant A:
Q. How can I calculate the overall processor time for a dual-processor machine with HT enabled, once I have detected HT, determined the number of logical processors per package, and associated the logical processors with physical processors? Would you please provide me with the formula that calculates the overall utilization (e.g. %Processor Time) for each individual physical processor?
A. In Windows*, you can use perfmon or the Task Manager to get an instantaneous measure of processor utilization. On Linux*, vmstat can provide the same information. If you have measured user time, system time, and elapsed time, then:
Processor utilization = (system time + user time)/(elapsed time).
With Hyper-Threading Technology (HT), you should modify the formula slightly. See below:
processor utilization = ( total_system_time + total_user_time )/ (elapsed_time * A).

total_system_time = time reported by the OS as spent on behalf of the application in kernel mode. If the application is multithreaded, then the user needs to sum up the time for all the threads.

total_user_time = time reported by the OS as spent on behalf of the application in user mode. Again, if the application is multithreaded, then the user needs to sum up the time for all the threads.

A = a constant representing the effective number of processors. Determining this constant is a bit tricky, and it will vary from application to application. For a 2P (two physical processor) system without HT, A = 2.0. For a 2P system with HT enabled, A is typically somewhere between 2.0 and 2.4. It could be less than 2.0, but these cases are rare.
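As a rough illustration of the formula above, here is a minimal Python sketch. The function name `processor_utilization`, the per-thread timings, and the 10-second window are all made up for the example; they do not come from any measurement, and A = 2.4 is just the upper end of the range quoted above.

```python
def processor_utilization(total_system_time, total_user_time, elapsed_time, a=2.0):
    """Utilization as a fraction, per the formula in this post.

    a is the constant A: 2.0 for a 2P system without HT, and a
    workload-dependent value (roughly 2.0 to 2.4) for a 2P system with HT.
    """
    if elapsed_time <= 0 or a <= 0:
        raise ValueError("elapsed_time and a must be positive")
    return (total_system_time + total_user_time) / (elapsed_time * a)


# Hypothetical per-thread (system, user) times in seconds over a 10 s window.
thread_times = [(0.5, 4.0), (0.5, 4.0), (0.25, 3.0), (0.25, 3.5)]

# A multithreaded application needs its times summed across all threads.
total_system = sum(system for system, _ in thread_times)
total_user = sum(user for _, user in thread_times)

util_2p = processor_utilization(total_system, total_user, 10.0, a=2.0)
util_2p_ht = processor_utilization(total_system, total_user, 10.0, a=2.4)

print(f"2P, no HT (A=2.0): {util_2p:.1%}")
print(f"2P with HT (A=2.4, assumed): {util_2p_ht:.1%}")
```

With these made-up numbers, the same 16 seconds of CPU time reads as 80% of a 2P system without HT and about 67% of a 2P HT system when A is assumed to be 2.4.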
Here's the rationale: the OS does not count an HT logical processor any differently than a physical processor, so the processor utilization reported by the OS will be optimistic. Say you're running a client-server application and your performance metric is the number of clients a system can support. If your OS, say through perfmon, shows 50% utilization with HT enabled at 100 users and a certain response time, you can't assume that you would be able to run 200 users at 100% utilization with the same response time. The point to emphasize is that an HT logical processor is not the same as a physical processor. The constant A is the number of physical processors scaled by the benefit you expect from HT: if you assume a 20% performance benefit from HT on a 2P system, then A = 2 × 1.2 = 2.4. If you assume a 10% performance benefit, then A = 2 × 1.1 = 2.2.
One additional comment if you're running on Linux*: sar and sadc are another alternative for obtaining processor utilization directly.
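To make the arithmetic concrete, here is a small Python sketch of how A could be derived from an assumed HT benefit, and what that implies for the 100-user example. The function names are hypothetical, and the headroom estimate assumes load scales linearly with the number of users, which is a simplification that real workloads may not follow.

```python
def effective_processor_count(physical_processors, ht_benefit):
    """Constant A: physical processors scaled by the assumed HT benefit."""
    return physical_processors * (1.0 + ht_benefit)


def corrected_utilization(os_utilization, logical_processors, a):
    """Rescale an OS-reported utilization (a fraction of all logical
    processors) to a fraction of the A effective processors."""
    busy_processors = os_utilization * logical_processors
    return busy_processors / a


a_20 = effective_processor_count(2, 0.20)  # assumed 20% HT benefit
a_10 = effective_processor_count(2, 0.10)  # assumed 10% HT benefit

# The OS shows 50% across 4 logical processors (2P with HT) at 100 users.
util = corrected_utilization(0.50, logical_processors=4, a=a_20)

# Assumption: users supported scale linearly with utilization.
estimated_max_users = 100 / util

print(f"A at 20% benefit: {a_20:.1f}")
print(f"A at 10% benefit: {a_10:.1f}")
print(f"Corrected utilization: {util:.1%}")
print(f"Estimated users at 100%: {estimated_max_users:.0f}")
```

Under these assumptions the corrected utilization is about 83%, which leaves headroom for roughly 120 users rather than the 200 that the OS-reported 50% would suggest.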
/usr/lib/sa/sadc x y filename // collect y samples, each sample of x second duration, into filename

sar -u x y -f filename // yields the collected data with almost no overhead.

======

Lexi S.

Intel Software Network Support