Software Tuning, Performance Optimization & Platform Monitoring
Discussion around monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform monitoring

How to narrow down Intel PCM data to a single process?

Gholamipour__Amir

I'm trying to use Performance Counter Monitor (PCM) to measure L3 cache misses and some other performance metrics in my code.

getSystemCounterState clearly enumerates over all the processors in the system to get the aggregated counts of all the counters. I tried narrowing down the results by setting the Linux kernel maxcpus parameter to 1 and rebooting the system, but now I'm getting the following error:

PCM does not support using Linux perf API on systems with offlined cores. Falling-back to direct PMU programming.
terminate called after throwing an instance of 'std::exception'
  what():  std::exception
Aborted (core dumped)

How can I achieve this, i.e., get L3 cache misses and other statistics for one application running on a single CPU rather than for the entire system?

HadiBrais

There is no need to put cores offline. If your code is single-threaded and runs on only one core, you can use getCoreCounterState(uint32 core), where the argument is the ID of the core on which the code to be measured is running. You can pin your thread to a specific core and then pass that core ID to getCoreCounterState.

If your code is multithreaded and several threads can run on different cores at the same time, you can use the getAllCounterStates API. It has two advantages over getSystemCounterState. First, it collects performance event counts per core. Second, it is parallelized, so it is more efficient. However, getAllCounterStates also collects uncore performance event counts. If you don't need these counts and want to reduce the overhead of calling getAllCounterStates, you can instead call getCoreCounterState yourself for each core to which your threads are pinned.
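A minimal sketch of the pinning step, assuming Linux with glibc (pthread_setaffinity_np, sched_getaffinity and sched_getcpu are GNU extensions). The helper names allowed_cpus, pin_this_thread and run_pinned_workers are hypothetical, and PCM itself is not called here. The core IDs the workers report are the ones you would pass to getCoreCounterState, taking one state before and one after the measured code and comparing them (for example with PCM's getL3CacheMisses).

```cpp
// Linux-only sketch: pthread_setaffinity_np and sched_getcpu are GNU extensions.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sched.h>

#include <cstddef>
#include <thread>
#include <vector>

// Return the CPU IDs the calling thread is currently allowed to run on.
std::vector<int> allowed_cpus() {
    std::vector<int> cpus;
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == 0) {
        for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
            if (CPU_ISSET(cpu, &set)) cpus.push_back(cpu);
        }
    }
    return cpus;
}

// Pin the calling thread to exactly one CPU. Returns true on success.
bool pin_this_thread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Start one worker per requested CPU, pin it there, and record the CPU it
// actually runs on. Each worker writes only its own slot, so there is no race.
std::vector<int> run_pinned_workers(const std::vector<int>& cpus) {
    std::vector<int> observed(cpus.size(), -1);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < cpus.size(); ++i) {
        workers.emplace_back([&cpus, &observed, i] {
            if (pin_this_thread(cpus[i])) {
                // ... the code to be measured would run here ...
                observed[i] = sched_getcpu();
            }
        });
    }
    for (auto& t : workers) t.join();
    return observed;
}
```

For the single-threaded case, calling pin_this_thread on the main thread is enough. Keep in mind that core counters count everything that executes on that core, so other processes scheduled there are still included in the per-core numbers.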
