Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

(Non)-architectural Performance Monitoring Events. IA32_PERFEVTSEL MSR

ebashinskii__ebashin

The Intel System Programming Manual reports that Skylake has just 4 programmable counters per thread. How is it then possible to collect, e.g., all FRONTEND_RETIRED.* events, of which there are many more than 4?

Currently I think of it as programming IA32_PERFEVTSELx, then reading IA32_PMCx. But there are just 4 IA32_PERFEVTSELx registers, while there are more than 10 FRONTEND_RETIRED.* events.
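
To make the "program IA32_PERFEVTSELx, then read IA32_PMCx" model concrete, here is a minimal sketch of how a value for one event-select register could be packed. It assumes the architectural field layout (event select in bits 0-7, unit mask in bits 8-15, USR bit 16, OS bit 17, EN bit 22, counter mask in bits 24-31). The function name `encode_perfevtsel` and its parameters are hypothetical, and the example uses the architectural "Instructions Retired" event (0xC0, umask 0x00) rather than a FRONTEND_RETIRED.* encoding. Actually writing the MSR requires ring 0 access and is not shown.

```python
def encode_perfevtsel(event, umask, count_user=True, count_kernel=True,
                      enable=True, cmask=0):
    """Pack fields into an IA32_PERFEVTSELx value (hypothetical helper)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24)
    if count_user:
        value |= 1 << 16   # USR: count while running at ring 3
    if count_kernel:
        value |= 1 << 17   # OS: count while running at ring 0
    if enable:
        value |= 1 << 22   # EN: enable the counter
    return value


# Architectural "Instructions Retired": event 0xC0, umask 0x00
print(hex(encode_perfevtsel(0xC0, 0x00)))  # 0x4300c0
```

Only four such values can be live per logical processor at a time, which is exactly the limit the question runs into.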

1 Solution
McCalpinJohn
Honored Contributor III

With HyperThreading enabled, you only have four programmable performance counters per logical processor (8 with HyperThreading disabled in the BIOS).   Analyses that require more than four programmable counters need to use either counter multiplexing or multiple runs with different counter sets.
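
As a rough illustration of the "multiple runs with different counter sets" approach, the sketch below splits a list of events into groups that each fit in the available programmable counters. The function name `split_into_counter_groups` and the placeholder event names are hypothetical. It also ignores the fact that some events can only be scheduled on particular counters, so real tools need a smarter assignment.

```python
def split_into_counter_groups(events, counters_per_thread=4):
    """Split events into consecutive groups of at most counters_per_thread."""
    if counters_per_thread < 1:
        raise ValueError("need at least one programmable counter")
    return [events[i:i + counters_per_thread]
            for i in range(0, len(events), counters_per_thread)]


# Placeholder names standing in for ten real event names
wanted = [f"EVENT_{n}" for n in range(10)]
for run, group in enumerate(split_into_counter_groups(wanted)):
    print(f"run {run}: {group}")
```

With ten events and four counters this gives three groups (4, 4, and 2 events). Each group is measured either in a separate run or in a separate time slice when multiplexing.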

Many tools support counter multiplexing.  Intel's VTune uses counter multiplexing for most of its analysis types -- I was just reviewing some VTune analyses using the "general exploration" target and saw that a total of 106 programmable performance counter events were used during the run.

Even the Linux "perf stat" (or "perf record") facility allows multiplexing -- simply specify all the events you want to record on the command line and the driver will switch between groups, attempting to measure each of the events for approximately the same fraction of the total time.
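
Because a multiplexed event is only counted for part of the run, the raw count has to be extrapolated to the full interval. The sketch below shows the usual linear scaling, using the enabled and running times that the Linux perf subsystem can report alongside each count. The function name `scale_multiplexed_count` is hypothetical, and the estimate is only accurate if the event rate is roughly uniform over the run.

```python
def scale_multiplexed_count(raw_count, time_enabled, time_running):
    """Estimate the full-interval count for a multiplexed event.

    time_enabled: time the event was enabled (wanted on a counter)
    time_running: time the event was actually scheduled on a counter
    Returns None if the event never ran, since no estimate is possible.
    """
    if time_running == 0:
        return None
    return raw_count * time_enabled / time_running


# Event was on a counter for a quarter of the enabled time
print(scale_multiplexed_count(1000, time_enabled=100, time_running=25))  # 4000.0
```

The more events you request, the smaller the fraction of time each one is measured, so short or phase-heavy workloads can produce noticeably noisy estimates.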

21 Replies
Bernard
Valued Contributor I

@McCalpinJohn 

Thank you for your generous help.

Now the issues related to usage of "low-overhead-timers" are clearly understood.
