The Intel System Programming Manual reports that Skylake has just 4 programmable counters per thread. How is it then possible to collect, e.g., all the FRONTEND_RETIRED.* events, of which there are many more than 4?
Currently I think of it as programming IA32_PERFEVTSELx, then reading IA32_PMCx. But there are just 4 IA32_PERFEVTSELx registers, while there are more than 10 FRONTEND_RETIRED.* events.
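To make the "program IA32_PERFEVTSELx, then read IA32_PMCx" model concrete, here is a minimal sketch of how the value written to one event-select register is assembled from its architectural bit fields. The function name `encode_perfevtsel` is made up for illustration, and the example uses the architectural "UnHalted Core Cycles" event (event select 0x3C, umask 0x00) rather than a FRONTEND_RETIRED.* encoding, which should be looked up in the event tables for the target CPU.

```python
def encode_perfevtsel(event, umask, usr=True, os=True, enable=True, cmask=0):
    """Build an IA32_PERFEVTSELx value from its architectural bit fields.

    Hypothetical helper for illustration only; it computes the value and
    does not write any MSR.
    """
    if not all(0 <= field <= 0xFF for field in (event, umask, cmask)):
        raise ValueError("event, umask and cmask are 8-bit fields")
    value = event                 # bits 0-7:   event select
    value |= umask << 8           # bits 8-15:  unit mask
    value |= int(usr) << 16       # bit 16:     count at privilege levels > 0
    value |= int(os) << 17        # bit 17:     count at privilege level 0
    value |= int(enable) << 22    # bit 22:     enable the counter
    value |= cmask << 24          # bits 24-31: counter mask
    return value


# Architectural "UnHalted Core Cycles": event select 0x3C, umask 0x00.
print(hex(encode_perfevtsel(0x3C, 0x00)))  # 0x43003c
```

Each such value occupies one of the four IA32_PERFEVTSELx registers, which is why only four events can be counted at any one instant.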
- Tags:
- Parallel Computing
With HyperThreading enabled, you only have four programmable performance counters per logical processor (8 with HyperThreading disabled in the BIOS). Analyses that require more than four programmable counters need to use either counter multiplexing or multiple runs with different counter sets.
Many tools support counter multiplexing. Intel's VTune uses counter multiplexing for most of its analysis types -- I was just reviewing some VTune analyses using the "general exploration" target and saw that a total of 106 programmable performance counter events were used during the run.
Even the Linux "perf stat" (or "perf record") facility allows multiplexing -- simply specify all the events you want to record on the command line and the driver will switch between groups, attempting to measure each of the events for approximately the same fraction of the total time.
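As a rough sketch of what multiplexing amounts to, the arithmetic can be written in a few lines of Python. This is not the perf driver's code: `make_groups` and `scale` are hypothetical helpers, the event names are placeholders, and the only thing assumed about perf is its documented practice of scaling a raw count by the ratio of the time an event was enabled to the time it was actually running on a counter.

```python
def make_groups(events, counters=4):
    """Split an event list into groups that each fit the available counters."""
    return [events[i:i + counters] for i in range(0, len(events), counters)]


def scale(raw_count, time_enabled, time_running):
    """Extrapolate a count measured for part of the run to the whole run."""
    if time_running == 0:
        return None  # the event was never scheduled, so there is no estimate
    return raw_count * time_enabled / time_running


events = [f"EVENT_{i}" for i in range(10)]  # placeholder event names
groups = make_groups(events)
print(len(groups))  # 3 groups holding 4, 4 and 2 events

# An event counted 1,000,000 times while scheduled for 25% of the run.
print(scale(1_000_000, time_enabled=4_000, time_running=1_000))  # 4000000.0
```

The scaled value is an estimate: it is only accurate if the program behaves similarly during the intervals when the event was not being counted, which is the usual caveat with multiplexed measurements.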
Thank you for your generous help.
Now the issues related to usage of "low-overhead-timers" are clearly understood.