I've been using Intel Performance Counter Monitor to validate results for ZSim (https://github.com/s5z/zsim), a pintool-based microarchitectural simulator. I've been having issues with PCM's accuracy relative to other performance counter tools.
For certain multithreaded benchmarks, PCM and my pintool return wildly different instruction counts. This happens even when running with one thread (with the locks, compare-and-swaps, etc. still in place). At first I thought the difference could be attributed to syscalls, but after testing with the Linux perf_event performance counters, I found that perf_event matches the pintool. I also tested with PAPI, a library that wraps the Linux perf_event interface.
PAPI: http://icl.cs.utk.edu/papi/
For PCM, I copied the example code. I captured the counter states before and after the kernel and then called getInstructionsRetired(BeforeState[i], AfterState[i]) for each core i. Threads are pinned to cores.
I tested on a custom Breadth-First-Search graph problem:
PCM: 103,852,770 instructions
Pin 2.14: 73,739,015 instructions
Pin 3.0: 73,739,015 instructions
PAPI: 73,202,192 instructions
Directly calling perf_event: 75,610,848 instructions
Directly calling perf_event with exclude_kernel enabled: 73,199,366 instructions
As we can see, Pin agrees with the Linux perf_event results when exclude_kernel is enabled (i.e. when only user-space code is measured). The PCM count is roughly 40% higher than all of the others. Any ideas what's going on?
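To make the size of each gap concrete, here is a minimal sketch that expresses every measurement as a percentage difference from the Pin count. The helper name `relativeDiffPercent` and the `Measurement` struct are made up for this illustration; only the instruction counts come from the list above.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: signed difference of `measured` from `baseline`, in percent.
double relativeDiffPercent(std::uint64_t measured, std::uint64_t baseline) {
    const double m = static_cast<double>(measured);
    const double b = static_cast<double>(baseline);
    return 100.0 * (m - b) / b;
}

// Hypothetical record type pairing a tool name with its reported count.
struct Measurement {
    const char* tool;
    std::uint64_t instructions;
};

// Print each tool's deviation from the Pin count reported in the post.
void printDiscrepancies() {
    const std::uint64_t pin = 73739015ULL;
    const Measurement results[] = {
        {"PCM", 103852770ULL},
        {"PAPI", 73202192ULL},
        {"perf_event", 75610848ULL},
        {"perf_event (exclude_kernel)", 73199366ULL},
    };
    for (const Measurement& r : results) {
        std::printf("%-28s %+7.2f%%\n", r.tool,
                    relativeDiffPercent(r.instructions, pin));
    }
}
```

Calling `printDiscrepancies()` from `main` shows that the user-only counters sit within about 1% of Pin and that including the kernel adds roughly 2.5%, while PCM is about 40.8% higher, so kernel time spent on behalf of this thread does not account for the gap by itself.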
This is how I'm initializing and calling PCM (from my header file). I call getBeforeStates() before I call my kernel, and getAfterStates() after the kernel has completed. I measure using perf_event and PAPI in the same way.
    PCM* m;
    SystemCounterState SysBeforeState, SysAfterState;
    //const uint32 ncores = m->getNumCores();
    std::vector<CoreCounterState> BeforeState, AfterState;
    std::vector<SocketCounterState> DummySocketStates;

    // Snapshot all counters immediately before the kernel runs.
    void getBeforeStates() {
        m->getAllCounterStates(SysBeforeState, DummySocketStates, BeforeState);
    }

    // Snapshot all counters immediately after the kernel completes.
    void getAfterStates() {
        m->getAllCounterStates(SysAfterState, DummySocketStates, AfterState);
    }

    void initPCM(PCMEvent* WSMEvents) {
        m = PCM::getInstance();
        m->resetPMU();

        PCM::ExtendedCustomCoreEventDescription conf;
        conf.fixedCfg = NULL; // default
        conf.nGPCounters = 4;
        EventSelectRegister regs[4];
        conf.gpCounterCfg = regs;

        // Template register: count in both user (usr) and kernel (os) mode.
        EventSelectRegister def_event_select_reg;
        def_event_select_reg.value = 0;
        def_event_select_reg.fields.usr = 1;
        def_event_select_reg.fields.os = 1;
        def_event_select_reg.fields.enable = 1;

        for (int i = 0; i < 4; ++i)
            regs[i] = def_event_select_reg;

        // Program each general-purpose counter with its event and umask.
        for (int i = 0; i < 4; i++) {
            regs[i].fields.event_select = WSMEvents[i].event;
            regs[i].fields.umask = WSMEvents[i].umask;
        }

        PCM::ErrorCode status = m->program(PCM::EXT_CUSTOM_CORE_EVENTS, &conf);
    }

    void printCoreStats(PCMEvent* WSMEvents) {
        uint32_t numCores = m->getNumCores();
        uint64_t sum = 0;

        // Find critical path: the core with the largest cycle count.
        uint64_t max = 0;
        uint32_t maxIdx = -1;
        for (uint32_t i = 0; i < numCores; i++) {
            uint64_t cycles = getCycles(BeforeState[i], AfterState[i]);
            if (cycles > max) {
                max = cycles;
                maxIdx = i;
            }
        }
        ...
        cout << "Cycles: " << max << "\n";
    }
Hi Dan Z,
PCM counts events per hardware thread (logical core), per socket (CPU), and for the whole system. It therefore counts every event that occurs on those cores, not only the events triggered by your program or user thread.
Thanks,
Roman
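The difference in scope described in this reply can be illustrated with perf_event itself. This is only a sketch: the function names `makeInstructionAttr`, `openThreadScoped`, and `openCpuScoped` are hypothetical, and the claim that a CPU-scoped counter is the closest perf_event analogue to what PCM reports is an assumption, not something the thread confirms. Passing `pid = 0, cpu = -1` to `perf_event_open` follows only the calling thread, whereas `pid = -1, cpu = N` counts everything that runs on logical core N, which usually needs elevated privileges.

```cpp
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstring>

// Build an attribute block that counts retired instructions.
// The counter starts disabled; enable it with ioctl(fd, PERF_EVENT_IOC_ENABLE, 0).
perf_event_attr makeInstructionAttr(bool excludeKernel) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;
    attr.exclude_kernel = excludeKernel ? 1 : 0;
    attr.exclude_hv = 1;
    return attr;
}

// glibc provides no perf_event_open() wrapper, so use the raw syscall.
static int perfEventOpen(perf_event_attr* attr, pid_t pid, int cpu) {
    return static_cast<int>(
        syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL));
}

// Thread scope: pid = 0 (calling thread), cpu = -1 (whichever CPU it runs on).
// Returns a file descriptor, or -1 with errno set.
int openThreadScoped(bool excludeKernel) {
    perf_event_attr attr = makeInstructionAttr(excludeKernel);
    return perfEventOpen(&attr, 0, -1);
}

// CPU scope: pid = -1 (every task), cpu = one logical core, kernel included.
// Typically requires root, CAP_PERFMON, or a permissive perf_event_paranoid.
// Returns a file descriptor, or -1 with errno set.
int openCpuScoped(int cpu) {
    perf_event_attr attr = makeInstructionAttr(false);
    return perfEventOpen(&attr, -1, cpu);
}
```

Reading both counters around the same kernel on a pinned core would show how much of the count comes from other activity on that core. If the CPU-scoped value lands near the PCM figure, that would support the explanation above.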
