I've been using Intel Performance Counter Monitor to validate results for ZSim (https://github.com/s5z/zsim), a pintool-based microarchitectural simulator. I've been having issues with PCM's accuracy relative to other performance counter tools.
For certain multithreaded benchmarks, PCM and my pintool return wildly different instruction counts. This happens even with one thread (with the locks, compare-and-swaps, etc. still in place). At first I thought the difference could be attributed to syscalls, but after testing with Linux perf_event performance counters, I found that perf_event matches the pintool. I also tested with PAPI, a library that wraps the Linux perf_event interface. Any ideas as to what's going on?
PAPI: http://icl.cs.utk.edu/papi/
For PCM, I copied the example code. I created SystemCounterStates and then called getInstructionsRetired(BeforeState, AfterState) for each core. Threads are pinned to cores.
I tested on a custom Breadth-First-Search graph problem:
PCM: 103,852,770 instructions
Pin 2.14: 73,739,015 instructions
Pin 3.0: 73,739,015 instructions
PAPI: 73,202,192 instructions
Directly calling perf_event: 75,610,848 instructions
Directly calling perf_event with exclude_kernel enabled: 73,199,366 instructions
As we can see, Pin agrees closely with the Linux perf_event results with exclude_kernel enabled (i.e. measuring only user-space code). The Intel Performance Counter Monitor result is the outlier: it reports roughly 40% more instructions than Pin. Any ideas what's going on?
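To put numbers on the gap, here is a minimal sketch that computes each tool's relative difference against the Pin count listed above. The helper name `relative_diff_percent` is hypothetical and only for illustration; the inputs are the counts from this post.

```cpp
#include <cassert>
#include <cstdint>

// Relative difference of `measured` against `reference`, in percent.
// Positive means `measured` is larger than `reference`.
// Hypothetical helper, not part of PCM, Pin, PAPI, or perf_event.
double relative_diff_percent(std::uint64_t measured, std::uint64_t reference) {
    assert(reference != 0);  // avoid dividing by zero
    const double m = static_cast<double>(measured);
    const double r = static_cast<double>(reference);
    return 100.0 * (m - r) / r;
}
```

With Pin's 73,739,015 as the reference, PCM's 103,852,770 comes out at about +40.8%, raw perf_event at about +2.5%, and both PAPI and perf_event with exclude_kernel at about -0.7%.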
This is how I'm initializing and calling PCM (from my header file). I call getBeforeStates() before I call my kernel, and getAfterStates() after the kernel has completed. I measure using perf_event and PAPI in the same way.
    PCM* m;
    SystemCounterState SysBeforeState, SysAfterState;
    //const uint32 ncores = m->getNumCores();
    std::vector<CoreCounterState> BeforeState, AfterState;
    std::vector<SocketCounterState> DummySocketStates;

    // Snapshot all counters before the kernel runs.
    void getBeforeStates() {
        m->getAllCounterStates(SysBeforeState, DummySocketStates, BeforeState);
    }

    // Snapshot all counters after the kernel has completed.
    void getAfterStates() {
        m->getAllCounterStates(SysAfterState, DummySocketStates, AfterState);
    }

    void initPCM(PCMEvent* WSMEvents) {
        m = PCM::getInstance();
        m->resetPMU();

        PCM::ExtendedCustomCoreEventDescription conf;
        conf.fixedCfg = NULL; // default
        conf.nGPCounters = 4;
        EventSelectRegister regs[4];
        conf.gpCounterCfg = regs;

        // Count in both user mode (usr) and kernel mode (os).
        EventSelectRegister def_event_select_reg;
        def_event_select_reg.value = 0;
        def_event_select_reg.fields.usr = 1;
        def_event_select_reg.fields.os = 1;
        def_event_select_reg.fields.enable = 1;

        for(int i = 0; i < 4; ++i)
            regs[i] = def_event_select_reg;

        // Program one custom event per general-purpose counter.
        for(int i = 0; i < 4; i++) {
            regs[i].fields.event_select = WSMEvents[i].event;
            regs[i].fields.umask = WSMEvents[i].umask;
        }

        PCM::ErrorCode status = m->program(PCM::EXT_CUSTOM_CORE_EVENTS, &conf);
    }

    void printCoreStats(PCMEvent* WSMEvents) {
        uint32_t numCores = m->getNumCores();
        uint64_t sum = 0;

        // Find critical path: the core with the most elapsed cycles.
        uint64_t max = 0;
        uint32_t maxIdx = -1;
        for(uint32_t i = 0; i < numCores; i++) {
            uint64_t cycles = getCycles(BeforeState[i], AfterState[i]);
            if(cycles > max) {
                max = cycles;
                maxIdx = i;
            }
        }
        ...
        cout << "Cycles: " << max << "\n";
    }
Hi Dan Z,
PCM counts events per hardware thread (logical core), per socket (CPU), and for the whole system. It therefore counts events triggered by everything that runs on those cores during the measurement interval, not only by your program's user threads.
Thanks,
Roman
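For readers comparing the tools, the scope difference described above can be illustrated with the Linux perf_event_open interface, which supports both scopes. This is a hedged sketch, not PCM's implementation: the helper names (`make_instruction_attr`, `open_thread_counter`, `open_cpu_counter`) are hypothetical, and opening a CPU-wide counter typically needs elevated privileges or a permissive `perf_event_paranoid` setting.

```cpp
#include <cassert>
#include <cstring>
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Build an attr that counts retired instructions.
// user_only = true sets exclude_kernel (and exclude_hv), as in the
// "exclude_kernel enabled" measurement from the question.
perf_event_attr make_instruction_attr(bool user_only) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;  // start stopped; enable later via ioctl
    attr.exclude_kernel = user_only ? 1 : 0;
    attr.exclude_hv = user_only ? 1 : 0;
    return attr;
}

// Thin wrapper: glibc provides no perf_event_open() function.
int open_counter(perf_event_attr attr, pid_t pid, int cpu) {
    return static_cast<int>(
        syscall(SYS_perf_event_open, &attr, pid, cpu, -1, 0UL));
}

// Per-thread scope: pid = 0, cpu = -1 follows the calling thread on
// whichever CPU it runs. Other tasks are not counted.
int open_thread_counter(bool user_only) {
    return open_counter(make_instruction_attr(user_only), 0, -1);
}

// Per-CPU scope: pid = -1, cpu = N counts every task scheduled on
// that logical core, in user and kernel mode. This is the scope that
// resembles a per-core hardware counter reading.
int open_cpu_counter(int cpu) {
    assert(cpu >= 0);
    return open_counter(make_instruction_attr(false), -1, cpu);
}
```

Both helpers return a file descriptor, or -1 on failure with `errno` set. A counter opened with `open_cpu_counter` would be expected to read higher than one opened with `open_thread_counter(true)` for the same interval, since it also includes kernel code and any other tasks that ran on that core.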