I've been using Intel Performance Counter Monitor (PCM) to validate results for ZSim (https://github.com/s5z/zsim), a pintool-based microarchitectural simulator. The Intel PCM website directed me to this forum for help.
For certain multithreaded benchmarks, PCM and Pin report wildly different instruction counts. This happens even with a single thread (with the locks, compare-and-swaps, etc. still in place). At first I suspected system calls, but when I tried the Linux perf_event performance counters, I found that perf_event agrees with Pin. I also tested PAPI, a library that wraps the Linux perf_event interface.
PAPI: http://icl.cs.utk.edu/papi/
For PCM, I copied the example code. I took before/after counter states (a SystemCounterState plus per-core CoreCounterState vectors) and then called getInstructionsRetired(BeforeState[i], AfterState[i]) for each core i. Threads are pinned to cores.
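A minimal sketch of how the per-core before/after readings are combined, assuming a hypothetical `CoreSnapshot` struct in place of PCM's `CoreCounterState` (whose values are normally read through helpers such as `getInstructionsRetired` and `getCycles`, not directly). It sums the instruction deltas over all cores and takes the core with the most elapsed cycles as the critical path, mirroring the loop in my PCM code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one core's counter snapshot.
struct CoreSnapshot {
    uint64_t instructions;  // retired instructions so far
    uint64_t cycles;        // core cycles so far
};

struct Totals {
    uint64_t totalInstructions;  // sum of per-core instruction deltas
    uint64_t maxCycles;          // largest per-core cycle delta (critical path)
    int maxCore;                 // core with that delta, -1 if no cores
};

// Combine before/after snapshots: sum instruction deltas over all cores
// and remember the core whose cycle delta is largest.
Totals aggregate(const std::vector<CoreSnapshot>& before,
                 const std::vector<CoreSnapshot>& after) {
    Totals t{0, 0, -1};
    for (std::size_t i = 0; i < before.size() && i < after.size(); ++i) {
        t.totalInstructions += after[i].instructions - before[i].instructions;
        uint64_t cycles = after[i].cycles - before[i].cycles;
        if (t.maxCore < 0 || cycles > t.maxCycles) {
            t.maxCycles = cycles;
            t.maxCore = static_cast<int>(i);
        }
    }
    return t;
}
```

For example, snapshots `{100, 1000}, {200, 2000}` before and `{600, 4000}, {500, 2500}` after give 800 instructions in total and a critical path of 3000 cycles on core 0.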
I tested with a custom breadth-first search (BFS) graph benchmark:
PCM: 103,852,770 instructions
Pin 2.14: 73,739,015 instructions
Pin 3.0: 73,739,015 instructions
PAPI: 73,202,192 instructions
Directly calling perf_event: 75,610,848 instructions
Directly calling perf_event with exclude_kernel enabled: 73,199,366 instructions
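As these numbers show, Pin agrees with Linux perf_event when exclude_kernel is enabled (i.e., when only user-space code is measured). Intel Performance Counter Monitor, by contrast, reports roughly 40% more instructions. Kernel code cannot explain the gap: even the kernel-inclusive perf_event count (75,610,848) is about 28 million instructions below PCM's. Any ideas what's going on?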
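To put the discrepancy in numbers, the following plain C++ sketch (no PCM needed) recomputes it from the counts above. The constant and function names are illustrative only.

```cpp
#include <cstdint>

// Instruction counts reported above for the BFS run.
constexpr uint64_t kPcm = 103852770;
constexpr uint64_t kPin = 73739015;
constexpr uint64_t kPerfAll = 75610848;   // perf_event, user + kernel
constexpr uint64_t kPerfUser = 73199366;  // perf_event, exclude_kernel = 1

// Relative excess of `measured` over `reference`, in percent.
double excessPercent(uint64_t measured, uint64_t reference) {
    return 100.0 * (static_cast<double>(measured) - static_cast<double>(reference)) /
           static_cast<double>(reference);
}

// Instructions perf_event attributes to the kernel:
// kernel-inclusive count minus user-only count.
uint64_t kernelInstructions() { return kPerfAll - kPerfUser; }

// Part of PCM's count that remains even after including kernel instructions.
uint64_t unexplainedByKernel() { return kPcm - kPerfAll; }
```

`excessPercent(kPcm, kPin)` comes out at about 40.8% and `excessPercent(kPcm, kPerfUser)` at about 41.9%. `kernelInstructions()` is 2,411,482, whereas `unexplainedByKernel()` is 28,241,922.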
This is how I initialize and call PCM (from my header file). I call getBeforeStates() before running my compute kernel (the BFS itself) and getAfterStates() after it has completed. I measure with perf_event and PAPI in the same way.
PCM* m;
SystemCounterState SysBeforeState, SysAfterState;
//const uint32 ncores = m->getNumCores();
std::vector<CoreCounterState> BeforeState, AfterState;
std::vector<SocketCounterState> DummySocketStates;

void getBeforeStates()
{
    m->getAllCounterStates(SysBeforeState, DummySocketStates, BeforeState);
}

void getAfterStates()
{
    m->getAllCounterStates(SysAfterState, DummySocketStates, AfterState);
}

void initPCM(PCMEvent* WSMEvents)
{
    m = PCM::getInstance();
    m->resetPMU();

    PCM::ExtendedCustomCoreEventDescription conf;
    conf.fixedCfg = NULL; // default
    conf.nGPCounters = 4;
    EventSelectRegister regs[4];
    conf.gpCounterCfg = regs;

    // Count in both user (usr) and kernel (os) mode.
    EventSelectRegister def_event_select_reg;
    def_event_select_reg.value = 0;
    def_event_select_reg.fields.usr = 1;
    def_event_select_reg.fields.os = 1;
    def_event_select_reg.fields.enable = 1;

    // Program one requested event per general-purpose counter.
    for (int i = 0; i < 4; ++i) {
        regs[i] = def_event_select_reg;
        regs[i].fields.event_select = WSMEvents[i].event;
        regs[i].fields.umask = WSMEvents[i].umask;
    }

    PCM::ErrorCode status = m->program(PCM::EXT_CUSTOM_CORE_EVENTS, &conf);
    if (status != PCM::Success)
        cerr << "PCM::program failed with error code " << status << "\n";
}

void printCoreStats(PCMEvent* WSMEvents)
{
    uint32_t numCores = m->getNumCores();
    uint64_t sum = 0;

    // Find critical path: the core with the most elapsed cycles
    uint64_t max = 0;
    uint32_t maxIdx = -1;
    for (uint32_t i = 0; i < numCores; i++) {
        uint64_t cycles = getCycles(BeforeState[i], AfterState[i]);
        if (cycles > max) {
            max = cycles;
            maxIdx = i;
        }
    }
    ...
    cout << "Cycles: " << max << "\n";
}
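For comparison, here is a minimal sketch of the direct perf_event measurement, assuming Linux and a per-thread counter (`pid = 0`, `cpu = -1`). The helper names `openInstructionCounter` and `countInstructions` are hypothetical. The `exclude_kernel` bit is what separates the 75.6M and 73.2M runs above. This counts only the calling thread; threads created after the counter is opened are included only if `attr.inherit` is set, which this sketch does not do.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstring>

// glibc has no wrapper for perf_event_open, so call it via syscall(2).
static long perfEventOpen(perf_event_attr* attr, pid_t pid, int cpu,
                          int groupFd, unsigned long flags) {
    return syscall(SYS_perf_event_open, attr, pid, cpu, groupFd, flags);
}

// Open a retired-instructions counter for the calling thread.
// Returns a file descriptor, or -1 if the kernel refuses (for example
// because of perf_event_paranoid, seccomp, or no hardware PMU in a VM).
int openInstructionCounter(bool excludeKernel) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;                            // start stopped
    attr.exclude_kernel = excludeKernel ? 1 : 0;  // user space only if set
    attr.exclude_hv = 1;
    // pid = 0, cpu = -1: this thread, on whichever CPU it runs.
    return static_cast<int>(perfEventOpen(&attr, 0, -1, -1, 0));
}

// Count instructions retired while running `work`; returns -1 on failure.
template <typename F>
long long countInstructions(F work, bool excludeKernel) {
    int fd = openInstructionCounter(excludeKernel);
    if (fd < 0) return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();  // the region being measured
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    long long count = 0;
    ssize_t n = read(fd, &count, sizeof(count));
    close(fd);
    return n == static_cast<ssize_t>(sizeof(count)) ? count : -1;
}
```

Usage would be along the lines of `countInstructions([&] { runKernel(); }, true)`, where `runKernel` is a placeholder for the benchmark. Containers often block `perf_event_open`, so the -1 path is worth checking before trusting a number.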
Bump. Does anyone know if this is the right place to ask?