<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Dan Z, in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Intel-PCM-vs-perf-event-PAPI-correlation/m-p/1105760#M5954</link>
    <description>&lt;P&gt;Hi Dan Z,&lt;/P&gt;

&lt;P&gt;PCM counts events per hardware thread (logical core), per socket (CPU), and for the whole system. Therefore PCM counts events triggered by everything running there, not only by your program/user thread.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Roman&lt;/P&gt;</description>
    <pubDate>Tue, 26 Apr 2016 11:23:13 GMT</pubDate>
    <dc:creator>Roman_D_Intel</dc:creator>
    <dc:date>2016-04-26T11:23:13Z</dc:date>
    <item>
      <title>Intel PCM vs. perf_event/PAPI correlation</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Intel-PCM-vs-perf-event-PAPI-correlation/m-p/1105759#M5953</link>
      <description>&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;I've been using Intel Performance Counter Monitor (PCM) to validate results for ZSim (&lt;SPAN style="font-size: 13.008px; line-height: 19.512px;"&gt;&lt;A href="https://github.com/s5z/zsim" target="_blank"&gt;https://github.com/s5z/zsim&lt;/A&gt;&lt;/SPAN&gt;), a pintool-based microarchitectural simulator. I've been having issues with PCM's accuracy relative to other performance counter tools.&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;For certain multithreaded benchmarks, PCM and the pintool return wildly different instruction counts. This happens even with one thread (with the locks, compare-and-swaps, etc. still in place). At first I thought this could be attributed to syscalls, but after testing with the Linux perf_event performance counters, I found that their counts match the pintool's. I also tested with PAPI, a library that wraps the Linux perf_event interface. Any ideas as to what's going on?&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;PAPI: &lt;A href="http://icl.cs.utk.edu/papi/" target="_blank"&gt;http://icl.cs.utk.edu/papi/&lt;/A&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;For PCM, I copied the example code. I created SystemCounterStates and then called&amp;nbsp;getInstructionsRetired(BeforeState[i], AfterState[i]) for each core. Threads are pinned to cores.&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;I tested on a custom Breadth-First-Search graph problem:&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;PCM:&amp;nbsp;&lt;SPAN id="docs-internal-guid-41b16e0e-f549-a53e-103a-c5641348b06c"&gt;&lt;SPAN style="font-size: 14.6667px; font-family: Arial; color: rgb(0, 0, 0); vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"&gt;103,852,770 instructions&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;Pin 2.14:&amp;nbsp;&lt;SPAN id="docs-internal-guid-41b16e0e-f549-dad0-9810-5c35fa775237"&gt;&lt;SPAN style="font-size: 14.6667px; font-family: Arial; color: rgb(0, 0, 0); vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"&gt;73,739,015 instructions&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;&lt;FONT color="#000000" face="Arial"&gt;&lt;SPAN style="font-size: 14.6667px; line-height: 22px; white-space: pre-wrap;"&gt;Pin 3.0: &lt;/SPAN&gt;&lt;/FONT&gt;&lt;SPAN id="docs-internal-guid-41b16e0e-f549-dad0-9810-5c35fa775237"&gt;&lt;SPAN style="font-size: 14.6667px; font-family: Arial; color: rgb(0, 0, 0); vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"&gt;73,739,015 instructions&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;PAPI:&amp;nbsp;&lt;SPAN id="docs-internal-guid-41b16e0e-f54a-4496-95f7-2b2dc05adfb3"&gt;&lt;SPAN style="font-size: 14.6667px; font-family: Arial; color: rgb(0, 0, 0); vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"&gt;73,202,192 instructions&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;&lt;SPAN style="font-size: 14.6667px; font-family: Arial; color: rgb(0, 0, 0); vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"&gt;Directly calling perf_event: &lt;/SPAN&gt;&lt;SPAN id="docs-internal-guid-41b16e0e-f54b-375e-4111-275cd47c092b"&gt;&lt;SPAN style="font-size: 14.6667px; font-family: Arial; color: rgb(0, 0, 0); vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"&gt;75,610,848 instructions&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;Directly calling perf_event with exclude_kernel enabled:&amp;nbsp;&lt;SPAN id="docs-internal-guid-41b16e0e-f54b-7ce4-9662-88235ddee477"&gt;&lt;SPAN style="font-size: 14.6667px; font-family: Arial; color: rgb(0, 0, 0); vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"&gt;73,199,366 instructions&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;&lt;SPAN style="font-size: 14.6667px; font-family: Arial; color: rgb(0, 0, 0); vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"&gt;As we can see, Pin agrees with the Linux perf_event results when exclude_kernel is enabled (i.e. when only user-space code is measured). The Intel Performance Counter Monitor count is about 40% higher. Any ideas what's going on?&lt;/SPAN&gt;&lt;/P&gt;
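
&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;For reference, the size of each gap can be worked out directly from the counts listed above. This is a rough calculation that treats the separately measured runs as doing the same work:&lt;/P&gt;

&lt;PRE style="font-size: 13.008px; line-height: 19.512px;"&gt;PCM vs. Pin:
  (103,852,770 - 73,739,015) / 73,739,015 = 30,113,755 / 73,739,015 ≈ 0.408  (about 41% more)

perf_event (user + kernel) vs. perf_event with exclude_kernel:
  (75,610,848 - 73,199,366) / 73,199,366 = 2,411,482 / 73,199,366 ≈ 0.033  (about 3% more)&lt;/PRE&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;So the kernel-mode instructions that perf_event attributes to the benchmark add only about 3%, far less than the roughly 41% by which the PCM count exceeds Pin.&lt;/P&gt;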

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;&lt;SPAN style="font-size: 14.6667px; font-family: Arial; color: rgb(0, 0, 0); vertical-align: baseline; white-space: pre-wrap; background-color: transparent;"&gt;This is how I'm initializing and calling PCM (from my header file). I call getBeforeStates() before I call my kernel, and getAfterStates() after the kernel has completed. I measure using perf_event and PAPI in the same way.&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:cpp;" style="font-size: 13.008px; line-height: 19.512px;"&gt;PCM* m;
SystemCounterState SysBeforeState, SysAfterState;
//const uint32 ncores = m-&amp;gt;getNumCores();
std::vector&amp;lt;CoreCounterState&amp;gt; BeforeState, AfterState;
std::vector&amp;lt;SocketCounterState&amp;gt; DummySocketStates;

void getBeforeStates() {
    m-&amp;gt;getAllCounterStates(SysBeforeState, DummySocketStates, BeforeState);
}

void getAfterStates() {
    m-&amp;gt;getAllCounterStates(SysAfterState, DummySocketStates, AfterState);
}

void initPCM(PCMEvent* WSMEvents) {
    m = PCM::getInstance();
    m-&amp;gt;resetPMU();

    PCM::ExtendedCustomCoreEventDescription conf;
    conf.fixedCfg = NULL; // default
    conf.nGPCounters = 4;
    EventSelectRegister regs[4];
    conf.gpCounterCfg = regs;
    EventSelectRegister def_event_select_reg;
    def_event_select_reg.value = 0;
    def_event_select_reg.fields.usr = 1;
    def_event_select_reg.fields.os = 1;
    def_event_select_reg.fields.enable = 1;
    for(int i=0;i&amp;lt;4;++i)
        regs[i] = def_event_select_reg;

    for(int i = 0; i &amp;lt; 4; i++) {
        regs[i].fields.event_select = WSMEvents[i].event;
        regs[i].fields.umask = WSMEvents[i].umask;
    }

    PCM::ErrorCode status = m-&amp;gt;program(PCM::EXT_CUSTOM_CORE_EVENTS, &amp;amp;conf);
}

void printCoreStats(PCMEvent* WSMEvents) {
    uint32_t numCores = m-&amp;gt;getNumCores();
    uint64_t sum = 0;

    // Find critical path
    uint64_t max = 0;
    uint32_t maxIdx = -1;
    for(int i = 0; i &amp;lt; numCores; i++) {
        uint64_t cycles = getCycles(BeforeState[i], AfterState[i]);
        if(cycles &amp;gt; max) {
            max = cycles;
            maxIdx = i;
        }
    }
    ...
    cout &amp;lt;&amp;lt; "Cycles: " &amp;lt;&amp;lt; max &amp;lt;&amp;lt; "\n";
}&lt;/PRE&gt;</description>
      <pubDate>Tue, 26 Apr 2016 09:39:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Intel-PCM-vs-perf-event-PAPI-correlation/m-p/1105759#M5953</guid>
      <dc:creator>Dan_Z_</dc:creator>
      <dc:date>2016-04-26T09:39:20Z</dc:date>
    </item>
    <item>
      <title>Hi Dan Z,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Intel-PCM-vs-perf-event-PAPI-correlation/m-p/1105760#M5954</link>
      <description>&lt;P&gt;Hi Dan Z,&lt;/P&gt;

&lt;P&gt;PCM counts events per hardware thread (logical core), per socket (CPU), and for the whole system. Therefore PCM counts events triggered by everything running there, not only by your program/user thread.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Roman&lt;/P&gt;</description>
      <pubDate>Tue, 26 Apr 2016 11:23:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Intel-PCM-vs-perf-event-PAPI-correlation/m-p/1105760#M5954</guid>
      <dc:creator>Roman_D_Intel</dc:creator>
      <dc:date>2016-04-26T11:23:13Z</dc:date>
    </item>
  </channel>
</rss>

