<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The underlying hardware in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/PCM-Support-Gflops/m-p/1002564#M3597</link>
    <description>&lt;P&gt;The underlying hardware performance counters don't provide the required information, so there is no way for PCM to compute GFLOPS.&lt;/P&gt;

&lt;P&gt;If the new arithmetic operation counters in Broadwell work correctly, then it will be possible on that platform (and it looks like Skylake has the same support), but you will have to measure several different events, scale the results (and sum them) to get the total FP operation count.&lt;/P&gt;

&lt;P&gt;There are 6 events: [scalar, packed 128-bit, packed 256-bit] x [single precision, double precision]. An increment to one of these events corresponds to 1, 2, 4, or 8 FP operations, and the counters will increment twice for the fused multiply/add operations (thank goodness!).&lt;/P&gt;

&lt;P&gt;So it is clear that counting all 6 events, scaling each by its "width" and summing the 6 scaled values will give you the total FP operation count.&lt;/P&gt;

&lt;P&gt;It is not yet clear whether it will be possible to set multiple bits in the unit mask (umask) to collect the same sum using only 4 events:&lt;/P&gt;

&lt;OL&gt;
	&lt;LI&gt;Single FP operation per increment: SCALAR_DOUBLE + SCALAR_SINGLE&lt;/LI&gt;
	&lt;LI&gt;Two FP operations per increment: 128BIT_PACKED_DOUBLE&lt;/LI&gt;
	&lt;LI&gt;Four FP operations per increment: 128BIT_PACKED_SINGLE + 256BIT_PACKED_DOUBLE&lt;/LI&gt;
	&lt;LI&gt;Eight FP operations per increment: 256BIT_PACKED_SINGLE&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;This assumes that all FP operations are SSE or AVX. To count x87 floating-point operations (difficult to generate with recent compilers, but still used in some codes) you would need different counter events, and I don't see an x87 arithmetic operation event in the Broadwell counter documentation at &lt;A href="https://download.01.org/perfmon/BDW/" target="_blank"&gt;https://download.01.org/perfmon/BDW/&lt;/A&gt;.&lt;/P&gt;</description>
    <pubDate>Fri, 28 Aug 2015 15:41:19 GMT</pubDate>
    <dc:creator>McCalpinJohn</dc:creator>
    <dc:date>2015-08-28T15:41:19Z</dc:date>
    <item>
      <title>[PCM] Support Gflops?</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/PCM-Support-Gflops/m-p/1002562#M3595</link>
      <description>&lt;P&gt;Does PCM now support a GFLOPS metric on Haswell?&lt;/P&gt;</description>
      <pubDate>Fri, 28 Aug 2015 08:32:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/PCM-Support-Gflops/m-p/1002562#M3595</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2015-08-28T08:32:31Z</dc:date>
    </item>
    <item>
      <title>At this point of time PCM</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/PCM-Support-Gflops/m-p/1002563#M3596</link>
      <description>&lt;P&gt;At this point in time, PCM does not support any GFLOPS metrics for any processor.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Roman&lt;/P&gt;</description>
      <pubDate>Fri, 28 Aug 2015 08:43:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/PCM-Support-Gflops/m-p/1002563#M3596</guid>
      <dc:creator>Roman_D_Intel</dc:creator>
      <dc:date>2015-08-28T08:43:48Z</dc:date>
    </item>
    <item>
      <title>The underlying hardware</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/PCM-Support-Gflops/m-p/1002564#M3597</link>
      <description>&lt;P&gt;The underlying hardware performance counters don't provide the required information, so there is no way for PCM to compute GFLOPS.&lt;/P&gt;

&lt;P&gt;If the new arithmetic operation counters in Broadwell work correctly, then it will be possible on that platform (and it looks like Skylake has the same support), but you will have to measure several different events, scale the results (and sum them) to get the total FP operation count.&lt;/P&gt;

&lt;P&gt;There are 6 events: [scalar, packed 128-bit, packed 256-bit] x [single precision, double precision]. An increment to one of these events corresponds to 1, 2, 4, or 8 FP operations, and the counters will increment twice for the fused multiply/add operations (thank goodness!).&lt;/P&gt;

&lt;P&gt;So it is clear that counting all 6 events, scaling each by its "width" and summing the 6 scaled values will give you the total FP operation count.&lt;/P&gt;

&lt;P&gt;It is not yet clear whether it will be possible to set multiple bits in the unit mask (umask) to collect the same sum using only 4 events:&lt;/P&gt;

&lt;OL&gt;
	&lt;LI&gt;Single FP operation per increment: SCALAR_DOUBLE + SCALAR_SINGLE&lt;/LI&gt;
	&lt;LI&gt;Two FP operations per increment: 128BIT_PACKED_DOUBLE&lt;/LI&gt;
	&lt;LI&gt;Four FP operations per increment: 128BIT_PACKED_SINGLE + 256BIT_PACKED_DOUBLE&lt;/LI&gt;
	&lt;LI&gt;Eight FP operations per increment: 256BIT_PACKED_SINGLE&lt;/LI&gt;
&lt;/OL&gt;
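
&lt;P&gt;As a rough sketch of the arithmetic only (not of how the counters are programmed), the scaling and summing could look like the Python below. The event labels are the ones used in the list above and are illustrative rather than exact event names from the documentation; the function names and the sample counts are made up for the example.&lt;/P&gt;

```python
# FP operations represented by one counter increment, per event.
# Labels follow the list above; they are illustrative, not exact event names.
FLOPS_PER_INCREMENT = {
    "SCALAR_DOUBLE": 1,
    "SCALAR_SINGLE": 1,
    "128BIT_PACKED_DOUBLE": 2,
    "128BIT_PACKED_SINGLE": 4,
    "256BIT_PACKED_DOUBLE": 4,
    "256BIT_PACKED_SINGLE": 8,
}


def total_fp_ops(counts):
    """Scale each of the 6 event counts by its width and sum the results."""
    return sum(FLOPS_PER_INCREMENT[name] * n for name, n in counts.items())


def total_fp_ops_grouped(counts):
    """Same total from 4 combined counts, one per weight (1, 2, 4, 8).

    This models what 4 combined events would report IF the hardware allows
    events with the same weight to be counted together (an open question).
    """
    grouped = {}
    for name, n in counts.items():
        weight = FLOPS_PER_INCREMENT[name]
        grouped[weight] = grouped.get(weight, 0) + n
    return sum(weight * n for weight, n in grouped.items())


def gflops(counts, seconds):
    """Total FP operations divided by elapsed time, in units of 1e9 per second."""
    return total_fp_ops(counts) / seconds / 1e9


# Hypothetical counter readings, for illustration only.
sample_counts = {
    "SCALAR_DOUBLE": 100,
    "SCALAR_SINGLE": 50,
    "128BIT_PACKED_DOUBLE": 10,
    "128BIT_PACKED_SINGLE": 20,
    "256BIT_PACKED_DOUBLE": 30,
    "256BIT_PACKED_SINGLE": 5,
}

print(total_fp_ops(sample_counts))
print(total_fp_ops_grouped(sample_counts))
```

&lt;P&gt;Both functions give the same total for any set of counts, since grouping by weight only reorders the sum. Because the counters already increment twice for fused multiply/add, no extra correction for FMA is applied here.&lt;/P&gt;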

&lt;P&gt;This assumes that all FP operations are SSE or AVX. To count x87 floating-point operations (difficult to generate with recent compilers, but still used in some codes) you would need different counter events, and I don't see an x87 arithmetic operation event in the Broadwell counter documentation at &lt;A href="https://download.01.org/perfmon/BDW/" target="_blank"&gt;https://download.01.org/perfmon/BDW/&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Aug 2015 15:41:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/PCM-Support-Gflops/m-p/1002564#M3597</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2015-08-28T15:41:19Z</dc:date>
    </item>
  </channel>
</rss>

