<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How can I count FP operations in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109280#M6020</link>
    <description>&lt;P&gt;How can I count FP operations on v3?&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Which events can I use to count FP operations?&lt;/P&gt;</description>
    <pubDate>Wed, 07 Sep 2016 05:53:05 GMT</pubDate>
    <dc:creator>GHui</dc:creator>
    <dc:date>2016-09-07T05:53:05Z</dc:date>
    <item>
      <title>How to measure flops on v4</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109272#M6012</link>
      <description>&lt;P&gt;I cannot find the FP* events for v4 in 64-ia-32-architectures-software-developer-manual-325462.pdf. Is there a manual that documents them?&lt;/P&gt;</description>
      <pubDate>Wed, 24 Aug 2016 05:21:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109272#M6012</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2016-08-24T05:21:39Z</dc:date>
    </item>
    <item>
      <title>The events are documented at</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109273#M6013</link>
      <description>&lt;P&gt;The events are documented at &lt;A href="https://download.01.org/perfmon/BDW/Broadwell_core_V16.json" target="_blank"&gt;https://download.01.org/perfmon/BDW/Broadwell_core_V16.json&lt;/A&gt; -- look for "FP_ARITH" and you will find the various sub-events of the new 0xC7 core performance counter event.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Aug 2016 14:55:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109273#M6013</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2016-08-24T14:55:11Z</dc:date>
    </item>
    <item>
      <title>I've collected the following</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109274#M6014</link>
      <description>&lt;P&gt;I've collected the following events while running xhpl as a test.&lt;/P&gt;

&lt;P&gt;FP_ARITH_INST_RETIRED.SCALAR_DOUBLE&lt;BR /&gt;
	FP_ARITH_INST_RETIRED.SCALAR_SINGLE&lt;BR /&gt;
	FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE&lt;BR /&gt;
	FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE&lt;BR /&gt;
	FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE&lt;BR /&gt;
	FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE&lt;BR /&gt;
	FP_ARITH_INST_RETIRED.SCALAR&lt;BR /&gt;
	FP_ARITH_INST_RETIRED.PACKED&lt;BR /&gt;
	FP_ARITH_INST_RETIRED.SINGLE&lt;BR /&gt;
	FP_ARITH_INST_RETIRED.DOUBLE&lt;/P&gt;

&lt;P&gt;Their differences over one second are "0 0 0 0 0 31684 1340700232 0.0 0.0 0.0".&lt;/P&gt;

&lt;P&gt;I am confused about how to interpret these events: some are zero and the others are not.&lt;/P&gt;

&lt;P&gt;Also, do the events have an inclusion relation, that is, are some of them sums of others?&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2016 10:20:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109274#M6014</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2016-08-26T10:20:33Z</dc:date>
    </item>
    <item>
      <title>I have run the  mkl</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109275#M6015</link>
      <description>&lt;P&gt;I have run the mkl/benchmarks/linpack/runme_xeon64 program.&lt;/P&gt;

&lt;P&gt;runme_xeon64 prints the following output:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;Size &amp;nbsp; LDA &amp;nbsp; &amp;nbsp;Align. Time(s) &amp;nbsp; &amp;nbsp;GFlops &amp;nbsp; Residual &amp;nbsp; &amp;nbsp; Residual(norm) Check&lt;BR /&gt;
		1000 &amp;nbsp; 1000 &amp;nbsp; 4 &amp;nbsp; &amp;nbsp; &amp;nbsp;0.052 &amp;nbsp; &amp;nbsp; &amp;nbsp;12.9315 &amp;nbsp;8.866796e-13 3.023805e-02 &amp;nbsp; pass&lt;BR /&gt;
		1000 &amp;nbsp; 1000 &amp;nbsp; 4 &amp;nbsp; &amp;nbsp; &amp;nbsp;0.008 &amp;nbsp; &amp;nbsp; &amp;nbsp;82.7219 &amp;nbsp;8.866796e-13 3.023805e-02 &amp;nbsp; pass&lt;BR /&gt;
		1000 &amp;nbsp; 1000 &amp;nbsp; 4 &amp;nbsp; &amp;nbsp; &amp;nbsp;0.007 &amp;nbsp; &amp;nbsp; &amp;nbsp;93.5988 &amp;nbsp;8.866796e-13 3.023805e-02 &amp;nbsp; pass&lt;BR /&gt;
		1000 &amp;nbsp; 1000 &amp;nbsp; 4 &amp;nbsp; &amp;nbsp; &amp;nbsp;0.007 &amp;nbsp; &amp;nbsp; &amp;nbsp;92.9639 &amp;nbsp;8.866796e-13 3.023805e-02 &amp;nbsp; pass&lt;BR /&gt;
		2000 &amp;nbsp; 2000 &amp;nbsp; 4 &amp;nbsp; &amp;nbsp; &amp;nbsp;0.033 &amp;nbsp; &amp;nbsp; &amp;nbsp;164.2892 3.864797e-12 3.361900e-02 &amp;nbsp; pass&lt;BR /&gt;
		2000 &amp;nbsp; 2000 &amp;nbsp; 4 &amp;nbsp; &amp;nbsp; &amp;nbsp;0.027 &amp;nbsp; &amp;nbsp; &amp;nbsp;200.3969 3.864797e-12 3.361900e-02 &amp;nbsp; pass&lt;BR /&gt;
		5000 &amp;nbsp; 5008 &amp;nbsp; 4 &amp;nbsp; &amp;nbsp; &amp;nbsp;0.167 &amp;nbsp; &amp;nbsp; &amp;nbsp;499.0555 2.383066e-11 3.322993e-02 &amp;nbsp; pass&lt;BR /&gt;
		5000 &amp;nbsp; 5008 &amp;nbsp; 4 &amp;nbsp; &amp;nbsp; &amp;nbsp;0.190 &amp;nbsp; &amp;nbsp; &amp;nbsp;438.7789 2.155309e-11 3.005404e-02 &amp;nbsp; pass&lt;BR /&gt;
		10000 &amp;nbsp;10000 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;0.974 &amp;nbsp; &amp;nbsp; &amp;nbsp;685.0007 8.261911e-11 2.913233e-02 &amp;nbsp; pass&lt;BR /&gt;
		10000 &amp;nbsp;10000 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;0.906 &amp;nbsp; &amp;nbsp; &amp;nbsp;736.1333 8.531753e-11 3.008383e-02 &amp;nbsp; pass&lt;BR /&gt;
		15000 &amp;nbsp;15000 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;2.516 &amp;nbsp; &amp;nbsp; &amp;nbsp;894.5636 2.272723e-10 3.579576e-02 &amp;nbsp; pass&lt;BR /&gt;
		15000 &amp;nbsp;15000 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;2.760 &amp;nbsp; &amp;nbsp; &amp;nbsp;815.4055 2.019905e-10 3.181385e-02 &amp;nbsp; pass&lt;BR /&gt;
		18000 &amp;nbsp;18008 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;4.663 &amp;nbsp; &amp;nbsp; &amp;nbsp;834.0049 3.264814e-10 3.575372e-02 &amp;nbsp; pass&lt;BR /&gt;
		18000 &amp;nbsp;18008 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;4.587 &amp;nbsp; &amp;nbsp; &amp;nbsp;847.6924 3.264814e-10 3.575372e-02 &amp;nbsp; pass&lt;BR /&gt;
		20000 &amp;nbsp;20016 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;5.986 &amp;nbsp; &amp;nbsp; &amp;nbsp;891.1581 3.565633e-10 3.156367e-02 &amp;nbsp; pass&lt;BR /&gt;
		20000 &amp;nbsp;20016 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;6.009 &amp;nbsp; &amp;nbsp; &amp;nbsp;887.7311 3.565633e-10 3.156367e-02 &amp;nbsp; pass&lt;BR /&gt;
		22000 &amp;nbsp;22008 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;7.569 &amp;nbsp; &amp;nbsp; &amp;nbsp;938.0349 4.454127e-10 3.262473e-02 &amp;nbsp; pass&lt;BR /&gt;
		22000 &amp;nbsp;22008 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;7.541 &amp;nbsp; &amp;nbsp; &amp;nbsp;941.4906 4.454127e-10 3.262473e-02 &amp;nbsp; pass&lt;BR /&gt;
		25000 &amp;nbsp;25000 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;10.524 &amp;nbsp; &amp;nbsp; 989.9109 5.087659e-10 2.893169e-02 &amp;nbsp; pass&lt;BR /&gt;
		25000 &amp;nbsp;25000 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;10.488 &amp;nbsp; &amp;nbsp; 993.3168 5.087659e-10 2.893169e-02 &amp;nbsp; pass&lt;BR /&gt;
		26000 &amp;nbsp;26000 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;11.710 &amp;nbsp; &amp;nbsp; 1000.7430 5.944061e-10 3.125565e-02 &amp;nbsp; pass&lt;BR /&gt;
		26000 &amp;nbsp;26000 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;11.758 &amp;nbsp; &amp;nbsp; 996.6501 5.944061e-10 3.125565e-02 &amp;nbsp; pass&lt;BR /&gt;
		27000 &amp;nbsp;27000 &amp;nbsp;4 &amp;nbsp; &amp;nbsp; &amp;nbsp;13.020 &amp;nbsp; &amp;nbsp; 1007.9769 6.490156e-10 3.164930e-02 &amp;nbsp; pass&lt;BR /&gt;
		30000 &amp;nbsp;30000 &amp;nbsp;1 &amp;nbsp; &amp;nbsp; &amp;nbsp;17.293 &amp;nbsp; &amp;nbsp; 1040.9949 8.272351e-10 3.260969e-02 &amp;nbsp; pass&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;But from the events I collected I compute only 274.324 GFlops.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2016 11:09:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109275#M6015</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2016-08-26T11:09:00Z</dc:date>
    </item>
    <item>
      <title>How are you collecting these</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109276#M6016</link>
      <description>&lt;P&gt;How are you collecting these counts?&lt;/P&gt;

&lt;P&gt;These events count instructions, not operations, so the first six need to be scaled by the corresponding width if you want an operation count.&amp;nbsp;&amp;nbsp; The documentation pointed to above clearly explains how many operations each increment corresponds to, and points out that for Multiply-Add operations the counter is incremented twice, so that operations are counted in the expected way (Multiply-Add = 2 operations).&lt;/P&gt;

&lt;P&gt;The scaling should be:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;FP_ARITH_INST_RETIRED.SCALAR_DOUBLE&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/LI&gt;
	&lt;LI&gt;FP_ARITH_INST_RETIRED.SCALAR_SINGLE&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/LI&gt;
	&lt;LI&gt;FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/LI&gt;
	&lt;LI&gt;FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4&lt;/LI&gt;
	&lt;LI&gt;FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4&lt;/LI&gt;
	&lt;LI&gt;FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8&lt;/LI&gt;
&lt;/UL&gt;
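&lt;P&gt;As a rough sketch of the scaling above (not part of any Intel tool): the weights are the ones from the list, while the helper names total_flops and gflops and the sample counts are hypothetical, chosen only for illustration.&lt;/P&gt;

```python
# Operations per counter increment, taken from the scaling list above.
# FMA instructions already increment the counter twice, so no extra factor is needed.
FLOPS_PER_INCREMENT = {
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1,
    "FP_ARITH_INST_RETIRED.SCALAR_SINGLE": 1,
    "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE": 2,
    "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE": 8,
}


def total_flops(counts):
    """Return the weighted sum of the six per-width event counts."""
    total = 0
    for event, count in counts.items():
        if event not in FLOPS_PER_INCREMENT:
            # Aggregate events (SCALAR, PACKED, SINGLE, DOUBLE) mix widths.
            raise KeyError(f"{event} cannot be scaled to an operation count")
        total += FLOPS_PER_INCREMENT[event] * count
    return total


def gflops(counts, seconds):
    """Convert event counts collected over `seconds` into GFLOP/s."""
    return total_flops(counts) / seconds / 1e9


# Hypothetical one-second sample, not measured data.
sample = {
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 250_000_000,
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1_000_000,
}
print(gflops(sample, 1.0))
```

&lt;P&gt;Counts read from a single core give a per-core rate, so they would have to be summed over all cores before comparing with a whole-node benchmark figure.&lt;/P&gt;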

&lt;P&gt;From inspection of the Umask values, the next two events are the sums of the single and double precision instruction counts for each case.&amp;nbsp; For the PACKED case it is not possible to get an operation count, since the single packed instructions correspond to a different number of operations than the double packed instructions.&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;FP_ARITH_INST_RETIRED.SCALAR&lt;/LI&gt;
	&lt;LI&gt;FP_ARITH_INST_RETIRED.PACKED&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;From inspection of the Umask values, the next two events are the sums of the scalar, packed 128-bit, and packed 256-bit instruction counts for each precision.&amp;nbsp; It is not possible to get an operation count from either of these counters, since they combine instructions of different widths.&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;FP_ARITH_INST_RETIRED.SINGLE&lt;/LI&gt;
	&lt;LI&gt;FP_ARITH_INST_RETIRED.DOUBLE&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;For the xHPL code running on a Xeon E5 v4, almost all of the counts should be in the FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE category.&amp;nbsp;&amp;nbsp; These should be multiplied by 4 to get the FP operation count.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2016 21:53:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109276#M6016</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2016-08-26T21:53:45Z</dc:date>
    </item>
    <item>
      <title>I set event 0x6310C7 to</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109277#M6017</link>
      <description>&lt;P&gt;I wrote the value 0x6310C7 to the evtsel MSR 0x18A and read the count from the PMC MSR 0xc5.&lt;/P&gt;
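
&lt;P&gt;For reference, a minimal sketch that splits such an evtsel value into its fields. The bit positions follow the architectural IA32_PERFEVTSELx layout; the umask names are my reading of the Broadwell event list and should be treated as an assumption, and decode_evtsel is a hypothetical helper name.&lt;/P&gt;

```python
# Bit positions of the flag fields in an IA32_PERFEVTSELx register.
FLAG_BITS = {
    "USR": 16, "OS": 17, "EDGE": 18, "PC": 19,
    "INT": 20, "ANY": 21, "EN": 22, "INV": 23,
}

# Assumed umask values for the 0xC7 FP_ARITH_INST_RETIRED sub-events.
FP_ARITH_UMASKS = {
    0x01: "SCALAR_DOUBLE",
    0x02: "SCALAR_SINGLE",
    0x04: "128B_PACKED_DOUBLE",
    0x08: "128B_PACKED_SINGLE",
    0x10: "256B_PACKED_DOUBLE",
    0x20: "256B_PACKED_SINGLE",
}


def decode_evtsel(value):
    """Split an event-select value into event, umask, flags, and cmask."""
    event = value % 256                      # bits 0-7
    umask = (value // 256) % 256             # bits 8-15
    flags = [name for name, bit in FLAG_BITS.items()
             if (value // 2 ** bit) % 2 == 1]
    cmask = (value // 2 ** 24) % 256         # bits 24-31
    return {"event": event, "umask": umask, "flags": flags, "cmask": cmask}


decoded = decode_evtsel(0x6310C7)
print(hex(decoded["event"]), hex(decoded["umask"]), decoded["flags"])
print(FP_ARITH_UMASKS.get(decoded["umask"], "unknown"))
```

&lt;P&gt;Under these assumptions the value selects event 0xC7 with umask 0x10 and has the enable bit set, so the encoding itself looks plausible. MSR 0x18A and MSR 0xC5 are the fifth programmable counter pair (counter 4), so it may be worth checking that this counter is available on the system and enabled in IA32_PERF_GLOBAL_CTRL; that is a guess, not something confirmed in this thread.&lt;/P&gt;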

&lt;P&gt;But I get zero counts.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2016 08:30:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109277#M6017</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2016-08-31T08:30:18Z</dc:date>
    </item>
    <item>
      <title>Can v3 also use these</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109278#M6018</link>
      <description>&lt;P&gt;Can v3 also use these "FP_ARITH" events for counting flops?&lt;/P&gt;</description>
      <pubDate>Mon, 05 Sep 2016 07:17:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109278#M6018</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2016-09-05T07:17:12Z</dc:date>
    </item>
    <item>
      <title>These events do not exist on</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109279#M6019</link>
      <description>&lt;P&gt;These events do not exist on Xeon E5 v3.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;The 0xC7 event is not documented on Xeon E5 v3, but a quick test shows that it is counting something, and it looks like it is probably counting the 0xC7 SIMD events defined for the Nehalem/Westmere platform.&amp;nbsp; These include arithmetic and non-arithmetic SIMD instructions, so they are not useful for counting FP operations.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2016 18:14:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109279#M6019</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2016-09-06T18:14:00Z</dc:date>
    </item>
    <item>
      <title>How can I count FP operations</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109280#M6020</link>
      <description>&lt;P&gt;How can I count FP operations on v3?&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Which events can I use to count FP operations?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Sep 2016 05:53:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109280#M6020</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2016-09-07T05:53:05Z</dc:date>
    </item>
    <item>
      <title>There are no counters for</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109281#M6021</link>
      <description>&lt;P&gt;There are no counters for floating-point operations on Xeon E5 v3.&lt;/P&gt;

&lt;P&gt;The 0x10 and 0x11 events that counted floating-point operations on Xeon E5 v1 and v2 suffered from an implementation bug that could lead to severe overcounting (I have measured over-counts of up to 10x), so these were disabled on the Xeon E5 v3.&amp;nbsp; Unfortunately the replacement 0xC7 events were not included until Xeon E5 v4, leaving Xeon E5 v3 with nothing.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Sep 2016 14:38:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/How-to-measure-flops-on-v4/m-p/1109281#M6021</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2016-09-07T14:38:03Z</dc:date>
    </item>
  </channel>
</rss>

