<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Sorry, because of slow in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029025#M4223</link>
    <description>&lt;P&gt;Sorry, because of a slow internet connection, I clicked the submit button more than once.&lt;/P&gt;</description>
    <pubDate>Tue, 21 Oct 2014 08:21:49 GMT</pubDate>
    <dc:creator>GHui</dc:creator>
    <dc:date>2014-10-21T08:21:49Z</dc:date>
    <item>
      <title>Question about get Gflops and AVX performance</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029018#M4216</link>
      <description>&lt;P&gt;I want to measure GFLOPS and AVX performance, but the PCM tools do not seem to support this. What else can I do to measure GFLOPS and AVX usage?&lt;/P&gt;

&lt;P&gt;Any help will be appreciated.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Oct 2014 02:52:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029018#M4216</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2014-10-20T02:52:38Z</dc:date>
    </item>
    <item>
      <title>Do you want to measure</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029019#M4217</link>
      <description>&lt;P&gt;Do you want to measure program performance?&lt;/P&gt;</description>
      <pubDate>Mon, 20 Oct 2014 06:47:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029019#M4217</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-10-20T06:47:24Z</dc:date>
    </item>
    <item>
      <title>If you want to measure actual</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029020#M4218</link>
      <description>&lt;P&gt;If you want to measure the actual floating-point arithmetic execution rate, you are mostly out of luck. The performance counters that measure floating-point arithmetic instructions (scalar, 128-bit vector, and 256-bit vector) on Sandy Bridge, Ivy Bridge, and Haswell are known to "over-count".&lt;/P&gt;

&lt;P&gt;The degree of over-counting depends primarily on the average latency between issuing the instruction and the availability of the data it uses (either register arguments or memory arguments). If all the data is in the L1 cache, there is almost no over-counting. If the data is in the L2 cache, you can get slight over-counting (10%-20%, but variable), and if all the data is in memory, the counts can be 6x to 10x higher than the actual number of completed floating-point arithmetic instructions.&lt;/P&gt;
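&lt;P&gt;To make the arithmetic concrete, here is a minimal sketch in Python, assuming double precision and using made-up counter values: each instruction count is weighted by the number of doubles it processes (1 for scalar, 2 for 128-bit, 4 for 256-bit), summed, and divided by the elapsed time. The names DOUBLES_PER_INSTRUCTION, total_flops, gflops and avx_fraction are hypothetical and not part of PCM, VTune or any other tool, and FMA is not modelled. Because the result is linear in the counts, an over-count in the inputs inflates the GFLOP/s estimate by the same factor.&lt;/P&gt;

&lt;PRE&gt;
```python
# Minimal sketch (hypothetical names): estimate GFLOP/s from floating-point
# instruction counts, assuming double precision so that scalar, 128-bit and
# 256-bit instructions operate on 1, 2 and 4 elements respectively.
# FMA is not modelled; it would add two operations per element.

DOUBLES_PER_INSTRUCTION = {"scalar": 1, "vec128": 2, "vec256": 4}


def total_flops(counts):
    """Weight each instruction count by the number of doubles it processes."""
    return sum(DOUBLES_PER_INSTRUCTION[kind] * n for kind, n in counts.items())


def gflops(counts, seconds):
    """Convert weighted counts over an elapsed time into GFLOP/s."""
    return total_flops(counts) / seconds / 1e9


def avx_fraction(counts):
    """Share of all FLOPs that came from 256-bit (AVX) instructions."""
    vec256_flops = DOUBLES_PER_INSTRUCTION["vec256"] * counts.get("vec256", 0)
    return vec256_flops / total_flops(counts)


# Hypothetical counts for a 1-second run (not measured data).
counts = {"scalar": 1_000_000, "vec128": 2_000_000, "vec256": 3_000_000}
print(gflops(counts, 1.0))
print(avx_fraction(counts))

# A uniform over-count in the inputs inflates the estimate by the same
# factor, e.g. a 6x over-count for memory-bound code:
inflated = {kind: 6 * n for kind, n in counts.items()}
print(gflops(inflated, 1.0) / gflops(counts, 1.0))
```
&lt;/PRE&gt;

&lt;P&gt;With a uniform 6x over-count, as can happen when the data comes from memory, the last line reports a ratio of about 6, meaning the estimated rate is six times too high.&lt;/P&gt;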

&lt;P&gt;See more discussion at &lt;A href="https://software.intel.com/en-us/forums/topic/499193" target="_blank"&gt;https://software.intel.com/en-us/forums/topic/499193&lt;/A&gt; and &lt;A href="https://software.intel.com/en-us/forums/topic/531796" target="_blank"&gt;https://software.intel.com/en-us/forums/topic/531796&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Oct 2014 17:11:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029020#M4218</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2014-10-20T17:11:51Z</dc:date>
    </item>
    <item>
      <title>Quote:iliyapolak wrote:</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029021#M4219</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;iliyapolak wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Do you want to measure program performance?&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Yes. Is there a way to do that?&lt;/P&gt;</description>
      <pubDate>Tue, 21 Oct 2014 05:10:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029021#M4219</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2014-10-21T05:10:19Z</dc:date>
    </item>
    <item>
      <title>Quote:John D. McCalpin wrote:</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029022#M4220</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;John D. McCalpin wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;If you want to measure the actual floating-point arithmetic execution rate, you are mostly out of luck. The performance counters that measure floating-point arithmetic instructions (scalar, 128-bit vector, and 256-bit vector) on Sandy Bridge, Ivy Bridge, and Haswell are known to "over-count".&lt;/P&gt;

&lt;P&gt;The degree of over-counting depends primarily on the average latency between issuing the instruction and the availability of the data it uses (either register arguments or memory arguments). If all the data is in the L1 cache, there is almost no over-counting. If the data is in the L2 cache, you can get slight over-counting (10%-20%, but variable), and if all the data is in memory, the counts can be 6x to 10x higher than the actual number of completed floating-point arithmetic instructions.&lt;/P&gt;

&lt;P&gt;See more discussion at &lt;A href="https://software.intel.com/en-us/forums/topic/499193"&gt;https://software.intel.com/en-us/forums/topic/499193&lt;/A&gt; and &lt;A href="https://software.intel.com/en-us/forums/topic/531796"&gt;https://software.intel.com/en-us/forums/topic/531796&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;I have measured it on Sandy Bridge and Ivy Bridge, but not on Haswell. I can accept slight over-counting. I have checked the documents at http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html, but they do not list any events for FLOPs or vector instructions on Haswell.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Oct 2014 08:18:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029022#M4220</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2014-10-21T08:18:54Z</dc:date>
    </item>
    <item>
      <title>Sorry, because of slow</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029025#M4223</link>
      <description>&lt;P&gt;Sorry, because of a slow internet connection, I clicked the submit button more than once.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Oct 2014 08:21:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029025#M4223</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2014-10-21T08:21:49Z</dc:date>
    </item>
    <item>
      <title>Quote:GHui wrote:</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029026#M4224</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;GHui wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;&lt;STRONG class="quote-header"&gt;Quote:&lt;/STRONG&gt;&lt;/P&gt;

&lt;BLOCKQUOTE class="quote-msg quote-nest-1 odd"&gt;
	&lt;DIV class="quote-author"&gt;&lt;EM class="placeholder"&gt;iliyapolak&lt;/EM&gt; wrote:&lt;/DIV&gt;

	&lt;P&gt;Do you want to measure program performance?&lt;/P&gt;

&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Yes. Is there a way to do that?&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;You can use VTune to do that. Start with a Lightweight Hotspots analysis, then move on to the more advanced analysis types.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Oct 2014 10:39:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029026#M4224</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-10-21T10:39:11Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;I have measure it on Sandy</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029027#M4225</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;I have measured it on Sandy Bridge and Ivy Bridge, but not on Haswell. I can accept slight over-counting. I have checked the documents at &lt;A href="http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html"&gt;http://www.intel.com/content/www/us/en/processors/architectures-software...&lt;/A&gt;, but they do not list any events for FLOPs or vector instructions on Haswell.&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;

&lt;P&gt;Check the following article on FP performance analysis: &lt;A href="https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs"&gt;https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Oct 2014 10:51:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Question-about-get-Gflops-and-AVX-performance/m-p/1029027#M4225</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-10-21T10:51:37Z</dc:date>
    </item>
  </channel>
</rss>

