<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Let's finalize our discussion in Intel® ISA Extensions</title>
    <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936355#M3736</link>
    <description>Let's finalize our discussion about matrix multiplication algorithms.

&amp;gt;&amp;gt;...DGEMM does the matrix operation of &lt;STRONG&gt;C = C + A * B&lt;/STRONG&gt;...

?GEMM does more multiplications and additions by design:

&lt;STRONG&gt;C = alpha*A*B + beta*C&lt;/STRONG&gt;

However, this is ?GEMM specific and I'm talking about a generic case, like &lt;STRONG&gt;C = A * B&lt;/STRONG&gt;, and nothing else. I don't know of any ISO-like standard accepted in industry for measuring software performance, and everybody has their own solution(s). ( In reality I know how ISO 8001 works for X-Ray imaging software... Very-very strict... )
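
To make the difference concrete, here is a minimal sketch that counts floating-point operations for the generic product and for the ?GEMM form. It assumes square N x N matrices and the naive triple-loop algorithm; the function names are hypothetical, and a real BLAS implementation may organize the alpha and beta scaling differently.

```python
def flops_plain_product(n):
    # C = A * B: n*n output elements, each needing n multiplications and n - 1 additions
    return n * n * (2 * n - 1)


def flops_gemm(n):
    # C = alpha*A*B + beta*C: the plain product, plus per output element
    # one multiplication by alpha, one multiplication by beta and one addition
    return flops_plain_product(n) + 3 * n * n


n = 256  # illustrative size only
print(flops_plain_product(n), flops_gemm(n))
```

The extra work grows as N squared while the product itself grows as N cubed, so the gap matters less and less as the matrices get larger.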

&amp;gt;&amp;gt;...I don't know what Kroneker Based DGEMM you're running or if you're quoting the timing for a Kroneker Product...

This is &lt;STRONG&gt;Not&lt;/STRONG&gt; a regular &lt;STRONG&gt;Kronecker Product&lt;/STRONG&gt;. The algorithm is described in a document I gave you a weblink to earlier ( see one of my previous posts ). The &lt;STRONG&gt;Kronecker Based algorithm for matrix multiplication&lt;/STRONG&gt; is a really high performance algorithm implemented in Fortran by another software developer ( &lt;STRONG&gt;Vineet Y&lt;/STRONG&gt; - http://software.intel.com/en-us/user/798062 ).

&amp;gt;&amp;gt;... I suspect you're not doing a traditional matrix mulitplication...

Once again, take a look at the document posted on the webpage I mentioned; it contains a description of the algorithm.</description>
    <pubDate>Wed, 10 Jul 2013 04:21:00 GMT</pubDate>
    <dc:creator>SergeyKostrov</dc:creator>
    <dc:date>2013-07-10T04:21:00Z</dc:date>
    <item>
      <title>Haswell GFLOPS</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936288#M3669</link>
      <description>&lt;P&gt;Hi Intel Experts:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; I cannot find GFLOPS figures for the latest Intel Haswell CPUs. Could you please let me know what they are?&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; I want to understand the performance difference between Haswell and Ivy-bridge, for example, i7-4700HQ and i7-3630QM. From Intel website, I could know i7-3630QM's GFlops is 76.8 (Base). Could you please let me know that of i7-4700HQ?&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; I found the following information on the internet:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Intel SandyBridge and Ivy-Bridge have the following floating-point performance: 16-SP FLOPS/cycle --&amp;gt; 8-wide AVX addition and 8-wide AVX multiplication.&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Intel Haswell has the following floating-point performance: 32-SP FLOPS/cycle --&amp;gt; two 8-wide FMA (fused multiply-add) instructions&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; I have two questions here:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; 1. Take the i7-3630QM as an example: 16 (SP FLOPS/cycle) X 4 (Quad-core) X 2.4G (Clock) = 153.6 GFLOPS = 76.8 X 2. Does it mean that one operation is a combined addition and multiplication operation?&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; 2. Does Haswell have TWO FMA?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; Thank you very much for any comments.&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Sun Cao&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jun 2013 09:42:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936288#M3669</guid>
      <dc:creator>caosun</dc:creator>
      <dc:date>2013-06-26T09:42:09Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...Intel SandyBridge and</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936289#M3670</link>
      <description>&amp;gt;&amp;gt;...Intel SandyBridge and Ivy-Bridge have the following floating-point performance: 16-SP FLOPS/cycle --&amp;gt; 8-wide
&amp;gt;&amp;gt;AVX addition and 8-wide AVX multiplication...
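
For reference, the theoretical peak implied by those per-cycle figures can be sketched as below. This is only the arithmetic from the question, not a measurement. The function name is hypothetical, and reusing 4 cores and 2.4 GHz for the Haswell line is an assumption made purely for illustration.

```python
def peak_gflops(flops_per_cycle, cores, clock_ghz):
    # Theoretical peak only: assumes every core issues
    # full-width floating-point operations on every cycle
    return flops_per_cycle * cores * clock_ghz


# Figures quoted in this thread: 16 SP FLOPS/cycle, 4 cores, 2.4 GHz
ivy_bridge_sp = peak_gflops(16, 4, 2.4)

# 32 SP FLOPS/cycle from two 8-wide FMAs per cycle;
# core count and clock are assumed unchanged for illustration
haswell_sp = peak_gflops(32, 4, 2.4)

print(round(ivy_bridge_sp, 1), round(haswell_sp, 1))
```

With 8 FLOPS/cycle instead of 16, the same formula gives the 76.8 figure from the question.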

If you have &lt;STRONG&gt;Haswell&lt;/STRONG&gt; and &lt;STRONG&gt;Ivy Bridge&lt;/STRONG&gt; systems you can easily evaluate their &lt;STRONG&gt;real&lt;/STRONG&gt; performance using the &lt;STRONG&gt;Vec_samples.zip&lt;/STRONG&gt; sample from Intel Parallel Studio XE 2013.</description>
      <pubDate>Fri, 28 Jun 2013 00:16:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936289#M3670</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-06-28T00:16:02Z</dc:date>
    </item>
    <item>
      <title>Hi Sergey:</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936290#M3671</link>
      <description>&lt;P&gt;Hi Sergey:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; I do not have Haswell systems now.&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; Even I have it, it will be very helpful if Intel could provide me more information.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Sun Cao&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jun 2013 00:41:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936290#M3671</guid>
      <dc:creator>caosun</dc:creator>
      <dc:date>2013-06-28T00:41:08Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...Does Haswell have TWO</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936291#M3672</link>
      <description>&amp;gt;&amp;gt;...Does Haswell have TWO FMA?..

There are 6 different groups of FMA instructions ( 60 instructions in total ). Please take a look at:

software.intel.com/en-us/blogs/2011/06/13/haswell-new-instruction-descriptions-now-available</description>
      <pubDate>Fri, 28 Jun 2013 01:31:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936291#M3672</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-06-28T01:31:04Z</dc:date>
    </item>
    <item>
      <title>Haswell execution engine has</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936292#M3673</link>
      <description>&lt;P&gt;Haswell execution engine has two Ports dedicated also&amp;nbsp; to FMA(one FMA per port) instructions(Port0 and Port1) so you have doubled bandwidth of gflops/cycle.&lt;/P&gt;
&lt;P&gt;On Haswell one FMA operation combines&amp;nbsp; multiplication and&amp;nbsp; addidtion when compared to previous architecture such a operation could stall two ports when executing at the same time.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jun 2013 07:50:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936292#M3673</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-06-28T07:50:07Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;I do not have Haswell</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936293#M3674</link>
      <description>&amp;gt;&amp;gt;I do not have Haswell systems now.
&amp;gt;&amp;gt;
&amp;gt;&amp;gt;Even I have it, it will be very helpful if Intel could provide me more information...

I agree with that. As soon as you have a Haswell system you can do a very quick evaluation of performance with &lt;STRONG&gt;Vec_samples.zip&lt;/STRONG&gt; from the ..\Composer XE\Samples\en_US\C++ folder ( for a Windows platform ).

Here are some additional technical details:

&lt;STRONG&gt;Compiler options&lt;/STRONG&gt;: /O3 /Qstd=c99 /Qrestrict /Qipo

...
#define ALIGNED
#define NOALIAS
#define NOFUNCCALL	// Note: Inlining
...

&lt;STRONG&gt;[ Test 1 - No Vectorization &amp;amp; No Inlining &amp;amp; No IPO &amp;amp; /O2 are used - Release ]&lt;/STRONG&gt;

ROW:256 COL: 256
Execution time is 12.750 seconds
&lt;STRONG&gt;GigaFlops = 0.673720&lt;/STRONG&gt;
Sum of result = 1279224.000000

&lt;STRONG&gt;[ Test 2 - Vectorization &amp;amp; Alignment &amp;amp; Inlining &amp;amp; IPO &amp;amp; /O3 are used - Release ]&lt;/STRONG&gt;

ROW:256 COL: 256
Execution time is 4.734 seconds
&lt;STRONG&gt;GigaFlops = 1.814519&lt;/STRONG&gt;
Sum of result = 1279224.000000
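
The ratio between the two tests can be checked from either the execution times or the GigaFlops figures above. A minimal sketch follows; the function name is hypothetical.

```python
def speedup(baseline_seconds, optimized_seconds):
    # How many times faster the optimized run is than the baseline run
    return baseline_seconds / optimized_seconds


by_time = speedup(12.750, 4.734)      # Test 1 time / Test 2 time
by_gflops = 1.814519 / 0.673720       # Test 2 GigaFlops / Test 1 GigaFlops

print(round(by_time, 2), round(by_gflops, 2))
```

Both ratios agree because the amount of work is the same in both tests, so GigaFlops is inversely proportional to execution time.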

As you can see &lt;STRONG&gt;Test 2&lt;/STRONG&gt; is ~&lt;STRONG&gt;2.7&lt;/STRONG&gt; times &lt;STRONG&gt;faster&lt;/STRONG&gt; than &lt;STRONG&gt;Test 1&lt;/STRONG&gt;.</description>
      <pubDate>Sun, 30 Jun 2013 06:03:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936293#M3674</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-06-30T06:03:44Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;&gt;...i7-3630QM's GFlops is</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936294#M3675</link>
      <description>&amp;gt;&amp;gt;&amp;gt;&amp;gt;...i7-3630QM's &lt;STRONG&gt;GFlops is 76.8&lt;/STRONG&gt; (Base)...
&amp;gt;&amp;gt;
&amp;gt;&amp;gt;&lt;STRONG&gt;GigaFlops = 1.814519&lt;/STRONG&gt;

By the way, two numbers I gave you are for &lt;STRONG&gt;Pentium 4&lt;/STRONG&gt; and you can see that &lt;STRONG&gt;i7-3630QM&lt;/STRONG&gt; is ~42x faster when processing is done using all cores.

Let me know if you're interested in seeing numbers for an &lt;STRONG&gt;Ivy Bridge&lt;/STRONG&gt; system.</description>
      <pubDate>Sun, 30 Jun 2013 06:10:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936294#M3675</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-06-30T06:10:10Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;By the way, two numbers I</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936295#M3676</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;By the way, two numbers I gave you are for &lt;STRONG&gt;Pentium 4&lt;/STRONG&gt; and you can see that &lt;STRONG&gt;i7-3630QM&lt;/STRONG&gt; is ~42x faster when processing is done using all cores.&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;Are those results obtained from testing Vec_samples?&lt;/P&gt;
&lt;P&gt;Afaik, Pentium 4 cannot execute an fadd and an fmul at the same time. A Haswell core is able to schedule one FMA (two fp operations) for execution per thread, which is a tremendous improvement in raw processing power compared to Pentium 4.&lt;/P&gt;</description>
      <pubDate>Sun, 30 Jun 2013 08:18:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936295#M3676</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-06-30T08:18:36Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;Are those results obtained</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936296#M3677</link>
      <description>&amp;gt;&amp;gt;Are those results obtained from testing Vec_samples?

Yes and you could take a look at it because the project is in Samples folder.</description>
      <pubDate>Sun, 30 Jun 2013 16:48:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936296#M3677</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-06-30T16:48:15Z</dc:date>
    </item>
    <item>
      <title>Thanks</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936297#M3678</link>
      <description>&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jul 2013 07:14:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936297#M3678</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-07-01T07:14:05Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...From Intel website, I</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936298#M3679</link>
      <description>&amp;gt;&amp;gt;...From Intel website, I could know &lt;STRONG&gt;i7-3630QM&lt;/STRONG&gt;'s GFlops is &lt;STRONG&gt;76.8 (Base)&lt;/STRONG&gt;...

Sun Cao, 

I couldn't find information about GFlops on &lt;STRONG&gt;ark.intel.com&lt;/STRONG&gt; and my question is where did you find that number?</description>
      <pubDate>Mon, 01 Jul 2013 12:47:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936298#M3679</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-07-01T12:47:47Z</dc:date>
    </item>
    <item>
      <title>Actually on Ivy Bridge you</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936299#M3680</link>
      <description>&lt;P&gt;Actually on Ivy Bridge you have 1 wide fadd/cycle and 1 wide fmul/cycle&amp;nbsp; it can be either SP(8 flops) or DP(4 flops) and mulitplied by 4 cores and by clock grequency 2.4 ghz = 76.8 Gflops.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jul 2013 15:44:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936299#M3680</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-07-01T15:44:27Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;&gt;...From Intel website, I</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936300#M3681</link>
      <description>&amp;gt;&amp;gt;&amp;gt;&amp;gt;...From Intel website, I could know i7-3630QM's GFlops is 76.8 (Base)...
&amp;gt;&amp;gt;
&amp;gt;&amp;gt;Sun Cao, 
&amp;gt;&amp;gt;
&amp;gt;&amp;gt;I couldn't find information about GFlops on ark.intel.com and my question is where did you find that number?

This is what it looks like in reality:

&lt;STRONG&gt;[ Test 1 on a system with Pentium 4 ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ SSE2 - 32-bit Intel C++ compiler options - 1 CPU used ]&lt;/STRONG&gt;

Note: For all test cases /O3 /QaxSSE2 /Qstd=c99 options are used

GigaFlops = 1.808407 - 
GigaFlops = 1.814136 - /Qrestrict /Qansi-alias
GigaFlops = 1.844917 - /Qrestrict /Qansi-alias /Qipo
GigaFlops = 1.851279 - /Qrestrict /Qansi-alias /Qipo /Qunroll=4
GigaFlops = 1.889559 - /Qrestrict /Qansi-alias /Qipo /Qunroll=8
GigaFlops = &lt;STRONG&gt;2.147484&lt;/STRONG&gt; - /Qrestrict /Qansi-alias /Qipo /Qunroll=8 /Qopt-block-factor:3 (*)
GigaFlops = 1.814519 - /Qrestrict /Qansi-alias /Qipo /Qunroll=8 /Qopt-block-factor:3 /Qopt-mem-layout-trans:3
GigaFlops = 1.929022 - /Qrestrict /Qansi-alias /Qipo /Qunroll=8 /Qopt-block-factor:3 /Qopt-mem-layout-trans:3 /Qopt-prefetch:4
GigaFlops = 0.628287 - /Qrestrict /Qansi-alias /Qparallel
GigaFlops = 0.628333 - /Qrestrict /Qansi-alias /Qipo /Qparallel

(*) - Best result</description>
      <pubDate>Tue, 02 Jul 2013 00:12:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936300#M3681</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-07-02T00:12:05Z</dc:date>
    </item>
    <item>
      <title>[ Test 2 on a system with Ivy</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936301#M3682</link>
      <description>&lt;STRONG&gt;[ Test 2 on a system with Ivy Bridge ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ AVX - 64-bit Intel C++ compiler options - 1 CPU used ]&lt;/STRONG&gt;

Note: For all test cases /O3 /QaxAVX /Qstd=c99 options are used

GigaFlops = 11.228673 - 
GigaFlops = 11.228673 - /Qrestrict /Qansi-alias
GigaFlops = 11.243370 - /Qrestrict /Qansi-alias /Qipo
GigaFlops =  9.326748 - /Qrestrict /Qansi-alias /Qipo /Qunroll=4
GigaFlops = 11.228673 - /Qrestrict /Qansi-alias /Qipo /Qunroll=8
GigaFlops = &lt;STRONG&gt;11.243370&lt;/STRONG&gt; - /Qrestrict /Qansi-alias /Qipo /Qunroll=8 /Qopt-block-factor:3 (*)
GigaFlops = 11.228673 - /Qrestrict /Qansi-alias /Qipo /Qunroll=8 /Qopt-block-factor:3 /Qopt-mem-layout-trans:3
GigaFlops = 11.228673 - /Qrestrict /Qansi-alias /Qipo /Qunroll=8 /Qopt-block-factor:3 /Qopt-mem-layout-trans:3 /Qopt-prefetch:4

&lt;STRONG&gt;[ AVX - 64-bit Intel C++ compiler options - 8 CPUs used ]&lt;/STRONG&gt;

GigaFlops = &lt;STRONG&gt;60.333168&lt;/STRONG&gt; - /Qrestrict /Qansi-alias /Qparallel (*)
GigaFlops = &lt;STRONG&gt;60.333168&lt;/STRONG&gt; - /Qrestrict /Qansi-alias /Qipo /Qparallel (*)

&lt;STRONG&gt;Note&lt;/STRONG&gt;: 60.333168 = 7.541646 * 8

(*) - Best result
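
The gap between the 8-CPU result above and the 76.8 GFlops figure quoted earlier in the thread can be worked out as follows. A minimal sketch; the function name is hypothetical.

```python
def percent_below(reference, measured):
    # How far the measured value falls below the reference, in percent
    return (reference - measured) / reference * 100.0


measured_8_cpus = 7.541646 * 8   # per-CPU figure from the note, times 8 CPUs
gap = percent_below(76.8, measured_8_cpus)

print(round(measured_8_cpus, 6), round(gap, 1))
```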

As you can see, my number is ~21% lower than Intel's number, and this is because our test cases are different. I don't think we will know how the &lt;STRONG&gt;76.8&lt;/STRONG&gt; number was measured unless Intel releases the source code, or informs everybody that some Open Source test was used.</description>
      <pubDate>Tue, 02 Jul 2013 00:21:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936301#M3682</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-07-02T00:21:49Z</dc:date>
    </item>
    <item>
      <title>Hi Sergey:</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936302#M3683</link>
      <description>&lt;P&gt;Hi Sergey:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; You can find CPU GFlops at:&amp;nbsp;http://www.intel.com/support/processors/sb/CS-017346.htm&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2013 00:44:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936302#M3683</guid>
      <dc:creator>caosun</dc:creator>
      <dc:date>2013-07-02T00:44:26Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...You can find CPU GFlops</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936303#M3684</link>
      <description>&amp;gt;&amp;gt;...You can find CPU GFlops at: &lt;A href="http://www.intel.com/support/processors/sb/CS-017346.htm" target="_blank"&gt;http://www.intel.com/support/processors/sb/CS-017346.htm&lt;/A&gt;.

Hi, Thank you and I'll take a look.</description>
      <pubDate>Tue, 02 Jul 2013 00:53:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936303#M3684</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-07-02T00:53:42Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;As you can see my number</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936304#M3685</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;As you can see my number is ~21% lower that Intel's number and this is because our test cases are different. I don't think we will know how &lt;STRONG&gt;76.8&lt;/STRONG&gt; number was measured unless Intel releases source codes, or informs everybody that some Open Source test was used.&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;It could be theoretical peak performance bandwidth.Real application can affect this result by introducing memory stalls or instruction interdependencies.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2013 06:27:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936304#M3685</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-07-02T06:27:00Z</dc:date>
    </item>
    <item>
      <title>Speed for Haswell running at</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936305#M3686</link>
      <description>&lt;P&gt;Speed for Haswell running at 4GHz here is ~116GFlops in Intel optimized linpack from MKL.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2013 11:39:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936305#M3686</guid>
      <dc:creator>levicki</dc:creator>
      <dc:date>2013-07-02T11:39:17Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...Speed for Haswell</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936306#M3687</link>
      <description>&amp;gt;&amp;gt;...Speed for Haswell running at 4GHz here is ~116GFlops in Intel optimized linpack from MKL...

Thanks for the tip regarding Linpack. I did a verification using an older version of Linpack, and the numbers for Pentium 4 are 4x (!) lower:
...
Mflops
580.59
532.56
578.32
587.83
532.69
Average 562.40
...
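The average and its conversion to Gflops can be reproduced from the five runs listed above. A minimal sketch; the variable names are hypothetical.

```python
# Mflops values from the five Linpack runs listed above
mflops_runs = [580.59, 532.56, 578.32, 587.83, 532.69]

average_mflops = sum(mflops_runs) / len(mflops_runs)
average_gflops = average_mflops / 1000.0   # 1 Gflops = 1000 Mflops

print(round(average_mflops, 2), round(average_gflops, 3))
```
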
That is 0.562Gflops and it was just a quick verification of my numbers.</description>
      <pubDate>Tue, 02 Jul 2013 13:49:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936306#M3687</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-07-02T13:49:22Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...Speed for Haswell</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936307#M3688</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;...Speed for Haswell running at 4GHz here is ~116GFlops in Intel optimized linpack from MKL..&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;Haswell can pose a challenge for low end GPUs in terms of DP Gflops.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2013 14:09:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Haswell-GFLOPS/m-p/936307#M3688</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-07-02T14:09:48Z</dc:date>
    </item>
  </channel>
</rss>

