Hi Intel Experts:
I cannot find the GFLOPS figures for the latest Intel Haswell CPUs. Could you please tell me what they are?
I want to understand the performance difference between Haswell and Ivy Bridge, for example between the i7-4700HQ and the i7-3630QM. From the Intel website, I found that the i7-3630QM is rated at 76.8 GFLOPS (base). Could you please tell me the figure for the i7-4700HQ?
I found the following information on the internet:
Intel Sandy Bridge and Ivy Bridge have the following floating-point performance: 16 SP FLOPS/cycle --> one 8-wide AVX addition and one 8-wide AVX multiplication.
Intel Haswell has the following floating-point performance: 32 SP FLOPS/cycle --> two 8-wide FMA (fused multiply-add) instructions.
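The per-cycle figures above follow from vector width X number of arithmetic units X FLOPs per instruction. A minimal Python sketch, assuming 256-bit AVX registers, one add unit plus one multiply unit on Sandy/Ivy Bridge, and two FMA units on Haswell (the helper name flops_per_cycle is hypothetical):

```python
def flops_per_cycle(vector_bits, element_bits, flops_per_unit):
    """Peak FLOPs per cycle for one core (hypothetical helper).

    vector_bits:    register width, e.g. 256 for AVX
    element_bits:   32 for single precision (SP), 64 for double (DP)
    flops_per_unit: one entry per vector unit that can issue each cycle;
                    1 for a plain add or multiply, 2 for an FMA
    """
    lanes = vector_bits // element_bits
    return lanes * sum(flops_per_unit)


# Sandy Bridge / Ivy Bridge: one 256-bit add + one 256-bit multiply per cycle
ivb_sp = flops_per_cycle(256, 32, [1, 1])
ivb_dp = flops_per_cycle(256, 64, [1, 1])

# Haswell: two 256-bit FMA units, each doing a multiply and an add
hsw_sp = flops_per_cycle(256, 32, [2, 2])
hsw_dp = flops_per_cycle(256, 64, [2, 2])

print(f"Ivy Bridge: {ivb_sp} SP, {ivb_dp} DP FLOPs/cycle")  # 16 SP, 8 DP
print(f"Haswell:    {hsw_sp} SP, {hsw_dp} DP FLOPs/cycle")  # 32 SP, 16 DP
```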
I have two questions here:
1. Take the i7-3630QM as an example: 16 (SP FLOPS/cycle) X 4 (cores) X 2.4 GHz (clock) = 153.6 GFLOPS = 76.8 X 2. Does this mean that one operation is a combined addition and multiplication?
2. Does Haswell have two FMA units?
Thank you very much for any comments.
Best Regards,
Sun Cao
Hi Sergey:
You can find CPU GFlops at: http://www.intel.com/support/processors/sb/CS-017346.htm
Hi Sergey:
I do not have a Haswell system right now.
Even if I had one, it would be very helpful if Intel could provide more information.
Best Regards,
Sun Cao
The Haswell execution engine has two ports (Port 0 and Port 1) that can also execute FMA instructions, one FMA per port, so the peak FLOPS per cycle is doubled.
On Haswell, one FMA instruction combines a multiplication and an addition, whereas on the previous architecture the same work had to occupy two ports (one for the multiply, one for the add) at the same time.
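A toy throughput model of this point, assuming n independent vector multiply-add operations (a*b + c) and ignoring latency, dependency chains and memory traffic: on Ivy Bridge each one needs a slot on the multiply port and a slot on the add port, while on Haswell each one is a single FMA that either of the two FMA ports can take. The function name cycles_for_madds is hypothetical, and this is a sketch rather than a model of the real scheduler.

```python
import math


def cycles_for_madds(n, arch):
    """Idealized cycles to issue n independent vector multiply-adds.

    Hypothetical model: only port throughput counts; latency,
    dependency chains and memory stalls are ignored.
    """
    if arch == "ivybridge":
        # one multiply port + one add port: each madd uses one slot on
        # each, so at most one madd can be issued per cycle
        return n
    if arch == "haswell":
        # two FMA ports (Port 0 and Port 1): two madds per cycle
        return math.ceil(n / 2)
    raise ValueError(f"unknown arch: {arch}")


n = 1000
print(cycles_for_madds(n, "ivybridge"))  # 1000
print(cycles_for_madds(n, "haswell"))    # 500
```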
>>>By the way, the two numbers I gave you are for a Pentium 4, and you can see that the i7-3630QM is ~42x faster when processing is done using all cores.>>>
Were those results obtained by testing Vec_samples?
AFAIK, the Pentium 4 cannot execute fadd and fmul at the same time. A Haswell core can schedule one FMA (two floating-point operations) per thread for execution, which is a tremendous improvement in raw processing power compared to the Pentium 4.
Thanks
Actually, on Ivy Bridge you have one 256-bit fadd per cycle and one 256-bit fmul per cycle, each either SP (8 FLOPs) or DP (4 FLOPs). Multiplying the DP figure by 4 cores and by the 2.4 GHz clock gives (4 + 4) FLOPs/cycle X 4 X 2.4 GHz = 76.8 GFLOPS.
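The same arithmetic as a small Python sketch. The i7-3630QM inputs (4 cores, 2.4 GHz base) come from this thread; the i7-4700HQ inputs (4 cores, 2.4 GHz base clock) are an assumption to check against Intel's specification page, and turbo frequencies are ignored. The DP result for the i7-3630QM matches Intel's 76.8 figure, while the SP result is twice that, i.e. the 153.6 from question 1.

```python
def peak_gflops(flops_per_cycle, cores, ghz):
    """Theoretical peak GFLOPS = FLOPs/cycle/core X cores X clock (GHz)."""
    return flops_per_cycle * cores * ghz


# FLOPs per cycle per core (256-bit AVX):
#   Ivy Bridge: 1 add + 1 mul          -> 8 DP, 16 SP
#   Haswell:    2 FMA (2 FLOPs each)   -> 16 DP, 32 SP
cpus = {
    # name: (DP FLOPs/cycle, SP FLOPs/cycle, cores, base GHz)
    "i7-3630QM": (8, 16, 4, 2.4),
    "i7-4700HQ": (16, 32, 4, 2.4),  # assumed: 4 cores, 2.4 GHz base
}

for name, (dp, sp, cores, ghz) in cpus.items():
    print(f"{name}: {peak_gflops(dp, cores, ghz):.1f} GFLOPS DP, "
          f"{peak_gflops(sp, cores, ghz):.1f} GFLOPS SP")
# i7-3630QM: 76.8 GFLOPS DP, 153.6 GFLOPS SP
# i7-4700HQ: 153.6 GFLOPS DP, 307.2 GFLOPS SP
```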
>>>As you can see, my number is ~21% lower than Intel's number, and this is because our test cases are different. I don't think we will know how the 76.8 number was measured unless Intel releases the source code, or tells everybody that some open source test was used.>>>
It could be the theoretical peak performance. A real application can fall below it because of memory stalls or instruction interdependencies.
Here, a Haswell running at 4 GHz reaches ~116 GFLOPS in the Intel Optimized LINPACK benchmark from MKL.
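To compare a measured result like this with the theoretical peak, divide one by the other. A minimal sketch, assuming a quad-core Haswell (the core count is not stated above), double precision (LINPACK is a DP benchmark) and 16 DP FLOPs/cycle per core at a sustained 4 GHz; the helper name efficiency is hypothetical.

```python
def efficiency(measured_gflops, flops_per_cycle, cores, ghz):
    """Fraction of theoretical peak reached by a measured result."""
    peak = flops_per_cycle * cores * ghz
    return measured_gflops / peak


# Assumed: 4 cores, 16 DP FLOPs/cycle/core (two 4-wide DP FMAs), 4 GHz
peak = 16 * 4 * 4.0
eff = efficiency(116.0, 16, 4, 4.0)
print(f"peak {peak:.0f} GFLOPS, measured 116 GFLOPS, efficiency {eff:.1%}")
# peak 256 GFLOPS, measured 116 GFLOPS, efficiency 45.3%
```

If the core count or sustained clock differs from these assumptions, the peak and the ratio change accordingly.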
>>>...Speed for Haswell running at 4GHz here is ~116GFlops in Intel optimized linpack from MKL...>>>
In terms of DP GFLOPS, Haswell can pose a challenge to low-end GPUs.