Hi Intel Experts:
I cannot find the GFLOPS figures for the latest Intel Haswell CPUs. Could you please tell me what they are?
I want to understand the performance difference between Haswell and Ivy Bridge, for example between the i7-4700HQ and the i7-3630QM. From the Intel website, I know that the i7-3630QM's GFLOPS is 76.8 (base). Could you please tell me the figure for the i7-4700HQ?
I got the following information from the internet:
Intel Sandy Bridge and Ivy Bridge have the following floating-point performance: 16 SP FLOPs/cycle (8-wide AVX addition plus 8-wide AVX multiplication).
Intel Haswell has the following floating-point performance: 32 SP FLOPs/cycle (two 8-wide FMA (fused multiply-add) instructions).
I have two questions here:
1. Take the i7-3630QM as an example: 16 (SP FLOPs/cycle) x 4 (quad-core) x 2.4 GHz (clock) = 153.6 GFLOPS = 76.8 x 2. Does this mean that one operation is a combined addition and multiplication?
2. Does Haswell have two FMA units?
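The arithmetic in question 1 can be checked with a small sketch. It assumes the per-cycle figures quoted above (16 SP FLOPs/cycle per core for Sandy Bridge/Ivy Bridge, 32 for Haswell), that double precision is half the single-precision rate because a 256-bit register holds 4 doubles instead of 8 floats, and a 2.4 GHz base clock for both the i7-3630QM and the i7-4700HQ. The helper `peak_gflops` is hypothetical. Note that 76.8 equals the double-precision product (8 DP FLOPs/cycle x 4 x 2.4), which suggests the published figure is a DP number.

```python
# Theoretical peak = FLOPs/cycle/core x cores x clock (GHz) -> GFLOPS.
# Per-core FLOPs/cycle as quoted in this thread; DP is half of SP
# because a 256-bit AVX register holds 8 floats but only 4 doubles.
FLOPS_PER_CYCLE = {
    # Sandy Bridge / Ivy Bridge: one 256-bit ADD + one 256-bit MUL per cycle
    "ivy_bridge": {"sp": 16, "dp": 8},
    # Haswell: two 256-bit FMAs per cycle, each counted as 2 FLOPs per lane
    "haswell": {"sp": 32, "dp": 16},
}


def peak_gflops(arch: str, precision: str, cores: int, ghz: float) -> float:
    """Hypothetical helper: theoretical peak GFLOPS with all cores at `ghz`."""
    return FLOPS_PER_CYCLE[arch][precision] * cores * ghz


if __name__ == "__main__":
    # Assumption: both quad-core parts run at a 2.4 GHz base clock.
    for name, arch in (("i7-3630QM", "ivy_bridge"), ("i7-4700HQ", "haswell")):
        sp = peak_gflops(arch, "sp", cores=4, ghz=2.4)
        dp = peak_gflops(arch, "dp", cores=4, ghz=2.4)
        print(f"{name}: {sp:.1f} SP GFLOPS, {dp:.1f} DP GFLOPS")
```

Under these assumptions the script reports 153.6 SP / 76.8 DP GFLOPS for the i7-3630QM and 307.2 SP / 153.6 DP GFLOPS for the i7-4700HQ, i.e. a 2x per-clock difference coming entirely from the two FMA units.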
Thank you very much for any comments.
Best Regards,
Sun Cao
Hi Sergey:
You can find CPU GFlops at: http://www.intel.com/support/processors/sb/CS-017346.htm
>>>What challenge? And why should it be a concern regarding GPUs? I really didn't understand what you wanted to say.>>>
It was only a general comment.
I meant that, in terms of raw DP GFLOPS, the Haswell microarchitecture is closing the gap with lower-end GPUs, so in the foreseeable future it could be used for software rendering.
Actually, in Turbo mode at 3.9 GHz the theoretical peak performance is ~249 DP GFLOPS.
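The ~249 figure follows from Haswell's 16 DP FLOPs per cycle per core (two FMA units x 4 doubles x 2 FLOPs). A minimal sketch, assuming four cores all running at 3.9 GHz at the same time, which Turbo may not sustain on every core; the helper name is hypothetical:

```python
# Theoretical DP peak for a quad-core Haswell with every core at turbo clock.
# Assumption: all four cores sustain 3.9 GHz; real all-core turbo may be lower.
HASWELL_DP_FLOPS_PER_CYCLE = 16  # 2 FMA units x 4 doubles x 2 FLOPs per FMA


def haswell_dp_peak(cores: int, ghz: float) -> float:
    """Hypothetical helper returning theoretical DP GFLOPS."""
    return HASWELL_DP_FLOPS_PER_CYCLE * cores * ghz


print(f"{haswell_dp_peak(4, 3.9):.1f} DP GFLOPS")  # 16 * 4 * 3.9 = 249.6
```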
>>>That sounds really interesting, and who is defining that foreseeable future and who is going to use lower-end GPUs with Haswell systems?>>>
I am talking about a raw performance comparison between Haswell and some lower- to mid-range GPUs. Using the CPU for software rendering is already a reality.
http://www.inartis.com/products/kribi%203D%20Engine/Default.aspx
>>>who is defining that foreseeable future>>>
Probably Intel, by releasing architecturally wider execution engine designs.
>>>Ivy Bridge - Performance Summary (GFlops) Average = 71.9007>>>
Close to the theoretical peak.
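How close can be expressed as an efficiency ratio. A minimal sketch, assuming the 76.8 GFLOPS base peak of the i7-3630QM from the first post as the reference (the CPU and clock used for the quoted run are not stated here, so this is illustrative); the helper name is hypothetical:

```python
# Efficiency = measured GFLOPS / theoretical peak GFLOPS.
# Assumption: the reference peak is the 76.8 GFLOPS base figure; if Turbo
# Boost was active during the run, the true peak is higher and the
# efficiency correspondingly lower.
def efficiency(measured_gflops: float, peak_gflops: float) -> float:
    """Hypothetical helper: fraction of theoretical peak achieved."""
    return measured_gflops / peak_gflops


print(f"{efficiency(71.9007, 76.8):.1%}")  # roughly 93.6%
```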
Sergey Kostrov wrote:
Igor, did you get the 116 GFlops number from some website (1st) or from real testing on a Haswell system (2nd)? In the 2nd case, how many cores were used during the test?
In case you are interested, I published one of my results here: http://www.realworldtech.com/forum/?threadid=134512&curpostid=134594
I measured better than 93% efficiency with a working set entirely in the L1D and a compute:load:store ratio of 11:1:1.
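Reaching that kind of efficiency also depends on keeping enough independent FMAs in flight. A back-of-the-envelope sketch based on Little's law, assuming the commonly cited Haswell figures of a 5-cycle FMA latency and two FMAs issued per cycle (these numbers are not stated in this thread); the helper names are hypothetical and the model ignores loads and stores:

```python
# Little's law: to keep both FMA units busy every cycle, the number of
# independent FMA dependency chains (accumulators) must be at least
# latency x issue rate. Figures below are assumed Haswell values.
FMA_LATENCY_CYCLES = 5  # cycles until an FMA result can feed the next FMA
FMA_PER_CYCLE = 2       # two FMA units (ports 0 and 1)


def min_independent_chains(latency: int, per_cycle: int) -> int:
    """Hypothetical helper: accumulators needed to saturate the FMA units."""
    return latency * per_cycle


def fma_utilization(chains: int, latency: int, per_cycle: int) -> float:
    """Simplified model (ignores loads/stores): fraction of FMA slots used."""
    return min(1.0, chains / min_independent_chains(latency, per_cycle))


needed = min_independent_chains(FMA_LATENCY_CYCLES, FMA_PER_CYCLE)
print(f"need at least {needed} independent accumulators")
for chains in (1, 4, 8, 10, 12):
    util = fma_utilization(chains, FMA_LATENCY_CYCLES, FMA_PER_CYCLE)
    print(f"{chains:2d} chains -> {util:.0%} of peak FMA throughput")
```

With a single accumulator, each FMA waits on the previous one and only one FMA slot in ten is used; ten or more independent chains are needed before the latency is fully hidden.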
Sergey Kostrov wrote:
Results are consistent and the difference is ~3.45% (it is acceptable).
It's neither the same test nor the same test platform (even the CPU frequencies look different), so IMHO there is no point in comparing the results.
Sergey Kostrov wrote:
My question is the same: How many cores were used during the test?
The test I reported above was with a single thread on a single core, using only vfmadd213ps as the compute instruction. I can't comment on the other test, though.
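For readers unfamiliar with the instruction, the sketch below models what one 256-bit vfmadd213ps does per lane and why it counts as 16 single-precision FLOPs. The function is a hypothetical Python model, not an intrinsic or library API; it assumes the Intel "213" operand order (operand 2 x operand 1 + operand 3).

```python
# Scalar model of VFMADD213PS ymm1, ymm2, ymm3 (Intel "213" operand order):
# ymm1[i] = ymm2[i] * ymm1[i] + ymm3[i] for each of the 8 float lanes.
# Python floats are doubles, while the real instruction rounds once in single
# precision, so this only illustrates the data flow, not bit-exact results.
LANES = 8  # 256-bit register / 32-bit float


def vfmadd213ps(ymm1, ymm2, ymm3):
    """Hypothetical model: return the new ymm1 contents (lists of 8 floats)."""
    if not (len(ymm1) == len(ymm2) == len(ymm3) == LANES):
        raise ValueError("each operand must have exactly 8 lanes")
    return [b * a + c for a, b, c in zip(ymm1, ymm2, ymm3)]


FLOPS_PER_INSTRUCTION = 2 * LANES  # one multiply + one add per lane = 16
FMA_UNITS = 2                      # Haswell issues two FMAs per cycle
SP_FLOPS_PER_CYCLE = FMA_UNITS * FLOPS_PER_INSTRUCTION  # 32 per core

result = vfmadd213ps([1.0] * 8, [2.0] * 8, [0.5] * 8)
print(result[0], FLOPS_PER_INSTRUCTION, SP_FLOPS_PER_CYCLE)  # 2.5 16 32
```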
Sergey Kostrov wrote:
I don't consider it a rumor. A header file with some 512-bit stuff can be found in Intel Parallel Studio XE 2013 (..\Compiler\Include folder), and I have known about it since December 2012.
This header is for Xeon Phi targets.
Sergey Kostrov wrote:
What was the point of mentioning or posting these results?
Well, this thread is named "Haswell GFLOPS" and this test of mine measures Haswell GFLOPS, so I suppose it is at least somewhat relevant.
Sergey Kostrov wrote:
A simple test based on just one instruction, vfmadd213ps, cannot be considered a valid one
I don't get what you mean; any test aiming to max out GFLOPS on Haswell will use only FMA instructions for computation.
Sergey Kostrov wrote:
>>...I don't get what you mean, any test wanting to max out GFLOPS on Haswell...
Run the Linpack benchmark utility from the MKL installation to verify your numbers. Post the results as soon as it is done.
As already explained, the tests aren't comparable, so one can't be used to verify the other. Mine uses a higher compute:load/store ratio than LINPACK: as mentioned in my post at RWT, I use an unrealistically high compute:load:store ratio of 11:1:1. The goal was to come close to the theoretical 2x speedup of FMA over ADD+MUL.
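The 2x target and the 11:1:1 mix can be put into numbers. A minimal sketch, assuming every compute instruction is a 256-bit single-precision FMA and every load or store moves one 32-byte ymm register (the post does not spell these out); `flops_per_byte` is a hypothetical helper:

```python
# Per-core SP FLOPs per cycle, as quoted earlier in this thread.
IVY_BRIDGE_SP_FLOPS_PER_CYCLE = 16  # one 8-wide ADD + one 8-wide MUL
HASWELL_SP_FLOPS_PER_CYCLE = 32     # two 8-wide FMAs, 2 FLOPs per lane each
SP_FLOPS_PER_FMA = 16               # 8 lanes x (multiply + add)
YMM_BYTES = 32                      # one 256-bit register


def flops_per_byte(fma: int, loads: int, stores: int) -> float:
    """FLOPs performed per byte moved for a compute:load:store mix."""
    return (fma * SP_FLOPS_PER_FMA) / ((loads + stores) * YMM_BYTES)


# Ideal speedup of FMA over separate ADD+MUL at the same clock.
print(HASWELL_SP_FLOPS_PER_CYCLE / IVY_BRIDGE_SP_FLOPS_PER_CYCLE)  # 2.0
# 11 FMAs (176 FLOPs) per 1 load + 1 store (64 bytes).
print(flops_per_byte(11, 1, 1))  # 2.75
```

In this simplified model, a mix with many FLOPs per byte keeps the FMA units busy from L1D, whereas a kernel with a lower ratio spends proportionally more issue slots on loads and stores.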
Sergey Kostrov wrote:
What Haswell system do you have?
4770K / 16 GB DDR3-2400 memory / Corsair H110 cooler / ASUS Z87-Pro mobo
Sergey Kostrov wrote:
I understand it, and I don't want to compare; I simply would be glad to see some numbers from the Linpack utility.
If these are easy to run I can give it a try. I'm downloading Studio XE 2013 for Windows Update 4 right now (1.11 GB, ETA 1 hr 43 min!), so I'll have the latest MKL (the one in C++ Composer XE 2013 Update 5). Is it the same version you are interested in?
Sergey Kostrov wrote:
linpack_xeon32.exe
linpack_xeon64.exe
runme_xeon32.bat
runme_xeon64.bat
I've just finished running these two tests (MKL released with Composer XE 2013 Update 5 / default MKL benchmark .bat files / Windows 8 Pro 64-bit / CPU @ 4 GHz / realtime process priority). xeon64 takes incredibly long to run and is pretty boring, since there isn't any feedback about its progress. Anyway, you'll see the result files attached; I hope they will be helpful for your purpose.