What throughput does it report?
*1: http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/cpp/win/intref_cls/common/intref_bk_avx_fma.htm
Anyway, I really don't think Ivy Bridge will feature FMA support. The problem is that with two FMA units per core, the register file would require six read ports to sustain maximum throughput. According to Agner Fog's tests, they've only just increased it to four ports. Increasing it to six likely requires changes which are too significant to include in Ivy Bridge (although I'd love to be wrong).
Another solution would be to keep the current register file and fetch the third operand in another cycle. That would limit the peak sustainable performance, but most code has plenty of instructions with fewer input operands and integer instructions mixed in anyway. Also perhaps the bypass network can provide the extra operands most of the time. In any case, I fear we'll have to wait till Haswell for FMA support.
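The read-port argument above can be put as simple arithmetic. This is a deliberately crude sketch, not a model of any real scheduler: it assumes that each source operand not delivered by the bypass network costs exactly one register-file read, and that read ports are the only limit. The six- and four-port figures come from the post; the function and its name are hypothetical.

```python
from fractions import Fraction


def sustained_fraction(reads_demanded_per_cycle, read_ports):
    """Fraction of peak issue rate that can be sustained when register-file
    read ports are the only bottleneck.

    Hypothetical model: every operand not supplied by the bypass network
    costs one read port; scheduling and execution-port limits are ignored.
    """
    if reads_demanded_per_cycle == 0:
        return Fraction(1)
    return min(Fraction(1), Fraction(read_ports, reads_demanded_per_cycle))


FMA_SOURCES = 3  # a * b + c
ADD_SOURCES = 2  # a + b

# Two FMAs per cycle, all operands read from the register file.
print(sustained_fraction(2 * FMA_SOURCES, read_ports=6))  # 1
print(sustained_fraction(2 * FMA_SOURCES, read_ports=4))  # 2/3

# One FMA plus one add per cycle: a mixed stream needs fewer reads.
print(sustained_fraction(FMA_SOURCES + ADD_SOURCES, read_ports=4))  # 4/5

# Two FMAs where one operand of each arrives via the bypass network.
print(sustained_fraction(2 * (FMA_SOURCES - 1), read_ports=4))  # 1
```

Exact fractions are used so the ratios stay readable; the point is only that a mixed instruction stream or bypassed operands can hide a missing read port.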
Note that although vblendvps can take three register input operands, it actually appears to be split into two uops: one for creating the mask, and one for the actual blend. So the third operand is only a narrow one, and it doesn't require a register file access. This also explains why this variant of the instruction has half the throughput and twice the latency.
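To see why the mask operand is "narrow": in a variable blend, only the sign bit of each 32-bit mask lane decides which source that lane comes from. A Python emulation, with integers standing in for the raw bit patterns of the mask register; the two-step split only mirrors the two-uop explanation above and is not a documented description of how the hardware does it. The function names are hypothetical.

```python
def mask_from_sign_bits(mask_words):
    """Step 1: reduce each 32-bit mask lane to its sign bit (one bit per lane)."""
    return [(word >> 31) & 1 for word in mask_words]


def select_lanes(a, b, lane_bits):
    """Step 2: per-lane select, taking b where the lane bit is 1, otherwise a."""
    return [bv if bit else av for av, bv, bit in zip(a, b, lane_bits)]


def blendv(a, b, mask_words):
    """Emulates the per-lane result of a variable blend such as (V)BLENDVPS."""
    return select_lanes(a, b, mask_from_sign_bits(mask_words))


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
mask = [0x80000000, 0x00000000, 0xFFFFFFFF, 0x7FFFFFFF]
print(mask_from_sign_bits(mask))  # [1, 0, 1, 0]
print(blendv(a, b, mask))         # [10.0, 2.0, 30.0, 4.0]
```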
I'm not sure, because I was too lazy to test it without a supporting compiler. AFAIK the SDE documentation refers to the AVX specs including FMA, so FMA3 must be supported, but since the compiler documentation is wrong (and still unfixed 4 months after my complaint) the SDE documentation may be wrong as well.
>Anyway, I really don't think Ivy Bridge will feature FMA support
Indeed, given that the compiler is not yet released, FMA3 is probably not for Ivy Bridge, though there are slides where Ivy Bridge is qualified as a "Tick+" and that talk about "enhanced AVX support". Maybe the enhancements to AVX are only the FP16 <-> FP32 conversions and the RND generator, but that looks very slim. I hope that at least the L1/L2 cache bandwidth was increased (doubled) to finally get some good speedups with AVX-256 code. Increasing the cache bandwidth is more urgent than FMA IMHO, and a prerequisite for a full-fledged implementation with 2 FMAs per clock.
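A rough estimate of why two FMAs per clock would need more cache bandwidth: in a streaming kernel where each FMA takes one 256-bit operand from memory, the required load bandwidth grows with the FMA rate. The 32 bytes/cycle L1 load figure below is an assumption for illustration (roughly what is usually quoted for Sandy Bridge's two 128-bit load ports), not a measured value, and the helper name is hypothetical.

```python
def l1_load_bytes_per_cycle(fmas_per_cycle, memory_operands_per_fma, vector_bytes):
    """Bytes per cycle the L1 cache must deliver so every FMA can take its memory
    operands without stalling (streaming kernel, no reuse of data in registers)."""
    return fmas_per_cycle * memory_operands_per_fma * vector_bytes


AVX256_VECTOR_BYTES = 32    # one 256-bit register
ASSUMED_L1_LOAD_BYTES = 32  # assumed L1 load bandwidth per cycle, illustration only

needed = l1_load_bytes_per_cycle(2, 1, AVX256_VECTOR_BYTES)
print(needed, needed <= ASSUMED_L1_LOAD_BYTES)  # 64 False
```

Kernels that keep more data in registers need less, which is why this is an upper bound for the streaming case rather than a general requirement.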
>Ivy Bridge is primarily a shrink to 22nm, not instruction set enhancement.
My understanding is that at least the few "Post-32nm processor instructions" will be included in Ivy, much like every "Tick" in the past has benefited from some ISA enhancements.
I believe the Tick+ refers to simultaneously introducing 22 nm and FinFET technology, not any kind of architectural change. FP16 and RND instructions are pretty minor extensions that only affect a few components. I'm not expecting much else, since the move to 22 nm + FinFET is a major leap by itself and the whole Tick-Tock idea is to spread the risks. FMA support not only requires the extra operand but also higher cache bandwidth. Ivy Bridge will be a great refresh of Sandy Bridge, but I'm looking forward to what Haswell will bring.
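The FP16 <-> FP32 conversions mentioned here (the F16C instructions) really are a small amount of logic. A software sketch of the half-to-single direction, decoding one binary16 bit pattern; the helper name is hypothetical, it ignores SIMD width and NaN payloads, and Python's `struct` module (format `'e'`) is used only as a reference to check it against.

```python
import struct


def half_bits_to_float(h):
    """Decode an IEEE 754 binary16 bit pattern into a Python float.

    This is the per-lane work a half-to-single conversion (VCVTPH2PS) performs.
    """
    sign = -1.0 if (h >> 15) & 1 else 1.0
    exponent = (h >> 10) & 0x1F
    fraction = h & 0x3FF
    if exponent == 0:  # zero or subnormal: no implicit leading 1
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F:  # all-ones exponent: infinity or NaN
        return (sign * float("inf")) if fraction == 0 else float("nan")
    # normal number: implicit leading 1, exponent bias 15
    return sign * (1.0 + fraction / 1024.0) * 2.0 ** (exponent - 15)


# Cross-check against the standard library's binary16 support ('e' format).
for bits in (0x3C00, 0xC000, 0x7BFF, 0x0001):
    print(half_bits_to_float(bits), struct.unpack("<e", bits.to_bytes(2, "little"))[0])
```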
If it includes significant changes to the cache hierarchy, there's actually a spark of hope that it includes gather/scatter support as well, making Haswell the first feature-complete throughput-oriented CPU architecture...
>I believe the Tick+ refers to simultaneously introducing 22 nm and FinFET technology
I don't think so; the new fab process accounts for the "Tick", not for the "+". New process technologies are always introduced (at Intel) at a new node, for example copper interconnects at 0.13um, strained silicon at 90nm, high-k + metal gates at 45nm, etc.
From what we know, the "+" may be for:
- DX11 and OpenCL support in the iGPU + increased EU count ("next Gen Intel HD Graphics")
- "Next Gen Quick Sync"
- "Ultra-Performance Configurable TDP" with the new "docked mode"
- "Post-32nm processor instructions" (not officially announced for Ivy Bridge but obvious from the name)
- Better performance for AVX-256 code thanks to uarch improvements (my wild guess after seeing a slide mentioning "enhanced AVX acceleration")
- A marketing gimmick; after all, Penryn was qualified as a simple "Tick" with 47 new instructions in the ISA and other uarch changes like the radix-16 divider and the vastly improved shuffle engine
>I don't think so, the new fab process accounts for the "Tick", not for the "+".
FinFET really is a major leap. They could have gone to 22 nm entirely without it, but after ten years of R&D they decided to introduce it simultaneously with this new node. That's definitely a Tick+ to me. Note that the competition isn't expected to use non-planar transistors until around the 14 nm node. So it's not something needed to make 22 nm feasible; it's something extra. And it's no small feature: it cuts power consumption in half, or offers 30% higher performance. It practically combines the advantages of two process generations into one; hence Tick+ is a fitting name.
I still seriously doubt it indicates any other change:
- The IGP has evolved independently from the Tick-Tock model before.
- Next gen Quick Sync probably adds support for WebM. A nice addition but hardly major in the bigger picture.
- Configurable TDP is sort of a consequence of FinFET. You can choose between much lower power consumption while on the road, or a nice speed boost while docked.
- There's only talk of "enhanced AVX support", which is likely merely the FP16 and RND instructions.
- While Penryn indeed added 47 new instructions, supporting these merely required changes to the ALUs and decoder. The architecture itself was unaffected. Likewise, super shuffle was a welcome addition, but such things just required the transistor budget to become feasible.
So Tick+ really seems to indicate an extra large Tick, not a Tick with architectural changes.
FYI the original slide says "enhanced AVX *acceleration*"
http://www.google.com/#sclient=psy&hl=en&safe=off&source=hp&q=%22enhanced+AVX+acceleration%22
This also confirms the Tick+ really just refers to FinFET. Which in turn means it's a major technological breakthrough on its own...
I can't see how this very cool Haswell disclosure confirms anything about what "Tick+" is or is not for Ivy Bridge. Now I'll be interested to hear what people in the know have to say about this "+", if they are allowed to.
I personally hope for some IPC increase for AVX code in Ivy Bridge thanks to uarch improvements; I don't think that increased clock speed alone will explain the 20% performance boost reported by a lot of sites, such as this one:
http://www.xbitlabs.com/news/cpu/display/20110203150914_Intel_s_Next_Gen_Ivy_Bridge_to_Offer_20_30_Performance_Boost_Over_Sandy_Bridge_Report.html
>I personally hope for some IPC increase for AVX code in Ivy Bridge thanks to uarch improvements, I don't think that increased clock speed alone will explain the 20% performance boost
FinFET is a perfectly good explanation for the Tick+ designation. It's by far the biggest novelty for Ivy Bridge we know about. I see little point in looking any further with something like that fully confirmed and detailed. Non-planar technology radically changes semiconductor scaling behavior.
20% performance increase can easily be achieved with a higher Turbo Boost frequency. Note once again that FinFET allows significantly higher switching speeds while keeping power consumption in check. The process shrink should also allow for bigger caches. Since that has happened with every shrink, it's barely noteworthy, but it does help explain how a 20% performance increase is entirely feasible without micro-architecture changes.
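The 20% figure can be split into an IPC part and a clock part. A back-of-the-envelope sketch, assuming run time scales inversely with IPC times clock frequency (which ignores memory-bound code); the 3.5 GHz base clock and the function names are hypothetical examples, not product figures.

```python
def speedup(ipc_ratio, clock_ratio):
    """Relative performance when run time ~ instructions / (IPC * clock frequency)."""
    return ipc_ratio * clock_ratio


def clock_for_target(target_speedup, base_clock_ghz, ipc_ratio=1.0):
    """Clock needed to reach target_speedup, given an IPC ratio (1.0 = unchanged)."""
    return base_clock_ghz * target_speedup / ipc_ratio


# Hypothetical numbers, not Intel specifications.
print(clock_for_target(1.20, base_clock_ghz=3.5))                  # unchanged IPC
print(clock_for_target(1.20, base_clock_ghz=3.5, ipc_ratio=1.05))  # small IPC gain
print(speedup(1.10, 1.10))                                          # 10% IPC + 10% clock
```

The last line shows that two modest gains compound: 10% more IPC and 10% more clock already give about 21%.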
By the way, the blog post about Haswell New Instructions pretty much answers your question about FMA throughput: "our floating-point multiply accumulate significantly increases peak flops". In particular it means Haswell will feature two 256-bit FMA units per core.
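Putting that into flops: assuming two 256-bit FMA units per core (the inference drawn above, not something the blog post states outright), peak single-precision throughput per cycle doubles compared to a core with one 256-bit multiplier and one 256-bit adder, as in Sandy Bridge. A small sketch; the helper name is hypothetical.

```python
def peak_flops_per_cycle(units, lanes, flops_per_lane):
    """Peak floating-point operations per cycle for one core."""
    return units * lanes * flops_per_lane


SP_LANES_256 = 256 // 32  # 8 single-precision lanes per 256-bit vector
DP_LANES_256 = 256 // 64  # 4 double-precision lanes

# One 256-bit multiply unit plus one 256-bit add unit (1 flop per lane each).
separate_mul_add_sp = peak_flops_per_cycle(2, SP_LANES_256, 1)
# Two 256-bit FMA units (multiply + add = 2 flops per lane each).
dual_fma_sp = peak_flops_per_cycle(2, SP_LANES_256, 2)
dual_fma_dp = peak_flops_per_cycle(2, DP_LANES_256, 2)

print(separate_mul_add_sp, dual_fma_sp, dual_fma_dp)  # 16 32 16
```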
>FinFET is a perfectly good explanation for the Tick+ designation
I think you made your point very clear already. From the link I just provided:
"It is expected that Ivy Bridge CPUs, which will be made using 22nm process technology, will have certain micro-architecture level enhancements along with clock-speed and some other methods to boost performance."
So please understand that some other people have another opinion. "Other methods" may, for example, be the rumored stacked DRAM; also, we still don't know at this point what is meant by "enhanced AVX acceleration".
>so please understand that some other people have another opinion, "other methods" for example may be for the rumored stacked DRAM, also we still don't know ATM what is referred to as "enhanced AVX acceleration"
The stacked DRAM rumor has absolutely no credibility. 30% higher IGP performance can simply be achieved by using 16 EUs instead of 12, and DDR3-1600 instead of DDR3-1333. Stacked DRAM, on the other hand, would be used to provide a massive increase in bandwidth, and we would have gotten some official confirmation about the use of such technology and its far-reaching consequences by now. The silicon and packaging cost would be substantial. Seriously, the numbers just don't add up. Other technologies offer bandwidth scaling at a lower cost: DDR3 will continue to scale up for a few more years, after which DDR4 will take over. Point-to-point memory topologies and Through-Silicon Via (TSV) technology have been confirmed to be in active development. That's for the 2015 timeframe though, and the need for DRAM-based L4 caches is even further out. In the short term, for Ivy Bridge, there's no reason to expect anything radical, since we're not running into big issues yet.
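The arithmetic behind the "16 EUs and DDR3-1600" explanation, as a sketch: it assumes a dual-channel memory interface with 64 bits (8 bytes) per channel, typical for these desktop parts, and uses nominal transfer rates, so the results are peak figures rather than measured bandwidth. The helper name is hypothetical.

```python
def dram_peak_gb_per_s(mega_transfers_per_s, channels=2, bytes_per_transfer=8):
    """Peak DRAM bandwidth in GB/s: transfer rate x bus width x channel count."""
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


eu_ratio = 16 / 12                  # execution units: about 1.33x
bw_1333 = dram_peak_gb_per_s(1333)  # dual-channel DDR3-1333
bw_1600 = dram_peak_gb_per_s(1600)  # dual-channel DDR3-1600
print(round(eu_ratio, 2), round(bw_1333, 1), round(bw_1600, 1), round(bw_1600 / bw_1333, 2))
# 1.33 21.3 25.6 1.2
```

Roughly a third more execution units and a fifth more peak bandwidth is in the right range for a 30% IGP gain without any exotic memory technology.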
I suspect someone heard about TSV and simply started to fantasize out loud.
And it only takes a single reporter jotting down "enhanced AVX acceleration" when hearing about accelerated half-float support to make some people think it's something more substantial. Please read Mark Buxton's blog post again; it clearly indicates Ivy Bridge will merely add support for what is called post-32nm instructions in the Programming Reference.
Nope, you got it wrong: as already explained "Enhanced AVX acceleration" is in the original Intel slide:
http://overclockingevent.com/index.php?option=com_content&task=view&id=2071&Itemid=65
>That's for the 2015 timeframe though
huh?
>Please read Mark Buxton's blog post again, it clearly indicates Ivy Bridge will merely add support for what is called post-32nm instructions in the Programming Reference.
His very cool post talks only about the ISA; nothing is said about the implementation or uarch enhancements (or lack thereof) in Ivy Bridge. Since neither you nor I have anything really conclusive so far, I'd suggest we stop speculating and wait for upcoming official information.
Greased graphics turn a tick into a tock
I stand corrected. The tick+ refers to the graphics after all. They should have called it a tick++ for the Tri-Gate though. ;-)
I'm still not expecting AVX enhancements beyond the few new instructions, but again I wouldn't mind being wrong.
Hint: see the comment for the 1.25x Excel 2010 speedup here:
http://wccftech.com/intel-slides-officially-detail-3rd-generation-ivy-bridge-processors-architecture-launch-dates-performance-estimates-i7-2600k/