Posted on behalf of Brijesh Tripathi
Intel has set an aggressive goal to reach zettascale (10^21 flop/s, or 1000x faster than exascale) performance in the datacenter over the next four to five years. This previously unheard-of performance high-water mark will be realized via a series of incremental steps.
To reach zettascale performance, we are leaning on mobile architecture and creating more power-efficient semiconductor devices while rethinking the Commodity Off the Shelf (COTS) model. Instead of throwing massive numbers of commodity chips and boards into a system and relying on scaling to generate flop/s, we are looking at the physics of computation and what the end user does with their data. Very simply, COTS puts users at the mercy of the most critical component, be it a chip on a motherboard or the CPU/GPU device in a computational node. We have to think about memory and understand how computational devices drive end customer performance.
The zettascale journey begins by thinking about the devices we are building and why we are building them. Looking at what is happening in the datacenter today, we recognize that too much time and power are spent moving data. In general, data movement overheads cost large amounts of execution time and energy, driving serious datacenter time and power inefficiencies.
The impact on power consumption is estimated to be extreme: roughly two-thirds of the power entering a semiconductor package is spent moving bits.¹
Some of this power is used to perform useful work (e.g., moving data to cache), but much of the data movement does nothing more than generate heat rather than provide useful compute. Fixing this does not require novel architectures, which are expensive and time-consuming investments, but rather paying attention to the data so that compute is located next to it.
Intel is already working to rearchitect our hardware and software. This is the key technology driver behind our message, expressed by Aditya Navale (Intel Fellow, director of GPU core IP architecture), that “It’s always software first that drives our architecture.”
Software first means that we must move away from the COTS thinking that is prevalent in the computer industry. Instead, the industry must get into software optimization. We are even rethinking software APIs and how they force data to be moved and copied. In many cases, such software overhead is boilerplate that can be avoided, thus precluding unnecessary data movement, power consumption, heat generation, and transmission latency - all of which are bad. Software first means smarter software that will result in more efficient hardware.
Next, we must consider the physics of power transmission and rethink how we get power to a semiconductor device and what happens to the power that does make it inside the package. Currently, only about half the power delivered to a semiconductor device gets into the package.¹
That is a huge loss that does nothing more than generate heat. This massive inefficiency is due in large part to voltage converters, which are required to create and transmit a range of low voltages to service all the semiconductor devices on a board.²
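To put the two estimates quoted so far side by side, here is a minimal sketch of a power budget. It assumes the two fractions can simply be multiplied, which is a simplification, and the 600 W figure is a hypothetical input rather than a measurement. The text also notes that some data movement is useful work, so the "moving bits" share is not pure waste.

```python
# Rough power budget using the two estimates quoted in the text.
# Assumption: the fractions compose by simple multiplication; real systems vary.
FRACTION_REACHING_PACKAGE = 1 / 2  # "about half" of delivered power gets into the package
FRACTION_MOVING_BITS = 2 / 3       # "roughly two-thirds" of in-package power moves bits


def power_budget(delivered_watts: float) -> dict:
    """Split delivered power into delivery loss, data movement, and the remainder."""
    in_package = delivered_watts * FRACTION_REACHING_PACKAGE
    moving_bits = in_package * FRACTION_MOVING_BITS
    return {
        "lost_before_package": delivered_watts - in_package,
        "moving_bits": moving_bits,
        "remaining_for_compute": in_package - moving_bits,
    }


budget = power_budget(600.0)  # 600 W is a hypothetical delivered power
for name, watts in budget.items():
    print(f"{name}: {watts:.0f} W")
```

Under these assumptions, only about one-sixth of the delivered power is left for anything other than delivery losses and data movement.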
Most semiconductor devices are designed using a COTS model to meet the needs of many customers. This bag-of-chips approach makes voltage converters a necessary component in creating a working board. For power efficiency reasons, it is essential that we rethink this model. The physics of power transmission mandates that we move to semiconductor packages that accept higher voltages such as 48 or 480 volts and bring voltage converters closer to consumers to reduce transmission losses. Nineteenth-century physicists quickly learned that high voltages are required for efficient power transmission. These insights resulted in standardized voltages that are now the basis of our power grid. Let’s bring those same 19th-century insights into 21st-century semiconductor design.
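The reason higher voltages help can be sketched with the standard resistive loss relation, loss = I² × R, where the current I equals load power divided by supply voltage. The load power, path resistance, and 12 V baseline below are illustrative assumptions, not figures from this blog; only the 48 V and 480 V values come from the text. The model also ignores converter efficiency and treats the delivery path as a pure resistance.

```python
def conduction_loss_watts(load_watts: float, supply_volts: float,
                          path_resistance_ohms: float) -> float:
    """Resistive loss (I^2 * R) in the delivery path for a fixed load power."""
    current_amps = load_watts / supply_volts
    return current_amps ** 2 * path_resistance_ohms


LOAD_WATTS = 480.0            # hypothetical load
PATH_RESISTANCE_OHMS = 0.001  # hypothetical 1 milliohm delivery path

# 12 V is an assumed baseline; 48 V and 480 V are the voltages named in the text.
losses = {
    volts: conduction_loss_watts(LOAD_WATTS, volts, PATH_RESISTANCE_OHMS)
    for volts in (12.0, 48.0, 480.0)
}
for volts, loss in losses.items():
    print(f"{volts:g} V supply: {loss:.4f} W lost in the delivery path")
```

Because the loss scales with the inverse square of the supply voltage, a 4x increase in voltage cuts the resistive loss by 16x in this simplified model.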
Historically, the semiconductor industry has pushed to increase processor frequencies and single-core performance when creating faster computational devices. This results in leakier devices and higher operating voltages, creating huge power inefficiencies. The fix for such massive inefficiency is already a reality at Intel and is part of our early incremental steps on the path to zettascale computation. To understand our approach, consider one of the simplest models of CPU power consumption, specifically Power = capacitance × voltage² × frequency, as defined in equation 1 in the Enhanced Intel® SpeedStep® Technology for the Intel® Pentium® M Processor document.³
Today’s servers use a turbo boost voltage of 1.1 V. Commodity PC processors tend to use higher turbo boost voltages in the 1.4 V to 1.5 V range. Now examine the power savings if we run the logic of those same devices at 0.35 V. You will see (per the endnote) that we get roughly a 10x performance improvement per watt for server chips and even more for PC chips simply by dropping this voltage.⁴
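The arithmetic behind that claim can be checked with a short sketch. It applies the simple power model from the SpeedStep document with capacitance and frequency held fixed, so only the squared voltage ratio remains. The function and variable names are illustrative, and the model deliberately ignores leakage and other real-world effects.

```python
def dynamic_power_ratio(v_from: float, v_to: float) -> float:
    """Ratio of dynamic power at v_from to dynamic power at v_to.

    Uses Power = capacitance * voltage**2 * frequency with capacitance and
    frequency held fixed, so both cancel and only the voltages remain.
    """
    return (v_from / v_to) ** 2


TARGET_VOLTS = 0.35  # low logic voltage discussed in the text

# Turbo boost voltages quoted in the text.
turbo_volts = {
    "server": 1.1,
    "PC (low end of range)": 1.4,
    "PC (high end of range)": 1.5,
}

ratios = {name: dynamic_power_ratio(v, TARGET_VOLTS) for name, v in turbo_volts.items()}
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.1f}x less dynamic power at {TARGET_VOLTS} V")
```

Under this model the server case works out to roughly 9.9x, and the PC range to roughly 16x to 18.4x.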
Finally, as physics has shown, devices run more efficiently at colder temperatures. We are working on enabling colder operation of devices, which will result in higher performance at lower temperatures. Of course, semiconductor design and manufacturing are much more complex, but we can compensate for the myriad of issues not reflected in this simple power consumption model (capacitance, gate leakage, voltage drop due to distance, etc.) by using more logic to deliver high performance per watt. For even better efficiency, we want to run the on-device logic at 0.2 V, but there is no industry commitment to this voltage at this time.
I stress that the zettascale work discussed in this blog is already a reality at Intel. We don’t need to drop into the weeds, for example, to pursue a theoretical discussion of the feasibility of a low voltage approach. Our first step is to run at 0.65 V, which is already supported by Intel logic. Our next step is to push further towards lower voltages by running at 0.5 V or 0.6 V for even greater efficiency.
This blog is a follow-on to the vision presented in Raja Koduri’s chip notes that lay out Intel’s path to zettascale. Even better, watch this blog to stay abreast of our progress and advances as Intel moves to zettascale performance without relying on COTS scaling and massive power budgets to advance datacenter performance.
1 These estimates are supported by many sources and studies, such as https://hpc.pnl.gov//modsim/2014/Presentations/Kestor.pdf and https://ieeexplore.ieee.org/document/6704670.
2 Many sources and studies support this, such as https://prace-ri.eu/wp-content/uploads/hpc-centre-electricity-whitepaper-2.pdf and https://www.electronicdesign.com/power-management/whitepaper/21170904/electronic-design-data-centers-feel-the-power-density-pinch.
3 More detailed information can be found in Enhanced Intel® SpeedStep® Technology for the Intel® Pentium® M Processor. This blog uses equation 1 from this publication to provide a simple model of power consumption.
4 Keeping the capacitance and frequency fixed in equation 1 from Enhanced Intel® SpeedStep® Technology for the Intel® Pentium® M Processor, we see that the ratio of power at the two logic voltages is (1.1)²/(0.35)², implying a power savings of approximately 9.88x when using the lower voltage compared to the current server voltage.