
Intel Unveils Details of 5th-Gen Intel® Xeon® Scalable Processors Among Eight Papers at ISSCC 2024

ScottBair

Scott Bair is a key voice at Intel Labs, sharing insights into innovative research for inventing tomorrow’s technology.

Highlights:

  • The 2024 IEEE International Solid-State Circuits Conference (ISSCC) will be hosted in San Francisco, California, February 18th through 22nd.
  • Intel’s contributions at the conference span one tutorial, two special event panels, three forums, eight Intel conference papers, and two papers co-authored with university partners.
  • Intel will unveil details of the 5th-Generation Intel® Xeon® Scalable Processors at this year’s conference.

The 71st IEEE International Solid-State Circuits Conference (ISSCC) will be hosted in San Francisco, California, February 18th through 22nd. Intel’s contributions at the conference span one tutorial, two special event panels, three forums, and eight conference papers. Intel researchers and engineers will present various circuit advancements, including optical transmitters, voltage regulators, and the integration of power through-silicon via (TSV) technology in an SRAM array.

Notably, Intel will unveil further details of the 5th-Generation Intel® Xeon® Scalable Processors at this year’s conference. Launched in December 2023, the latest Xeon Scalable processors deliver an 18% performance improvement for general integer compute workloads and a 24% improvement for floating-point workloads at iso power versus the 4th-Gen Xeon Scalable Processors. These gains come from an improved core, enhanced process technology, increased core count, a significantly larger cache, higher DDR memory speeds, die reductions, and power efficiency improvements at idle conditions. Intel Labs also contributed other key technologies to the 5th-Gen Xeon Processors.

In other exciting news, the ISSCC program includes a special event panel to honor the legacy of Gordon Moore, co-founder of Intel. In 1965, Moore made the observation now known as Moore’s Law: the number of transistors on an integrated circuit doubles at a regular interval (initially every year, revised in 1975 to every two years) with minimal rise in cost. The semiconductor industry has continually innovated to squeeze more and more transistors onto ever-smaller chips and maintain the pace of Moore’s Law. Simultaneously, the world of digital electronics has been transformed through continuous advancements. These steady innovations have fueled other fields, including artificial intelligence, quantum, and biomedical engineering. In recognition of Gordon Moore’s impact on the industry, the conference session will include a fireside chat celebrating his life and legacy and venturing into the next chapter of Moore’s Law in the context of upcoming IC ecosystems.

Three of Intel’s papers – A 224Gb/s sub-pJ/b PAM-4 and PAM-6 DAC-Based Transmitter in 3nm FinFET; A 4×64Gb/s NRZ 1.3pJ/b Co-Packaged and Fiber-Terminated 4-Ch VCSEL-Based Optical Transmitter; and An 87% efficient 2V-input, 200A Voltage Regulator Chiplet enabling vertical power delivery in multi-kW Systems-on-Package – were invited for live demo sessions at the conference. Continue reading below to learn more about Intel’s contributions.

 

Invited Speakers and Panelists

T8: 3D Flash Memory from Technology to the System: Past, Present and Future Developments

Speaker: Violante Moschiano

Meeting the increasing demand for non-volatile memory in a range of critical applications has required improving memory cost, performance, and power consumption. Moving from 2D to 3D NAND has become critical for meeting these requirements. Substantial density increase is enabled by wafer-level stacking of more than 300 layers and by multi-level cells via threshold-voltage scaling, with up to 5 bits/cell. However, such stacking and threshold-voltage control introduce sensitivities to reliability and variations, which impact overall system performance and must be managed with smart circuit design. This tutorial covers the fundamentals of 3D NAND flash, describing the main blocks and major design approaches that have led to the most recent innovations in the field. The tutorial goes on to describe tradeoffs between the components that impact system operation, with intuition on how component-level metrics (tREAD, tPROG, trigger rate) translate to system requirements (MB/s, IOPS, QoS). The tutorial concludes with an overview of future trends in flash memory to provide perspectives on the evolution of NAND over the next decade.
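As a rough illustration of how page read and program times can translate into a throughput figure, the sketch below divides the bytes moved per array operation by the operation latency. Every number in it (page size, plane count, die count, and both timings) is a hypothetical placeholder rather than a value from the tutorial, and the model ignores bus transfer time, error correction, and controller overhead.

```python
# Minimal sketch: array-limited NAND throughput from component timings.
# All parameter values below are hypothetical, chosen only for illustration.

def array_throughput_mb_per_s(page_bytes, planes, dies, op_time_s):
    """Bytes moved per parallel array operation divided by its latency, in MB/s."""
    return page_bytes * planes * dies / op_time_s / 1e6

PAGE_BYTES = 16 * 1024   # hypothetical page size (16 KiB)
PLANES = 4               # hypothetical planes per die, operated in parallel
DIES = 8                 # hypothetical dies operated in parallel
T_READ_S = 50e-6         # hypothetical page read time
T_PROG_S = 500e-6        # hypothetical page program time

read_mb_s = array_throughput_mb_per_s(PAGE_BYTES, PLANES, DIES, T_READ_S)
prog_mb_s = array_throughput_mb_per_s(PAGE_BYTES, PLANES, DIES, T_PROG_S)

print(f"Array-limited read throughput:    {read_mb_s:.1f} MB/s")
print(f"Array-limited program throughput: {prog_mb_s:.1f} MB/s")
```

Under these assumptions, a tenfold difference between program and read time shows up directly as a tenfold difference in sustained throughput, which is the kind of component-to-system relationship the tutorial describes.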

 

EE3: Mixed-Foundry Chiplets? Opportunities and Challenges

Panelist: Lalitha Immaneni

One advantage of chiplets is the ability to integrate chiplets built on different manufacturing processes to realize SoCs with optimal performance/$. An I/O chip, for instance, could be built by one foundry, the core processor could be built by another foundry, and then those chips could be put together on a package. This creates the potential for mixing and matching chiplets from different foundries. To make this happen, the industry must overcome many challenges, including standardized interfaces between chiplets, verification of whole SoCs using chiplets made from different processes, design flows from multiple foundries, and reliability assurance, including thermal and electromagnetic interactions between chiplets. A mixed-foundry chiplet ecosystem will be crucial to facilitate productization of complex systems-on-chiplet.

 

EE5: The Legacy of Gordon Moore

Panelist: Ian Young

Moore’s Law has propelled the semiconductor industry for decades, transforming the world through advancements in digital electronics, and to some extent, analog and RF electronics. These advancements have fueled other engineering fields such as artificial intelligence, biomedical engineering and quantum engineering. This session will include a fireside chat with semiconductor and IC design luminaries celebrating the life and legacy of Gordon Moore, discussing the impact of Moore’s law on our industry, and venturing into the next chapter of Moore’s law in the context of upcoming IC ecosystems.

 

Forum 2: Energy-Efficient AI-Computing Systems for Large-Language Models

Cloud Processors for LLM Inference

Speaker: Sailesh Kottapalli

Large-language models (LLMs), such as ChatGPT and Bard, recently gained tremendous attention by demonstrating astonishing capabilities in recognizing, summarizing, translating, predicting, and generating text and other content based on extensive knowledge from massive datasets. As LLMs serve as a crucial tool for human-to-machine communication, these models are driving a paradigm shift in the capabilities and possibilities for AI computing. The enormous energy consumption of LLM training and inference has emerged as the key limitation to future AI computing. This forum presents the current and next-generation circuits, architectures, and systems for high-performance computing (HPC) to address the energy-efficiency challenges associated with LLMs. This includes GPU and HPC systems, cloud server SoCs, accelerators, high-bandwidth access to storage, in-package high-bandwidth memory, and DRAM processing in memory. Furthermore, this forum explores LLM quantization techniques to enable next-generation mobile SoCs for LLM inference. This forum welcomes experts across industry and research organizations to present innovations to enable future energy-efficient AI-computing systems for LLMs.

 

Forum 5: Recent Developments in High-Performance Frequency Synthesis Circuits and Systems

High Performance Digital Fractional-N PLLs for Connectivity Standards

Speaker: Ashoke Ravi

Frequency synthesizers are among the most critical blocks in wireless, wireline, and digital clocking applications. This forum will cover the latest advances in frequency synthesis circuits and systems to efficiently generate LO signals with low phase noise, low spurious tones, and large modulation bandwidth. Prior-art techniques will be discussed in depth, such as energy-efficient reference clocks, high-FOM wide-tuning-range VCOs, low-cost low-power PLLs, and modern fractional-N digital PLLs. Special attention will also be given to pulling and spur mitigation techniques, and injection-locked frequency multipliers. The forum will conclude by exploring mm-wave PLLs for 5G communication systems, and FMCW generation for high-performance car radars.

 

Forum 6: Toward Next Generation of Highly Integrated Electrical and Optical Transceivers

Beyond 200Gbps Electrical Transceivers – Circuit Architecture, Design Implementation and Silicon Results

Speaker: Ariel Cohen

The next generation of highly integrated transceivers for high-throughput applications poses significant design challenges in terms of power efficiency, signal integrity, ISI and noise cancellation. This forum discusses the key issues for deploying 100G+ SERDES and design approaches for 200G+, including noise mitigation, power-efficient analog/digital equalization schemes (CTLE, analog FFE, DSP FFE/DFE/MLSD), modulation, and system integration (packaging, connectors, etc.). Optical transceivers also play a crucial role in extending the reach of electrical interconnects as data rates continue to increase. Various aspects of optical transceivers based on silicon photonics are discussed, such as foundry perspectives, directly modulated vs. coherent optical links, packaging techniques and fiber termination challenges. In addition, the forum covers emerging technologies including co-packaged optics and heterogeneous integration of both photonic and electronic chiplets, promising denser integration while introducing new challenges.

 

Intel Conference Papers

2.3 Emerald Rapids: 5th-Generation Intel® Xeon® Scalable Processors
O. Munch, N. Nassif, C. L. Molnar, J. Crop, R. Gammack, C. P. Joshi, G. Zelic, K. Munshi, M. Huang, C. R. Morganti, S. Kandula, A. Biswas

Emerald Rapids (EMR) is the next-generation Xeon Scalable Processor, composed of 2 dies in a multi-chip package (MCP), with 64 cores, more than 300MB of shared L3 cache, 8 DDR5 channels at 5600MT/s with 1DPC, 32GT/s PCIe/CXL lanes, 20GT/s UPI, and integrated accelerators. EMR delivers an 18% performance improvement for general integer compute workloads and a 24% improvement for floating-point workloads at iso power versus Sapphire Rapids (SPR), achieved through an improved core, enhanced process technology, increased core count, a significantly larger cache, higher DDR memory and UPI speeds, die reductions, and power efficiency improvements. The increase in cores and over 2.5 times the cache was implemented in lower total silicon area while also simplifying packaging, assembly, and test.
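To put the memory configuration in perspective, the short sketch below estimates the peak theoretical DDR5 bandwidth implied by 8 channels at 5600MT/s. It assumes each channel transfers 64 data bits (8 bytes) per transfer, excluding ECC; that width is an assumption of this sketch, not a figure from the abstract, and sustained bandwidth in practice will be lower.

```python
# Minimal sketch: peak theoretical memory bandwidth for 8 DDR5 channels at 5600 MT/s.
# Assumption (not from the abstract): 64 data bits (8 bytes) per channel per transfer.

CHANNELS = 8
TRANSFERS_PER_SEC = 5600e6   # 5600 MT/s, from the abstract
BYTES_PER_TRANSFER = 8       # assumed 64-bit data width per channel, ECC excluded

def peak_bandwidth_gb_per_s(channels, transfers_per_sec, bytes_per_transfer):
    """Return peak bandwidth in decimal gigabytes per second."""
    return channels * transfers_per_sec * bytes_per_transfer / 1e9

peak_gb_s = peak_bandwidth_gb_per_s(CHANNELS, TRANSFERS_PER_SEC, BYTES_PER_TRANSFER)
print(f"Peak theoretical DDR5 bandwidth: {peak_gb_s:.1f} GB/s")
```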

 

7.2 A 224Gb/s sub-pJ/b PAM-4 and PAM-6 DAC-Based Transmitter in 3nm FinFET
Cusmai, N. Familia, E. Kuperberg, M. Nashash, D. Gottesman, D. Kumar, Z. Marcus, Y. Horwitz, S. Zalcman, J. Kim, S. Kundu, I. Radashkevich, Y. Segal, D. Lazar, U. Virobnik, M. P. Li, A. Cohen

In this paper, Intel presents a 224Gb/s transmitter based on a 7b DAC driver with 9-tap FFE, fabricated in 3nm FinFET technology. The TX uses quarter-rate 28GHz clocking for a combined single-stage, current-mode 4:1 MUX and driver with a replica driver for phase adaptation. An LC-PLL with a tuning range of 21.2 to 30.0GHz supports operation in both PAM-4 and PAM-6 modes. The TX achieves 1Vppd swing and 0.92pJ/b analog energy efficiency while showing 36.0dB SNDR, 55mUI J3u03 and 62fs rms jitter.
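The figures in this abstract can be cross-checked with some simple arithmetic. The sketch below derives the symbol rate and quarter-rate clock for each modulation mode and the analog power implied by the energy efficiency. PAM-4 carries 2 bits per symbol; the 2.5 bits per symbol used for PAM-6 is an assumption (a common mapping packs 5 bits into two PAM-6 symbols) and is not stated in the abstract.

```python
import math

# Minimal sketch: relate data rate, modulation, quarter-rate clock, and analog power.
DATA_RATE_GBPS = 224.0
PLL_RANGE_GHZ = (21.2, 30.0)   # LC-PLL tuning range from the abstract
CLOCK_DIVISION = 4             # quarter-rate clocking
ENERGY_PJ_PER_BIT = 0.92       # analog energy efficiency from the abstract

# PAM-6 at 2.5 bits/symbol is an assumption of this sketch, not a source figure.
BITS_PER_SYMBOL = {"PAM-4": math.log2(4), "PAM-6": 2.5}

def quarter_rate_clock_ghz(data_rate_gbps, bits_per_symbol):
    """Symbol rate (GBaud) divided by the clock division ratio."""
    baud = data_rate_gbps / bits_per_symbol
    return baud / CLOCK_DIVISION

clocks = {mode: quarter_rate_clock_ghz(DATA_RATE_GBPS, bits)
          for mode, bits in BITS_PER_SYMBOL.items()}

for mode, f_ghz in clocks.items():
    in_range = PLL_RANGE_GHZ[0] <= f_ghz <= PLL_RANGE_GHZ[1]
    print(f"{mode}: {f_ghz * CLOCK_DIVISION:.1f} GBaud, "
          f"clock {f_ghz:.1f} GHz, within PLL range: {in_range}")

# Gb/s multiplied by pJ/b gives mW.
analog_power_mw = DATA_RATE_GBPS * ENERGY_PJ_PER_BIT
print(f"Implied analog power: {analog_power_mw:.1f} mW")
```

Under these assumptions the PAM-4 clock matches the 28GHz quoted in the abstract, and the PAM-6 clock of about 22.4GHz also falls inside the stated PLL tuning range.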

 

14.9 A Monolithic 10.5W/mm² 600MHz Top-Metal and C4 Planar Spiral Inductor-Based Integrated Buck Voltage Regulator on 16nm-Class CMOS
Kim, H. K. Krishnamurthy, Z. Ahmed, N. Desai, S. Weng, A. Augustine, H. T. Do, J. Yu, P. D. Bach, X. Liu, K. Radhakrishnan, K. Ravichandran, J. W. Tschanz, V. De

This paper demonstrates a monolithic, process-node-agnostic, fully digital high-frequency buck voltage regulator with two kinds of on-die planar spiral inductors: (i) one utilizing 3 thick TM layers to implement multiple turns (LTM), and (ii) an industry-first approach that uses the C4 layer, normally used to implement bumps, to construct a 1.5-turn inductor (LC4). Both are implemented on a 16nm-class CMOS FinFET process. The inductor structures enable lower losses, higher current density, and superior scalability and flexibility compared with traditional on-die planar lateral spiral inductors without magnetics.

 

15.2 A 2048×60m4 SRAM Design in Intel 4 with Around-the-Array Power-Delivery Scheme Using PowerVia
Kim, Y. Kim, A. Shrivastava, G. Park, A. Mahadevan Pillai, K. Bannore, T. Doan, M. Rahman, G. Baek, C. Ong, X. Wang, Z. Guo, E. Karl

The ever-increasing demand for energy-efficient computing motivates novel innovations in advanced process technology. Power through-silicon via (TSV) technology is introduced to utilize low-resistance interconnects on the backside as a power delivery network (PDN). Its benefits include reduced IR drop in the PDN and relaxed signal routing congestion on the frontside. Utilizing technology with power TSVs, a fabricated CPU core enabled >90% standard cell utilization and demonstrated ~6% higher performance with ~30% lower IR drop. Integration of power TSVs in SRAM array design carries unique tradeoffs compared with general logic design, requiring careful design of bitcells and array peripheral circuits to enable energy-efficient and dense embedded memory. This paper presents 108Mb high-current 6T SRAM (HCC) and 124Mb high-density 6T SRAM (HDC) designs implemented in 4nm CMOS with power TSV technology, demonstrating improved or comparable VMIN and improved performance with 2% higher bit density in the HCC 2048×60m4 instance compared to similar array designs without power TSV. In addition, high-volume manufacturing silicon data confirmed that there is no unique yield or performance failure mode due to power TSV in this design.

 

18.2 A 4×64Gb/s NRZ 1.3pJ/b Co-Packaged and Fiber-Terminated 4-Ch VCSEL-Based Optical Transmitter
Mondal, J. Qiu, S. Krishnamurthy, J. Kennedy, S. Bose, T. Acikalin, S. Yamada, J. Jaussi, M. Mansuri

As bandwidth demand increases, electrical interconnects suffer from limited reach due to channel loss. Multi-mode vertical-cavity surface-emitting laser (VCSEL)-based optical interconnect can enable high-bandwidth connectivity while extending the reach to tens of meters. Pluggable VCSEL-based optical modules are widely used in data center communication but do not meet stringent system requirements such as interconnect latency, bandwidth (BW), or energy efficiency. This paper presents a co-packaged VCSEL-based optical TX solution that integrates a VCSEL driver (VCDRV) IC, VCSEL array, and fiber termination on the XPU/SW package. A complex-zero continuous-time linear equalizer (CTLE) is introduced to equalize a complex-pole pair in the VCSEL optical response and enhance the maximum achievable baud rate for best latency and energy efficiency. A low-power, low-jitter resonant clocking architecture improves system jitter performance and includes a transmission-line (TL)-based resonant distribution and a wide-tuning-range quadrature generation (quad-gen). Finally, a low-power serializer and electrical driver architecture employs pulse-width correction for improved eye symmetry.

 

22.3 A 76mW 40GS/s 7b Time-Interleaved Hybrid Voltage/Time-Domain ADC with Common-Mode Input Tracking
Whitcombe, S. Kundu, H. Chandrakumar, A. Agrawal, T. Brown, S. Callender, B. Carlton, S. Pellerano

In this work a voltage-to-time converter (VTC) acts as a high-speed buffer and a time-to-digital converter (TDC) operates in parallel with a time-to-voltage converter (TVC) to generate a coarse signal estimate to speed up a single-comparator SAR ADC. To improve upon prior art, this work uses the hybrid architecture to optimize TI ADC floorplan and enhances reliability through (1) a VTC with common mode input voltage tracking and (2) a merged flash TDC+TVC to guarantee TDC monotonicity with low added power. The 22nm CMOS prototype consumes 76mW at 40GS/s including input and reference buffers and achieves 32.3dB SNDR with a 20GHz input, for 57fJ/step FoMw.
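The reported figure of merit can be sanity-checked with the standard Walden formula, FoM = P / (2^ENOB × fs), where ENOB = (SNDR − 1.76) / 6.02. The sketch below applies it to the power, sample rate, and SNDR quoted above; it is an illustrative check under that textbook definition and not the authors' own calculation.

```python
# Minimal sketch: Walden figure of merit from power, sample rate, and SNDR.

def enob(sndr_db):
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02

def walden_fom_fj_per_step(power_w, sample_rate_hz, sndr_db):
    """Energy per conversion step in femtojoules."""
    return power_w / (2 ** enob(sndr_db) * sample_rate_hz) * 1e15

POWER_W = 76e-3          # 76mW, from the abstract
SAMPLE_RATE_HZ = 40e9    # 40GS/s, from the abstract
SNDR_DB = 32.3           # measured with a 20GHz input, from the abstract

fom = walden_fom_fj_per_step(POWER_W, SAMPLE_RATE_HZ, SNDR_DB)
print(f"ENOB = {enob(SNDR_DB):.2f} bits, FoMw = {fom:.1f} fJ/conv-step")
```

The result lands at roughly 56 fJ per conversion step, in line with the 57fJ/step quoted in the abstract once rounding is taken into account.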

 

28.4 A Monolithic 12.7W/mm²-Pmax, 92% Peak-Efficiency CSCR-First Switched-Capacitor DC-DC Converter
Butzen, H. Krishnamurthy, J. Yu, Z. K. Ahmed, S. Weng, K. Ravichandran, R. H. Ahangharnejhad, J. Waldemer, C. Pelto, J. Tschanz

Near-compute voltage converters are key in supplying power close to where it is consumed while minimizing total socket input current. Switched-inductor buck converters are a popular choice, but because they rely on large volume on-board or in-package inductors, they are increasingly hard to integrate close to the load domain. Monolithic hybrid converters have seen a recent uptick in popularity in the literature, but their efficiency and/or power density is currently not acceptable for most computing applications. Monolithic Switched-Capacitor Voltage Regulators (SCVRs) use on-die capacitors which means they tend to be substantially smaller in volume, and can also be integrated entirely on the same die as the load domain, or close to the load but on a separate die using advanced packaging techniques. Moreover, when using the Continuously Scalable Conversion-Ratio (CSCR) topology, they can maintain high efficiency over wide Vin/Vout ranges, just like an inductor-based VR would. That being said, because of the CSCR topology’s poor scaling towards higher input voltages, they have yet to be demonstrated at high input voltage and high output power simultaneously. In this work, a CSCR-First topology is introduced that significantly improves the performance of high-voltage CSCR SCVRs.

 

28.6 An 87% efficient 2V-input, 200A Voltage Regulator Chiplet enabling vertical power delivery in multi-kW Systems-on-Package
Jain, S. Xu, R. Kaushal, C. Mariscal, H. Caballero, T. Salus, C. Schaef, A. Deka, A. Payala, K. Chen, H. Do, J. Douglas

A stand-alone 200A voltage regulator (VR) chiplet is built on a 16nm FinFET CMOS foundry process. Architectural and physical choices that enable high thermal design current, and further doubling of current capacity via 2-way ganging, are presented. Distributed magnetic inductors on the package yield a volumetric current density of 0.75A/mm³. In this paper, 40x current capacity is demonstrated over previously reported integrated VRs, with a peak efficiency of 87%.

 

Co-Authored Conference Papers

16.6 PACTOR: A Variation-Tolerant Probing-Attack Detector for a 2.5Gb/s×4-Channel Chip-to-Chip Interface in 28nm CMOS
In collaboration with Columbia University

This paper presents an on-chip, variation-tolerant probing-attack detector, PACTOR, based on an SRAM-cell-like comparator that monitors the capacitance change on a PCB trace. We integrate the detector with a quad-channel high-speed I/O capable of achieving a data rate of up to 2.5Gbps/channel, similar to the DDR4 spec. At a typical condition, PACTOR can detect capacitance changes as small as 0.071pF, which is much smaller than the loading capacitance of a high-end commercial probe (~0.5pF). Notably, by employing on-chip temperature-sensor-compensated detection thresholds and a binary-search algorithm, PACTOR delivers a detection precision of 0.5pF robustly across -20 to 105°C and 0.65 to 1.1V. Fabricated in 28nm CMOS, the prototype occupies 1785.3µm²/channel and consumes 0.0362mW/channel.

 

16.7 Power and EM Side-Channel-Attack-Resilient AES-128 Core with Round-Aligned Globally-Synchronous-Locally-Asynchronous Operation Based on Tunable Replica Circuits
In collaboration with University of Texas

This work presents a Round-Aligned Globally-Synchronous-Locally-Asynchronous (RA-GSLA) architecture using Tunable Replica Circuits (TRCs) and stochastic parallel/serial module activity within one clock cycle as a countermeasure against Power/EM SCA. The key design attributes are (i) maintaining round integrity and synchronous operation at the clock boundary while performing intra-cycle asynchronous SCA-critical operations, (ii) a TRC-based completion detection scheme and randomized fire timing incurring low power/area overhead, (iii) randomized sequencing and intra-round serial/parallel/null operation of security-critical modules, enabling greater/less than exactly one operation per module per round, improving SC entropy, (iv) timing and dataflow randomization for computations and register updates, (v) compatibility with any AES version (128/192/256) using an external synchronous key schedule, and (vi) a fully synthesizable, all-digital, single-supply and technology-scaling-friendly design, without using any analog components.

About the Author
Scott Bair is a Senior Technical Creative Director for Intel Labs, chartered with growing awareness for Intel’s leading-edge research activities, like AI, Neuromorphic Computing and Quantum Computing. Scott is responsible for driving marketing strategy, messaging, and asset creation for Intel Labs and its joint-research activities. In addition to his work at Intel, he has a passion for audio technology and is an active father of five children. Scott has over 23 years of experience in the computing industry bringing new products and technology to market. During his 15 years at Intel, he has worked in a variety of roles spanning R&D, architecture, strategic planning, product marketing, and technology evangelism. Scott has an undergraduate degree in Electrical and Computer Engineering and a Master of Business Administration from Brigham Young University.