Intel Details IPU and New Circuit Innovation at ISSCC 2023

ScottBair · 02-20-2023

Scott Bair is a key voice at Intel Labs, sharing insights into innovative research for inventing tomorrow’s technology.

Highlights:

The IEEE International Solid-State Circuits Conference (ISSCC), 2023, runs from February 19-23 in San Francisco, California.
Intel is excited to contribute seven papers, as well as special event panels, forums, a short course, and a tutorial.
Intel’s contributions include advanced building blocks for power delivery and management and Wi-Fi transceivers in Intel platforms as well as a fault-injection attack-resistant AES engine and details on the Intel® Infrastructure Processing Unit E2000.

This year, the IEEE International Solid-State Circuits Conference (ISSCC) runs from February 19-23 in San Francisco, California. Intel is pleased to present seven papers at this global forum for advances in solid-state circuits and systems-on-a-chip. Additionally, researchers from Intel Labs are participating in special event panels, forums, a short course, and a tutorial. These contributions include advanced building blocks for power delivery and management and Wi-Fi transceivers in Intel platforms.

Notably, Intel was invited to present its industry paper, “An In-depth Look at the Intel IPU E2000”, detailing the Intel® Infrastructure Processing Unit (Intel® IPU) E2000, formerly code-named Mount Evans. This 200GbE Intel IPU was co-designed with Google and is in production as of 2022. Its strong separation of infrastructure functions from tenant workloads allows tenants to take full control of the CPU. Cloud operators can offload infrastructure tasks to the IPU, maximizing CPU utilization and revenue. IPUs can also manage storage traffic, which reduces latency while using storage capacity efficiently via a diskless server architecture. With an IPU, customers can better utilize resources with a secure, programmable, stable solution that enables them to balance processing and storage.

Figure 1. A representation of fault-injection attacks on an AES engine.

Researchers will also present a live demonstration showing how fault-injection attacks can be mitigated by Intel Labs’ novel techniques detailed in the paper, “A 100Gbps Fault-Injection Attack Resistant AES-256 Engine with 99.1-to-99.99% Error Coverage in Intel 4 CMOS.” Fault-injection attacks (FIA), as depicted in Figure 1, are physical attacks that inject faults using lasers or glitches on the clock or power supply. These malicious attacks can be used to achieve a number of objectives, including the extraction of cryptographic keys, gaining privileged access, and modifying parameters in deep neural networks. In response to this emerging security issue, the Intel team details a source-agnostic fault-injection-attack resistant AES-256 accelerator with marked improvement in minimum-time-to-disclosure (MTD) against laser and undervoltage attacks, compared to an unprotected AES engine.

Full research papers are only available to conference attendees, but readers can find a preview of Intel Labs’ efforts below.

Conference Papers and Demos

An In-depth Look at the Intel IPU E2000

The Intel® Infrastructure Processing Unit (Intel IPU) E2000 is Intel’s first ASIC IPU device, a 200G product co-designed with Google and in production as of 2022. It features a rich packet processing pipeline, remote direct memory access (RDMA), and storage capability, including NVMe offload, along with an Arm Neoverse-based compute complex that lets customer-provided software execute features ranging from complex packet processing pipelines to storage transport to device management and telemetry. By combining general-purpose acceleration with software running in the compute complex, this IPU enables a rich variety of services to be provided to the attached client. Attached clients can range from compute hosts providing infrastructure as a service, to storage disks fronting a storage target, to accelerators such as GPUs or FPGAs fronting specialized processing functions. The IPU can even function as a pure SoC servicing custom appliances or downstream devices like smart switches. This broad deployment capability supports the rapid innovation necessary for the modern data center.

A 100Gbps Fault-Injection Attack Resistant AES-256 Engine with 99.1-to-99.99% Error Coverage in Intel 4 CMOS

Fault-injection (FI) attacks exploit corrupted ciphertexts from cryptographic engines to extract secret keys. A single fault injected into the penultimate AES round using directed laser pulses or voltage/clock glitches corrupts 4 output bytes, reducing key search space to a single guess with differential fault analysis (DFA) on 8 exploitable ciphertexts. FI countermeasures using redundant concurrent/time-interleaved computations incur 2× area/performance overheads. Conventional linear parity checkers provide insufficient fault coverage due to the non-linear characteristics of Sbox inverse operations. FI detection-based countermeasures, employing source-specific detectors such as substrate-current sensors for laser attacks and frequency-locked loops to detect clock glitches, respectively, are ineffective against generic FI attacks. In this paper, we present a source-agnostic FI-attack resistant AES-256 accelerator with 111× and 10,000× improvement in minimum-time-to-disclose (MTD) against laser and undervoltage attacks, respectively, compared to an unprotected AES engine. Arithmetic and parity-based checker circuits coupled with inverse and affine logic optimizations and byte-interleaved register placement enable 99.1% fault coverage against laser raster/box-scan injections (Fig. 1). Fine-grained placement of an all-digital laser detection circuit (LDC) within the AES core provides 13,400× higher margin for raster-scan laser pulse detections. Undervoltage attacks on FI-resistant AES show a measured 99.99% fault detection coverage and a 40mV positive slack in checker datapath to capture undervoltage faults.
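The remark about linear parity checkers and the non-linear S-box can be illustrated with a small experiment. The sketch below builds the standard AES S-box (field inversion followed by the affine step) and checks whether the parity of the output could be predicted by a linear/affine parity scheme. It illustrates the textbook S-box only; it is not the checker circuit from the paper, and all function names are illustrative assumptions.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 in GF(2^8)
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def affine(x):
    """AES affine step: linear over GF(2), plus the constant 0x63."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63


def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1


def parity_prediction_mismatches(table):
    """Count input pairs (a, b) for which the output parity is NOT the
    affine combination parity(f(a)) ^ parity(f(b)) ^ parity(f(0)).
    A count of zero means an affine parity predictor would be exact."""
    p0 = parity(table[0])
    return sum(
        1
        for a in range(256)
        for b in range(256)
        if parity(table[a ^ b]) != parity(table[a]) ^ parity(table[b]) ^ p0
    )


AFFINE_TABLE = [affine(x) for x in range(256)]
SBOX_TABLE = [affine(gf_inv(x)) for x in range(256)]

print("affine step mismatches:", parity_prediction_mismatches(AFFINE_TABLE))
print("full S-box mismatches: ", parity_prediction_mismatches(SBOX_TABLE))
```

The affine step alone is perfectly parity-predictable, while the full S-box is not; the difference comes entirely from the field inversion, which is the non-linear part the abstract refers to.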

A 1.8W High-Frequency SIMO Converter Featuring Digital Sensor-less Computational Zero-Current Operation and Non-Linear Duty-Boost

Power delivery components are critical for meeting the size and weight requirements of ultra-mobile electronic systems. The L & C passives in the power delivery sub-system occupy >50% of the total PCB area and often dictate the thickness of handheld devices. On the other hand, advanced power management capabilities demand multiple individually controllable voltage domains with high conversion efficiency to maximize battery life. Compared to traditional methods, single-inductor multiple-output (SIMO) converters promise a more balanced solution for these critical trade-offs. However, they are vulnerable to significant cross-regulation among the multiple outputs time-sharing a single inductor, especially when a large load transient occurs on one output at power levels above 1W. This work presents a high-power (1.8W), high-frequency SIMO converter in 16nm FinFET CMOS operating in DCM at 10MHz with an ultra-small 5-10nH inductor, delivering power to 4 outputs (1.4V/1.2V/1V/0.8V). The converter features (1) digital sensor-less computational zero-current operation with low overheads, where the zero-current timing is calculated directly from the energizing time and the inductor current slopes, cycle by cycle, every 8ns, (2) a digital SIMO controller running at 128MHz that computes the energizing times required for regulating each output and can be configured at runtime, via a register, to service the outputs in any order, and (3) a digital non-linear “duty-boost” ON-time correction for droop mitigation. A self-triggered soft-switching driver ensures that the low-side switch turns on at zero voltage across a wide range of loads and output voltages, thus eliminating voltage overstress of core transistors and body diode conduction loss.
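The sensor-less zero-current idea, computing when the inductor current returns to zero from the energizing time and the current slopes, can be sketched numerically. The sketch assumes an ideal buck-type stage in which the current rises at (V_in − V_out)/L while energizing and falls at V_out/L afterwards. The 1.8V input, 7.5nH inductance, 8ns energizing time, and function names are illustrative assumptions, not values or methods taken from the paper.

```python
def zero_current_time_ns(t_on_ns, v_in, v_out):
    """Time after the energizing phase ends at which the inductor current
    reaches zero, for an ideal buck stage in discontinuous conduction mode.

    Rising slope:  (v_in - v_out) / L
    Falling slope: v_out / L
    The inductance cancels, so only the slope ratio matters.
    """
    if not 0 < v_out < v_in:
        raise ValueError("expected 0 < v_out < v_in for a buck stage")
    return t_on_ns * (v_in - v_out) / v_out


def peak_current_a(t_on_ns, v_in, v_out, inductance_nh):
    """Peak inductor current at the end of the energizing phase.
    ns / nH reduces to s / H, so the result is in amperes."""
    return (v_in - v_out) * t_on_ns / inductance_nh


V_IN = 1.8      # assumed input voltage (not stated in the abstract)
L_NH = 7.5      # assumed inductance, mid-point of the quoted 5-10 nH range
T_ON_NS = 8.0   # assumed energizing time

for v_out in (1.4, 1.2, 1.0, 0.8):
    t_zc = zero_current_time_ns(T_ON_NS, V_IN, v_out)
    i_pk = peak_current_a(T_ON_NS, V_IN, v_out, L_NH)
    print(f"Vout={v_out:.1f} V  t_zero={t_zc:.2f} ns  I_peak={i_pk:.2f} A")
```

Because the zero-crossing time follows from known voltages and the commanded energizing time, no current sensor or zero-crossing comparator is needed in this idealized model.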

A Digital Low-Dropout (LDO) Linear Regulator with Adaptive Transfer Function Featuring 125A/mm2 Power Density and Autonomous Bypass Mode

Modern CPUs feature multiple power domains, most of which are grouped, with each group sharing a common power supply. Combining a common supply with per-domain dynamic voltage and frequency scaling (DVFS) implies a need for local voltage regulation, which in some systems on a chip (SoCs) is based on low-dropout regulators (LDOs). Operating conditions feature dynamically variable voltages (VIN & VOUT), ILOAD transients, and a variety of decoupling solutions. Such usage conditions usually cause VMIN to depend on VIN, which translates into cost (increased test time) and/or power/performance (VMIN guard band) penalties. In this paper, we report on the design of a CPU-grade digital LDO (DLDO) that supports power densities adequate for CPU performance requirements and mitigates the dependency of VMIN on the input voltage. The DLDO was implemented in a 10nm process node as part of a multi-domain SoC.

A Monolithic 26A/mm2 Imax, 88.5% Peak-Efficiency Continuously Scalable Conversion-Ratio Switched-Capacitor DC-DC Converter

As SoC complexity continues to increase, finer-grained power domains are being employed to allocate power more precisely and meet stringent requirements on power, performance, and battery life. This does, however, put additional strain on the power delivery system: each time a domain is divided into multiple domains, the sum of the maximum currents that must be supported increases beyond the current rating of the initial domain. For voltage converters, this has proven particularly troublesome. Switched-Capacitor Voltage Regulators (SCVRs) offer the promise of scalable voltage conversion without relying on in-package components, but have so far not lived up to expectations. Conventional SCVR topologies have demonstrated both high current density and efficiency, either through high-density on-die capacitors or by placing MIM capacitors on top of the load domain while minimizing the active silicon area of the converter, and thus cost. But because they don’t maintain high efficiency across a wide output/input voltage range, they have some of the same drawbacks as LDOs. The Continuously Scalable Conversion-Ratio (CSCR) topology, on the other hand, can maintain high efficiency across voltage conversion ratios (VCRs), but has never been demonstrated with sufficient current density to power applications other than energy scavenging. This work proposes a Phase-Merging Turbo (PMT) technique that can significantly increase the output current capability of a CSCR SCVR.

A 128Gb/s 1.95pJ/b D-Band Receiver with Integrated PLL and ADC in 22nm FinFET

Recent work has shown the ability of mm-wave (30-100 GHz) and subTHz (100-300 GHz) receivers to support the large bandwidths needed to meet growing data rate demands in a variety of applications, from dielectric waveguide to wireless links. Unfortunately, prior art excludes the integration of critical receiver blocks, like the phase-locked loop (PLL) or the analog-to-digital converter (ADC). A critical challenge in demonstrating subTHz receivers with an integrated ADC is the limited availability of process nodes that provide efficient performance for both RF and mixed-signal/digital circuits simultaneously. This work presents a D-band (140GHz) receiver (RX) which integrates the RX front-end with PLL and ADC. The RX is integrated in Intel 22nm FinFET process (22FFL), which is co-optimized for RF and digital performance, enabling state-of-the-art efficiency of 1.95pJ/b at a maximum data rate of 128Gb/s.
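As a quick sanity check on the headline figures, energy per bit multiplied by data rate gives the implied power draw, which works out to roughly 250mW here. The short sketch below does that arithmetic; it assumes the 1.95pJ/b figure applies at the full 128Gb/s rate and covers the whole receiver, which the abstract does not spell out, and the function name is illustrative.

```python
def implied_power_mw(energy_pj_per_bit, data_rate_gbps):
    """pJ/b times Gb/s gives mW, since 1e-12 J/b * 1e9 b/s = 1e-3 W."""
    return energy_pj_per_bit * data_rate_gbps


ENERGY_PJ_PER_BIT = 1.95   # from the paper title
DATA_RATE_GBPS = 128.0     # from the paper title

power_mw = implied_power_mw(ENERGY_PJ_PER_BIT, DATA_RATE_GBPS)
print(f"Implied receiver power: {power_mw:.1f} mW")
```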

A 1.67-Tb 5b/cell Flash Memory Fabricated in a 192-Layer Floating-Gate 3D-NAND Technology and Featuring a 23.3Gb/mm2 Bit Density

NAND flash SSD technology is a key enabler of today’s datacenters, and reducing cost per bit continues to be an important factor in accommodating the exponential growth in data storage. Intel led the industry with the successful deployment of four generations of 4b/cell (QLC) 3D NAND technologies for datacenter and client applications. This paper presents the industry’s first 5b/cell memory fabricated in Intel’s 192-layer floating-gate NAND technology. The paper will describe key innovations that enable reliable 5b/cell (penta-level cell, PLC) operation and features implemented to support system-level usage.
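The numbers in the title lend themselves to a little arithmetic. The sketch below shows how many distinct charge levels a cell must resolve at a given bits-per-cell, the capacity gain of 5b/cell over 4b/cell for the same number of cells, and the die area implied by dividing capacity by bit density. It assumes 1.67Tb means 1670Gb (decimal prefixes) and that the density figure is capacity divided by area; neither is stated in the abstract, and the function names are illustrative.

```python
def levels_per_cell(bits_per_cell):
    """Number of distinct threshold-voltage levels a cell must hold."""
    return 2 ** bits_per_cell


def implied_area_mm2(capacity_gb, density_gb_per_mm2):
    """Area implied by capacity / bit density (both in gigabits)."""
    return capacity_gb / density_gb_per_mm2


QLC_BITS = 4
PLC_BITS = 5
CAPACITY_GB = 1670.0        # 1.67 Tb, assuming decimal prefixes
DENSITY_GB_PER_MM2 = 23.3   # from the paper title

print("QLC levels:", levels_per_cell(QLC_BITS))
print("PLC levels:", levels_per_cell(PLC_BITS))
print(f"Capacity gain per cell, PLC vs QLC: {PLC_BITS / QLC_BITS:.2f}x")
print(f"Implied area: {implied_area_mm2(CAPACITY_GB, DENSITY_GB_PER_MM2):.1f} mm^2")
```

Doubling the number of levels for only a 25% capacity gain is why reliable 5b/cell operation is hard: each level gets a much narrower voltage window.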

Tutorial

T7: Fundamentals of Ultra-Low Voltage Embedded Memory Design

Eric Karl, Intel, Portland, OR

To meet the energy efficiency demands of future applications, system-on-a-chip (SoC) designs continue to march towards ultra-low-voltage operation. This tutorial will address the fundamental challenges for embedded memory operation at low voltages in advanced process technologies and cover single and multiport SRAM, logic flip-flop, logic latch arrays, and eFuse technologies prevalent in embedded design use. Memory assist circuits, repair, error correction, multi-VCC array design, clocking, and min-delay design strategies relevant to enabling ultra-low and near-threshold system operation will be explored. System designers will walk away with a stronger understanding of how to better navigate embedded memory usage in ultra-low-voltage designs.

Forum Contributions

Circuit Designs for 200+Gb/s Transceivers, presented by Jihwan Kim in Forum 1: Transceivers for Exascale: Towards Tbps/mm and sub-pJ/bit.

Advances in Highly Efficient All-Digital CMOS Transmitters for Wide Bandwidth Wireless Application, presented by Ofir Degani in Forum 3: Efficient Wireless Power Amplification and Linearization.

Universal Chiplet Interconnect Express (UCIe™): An Open Industry Standard Chiplet Interconnect for Next-Generation Systems on a Package, presented by Debendra Das Sharma in Forum 6: The Future of Heterogeneous Multi-Core Architectures for AI and Other Specialized Processing.

Special Event Panelists

EE2: Integrated Circuits in an Interconnected World

Panelist: Asako Toda, Intel, San Jose, CA

Data-centric applications such as automotive, IoT, and machine learning primarily depend on IC connectivity to meet the demands of high-bandwidth and energy-efficient communication. Product requirement diversity continues to grow, and IC components now face a bottleneck in system performance and power, particularly due to their interconnects. To ensure communication capabilities that meet the demands of emerging applications with ever-increasing features, additional research and focus is needed. This panel focuses on connectivity for the next generation of communication systems and brings together expert panelists to share their perspectives on topics in IC connectivity across wireless, wireline, chip-to-chip, and optical link communications.

EE3: The Path to Sustainable IC Ecosystems

Panelist: Todd Brady, Intel, Chandler, AZ

Sustainability has become a major concern in everyday life, and that concern now extends all the way to IC design. It has widened from energy management to include greenhouse gas emissions and pressure on natural resources throughout the product lifecycle. These challenges must be taken into account early in product design phases to rethink system architecture and circuit techniques, favoring frugality and reuse and minimizing the impact of manufacturing. A deep restructuring of the IC ecosystem to integrate eco-design and reuse will need to arise, either from top-down political intervention or from company-driven initiatives. How do we facilitate an economically viable path to a sustainable IC ecosystem? Will this come from market incentives, or are government regulations required? With growing awareness of this challenge, several initiatives have already emerged for sustainable electronics, coming from research labs, companies, citizens, and governments. This panel confronts these approaches and explores their potential to create an economically viable path, via market incentives or government regulation, to a sustainable value chain across the product lifecycle.

EE4: The Smartest Designer in the Universe, Post-Pandemic!

Panelist: Farhana Sheikh, Intel, Hillsboro, OR

At ISSCC 2020, there was a battle of epic proportions between industry, academia, and students to determine the smartest designer in the universe. Industry came out victorious. Now, at ISSCC 2023, as we return to an in-person conference, academia and students have their chance to get their revenge and set the record straight. In this interactive quiz show, three teams representing industry, academia, and students will compete for the honor and the prestigious title: “The Smartest Designer in the Universe.” In several rounds, the contestants will solve questions and puzzles covering all parts of electrical engineering. They will baffle you with their knowledge, surprise you with their wit, and entertain you with their to-the-point remarks. This is all topped with a gentle sauce of irony, since the smartest designer in the universe should be smart enough to appreciate the special relativity of it all. Join this session not only to support your own team but also to enroll in the game. Everybody will be able to actively participate using an app.

EE5: What will be the Essential Skills for IC Designers in the Next Decade?

Moderator: Mozhgan Mansuri, Intel, Hillsboro, OR

Panelist: Itamar Levin, Intel, Jerusalem, Israel

Based on emerging trends in design methodology, such as AI for IC design and verification, this session of academic and industry leaders will predict and discuss how future design automation will change the way IC designers work in the next decade. Will more and more IC designs be automated by then? Is our field shrinking? Are we attracting and training enough students to learn IC design to meet potential industry needs? Join this special evening topic session to get the perspective of the industry and academic leaders in IC design.

Short Course Presentation

Spin Qubits: Principles, Control/Readout Architectures, and Cryoelectronic Solutions, presented by Sushil Subramanian in the Principles of Quantum Computing and the Application of Cryoelectronics to Qubit Control and Readout Short Course.