Scott Bair is a key voice at Intel Labs, sharing insights into innovative research for inventing tomorrow’s technology.
Highlights:
- This year’s IEEE International Solid-State Circuits Conference (ISSCC) runs from February 16-20 in San Francisco, California.
- Intel is presenting six papers and two university collaborations at the main conference. Additionally, researchers from Intel are participating in a special event panel and two forums.
- Navid Shahriari, Senior Vice President of Intel Foundry Technology Development, opened the first plenary session with a presentation on Intel’s invited paper: AI Era Innovation Matrix.
- Intel is also releasing details on Intel® Xeon® 6 System-on-Chip, formerly codenamed “Granite Rapids-D.”
This year’s IEEE International Solid-State Circuits Conference (ISSCC 2025) runs from February 16-20 in San Francisco, California. Intel is pleased to present six papers and two university collaborations at the main conference. Additionally, researchers from Intel are participating in a special event panel and two forums. These contributions include advancements in package integration, integrated power and thermal management, and integrated photonics, among others.
On February 17th, Navid Shahriari, Senior Vice President of Intel Foundry Technology Development, opened the first plenary session with a presentation on Intel’s invited paper: AI Era Innovation Matrix. Artificial intelligence holds transformative potential for humanity, enhancing our ability to solve complex problems with speed and accuracy, and unlocking new realms of innovation and understanding. As data processing demand grows, so does the need for greater computing power in a smaller area with reduced energy consumption. Furthermore, the exponential scaling of parallel AI workloads is putting pressure on interconnect bandwidth density, latency, and power. The growing need for AI system scaling is driving the innovation frontier in silicon, packaging, architecture, and software. Shahriari’s talk detailed a matrix of technologies developed by Intel researchers that are driving progress at every level, from chips to systems. Some of these critical technology breakthroughs are being presented at this year’s conference: see the full list below to learn more. His entire talk can be viewed on the ISSCC YouTube channel.
Intel is also releasing details on Intel® Xeon® 6 System-on-Chip (SoC), formerly codenamed “Granite Rapids-D.” Intel Xeon 6 SoC targets virtualized Radio Access Network, Edge Server, networking, storage, and security segments. The SoC is optimized for computing and is aimed at scalar and data parallel workloads, which require low-latency, high-bandwidth memory, high-bandwidth PCIe 5.0 capabilities, and server-grade robustness. The Edge-optimized SKUs will offer confidential AI-enabled security and scale across different edge systems based on one architecture, supporting multiple Ethernet and accelerators.
Intel Xeon 6 SoC is a disaggregated SoC comprising one or more compute dies and a single I/O die with a dense integration of hardware accelerators. These accelerators offer more than twice the performance and throughput of Ice Lake-D (ICX-D). The Xeon 6 SoC CPU also offers more than three times the core count and memory bandwidth, up to 2.5 times the I/O performance, and up to twice the integrated Ethernet throughput. Furthermore, the SoC leverages the disaggregation strategy of the Xeon 6 generation, resulting in a 1.8 times gen-to-gen improvement over ICX-D in aggregate performance on SIR benchmarks, along with better power.
Full research papers are only available to conference attendees, but readers can find a preview of Intel’s efforts below.
Main Conference Papers
A 0.021μm2 High-Density SRAM in Intel 18A RibbonFET Technology with PowerVia Backside Power Delivery
Session 29 – SRAM
The accelerating pursuit of high-performance and energy-efficient computing drives the recent breakthroughs in both semiconductor devices and power delivery schemes in advanced process technology. This paper presents the industry’s first volume silicon validated high-current 6T SRAM (HCC) and high-density 6T SRAM (HDC) designs implemented in RibbonFET incorporating backside power delivery with PowerVia technology over the peripheral circuits. RibbonFET transistors offer better performance per watt and improved density, and they allow flexible adjustment of the effective transistor width to achieve optimal SRAM transistor sizing for power, performance, and VMIN. PowerVia technology features backside power routing, which enables reduced power droop and additional wiring resources on the frontside for more efficient peripheral circuit design. Compared to similar designs using FinFET, the proposed RibbonFET SRAM design has 0.77x and 0.88x bitcell area scaling for HCC and HDC, respectively. The RibbonFET HCC SRAM demonstrated improved measured VMIN without assist circuitry at the 90th percentile compared to prior FinFET-based designs that required both read and write assist circuitry. With negative bitline (NBL) write assist, the 34.3Mb/mm2 HDC array demonstrated 68mV better VMIN compared to the prior designs. Up to 38.1Mb/mm2 can be achieved using HDC SRAM with a different array configuration and additional peripheral circuit compaction.
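As a rough sanity check on these figures, the 0.021μm2 bitcell area from the paper title can be compared with the reported array densities. The sketch below is only illustrative arithmetic: it assumes 1 Mb = 10^6 bits and that the 0.021μm2 figure is the HDC bitcell area, and the function names are hypothetical rather than anything from the paper.

```python
# Rough arithmetic relating bitcell area to array density.
# Assumptions (not stated in the paper summary): 1 Mb = 1e6 bits, and the
# 0.021 um^2 figure in the title is the HDC bitcell area.

UM2_PER_MM2 = 1_000_000  # 1 mm^2 = 1e6 um^2


def ideal_density_mb_per_mm2(bitcell_area_um2: float) -> float:
    """Density if the array consisted of bitcells only (no periphery)."""
    bits_per_mm2 = UM2_PER_MM2 / bitcell_area_um2
    return bits_per_mm2 / 1e6


def array_efficiency(array_density_mb_per_mm2: float,
                     bitcell_area_um2: float) -> float:
    """Fraction of the array area occupied by bitcells."""
    return array_density_mb_per_mm2 / ideal_density_mb_per_mm2(bitcell_area_um2)


if __name__ == "__main__":
    for density in (34.3, 38.1):
        share = array_efficiency(density, 0.021)
        print(f"{density} Mb/mm^2 -> bitcells occupy about {share:.0%} of the area")
```

Under these assumptions, bitcells account for roughly 72% of the area at 34.3Mb/mm2 and roughly 80% at 38.1Mb/mm2, which is consistent with the remark that further density comes from array configuration and peripheral compaction.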
A 0.9pJ/b 108Gb/s PAM-4 VCSEL-Based Direct-Drive Optical Engine
Session 36 – Ultra-High-Density D2D and High-Performance Optical Transceivers
This paper presents a 0.82pJ/b 108Gb/s PAM4 co-packaged VCSEL-based direct-drive optical engine that integrates VCSEL driver and transimpedance amplifier front-end (TIA-FE) ICs with their VCSEL and photodiode counterparts, which are fiber-terminated by direct optical wiring technology. Several circuit techniques are introduced to enable the >100Gb/s PAM4 operation, including a high-linearity complex-zero CTLE in the VCSEL driver and a highly linear differential TIA-FE with an active complex-zero CTLE.
A 300MB SRAM, 20Tb/s Bandwidth Scalable Heterogenous 2.5D System Inferencing Simultaneous Streams Across 20 Chiplets with Workload-Dependent Configurations
Session 2 – Processors
Disaggregating large systems has shown multifold advantages, especially with current application trends prompting a shift towards chiplet-based architectures. To meet increasing computing demands, 2.5D systems should have greater interoperability between advanced technology nodes from multiple foundries, higher system memory capacity, higher I/O counts, and scalable interconnect pitches. To further address the escalating memory capacity demands and mitigate the memory bandwidth bottleneck prevalent in AI applications, chiplet systems should be capable of workload-tailored configurations at assembly time. This adaptability enables optimal resource allocation and facilitates processing of voluminous datasets and complex AI computations.
A Fully Integrated Multi-Phase Voltage Regulator with Enhanced Light-Load-Efficiency Peak of 86%, Featuring an Autonomous Mode Transition from Hard-Switching to Soft-Switching to Discontinuous Conduction Mode in 3nm FinFET CMOS
Session 21 – Compute and USB Power
This work presents an FIVR that has an autonomous mode transition from hard-switching to soft-switching to DCM. The FIVR uses a novel high-precision, high-speed comparator that observes the low-side power switch during every switching cycle to detect negative inductor current and enable soft switching of the high-side power switch. An auxiliary detection circuit monitors the load current and changes the mode of operation to DCM when the load current is very low. One of the by-products of the transition between Continuous Conduction Mode (CCM) and DCM is the impact on the output in terms of overshoot and undershoot. Any undershoot on the FIVR output can directly impact the minimum supply voltage specification of the SoCs being powered and needs to be addressed adequately. This paper also discusses DCM droop improvement techniques that use the autonomous mode transition into DCM based on load current (hereafter referred to as auto-DCM) and the compensation network to improve the DCM to CCM transition.
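The mode-selection behavior described above can be pictured as a small per-cycle decision. The following is a behavioral sketch only: the real FIVR makes this decision in circuitry, and the `Mode` enum, the `select_mode` function, the 0.05 A threshold, and the priority of the DCM check over the soft-switching check are all assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    HARD_SWITCHING = "hard-switching"
    SOFT_SWITCHING = "soft-switching"
    DCM = "discontinuous conduction mode"


def select_mode(load_current_a: float,
                negative_inductor_current: bool,
                dcm_threshold_a: float = 0.05) -> Mode:
    """Pick an operating mode for one switching cycle.

    load_current_a            -- load current seen by the auxiliary detector
    negative_inductor_current -- comparator result from the low-side switch
    dcm_threshold_a           -- hypothetical "very low load" threshold
    """
    # Very light load: the auxiliary detection circuit moves the regulator to DCM.
    if load_current_a < dcm_threshold_a:
        return Mode.DCM
    # Negative inductor current detected: soft-switch the high-side switch.
    if negative_inductor_current:
        return Mode.SOFT_SWITCHING
    # Otherwise stay in conventional hard-switching operation.
    return Mode.HARD_SWITCHING


if __name__ == "__main__":
    for load, negative in [(2.0, False), (0.4, True), (0.01, True)]:
        print(load, negative, select_mode(load, negative).value)
```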
Fine-Grained Spatial and Temporal Thermal Profiling of a 16nm CMOS Buck Converter and SOC Load-Current Emulator Using Low-Voltage Micron-Scale Thermal Sensors
Session 8 – Digital Techniques for System Adaptation, Power Management and Clocking
This paper proposes an accurate (+/-0.7°C), area-efficient (20µm x 20µm), low-power (18µW), digital and low-voltage friendly (0.7-1V), current-starved ring-oscillator-based thermal sensor (CSRO-TS) that can be distributed across the die for fine-grain thermal profiling of complex compute SoCs and IVR chiplets. An array of 204 sensors is implemented to demonstrate the fine-grain thermal profiling capability that can be used to detect large temperature gradients (>15°C) within 100µm. An additional 12 CSRO-TS have been placed across the active area of a high-power-density package-integrated buck IVR chiplet to validate its resiliency in noisy environments and demonstrate its usage for thermal and reliability monitoring of the IVR power train.
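To illustrate what detecting a large gradient within 100µm might look like in software, the sketch below scans a grid of sensor readings for pairs that are close together but differ by more than 15°C. The regular grid layout, the 80µm pitch, the sample temperatures, and the function name are assumptions for illustration; the paper summary does not describe the sensor placement or the post-processing.

```python
import math
from itertools import combinations


def find_steep_gradients(temps_c, pitch_um,
                         max_distance_um=100.0, min_delta_c=15.0):
    """Return ((r1, c1), (r2, c2), delta_c) for every sensor pair that lies
    within max_distance_um and differs by more than min_delta_c.

    temps_c  -- 2D list of readings in Celsius on an assumed regular grid
    pitch_um -- assumed spacing between adjacent sensors in micrometres
    """
    sensors = [((r, c), t)
               for r, row in enumerate(temps_c)
               for c, t in enumerate(row)]
    hits = []
    for ((r1, c1), t1), ((r2, c2), t2) in combinations(sensors, 2):
        distance_um = math.hypot(r1 - r2, c1 - c2) * pitch_um
        delta_c = abs(t1 - t2)
        if distance_um <= max_distance_um and delta_c > min_delta_c:
            hits.append(((r1, c1), (r2, c2), delta_c))
    return hits


if __name__ == "__main__":
    # Hypothetical readings with one hotspot in the middle of the grid.
    grid = [
        [60.0, 61.0, 62.0],
        [60.0, 80.0, 63.0],
        [59.0, 62.0, 61.0],
    ]
    for first, second, delta in find_steep_gradients(grid, pitch_um=80.0):
        print(first, second, f"{delta:.1f} C")
```

With an 80µm pitch, only the four orthogonal neighbors of the hotspot fall within 100µm, so diagonal neighbors are excluded even though their readings also differ by more than 15°C.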
Granite Rapids-D: Intel Xeon 6 SoC for vRAN, Edge, Networking, and Storage
Session 2 – Processors
Granite Rapids-D (GNR-D) is the next-generation SoC in the Intel Xeon-D product swim lane. A successor to the Ice Lake-D (ICX-D) and Sapphire Rapids-Edge Enhanced (SPR-EE) products, it is targeted at virtualized Radio Access Network (vRAN, also called C-RAN, D-RAN), Edge Server, networking, storage, and security segments. The Xeon 6 SoC targets two segments, one optimized towards compute and one optimized towards Edge.
University Collaborations
A Dual VDD-Temperature Sensor Employing Sensor Fusion with 2.4°C, 9mV (±3σ) Inaccuracy in 65nm CMOS
Session 8 – Digital Techniques for System Adaptation, Power Management and Clocking
Intel and Georgia Institute of Technology
Trends in heterogeneous and 3D chiplet integration have escalated thermal and power delivery challenges in modern SiP. Sensors must operate to specification, despite run-time Vdd scaling of SoC domains and Vdd noise. This paper describes a scalable sensor network consisting of compact distributed sensors well-suited to power and thermal management. The proposed approach involves judicious design of a pair of compact, non-ideal Vdd and Tsub sensors and relies on computational techniques to synergize them for enhanced performance.
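One simple way to picture fusing two non-ideal sensors is to treat each raw output as depending on both supply voltage and temperature, then solve for the two unknowns jointly. The sketch below assumes a linear response for each sensor with made-up calibration coefficients; the paper's actual computational technique is not described here and may differ.

```python
def fuse(raw_vdd_sensor, raw_temp_sensor, calibration):
    """Recover (vdd, temperature) from two cross-sensitive sensor readings.

    Assumed linear model (illustrative, not from the paper):
        raw_i = a_i * vdd + b_i * temperature + c_i
    calibration -- ((a1, b1, c1), (a2, b2, c2)), hypothetical coefficients
    """
    (a1, b1, c1), (a2, b2, c2) = calibration
    y1 = raw_vdd_sensor - c1
    y2 = raw_temp_sensor - c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("sensors respond identically; cannot separate Vdd and T")
    # Cramer's rule for the 2x2 system.
    vdd = (y1 * b2 - y2 * b1) / det
    temperature = (a1 * y2 - a2 * y1) / det
    return vdd, temperature


if __name__ == "__main__":
    # Hypothetical calibration: sensor 1 is mostly Vdd-sensitive,
    # sensor 2 is mostly temperature-sensitive.
    cal = ((100.0, 0.5, 10.0), (20.0, 2.0, 5.0))
    vdd, temp = fuse(130.0, 143.0, cal)
    print(f"Vdd ~ {vdd:.3f} V, T ~ {temp:.1f} C")
```

The point of the sketch is that neither sensor needs to be insensitive to the other quantity, as long as the two responses are different enough to be separated.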
T-REX: A 68-to-567μs/Token 0.41-to-3.95μJ/Token Transformer Accelerator with Reduced External Memory Access and Enhanced Hardware Utilization in 16nm FinFET
Session 23 – AI-Accelerators
Intel and Columbia University
The transformer, a recent mainstream model in deep learning, has revolutionized a wide range of AI applications, which motivates a surge in research to develop energy-efficient hardware accelerators. Our analysis indicates that external memory access (EMA) accounts for up to 81% of the total energy usage. To reduce EMA, we developed a factorizing training model that decomposes each weight matrix into a dense matrix shared across all layers (WS) and a highly sparse matrix distinct to each layer (WD). We prototyped the T-REX test chip in 16nm FinFET, and measurement results show that T-REX can reduce EMA by 31-65.9X across four well-known transformer models while achieving 68-567µs/token and 0.41-3.95µJ/token.
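To see why sharing a dense matrix across layers can cut external memory traffic, the sketch below counts how many weight values would need to be fetched with and without the factorization. The layer count, matrix sizes, and sparsity are hypothetical, the count ignores sparse-index overhead and any compression, and it is not an attempt to reproduce the 31-65.9X measured on the chip.

```python
def weight_values_fetched(num_layers: int,
                          dense_values_per_layer: int,
                          shared_values: int,
                          nonzeros_per_layer: int) -> tuple[int, int]:
    """Return (baseline, factorized) counts of weight values to fetch.

    baseline   -- every layer loads its own dense weight matrix
    factorized -- one shared dense matrix (WS) is loaded once, and each
                  layer loads only the nonzeros of its sparse matrix (WD)
    All sizes are hypothetical inputs; index storage for the sparse
    matrices is ignored for simplicity.
    """
    baseline = num_layers * dense_values_per_layer
    factorized = shared_values + num_layers * nonzeros_per_layer
    return baseline, factorized


if __name__ == "__main__":
    # Hypothetical example: 12 layers of 768 x 768 weights, a 768 x 768
    # shared matrix, and about 5% nonzeros in each layer-specific matrix.
    dense = 768 * 768
    baseline, factorized = weight_values_fetched(
        num_layers=12,
        dense_values_per_layer=dense,
        shared_values=dense,
        nonzeros_per_layer=29_491,
    )
    print(f"baseline={baseline}, factorized={factorized}, "
          f"reduction={baseline / factorized:.2f}x")
```

In this toy setting the shared matrix is amortized over all layers, so the saving grows with both the number of layers and the sparsity of the layer-specific matrices.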
Forum Contributions
Optimizing Communication Between Chiplets for Future System-in-Packages
Presented by Kemal Aygün in Forum 3 – It’s all About Data: Building Blocks, Compute, Movement and Integration
UCIe: Requirements and Innovations in Electrical Link Circuits
Presented by Joe Wu in Forum 1 – Unlocking Innovation: Circuit Techniques and New Approaches for Die-to-Die Links and the Chiplet Ecosystem
Special Event Panelist
EE2 – Quantum Computing: Whose Qubit is Better?
Panelist: Jeanette Roberts, Intel, Hillsboro, OR
Building a fault-tolerant quantum computer will likely require millions of physical qubits. While many qubit technologies exist, only spin qubits use transistor fabrication processes. Advanced semiconductor manufacturing makes classical devices with billions of transistors. At Intel, we employ that technology, and corresponding infrastructure, to make quantum computing devices based on spin qubits. Spin qubits in silicon are also promising owing to their long coherence times and small size. Moreover, error rates consistent with fault-tolerant operation have been demonstrated.