Intel Presents Latest Circuit Innovations at CICC 2025

Scott_Bair · ‎04-14-2025

Scott Bair is a key voice at Intel Labs, sharing insights into innovative research for inventing tomorrow’s technology.

Highlights:

The 2025 IEEE Custom Integrated Circuits Conference runs from April 13th through 16th in Boston, Mass.
Intel is proud to present four technical papers at the conference, as well as a university collaboration, workshop paper, educational session, and panel.
Intel’s paper on Sparse GEMM acceleration was nominated for Best Paper, and Intel’s paper introducing a high-performance passive base system for distributed AI/media acceleration was nominated for Best Invited Paper.

This year’s IEEE Custom Integrated Circuits Conference (CICC 2025) runs from April 13th through 16th in Boston, Mass. CICC is a premier conference devoted to IC development.

Intel is proud to present four technical papers at the conference, as well as a university collaboration, workshop paper, educational session, and panel. These contributions detail innovations in heterogeneous multi-chiplet systems, attack-resistant crypto hardware, analog coupled-oscillator compute chips, and high-speed wireline/lightwave interconnects for AI servers.

Furthermore, Intel’s paper on Sparse GEMM acceleration was nominated for Best Paper, and Intel’s paper introducing a high-performance passive base system for distributed AI/media acceleration is a candidate for Best Invited Paper.

Main Conference Papers

16 Arrays of 32 All-to-all Coupled CMOS Oscillators for AI Inference and Combinatorial Optimization

This work presents a low-power (0.2mW/oscillator), high-accuracy (R~0.955 for dot product) oscillator-based analog compute chip with digital control. Each oscillator has a 5-bit iDAC, a 3-bit INIT code, and a 512X frequency divider. High accuracy is achieved for AI inference (MNIST: 98.02%, CIFAR: 92.06%) and combinatorial optimization problems (52-city TSP: 22.5% more distance) via all-to-all 32-oscillator array synchronization, where the degree of match in each of the 16 arrays is sensed by a high-gain, low-power peak detector and digitized by an 8-bit ADC.
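
As a rough intuition for how synchronization in an all-to-all coupled array can encode a "degree of match", the sketch below simulates the textbook Kuramoto phase model. This is an assumption chosen for illustration, not the circuit described in the paper; the function name, coupling strength, time step, and frequency values are all hypothetical. The order parameter r (between 0 and 1) plays a role loosely analogous to the sensed degree of match: it is close to 1 when the oscillators lock together and lower when their natural frequencies disagree.

```python
import cmath
import math
import random


def kuramoto_order_parameter(freqs, coupling=1.0, dt=0.01, steps=5000, seed=0):
    """Simulate all-to-all Kuramoto oscillators and return the final order parameter r.

    Textbook model, NOT the chip's circuit: each phase follows
    d(theta_i)/dt = w_i + coupling * r * sin(psi - theta_i),
    where r * exp(i * psi) is the mean of exp(i * theta_j) over all oscillators.
    """
    rng = random.Random(seed)
    n = len(freqs)
    # Start with phases spread over half a circle (hypothetical initial condition).
    phases = [rng.uniform(0.0, math.pi) for _ in range(n)]
    for _ in range(steps):
        mean_field = sum(cmath.exp(1j * p) for p in phases) / n
        r, psi = abs(mean_field), cmath.phase(mean_field)
        # Forward-Euler update of every phase using the shared mean field.
        phases = [
            p + dt * (w + coupling * r * math.sin(psi - p))
            for p, w in zip(phases, freqs)
        ]
    return abs(sum(cmath.exp(1j * p) for p in phases) / n)


N = 32  # one array of 32 oscillators, as in the paper title
matched = kuramoto_order_parameter([1.0] * N)
mismatched = kuramoto_order_parameter([-10.0 + 20.0 * i / (N - 1) for i in range(N)])
print(f"matched frequencies:    r = {matched:.3f}")
print(f"mismatched frequencies: r = {mismatched:.3f}")
```

With identical natural frequencies the array locks and r approaches 1, while a wide frequency spread under the same coupling leaves the array less coherent.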

A 2455µm² 1.7Gbps Side-Channel Attack-Resistant Masked HMAC-SHA256 Accelerator in Intel 4 CMOS

This work presents the first-reported ASIC implementation of a side-channel attack (SCA) resistant Hash-based Message Authentication Code (HMAC) SHA-256 hardware accelerator with a Boolean/arithmetic-masked message-digest SHA-256 datapath in Intel 4 CMOS, occupying 2455µm². Resilience against side-channel attacks is demonstrated with 10M measured traces, providing a >73× improvement in SCA resistance with SHA-256 hashing performance of 1.7Gbps and energy efficiency of 6.1pJ/b, while limiting performance overhead to 26% compared to an unprotected implementation.
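
For readers less familiar with the terms, the following software sketch shows the standard HMAC construction over SHA-256 and the basic idea behind Boolean and arithmetic masking of 32-bit words. It is only an illustration of the concepts, not the accelerator's datapath: the helper names are hypothetical, and a real masked design must also convert securely between the two mask types, which this sketch does not attempt.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 64        # SHA-256 block size in bytes
MASK32 = 0xFFFFFFFF    # SHA-256 operates on 32-bit words


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC = H((K ^ opad) || H((K ^ ipad) || message)) with H = SHA-256."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner).digest()


def boolean_mask(word: int, mask: int) -> int:
    """Boolean share: word XOR mask. Suits XOR, rotate, and shift operations."""
    return (word ^ mask) & MASK32


def arithmetic_mask(word: int, mask: int) -> int:
    """Arithmetic share: (word - mask) mod 2^32. Suits modular addition."""
    return (word - mask) & MASK32


# The manual construction agrees with Python's standard library.
key, msg = b"example-key", b"example message"
assert hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest()

# Computing on shares: the secret words x and y are never combined unmasked.
x, y = 0x6A09E667, 0x12345678
m1, m2 = secrets.randbits(32), secrets.randbits(32)

xor_share = boolean_mask(x, m1) ^ boolean_mask(y, m2)
xor_result = xor_share ^ (m1 ^ m2)                 # unmask only at the end

add_share = (arithmetic_mask(x, m1) + arithmetic_mask(y, m2)) & MASK32
add_result = (add_share + m1 + m2) & MASK32        # unmask only at the end

print(hex(xor_result), hex(add_result))
```

Because SHA-256 mixes XOR-style logic with additions modulo 2^32, a masked datapath generally needs both kinds of shares, which is one plausible reading of the "Boolean/arithmetic-masked" wording above.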

A 68 TOPS/W, 256MB SRAM Sparse GEMM Accelerator Tiled Across 16, 4nm Near Memory Compute (NMC) Chiplets Disaggregated 2.5D System
Best Paper Candidate

The rapid evolution of AI models from DNNs to Transformers, characterized by their increasing size and complexity, presents a significant challenge for hardware acceleration. Leveraging chiplet disaggregation to push the boundaries of hardware-technology co-design will overcome the inefficiencies of monolithic hardware accelerators and drive transformative innovations. This work presents 20-chiplet disaggregated sparse GEMM acceleration with 16 NMC chiplets, offering a total of 68 TOPS/W and 256MB of SRAM to minimize frequent off-package data movement.
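
To illustrate why sparsity helps GEMM in general, here is a minimal software sketch that stores one operand in compressed sparse row (CSR) form and multiplies only its nonzero entries. It is a generic example, not the accelerator's architecture; the function names and the small matrices are made up for the illustration.

```python
def to_csr(dense):
    """Convert a dense row-major matrix into (values, col_idx, row_ptr) CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def sparse_gemm(csr, b):
    """Compute A @ B with A in CSR form; also return the multiply-accumulate count."""
    values, col_idx, row_ptr = csr
    n_cols = len(b[0])
    out, macs = [], 0
    for i in range(len(row_ptr) - 1):
        acc = [0] * n_cols
        # Visit only the nonzeros of row i; zero entries cost nothing.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, b_row = values[k], b[col_idx[k]]
            for j in range(n_cols):
                acc[j] += a * b_row[j]
            macs += n_cols
        out.append(acc)
    return out, macs


A = [[1, 0, 0, 2],
     [0, 0, 0, 0],
     [0, 3, 0, 0]]
B = [[1, 2],
     [3, 4],
     [5, 6],
     [7, 8]]

C, sparse_macs = sparse_gemm(to_csr(A), B)
dense_macs = len(A) * len(A[0]) * len(B[0])
print(C)                        # [[15, 18], [0, 0], [9, 12]]
print(sparse_macs, dense_macs)  # 6 24
```

In this toy case the sparse path performs 6 multiply-accumulates where a dense loop would perform 24, and it reads only the rows of B that a nonzero actually references.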

A High-Performance Passive Base System for Distributed AI/Media Acceleration
Best Invited Paper Candidate

Moore's law continues to shine thanks to techniques like Heterogeneous Integration, where devices are placed at their optimal process nodes for performance/power/cost. To accelerate fast-changing algorithms in the AI/Multimedia era, this paper presents a 2.5D integrated, plug-and-play multi-chiplet system suited for rapid prototyping. The architecture features a blend of compute and memory chiplets on a high-performance Passive Base System with a fully synthesizable custom die-to-die IO, easily reconfigurable for future workloads.

University Collaboration

A 93.9% Peak Efficiency 3V-to-40V-Input GaN-based DC-DC Converter with Unified Reliability and Efficiency Adaptive Control
Intel, Columbia University, and IBM T. J. Watson Research Center

This work proposes a Gallium-Nitride (GaN) based 3V-to-40V input DC-DC converter with unified reliability- and efficiency-aware adaptive control. The proposed controller employs multiple sensors and computation modules to accurately estimate the threshold voltage (Vth), on-state resistance (Ron), junction temperature, power loss, and remaining useful lifetime. Simultaneously, the controller tunes the switching frequency and gate drive swing to maximize reliability and efficiency. It slows Vth degradation by 3.8X, reduces Ron by 5.5%, achieves a peak efficiency of 93.9%, and improves efficiency by up to 7.7%.

Workshop Paper

Architecting Heterogenous System-of-Chiplets for Data Center and AI Era
Chiplet Solution for Custom ICs (CHISIC) Workshop
Presenter: Surhud Khare

Data center CPUs, GPUs, and AI accelerators have evolved from monolithic System-on-Chip designs to heterogeneous System-of-Chiplets to enable “More-than-Moore” scaling of system-level performance and energy efficiency at lower costs. This presentation highlights the importance of interdisciplinary System-Technology Co-Optimization (STCO) for architecting such heterogeneous System-of-Chiplets. It will cover architecture and design considerations for optimizing compute, memory, and interconnect components, as well as trade-offs and co-optimization opportunities with process/packaging technologies, power delivery, thermals, and system architectures. The presentation will also highlight future trends and innovations required to drive continued performance and efficiency improvements for the AI era.

Educational Session

Attack-Resistant Crypto Hardware Accelerators for Secure Platforms
Educational Session 3: Security or Privacy from Hardware to Systems
Presenter: Sanu Mathew

Secure platforms rely on silicon-embedded root-of-trust circuits to deliver security guarantees while operating in hostile environments where adversaries are present at all layers of the compute stack. Attackers employ a variety of attack modalities using side-channels (SCA) and fault-injection (FIA) to steal on-die secret keys/IDs. In this tutorial, panelists will discuss how attackers mount SCA and FIA on symmetric-key encryption engines and explore first-order countermeasures against such attacks that minimize area, power, and performance overheads. The tutorial will describe the design of a reconfigurable AES engine that operates in SCA-resistant mode in a hostile environment and allows the administrator to switch to a high-performance mode when operating in a secure environment. The discussion will also cover the design of a self-checking AES engine that detects maliciously injected faults in real time using checker circuits.

Panel

Wireline and Lightwave Interconnects - The Shifting Boundary in the AI Era
Session 20: Panel
Panelist: Ajay Balankutty

In the rapidly evolving field of Artificial Intelligence (AI), the demand for high-performance computing infrastructure has reached unprecedented levels. Central to this infrastructure are the energy-efficient, high-bandwidth interconnects that enable rapid data exchange between servers and chips. This session will delve into the critical role of high-speed interconnects in AI servers, with a particular focus on the latest advancements in lightwave and wireline interconnect technologies.