
Enabling In-Memory Computing for Artificial Intelligence Part 1: The Analog Approach

Hechen_Wang

Hechen Wang is a research scientist for Intel Labs with interests in mixed-signal circuits, data converters, digital frequency synthesizers, wireless communication systems, and analog/mixed-signal compute-in-memory for AI applications.

 

Highlights:

  • Intel Labs is actively pursuing multiple avenues for In-Memory Computing. Part 1 of this blog series discusses the analog approach and Intel Labs’ work in the area.
  • A series of simulation and measurement results demonstrated that the proposed novel, multi-bit solution can achieve energy efficiencies that are 10-100 times greater than conventional computation approaches.

 

The fundamental building block of computer memory is the memory cell: an electronic circuit that stores binary information. In the conventional approach to data processing, data resides on a disk in the system or attached via a network. When needed, it is loaded into local system memory (RAM) and then moved to the CPU. This lengthy process is relatively inefficient, so researchers began to seek alternatives. With in-memory computing, data is stored directly in system memory. This architectural approach dramatically reduces latency by eliminating the time spent seeking data on the disk and then shuttling it closer to the CPU.

Currently, in-memory computing is a hot topic in the field of AI hardware implementation. By reducing the distance data must travel, in-memory computing is expected to achieve unparalleled energy efficiency. Yet realizing this goal is not an easy task, and there are different approaches to achieving the best results, namely digital and analog. The digital computer has become increasingly popular and effectively pushed analog computers to the back burner. However, as data generation continues to increase, researchers have revisited analog methods. Naturally, there are benefits and drawbacks to both digital and analog computing, and as such, Intel Labs is actively pursuing both avenues. In this two-part series, we will explore and evaluate each method and highlight the work that Intel Labs has done in the respective areas, starting with the analog approach.

 

Background

Artificial intelligence (AI) computing chips are beginning to outperform average humans on a wide range of tasks, including recognition, classification, gaming, and some areas of scientific research [1], [2]. AI has thus become one of the key enablers for big data processing, the Internet of Things (IoT), and the future of our intelligent society. To determine what benefits analog computing can offer, we must first identify the needs of AI applications. Figure 1 (a) presents a conceptual multiply-accumulate (MAC)-based neuron cell, the basic element in most AI systems. Two types of data are processed in one neuron: input activations (IA) and weights (W). Although they are treated equally in traditional digital systems and assigned the same number of bits, they have different preferences in practice. Obtained during the training process, weights are static values preloaded in each node of the neural network. The major concerns in this circumstance are:

  1. Storage robustness against process, voltage, and temperature (PVT) variations. The stored value must remain the same when the supply voltage or ambient temperature changes, and different batches of chips should perform similarly.
  2. Areal density and scaling capability. The more data we can store in a network or on a chip, the more complicated the problems the neural network can solve.
  3. Read-write reliability. Any discrepancy during memory write-in or read-out leads to accuracy degradation. Thus, with large redundancy in the design margin and great scaling capability, the digital format remains the most suitable representation for the weights.
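As a concrete illustration, the MAC operation at the heart of the neuron in Fig. 1 (a) can be sketched in a few lines of Python. This is a behavioral sketch only; the function name and the ReLU activation are illustrative choices, not taken from a specific design:

```python
# Behavioral sketch of the MAC-based neuron in Fig. 1 (a): accumulate
# products of input activations (IA) with preloaded weights (W), then
# apply an activation. ReLU is an illustrative choice.

def mac_neuron(inputs, weights, bias=0.0):
    """Multiply-accumulate: sum(IA_i * W_i) + bias, followed by ReLU."""
    acc = bias
    for ia, w in zip(inputs, weights):
        acc += ia * w          # one MAC operation per input pair
    return max(acc, 0.0)       # ReLU activation
```

Every neuron in every layer repeats exactly this pattern, which is why the MAC is the operation worth optimizing.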

The input of the neuron, IA, is another story. Input activations are gathered either from multiple input sources (sensor, camera, microphone, etc.) or from the outputs of the previous neuron layer. As depicted in Fig. 1 (a), from port to port, these signals may differ significantly in amplitude and frequency. A large dynamic range is the prerequisite for maintaining signal fidelity without clipping or other unwanted nonlinear effects. Finer granularity in both sampling interval and amplitude resolution provides better suppression of aliasing and quantization noise. An analog signal, by definition, is a continuous waveform with an infinite set of possible values; in other words, it has no granularity or quantization on either the time or amplitude scale. Therefore, analog scalars (voltage, time, charge, etc.) outperform all other candidates due to their inherently unlimited number of bits and the lack of data conversion required before sending them into the network.
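To see why amplitude granularity matters, the sketch below measures the signal-to-noise ratio of a uniformly quantized full-scale sine wave; the measured values track the familiar 6.02n + 1.76 dB rule, so every extra bit of resolution buys roughly 6 dB of fidelity. The quantizer here is a textbook idealization, not a model of any specific circuit:

```python
import math

def quantize(x, n_bits, full_scale=1.0):
    """Idealized uniform quantizer over [-full_scale, +full_scale]."""
    step = 2.0 * full_scale / (2 ** n_bits)
    q = round(x / step) * step
    return max(-full_scale, min(full_scale, q))

def snr_db(n_bits, num_samples=10000):
    """Measured SNR (dB) of a quantized full-scale sine wave."""
    signal_power = noise_power = 0.0
    for k in range(num_samples):
        x = math.sin(2.0 * math.pi * k / num_samples)
        err = x - quantize(x, n_bits)
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)
```

For example, `snr_db(8)` lands close to the theoretical 49.9 dB, while `snr_db(4)` lands close to 25.8 dB, which quantifies the fidelity cost of coarse input quantization.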


Figure 1. (a) A conceptual diagram of a neuron cell, (b) requirements of weights and inputs in a neuron.

A neural network is a graph of operations formed by layers of neurons flowing from one to the next. In fact, there are so many MACs in a neural network that they completely dominate all other types of computation. Therefore, the MAC is the main operation that needs to be merged with memory cells for neural network applications. However, the conventional digital MAC unit is too large. As an example, a 4-bit digital system needs about 600 transistors to form a MAC unit, while the 6T SRAM cells storing a 4-bit value contain only 24 transistors, as shown in Fig. 2 (a) and (b), respectively. This unbalanced ratio makes it impossible to fuse the two together efficiently. As such, both academia and industry have yet to develop practical and efficient in-memory computing architectures, usually settling for pseudo-in-memory or near-memory approaches.
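To get a feel for how thoroughly MACs dominate, the following sketch counts the MACs in a single convolution layer. The layer dimensions are hypothetical, chosen only for illustration; even one modest layer requires over a hundred million MAC operations:

```python
def conv_mac_count(h, w, c_in, c_out, k):
    """MACs for one k x k convolution layer at stride 1 with 'same'
    padding: every output pixel of every output channel needs
    k * k * c_in multiply-accumulates."""
    return h * w * c_out * (k * k * c_in)

# Hypothetical mid-network layer: 56x56 feature map, 64 input and
# 64 output channels, 3x3 kernels -> over 115 million MACs per pass.
macs = conv_mac_count(h=56, w=56, c_in=64, c_out=64, k=3)
```

Multiplied across dozens of layers and billions of inferences, this arithmetic is why the energy cost per MAC, and of moving its operands, dominates the system budget.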


Figure 2. A hardware comparison between a digital MAC unit and unconventional analog MAC units. (a) 4-bit digital MAC unit, (b) 6T SRAM cell, (c) unconventional analog MAC units.

Since a pure digital solution seems hard to achieve, researchers have started exploring a variety of alternative approaches. Analog is considered one of the most promising because it can handle some basic arithmetic operations with extremely limited hardware. For instance, as shown in Fig. 2 (c), some unconventional computing methods, such as multiplying digital-to-analog converters (DACs), time domain delay modulation, and stochastic computing, can provide operations similar to those of the digital MAC unit with far fewer circuit elements. As their hardware consumption is of the same order of magnitude as that of memory cells, we are able to integrate them into a regular memory array and form a true in-memory computing architecture.
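As one concrete example of these unconventional methods, stochastic computing multiplies two values with a single AND gate by encoding each value as the probability of a 1 in a random bitstream. The behavioral sketch below shows the principle; the stream length and seed are arbitrary illustrative choices:

```python
import random

def to_bitstream(p, length, rng):
    """Encode a value p in [0, 1] as a bitstream with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(a, b, length=10000, seed=0):
    """Stochastic multiplication: ANDing two independent streams gives a
    stream whose density of 1s approximates a * b."""
    rng = random.Random(seed)
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    return sum(x & y for x, y in zip(stream_a, stream_b)) / length
```

`sc_multiply(0.5, 0.5)` returns a value close to 0.25. Precision improves only with the square root of the stream length, which is the classic way these methods trade accuracy for hardware simplicity.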

 

Efficiency or Accuracy?

Recently, companies such as IBM, Imec, GlobalFoundries, TSMC, and Samsung, together with academic groups from universities such as MIT, Stanford, and Princeton, have started to research analog in-memory computation. Several experimental in-memory analog neural network accelerators have been released and shown to have exceptional energy efficiency. In 2018, a group from Princeton University published a paper on binary analog in-memory computing with a stunning power efficiency of 886 tera operations per second per watt (TOPS/W) [3]. Although it can only support binary neural networks, it can still be considered a great breakthrough in this area. On the industry side, characterization tests performed by Imec and GlobalFoundries in 2020 demonstrated power efficiency peaking at 2,900 TOPS/W under certain circumstances [4], more than 700 times higher than that of Google's TPU. Figure 3 (a) summarizes most of the recent high-profile AI accelerators and clearly shows the value of analog approaches. Projecting from previous achievements, we might reasonably expect power efficiency higher than 10,000 TOPS/W in another 3-5 years.
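A TOPS/W figure converts directly into energy per operation, which makes these comparisons tangible. The quick sketch below (the helper name is ours) shows that 886 TOPS/W corresponds to roughly 1.1 fJ per operation, and 2,900 TOPS/W to roughly 0.34 fJ:

```python
def energy_per_op_fj(tops_per_watt):
    """Convert a TOPS/W figure to energy per operation in femtojoules:
    1 W sustains tops_per_watt * 1e12 ops/s, so each op costs
    1 / (tops_per_watt * 1e12) joules."""
    return 1e15 / (tops_per_watt * 1e12)   # J -> fJ
```

At these energies, each operation costs about as much as switching a handful of wires, which is why eliminating data movement matters so much.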

However, when evaluating AI accelerators, the "TOPS/W" number alone can be a tricky indicator, especially for analog in-memory computing [5]. Most of the high-TOPS/W analog in-memory accelerators presented in Fig. 3 (a) support only low-precision operations or even binarized weights and inputs, which leads to severe inference accuracy degradation and closes the door on the possibility of edge training. Rearranging the data points into a plot of power efficiency versus data quantization level (precision), as in Fig. 3 (b), shows that most of the works fall into two mutually exclusive sets. Efficiency and accuracy are like the two ends of a balance: equally important, but challenging to possess at the same time. Our task is to find a solution that reaches the green region circled in the plot.


Figure 3. (a) Recently published AI accelerators’ performance versus energy consumption, (b) energy efficiency versus precision.

 

Intel Labs’ Analog Solution

New Architectures:

To hit the region of interest shown in Fig. 3 (b), we need to achieve high TOPS/W (efficiency) and high precision (accuracy) simultaneously. New architectures from the circuit level to the system level are currently under development in our group. To solve the problem, we propose a charge domain computing method using a so-called C-2C (capacitor) ladder structure for multibit multiplication. As shown in Fig. 4, an 8-bit C-2C ladder-based MAC unit consists of only 16 capacitors and 8 CMOS switches [6], [7]. Since its hardware consumption is of the same order of magnitude as that of memory cells, it can be easily merged into a regular memory array to form a true in-memory computing architecture. A series of simulation results and a prototype in the Intel 16 process demonstrated that the proposed novel multi-bit solution can achieve energy efficiencies 10-100 times greater than conventional computation approaches for AI applications.
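The C-2C ladder acts as a capacitive binary divider: each successive bit contributes half the charge weight of the previous one, so the ladder output is the input voltage scaled by the weight code. The behavioral model below is an idealized sketch of this principle only (parasitics and switch non-idealities ignored), not the circuit of [6], [7]:

```python
def c2c_mac(v_in, weight_bits):
    """Idealized C-2C ladder output: each successive bit carries half
    the capacitive weight of the previous one, so an n-bit code W
    scales the input activation as v_in * W / 2**n."""
    code = 0
    for bit in weight_bits:            # MSB first
        code = (code << 1) | bit
    return v_in * code / (1 << len(weight_bits))
```

For instance, an 8-bit weight of 128 (only the MSB set) halves the input: `c2c_mac(0.8, [1,0,0,0,0,0,0,0])` gives 0.4. Summing such charge contributions across a column performs the accumulate step for free.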


Figure 4. Proposed charge domain analog in-memory computing architecture.

 

New materials:

To support the goal of denser and more power-efficient designs, we are investigating the compatibility of novel memory and capacitor techniques. A variety of non-volatile memory (NVM) technologies (3D XPoint, MRAM, RRAM, FeFET, etc.), Intel Super MIM capacitors, and PowerVia backside power delivery are on our radar, as shown in Fig. 5 (a) and (b).

New applications:

Power-limited edge training, probabilistic Bayesian neural networks (BNNs), and brain-inspired high-dimensional computing (Fig. 5 (c)) are like the Holy Grail of AI research: shining yet hard to reach [8], [9], [10]. Ultra-high efficiency together with high precision is the main requirement for such applications, and analog in-memory computing could become one of the key enablers for realizing these paradigm shifts.
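As a taste of brain-inspired high-dimensional computing, the sketch below builds random bipolar hypervectors and checks the property the paradigm relies on: in high dimensions, independently drawn vectors are nearly orthogonal, so symbols can be compared by a simple normalized dot product. The dimension and seed are arbitrary illustrative choices:

```python
import random

def rand_hypervector(dim, rng):
    """Random bipolar hypervector with +1/-1 entries."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def similarity(a, b):
    """Normalized dot product: 1.0 for identical hypervectors, and
    near 0 for independently drawn ones in high dimensions."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

rng = random.Random(42)
a = rand_hypervector(10000, rng)
b = rand_hypervector(10000, rng)
# similarity(a, a) is exactly 1.0; similarity(a, b) is close to 0.
```

Because this similarity check is itself a large dot product, it maps naturally onto the analog MAC arrays discussed above.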


Figure 5. (a) New materials, (b) new technology, and (c) new applications of analog in-memory computing.

 

Project Contributors: Hechen Wang, Renzhi Liu, Richard Dorrance, Deepak Dasalukunte, Brent Carlton, Stefano Pellerano

 

References:

[1] Google AlphaGo: “Mastering the Game of Go Without Human Knowledge,” Nature, Oct. 2017, https://www.nature.com/articles/nature24270

[2] Google AlphaFold: “Improved Protein Structure Prediction Using Potentials from Deep Learning,” Nature, Jan. 2020, https://www.nature.com/articles/s41586-019-1923-7

[3] Charge domain CiM from Princeton Univ.: “A Programmable Embedded Microprocessor for Bit-scalable In-memory Computing,” 2019 IEEE Hot Chips 31 Symposium (HCS), https://ieeexplore.ieee.org/document/8875632

[4] Imec & GlobalFoundries press release on analog CiM in 2020: https://www.imec-int.com/en/articles/imec-and-globalfoundries-announce-breakthrough-in-ai-chip-bringing-deep-neural-network-calculations-to-iot-edge-devices

[5] MIT accelerator evaluation: “How to Evaluate Deep Neural Network Processors: TOPS/W (Alone) Considered Harmful,” IEEE Solid-State Circuits Magazine, Aug. 2020, https://ieeexplore.ieee.org/document/9177369

[6] Intel SRAM analog CiM: “A Charge Domain SRAM Compute-in-Memory Macro With C-2C Ladder-Based 8-Bit MAC Unit in 22-nm FinFET Process for Edge Inference,” IEEE Journal of Solid-State Circuits, Apr. 2023, https://ieeexplore.ieee.org/abstract/document/10008405

[7] Intel SRAM analog CiM: “A 32.2 TOPS/W SRAM Compute-in-Memory Macro Employing a Linear 8-bit C-2C Ladder for Charge Domain Computation in 22nm for Edge Inference,” IEEE Symposium on VLSI Technology and Circuits, Jun. 2022, https://ieeexplore.ieee.org/abstract/document/9830322

[8] IBM’s NVM analog in-memory computing: “Memory devices and applications for in-memory computing,” Nature Nanotechnology, Mar. 2020, https://www.nature.com/articles/s41565-020-0655-z?proof=t

[9] IBM’s analog in-memory computing for high dimensional computing: “In-memory hyperdimensional computing,” Nature Electronics, Jun. 2020, https://www.nature.com/articles/s41928-020-0410-3

[10] Intel BNN accelerator paper: “Energy Efficient BNN Accelerator using CiM and a Time-Interleaved Hadamard Digital GRNG in 22nm CMOS,” IEEE Asian Solid-State Circuits Conference (A-SSCC), Nov. 2022, https://ieeexplore.ieee.org/abstract/document/9980539

About the Author
Hechen Wang (Member, IEEE) received the B.S. degree in microelectronics and solid-state electronics from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2012, and the M.S. and Ph.D. degrees in electrical and computer engineering from Auburn University, Auburn, AL, USA, in 2013 and 2018, respectively, with a focus on the design of integrated digital frequency synthesizer circuits and time-to-digital converters. In 2014, he was an RFIC Design Engineer with InnoPhase, Inc., Chicago, IL, USA. From 2015 to 2018, he was a Ph.D. Graduate Student Researcher with the Auburn RFIC Design and Testing Laboratory, Auburn University. Since 2018, he has been with Intel Corporation, Hillsboro, OR, USA. He is currently a Research Scientist with Intel Labs. He has authored or coauthored more than 40 IEEE conference and journal articles, patents, and book chapters. His current research interests include mixed-signal circuits, data converters, digital frequency synthesizers, wireless communication systems, and analog/mixed-signal compute-in-memory for AI applications. Dr. Wang was a co-recipient of the IEEE Radio Frequency Integrated Circuits Symposium (RFIC) Best Student Paper Award in 2016 and the IEEE Custom Integrated Circuits Conference (CICC) Best Regular Paper Award in 2017, and a TPC Member of the IEEE International Symposium on Quality Electronic Design (ISQED).