Processing-in-Memory Accelerates AI

June 25, 2019
Renesas’ ternary SRAM is designed to further push the speed of machine-learning CNN computations.

There are a host of hardware accelerators for various machine-learning (ML) models. To wit, Renesas has come up with a ternary SRAM-based system to accelerate convolutional-neural-network (CNN) computations. CNNs are a class of deep-neural-network (DNN) models.

One of the challenges with ML is moving around input and output data as well as the weights involved in the calculations. Various approaches have been used to optimize data movement. For instance, Flex Logix’s NMAX keeps weights in local memory.

The ternary approach uses two single-bit memory cells to store one of three values, -1, 0, or 1, which amounts to roughly 1.5 bits of information (log2 3 is about 1.58) (Fig. 1). The Processing-in-Memory (PIM) method takes advantage of the ternary values.

1. Renesas’ hardware can take advantage of a ternary memory cell that stores a value of -1, 0, or 1.
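As a rough illustration of why ternary values suit in-memory computation, the sketch below stores each weight as a pair of bits and performs a multiply-accumulate (MAC) using only additions and subtractions. The bit-pair mapping and the function names are assumptions made for this example; the article does not describe Renesas' actual cell encoding.

```python
# Hypothetical mapping from a ternary value to a (plus_cell, minus_cell) bit pair.
# The real cell-level encoding is not given in the article.
ENCODE = {-1: (0, 1), 0: (0, 0), 1: (1, 0)}

def encode(value):
    """Return the (plus, minus) bit pair for a ternary value."""
    return ENCODE[value]

def decode(plus, minus):
    """Recover the ternary value; the (1, 1) pair is unused in this sketch."""
    if plus and minus:
        raise ValueError("(1, 1) is not a valid code in this sketch")
    return plus - minus

def ternary_mac(weights, inputs):
    """Multiply-accumulate with ternary weights using only add and subtract."""
    total = 0
    for w, x in zip(weights, inputs):
        plus, minus = encode(w)
        if plus:
            total += x      # weight of +1: add the input
        elif minus:
            total -= x      # weight of -1: subtract the input
        # weight of 0: the input contributes nothing
    return total

weights = [1, 0, -1, 1]
inputs = [3, 5, 2, 4]
result = ternary_mac(weights, inputs)
print(result)  # 3 - 2 + 4 = 5
```

Two bits could hold four states, so one code goes unused here. No multiplier is needed because every weight is -1, 0, or 1.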

The basic ternary cells can be combined to represent multibit values. Blocks can be combined for different levels of precision, letting users balance accuracy against power consumption (Fig. 2).

2. The hardware can combine ternary calculations into multibit operations.
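One way to picture combining ternary calculations into a multibit operation is to split each multibit weight into ternary digits weighted by powers of two, run one ternary MAC per digit position, and sum the scaled partial results. This is a sketch under that assumption only; the article doesn't say how Renesas' blocks are actually combined, and `to_ternary_planes` and `multibit_mac` are hypothetical names.

```python
def to_ternary_planes(weight, num_planes):
    """Split a signed integer into digits in {-1, 0, 1}; digit k has weight 2**k."""
    if abs(weight) > 2 ** num_planes - 1:
        raise ValueError(f"weight {weight} needs more than {num_planes} planes")
    sign = -1 if weight < 0 else 1
    magnitude = abs(weight)
    return [sign * ((magnitude >> k) & 1) for k in range(num_planes)]

def multibit_mac(weights, inputs, num_planes):
    """Multibit MAC built from num_planes purely ternary MACs."""
    planes = [to_ternary_planes(w, num_planes) for w in weights]
    total = 0
    for k in range(num_planes):
        # One ternary "block" handles digit position k.
        partial = sum(p[k] * x for p, x in zip(planes, inputs))
        total += partial * (2 ** k)
    return total

weights = [5, -3, 0, 7]
inputs = [2, 4, 6, 1]
combined = multibit_mac(weights, inputs, num_planes=3)
print(combined)  # 10 - 12 + 0 + 7 = 5
```

In this model, using fewer planes means fewer ternary operations at the cost of a narrower weight range, which mirrors the accuracy-versus-power trade-off described above.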

Conventional PIM designs read out results using analog-to-digital converters (ADCs). This is a robust approach, but the ADCs cost both die area and power. Renesas instead combined a 1-bit sense-amplifier comparator with replica cells, in which the current can be controlled flexibly, to build a high-precision memory data-readout circuit (Fig. 3). It also developed a “zero-detector” that stops the comparators when it detects that the multiply-accumulate (MAC) result is equal to zero.

3. A “zero-detector” was developed to stop operation of the comparators when it detects that the MAC result is equal to zero.

This strategy takes advantage of the fact that only a very small share of nodes (neurons), about 1%, is activated during neural-network operation. Stopping the readout circuits for nodes that aren’t activated lowers power even further. As a result, power is significantly reduced while accuracy is maintained.
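The effect of skipping readout for zero results can be modeled behaviorally, as in the sketch below. It only counts comparator operations; `COMPARISONS_PER_READOUT` is a made-up figure for illustration, not a number from the article, and the real circuit detects the zero condition in hardware rather than by inspecting a computed value.

```python
COMPARISONS_PER_READOUT = 4  # hypothetical cost of one full readout

def read_layer(mac_results):
    """Return (outputs, comparator_ops) for a list of per-node MAC results."""
    outputs = []
    comparator_ops = 0
    for mac in mac_results:
        if mac == 0:
            # Zero-detector fires: the comparators stay idle for this node.
            outputs.append(0)
            continue
        comparator_ops += COMPARISONS_PER_READOUT
        outputs.append(mac)
    return outputs, comparator_ops

# About 1% of nodes active, as described in the text.
mac_results = [0] * 99 + [7]
outputs, ops = read_layer(mac_results)
baseline = COMPARISONS_PER_READOUT * len(mac_results)
print(ops, baseline)  # 4 400
```

With 1 active node in 100, this model performs 4 comparator operations instead of 400, which suggests why the approach can save substantial readout power.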

One downside of not using ADCs is that the storage isn’t as robust. Part of the issue stems from process variations during chip manufacturing, which cause calculation errors. To address them, Renesas implemented multiple SRAM calculation blocks (Fig. 4). Normally, only a small number of all nodes will be activated, so those nodes are allocated selectively to the SRAM calculation blocks that have minimal manufacturing process variations, and the calculations run there. This allows calculation errors to be reduced to a level where they can be essentially ignored.

4. Multiple SRAM calculation blocks with minimal manufacturing variations are implemented to address calculation errors due to such variations.
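The selective allocation described above can be sketched as a simple ranking problem: sort the calculation blocks by measured variation and hand the few active nodes to the best ones. The block names, variation scores, and `allocate_nodes` function below are hypothetical; the article doesn't explain how variation is measured or how the assignment is made in hardware.

```python
def allocate_nodes(active_nodes, block_variation):
    """Map each active node to a block, lowest measured variation first."""
    ranked = sorted(block_variation, key=block_variation.get)
    if len(active_nodes) > len(ranked):
        raise ValueError("more active nodes than calculation blocks")
    return dict(zip(active_nodes, ranked))

# Hypothetical per-block variation scores (lower is better).
block_variation = {"blk0": 0.08, "blk1": 0.02, "blk2": 0.05, "blk3": 0.01}
active_nodes = ["n17", "n42"]

assignment = allocate_nodes(active_nodes, block_variation)
print(assignment)  # {'n17': 'blk3', 'n42': 'blk1'}
```

Because only a small fraction of nodes is active at once, the highest-variation blocks can simply go unused in this model.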

Renesas engineers created a chip to demonstrate the ternary PIM approach (Fig. 5). The 12-nm technology chip contains four clusters, each containing the PIM and logic along with conventional SRAM storage. Each cluster can operate independently; thus, the system is able to manage up to four CNN models at one time. The chip can handle up to 128 CNN layers. PIM storage is 4.74 Mb and the SRAM stores 12.58 Mb. The 1-W chip can deliver 8.8 TOPS.

5. Renesas engineers created a chip to demonstrate the ternary PIM approach with four clusters. Each cluster can operate on a different ML model.

The chip has been used to execute a number of models, including one that recognizes handwritten characters. It maintained a recognition accuracy of over 99%. The chip is only a prototype, but it highlights how different approaches to ML acceleration can deliver higher performance while lowering power requirements.
