Processing-in-Memory Accelerates AI

June 25, 2019
Renesas’ ternary SRAM is designed to further push the speed of machine-learning CNN computations.

There is a host of hardware accelerators for various machine-learning (ML) models. Among them is a ternary SRAM-based system from Renesas that accelerates convolutional-neural-network (CNN) computations. CNNs are a class of deep-neural-network (DNN) models.

One of the challenges with ML is moving around input and output data as well as the weights involved in the calculations. Various approaches have been used to optimize data movement. For instance, Flex Logix’s NMAX keeps weights in local memory.

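The ternary approach uses two single-bit memory cells to encode one three-state value of -1, 0, or 1 (Fig. 1). That is roughly 1.5 bits of information (log2 3 ≈ 1.58). The processing-in-memory (PIM) method takes advantage of the ternary values by performing its calculations on them inside the memory itself.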

1. Renesas’ hardware can take advantage of a ternary memory cell that stores a value of -1, 0, or 1.
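A minimal sketch of how two single-bit cells can hold one ternary weight is shown here in Python. The bit assignment (a "plus" cell and a "minus" cell) and the function names are assumptions made for illustration; the article does not describe Renesas' actual cell encoding.

```python
import math

# Hypothetical encoding: (plus_cell, minus_cell). The pattern (1, 1) is unused.
ENCODE = {-1: (0, 1), 0: (0, 0), 1: (1, 0)}


def encode_ternary(value):
    """Return the two cell bits that store a ternary value."""
    return ENCODE[value]


def decode_ternary(plus_cell, minus_cell):
    """Recover -1, 0, or 1 from the two cell bits."""
    return plus_cell - minus_cell


def ternary_mac(weights, inputs):
    """Multiply-accumulate with ternary weights: each term adds, subtracts, or skips an input."""
    total = 0
    for w, x in zip(weights, inputs):
        total += decode_ternary(*encode_ternary(w)) * x
    return total


# Three states carry log2(3) bits, a little more than 1.5.
BITS_PER_TERNARY_VALUE = math.log2(3)

if __name__ == "__main__":
    print(ternary_mac([1, -1, 0, 1], [3, 5, 7, 4]))
    print(round(BITS_PER_TERNARY_VALUE, 3))
```

Because every weight is -1, 0, or 1, the multiply step reduces to an add, a subtract, or nothing at all, which is what makes the scheme attractive for in-memory calculation.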

The basic ternary storage can be combined into multibit solutions. Blocks can be combined for different accuracies, allowing users to optimize the balance between accuracy and power consumption (Fig. 2).

2. The hardware can combine ternary calculations into multibit operations.
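One way to picture the multibit combination is to split each multibit weight into several ternary "planes" and add up the per-plane results with power-of-two scaling. The Python sketch below does that under stated assumptions: the sign-and-magnitude decomposition and the names `to_ternary_planes` and `multibit_dot` are illustrative and are not taken from Renesas' design.

```python
def to_ternary_planes(weight, num_planes):
    """Split a signed integer into ternary digits, least significant first,
    so that weight == sum(digit * 2**k for k, digit in enumerate(digits))."""
    if abs(weight) > 2 ** num_planes - 1:
        raise ValueError("weight does not fit in the requested number of planes")
    sign = (weight > 0) - (weight < 0)
    magnitude = abs(weight)
    return [sign * ((magnitude >> k) & 1) for k in range(num_planes)]


def multibit_dot(weights, inputs, num_planes):
    """Dot product built from num_planes ternary dot products."""
    planes = [to_ternary_planes(w, num_planes) for w in weights]
    total = 0
    for k in range(num_planes):
        # One ternary block computes this partial sum.
        partial = sum(p[k] * x for p, x in zip(planes, inputs))
        # Scale by the plane's weight and accumulate.
        total += partial * (1 << k)
    return total


if __name__ == "__main__":
    weights = [5, -3, 0, 7]
    inputs = [1, 2, 3, 4]
    print(multibit_dot(weights, inputs, num_planes=3))
    print(sum(w * x for w, x in zip(weights, inputs)))  # reference result
```

In this model, using fewer planes means fewer ternary blocks are active per weight, which mirrors the accuracy-versus-power trade-off the article describes.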

Conventional PIM designs typically read out results using analog-to-digital converters (ADCs). This is a robust approach, but the ADCs take up space and consume power. Renesas instead combined a 1-bit sense-amplifier comparator with replica cells, in which the current can be controlled flexibly, to develop a high-precision memory data-readout circuit (Fig. 3). A “zero-detector” stops the comparators when the multiply-accumulate (MAC) result is equal to zero.

3. A “zero-detector” stops the comparators when it detects that the MAC result is equal to zero.

This strategy takes advantage of the fact that only a very small share of the nodes (neurons), about 1%, is activated during neural-network operation. Stopping the readout circuits for nodes that aren’t activated lowers power even further. As a result, power is significantly reduced while maintaining accuracy.
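The effect of the zero-detector can be illustrated with a small counting model in Python. It is a sketch under assumptions, not the Renesas circuit: the 1-bit comparator is modeled as a simple threshold test, the adjustable replica-cell reference is stepped by binary search, inactive nodes are taken to have a MAC result of zero, and the names `read_mac` and `read_layer` are hypothetical.

```python
def comparator(value, reference):
    """1-bit decision: is the MAC value at or above the reference level?"""
    return value >= reference


def read_mac(value, max_value, counter):
    """Resolve an integer in [0, max_value] using only 1-bit comparisons.
    The reference level is stepped by binary search (an assumption)."""
    low, high = 0, max_value
    while low < high:
        mid = (low + high + 1) // 2
        counter[0] += 1  # one comparator operation
        if comparator(value, mid):
            low = mid
        else:
            high = mid - 1
    return low


def read_layer(mac_values, max_value, zero_skip=True):
    """Read every node; optionally skip the comparators when the result is zero."""
    counter = [0]
    outputs = []
    for value in mac_values:
        if zero_skip and value == 0:
            outputs.append(0)  # zero-detector fires, comparators stay idle
            continue
        outputs.append(read_mac(value, max_value, counter))
    return outputs, counter[0]


if __name__ == "__main__":
    # 100 nodes with 1 activated, matching the article's "about 1%" figure.
    layer = [0] * 99 + [9]
    _, with_skip = read_layer(layer, max_value=15, zero_skip=True)
    _, without_skip = read_layer(layer, max_value=15, zero_skip=False)
    print(with_skip, without_skip)
```

In this toy model the outputs are identical with and without skipping; only the number of comparator operations changes, in proportion to the share of nodes that are active.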

One downside of not using ADCs is that the readout isn’t as robust. Part of the issue stems from process variations during chip manufacturing, which lead to calculation errors. To address this, Renesas implemented multiple SRAM calculation blocks (Fig. 4). Normally, only a small number of all nodes will be activated, so those nodes are allocated selectively to the SRAM calculation circuit blocks that have minimal manufacturing process variations. This allows calculation errors to be reduced to a level where they can be essentially ignored.

4. Multiple SRAM calculation blocks with minimal manufacturing variations are implemented to address calculation errors due to such variations.
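The selective allocation can be sketched as a simple ranking step. The Python example below assumes each block has a measured variation figure and that one active node maps to one block; the block names, the variation numbers, and the function `allocate_nodes` are made up for illustration, and the article does not say how Renesas measures or ranks its blocks.

```python
def allocate_nodes(active_nodes, block_variation):
    """Assign each active node to a calculation block, lowest variation first.

    active_nodes: list of node identifiers that need a calculation block.
    block_variation: dict mapping block name to a measured variation figure.
    """
    # Rank blocks from smallest to largest variation.
    ranked_blocks = sorted(block_variation, key=block_variation.get)
    if len(active_nodes) > len(ranked_blocks):
        raise ValueError("more active nodes than calculation blocks")
    # Only the best-ranked blocks are used; the rest stay idle.
    return dict(zip(active_nodes, ranked_blocks))


if __name__ == "__main__":
    # Hypothetical variation figures for four blocks.
    variation = {"block0": 0.08, "block1": 0.01, "block2": 0.05, "block3": 0.02}
    print(allocate_nodes(["n7", "n42"], variation))
```

Because so few nodes are active at once, there are usually enough low-variation blocks to cover them, which is why the approach can afford to leave the worst blocks unused.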

Renesas engineers created a chip to demonstrate the ternary PIM approach (Fig. 5). The 12-nm technology chip contains four clusters, each containing the PIM and logic along with conventional SRAM storage. Each cluster can operate independently; thus, the system is able to manage up to four CNN models at one time. The chip can handle up to 128 CNN layers. PIM storage is 4.74 Mb and the SRAM stores 12.58 Mb. The 1-W chip can deliver 8.8 TOPS.

5. Renesas engineers created a chip to demonstrate the ternary PIM approach with four clusters. Each cluster can operate on a different ML model.
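The quoted figures can be combined into a few derived numbers. The short Python calculation below uses only the values stated in the article (4.74 Mb, 12.58 Mb, 8.8 TOPS, 1 W); the variable names are illustrative, and the energy-per-operation figure is a back-of-the-envelope average rather than a measured value.

```python
PIM_MBIT = 4.74          # PIM storage, from the article
SRAM_MBIT = 12.58        # conventional SRAM storage, from the article
THROUGHPUT_TOPS = 8.8    # tera-operations per second, from the article
POWER_W = 1.0            # chip power, from the article

# Total on-chip memory across PIM and conventional SRAM.
total_mbit = PIM_MBIT + SRAM_MBIT

# Power efficiency in TOPS per watt.
tops_per_watt = THROUGHPUT_TOPS / POWER_W

# Average energy per operation in picojoules:
# joules per op = watts / (ops per second), then scaled by 1e12 to pJ.
energy_per_op_pj = POWER_W / (THROUGHPUT_TOPS * 1e12) * 1e12

if __name__ == "__main__":
    print(f"total memory: {total_mbit:.2f} Mb")
    print(f"efficiency: {tops_per_watt:.1f} TOPS/W")
    print(f"average energy per operation: {energy_per_op_pj:.3f} pJ")
```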

The chip has been used to execute a number of models, including one that recognizes handwritten characters. It maintained a recognition accuracy of over 99%. The chip is only a prototype, but it highlights how different approaches to ML acceleration can deliver higher performance while lowering power requirements.
