The Hot Chips Symposium is where cutting-edge chip technology makes its debut, and this year is no different. Parallel processing is the name of the game, especially when it comes to 5G and machine-learning (ML) solutions.
We got a more detailed look at Xilinx’s “Everest” adaptive compute acceleration platform (ACAP), which targets 5G and ML. ACAP uses a tile-based architecture of the kind commonly found in high-performance computing and network processing (Fig. 1). The non-blocking interconnect delivers over 200 GB/s per tile, and adjacent cores can also share results. The instruction set includes integrated synchronization primitives. Like an FPGA, the system is scalable, allowing large arrays to tackle more ambitious projects.
1. Xilinx’s adaptive compute acceleration platform (ACAP) is built around a tile-based network of parallel processors.
Tachyum’s Prodigy chip targets networking applications with a pair of 400-Gb Ethernet links to feed its 64-core chip (Fig. 2). It’s designed to address ML applications, which are becoming increasingly important in managing and delving into the contents of network traffic. The cores get out-of-order (OOO) execution courtesy of the compiler, and instruction parallelism is implemented by applying poison bits. All of the I/O is linked via a high-speed ring that also connects to the multicore fabric.
2. Tachyum’s Prodigy chip sports a 64-core array with 400-Gb Ethernet support to handle networking tasks.
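The talk summary doesn't spell out how Prodigy applies its poison bits, so the following is only a sketch of the textbook use of the idea: a speculative instruction that faults marks its destination register as poisoned instead of trapping, the mark propagates through dependent instructions, and the fault surfaces only if a non-speculative instruction consumes the value. The three-instruction mini-ISA (`spec_load`, `add`, `store`), the `run` function, and `PoisonFault` are hypothetical names invented for illustration and aren't taken from Tachyum.

```python
class PoisonFault(Exception):
    """Raised when a non-speculative instruction consumes a poisoned register."""


def run(program, memory):
    """Execute a toy program; memory is a dict mapping address -> value."""
    regs = {}         # register name -> value
    poisoned = set()  # registers whose value stems from a deferred fault
    for instr in program:
        op = instr[0]
        if op == "spec_load":      # ("spec_load", dst, addr)
            _, dst, addr = instr
            if addr in memory:
                regs[dst] = memory[addr]
                poisoned.discard(dst)
            else:
                regs[dst] = 0      # placeholder; the poison bit marks it invalid
                poisoned.add(dst)  # defer the fault instead of trapping now
        elif op == "add":          # ("add", dst, src_a, src_b)
            _, dst, a, b = instr
            tainted = a in poisoned or b in poisoned
            regs[dst] = regs[a] + regs[b]
            if tainted:
                poisoned.add(dst)  # poison propagates to dependent results
            else:
                poisoned.discard(dst)
        elif op == "store":        # ("store", addr, src), non-speculative
            _, addr, src = instr
            if src in poisoned:
                raise PoisonFault(f"store of poisoned register {src}")
            memory[addr] = regs[src]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory
```

In this model, a load that was hoisted above a branch and turns out to be unneeded costs nothing even if its address is invalid, which is what lets a compiler reorder instructions aggressively.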
On another front, Arm lifted the veil further on its ML processor. It includes features like static scheduling, where convolution operations wait until data is DMA’d into memory, providing relatively predictable performance (Fig. 3). Convolution output feature maps are interleaved across compute engines, and support for weight and feature-map compression reduces data and power requirements.
3. Arm’s machine-learning processor can synchronize convolution operations with DMA transfers.
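To make the scheduling idea concrete, here is a minimal sketch of the two mechanisms described: convolutions that start only once their DMA transfer has finished, and output feature maps dealt out across compute engines. The cycle counts, the engine count, the single DMA channel, and the function names are all assumptions made for illustration, and round-robin is just one plausible reading of "interleaved". None of it comes from Arm's design.

```python
def static_schedule(tiles):
    """Plan start times ahead of execution.

    tiles is a list of (dma_cycles, conv_cycles) pairs. The model assumes one
    DMA channel and one convolution unit, each handling tiles in order.
    Returns a list of (dma_done, conv_start, conv_done) tuples.
    """
    schedule = []
    dma_free = 0   # cycle at which the DMA channel becomes idle
    conv_free = 0  # cycle at which the convolution unit becomes idle
    for dma_cycles, conv_cycles in tiles:
        dma_done = dma_free + dma_cycles
        # The convolution waits for both its data and the compute unit.
        conv_start = max(dma_done, conv_free)
        conv_done = conv_start + conv_cycles
        schedule.append((dma_done, conv_start, conv_done))
        dma_free = dma_done
        conv_free = conv_done
    return schedule


def interleave_feature_maps(num_maps, num_engines):
    """Assign output feature map indices to engines in round-robin order."""
    return [list(range(e, num_maps, num_engines)) for e in range(num_engines)]


plan = static_schedule([(10, 20), (10, 5), (30, 5)])
# plan == [(10, 10, 30), (20, 30, 35), (50, 50, 55)]
assignment = interleave_feature_maps(6, 4)
# assignment == [[0, 4], [1, 5], [2], [3]]
```

Because every start time is computed before execution, the total latency (55 cycles in this example) is known in advance, which is the sense in which static scheduling yields predictable performance.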
Mythic explained its matrix-multiplying flash-memory technology, which is designed to deliver deep-neural-network support on the edge using a fraction of the power needed by alternatives. The hybrid digital/analog array performs calculations within the memory array where the network weights are stored. ADCs are used to read a memory cell’s voltage-variable conductance, resulting in an 8-bit value rather than the one or two bits normally stored in a cell (Fig. 4). The architecture is designed to support ML models like those developed using TensorFlow.
4. Mythic drives its memory matrix with DACs and reads results using 8-bit ADCs.
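A rough numerical model helps show what computing inside the memory array means. In the sketch below, each weight is an 8-bit conductance code, each input is an 8-bit DAC code, the products sum as current along a column, and an 8-bit ADC digitizes each column total. The normalization, the choice of ADC full scale, and the function names are assumptions made for illustration. Real devices must also contend with noise, signed weights, and calibration, none of which is modeled here.

```python
def quantize(value, full_scale, bits=8):
    """Map a value in [0, full_scale] to an unsigned code of the given width."""
    levels = (1 << bits) - 1
    clamped = min(max(value, 0.0), full_scale)
    return round(clamped / full_scale * levels)


def analog_matvec(weight_codes, input_codes, bits=8):
    """Multiply an input vector by a weight matrix stored as cell codes.

    weight_codes[r][c] is the code held in the cell at row r, column c.
    input_codes[r] is the DAC code driving row r.
    Returns one ADC code per column.
    """
    levels = (1 << bits) - 1
    rows = len(weight_codes)
    cols = len(weight_codes[0])
    # Assumed ADC range: every cell in a column conducting at its maximum.
    full_scale = float(rows)
    outputs = []
    for c in range(cols):
        # Each cell contributes (input voltage) x (conductance), both
        # normalized to [0, 1]; the column wire sums the contributions.
        current = sum(
            (input_codes[r] / levels) * (weight_codes[r][c] / levels)
            for r in range(rows)
        )
        outputs.append(quantize(current, full_scale, bits))
    return outputs


result = analog_matvec([[255, 0], [255, 128]], [255, 255])
# result == [255, 64]
```

The multiply-accumulate happens where the weights already sit, so the weights never travel to a separate arithmetic unit. That is where the claimed power savings would come from.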
The integration of ML in almost every application space is placing demands on hardware and software. These and other technologies are rising to meet the demand.