Multicore Array Targets Embedded Applications

June 16, 2011
Adapteva's 1-GHz E16G301 Epiphany delivers 32 GFLOPS using a 16-core array of 32-bit, single-precision floating-point processors linked by a communication mesh that can extend across multiple chips.

1 GHz cores have 32 Kbytes of memory linked by communication fabric

Adapteva's Epiphany 32-bit, single-precision floating-point multicore architecture targets low-power embedded applications (Fig. 1). Multicore systems are growing in terms of core count, but the cores often are complex, such as those in Intel's latest Sandy Bridge Xeons. Platforms like Tilera's 1.5-GHz Tile-GX chip, with 100 64-bit VLIW (very long instruction word) cores, target communication environments and incorporate a flexible virtual memory system, allowing groups of cores to be partitioned into a symmetric multiprocessing (SMP) island (see "Single Chip Packs In 100 VLIW Cores" at electronicdesign.com).

Xeons and Tile-GX chips provide a great programming environment, but they are overkill for many of the applications that suit Adapteva's solution. Epiphany, by contrast, gives each core dedicated memory without any cache and implements a NUMA architecture that permits any core to read or write information in another core's memory. Eliminating the cache provides a more deterministic programming environment.
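
One way to picture a flat, cache-free NUMA address space is to split each global address into a core identifier and a local offset. The Python sketch below is only an illustration. The 4-by-4 array and the 32 kbytes per core come from the article, while the field widths, the row/column packing, and the names `global_address`, `split_address`, and `CoreArray` are assumptions made for the example and should not be read as Adapteva's documented memory map.

```python
# Toy model of a flat NUMA address space. The bit layout is an assumption.
ROWS, COLS = 4, 4            # 16-core array, as in the E16G301
LOCAL_BYTES = 32 * 1024      # per-core memory stated in the article
OFFSET_BITS = 20             # assumed width of the local-offset field
COL_BITS = 6                 # assumed width of the column field


def global_address(row, col, offset):
    """Pack (row, col, offset) into a single global address."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("core coordinates outside the array")
    if not (0 <= offset < LOCAL_BYTES):
        raise ValueError("offset outside the core's local memory")
    core_id = (row << COL_BITS) | col
    return (core_id << OFFSET_BITS) | offset


def split_address(addr):
    """Recover (row, col, offset) from a global address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    core_id = addr >> OFFSET_BITS
    return core_id >> COL_BITS, core_id & ((1 << COL_BITS) - 1), offset


class CoreArray:
    """Every core's local memory is reachable through a global address."""

    def __init__(self):
        self.mem = {(r, c): bytearray(LOCAL_BYTES)
                    for r in range(ROWS) for c in range(COLS)}

    def write(self, addr, value):
        row, col, offset = split_address(addr)
        self.mem[(row, col)][offset] = value

    def read(self, addr):
        row, col, offset = split_address(addr)
        return self.mem[(row, col)][offset]
```

In this model a store issued by any core to `global_address(2, 3, 0x100)` lands directly in the memory of the core at row 2, column 3. With no cache in between, there is no coherence traffic to reason about.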

Peripherals are memory-mapped, so it's even possible for a remote core to program the DMA channels. Each core has two DMA channels that can move data between any two memory locations, not just within local memory. A core can also get an interrupt from a local DMA channel when it completes a job.

Adapteva links cores using a mesh network called eMesh that consists of three networks. The rMesh handles read operations for on- and off-chip communication. Write operations are handled by xMesh for off-chip communication, and cMesh handles on-chip writes. Reads are requests that initiate a write. Writes are asynchronous, although instruction execution is synchronous and waits for off-chip memory if necessary. Adapteva implements a fixed routing system based on destination.
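
The split between the meshes, together with fixed destination-based routing, can be sketched in a few lines of Python. The mesh selection follows the description above. The routing policy is an assumption: the article only says routing is fixed and based on destination, and dimension-order routing (here, move along the row first, then along the column) is one common scheme with that property. The function names are hypothetical.

```python
def pick_mesh(operation, on_chip):
    """Choose the network for a transaction, per the article's description."""
    if operation == "read":
        return "rMesh"               # reads, on- and off-chip
    if operation == "write":
        return "cMesh" if on_chip else "xMesh"
    raise ValueError("operation must be 'read' or 'write'")


def route(src, dst):
    """Return the (row, col) hops from src to dst, inclusive.

    Assumed policy: change column first, then row. The next hop depends
    only on the current node and the destination, so the path is fixed.
    """
    row, col = src
    path = [(row, col)]
    while col != dst[1]:
        col += 1 if dst[1] > col else -1
        path.append((row, col))
    while row != dst[0]:
        row += 1 if dst[0] > row else -1
        path.append((row, col))
    return path
```

Under this assumed policy, a write from core (0, 0) to core (2, 3) always takes the same five hops. That fixed path contributes to the predictable timing the architecture aims for.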

The RISC core has a 64-word register file and runs applications from its own 32 kbytes of memory, which is shared between code and data. The small amount of per-core memory leads to programming issues similar to those encountered on other platforms like IBM's Cell processor, which is found inside Sony's PlayStation 3 (see "CELL Processor Gets Ready To Entertain The Masses" at electronicdesign.com). The Cell's eight Synergistic Processing Elements (SPEs) each have 256 kbytes of memory, and applications need to move data between core memory and a slower but larger shared memory.
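
To get a feel for the constraint, consider a blocked matrix multiply that keeps three square single-precision tiles (two inputs and one output) in a core's local memory alongside the program. The Python sketch below computes the largest tile that fits. The 32-kbyte figure is from the article; the three-tile layout, the 8-kbyte code size in the usage note, and the function name are assumptions chosen for illustration, and stack space is ignored.

```python
import math

LOCAL_BYTES = 32 * 1024   # per-core memory shared by code and data
FLOAT_BYTES = 4           # 32-bit single-precision value


def max_tile_dim(code_bytes, tiles=3):
    """Largest n such that `tiles` n-by-n float tiles fit beside the code."""
    data_bytes = LOCAL_BYTES - code_bytes
    if data_bytes <= 0:
        return 0
    # n*n*tiles*FLOAT_BYTES <= data_bytes, solved for integer n
    return math.isqrt(data_bytes // (tiles * FLOAT_BYTES))
```

With an assumed 8 kbytes of code, `max_tile_dim(8 * 1024)` returns 45, so the working set is three 45-by-45 tiles. Anything larger has to be streamed in and out of local memory, for example with the DMA channels.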

Adapteva's off-core memory access is slower than access to a core's own memory, but not by as wide a margin as on the Cell. Epiphany offers a consistent memory access mechanism regardless of location, providing a zero-startup message-passing system. The instruction set implements atomic operations. There is a single AND-mode global synchronization flag as well.
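
An AND-mode flag lends itself to barrier-style synchronization: the global value reads true only once every core has raised its own contribution. The Python model below is a behavioral sketch of that idea. The class and method names are hypothetical, and the article does not describe how the flag is set, cleared, or read in hardware.

```python
class AndFlag:
    """Behavioral model of one global flag formed by ANDing per-core bits."""

    def __init__(self, n_cores):
        self.local = [False] * n_cores

    def raise_flag(self, core):
        """Core signals that it has reached the synchronization point."""
        self.local[core] = True

    def clear(self):
        """Reset all contributions before the next synchronization phase."""
        self.local = [False] * len(self.local)

    def value(self):
        """Global flag: true only when every core has raised its bit."""
        return all(self.local)
```

In a 16-core model, `value()` stays false after 15 calls to `raise_flag` and turns true on the 16th, which is the behavior a barrier needs.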

The 16-core, 1-GHz E16G301 delivers 32 GFLOPS of computational power. The chip has a total of 512 kbytes of memory with a 32-Gbyte/s/processor memory bandwidth. It also has four low-voltage differential signal (LVDS) interfaces associated with the edges of the mesh network. Each communication link has a bandwidth of 8 Gbytes/s in full-duplex mode. The links are implemented as 8-bit ports that run at a 500-MHz double-data rate (DDR) similar to HyperTransport. The chip uses only 2 W and comes in a 15- by 15-mm, 324-ball ball-grid array (BGA) package. Quantity pricing for the E16G301 starts at $499.
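
The headline numbers can be cross-checked with simple arithmetic, shown here in Python. The core count, clock rates, per-core memory, port width, and link count are from the article. Two inputs are assumptions: that each core completes two floating-point operations per cycle (a multiply and an add), and that each link has one 8-bit port per direction.

```python
CORES = 16
CLOCK_HZ = 1_000_000_000
FLOPS_PER_CYCLE = 2            # assumption: one multiply plus one add per cycle
KBYTES_PER_CORE = 32

peak_gflops = CORES * CLOCK_HZ * FLOPS_PER_CYCLE / 1e9   # 32.0
total_kbytes = CORES * KBYTES_PER_CORE                   # 512

PORT_BITS = 8
PORT_CLOCK_HZ = 500_000_000
EDGES_PER_CLOCK = 2            # double data rate
LINKS = 4

# Raw rate of one 8-bit DDR port in one direction, in bytes per second.
one_way_bytes_per_s = PORT_BITS * PORT_CLOCK_HZ * EDGES_PER_CLOCK // 8

# Assumption: one port per direction on each of the four links.
aggregate_bytes_per_s = one_way_bytes_per_s * 2 * LINKS
```

Under these assumptions the compute and memory totals match the quoted 32 GFLOPS and 512 kbytes. A single port works out to 1 Gbyte/s per direction, and the 8-Gbyte/s figure is reached when both directions of all four links are counted together.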

The 65-nm E16G301 also has nodes that are only 0.5 mm², allowing many cores to fit in the space of a single Arm Cortex core. The roadmap targets 64-, 256-, 1024-, and 4096-core chips with the move to 28 nm. Bittware, one of the early adopters, has placed four E16G301 Anemone chips on an FMC (VITA 57) carrier board. Only two of the communication links are brought out to an FPGA via the FMC connector due to pin limitations. FMC sites are common on VPX (VITA 46/48/65) and AdvancedMC (AMC) boards. Bittware has a number of FPGA boards that support the ATLANTiS Framework, which makes it easier to incorporate the Epiphany interface logic on the FPGA.

Bittware's approach highlights Adapteva's link interface, which requires an FPGA or a custom ASIC. This simplifies the design of the 16-core chip and makes it essentially a coprocessor for FPGAs or host processors. It also allows the designer to choose how an array of chips will be used. FPGAs can provide any interface to the array, from PCI Express to Serial RapidIO. The interface FPGA can be very simple, or it might provide its own computational facilities.

The chip targets high-performance computing. But the architecture is also available for ASIC designs, which can easily incorporate interfaces to a host processor or even provide the mesh with peripheral interfaces, allowing it to be the host. Programming is accomplished using conventional C/C++ tools. A GNU-based toolset and an Eclipse integrated development environment (IDE) are available. A single-chip development system is paired with a standard FPGA board.

Adapteva's Epiphany architecture offers a glimpse at what embedded multicore designs can look like. It shows what many small cores can do.
