Multicore is the name of the game for performance, and three new offerings push the boundaries. Each takes a different architectural approach designed to suit its target application space. Ambric delivers 336 cores, while Tilera tries to fit conventional symmetric multiprocessing (SMP) support in a partitionable array. Intellasys targets the embedded space with a 40-core Forth machine.
BRIC BY BRIC
The Ambric AM2045 houses 336 cores organized into a 45-bric array with four 32-bit GPIO ports, a pair of DDR2 memory interfaces, and four PCI Express lanes (Fig. 1). Each bric has eight 32-bit cores with DSP support. The AM2045 can churn out 1 trillion operations per second (1 Toperation/s) at 300 MHz. Idle brics run slower to conserve power.
Each core has its own memory. Communication highways link the nearest neighboring brics. The system uses a circuit-switched interconnect, allowing information to flow from one bric to another automatically. Programming is done in a derivative of Java, and the programming model takes a wait-on-send-or-receive approach. Ambric’s Rapid Media Processing (RMP) platform includes aDesigner, a reference board with the AM2045 GT chip, and a range of codecs with OpenVIS support. Codecs include MPEG-2 and H.264.
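The wait-on-send-or-receive idea can be sketched in ordinary Python. This is not Ambric's Java-derived language or its tools; the one-slot queue, the `DONE` sentinel, and the function names are assumptions made only for illustration. Each stage simply stalls until its neighbor is ready, so no explicit locking appears in the stage code.

```python
import queue
import threading

DONE = object()  # end-of-stream marker (an assumption of this sketch)

def make_channel():
    # One-slot queue: put() blocks while the slot is full,
    # get() blocks while it is empty.
    return queue.Queue(maxsize=1)

def source(out_ch, values):
    for v in values:
        out_ch.put(v)          # waits if the previous value has not been taken yet
    out_ch.put(DONE)

def scale(in_ch, out_ch, factor):
    while True:
        v = in_ch.get()        # waits until the upstream stage sends
        if v is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(v * factor)

def run_pipeline(values, factor=2):
    a, b = make_channel(), make_channel()
    stages = [
        threading.Thread(target=source, args=(a, values)),
        threading.Thread(target=scale, args=(a, b, factor)),
    ]
    for t in stages:
        t.start()
    results = []
    while True:
        v = b.get()            # the caller acts as the final stage
        if v is DONE:
            break
        results.append(v)
    for t in stages:
        t.join()
    return results
```

Calling `run_pipeline([1, 2, 3])` pushes three values through the two stages; ordering is preserved because each channel holds at most one item at a time.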
SAVED BY ZERO
The 36-core Tilera TilePro36 and 64-core TilePro64 can run SMP Linux. Either chip can also be partitioned so that the number of cores assigned to a particular application is configurable. There is also the concept of a zero-overhead Linux.
In this case, a single Linux thread runs on a core that is dedicated to it until the thread needs to call a Linux service. This allows a data-flow application to be deterministic, since those cores won’t be handling Linux timer interrupts. Instead, those interrupts are handled by other cores, which might be dedicated to managing Linux.
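As a loose analogy only, standard Linux lets a process pin itself to one CPU. The sketch below uses Python's `os.sched_setaffinity`, which is a generic Linux facility and not Tilera's mechanism; the function name and the choice of CPU are assumptions. It shows just the dedication half of the idea: ordinary affinity does not by itself keep timer interrupts off the chosen core.

```python
import os

def dedicate_to_one_core():
    """Pin the calling process to a single CPU it is already allowed to use.

    Returns the chosen CPU number, or None if pinning is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                        # call is not provided on this platform
    try:
        allowed = sorted(os.sched_getaffinity(0))
        cpu = allowed[-1]                  # arbitrary pick: highest-numbered allowed CPU
        os.sched_setaffinity(0, {cpu})     # pid 0 refers to the caller
    except OSError:
        return None                        # the kernel or sandbox refused the request
    return cpu
```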
Tilera’s approach differs from the others since the cores share memory (Fig. 2). It also uses a distributed cache system that can be divided into dynamic regions. A piece of cached data for a group of cores resides in the cache of one core in the group. A sixth communication channel, added for dedicated cache support, doubled performance compared to the earlier Tile64.
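The notion that each piece of cached data lives in exactly one core's cache within a group can be sketched as a simple "home core" lookup. The 64-byte line size, the modulo mapping, and the function names are assumptions for illustration; the text does not describe how Tilera actually assigns data to cores.

```python
CACHE_LINE_BYTES = 64  # assumed line size, for illustration only

def home_core(address, group):
    """Pick the one core in `group` whose cache holds the line containing `address`."""
    line = address // CACHE_LINE_BYTES
    return group[line % len(group)]    # simple modulo mapping (hypothetical)

def lookup(address, requester, group):
    """Return (home, is_local): where the line lives and whether the requester holds it."""
    home = home_core(address, group)
    return home, home == requester
```

With a group of cores `[4, 5, 12, 13]`, a request from core 4 for address 64 would be served from core 5's cache, which is the kind of core-to-core traffic a dedicated cache channel would carry.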
New enhancements to the Tilera line include TileDirect I/O, which takes this new cache support into account. New support for user space device drivers also takes advantage of the architecture.
THE IMPORTANCE OF INSTRUCTION
The 40 18-bit integer cores in the Intellasys SEAforth 40C18 are arranged in a 4-by-10 rectangular array, with direct connections between each core and its nearest neighbors (up to four). It uses a core/wire paradigm with packed 18-bit instructions optimized for core-to-core communication.
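The geometry of a 4-by-10 nearest-neighbor array is easy to sketch. The (row, column) coordinate scheme and the function name below are assumptions for illustration, not Intellasys's core numbering.

```python
ROWS, COLS = 4, 10

def neighbors(row, col):
    """Return the grid positions directly above, below, left, and right of a core."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Positions that fall off the array are dropped, so edge and corner
    # cores end up with fewer than four neighbors.
    return [(r, c) for r, c in candidates if 0 <= r < ROWS and 0 <= c < COLS]
```

Interior cores get four neighbors, edge cores three, and corner cores two, which leaves the outward-facing sides of the edge cores free for other uses.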
For example, a single 18-bit instruction word contains four instructions that can implement a micro-loop for transferring data either within memory or by passing it on to other cores (Fig. 3). The unext instruction jumps back to the start of the word if the counter value isn’t zero after an auto-decrement.
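A toy interpreter can make the micro-loop concrete. The mnemonics `fetch`, `store`, and `nop` are hypothetical stand-ins, not the chip's real opcodes, and the `unext` behavior follows the description above (decrement, then restart the word while the counter is nonzero) rather than a verified instruction-set reference.

```python
def run_microloop(word, source, count):
    """Interpret one packed word of up to four slots as a copy micro-loop.

    "fetch" reads the next source item, "store" passes it on, "nop" does
    nothing, and "unext" decrements the counter and restarts the word
    while the counter is still nonzero.
    """
    if len(word) > 4:
        raise ValueError("a word holds at most four instructions")
    if count < 1:
        raise ValueError("this sketch assumes a starting count of at least 1")
    counter, src_index, held, sent = count, 0, None, []
    slot = 0
    while slot < len(word):
        op = word[slot]
        slot += 1
        if op == "fetch":
            held = source[src_index]
            src_index += 1
        elif op == "store":
            sent.append(held)          # stands in for a memory write or a neighbor port
        elif op == "unext":
            counter -= 1
            if counter != 0:
                slot = 0               # jump back to the start of the same word
        elif op != "nop":
            raise ValueError(f"unknown instruction: {op}")
    return sent
```

For instance, `run_microloop(["fetch", "store", "nop", "unext"], [10, 20, 30], 3)` copies all three items without ever fetching a second instruction word, which is the point of the packed format.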
Each core has only 64 words of RAM and ROM, but this is enough space for almost 10,000 instructions across the chip. Most code uses a fraction of this storage since cores are performing dedicated parts of an application.
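A quick back-of-the-envelope check of that capacity, taking the 64-word figure at face value and assuming every word is packed with four instructions (a best case that real code would likely not reach):

```python
CORES = 40
WORDS_PER_CORE = 64          # the article's per-core figure, taken at face value
INSTRUCTIONS_PER_WORD = 4    # best case: every word fully packed

def max_instruction_slots(cores=CORES, words=WORDS_PER_CORE,
                          per_word=INSTRUCTIONS_PER_WORD):
    """Upper bound on instruction slots across the whole chip."""
    return cores * words * per_word

print(max_instruction_slots())  # 10240
```

The ceiling of 10,240 slots is in the same ballpark as the figure quoted above.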
Edge cores support peripherals such as high-speed serializer/deserializer (SERDES) for chip-to-chip links, analog-to-digital converters (ADCs), digital-to-analog converters (DACs), and a programmable memory/SDRAM interface implemented via three cores.