Are you about to start a new design with massive signal-processing requirements, such as a media gateway or the latest MRI device? If so, consider using a DSP farm connected via the Serial RapidIO (sRIO) protocol, or just go out and buy a DSP "farm on a chip."
Serial RapidIO is a high-speed, packet-switched, point-to-point protocol with a predictable low latency. It's ideally suited for connecting scalable DSP farms used in video transcoding, industrial imaging, media gateways, wireless basestations, and other applications where bandwidth and low latency are crucial. A "farm on a chip"—a semiconductor that contains several DSP-type processors and fast interconnect—serves the same purpose as connecting several DSP chips together for the same types of applications.
The sRIO architecture provides a predictable standard for chip-to-chip communications across a high-speed, point-to-point serial I/O network. It was designed for processor and peripheral interface connections, and it can be configured in a number of different topologies to suit your design requirements (Fig. 1).
"The use of Serial RapidIO in bandwidth-intensive DSP applications helps simplify multiprocessing design by minimizing the number of signal pins used to achieve high-speed interconnect between DSPs and by simplifying the interprocessor communications, both of which help decrease system cost," says Leon Adams, worldwide manager of DSP Product Marketing for Texas Instruments. Additionally, Leon says that sRIO support has made TI's TMS320C6455 DSP a popular solution for the scalability requirements in DSP farms.
Serial RapidIO signals use 8B/10B encoding, in which every eight bits of data are transmitted as a 10-bit symbol and special control symbols are available alongside the data symbols. Separately from the line coding, a packet size of 276 bytes is required to transmit 256 bytes of data. The remaining 20 bytes identify both the sender and recipient and carry cyclic redundancy codes (CRCs) and protocol information. Other serial interconnects that employ 8B/10B encoding include Gigabit Ethernet, PCI Express, and InfiniBand.
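To put those two sources of overhead in perspective, here is a minimal Python sketch of the arithmetic. The 8/10 coding ratio and the 256-of-276-byte packet figures come from the text above; the lane rate and lane count passed in the usage note are illustrative assumptions, and the function names are hypothetical.

```python
# Back-of-the-envelope link efficiency for the figures quoted in the article.
# Function names are hypothetical; lane rate and lane count are caller-supplied
# assumptions, not values taken from the article.

def line_coding_efficiency(data_bits: int = 8, symbol_bits: int = 10) -> float:
    """Fraction of transmitted bits that carry data under 8B/10B coding."""
    return data_bits / symbol_bits


def packet_efficiency(payload_bytes: int = 256, packet_bytes: int = 276) -> float:
    """Fraction of each packet that is payload rather than header, CRC, etc."""
    return payload_bytes / packet_bytes


def effective_payload_gbps(lane_gbaud: float, lanes: int) -> float:
    """Payload throughput after both line coding and packet overhead."""
    raw_gbaud = lane_gbaud * lanes
    return raw_gbaud * line_coding_efficiency() * packet_efficiency()
```

As a usage example, assuming a four-lane link at 3.125 Gbaud per lane, `effective_payload_gbps(3.125, 4)` comes to roughly 9.3 Gbits/s of payload out of 12.5 Gbaud on the wire. The packet framing alone costs about 7%, while the 8B/10B coding costs a fixed 20%.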
In general, the RapidIO architecture is partitioned into a hierarchy of three layers: logical, transport, and physical. It's also highly scalable, allowing for simple future product upgrades that require more processing power while maintaining compatibility.
Speaking of compatibility, sRIO is a tried and true standard that provides the flexibility needed for a large number of applications. As a result, there's no need to "roll your own" when you wish to connect multiple DSPs together, since sRIO is a robust open standard with no proprietary elements.
This means there's no need to design an ASIC to handle inter-DSP communications over a proprietary bus. Off-the-shelf sRIO switches are available from companies like Tundra Semiconductor. Tundra also offers a number of development boards that use sRIO to connect several DSPs together, so you can be up and running in no time.
"In many applications such as wireless infrastructure or video head end equipment, multiple DSPs must be clustered to ensure that there is sufficient processing power to meet the real-time signal-processing requirements of the application," says Devashish Paul, product marketing manager for Tundra.
"Serial RapidIO switches provide a high-performance means of clustering large numbers of DSPs in a small form factor with a three-layer protocol terminated in hardware, minimizing the DSP software overhead normally associated with terminating other protocols such as Ethernet," continues Paul.
What makes the RapidIO architecture an ideal fabric technology? RapidIO's fabric was designed with architectural independence, traffic and fault management, and performance in mind. As such, RapidIO holds advantages over alternative architectures for distributed DSP applications.
For example, PCI-X and PCI Express use a hierarchical, spanning-tree topology. RapidIO employs peer-to-peer communications to implement several topologies, including dual-star, mesh, daisy-chained, and tree topologies (Fig. 1, again).
"Early on, it was recognized that DSP farm applications were a 'killer app' for Serial RapidIO. Legacy shared buses connecting more than a handful of devices become challenging electrical and board layout adventures. Fundamental physics intrude to cap increasing data rates," says Greg Shippen, system architect with Freescale Semiconductor's Networking & Computing Systems Group.
"Using SERDES (serializer/deserializer) technology, we achieve high bandwidth over just a few pins. With full hardware offload of the protocol stack, RapidIO minimizes the use of valuable DSP cycles simply to move data around the system," Shippen continues. "For example, multiple Freescale StarCorebased MSC8144 Quad Core devices can be connected using RapidIO to easily create a compute resource for wireless basestation, video transcoding, or packet telephony applications."
Another noteworthy point is that the RapidIO protocol calls for devices to share memory globally. Also, it provides direct memory access from multiple endpoints. Protocols like PCI generally use a common memory map that must be shared among all connected devices.
But what if you have to use both PCI and RapidIO? In this case, consider Micro Memory's CoSine family, which allows for bridging PCI/PCI-X/PCI-Express and sRIO. It also provides a multiport double-data-rate (DDR) controller and permits real-time data transfer between PCI and sRIO.
The best system topology depends on the requirements of the specific application. Serial RapidIO is flexible here, letting the developer arrange the DSP network in ring, mesh, or star topologies and extend it later for performance upgrades. Also, because sRIO links are point-to-point, traffic between one pair of devices leaves bandwidth available elsewhere in the topology for other DSP devices to communicate with each other at the same time.
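The tradeoff between these topologies can be made concrete with a short Python sketch that counts links and worst-case hops for a given number of DSPs. This is a simplified graph model, not an sRIO tool: the function names are hypothetical, each link is treated as a single hop, and the star's hub is assumed to be a single switch.

```python
from collections import deque

# Simplified graph model of three topologies named in the article.
# All names are hypothetical; a "hop" is one link traversal.


def ring(n):
    """Each DSP links to its two neighbors."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def full_mesh(n):
    """Every DSP links directly to every other DSP."""
    return {i: set(range(n)) - {i} for i in range(n)}


def star(n):
    """DSPs 0..n-1 each link to one hub (node n), assumed to be a switch."""
    graph = {i: {n} for i in range(n)}
    graph[n] = set(range(n))
    return graph


def link_count(graph):
    """Number of physical links (each one is listed from both ends)."""
    return sum(len(neighbors) for neighbors in graph.values()) // 2


def max_hops(graph, endpoints):
    """Worst-case hop count between any two endpoints, via breadth-first search."""
    worst = 0
    for src in endpoints:
        dist = {src: 0}
        pending = deque([src])
        while pending:
            node = pending.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    pending.append(neighbor)
        worst = max(worst, max(dist[e] for e in endpoints))
    return worst
```

For eight DSPs, this model gives 8 links and up to 4 hops for the ring, 28 links and 1 hop for the full mesh, and 8 links and 2 hops for the star. The mesh buys the lowest hop count at the cost of many more links, which is the kind of choice the application's requirements have to settle.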
"With the demand for ever-increasing processing performance offset by system constraints such as size, weight, and power, FPGAs are being deployed to perform the heavy transformations and data reduction common to so many DSP applications," says Mike Jadon, director of product marketing for Micro Memory.
"However, many of our customers are finding the best balance to be in heterogeneous processing, where FPGAs are utilized in conjunction with general-purpose processors (GPPs) such as PowerPCs," adds Jadon.
"Combining heterogeneous processing with a scalable fabric interconnect like Serial RapidIO has turned out to be the most effective approach for them in meeting their project requirements, both technically and in terms of cost and time-to-market."
RapidIO is quickly being adopted by leading suppliers of embedded semiconductor devices, such as Motorola, Freescale, Texas Instruments, and Tundra Semiconductor. OEMs such as Alcatel, EMC Corp., Ericsson, and Lucent are also adopting RapidIO for the ease with which RapidIO may be integrated into the system.
DSP "FARM ON A CHIP"
What if you didn't have to buy several DSPs and connect them together using RapidIO, or any other method, since all of the compute power you needed was right at your fingertips in a single device? If this sounds appealing, say hello to two devices that may make your life easier.
Ambric Semiconductor's Am2000 massively parallel fixed-point TeraOps solution uses a globally asynchronous, locally synchronous (GALS) architecture that enables solutions for the high-performance video- and image-processing markets. The Am2000 ICs are built around a set of parallel, multiple-instruction, multiple-data (MIMD) arrays (bricks) of 32-bit reduced-instruction-set computing (RISC) processors and memories in a fabric of asynchronous messaging channels (Fig. 2).
Ambric's claim to fame is its structural object programming model, which provides a much simplified platform to quickly develop and debug embedded applications. It also shows us why object-oriented programming (OOP) isn't just for software developers anymore.
You too may now take advantage of the Eclipse integrated development environment (IDE) and Java to engineer your application using venerable OOP techniques. Then the architecture can take care of the processing using good, old-fashioned divide-and-conquer techniques. A key benefit of a massively parallel architecture is that unused processors and their associated RAM can be used for debugging without any functionality or performance penalties.
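The general idea of independent objects that communicate only over messaging channels can be sketched in a few lines of Python. This is not Ambric's toolchain or API: threads stand in for processors, bounded queues stand in for channels, and every name here (including the `DONE` end-of-stream marker) is a hypothetical chosen for illustration.

```python
import queue
import threading

# Illustration only: threads stand in for processors and bounded queues stand
# in for messaging channels. None of these names come from Ambric's tools.

DONE = object()  # hypothetical end-of-stream marker


def source(values, out_ch):
    """Object that emits a stream of values, then signals completion."""
    for value in values:
        out_ch.put(value)
    out_ch.put(DONE)


def scale(factor, in_ch, out_ch):
    """Object that multiplies each incoming value and forwards it."""
    while True:
        item = in_ch.get()
        if item is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(item * factor)


def collect(in_ch, results):
    """Object that gathers the processed stream into a list."""
    while True:
        item = in_ch.get()
        if item is DONE:
            return
        results.append(item)


def run_pipeline(values, factor):
    """Wire the three objects together with two channels and run them."""
    ch_a = queue.Queue(maxsize=1)  # a full channel blocks the sender
    ch_b = queue.Queue(maxsize=1)
    results = []
    workers = [
        threading.Thread(target=source, args=(values, ch_a)),
        threading.Thread(target=scale, args=(factor, ch_a, ch_b)),
        threading.Thread(target=collect, args=(ch_b, results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

Calling `run_pipeline([1, 2, 3], 10)` returns `[10, 20, 30]`. Because the objects share no state and touch only their channels, each one can be developed and tested on its own, which is the appeal of this style on a device with many processors.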
CELL STEPS FORWARD
Next, the research efforts of Toshiba, Sony, and IBM yielded the Cell Broadband Engine Architecture, or Cell for short. The "nucleus" of the Cell processor is a powerful RISC 64-bit dual-threaded IBM PowerPC core. The "ribosomes" are a set of eight 32-bit synergistic processing elements (SPEs), which are specialized coprocessors. Each SPE is a floating-point unit that's well-suited to quickly handle single-precision and double-precision mathematical calculations.
The Cell's "endoplasmic reticulum" comprises two high-speed buses. The first, called the Element Interconnect Bus (EIB), is for intra-Cell communications. The second, dubbed the FlexIO bus, is used for inter-Cell communications when two or more Cell processors are connected together.
Target applications include high-definition displays, recording equipment, entertainment systems, digital imaging systems, and physical simulations (e.g., scientific and structural engineering modeling). Mercury Computer Systems is planning several server-class systems with the Cell integrated. Toshiba has plans for Cell-based HD televisions.
In addition, Sony announced that it will sell its highly anticipated PlayStation 3 with the Cell processor later this month. Now, what self-respecting gamer would pass up the chance to own a console that runs at 4 GHz and is theoretically capable of 256 GFLOPS, which, in all likelihood, is far more powerful than their PC?
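For readers wondering where a figure like 256 GFLOPS comes from, the usual peak-rate arithmetic is sketched below in Python. The clock rate and the eight SPEs are taken from the text; the four single-precision lanes per SPE and the convention of counting a fused multiply-add as two operations are assumptions made here for illustration, and the function name is hypothetical.

```python
# Peak-rate arithmetic behind a theoretical GFLOPS figure.
# Assumptions (not stated in the article): each unit processes 4
# single-precision values per cycle, and each multiply-add counts as 2 ops.


def peak_gflops(clock_ghz, units, simd_lanes=4, flops_per_lane_per_cycle=2):
    """Theoretical peak: every lane of every unit busy on every cycle."""
    return clock_ghz * units * simd_lanes * flops_per_lane_per_cycle
```

Under those assumptions, `peak_gflops(4.0, 8)` evaluates to 256.0, matching the quoted number. A peak figure like this assumes every lane is busy on every cycle, so sustained throughput on real workloads will be lower.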