Part 1 of this article, which also appears in the print version of the magazine, discusses the technology behind placing multiple cores on a single chip. Part 2 examines applications of this technology.
Motorola has also tapped the benefits of the SC140 core to create a quad-core-based solution for compute-intensive infrastructure applications, such as packet telephony media gateways, multichannel modem banks, and 3G wireless basestations.
Capable of delivering 4800-MMACs performance, the MSC8102 integrates four 300-MHz SC140 supercores and 11.5 Mbits of on-chip SRAM, along with high-speed serial interfaces, enhanced filter coprocessors, and a multichannel DMA engine, to support more than 60 universal voice/fax/modem channels, about 80 compressed voice channels with over 64 ms of carrier-class echo cancellation, and up to 600 uncompressed (G.711) voice channels. Implemented in 0.13-µm CMOS, the MSC8102 consumes only 1.6 W at a 300-MHz clock.
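The 4800-MMACS figure follows from simple arithmetic. The sketch below assumes that each SC140 core completes four multiply-accumulates per clock cycle, a detail not stated in this article. The function name is illustrative only. It also derives a rough power-per-channel figure from the quoted 1.6 W and the best-case 600 G.711 channels.

```python
def peak_mmacs(cores: int, macs_per_cycle: int, clock_mhz: float) -> float:
    """Peak multiply-accumulate rate in millions of MACs per second."""
    return cores * macs_per_cycle * clock_mhz


# Assumption: 4 MACs per cycle per SC140 core (not stated in the article).
msc8102_peak = peak_mmacs(cores=4, macs_per_cycle=4, clock_mhz=300)

# Power per uncompressed channel, using the quoted 1.6 W and 600 G.711 channels.
mw_per_g711_channel = 1.6 * 1000 / 600

print(f"Peak throughput: {msc8102_peak:.0f} MMACS")
print(f"Power per G.711 channel: {mw_per_g711_channel:.2f} mW")
```

Peak numbers like these are upper bounds. Sustained throughput depends on memory bandwidth and how well the workload keeps every MAC unit busy.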
Agere and Motorola aren’t alone when it comes to catering to the requirements of OEMs developing high-density VoIP gateways and other similar system-level solutions. Other major DSP vendors, like Texas Instruments (TI) and Analog Devices (ADI), are very much in this race. They continue to streamline their multicore DSP designs using the latest advances in core architectures.
Texas Instruments started building its access communications processors (ACPs), the TNETV series, about two and a half years ago with the C54x core. It has since migrated to the latest high-performance, low-power C55x DSP core. Compared to the C54x, the C55x core cuts power consumption to one-sixth and incorporates a dual-MAC architecture to double the number of functional units for increased parallel processing. The result is nearly two orders of magnitude of improvement in channel density.
For instance, the first design based on C54x could provide only two or three channels per chip. The newest version, the TNETV3010, uses six C55x cores on-chip to handle 200 to 300 channels per chip. Implemented in 0.13-µm CMOS, each C55x core operates at 300 MHz and shares multiple on-chip resources (Fig. 2).
"This architecture has been conceived and driven by the needs of the application," says Irvind Ghai, TI’s manager for VoIP/high-density silicon. "To increase port density, it focuses on power/channel, area/channel, cost/channel, and system integration. In short, it delivers a complete solution at an optimized density for a given application."
In addition, the new multicore device implements a distributed internal switch fabric to allow for multiple concurrent packet transfers. This ensures high data throughput for a chip running hundreds of voice channels simultaneously. The architecture extends the shared-resource model by leveraging peripherals across multiple cores. "This internal switch fabric has been designed with the goal of providing a simple programming model, while alleviating any bandwidth concerns through multiple concurrent transfers," Ghai notes.
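The benefit of multiple concurrent transfers can be illustrated with a toy model. The sketch below is not TI's design, whose internals aren't described here. It simply assumes that a switch fabric can carry any set of transfers in the same round as long as no port appears twice, while a single shared bus carries one transfer per round. The port names are hypothetical.

```python
from typing import List, Set, Tuple

Transfer = Tuple[str, str]  # (source port, destination port)


def schedule_rounds(transfers: List[Transfer]) -> List[List[Transfer]]:
    """Greedily pack transfers into rounds; no port is used twice in a round."""
    rounds: List[List[Transfer]] = []
    busy: List[Set[str]] = []  # ports already in use, one set per round
    for src, dst in transfers:
        for i, ports in enumerate(busy):
            if src not in ports and dst not in ports:
                rounds[i].append((src, dst))
                ports.update((src, dst))
                break
        else:
            # No existing round has both ports free, so open a new round.
            rounds.append([(src, dst)])
            busy.append({src, dst})
    return rounds


transfers = [
    ("core0", "mem0"),
    ("core1", "mem1"),
    ("core2", "serial0"),
    ("core3", "mem0"),  # contends with the first transfer for mem0
]

fabric_rounds = schedule_rounds(transfers)
shared_bus_rounds = len(transfers)  # a shared bus serializes every transfer
print(len(fabric_rounds), "rounds on the fabric vs", shared_bus_rounds, "on a shared bus")
```

In this model, only transfers that contend for the same port are serialized, which is why a fabric helps most when cores work on separate memory banks or peripherals.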
ADI has also implemented the shared memory approach in its dual-core ADSP2192, tailored for multi-channel voice/fax/data over networks. "Besides sustaining inter-processor communications and programming flow, shared memory minimizes off-chip memory, and thereby the cost of the system," asserts Mark Gill, ADI’s DSP product line manager. The ADSP2192 is a derivative of the ADSP219x core. Using the ADSP218x core, ADI has also crafted an eight-core solution for 16 to 20 modem channels on a single chip.
Another builder of communications processors taking the multicore route is Brecis Communications. The developer has crafted a proprietary 3.2-Gbit/s multiservice bus that permits multiple processing engines to talk to each other, ensuring efficient peer-to-peer communications. Minimizing the risk of delay and jitter to voice payloads, this high-bandwidth multiservice bus is segmented and operates on a split-transaction protocol. It provides dynamic priority switching based on the classification of data types, as well as intelligent DMA capability. Connectivity to the bus is done via data transfer units.
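Priority switching by data type can be pictured as a bus arbiter that always serves the most urgent pending request. The sketch below is a simplified illustration, not Brecis' protocol. The traffic classes, their ranking, and the engine names are all hypothetical, and a fixed priority table stands in for whatever dynamic policy the actual bus applies.

```python
import heapq
from itertools import count

# Hypothetical traffic classes; a lower number means higher priority.
PRIORITY = {"voice": 0, "control": 1, "data": 2}


class BusArbiter:
    """Grants the bus to the highest-priority request, FIFO within a class."""

    def __init__(self):
        self._queue = []
        self._order = count()  # tie-breaker that preserves arrival order

    def request(self, engine: str, traffic_class: str) -> None:
        entry = (PRIORITY[traffic_class], next(self._order), engine, traffic_class)
        heapq.heappush(self._queue, entry)

    def grant(self):
        """Return (engine, traffic_class) for the next transaction, or None if idle."""
        if not self._queue:
            return None
        _, _, engine, traffic_class = heapq.heappop(self._queue)
        return engine, traffic_class


arbiter = BusArbiter()
arbiter.request("packet_engine", "data")
arbiter.request("voice_engine", "voice")
arbiter.request("risc", "control")
arbiter.request("voice_engine", "voice")

grants = [arbiter.grant() for _ in range(4)]
print(grants)
```

Voice requests are granted first even though a data request arrived earlier, which is the behavior that keeps delay and jitter away from voice payloads.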
Extending this technique, the maker has crafted a three-processor system for multiservice applications, such as packetized voice and data, with no loss in quality of service. Capable of tackling multiple, simultaneous streams of voice and data at the end of the last mile, namely customer premise equipment, the MSP5000 consists of a 200-MHz MIPS 32-bit RISC core for system management and traffic control, and two 160-MIPS ZSP400 DSP cores for voice and packet processing.
Each DSP core is supported by a hardware accelerator. Other functions on-chip include dual 10/100 Ethernet MACs, a telephony voice interface, a WAN data interface, a data-encryption-standard (DES) accelerator, packet authentication, a memory controller, external interfaces for configuring external devices, a JTAG interface for debugging, and a four-way data switch.
Together with the hardware accelerator, the voice engine supports up to 24 voice channels, in addition to echo cancellation, tone detection, and voice compression. The packet engine provides ATM and frame-relay encapsulation, packet classification, and shaping. The 0.18-µm CMOS MSP5000 consumes 2 W. Also, a full suite of development tools and a complete development program back it up. Derivatives for limited-voice and data applications are in the works too.
To quickly strengthen their positions in the burgeoning VoIP gateways and other wide-area-network markets, several leading broadband communications IC suppliers have acquired fabless design houses with expertise in the design of multicore DSPs. Last year, Broadcom acquired startup Silicon Spice, which devised a powerful multicore RISC/DSP processor called Calisto, short for configurable algorithm-adaptive instruction-set topology.
Early this year, Broadcom formed a technology and marketing alliance with Spectrum Signal Processing. Under this alliance, Spectrum is using Calisto to develop high-density voice-processing boards for use in communications gateways. These gateways are deployed in telecommunications to enable voice/fax/data traffic to pass between two different networks, such as the traditional circuit-switched telephone network and the packet-based Internet.
Compared to older DSP-based solutions, these next-generation voice-processing boards offer a tenfold improvement in channel density. According to Spectrum, traditional DSP-based boards offer between 240 and 500 channels on a PCI Mezzanine card. The Calisto-based solution leapfrogs such densities.
Spectrum has replaced the older C54x DSP in its aXs family of packet voice-processing boards with the Calisto processor. These newly developed boards employ modular algorithms and embedded packet telephony software developed at Broadcom. The packet telephony software resources, like voice compression, echo cancellation, tone detection and generation, IP packetization, ATM cellification, and delay equalization, were brought on-board with Broadcom’s acquisition of HotHaus Technologies last year.
Performing 2.7 billion instructions per second (BIPS), this new class of access communications processor combines sixteen 166-MHz DSP cores and five RISC processors on the same chip (Fig. 3). Unlike conventional DSP cores, each core in this 4-by-4 array uniquely combines scalar and vector processing. Plus, it offers room to run user software.
In similar moves last year, PMC-Sierra grabbed fabless developer Malleable Technologies to gain access to its multicore Meca DSPs for voice-over-packet applications. Intel also brought VxTel’s multicore capabilities on-board. Integrating all DSP and packet processing functions on the same die, Meca can replace more than 10 conventional general-purpose DSPs in any application today, PMC-Sierra claims.
Leveraging their core competency, many leading providers of licensable DSP cores have joined this fray. In conjunction with Tality and HelloSoft, BOPS has readied a system-on-a-chip (SoC) solution, called VoiceRay, with a prepackaged application for a carrier-class VoIP gateway. While HelloSoft offers the application software for the VoiceRay chip, Tality provides SoC implementation services.
BOPS’ VoiceRay incorporates a set of eight fundamental cores and a 64-bit MIPS RISC engine to attain 192 channels of G.729a or 512 channels of G.711, both with 32 ms of echo cancellation on a single chip. The RISC CPU does the housekeeping and manages the traffic to each of these cores (Fig. 4). Each basic core in this design is a scalable ManArray-based 1x2 (PEs) fixed-point core, wherein each PE is a five-way indirect very-long-instruction-word (iVLIW) architecture.
In addition, the fundamental 1x2 core employs a single-instruction, multiple-data (SIMD) format to provide a high level of built-in parallelism. While the cluster switch buried inside of each core provides interprocessor communications, the on-chip DMA engine, coupled with control and data buses, enables communications between the RISC processor, the cores, and the peripherals.
"Our SoC intellectual property is licensable and synthesizable, allowing the user to take it to the foundry of choice," notes Zafar Malik, vice president of SoC design services at BOPS. "Therefore, by porting it to the right process technology, the user can achieve an optimum combination of size, power, and cost. By providing a licensable SoC in a box, BOPS has taken the IP game to the next level."
"As designers pack large numbers of cores on a single chip, the software task partition, system bus bandwidth, efficiency, and external memory band-width play a more critical role than the DSP core’s performance," explains Kan Lu, chief technical officer at 3DSP, a maker of configurable DSP cores. Combining a modular (loosely coupled) approach with proprietary SP-5 super-SIMD cores and a shuttle bus controller, 3DSP has developed an eight-core SoC for VoIP gateways. Designed for a network company, this VoIP SoC handles 768 channels with 128 ms of echo cancellation. Maximum power consumption for this device is 2.1 W, when the SP-5 core runs at a 280-MHz clock.
Concurrently, National Semiconductor is prepping a configurable SoC for 3G cellular handset applications employing multiple SP-5 cores and other system-level IPs that the company obtained by acquiring Algorex in December 1999. Using 0.18-µm CMOS, the supplier expects to see prototypes of its new baseband chip before year’s end and hopes to supply them to key customers by early next year.
Likewise, RealChip Communications has licensed Infineon Technologies’ Carmel DSP core to build a multicore VoP solution on a single chip. Backed by applications software and a complete toolset, RealChip is confident that its SoC devices will enable convergence of voice, video, and data over packet-based networks.
Although the building block cores deployed in these multicore DSP SoCs offer some degree of flexibility and programmability, they’re not customizable and configurable. But that scenario is changing rapidly as licensable core developers, like ARC International, Improv Systems, QuickSilver Technology, and PACT, unwrap configurable methodologies for DSPs. They believe that configurable methodologies will alter the DSP landscape.
Improv, for example, has generated a programmable system architecture based on its configurable VLIW core, called Jazz. "Our platform is programmable, scalable, and configurable," notes Bob Bell, director of marketing at Improv. A key feature of this core is that it permits a designer to add custom instructions and/or execution units via the PSA composer tool suite. According to Improv, the Jazz PSA platform is fully supported by a Java-based development system and an advanced compilation system that provides system partitioning, memory allocation, code generation, and optimization.
Likewise, ARC has developed a user-customizable 32-bit RISC processor with DSP extensions, letting the user modify and extend the architecture for specific applications. One early adopter of this architecture for multiprocessor design is Hyperchip. Packing 16 ARC cores on-chip, Hyperchip has developed a petabit networking router.
Combining the best of both microprocessor and DSP capabilities with massively parallel processing on the same chip, PACT has readied a dynamically reconfigurable DSP engine, the XPU128, with unprecedented performance and bandwidth. Comprising an array of 128 PEs on-chip, this extremely parallel reconfigurable 32-bit core can perform over 50 billion operations per second, while consuming only one-tenth the power of leading DSP designs, claims Martin Vorbach, PACT’s co-founder and chief technology officer. PACT hopes that the XPU128 will be used in conjunction with a DSP core in a multicore SoC solution for emerging 3G and 4G wireless basestations and other compute-intensive high-bandwidth applications.