Traditional programmable DSPs are running out of juice when it comes to tackling forthcoming 3G voice-over-Internet-Protocol (VoIP) and other high-density voice-processing systems. With a voracious appetite for billions of operations per second, these emerging applications demand a whole new class of monolithic DSP engines with a several-fold improvement in processing capability over conventional single-core DSP designs, and the ability to simultaneously handle multiple tasks, including controller operations.
Although single-core DSP chips have dramatically improved in signal-processing capability over the last decade, the thirst for performance has risen even faster. To keep pace, DSP developers are packing multiple powerful cores onto a single chip. In addition to ensuring that these on-chip cores can communicate with each other effectively, developers are building them to support programming in high-level languages. Moreover, these chips combine power-management techniques with the benefits of deep-submicron CMOS to keep power consumption low. Coupled with a novel programming methodology, new C/C++ compilers are under development to generate efficient code for these multicore DSP architectures.
While some have scalable bus-based platforms to support multiple DSP cores on the same piece of silicon, others implement switch-fabric techniques to handle higher data throughput, as well as a simpler programming model. As memory plays a key role in such architectures, shared memory techniques are being incorporated to enable multiple cores to share the communications and memory resources available on the chip. Memory size as well as the cost of the DSP chip are being optimized for the best price/performance ratio.
Analyst Will Strauss, president of Forward Concepts, identifies these multicore DSPs as access communications processors, or ACPs. He believes ACPs are as important to carrier-access and VoIP gateways as microprocessors are to PCs. Though the market for VoIP media gateways and VoIP-enabled remote-access concentrators is presently depressed, as is the rest of the telecom industry, the firm projects a rejuvenation in these markets starting next year. Strauss foresees a substantial increase in the consumption of multicore DSP processors. A recent company report indicates that multicore DSP shipments will rise from a mere $10 million this year to over $50 million next year and experience a compound annual growth rate of more than 140% until 2005.
Envisioning the need for such powerful chips in the communications infrastructure systems, like wireless, wireline, and packet-based networks, Agere Systems (formerly Lucent Technologies' Microelectronics Group) was an early developer of multicore DSP architectures. The company has leveraged the scalability of a high-speed local interconnect bus architecture, called Daytona, with the processing horsepower of StarCore's SC140S DSP supercore.
The result is the 16-bit StarPro2000, Agere's first derivative of the Daytona bus architecture using three DSP supercores (Fig. 1). Because each SC140S is a combination of a high-performance core, its local memory subsystem, interrupt controller, and bus interface, StarCore prefers to call this basic building block a supercore macrocell.
The Daytona architecture was first developed by researchers at Bell Labs with four programmable processing elements (PEs) connected to a high-performance split-transaction bus (STbus) with 32-bit addresses and 128-bit data. Each PE in this chip implements a controller that manages the flow of data between the PEs and the shared memory, while an I/O controller handles the data flow on and off the chip.
StarCore is a cooperative R&D initiative between Agere Systems and Motorola's Semiconductor Products Sector. According to Agere, this multicore architecture can be scaled to a higher or lower number of cores.
With three such supercores on-chip, the StarPro2000 can perform 3000 million multiply-accumulates per second (MMACs) at 250 MHz. Plus, it consumes less than 2.0 W at a 250-MHz clock with a 1.0-V core, not including any I/O activity.
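The peak figure above can be reproduced with simple arithmetic. The short Python sketch below is only a back-of-the-envelope check: the core count, clock rate, and power bound come from the article, while the figure of four MAC units per SC140 core is an assumption that the article does not state. The function name `peak_mmacs` is hypothetical.

```python
# Back-of-the-envelope check of the peak MAC rate quoted in the article.
CORES = 3            # supercores on the StarPro2000 (from the article)
CLOCK_MHZ = 250      # core clock in MHz (from the article)
MACS_PER_CYCLE = 4   # ASSUMPTION: MAC units per SC140 core, not stated in the article
POWER_W = 2.0        # upper bound on power, excluding I/O (from the article)


def peak_mmacs(cores, macs_per_cycle, clock_mhz):
    """Peak million MACs per second if every MAC unit is busy on every cycle."""
    return cores * macs_per_cycle * clock_mhz


total = peak_mmacs(CORES, MACS_PER_CYCLE, CLOCK_MHZ)
per_core = total // CORES
# Upper bound on power spent per MMAC, since the article gives only "below 2.0 W".
mw_per_mmac = POWER_W * 1000 / total

print(total, per_core, round(mw_per_mmac, 2))
```

Under that assumption the three cores account for the full 3000 MMACs, or 1000 MMACs apiece. This is a peak rate; sustained throughput depends on how well the application keeps every MAC unit fed.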
Designed for multichannel infrastructure applications, the StarPro2000 can process up to 64 wideband CDMA basestation voice/data bits, as well as the speech coding and echo cancellation of 64 wireless voice channels and 90 data channels. Other features include 768 bytes of shared SRAM, three serial I/O units, one parallel interface unit, two 32-bit external memory interface units, and eight memory-to-memory DMA channels. Because the SC140 core has excellent control-code efficiency, the StarPro2000 can execute both control and DSP operations.
Programming such multicore designs isn't a trivial job. It takes a whole new way of thinking. "Unlike programming and controlling a single-core solution, multiple cores must be treated as separate devices," notes Charlie Mera, director of marketing at Agere Systems. "Whether it's a parallel processing or a pipelined problem, each core must be treated as a single entity when writing your application code. Never try to spread the cycle-by-cycle processing of an application or thread across multiple cores," he adds.
"In parallel processing, for instance, the user can share the application code and peripherals," Mera explains. Because memory and peripherals dominate the size and cost of any DSP, this approach has significant cost advantages. In a pipelined approach, the architecture facilitates efficient exchange of data and communications between the cores in the chain. "The key decision or challenge for either case is to decide how you want to control the subsystem—centralized, where one core manages all of the resources, or distributed, where each core is involved in controlling the resources," he asserts.
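The two partitioning styles Mera describes can be illustrated with a small simulation. The Python sketch below is not Agere's programming model or tool flow; it is a single-threaded illustration in which the stage functions (`echo_cancel`, `encode`, `packetize`), the FIFO hand-off, and the round-robin channel assignment are all hypothetical stand-ins. On real hardware the cores would run concurrently; here they are stepped one after another simply to show which core owns which piece of work.

```python
from collections import deque

NUM_CORES = 3  # matches the three supercores described in the article


# Hypothetical per-frame stages standing in for real voice processing.
def echo_cancel(frame):
    return [s - 1 for s in frame]


def encode(frame):
    return [s * 2 for s in frame]


def packetize(frame):
    return tuple(frame)


STAGES = [echo_cancel, encode, packetize]


def run_chain(frame):
    """Run every stage on one frame, as a single core would."""
    for stage in STAGES:
        frame = stage(frame)
    return frame


def parallel(channels):
    """Parallel style: each core runs the complete chain on its own channels."""
    results = {}
    for core in range(NUM_CORES):
        # Static round-robin assignment: core k owns channels k, k+3, k+6, ...
        for ch in range(core, len(channels), NUM_CORES):
            results[ch] = run_chain(channels[ch])
    return [results[ch] for ch in range(len(channels))]


def pipelined(channels):
    """Pipelined style: core i runs only stage i and feeds core i+1 via a FIFO."""
    fifos = [deque(enumerate(channels))] + [deque() for _ in STAGES]
    for core, stage in enumerate(STAGES):
        while fifos[core]:
            ch, frame = fifos[core].popleft()
            fifos[core + 1].append((ch, stage(frame)))
    return [frame for _, frame in sorted(fifos[-1])]


channels = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(parallel(channels) == pipelined(channels))
```

In both versions a unit of work, whether a whole channel or a whole stage, belongs to exactly one core, which is the point of the advice not to spread cycle-by-cycle processing across cores. The sketch leaves open the control question raised in the quote: a centralized design would give one core the job of assigning channels or draining FIFOs, while a distributed design would let each core manage its own.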
According to Agere, the multicore StarPro2000 is fully supported by a suite of development tools that includes a compiler, an assembler/linker, and a device simulator. Currently, Agere is in the process of integrating OSE's Illuminator to support task-aware debugging.
As the company prepares to take the StarPro2000 into production by mid-2002 in 0.13-µm CMOS, it also is evaluating a future StarPro member with built-in coprocessors, smarter peripherals, and an improved memory hierarchy.