Designers are packing more functionality into less space while increasing the data bandwidth of portable devices. Unfortunately, they're quickly running out of signal-processing performance. Today's crop of DSPs can deliver 30 to 100 MIPS, but portable communications devices must also handle MP3 audio decoding, other multimedia algorithms, and web/Internet support tasks.
To help developers of next-generation cell phones, PDAs, and other products, designers at Infineon have crafted a second-generation implementation of their Carmel DSP core. Used in conjunction with the PowerPlug accelerator options, it can deliver from two to ten times the performance of the first-generation Carmel DSP 10XX family.
The core contains an enhanced instruction set that includes all the commands from the DSP 10XX. It also features additional commands that target the complex signal-processing tasks required by advanced voice processing, third-generation cell-phone algorithms, multimedia applications, and data communications. Like its predecessor, the Carmel DSP 20XX core retains the dual multiplier-accumulator (MAC), the dual arithmetic and logic units (ALUs), and the unique configurable long-instruction-word (CLIW) capability. The CLIW feature lets programmers create their own "superinstructions" by combining several of the DSP core's instructions into one large operation that does more in parallel, greatly improving algorithm performance.
Also, special add-on hardware accelerators known as PowerPlugs can be cointegrated with the core whenever software execution alone can't deliver the performance an algorithm requires. These coprocessor blocks simplify system-on-a-chip (SoC) implementation, since designers don't have to import a function from another design library or design it themselves.
The PowerPlug functions initially available will include a MAC, a quad 8-bit ALU, and an MPEG-4 video decoder. The MAC PowerPlug supplements the core's dual MAC: cointegrating two MAC PowerPlugs would let the DSP perform four MAC operations per clock cycle, making it far better suited to multiplication-intensive computations.
Similarly, the quad 8-bit ALU PowerPlug can raise the pixel-processing rate when used alongside the Carmel's ALUs. If quad ALUs occupy all four PowerPlug coprocessor slots, the DSP block could process up to 16 pixels per cycle. Alternatively, the MPEG-4 PowerPlug could occupy one of the slots and decode real-time MPEG-4 streams.
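To make the pixel arithmetic concrete, here is a minimal Python sketch of what a quad 8-bit ALU operation might look like: four unsigned 8-bit pixels packed into one 32-bit word and added lane by lane with saturation. The packing order, the saturating add, and the names pack, unpack, quad_add_sat, pixels_per_cycle, and macs_per_cycle are all hypothetical; the article doesn't describe the PowerPlug's actual operation set. The last two functions only reproduce the article's throughput arithmetic (four lanes times four slots; two core MACs plus one per MAC PowerPlug).

```python
LANES = 4      # a quad 8-bit ALU works on four 8-bit values at once
CORE_MACS = 2  # the Carmel core's own dual MAC

def pack(pixels):
    """Pack four 8-bit pixels into one 32-bit word (lane 0 in the low byte)."""
    if len(pixels) != LANES or any(not 0 <= p <= 0xFF for p in pixels):
        raise ValueError("expected four values in 0..255")
    word = 0
    for lane, p in enumerate(pixels):
        word |= p << (8 * lane)
    return word

def unpack(word):
    """Split a 32-bit word back into four 8-bit pixels."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(LANES)]

def quad_add_sat(a, b):
    """Hypothetical quad op: lane-wise unsigned add, clamped to 255 per lane."""
    return pack([min(x + y, 0xFF) for x, y in zip(unpack(a), unpack(b))])

def pixels_per_cycle(quad_alu_plugs):
    """Peak pixels per cycle with `quad_alu_plugs` quad-ALU PowerPlugs (0-4)."""
    if not 0 <= quad_alu_plugs <= 4:
        raise ValueError("the core accepts at most four PowerPlugs")
    return LANES * quad_alu_plugs

def macs_per_cycle(mac_plugs):
    """Core dual MAC plus one MAC per MAC PowerPlug (assumed one MAC each)."""
    return CORE_MACS + mac_plugs

print(unpack(quad_add_sat(pack([10, 200, 128, 255]), pack([5, 100, 127, 1]))))
# [15, 255, 255, 255]
print(pixels_per_cycle(4), macs_per_cycle(2))  # 16 4
```

Saturating rather than wrapping is the usual choice for pixel data, because an overflowing bright pixel should stay white instead of turning black.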
These PowerPlug blocks give the DSP core extra horsepower for functions that can't readily use traditional "MIPS" (typically, multiply-accumulate operations). Such nontraditional operations account for an ever-larger share of the DSP's execution time as applications extend beyond the traditional signal-processing realm.
Some DSP solutions address these challenges by adding dedicated instructions for specific algorithms, such as commands that execute key portions of a Viterbi decoder. Dedicated hardware behind these instructions provides the computational acceleration.
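For readers unfamiliar with the Viterbi kernel, the step that such dedicated instructions typically target is the add-compare-select (ACS) operation. The Python sketch below is a generic textbook formulation, not a description of any particular DSP's instruction. The names acs and butterfly are illustrative, and the metric convention (smaller is better, ties keep the first predecessor) is an assumption.

```python
def acs(pm_a, pm_b, bm_a, bm_b):
    """Add-compare-select for one trellis state.

    pm_a, pm_b: accumulated path metrics of the two predecessor states
    bm_a, bm_b: branch metrics of the transitions from a and from b
    Returns (survivor metric, decision bit). Smaller metrics are better
    (an assumed convention); ties keep predecessor a.
    """
    cand_a = pm_a + bm_a   # add
    cand_b = pm_b + bm_b
    if cand_b < cand_a:    # compare
        return cand_b, 1   # select predecessor b
    return cand_a, 0       # select predecessor a

def butterfly(pm_a, pm_b, bm0, bm1):
    """Radix-2 butterfly for a typical rate-1/2 code: two target states share
    predecessors a and b, with the branch metrics swapped between them."""
    return acs(pm_a, pm_b, bm0, bm1), acs(pm_a, pm_b, bm1, bm0)

print(butterfly(3, 5, 2, 1))  # ((5, 0), (4, 0))
```

A decoder runs one butterfly per state pair per received symbol and stores the decision bits; a traceback later walks those bits backward to recover the data, which is presumably the kind of work the Back Trace instructions described later speed up.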
Not everyone writes algorithms the same way, though, so dedicated instructions and hardware can end up as excess baggage. That was the main reason Infineon developed the CLIW approach, which lets programmers define their own CLIW instructions. Each one is a composite of up to four Carmel instructions that execute in parallel.
This CLIW concept has been extended in the Carmel DSP 20XX series. Designers not only can configure the instruction set in the new core, they also can modify the core's datapath to meet their application requirements. These modifications are implemented through the addition of one to four PowerPlug accelerators (see the figure).
The Carmel 20XX's basic architecture consists of two processor blocks. The first contains an ALU, a MAC, a barrel shifter, and an exponent unit; the second contains only an ALU and a MAC. The two blocks can operate in parallel to make fast work of time-critical computations. Both MACs perform single-cycle 16- by 16-bit multiplications with accumulation to 40 bits.
When designers need still more compute power, they can add PowerPlug accelerators to the architecture by tapping into the data-bus switches. The accelerators are linked to software through CLIW commands, which the user must construct.
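The 16- by 16-bit MAC with a 40-bit accumulator is easy to model. In the minimal Python sketch below, a 16 x 16-bit signed product fits in 32 bits, and the 8 extra accumulator bits act as guard bits so long sums can grow before overflowing. Whether the hardware saturates or wraps on overflow, and the names mac, wrap40, and dot, are assumptions for illustration; the sketch supports both behaviors.

```python
ACC_BITS = 40
ACC_MAX = (1 << (ACC_BITS - 1)) - 1   # largest signed 40-bit value
ACC_MIN = -(1 << (ACC_BITS - 1))      # smallest signed 40-bit value

def to_int16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def wrap40(x):
    """Reduce x modulo 2**40 into the signed 40-bit range."""
    x &= (1 << ACC_BITS) - 1
    return x - (1 << ACC_BITS) if x >> (ACC_BITS - 1) else x

def mac(acc, x, y, saturate=True):
    """acc + x*y with 16-bit signed operands and a 40-bit accumulator."""
    total = acc + to_int16(x) * to_int16(y)       # the product fits in 32 bits
    if saturate:
        return max(ACC_MIN, min(ACC_MAX, total))  # clamp instead of overflowing
    return wrap40(total)                          # plain two's-complement wrap

def dot(xs, ys):
    """Dot product as a chain of MACs, the inner loop of an FIR filter."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc

print(dot([1, 2, 3], [4, 5, 6]))                       # 32
print(dot([-32768] * 600, [-32768] * 600) == ACC_MAX)  # True (saturated)
```

With two such MACs working in parallel, an FIR filter can retire two taps per cycle, which is the point of the dual-MAC design.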
The Carmel 20XX core generates the memory addresses, control signals, and wait states. It also supplies opcodes and operands to the PowerPlug accelerators and writes their results back to memory. PowerPlug-enabled development tools recognize PowerPlug-extended CLIW instructions, and the accelerators are fully supported during debug and emulation, including the ability to view PowerPlug-specific internal registers.
The accelerators decode their own instructions, control their own registers, and perform the desired datapath function. The CLIW commands provide flexible software control of the PowerPlugs. Since the accelerators are modular and interruptible, the DSP designer can mix and match the PowerPlug accelerators and then dynamically select them at every instruction. The accelerators also support memory wait states.
CLIW operations can control up to four PowerPlug accelerators simultaneously, using a 16-bit instruction for each PowerPlug. These opcodes are encoded inside the CLIW word and passed to the appropriate PowerPlug unit along with the operands. For very sophisticated PowerPlug functions, opcodes can optionally be extended to 32 bits.
CLIW instructions are 144 bits wide: a 48-bit instruction word stored in program memory plus a 96-bit instruction word (six 16-bit instructions) stored in the CLIW active memory, an SRAM block that can hold up to 2048 CLIW instructions. An 11-bit field in the 48-bit word, exactly wide enough to address all 2048 entries, serves as a lookup pointer into the CLIW memory to fetch the correct 96-bit custom instruction, which is divided into six parallel subinstructions.
The first four subinstructions target the two ALUs and two MACs of the architecture common to the Carmel 10XX and 20XX. PowerPlug instructions can be substituted for any or all of these four. The remaining two slots usually hold memory-access instructions.
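The CLIW lookup can be sketched in a few lines of Python. The widths come from the article (48 + 96 = 144 bits, six 16-bit slots, an 11-bit index into 2048 entries); everything else is an assumption for illustration: the index is taken from the low 11 bits of the program word, slot 0 sits in the low 16 bits of the entry, and the slot labels and function names are hypothetical.

```python
PROGRAM_WORD_BITS = 48   # CLIW word stored in program memory
CLIW_ENTRIES = 2048      # capacity of the CLIW active memory
INDEX_BITS = 11          # 2**11 == 2048, so 11 bits address every entry
SLOT_BITS = 16
SLOTS = 6                # 96-bit entry = six 16-bit subinstructions

# Hypothetical labels: four compute slots (core ALU/MAC work or PowerPlug
# opcodes), then the two slots that usually hold memory accesses.
SLOT_NAMES = ["compute0", "compute1", "compute2", "compute3", "mem0", "mem1"]

def cliw_index(word48):
    """Extract the 11-bit CLIW-memory pointer (assumed: the low 11 bits)."""
    if not 0 <= word48 < (1 << PROGRAM_WORD_BITS):
        raise ValueError("not a 48-bit word")
    return word48 & ((1 << INDEX_BITS) - 1)

def split_entry(entry96):
    """Split a 96-bit CLIW entry into six 16-bit subinstructions (slot 0 low)."""
    if not 0 <= entry96 < (1 << (SLOT_BITS * SLOTS)):
        raise ValueError("not a 96-bit entry")
    mask = (1 << SLOT_BITS) - 1
    return [(entry96 >> (SLOT_BITS * i)) & mask for i in range(SLOTS)]

def fetch_cliw(word48, cliw_memory):
    """Look up the custom entry and label each subinstruction by slot."""
    entry = cliw_memory[cliw_index(word48)]
    return dict(zip(SLOT_NAMES, split_entry(entry)))

mem = [0] * CLIW_ENTRIES
mem[5] = sum((i + 1) << (SLOT_BITS * i) for i in range(SLOTS))
print(fetch_cliw((0x3 << INDEX_BITS) | 5, mem))
# {'compute0': 1, 'compute1': 2, 'compute2': 3, 'compute3': 4, 'mem0': 5, 'mem1': 6}
```

Keeping the wide 96-bit part in a separate SRAM means program memory only grows by one 48-bit word per CLIW call, while the parallel slot contents are defined once and reused.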
The Carmel core uses 24-bit instructions, which can be extended to 48 bits for a wider selection of operands, including larger immediate operand fields and direct operand references. In a single cycle, the core can execute one standard 24-bit instruction, two standard 24-bit instructions, or one 48-bit instruction. The CLIW architecture extends these "traditional" DSP instructions by an additional 96 bits.
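The single-cycle issue rule above can be illustrated with a small cycle counter. This Python sketch is a simplification under stated assumptions: it issues instructions in order, pairs any two adjacent 24-bit instructions, and ignores data dependencies and whatever pairing restrictions the real core imposes. The function name count_cycles is hypothetical.

```python
def count_cycles(widths):
    """Count issue cycles for a stream of instruction widths (24 or 48 bits).

    Per cycle: one 24-bit, two 24-bit, or one 48-bit instruction.
    Assumes adjacent 24-bit instructions can always pair (a simplification).
    """
    cycles = 0
    i = 0
    while i < len(widths):
        w = widths[i]
        if w == 48:
            i += 1                    # a 48-bit instruction uses the whole cycle
        elif w == 24:
            if i + 1 < len(widths) and widths[i + 1] == 24:
                i += 2                # two standard instructions issue together
            else:
                i += 1                # a lone 24-bit instruction
        else:
            raise ValueError(f"unsupported width: {w}")
        cycles += 1
    return cycles

print(count_cycles([24, 24, 48, 24]))  # 3
print(count_cycles([24, 48, 24, 24]))  # 3
```

The trade-off it shows: a 48-bit instruction buys larger immediates and direct operands at the cost of the second 24-bit issue slot in that cycle.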
The core supports conditional execution through a predication mechanism that avoids branch penalties and removes the need for fast context switching. It relies on a register-bank exchange instruction and a conditional-execution load instruction. Hardware looping support allows zero-overhead loops nested up to four levels deep. Back Trace instructions, which accelerate functions such as Viterbi decoding, are included as well.
ALU instructions support double-precision operations. There are also special instructions for square, divide, minimum/maximum, block floating point, logical and arithmetic shifts, bit manipulation, fractional and integer arithmetic, and other special operations. These unique operations include limiting, saturation, nearest and convergent rounding modes, and special instructions for efficient C-compiler support.
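Zero-overhead looping means the loop-end check and jump back happen in hardware rather than through a branch instruction. The Python sketch below is a toy model of that bookkeeping, assuming a stack of (start, end, count) frames limited to four levels; the class LoopStack, the run helper, and the setup encoding are hypothetical and do not reflect the Carmel's actual loop registers or instructions.

```python
MAX_LOOP_DEPTH = 4  # nesting limit quoted for the Carmel hardware loops

class LoopStack:
    """Toy model of hardware loop registers: start PC, end PC, count."""

    def __init__(self):
        self.frames = []

    def push(self, start_pc, end_pc, count):
        if len(self.frames) == MAX_LOOP_DEPTH:
            raise OverflowError("hardware loop nesting exceeds four levels")
        self.frames.append([start_pc, end_pc, count])

    def next_pc(self, pc):
        """PC to fetch after executing `pc`; no branch instruction needed."""
        while self.frames and pc == self.frames[-1][1]:
            frame = self.frames[-1]
            frame[2] -= 1
            if frame[2] > 0:
                return frame[0]   # jump back to the loop start
            self.frames.pop()     # loop done; an outer loop may end here too
        return pc + 1

def run(setups, last_pc):
    """Return executed PCs; `setups` maps a PC to the loop it configures."""
    stack, pc, executed = LoopStack(), 0, []
    while pc <= last_pc:
        executed.append(pc)
        if pc in setups:
            stack.push(*setups[pc])
        pc = stack.next_pc(pc)
    return executed

# Outer loop over PCs 1-4 runs twice; inner loop over PC 3 runs three times.
print(run({0: (1, 4, 2), 2: (3, 3, 3)}, last_pc=5))
# [0, 1, 2, 3, 3, 3, 4, 1, 2, 3, 3, 3, 4, 5]
```

In the trace, every repeated PC comes from the loop hardware rather than from an executed branch, so no cycles are spent on loop control.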
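Nearest and convergent rounding differ only on exact ties, which is why DSPs offer both. The Python sketch below shows the standard definitions when dropping low bits from an accumulator: round-to-nearest adds half an LSB and truncates (ties round up), while convergent rounding sends ties to the even result so that errors don't accumulate a bias. Saturation then clamps to 16 bits. The function names are hypothetical, and the exact Carmel instruction semantics are not given in the article.

```python
def round_shift(value, shift, mode="nearest"):
    """Drop `shift` low bits of a signed integer with DSP-style rounding.

    nearest:    add half an LSB, then truncate (ties round toward +infinity)
    convergent: exact ties round to the even result (unbiased)
    """
    if shift < 1:
        raise ValueError("shift must be at least 1")
    if mode not in ("nearest", "convergent"):
        raise ValueError(f"unknown rounding mode: {mode}")
    half = 1 << (shift - 1)
    q = (value + half) >> shift   # Python >> floors, like an arithmetic shift
    if mode == "convergent" and (value & ((1 << shift) - 1)) == half:
        q &= ~1                   # exact tie: clear the LSB to make it even
    return q

def saturate16(x):
    """Limit a result to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, x))

ties = [2, 6, 10, -2]  # 0.5, 1.5, 2.5, -0.5 in units of 1 << 2
print([round_shift(v, 2, "nearest") for v in ties])     # [1, 2, 3, 0]
print([round_shift(v, 2, "convergent") for v in ties])  # [0, 2, 2, 0]
print(saturate16(round_shift(0x7FFF8000, 16)))          # 32767
```

The last line shows why rounding and saturation go together: rounding 0x7FFF8000 up produces 0x8000, which no longer fits in a signed 16-bit word, so the result is limited to 0x7FFF.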
Price and Availability
The Carmel 20XX core is immediately available. The license fee for the core and the PowerPlugs depends on volume and other factors. To discuss license fees, contact Shaul Berger.
Infineon Technology Inc., 1730 N. First St., San Jose, CA 95112; (800) 777-4363; www.infineon.com.