LTE Comm Processor Implements Multi-Dispatch VLIW Architecture
ConnX SSP16 Soft Bit Processor
Tensilica has delivered a range of processor architectures like the Xtensa LX (see Power Play For The SoC Developers) as well as DSP and the ConnX comm processor (see Dataplane Processing Unit More Flexible Than DSP). Its latest ConnX processor suite includes four platforms that are combined to address the LTE market. They include the ConnX BBE64-128, the ConnX Turbo16, ConnX SSP16, and the ConnX BSP3. Each targets an aspect of an LTE base station to provide optimum configurability, power usage and performance while minimizing chip real estate.
The high-end, 28-nm ConnX BBE64-128 delivers 100 GigaMACs (multiply-accumulates) of performance and is designed as the central workhorse of an LTE (Long-Term Evolution) Advanced system. It employs a multi-slot VLIW architecture that is more akin to a CISC architecture with multiple pipelined execution units. As in a normal VLIW architecture, the fixed-size VLIW instructions are split into sub-instructions, or slots, each intended for a particular type of execution unit. The difference is that there can be more than one execution unit per type, and an instruction slot does not map to a particular execution unit. Instead, each slot is dispatched to an idle execution unit of the right type, in a fashion similar to a CISC system.
The execution units are also pipelined and interlocked, so the programmer and compiler do not have to contend with race conditions. A typical VLIW system normally executes each instruction in a single cycle or has the compiler handle multiple-cycle executions. Tensilica's approach is easier to work with and avoids interrupt-handling problems because instructions always complete and cannot be disrupted by another instruction.
The compiler can optimize performance by making sure that the right number of instruction slots is filled with code that will execute efficiently. This is similar to RISC architectures, where instruction ordering can improve performance because a subsequent instruction might otherwise cause the system to idle while waiting for another instruction to complete.
Essentially, the decoder takes the instructions in each slot and tries to assign them to an execution unit. A VLIW instruction will not complete until all the slots have been given over to an execution unit. The ConnX BBE64-128 has two 64-bit MAC units, four SIMD ALUs, and four regular ALUs.
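The dispatch behavior described above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not Tensilica's actual issue logic: the names `Dispatcher`, `issue`, and `UNIT_COUNTS` are hypothetical, the unit counts are the ones quoted for the ConnX BBE64-128, and the per-slot "occupancy" (the number of cycles a unit cannot accept new work) is an invented parameter used to make stalls visible.

```python
from dataclasses import dataclass, field

# Unit counts quoted in the article for the ConnX BBE64-128.
UNIT_COUNTS = {"mac": 2, "simd": 4, "alu": 4}


@dataclass
class Dispatcher:
    """Toy model (assumption): each slot goes to any idle unit of its type."""
    # busy_until[kind][i] is the first cycle at which unit i can accept work.
    busy_until: dict = field(
        default_factory=lambda: {k: [0] * n for k, n in UNIT_COUNTS.items()}
    )
    cycle: int = 0

    def issue(self, bundle):
        """Issue one VLIW bundle, a list of (unit_kind, occupancy) slots.

        Returns the cycle at which the last slot was handed to a unit.
        The bundle stalls (interlocks) until every slot has found a unit.
        """
        pending = list(bundle)
        while pending:
            still_waiting = []
            for kind, occupancy in pending:
                units = self.busy_until[kind]
                free = next(
                    (i for i, t in enumerate(units) if t <= self.cycle), None
                )
                if free is None:
                    still_waiting.append((kind, occupancy))
                else:
                    units[free] = self.cycle + occupancy
            pending = still_waiting
            if pending:
                self.cycle += 1  # no idle unit of the needed type: stall
        issued_at = self.cycle
        self.cycle += 1  # the next bundle issues one cycle later at the earliest
        return issued_at


d = Dispatcher()
first = d.issue([("mac", 1), ("mac", 1), ("alu", 1)])   # fits immediately
second = d.issue([("mac", 2), ("mac", 2), ("simd", 1)])  # both MACs tied up
third = d.issue([("mac", 1), ("alu", 1)])                # waits for a MAC
```

In this model the third bundle stalls for one cycle because both MAC units are still occupied, which is the kind of idle time the compiler tries to avoid when it fills slots.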
This same approach is taken with the other VLIW-based architectures like the ConnX SSP16 Soft Bit Processor (Fig. 1). It has a two-slot VLIW instruction that drives a SIMD unit and two ALUs.
The ConnX BBE64-128 also incorporates a range of new features. Its “soft bit” vector data types support operations including arbitrary field insertion and extraction for complex transmit operations. It has parallel register files for 10/20-bit and 40-bit data types. There are single-cycle 16-way complex radix-4 and radix-8 FFT (fast Fourier transform) and DFT (discrete Fourier transform) instructions. The instruction set supports interleaving for all bit, byte, half-word, and word vector types for flexibility and efficiency in the HARQ (hybrid automatic repeat request), forward error correction, and convolutional coding found in LTE applications. The AXI interface allows for easy shared memory connection design when incorporating other cores.
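For readers unfamiliar with the term, a radix-4 butterfly is the four-point building block of such an FFT. The following is a minimal scalar sketch of the arithmetic only; the function names are hypothetical, and it says nothing about how the ConnX instructions are encoded or how they process 16 lanes at once.

```python
import cmath


def radix4_butterfly(a, b, c, d):
    """Four-point DFT of (a, b, c, d) using only adds and multiplies by -j."""
    s0 = a + c
    s1 = a - c
    s2 = b + d
    s3 = (b - d) * -1j
    return s0 + s2, s1 + s3, s0 - s2, s1 - s3


def direct_dft4(x):
    """Reference four-point DFT computed straight from the definition."""
    return tuple(
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 4) for n in range(4))
        for k in range(4)
    )


samples = (1 + 2j, -3 + 0.5j, 0.25 - 1j, 2 + 2j)
fast = radix4_butterfly(*samples)
slow = direct_dft4(samples)
```

The butterfly needs no general complex multiplications, only additions and swaps of real and imaginary parts, which is one reason radix-4 stages suit hardware well.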
The ConnX SSP16 Soft Stream Processor targets channel encoding as well as modulation and demodulation chores. It includes a 16-way SIMD baseband core optimized for the processing of soft bits. It can accelerate wireless communication PHY routines such as Viterbi, HARQ, and de-rate matching.
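A soft bit is a signed confidence value rather than a hard 0 or 1. As a rough illustration of the kind of work involved in HARQ, the sketch below combines the soft bits of a retransmission with stored ones using a saturating add. The function names, the 8-bit width, and the sign convention are all assumptions made for the example; they are not taken from the SSP16 instruction set.

```python
def saturate(value, bits=8):
    """Clamp value to the signed range of the given width (assumed 8 bits)."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def harq_combine(stored, received, bits=8):
    """Element-wise saturating add of two equal-length soft-bit lists."""
    if len(stored) != len(received):
        raise ValueError("soft-bit vectors must have the same length")
    return [saturate(a + b, bits) for a, b in zip(stored, received)]


def hard_decision(soft_bits):
    """Assumed convention: non-negative means bit 0, negative means bit 1."""
    return [0 if v >= 0 else 1 for v in soft_bits]


first_tx = [100, -20, 5, -120]
retransmission = [60, -30, -9, -50]
combined = harq_combine(first_tx, retransmission)
decided = hard_decision(combined)
```

A SIMD core would apply the same saturating add to many soft bits per instruction; the list comprehension here stands in for those lanes.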
The ConnX BSP3 Bit Stream Processor is about one-quarter the size of the SSP16 and usually handles channel decode chores. It is designed for processing and control of bit streams and can accelerate wireless communication PHY routines such as bit mapping, bit interleaving and turbo encoding.
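Bit interleaving reorders a bit stream so that a burst of channel errors is spread out before decoding. A minimal sketch of a generic block interleaver follows; it writes bits row by row and reads them column by column. It is an assumption-laden illustration with hypothetical function names, and it is not the specific permutation defined for LTE.

```python
def interleave(bits, columns):
    """Write bits row-wise into a matrix with `columns` columns, read column-wise."""
    if columns <= 0 or len(bits) % columns:
        raise ValueError("length must be a positive multiple of columns")
    rows = len(bits) // columns
    return [bits[r * columns + c] for c in range(columns) for r in range(rows)]


def deinterleave(bits, columns):
    """Undo interleave(): transposing the transposed matrix restores the order."""
    if columns <= 0 or len(bits) % columns:
        raise ValueError("length must be a positive multiple of columns")
    rows = len(bits) // columns
    return interleave(bits, rows)


original = [0, 1, 2, 3, 4, 5]  # indices used in place of bits for clarity
shuffled = interleave(original, 3)
restored = deinterleave(shuffled, 3)
```

Using position indices as the payload makes the reordering easy to follow: neighbors in the input end up separated in the output.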
The multi-standard ConnX Turbo16 Turbo Decoder (ConnX Turbo16) is about the same size as the SSP16 but tailored as a programmable turbo decoder for LTE and HSPA+. It can achieve 150-Mbit/s decoded bit rates.
No single core addresses LTE well, but this combination efficiently covers base station design.