Customers want more for less. That's not a hardship in semiconductors, though. Shrinking features and better design techniques help to churn out lower-power, higher-performing circuits at less cost—like the Xtensa LX.
This next-generation CPU core from Tensilica consumes just 50 µW/MHz when implemented in a 130-nm process. This power-frugal core also delivers high I/O bandwidth. Internal queues can sustain data rates as high as 350 Gbits/s for each input or output queue.
Improved compute performance is key to the LX core. Thanks to its flexible-length instruction extensions (FLIX), the processor boosts performance dramatically over the previous Xtensa V architecture. The FLIX architecture lets designers freely and modelessly intermix instructions of various lengths (16-, 24-, or 32/64-bit).
Because FLIX packs multiple operations into a wide 32- or 64-bit instruction word, designers can accelerate a broader class of hot spots in embedded applications. FLIX does so by eliminating the performance and code-size drawbacks associated with a fixed instruction length.
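The code-size side of that argument can be illustrated with a little arithmetic. The sketch below is only an illustration and not Tensilica's actual encoding: the instruction stream is hypothetical, and each entry is simply the width in bits of the narrowest format (16, 24, or 64) assumed to hold that instruction. It compares the footprint when lengths are intermixed freely against the footprint when every instruction occupies a fixed 64-bit word.

```python
# Hypothetical instruction stream: each entry is the width (in bits) of the
# narrowest encoding assumed to fit that instruction. The mix is invented
# for illustration and is not taken from any real Xtensa LX program.
STREAM_BITS = [24, 16, 24, 64, 64, 16, 24]


def mixed_length_bytes(stream):
    """Footprint when 16-, 24-, and 64-bit instructions are freely intermixed."""
    return sum(stream) // 8


def fixed_length_bytes(stream, word_bits=64):
    """Footprint when every instruction is padded to one fixed word width."""
    return len(stream) * word_bits // 8


mixed = mixed_length_bytes(STREAM_BITS)
fixed = fixed_length_bytes(STREAM_BITS)
print(f"intermixed: {mixed} bytes, fixed 64-bit: {fixed} bytes")
```

The gap depends entirely on the assumed mix. A stream dominated by wide multi-operation instructions would show little difference.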
In benchmark tests run by the Embedded Microprocessor Benchmark Consortium, the core achieved an "out-of-the-box" score of 171.6 at 330 MHz. That's a 600% improvement over the Xtensa V processor and nearly nine times better than the ARM1020E processor. When running DSP-related benchmarks (the BDTIsimMark2000), the core delivers a score of 6150 at 345 MHz—or about 70% better than the CEVA-X1620, the previous BDTI benchmark leader for licensable DSP cores.
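Because the two scores were measured at different clock rates, it can help to normalize them. The sketch below divides each quoted score by its clock frequency and back-calculates a rough CEVA-X1620 figure from the "about 70% better" claim. That back-calculated number is an estimate derived from the article's wording, not a published score, and the helper names are made up for this example.

```python
def score_per_mhz(score, clock_mhz):
    """Crude normalization: benchmark score divided by clock frequency."""
    return score / clock_mhz


def implied_baseline(score, percent_better):
    """Baseline score implied by 'score is percent_better % above baseline'."""
    return score / (1 + percent_better / 100)


# Figures quoted in the article.
eembc_per_mhz = score_per_mhz(171.6, 330)   # out-of-the-box EEMBC score
bdti_per_mhz = score_per_mhz(6150, 345)     # BDTIsimMark2000 score

# Estimate only: "about 70% better" is approximate in the source.
ceva_estimate = implied_baseline(6150, 70)

print(f"EEMBC score per MHz: {eembc_per_mhz:.2f}")
print(f"BDTI score per MHz:  {bdti_per_mhz:.2f}")
print(f"Implied CEVA-X1620 score (estimate): {ceva_estimate:.0f}")
```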
To address the need for higher-speed memories, Tensilica crafted the core with a configurable pipeline. System designers can add two clock cycles for memory access, expanding the pipeline from five to seven stages, if the application requires it. The longer pipeline helps boost the clock frequency in systems that employ large local memories or low-power memories with slower access speeds.
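A rough timing model shows why a longer pipeline helps with slow memories. The sketch below assumes, and this is not stated in the article, that the five-stage pipeline allots one clock cycle to a local-memory access while the seven-stage option allots two. The 4-ns access time is a hypothetical value, and other critical paths in the core are ignored.

```python
def max_clock_mhz(access_time_ns, cycles_for_access):
    """Highest clock (MHz) at which a memory access fits in the allotted cycles."""
    return cycles_for_access * 1000.0 / access_time_ns


ACCESS_NS = 4.0  # hypothetical local-memory access time

five_stage_limit = max_clock_mhz(ACCESS_NS, 1)   # assumed: one cycle per access
seven_stage_limit = max_clock_mhz(ACCESS_NS, 2)  # assumed: two cycles per access

print(f"5-stage memory-limited clock: {five_stage_limit:.0f} MHz")
print(f"7-stage memory-limited clock: {seven_stage_limit:.0f} MHz")
```

Under these assumptions the same memory stops limiting the clock until twice the frequency, at the cost of extra latency on each access.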
Two innovations that improve I/O throughput complement the higher-performance processor core: an option for a second load/store unit, and designer-defined ports and queues. When configuring the LX core with Tensilica's design tools, designers can select one or two load/store units that are 128 bits wide. Dual load/store units are a standard feature on many high-end DSP engines, and many applications can benefit from them when handling data-intensive inner loops.
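The benefit for inner loops can be sketched with an idealized cycle count. The model below assumes the loop is limited only by load bandwidth and that each load/store unit completes one load per cycle. Both are simplifications, and the dot-product loop with two loads per iteration is a hypothetical workload.

```python
import math


def inner_loop_cycles(iterations, loads_per_iteration, load_store_units):
    """Idealized cycle count for a loop bound only by load bandwidth."""
    cycles_per_iteration = max(1, math.ceil(loads_per_iteration / load_store_units))
    return iterations * cycles_per_iteration


# Hypothetical dot product: each multiply-accumulate needs two operands loaded.
single_unit = inner_loop_cycles(1024, loads_per_iteration=2, load_store_units=1)
dual_unit = inner_loop_cycles(1024, loads_per_iteration=2, load_store_units=2)

print(f"one load/store unit:  {single_unit} cycles")
print(f"two load/store units: {dual_unit} cycles")
```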
Adding designer-defined ports and queues helps improve the I/O capability. Ports are wires that directly connect two Xtensa LX processors, or an LX processor to external RTL. Port connections can be arbitrarily wide, enabling wide data types to be transferred easily without the need for multiple load/store operations. As many as 1 million signals (1024 ports, each 1024 bits wide) can theoretically be used to deliver a peak transfer rate of 350 terabits/s of direct data flow per processor. Though theoretical, the figure shows that old concepts of I/O bottlenecks in a processor are obsolete.
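The peak figure follows from simple multiplication. The sketch below assumes a 350-MHz clock and one transfer per port per cycle. The article does not say which clock rate underlies the 350-terabit number, so this is one plausible reading rather than Tensilica's own calculation.

```python
PORTS = 1024
BITS_PER_PORT = 1024
CLOCK_HZ = 350e6  # assumed clock rate; not stated in the article for this figure

signals = PORTS * BITS_PER_PORT              # exact signal count
peak_tbits = signals * CLOCK_HZ / 1e12       # using the exact count
rounded_tbits = 1_000_000 * CLOCK_HZ / 1e12  # using "1 million signals"

print(f"signals: {signals}")
print(f"peak with exact count:   {peak_tbits:.1f} Tbits/s")
print(f"peak with rounded count: {rounded_tbits:.1f} Tbits/s")
```

With the signal count rounded to 1 million, the result matches the quoted 350 terabits/s. The exact count of 1,048,576 signals gives a slightly higher number.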
While ports quickly convey control and status information, queues provide a high-speed way to transfer streaming data. Queues can sustain data rates as high as one transfer every clock cycle, or over 350 Gbits/s for each queue added to the LX core. Custom instructions can perform multiple queue operations per cycle, perhaps combining inputs from two input queues with local data and sending the computed values to two output queues. With the high bandwidth and low control overhead of its queues, the Xtensa LX processor can be used in applications with extreme data rates.
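The dataflow of such a custom instruction can be mimicked in software. The sketch below is a behavioral model only: the operation (a scaled sum and a scaled difference) and the function name are hypothetical, and a real design would describe the instruction with Tensilica's tools rather than in Python. Each loop iteration stands for one clock cycle in which the instruction pops both input queues, combines the values with a local coefficient, and pushes to both output queues.

```python
from collections import deque


def run_queue_kernel(in_a, in_b, coeff):
    """Model one hypothetical two-in, two-out queue instruction per cycle."""
    queue_a, queue_b = deque(in_a), deque(in_b)
    out_sum, out_diff = deque(), deque()
    cycles = 0
    while queue_a and queue_b:
        a = queue_a.popleft()              # read input queue A
        b = queue_b.popleft()              # read input queue B
        out_sum.append(coeff * (a + b))    # write output queue 1
        out_diff.append(coeff * (a - b))   # write output queue 2
        cycles += 1                        # all four queue operations in one cycle
    return list(out_sum), list(out_diff), cycles


sums, diffs, cycles = run_queue_kernel([1, 2, 3], [4, 5, 6], coeff=2)
print(sums, diffs, cycles)
```

In this model, four queue operations complete per cycle with no load/store instructions involved, which is the low-overhead behavior the paragraph above describes.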
The core implements extensive power-management schemes. This lets the processor's minimum configuration dissipate just 50 µW/MHz, which is 25% less than the Xtensa V processor.
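The per-megahertz rating converts to absolute power with a single multiplication. The sketch below applies the quoted 50 µW/MHz to a 330-MHz clock and back-calculates the Xtensa V figure implied by "25% less." It assumes the rating scales linearly with frequency and holds at that clock rate, which the article does not confirm, and the Xtensa V value is an estimate rather than a published specification.

```python
UW_PER_MHZ = 50.0  # minimum-configuration figure quoted in the article


def core_power_mw(clock_mhz, uw_per_mhz=UW_PER_MHZ):
    """Core power in milliwatts, assuming linear scaling with clock rate."""
    return uw_per_mhz * clock_mhz / 1000.0


def implied_previous_uw_per_mhz(current_uw_per_mhz, percent_less):
    """Earlier figure implied by 'current is percent_less % below previous'."""
    return current_uw_per_mhz / (1 - percent_less / 100)


power_at_330 = core_power_mw(330)
xtensa_v_estimate = implied_previous_uw_per_mhz(UW_PER_MHZ, 25)

print(f"LX core at 330 MHz: {power_at_330:.1f} mW")
print(f"Implied Xtensa V rating (estimate): {xtensa_v_estimate:.1f} uW/MHz")
```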
Licensing fees for a single-processor setup start at $550,000 for the Xtensa LX processor and Vectra LX DSP engine. Separate licenses are available for the Xtensa developers toolkit, which includes the Xplorer development environment, a C/C++ compiler, an instruction set simulator, and the TIE compiler.