MIPS Technologies combines multithreaded and multicore support in its latest embedded SMP platform. As with most multithreaded designs, the MIPS32 1004K’s multithreaded support provides an incremental performance boost that is smaller than the gain from adding another full core. Still, multithreaded support can take advantage of a core’s idle time that would otherwise waste power, a critical concern in most embedded designs.
Each core can include one or two MIPS32-compliant Virtual Processing Elements (VPEs). The multithreaded support can deliver an additional 30% to 50% of a core’s base performance. This provides a nice upgrade increment for designers that start with a single-core solution, move up to a single-core, multithreaded solution, and progress all the way up to a four-core, multithreaded platform (see the figure).
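The upgrade path above can be put into rough numbers. The following is a minimal sketch, not a MIPS benchmark: it assumes ideal linear scaling across cores (real systems usually fall short because of memory contention) and treats the second VPE as adding 30% to 50% of a core’s base performance. The function name `relative_throughput` is hypothetical.

```python
def relative_throughput(cores: int, vpes_per_core: int, mt_gain: float) -> float:
    """Return throughput relative to one single-threaded core.

    Assumes ideal linear scaling across cores and that a second VPE adds
    `mt_gain` (e.g. 0.30 to 0.50) of one core's base performance.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if vpes_per_core not in (1, 2):
        raise ValueError("each core has one or two VPEs")
    per_core = 1.0 + (mt_gain if vpes_per_core == 2 else 0.0)
    return cores * per_core


if __name__ == "__main__":
    # Single core, single core with multithreading, four cores with multithreading.
    steps = [(1, 1), (1, 2), (4, 2)]
    for cores, vpes in steps:
        low = relative_throughput(cores, vpes, 0.30)
        high = relative_throughput(cores, vpes, 0.50)
        print(f"{cores} core(s), {vpes} VPE(s)/core: {low:.1f}x to {high:.1f}x")
```

Under these assumptions, a single multithreaded core lands at 1.3 to 1.5 times the baseline, and a four-core, multithreaded configuration at 5.2 to 6.0 times.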
Designers can often create a system that consumes less power by running multiple cores at a slower speed. MIPS multithreading support gives designers more flexibility. The architecture itself gets part of its performance boost from the multithreaded nine-stage pipeline. Designers can also mix and match floating-point support.
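The power argument for multiple slower cores follows from the standard approximation that switching power scales with capacitance times voltage squared times frequency. The sketch below is illustrative only: the supply voltages and the switched capacitance are hypothetical values, not MIPS figures, and leakage power is ignored. It compares one core at 800 MHz with two cores at 400 MHz, assuming the slower clock permits a lower supply voltage.

```python
def dynamic_power(cores: int, volts: float, freq_hz: float,
                  cap_farads: float = 1e-9) -> float:
    """Approximate switching power in watts: cores * C * V^2 * f.

    Leakage is ignored, and `cap_farads` is a placeholder value.
    """
    return cores * cap_farads * volts ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical operating points with the same aggregate clock rate.
    single = dynamic_power(cores=1, volts=1.2, freq_hz=800e6)
    dual = dynamic_power(cores=2, volts=1.0, freq_hz=400e6)
    print(f"one core at 800 MHz:  {single:.3f} W")
    print(f"two cores at 400 MHz: {dual:.3f} W")
    print(f"dual/single power ratio: {dual / single:.2f}")
```

With these assumed voltages the two-core configuration draws roughly 70% of the single core’s switching power. The saving comes entirely from the voltage reduction; at equal voltage the two configurations would draw the same switching power.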
CACHE SIMPLIFIES SMP
Each core contains a dual-port cache tag memory. One port serves the VPEs (only one VPE can access the cache at a time), while the other serves the system’s coherence manager, so both can access the tags simultaneously. This lets the coherence manager operate in the background.
In addition, MIPS provides a number of configuration options for the cache subsystem, such as the inclusion and size of translation lookaside buffers (TLBs). Tuning the system can be critical because cache miss percentages can have a major impact on system performance. For example, the performance difference between a 0.8% and a 4% cache miss ratio can be a factor of 3. Of course, the application has a major effect on this result. But determining what tradeoffs to apply is just one of a designer’s jobs.
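A simple cycles-per-instruction model shows how a factor of 3 can arise from those two miss ratios. This is a sketch under stated assumptions, not a measurement of the 1004K: the base CPI of 1.0, the 1.25 memory references per instruction, and the 100-cycle miss penalty are hypothetical values chosen for illustration.

```python
def effective_cpi(base_cpi: float, miss_ratio: float,
                  refs_per_instr: float, miss_penalty: float) -> float:
    """Average cycles per instruction including cache-miss stall cycles."""
    return base_cpi + miss_ratio * refs_per_instr * miss_penalty


if __name__ == "__main__":
    # Hypothetical machine parameters; only the miss ratios come from the text.
    params = dict(base_cpi=1.0, refs_per_instr=1.25, miss_penalty=100.0)
    low = effective_cpi(miss_ratio=0.008, **params)   # 0.8% miss ratio
    high = effective_cpi(miss_ratio=0.04, **params)   # 4% miss ratio
    print(f"CPI at 0.8% misses: {low:.2f}")
    print(f"CPI at 4% misses:   {high:.2f}")
    print(f"slowdown: {high / low:.2f}x")
```

With these values the stall term is 1 cycle per instruction at a 0.8% miss ratio and 5 cycles at 4%, so the effective CPI rises from 2 to 6. A shorter miss penalty or fewer memory references per instruction would shrink the gap, which is why the application matters so much.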
The coherence manager handles the interaction with the optional L2 cache, which is accessed via the 256-bit memory bus. MIPS also lets designers move I/O coherence management into hardware. Other architectures often handle this in software, which reduces the performance available to application code. The cache system supports L1 cache-to-cache transfers.
The global interrupt controller supports system and interprocessor interrupts. System interrupts can be routed to a specific core. The MIPS32 1004K will be available in the second quarter. It has a maximum speed of 800 MHz. A typical two-core/four-VPE system with 32-kbyte L1 caches uses about 3.8 mm².
MIPS TECHNOLOGIES • www.mips.com