Fig. 1. Single thread, multiple pipelines
Fig. 2. Multiple threads, single pipeline
Fig. 3. Multiple threads, multiple pipelines
Currently, MIPS offers two multithreaded solutions: the single-core MIPS32 34K and the multi-core MIPS32 1004K coherent processing system (CPS). Its new 64-bit MIPS64 Prodigy platform pushes past these systems with an ambitious multiple-thread, multiple-pipeline approach called simultaneous multithreading. It targets high-end, high-performance applications where throughput and efficiency are critical.
Most computing environments these days are multithreaded at some level, including many low-end microcontroller applications. The trick with a single core is to use a multitasking scheduler, but at the low end only a single thread is actually running at any particular time. Getting the most performance per watt means using the hardware efficiently.
The typical method to get more performance out of the hardware is for a single thread to drive multiple execution units (Fig. 1). Stages within these execution units will often sit idle because the instruction stream does not supply enough independent work to fill them. Idle "bubbles" in a multi-issue pipeline can be reduced by using out-of-order dispatch, but usually at a high cost in power and area. Even a single pipeline can have bubbles because of outside issues such as cache updates or I/O and memory access.
In a multithreaded environment it is sometimes more advantageous to feed a single execution unit from multiple threads (Fig. 2). This can be more efficient in terms of power and area if the execution pipeline is the more costly part of the system. Typically this approach eliminates idle bubbles in the pipeline at the expense of limiting per-thread performance. Various hyperthreading approaches are in use today, but it can be a challenge to balance thread performance.
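The bubble-filling effect can be illustrated with a small toy model. This is only a sketch under stated assumptions: the function name `simulate`, the round-robin issue policy, and the stall values are hypothetical and are not taken from any MIPS design. Each thread is a list of stall counts, where each count is the number of cycles that thread must wait after issuing that instruction.

```python
def simulate(threads):
    """Toy single-issue pipeline fed by one or more threads.

    threads: list of lists; each inner list holds the stall (in cycles)
    that follows each instruction of that thread (hypothetical workload).
    Returns (total_cycles, bubble_cycles).
    """
    n = len(threads)
    pc = [0] * n          # next instruction index per thread
    ready_at = [0] * n    # first cycle at which each thread may issue again
    cycle = 0
    bubbles = 0
    start = 0             # round-robin starting point (assumed policy)

    while any(pc[t] < len(threads[t]) for t in range(n)):
        issued = False
        for k in range(n):
            t = (start + k) % n
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                stall = threads[t][pc[t]]
                pc[t] += 1
                ready_at[t] = cycle + 1 + stall
                start = (t + 1) % n
                issued = True
                break
        if not issued:
            bubbles += 1  # no thread was ready: the issue slot is wasted
        cycle += 1
    return cycle, bubbles


if __name__ == "__main__":
    pattern = [0, 2, 0, 2]
    print(simulate([pattern]))           # one thread on the pipeline
    print(simulate([pattern, pattern]))  # two threads sharing the pipeline
```

In this model, one thread with the pattern above needs 6 cycles and leaves 2 bubbles, while two such threads sharing the pipeline finish in 9 cycles with 1 bubble. Total throughput rises, but each individual thread takes longer to finish than it would alone, which is the trade-off described above.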
Things become more interesting when multiple threads feed multiple pipelines (Fig. 3). This is what MIPS calls simultaneous multithreading. It is also an approach that is common because the instruction pipeline is usually only part of a system. Memory and I/O access pipelines are also part of the mix.
The approach does not require all threads to operate in an equal fashion, and it is often useful to allow threads to idle or "park" while waiting for external events. A number of architectures implement I/O in this fashion instead of, or in conjunction with, interrupt controllers. In this case, a thread's context is ready to run as soon as it is enabled by an external source. Of course, this approach tends to be limited by the number of contexts the hardware can maintain, but it is very handy when dealing with a limited number of I/O events.
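A rough software analogy for a parked thread is a worker that blocks on an event object until something else signals it. The sketch below uses Python's standard `threading.Event`. It is only an analogy for the hardware mechanism: the helper name `make_parked_worker` and the `"handled"` marker are hypothetical.

```python
import threading


def make_parked_worker(event, results):
    """Start a thread that parks until `event` is set, then records its work."""
    def worker():
        event.wait()                 # park: no work is done until signaled
        results.append("handled")    # runs as soon as the event is enabled

    t = threading.Thread(target=worker)
    t.start()
    return t


if __name__ == "__main__":
    io_event = threading.Event()
    results = []
    worker = make_parked_worker(io_event, results)
    io_event.set()                   # the "external source" enables the thread
    worker.join()
    print(results)
```

In hardware, the parked context would already hold the thread's register state, so no software context switch is needed when the event arrives. The software version only mirrors the control flow.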
The Prodigy approach includes many of the features found in other platforms, such as out-of-order execution and speculative execution using pre-fetch and branch prediction. It is based on the MIPS 1004K architecture. The architecture utilizes lightweight Thread Contexts (TCs) and Virtual Processing Elements (VPEs). The chip designer can select a mix of TCs and VPEs, which allows an ASIC to be tailored to the application's performance requirements. TCs can wait on events, and a user-configurable thread scheduler provides Quality of Service (QoS) support. This is critical for real-time behavior.
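The article does not describe the scheduling policy itself, so the following is only an illustration of what a configurable QoS policy could look like: a smooth weighted round-robin that shares issue slots among thread contexts in proportion to assigned weights and skips contexts that are waiting on events. The function name, the weights, and the TC names are all hypothetical.

```python
def weighted_schedule(weights, slots, waiting=frozenset()):
    """Pick one thread context per issue slot by smooth weighted round-robin.

    weights: dict mapping TC name -> positive integer share (hypothetical).
    slots:   number of issue slots to schedule.
    waiting: TCs parked on an event; they receive no slots.
    Returns a list with the chosen TC per slot (None if nothing is runnable).
    """
    runnable = [tc for tc in weights if tc not in waiting]
    total = sum(weights[tc] for tc in runnable)
    credit = {tc: 0 for tc in runnable}
    order = []
    for _ in range(slots):
        if not runnable:
            order.append(None)
            continue
        for tc in runnable:
            credit[tc] += weights[tc]   # every runnable TC earns its share
        chosen = max(runnable, key=lambda tc: credit[tc])
        credit[chosen] -= total         # the winner pays for the slot
        order.append(chosen)
    return order


if __name__ == "__main__":
    print(weighted_schedule({"rt": 3, "bg": 1}, 8))
    print(weighted_schedule({"rt": 3, "bg": 1}, 4, waiting={"rt"}))
```

With weights of 3 and 1, the higher-priority context receives three of every four slots, and a context that is waiting on an event gives up its slots to the others rather than leaving them idle.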
The architecture is designed to provide an efficient inter-thread communication system. It is intended to allow implementation of the kind of high-performance data-flow system often found in communication platforms. This is another place where the zero-overhead TC interrupt capability comes into play.
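As a software analogy for a data-flow arrangement, the sketch below chains two threads with queues: one stage produces items, and the next blocks until an item arrives, transforms it, and passes it on. It uses Python's standard `queue` and `threading` modules and does not model the hardware's inter-thread communication mechanism. The function name `run_pipeline` and the doubling step are hypothetical.

```python
import queue
import threading


def run_pipeline(items):
    """Pass items through a two-stage data-flow pipeline and collect results."""
    q1, q2 = queue.Queue(), queue.Queue()
    done = object()                  # sentinel marking the end of the stream

    def producer():
        for x in items:
            q1.put(x)
        q1.put(done)

    def transformer():
        while True:
            x = q1.get()             # blocks (parks) until data is available
            if x is done:
                q2.put(done)
                break
            q2.put(x * 2)            # hypothetical per-item processing

    stages = [threading.Thread(target=producer),
              threading.Thread(target=transformer)]
    for s in stages:
        s.start()

    out = []
    while True:
        x = q2.get()
        if x is done:
            break
        out.append(x)
    for s in stages:
        s.join()
    return out


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

Each stage runs only when its input is available, which is the behavior a waiting thread context provides in hardware without the cost of a software wake-up.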
Virtualization is part of the mix, but MIPS has not yet been forthcoming with the details. A formal product launch will occur in the fall.