It’s been more than two years since the emergence of the first dual-core processors. Millions of mainstream PCs built around multicore processors have now been shipped and the desktop PC market has already started to move to quad-core machines as the standard. So, it seems as good a time as any to review this technology trend to determine who the winners and losers are, what the impact has been, and what the future holds.
Much has been said about multicore over the last two years, and you could get the impression that multiprocessor computing architectures were only developed in the last few years. In fact, multiprocessor machines have been around for many years, typically targeted at communications, servers, or high-performance computing applications.
Transputer time
In the 1980s, Bristol, UK-based INMOS developed the transputer. This was arguably the first general-purpose microprocessor designed for use in parallel computing systems. The individual devices were largely self-contained, and they included serial links that allowed a single device to communicate with up to four others. INMOS predicted that the transputer would eventually become ubiquitous, and for a time it seemed to be the only viable approach to many of the computing challenges of the day. However, two factors conspired to hasten the transputer’s commercial failure.
To make proper use of the transputer, you had to learn a new programming language—occam—and a new development environment and tools, including a "folding editor" for the source code. This allowed the developer to hide or reveal sections of code to more clearly represent program structure. These days, this is a standard feature of virtually every development environment, but at the time it was pretty revolutionary.
In addition, whilst the transputer was developed to address some of the perceived architectural roadblocks in the processor marketplace of the early 1980s, the more traditional processor vendors didn’t stand still. They retained the single-CPU arrangement, but used improved fabrication processes to cram more transistors into the same space and used them to build multiple arithmetic units in the same device. Thus, along came the concept of instruction-level, or superscalar, parallelism.
Free lunch time
Superscalar parallelism allowed existing Basic, Fortran, and C programs to run faster with few, if any, changes. Even though INMOS and third parties eventually developed extensions to these languages to allow them to target the transputer, the damage was already done. We had entered the era of the "free lunch," as described by Microsoft software architect Herb Sutter, in which existing applications gained performance improvements simply from increases in processor clock speed.
Of course, that wasn’t the end of the road for multiple-processor architectures. They lived on in more specialised applications like communications, servers, or even games consoles. But as a mainstream platform, the single processor system continued to deliver performance improvements until just the last few years.
As clock speeds increased, chip designers reduced core voltages to keep power dissipation within practical limits. But you can’t keep lowering core voltages and still have functioning devices. When these core-voltage and heat-dissipation limits were reached, mainstream CPU vendors adopted the multicore approach. The additional transistors that Moore’s Law continues to deliver mean that, unlike the transputer of the 1980s, which provided multiple processors as separate devices, today’s processor developers can integrate multiple processors on the same die at much lower cost and with greater performance.
The first mainstream multicore processors were launched in 2005 by both Intel and AMD. Nowadays, it’s becoming difficult to buy a PC without a multicore processor.
Winning the core war
Whilst many other companies offer multicore designs, on the desktop there’s really only AMD and Intel. So, can we define a winner and a loser in the war of the cores? Right now, I think the answer is pretty clear.
Right from the beginning, it became apparent that the two companies had different ideas about the best approach to delivering multicore. For example, the AMD Athlon 64 X2 took two Athlon 64 single cores and integrated them on the same die. In contrast, the Intel Pentium D took two single-core Pentium dies and integrated them into the same multichip module (MCM). On the face of it, the AMD approach is the better solution, leading to tighter, faster communication between the two cores.
This proved to be the case, but AMD lost that advantage when Intel’s Core 2 devices launched. Now Intel had two cores on the same die; the difference was that these cores were designed from the outset to work together in a multicore device, and the result was better performance at lower clock speeds. When it came to quad-core designs, AMD remained true to their philosophy and persevered with the Phenom architecture, which placed four cores on the same die (Fig. 1).
Intel, however, again took the more pragmatic route, reusing the MCM approach (Fig. 2). As a result, Intel was able to launch their first quad-core design 12 months ahead of AMD.
AMD has found it difficult to reliably manufacture their first quad-core devices. Yields are low, and the company has been unable to push clock speeds as high as needed to compete with the fastest Intel Core 2 Extreme quad-core devices. In fact, though the AMD approach should deliver higher performance, that advantage would only become apparent with demanding multithreaded applications in which tight integration and communication between all cores is important. In reality, most current desktop applications are still single-threaded.
With the more loosely coupled CPU cores found in the Intel devices, end users still get the benefit of higher performance multitasking. For example, you could keep typing up a report whilst an FPGA bitstream is being compiled in the background. Even for demanding multithreaded engineering and scientific applications that already take advantage of high-level, multicore-aware tools like National Instruments’ LabVIEW graphical programming language, Intel’s ability to deliver quad-core devices with significantly higher clock speeds is overcoming AMD’s theoretical inter-core communication advantage.
So the future really means many more cores, and multicore processors will start becoming the norm in electronic devices and embedded systems. But software still looms as a formidable challenge. Until now, most multicore processors have enabled end users to realise performance gains through better multitasking of their existing single-threaded applications. But this has masked the fact that the "free lunch is over." Scientists and engineers designing test, control, and embedded systems have already realised that their demanding applications must be multithreaded to exploit multicore processors.
A different approach is required to program the next generation of massively parallel processors, whether on the desktop or in embedded systems. You could augment text-based languages, using extensions such as OpenMP, or take the higher-level graphical route.
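To give a flavour of the text-based route, here is a minimal, hypothetical C sketch using OpenMP; it is not taken from any of the applications mentioned above, and the data and variable names are illustrative only. A single pragma asks the compiler to spread the loop iterations across the available cores, with a reduction clause to combine each thread’s partial result safely:

#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double samples[N];
    double sum = 0.0;

    /* Fill the buffer with placeholder test data. */
    for (int i = 0; i < N; i++)
        samples[i] = i * 0.001;

    /* The pragma splits the loop iterations across the available cores;
       reduction(+:sum) gives each thread a private partial sum that
       OpenMP combines when the loop finishes. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += samples[i] * 2.0;

    printf("sum = %f, max threads = %d\n", sum, omp_get_max_threads());
    return 0;
}

Built with an OpenMP-enabled compiler (for example, gcc -fopenmp), this is the kind of incremental change that lets existing C code begin to exploit multiple cores. The graphical route takes the alternative view that the parallelism should be visible in the structure of the program itself.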
The fundamental benefit of graphical programming is that programmers can "see" the parallelism in the code. One example is NI’s LabVIEW graphical programming language, which was originally developed in 1986 (native multithreading was added in 1998). LabVIEW is a fully compiled, graphical programming language based on a structured data flow. Figure 3 shows a block diagram (source code) for a LabVIEW application employing three main elements: I/O, data processing, and user interface.
Critical technique
So multicore is here and the core war is going to deliver more performance and many more cores. As a result, multithreaded programming is becoming an even more critical technique for software developers to master. Findings from the Landscape of Parallel Computing Research project at the University of California, Berkeley, put it best: "Since real-world applications and hardware architectures are inherently parallel, so too should be the software programming model."
For anyone writing software today, multicore means you must learn about multithreading and tools, like LabVIEW, that express parallelism natively.