System designers have a number of choices when it comes to multiprocessor system architectures. Many of the components are common to single-processor designs. Some new technologies, such as Serial ATA and InfiniBand, will wind up in both single-processor and multiprocessor systems (see the figure).
Of course, the performance obtained from any multiprocessor system is only as good as the individual components it comprises, mainly the processor and memory types. So, choosing the right combination of processor, memory, supporting chip sets, and peripheral functions, as well as local and peripheral buses, is necessary to best match the application in mind. Designers have a wide selection of processors and supporting chip sets to work with, while even higher-performance versions are waiting in the wings.
Symmetrical multiprocessor systems (SMPs) typically employ identical processors, with 32-bit systems giving way to 64-bit processors at the high end. Sun's UltraSparc, Compaq's Alpha, and IBM's PowerPC are 64-bit processors incorporated into the respective companies' SMP offerings. They have shown up in offerings from other vendors too, with IBM's PowerPC as the most popular.
Intel's much-anticipated IA-64 very-long-instruction-word (VLIW) processor architecture has also garnered the interest and support of vendors. The first of the IA-64 processors, the Itanium, is finding a home in evaluation and development systems. SMP systems are included in this collection, but broad commercial distribution will probably occur with the chip code-named McKinley. The IA-64 looks like the one to watch over the long term, although Sun's UltraSparc has a rather large head start.
The x86-64 from Advanced Micro Devices (AMD) is the dark horse. It has been announced, but samples aren't available yet. The major advantage of x86-64 is that it extends the x86 architecture instead of replacing it, as Intel does with the Itanium.
The Itanium runs 32-bit x86 code, but the native 64-bit code is completely different. Plus, the VLIW approach requires investment in new development and debugging tools. AMD's approach requires new development and debugging tools too, but they need only relatively small, incremental changes from the x86 tools that are already well established.
The x86-based 32-bit processors dominate the midrange. Currently, Intel's Pentium III and Xeon are the workhorses in two- to eight-processor systems. The new Pentium 4 is now used in the single-processor space but is expected to replace the Pentium III and Xeon processors in most new designs. It will have to compete with the IA-64 processors. The big question is where the boundary will wind up. While x86 applications have the stability that comes from years of development and deployment, the IA-64 has yet to prove itself.
AMD's Athlon has pushed its way into the 32-bit single-processor realm with great success. Although they're late to the SMP arena, dual-processor Athlon systems are due soon. But whether the Athlon will make a dent in systems with more than two processors remains to be seen.
Dual-processor system designs are popular because the bandwidth of the processors, memory, and chip sets allows two processors to run at full speed without radical changes from a single-processor system design. In fact, most dual-processor motherboards are identical to their single-processor counterparts, except for a second processor socket and a change in chip-set numbers.
Intel, VIA Technologies, and Broadcom ServerWorks Group sell dual-processor chip sets, on which the majority of dual-processor motherboards are based. ServerWorks' ServerSet chip set is the only product line that expands past two processors.
These chip sets target the x86 space with support for the Intel Pentium III and Xeon processors. Intel's 840 chip set is one of the most commonly used dual-processor chip sets. Its 82840 Memory Controller Hub supports 2x/4x AGP video ports and has dual RDRAM channels. The companion 82801 I/O Controller Hub ties two processors to PCI, IDE, and USB peripherals.
AMD's AMD-761 chip set will allow the Athlon to move into the SMP space. Until now, the Athlon had been restricted to single-processor systems, where it became very popular. The AMD-761 supports DDR SDRAMs running at 100/200 MHz or 133/266 MHz, a 33-MHz 32-bit PCI bus, and 4x AGP video. Its specifications are a little less impressive than those available for the Intel processors, but the AMD-761 provides more than enough power for the majority of dual-processor workstations and servers. Additional third-party, dual-processor chip-set support for the Athlon is expected in the future.
Dual processors are standard fare in the server market, with quad- and eight-processor systems becoming more common as thin-client support, database servers, and Web-based application servers grow in importance. These configurations are found in embedded applications like telephone switching systems too, where performance and reliability are key. Such configurations need a significantly different design approach from dual-processor systems.
Midrange And Large-Scale SMP
High-end chip sets from ServerWorks employ the Grand Champion HE architecture for 32-bit Intel-based multiprocessor platforms. It has a 6.4-Gbyte/s memory bandwidth that supports up to 32 Gbytes of DDR SDRAM. The 128-bit ECC algorithm employed corrects quad-bit errors and detects 8-bit errors. The chip set also supports up to six independent PCI-X bus segments, and it is popular in quad-processor systems.
Systems with more than eight processors require more advanced designs. Typically, they employ custom architectures that address issues beyond simply connecting a large number of processors to a common memory. Partitioning, reconfiguration, error recovery, and load distribution are just a few of the features found in large SMP systems.
Unisys' Cellular Multiprocessor (CMP) architecture is one approach. Used in the ES7000 Enterprise Server, it supports up to four cells with eight processors per cell. Each cell has its own memory, which the cells share through a crossbar switch fabric. Unisys has licensed the CMP architecture to a number of third-party vendors.
Similar to CMP, the Sun Enterprise line uses a crossbar switch fabric. But on its high-end systems, the company employs multiple address buses. It also has advanced domain controls for partitioning processors and memory.
IBM's nonuniform memory access (NUMA) architecture offers another alternative, and it is the only one implemented in IBM products. In a NUMA system, local memory has the fastest access time, while nonlocal memory takes longer to reach. NUMA has the advantage of low incremental costs for additional processors and support for very large processor configurations. Unfortunately, programming a NUMA system is more difficult than programming a conventional SMP system.
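The NUMA trade-off between fast local memory and slower nonlocal memory can be illustrated with a simple weighted average. The sketch below is only a rough model: the function name and the 100-ns and 300-ns latencies are hypothetical values chosen for illustration, not figures for any IBM product. It shows why data placement matters to the programmer.

```python
def average_access_ns(local_ns, remote_ns, local_fraction):
    """Weighted-average memory latency for a simple two-level NUMA model.

    local_fraction is the share of accesses satisfied by local memory;
    the remainder is assumed to go to nonlocal memory.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, chosen only to make the arithmetic easy to follow.
LOCAL_NS = 100.0
REMOTE_NS = 300.0

for fraction in (1.0, 0.75, 0.5):
    latency = average_access_ns(LOCAL_NS, REMOTE_NS, fraction)
    print(f"{fraction:.0%} local accesses: {latency:.0f} ns average")
```

Under these assumed numbers, letting a quarter of the accesses fall on nonlocal memory raises the average latency by half, which is the kind of penalty that careful data placement is meant to avoid.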
The one commonality among all of these architectures is their support for standard system bus interfaces for communication between processors and peripherals.
Most change is occurring in the area of system buses (see the table). The venerable PCI bus is being superseded by the faster PCI-X bus, but InfiniBand is where the action is taking place.
Although InfiniBand has the support of all major suppliers, it's just now being incorporated into systems and peripherals. InfiniBand is important not only for its increased speed and connection distance, but also because it brings a completely new, packet-oriented communication architecture into the picture. Although InfiniBand won't eliminate the need for large SMP systems, it can act as the connection system for clustered systems.
InfiniBand is like simple PCI/PCI-X support in that it's a mechanism to connect one part of the system to another. At least initially, many of the InfiniBand connections will be to other peripheral buses.
InfiniBand won't eliminate the need for PCI and PCI-X because of cost considerations. So, there's a lot of interest in AMD's HyperTransport, which is designed to provide higher performance less expensively than InfiniBand. It doesn't address many of the problems that InfiniBand solves, though, including longer connection distances. Instead, HyperTransport is more like a chip-to-chip interconnect, designed to reduce the number of buses within a system.
Initially designed for Athlon, HyperTransport is a general architecture that has gained support from major vendors. It is of special interest to embedded designers trying to achieve maximum performance without resorting to a custom multiprocessor system design.
HyperTransport is scalable with 2- to 32-bit wide bidirectional links. Communication is packet-oriented like InfiniBand.
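To see what link-width scaling means in practice, the following sketch computes the peak one-way transfer rate for several link widths. The list of widths and the rate of 1600 million transfers per second per signal are assumptions made for illustration, and the function name is hypothetical. Actual rates depend on the link clock a given design uses.

```python
# Assumed link widths between the 2-bit and 32-bit extremes.
LINK_WIDTHS_BITS = (2, 4, 8, 16, 32)

# Assumed signaling rate in millions of transfers per second per signal.
ASSUMED_MTRANSFERS_PER_S = 1600


def link_bandwidth_mbytes(width_bits, mtransfers_per_s):
    """Peak one-way bandwidth in Mbytes/s for a link of the given width."""
    return width_bits * mtransfers_per_s / 8


for width in LINK_WIDTHS_BITS:
    rate = link_bandwidth_mbytes(width, ASSUMED_MTRANSFERS_PER_S)
    print(f"{width:2d}-bit link: {rate:.0f} Mbytes/s in each direction")
```

Because peak bandwidth grows linearly with width, a designer can pick a narrow link for a low-pin-count peripheral and a wide one for a processor-to-chip-set connection while keeping the same protocol.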
Even SMP systems deal with peripherals. Serial ATA is the one new light in peripheral bus architecture, although improvements in other standard peripheral buses continue. The most apparent improvement is in USB 2, which pushes its bandwidth past 400 Mbits/s. This puts USB 2 in competition with IEEE 1394. Its compatibility with USB 1.1 may make USB 2 the peripheral bus of choice in the future, especially given the popularity of USB 1.1 with motherboard and peripheral vendors.
The IEEE-1394 bus has the edge in multimedia devices, specifically digital video (DV) camcorders. Other peripherals, such as external hard disks, have only been popular with a very limited number of users due to cost. Only systems in Apple's Power Mac line have a standard IEEE-1394 connection. The Power Mac demonstrates the power of the IEEE-1394 bus, yet it will have difficulty competing with USB in the future. Even the Power Mac comes with a USB 1.1 port.
USB and IEEE 1394 are primarily for external peripheral connections. Inside a PC, connections are normally for storage devices. This is where IDE, SCSI, and Fibre Channel come in. All three may eventually be replaced by Serial ATA, but it could take Serial ATA and InfiniBand to replace Fibre Channel.
IDE and SCSI might have one or two more revisions before they run out of steam. At 320 Mbytes/s, SCSI can still move a lot of data, but IDE and SCSI require wide-connection cables.
Unfortunately, it will take some time to generate enough interest to place Serial ATA on the motherboard and to replace the low-cost IDE hard disks on the market today. Serial ATA's promise of higher speeds and simpler cabling must be turned into real products.
Serial ATA will benefit all systems and may be the most important system improvement (except for possibly USB 2). Serial ATA will be complementary to InfiniBand because the two address different aspects of system design.
Feeding data quickly to multiple processors is key to good system performance. Memory subsystem design used to be a snap because sockets and chip-set support were standard. Now two standards are vying for domination: double-data-rate (DDR) SDRAM and Rambus DRAM (RDRAM). Each one has its own set of interface and socket standards.
DDR SDRAM builds on the standard SDRAM design but doubles the transfer rate by pumping out two words of information every clock cycle. The DDR SDRAM sockets aren't compatible with SDRAM sockets. Small-outline DDR SDRAM is available too.
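The effect of two transfers per clock is easy to work out. The sketch below assumes a 64-bit memory data bus and uses 100-MHz and 133-MHz clocks. The function name is hypothetical, and the results are theoretical peaks rather than sustained rates.

```python
def peak_bandwidth_mbytes(clock_mhz, bus_width_bits=64, transfers_per_clock=1):
    """Theoretical peak transfer rate in Mbytes/s for a synchronous memory bus.

    bus_width_bits defaults to 64, an assumed data-bus width.
    """
    return clock_mhz * transfers_per_clock * bus_width_bits // 8


for clock_mhz in (100, 133):
    sdr = peak_bandwidth_mbytes(clock_mhz)                         # one transfer per clock
    ddr = peak_bandwidth_mbytes(clock_mhz, transfers_per_clock=2)  # two transfers per clock
    print(f"{clock_mhz} MHz: SDRAM {sdr} Mbytes/s, DDR SDRAM {ddr} Mbytes/s")
```

With these assumptions, a 133-MHz DDR SDRAM channel peaks at roughly 2.1 Gbytes/s, twice the rate of single-data-rate SDRAM at the same clock.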
DDR SDRAM also enjoys significant support from chip-set and memory vendors. This makes it more popular than RDRAM right now.
RDRAM uses a high-bandwidth, low-pin-count interface. The architecture is different from the popular SDRAM interface. New chip-set and processor designs, such as Intel's Pentium 4, can take advantage of RDRAM.
Overall, PC platforms are experiencing significant improvements in architecture and performance. Many of the technologies, like InfiniBand, Serial ATA, and HyperTransport, have yet to prove themselves, but clearly they will be viable in the long run. The main question is, when will this happen? The answer looks to be quite soon.
Companies Mentioned In This Report
Advanced Micro Devices Inc.
Apple Computer Inc.
Broadcom ServerWorks Group
Compaq Computer Corp.
Sun Microsystems Inc.
VIA Technologies Inc.