Multicore Mania Sweeps Through Computer Design

Dec. 1, 2008

Today’s computers are going multicore where performance matters. Whether it’s for a desktop or server, more cores are showing up in the compute engine and graphics rendering, providing users with everything from more lifelike video to so

William G. Wong

This year, three products stood out. Intel’s six-core Xeon pushes the envelope for typical operating platforms such as Linux and Windows. Nvidia’s Tesla C1060 opens the company’s multicore GPU (graphics processing unit) to programmers who want to do more than just graphics. For graphics rendering, AMD’s ATI Radeon HD 4870 X2 puts two multicore GPUs on a single board, which reduces communication overhead when the two chips cooperate to render a single video stream.

SIX-CORE XEON: MULTICORE WORKHORSE

Intel’s (www.intel.com) “Dunnington” 7400 series Xeon chip delivers four or six cores in a single package (Fig. 1). It’s the last of the Penryn generation of Intel processors. However, it will be the workhorse until the 45-nm, eight-core Nehalem arrives next year.

The devices in the 7400 series use Intel’s 45-nm Hi-k (hafnium-based, high-k metal gate) technology. The chip contains 1.9 billion transistors, including a shared 16-Mbyte L3 cache. It’s compatible with existing sockets that can handle earlier quad-core Xeon chips, so plenty of motherboards out there can corral this workhorse.

It can be power efficient, too. The six-core low end sips a cool 65 W even with a 1066-MHz frontside bus. The chip is designed to be used in systems with up to 16 CPU sockets for a total of 96 cores.

The 7400 series employs the latest virtualization technology, since these chips are destined for server farms that are running lots of virtualized clients. It supports Intel’s FlexMigration technology, which facilitates use of older client images as well as movement to Nehalem in the future.

TESLA C1060: GPU DOES MORE THAN GRAPHICS

Nvidia’s (www.nvidia.com) Tesla C1060 contains a GPU with 240 processing cores (Fig. 2). The GPU employs a single-instruction, multiple-thread (SIMT) architecture (see “SIMT Architecture Delivers Double-Precision TeraFLOPS” at www.electronicdesign.com, ED Online 19280) that’s equally useful in graphics applications and streaming computation on large amounts of data.

The Tesla C1060 is an impressive computing platform. But when combined with the CUDA (Compute Unified Device Architecture) development environment, it becomes a best-in-class system.

CUDA lets programmers use an extended version of C to develop applications that run on Nvidia’s latest GPU platforms, including the popular GeForce line. Some applications will run slightly faster while others may improve by two orders of magnitude. It all depends on how well the application can take advantage of the SIMT architecture. CUDA also supports multi-GPU environments like the Tesla S1070, whose four C1060-class GPUs provide a total of 960 cores.
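To give a feel for what that extended C looks like, here is a minimal sketch of a SAXPY routine (y = a*x + y), a common first example for data-parallel code. The kernel name saxpy and the host helpers check and saxpy_on_gpu are illustrative and do not come from Nvidia or from this article. The sketch assumes the CUDA runtime API (cudaMalloc, cudaMemcpy, cudaFree) and a reasonably recent nvcc. Each GPU thread handles one array element, which is the kind of uniform, independent work that tends to suit a SIMT machine.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Kernel: every thread runs this same function on a different element.
// __global__, blockIdx, blockDim, and threadIdx are CUDA's extensions to C.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    if (i < n)                                      // the last block may be only partly used
        y[i] = a * x[i] + y[i];
}

// Hypothetical helper: turn a CUDA error code into a C++ exception.
static void check(cudaError_t err)
{
    if (err != cudaSuccess)
        throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical host wrapper: copy the data to the GPU, launch, copy back.
// For brevity, device memory is not released if a call fails midway.
void saxpy_on_gpu(float a, const std::vector<float> &x, std::vector<float> &y)
{
    if (x.size() != y.size())
        throw std::invalid_argument("x and y must have the same length");
    const int n = static_cast<int>(x.size());
    if (n == 0)
        return;

    const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);
    float *d_x = nullptr;
    float *d_y = nullptr;
    check(cudaMalloc(&d_x, bytes));
    check(cudaMalloc(&d_y, bytes));
    check(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;                          // threads per block (a typical choice)
    const int blocks = (n + threads - 1) / threads;   // enough blocks to cover n elements
    saxpy<<<blocks, threads>>>(n, a, d_x, d_y);
    check(cudaGetLastError());                        // report launch failures

    // This copy waits for the kernel to finish before reading the result.
    check(cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_x));
    check(cudaFree(d_y));
}
```

Calling saxpy_on_gpu(3.0f, x, y) with every element of x set to 1 and every element of y set to 2 should leave every element of y at 5. Code that branches differently from thread to thread, or that spends most of its time moving data between host and GPU, is likely to land nearer the "slightly faster" end of the range described above.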

ATI RADEON HD 4870 X2: TWICE THE GRAPHICS

The RV770 GPU shows up twice in AMD’s (www.amd.com) ATI Radeon HD 4870 X2 board (Fig. 3). Using a pair of GPUs isn’t new. In fact, AMD’s ATI CrossFire technology has been used regularly to link a pair of boards to double graphics performance (see “ATI X1950XTX,” ED Online 14198). But the Radeon HD 4870 X2 does it with just one board.

Putting two GPUs on the same board boosts performance even more than linking a pair of single-GPU boards because of the tighter integration. Linking a pair of HD 4870 X2 boards can bring even more processing power to bear.

AMD has opened its GPU to programmers as well, which makes it possible to use the extra cores for chores other than graphics rendering. And there are plenty of applications in gaming where the HD 4870 X2 excels.