Electronic Design

Virtually Real

Hardware acceleration and parallel processing yield realistic gaming.

You're on the five-yard line, and you've got your sights set on a rookie wide receiver who's wide open. The rain is coming down hard, but it's always raining in East Rutherford. As you raise your arm to throw, you see Paul Grasmanis out of the corner of your eye. But before you can even blink, you feel your controller shaking violently in your hands. You've been sacked. Gaming has come a long way since the days of Pac-Man and Pong. The latest software is so realistic that players can almost feel the weight of a defensive tackle crashing down on them. But without the latest hardware, Madden NFL 2007 would be about as exciting as Solitaire.

The days of a single processor driving a game are long gone. The latest hardware accelerators let game designers forgo canned explosions, gross approximations in lighting, and nonplayer characters with simplistic intelligence, delivering instead movie-quality images and lifelike environments where shadows, projectiles, and animals react as they would in the real world.

Today, hardware acceleration means addressing basics like audio, video, and computation. But even the computation is being augmented by Ageia's new PhysX processor. It offloads the computations necessary to simulate the physical gaming environment—tracking everything from a collision between a bullet and a jar to how many pieces the jar breaks into. This level of complexity allows games like Airtight Games' forthcoming Hangar Of Doom to have hundreds of moving objects on screen (Fig. 1).

When it comes to the platform, the gaming industry remains split between consoles, PCs, and handheld devices. The handheld market is led by Sony's PSP and Nintendo's DS and Game Boy. But an increasing number of cell phones offer games, too (see "In The Palm Of Your Hand" at www.electronicdesign.com, Drill Deeper 12538).

Each of these platforms has its tradeoffs, so the level of hardware acceleration varies significantly. A PC can afford to have sophisticated cooling systems attached to the various hardware accelerators, but handheld devices must be power misers. Likewise, handhelds have a smaller screen that places its own constraints on the type and quality of games that can be developed.

Consoles seemingly hold an edge in gaming performance when they first appear. But PCs quickly jump to the front of the pack, due in large part to their quick-change video adapters. Of course, few PCs come equipped with high-end gaming-system capability, while consoles provide a consistent platform for game developers. This also makes consoles more powerful game platforms than the average PC.

Computer gaming used to revolve around the central processing unit (CPU). Now you'll find two or more cooperating processor complexes. In the future, the CPU may hand tasks like artificial intelligence or search algorithms off to other hardware accelerators (see "Smart Gaming" online at Drill Deeper 12539). The CPU in a typical gaming system already passes off work to a number of different accelerators.

The most visible part of the gaming setup is the graphics processing unit (GPU). Other components include the physics processing unit (PPU)—a technology in its infancy (e.g., Ageia's PhysX processor)—and the audio processing unit. All told, these devices make today's gaming a far cry from an 8080 banging away at a bit-map screen for a line-drawn, Asteroids-style game.

Multicore, multiple-instruction/multiple-data (MIMD), and single-instruction/multiple-data (SIMD) architectures add further to the complexity for game developers, on top of the task of spreading gaming computation among a number of processing units. Fortunately, many processing units provide a simpler, black-box level of control that hides the growing complexity of the underlying hardware.

Consoles represent the state of the art in consumer gaming, and it seems that IBM's PowerPC is the CPU platform of choice. Well, almost.

Microsoft's Xbox 360 has a CPU with three PowerPC cores (Fig. 2). Similarly, Sony's PlayStation 3 uses a version of IBM's Cell processor (see "Cell Processor Gets Ready To Entertain The Masses" at www.electronicdesign.com, ED Online 9748) (Fig. 3). It consists of a PowerPC core and a group of synergistic processor elements (SPEs) (Fig. 4).

The PowerPC's vector processing unit is one reason it's popular in gaming platforms. The Xbox 360 and the PlayStation 3 are both designed so that the various components complement each other in speed, interface, and layout. The Xbox 360 GPU/CPU connections, for example, are a block of parallel traces from one chip to the other. Differential pairs are the norm, given the high-speed nature of the connections.

The Xbox 360 consists of three major components. The CPU uses a conventional, multicore architecture with a shared level 2 cache. The PowerPC cores are identical. The Xbox 360 also features a unified memory architecture, which is one reason why the GPU doubles as the memory controller. The GPU's level 2 cache is fed from the data moving through the GPU from the off-chip memory.

The unified architecture makes lots of sense for gaming platforms. The CPU-based applications simply adjust the state of the virtual gaming world, and the GPU can immediately access it. The GPU does have its own rendering memory, in which each frame is generated, so the graphics core won't have to dominate access to main memory when it creates a display frame.

The south bridge (which uses standard interfaces, including the PCI Express link between it and the GPU) handles the comparatively low-speed peripherals. Serial ATA (SATA) interfaces are used for both the hard disk and optical drive, and the USB ports connect to handheld gaming controllers.

The PlayStation 3 architecture resembles the Xbox 360 at a very high level. The major difference is that the PlayStation 3's Cell processor is the center of its world, while the GPU is the center of the Xbox's architecture (see "More Cores" online at Drill Deeper 12553). The biggest change is that system memory is connected directly to the Cell processor.

Rambus was instrumental in designing the interfaces to the Cell processor. They include the 64-bit, 25-Gbyte/s memory interface and the FlexIO interface, which links the Cell processor to both the GPU and the I/O bridge (which is comparable to the Xbox 360's south bridge). The protocol-agnostic, parallel FlexIO runs at speeds up to 8 Gbits/s. Sony isn't providing details about the interface, though it really doesn't affect application programming. Also, audio support in the PlayStation 3 resides in the GPU rather than the I/O bridge (see "The Sound Of Thunder" online at Drill Deeper 12554).

ATI's and nVidia's graphics chips for the Xbox 360 and PlayStation 3 are custom versions of the graphics architecture used in PC video adapters. The GPU architecture from nVidia exposes the general layout for high-performance GPUs (Fig. 5). At a high level, nVidia's architecture is similar to ATI's. Application programming interfaces (APIs) used by game developers, like Microsoft's DirectX, hide the underlying hardware and convert a 3D virtual world into a 2D presentation screen.

One major difference between GPU and CPU design is that GPUs typically have numerous processors, longer pipelines, and a very regular architecture. This is possible due to the GPU's very structured environment. For example, long pipelines in a GPU usually don't deal with flushing. However, a CPU's pipelines often will flush when a branch occurs. In this sense, GPUs tend to be closer to DSPs that are designed for efficient, repetitive computation.

One prevailing trend in GPUs involves the migration from integer to floating-point encoding. With extremely high-dynamic-range operation that includes floating-point support, more details can be integrated in the dark areas of an image, as well as in the brighter areas. This more closely matches the human eye's response, creating a more realistic experience.
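The benefit for dark areas can be illustrated with a short Python sketch. It does not reproduce any vendor's pipeline: the linear 8-bit encoding, the luminance values, and the function name are assumptions chosen only to show how integer quantization can collapse distinct shadow tones that a floating-point representation keeps apart.

```python
def encode_8bit(luminance, max_luminance):
    """Quantize a linear luminance value to an 8-bit integer code (0..255)."""
    scaled = luminance / max_luminance * 255
    return max(0, min(255, round(scaled)))

# Hypothetical scene: the encoding range must reach a bright highlight,
# so the two shadow tones sit at the very bottom of the scale.
MAX_LUMINANCE = 1000.0
shadows = [0.5, 1.2]

# Integer encoding: both shadow tones land on the same code.
int_codes = [encode_8bit(v, MAX_LUMINANCE) for v in shadows]

# Floating-point encoding: the two tones stay distinct.
float_values = [float(v) for v in shadows]

print(int_codes)
print(float_values)
```

With these assumed values, both shadow tones quantize to code 0, so the detail between them is lost, while the floating-point values remain different.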

Though ATI's approach differs slightly from nVidia's, the company still has dedicated groups of processors for vertex and pixel processing. Also, the data flow through the system is very wide, and processing is highly parallel in nature. What changes are the details of how the stages are tied together and the type of processing employed in areas such as pixel shaders.

Gamers with big bucks looking for the ultimate visual feedback can check out nVidia's SLI (Scalable Link Interface). SLI uses multiple GPUs to increase performance. It can split the processing burden in various ways, depending on the number of GPUs supported (Fig. 6).

Dual configurations support split-frame-rendering (SFR) and alternate-frame-rendering (AFR) modes, while a four-GPU system also supports AFR or SFR mode. ATI's Crossfire technology is similar in functionality to nVidia's SLI. Both companies require a compatible motherboard to handle their cards.
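The two modes can be sketched in a few lines of Python. This is a simplified illustration, not nVidia's or ATI's actual scheduling: the function names are hypothetical, and the equal-height bands in the SFR case are an assumption (a real driver may balance the split according to scene load).

```python
def alternate_frame_rendering(frame_count, gpu_count):
    """AFR: each GPU renders whole frames, assigned in round-robin order."""
    return {frame: frame % gpu_count for frame in range(frame_count)}

def split_frame_rendering(frame_height, gpu_count):
    """SFR: every frame is cut into horizontal bands, one band per GPU."""
    band = frame_height // gpu_count
    bands = []
    for gpu in range(gpu_count):
        top = gpu * band
        # The last GPU takes any leftover scan lines.
        bottom = frame_height if gpu == gpu_count - 1 else top + band
        bands.append((gpu, top, bottom))
    return bands

# Two GPUs, four frames: frames alternate between GPU 0 and GPU 1.
print(alternate_frame_rendering(4, 2))

# Two GPUs, one 1080-line frame: each GPU renders half of the scan lines.
print(split_frame_rendering(1080, 2))
```

In AFR each GPU works on a complete frame, while in SFR all GPUs cooperate on the same frame.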

SLI is applicable to PCs, but not to fixed-architecture game consoles. Trying to pick the best or fastest architecture is very difficult due to the many factors involved, including the type of games being considered and how they're designed.

The vase drops and breaks into hundreds of pieces. A cluster bomb destroys half a dozen space cruisers. Water flows slowly down a creek while a little minnow flips out of the water and flips back in with a splash. All of these images have one thing in common—physics.

Much as hardware acceleration did for 3D graphics, Ageia's PhysX PPU ($299 MSRP) brings real physics to game play. The challenges addressed by 3D graphics hardware and physics acceleration hardware are complex and require lots of calculations. But streamlining the process and providing a standard interface has made both practical options for a wide range of games. In the future, high-end games will only show their best performance when physics hardware support is available, just as they do today with 3D graphics hardware.

PhysX and 3D gaming have much in common. Both maintain the state of the environment, and the application on the CPU changes the environment by providing transformation information. That's a small amount of information compared to the environment and number of calculations required to apply the transformation.
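The asymmetry between the small command and the large amount of work it triggers can be sketched as follows. This is an illustrative Python model only: the vertex list stands in for the environment state held by the accelerator, and the `translate` function and the figure of 10,000 vertices are assumptions, not part of any real graphics or physics API.

```python
def translate(vertices, offset):
    """Apply one translation (three numbers) to every vertex in the environment."""
    dx, dy, dz = offset
    return [(x + dx, y + dy, z + dz) for (x, y, z) in vertices]

# Hypothetical environment state kept on the accelerator side.
environment = [(float(i), 0.0, 0.0) for i in range(10_000)]

# What the CPU-side application supplies: a single offset.
command = (0.0, 1.5, 0.0)

environment = translate(environment, command)

values_sent = len(command)             # numbers passed by the application
values_updated = len(environment) * 3  # coordinates recomputed as a result
print(values_sent, values_updated)
```

Three numbers from the application lead to 30,000 coordinate updates in this toy case, which is the kind of work the accelerator is meant to absorb.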

Also, like 3D graphics, Ageia's physics environment is parametrized to simplify and standardize interaction between the application and the hardware. For example, cloth is a type of item within the physics environment that is defined by 12 different parameters, such as texture and flexibility.

Of course, how a programmer exploits these features can lead to some interesting approaches. One game uses the cloth approach to simulate deformable car fenders. It only required setting the parameters of the item as a shiny silver cloth.
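The idea of reusing one item type with different parameters can be sketched in Python. The class below is hypothetical and is not Ageia's API: it models only the two parameters named above (texture and flexibility), and the 0-to-1 flexibility scale is an assumption made for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ClothItem:
    """Hypothetical cloth description; only two of the parameters are modeled."""
    texture: str
    flexibility: float  # assumed scale: 0.0 = nearly rigid, 1.0 = limp

# An ordinary piece of cloth.
flag = ClothItem(texture="woven red fabric", flexibility=0.9)

# Same item type, different parameters: a stiff, shiny "cloth" stands in
# for a deformable car fender.
fender = replace(flag, texture="shiny silver", flexibility=0.1)

print(flag)
print(fender)
```

Both objects are the same kind of item as far as the simulation is concerned; only the parameter values differ.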

Fidelity, sophistication, interaction, and scaling show the importance of physics to a game. For example, fidelity with hardware support could show a boat floating on the water, while a simplistic approach would keep the water flat and the boat movement severely restricted. The boat example also shows sophistication.

Meanwhile, hardware acceleration permits true muscular and skeletal-based movement. Or, say, it could allow for a broken windshield that shatters into hundreds of pieces based on the impact details of another object—including its size, direction, and velocity. In a typical non-accelerated game, a canned sequence would be used whenever a windshield breaks, regardless of the object that hit the windshield.
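The contrast between a canned sequence and an impact-driven result can be sketched in Python. The model is deliberately crude and entirely hypothetical: it uses only the object's mass and speed (direction is ignored), and both the fixed fragment count and the energy-per-fragment constant are made-up values, not figures from any game or physics engine.

```python
import math

CANNED_FRAGMENTS = 50  # hypothetical: the same result for every impact

def canned_shatter(mass_kg, speed_m_s):
    """Non-accelerated approach: ignore the impact and replay a fixed sequence."""
    return CANNED_FRAGMENTS

def simulated_shatter(mass_kg, speed_m_s, joules_per_fragment=2.0):
    """Impact-driven approach: fragment count grows with kinetic energy."""
    energy = 0.5 * mass_kg * speed_m_s ** 2
    return max(1, math.floor(energy / joules_per_fragment))

# A light, slow object versus a heavier, faster one.
for mass, speed in [(0.5, 8.0), (2.0, 20.0)]:
    print(canned_shatter(mass, speed), simulated_shatter(mass, speed))
```

The canned version returns the same fragment count for both impacts, while the simulated version yields more fragments for the more energetic hit.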

Such interaction is key as the number of interactions increases. Imagine walking under a waterfall. The water that splashes off a person walking through the waterfall interacts with the rest of the water. These kinds of interactions are readily noticed by the observer, but game play before physics hardware severely restricted such a level of interaction.

Like 3D graphics chip vendors, Ageia keeps its internal architecture secret. The company only provides general details about data flow and the number and type of processors contained within its chip. The company indicates that any game performance bottleneck tends to occur on the graphics side, rather than with the PCI interface between the host and the PhysX chip.

The PhysX chip includes on-chip memory with a memory bandwidth on the order of 1 Tbit/s. Its massively parallel set of processors is optimized for physics-oriented computation. Each thread of execution handles data depending on the type of interaction. For example, a tennis ball may consist of thousands of triangles on screen, but it will be detailed as a deformable body within the physics world. Fluids are interpreted using meshes.

Ageia delivered a standard API for both its hardware and its free software implementation. Of course, the performance difference between the two is usually about two orders of magnitude. Software-only implementations therefore must limit the fidelity, sophistication, interaction, and scale to achieve reasonable game performance.

Most of the gaming technology developed over the years still targets gaming applications. But desktop applications and operating environments are starting to take advantage of features like hardware 3D graphics support and alpha blending.

Microsoft's forthcoming Windows Vista has a version of the user interface that requires 3D video hardware. No 3D hardware? Then you're stuck with the more conventional Windows XP-style interface. Don't be surprised to see 3D showing up in other places, like GPS navigation systems.

Gaming is only going to get better if you can afford the latest console or PC upgrade. The only limits are money, power consumption, and the ingenuity of developers.

For a complete list of companies mentioned in this report, see Drill Deeper 12540.
