PCI Express' impact on high-performance systems has become rather interesting now that interface standards like AGP (Accelerated Graphics Port) are essentially dead. These days, high-end graphics boards link to a host via an x16 PCI Express (PCIe) connection. Some motherboards have a single x16 slot, but many high-end systems feature multiple slots, opening up options for products like AMD's Stream Processor.
The Stream Processor is really nothing more than a customized graphics board (Fig. 1). In fact, if you didn't look closely, you might miss the differences between the Stream Processor and the AMD graphics boards that are now shipping. Both possess dual DVI outputs to drive monitors, but the Stream Processor needn't be connected to a display to be useful.
Unlike a typical graphics board, the Stream Processor is open to the programmer. Its graphics-only counterpart comes with display drivers that program the board and only allow it to be used to display information. The Stream Processor, by contrast, can be used as a coprocessor that accepts input from host applications and generates output for them. It can also deliver output to a display using the on-chip drivers, but these can simply be ignored if the data is only going to be used by the host.
The R580 graphics processor forms the heart of the Stream Processor (Fig. 2) and AMD's X1950XTX display adapter, which is the high-end graphics adapter for desktop PCs (see "Pickin' Powerful Parts For The Top PC"). It's quite possible that a Stream Processor could find its way into a system with the X1950XTX, with the Stream Processor handling chores like physics simulation or artificial-intelligence algorithms for games.
The R580 has a number of processing cores on-chip, including 48 shader processors. The dispatch processor can keep up to 512 threads running. The memory architecture and the relationship with other on-chip processing cores are optimized for consumer gaming graphics, but it's general enough to address a wide range of other applications as well, such as streaming video processing. Each core handles 32-bit floating-point data, so it's equally applicable to many applications that crunch numbers.
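The stream style of computation described here, in which one small program runs over every element of a large data set in 32-bit floating point, can be sketched in a few lines of Python. This is only an illustration of the programming model, not AMD's toolset: the function names `saxpy_kernel` and `run_stream` are hypothetical, and the batching by 48 simply mirrors the shader-processor count quoted above rather than modeling how the dispatch processor really schedules its threads.

```python
from array import array

SHADER_PROCESSORS = 48  # count quoted in the article


def saxpy_kernel(a, x, y):
    """Hypothetical per-element kernel: the same operation for every element."""
    return a * x + y


def run_stream(kernel, a, xs, ys, lanes=SHADER_PROCESSORS):
    """Apply `kernel` to each (x, y) pair and return 32-bit float results.

    Elements are walked in batches of `lanes`. On real hardware the work
    inside one batch would run concurrently; here it runs sequentially.
    """
    if len(xs) != len(ys):
        raise ValueError("input streams must have the same length")
    out = array("f", [0.0] * len(xs))  # 'f' stores 32-bit floats
    for start in range(0, len(xs), lanes):
        stop = min(start + lanes, len(xs))
        for i in range(start, stop):
            out[i] = kernel(a, xs[i], ys[i])
    return out


if __name__ == "__main__":
    xs = array("f", [1.0, 2.0, 3.0])
    ys = array("f", [10.0, 20.0, 30.0])
    print(list(run_stream(saxpy_kernel, 2.0, xs, ys)))
```

The kernel never looks at neighboring elements, which is the property that lets a GPU spread the work across many cores without coordination.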
Also, the R580 is programmable, and each type of core has its own programming environment. Some parts of the architecture, like the general-purpose register arrays, provide communication among relatively general shader processors. These arrays contain instructions optimized for shading in a 3D environment for a PC display. Yet like a DSP, the base architecture is a conventional processor almost any programmer can use.
The big difference from a general architecture is the level of shared resources and caches necessary to support the primary target, consumer gaming displays. This can be ignored for some applications. Or, programmers may be able to exploit the infrastructure and optimize their applications for the R580.
The board supports graphics interfaces such as OpenGL and DirectX. An included application programming interface (API) and toolset can be used for custom development. Essentially, developers have access to the same types of tools and features as the developers at AMD who are designing the latest graphics adapters. Developers using the Stream Processor must decide which parts are applicable, since few designers will be creating a new graphics adapter, except in the few instances where the graphics interface is being enhanced.
Overall, the Stream Processor is impressive. Its 1 Gbyte of GDDR3 memory will enable it to tackle substantial problems. Multiple Stream Processors can be combined in a system, limited only by the number of PCI Express slots available.
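To put the memory figure in perspective, the short sketch below estimates how many 32-bit values fit on one board and how a larger data set might be divided among several boards. It assumes that "1 Gbyte" means 2**30 bytes and that all of it is available for data, which will not be true in practice; the `reserved_bytes` parameter and the function names are hypothetical, and the even split is just one simple partitioning choice.

```python
BOARD_MEMORY_BYTES = 1 << 30  # assumption: 1 Gbyte = 2**30 bytes
BYTES_PER_FLOAT = 4           # 32-bit floating-point values


def floats_per_board(reserved_bytes=0):
    """Number of 32-bit floats that fit after setting aside `reserved_bytes`."""
    return (BOARD_MEMORY_BYTES - reserved_bytes) // BYTES_PER_FLOAT


def boards_needed(n_floats, reserved_bytes=0):
    """Smallest number of boards whose combined memory holds `n_floats`."""
    capacity = floats_per_board(reserved_bytes)
    return -(-n_floats // capacity)  # ceiling division


def split_across_boards(n_floats, n_boards):
    """Return (start, stop) index ranges, dividing the data as evenly as possible."""
    base, extra = divmod(n_floats, n_boards)
    ranges = []
    start = 0
    for board in range(n_boards):
        size = base + (1 if board < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    print(floats_per_board())
    print(split_across_boards(10, 3))
```

Under these assumptions a single board holds 268,435,456 single-precision values, so a data set one element larger already calls for a second board or for streaming the data through in pieces.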
AMD STREAM PROCESSOR
Processor: R580 GPU with 48 shader processors featuring 32-bit floating-point precision, eight vertex shaders, 16 texture units, 16 renderers