Electronic Design

What Will You Do With 1 TFLOP Of Double-Precision Power?

Don’t look now, but you may have a supercomputer on your desk. It’s hiding in your video card. While it won’t make your word processor faster, it may improve the transcoding speed when you’re moving movies to your mobile Internet device.

Intel and AMD have been pushing multicore in the 64-bit x86 realm, though they offer only four-core chips at this point. Intel’s 80-core Polaris is designed to push the envelope, but AMD and NVidia have other ideas, at least when it comes to stream computing.

Multicore has flourished in graphics processing units (GPUs). Until a few years ago, GPUs were essentially black-box systems designed to improve gaming and deliver fast updates for CAD packages and medical applications. The closest a programmer got to the GPU was the video device driver.

That was then. Now, NVidia and AMD/ATI not only have opened up their precious GPUs, they also have delivered an impressive collection of software and application programming interfaces (APIs). We’re now into third-generation boards targeted specifically at areas including stream computing.

NVidia’s C1060, designed for parallel computing, lacks a video output (Fig. 1). Still, the board often will be used for video preprocessing chores such as image analysis and ray tracing, with another video card providing rendering services.

The C1060 uses the same single-instruction, multiple-thread (SIMT) architecture as NVidia’s GeForce video adapters, just with many more cores, and it packs 4 Gbytes of memory for its 240 cores. The C1060 also does double-precision math, while the GeForce products are single-precision, for now.

AMD’s FireStream 9250 is based on the company’s double-precision RV770 chip, which also is found in AMD’s Radeon HD 4850 (Fig. 2). It has 160 cores that normally are used as shaders when tasked with graphics chores.

These latest boards target high-performance computing applications, though the software used to create applications is equally applicable to GPUs in video boards. While the video boards may have to perform double duty by running a parallel application and displaying a windowed desktop, the amount of performance available is often sufficient to handle both.

The first step was to provide runtime libraries that delivered array manipulation services. Yet the real power came when programmers were able to write applications that ran on the GPU. NVidia’s Compute Unified Device Architecture (CUDA) and AMD’s FireStream software development kit (SDK) can do this, and they’re available as free downloads. A forthcoming version of CUDA will even generate code that runs on non-GPU platforms such as multicore x86 processors.

The C code used with these GPU tools is augmented to explicitly annotate the parallel aspects of the programs. Developers will need to try this approach on their own code, because not all applications can benefit from the tools and GPUs.
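To give a feel for what "augmented C" looks like, here is a minimal CUDA sketch. It is not taken from NVidia's or AMD's materials. The operation (y = a*x + y over an array, often called SAXPY) and the names saxpy_kernel and saxpy_on_gpu are assumptions chosen for illustration, and it uses single-precision floats to keep it short. The parallel annotations are the __global__ qualifier, the built-in thread indices, and the <<<blocks, threads>>> launch syntax. Everything else is ordinary C.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// __global__ marks a function that runs on the GPU and is launched from
// the host. Each GPU thread handles one element of y = a*x + y.
__global__ void saxpy_kernel(int n, float a, const float *x, float *y)
{
    // Built-in indices identify this thread within the whole launch.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                 // the last block may extend past n
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical host-side helper (illustrative name): copies x and y to
// the GPU, launches the kernel, and copies the result back into y.
// Returns cudaSuccess on success, or the first CUDA error encountered.
cudaError_t saxpy_on_gpu(int n, float a, const float *x, float *y)
{
    float *d_x = NULL, *d_y = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    cudaError_t err;

    if (n <= 0) return cudaSuccess;          // nothing to do

    err = cudaMalloc((void **)&d_x, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void **)&d_y, bytes);
    if (err != cudaSuccess) { cudaFree(d_x); return err; }

    err = cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;                          // threads per block (a common choice)
        int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
        saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y);
        err = cudaGetLastError();                   // reports launch failures
    }

    // This copy waits for the kernel to finish before reading the result.
    if (err == cudaSuccess)
        err = cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_x);
    cudaFree(d_y);
    return err;
}
```

A host program would call saxpy_on_gpu(n, a, x, y) with ordinary C arrays and check the return value. For small arrays, the copies to and from the board can cost more than the arithmetic saves, which is one reason not every application benefits.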
