Staking a claim to the title of world's fastest 64-bit floating-point coprocessor chip, ClearSpeed Technology's CSX600 can deliver a sustained 25 GFLOPS for DGEMM (double-precision matrix multiplication) calculations. Each of the chip's 96 very-long-instruction-word processor elements (PEs) can operate at 250 MHz.
Each PE includes multiple execution units, led by a floating-point adder and a floating-point multiplier, both able to execute 32- and 64-bit IEEE 754-compatible operations. Additional resources include a divide/square-root unit, a fixed-point 16- by 16-bit multiplier-accumulator, an integer ALU with shifter, a load/store unit, a five-port high-bandwidth register file, 6 kbytes of closely coupled SRAM, a DMA controller, and an address generator.
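To put the sustained DGEMM figure in context, the short Python sketch below estimates a theoretical peak from the PE count and clock rate. It assumes each PE can issue one floating-point add and one floating-point multiply per cycle at 64-bit precision. That issue rate is an assumption made for illustration, not a figure stated by ClearSpeed here, and the variable names are likewise illustrative.

```python
# Back-of-the-envelope peak estimate for one CSX600.
PES = 96                      # processor elements (from the article)
CLOCK_HZ = 250e6              # 250 MHz (from the article)
FLOPS_PER_PE_PER_CYCLE = 2    # ASSUMPTION: one add plus one multiply per cycle
SUSTAINED_DGEMM_GFLOPS = 25.0 # sustained DGEMM rate (from the article)


def peak_gflops(pes, clock_hz, flops_per_cycle):
    """Return the theoretical peak rate in GFLOPS."""
    return pes * clock_hz * flops_per_cycle / 1e9


peak = peak_gflops(PES, CLOCK_HZ, FLOPS_PER_PE_PER_CYCLE)
efficiency = SUSTAINED_DGEMM_GFLOPS / peak

print(f"estimated peak: {peak:.1f} GFLOPS")
print(f"sustained DGEMM as a share of peak: {efficiency:.0%}")
```

Under that assumption, the estimated peak is 48 GFLOPS, so the quoted 25 GFLOPS would be a little over half of peak.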
The chip includes a DDR2 memory interface, chip-to-chip bridge ports, and an instruction control processor. About 128 million transistors are used to implement the coprocessor in a 130-nm process with eight levels of copper interconnect.
When running at 250 MHz, the coprocessor consumes about 10 W. It is also well suited to algorithms such as complex single-precision fast Fourier transforms (FFTs). For instance, it can execute a half-million 1-kpoint complex single-precision floating-point FFTs per second.
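The FFT rate can be converted into an approximate floating-point rate with the conventional estimate of 5 N log2(N) operations for an N-point complex FFT. The Python sketch below does that arithmetic and divides by the quoted power. It assumes that "1-kpoint" means 1,024 points and that the nominal operation count applies. Actual operation counts depend on the FFT algorithm used, so treat the result as a rough figure only.

```python
import math

FFTS_PER_SECOND = 500_000  # from the article
N_POINTS = 1024            # ASSUMPTION: "1-kpoint" means 1,024 points
POWER_W = 10.0             # approximate power at 250 MHz (from the article)


def nominal_fft_flops(n):
    """Conventional flop estimate for an n-point complex FFT: 5 * n * log2(n)."""
    return 5 * n * math.log2(n)


flops_per_fft = nominal_fft_flops(N_POINTS)
fft_gflops = flops_per_fft * FFTS_PER_SECOND / 1e9
gflops_per_watt = fft_gflops / POWER_W

print(f"nominal flops per FFT: {flops_per_fft:.0f}")
print(f"equivalent rate: {fft_gflops:.1f} GFLOPS")
print(f"rate per watt: {gflops_per_watt:.2f} GFLOPS/W")
```

By this estimate, the FFT figure corresponds to roughly 25.6 GFLOPS in single precision, or about 2.5 GFLOPS per watt.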
The coprocessors will be used on an add-in PCI-X card that ClearSpeed will offer late in the third quarter. Each card will contain two CSX600 coprocessors and up to 4 Gbytes of local DRAM (2 Gbytes per CSX600), for an aggregate sustained throughput of 50 GFLOPS per card.
Multiple cards can be installed in a PCI-X-capable personal computer. Accelerated standard libraries, pre-ported applications, and a full software development kit will support the cards. Pricing has not been established yet.
ClearSpeed Technology www.clearspeed.com