Flexibility is key to FPGA success, but speed is equally important. Achronix almost triples the throughput of the system by taking clock gating to the extreme. The Achronix Speedster FPGAs use a unique pipeline architecture but completely hide it from developers. Designers can use the devices with unaltered Verilog, VHDL, or RTL. Developers also can continue to use development tools like Synplicity’s Synplify-Pro and Mentor Graphics’ Precision.
Speedster’s overall architecture (Fig. 1) and specs (see the table) look like those of most FPGAs. It is RAM-based and built around four-input lookup tables (LUTs). Also, it has the usual complement of I/O interfaces, including high-speed serializer/deserializers (SERDES) and memory controllers. Most importantly, picoPIPE elements, which no other FPGA has, are sprinkled throughout the interconnect fabric (Fig. 2).
The picoPIPE elements change the way things work within Speedster. In a conventional FPGA, LUTs are connected together and data flows from one latch to another. The latch clocks are normally synchronized, and clock distribution and synchronization are major limitations in conventional FPGAs. This becomes important when pushing the performance boundaries of the system.
System clocking must account for the delay through the LUTs. This means that the clock rate will be limited by the maximum delay through the longest chain of LUTs. In the conventional sample (#1), the delay would be three LUTs. Achronix makes a different assumption by placing a picoPIPE between each stage.
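The effect of chain length on clock rate can be sketched with some rough arithmetic. The snippet below is illustrative only: it assumes a hypothetical, uniform 1-ns delay per LUT and ignores routing, latch setup time, and any picoPIPE overhead. None of these numbers come from Achronix, and the function names are made up for the example.

```python
# Illustrative arithmetic only. LUT_DELAY_NS is an assumed value,
# not an Achronix specification; routing and setup time are ignored.
LUT_DELAY_NS = 1.0


def conventional_fmax_mhz(chain_lengths, lut_delay_ns=LUT_DELAY_NS):
    """Clock limit when data must cross the longest LUT chain in one cycle."""
    longest_chain = max(chain_lengths)
    return 1000.0 / (longest_chain * lut_delay_ns)


def per_stage_fmax_mhz(lut_delay_ns=LUT_DELAY_NS):
    """Clock limit when a pipeline element separates every LUT stage."""
    return 1000.0 / lut_delay_ns


if __name__ == "__main__":
    chains = [3, 2, 1]  # hypothetical chain lengths, in LUTs
    print(f"conventional: {conventional_fmax_mhz(chains):.0f} MHz")
    print(f"per-stage:    {per_stage_fmax_mhz():.0f} MHz")
```

Under these assumed values, a three-LUT chain caps the conventional clock at about one third of the per-stage rate.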
In the first picoPIPE example (#2), the clock rate of the Speedster can be increased by a factor of four because the chain acts as a queue that holds four states. The shortest chain will limit the maximum number of states a subsystem can contain. If #2, #3, and #4 are used in a design, then #4 is the limiting factor with only two states. If only #2 and #3 were used, then the subsystem could handle up to four states.
The picoPIPEs operate asynchronously. This is significant because it eliminates the clock distribution and synchronization problems: clocks are used only with the latches, and the source and destination clocks do not have to be synchronized. They do need to operate at the same speed, though.
STEP BY STEP Following data through the system helps show how things work. The first piece of data (1) enters the system when the left-most set of latches is clocked. In the conventional FPGA, the data will propagate through the LUTs, and it will be available at the other latch when the next clock cycle occurs. The LUT delay limits the clock rate, as already noted.
With Speedster, the first data item will run through the picoPIPE FIFOs until it gets to the other end of the system. It is removed when the latch on the right side is clocked.
The delay through the LUTs is the same, but there can be more than one piece of data within the subsystem. If a piece of data essentially “bumps” into the next piece, it will remain in the prior picoPIPE stage until the data is removed from the next stage.
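The "bump and wait" behavior can be mimicked with a toy model. This is a minimal sketch, not a description of Achronix's circuitry: it assumes each picoPIPE stage holds at most one item and approximates the asynchronous handshake with discrete steps. The function `step` and its parameters are hypothetical names chosen for the example.

```python
def step(stages, new_item=None, drain=False):
    """Advance a chain of one-item stages by one step (toy model).

    stages   -- list where None marks an empty stage
    new_item -- item offered at the input latch, if any
    drain    -- True when the output latch is clocked
    Returns (new_stages, removed_item, accepted).
    """
    stages = list(stages)
    removed = None
    # The output latch takes the item waiting in the last stage.
    if drain and stages[-1] is not None:
        removed = stages[-1]
        stages[-1] = None
    # Walk from output to input so each item moves at most one stage,
    # and only when the stage ahead of it is empty.
    for i in range(len(stages) - 2, -1, -1):
        if stages[i] is not None and stages[i + 1] is None:
            stages[i + 1] = stages[i]
            stages[i] = None
    # A new item enters only if the first stage is free.
    accepted = False
    if new_item is not None and stages[0] is None:
        stages[0] = new_item
        accepted = True
    return stages, removed, accepted


if __name__ == "__main__":
    chain = [None, None, None]
    chain, _, _ = step(chain, new_item="a")
    chain, _, _ = step(chain, new_item="b")
    chain, _, _ = step(chain)          # "a" reaches the last stage
    chain, _, _ = step(chain)          # "b" waits behind "a"
    print(chain)
    chain, out, _ = step(chain, drain=True)
    print(out, chain)
```

In this model, "b" stays one stage behind "a" until the output latch removes "a", at which point "b" moves forward.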
This approach permits paths of different lengths, such as #2 and #4, to operate within the same subsystem. But the number of items within the subsystem is limited by the smallest number of picoPIPEs within any one chain of computation. It is possible to insert empty stages without LUTs, as in #3, so that its FIFO length matches that of the other chain (#2).
If all three sample stages (#2, #3, and #4) are used, then only two pieces of data can be in the subsystem at a time. If there are more, then data will be lost as in a typical FIFO architecture. Also, the clock rate of the system is now limited by the delay for a single LUT, not the overall chain. This gives the Speedster its high-throughput characteristics.
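The capacity rule reduces to taking a minimum over the chains. The sketch below uses the state counts given in the text for samples #2, #3, and #4 (four, four, and two); the dictionary layout and the function name `subsystem_capacity` are assumptions made for illustration.

```python
# State counts follow the article's description of samples #2, #3, and #4.
# The data layout and function name are illustrative assumptions.
SAMPLE_CHAINS = {"#2": 4, "#3": 4, "#4": 2}


def subsystem_capacity(stage_counts):
    """Items in flight are bounded by the chain with the fewest stages."""
    return min(stage_counts.values())


if __name__ == "__main__":
    print(subsystem_capacity(SAMPLE_CHAINS))
    only_2_and_3 = {k: v for k, v in SAMPLE_CHAINS.items() if k != "#4"}
    print(subsystem_capacity(only_2_and_3))
```

With all three chains present, the two-stage chain (#4) sets the limit; dropping it raises the limit to four.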
Achronix effectively hides the picoPIPE from the development process except for optimization and tuning. The place-and-route system automatically allocates picoPIPE elements. The developer gets the same kind of throughput information that a typical FPGA place-and-route software package will provide, but Achronix additionally provides information about picoPIPE usage.
BASED ON ECLIPSE The Achronix CAD Environment is an Eclipse-based tool. It provides advanced place and route, timing analysis, and critical path analysis. It lets developers tune the use of picoPIPE stages. As with FPGAs in general, there is a limitation on the number of items and routes available to the place-and-route software, so usage does not hit 100% even when a design hits the limit of the hardware. That is one reason why there are lots of picoPIPE elements on a chip. This tends to be the limit of picoPIPE exposure to developers.
It will be interesting to see whether this opens up in the future since the self-clocking FIFO architecture opens up significant design possibilities. Achronix has taken clock gating to the extreme without the problem of synchronization. This technology is a game changer. These FPGAs aren’t for everyone. When it comes to pushing the envelope, though, Speedster looks to beat even ASICs.
Pricing for the Speedster starts at $200. The SPD60 will be the first one available. A development kit provides access to the platform.
ACHRONIX • www.achronix.com