The UltraScale architecture is Xilinx's answer to getting ASIC-class, system-level performance out of an FPGA. It will be used in the company's 20-nm FPGAs, including the Artix, Kintex, and Virtex families. The new architecture incorporates a number of features that target high-end, high-performance, ASIC-class applications.
Unfortunately, many of the under-the-hood details of the UltraScale architecture (Fig. 1) are sparse because this is the secret sauce that gives Xilinx an edge. The overall intent is to allow designers to create efficient systems that handle data paths hundreds of bits wide with high throughput and low latency.
One of the primary improvements is routing. Routing becomes more of a challenge as the size of the FPGA grows because one end of a circuit may be farther away from the other end, requiring more travel through the FPGA interconnect fabric. Originally, FPGAs were very regular, with a relatively simple programmable interconnect fabric that fit nicely with the programmable logic elements (LEs). It was still a challenge to fit a design so the fabric could support the interconnect, and there were often LEs that would go unused because there were insufficient interconnect paths to access them. This was hidden by the place-and-route tools, which have become ever more sophisticated.
- 10,000 Connections Between FPGA Slices
- Advanced Packaging Delivers Capacity And Performance
- FPGA Design Suite Generates Global Minimum Layout
- Xilinx Unifies FPGA Line
The challenge with the conventional approach is that the number of LEs grows as O(N^2), while the number of interconnect fabric connections, or tracks, grows only as O(N). This is less of an issue when the FPGA chip is smaller and logic is local.
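The arithmetic behind that mismatch can be sketched in a few lines of Python. This is only an illustration: it assumes N is the linear dimension of a square LE array and that tracks scale with one edge of the array, neither of which the article specifies. The function name is hypothetical.

```python
def les_per_track(n: int) -> float:
    """Ratio of logic elements to routing tracks in a toy model.

    Assumption (not from Xilinx): an n-by-n array holds n*n LEs,
    while the track count scales with a single edge, n.
    """
    logic_elements = n * n   # O(N^2) growth
    tracks = n               # O(N) growth
    return logic_elements / tracks


# The ratio itself grows linearly, so each track must serve more LEs
# as the array gets larger.
for n in (10, 100, 1000):
    print(f"N={n:5d}  LEs per track={les_per_track(n):8.1f}")
```

In this toy model the number of LEs competing for each track grows in step with N, which is why the imbalance matters more on large devices.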
UltraScale makes the interconnect fabric more complex by essentially adding a mechanism to connect an LE to ones farther away. For example, assume that LEs are connected only to their nearest neighbors. It does not really work that way, but it helps in understanding what Xilinx is doing. In this scenario, UltraScale adds connections that link a block of LEs to another block a few hops away. Such a link would be faster, and it simplifies routing when logic is more dispersed. Another analogy would be a highway with local and express lanes. The local lanes have more exits but are slower, with more latency.
Xilinx's development tool, Vivado (see FPGA Design Suite Generates Global Minimum Layout), hides the FPGA fabric in both existing FPGAs and the new UltraScale FPGAs. The new fabric layout is more complex, but it helps reduce routing congestion. It can also increase chip utilization to over 90%, even for large, complex FPGAs and designs, while maintaining performance and latency.
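The nearest-neighbor analogy can be made concrete with a small hop-count sketch. It is a toy model, not a description of the actual UltraScale fabric: LEs sit in a single row, every link costs one hop, and the express-link span of four positions is an arbitrary assumption. The count is a simple greedy one that never overshoots the target.

```python
def hops_local_only(distance: int) -> int:
    """Hops needed when only nearest-neighbor links exist."""
    return distance


def hops_with_express(distance: int, span: int = 4) -> int:
    """Hops needed when express links that skip `span` positions are added.

    Greedy count: take as many express links as fit, then finish
    with nearest-neighbor hops. `span` is a hypothetical parameter.
    """
    express_hops, local_hops = divmod(distance, span)
    return express_hops + local_hops


for d in (3, 10, 40):
    print(f"distance={d:3d}  local={hops_local_only(d):3d}  "
          f"express={hops_with_express(d):3d}")
```

For nearby logic the express links change nothing, but in this model the hop count for distant connections drops by roughly the span factor.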
Another aspect of UltraScale is memory optimization. Xilinx maintains the modular on-chip memory blocks for designers to employ, but these blocks are designed to be stacked more efficiently. Address and data cascading is handled by hard logic instead of using the LEs and the interconnect fabric.
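What address cascading has to accomplish can be sketched as follows. This models only the behavior: the upper part of an address selects one block in the stack and the remainder selects a word within it. The block depth of 1,024 words and the function names are assumptions for illustration, not UltraScale specifications.

```python
BLOCK_DEPTH = 1024  # words per memory block (illustrative assumption)


def decode_address(addr: int, depth: int = BLOCK_DEPTH) -> tuple[int, int]:
    """Split a flat address into (block index, offset within block)."""
    return divmod(addr, depth)


def read_cascaded(blocks: list[list[int]], addr: int) -> int:
    """Read one word from a stack of equally sized blocks."""
    block, offset = decode_address(addr, len(blocks[0]))
    return blocks[block][offset]


# Four stacked blocks behave like one memory that is four times deeper.
stack = [[b * BLOCK_DEPTH + i for i in range(BLOCK_DEPTH)] for b in range(4)]
print(decode_address(2500))        # (2, 452)
print(read_cascaded(stack, 2500))  # 2500
```

In a conventional fabric this select-and-multiplex step would consume LEs and routing; the article's point is that UltraScale handles it in hard logic instead.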
Cascaded memory helps with clocking, but so does UltraScale's multi-region clocking support. This is akin to the ASIC approach, which lowers clock skew and allows high clock rates. It is key to handling the wide, high-speed buses that are needed for network routing or radar processing applications, and it also improves power management. Additional static and dynamic power gating adds to the power savings.
Cascading and new features also crop up in the DSP support, which is built around new 27- by 18-bit multipliers. For example, wide XOR logic operations are key to speeding up calculations like ECC or CRC, which are very common in communication and storage applications. There is also improved support for fixed-point and IEEE Std 754 floating-point arithmetic.
Of course, increased use of hard logic like PCI Express blocks highlights the trend toward high-speed SERDES and the need for very wide internal buses, which UltraScale is designed to handle. Massive interconnects also become more important when dealing with 2.5D (see 10,000 Connections Between FPGA Slices) and 3D structures where the FPGA is part of a larger collection of silicon. Technologies like Hybrid Memory Cube (see Hybrid Memory Cube Shows New Direction For High Performance Storage) are designed to deliver tremendous amounts of memory at very high bandwidths.
FPGAs will never match an ASIC in performance and power utilization, but UltraScale comes a lot closer. Plus, ASICs will never match the customizability of FPGAs.
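To see why wide XOR matters for ECC and CRC, consider the following Python sketch. It is a software illustration only, and the function names are hypothetical. An ECC check bit is the XOR of a selected subset of data bits, and a bit-serial CRC is nothing but shifts and XORs. The CRC-8 polynomial 0x07 is chosen here purely as a common example.

```python
def wide_xor(word: int) -> int:
    """XOR of every bit in `word`: 1 if an odd number of bits are set."""
    return bin(word).count("1") & 1


def check_bit(word: int, mask: int) -> int:
    """ECC-style check bit: XOR of the data bits selected by `mask`."""
    return wide_xor(word & mask)


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bit-serial CRC-8 (MSB first, zero initial value)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                # Top bit set: shift out and XOR in the polynomial.
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


print(check_bit(0b1011, 0b1111))  # 1 (three bits set)
print(hex(crc8(b"\x01")))         # 0x7
```

When hardware processes a whole bus word per clock, the serial loop above is unrolled so that each CRC output bit becomes one XOR across many input bits. That is the kind of wide XOR operation the article describes.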