Since their introduction in 1984, Field Programmable Gate Arrays (FPGAs) have given designers substantial flexibility and time-to-market advantages compared with ASICs. However, in recent years FPGAs have hit a wall. FPGAs’ great strength, reconfigurability, is unfortunately also the cause of their major weakness: low performance. This keeps FPGAs from competing with ASICs despite the use of smaller technology nodes. As a result, traditional FPGAs have been prevented from playing a role in many high-performance systems where ASICs still dominate, even in low-to-medium-volume applications where an FPGA would be a much more cost-effective solution.
The economic and technical forces driving the world of high-performance electronics are leading to frustration with both the ASIC and traditional FPGA approaches. Increasingly, system architects require reprogrammability and short time to market while still satisfying their performance requirements.
This article looks at how an innovative new FPGA architecture signals a break from the nearly three decades of FPGA design in which performance has often been sacrificed for flexibility and time to market. This architecture blends elements of both synchronous and asynchronous architectures, delivering FPGAs capable of exceeding standard-cell ASIC performance.
A NEW FPGA ARCHITECTURE
The new FPGA architecture from Achronix is said to achieve three times the throughput of traditional FPGAs, approaching 1.5 GHz in peak performance. At the heart of this new architecture is the picoPIPE logic fabric (Fig. 1), a fabric that is based on the use of Data Tokens rather than a conventional clocked structure. This high-performance fabric is surrounded by a conventional I/O frame of configurable I/Os, SerDes, clocks, PLLs, etc., providing the off-chip interfaces and forming the boundary between the picoPIPE core and these interfaces. All data entering and exiting the core must pass through the frame. From a designer’s perspective, the internal picoPIPE fabric is virtually indistinguishable from a conventional FPGA fabric — the only distinction is that the data throughput is substantially increased.
In conventional logic, a Data Token is a logic value which is qualified by a clock edge. With a traditional logic implementation, data is always present but is only valid (and therefore propagated through storage elements) when a clock edge is received at a storage element. Hence, every time data is propagated from one storage element to the next, only a distinct and valid data value is propagated. The combination of Clock and Data can therefore be implicitly considered as a Data Token. For each register (storage element) in a design that has a clock, there will be a Data Token propagated at every clock tick.
In an Achronix FPGA, the picoPIPE fabric uses explicit Data Tokens rather than implicit ones. Wherever there was an implicit Data Token in the original design, it will be replaced with an explicit Data Token once the design is mapped into the picoPIPE fabric. Because explicit Data Tokens are used, the clock information is encoded into the Data Token: the fact that a token exists at all indicates that a clock edge has occurred. Because each Data Token contains both Data and Clock information, no global clock is required within the fabric. Data Tokens are still clocked into and out of the fabric using special elements in the frame. Inside the fabric, the explicit Data Tokens are controlled by fast, local handshaking rather than a global clock, and are therefore able to propagate at very high speeds.
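The local handshaking described above can be illustrated with a minimal software sketch. This is not Achronix's circuit: the names Stage, offer, take and handshake are hypothetical, and the only behaviour modelled is that a stage holds at most one explicit Data Token and passes it on only when the next stage is empty.

```python
from typing import Optional


class Stage:
    """Hypothetical pipeline stage that holds at most one explicit Data Token."""

    def __init__(self, token: Optional[int] = None) -> None:
        self.token = token  # None means the stage currently holds no token

    def can_accept(self) -> bool:
        return self.token is None

    def offer(self, value: int) -> bool:
        """Store a token if the stage is empty; report whether it was accepted."""
        if not self.can_accept():
            return False
        self.token = value
        return True

    def take(self) -> Optional[int]:
        """Remove and return the held token (None if the stage is empty)."""
        value, self.token = self.token, None
        return value


def handshake(src: Stage, dst: Stage) -> bool:
    """Move one token from src to dst if src has one and dst is empty.

    No global clock is involved: the decision depends only on the state
    of the two neighbouring stages.
    """
    if src.token is not None and dst.can_accept():
        dst.offer(src.take())
        return True
    return False


a, b = Stage(5), Stage()
print(handshake(a, b), a.token, b.token)  # True None 5
```

The presence of a token plays the role that a clock edge plays in conventional logic, so the transfer is decided by the two neighbouring stages alone.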
The basic elements of a picoPIPE fabric (Fig. 2) are the Connection Element (CE), the Functional Element (FE), the Boundary Elements (BEs), and the pipeline stage. Pipeline stages (Fig. 3) connect CEs, FEs, and BEs to form pipeline networks. Once combined into networks, the picoPIPE implementation exactly matches the functionality of conventional FPGAs but is capable of much higher throughput.
Each pipeline stage is capable of holding a Data Token, meaning that picoPIPE logic is highly pipelined by design. In traditional logic designs, adding pipeline stages will change the computed logic function. With picoPIPE logic this is not the case. picoPIPE pipeline stages can be added without automatically adding a new Data Token into the circuit. This is possible because the new data representation has separated pipeline stages from Data Tokens. In a traditional design, adding a pipeline stage (register) will always cause a new Data Token to be introduced as well, thus altering the functionality. picoPIPE logic has freed Data Tokens from being tied to pipeline stages. Thus, pipeline stages can be added without adding Data Tokens.
CEs can be initialised with a Data Token or without. Wherever a register existed in the original design, the corresponding CE will have an initial Data Token, while all other CEs will not have initial Tokens. The main difference between a series of uninitialised CEs and a wire is that each pipeline stage between CEs is still capable of containing a Data Token even if it doesn’t start with one. This enables the throughput of Achronix FPGAs to be increased while maintaining exact logical equivalence to a conventional circuit.
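To make the distinction between initialised and uninitialised stages concrete, here is a small behavioural sketch. It assumes a simple linear chain and a hypothetical run_pipeline helper; it models only the token bookkeeping, not the real hardware. Stages that start with a token stand in for the registers of the original design, and stages that start empty are the extra pipeline stages.

```python
from typing import List, Optional


def run_pipeline(initial: List[Optional[int]], inputs: List[int]) -> List[int]:
    """Push `inputs` through a chain of stages and return the output token stream.

    initial[i] is the token stage i starts with (None = uninitialised stage).
    Stage 0 is nearest the input; the last stage feeds the output.
    Assumes the chain has at least one stage.
    """
    stages = list(initial)
    pending = list(inputs)
    out: List[int] = []
    # Keep stepping until every token has drained out of the chain.
    while pending or any(t is not None for t in stages):
        # Output side: the sink is always ready, so the last stage can drain.
        if stages[-1] is not None:
            out.append(stages[-1])
            stages[-1] = None
        # Move tokens forward one stage wherever the next stage is empty.
        for i in range(len(stages) - 2, -1, -1):
            if stages[i] is not None and stages[i + 1] is None:
                stages[i + 1], stages[i] = stages[i], None
        # Input side: inject the next token if the first stage is free.
        if pending and stages[0] is None:
            stages[0] = pending.pop(0)
    return out


# One initialised stage (one register in the original design).
print(run_pipeline([0], [1, 2, 3]))              # [0, 1, 2, 3]
# Same design with two extra uninitialised stages: identical output stream.
print(run_pipeline([0, None, None], [1, 2, 3]))  # [0, 1, 2, 3]
# Adding an initialised stage is like adding a register: the stream changes.
print(run_pipeline([0, 0], [1, 2, 3]))           # [0, 0, 1, 2, 3]
```

In this model, only the number of initial tokens affects the output stream; the number of empty stages affects how long tokens take to arrive, not which tokens arrive or in what order.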
FEs have functionality equivalent to combinatorial logic. The only difference relates to how ingress and egress data is handled. The local handshaking within a picoPIPE network means the FEs must also handshake data in and out. This handshaking ensures that only valid, settled data is evaluated and propagated.
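The FE behaviour can be sketched in the same spirit. The FunctionalElement class below is a hypothetical model, not an Achronix API: it evaluates its function only when every input holds a token and its output is free, which is one plausible reading of the handshaking described above.

```python
from typing import Callable, List, Optional


class FunctionalElement:
    """Hypothetical FE model: combinatorial function plus token handshaking."""

    def __init__(self, func: Callable[..., int], num_inputs: int) -> None:
        self.func = func
        self.inputs: List[Optional[int]] = [None] * num_inputs
        self.output: Optional[int] = None

    def offer(self, port: int, value: int) -> bool:
        """Accept a token on an input port only if that port is empty."""
        if self.inputs[port] is not None:
            return False
        self.inputs[port] = value
        return True

    def step(self) -> bool:
        """Fire once if all inputs hold tokens and the output is free."""
        ready = all(t is not None for t in self.inputs)
        if ready and self.output is None:
            self.output = self.func(*self.inputs)
            self.inputs = [None] * len(self.inputs)  # input tokens are consumed
            return True
        return False


and_gate = FunctionalElement(lambda a, b: a & b, 2)
and_gate.offer(0, 1)
print(and_gate.step())   # False: second input token has not arrived yet
and_gate.offer(1, 1)
print(and_gate.step())   # True: both tokens present, output was free
print(and_gate.output)   # 1
```

Because the element waits for complete input tokens, it never evaluates a mixture of old and new values.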
BEs are only used at the boundary where the picoPIPE fabric meets the FPGA frame. These elements are responsible for converting the clocked Data Tokens in the frame into explicit Data Tokens in the picoPIPE fabric (ingress). They are also used for converting Data Tokens in the fabric back into clocked Data Tokens in the frame (egress). Therefore every signal entering and exiting the picoPIPE fabric will pass through Ingress Boundary Elements and Egress Boundary Elements, respectively.
Higher throughput compared with existing FPGAs is achieved because of the fine-grained pipeline stages. Unlike existing FPGA implementations, these pipeline stages can be automatically inserted anywhere in a design without changing its logic functionality.
As Figure 4 shows, there are often many levels of logic between storage elements in traditional technology. It takes time for data to propagate from the Q register output through the combinatorial logic and to settle at a stable state on the next register’s D input. As the next clock edge cannot occur until all data has settled, the clock period can be no shorter than the delay of the longest path in the entire clock network. Data in every path that is shorter than the longest path (by definition, all paths except the longest) must wait for the longest path.
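As a small worked example of this constraint, the sketch below computes the highest usable clock frequency from a set of register-to-register path delays. The delay values and the max_clock_mhz name are made up for illustration, and the model ignores setup time, clock skew and other margins.

```python
from typing import Iterable


def max_clock_mhz(path_delays_ns: Iterable[float]) -> float:
    """Return the highest clock frequency (MHz) allowed by the slowest path.

    The clock period must be at least as long as the longest path delay,
    so f_max = 1 / max(delay). With delays in ns, 1000 / ns gives MHz.
    """
    return 1000.0 / max(path_delays_ns)


# Hypothetical delays: three short paths and one long one.
delays_ns = [1.0, 2.0, 1.5, 4.0]
print(max_clock_mhz(delays_ns))  # 250.0
```

Here the single 4 ns path limits the whole clock network to 250 MHz, even though the other paths would settle much sooner.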
In contrast, picoPIPE technology (Fig. 4) allows optimum pipelining without changing the logic functionality. Each pipeline stage has less logic depth and therefore completes its operation very quickly. This allows the rate of Data Tokens through the logic to be increased, which increases the effective clock rate.
In traditional FPGAs (Fig. 5), signals travel on long routing tracks and pass through many routing components. These signals suffer from a high capacitive load; the larger the FPGA, the longer the paths that need to be traversed. Additionally, there are many levels of logic between state-holding elements (registers).
Within Achronix FPGAs, the built-in pipelining ensures that signals only ever need to travel on short routing tracks. This reduces the capacitive load on the signal at each stage. For larger devices, signals may still need to propagate from one corner of the device to the other. While larger devices may have slightly increased latency, unlike other FPGAs they do not have decreased throughput, as each pipeline stage is capable of holding a new Data Token. Thus the inherent pipelining of picoPIPE technology allows maximum throughput to be maintained, regardless of how large the FPGA is. Pipelining also ensures there is only one logic level per pipeline stage, allowing a much faster rate of Data Tokens to be used.
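The difference between latency and throughput can be shown with simple arithmetic. The sketch below assumes an idealised pipeline in which every stage has the same delay; the pipeline_metrics name and the numbers are illustrative only and are not Achronix figures.

```python
from typing import Tuple


def pipeline_metrics(stage_delay_ns: float, num_stages: int) -> Tuple[float, float]:
    """Return (latency in ns, throughput in tokens per microsecond).

    Latency is the time for one token to cross every stage.
    Throughput depends only on the per-stage delay, because each stage
    can hold its own token while the others are busy.
    """
    latency_ns = stage_delay_ns * num_stages
    tokens_per_us = 1000.0 / stage_delay_ns
    return latency_ns, tokens_per_us


# A short route and a corner-to-corner route with the same per-stage delay.
print(pipeline_metrics(0.5, 10))  # (5.0, 2000.0)
print(pipeline_metrics(0.5, 40))  # (20.0, 2000.0)
```

Quadrupling the number of stages quadruples the latency in this model but leaves the token rate unchanged, which is the behaviour the text describes for larger devices.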