Designing a single-purpose FPGA prototype board is hard enough. But what about a register-transfer-level (RTL) emulation system based on FPGAs? Emulators are known for their fast compile times and simulator-like debug capabilities—features not normally associated with FPGAs. Throw in the need to support up to 1 billion ASIC-equivalent gates as well as multiple concurrent users with multiple use models, and you’ve got quite a challenge. That, in fact, is the challenge EVE faced while developing its latest product, ZeBu-Server (for “Zero Bug”).
For an FPGA-based emulation system, we needed to consider both the hardware and the software. Let’s look at the hardware first. At the lowest level reside the FPGAs that will be used for the design under test (DUT). For ZeBu-Server, our overwhelming priority was density. But also on our “must-have” list were bandwidth for sustained performance and a “readback” capability for debug.
The Xilinx Virtex-5 LX330 devices provided the right combination of design capacity, high-speed I/O, and visibility into the DUT. With an emulation capacity of 2.5 million ASIC gates per FPGA, we need 400 FPGAs to support a billion-gate design.
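The sizing arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name `dut_fpgas_needed` and the choice to round up to whole devices are illustrative assumptions. Only the 2.5-million-gate capacity and the billion-gate target come from the text.

```python
# Emulation capacity per Virtex-5 LX330, in ASIC-equivalent gates (figure quoted in the text).
GATES_PER_FPGA = 2_500_000


def dut_fpgas_needed(design_gates: int, gates_per_fpga: int = GATES_PER_FPGA) -> int:
    """Hypothetical helper: fewest FPGAs whose combined capacity covers the design.

    Assumes capacity simply adds up across devices, which ignores routing
    and partitioning overhead in a real system.
    """
    if design_gates < 0 or gates_per_fpga <= 0:
        raise ValueError("gate counts must be non-negative and capacity positive")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (design_gates + gates_per_fpga - 1) // gates_per_fpga


if __name__ == "__main__":
    print(dut_fpgas_needed(1_000_000_000))  # 400
```

Rounding up matters for designs that are not an exact multiple of the per-FPGA capacity: a design of 2,500,001 gates would need two devices under this simple model.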
There’s more to emulation than just the DUT. Emulators support a number of use models, including co-emulation with an HDL simulator, transaction-based emulation, and emulation with a synthesizable testbench or a target hardware system. These reconfigurable testbench (RTB) capabilities are best kept separate from the DUT to leverage incremental compilation for shorter turnaround times.
In a fully configured ZeBu-Server system, we use 25 RTB FPGAs to handle the HDL and transaction-level interfaces, as well as any synthesizable testbench components, for multiple concurrent users and host PC connections (see the figure). We also use FPGAs as on-board memory and clock servers, which means that, on top of the 400 DUT FPGAs, ZeBu-Server uses 450 FPGAs in all to emulate 1 billion gates. The full system includes more than 500 FPGAs.
Because not everyone has a billion-gate design (yet), we also needed to have modularity in our system. We created flexibility at the module, unit, and system level. Modules can have four, eight, or 16 FPGAs; units can contain up to five modules; and up to five units can be cascaded to create a billion-gate system. To maintain high performance for designs of all sizes, we used direct low-voltage differential signaling (LVDS) connections at every level of hierarchy. This, of course, resulted in loads of “fun” for our printed-circuit-board (PCB) routing team.
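To make the module/unit/system hierarchy concrete, here is a minimal sketch that validates a configuration and totals its capacity. The function names and the nested-list representation are hypothetical. The sketch also assumes that the FPGAs counted per module are all DUT FPGAs rated at 2.5 million gates each, and that every unit holds at least one module.

```python
MODULE_SIZES = (4, 8, 16)     # FPGAs per module, per the text
MAX_MODULES_PER_UNIT = 5      # per the text
MAX_UNITS = 5                 # per the text
GATES_PER_FPGA = 2_500_000    # assumption: every module FPGA is a DUT FPGA at this capacity


def system_fpgas(units):
    """Count FPGAs in a configuration given as a list of units,
    where each unit is a list of module sizes (hypothetical representation)."""
    if not 1 <= len(units) <= MAX_UNITS:
        raise ValueError("a system has between 1 and 5 units")
    total = 0
    for modules in units:
        if not 1 <= len(modules) <= MAX_MODULES_PER_UNIT:
            raise ValueError("a unit has between 1 and 5 modules")
        for size in modules:
            if size not in MODULE_SIZES:
                raise ValueError(f"unsupported module size: {size}")
            total += size
    return total


def system_capacity_gates(units):
    """Total ASIC-equivalent gate capacity under the assumptions above."""
    return system_fpgas(units) * GATES_PER_FPGA


if __name__ == "__main__":
    largest = [[16] * 5 for _ in range(5)]   # five units of five 16-FPGA modules
    print(system_fpgas(largest), system_capacity_gates(largest))
```

Under these assumptions the largest configuration, five units of five 16-FPGA modules, comes to 400 FPGAs and 1 billion gates, which matches the figures given earlier.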
That covers the hardware side of things. In some ways, the software side of ZeBu-Server was an even bigger challenge. Parts of the flow are standard—we leverage the Xilinx ISE tools for place and route, for example, and use XST for synthesis in the RTB.
Yet for a billion-gate design spanning 400 DUT FPGAs, some of the standard tools break down. Synthesis time can grow exponentially with design size, and partitioning the design across such a large array of FPGAs is no easy task either. Moreover, an emulator has different priorities from an FPGA prototype: the end goal of synthesis isn’t optimal performance or resource utilization. Rather, our primary goals are short turnaround times and design visibility.
For these reasons, we developed our own toolset, including zFAST (ZeBu Fast Synthesis). zFAST can synthesize SystemVerilog, VHDL, or Verilog designs up to 10 times faster than standard synthesis products while retaining all RTL signal and register names. After synthesis, our clustering tool partitions the design at the gate level and can cluster a billion-gate design in just a few hours.
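For readers unfamiliar with partitioning, the toy sketch below shows the basic capacity constraint involved in spreading a design across FPGAs. It is not EVE's clustering algorithm. It swaps in a textbook first-fit-decreasing bin-packing heuristic, works on a handful of made-up blocks rather than individual gates, and ignores the inter-FPGA connectivity that a real partitioner must also minimize. The block names and sizes are invented for illustration.

```python
from __future__ import annotations

GATES_PER_FPGA = 2_500_000  # per-FPGA capacity quoted earlier in the article


def first_fit_decreasing(block_gates: dict[str, int],
                         capacity: int = GATES_PER_FPGA) -> list[list[str]]:
    """Assign blocks to FPGAs with the first-fit-decreasing heuristic.

    Returns one list of block names per FPGA. Connectivity between blocks
    is deliberately ignored; only gate capacity is respected.
    """
    partitions: list[list[str]] = []  # block names placed on each FPGA
    free: list[int] = []              # remaining gate capacity of each FPGA
    # Place the largest blocks first so small ones can fill the leftover space.
    for name, gates in sorted(block_gates.items(), key=lambda kv: kv[1], reverse=True):
        if gates > capacity:
            raise ValueError(f"block {name!r} does not fit in a single FPGA")
        for i, room in enumerate(free):
            if gates <= room:
                partitions[i].append(name)
                free[i] -= gates
                break
        else:
            # No existing FPGA has room, so open a new one.
            partitions.append([name])
            free.append(capacity - gates)
    return partitions


if __name__ == "__main__":
    # Hypothetical design blocks and gate counts, for illustration only.
    blocks = {"cpu": 1_800_000, "gpu": 2_200_000, "dsp": 600_000,
              "noc": 300_000, "io": 400_000}
    for index, names in enumerate(first_fit_decreasing(blocks)):
        print(f"FPGA {index}: {names}")
```

Even this simplified version hints at why partitioning a billion-gate netlist is hard: a production tool has to respect capacity and, at the same time, keep the number of signals crossing FPGA boundaries within the available I/O.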
We also created a user interface known as zCUI to encapsulate our flow into a single, easy-to-use environment. This shields the user from getting deeply involved with the implementation details of the system. But we willingly hand over greater control to those users who are accomplished FPGA designers. They are free to use the standard FPGA flow to try to get more performance out of the DUT, constraining and using any tricks they know. The system makes it easy for a non-FPGA designer, but it doesn’t restrain an expert either.
Making an FPGA prototype for a specific design is difficult enough. Making a generic, FPGA-based, RTL emulator system that provides billion-gate support as well as all of the features associated with emulation is a huge undertaking. It involves hardware and a great deal of software that goes well beyond the standard FPGA flow.