Tool Up For The FPGA Blitz

FPGA usage divides into two primary segments. Historically, the foremost role of FPGAs has been to verify an ASIC, system-on-a-chip (SoC), or application-specific standard part (ASSP). Designers now use FPGAs to prototype part or all of a design, to refine it, or as a platform for getting a head start on system software development. According to some industry experts, as many as 90% to 100% of ASICs today are prototyped on FPGAs.

For many years, a main application for FPGAs has involved the in-house construction of ASIC prototyping boards, some using a single FPGA and some with multiple chips. Anecdotes mention boards with as many as 50 FPGAs. Of course, the use of multiple devices requires designers to partition the design among them and handle associated timing issues. Now that a broad array of commercially built FPGA prototyping boards and verification systems is on the market, the build-versus-buy decision has become more complicated.
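As a rough illustration of the partitioning task, the Python sketch below assigns design blocks to devices with a simple first-fit-decreasing heuristic. The block names, gate counts, and capacity figure are hypothetical. Real partitioners also weigh inter-chip I/O and timing, which this sketch ignores.

```python
def partition_blocks(blocks, capacity):
    """Assign blocks (name -> size) to FPGAs using first-fit decreasing.

    Sizes and capacity share one arbitrary unit (e.g. thousands of gates).
    Only capacity is considered; pin counts and timing are not modeled.
    """
    fpgas = []  # each entry: {"used": total size placed, "blocks": [names]}
    # Place the largest blocks first so they are not stranded later.
    for name, size in sorted(blocks.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{name} does not fit on a single device")
        for fpga in fpgas:
            if fpga["used"] + size <= capacity:
                fpga["used"] += size
                fpga["blocks"].append(name)
                break
        else:
            # No existing device has room, so add another FPGA to the board.
            fpgas.append({"used": size, "blocks": [name]})
    return fpgas


if __name__ == "__main__":
    # Hypothetical design, sizes in thousands of gates.
    design = {"cpu": 600, "dsp": 500, "dma": 300, "uart": 100}
    for index, fpga in enumerate(partition_blocks(design, capacity=1000)):
        print(index, fpga["blocks"], fpga["used"])
```

Even this toy version shows why the board's device count depends on how blocks are grouped, before any timing issues are considered.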

Others have turned to FPGAs as a production vehicle for their end products. Today’s FPGA, a far more capable animal than those of just a few years ago, embeds resources that lend themselves to a broad array of applications. Not so long ago, production use of FPGAs was limited to glue logic or relatively small applications. This is no longer the case.

FPGA vendors tend to be early adopters of cutting-edge process technologies, with vendors like Altera and Xilinx pushing these processes even into their low-cost Cyclone and Spartan device families, respectively. Designers thus find them attractive for medium-volume consumer applications. Still, whatever you’re using FPGAs for, a tool chain must be in place to execute your design, verify its functionality, and place and route the design on the device itself.

ESL AND FPGAS DO MIX Electronic system-level (ESL) tools are getting more attention from designers who must design and verify complex chips faster than ever. One ESL synthesis tool that can work in an FPGA flow is Forte Design Systems’ Cynthesizer. Despite being developed to meet the needs of ASIC/SoC designers targeting particular process technologies, Cynthesizer addresses a number of use models that include FPGAs, according to Mike Meredith, Forte’s vice president of technical marketing.

For instance, there’s high-speed functional validation of RTL code. “When an FPGA implementation is used for functional validation of RTL code, the FPGA is being used as a high-speed simulator,” says Meredith.

In such an instance, the goal is to verify the correctness of the RTL code before committing to silicon. Designers can synthesize a SystemC design using the .lib file and clock speed that will ultimately be used for the high-volume silicon implementation. Then, using Cynthesizer’s integration with Synopsys’ Synplify Pro, the RTL code can be targeted to a specific FPGA.

Because the FPGA is unlikely to achieve the aggressive timing of the ASIC implementation, the resulting FPGA will have to be run at a lower clock speed. However, functional verification can still proceed many times faster than using a software simulator, says Meredith.
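The tradeoff comes down to simple arithmetic: wall-clock time is the cycle count divided by the effective clock rate. The Python sketch below captures that relationship. The rates in the example are placeholders chosen for illustration, not measurements from any tool or device.

```python
def run_time_seconds(cycles, clock_hz):
    """Wall-clock time needed to execute a number of design clock cycles."""
    return cycles / clock_hz


def speedup(fpga_hz, simulator_hz):
    """How many times faster the FPGA prototype is than the simulator."""
    return fpga_hz / simulator_hz


if __name__ == "__main__":
    # Placeholder rates: an FPGA prototype slowed to 20 MHz versus a
    # software simulator assumed to manage 1,000 design cycles per second.
    fpga_hz = 20e6
    simulator_hz = 1e3
    cycles = 1_000_000_000

    print("FPGA time (s):", run_time_seconds(cycles, fpga_hz))
    print("Simulator time (s):", run_time_seconds(cycles, simulator_hz))
    print("Speedup:", speedup(fpga_hz, simulator_hz))
```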

High-level synthesis (HLS) tools also can target different process technologies and clock speed targets without changing the high-level source. Designers using an FPGA as a stepping stone to an ASIC implementation can exploit this capability to develop an FPGA prototype that runs as fast as possible. Subsequently, they will be able to automatically create ASIC RTL code optimized for the timing characteristics of their chosen process and foundry.

In addition to using FPGAs as a stepping stone toward an ASIC implementation, Forte is seeing more users turning to the FPGA as the ultimate production target. “These designers require RTL code targeted for the specific FPGA that they plan to use, so they use our FPGA-specific high-level synthesis flow,” says Meredith. Cynthesizer provides a fully automated integration with the Xilinx and Altera place-and-route tools, making it easy to go from SystemC source to a running FPGA.

BUILDING AN ASIC-LIKE FLOW For most designers considering FPGAs as a verification vehicle, a cultural divide must be crossed. Writing well-crafted RTL code is the same no matter where it ends up being implemented, so design and coding styles change little. What does change, though, are the designers’ expectations.

“FPGA tools must get to a certain maturity level,” says Daniel Platzker, product line director for FPGA synthesis at Mentor Graphics (Fig. 1). “The ASIC flow is mature and designers are used to having everything required to eliminate respins, which are devastating. So the FPGA flow had to mature in similar fashion with much more simulation and verification.”

FPGA flows are maturing in the area of physical synthesis, which endows the synthesis flow with built-in knowledge of the target FPGA’s architecture. Physical synthesis, a synthesis run that also completes placement, gives designers a much better handle on timing closure. Interconnect delays are the dominant factor in the timing of critical paths, and these delays cannot be predicted reliably until placement is complete.
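To see why placement matters so much, consider a simplified delay model in which a path’s delay is the sum of its cell delays plus a routing delay proportional to the Manhattan distance between connected cells. The Python sketch below uses that model. The cell names, coordinates, and delay constants are invented for illustration, and production tools rely on far more detailed, device-specific routing models.

```python
def path_delay(path, placement, cell_delay_ns, ns_per_unit):
    """Estimate path delay: cell delays plus distance-based routing delay.

    path          -- ordered list of cell names along the timing path
    placement     -- cell name -> (x, y) location on the device grid
    cell_delay_ns -- cell name -> intrinsic delay in nanoseconds
    ns_per_unit   -- assumed routing delay per grid unit of distance
    """
    total = sum(cell_delay_ns[cell] for cell in path)
    for src, dst in zip(path, path[1:]):
        (x0, y0), (x1, y1) = placement[src], placement[dst]
        total += ns_per_unit * (abs(x1 - x0) + abs(y1 - y0))
    return total


if __name__ == "__main__":
    path = ["ff_a", "lut", "ff_b"]
    cells = {"ff_a": 0.5, "lut": 0.5, "ff_b": 0.5}

    # Same netlist, two hypothetical placements.
    tight = {"ff_a": (0, 0), "lut": (1, 0), "ff_b": (1, 1)}
    spread = {"ff_a": (0, 0), "lut": (10, 0), "ff_b": (10, 12)}

    print("tight placement (ns):", path_delay(path, tight, cells, 0.25))
    print("spread placement (ns):", path_delay(path, spread, cells, 0.25))
```

The cell delays are identical in both cases; only the placement differs, yet the estimated path delay changes severalfold.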

One example of a physical synthesis tool, the Synopsys Synplify Premier, employs a graph-based placement engine to first map the design to the FPGA with global placement and routing and then optimize with detailed placement and local routing. Final routing is left to the FPGA vendor’s router.

“The on-chip resources you choose are what makes a huge difference in timing,” says Angela Sutton, staff product marketing manager for FPGA implementation products in the Synopsys Synplicity Business Group. “For example, there are some DSP resources on the Virtex 5. If you choose a particular resource in one iteration and another in the next, that will have a huge effect on timing. Locking that down is key in consistent timing estimates.”

Other vendors take a slightly different tack with physical synthesis. For instance, Mentor Graphics prefers to term it “physical-aware synthesis,” says Platzker.

“It’s clear that load delays and cell delays are not enough to achieve timing closure as you move to higher speeds and smaller geometries, where routing delay is sometimes more than the cell delay itself,” Platzker says.

Mentor’s approach, as embodied in its Precision RTL Plus product, is to account for the FPGA’s routing resources. Then the tool tries out placement options and estimates the resulting delays. “This lets you more accurately estimate where the critical paths are and lets you focus on optimization,” says Platzker.

Precision RTL Plus ultimately produces an optimized netlist that the FPGA vendor’s placement and routing tools take in. “We don’t dictate restrictions to place and route itself,” says Platzker. “We don’t believe synthesis should limit the capacity or ability of the place-and-route tool to make its decisions. In this way, we are complementary to the place and route being done by FPGA vendor tools.”

VENDOR-INDEPENDENT IP To augment its Precision Synthesis flow, Mentor Graphics’ just-announced Precise-IP program can be used to create vendor-independent IP blocks. “We provide an inventory of basic IP, including things like memory, RAM, ROM, arithmetic functions, multipliers, and more,” says Platzker. This IP is completely configurable and parameterizable. “It’s done in wizard style in the accompanying application,” says Platzker.

The second element of Mentor’s Precise-IP Partner Program concerns partnerships with leading third-party IP providers. These vendors provide a comprehensive catalog of complex cores (such as processors, interface controllers, and application-specific cores) that are available for multiple FPGA families. ARC, ARM, Aeroflex Gaisler, CAST, Eureka Technology, Helion, IPextreme, Innovative Logic, and OptNgn all participate in the program.

HARDWARE FOR VERIFICATION As mentioned earlier, many design teams invest the time and resources into building custom FPGA-based prototyping boards for use in hardware/software co-verification. But a plethora of commercially available solutions exists on the market for that purpose.

One relative newcomer, GateRocket, offers a platform for dealing with burgeoning FPGA complexity. “We’re seeing faster and larger FPGAs, and that means longer runtimes for simulation, debug, synthesis, and place and route,” says Dave Orecchio, GateRocket’s president and CEO. “Verification tools aren’t keeping pace with the silicon.”

GateRocket’s answer is to effectively bring the FPGA directly into the simulation environment. Its flagship product, the RocketDrive, is a disk-drive-like peripheral that plugs into any 5.25-in. drive bay in a Linux box. It acts like a simulator turbocharger, using the exact FPGAs targeted by the designer.

“Users run either full or partial designs through synthesis and load them into the RocketDrive. The testbench runs on their normal simulator, which serves as the user interface,” says Orecchio.

The result is an extremely accurate simulation that runs up to 10 times faster. And because the design is running on native hardware, a RocketDrive-based verification methodology directly discovers and diagnoses bugs.

“We can directly compare, block by block, between the silicon and the simulation,” says Orecchio. “We can tell you exactly where the differences are. And because our flow uses the customer’s place-and-route tools, if there are issues with functionally mapping the design onto the FPGA, we’ll expose and identify them early.”
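The general idea of a block-by-block comparison can be sketched in a few lines of Python. This is not GateRocket’s implementation. It is a minimal illustration that assumes each block’s outputs have been captured as a per-cycle list of values from both the simulator and the hardware. The block names and trace data are hypothetical.

```python
def first_mismatches(sim_trace, hw_trace):
    """Return {block: first cycle where simulation and hardware differ}.

    Both arguments map a block name to a list of per-cycle output values.
    Blocks whose traces match completely are omitted from the result.
    """
    mismatches = {}
    for block, sim_values in sim_trace.items():
        hw_values = hw_trace.get(block, [])
        for cycle, (sim_value, hw_value) in enumerate(zip(sim_values, hw_values)):
            if sim_value != hw_value:
                mismatches[block] = cycle
                break
        else:
            # No value differed; flag traces of unequal length at the
            # first cycle where one of them runs out.
            if len(sim_values) != len(hw_values):
                mismatches[block] = min(len(sim_values), len(hw_values))
    return mismatches


if __name__ == "__main__":
    sim = {"fifo": [0, 1, 2, 3], "alu": [5, 6, 7]}
    hw = {"fifo": [0, 1, 9, 3], "alu": [5, 6, 7]}
    print(first_mismatches(sim, hw))
```

Reporting the first diverging cycle per block narrows the search to a specific block and point in time, which is the practical benefit Orecchio describes.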

RocketVision software, a debug package that gives simulators visibility into the FPGA hardware, augments the RocketDrive. Released at the Design Automation Conference (DAC) in July, version 4.0 of RocketVision automatically configures the verification environment. It also adds dynamic block selection, eliminating the need to rebuild the FPGA during debug—users can dynamically select which blocks run in the FPGA and which run in the simulator. DAC also saw the release of a RocketDrive version based on Altera’s Stratix IV FPGAs.

Another type of FPGA-based verification environment can be created with products like those from EVE. The company recently extended its ZeBu line of FPGA-based verification environments with ZeBu-Server, a scalable emulation system that can handle up to 1 billion ASIC-equivalent gates. It can be used as a multi-user, multimode accelerator/emulator with typical performance of 10 MHz on a 40-million-gate design. And as FPGA design projects go, ZeBu-Server was, in itself, a whopper (see “You’re Using How Many FPGAs?”).

One might well ask where a GateRocket RocketDrive ends and a ZeBu-Server begins. Certainly, a disparity is evident in terms of the sizes of the designs these solutions can handle. But there’s also a disparity regarding the implementation targets of these systems. Because it relies on native hardware, the RocketDrive is more suited as a purely FPGA-oriented appliance. “We can handle IP blocks from the FPGA vendors in addition to generic synthesizable RTL,” says GateRocket’s Orecchio.

EVE’s systems aim more toward generic synthesizable RTL, says Ron Choi, EVE’s product marketing director. “Because our goal is to be agnostic in terms of final implementation, we do base everything on the register-transfer level,” he says. “But the flip side of that is because we run synthesizable RTL, it’s more about the functionality. We don’t care if you head off into an ASIC flow or an FPGA flow.”

THE IMPLEMENTATION FLOWS Once your design is functionally verified, implementation is a matter for the FPGA vendors’ back-end tools. These vendors work closely with the EDA houses so their tools integrate smoothly with the front-end flows provided by companies such as Mentor Graphics and Synopsys. Moreover, the FPGA vendors all put a great deal of development resources into their tools.

A case in point is Altera and its current release of the Quartus II v.9.0 software. An important feature in Quartus II is its incremental compilation function, which the company has continually improved since its initial introduction. Incremental compilation enables two different kinds of flows: top down and bottom up.

In either case, partitions are created within the design, according to Jordon Inkeles, Altera’s senior manager for software product marketing. “Partitioning is similar to what an architect does on a white board at the outset of a design project, dividing up the design among individuals by functions,” says Inkeles.

In the top-down flow, the design is divided into blocks, such as A, B, and C (Fig. 2). The entire design stays within the same Quartus project. When a change is made to partition A, the team can lock partitions B and C, maintaining timing in those partitions and reducing overall compilation time by up to 70%.
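The principle behind incremental compilation, reusing results for partitions that have not changed, can be illustrated with a short Python sketch. It does not represent how Quartus II tracks changes internally. It simply assumes that each partition’s source is fingerprinted with a hash and that only partitions whose fingerprint differs from the previous build need recompiling. The partition names and source strings are hypothetical.

```python
import hashlib


def fingerprint(source_text):
    """Return a stable content hash for a partition's source."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def partitions_to_recompile(sources, previous_fingerprints):
    """Return the sorted names of partitions changed since the last build.

    sources               -- partition name -> current source text
    previous_fingerprints -- partition name -> hash saved from the last build
    """
    return sorted(
        name
        for name, text in sources.items()
        if previous_fingerprints.get(name) != fingerprint(text)
    )


if __name__ == "__main__":
    last_build = {"A": "module a v1", "B": "module b v1", "C": "module c v1"}
    saved = {name: fingerprint(text) for name, text in last_build.items()}

    # Only partition A is edited; B and C stay locked and are reused.
    current = dict(last_build, A="module a v2")
    print(partitions_to_recompile(current, saved))
```

Skipping the unchanged partitions is where the compile-time savings come from, and leaving their earlier results untouched is what preserves their timing.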

The bottom-up flow is a team-based variation. In this case, the partitions are divided into separate Quartus projects. “Partitioning for a bottom-up flow would be done by the system architect up front inside Quartus,” says Inkeles. Once the partitions are within separate Quartus projects, different individuals or teams can work on them in parallel and in different locations.

Similarly, Xilinx has made progressive improvements to its tool flow. “From a tool standpoint, we looked at things like language support,” says Bruce Fienberg, Xilinx’s senior group communications manager for products and solutions. “We had to make sure the language support designers had in ASIC tools was the same here. We revamped the tools so that we’re at parity with the rest of the industry.”

In Xilinx’s most recent tool release, ISE 11.1, the company rethought its approach to customers as well as how it delivers its technology. “We’ve said for years that FPGAs are good for embedded applications and DSP. But these designers tend to not be FPGA people, so the goal is to make the technology easier for them to access,” says Fienberg.

Xilinx came up with four interoperable and domain-specific design flows and tool configurations for logic, DSP, embedded processing, and system-level design (Fig. 3). “Think of a pyramid, a base platform which is the tools, the IP, and the silicon for basic logic design. The next level is the domain-specific platforms. There we add reference designs and kits for those various domains,” Fienberg continues.

“As we move toward 2010, we’ll deliver market-specific targeted design platforms in which we build on a base and domain to create very application-specific solutions. Examples might include a video development kit or industrial-control development kit. The idea is to not assume that because someone is an EE that they can design an FPGA,” says Fienberg.

NEW KID ON THE BLOCK Every so often, new FPGA vendors appear with silicon that attempts to put a twist on what’s come before. An example is Achronix Semiconductor Corp., which touts its FPGAs as the world’s fastest.

“Traditional FPGAs have decent speed at the I/Os but fall short in speed of the fabric,” says Yousef Khalilollahi, Achronix’s vice president of worldwide sales and marketing. “This means that customers have to compromise performance inside the device. Our value proposition is not only high-speed I/O, but we keep the speed throughout the fabric of the FPGA. To quantify that, our devices achieve up to 1.5-GHz peak performance.”

To complement its silicon, like other FPGA vendors, Achronix offers its Achronix CAD Environment (ACE) suite of software for implementation. “The design flow that customers use for our FPGAs is the same as for any other device,” says Khalilollahi. “We use standard Verilog and VHDL as design entry. That code is run through synthesis, which is shipped by us and is either Synopsys’ Synplify Pro or Mentor Graphics’ Precision Synthesis.”

The output of the synthesis tool is entered into the ACE back-end tools. Within the ACE environment are five different views of a design project. The Projects View is the default view with a hierarchical view of the workspace. Here, users create projects and can view netlists, constraint files, and IP.

A second view is IP Configuration, which automatically generates complex IP and creates the RTL as users specify, as well as generates the associated constraint files. A third view, Physical Layout, provides a graphical view of the device’s physical layout, allowing users to visualize place-and-route data as well as critical paths. Here, cross-probing is enabled between timing analysis and layout.

With the Package Viewer, users can visualize the device’s package and assign pins. They’re able to see banks of I/Os and their associations and ensure that buses are grouped together in a way that makes sense. The fifth and final view, Debugging, comes into play after the user programs an FPGA with a bitstream. In this view, the user can set up logic within the design with which to capture a trace buffer.