FPGAs Bring Reconfigurable Computing to Embedded Systems

While working to meet the demand for computational cycles in high-performance applications (e.g., signal and image data-stream processing, defence, and aerospace COTS), vendors must also confront space, weight, and power (SWaP) limitations. These competing design constraints have increased interest in field-programmable gate arrays (FPGAs), which offer dense computational power and support for reconfigurable computing.

Previously, the strengths of FPGA devices, namely their flexibility and complexity, were also perceived as barriers to their adoption in open-system VME, CompactPCI, and VPX board platforms. In addition to cost issues, FPGAs were perceived as difficult to integrate and test effectively. Instead, many designers either opted to sacrifice system performance and features or turned to custom ASICs, which are less flexible and typically have longer design cycles. In recent years, though, improvements in FPGA devices, including larger gate counts and better software tools for development and integration, have helped to increase the popularity of FPGAs for a wider range of solutions.

In the embedded defence and aerospace market, the computational demands of many applications (e.g., radar and signal intelligence) are growing faster than processor performance. FPGAs fit well into these high-performance processing applications thanks to their inherent parallelism, which is suited to repetitive algorithmic processing on received data streams. When used as the front end of complex sensor systems, FPGAs can significantly reduce slot count while providing parallel FFT, filtering, and decimation processing of incoming data streams derived from as many as hundreds of input channels.
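As a rough illustration of the front-end workload described above, the following Python sketch models filtering and decimation across independent channels. It is a behavioural model only, not FPGA code; the function names `fir_decimate` and `process_channels` are hypothetical and chosen for this example.

```python
def fir_decimate(samples, taps, factor):
    """Apply an FIR filter and keep only every `factor`-th output sample."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    # Only the retained outputs are computed; the outputs that decimation
    # would discard are never evaluated.
    for n in range(0, len(samples), factor):
        acc = 0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]
        out.append(acc)
    return out


def process_channels(channels, taps, factor):
    """Run the same filter on every channel.

    Each channel is independent of the others. This loop is sequential,
    but the same structure could be replicated side by side in hardware.
    """
    return [fir_decimate(ch, taps, factor) for ch in channels]
```

Because no channel depends on another, the per-channel work maps naturally onto replicated logic, which is the kind of parallelism the paragraph above refers to.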

An example of an advanced new FPGA is the Xilinx Virtex-5 (Fig. 1), which is manufactured in 65nm foundries using triple-gate oxide transistors to form the FPGA fabric. This design reduces transistor leakage current and the FPGA's overall static power dissipation. The 65nm process node also cuts node capacitance, which in turn reduces dynamic power consumption in conjunction with the FPGA's 1.0V core voltage.

The Virtex-5's ExpressFabric architecture has an enhanced LUT structure, moving from 4-input to 6-input LUTs, to support more compact designs. The FPGA's DSP48E DSP blocks were expanded with 25 x 18-bit multipliers to enhance the device's floating-point functionality. The DSP48E blocks can be pipelined or cascaded to increase the throughput of various filtering algorithms.
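To give a feel for what a 25 x 18-bit multiplier implies for filter arithmetic, the sketch below works out the width of a signed product and how many worst-case products fit in an accumulator. The 48-bit accumulator width is an assumption suggested by the "48" in the block's name rather than a figure stated in this article, and the helper names are hypothetical.

```python
A_BITS = 25    # multiplier input width, from the article
B_BITS = 18    # multiplier input width, from the article
ACC_BITS = 48  # ASSUMPTION: accumulator width, not stated in the article


def signed_range(bits):
    """Return (min, max) of a two's-complement integer of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def bits_for_signed(value):
    """Minimum two's-complement width that can hold `value`."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def worst_case_product_bits(a_bits, b_bits):
    """Width needed to hold any product of two signed operands."""
    a_min, a_max = signed_range(a_bits)
    b_min, b_max = signed_range(b_bits)
    extremes = [a_min * b_min, a_min * b_max, a_max * b_min, a_max * b_max]
    return max(bits_for_signed(v) for v in extremes)


def guaranteed_accumulations(a_bits, b_bits, acc_bits):
    """How many worst-case products can be summed without overflow."""
    a_min, _ = signed_range(a_bits)
    b_min, _ = signed_range(b_bits)
    largest = a_min * b_min  # most positive product
    _, acc_max = signed_range(acc_bits)
    return acc_max // largest
```

Under these assumptions, `worst_case_product_bits(25, 18)` returns 43 and `guaranteed_accumulations(25, 18, 48)` returns 63, so a cascaded filter with more taps than that would need scaled coefficients or data to rule out overflow in the worst case.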

The Virtex-5 also features enhanced clock control and management. A PLL was added to the clock-management-tile (CMT) blocks of the Virtex-5, so that each CMT provides phase-shift and frequency-synthesis control from the DLL and jitter reduction from the PLL. The CMT can generate clocks up to 550MHz to drive the logic and memory in the Virtex-5 devices. The Virtex-5 LXT version of the FPGA includes up to 24 high-speed, low-power serial I/O channels with data rates from 100Mbps to 3.2Gbps, and supports many high-speed serial I/O standards. In addition to supporting soft cores for Aurora or RapidIO communications, the LXT contains dedicated hardware blocks for Gigabit Ethernet (GbE) and PCI Express (PCIe).


Unburdened by time or budget constraints, a system integrator might consider developing a custom FPGA-based solution that precisely meets the application's specific needs. Unfortunately, time-to-market and time-to-deployment requirements often preclude the custom option.

COTS solutions offer a faster, more cost-effective approach than a fully custom FPGA solution. For embedded-system integrators considering the use of FPGAs for algorithm processing, there are some basic considerations, such as whether the algorithm can be efficiently implemented on an FPGA and what system-level savings can be achieved by moving from general-purpose processors to FPGAs.

Other design issues that should be considered include how data will flow into and out of the FPGA, and how the FPGA processing will integrate and coordinate with the larger system.

To identify subsystem components that would improve significantly if implemented on an FPGA, integrators should consider whether their algorithm provides significant opportunities for parallelisation. They should also consider whether the algorithm's calculations can be done with fixed-point mathematics.
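One way to judge whether an algorithm tolerates fixed-point mathematics is to model it in software first. The sketch below uses the common Q15 format (1 sign bit, 15 fractional bits) as an assumed example; the format choice and all function names are illustrative and not drawn from any particular toolkit.

```python
Q = 15
SCALE = 1 << Q  # 32768; Q15 represents values in [-1.0, 1.0)


def to_q15(x):
    """Convert a float to Q15, saturating at the representable limits."""
    v = int(round(x * SCALE))
    return max(-SCALE, min(SCALE - 1, v))


def from_q15(v):
    """Convert a Q15 integer back to a float."""
    return v / SCALE


def q15_mul(a, b):
    """Multiply two Q15 values; the Q30 product is rounded back to Q15."""
    return (a * b + (1 << (Q - 1))) >> Q


def q15_dot(xs, ws):
    """Dot product with a wide accumulator and a single final rounding."""
    acc = 0  # holds Q30 partial sums without intermediate rounding
    for x, w in zip(xs, ws):
        acc += x * w
    return (acc + (1 << (Q - 1))) >> Q
```

Comparing the output of such a model against a floating-point reference shows how much precision is lost, which helps decide whether a given kernel is a good candidate for integer multiply-accumulate hardware.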


Making COTS FPGA boards even more attractive is the development of toolkits that tie together interprocessor middleware, PCI interfaces, and DDR SDRAM controllers to support the FPGA's on-chip processors. Vendors are investing heavily in providing greatly extended IP and software modules.

Today, thanks to these toolkits, new FPGA-based boards such as Curtiss-Wright's dual Virtex-5-based CHAMP-FX2 VPX engine (Fig. 2) make it easier to optimise algorithms and integrate FPGA engines with SBCs and high-performance multiprocessor computers. The toolkits now frequently match the performance levels of time-consuming, handcrafted designs straight out of the box, which results in much faster time-to-deployment for military application developers.

A hardware-specific development toolkit is a critical element of a COTS solution. It can dramatically ease and speed the integration of the application algorithm onto the board hardware and, afterward, facilitate the process of integrating the board into a larger multicomputer system. This is particularly important for accelerating the application algorithm, since there's usually general-purpose or DSP-based processing required before and after the FPGA-based processing.

FPGA toolkits are designed from the start to integrate with IP components. COTS board suppliers typically provide a toolkit consisting of software drivers and libraries, IP blocks, and a simulation testbench. While third-party IP blocks are available from chip vendors, many of these blocks may not be optimised for the particular hardware that the system designer needs to target. Complicating the integration of IP blocks further, there currently aren't any de facto standards for interfaces.

As a result, integrating off-the-shelf blocks with custom user code to achieve the application's performance goals can be a difficult task. To mitigate this challenge, some COTS vendors are providing optimised IP blocks with their development toolkits. For example, Curtiss-Wright's Continuum FXtools kit provides highly optimised IP blocks for I/O and common interblock interfaces. Because COTS FPGA boards are frequently deployed in harsh environments, it's critical to qualify the supporting IP blocks on the hardware across the same temperature range for which the host board was designed. This ensures that the optimised, qualified IP blocks will perform and survive across extended temperature ranges (ranges of -40°C to 85°C aren't uncommon in military systems).


Another improvement in today's FPGAs is their use of high-speed serial ports to connect the FPGA to a serial switching fabric, such as RapidIO. Examples of FPGAs that support high-speed serial ports include Xilinx's Virtex-II Pro, Virtex-4, and Virtex-5. The CHAMP-FX2 interconnects its two on-board Virtex-5 FPGAs to a 4-lane port on its on-board Serial RapidIO (SRIO) switch. The SRIO switch also features four 4-lane SRIO ports, which are routed to the backplane to form the switch fabric. Each of these ports can provide up to 2.5 Gbytes/s of bidirectional bandwidth.
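The 2.5 Gbytes/s figure is consistent with a simple back-of-envelope calculation, sketched below. The 3.125 Gbaud lane rate and the 8b/10b encoding efficiency are assumptions based on common Serial RapidIO practice and are not stated in this article; packet and protocol overhead are ignored, and the function name is hypothetical.

```python
def srio_port_bandwidth(lanes, gbaud_per_lane, encoding_efficiency=0.8):
    """Return (per-direction, bidirectional) raw data bandwidth in Gbytes/s.

    encoding_efficiency=0.8 models 8b/10b line coding (ASSUMPTION).
    Packet headers and flow-control overhead are not modelled.
    """
    gbits_per_direction = lanes * gbaud_per_lane * encoding_efficiency
    gbytes_per_direction = gbits_per_direction / 8
    return gbytes_per_direction, 2 * gbytes_per_direction


# ASSUMPTION: 4 lanes at 3.125 Gbaud each, 8b/10b encoded.
one_way, both_ways = srio_port_bandwidth(lanes=4, gbaud_per_lane=3.125)
```

With those inputs the port carries about 1.25 Gbytes/s in each direction, or about 2.5 Gbytes/s counting both directions. Sustained application throughput would be lower once protocol overhead is included.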

The high-speed serial data paths enable the board to interconnect seamlessly to other RapidIO-enabled VPX hardware, such as the company's CHAMP-AV6 quad PowerPC DSP engine, or the VPX6-185 Single-Board Computer (SBC). Because the FPGA is physically integrated into the SRIO switched fabric, the bidirectional nature of the fabric can be harnessed to stream data into and out of the FPGA.


After the system integrator has combined the application with the vendor-supplied blocks and completed a satisfactory simulation, the next integration step is to establish interprocessor communications and application command and control. Compared to a general-purpose processor like a PowerPC, FPGAs lack processing resources useful for commanding data movements, setting processing modes within the FPGA, and other important command and control functions.

To address these limitations, a system integrator can use external processing resources. One essential external resource is a rich library of fairly high-level command functions. Setting registers and creating control structures by hand, such as those used by advanced DMA engines, is tedious and error-prone. A support library should give the developer easy-to-use functions for complex DMA command control, synchronisation, and other system-level tasks.
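As a minimal sketch of the kind of helper such a library might offer, the following Python model splits one large transfer into a chain of descriptors so the developer never fills in the control structures by hand. Everything here is hypothetical, including the `DmaDescriptor` layout, the `build_dma_chain` function, and the 4096-byte block limit; real DMA engines define their own descriptor formats.

```python
from dataclasses import dataclass
from typing import List

MAX_BLOCK = 4096  # HYPOTHETICAL per-descriptor byte limit


@dataclass
class DmaDescriptor:
    """HYPOTHETICAL descriptor: one contiguous block of a transfer."""
    src: int
    dst: int
    length: int
    last: bool = False  # marks the end of the chain


def build_dma_chain(src, dst, length, max_block=MAX_BLOCK) -> List[DmaDescriptor]:
    """Split a transfer of `length` bytes into a chain of descriptors."""
    if length <= 0:
        raise ValueError("length must be positive")
    if max_block <= 0:
        raise ValueError("max_block must be positive")
    chain = []
    offset = 0
    while offset < length:
        chunk = min(max_block, length - offset)
        chain.append(DmaDescriptor(src + offset, dst + offset, chunk))
        offset += chunk
    chain[-1].last = True  # only the final descriptor ends the chain
    return chain


# A 10000-byte transfer becomes three descriptors: 4096, 4096, and 1808 bytes.
chain = build_dma_chain(src=0x1000, dst=0x8000, length=10000)
```

The application makes one call and receives a consistent chain, which removes the off-by-one and missing-terminator mistakes that hand-built structures tend to invite.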

Another good way to reduce integration time is to use a high-level interprocessor-communications middleware solution designed to support the FPGA. The middleware simplifies integration by managing mundane tasks like memory mapping, DMA engine setup, and interrupt service routines, while providing a simple application-level API.

An example of an interprocessor communications solution for use on a COTS FPGA board is Curtiss-Wright's Continuum IPC software. The Continuum IPC middleware enables FPGA resources, such as buffers, semaphores, and DMA commands, to be abstracted into named objects in the same way they would be on a PowerPC, thus simplifying the data movement coding effort. Another benefit is that during the system tuning stage, the processing and buffers can be moved around the system without needing to be recoded, because the middleware automatically resolves the new locations. This makes the FPGA much more of a peer processing element in the system, which provides clear benefits when using FPGAs as algorithm accelerators.
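The named-object idea can be illustrated with a small model. This is not the Continuum IPC API; the `NamedObjectRegistry` class, the `Location` record, and `plan_transfer` are hypothetical names used only to show why application code that refers to resources by name survives a relocation unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where a named buffer currently lives (HYPOTHETICAL model)."""
    node: str      # e.g. an FPGA or a processor in the system
    address: int
    size: int


class NamedObjectRegistry:
    """HYPOTHETICAL registry mapping object names to locations."""

    def __init__(self):
        self._objects = {}

    def publish(self, name, node, address, size):
        """Register a buffer, or re-register it at a new location."""
        self._objects[name] = Location(node, address, size)

    def resolve(self, name):
        """Look up the current location of a named buffer."""
        try:
            return self._objects[name]
        except KeyError:
            raise KeyError(f"no object named {name!r}") from None


def plan_transfer(registry, src_name, dst_name):
    """Application code: refers to buffers by name only."""
    src = registry.resolve(src_name)
    dst = registry.resolve(dst_name)
    if src.size > dst.size:
        raise ValueError("destination buffer too small")
    return (src.node, src.address, dst.node, dst.address, src.size)
```

If a buffer is later published on a different node during system tuning, `plan_transfer` is called exactly as before and simply picks up the new location, which is the behaviour the paragraph above attributes to the middleware.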
