Mixing microprocessors and FPGAs is a natural way to blend programmability with high-performance computing. In the past, these devices typically were combined with the microcontroller or microprocessor in one chip and the FPGA in another. These days, the FPGA is more likely than not to contain one or more hard-core or soft-core processors, bringing programmable control closer to the custom logic of the FPGA fabric.
In general, the FPGA fabric consists of a regular collection of lookup table (LUT) blocks, storage blocks, clocks, and an interconnect. In addition, interface ports are attached to the interconnect. Hard logic can be added to this mix, ranging from high-speed serializers-deserializers (SERDES) to interfaces such as PCI or PCI Express. FPGAs can be programmed to provide a range of standard interfaces such as serial peripheral interface (SPI), serial, and parallel ports.
FPGA programming interfaces differ depending on the technology. Flash-based FPGAs have on-board flash storage that retains the system configuration. RAM-based FPGAs rely on off-chip storage, typically a serial flash chip, whose contents are loaded into the FPGA’s configuration RAM when the system starts up.
These interfaces also allow an off-chip processor to communicate with and configure the FPGA. Dynamic reprogramming of an FPGA by an off-chip processor is possible, though normally an FPGA configuration will be fixed for a particular system. Still, loading a new configuration with a complete system reset is relatively easy if the new configuration file is available, and the latest crop of FPGAs supports partial reconfiguration (see “Climb On Board Next-Generation FPGAs” at electronicdesign.com).
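As a rough illustration of an off-chip processor loading a RAM-based FPGA, the following C sketch bit-bangs a serial configuration port. It is modeled loosely on the slave-serial modes many RAM-based FPGAs offer; the pin roles, the hypothetical `cfg_pins_t` hooks, the MSB-first bit order, and the polling limits are all assumptions. The real sequence and timing come from the vendor's configuration guide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical board-support hooks; how the pins are driven is board-specific. */
typedef struct {
    void (*set_prog)(bool level);   /* program/reset pin, pulsed low to clear */
    bool (*get_init)(void);         /* status pin: device ready for data */
    bool (*get_done)(void);         /* status pin: configuration complete */
    void (*set_clk)(bool level);    /* configuration clock */
    void (*set_data)(bool level);   /* serial configuration data */
} cfg_pins_t;

typedef enum { CFG_OK = 0, CFG_NOT_READY, CFG_NOT_DONE } cfg_status_t;

/* Shift a bitstream into the FPGA, MSB first, one bit per rising clock edge
 * (an assumed convention; check the device's configuration guide). */
cfg_status_t fpga_configure_serial(const cfg_pins_t *p,
                                   const uint8_t *bitstream, size_t len,
                                   unsigned max_polls)
{
    /* Pulse the program pin to clear the current configuration. */
    p->set_prog(false);
    p->set_prog(true);

    /* Wait until the device signals that it can accept data. */
    unsigned polls = 0;
    while (!p->get_init()) {
        if (++polls > max_polls)
            return CFG_NOT_READY;
    }

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            p->set_clk(false);
            p->set_data((bitstream[i] >> bit) & 1u);
            p->set_clk(true);           /* data sampled on the rising edge */
        }
    }

    /* Keep clocking until the device reports completion, or give up. */
    for (polls = 0; !p->get_done(); polls++) {
        if (polls > max_polls)
            return CFG_NOT_DONE;
        p->set_clk(false);
        p->set_clk(true);
    }
    p->set_clk(false);
    return CFG_OK;
}
```

Keeping the pin access behind function pointers lets the same routine run against GPIO registers on the board or against a mock during bring-up.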
Off-Chip And On-Chip
Having an off-chip processor use the FPGA’s hard or soft interfaces is well understood but potentially less efficient than placing the processor on the FPGA, since both the FPGA and the processor require additional interfaces. There is also the footprint issue. An off-chip processor solution requires at least one more chip than the FPGA alone. Higher-performance processors often need additional support chips, from memory to controllers, that add to a solution’s footprint.
Having a processor on-chip saves space. It also can reduce interface limitations and interface power requirements as well as provide a more intimate interaction between logic in the FPGA fabric and the processor. In the latter case, a special processor instruction might invoke an operation or access the results of computations or actions being handled by the fabric.
How this is done and the level of interaction differ based on whether the processor is a hard core or a soft core. FPGA hard-core logic has the advantage of efficiency, while soft-core logic, implemented in the FPGA fabric, offers flexibility. Hard-core processors typically match the architecture and efficiency of a standalone processor or microcontroller. Soft-core processors must be implemented using LUTs. It’s possible to take a standard processor architecture and implement it as a soft-core design, but that means force-fitting the design to the FPGA. This is one reason vendors offer cores designed specifically for FPGA deployment.
Xilinx’s MicroBlaze and PicoBlaze (see “Soft Processors Raise Performance Levels” at electronicdesign.com) as well as Altera’s NIOS (see “Stick It With NIOS II” at electronicdesign.com) are FPGA vendor-supplied soft cores. Software vendors even target these soft cores; for example, LynuxWorks offers BlueCat Linux Micro Edition for Xilinx’s MicroBlaze (see “RTOS Targets FPGA Soft Cores” at electronicdesign.com). Several standard processor architectures, such as Arm’s Cortex-M1, target FPGAs as well.
Hard-Core Processors and FPGAs
Intel’s E600C (Fig. 1) comprises two separate components, an Intel Atom and an Altera FPGA, in a single package (see “Configurable Platform Blends FPGA With Atom” at electronicdesign.com). Internally the two are primarily connected via a pair of PCI Express SERDES. I/O pins for both are exposed to the outside world with a few linked internally.
From a developer’s perspective, the E600C is for all intents and purposes the same two-chip design in a single package. This puts the E600C into a different class from FPGAs with embedded hard-core processors.
This is not to say that the Intel/Altera approach is not useful or viable. On the contrary, it’s likely the typical scenario found with higher-end processor/FPGA pairings where a Core i3/5/7 would be linked to one or more FPGAs.
In this case the PCI Express links would still be the primary connection between the FPGA and the processor. These links are quite effective in moving data between these devices. What they lack is the intimacy at the instruction level that an embedded solution would provide.
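On the host side, PCI Express access to FPGA registers usually goes through a memory-mapped base address register (BAR). The minimal sketch below assumes a Linux host where the device's BAR is exposed as a sysfs `resourceN` file that can be mapped with `mmap()`; production systems would normally use a kernel driver instead, and the device path and register offsets used with it are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    volatile uint32_t *regs;  /* start of the mapped BAR */
    size_t len;               /* bytes mapped */
} bar_map_t;

/* Map `len` bytes of a BAR exposed as a file, for example
 * /sys/bus/pci/devices/<domain:bus:dev.fn>/resource0 (path is illustrative). */
int bar_open(bar_map_t *m, const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    m->regs = (volatile uint32_t *)p;
    m->len = len;
    return 0;
}

/* 32-bit register accessors; offsets are in bytes from the BAR start. */
uint32_t bar_read32(const bar_map_t *m, size_t offset)
{
    return m->regs[offset / 4];
}

void bar_write32(bar_map_t *m, size_t offset, uint32_t value)
{
    m->regs[offset / 4] = value;
}

void bar_close(bar_map_t *m)
{
    munmap((void *)m->regs, m->len);
    m->regs = NULL;
}
```

Every access in this scheme crosses the PCI Express link, which is fine for bulk transfers and control registers but is still a bus transaction rather than an instruction-level interaction.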
Another platform that doesn’t exactly match the scenarios to follow is Cypress Semiconductor’s PSoC (see “PSoC: Almost An FPGA”). The PSoC 3 is based on an 8051 core, while the PSoC 5 is based on an Arm Cortex-M3.
The PSoC’s analog and digital field-programmable component array (FPCA) differs from an FPGA fabric in its low-level components, which are more functional but more rigid in terms of configurability than an FPGA LUT. The FPCA is designed to make the creation of custom peripherals easy, but the kinds of peripherals are modeled after standard microcontroller peripherals like serial ports and analog-to-digital converters (ADCs).
Targeting microcontroller peripherals is the primary area where the PSoC deviates from an FPGA with a hard-core or soft-core processor. In particular, the FPGA fabric can incorporate logic that operates completely independently of the onboard processor.
While that’s possible to a limited extent with the PSoC, the primary mode of operation is centered around the processor core. Designers are simply choosing from or designing custom peripherals that will be manipulated by an application.
In a sense, FPGAs with embedded cores mirror much of the PSoC in terms of design. More often than not, the programs running on these cores will view the FPGA logic as custom peripherals, albeit ones that can be significantly more complex than those found in the PSoC.
The hard cores found in many FPGAs also tend to be heftier than those found in the PSoC line. Xilinx still sells FPGAs such as the Virtex-4 with hard-core PowerPC cores, but the latest crop of FPGAs with a hard core tends to use Arm cores, with the Arm Cortex-M3 and Cortex-A9 being the most popular (see “Xilinx Unifies FPGA Line” at electronicdesign.com). The Cortex-A9 makes a suitable replacement for a PowerPC core, providing advanced memory management and computational capabilities.
Xilinx’s Zynq-7000 EPP (Fig. 2) is based on a dual-core Cortex-A9 (see “FPGA Packs In Dual Cortex-A9 Micro” at electronicdesign.com). The primary difference between the Virtex-4-based solutions and the Zynq-7000 EPP is the hard logic surrounding the Cortex-A9 cores.
Like the Virtex-4 PowerPC cores, the Cortex-A9 has a standard bus interface to the FPGA fabric. The difference is that the Zynq-7000 EPP also has a complete complement of peripherals and memory controllers, which provide access to off-chip memory. Essentially, the Zynq-7000 EPP is a dual-Cortex-A9 microprocessor with an intimate link to an FPGA fabric.
This is important for two reasons. First, the programmer can view the processing platform as a self-contained, self-sufficient system. In fact, the microprocessor could be used by itself. That does defeat the purpose of the chip, but it simplifies the view from a software perspective. Second, it allows a significant amount of debugging to be done on the software side, which is more dynamic than the FPGA fabric; the fabric is typically configured once (or once per debugging session).
It’s possible and desirable to add diagnostic logic to an FPGA application, and that logic typically exposes internal information to an application running on a processor. Having that processor on-chip simplifies this design task. Xilinx’s platform blends a conventional multicore processor design with an FPGA.
The AMBA (Advanced Microcontroller Bus Architecture) AXI (Advanced eXtensible Interface) interconnect is a standard for Arm-based designs, making FPGA peripheral design easier as well as making the connection to the processors an almost trivial exercise compared to typical FPGA application logic design.
Microsemi SoC Products Group’s (formerly Actel) SmartFusion (Fig. 3) is similar to Xilinx’s Zynq-7000 in that the SmartFusion’s Cortex-M3 core is surrounded by a standard set of hard peripherals (see “FPGA Combines Hard-Core Cortex-M3 And Analog Peripherals” at electronicdesign.com). The processor is linked to a standard FPGA fabric via a standard bus interface, so FPGA-based peripherals are easily accessed.
The big difference between the SmartFusion and other hard-core FPGAs is the SmartFusion’s analog subsystem. In a sense, this is just a very fancy analog peripheral, but this subsystem is capable of semi-independent operation. It’s also accessible by the FPGA logic, allowing for very complex designs.
It’s unlikely that FPGAs with hard-core processors and peripherals will follow the microcontroller route of offering a plethora of SKUs. That would be counterintuitive, since the FPGA fabric can already supply much of the peripheral variety that distinguishes one microcontroller SKU from another. Still, hard-core logic has significant advantages over FPGA implementations, as already noted.
One advantage that an off-chip processor has over a combined solution is power management. Hard-core processors on FPGAs have power advantages over soft-core processors, but normally the processor, peripherals, and FPGA will all be running whereas an off-chip solution could, in theory, power down the FPGA as well as most of its own components.
This can be very desirable in many applications, especially battery or mobile applications. FPGAs may provide more modular power-management solutions in the future, but for now most designs will have to account for the full FPGA fabric power requirements along with the hard-core logic.
As with most microprocessors and microcontrollers, a designer will use most, but not all, of the processing power and peripherals within the system. This is one area where soft-core processors have an advantage, since they can usually be tuned to some degree, such as by specifying the amount of cache used in the design.
Soft-Core Processors and FPGAs
FPGAs incorporating soft-core processors have many advantages over their hard-core counterparts, including the ability to incorporate any number and mix of cores, limited only by the resources provided by the target chip.
First, the core selection is significantly larger. Each major FPGA vendor has its own soft core or cores tailored to its hardware, since not all FPGAs are created equal. But there’s also a host of standard soft cores designed for FPGA implementation that work with almost any FPGA.
The Arm Cortex-M1 and the Freescale V1 ColdFire can be used with a range of FPGAs (see “Cold, Dense, And Gratis MCU Core Targets FPGAs” at electronicdesign.com). Of course, soft cores are limited by the size of an FPGA since there must be sufficient room for the core as well as any supporting logic.
Second, any number of soft cores can be incorporated into a single FPGA limited only by the cores, their required peripherals, and headroom needed for the application. Typically the cores are configured in an asymmetrical multiprocessing (AMP) configuration with cores often being tailored to a specific task.
Having a common core makes programming easier, but having different cores is just as easy from a hardware design perspective. Symmetrical multiprocessing (SMP) and non-uniform memory access (NUMA) systems are possible but actually more difficult to design from a hardware perspective.
Third, the designer has control over the placement and connections between the cores and the rest of the FPGA fabric. This can affect layout and operational efficiency. For instance, a pipeline AMP architecture might be best implemented with cores interspersed between communication and processing logic for data passed from one core to another. Hard-core FPGAs force designers to start with the existing placement of the hard core and its interfaces.
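Data passed between cores in such a pipeline often moves through a small shared buffer or a hardware FIFO. The C sketch below shows a single-producer, single-consumer ring buffer that two adjacent pipeline cores could share. It assumes the shared memory is either uncached or coherent between the two cores and that the toolchain supports C11 atomics, which is not true of every small soft core; the slot count is an arbitrary choice.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u   /* must be a power of two */

/* Lives in memory visible to both cores. Indices run freely and wrap
 * naturally because RING_SLOTS divides 2^32. */
typedef struct {
    _Atomic uint32_t head;          /* written only by the producer core */
    _Atomic uint32_t tail;          /* written only by the consumer core */
    uint32_t slot[RING_SLOTS];
} spsc_ring_t;

/* Producer side: returns false if the ring is full. */
bool ring_push(spsc_ring_t *r, uint32_t v)
{
    uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h - t == RING_SLOTS)
        return false;
    r->slot[h & (RING_SLOTS - 1)] = v;
    /* Release: the slot write becomes visible before the new head. */
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
bool ring_pop(spsc_ring_t *r, uint32_t *v)
{
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (h == t)
        return false;
    *v = r->slot[t & (RING_SLOTS - 1)];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return true;
}
```

Because each index has exactly one writer, no locks are needed; in a soft-core design the same role is often played by a hardware FIFO placed between the two cores.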
Fourth, soft cores can be mixed with hard cores. This will be application-specific as with any multicore design, but it allows designers to take advantage of hard cores while adding the power of additional cores. These days, the hard and soft cores may even use the same instruction set. Operating systems and runtime libraries can often hide any minor hardware differences.
Finally, soft cores can be configured, affecting their size and performance. The selections available to a designer can change these factors significantly. For example, storage often utilizes a significant amount of space on the FPGA, so using small caches can reduce a core’s footprint. Increasing cache size or the number of registers can increase performance.
Another example is the type of interrupt handling methodology that could be employed. A single global interrupt is at one end of the spectrum, while a priority-based, dedicated register-swapping approach is at the other. Of course, the dedicated register swapping requires significantly more hardware resources and programming skills.
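At the simple end of that spectrum, one interrupt line reaches the core and software works out which source fired. The sketch below assumes a hypothetical fabric interrupt controller with pending, enable, and write-1-to-clear acknowledge registers, and it dispatches in a fixed software priority order (bit 0 first). A dedicated register-swapping scheme is handled largely in hardware and has no real counterpart in portable C.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_IRQ 32

typedef void (*irq_handler_t)(void *ctx);

/* Hypothetical register layout for a simple fabric interrupt controller. */
typedef struct {
    volatile uint32_t pending;  /* bit n set: source n requests service */
    volatile uint32_t enable;   /* bit n set: source n may interrupt the core */
    volatile uint32_t ack;      /* write 1 to bit n to clear pending bit n */
} irq_ctrl_regs_t;

static irq_handler_t handlers[NUM_IRQ];
static void *handler_ctx[NUM_IRQ];

void irq_register(irq_ctrl_regs_t *ic, unsigned n, irq_handler_t fn, void *ctx)
{
    handlers[n] = fn;
    handler_ctx[n] = ctx;
    ic->enable |= 1u << n;
}

/* The core's single interrupt entry point. It takes one snapshot of the
 * enabled, pending sources and services them lowest bit first. Sources
 * that become pending during this pass are assumed to keep the interrupt
 * line asserted, so they are handled on the next entry. */
void irq_global_isr(irq_ctrl_regs_t *ic)
{
    uint32_t active = ic->pending & ic->enable;
    for (unsigned n = 0; n < NUM_IRQ; n++) {
        uint32_t bit = 1u << n;
        if (active & bit) {
            ic->ack = bit;                  /* acknowledge before servicing */
            if (handlers[n])
                handlers[n](handler_ctx[n]);
        }
    }
}
```

The cost of this approach is the software search on every interrupt; the benefit is that the hardware side is only a few registers.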
The number of options can be mind-boggling, but the result can be a system that is tuned to the application and fits in just the right size FPGA.
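Whatever options are chosen, the software has to track them. One common arrangement is a generated header of core parameters that driver code tests at compile time. The macro names `CPU_DCACHE_SIZE` and `CPU_DCACHE_LINE_SIZE` and the `dcache_flush_line()` primitive below are hypothetical stand-ins for whatever a particular vendor's tools generate; the point is that the same source adapts when a designer drops or resizes the data cache.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values a tool might emit after the designer picks core
 * options from the menus; names and values are illustrative only. */
#ifndef CPU_DCACHE_SIZE
#define CPU_DCACHE_SIZE      4096u   /* 0 would mean "no data cache" */
#endif
#ifndef CPU_DCACHE_LINE_SIZE
#define CPU_DCACHE_LINE_SIZE 32u     /* must be a power of two */
#endif

/* Hypothetical per-line flush primitive. On real hardware this would be a
 * cache-maintenance instruction from the core's toolchain; here it only
 * counts calls so the logic can be checked on a host. */
static unsigned lines_flushed;
static void dcache_flush_line(uintptr_t addr) { (void)addr; lines_flushed++; }

/* Make a buffer visible to fabric logic such as a DMA engine. With the
 * cache configured out, the same source compiles to a no-op. */
void dcache_flush_range(const void *buf, size_t len)
{
#if CPU_DCACHE_SIZE > 0
    if (len == 0)
        return;
    uintptr_t start = (uintptr_t)buf & ~(uintptr_t)(CPU_DCACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)buf + len;
    for (uintptr_t a = start; a < end; a += CPU_DCACHE_LINE_SIZE)
        dcache_flush_line(a);
#else
    (void)buf;
    (void)len;
#endif
}
```

Shrinking the cache to save fabric resources then changes only the generated values, not the driver code.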
Soft-core configuration varies depending upon the FPGA design tools and the selected cores. Configuration is normally menu-driven, allowing designers to choose from a range of options. Standard configurations are available by default because software considerations are key. This is one reason why hard-core and standard soft-core processors make sense. Adjusting the software tool chain or underlying operating system is not something that should be done on a regular basis, if at all.
This issue becomes more important if there are additions or modifications to the soft core’s instructions because support then needs to be made available to the programmers. This can be done in a number of ways, from simple assembler macros to changes to a C/C++ compiler.
Minor changes or custom assembler code are often used if only a few instructions are added to the mix or if their use is limited to a few algorithms. The advantage is improved performance and efficiency, since software can interact more directly with logic in the rest of the FPGA. The disadvantage is more complexity.
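A common pattern is to wrap a custom instruction in an inline C function with a software fallback, so application code does not change when the hardware does. The sketch assumes a Nios II-style toolchain: GCC's Nios II target provides `__builtin_custom_*` functions whose suffix encodes the result and operand types (`ini` here: int result, one int operand). The bit-reverse operation, the opcode number `CI_BITREV_N`, and the `HAVE_CI_BITREV` macro are hypothetical; on any other target, or without that macro, the code compiles to the portable fallback.

```c
#include <stdint.h>

/* Hypothetical opcode slot assigned to a bit-reverse unit in the fabric. */
#define CI_BITREV_N 0

/* Portable fallback, also usable as a reference model when testing the
 * hardware: swap adjacent bits, then pairs, nibbles, bytes, and halves. */
static inline uint32_t bitrev32_sw(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* Application code calls this; the build decides what it becomes. */
static inline uint32_t bitrev32(uint32_t x)
{
#if defined(__nios2__) && defined(HAVE_CI_BITREV)
    /* Emits the custom instruction; the opcode must be a constant. */
    return (uint32_t)__builtin_custom_ini(CI_BITREV_N, (int)x);
#else
    return bitrev32_sw(x);
#endif
}
```

This is the "simple macro" end of the support spectrum; teaching the compiler to select the new instruction automatically requires toolchain changes of the kind described next.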
The Synopsys ARC and Tensilica Xtensa are examples of cores that are available for FPGA use but normally target ASIC designs. Their tools can generate developer tools, such as compilers, that take advantage of new instructions added by the hardware designers.
Typically, a soft core interacts with the FPGA fabric via I/O ports or memory-mapped I/O. Asynchronous feedback is usually provided by the interrupt scheme implemented by the designer. Likewise, direct memory access (DMA) operates within these same constraints.
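A minimal memory-mapped I/O sketch in C: the processor writes an argument and a start bit to a peripheral in the fabric, then polls a status register. The register map, bit assignments, and base address are hypothetical; a real design takes them from the generated hardware description, and an interrupt-driven version would wait for the designer's interrupt instead of spinning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register map of an accelerator in the fabric, given as
 * 32-bit word offsets from its base address on the processor bus. */
enum { REG_CTRL = 0, REG_STATUS = 1, REG_ARG = 2, REG_RESULT = 3 };
#define CTRL_START  (1u << 0)
#define STATUS_DONE (1u << 0)

/* volatile forces every access to reach the bus rather than a register. */
static inline void reg_write(volatile uint32_t *base, unsigned reg, uint32_t v)
{
    base[reg] = v;
}

static inline uint32_t reg_read(volatile uint32_t *base, unsigned reg)
{
    return base[reg];
}

/* Start one operation and poll for completion; returns false on timeout. */
bool accel_run(volatile uint32_t *base, uint32_t arg, uint32_t *result,
               unsigned max_polls)
{
    reg_write(base, REG_ARG, arg);
    reg_write(base, REG_CTRL, CTRL_START);
    for (unsigned i = 0; i < max_polls; i++) {
        if (reg_read(base, REG_STATUS) & STATUS_DONE) {
            *result = reg_read(base, REG_RESULT);
            return true;
        }
    }
    return false;
}
```

On target, `base` would be something like `(volatile uint32_t *)0x40000000u`, where the address is made up here and would actually come from the hardware design.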
Standard bus interfaces like Arm’s AMBA AXI make hardware and software design significantly easier. Early soft-core tools lacked this standardization, but it now allows both standard peripherals and custom core/FPGA peripherals to be selected from menus. For instance, instead of just providing an I/O port, a design tool might offer FIFOs as part of the option mix.
A standard bus interface also simplifies the FPGA designer’s chores when it comes to adding a new peripheral that can be selected for use with a soft-core processor. This is the same as designing to a standard bus on a board-level system, which allows boards to be used in different systems and with different operating systems. All that needs to be supplied is the hardware and a device driver. Providing the device driver as C code offers a high level of portability.
Soft peripheral designers can look at the FPGA in the same way. Simply design to the bus interface, typically Arm AMBA AXI, and the peripheral can be used with any soft core or hard core as long as it supports the matching bus.
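A sketch of what such a portable C driver might look like for a simple FIFO peripheral. The register map is hypothetical, and bus access goes through a small table of accessor functions (`bus_ops_t`, also hypothetical) so the same driver source can be moved to another core, or to a test harness, by supplying different accessors and a different base address.

```c
#include <stddef.h>
#include <stdint.h>

/* Bus accessors supplied by the platform port; on a typical memory-mapped
 * system these are plain volatile loads and stores. */
typedef struct {
    uint32_t (*read32)(uintptr_t addr);
    void     (*write32)(uintptr_t addr, uint32_t value);
} bus_ops_t;

/* Hypothetical register map of a FIFO peripheral (byte offsets). */
#define FIFO_DATA   0x0u        /* write: push a word; read: pop a word */
#define FIFO_STATUS 0x4u
#define FIFO_FULL   (1u << 0)
#define FIFO_EMPTY  (1u << 1)

typedef struct {
    const bus_ops_t *bus;
    uintptr_t base;             /* assigned by the hardware design */
} fifo_dev_t;

void fifo_init(fifo_dev_t *d, const bus_ops_t *bus, uintptr_t base)
{
    d->bus = bus;
    d->base = base;
}

/* Push up to n words; returns how many were accepted before it filled. */
size_t fifo_write(fifo_dev_t *d, const uint32_t *src, size_t n)
{
    size_t i = 0;
    while (i < n && !(d->bus->read32(d->base + FIFO_STATUS) & FIFO_FULL))
        d->bus->write32(d->base + FIFO_DATA, src[i++]);
    return i;
}

/* Pop up to n words; returns how many were available. */
size_t fifo_read(fifo_dev_t *d, uint32_t *dst, size_t n)
{
    size_t i = 0;
    while (i < n && !(d->bus->read32(d->base + FIFO_STATUS) & FIFO_EMPTY))
        dst[i++] = d->bus->read32(d->base + FIFO_DATA);
    return i;
}
```

Nothing in the driver depends on which core issues the bus cycles, which is the portability the standard bus is meant to deliver.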
The work of the hardware designer is not finished after the core and its peripherals are selected since the peripherals must be connected either to I/O pads for outside connectivity or to the other application FPGA logic if the devices are internal. Luckily, this is often a case of simply selecting the signal names appropriately and letting the FPGA design tool do the routing.
A Better FPGA For Soft Cores
The LUT overhead of a soft-core processor versus a hard-core processor is just one implementation issue that must be dealt with when it comes to soft-core processors and performance. Another issue is multiport memory. The typical FPGA has dual-port memory, but advanced processor designs frequently have long, complex pipelines that could benefit from more memory ports.
Tabula’s ABAX (Fig. 4) provides a unique SpaceTime time-based FPGA architecture that supports more ports for memory (see “FPGAs Enter The Third Dimension” at electronicdesign.com). SpaceTime implements multiple layers of logic, up to 16, that change each clocked state. ABAX actually has single-port memory, but data can be read during each state, essentially providing up to a 16-port memory.
Imagine branch look-ahead optimization logic being implemented with this feature. These types of features within the ABAX design potentially enable designers to implement soft cores that are more efficient.
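To make the multiport idea concrete, here is a conceptual software model, not Tabula's implementation: a memory with a single physical port is stepped through up to 16 internal states per user-visible clock cycle and serves one request per state. To the logic above it, that behaves like a memory with up to 16 ports. The memory size and request format are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FOLDS 16    /* internal states per user clock (up to 16 per the text) */
#define WORDS 64    /* illustrative memory depth */

typedef struct { bool is_write; unsigned addr; uint32_t data; } mem_req_t;

/* A memory with one port: only one access can happen per internal state. */
typedef struct { uint32_t cell[WORDS]; } sp_ram_t;

static uint32_t sp_access(sp_ram_t *m, const mem_req_t *r)
{
    if (r->is_write) {
        m->cell[r->addr % WORDS] = r->data;
        return r->data;
    }
    return m->cell[r->addr % WORDS];
}

/* One user-visible clock cycle: the single port serves one request per
 * internal state, so up to FOLDS requests complete in what the surrounding
 * logic sees as a single cycle. Later requests see earlier writes.
 * Returns the number of requests served. */
size_t user_cycle(sp_ram_t *m, const mem_req_t *reqs, uint32_t *results,
                  size_t n)
{
    if (n > FOLDS)
        n = FOLDS;
    for (size_t fold = 0; fold < n; fold++)
        results[fold] = sp_access(m, &reqs[fold]);
    return n;
}
```

The ordering within a user cycle is what a designer would have to pin down for something like branch look-ahead logic that reads several entries at once.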
Interestingly, Tabula’s Stylus tools build on conventional FPGA design tools, mapping a design to the underlying SpaceTime architecture. Stylus, working from the timing constraints the developer places on the design, determines the number of layers a design requires.
In many ways, the time-layered SpaceTime approach is very similar to pipelining within advanced processor designs. Research being done with the ABAX could lead to some interesting soft cores with some attractive attributes.
Advanced FPGA Integration
FPGAs are like lumps of clay. They can be made into almost anything. Still, most designers are likely to take a conventional approach to incorporating a hard or soft core into their designs using a static collection of menu-selected peripherals augmented by custom FPGA logic coupled to I/O ports. It’s easy to do and more than sufficient for most applications.
More advanced engineers are likely to use multiple, and possibly different, soft cores. Other advanced possibilities include the dynamic partial reconfiguration available with many FPGAs and the latest FPGA design toolsets. The approach is better suited to RAM-based FPGAs.
Essentially, the configuration of a region of the FPGA is changed at runtime. The area could contain a soft core or other logic, but the change is typically initiated under the control of a processor. On-chip processors, hard or soft, could accomplish this task.
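A sketch of the control sequence an on-chip processor might follow when it swaps the contents of a reconfigurable region. The hooks in `pr_ops_t` are hypothetical; in practice they map onto the vendor's configuration-port driver and onto decoupling logic between the region and the static design, and the exact steps come from the vendor's partial reconfiguration documentation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks for one reconfigurable region. */
typedef struct {
    void (*decouple)(bool on);                      /* isolate region outputs */
    bool (*cfg_write)(const uint32_t *w, size_t n); /* feed the config port */
    bool (*cfg_done)(void);                         /* port reports success */
    void (*region_reset)(void);                     /* reset newly loaded logic */
} pr_ops_t;

typedef enum { PR_OK, PR_WRITE_ERR, PR_TIMEOUT } pr_status_t;

pr_status_t pr_swap_region(const pr_ops_t *ops, const uint32_t *bits,
                           size_t nwords, unsigned max_polls)
{
    /* Keep the static design from seeing glitches while the region changes. */
    ops->decouple(true);

    if (!ops->cfg_write(bits, nwords))
        return PR_WRITE_ERR;    /* region contents unknown: stay decoupled */

    unsigned polls = 0;
    while (!ops->cfg_done()) {
        if (++polls > max_polls)
            return PR_TIMEOUT;  /* likewise leave the region isolated */
    }

    ops->region_reset();        /* start the new logic from a known state */
    ops->decouple(false);       /* reconnect it to the static design */
    return PR_OK;
}
```

Leaving the region decoupled on any failure is a deliberate choice in this sketch: the static design keeps running, and the processor can retry with a known-good partial bitstream.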
Dynamic reconfiguration is something new and not part of any hard-core or soft-core processor, menu-driven design flow. It may also add too much complexity for designers, but it is a possibility that recalls the flexibility of some older nano-programmed minicomputers like the Burroughs B1700 series, which changed its configuration to run applications written in different programming languages such as Algol and COBOL.
One thing that should be clear is that incorporation of processor cores into an FPGA design is relatively easy while the payback in flexibility is significant. Many FPGA applications do not require a processor core on-chip. In many instances, it would actually be in the way of an optimal system design.
On the other hand, microcontroller- or microprocessor-based designs might benefit from the flexibility and power provided by an FPGA-based solution, especially single-chip solutions. So, what will your next hardware project look like?
PSoC: Almost An FPGA
Cypress Semiconductor’s PSoCs share many attributes with hard-core FPGAs (see “Field-Programmable I/O Augments 8- and 32-Bit Microcontrollers” at electronicdesign.com). The PSoC 3 line is based on the 8051 core, while the PSoC 5 line incorporates an Arm Cortex-M3.
Like some hard-core FPGA combinations, the PSoC has a few dedicated peripherals. However, most are provided using the programmable fabric that surrounds the hard processor core.
The big difference is that the PSoC’s programmable hardware is more rigid than an FPGA’s more general LUT-based (lookup table) approach. Instead, the PSoC has digital and analog blocks that can be configured and connected together to form more complex devices.
Cypress Semiconductor calls this a field-programmable component array, or FPCA (see the figure). The components are tailored to efficiently provide conventional digital devices such as serial ports and timers as well as analog devices such as analog-to-digital converters (ADCs) and digital-to-analog converters (DACs).
The advantage over a collection of digital and analog peripherals is the customizability available to designers. For example, timing and synchronization for serial ports can be configured to a much greater degree than a serial port in a typical microcontroller. Components can be tied together forming larger or more complex systems. This ability is more powerful than that found on other microcontrollers with customizable event systems because the components are more configurable (see “Offloading CPU Boosts Microcontroller Performance And Cuts Power” at electronicdesign.com).
The PSoC lacks the horsepower of the higher-end FPGAs, but it competes well with low-range to mid-range FPGAs. Likewise, it competes with microcontrollers like those from Atmel, Microchip, and Texas Instruments that have various configurable event systems linking multiple peripherals together. As with the PSoC, the linked peripherals in these microcontrollers can often operate while the main processor is sleeping, further reducing power consumption.