Now nothing will impede the field-programmable gate array's (FPGA) march into scores of volume applications. New architectures both lower cost and improve functionality, enabling FPGAs to compete cost-effectively with ASICs, even when volumes hit several hundred thousand units or more.
Smaller process geometries that shrink chip size go a long way toward reducing cost. But size isn't the only factor in the cost equation. The chips also must contain enough logic functionality and I/O pads to meet the needs of a wide range of systems.
Also driving the use of programmable devices is the shorter product life of many mass-market products. With ASICs, designers typically must allow for manufacturing cycle times of a month or more. Then there are the nonrecurring-engineering (NRE) costs. And don't forget to budget for chips that become useless if the end product is discontinued or the features change so much that the ASIC can't be used in the next spin of the system.
Programmable logic devices combat all of the above issues. These off-the-shelf parts have no significant NRE charges. If the system is designed properly, they can be updated in the field to correct system flaws or add new features. The per-chip costs might be higher than those of an ASIC, but after considering these other factors, FPGAs become a more cost-effective solution.
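As a rough illustration of the trade-off above, the sketch below computes the volume at which an ASIC's lower unit price pays back its NRE charge. The function name and every dollar figure are hypothetical, chosen only to show the arithmetic. They aren't vendor quotes, and the model ignores cycle-time and obsolescence costs, which would push the break-even point higher.

```python
def break_even_volume(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Return the unit volume above which the ASIC becomes cheaper overall.

    Assumes the FPGA carries no NRE charge and a higher per-chip price.
    """
    if fpga_unit_cost <= asic_unit_cost:
        raise ValueError("FPGA is not more expensive per unit; no break-even point")
    return asic_nre / (fpga_unit_cost - asic_unit_cost)


# Hypothetical figures, for illustration only.
volume = break_even_volume(asic_nre=500_000, asic_unit_cost=4.00, fpga_unit_cost=9.00)
print(f"ASIC pays off only above {volume:,.0f} units")
```

Below that volume, the FPGA is the cheaper choice in this simplified model, even though each chip costs more.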
To further the trend, all companies that offer FPGAs have developed families of low-cost devices that now target mass-market applications. Chip prices start as low as $1.30 apiece in lots of 250,000 units for devices with about 30 kgates of logic.
Today, designers can purchase million-gate FPGAs for less than $10 each in large quantities. For example, a chip with 1.2 million system gates in Xilinx's Spartan-3E family, the XC3S1200E, is sampling now and has a projected price in the second half of 2006 of less than $9 each in lots of 500,000 units. That low price puts the FPGA into direct competition with traditional ASICs.
Nearly every other FPGA vendor also has released budget-conscious families. There's Actel and its ProASIC3 series; Altera's Cyclone II; Lattice's EC (economy), ECP-DSP (economy plus digital-signal processing support), and XP families; and QuickLogic's Eclipse II series.
The Spartan-3E family represents Xilinx's seventh generation of Spartan devices, and it specifically targets budget-minded applications. Comprising five members, the family provides capacities ranging from 100k to 1.6 million system gates, up to 231 kbits of distributed SRAM, up to 648 kbits of dedicated configurable RAM blocks, and from 66 to 376 I/O pins (Fig. 1). The E series devices run 30% cheaper than the Spartan-3 chips, thanks to a streamlined architecture.
Designed for consumer applications, the 3E series supports mini low-voltage differential-signaling (LVDS) interfaces, PCI 64/66, PCI-X, and DDR333 memories. On-chip DSP support comes in the form of dedicated 18- by 18-bit multiplier-accumulator blocks. The 1.2-Mgate XC3S1200E can rip through 9.1 GMACs/s when clocked at 325 MHz. Dedicated digital clock managers (up to eight) permit designers to implement multiple independent clock trees.
Unlike previous FPGAs that required a special configuration memory, the 3E series can take advantage of low-cost serial-peripheral-interface (SPI) or standard commodity flash memories. For those systems already equipped with a standard parallel flash memory (with room to hold the configuration code), designers can save on the cost of configuration memory.
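The 9.1-GMACs/s figure can be sanity-checked with simple arithmetic. The sketch below assumes each dedicated block completes one multiply-accumulate per clock cycle, an assumption not stated in the text, and works backward to the number of blocks the quoted throughput implies. The function names are illustrative.

```python
def peak_macs_per_second(multiplier_blocks, clock_hz):
    """Peak throughput, assuming one MAC per block per clock cycle."""
    return multiplier_blocks * clock_hz


def implied_multiplier_blocks(macs_per_second, clock_hz):
    """Number of blocks needed to reach a quoted throughput at a given clock."""
    return macs_per_second / clock_hz


blocks = implied_multiplier_blocks(9_100_000_000, 325_000_000)
print(f"Implied multiplier blocks: {blocks:g}")
print(f"Peak rate: {peak_macs_per_second(28, 325_000_000) / 1e9:.1f} GMACs/s")
```

Under that assumption, the quoted rate corresponds to 28 blocks running in parallel at 325 MHz.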
Another variation of the Spartan-3 family, the 3L series, stakes a claim as the lowest standby current FPGAs. Devices in the family draw as little as 6 to 10 mA in their hibernate mode, which constitutes up to a 98% quiescent power reduction compared to the Spartan-3 series. Devices in the 3L series pack from 1 million to 4 million system gates and up to 633 user I/O pins.
In addition to the distributed SRAM of 120 to 432 kbits, the 3L devices pack dedicated blocks of configurable RAM: 432 kbits on the 1-Mgate chip and 1728 kbits on the 4-Mgate chip. Dedicated multiplier-accumulator blocks also are available on the 3L chips, with 24 on the 1-Mgate chip and 96 on the 4-Mgate device.
GREAT MINDS THINK ALIKE - Altera's Cyclone II family has a set of resources similar to those incorporated by Xilinx in the Spartan-3E series. Its devices pack from 4.6k to over 68k logic elements (about 120k to 1.8 million system gates if one logic element block is equivalent to about 26 system gates). Total memory ranges from about 120 kbits to over 1.1 Mbits. In addition, there are 13 to 150 18- by 18-bit multipliers, two or four phase-locked loops, and from 143 to 622 maximum user I/O pins.
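The logic-element-to-gate conversion quoted above is easy to reproduce. This sketch uses the 26-gates-per-element factor from the text. The function name is illustrative, and 4,600 and 68,000 are the rounded endpoints given in the text rather than exact device figures.

```python
GATES_PER_LOGIC_ELEMENT = 26  # rough equivalence quoted in the text


def system_gates(logic_elements, gates_per_le=GATES_PER_LOGIC_ELEMENT):
    """Estimate system gates from a logic-element count."""
    return logic_elements * gates_per_le


for les in (4_600, 68_000):
    print(f"{les:>6,} LEs is roughly {system_gates(les):,} system gates")
```

The results, about 120k and just under 1.8 million gates, line up with the range quoted for the family.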
Each on-chip multiplier also can be configured as two independent 9- by 9-bit multipliers, operating at clock speeds of up to 250 MHz. Like the Spartan-3E series, the Cyclone II devices support a wide range of I/O standards, including LVDS, differential HSTL, differential SSTL, and many single-ended standards at 1.5, 1.8, 2.5, and 3.3 V.
Dedicated DDR2 and QDRII memory interfaces, incorporated as hard-wired blocks on the devices, can handle data-transfer rates of up to 668 Mbits/s. Most other FPGAs only offer first-generation double-data-rate (DDR) memory support.
Two families of low-cost FPGAs hail from Lattice—the LFXP series of flash-based FPGAs, and the EC/ECP-DSP series of SRAM-based devices, some of which include dedicated DSP support. Although the LFXP chips have on-chip flash to hold their configuration patterns, the FPGAs employ SRAM-based lookup tables (LUTs) that are loaded upon power-up with the data held in the flash memory.
The FPGAs also can be updated during operation. A background data-transfer mode (TransFR) allows a new configuration pattern to be loaded into the flash while the chip continues operating. This TransFR reconfiguration mode lets the chip update its operating mode on-the-fly. As a result, designers can build updatable systems without requiring lengthy reboot cycles.
Offering between 3k and almost 20k LUTs, the LFXP chips include configurable blocks of dedicated SRAM—from 54 to 414 kbits—and from 62 to 340 I/O pins. The LUT RAMs can be used as distributed RAM, which adds another 12 to 79 kbits of available memory. This rather I/O-rich family packs a dedicated DDR333-capable DRAM interface.
The LUTs in the LFXP FPGAs are grouped into blocks called programmable functional units (PFUs). These PFUs contain all of the building blocks for logic, arithmetic, RAM, ROM, and register functions. But not all circuit implementations need the RAM. So, designers created RAM-less PFUs—called PFFs—that just implement the logic, arithmetic, and ROM functions. Implementing the logic fabric with a mix of PFUs and PFFs enables the creation of a more area-efficient FPGA, thereby reducing chip cost.
The EC and ECP-DSP FPGAs require an off-chip configuration memory but offer from 1.5k to 32.8k LUTs and from 65 to 496 I/O pins. Like the LFXP devices, the EC and ECP-DSP chips include dedicated blocks of SRAM and distributed SRAM by leveraging the small SRAMs in the LUTs.
Dedicated RAM on the chips ranges between 18 and 498 kbits, while distributed RAM ranges from 6 to 131 kbits. ECP-DSP versions in the family include dedicated DSP support in the form of configurable multiplier-accumulators.
They make it possible to implement four to eight 36- by 36-bit, 16 to 32 18- by 18-bit, or 32 to 64 9- by 9-bit multipliers that can run at up to 250 MHz.
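The three multiplier ranges quoted above follow a 1:4:8 pattern, consistent with a 36- by 36-bit product being assembled from four 18- by 18-bit multipliers and each 18- by 18-bit multiplier splitting into two 9- by 9-bit units. That relationship is inferred here from the quoted figures and isn't taken from vendor documentation. The function name is hypothetical.

```python
def multiplier_modes(count_18x18):
    """Return the multiplier counts available in each width mode.

    Assumes four 18x18 units combine into one 36x36 multiplier and each
    18x18 unit splits into two 9x9 multipliers (inferred, not documented).
    """
    return {
        "36x36": count_18x18 // 4,
        "18x18": count_18x18,
        "9x9": count_18x18 * 2,
    }


for n in (16, 32):
    print(n, multiplier_modes(n))
```

Feeding in the 16 and 32 endpoints reproduces the four-to-eight and 32-to-64 ranges given for the other two widths.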
Like the LFXP devices, the EC and ECP FPGAs include a dedicated DDR memory interface. However, the interface can handle higher data rates—DDR400 versus DDR333. And to keep the chip size as small as possible, the EC and ECP logic fabrics contain a mix of PFUs and PFFs, also like the LFXP version.
DIRECT FLASH CONTROL - By integrating the flash configuration cells directly into the logic fabric rather than loading LUTs from a bank of flash storage, Actel's ProASIC3 and ProASIC3E FPGA families can deliver ultra-low cost/gate figures.
With prices that dip to $1.50 for a 30-kgate device in volume, the ProASIC3 family can extend itself to chips with as many as 1 million system gates, 144 kbits of dedicated and configurable true dual-port SRAM, and up to 288 I/O pins. The 3E series offers devices featuring 600k to 3 Mgates, up to 504 kbits of SRAM, and as many as 604 I/O pins that can support 19 different I/O standards.
Thanks to clock rates that run at up to 350 MHz, the chips can readily implement 66-MHz, 64-bit PCI interfaces. An on-chip 128-bit AES encryption engine secures the design by preventing access to the programming information, without adding external overhead.
A small user-accessible flash memory organized as eight 128-bit pages is available on the ProASIC3 chips to hold system data. It can also hold other information, such as revision or manufacturing history, Internet Protocol device addressing, calibration settings, or date stamping.
Claiming the lowest power drain for an FPGA family, the Eclipse II series from QuickLogic keeps power in the microwatt range during standby. The antifuse-based FPGAs feature a quiescent current of 14 µA, which runs some 20 to 400 times lower than that of other similar-density FPGAs.
Five devices make up the family, with gate counts ranging from about 47k up to 320 kgates and on-chip memory from 9 kbits to about 55 kbits. The two largest family members, the QL8250 and QL8325, also include embedded computational blocks. As many as 12 computation blocks reside on the QL8325, and each block can perform up to 12 8-bit multiply-accumulate functions per cycle. When the computational blocks clock at 100 MHz, up to 1 billion MACs can be executed.
Also available from QuickLogic are the older Eclipse-E family and the pASIC3 FPGAs. The Eclipse-E FPGAs are similar to the new Eclipse II family, but the power drain isn't as low. The lower-complexity antifuse-based pASIC3 series is more like a CPLD replacement, offering up to 60k usable PLD gates.
CPLDs OFFER MORE FOR LESS - In addition to FPGAs, complex programmable-logic devices (CPLDs) can be used in cost-sensitive applications. But their gate counts are typically lower than those of the low-cost FPGAs. From an external point of view, the differences between FPGAs and the larger CPLDs are fairly minor. Yet internal logic architectures differ, and that can be a deciding factor. CPLDs are usually rich in logic macrocells and I/O pins, but they don't have much on-chip memory or timing support (phase-locked loops). This makes them a good fit for control-type applications.
In contrast, FPGAs tend to include blocks of embedded memory, or at least, they can use the LUT memories as distributed RAM. They also often contain one or more PLLs for clock distribution and a moderate number of I/O pins. Such a combination of resources better suits datapath and other compute-oriented applications.
Of course, this isn't a hard and fast rule. Designers can readily insert FPGAs into control applications. But at the lower complexity levels, below about 512 macrocells, CPLDs tend to be more cost-effective. The reverse usually isn't the case, though. Memory resources on the lower-complexity CPLDs typically come up short for compute-oriented applications.
However, some larger CPLDs incorporate blocks of memory and other resources that would allow them to handle compute applications. In fact, some larger CPLDs are actually FPGAs in disguise. Their internal architecture is based on configurable SRAM-based LUTs, rather than macrocells, and on-chip nonvolatile flash storage holds the configuration pattern that's loaded into the SRAM upon power-up.
Both Altera and Lattice recently released families of CPLDs based on FPGA-like internal architectures. Altera's MAX II series and Lattice's MachXO family both use four-input LUTs as the configurable logic blocks. (For more about the MachXO series, see "PLD Family Bridges FPGA And CPLD Needs," p. 34.) The families are trying to straddle the logic gap between CPLDs and high-density FPGAs by offering more registers than practical with traditional CPLD macrocells, minimally encroaching on the FPGA turf.
MAXING OUT THE DENSITY - Although Altera's MAX II series follows in the footsteps of the company's older product-term-based MAX family of CPLDs, it offers almost four times the logic capacity, more I/O pads, and smaller chip sizes. Moving from a 300-nm process used for the MAX series to a 180-nm flash process and switching to an LUT-based architecture allows the MAX II devices to deliver the equivalent of 128 to 1700 macrocells (150 to 2210 logic elements).
Each logic element (LE) contains a four-input LUT, a programmable register, and a carry chain with carry-select logic (Fig. 2). Ten LEs form a logic array block (LAB). Rows and columns of LABs connect to I/O elements that contain bidirectional I/O buffers, Schmitt-trigger inputs, and still other features.
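To make the LUT-plus-register structure concrete, here is a minimal behavioral sketch of such a logic element. It's a hypothetical model, not Altera's implementation: the class and function names are invented for illustration, the bit ordering of the 16-bit configuration word is an arbitrary choice, and the carry chain is left out.

```python
from itertools import product


def make_truth_table(fn):
    """Pack a 4-input Boolean function into a 16-bit LUT configuration word."""
    table = 0
    for d, c, b, a in product((0, 1), repeat=4):
        index = a | (b << 1) | (c << 2) | (d << 3)
        if fn(a, b, c, d):
            table |= 1 << index
    return table


class LogicElement:
    """Hypothetical model: one 4-input LUT feeding an optional register."""

    def __init__(self, truth_table, registered=False):
        self.truth_table = truth_table & 0xFFFF
        self.registered = registered
        self.q = 0  # register contents, captured on each clock() call

    def lut(self, a, b, c, d):
        """Combinational lookup: the inputs select one bit of the table."""
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.truth_table >> index) & 1

    def clock(self, a, b, c, d):
        """Simulate a clock edge: the register captures the LUT output."""
        self.q = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        """Registered mode returns the stored bit; otherwise the LUT output."""
        return self.q if self.registered else self.lut(a, b, c, d)


parity = make_truth_table(lambda a, b, c, d: a ^ b ^ c ^ d)
le = LogicElement(parity)
print(hex(parity), le.output(1, 1, 1, 0))
```

Because any 4-input function reduces to 16 stored bits, the same element can serve as an AND gate, a parity checker, or any other small function simply by loading a different configuration word.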
In addition to the on-chip configuration memory, the MAX II devices pack an additional 8 kbits of user-accessible flash memory. This block can hold some user or system data, eliminating the need for an external data memory in some systems.
The largest members of the MAX II and MachXO families have nearly identical I/O pin counts of 272 and 271, respectively. Likewise, both families have four members with roughly similar logic capacities. The MAX II family operates with supplies as low as 1.8 V and offers a standby current of about 2 mA. A lower operating voltage of 1.2 V for the MachXO series helps keep the sleep-mode standby current drain to less than 100 µA.
Although it's a few years old, the CoolRunner II family from Xilinx offers densities from 32 to 512 macrocells and logic delays ranging from 2.8 to about 7 ns pin to pin. The devices are similar to the MAX II chips in that they operate from a 1.8-V supply. But they draw less than 25 µA on standby. Running at 50 MHz, the smallest and largest devices draw 2.5 and 55 mA, respectively. In addition, like the MachXO series, the CoolRunner II devices allow on-the-fly reconfiguration.
Most programmable-logic vendors haven't made major updates to their older CPLD families. But this doesn't mean that the CPLD-type architectures are fading away. Altera still offers its FLEX6000 series of PLDs, which house from about 10k to 24k gates. Lattice recently updated its offerings with the ispXPLD 5000MX series, a family of flash CPLDs based on memory-rich logic blocks called multifunction blocks (MFBs).
Each MFB can be configured in one of six possible modes—a superwide logic module, a true dual-port SRAM, a pseudo-dual-port SRAM, a single-port SRAM, a FIFO memory, or a ternary content-addressable memory. Additionally, each MFB contains 32 macrocells, 16 kbits of SRAM, and a fully populated programmable AND array with 160 logic product terms and four control product terms. There are 68 inputs from the global routing pool on the chip. Furthermore, adjacent MFBs can be cascaded to create a block with 136 inputs.
The ispXPLD family contains four chips that pack between eight and 32 MFBs each. That translates to 256 to 1024 macrocells and 128 to 512 kbits of RAM. The combination of logic and memory yields pin-to-pin propagation delays of 4 to 5.2 ns. That speed, along with logic and memory resources, makes this family a good fit for storage control, networking, printer support, and communications applications.
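The family totals follow directly from the per-MFB resources given above. The short sketch below reproduces them, using the 32 macrocells, 16 kbits of SRAM, and 68 inputs per block quoted in the text. The function names are illustrative.

```python
MACROCELLS_PER_MFB = 32
SRAM_KBITS_PER_MFB = 16
INPUTS_PER_MFB = 68


def family_resources(mfb_count):
    """Total macrocells and SRAM (in kbits) for a device with mfb_count MFBs."""
    return {
        "macrocells": mfb_count * MACROCELLS_PER_MFB,
        "sram_kbits": mfb_count * SRAM_KBITS_PER_MFB,
    }


def cascaded_inputs(mfbs_cascaded):
    """Inputs available when adjacent MFBs are cascaded into one block."""
    return mfbs_cascaded * INPUTS_PER_MFB


for count in (8, 32):
    print(count, family_resources(count))
print("Two cascaded MFBs:", cascaded_inputs(2), "inputs")
```

Eight MFBs give 256 macrocells and 128 kbits, and 32 MFBs give 1024 macrocells and 512 kbits, matching the family range. Cascading two blocks yields the 136-input figure.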
BUDGET-PRICED FPGA VENDORS
Actel Corp.
Lattice Semiconductor Corp.