Silicon Convergence and the Future of System Design

For system designers, increasing integration in integrated circuits brings both good news and a new problem. The good news is that each new silicon process node allows chip designers to pack more components—more processors, accelerators, memory, and peripheral controllers—into one chip. More components in one chip means higher performance, lower power, and less space. But more integration also means that many decisions system designers formerly made are now made by the chip designers, gradually shifting innovation and differentiation away from the system design team.

It is important for system designers to understand what the chip designers are thinking, and to reserve for themselves the freedom to differentiate their products. In this article we will examine an important category of applications and trace the evolution of chip designs serving those markets, with this new need in mind.

An application category

Many of the most important applications in the electronics market today—including video surveillance, wired and wireless communications, and advanced industrial controls—follow a single pattern. In these applications the system gathers high-bandwidth signals, processes these signals to extract their data, applies computationally-intensive analyses to reach decisions, and then acts to implement the decisions, all subject to maximum-latency requirements.
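
The four-stage pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the threshold of 128, the alarm count of 3, and the use of a plain list of numbers as the "signal" are all hypothetical and are not taken from any real product.

```python
import time

def gather(samples):
    """Stage 1: take in the raw signal (here, simply a list of numbers)."""
    return list(samples)

def extract(signal, threshold=128):
    """Stage 2: reduce the signal to data, here the indices of strong samples."""
    return [i for i, value in enumerate(signal) if value >= threshold]

def analyze(events, alarm_count=3):
    """Stage 3: reach a decision from the extracted data."""
    return len(events) >= alarm_count

def act(alarm):
    """Stage 4: implement the decision."""
    return "raise alarm" if alarm else "no action"

def run_with_budget(samples, max_latency_s):
    """Run all four stages and report whether the latency budget was met."""
    start = time.perf_counter()
    action = act(analyze(extract(gather(samples))))
    elapsed = time.perf_counter() - start
    return action, elapsed <= max_latency_s
```

A call such as `run_with_budget([10, 200, 220, 30, 250], 1.0)` returns the chosen action together with a flag saying whether the whole chain finished inside the budget. In a real system each stage would be far heavier, and the budget would apply to every frame or packet.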

For example, a surveillance system may take in 1080-line, progressive-scan video from a camera. The system would process the video stream to enhance edges, identify objects, and separate out objects of potential interest. This processing normally uses standardized, relatively simple but computationally-intensive algorithms.
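
To make "relatively simple but computationally-intensive" concrete, here is a minimal sketch of one common edge-detection step. The article does not name a specific algorithm; the Sobel operator used here is an assumption, as is the function name `sobel_magnitude` and the representation of a frame as a list of rows of grayscale values.

```python
def sobel_magnitude(img):
    """Approximate edge strength at each interior pixel of a grayscale image.

    Border pixels are left at 0 because they lack a full 3x3 neighborhood.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            # Sum of absolute values is a cheap stand-in for the true magnitude.
            out[r][c] = abs(gx) + abs(gy)
    return out
```

The arithmetic per pixel is trivial, but it repeats for every pixel of every frame. Assuming 1920 pixels per line, a 1080-line frame holds about two million pixels, which is why this kind of work is a natural candidate for a DSP core or dedicated hardware.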

In the next stage, more powerful processing units analyze the objects, trying to detect intrusions, for example, or to identify individual persons of interest. These algorithms may be application-specific and may change frequently. Finally, the analysis will determine whether the situation requires triggering alarms, securing gates, or alerting public-safety authorities.

First solution, software

Design teams have taken three different paths to implement these systems. The first technique begins with software running on a microprocessor or, more recently, an application-specific standard-product IC (ASSP) or a powerful 32-bit microcontroller. The design team debugs the software, confirming the algorithms, and then begins checking the system performance. If a task is too slow, the designers will speed it up by moving it to a separate CPU or, if an appropriate accelerator such as a DSP core or vector processor is available on the IC, to an accelerator. When all tasks are meeting their timing requirements, the system is ready for final verification of function, timing, and power consumption.

In our surveillance example, the system control software would run on one CPU core. The standard image-processing algorithms might be run in standard library routines on a DSP core, while the more complex application-specific algorithms would be hand-crafted for parallel execution on all the available CPU cores.

This design technique has important advantages. It maintains a steady focus on the software, and hence on the functionality of the system. And because most of the system functionality stays in software, the system remains relatively easy to alter as bugs appear or requirements change. But in general, software on CPU or DSP cores is the slowest and most energy-consuming way to execute an algorithm. So the software-centric approach is not best for systems that have demanding performance or efficiency constraints. And since the differentiating features of the system lie in the software, they are easily copied by any competitor, or circumvented by any hostile party, who has access to the same hardware.

Hardware as a solution

The opposite approach to system design is to develop a hardware design directly from the system requirements, while concurrently writing the software that will run on that hardware. This is the way most application-specific ICs (ASICs) are created. At the beginning, system architects determine what CPUs, accelerators, memories, and controllers will be necessary, and give these requirements to a chip design team, which proceeds to develop the ASIC.

In our example system, architects might select a pair of ARM cores to run the system software, license a third-party image-processing engine to handle the initial image-processing tasks, and design a custom, firmware-coded DSP pipeline for the complex algorithms at the end of the process. Then while the IC design progressed, the software team would have to deal with three sets of programming and debug tools for the three very different kinds of engines in the design.

The hardware-centric approach has important advantages. It can produce the highest system speeds and the greatest energy efficiency of any technique. But it requires a skilled IC design team and, at advanced process nodes, considerable investment. Also, once the ASIC is designed, it is difficult, expensive, and slow to change the hardware, either to correct errors or to respond to changing requirements. Software work-arounds can save the day, but only by sacrificing some of the speed and power that made the ASIC approach attractive.

Hence, while the hardware-centric technique is best in principle for all performance- or power-constrained designs, in practice design teams will only create an ASIC if they expect huge sales volume to justify the cost and risk, and if they know the system hardware requirements are unlikely to change over the product life. In the real world, teams facing critical design challenges often turn their backs on the ASIC approach, and go shopping for an ASSP that approximates the system IC design they cannot afford to do themselves.

A middle path

FPGAs offer a third alternative to system designers. In many ways, FPGAs have been a middle path between the software-centric, CPU-based techniques and the hardware-centric ASIC alternative. An algorithm implemented in an FPGA is not as easy to modify as software, but changing the FPGA configuration is vastly easier than taping out a new version of an ASIC, even if the changes are confined to a few metal layers. Conversely, a task in an FPGA can be far faster, and can consume far less energy, than the same task in software. But the FPGA version will usually be slower and less energy-efficient than an equivalent ASIC.

Consequently, system designers have turned to FPGAs when a software-only solution couldn’t meet speed or energy requirements, an ASSP that allowed sufficient differentiation could not be found, and budget constraints, low expected volume, or the probability of changes precluded use of an ASIC. Fortunately for FPGA vendors, this situation occurs frequently enough that FPGAs have consistently increased sales more rapidly than their alternatives in recent years.

In our surveillance example, designers might combine an industry-standard microprocessor running the system software with an FPGA that would implement off-the-shelf IP for the standard image processing, and custom-designed DSP pipelines for the hard parts. Thus the design in the FPGA would resemble the design in an ASIC at a functional-block level, though the implementation might be quite different at the gate level.

The best of all worlds

Ideally, system developers would not have to choose one path over another; they could choose the best implementation for each task. Rarely-traversed, non-critical tasks could remain in software on an appropriate CPU. Performance- or power-critical tasks that are defined by standards, and so would not change, would become fixed hardware. Tasks that require hardware support but that may change would go into FPGA programmable logic fabric.
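
The task-by-task rule in the paragraph above can be written down as a small decision function. This is a sketch of the reasoning only, not a design tool: the names `Target` and `choose_target`, and the reduction of each task to two yes/no questions, are assumptions made for illustration.

```python
from enum import Enum

class Target(Enum):
    SOFTWARE = "software on a CPU"
    FIXED_HARDWARE = "fixed hardware"
    PROGRAMMABLE_LOGIC = "FPGA programmable logic"

def choose_target(critical, fixed_by_standard):
    """Pick an implementation for one task.

    critical: the task is performance- or power-critical.
    fixed_by_standard: the task is defined by a standard and will not change.
    """
    if not critical:
        return Target.SOFTWARE            # non-critical work stays easy to alter
    if fixed_by_standard:
        return Target.FIXED_HARDWARE      # stable and demanding: harden it
    return Target.PROGRAMMABLE_LOGIC      # demanding but still likely to change

# Hypothetical classification of the surveillance example's tasks.
plan = {
    "system control": choose_target(critical=False, fixed_by_standard=False),
    "standard image processing": choose_target(critical=True, fixed_by_standard=True),
    "application-specific analysis": choose_target(critical=True, fixed_by_standard=False),
}
```

In practice the two questions are rarely clean booleans, but the ordering of the checks reflects the priority stated above: keep a task in software unless it must move, and harden only what is stable.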

Several silicon generations ago this was actually a common practice. The scale of integration was small enough that microprocessors, accelerators, complex interface controllers, and FPGAs were all separate chips. But by the 90-nm generation, SoCs included all of these functions except for the FPGA fabric. And most implementation decisions were made by the SoC designers, not the system designers. System designers could only achieve differentiation by picking the best available SoC, writing unique software, and if possible very cleverly interfacing an FPGA to the SoC.

Now the situation is changing again. The enormous number of transistors available to chip developers has allowed what we at Altera call silicon convergence. Powerful microcontrollers have added application-specific hardware, so they look like ASIC SoCs. ASICs and ASSPs can include powerful 32-bit CPUs, so they look like high-end microcontrollers. And FPGAs such as Altera’s SoC FPGA family are including both multicore CPUs and dedicated hardware blocks, creating in reality that ideal: the ability for the system designer to select software, dedicated hardware, or programmable logic on a task-by-task basis.

In our example, designers might employ such a converged chip to place the system software and multi-threaded portions of the image-processing algorithms on a pair of powerful CPU cores. They could implement the remaining algorithms on a combination of hard DSP cores and programmable fabric, all on one chip.

As spiraling development costs limit ASICs to fewer and fewer situations, the silicon convergence trend is drawing together the remaining three system-level solutions. Microcontrollers, ASSPs, and FPGAs are becoming almost the same, with one important difference. For both technical and intellectual-property-law reasons, only FPGAs can offer state-of-the-art programmable logic fabric. So only FPGAs can support the system designer’s differentiation strategy down to the hardware level.

A differentiation future

On one path, we will see high-end microcontrollers and ASSPs become the hardware foundations for systems whose hardware will become almost commoditized, while the differentiation between system products in the market passes to the software. On another path, we will see hardware-differentiated, FPGA-based systems diverge from the crowd.

This divergence will accelerate because of two emerging technologies: 3D ICs and heterogeneous programming systems. 3D IC technology will allow integration of ICs of radically different technologies—for example, FPGA, microprocessor, DRAM, and radio-frequency—into a stack without the inter-chip timing and power costs of separate ICs. An early example of this trend is the Intel Atom E6x5C Series, which integrates an Atom CPU with an Altera FPGA. The Atom provides an industry-standard architecture for software, while the FPGA provides the ability to create application-specific accelerators and interface controllers.

The E6x5C Series also illustrates the need for the second emerging technology, a heterogeneous programming environment. Ideally, system developers could begin by just writing and debugging software for one CPU. Then the development platform would assist them in identifying critical code segments, assigning tasks across multiple CPU cores with shared caches, and creating hardware accelerators for critical code kernels. In this way the design team would refine the design until it met timing and power requirements.
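
The first step of that flow, identifying critical code segments, might look something like the following. This is a minimal sketch, assuming the development platform has already measured how long each segment runs; the function name `critical_segments`, the 80 percent default, and the timing figures in the example are all hypothetical.

```python
def critical_segments(profile_ms, share=0.8):
    """Return the slowest segments that together cover `share` of total run time.

    profile_ms maps a segment name to its measured time in milliseconds.
    """
    total = sum(profile_ms.values())
    # Slowest segments first, so the candidates for acceleration come out on top.
    ranked = sorted(profile_ms.items(), key=lambda item: item[1], reverse=True)
    chosen, covered = [], 0.0
    for name, ms in ranked:
        if covered >= share * total:
            break
        chosen.append(name)
        covered += ms
    return chosen

# Made-up measurements for the surveillance example.
profile = {"control": 2.0, "edge_enhance": 30.0, "object_match": 60.0, "logging": 8.0}
hot = critical_segments(profile)
```

The segments returned are the ones worth spreading across CPU cores or turning into hardware accelerators. After each change the team would measure again and repeat, which is the refinement loop described above.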

An example of such a development environment is the OpenCL-FPGA project now under way at Altera. The objective of this work is a single environment in which system developers would create a program in a dialect of C, isolate compute-intensive kernels, generate parallel hardware engines to accelerate the kernels, and integrate the resulting hardware-software systems.

Conclusion

Driven by increasing silicon integration, convergence is gradually pulling all the major electronic blocks of a system into one package, in the process robbing system developers of much of their ability to differentiate their end products. But FPGAs, while looking superficially more and more like ASSPs and microcontrollers, are actually enhancing their ability to allow system developers to differentiate in hardware. The emerging technologies of 3D ICs and heterogeneous development environments will only speed this separation of FPGA-enabled system-level ICs from the traditional microelectronics world.