Designing For Low Power? Get Started At System Level

Your next system-on-a-chip (SoC) design will, in all likelihood, dissipate more power than your last one did. Even if you’ve topped out on core speed, chances are you’ll be adding functionality that will consume additional power. If you’re contemplating a move to a more advanced process technology, the increased leakage alone will cost you in terms of your power budget.

Power is the constraint du jour in chip design, and that’s reason enough to begin taking it much more seriously than ever before. If you’re truly serious about meeting your system power budget, a critical aspect of any portable or consumer design, you may want to begin thinking about power as early as you possibly can, and that means in the architectural definition phase of the project.

Power analysis and optimization at the electronic system level (ESL) isn’t yet part of mainstream design methodologies, but the day is rapidly approaching when it will be. This article will help you get up to speed with ESL power analysis and optimization, including what’s available for optimization of the RTL that you’ll feed into your synthesis flow.

WHY START ON POWER AT ESL?

The reasons for optimizing for power at ESL are many, but the simplest explanation is that you’ll get the biggest bang for your buck at higher levels of abstraction. “Lots of tools analyze power at RTL, which is good for determining whether you’re in spec,” says Brett Cline, vice president of marketing at Forte Design Systems. “But the changes you can make in RTL are very limited. You only have so many options there.”

Real power optimization happens at architecture level, says Cline. ESL synthesis tools can output RTL that is optimized for power. To be sure, designers can adopt best practices for RTL coding and use techniques such as clock gating to gain power reductions. “But realistically, someone needs to make smart architecture decisions to really reduce power,” says Cline.

“Most of a chip’s power consumption is determined in the architecture, RTL, and synthesis stage,” says Tom Sandoval, CEO of Calypto Design Systems. “Once synthesis has created a logic structure, the effects of subsequent changes on power are going to be small and incremental. Thus it is important to consider power during the logic design process.”

“Our real perspective is that the most important determiner of power is architecture,” says George Harper, vice president of marketing at Bluespec. “At that level, the extent to which you can explore the space, provide transparency, and facilitate the types of design choices is the best you can do for optimization of power in a chip design.”

Some EDA vendors claim that power savings of as much as 80% can be realized by tinkering at the architectural level. Yet often, they’re not telling you that this figure refers to savings compared with their own non-ESL flow. One can quibble over these numbers and how they are arrived at, but the point remains that you can’t do better later in the flow than you can at ESL (Fig. 1).

Architectural analysis and optimization carry multiple benefits in the power domain. The power savings are realized through optimization of the system architecture, not just for power but also for area and performance. The architecture level is also where designers can best address the partitioning of hardware and software. They can tune the application software to achieve the lowest power consumption and also correlate power requirements with system workload.

The most common approach in use today for high-level power analysis and optimization is the virtual platform, or VP (see “Virtual Platform Technology 101”). A virtual platform is a high-level representation of the system hardware, assembled from models typically written in C/C++ or SystemC.

Among those using ESL virtual platforms are system architects and software developers. Obviously, designers of portable systems are extremely concerned about system efficiency. “That isn’t just about how one particular hardware element might consume power, but how the different use cases of the product interact to consume power. That’s a system-level question, a software question, and a hardware question,” says Pat Sheridan, director of marketing at CoWare. ESL vendors such as CoWare and Carbon Design Systems market tools for the creation of virtual platforms.

Many designers are finding it useful to equip their virtual platforms to help measure power consumption. “Users are instrumenting VPs,” says Frank Schirrmeister, director of product marketing for system-level solutions at Synopsys. This entails using an existing VP in which designers have integrated various intellectual property (IP) blocks, including processor(s) and peripherals. As embedded software developers run their code on the platform, they can examine the power states of the blocks and annotate the VP with power-state information.
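The kind of instrumentation described above can be sketched in a few lines. The following is a minimal illustration, not the API of any vendor's VP environment: the `PowerMonitor` class, the block names, and the per-state power figures are all hypothetical. It charges each block for the time it spends in a power state and totals the energy over a simulated run.

```python
from collections import defaultdict

# Hypothetical per-state power figures in milliwatts. Real values would come
# from a power budget or from measurements on silicon.
STATE_POWER_MW = {
    "cpu": {"off": 0.0, "idle": 5.0, "active": 120.0},
    "uart": {"off": 0.0, "idle": 0.2, "active": 3.0},
}


class PowerMonitor:
    """Accumulates energy per block from power-state changes over simulated time."""

    def __init__(self, state_power_mw, initial_state="off"):
        self.state_power_mw = state_power_mw
        self.state = {block: initial_state for block in state_power_mw}
        self.last_change_s = {block: 0.0 for block in state_power_mw}
        self.energy_mj = defaultdict(float)

    def set_state(self, block, new_state, now_s):
        """Charge the time spent in the old state, then switch states."""
        old_state = self.state[block]
        elapsed_s = now_s - self.last_change_s[block]
        # mW multiplied by seconds gives millijoules.
        self.energy_mj[block] += self.state_power_mw[block][old_state] * elapsed_s
        self.state[block] = new_state
        self.last_change_s[block] = now_s

    def finish(self, now_s):
        """Close out the final interval for every block and return the totals."""
        for block in self.state:
            self.set_state(block, self.state[block], now_s)
        return dict(self.energy_mj)


# Example run: the CPU is active for 2 s, then idles until t = 10 s.
monitor = PowerMonitor(STATE_POWER_MW)
monitor.set_state("cpu", "active", 0.0)
monitor.set_state("uart", "idle", 0.0)
monitor.set_state("cpu", "idle", 2.0)
totals = monitor.finish(10.0)
```

In a real platform, the calls to `set_state` would be triggered by the models themselves as software drives them between states, and the power table would be refined as better characterization data becomes available.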

Texas Instruments has extensively used the Synopsys Innovator VP development environment for power modeling of its Open Multimedia Application Platform (OMAP) family. “We have done power modeling within the platform in which TI and their customers brought up the models. You can see the voltage regions within the chip and what happens if you drive the processors and peripherals to different states of load,” says Schirrmeister. “You can observe the external power-management IC and how it regulates the system voltage.”

The objective here is embedded software development within a low-power context, as well as the optimization of the design. So depending on when these tasks take place in the design cycle, you may be able to organize voltage regions differently. Using VPs in this way is not yet mainstream; for now, it is mostly restricted to advanced design teams.

“The main challenge is not putting the instrumentation within the platform to accumulate power data. That’s well understood,” says Schirrmeister. “Rather, the crux of the problem is the initial characterization of the blocks for power. You can either take your power budget, use those numbers, and refine them later once you have RTL and a layout to simulate, or you can do real chip measurements and annotate that back into the VP for the next project.” According to Schirrmeister, Synopsys customers have been pushing for the company to add more characterization data to its DesignWare TLM library models.

Another notable entry in the VP arena is Mentor Graphics’ Vista platform, which allows for comprehensive architecture design and prototyping. Vista uses scalable transaction-level models and is based on the OSCI TLM 2.0 standard. It enables users to model power at the transaction level using power-estimation policies well in advance of an RTL implementation. Alternatively, users can annotate more accurate power behavior based on attributes of the process technology targeted for implementing the IP blocks.

Scalability in Vista models is accomplished by modeling the block’s core function. This aspect of the model is “golden” and does not change. Then, a separate layer is added to represent a model of timing and power. These aspects of the overall model are related to the architecture, and they can and do change. Finally, a communication layer is added to represent the protocol that’s in play.

The power modeling in Vista is a transaction-level representation of all types of power associated with the block, including static (leakage) power, clock-tree power, and dynamic power on a per-transaction basis. The power model is reactive to incoming traffic and inner logic states. It also supports voltage and frequency scaling.
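To make the three power components concrete, here is a rough sketch of a transaction-level power model. It is not Vista's actual model or API; the `BlockPowerModel` class, its parameters, and the scaling rules are assumptions chosen for illustration. Switching power follows the familiar C·V²·f relationship, while leakage is (simplistically) scaled linearly with voltage.

```python
from dataclasses import dataclass


@dataclass
class BlockPowerModel:
    """Transaction-level sketch: leakage + clock tree + per-transaction energy."""

    leakage_mw_nominal: float          # static power at nominal voltage
    clock_tree_mw_nominal: float       # clock-tree power at nominal V and f
    energy_per_txn_nj_nominal: float   # dynamic energy per transaction at nominal V
    v_nominal: float = 1.0
    f_nominal_mhz: float = 200.0

    def average_power_mw(self, txns_per_s, voltage, freq_mhz):
        v_ratio = voltage / self.v_nominal
        f_ratio = freq_mhz / self.f_nominal_mhz
        # Simplifying assumption: leakage scales linearly with voltage.
        leakage = self.leakage_mw_nominal * v_ratio
        # Clock-tree switching power scales with V^2 * f.
        clock_tree = self.clock_tree_mw_nominal * v_ratio**2 * f_ratio
        # Energy per transaction scales with V^2.
        # nJ * transactions/s = nW; multiply by 1e-6 to convert to mW.
        dynamic = self.energy_per_txn_nj_nominal * v_ratio**2 * txns_per_s * 1e-6
        return leakage + clock_tree + dynamic


# Hypothetical block: 2 mW leakage, 10 mW clock tree, 5 nJ per transaction.
block = BlockPowerModel(2.0, 10.0, 5.0)
nominal_mw = block.average_power_mw(txns_per_s=1e6, voltage=1.0, freq_mhz=200.0)
scaled_mw = block.average_power_mw(txns_per_s=1e6, voltage=0.8, freq_mhz=100.0)
```

With these made-up numbers, dropping from 1.0 V at 200 MHz to 0.8 V at 100 MHz roughly halves the average power for the same traffic, which is the sort of what-if question a transaction-level model can answer before any RTL exists.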

An architectural power-optimization flow that’s associated with Vista enables designers to assemble a transaction-level reference platform and use it to analyze timing and power in a system context. They can modify timing/power policies or the overall platform architecture and quickly iterate optimizations.

TAKING CHIPS’ TEMPERATURES

Power analysis at ESL should include consideration of operating temperature in various use scenarios. “There is a strong correlation between power consumption and temperature,” says Ghislain Kaiser, CEO and co-founder of DOCEA Power. Greater power consumption inherently brings higher temperatures. Higher temperatures can induce more leakage current, which feeds a vicious cycle that can culminate in thermal runaway and failure.
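The feedback loop between leakage and temperature can be illustrated with a simple fixed-point iteration. This is a sketch under stated assumptions, not a description of any vendor's thermal engine: the function name, the single thermal-resistance model, and the rule that leakage doubles every 20°C are all illustrative choices.

```python
def settle_temperature(p_dynamic_w, leak_ref_w, theta_ja_c_per_w,
                       t_ambient_c=25.0, t_ref_c=25.0, leak_doubling_c=20.0,
                       t_limit_c=150.0, max_iters=1000, tol_c=1e-6):
    """Iterate the power/temperature loop until it settles or runs away.

    Returns (temperature_c, total_power_w). Returns None if the temperature
    passes t_limit_c (treated here as thermal runaway) or never settles.
    """
    temp_c = t_ambient_c
    for _ in range(max_iters):
        # Assumed leakage model: doubles every leak_doubling_c degrees.
        leakage_w = leak_ref_w * 2.0 ** ((temp_c - t_ref_c) / leak_doubling_c)
        total_w = p_dynamic_w + leakage_w
        # Steady-state temperature for this power through one thermal resistance.
        new_temp_c = t_ambient_c + theta_ja_c_per_w * total_w
        if new_temp_c > t_limit_c:
            return None
        if abs(new_temp_c - temp_c) < tol_c:
            return new_temp_c, total_w
        temp_c = new_temp_c
    return None


# A modest design settles; a leakier one in a poorer package runs away.
stable = settle_temperature(p_dynamic_w=1.0, leak_ref_w=0.1, theta_ja_c_per_w=20.0)
runaway = settle_temperature(p_dynamic_w=2.0, leak_ref_w=0.5, theta_ja_c_per_w=40.0)
```

In the first case the loop converges a little under 50°C; in the second, each pass raises leakage enough to push the temperature past the limit, so the function reports runaway by returning `None`.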

“That’s why it’s important to make the right decisions early in the flow, taking into account current flow and temperature, especially in wireless applications and portables,” says Kaiser.

In the past, these kinds of problems were most commonly approached through the use of Excel spreadsheets, which can get extremely complex. But that’s not the biggest issue with the approach, says Kaiser. “The main disadvantage of a spreadsheet approach to thermal management is that it’s static,” he says. Temperature analysis and simulation require a dynamic dimension, says Kaiser, without which you miss temporal aspects that threaten your design’s reliability.

DOCEA’s ACEplorer tool is a holistic power and thermal modeling platform based on what DOCEA terms a “separation of concerns” principle (Fig. 2). “We follow the concept of divide and conquer,” explains Kaiser. “With the complexity of the design, it’s better to divide the concerns. They can be timing, functionality, power, planning, and so on. ACEplorer captures and simulates the power behavior of every contributor in the system (digital, analog, I/O) and can analyze virtual platforms, SoCs, SiPs (systems-in-a-package), or IP blocks.”

More importantly, ACEplorer simulates dynamic behavior such as dynamic voltage and frequency scaling (DVFS), IR drop, power/temperature coupling, and complex power-supply efficiency. Thus, it provides a tool for early power and thermal estimations that support technology decisions on packaging, process nodes, and IP selection.

PREPARING FOR SYNTHESIS

Most designers have a laundry list of items to consider when it comes to power optimization in the design creation stage. For one thing, they want a holistic approach, says Calypto’s Sandoval.

“If you are building a networking design, 70% to 80% of the power may be consumed in memory. Conversely, in a computationally heavy graphics chip, it’s the logic that consumes the most,” says Sandoval. “Or, you may have interface chips where it’s the I/O burning power. Further, the tools must be fully automated for maximum efficiency. Designers aren’t all that interested in toying around with scripts.”

Power optimization should be surgical and not impact other aspects of synthesis quality of results (QoR). Designers also want to know what is being done in optimization. They want full transparency into what’s happening so power fixes aren’t detrimental to area or to timing in critical paths.

Moreover, power optimization in pre-synthesis stages should fit into existing flows. “No one wants to change their methodology,” says Sandoval. There should also be a verification flow associated with the optimizations to maintain confidence in the changes being made.

THE IMPACT OF HIGH-LEVEL SYNTHESIS

Many users of high-level-synthesis (HLS) tools are coming to understand the true value of HLS in their flows. Like virtual platforms, HLS enables designers to explore multiple microarchitectures for optimizing QoR (Fig. 3).

In Calypto’s case, the company’s PowerPro MG, which automates memory power optimization, and its PowerPro CG, which performs broader RTL power optimization, will soon include enhancements for designers in the process of hand-tweaking RTL. “We’ll give greater visibility into the RTL for designers who are creating the RTL by hand, either because it’s very high-performance circuitry or because they don’t want tools touching it,” says Sandoval.

Often, the RTL generated by HLS tools cannot be hand-optimized because it’s not readable by humans. “What the designer can do with HLS is build multiple high-level designs to figure out which gets best QoR, but he can’t go in and further modify it,” says Sandoval. Calypto’s PowerPro CG and MG, along with SLEC System-HLS, which independently verifies the HLS results, let designers read that RTL and make sure they get the optimizations they seek (Fig. 4).

Another, possibly more organic, approach is that taken by ChipVision Design Systems, whose PowerOpt tool synthesizes power-optimized RTL directly from either ANSI C or SystemC code. “Since we already understood the power characteristics of the ESL model, we could implement an RTL version that minimized the power dissipation by minimizing both the area of the design, which is tied heavily to leakage, and the switching power, into which designers don’t have insight at RTL,” says Craig Cochran, ChipVision’s vice president of marketing and business development.

PowerOpt should be considered as a front end to RTL synthesis, says Cochran. Starting from C and synthesizing to RTL obviously saves the time and effort of hand-coding the RTL. But there’s the added benefit, says Cochran, of a much better RTL architecture. “When you have the opportunity to try many different architectural options, you end up with a better RTL starting point for the rest of the flow,” says Cochran.

PowerOpt does what all HLS tools do in terms of what-if analysis and rapid generation of multiple RTL architectures. It weighs all options to save area and/or increase speed, such as pipelining and assorted memory architectures. But it adds a focus on power consumption by executing the C code with the user’s C testbench to measure switching activity. It then uses those measurements to guide switching minimization (Fig. 5).

Looking a little further under the hood, PowerOpt’s switching analyses emphasize mission modes and not corner cases to provide a true picture of power consumption in typical system operation. Switching activity data is compiled in a database that is accessed by the HLS engine. During synthesis, the tool compares various possible architectures and decides whether to map functions onto single resources, implement multiple resources, or instantiate pipelining.
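A toy example helps show why measured switching activity can change the decision to share a resource. The sketch below is not PowerOpt's algorithm; the function names and the data streams are hypothetical. It counts bit toggles on operand buses and compares two dedicated resources against a single time-shared one that sees the two data streams interleaved.

```python
def toggles(values, width=16):
    """Count bit flips between consecutive values on a bus of the given width."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(values, values[1:]))


def compare_sharing(stream_a, stream_b, width=16):
    """Toggle counts for two dedicated operand buses vs. one time-shared bus."""
    dedicated = toggles(stream_a, width) + toggles(stream_b, width)
    # A shared resource sees the two streams interleaved cycle by cycle.
    interleaved = [v for pair in zip(stream_a, stream_b) for v in pair]
    shared = toggles(interleaved, width)
    return dedicated, shared


# Two slowly changing but very different streams, as a testbench might produce.
stream_a = [0x0000, 0x0001, 0x0002, 0x0003]
stream_b = [0xFFFF, 0xFFFE, 0xFFFD, 0xFFFC]
dedicated, shared = compare_sharing(stream_a, stream_b)
print(dedicated, shared)  # prints "8 108"
```

Each stream toggles very little on its own, but alternating between them on one shared bus flips nearly every bit on every cycle. Sharing saves area here at a steep cost in switching activity, which is the kind of non-intuitive tradeoff that only shows up when real stimulus is applied.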

“The tool comes up with some interesting tradeoffs,” says Cochran. “Generally, power tradeoffs are not very intuitive to users.” As the tool decides which architectures are most efficient, it looks into whether it should share registers and measures the impact that would have on power.

LIBRARY FOR PIPELINING

Bluespec is another HLS house with a path from high-level code to RTL. In this case, the code is Bluespec SystemVerilog (BSV). In an example of an 802.11a Wi-Fi transmitter design, the design team was able to use Bluespec’s simulator and compiler to implement seven different microarchitectures in just five man-days. Had the team followed the designer’s original intuition for the architecture, the device would have consumed an average of 34.6 mW. However, the architecture that was ultimately implemented consumed about 4 mW, albeit in considerably more area.

To augment its tools’ capabilities, Bluespec plans to offer a library of plug-and-play building blocks for pipelined architectures. The Pipeline Architecture Composers’ Library (PAClib) will enable designers to generate many different pipelined microarchitectures from a single source by simply dialing in parameters. The blocks, predesigned in BSV source code, let users separate the notion of architecture from the blocks’ computation function and from microarchitecture choice.

With the PAClib, designers write a single implementation and adjust parameters in an associated parameter file. “These can be thought of as Lego blocks for datapath pipelines,” says Bluespec’s Harper. “They start from the premise that architecture is the primary driver of power consumption in chip design.” Because the blocks are so flexible in terms of parameters, they offer designers further opportunities for power savings.