Reuse-Driven Methods Can Help Optimize Systems

The escalating complexity of electronic-product design is focusing attention on significant gaps in the methodology and technology for designing complex system-chips and chip sets. New methods and tools are needed to help designers confirm critical architectural decisions, such as the partitioning of system functionality between hardware and software, early in the design of both first-generation and derivative products. This capability will allow system companies and silicon vendors to optimize product specifications, shorten development cycles, and capitalize on the increased capabilities offered by multimillion-gate integrated circuits.

The need for new techniques and tools for the design of consumer products is heightened by the significant and rapid changes in the marketplace. Some of these changes are:

Not only are product design times and lifetimes shrinking rapidly, but the market windows that define a product's success have also become fixed in time. If your product is not on the shelves by Christmas, you probably should not have started the design in the first place.
As a direct result, there is no room for product iterations caused by implementation errors, because missing a market window usually means product death.
System companies are increasingly focused on their core technological and market competence and are therefore more open to bringing in complementary expertise.
Product consumerization emphasizes the creation of product derivatives and the addition of small amounts of customization to differentiate products.
Products must conform to complex interoperability standards, either de jure (for example, type approval in communications markets) or de facto (by cable companies in the set-top box market).

The impact of these changes on design methodology is profound:

System-level decisions made very early in the design cycle determine the cost, performance and viability of the product.
Creating a "virtual prototype" of the product is vital to guarantee acceptance at type approval or product qualification.
Engineers must evaluate, combine, integrate, and verify predesigned virtual components (often referred to as intellectual property) in order to meet design deadlines. Virtual components are needed in both the software and hardware domains.

The broad need for a new approach to hardware (HW) and software (SW) codesign is evidenced by the wide range of electronic products that use embedded controllers. In consumer electronics, these include CD players, single-chip PCs, and video games; in telecommunications, telephone switches, cellular phones, and high-speed modems. Multimedia applications include digital cameras and set-top boxes, and automotive uses include engine controllers and anti-lock brake controllers.

The Current Approach

Current approaches to virtual-component-based design are of limited use for complex system-chip architectures. They address mainly the hardware aspects of design, and they rely on highly detailed models, which are not suitable for rapid trade-off analysis at the system level.

For example, at the register transfer level (RTL)—the level at which many so-called "soft" virtual components are defined—architectures must be fully articulated or elaborated, with all signals instantiated on all blocks, all block pins defined, and in most cases, a full on-chip clocking and test scheme defined. Furthermore, designs at RTL have completely defined communications mechanisms.

This makes it very hard to change the on-chip control structures and communications mechanisms between blocks. Therefore, it is very risky at this stage to apply an architectural change, both because such a change will be very time-consuming, and because it might jeopardize the project schedule. When unavoidable, an RTL architectural change can only be applied by "ripping-up" and "re-routing" of the communications mechanisms, and rewiring of any new or substituted functional blocks to the communications structures.

A related effect is that it is difficult to "drop-in" or substitute a virtual component with another choice. Because dropping in a new microcontroller core requires detailed "rip-up" and "re-route" to link the block to the communications structure, it is extremely hard to effectively explore virtual component alternatives.

Furthermore, designs captured in RTL code mix both behavioral and architectural design together. Often, the only model of a virtual component block function is the synthesizable RTL code, which represents the implementation of the function. Similarly, the only model of a SW function may be the C or assembly language implementation of the function. This "intertwining" of behavior and architectural components together makes it extremely difficult to evolve the behavior of the design and its architectural implementation separately.

Finally, verifying the behavior of an embedded HW-SW system at RTL requires a nearly complete hardware design and nearly complete software code for the hardware interface (drivers), part of the RTOS, and the layered application(s).

Coverification at this level is clearly not fast enough to verify complete system application behavior in an HDL/C simulation environment. If major application problems are found during cosimulation, a time-consuming and tedious redesign process is required to repair the design. Repartitioning is also difficult because the communications infrastructure will require detailed redesign. In addition, substituting programmable architectural virtual components, such as new processors or controllers, or replacing part of the software implementation with custom hardware accelerators, requires significant changes to the application software.

The net result of today's limited methodology is that it is almost impossible to effectively explore the behavior and architecture as efficiently as required for modern system-chip designs. As a result, system partitioning and design is often done with manual, back-of-the-envelope techniques, and carries an inherent risk that major problems will emerge during downstream implementation and integration.

Providing solutions that overcome the major limitations in today's methodology and tools requires moving up to higher levels of design abstraction: to the "architectural" and "behavioral" levels. In this environment, the system designer first captures and verifies the functional behavior of the entire system at a pure behavioral level. This step relies heavily on the reuse of existing behavioral libraries and algorithmic fragments (see the figure).

At this level, behavioral virtual components are instantiated with simple connections to abstract views of communications mechanisms. At the highest level of abstraction, communications can be described as moving frames, packets, or tokens between function blocks over channels.
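To make the token-and-channel view concrete, here is a minimal Python sketch of a behavioral model in which three blocks communicate only through abstract FIFO channels. The names `Channel`, `source_block`, `scale_block`, and `sink_block` are hypothetical, and the blocks are simply run once in sequence rather than scheduled by a dataflow simulator, which a real behavioral environment would typically provide.

```python
from collections import deque
from typing import Deque, List


class Channel:
    """Abstract point-to-point channel: an unbounded FIFO of tokens."""

    def __init__(self) -> None:
        self._fifo: Deque[int] = deque()

    def put(self, token: int) -> None:
        self._fifo.append(token)

    def get(self) -> int:
        return self._fifo.popleft()

    def __len__(self) -> int:
        return len(self._fifo)


def source_block(samples: List[int], out: Channel) -> None:
    for s in samples:              # emit one token per input sample
        out.put(s)


def scale_block(inp: Channel, out: Channel, gain: int) -> None:
    while len(inp):                # consume every available token
        out.put(inp.get() * gain)


def sink_block(inp: Channel) -> List[int]:
    return [inp.get() for _ in range(len(inp))]


def run_pipeline(samples: List[int], gain: int) -> List[int]:
    a, b = Channel(), Channel()    # blocks see only channels, never each other
    source_block(samples, a)
    scale_block(a, b, gain)
    return sink_block(b)


print(run_pipeline([1, 2, 3], gain=4))  # [4, 8, 12]
```

Because each block touches only its channels, any block can be swapped for another with the same token interface without rewiring the rest of the model.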

The next step within this environment is to evaluate different potential target architectures, which carry the behavior—the product functionality—after implementation. The architectural virtual components can be classified into several categories: processors (control-dominated or signal-processing-dominated), custom function blocks (MPEG decoders, filter blocks, etc.), memories, peripheral controllers, buses, and others.

The architectural abstractions must be easy to capture, evolve, and change. As discussed earlier, this requires that such abstractions remove the fully elaborated detail that is not necessary for first- and second-order architectural exploration and evaluation. Cycle- and pin-accurate simulation models are fundamentally too slow for exploring architectural alternatives; instead, exploration can be accomplished through performance analysis techniques that enable trade-off analysis of first- and second-order architectural decisions.

Once the system behavior and the target architectures are defined, they are kept distinctly separate. The next step is to map behavioral functions and communications arcs onto the architectural resources. Only by avoiding the "intertwining" of behavior and architecture is efficient design-space exploration possible (for example, a "sweep" over potential target architectures).
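The separation of behavior, architecture, and mapping can be sketched as three independent data structures, which is what makes a sweep cheap: only the architecture changes from run to run. In this Python sketch, all function names, resource names, workloads, and throughputs are hypothetical, and the performance model is deliberately crude (each resource executes its mapped functions serially, resources run in parallel, and the busiest resource sets the frame time).

```python
from typing import Dict, List

# Behavior: function -> work per frame in millions of operations (hypothetical)
BEHAVIOR: Dict[str, float] = {"parse": 2.0, "idct": 12.0, "motion": 18.0, "display": 1.0}

# Mapping: function -> architectural resource (kept separate from both)
MAPPING: Dict[str, str] = {"parse": "cpu", "idct": "accel", "motion": "accel", "display": "cpu"}

# Candidate architectures: resource -> throughput in millions of operations/s
CANDIDATES: List[Dict[str, float]] = [
    {"cpu": 100.0, "accel": 600.0},
    {"cpu": 200.0, "accel": 600.0},
    {"cpu": 100.0, "accel": 1200.0},
]


def frame_time_ms(behavior: Dict[str, float], arch: Dict[str, float],
                  mapping: Dict[str, str]) -> float:
    """Resources run in parallel; each executes its mapped functions serially."""
    load = {resource: 0.0 for resource in arch}
    for fn, work in behavior.items():
        load[mapping[fn]] += work
    # Mop * 1000 / (Mop/s) = milliseconds; the busiest resource sets the frame time
    return max(load[r] * 1000.0 / arch[r] for r in arch)


# Sweep: the same behavior and mapping evaluated on every candidate architecture
for arch in CANDIDATES:
    print(arch, frame_time_ms(BEHAVIOR, arch, MAPPING))
```

With these numbers, doubling the processor clock leaves the frame time at 50 ms because the accelerator is the bottleneck, while doubling the accelerator brings it down to 30 ms; that is the kind of first-order insight a sweep is meant to expose.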

The basic system performance can be described on an abstract level by characterizing the speed at which blocks process tokens or run software, and the delay involved in transferring tokens between blocks over communications mechanisms. The equations describing block performance are called delay equations.
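As an illustration of what such delay equations might look like, the Python sketch below characterizes a block by cycles per token on a resource of a given clock rate, and a communication link by a fixed setup latency plus a size-over-bandwidth term. These particular equation forms, the class names, and the numbers are assumptions chosen for clarity; resource contention is ignored here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockModel:
    name: str
    cycles_per_token: int   # characterized processing cost per token
    clock_mhz: float        # clock of the resource the block is mapped to

    def delay_ns(self) -> float:
        # cycles * 1000 / MHz = nanoseconds
        return self.cycles_per_token * 1000.0 / self.clock_mhz


@dataclass
class LinkModel:
    bytes_per_token: int
    bytes_per_ns: float     # sustained channel throughput
    setup_ns: float         # fixed per-transfer latency

    def delay_ns(self) -> float:
        return self.setup_ns + self.bytes_per_token / self.bytes_per_ns


def chain_latency_ns(blocks: List[BlockModel], links: List[LinkModel]) -> float:
    """Latency of one token through a chain, ignoring resource contention."""
    if len(links) != len(blocks) - 1:
        raise ValueError("a chain of n blocks needs n - 1 links")
    return sum(b.delay_ns() for b in blocks) + sum(link.delay_ns() for link in links)


blocks = [BlockModel("filter", 200, 100.0), BlockModel("decode", 50, 50.0)]
links = [LinkModel(64, 0.5, 20.0)]
print(chain_latency_ns(blocks, links))  # 3148.0
```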

Since shared resources, such as buses and processors, are contended for, they must be modeled in the appropriate delay equations, which need to invoke models of resource contention. Furthermore, it is desirable to estimate software performance on target computing architectures automatically, and to derive hardware performance automatically, either through an estimation process or by back annotation from implementation tools such as behavioral synthesis.
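One simple way a delay equation can invoke a contention model is sketched below in Python for a single shared bus. The first-come, first-served arbitration policy is an assumption for illustration (real buses often use fixed-priority or round-robin arbiters), and `Transfer` and `fcfs_bus` are hypothetical names. Each transfer's effective delay becomes its waiting time plus its uncontended service time.

```python
from typing import List, NamedTuple, Tuple


class Transfer(NamedTuple):
    master: str
    arrival_ns: float
    service_ns: float   # uncontended transfer time from the link delay equation


def fcfs_bus(requests: List[Transfer]) -> List[Tuple[str, float, float]]:
    """Serialize transfers on one shared bus in arrival order.

    Returns (master, start_ns, finish_ns); start - arrival is the contention delay.
    """
    busy_until = 0.0
    schedule = []
    for req in sorted(requests, key=lambda r: r.arrival_ns):
        start = max(req.arrival_ns, busy_until)   # wait while the bus is busy
        finish = start + req.service_ns
        schedule.append((req.master, start, finish))
        busy_until = finish
    return schedule


reqs = [Transfer("cpu", 0.0, 100.0), Transfer("dma", 50.0, 80.0), Transfer("dsp", 300.0, 40.0)]
for master, start, finish in fcfs_bus(reqs):
    print(master, start, finish)
# cpu 0.0 100.0
# dma 100.0 180.0
# dsp 300.0 340.0
```

Here the DMA transfer is delayed 50 ns by the processor's transfer, a cost that an uncontended delay equation would miss entirely.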

The codesign approach described above enables the exploration of system architectural alternatives via performance analysis of speed, power, and cost. It takes into account the impact of mapping certain functionality to hardware (for example, peripherals and standard components) and to software architectural elements (for example, processors). The system architecture, behavior, and mapping are iterated and analyzed to identify the optimal architecture and the associated partitioning between the hardware and software subsystems.
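A toy version of this partitioning trade-off can be written as an exhaustive search: try every HW/SW assignment and keep the smallest hardware area that still meets a latency deadline. The Python sketch below assumes, for simplicity, that functions execute one after another, and it omits power; the function names and timing and area figures are hypothetical. Exhaustive search grows as 2^n, so it only suits a handful of functions.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# function -> (sw_time_us, hw_time_us, hw_area_kgates); hypothetical figures
FUNCS: Dict[str, Tuple[int, int, int]] = {
    "fft": (400, 40, 120),
    "viterbi": (300, 30, 90),
    "crc": (50, 5, 10),
    "control": (80, 60, 200),
}

Partition = Tuple[int, int, Dict[str, str]]  # (area, latency, assignment)


def best_partition(funcs: Dict[str, Tuple[int, int, int]],
                   deadline_us: int) -> Optional[Partition]:
    """Try every HW/SW assignment; keep the smallest area meeting the deadline."""
    names = list(funcs)
    best: Optional[Partition] = None
    for choice in product(("SW", "HW"), repeat=len(names)):
        latency = area = 0
        for name, side in zip(names, choice):
            sw_t, hw_t, hw_area = funcs[name]
            if side == "HW":
                latency += hw_t        # function runs on a dedicated accelerator
                area += hw_area        # ...at the cost of extra silicon
            else:
                latency += sw_t        # function runs on the processor
        if latency <= deadline_us and (best is None or area < best[0]):
            best = (area, latency, dict(zip(names, choice)))
    return best


print(best_partition(FUNCS, deadline_us=500))
# (120, 470, {'fft': 'HW', 'viterbi': 'SW', 'crc': 'SW', 'control': 'SW'})
```

If no assignment meets the deadline, the function returns `None`, which signals that the candidate architecture itself, not just the partition, has to change.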

Bridging the Gap

The codesign process described above results in decisions about the partitioning between hardware and software and the mapping of system behavior onto a target architecture. However, the system-level virtual components used during codesign operate on abstract tokens, frames, or packets, or step through control and computational sequences under software control.

Therefore, the target architecture must be further refined into detailed micro-architectures that are implemented in both hardware and software domains. For example, memories are mapped onto actual implementations, communications interface blocks chosen, and interface (glue) control hardware defined.

To preserve the freedom to evaluate different behaviors and architectural alternatives, communication details have been kept at a fairly abstract level up to this point. Now, communication refinement must begin.

This refinement is the process of mapping communication arcs in the behavioral hierarchy to architectural resources, and decomposing the resulting communication blocks down to the pin-accurate level. For behavioral blocks mapped to the same software processor, interface decomposition down to the real-time operating system (RTOS) interface level is sufficient.

If the user has selected standard hardware components or a standard RTOS within the architecture specification, then these selections constrain the decomposition process on the behavioral side to match the actual interfaces within the architecture.

At this level, the mapped behavior described above is extended to model the communication mechanisms within the system architecture. For two communicating behavioral blocks mapped to hardware, the modeling is done at the bus transaction level. For example, the user will see transactions such as: write(501), read(), burst-write(53, ...), irq(1,5). The token types transmitted are those directly supported by the hardware bus.
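A transaction-level bus model can be as simple as one method call per bus transaction, with a trace that later serves as a reference for cycle-accurate verification. The Python sketch below is hypothetical: the class name, the `(address, value)` signature for `write`, the word size, and the reading of `irq`'s two arguments as an interrupt line and level are all assumptions, since the transactions above show only the call shapes. Tokens are plain integer words, matching the types a bus carries directly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class TransactionBus:
    """Hypothetical transaction-level bus model: one method call per bus transaction."""
    word_bytes: int = 4
    mem: Dict[int, int] = field(default_factory=dict)
    trace: List[Tuple[str, int, int]] = field(default_factory=list)
    pending_irqs: List[Tuple[int, int]] = field(default_factory=list)

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value
        self.trace.append(("write", addr, value))

    def read(self, addr: int) -> int:
        value = self.mem.get(addr, 0)          # unwritten locations read as 0 here
        self.trace.append(("read", addr, value))
        return value

    def burst_write(self, addr: int, values: Sequence[int]) -> None:
        for i, value in enumerate(values):     # consecutive word addresses
            self.mem[addr + i * self.word_bytes] = value
        self.trace.append(("burst_write", addr, len(values)))

    def irq(self, line: int, level: int) -> None:
        self.pending_irqs.append((line, level))
        self.trace.append(("irq", line, level))


bus = TransactionBus()
bus.burst_write(0x100, [7, 8, 9])
bus.write(0x200, 501)
assert bus.read(0x104) == 8
bus.irq(1, 5)
for entry in bus.trace:
    print(entry)
```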

Shared resources within the implementation (processors, buses, etc.) can be modeled abstractly via resource models, which are instantiated when the performance simulation interprets the delay equations.

For software-to-hardware or hardware-to-software communication, modeling at the bus transaction level is again required. For software-to-software communication when the two software behaviors are mapped to the same processor, modeling of the communication interface with the RTOS is sufficient, since there is no need to model at a lower level of abstraction. The transaction types in this case might be: wait(), read(), lock(), unlock(), or emit().
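For two software tasks on the same processor, the communication model reduces to RTOS primitives. In the Python sketch below, host threads stand in for RTOS tasks, and `threading.Lock` and `threading.Condition` stand in for an RTOS mutex and event; the `RtosMailbox` class and its methods are hypothetical, chosen to mirror the emit, wait, read, lock, and unlock transaction types rather than any particular RTOS API.

```python
import threading
from collections import deque
from typing import Any, Deque, List


class RtosMailbox:
    """Hypothetical RTOS-level channel between two software tasks on one CPU."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._event = threading.Condition(self._mutex)
        self._queue: Deque[Any] = deque()

    def emit(self, token: Any) -> None:
        with self._event:              # lock() ... unlock()
            self._queue.append(token)
            self._event.notify()       # wake a task blocked in read()

    def read(self) -> Any:
        with self._event:              # lock() ... unlock()
            while not self._queue:
                self._event.wait()     # wait(): releases the mutex while blocked
            return self._queue.popleft()


def producer(box: RtosMailbox, n: int) -> None:
    for i in range(n):
        box.emit(i * i)


def consumer(box: RtosMailbox, n: int, out: List[Any]) -> None:
    for _ in range(n):
        out.append(box.read())


box = RtosMailbox()
received: List[Any] = []
tasks = [threading.Thread(target=consumer, args=(box, 5, received)),
         threading.Thread(target=producer, args=(box, 5))]
for t in tasks:
    t.start()
for t in tasks:
    t.join()
print(received)  # [0, 1, 4, 9, 16]
```

Nothing below the mailbox (bus cycles, registers, interrupts) needs to be modeled, which is why this level of abstraction suffices when both behaviors share a processor.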

The refined target architecture is then passed forward for hardware and software implementation and coverification. Information forwarded from the codesign environment about hardware and software would consist of the following elements:

At the top level, a hardware description language (HDL) file with references to all the virtual components and the full pin-accurate wiring information (for example, all signals referenced, including I/O pads, test buses, self-test structures, and parameters).
Synthesizable RTL HDL blocks invoked from the top level, or executable software code that implements the communications structure (for example, the bus interface).
A test bench that helps validate at the cycle-accurate level the assumptions made at the performance analysis level.
Memory image descriptions for each software implementation with specific information on physical memory location, size, and load address.
A memory map with no overlaps, including a fully defined interrupt vector and DMA trigger table.
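Two of the forwarded items above, the overlap-free memory map and the memory image load addresses, lend themselves to simple automated checks. The Python sketch below shows one way to do this; the region names, base addresses, and sizes are hypothetical, and interrupt vector and DMA trigger tables are not modeled.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    name: str
    base: int
    size: int

    @property
    def end(self) -> int:
        return self.base + self.size       # exclusive end address


def find_overlaps(regions: List[Region]) -> List[Tuple[str, str]]:
    """Return name pairs of regions whose [base, end) ranges intersect."""
    clashes = []
    ordered = sorted(regions, key=lambda r: r.base)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.base >= a.end:
                break                      # sorted by base: later regions cannot overlap a
            clashes.append((a.name, b.name))
    return clashes


def image_fits(region: Region, load_addr: int, image_size: int) -> bool:
    """Check that a software memory image lies entirely inside its region."""
    return region.base <= load_addr and load_addr + image_size <= region.end


memory_map = [
    Region("boot_rom", 0x0000_0000, 0x1_0000),
    Region("sram", 0x2000_0000, 0x2_0000),
    Region("uart", 0x4000_0000, 0x100),
    Region("dma_ctrl", 0x4000_0080, 0x100),   # deliberately overlaps uart
]
print(find_overlaps(memory_map))                       # [('uart', 'dma_ctrl')]
print(image_fits(memory_map[1], 0x2000_0000, 0x8000))  # True
print(image_fits(memory_map[0], 0xF000, 0x2000))       # False
```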

Moving beyond today's RTL-based methodology for system-chip design requires the development of a new reuse-driven codesign methodology and the delivery of new tools and technologies to support it.

The above description outlines the key characteristics of such a new methodology, and work is underway on new technology that will allow designers to leverage HW and SW virtual component reuse and integration at the system level. These capabilities are key enablers for achieving the 100X productivity improvement required if we are to truly realize the advantages of advanced systems-on-a-chip.