Understanding SoC Functional Verification Metrics

System-on-a-chip (SoC) functional verification involves integrating multiple intellectual property (IP) blocks. Accordingly, understanding how to define, measure, correlate, and analyze appropriate IP and system-level metrics is fundamental to improving performance and achieving quality goals.

But there’s a dilemma. The use of metrics today is too often limited to simple coverage measurements. In this article, we show how to take a broader view. We describe the four key aspects of any metrics-driven process: understanding the landscape; categorizing the metrics selected for a given project; making appropriate use of the simulation resources that collecting most metrics requires; and reporting data from metrics in a way that’s manageable.

Understanding The Landscape

One way to manage the breadth and volume of metrics is to initially organize them into high-level areas of focus and then provide controls enabling the measurement of relevant areas of interest. Metrics can be organized as test-specific, user-specific, and project-specific.

The relevance of a given metric changes along with the execution of various simulations, each of which focuses on a specific design area. Some simulations, for example, may run stimuli focused on specific system components, while others may spread activity across a broad set of components.

When specific components are targeted, the associated metrics (likely the IP-specific, stimulus, checking, and simulation metrics) are usually relevant. When simulation activity is spread across an entire SoC, low-level metrics within specific components are likely to be less relevant, while higher-level metrics (perhaps measuring bus and application programming interface activity) may be more relevant.

Designers can use a conceptual checklist matrix to measure the completeness of a specific test (Fig. 1). The matrix provides the user with a method of determining if a specific simulation ran at the appropriate levels of abstraction, or if multiple tests were run concurrently with a specific irritator. The conditions measured within an actual checklist matrix would be design-specific, yet this simple example hints at the power of graphical analysis.

1. By mapping multiple complex IP blocks (top)—here, containing a coherent cache and memory subsystem—to a checklist matrix, you can develop a test-specific analysis of the subsystem (bottom).
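As a rough sketch of the idea, a checklist matrix can be held as a table of tests against conditions. The condition and test names below are hypothetical placeholders, not taken from Figure 1; a real matrix would use design-specific conditions.

```python
# Hypothetical checklist matrix: one row per test, one column per condition.
# All condition and test names are illustrative assumptions.
CONDITIONS = ["rtl_abstraction", "cache_coherency_traffic", "irritator_active"]


def build_matrix(observations):
    """Map each test name to a {condition: bool} row, defaulting to False."""
    return {
        test: {cond: cond in seen for cond in CONDITIONS}
        for test, seen in observations.items()
    }


def missing_conditions(matrix, test):
    """Return the conditions that a given test did not satisfy."""
    return [cond for cond, met in matrix[test].items() if not met]


# Conditions observed in each run (made-up example data).
observations = {
    "cache_stress": {"rtl_abstraction", "cache_coherency_traffic", "irritator_active"},
    "mem_smoke": {"rtl_abstraction"},
}
matrix = build_matrix(observations)

# Print a simple text rendering: X = condition met, . = condition not met.
for test in matrix:
    row = " ".join("X" if matrix[test][c] else "." for c in CONDITIONS)
    print(f"{test:12s} {row}")
```

A test with an empty `missing_conditions` list ran under every condition the matrix tracks; any other test shows at a glance which conditions still need a run.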

Alternatively, a user may want to change the relevance of metrics at different points within a project. For example, enabling metrics in specific areas can augment information provided by traditional checkers and monitors. In addition, the user will want to enable metrics that reveal how the environment was constructed and initialized, as well as how it is running.

A number of metrics may be used at the project level to measure productivity and progress, including simulation time, build information, farm execution, and broad measurements of the environment (e.g., system under test stimulus, checkers, and abstraction). Such measurements are generally relevant for all simulations and are likely to be enabled on all runs as a way of capturing and monitoring the overall project.

One example of a project-specific metric is the time a regression takes for each IP block. For example, Figure 2 shows the regression run time for the Coherent Cache IP block depicted in Figure 1. Notice the sudden spike in regression time in week 17. This might be caused by design or coding issues associated with a recent modification to this particular IP block. Investigating the spike promptly allows us to respond to issues before they get out of hand.

2. After the regression run for the Coherent Cache IP block of Figure 1, the next logical step is to investigate the spike seen in week 17.
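One possible way to flag such a spike automatically is to compare each week's regression time against the median of the preceding weeks. The sketch below assumes a four-week window and a 1.5x threshold, and the weekly numbers are made up for illustration; they are not the data behind Figure 2.

```python
from statistics import median


def find_spikes(durations, window=4, factor=1.5):
    """Return (index, value) pairs where a value exceeds `factor` times
    the median of the preceding `window` values."""
    spikes = []
    for i in range(window, len(durations)):
        baseline = median(durations[i - window:i])
        if durations[i] > factor * baseline:
            spikes.append((i, durations[i]))
    return spikes


# Illustrative regression times in hours for weeks 13 to 18 (made-up numbers).
weeks = [13, 14, 15, 16, 17, 18]
hours = [10.0, 10.5, 9.8, 10.2, 19.0, 10.4]

for i, value in find_spikes(hours):
    print(f"week {weeks[i]}: {value} h looks like a spike")
```

Using the median rather than the mean keeps one earlier outlier from inflating the baseline, so the week after a spike is not misjudged.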

Categorization Of Metrics

Metrics can be organized in a general, high-level approach or by way of a more focused grouping of measurements through categorization. In system simulation, an integrated IP block may measure some aspect of low-level functional coverage, which is only expected to be covered in an IP-level simulation. Categorization can be used to identify the most interesting metrics and to disable others.

Categorization can be used to improve the performance of a metrics-driven process in several areas:

Allow specific concerns to be addressed: Complex system-verification environments often lead to use of various classes of simulations. One example is the choice of abstraction level. Higher-abstraction-level simulations are generally used to allow for faster simulations with less accuracy. This approach can be useful for testing some higher-level concepts or performing long simulation runs involving firmware.

However, the reduction in accuracy may require disabling a whole group of metrics. Once again, categorization can be used to specify a group of metrics that should be disabled during simulation.
Improve regression efficiency: If most regression runs pass with only occasional failures, then it may be useful to have the regression environment first run with most or all metrics disabled, and then decide whether to rerun the failing tests with metrics enabled. If failure types are classified, then the regression environment may be able to look up the desired metrics categories for the particular failure type and rerun the test automatically. This approach may provide a reasonable tradeoff between regression efficiency and the engineering need to debug the failure.
Allow a team to package information with an IP through categorization: IP designers can have specific knowledge that they may want to package up for use at the system level. Metrics categorization can help with this goal by allowing the designers to correlate metrics to system-level requirements. For example, the designers might add a category of functional coverage to the packaged IP that they consider important at the system level. This category is likely a subset of the full functional coverage metrics.

Similarly, IP-specific performance metrics are not likely to be useful at the system level. These metrics could be disabled through appropriate categorization. Furthermore, a category of metrics might be used to control how multiple IP blocks behave at various integration or abstraction levels.
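The two-pass regression flow described above could be sketched as follows. The failure types, category names, and result format are all hypothetical; a real environment would take them from its own failure classifier and metrics infrastructure.

```python
# Hypothetical mapping from a classified failure type to the metric
# categories worth enabling on the rerun. All names are illustrative.
RERUN_CATEGORIES = {
    "protocol_error": ["bus_coverage", "checker_activity"],
    "timeout": ["performance", "simulation"],
}
# Fallback categories for failures that have not been classified.
DEFAULT_CATEGORIES = ["functional_coverage"]


def plan_reruns(results):
    """Given first-pass results (run with metrics disabled), return a
    {test: [categories]} plan that covers only the failing tests."""
    plan = {}
    for test, outcome in results.items():
        if outcome["passed"]:
            continue  # passing tests are not rerun
        failure_type = outcome.get("failure_type")
        plan[test] = RERUN_CATEGORIES.get(failure_type, DEFAULT_CATEGORIES)
    return plan


# Made-up first-pass results.
first_pass = {
    "dma_burst": {"passed": True},
    "cache_evict": {"passed": False, "failure_type": "timeout"},
    "irq_storm": {"passed": False, "failure_type": "unclassified"},
}
print(plan_reruns(first_pass))
```

Passing tests never pay the cost of metric collection, while each failing test is rerun with only the categories likely to help debug its failure type.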

Runtime Control

The team members generally understand the goals of each particular simulation. For example, some simulation runs focus on specific design areas, while regressions explore the random stimulus space and check for correct behavior.

Because metrics can require considerable simulation resources, it’s useful to enable only the metrics that help reach the specific goal. When a simulation is focused on one area of the environment, it is reasonable to enable metrics from that area and to disable most others. For regressions, coverage metrics may be the most important, followed by performance metrics.

For performance reasons, and to reduce the likelihood that changing a monitor alters simulation behavior, it is important to provide the user with a runtime mechanism to turn on metrics without having to recompile the design. Runtime mechanisms can be used to enable and disable categories of metrics at the start of the simulation or at a specified time during it. This can improve an environment’s performance and random stability.

However, just enabling a monitor can affect a design’s random stability if not done carefully. Open Verification Methodology (OVM) and Universal Verification Methodology (UVM) are structured to minimize the likelihood of an instantiation change resulting in a randomization difference.
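A minimal sketch of such a runtime mechanism, written in Python for illustration only: metric categories are switched by command-line options with an optional `@time` suffix, so nothing needs recompiling. The option names, the `category@time` syntax, and the `MetricControl` class are assumptions made for this example; they are not part of OVM, UVM, or any simulator.

```python
import argparse


class MetricControl:
    """Tracks which metric categories are enabled at a given simulation time."""

    def __init__(self):
        self._schedule = []  # list of (time, category, enabled) entries

    def schedule(self, category, enabled, at_time=0):
        self._schedule.append((at_time, category, enabled))

    def is_enabled(self, category, now):
        state = False  # categories stay off unless explicitly enabled
        # Replay entries in time order; the latest one at or before `now` wins.
        for at_time, cat, enabled in sorted(self._schedule, key=lambda e: e[0]):
            if cat == category and at_time <= now:
                state = enabled
        return state


def parse_controls(argv):
    """Parse hypothetical options such as: --enable coverage --disable perf@500"""
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable", action="append", default=[])
    parser.add_argument("--disable", action="append", default=[])
    args = parser.parse_args(argv)
    control = MetricControl()
    for enabled, specs in ((True, args.enable), (False, args.disable)):
        for spec in specs:
            category, _, when = spec.partition("@")
            control.schedule(category, enabled, int(when or 0))
    return control


control = parse_controls(
    ["--enable", "coverage", "--enable", "perf@100", "--disable", "perf@500"]
)
for now in (50, 200, 600):
    print(now, control.is_enabled("coverage", now), control.is_enabled("perf", now))
```

Here the `coverage` category is on for the whole run, while `perf` is collected only between times 100 and 500.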

Reporting

Plotting metrics over time is one way to determine progress and direction within a project. This approach can be particularly useful to check progress against a schedule or to determine the effectiveness of specific verification methods. The simplest reports might track a single measurement, such as the bug-open rate, the bug-closure rate, or the percentage of tests written.

A slightly more complex report might include the correlation of multiple metrics, plotted over time. Again, the idea is to choose a group of metrics which, when analyzed together, provide a useful view into the project.

Code and functional-coverage reports fit into this category. Coverage is generally measured as the ratio of covered lines of code to total lines of code (or of covered functional coverage points to total points, in the case of functional coverage). Plotting that ratio over time reveals a trend; in the best case, the trend shows coverage steadily increasing.
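As a small illustration, the sketch below takes coverage to be the percentage of items covered out of the total number of items, and checks whether a series of snapshots trends upward. The weekly snapshot values are made up.

```python
def coverage_percent(covered, total):
    """Coverage as the percentage of covered items out of all items."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * covered / total


def is_non_decreasing(values):
    """True if each value is at least as large as the one before it."""
    return all(b >= a for a, b in zip(values, values[1:]))


# Illustrative weekly snapshots: (covered items, total items). Made-up numbers.
# The total grows in week 3 because new cover points were added.
snapshots = [(400, 1000), (550, 1000), (700, 1100), (880, 1100)]
trend = [coverage_percent(c, t) for c, t in snapshots]

status = "improving" if is_non_decreasing(trend) else "regressing"
print([round(p, 1) for p in trend], status)
```

Tracking the total alongside the covered count matters: when new cover points or code are added, the percentage can dip even though verification effort has not slipped.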

Metrics can be used for other reasons. For example, a manager might query the database of metrics to determine the performance of either the simulator or the device under test (DUT). Alternatively, a manager might query the database of metrics to determine the effectiveness of specific verification components. Here are a few examples of queries that one might encounter on a typical SoC verification project:

Query test-specific example: In the case of test-specific queries, a verification engineer might be interested in knowing if a group of tests designed to cause specific interactions between blocks within the DUT actually worked. A metrics query may be the simplest solution, particularly if multiple data points are needed. As an example, a query can be used to determine if a specific test, when run at a specific integration level in which several specific blocks were instantiated, caused a specific cover point to be hit. This type of unique and specific query may be used as a part of determining the completion status of a regression, but it is unlikely to be of interest as a general trend of the kind discussed above.
Query simulation performance example: Although simulation efficiency can be measured in many different ways, the most straightforward way is as cycles per second (provided the bus frequencies within the DUT are constant). Studying cycles per second can help detect the introduction of inefficient code when you log the frequencies across IP blocks as they migrate from standalone environments into subsystem integrations. Making simulation performance a criterion for revision control system check-ins can reduce system simulation time. In larger systems, a more-interesting performance measurement may be productivity per simulation cycle, which can be captured through the number of tests, checks, and cover points achieved per cycle. One could also track the number of RTL bugs caught per cycle, per category of stimulus or per regression. Such reports reveal the effectiveness of the verification environment and longer-term trends when plotted over time.
Query architectural performance example: Tests can help confirm that specific DUT operations perform as expected and look at the performance of a group of operations. However, by definition, any test can only measure DUT performance within the confines of that test.
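The first two queries above might look like the following sketch, which uses an in-memory SQLite database. The schema, table names, test names, and all values are hypothetical; an actual metrics database would have its own layout.

```python
import sqlite3

# Hypothetical schema and made-up data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (
    run_id INTEGER PRIMARY KEY, test TEXT, integration_level TEXT,
    sim_cycles INTEGER, wall_seconds REAL
);
CREATE TABLE run_blocks (run_id INTEGER, block TEXT);
CREATE TABLE cover_hits (run_id INTEGER, cover_point TEXT, hits INTEGER);
""")
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?)", [
    (1, "cache_stress", "subsystem", 2_000_000, 400.0),
    (2, "cache_stress", "ip", 500_000, 50.0),
])
conn.executemany("INSERT INTO run_blocks VALUES (?, ?)", [
    (1, "coherent_cache"), (1, "memory_ctrl"), (2, "coherent_cache"),
])
conn.executemany("INSERT INTO cover_hits VALUES (?, ?, ?)", [
    (1, "snoop_during_evict", 3), (2, "snoop_during_evict", 0),
])


def cover_point_hit(conn, test, level, blocks, cover_point):
    """True if some run of `test` at `level`, with all of `blocks`
    instantiated, hit `cover_point` at least once."""
    placeholders = ",".join("?" for _ in blocks)
    sql = f"""
        SELECT COUNT(*) FROM runs r
        JOIN cover_hits c ON c.run_id = r.run_id
        WHERE r.test = ? AND r.integration_level = ?
          AND c.cover_point = ? AND c.hits > 0
          AND (SELECT COUNT(DISTINCT b.block) FROM run_blocks b
               WHERE b.run_id = r.run_id
                 AND b.block IN ({placeholders})) = ?
    """
    params = [test, level, cover_point, *blocks, len(set(blocks))]
    return conn.execute(sql, params).fetchone()[0] > 0


def cycles_per_second(conn, run_id):
    """Simulation efficiency of one run: simulated cycles per wall-clock second."""
    cycles, seconds = conn.execute(
        "SELECT sim_cycles, wall_seconds FROM runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    return cycles / seconds


print(cover_point_hit(conn, "cache_stress", "subsystem",
                      ["coherent_cache", "memory_ctrl"], "snoop_during_evict"))
print(cycles_per_second(conn, 1), cycles_per_second(conn, 2))
```

Keeping test, integration level, instantiated blocks, and cover-point hits in one queryable store is what makes such one-off questions cheap to answer. The same tables also yield cycles per second for each run, which can be compared as an IP block moves from standalone to subsystem simulation.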

Summary

Metrics enable users to see and measure day-to-day design activity. More useful, however, are the measurements over time revealing trends and progress (Fig. 3). The time horizon can be anywhere from days (to see the immediate progress of the design, verification environment and project) to weeks (an interval over which the project can be tracked against the schedule) to months (generally enough time for tracking, measuring and improving large-scale project elements, including teams, methodology, and tool productivity and effectiveness).

3. A project metrics-driven verification dashboard shows trends and progress over time, which can be more revealing than just seeing day-to-day activity via basic metrics.

Only by knowing the current state of verification is it possible to determine what to improve in the future, or whether an earlier change has caused the desired improvement. Gut feelings, impressions, and intuition can be effective in small projects where a few people have a good understanding of the entire design. However, in larger SoC projects that involve multiple, complex IPs, no one person has a view that encompasses the entire project.

As a result, the intuition of one or even several people may not accurately portray the state of the project. Metrics can be used to provide a quantitative measure of the state of a project and permit comparisons, analysis and corrections to be made.

Metrics can also be used to catch inefficiencies when they are first introduced into a system. Without these metrics, it may be weeks before anyone notices that a small change in an IP block has caused a significant slowdown in the system environment. By then, it could be difficult to determine which change caused the deterioration.

By providing a quantitative assessment of IP quality and efficiency, metrics can track productivity by component or system and at a point in time or as a trend. This view into the system is the basis for productivity improvements and for on-the-fly detection and correction of issues with the design or verification environment.

Additional Notes

This article is largely drawn from a longer paper we co-authored for DVCon 2012, “Metrics in SoC Verification”. That paper covers in more detail what is driving change in SoC verification generally and the various metrics that can be collected in work on SoCs. What differentiates it from prior work is our focus on metrics issues specifically concerning SoC design and verification.

Of course, discussion of metrics is hardly new. Applying metrics to quantitatively improve a process is a fundamental component within the iconic (at least for computer science types) Capability Maturity Model, or CMM, a framework for assessing and improving software processes originally developed by Carnegie Mellon University and the Software Engineering Institute.¹ For hardware verification, coverage is one metric that has been used for years.

The book Functional Verification Coverage Measurement and Analysis provides an excellent overview and taxonomy of various coverage measurements.² In addition, the book Metric Driven Design Verification provides an introduction to metrics-driven processes in hardware design and verification.³

References

1. Carnegie Mellon University and Software Engineering Institute, The Capability Maturity Model: Guidelines for Improving the Software Process, Addison-Wesley, 1995.
2. Piziali, A., Functional Verification Coverage Measurement and Analysis, Kluwer Academic Publishers, 2004.
3. Carter, H., Hemmady, S., Metric Driven Design Verification: An Engineer’s and Executive’s Guide to First Pass Success, Springer, 2007.