Get Better Emulation Results in Less Time

As the sun sets high atop Mauna Kea, the day is just starting for some. Night is the new day at the Keck Observatory, where the sun is but a massive generator of light pollution obscuring the interesting things that can be seen only on dark, cloudless nights.

There aren’t many places like this, so the team that busily readies itself for tonight’s long bout of observations has already spent countless hours and days and months preparing. A telescope is precious and rare. Astronomers sign up for it long in advance. When their allocated time window comes, they have to make the most of it because, when it’s over, it’s over. Someone else’s window is starting and they need to hand over control to the next user.

Their basic operating principle is to get as much data as they can while they have access to the telescope. There’s plenty of time afterward to figure out what that data means. The primary problem is that if, during analysis, they find they needed to capture some other data they didn’t originally think of, then they have to go back and sign up for another session, spending a lot of money to get the missing measurements. That’s why they spend months beforehand preparing to get it right the first time.

From The Mountaintop To The Desktop

While it might seem a comparison between poetry and prose, the beleaguered verification engineer proving out a system-on-chip (SoC) on an emulation platform faces a similar situation. Yes, in theory, you can work with an emulator as if it were simply a really fast simulator. But that’s like saying the equipment atop Mauna Kea is just a bigger version of your backyard telescope. The practical reality is that simulator licenses cost less than, and are therefore more plentiful than, emulators.

As a result, emulators also become a precious resource, and engineers sign up for them and get their window of opportunity before being kicked off by some other engineer who is also trying to meet a tape-out deadline. This use model suggests that, rather than pretending the emulator is a simulator, it might be much more efficient simply to grab all the data you can while you have the machine and then look at the results later.

This is typically referred to as “offline debugging,” and new tools are making it much easier to manage efficiently. The concept is straightforward, but it takes some balance: you want verification runs that conserve emulation time, yet still let you converge on a bug diagnosis quickly.

Imagine a software engineer is checking out his code by running it on the only thing fast enough to do the job—the emulator. When a bug crops up, the engineer tries to see where the issue is and decides (rightly or wrongly) that there is a hardware issue and ships everything over to you, the verification engineer, so you can locate the alleged hardware problem.

At the highest level, your obvious goal is to identify the failure mechanism as quickly as possible. But, of course, it’s not so simple. Debugging is very much a retroactive process. Once you discover a failure, you need to make an educated guess as to the time and location of its source and then go back and look deeper. If you find that you didn’t enable the appropriate assertions or generate waveforms for the right block or time period, you can’t retroactively activate those debugging features. You need another emulation run to enable them and capture the necessary data.

One potential solution would be to enable all available debugging techniques and waveform generation for the full chip from time 0, but this is an impractical use model for emulation. Generating full-chip RTL waveforms from the emulator for an entire application run will cause a major bandwidth bottleneck, effectively erasing the performance benefits of using emulation. Besides, no waveform viewer would even be able to open a file that large.
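A rough back-of-the-envelope calculation shows the scale of the problem. The sketch below assumes one bit per signal per cycle with no compression, and every figure in it (signal count, emulation clock rate, run length) is a hypothetical value chosen for illustration, not a measurement from any particular emulator.

```python
def waveform_dump_bytes(num_signals, clock_hz, seconds, bits_per_sample=1):
    """Raw, uncompressed bytes if every signal is sampled on every cycle."""
    cycles = clock_hz * seconds
    return num_signals * cycles * bits_per_sample / 8


# Hypothetical figures, chosen only to show the orders of magnitude involved.
full_chip = waveform_dump_bytes(num_signals=100_000_000, clock_hz=1_000_000, seconds=60)
one_block = waveform_dump_bytes(num_signals=50_000, clock_hz=1_000_000, seconds=0.01)

print(f"full chip, 60 s run:     {full_chip / 1e12:.0f} TB")   # 750 TB
print(f"one block, 10 ms window: {one_block / 1e6:.2f} MB")    # 62.50 MB
```

Real tools compress waveform data, so actual sizes would be smaller, but the gap between the two cases is what matters: narrowing both the signal set and the time window cuts the volume by many orders of magnitude.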

The asynchronous timing of hardware-software interaction further complicates debugging with an emulator. When a software program requires user input—either as part of how it works or simply because the software engineer is working in debug mode, starting and stopping things and poking here and there to figure out what’s wrong—the hardware clock doesn’t stop. Thus, two runs that look identical to the software will look different in hardware simply because the user responded with the same inputs but with different timing. This can make it very hard to reproduce exactly when a problem crops up in the system. In some cases, the problem may even disappear completely.

Adding Determinism

When trying to debug a failure in emulation, things would go a lot more smoothly if you didn’t have to worry about the hardware timing changing from run to run. If you could ensure the system would behave deterministically, like it does in simulation, then you could always go back and dig deeper without fear of losing the bug. In fact, you can do this with the right preparation early on.

As the engineer coordinating hardware verification, you’re typically the one who sets up the emulator for the software engineers. Their original runs are done based on your configuration. That setup will pay dividends in debugging if it “sniffs,” much as a network capture tool does, the way the design is stimulated throughout the original test. Keeping with the networking analogy, the resultant “frame” can then be used to abstract away the specific user input timing, making subsequent debug sessions cycle-accurate reproductions of the original run.
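The record-and-replay idea can be illustrated with a toy model. In this minimal sketch, the "design" is a stand-in whose final state depends on the cycle at which each input arrives. The function names and the log format (a mapping from cycle number to input value) are assumptions made for illustration and do not represent any emulator's actual capture format.

```python
def run(stimulus, total_cycles):
    """Toy design: the state depends on WHEN each input arrives, not just its value."""
    state = 0
    for cycle in range(total_cycles):
        value = stimulus.get(cycle)
        if value is not None:
            state += value * cycle
    return state


def live_run(inputs, delays, total_cycles):
    """Original run: each input lands after a human-dependent delay (in cycles).

    Returns the final state plus the recorded log of {cycle: value}.
    """
    log = {}
    cycle = 0
    for value, delay in zip(inputs, delays):
        cycle += delay
        log[cycle] = value
    return run(log, total_cycles), log


# Same inputs, different user timing: the hardware sees two different runs.
state_a, log_a = live_run([3, 5], delays=[2, 4], total_cycles=10)
state_b, _ = live_run([3, 5], delays=[3, 4], total_cycles=10)
print(state_a, state_b)          # 36 44

# Replaying the recorded log reproduces the original run cycle for cycle.
print(run(log_a, total_cycles=10) == state_a)   # True
```

Because the replay is driven by recorded cycle numbers rather than by a person at a keyboard, every debug rerun sees the same stimulus at the same time.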

Judicious use of frames can provide yet another benefit. When reproducing events in a failing system, you may still have to do some additional emulation runs. You could save a lot of time if you didn’t have to run the original configuration again in its entirety. For example, if a Linux boot is part of the test run, but the problem occurs well after Linux boots, then you would benefit significantly if your setup allowed you to skip the boot time.

You can create smaller, more frequent frames by saving the state of the system, including the design and memory contents, after key program milestones during your original run (Fig. 1). Each of these frames can then act as a kind of bookmark. You no longer have to wait for that Linux boot-up because you have a picture of the entire system state sometime far after that, and you can start there. If the origin of the problem happens to be earlier than the frame you chose to start with, you can always go back and start from an earlier frame without the fear of losing synchronization between the runs.

1. Frames act like bookmarks that let you reproduce an emulation run near some event of interest, bypassing what might be hours of prior run time.
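Choosing a starting frame amounts to finding the latest bookmark at or before the suspected failure time. The following is a minimal sketch under that assumption. The `Frame` type, the milestone labels, and the cycle counts are all hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    cycle: int   # emulation cycle at which the system state was saved
    label: str   # program milestone the frame corresponds to


def latest_frame_before(frames, target_cycle):
    """Return the most recent frame at or before target_cycle.

    Assumes `frames` is sorted by cycle.
    """
    cycles = [frame.cycle for frame in frames]
    index = bisect.bisect_right(cycles, target_cycle)
    if index == 0:
        raise ValueError("no frame at or before the target cycle")
    return frames[index - 1]


# Hypothetical milestones saved during the original run.
frames = [
    Frame(0, "reset"),
    Frame(2_000_000_000, "linux_boot_done"),
    Frame(2_500_000_000, "app_started"),
]

suspect_cycle = 2_700_000_000
start = latest_frame_before(frames, suspect_cycle)
print(start.label, suspect_cycle - start.cycle)   # app_started 200000000
```

If the root cause turns out to lie earlier, the same lookup with a smaller target cycle returns an earlier frame, at the cost of a longer rerun.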

Achieving Debugging Convergence

Frames give you a more efficient way to reach the point of interest in any needed emulation debug runs, but you still need to produce and analyze the data from those frames to achieve debugging convergence. You will eventually need to generate waveforms, but you still face the potential bandwidth bottleneck as you dump the necessary data.

You can dramatically reduce the bandwidth demand by paring down the number of signals to which you need frequent access from the emulator, while maximizing the relevant information from that set. These critical signals, identified from the verification plan, can be used to raise the level of abstraction in debugging. They can be leveraged in SystemVerilog assertions, Direct Programming Interface (DPI) calls, and transactors, which are all executable in emulation without any significant bandwidth or performance impact.

The higher-abstraction debugging technologies lie dormant in emulation until they are activated, so you can reduce the active set of signals even further by only enabling the checkers that are relevant to the functional areas identified by the software engineer. Leveraging the assertions, checkers, and monitors, you can quickly achieve debugging convergence, isolating the time and location of the failure to a realistic window for waveform generation and logic debugging (Fig. 2).

2. A hierarchical approach to debugging abstracts away details early on, helping zero in on failures.
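The narrowing step can be sketched in a few lines. This is an illustrative model only: the `Checker` record, the functional-area tags, the hierarchy paths, and the fixed margin around the first firing are all assumptions, and a real flow would drive the emulator's own assertion controls instead.

```python
from dataclasses import dataclass


@dataclass
class Checker:
    name: str
    area: str      # functional area from the verification plan
    block: str     # design hierarchy the checker watches
    enabled: bool = False


def enable_for_areas(checkers, suspect_areas):
    """Enable only the checkers tagged with a suspect functional area."""
    for checker in checkers:
        checker.enabled = checker.area in suspect_areas
    return [c for c in checkers if c.enabled]


def waveform_window(firings, margin_cycles):
    """Pick the block and cycle range around the earliest checker firing.

    `firings` is a list of (cycle, checker) pairs.
    """
    first_cycle, checker = min(firings, key=lambda firing: firing[0])
    start = max(0, first_cycle - margin_cycles)
    return checker.block, start, first_cycle + margin_cycles


# Hypothetical checkers and firings for a failure reported in the DMA path.
checkers = [
    Checker("dma_burst_len", "dma", "top.dma0"),
    Checker("dma_desc_valid", "dma", "top.dma0.desc"),
    Checker("usb_handshake", "usb", "top.usb"),
]
active = enable_for_areas(checkers, {"dma"})
firings = [(5000, checkers[0]), (4200, checkers[1])]

print([c.name for c in active])                   # ['dma_burst_len', 'dma_desc_valid']
print(waveform_window(firings, margin_cycles=500))  # ('top.dma0.desc', 3700, 4700)
```

The output is a single block and a window of about a thousand cycles, which is a far more realistic target for waveform generation than the full chip from time 0.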

At the logic debugging level, there’s yet another way to reduce your emulator bandwidth that leverages the greater availability of PC resources. The approach lets you dump a subset of what you want to look at from the emulator. Given that partial dump, offline simulation can fill in the missing pieces.
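To make the idea concrete, suppose the partial dump contains only register values and primary inputs for each cycle. That split is an assumption made here for illustration. Any purely combinational signal can then be recomputed offline from the dumped values. The sketch below does this for a toy 4-bit counter with hypothetical signal names.

```python
def reconstruct(dumped):
    """Fill in combinational signals from a partial per-cycle dump.

    `dumped` holds only the register `count` and the input `en` for each cycle.
    `next_count` and `wrap` are derived offline rather than read from the emulator.
    """
    full = []
    for sample in dumped:
        count, en = sample["count"], sample["en"]
        full.append({
            **sample,
            "next_count": (count + 1) & 0xF if en else count,
            "wrap": int(en and count == 0xF),
        })
    return full


# Partial dump from the emulator: two signals per cycle instead of four.
dumped = [
    {"count": 14, "en": 1},
    {"count": 15, "en": 1},
    {"count": 0, "en": 0},
]
for row in reconstruct(dumped):
    print(row)
```

In this toy case the emulator transfers half the signals and the host computes the rest. In a real design the ratio of derived signals to dumped ones would typically be much higher.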

Emulation is so much faster than simulation that you might expect the addition of simulation-based waveform generation to bog you down, sacrificing much of the benefit of emulation. In fact, adding offline waveform generation helps to maintain emulation’s performance benefits while also using your resources more efficiently.

Incorporating offline waveform generation reduces the amount of data transfer required between the emulator and its host PC, reducing the likelihood of a bandwidth bottleneck. On the simulation side of things, performance benefits from the breaking of big jobs into multiple smaller jobs, which can then be run in parallel on a server farm—freeing up more time on the emulator for others.
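The job-splitting step might look something like the following sketch. It assumes each time window can be expanded independently, for example because the dump supplies the state at the start of every window. Local threads stand in for server-farm jobs, and `expand_window` is a hypothetical placeholder for launching one offline simulation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_windows(start, end, size):
    """Break the cycle range [start, end) into windows of at most `size` cycles."""
    return [(s, min(s + size, end)) for s in range(start, end, size)]


def expand_window(window):
    """Placeholder for one offline simulation job; here it only counts cycles."""
    first, last = window
    return {"window": window, "cycles": last - first}


def expand_in_parallel(start, end, size, workers=4):
    """Run one job per window; pool.map returns results in window order."""
    windows = split_windows(start, end, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(expand_window, windows))


results = expand_in_parallel(start=3700, end=4700, size=300)
print([r["window"] for r in results])
# [(3700, 4000), (4000, 4300), (4300, 4600), (4600, 4700)]
```

None of these jobs needs the emulator, so the machine can be handed to the next user while the waveforms are still being generated.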

These concepts—frames, raising the level of debugging abstraction, and the ability to calculate signals offline—are critical to balancing the need to get as much relevant data as you can from each emulation run, while completing those runs and converging on the source of the problem as quickly as possible.

Getting your answers with fewer, faster emulation sessions isn’t just good for you. It’s good for your other team members too. After all, as you complete your run when your day ends, someone else’s day is just starting.