In today's system-on-a-chip (SoC) world, there is an increasing need to bring system-level verification into the design process as early as possible. Teams that succeed in integrating their systems, in tandem with the development of silicon, will win the time-to-market battle by capturing essential, early design wins.
Every SoC device includes a processing element, which introduces embedded software modules into the system verification environment. Meaningful system verification requires the integration of those software modules into the overall product verification environment.
This article explores an innovative approach to coupling the embedded software development environment, with its high-speed execution and familiar debugging tools, to an ASIC emulator. Through loose coupling, both tools achieve maximum performance, while maintaining the functional integrity of the system. The use model for each tool is also preserved.
The method used to accomplish this loose coupling is called functional coverification. Figure 1, the example for this discussion, is a high-level block diagram of the hardware/software system. The embedded microcontroller and its bus-interface hardware are assumed to be functionally correct, and are not the subject of this verification project. They are replaced by a virtual communications conduit connecting the embedded software system to the ASIC hardware-under-test.
In hardware emulation, one can considerably reduce the emulation gate counts by not modeling processor circuitry. Note, however, that an instruction-set simulator may be used to act as the processor in the software verification tool.
Multiple Tools

To better understand a loosely coupled verification system, one must have some appreciation of the coverification problem, as well as the techniques used for synchronizing disparate tools. Verification tools include event-driven simulators, instruction-set simulators, standalone behavioral models, ASIC emulators, and embedded software simulators. Whenever several of these tools are employed in a large system simulation, with each one modeling some portion of that system, there is an inherent synchronization problem. The models running on one tool may run far faster or slower than the components running on other tools, and keeping them properly synchronized is a must for maintaining functional integrity.
For example, assume that an event-driven simulator (Verilog or VHDL) is modeling an I/O port at 20 cycles per second (cps), and the driving software is running on an instruction-set simulator at 1 Mcps. Suppose that, in the real application, the processor and port both run at 100 Mcps. If the software writes a command to the output port, which produces a result in a status register on the next clock, it is safe in the real system to access the status port soon after issuing the command, without checking whether the status is indeed ready.
But in this simulation, the design breaks. The status port is accessed before it is ready, because the hardware is running 50,000 times slower than the software. The software system runs erroneously ahead, creating this artificial problem. Clearly, if both verification tools ran their models at the same speed, either at 20 cps or 1 Mcps, the problem would be avoided. When the tools naturally run at different speeds, as in this case, some cross-tool mechanism must be used to keep them in sync. Several techniques are available: time synchronization, cycle synchronization, functional synchronization, or some variation of these.
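The fragment below is a minimal sketch of the difference between the fragile access pattern and an interlocked one of the kind advocated later in this article. The register addresses, names, and bit definitions are illustrative assumptions, not taken from any particular design.

```c
#include <stdint.h>

/* Hypothetical memory-mapped registers of the I/O port under test.
 * Addresses and bit definitions are assumptions for illustration.     */
#define CMD_REG      (*(volatile uint32_t *)0x40000000u)
#define STATUS_REG   (*(volatile uint32_t *)0x40000004u)
#define STATUS_READY 0x1u

/* Unsafe: relies on the hardware completing within one clock. It works
 * when CPU and port both run at 100 Mcps, but breaks when the emulated
 * hardware lags the instruction-set simulator.                         */
uint32_t read_result_unsafe(uint32_t cmd)
{
    CMD_REG = cmd;
    return STATUS_REG;           /* may be sampled before it is valid */
}

/* Interlocked: the software waits until the hardware says it is done,
 * so the design works regardless of the relative tool speeds.          */
uint32_t read_result_interlocked(uint32_t cmd)
{
    CMD_REG = cmd;
    while ((STATUS_REG & STATUS_READY) == 0)
        ;                        /* functional synchronization point   */
    return STATUS_REG;
}
```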
Time synchronization maintains a common time base across all tools. It is extremely compute-intensive, and rarely allows performance of more than a few hundred cycles per second, even when simulation accelerators or emulators are used. In the above example, time synchronization would indiscriminately slow the software system to 20 cps, even when the code being executed doesn't access the I/O system. That is an unacceptable solution.
With cycle synchronization, no tool can advance to the next clock cycle until all the tools have completed the current cycle. This technique supports higher speeds, in the 1000 to 10,000 cps range. As with any cycle verification technique, however, problems arise when asynchronous behavior is present (multiple clock domains, unclocked logic, etc.) across the interface. And, performance is still inadequate.
The functional synchronization technique, by comparison, features free-running underlying tools, with no tool-based synchronization. For the verification to maintain integrity (no one tool runs erroneously ahead in verification time), the system-under-test itself must maintain functional interlocks. Because synchronization occurs only when necessary, as determined by the system-under-test, this technique offers the highest performance potential, with speeds approaching the maximum for each tool. No common-time base is maintained across the tools.
The system being verified must provide the functional interlocks. Thus, careful attention is given to finding the correct interfaces at which to partition the system among the tools. The interfaces between the hardware running on the emulator and the software running on the workstation are one example. Ideally, the selected interfaces are already interlocked in the design, or can easily be modified to add the required handshaking, which also results in a more robust design.
Thus, it seems that functional synchronization with emulation offers the best solution for synchronizing these disparate tools. One of the first steps in applying this technique is to find the proper functional interface. Fortunately, functional interfaces naturally exist between hardware and software subsystems. They occur at the I/O port level, where software commands are transformed into hardware transactions. These interfaces are designed to be at least partially interlocked by the chip designer, because response times between the software and hardware modules, in actual applications, cannot be precisely predicted by either component.
For example, assume that the software writes two words to a data port in response to an interrupt from the hardware. If the hardware assumes that the first write occurs exactly n1 clocks after the interrupt, and the second word is written exactly n2 clocks later, the design is defective. Software systems do not offer that precision. However, it is reasonable, and often necessary, to place a maximum limit on the response time.
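A short sketch of the software side of this example appears below. Each write is gated by an explicit handshake bit rather than a fixed clock count; the register names, addresses, and data values are assumptions made purely for illustration.

```c
#include <stdint.h>

/* Illustrative register map; names and addresses are assumptions.      */
#define DATA_PORT   (*(volatile uint32_t *)0x40000010u)
#define FIFO_STATUS (*(volatile uint32_t *)0x40000014u)
#define FIFO_SPACE  0x1u      /* set when the port can accept a word   */

static const uint32_t response[2] = { 0xCAFE0001u, 0xCAFE0002u };

/* Interrupt handler: responds to the hardware request with two words.
 * It makes no promise about how many clocks separate the writes; each
 * one waits on the handshake bit, so only a maximum response time
 * needs to be budgeted in the hardware design.                         */
void data_request_isr(void)
{
    for (int i = 0; i < 2; i++) {
        while ((FIFO_STATUS & FIFO_SPACE) == 0)
            ;                 /* handshake gates each write            */
        DATA_PORT = response[i];
    }
}
```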
The hardware-level timing at the bus transfer level is always fully interlocked by design. However, at the application level, assumptions are made in the design about the relative performance of the subsystems. Thus, the communication path is partially interlocked.
When the ASIC emulator and software execution platforms run at nearly the same relative speeds as their modeled components do in the real system, these partially interlocked interfaces should be adequate. But, if either operates substantially faster or slower than the other, even a partially interlocked interface may fail. One of the tools would then have to be artificially slowed to better match the other.
As shown in Figure 1, the embedded microcontroller usually communicates with the ASIC logic via a standard I/O bus. This bus can be viewed at different levels of abstraction (see the table).
One level of abstraction, layer 1, defines electrical bus behavior. Here, multiple events happen in sequence for each data transfer, and most of them are not interlocked. It is assumed that the master and slave are synchronized to the same clock, and running the same protocol. Clearly, this is not the level where an intertool virtual bus should be modeled in functional coverification. The software system is not cycle-synchronized to the hardware emulation tool.
Layers 2 and 3, on the other hand, define the function to be accomplished, rather than how it should be achieved. Both define asynchronous operations, and are at least partially interlocked. Layer 2 (data transfer) is the best for coverification. It defines simple data transfer commands. Layer 3, in contrast, contains added system complexity, which complicates the underlying modeling of the virtual bus connection.
In the actual interconnection of the two tools, the master bus controller is implemented as a bus-functional model (BFM) in the emulator (Fig. 2). The BFM implements layer 1 of the bus model, which is required to interface to the ASIC ports. The BFM, however, simply takes commands sent from the software system, and expands them into bus waveforms. Because it is modeled in the emulator, it maintains clock-cycle synchronization with the ASIC logic-under-test.
The functional interface connects to the 32-bit I/O card in the workstation, which operates at the data transfer level. Note that the interface model has a buffer memory. This allows for the modeling of burst transfers to and from the ASIC. The I/O driver software system can write a buffer of data to the memory, and then trigger a burst write onto the bus.
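The fragment below sketches what a layer-2 command and a burst write through the buffer memory might look like from the driver's point of view. The structure layout and the io_card_* primitives are hypothetical, standing in for whatever transport the I/O card's driver actually provides.

```c
#include <stdint.h>

/* A possible layer-2 (data transfer) command format carried over the
 * virtual bus to the BFM in the emulator. Field names are assumptions. */
typedef enum { XFER_READ, XFER_WRITE, XFER_BURST_READ, XFER_BURST_WRITE } xfer_op_t;

typedef struct {
    xfer_op_t op;        /* kind of transfer to expand into bus cycles  */
    uint32_t  address;   /* starting ASIC bus address                   */
    uint32_t  length;    /* number of words (1 for single transfers)    */
} xfer_cmd_t;

/* Assumed primitives provided by the workstation's I/O-card driver.    */
extern void io_card_load_buffer(const uint32_t *data, uint32_t words);
extern void io_card_send_cmd(const xfer_cmd_t *cmd);

/* Burst write: stage the data in the interface's buffer memory, then
 * issue one command; the BFM replays it as back-to-back bus cycles.    */
void burst_write(uint32_t addr, const uint32_t *data, uint32_t words)
{
    io_card_load_buffer(data, words);

    xfer_cmd_t cmd = { XFER_BURST_WRITE, addr, words };
    io_card_send_cmd(&cmd);
}
```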
Shared memory is also implemented in emulation, under the assumption that the emulated hardware accesses the shared memory more frequently than the software system does, and may even have a non-interlocked, synchronous path to it. Only those memory pages that are accessible to the emulated hardware should be modeled in shared memory, because access to this memory will be slow from the software system. Trying to execute code out of the shared memory, for example, would result in a hopelessly slow verification.
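A sketch of how such a partitioned memory map might be handled on the workstation side is shown below; the address split and the emu_shared_read() primitive are assumptions for illustration only.

```c
#include <stdint.h>

/* Assumed split of the address map: only pages the emulated hardware can
 * touch live in the emulator's shared memory; everything else stays in
 * fast host memory so code never executes over the slow link.            */
#define SHARED_BASE 0x80000000u
#define SHARED_SIZE 0x00100000u   /* illustrative 1-MB shared region     */

extern uint32_t emu_shared_read(uint32_t addr); /* slow, goes to emulator */
extern uint8_t  host_memory[];    /* fast local model of remaining space  */

uint32_t mem_read32(uint32_t addr)
{
    if (addr - SHARED_BASE < SHARED_SIZE)
        return emu_shared_read(addr);             /* routed to emulation  */
    return *(const uint32_t *)(host_memory + addr); /* stays on the host  */
}
```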
There are other more sophisticated techniques for handling shared memory. These involve buffers and coherency algorithms (not part of the actual system, but modeled for verification) that would deliver a higher performance interface for both the hardware and software subsystems.
A separate data stream passes through the ASIC (Fig. 2). This stream represents the additional I/O of the ASIC, and usually is driven from high-level behavioral models using techniques similar to those discussed here. Unlike the embedded software system, however, these models are developed for verification only.
The software system interfaces with the hardware-under-test via programmed I/O operations (through command and status registers), interrupts, and shared memory. I/O and shared memory accesses can be trapped in the software verification tool, converted to I/O commands, and sent to the ASIC emulation environment. This requires that the user develop drivers, invoked by the trapped commands, to do the actual conversions. A polling mechanism can be added to the software execution environment to monitor interrupts from the ASIC hardware.
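The sketch below illustrates the shape such trapped accessors and an interrupt-polling hook might take. The emu_* link primitives and asic_isr() are hypothetical names for the user-developed conversion layer described above.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed link primitives to the emulation environment; the conversion
 * drivers behind them are user-written, as noted above.                */
extern void     emu_io_write(uint32_t addr, uint32_t data);
extern uint32_t emu_io_read(uint32_t addr);
extern bool     emu_irq_pending(void);

extern void asic_isr(void);   /* the embedded software's own handler    */

/* Trapped accessors: the software verification tool invokes these in
 * place of real loads and stores to ASIC addresses, converting each
 * access into an I/O command for the emulator.                         */
uint32_t trap_read(uint32_t addr)              { return emu_io_read(addr); }
void     trap_write(uint32_t addr, uint32_t d) { emu_io_write(addr, d);    }

/* Polling hook called periodically by the execution environment to
 * mirror ASIC interrupts back into the embedded code.                  */
void poll_asic_interrupts(void)
{
    if (emu_irq_pending())
        asic_isr();
}
```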
A significant amount of modeling must be done to realize the system described here. Yet, this system does avoid the pitfalls of targeted emulation, while providing a software-integration platform capable of emulation speeds. Emulation vendors, including IKOS Systems, have skilled staffs of consulting engineers capable of implementing part, or all, of this system.
References:
- Qureshy, N. and Hafeman, D., "A High Performance Multi-tool Hardware Verification Platform," HESDC 1997 Proceedings, pp. 255-267.
- VSIA On-Chip Bus DWG Plan (Members only); http://www.vsi.org.
- Dreike, P. and McCoy, J., "Co-Simulating Software and Hardware in Embedded Systems," Embedded Systems Programming; http://www.espmag.com/97/feat9706.htm.