The rise of affordable microprocessors has placed an immense amount of computing capacity into mainstream digital design. To drive these complex devices, software has grown dramatically in size, while elaborate interface logic, usually captured in large ASICs, is required to support the intricate communications between the CPU and the surrounding hardware. This growth in software and hardware complexity is straining existing design practices, especially system verification. System verification now accounts for more than 40% of the overall design cycle, an unacceptable situation given ever-shrinking market windows. And yet, verification is the crucial factor in maximizing the likelihood of first-time success.
Until recently, the only really viable hardware/software (HW/SW) integration strategy was to bring the two components together after the hardware was built and prototyped. True, limited ways were available to merge a system's hardware and software before physical integration, but they were just that, limited. For example, hardware teams at Northern Telecom tried to model their designs using a set of component models and their connectivity, while a bus-functional model represented the microprocessor or controller.
A bus-functional model, however, does not model the microprocessor's complete behavior, only the different bus cycles the processor can execute. Using these models, the hardware designers constructed a test that would, for example, write to and then read from each of the memory components in the design. Clearly, this fell far short of what was needed for comprehensive verification.
On the software side, the teams required a fully functional model of the processor to execute software on a simulated design. However, writing a program that completely emulates the behavior of a complex processor is an enormous undertaking. To obtain such a model, the software team might use a device called a hardware modeler: a machine that contains much of the circuitry of a semiconductor tester and is interfaced to a hardware simulator. Modeling the processor in this manner usually yields speeds of 1 to 10 instructions per second on the simulated design, which is obviously much too slow to execute and verify a meaningful amount of software.
The only realistic alternative for system integration was to wait for the hardware prototype. Unfortunately, delaying integration that long doesn't give a design team much time to address the numerous performance issues that usually surface. Fearful of missing critical delivery dates, teams are tempted to fix hardware problems in the software, compromising functionality or performance goals. Consequently, Northern Telecom, like many other design teams, has been actively exploring new ways of validating system design.
Deciding To Coverify
What became painfully obvious was that to fix system problems without incurring the time and expense of changing the hardware, the problems had to be discovered prior to the hardware-prototype stage. In other words, the software must be run on the hardware while it is still in simulation, as a virtual prototype.
This has been the dream for years, but two things were necessary before virtual prototyping could become a reality. First, hardware had to be simulated at speeds sufficient to make software execution feasible. In most cases, this means that overall simulation performance must be increased by a factor of at least 1000 over the current execution speeds of hardware-oriented simulation products. Second, the debugging and development environments for the hardware and software had to be brought closer together, so that the original source form for both could be maintained within a single, unified debugging environment.
In the past few years, the underlying technology to support a true coverification environment has emerged and matured. Commercial solutions for coverification are now finally available that enable HW/SW integration earlier in the design cycle. Because these approaches create a virtual test and integration environment, software and hardware teams can now work together from the beginning. This eliminates time-consuming back-end integration and testing, helping designers to uncover problems earlier in the design process where they are less costly and easier to fix.
Moreover, because the design is still fluid at this stage, functional changes can be made where they make the most sense, either in hardware or software. Although HW/SW coverification is relatively new to embedded-systems designers, it is rapidly becoming an integral part of mainstream electronic system design.
Eager to adopt coverification in its design flow, the Northern Telecom design team decided to test the viability of HW/SW coverification using an embedded, digital-phase-locked-loop (DPLL) design targeted at switching applications. The decision was influenced by the software group's strong need to have access to hardware earlier in the design cycle: to test and adjust complicated algorithms, determine control constants, verify HW/SW interfaces, and conduct performance modeling. To evaluate the HW/SW interface of the DPLL design and the DPLL software algorithm within a coverification environment, the design team used the Seamless Co-Verification Environment (CVE) from Mentor Graphics.
The DPLL design was selected because it was relatively simple, yet still proved the concept. While the hardware content of this subsystem is relatively small, the software aspect is key to providing clock synchronization within a complex telecom switching environment. This design had already been created using the traditional approach: designing the hardware and software in parallel, and integrating the two only after a hardware prototype was available. Using a preexisting design to validate the coverification solution would afford the opportunity to directly compare the effectiveness of the two approaches, providing a "proof of concept."
Using the traditional approach, after the hardware team sends an ASIC design off to be manufactured, it might be as long as 11 weeks before the software team can access the ASIC prototype. With coverification, however, the software group calculated it would obtain access up to nine weeks ahead of the traditional methodology. This is even more impressive considering that approximately two weeks of preparation was required to ready the design for Seamless, because no Verilog netlist was available from the board schematic database.
The design implements a DPLL low-pass filter. It reads an error signal from the phase comparators, computes a digital-to-analog-converter (DAC) value, and writes that value to the DAC. The low-pass filter uses the Munter algorithm in software.1 The DPLL performs five main functions (Fig. 1). Each function occurs at a periodic interval; the intervals may differ, but each is an integer multiple of a common base interval. In addition to the DPLL algorithm, various error-detection routines and processor-specific code (interrupt handlers, timer code, etc.) must be run.
The major hardware building blocks of the DPLL subsystem are an MC68307 microprocessor, two 128 k by 8 SRAMs, and a custom ASIC. One of the principal functions of the ASIC is to measure the phase error between multiple 8-kHz reference-frame pulses, and the frame pulse generated from the local 32.768-MHz voltage-controlled crystal oscillator. If the generated pulse exactly matches the reference-frame pulse rate, the value clocked into the latch will always be the same. Typically, however, there are slight frequency variations in the reference-frame pulse, resulting in an increase or decrease in the value latched. The change in this phase number is the phase error between the two signals, and is measured as a fraction of a frame period. Each phase comparator is memory-mapped so the processor can read it directly.
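A memory-mapped latch read of this kind can be sketched in C. The base address, comparator count, and latch width below are assumptions for illustration only; the article does not give the sync ASIC's actual memory map.

```c
#include <stdint.h>

/* Hypothetical values -- the real memory map and comparator count of the
 * sync ASIC are not given in the article. */
#define PHASE_COMP_BASE  0x00C00000u
#define NUM_PHASE_COMPS  4u

/* Each phase-comparator latch is memory-mapped, so the 68307 reads it
 * with an ordinary volatile access. */
static inline uint16_t read_phase_comparator(unsigned n)
{
    volatile uint16_t *pc = (volatile uint16_t *)PHASE_COMP_BASE;
    return pc[n];
}

/* The phase error is the change in the latched value between two
 * successive samples; modular 16-bit arithmetic handles wraparound
 * across a frame boundary. */
static int16_t phase_error(uint16_t prev, uint16_t curr)
{
    return (int16_t)(uint16_t)(curr - prev);
}
```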
On the software side, the DPLL software hosted on the 68307 is complex. Without getting mired in the details, it is important to understand the two basic timing parameters the software uses. First, PC_INTERVAL is the time between samples of the phase comparator, measured in milliseconds. Second, DPLL_INTERVAL is the interval at which a new value is written to the DAC, also measured in milliseconds. The DPLL algorithm runs in a continuous loop.
The DPLL_INTERVAL must be greater than or equal to the PC_INTERVAL, ensuring that a new DAC value is computed only once per set of phase-comparator values. In addition, the DPLL_INTERVAL must be an integer multiple of the PC_INTERVAL, and that multiple cannot exceed 16. This limit stems from a defined data-structure size, which holds only 16 samples from each sync source.
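These interval constraints can be captured in a small C check. The function and parameter names are illustrative; only the rules themselves come from the description above.

```c
/* The data structure holds only 16 samples per sync source, which caps
 * the DPLL_INTERVAL / PC_INTERVAL ratio. */
#define MAX_SAMPLES 16

/* Returns 1 if the interval pair is legal, 0 otherwise. Both intervals
 * are in milliseconds. */
static int intervals_valid(unsigned pc_interval_ms, unsigned dpll_interval_ms)
{
    if (pc_interval_ms == 0)
        return 0;
    /* DPLL_INTERVAL >= PC_INTERVAL: a new DAC value is computed at most
     * once per set of phase-comparator samples. */
    if (dpll_interval_ms < pc_interval_ms)
        return 0;
    /* DPLL_INTERVAL must be an integer multiple of PC_INTERVAL... */
    if (dpll_interval_ms % pc_interval_ms != 0)
        return 0;
    /* ...and the multiple may not exceed the 16-sample limit. */
    if (dpll_interval_ms / pc_interval_ms > MAX_SAMPLES)
        return 0;
    return 1;
}
```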
The DPLL loop is initiated by an interrupt from the sync ASIC. At least once during this period, but usually up to the limit of 16, all the phase comparators will be read and stored in the data structure. A new DAC value is then computed and written, with the software incrementing to the next available data structure.
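As a rough sketch, the loop described above might look like the following C fragment. The hardware accessors and the filter are placeholders (the real design reads memory-mapped latches and runs the Munter low-pass filter), and all names and counts are assumed for illustration.

```c
#include <stdint.h>

#define NUM_PHASE_COMPS 4   /* assumed number of reference sources */
#define MAX_SAMPLES     16  /* data structure holds 16 samples per source */

static uint16_t samples[NUM_PHASE_COMPS][MAX_SAMPLES];
static unsigned sample_idx;
static uint16_t last_dac_value;

/* Placeholder for a memory-mapped phase-comparator latch read. */
static uint16_t read_phase_comparator(unsigned n)
{
    return (uint16_t)(n + sample_idx);
}

/* Placeholder for the Munter low-pass filter: here it merely averages
 * comparator 0's samples -- the real algorithm is far more involved. */
static uint16_t compute_dac_value(unsigned count)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i < count; i++)
        sum += samples[0][i];
    return (uint16_t)(sum / count);
}

/* Placeholder for the write to the memory-mapped DAC. */
static void write_dac(uint16_t v)
{
    last_dac_value = v;
}

/* Invoked on each sync-ASIC interrupt, i.e. once per PC_INTERVAL.
 * ratio = DPLL_INTERVAL / PC_INTERVAL, an integer from 1 to 16. */
void pc_interval_tick(unsigned ratio)
{
    for (unsigned n = 0; n < NUM_PHASE_COMPS; n++)
        samples[n][sample_idx] = read_phase_comparator(n);

    if (++sample_idx >= ratio) {      /* DPLL_INTERVAL boundary reached */
        write_dac(compute_dac_value(ratio));
        sample_idx = 0;               /* move on to the next set of samples */
    }
}
```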
After the DPLL design was completely described in a Verilog netlist, and a Seamless model for the MC68307 was incorporated, the next step was to map the processor memory to the simulation memory instances in the hardware description. At this point, any stimulus required by the hardware simulator had to be defined.
Beyond this, little stimulus was necessary, because the processor model executes an unmodified version of the application code written in C/C++ and/or assembly. The overhead of defining stimulus was reduced to specifying clocks, external interrupts, and/or serial data streams.
Once the design was ready for coverification, the hardware designers could run unmodified application code compiled directly from the software engineers' sources, and the software engineers could test their software on a virtual prototype. The coverification environment provided a high level of design insight, as the designers were able to execute and track software on the virtual prototype. On the DPLL design, for instance, code execution could be followed in the debugger while a trace-and-list monitor in the hardware simulator displayed the bus activity. Breakpoints in either environment controlled simulation execution.
A powerful capability not readily available when dealing with a physical prototype is the ability to control and observe the internals of the sync ASIC during coverification. With access to the internals of the RTL description and all the phase comparator registers in the processor's memory map, the designers could directly monitor, and even modify the contents for "what-if" fault analysis using a software debugger tool. Because an RTL representation was used for coverification, any software integration issues could be addressed in the description prior to synthesis and silicon fabrication.
Running the coverification without optimizations, with the default DPLL_INTERVAL of 64 ms, took approximately 7.5 hours. Obviously, this was not productive for debugging larger amounts of software. Worse, if any problems were uncovered, they would have to be fixed and the system reverified from scratch.
To improve performance, the design team collapsed the stimulus to run the loops one right after another. This reduced the time to boot and run five DPLL_INTERVAL loops to 68 minutes. To improve wall-clock time further, the team took advantage of several patented optimizations in the Seamless kernel that can dramatically increase performance. With its instruction-set simulator, for example, the tool can trace breakpoints, register contents, and pointers, as well as modify variables in the context of an original source-code view. When all the optimizations were applied, the DPLL_INTERVAL loop collapsed from 1.7 ms of simulation time to approximately 300 ns, just long enough to execute the one hardware write cycle to the ASIC with the new DAC value.
Conditional breakpoints were used to apply the optimizations dynamically during the DPLL_INTERVAL. Without the optimizations, the boot sequence plus one DPLL_INTERVAL loop required 3.69 ms of simulation time, or roughly 26 minutes of wall-clock time. Using conditional breakpoints to control the optimizations, the same run time yielded 29 complete DPLL_INTERVAL loops (Fig. 2).
The coverification effort was very successful. First, it proved that the software and hardware can be reconciled months before the hardware is available, saving precious time and engineering effort. Second, using coverification enabled the software team to identify all of the major problems they had uncovered during the actual hardware integration. This point, more than anything else, convinced the team that coverification is now a viable alternative.
This test case shows that coverification has progressed beyond just being a tantalizing concept. With today's sophisticated coverification environments, the technique has become a truly realistic and powerful approach that can help mainstream embedded design teams produce challenging applications on aggressive schedules.
1. Munter, E., "Synchronization Clock for DMS-100 Family," IEEE Transactions on Communications, vol. COM-28, no. 8, August 1980.