When stored-program computers were first invented, it didn't take long for people to realize that programmers would spend a large part of their time on debugging. As British computer pioneer Maurice Wilkes recalls in his memoirs, "By June 1949, people had begun to realize that it was not so easy to get a program right as had at one time appeared. It was on one of my journeys between the EDSAC room and the punching equipment that the realization came over me with full force that a good part of the remainder of my life was going to be spent in finding errors in my own programs." (EDSAC was Cambridge Computer Laboratory's Electronic Delay Storage Automatic Computer, the first stored-program machine.)
Software engineers still spend a "good part of their life" finding errors in their own programs, a process we now know as debugging. Results from a survey Virtutech distributed at the Embedded Systems Conference this year confirmed that software developers spend over 50% of their time debugging.
Consequently, enhancing the efficiency of the debugging process is the single most highly leveraged way to increase programmer productivity. Because the number of lines of code is doubling every two to three years while the number of programmers is increasing at only 8% per year (source: Venture Development Corp., Natick, Mass.), improving productivity is essential. Otherwise, Jack Ganssle's observation that 80% of embedded systems are delivered late, and often hopelessly bug-ridden, will continue to hold true.
For several decades, software developers have used the same methods and tools for debugging. Indeed, the survey revealed that nearly one-half of all respondents could not name a significant new debugging tool, technology, or process developed in the last five years. The last major advance in debugging was source-code debugging in the late 1970s, when programmers no longer had to concern themselves with the assembly code output by the compiler but could debug as if their computer were really running their high-level language directly.
Still, the central problem with current debugging methodologies remains: finding a bug requires stopping the program just before the bug occurs, restarting from the beginning each time the point of interest is missed. In a complex system, with real-time inputs or multiple processors, there is not even a guarantee that the problem will recur on demand. Such systems are notorious for so-called "Heisenbugs"—bugs that disappear once the code is instrumented to track them down.
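The Heisenbug effect can be made concrete with a small, deterministic sketch. The Python below (entirely illustrative, not taken from any real system) models two threads as generators stepped by a round-robin scheduler, where each `yield` is a point at which a context switch may occur. An unsynchronized read-modify-write on a shared counter loses an update; adding "logging" to one thread delays its read by two scheduling steps, shifting the interleaving so the bug vanishes—exactly the behavior that makes such bugs so hard to instrument.

```python
counter = 0

def worker(instrumented=False):
    """One 'thread' incrementing the shared counter non-atomically.

    Each `yield` marks a point where the scheduler may switch threads.
    When instrumented, the simulated cost of a logging call delays the
    read of the shared variable by two scheduling steps.
    """
    global counter
    if instrumented:
        yield              # time spent formatting the log message
        yield              # time spent writing it out
    tmp = counter          # read the shared value
    yield                  # preemption point between read and write
    counter = tmp + 1      # write back: a lost update if the other
                           # thread wrote in between

def step(thread):
    """Advance a generator one step; return False when it has finished."""
    try:
        next(thread)
        return True
    except StopIteration:
        return False

def run(instrumented=False):
    """Run two workers under a deterministic round-robin scheduler."""
    global counter
    counter = 0
    threads = [worker(instrumented), worker()]
    while threads:
        threads = [t for t in threads if step(t)]
    return counter
```

Without instrumentation, both workers read the counter as 0 before either writes, so one increment is lost and the result is 1; with the first worker instrumented, its read lands after the other worker's write, and the "correct" result of 2 appears—the bug has disappeared under observation.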
Programmers writing software for hardware other than their desktop computers face additional challenges, chief among them deciding what to run their software on while debugging it. Almost one-half of respondents said they used actual hardware during test and development; about 15% used some sort of virtual platform based on instruction-set simulation. Simulation has a number of fundamental advantages over real hardware, such as determinism, the ability to restart the platform from a checkpoint, and, for the most complex systems, the ability to pause the entire system and debug it as a whole.
Most important, in the area of debugging, simulation makes it possible to do things that are simply impossible with the real hardware. As the program runs, data can be saved from the platform that allows history to be examined or even the code to be run backward. Green Hills Software's Time Machine, the Time-Traveling Virtual Machines of the University of Michigan, and Virtutech's own Simics Hindsight provide increasingly powerful ways to go back in time to track down a bug after it has been detected, without restarting the program. This is especially important for the most difficult bugs: those in device drivers or operating-system kernels, or those detected long after the root cause has passed and the data of interest has long since vanished from the stack.
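The idea underlying such tools can be sketched in miniature: record every nondeterministic input during a live run, then replay the trace to reproduce the run exactly, using periodic checkpoints to jump back to an earlier point without rerunning from the start. The Python below is a toy illustration of that record-and-replay principle under assumed names (`ReplayableSensor`, `run`); it does not represent any vendor's actual API or implementation.

```python
import random

class ReplayableSensor:
    """Wraps a nondeterministic input source so a run can be replayed exactly.

    In record mode, every reading is appended to a trace; in replay mode,
    readings come from the saved trace, so a re-run is identical to the
    original and any earlier point can be revisited.
    """
    def __init__(self, trace=None):
        self.replaying = trace is not None
        self.trace = [] if trace is None else list(trace)
        self.pos = 0   # position in the trace, usable as checkpoint state

    def read(self):
        if self.replaying:
            value = self.trace[self.pos]
        else:
            value = random.randint(0, 255)   # the "real" nondeterministic input
            self.trace.append(value)
        self.pos += 1
        return value

def run(sensor, steps, checkpoint_every=2):
    """Run a toy computation over sensor input, saving periodic checkpoints.

    Checkpoints map a step index to the full program state here:
    the running total and the sensor's trace position.
    """
    total = 0
    checkpoints = {}
    for i in range(steps):
        if i % checkpoint_every == 0:
            checkpoints[i] = (total, sensor.pos)
        total += sensor.read()
    return total, checkpoints
```

To "travel back in time," one restores a checkpoint's state, resets the replay position, and re-executes forward from there—arriving at the same result as the original run, which is exactly what makes a bug reproducible at will instead of a one-off event.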
These new powerful tools bring us one step closer to spending less of our lives "finding errors in our own programs."