Testing The System Early Isn't Always A Good Idea

All engineers know that finding defects late in the design process is expensive. It's not unusual for the cost of making a design change to rise by three orders of magnitude, roughly a factor of 1,000, between the first and last 10% of the design process.

Some late defects can be prevented through diligent engineering. Others can't be anticipated because they represent system-level behaviors that cannot easily be predicted from the behavior of individual components. Such unpredictable system-level failure modes aren't a sign of incompetent engineering; they're inherent to the nature of a system.

This all suggests that the earlier we begin system testing, the lower the cost of responding to the defects it finds. Because of this, we counsel new engineers to integrate systems as soon as possible and to conduct decisive system tests early. While this is a sound general principle, we should remember that it's possible to integrate a system too early in the design process, and equally possible to begin system testing too soon. Premature system integration will actually slow down design and troubleshooting instead of accelerating them.

When systems are integrated too early, it may be quite difficult to test them productively. Downtime usually increases at the system level because more things must be working before a system can be tested than before a subsystem can. Debugging complexity also rises because of interactions, and finger-pointing, between subsystems. Thus, we often pay a heavy efficiency penalty by beginning system test too early.

Testing is often slower as well. For example, consider a controller board that must recognize error codes coming in from the rest of the system. A good subsystem test could sweep through the entire range of inputs in a thousandth of the time that it would take for the same error codes to show up during system test.
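To make the contrast concrete, here is a minimal sketch of such a subsystem test, assuming the controller's decoding logic can be called directly and that error codes are 8-bit values. The code table, the `classify_error_code` function, and the 8-bit width are all hypothetical illustrations, not details from the article. The test drives every possible code into the logic in one pass rather than waiting for each one to occur naturally during system test.

```python
# Hypothetical error-code table for the controller board (illustrative only).
KNOWN_CODES = {
    0x01: "overtemperature",
    0x02: "undervoltage",
    0x10: "sensor_fault",
    0x7F: "watchdog_reset",
}

CODE_BITS = 8  # assumption: error codes arrive as 8-bit values


def classify_error_code(code: int) -> str:
    """Hypothetical controller logic: map a raw code to an action name."""
    if not 0 <= code < 2 ** CODE_BITS:
        raise ValueError(f"code out of range: {code}")
    return KNOWN_CODES.get(code, "unknown")


def sweep_all_codes() -> dict:
    """Subsystem test: drive every possible code directly into the decoding
    logic instead of waiting for each one to show up during system test."""
    allowed = set(KNOWN_CODES.values()) | {"unknown"}
    results = {}
    for code in range(2 ** CODE_BITS):
        action = classify_error_code(code)
        # Every code must map to a known action or be explicitly flagged.
        assert action in allowed, f"unexpected action for code {code}"
        results[code] = action
    return results


if __name__ == "__main__":
    results = sweep_all_codes()
    recognized = sum(1 for a in results.values() if a != "unknown")
    print(f"swept {len(results)} codes, {recognized} recognized")
```

On real hardware the sweep would inject codes through a test fixture; the point of the sketch is only that a subsystem test can enumerate the whole input space deliberately.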

Early integration tests can also create illusions of success, because it's possible to pass a system test without having a truly robust system. For instance, in many designs component variations present no problem when we build a single unit, yet they create big problems when we produce units in quantity. It's far easier to exercise subsystems near their margins when testing at the subsystem level because we have direct control over all subsystem inputs. As a result, good subsystem tests are often more powerful than system tests at establishing the true robustness of a design.
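One common way to exercise a subsystem at its margins is worst-case corner analysis: evaluate it at every combination of component tolerance extremes rather than only at nominal values. The sketch below applies this to a hypothetical resistive divider feeding a controller input; the supply voltage, resistor values, 5% tolerance, and acceptance window are invented for illustration. The nominal unit passes, while two of the four tolerance corners fall outside the window, which is the kind of defect a single-unit system test would not reveal.

```python
from itertools import product

# Hypothetical subsystem: a resistive divider feeding a controller input.
# All values and limits below are illustrative assumptions.
V_IN = 5.0               # supply voltage, volts
R1_NOM = 10_000.0        # top resistor, ohms
R2_NOM = 10_000.0        # bottom resistor, ohms
TOLERANCE = 0.05         # +/-5% resistor tolerance
V_MIN, V_MAX = 2.4, 2.6  # acceptable output window, volts


def divider_output(r1: float, r2: float, v_in: float = V_IN) -> float:
    """Output voltage of an unloaded two-resistor divider."""
    return v_in * r2 / (r1 + r2)


def in_spec(v: float) -> bool:
    return V_MIN <= v <= V_MAX


def corner_cases():
    """Yield (r1, r2, vout) at every tolerance extreme of both resistors."""
    for s1, s2 in product((-1, 1), repeat=2):
        r1 = R1_NOM * (1 + s1 * TOLERANCE)
        r2 = R2_NOM * (1 + s2 * TOLERANCE)
        yield r1, r2, divider_output(r1, r2)


if __name__ == "__main__":
    nominal = divider_output(R1_NOM, R2_NOM)
    print(f"nominal unit: {nominal:.3f} V, in spec: {in_spec(nominal)}")
    for r1, r2, v in corner_cases():
        print(f"R1={r1:.0f} R2={r2:.0f}: {v:.3f} V, in spec: {in_spec(v)}")
```

Because the test controls each component value directly, it reaches the extremes deliberately instead of hoping a production lot happens to contain them.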

The final disadvantage of early system test is the most insidious. By overemphasizing system test, you can enter a testing death spiral. Most companies have a finite pool of testing resources. When we focus on early system test, we draw time and attention away from subsystem test. Less robust subsystems are then integrated, and more bugs show up in system test. This increased defect level appears to justify the increased emphasis on system test. In reality, it was the emphasis on system test that increased the defect level.

How can we tell if we are integrating and testing too soon? Take a careful look at the types of defects you find at each stage of your testing hierarchy. Ask whether certain defects might be found more effectively at an earlier stage. Then modify your process so that the defects you find in system test are primarily those that are most economically found there. Remember, integrating a system too early can be as dangerous as integrating it too late.