One of the hardest lessons to learn when designing wireless devices is the tenuous nature of wireless connections. For example, physical obstacles such as buildings, tunnels, and canyons (both rural and urban) can degrade the RF signals that reach both cellular handsets and Wi-Fi devices. For this reason, engineers learned long ago to design systems that fail "well," or in a graceful fashion. Instead of suffering a total system failure, failure-resistant hardware and software merely drop to a reduced level of overall functionality. This is known as the "graceful degradation" of the system. Who wouldn't want an automobile's embedded controller to sacrifice fuel efficiency in order to keep the vehicle operating, especially if the alternative is that the car simply stops functioning?
Early wireless pioneers realized that asynchronous communication architectures are an important way to resist the network failures that are inherent in wireless systems. Asynchronous designs use less power than synchronous ones. In addition, they provide a graceful way to deal with the problem of intermittent connectivity. Such architectures allow for graceful network failures by decoupling critical application processing from network communications.
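The decoupling described above can be illustrated with a small store-and-forward sketch. It is only one possible reading of the idea, not a reference design: the `Outbox` and `FlakyLink` classes, the `send` callable, and the choice of `ConnectionError` to signal a dropped link are all assumptions made for illustration.

```python
from collections import deque


class Outbox:
    """Buffers outgoing messages so the application never waits on the radio link."""

    def __init__(self, send, max_pending=100):
        # `send` is a hypothetical callable that raises ConnectionError when the link is down.
        self._send = send
        # Bounded buffer: once full, the oldest pending message is discarded.
        self._pending = deque(maxlen=max_pending)

    def submit(self, message):
        """Called by the application; returns immediately whether or not the link is up."""
        self._pending.append(message)

    def flush(self):
        """Try to deliver pending messages in order; return how many were delivered."""
        delivered = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # link is down: keep the message and try again later
            self._pending.popleft()
            delivered += 1
        return delivered


class FlakyLink:
    """Stand-in for a wireless link that can be switched up or down."""

    def __init__(self):
        self.up = False
        self.received = []

    def send(self, message):
        if not self.up:
            raise ConnectionError("link down")
        self.received.append(message)


link = FlakyLink()
outbox = Outbox(link.send)
outbox.submit("reading-1")            # application keeps working while the link is down
outbox.submit("reading-2")
delivered_while_down = outbox.flush()  # nothing is lost, nothing is delivered yet
link.up = True
delivered_after_reconnect = outbox.flush()
```

The application only ever calls `submit`, so a lost connection delays delivery rather than stalling critical processing. The bounded buffer is a deliberate trade-off: under a long outage the sketch drops the oldest data instead of exhausting memory.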
For a current example of gracefully failing wireless systems, consider self-healing, peer-to-peer mesh networks. Here, when one node of the wireless mesh drops out, traffic is quickly rerouted through another node. Recently, this ability to fail with grace has caught the attention of the U.S. defense community. In the very near future, every military vehicle, from UAVs to Humvees and helicopters, may become a node in a wireless mesh. In a battlefield scenario, such nodes would create a very dense network. If any given node were destroyed, the network would "repair" itself automatically.
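The rerouting behavior can be sketched with a simple path search. This is a centralized toy model under stated assumptions, not how any particular mesh protocol works: real meshes discover routes in a distributed fashion, and the `find_route` function, the node names, and the topology below are all hypothetical.

```python
from collections import deque


def find_route(links, source, destination, failed=frozenset()):
    """Breadth-first search for a fewest-hops path that avoids failed nodes.

    Returns a list of node names, or None if no route survives.
    """
    if source in failed or destination in failed:
        return None
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links.get(node, ()):
            if neighbor not in failed and neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical mesh: each node lists the neighbors it can reach directly.
mesh = {
    "uav": ["humvee", "helicopter"],
    "humvee": ["uav", "base"],
    "helicopter": ["uav", "relay"],
    "relay": ["helicopter", "base"],
    "base": ["humvee", "relay"],
}

normal = find_route(mesh, "uav", "base")
rerouted = find_route(mesh, "uav", "base", failed={"humvee"})
```

With every node alive, traffic takes the two-hop path through `humvee`. Once that node is marked as failed, the same search returns the longer path through `helicopter` and `relay`, which is the degraded but still working outcome the text describes. A denser mesh simply gives the search more surviving paths to choose from.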
Graceful degradation is perhaps most necessary, however, in the development of wireless software applications. This need arises from the enormous flexibility that is afforded by embedded and application-level software systems. But such design flexibility always comes at a price. In this case, the price is the capability to fully verify the system. Highly flexible systems are impossible to fully test. Perhaps this is why experienced software engineers and programmers added an unusually humanistic term to their vocabulary: "grace." This is not the unexpected grace of inspiration, but rather the grace of failing with style.
Software will always fail under some set of circumstances. But many modes of failure can be anticipated, and their effects can then be mitigated so that the failures are not catastrophic. How does one make software fail gracefully? The answer is simple: in the same way that one produces good software. Generally, that means designing software modules that combine low coupling with high cohesion. Charles Shelton and Professor Philip Koopman, who are both with Carnegie Mellon University, suggest meeting the following architectural objectives:
- Construct a fine-grained distributed system to decouple components.
- Partition the system into critical and non-critical components.
- Build well-defined component-interface definitions.
- Design components to be semi-autonomous (i.e., to provide some functionality when inputs are lost).
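The last objective in the list above, semi-autonomous components, might look something like the following sketch. It is an illustration under stated assumptions rather than anything taken from Shelton and Koopman's work: the `MixtureController` class, its three operating modes, the stale-input limit, and the numeric constants are all invented for the example.

```python
class MixtureController:
    """Keeps producing an output when its sensor input is lost.

    The fuel-efficiency optimization is treated as non-critical; producing
    some usable output every control cycle is treated as critical.
    """

    SAFE_MIXTURE = 1.0  # hypothetical conservative default, not a real engine value

    def __init__(self, max_stale_cycles=3):
        self._last_good = None
        self._stale_cycles = 0
        self._max_stale = max_stale_cycles

    def update(self, airflow):
        """Run one control cycle; `airflow` is None when the sensor input is lost.

        Returns a (mixture, mode) pair.
        """
        if airflow is not None:
            # Input present: full functionality. This branch also handles
            # reintegration after the sensor comes back.
            self._last_good = airflow
            self._stale_cycles = 0
            return self._optimize(airflow), "optimized"
        self._stale_cycles += 1
        if self._last_good is not None and self._stale_cycles <= self._max_stale:
            # Briefly reuse the last good reading.
            return self._optimize(self._last_good), "holding"
        # Input lost for too long: give up efficiency, keep operating.
        return self.SAFE_MIXTURE, "degraded"

    @staticmethod
    def _optimize(airflow):
        # Placeholder formula standing in for a real efficiency mapping.
        return round(0.8 + 0.01 * airflow, 3)


controller = MixtureController()
readings = (10.0, None, None, None, None, 12.0)
modes = [controller.update(reading)[1] for reading in readings]
```

The caller sees a well-defined interface, a value plus an explicit mode, in every cycle, so other components can react to the degraded state instead of failing along with the sensor.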
The Robust Self-configuring Embedded Systems (RoSES) research project at Carnegie Mellon University (www.ices.cmu.edu/roses) may be the most notable effort to study graceful degradation in distributed embedded systems. One of the central models used in the RoSES project is the product family architecture (PFA). In the PFA, different but related products share similar architectures and components, which together form a conceptual framework for specifying and implementing graceful degradation.
The PFA model has three main characteristics that make it well suited for wireless embedded systems: distributed functionality, smart sensors, and processing power. This processing power is devoted to optimizations—not just core power requirements.
The graceful degradation of software systems is only part of the solution. Like the fallen world in Milton's Paradise Lost, failed components must be restored to grace. In the technical vernacular of software engineering, this is the equivalent of the graceful repair and reintegration of the failed components. As subsystems are actively repaired or replaced, they provide added resources to restore functionality.
Graceful degradation is the desired goal of any serious wireless hardware and/or software design. Yet much work still must be done to provide a realistic framework for its inclusion in the average product development cycle. Please share your comments with me at [email protected].