Troubleshooting a communications test system is easier when hardware and software development are kept separate from each other. With a clear division between the two, problems can be found and corrected more readily.
Faults often occur because designers fail to anticipate the issues that arise when components of the system interact.1 Keeping this in mind, there are several things that can be done during the planning and design phase to mitigate this risk and make it easier to debug when issues do occur. The additional time taken for design and code reviews during the planning and development phase will reduce the stress of solving unpredictable and mysterious bugs during testing.2
Software Design Planning
Designing software properly can give a lot of insight into what is going on with the hardware in the system. First, make the communications layer modular so that if an instrument must be swapped, only the driver layer needs to be changed. This planning will be useful if a more precise instrument needs to be used during a later phase or if existing equipment should fail.
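As an illustration, a minimal Python sketch of such a modular driver layer; the vendor class names and the readings are hypothetical placeholders:

```python
from abc import ABC, abstractmethod

class DmmDriver(ABC):
    """Driver interface: test code talks only to this, never to a vendor API."""
    @abstractmethod
    def measure_voltage(self) -> float:
        ...

class VendorADmm(DmmDriver):
    """Hypothetical driver; a real one would wrap the vendor's commands."""
    def measure_voltage(self) -> float:
        return 3.30  # placeholder reading

class VendorBDmm(DmmDriver):
    """Swapping in a more precise instrument touches only this layer."""
    def measure_voltage(self) -> float:
        return 3.301  # placeholder reading

def voltage_test(dmm: DmmDriver, expected: float, tol: float) -> bool:
    """The test logic never changes when the instrument does."""
    return abs(dmm.measure_voltage() - expected) <= tol
```

Swapping to a more precise DMM then means writing one new driver class; the tests themselves are untouched.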
Creating small programs that can be debugged without running the entire system also is a good practice. Certain instruments may be used during the entire testing procedure, such as a controller-area network (CAN) interface to talk to the DUT. If problems occur during execution, these small programs can rule out that portion of the system.
Other times, instruments may be used for only a portion of the testing. For these instruments, create a small program that tests an entire path to make sure the instrument is working correctly. If you have a DMM, write a program that connects the entire path to the DUT and powers on the DUT. If the DMM measures the expected value, then the DMM is working correctly.
Create tests that can be run without a DUT. This will isolate issues to determine if it is an issue with the system or with the DUT.
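A small standalone path check along these lines might look like the following sketch, where the expected values and the DMM read are hypothetical stand-ins:

```python
EXPECTED_VOLTS = 3.3   # hypothetical expected DUT rail voltage
TOLERANCE = 0.05

def read_dmm() -> float:
    """Stand-in for a real DMM reading taken through the full path."""
    return 3.32

def path_check(dut_present: bool) -> bool:
    """With a DUT, check its rail; without one, verify the path still responds."""
    reading = read_dmm()
    if not dut_present:
        # Hypothetical sanity bound: the DMM answers with a plausible value.
        return 0.0 <= reading <= 50.0
    return abs(reading - EXPECTED_VOLTS) <= TOLERANCE
```

Because the same check runs with or without a DUT, a failure can be attributed to the system or the DUT accordingly.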
Pre-conditions and post-conditions should be added in the software to prevent unexpected behavior in the hardware. If these conditions are not met, the software should handle these violations sensibly.1 Software locks may stop the user from shorting out components. Current-limiting power supplies will prevent further damage if there is an issue or a bad DUT.
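One simple way to enforce such conditions is to check them explicitly before touching the hardware. In this Python sketch, the safe operating limits and the supply routine are hypothetical:

```python
class PreconditionError(RuntimeError):
    """Raised when software refuses an operation that could damage hardware."""
    pass

def require(condition: bool, message: str) -> None:
    """Fail loudly instead of letting the hardware do something unexpected."""
    if not condition:
        raise PreconditionError(message)

def enable_supply(voltage: float, current_limit: float) -> str:
    # Hypothetical safe limits for the fixture.
    require(0.0 < voltage <= 5.0, f"voltage {voltage} V outside safe range")
    require(0.0 < current_limit <= 2.0, f"current limit {current_limit} A too high")
    # ... real code would program the power supply here ...
    return f"supply on: {voltage} V, limit {current_limit} A"
```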
Keep spares because hardware components are susceptible to manufacturing defects, wear, environmental effects, and physical damage.1
Create logs so that communications can be monitored during execution. This will help detect if the system is sending the correct messages at the correct time. Timestamp the logs and add a description of the command so that it is easily readable. Have the capability to disable logging to speed up processing time or save disk space.
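Using Python's standard logging module, such a log might be set up as follows; the command and description are illustrative:

```python
import logging

log = logging.getLogger("comms")
handler = logging.StreamHandler()
# Timestamp plus level plus message, so the log is easily readable.
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
log.addHandler(handler)
log.setLevel(logging.INFO)

def send(command: str, description: str) -> None:
    """Log every command with a timestamp and a human-readable description."""
    log.info("TX %-12s # %s", command, description)
    # ... transmit the command over the bus here ...

send("MEAS:VOLT?", "read DC voltage from the DMM")
# To disable logging for speed or disk space:
# log.setLevel(logging.CRITICAL)
```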
Design the capability to simulate instruments. This will be useful when testing the systems in iterations while the instruments are not installed in the system.
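A simulated instrument can simply return plausible readings behind the same interface as the real driver; the values in this sketch are invented:

```python
import random

class SimulatedDmm:
    """Returns plausible readings so software can run without hardware."""
    def measure_voltage(self) -> float:
        # Nominal 3.3 V with a little invented noise.
        return 3.3 + random.uniform(-0.01, 0.01)

def make_dmm(simulate: bool):
    """Factory: hand back the simulator until the real driver is installed."""
    if simulate:
        return SimulatedDmm()
    raise NotImplementedError("real driver not yet integrated")  # placeholder

dmm = make_dmm(simulate=True)
```

Because the rest of the software only calls `measure_voltage()`, flipping `simulate` to `False` later is the only change needed.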
Hardware Design and System Integration Planning
It is important to design the system not only for the best functionality, but also for easy access during debugging. This involves both communications interfaces and test points.
Often, there are multiple ways to connect to an instrument, such as GPIB, USB, or serial interface. Keep in mind how all of the instruments and hardware will be integrated in the system.
A further consideration is the communications methodology. Look at the APIs for the different methods and research forums to see if there have been issues with any particular one. Also, see if any may be easier to debug during execution.
Expose test points where current or voltage can be measured. During execution, these measurements will be helpful in detecting faulty parts. Label the cables and components at every connection so if the system is moved around, wrong connections can be easily detected.
After the system is designed and integration is taking place, it’s time to consider debugging.
Configuration and Settings
The DUT or instruments may have many configurable parameters. Do not assume that the values being used are the correct ones. Look at the requirements in the specifications and industry standards to make sure that the values being used comply. Also, look at forums or talk to the instrument vendor to find optimal levels for those not restricted by specifications or standards.
Consider the type of testing needed to isolate issues and gather information for determining root causes. Although a sequence may contain independent tests such as USB, RF, or putting the DUT into different current modes, running them sequentially may cause interference because they use the same components on the DUT or in the system.
Start by running tests individually to make sure each one can run by itself. If a test is not passing, try dividing the test into subtests. For example, if USB functionality is being tested, separate the USB functionality test into two portions. First, see if a USB drive can be detected. Then, verify that a file can be stored to the USB drive correctly.
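The two-stage USB check described above could be structured like this sketch, with the real detection and file-verification steps stubbed out:

```python
def usb_drive_detected() -> bool:
    """Subtest 1: can the DUT enumerate the USB drive at all?"""
    return True  # stand-in for the real detection query

def usb_file_roundtrip() -> bool:
    """Subtest 2: can a file be stored and read back correctly?"""
    return True  # stand-in for the real write/verify step

def usb_test() -> str:
    """Report which subtest failed, so the fault is already half-localized."""
    if not usb_drive_detected():
        return "FAIL: drive not detected"
    if not usb_file_roundtrip():
        return "FAIL: file round-trip failed"
    return "PASS"
```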
Next, combine several of the tests and run them sequentially. If some of them no longer pass, change the order of the sequence. Explore all possible situations the end-user may use the test system for, such as configurations and functionality ordering.1 Make sure that all scenarios are tested.
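A small harness can automate trying every ordering. In this sketch, a hypothetical order dependency between a USB test and an RF test is planted so that one ordering fails:

```python
from itertools import permutations

def run_in_every_order(tests: dict, reset) -> list:
    """Run the named tests in every order; return the orderings that fail."""
    failures = []
    for order in permutations(tests):
        reset()  # restore a clean state before each ordering
        if not all(tests[name]() for name in order):
            failures.append(order)
    return failures

# Hypothetical dependency: RF only passes after USB has configured the DUT.
state = {"usb_done": False}
def reset():
    state["usb_done"] = False
def usb_test():
    state["usb_done"] = True
    return True
def rf_test():
    return state["usb_done"]

bad_orders = run_in_every_order({"usb": usb_test, "rf": rf_test}, reset)
# bad_orders lists the ordering that exposes the interaction.
```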
Monitor the traffic coming from the system. If the actual communications traffic is correct, the issue may lie in post-processing instead. There also may be a method to access the messages to monitor what the DUT is seeing.
Further isolation can determine if the issue is really with the system or if there is an issue with the DUT itself. Run several tests with a known-good DUT to see if the results align with what is expected.
It is necessary to have golden DUTs so if there are issues during testing, the debugging efforts can be focused on the system instead of determining if the issue lies in the system or the DUT. Having DUTs with known failures also is a necessity because it shows that the system will fail units correctly.
Data analysis is a key component in finding the root cause of an issue. Testing a DUT accurately is not just sending the right commands and measuring the correct values; it also is doing all of those steps at the correct time.
Look for places where a delay is missing or has the wrong length, since this could be causing problems: before initializing or closing an instrument, after changing the state of the DUT to account for settling time, or after changing the state of the system, such as closing relays, before taking measurements.
The operating system also could cause interruptions in soft real-time or nondeterministic applications. Avoid having CPU-intensive programs such as virus scanners or Internet connections running during testing.
There are more complications if multiple DUTs are being tested simultaneously. First, verify that one DUT is working as expected. If that is the case and multiple DUTs are causing issues, look for interference between the two DUTs. Are there RF signals interfering between the DUTs? Could there be crosstalk in the audio signals? Is testing one DUT bogging down the OS which interrupts testing in the second DUT?
Looking at the big picture also can give insight into issues on the system. Analyze the results of large samples of data. Once this data is acquired, plot the data to see different trends and correlation.
[Figure 1. DUT current vs. Wi-Fi Rx sensitivity packets received]
The data can be plotted on a scatter plot to look for correlation. For example, on Figure 1, there is a correlation when the DUT current is less than 1 A and the number of packets received for Wi-Fi is less than 90%. This analysis could be used to focus on why the DUT current is dropping, as seen in Figure 2.
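The underlying correlation also can be quantified without plotting. This sketch computes a Pearson correlation coefficient over invented sample data:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented sample data: DUT current (A) vs. Wi-Fi packets received (%).
current = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3]
packets = [82, 87, 91, 94, 96, 97]
r = pearson(current, packets)  # strongly positive: lower current, fewer packets
```

A coefficient near +1 or -1 flags a relationship worth investigating; a value near 0 suggests the two measurements are unrelated.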
• Avoid Hard Coding Configuration and Timing Values
Because of the nature of creating complex systems, timing, configuration, and limits may change. Avoid hard coding these values. Make these units configurable. If needed, the capability to change these values could be password protected. This also will be helpful if a future version of the DUT is released.
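A lightweight way to keep such values configurable is to hold defaults in one place and let a file override them; the parameter names and file name in this sketch are hypothetical:

```python
import json

# Hypothetical defaults: timing, limits, and configuration in one place.
DEFAULTS = {
    "settle_delay_s": 0.5,    # wait after a DUT state change
    "voltage_limit_v": 5.0,
    "current_limit_a": 1.5,
}

def load_config(path=None):
    """Start from defaults; override from a JSON file when one is supplied."""
    config = dict(DEFAULTS)
    if path is not None:
        with open(path) as f:
            config.update(json.load(f))
    return config

cfg = load_config()                   # defaults during development
# cfg = load_config("dut_v2.json")    # overrides for a future DUT revision
```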
• Intuitive Error Messaging
When creating error messages, make sure the messaging is intuitive so it’s easy to trace the error’s origin. Include the instrument name, channel, specific error, and if applicable, the incorrect measurement.
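An exception type that carries these fields makes such messages consistent; the instrument name and values below are illustrative:

```python
class MeasurementError(Exception):
    """Carries everything needed to trace an error back to its origin."""
    def __init__(self, instrument, channel, problem, measured=None):
        self.instrument, self.channel = instrument, channel
        self.problem, self.measured = problem, measured
        detail = f"{instrument} ch{channel}: {problem}"
        if measured is not None:
            detail += f" (measured {measured})"
        super().__init__(detail)

err = MeasurementError("DMM-1", 3, "voltage out of range", "5.7 V")
# str(err) -> "DMM-1 ch3: voltage out of range (measured 5.7 V)"
```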
• Replicate the Bug Before Attempting to Fix It
Before attempting to fix the bug, the steps and conditions to reproduce the bug must be known.3 After the bug has been fixed, the same steps and conditions should be used to verify that the bug has indeed been fixed. If the bug is not reproducible, the fix cannot be verified.
• Debug One Thing at a Time
Changing too many parameters and chasing too many paths can make it difficult to pinpoint which change actually fixed the issue. Also, one of the changes could have resolved one specific issue but caused a different effect.
• Keep a DUT With Known Behavior as Changes Are Made With the System
As changes are made to the system, a DUT with known behavior provides a baseline: if its results change, the cause lies in the system rather than the DUT.
• Develop and Run a Test Plan Before Deploying the System
The system should be tested over all of the required functions and the full range of conditions and verified against the expected results.4
• Ensure That Hardware and Software Engineers Work Together
Because the software and hardware experts usually are different people, a blame game can develop over who is responsible for an issue. A proper plan and a cohesive relationship need to be in place so that the two teams work together. All parties involved on the project must cooperate closely so that the issues can be solved efficiently.4
It is inevitable that some bugs will be present in a test system, but by planning ahead, it will be easier to debug during execution. With a complicated system, it is best to isolate issues. In many cases, brainstorming with those who are not deeply involved in the project may provide a different point of view.
Learning from mistakes can lead to ideas on how to fix other bugs. Both the brainstorming techniques and the evaluation of mistakes made in a design can serve as useful lessons for future projects and give insight into which development or testing skills an engineer can improve upon.4
The author would like to thank Michael Vavrek, a staff systems engineer at VI Technology, for his contributions to this article.
1. Thane, H., “Monitoring, Testing and Debugging of Distributed Real-Time Systems,” doctoral thesis, Mechatronics Laboratory, Department of Machine Design, Royal Institute of Technology, Sweden, 2000, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.4735&rep=rep1&type=pdf
2. Akhter, S. and Roberts, J., “Multi-Threaded Debugging Techniques,” 2007, http://www.drdobbs.com/architecture-and-design/199200938
3. Taylor, I.L., “Debugging,” 2003, http://www.airs.com/ian/essays/debug/debug.html
4. LaACES Student Ballooning Course, “System Testing and Debugging,” Lecture 7, Louisiana State University, 2004, http://laspace.lsu.edu/aces/Lectures/Programming/Programming%20Lecture%207.ppt
About the Author
Lily Del Aguila is a systems engineer in the engineering services group at VI Technology, an Aeroflex company. Her experience is in test and measurement systems and system integration. She holds a B.S. in electrical engineering from the University of Texas. Aeroflex Test Solutions, VI Technology, 3700 W. Parmer Ln., Austin, TX 78753, 512-327-4401, e-mail: [email protected]