Automated Testing Speeds Effective Deployment Of Java Functionality

To fulfill its "write-once, run-anywhere" promise, Java technology must be widely deployed in resource-constrained information appliances and embedded devices. In addition to limited memory resources, developers face challenges in testing this highly heterogeneous class of products.

One operating system, the Windows family, and one processor architecture, the x86/Pentium, dominate the desktop world. But the operating-system/processor environment in an information appliance or embedded device could be any one of dozens of possible combinations. Each combination of operating system and processor, together with any necessary middleware, applications, and the Java virtual machine (JVM) solution, presents unique testing challenges that can dramatically affect how soon a product comes to market.

This article discusses the benefits and challenges of test automation, plus its use in effectively and quickly deploying Java functionality in these highly diverse devices. It describes how engineers at Insignia Solutions developed automated tests for the company's Jeode virtual machine on these devices. Some benefits of test automation include:

Faster than manual testing
Repeatability to identify regressions immediately
Insulation from human factors, such as boredom or carelessness
Improved audit trail to identify bugs
Shorter residence time of bugs in the code, reducing the manual effort of development and test engineers
Ability to test code in live development

The main challenge of automation is the difficulty of implementing it. Some tests, particularly those involving devices that feature a graphical user interface, are hard to automate. Automated testing of embedded devices that employ real-time operating systems can also be challenging. Automation generally takes longer to set up than manual testing and requires more skilled software engineers.

Testing the Jeode platform: Insignia Solutions develops and markets Java virtual machine technologies, branded under the Jeode name. These products are targeted at information-appliance and embedded developers building devices with limited resources, like memory and battery power. Such devices can be characterized as heterogeneous because they employ a wide variety of operating-system/processor combinations. For example, to support the diversity in the embedded industry, Jeode technology is currently ported to Windows CE, NT4, VxWorks, Linux, ITRON, Nucleus, pSOS, and BSDi Unix operating systems, and ARM, MIPS, x86, SuperH-3, SuperH-4, and PowerPC processors.

Naturally, the multiplatform nature of this product complicates testing. At any one time, ports for up to six targets will be in development for an upcoming general release, perhaps six more for specific customers, and another 20 for internal development, which could be promoted to release candidates if needed.

It's important to automatically test as many platforms as possible. Some target systems, such as Windows NT and most Unix variants, are easy to automate. Others, including Windows CE and most embedded operating systems, are much more difficult. They must all be tested because the target market isn't in PC-type systems, where regular computer crashes are tolerated. Instead, it's in devices like Web terminals, set-top boxes, and networking infrastructure, where uptime and reliability are expected.

The Jeode Embedded Virtual Machine (EVM) is built from a source tree that combines platform-neutral code used in all ports, and platform-specific code, which is only used in some. For example, in Insignia's dynamic adaptive compiler, the code that decides which Java bytecodes to compile is common among all platforms. But the code generator is different, depending on which CPU the code is for. In addition, platform-specific code exists for different operating systems.

Engineers testing the Jeode EVM on a device test the amalgamation of the platform-specific and the platform-neutral code. Both are equally important. Platform-neutral code may behave in unexpected and different ways on different platforms. For instance, on one real-time operating-system platform, the company discovered thread-priority problems in the platform-neutral code of the finalizer. These were due to the operating system's strict thread-scheduling behavior, which is not seen under other, less strict schedulers.

Automated testing methodology: Automatic testing starts with the automatic building of the product. An important part of Insignia's automatic testing is the build and bench (bb) queuing system, in which both building and testing of the product occur in a series of multicomputer queues.

At midnight every night, a cron job starts the overnight build, which is conducted by a build farm of 20 to 30 NT and Linux computers. They produce about 200 builds for approximately 30 target systems, totaling around 6 Gbytes of executables and supporting libraries. These are then stored on a central file server for about 30 days, where the automatic test tools, developers, and testers access them. When each build completes (most do by 3:00 a.m.), a queuing system automatically starts the test scripts. Because most automatic building and testing takes place unattended at night, the robustness of the system is important.

Types of automated test: The first step in establishing the quality of the EVM is to consider the quality of its source code. To accomplish this, we examine the warning and error messages produced during its compilation. This process can be easily automated. A series of scripts analyzes the verbose output files from the build system and produces an HTML table of build results on the intranet, showing the success of each build.

A system of color coding categorizes the status of the builds: those highlighted in green have no warnings; those in yellow have some warnings; and those in red have errors that prevented successful compilation. The table displays all builds done overnight, grouped by branch, target OS and processor architecture, Java type (among others are PersonalJava and EmbeddedJava), and build variant (production, feedback, and many different debugging variants).
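The color-coding rule can be sketched in a few lines of Java. This is only an illustration, not Insignia's scripts: the class name `BuildStatusSketch`, the assumption that diagnostics contain the words "warning" or "error", and the HTML layout are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: classify one build log and render one row of a results table.
final class BuildStatusSketch {
    enum Status { GREEN, YELLOW, RED }

    // Assumption: compiler diagnostics contain the word "error" or "warning".
    // The substring match is deliberately naive; a real script would parse
    // the specific message format of each compiler.
    static Status classify(List<String> logLines) {
        boolean warned = false;
        for (String line : logLines) {
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.contains("error")) {
                return Status.RED;       // any error means the build did not succeed
            }
            if (lower.contains("warning")) {
                warned = true;           // remember it, but keep looking for errors
            }
        }
        return warned ? Status.YELLOW : Status.GREEN;
    }

    // Render one table row; the cell color mirrors the status.
    static String toHtmlRow(String buildName, Status status) {
        String color = status.name().toLowerCase(Locale.ROOT);
        return "<tr><td>" + buildName + "</td><td bgcolor=\"" + color + "\">"
                + status + "</td></tr>";
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "cc -c gc.c",
                "gc.c:42: warning: unused variable 'n'");
        System.out.println(toHtmlRow("linux-x86-production", classify(log)));
    }
}
```

Returning on the first error keeps the precedence simple: red outranks yellow, and yellow outranks green.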

A second process that can be readily automated is regression testing, which checks whether the EVM, or one of its features, works correctly. These tests range from the trivial and ubiquitous "Hello World" application to the Technology Compatibility Kit (TCK) compatibility test suite. Some regression tests check whether a particular Java API is present and operates correctly. Others work by testing the limits of the Java specification. For example, they may start thousands of concurrent threads, all communicating with each other. Or, they might attempt to provoke a stack overflow with deeply recursive functions or those that take hundreds of arguments.

Regression tests can only pass or fail; they don't produce a score. When they fail, they may do so unpredictably, which makes test automation more difficult. Some tests politely report their failure or throw an uncaught Java exception. Others, especially those that test the limits of the specification, can cause a JVM to trigger a hardware exception (null-pointer dereference, access violation, undefined instruction, and so on). On many embedded targets that the company works with, such exceptions may not be properly handled, potentially rendering the entire system unstable. Recovery could then require human intervention to perform a soft reset of the entire system.
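One way to cope with these failure modes on a host that can launch the test as a process is to impose a timeout and classify whatever comes back. The sketch below is a hypothetical illustration: the `PASSED`/`FAILED` output markers, the outcome names, and the class `RegressionOutcomeSketch` are assumptions, not part of the Jeode tools.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: run one regression test with a timeout and classify the result.
final class RegressionOutcomeSketch {
    enum Outcome { PASS, FAIL, CRASH, HANG }

    // Assumed convention: a passing test prints "PASSED" and exits with code 0;
    // a polite failure prints "FAILED"; an uncaught Java exception on the main
    // thread leaves "Exception in thread" in the output.
    static Outcome classify(boolean timedOut, int exitCode, String output) {
        if (timedOut) {
            return Outcome.HANG;
        }
        if (exitCode == 0 && output.contains("PASSED")) {
            return Outcome.PASS;
        }
        if (output.contains("FAILED") || output.contains("Exception in thread")) {
            return Outcome.FAIL;
        }
        return Outcome.CRASH;   // the process ended without reporting anything useful
    }

    // Launch the command, wait up to timeoutSeconds, then classify.
    static Outcome run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        File log = File.createTempFile("regression", ".log");
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)   // merge stderr into stdout
                    .redirectOutput(log)         // a file avoids blocking on a full pipe
                    .start();
            boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
            if (!finished) {
                process.destroyForcibly();
                process.waitFor();
            }
            String output = new String(Files.readAllBytes(log.toPath()), StandardCharsets.UTF_8);
            int exitCode = finished ? process.exitValue() : -1;
            return classify(!finished, exitCode, output);
        } finally {
            log.delete();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(false, 0, "HelloWorld: PASSED"));
        System.out.println(classify(false, 139, ""));
        System.out.println(classify(true, -1, "started 4000 threads"));
    }
}
```

On an embedded target that cannot recover from a hardware exception, the timeout only detects the problem; the reset itself still has to come from outside the test.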

The final category of tests that merit automation is benchmarking. These are often the easiest to automate because they have well-defined outputs. As a result, capturing the output and extracting the result is easy. Interpreting the results, however, is generally more difficult than in regression tests.

Benchmarks can generally be divided into two categories: Fixed Time and Fixed Work. Fixed-Time benchmarks do as much work as possible in a given time period (perhaps one minute), while Fixed-Work benchmarks are assessed on the time taken to perform a given work unit (say 100 iterations). Both derive their score by repeatedly doing a task, and then dividing the amount of work done by the time taken. However, benchmarks designed for modern high-performance desktop PCs are often ill suited to resource-limited, low-power devices.
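The two scoring schemes can be sketched as follows. Both reduce to work divided by time; they differ only in which quantity is fixed in advance. The class and method names are hypothetical and do not correspond to any particular benchmark suite.

```java
// Hypothetical sketch of Fixed-Work and Fixed-Time benchmark scoring.
final class BenchmarkScoreSketch {

    // Score = work units per second.
    static double score(long workUnits, long elapsedNanos) {
        return workUnits / (elapsedNanos / 1e9);
    }

    // Fixed Work: the iteration count is fixed, the time is measured.
    static double fixedWork(Runnable task, long iterations) {
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            task.run();
        }
        long elapsed = Math.max(1, System.nanoTime() - start); // guard against a zero reading
        return score(iterations, elapsed);
    }

    // Fixed Time: the time budget is fixed, the completed iterations are counted.
    static double fixedTime(Runnable task, long budgetNanos) {
        long start = System.nanoTime();
        long done = 0;
        long elapsed;
        do {
            task.run();
            done++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < budgetNanos);
        return score(done, Math.max(1, elapsed));
    }

    public static void main(String[] args) {
        Runnable task = () -> Math.sqrt(12345.678);
        System.out.println("fixed work: " + fixedWork(task, 100) + " iterations/s");
        System.out.println("fixed time: " + fixedTime(task, 50_000_000L) + " iterations/s");
    }
}
```

A task this small mostly measures loop and timer overhead; it stands in for a real workload only to keep the sketch short.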

All benchmarks suffer on embedded devices because they behave differently from ordinary Java applications. When designing the compiler in the Jeode EVM, an important assumption was that it would run standard Java applications that exhibit slack times (e.g., while waiting for user input). The compiler does its work during these slack times. Unfortunately, benchmark programs allow no slack times, so the compiler thread has to steal cycles from the Java thread.

Particular kinds of benchmark bring their own challenges. For example, Fixed-Time benchmarks can be problematic because there may not be enough time for the compiler to complete its work. In this case, the reported score reflects only the interpreter performance. Internally, we can work around this problem by running the benchmark twice. The first run exposes the benchmark code to the compiler, and the second run, after a short sleep, gauges true performance.
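The run-twice workaround might look like the sketch below. The names are hypothetical, and the length of the sleep is an assumption; the article says only that it is short.

```java
import java.util.function.DoubleSupplier;

// Hypothetical sketch: run a benchmark twice and report only the second score.
final class WarmUpBenchmarkSketch {

    static double scoreAfterWarmUp(DoubleSupplier benchmark, long sleepMillis) {
        benchmark.getAsDouble();            // first run: result discarded
        try {
            Thread.sleep(sleepMillis);      // slack time in which a compiler thread can work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return benchmark.getAsDouble();     // second run: the reported score
    }

    public static void main(String[] args) {
        DoubleSupplier benchmark = () -> {
            long start = System.nanoTime();
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i % 7;
            }
            long elapsed = Math.max(1, System.nanoTime() - start);
            if (sum < 0) {
                throw new IllegalStateException("unreachable; keeps the loop result in use");
            }
            return 5_000_000 / (elapsed / 1e9);   // iterations per second
        };
        System.out.println("score after warm-up: " + scoreAfterWarmUp(benchmark, 200));
    }
}
```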

Figure 1 illustrates how the EVM uses dynamic adaptive compilation to interpret rarely used Java code paths and compile frequently used Java code paths to native code. This approach achieves optimal performance in constrained-memory environments, but presents benchmarking challenges.

Tools for automated testing: Because the EVM runs on a wide range of devices and operating systems, we can't use a single, all-encompassing test tool like MS Test. Instead, it takes a variety of tools, some developed internally, and some from outside.

The most important glue holding these systems together is Perl. Another useful external tool is the set of TCK test suites provided by Sun Microsystems to Java licensees. This software consists of several thousand tests that a JVM must pass to be certified as compatible. A typical test system consists of a harness running on a host PC (Windows or Unix), and a slave running on a small device.

The harness dispatches the tests in sequence to its slave and reports the result. When a test run is complete, the harness generates an HTML report that lists the passed and failed tests, with links to more detailed output. Both the harness and slave can be run as command-line applications, making automation easy.
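The harness and slave arrangement can be illustrated with a toy version. This is not the TCK harness or its API: the `Slave` interface, the `.log` link convention, and the report layout are hypothetical, chosen only to show sequential dispatch followed by an HTML summary.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a harness that dispatches tests and writes an HTML summary.
final class HarnessSketch {

    // Stand-in for the program on the device that runs one named test.
    interface Slave {
        boolean runTest(String testName);
    }

    // Dispatch the tests one at a time, preserving their order in the results.
    static Map<String, Boolean> dispatch(List<String> tests, Slave slave) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String test : tests) {
            results.put(test, slave.runTest(test));
        }
        return results;
    }

    // Build a report that lists each test with a link to its detailed output.
    static String htmlReport(Map<String, Boolean> results) {
        int passed = 0;
        StringBuilder rows = new StringBuilder();
        for (Map.Entry<String, Boolean> entry : results.entrySet()) {
            String name = entry.getKey();
            boolean ok = entry.getValue();
            if (ok) {
                passed++;
            }
            rows.append("<tr><td><a href=\"").append(name).append(".log\">").append(name)
                .append("</a></td><td>").append(ok ? "PASSED" : "FAILED").append("</td></tr>\n");
        }
        int failed = results.size() - passed;
        return "<html><body>\n<p>" + passed + " passed, " + failed + " failed</p>\n<table>\n"
                + rows + "</table>\n</body></html>\n";
    }

    public static void main(String[] args) {
        // A fake slave that passes every test whose name starts with "api.".
        Slave fake = name -> name.startsWith("api.");
        List<String> tests = Arrays.asList("api.String", "limits.DeepRecursion");
        System.out.print(htmlReport(dispatch(tests, fake)));
    }
}
```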

The TCK is useful for test automation. In addition to the large test suites that provide a convenient regression test, users can adapt the system to run their own internal regression tests through the same harness.

Three main internal tools are used for test automation. One is the Embedded Remote Console (ERC), a feature of the Jeode EVM that was added early in its development to assist developers working on small devices with limited display capabilities. ERC lets developers run the EVM on PDA-type devices with small screens and no keyboard. Simultaneously, they can use the large keyboard and screen on their development PCs for input and output. For test automation, Insignia has adapted ERC as a way to capture output from the EVM and any Java programs running under it.

Another tool of key importance to test automation is iRemote. This small program opens a server socket on the target and accepts commands from the host. It can transfer files between the host and target, plus start and stop processes.

The remote-scripting tool, developed by the author last year, builds on both ERC and iRemote. It attempts to unify test automation across platforms and simplify test automation by hiding an embedded target's test details behind layers of abstraction. Instead, the test developer sees a common API for all targets. The remote-scripting tool handles the details of downloading a test to the target, running the test, and capturing the results. The abstraction makes the test script very short and simple, and permits its reuse for a wide variety of test targets.
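The idea of a common API over many targets can be shown with a small sketch. The article does not publish the actual interface of the remote-scripting tool, so everything here is a hypothetical stand-in: the `Target` interface, the `evm` command name, and the in-memory `FakeTarget` that replaces a real device.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one test script written against a target-neutral interface.
final class RemoteScriptSketch {

    // What the test developer sees; each real target would supply its own implementation.
    interface Target {
        void download(String fileName, byte[] contents);
        String runAndCapture(String commandLine);
    }

    // The test script itself: download, run, capture, compare.
    static boolean runTest(Target target, String classFile, byte[] classBytes,
                           String mainClass, String expectedOutput) {
        target.download(classFile, classBytes);
        String output = target.runAndCapture("evm " + mainClass); // "evm" is an assumed command name
        return output.contains(expectedOutput);
    }

    // In-memory stand-in for a device, so the sketch runs without hardware.
    static final class FakeTarget implements Target {
        final Map<String, byte[]> files = new HashMap<>();

        @Override
        public void download(String fileName, byte[] contents) {
            files.put(fileName, contents);
        }

        @Override
        public String runAndCapture(String commandLine) {
            String mainClass = commandLine.substring(commandLine.lastIndexOf(' ') + 1);
            return files.containsKey(mainClass + ".class")
                    ? "Hello World"
                    : "class not found: " + mainClass;
        }
    }

    public static void main(String[] args) {
        boolean passed = runTest(new FakeTarget(), "Hello.class", new byte[] {1},
                "Hello", "Hello World");
        System.out.println(passed ? "PASSED" : "FAILED");
    }
}
```

Because `runTest` touches only the interface, the same script could in principle run against any target for which an implementation exists, which is the reuse the abstraction is meant to provide.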

Figure 2 illustrates how the automatic-testing process provides layers of abstraction between the data-management code in the host PC and the iRemote Java code. The latter interacts with the target device.

Future developments: Over the past two years, test automation at Insignia has developed from a series of piecemeal scripts to the unified system currently in use. In the past, every automated test on each platform had a test script to do the job. In the future, Insignia hopes to eliminate as much of this test- and platform-specific complexity as possible by using cross-platform automation tools, especially the remote-scripting tool previously discussed.

By using an upgraded iRemote, the company would like to set up overnight TCK runs on as many platforms as possible, running both the official TCK compatibility tests and its own internal regression tests. For benchmarks and stress tests, Insignia wants to expand testing via the remote-scripting tool.