What's New in Static Analysis Technology


I had a chance to talk with Dr. Benjamin Brosgol of AdaCore about the latest static analysis technology, including how it fits into the software development life cycle.

Wong: What is your definition of a static analysis tool?

Brosgol: A static analysis tool is software that takes as input a program's source text (or some intermediate representation derived from the source text, such as Java bytecodes) and computes properties of that program without executing it. That is a very general definition, and in the context of this discussion the interesting properties are those related to a program's reliability, safety, and/or security. Examples include the absence of vulnerabilities (such as out-of-range array indices) or, at a more abstract level, the absence of bugs in the program's logic.

Wong: Many languages claim to be "strongly typed", with lots of checks at compile time or run time or both. Why are special tools needed for static analysis?

Brosgol: Languages vary considerably in how well they detect errors, and indeed checks that are automatically enforced by a compiler for one language might need to be performed by a separate tool for another language. An example is conversion (casting) between unrelated pointer types. This is a compile-time error in Ada, detected by any compiler, but for C a supplemental analysis tool would be needed.
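The pointer-conversion point can be illustrated with a minimal sketch. The type and variable names below (Int_Ptr, Float_Ptr, N, IP, FP) are hypothetical and chosen only for illustration. The illegal conversion is left in a comment so that the rest of the unit still compiles; the corresponding C cast, (float *) ip, would be accepted by a C compiler.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Pointer_Conversion_Demo is
   --  Two general access types with unrelated designated types.
   type Int_Ptr   is access all Integer;
   type Float_Ptr is access all Float;

   N  : aliased Integer := 42;
   IP : constant Int_Ptr := N'Access;
   FP : Float_Ptr;
begin
   --  An Ada compiler rejects the following conversion, because the
   --  designated types Integer and Float are unrelated:
   --
   --     FP := Float_Ptr (IP);

   FP := null;  --  the only value that can legally be given to FP here
   Put_Line ("IP.all =" & Integer'Image (IP.all));
   Put_Line ("FP is null: " & Boolean'Image (FP = null));
end Pointer_Conversion_Demo;
```

Uncommenting the conversion should produce a compile-time error rather than a run-time failure, which is the behavior described above.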

The situation with run-time checks is analogous. The classical "buffer overrun" vulnerability in C and C++ would not occur in languages such as Ada and Java; instead of the program having an unspecified effect, an exception would be raised/thrown. But from the programmer's viewpoint, the important thing is to know that the overrun (or exception) will not occur. That requires a deeper analysis of the program's control and data flow than most compilers are able to perform, so special tools are needed.
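As a small sketch of the run-time behavior described here, the program below writes past the end of an eight-element array. It assumes that run-time checks have not been suppressed (for example through compiler switches or pragma Suppress). Buffer and Requested_Index are hypothetical names; the helper function stands in for a value that is only known at run time.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Index_Check_Demo is
   Buffer : array (1 .. 8) of Integer := (others => 0);

   --  Hypothetical helper standing in for a value known only at run
   --  time (user input, a length field in a message, and so on).
   function Requested_Index return Integer is
   begin
      return 9;
   end Requested_Index;
begin
   --  The index check fails here, so the assignment never happens.
   Buffer (Requested_Index) := 1;
   Put_Line ("not reached");
exception
   when Constraint_Error =>
      Put_Line ("Constraint_Error: index outside 1 .. 8");
end Index_Check_Demo;
```

The exception makes the failure well defined, but the program still fails. Proving that the handler can never be reached is the job of the deeper analysis discussed above.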

The presence of particular language features can help static analysis. A case in point is the ability to specify range constraints for scalar data in Ada, for example declaring that a variable N is in the range 0 through 100. Using range information, a static analyzer can compute more precise bounds on intermediate results and provide more accurate diagnostics for range violations and integer overflow.
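A range constraint of the kind mentioned above might be written as follows. This is only a sketch: the subtype name Percent and the function Scale are hypothetical, and the overflow reasoning in the comment assumes a 32-bit Integer.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Range_Demo is
   --  N is declared to lie in the range 0 through 100.
   subtype Percent is Integer range 0 .. 100;

   function Scale (N : Percent) return Integer is
   begin
      --  Given N in 0 .. 100, the product lies in 0 .. 100_000, so an
      --  analyzer can rule out overflow for a 32-bit Integer without
      --  examining any caller.
      return N * 1_000;
   end Scale;

   N : constant Percent := 100;
begin
   Put_Line ("Scale (N) =" & Integer'Image (Scale (N)));
end Range_Demo;
```

Had the parameter been declared as a plain Integer, the same multiplication could overflow for large inputs, and a tool would have to either warn or examine every call site.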

In the other direction, some language features make code difficult to analyze. For example, if a pointer can designate a declared variable, then detecting whether the variable is initialized before it is referenced becomes more complex.
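The aliasing difficulty can be seen in a sketch as small as the one below, where the names X, P, and Int_Ptr are hypothetical. X is never assigned by name, yet it is initialized before it is read, because the assignment goes through the pointer P.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Alias_Init_Demo is
   type Int_Ptr is access all Integer;

   X : aliased Integer;                 --  deliberately no initial value
   P : constant Int_Ptr := X'Access;    --  P designates the variable X
begin
   P.all := 7;                          --  initializes X, but only via P
   Put_Line ("X =" & Integer'Image (X));
end Alias_Init_Demo;
```

An analysis that tracks only assignments to the name X would wrongly report an uninitialized read here. To be accurate, the tool has to know which variables each pointer may designate.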

Wong: Where do static analysis tools fit in the software development life cycle?

Brosgol: Most tools fall into one of two categories. Some are "bug finders" that work either retrospectively on existing codebases in order to identify vulnerabilities (for example a memory leak checker), or as part of an ongoing development in order to prevent errors and vulnerabilities (for example a coding standard enforcer). Other tools help the user understand the code, for example by deriving preconditions for subprograms. These can be used during initial development by the original programmer, as part of a code review, or retrospectively on existing code that might need to be reused or updated. Roughly speaking, the first category of tools performs analysis, while the second category also performs synthesis.

Wong: There are lots of static analysis tools on the market. Has the technology reached its limit, or are there opportunities to advance the state of the art?

Brosgol: As noted above, there are two main categories of static analysis tools. But there is no reason that a tool could not serve both purposes: identify potential run-time errors, and also synthesize program properties that can help the user understand the code. By computing "deep" properties of the code, such a multi-faceted tool can help identify logic errors.

Wong: What do you mean by "deep" properties?

Brosgol: From a mathematical point of view, any subprogram (code module) computes a partial function that maps inputs (either parameters or global objects) to outputs. The programmer may or may not have a precise specification of the constraints on the input or the output, but a static analysis tool can compute these constraints, or a reasonable approximation, based on the actual code. The resulting "contract" for the subprogram can be reviewed by the developer to check whether it matches the intent. For example, if the derived precondition for an implementation of the arcsin function shows that its parameter can be outside the range -1.0 through 1.0, that indicates an error.
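For comparison, the intended contract for the arcsin example can be written down explicitly using Ada 2012 aspect syntax. The sketch below is not the output of any analysis tool. My_Arcsin is a hypothetical wrapper around the standard Arcsin function from Ada.Numerics.Elementary_Functions, and its precondition states what the developer would expect a derived precondition to match.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Numerics.Elementary_Functions;

procedure Arcsin_Contract_Demo is
   --  Hypothetical wrapper; the precondition records the intent that
   --  the argument lies in -1.0 through 1.0.
   function My_Arcsin (X : Float) return Float
     with Pre => X in -1.0 .. 1.0;

   function My_Arcsin (X : Float) return Float is
   begin
      return Ada.Numerics.Elementary_Functions.Arcsin (X);
   end My_Arcsin;
begin
   Put_Line ("Arcsin (0.5) =" & Float'Image (My_Arcsin (0.5)));
end Arcsin_Contract_Demo;
```

If a tool derived a wider input range for the body than the one stated here, the mismatch would point to a missing check or a logic error in the implementation.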

Wong: What are some examples of the properties that a multi-faceted static analysis tool can compute?

Brosgol: One of its main purposes is to identify code defects, so it should flag constructs that might raise an exception at run time. These include an array index out of range, integer division by zero, null pointer dereference, and integer overflow. The tool should detect accesses to possibly uninitialized variables, and also unprotected accesses to variables used by multiple tasks (race conditions). It should identify suspicious code that likely reflects a logic error (for example dead code, unused variables, or conditions that always evaluate to True).

The second main goal is to assist human code review. Here the idea is to characterize the input requirements and the net effect (preconditions and postconditions) of each subprogram. In some cases a subprogram whose source code is unavailable might be invoked; the tool can compute "presumptions" about such external subprograms based on information accumulated at each invocation. All these "contracts" (preconditions, postconditions, and presumptions) can be presented in human-readable form in addition to being maintained internally by the tool itself. Developers can use these contracts to verify both the explicit and implicit assumptions made by the code writers.

Wong: What about "false alarms"?

Brosgol: This is of course important; otherwise the user will be frustrated by a deluge of mostly irrelevant warning messages. A well-designed static analysis tool can mitigate this problem by qualifying each warning with a ranking of the likelihood that the message corresponds to a real problem. Basing the ranking on heuristics that the user can adjust is an effective way to deal with the practicalities.

Wong: How about other usability issues?

Brosgol: An important issue is ease of incremental use when the tool is applied to a system in development. A static analysis tool can generate a large body of data. When it is applied to an updated version of a program that has already been analyzed, it should be easy for the user to see what has changed in the tool's output. This can be addressed if the tool maintains an historical database.

Scalability is essential. If a tool is to be useful for real systems, then its performance should not degrade with program size. The use of a subprogram's "contracts" in the analysis of its invocations contributes towards this goal; there is no need to rescan the subprogram's complete code.

Another important issue, especially for systems in development versus existing code bases, is the ability to run the tool on one component of the system, even if the other components are not available. The computation of "presumptions" for external subprograms fits in with this goal.

A tool that calculates "deep" properties needs to carry out sophisticated control and data flow analysis. As an example, the analysis will identify, for the variables in Boolean conditions, the value sets that cause the conditions to take one branch versus the other. Consider this Ada program fragment:

if N > 0 then
   Do_Something;
end if;

Assume that N is a 32-bit integer value. The static analysis tool will compute the following value sets for N:

N : {-2_147_483_648 .. 0}, {1 .. 2_147_483_647}

These value sets appear in the human-readable output and can guide developers in the writing of run-time tests.
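One way to turn such value sets into run-time tests is to exercise the boundary values of each set. The sketch below does this for the fragment above. The procedure Check, the counter Calls, and the body given to Do_Something are hypothetical scaffolding, and the assertions take effect only when assertion checking is enabled in the compiler.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Value_Set_Tests is
   Calls : Natural := 0;

   --  Stand-in body that simply counts how often it is invoked.
   procedure Do_Something is
   begin
      Calls := Calls + 1;
   end Do_Something;

   --  The fragment under test, wrapped so it can be called with
   --  chosen values of N.
   procedure Check (N : Integer) is
   begin
      if N > 0 then
         Do_Something;
      end if;
   end Check;
begin
   --  Boundaries of the set Integer'First .. 0: branch not taken.
   Check (Integer'First);
   Check (0);
   pragma Assert (Calls = 0);

   --  Boundaries of the set 1 .. Integer'Last: branch taken.
   Check (1);
   Check (Integer'Last);
   pragma Assert (Calls = 2);

   Put_Line ("Do_Something called" & Natural'Image (Calls) & " times");
end Value_Set_Tests;
```

Four test values, two per value set, are enough to cover both outcomes of the condition along with the points where an off-by-one error in the comparison would show up.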

Wong: This all sounds promising. Do any currently available products provide all the functionality described above?

Brosgol: Yes, the CodePeer static analysis tool, which works on Ada code. This tool, developed jointly by AdaCore and SofCheck, was explicitly designed to serve both as a "bug detector" and an aid to program understanding / code review. A technical paper describing CodePeer's features and technology is available from AdaCore and SofCheck.