Static analysis has established itself as a "must-have" for the verification of critical software. Notably, it can find problems that are hard to uncover by testing, such as concurrency issues and security vulnerabilities. Many static analyzers can now associate CWE entries (for Common Weakness Enumeration,1 a well-known classification of security vulnerabilities) to their messages, to help users review all messages related to a given vulnerability. For example, CWE-120 is the classic buffer overflow.
But, in general, there’s no guarantee that the code is immune from classic buffer overflows even after reviewing all messages associated with CWE-120. Indeed, most static analyzers are "unsound" in their verification; that is, unable to report all potential problems. This has led projects to use a combination of static analyzers to detect as many security vulnerabilities as possible. For example, Mozilla uses Clang Analyzer, clang-tidy, their own checkers, and Coverity on its C/C++ code.2
But this still doesn’t provide any guarantee that a given vulnerability has been fully covered. Enter sound static analysis. Some static analyzers are "sound" in their verification and can guarantee that a given vulnerability isn’t present in some piece of software. In practice, sound static analyzers output an exhaustive list of places where the vulnerability could occur, most of which are false alarms or "false positives" that need to be reviewed.
While more demanding for users, these static analyzers make it possible to achieve higher levels of confidence than is possible with their unsound counterparts, which makes them attractive in a security context. A recent workshop at NIST focused on "Sound Static Analysis for Security"3 with presentations from providers and users of such tools showing the maturity of that field. In this article, we draw on that workshop’s results and our experience building both sound and unsound static analyzers to explain why their difference matters—and when you should use which tool.
Definition of Sound Analysis
A sound analyzer is correct with respect to a class of defects it can detect in a safe and exhaustive way. It offers the guarantee that it will detect any such vulnerability occurring during run-time. This is in contrast with an unsound analyzer, which might not flag faulty code lines (in order to limit the number of warnings returned to the user).
In addition, a sound analyzer is guaranteed to be exhaustive, at the expense of possible false positives: warnings emitted on code for which no problem actually occurs at execution time. As explained by David Wheeler in his talk about verification of legacy software,4 exhaustiveness is a large commitment, which can hardly be made by any other verification technique.
This difference between sound and unsound tools is shown in Figure 1. The green space represents the programs that are correct, the red part those that have a defect. In each illustration, a circle represents the answers provided by a tool, with the programs inside the circle being reported as correct and the programs outside the circle reported as potentially buggy.
1. Shown are green and red areas that indicate correct programs and those that have a defect, respectively, when using sound and unsound tools.
Though sound tools will never report a buggy program to be correct, they may report potential errors on correct programs (false positives). Unsound tools will typically generate many fewer false positives, but will not detect some erroneous programs. Unsound tools are sometimes even complete (they report no false positives), making them very easy to use—every reported bug is a real problem and should be fixed. (In literature, the term “precision” is sometimes used to refer to a static-analysis tool’s ability to avoid false positives. Thus, a complete tool is one that’s absolutely precise.)
An example of a shortcut used by an unsound tool to produce fast, mostly noise-free outputs is to entirely ignore large or difficult-to-analyze functions. Likewise, calls to functions of external libraries may be skipped. To be fast, such analyses are sometimes flow- and path-insensitive, meaning that they don’t depend on exactly what the function did “before” the source code line that’s being analyzed. As this usually results in poor precision, unsound tools will filter their outputs aggressively, using heuristics to determine which messages are likeliest to be correct.
In contrast, a sound tool is usually more accurate, albeit slower and less selective (higher rate of false positives). Still, even though they aim at detecting all occurrences of some kind of defect, the guarantees offered by a sound analyzer are subject to certain assumptions.
First, most analyzers work on the source code, as opposed to, say, the final executable. Discrepancies may be introduced by gaps between the logic semantics used for the verification and the actual execution of the program on the platform (because of bugs in a tool used in the compilation process for example). Second, a sound analyzer might not properly support certain language constructs which are too complicated, resorting to approximations to handle them.
Finally, to keep the analysis tractable and to increase the precision of the tool, a sound tool may choose to treat in an unsound way some particular language features. For example, it might assume that calls to completely unknown functions don’t modify variables, or it might ignore attempts to manipulate the stack through setjmp/longjmp, etc.
In contrast with unsound tools, a sound tool usually reports the assumptions it makes, so that they can be verified by others means. As explained by Roderick Chapman, this is part of the soundness case of a sound static-analysis tool, and helps to build trust in the tool.5 The need for such a gap with “full” or “ideal” soundness has been recognized by the research community in the soundiness manifesto.6
Please note an important point of terminology. Sound is sometimes used also for unsound tools! In that context, it usually means that all findings are real bugs, i.e., that no false positives are ever emitted. This is what we called complete above. Hence, care must be taken when reading that a tool is sound. In this article, we always use “sound” in the sense of “no false negatives”; in other words, all possible bugs are reported.
Costs and Benefits of Sound and Unsound Analysis
Sound and unsound analyzers bring different benefits for different costs. The first benefit of a sound analyzer is that it guarantees absence of a certain class of defects—notwithstanding a set of assumptions more or less easily reviewable. In addition, while unsound analyzers generally aim at detecting bugs, sound analyzers can be used to verify that the software has some “good” properties in a larger sense. These good properties naturally include absence of certain defects, but aren’t limited to that. Some examples of good properties targeted by sound analyzers include:
- Correct data flows: Each function only accesses the data it’s allowed to access, and its output values only depend on the input values from which they’re allowed to depend. This is particularly interesting for security applications, to ensure that secrets don’t leak.
- Absence of some class of defects: Initialization errors, run-time errors, concurrency errors.
- Verification of key safety or security properties: Preservation of safety invariants, unreachability of bad states, etc.
- Functional correctness: The program conforms to its high-level specification. In these last two cases, required properties should be manually expressed by the user in a dedicated formal language.
The major benefit of unsound analysis, on the other hand, is its cost. Since they require little or no annotations, and will focus on limiting the number of false alarms they generate, unsound analyzers are generally easy to apply. In fact, they may produce quick gains even when applied blindly on large existing code bases. Conversely, sound analyses often take more effort to apply. Depending on the technique used, they may require code changes, user-supplied annotations, or reviews of numerous false alarms.
To alleviate this cost, a project can take a step-by-step approach to the adoption of sound tools, achieving increased benefits with each step. A scale of confidence levels for the TrustInSoft Analyzer was presented by Benjamin Monate.7 An example of this approach is the adoption process for the SPARK analyzer, which was accomplished as a joint effort between AdaCore and Thales.8 It defines five levels of assurance, offering increasing benefits (and costs) as a project ascends to higher levels (Fig. 2).
2. In the SPARK analyzer adoption process, benefits (and costs) increase as the project ascends to higher levels.
Each level builds on the previous ones—the higher levels require more work to achieve, but also provide more gain. To reach the bronze level, no user-provided annotation is needed, even though some work is already required to identify parts of the code that should be analyzed, possibly rewriting them to comply with tool-imposed restrictions. Reaching this level already ensures total absence of some defects, like uninitialized data, aliasing of parameters, or data race conditions.
The platinum level, at the other end of the spectrum, requires writing the full functional requirements as specification in the code, as well as providing intermediate specifications and possibly other annotations when required by the tool. But code verified at this level is guaranteed to be free from most run-time errors, and to conform with its supplied specification.
As discussed by Stuart Matthews, the level to aim for on a specific project is a difficult question.9 It may depend on the level of integrity required by the project, on the high-level objectives discharged by the analysis tool, as well as on the kind of code and properties that are envisioned. Different levels may even be targeted on different parts of the same software; for example, depending on their relative criticality or the features that they use (such as floating point numbers, nonlinear arithmetic computations etc.).
Unsound static analyzers allow developers to quickly find defects in a large code base. Sound static analyzers guarantee finding all defects of a given kind, albeit with additional effort. As emphasized by Paul Black in his introduction to the workshop,10 there are different static-analysis techniques for different use cases, which all have their strengths and weaknesses. They complement each other well, and the appropriate technique(s) should be chosen depending on the use case.
Thanks to its ease of deployment, unsound static analysis has become a standard tool in serious software development. It’s used in most large software companies, and advised by best practices. Due to its higher cost, sound static analysis has long been the domain of experts. However, with the recent progress in verification techniques, sound static analysis is used in more and more projects, and is becoming part of the standard development process when strong safety or security requirements are needed. In the years to come, sound static analysis may become a standard tool for critical software development.
2. https://fosdem.org/2018/schedule/event/mozilla_how_ship_quality_software/attachments/slides/2330/export/events/attachments/mozilla_how_ship_quality_software/slides/2330/firefox_quality_fosdem_2018.pdf, slides 24-25