Static and dynamic code analysis can improve application performance, safety and reliability by identifying problems early in the development cycle if the proper tools and procedures are used from the start. Dr. Mike Hennell, founder of LDRA, spoke with me about the various aspects of these types of tools.
Wong: What is static analysis? What kinds of software errors does it identify?
Hennell: Static analysis is performed on the code itself, which is usually, but not necessarily, in a high-level language. The code is not executed. Static analysis is directed at finding the technical faults in the code. These are the class of faults which could occur in any piece of code regardless of application and are features of the language itself and the understanding and intent of the programmers. An example would be a divide-by-zero fault, which could in principle occur in any application. In general, the technique is based on knowledge of the syntax of the language and only limited use of the semantics.
Static analysis can consist of a simple scan of the code in order to discover violations of simple rules such as use of goto statements, or it can consist of a deep analysis of the whole project scanning all possible paths in order to discover whether, e.g. files can be written to after they have been closed.
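As an illustration of a simple rule scan that never executes the code, here is a minimal sketch using Python's standard `ast` module. It flags only division or modulo by a literal zero. The function name `literal_zero_divisions` and the sample source are assumptions made for this example, and a real tool would also need to reason about variables whose value may be zero.

```python
import ast

def literal_zero_divisions(source):
    """Return the line numbers of divisions or modulo operations whose
    right-hand operand is the literal 0. The source is parsed, not run."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, (ast.Div, ast.FloorDiv, ast.Mod))
            and isinstance(node.right, ast.Constant)
            and type(node.right.value) in (int, float)
            and node.right.value == 0
        ):
            findings.append(node.lineno)
    return sorted(findings)

# Hypothetical code under analysis; it is only parsed, never executed.
SAMPLE = "ratio = total / 0\nsafe = total / 2\nrest = total % 0\n"

print(literal_zero_divisions(SAMPLE))
```

The scan relies on syntax alone, which is why it cannot see a divisor that only becomes zero at run time.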
However, the definition of static analysis is contentious and many formal methods techniques can also be included in this category.
Examples of the techniques performed by static analysis are:
- Programming standards verification – which assesses if the source code conforms to a particular set of programming rules or guidelines.
- Structured programming verification – which determines whether a program is well structured.
- Complexity metric production – which measures a range of complexity metrics such as cyclomatic complexity, knots, essential cyclomatic complexity, essential knots, loop depth, etc.
- Full variable cross referencing – which relates the uses of global and local variables across a project allowing for aliasing (through pointers, references and parameters).
- Unreachable code reporting – which shows that control flow cannot reach the components.
- Static data flow analysis – which follows the use of variables through the control flow graph and reports anomalous events, e.g. the computation of a value which is never used.
- Loop analysis – which assesses the interrelationships of loops, ensuring that they are properly formed and do not have unnecessary complexity or the ability to loop forever.
- Recursive procedure analysis – which reports the use of various types of recursion and detects whether there is a capability to recurse forever or the potential to run out of stack space.
- Procedure interface analysis – which analyses the procedure interfaces for defects and deficiencies. The interfaces are then projected through the complete project to detect integration faults.
- Pointer analysis – which looks at the use of pointers and the objects to which they point in order to detect anomalous behaviour.
- File usage analysis – which traces the control flow graph for defects in the use of files, e.g. failure to open before writing, failure to close, multiple opening, multiple closing etc.
- Deadlock detection – which looks for various types of concurrent execution which can cause mutual interference and run-time faults.
- Information flow analysis – which relates the input variables to the output variables.
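To make one item from the list concrete, the file usage analysis can be sketched as a small forward data flow pass over a control flow graph. This is a simplified illustration, not the method of any particular tool: the graph encoding (`ops`, `edges`), the node numbers and the function name `find_write_after_close` are all assumptions made for the example, and only a single file is tracked.

```python
from collections import deque

def find_write_after_close(ops, edges, entry):
    """ops maps node -> 'open' | 'write' | 'close' | None.
    edges maps node -> list of successor nodes.
    Returns the 'write' nodes that some path reaches with the file closed."""
    in_states = {node: set() for node in ops}
    in_states[entry] = {"closed"}          # the file starts out closed
    worklist = deque([entry])
    while worklist:
        node = worklist.popleft()
        op = ops[node]
        if op == "open":
            out = {"open"}
        elif op == "close":
            out = {"closed"}
        else:
            out = set(in_states[node])     # writes and no-ops keep the state
        for succ in edges.get(node, []):
            if not out <= in_states[succ]:
                in_states[succ] |= out     # merge states where paths join
                worklist.append(succ)
    return sorted(
        node for node, op in ops.items()
        if op == "write" and "closed" in in_states[node]
    )

# Hypothetical procedure: open, then a branch that closes the file on one
# arm only, then a write after the two arms join again.
OPS = {1: "open", 2: None, 3: "close", 4: None, 5: "write"}
EDGES = {1: [2], 2: [3, 4], 3: [4], 4: [5]}

print(find_write_after_close(OPS, EDGES, entry=1))
```

Because the states from both arms of the branch are merged at node 4, the write at node 5 is reported even though only one path reaches it with the file closed.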
Wong: What doesn’t Static Analysis do? Where does it fall short?
Hennell: Static analysis by itself cannot show application faults since it does not know what the software is supposed to do. It is also subject to the consequences of the halting theorem, which means that it is not possible in all cases to determine certain classes of faults, e.g. the presence of all infeasible code. However, it is possible to add annotations to provide semantic information and also application information, in which case the class of faults can be widened.
Wong: What’s the difference between Static and Dynamic Analysis? Are these separate tools? Do results from the static analysis engine get used for dynamic analysis?
Hennell: Dynamic analysis is performed on the executing code. It is generally dependent on at least a minimal amount of static analysis so the two techniques are not wholly independent. The degree of dependence depends on exactly what dynamic information is required.
Dynamic analysis can detect application faults because the test data which drives it is built on knowledge of the requirements or at least some knowledge of what the software is supposed to do.
It is the quality and origin of the test data which determines the efficacy of dynamic analysis. The project or components of the project subject to test are executed and the consequent behaviour is monitored. The test data therefore must be subjected to quality assessment; these quality assessments are usually termed coverage metrics.
Wong: How does static analysis work?
Hennell: Static analysis in general uses the language syntax to derive a control flow graph of the whole project or control flow graphs for each procedure and possibly a control flow tree to show the interconnections of the procedures. The properties and characteristics of these graphs such as size and complexity can be measured and reported.
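As a small example of measuring a property of such a graph, the sketch below computes McCabe's cyclomatic complexity, V(G) = E - N + 2, for one procedure whose control flow graph is connected and has a single entry and a single exit. The adjacency-dictionary encoding and the node names are assumptions made for illustration.

```python
def cyclomatic_complexity(edges):
    """edges maps each node to the list of its successors.
    Assumes one connected graph with a single entry and a single exit."""
    nodes = set(edges)
    for successors in edges.values():
        nodes.update(successors)
    edge_count = sum(len(successors) for successors in edges.values())
    return edge_count - len(nodes) + 2

# Hypothetical control flow graph of a procedure with one if/else.
IF_ELSE = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
}

print(cyclomatic_complexity(IF_ELSE))
```

A straight-line procedure scores 1, and each additional decision point adds one.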
By annotating the graphs with details of the variables and their uses, wide classes of faults can be determined. This analysis is usually known as ‘deep analysis’ and comprises techniques such as dataflow analysis and many formal methods techniques. These analyses scan the annotated graphs using powerful mathematically based algorithms, searching all paths for anomalous behaviour.
The graphs can be scanned for example to derive worst case path lengths, stack usage and other properties.
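One way to picture the stack usage scan is a depth-first walk over a call graph that adds each procedure's frame size to the deepest chain of calls beneath it. This is a sketch under stated assumptions: the call graph, the frame sizes in bytes and the name `worst_case_stack` are invented for the example, and recursion is treated as unbounded rather than analysed.

```python
def worst_case_stack(call_graph, frame_sizes, root):
    """Return the largest total frame size along any call chain from root.
    Raises ValueError if the call graph contains recursion."""
    memo = {}
    on_path = set()

    def visit(function):
        if function in memo:
            return memo[function]
        if function in on_path:
            raise ValueError(f"recursion through {function}: no static bound")
        on_path.add(function)
        callees = call_graph.get(function, [])
        deepest = max((visit(callee) for callee in callees), default=0)
        on_path.discard(function)
        memo[function] = frame_sizes[function] + deepest
        return memo[function]

    return visit(root)

# Hypothetical project: main calls parse and log, parse also calls log.
CALLS = {"main": ["parse", "log"], "parse": ["log"]}
FRAMES = {"main": 32, "parse": 64, "log": 16}

print(worst_case_stack(CALLS, FRAMES, "main"))
```

The worst case here is the chain main, parse, log, which is why the result is larger than any single frame.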
Wong: What normally takes more time: parsing the program or analyzing the tree?
Hennell: Parsing a program is usually a fast activity. It is the analysis in which time can be consumed extensively as the type of analysis becomes more complex (deep) and the detected faults more subtle. As a general rule, the faster the tool the more shallow the analysis.
Wong: Is it possible to do incremental analysis?
Hennell: Incremental analysis is possible. For instance, it is possible to build up the big picture by analyzing each procedure as it becomes available and then finally determining how they all interact. Some tools have the capability to reuse the results of past analyses and concentrate on areas where there are differences.
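The reuse of past results can be sketched as a cache keyed by a hash of each procedure's source, so that only changed procedures are analysed again. The class `IncrementalAnalyzer` and the stand-in analysis `count_gotos` are hypothetical, and the final step of working out how the procedures interact is not shown.

```python
import hashlib

class IncrementalAnalyzer:
    """Re-runs a per-procedure analysis only when the source has changed."""

    def __init__(self, analyze):
        self._analyze = analyze
        self._cache = {}        # procedure name -> (source digest, result)
        self.reanalyzed = []    # names analysed during the latest run

    def run(self, procedures):
        self.reanalyzed = []
        results = {}
        for name, source in procedures.items():
            digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
            cached = self._cache.get(name)
            if cached is None or cached[0] != digest:
                self._cache[name] = (digest, self._analyze(source))
                self.reanalyzed.append(name)
            results[name] = self._cache[name][1]
        return results

def count_gotos(source):
    """Stand-in for a real analysis: crude count of the word 'goto'."""
    return source.count("goto")

analyzer = IncrementalAnalyzer(count_gotos)
analyzer.run({"f": "goto a;", "g": "return;"})
analyzer.run({"f": "goto a;", "g": "goto b; goto c;"})
print(analyzer.reanalyzed)  # only the procedure whose source changed
```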
Wong: Are there advantages or disadvantages to using multiple static analysis tools?
Hennell: Using multiple tools has both advantages and disadvantages. Since the tools usually differ considerably, there is the possibility of detecting a wider class of faults. On the other hand, there is also the possibility of apparently conflicting information due to the ambiguity in many of the common programming guidelines and the different interpretations in tools. For instance, not all metrics are uniquely defined, e.g. the common metric of a line of code can be dependent on different formatting strategies or basic file structure.
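The line-of-code ambiguity is easy to reproduce. In the sketch below, two hypothetical counting rules are applied to the same C fragment laid out in two ways. Both function names are assumptions, and counting semicolons is only a crude proxy for statements (it miscounts `for` headers and string literals, for example).

```python
def physical_lines(source):
    """Count lines as they appear in the file."""
    return len(source.splitlines())

def statement_count(source):
    """Crude proxy for logical lines: count statement terminators."""
    return source.count(";")

# The same two declarations, formatted on one line and on two lines.
ONE_LINE = "int a = 1; int b = 2;\n"
TWO_LINES = "int a = 1;\nint b = 2;\n"

for layout in (ONE_LINE, TWO_LINES):
    print(physical_lines(layout), statement_count(layout))
```

Two tools that each report "lines of code" can therefore disagree on identical code if one counts physical lines and the other counts statements.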
Wong: Can static analysis be accelerated using multicore or server farms?
Hennell: Some aspects of static analysis can be accelerated by hardware platforms. The analysis of each procedure in a project can be handled separately. Similarly, some aspects of graph analysis can be handled more effectively by multiprocessors.
Wong: How does dynamic analysis work?
Hennell: Dynamic analysis works by detecting the order of specific events such as control flow jumps. This information can be obtained by various techniques such as instrumentation (i.e. inserting extra statements such as print statements) or by monitoring events on data-buses. The objective is to determine firstly the exact sequence of events and possibly the values of specific variables. In more sophisticated tools the order of events detected is compared with the order of events as predicted by a static analysis. This turns the technique from a monitoring technique to a fault detecting technique.
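The comparison of observed and predicted event order can be sketched as follows. The function `classify` is instrumented by hand with probe calls, and `PREDICTED_EDGES` stands in for the control flow transitions a static analysis would report. All names here are assumptions made for the example, not the interface of any real tool.

```python
# Transitions a static analysis of `classify` would predict (hand-written).
PREDICTED_EDGES = {
    ("entry", "negative"),
    ("entry", "non_negative"),
    ("negative", "exit"),
    ("non_negative", "exit"),
}

def classify(x, probe):
    """Instrumented function: each probe call records a control flow event."""
    probe("entry")
    if x < 0:
        probe("negative")
        result = "neg"
    else:
        probe("non_negative")
        result = "non-neg"
    probe("exit")
    return result

def run_traced(x):
    """Execute classify and return the ordered list of recorded events."""
    events = []
    classify(x, events.append)
    return events

def unexpected_transitions(events, predicted_edges):
    """Return consecutive event pairs that the static prediction lacks."""
    return [
        pair for pair in zip(events, events[1:])
        if pair not in predicted_edges
    ]

print(run_traced(-1))
print(unexpected_transitions(run_traced(5), PREDICTED_EDGES))
```

An empty result means the run stayed within the predicted control flow. Any pair that is returned marks a point where actual behaviour departed from the prediction.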
If the outputs of the program are compared with those of the requirements, then the technique is capable of detecting application faults and it is in this mode that the technique is most widely used. The ability to detect faults is strongly dependent on the quality of the test data and this quality is usually measured by the coverage (of statements, branches, etc) achieved by the test data. The level of coverage metrics achievable varies from tool to tool, the more sophisticated offering a wide variety of achievement metrics.
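A minimal sketch of how branch coverage grades a set of test data: the function under test reports each branch outcome it takes, and coverage is the fraction of known outcomes that the test inputs exercised. The names `abs_value`, `branch_coverage` and the branch labels are assumptions made for illustration. Real tools insert such markers automatically.

```python
def abs_value(x, mark):
    """Function under test, instrumented to report each branch outcome."""
    if x < 0:
        mark("x<0:true")
        return -x
    mark("x<0:false")
    return x

ALL_BRANCHES = {"x<0:true", "x<0:false"}

def branch_coverage(function, all_branches, test_inputs):
    """Run the function on every input and return the covered fraction."""
    hit = set()
    for value in test_inputs:
        function(value, hit.add)
    return len(hit & all_branches) / len(all_branches)

print(branch_coverage(abs_value, ALL_BRANCHES, [3, 7]))   # one outcome only
print(branch_coverage(abs_value, ALL_BRANCHES, [3, -2]))  # both outcomes
```

The first test set never takes the negative branch, so a fault in that branch could not be detected by it however carefully the outputs were checked.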
Some dynamic analysis tools automatically support the whole dynamic testing environment such as:
- tracing requirements to and from high level down to source or object code,
- automatically building test harnesses,
- constructing test data in various ways,
- comparing actual outputs with expected outputs,
- running regression tests,
- managing the test process,
- documenting the whole testing process.
Wong: Can hardware be used to assist in dynamic analysis?
Hennell: Yes, hardware can help. The major difficulty with dynamic analysis is that it is necessary to capture information from the executing program and there are many different hardware solutions to this problem.
Wong: How are the results depicted? Does a static analysis engine create compiler-like reports where there is a list of rules which have been violated or are there ways to graphically depict findings?
Hennell: The information yielded by both static and dynamic analysis can be so varied that there are many different ways of presentation. They can vary from a simple annotation of the code like a compiler listing to a detailed graphical representation. Faults which arise from multiple sources are particularly hard to represent. Tabular forms are also useful, particularly for information on metrics and other software characteristics. The grading of results can also be particularly helpful since the presence of some faults is more serious than others. For instance, portability faults are not of great interest if the software will never be ported to another environment.
Wong: My understanding is that static analysis engines are used primarily to enforce programming standards. How does it do this and what standards does it support?
Hennell: It is certainly true that there are many static analysis engines devoted to the detection of violations of programming standards but in itself this can be misleading. Some programming standards are themselves very simplistic, e.g. banning the use of gotos, and others can be extremely sophisticated, e.g. checking arrays for bounds violations. In either case, while the static analysis engine can be directed to the same apparent objective (checking a given standard), the sophistication of those engines can vary dramatically.
Similarly, just counting the total number of rules enforced by a tool can be misleading because it is possible that most or all of the rules are formatting and pretty printing rather than fault detecting.
The LDRA static analysis has some 800 rules, of which 750 identify defects which in unfortunate circumstances could lead to faults. The internationally recognised standards which it enforces are:
|C Standards|C++ Standards|Ada Standards|
|---|---|---|
|MISRA-C:1998|JSF++ AV|SPARK Ada subset|
|MISRA-C:2004|High Integrity C++*|Ravenscar Profile|
|HIS|LM Train Control Program| |
Organisations can configure the static analysis engine to check for the rules which they are required to meet due to industry or in-house programming requirements.
Wong: Are security vulnerabilities also identified by a static analysis engine?
Hennell: Most known security vulnerabilities can be detected by static analysis, but most of them require the most sophisticated variants. In June 2009, CERT-C, a security-oriented standard, was released. This standard, along with the US Department of Homeland Security’s Common Weakness Enumeration (CWE), forms the basis of modern security checking tools. There is a considerable degree of overlap between these guidelines and those developed for safety.
Wong: What about process-orientated certifications such as DO-178B or ISO 26262?
Hennell: All the process-oriented standards require the use of sophisticated programming standards and additionally frequently specify required software characteristics such as low complexity. These can only be checked by static analysis and hence it is not possible to obtain certification without the use of static analysis tools or a large amount of manpower deployment. The standards explicitly mandate the use of dynamic analysis.
Wong: What do you believe are the top strengths of your static analysis product?
Hennell: The LDRA tool suite offers the deepest and widest static analysis currently available. It checks for around 800 different programming rules. Of these rules, some 100 report potential integration faults which are the hardest to find and are the most expensive if left in the code to find during integration or service. Despite the depth and quantity of the checks, the time taken is not excessive and the tool can be invoked at any time.
The use of the TBvision® product enables users not only to view the violations but also to switch between different programming standards and to benchmark the current version against previous versions to assess progress. The general presentation of results is extremely flexible and the different views enhance the visibility of the faults. This feature will also assist by highlighting the seriousness of the different faults detected.
The tool presents its reports in a format which has been accepted by all the world’s major certification agencies.
Finally, the close integration of the LDRA static and dynamic analysis capabilities enables the dynamic analysis to detect differences between actual and predicted run-time behaviour, which makes it the most powerful dynamic analysis available.