Take any interface you use in your daily work. It may be the application programming interface (API) from a standard component, some internal library shared within your company, or a piece of code some guy in the next cubicle is developing for you. How would you know what to call it?
Modern languages provide various ways to identify services as well as various means to refer to those services, through typing, function declarations, or parameter specifications. Certain languages may give you a little bit more. Java specifies what exceptions may be thrown, Ada enables you to specify type ranges, C allows… um, never mind.
This level of specification reveals little about the behavior of the application, though, so there’s the need for something else, informal, not checked by any tool: the infamous documentation, almost exclusively in the form of comments. “Now why is that a problem?” you might say. “It’s already hard enough to get people to comment!”
Documentation is an achievement in a number of situations. But when reliability is at stake, it’s merely a Stone Age specification. How about verifying that the specification is correct? How about verifying that callers make the calls the way they’re supposed to and that they rely on properties they’re supposed to rely on? How about ensuring throughout the lifetime of the project that the specification stays consistent after various fixes, refactorings, and rewritings?
Some alternative techniques go beyond bare comments for specifying behavior and use formalisms checked by tools. Here, we’ll consider three languages in particular: C, Java, and Ada. We’ll also show how, once a specification is written, it can be used not only for documentation but also directly by testing and static analysis tools.
Table of Contents
- Precondition, Postcondition, And Type Invariant
- Java And JML
- C And ACSL
- Ada And Ada 2012
- Testing Executable Annotations
- Helping Static Analysis
- All The Way To Formal Proof?
Let’s first introduce the notion of a contract. In programming languages, a contract is typically a Boolean expression that’s verified at certain points in the program. The most interesting contract is probably the postcondition. It defines the properties that must be true after a function call. Typically, it can be a means of specifying the behavior or requirements of that function. It’s a constraint put on the implementer (who must fulfill the postcondition in the implementation) and a guarantee given to the caller (who can rely on the postcondition after the call).
To fulfill the postcondition, we normally need two things: a correct implementation of the function and a set of properties that need to be met before the call. These properties are called preconditions. A precondition is a constraint put on the caller (who must fulfill it before the call to ensure proper behavior) and a guarantee given to the implementer (who can rely on the precondition in its implementation). A postcondition is a guarantee given to the caller (the state of the environment after the call) and a constraint on the implementer (what the function has to perform).
A type invariant is a property that must be true at all times for every object of a given type. It can describe value ranges, relationships between fields, and similar properties.
Here, we’ll consider a simple system where physical objects are moved in a two-dimensional environment. Each object is associated with an X and Y coordinate, a size, and an identifier. A function “iterate” is called at each cycle and is responsible for updating the object’s position by a step. Objects will be stored on a list.
In addition to the structural specification, we’ll add some behavioral requirements:
1. X and Y must be within the interval [–200.0, 200].
2. Size must be within the interval [0 .. 100].
3. Except when set to the special non-initialized value, all object IDs must be unique.
4. Iterate must only be called on objects that have been initialized.
5. Iterate cannot change an object’s size or ID.
6. Each movement made by iterate must be smaller than a tenth of the object’s size.
Some of these requirements can clearly be mapped to type invariants, namely 1, 2, and 3, which are proprieties of types and data that must be true throughout the entire life of the program. Others can be mapped to pre-conditions, in particular 4. Finally, requirements 5 and 6 describe the function’s behavior, therefore mapping to post-conditions. Now, it’s possible to implement the above specification using three different languages based on Java, C, and Ada.
Java Modeling Language (JML) is an extension to Java, where the Java code should be compliable using a regular Java complier. Therefore, all JML annotations are written in special java comments, starting with //@ or /*@. Let’s first translate and then comment Code 1.
Several things are worth noting in Code 1. First, requirements 1, 2, and 3 have been translated into three type invariants, providing conditions on the range of the fields. The iterate subprogram is also associated with contracts. “Requires” states the precondition. In our semantics, it means the ID must have been initialized or different from NO_INIT. “Ensures” states the postcondition. Note that we can refer to the value of a variable before the call through the \old notation. The assignable command lists objects whose value may be modified by the implementation of the function.
The last invariant is introducing an additional expression capability required to express our requirements, the quantifier. Here, using the \forall notation, we specify that for all values of a variable j for which the expression (j >= 0 && j < list.size) is true, either “list [j].id.equals (NO_INIT)” is true or the ID is unique. This uniqueness is expressed through a quantifier (“\forall”) expression.
JML offers many additional methods to extend the expressiveness of the specification. In addition to requires, ensures, invariant, \forall, and \old that we just described, the language grammar also includes:
• pure, defining a function with no side effects
• \result, referring to the value returned by a function
• \exists, which checks that a property is true for at least one value of set
Once the specifications are written, additional tools will be required to verify their consistency (with regards to the JML specification), translate them into executable assertions for dynamic checks, or perform static analysis and/or formal proofs. More information on these tools can be found on the JML Web site, http://www.eecs.ucf.edu/~leavens/JML/index.shtml.
Following the design principles of JML (among others), a language for behavioral specification has been designed for C: the ANSI/ISO C Specification Language (ACSL). Code 2 is an example of a program similar to the one shown above in JML.
The first notable difference between the ACSL and JML examples is that, while JML allows the introduction of executable contracts that can refer to the actual entities in the code, ACSL is currently purely formal. Therefore, the ACSL cannot call “real” functions, but only formal ones, and can reason only on the corresponding formal properties. This is the particular intent of the Math axiomatic, introducing the pow and sqrt formal functions.
The Range_Object axiomatic is defining the range of values for x, y and the size fields of an object. This axiomatic is then used in the type invariant to verify that all instances of the object type respect this axiom. Note as well the \valid (v) construct, which specifies that the pointer must be valid (can be dereferenced).
Other than these two differences, this is pretty close to the JML example. More information on ACSL can be found at http://frama-c.com/acsl.html.
One of the major new features of the Ada 2012 language is its ability to include significant contract information in the language definition, without the need for external tools, along with the corresponding dynamic semantics. Code 3 shows how the same specifications in the previous examples can be written in Ada 2012.
The first notable difference is that while type ranges have to be defined using invariants in JML and ACSL, Ada considers this common enough to provide a specific way to express it: the value range. Indeed, this kind of type invariant has been available since the very first version of the language. Type ranges could have been expressed through an Ada 2012 contract in this specific case, but using a range will allow the compiler to perform more precise analysis.
Generalized contracts are introduced using the Ada 2012 notation called an aspect, although a backwards compatible version using pragmas is available, allowing pre-2012 tools to still process the code. Beyond this syntactical difference, precondition and postcondition are very similar to require, and they ensure clauses. The same is true with the quantifier, with the notable exception that in Ada the range of the value is explicitly defined while in JML it’s defined by an expression. So, the translation between an Ada quantifier and its corresponding executable loop is more obvious to the tool, though the quantifier may be less expressive.
An Ada compiler will be able to check the consistency and generate dynamic checks from contracts itself, without additional tooling. However, formal verification will require the use of external tools. The SPARK language, which has a long history in the formal proof domain, is currently enhanced to support the Ada 2012 formalism.
You can access an open-source Ada compiler implementing these contracts as well as the open-source version of the current SPARK tools at http://libre.adacore.com. The first edition of the new version of the SPARK language (still being developed) is available at www.open-do.org/projects/hi-lite/.
Errors found at integration time—or worse, after deployment—are among the biggest sources of budget overruns in programming projects. Pieces of the puzzle may seem to work fine independently, but the biggest inconsistencies arise only when they’re put together, sometimes involving specification changes to fix them. The ability to formalize a contract shared between modules is a great step forward to ensure the expectations from the implementer and user are the same. The next step, quite naturally, is to check that the contract is adhered to.
One of the key advantages of formalisms such as JML and Ada 2012 is that contracts are expressed in the form of “executable annotations,” which means they’re associated with a dynamic semantic and therefore can be used to generate checks in the code. However, this ability can have an important impact at runtime. It’s not uncommon to have a contract that can take more time to check than the code itself!
Next, programmers should consider the places where checks are performed. For preconditions and postconditions, this is rather obvious: checks will be made around points of calls. But with regards to invariants, things can be trickier, as the compiler will need to decide when to perform the checks. Let’s illustrate this with an Ada 2012 example.
In Ada 2012, predicates on a type (one particular type of invariant) are checked on parameter passing and assignment. So if we have Code 4, there will be a check failure on the assignment, since the predicate is not true. No check is generated on individual field modifications, though, so Code 5 does not raise an exception.
Although the assignment to the V fields breaks the invariant, no exception is raised on these two statements. Thankfully, as soon as a call using V as a parameter is done, a subtype check will occur and the inconsistency will be pointed out. Hopefully, this will not be too far from the introduction of the problem.
In short, checking the code with dynamic contracts is a definite improvement and will allow many errors to be spotted earlier in the process. However, it needs to be practical in terms of processing time and won’t be a complete proof that these contracts are always honored, but rather that they are correct in the subset of the inputs encountered, at the points where checks are done.
Also known as bug finders, static analysis tools look at potential vulnerabilities in the code such as buffer overflows, exceptions potentially raised, dead code, and infinite loops by interpreting the code statically before any execution. Using them is a great way to find problems early in the development process, complementary to testing.
However, you can only look at violations of constraints that are defined in the source code, and by default there’s not much that can be said. Having contracts in the code will improve code analysis a great deal. It will give static analysis additional things to verify as well as information about how things are meant to be. Even if they aren’t proven, these contracts will be accounted for to produce additional inferences, a bit like axioms.
One important point that should be noted here is that we’re talking about static analysis, as opposed to formal proof. The main difference is that static analysis usually isn’t “sound” or perfect. Depending on the strategy, it will only report a subset of the problem with the application—the parts that are the easiest to find.
Let’s take a simple example. Consider Code 6, which at some point is accessing to the value of an array. It has a mistake. Indeed, at run-time, we pass an array that (according to Ada semantics) has bounds between 1 and 10. As you can see, X is passed with a value of –10, and X+10 (0) is used within P to compute some value. At this point, there’s an out of bounds exception, since 0 is not within the bounds of the array.
According to static analysis, there’s a potential error on the V (X + 10) call. Depending on the rest of the code, the static analyzer will either be able to say there is a definite error at that point or that it’s unable to verify that there’s no error, leaving it as unknown. The question here is, conceptually, if we are making an error when accessing the component or if the error is coming from the call itself, not respecting some (implicit) requirements.
Specifying which of those is the case will help both the developer and static analysis tool. Code 7 slightly changes the code with a contract. We now know that the access V (X + 10) has to be valid because the precondition makes this validity an explicit requirement. Static analysis tools will be able to take advantage of this information and point to the actual error on the subprogram call rather than on the access.
This is just one example out of many of how contracts can help static analysis. It’s unlikely that it will ever say “this code is correct.” But adding contracts will help move potential errors around so they can be detected sooner and in a more appropriate and manageable manner.
At this stage, let’s agree on a common understanding of the word “correct” when it relates to a program. For the purpose of our discussion, a program is correct when it complies with its specifications for all possible inputs. If the specifications are wrong, there’s no chance to get a program “right,” but it will, however, be considered correct. This notion of specification can be broader than just explicit preconditions and postconditions. For example, an implicit specification of a program is often that it should not raise any exceptions.
As we have seen, static analysis is an extremely useful tool to detect errors in an application. But in actual usage, many things cannot be determined to be either correct or incorrect. The tool will just “not know” and issue a probable false alarm. This false alarm can be due to several factors, such as the complexity of the algorithms it’s trying to prove things about, usage of language features that are not a good fit for formal proof, limitations of the analysis technology, or a lack of axioms to reason with.
Yet there are very good reasons why you may want to reach a demonstration of the full correctness of an application. To achieve that, the analysis tool needs to be used in conjunction with a language (subset) that is fit for analysis. Certain constructs introduce strong theoretical limitations and must be forbidden.
One interesting example of this is the Ada 2012 subset introduced by the new versions of the SPARK language. Three main categories of constructs are forbidden: exceptions, tasking (multi-threading), and pointers. Out of these three, the pointer aspect is probably the one that causes the strongest theoretical limitations.
Code 8 is a simple example that illustrates this. For the non-Ada reader, the only syntactical element that may look strange is the “.all” notation, which performs a dereference. This would be equivalent to “*A” in C and C++.This looks relatively straightforward and most likely is correct. However, the postcondition cannot be proven. Indeed, the postcondition is satisfied only if A and B point to two different objects. But if both pointers point to the same object, both “A.all” and “B.all” will be equal to “C - D” at the exit of the subprogram.
This is just one example illustrating the kinds of constraints that arise when trying to introduce strong formal prove techniques. This kind of problem can be detected by using constructs a bit more constrained than low-level pointers, such as references in SPARK. There’s much more to say here, but it’s probably the topic for a forthcoming paper.
The industrialization age of programming by contract is opening a new era in software development. Just as development techniques went from assembly to structured languages and from structured languages to object orientation, contract-based programming is providing one more abstraction to software design.
For a long time, these techniques were only available to developers who had some extended mathematical knowledge. Thanks to new formalisms such as Ada 2012, they are now based on semantics understandable by any software developer, democratizing their usage. This is a good time to try them out!
Quentin Ochem has a software engineering background, specialized in software development for critical applications. He has over 10 years of experience in Ada development. He works today as a technical account manager for AdaCore, following projects related to the avionics, railroad, space, and defense industries. He also teaches the avionics standard DO-178B course at the EPITA University in Paris. He can be reached at [email protected]