It’s Time to Use a Safer C

I have been an advocate for writing safer embedded software, so when I recently ran across a Microsoft Research paper on Checked C: Making C Safe by Extension, by Archibald Samuel Elliott, University of Washington, Andrew Ruef and Michael Hicks, University of Maryland, and David Tarditi, Microsoft Research, for the IEEE Cybersecurity Development Conference 2018, it piqued my interest. The implementation of Checked C is done using LLVM extensions and is available on GitHub. Like many research papers, it presents a small model, compared to a commercial product, along with statistics, justification, and so on.

Checked C is a combination of static- and dynamic-analysis techniques designed to support spatial safety. The name is actually a misnomer, as the system targets C++ using Checked C templates for pointers. It’s focused on “backward-compatibility, incremental conversion, developer control, and enabling highly performant code.”

Of course, the additional annotation and checking isn’t free. The dynamic runtime aspect adds about an 8.6% overhead. However, benchmarks showed less than a 1% impact, indicating that much of the overhead is in areas that aren’t executed constantly. Compile time overhead can vary radically, but this overhead is something that can be addressed by more powerful build servers as well as incremental builds.

The system adds _Ptr and _Array_ptr templates as well as _Checked and _Unchecked type annotations. These allow the system to do more static checking to minimize runtime support. In a limited sense, it’s similar to other annotation systems, though significantly less rigorous than Ada 2012 contracts. Checked C has the notion of checked and unchecked regions as well as bounds-safe interfaces.

Checked C also addresses zero/null terminated (nt) strings. These strings are used throughout C and C++ programs. Unfortunately, they’re also the most error-prone aspects of C and C++ with buffer overruns being common due to a missing or overwritten terminator. Codelist 1 is an example of a common string function written with Checked C support.

// Return index of t in s, -1 if none.

int strindex(char source nt_checked[] : count(0),

             char searchfor nt_checked[] : count(0)) {

  int i = 0, k = 0;

  nt_array_ptr s : count(i) = source;

  nt_array_ptr t : count(k) = searchfor;

  for (i = 0; s[i] != '\0'; i++) {

    nt_array_ptr tmp_s = s + i;

    for (k = 0; t[k] != '\0' && *tmp_s == t[k]; tmp_s++, k++)

     if (k > 0 && t[k] == '\0')

       return i;

  return -1;

Codelist 1: The nt_checked and nt_array_ptr definitions indicate the Checked C null terminated array support.

As with many projects, Checked C is a tool in flux with new features being added and others modified or removed as more research is done. Unlike many other research papers, though, this technology, in some form, is more likely to eventually wind up in a product or open-source project because of the backer.

Another similarly-themed paper that I read recently was Safe Dynamic Memory Management in Ada and SPARK by Maroua Maalej, Tucker Taft, and Yannick Moy. What makes this very interesting is, as it states, “In this work, we introduce an extension to the Ada language, and to its SPARK subset, to provide pointer types (“access types” in Ada) that provide provably safe, automatic storage management without any asynchronous garbage collection, and without explicit deallocation by the user.”

For those unfamiliar with Ada or SPARK, SPARK is a subset of Ada that uses a feature introduced in Ada 2012 called contracts. It allows a developer to indicate requirements and specifications for results of procedures. SPARK applications can be proven to meet the specifications and many of the checks that Ada might include to verify proper operation. However, these can often be removed because the prover has eliminated the need to perform these checks at runtime.

SPARK limits the functionality of an Ada program by not allowing dynamic data allocation. This is fine for many embedded applications; still, a host of other applications need such support. It’s possible to use SPARK in these applications, but it’s then necessary to indicate which parts use SPARK and which use the full Ada complement.

Ada already has more robust-type checking; most other languages and contracts can improve safety and reliability with further checking. It will handle issues with pointers, heaps, and stack allocation and the language includes support for storage pools, a feature usually found in libraries for other languages.

The addition includes the idea of ownership types to address the issue of aliasing of pointers with a program:

Pointers (access types) are essential to many complex Ada data structures, but they also have downsides, and can create various safety and security problems. When attempting to prove properties of a program, particularly those with multiple threads of control, the enemy is often the unknown “aliasing” of names introduced by access types and certain uses of (potentially) by-reference parameters. We say that two names may alias if they have the possibility to refer to overlapping memory regions.

The idea is to allow the use of pointers while avoiding dangling references and storage leaks, and at the same time provide safe, immediate, automatic reclamation of storage without relying on unchecked deallocation or using automatic garbage collection. The approach is designed to work with existing Ada features such as by-copy/by-reference parameter passing and exception handling.

The approach adds a Boolean ownership aspect to access types, whereby there can be at most one writable access path to the designated data. An object can be in one of three states: unrestricted, observed, and borrowed. Similar concepts exist in Rust, a new programming language also designed for building safe applications. General operations in these Ada enhancements include moving, borrowing, and observing objects.

Codelist 2 shows one reason why SPARK currently avoids aliasing by not allowing points.

procedure Add_One ( X_Param , Y_Param : in Int_Ptr ) is

  Post => X_Param.all = X_Param.all’Old + 1

    and Y_Param.all = Y_Param.all’Old + 1 ;

begin

    all := X_Param.all + 1 ;

    all := Y_Param.all + 1 ;

end Add_One ;

Codelist 2: It would not be possible to prove the post condition if X_Param and Y_Param were the same pointer since the result would be adding 2 to the location.

The addition of the new rules and definitions would allow the compiler to determine when this type of aliasing would be a problem and flag errors accordingly.

What’s remarkable is that the implementation allows traversing a linked data structure with read/write permission, something that other systems do not allow, or require exceptions that bypass checking. Likewise, the proposed approach doesn’t require additional annotations because Ada already includes indicators such as in and out for procedure arguments. The result is a SPARK implementation that allows pointers to be used in applications, greatly expanding its potential usage areas without having to deal with unchecked exceptions.

Like the Checked C features, these Ada enhancements may wind up in a future version of the compiler. However, such changes are more likely to wind up sooner in the Ada compiler given the nature of the research and the involvement of the authors in the standardization process, since the C++ community is much larger.