Following Part 1 of this series on source-code weaknesses, which discussed code injections, this second installment delves into information leaks (some prefer the term information exposure). Information leaks occur when a program inadvertently communicates sensitive information inappropriately.
Information leakage vulnerabilities can be critical, especially if the sensitive information in question contains login credentials or private keys. Below I start by giving a brief explanation of the different kinds of information leaks, and then I describe some serious security vulnerabilities that have arisen due to such leaks. Finally, I address some techniques for defending against them.
What are Information Leaks?
An information leak is defined as the disclosure of information to a party that does not have explicit authorization to access it.1 Today’s highly connected world, where security is vital, contains many kinds of sensitive information. Usernames, passwords, and secret keys are certainly the most sensitive, but other types of information can be useful to an attacker, too. If the attacker can learn about a software system’s implementation, for instance, that knowledge can sometimes be used to mount a different kind of attack.
Basically, information can leak through three kinds of so-called channels, as described in the following sections.
The most obvious channel is through normal use of the software. Assuming the software was specified correctly in the first place, the most common source of these leaks is programmer error. For example, a program with an SQL injection vulnerability can be thought of as being prone to information leakage. That’s because an attacker can extract information from the database by creating a specially crafted string and sending it to the program using the normal query mechanism.
Harmful information leakage can occur when a programmer simply doesn’t realize that certain kinds of information can be indirectly sensitive. For example, let’s say a program uses a third-party network protocol stack, and some versions of that implementation are vulnerable to a packet-flooding attack that causes the device to crash. If the program announces the version of the protocol stack that it’s using, the attacker can use that information to determine if the program is vulnerable, and trigger the crash at will. If that information weren’t available to the attacker, the amount of effort required to find a vulnerability would be increased, potentially raising the bar enough to deter the attack.
The second channel is the error or diagnostic channel. In this situation, error messages or other responses to erroneous inputs divulge the sensitive information to the attacker. Well-written programs are, of course, required to behave in a civilized fashion when given bad data. In most cases, the programmer will wish to give an error message that makes it easy to diagnose the problem. However, in a sensitive deployment, that can be unwise.
A good example of this is a program that asks for a login name, but only asks for a password after it has first confirmed that the name is valid. Doing so makes it easier for the attacker to find a valid login name. A better approach is to read both the name and the password first, then give an error message when the authentication fails, and never disclose which of the two inputs was wrong.
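The better approach can be sketched in C as follows. The lookups check_user() and check_pass() are hypothetical stand-ins for a real credential store; the point is that both inputs are read and checked before any response is given, and the failure message never reveals which input was wrong:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lookups; a real system would query a credential store. */
static bool check_user(const char *user) {
    return strcmp(user, "alice") == 0;
}
static bool check_pass(const char *user, const char *pass) {
    (void)user;
    return strcmp(pass, "s3cret") == 0;
}

/* Validate both inputs, then report a single generic failure so the
 * caller cannot tell whether the name or the password was wrong. */
const char *authenticate(const char *user, const char *pass) {
    bool user_ok = check_user(user);
    bool pass_ok = check_pass(user, pass);
    if (user_ok && pass_ok)
        return "OK";
    return "Invalid username or password";  /* never says which */
}
```

Note that the error string is identical for a bad name and a bad password, so the attacker cannot use the response to enumerate valid login names.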
If an attacker can force the program to have an internal error, the results can be very useful to the attacker. Again, a competent programmer will have anticipated the error state and written the code to detect it (such as with an exception handler), and give enough diagnostic information to make it easy to find and correct.
Because internal errors represent real bugs, the most useful information to the programmer might be a stack trace and the values of key variables at the point when the error was detected. Those variables may contain sensitive information themselves; but even if they do not, it’s a huge mistake to give that information to the user. That’s because it reveals details about how the program was implemented, which can be used to mount a different attack.
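One way to reconcile diagnosability with secrecy is to record the full details where only operators can read them, and hand the user nothing but an opaque reference number. The following C sketch illustrates the pattern; the log file name and the timestamp-based ID scheme are assumptions for illustration, not a prescription:

```c
#include <stdio.h>
#include <time.h>

/* Log the detailed context internally, and give the user only an
 * opaque correlation ID instead of a stack trace or variable dump. */
unsigned long report_internal_error(const char *detail) {
    unsigned long error_id = (unsigned long)time(NULL);  /* simplistic ID */
    FILE *log = fopen("server.log", "a");  /* internal-only log file */
    if (log) {
        fprintf(log, "[%lu] internal error: %s\n", error_id, detail);
        fclose(log);
    }
    /* The user-facing message reveals nothing about the implementation. */
    printf("An internal error occurred (ref %lu).\n", error_id);
    return error_id;
}
```

A user who reports the reference number lets support staff find the full diagnostics in the internal log, so the programmer loses little convenience while the attacker learns nothing.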
In all considerations of security, there’s a fundamental conflict with convenience. In this case, the conflict is especially tricky because the programmer implementing the security is the one who will be most inconvenienced by it. Consequently, it’s important to pay special attention to diagnostic channels, to make sure that inappropriate information isn’t disclosed.
The final way information can leak is over a covert channel. An attacker uses a covert channel to draw conclusions about sensitive information by observing or measuring how the program operates. Covert channels arise from how the implementation works, rather than from the properties of the underlying algorithm itself. The classic example is the timing attack from cryptography: if the time a cryptographic algorithm takes to complete depends on its inputs, precise measurements of that time can allow an attacker to deduce those inputs. In practice, the timing channel is more often used to narrow the search space that the attacker has to comb through.
A more commonplace covert channel is storage that was used by the program to store sensitive information, but which hasn’t been appropriately cleansed. In some circumstances, it can be possible to force the program to dump that information.
For more information about information-leak vulnerabilities, check out the CWE database.
One of the most serious security vulnerabilities in recent years was the Heartbleed bug that was found in OpenSSL. It’s hard to overstate the havoc wrought by that bug. As the most widely used open-source SSL implementation, OpenSSL had been incorporated into thousands of products ranging from small embedded devices to network routers and large enterprise applications. Because it was so widely used, most people assumed it was secure. Unfortunately, it contained a long-standing information-leak bug that allowed an attacker to snoop on the server’s internal data, which could include the private keys.
Detailed descriptions of the bug can be found elsewhere, but roughly speaking, the attacker would send malformed packets to the server, which would fail to detect that the packet was bad. The server would send back the contents of a chunk of memory, usually a portion of the heap that had been freed. By repeatedly sending those malformed packets and recording the results, the attacker could piece together most of the sensitive information that was stored in the server’s memory.
Heartbleed was an example of a covert-channel information leak. The sensitive information was stored on the heap, the memory allocator reused portions of that heap, and then the bug in the program meant that the attacker could read the data out of the heap.
Beware of Compiler Optimizations!
A key principle of secure programming is to limit the lifetime of sensitive information. That way, it’s less likely to be leaked. For example, if a password must be temporarily stored in a buffer as clear-text, it’s a good idea to overwrite, or scrub, the buffer once it’s no longer needed. This was exactly the intention of a programmer at Microsoft2 who wrote something like the following:
char pass[PSIZE];
read_password(pass);
ok = validate_pass(pass);
memset(pass, 0, sizeof(pass)); /* Optimized away! */
return ok;
Although the programmer did exactly the right thing, the compiler was too clever. It concluded that because pass was going out of scope, the call to memset() was redundant, and generated code in which there was no call to memset(). Although most programmers are surprised that the compiler is allowed to do this, it’s an example of perfectly legal “dead store removal” optimization, in which the compiler is free to remove statements with no observable effect.
Consequently, the clear-text version of the password remains on the stack. This might seem harmless until you realize that the contents of memory may end up being written to a swap file, or a crash dump file. Although this may seem difficult to exploit, remember that hackers are ingenious at coming up with new ways to break into your system, and that most successful exploits these days are possible because of multiple vulnerabilities that are chained together.
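One portable countermeasure is to call memset() through a volatile function pointer. Because the compiler must assume a read of a volatile object can yield any value, it cannot prove the call is a dead store and so cannot remove it. (Where available, C11 Annex K’s memset_s() or the BSD/glibc explicit_bzero() serve the same purpose.) A minimal sketch:

```c
#include <string.h>

/* Routing the call through a volatile function pointer prevents the
 * optimizer from eliding the scrub as a dead store: the compiler must
 * assume the pointer could designate a function with side effects. */
static void *(*volatile memset_ptr)(void *, int, size_t) = memset;

void secure_scrub(void *buf, size_t len) {
    memset_ptr(buf, 0, len);
}
```

With this helper, the password example above would call secure_scrub(pass, sizeof(pass)) instead of memset(), and the clear-text copy would reliably be erased before the function returns.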
Avoiding Information Leaks
Two key secure coding rules should be followed to minimize the chance of information leaks. The first is one of the most fundamental rules of secure programming in general: sensitive information should be encrypted in motion and at rest. If the information must be stored in a file or database, it should be encrypted. If it must be communicated across a network, a secure encrypted channel should be used.
Of course, it’s difficult to do any meaningful processing of encrypted information, so it must be decrypted at some point in its lifetime, which motivates the second rule: limit the lifetime of clear-text sensitive information. That is, only keep the clear-text copies of the information for as long as absolutely necessary. When done, overwrite that information using a reliable technique.
These two rules, if followed, reduce the chance of information leaks being introduced in the first place. Additional techniques are useful for detecting the leaks that remain. The CWE entry for Information Exposure (the term that database uses for information leaks) [CWE] recommends the following detection techniques as effective:
• Dynamic analysis: For Web or database-oriented applications, scanners that can automatically probe the software and interpret the results are highly cost-effective. Other techniques such as fuzzers and execution monitors are also effective, but require more manual interpretation.
• Manual source-code review: This can also be cost-effective, even though it can be very labor-intensive. No automatic tool can substitute for the judgment of an experienced engineer. However, it’s important to realize that humans are most effective at high-level concepts, but weak at poring through low-level details. For the latter, tools provide essential assistance.
• Static analysis: These tools can automatically find programming weaknesses that lead to information leaks. Although such tools are usually used on source code, some are also available for analyzing object code. These offer one of the few ways to detect cases where the optimizing compiler undermines the programmer, such as with the example described above.
Static-analysis tools in their out-of-the-box configurations are good at finding generic defects, but most software will also be vulnerable to application-specific defects. The best way to use static-analysis tools to find information leaks requires spending some time configuring them so that they can find application-specific defects, too. The best tools allow end users to write their own checks that piggyback on the work that the analysis is doing for the generic defects.
For instance, CodeSonar (the tool I work on) lets users track how potentially hazardous information flows through the application. Sensitive information can be tagged and a rule can be specified that will trigger a warning if the information reaches a location in the code where it may leak. Having the analysis automatically find those paths is much more effective than a manual review of the code.
Information leaks are serious security vulnerabilities with a high likelihood of exploitability. Left unaddressed, they can lead to the loss of confidential information. If that information includes login credentials, an attacker has an open door to a complete compromise of the system running the software. Good secure programming practices are the best defense because they can prevent the problems at the source. Once the code is written, it’s important to use code reviews, and both static and dynamic tools, to actively seek out the remaining instances of this pernicious problem.
1. “Information Exposure”
2. “Some Bad News and Some Good News,” Michael Howard