Today’s sophisticated vehicles leverage embedded systems for everything from infotainment to powertrain management. Even low-end automobiles incorporate scores of electronic control units (ECUs). By some estimates, vehicles can have well over 1 GB of memory on board and can run 10 million or more lines of C code. Of course, the more sophisticated the system, the more complex the software, and the more vulnerable that software is to faults. A fault in a door lock system is inconvenient. In safety-critical systems like antilock brakes, errors can easily cost lives. Recognizing this, the Motor Industry Software Reliability Association (MISRA) developed MISRA C, a software development standard for C aimed at ensuring safety and reliability.
Stack and heap memory allocations in RAM are fundamental to an embedded system. Both regions must be sized statically by the programmer, yet calculating the space required is notoriously difficult for all but the smallest embedded systems. Sized or used incorrectly, they can make the system fail in the strangest ways. Indeed, to guard against such faults, MISRA C rule 20.4 states that dynamic heap memory allocation shall not be used. The stack, however, remains a permissible and essential part of the system. This article takes a closer look at the principles and methods of reliable stack design, highlighting techniques that can be used to reliably calculate the required stack size and to detect stack-related problems.
The stack is an area of RAM where a program stores temporary data during the execution of code blocks. Typically statically allocated, the stack operates on a “last in, first out” basis. The lifespan of variables on the stack is limited to the duration of the function that declares them. As soon as the function returns, its stack memory is freed for use by subsequent function calls.
The types of data stored in the stack include:
- local variables
- return addresses
- function arguments
- compiler temporaries
- interrupt contexts
The stack usually grows downward in memory. If the memory area allocated for the stack is not large enough, the executing code writes beyond the allocated area and a stack overflow occurs. The area written to is usually where global and static variables are stored (see Fig. 1). As a result, underestimating stack usage can cause serious runtime errors including overwritten variables, wild pointers, and corrupted return addresses. All of these errors can be very difficult to find.
Stretching the limits in everyday life can be rewarding but can sometimes cause trouble. Stretching the limits in programming when it comes to allocated data will definitely cause trouble. For those who are lucky, that trouble will arise during system testing, but it might also manifest itself when the product has already been delivered to thousands or millions of customers. In the case of automotive applications, at best it might require an expensive recall. At worst, it could cost lives.
A number of factors combine to make it difficult to calculate maximum stack usage. Many applications are complex and event driven, consisting of hundreds of functions and many interrupts. There are interrupt functions that can be triggered at any time and if they are allowed to be nested, calculating the required stack size becomes even more difficult. There may be indirect calls using function pointers where the destination of the call could be a number of different functions. Recursion and un-annotated assembly routines will also cause problems for anyone who wants to calculate the maximum stack usage. There is not a natural execution flow that can be easily followed.
Many microcontrollers implement multiple stacks, for example a system stack and a user stack. Multiple stacks are also a reality with the use of an embedded RTOS like µC/OS, ThreadX, and others, in which case each task has its own stack area. Runtime libraries and third-party software are other factors that complicate the calculation, because the source code for libraries and the RTOS may not be available. It is also important to remember that changes to the code and the scheduling of the application can have a large impact on the stack usage. Different compilers and optimization levels also generate code that uses different amounts of stack. All of these factors make it essential to continuously track the maximum stack requirement.
Setting stack size
When developing an application, the developer must consider stack size requirements during the design phase. This requires a method for determining the appropriate amount of stack. One obvious approach is to test the system under conditions that produce worst-case stack behavior and measure how much stack has been used. This can be done in basically two ways: from printouts of the current stack usage, or by making sure that traces of stack usage are left in memory that can be inspected after the test run has completed. Another approach is to calculate the theoretical maximum stack requirement.
Quickly and accurately calculating stack size requires a tool that can analyze the complete system. The tool must operate on either the binary image or the source code. A binary tool works at the machine instruction level to find all possible movements of the program counter through the code, to find the worst-case execution path. A source code static analysis tool reads all of the compilation units of the application. In both cases, the tool must be able to determine direct function calls and indirect function calls through pointers in the compilation unit, and therefore compute a conservative stack usage profile across the entire system for all call graphs. The source code tool has to take into account the demands the compiler places on the stack, such as alignments and compiler temporaries. This can be done by the tool examining the object/executable code.
To calculate stack depth, the address of the current stack pointer can be used. This requires simply taking the address of a function's argument or local variable. It should be done at the beginning of the main function and for each of the functions that are potentially using the most stack.
This method can yield quite good results in small and deterministic systems. However, for many systems, it can be difficult to determine the nested function with the deepest stack usage and to provoke the worst-case situation. In addition, the results obtained with this method do not take into account stack usage by interrupt functions.
The stack guard zone
A stack guard zone is a memory area allocated just below the stack, where the stack leaves traces if it overflows (see Fig. 2). To be effective, the guard zone must be large enough that an overflowing write is likely to land inside it rather than skip past it.
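A guard zone check could be sketched as follows. The sizes, names, and the fill value are illustrative; in a real system the zone would be placed by the linker directly below the stack rather than declared as an ordinary array:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define GUARD_SIZE  64
#define GUARD_FILL  0xCDu

/* Stand-in for a linker-placed region just below the stack. */
static uint8_t guard_zone[GUARD_SIZE];

static void guard_init(void) {
    memset(guard_zone, GUARD_FILL, sizeof guard_zone);
}

/* Returns 1 while the guard zone is untouched, 0 once any byte has
   been overwritten, i.e. once the stack has grown into it. */
static int guard_intact(void) {
    for (size_t i = 0; i < sizeof guard_zone; i++) {
        if (guard_zone[i] != GUARD_FILL) {
            return 0;
        }
    }
    return 1;
}
```

Such a check is typically run periodically, for example from a timer interrupt or the idle loop, so that an overflow is caught soon after it happens.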
One technique to detect stack overflow is to fill the entire memory area allocated to the stack with a dedicated fill value, for example 0xCD, before the application starts executing. Whenever execution stops, the stack memory can be searched upward from the end of the stack until a value that is not 0xCD is found, which is assumed to mark how far the stack has grown. If the dedicated value cannot be found at all, the stack has consumed all of the allocated space and has most likely overflowed.
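The fill-and-scan technique can be sketched like this, with an ordinary array standing in for the linker-allocated stack region (all names and sizes here are illustrative):

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define STACK_SIZE  256
#define STACK_FILL  0xCDu

/* Stand-in for the RAM region the linker reserves for the stack.
   Index 0 is the low end, the end a downward-growing stack eats into. */
static uint8_t stack_area[STACK_SIZE];

/* Fill the whole stack area with the dedicated value before startup. */
static void stack_fill(void) {
    memset(stack_area, STACK_FILL, sizeof stack_area);
}

/* Scan upward from the low end until the first byte that no longer
   holds the fill value; everything above that point is assumed used. */
static size_t stack_high_water(void) {
    size_t i = 0;
    while (i < STACK_SIZE && stack_area[i] == STACK_FILL) {
        i++;
    }
    return STACK_SIZE - i;   /* peak bytes used; 0 if untouched */
}
```

If `stack_high_water()` returns `STACK_SIZE`, even the lowest byte has been overwritten and the stack has most likely overflowed.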
This method of monitoring stack usage is commonly used by debuggers, which can then display a graphical representation of the stack usage, as in Fig. 3.
Although this is a reasonably reliable way to track stack usage, there is no guarantee that it will detect a stack overflow. For example, a stack can incorrectly grow outside its bounds, and even modify memory outside the stack area, without actually modifying any of the bytes near the stack range. Likewise, the application might modify memory within the stack area by mistake.
Linker-calculated maximum stack requirement
Build tools like compilers and linkers can be used to calculate maximum stack requirements (see Fig. 3). A linker can accurately calculate the maximum stack usage for each call graph root (each function that is not called from another function, like the start of the application). However, it may require input from the developer, because the result is only accurate if accurate stack usage information is available for each function in the application. The stack usage of recursive functions and nested interrupts, among other things, is difficult to determine at compile time.
In general, the compiler will generate this information for each C function, but in some situations, the programmer must provide stack-related information to the system. For example, if there are indirect calls (calls using function pointers) in the application, a list must be supplied of possible functions that can be called from each calling function.
Table 1: Result of linker-calculated maximum stack usage
The total maximum stack usage for the complete system is calculated by adding together the result for each call graph root. It is important to remember that this type of stack usage analysis produces a worst-case result. The application might not actually end up in the maximum call chain, by design or by coincidence. Table 1 shows the result of a linker-calculated maximum stack usage when using IAR Embedded Workbench.
To ensure high reliability in safety-critical ECUs, developers need to properly allocate stack size. Calculating the stack requirements can be a complex and difficult task, but there are a number of useful tools and methods that can simplify the process. The time and money spent on good calculation during the development phase is well rewarded by a robust, high-performance system.
About the authors
Anders Lundgren has been with IAR Systems since 1997. Working initially with compiler design and development, he has played a key role in architecting the IAR ARM toolchain and infrastructure. Mr. Lundgren is a regular speaker at seminars and conferences covering coding and debugging methodology. He has an MSCS and has been working with electronics, software and embedded systems since childhood.
Lotta Frimanson has been with IAR Systems since 1999. After gaining early experience in debugger development, Frimanson managed 8-, 16- and 32-bit toolchain products. In her recent responsibilities, she has led the IAR RTOS and ARM technology programs. Frimanson is a regular speaker at seminars and conferences covering RTOS and debugging methodology. Before joining IAR, she worked with embedded systems development for 10 years. She has an MSCS.