A multitude of technologies are converging in automotive telematics and infotainment systems, enabling cars to become safer, more convenient, and, ultimately, more desirable to the consumer. In fact, “the more, the better” aptly describes the prevailing trend in this market. Already, automakers have introduced systems that combine GPS navigation, satellite radio, real-time traffic reports, 3-D interfaces, DVD playback, voice-controlled operation, automated emergency dialing, connectivity with MP3 players, hard-drive music storage, user-defined music playlists, and numerous other capabilities, all in one integrated unit.
To provide such rich functionality, these systems employ a large number of software components, totalling 10, 20, or even 30 megabytes of code. This software complexity poses a significant challenge to system reliability and performance, for the simple reason that the more code a system contains, the greater the chance that programming errors, security holes, or resource conflicts between software components will occur.
As the software responsible for controlling access to the CPU, memory, and other system resources, the real-time operating system (RTOS) can play a major role in diagnosing and preventing such problems. In particular, it can enforce secure boundaries between software processes and thereby prevent any process from inadvertently or maliciously degrading the performance of other processes. To achieve this goal, some RTOSs have introduced support for resource partitioning.
Briefly stated, this concept allows system designers to group software processes into separate compartments, or partitions, and to allocate a guaranteed portion of system resources, such as memory and CPU time, to each partition. As a result, processes in one partition cannot monopolize resources required by processes in other partitions.
Among other things, partitions can provide memory protection, where the RTOS uses the memory management unit (MMU) to control all access to memory. A microkernel RTOS, for instance, allows developers to partition applications, device drivers, protocol stacks, and file systems into separate, memory-protected processes. If any process, such as a device driver, attempts to access memory outside of its process container, the MMU will notify the OS, which can then terminate and restart the process. This protection:
- prevents coding errors in any process from corrupting other processes or the RTOS kernel;
- allows the developer to quickly identify and correct memory access violations that could otherwise take weeks to isolate; and
- reduces fault-recovery times dramatically: rather than rebooting when a fault occurs (a procedure that can take seconds to minutes), the system can simply restart the offending process (a procedure that may require only a few milliseconds).
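The restart-instead-of-reboot recovery policy can be sketched in plain Python. The `supervise` helper, its arguments, and the restart limit are inventions for illustration, not an actual RTOS interface; a microkernel's process manager performs this recovery natively when the MMU reports a violation:

```python
import subprocess
import time

def supervise(cmd, max_restarts=3, delay=0.0):
    """Run cmd and restart it if it exits abnormally, up to max_restarts
    times. Returns the number of restarts performed. Restarting only the
    faulted process takes milliseconds; rebooting the whole unit would
    take seconds to minutes."""
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:       # clean exit: nothing to recover
            return restarts
        if restarts >= max_restarts:     # repeated faults: give up/escalate
            return restarts
        restarts += 1
        time.sleep(delay)                # optional back-off before retrying
```

The same loop also caps repeated faults, so a persistently crashing driver can be escalated rather than restarted forever.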
Still, building a reliable infotainment or telematics system involves more than partitioning functionality into separate memory domains. In many cases, guaranteeing access to the CPU is also critical. If any subsystem — the HMI, for example — is deprived of CPU cycles, then that subsystem will become unavailable to the user.
The need for CPU guarantees arises from the priority-based pre-emptive scheduling that most RTOSs employ. In a nutshell, this scheduling model helps ensure that processes and threads execute in order of their assigned priority: a higher-priority thread can always pre-empt a lower-priority thread, and a lower-priority thread cannot stop a higher-priority thread from running.
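A toy, tick-based model (hypothetical, not any real RTOS's scheduler) makes the rule concrete: at every tick the highest-priority thread with work left gets the CPU, and a lower-priority thread runs only when nothing above it is runnable:

```python
def run_priority_scheduler(threads, ticks):
    """Simulate strict priority-based pre-emptive scheduling.

    threads maps a name to [priority, remaining_work]; larger numbers
    mean higher priority. Returns the per-tick trace of which thread
    ran (None marks an idle tick)."""
    trace = []
    for _ in range(ticks):
        runnable = [(prio, name) for name, (prio, work) in threads.items()
                    if work > 0]
        if not runnable:
            trace.append(None)             # no thread has work: CPU idles
            continue
        prio, name = max(runnable)         # highest priority always wins
        threads[name][1] -= 1              # winner consumes this tick
        trace.append(name)
    return trace
```

Running it with a high- and a low-priority thread shows the higher one monopolizing the CPU until it finishes, which is exactly the behavior that makes latency predictable.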
Priority-based scheduling offers many advantages, including:
- Predictable latency. By allocating time-critical functions to high-priority threads, developers can control how long the system takes to respond to external events, even when the system becomes heavily loaded.
- Concurrency and flexibility. With priority-based scheduling, an embedded system can handle a mix of tasks, including regularly occurring tasks with hard deadlines, high-priority event-driven tasks, and background-processing tasks.
- Proven and familiar. Priority-based scheduling is widely used in automotive applications and is well understood by embedded developers.
Despite these advantages, priority-based scheduling can lead to a condition called task starvation. For instance, let's say a system contains two threads, A and B, and that A has a slightly higher priority than B. If A becomes too busy, it will lock out B (as well as any other lower-priority thread) from accessing the CPU.
In the automobile, thread A might control the navigation display and thread B, the MP3 player. If the navigation system consumes too many CPU cycles when performing a route calculation, it can starve the MP3 player and cause playback to skip. Likewise, in a hands-free telematics system, a higher-priority thread responsible for echo cancellation may starve a lower-priority thread responsible for noise reduction — a problem that will affect the noise-reduction module and every other process downstream in the outgoing audio chain.
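The navigation/MP3 scenario can be quantified with a small simulation. The tick counts and the four-tick audio deadline are invented for illustration: whenever the high-priority route calculation still has work, the decoder misses its deadline and playback skips:

```python
def simulate_starvation(nav_burst, ticks, mp3_period=4):
    """Strict priority scheduling: navigation (high priority) always wins.
    The MP3 decoder needs one tick of CPU every mp3_period ticks or its
    audio buffer underruns. Returns the number of audible drop-outs."""
    dropouts = 0
    remaining = nav_burst          # ticks of route calculation left
    since_decode = 0               # ticks since the decoder last ran
    for _ in range(ticks):
        if remaining > 0:
            remaining -= 1         # navigation pre-empts the decoder
            since_decode += 1
            if since_decode >= mp3_period:
                dropouts += 1      # missed deadline: playback skips
                since_decode = 0
        else:
            since_decode = 0       # decoder finally gets the CPU
    return dropouts
```

With no route calculation there are no drop-outs; a long calculation produces one skip per missed deadline window for as long as it runs.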
In short, priority-based scheduling cannot guarantee that lower-priority threads will access even a fraction of the CPU. Services provided by lower-priority threads — including diagnostic services that protect the system from software faults — can be starved of CPU cycles for unbounded periods of time, thereby compromising system availability. These issues become more frequent as software complexity (and the number of threads) increases.
This inability to provide resource guarantees can result in serious conflicts among the many subsystems that make up a modern telematics or infotainment unit — a problem that may not become obvious until final integration and verification testing. Subsystems that worked well in isolation respond slowly, if at all, once they begin vying with one another for CPU time and other resources.
Such resource conflicts are inherently difficult to diagnose and solve. System designers must juggle task priorities, possibly change behavior across the system, and then retest and refine their modifications. Together, these activities can easily consume several calendar weeks, resulting in increased costs and a delayed product.
GUARANTEED RESOURCE BUDGETS
Partitioning can help avoid these problems. For instance, in Figure 1, the designer has grouped software subsystems into four partitions and allocated a CPU budget for each partition: 20% for the user interface, 20% for MP3 playback, 30% for hands-free audio, and 30% for navigation and route calculation. The designer could also assign a separate memory budget to each partition. For instance, the navigation partition could be assigned 40% of memory.
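Expressed as data, the Figure 1 layout might look like the table below. The CPU percentages come from the text; only the navigation memory budget (40%) is stated, so the other memory figures are placeholders invented for this sketch:

```python
# Hypothetical partition table mirroring Figure 1.
PARTITIONS = {
    "user_interface": {"cpu_pct": 20, "mem_pct": 15},   # mem: placeholder
    "mp3_playback":   {"cpu_pct": 20, "mem_pct": 15},   # mem: placeholder
    "handsfree":      {"cpu_pct": 30, "mem_pct": 30},   # mem: placeholder
    "navigation":     {"cpu_pct": 30, "mem_pct": 40},   # 40% per the text
}

def validate(partitions):
    """A budget table is only enforceable if the CPU shares sum to
    exactly 100% and the memory grants don't oversubscribe RAM."""
    assert sum(p["cpu_pct"] for p in partitions.values()) == 100
    assert sum(p["mem_pct"] for p in partitions.values()) <= 100
    return True
```

Because the budgets are plain data, each team's allocation can be reviewed and checked long before integration.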
With this approach, each development team knows from the start how much memory and CPU time will be available for their software subsystem. Moreover, each team can easily test their subsystem to ensure that it works within those defined budgets. At integration time, the RTOS will enforce the resource budgets, preventing any subsystem from consuming resources needed by other subsystems. Each subsystem will work as expected — and as previously tested.
In effect, partitioning makes it much easier for development teams to work in parallel. For instance, as a developer, you no longer have to worry about the priorities of threads outside of your subsystem: those threads won't impact your throughput, even if they run at a higher priority than yours.
Also, by controlling the partition budgets, designers can trade off the response times of various subsystems to quickly tune system performance. Ideally, a partitioning scheduler will let designers perform this CPU tuning dynamically at runtime, without forcing them to rebuild applications or the system image. Figure 2 shows a tool for dynamically tuning partition budgets.
Partitioning schedulers vary. Some enforce CPU budgets strictly at all times, reserving each partition's full budget even when that partition has no work to do, so unused cycles simply go to waste. Other implementations dynamically allocate these unused CPU cycles to partitions that need them, thereby maximizing overall CPU utilization and allowing the system to handle peak demands. Such an approach offers the best of both worlds: it enforces CPU guarantees when the system runs out of excess cycles (for guaranteed availability of lower-priority services) and dispenses free CPU cycles when they become available (for maximum performance). For instance, if the navigation partition in Figure 3 becomes busy, it could use CPU cycles that other partitions aren't currently using.
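The best-of-both-worlds policy can be sketched as a two-pass allocation (a simplified model, not any vendor's actual algorithm): first every partition is guaranteed up to its budget, then any spare capacity goes to partitions that still have work:

```python
def allocate_ticks(demands, budgets, total=100):
    """demands and budgets map partition name -> percentage of the CPU
    (budgets sum to 100). Pass 1 grants each partition min(demand,
    budget), honoring the guarantee; pass 2 hands leftover capacity to
    partitions that are still hungry, so free cycles aren't wasted."""
    grant = {name: min(demands[name], budgets[name]) for name in budgets}
    spare = total - sum(grant.values())
    # Pass 2: redistribute spare capacity, larger budgets first.
    for name in sorted(budgets, key=budgets.get, reverse=True):
        extra = min(spare, demands[name] - grant[name])
        grant[name] += extra
        spare -= extra
    return grant
```

Under full load every partition receives exactly its budget; when other partitions go quiet, a busy navigation partition can soak up the slack.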
Properly implemented, a partitioning scheduler doesn't require code changes, nor does it change the debugging techniques that designers are already familiar with. It can also use the standard POSIX programming model, allowing embedded developers to work with the same industry-standard APIs and task-prioritization schemes that they do today. To introduce partitioning, developers simply define partition budgets and decide which processes or threads reside in each partition. The processes themselves can remain unchanged. Within each partition, the RTOS can continue to schedule threads according to the traditional rules of a pre-emptive, priority-based scheduler.
As complexity and code size grow, so does the probability that task starvation and other software problems will make their way into the final product. The cost of resolving such problems after a system has been deployed increases dramatically — not to mention the damage done to the supplier's reputation and bottom line. Properly implemented, partitioning provides an efficient, easy-to-use mechanism to prevent these problems. Moreover, it offers increased security and greater system availability by preventing malware or denial-of-service attacks from monopolizing memory and the CPU. In short, it allows embedded developers to create infotainment systems that are well integrated and well protected.