Tackling Security and Reliability in the Zephyr RTOS

The Zephyr RTOS provides a secure, reliable foundation to build an embedded system, but challenges arise with multithreading, software timing, etc. Such aspects aren’t apparent in the source code and call for good insight into the runtime system.

Dr. Johan Kraft

Related To:

Percepio

Feb. 24, 2022

7 min read


This article is part of TechXchanges: Trace Debugging Techniques and RTOS: Zephyr Project

What you’ll learn:

Get an introduction to the multithreaded open-source Zephyr RTOS, which is designed with safety and security in mind.
How trace diagnostics tools help to quickly and easily identify software behavior that may impair the reliability and security of Zephyr designs.

The Zephyr Project has spent the last five years developing a multithreaded, open-source real-time operating system (RTOS) for embedded designs.

The RTOS currently supports over 200 boards with microcontroller architectures ranging from Arm and RISC-V to Tensilica, Nios II, and ARC, in both single-core and multicore configurations. It also supports wireless connectivity over Bluetooth Low Energy, Wi-Fi, and IEEE 802.15.4, along with Matter (from the Connectivity Standards Alliance, formerly the Zigbee Alliance) and standards such as 6LoWPAN, CoAP, Ethernet, USB, CAN, and Thread. The board support packages include the libraries that make it easy for developers to get the RTOS up and running.

Security Focus

One important aspect of the Zephyr Project is that the RTOS has been designed with security in mind as well as fast, efficient development. The project is part of The Linux Foundation, which means it’s supported by a product incident response team and the codebase is developed with a goal of safety certifications. Updates are released approximately every three months; there’s also a long-term support (LTS) version for users who prefer a stable platform with just security updates.

All of this allows devices built on a wide range of chips and boards to connect quickly and securely to any cloud service, for applications in the industrial, automotive, smart-city, and smart-home markets. The two main alternatives, FreeRTOS and Azure RTOS ThreadX, are tied to their respective cloud vendors, Amazon and Microsoft, while Zephyr is independent.

The COVID‑19 pandemic has driven the need for contact-tracing wearables, distance trackers, and even smart safety shoes. All have been built on Zephyr in as little as three months, thanks to its small footprint, integrated stacks, and dependability.

One developer added a smart personal protection feature to its smart shoes to alert a worker through a vibrating signal when there’s a risk of getting too close to another worker. When the worker gets this signal, they can either put on a mask or move away from the other person.

Dealing with Vulnerabilities

But even with security at the heart of the development, problems can creep in, both in the Zephyr kernel and in the application code. The Zephyr project keeps a list of vulnerabilities that have been found and patched, which includes the BadAlloc issue as well as vulnerabilities in the USB and Bluetooth libraries.

An embedded system with connectivity to the cloud contains a fair amount of code and thus complexity. A “Hello world” project with AWS connectivity over MQTT and TLS produces a binary that’s hundreds of kilobytes in size.

Most RTOS applications have plenty of complex interactions where bugs are fairly common, including vulnerabilities that could be exploited to access the device and potentially the rest of the network. If a system isn’t secured against unauthorized access, it’s not functionally safe either, and any vulnerabilities can have dramatic consequences for, say, industrial or medical systems.

Multithreading

An RTOS such as Zephyr adds another layer of complexity that must be considered by embedded designers: multithreading. This allows the code to be divided into separate threads that run independently, at least in theory.

In practice, dependencies between the threads often add to the complexity. Some are intentional and necessary, as the threads typically need to communicate, but others are unintentional and sometimes problematic. They’re usually not apparent in the source code, and it’s hard to predict how they affect the runtime behavior.

This is a challenge with microcontrollers (MCUs) commonly used for IoT applications, since they don’t always have memory protection. Thus, bugs aren’t isolated and might affect any part of the runtime system via data corruption. In combination with multithreading, where other threads may preempt the execution at almost any point, and with thread dependencies contributing to the complexity, there’s a greater risk for non-deterministic behavior, elusive bugs, and vulnerabilities.

Deterministic execution is crucial for testability. Without it, it’s very difficult to design good tests and thus ensure security and safety. Determinism requires that software timing variations are minimized and don’t affect the order of important events. Otherwise, one may end up with an astronomical number of execution scenarios that are impossible to cover with unit tests. Functional tests of complex applications can usually only catch the most obvious issues.
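To get a feel for how quickly the number of execution scenarios grows, consider a deliberately simplified model (an assumption made here for illustration, not a description of the Zephyr scheduler): each thread executes a fixed number of atomic steps in program order, and a context switch may occur between any two steps. The number of possible orderings is then a multinomial coefficient. The function names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Binomial coefficient C(n, k), built up incrementally so that every
 * intermediate value is itself an exact binomial coefficient.
 * No overflow check: meant for small illustrative inputs only. */
static uint64_t binomial(uint64_t n, uint64_t k)
{
    if (k > n) {
        return 0;
    }
    if (k > n - k) {
        k = n - k;              /* use symmetry to shorten the loop */
    }
    uint64_t c = 1;
    for (uint64_t i = 1; i <= k; i++) {
        c = c * (n - k + i) / i;
    }
    return c;
}

/* Number of ways to interleave `num_threads` threads, where thread t
 * executes steps[t] atomic steps in program order:
 * (s0 + s1 + ...)! / (s0! * s1! * ...). */
uint64_t count_interleavings(const uint32_t *steps, size_t num_threads)
{
    uint64_t total = 0;
    uint64_t ways = 1;
    for (size_t t = 0; t < num_threads; t++) {
        total += steps[t];
        ways *= binomial(total, steps[t]);
    }
    return ways;
}
```

Under this model, two threads of just 10 steps each already allow 184,756 different orderings, which is why minimizing timing-dependent variation matters more than trying to test every scenario.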

While several verification methods exist for security, there’s no single solution that covers all potential vulnerabilities. Ensuring deterministic execution in an RTOS system requires analysis of the runtime behavior, including timing variations and patterns in the kernel scheduling and API calls.

Software Tracing

One way to examine the runtime behavior of multithreaded software running on the Zephyr RTOS is to use a software-tracing tool. These tools use hooks at strategic locations in the kernel code to record events and create a trace recording of the application running on Zephyr. No modification of the Zephyr source code is needed to use such tools—only a rebuild to enable the hooks (Fig. 1). In Zephyr 2.6.0, the trace subsystem was expanded with additional hooks and tool support to provide better means for analyzing the execution.

1. A visual trace tool can provide a wide range of analysis for the Zephyr RTOS without requiring any manual modification of the source code.
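The hook mechanism can be pictured roughly as follows. This is a simplified, self-contained sketch and not Zephyr's actual tracing API: `DEMO_TRACING`, `DEMO_TRACE_THREAD_SWITCH`, and `demo_switch_thread` are hypothetical names. The point is that the instrumented routine stays the same, and a build-time switch decides whether the hook expands to a recorder call or to nothing.

```c
#include <stdint.h>

/* Build-time switch (hypothetical). A real project would set this from
 * its build configuration rather than in the source file. */
#ifndef DEMO_TRACING
#define DEMO_TRACING 1
#endif

/* Minimal stand-in for a trace recorder: counts events and remembers
 * the most recent thread switch it was told about. */
static uint32_t demo_event_count;
static int demo_last_from = -1;
static int demo_last_to = -1;

static void demo_record_switch(int from, int to)
{
    demo_event_count++;
    demo_last_from = from;
    demo_last_to = to;
}

/* The hook itself: a recorder call when tracing is enabled,
 * otherwise an empty expression with no runtime cost. */
#if DEMO_TRACING
#define DEMO_TRACE_THREAD_SWITCH(from, to) demo_record_switch((from), (to))
#else
#define DEMO_TRACE_THREAD_SWITCH(from, to) ((void)0)
#endif

/* Stand-in for a kernel routine with a hook at a strategic location. */
static int demo_current_thread;

void demo_switch_thread(int next)
{
    DEMO_TRACE_THREAD_SWITCH(demo_current_thread, next);
    demo_current_thread = next;
}
```

Rebuilding with `DEMO_TRACING` set to 0 removes the recording without touching `demo_switch_thread`, which mirrors the idea of enabling hooks through a rebuild rather than through source edits.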

Tracing tools can provide a visual timeline that facilitates debugging, as well as profiling of CPU time and stack and heap usage for each task. This kind of information can help developers detect timing variations and other sources of non-deterministic behavior. Fixing these is, as stated earlier, key to achieving a more stable and testable application, which in turn leads to improved reliability, security, and safety.
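As a rough idea of how CPU-time profiling can be derived from scheduling events, the sketch below sums the time each thread spends running between consecutive context switches. The event layout, the `DEMO_MAX_THREADS` limit, and the function name are assumptions for illustration; actual trace formats differ between tools.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_THREADS 8

/* One context-switch event: `thread_id` starts running at `timestamp`. */
struct demo_switch_event {
    uint32_t timestamp;   /* e.g. in microseconds */
    uint8_t thread_id;    /* must be < DEMO_MAX_THREADS */
};

/* Accumulates per-thread CPU time from a time-ordered switch trace.
 * Each thread runs from its switch-in event until the next event; the
 * last one runs until `trace_end`. Returns 0 on success, -1 on bad input. */
int demo_cpu_time(const struct demo_switch_event *ev, size_t count,
                  uint32_t trace_end, uint32_t cpu_time[DEMO_MAX_THREADS])
{
    for (size_t t = 0; t < DEMO_MAX_THREADS; t++) {
        cpu_time[t] = 0;
    }
    for (size_t i = 0; i < count; i++) {
        uint32_t end = (i + 1 < count) ? ev[i + 1].timestamp : trace_end;
        if (ev[i].thread_id >= DEMO_MAX_THREADS || end < ev[i].timestamp) {
            return -1;    /* unknown thread or timestamps out of order */
        }
        cpu_time[ev[i].thread_id] += end - ev[i].timestamp;
    }
    return 0;
}
```

Comparing these totals, or the individual run lengths, across repeated runs of the same scenario is one simple way to spot timing variations.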

Software tracing can also be used to record API calls and to log user-defined application events, both of which have broad applications (Fig. 2). These include detection of deadlocks, memory leaks, and buffer overflows, the last of which are commonly exploited to inject malware.

2. A visual trace showing Zephyr scheduling and API calls, as well as blocking.
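As a rough illustration of how recorded API calls can point to a memory leak, the sketch below scans a trace of allocation and free events and counts blocks that were never released. The event structure and names are hypothetical, and a block still allocated at the end of a trace is only a leak candidate, since it may simply still be in use.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_event_type { DEMO_EV_ALLOC, DEMO_EV_FREE };

struct demo_mem_event {
    enum demo_event_type type;
    uintptr_t address;    /* block address reported by the allocator hook */
};

/* Returns the number of allocations with no matching free later in the
 * trace. Quadratic scan: fine for a sketch, too slow for long traces. */
size_t demo_count_unfreed(const struct demo_mem_event *ev, size_t count)
{
    size_t unfreed = 0;
    for (size_t i = 0; i < count; i++) {
        if (ev[i].type != DEMO_EV_ALLOC) {
            continue;
        }
        int freed = 0;
        for (size_t j = i + 1; j < count; j++) {
            if (ev[j].address != ev[i].address) {
                continue;
            }
            /* The first later event on the same address decides. */
            freed = (ev[j].type == DEMO_EV_FREE);
            break;
        }
        if (!freed) {
            unfreed++;
        }
    }
    return unfreed;
}
```

A count that keeps growing as the trace gets longer is the typical signature worth investigating in a detailed view.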

Advanced tool-supported analysis and visualization of trace data—what we call visual trace diagnostics—enables something similar to a surveillance camera for embedded software, where developers can zoom out to get an overview of the execution and zoom in to analyze the details (Fig. 3). This allows trace data to be visualized from many perspectives and at different abstraction levels. It permits a top-down workflow in which anomalies can be identified in high-level overviews and then investigated in more detailed views.

3. Visual trace diagnostics is a bit like a surveillance camera, tracing the behavior of embedded software code to pinpoint anomalies on a visual timeline for further top-down analysis.

A key point is that the sort of tracing described here can be done entirely in software, either by keeping the latest events in a ring buffer in target RAM, or streaming events continuously over a TCP/IP link or a debug probe to the host system. Thus, teams of developers can monitor the system over long periods of time and capture even very rare issues.
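The ring-buffer mode mentioned above can be sketched in a few lines. This is a generic illustration with hypothetical names and a tiny capacity, not the data structure used by any particular recorder: new events overwrite the oldest ones, so the buffer always holds the most recent history leading up to a problem.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_CAPACITY 4   /* tiny on purpose, to make wrap-around visible */

struct demo_trace_event {
    uint32_t timestamp;
    uint16_t event_id;
    uint16_t param;
};

struct demo_ring {
    struct demo_trace_event slots[DEMO_RING_CAPACITY];
    size_t head;     /* index of the next slot to write */
    size_t count;    /* number of valid events, at most DEMO_RING_CAPACITY */
};

/* Stores an event, overwriting the oldest one when the buffer is full.
 * Not interrupt-safe: a real recorder must guard against preemption. */
void demo_ring_put(struct demo_ring *r, struct demo_trace_event e)
{
    r->slots[r->head] = e;
    r->head = (r->head + 1) % DEMO_RING_CAPACITY;
    if (r->count < DEMO_RING_CAPACITY) {
        r->count++;
    }
}

/* Copies the i-th oldest event (0 = oldest) to *out.
 * Returns 0 on success, -1 if i is out of range. */
int demo_ring_get(const struct demo_ring *r, size_t i,
                  struct demo_trace_event *out)
{
    if (i >= r->count) {
        return -1;
    }
    size_t oldest = (r->head + DEMO_RING_CAPACITY - r->count)
                    % DEMO_RING_CAPACITY;
    *out = r->slots[(oldest + i) % DEMO_RING_CAPACITY];
    return 0;
}
```

The trade-off against streaming is memory versus coverage: a ring buffer needs only a fixed amount of target RAM but keeps a limited window, while streaming captures everything at the cost of a continuous link to the host.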

Software-based tracing is applicable to essentially any embedded processor and toolchain. It doesn’t rely on any particular hardware support for tracing.

Conclusion

The ability to identify potential problems quickly and easily is essential for keeping a project on track and delivering a high-quality product.

Reliability and security are key requirements for embedded systems. Zephyr 2.6.0 brings expanded support for software tracing, which facilitates debugging and allows for improved reliability, security, and safety. Visual trace diagnostics make it easy to detect chaotic, non-deterministic software behavior, which may impair testability and thereby hide elusive bugs and vulnerabilities.

The insight provided by visual trace diagnostics leads to higher productivity due to faster debugging. It also improves the testability and thereby reliability, security, and safety—all critical factors in successful software-development projects.


About the Author

Dr. Johan Kraft

CEO/Founder

Dr. Johan Kraft is CEO and founder of Percepio AB. Dr. Kraft is the original developer of Percepio Tracealyzer, a tool for visual trace diagnostics that provides insight into runtime systems to accelerate embedded software development. His applied academic research, in collaboration with industry, focused on embedded software timing analysis. Prior to founding Percepio in 2009, he worked in embedded software development at ABB Robotics. Dr. Kraft holds a PhD in computer science.
