Real-time operating systems (RTOS) and Linux each bring their own advantages for embedded-systems designers. With an RTOS, designers can build deterministic multi-threaded applications with low memory requirements and fast boot times. However, they’re bound by the feature set and API provided by the RTOS vendor, and the middleware is often an extra cost and may not be fully integrated into the OS. The Linux operating system, by contrast, provides a standard, open-source platform with extensive driver and middleware support for a variety of components, including communication devices like Wi-Fi modules.
Despite these benefits, Linux brings with it a hefty set of hardware requirements. For instance, it needs a microprocessor (MPU) with a memory management unit (MMU), plus a large amount of external memory (typically DDRx SDRAM) whose high-speed signals must be routed on a multi-layer PCB. Boot-up times in standard Linux implementations are much longer than in RTOS-based systems, too. All of this overhead can push microcontroller (MCU) and MPU designers to avoid Linux altogether. In so doing, they unfortunately miss out on the many advantages that Linux brings.
Enter Execute-in-Place Linux
Execute-in-Place (XIP) Linux approaches embedded-system architecture from a different angle. It executes code directly from external serial flash, which significantly reduces the RAM footprint. This is akin to how an MCU runs code out of on-chip flash, only with MPU-level performance. It overcomes many long-standing design challenges while simplifying PCB design and shrinking the bill-of-materials (BOM) cost.
Traditional Linux Kernel Architecture
Embedded Linux systems have been constrained to using flash memory for code storage and DRAM for code execution and data buffering. With this type of architecture, application code is copied from flash to DRAM at boot-up before it’s executed, which prolongs the boot process.
While application code can be read out of flash in pieces as needed, the kernel code must stay in RAM indefinitely during the operation of the program. That is, it can’t be paged in or out. Likewise, driver modules loaded at run-time must also be copied to RAM and remain thereafter. To compound matters, the memory bandwidth has to be split between code and data, which creates a performance bottleneck that can potentially cause deleterious visual artifacts like flickering and lag time on touchscreen displays.
XIP Linux Kernel Architecture
Typical Linux systems, such as PCs and high-end MPUs, aren’t memory-constrained, so they can incorporate large, high-speed RAM without penalty. In a memory-constrained system, things must be done differently. With XIP Linux architecture, the code and constants can be kept in CPU-accessible ROM. In so doing, the CPU can execute the kernel code directly from flash, so there’s nothing to copy at boot time (Fig. 1). Thus, if the kernel is 5 MB, it instantly saves 5 MB of RAM, which is a major benefit of XIP Linux.
1. Memory mapping with embedded XIP Linux architecture allows code and constants to be kept in CPU-accessible ROM. As a result, the CPU can execute directly from external serial flash. (Source: Renesas Electronics America Inc.)
XIP Linux was first added to the Linux kernel for the PowerPC, with the primary purpose of speeding up boot time, and the secondary objective of reducing RAM usage. Later, this capability was extended to the ARM tree. Early systems incorporated parallel NOR flash, which was slow and expensive. While XIP Linux achieved its intended purpose at the time, as DDR memory got cheaper and faster, the NOR flash approach became less and less appealing.
Today, interest in XIP Linux has been revitalized due to the capability of the Renesas RZ/A1 MPU to execute Linux in place from (dual) QSPI flash. The chip supports up to 10 MB of on-chip SRAM and three levels of cache to boost efficiency and minimize cache misses and thrashing, while executing from serial flash memory. The memory footprint can be optimized such that no external DRAM is required at all, which is unique for an MPU running Linux.
With all of the board support package (BSP) drivers enabled for the RZ/A1 Renesas Starter Kit, boot time measured only 3.05 seconds from the point in u-boot when kernel boot starts to the appearance of the log message "Freeing unused kernel memory" (which indicates that the file system is mounted). This time was reduced further by using an optimized boot-up process, allowing the processor to boot and launch a full GUI application in only 2.2 seconds from CPU RESET.
The application chosen was a security and industrial control panel GUI example from Crank Software (Fig. 2). Only a small number of drivers were included in this build (the largest being the Ethernet driver and TCP/IP stack), so the on-chip SRAM consumed by the XIP kernel was only 332 kB. This is a significant savings compared with the non-XIP kernel implementation, which consumed 3,803 kB of SRAM, more than 11 times the space.
2. A security panel GUI running on XIP Linux with AXFS demonstrates how an XIP-oriented architecture can improve memory usage and performance. (Source: Renesas Electronics America Inc.)
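The SRAM figures quoted above can be checked with simple arithmetic. The sketch below only restates the two numbers from the text (332 kB and 3,803 kB); the function and constant names are illustrative and do not come from the Renesas BSP.

```c
#include <assert.h>

/* SRAM figures quoted in the article, in kilobytes. */
enum {
    XIP_KERNEL_SRAM_KB     = 332,
    NON_XIP_KERNEL_SRAM_KB = 3803
};

/* SRAM freed by executing the kernel in place instead of from RAM. */
static int sram_saved_kb(int non_xip_kb, int xip_kb)
{
    assert(non_xip_kb >= xip_kb);
    return non_xip_kb - xip_kb;
}

/* How many times larger the non-XIP footprint is than the XIP one. */
static double sram_ratio(int non_xip_kb, int xip_kb)
{
    assert(xip_kb > 0);
    return (double)non_xip_kb / (double)xip_kb;
}
```

Calling sram_saved_kb(NON_XIP_KERNEL_SRAM_KB, XIP_KERNEL_SRAM_KB) gives 3,471 kB of SRAM freed, and sram_ratio() with the same arguments gives a ratio of roughly 11.5.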
Performance was tested by timing a TFTP file transfer from a PC to the development board. This is a relevant task to benchmark because the Ethernet driver and networking stack are located in the kernel as opposed to in the application in RAM. No provisions were taken into account for saving the file on the board, so the file was dumped to /dev/null. Moreover, the file system caches were cleared before each test for consistency.
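The text does not say how the caches were cleared. One common way on Linux is to flush dirty data with sync() and then write a level to /proc/sys/vm/drop_caches. The sketch below assumes that approach. The helper name drop_caches() is hypothetical, and the control-file path is passed in as a parameter so the function can be tried against an ordinary file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write a drop_caches level to the given control file.
 *   1 = page cache, 2 = dentries and inodes, 3 = both.
 * On a real target, ctl_path would be "/proc/sys/vm/drop_caches"
 * and the caller needs root privileges.
 * Returns 0 on success, -1 on failure.
 */
static int drop_caches(const char *ctl_path, int level)
{
    FILE *f;
    int ok;

    assert(ctl_path != NULL);
    if (level < 1 || level > 3)
        return -1;

    sync();                       /* flush dirty pages first */

    f = fopen(ctl_path, "w");
    if (f == NULL)
        return -1;

    ok = fprintf(f, "%d\n", level) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Before each timed transfer, a test script would call drop_caches("/proc/sys/vm/drop_caches", 3) so that every run starts with cold file system caches.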
Three different system configurations were used to transfer files from the PC to the board:
- An XIP kernel located in QSPI flash, with the Advanced XIP File System (AXFS) located in QSPI flash and internal RAM.
- An XIP kernel located in DRAM, with AXFS located in QSPI flash and external DRAM.
- A traditional kernel with squashfs, both living in SDRAM.
Test copies were made of a 1-MB file, a 100-kB file, and ten 100-kB files. The results (see table) show no substantial difference between an XIP kernel executing from QSPI flash and a kernel executing from SDRAM, which supports the combination of an XIP kernel and AXFS as a viable memory architecture. In fact, XIP Linux was observed to be the fastest configuration tested.
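A timing harness for this kind of transfer benchmark might look like the sketch below. It is not the authors' test code: transfer_fn, time_transfer(), and fake_transfer() are hypothetical names, fake_transfer() merely stands in for the real TFTP copy, and 1 kB is taken as 1,000 bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical callback: performs one transfer and returns the number
 * of bytes received, or -1 on error. */
typedef long (*transfer_fn)(void *ctx);

/* Difference between two monotonic timestamps, in seconds. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Throughput in kB/s, assuming 1 kB = 1,000 bytes. */
static double throughput_kb_per_s(long bytes, double seconds)
{
    assert(seconds > 0.0);
    return ((double)bytes / 1000.0) / seconds;
}

/* Time one transfer with the monotonic clock.
 * Returns the callback's result, or -1 if the clock cannot be read. */
static long time_transfer(transfer_fn do_transfer, void *ctx,
                          double *seconds_out)
{
    struct timespec t0, t1;
    long bytes;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;
    bytes = do_transfer(ctx);
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;

    *seconds_out = elapsed_seconds(t0, t1);
    return bytes;
}

/* Stand-in for the real TFTP copy: reports the byte count in ctx. */
static long fake_transfer(void *ctx)
{
    return *(const long *)ctx;
}
```

For the ten-file case, the same harness would be run once per file and the byte counts and times summed before computing throughput.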
Kernel Update while Executing in Place
Updating the kernel on an XIP Linux system while it’s running can pose a challenge. If the processor is running in XIP mode using the Quad SPI interface, one can’t simply modify (i.e., erase/write) that SPI flash device. That’s because it would require taking the SPI peripheral out of XIP mode and putting it into SPI mode, which could crash the system. The safer way to update the kernel is to first save the state, and then reboot into u-boot or some other custom bootloader that executes out of on-chip RAM.
In the previously discussed Renesas RZ/A1 implementation of XIP Linux, a loadable kernel module is available for this purpose. With this feature, a memory buffer in kernel space gets loaded with new code to overwrite existing code. To make this process work, the processor’s interrupts must first be disabled. After that, the entire kernel module is loaded into on-chip RAM, which allows the SPI flash interface to be switched from XIP mode to SPI mode while keeping the system running. Switching to SPI mode is the only way to enable erasing and reprogramming of the flash.
During this time, no functions other than the kernel module can run, not even utility functions, since kernel access is completely disabled. Once the SPI flash is completely reprogrammed, it’s switched back into XIP mode to allow normal execution to resume. This loadable kernel module is included as part of the Linux BSP and documentation at http://elinux.org/RZ-A.
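The ordering constraints in the update sequence described above can be modeled in plain C. The sketch below is a user-space simulation, not the module shipped in the BSP: every type and function name is hypothetical, and no real hardware registers are touched. It only encodes the rules from the text: interrupts off and updater code in on-chip RAM before leaving XIP mode, programming allowed only in SPI mode, and XIP mode restored before normal execution resumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum flash_mode { FLASH_MODE_XIP, FLASH_MODE_SPI };

/* Simulated system state (hypothetical, for illustration only). */
struct update_ctx {
    bool irqs_enabled;            /* CPU interrupts currently enabled? */
    bool updater_in_ram;          /* updater code resident in on-chip RAM? */
    enum flash_mode mode;         /* current mode of the flash interface */
    unsigned sectors_programmed;  /* progress counter */
};

/* Leaving XIP mode is safe only with interrupts off and the updater
 * running from RAM; otherwise the CPU would fetch code from a flash
 * device that can no longer be read in place. */
static int enter_spi_mode(struct update_ctx *c)
{
    assert(c != NULL);
    if (c->irqs_enabled || !c->updater_in_ram)
        return -1;
    c->mode = FLASH_MODE_SPI;
    return 0;
}

/* Erase/program is possible only while the interface is in SPI mode. */
static int program_sector(struct update_ctx *c)
{
    assert(c != NULL);
    if (c->mode != FLASH_MODE_SPI)
        return -1;
    c->sectors_programmed++;
    return 0;
}

/* Full sequence: IRQs off, SPI mode, program, XIP mode, IRQs restored. */
static int update_flash(struct update_ctx *c, unsigned sectors)
{
    bool irqs_were_enabled;
    unsigned i;
    int rc = 0;

    assert(c != NULL);
    irqs_were_enabled = c->irqs_enabled;
    c->irqs_enabled = false;

    if (enter_spi_mode(c) != 0) {
        c->irqs_enabled = irqs_were_enabled;
        return -1;
    }
    for (i = 0; i < sectors && rc == 0; i++)
        rc = program_sector(c);

    c->mode = FLASH_MODE_XIP;     /* resume execute-in-place */
    c->irqs_enabled = irqs_were_enabled;
    return rc;
}
```

In this model, a call to program_sector() while the interface is still in XIP mode fails, which mirrors why the flash cannot simply be rewritten while the kernel is executing from it.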
Choosing the Optimal File System for XIP Linux Implementations
Choosing the right file system for XIP Linux can prevent several design headaches further down the road. AXFS is recommended when using XIP Linux because it can leave code and read-only (r/o) data in flash/ROM and copy only the relevant read/write (r/w) data into RAM. Thus, instead of mapping every page to RAM (as with a traditional file system), the MMU can mark some memory as XIP and map it to flash/ROM, while mapping other memory to RAM.
In the AXFS, everything is broken into pages (4 kB for ARM), so memory can be mixed and matched between RAM and ROM at page-level granularity. AXFS also supports mixing of compressed and uncompressed pages within the same executable (application) file. XIP pages must be uncompressed, of course, but the other pages can be compressed to reduce size. When ready to run, the compressed pages can simply be decompressed and copied into RAM.
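Page-level granularity makes the RAM cost of an executable easy to estimate. The sketch below is a simplified model, not AXFS source code: the page_kind enum and the helper names are hypothetical, and each page is treated as either XIP (stays in flash) or compressed (decompressed into one RAM page when used).

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_BYTES 4096u   /* 4-kB pages, as on ARM */

/* Simplified per-page classification (hypothetical). */
enum page_kind {
    PAGE_XIP,         /* uncompressed, executed or read directly from flash */
    PAGE_COMPRESSED   /* stored compressed, decompressed into RAM on use */
};

/* Count the pages of one kind in an executable's page list. */
static size_t count_kind(const enum page_kind *pages, size_t n,
                         enum page_kind kind)
{
    size_t i, count = 0;

    assert(pages != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (pages[i] == kind)
            count++;
    return count;
}

/* RAM consumed once every non-XIP page has been decompressed. */
static size_t ram_bytes_at_runtime(const enum page_kind *pages, size_t n)
{
    return count_kind(pages, n, PAGE_COMPRESSED) * MODEL_PAGE_BYTES;
}

/* Flash occupied by uncompressed XIP pages. */
static size_t xip_bytes_in_flash(const enum page_kind *pages, size_t n)
{
    return count_kind(pages, n, PAGE_XIP) * MODEL_PAGE_BYTES;
}
```

For a five-page file with three XIP pages and two compressed pages, this model reports 8,192 bytes of RAM at run time and 12,288 bytes of uncompressed flash.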
Page faults are one example of an issue that is resolved much faster with AXFS than with a traditional file system (such as ext3 or ext4). With AXFS, the MMU only needs to map to the location in flash to resolve a fault on an XIP page. There’s no need to copy anything into RAM in response to the fault, which can save a considerable amount of CPU time.
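The difference between the two fault paths can be illustrated with a small user-space model. This is a sketch under stated assumptions rather than kernel code: resolve_fault() and struct fault_result are hypothetical, plain byte arrays stand in for flash and RAM, and the decompression step a real non-XIP page would need is reduced to a memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_BYTES 4096u

/* Outcome of resolving one simulated page fault. */
struct fault_result {
    const unsigned char *mapped_at;  /* address the page ends up mapped to */
    size_t bytes_copied;             /* work done to satisfy the fault */
};

/*
 * XIP page: point the mapping at the page's location in flash, copy nothing.
 * Non-XIP page: fill a RAM frame from flash, then map the RAM frame.
 */
static struct fault_result resolve_fault(const unsigned char *flash_page,
                                         int is_xip,
                                         unsigned char *ram_frame)
{
    struct fault_result r;

    assert(flash_page != NULL);
    if (is_xip) {
        r.mapped_at = flash_page;
        r.bytes_copied = 0;
    } else {
        assert(ram_frame != NULL);
        memcpy(ram_frame, flash_page, MODEL_PAGE_BYTES);
        r.mapped_at = ram_frame;
        r.bytes_copied = MODEL_PAGE_BYTES;
    }
    return r;
}
```

In the XIP case the result points straight at the flash buffer with zero bytes copied, while the non-XIP case costs a full 4-kB copy before the mapping can be set up.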
XIP Linux with AXFS enables embedded systems to run Linux within a memory-constrained system by executing most of the code in-place from flash memory like an MCU. With XIP Linux, users have access to all of the Linux drivers and the open-source ecosystem support of the Linux community, while getting the fast boot-up familiar to MCU programmers.
All relevant documentation of XIP Linux for the Renesas RZ/A1 MPU is posted to http://elinux.org/RZ-A, which includes GitHub repositories for the kernel, u-boot, and BSPs. Several application notes and demonstrations are available, giving engineers the ability to try out XIP Linux on their own.
David Olsen is Senior Manager of Product Marketing, Chris Brandt is Senior Staff Application Engineer, and Ganesh Balamitran is Product Marketing Manager at Renesas Electronics America Inc.