Intel chips dominate the server and PC markets, but they’re also widely used in embedded applications. A serious flaw, called Meltdown, has been found in these chips, and the fix could have significant implications. The details of the flaw and fix are still under wraps. However, we do know some information about the issue and the potential fix. All of this comes on the heels of the Intel Management Engine problem that affected a large number of Intel processors.
The snag appears to involve how the memory management unit (MMU) protects memory, which is key to implementing a secure system. Specifically, kernel memory can be examined from a conventional application. The solution is to keep most kernel memory out of the application’s virtual-memory (VM) space. Patches for Windows, Linux, and macOS are in the works, and other operating systems that target Intel platforms will likely change as well.
Developers will need to work with their software suppliers for these changes. Any operating system with virtual-memory or virtual-machine support running on processors with this flaw will require changes to address it.
The Meltdown bug is now documented as CVE-2017-5754. Two related flaws, known collectively as Spectre, have been reported as well: bounds check bypass (CVE-2017-5753) and branch target injection (CVE-2017-5715). Meltdown primarily affects Intel platforms, while Spectre also affects AMD and ARM Cortex-A platforms.
To Share or Not to Share
The problem stems from a design tradeoff between keeping the kernel in its own address space and sharing part of it with applications. Keeping everything in the kernel’s own address space means only the kernel has access to it, but every call from an application to the kernel then requires a major state swap that adds overhead. That’s one reason why many microkernel approaches have a hard time challenging monolithic kernels like Linux in terms of performance.
For example, the default kernel/application virtual-memory split in 32-bit Linux gives the top 1 GB of each application’s virtual address space to the kernel; the application gets the remaining 3 GB. Much of the kernel’s space maps physical memory linearly, so converting a kernel virtual address to a physical address is just a matter of subtracting a fixed offset, which makes direct-memory-access (DMA) operations easier to handle.
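To make the linear mapping concrete, here is a minimal Python sketch. It assumes the classic configuration in which the kernel’s top 1 GB starts at `PAGE_OFFSET` = 0xC0000000 and roughly the first 896 MB of physical memory is mapped linearly. The function names are illustrative only, not kernel APIs; inside Linux, the `__pa()` and `__va()` macros play this role.

```python
# Simplified model of the default 3 GB/1 GB split in 32-bit x86 Linux.
PAGE_OFFSET = 0xC000_0000         # kernel owns virtual addresses 0xC0000000-0xFFFFFFFF
LOWMEM_SIZE = 896 * 1024 * 1024   # assumed size of the linearly mapped region (classic default)

def is_kernel_address(vaddr: int) -> bool:
    """True if a 32-bit virtual address falls in the kernel's top 1 GB."""
    return PAGE_OFFSET <= vaddr <= 0xFFFF_FFFF

def kernel_virt_to_phys(vaddr: int) -> int:
    """Linear mapping: physical address = virtual address - fixed offset."""
    if not PAGE_OFFSET <= vaddr < PAGE_OFFSET + LOWMEM_SIZE:
        raise ValueError(f"{vaddr:#x} is not in the linearly mapped kernel region")
    return vaddr - PAGE_OFFSET

def phys_to_kernel_virt(paddr: int) -> int:
    """Inverse of kernel_virt_to_phys for physical memory in the linear region."""
    if not 0 <= paddr < LOWMEM_SIZE:
        raise ValueError(f"{paddr:#x} is outside the linearly mapped physical range")
    return paddr + PAGE_OFFSET

# A driver handing a kernel buffer to a DMA engine needs its physical address.
buf = 0xC010_0000
print(hex(kernel_virt_to_phys(buf)))   # 0x100000
```

Because the offset is constant, the translation costs one subtraction and needs no page-table walk, which is the convenience the linear layout buys.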
The x86 architecture also has a four-ring protection system, although most operating systems use only two of the rings. Ring 0, or “supervisor mode,” is the most privileged level and runs the kernel. Ring 3, or “user mode,” is where applications live. The virtual-memory system lets the access control for each memory block consider not only the virtual-to-physical mapping, but also which rings may access the block. Code running at Ring 0, such as the kernel, can access anything, but an application running at Ring 3 can’t access memory marked for Ring 0 only. This prevents an application from accessing the kernel’s upper memory.
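The ring check described above comes down to a few bits in each page-table entry. The Python sketch below models one entry with the x86 Present, Read/Write, and User/Supervisor (U/S) bits. It is a simplification under stated assumptions: real hardware combines the bits from every paging level and also applies features such as SMEP, SMAP, and CR0.WP, all of which are ignored here.

```python
# Simplified x86 page-table-entry permission bits (one entry, one level).
PTE_PRESENT = 1 << 0   # P: page is mapped
PTE_WRITE   = 1 << 1   # R/W: writes allowed
PTE_USER    = 1 << 2   # U/S: 1 = user pages, 0 = supervisor-only pages

def access_allowed(pte: int, cpl: int, write: bool = False) -> bool:
    """Return True if code at privilege level `cpl` may access the page.

    CPL 3 is user mode; CPL 0-2 count as supervisor mode. SMEP, SMAP,
    and the CR0.WP bit are deliberately ignored.
    """
    if not pte & PTE_PRESENT:
        return False                  # unmapped: always faults
    if cpl == 3:
        if not pte & PTE_USER:
            return False              # Ring 3 may not touch supervisor pages
        if write and not pte & PTE_WRITE:
            return False              # read-only page
    return True                       # supervisor access (simplified)

kernel_page = PTE_PRESENT | PTE_WRITE            # U/S clear: kernel only
user_page = PTE_PRESENT | PTE_WRITE | PTE_USER

print(access_allowed(kernel_page, cpl=0))  # True
print(access_allowed(kernel_page, cpl=3))  # False
print(access_allowed(user_page, cpl=3))    # True
```

This is the check that normally keeps Ring 3 code out of the kernel’s portion of the address space even though that portion is mapped in every application.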
This approach has worked well… until now.
Another component of the MMU is the translation lookaside buffer (TLB), a cache that holds recent translations between virtual and physical addresses. It’s also part of the security system, because each cached translation carries the access rights for its page.
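A toy model helps show what the TLB does. In the Python sketch below, a small cache sits in front of a hypothetical flat page table (a dict from virtual page number to frame number and flags). The four-entry capacity and least-recently-used replacement are assumptions for illustration; real TLBs are hardware structures whose sizes and policies vary by processor.

```python
from collections import OrderedDict

PAGE_SIZE = 4096

class ToyTLB:
    """Caches recent virtual-to-physical translations in front of a page table.

    page_table is a hypothetical flat dict: virtual page number -> (frame, flags).
    Flags are cached with the translation, so access rights come from the TLB too.
    """

    def __init__(self, page_table: dict, capacity: int = 4):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.page_table = page_table
        self.capacity = capacity
        self.entries = OrderedDict()   # insertion order doubles as LRU order
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int):
        """Return (physical address, flags) for vaddr."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)              # mark most recently used
        else:
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]   # slow walk; KeyError stands in for a page fault
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)       # evict least recently used
        frame, flags = self.entries[vpn]
        return frame * PAGE_SIZE + offset, flags

    def flush(self) -> None:
        """Drop every cached translation, e.g. when the page tables change."""
        self.entries.clear()

tlb = ToyTLB({0x10: (0x200, 0b111)})   # flags 0b111: present, writable, user
print(hex(tlb.translate(0x10123)[0]))  # 0x200123 (miss, then cached)
tlb.translate(0x10456)                 # same page: hit
print(tlb.hits, tlb.misses)            # 1 1
```

The `flush()` method matters later: any change that forces the processor to discard cached translations makes the following memory accesses pay for full page-table walks again.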
Intel processors use a technique called “speculative execution,” in which the CPU runs instructions before it knows whether their results will be needed, to boost performance. The flaw appears to involve how speculatively executed instructions interact with the MMU’s protection checks. AMD doesn’t implement its MMU the same way as Intel, so the problem reportedly doesn’t occur in AMD x86 processors. This is where the details are fuzzy, because researchers, OS programmers, and Intel are keeping the information secret until fixes can be made available to the public.
What’s the Solution?
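The fix is to remove most of the kernel mappings at the top of memory from the page tables used while an application runs. Only the small amount of kernel code and data needed to enter and exit the kernel stays mapped; the rest is reachable only through a separate set of page tables that is active while the kernel itself runs. Essentially, from a memory-map standpoint, the kernel then operates much like a separate application. The MMU flaw doesn’t affect the protection of memory that isn’t mapped into the application’s address space at all.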
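In Linux, this approach is known as kernel page-table isolation (KPTI). The Python sketch below is a toy model of the idea, not the kernel’s implementation: pages are hypothetical integers, “mapped” simply means a page appears in the currently active view, and the class and method names are invented for illustration.

```python
class IsolatedAddressSpace:
    """Toy model of kernel page-table isolation: two views of one process."""

    def __init__(self, user_pages: set, kernel_pages: set, entry_pages: set):
        # Active while the application runs: its own pages plus the small
        # amount of kernel code/data needed to enter and leave the kernel.
        self.user_view = frozenset(user_pages | entry_pages)
        # Active while the kernel runs: everything.
        self.kernel_view = frozenset(user_pages | kernel_pages | entry_pages)
        self.active = self.user_view
        self.table_switches = 0

    def is_mapped(self, page: int) -> bool:
        return page in self.active

    def system_call(self, handler):
        """Run handler(self) in the kernel, switching page tables on entry and exit."""
        self.active = self.kernel_view      # on x86, corresponds to reloading CR3
        self.table_switches += 1
        try:
            return handler(self)
        finally:
            self.active = self.user_view    # switch back before returning
            self.table_switches += 1

space = IsolatedAddressSpace(user_pages={1, 2}, kernel_pages={100, 101}, entry_pages={99})
print(space.is_mapped(100))                           # False: not mapped for the app
print(space.system_call(lambda s: s.is_mapped(100)))  # True: mapped inside the kernel
print(space.table_switches)                           # 2
```

The model makes the tradeoff visible: kernel data is simply absent from the application’s view, but every round trip into the kernel now costs two page-table switches.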
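This change means the processor must switch page tables every time a system call, interrupt, or exception enters the kernel, and switch back on the way out, because the data needed to handle the request is mapped only in the kernel’s page tables. On x86, switching page tables (by loading the CR3 register) also flushes most cached TLB entries, adding to the cost; processors that support process-context identifiers (PCIDs) can reduce that penalty.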
The fix adds overhead that could reduce overall system performance. Numbers ranging from 5% to 30% have been tossed out, but we will have to wait for actual fixes to test those assertions. Even 5% matters for embedded applications, where certification, tuning, and other concerns can be affected by a small change. Likewise, changing the operating system would require recertification or testing for many critical applications.
Most of the discussion about the defect is related to security and performance. This is reasonable since the effect of a change on users, most server applications, and cloud providers will be better security, but with lower overall performance. This will also be true for most embedded applications, assuming they can install the fixes. Unfortunately, updating the operating system isn’t always an option when it comes to embedded systems. Many systems will need significant regression testing and even recertification. Some may even require a redesign or change of delivered features, because existing hardware performance may not be sufficient after the change to support some features.
The level of impact will depend on the application mix running on a system. Applications that have a low number of system calls will encounter minimal overhead. It doesn’t matter whether the operations being performed are in the application or the kernel, as long as the number of transitions between the two is low. Applications with a high number of system calls could see a significant slowdown.
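A back-of-the-envelope model makes this concrete: the added cost scales with the number of kernel transitions, not with how much work each side does. The Python sketch below assumes a fixed, hypothetical penalty per system call (the 200-ns figure is purely illustrative, not a measurement) and computes the fraction of one core’s time that penalty consumes.

```python
def transition_overhead(syscalls_per_sec: float, extra_ns_per_call: float) -> float:
    """Fraction of one CPU's time spent on the added kernel entry/exit cost.

    Assumes the extra cost is a fixed number of nanoseconds per system call
    and that all calls run on a single core.
    """
    return syscalls_per_sec * extra_ns_per_call * 1e-9

EXTRA_NS = 200   # purely illustrative per-call penalty, not a measurement

for name, rate in [("compute-bound loop", 1_000), ("busy I/O server", 500_000)]:
    print(f"{name}: {transition_overhead(rate, EXTRA_NS):.2%}")
# compute-bound loop: 0.02%
# busy I/O server: 10.00%
```

With the same per-call penalty, the application that makes 500 times more system calls sees 500 times the overhead, which is why the application mix matters more than the total amount of work.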
So why isn’t a fix available yet, given how much work has already gone into addressing the defect? For the same reason any fix is hard on embedded systems: a change of this magnitude can affect everything in unpredictable ways. Making sure the new software still works correctly with existing code, including the rest of the operating system, is therefore critical.
Will everyone have these fixes available for their systems? Probably not. The changes will likely target the latest versions of popular operating systems. Since the fix lives in the OS, those still running an unsupported system such as Windows 95 or even Windows Vista are unlikely to get it. One way to address the issue is to harden the environment around such a system, for example by using external firewalls to isolate a system that keeps running software without the fixes, whether because no fix is available or because it would slow the system down too much.
Apple recently released an operating system update that slowed some older iPhones to compensate for aging batteries. The resulting furor led Apple to offer low-cost battery replacements. It remains to be seen what impact Intel’s processor flaw will have on Intel and the rest of the world.