Based on behavior observed on Android and Chrome OS, Google engineers have started work on a new page reclaim strategy for the Linux kernel, aimed at improving how the virtual memory subsystem reclaims unused memory. More recent work shows that the new policy, MGLRU, can also benefit server environments.
Google’s research on how the Linux kernel handles memory overcommit comes from analyzing both servers with hundreds of gigabytes of memory and personal and mobile devices. In both cases, a Google engineer concluded that:
The current page reclaim is too expensive in terms of CPU usage and often makes poor choices about what to evict. We would like to offer an alternative framework that is performant, versatile and straightforward.
Two principles of the Linux kernel's current LRU-like page replacement implementation have come under scrutiny: sorting pages into active and inactive lists, and scanning those lists incrementally to find candidates for eviction. According to Google engineers, both lead to a number of inefficiencies.
In particular, incremental scans using rmap (the kernel's reverse mapping from a physical page to the page-table entries that map it) cause high CPU usage and poor performance under memory pressure, because many pages have to be scanned to find enough pages to reclaim. Meanwhile, the active/inactive classification did not prove useful for job scheduling in server environments, and on Android and Chrome OS it led to poor eviction choices that hurt user interface rendering.
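To make the cost argument concrete, here is a minimal Python sketch of incremental two-list scanning. It is a toy model, not the kernel's code, and it rests on simplifying assumptions: each page carries a hypothetical `mappings` count standing for the page-table entries an rmap walk must visit, reclaim scans only the inactive list from its tail, and the active list is never aged back. It simply counts how much rmap work is needed to find eviction candidates when most scanned pages turn out to be referenced.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Page:
    pfn: int                  # page frame number
    mappings: int = 1         # hypothetical: PTEs an rmap walk must visit for this page
    referenced: bool = False  # accessed bit observed during the rmap walk


def reclaim_two_list(inactive: deque, active: deque, nr_to_reclaim: int):
    """Incrementally scan the inactive list from its tail (right end).

    Referenced pages are promoted to the head of the active list; unreferenced
    pages are evicted. Returns the evicted pfns and the number of mappings
    visited through rmap, used here as a rough proxy for CPU cost.
    """
    evicted = []
    rmap_visits = 0
    while inactive and len(evicted) < nr_to_reclaim:
        page = inactive.pop()           # oldest page sits at the tail
        rmap_visits += page.mappings    # rmap: check every mapping of this page
        if page.referenced:
            page.referenced = False     # clear the bit and promote the page
            active.appendleft(page)
        else:
            evicted.append(page.pfn)
    return evicted, rmap_visits


# 100 pages with 3 mappings each; only every tenth page is cold.
inactive = deque(Page(pfn, mappings=3, referenced=(pfn % 10 != 0)) for pfn in range(100))
active = deque()
evicted, cost = reclaim_two_list(inactive, active, nr_to_reclaim=10)
print(evicted)  # [90, 80, 70, 60, 50, 40, 30, 20, 10, 0]
print(cost)     # 300
```

In this toy scenario, evicting just 10 pages requires walking all 100 pages and 300 mappings, which illustrates why heavily referenced workloads make incremental rmap scanning expensive.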
The new policy, MGLRU, instead relies on generation numbers to go beyond the active/inactive distinction, and replaces incremental scans via rmap with differential scans via page tables. In practice, pages are grouped into generations, each generation made up of the pages referenced since the previous one was created. Generations are discovered using differential scans. The oldest generations become eligible for eviction, and their pages are eventually evicted, taking into account whether each page has been used since the last scan.
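The generation idea can be illustrated with a simplified Python model. It assumes a flat, hypothetical `accessed` table standing in for page-table accessed bits, an `age()` step that opens a new generation and moves every page referenced since the previous scan into it, and an `evict()` step that takes pages from the oldest generation first while giving a second chance to pages used since the last scan. The names `MultiGenLRU`, `age` and `evict` are illustrative only and do not correspond to the kernel's functions or data structures.

```python
class MultiGenLRU:
    """Toy multi-generational LRU; not the kernel's implementation."""

    def __init__(self, nr_pages: int):
        self.max_seq = 0                                    # youngest generation number
        self.gen_of = {pfn: 0 for pfn in range(nr_pages)}  # pfn -> generation
        # Hypothetical stand-in for page-table accessed bits: pfn -> bit.
        self.accessed = {pfn: False for pfn in range(nr_pages)}

    def touch(self, pfn: int) -> None:
        """Simulate the hardware setting the accessed bit on use."""
        self.accessed[pfn] = True

    def age(self) -> int:
        """Open a new generation holding every page referenced since the last scan."""
        self.max_seq += 1
        found = 0
        # Simplification: this toy walks every entry, whereas the kernel walks
        # page tables and skips regions it can tell were not accessed.
        for pfn, bit in list(self.accessed.items()):
            if bit:
                self.accessed[pfn] = False
                self.gen_of[pfn] = self.max_seq
                found += 1
        return found

    def evict(self, nr: int) -> list:
        """Evict up to nr pages, oldest generation first."""
        victims = []
        # Oldest generation first; pfn breaks ties for deterministic output.
        for pfn in sorted(self.gen_of, key=lambda p: (self.gen_of[p], p)):
            if len(victims) == nr:
                break
            if self.accessed[pfn]:
                # Used since the last scan: move it to the youngest generation.
                self.accessed[pfn] = False
                self.gen_of[pfn] = self.max_seq
                continue
            victims.append(pfn)
        for pfn in victims:
            del self.gen_of[pfn]
            del self.accessed[pfn]
        return victims


lru = MultiGenLRU(nr_pages=8)
for pfn in (0, 1, 2):
    lru.touch(pfn)
found = lru.age()        # pages 0-2 move to generation 1
lru.touch(3)             # page 3 is used after the scan
victims = lru.evict(3)
print(found, victims)    # 3 [4, 5, 6]
```

Page 3 escapes eviction because its accessed bit was set after the last scan, while pages 4 to 6, untouched in the oldest generation, are reclaimed first.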
The cost of each differential scan is roughly proportional to the number of referenced pages it discovers. Unless address spaces are extremely sparse, page tables generally have better memory locality than rmap.
According to Google’s original benchmarks, based on rolling out MGLRU to tens of millions of Chrome OS users and approximately one million Android users, the new policy resulted in 59% fewer OOM kills on Chrome OS and 18% fewer on Android, as well as improvements in other UX metrics.
Since the initial patch was submitted in March 2021, Google engineers have continued to work on MGLRU to improve its performance and extend it to additional architectures. The latest version, submitted in the early days of 2022, includes benchmark results for popular memory-intensive open-source applications such as Apache Hadoop, Memcached, MongoDB and PostgreSQL, among others.
An independent lab evaluated MGLRU using the most widely used benchmark suites for these applications. It released 960 data points along with kernel metrics and performance profiles collected over more than 500 hours of total benchmark time. Its final reports show that, with 95% confidence intervals (CIs), all of the applications above performed significantly better for at least part of their benchmark matrices.
Linus Torvalds endorsed the Google engineers' work on MGLRU, observing:
So I personally think it’s worth tracking, in part simply because of the reported improvements that have been measured. But also to a large extent because the whole notion of doing multi-generational LRU isn’t exactly a crazy wackadoodle thing. We’re already doing active vs inactive, the whole multi-generational thing just doesn’t seem to be that “far away”.
Although the outlook is positive, it is not yet clear whether MGLRU will be merged in Linux 5.17 or a later release. InfoQ will continue to report on the progress of this new Linux feature as more details become available.