Efficient Virtual Memory for Big Memory Servers
A. Basu, J. Gandhi, J. Chang, M. D. Hill, and M. M. Swift, ISCA 2013
Summary
Many “big-memory” server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for paged virtual memory (VM). TLB misses can account for up to 51% of execution time, yet most of these applications do not need the rich features of paged VM.
Proposal: paged VM + direct segments, an alternative to huge pages. Part of a process’s linear virtual address space is mapped with a direct segment, while the rest of the virtual address space is mapped with conventional paging.

Motivation
Many “big-memory” server workloads pay a high cost for paged VM: they suffer frequent TLB misses while not needing the rich features that paged VM provides.
Trend
- Physical memory capacity has grown from a few MB to a few GB, and now to a few TB.
- TLB sizes have remained fairly unchanged.
- Many “big-memory” workloads exhibit low access locality.
Higher memory capacity + constant TLB size + low-locality access patterns = more TLB misses.
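To make the mismatch concrete, the sketch below computes TLB reach (entries × page size) as a fraction of physical memory. The entry count of 64 and the memory sizes are illustrative assumptions, not figures from the paper.

```python
# TLB reach = number of TLB entries * page size.
# The entry count and memory sizes below are illustrative assumptions.

KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory the TLB can map at one time."""
    return entries * page_size


def coverage(entries: int, page_size: int, memory: int) -> float:
    """Fraction of physical memory covered by the TLB reach."""
    return tlb_reach(entries, page_size) / memory


if __name__ == "__main__":
    entries = 64  # hypothetical L1 data TLB size
    for label, memory in [("64 MB", 64 * MB), ("64 GB", 64 * GB), ("1 TB", TB)]:
        frac = coverage(entries, 4 * KB, memory)
        print(f"{label:>6}: TLB reach covers {frac:.6%} of memory")
```

With the entry count held fixed, the covered fraction shrinks in direct proportion to memory growth, which is the trend the list above describes.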
Observation of Big-memory Workloads
- For the majority of their address space, big-memory workloads do not require swapping, fragmentation mitigation, or fine-grained protection afforded by current virtual memory implementations. They allocate memory early and have stable memory usage.
- Big-memory workloads pay a cost for paged VM: substantial performance is lost to TLB misses.
- Many big-memory workloads are long running, sized to match memory capacity, and have one (or a few) primary processes.
Solution: Paged VM + Direct Segments
Goal: enable fast and minimalist address translation through segmentation where possible, while defaulting to conventional page-based virtual memory where needed.
Proposal: direct-segment hardware that is used via a software primary region.
Hardware Support: Direct Segment
Idea: Translate a contiguous virtual address range directly onto a contiguous physical address range, eliminating TLB misses for that range. Any virtual address outside that range is mapped through conventional paging.
Implementation: Segmentation. BASE, LIMIT, and OFFSET registers are added per core. Direct segments are aligned to the base page size, so page offset bits are omitted from these registers. A given virtual address for a process is translated either through the direct segment or through conventional page-based virtual memory, but never both.
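The translation rule can be sketched in software as follows. This is a minimal model, assuming the usual base/limit/offset reading (an address V with BASE ≤ V < LIMIT maps to V + OFFSET) and 4 KB base pages. The `DirectSegment` class, the `translate` function, and the dictionary standing in for the TLB and page tables are hypothetical names for illustration. For readability the sketch stores full byte addresses, whereas the hardware registers omit the page offset bits.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumption: 4 KB base pages


@dataclass(frozen=True)
class DirectSegment:
    base: int    # first virtual address inside the segment
    limit: int   # first virtual address past the segment
    offset: int  # physical start minus virtual start

    def contains(self, vaddr: int) -> bool:
        return self.base <= vaddr < self.limit


def translate(vaddr: int, seg: DirectSegment, page_table: dict) -> int:
    """Translate vaddr through the direct segment or paging, never both."""
    if seg.contains(vaddr):
        # Direct-segment path: one addition, no TLB entry needed.
        return vaddr + seg.offset
    # Paging path: page_table maps virtual page number -> physical frame number.
    vpn = vaddr >> PAGE_SHIFT
    page_offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    if vpn not in page_table:
        raise LookupError(f"page fault at {vaddr:#x}")
    return (page_table[vpn] << PAGE_SHIFT) | page_offset


if __name__ == "__main__":
    seg = DirectSegment(base=0x1000_0000, limit=0x5000_0000,
                        offset=0x2_0000_0000 - 0x1000_0000)
    page_table = {0x5: 0x9}  # hypothetical mapping for one 4 KB page
    print(hex(translate(0x1000_0040, seg, page_table)))  # direct segment
    print(hex(translate(0x5123, seg, page_table)))       # conventional paging
```

The range check and the page lookup are mutually exclusive in this model, mirroring the "never both" rule stated above.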
Software Support: Primary Region
- OS provides a primary region abstraction to let applications specify which portion of their memory does not benefit from paging.
- OS provisions physical memory for a primary region and maps all or part of the primary region through a direct segment by configuring the direct-segment registers.
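As a rough illustration of the second point, the sketch below derives register values for a primary region given one contiguous physical chunk. The function name `configure_direct_segment` and its parameters are hypothetical, and the policy (map a prefix of the region when the chunk is too small, leave the remainder to paging) is an assumption chosen to show the "all or part" case rather than the paper's exact mechanism.

```python
PAGE_SIZE = 4096  # assumption: 4 KB base pages


def configure_direct_segment(region_start: int, region_size: int,
                             phys_start: int, phys_size: int) -> tuple:
    """Return (base, limit, offset) covering as much of the primary region
    as the contiguous physical chunk allows.

    An empty range (base == limit) means no address matches the segment,
    so every access falls back to paging.
    """
    for value in (region_start, region_size, phys_start, phys_size):
        if value % PAGE_SIZE:
            raise ValueError("all arguments must be page aligned")
    mapped = min(region_size, phys_size)
    if mapped == 0:
        return (0, 0, 0)
    base = region_start
    limit = region_start + mapped
    # Offset may be negative in this model; hardware would use wraparound
    # arithmetic on fixed-width registers instead.
    offset = phys_start - region_start
    return (base, limit, offset)


if __name__ == "__main__":
    # Primary region of 8 pages, but only 4 contiguous physical pages available.
    regs = configure_direct_segment(0x4000_0000, 8 * PAGE_SIZE,
                                    0x1_0000_0000, 4 * PAGE_SIZE)
    print([hex(r) for r in regs])
```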
Two approaches to manage physical memory:
- Create contiguous physical memory dynamically through periodic memory compaction.
- Use physical memory reservations and set aside memory immediately after system startup.
Why Not Huge Pages?
- Large pages and their TLB support do not automatically scale to much larger memories. To support big-memory workloads, the size of large pages and/or the size of the TLB hierarchy must continue to scale as memory capacity increases.
- Efficient TLB support for multiple page sizes is difficult. Because the indexing address bits for large pages are unknown until the translation completes, a split-TLB design is typically required, in which separate sub-TLBs are used for different page sizes. This design can suffer from performance unpredictability when larger page sizes are used.
- Available large-page sizes are often few and far apart.
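The gap between page sizes can be illustrated with the three sizes x86-64 offers (4 KB, 2 MB, 1 GB). The sketch below counts how many pages of each size are needed to map a footprint and how much memory is left unused in the final page. The 1.5 GB footprint is a made-up example, not a workload from the paper.

```python
KB, MB, GB = 2**10, 2**20, 2**30

# Page sizes available on x86-64; each step is a 512x jump.
X86_64_PAGE_SIZES = {"4 KB": 4 * KB, "2 MB": 2 * MB, "1 GB": GB}


def pages_needed(footprint: int, page_size: int) -> int:
    """Number of pages required to map `footprint` bytes (ceiling division)."""
    return -(-footprint // page_size)


def wasted_bytes(footprint: int, page_size: int) -> int:
    """Bytes mapped but unused because the footprint is rounded up to whole pages."""
    return pages_needed(footprint, page_size) * page_size - footprint


if __name__ == "__main__":
    footprint = GB + 512 * MB  # hypothetical 1.5 GB footprint
    for label, size in X86_64_PAGE_SIZES.items():
        print(f"{label}: {pages_needed(footprint, size)} pages, "
              f"{wasted_bytes(footprint, size) // MB} MB unused")
```

With no size between 2 MB and 1 GB, a footprint either needs many translations or rounds up to a much larger mapping than it uses.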
Results from Experiments
For all workloads examined (graph500, memcached, MySQL, NPB:BT, NPB:CG, GUPS), the percentage of time spent on TLB miss handling was reduced to less than 0.5%.
Questions
- For long-running workloads, can we gradually optimize the virtual-to-physical mapping by placing the most important data in the direct segment?
- Can multiple processes use direct segments concurrently? If so, how should the direct segments be managed?