Reverse Mapping (rmap) in Linux Kernel
- 1. Reverse Mapping (rmap) in Linux Kernel
Adrian Huang | May, 2022
* Based on kernel 5.11 (x86_64) – QEMU
* SMP (4 CPUs) and 8GB memory
* Kernel parameter: nokaslr norandmaps
* Userspace: ASLR is disabled
* Legacy BIOS
- 2. Agenda
• Mapping & reverse mapping
• rmap: legacy approach vs new approach (performance improvement)
• Implementation Detail
- 3. Mapping & Reverse Mapping
[Figure: Mapping vs. reverse mapping. Mapping: Process 1 … Process N → Page Table 1 … Page Table N → page frame in physical memory. Reverse mapping: (1) a page frame is selected for reclaim; (2) RMAP locates the page tables (Page Table 1 … Page Table N) of the processes that map the frame and clears each pte.]
rmap – “clear pte”: check ptep_get_and_clear()
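The reclaim flow on this slide can be sketched as a small user-space model. This is only an illustration: `toy_page`, `toy_page_table`, `toy_map` and `toy_unmap_all` are hypothetical names, and a fixed array stands in for the kernel's real rmap structures. In the kernel the per-pte step is `ptep_get_and_clear()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PTES        8
#define TOY_MAX_MAPPERS 4

/* Toy page table: each slot holds a pfn, 0 means "not present". */
struct toy_page_table {
    uint64_t pte[TOY_PTES];
};

/* One reverse-mapping record: which table and which slot map the frame. */
struct toy_rmap_entry {
    struct toy_page_table *pt;
    size_t index;
};

/* Toy page frame that remembers every pte pointing at it. */
struct toy_page {
    uint64_t pfn;                                   /* must be non-zero */
    size_t nr_mappers;
    struct toy_rmap_entry mappers[TOY_MAX_MAPPERS];
};

/* Mapping: install the pte and record the reverse mapping. */
static void toy_map(struct toy_page *page, struct toy_page_table *pt,
                    size_t index)
{
    assert(index < TOY_PTES && page->nr_mappers < TOY_MAX_MAPPERS);
    pt->pte[index] = page->pfn;
    page->mappers[page->nr_mappers++] = (struct toy_rmap_entry){ pt, index };
}

/* Reverse mapping: clear every pte that still maps the frame.
 * Returns the number of ptes cleared. */
static size_t toy_unmap_all(struct toy_page *page)
{
    size_t cleared = 0;

    for (size_t i = 0; i < page->nr_mappers; i++) {
        struct toy_rmap_entry *e = &page->mappers[i];

        if (e->pt->pte[e->index] == page->pfn) {    /* pfn match? */
            e->pt->pte[e->index] = 0;               /* "clear pte" */
            cleared++;
        }
    }
    page->nr_mappers = 0;
    return cleared;
}
```

Mapping one `toy_page` into two page tables and then calling `toy_unmap_all()` clears both ptes without scanning any unrelated page table, which is the point of keeping reverse-mapping information.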
- 4. rmap: legacy approach vs new approach (performance improvement)
1. Legacy approach: 2.6.33 or earlier kernel
2. New approach: 2.6.34 or later kernel – High-level overview
- 5. anon_vma
rmap: 2.6.33 or earlier kernel
[Figure: Parent process: a single vma with its page table mapping Page #0, Page #1 … Page #999, linked to one anon_vma. After forking N children (parent process & child processes): the same anon_vma links vma: parent, vma: child #1 … vma: child #N, each with its own page table over Page #0 … Page #999. Some pages may be COWed.]
- 6. rmap: 2.6.33 or earlier kernel - Limitation
[Figure: Parent process & child processes: one anon_vma links vma: parent, vma: child #1 … vma: child #N, each with its own page table over Page #0, Page #1 … Page #999. Callout: Issue statement.]
- 7. 2.6.34 or later kernel – High-level overview
[Figure: Within a process, the vma is connected to its anon_vma through an anon_vma_chain; the anon_vma tracks Page #0, Page #1 … Page #999. Legend: pointer; doubly linked list; RB-tree: RB node.]
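The relationship in this overview can be modelled in a few lines of C. This is a simplified sketch, not the kernel definitions: the `toy_*` names are hypothetical, and plain singly linked lists stand in for both the doubly linked list and the RB-tree from the legend. In the kernel, `struct anon_vma_chain` carries `vma` and `anon_vma` pointers, a `same_vma` list node and an `rb` tree node.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vma;
struct toy_anon_vma;

/* Connector object: one per (vma, anon_vma) pair. */
struct toy_avc {
    struct toy_vma *vma;                 /* pointer to the VMA           */
    struct toy_anon_vma *anon_vma;       /* pointer to the anon_vma      */
    struct toy_avc *next_same_vma;       /* stand-in for the same_vma list */
    struct toy_avc *next_same_anon_vma;  /* stand-in for the rb node     */
};

struct toy_vma {
    struct toy_avc *chain_head;          /* all connectors of this VMA   */
};

struct toy_anon_vma {
    struct toy_avc *tree_head;           /* stand-in for rb_root         */
};

/* Link a connector into both the VMA's list and the anon_vma's "tree". */
static void toy_avc_link(struct toy_avc *avc, struct toy_vma *vma,
                         struct toy_anon_vma *av)
{
    assert(avc && vma && av);
    avc->vma = vma;
    avc->anon_vma = av;
    avc->next_same_vma = vma->chain_head;
    vma->chain_head = avc;
    avc->next_same_anon_vma = av->tree_head;
    av->tree_head = avc;
}

/* How many VMAs can be reached from this anon_vma? */
static size_t toy_vmas_of(const struct toy_anon_vma *av)
{
    size_t n = 0;

    for (const struct toy_avc *a = av->tree_head; a; a = a->next_same_anon_vma)
        n++;
    return n;
}

/* How many anon_vmas is this VMA attached to? */
static size_t toy_anon_vmas_of(const struct toy_vma *vma)
{
    size_t n = 0;

    for (const struct toy_avc *a = vma->chain_head; a; a = a->next_same_vma)
        n++;
    return n;
}
```

For a single process, one connector links one vma to one anon_vma, so both counts are 1. Because the connector is a separate object, a vma can be attached to more than one anon_vma and an anon_vma can reach more than one vma.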
- 8. 2.6.34 or later kernel – parent/child processes interconnection
[Figure: The parent process and each child process (#1, #2 … #N) has its own anon_vma, anon_vma_chain and vma. Additional anon_vma_chain nodes interconnect the parent's and the children's structures. The parent's anon_vma tracks Page #0, Page #1 … Page #999; each child's anon_vma tracks that child's COW pages (for example COW Page #0, COW Page #1, COW Page #999), while the remaining pages stay shared. Legend: pointer; doubly linked list; RB-tree: RB node; RB-tree: RB node (possible linked node); shared page; COW page.]
- 9. 2.6.34 or later kernel: example 1
[Figure: Same parent/child diagram and legend as slide 8. (1) A page is selected for reclaim. (2) Traverse path: check whether the pfn in each child's pte equals the pfn of the page being reclaimed ("pfn match?").]
- 10. 2.6.34 or later kernel: example 1 – more detail
[Figure: Same parent/child diagram and legend as slide 8. (1) A page is selected for reclaim. (2), (3), (4) Traverse path: at each step, check whether the pfn in the child's pte equals the pfn of the page being reclaimed ("pfn match?").]
- 11. 2.6.34 or later kernel: example 2
[Figure: Same parent/child diagram and legend as slide 8. (1) A page is selected for reclaim. Traverse path: check whether the pfn in each child's pte equals the pfn of the page being reclaimed.]
- 12. 2.6.34 or later kernel: example 2
[Figure: Same parent/child diagram and legend as slide 8. (1) A page is selected for reclaim. Traverse path: check whether the pfn in each child's pte equals the pfn of the page being reclaimed. The walk does not need to traverse all anon_vma_chain(s).]
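The difference between the two examples can be made concrete with a toy count of visited VMAs. The sketch assumes that example 1 reclaims a page belonging to the parent's anon_vma (reachable from the parent and every child) and that example 2 reclaims a COW page belonging to one child's anon_vma; the slides do not state this explicitly. All `toy_*` names are hypothetical, and an array replaces the RB-tree.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_VMAS 16

/* Toy anon_vma: the ids of every VMA reachable from it. */
struct toy_anon_vma {
    unsigned vma_ids[TOY_MAX_VMAS];
    size_t nr_vmas;
};

/* Toy anonymous page: the anon_vma it belongs to, plus a bitmask of
 * the VMA ids whose pte currently maps it. */
struct toy_page {
    const struct toy_anon_vma *anon_vma;
    unsigned mapped_mask;
};

static void toy_attach(struct toy_anon_vma *av, unsigned vma_id)
{
    assert(av->nr_vmas < TOY_MAX_VMAS && vma_id < 32);
    av->vma_ids[av->nr_vmas++] = vma_id;
}

/* Visit every VMA of the page's anon_vma and count pfn matches.
 * Returns the number of VMAs visited. */
static size_t toy_rmap_walk(const struct toy_page *page, size_t *matches)
{
    const struct toy_anon_vma *av = page->anon_vma;
    size_t visited = 0;

    *matches = 0;
    for (size_t i = 0; i < av->nr_vmas; i++) {
        visited++;
        if (page->mapped_mask & (1u << av->vma_ids[i]))  /* pfn match? */
            (*matches)++;
    }
    return visited;
}
```

With a parent (id 0) and three children (ids 1 to 3) attached to the parent's anon_vma, a walk for a shared page visits four VMAs. A walk for a page in child #1's own anon_vma visits one VMA. Under the pre-2.6.34 layout, where one anon_vma links the parent and all children, both walks would visit all four.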
- 20. fork(): COW → write fault (write-protected fault)
do_wp_page
    wp_page_copy: [MAP_PRIVATE] COW: Copy On Write
        new_page = alloc_page_vma(…)
        cow_user_page
            copy_user_highpage
        maybe_mkwrite
        page_add_new_anon_rmap
    wp_page_shared: [MAP_SHARED] vma is (VM_WRITE|VM_SHARED)
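The [MAP_PRIVATE] path can be observed from user space. The sketch below is an illustration rather than part of the original slides: the hypothetical helper `toy_cow_demo()` maps one private anonymous page, forks, and lets the child write to it. The child's write hits a write-protected pte, so the kernel gives the child its own copy and the parent's page keeps its old value.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 on success and reports what each process saw in the page. */
static int toy_cow_demo(int *parent_sees, int *child_saw)
{
    int status = 0;
    pid_t pid;
    int *p = mmap(NULL, sizeof(*p), PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;

    *p = 7;                     /* parent faults the page in before fork() */

    pid = fork();
    if (pid < 0) {
        munmap(p, sizeof(*p));
        return -1;
    }
    if (pid == 0) {
        *p = 99;                /* write fault: child gets a private copy */
        _exit(*p == 99 ? 99 : 1);
    }

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        munmap(p, sizeof(*p));
        return -1;
    }

    *child_saw = WEXITSTATUS(status);   /* value the child read back */
    *parent_sees = *p;                  /* parent's page is untouched */
    munmap(p, sizeof(*p));
    return 0;
}
```

After a successful call, `*child_saw` is 99 and `*parent_sees` is still 7. With `MAP_SHARED` instead of `MAP_PRIVATE`, the parent would observe the child's write, which corresponds to the `wp_page_shared` branch.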
- 23. rb_root & rb – When/who to use?
Interval tree traversal (implemented via red-black tree) for reverse mapping
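The query that the interval tree answers can be sketched without the tree itself. Each vma covers a range of page offsets, and the rmap walk asks which vmas overlap the page's offset. The `toy_*` helpers below are hypothetical and use a direct comparison; the interval tree exists so that the kernel can find the overlapping vmas without testing every one. `TOY_PAGE_SHIFT` assumes 4 KiB pages.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

struct toy_vma {
    unsigned long vm_start;  /* first virtual address                 */
    unsigned long vm_end;    /* first address past the end            */
    unsigned long vm_pgoff;  /* page offset that vm_start maps to     */
};

static unsigned long toy_vma_pages(const struct toy_vma *vma)
{
    return (vma->vm_end - vma->vm_start) >> TOY_PAGE_SHIFT;
}

/* Last page offset covered by the vma (inclusive). */
static unsigned long toy_vma_last_pgoff(const struct toy_vma *vma)
{
    return vma->vm_pgoff + toy_vma_pages(vma) - 1;
}

/* Does the vma's interval overlap the query interval [first, last]? */
static bool toy_vma_overlaps(const struct toy_vma *vma,
                             unsigned long first, unsigned long last)
{
    return vma->vm_pgoff <= last && first <= toy_vma_last_pgoff(vma);
}

/* Virtual address at which the vma maps the given page offset. */
static unsigned long toy_vma_address(const struct toy_vma *vma,
                                     unsigned long pgoff)
{
    assert(toy_vma_overlaps(vma, pgoff, pgoff));
    return vma->vm_start + ((pgoff - vma->vm_pgoff) << TOY_PAGE_SHIFT);
}
```

Once an overlapping vma is found, `toy_vma_address()` gives the virtual address whose pte has to be checked in that vma's page table.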
- 25. rmap: page reclaiming – try_to_unmap()
try_to_unmap
    rmap_walk
        rmap_walk_ksm
        rmap_walk_anon
            anon_vma_interval_tree_foreach(…, &anon_vma->rb_root, …)
                invalid_migration_vma
                try_to_unmap_one
                page_mapcount_is_zero
        rmap_walk_file
            vma_interval_tree_foreach(…, &mapping->i_mmap, …)
                try_to_unmap_one
                page_mapcount_is_zero
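The walk is driven by callbacks: one to skip unsuitable vmas, one to unmap the page from a vma, and one to stop early once the page has no mappings left. The following user-space sketch mirrors that control flow with hypothetical `toy_*` types; an array replaces the interval tree, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page { int mapcount; };
struct toy_vma  { bool maps_page; bool skip; };

/* Callback bundle, loosely modelled on the kernel's rmap_walk_control. */
struct toy_walk_control {
    void *arg;
    bool (*rmap_one)(struct toy_page *page, struct toy_vma *vma, void *arg);
    bool (*done)(struct toy_page *page);
    bool (*invalid_vma)(struct toy_vma *vma, void *arg);
};

static void toy_rmap_walk(struct toy_page *page, struct toy_vma *vmas,
                          size_t n, const struct toy_walk_control *rwc)
{
    for (size_t i = 0; i < n; i++) {
        struct toy_vma *vma = &vmas[i];

        if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
            continue;                       /* skip this vma          */
        if (!rwc->rmap_one(page, vma, rwc->arg))
            break;                          /* callback asked to stop */
        if (rwc->done && rwc->done(page))
            break;                          /* nothing left to unmap  */
    }
}

/* Stand-in for try_to_unmap_one: drop the mapping if this vma has one. */
static bool toy_unmap_one(struct toy_page *page, struct toy_vma *vma, void *arg)
{
    size_t *visited = arg;

    (*visited)++;
    if (vma->maps_page) {
        vma->maps_page = false;
        page->mapcount--;
    }
    return true;
}

/* Stand-in for page_mapcount_is_zero. */
static bool toy_mapcount_is_zero(struct toy_page *page)
{
    return page->mapcount == 0;
}

/* Stand-in for invalid_migration_vma. */
static bool toy_invalid_vma(struct toy_vma *vma, void *arg)
{
    (void)arg;
    return vma->skip;
}

/* Stand-in for try_to_unmap: true if the page ends up fully unmapped. */
static bool toy_try_to_unmap(struct toy_page *page, struct toy_vma *vmas,
                             size_t n, size_t *visited)
{
    struct toy_walk_control rwc = {
        .arg = visited,
        .rmap_one = toy_unmap_one,
        .done = toy_mapcount_is_zero,
        .invalid_vma = toy_invalid_vma,
    };

    *visited = 0;
    toy_rmap_walk(page, vmas, n, &rwc);
    return page->mapcount == 0;
}
```

The `done` callback is what lets the walk stop as soon as the last mapping is gone instead of visiting every remaining vma.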