Reverse Mapping (rmap) in Linux Kernel
Adrian Huang | May, 2022
* Based on kernel 5.11 (x86_64) – QEMU
* SMP (4 CPUs) and 8GB memory
* Kernel parameter: nokaslr norandmaps
* Userspace: ASLR is disabled
* Legacy BIOS
Agenda
• Mapping & reverse mapping
• rmap: legacy approach vs new approach (performance improvement)
• Implementation Detail
Mapping & Reverse Mapping
[Figure: two panels. Mapping: Process 1 … Process N → Page Table 1 … Page Table N → page frame in physical memory. Reverse mapping (RMAP): starting from a page frame that is being reclaimed, find Page Table 1 … Page Table N that map it and clear the pte in each (steps 1 and 2).]
rmap – “clear pte”: check ptep_get_and_clear()
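The two directions can be modelled in a few lines of user-space C. This is only a toy sketch, not kernel code: every `toy_*` name is hypothetical, the "reverse map" is a plain linked list, and `toy_get_and_clear()` only imitates the read-then-clear behaviour that ptep_get_and_clear() provides for a real pte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PTES_PER_TABLE 8
#define TOY_PTE_PRESENT    0x1u
#define TOY_PFN_SHIFT      12

typedef uint64_t toy_pte_t;

/* Forward mapping: one small page table per process. */
struct toy_page_table {
    toy_pte_t pte[TOY_PTES_PER_TABLE];
};

/* Reverse mapping: one record per pte that maps the page frame. */
struct toy_rmap_entry {
    struct toy_page_table *pt;
    size_t index;
    struct toy_rmap_entry *next;
};

struct toy_page {
    uint64_t pfn;
    struct toy_rmap_entry *rmap;   /* all ptes currently mapping this frame */
};

static toy_pte_t toy_mk_pte(uint64_t pfn)
{
    return (pfn << TOY_PFN_SHIFT) | TOY_PTE_PRESENT;
}

/* Map `page` at slot `index` of `pt` and record the reverse mapping. */
static void toy_map(struct toy_page *page, struct toy_page_table *pt,
                    size_t index, struct toy_rmap_entry *entry)
{
    assert(index < TOY_PTES_PER_TABLE);
    pt->pte[index] = toy_mk_pte(page->pfn);
    entry->pt = pt;
    entry->index = index;
    entry->next = page->rmap;
    page->rmap = entry;
}

/* Return the old pte and leave the slot empty. */
static toy_pte_t toy_get_and_clear(toy_pte_t *ptep)
{
    toy_pte_t old = *ptep;
    *ptep = 0;
    return old;
}

/* Reclaim path: walk the reverse map and clear every pte of the page. */
static size_t toy_unmap_all(struct toy_page *page)
{
    size_t cleared = 0;

    for (struct toy_rmap_entry *e = page->rmap; e; e = e->next) {
        toy_pte_t old = toy_get_and_clear(&e->pt->pte[e->index]);
        if (old & TOY_PTE_PRESENT)
            cleared++;
    }
    page->rmap = NULL;
    return cleared;
}
```

Mapping the same `toy_page` into two page tables and then calling `toy_unmap_all()` clears both entries, which is the "clear pte" step of the reverse-mapping panel.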
rmap: legacy approach vs new approach (performance improvement)
1. Legacy approach: 2.6.33 or earlier kernel
2. New approach: 2.6.34 or later kernel – High-level overview
rmap: 2.6.33 or earlier kernel
[Figure: two panels. Parent process: one anon_vma links the vma, whose page table maps Page #0, Page #1 … Page #999. After forking N children (parent process & child processes): the same anon_vma links vma: parent, vma: child #1 … vma: child #N, each with its own page table over Page #0 … Page #999. Some pages may be COWed.]
rmap: 2.6.33 or earlier kernel - Limitation
[Figure: parent process & child processes. One shared anon_vma links vma: parent, vma: child #1 … vma: child #N; each vma has its own page table mapping Page #0, Page #1 … Page #999. The figure carries an "Issue statement" callout.]
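A commonly cited limitation of this layout, and presumably what the slide is pointing at, is that every vma of the parent and of all children hangs off one anon_vma, so a reverse-map lookup has to check all of them even for a page that only one child maps. The sketch below is a user-space toy under that assumption; the `toy_*` names are hypothetical and a plain list stands in for the kernel's list of vmas.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_PAGES 4

struct toy_vma {
    int owner;                  /* 0 = parent, 1..N = children */
    long pfn[TOY_NR_PAGES];     /* pfn mapped at each page slot */
    struct toy_vma *next;       /* link in the shared anon_vma list */
};

/* Legacy layout: one anon_vma whose list holds every related vma. */
struct toy_legacy_anon_vma {
    struct toy_vma *head;
};

static void toy_legacy_link(struct toy_legacy_anon_vma *av,
                            struct toy_vma *vma)
{
    vma->next = av->head;
    av->head = vma;
}

/*
 * Count the vmas that map `pfn` at page slot `slot`.
 * Every vma on the shared list is examined; *visited reports how many.
 */
static size_t toy_legacy_rmap(const struct toy_legacy_anon_vma *av,
                              size_t slot, long pfn, size_t *visited)
{
    size_t mappers = 0;

    assert(slot < TOY_NR_PAGES);
    *visited = 0;
    for (const struct toy_vma *v = av->head; v; v = v->next) {
        (*visited)++;
        if (v->pfn[slot] == pfn)
            mappers++;
    }
    return mappers;
}
```

With a parent and three children where only one child has COWed a page, looking up that COW page still visits all four vmas to find a single mapper.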
2.6.34 or later kernel – High-level overview
[Figure: Process. The anon_vma for Page #0, Page #1 … Page #999 is connected to the vma through an anon_vma_chain. Legend: pointer; doubly linked list; RB-tree: RB node.]
2.6.34 or later kernel – parent/child processes interconnection
[Figure: Parent process: anon_vma (Page #0, Page #1 … Page #999), anon_vma_chain, vma. Child process #1, #2 … #N: each child has its own anon_vma, anon_vma_chain and vma, plus a further anon_vma_chain that connects the child's vma to the parent's anon_vma. Each child maps a mix of shared pages and COW pages (for example COW Page #1 … COW Page #999 in one child, COW Page #0 in another). Legend: pointer; doubly linked list; RB-tree: RB node; RB-tree: RB node (possible linked node); shared page; COW page.]
2.6.34 or later kernel: example 1
[Figure: same parent/child layout and legend as the parent/child interconnection slide. Step 1: a page is reclaimed. Step 2: traverse path, checking whether the pfn in a child's pte equals the pfn of the page being reclaimed ("pfn match?").]
2.6.34 or later kernel: example 1 – more detail
[Figure: same parent/child layout and legend as the parent/child interconnection slide. Step 1: a page is reclaimed. Steps 2 to 4: the traverse path visits the children's vmas one after another and checks at each whether the pfn in the child's pte equals the pfn of the page being reclaimed ("pfn match?").]
2.6.34 or later kernel: example 2
[Figure: same parent/child layout and legend as the parent/child interconnection slide. Step 1: a page is reclaimed. Traverse path: check whether the pfn in the children's ptes equals the pfn of the page being reclaimed.]
2.6.34 or later kernel: example 2
[Figure: same parent/child layout and legend as the parent/child interconnection slide. Step 1: a page is reclaimed. Traverse path: check whether the pfn in the children's ptes equals the pfn of the page being reclaimed.]
Do not need to traverse all anon_vma_chain(s)
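Why not every anon_vma_chain has to be traversed can be seen in a small user-space model. It is a sketch under simplifying assumptions: the `toy_*` names are hypothetical, a singly linked list replaces the kernel's interval tree, and each page simply remembers the anon_vma it was first mapped under.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vma {
    int owner;                      /* 0 = parent, 1..N = children */
};

/* One link between a vma and an anon_vma. */
struct toy_avc {
    struct toy_vma *vma;
    struct toy_avc *next;           /* next link on the same anon_vma */
};

struct toy_anon_vma {
    struct toy_avc *chains;         /* every vma that may map our pages */
};

struct toy_page {
    struct toy_anon_vma *anon_vma;  /* set when the page is first mapped */
};

/* Attach `vma` to `av` through the caller-provided link `avc`. */
static void toy_link(struct toy_anon_vma *av, struct toy_vma *vma,
                     struct toy_avc *avc)
{
    assert(av && vma && avc);
    avc->vma = vma;
    avc->next = av->chains;
    av->chains = avc;
}

/* Number of vmas an rmap walk for `page` has to examine. */
static size_t toy_rmap_candidates(const struct toy_page *page)
{
    size_t n = 0;

    for (const struct toy_avc *c = page->anon_vma->chains; c; c = c->next)
        n++;
    return n;
}
```

If each child's vma is linked both to the parent's anon_vma and to the child's own anon_vma, a page shared since before fork() has the parent plus all children as candidates, while a page COWed by one child has only that child's vma.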
Implementation Detail
1. How/when to construct/link anon_vma, anon_vma_chain and vm_area_struct
A. Let’s start from fork()
2. COW - Detail
anon_vma, anon_vma_chain & vm_area_struct - Detail
• vm_area_struct: vm_mm, vm_ops, vm_file, anon_vma_chain, anon_vma
• anon_vma_chain: vma, anon_vma, same_vma, struct rb_node rb
• anon_vma: struct anon_vma *root, struct anon_vma *parent, struct rb_root_cached rb_root (interval tree implemented via red-black tree), unsigned degree = 2
• RMAP: page frame (physical memory) → page → mapping → address_space (page cache: i_pages (xarray), i_mmap) or anon_vma (anonymous page)
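The fields named on this slide can be written down as C declarations. The block below is a reduced sketch that compiles on its own: `struct list_head`, `struct rb_node` and `struct rb_root_cached` are minimal stand-ins rather than the kernel definitions, locking and reference counts are left out, and `toy_anon_vma_init()` and `toy_attach()` are hypothetical helpers. Starting `degree` at 1 and bumping it when the first vma is attached is one way to arrive at the degree = 2 shown for a root anon_vma.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins so the sketch compiles outside the kernel. */
struct list_head { struct list_head *next, *prev; };
struct rb_node { struct rb_node *left, *right; };
struct rb_root_cached { struct rb_node *rb_node, *rb_leftmost; };

struct anon_vma;

struct vm_area_struct {
    unsigned long vm_start, vm_end;
    struct list_head anon_vma_chain;   /* head of this vma's link list */
    struct anon_vma *anon_vma;         /* anon_vma used for new pages */
};

struct anon_vma_chain {
    struct vm_area_struct *vma;
    struct anon_vma *anon_vma;
    struct list_head same_vma;         /* entry on vma->anon_vma_chain */
    struct rb_node rb;                 /* entry in anon_vma->rb_root */
};

struct anon_vma {
    struct anon_vma *root;
    struct anon_vma *parent;
    unsigned degree;
    struct rb_root_cached rb_root;     /* interval tree of links */
};

/* A fresh root anon_vma is its own root and its own parent. */
static void toy_anon_vma_init(struct anon_vma *av)
{
    av->root = av;
    av->parent = av;
    av->degree = 1;
    av->rb_root.rb_node = NULL;
    av->rb_root.rb_leftmost = NULL;
}

/* Point the three objects at each other; list/tree insertion is omitted. */
static void toy_attach(struct vm_area_struct *vma,
                       struct anon_vma_chain *avc, struct anon_vma *av)
{
    assert(vma && avc && av);
    avc->vma = vma;
    avc->anon_vma = av;
    vma->anon_vma = av;
    av->degree++;
}
```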
fork() – anon_vma_clone()
fork() – anon_vma_fork()
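The linking done at fork() time can be approximated in user space. This is a simplified sketch, not the kernel functions: the `toy_*` names are hypothetical, a counter replaces the interval tree, locking and reference counting are omitted, teardown is left out, and the kernel's optimisation of reusing an existing anon_vma is not modelled.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_anon_vma {
    struct toy_anon_vma *root, *parent;
    unsigned degree;
    unsigned nr_chains;          /* stands in for the rb_root interval tree */
};

struct toy_avc {
    struct toy_anon_vma *anon_vma;
    struct toy_avc *next;        /* stands in for the same_vma list */
};

struct toy_vma {
    struct toy_anon_vma *anon_vma;
    struct toy_avc *chain;
};

/* Allocate a link and attach `vma` to `av`. Returns 0 on success. */
static int toy_chain_link(struct toy_vma *vma, struct toy_anon_vma *av)
{
    struct toy_avc *avc = malloc(sizeof(*avc));

    if (!avc)
        return -1;
    avc->anon_vma = av;
    avc->next = vma->chain;
    vma->chain = avc;
    av->nr_chains++;
    return 0;
}

/* Clone step: attach dst to every anon_vma that src is attached to. */
static int toy_anon_vma_clone(struct toy_vma *dst, const struct toy_vma *src)
{
    for (const struct toy_avc *c = src->chain; c; c = c->next)
        if (toy_chain_link(dst, c->anon_vma))
            return -1;
    return 0;
}

/* Fork step: inherit the parent's links, then add the child's own anon_vma. */
static int toy_anon_vma_fork(struct toy_vma *vma, const struct toy_vma *pvma)
{
    struct toy_anon_vma *av;

    if (!pvma->anon_vma)
        return 0;                /* parent has no anonymous pages yet */
    vma->anon_vma = NULL;
    if (toy_anon_vma_clone(vma, pvma))
        return -1;

    av = calloc(1, sizeof(*av));
    if (!av)
        return -1;
    av->root = pvma->anon_vma->root;
    av->parent = pvma->anon_vma;
    av->degree = 1;
    if (toy_chain_link(vma, av)) {
        free(av);
        return -1;
    }
    vma->anon_vma = av;
    av->parent->degree++;
    return 0;
}
```

After a fork in this model the child's vma sits on two anon_vmas: the parent's, so pages still shared with the parent can be found, and its own, which will own the pages the child COWs later.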
fork(): COW → write fault (write-protected fault)
do_wp_page
  wp_page_copy: [MAP_PRIVATE] COW: Copy On Write
    new_page = alloc_page_vma(…)
    cow_user_page
      copy_user_highpage
    maybe_mkwrite
    page_add_new_anon_rmap
  wp_page_shared: [MAP_SHARED] vma is (VM_WRITE|VM_SHARED)
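The copy path of this call graph can be imitated in user space. The block is a toy sketch: the `toy_*` names are hypothetical, `malloc()` and `memcpy()` stand in for page allocation and page copy, a plain counter stands in for the rmap bookkeeping, and the case where the kernel reuses the existing page instead of copying is not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
    int mapcount;                    /* number of ptes mapping this page */
};

struct toy_pte {
    struct toy_page *page;
    int writable;
};

/* Private mapping: give the faulting pte its own writable copy. */
static int toy_wp_page_copy(struct toy_pte *pte)
{
    struct toy_page *old_page = pte->page;
    struct toy_page *new_page = malloc(sizeof(*new_page)); /* alloc_page_vma */

    if (!new_page)
        return -1;
    memcpy(new_page->data, old_page->data, TOY_PAGE_SIZE); /* cow_user_page */
    new_page->mapcount = 1;          /* role of page_add_new_anon_rmap */
    pte->page = new_page;
    pte->writable = 1;               /* role of maybe_mkwrite */
    old_page->mapcount--;            /* old page loses this mapping */
    return 0;
}

/* Write fault on a write-protected pte. Returns 0 on success. */
static int toy_do_wp_page(struct toy_pte *pte, int vma_is_shared)
{
    if (vma_is_shared) {             /* wp_page_shared: no copy needed */
        pte->writable = 1;
        return 0;
    }
    return toy_wp_page_copy(pte);    /* COW */
}
```

When parent and child both map one page read-only and the child takes a write fault, the child ends up with a private writable copy while the parent's pte still points at the original page.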
fork(): COW - page_add_new_anon_rmap()
Child Process: COW
Write fault
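What the new COW page records about its owner can be sketched as follows. In the kernel an anonymous page stores its anon_vma in page->mapping with the low bit (PAGE_MAPPING_ANON) set, and its page offset in page->index; the block below mimics that in user space with hypothetical `toy_*` types, assuming a 4 KiB page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT        12
#define TOY_PAGE_MAPPING_ANON ((uintptr_t)0x1)

struct toy_anon_vma {
    int id;
};

struct toy_vma {
    unsigned long vm_start;
    unsigned long vm_pgoff;          /* page offset of vm_start */
    struct toy_anon_vma *anon_vma;
};

struct toy_page {
    uintptr_t mapping;               /* tagged pointer to the anon_vma */
    unsigned long index;             /* page offset within the mapping */
};

/* Record which anon_vma owns the page and at which offset it is mapped. */
static void toy_page_set_anon_rmap(struct toy_page *page,
                                   const struct toy_vma *vma,
                                   unsigned long address)
{
    assert(address >= vma->vm_start);
    page->mapping = (uintptr_t)vma->anon_vma | TOY_PAGE_MAPPING_ANON;
    page->index = ((address - vma->vm_start) >> TOY_PAGE_SHIFT)
                  + vma->vm_pgoff;
}

static int toy_page_is_anon(const struct toy_page *page)
{
    return (page->mapping & TOY_PAGE_MAPPING_ANON) != 0;
}

/* Strip the tag bit to get the anon_vma back, or NULL for non-anon pages. */
static struct toy_anon_vma *toy_page_anon_vma(const struct toy_page *page)
{
    if (!toy_page_is_anon(page))
        return NULL;
    return (struct toy_anon_vma *)(page->mapping & ~TOY_PAGE_MAPPING_ANON);
}
```

The tag works because the anon_vma pointer is at least 2-byte aligned, so its low bit is free to mark the page as anonymous.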
rb_root & rb – When/who to use?
Interval tree traversal (implemented via red-black tree) for reverse mapping
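The query that the interval tree answers is "which vmas cover this page offset?", after which the virtual address inside each vma is derived from the offset. The sketch below shows that arithmetic with a linear scan in place of the tree; the `toy_*` names are hypothetical and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12

struct toy_vma {
    unsigned long vm_start;   /* first virtual address */
    unsigned long vm_end;     /* one past the last virtual address */
    unsigned long vm_pgoff;   /* page offset that vm_start corresponds to */
};

/* Last page offset covered by the vma (inclusive). */
static unsigned long toy_vma_last_pgoff(const struct toy_vma *vma)
{
    assert(vma->vm_end > vma->vm_start);
    return vma->vm_pgoff
           + ((vma->vm_end - vma->vm_start) >> TOY_PAGE_SHIFT) - 1;
}

/* Interval overlap test: does the vma cover any offset in [first, last]? */
static int toy_vma_overlaps(const struct toy_vma *vma,
                            unsigned long first, unsigned long last)
{
    return vma->vm_pgoff <= last && first <= toy_vma_last_pgoff(vma);
}

/* Virtual address at which page offset `pgoff` appears inside the vma. */
static unsigned long toy_vma_address(const struct toy_vma *vma,
                                     unsigned long pgoff)
{
    return vma->vm_start + ((pgoff - vma->vm_pgoff) << TOY_PAGE_SHIFT);
}

/*
 * Linear stand-in for the tree walk: store the address of `pgoff` in every
 * overlapping vma into `out` and return how many were found.
 */
static size_t toy_rmap_addresses(const struct toy_vma *vmas, size_t n,
                                 unsigned long pgoff, unsigned long *out)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++)
        if (toy_vma_overlaps(&vmas[i], pgoff, pgoff))
            out[found++] = toy_vma_address(&vmas[i], pgoff);
    return found;
}
```

An interval tree answers the same overlap query without scanning every vma, which matters once many processes share an anon_vma.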
struct list_head anon_vma_chain – When/who to use?
Remove VMA: check unlink_anon_vmas()
rmap: page reclaiming – try_to_unmap()
try_to_unmap
  rmap_walk
    rmap_walk_ksm
    rmap_walk_anon
      anon_vma_interval_tree_foreach(…, &anon_vma->rb_root, …)
        invalid_migration_vma
        try_to_unmap_one
        page_mapcount_is_zero
    rmap_walk_file
      vma_interval_tree_foreach(…, &mapping->i_mmap, …)
        try_to_unmap_one
        page_mapcount_is_zero
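The walk is driven by callbacks: one to skip a vma, one to handle a vma, and one to decide whether the walk can stop early. A user-space sketch of that control flow is shown below; the `toy_*` names are hypothetical, an array replaces the interval tree, and a per-vma flag replaces the real page-table check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
    int mapcount;                /* ptes still mapping the page */
};

struct toy_vma {
    bool maps_page;              /* does this vma currently map the page? */
};

struct toy_rmap_walk_control {
    void *arg;
    bool (*rmap_one)(struct toy_page *page, struct toy_vma *vma, void *arg);
    int (*done)(struct toy_page *page);
    bool (*invalid_vma)(struct toy_vma *vma, void *arg);
};

/* Visit candidate vmas until a callback asks to stop. */
static void toy_rmap_walk(struct toy_page *page, struct toy_vma *vmas,
                          size_t n, struct toy_rmap_walk_control *rwc)
{
    for (size_t i = 0; i < n; i++) {
        struct toy_vma *vma = &vmas[i];

        if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
            continue;            /* vma is not of interest, skip it */
        if (!rwc->rmap_one(page, vma, rwc->arg))
            break;               /* handler asked to abort the walk */
        if (rwc->done && rwc->done(page))
            break;               /* nothing left to do */
    }
}

/* Drop this vma's mapping of the page, if any; `arg` counts visits. */
static bool toy_try_to_unmap_one(struct toy_page *page, struct toy_vma *vma,
                                 void *arg)
{
    size_t *visited = arg;

    (*visited)++;
    if (vma->maps_page) {
        vma->maps_page = false;
        page->mapcount--;
    }
    return true;
}

static int toy_page_mapcount_is_zero(struct toy_page *page)
{
    return page->mapcount == 0;
}

/* Returns true when the page ends up completely unmapped. */
static bool toy_try_to_unmap(struct toy_page *page, struct toy_vma *vmas,
                             size_t n, size_t *visited)
{
    struct toy_rmap_walk_control rwc = {
        .arg = visited,
        .rmap_one = toy_try_to_unmap_one,
        .done = toy_page_mapcount_is_zero,
    };

    *visited = 0;
    toy_rmap_walk(page, vmas, n, &rwc);
    return page->mapcount == 0;
}
```

Because the `done` callback is checked after every vma, the walk ends as soon as the last mapping is gone and later candidates are never looked at.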
