Netlink Performance Optimization
- Kalimuthu Velappan
- Dhanasekar Rathinavel
Introduction
• Netlink messaging Architecture
• System Scaling Issue
• Proposed solution
• Netlink socket filtering
• cBPF/eBPF – Microcode assembly
• eBPF – Clang/LLVM integration / restricted C coding
• PoC
• TeamD scaling issue verification
• Performance measurements
• Application Integration
• Proposed model
• Q & A
Netlink Messaging Framework
• SONiC mainly uses the NETLINK_ROUTE family for interface notifications
• It is a broadcast domain
● All network interface updates are grouped under the NETLINK_ROUTE family.
● Each netdevice notifies the NETLINK subsystem about changes in its port properties.
● The NETLINK subsystem posts a message (a packet holding a "struct nlmsghdr") to the socket receive queue (recv-Q) of every registered application.
● Each application then reads the message from its recv-Q.
Application Interaction with Netlink
[Diagram: in user space, Vlanmgr, Portmgrd and other apps create or update NetDevice properties, while Teamd, STPd and other apps are interested in NETLINK_ROUTE family updates. In kernel space, the network interfaces (Bridge, Vlan, Eth, PO, etc.) and the device driver notify the NETLINK subsystem, which multicasts RTM_NEWLINK/RTM_DELLINK to all registered apps.]
Without Filter
• Example: Vlanmgrd adds Ethernet0 to 4K VLANs
• config vlan member range add 2 4094 Ethernet0
[Diagram: Vlanmgrd in user space configures 4K VLANs on Ethernet0. In kernel space the NetDevices hand 8190 messages to the NETLINK subsystem, and Teamsyncd, STPd and UDLD each receive all 8190.]
With eBPF Filter
• Example: Vlanmgrd adds Ethernet0 to 4K VLANs
• config vlan member range add 2 4094 Ethernet0
• Teamsyncd and STPd are bound to an eBPF filter that drops all VLAN-member add notifications.
[Diagram: the NETLINK subsystem still emits 8190 messages. For Teamsyncd and STPd all 8190 are dropped in the kernel; UDLD, which has no filter, still receives 8190.]
Netlink Message Format
sk_buff->data holds one nlmsghdr, which carries the attributes of a single netdevice:
nlmsghdr (msg_type == RTM_NEWLINK/RTM_DELLINK)
ifinfomsg
Attribute-1: T = IFLA_ADDRESS, V = MAC
Attribute-2: T = IFLA_IFNAME, V = if_name
Attribute-3: T = IFLA_LINKINFO, V = nested TLVs
    T = IFLA_INFO_KIND, V = Team/Vlan
    T = IFLA_INFO_SLAVE_KIND, V = Team/Vlan
    TLV-N
Attribute-N
Every attribute change on an interface generates an RTM_NEWLINK message that carries all of the interface's attributes.
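The nested TLV layout above can be walked in user space with the standard rtnetlink macros. The following is a minimal sketch, not the presenters' code: link_kind(), find_attr(), put_attr() and build_newlink() are hypothetical names, and the builder exists only so the parser can be exercised without a live netlink socket. It assumes Linux UAPI headers and that IFLA_INFO_KIND is a NUL-terminated string, as the kernel emits it.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stddef.h>
#include <string.h>

/* Find the first attribute of the given type in a TLV run, or NULL. */
static struct rtattr *find_attr(struct rtattr *rta, int len, unsigned short type)
{
    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
        if ((rta->rta_type & NLA_TYPE_MASK) == type)
            return rta;
    return NULL;
}

/* Return the IFLA_INFO_KIND string ("vlan", "team", ...) carried by an
 * RTM_NEWLINK/RTM_DELLINK message, or NULL if there is none. */
const char *link_kind(struct nlmsghdr *nlh)
{
    if (nlh->nlmsg_type != RTM_NEWLINK && nlh->nlmsg_type != RTM_DELLINK)
        return NULL;
    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct ifinfomsg)))
        return NULL;

    struct ifinfomsg *ifi = NLMSG_DATA(nlh);
    struct rtattr *li = find_attr(IFLA_RTA(ifi), IFLA_PAYLOAD(nlh), IFLA_LINKINFO);
    if (!li)
        return NULL;

    /* IFLA_LINKINFO is a container: its payload is another TLV run. */
    struct rtattr *kind = find_attr(RTA_DATA(li), RTA_PAYLOAD(li), IFLA_INFO_KIND);
    return kind ? (const char *)RTA_DATA(kind) : NULL;
}

/* Demo helper (hypothetical): append one attribute to a message being built. */
static struct rtattr *put_attr(struct nlmsghdr *nlh, unsigned short type,
                               const void *data, size_t len)
{
    struct rtattr *rta = (struct rtattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));

    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);
    if (len)
        memcpy(RTA_DATA(rta), data, len);
    nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
    return rta;
}

/* Demo helper (hypothetical): build a sample RTM_NEWLINK in buf, which must
 * be zeroed, 4-byte aligned and large enough for the message. */
struct nlmsghdr *build_newlink(void *buf, const char *name, const char *kind)
{
    struct nlmsghdr *nlh = buf;

    nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    nlh->nlmsg_type = RTM_NEWLINK;
    put_attr(nlh, IFLA_IFNAME, name, strlen(name) + 1);

    struct rtattr *li = put_attr(nlh, IFLA_LINKINFO, NULL, 0);
    put_attr(nlh, IFLA_INFO_KIND, kind, strlen(kind) + 1);
    /* The container's length covers everything appended after its header. */
    li->rta_len = (unsigned short)((char *)nlh + nlh->nlmsg_len - (char *)li);
    return nlh;
}
```

An application that only cares about team devices could call link_kind() on each received message and ignore anything that is not "team"; doing the same check inside the kernel is what the filter slides that follow are about.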
NetLink Dump
[Diagram: a dump is delivered as several buffers; each sk_buff->data carries a batch nlmsghdr-1, nlmsghdr-2, nlmsghdr-3 ... nlmsghdr-N.]
A netlink dump continues until the complete DB has been sent to the application.
Each dump reply has the NLM_F_MULTI flag set, and the last message of the dump has type NLMSG_DONE. The filter uses this to trap all dump replies to the application.
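On the receiving side, the multipart convention described above can be handled roughly as follows. This is a sketch under assumptions: scan_dump_chunk() and put_msg() are hypothetical names, and put_msg() builds header-only messages purely for demonstration (real RTM_NEWLINK replies also carry an ifinfomsg and attributes).

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdbool.h>
#include <string.h>

/* Walk every nlmsghdr packed into one receive buffer.
 * Adds the number of multipart RTM_NEWLINK messages to *links and
 * returns true once NLMSG_DONE (end of dump) is seen. */
bool scan_dump_chunk(void *buf, int len, int *links)
{
    struct nlmsghdr *nlh;

    for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
        if (nlh->nlmsg_type == NLMSG_DONE)
            return true;
        if ((nlh->nlmsg_flags & NLM_F_MULTI) && nlh->nlmsg_type == RTM_NEWLINK)
            (*links)++;
    }
    return false;   /* more chunks to come: call recv() again */
}

/* Demo helper (hypothetical): append a header-only multipart message of
 * the given type at byte offset *off in a 4-byte aligned buffer. */
void put_msg(char *buf, int *off, unsigned short type)
{
    struct nlmsghdr *nlh = (struct nlmsghdr *)(buf + *off);

    memset(nlh, 0, NLMSG_HDRLEN);
    nlh->nlmsg_len = NLMSG_HDRLEN;
    nlh->nlmsg_type = type;
    nlh->nlmsg_flags = NLM_F_MULTI;
    *off += NLMSG_ALIGN(nlh->nlmsg_len);
}
```

A reader loop would keep calling recv() and scan_dump_chunk() until the function returns true.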
SONiC Netlink Message - Scaling Issue
• Every net device has multiple attributes
• Any attribute change generates a netlink message notification
• An application has to process all the netlink messages generated by all the net devices.
• There is no way to register only for a specific interface or a specific attribute change.
• When 4K VLANs are configured per port
  • It generates ~8K netlink messages
• On a scaled system
  • Each process registers for kernel link notifications
  • Each process suffers from the same bursty notification issue as seen with Teamd
  • Easily more than 1M unnecessary messages get broadcast across the system.
• Applications are not able to process all the messages during config reload and system reboot
• When the socket receive queue is full, messages are dropped and the application gets an ENOBUFS error. There is no way to retrieve the lost notifications
Netlink Filter
• Berkeley Packet Filter (BPF)
• Interface for running small assembly-like filter programs in a minimal in-kernel VM
• The filter code is executed for every packet received on the socket
• The return value decides whether the packet is accepted or dropped
• The filter runs in the context of the netlink message sender
• Filter execution adds little CPU overhead
Netlink socket filtering – CBPF/EBPF
• cBPF/eBPF
• Microcode assembly
• Performance – optimized flow
• Easy to attach filter
• Limitations
  • No loops
  • Limited set of registers
  • Jump tracing is very hard to debug
  • No local storage (arrays/maps) in cBPF
  • No NLATTR helper function in eBPF
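fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)
/* Sample eBPF assembly. The offsets are for an Ethernet/IPv4 frame and only illustrate the instruction style. */
struct bpf_insn prog[] = {
    BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),      /* R6 = skb, required by BPF_LD_ABS */
    BPF_LD_ABS(BPF_B, 14 + 9),                /* R0 = protocol byte (protocol offset) */
    BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 17, 2),   /* UDP (17): jump to ACCEPT */
    BPF_MOV64_IMM(BPF_REG_0, 0),              /* 0 - DROP */
    BPF_EXIT_INSN(),
    BPF_MOV64_IMM(BPF_REG_0, 0xFFFF),         /* 0xFFFF - ACCEPT */
    BPF_EXIT_INSN(),
};
setsockopt(fd, SOL_SOCKET, SO_ATTACH_BPF, ..)
[Flow: in user space the program is attached to the socket fd. In the kernel it passes through the BPF verifier and the BPF JIT compiler and then runs as native code between the netlink subsystem and the socket, so recvmsg(fd..) only receives the netlink messages the filter accepts.]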
Netlink socket filtering – Clang/LLVM
• Clang/LLVM
• Restricted C
• Array and hash map support
• Easy to write and debug the filter code
• Limitations
  • Not an optimized instruction flow
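fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)
SEC("socket")
int bpf_prog1(struct __sk_buff *skb)
{
    /* load_half() reads the field as big-endian, but netlink headers are
       host-endian, so the flag constant is byte-swapped before the test. */
    uint16_t flags = load_half(skb, offsetof(struct nlmsghdr, nlmsg_flags));

    if (flags & __constant_htons(NLM_F_MULTI))
        return ACCEPT_PKT;   /* part of a dump reply */
    return DROP_PKT;
}
load_and_attach(fd, SO_ATTACH_BPF..)
[Flow: in user space the filter source is compiled by Clang/LLVM into filter-obj.bpf, which is loaded and attached to the socket fd. In the kernel it passes through the BPF verifier and the BPF JIT compiler and then runs as native code between the netlink subsystem and the socket, so recvmsg(fd..) only receives the netlink messages the filter accepts.]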
PoC with TeamD
• Arlo [ JIRA-7122 ] is fixed
• Verified that the ENOBUFS issue is not seen with the 4K VLAN sanity suite.
• Thanks to Madhukar
  • Helping to understand the teamd filter requirements
  • Validating the PoC filter
FILTER DROP COUNT
Process                     Dropped in kernel   Trapped to application   Dropped %
Teamd (per port-channel)    79814               238                      99.7%
teamsyncd                   214510              42696                    83.4%
Design for PoC verification
• Added a kernel patch providing nlattr and nested-nlattr search helper functions
• Customized eBPF filter logic for TeamD
• Clang/LLVM compiler integration
fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)
[Diagram: in the kernel, the BPF filter sits between the netlink subsystem and the socket fd on which the application receives netlink messages. The filter uses a hash map DB that can also be accessed from user space.]
Hash map DB
KEY / IFINDEX    VALUE / Attributes
1                [ s:1, f:2, v:3 ]
64               [ s:1, f:3, v:7 ]
23               [ s:1, f:5, v:6 ]
Application Integration Proposal
• EBPF assembly filter
• Clang/LLVM based filter
• Customized BPF filter library
• BCC – python integrated
EBPF assembly filter
• 11 Register set
• Kernel helper functions
• Kernel trace printk
• Array/Hash map APIs
• Tail calls
• Redirects
eBPF register   Description
R0              Return value from in-kernel function, and exit value for eBPF program
R1 ~ R5         Arguments from eBPF program to in-kernel function
R6 ~ R9         Callee-saved registers that in-kernel function will preserve
R10             Read-only frame pointer to access stack
Clang/LLVM
• Clang/LLVM compiler integration
  • Build infra for compilation of application-specific filters
• libsbpf.so library
  • Application interface
  • Loads the eBPF object into the kernel
  • Attaches the eBPF filter code to the application socket
• Application
  • App users write custom filters for their needs
[Diagram: the filter logic (MyFilter, myfilter.c) goes through the BPF filter build framework (BPF bytecode compiler) to produce myfilter.o. The application calls attach_filter(fd, "myfilter.o"). Inside libsbpf.so the labels are attach_filter(fd, fobj), load_filter(fd, fobj) and filter callback.]
Customized EBPF library for SONiC (Idea)
• Set of BPF filter rules and actions
• Rules can be
• Offset lookup and match
• Attribute lookup and match
• Nested attribute lookup and match
• Save result into a variable
• Action can be
• Accept
• Drop
• Jump to Nth rule
Label      Rule     Offset  Mask  Exp   Action
FCHECK     OFFSET   0x20    0xFF  0xaa  ACCEPT
NLCHECK    NLMATCH  0x56    0xFE  0xbb  GOTO NESTCHECK
DROP       DROP     0x00    0x00  0x00  DROP
NESTCHECK  NAMATCH  0x89    0xAF  0xcc  ACCEPT
RETURN     DROP     0x00    0x00  0x00  DROP
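One way such a rule table might be evaluated is sketched below in plain user-space C. Everything here is an assumption made for illustration: the slide gives only the table, so the struct layout, the name run_rules(), the fall-through-on-no-match semantics and the forward-only GOTO are hypothetical. Only the OFFSET-style byte match is modeled; the attribute rules (NLMATCH, NAMATCH) are left out, and demo_rules uses made-up values rather than the ones in the table.

```c
#include <stddef.h>
#include <stdint.h>

enum action { ACT_ACCEPT, ACT_DROP, ACT_GOTO };

struct rule {
    size_t      offset;    /* byte offset into the message */
    uint8_t     mask;      /* bits to compare */
    uint8_t     expect;    /* expected value after masking */
    enum action on_match;  /* what to do when the rule matches */
    size_t      target;    /* rule index, used by ACT_GOTO only */
};

/* Evaluate the rules in order; a rule that does not match falls through
 * to the next one. Returns 1 to accept and 0 to drop. Running off the
 * end of the table, or a backward GOTO, drops the message. */
int run_rules(const struct rule *r, size_t n, const uint8_t *msg, size_t len)
{
    size_t i = 0;

    while (i < n) {
        int hit = r[i].offset < len &&
                  (msg[r[i].offset] & r[i].mask) == r[i].expect;
        if (!hit) {
            i++;
            continue;
        }
        switch (r[i].on_match) {
        case ACT_ACCEPT:
            return 1;
        case ACT_DROP:
            return 0;
        case ACT_GOTO:
            if (r[i].target <= i)   /* forward jumps only, so no loops */
                return 0;
            i = r[i].target;
            break;
        }
    }
    return 0;
}

/* Hypothetical table with the same shape as the one on the slide. */
const struct rule demo_rules[] = {
    { 2, 0xFF, 0xAA, ACT_ACCEPT, 0 },  /* FCHECK: byte 2 == 0xAA */
    { 0, 0xF0, 0x10, ACT_GOTO,   3 },  /* high nibble of byte 0 == 1 */
    { 0, 0x00, 0x00, ACT_DROP,   0 },  /* unconditional drop */
    { 1, 0x01, 0x01, ACT_ACCEPT, 0 },  /* low bit of byte 1 set */
    { 0, 0x00, 0x00, ACT_DROP,   0 },  /* unconditional drop */
};
```

Restricting GOTO to forward jumps mirrors the "no loops" limitation listed for BPF, so a table in this form could in principle be translated into filter instructions one rule at a time.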
BPF Possibilities
• Time critical protocol packets can be generated from kernel.
• Statistics collection
• Custom user code injection
• And Much more …
Thank You


Netlink-Optimization.pptx
