Pipeline architecture
Network Platforms Group
Legal Disclaimer
General Disclaimer:
© Copyright 2015 Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Inside, the Intel Inside logo, Intel.
Experience What’s Inside are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and
brands may be claimed as the property of others.
FTC Disclaimer:
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software
or service activation. Performance varies depending on system configuration. No computer system can be absolutely
secure. Check with your system manufacturer or retailer or learn more at [intel.com].
Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems,
components, software, operations and functions. Any change to any of those factors may cause the results to vary. You
should consult other information and performance tests to assist you in fully evaluating your contemplated purchases,
including the performance of that product when combined with other products. For more complete information visit
http://www.intel.com/performance.
Run-to-completion vs pipeline software models

[Figure: two dual-socket core layouts. Run-to-completion: each physical core runs Intel® DPDK PMD packet I/O plus flow classification and application work (Apps A, B, C), with RSS distributing 10 GbE traffic across cores and a Linux* control plane on core 0. Pipeline: one core runs PMD packet I/O and a hash stage that disperses packets to worker cores running Apps A, B, C, with a dedicated Tx core. Cores share NUMA-local memory pools, caches, queues/rings and buffers; sockets are linked by QPI, NICs attach via PCIe.]

Run-to-completion model
• I/O and application workload can be handled on a single core
• I/O can be scaled over multiple cores

Pipeline model
• The I/O application disperses packets to other cores
• Application work is performed on those cores

With vectorization, more I/O can be handled on fewer cores.
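The difference between the two models can be sketched in a few lines. This is plain illustrative Python, not DPDK code: the core and queue objects are hypothetical stand-ins for physical cores and rings, and `upper()` stands in for the application work.

```python
from collections import deque

# Illustrative sketch (plain Python, not DPDK): the same packets handled
# by the two software models described above.

def run_to_completion(packets, n_cores):
    """Each 'core' does its own I/O and application work end to end."""
    results = []
    for i, pkt in enumerate(packets):
        core = i % n_cores                   # RSS-style spread across cores
        results.append((core, pkt.upper()))  # rx + work + tx on one core
    return results

def pipeline(packets, n_workers):
    """One I/O 'core' hashes and disperses; worker cores do the app work."""
    queues = [deque() for _ in range(n_workers)]
    for pkt in packets:
        queues[hash(pkt) % n_workers].append(pkt)        # dispatch stage
    results = []
    for core, q in enumerate(queues):
        while q:
            results.append((core, q.popleft().upper()))  # worker stage
    return results
```

In the run-to-completion sketch every packet stays on the core that received it; in the pipeline sketch the dispatch stage decides which worker sees it, which is the trade-off the slide illustrates.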
Simple Pipeline application

[Figure: NIC → RX (poll mode) → Validation/Classification → Enqueue → scheduling queues → Dequeue → NIC, with a master core collecting stats. Legend: HW thread, API, functional block, async queue, memory pool, mbuf.]
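The "memory pool / mbuf" elements in the diagram can be sketched as a pre-allocated free list: buffers are allocated once at startup and recycled, so the fast path never touches the general-purpose allocator. This is a plain-Python illustration with hypothetical names; DPDK's rte_mempool applies the same principle in C.

```python
# Illustrative sketch of the memory pool / mbuf idea: packet buffers are
# pre-allocated once and recycled through a free list.

class Mbuf:
    __slots__ = ("data",)
    def __init__(self):
        self.data = None

class Mempool:
    def __init__(self, size):
        self._free = [Mbuf() for _ in range(size)]

    def get(self):
        """Take a buffer from the pool; returns None when exhausted."""
        return self._free.pop() if self._free else None

    def put(self, mbuf):
        """Return a buffer to the pool after Tx completes."""
        mbuf.data = None
        self._free.append(mbuf)
```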
Pipeline applications

DPDK Packet Framework
 a development framework for building packet processing applications from standard pipeline blocks

DPPD: Data Plane Performance Demonstrators
 Linux user-space applications mainly intended for performance analysis

DPDK Packet Framework
 DPDK Programmer’s Guide, Packet Framework
 DPDK Sample Applications User Guide, Internet Protocol (IP) Pipeline Sample Application
Rapid Development of Packet Processing Apps
DPDK Packet Framework quickly turns
requirements into code
Packet Framework Components

1. Port library: port abstract interface API
 Basic ports: HWQ, SWQ
 Advanced ports: IP fragmentation, IP reassembly, Traffic Mgr, KNI, QAT
2. Table library: table abstract interface API
 Tables: Hash (extendible bucket, LRU), ACL, LPM, Array
3. Pipeline library: pipeline configuration and run-time API
 Configuration API implementation
 Run-time API implementation
4. IP Pipeline example: the Internet Protocol (IP) Pipeline application illustrates the use of the DPDK Packet Framework tool suite by implementing functional blocks such as packet RX, packet TX, flow classification, firewall, routing, IP fragmentation, IP reassembly, etc., which are then assigned to different CPU cores and connected together to create complex multi-core applications.
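The "port abstract interface" idea can be sketched as follows: every port type exposes the same rx/tx burst operations, so a pipeline can be wired to a hardware queue, a software queue, or a test source without code changes. This is plain Python with hypothetical class names; DPDK's librte_port defines the actual contract in C with function pointers.

```python
# Illustrative sketch of a port abstract interface: different port types
# share the same rx_burst/tx_burst shape.

class SwqPort:
    """A software-queue port backed by a simple list."""
    def __init__(self):
        self._q = []
    def tx_burst(self, pkts):
        self._q.extend(pkts)
        return len(pkts)               # number of packets accepted
    def rx_burst(self, max_pkts):
        out, self._q = self._q[:max_pkts], self._q[max_pkts:]
        return out

class SourcePort:
    """A 'source' port that generates packets, useful for testing."""
    def __init__(self, payloads):
        self._payloads = list(payloads)
    def rx_burst(self, max_pkts):
        out = self._payloads[:max_pkts]
        self._payloads = self._payloads[max_pkts:]
        return out

def forward(rx_port, tx_port, burst=4):
    """One polling iteration: move a burst from any rx port to any tx port."""
    pkts = rx_port.rx_burst(burst)
    tx_port.tx_burst(pkts)
    return len(pkts)
```

Because `forward` only depends on the burst interface, swapping a source port for a hardware queue changes nothing in the pipeline logic, which is exactly what the port library buys you.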
DPDK Packet Framework, Pipeline Level

Zoom in: pipeline level. [Figure: a pipeline with input ports (Port In 0/1) feeding lookup tables (Table 0, Table 1), each table mapping flow entries to actions, and output ports (Port Out 0/1/2).]

Ports (librte_port)
 HW queue
 SW queue
 IP Fragmentation
 IP Reassembly
 Traffic Mgr
 KNI
 Source/Sink

Tables (librte_table)
 Exact Match / Hash
 Access Control List (ACL)
 Longest Prefix Match (LPM)
 Array
 Pattern Matching

Actions
 Reserved actions: send to port, send to table, drop
 Packet edits: push/pop labels, modify headers (NAT, TTL update)
 Flow-based: meter, stats, app ID
 Accelerators: crypto, compress
 Load balancing

Standard methodology for pipeline development: ports and tables are connected together in tree-like topologies, with tables providing the actions to be executed on input packets.
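The table-to-actions mapping above can be sketched with a longest-prefix-match (LPM) lookup: a destination address resolves to an action such as "send to port N" or "drop". This is a plain-Python illustration using a linear scan (hypothetical class and action names); DPDK's librte_table implements LPM, hash, ACL and array tables as optimized C data structures.

```python
import ipaddress

# Illustrative sketch of a lookup table driving actions: an LPM table maps
# a destination address to an action tuple.

class LpmTable:
    def __init__(self, default_action=("drop",)):
        self._rules = []                     # (network, action) pairs
        self._default = default_action

    def add_rule(self, prefix, action):
        self._rules.append((ipaddress.ip_network(prefix), action))
        # keep longest prefixes first so the first hit is the best match
        self._rules.sort(key=lambda r: r[0].prefixlen, reverse=True)

    def lookup(self, addr):
        ip = ipaddress.ip_address(addr)
        for net, action in self._rules:
            if ip in net:
                return action
        return self._default
```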
DPDK Packet Framework, App Level

Zoom out: multi-core application level. [Figure: pipelines such as Packet I/O, Flow Classification, Firewall, Routing and Traffic Mgmt (librte_pipeline), each mapped to a CPU core and chained together; some cores run more than one pipeline.]

The Framework breaks the app into multiple pipelines, assigns each pipeline to a specific core and chains the pipelines together.
Multi-core scaling

A complex application is typically split across multiple cores, with cores communicating through SW queues.

There is usually a performance limit on the number of table lookups and actions that can fit on a single core (due to cache size, cache bandwidth, memory bandwidth, etc.).

The Framework breaks the app into multiple pipelines, assigns each pipeline to a specific core and chains the pipelines together.

One core can run more than one pipeline, but a pipeline cannot be split across multiple cores.
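The SW queues between pipeline cores can be sketched as a fixed-size single-producer/single-consumer ring, where the producer and consumer share only head/tail indices. This is a plain-Python illustration with a hypothetical class name; DPDK's rte_ring provides a lock-free C implementation of the same idea.

```python
# Illustrative sketch of an SPSC ring used as a SW queue between two
# pipeline cores: bounded, with explicit full/empty conditions.

class SpscRing:
    def __init__(self, size):
        self._buf = [None] * size
        self._size = size
        self._head = 0    # next slot the producer writes
        self._tail = 0    # next slot the consumer reads

    def enqueue(self, item):
        if self._head - self._tail == self._size:
            return False                       # ring full: backpressure
        self._buf[self._head % self._size] = item
        self._head += 1
        return True

    def dequeue(self):
        if self._tail == self._head:
            return None                        # ring empty
        item = self._buf[self._tail % self._size]
        self._tail += 1
        return item
```

The bounded size is the important property: when a downstream core falls behind, enqueue fails and the upstream core sees backpressure instead of unbounded memory growth.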
Intel® Data Plane Performance Demonstrators
https://01.org/intel-data-plane-performance-demonstrators
Intel® DPPD: Data Plane Performance Demonstrators

An open source application
 BSD 3-clause license

The config file defines
 which cores are used
 which interfaces are used
 which tasks are executed and how they are configured

It allows you to
 find bottlenecks and measure performance
 try and compare different core layouts without changing code
 reuse a config file on different systems (CPUs, hyper-threads, sockets, interfaces)
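To make the cores/interfaces/tasks idea concrete, a DPPD-style config might look roughly like the fragment below. This is an illustrative sketch of the shape only: the section and key names here are assumptions, not the exact DPPD syntax, which is documented with the tool itself.

```ini
; Illustrative shape of a DPPD-style config (key names are hypothetical):
; which interfaces exist, which cores run which tasks, and how tasks chain.

[port 0]
name = if0

[core 1]
; load-balancer task: receive from if0, dispatch to the worker core
task = 0
mode = lb
rx port = if0
tx cores = 2

[core 2]
; worker task: receive from the LB, transmit back out if0
task = 0
mode = worker
tx port = if0
```

Because the core layout lives entirely in this file, moving a task to a different core or socket is a config edit, not a code change, which is what makes the layout comparisons on the previous slide cheap to run.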
DPPD: Sample Configurations
 Very simple port forwarding
 Simple load balancer and worker thread
Finding bottlenecks
QoS and BNG (simplified view)

[Figure: upstream traffic (CPE to Internet) and downstream traffic (Internet to CPE) each flow through Classify, load-balancer (LB), worker (e.g. 6x W per direction), QoS scheduler and Tx stages between the CPE-facing and Internet-facing interfaces.]

Legend:
 LB = Load Balancer
 W = Worker
 TX = I/O Transmit
 QoS = QoS Scheduler
DPPD Display