Pipeline architecture
Network Platforms Group
Legal Disclaimer
General Disclaimer:
© Copyright 2015 Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Inside, the Intel Inside logo, Intel.
Experience What’s Inside are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and
brands may be claimed as the property of others.
FTC Disclaimer:
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software
or service activation. Performance varies depending on system configuration. No computer system can be absolutely
secure. Check with your system manufacturer or retailer or learn more at [intel.com].
Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems,
components, software, operations and functions. Any change to any of those factors may cause the results to vary. You
should consult other information and performance tests to assist you in fully evaluating your contemplated purchases,
including the performance of that product when combined with other products. For more complete information visit
http://www.intel.com/performance.
Run-to-completion vs pipeline software models

[Figure: two dual-socket core layouts. Run-to-completion: each physical core runs Intel® DPDK PMD packet I/O plus flow classification and application work (Apps A, B, C), with RSS distributing 10 GbE traffic across cores and a Linux* control plane on core 0. Pipeline: one core runs PMD packet I/O and a hash stage that disperses packets to worker cores running Apps A, B, C, with a dedicated Tx core. Cores share NUMA-local memory pools, caches, queues/rings and buffers; sockets are linked by QPI, NICs attach via PCIe.]

Run-to-completion model
• I/O and application workload can be handled on a single core
• I/O can be scaled over multiple cores

Pipeline model
• The I/O application disperses packets to other cores
• Application work is performed on those cores

With vectorization, more I/O can be handled on fewer cores.
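The difference between the two models can be sketched in a few lines. This is plain illustrative Python, not DPDK code: the core and queue objects are hypothetical stand-ins for physical cores and rings, and `upper()` stands in for the application work.

```python
from collections import deque

# Illustrative sketch (plain Python, not DPDK): the same packets handled
# by the two software models described above.

def run_to_completion(packets, n_cores):
    """Each 'core' does its own I/O and application work end to end."""
    results = []
    for i, pkt in enumerate(packets):
        core = i % n_cores                   # RSS-style spread across cores
        results.append((core, pkt.upper()))  # rx + work + tx on one core
    return results

def pipeline(packets, n_workers):
    """One I/O 'core' hashes and disperses; worker cores do the app work."""
    queues = [deque() for _ in range(n_workers)]
    for pkt in packets:
        queues[hash(pkt) % n_workers].append(pkt)        # dispatch stage
    results = []
    for core, q in enumerate(queues):
        while q:
            results.append((core, q.popleft().upper()))  # worker stage
    return results
```

In the run-to-completion sketch every packet stays on the core that received it; in the pipeline sketch the dispatch stage decides which worker sees it, which is the trade-off the slide illustrates.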
Simple Pipeline application

[Figure: NIC → RX (poll mode) → Validation/Classification → Enqueue → scheduling queues → Dequeue → NIC, with a master core collecting stats. Legend: HW thread, API, functional block, async queue, memory pool, mbuf.]
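The "memory pool / mbuf" elements in the diagram can be sketched as a pre-allocated free list: buffers are allocated once at startup and recycled, so the fast path never touches the general-purpose allocator. This is a plain-Python illustration with hypothetical names; DPDK's rte_mempool applies the same principle in C.

```python
# Illustrative sketch of the memory pool / mbuf idea: packet buffers are
# pre-allocated once and recycled through a free list.

class Mbuf:
    __slots__ = ("data",)
    def __init__(self):
        self.data = None

class Mempool:
    def __init__(self, size):
        self._free = [Mbuf() for _ in range(size)]

    def get(self):
        """Take a buffer from the pool; returns None when exhausted."""
        return self._free.pop() if self._free else None

    def put(self, mbuf):
        """Return a buffer to the pool after Tx completes."""
        mbuf.data = None
        self._free.append(mbuf)
```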
Pipeline applications

DPDK Packet Framework
 a development framework for building packet processing applications from standard pipeline blocks

DPPD: Data Plane Performance Demonstrators
 Linux user-space applications mainly intended for performance analysis

DPDK Packet Framework
 DPDK Programmer’s Guide, Packet Framework
 DPDK Sample Applications User Guide, Internet Protocol (IP) Pipeline Sample Application
Rapid Development of Packet Processing Apps
DPDK Packet Framework quickly turns
requirements into code
Packet Framework Components

1. Port library: port abstract interface API
 Basic ports: HWQ, SWQ
 Advanced ports: IP fragmentation, IP reassembly, Traffic Mgr, KNI, QAT
2. Table library: table abstract interface API
 Tables: Hash (extendible bucket, LRU), ACL, LPM, Array
3. Pipeline library: pipeline configuration and run-time API
 Configuration API implementation
 Run-time API implementation
4. IP Pipeline example: the Internet Protocol (IP) Pipeline application illustrates the use of the DPDK Packet Framework tool suite by implementing functional blocks such as packet RX, packet TX, flow classification, firewall, routing, IP fragmentation, IP reassembly, etc., which are then assigned to different CPU cores and connected together to create complex multi-core applications.
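The "port abstract interface" idea can be sketched as follows: every port type exposes the same rx/tx burst operations, so a pipeline can be wired to a hardware queue, a software queue, or a test source without code changes. This is plain Python with hypothetical class names; DPDK's librte_port defines the actual contract in C with function pointers.

```python
# Illustrative sketch of a port abstract interface: different port types
# share the same rx_burst/tx_burst shape.

class SwqPort:
    """A software-queue port backed by a simple list."""
    def __init__(self):
        self._q = []
    def tx_burst(self, pkts):
        self._q.extend(pkts)
        return len(pkts)               # number of packets accepted
    def rx_burst(self, max_pkts):
        out, self._q = self._q[:max_pkts], self._q[max_pkts:]
        return out

class SourcePort:
    """A 'source' port that generates packets, useful for testing."""
    def __init__(self, payloads):
        self._payloads = list(payloads)
    def rx_burst(self, max_pkts):
        out = self._payloads[:max_pkts]
        self._payloads = self._payloads[max_pkts:]
        return out

def forward(rx_port, tx_port, burst=4):
    """One polling iteration: move a burst from any rx port to any tx port."""
    pkts = rx_port.rx_burst(burst)
    tx_port.tx_burst(pkts)
    return len(pkts)
```

Because `forward` only depends on the burst interface, swapping a source port for a hardware queue changes nothing in the pipeline logic, which is exactly what the port library buys you.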
DPDK Packet Framework, Pipeline Level

Zoom in: pipeline level. [Figure: a pipeline with input ports (Port In 0/1) feeding lookup tables (Table 0, Table 1), each table mapping flow entries to actions, and output ports (Port Out 0/1/2).]

Ports (librte_port)
 HW queue
 SW queue
 IP Fragmentation
 IP Reassembly
 Traffic Mgr
 KNI
 Source/Sink

Tables (librte_table)
 Exact Match / Hash
 Access Control List (ACL)
 Longest Prefix Match (LPM)
 Array
 Pattern Matching

Actions
 Reserved actions: send to port, send to table, drop
 Packet edits: push/pop labels, modify headers (NAT, TTL update)
 Flow-based: meter, stats, app ID
 Accelerators: crypto, compress
 Load balancing

Standard methodology for pipeline development: ports and tables are connected together in tree-like topologies, with tables providing the actions to be executed on input packets.
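The table-to-actions mapping above can be sketched with a longest-prefix-match (LPM) lookup: a destination address resolves to an action such as "send to port N" or "drop". This is a plain-Python illustration using a linear scan (hypothetical class and action names); DPDK's librte_table implements LPM, hash, ACL and array tables as optimized C data structures.

```python
import ipaddress

# Illustrative sketch of a lookup table driving actions: an LPM table maps
# a destination address to an action tuple.

class LpmTable:
    def __init__(self, default_action=("drop",)):
        self._rules = []                     # (network, action) pairs
        self._default = default_action

    def add_rule(self, prefix, action):
        self._rules.append((ipaddress.ip_network(prefix), action))
        # keep longest prefixes first so the first hit is the best match
        self._rules.sort(key=lambda r: r[0].prefixlen, reverse=True)

    def lookup(self, addr):
        ip = ipaddress.ip_address(addr)
        for net, action in self._rules:
            if ip in net:
                return action
        return self._default
```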
DPDK Packet Framework, App Level

Zoom out: multi-core application level. [Figure: pipelines such as Packet I/O, Flow Classification, Firewall, Routing and Traffic Mgmt (librte_pipeline), each mapped to a CPU core and chained together; some cores run more than one pipeline.]

The Framework breaks the app into multiple pipelines, assigns each pipeline to a specific core and chains the pipelines together.
Multi-core scaling

A complex application is typically split across multiple cores, with cores communicating through SW queues.

There is usually a performance limit on the number of table lookups and actions that can fit on a single core (due to cache size, cache bandwidth, memory bandwidth, etc.).

The Framework breaks the app into multiple pipelines, assigns each pipeline to a specific core and chains the pipelines together.

One core can run more than one pipeline, but a pipeline cannot be split across multiple cores.
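The SW queues between pipeline cores can be sketched as a fixed-size single-producer/single-consumer ring, where the producer and consumer share only head/tail indices. This is a plain-Python illustration with a hypothetical class name; DPDK's rte_ring provides a lock-free C implementation of the same idea.

```python
# Illustrative sketch of an SPSC ring used as a SW queue between two
# pipeline cores: bounded, with explicit full/empty conditions.

class SpscRing:
    def __init__(self, size):
        self._buf = [None] * size
        self._size = size
        self._head = 0    # next slot the producer writes
        self._tail = 0    # next slot the consumer reads

    def enqueue(self, item):
        if self._head - self._tail == self._size:
            return False                       # ring full: backpressure
        self._buf[self._head % self._size] = item
        self._head += 1
        return True

    def dequeue(self):
        if self._tail == self._head:
            return None                        # ring empty
        item = self._buf[self._tail % self._size]
        self._tail += 1
        return item
```

The bounded size is the important property: when a downstream core falls behind, enqueue fails and the upstream core sees backpressure instead of unbounded memory growth.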
Intel® Data Plane Performance Demonstrators
https://01.org/intel-data-plane-performance-demonstrators
Intel® DPPD: Data Plane Performance Demonstrators

An open source application
 BSD 3-clause license

The config file defines
 which cores are used
 which interfaces are used
 which tasks are executed and how they are configured

It allows you to
 find bottlenecks and measure performance
 try and compare different core layouts without changing code
 reuse a config file on different systems (CPUs, hyper-threads, sockets, interfaces)
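To make the cores/interfaces/tasks idea concrete, a DPPD-style config might look roughly like the fragment below. This is an illustrative sketch of the shape only: the section and key names here are assumptions, not the exact DPPD syntax, which is documented with the tool itself.

```ini
; Illustrative shape of a DPPD-style config (key names are hypothetical):
; which interfaces exist, which cores run which tasks, and how tasks chain.

[port 0]
name = if0

[core 1]
; load-balancer task: receive from if0, dispatch to the worker core
task = 0
mode = lb
rx port = if0
tx cores = 2

[core 2]
; worker task: receive from the LB, transmit back out if0
task = 0
mode = worker
tx port = if0
```

Because the core layout lives entirely in this file, moving a task to a different core or socket is a config edit, not a code change, which is what makes the layout comparisons on the previous slide cheap to run.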
DPPD: Sample Configurations
 Very simple port forwarding
 Simple load balancer and worker thread
Finding bottlenecks
QoS and BNG (simplified view)

[Figure: upstream traffic (CPE to Internet) and downstream traffic (Internet to CPE) each flow through Classify, load-balancer (LB), worker (e.g. 6x W per direction), QoS scheduler and Tx stages between the CPE-facing and Internet-facing interfaces.]

Legend:
 LB = Load Balancer
 W = Worker
 TX = I/O Transmit
 QoS = QoS Scheduler
DPPD Display