© 2016 Mesosphere, Inc. All Rights Reserved.
DC/OS 1.8 NETWORKING
@SARGUN
Sargun Dhillon, July 2016
WHO AM I?
AGENDA
• How, and why, did we get here?
• How does DC/OS bring you closer to the ideal?
• Some of the future
A BRIEF HISTORY OF NETWORKS IN THE DC
ORGANIZATIONS CIRCA 2007
•Before the term “DevOps” was first heard
•Clear differentiation of ownership
•The datacenter was owned by the NOC
•Deployment of services was done by sysadmins in the operations group
•Developers operated without access to production
•Production deployments gated by QA, Operations
SOFTWARE CIRCA 2007
•Different services glued together via CORBA, XML-RPC, SOAP
•No one was really consciously doing microservices
•Networks were static, giant layer 2 domains
•Load Balancing provided by hardware
•Everyone ran their own datacenter
•EC2 was in its infancy; the term “Cloud” had begun to become popular only a year prior
•Systems statically partitioned
SaaS continued to grow at an incredible rate
It became a race to ship faster
We kept the software alive by feeding it with sysadmins
We kept the machines alive by feeding them with blood
This wasn’t working
ORGANIZATIONS 2008+
•We began seeing a gradual shift in the industry, where the lines between QA, Dev, and Ops were blurring
•“DevOps” term coined in 2008, first DevOpsDays in 2009
•Gradual adoption of the cloud, fewer organizations owning their own datacenters
•Networking was either outsourced to the cloud or remained with a small internal organization
•Needed to reduce ratio of operators to servers
SOFTWARE CIRCA 2008+
•Popularization of open source tooling to automate much of traditional operations and QA
  •Jenkins / Hudson
  •Puppet / Chef
  •Capistrano
•Popularization of stacks with more complex operational requirements
  •Nutch / Hadoop
  •NoSQLs
•Still statically partitioned machines
•Networks still sacred territory
CIRCA 2011
•Much of what’s been happening for the past half-decade hits networking
•Much of this falls under the term “SDN” (Software Defined Networking) or “NFV” (Network Function Virtualization)
•Hastened by enterprise adoption of VMs during the hype cycle
•OpenFlow promises to fix everything
•Major adoption of the cloud by startups as well as enterprises
•Virtualization begins to become mainstream as a mechanism for consolidating workloads
•The invention of the “private cloud”
•DotCloud / Docker funded by Y Combinator a year earlier
•Term “Microservice” coined
CIRCA 2013
• Docker becomes an instant hit and brings containers to the forefront
• Dynamic partitioning begins to make inroads
  • Google releases Omega paper
  • Apache Aurora open sourced
• Microservice counts explode, demanding colocation of workloads for efficiency
• Mesosphere founded
• Site Reliability Engineering begins to gain popularity, further blurring the lines between Dev, Ops, and QA
Everything was changing
Why?
Business Value
BENEFITS
•Reduction in cost of goods sold
•Smaller engineer to server ratio
  •Linear or superlinear growth of the engineering team relative to servers is unsustainable
•Smaller engineer to capability ratio, where capability includes:
  •Features
  •Throughput
  •Better user experience
  •Better availability
•Quicker release of features
But at what cost?
Complexity
OLD WORLD
NEW WORLD
DIVING DEEPER
Paxos? Raft? Ω Failure Detector? Pods? Wat? Sidecars? etcd? ZooKeeper? VXLAN?
Performance
REDIS PERFORMANCE
THE DC/OS APPROACH
CORE TENETS
•DC/OS must be agnostic to the underlying environment
  •AWS / Azure / GCE / SoftLayer as the lowest common denominators
•DC/OS should require minimal to no changes to the code in order to work
•DC/OS should provide similar services to existing environments
  •Fixed load balancers
  •Security
  •IP/Container
•We do not want to require a change in organizational procedures
CURRENT SERVICES PROVIDED
• Service Discovery
  • Mesos-DNS
  • Navstar*
  • Spartan*
• Load Balancing
  • Minuteman*
• Accessibility
  • Octarine*
• IP Per Container
• Control Plane
  • Lashup*
*Project Name
SERVICE DISCOVERY
PROJECTS: MESOS-DNS, NAVSTAR, SPARTAN
MESOS-DNS / NAVSTAR / SPARTAN
• Simple service discovery mechanism that exposes service locations over DNS
• Service names and locations exposed
  • via SRV records
  • via A records
• Typically requires modification of downstream code
• Good for bootstrap
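As a rough illustration of why SRV-based discovery "typically requires modification of downstream code": the client has to build the query name and then choose a host and port from the answers itself. The sketch below assumes the Mesos-DNS naming style `_<task>._<protocol>.<framework>.mesos` and works on made-up answers instead of a live resolver. `SrvRecord`, `srv_name`, and `pick_target` are hypothetical names, and the weighted choice is a simplification of the selection rule in RFC 2782.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    """The fields carried by one DNS SRV answer."""
    priority: int
    weight: int
    port: int
    target: str


def srv_name(task, framework="marathon", protocol="tcp", domain="mesos"):
    """Build an SRV query name in the Mesos-DNS style (assumed convention)."""
    return f"_{task}._{protocol}.{framework}.{domain}"


def pick_target(records, rng=random):
    """Pick one (host, port): lowest priority first, then weighted random."""
    if not records:
        raise LookupError("no SRV records returned")
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        chosen = rng.choice(candidates)
    else:
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
    # SRV targets are fully qualified; drop the trailing dot for socket APIs.
    return chosen.target.rstrip("."), chosen.port


# Hypothetical answers for a task named "nginx" launched by Marathon.
answers = [
    SrvRecord(priority=0, weight=1, port=31045, target="agent-1.example."),
    SrvRecord(priority=0, weight=1, port=31102, target="agent-2.example."),
]
query = srv_name("nginx")
host, port = pick_target(answers)
```

A records avoid this extra step but only give the address, so the port has to be known some other way.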
HIGH LEVEL
EXTENDED USAGE
INTEGRATIONS
SPARTAN
• DNS proxy that’s closely coupled with Navstar
• Raises availability by doubling work
  • Makes DNS 2N, 2N+1, or N+1 systems act as such
• Reduces latency at scale
  • Dual-dispatches the query and waits for the first response
A Jeff Dean Jig
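The dual-dispatch idea can be sketched in a few lines: send the same query to more than one upstream at once and return whichever answer arrives first. This is only an illustration of the technique, not Spartan's code. `dual_dispatch` and the stand-in resolver functions are hypothetical, and real upstreams would be DNS servers, not Python callables.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def dual_dispatch(query, resolvers, timeout=1.0):
    """Send `query` to every resolver at once; return the first success."""
    if not resolvers:
        raise ValueError("at least one resolver is required")
    pool = ThreadPoolExecutor(max_workers=len(resolvers))
    futures = [pool.submit(resolve, query) for resolve in resolvers]
    try:
        for future in as_completed(futures, timeout=timeout):
            if future.exception() is None:
                return future.result()
        raise LookupError(f"all resolvers failed for {query!r}")
    finally:
        # Do not block on the slower resolvers; they finish in the background.
        pool.shutdown(wait=False)


# Stand-in upstreams with different behaviour (all hypothetical).
def slow_resolver(query):
    time.sleep(0.3)
    return ("slow", "10.0.0.1")


def fast_resolver(query):
    return ("fast", "10.0.0.1")


def broken_resolver(query):
    raise ConnectionError("upstream down")


answer = dual_dispatch("leader.mesos", [slow_resolver, fast_resolver])
```

The price is the doubled upstream work the slide mentions; in exchange, one slow or failed upstream no longer sets the response time.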
SPARTAN
LOAD BALANCING
PROJECTS: MINUTEMAN
MINUTEMAN
•Low-overhead TCP load balancing
  •Low overhead over the life of an established TCP connection
  •Pay balancing cost upfront
•Inflicts minimal overhead on non-load-balanced traffic
•Fault-tolerance period aims to be <100ms
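One way to read "pay balancing cost upfront" is that the backend is chosen once, when a connection is opened, and every later packet only needs a table lookup. The toy model below shows that split under that assumption. It is not how Minuteman is implemented; `ConnectionBalancer`, the VIP, and the backend addresses are all hypothetical.

```python
import random


class ConnectionBalancer:
    """Toy model: choose a backend once per connection, then reuse it."""

    def __init__(self, backends_by_vip, rng=random):
        self.backends_by_vip = backends_by_vip  # VIP -> list of (host, port)
        self.connections = {}                   # connection id -> backend
        self.rng = rng

    def open(self, conn_id, vip, healthy):
        """The balancing decision is made here, at connection setup."""
        candidates = [b for b in self.backends_by_vip.get(vip, []) if b in healthy]
        if not candidates:
            raise ConnectionRefusedError(f"no healthy backend for {vip}")
        self.connections[conn_id] = self.rng.choice(candidates)
        return self.connections[conn_id]

    def route(self, conn_id):
        """Established connections need only a dictionary lookup."""
        return self.connections[conn_id]


vip = ("11.0.0.1", 80)
balancer = ConnectionBalancer({vip: [("10.0.0.2", 31000), ("10.0.0.3", 31001)]})
backend = balancer.open("conn-1", vip, healthy={("10.0.0.3", 31001)})
```

Traffic that never targets a VIP does not touch the table at all, which matches the goal of minimal overhead on non-load-balanced traffic.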
FUNCTIONALLY
VIRTUAL NETWORKS
PROJECTS: NAVSTAR
VIRTUAL NETWORKS
•Base DC/OS 1.8 functionality
  •With custom Mesos module
•Provides IP/Container out of the box
•Utilizes off-the-shelf encapsulation
  •VXLAN
•Artisanal controller built at Mesosphere: Navstar
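For readers unfamiliar with the encapsulation: VXLAN (RFC 7348) wraps each inner Ethernet frame in an 8-byte header carried over UDP, with a 24-bit network identifier (VNI) telling virtual networks apart. The sketch below builds and parses only that header. The VNI value and the function names are illustrative, and nothing here reflects how DC/OS assigns VNIs.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag defined in RFC 7348
VXLAN_UDP_PORT = 4789        # IANA-assigned UDP destination port


def vxlan_header(vni):
    """Return the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: VNI in the top 24 bits, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(vni, inner_frame):
    """Prefix an inner Ethernet frame with a VXLAN header (UDP/IP not shown)."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload):
    """Split a VXLAN payload into (vni, inner_frame)."""
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, payload[8:]


packet = encapsulate(42, b"inner ethernet frame")
vni, frame = decapsulate(packet)
```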
50,000-FOOT VIEW
DIVING IN
ACCESSIBILITY
PROJECTS: OCTARINE
OCTARINE
• Transparent HTTP Proxy
• Automatically integrates with Mesos-DNS; resolves SRV records
• SOCKS Proxy
• Automatic OpenVPN Proxy
• Currently leverages master SSH access for ACLs
• Soon will integrate with DC/OS ACLs
USE CASE
INTERACTION DIAGRAM
OCTARINE: BENEFITS
• Provides Day 0 access to DC/OS services
  • With security
  • Without internet exposure
  • Without task pinning
• Works without custom software
• Works without infrastructure modification
CONTROL PLANE
PROJECTS: LASHUP
We had a problem
Computers
Sometimes They Break
Sometimes Many Break
Sometimes You Don’t Know
Before Lashup
90+ Second Resolution For 10% Failure
You’d Need to Be Ultron To Keep Track Of It All
Let’s Distribute the Problem
A Mess
Academia
Connected Graph: HyParView
Constant Adaptive Health Checks
Dealing with Failure: HyParView
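In HyParView, each node keeps a small active view of peers it is connected to and monitors, plus a larger passive view held in reserve. When an active peer fails, a peer from the passive view is promoted to take its place. The sketch below models only that repair step. It leaves out the join, shuffle, and neighbor-request messages of the real protocol, and `HyParViewNode` and its methods are hypothetical names, not Lashup's API.

```python
import random


class HyParViewNode:
    """Toy model of one node's active and passive views."""

    def __init__(self, name, active_size=3, rng=random):
        self.name = name
        self.active_size = active_size
        self.active = set()   # peers we are connected to and health-check
        self.passive = set()  # known peers kept in reserve
        self.rng = rng

    def learn(self, peer):
        """Record a newly discovered peer in whichever view has room."""
        if peer == self.name or peer in self.active:
            return
        if len(self.active) < self.active_size:
            self.active.add(peer)
        else:
            self.passive.add(peer)

    def peer_failed(self, peer, is_alive):
        """Drop a failed active peer and promote live passive peers."""
        self.active.discard(peer)
        while len(self.active) < self.active_size and self.passive:
            candidate = self.rng.choice(sorted(self.passive))
            self.passive.discard(candidate)
            if is_alive(candidate):
                self.active.add(candidate)


node = HyParViewNode("a", active_size=2)
for peer in ["b", "c", "d", "e"]:
    node.learn(peer)
# "b" fails; "d" also turns out to be unreachable when tried.
node.peer_failed("b", is_alive=lambda p: p != "d")
```

Because every node repairs its own view, the overlay can stay connected after many simultaneous failures without a central coordinator tracking them.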
Throw Some Link-State Routing At It
Free Multicast
Sprinkle On Some CRDTs
PROJECT LASHUP
•A novel distributed systems SDK that provides:
  •Failure detection
  •Membership
  •Multicast delivery
  •Strongly-eventually consistent data storage
•Powers:
  •Minuteman VIP dissemination
  •Minuteman node liveness checks
  •Overlay routing
  •DNS synchronization
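"Strongly-eventually consistent" means that replicas which have seen the same updates hold the same state, whatever order the updates arrived in. CRDTs get this by making merge commutative, associative, and idempotent. The slides do not say which CRDTs Lashup uses, so the last-writer-wins map below is only a generic illustration; `LWWMap` and the VIP data are hypothetical.

```python
class LWWMap:
    """State-based last-writer-wins map: merge keeps the newest write per key."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0
        self.entries = {}  # key -> ((clock, node_id), value)

    def put(self, key, value):
        """Record a local write stamped with a (clock, node_id) pair."""
        self.clock += 1
        self.entries[key] = ((self.clock, self.node_id), value)

    def get(self, key):
        return self.entries[key][1]

    def merge(self, other):
        """Fold another replica's state into this one."""
        for key, (stamp, value) in other.entries.items():
            if key not in self.entries or stamp > self.entries[key][0]:
                self.entries[key] = (stamp, value)
            # Keep the local clock ahead of everything seen so far.
            self.clock = max(self.clock, stamp[0])


# Two replicas write the same key concurrently, then exchange state.
a, b = LWWMap("a"), LWWMap("b")
a.put("vip-1", ["10.0.0.2:80"])
b.put("vip-1", ["10.0.0.3:80"])
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing
```

Both replicas end up with the write stamped `(1, "b")`, because the node id breaks the tie between equal clocks.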
THE FUTURE
FUTURE PLANS?
• Security
  • Encryption in flight
  • Task-level microsegmentation and filtering
• Further research required:
  • QoS between services
  • “Zero-overhead” NFV
Zero Overhead NFV: Rewrite the OS at the Syscall Layer
Zero Overhead NFV:
~/linux$ sudo samples/bpf/test_probe_write_user
Server bound to: 127.0.0.1:35707
Client connecting to: 255.255.255.255:5555
Server received connection from: 0.0.0.0:44804
Client's peer address: 127.0.0.1:35707
Zero Overhead NFV:
~/linux$ sudo strace -e ... samples/bpf/test_probe_write_user
bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(42085), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
Server bound to: 127.0.0.1:42085
Client connecting to: 255.255.255.255:5555
connect(7, {sa_family=AF_INET, sin_port=htons(5555), sin_addr=inet_addr("255.255.255.255")}, 16) = 0
accept(3, {sa_family=AF_INET, sin_port=htons(50016), sin_addr=inet_addr("127.0.0.1")}, [16]) = 8
Server received connection from: 0.0.0.0:50016
getpeername(7, {sa_family=AF_INET, sin_port=htons(42085), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
Client's peer address: 127.0.0.1:42085
ZERO-OVERHEAD NFV
•Implemented using kernel probes
•JIT at runtime
•Allows standard BSD API to work
•In preliminary testing: Undetectable overhead
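The sample output earlier shows a client that asked to connect to 255.255.255.255:5555 and ended up connected to the server on 127.0.0.1: the destination was rewritten underneath the unmodified program. The sketch below is only a user-space analogue of that effect. It is not a kernel probe and uses no BPF; `rewrite_destination`, `connect_with_rewrite`, and the rewrite table are hypothetical.

```python
import socket


def rewrite_destination(requested, rewrites):
    """Return the real destination for an address the program asked for."""
    return rewrites.get(requested, requested)


def connect_with_rewrite(sock, requested, rewrites):
    """Connect `sock`, substituting the destination if a rewrite exists."""
    actual = rewrite_destination(requested, rewrites)
    sock.connect(actual)
    return actual


def demo():
    """Loopback demo: request a placeholder address, reach the real server."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        bound = server.getsockname()
        requested = ("255.255.255.255", 5555)
        connect_with_rewrite(client, requested, {requested: bound})
        conn, _ = server.accept()
        conn.close()
        # The client's peer is the real server, not the address it requested.
        return requested, client.getpeername(), bound
    finally:
        client.close()
        server.close()
```

The kernel-probe version needs no wrapper like `connect_with_rewrite`, since the program keeps calling the ordinary socket API and the arguments are changed beneath it.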
ZERO-OVERHEAD NFV
•We’re still taking the VM approach; why not take the container approach to NFV?
  •Lie to the program
  •Manipulate the syscalls
  •Win big
•Current status:
  •Research project
  •First patches upstreamed, due in the 4.8 kernel
DC/OS 1.8 NETWORKING
•Core functionality:
  •IP per container (IP/CT)
  •Internal service discovery
  •Load balancing
  •3rd-party service integration
•Upcoming features:
  •Security
  •External load balancing
•Future research going on
DC/OS 1.8 NETWORKING
@SARGUN
