Networks for SREs: What do I need to know?
Michael Kehoe
Staff SRE
Introduction
Michael Kehoe
$ WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 3 years American
• Former Network Engineer at the University of Queensland
Agenda and Vision
Today’s agenda
1 Introductions
2 Problem Statement
3 Basics of Networks
4 Advances in networks
5 Clos Networks
6 Advances in Network Speeds
7 IPv6
8 Summary
Networks just work right?
Probably…
Probably…Not…
What are we trying to solve
Problem Statement
• Network Design – Has evolved
• Network software/hardware – Has advanced
• Learning – The average SRE may not necessarily understand the ramifications
• Tooling – Has been left behind
What this talk is
• A look into the potential pitfalls of modern-day networks
What this talk isn’t
• How to make the network do all the things…quickly & reliably…
• Sorry
Basics of Networks
Basics of Networks
Diagram labels: Peering Facility, Tier 1 ISPs, Tier 2 ISPs, Tier 3 ISP, Cable ISP
Basics of Networks
Advances in Network Design
Advances in Network Design
• Clos Networks
• Advancement of network speeds
• IPv6 Implementation (Finally)
• Multi-homed internet connections
• Moving away from traditional internal routing protocols
Clos Networks
Clos Networks
Clos Networks
Clos Networks
Credit: Facebook
Clos Networks
Credit: Facebook
Advancement of Network Speeds
Advancement of Network Speeds
Speed                Name            Standard   Year
10 Mb/s              10BASE-T        802.3i     1990
100 Mb/s             100BASE-TX      802.3u     1995
1000 Mb/s = 1 Gb/s   1000BASE-T      802.3ab    1999
10 Gb/s              10GBASE         802.3ae    2002
40/100 Gb/s          40GbE/100GbE    802.3ba    2010
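As a rough illustration of what the jump in line rate means for bulk transfers, the sketch below computes the ideal time to move a payload at each speed in the table. It ignores protocol overhead and any host-side bottleneck, so real transfers will be slower; the 1 TB payload is an arbitrary example size, and the dictionary and function names are illustrative.

```python
# Ideal (overhead-free) transfer time at each Ethernet line rate.
LINE_RATES_BPS = {
    "10BASE-T": 10e6,
    "100BASE-TX": 100e6,
    "1000BASE-T": 1e9,
    "10GBASE": 10e9,
    "100GbE": 100e9,
}


def transfer_seconds(payload_bytes, rate_bps):
    """Seconds needed to send payload_bytes at rate_bps with zero overhead."""
    return payload_bytes * 8 / rate_bps


if __name__ == "__main__":
    one_terabyte = 10**12  # arbitrary example payload
    for name, rate in LINE_RATES_BPS.items():
        print(f"{name:>10}: {transfer_seconds(one_terabyte, rate):,.0f} s")
```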
Advancement of Network Speeds
• What this gives us
• Better bulk transfer speeds
• The ability to have higher-concurrency services (1M connection problem)
• Run multiple high-concurrency applications (LPS)
Networks just work right?
Probably…
Probably…Not…
Optimizations Required
Advancement of Network Speeds
NIC, Linux Kernel, Network Switches
Advancement of Network Speeds
• Network Interface Cards
• Various RX/TX queue size limits/defaults
• Various interrupt schemes
• Plethora of tunables that vary wildly
• LITTLE TO NO DOCUMENTATION!
• How do you monitor/ tune it???
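One starting point for the monitoring question is the per-interface counter table the Linux kernel exposes in /proc/net/dev. The sketch below parses that format and exposes the receive and transmit drop counters. The sample text and the interface name in it are made up for illustration, and these counters are only part of the picture: many NIC-level drops show up only in driver-specific statistics.

```python
# Column order of /proc/net/dev after the "iface:" prefix.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]
KEYS = ["rx_" + f for f in RX_FIELDS] + ["tx_" + f for f in TX_FIELDS]


def parse_net_dev(text):
    """Return {interface: {counter_name: value}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        stats[name.strip()] = dict(zip(KEYS, values))
    return stats


# Made-up sample in the same layout as the real file.
SAMPLE = """Inter-|   Receive                                  |  Transmit
 face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
  eth0: 1000 10 0 3 0 0 0 0 2000 20 0 1 0 0 0 0
"""

if __name__ == "__main__":
    for iface, counters in parse_net_dev(SAMPLE).items():
        print(iface, "rx_drop:", counters["rx_drop"], "tx_drop:", counters["tx_drop"])
```

On a Linux host the same function can be fed the live file, for example `parse_net_dev(open("/proc/net/dev").read())`.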
Advancement of Network Speeds
• Linux Kernel
• Lots of network tunables
• Some defaults assume year ~2000 era hardware
• E.g. net.ipv4.tcp_max_syn_backlog
• Important to understand the type of application you run and tailor your tunables to it.
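Before tuning anything, it helps to know what a host is currently running with. A minimal sketch, assuming a Linux host where sysctl names map to files under /proc/sys (dots become slashes); the helper names are illustrative, and the simple mapping does not handle keys whose components themselves contain dots, such as some interface names.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl name like 'net.ipv4.tcp_max_syn_backlog' to its /proc/sys file."""
    return Path(root) / name.replace(".", "/")


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    key = "net.ipv4.tcp_max_syn_backlog"
    print(key, "=", read_sysctl(key))
```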
Advancement of Network Speeds
• Network switches
• As with NICs and the Linux kernel, there are a lot of options
• Deep Buffers
• DSCP marking
• Switching latency
• DCTCP
Adoption of IPv6
IPv6 Features
Address Space
Better Performance
Simplified Header
No-NAT
Auto-Configuration
IPv6: Address Space
• Moving from a 32-bit address space to 128-bit.
• 4B → 340TTT (about 4.3 billion addresses → about 340 trillion trillion trillion)
• Read up on IPv6 addressing representation
• RFC 5952
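The sizes behind the 32-bit versus 128-bit comparison can be checked with plain integer arithmetic. The variable names below are illustrative.

```python
# Total number of addresses in each address space.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

print(f"IPv4: {ipv4_total:,}")      # 4,294,967,296
print(f"IPv6: {ipv6_total:.3e}")    # 3.403e+38
print(f"IPv6 is 2**{(ipv6_total // ipv4_total).bit_length() - 1} times larger")
```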
IPv6: Address Space
A SINGLE ADDRESS CAN BE REPRESENTED MANY WAYS
2001:db8:0:0:1:0:0:1
2001:0db8:0:0:1:0:0:1
2001:db8::1:0:0:1
2001:db8::0:1:0:0:1
2001:0db8::1:0:0:1
2001:db8:0:0:1::1
2001:db8:0000:0:1::1
2001:DB8:0:0:1::1
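Because one address has so many spellings, comparing addresses as strings is fragile. A minimal sketch using Python's standard `ipaddress` module: every form listed above parses to the same address, and the module's `compressed` output (lowercase, leading zeros dropped, one zero run shortened) gives a single spelling to store and compare. For this address that output matches the form RFC 5952 recommends.

```python
import ipaddress

# The eight spellings from the slide.
FORMS = [
    "2001:db8:0:0:1:0:0:1",
    "2001:0db8:0:0:1:0:0:1",
    "2001:db8::1:0:0:1",
    "2001:db8::0:1:0:0:1",
    "2001:0db8::1:0:0:1",
    "2001:db8:0:0:1::1",
    "2001:db8:0000:0:1::1",
    "2001:DB8:0:0:1::1",
]

# Parse first, then compare: equal addresses collapse to one set element.
addresses = {ipaddress.IPv6Address(form) for form in FORMS}
canonical = {addr.compressed for addr in addresses}

if __name__ == "__main__":
    print(len(addresses), "distinct address(es):", canonical)
```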
IPv6: Address Space
YOU CAN MAKE FUN PHRASES
• :cafe:beef
• :feed:f00d:
• :bad:f00d:
• :bad:beef:
• :bad:d00d:
• :f00d:cafe:
• :bad:fa11:
IPv6: Address Space
OR CLEVER ADVERTISING
[mkehoe@mkehoe ~]$ host -6 www.facebook.com
www.facebook.com is an alias for star-mini.c10r.facebook.com.
star-mini.c10r.facebook.com has IPv6 address
2a03:2880:f113:8083:face:b00c:0:25de
IPv6: Address Space
SPECIAL ADDRESSES: IPV4
RFC         IP Block          Use
1918        10.0.0.0/8        Private IP Addressing
1918        172.16.0.0/12     Private IP Addressing
1918        192.168.0.0/16    Private IP Addressing
6890/3927   169.254.0.0/16    Link-Local
5771/2365   224.0.0.0/4       Multicast
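A small sketch of how these special IPv4 blocks can be checked in code, using Python's standard `ipaddress` module. The block list uses the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); the function name, labels and the "Other" fallback are illustrative.

```python
import ipaddress

# Special-purpose IPv4 blocks and what they are used for.
IPV4_BLOCKS = [
    ("10.0.0.0/8", "Private"),
    ("172.16.0.0/12", "Private"),
    ("192.168.0.0/16", "Private"),
    ("169.254.0.0/16", "Link-Local"),
    ("224.0.0.0/4", "Multicast"),
]


def classify_ipv4(address):
    """Return the use of the special block containing address, or 'Other'."""
    addr = ipaddress.IPv4Address(address)
    for block, use in IPV4_BLOCKS:
        if addr in ipaddress.IPv4Network(block):
            return use
    return "Other"


if __name__ == "__main__":
    for sample in ["10.1.2.3", "172.31.255.255", "172.32.0.1", "169.254.10.1", "239.1.1.1"]:
        print(sample, "->", classify_ipv4(sample))
```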
IPv6: Address Space
SPECIAL ADDRESSES: IPV6
IP Block         Use
::/128           Unspecified address
::1/128          Loopback address
::ffff:0:0/96    IPv4-mapped addresses
64:ff9b::/96     IPv4/IPv6 translation
fc00::/7         Unique Local Address
fe80::/10        Link-Local address
ff00::/8         Multicast addresses
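The same kind of lookup works for the special IPv6 blocks. This is a sketch with Python's standard `ipaddress` module; the function name, labels and the "Other" fallback are illustrative, and the sample addresses are arbitrary.

```python
import ipaddress

# Special-purpose IPv6 blocks and what they are used for.
IPV6_BLOCKS = [
    ("::/128", "Unspecified"),
    ("::1/128", "Loopback"),
    ("::ffff:0:0/96", "IPv4-mapped"),
    ("64:ff9b::/96", "IPv4/IPv6 translation"),
    ("fc00::/7", "Unique Local"),
    ("fe80::/10", "Link-Local"),
    ("ff00::/8", "Multicast"),
]


def classify_ipv6(address):
    """Return the use of the special block containing address, or 'Other'."""
    addr = ipaddress.IPv6Address(address)
    for block, use in IPV6_BLOCKS:
        if addr in ipaddress.IPv6Network(block):
            return use
    return "Other"


if __name__ == "__main__":
    for sample in ["::", "::1", "::ffff:192.0.2.1", "fd12:3456::1", "fe80::1", "ff02::1"]:
        print(sample, "->", classify_ipv6(sample))
```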
IPv6: Simplified Header
IPv6: No NAT
• No need for NAT anymore
• Simplified Configuration
• Fewer points of failure
• Potential for better performance
• NAT is slow
• Harder for abusers to hide behind NAT
IPv6: Auto-Configuration
• Stateless = Auto-configured (SLAAC)
• Stateful = DHCPv6 / statically assigned
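To make stateless auto-configuration concrete, the sketch below shows one classic way a host can derive its own address: combine the /64 prefix learned from the router with a modified EUI-64 interface identifier built from the MAC address. This is only one option and an assumption for illustration; many current operating systems use randomized or otherwise opaque interface identifiers instead. The function name, prefix and MAC address are made up.

```python
import ipaddress


def eui64_address(prefix, mac):
    """Build an IPv6 address from a /64 prefix and a MAC using modified EUI-64."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch expects a /64 prefix")
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit of the first octet and insert ff:fe in the middle.
    iid = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    return ipaddress.IPv6Address(int(net.network_address) | int.from_bytes(iid, "big"))


if __name__ == "__main__":
    # Documentation prefix and an example MAC address.
    print(eui64_address("2001:db8::/64", "00:11:22:33:44:55"))
```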
IPv6: Better Performance
• The elimination of NAT is a significant factor
• Generally fewer hops across the internet for IPv6 vs IPv4
• Simplified header gives a small amount of optimization
Summary
Summary
• Don’t implicitly trust the network!
• Understand where your packets flow
• End-to-end monitoring of your network. It is the lifeblood of your infrastructure
• For any network infrastructure changes, ensure you understand how to benchmark and monitor them!
Networks just work right?
Q&A
SRECon-Europe-2017: Networks for SREs


Editor's Notes

  1. So today I want to briefly cover what this talk is about and what I hope to achieve by the end of this session. I then want to do a quick review of some of the basics of the internet and networks, and then talk about three specific advances in networks.
  2. NOTE: So what’s the problem we’re trying to solve in this space? If there’s one thing I would like you all to get out of this talk, it is: don’t trust any part of the network.
  3. Tier 1: AT&T, Level3, Tata, Telecom Italia, Telefonica
  4. We have our Layer 7 application layer, which holds the application protocols that we use daily (HTTP, DNS, SSH, SMTP) and, somewhat importantly, BGP. We have the Layer 4 transport layer, which is where our TCP & UDP protocols live. We have the Layer 3 IP or internet layer; this is where the IP protocol lives, along with ICMP and a number of other important protocols including IPsec, OSPF & RIP. We have our Layer 2 data-link layer, which provides the functional means to transfer data between entities; this is where the Ethernet (802.3) protocol lives. And finally we have the physical layer, which we’ll talk about in a few minutes.
  5. So in the last 10 years or so we’ve finally started to see an advancement in the implementation of networks, particularly in the following areas: Clos networks; advancement of network speeds; eventual implementation of IPv6 within networks and on the internet; multi-homed internet connections; and using BGP as an interior routing protocol. All of these things have brought their own set of unique challenges to the way we operate the network, but also to the applications we as SREs run on top of it. So let’s talk about these.
  6. Clos networks are named after Charles Clos, who formalized this design in 1952. The Clos network design actually started out as a multi-stage switching system for telephone systems. Funnily enough, the original “key advantage” of this design was to increase capacity and reduce bottlenecks in switching devices.
  7. Fast-forward approximately 60 years, and network engineers started to use the Clos topology in datacenter networks, in a fashion similar to what you see on the screen. The interesting thing about the typical implementation of the Clos (spine/leaf) topology is that instead of being a switched network (a Layer 2 network), it’s a routed network (a Layer 3 network).
  8. As an aside, Clos networks can be represented in a number of different ways. In the three representations shown here, the spine planes are all connected in the same way, just arranged differently.
  9. So now we have traffic being routed across multiple links (no L2 protocols such as spanning tree or LACP here). We are using what’s known as Equal-Cost Multipath routing, or ECMP. So what does it mean to us as SREs? Simply put, getting from server A to server B (within a datacenter or fabric) could take any of 16, 64 or 256 different paths, making “why are my packets not making it to server B?” a difficult problem to troubleshoot. According to the Paris-Traceroute research paper, ECMP flows are load-balanced using a set of five fields (source/destination IPs, ports and Type of Service). Unfortunately, unless you have an SDN controller that’s aware of these flows, it’s not possible to identify the path of application traffic in real time. So where does that leave us for troubleshooting poor connectivity between servers? Unfortunately, for the most part, traditional tools like ping, traceroute and even MTR aren’t useful: using the default options on these tools will only let you discover one path out of potentially hundreds. There are two utilities that have made ground in this area: Dublin-traceroute, which draws paths, and fbtracert, by Facebook, which is written in Go. Hopefully in the near future, we can bring a similar utility to LinkedIn.
  10. As you can see, since the 1990s, we’ve been growing our LAN network speeds every few years. In the space of 20 years, we’ve gone from 10Mb Ethernet over copper wires to 100Gb over optical fibers. As internet backbone speeds have grown, so have the speeds on our desktops and of course on our servers.
  11. So to think you’re going to get 10Gb out of the box is somewhat of a pipe dream, unfortunately; there are some optimizations and forethought required.
  12. So for this to work harmoniously, there are three components that need to work together: the NIC, the Linux kernel and the network switches.
  13. So when you look at the NIC side of the equation, there are so many variables. I suggest you check out Joe Damato’s talk from Monitorama 2016, where he talks about why statistics collection for network devices in Linux is probably wrong.
  14. Standard TCP congestion control relies on packet loss to detect congestion. DCTCP instead reacts to ECN marks set by the switches, so it can slow down before queues overflow.
  17. http://sophiedogg.com/funny-ipv6-words/
  18. http://www.networkworld.com/article/2692482/ipv6/infographic-ipv4-vs-ipv6.html
  19. http://www.tcpipguide.com/free/t_IPv6AutoconfigurationandRenumbering.htm
  20. Don’t implicitly trust the network. Remember that the network rarely hard-fails; most failures are partial and troublesome to debug.
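
Note 9 describes ECMP balancing flows on five header fields. The sketch below is a toy model of that idea, showing why a single probe stream only ever exercises one path while varying a field such as the source port spreads probes across paths. It is an assumption-laden illustration: real switches use vendor-specific hardware hash functions, and SHA-256, the function name and the example addresses are stand-ins.

```python
import hashlib


def ecmp_path(src_ip, dst_ip, src_port, dst_port, tos, n_paths):
    """Pick one of n_paths equal-cost paths from a hash of the flow's header fields."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{tos}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


if __name__ == "__main__":
    n_paths = 16
    # A fixed flow always hashes to the same path.
    fixed = {ecmp_path("10.0.0.1", "10.0.1.1", 33434, 443, 0, n_paths) for _ in range(100)}
    # Varying the source port lets probes land on many different paths.
    varied = {ecmp_path("10.0.0.1", "10.0.1.1", port, 443, 0, n_paths)
              for port in range(33000, 33256)}
    print("paths seen by one flow:", len(fixed))
    print("paths seen when varying source port:", len(varied))
```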