Open-source Infrastructure at Lyft
Constance Caramanolis
Daniel Hochman July 2017
Overview of Lyft Architecture
Open-source Infrastructure Projects
- Confidant
- Discovery
- Ratelimit
- Envoy
Q&A
Agenda
Architecture (simplified)
Front Envoy
Application
Envoy
Discovery
Confidant
>100 Clusters
Ratelimit
Python
lyft / confidant
Your secret keeper. Stores secrets in Dynamo, encrypted at rest.
1,105
12 contributors
November 2015
How is a service configured?
lyft / location-service Private
common:
  PORT: 8080
  TIMEOUT_MS: 15000
development:
  USE_AUTH: False
staging:
  API_KEY: secret_key_igjq3i494fqq234qbc
production:
  API_KEY: secret_key_ojajf823jj49ij8h
environment.yaml
Service
location-service
Confidant to the rescue!
Credential
api_key: password123
Behind the scenes
Application
IAM Role
EC2 Instance
Credential
api_key: password123
api_key = os.getenv('CREDENTIAL_API_KEY')
KMS
DynamoDB
Confidant
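The application side of this slide can be sketched as follows. This is a minimal sketch, assuming decrypted credentials reach the process as environment variables with a `CREDENTIAL_` prefix, as the `os.getenv` line above suggests. The helper name `load_credentials` is hypothetical and is not part of Confidant.

```python
import os


def load_credentials(environ=None, prefix="CREDENTIAL_"):
    """Collect prefixed environment variables into a dict keyed by lowercase name."""
    # Hypothetical helper: defaults to the real process environment.
    environ = os.environ if environ is None else environ
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


# Example with an explicit mapping instead of the real process environment.
creds = load_credentials({"CREDENTIAL_API_KEY": "password123", "PORT": "8080"})
print(creds["api_key"])
```

Keeping the lookup in one place means application code never needs to know where the secret was stored or how it was decrypted.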
Server-blind secrets
Highly sensitive secrets are encrypted and decrypted by the end-users.
Confidant stores but can't read them.
Confidant
KMS
IAM Role
EC2 Instance
lyft / discovery
Provides a REST interface for querying the list of hosts that belong to a microservice
54
6 contributors
Python
August 2016
POST /v1/registration/location-service
{
  "ip": "10.0.0.1",
  "port": 80,
  "revision": "da08f35b",
  "tags": {
    "id": "i-910203",
    "az": "us-east-1a",
    "canary": true
  }
}
Tracking hosts
* * * * *
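A host-side registration call might look like the sketch below. It assumes the body is sent as JSON exactly as the slide shows (the real service may expect a different encoding), and the base URL `http://discovery.example` and the function name `build_registration_request` are hypothetical. The request is built but not sent, so the example runs without a network.

```python
import json
import urllib.request


def build_registration_request(base_url, service, ip, port, revision, tags):
    """Build (but do not send) the POST that registers one host."""
    body = json.dumps(
        {"ip": ip, "port": port, "revision": revision, "tags": tags}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/registration/{service}",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed encoding
        method="POST",
    )


req = build_registration_request(
    "http://discovery.example",  # hypothetical Discovery address
    "location-service",
    ip="10.0.0.1",
    port=80,
    revision="da08f35b",
    tags={"id": "i-910203", "az": "us-east-1a", "canary": True},
)
# A cron entry such as "* * * * *" would run this once a minute and then call
# urllib.request.urlopen(req) to actually send it.
```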
- Hosts are stored in DynamoDB
- Storage support is abstract
- Hosts removed if not reporting since now - HOST_TTL
- Ecosystem designed to tolerate eventual consistency, unlike ZooKeeper, etcd, Consul
- Pair with active healthchecks
Storage
DynamoDB
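The expiry rule above (drop hosts that have not reported since now - HOST_TTL) can be sketched in a few lines. This is an illustration only: the `last_check_in` field, the `live_hosts` function, and the TTL value are assumptions, not the Discovery service's actual schema or settings.

```python
import time

HOST_TTL = 600  # seconds; illustrative value, not Lyft's actual setting


def live_hosts(hosts, now=None, ttl=HOST_TTL):
    """Return hosts whose last check-in is no older than now - ttl."""
    now = time.time() if now is None else now
    cutoff = now - ttl
    return [h for h in hosts if h["last_check_in"] >= cutoff]


hosts = [
    {"ip": "10.0.0.1", "last_check_in": 1000.0},  # reported recently
    {"ip": "10.0.0.2", "last_check_in": 300.0},   # stale, older than the cutoff
]
fresh = live_hosts(hosts, now=1000.0)
```

Because a stale host can linger until its TTL passes, callers still need the active healthchecks mentioned above to avoid routing to a dead host.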
GET /v1/registration/<service>
{
  "hosts": [
    {
      "ip": "10.0.0.1", "port": 80, "revision": "da08f35b",
      "tags": {"id": "i-910203", "az": "us-east-1a", "canary": true}
    },
    ...
    {
      "ip": "10.0.0.2", "port": 80, "revision": "da08f35b",
      "tags": {"id": "i-121286", "az": "us-east-1d"}
    }
  ]
}
Fetching hosts
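A consumer of this response could turn it into a list of endpoints per availability zone. The sketch below parses a payload shaped like the one on the slide. The function name `endpoints_by_az` is hypothetical, and the sample payload contains only the two hosts that the slide spells out.

```python
import json

RESPONSE = """
{"hosts": [
  {"ip": "10.0.0.1", "port": 80, "revision": "da08f35b",
   "tags": {"id": "i-910203", "az": "us-east-1a", "canary": true}},
  {"ip": "10.0.0.2", "port": 80, "revision": "da08f35b",
   "tags": {"id": "i-121286", "az": "us-east-1d"}}
]}
"""


def endpoints_by_az(payload):
    """Group "ip:port" endpoints by the availability zone in their tags."""
    grouped = {}
    for host in json.loads(payload)["hosts"]:
        az = host["tags"].get("az", "unknown")
        grouped.setdefault(az, []).append(f'{host["ip"]}:{host["port"]}')
    return grouped


zones = endpoints_by_az(RESPONSE)
```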
Services list the hosts they want to talk to!
internal_hosts:
  - jobscheduler
  - roads
external_hosts:
  - dynamodb_iad
  - kinesis_iad
Envoy per-service configuration
location-service/envoy.yaml
/etc/envoy.conf
(on the box)
Active Healthcheck
Application
Envoy
Discovery
jobscheduler roads
GET /healthcheck
Application
Envoy
GET
GET
Every host healthchecks every host in a destination cluster
location-service
lyft / ratelimit
Go/gRPC service designed to enable generic rate limit scenarios
224
6 contributors
Go
January 2017
Why rate limit?
- Control flow
- Protect against attacks
- Bad actors
- Accidents happen
oops!
Rate Limit Service
- Written in Go
- Enable generic rate limit scenarios
- Decisions based on a domain and set of descriptors
- Settings configured at runtime
- Backed by Redis
Ratelimit
?
INCR
Domains and descriptors
Domain
Defines a container for a set of rate limits
Globally unique
e.g. "envoy_front"
Descriptors
Ordered list of key/value pairs
Case sensitive
e.g. ("destination_cluster", "location-service"), ("user_id", "1234")
Limit definition
Runtime Setting
Defines the requests per unit for a descriptor.
Request flow example
Rq1: (“user_id”, “1234”)
Redis state: user_id_1234 : 1
Rs1: RateLimitResponse_OK
Rq2: (“user_id”, “9876”)
Redis state: user_id_1234: 1, user_id_9876 : 1
Rs2: RateLimitResponse_OK
Rq3: (“user_id”, “1234”)
Redis state: user_id_1234: 2, user_id_9876 : 1
Rs3: RateLimitResponse_OVER_LIMIT
Definition
domain: test_domain
key: user_id
rate_limit:
  unit: hour
  requests_per_unit: 1
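The request flow above can be reproduced with a small in-memory counter. This is a sketch under stated assumptions, not the Ratelimit service's implementation: a plain dict stands in for Redis, the increment mimics `INCR`, and the fixed-window scheme (one counter per key and time window) and the class name `FixedWindowLimiter` are illustrative choices.

```python
import time

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


class FixedWindowLimiter:
    """In-memory stand-in for the Redis counters; one counter per key and window."""

    def __init__(self, unit, requests_per_unit):
        self.window = UNIT_SECONDS[unit]
        self.limit = requests_per_unit
        self.counters = {}

    def check(self, key, value, now=None):
        now = time.time() if now is None else now
        # The window index is part of the counter key, so counts reset each unit.
        bucket = (f"{key}_{value}", int(now // self.window))
        self.counters[bucket] = self.counters.get(bucket, 0) + 1  # like INCR
        return "OK" if self.counters[bucket] <= self.limit else "OVER_LIMIT"


# Same definition as the slide: one request per hour per user_id.
limiter = FixedWindowLimiter(unit="hour", requests_per_unit=1)
results = [
    limiter.check("user_id", "1234", now=0),  # Rq1
    limiter.check("user_id", "9876", now=1),  # Rq2
    limiter.check("user_id", "1234", now=2),  # Rq3
]
```

Incrementing first and comparing afterwards keeps each decision to a single counter update, which is why a store with an atomic increment is a natural fit.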
Ratelimit Client
from lyft_idl.client.ratelimit.ratelimit_client import RateLimitClient

ratelimit_client = RateLimitClient(settings.LYFT_API_USER_AGENT)


# Determines whether or not to limit jsonp_messages_post according to ratelimit service.
def should_allow_jsonp_messages_post(ip_address, phone_number):
    domain = settings.get('RATE_LIMIT_DOMAIN')
    ip_descriptors = [(('jsonp_messages_post_from_ip_address', ip_address), )]
    phone_descriptors = [(('jsonp_messages_post_from_phone_number', phone_number), )]
    return (
        ratelimit_client.is_request_allowed(domain, ip_descriptors) and
        ratelimit_client.is_request_allowed(domain, phone_descriptors)
    )
lyft / envoy
Front/service L7 proxy
1,924
62 contributors
C++
September 2016
Why Envoy?
Service Oriented Architecture
- Many languages and frameworks
- Protocols (HTTP/1, HTTP/2, databases, caching, etc…)
- Partial implementation of SOA best practices (retries, timeouts, …)
- Observability
- Load balancers (AWS, F5)
What is Envoy?
The network should be transparent to applications.
When network and application problems do occur, it should be easy to determine the source of the problem.
What is Envoy?
- Modern C++11
- Runs alongside applications
- Service discovery integration
- Rate Limit integration
- HTTP/2 first (get gRPC!)
- Act as front/edge proxy
- Stats, Stats, Stats
- Logging
Observability: Global Health
Observability: Service to Service
Envoy Client in Python (internal)
from lyft.api_client import EnvoyClient

switchboard_client = EnvoyClient(
    service='switchboard'
)

switchboard_client.post(
    "/v2/messages",
    data={
        'template': 'welcome'
    },
    headers={
        'x-lyft-user-id': 12345647363394
    }
)
Envoy deployment @Lyft
- > 100 services
- > 10,000 hosts
- > 2,000,000 RPS
- All service to service traffic (REST and gRPC)
- MongoDB, DynamoDB, Redis proxy
- External service proxy (AWS and other partners)
- Kibana/Elasticsearch for logging
- LightStep for tracing
- Wavefront for stats
Architecture Revisited
Front Envoy
Application
Envoy
Discovery
Confidant
>100 Clusters
Ratelimit
Done!
- Lyft is hiring. If you want to work on large-scale problems in a fast-moving, high-growth company, visit lyft.com/jobs
- Visit github.com/lyft
- Slides available at slideshare.net/danielhochman
- Q&A
