Pierre Souchay
Discovery Team @Criteo
Twitter: @vizionr
Github: pierresouchay
Inversion Of Control with Consul
Leading Discovery Team @Criteo (SDKs + Consul)
Dealing with 240k+ service instances, 38k Consul nodes in 9 DCs
1st external contributor to Consul
Author of consul-templaterb
Today’s dishes
• Starters
• History of Consul at Criteo
• Entrées
• Inversion of Control explained
• Cheese
• Real World Examples
• Sweets
• How it changes infrastructure
Consul History at Criteo
When, Why, How?
More servers every year
DC: 12 (9 prod)
Servers: 38 000+
Services: 3400
Instances: 240k+
HTTP req/s: 3M+
BigData: 180+ PB
Kafka msg/s: 8M+
Criteo Infrastructure growth
Back to 2014
2014 - Machines
• Web-App internet facing
• µServices
• NoSQL (caches, KV…)
• Logs
2019 hashiconf seattle_consul_ioc
2015 - Mesos
• Containers
• Frequent changes
• Many services/machine
• Different Provisioning
Sounds almost good enough but…
1. Provisioning time is an issue globally (F5)
2. Database polling shows its limits
3. Services both in containers and machines?
4. More latency introduced by new load balancers
Time to move on
Consul to discover everything
• No SPOF
• Multi DC support
• Service oriented
• Real time updates
• Toolbox (KV, locks)
• DNS integration!
• Working on IP
Step 1: Register * in Consul
Deployer
Add Health-Checks
Add Tags
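A minimal sketch of what Step 1 could look like from a deployer's point of view, assuming registration goes through Consul's agent endpoint `PUT /v1/agent/service/register`. The helper name `build_registration`, the ID convention and the `/health` path are hypothetical; only the payload field names (`Name`, `ID`, `Port`, `Tags`, `Meta`, `Check`) come from Consul's API. The sketch only builds the payload and does not contact an agent.

```python
import json


def build_registration(name, port, tags, meta, health_path="/health"):
    """Build a service definition carrying tags, meta and an HTTP health check."""
    return {
        "Name": name,
        "ID": f"{name}-{port}",  # hypothetical ID convention, one ID per instance
        "Port": port,
        "Tags": sorted(tags),
        "Meta": dict(meta),  # Consul meta values are always strings
        "Check": {
            "HTTP": f"http://localhost:{port}{health_path}",  # assumed health URL
            "Interval": "10s",
        },
    }


payload = build_registration(
    "my-app", 8080, ["prod", "http"], {"team": "myteam", "version": "1.2.3"}
)
print(json.dumps(payload, indent=2))
```

A deployer would send this JSON body to its local Consul agent once the instance is started.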
Step 2: re-implement our libraries
HTTP Client Side LB
Database Access
Kafka
Memcached/Couchbase
Step 3: Load Balancers
Step 3 was harder
• Watching changes on many services
→ cpu/net: idx/service: #3899 (and many more)
• Leader gets saturated
→ discovery_max_stale: #3920
• DNS issues on big services
→ DNS fixes: #3940, #3948, #4071
• 800mb/s to watch changes
→ consul-templaterb: now 12kb/s
• Weights in services / meta in services
→ #3881 / #4047 / #4468
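To illustrate why a per-service index matters when watching many services, here is a small sketch of a blocking-query style watch loop. It is an illustration under assumptions, not Criteo's implementation: `fetch` is a hypothetical callable standing in for an HTTP call such as `GET /v1/health/service/<name>?index=<index>&stale`, returning the value of the `X-Consul-Index` response header together with the decoded body.

```python
def watch(fetch, on_change, max_rounds):
    """Call on_change(payload) only when the index returned by fetch() moves."""
    index = 0
    for _ in range(max_rounds):
        new_index, payload = fetch(index)
        if new_index < index:
            # An index going backwards means the state was reset: start over.
            index = 0
            continue
        if new_index != index:
            index = new_index
            on_change(payload)
        # Same index: the blocking query timed out with nothing new, so skip.
    return index


# Fake fetch replaying three responses; the second one carries no change.
responses = iter([(10, ["a"]), (10, ["a"]), (12, ["a", "b"])])
seen = []
last_index = watch(lambda index: next(responses), seen.append, max_rounds=3)
print(last_index, seen)
```

When the index only moves if the watched service itself changes, consumers are woken up (and data is transferred) only for relevant updates.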
All of this is solved now
What did we learn about our users?
• They love their services → put configuration into their systems
• They want predictability → give them tools to investigate
• They love business semantics → focus on semantics, ignore tools
• They want it fast and magic → magic is better than As A Service
Can we go further?
Can we change the way we create infrastructure and tools?
So, what is this IoC?
Inversion of Control
Decoupling systems using a framework
Provide the semantics of your needs
Someone will provide what you need (and much more)
Broader than Dependency Injection
Decouple producer/consumer
Consul exposes lots of stuff
• List all services
• Filter services using tags
• Notifications in real time
• Provide configuration settings (KV/Service Data)
Let’s use those features
Expose searchable semantics using tags
Provide configuration hints with business semantics as meta
Tools observe, react & provision
Consul is like an infra DB
Inversion Of Control
Inversion Of Control: Swagger
Inversion Of Control: Automatic alerts
Why is meta so cool?
Direct configuration
• alert_* => automatic alerts
• vip_* => VAAS Configuration
• swagger_* => you saw it
…and information
• version
• start
• team
• OWNERS...
…automatically cleaned up
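As a rough illustration of how a consumer might route meta to the right tool, the sketch below splits a service's meta into per-prefix configuration and plain information. The function name `split_meta` and the key `vip_port` are hypothetical; the prefixes are the ones listed above.

```python
from collections import defaultdict

PREFIXES = ("alert_", "vip_", "swagger_")


def split_meta(meta):
    """Split meta into {consumer: {key: value}} by prefix, plus leftover info."""
    config = defaultdict(dict)
    info = {}
    for key, value in meta.items():
        for prefix in PREFIXES:
            if key.startswith(prefix):
                # "alert_channel" ends up as config["alert"]["channel"]
                config[prefix.rstrip("_")][key[len(prefix):]] = value
                break
        else:
            # No known prefix: plain information such as version or team
            info[key] = value
    return dict(config), info


meta = {
    "alert_channel": "myteam",
    "vip_port": "443",  # hypothetical key, for illustration only
    "version": "1.2.3",
    "team": "myteam",
}
config, info = split_meta(meta)
print(config)
print(info)
```

Each tool then only reads its own section, and the meta disappears with the service when it is deregistered.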
Consumers of those meta can be…
OPT-IN
• VAAS (network load balancing)
• Swagger repository
OPT-OUT
• Chaos Monkey
• Security Scanner
• Automatic Alerting
• App Watcher
• Version Scanner
More meta, more power, more services
node meta + service meta: 2 layers
metrics can re-use it as well (ex: Prometheus/Consul integration)
Templates are re-usable
Same meta can be re-used for new tools
It gets easier and easier
Isn't K/V the right place instead of meta?
Most of the time… no
Cardinality is hard to get right: a service is NOT a monolith
Cleanup is just too hard
It means a bit more work on the consumer side, because of cardinality
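The cardinality point can be sketched as follows, assuming meta is attached to each instance so that a consumer has to reconcile values across all instances of a service. The function name `merge_instance_meta` and the simplified instance shape (a dict with a `Meta` key) are assumptions made for illustration.

```python
from collections import defaultdict


def merge_instance_meta(instances):
    """Return (agreed, conflicts) for meta keys across a service's instances."""
    values = defaultdict(set)
    for instance in instances:
        for key, value in instance.get("Meta", {}).items():
            values[key].add(value)
    # Keys with a single observed value (even if only some instances set them)
    agreed = {k: next(iter(v)) for k, v in values.items() if len(v) == 1}
    # Keys whose value differs between instances, e.g. during a rolling deploy
    conflicts = {k: sorted(v) for k, v in values.items() if len(v) > 1}
    return agreed, conflicts


instances = [
    {"Meta": {"team": "myteam", "version": "1.2.3"}},
    {"Meta": {"team": "myteam", "version": "1.3.0"}},
]
agreed, conflicts = merge_instance_meta(instances)
print(agreed, conflicts)
```

How a consumer resolves a conflict (take the majority, the newest, or refuse to act) is a design decision each tool has to make.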
Semantics, not tool oriented
alert_channel = "myteam"
alert_ratio = "0.5"
alert_on_call_group = "myteam-alerts"
alert_depends_on = "database1,serviceX"
alert_criticity = "business,high"
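Since Consul meta values are plain strings, a consumer has to turn them into typed settings itself. The sketch below is one possible way to do that for the keys shown above; the function name `parse_alert_meta` and the choice to treat comma-separated values as lists are assumptions.

```python
def parse_alert_meta(meta):
    """Convert alert_* string meta into typed settings (floats and lists)."""
    alert = {k[len("alert_"):]: v for k, v in meta.items() if k.startswith("alert_")}

    def as_list(value):
        # "database1,serviceX" becomes ["database1", "serviceX"]
        return [item.strip() for item in value.split(",") if item.strip()]

    return {
        "channel": alert.get("channel"),
        "ratio": float(alert["ratio"]) if "ratio" in alert else None,
        "on_call_group": alert.get("on_call_group"),
        "depends_on": as_list(alert.get("depends_on", "")),
        "criticity": as_list(alert.get("criticity", "")),
    }


settings = parse_alert_meta({
    "alert_channel": "myteam",
    "alert_ratio": "0.5",
    "alert_on_call_group": "myteam-alerts",
    "alert_depends_on": "database1,serviceX",
    "alert_criticity": "business,high",
    "version": "1.2.3",  # not an alert_* key, so it is ignored here
})
print(settings)
```

Nothing in these settings names a monitoring tool, which is the point: the same meta can drive a different alerting backend later.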
Real-World Examples
Automatic Metering / Alerts
• Templates of consul-templaterb generating Prometheus alerts
• Provide 100% coverage of Criteo for free
• Also provide metrics such as availability for all of Criteo
• App availability according to version/OS/rack, using meta!
• Re-use those meta in all metrics
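The talk does this with consul-templaterb templates; the Python sketch below only illustrates the idea of deriving a Prometheus alerting rule from service meta. The metric name `service_healthy_ratio`, the `5m` duration and the reading of `alert_ratio` as a minimum healthy ratio are all assumptions. The dictionary keys (`alert`, `expr`, `for`, `labels`, `annotations`) follow the layout of a Prometheus alerting rule.

```python
def prometheus_rule(service, meta):
    """Build one alerting rule (as a dict) from a service's alert_* meta."""
    ratio = float(meta.get("alert_ratio", "0.5"))  # assumed default threshold
    return {
        "alert": f"{service}_unhealthy",
        # service_healthy_ratio is a hypothetical metric name
        "expr": f'service_healthy_ratio{{service="{service}"}} < {ratio}',
        "for": "5m",
        "labels": {"channel": meta.get("alert_channel", "unknown")},
        "annotations": {
            "summary": f"{service}: fewer than {ratio:.0%} of instances are healthy"
        },
    }


rule = prometheus_rule("myapp", {"alert_ratio": "0.5", "alert_channel": "myteam"})
print(rule["expr"])
```

Running such a generator over every service in the catalog is what gives alert coverage without any per-team alert configuration.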
VAAS
• Provides all networking for Criteo
• Serving more than 4M HTTP req/s
• Share semantics for several load-balancers
  • HAProxy
  • F5
• Provisioning of much more than reverse proxies
  • DNS (including Geo-DNS)
  • TLS
• Real time creation of Services (less than 1 minute)
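To make the load-balancer provisioning concrete, here is a hedged sketch that renders an HAProxy `backend` section from a list of service instances. It is not VAAS itself: the function name `haproxy_backend` and the flattened instance shape (`Address`, `Port`, `Weight`) are assumptions, and only standard `backend` / `server ... weight ... check` syntax is used.

```python
def haproxy_backend(service, instances):
    """Render an HAProxy backend with one weighted server line per instance."""
    lines = [f"backend {service}"]
    # Sort so the output stays stable and reloads happen only on real changes
    for inst in sorted(instances, key=lambda i: (i["Address"], i["Port"])):
        address, port = inst["Address"], inst["Port"]
        weight = inst.get("Weight", 1)  # assumed default weight
        lines.append(f"    server {address}_{port} {address}:{port} weight {weight} check")
    return "\n".join(lines)


instances = [
    {"Address": "10.0.0.2", "Port": 8080},
    {"Address": "10.0.0.1", "Port": 8080, "Weight": 2},
]
config_text = haproxy_backend("my-app", instances)
print(config_text)
```

The same instance list could feed a different renderer for another load balancer, which is how semantics can be shared across HAProxy and F5.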
Services Scanner
• Detect old applications
• Detect invalid ownership
• Old security groups
• Deprecated users…
Consul-UI / Consul-Timeline
• Live logs for all services
• History of services
• http://github.com/criteo/consul-templaterb/
• Provides real time updates about the status of all services
• Provides a history of changes for all services
And much more…
Swagger browser (catalog of all JSON APIs in Criteo)
Chaos Monkey
Resource Tracking Systems
Latency Monitoring between machines
Security Scanner looks up new services to scan
How IoC changes infra
Removes configuration from hidden places
If you are providing a new cross-service system, you probably don’t need a git repository for the configuration
So everything is transparent and open to everybody
Information is where it needs to be, on the service itself
Eases onboarding of newcomers
Cleanup is not a hard problem anymore
Systems live and die, consumers react
Ops synchronization is not needed anymore
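"Consumers react" can be pictured as a reconciliation step between two snapshots of the catalog. The sketch below assumes each snapshot is a simple mapping from service name to its meta; the function name `diff_catalog` is hypothetical.

```python
def diff_catalog(previous, current):
    """Compare two {service: meta} snapshots and list what a consumer must do."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        name
        for name in current.keys() & previous.keys()
        if current[name] != previous[name]
    )
    return added, removed, changed


previous = {"web": {"version": "1"}, "old-batch": {"version": "7"}}
current = {"web": {"version": "2"}, "new-api": {"version": "1"}}
added, removed, changed = diff_catalog(previous, current)
print(added, removed, changed)
```

A consumer that deletes whatever it provisioned for `removed` services needs no separate cleanup process and no coordination with the team that stopped the service.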
Helps innovation
Real decoupling
You can start your new project on your laptop
Templating systems create your configs easily
No migration costs anymore, we don’t configure tools
Semantics are better than YAML config files
Q&A / Demo
