Charity Majors
@mipsytipsy
There and back again: a Chef tale
How we drank the Kool-Aid, sobered up, and
learned to cook responsibly.
Mobile apps platform
500k+ apps
AWS
MongoDB, Cassandra, MySQL, Redis
ruby & rails => golang
Our mission:
• Support relentless growth
• Ship products fast
• Solve mobile apps naively at scale
[Chart: Active monthly Parse installations]
[Chart: API requests per second]
our mission
your mission
There and Back Again: How We Drank the Chef Kool-Aid, Sobered Up, and Learned to Cook Responsibly
Chef the Base System!!
• bootstrapping nodes with knife-ec2
• configuring system packages
• managing deb versions
• ec2 hostname tags from chef node names
• route53 DNS records from hostname tags
• cron jobs, batch jobs
Chef the Services!!
• haproxy configs
• generate yaml files
• generate host lists
• manage config files for Parse services
• monitoring and graphing based off roles
Chef the Databases!!
• creating/managing mongo replica sets
• provisioning & assembling RAID devices
• assigning cassandra initial tokens
• backups, snapshotting & restores
• community cookbooks for mysql, redis
Chef the Deploys!!
• deploy Parse services?
….??????
wait …
1) Things we did with chef badly
2) Things that chef was not the right tool for
mistakes were made …
• Overloading roles with too much work
• Confusion between role vs instantiation of service
• Using definitions instead of providers
• Using lots of data bags
• One attribute per config entry instead of a hash of all entries
• Using knife search extensively
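The attribute bullet above can be illustrated with a small sketch. It is written in plain Python rather than Chef's Ruby DSL, and the setting names and values are made up for illustration. The idea: keep all entries for one config file in a single hash and render them in a loop, so a new setting is one new key rather than a new attribute plus a new template line.

```python
def render_config(entries: dict) -> str:
    """Render every key/value pair as one 'key value' line, sorted by key."""
    return "\n".join(f"{key} {value}" for key, value in sorted(entries.items()))


# Hypothetical settings, not taken from the talk.
defaults = {"maxconn": 4096, "timeout_connect": "5s"}
overrides = {"maxconn": 8192, "retries": 3}

# Overrides win on conflict; keys present on only one side pass through.
merged = {**defaults, **overrides}
rendered = render_config(merged)
print(rendered)
```

With one attribute per entry, the `retries` setting would have needed its own attribute and its own template line; here the renderer never changes.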
mistakes were made …
• Forking + modifying community cookbooks
• Importing community cookbooks with too many custom dependencies
• Not using repo-per-cookbook / Berkshelf
• Not investing the time into vagrant, unit tests, staging environment, versioning
• Where is my source of truth?!
but these are all solvable problems.
what isn’t?
sometimes, chef just ain’t enough.
Problem areas
• Provisioning from scratch
• Service registration & discovery
• Managing software & configs
• Databases
Provisioning
• bootstrapping from vanilla AMIs
• launching instances with knife-ec2
Solution: bake AMI with chef, use ASGs
Service discovery
realtime search needs realtime data
Solution: ZooKeeper, Consul, etcd, etc.
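A rough sketch of why a dedicated discovery store helps here. Chef search results are only as fresh as each node's last chef-client run, while stores like ZooKeeper, Consul, and etcd can drop a host shortly after it stops checking in. The `Registry` class below is a hypothetical in-memory stand-in, not the API of any of those systems, and the host names and TTL are invented.

```python
import time


class Registry:
    """In-memory stand-in for a discovery store (not a real ZooKeeper/Consul/etcd client)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._last_seen = {}  # (service, host) -> time of last heartbeat

    def heartbeat(self, service, host):
        """Record that host is alive and serving service right now."""
        self._last_seen[(service, host)] = self.clock()

    def lookup(self, service):
        """Return hosts whose last heartbeat is within the TTL."""
        now = self.clock()
        return sorted(
            host
            for (svc, host), seen in self._last_seen.items()
            if svc == service and now - seen <= self.ttl
        )


# A fake clock keeps the example deterministic.
fake_now = [0.0]
registry = Registry(ttl_seconds=10, clock=lambda: fake_now[0])
registry.heartbeat("api", "api1")
registry.heartbeat("api", "api2")

fake_now[0] = 8.0
registry.heartbeat("api", "api2")  # api2 keeps checking in; api1 has gone silent

fake_now[0] = 15.0
live_hosts = registry.lookup("api")
print(live_hosts)  # ['api2']
```

A host list generated from search would keep listing `api1` until the next converge noticed it was gone; here it ages out on its own.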
Service discovery
avoid snowflake hosts
use distributed locking for cron jobs
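One reading of the two points above: install the same cron job on every host in a role and let a lock decide which host actually runs it, so no single host is special. The sketch below uses a hypothetical in-memory `LockStore` in place of a real lock service such as ZooKeeper; the job and host names are invented.

```python
class LockStore:
    """In-memory stand-in for a distributed lock service (hypothetical API)."""

    def __init__(self):
        self._holders = {}  # lock name -> current owner

    def try_acquire(self, name, owner):
        # setdefault stores owner only if nobody holds the lock yet.
        return self._holders.setdefault(name, owner) == owner

    def release(self, name, owner):
        # Only the current holder may release.
        if self._holders.get(name) == owner:
            del self._holders[name]


def run_cron(store, job_name, host, job):
    """Run job only if this host wins the lock. Return True if it ran."""
    if not store.try_acquire(job_name, host):
        return False
    try:
        job()
        return True
    finally:
        store.release(job_name, host)


store = LockStore()
runs = []

# host-a is already mid-run when host-b's cron fires, so host-b skips.
store.try_acquire("nightly_report", "host-a")
skipped = run_cron(store, "nightly_report", "host-b", lambda: runs.append("host-b"))

# Once host-a finishes and releases, any host can take the next run.
store.release("nightly_report", "host-a")
ran = run_cron(store, "nightly_report", "host-b", lambda: runs.append("host-b"))
print(skipped, ran, runs)  # False True ['host-b']
```

A real lock service would also need to handle a holder that dies without releasing, for example with session-bound or expiring locks; this sketch leaves that out.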
Managing software & configs
• System software (debs, rpms)
• Developer-owned services
• Internal operations software
Managing software & configs
System software
Managing software & configs
Developer-owned services
• Do not tie code deploys to system changes
• Perform the minimal set of changes
• Configs *are* software. Version together.
Managing software & configs
Internal operations software
• Treat software engineering like software engineering
• Treat systems-y packages like systems packages
• Package and version “util” scripts
• Manage package versions with Chef
Databases at scale
Databases
DBA operations
Not really what Chef is best at:
• Imperative commands
• Automatic remediation
• Coordinating actions across nodes
Databases
DBA operations
• Create, tear down replica sets or nodes
• Verify backups
• Rolling version upgrade
• Elect new primary / switch masters
• Enable/disable query killer
• Change schemas or indexes
• Compaction, rotation
• Version replica set state
• Etc
Databases
DBA operations
If you don’t have to do a ton of DBA ops, Chef can manage databases.
Don’t over-engineer in advance of your actual needs.
Databases
Separation of configuration and state
Base system => chef
Detect and publish state changes => chef, zk
Generate monitoring configs => chef
Imperative commands => db tooling
Databases at scale
We chef for:
• Building base AMIs
• Generating monitoring configs
• Storing encrypted secrets
• Cron jobs (with zk lock)
• Inferring and publishing db state changes
Things we still suck at
• Single source of truth (git / chef-server)
• Isolated staging environment
• Full continuous testing for cookbooks
Things we don’t chef
• Realtime data
• Internal software packaging & management
• Database administration at scale
Charity Majors
@mipsytipsy
