One-Man Ops
with Puppet & Friends
     Jos Boumans
Operations @ Krux Digital
Can I have another /8 please?

    How you know us
Ubuntu Server
10.04 LTS
AWS Integration
Good guys of
Data Privacy
Not to be confused
Our Traffic

• Serving 4000-10000 user & contextual data requests/second
• Sub-100 ms response times
• Processing ~150 GB of raw data per day
• Twitter: Average ~3000 tweets/second
Our Infrastructure

• Started small on AWS. Now:
• 100 dedicated nodes
• +100-200 on demand Map/Reduce nodes
• Dozens of local development machines
• 20 different types of machines
One-Man Ops team
Sad Panda
Go from here...
... to here
Your Toolkit
Ubuntu 10.04
cloud-init
Uses AMI user-data to bootstrap Puppet on the client

#cloud-config
### Update puppet to 2.6.3
apt_sources:
 - source: "ppa:mathiaz/puppet-backports"
apt_update: true
apt_upgrade: true

ssh-rsa: AAAAB3NzaC.....+ujFHz

puppet:
 conf:
   puppetd:
     server: ""
     # certname %i: instanceid, %f: fqdn of the machine
     certname: "%i.%f"
   ca_cert: |
     -----BEGIN CERTIFICATE----- ....
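The certname template above combines the instance id and the FQDN, which is what later lets tooling key a node's configuration off its certname. A minimal sketch of that expansion and its inverse, assuming the template only ever contains `%i` and `%f`; the function names and the example hostname are hypothetical, not taken from the Krux code:

```python
from typing import Tuple


def build_certname(instance_id: str, fqdn: str, template: str = "%i.%f") -> str:
    """Expand %i (instance id) and %f (fqdn) in a certname template."""
    return template.replace("%i", instance_id).replace("%f", fqdn)


def split_certname(certname: str) -> Tuple[str, str]:
    """Split an "<instance id>.<fqdn>" certname back into its two parts.

    Assumes the instance id itself contains no dot, which holds for
    ids of the form "i-23a3d042".
    """
    instance_id, _, fqdn = certname.partition(".")
    return instance_id, fqdn


# Hypothetical host name, for illustration only.
certname = build_certname("i-23a3d042", "web001.example.com")
print(certname)
print(split_certname(certname))
```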
monthly updates
you can upgrade the kernel
Only AMI that I know of that can do this
Updated software for 10.04
Backported builds for Apache, Memcache, MySQL, PHP, etc.
I may be biased
<3 Elastic Load Balancer
They're free and will save you more than once
<3 S3
(Simple Storage Service)
      Great cheap data retention
      Good poor man's CDN
Tip: Get ExpanDrive for
great SSHFS and S3FS
     Available for Windows and Mac:
RDS > Own MySQL
   Hot Standby - Failover is ~7 minutes
   Read Replicas - Improve read performance

   BUT, you can't replicate out of RDS :(
Use EBS Root
   (Elastic Block Store)
You can reboot and stop/start machines and keep state
  Consider attaching extra EBS volumes for data persistence

Tip: Software RAID across multiple EBS volumes for better IO
</3 Network Partitioning
        This will happen to you a lot

Relying on network connections will decrease
        availability of your machines
</3 Floating
   public IPs
    AWS DHCP server is flaky

  AWS DNS TTL is 60 seconds

Limited number of fixed public IPs
Sort your DNS
  AWS offers

When you go multi data center or have big traffic,
 seriously consider Dyn:
Avoid Single
Points of Failure
       Because they WILL fail.

 Architect for eventually consistent,
 distributed systems where you can.
Remember him..?
Optimize for making
Puppet development EASY
   Bridge the gap between dev & ops

     Tip: use a c1.medium at least
Put your Puppet
  code in VCS
I really don't need to explain why, right?
Run multiple Puppet environments

We put 1 host of each cluster in puppet environment
 development, 1 in staging, the rest in production

         Don't break everything at once :)
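The split described above (one host per cluster in development, one in staging, the rest in production) can be sketched as a small function. This is only an illustration: picking the first two hosts in sorted order is an assumption, as are the function name and the host names.

```python
def assign_environments(hosts):
    """Map each host of ONE cluster to a Puppet environment.

    Assumption for this sketch: hosts are ordered by name, the first goes
    to development, the second to staging, everything else to production.
    """
    assignment = {}
    for index, host in enumerate(sorted(hosts)):
        if index == 0:
            environment = "development"
        elif index == 1:
            environment = "staging"
        else:
            environment = "production"
        assignment[host] = environment
    return assignment


# Hypothetical cluster of four web hosts.
for host, environment in assign_environments(
    ["web003", "web001", "web004", "web002"]
).items():
    print(host, environment)
```

A change then reaches one canary per cluster before it reaches the bulk of the hosts, which is the point of the "don't break everything at once" advice.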
Split your Puppet
 code into modules
     We use: Forge, Components, Services
Use separate init.pp,
params.pp & config.pp
params.pp so you can include variables from elsewhere

              config.pp lets you specify:
           kfoo::config { $fqdn: } in a service
                     and require:
        Kfoo::Config[ $fqdn ] in the component
Use a common
        base class
Set up all the plumbing from users, to apt,
 to filesystems, to mounts, ntp, sudo, git,
        monitoring, ssh, and so on.

      Run it early using run stages
Sample Service
class s_webui {
  include kbase
  include kapache
  include kwebui
  include kredis

  kwebui         { $fqdn: }
  kapache::vhost { $fqdn: ssl => 443 }
  kredis::config { $fqdn: memory => '100M' }
}
Write tools to make
you more productive
Enable developers to run their own Puppet master

         Create new components easily

           Push changes to production

       Our code: /
Your own Puppet server
          & manifests
puppet001:puppet-jib$ screen -S jib.puppetmaster 
  bin/run_puppet_master_locally 8180

Running: sudo puppet master --no-daemonize
 --verbose --debug --masterport 8180
 --pidfile /mnt/tmp/
 --confdir /data/git/puppet-jib/bin/..

notice: Starting Puppet master version 2.6.3
Our Layout
Use an External
           Node Classifier
           Manage your host specific configuration
              separately from your manifests

Our code: /blob/puppet/bin/
Keep node
configuration in an
 editable location
                 We chose S3

Git, LDAP, or anything else that works for you.
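An external node classifier is a program that Puppet calls with the node name as its only argument; it prints the node's classes and parameters as YAML on stdout and exits non-zero when it cannot classify the node. The sketch below is not the Krux script. It assumes an in-memory dict (`NODE_STORE`) as a stand-in for the S3 bucket, a hypothetical certname, and it emits JSON, which YAML parsers generally accept as flow-style YAML.

```python
import json
import sys

# Stand-in for the editable store (S3 in the talk). The certname and its
# contents are hypothetical.
NODE_STORE = {
    "i-23a3d042.dev001.example.com": {
        "classes": ["s_sandbox::jib"],
        "parameters": {
            "puppet_environment": "development",
            "puppet_master_port": 8180,
        },
    },
}


def classify(certname, store=NODE_STORE):
    """Return the classifier document for certname, or None if unknown."""
    config = store.get(certname)
    if config is None:
        return None
    document = {
        "classes": config["classes"],
        "parameters": config["parameters"],
    }
    return json.dumps(document, indent=2, sort_keys=True)


def main(argv):
    """Entry point: argv[1] is the node name passed in by the Puppet master."""
    if len(argv) != 2:
        print("usage: enc <certname>", file=sys.stderr)
        return 2
    document = classify(argv[1])
    if document is None:
        return 1  # non-zero exit tells Puppet the classification failed
    print(document)
    return 0
```

To use it as a script, call `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard and point `external_nodes` at the file.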
Sign nodes that have
  a configuration only
        Keyed off their certname, run periodically

                     Inspired by:

 Our code: /blob/puppet/bin/
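The signing policy above boils down to a set intersection: sign a pending certificate request only if a node configuration exists under the same certname. A minimal sketch, assuming the list of pending requests and the list of configured certnames have already been fetched by other means; the function name and sample data are hypothetical:

```python
def certs_to_sign(pending, configured):
    """Return the pending certnames that have a stored node configuration.

    pending:    certnames with an outstanding signing request
    configured: certnames for which a node configuration exists
    """
    return sorted(set(pending) & set(configured))


pending = ["i-11111111.web001.example.com", "i-99999999.rogue.example.com"]
configured = ["i-11111111.web001.example.com", "i-22222222.web002.example.com"]
print(certs_to_sign(pending, configured))
```

Run from cron, a loop over the result would hand each name to the Puppet certificate tooling; requests without a matching configuration simply stay unsigned.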
Master Puppet.conf
node_terminus = exec
external_nodes = /usr/bin/ --bucket instances
reports        = http, store, foreman

### different puppet environments: development, staging, production
templatedir = $confdir/env/development/templates
modulepath = $confdir/env/development/krux-modules:

Sample Configuration
{ 'classes': ['s_sandbox::jib'],
  'parameters': {
  'zone':                 'us-east-1c',
  'instance_type':         'c1.medium',
  'instance_id':           'i-23a3d042',
  'security_group':         'krux-ops-dev',
  'puppet_environment': 'development',
  'puppet_master_port': 8180,
  'kredis_save_to_disk': 0,
  'certname':                '
Attend a Puppet
    Master Training!
            No, I don't get a kick back :)
... avoid becoming him
 Reports & Alerts
   This feature alone makes it worth installing.

      Run it on the same host as your
     Puppet master for minimal friction
Dashboard / Browser
   Node Classifier

    We are happy with our S3-based solution

       YMMV though: do look into it!
Initiate Puppetrun

      Couldn't get it to work though :(
Python Boto & s3cmd
$ s3cmd put file.txt
Great for cronjobs, maintenance tasks & file syncs

  Consider s3://my-dropbox for your company

boto: Full python API
   access to AWS
        Boto + AWS + Puppet
     Real 'Infrastructure as Code'
     Launch AWS nodes
          Manage zone, security group, type, AMI,
              puppet class, EBS, hostname

              Bootstraps the node for puppet,
          integrates with external node classifier

Our code: /blob/aws/bin/
$ -t m1.large -z us-east-1a -a 10
  -H -s mycorp-development
  ami-2ec83147 s_development

Starting instance of ami ami-2ec83147 - this may take a while
......... started i-12345678

Attaching 10gb volume to instance i-12345678 - this may take a while
..... attached vol-87654321

Created these DNS entries: =>

Wrote configuration to S3 key:
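The command line shown above can be mirrored with `argparse` to show how launch options might turn into the node configuration that the classifier later serves. This is a sketch only: the long option names, the meaning of `-a` (volume size in GB) and `-H` (hostname), and the output keys are assumptions based on the sample invocation and the sample configuration, and no AWS call is made.

```python
import argparse


def build_parser():
    """Command-line parser mirroring the flags in the sample invocation."""
    parser = argparse.ArgumentParser(description="Describe a node to launch (sketch)")
    parser.add_argument("-t", "--type", default="m1.large", help="instance type")
    parser.add_argument("-z", "--zone", required=True, help="availability zone")
    parser.add_argument("-a", "--attach-gb", type=int, default=0,
                        help="size in GB of an extra EBS volume (assumed meaning)")
    parser.add_argument("-H", "--hostname", help="hostname to register (assumed meaning)")
    parser.add_argument("-s", "--security-group", required=True)
    parser.add_argument("ami", help="AMI id to boot")
    parser.add_argument("puppet_class", help="Puppet class for the node")
    return parser


def node_config(args):
    """Build a node configuration shaped like the sample configuration."""
    return {
        "classes": [args.puppet_class],
        "parameters": {
            "zone": args.zone,
            "instance_type": args.type,
            "security_group": args.security_group,
        },
    }


args = build_parser().parse_args(
    ["-t", "m1.large", "-z", "us-east-1a", "-a", "10", "-H", "dev001",
     "-s", "mycorp-development", "ami-2ec83147", "s_development"]
)
print(node_config(args))
```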
       Manage & Sync
     Programmatically manage your security groups
           keep groups in sync across regions

Our code: /blob/aws/bin/
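Keeping a security group in sync across regions is at heart a diff between two rule sets. The sketch below computes which rules to authorize and which to revoke in the target region. The rule representation (protocol, from port, to port, CIDR) and the function name are assumptions for illustration; fetching and applying rules through the AWS API is left out.

```python
def plan_sync(source_rules, target_rules):
    """Compute the changes that make target_rules equal to source_rules.

    Each rule is a tuple: (protocol, from_port, to_port, cidr).
    """
    source = set(source_rules)
    target = set(target_rules)
    return {
        "authorize": sorted(source - target),  # missing in the target region
        "revoke": sorted(target - source),     # extra in the target region
    }


us_east = [("tcp", 22, 22, "10.0.0.0/8"), ("tcp", 443, 443, "0.0.0.0/0")]
us_west = [("tcp", 22, 22, "10.0.0.0/8"), ("tcp", 8080, 8080, "0.0.0.0/0")]
print(plan_sync(us_east, us_west))
```

Computing the plan separately from applying it makes a dry run trivial, which is useful before revoking anything in production.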
Monitoring & Graphing
Free developer
            1 Free node with all features,
         unlimited nodes with basic features
         Free: HTTP(S), PING, SSH, DNS, TCP
Premium: HTTP JSON(!), Custom plugins, MySQL, Apache
                  mod_status, etc.

        Get a 2nd free node through referral:
Performance Graphs
Puppet classes &
       config information

Monitoring & Alerts
Generate your
  cloudkick.conf from
  Use puppet classes, tags, colors as you define them
                  as cloudkick tags

Our code for doing so:
Cloudkick Gem for
    Uses your cloudkick tags to do node selection,
which are based straight off your puppet classes & facts
Cloudkick pssh
$ cloudkick pssh --query 'node:redis-c*' 'hostname'

[1] 18:38:23 [SUCCESS]
[2] 18:38:23 [SUCCESS]
[3] 18:38:24 [SUCCESS]
[4] 18:38:24 [SUCCESS]
Krux Improvements:
 pscp, listing nodes
           Get it from our github:

          Fork and contribute!
Cloudkick list
$ cloudkick list --full --query 'node:redis-c*'

# Name            IP                Type         Zone
redis-c-master001     m2.4xlarge   us-east-1a
redis-c-slave001      m2.4xlarge    us-east-1a
redis-c-slave002      m2.4xlarge   us-east-1b
redis-c-slave004       m2.4xlarge   us-east-1d
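Queries such as `node:redis-c*` select nodes by a shell-style wildcard on the node name. The following sketch shows that kind of selection with Python's `fnmatch` module. It is not how the Cloudkick tools implement queries; the function name and the restriction to the `node:` prefix are assumptions.

```python
from fnmatch import fnmatchcase


def select_nodes(query, node_names):
    """Return the node names matching a 'node:<glob>' query, in input order."""
    kind, _, pattern = query.partition(":")
    if kind != "node" or not pattern:
        raise ValueError("only 'node:<pattern>' queries are supported in this sketch")
    return [name for name in node_names if fnmatchcase(name, pattern)]


nodes = [
    "redis-c-master001",
    "redis-c-slave001",
    "redis-b-master001",
    "web001",
]
print(select_nodes("node:redis-c*", nodes))
```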
Take away:
Measure Everything!
                Further reading:

  PagerDuty for cell phone/pager/email alerts
  New Relic for more in-depth app monitoring
MCollective for more advanced task parallelization
Just one more thing....
VirtualBox + Ubuntu
   + Puppet = JFDI
     Use same puppet infrastructure to provision
               dev machines locally

Put it on a USB stick, be up and running in 30 minutes

Our code for doing so:
Thank You!
Slides at:

  Follow us: @KruxEngineering

  We're Hiring:

One-Man Ops

  • 1. One-Man Ops with Puppet & Friends Jos Boumans Operations @ Krux Digital
  • 3. Can I have another /8 please? How you know us
  • 10. Not to be confused with...
  • 11. Our Traffic • Serving 4000-10000 user & contextual data requests/second • Sub 100 ms response times • Processing ~150 gb of raw data per day • Twitter: Average ~3000 tweets/second
  • 12. Our Infrastructure • Started small on AWS. Now: • 100 dedicated nodes • +100-200 on demand Map/Reduce nodes • Dozens of local development machines • 20 different types of machines
  • 19. cloud-init Uses AMI user-data to bootstrap puppet on the client
  • 20. #cloud-config
        ### Update puppet to 2.6.3
        apt_sources:
         - source: "ppa:mathiaz/puppet-backports"
        apt_update: true
        apt_upgrade: true
        ssh-rsa: AAAAB3NzaC.....+ujFHz
        puppet:
         conf:
           puppetd:
             server: ""
             # certname %i: instanceid, %f: fqdn of the machine
             certname: "%i.%f"
           ca_cert: |
             -----BEGIN CERTIFICATE----- ....
  • 22. you can upgrade the kernel The only AMI I know of that can do this grub-kernels-for-kernel-upgrades/
  • 23. Updated software for 10.04 Backported builds for Apache, Memcache, MySQL, PHP, etc
  • 24. I may be biased
  • 25. AWS
  • 26. <3 Elastic Load Balancer They're free and will save you more than once
  • 27. <3 S3 (Simple Storage Service) Great cheap data retention Good poor man's CDN
  • 28. Tip: Get ExpanDrive for great SSHFS and S3FS Available for Windows and Mac:
  • 29. RDS > Own MySQL Hot Standby - Failover is ~7 minutes Read Replicas - Improve read performance BUT, you can't replicate out of RDS :(
  • 30. Use EBS Root (Elastic Block Storage) You can reboot and stop/start machines and keep state Consider attaching extra EBS for data persistence Tip: Software raid for multiple EBS drives for better IO
  • 31. </3 Network Partitioning This will happen to you a lot Relying on network connections will decrease availability of your machines
  • 32. </3 Floating public IPs AWS DHCP server is flaky AWS DNS TTL is 60 seconds Limited number of fixed public IPs
  • 33. Sort your DNS AWS offers When you go multi data center or have big traffic, seriously consider Dyn:
  • 34. Avoid Single Points of Failure Because they WILL fail. Architect for eventually consistent, distributed systems where you can.
  • 37. Optimize for making Puppet development EASY Bridge the gap between dev & ops Tip: use a c1.medium at least
  • 38. Put your Puppet code in VCS I really don't need to explain why, right?
  • 39. Run multiple Puppet environments We put 1 host of each cluster in puppet environment development, 1 in staging, the rest in production Don't break everything at once :)
  • 40. Split your Puppet code into modules We use: Forge, Components, Services
  • 41. Use separate init.pp, params.pp & config.pp Params.pp so you can include variables from elsewhere Config.pp lets you specify: kfoo::config { $fqdn } in a service and require: Kfoo::Config[ $fqdn ] in the component
  • 42. Use a common base class Set up all the plumbing from users, to apt, to filesystems, to mounts, ntp, sudo, git, monitoring, ssh, and so on. Run it early using run stages
  • 43. Sample Service
        class s_webui {
            include kbase
            include kapache
            include kwebui
            include kredis
            kwebui { $fqdn: }
            kapache::vhost { $fqdn: ssl => 443 }
            kredis::config { $fqdn: memory => '100M' }
        }
  • 44. Write tools to make you more productive Enable developers to run their own Puppet master Create new components easily Push changes to production Our code: /
  • 45. Your own Puppet server & manifests
        puppet001:puppet-jib$ screen -S jib.puppetmaster bin/run_puppet_master_locally 8180
        Running: sudo puppet master --no-daemonize --verbose --debug --masterport 8180 --pidfile /mnt/tmp/ --confdir /data/git/puppet-jib/bin/..
        .....
        notice: Starting Puppet master version 2.6.3
        .....
  • 46. Our Layout
        $git/
          bin/
          env/
            development/
              forge/
              krux-modules/
              services/
            staging/
              ...
            production/
              ...
  • 47. Use an External Node Classifier Manage your host specific configuration separately from your manifests Our code: /blob/puppet/bin/
  • 48. Keep node configuration in an editable location We chose S3. Git, LDAP, or anything else that works for you.
  • 49. Sign only nodes that have a configuration Keyed off their certname, run periodically Inspired by: puppet-in-uecec2-puppet-support-in-ubuntu-images/ Our code: /blob/puppet/bin/
  • 50. Master Puppet.conf
        [master]
        .......
        node_terminus = exec
        external_nodes = /usr/bin/ --bucket instances
        reports = http, store, foreman
        ### different puppet environments: development, staging, production
        [development]
        templatedir = $confdir/env/development/templates
        modulepath = $confdir/env/development/krux-modules:$confdir/env/development/forge:$confdir/env/development/services
        [....]
  • 51. Sample Configuration
        {
          'classes': ['s_sandbox::jib'],
          'parameters': {
            'zone': 'us-east-1c',
            'instance_type': 'c1.medium',
            'instance_id': 'i-23a3d042',
            'security_group': 'krux-ops-dev',
            'puppet_environment': 'development',
            'puppet_master_port': 8180,
            'kredis_save_to_disk': 0,
            'certname': ' 47334fd8-1516-451d-bd5a-8760ab2a36c0',
          }
        }
  • 52. Attend a Puppet Master Training! No, I don't get a kick back :)
  • 55. Email Reports & Alerts This feature alone is worth installing it. Run it on the same host as your Puppet master for minimal friction Summarized_E-Mail_Reports
  • 57. Theoretically: Node Classifier External_Nodes We are happy with our S3-based solution YMMV though: do look into it!
  • 59. Python Boto & s3cmd
  • 60. $ s3cmd put file.txt s3://my-bucket Great for cronjobs, maintenance tasks & file syncs Consider s3://my-dropbox for your company
  • 61. boto: Full python API access to AWS Boto + AWS + Puppet = Real 'Infrastructure as Code'
  • 62. Launch AWS nodes Manage zone, security group, type, AMI, puppet class, EBS, hostname Bootstraps the node for puppet, integrates with external node classifier Our code: /blob/aws/bin/
  • 63. $ -t m1.large -z us-east-1a -a 10 -H -s mycorp-development ami-2ec83147 s_development
        Starting instance of ami ami-2ec83147 - this may take a while ......... started i-12345678
        Attaching 10gb volume to instance i-12345678 - this may take a while ..... attached vol-87654321
        Created these DNS entries: =>
        Wrote configuration to S3 key: s3://instances/
  • 64. Manage & Sync Programmatically manage your security groups and keep groups in sync across regions Our code: /blob/aws/bin/
  • 66. Free developer account 1 Free node with all features, unlimited nodes with basic features Free: HTTP(S), PING, SSH, DNS, TCP Premium: HTTP JSON(!), Custom plugins, MySQL, Apache mod_status, etc. Get a 2nd free node through referral:
  • 68. Puppet classes & config information Monitoring & Alerts
  • 69. Generate your cloudkick.conf from Puppet Use puppet classes, tags, colors as you define them as cloudkick tags Our code for doing so:
  • 70. Cloudkick Gem for parallel-ssh Uses your cloudkick tags to do node selection, which are based straight off your puppet classes & facts
  • 71. Cloudkick pssh
        $ cloudkick pssh --query 'node:redis-c*' 'hostname'
        [1] 18:38:23 [SUCCESS]
        [2] 18:38:23 [SUCCESS]
        [3] 18:38:24 [SUCCESS]
        [4] 18:38:24 [SUCCESS]
  • 72. Krux Improvements: pscp, listing nodes Get it from our github: Fork and contribute!
  • 73. Cloudkick list
        $ cloudkick list --full --query 'node:redis-c*'
        # Name              IP   Type         Zone
        redis-c-master001        m2.4xlarge   us-east-1a
        redis-c-slave001         m2.4xlarge   us-east-1a
        redis-c-slave002         m2.4xlarge   us-east-1b
        redis-c-slave004         m2.4xlarge   us-east-1d
  • 74. Take away: Measure Everything! Further reading: Pagerduty for cell phone/pager/email alerts New Relic for more in depth app monitoring MCollective for more advanced task parallelization
  • 75. Just one more thing....
  • 77. VirtualBox + Ubuntu + Puppet = JFDI Use same puppet infrastructure to provision dev machines locally Put it on a USB stick, be up and running in 30 minutes Our code for doing so:
  • 79. Slides at: Follow us: @KruxEngineering We're Hiring:
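
A minimal sketch of the External Node Classifier idea from slides 47-51, in Python. It is not the Krux script: the in-memory `NODE_CONFIGS` dict stands in for the S3 bucket, and `example-certname`, `to_enc_yaml`, `classify`, `signable`, and `main` are hypothetical names chosen for illustration. It assumes only Puppet's ENC contract: the executable receives the node name as its first argument, prints a YAML hash with `classes` and `parameters`, and exits non-zero for a node it does not know.

```python
import json
import sys

# Stand-in for the S3 bucket of node configurations (an assumption for this
# sketch). Keys are certnames; the entry mirrors the sample configuration
# from the slides, under a hypothetical certname.
NODE_CONFIGS = {
    "example-certname": {
        "classes": ["s_sandbox::jib"],
        "parameters": {
            "zone": "us-east-1c",
            "instance_type": "c1.medium",
            "puppet_environment": "development",
            "puppet_master_port": 8180,
        },
    },
}


def to_enc_yaml(config):
    """Render one node configuration as a YAML hash of classes and parameters."""
    classes = config.get("classes", [])
    parameters = config.get("parameters", {})
    lines = ["---"]
    if classes:
        lines.append("classes:")
        # JSON scalars (quoted strings, numbers) are also valid YAML scalars.
        lines.extend("  - " + json.dumps(name) for name in classes)
    else:
        lines.append("classes: []")
    if parameters:
        lines.append("parameters:")
        for key in sorted(parameters):
            lines.append("  {}: {}".format(key, json.dumps(parameters[key])))
    else:
        lines.append("parameters: {}")
    return "\n".join(lines) + "\n"


def classify(certname, configs=NODE_CONFIGS):
    """Return the YAML for a known certname, or None for an unknown one."""
    config = configs.get(certname)
    return None if config is None else to_enc_yaml(config)


def signable(pending_certnames, configs=NODE_CONFIGS):
    """Filter pending certificate requests down to those with a stored config."""
    return [name for name in pending_certnames if name in configs]


def main(argv):
    """ENC entry point: argv[1] is the node name passed in by the Puppet master."""
    if len(argv) != 2:
        sys.stderr.write("usage: enc <certname>\n")
        return 2
    output = classify(argv[1])
    if output is None:
        return 1  # unknown node: report failure so it gets no catalog
    sys.stdout.write(output)
    return 0
```

To use something like this as a real classifier, the script would end with `sys.exit(main(sys.argv))` and the dict lookup would be replaced by a fetch from wherever the configurations live. `signable` models the periodic signing job: only certificate requests whose certname already has a configuration are passed on for signing.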