Shipping NodeJS with 
Docker and CoreOS 
@RossKukulinski 
BayNode Talk Night 
November 20, 2014
@RossKukulinski 
@RossKukulinski 
SpeakIt.io Founder 
BayNode Co-Organizer 
Soccer Fanatic 
Node-Forward Mentor
What I’m going to Cover 
@RossKukulinski 
• Our Story 
• Background on Docker & CoreOS 
• Tips & Tricks / Lessons Learned
* History of SpeakIt 
* We were the Future-Tech / Labs group, based in CA, but HQ in Virginia 
* Tried lots of audio/video conferencing tools; none worked well enough for us 
* We wanted 
* high-quality audio 
* no user accounts or 18-digit codes 
* runs in a web browser, one click on a link 
* support for 1-on-1 as well as company-wide meetings 
* Parent company builds real-time communication for flight simulators and military training 
* why don’t we build our own tool? 
* NodeJS, websockets/socket.io 
* WebRTC 
* Proprietary high-performance audio mixing platform
The internal tool that wasn’t 
internal anymore 
@RossKukulinski 
SpeakIt was well received, and we used it internally 
But then we started using it for customer calls and open source projects 
Other people wanted in… so we opened SpeakIt up to the public via a closed beta 
It was a side project that grew with new features, capabilities, etc., while staying one big monolithic app
@RossKukulinski 
When we decided to spin SpeakIt off, we realized that to actually build and iterate quickly we needed to break up our monolithic application into smaller microservices. 
At the time, new devs needed to understand the whole application to make changes, which slowed down our on-boarding process. 
We also knew this re-architecture would require a complete rework of our build, test, and deployment infrastructure. 
So we took a deep breath, planned things out, and got to work.
@RossKukulinski 
12factor.net 
As we were planning out the reworked architecture for the new SpeakIt, we drew heavily from the idea of a 12-factor app. 
I highly recommend you take a close look at this site; in my opinion it offers great guidelines for building networked applications.
@RossKukulinski 
Our Goals 
• Reduce application complexity (do one thing well!) 
• Scalable 
• Fault tolerant 
• Support running multiple versions of the same app 
• Consistent app from dev → test → staging → prod 
• Minimize time spent doing ‘devops’ 
People from the Node community should be familiar with the idea of building small modules that do one thing well. We wanted to take the same approach at the 
application layer. Each app should do one thing and do it well. For example, we have an app that manages file uploads, another for managing user accounts, 
another for managing conference rooms, etc.
@RossKukulinski 
Docker 
I’ve been tracking the Docker project since it was first open sourced in 2013. We did some experiments with version 1.0 and were pretty happy.
VM vs Docker 
@RossKukulinski 
https://docker.com/whatisdocker/ 
In some ways, you can think of Docker containers as lightweight virtual machines. Docker was originally based on LXC, or Linux Containers, but earlier this year they 
moved to their own libcontainer library. 
One big difference is that Docker containers usually run only _one_ process, whereas your VM might run lots of processes. Since Docker containers are so lightweight, 
it’s easy to run lots of containers, one per process.
@RossKukulinski 
• Containers start quickly 
• Containers have small footprint 
• Dockerized applications run anywhere 
• Really fast builds via cached images 
• Registry for storing images from build pipeline 
• Images can be layered 
• Abstracts app networking from system networking
@RossKukulinski 
Our Goals 
• Reduce application complexity (do one thing well!) 
• Scalable 
• Fault tolerant 
• Run multiple versions of the same app 
• Consistent app from dev → test → staging → prod 
• Minimize time spent doing ‘devops’ 
So by using Docker as our runtime, we could split up our monolithic application into several smaller pieces, each running inside its own container. 
Because Docker abstracts the networking and filesystem for each app, we could run multiple versions of the same app on the same host. This is helpful for A/B testing 
of changes as well as allowing us to scale our applications out horizontally. 
Finally, because containers are build once, run anywhere… this helps eliminate the ‘but it runs on my machine’ problem.
How do you ship 
docker containers? 
@RossKukulinski 
Bash scripts (ugh) 
Ansible / Puppet / Chef 
So then the question became: OK, how do we actually get our containers into the wild? 
Obviously, the Docker registry would be a central tenet, but there need to be some orchestration tools around it.
Linux for Massive Server Deployments 
@RossKukulinski 
This is where CoreOS entered the picture for us. We had an opportunity to sit down with some seasoned Rackspace developers who had worked on their onMetal 
product. onMetal launched supporting CoreOS, and Rackspace has been really helpful to us in planning and building our new architecture. After a lengthy discussion 
with them, we came to the conclusion that we should try out CoreOS.
@RossKukulinski 
• Minimal Operating System 
• Automated software updates 
• Runs docker containers 
• Supported by all major cloud providers 
• Can also run on bare metal 
https://coreos.com/
@RossKukulinski 
Fault Tolerant 
• Clustered by default 
• Support for multiple HA zones 
• Distributed tools like etcd & fleet 
• HTTP Key-Value Store 
• Service Discovery 
• Application Scheduling 
https://coreos.com/ 
etcd is a distributed, highly available key-value store that runs in your CoreOS cluster. It’s similar to ZooKeeper in that it can be used for service discovery and shared 
configuration.
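For example, a service instance can announce itself by writing a key that other services then read. A hypothetical sketch using etcdctl (the key layout, address, and TTL are illustrative): 
  # announce a running instance; the TTL makes the entry expire if it isn't refreshed 
  etcdctl set /services/accounts/instance-1 '10.0.0.5:49153' --ttl 60 
  # any other service can now discover it 
  etcdctl get /services/accounts/instance-1 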
@RossKukulinski 
Scalable 
https://coreos.com/ 
What’s cool about CoreOS is that the topology is immensely scalable. 
The core cluster running etcd is limited to 9 nodes, but worker machines can join as consumers of etcd and have fleet schedule Docker containers onto them. This allows you to 
scale (and scale down) your resources quickly and efficiently.
@RossKukulinski 
Goals 
• Reduce application complexity (do one thing well!) 
• Scalable 
• Fault tolerant 
• Run multiple versions of the same app 
• Consistent app from dev → test → staging → prod 
• Minimize time spent doing ‘devops’ 
So with CoreOS and Docker, we’ve checked everything off our list. 
There are definitely some missing pieces of the puzzle. There’s no great automated way of hooking your build/test system (Jenkins, for us) up to deploy new services to 
your CoreOS cluster. 
There are a few systems that are trying to take this on but are pretty early in development. In the meantime, we’re building our own (open source) tool to handle these 
rolling deploys.
Now for the good stuff 
@RossKukulinski 
Lessons Learned / Tips & Tricks
Docker Registry 
@RossKukulinski 
• Public Registry 
• Private Registry 
• Quay.io / DockerHub 
• Run your own 
• Take advantage of layering docker images 
* The Docker registry is great (though not as great as npm!) 
* Published Docker images are a great starting point 
* Definitely use them for your underlying OS 
* But for software I’ve found them to be out of date, and tagged versions _tend_ to be old or not quite usable 
* I often copy/paste a public Dockerfile and make my own version, then push to either the public registry or our own private one 
* Private registries seem to be hit or miss. We tried Quay.io and Docker Hub and were not impressed with their performance for downloading images. I spoke with some 
senior folks at Quay (now owned by CoreOS) about our performance problems and they say they’ve made improvements, especially on the AWS <-> Rackspace network 
connection. 
* We ultimately decided to run our own registry _inside_ our datacenter for fast docker push/pulls. 
* I do recommend using SSD-backed storage for that, however, and you should probably take advantage of elastic block storage (or whatever it’s called in your hosting 
system). 
* I should also note that the registry runs _inside_ a Docker container. So that means you’re storing Docker images inside a Docker container. Yo dog, I heard 
you like images.
@RossKukulinski 
speakit/nodejs 
This is an example Dockerfile for building a NodeJS image. This is what we use for our base NodeJS images, and we have published it on GitHub and Docker Hub. 
An important note regarding Dockerfile builds: docker build by default caches each build step (each RUN, ADD, and so on). This is great, because subsequent builds 
only need to start at the first step that has changed. Unfortunately, if your build does things over the network (like yum update / yum install, npm install), Docker doesn’t 
know whether the network resources have changed — so it just uses the cache. This is why we added an ENV LAST_UPDATED to line 4. If we want to update this 
image’s packages, we just need to bump the LAST_UPDATED env line, and then the next Docker build will start at line 4 and grab the latest packages. 
It’s important to note that docker build does have a --no-cache flag which will force all steps to be rebuilt every time. This ensures your builds are always up to date… but 
they will also take longer to complete. The choice is yours.
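The slide itself showed the Dockerfile, which doesn't survive in this transcript; here is a minimal sketch of what a base Node image using the LAST_UPDATED trick might look like (the base image, Node version, and packages are illustrative, not our exact file): 
  FROM centos:centos7 
  
  # Bump this date to bust the build cache and pull fresh packages on the next build 
  ENV LAST_UPDATED 2014-11-20 
  
  RUN yum -y update && yum -y install tar curl 
  RUN curl -sL http://nodejs.org/dist/v0.10.33/node-v0.10.33-linux-x64.tar.gz \ 
      | tar xz --strip-components=1 -C /usr/local 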
Node App Dockerfile 
@RossKukulinski 
This is an example Dockerfile for one of our NodeJS apps. 
It’s important to note that we add our package.json file first and run npm install (which can take a while), and only then add our source code. 
This way, if you change your source, a new docker build only needs to copy in your new source and can reuse the cached npm install layer. 
Note that you might also want to add ENV NPM_INSTALLED <date> above the npm install step; otherwise subsequent builds will not grab the latest packages.
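Again, the original slide carried the Dockerfile itself; a minimal sketch of that layering (the base image, paths, port, and entry point are illustrative): 
  FROM speakit/nodejs 
  
  # Dependencies first: this layer stays cached until package.json changes 
  ADD package.json /app/package.json 
  RUN cd /app && npm install --production 
  
  # Source changes only invalidate the layers from here down 
  ADD . /app 
  WORKDIR /app 
  
  EXPOSE 8080 
  CMD ["node", "server.js"] 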
Private GitHub Repos 
@RossKukulinski 
If you need to npm install or git clone from private repos, then you’ll need to add your private key to the image. We have a special SSH key for our automated build system (that’s 
tied to a robot GitHub account in our organization). 
This allows us to git clone private GitHub repos in later images. 
We remove the id_rsa key in our application Dockerfiles so that our private key is never in a container outside our private network. 
It’s been pointed out to me that we could take advantage of Docker’s new ONBUILD instruction in this Dockerfile. ONBUILD would allow us to add package.json, npm install, 
git clone, and then remove the SSH key in an automated way. That’s something we might experiment with.
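One way this pattern can look; a hedged sketch (the key path, base image, and steps are assumptions, and note that a removed key still exists in earlier image layers, so keep such images inside your private registry): 
  FROM speakit/nodejs 
  
  # Deploy key for our build robot; used only inside the private build network 
  ADD id_rsa /root/.ssh/id_rsa 
  RUN chmod 600 /root/.ssh/id_rsa && \ 
      ssh-keyscan github.com >> /root/.ssh/known_hosts 
  
  # npm install can now pull private git dependencies 
  ADD package.json /app/package.json 
  RUN cd /app && npm install --production 
  
  # Remove the key so it isn't present in the running container 
  RUN rm /root/.ssh/id_rsa 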
> docker pull image:latest 
@RossKukulinski 
The docker pull command by default pulls all tagged versions of your image. If you have automated builds, this pull can take a _long_ time. I highly recommend only 
pulling the :latest tag, or, if you know the specific version you want, pulling that tag. 
For us, our automated builds tag images with the git SHA and Jenkins build number so we can easily pull down any version from our private registry.
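For example (the registry host and tag scheme here are illustrative): 
  # Pull only the tag you need instead of every tag in the repository 
  docker pull registry.example.com:5000/speakit/accounts:latest 
  
  # Or pull one exact build, tagged with git SHA plus Jenkins build number 
  docker pull registry.example.com:5000/speakit/accounts:3fa2c71-112 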
Local CoreOS Dev 
• Can use Vagrant with a single (or multi) node cluster 
• Digital Ocean pretty cheap for small cluster 
@RossKukulinski 
In theory you can use Vagrant for local CoreOS development testing. We’ve gotten it to work, but we ran into so many problems over the last 2-3 weeks that we’ve 
scrapped it. 
Instead we’re taking advantage of cost-effective VMs in the cloud to provision a CoreOS machine (or three) per developer on demand.
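If you do want to try the Vagrant route, the coreos-vagrant repo is the usual starting point; a quick sketch (cluster size and other settings live in config.rb): 
  git clone https://github.com/coreos/coreos-vagrant.git 
  cd coreos-vagrant 
  
  # config.rb controls instance count; user-data holds your cloud-config 
  cp config.rb.sample config.rb 
  cp user-data.sample user-data 
  
  vagrant up 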
@RossKukulinski 
Monitoring 
• CLI tools (fleetctl via ssh) 
• CoreGI (github.com/astilabs/CoreGI) 
• cAdvisor (github.com/google/cadvisor) 
I’m all for having great command-line tools. Fleet & etcd are pretty decent overall, but if you’ve got more than 20 services or so (we run almost 100 in production), it 
quickly becomes annoying to monitor the state of your cluster. 
We built an open source tool called CoreGI that is a simple NodeJS/AngularJS application that runs in a container (duh) on every host in your CoreOS cluster. CoreGI 
provides an easy at-a-glance display of the machines, units, and unit-files in your cluster. 
If you want a more in-depth look at the performance of your containers, take a look at cAdvisor, which gives per-container CPU, memory, and network activity. cAdvisor 
has a REST API, so I expect we’ll eventually add that information to CoreGI.
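For the CLI route, fleetctl can drive the whole cluster over SSH from your laptop; a quick sketch (the tunnel address and unit name are illustrative): 
  # Tunnel fleetctl through any machine in the cluster 
  export FLEETCTL_TUNNEL=203.0.113.10 
  
  fleetctl list-machines               # hosts in the cluster 
  fleetctl list-units                  # state of every scheduled unit 
  fleetctl journal accounts@1.service  # logs for a single unit 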
Service Discovery 
@RossKukulinski 
• Don’t hardcode the host port of your container 
• Sidekick pattern -> Write to etcd 
• Confd (github.com/kelseyhightower/confd) 
• Vulcan (vulcanproxy.com) 
https://coreos.com/ 
One of the new things for me when building out our new application stack was the idea of service discovery. Each time an instance of a service comes up (e.g. user 
account management), it (or its sidekick) registers the IP & port of the running container with etcd. This allows any other service that needs to make account queries to 
find a running instance of the account REST API. 
This also allows you to quickly scale any one particular service out horizontally by dynamically updating load balancers / proxies (like nginx / HAProxy) based on etcd 
information. A great tool for doing that is confd, written by Kelsey Hightower. 
Another tool that’s still under active development is Vulcan Proxy, which could really help in this area. It was a little _too_ new even for us; nginx/HAProxy are really good 
at what they do.
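The sidekick itself is usually a tiny fleet unit scheduled next to the service it announces. A sketch of the pattern (unit names, the etcd key, address, and port are illustrative; in practice you would discover the mapped host port, e.g. via docker port, rather than hardcode it, and the Redis gist in the resources shows a full example): 
  # accounts-sidekick@.service (runs on the same machine as accounts@%i.service) 
  [Unit] 
  Description=Announce accounts@%i in etcd 
  BindsTo=accounts@%i.service 
  After=accounts@%i.service 
  
  [Service] 
  EnvironmentFile=/etc/environment 
  ExecStart=/bin/sh -c "while true; do etcdctl set /services/accounts/%i ${COREOS_PRIVATE_IPV4}:49153 --ttl 60; sleep 45; done" 
  ExecStop=/usr/bin/etcdctl rm /services/accounts/%i 
  
  [X-Fleet] 
  MachineOf=accounts@%i.service 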
Cloud Load Balancers 
• How do your users access services in CoreOS? 
• Could run Global service with proxy on 80/443 
• Or update cloud load balancers dynamically based on etcd 
• Soon: github.com/astilabs/CoreOS-Cloud-LB 
@RossKukulinski 
So with service discovery, all of the apps/proxies within your cluster can dynamically reconfigure themselves as applications start/stop across your cluster. 
But how do your end-users actually connect to your system? You *could* run a global service on every machine that listens on port 80/443 and proxies traffic 
accordingly. Then your DNS system does round-robin balancing across all the hosts in your system. 
I’m not a huge fan of this simply because of the delays in DNS propagation if a host goes down. This is especially true if you’re taking advantage of auto-scaling of your 
cluster. 
Instead, we’ve created a package called CoreOS-Cloud-LB (we’ll probably rename it to etcd-cloud-lb to avoid a trademark conflict). This tool monitors etcd for load-balancer 
name-to-ip:port mappings and dynamically updates your cloud load balancers accordingly. Right now this tool only supports Rackspace load balancers, but under the 
hood it uses a cross-cloud library called pkgcloud. Once we clean up the documentation a bit, we will be open sourcing this tool (probably the week of 11/23/2014).
npm install -g coreos-cluster-cli 
@RossKukulinski 
If you’re a developer and you quickly want to bring up a CoreOS cluster, check out this cool tool built by Ken Perkins using pkgcloud. 
Right now it’s Rackspace only, but there’s no reason it couldn’t be expanded to other platforms.
Things to watch 
@RossKukulinski 
• Kubernetes 
• Google Container Engine 
• Vulcan Proxy 
• Paz (paz.sh) 
• Panamax (panamax.io) 
• Mesosphere (mesosphere.com) 
I mentioned earlier that the deployment (and rolling deployment) of services is still fairly “roll your own” in the CoreOS ecosystem. Kubernetes is the project to watch in 
this area — it’s what Google is using to automate their new Google Container Engine. I talked with a few of the folks working on the project and they recommended 
taking a serious look at “production” use of Kubernetes in the late-spring / early-summer 2015 timeframe. 
Paz.sh is a project that’s being worked on by the folks at http://www.yld.io/. It looks to be a slightly different take on CoreGI in that it’s focused on the build/delivery 
pipeline of containers to your CoreOS system. I understand that they’re planning on open sourcing Paz in early 2015. 
Panamax is another commercial (open source?) project that’s supposed to help with the automated deployment of containers. We haven’t looked too closely at it, but it’s 
something we’re keeping an eye on. 
Finally, be sure to check out Mesosphere. It’s a combination of CoreOS and Apache Mesos that in theory will be amazingly awesome. It’s still new, and to be honest their 
documentation / walkthrough tutorials are still very immature (nearly non-existent). I sat down with a friend from New Relic a couple weeks ago and we were unsuccessful in 
deploying Node & Redis to a Mesosphere cluster. *sigh*
@RossKukulinski 
Resources 
• Example cloud_config 
• https://gist.github.com/rosskukulinski/9ddff8e5f67a24cc7bb7 
• Full example of sidekick pattern for Redis 
• https://gist.github.com/rosskukulinski/96f7709fa20d7def6b9e 
• PXE Booting CoreOS Post coming soon… 
The example cloud-config includes Rackspace monitoring, using a second hard drive for Docker image storage, private Docker Hub credentials, and dynamically detecting private IPs for 
use in apps. 
Another common pattern with CoreOS is service announcement/discovery through etcd. The exact systemd/etcd configuration can be a little tricky. 
Finally, we’ve gotten PXE booting of CoreOS working in a way that matches our production cloud configuration. We’ll be blogging about / open sourcing some tools in that space very soon.
Other Resources 
@RossKukulinski 
• CoreOS Docs: https://coreos.com/docs/ 
• CoreOS User Google Group 
• #coreos & #docker on FreeNode (I’m ‘rossk’) 
• SpeakIt GitHub (https://github.com/astilabs) 
• SpeakIt Blog (https://blog.speakit.io) 
Definitely check out the CoreOS docs as well as the CoreOS User & Dev Google Groups. 
#coreos on FreeNode has a pretty active community. If you’ve got questions, ask there — please feel free to also ping me directly (I’m ‘rossk’ on FreeNode). 
Check out our GitHub and our blog for more posts as we explore the CoreOS / Docker ecosystem.
@RossKukulinski 
Thanks! 
Questions? 
I hope this material proves helpful. 
I’d love to hear your feedback, comments, suggestions, and corrections! You can find me on twitter / github @rosskukulinski and you can email me at ross@speakit.io.
