This document traces the history and development of Heroku Postgres, a database-as-a-service offering. It began as a simple Sinatra app running on Heroku that provisioned Postgres databases for users, and over time grew more sophisticated, adding monitoring processes, a state-machine workflow, and a distributed queue for continuous monitoring; the state model (with states like available and unavailable) was inspired by gaming. The service focuses on durability, shipping WAL segments to multiple datacenters, and on availability, using replication, followers, and forks.
[B6] Heroku Postgres - hgmnz
1. Heroku Postgres
The Tale of Conceiving and Building a Leading Cloud Database Service
Harold Giménez
@hgmnz
2. Heroku Postgres
• Database-as-a-service
• Cloud
• Fully managed
• Over 2 years in production
• From tiny blogs to Super Bowl commercials
Heroku Postgres is a Database as a Service provider
We provision and run databases in cloud infrastructure
It is fully managed, always on and available
It has been in production for over 2 years, and has powered everything from personal blogs to sites backing Super Bowl commercials.
3. Heroku origins
Heroku is born with a vision of increasing developer productivity and agility.
Anyone remember heroku garden? While that product no longer exists, that vision remains
part of our core culture.
We want to enable developers to bring their creations to market as fast and pleasantly as
possible.
4. focus on rails
heroku got in the business of running web applications. As with any startup, it focused on
doing one thing well, and for heroku that was running rails applications.
The approach empowered developers like never before. As a Heroku customer myself back then, I was excited to put hobby apps on the internet on a regular basis. It was so easy.
5. rails apps need a database
Clearly, rails apps need a database. Rails got really good at doing CRUD, after all.
6. web apps need a database
but this is true of any web application
7. thankfully postgres was chosen
The story was something like
“Hey, we need a database. What should we use?”
Heroku was a very small team. The security expert happened to speak up and recommended Postgres, for its correctness track record and fine-grained user role management.
8. otherwise I wouldn’t be here
I’ve been a Postgres user for years and know it is vastly superior to other open source RDBMS
projects. If Postgres had not been chosen, I wouldn’t be here.
9. “let’s make a production grade postgres service”
Heroku would give you a free database whenever you create an app.
One database server would hold a bunch of users.
But this is not sufficient for serious production applications that require exclusive access to more resources and higher availability.
10.
This is our team’s mascot. It is a slide from IBM used in marketing materials in the 70s.
It’s funny how this vision was not true back then, but we are making it a reality over 30 years
later.
12. Heroku Postgres v.0.pre.alpha
• A sinatra app implementing the heroku addons API
• create servers
• install postgres service
• create databases for users - a “Resource”
• Sequel talks to postgres
• stem talks to AWS
Let’s talk about the tools used to build the very first version of Heroku Postgres.
It was built in Ruby.
Sinatra is used to expose the APIs.
Sequel is used to talk to postgres databases, as well as serving as an ORM.
stem was built for this project - a very minimalistic and pleasant interface to the AWS APIs.
stem was made available as open source software.
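A minimal sketch of that first version's shape (illustrative, not the real codebase): Sinatra exposing the add-on provisioning endpoint, Sequel doing the bookkeeping.

require 'sinatra'
require 'sequel'
require 'securerandom'
require 'json'

DB = Sequel.connect(ENV['DATABASE_URL'])  # the app's own bookkeeping database

# called by Heroku when a user adds the addon to an app
post '/heroku/resources' do
  resource = {
    database: "d#{SecureRandom.hex(4)}",
    username: "u#{SecureRandom.hex(4)}",
    password: "p#{SecureRandom.hex(4)}",
    state:    'available'
  }
  DB[:resources].insert(resource)  # Sequel as a query interface
  # ...then create the role and database on a server: stem boots the
  # instance, and Sequel connects to the new postgres to run the DDL...
  { id: resource[:database] }.to_json
end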
13. Two main entities
There are two main entities in this application
14. Resource
{
  database: 'd4f9wdf02',
  port: 5432,
  username: 'uf0wjasdf',
  password: 'pf14fhjas',
  created_at: '2012-05-02',
  state: 'available'
}
A resource encapsulates a database, the actual tangible resource that customers buy. A
customer only cares about the database URL, used to connect to it.
15. Server
{
  elastic_ip: '192.168.0.1',
  instance_id: 'i-2efjoiads',
  ami: 'pg-prod',
  availability_zone: 'us-east-1',
  created_at: '2012-05-02',
  state: 'booting'
}
A server is the physical box where the resource is installed.
Customers don’t have direct access to it. It’s for our own bookkeeping and maintenance.
It includes an IP address, availability zone, and other AWS related attributes.
16. ...and a thin admin web interface
erb templates in sinatra endpoints
The early application also had an admin interface, right in the very same codebase as erb
templates within some sinatra HTTP endpoints.
17. We are just an add-on
The Heroku Postgres offering is just a heroku addon.
18.
There are numerous components to Heroku, one of which is the addons system.
Heroku Postgres is an addon just like any other third party is an addon (such as Papertrail or
Sendgrid). We don’t utilize any backdoors of any kind, and instead interface with the rest of
Heroku in the same way other addon providers do.
This is a great position to be in, because as consumers of the heroku addons ecosystem, we help drive its evolution.
19. we run on
Furthermore, the entire Heroku Postgres infrastructure runs on Heroku itself.
20. the simplest thing
that could possibly work,
but no less
Simplicity is key to building any sort of system, and in this case, the initial version of the
Heroku Postgres management app was as simple as it could be.
This allows us to modify behavior and evolve as quickly as possible, on a smaller, more pleasant code base.
21. We’ve come a long way since then
Fast forward a few years, and we are now managing a very large number of databases,
keeping them alive, and creating new ones at a higher rate than ever.
This requires more sophisticated processes and managers.
Let’s dive into how it works today
22. Monitoring and Workflow
Monitoring and Workflow are key to this type of system.
23. draw inspiration from gaming
In programming we often draw inspiration from a number of things.
A good example is OOP itself, which is inspired by the way messages are sent between
organisms in a biological ecosystem
The project lead (@pvh) has a background in gaming.
Imagine the bad guy in a Diablo game. He’s just wandering around doing nothing, because
there’s nothing to attack around him. At some point, he sees your character and charges
toward you. You battle the Diablo. He fights back, and finally you kill him. He dies a slow and
painful death.
There are many ways to model these kinds of systems. One can be an events based system,
where observers listen on events that are occurring and react to them appropriately. You
could also load all objects that need monitoring and process that queue. This either gets too
complex easily, or doesn’t scale at all because of memory constraints and size of the
workload.
A state machine is another good way to model this. A state machine is, at heart, an entity that
is fed some inputs, and in return it takes some action, and then may or may not transition to
a different state.
The bad guy is in a `wandering around` state when nothing is around him. As soon as he sees your character, he enters a `battle` state, and so on.
We model what happens in real life, which is that we observe our environment, register it, and
react to it.
24.
class Resource
  def feel
    observations.create(
      Feeler.new(self).current_environment
    )
  end
end

class Feeler
  def current_environment
    {
      service_available?: service_available?,
      open_connections: open_connections,
      row_count: row_count,
      table_count: table_count,
      seq_scans: seq_scans,
      index_scans: index_scans
    }
  end
end
This is what the actual source code looks like.
A Resource has a #feel method, which stores an observation based on what the Feeler sees.
A Feeler is an object that observes the current environment around it. It checks things like whether the service is available, how many connections are open, and many more health checks.
25.
class Resource
  include Stateful

  state :available do
    unless service_available?
      transition :unavailable
    end
  end
end

resource = Resource.new
resource.transition :available
resource.feel
resource.tick
puts resource.state
# 'unavailable'
26.
module Stateful
  def self.included(base)
    base.extend ClassMethods
  end

  module ClassMethods
    def state(name, &block)
      states[name] = block
    end

    def states; @states ||= {}; end
  end

  def tick
    self.instance_eval(
      &self.class.states[self.state.to_sym]
    )
  end

  def transition(state)
    # log and assign new state
  end
end
In terms of workflow, we built an extremely simple state machine system.
It allows you to define states via the `state` method which takes an arbitrary block of code to
execute when invoked via the `#tick` method.
27. resource.feel
resource.tick
Need to do this all the time
We first call #feel on an object, and then call #tick on it.
Feel stores new observed information, while #tick uses this information to make system
decisions, such as transitioning to other states, sending alerts, and much more.
We must run these two methods continuously
28. db1 db2 db3 db4 db5 db6 db7 db8 db9 ... dbn
db1.feel
db1.tick
One way to run it continuously is via a work queue.
29. db2 db3 db4 db5 db6 db7 db8 db9 ... dbn db1
db2.feel enqueue(db1)
db2.tick
We create a queue and place all active resources on it. A set of workers pulls jobs from the queue, invokes feel and tick, and then enqueues them again.
This is in essence a poorly implemented distributed ring buffer, and it's served us well.
30. QueueClassic
http://github.com/ryandotsmith/queue_classic
Our queue is implemented on top of the QueueClassic gem, which is a queue system built in
Ruby on top of Postgres with some interesting characteristics.
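A minimal sketch of that ring on top of queue_classic (Monitor and the Resource lookup are illustrative names, not the production code):

require 'queue_classic'

class Monitor
  def self.run(resource_id)
    resource = Resource[resource_id]        # Sequel-style primary-key lookup
    return unless resource                  # deleted resources drop out of the ring
    resource.feel                           # record a fresh observation
    resource.tick                           # act on it, maybe transitioning state
    QC.enqueue('Monitor.run', resource_id)  # back of the line: the "ring"
  end
end

# seed the ring once with every active resource
Resource.where(active: true).each { |r| QC.enqueue('Monitor.run', r.id) }

Worker processes then simply drain the queue, invoking Monitor.run for each job.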
31.
Let’s look at some of the states on our resource class. A resource can go through these
states.
One very important aspect of this system is idempotency. The system must be designed in such a way that each state can be run any number of times without affecting the end result.
Examples where this is not immediately obvious are the creating and deprovisioning states.
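As a sketch (reusing the state-block style from slide 25; the helper predicates are hypothetical), an idempotent creating state checks each step before performing it, so the block can safely run any number of times:

state :creating do
  create_server    unless server_exists?       # safe to re-run after a crash
  install_postgres unless postgres_installed?
  create_database  unless database_exists?
  transition :available if service_available?
end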
32. Durability and Availability
Let’s talk about how we handle durability and availability of databases.
33.
In Postgres, as in other similar systems, when you issue a write transaction, it first writes the transaction to what's called the Write-Ahead Log (WAL), and only then does it write to the data files.
This ensures that all data committed to the system exists first in the WAL stream.
34.
Of course, if the WAL stream is on the same physical disks as the data files, there’s a high
risk of data loss.
Many opt to place the WAL segments on a separate disk from the data files. This is a great first step (and one we also take).
But really, we don't consider data to be durable until the WAL segments are replicated across many data centers.
We ship WAL segments to multi-datacenter storage every 60 seconds. We use WAL-E, a Python WAL archiver written at Heroku and now available as open source.
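At the Postgres level, this kind of continuous archiving looks roughly like the following (a sketch based on WAL-E's documented usage; the envdir path is illustrative, not Heroku's actual configuration):

# postgresql.conf
wal_level       = archive
archive_mode    = on
archive_timeout = 60    # force a segment switch at least every 60 seconds
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'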
35.
Now that the WAL segments are out of the box, we can do many other tricks.
For example, creating a “follower” is as easy as fetching the WAL segments from the
distributed storage, and replaying these logs on a brand new server - once it has caught up,
we set up direct streaming replication between primary and follower.
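In Postgres 9.x terms, the follower's bootstrap is roughly this (a sketch; host and user are placeholders):

# recovery.conf on the new follower
standby_mode     = 'on'
restore_command  = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'
# once caught up from archived WAL, stream directly from the primary:
primary_conninfo = 'host=<primary-host> user=<replication-user>'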
36.
Similarly, a fork of a database pulls down the WAL segments from distributed storage and replays them on a new server.
Once it's caught up, instead of setting up streaming replication as in the follow case, this new server starts producing WAL segments of its own (when write transactions occur on it). So now the fork is set up to ship WAL segments to distributed storage, just like its leader.
37. Continuous Protection
• Write-Ahead Log segments shipped to durable storage every 60 seconds
• We can replay these logs on a new server to recover your data
• https://github.com/heroku/WAL-E
This is what we call Continuous Protection.
Having WAL segments always available is a primary concern of ours, as it allows us to easily
rebuild a server’s data state, and can be updated continuously as opposed to capturing full
backups of the system.
38. Need a more flexible object model
Now, the introduction of all of these functions required us to rethink our object model.
39. timeline
We have the concept of a timeline
A timeline at time = zero contains no data, no commits.
40. participant
Participants are attached to a timeline. Participants can write data to the timeline.
43. resource
A resource is what our users get. It maps to a URL. A resource is attached to one participant.
44. follower
This allows us to model followers easily.
A follower is just a participant on the same timeline as its leader.
The difference is that followers can't write to the timeline. Only one participant can write to the timeline: the follower's leader (or primary).
45. fork
When we fork a database, it creates its own timeline. The new timeline has now diverged from its parent and is writable, so it will create its own path.
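As a sketch of the model (class names and attributes are illustrative, not the production schema), in the style of the Ruby shown earlier:

class Timeline
  attr_reader :parent, :participants

  def initialize(parent: nil)
    @parent       = parent
    @participants = []
  end

  # at most one participant may write to a given timeline
  def writer
    participants.find(&:writer?)
  end
end

class Participant
  def initialize(timeline, writer: false)
    @timeline = timeline
    @writer   = writer
    timeline.participants << self
  end

  def writer?
    @writer
  end
end

# a follower is a read-only participant on the leader's timeline
leader_timeline = Timeline.new
leader          = Participant.new(leader_timeline, writer: true)
follower        = Participant.new(leader_timeline)

# a fork branches a new, writable timeline off its parent
fork_timeline = Timeline.new(parent: leader_timeline)
fork_writer   = Participant.new(fork_timeline, writer: true)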
46. disaster
Finally, this system can be used during the event of catastrophic hardware failure.
When a database's hardware fails completely, instead of trying to recover the server itself, it's best to create a new node and "STONITH" the failed one (http://en.wikipedia.org/wiki/STONITH).
48. recovery
And once it is caught up and ready to go, we tie the resource to it.
So, the user only sees a blip in availability, but behind the scenes they are actually sitting on
entirely new hardware, like magic.
49. big project
Needless to say, this has become a big project over time.
52. modularize and build APIs
So it's time to spread out responsibilities by modularizing the system and building APIs for the pieces to talk to each other.
53.
What we’ve built is a constellation of heroku apps. We may split this even further in the
future.
54. gain in agility
This gains us agility.
The test suites of each individual project are much smaller now, which improves our ability to develop more quickly.
It also means that each component can be deployed individually. For example, a deploy to the
admin front end UI has no effect on the main system’s APIs.
55. composable services
It also allows us to build better abstractions at the systemic level, which improves our ability to compose services.
For example, a system that provisions and manages servers from our infrastructure provider
can be used by many other consumers, not only heroku postgres.
56. independently scalable
They can furthermore be scaled individually. Some parts of the system handle different loads and response times than others, so we can now tune our system operations around clearly decoupled subsystems.
57. Logging and Metrics
Finally, I’d like to talk about visibility into our app.
58. log generation
First, let’s talk about logging.
59.
In Heroku, there’s a service called Logplex (it’s open source).
Your application sends logs to a specific channel on the logplex service (it uses Capability-Based Security).
Then, one or more consumers can “drain” the logs for that channel.
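Adding a drain is a one-liner with the Heroku toolbelt (the URL and app name here are placeholders):

$ heroku drains:add https://logs.example.com/ingest --app myapp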
61. how should you log?
Having this logging infrastructure available, let’s talk about how to make best use of it.
62.
post "/work" do
  puts "starting to do work"
  worker = Worker.new(params)
  begin
    worker.lift_things_up
    worker.put_them_down
  rescue WorkerError => e
    puts "Fail :( #{e.message}"
    status 500
  end
  puts "done doing work"
  status 200
end
This is an example of terrible logging.
63.
$ heroku logs --tail
2012-07-28T02:43:35 [web.4] starting to do work
2012-07-28T02:43:35 [web.4] Fail :( invalid worker, nothing to do
2012-07-28T02:43:35 heroku[router] POST myapp.com/work dyno=web.4 queue=0 wait=0ms service=14ms status=500 bytes=643
There’s no structure to these logs, so it can’t be easily read and interpreted by a computer.
64. bad logging
• What exactly happened?
• When did it happen?
• How long did it take?
• How many times has it happened?
65. good logging
• parseable
• consistent
• plentiful
66.
post "/work" do
  log(create_work: true, request_id: uuid) do
    worker = Worker.new(params.merge(uuid: uuid))
    begin
      worker.lift_things_up
      worker.put_them_down
    rescue WorkerError => e
      log_exception(e, create_work: true)
    end
  end
end

helpers do
  def uuid
    @uuid ||= SecureRandom.uuid
  end
end
Instead, let’s do some more structured logging.
Also note how every request gets a UUID. This is critical for tying together all the logs for a given request.
67.
require 'scrolls'

module App
  module Logs
    extend self

    def log(data, &block)
      Scrolls.log(with_env(data), &block)
    end

    def log_exception(exception, data)
      # pass the exception itself through to scrolls
      Scrolls.log_exception(with_env(data), exception)
    end

    def with_env(data)
      { environment: ENV['RACK_ENV'] }.merge(data)
    end
  end
end
On the prior slide, we saw the `log` and `log_exception` methods.
This is a small module that provides those methods. It is a wrapper for the `scrolls` (open
source) gem.
Scrolls provides a framework for structured logging.
This module merely adds our environment name to the logs, which is useful for parsing later.
68.
$ heroku logs --tail
2012-07-28T02:43:35 [web.4] create_work request_id=afe2-f0d at=start
2012-07-28T02:43:35 [web.4] create_work request_id=afe2-f0d at=exception message=invalid worker, nothing to do
2012-07-28T02:43:35 [web.4] create_work request_id=afe2-f0d at=finish elapsed=53
2012-07-28T02:43:35 heroku[router] POST myapp.com/work dyno=web.4 queue=0 wait=0ms service=14ms status=500 bytes=643
Now our logs look like this.
Easy to parse, and still easy to read by a human.
69. log consumption
Let’s talk about consuming those logs, which should make it clear why structured logging is
so important.
70. (this is the fun part)
71.
As mentioned before, it’s possible to set up multiple log drains.
The heroku toolbelt has a utility to print out logs to your terminal (accessible via heroku logs
--tail).
But why stop there? You can have as many drains as you want!
We can set up a drain that stores data locally for further analysis and metrics generation.
Here, a postgres database is set up, and logs are stored in it using the key-value data type called hstore.
73.
Now that the data is stored in a postgres database, we can use SQL to query it and generate some metrics.
We have a process that continuously queries this database and sends aggregated results to a
metrics collection service (third party).
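For example (an illustrative query against a hypothetical events table, not our actual schema), hstore makes the structured log fields directly queryable:

-- events(received_at timestamptz, data hstore)
SELECT date_trunc('minute', received_at) AS minute,
       count(*)                          AS requests,
       avg((data -> 'elapsed')::int)     AS avg_elapsed
FROM events
WHERE data ? 'create_work'
  AND data ? 'elapsed'
GROUP BY 1
ORDER BY 1;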
74. good logging
metrics
alerts
Visibility into your system starts with good logging
Great logs enable easy metrics collection
Metrics lead to system alerts.
75. current tooling
• still using sequel and sinatra
• fog displaced stem
• backbone.js for web UIs
• fernet for auth tokens, valcro for validations
• python, go and bash in some subsystems
So to wrap up, our current tooling includes these pieces of technology
76. lessons
• managing databases is hard
• start simple
• extract (and share) reusable code
• separate concerns into services
• learn to love your event stream
77. thanks!
@hgmnz
@herokupostgres