Infrastructure as Code, Theory Crash Course

Infrastructure as Code - Theory
A true “crash course” with lots of references by Sven Balnojan
Background by Taylor Vick on Unsplash

What IS IaC???
• “[…]. When we say "as code" we mean that all the good practices we've
learned in the software world should be applied to infrastructure. Using
source control, adhering to the DRY principle, modularisation, maintainability,
and using automated testing and deployment are all critical practices. Those of
us with a deep software and infrastructure background need to empathise with
and support colleagues who do not. Saying "treat infrastructure like code" isn't
enough; we need to ensure the hard-won learnings from the software world are
also applied consistently throughout the infrastructure realm.”
• Source: https://www.thoughtworks.com/de/radar/techniques/infrastructure-as-code
• Great, but why exactly do we need that?

And then…. there was… the Cloud!
(image from Unsplash)

What Happened - The Cloud
• Cloud => mostly microservice-driven architectures

• Cloud => high elasticity possible & now standard (we can spin up anything
within a minute… compare that to buying a new server…)
• Problem IaC tries to fix: We were still working in “old ways” with “new
stuff” => lots of trouble…

What We Will Talk About…
• 1. Typical Problems

• 2. Generic Solution

• 3. Rule of Three

• 4. More Specific Solution

• 5. Basic Principles of IaC

• 6. Practices of IaC

• 7. Understanding the Tooling Landscape

• 8. Further Reading! (I will leave out a ton of details…)
• THERE WILL BE EXERCISES & QUESTIONS THROUGHOUT.

1. Typical Problems
(small excerpt)

A not-complete list…
• 1. Server Sprawl… (large number, not well maintained)
• 2. Configuration Drift… (state of server is different, unknown, drifted)

• 3. Snowflake… (“This server is special! It can’t be replicated without expert
X”)

• 4. Fragile Infrastructure/Jenga… (Pull out one piece, and everything falls)
• 5. Automation Fear… (“Cannot automate, too expensive”, causes even
less automation, even more of problems 4, 3, 2, 1…)

Highlight 2. Configuration Drift
It's when:
• We buy a server, configure it with an ACM tool (like Ansible/Chef/Puppet), and are
happy. E.g. we set up SSH keys that are checked into a repository.
• Over the course of its lifetime (which is years!) we then change stuff manually (without
the ACM tool, because it’s much faster): adding more SSH keys (and forgetting old ones),
updating Python versions, installing packages, …
• Suddenly, the checked-in state (as in the ACM) and the true state differ a LOT. They
drift apart.
• Additionally, because there is stuff running on the system, the two drift apart anyway,
even if we never touch it. (Ever had to delete files to make space on the disk? =>
configuration drift.)
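
To make “the checked-in state and the true state differ” a bit more tangible, here is a minimal sketch of a drift check. It assumes both states can be written down as simple key/value mappings; the function name and the example data are made up for illustration and do not come from any particular ACM tool.

```python
def detect_drift(declared: dict, actual: dict) -> dict:
    """Return {key: (declared_value, actual_value)} for every mismatch."""
    drift = {}
    for key in sorted(declared.keys() | actual.keys()):
        want = declared.get(key)  # None means "not declared in the repository"
        have = actual.get(key)    # None means "not present on the server"
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example data: what the repository says vs. what the server has.
declared = {"python": "3.11", "ssh_keys": ("alice", "bob")}
actual = {"python": "3.12", "ssh_keys": ("alice", "bob", "mallory"), "htop": "3.2"}

for key, (want, have) in detect_drift(declared, actual).items():
    print(f"{key}: declared={want!r} actual={have!r}")
```

Note that such a check only sees what somebody bothered to declare and collect. Everything else on the server stays invisible to it.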

Exercise 2. Configuration Drift
•Exercise 1: If not manually, how can you add SSH keys to a running
instance? Brainstorm some ways.

Exercise 2. Configuration Drift
•Possible Answers:
•- You could use a shell script to deploy SSH keys & remove old ones.
(versioned, manual)
•- You could use an ACM to deploy the keys. (versioned, manual)
•- You could “bake them in”, and relaunch. (automatic)
•- You could let them be “pulled on deployment”. (automatic)

Exercise 2. Addendum Configuration Drift
•Take “use ACM tool to deploy, manually”. Then you can distinguish two
kinds of configuration drift. Figure them out following this exercise!
•Exercise 2, Set up keys with a manual ACM tool run. Then…
•1. Add a new SSH key to the instance manually. Now run the ACM tool.
What happens in terms of configuration drift?
•2. Now do it differently, add the key somewhere else on the instance, with
some fancy tacky script to activate it. Run the ACM tool, what happens?

Exercise 2. Addendum Configuration Drift
•Exercise 2.1: Hopefully, the ACM tool will “detect the drift” and delete the key.
•Exercise 2.2: Nothing happens. The drift keeps on drifting.
•This leads us to two types of drift: the kind that is detectable from a
declarative state & the saved state (see later what that is), and the
undetectable kind.
•Bad news: There is always undetectable drift, because we have to abstract!
•Good news: The second kind can still be handled, see “immutable
infrastructure” later.
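
The two kinds of drift can be simulated in a few lines. This is only a toy model under loud assumptions: the “server” is a dict from file path to a set of key names, the “ACM tool” is a hypothetical function that manages exactly one file, and both paths are invented.

```python
MANAGED_PATH = "/home/deploy/.ssh/authorized_keys"  # hypothetical managed file


def acm_run(server_files: dict, declared_keys: set) -> set:
    """Converge the managed file to the declared keys; return the keys removed."""
    current = server_files.get(MANAGED_PATH, set())
    removed = current - declared_keys
    server_files[MANAGED_PATH] = set(declared_keys)
    return removed


declared = {"alice", "bob"}
server = {MANAGED_PATH: {"alice", "bob"}}

# Exercise 2.1: a key is added by hand to the file the tool manages.
server[MANAGED_PATH].add("mallory")
# Exercise 2.2: a key is activated somewhere the tool never looks.
server["/opt/tacky/extra_keys"] = {"eve"}

removed = acm_run(server, declared)
print("removed by the tool:", removed)
print("still on the server:", server["/opt/tacky/extra_keys"])
```

The first key disappears on the next run, the second one survives, because the tool only converges what it was told to manage.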

2. Generic Solution
So how do we solve this generic issue? By using software engineering best
practices in the realm of infrastructure. Call it Infrastructure as Code (IaC).
The idea: We simply treat the two parts,
- the infrastructure component,
- and its configuration
as code, then apply all the good stuff to it.

Yevgeniy Brikman, “How to use Terraform”
A great summary of that is the “golden rule of Terraform”:
“The master branch of the live repository should be a 1:1 representation
of what’s actually deployed in production.”

3. Rule of Three
(because I think automation fear deserves special attention)

Rule of Three - On Automation Fear…
My PhD supervisor had a simple rule:
He would explain something once to me. If I came a second time he would
tell me “you better take a lot of photos, because I do not explain things three
times,… ever.”
My call: “Please, do not explain something to your infrastructure more than
twice….”

Exercise 3: SSH Keys revisited. It’s totally fine to add the first key manually.
But which solution would you choose the….
2. time?
3. time?

SSH Keys revisited: Which solution would you use the…
1. time: Add it manually.
2. time: Add it to repository, use a versioned shell script to deploy.
3. time: Write a test - e.g. for the number of keys or for the config version, let
it break, find ANY automated solution to fix the test. Be happy ever after
because you just made your infrastructure a bit antifragile…
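
A “third time” test could look roughly like the sketch below. It assumes you can somehow get hold of two texts, the authorized_keys file in the repository and the one on the server (how you fetch the latter is left open on purpose). The function names and the placeholder keys are hypothetical.

```python
def parse_keys(authorized_keys_text: str) -> set:
    """Return the non-empty, non-comment lines of an authorized_keys file."""
    return {
        line.strip()
        for line in authorized_keys_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }


def check_keys(repo_text: str, deployed_text: str) -> list:
    """Return a list of problems; an empty list means the test passes."""
    repo, deployed = parse_keys(repo_text), parse_keys(deployed_text)
    problems = [f"missing on server: {key}" for key in sorted(repo - deployed)]
    problems += [f"not in repository: {key}" for key in sorted(deployed - repo)]
    return problems


# Placeholder key material, for illustration only.
repo_text = "# team keys\nssh-ed25519 PLACEHOLDER-A alice\nssh-ed25519 PLACEHOLDER-B bob\n"
deployed_text = "ssh-ed25519 PLACEHOLDER-A alice\nssh-ed25519 PLACEHOLDER-C mallory\n"

for problem in check_keys(repo_text, deployed_text):
    print(problem)
```

Run something like this on a schedule, let it fail, and then pick any automated way to make it pass again.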

4. More Specific Solution
Exercise 4: What is “all the good stuff” from software engineering we want
to apply to infrastructure?

Specific Solution - What SE Practices Come to
IaC?
(A possible) Answer to Exercise 4: Basically anything that makes SE quality
high, which is (currently):
1. Version Control
2. Automated Testing
3. Continuous Integration & Delivery/Deployment (CI & CD)…. and yes, that
implies about 100 more things.

Basic Principles of IaC
Nine Principles:
1. Reproducibility (whatever I do, everyone should be able to do it again) =
remember the rule of three...
2. Disposability (“it’s cattle, not pets”!)
3. Consistency (if I have two servers of the same component with a load balancer in
front, they should be consistent)
4. IaC actions should be easy, cheap, repeatable (a commit to master, that’s it.)
5. Service continuity (continual availability even if parts of the infra disappear -
should be able to delete it every minute! Zero-downtime updates.)
6. Self-testing systems (automated tests built in, run dozens of times a day)
7. Self-documenting systems (e.g. captured in code)
8. Small changes (if you spent more than 1 hour working without pushing it out, you're
making big changes)
9. Version all things (for traceability, visibility, actionability, rollback,...)

Basic Principles of IaC - A small exercise
Take an example:
A web scraper. One Docker service which gets some “start URL”, then
starts scraping and stores the data to some place on the container…
Exercise 5 - Now do the following & think about how to do it in an IaC way:
1. Kill the scraper… and restart it (imagine for (1) that this happens within
the same second, without downtime.)
2. Launch two scraper service instances…
3. Then update the scraping script.
4. How do you deploy the scraper?

Basic Principles of IaC - A small exercise
1. Kill the scraper…
Did you just kill a “pet”? Think about the already scraped URLs, about the data.
(=> store data not in the service, but externally. Store the list of scraped URLs there as well)
2. Launch two scraper service instances…
They both got the same “start URL”. How do you untangle them?
(=> some consistency & shared data mechanism necessary. A DB, versioning on a shared file system etc.)
3. Then update the scraping script.
If you do it manually, you will get inconsistencies. So the only good solution => redeploy everything. But
wait, we like “service continuity”, so migrating step by step (zero-downtime update) is the best way to go.
(although for this service downtime seems to be ok…)
4. How do you deploy the scraper?
By hand using the AWS UI? That’s not very reproducible. At least put stuff into bash scripts using the
AWS CLI, or better use infrastructure as code pipelines.
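
For answers 1 and 2, here is one possible shape of “store the list of scraped URLs externally”. It is a sketch under assumptions: an in-memory SQLite database stands in for whatever external store you would really use, and the table and function names are invented.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the shared store; ':memory:' is only a stand-in for an external DB."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS claimed (url TEXT PRIMARY KEY, worker TEXT)"
    )
    return conn


def claim(conn: sqlite3.Connection, url: str, worker: str) -> bool:
    """Try to claim a URL; returns True only for the first worker that asks."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO claimed (url, worker) VALUES (?, ?)", (url, worker)
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means someone else was faster


store = open_store()
start_url = "https://example.com/"
for worker in ("scraper-1", "scraper-2"):
    verdict = "scrapes" if claim(store, start_url, worker) else "skips"
    print(worker, verdict, start_url)
```

Because the state lives outside the container, killing and relaunching a scraper loses nothing, and two instances with the same start URL do not do the same work twice.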

6. Practices of IaC
(I’ll highlight three practices I find underrated and really useful… for the
rest, read “the” book.)

Practices of IaC - Highlight 1 - Declarative over
procedural language
1. Procedural languages
Describe the procedure, the steps to do something = the HOW.
2. Declarative languages
Describe the WHAT, not a single step.
So how does that help us? It enforces the golden rule of Terraform, it makes
things “reproducible”, and it helps YOU, because you don’t have to
understand HOW to do something but only WHAT to do (more output focused).
It also makes a change set completely reviewable before deployment
to production!

Practices of IaC - Highlight 1 - Declarative over
procedural language
Exercise 6: Make declarative (bad pseudo code). Take the following
procedural code and turn it into pseudo declarative code:
BUCKET_NAME=bla
if aws s3api head-bucket --bucket "$BUCKET_NAME"; then
  :  # bucket exists: do nothing
else
  aws s3api create-bucket --bucket "$BUCKET_NAME"
fi

Practices of IaC - Highlight 1 - Declarative over
procedural language
Exercise 6: Make declarative (bad pseudo code). Take the procedural code
and turn it into pseudo declarative code. A possible answer:
BUCKET_NAME=bla
SHOULD_BE_THERE=True
(yep that’s right, no mention of “how”)
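
Somebody still has to work out the “how”, it is just not you. Below is a toy version of what a declarative tool might do with such a description: compare the desired state with what exists and produce a reviewable list of actions. It does not talk to AWS; the “cloud” is a plain set of bucket names and all function names are assumptions.

```python
def plan(desired: dict, existing: set) -> list:
    """Compare desired state with reality and return the actions needed."""
    actions = []
    for name, should_exist in sorted(desired.items()):
        if should_exist and name not in existing:
            actions.append(("create", name))
        elif not should_exist and name in existing:
            actions.append(("delete", name))
    return actions


def apply_plan(actions: list, existing: set) -> set:
    """Carry out the planned actions on a copy of the existing state."""
    result = set(existing)
    for verb, name in actions:
        if verb == "create":
            result.add(name)
        else:
            result.discard(name)
    return result


desired = {"bla": True}           # BUCKET_NAME=bla, SHOULD_BE_THERE=True
existing = set()                  # nothing deployed yet

change_set = plan(desired, existing)
print("review before deploying:", change_set)
existing = apply_plan(change_set, existing)
print("second plan:", plan(desired, existing))
```

The change set is the part you can review before anything touches production, and planning a second time yields nothing to do.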

Practices of IaC - Highlight 2 - Immutable
Infrastructure to the core
1. Immutable Infrastructure…
Is infrastructure you do not touch at all.
2. Immutable infrastructure can be terminated & relaunched every day, every hour.
If you do not need to touch the infrastructure piece it will be up and running really fast.
As a result it can be deleted all the time. That’s true “cattle”.
Less obvious side effect: There will be next to no configuration drift because things don’t live long!
3. Example
Not (that) immutable: You have a stage environment and one production environment. On deployment of
your instance, you pull in the configuration for the environment from the local “env server”. Everything
is versioned.
Immutable: “pulling in configuration” is changing a piece of infra. Immutable would be to either pre-bake
all config in, or to pre-bake multiple times (e.g. once per environment).
(The example is already pretty extreme, just keep 2. in mind: you want to be able to kill stuff often. If
you’re afraid, you probably need to pre-bake more stuff, make things more immutable.)
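
“Pre-bake multiple times” can be pictured like this: one frozen artifact per environment, and any config change means baking a new artifact instead of editing a running one. This is only an illustrative model; the Image class, the bake function and the way the ID is derived are all assumptions, not how any real image builder works.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    base: str
    config: tuple  # sorted (key, value) pairs, baked in at build time

    @property
    def image_id(self) -> str:
        """Derive an ID from the content, so a changed config gives a new ID."""
        payload = json.dumps([self.base, self.config]).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def bake(base: str, config: dict) -> Image:
    """Build a new immutable image with the given config baked in."""
    return Image(base=base, config=tuple(sorted(config.items())))


stage = bake("scraper:1", {"env": "stage"})
prod = bake("scraper:1", {"env": "prod"})
print("stage image:", stage.image_id)
print("prod image: ", prod.image_id)
```

Nothing is pulled in at launch time, so a relaunched instance is exactly what was baked and tested.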

Practices of IaC - Highlight 3 - 6 months Cycle
Time for Technologies
6 Months Cycle Time for Technologies
… means pretending (and actually acting as if) your technology choices will be valid for
about 6 months (the rough cycle of the ThoughtWorks TechRadar ;)) to a year.
Consequences
… if you use this idea, you will build modular infrastructure and try not to get locked
in. That’s a good thing! Because if you wrap stuff, you also make it testable and
replaceable.
Question: Which piece of your infrastructure (applies to all technology choices really)
that’s older than 6 months would you find hard to replace when you notice a piece that’s
20%+ better?
(Now think about that thing, I bet it’s neither wrapped, nor tested end to end, otherwise
you wouldn’t find it hard to replace it with something better.)
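
What “wrap stuff” can mean in practice, as a small sketch: your own code depends on a narrow interface, and the concrete technology sits behind it. The BlobStore interface, the in-memory implementation and save_report are all hypothetical names chosen for this example.

```python
from typing import Protocol


class BlobStore(Protocol):
    """The only thing the rest of the code is allowed to know about storage."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in implementation, handy for tests; a real one would wrap a cloud SDK."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_report(store: BlobStore, name: str, text: str) -> str:
    """Application code: written against the interface, not against a vendor."""
    key = f"reports/{name}.txt"
    store.put(key, text.encode("utf-8"))
    return key


store = InMemoryStore()
key = save_report(store, "weekly", "all green")
print(key, store.get(key))
```

Swapping the storage technology then means writing one new class with put and get, and the same tests tell you whether the replacement works.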

Practices of IaC - MORE
There are many more, please seriously read the book on it…

7. Tooling Landscape
(not complete, just a list to understand differences in tools…)

1. First there was nothing…
How do you “deploy” something programmatically to AWS without CloudFormation (CF)?
- e.g. using the “AWS CLI” (which makes an API call to some endpoint)
- “aws s3api create-bucket …”
- what happens when the bucket is already there? => Script fails…
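
The “script fails on the second run” problem, reduced to a toy. Nothing here calls AWS: the account is a plain set of names, and create_bucket, ensure_bucket and the exception are invented stand-ins that only mimic the behaviour described above.

```python
class BucketAlreadyExists(Exception):
    """Raised by the imperative step when the bucket is already there."""


def create_bucket(existing: set, name: str) -> None:
    """Imperative step: 'create it', no questions asked."""
    if name in existing:
        raise BucketAlreadyExists(name)
    existing.add(name)


def ensure_bucket(existing: set, name: str) -> bool:
    """Idempotent step: make sure it exists; returns True if it had to create it."""
    if name in existing:
        return False
    existing.add(name)
    return True


account = set()
create_bucket(account, "bla")          # first run of the script: fine
try:
    create_bucket(account, "bla")      # second run of the same script
    outcome = "ok"
except BucketAlreadyExists:
    outcome = "failed"
print("second imperative run:", outcome)
print("ensure on existing bucket created something:", ensure_bucket(account, "bla"))
```

With only the imperative call, every script has to carry its own “does it exist already?” logic, and that is exactly the part people forget.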

2. Then there was CloudFormation.
Solution: CloudFormation is declarative! If the bucket is there, that’s great,
if not it creates it.
Not so nice: very bulky templates in JSON/YAML, lots and lots of duplicate
code,… (some say they reduced the code size by a factor of 10-100 by
migrating from CF to something else…)

3. Then Terraform
Better Solution: Terraform, provider agnostic, allows for modules, has its
own DSL (not bulky JSON) and allows for separate files, etc.
Not so nice: not so much, but it has its own DSL which brings some
downsides…

4. Then there were Pulumi & the CDK
Better Solution: Pulumi (& the CDK), provider agnostic (or not), which allow
you to simply use a general-purpose programming language (TypeScript or
Python,…) to do stuff => you can test, write modules, functions etc.
Not so nice: we will see…
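
Why a general-purpose language helps, shown without any real Pulumi or CDK API: the sketch below just builds plain dictionaries. The resource shape, the naming convention and the function names are assumptions made up for this example.

```python
def bucket(name: str, env: str, versioned: bool = True) -> dict:
    """Reusable 'module': one place that encodes the team's bucket conventions."""
    return {
        "type": "bucket",
        "name": f"{env}-{name}",
        "versioned": versioned,
        "tags": {"env": env, "managed_by": "iac"},
    }


def stack(env: str) -> list:
    """The whole (tiny) stack for one environment, built from the module above."""
    return [bucket(name, env) for name in ("raw-data", "reports")]


def test_every_bucket_is_tagged_with_its_env() -> None:
    """An ordinary unit test, run before anything is deployed."""
    for resource in stack("stage"):
        assert resource["tags"]["env"] == "stage"
        assert resource["name"].startswith("stage-")


test_every_bucket_is_tagged_with_its_env()
print([resource["name"] for resource in stack("prod")])
```

Functions remove the duplication, and the unit test runs in your normal CI long before a cloud API is involved.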

Thanks for listening!
8. Further Reading
Kief Morris, Infrastructure as Code (must read)
Yevgeniy Brikman, Terraform: Up & Running (has lots of generic stuff!)
