This document discusses how to maintain large web applications over time. It describes how the author's team managed a web application with over 65,000 lines of code and 6,000 automated tests over 2.5 years of development. Key aspects included packaging full releases, automating dependency installation, specifying supported environments, and automating data migrations during upgrades. The goal was to have a sustainable process that allowed for continuous development without slowing down due to maintenance issues.
Care and feeding notes
Care and Feeding of Large Web Applications
by Perrin Harkins
So, you launched your website. Congratulations!
And then there were a bunch of quick fixes. And you started getting traffic so you had to add more
machines. And some more developers. And more features to keep your new users happy. And suddenly
you find yourself spending all your time doing damage control on a site that seems to have taken on a life of
its own and you can't make a new release because the regression testing alone would take three years.
Usually, this is the part where everyone starts clamoring for a rewrite, and the CEO contemplates firing
your ass and bringing in an army of consultants to rewrite it all in the flavor of the month.
How can we avoid this mess? How can we create a web development process that is sustainable for years
and doesn't hold back development?
Backstory
There's more than one way to do it, but I'll tell you how my team did it, at a small startup company called Plus
Three. Let me give you a few stats about our project:
About 2.5 years of continuous development
2 - 5 developers on the team during that time
65,000+ lines of Perl code
1600+ lines of SQL
(Computed with David Wheeler's SLOCCount program)
Plenty of HTML, CSS, and JavaScript too
6000+ automated tests in 78 files
169 CPAN modules
It's a big system, built to support running websites for political campaigns and non-profit membership
organizations. Some of the major components are a content management system, an e-commerce system
with comprehensive reporting, a data warehouse with an AJAX query builder GUI, a large-scale e-mail
campaign system, a variety of user-facing web apps, and an asynchronous job queue.
This talk isn't meant to be about coding style, which I've discussed in some previous talks, but I'll give you
the 10,000 foot overview:
Object-oriented
MVC-ish structure with the typical breakdown into controller classes, database classes, and templates.
(Not very pure MVC, but that's a whole separate topic.)
Our basic building blocks were CGI::Application, Class::DBI, and HTML::Template.
Ok, that's the software. How did we keep it under control?
Deployment
Let's dive right in by talking about the hardest thing first: deployment. So hard to get right, but so rarely
discussed and so hard to generalize. Everyone ends up with solutions that are tied very closely to their own
organization's quirks.
The first issue here is how to package a release. We used plain old .tar.gz files, built by a simple script after
pulling a tagged release from our source control system. We tried to always release complete builds, not
individual files. This is important in order to be sure you have a consistent production system that you can
rebuild from scratch if necessary. It's also important for setting up QA testing. If you just upload a file here
and there (or worse, vi a file in production!), you get yourself in a bad state where your source control no
longer reflects what's really live and your testing misses things because of it. We managed to stick to the
"full build release" rule, outside of dire emergencies.
Like most big Perl projects we used a ton of CPAN modules. The first advice you'll get about how to install
them is "just use the CPAN shell," possibly with a bundle or Task file. This is terrible advice.
The most obvious problem with it is that as the number of CPAN modules increases, the probability of one
of them failing to install via the CPAN shell for some obscure and irrelevant reason approaches 1.
The second most obvious problem is that you don't want to install whatever the latest version of some
module happens to be -- you want to install the specific version that you've been developing with and that
you tested in QA. There might be something subtly different about the new version that will break your site.
Test it first.
Let me lay out the requirements we had for a CPAN installer:
Install specific versions.
Install from local media. Sometimes a huge CPAN download is not convenient.
Handle versions with local patches. We always submitted our patches, but sometimes we couldn't
afford to wait for a release that included them.
Fully automated. That means that modules which ask pesky questions during install must be handled
in some way. I'm looking at you, WWW::Mechanize.
Install into a local directory. We don't want to put anything in the system directories because we want
to be able to run multiple versions of our application on one machine, even if they require different
versions of the same module.
Skip the tests. I know this sounds like blasphemy, but bear with me. If you have a cluster of identical
machines, running all the module tests on all of them is a waste of time. And the larger issue is that
CPAN authors still don't all agree on what the purpose of tests is. Some modules come with tests that
are effectively useless or simply fail unless you set up test databases or jump through similar hoops.
Our solution to the installation problem was to write an automated build system that builds all the modules it
finds in the src/ directory of our release package. (Note that this means we can doctor one of those modules
if we have to.) We used the Expect module (which is included and bootstrapped at the beginning of the
build) and gave it canned answers for the modules with chatty install scripts. We also made it build some
non-CPAN things we needed: Apache and mod_perl, and the SWISH-E search engine. If we could have
bundled Perl and MySQL too, that would have been ideal.
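As a rough sketch of the Expect approach (the paths, install options, and prompt pattern below are illustrative, not our real build code), the idea was simply to drive each bundled module's build and feed canned answers to anything that asks questions:

use strict;
use warnings;
use Cwd qw(getcwd);
use Expect;

# Sketch of the automated module build: install everything under src/
# into a private directory, answering chatty installers automatically.
my $prefix = "$ENV{HOME}/arcos/perl-lib";
my $top    = getcwd();

for my $dir (glob 'src/*') {
    chdir $dir or die "chdir $dir: $!\n";

    my $exp = Expect->spawn(
        "perl Makefile.PL INSTALL_BASE=$prefix && make && make install"
    ) or die "Cannot spawn build in $dir\n";

    $exp->expect(
        600,
        # Answer any question prompt by accepting the default.
        [ qr/\?\s*$/ => sub { my $self = shift; $self->send("\n"); exp_continue; } ],
        [ 'eof'      => sub { } ],
    );
    $exp->soft_close;

    chdir $top or die "chdir $top: $!\n";
}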
Why bundle the dependencies? Why not just use whatever apache binary we find lying around? In short,
we didn't want to spend all of our time troubleshooting insane local configurations and builds where
someone missed a step. A predictable runtime environment is important.
To stress that point a little more, if your software is an internal application that's going to be run on
dedicated hardware, you can save yourself a lot of trouble by only supporting very specific configurations.
Just as an example, only supporting one version of one operating system cuts down the time and resources
you need for QA testing. To this end, we specified exact versions of Perl, MySQL, Red Hat Linux, and a
set of required packages and install options in addition to the things we bundled in our releases.
That was the theory anyway. Reality intruded a bit here in the form of cheap legacy hardware that would
work with some versions of Red Hat and not others. If we had a uniform cluster of hardware, we could
have gone as far as creating automated installs, maybe even network booting, but the best we were able to
do was keep our list of supported OS versions down to a handful. This is also a place where human nature
can become a problem. If you have a separate sysadmin group, they can get territorial when developers try
to dictate details of the OS to install. But that's another separate topic.
The automated build worked out very well. Eventually though, as we added more modules, the builds
started taking longer than we would have liked. Remember, we built them on every machine. Not the most
efficient thing to do.
The obvious next step would be binary distributions, possibly using RPMs, or just tar balls. Not trivial, but
not too bad if you can insist on one version of Perl and one hardware architecture. If we were only
concerned about distributing the CPAN modules, it might be possible to use something existing like PAR.
If you're interested in seeing this build system, the Krang CMS (which we used) comes with a version of it,
along with a pretty nice automated installer that checks dependencies and can be customized for different
OSes. (http://krangcms.com/) You could probably make your own for the CPAN stuff using CPANPLUS,
but you'd still need to do the Expect part and the non-CPAN builds.
QA
Upgrades
We didn't automate upgrades enough. Changes on a production system are tense for everyone, and it's much
better to have them automated so that you can fully test them ahead of time and make the actual work to be
done in the upgrade process as dumb as possible. We didn't fully automate this, but we did fully automate
one of the crucial parts of it: data and database schema upgrades.
Our procedure was pretty simple, and coincidentally similar to the Ruby on Rails schema upgrade
approach. We kept the current schema version number in the database and the code version number in the
release package, and when we ran our upgrade utility it would look for any upgrade scripts with versions
between the one we were on and the one we wanted to go to. For example, when going from version 2.0 to
3.0, it would look in the upgrade/ directory (also in our install bundle), find scripts named V2.1 and V3.0,
and run them in order. Usually they just ran SQL scripts, but sometimes we needed to do some things in
perl as well.
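A bare-bones sketch of that upgrade runner might look something like the following; the table name, file naming, and connection details are made up for the example, and our real utility did more (error handling, logging, and so on).

use strict;
use warnings;
use DBI;

# Hypothetical sketch of the upgrade utility: compare the schema version
# stored in the database with the version shipped in the release, then
# run every upgrade script in between, in order.
my $dbh = DBI->connect('dbi:mysql:arcos', 'arcos', 'secret',
                       { RaiseError => 1 });

my ($db_version) = $dbh->selectrow_array('SELECT version FROM schema_version');
my $code_version = '3.0';    # normally read from the release package

# Upgrade scripts are named for the version they bring you up to,
# e.g. upgrade/V2.1.pl and upgrade/V3.0.pl (names invented for the example).
my @pending =
    sort { $a->[0] <=> $b->[0] }
    grep { $_->[0] > $db_version && $_->[0] <= $code_version }
    map  { m{/V([\d.]+)\.pl$} ? [ $1, $_ ] : () }
    glob 'upgrade/V*.pl';

for my $step (@pending) {
    my ($version, $file) = @$step;
    print "Running upgrade to $version: $file\n";
    system($^X, $file) == 0 or die "Upgrade $file failed: $?\n";
    $dbh->do('UPDATE schema_version SET version = ?', undef, $version);
}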
Our SQL upgrade scripts were written by hand. I tried a couple of schema diffing utilities but they were
pretty weak. They didn't pick up things like changes in default value for a column, or know what to do with
changes in foreign keys. Maybe someday someone will make a good one. Even then, it will still require
some manual intervention when columns and tables get renamed, or a table gets split into multiple tables.
One cool thing we discovered recently is a decent way to test these upgrades on real data. We always set up
a QA server with a copy of the current version of the system, and then try our upgrade procedure and
continue with testing. This works fine except that when you fix a bug and need to do it again, it takes
forever to set it up again. We tried VMWare snapshots, but the disk performance for Linux on VMWare
was so poor that we had to abandon it. Backups over the network seemed like they would take a long time
to restore. Then we tried LVM, the Linux Logical Volume Manager. It let us take a snapshot just before the upgrade
test, and then roll back to it almost instantly.
Time-travel bug
Plugin System
Harder than it sounds
Simple factory works for most things
Configuration
The trouble with highly configurable software is that someone has to configure it. Our configuration options
expanded greatly as time went on, and we had to devise ways to make configuring it easier.
We started with a simple config file containing defaults and comments, like the one that comes with
Apache. In fact it was very much like that one because we used Config::ApacheFormat.
In the beginning, this worked fine. Config::ApacheFormat supplied a concept of blocks that inherit from
surrounding blocks, so that if you have a block for each server and a parameter that applies to all of them,
you can put it outside of those blocks and avoid repeating it. You can even override that parameter in the
one server that needs something different.
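To make the inheritance idea concrete, here is a toy example; the directive and block names are invented for illustration, not taken from our real config.

use strict;
use warnings;
use Config::ApacheFormat;

# A toy config (arcos.conf) with a shared setting and one override:
#
#   FromDomain example.org
#   <Server www.example.org>
#   </Server>
#   <Server shop.example.org>
#       FromDomain shop.example.org
#   </Server>

my $config = Config::ApacheFormat->new();
$config->read('arcos.conf');

# A block inherits anything it doesn't set itself...
my $www = $config->block(Server => 'www.example.org');
print $www->get('FromDomain'), "\n";    # example.org (inherited)

# ...and can override just the one thing it needs to change.
my $shop = $config->block(Server => 'shop.example.org');
print $shop->get('FromDomain'), "\n";   # shop.example.org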
As the number of parameters grew, we realized a few things:
People will ignore configuration options they don't understand. The expectation is that if the server
starts, it must be okay.
A few lines of comments in a config file are pretty weak documentation.
Long config files full of things that you hardly ever need to change are pointless and look daunting.
To deal with these problems, we started making extensive use of default values, so that things that didn't
usually get changed could be left out of the file. We ended up creating a fairly complex config system in
order to keep the file short. It does things like default several values based on one setting, e.g. setting the
domain name for a server allows it to default the cookie domain, the e-mail account to use as the From
address on site-related mail, etc.
Of course this created the necessity to see what all of the values were defaulting to, so a config dumper
utility was created.
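The defaulting logic itself was nothing exotic. As a hypothetical sketch of the idea (the setting names and derivation rules are invented here), one value fans out into several derived defaults unless the config file overrides them, and a small dumper shows what everything resolved to:

use strict;
use warnings;

# Hypothetical: one setting (the domain) fans out into several derived
# defaults unless the config file supplies them explicitly.
sub apply_defaults {
    my %conf = @_;
    my $domain = $conf{domain} or die "domain is required\n";

    $conf{cookie_domain} ||= ".$domain";
    $conf{from_address}  ||= "webmaster\@$domain";
    $conf{base_url}      ||= "http://$domain/";

    return %conf;
}

# A trivial "config dumper" so people can see what everything defaulted to.
sub dump_config {
    my %conf = @_;
    printf "%-15s %s\n", $_, $conf{$_} for sort keys %conf;
}

dump_config(apply_defaults(domain => 'example.org'));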
By the time we were done, we had moved to a level where using one of the complex config modules like
Config::Scoped probably would have been a better choice than maintaining our own. Well, Config::Scoped
still scares me, but something along those lines.
Testing
You all know the deal with testing. You have to have it. It's your only hope of being able to change the
code later without breaking everything. This point became very clear to me when I did a couple of big
refactorings and the test suite found all kinds of problems I missed on my own.
For any large application, you'll probably end up needing some local test libraries that save setup work in
your test scripts. Ours had functions for doing common things like getting a WWW::Mechanize object all
logged in and ready to go.
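For example, a shared test library along these lines keeps every test script from repeating the login dance; the module name, URL, and form fields are invented for this sketch.

package Arcos::Test;    # hypothetical shared test library
use strict;
use warnings;
use base 'Exporter';
use Test::WWW::Mechanize;

our @EXPORT_OK = qw(logged_in_mech);

# Return a Test::WWW::Mechanize object that is already logged in, so test
# scripts can get straight to the interesting part.
sub logged_in_mech {
    my %args = (
        base     => 'http://localhost:8080',
        username => 'test_admin',
        password => 'secret',
        @_,
    );

    my $mech = Test::WWW::Mechanize->new;
    $mech->get_ok("$args{base}/login", 'fetched login page');
    $mech->submit_form_ok(
        { with_fields => { username => $args{username},
                           password => $args{password} } },
        'logged in',
    );
    return $mech;
}

1;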
When you're testing a large database-driven application, you need some strategies for generating and
cleaning up test data. We created a module for this called Arcos::TestData. (Arcos is the name of the
project.) The usage is like this:
use Arcos::TestData;

my $creator = Arcos::TestData->new();
END { $creator->cleanup() }

# create an Arcos::DB::ContactInfo
my $contact_info = $creator->create_contact_info();

# create one with some values specified
my $aubrey = $creator->create_contact_info(
    first_name => 'George',
    occupation => 'housecat',
);
This one is simple, but some of them will create a whole tree of dependent objects with default values to
avoid needing to code all that in your test. When the END block runs, it deletes all the registered objects in
reverse order, to avoid referential integrity problems.
This seemed very clever at the time. However, after a while there were many situations that required special
handling, like web-based tests that cause objects to be created by another process. We had solutions for
each one, but they took programmer time, and at this point I think it might have been smarter to simply wipe
the whole schema at the end of a test script. We could have just truncated all the non-lookup tables pretty
quickly.
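Something like this would have been enough for the wipe-everything approach; the table names are hypothetical, and MySQL is assumed, as in our stack.

use strict;
use warnings;
use DBI;

# Hypothetical end-of-test cleanup: truncate every table except the lookup
# tables, instead of tracking and deleting each created object individually.
my %lookup = map { $_ => 1 } qw(countries states occupation_types);

sub wipe_test_data {
    my ($dbh) = @_;
    my $tables = $dbh->selectcol_arrayref('SHOW TABLES');
    $dbh->do('SET FOREIGN_KEY_CHECKS = 0');
    $dbh->do("TRUNCATE TABLE $_") for grep { !$lookup{$_} } @$tables;
    $dbh->do('SET FOREIGN_KEY_CHECKS = 1');
}

my $dbh = DBI->connect('dbi:mysql:arcos_test', 'arcos', 'secret',
                       { RaiseError => 1 });
END { wipe_test_data($dbh) if $dbh }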
We got a lot of mileage out of Test::WWW::Mechanize.
Test::Class helps similar classes
Testing web interfaces - Mech tricks - Selenium
Smolder
Testing difficult things
Code Formatting
This was the first project I worked on where we had an official Perl::Tidy spec and we all used it. Can I just
say it was awesome? That's all I wanted to say about it. Developers who worked on Perl::Tidy, you have
my thanks.
Version Control
A couple of years ago, only crackpots had opinions about version control. CVS was the only game in town.
These days, there are several good open source choices and everyone wants to tell you about their favorite
and why yours is crap.
I'm not going to go into the choice of tools too much here. You can fight that out amongst yourselves. We
used Subversion, but I'll try to talk about the theory without getting bogged down in the mechanics.
Most projects need at least two branches: one for maintenance of the release currently in production, and
one for new development. Most of you are familiar with this from open source projects.
Here are the main ideas we used for source control:
The main branch is for new development, but must be stable. Code should not be checked in until
all tests pass. (But more about that later.)
When you make a release of the main branch, tag it. That means tagging the whole branch at that
point. Example: tag release 2.0. The main branch is now for development of 3.0.
For each main branch release, make a maintenance branch from the point where you tagged it.
Example: make a "2.x" branch for fixing bugs that show up in production.
When you make a bug fix release from a maintenance branch, tag the branch and then merge all
changes since the last release on that branch to the main branch. This is the only merging ever done
and it's always a merge of changes from one sequentially numbered tag to the next and into the main
branch. Example: tag the 2.x branch bug fix release as 2.1. Merge all changes from 2.0 to 2.1 to the
main development branch.
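In Subversion terms, the release steps looked roughly like the following sketch; the repository URL and version numbers are illustrative, and this shows the shape of the commands rather than our exact release script.

use strict;
use warnings;

# Illustrative only: the svn operations behind "tag the release, make a
# maintenance branch, and later merge its fixes back to the main branch".
my $repo = 'https://svn.example.com/arcos';

sub svn { system('svn', @_) == 0 or die "svn @_ failed: $?\n" }

# Release 2.0 from the main development branch.
svn('copy', "$repo/trunk", "$repo/tags/release-2.0",
    '-m', 'Tag release 2.0');
svn('copy', "$repo/tags/release-2.0", "$repo/branches/2.x",
    '-m', 'Maintenance branch for the 2.x series');

# Later: a bug-fix release from the maintenance branch...
svn('copy', "$repo/branches/2.x", "$repo/tags/release-2.1",
    '-m', 'Tag bug-fix release 2.1');

# ...and merge everything from 2.0 to 2.1 back into the main branch
# (run inside a trunk working copy).
svn('merge', "$repo/tags/release-2.0", "$repo/tags/release-2.1", '.');
svn('commit', '-m', 'Merge 2.0 -> 2.1 bug fixes into trunk');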
This is about as simple as you can make it, and it worked very well for us for a long time. Eventually
though, we discovered situations that didn't fit nicely. One of these was that sometimes there was a period
of a few days during QA where part of the team would still be working on bug fixes on the development
branch while others were ready to move on to working on features for the next major release. You can't do
both in the same place. One solution is to create the maintenance branch at that point, for doing the final
pre-release bug fixes, and let the main branch open up for major new development. It's a bad sign if you
need to do this often. Usually the team should be sharing things evenly enough to make it unnecessary.
Another problem, although less frequent than you might expect, is keeping the development branch stable at
all times. Some changes are too big to be done safely as a single commit. At that point it becomes necessary
to make a feature branch, working on it until the new feature is stable and all tests are passing again, and
then merging it back to the main development branch.
Beware of complicated merging, whether your tools support it well or not. A web app is not the Linux
kernel. If you find yourself needing to do bidirectional merges or frequent repeated merges to the point
where you have trouble keeping track of what's been merged, you may need to take a look at your process
and see if there's some underlying reason. Maybe the source control system is being used as a substitute for
basic personal communication on your team, or has become a battleground for warring factions. Some
problems are easier to solve by talking to your co-workers than by devising a complex branching scheme.