32

At our shop we use SVN for source control and CruiseControl for CI, handling automatic builds and deployments to our development, test, and integration environments.

This all works smoothly. However, due to hardware and resource constraints, our integration environment is not a two-server load-balanced environment like our production environment. Everything else is equal, so that is the only difference between our integration and production environments (although a big one!)

In theory, the difference is a slightly different configuration of our app servers, and the deploy script would merely have to drop the build artifacts onto two servers instead of one. So why am I so nervous about automating our production deployments?!

I am generally not a control freak, but I always feel an insatiable need to deploy to production manually. I have heard from colleagues that this is generally a Really BAD Thing™, but they failed to make a case against it.

I know that when I do it manually I can SEE that I am physically copying the correct files, physically shutting down the app servers and ensuring they closed successfully, physically starting the servers back up, and then physically inspecting the logs to make sure they started up okay and the deployment was successful. It gives me peace of mind.
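Each of these manual checks can be scripted. For example, "ensuring they closed successfully" can be approximated by polling the app server's TCP port until it stops accepting connections, and the reverse after startup. Below is a minimal Python sketch. It assumes the app server listens on a known TCP port; the host and port you pass in are placeholders. A closed port does not prove that shutdown was clean, so this supplements the log inspection and does not replace it.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out
        return False


def wait_for_port(host: str, port: int, want_open: bool,
                  deadline_s: float = 60.0, poll_s: float = 0.5) -> bool:
    """Poll until the port is open (want_open=True) or closed (want_open=False).

    Returns False if the wanted state is not reached before the deadline.
    """
    end = time.monotonic() + deadline_s
    while True:
        if port_is_open(host, port) == want_open:
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(poll_s)
```

A deploy script would call `wait_for_port(host, port, want_open=False)` after issuing the shutdown, and abort if it returns False, rather than copying files onto a server that is still running.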

What are the arguments against this, OR the arguments for automated, scripted production deployment?

2
  • 'ls' after 'rm' once allowed me to catch a disastrous rm that was recursing down through hard links up to higher places in the file system. I was able to catch it while there was enough of the system left to recover the files that had already been deleted (deleting millions of files seems to take a while, fortunately!). :-) Commented Oct 13, 2011 at 17:39
  • continuousdelivery.com Commented Oct 14, 2011 at 9:07

15 Answers

30

There are a few obvious arguments against this.

  1. What happens if you leave? Is all this information carefully documented, or is it mostly in your head? Automated scripts are a much better place for someone else to take over from.

  2. Everyone makes mistakes. There will come a time when the person doing the deployment is tired, not paying attention, whatever. Yes, ideally deployments are only done in a happy, calm place with lots of time. In practice they can be rushed and stressful when urgent fixes are being rolled out. This is the most likely time to make a mistake, and also the most costly. If the deployment is a single script, the potential for mistakes is limited.

  3. Time. As deployments get more complicated the amount that needs doing increases. Scripts just require kicking off, a manual check, and then a manual switch-over (you could automate this as well, but I share some of the paranoia :).

21

You can get the best of both worlds: peace of mind from process verification, and the reliability of automation.

Script the deployment. Then, go through and manually verify that processes are started, files removed, etc. In other words, write your own QA script just to verify that automated steps 1 - X actually occurred.
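As one example of such a QA script, here is a minimal Python sketch that verifies the "files copied" step. It compares SHA-256 checksums of the build artifacts with the files that actually landed on the server. It assumes both directories are reachable as local paths, for example by running it on the server or against a mounted share. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large WAR/JAR files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_dir: Path, target_dir: Path) -> list[str]:
    """Return a list of files that are missing from, or differ in, target_dir."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = target_dir / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list means every artifact arrived intact. Anything else is a concrete reason to stop before restarting the servers.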

5
  • 7
    Maybe something like creating your own wizard, where you manually trigger each step. It produces log output with as much detail as you need to verify before going on to the next step.
    – JeffO
    Commented Oct 13, 2011 at 13:17
  • @JeffO I like that idea! We just invested in a nice Swing GUI building tool, and I'm chomping at the bit for every excuse to use it. I am whipping out GUI tools faster than ever, and a visual wizard would be so nice that a junior developer could handle it.
    – maple_shaft
    Commented Oct 13, 2011 at 13:29
  • @maple_shaft And you get the peace of mind of knowing that the step where they copy the correct files was done at the right time.
    – JeffO
    Commented Oct 13, 2011 at 13:47
  • I agree with this. Something as simple as a batch file (or a series of them) to do your deployment for you can ease the tension a lot. Use batch files to ensure that you don't make any mistakes, and run them manually to ensure that there aren't any catastrophic errors while they run.
    – Kibbee
    Commented Oct 13, 2011 at 14:39
  • 4
    @Jeff O - I like the logging idea. This creates traceability and also gives maple something to QA. I do not like the wizard idea. The more steps it takes to publish your product to production, the more likely somebody is gonna screw it up. Just automate it all. Check it with humans. Commented Oct 13, 2011 at 17:24
15

I think the key here is: why do you think that you can't script the verification process?

My deploy scripts don't just push archives and restart services. They print out lots of color-coded information during each step of the deploy, and provide me a summary of events at the end. It lets me know that processes are up and running, that the homepage is serving a 200 status code, and that all the machines and services can see one another fine. I then have a separate service that's not part of the script which monitors log files, 4xx and 5xx-level errors, and key site metrics. It then proceeds to yell at me through every medium possible (email, txt msg, and alarms) if there are drastic negative-effect spikes.
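A stripped-down version of that kind of deploy-time check might look like the following Python sketch. Each check is a named function that returns True or False. The runner prints a color-coded PASS/FAIL line for each check and a summary at the end, and `http_status` lets one check ask whether the homepage answers with HTTP 200. The ANSI color codes assume a terminal that understands them, and all names here are illustrative.

```python
import urllib.error
import urllib.request

GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"  # ANSI terminal colors


def http_status(url: str, timeout: float = 5.0):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses land here
        return err.code
    except OSError:  # connection refused, DNS failure, timeout, ...
        return None


def run_checks(checks) -> int:
    """Run (name, callable) checks, print color-coded results, return the failure count."""
    failures = 0
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception as exc:  # a check that crashes counts as a failure
            ok, name = False, f"{name} ({exc})"
        color, label = (GREEN, "PASS") if ok else (RED, "FAIL")
        print(f"{color}[{label}]{RESET} {name}")
        failures += 0 if ok else 1
    print(f"{len(checks) - failures}/{len(checks)} checks passed")
    return failures
```

A deploy script could finish with `sys.exit(1 if run_checks(checks) else 0)`, so that CI marks the run as failed whenever any check fails.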

Between that and CI servers running tests, I literally deploy and forget at this level of automation. I don't even browse a single page on the site after a push because of how reliable the process now is, which not only lets me deploy as often as I want, but lets a new developer on the project make an update to the live site within minutes of coming on board. In the past I've even made the CI servers auto-deploy to production after a commit to a master/trunk branch that passes everything. That's how confident I am in my tools.

You should be too.

3
  • 1
    I wish I could have this level of confidence but it isn't confidence in tools that prevents this, it is confidence in the quality of the application that I inherited and its "Primadonna" nature after getting deployed. Of course what you describe is my wet dream and is the end game I am looking for.
    – maple_shaft
    Commented Oct 13, 2011 at 14:04
  • @maple_shaft Yeah, if it's a legacy application with inadequate test coverage, I can definitely see wanting to take manual intervention, especially if it's known to be finicky. Commented Oct 13, 2011 at 14:18
  • 1
    One good way to prepare the script is simply to record one of your deployments to a file, input and output, and then modify it to scan the output for the facts you normally check with your eyes.
    – SF.
    Commented Oct 13, 2011 at 14:53
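Once a deployment has been recorded, the "facts you check with your eyes" can be turned into patterns. The following is a minimal Python sketch that assumes a plain-text log. The patterns in the usage line are made up (`STARTED OK`, `ERROR`), so substitute whatever your app server actually prints on a good start and a bad one.

```python
import re


def scan_log(text: str, required: list[str], forbidden: list[str]) -> list[str]:
    """Compare a log against the facts you would otherwise check by eye.

    required:  regex patterns that must appear at least once.
    forbidden: regex patterns that must not appear; each hit is reported with its line number.
    """
    problems = []
    for pattern in required:
        if not re.search(pattern, text, re.MULTILINE):
            problems.append(f"expected but not found: {pattern}")
    for pattern in forbidden:
        for match in re.finditer(pattern, text, re.MULTILINE):
            line_no = text.count("\n", 0, match.start()) + 1
            problems.append(f"line {line_no}: unexpected {match.group(0)!r}")
    return problems
```

For example, `scan_log(log_text, required=[r"STARTED OK"], forbidden=[r"\bERROR\b"])` returns an empty list only when the startup marker is present and no error lines appear.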
8
+100

I can understand being a bit nervous trying something new on the prod environment. Being wary of potential disaster is a Good Thing™.

Automated scripting is also a Good Thing™, and so long as you approach it carefully, you should be able to minimise the danger and lower your fear. So my advice is this:

  • Prepare (and practice on the integration env) a checklist/set of tests so you can quickly find out whether it worked and what if anything went wrong. Verbose logging may help with this.
  • Back up everything. Prepare and practice a manual rollback so that you can recover if it goes badly wrong.
  • Test as much as you can before you do it for real on prod. It sounds like you're a good way along with this, thanks to your integration env.
  • The first time you try it, do it on a low-profile, low-impact change, such as a minor upgrade or patch. The idea is to minimise the fallout if it goes wrong. Do not choose a high-profile major upgrade (where the CEO and all your competitors are watching) for your first run.
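For the "back up everything, practise a rollback" point, here is a minimal Python sketch. It snapshots a deployment directory before anything is touched and can copy the snapshot back. It assumes the deployment lives in a single local directory and that `backup_root` is outside it. Databases and other state would need their own backup step, and the function names are illustrative.

```python
import shutil
import time
from pathlib import Path


def backup(deploy_dir: Path, backup_root: Path) -> Path:
    """Copy the current deployment to a timestamped folder and return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{deploy_dir.name}-{stamp}"
    shutil.copytree(deploy_dir, target)  # fails loudly if that snapshot name already exists
    return target


def restore(backup_dir: Path, deploy_dir: Path) -> None:
    """Replace the deployment with a previously taken snapshot."""
    if deploy_dir.exists():
        shutil.rmtree(deploy_dir)
    shutil.copytree(backup_dir, deploy_dir)
```

Practising `restore` on the integration environment is what turns it into a rollback you can actually trust.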

Once you've got a few successful runs under your belt your confidence will grow and soon you'll wonder how you ever managed doing manual deployments.

1
  • 2
    I think your answer is one of the best, because it actually addresses the anxiety, while most of the other answers are off-topic, advocating automated deployment, whose benefits the OP is already aware of. Thus your answer deserves the bounty!
    – user40989
    Commented Dec 11, 2013 at 11:26
8

Do you also run your production machines with remote debugging and step through them manually? Building a proper script is just like writing a program. All of the issues you raise are things the script will need to watch for and check against.

If something goes wrong, it should go through proper rollback procedures, and send you a message. Everything that happens can get logged for later. You can version control the scripts, and set up test cases.
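The rollback-and-notify behaviour could be sketched in Python as follows. Each step is a name with a "do" and an "undo" function. Every step is logged, and a failure undoes the completed steps in reverse order before a message is sent. `notify` is a placeholder for whatever email or chat integration you supply. This sketch assumes that a failing step cleans up its own partial work, and it does not handle an undo that fails.

```python
import logging

log = logging.getLogger("deploy")


def deploy(steps, notify) -> bool:
    """steps: list of (name, do, undo). On failure, undo completed steps in reverse."""
    done = []
    for name, do, undo in steps:
        log.info("starting: %s", name)
        try:
            do()
        except Exception:
            log.exception("failed: %s", name)  # full traceback goes to the log
            for prev_name, prev_undo in reversed(done):
                log.warning("rolling back: %s", prev_name)
                prev_undo()
            notify(f"Deployment failed at '{name}'; rolled back {len(done)} step(s).")
            return False
        done.append((name, undo))
    notify("Deployment succeeded.")
    return True
```

Because the steps are ordinary functions, they can be version-controlled and unit-tested like any other code.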

But if you are manually running commands, you don't have any of these advantages. You instead have a list of disadvantages.

  • You don't have a good log; shell history doesn't count
  • No one else knows how to do it
  • Steps get missed
  • Checks are only sometimes done
  • Some items to deploy may get missed, I've done that before
  • It takes a lot longer
  • You can get interrupted during the process

A proper script should be almost identical to what you would type out in the shell. This is one of the reasons we have bash scripts. If you trust the things you do, why not record everything and tighten it up? Because the computer does the checking, it can be better, faster, and more thorough.
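Recording what you would type and tightening it up can start as small as the following Python sketch. It runs a list of commands in order and appends each command line, its output, and its exit code to a log file, which gives you a real log instead of shell history. It stops at the first command that fails. The commands are whatever you type today, so none are assumed here.

```python
import shlex
import subprocess


def run_all(commands: list[list[str]], log_path: str) -> bool:
    """Run each command, log command/output/exit code, stop at the first non-zero exit."""
    with open(log_path, "a", encoding="utf-8") as log:
        for cmd in commands:
            log.write(f"$ {shlex.join(cmd)}\n")
            result = subprocess.run(cmd, capture_output=True, text=True)
            log.write(result.stdout)
            log.write(result.stderr)
            log.write(f"[exit {result.returncode}]\n")
            if result.returncode != 0:
                return False  # later commands are never run
    return True
```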

4

Here's the biggest argument against manual deploys to production: You're a human and will make mistakes. There will undoubtedly be times when you'll forget to do something that will cause you grief. A well-written automated deployment doesn't have that same tendency. It's true that you can still have messed-up production deploys, but that's because your automated deployment has bugs that need to be solved.

In my experience the benefits of automated deploys to production are tremendous. The biggest one is that you get to have fun on the weekends instead of trying to march through a manual deployment process that won't cooperate.

That said, here are some key pointers for automating your production deployments:

  • Don't do it all at once! Start by slowly writing your automated deployments. Set up a separate non-production environment first, and try automating deployments there. Once you've built up confidence in your automated deployments, you can start thinking about doing production deploys.
  • Start releasing and deploying very frequently! It's much easier to do automated deployments when you don't have 4 months of code waiting to be released. Release small features and bug fixes multiple times per week. The benefits of this release style cannot be overstated!
  • Rely on automated tests to give you confidence that your production environment will work. Again, this takes time to build up, but it is very important. Automated tests are always better than manual acceptance tests. Sure, manual acceptance tests are fine, but automated tests can tell you whether you should deploy into production or not. They are the key that enables this whole process of automated, continuous delivery. If your tests don't pass, you know not to deploy into production.
3

Run the scripts on the live server. It will work, and after you've seen it work fine a few times, you will be perfectly confident in it.

Seriously though, you are more likely to make mistakes than the deployment script.

3

Computers don't make mistakes, people do.

Write your script once and check it thoroughly, going through it line by line. From then on you can be sure that each time you deploy, it will work.

Do it by hand and you're bound to make mistakes. Maybe you wrote down everything you have to do, but it's oh so easy to make a mistake. You have to copy all files except the web.config file? You can bet that someday you will overwrite it. A script will never make this mistake.
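The web.config case shows how a script encodes the rule once and then applies it every time. Here is a minimal Python sketch using the standard library's `shutil.copytree` with `shutil.ignore_patterns`. It needs Python 3.8+ for `dirs_exist_ok`, and the directory names are placeholders. The ignore pattern is matched in every subdirectory, so a `web.config` anywhere in the build is skipped, and the existing one on the site is left alone.

```python
import shutil
from pathlib import Path


def copy_release(build_dir: Path, site_dir: Path, keep=("web.config",)) -> None:
    """Copy everything from build_dir into site_dir, never overwriting protected names."""
    shutil.copytree(
        build_dir,
        site_dir,
        ignore=shutil.ignore_patterns(*keep),  # skipped at every directory level
        dirs_exist_ok=True,  # merge into the existing site directory
    )
```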

3

How can I automate production deployments without experiencing extreme anxiety?

The extreme anxiety you would experience when automating production deployments is, most probably, based on two beliefs:

  1. One day or another, some deployment step will fail, and while you or another human would be able to recover from the failure rapidly, an automated script could overlook it.

  2. An overlooked failure in production has dramatic consequences.

There is little one can do about 2., besides avoiding failures, so let us focus on 1.

A cheap solution that improves slightly on the existing process would be a semi-automatic deployment procedure that waits for validation at the end of each step of the installation. With a semi-automatic solution you would enjoy the benefits of a fully automatic solution, like consistency and reproducibility, while still getting a chance to monitor progress and recover from errors as you are used to doing now.
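A semi-automatic procedure can be as simple as the following Python sketch. The script performs each step, then waits for the operator to type `y` before moving on. The step names and actions are placeholders. `confirm` defaults to the built-in `input` so that it can be swapped out, for example in a test or for a GUI prompt.

```python
from typing import Callable


def run_semi_automatic(steps, confirm: Callable[[str], str] = input) -> int:
    """Run (name, action) steps, pausing for operator approval after each one.

    Returns the number of steps that were run before finishing or stopping.
    """
    for i, (name, action) in enumerate(steps, start=1):
        print(f"Step {i}/{len(steps)}: {name}")
        action()
        answer = confirm(f"'{name}' done. Check it, then continue? [y/N] ")
        if answer.strip().lower() != "y":
            print("Stopped by operator; remaining steps were not run.")
            return i
    return len(steps)
```

Once a step has been approved without incident often enough, its pause can be removed, which moves the procedure gradually towards full automation.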

The semi-automated script and its ecosystem (regression tests, etc.) could also serve as a vehicle for the knowledge you are gathering about failures that happen during the installation procedure and the ways to recover from them.

2

What I like is you can test the deployment on staging or QA and know that when you run it on prod the exact same steps will happen.

When you do it manually it is easier to forget a step or do them out of order.

3
  • The problem is that prod, staging, and QA don't look the same, so the script will do different things in each environment. That means the script will be tested for the first time on production. Commented Dec 10, 2013 at 23:46
  • Then set up an environment that you refresh from Prod just before you run the automated script. Use it for nothing else.
    – HLGEM
    Commented Dec 11, 2013 at 14:48
  • I don't understand. If he could set up an environment that looks like PROD, he wouldn't have a problem at all. Commented Dec 11, 2013 at 17:29
1

...due to hardware and resource constraints our integration environment is not a 2 server load balanced environment like our production environment. While everything else is equal that would be the only difference between our integration and production environments (although a big one!)

Given above, I would probably be as anxious as you.

I once reviewed and tested an automated script that deploys to SLB, and my feeling is that without pre-testing on a load-balanced setup I'd prefer to do things manually.


Besides a prod-like testing setup, another thing that had a significant impact on my peace of mind was that prod deployment was done by a team other than the developers: guys whose only job was to maintain the production environment.

  • In one of the projects I assisted them with deployment as the dev team representative. Prior to deployment, they reviewed my instructions, and during deployment I just sat online, ready to consult if things went wrong. Back then, I learned to appreciate that separation.
     
    Not that they were faster (why would they be? I did test deployments 5x-10x more frequently than they did). The big difference was in focus. My head is always loaded with "main" stuff - coding, debugging, new features - and there are just too many distractions to concentrate properly on deployment. Their main stuff, by contrast, was just production maintenance, and they were focused on that.
     
    It's amazing how much better the brain works when focused. These guys were just so much more attentive; they made far fewer mistakes than I did. They simply knew that stuff better than me. They even taught me a thing or two that made my own test deployments easier.
2
  • Thanks, it is good to hear from somebody who knows what this feels like. Needless to say we are much too small to warrant a build team that handles our production deployments. When you work at a startup you learn to wear 20 different hats pretty fast and I don't always have the luxury of "focus". I think that I am going to write a robust deploy and verification script for my sanity. For the first time in a while I have a two week lull between projects where I can get something like this done.
    – maple_shaft
    Commented Oct 13, 2011 at 18:49
    A verification script, I see. Well, given your situation, this seems to be the next best thing after a dedicated build team. By the way, do you really have no option to test-deploy on a two-server setup? Even if you skip the load balancer, just to smoke-test that both master/slave URLs respond?
    – gnat
    Commented Oct 13, 2011 at 18:58
1

Build a deployment script which you use to move your code into any environment. We use the exact same deployment process to move code to dev, qa, staging, and finally production. Since we're deploying multiple times per day to dev, and daily to QA, we've gained confidence that the deployment scripts are correct. Basically, test the hell out of it by using it often.
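One way to keep a single script for every environment is to move the differences into data. Here is a minimal Python sketch. The environment and host names are hypothetical, chosen to mirror the one-server integration and two-server production setup described in the question. `deploy_host` stands for whatever per-server steps you already have.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Environment:
    name: str
    hosts: tuple[str, ...]


# Hypothetical host names: only this data differs between environments.
ENVIRONMENTS = {
    "dev": Environment("dev", ("dev-app1",)),
    "integration": Environment("integration", ("int-app1",)),
    "production": Environment("production", ("prod-app1", "prod-app2")),
}


def deploy_to(env_name: str, deploy_host: Callable[[str], None]) -> list[str]:
    """Run the same per-host steps on every host of the chosen environment."""
    env = ENVIRONMENTS[env_name]
    for host in env.hosts:
        deploy_host(host)
    return list(env.hosts)
```

Because the production run differs only in its host list, every dev and integration deployment also exercises the code path that production uses.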

1
  1. Simplify. Your change process should be rsync files, run SQL script, nothing more.
  2. Automate.
  3. Test.
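For the "rsync files" part of step 1, here is a Python sketch that builds and runs the rsync command. The flags it uses (`-a`, `--delete`, `--exclude`, `--dry-run`) are standard rsync options. The paths, host, and excluded file name are placeholders. The SQL step is left out because it depends on your database client. Calling it with `dry_run=True` first shows what would change without changing anything.

```python
import subprocess


def rsync_command(src_dir: str, host: str, dest_dir: str,
                  excludes: tuple[str, ...] = (), dry_run: bool = False) -> list[str]:
    """Build an rsync command that mirrors src_dir onto host:dest_dir."""
    cmd = ["rsync", "-a", "--delete"]  # archive mode; remove files no longer in the build
    if dry_run:
        cmd.append("--dry-run")  # report what would change, touch nothing
    # Excluded files are neither copied nor removed by --delete on the receiving side.
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    # The trailing slash copies the contents of src_dir rather than the directory itself.
    cmd += [src_dir.rstrip("/") + "/", f"{host}:{dest_dir}"]
    return cmd


def sync(src_dir: str, host: str, dest_dir: str, **kwargs) -> None:
    """Run the rsync command; check=True raises if rsync exits non-zero."""
    subprocess.run(rsync_command(src_dir, host, dest_dir, **kwargs), check=True)
```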

The reason to automate is to obtain something that is test-able, repeatable, and that you can trust to work correctly in every expected situation.

You still need to have a back out plan, as for any change in any context, and it should be automated as well.

You will still want to observe the process as it happens if the environment is really sensitive, but never do it manually as it just can't be reproduced.

0

It is entirely possible to use automation scripts to deploy to production environments. However to do so reliably you need to be able to do several things.

  1. Reliably roll back to the previous version.
  2. Obtain positive confirmation that the deployment has been successfully applied, and is responding to valid traffic.
  3. Have comparable environments for development and QA which also use the same scripts.

Scripts do have some advantages, such as the fact that they will never miss a command because it's 2am and they're tired.

However, scripts can and will still fail. Sometimes the failure is in the design of the script, but it could also be caused by a network or power failure, a corrupt file system, running out of memory, and so on.

That is why it is important that, after the script has run, a defined test phase follows that verifies the new deployment is up, running, and handling requests before live traffic is enabled.
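That "verify before enabling live traffic" phase can be expressed as a gate, as in the following Python sketch. `probe`, `enable_traffic`, and `roll_back` are hypothetical hooks, for example an HTTP health check and calls to your load balancer. The timeout values are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool], timeout_s: float = 120.0,
                       interval_s: float = 2.0) -> bool:
    """Poll probe() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def release(probe, enable_traffic, roll_back,
            timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Enable live traffic only after positive confirmation; otherwise roll back."""
    if wait_until_healthy(probe, timeout_s, interval_s):
        enable_traffic()
        return True
    roll_back()
    return False
```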

-2
  1. Take a larger deployment window the first time, in case things go wrong.
  2. Divide the deployment process into two parts:

    a. Backup (manual) - this should give you confidence if anything goes wrong during deployment.

    b. Deployment (automated).

Once you have deployed with confidence the first time, you can automate the backup process as well.

1
  • this does not answer the question asked: "What are arguments against this OR arguments for automatic scripted production deployment?"
    – gnat
    Commented Jan 5, 2014 at 11:18
