28

We end up with a fair number of AWS EC2 snapshots where the AMI has been deleted, but the snapshot is left behind to rot. I'd like a non-manual way of identifying and deleting these orphans to save us money and space.

Ideally I'm thinking a bash script leveraging the CLI, but my AWS-fu is weak. I assume someone's done this before but I can't find a script that actually works.

In the best-case scenario this will also check volumes and clean those as well, but that may be better suited for a second question.


5 Answers

16

Largely inspired by the blog posts and gist already linked in the other answers, here is my take on the problem.

I used some convoluted JMESPath expressions to get the snapshot IDs one per line, so tr is not required.

Disclaimer: Use at your own risk. I did my best to avoid any problems and keep sane defaults, but I won't take any blame if it causes problems for you.

#!/bin/sh
# remove x if you don't want to see the commands
set -ex

# Some variable initialisation with sane defaults
DRUN='--dry-run'
DO_DELETE=${1:-'no'}
REGION=${2:-'eu-west-1'}
ACCOUNTID=${3:-'self'}

# Get two temporary files
SNAP_FILE=$(mktemp)
IMAGE_FILE=$(mktemp)

# Get the snapshot list and the list of snapshots referenced by available images (AMIs)
aws --region "$REGION" ec2 describe-snapshots --owner-ids "$ACCOUNTID" --query 'Snapshots[*].[SnapshotId]' --output text > "$SNAP_FILE"
aws --region "$REGION" ec2 describe-images --owners "$ACCOUNTID" --filters Name=state,Values=available --query 'Images[*].BlockDeviceMappings[*].Ebs.[SnapshotId]' --output text > "$IMAGE_FILE"

# Check whether the output commands should be dry-run (default) or not
if [ "$DO_DELETE" = "IAMSURE" ]
then
 DRUN=''
fi

# Count each snapshot ID, decrement when an image (AMI) references it, print a delete command for those no image references
awk -v REGION="$REGION" -v DRUN="$DRUN" '
FNR==NR { snap[$1]++; next } # first file: count each snapshot, then skip to the next line

{ snap[$1]-- } # second file: decrement the counter for each snapshot an image references

END {
  for (s in snap) { # loop over the snapshots
    if (snap[s] > 0) { # still above zero: no image references this snapshot
      cmd="aws --region " REGION " " DRUN " ec2 delete-snapshot --snapshot-id " s
      print(cmd)
    }
  }
}
' "$SNAP_FILE" "$IMAGE_FILE"
# Clean up the temp files
rm "$SNAP_FILE" "$IMAGE_FILE"

I hope the script itself is commented enough.
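To see the counting logic in isolation, here is a minimal sketch that runs the same awk program against made-up data instead of real AWS output. The snapshot IDs are hypothetical, and the "None" line is an assumption about how the CLI's text output might render an image mapping that has no snapshot.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the two AWS CLI outputs.
SNAP_FILE=$(mktemp)
IMAGE_FILE=$(mktemp)
printf '%s\n' snap-aaa snap-bbb snap-ccc > "$SNAP_FILE"   # every snapshot we own
printf '%s\n' snap-bbb None > "$IMAGE_FILE"               # snapshots referenced by AMIs

# Same logic as the script: +1 per snapshot, -1 per AMI reference.
# Whatever stays above zero is not referenced by any AMI.
ORPHANS=$(awk 'FNR==NR { snap[$1]++; next }
               { snap[$1]-- }
               END { for (s in snap) if (snap[s] > 0) print s }' \
          "$SNAP_FILE" "$IMAGE_FILE" | sort)

echo "$ORPHANS"
rm "$SNAP_FILE" "$IMAGE_FILE"
```

Only snap-aaa and snap-ccc are reported, because snap-bbb drops back to zero and "None" goes negative. The sort is there only because awk does not guarantee the order of a for-in loop.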

With no parameters, the script lists dry-run delete commands for the orphaned snapshots of the current account in region eu-west-1. Sample output:

aws --region eu-west-1 --dry-run ec2 delete-snapshot --snapshot-id snap-81e5856a
aws --region eu-west-1 --dry-run ec2 delete-snapshot --snapshot-id snap-95c68c7e
aws --region eu-west-1 --dry-run ec2 delete-snapshot --snapshot-id snap-a3bf50bd

You can redirect this output to a file for review before sourcing it to execute all the commands.
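As one possible way to do that review, the sketch below filters the saved commands against a keep-list of snapshot IDs (for example manually created snapshots that no AMI references but that should survive). The file names and the keep-list idea are assumptions for illustration, not part of the script above.

```shell
#!/bin/sh
# CMDS stands in for the redirected script output; KEEP is a hypothetical keep-list.
CMDS=$(mktemp)
KEEP=$(mktemp)
cat > "$CMDS" <<'EOF'
aws --region eu-west-1 --dry-run ec2 delete-snapshot --snapshot-id snap-81e5856a
aws --region eu-west-1 --dry-run ec2 delete-snapshot --snapshot-id snap-95c68c7e
aws --region eu-west-1 --dry-run ec2 delete-snapshot --snapshot-id snap-a3bf50bd
EOF
printf '%s\n' snap-95c68c7e > "$KEEP"

# -v: drop matching lines, -w: whole words only,
# -F: fixed strings (no regex), -f: read the patterns from a file
REVIEWED=$(grep -v -w -F -f "$KEEP" "$CMDS")

echo "$REVIEWED"
rm "$CMDS" "$KEEP"
```

The filtered list can then be saved to a file and run with sh once you are happy with its contents.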

If you want the script to execute the commands instead of printing them, replace print(cmd) with system(cmd).
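A small sketch of what that swap looks like, with echo standing in for the real aws command so nothing is deleted. The failure counter is an addition for illustration and is not in the original script.

```shell
#!/bin/sh
# "echo" replaces the aws call here; the snapshot IDs are made up.
RESULT=$(printf '%s\n' snap-aaa snap-bbb | awk '
{
  cmd = "echo would delete " $1
  rc = system(cmd)            # run the command; rc is its exit status
  if (rc != 0) failed++       # count the commands that did not succeed
}
END { print "failed: " (failed + 0) }
')

echo "$RESULT"
```

Checking the return value of system() lets the script report failed deletions instead of silently moving on.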

Usage is as follows, with a script named snap_cleaner:

For dry-run commands in the us-west-1 region:

./snap_cleaner no us-west-1

For usable commands in eu-central-1:

./snap_cleaner IAMSURE eu-central-1

A third parameter can be used to access another account (I prefer to switch role into the other account beforehand).

Stripped-down version of the script, with the awk program as a one-liner:

#!/bin/sh
set -ex

# Some variable initialisation with sane defaults
DRUN='--dry-run'
DO_DELETE=${1:-'no'}
REGION=${2:-'eu-west-1'}
ACCOUNTID=${3:-'self'}

# Get two temporary files
SNAP_FILE=$(mktemp)
IMAGE_FILE=$(mktemp)

# Get the snapshot list and the list of snapshots referenced by available images (AMIs)
aws --region "$REGION" ec2 describe-snapshots --owner-ids "$ACCOUNTID" --query 'Snapshots[*].[SnapshotId]' --output text > "$SNAP_FILE"
aws --region "$REGION" ec2 describe-images --owners "$ACCOUNTID" --filters Name=state,Values=available --query 'Images[*].BlockDeviceMappings[*].Ebs.[SnapshotId]' --output text > "$IMAGE_FILE"

# Check whether the output commands should be dry-run (default) or not
if [ "$DO_DELETE" = "IAMSURE" ]
then
 DRUN=''
fi

# Count each snapshot ID, decrement when an image (AMI) references it, print a delete command for those no image references
awk -v REGION="$REGION" -v DRUN="$DRUN" 'FNR==NR { snap[$1]++; next } { snap[$1]-- } END { for (s in snap) { if (snap[s] > 0) { cmd="aws --region " REGION " " DRUN " ec2 delete-snapshot --snapshot-id " s; print(cmd) } } }' "$SNAP_FILE" "$IMAGE_FILE"
# Clean up the temp files
rm "$SNAP_FILE" "$IMAGE_FILE"
12
  • Magnific! And except from the 'follow' (which IMO should be 'follows'), I think this answer is to be considered as a sample of high quality posts. The only thing in it that seems a bit redundant, is the disclaimer (anything one uses from something on an SE site comes with "use it at your own risk"). I can only think of 1 additional improvement you might want to add: an indication if you did test this script and if so how to summarize its test results (something like "works as designed"?). Obviously, if you already use it yourself, that's an even better indication.
    – Pierre.Vriens
    Commented Mar 23, 2017 at 11:08
  • @pierre wrote it this morning, tested partially, will probably enter our pipeline this afternoon, and while I agree on the general idea of 'provided as is', the risk level of removing a 'backup' is high and I feel I should stress it even more.
    – Tensibai
    Commented Mar 23, 2017 at 11:16
  • Hm, so we can get you involved to start a free code writing service for these kinds of DevOps needs (with some disclaimer-strings attached) ... interesting! I suggest that later on (when time is right), you add a minor update (at the end) like "my script entered our pipeline this afternoon".
    – Pierre.Vriens
    Commented Mar 23, 2017 at 11:24
  • 1
    Perfect, thanks for editing! Works exactly as intended.
    – Alex
    Commented Mar 23, 2017 at 14:19
  • 1
    I needed a list of snapshots that was easy to check, and potentially remove some entries from, before actually deleting anything, as I had some manually created snapshots that I wanted to keep even though no AMI referenced them. So I used your code as the base for a new script that supports this: gist.github.com/Kreinoee/8d80ff8710f983cfc096be3afa641216 Commented Mar 24, 2021 at 21:37
5

I used the following script on GitHub by Rodrigue Koffi (bonclay7) and it works pretty well.

https://github.com/bonclay7/aws-amicleaner

Command:

amicleaner --check-orphans

According to the documentation blog post, it does more than check for orphans:

It actually does a bit more than that, as of today it allows:

  • Removing a list of images and associated snapshots
  • Mapping AMIs:
    • Using names
    • Using tags
  • Filtering AMIs:
    • used by running instances
    • from autoscaling groups (launch configurations) with a desired capacity set to 0
    • from launch configurations detached from autoscaling groups
  • Specifying how many AMIs you want to keep
  • Cleaning orphan snapshots
  • A bit of reporting
3

Here is one script that can help you find orphaned snapshots:

comm -23 <(echo $(ec2-describe-snapshots --region eu-west-1 | grep SNAPSHOT | awk '{print $2}' | sort | uniq) | tr ' ' '\n') <(echo $(ec2-describe-images --region eu-west-1 | grep BLOCKDEVICEMAPPING | awk '{print $3}' | sort | uniq) | tr ' ' '\n') | tr '\n' ' '

(from here)

You can also check this article from Server Fault.

P.S. Of course you can change the region to match your own.

P.P.S. Here is the updated code, using the aws CLI:

comm -23 \
<(aws ec2 describe-snapshots --region eu-west-1 --owner-ids self --query 'Snapshots[*].[SnapshotId]' --output text | sort -u) \
<(aws ec2 describe-images --region eu-west-1 --owners self --query 'Images[*].BlockDeviceMappings[*].Ebs.[SnapshotId]' --output text | sort -u) | tr '\n' ' '

A short explanation of what the code does:

aws ec2 describe-snapshots --region eu-west-1 --owner-ids self --query 'Snapshots[*].[SnapshotId]' --output text | sort -u

sends the sorted list of your own snapshot IDs to STDOUT. This construction:

<(...)

creates a temporary virtual file handle so that the comm command can read the two lists as "files" and compare them. With -23, comm prints only the lines that are unique to the first list, i.e. the snapshots that no image references.
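The comparison itself can be tried without touching AWS. This is a minimal sketch in which two hypothetical functions stand in for the snapshot and image listings. Note that process substitution is a bash feature, so the snippet assumes bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the two aws listings; the IDs are made up.
all_snapshots() { printf '%s\n' snap-ccc snap-aaa snap-bbb; }
ami_snapshots() { printf '%s\n' snap-bbb; }

# comm needs sorted input. -23 hides the lines unique to the second list (2)
# and the lines common to both (3), leaving what is only in the first list.
ORPHANS=$(comm -23 <(all_snapshots | sort -u) <(ami_snapshots | sort -u))

echo "$ORPHANS"
```

Here snap-aaa and snap-ccc are printed, since only snap-bbb appears in both lists.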

10
  • Did you test it? I found the same article but can't get it to work. If you can, user error on my end, but I fear it may be outdated based on the age of the article.
    – Alex
    Commented Mar 22, 2017 at 18:26
  • @Alex, can check it tomorrow Commented Mar 22, 2017 at 18:27
  • The commands seem to have changed; use aws ec2 describe/delete
    – Tensibai
    Commented Mar 22, 2017 at 18:47
  • 1
    I found the same source, but chaining grep, awk, sort and uniq makes my shell coder side sad. I'll post my version tomorrow :)
    – Tensibai
    Commented Mar 22, 2017 at 18:53
  • 1
    Fine for me, just wanted to provide you some (constructive) feedback to let you know that what probably looks like regular English to an expert (like you), looks pretty much like Chinese to me, ok? PS: and it doesn't sound Flemish either ... Drop me an extra comment if you want to notify me after you're done (if you want my updated feedback then).
    – Pierre.Vriens
    Commented Mar 22, 2017 at 19:22
2

Here is a GitHub Gist code snippet by Daniil Yaroslavtsev that does exactly what you are asking for.

It takes the list of all images and their snapshots and compares those IDs to the list of all snapshot IDs. Whatever remains are the orphaned ones. The code works on the same principle as the answer above, but is better formatted and slightly more readable.

The code takes advantage of JMESPath through the --query Snapshots[*].SnapshotId option (you can also use the jp command-line utility for that, if it's already in your distribution). It then formats the output as text with --output text. Here is a link to the API reference and a few examples. It is slightly more elegant than a long chain of grep/awk/sort/uniq/tr pipes.

Warning by Todd Walton: Don't confuse this with the jq utility, which uses a different query language to parse JSON documents.
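As a rough sketch of that --query / --output text combination (not the gist's actual code), the helper below lists the IDs of your own snapshots one per line. The function name and the eu-west-1 default are assumptions. The tr step is there because a flat Snapshots[*].SnapshotId projection comes back tab-separated on a single line in text mode.

```shell
#!/bin/sh
# list_own_snapshot_ids is a hypothetical helper name, not part of the gist.
# $1: region (assumed default: eu-west-1)
list_own_snapshot_ids() {
  aws --region "${1:-eu-west-1}" ec2 describe-snapshots \
    --owner-ids self \
    --query 'Snapshots[*].SnapshotId' \
    --output text | tr '\t' '\n'   # one ID per line instead of tab-separated
}
```

Calling list_own_snapshot_ids us-west-1 would then print the snapshot IDs for that region, ready to be sorted and compared against the IDs referenced by images.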

2
  • Just FYI, the jq command line utility is not the same JSON query language as what the "aws" command uses. The "aws" command uses JMESPath. Commented Nov 14, 2018 at 17:54
  • Thank you for pointing that out. I've learned something new today. Commented Nov 14, 2018 at 18:58
0

I've written a snapshots.py script which iterates over all snapshots (in a defined list of regions) and generates report.csv. This file contains information about the instance, AMI and volume referenced by each snapshot.

There is also a command to interactively remove dangling snapshots.
