Saturday, December 26, 2009

Backups with Duplicity and Amazon S3

For a while now I have been keeping remote backups on the Amazon S3 service. Why Amazon S3? Well, it's very cheap, costing me only about US 20c a month for 1-2 GB of compressed backups. Don't expect great support service at these prices though; you are very much on your own.

My criteria for backups were the ability to do incremental backups, a site remote from my own computer, and ease of use. Less critical was the ability to encrypt data (my stuff isn't that private), but it seemed like a good idea anyway.

The obvious choice for doing the actual backup was Duplicity. Great program, though the documentation is a little hard to follow. Duplicity reads your files, compresses them into tarballs, then uploads the tarballs onto the remote site along with an index file. When you do the next incremental backup, Duplicity downloads the index file(s) and checks what has changed. It then uploads incremental change files, with new indexes.

Duplicity can also encrypt the tarballs with your pgp key, making them secure from prying eyes. Not that I think Amazon really wants to sift through my files.

That's the background. The purpose of this post is to show a shell script that automates it all:

#!/bin/bash

source "${HOME}/.backup-data/duplicity-amazon.conf"

INCLUDE_FILE="${HOME}/.backup-data/include_file_amazon.txt"

SOURCE="/"

DESTINATION="s3+http://your-bucket"

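# Back up everything listed in the include file; run a full backup when
# the last full one is more than a month old, otherwise an incremental.
/usr/bin/duplicity -v5 --encrypt-key "${ENCRKEY}" --sign-key "${SIGNKEY}" \
--full-if-older-than 1M \
--include-globbing-filelist "${INCLUDE_FILE}" \
--exclude '**' \
--allow-source-mismatch \
"${SOURCE}" \
"${DESTINATION}"

# Keep the two most recent full backups and delete older ones.
/usr/bin/duplicity remove-all-but-n-full 2 \
-v4 --encrypt-key "${ENCRKEY}" --sign-key "${SIGNKEY}" \
--force \
"${DESTINATION}"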

# Reset the ENV variables
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=


This shell script is designed to be run on a multi-user system and backs up specified directories for each user, such as their home directory.

The first lines load the user's Amazon and GPG key settings and name the list of files to be backed up. These are the files duplicity-amazon.conf and include_file_amazon.txt, both kept in the .backup-data directory under the user's home directory.

The source for backing up is set as the file system root ('/'), and the backup destination is your chosen Amazon S3 bucket.

The first call to Duplicity does a full backup if the last full backup is older than one month; otherwise it does an incremental backup. All matching directories in the include file are backed up, and the --exclude '**' option ignores everything else.

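The second call to Duplicity keeps the two most recent full backups, along with their incrementals, and removes everything older. With a full backup roughly every month, that leaves between one and two months of history.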
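One thing the script does not do is check that the backup succeeded before removing old backups. Here is a minimal sketch of such a guard, assuming the two Duplicity calls are wrapped in functions. The names do_backup, do_prune and backup_then_prune are made up for this example, and the variables are the ones set in the main script:

```bash
#!/bin/bash

# Hypothetical wrappers around the two Duplicity calls from the script.
# ENCRKEY, SIGNKEY, INCLUDE_FILE, SOURCE and DESTINATION are expected to
# be set already, as in the main script.
do_backup() {
    /usr/bin/duplicity -v5 --encrypt-key "${ENCRKEY}" --sign-key "${SIGNKEY}" \
        --full-if-older-than 1M \
        --include-globbing-filelist "${INCLUDE_FILE}" \
        --exclude '**' \
        --allow-source-mismatch \
        "${SOURCE}" "${DESTINATION}"
}

do_prune() {
    /usr/bin/duplicity remove-all-but-n-full 2 \
        -v4 --encrypt-key "${ENCRKEY}" --sign-key "${SIGNKEY}" \
        --force \
        "${DESTINATION}"
}

# Prune only when the backup step exits with status 0; otherwise report
# the failure and pass the backup's exit status back to the caller.
backup_then_prune() {
    local status
    if do_backup; then
        do_prune
    else
        status=$?
        echo "Backup failed with status ${status}; old backups left alone." >&2
        return "${status}"
    fi
}
```

Calling backup_then_prune in place of the two separate commands means a failed upload never leads to older backups being deleted.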

Now the two files mentioned above. First the pgp key file:

#!/bin/sh

# Amazon keys
export AWS_ACCESS_KEY_ID='Your Amazon ID'
export AWS_SECRET_ACCESS_KEY='Your Amazon key'

# gpg encryption key
SIGNKEY='Your pgp key'
ENCRKEY=$SIGNKEY
# gpg passphrase
export PASSPHRASE='Your secret passphrase'

It is important that this file is stored under the user's home directory with read/write permissions for the user only. You don't want everyone seeing your GPG passphrase and Amazon keys.
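The permissions can be set with chmod 600 on the file. As a rough sketch, the backup script could also refuse to run when the file is readable by anyone else. The function name is_private_file is my own invention, and the check simply looks at the mode string printed by ls:

```bash
#!/bin/bash

# Succeeds only if the file exists and grants no permissions to group or
# others. The function name is made up for this example.
is_private_file() {
    local file="$1" perms
    [ -f "${file}" ] || return 1
    # Characters 5-10 of the ls mode string are the group and other bits.
    perms=$(ls -ld -- "${file}" | cut -c5-10)
    [ "${perms}" = "------" ]
}

# Example use near the top of the backup script:
#   CONF="${HOME}/.backup-data/duplicity-amazon.conf"
#   is_private_file "${CONF}" || { echo "Fix permissions on ${CONF}" >&2; exit 1; }
```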

Note that the key environment variables are reset to empty values before the main script ends. Again, this is for security.
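Another way to keep the keys from lingering, sketched here under the assumption that bash is in use, is to load the configuration file inside a subshell so the variables never reach the calling shell at all. The function name run_with_secrets is hypothetical:

```bash
#!/bin/bash

# Load the config file and run the given command inside a subshell.
# Variables set by the config file disappear when the subshell ends.
# Note that the command's arguments are expanded by the caller, before
# the config file is read, so they cannot refer to variables it defines.
run_with_secrets() {
    local conf="$1"
    shift
    (
        source "${conf}" || exit 1
        "$@"
    )
}

# Example use, relying only on the exported Amazon keys:
#   run_with_secrets "${HOME}/.backup-data/duplicity-amazon.conf" \
#       /usr/bin/duplicity collection-status "s3+http://your-bucket"
```

This suits commands that only need the exported variables. The main backup call also needs ENCRKEY and SIGNKEY on its command line, so it would have to be wrapped in a function or script of its own to be run this way.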

Now the include file. Note that entries are full paths under the source root ('/') we specified earlier:

/home/asimpson/Documents
/home/asimpson/simpson image media
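A typo in this file means a directory quietly goes without a backup. Here is a small sketch that checks each entry exists before the backup runs. It assumes one plain path per line, as in the example above, with no glob patterns; the function name check_include_file is made up:

```bash
#!/bin/bash

# Print every entry of an include file that does not exist on disk and
# return non-zero if any were found. Assumes one plain path per line;
# blank lines are skipped. Paths containing spaces are handled.
check_include_file() {
    local include_file="$1" path missing=0
    [ -r "${include_file}" ] || return 2
    while IFS= read -r path || [ -n "${path}" ]; do
        if [ -z "${path}" ]; then
            continue
        fi
        if [ ! -e "${path}" ]; then
            echo "missing: ${path}" >&2
            missing=1
        fi
    done < "${include_file}"
    return "${missing}"
}

# Example use before the first Duplicity call:
#   check_include_file "${INCLUDE_FILE}" || exit 1
```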
