Fast, secure incremental Rsync backups

WARNING

This is nowhere near complete. As pointed out by Daniel Goering, if you modify an individual file, the change is propagated "back in time" through the backup directories. This means that a change in directory structure is correctly backed up, but the file contents themselves will always be at their most recent version. He suggests modifying the script to use hardlink so that individual altered files are not linked, but completely copied. I will work on a new script in the near future to see if I can implement this. Please be advised that you use this script entirely at your own risk. As harrowing as data loss can be, I cannot be held responsible for any such loss resulting from the use of this script.

The background

I recently hosed my laptop hard drive through an upgrade to kernel 2.6.19. The culprit was a combination of a nasty ext3 filesystem bug and heavy disk usage. I was smart enough to back up my important documents religiously by hand, but I lost every single config file on my heavily customised Debian distribution, a total of 3 years' worth of modifications and long-forgotten workarounds. It took me well over a week to get my laptop working as I liked it again.

With the relative affordability of hard disk space these days, it seemed like a good idea to invest in a disk large enough to back up my entire filesystem to. My laptop disk is only 20GB, so this was laughably affordable. It then occurred to me to make several backups, on a daily and weekly basis. There is an inefficient and an efficient way to do this. I could straight-copy the filesystem into several directories, but transferring 20GB a night is pretty excessive, and probably reduces the lifetime of both the backup disk and the original. In addition, a week's worth of backups would run to 140GB. 500GB disks are now commonplace, but why be so inefficient? There is a better way: Rsync.

So what is Rsync?

Rsync is an open source utility that allows fast incremental file transfer. It's quick, can use SSH so it's secure, and it is efficient. Every time you use it to copy a directory structure, it carefully checks and only copies the changes, saving time and bandwidth.
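
As a rough illustration of "only copies the changes", here is a small local sketch. The scratch directory and file names are made up for the demonstration, and it only uses the -a, -i and --delete options. The second run should itemize only the files that actually changed.

```shell
#!/bin/bash
# Skip the demonstration if rsync is not installed.
command -v rsync >/dev/null || { echo "rsync not found"; exit 0; }

work=$(mktemp -d)                 # scratch directory (made up for this demo)
mkdir "$work/src"
echo "one"       > "$work/src/a.txt"
echo "two"       > "$work/src/b.txt"
echo "unchanged" > "$work/src/d.txt"

# First run: everything is new, so all three files are copied.
rsync -a --delete "$work/src/" "$work/dest/"

# Edit one file, remove another, add a third. d.txt is left alone.
echo "one, edited" >> "$work/src/a.txt"
rm "$work/src/b.txt"
echo "three" > "$work/src/c.txt"

# Second run: -i itemizes what rsync does. d.txt is not transferred again,
# and --delete removes b.txt from the destination.
rsync -ai --delete "$work/src/" "$work/dest/"

ls "$work/dest"
```

The trailing slash on the source matters: it means "the contents of src" rather than "the directory src itself".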

Great. But what if I want to make multiple backups at regular intervals so I can "go back in time"?

This is where cp comes in. Anyone who has spent more than a few minutes on a linux/unix box knows this command. What they might not know is that it is a whole lot more powerful than they give it credit for. Read the man pages, paying particular attention to the -a and -l options.

Combining the two, you get cp -al :) This makes a hard-linked copy of your directory of choice that behaves identically to the original folder, and uses almost no extra space (only the directory entries themselves are new). The upshot of this is, if you now create or delete files in either directory, the other stays the same, and the only difference in disk-space usage is the size of the changes made. If I have a 10MB folder, and I cp -al it, I still only take up 10MB. If I then create two 1MB files in this directory, the total disk space taken up becomes 12MB. How's that for efficient?
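
To make the behaviour concrete, here is a minimal sketch you can run in a scratch directory. It assumes GNU cp (the -l option is not available everywhere), and the directory and file names are made up for the demonstration. It also shows the catch mentioned in the warning at the top: editing a file in place changes it under every name that links to it, whereas replacing the file does not.

```shell
#!/bin/bash
set -e
work=$(mktemp -d)                 # scratch directory (made up for this demo)
mkdir "$work/orig"
echo "version 1" > "$work/orig/notes.txt"

# Hard-linked copy: directories are recreated, files are linked.
cp -al "$work/orig" "$work/snap"

# Both names refer to the same inode, so no file data was duplicated.
[ "$work/orig/notes.txt" -ef "$work/snap/notes.txt" ] && echo "same inode"

# Creating a file in one tree does not affect the other.
echo "new" > "$work/orig/extra.txt"
[ -e "$work/snap/extra.txt" ] || echo "snap has no extra.txt"

# Editing a file in place changes it under both names...
echo "version 2" >> "$work/orig/notes.txt"
cat "$work/snap/notes.txt"        # now holds both lines

# ...whereas replacing it (write a new file, then rename) leaves the copy alone.
echo "version 3" > "$work/orig/notes.txt.new"
mv "$work/orig/notes.txt.new" "$work/orig/notes.txt"
cat "$work/snap/notes.txt"        # still version 1 and version 2
```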

Tell me about BASH

No. :) There are many, many bash scripting guides out there. I'll leave it to Google. Suffice it to say that it is a fairly easy language for writing very small programs that will work on most modern (and old) linux distros, but if you want to write larger scripts, use something like Perl.

Security and redundancy

So now I can back up efficiently to my nice big new disk, but what happens if my laptop is stolen, or catches fire (disturbingly common with Dell laptops these days!), or some sneaky hacker breaks into it and destroys both disks, or the world ends?

Well, there's a solution to all of these problems except the last one. Rsync doesn't just copy stuff efficiently from disk to disk. It lets you do it from computer to computer, using a number of means. As security is the issue here, we'll do it with SSH. SSH is pretty much standard on *nix systems these days. It gives you complete access to another computer as if you were sitting in front of it, but does it over a secure connection. It is every geek's best friend, as it lets you do everything from everywhere without having to get up.

I want my script to work remotely, so that if anything nasty happens to my laptop, it doesn't compromise my backup machine. To do this though, you need to know about unpassworded SSH keys. Instead of logging in with a password, you keep a public key on one computer, and a private key on the other. The machine with the private key can then authenticate itself with the public key automatically, and not need a password.

Before we go any further, I need to warn you that if you set this up without knowing what you are doing, you are likely to enter a whole world of trouble. If someone gets hold of your private key, they can pretend they're you, connect to your computer, and do anything that you can, including delete files, use it to inflict harm on other people's computers anonymously, steal private data you have stored, and cause general havoc.

I'll say that again, to let it sink in. MAKE SURE YOU KNOW WHAT YOU ARE DOING! KEEP YOUR PRIVATE KEYS SAFE. I'm not going to discuss this here, as I am not confident enough in my own abilities to guarantee the security of other people's computers. There are tutorials out there, including the one in the next paragraph.

Now I've scared you (and rightly so), I'll point you in the right direction: A good thing to do is make a specific key that will only work on one machine, and with restrictions. A good tutorial for this can be found here.
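
To give a rough idea of what "with restrictions" can look like, the sketch below builds a single authorized_keys entry. It is an illustration under assumptions, not a vetted setup: the public key text, the server address and the wrapper script path are all placeholders, and you should follow a proper tutorial before relying on anything like it. The from= and command= options, and the no-* options, are standard OpenSSH authorized_keys options.

```shell
#!/bin/bash
# The key pair would be generated on the backup server with something like:
#   ssh-keygen -t ed25519 -N "" -f /home/backupuser/.ssh/rsync-key
# Placeholder standing in for the contents of rsync-key.pub:
pubkey="ssh-ed25519 AAAA...placeholder backupuser@backupserver"

backup_server="192.0.2.10"                    # hypothetical address of the backup server
forced_cmd="/home/conor/bin/validate-rsync"   # hypothetical wrapper that only permits rsync

# One line of ~/.ssh/authorized_keys on the machine being backed up:
# only accept this key from the backup server, always run the wrapper,
# and switch off terminals and forwarding.
entry="from=\"$backup_server\",command=\"$forced_cmd\",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding $pubkey"

printf '%s\n' "$entry"
```

The private half of the key never leaves the backup server. Only the line printed above goes onto the machine being backed up.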

The Script

So now that I've told you about all these tools, how is it brought together? The script looks a little daunting, so I'll talk you through it.

I hope that gave you an insight into how the script works. Here it is. Feel free to edit it to better suit your needs, and if you find a bug or can improve it, tell me!

#!/bin/bash
# ----------------------------------------------------------------------
# Conor's backup solution, with credit to 
# mikes handy rotating-filesystem-snapshot utility
# conorfitzpatrick@gmail.com
# http://void.printf.net/~conor 
# ----------------------------------------------------------------------
# this generates a daily backup of the directory of choice on a computer
# all of it is done from a remote server, reducing cpu overhead and 
# keeping it secure. It also allows multiple computers to be backed up 
# by one server
# ----------------------------------------------------------------------

unset PATH      # suggestion from H. Milz: avoid accidental use of $PATH

# ------------- system commands used by this script --------------------
ID=/usr/bin/id;
ECHO=/bin/echo;
RM=/bin/rm;
MV=/bin/mv;
CP=/bin/cp;
TOUCH=/bin/touch;
RSYNC=/usr/bin/rsync;
MKDIR=/bin/mkdir;

# ------------- file locations -----------------------------------------

BACKUP_DIRECTORY_LOCAL=/mnt/backupdisk/computername;    #the directory in 
                                                        #which you STORE 
                                                        #the backups

BHOST=remotecomputer;   #the host you're BACKING UP FROM

BACKUP_DIRECTORY_REMOTE=/home/conor;    #the directory you're BACKING UP

EXCLUDES=/home/backupaccount/excludes.list;     #files you DON'T want backed up

USER=conor;                     #the user you want to be on the remote computer

KEY=/home/backupuser/.ssh/rsync-key;    #the private ssh key used to log in 
                                        #(its public half goes in 
                                        #authorized_keys on the remote box) 
# ------------- the script itself --------------------------------------

#copy the last backup made to a temporary directory to perform the rsync 
#if it doesn't already exist. If it is the first time, a temporary folder 
#will be made

if [ -d $BACKUP_DIRECTORY_LOCAL/backuptmp ] ; then
        echo "temp already there. not copying back last backup" ;
else
        if [ -d $BACKUP_DIRECTORY_LOCAL/daily.1 ] ; then
                $CP -al $BACKUP_DIRECTORY_LOCAL/daily.1 $BACKUP_DIRECTORY_LOCAL/backuptmp ;
                if (( $? )) ; then
                        echo "Copying back old backup to backuptmp: FAILED"
                        exit 1
                fi;
                echo "populated $BACKUP_DIRECTORY_LOCAL/backuptmp from $BACKUP_DIRECTORY_LOCAL/daily.1" ;

        else
                echo "No prior backups found. Assuming this is the first time you've generated a backup" ;
                $MKDIR $BACKUP_DIRECTORY_LOCAL/backuptmp ;
                if (( $? )) ; then
                        echo "Making temporary backup directory: FAILED"
                        exit 1
                fi;
        fi;
fi;

# actual rsync starts here, connecting to the remote machine by ssh 
# with a key. MAKE SURE YOU KEEP YOUR PRIVATE KEY SAFE, AND FOR THE 
# LOVE OF GOD DON'T SET THIS UP AS ROOT! better yet, edit the 
# authorized_keys to allow only this computer to perform only this command.

        echo "beginning incremental rsync" ;
        $RSYNC  -va --delete --delete-excluded --exclude-from="$EXCLUDES" -e "ssh -i $KEY" $USER@$BHOST:$BACKUP_DIRECTORY_REMOTE/ $BACKUP_DIRECTORY_LOCAL/backuptmp/ ;

        if (( $? )) ; then
                echo "rsync: FAILED"
                exit 1
        fi;

        $TOUCH $BACKUP_DIRECTORY_LOCAL/backuptmp ;

# rotate the locally stored backups now that the sync has succeeded
# delete the oldest snapshot, if it exists:
        echo "cycling backups" ;
        if [ -d $BACKUP_DIRECTORY_LOCAL/daily.7 ] ; then
                $RM -rf $BACKUP_DIRECTORY_LOCAL/daily.7 ;
        fi;

# shift the middle snapshots(s) back by one, if they exist
        if [ -d $BACKUP_DIRECTORY_LOCAL/daily.6 ] ; then
                $MV $BACKUP_DIRECTORY_LOCAL/daily.6 $BACKUP_DIRECTORY_LOCAL/daily.7 ;
        fi;
        if [ -d $BACKUP_DIRECTORY_LOCAL/daily.5 ] ; then
                $MV $BACKUP_DIRECTORY_LOCAL/daily.5 $BACKUP_DIRECTORY_LOCAL/daily.6 ;
        fi;
        if [ -d $BACKUP_DIRECTORY_LOCAL/daily.4 ] ; then
                $MV $BACKUP_DIRECTORY_LOCAL/daily.4 $BACKUP_DIRECTORY_LOCAL/daily.5 ;
        fi;
        if [ -d $BACKUP_DIRECTORY_LOCAL/daily.3 ] ; then
                $MV $BACKUP_DIRECTORY_LOCAL/daily.3 $BACKUP_DIRECTORY_LOCAL/daily.4 ;
        fi;
        if [ -d $BACKUP_DIRECTORY_LOCAL/daily.2 ] ; then
                $MV $BACKUP_DIRECTORY_LOCAL/daily.2 $BACKUP_DIRECTORY_LOCAL/daily.3 ;
        fi;
        if [ -d $BACKUP_DIRECTORY_LOCAL/daily.1 ] ; then
                $MV $BACKUP_DIRECTORY_LOCAL/daily.1 $BACKUP_DIRECTORY_LOCAL/daily.2 ;
        fi;

# make a hard-link-only (except for dirs) copy of the latest snapshot, 
# if that exists
        if [ -d $BACKUP_DIRECTORY_LOCAL/backuptmp ] ; then
                $CP -al $BACKUP_DIRECTORY_LOCAL/backuptmp $BACKUP_DIRECTORY_LOCAL/daily.1 ;
                if (( $? )) ; then
                        echo "copy of temporary backup directory FAILED"
                        exit 1
                fi;
        fi;
echo "backup complete" ;
exit 0
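
If you want to keep a different number of snapshots, the rotation step in the script could be written as a loop instead of one block per snapshot. The following is only a sketch of that idea, run against a scratch directory with fake snapshots. KEEP is a name I've made up, and unlike the script above it relies on $PATH to find rm and mv.

```shell
#!/bin/bash
BACKUP_DIRECTORY_LOCAL=$(mktemp -d)   # scratch directory for the demonstration
KEEP=7                                # hypothetical: number of snapshots to keep

# Create a few fake snapshots to rotate.
for n in 1 2 3 ; do
        mkdir "$BACKUP_DIRECTORY_LOCAL/daily.$n"
        echo "snapshot $n" > "$BACKUP_DIRECTORY_LOCAL/daily.$n/marker"
done

# Delete the oldest snapshot, if it exists.
if [ -d "$BACKUP_DIRECTORY_LOCAL/daily.$KEEP" ] ; then
        rm -rf "$BACKUP_DIRECTORY_LOCAL/daily.$KEEP"
fi

# Shift the remaining snapshots back by one, oldest first.
for (( n = KEEP - 1; n >= 1; n-- )) ; do
        if [ -d "$BACKUP_DIRECTORY_LOCAL/daily.$n" ] ; then
                mv "$BACKUP_DIRECTORY_LOCAL/daily.$n" "$BACKUP_DIRECTORY_LOCAL/daily.$(( n + 1 ))"
        fi
done

ls "$BACKUP_DIRECTORY_LOCAL"    # lists daily.2, daily.3 and daily.4
```

Going from the oldest snapshot down to the newest matters: each move frees up the name that the next move needs.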

That's it! Those of you familiar with cron can make this run daily at a time that suits you (in my case when I'm asleep at 4am), and email the results to you in the morning.
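
For those less familiar with cron, here is a small sketch that writes a crontab fragment to a file so you can look it over before installing it. The email address and the location of the script are placeholders. MAILTO is understood by the common cron implementations, but check your own system's crontab man page.

```shell
#!/bin/bash
# Write a crontab fragment to a temporary file for review.
cronfile=$(mktemp)
cat > "$cronfile" <<'EOF'
# cron mails any output of the job to this address (placeholder)
MAILTO=you@example.com
# minute hour day-of-month month day-of-week command
0 4 * * * /home/backupuser/bin/backup.sh
EOF

cat "$cronfile"

# To install it (this REPLACES the current user's crontab):
#   crontab "$cronfile"
```

Since the script echoes its progress and rsync runs with -v, the email will contain the list of files transferred that night.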

Credit is due to mike, whose first incarnation of this script set me on the right path. His website is here.

Conor Fitzpatrick, 21/01/07
