Incremental Backup of Image Files (or: How to Diff and Patch Big Binary Files)

More often than expected, you run into a problem for which there should be an easy solution, but a short googling session leaves you with the hollow feeling that the world let you down… again. But then you pull out your Unix skills and find a solution on your own.

Update: There is a Part II to this post, which explains the idea behind the solution shown here.

Today is such a day… The problem is as follows: you back up a disk (e.g., the SD card of a Raspberry Pi) with dd like this:

$ sudo dd if=/dev/mmcblk0 of=/media/backup/yyyymmdd-raspi-homebridge.img bs=1M

A backup with dd is a bitwise copy, which takes exactly the space of the disk, no matter how empty the block device is. For example, the dd image of a nominally 16 GB SD card takes about 15 GB (the usable space of the disk). If the device is more or less empty, the image consists of a lot of zeros and compresses very well with tools like bzip2. In your (i.e., my) case, 6 GB are used on the disk, and after compression the image is less than 2 GB. Sounds great, right? Unfortunately, you are paranoid and want to keep the last X backups. Even with a small X, this gets really hungry on your cloud storage. This is when your inner voice says: wouldn’t it be great to store only the delta between an old and a new backup?

That means you store the most recent backup in full (compressed), since you are more likely to need it than the older ones. Each older backup is just a delta to the next-newer backup. Each time a new backup is created, the predecessor image is replaced by a diff/delta between it and the new backup.

There must be a solution for this, right? Meh, at least I couldn’t find that solution. If you found it, please comment below. So, I started some experiments…

First, I tried the tool bsdiff. Unfortunately, it needs main memory of at least 17 times the size of the source file. Since we are talking about a 16 GB file here, that would be on the order of 270 GB of RAM (and even the compressed file would be an issue), so the tool didn’t even try and gave me an “invalid argument” error. For a moment I considered splitting the file into smaller chunks… but then I tried git diff --no-index --binary. It tried hard, but died when it tried to allocate too much memory. Perhaps it uses the same algorithm as bsdiff. Then I remembered that I had already done something like this with rdiff-backup way back when, which led me to the blog post “Creating and Applying Diffs with Rsync”. Using rsync for this works great. The rest of this post describes the solution and gives some numbers on how much space you can save (more precisely: how much space I could save in my case).

The following script can be used to create the patch file (diffBinary.sh):

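#!/bin/bash
# Creates a patch (an rsync batch file) that turns newFile back into oldFile.

if [ $# -ne 3 ]; then
    echo "Usage: diffBinary.sh oldFile newFile patchFile"
    echo "  oldFile is the last binary file"
    echo "  newFile is the new binary file"
    echo "  patchFile the resulting patch file :)"
    exit 1
fi

oldFile=$1
newFile=$2
patchFile=$3

# --only-write-batch records the changes that would make newFile identical
# to oldFile, without modifying newFile itself.
rsync -a --only-write-batch="$patchFile" "$oldFile" "$newFile"
# rsync also writes a helper script next to the batch file; it is not needed here.
rm -f "${patchFile}.sh"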

And the following one applies the patch (patchBinary.sh):

#!/bin/bash
# Restores oldFile by applying a patch from diffBinary.sh to a copy of newFile.

if [ $# -ne 3 ]; then
    echo "Usage: patchBinary.sh patchFile fullFile patchedFile"
    echo "These files refer to the corresponding files from the diffBinary.sh command: "
    echo "  patchFile ^= patchFile <- the result of diffBinary.sh"
    echo "  fullFile ^= newFile"
    echo "  patchedFile ^= oldFile <- here patchBinary.sh will store its result :)"
    exit 1
fi

patchFile=$1
fullFile=$2
patchedFile=$3

# Work on a copy so that fullFile stays untouched, then let rsync apply the batch.
cp "$fullFile" "$patchedFile"
rsync -a --read-batch="$patchFile" "$patchedFile"

I use the two scripts like this:

$ diffBinary.sh 20220211-raspi-homebridge.img 20220212-raspi-homebridge.img 20220211-20220212-raspi-homebridge.patch
$ patchBinary.sh 20220211-20220212-raspi-homebridge.patch 20220212-raspi-homebridge.img 20220211-raspi-homebridge.img
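
If you want to convince yourself that such a reverse delta really restores the old file, a round trip on throwaway data is a cheap test. The following is only a minimal sketch, assuming bash, rsync, cmp and GNU dd are installed. The function name roundtrip_demo and the file names old.img, new.img, restored.img and old.patch are made up for this example, and the two rsync calls are inlined instead of calling the scripts.

```shell
#!/bin/bash
# Sketch: build a reverse delta between two throwaway files and check that
# applying it to the newer file reproduces the older one byte for byte.
# All names below are hypothetical and exist only for this demo.

roundtrip_demo() (
    set -e
    cd "$(mktemp -d)"

    # "old" image: 4 MiB of random data, back-dated so that rsync's quick
    # check (same size and same mtime means "unchanged") does not skip it.
    dd if=/dev/urandom of=old.img bs=1M count=4 status=none
    touch -t 202202110000 old.img

    # "new" image: the same data with 64 KiB overwritten at offset 1 MiB.
    cp old.img new.img
    dd if=/dev/urandom of=new.img bs=64K count=1 seek=16 conv=notrunc status=none

    # Delta that turns new.img back into old.img (same call as diffBinary.sh).
    rsync -a --only-write-batch=old.patch old.img new.img
    rm -f old.patch.sh

    # Apply it to a copy of new.img (same steps as patchBinary.sh).
    cp new.img restored.img
    rsync -a --read-batch=old.patch restored.img

    # cmp exits non-zero on the first differing byte, which aborts the demo.
    cmp old.img restored.img
    echo "round trip OK"
    ls -l old.img old.patch
)
```

After sourcing the snippet, calling roundtrip_demo runs everything in a fresh temporary directory. The touch line is there because both demo files have the same size and would otherwise get nearly the same timestamp; with real backups taken on different days this should not be an issue.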

After using bzip2 the file sizes look like the following ☺️:

534M 12 Feb 01:10 20220105-20220211-raspi-homebridge.patch.bz2
138M 13 Feb 22:32 20220211-20220212-raspi-homebdridge.patch.bz2
1,8G 12 Feb 21:22 20220212-raspi-homebridge.img.bz2

Exciting 🤓.
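
The rotation described at the beginning (keep the newest image in full, turn its predecessor into a delta) could be sketched roughly like this. It is not the script I actually run, just an illustration under a few assumptions: both images are available uncompressed, rsync, cmp, bzip2 and mktemp are installed, and there is enough free space for one temporary copy of the image. The function name rotate_backup and its three parameters are hypothetical.

```shell
#!/bin/bash
# Sketch of one rotation step: replace the previous full image by a compressed
# reverse delta against the new image. Assumes both images are uncompressed.

rotate_backup() {
    # $1 = previous full image, $2 = new full image, $3 = patch file to create
    local prev=$1 new=$2 patch=$3 check rc

    # Delta that turns $new back into $prev (same call as in diffBinary.sh).
    rsync -a --only-write-batch="$patch" "$prev" "$new" || return 1
    rm -f "${patch}.sh"

    # Before deleting anything, rebuild $prev from $new and compare.
    check=$(mktemp "${prev}.XXXXXX") || return 1
    cp "$new" "$check" &&
        rsync -a --read-batch="$patch" "$check" &&
        cmp -s "$prev" "$check"
    rc=$?
    rm -f "$check"
    if [ "$rc" -ne 0 ]; then
        echo "patch does not reproduce $prev, keeping it" >&2
        return 1
    fi

    bzip2 -f "$patch"    # leaves ${patch}.bz2
    rm -f "$prev"        # the old full image is now redundant
}
```

With the file names from above, a call would look like rotate_backup 20220211-raspi-homebridge.img 20220212-raspi-homebridge.img 20220211-20220212-raspi-homebridge.patch. The verification step costs one extra copy of the image, but it means the old backup is only deleted once the patch has been shown to restore it. Compressing the new full image afterwards is left out of the sketch.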
