After more than 2 years of needing a backup server, I've built myself a backup server.
I mentioned in the Synapse Migration post earlier that I was looking into backups with Btrbk, and I've figured out how that works. It's actually super simple once you've set the first one up.
Btrfs
Btrfs, pronounced "bee-tree ef es" or "butter ef es", is a copy-on-write filesystem designed to allow for quickly and efficiently saving and restoring the system's state. It has features such as atomic snapshotting and filesystem-level RAID. Btrfs snapshots are implemented by telling the filesystem that we want to save the current state, so any writes to the disk after that point happen in free space somewhere else. The snapshot and the live data share every block that hasn't changed, meaning a snapshot takes up no additional room on the disk until something changes, and then only the changed blocks take up extra space. It doesn't need to replicate the whole filesystem in order to change a few files.
Some basic ideas of btrfs involve 'subvolumes', which are essentially individual filesystems that btrfs keeps track of. You can create new subvolumes on the fly, and take snapshots of any subvolume at any time.
$ btrfs subvolume create /path/to/@subvolume
Subvolumes can be mounted as if they were partitions.
$ mount -o subvol=@subvolume /dev/mapper/my-btrfs-drive /place/to/mount
Taking a snapshot of a subvolume
$ btrfs subvolume snapshot -r /path/to/@subvolume /path/to/snapshots/@subvolume-snapshot-1
A snapshot is itself a subvolume, but we can use the -r flag to store it as read-only. If you need to restore a subvolume from a snapshot, you can do this:
$ btrfs subvolume delete /path/to/@subvolume
$ btrfs subvolume snapshot /path/to/snapshots/@subvolume-snapshot-1 /path/to/@subvolume
Here, we've deleted our current subvolume and replaced it with a snapshot of the snapshot we created earlier. Notice we didn't use the -r flag, so this new snapshot we've created can be written to.
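Deleting the live subvolume first means there's a moment where the only copy is the snapshot. A more cautious variant is to move the current subvolume aside and only delete it once the restore checks out. Here's a rough sketch of that idea. It assumes both paths are on the same btrfs filesystem, so mv is a plain rename. restore_subvolume, run, and DRY_RUN are names made up for this example, and the paths are the same placeholders as above. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: restore a subvolume from a read-only snapshot without deleting first.
# restore_subvolume, run and DRY_RUN are illustrative names, not part of btrfs.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: restore_subvolume <live-subvolume> <read-only-snapshot>
restore_subvolume() {
    local live="$1" snapshot="$2"
    # Move the current subvolume aside instead of deleting it right away.
    run mv "$live" "$live.broken" || return 1
    # A snapshot taken without -r is writable, so it can take over the live path.
    run btrfs subvolume snapshot "$snapshot" "$live" || return 1
    echo "Previous data kept at $live.broken; delete it once the restore is verified."
}

restore_subvolume /path/to/@subvolume /path/to/snapshots/@subvolume-snapshot-1
```

If the subvolume is mounted somewhere by name, it needs to be remounted afterwards so the mount points at the new copy.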
What if we have two hard drives with btrfs filesystems on them, and we want to move a subvolume from one to the other? We use btrfs send and btrfs receive to do that:
$ btrfs subvolume snapshot -r /mnt/disk1/@subvolume /mnt/disk1/@snapshot1
$ btrfs send /mnt/disk1/@snapshot1 | btrfs receive /mnt/disk2/
This command will transfer the entire @snapshot1 subvolume from disk1 to disk2, but say we do something like:
$ echo "hello" > /mnt/disk1/@subvolume/new_file
And now we want to send the changes to the new disk? Well, now we'll do something a little different:
$ btrfs subvolume snapshot -r /mnt/disk1/@subvolume /mnt/disk1/@snapshot2
$ btrfs send -p /mnt/disk1/@snapshot1 /mnt/disk1/@snapshot2 | btrfs receive /mnt/disk2/
But this time, it only sends what has been changed between @snapshot1 and @snapshot2! This way, all our future backups will be quick.
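The snapshot-then-send pair above is easy to wrap up so each new backup is one call. This is only a sketch: send_incremental and DRY_RUN are names invented for the example, and the paths are the placeholders from this section. It assumes the parent snapshot already exists on both disks, which is what lets btrfs send transfer only the difference. By default it just prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: snapshot a subvolume and send only the difference to a second disk.
# send_incremental and DRY_RUN are illustrative names, not btrfs features.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

# usage: send_incremental <subvolume> <parent-snapshot> <new-snapshot> <target-dir>
send_incremental() {
    local subvol="$1" parent="$2" new="$3" target="$4"
    if [ "$DRY_RUN" = "1" ]; then
        echo "btrfs subvolume snapshot -r $subvol $new"
        echo "btrfs send -p $parent $new | btrfs receive $target"
        return 0
    fi
    # The new snapshot has to be read-only (-r) or btrfs send will refuse it.
    btrfs subvolume snapshot -r "$subvol" "$new" || return 1
    # -p names the parent snapshot that the receiving side already has.
    btrfs send -p "$parent" "$new" | btrfs receive "$target"
}

send_incremental /mnt/disk1/@subvolume /mnt/disk1/@snapshot2 /mnt/disk1/@snapshot3 /mnt/disk2/
```

Each run's new snapshot becomes the parent for the next one, so old parents can be deleted once a newer snapshot exists on both sides.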
Now that we've seen snapshots sent between disks on a system, it's not too much of a stretch to imagine how snapshots can be sent between hosts on a network.
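To make that concrete, the only change for the network case is that ssh sits between the two halves of the pipe. The sketch below just prints the pipeline rather than running it. remote_send_cmd is a name invented for the example, backup_host is a placeholder, and it assumes the ssh user on the far side is allowed to run btrfs receive (which normally needs root).

```shell
#!/usr/bin/env bash
# Sketch: the same send/receive pipeline with ssh in the middle.
# remote_send_cmd is an illustrative name; host and paths are placeholders.

# usage: remote_send_cmd <host> <target-dir> <snapshot> [parent-snapshot]
# Prints the pipeline that would push <snapshot> to <host>:<target-dir>.
remote_send_cmd() {
    local host="$1" target="$2" snapshot="$3" parent="${4:-}"
    if [ -n "$parent" ]; then
        # Incremental: only the difference from the parent crosses the network.
        echo "btrfs send -p $parent $snapshot | ssh $host btrfs receive $target"
    else
        # Full: the whole snapshot is streamed.
        echo "btrfs send $snapshot | ssh $host btrfs receive $target"
    fi
}

remote_send_cmd backup_host /mnt/disk2/ /mnt/disk1/@snapshot1
remote_send_cmd backup_host /mnt/disk2/ /mnt/disk1/@snapshot2 /mnt/disk1/@snapshot1
```

Keeping track of which parent exists on which host is the fiddly part of doing this by hand.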
Btrbk
I've chosen to use Btrbk, since that's what I've been using since I first started using btrfs. Btrbk is a Perl script that keeps track of snapshots for you. By default, the snapshots it takes are read-only, and it can be configured to keep them for certain amounts of time. An example btrbk configuration can keep hourly snapshots for a day, daily snapshots for two weeks, weekly snapshots for 10 weeks, and monthly snapshots forever. Since a snapshot pins the old version of everything that changes after it was taken, disk usage grows the longer snapshots are kept, so I prefer not to keep them around for too long.
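As a sketch, the example schedule described above could be written like this in btrbk.conf. This fragment only mirrors that example and isn't taken from a running system. It assumes btrbk's retention syntax of a count followed by h, d, w, or m, with * meaning "all".

```
# Let the schedule below decide what to keep, but never delete the latest snapshot
snapshot_preserve_min latest
# 24 hourly, 14 daily, 10 weekly, and every monthly snapshot
snapshot_preserve 24h 14d 10w *m
```

snapshot_preserve_min has to be set because it overrides the schedule: if it says to keep everything, nothing is ever deleted.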
My Setup
In general, if I have a disk plugged into my devices, I have it formatted as btrfs and mounted at /btrfs/hdd or /btrfs/ssd in addition to wherever it's mounted for application usage. This is so I can have a consistent snapshot config across all my devices. Inside each of these directories, I have a @snapshots subvolume, and one or two more subvolumes specific to the application running on the host.
We'll start with my on-host btrbk config. This is something I've had in place on all my hosts for years, and it's saved me a few times. This example specifically came from my mastodon host.
# Enable transaction log
transaction_log /var/log/btrbk.log
# Don't delete any snapshots that haven't been around for 2 days yet
snapshot_preserve_min 2d
# Delete daily snapshots after 14 days
snapshot_preserve 14d
# Snapshot subvolumes in /btrfs/hdd
volume /btrfs/hdd
# Put snapshots into the @snapshots folder
snapshot_dir @snapshots
# snapshot the @mastodon subvolume
subvolume @mastodon
This config snapshots the @mastodon subvolume into the @snapshots folder. I have a cron job that runs btrbk every hour so if I have any issue, I can revert mastodon's state to something at most one hour old. Hourly snapshots are deleted after two days, and daily snapshots are deleted after 14 days.
This kind of configuration is great when your disk works, but if your disk is starting to go bad, this won't help you. For this reason, it's good to take backups on another disk. In my case, that other disk is on another host entirely.
Setting up Btrbk to work over a network isn't too difficult, but there are a few steps.
1. Create a btrbk user on the target system
This is the application host, not the backup host. The btrbk user is important because it will be allowed to run the btrfs and readlink commands as root without a password, which is how btrfs send gets launched to send the subvolumes over the network.
$ sudo useradd -m -G sudo btrbk
2. Give the btrbk user the required permissions
We'll create a sudoers config for this new user:
$ sudo vi /etc/sudoers.d/90-btrbk
The contents of the config should be this:
# /etc/sudoers.d/90-btrbk
btrbk ALL=(root:nobody) NOPASSWD:NOEXEC: /bin/btrfs, /bin/readlink
What this does is allow the btrbk user to run the btrfs and readlink commands as root, with nobody as the only permitted group, and without a password. NOEXEC here prevents the btrfs or readlink commands from spawning further commands as the root user.
3. Set up SSH Keys
In order for the backup host to connect securely to the application host, we need to set up keys to initiate the session. On the backup host, generate a new key:
$ ssh-keygen
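The defaults this command provides are sensible. This post assumes an RSA key (newer OpenSSH releases default to Ed25519 and name the files id_ed25519 instead, so pass -t rsa if you want to follow along exactly). As a result, two new files should have been created on your system: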
~/.ssh/id_rsa
~/.ssh/id_rsa.pub
Let's make a directory in the btrbk folder for these and limit the private key's access to root only.
$ sudo mkdir -p /etc/btrbk/ssh
$ sudo mv ~/.ssh/id_rsa{,.pub} /etc/btrbk/ssh
$ sudo chown root:root /etc/btrbk/ssh/id_rsa{,.pub}
$ sudo chmod 600 /etc/btrbk/ssh/id_rsa
4. Install the public key
Now that we've generated our SSH keys, let's install the public key on the application host.
$ sudo su - btrbk
$ mkdir .ssh
$ chmod 700 .ssh
$ vi .ssh/authorized_keys
The contents of the authorized_keys file should be identical to the contents of the public key you've generated (/etc/btrbk/ssh/id_rsa.pub), but preceded by the text listed below. This file should only be one line long:
command="/usr/local/bin/ssh_filter_btrbk.sh -i -s -l -p /btrfs/hdd --sudo"
So the actual file looks like this
command="/usr/local/bin/ssh_filter_btrbk.sh -i -s -l -p /btrfs/hdd --sudo" ssh-rsa MY/REALLY/LONG/KEY/STRING/BLAH/BLAH backup_user@backup_host
What's this command about?
The command listed here is a program that runs whenever an SSH session is started as this user, and it verifies that the incoming command is allowed. In this case, it's a script provided by btrbk, which validates a few things:
-i: The incoming command is allowed to be an informational btrfs command
-s: The incoming command is allowed to be btrfs subvolume snapshot or btrfs send
-l: The incoming command should be logged
-p /btrfs/hdd: The incoming command should only operate on subvolumes inside /btrfs/hdd
--sudo: The incoming command should be prefixed with sudo -n
This way, if we try to run this command from the backup host it will work:
$ sudo ssh -i /etc/btrbk/ssh/id_rsa btrbk@application_host 'sudo -n btrfs subvolume list /btrfs/hdd'
But if we try to run this command, it won't:
$ sudo ssh -i /etc/btrbk/ssh/id_rsa btrbk@application_host 'sudo -n rm -rf /'
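Since the authorized_keys entry has to be exactly one line, it's less error-prone to generate it than to paste it together in an editor. Here's a small sketch of that. restricted_key_line is a name invented for the example, the key in the demo is a placeholder, and the filter path and flags are the ones from this step.

```shell
#!/usr/bin/env bash
# Sketch: prepend the ssh_filter_btrbk.sh restriction to a public key.
# restricted_key_line is an illustrative name, not part of btrbk.

# usage: restricted_key_line <public-key-file> <restrict-path>
restricted_key_line() {
    local pubkey_file="$1" restrict_path="$2"
    # The whole entry stays on a single line: options, a space, then the key.
    printf 'command="/usr/local/bin/ssh_filter_btrbk.sh -i -s -l -p %s --sudo" %s\n' \
        "$restrict_path" "$(cat "$pubkey_file")"
}

# Demo with a placeholder key written to a temporary file.
demo_key=$(mktemp)
echo 'ssh-rsa MY/REALLY/LONG/KEY/STRING backup_user@backup_host' > "$demo_key"
restricted_key_line "$demo_key" /btrfs/hdd
rm -f "$demo_key"
```

On the application host, as the btrbk user, the output would be appended to ~/.ssh/authorized_keys after copying the public key over from the backup host.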
5. Set up btrbk on the application host
The btrbk config I provided towards the start of this article should help you get started. Place your config at /etc/btrbk/btrbk.conf.
Now make a cron job in /etc/cron.hourly called btrbk:
#!/usr/bin/env bash
# /etc/cron.hourly/btrbk
btrbk -q run
And mark it as executable
$ sudo chmod +x /etc/cron.hourly/btrbk
6. Set up btrbk on the backup host
Now we'll create a similar config to the one we did earlier, but have it copy subvolumes off of the application_host.
Edit your btrbk config:
$ sudo vi /etc/btrbk/btrbk.conf
And add the contents
# /etc/btrbk/btrbk.conf
# Log our actions
transaction_log /var/log/btrbk.log
# Set a 512MB buffer size for network transfers
stream_buffer 512m
# Use the SSH key we generated earlier
ssh_identity /etc/btrbk/ssh/id_rsa
# Connect to the application_host as the btrbk user
ssh_user btrbk
# Use sudo for running btrfs commands on the application_host
backend btrfs-progs-sudo
# There is no minimum for keeping backups around
target_preserve_min no
# But we do keep 10 weekly backups and 6 monthly backups around at all times
target_preserve 0d 10w 6m
# Set up our remote btrfs volume
# here, we specify we connect to application_host over ssh, and look in the /btrfs/hdd folder
volume ssh://application_host/btrfs/hdd/
# We'll back up the @mastodon subvolume
subvolume @mastodon
# From the @snapshots directory
snapshot_dir @snapshots
# Don't ever delete snapshots on the remote system, since the remote btrbk instance handles that already
snapshot_preserve_min all
# Use existing snapshots, don't make new ones
snapshot_create no
# Back up the snapshot to backup_host's /btrfs/hdd/@snapshots/r641 folder (you may need to mkdir -p this path)
target send-receive /btrfs/hdd/@snapshots/r641
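Before trusting a new config with real transfers, it's worth previewing what btrbk would do with it. A minimal sketch, assuming btrbk is installed on the backup host and this is run as root. preview_backup is a name made up for the example, and the default path is the config file from this step.

```shell
#!/usr/bin/env bash
# Sketch: preview what btrbk would do with a config before the first real run.
# preview_backup is an illustrative name, not a btrbk feature.

# usage: preview_backup [config-file]
preview_backup() {
    local conf="${1:-/etc/btrbk/btrbk.conf}"
    if ! command -v btrbk >/dev/null 2>&1; then
        echo "btrbk is not installed on this host" >&2
        return 127
    fi
    # dryrun shows what a run would create and delete without changing anything.
    btrbk -c "$conf" dryrun
}

preview_backup /etc/btrbk/btrbk.conf || echo "dry run did not complete; fix that before scheduling anything"
```

Once the preview looks right, btrbk -c /etc/btrbk/btrbk.conf run does the real thing, and the first transfer will be a full one.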
You're done!
There you have it, a complete btrfs network backup setup. If you want to add another host to back up in the future, you follow the same steps, skipping the SSH key generation, and you'll add another volume section to the backup host's btrbk config. If you want to back up another subvolume on an existing application_host, you add another subvolume section to that host's volume section.
I hope that this article has been helpful to anyone looking to implement network backups with btrfs.