1. Policies
2. Backup storage
    1. Marcos storage
    2. External
    3. Offsite
    4. Offsite (squirrel mode)
    5. Marcos backup inventory details
3. Disk replacement
    1. LUKS/LVM single disk
    2. Single large drive setup
    3. Parts price points and providers
4. Disaster recovery
    1. Tier 1
    2. VPS providers
    3. Large storage options
5. Offsite procedures
    1. Remaining work on borg
    2. Remaining work on git-annex
    3. Random git-annex docs
    4. Append-only git repositories
    5. Encrypted remotes
    6. Encrypted repos restore procedure
    7. rsync.net backups
    8. References
        1. Alternatives

Policies

Main server backups are automatic and nightly. Offsite backups are done by hand, monthly.

Workstation and laptop backups are more irregular, on a separate drive.

Backups are performed with borg and git-annex.

Some offsite backups were done with bup, but that was replaced by borg because the latter supports client-side encryption out of the box, supports purging old snapshots (which bup didn't at the time), and has a better commandline interface.
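As a sketch, that schedule boils down to a crontab fragment like this one (the script name and the reminder job are hypothetical, since the actual jobs are not shown here):

```shell
# /etc/cron.d/backup (hypothetical sketch): nightly borg run on the server;
# the monthly offsite run stays manual, so only a reminder is scheduled
0 2 * * * root /usr/local/sbin/borg-nightly
0 9 1 * * root echo "offsite backup drives due for rotation" | mail -s offsite root
```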

Backup storage

I have about 30TB of storage deployed in various places, quite inefficiently managing a little over 5TB of original data. The main reason for that inefficiency is that many drives have outlived their usefulness because they are too small, and no "enterprise" storage mechanism (like RAID) was deployed to aggregate multiple drives.

Such a bad usage pattern could (eventually?) be fixed by regrouping all those drives into a single cohesive unit, for example a NAS. See marcos for a discussion of alternatives.

Marcos storage

External

Offsite

A shared server called toutatis was previously used to backup personal photo and video collections using git-annex encrypted remotes. This server has been replaced with a dedicated server called tubman and git-annex encryption was removed, see encrypted remotes.

Offsite (squirrel mode)

Those are archives that were disseminated in different locations.

Marcos backup inventory details

This is out of date.

path                 | backup    | location | notes
---------------------|-----------|----------|------
/                    | borg      | calyx    |
/var                 | borg      | calyx    |
/usr                 | borg      | calyx    |
/home                | borg      | calyx    |
/srv                 | no        |          | see below
/srv/archive/        | bup-srv   | calyx    | one time only
/srv/audiobooks/     | git-annex | green    |
/srv/auto/           | no        |          | transient data
/srv/backup/         | bup-srv   | calyx    | one time only
/srv/books/          | git-annex | green    |
/srv/books-incoming/ | no        |          | transient data
/srv/conference/     | no        |          | local copy of public data
/srv/espresso/       | git-annex | markov   |
/srv/incoming/       | bup-srv   | calyx    | one time only
/srv/karaoke/        | bup-srv   | calyx    | one time only
/srv/mp3/            | git-annex | VHS      | also markov, angela, archive0
/srv/playlists/      | bup-srv   | calyx    | one time only
/srv/podcast/        | no        |          | todo?
/srv/roms/           | git-annex | green    |
/srv/sid/            | bup-srv   | calyx    | one time only
/srv/SteamLibrary/   | bup-srv   | calyx    | one time only
/srv/tahoe/          | no        |          | redundant data, by definition, unusable without key
/srv/tempete/        | bup-srv   | calyx    | one time only
/srv/tftp/           | git-annex |          | not sync'd to green, but files are publicly available, and git repo copied over at koumbit
/srv/video/          | git-annex | green    |

Disk replacement

LUKS/LVM single disk

This procedure describes a major disk replacement on a system with LUKS encryption and LVM, but without RAID-1 (which would be obviously much easier). It is specific to my setup but could be useful to others and is aimed at technical users familiar with the commandline.

  1. create partitions with parted, mark an 8MB leading partition with the bios_grub flag:

     parted /dev/sdc mklabel gpt
     parted -a optimal /dev/sdc mkpart primary 0% 8MB
     parted -a optimal /dev/sdc mkpart primary 8MB 100%
    

    Marcos partitions are currently:

     $ sudo lvdisplay -C
     LV   VG        Attr       LSize
     home marcossd1 -wi-ao---- 380,00g
     root marcossd1 -wi-ao----  10,00g
     swap marcossd1 -wi-ao----   4,00g
     usr  marcossd1 -wi-ao----  20,00g
     var  marcossd1 -wi-ao----  30,00g
    
  2. initialise crypt partition:

    cryptsetup -v --verify-passphrase luksFormat /dev/sdX3
    cryptsetup luksOpen /dev/sdX3 crucial_crypt
    

    Note that newer versions of Debian (e.g. stretch and later) have good defaults, so you do not need to choose cipher settings and so on. But on older machines, you may want something like:

    --cipher aes-xts-plain64 --key-size 512 --hash sha256 --iter-time 5000
    

    I also used to recommend --use-random here, but I believe it is no longer necessary.

  3. initialize logical volumes

    pvcreate /dev/mapper/crucial_crypt
    vgcreate marcossd1 /dev/mapper/crucial_crypt
    

    repeat for every filesystem, use vgdisplay -C and lvdisplay -C to inspect existing sizes:

    lvcreate -L10G -n root marcossd1
    mkfs /dev/mapper/marcossd1-root
    # [...]
    
  4. basic filesystem setup:

    mount /dev/mapper/marcossd1-root /mnt
    mkdir /mnt/{dev,sys,proc,boot,usr,var,home,srv}
    
  5. restore the root filesystem:

    cd /mnt
    borg extract -e boot -e usr -e var -e home --progress /media/sdc2/borg::marcos-2017-06-19
    

    note that --progress is available only in newer versions of borg (1.1 and later).

    if borg is not available for some reason, the filesystem can also be synchronized directly:

      rsync -vaHAx --inplace --delete --one-file-system / /mnt/
    

    note that this will destroy the mountpoint directories like /mnt/usr, which need to be recreated.

  6. edit /mnt/etc/fstab (and keep a copy in /etc/fstab.new) to change the VG paths and the /boot UUID (which can be found with blkid /dev/sdX2)

  7. mount all filesystems:

    mount -o bind /dev /mnt/dev
    chroot /mnt
    mount -a
    mount -t sysfs sys /sys
    exit
    
  8. change /mnt/etc/crypttab (make a copy in /etc/crypttab.new) to follow the new partition names:

    • make sure you have NO TYPO in the new line
    • use blkid to get the UUID of the crypto device, e.g. blkid /dev/sdX3
  9. restore everything from backups:

    cd /mnt
    borg extract --progress /media/sdc2/borg::marcos-auto-2017-06-19
    borg extract --progress /media/sdc2/borg::marcos-logs-2017-11-28
    

    or rsync from the live filesystem (see below).

  10. go to single user mode:

    shutdown now
    
  11. sync from the live filesystem again, using /home/anarcat/bin/backup-rsync-mnt, basically a bunch of rsync calls, one for each partition:

    rsync -vaHAx --inplace --delete /usr/ /mnt/usr/
    
  12. install boot blocks

    chroot /mnt
    mv /etc/fstab.new /etc/fstab
    mv /etc/crypttab.new /etc/crypttab
    echo "search.fs_uuid c7bf0134-d9bf-4506-b859-3d19e9a333c1 root" >> /boot/grub/load.cfg
    update-initramfs -u -k all
    update-grub2
    grub-install /dev/sdX

    the fs_uuid value comes from the /boot device, and can be found with the blkid command as well.

  13. reboot and pray
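Steps 6 and 8 above both come down to swapping UUIDs in the copied config files. That edit can be scripted; update_uuid below is a hypothetical helper, and on the real system the UUIDs come from blkid (e.g. `blkid -s UUID -o value /dev/sdX2`) as described:

```shell
#!/bin/sh
# hypothetical helper for steps 6 and 8: in file $1, on lines matching
# pattern $2, replace the UUID= value with $3
update_uuid() {
    sed -i "\|$2|s|UUID=[0-9a-fA-F-]*|UUID=$3|" "$1"
}
# demonstration on a scratch fstab copy:
tmp=$(mktemp)
echo 'UUID=00000000-dead-beef-0000-000000000000 /boot ext2 defaults 0 2' > "$tmp"
update_uuid "$tmp" /boot 11112222-3333-4444-5555-666677778888
cat "$tmp"   # now shows UUID=11112222-3333-4444-5555-666677778888 /boot ...
rm -f "$tmp"
```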

Single large drive setup

See also 2019-02-25-new-large-disk-8-year-old-anniversary for another hard drive configuration procedure.

Parts price points and providers

Those sites provide a good way to evaluate the best price point of hard drives and SSDs:

Generally, r/datahoarder is currently a good place to get advice on which drives to get; for example, this post says:

I usually consider good prices to be around $15/tb for a HDD and $75/tb for a budget SSD. SSD prices will go up quickly from there depending on features though (sata vs nvme, pcie 3.0 vs 4.0, etc.)

... those are probably US numbers, but they somewhat match the above data.
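Those thresholds make the $/TB arithmetic worth writing down; here is a quick helper (the 280$ / 18TB example price is made up):

```shell
#!/bin/sh
# dollars per terabyte, rounded to cents
price_per_tb() {
    awk -v price="$1" -v tb="$2" 'BEGIN { printf "%.2f\n", price / tb }'
}
price_per_tb 280 18   # hypothetical 18TB HDD at 280$: prints 15.56
```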

Disaster recovery

backup plan if all else fails

  1. GTFO with the backup drives and at least the password manager (rip them out of the laptop/workstation)

  2. confirm Gandi, park domains on a "Gandi Site" (free, one page)

  3. setup one VPS to restore DNS service, secondary at Gandi

  4. setup second VPS to restore tier-1 services

  5. restore other services as necessary

Tier 1

DNS: setup 3 primary zones and glue records.

Email: install dovecot + postfix, setup aliases and delivery. Restore mailboxes.

Web: install apache2 + restore wiki.

VPS providers

This list is sorted alphabetically.

Provider      | Mthly    | RAM     | Disk               | CPUs | Traffic | Notes
--------------|----------|---------|--------------------|------|---------|------
Digital Ocean | 4$       | 1GB     | 10GB               | 1    | 500GB   | USA, Europe, Canada (although not the 4$ deal in .ca)
Gandi         | 25$CAD   | 1GB     | 20GB SAN           |      | 3TB     | hourly billing, acquired then merged, uncertain future
Greenhost     | 6.20EUR  | 1GB     | 5GiB               | 1    | 2TiB    | 3.50EUR/mth for minimal: 512MiB, 5GiB
Hetzner       | 3EUR     | 2GB     | 20GB               |      | 20TB    | ≈60mbps + 1EUR/TB, 10Gbps?, hourly/monthly billing, Germany, USA, used at work
Hostwinds     | 8.24$    | 1GB     | 30GB               |      | 1TB     |
Infomaniak    | 4.2EUR   | 2GB     | 20GB               | 1    | ?       | 3EUR/mth for IPv4 (!)
Koumbit       | 20$CAD   | 1GB     | 15GB SSD           |      | 100Mbps | friends, local
Linode        | 5$       | 1GB     | 25GB SSD           | 1    | 1TB     | used 2.5admins credit to setup emergency mirrors (fumiko and louise), acquired by Akamai, prices raised 20% in April 2022 (5->6$)
Lunanode      | 3.50$    | 1GB     | 15GB               |      | 1TB     | currently only in Toronto
OVH           | 3.50$CAD | 2GB     | 20GB SSD           |      | 100Mbps | KVM, France, Québec, very messy
Prgmr         | 5$USD    | 1.25GiB | 15 GiB             |      | 750Mbps | 2.5Mbps "congestion"?, Xen, FreeBSD jails, no bullshit, ssh console
Tetaneutral   | 5-10€    | 1GiB    | 20Go SSD 100Go HDD | 1    |         | Debian Stretch

Large storage options

This was done as part of research for archival in virtual machines.

VPS:

Dedicated:

Backups:

Colo (mostly things that came out of the GitLab colo plan):

See also Goerzen's list.

Offsite procedures

A new offsite backup system was created. Previously, it was a manual process: bring the drives back to the server, pop them in a SATA enclosure, start the backup script by hand, wait, return the drives to the offsite location. This "pull" configuration had the advantage of being resilient against an attacker wanting to destroy all data, but the manual process meant the backups were never done as often as they should have been.

A new design based on borg and git-annex assumes a remote server that is online and receives the backups (a "push" configuration). The goal is to set up the backup in "append-only" mode so that an attacker is limited in their capacity to destroy data on the server.

A first sync was done locally to bootstrap the dataset. This was harder than expected because the external enclosure had an older SATA controller that didn't support the 8TB drive (it was detected as 2TB), so I had to connect it to my workstation instead (an Intel NUC, which meant a tangled mess).

All this needs to be documented better and merged with the above documentation.

Remaining work on borg

  1. decide what to do with /var/log (currently excluded because we want lower retention there)

  2. prune policies, skipped for now because incompatible with append-only

  3. automate crypto:

    a. change the passphrase
    b. include it in the script here
    c. include a GnuPG symmetric encrypted copy of the passphrase on the offsite disk

    Note: this approach should work, but it needs a full shell when the key is changed, so it is fundamentally incompatible with a restricted shell setup.

  4. set append-only mode and restricted shell by allowing only the right borg command to be called, in authorized_keys:

    command="borg serve --append-only",restrict ssh-rsa AAAAB...
    
  5. test full run again

  6. document this in the borg documentation itself or at least here
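For step 4, the server can also pin the repository path; --restrict-to-path is a standard borg serve option (the /srv/borg path below is a placeholder):

```shell
# ~/.ssh/authorized_keys on the backup server (single line)
command="borg serve --append-only --restrict-to-path /srv/borg",restrict ssh-rsa AAAAB...
```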

Remaining work on git-annex

  1. switch git-annex remotes and borg repo to remote server when drive is installed (done)

  2. enable sync in script (done)

  3. resync everything again (done)

  4. add Photos repo with git-annex encryption (blocker: error while setting up gcrypt remote, fixed by removing the push.sign option, sent patch to spwhitton, so done)

  5. restricted shell, see git-annex-shell:

    command="GIT_ANNEX_SHELL_LIMITED=true git-annex-shell -c \"$SSH_ORIGINAL_COMMAND\"",restrict ssh-rsa AAAAB3NzaC1y[...] user@example.com
    

    GIT_ANNEX_SHELL_DIRECTORY would be useful, but we have multiple repositories we want to allow, and that, if I read CmdLine.GitAnnexShell.Checks.checkDirectory correctly, is not pattern-based but an exact match (using equalFilePath). (done, see below)

  6. make repositories append-only, not currently supported by git-annex (done, see below)

  7. change encryption key for encrypted repositories so they work unattended. the sticky question here is which key to use. a different subkey? or a whole other keypair? if that, then how to deal with expiry, propagation, etc?

  8. setup cronjobs for all repositories (partly done: non-encrypted repositories are part of the manual backup script)

Random git-annex docs

This is how the git-annex repositories were setup at first:

for r in audiobooks books espresso incoming mp3 playlists podcast roms video; do
    git init /mnt/$r
    git -C /srv/$r remote add offsite /mnt/$r
    git -C /srv/$r annex sync
    git -C /srv/$r annex wanted offsite standard
    git -C /srv/$r annex group offsite backup
    git -C /srv/$r annex sync --content
done

Append-only git repositories

On the server, for each repo, disable destructive pushes:

git config receive.denyDeletes true
git config receive.denyNonFastForwards true
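The effect of receive.denyDeletes can be checked with a throwaway repository (the paths below are temporary scratch directories):

```shell
#!/bin/sh
# demo: a server with receive.denyDeletes refuses branch deletion pushes
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/server.git"
git -C "$tmp/server.git" config receive.denyDeletes true
git init -q "$tmp/client"
cd "$tmp/client"
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m init
git push -q "$tmp/server.git" HEAD:main HEAD:topic
if git push "$tmp/server.git" :topic 2>/dev/null; then
    echo "deletion allowed (unexpected)"
else
    echo "deletion refused"
fi
rm -rf "$tmp"
```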

And force git-annex to be used for that key, in ~/.ssh/authorized_keys:

command="GIT_ANNEX_SHELL_APPENDONLY=true git-annex-shell -c \"$SSH_ORIGINAL_COMMAND\"",restrict ssh-rsa AAAAB3NzaC1y[...] user@example.com

This only works with git-annex 6.20180529 or later.

Then, on the client, generate a key for this purpose:

ssh-keygen -f ~/.ssh/id_rsa.git-annex

Then, in each repo, configure the key:

git config core.sshCommand "ssh -i /home/anarcat/.ssh/id_rsa.git-annex -o IdentitiesOnly=yes"

Unfortunately, because git-annex does not respect the core.sshCommand configuration in git, we need to use a special remote configured in ~/.ssh/config as such:

Host backup-annex
    # special key for git-annex
    IdentitiesOnly yes
    IdentityFile ~/.ssh/id_rsa.git-annex

And then change the remote:

git remote set-url offsite backup-annex:/srv/offsite/foo/

Then a cronjob (or the assistant, but I chose the former) can be run to sync changes automatically:

for r in audiobooks books espresso roms mp3 incoming video; do
    echo "syncing $r"
    git -C /srv/$r annex sync --content -J2 
done

The problem here is that --quiet is not completely quiet:

$ LANG=C.UTF-8 git annex sync --content --quiet 
On branch master
nothing to commit, working tree clean

git-annex should pass --quiet down to git commit...

Another problem is that this only works for regular git remotes. This will fail on encrypted remotes, which rely on rsync. A workaround I found was to rely on a feature of the git-shell command, which git-annex-shell calls unless GIT_ANNEX_SHELL_LIMITED is set. That feature allows you to write custom wrappers that get called by git when an unknown command is sent. I wrote this wrapper in ~/git-shell-commands/rsync:

#!/bin/sh

for i; do
    case $i in
    --server|--sender|-WIe.LsfxC|-de.LsfxC|-re.iLsfxC|--log-format=X|--partial-dir)
        true
        ;;
    .rsync-partial|.|/home/anarcat/offsite/*.git/*|/home/anarcat/offsite/*.annex/*)
        true
        ;;
    *)
        logger <<EOF
disallowed rsync argument: '$i'
EOF
        exit 1
        ;;
    esac
done

logger <<EOF
passthrough rsync command: '$@'
EOF

exec rsync "$@"

This allows only certain rsync commands to go through, namely the normal rsync arguments passed by git-annex, and only paths matching a restricted pattern. Yes, this means an attacker can overwrite any git repository they choose, but it needs to be a bare git repository (/*.git/*), as there is, unfortunately, no way to use gcrypt with append-only repositories.
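To sanity-check the wrapper's allow-list, the case patterns can be pulled into a standalone function (allowed is a hypothetical name; the patterns are copied verbatim from the wrapper above):

```shell
#!/bin/sh
# same case patterns as the wrapper, as a testable predicate
allowed() {
    for i; do
        case $i in
        --server|--sender|-WIe.LsfxC|-de.LsfxC|-re.iLsfxC|--log-format=X|--partial-dir) ;;
        .rsync-partial|.|/home/anarcat/offsite/*.git/*|/home/anarcat/offsite/*.annex/*) ;;
        *) return 1 ;;
        esac
    done
    return 0
}
# prints "allowed" then "rejected"
allowed --server --sender . /home/anarcat/offsite/foo.git/objects/ab && echo "allowed"
allowed --server --delete-after && echo "oops" || echo "rejected"
```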

Encrypted remotes

Note: this approach was abandoned as it was too clunky and error-prone. It also suffered from performance limitations and was generally just too hard to figure out.

One example of a screw-up: it seems I somehow managed to mix the gcrypt remote content with the git-annex encrypted contents. For some reason, I believe those two repos had to be separate, and I must have interchanged them at some point, so some files ended up in one repo or the other. Confusing.

The rest of this section documents how this was setup, and is kept for historical purposes.


To set up the encrypted remotes for pictures, first the git-annex objects:

Photos$ git annex initremote offsite-annex type=rsync rsyncurl=user@example.net:/srv/Photos.annex/ encryption=hybrid keyid=8DC901CE64146C048AD50FBB792152527B75921E
Photos$ git annex sync --content offsite-annex

Then the git objects themselves:

Photos$ git remote add offsite-git gcrypt::rsync://user@example.net:/srv/Photos.git/
Photos$ git annex sync offsite-git

It is still unclear to me why those need to be separate. I first tried a single repo with encryption, as documented on the website, but it turns out this has significant performance problems, e.g. gcrypt remote: every sync uploads huge manifest. So spwhitton suggested the above approach of splitting the repositories in two.

What I don't understand is why git-annex can't simply encrypt the blobs and pass them down its regular remote structures like bare git repositories. Using rsync creates unnecessary overhead and complex URLs. The user interface on transfers is also far from intuitive:

$ git annex sync --content offsite-annex
commit
Sur la branche master
rien à valider, la copie de travail est propre
ok
copy 1969/12/31/20120415_009.mp4 (checking offsite-annex...) (to offsite-annex...)
sending incremental file list
840/
840/562/
840/562/GPGHMACSHA1--71995881d2ebb35a364558125d30999cf1c956d5/
840/562/GPGHMACSHA1--71995881d2ebb35a364558125d30999cf1c956d5/GPGHMACSHA1--71995881d2ebb35a364558125d30999cf1c956d5
      1,289,927 100%  199.82MB/s    0:00:00 (xfr#1, to-chk=0/5)
ok

Parallel transfers also don't have progress information. It really feels like encryption is a second-class citizen here. I also feel it will be rather difficult to reconstruct this repository from scratch, and an attempt will need to be made before we can feel confident in our restore capabilities. That said, the restore was tested below and seems to work, so we're going ahead with the approach.

Encrypted repos restore procedure

To access the files in a bare-metal restore, first the OpenPGP keyring needs to be extracted from somewhere of course. Then a blank git repository is created:

git init Photos

And the first git remote is added and fetched:

git remote add origin gcrypt::rsync://user@example.net:/srv/Photos.git/
git fetch origin
git merge origin/master

Then the object store is added and fetched:

git annex enableremote offsite-annex type=rsync rsyncurl=user@example.net:/srv/Photos.annex/ encryption=hybrid keyid=8DC901CE64146C048AD50FBB792152527B75921E
git annex get --from offsite-annex

The enableremote in the first line is critical: initremote might create a new encryption key instead of reusing the existing one.

rsync.net backups

rsync.net is quirky. They have an old borg version, so you need to specify:

export BORG_REMOTE_PATH=/usr/local/bin/borg1/borg1

otherwise you get all sorts of warnings and, ultimately, can't actually back up. They also do daily snapshots, which is not super useful with borg.

Then it's kind of weird to figure out where to connect: you need to log in to https://www.rsync.net/ then click on the FMT link, which will show you a hostname to connect to, which is also your username. Everything then happens over SSH; for example, you can look at your quota with:

ssh fm1234@fm1234.rsync.net quota

My username, for example, is fm1234 (redacted) above.

They have nice server-side ZFS snapshots, but that's not very useful for me, as I do not want to trust them with my cleartext data, so I use borg for my backups. The magic borg URL is something like:

export BORG_REPO="ssh://fm1234@fm1234.rsync.net/data1/home/fm1234/borg-marcos"

The first backup is relatively fast, but doesn't quite saturate my uplink (~5-10mbps vs 50mbps); not sure where that bottleneck is, it could be the local disk as well. Here is the server-side backup:

------------------------------------------------------------------------------
Repository: ssh://fm1234@fm1234.rsync.net/data1/home/fm1234/borg-marcos
Archive name: marcos-auto-2024-07-03T15:58:38
Archive fingerprint: 062f69b4a6692a09ba6f8cf41e9297c37599e93b77bd1e7de14373bef5d97459
Time (start): Wed, 2024-07-03 15:58:49
Time (end):   Thu, 2024-07-04 00:18:29
Duration: 8 hours 19 minutes 39.56 seconds
Number of files: 2123411
Utilization of max. archive size: 1%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              204.00 GB            129.49 GB            114.67 GB
All archives:              204.00 GB            129.49 GB            114.72 GB

                       Unique chunks         Total chunks
Chunk index:                 1791565              2170349
------------------------------------------------------------------------------

Another incremental run was of course much faster:

------------------------------------------------------------------------------
Repository: ssh://fm1234@fm1234.rsync.net/data1/home/fm1234/borg-marcos
Archive name: marcos-auto-2024-07-04T13:34:44
Archive fingerprint: 17a50d859f600af29185b4332c1f274f650d303f5aec1157a67643f4ef1b1c4f
Time (start): Thu, 2024-07-04 13:35:00
Time (end):   Thu, 2024-07-04 13:42:25
Duration: 7 minutes 24.86 seconds
Number of files: 2123656
Utilization of max. archive size: 1%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              204.10 GB            129.51 GB            506.07 MB
All archives:              408.10 GB            259.00 GB            115.22 GB

                       Unique chunks         Total chunks
Chunk index:                 1792561              4330539
------------------------------------------------------------------------------

Here, obviously, bandwidth is not the bottleneck: we're probably blocked by disk I/O, specifically walking the directories. The resulting bandwidth for the above, 506MB in 7m25s, is 1.1MB/s.
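That figure can be double-checked from the numbers in the report:

```shell
# 506MB of deduplicated data transferred in 7 minutes 25 seconds
awk 'BEGIN { printf "%.1f MB/s\n", 506 / (7 * 60 + 25) }'   # prints "1.1 MB/s"
```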

The laptop job aborted halfway (after 4.32GB and 16 hours), but that might be because the laptop went to sleep: indeed, the process terminated when I came back to the office... The final status was:

------------------------------------------------------------------------------
Repository: ssh://fm1234@fm1234.rsync.net/data1/home/fm1234/borg-angela
Archive name: angela-2024-07-04T09:48:18.194260
Archive fingerprint: c58891e2a915a0145bd990861eaf702687747a8bf6549a612b7bce52386b382d
Time (start): Thu, 2024-07-04 09:49:44
Time (end):   Thu, 2024-07-04 12:27:20
Duration: 2 hours 37 minutes 35.71 seconds
Number of files: 2354887
Utilization of max. archive size: 1%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              149.95 GB            117.18 GB             49.81 GB
All archives:              150.32 GB            117.27 GB            102.98 GB

                       Unique chunks         Total chunks
Chunk index:                 1841394              3576009
------------------------------------------------------------------------------

Note that during the first full backup, both backups were running in parallel so that has also impacted performance.

The incremental on the laptop had similar performance:

------------------------------------------------------------------------------
Repository: ssh://fm1702@fm1702.rsync.net/data1/home/fm1702/borg-angela
Archive name: angela-2024-07-04T13:48:48.403736
Archive fingerprint: a036c2cc424340b77744cd97cb35c461a69743154c28cfbb3a7538b40e64b246
Time (start): Thu, 2024-07-04 13:49:08
Time (end):   Thu, 2024-07-04 13:57:54
Duration: 8 minutes 45.95 seconds
Number of files: 2354928
Utilization of max. archive size: 1%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              149.96 GB            117.18 GB            471.51 MB
All archives:              300.28 GB            234.45 GB            103.46 GB

                       Unique chunks         Total chunks
Chunk index:                 1842235              5951877
------------------------------------------------------------------------------

References

Borg:

Once we figure out git-annex, the following pages need to be updated:

Alternatives
