Tubman is named after Harriet Tubman, an "American abolitionist and political activist. Born into slavery, Tubman escaped and subsequently made some 13 missions to rescue approximately 70 enslaved people, including family and friends, using the network of antislavery activists and safe houses known as the Underground Railroad. During the American Civil War, she served as an armed scout and spy for the Union Army. The first woman to lead an armed expedition in the war, she guided the raid at Combahee Ferry, which liberated more than 700 enslaved people. In her later years, Tubman was an activist in the movement for women's suffrage."

I was the conductor of the Underground Railroad for eight years, and I can say what most conductors can't say — I never ran my train off the track and I never lost a passenger.


(copied from v1)

Installation procedure

I would have used FAI's setup-storage, but it unfortunately doesn't support ZFS. ZFS support is on the long-term roadmap, that said, and there's a howto for stretch, but that doesn't use setup-storage. I was also hoping to reuse the installer I've been working on at work...

We have the following disk configuration:

We boot from a grml live image based on Debian testing (bullseye), and will follow this howto:

  1. install requirements:

    apt update
    apt install --yes debootstrap gdisk dkms dpkg-dev linux-headers-$(uname -r) zfs-dkms
    modprobe zfs
    apt install --yes zfsutils-linux

    Note that those instructions differ from the documentation (we don't use buster-backports) because we start from a bullseye live image.

  2. clear the partitions on the two HDDs, and set up BIOS and UEFI boot partitions, a boot-pool partition, and a natively encrypted partition on each:

    for DISK in /dev/sdb /dev/sdc ; do
        sgdisk --zap-all $DISK
        sgdisk -a1 -n1:24K:+1000K -t1:EF02 $DISK
        sgdisk     -n2:1M:+512M   -t2:EF00 $DISK
        sgdisk     -n3:0:+1G      -t3:BF01 $DISK
        sgdisk     -n4:0:0        -t4:BF00 $DISK
    done

    resulting partition table:

    root@grml ~ # sgdisk -p /dev/sdb
    Disk /dev/sdb: 7814037168 sectors, 3.6 TiB
    Model: ST4000DM004-2CV1
    Sector size (logical/physical): 512/4096 bytes
    Disk identifier (GUID): 63B2F372-B4E9-45FF-8151-9706F9F158C9
    Partition table holds up to 128 entries
    Main partition table begins at sector 2 and ends at sector 33
    First usable sector is 34, last usable sector is 7814037134
    Partitions will be aligned on 16-sector boundaries
    Total free space is 14 sectors (7.0 KiB)
    Number  Start (sector)    End (sector)  Size       Code  Name
       1              48            2047   1000.0 KiB  EF02  
       2            2048         1050623   512.0 MiB   EF00  
       3         1050624         3147775   1024.0 MiB  BF01  
       4         3147776      7814037134   3.6 TiB     BF00
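    As an aside (not part of the howto), a quick arithmetic check shows how the sgdisk size arguments map to the sector numbers printed above, given the 512-byte logical sectors:

    ```shell
    # sgdisk offsets in 512-byte logical sectors, cross-checked
    # against the partition table printed above:
    echo $(( 24 * 1024 / 512 ))      # partition 1 start: prints 48
    echo $(( 1024 * 1024 / 512 ))    # partition 2 start: prints 2048
    # partition 2 end = start + 512M worth of sectors - 1:
    echo $(( 1024 * 1024 / 512 + 512 * 1024 * 1024 / 512 - 1 ))   # prints 1050623
    ```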
  3. create the boot pool called bpool and the root pool called rpool; the latter will prompt for a disk encryption key:

    zpool create \
        -o cachefile=/etc/zfs/zpool.cache \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O canmount=off -O compression=lz4 \
        -O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
        -O mountpoint=/boot -R /mnt \
        bpool mirror /dev/sdb3 /dev/sdc3
    zpool create \
        -o ashift=12 \
        -O encryption=aes-256-gcm \
        -O keylocation=prompt -O keyformat=passphrase \
        -O acltype=posixacl -O canmount=off -O compression=lz4 \
        -O dnodesize=auto -O normalization=formD -O relatime=on \
        -O xattr=sa -O mountpoint=/ -R /mnt \
        rpool mirror /dev/sdb4 /dev/sdc4
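    A note on ashift=12: it is a power-of-two exponent, so it sets the pool's allocation block size to 2^12 = 4096 bytes, matching the 4096-byte physical sectors sgdisk reported for these drives:

    ```shell
    # ashift is the base-2 logarithm of the allocation block size;
    # ashift=12 therefore means 4096-byte blocks:
    echo $(( 1 << 12 ))   # prints 4096
    ```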
  4. create filesystems and "datasets":

    • this creates two containers, for ROOT and BOOT

      zfs create -o canmount=off -o mountpoint=none rpool/ROOT
      zfs create -o canmount=off -o mountpoint=none bpool/BOOT

  5. this actually creates the boot and root filesystems:

    zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/debian
    zfs mount rpool/ROOT/debian
    zfs create -o mountpoint=/boot bpool/BOOT/debian
  6. then the howto creates even more datasets, although I'm not sure they are all necessary:

    zfs create                                 rpool/home
    zfs create -o mountpoint=/root             rpool/home/root
    chmod 700 /mnt/root
    zfs create -o canmount=off                 rpool/var
    zfs create -o canmount=off                 rpool/var/lib
    zfs create                                 rpool/var/log
    zfs create                                 rpool/var/spool
  7. to exclude temporary files from snapshots, for example:

    zfs create -o com.sun:auto-snapshot=false  rpool/var/cache
    zfs create -o com.sun:auto-snapshot=false  rpool/var/tmp
    chmod 1777 /mnt/var/tmp
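    The chmod 1777 gives /var/tmp the usual tmp-directory permissions: world-writable plus the sticky bit, so users can only delete their own files. A quick demonstration on a scratch directory (hypothetical path):

    ```shell
    # 1777 = sticky bit (1) + rwxrwxrwx (777), as on a stock /tmp:
    mkdir -p /tmp/vartmp-demo
    chmod 1777 /tmp/vartmp-demo
    stat -c '%a' /tmp/vartmp-demo   # prints 1777
    ```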
  8. and a /srv:

    zfs create                                 rpool/srv
  9. or for Docker (TODO):

    zfs create -o com.sun:auto-snapshot=false rpool/var/lib/docker
  10. make a tmpfs for /run:

    mkdir /mnt/run
    mount -t tmpfs tmpfs /mnt/run
    mkdir /mnt/run/lock
  11. install the base system and copy the ZFS config:

    debootstrap --components=main,contrib bullseye /mnt
    mkdir /mnt/etc/zfs
    cp /etc/zfs/zpool.cache /mnt/etc/zfs/
  12. base system configuration:

    echo HOSTNAME > /mnt/etc/hostname
    vi /mnt/etc/hosts
    apt install ca-certificates
    echo 'deb https://deb.debian.org/debian-security bullseye-security main contrib' > /mnt/etc/apt/sources.list.d/security.list
  13. bind mounts and chroot for more complex config:

    mount --rbind /dev  /mnt/dev
    mount --rbind /proc /mnt/proc
    mount --rbind /sys  /mnt/sys
    chroot /mnt /bin/bash
  14. more base system config:

    ln -s /proc/self/mounts /etc/mtab
    apt update
    apt install --yes console-setup locales
    dpkg-reconfigure locales tzdata
  15. ZFS boot configuration:

    apt install --yes dpkg-dev linux-headers-amd64 linux-image-amd64
    apt install --yes zfs-initramfs
    echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf
    apt install --yes grub-pc
    apt remove --purge os-prober
  16. pick a root password:

    passwd

  17. bpool import hack (TODO: why?):

    cat > /etc/systemd/system/zfs-import-bpool.service <<EOF
    [Unit]
    DefaultDependencies=no
    Before=zfs-import-scan.service zfs-import-cache.service

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/sbin/zpool import -N -o cachefile=none bpool
    # Work-around to preserve zpool cache:
    ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
    ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache

    [Install]
    WantedBy=zfs-import.target
    EOF
    systemctl enable zfs-import-bpool.service
  18. enable tmpfs (TODO: isn't there a better way?)

    ln -s /usr/share/systemd/tmp.mount /etc/systemd/system/
    systemctl enable tmp.mount
  19. grub setup:

    root@grml:/# grub-probe /boot
    root@grml:/# update-initramfs -c -k all
    update-initramfs: Generating /boot/initrd.img-5.10.0-6-amd64
    root@grml:/# sed -i 's,GRUB_CMDLINE_LINUX.*,GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/debian",' /etc/default/grub
    root@grml:/# update-grub
    Generating grub configuration file ...
    Found linux image: /boot/vmlinuz-5.10.0-6-amd64
    Found initrd image: /boot/initrd.img-5.10.0-6-amd64
    root@grml:/# grub-install /dev/sdb 
    Installing for i386-pc platform.
    Installation finished. No error reported.
    root@grml:/# grub-install /dev/sdc 
    Installing for i386-pc platform.
    Installation finished. No error reported.

    make sure you select both disks in the resulting prompt:

    dpkg-reconfigure grub-pc
  20. filesystem mount ordering (TODO: is this necessary?):

    mkdir /etc/zfs/zfs-list.cache
    touch /etc/zfs/zfs-list.cache/bpool
    touch /etc/zfs/zfs-list.cache/rpool
    zed -F &

    then verify the files have data:

    root@grml:/# cat /etc/zfs/zfs-list.cache/bpool                                                                                                                         
    bpool   /mnt/boot       off     on      on      off     on      off     on      off     -       none    -       -       -       -       -       -       -       -
    bpool/BOOT      none    off     on      on      off     on      off     on      off     -       none    -       -       -       -       -       -       -       -
    bpool/BOOT/debian       /mnt/boot       on      on      on      off     on      off     on      off     -       none    -       -       -       -       -       -     -
    root@grml:/# cat /etc/zfs/zfs-list.cache/rpool
    rpool   /mnt    off     on      on      on      on      off     on      off     rpool   prompt  -       -       -       -       -       -       -       -
    rpool/ROOT      none    off     on      on      on      on      off     on      off     rpool   none    -       -       -       -       -       -       -       -
    rpool/ROOT/debian       /mnt    noauto  on      on      on      on      off     on      off     rpool   none    -       -       -       -       -       -       -     -
    rpool/home      /mnt/home       on      on      on      on      on      off     on      off     rpool   none    -       -       -       -       -       -       -     -
    rpool/home/root /mnt/root       on      on      on      on      on      off     on      off     rpool   none    -       -       -       -       -       -       -     -
    rpool/srv       /mnt/srv        on      on      on      on      on      off     on      off     rpool   none    -       -       -       -       -       -       -     -
    rpool/var       /mnt/var        off     on      on      on      on      off     on      off     rpool   none    -       -       -       -       -       -       -     -
    rpool/var/cache /mnt/var/cache  on      on      on      on      on      off     on      off     rpool   none    -       -       -       -       -       -       -     -
    rpool/var/lib   /mnt/var/lib    off     on      on      on      on      off     on      off     rpool   none    -       -       -       -       -       -       -     -
    rpool/var/log   /mnt/var/log    on      on      on      on      on      off     on      off     rpool   none    -       -       -       -       -       -       -     -
    rpool/var/spool /mnt/var/spool  on      on      on      on      on      off     on      off     rpool   none    -       -       -       -       -       -       -     -
    rpool/var/tmp   /mnt/var/tmp    on      on      on      on      on      off     on      off     rpool   none    -       -       -       -       -       -       -     -
    root@grml:/# fg
    zed -F
  21. fix the paths to eliminate /mnt:

    sed -Ei "s|/mnt/?|/|" /etc/zfs/zfs-list.cache/*
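    That sed rewrites the mountpoint column from the live system's /mnt prefix to the final paths. Here is the substitution applied to a couple of sample cache lines (a scratch file, not the real cache):

    ```shell
    # Demonstrate the /mnt-stripping substitution on fake zfs-list.cache lines:
    printf 'rpool/ROOT/debian\t/mnt\tnoauto\nrpool/home\t/mnt/home\ton\n' > /tmp/cache-demo
    sed -Ei "s|/mnt/?|/|" /tmp/cache-demo
    cat /tmp/cache-demo
    # the mountpoints become "/" and "/home"
    ```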
  22. extra config, set up SSH with an authorized key:

    apt install --yes openssh-server
    mkdir /root/.ssh/
    cat > /root/.ssh/authorized_keys <<EOF
    [your SSH public key here]
    EOF
  23. snapshot initial install:

    zfs snapshot bpool/BOOT/debian@install
    zfs snapshot rpool/ROOT/debian@install
  24. exit the chroot:

    exit

  25. unmount filesystems:

    mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | \
        xargs -i{} umount -lf {}
    zpool export -a
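    The pipeline above unmounts the deepest mounts first (tac reverses the mount list) and skips ZFS mounts, which zpool export handles. The ordering can be simulated on fake mount output:

    ```shell
    # Simulate the deepest-first unmount ordering on sample `mount` output:
    printf '%s\n' \
      'rpool/ROOT/debian on /mnt type zfs (rw)' \
      'proc on /mnt/proc type proc (rw)' \
      'tmpfs on /mnt/run type tmpfs (rw)' \
    | grep -v zfs | tac | awk '/\/mnt/ {print $3}'
    # prints /mnt/run then /mnt/proc; the zfs line is left to zpool export
    ```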
  26. reboot:

    reboot


That procedure actually worked! The only problem was the interfaces(5) configuration, which was missing (regardless of what the above says). I want to switch to systemd-networkd anyway.

We performed steps 1 through 6; the remaining steps are optional or for troubleshooting.

SSD caching

The machine has been installed on two HDDs: spinning rust! Those are typically slow, but they are redundant, which should ensure high availability. To boost performance, we're setting up an SSD cache.

ZFS has two types of caches: the SLOG (a separate intent-log device, which accelerates synchronous writes) and the L2ARC (a second-level read cache).

The L2ARC is purely a performance cache: if it dies, no data is lost. A SLOG, however, can cause data loss (typically only a few seconds' worth, but still) if the drive dies. So we're going with L2ARC, based on this source for the redundancy claim.

To configure the L2ARC cache, we simply did this:

zpool add rpool cache /dev/sda3

(Actually, -f was necessary because there already was a crypto_LUKS partition on there, which we didn't care about.)

The sda3 device is the third partition on the SSD drive. It's 465GB, so it should provide ample space for the cache.
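On Linux, OpenZFS exposes ARC and L2ARC counters in /proc/spl/kstat/zfs/arcstats, so a rough L2ARC hit rate can be derived from the l2_hits and l2_misses counters. The sketch below runs on sample counter lines; on the real machine, replace the printf with a cat of arcstats:

```shell
# Rough L2ARC hit rate from arcstats-style "name type data" lines
# (sample data; use /proc/spl/kstat/zfs/arcstats on the real system):
printf 'l2_hits 4 900\nl2_misses 4 100\n' \
| awk '$1=="l2_hits"{h=$3} $1=="l2_misses"{m=$3} END{printf "%d%%\n", 100*h/(h+m)}'
# prints 90%
```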

The status of the cache can be found with the zpool iostat command:

root@tubman:~# zpool iostat -v
              capacity     operations     bandwidth 
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
bpool       47.8M   912M      0      0      3     14
  mirror    47.8M   912M      0      0      3     14
    sdb3        -      -      0      0      1      7
    sdc3        -      -      0      0      1      7
----------  -----  -----  -----  -----  -----  -----
rpool       1.29G  3.62T      0     60    437   432K
  mirror    1.29G  3.62T      0     60    437   432K
    sdb4        -      -      0     30    199   216K
    sdc4        -      -      0     30    238   216K
cache           -      -      -      -      -      -
  sda3       326M   465G      0    183  4.96K  11.9M
----------  -----  -----  -----  -----  -----  -----

Next steps


Decisions taken during the procedure

Changes from the original procedure

Abandoned ideas

To be improved


ZFS primer


Listing datasets (filesystems and volumes):

zfs list

IO statistics, every second:

zpool iostat 1



Making a snapshot:

zfs snapshot pool/volume@LABEL


Listing snapshots:

zfs list -t snapshot

Listing with creation date:

zfs list -t snapshot -o name,creation


Rolling back to a snapshot:

zfs rollback pool/volume@LABEL


Destroying a snapshot:

zfs destroy pool/volume@LABEL

Other documentation

ZFS documentation
