  1. Installation
    1. Example installations
    2. Issues
      1. Swap
      2. Native encryption
    3. TRIM
  2. Information
  3. Mounts
    1. Mounting
    2. Alternate mountpoints
    3. Encrypted datasets
    4. Deprecated: zfsutil
    5. Cool hack: moving data into ZFS easily
  4. Snapshots
    1. Automated snapshots
    2. Sanoid/syncoid alternatives
      1. zrepl
      2. zfs-auto-snapshot
      3. simplesnap
      4. znapzend
      5. zelta
      6. Other DIY solutions
  5. Caveats
    1. Empty datasets
    2. Not mainline
    3. Write amplification
  6. Other documentation
    1. ZFS documentation

Installation

The installation itself is not documented here; see the examples below instead.

Example installations

Issues

Swap

Swap on ZFS volumes (AKA "swap on ZVOL") can trigger lockups and that issue is still not fixed upstream. Ubuntu recommends using a separate partition for swap instead. cks would rather have no swap than swap on ZFS, and compares it to NFS...

curie was set up without a swap partition (or, at least, with the hope of using a ZFS dataset as a swap backend), but this has proven to be generally a bad idea. Were we to set up a new ZFS system, we'd use LUKS encryption and a dedicated swap partition, as we had problems with ZFS encryption as well.
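
One way to do that (a minimal sketch: plain dm-crypt with a throwaway key, so no hibernation, and /dev/sda2 is a placeholder for the dedicated swap partition):

# /etc/crypttab: map the swap partition with a fresh random key at each boot
swap_crypt /dev/sda2 /dev/urandom swap,cipher=aes-xts-plain64,size=512
# /etc/fstab: use the resulting device-mapper target as swap
/dev/mapper/swap_crypt none swap sw 0 0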

Native encryption

ZFS supports native encryption, but there are serious caveats with it.

I've had trouble moving encrypted datasets between pools when trying to move the tubman rpool from HDDs to SSDs. This is a problem many people are facing without good solutions; see also this TrueNAS discussion, reddit thread, HN thread, this openzfs docs thread, and this other one.

Also, native encryption "will not encrypt metadata related to the pool structure, including dataset and snapshot names, dataset hierarchy, properties, file size, file holes, and deduplication tables (though the deduplicated data itself is encrypted)." So it will leak some metadata about the filesystem. Deduplication is limited to the dataset level.

Therefore, it might be better to use LUKS encryption underneath ZFS to configure fully encrypted systems, although I haven't tested this directly.
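
A rough, untested sketch of what that would look like, with placeholder device and pool names:

cryptsetup luksFormat /dev/sdb4              # encrypt the partition
cryptsetup luksOpen /dev/sdb4 rpool_crypt    # unlock it as /dev/mapper/rpool_crypt
zpool create rpool /dev/mapper/rpool_crypt   # build the pool on the unlocked device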

Note that I use dropbear-initramfs alongside zfs-initramfs to unlock the partitions remotely. This requires the key in /etc/dropbear/authorized_keys as normal.
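
A rough sketch of that setup, assuming the authorized_keys path above (it varies across Debian releases):

apt install dropbear-initramfs                              # SSH server inside the initramfs
cat ~/.ssh/id_ed25519.pub >> /etc/dropbear/authorized_keys  # key allowed to unlock
update-initramfs -u                                         # rebuild the initramfs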

TRIM

I enabled (a little late) TRIM on the SSD pools:

zfs set org.debian:periodic-trim=enable bpoolssd
zfs set org.debian:periodic-trim=enable rpoolssd

That will set up periodic TRIMs. It's also possible to set the equivalent of "discard" with the autotrim pool property, which looks for space that has been recently freed and is no longer allocated by the pool and trims it periodically; it does not immediately reclaim blocks after a free, which makes it quite effective, at the cost of being more likely to encounter tiny ranges:

zpool set autotrim=on bpoolssd
zpool set autotrim=on rpoolssd

You can do a manual trim with:

zpool trim bpoolssd
zpool trim rpoolssd

Here's an example run:

root@tubman:/etc# zpool status -t rpoolssd
  pool: rpoolssd
 state: ONLINE
  scan: scrub repaired 0B in 00:00:37 with 0 errors on Sun Nov 13 00:24:38 2022
config:

    NAME        STATE     READ WRITE CKSUM
    rpoolssd    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sdb4    ONLINE       0     0     0  (untrimmed)
        sdd4    ONLINE       0     0     0  (untrimmed)

errors: No known data errors
root@tubman:/etc# zpool trim rpoolssd
root@tubman:/etc# zpool status -t rpoolssd
  pool: rpoolssd
 state: ONLINE
  scan: scrub repaired 0B in 00:00:37 with 0 errors on Sun Nov 13 00:24:38 2022
config:

    NAME        STATE     READ WRITE CKSUM
    rpoolssd    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sdb4    ONLINE       0     0     0  (3% trimmed, started at Wed 16 Nov 2022 12:19:04 PM EST)
        sdd4    ONLINE       0     0     0  (3% trimmed, started at Wed 16 Nov 2022 12:19:04 PM EST)

errors: No known data errors

See also the TRIM documentation in the Debian wiki.

Information

Listing partitions and snapshots:

zfs list

IO statistics, every second:

zpool iostat 1

Mounts

Mounting

After a zfs list, you should see the datasets you can mount. You can mount one by name, for example with:

zfs mount bpool/ROOT/debian

Alternate mountpoints

Note that it will mount the dataset at its predefined mountpoint property. In the above, that was /boot. If you want to change its mountpoint, it can be done on the fly with:

zfs set mountpoint=/mnt/boot bpool/ROOT/debian

If the dataset is already mounted, it will be moved to the new location immediately. Note that the parent pool's altroot property affects this path, as it's prepended to the mountpoint. See zpoolprops(8) for details.
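
For example (a hedged illustration, not from a real host): with altroot set to /mnt on the pool, a dataset with mountpoint=/boot ends up mounted on /mnt/boot. Both properties can be inspected with:

zpool get altroot bpool               # pool-wide prefix, usually set at import time
zfs get mountpoint bpool/ROOT/debian  # per-dataset mountpoint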

If you are dealing with a new pool that's not yet known to ZFS (e.g. you just added a new drive), you will first need to import it. Typically, you'd also want to do that in an altroot, so that it doesn't override existing mounts, like this:

zpool import POOLNAME -R /mnt

This would import all pools ZFS can find:

zpool import -a -R /mnt

Encrypted datasets

If the dataset is encrypted, however, you first need to unlock it with:

zpool import -l -a

For rescue operations, this would be the right incantation:

zpool import -l -a -R /mnt
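
If the pool was already imported without its keys, they can, as far as I can tell, also be loaded after the fact:

zfs load-key -a   # prompt for the keys of all encrypted datasets
zfs mount -a      # then mount everything that can be mounted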

Deprecated: zfsutil

This is another way to use an alternate mountpoint, although I'm less certain it's a good approach anymore:

mount -o zfsutil -t zfs bpool/BOOT/debian /mnt

Cool hack: moving data into ZFS easily

I used this procedure to move /srv/sbuild/qemu from a spinning rust drive (BTRFS, on curie) to a ZFS dataset running over NVMe. With other filesystems, this would have required either creating a new logical volume or hacking around with bind mounts. With ZFS, this was the procedure:

zfs create -o mountpoint=none -o canmount=off rpool/srv
zfs create -o mountpoint=/mnt/sbuild rpool/srv/sbuild
mv /srv/sbuild/* /mnt/sbuild/
zfs set mountpoint=/srv/sbuild rpool/srv/sbuild

That's it! You can graft mountpoints like this anywhere, which is powerful and scary!

Snapshots

Creating:

zfs snapshot pool/volume@LABEL

Listing:

zfs list -t snapshot

Listing with creation date:

zfs list -t snapshot -o name,creation

Rollback:

zfs rollback pool/volume@LABEL

Destroy:

zfs destroy pool/volume@LABEL

Limiting the number of snapshots:

zfs set snapshot_limit=2 rpool/var/cache

This is useful if you automate snapshot creation (like, say, with sanoid) and you have filesystems that have ridiculous disk usage because of old, useless snapshots.

Automated snapshots

Automatic snapshots are configured with sanoid; see the Puppet code and configuration file.
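
For reference, a minimal sanoid.conf sketch, with a placeholder dataset name and arbitrary retention values:

# /etc/sanoid/sanoid.conf
[rpool/home]
        use_template = production

[template_production]
        frequently = 0
        hourly = 24
        daily = 30
        monthly = 3
        yearly = 0
        autosnap = yes
        autoprune = yes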

Sanoid/syncoid alternatives

TODO: we're considering alternatives to sanoid/syncoid.

After reading the code while implementing a --dryrun argument for syncoid, I found it to have some issues: large functions, lots of system calls without arrays... It feels a little messy, and hard to audit, review, or work on.

zrepl

zrepl is an interesting alternative. It claims support for native encryption, bandwidth limiting, pull/push modes, and Prometheus monitoring with a provided Grafana dashboard. It's written in Golang, and is not packaged in Debian.

There's an issue and discussion that gives a rough idea of how it differs from sanoid. There's this ticket open for a migration guide.

It has no dry run mode.

zfs-auto-snapshot

The zfs-auto-snapshot upstream is possibly dead, or at least looking for volunteers, so probably not an option.

simplesnap

Goerzen's simplesnap is another option. It's a pair of fairly short shell scripts (~600 lines total) that send snapshots to a backup host. It's unclear whether it handles encryption any better than the other tools; it's fairly minimalist.

Packaged in Debian.

znapzend

Znapzend stores its configuration inside the dataset's metadata and can use local snapshots or (multiple) SSH remotes, with mbuffer support. It supports pre/post hooks to quiesce datasets, progressive thinning, and a built-in scheduler that can deal with long transfers. It has a daemon mode, a dry-run mode, debugging output, can run as a normal user, and ships a utility to analyze snapshot disk usage.

It has a setup command to initialize the configuration; an example setup:

znapzendzetup create --recursive\
   --pre-snap-command="/bin/sh /usr/local/bin/lock_flush_db.sh" \
   --post-snap-command="/bin/sh /usr/local/bin/unlock_db.sh" \
   SRC '7d=>1h,30d=>4h,90d=>1d' tank/home \
   DST:a '7d=>1h,30d=>4h,90d=>1d,1y=>1w,10y=>1month' root@bserv:backup/home

There is no official Debian package, but upstream provides a Debian source package. It is written in Perl.

zelta

zelta is written in Awk. This evaluation is incomplete; it was run alongside zfsnap.

Other DIY solutions

twb (#debian-til) wrote cyber-zfs-backup. It's short (~300 SLOC of Python, 1600 lines with comments). There's a MySQL/MariaDB part that provides a "quiescence" hook (another 100 SLOC) and does the good ol' FLUSH TABLES WITH READ LOCK; trick which, it turns out, is apparently better served by the BACKUP STAGE command now (see the upstream docs).

Another person from the Debian community wrote their own shell script, backup-zfs.

Caveats

Empty datasets

You can sometimes end up with odd situations when mounting datasets. In the tubman install, /var was a valid dataset, but it had canmount=off so it wasn't actually used.

This meant that the data in /var was actually in the rpool/ROOT/debian dataset, mounted on /. I mistakenly reset the canmount flag to on, which mounted the empty dataset over /var, shadowing the existing data and basically emptying /var.
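
A quick way to spot this kind of situation (a small sketch; rpool/var is a placeholder for the dataset in question):

zfs get canmount,mountpoint,mounted rpool/var  # is the dataset really mounted?
df /var                                        # which filesystem actually backs /var?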

There's also some evidence that a pre-existing directory at a ZFS dataset's mountpoint will shadow the actual dataset, which is the reverse of what one would normally expect from a filesystem. According to this discussion:

Yes you need to delete the directory -- if it exists, it cannot be mounted there.

In other words, if you have a directory called /mnt/foo and you have a dataset pool/foo configured to mount on /mnt/foo:

zfs mount pool/foo

will show /mnt/foo empty, because the /mnt/foo directory will shadow the dataset. The solution is to unmount the dataset, remove (or rename, if not empty) the directory, and remount the dataset:

zfs umount pool/foo
rmdir /mnt/foo || mv /mnt/foo /mnt/foo.bak
zfs mount pool/foo

Not mainline

ZFS is still not mainline, and will likely never be.

It should be possible, however, to ship Debian binary packages for ZFS. Apparently a package can be built directly with this magic command:

dkms mkbmdeb zfs/2.0.3

See also this idea in grml and this packaging attempt.

Also note that Ubuntu actually ships binary packages for ZFS and is questioning the incompatibility claims.

Write amplification

When layering filesystems, you are always at risk of causing "write amplification" because of mismatched block sizes or alignment. For example, if you have a virtual machine with a filesystem with a 4kB block size over a host device with an 8kB block size, the host will have to read the whole 8kB block to get the other 4kB half before writing it back.

In ZFS, it's even worse, from what I understand, because of the copy-on-write semantics. I'm not exactly clear on the details unfortunately, but it's something to keep in mind when deploying ZFS in complex setups.

In particular, this affects Proxmox, which uses zvols with an 8kB block size for virtual machines, and that seems to cause performance problems.
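
To check for such mismatches, the relevant knobs are the pool's ashift and the zvol's volblocksize (the dataset name below is a placeholder):

zpool get ashift rpool                    # sector size exponent the pool was created with
zfs get volblocksize rpool/vm-100-disk-0  # block size of a given zvol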

Other documentation

ZFS documentation
