syncmaildir (SMD) configuration
In May 2018, I migrated from OfflineIMAP to syncmaildir. This page documents how that migration was done and the resulting SMD configuration.
I tried to follow the official procedure to migrate from OfflineIMAP to SMD. I hit some difficulties, which I documented in upstream issues. What follows is the detailed procedure I followed to test the synchronization, along with notes about the process.
OfflineIMAP migration
This procedure was attempted to migrate my OfflineIMAP mailbox to SMD, but ultimately failed for reasons explained below.
run smd-check-conf to create a template configuration in .smd/config.default and configure it with:

    SERVERNAME=smd-server-anarcat
    CLIENTNAME=curie-anarcat
    MAILBOX_LOCAL=Maildir-smd
    MAILBOX_REMOTE=Maildir-smd
    TRANSLATOR_RL="smd-translate -m oimap-dovecot -d RL default"
    TRANSLATOR_LR="smd-translate -m oimap-dovecot -d LR default"
    EXCLUDE="$MAILBOX_LOCAL/.notmuch/hooks/* $MAILBOX_LOCAL/.notmuch/xapian/*"
authenticate remote server:

    ssh imap.anarc.at true

created a .ssh/config entry:

    # wrapper for smd
    Host smd-server-anarcat
        Hostname imap.anarc.at
        User anarcat
        BatchMode yes
        IdentitiesOnly yes
        Compression yes

This will be useful later to configure the restricted shell account. A quick overview of those options:
- Host: a unique alias that is unlikely to be reused outside of that configuration
- Hostname: make sure we connect to the right host regardless of the alias defined in Host
- User: same, but for the user
- BatchMode: do not prompt, so it fails correctly if the public key is missing
- IdentitiesOnly: same, but do not look for external crypto tokens like a Yubikey
- Compression: SMD does not do compression by default, so delegate that to SSH
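
To double-check that the alias resolves to the options listed above, the effective configuration can be dumped without connecting. This is only a sketch: it assumes OpenSSH 6.8 or later (for `ssh -G`), and `filter_smd_options` is a made-up helper name, not part of SMD.

```shell
# Keep only the options set in the Host block above.
# `ssh -G` prints the resolved configuration with lowercase keywords.
filter_smd_options() {
    grep -E '^(hostname|user|batchmode|identitiesonly|compression) '
}

# Usage (resolves the configuration only, no connection is made):
#   ssh -G smd-server-anarcat | filter_smd_options
```

If one of the five lines is missing or shows an unexpected value, another Host block is probably matching first.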
create the test maildirs, on both the client and the server, which takes about 2 minutes:

    server$ cp -a Maildir/ Maildir-smd/
    client$ cp -a Maildir/Anarcat/ Maildir-smd/

rename the INBOX folder on the client. the problem here is that the INBOX folder exists locally (thanks to offlineimap) and not remotely (thanks to dovecot). this was reported in bug #7 and it seems the workaround might be, on the client:

    client$ mv Maildir-smd/INBOX/{cur,new,tmp} Maildir-smd/ && rmdir Maildir-smd/INBOX/

run smd-check-conf repeatedly until it stops complaining and looks sane. steps taken to clean up the remote directory:

- removed top-level stray folders
- moved out the Koumbit subdirectory that I couldn't make smd ignore. nothing seemed to work: not only did the folder not get ignored, the translation layer would fail to convert back and forth, because this was not really a Dovecot folder. i tried:

      EXCLUDE_REMOTE='Maildir/Koumbit Maildir/Koumbit Maildir/Koumbit/ Maildir/Koumbit.INBOX.Archives.2012/ Maildir/.notmuch/hooks/ Maildir/.notmuch/xapian/'

  filed as bug #4. The .notmuch folder ignores are necessary because smd crashes on symlinks (bug #5).
- the above fails with remote folder as Maildir-smd. just removed the folders for now, but they are important and shouldn't be completely destroyed!

      server$ mkdir Maildir-smd-notmuch && mv Maildir-smd/.notmuch/{hooks,xapian,muchsync} Maildir-smd-notmuch

when the config looks sane, the next step is to convert the folder away from OfflineIMAP idiosyncrasies, particularly removing the X-OfflineIMAP header. there I have found that the suggested commandline:

    find Mail -type f -exec sed -i '/^X-OfflineIMAP/d' {} \;

was problematic: it could corrupt email bodies as it works on complete messages, not just the headers. I wrote a generic header stripping script to work around that issue. to call it, use this, which takes about three minutes, on both local and remote servers (because yes, the remote server also has OfflineIMAP headers somehow):

    server$ ~/dist/syncmaildir/misc/strip-header Maildir-smd/ 2>&1 >&2 | pv -s 187000 -l > log
    client$ ~/dist/syncmaildir/misc/strip-header Maildir-smd/ 2>&1 >&2 | pv -s 187000 -l > log

contributed upstream as PR #3.
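
The corruption problem comes from matching the pattern anywhere in the file. Restricting the deletion to the header block, which ends at the first empty line, avoids it. The following is only a minimal sketch of that idea, not the strip-header script mentioned above: the function name `strip_offlineimap_header` is made up for illustration, and it assumes the header fits on a single line (folded continuation lines are not handled).

```shell
# Read one message on stdin, write it to stdout without X-OfflineIMAP headers.
# The sed range 1,/^$/ covers line 1 up to the first empty line, that is the
# header block; lines in the body are never touched.
strip_offlineimap_header() {
    sed '1,/^$/{/^X-OfflineIMAP/d;}'
}

# Usage on a single (hypothetical) message file:
#   strip_offlineimap_header < message > message.tmp && mv message.tmp message
```

A body line that happens to start with X-OfflineIMAP, for example in a forwarded or quoted message, is left alone.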
then the files need to be renamed to please SMD. this first step generates the rename script, takes about 3 minutes and yields ~140k renames (basically all files get renamed):

    client$ smd-uniform-names -v

actually rename the files, using the script created, takes about 2 minutes:

    client$ sh -x smd-rename.sh 2>&1 | pv -l -s $(wc -l smd-rename.sh | cut -f1 -d ' ') > log-rename

first dry-run pull, takes about 4 minutes. about 80k emails are missing, probably because SMD considers all folders, even if not subscribed (bug #2). all filenames to transfer are also dumped to stdout, ouch! (bug #6). at least logs are in ~/.smd/log/client.default.log

    client$ smd-pull --dry-run

update: after a full rerun, it turns out only 9k emails are missing. after running the strip-header on both sides, only ~600. but with s/^cp/^mv/, back to 1400. grml. after another full rerun, the numbers are 401 new mails on pull.
to get the count:

    grep stats::new-mails /home/anarcat/.smd/log/client.default.log

... grouped per folder:

    grep stats::mail-transferre /home/anarcat/.smd/log/client.default.log | sed -e 's/ , /\n/g' | sed 's#/cur/.*##' | sort | uniq -c | sort -n

first dry-run push. problem: 15k mails missing from remote? maybe because we're in dry-run mode? need to backup remote and test again. nope. there are genuine diffs there, e.g. git-annex folder totally different. maybe not subscribed?

    client$ smd-push --dry-run

result: 591 more duplicate emails.

pull again: 490 mails deleted? push again, no change. wtf...

create email remotely and locally, then go back to the first dry-run pull (step 10)
run smd-loop and hook into startup scripts? (TODO)

create restricted shell (TODO)

call notmuch new in ~/.smd/hooks/post-pull.d/ (TODO)

Clearing the slate can be done by running this command on both ends:

    \rm -r .smd/workarea/ .smd/*.db.txt Maildir-smd/

The migration did not work very well: as documented above, there were lots of problems with exclude patterns, weird error messages, large dumps on the console, and scripts I had to rewrite. I also ended up with duplicate emails in the process, something I generally try to avoid. Even if it's about 500 emails out of ~200 000, it's still annoying.
I found it was better to start off with a clean slate and just copy all files as is to start with. The problem then of course is the directory layout is completely changed and is now incompatible with OfflineIMAP forever. But that is inevitable: the second we rename files and unmangle the headers to remove OfflineIMAP specific stuff, the folder cannot be reused, so it's unclear what the benefit of migrating from OfflineIMAP is over just using rsync to have a clean mirror to start with, avoiding all the messy rewrite rules logic.
Full synchronization
This is an alternative procedure from the above that just copies the files over and starts from a clean slate. Considering we lose compatibility with OfflineIMAP anyways, it seems like a much simpler procedure at no extra cost (except we need to copy files over).
The layout of the files will change to something a little more obscure which requires some fixes in notmuch tagging scripts and my notmuch-emacs config, but that will actually simplify things as we reduce the deltas between machines. It also requires a rescan of notmuch, which takes a long time (30-60 minutes), but that's a one-time cost only without data loss.
configure SSH using the snippet from step 3 in the above procedure (the .ssh/config entry) and confirm logging in works:

    ssh smd-server-anarcat

create a copy on remote server, takes about two minutes:

    ssh anarc.at cp -a Maildir Maildir-smd

strip offlineimap headers on remote, takes about 3-4 minutes?

    ssh anarc.at "~/dist/syncmaildir/misc/strip-header Maildir-smd/ 2>&1 >&2 | pv -s 187000 -l > log"

synchronize the Maildir folder, this takes about 20 minutes:

    ssh anarc.at "tar czf - Maildir-smd/" | pv -s 7G | tar xfz -

clean up IMAP server cruft copied from the remote folder:

    ( cd Maildir-smd
      find \( -name :list -o -name courierimapkeywords -o -name courierimapuiddb -o -name courierimapacl \) -a -delete
      find \( -name maildirfolder -o -name 'dovecot.index*' -o -name 'dovecot-uidlist' -o -name 'dovecot-keywords' \) -a -delete
      \rm dovecot-uidvalidity* dovecot.mailbox.log subscriptions
    )

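
Since `-delete` is not reversible, it can be worth previewing what the two find commands above would match. A minimal sketch, using the same name patterns with `-print` instead of `-delete`; the function name `list_imap_cruft` is hypothetical:

```shell
# List the Courier and Dovecot metadata files under a maildir without
# deleting anything. Same -name patterns as the cleanup step, merged
# into a single find call.
list_imap_cruft() {
    find "${1:-.}" \( -name :list -o -name courierimapkeywords \
        -o -name courierimapuiddb -o -name courierimapacl \
        -o -name maildirfolder -o -name 'dovecot.index*' \
        -o -name 'dovecot-uidlist' -o -name 'dovecot-keywords' \) -print
}

# Usage:
#   list_imap_cruft Maildir-smd | less
```

Actual messages under cur, new and tmp should never show up in that listing; if they do, the patterns need another look before deleting.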
clear out notmuch metadata from remote and keep a dump of our current tags:

    ( cd Maildir-smd
      \rm .notmuch/dump-201*
      \rm -r .notmuch/{muchsync,xapian,hooks}/
    )
    cp -a Maildir/.notmuch/{xapian,hooks} Maildir-smd/.notmuch/
    notmuch dump > Maildir-smd/.notmuch/dump

we go live. do one last offlineimap run, stop offlineimap, stop sending email locally, stop whatever writes to ~/Maildir/. on the server, this means stopping dovecot and postfix as well. note that the remote rename needs to be quoted, otherwise the second mv runs locally:

    killall -HUP offlineimap ; sleep 60 ; killall offlineimap
    ssh anarc.at sudo systemctl stop dovecot postfix
    ssh anarc.at "mv Maildir Maildir.orig && mv Maildir-smd Maildir"

swap the old OfflineIMAP folder with the new folder:

    mv Maildir Maildir-offlineimap
    mv Maildir-smd Maildir

also change the path to the OfflineIMAP folder in .offlineimaprc in case it gets started by mistake.

change folders to point to real folders, change the translators, and fix the exclude patterns in ~/.smd/config.default:

    SERVERNAME=smd-server-anarcat
    CLIENTNAME=curie-anarcat
    MAILBOX=Maildir
    EXCLUDE="Maildir/.notmuch/hooks/* Maildir/.notmuch/xapian/*"
    TRANSLATOR_LR="smd-translate -m move -d LR default"
    TRANSLATOR_RL="smd-translate -m move -d RL default"

run a test pull, should be no change:

    smd-pull --dry-run

we had problems at this step previously because we were running smd on the Maildir-smd folder directly, before the rename. it might have been keeping state in ~/.smd/workarea that was confusing things so instead we made this direct procedure. the problem was reported upstream in bug #8, but it turns out the bug was my mistake, as I wasn't renaming the folder on the server side.

test push, same:

    smd-push --dry-run

run notmuch new. this involves first fixing the notmuch-tag and notmuch-purge scripts to remove the host-specific exception, then dropping all the unread, flagged and inbox tags that will be re-imported by mistake, then restoring from the backup:

    notmuch new
    notmuch tag -inbox -flagged -unread '*'
    pv ~/Maildir/.notmuch/dump | notmuch restore

running notmuch new the first time takes a loong time, maybe as long as if it was a clean database. might be worth not copying the xapian folder after all and just start from scratch, as long as we restore the previous dump.

run the real pull/push
send myself an email, which should create an email locally and remotely, run pull/push again
hook notmuch new in SMD:

    echo '#!/usr/bin/notmuch new' > ~/.smd/hooks/post-pull.d/notmuch
    chmod +x ~/.smd/hooks/post-pull.d/notmuch

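
A slightly more defensive variant of that hook is to write it as a regular shell script that only reindexes when the pull succeeded. This is a sketch under an assumption: as far as I understand the SMD documentation, hooks receive four arguments (when, what, endpoint, status), with a status of 0 meaning success. Check smd-config(5) before relying on it. The `run_notmuch_hook` name and the `NOTMUCH` override variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical content for ~/.smd/hooks/post-pull.d/notmuch.
# Assumed arguments: $1 = pre|post, $2 = pull|push, $3 = endpoint, $4 = status.
run_notmuch_hook() {
    status=${4:-0}
    # Only rescan the mail store when the pull reported success.
    if [ "$status" -eq 0 ]; then
        ${NOTMUCH:-notmuch} new
    fi
}

# In the actual hook file, the last line would be:
#   run_notmuch_hook "$@"
```

The one-line shebang version stays shorter; the script form only helps if the extra arguments or the failure case matter.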
start smd-loop from somewhere. for now just running in place of OfflineIMAP in a different workspace. I'm also experimenting with starting SMD from systemd.

create restricted shell. this means creating a new key with ssh-keygen, without password, and adding it on the server in ~/.ssh/authorized_keys with restrictions, like this:

    command="smd-restricted-shell",restrict ssh-ed25519 AAAAC3... user@host

The restrict parameter assumes the server runs OpenSSH 7.2 or later, otherwise you need to spell out the individual restrictions like no-port-forwarding and so on, see authorized_keys(5) for details.
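
The key creation and the authorized_keys line can be sketched as follows. The key path `~/.ssh/id_smd` and the function name `restricted_key_line` are assumptions made for the example, not something SMD requires.

```shell
# Build the authorized_keys entry for a given public key, in the same
# format as the line shown above.
restricted_key_line() {
    printf 'command="smd-restricted-shell",restrict %s\n' "$1"
}

# Usage, on the client (creates a passwordless ed25519 key):
#   ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_smd
#   restricted_key_line "$(cat ~/.ssh/id_smd.pub)"
# then append the printed line to ~/.ssh/authorized_keys on the server.
```

For the key to be used, the Host block for smd-server-anarcat would also need an IdentityFile entry pointing at it, since IdentitiesOnly is set.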
Performance comparison
OfflineIMAP is definitely slower. Here is a single run, although with a password input prompt that I estimate takes about one second:

    *** Finished account 'Anarcat' in 0:13
    11.46user 2.96system 0:13.39elapsed 107%CPU (0avgtext+0avgdata 84760maxresident)k
    0inputs+112outputs (0major+51610minor)pagefaults 0swaps

That is with postsynchook disabled and running in "once" mode (-k Account_Anarcat:postsynchook=true -o), the hook (notmuch) takes about 3 seconds on its own.
SMD, on a snapshot of that mailbox about an hour old (so essentially the same):

    $ time sh -c "smd-pull ; smd-push"
    1.25user 0.82system 0:06.88elapsed 30%CPU (0avgtext+0avgdata 49180maxresident)k
    0inputs+115240outputs (0major+89930minor)pagefaults 0swaps

So OfflineIMAP is at least two times slower, 8 full seconds overall. It definitely feels slower and clunkier. Even better, to fetch new mail we actually only need pull, which takes even less time:

    $ time smd-pull
    0.29user 0.04system 0:03.90elapsed 8%CPU (0avgtext+0avgdata 31480maxresident)k
    0inputs+57592outputs (0major+5400minor)pagefaults 0swaps

Five times faster! And notice that much lower CPU usage.
Server-side usage is harder to diagnose, but I couldn't "see" smd on the server side during smd-pull at all: maybe it just popped in and out of existence without me noticing with top(1). But I definitely noticed Dovecot's imap process during the OfflineIMAP run. Circumstantial evidence from Prometheus monitoring shows an 18% CPU usage of Dovecot during fetches on marcos.
Running smd-pull; smd-push on a sleep 1 busy loop does make mddiff, lua5.1 and xdelta show up in top eventually, and CPU usage does seem higher than OfflineIMAP then (22%) - but it's a bit of an unfair comparison, because the updates are running much faster.
A fairer loop would be based on sleep 20 and would match better with the general OfflineIMAP loop frequency (-k Account_Anarcat:autorefresh=0.1 means 6 seconds plus the ~13 seconds run time). Such a loop converges to about 5% extra CPU usage.
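
For reference, the kind of loop used for that comparison can be sketched like this. It is an illustration of the test harness only, not how smd-loop works: the `sync_loop` function and the `SMD_PULL` / `SMD_PUSH` override variables are hypothetical names.

```shell
# Run a pull followed by a push, then sleep, over and over.
#   $1: seconds to sleep between rounds (1 for the busy loop, 20 for the fairer one)
#   $2: number of rounds, 0 or unset meaning forever
sync_loop() {
    interval=$1
    rounds=${2:-0}
    n=0
    while [ "$rounds" -eq 0 ] || [ "$n" -lt "$rounds" ]; do
        ${SMD_PULL:-smd-pull}
        ${SMD_PUSH:-smd-push}
        n=$((n + 1))
        sleep "$interval"
    done
}

# Usage:
#   sync_loop 20
```

With an interval of 20 seconds, each round takes roughly as long as one OfflineIMAP cycle (6 seconds of autorefresh plus about 13 seconds of run time), which is what makes the CPU figures comparable.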
Open questions
How should SMD be started in a session? User level systemd service? There is an "applet" that can be used, but that could be annoying. How else should errors be reported? It does look simple enough and non-intrusive: by default it doesn't notify for new mail, which is good.
It would be nice if notmuch insert could be used to deliver the emails locally instead of having to rescan the whole database.
How do we refresh a session? In OfflineIMAP, we used to send SIGUSR, but it's unclear how that works in SMD. Maybe through the FIFOs?
Issue summary
This is a summary of the issues reported upstream, already mentioned above, in chronological order:
- offlineimap migration script might corrupt messages (fixed)
- excluding subscribed folders
- safely strip offlineimap headers (merged)
- fails to exclude remote folder
- should ignore symlinks in "mailboxes"
- smd-pull/push --dry-run way too verbose
- Maildir/INBOX exception
- full resync fails (closed, PEBKAC)
- how do FIFOs work?
Trivia: at the time of writing, all of the currently reported issues against SMD are the ones above, reported by yours truly.