syncmaildir (SMD) configuration
In May 2018, I migrated from OfflineIMAP to syncmaildir. This page documents that process and the resulting SMD configuration.
I tried to follow the official procedure to migrate from OfflineIMAP to SMD. I hit some difficulties, which I documented in upstream issues. What follows is the detailed procedure I used to test the synchronization, along with notes about the process.
OfflineIMAP migration
This procedure was attempted to migrate my OfflineIMAP mailbox to SMD, but ultimately failed for reasons explained below.
1. run smd-check-conf to create a template configuration in .smd/config.default and configure it with:

       SERVERNAME=smd-server-anarcat
       CLIENTNAME=curie-anarcat
       MAILBOX_LOCAL=Maildir-smd
       MAILBOX_REMOTE=Maildir-smd
       TRANSLATOR_RL="smd-translate -m oimap-dovecot -d RL default"
       TRANSLATOR_LR="smd-translate -m oimap-dovecot -d LR default"
       EXCLUDE="$MAILBOX_LOCAL/.notmuch/hooks/* $MAILBOX_LOCAL/.notmuch/xapian/*"

2. authenticate remote server:

       ssh imap.anarc.at true

3. created a .ssh/config entry:

       # wrapper for smd
       Host smd-server-anarcat
           Hostname imap.anarc.at
           User anarcat
           BatchMode yes
           IdentitiesOnly yes
           Compression yes

   This will be useful later to configure the restricted shell account. A quick overview of those options:

   - Host: a unique alias that is unlikely to be reused outside of that configuration
   - Hostname: make sure we connect to the right host regardless of the alias defined in Host
   - User: same, but for the user
   - BatchMode: do not prompt, so it fails correctly if the public key is missing
   - IdentitiesOnly: same, but do not look for external crypto tokens like a Yubikey
   - Compression: SMD does not do compression by default, so delegate that to SSH

4. create the test maildirs, takes about 2 minutes, on both the client and the server:

       server$ cp -a Maildir/ Maildir-smd/
       client$ cp -a Maildir/Anarcat/ Maildir-smd/

5. rename the INBOX folder on the client. the problem here is that the INBOX folder exists locally (thanks to offlineimap) and not remotely (thanks to dovecot). this was reported in bug #7 and it seems the workaround might be, on the client:

       client$ mv Maildir-smd/INBOX/{cur,new,tmp} Maildir-smd/ && rmdir Maildir-smd/INBOX/

6. run smd-check-conf repeatedly until it stops complaining and looks sane. steps taken to clean up the remote directory:

   - removed top-level stray folders

   - moved out the Koumbit subdirectory that I couldn't make smd ignore. nothing seemed to work: not only did the folder not get ignored, the translation layer would fail to convert back and forth, because this was not really a Dovecot folder. I tried:

         EXCLUDE_REMOTE='Maildir/Koumbit Maildir/Koumbit Maildir/Koumbit/ Maildir/Koumbit.INBOX.Archives.2012/ Maildir/.notmuch/hooks/ Maildir/.notmuch/xapian/'

     filed as bug #4. The .notmuch folder ignores are necessary because smd crashes on symlinks (bug #5).

   - the above fails with remote folder as Maildir-smd. just removed the folders for now, but they are important and shouldn't be completely destroyed!

         server$ mkdir Maildir-smd-notmuch && mv Maildir-smd/.notmuch/{hooks,xapian,muchsync} Maildir-smd-notmuch

7. when the config looks sane, the next step is to convert the folder away from OfflineIMAP idiosyncrasies, particularly removing the X-OfflineIMAP header. there I have found that the suggested commandline:

       find Mail -type f -exec sed -i '/^X-OfflineIMAP/d' {} \;

   was problematic: it could corrupt email bodies as it works on complete messages, not just the headers. I wrote a generic header stripping script to work around that issue. to call it, use this, which takes about three minutes, on both local and remote servers (because yes, the remote server also has OfflineIMAP headers somehow):

       server$ ~/dist/syncmaildir/misc/strip-header Maildir-smd/ 2>&1 >&2 | pv -s 187000 -l > log
       client$ ~/dist/syncmaildir/misc/strip-header Maildir-smd/ 2>&1 >&2 | pv -s 187000 -l > log

   contributed upstream as PR #3.

8. then the files need to be renamed to please SMD, using the following, which takes about 3 minutes and generates ~140k renames (basically all files get renamed):

       client$ smd-uniform-names -v

9. actually rename the files, using the script created, takes about 2 minutes:

       client$ sh -x smd-rename.sh 2>&1 | pv -l -s $(wc -l smd-rename.sh | cut -f1 -d ' ') > log-rename

10. first dry-run pull, takes about 4 minutes. about 80k emails are missing, probably because SMD considers all folders, even if not subscribed (bug #2). all filenames to transfer are also dumped to stdout, ouch! (bug #6). at least logs are in ~/.smd/log/client.default.log

        client$ smd-pull --dry-run

    update: after a full rerun, it turns out only 9k emails are missing. after running the strip-header on both sides, only ~600. but with s/^cp/^mv/, back to 1400. grml. after another full rerun, the numbers are 401 new mails on pull.

    to get the count:

        grep stats::new-mails /home/anarcat/.smd/log/client.default.log

    ... grouped per folder:

        grep stats::mail-transferre /home/anarcat/.smd/log/client.default.log | sed -e 's/ , /\n/g' | sed 's#/cur/.*##' | sort | uniq -c | sort -n

11. first dry-run push - problem: 15k mails missing from remote? maybe because we're in dry-run mode? need to backup remote and test again. nope. there are genuine diffs there, e.g. git-annex folder totally different. maybe not subscribed?

        client$ smd-push --dry-run

    result: 591 more duplicate emails.

12. pull again: 490 mails deleted? push again, no change. wtf...

13. create email remotely and locally, go to 10

14. run smd-loop and hook into startup scripts? (TODO)

15. create restricted shell (TODO)

16. call notmuch new in ~/.smd/hooks/post-pull.d/ (TODO)
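The header-stripping problem described in the procedure above comes down to limiting the deletion to the header block, which ends at the first empty line of a message. What follows is only a minimal sketch of that idea, not the strip-header script contributed upstream: the function name strip_offlineimap_header is hypothetical, and it assumes LF line endings and an X-OfflineIMAP header that fits on one line.

```shell
#!/bin/sh
# Sketch: print a message with X-OfflineIMAP lines removed from the
# headers only. strip_offlineimap_header is a hypothetical helper name,
# not part of SMD or OfflineIMAP.
strip_offlineimap_header() {
    # in_header stays 1 until the first empty line, which ends the
    # header block; every line after that is body and printed as is.
    awk '
        BEGIN { in_header = 1 }
        in_header && /^$/ { in_header = 0 }
        !(in_header && /^X-OfflineIMAP/)
    ' "$1"
}
```

It writes to stdout, so replacing a message in place would look like `strip_offlineimap_header msg > msg.tmp && mv msg.tmp msg`, to be tried on a copy of the Maildir first.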
Clearing the slate can be done by running this command on both ends:
\rm -r .smd/workarea/ .smd/*.db.txt Maildir-smd/
The migration did not work very well: as documented above, there were lots of problems with exclude patterns, weird error messages, large dumps on the console, and scripts I had to rewrite. I also ended up with duplicate emails in the process, something I generally try to avoid. Even if it's only about 500 emails out of ~200 000, it's still annoying.
I found it was better to start off with a clean slate and just copy all files as is to start with. The problem then, of course, is that the directory layout is completely changed and is now incompatible with OfflineIMAP forever. But that is inevitable: the second we rename files and unmangle the headers to remove OfflineIMAP-specific stuff, the folder cannot be reused. So it's unclear what the benefit of migrating from OfflineIMAP is over just using rsync to have a clean mirror to start with, avoiding all the messy rewrite rules logic.
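Since missing and duplicate mails mostly show up as differences in message counts, counting messages on both ends before and after each attempt gives a quick sanity check. The helper below is a sketch under assumptions: count_messages is a hypothetical name, and it simply counts regular files under any cur/ or new/ directory, ignoring tmp/ and server metadata.

```shell
#!/bin/sh
# Sketch: count the messages stored in a Maildir tree.
# count_messages is a hypothetical helper name.
count_messages() {
    # Only files below a cur/ or new/ directory are messages; index
    # files, tmp/ leftovers and notmuch data are not counted.
    find "$1" -type f \( -path '*/cur/*' -o -path '*/new/*' \) | wc -l | tr -d ' '
}
```

Running `count_messages Maildir-smd` on the client and on the server gives two numbers to compare with what the SMD logs report.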
Full synchronization
This is an alternative procedure from the above that just copies the files over and starts from a clean slate. Considering we lose compatibility with OfflineIMAP anyways, it seems like a much simpler procedure at no extra cost (except we need to copy files over).
The layout of the files will change to something a little more obscure which requires some fixes in notmuch tagging scripts and my notmuch-emacs config, but that will actually simplify things as we reduce the deltas between machines. It also requires a rescan of notmuch, which takes a long time (30-60 minutes), but that's a one-time cost only without data loss.
1. configure SSH using the snippet from step 3 in the above procedure and confirm logging in works:

       ssh smd-server-anarcat

2. create a copy on remote server, takes about two minutes:

       ssh anarc.at cp -a Maildir Maildir-smd

3. strip offlineimap headers on remote, takes about 3-4 minutes?

       ssh anarc.at "~/dist/syncmaildir/misc/strip-header Maildir-smd/ 2>&1 >&2 | pv -s 187000 -l > log"

4. synchronize the Maildir folder, this takes about 20 minutes:

       ssh anarc.at "tar czf - Maildir-smd/" | pv -s 7G | tar xfz -

5. cleanup IMAP server cruft copied from the remote folder:

       (
           cd Maildir-smd
           find \( -name :list -o -name courierimapkeywords -o -name courierimapuiddb -o -name courierimapacl \) -a -delete
           find \( -name maildirfolder -o -name 'dovecot.index*' -o -name 'dovecot-uidlist' -o -name 'dovecot-keywords' \) -a -delete
           \rm dovecot-uidvalidity* dovecot.mailbox.log subscriptions
       )

6. clear out notmuch metadata from remote and keep a dump of our current tags:

       (
           cd Maildir-smd
           \rm .notmuch/dump-201*
           \rm -r .notmuch/{muchsync,xapian,hooks}/
       )
       cp -a Maildir/.notmuch/{xapian,hooks} Maildir-smd/.notmuch/
       notmuch dump > Maildir-smd/.notmuch/dump

7. we go live. do one last offlineimap run, stop offlineimap, stop sending email locally, stop whatever writes to ~/Maildir/. on the server, this means stopping dovecot and postfix as well. note that the two mv commands must be quoted so that both run on the server:

       killall -HUP offlineimap ; sleep 60 ; killall offlineimap
       ssh anarc.at sudo systemctl stop dovecot postfix
       ssh anarc.at "mv Maildir Maildir.orig && mv Maildir-smd Maildir"

8. swap the old OfflineIMAP folder with the new folder:

       mv Maildir Maildir-offlineimap
       mv Maildir-smd Maildir

   also change the path to the OfflineIMAP folder in .offlineimaprc in case it gets started by mistake.

9. change folders to point to real folders, change the translators, and fix the exclude patterns in ~/.smd/config.default:

       SERVERNAME=smd-server-anarcat
       CLIENTNAME=curie-anarcat
       MAILBOX=Maildir
       EXCLUDE="Maildir/.notmuch/hooks/* Maildir/.notmuch/xapian/*"
       TRANSLATOR_LR="smd-translate -m move -d LR default"
       TRANSLATOR_RL="smd-translate -m move -d RL default"

10. run a test pull, should be no change:

        smd-pull --dry-run

    we had problems at this step previously because we were running smd on the Maildir-smd folder directly, before the rename. it might have been keeping state in ~/.smd/workarea that was confusing things so instead we made this direct procedure. the problem was reported upstream in bug #8, but it turns out the bug was my mistake, as I wasn't renaming the folder on the server side.

11. test push, same:

        smd-push --dry-run

12. run notmuch new. this involves first fixing the notmuch-tag and notmuch-purge script to remove the host-specific exception, then dropping all the unread, flagged and inbox tags that will be re-imported by mistake, then restoring from the backup:

        notmuch new
        notmuch tag -inbox -flagged -unread
        pv ~/Maildir/.notmuch/dump | notmuch restore

    running notmuch new the first time takes a loong time, maybe as long as if it was a clean database. might be worth not copying the xapian folder after all and just start from scratch, as long as we restore the previous dump.

13. run the real pull/push

14. send myself an email, which should create an email locally and remotely, run pull/push again

15. hook notmuch new in SMD:

        echo '#!/usr/bin/notmuch new' > ~/.smd/hooks/post-pull.d/notmuch
        chmod +x ~/.smd/hooks/post-pull.d/notmuch

16. start smd-loop from somewhere. for now just running in place of OfflineIMAP in a different workspace. I'm also experimenting with starting SMD from systemd

17. create restricted shell. this means creating a new key with ssh-keygen, without password, and adding it on the server in ~/.ssh/authorized_keys with restrictions, like this:

        command="smd-restricted-shell",restrict ssh-ed25519 AAAAC3... user@host

    The restrict parameter assumes the server runs OpenSSH 7.2 or later, otherwise you need to spell out the individual restrictions like no-port-forwarding and so on, see authorized_keys(5) for details.
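For servers older than OpenSSH 7.2, the spelled-out form of the restricted shell entry could look like the sketch below. It assumes that restrict stands for disabling port, agent and X11 forwarding, PTY allocation and ~/.ssh/rc execution, as described in authorized_keys(5); the function name legacy_restricted_key is hypothetical.

```shell
#!/bin/sh
# Sketch: build an authorized_keys line that spells out the individual
# restrictions instead of relying on the "restrict" keyword.
# legacy_restricted_key is a hypothetical helper name.
legacy_restricted_key() {
    # $1 is the public key, i.e. the single line found in a .pub file
    printf 'command="smd-restricted-shell",no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty,no-user-rc %s\n' "$1"
}
```

A passwordless key can be created with something like `ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_smd` (the file name is only an example), and the output of `legacy_restricted_key "$(cat ~/.ssh/id_smd.pub)"` appended to ~/.ssh/authorized_keys on the server.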
Performance comparison
OfflineIMAP is definitely slower. Here is a single run, although with a password input prompt that I estimate takes about one second:
*** Finished account 'Anarcat' in 0:13
11.46user 2.96system 0:13.39elapsed 107%CPU (0avgtext+0avgdata 84760maxresident)k
0inputs+112outputs (0major+51610minor)pagefaults 0swaps
That is with postsynchook disabled and running in "once" mode (-k
Account_Anarcat:postsynchook=true -o); the hook (notmuch) takes about
3 seconds on its own.
SMD, on a snapshot of that mailbox about an hour old (so essentially the same):
$ time sh -c "smd-pull ; smd-push"
1.25user 0.82system 0:06.88elapsed 30%CPU (0avgtext+0avgdata 49180maxresident)k
0inputs+115240outputs (0major+89930minor)pagefaults 0swaps
So OfflineIMAP is about two times slower, roughly 6.5 seconds more
per run going by the elapsed times above. It definitely feels slower
and clunkier. Even better, to fetch new mail we actually only need
pull, which takes even less time:
$ time smd-pull
0.29user 0.04system 0:03.90elapsed 8%CPU (0avgtext+0avgdata 31480maxresident)k
0inputs+57592outputs (0major+5400minor)pagefaults 0swaps
More than three times faster than the OfflineIMAP run! And notice the much lower CPU usage.
Server-side usage is harder to diagnose, but I couldn't "see" smd on
the server side during smd-pull at all: maybe it just popped in and
out of existence without me noticing with top(1). But I definitely
noticed Dovecot's imap process during the OfflineIMAP
run. Circumstantial evidence from Prometheus monitoring shows an 18%
CPU usage of Dovecot during fetches on marcos.
Running smd-pull; smd-push on a sleep 1 busy loop does make
mddiff, lua5.1 and xdelta show up in top eventually, and CPU
usage does seem higher than OfflineIMAP then (22%) - but it's a bit
of an unfair comparison, because the updates are running much
more often.
A fairer loop would be based on sleep 20 and would match better with
the general OfflineIMAP loop frequency (-k
Account_Anarcat:autorefresh=0.1 means 6 seconds plus the ~13 seconds
run time). Such a loop converges to about 5% extra CPU usage.
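The measurement loop described above can be sketched as follows. This is only a way to reproduce the comparison, not a replacement for smd-loop; the function name sync_loop and the SMD_PULL and SMD_PUSH override variables are hypothetical and exist so the loop can be tried without touching a real mailbox.

```shell
#!/bin/sh
# Sketch: run pull then push every N seconds, as in the CPU comparison.
# sync_loop, SMD_PULL and SMD_PUSH are hypothetical names.
sync_loop() {
    interval=${1:-20}   # seconds to sleep between runs
    max_runs=${2:-0}    # 0 means loop until interrupted
    runs=0
    while :; do
        # default to the real SMD commands unless overridden
        ${SMD_PULL:-smd-pull}
        ${SMD_PUSH:-smd-push}
        runs=$((runs + 1))
        if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

Calling `sync_loop 20` gives the sleep 20 variant, and `sync_loop 1` the busy loop.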
Open questions
How should SMD be started in a session? User level systemd service? There is an "applet" that can be used, but that could be annoying. How else should errors be reported? It does look simple enough and non-intrusive: by default it doesn't notify for new mail, which is good.
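One possible answer to the first question is a user-level systemd service. The sketch below writes such a unit; it is untested against a real session, and everything in it is an assumption: the unit name smd-loop.service, the /usr/bin/smd-loop path, the restart policy and the write_smd_unit helper name.

```shell
#!/bin/sh
# Sketch: write a user-level systemd unit that runs smd-loop.
# write_smd_unit and the unit contents are assumptions, not an
# upstream-provided configuration.
write_smd_unit() {
    unit_dir=${1:-$HOME/.config/systemd/user}
    mkdir -p "$unit_dir"
    cat > "$unit_dir/smd-loop.service" <<'EOF'
[Unit]
Description=syncmaildir loop (smd-loop)

[Service]
ExecStart=/usr/bin/smd-loop
Restart=on-failure

[Install]
WantedBy=default.target
EOF
}
```

After `write_smd_unit`, the service would be loaded with `systemctl --user daemon-reload` and started with `systemctl --user enable --now smd-loop.service`; errors would then end up in the user journal rather than in a notification.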
It would be nice if notmuch insert could be used to deliver the
emails locally instead of having to rescan the whole database.
How do we refresh a session? In OfflineIMAP, we used to SIGUSR, but
it's unclear how that works in SMD. Maybe through the FIFOs?
Issue summary
This is a summary of the issues reported upstream, already mentioned above, in chronological order:
- offlineimap migration script might corrupt messages (fixed)
- excluding subscribed folders
- safely strip offlineimap headers (merged)
- fails to exclude remote folder
- should ignore symlinks in "mailboxes"
- smd-pull/push --dry-run way too verbose
- Maildir/INBOX exception
- full resync fails (closed, PEBKAC)
- how do FIFOs work?
Trivia: at the time of writing, all of the currently reported issues against SMD are the ones above, reported by yours truly.