TL;DR: I have switched the software that runs my blog from Drupal to Ikiwiki.

http://anarcat.koumbit.org/ will continue operating for a while to give feed aggregators a chance to catch this article. It will also give the Internet Archive time to catch up with the static stylesheets (it turns out it doesn't like Drupal's CSS compression at all!). A copy will therefore remain available on the Internet Archive for people who miss the old stylesheet.

I have redirected the http://anarcat.koumbit.org URL to the new blog location, http://anarc.at/blog. This will be my last blog post written on Drupal, and all new content will be available on the new URL. RSS feed URLs should not change.

Why
What
When
Who
How

Why

I have migrated away from Drupal because it is basically impossible to upgrade my blog from Drupal 6 to Drupal 7. Or if it is, I'll have to redo the whole freaking thing again when Drupal 8 comes along.

And frankly, I don't really need Drupal to run a blog. A blog was originally a really simple thing: a web log. A set of articles written on the corner of a table. Now with Drupal, I can add ecommerce, a photo gallery and whatnot to my blog, but why would I do that? And why does it need to be a dynamic CMS at all, if I get so few comments?

So I'm switching to ikiwiki, for the following reasons:

no upgrades necessary: well, not exactly true, I still need to upgrade ikiwiki, but that's covered by the Debian package maintenance and I only have one patch to it, and there's no data migration! (the last such migration in ikiwiki was in 2009 and was fully supported)
offline editing: this is a big thing for me: I can just note things down and push them when I get back online
one place for everything: this blog is where I keep my notes, it's getting annoying to have to keep track of two places for that stuff
future-proof: extracting content from ikiwiki is amazingly simple. every page is a single markdown-formatted file. that's it.

Migrating will mean abandoning the barlow theme, which was seeing declining usage anyway.

What

So what should be exported, exactly? There's a bunch of crap in the old blog that I don't want: users, caches, logs, "modules", and the list goes on. Maybe it's better to create a list of what I need to extract:

nodes
- title (meta title and guid tags, guid to avoid flooding aggregators)
- body (need to check for "break comments")
- nid (for future reference?)
- tags (should be added as [[!tag foo bar baz]] at the bottom)
- URL (to keep old addresses)
- published date (meta date directive)
- modification date (meta updated directive)
- revisions?
- attached files
menus
- RSS feed
- contact
- search
comments
- author name
- date
- title
- content
attached files
- thumbnails
- links
tags
- each tag should have its own RSS feed and latest posts displayed
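
The node fields listed above map fairly directly onto ikiwiki directives. Here is a minimal sketch of that mapping, not the code from drupal2ikiwiki.py: the `render_page` function, the dictionary keys and the guid format are assumptions made for illustration, and quotes inside titles are not escaped.

```python
from datetime import datetime, timezone

def iso(ts):
    """Format a Unix timestamp (how Drupal stores dates) as a UTC ISO 8601 string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def render_page(node, old_base="http://anarcat.koumbit.org"):
    """Render one node (a plain dict, hypothetical layout) as an ikiwiki page."""
    lines = [
        '[[!meta title="%s"]]' % node["title"],
        '[[!meta date="%s"]]' % iso(node["created"]),
        '[[!meta updated="%s"]]' % iso(node["changed"]),
        # Assumption: the guid must match whatever the old feed published;
        # this node/N URL is only a placeholder format.
        '[[!meta guid="%s/node/%d"]]' % (old_base, node["nid"]),
        "",
        # Drop Drupal's teaser delimiter (the "break comment").
        node["body"].replace("<!--break-->", ""),
        "",
    ]
    if node["tags"]:
        # Tags go at the bottom of the page.
        lines.append("[[!tag %s]]" % " ".join(node["tags"]))
    return "\n".join(lines) + "\n"
```

Calling `render_page` on a dict with `nid`, `title`, `body`, `created`, `changed` and `tags` keys returns the text of one `.mdwn` file.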

When

I had planned to do this before summer 2015, but it turned out to be fairly easy and fun, so I spent two evenings working on a script on February 5th and 6th, and finally turned off the Drupal site on Monday, February 9th.

Who

Well, me, who else? You probably really don't care about that, so let's get to the meat of it.

How

How to perform this migration... There are multiple paths:

MySQL commandline: extracting data using the commandline mysql tool (drush sqlq ...)
Views export: extracting "standard format" dumps from Drupal and parsing them (JSON, XML, CSV?)

Both approaches had issues, and I found a third way: talk directly to MySQL and generate the files directly, in a Python script. But first, here are the two previous approaches I know of.

MySQL commandline

LeLutin switched using MySQL queries, although he doesn't specify how the content itself was migrated. Comments are imported with this script:

echo "select n.title, concat('| [[!comment   format=mdwn|| username=\"', c.name, '\"|| ip=\"', c.hostname, '\"|| subject=\"', c.subject, '\"|| date=\"', FROM_UNIXTIME(c.created), '\"|| content=\"\"\"||', b.comment_body_value, '||\"\"\"]]') from node n, comment c, field_data_comment_body b where n.nid=c.nid and c.cid=b.entity_id;" | drush sqlc | tail -n +2 | while read line; do if [ -z "$i" ]; then i=0; fi; title=$(echo "$line" | sed -e 's/[    ]\+|.*//' -e 's/ /_/g' -e 's/[:(),?/+]//g'); body=$(echo "$line" | sed 's/[^|]*| //'); mkdir -p ~/comments/$title; echo -e "$body" > ~/comments/$title/comment_$i._comment; i=$((i+1)); done

Kind of ugly, but it beats what I had before (which was "nothing").
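
For comparison, here is roughly what the same comment formatting could look like in Python. This is only a sketch: the `page_dir` and `render_comment` helpers and the dictionary keys are made up for illustration (the keys mirror the columns in the query above), and dates are rendered in UTC, whereas FROM_UNIXTIME uses the MySQL session time zone.

```python
import re
from datetime import datetime, timezone

def page_dir(title):
    """Mimic the sed clean-up above: spaces become underscores, some punctuation is dropped."""
    return re.sub(r"[:(),?/+]", "", title.replace(" ", "_"))

def render_comment(c):
    """Render one comment row (a plain dict, hypothetical layout) as an ikiwiki comment directive."""
    # Assumption: UTC is good enough here.
    date = datetime.fromtimestamp(c["created"], tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return (
        '[[!comment format=mdwn\n'
        ' username="%s"\n'
        ' ip="%s"\n'
        ' subject="%s"\n'
        ' date="%s"\n'
        ' content="""\n%s\n"""]]\n'
    ) % (c["name"], c["hostname"], c["subject"], date, c["body"])
```

Each rendered comment would then be written to a file like `comment_0._comment` inside the directory returned by `page_dir`, following the naming used in the shell version.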

I do think it is the right direction to take, to simply talk to the MySQL database, maybe with a native Python script. I know the Drupal database schema pretty well (still! this is D6 after all) and it's simple enough that this should just work.

Views export

mvc recommended views data export on LeLutin's blog. Unfortunately, my experience with the views export interface has been somewhat mediocre so far. Yet another reason why I don't like using Drupal anymore is how obtuse its dialogs are.

I clicked through those for about an hour to get JSON output that turned out to be provided by views bonus instead of views_data_export. And confusingly enough, the path and format_name fields are null in the JSON output (whyyy!?). views_data_export unfortunately only supports XML, which seems hardly better than SQL for structured data, especially considering I am going to write a script for the conversion anyway.

Basically, it doesn't seem like any amount of views mangling will provide me with what I need.

Nevertheless, here's the failed-export-view.txt that I was able to come up with; may it be useful for future freedom fighters.

Python script

I ended up making a fairly simple Python script to talk directly to the MySQL database.

The script exports only nodes and comments, and nothing else. It makes a bunch of assumptions about the structure of the site, and is probably only going to work if your site is a simple blog like mine, but could probably be improved significantly to encompass larger and more complex datasets. History is not preserved so no interaction is performed with git.

Generating dump

First, I imported the MySQL dump file on my local MySQL server for easier development. It is 13.9 MiB!!

mysql -e 'CREATE DATABASE anarcatblogbak;'
ssh aegir.koumbit.net "cd anarcat.koumbit.org ; drush sql-dump" | pv | mysql anarcatblogbak

I decided not to import revisions. The majority (70%) of the content has 1 or 2 revisions, and where there are two, the second is likely just from when the node was actually published, with minor changes. About 80% have 3 revisions or fewer, 90% have 5 or fewer, 95% have 8 or fewer, and 98% have 10 or fewer. Only 5 articles have more than 10 revisions, with two having the maximum of 15 revisions.

Those stats were generated with:

SELECT title, count(vid) FROM anarcatblogbak.node_revisions GROUP BY nid;

I then threw the output into a CSV spreadsheet (thanks to mysql-workbench for the easy export), added a column numbering the rows (B1=1,B2=B1+1), another for generating percentages (C1=B1/count(B$2:B$218)) and generated a simple graph with that. There were probably ways of doing that more cleanly with R, and I broke my promise to never use a spreadsheet again, but then again it was Gnumeric and it's just to get a rough idea.
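
The same rough distribution can also be computed without a spreadsheet. Below is a small sketch in Python (not R, and not what I actually ran): `cumulative_share` is a hypothetical helper that takes the per-node revision counts, such as the second column of the query above, and returns the share of nodes that have at most k revisions.

```python
from collections import Counter

def cumulative_share(revision_counts):
    """Map each revision count k to the share of nodes with at most k revisions."""
    total = len(revision_counts)
    tally = Counter(revision_counts)
    seen = 0
    shares = {}
    for k in sorted(tally):
        seen += tally[k]
        shares[k] = seen / total
    return shares

# Made-up sample data, not the real numbers from the blog.
sample = [1, 1, 2, 3]
for k, share in cumulative_share(sample).items():
    print("%d revisions or fewer: %.0f%%" % (k, share * 100))
```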

There are 196 articles to import, with 251 comments, which means an average of about 1.28 comments per article (not much!). Unpublished articles (5!) are completely ignored.

Summaries are also not imported as such (break comments are ignored) because ikiwiki doesn't support post summaries.

Calling the conversion script

The script is in drupal2ikiwiki.py. It is called with:

./drupal2ikiwiki.py -u anarcatblogbak -d anarcatblogbak blog -vv

The -n and -l1 flags were also used for the first tests. Use this command to generate HTML from the result without having to commit and push everything:

ikiwiki --plugin meta --plugin tag --plugin comments --plugin inline  . ../anarc.at.html

More plugins are of course enabled in the blog, see the setup file for more information, or just enable plugins as needed to unbreak things. Use the --rebuild flag on subsequent runs. The actual invocation I use is more something like:

ikiwiki --rebuild --no-usedirs --plugin inline --plugin calendar --plugin postsparkline --plugin meta --plugin tag --plugin comments --plugin sidebar  . ../anarc.at.html

I had problems with dates, but it turns out that I wasn't setting dates in redirects... Instead of doing that, I started adding a "redirection" tag that gets ignored by the main page.

Files and old URLs

The script should keep the same URLs, as long as pathauto is enabled on the site. Otherwise, some logic should be easy to add to point to node/N.
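
The fallback to node/N could look something like the sketch below. It assumes the aliases were already loaded into a dict mapping the internal path to the alias (in Drupal 6 those should be the src and dst columns of the url_alias table, if I remember the schema right); `page_path` and the example alias are made up for illustration.

```python
def page_path(nid, aliases):
    """Return the ikiwiki source file for a node, keeping its old URL.

    aliases maps internal Drupal paths ("node/157") to their aliases.
    When a node has no alias, fall back to the node/N path itself.
    """
    src = "node/%d" % nid
    return aliases.get(src, src) + ".mdwn"

# Hypothetical alias, only to show both cases.
aliases = {"node/157": "blog/some-old-post"}
print(page_path(157, aliases))
print(page_path(203, aliases))
```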

To redirect to the new blog, the redirect rules on the original blog should be as simple as:

Redirect / http://anarc.at/blog/

When we're sure:

Redirect permanent / http://anarc.at/blog/

Now, on the new blog, some magic needs to happen for files. Both /files and /sites/anarcat.koumbit.org/files need to resolve properly. We can't use symlinks because ikiwiki drops symlinks on generation.

So I'll just drop the files in /blog/files directly. The actual migration is:

cp -r $DRUPAL/sites/anarcat.koumbit.org/files $IKIWIKI/blog/files
rm -r .htaccess css/ js/ tmp/ languages/
rm foo/bar # wtf was that.
rmdir *
sed -i 's#/sites/anarcat.koumbit.org/files/#/blog/files/#g' blog/*.mdwn
sed -i 's#http://anarcat.koumbit.org/blog/files/#/blog/files/#g' blog/*.mdwn
chmod -R -x blog/files
sudo chmod -R +X blog/files
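
The two sed calls above could also live in the conversion script itself. Here is a sketch of the same link rewriting in Python, applying the replacements in the same order as the sed commands; the `rewrite_file_links` and `rewrite_tree` names are hypothetical and this is not part of drupal2ikiwiki.py.

```python
from pathlib import Path

# Same order as the sed commands: site-relative path first, then absolute URL.
OLD_PREFIXES = (
    "/sites/anarcat.koumbit.org/files/",
    "http://anarcat.koumbit.org/blog/files/",
)

def rewrite_file_links(text, new_prefix="/blog/files/"):
    """Point links to the old Drupal files directory at the new location."""
    for old in OLD_PREFIXES:
        text = text.replace(old, new_prefix)
    return text

def rewrite_tree(root):
    """Rewrite every *.mdwn file directly under root, in place."""
    for path in sorted(Path(root).glob("*.mdwn")):
        old = path.read_text(encoding="utf-8")
        new = rewrite_file_links(old)
        if new != old:
            path.write_text(new, encoding="utf-8")
```

Running `rewrite_tree("blog")` from the wiki root would then stand in for the two `sed -i` lines.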

A few pages to test images:

http://anarcat.koumbit.org/node/157
http://anarcat.koumbit.org/node/203

There are some pretty big files in there, 10-30MB MP3s, but those are already in this wiki, so do not import them!

Running fdupes on the result helps find oddities.

The meta guid directive is used to keep the aggregators from finding duplicate feed entries. I tested it with Liferea, but it may freak out some other sites.

Remaining issues

postsparkline and calendar archive disrespect meta(date) - filed upstream bug
~~merge the files in /communication with the ones in /blog/files before import~~ - done!
~~import non-published nodes~~ ignored for now
check nodes with a format different than markdown (only a few 3=Full HTML found so far)
replace links to this wiki in blog posts with internal links

More progress information in the script itself.

indieweb / fedweb

Hi Antoine,

Maintaining a personal Drupal site is not easy...

Would you be interested in a small get-together Wednesday evening to discuss indieweb and fedweb? https://indiewebcamp.com/events/2015-02-11-homebrew-website-club (and my first notes at http://robin.millette.info/indie-fed-web - also editable offline, yay!)

In fact, I am also thinking about the Smallest Federated Wiki and the migration of c2.com to a Single Page App, and how they compare to ikiwiki. In short, I think there is a way to federate all sorts of things and I would like to hear your impressions on all of this.

See you soon!

Comment by robin — 2015-02-07 22:00

hi robin! yes, that

hi robin!

yes, that interests me, but unfortunately i can't be there. i wish you a good meeting...

Comment by anarcat — 2015-02-09 11:11

Jekyll?

I think I'm going to move to Jekyll. Lots of people have written conversion scripts, including from drupal, which makes me think it shouldn't be too bad. My first ever migration away from drupal!

Comment by mvc — 2015-02-14 23:44

about jekyll

i think jekyll is great! however, it is more geek-oriented than ikiwiki is from my point of view (believe it or not) as ikiwiki has a web interface and supports comments (such as this one) that lambda users can input without knowledge of git. it is also somewhat integrated with github, a proprietary software company whose silos i try to avoid..

but yeah, had i known there was a migration script for jekyll, i might have worked on improving that to support ikiwiki (so comments) and more... oh well, it was fun writing python.

Comment by anarcat [id.koumbit.net] — 2015-03-04 20:31

Created 2015-02-09 11:53. Edited 2015-09-10 16:48.