I am an amateur archivist. I keep an archive of audio (music and audiobooks), books (physical and electronic), video (films and TV episodes), and websites as a hobby but also as a librarian: some things should be carefully preserved for future generations (and enjoyed by ours as well, of course).

Web archives

I specifically worked on archiving websites and wrote an LWN article (local copy) on the topic. Detailed documentation is in web.

Data rescue

I have some experience in data recovery, mostly built as I dealt with various broken hardware: fake flash cards, old CD-ROMs, dead hard drives... My notes on this are in rescue.

Archive management

Mirroring and restoring data is only part of the problem. Once (re)created, the data needs to be properly indexed, otherwise it's an undecipherable pile of garbage where nothing can be found. That means creating metadata for the content and indexing it as well. This can include, for each piece of content:

who created it
when
what type of media it is (a book, official documents, newspaper clippings, music, interview, video, etc)
what is in the media (a show? where? a picture of what? etc)
etc

Determining that data is only one part: you also need a way to store the information in a meaningful way. Unfortunately, I don't have good advice for this, other than to make sure you name the created folders and files correctly. Various file formats have support for metadata (MP3 tags, Exif tags for photos, etc): use them. Otherwise, filenames or auxiliary text files can be used.
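As an illustration of the "auxiliary text file" approach, here is a minimal Python sketch that writes the fields listed above into a JSON file stored next to the content. Everything in it is an assumption made for the example, not a convention I actually follow: the `Metadata` field names, the `.meta.json` suffix and the helper functions are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Metadata:
    """Hypothetical record covering the fields listed above."""
    creator: str      # who created it
    created: str      # when, as an ISO 8601 date string
    media_type: str   # book, music, interview, video, ...
    description: str  # what is in the media


def sidecar_path(content: Path) -> Path:
    """Path of the auxiliary file, e.g. tape.flac -> tape.flac.meta.json."""
    # The ".meta.json" suffix is an arbitrary choice for this sketch.
    return content.with_name(content.name + ".meta.json")


def write_sidecar(content: Path, meta: Metadata) -> Path:
    """Write the metadata as JSON next to the content file."""
    target = sidecar_path(content)
    text = json.dumps(asdict(meta), indent=2, sort_keys=True) + "\n"
    target.write_text(text, encoding="utf-8")
    return target


def read_sidecar(content: Path) -> Metadata:
    """Load the metadata back from the auxiliary file."""
    data = json.loads(sidecar_path(content).read_text(encoding="utf-8"))
    return Metadata(**data)
```

Keeping the sidecar in the same folder, under the same base name, means it travels with the content whenever the folder is copied or mirrored. A plain-text format also stays readable without any special tool.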

I mostly use git-annex to manage my archives and make sure I have redundant copies. git-annex also supports "scrubbing" copies by verifying checksums on the content.
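To show what "scrubbing" means in general, here is a small Python sketch that recomputes checksums and compares them against a stored manifest. This is not how git-annex is implemented (there, the check is done by `git annex fsck`). The manifest layout (relative path mapped to SHA-256 hex digest) and the function names are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scrub(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match.

    `manifest` is a hypothetical mapping of relative path -> expected
    SHA-256 hex digest, recorded when the copy was known to be good.
    """
    damaged = []
    for relative, expected in sorted(manifest.items()):
        candidate = root / relative
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(relative)
    return damaged
```

Running such a check periodically on each copy catches silent corruption (bit rot, failing disks) while another good copy still exists to restore from.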

I also use the following software to import, index and browse contents:

Bookmarks: Wallabag
Books: Zotero
E-books: Calibre
Music: MPD/GMPC, Airsonic, Kodi
Photos: RPD, Darktable, Sigal, Nextcloud (previously: shotwell, f-spot...)
Software: GitLab.com, GitHub.com, Debian.org
Video: RPD, Kodi
Web archives: crawl, pywb, webrecorder.io, archive.org (I should evaluate wpull next)

All of those are stored in multiple locations with git-annex, except software, which is managed through git only, and web archives, which are not replicated and are usually stored directly on archive.org.

I do not have good mechanisms for the following:

audiobooks
contacts: all over the place (old mutt alias files, VCF exports from phone, phone numbers in agenda) - considering monica
games (ROMs)
podcasts (not archived, browsed with AntennaPod on Android)
scans - considering paperless

I need to evaluate the following tools for archive management:

Those come from the awesome self-hosted list.

Created 2018-08-29 16:58. Edited 2018-10-05 17:40.