December 2018 report: archiving Brazil, calendar and LTS
Last two months free software work
Keen readers probably noticed that I didn't produce a report in November. I am not sure why, but I couldn't find the time to do so. When looking back at those past two months, I didn't find that many individual projects I worked on, but there were massive ones, of the scale of archiving the entire government of Brazil or learning the intricacies of print media, both of which were slightly or largely beyond my existing skill set.
Calendar project
I've been meaning to write about this project more publicly for a while, but never found the way to do so productively. But now that the project is almost over -- I'm getting the final prints today and mailing others hopefully soon -- I think this deserves at least a few words.
As some of you might know, I bought a new camera last January. Wanting to get familiar with how it works and refresh my photography skills, I decided to embark on the project of creating a photo calendar for 2019. The basic idea was simple: take pictures regularly, then each month pick the best picture of that month, collect all those twelve pictures and send that to the corner store to print a few calendars.
Simple, right?
Well, twelve pictures turned into a whopping 8000 pictures since January, not all of which were that good of course. And of course, a calendar has twelve months -- so twelve pictures -- but also a cover and a back, which means thirteen pictures and some explaining. Being critical of my own work, it turned out that finding those pictures was sometimes difficult, especially considering the medium imposed some rules I didn't think about.
For example, the US Letter paper size imposes a different ratio (1.29) than the photographic ratio (~1.5) which means I had to reframe every photograph. Sometimes this meant discarding entire ideas. Other photos were discarded because too depressing even if I found them artistically or journalistically important: you don't want to be staring at a poor kid distressed at going into school every morning for an entire month. Another advice I got was to forget about sunsets and dark pictures, as they are difficult to render correctly in print. We're used to bright screens displaying those pictures, paper is a completely different feeling. Having a good vibe for night and star photography, this was a fairly dramatic setback, even though I still did feature two excellent pictures.
Then I got a little carried away. At the suggestion of a friend, I figured I could get rid of the traditional holiday dates and replace them with truly secular holidays, which got me involved in a deep search for layout tools, which in turn naturally brought me to this LaTeX template. Those who have worked with LaTeX (or probably any professional layout tool) know what's next: I spent a significant amount of time perfecting the rendering and crafting the final document.
Slightly upset by the prices proposed by the corner store (15$CAD/calendar!), I figured I could do better by printing on my own, especially encouraged by a friend who had access to a good color laser printer. I then spent multiple days (if not weeks) looking for the right paper, which got me in the rabbit hole of paper weights, brightness, texture, and more. I'll just say this: if you ever thought lengths were ridiculous in the imperial system, wait until you find out how you find out about how paper weights work. I finally managed to find some 270gsm gloss paper at the corner store -- after looking all over town, it was right there -- and did a first print of 15 calendars, which turned into 14 because of trouble with jammed paper. Because the printer couldn't do recto-verso copies, I had to spend basically 4 hours tending to that stupid device, bringing my loathing of printers (the machines) and my respect for printers (the people) to an entirely new level.
The time spent on the print was clearly not worth it in the end, and I ended up scheduling another print with a professional printer. The first proof are clearly superior to the ones I have done myself and, in retrospect, completely worth the 15$ per copy.
I still haven't paid for my time in any significant way on that project, something I seem to excel at doing consistently. The prints themselves are not paid for, but my time in producing those photographs is not paid either, which clearly outlines my future as a professional photographer, if any, lie far away from producing those silly calendars, at least for now.
More documentation on the project is available, in french, in calendrier-2019. I am also hoping to eventually publish a graphical review of the calendar, but for now I'll leave that for the friends and family who will receive the calendar as a gift...
Archival of Brasil
Another modest project I embarked on was a mission to archive the government of Brazil following the election the infamous Jair Bolsonaro, dictatorship supporter, homophobe, racist, nationalist and christian freak that somehow managed to get elected president of Brazil. Since he threatened to rip apart basically the entire fabric of Brazilian society, comrades were worried that he might attack and destroy precious archives and data from government archives when he comes in power, in January 2019. Like many countries in Latin America that lived under dictatorships in the 20th century, Brazil made an effort to investigate and keep memory of the atrocities that were committed during those troubled times.
Since I had written about archiving websites, those comrades naturally thought I could be of use, so we embarked on a crazy quest to archive Brazil, basically. We tried to create a movement similar to the Internet Archive (IA) response to the 2016 Trump election but were not really successful at getting IA involved. I was, fortunately, able to get the good folks at Archive Team (AT) involved and we have successfully archived a significant number of websites, adding terabytes of data to the IA through the backdoor that is AT. We also ran a bunch of archival on a special server, leveraging tools like youtube-dl, git-annex, wpull and, eventually, grab-site to archive websites, social network sites and video feeds.
I kind of burned out on the job. Following Brazilian politics was scary and traumatizing - I have been very close to Brazil folks and they are colorful, friendly people. The idea that such a horrible person could come into power there is absolutely terrifying and I kept on thinking how disgusted I would be if I would have to archive stuff from the government of Canada, which I do not particularly like either... This goes against a lot of my personal ethics, but then it beats the obscurity of pure destruction of important scientific, cultural and historical data.
Miscellaneous
Considering the workload involved in the above craziness, the fact that I worked on less project than my usual madness shouldn't come as a surprise.
As part of the calendar work, I wrote a new tool called
moonphases
which shows a list of moon phase events in the given time period, and shipped that as part of undertime 1.5 for lack of a better place.AlternC revival: friends at Koumbit asked me for source code of AlternC projects I was working on. I was disappointed (but not surprised) that upstream simply took those repositories down without publishing an archive. Thankfully, I still had SVN checkouts but unfortunately, those do not have the full history, so I reconstructed repositories based on the last checkout that I had for alternc-mergelog, alternc-stats, and alternc-slavedns.
I packaged two new projects into Debian, bitlbee-mastodon (to connect to the new Mastodon network over IRC) and python-internetarchive (a command line interface to the IA upload forms)
my work on archival tools led to a moderately important patch in pywb: allow symlinking and hardlinking files instead of just copying was important to manage multiple large WARC files along with git-annex.
I also noticed the IA people were using a tool called slurm to diagnose bandwidth problems on their networks and implemented iface speed detection on Linux while I was there. slurm is interesting, but I also found out about bmon through the hilarious hollywood project. Each has their advantages: bmon has packets per second graphs, while slurm only has bandwidth graphs, but also notices maximal burst speeds which is very useful.
Debian Long Term Support (LTS)
This is my monthly Debian LTS report. Note that my previous report wasn't published on this blog but on the mailing list.
Enigmail / GnuPG 2.1 backport
I've spent a significant amount of time working on the Enigmail backport for a third consecutive month. I first published a straightforward backport of GnuPG 2.1 depending on the libraries available in jessie-backports last month, but then I actually rebuilt the dependencies as well and sent a "HEADS UP" to the mailing list, which finally got peoples' attention.
There are many changes bundled in that possible update: GnuPG actually depends on about half a dozen other libraries, mostly specific to GnuPG, but in some cases used by third party software as well. The most problematic one is libgcrypt20 which Emilio Pozuelo Monfort said included tens of thousands of lines of change. So even though I tested the change against cryptsetup, gpgme, libotr, mutt and Enigmail itself, there are concerns that other dependencies that merit more testing as well.
This caused many to raise the idea of aborting the work and simply marking Enigmail as unsupported in jessie. But Daniel Kahn Gillmor suggested this should also imply removing Thunderbird itself from jessie, as simply removing Enigmail will force people to use the binaries from Mozilla's add-ons service. Gillmor explained those builds include a OpenPGP.js implementation of dubious origin, which is especially problematic considering it deals with sensitive private key material.
It's unclear which way this will go next. I'm taking a break of this
issue and hope others will be able to test the packges. If we keep on
working on Enigmail, the next step will be to re-enable the dbg
packages that were removed in the stretch updates, use dh-autoreconf
correctly, remove some mingw
pacakges I forgot and test gcrypt like
crazy (especially the 1.7 update). We'd also update to the
latest Enigmail, as it fixes issues that forced the Tails project to
disable autocrypt because of weird interactions that make it
send cleartext (instead of encrypted) mail in some cases.
Automatic unclaimer
My previous report yielded an interesting discussion around my work on the security tracker, specifically the "automatic unclaimer" designed to unassign issues that are idle for too long. Holger Levsen, with his new coordinator hat, tested the program and found many bugs and missing features, which I was happy to implement. After many patches and back and forth, it seems the program is working well, although it's ran by hand by the coordinator.
DLA website publication
I took a look at various issues surrounding the publication of LTS advisories on the main debian.org website. While normal security advisories are regularly published on debian.org/security about 500+ DLAs are missing from the website, mainly because DLAs are not automatically imported.
As it turns out, there is a script called parse-dla.pl
that is
designed to handle those entries but for some reason, they are not
imported anymore. So I got to work to import the backlog and make sure
new entries are properly imported.
Various fixes for parse-dla.pl were necessary to properly parse
messages both from the templates generated by gen-DLA
and the
existing archives correctly. then I tested the result with two
existing advisories, which resulted in two MR on the webml repo: add
data for DLA-1561 and add dla-1580 advisory. I requested and
was granted access to the repo, and eventually merged my own MRs after
a review from Levsen.
I eventually used the following procedure to test importing the entire archive:
rsync -vPa master.debian.org:/home/debian/lists/debian-lts-announce .
cd debian-lts-announce
xz -d \*.xz
cat \* > ../giant.mbox
mbox2maildir ../giant.mbox debian-lts-announce.d
for mail in debian-lts-announce.d/cur/\*; do
~/src/security-tracker/./parse-dla.pl $mail;
done
This lead to 82 errors on an empty directory, which is not bad at all considering the amount of data processed. Of course, there many more errors in the live directory as many advisories were already present. In the live directory, this resulted in 2431 new advisories added to the website.
There were a few corner cases:
The first month or so didn't use DLA identifiers and many of those were not correctly imported even back then.
DLA-574-1 was a duplicate, covered by the DLA-574-2 regression update. But I only found the Imagemagick advisory - it looks like the qemu one was never published.
Similarly, the graphite2 regression was never assigned a real identifier.
Other cases include for example DLA-787-1 which was sent twice and the DLA-1263-1 duplicate, which was irrecuperable as it was never added to
data/DLA/list
Those special cases will all need to be handled by an eventual automation of this process, which I still haven't quite figured out. Maybe a process similar to the unclaimer will be followed: the coordinator or me could add missing DLAs until we streamline the process, as it seems unlikely we will want to add more friction to the DLA release by forcing workers to send merge requests to the web team, as that will only put more pressure on the web team...
There are also nine advisories missing from the mailing list archive because of a problem with the mailing list server at that time. We'll need to extract those from people's email archives, which I am not sure how to coordinate at this point.
PHP CVE identifier confusion
I have investigated CVE-2018-19518, mistakenly identified as CVE-2018-19158 in various places, including upstream's bugtracker. I requested the latter erroneous CVE-2018-19158 to be retired to avoid any future confusion. Unfortunately, Mitre indicated the CVE was already in "active use for pre-disclosure vulnerability coordination", which made it impossible to correct the error at that level.
I've instead asked upstream to correct the metadata in their tracker but it seems nothing has changed there yet.