October 2017 report: LTS, feed2exec beta, pandoc filters, git mediawiki
Debian Long Term Support (LTS)
This is my monthly Debian LTS report. This time I worked on the famous KRACK attack, git-annex, golang and the continuous stream of GraphicsMagick security issues.
WPA & KRACK update
I spent most of my time this month on the Linux WPA code, backporting it to the old (~2012) wpa_supplicant release. I first published a patchset based on the patches shipped after the embargo for the oldstable/jessie release. After feedback from the list, I also built packages for i386 and ARM.
I have also reviewed the WPA protocol to make sure I understood the implications of the changes required to backport the patches. For example, I removed the patches touching the WNM sleep mode code, as that was introduced only in the 2.0 release. Chunks of code regarding state tracking were also not backported, as they are part of the state tracking code introduced later, in 3ff3323. Finally, I still have concerns about the nonce setup in patch #5. In the last chunk, you'll notice peer->tk is reset, to force negotiation of a new TK. The other approach I considered was to backport 1380fcbd9f ("TDLS: Do not modify RNonce for an TPK M1 frame with same INonce"), but I figured I would play it safe and not introduce further variations.
I should note that I share Matthew Green's observations regarding the opacity of the protocol. Normally, network protocols are freely available and security researchers like me can easily review them. In this case, I would have needed to read the opaque 802.11i-2004 PDF, which is behind a TOS wall at the IEEE. I ended up reading the IEEE 802.11i-2004 Wikipedia article, which gives a simpler view of the protocol. But it's a real problem to see such critical protocols developed behind closed doors like this.
At Guido's suggestion, I sent the final patch upstream explaining the concerns I had with the patch. I have not, at the time of writing, received any response from upstream about this, unfortunately. I uploaded the fixed packages as DLA 1150-1 on October 31st.
Git-annex
The next big chunk on my list was completing the work on git-annex (CVE-2017-12976) that I started in August. It turns out doing the backport was simpler than I expected, even with my rusty Haskell skills. Type-checking really helps in doing the right thing, especially considering how Joey Hess implemented the fix: by introducing a new type.
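To give an idea of what that technique looks like, here is a minimal sketch of my own, with names I made up for illustration (this is not the actual git-annex code): a newtype with a smart constructor means unvalidated hostnames simply cannot flow further into the program without the type checker complaining.

-- hypothetical sketch, not the actual git-annex fix: the names
-- SshHost and mkSshHost are mine
newtype SshHost = SshHost String

-- the only way to obtain an SshHost: hostnames that could be
-- parsed as ssh options (leading dash) are rejected outright
mkSshHost :: String -> Either String SshHost
mkSshHost h
  | take 1 h == "-" = Left ("refusing suspicious hostname: " ++ h)
  | otherwise       = Right (SshHost h)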
So I backported the patch from upstream and notified the security team that the jessie and stretch updates would be similarly easy. I shipped the backport to LTS as DLA-1144-1. I also shared the updated packages for jessie (which required a similar backport) and stretch (which didn't), and Sebastien Delafond published those as DSA 4010-1.
GraphicsMagick
Up next was yet another security vulnerability in the GraphicsMagick stack. This involved the usual deep dive into intricate and sometimes just unreasonable C code to try and fit a round tree in a square sinkhole. I'm always unsure about those patches, but the test suite passes, smoke tests show the vulnerability as fixed, and that's pretty much as good as it gets.
The announcement (DLA 1154-1) turned out to be a little special because I had previously noticed that the penultimate announcement (DLA 1130-1) was never sent out. So I made a merged announcement to cover both instead of re-sending the original 3 weeks late, which may have been confusing for our users.
Triage & misc
We always do a bit of triage even when not on frontdesk duty, so I:
triaged Puppet's CVE-2016-5714 out of wheezy and other suites, after a thorough analysis of what has become the intricate numbering scheme for Puppet suites
triaged ImageMagick as not affecting wheezy and jessie, but it turned out the latter call was a little too enthusiastic, as the team wanted to wait for upstream confirmation before skipping jessie
did some research on tiff's CVE-2017-11613 (skipped by RHEL) and CVE-2017-9935 (no fix upstream)
I also did smaller bits of work on:
a patch to add a dch --lts flag (Debian bug #762715), which is currently pending review
golang's CVE-2017-15041, which I originally triaged out but then changed my mind about, as the patch was small and the impact was large. This turned into DLA-1148-1.
The latter reminded me of the concerns I have about the long-term maintainability of the golang ecosystem: because everything is statically linked, an update to a core library (say the SMTP library, as in CVE-2017-15042, thankfully not affecting LTS) requires a full rebuild of all packages including the library, in all distributions. So what would be a simple update in a shared library system could mean an explosion of work on statically linked infrastructures. This is a lot of work, which can definitely be error-prone: as I've seen in other updates, some packages (for example the Ruby interpreter) just bit-rot on their own and eventually fail to build from source. We would also have to investigate all packages to see which ones include the library, something we are not well equipped for at this point.
Wheezy was the first release shipping golang packages, but at least it shipped only one... Stretch has shipped with two golang versions (1.7 and 1.8), which will make maintenance even harder in the long term.
We build our computers the way we build our cities--over time, without a plan, on top of ruins. - Ellen Ullman
Other free software work
This month again, I was busy doing some serious yak shaving operations all over the internet, on top of publishing two of my largest LWN articles to date: "Strategies for offline PGP key storage" and "A comparison of cryptographic keycards".
feed2exec beta
Since I announced this new project last month, I have released it as a beta and it entered Debian. I have also written useful plugins like the wayback plugin, which saves pages to the Wayback Machine for eternal archival. The archive plugin can similarly save pages to the local filesystem. I also added bash completion, expanded the unit tests and documentation, fixed default file paths and a bunch of bugs, and refactored the code. Finally, I started using two external Python libraries instead of rolling my own code: pyxdg and requests-file, the latter of which I packaged in Debian (and fixed a bug in its test suite).
The program is working pretty well for me. The only thing I feel is really missing now is a retry/fail mechanism. Right now, it's a little brittle: any network hiccup will yield an error email, which is readable to me but could be confusing to a new user. Strangely enough, I am particularly having trouble with (local!) DNS resolution, which I need to look into, but that is probably unrelated to the software itself. Thankfully, the user can silence those warnings with --loglevel=ERROR.
Furthermore, some plugins still have some rough edges. For example, the Transmission integration would probably work better as a distinct plugin instead of a simple exec call, because when it adds new torrents, the output is totally cryptic. That plugin could also leverage more feed parameters to save different files in different locations depending on the feed titles, something that would be hard to do safely with the exec plugin now.
I am keeping a steady flow of releases. I wish there was a way to see how effective I am at reaching out with this project, but unfortunately GitLab doesn't provide usage statistics... And I have received only a few comments on IRC about the project, so maybe I need to reach out more, like it says in the fine manual. It always feels strange to have to promote your project like it's some new bubbly soap...
The next steps for the project are a final review of the API and a production-ready 1.0.0 release. I am also thinking of making a small screencast to show the basic capabilities of the software, maybe with asciinema's upcoming audio support?
Pandoc filters
As I mentioned earlier, I dove again into Haskell programming when working on the git-annex security update. But I also have a small Haskell program of my own: a Pandoc filter that I use to convert the HTML articles I publish on LWN.net into an Ikiwiki-compatible Markdown version. It turns out the script was still missing a bunch of stuff: image sizes, proper table formatting, etc. I also worked hard on automating more bits of the publishing workflow by extracting the time from the article, which allowed me to simply extract the full article into an almost final copy just by specifying the article ID. The only thing left is to add tags, and the article is complete.
In the process, I learned about new weird Haskell constructs. Take this code, for example:
-- remove needless blockquote wrapper around some tables
--
-- haskell newbie tips:
--
-- @ is the "at-pattern": it allows us to define both a name for the
-- construct and inspect its contents at once
--
-- {} is the "empty record pattern": it basically means "match the
-- constructor but ignore its arguments"
cleanBlock (BlockQuote t@[Table {}]) = t
Here the idea is to remove <blockquote> elements needlessly wrapping a <table>. I can't specify the Table type on its own, because then I couldn't address the table as a whole, only its parts. I could reconstruct the whole table bit by bit, but it wasn't as clean.
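For context, here is how that clause fits into a complete filter function; the type signature and the fallthrough case are my reconstruction, mirroring the cleanDates function shown below:

import Text.Pandoc.Definition (Block(..))

cleanBlock :: Block -> [Block]
-- unwrap a blockquote containing exactly one table: the at-pattern
-- binds the whole one-element list to t, while Table {} matches the
-- constructor without naming any of its arguments
cleanBlock (BlockQuote t@[Table {}]) = t
-- other elements just pass through
cleanBlock x = [x]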
The other pattern was how to, at last, address multiple string elements, which was difficult because Pandoc treats spaces specially:
cleanBlock (Plain (Strong (Str "Notifications":Space:Str "for":Space:Str "all":Space:Str "responses":_):_)) = []
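To see why the pattern has to be that verbose: Pandoc parses every word into its own Str element, with an explicit Space constructor between each, so matching a phrase means matching the whole list of inlines:

import Text.Pandoc.Definition (Inline(..))

-- how Pandoc represents "Notifications for all responses": each
-- word is a separate Str, each space a distinct Space element
phrase :: [Inline]
phrase = [ Str "Notifications", Space, Str "for", Space
         , Str "all", Space, Str "responses" ]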
The last bit that drove me crazy was the date parsing:
-- the "GAByline" div has a date, use it to generate the ikiwiki dates
--
-- this is distinct from cleanBlock because we do not want to have to
-- deal with time there: it is only here we need it, and we need to
-- pass it in here because we do not want to mess with IO (time is I/O
-- in haskell) all across the function hierarchy
cleanDates :: ZonedTime -> Block -> [Block]
-- this mouthful is just the way the data comes in from
-- LWN/Pandoc. there could be a cleaner way to represent this,
-- possibly with a record, but this is complicated and obscure enough.
cleanDates time (Div (_, [cls], _)
[Para [Str month, Space, Str day, Space, Str year], Para _])
| cls == "GAByline" = ikiwikiRawInline (ikiwikiMetaField "date"
(iso8601Format (parseTimeOrError True defaultTimeLocale "%Y-%B-%e,"
(year ++ "-" ++ month ++ "-" ++ day) :: ZonedTime)))
++ ikiwikiRawInline (ikiwikiMetaField "updated"
(iso8601Format time))
++ [Para []]
-- other elements just pass through
cleanDates time x = [x]
Now that seems just dirty, but it was even worse before. One thing I find difficult in adapting to coding in Haskell is that you need to get into the habit of writing smaller functions. The language is really not well adapted to long discourse: it's more about getting small things connected together. Other languages (e.g. Python) discourage this because there's some overhead in calling functions (10 nanoseconds in my tests, but still), whereas functions are a fundamental and heavily optimized construct in Haskell. So I constantly need to remind myself to split things up early, otherwise I can't do anything in Haskell.
Other languages are more lenient, which does mean my code can be dirtier, but I feel I get things done faster that way. The oddity of Haskell makes it frustrating to work with. It's like doing construction work where you're not allowed to get the floor dirty. When I build stuff, I don't mind things being dirty: I can clean up afterwards. This is especially critical when you don't actually know how to make things clean in the first place, as Haskell will simply not let you do that at all.
And obviously, I fought with monads, or, more specifically, "I/O" or IO in this case. It turns out that getting the current time is IO in Haskell: indeed, it's not a "pure" function that will always return the same thing. But this means that I would have had to change the signature of all the functions that touched time to include IO. I eventually moved the time initialization up into main so that I had only one IO function, and passed that timestamp downwards as a simple argument. That way I could keep the rest of the code clean, which seems to be an acceptable pattern.
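Concretely, the pattern looks something like this, assuming the filter is wired up through the standard toJSONFilter interface from pandoc-types (a sketch, not my exact main):

import Data.Time (getZonedTime)
import Text.Pandoc.JSON (toJSONFilter)

main :: IO ()
main = do
  -- the only IO action: fetch the timestamp once, up front
  time <- getZonedTime
  -- partially apply cleanDates so everything below main stays pure
  toJSONFilter (cleanDates time)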
I would of course be happy to get feedback from my Haskell readers (if any) to see how to improve that code. I am always eager to learn.
Git remote MediaWiki
Few people know that there is a MediaWiki remote for Git, which allows you to mirror a MediaWiki site as a Git repository. As a disaster recovery mechanism, I have been keeping such a historical backup of the Amateur radio wiki for a while now. This originally started as a homegrown Python script that also converted the contents to Markdown. My theory then was to see if we could switch from MediaWiki to Ikiwiki, but it took so long to implement that I never completed the work.
When someone had the weird idea of renaming a page to some impossibly long name on the wiki, my script broke. I tried to fix it and then remembered I also had a mirror running using the Git remote. It turns out it broke on the same issue, and that got me looking into the remote again. I got lost in a zillion issues, including fixing that specific one, but I especially looked at the possibility of fetching all namespaces, because I realized that the remote fetches only part of the wiki by default. That drove me to submit namespace support as a patch to the git mailing list. The discussion then came back to how to actually maintain that contrib: in git core or outside? In the end, it looks like I'll be doing some maintenance on that project outside of git core, as I was granted access to the GitHub organisation...
Galore Yak Shaving
Then there's the usual hodgepodge of fixes and random things I did over the month.
Beta testing the new Signal desktop app, which is, unfortunately, an Electron app. This means the binary is huge, at 95MB, and the app takes a whopping 300MB of RAM right after startup. Compare this with my IRC client (irssi), which takes 15MB of RAM, or Pidgin, which takes 80MB. Then ponder the costs of developer convenience versus the impact on the environment and users...
Routine linkchecker maintenance: 3 PRs merged including a bugfix of my own, one of which inspired me to add the pyxdg dependency in feed2exec.
When adding "badges" to the feed2exec documentation, I also fixed an issue in badges.debian.net, where Debian was referring to wheezy instead of the symbolic name
Tested the WireGuard VPN with moderate success. It can only cross NAT one way, so it was useless for my use case. I ended up using end-to-end IPv6! Still, I updated the Turris OS WireGuard version so that I could use it as a way to get end-to-end IPv4 connectivity to machines behind my NAT.
Tested a static image gallery generator replacement called Sigal. I suggested a way to add progress information for video processing and did a general issue review. I also looked at how to package it in Debian, in RFP #879239. I looked into Sigal after finally getting confirmation from the Photofloat maintainer (Jason A. Donenfeld, who's also the WireGuard author) that our patches will never get merged, after 3 years of radio silence. In comparison, the responses I got from the Sigal community were much more positive. I have therefore requested the removal of Photofloat from Debian.
More ham radio stuff: I got a SignaLink working, which allowed me to receive and transmit APRS signals using a handheld transmitter.
I worked more on SafeEyes to try and get it to credit idle time, which is still not working, and suggested improvements to the preferences panel.
I found out about the Chrome Lighthouse web page auditor, which doesn't seem to work at all in Chromium but has a lot of potential. There's a standalone version as well, which could be packaged for Debian, but I found that #775925 is for a different Lighthouse, a dead Bitcoin-based crowdfunding platform.
I helped Koumbit.org configure load balancing for their HTTPS services. For this, I needed to change my IP address, so I tested the Bitmask client again and found problems with polkit and the password failure prompt. I also suggested improvements to the keyring package.
After realizing there was no future in the It's All Text! (IAT) extension, I have switched to GhostText, which uses a model similar to the Edit with Emacs extension for Chrome, but works in both Chromium and Firefox and supports other editors. I have filed an issue about silent failures with uMatrix and generally tried to help the IAT community find an exit strategy. (Short story: it's impossible to port, and XUL will die a horrible death.)
There is no [web extension] only XUL! - Inside joke
Comments

I was interested to read of your interactions with git-mediawiki. I was looking at using it for backing up mediawikis too, and I hit all the problems you did. I ended up abandoning it and instead using the ArchiveTeam sub-team WikiTeam's backup/dump scripts, which worked for me: https://github.com/WikiTeam/wikiteam They were also a LOT faster. I've dumped doomwiki, nin.wiki and chocolate-doom.org so far.

Hm. I used git-remote-mediawiki to dump all the wikis into git branches when we removed MediaWiki from our FusionForge instances due to security reasons.

On the other hand: NOOOOOOOOO! “It's all text” is so basic, and now they break it. Firefox really is a barebones-but-bloated thing (I remember Opera having a lot more actually useful functionality built in). Even lynx has that feature built in (press ^Xe in a textarea).

The suggested replacement doesn't seem to be able to run xterm-based editors, so it's not a replacement at all.

I guess many people will stick with Firefox 45 ESR now (because 52 ESR also broke sound, except in Debian where it was still compiled with ALSA support).