Join me in the rabbit hole of git repository verification, and how we could improve it.

Problem statement
OpenPGP verification
Worrying about git and GnuPG
Possible ways forward
Other Projects
1. OpenBSD
2. Golang
3. Python
4. Docker
5. Android and iOS
Conclusion

Problem statement

As part of my work on automating install procedures at Tor, I ended up doing things like:

git clone REPO
./REPO/bootstrap.sh

... something eerily similar to the infamous curl pipe bash method which I often decry. As a short-term workaround, I relied on the SHA-1 checksum of the repository to make sure I have the right code, by running this both on a "trusted" (ie. "local") repository and the remote, then visually comparing the output:

$ git show-ref master
9f9a9d70dd1f1e84dec69a12ebc536c1f05aed1c refs/heads/master

One problem with this approach is that SHA-1 is now considered as flawed as MD5 so it can't be used as an authentication mechanism anymore. It's also fundamentally difficult to compare hashes for humans.

The other flaw with comparing local and remote checksums is that we assume we trust the local repository. But how can I trust that repository? I can either:

audit all the code present and all the changes done to it after
or trust someone else to do so

The first option here is not practical in most cases. In this specific use case, I have audited the source code -- I'm the author, even -- what I need is to transfer that code over to another server.

(Note that I am replacing those procedures with Fabric, which makes this use case moot for now as the trust path narrows to "trust the SSH server" which I already had anyways. But it's still important for my fellow Tor developers who worry about trusting the git server, especially now that we're moving to GitLab.)

But anyways, in most cases, I do need to trust some other fellow developer I collaborate with. To do this, I would need to trust the entire chain between me and them:

the git client
the operating system
the hardware
the network (HTTPS and the CA cartel, specifically)
then the hosting provider (and that hardware/software stack)
and then backwards all the way back to that other person's computer

I want to shorten that chain as much as possible, make it "peer to peer", so to speak. Concretely, it would eliminate the hosting provider and the network, as attackers.

OpenPGP verification

My first reaction is (perhaps perversely) to "use OpenPGP" for this. I figured that if I sign every commit, then I can just check the latest commit and see if the signature is good.

The first problem here is that this is surprisingly hard. Let's pick some arbitrary commit I did recently:

commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
Author: Antoine Beaupré <anarcat@debian.org>
Date:   Mon Mar 16 14:37:28 2020 -0400

    fix test autoloading

    pytest only looks for file names matching `test` by default. We inline
    tests inside the source code directly, so hijack that.

diff --git a/fabric_tpa/pytest.ini b/fabric_tpa/pytest.ini
new file mode 100644
index 0000000..71004ea
--- /dev/null
+++ b/fabric_tpa/pytest.ini
@@ -0,0 +1,3 @@
+[pytest]
+# we inline tests directly in the source code
+python_files = *.py

That's the output of git log -p in my local repository. I signed that commit, yet git log is not telling me anything special. To check the signature, I need something special: --show-signature, which looks like this:

commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature faite le lun 16 mar 2020 14:37:53 EDT
gpg:                avec la clef RSA 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Bonne signature de « Antoine Beaupré <anarcat@orangeseeds.org> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@torproject.org> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@anarc.at> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@koumbit.org> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@debian.org> » [ultime]
Author: Antoine Beaupré <anarcat@debian.org>
Date:   Mon Mar 16 14:37:28 2020 -0400

    fix test autoloading

    pytest only looks for file names matching `test` by default. We inline
    tests inside the source code directly, so hijack that.

Can you tell if this is a valid signature? If you speak a little french, maybe you can! But even if you would, you are unlikely to see that output on your own computer. What you would see instead is:

commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature made Mon Mar 16 14:37:53 2020 EDT
gpg:                using RSA key 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Can't check signature: No public key
Author: Antoine Beaupré <anarcat@debian.org>
Date:   Mon Mar 16 14:37:28 2020 -0400

    fix test autoloading

    pytest only looks for file names matching `test` by default. We inline
    tests inside the source code directly, so hijack that.

Important part: Can't check signature: No public key. No public key. Because of course you would see that. Why would you have my key lying around, unless you're me. Or, to put it another way, why would that server I'm installing from scratch have a copy of my OpenPGP certificate? Because I'm a Debian developer, my key is actually part of the 800 keys in the debian-keyring package, signed by the APT repositories. So I have a trust path.

But that won't work for someone who is not a Debian developer. It will also stop working when my key expires in that repository, as it already has on Debian buster (current stable). So I can't assume I have a trust path there either. One could work with a trusted keyring like we do in the Tor and Debian project, and only work inside that project, that said.

But I still feel uncomfortable with those commands. Both git log and git show will happily succeed (return code 0 in the shell) even though the signature verification failed on the commits. Same with git pull and git merge, which will happily push your branch ahead even if the remote has unsigned or badly signed commits.

To actually verify commits (or tags), you need the git verify-commit (or git verify-tag) command, which seems to do the right thing:

$ LANG=C.UTF-8 git verify-commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature made Mon Mar 16 14:37:53 2020 EDT
gpg:                using RSA key 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Can't check signature: No public key
[1]$

At least it fails with some error code (1, above). But it's not flexible: I can't use it to verify that a "trusted" developer (say one that is in a trusted keyring) signed a given commit. Also, it is not clear what a failure means. Is a signature by an expired certificate okay? What if the key is signed by some random key in my personal keyring? Why should that be trusted?

Worrying about git and GnuPG

In general, I'm worried about git's implementation of OpenPGP signatures. There has been numerous cases of interoperability problems with GnuPG specifically that led to security, like EFAIL or SigSpoof. It would be surprising if such a vulnerability did not exist in git.

Even if git did everything "just right" (which I have myself found impossible to do when writing code that talks with GnuPG), what does it actually verify? The commit's SHA-1 checksum? The tree's checksum? The entire archive as a zip file? I would bet it signs the commit's SHA-1 sum, but I just don't know, on the top of my head, and neither do git-commit or git-verify-commit say exactly what is happening.

I had an interesting conversation with a fellow Debian developer (dkg) about this and we had to admit those limitations:

<anarcat> i'd like to integrate pgp signing into tor's coding practices more, but so far, my approach has been "sign commits" and the verify step was "TBD"

<dkg> that's the main reason i've been reluctant to sign git commits. i haven't heard anyone offer a better subsequent step. if torproject could outline something useful, then i'd be less averse to the practice.

i'm also pretty sad that git remains stuck on sha1, esp. given the recent demonstrations. all the fancy strong signatures you can make in git won't matter if the underlying git repo gets changed out from under the signature due to sha1's weakness

In other words, even if git implements the arcane GnuPG dialect just so, and would allow us to setup the trust chain just right, and would give us meaningful and workable error messages, it still would fail because it's still stuck in SHA-1. There is work underway to fix that, but in February 2020, Jonathan Corbet described that work as being in a "relatively unstable state", which is hardly something I would like to trust to verify code.

Also, when you clone a fresh new repository, you might get an entirely different repository, with a different root and set of commits. The concept of "validity" of a commit, in itself, is hard to establish in this case, because an hostile server could put you backwards in time, on a different branch, or even on an entirely different repository. Git will warn you about a different repository root with warning: no common commits but that's easy to miss. And complete branch switches, rebases and resets from upstream are hardly more noticeable: only a tiny plus sign (+) instead of a star (*) will tell you that a reset happened, along with a warning (forced update) on the same line. Miss those and your git history can be compromised.

Possible ways forward

I don't consider the current implementation of OpenPGP signatures in git to be sufficient. Maybe, eventually, it will mature away from SHA-1 and the interface will be more reasonable, but I don't see that happening in the short term. So what do we do?

git evtag

The git-evtag extension is a replacement for git tag -s. It's not designed to sign commits (it only verifies tags) but at least it uses a stronger algorithm (SHA-512) to checksum the tree, and will include everything in that tree, including blobs. If that sounds expensive to you, don't worry too much: it takes about 5 seconds to tag the Linux kernel, according to the author.

Unfortunately, that checksum is then signed with GnuPG, in a manner similar to git itself, in that it exposes GnuPG output (which can be confusing) and is likely similarly vulnerable to mis-implementation of the GnuPG dialect as git itself. It also does not allow you to specify a keyring to verify against, so you need to trust GnuPG to make sense of the garbage that lives in your personal keyring (and, trust me, it doesn't).

And besides, git-evtag is fundamentally the same as signed git tags: checksum everything and sign with GnuPG. The difference is it uses SHA-512 instead of SHA-1, but that's something git will eventually fix itself anyways.

kernel patch attestations

The kernel also faces this problem. Linus Torvalds signs the releases with GnuPG, but patches fly all over mailing list without any form of verification apart from clear-text email. So Konstantin Ryabitsev has proposed a new protocol to sign git patches which uses SHA256 to checksum the patch metadata, commit message and the patch itself, and then sign that with GnuPG.

It's unclear to me what this solves, if anything, at all. As dkg argues, it would seem better to add OpenPGP support to git-send-email and teach git tools to recognize that (e.g. git-am) at least if you're going to keep using OpenPGP anyways.

And furthermore, it doesn't resolve the problems associated with verifying a full archive either, as it only attests "patches".

jcat

Unhappy with the current state of affairs, the author of fwupd (Richard Hughes) wrote his own protocol as well, called jcat, which provides signed "catalog files" similar to the ones provided in Microsoft windows.

It consists of a "gzip-compressed JSON catalog files, which can be used to store GPG, PKCS-7 and SHA-256 checksums for each file". So yes, it is yet again another wrapper to GnuPG, probably with all the flaws detailed above, on top of being a niche implementation, disconnected from git.

The Update Framework

One more thing dkg correctly identified is:

<dkg> anarcat: even if you could do exactly what you describe, there are still some interesting wrinkles that i think would be problems for you.

the big one: "git repo's latest commits" is a loophole big enough to drive a truck through. if your adversary controls that repo, then they get to decide which commits to include in the repo. (since every git repo is a view into the same git repo, just some have more commits than others)

In other words, unless you have a repository that has frequent commits (either because of activity or by a bot generating fake commits), you have to rely on the central server to decide what "the latest version" is. This is the kind of problems that binary package distribution systems like APT and TUF solve correctly. Unfortunately, those don't apply to source code distribution, at least not in git form: TUF only deals with "repositories" and binary packages, and APT only deals with binary packages and source tarballs.

That said, there's actually no reason why git could not support the TUF specification. Maybe TUF could be the solution to ensure end-to-end cryptographic integrity of the source code itself. OpenPGP-signed tarballs are nice, and signed git tags can be useful, but from my experience, a lot of OpenPGP (or, more accurately, GnuPG) derived tools are brittle and do not offer clear guarantees, and definitely not to the level that TUF tries to address.

This would require changes on the git servers and clients, but I think it would be worth it.

Other Projects

OpenBSD

There are other tools trying to do parts of what GnuPG is doing, for example minisign and OpenBSD's signify. But they do not integrate with git at all right now. Although I did find a hack] to use signify with git, it's kind of gross...

Golang

Unsurprisingly, this is a problem everyone is trying to solve. Golang is planning on hosting a notary which would leverage a "certificate-transparency-style tamper-proof log" which would be ran by Google (see the spec for details). But that doesn't resolve the "evil server" attack, if we treat Google as an adversary (and we should).

Python

Python had OpenPGP going for a while on PyPI, but it's unclear if it ever did anything at all. Now the plan seems to be to use TUF but my hunch is that the complexity of the specification is keeping that from moving ahead.

Docker

Docker and the container ecosystem has, in theory, moved to TUF in the form of Notary, "a project that allows anyone to have trust over arbitrary collections of data". In practice however, in my somewhat limited experience, setting up TUF and image verification in Docker is far from trivial.

Android and iOS

Even in what is possibly one of the strongest models (at least in terms of user friendliness), mobile phones are surprisingly unclear about those kind of questions. I had to ask if Android had end-to-end authentication and I am still not clear on the answer. I have no idea of what iOS does.

Conclusion

One of the core problems with everything here is the common usability aspect of cryptography, and specifically the usability of verification procedures. We have become pretty good at encryption. The harder part (and a requirement for proper encryption) is verification. It seems that problem still remains unsolved, in terms of usability. Even Signal, widely considered to be a success in terms of adoption and usability, doesn't properly solve that problem, as users regularly ignore "The security number has changed" warnings...

So, even though they deserve a lot of credit in other areas, it seems unlikely that hardcore C hackers (e.g. git and kernel developers) will be able to resolve that problem without at least a little bit of help. And TUF seems like the state of the art specification around here, it would seem wise to start adopting it in the git community as well.

Update: git 2.26 introduced a new gpg.minTrustLevel to "tell various signature verification codepaths the required minimum trust level", presumably to control how Git will treat keys in your keyrings, assuming the "trust database" is valid and up to date. For an interesting narrative of how "normal" (without PGP) git verification can fail, see also A Git Horror Story: Repository Integrity With Signed Commits.

RSS

Easiest steps forward

Great read, thank you.

I just set up automatic git signature verification for my company, which is why your article is especially interesting for me (and it might be interesting for you to hear about a use case where it is actually usable, disregarding the issues below). The scenario is the following:

We use automated ci/cd tools to deploy our software. Naturally, that means, that the deployment pipeline needs access to production server credentials. In order to minimize the trust we need to have in our git repository platform, the pipeline runner is providing the secret required to accesss the production server to the pipeline if all commits in the repository are signed properly. We're not using GPG keys, but X508 certificates to simplify certificate management for us (creation and revocation of certificates is possible without redeployment of the pipeline runner).

Correct me if I'm wrong, but with this automated setup, the only remaining issues are hash collision attacks (which is indeed quite problematic), performance (since we're checking all commits that lead to the current git HEAD) for larger repositories and the possibility of an attacker with access to our remote repository/pipeline configuration to deploy an outdated version of the software.

The first issue would obviously be fixed if git used a strong hash function (which we'll hopefully get in the near future). Without it, we definitely have a problem here. The other problems I'd be willing to accept since the effort forbimplementing a way to prevent the deployment of outdated versions probably outweighs the risk for our use case.

If I had to implement something, I'd probably use frequent key rotation (i.e. every developer doesn't get a trusted client certificate but an intermediate CA instead. For signing commits, he would then create client certificates himself with a expiration period of just a few weeks).

Comment by Tobi — 2020-04-19 16:47

Comments on this page are closed.

Created 2020-03-17 21:16. Edited 2020-03-23 10:47.