Concerns with Signal receipt notifications
While experimenting with a custom Signal client, I surprised a friend (let's call him Bob) with a conversation that went a little like this:
A> hey Bob! welcome home!
B> what?
B> wait, how did you know I got home?
B> what the heck man? did you hack my machine? OMGWTFSTHUBERTBBQ?!
I'm paraphrasing as I lost my copy of the original chat, but it was striking that he had absolutely no clue how I figured out he had just come home and sat down in front of his laptop. He was quite worried that I had hacked into his system to spy on his webcam, or pulled off some other "hack". As it turns out, I just made simple inferences based on data Signal provides to other peers when you send messages. Using those notifications, I could establish when my friend opened his laptop and the Signal Desktop app came back online.
How this works
This is possible because the receipt notifications in Signal are per-device. This means that the "double-checkmark" you see when a message is delivered only reflects the first device receiving the message. Behind the scenes, Signal actually sends a notification for each device, with a unique, per-device identifier. Those identifiers are visible with signal-cli. For example, this is a normal notification the Signal app will send when confirming reception of a message, as seen from signal-cli:
Envelope from: “Bob” +15555555555 (device: 1)
Timestamp: 1532279834422 (2018-07-22T17:17:14.422Z)
Got receipt.
That's Bob's phone telling me it received the message. On my side, the Signal app shows a second checkmark to tell me the message was transmitted. (There are also "blue checkmarks" now that tell the user the other person has seen the message, but I haven't looked into those in detail.) Then another notification comes in:
Envelope from: “Bob” +15555555555 (device: 2)
Timestamp: 1532279901951 (2018-07-22T17:18:21.951Z)
Got receipt.
Notice the device number there? It changed from 1 to 2. This tells me this is a different device than the first one. Device 1 will most likely be the phone app and device 2 will most likely be Signal Desktop. (In my case, I tried so many different configurations that I have device numbers up to 8, but my phone is still device 1.)
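To make the per-device distinction concrete, here is a minimal Python sketch that parses output shaped like the two excerpts above and lists which device sent each receipt. The line format is assumed from those excerpts only, and other signal-cli versions may well print something different. `Receipt` and `parse_receipts` are names made up for this illustration.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple


class Receipt(NamedTuple):
    sender: str          # whatever follows "Envelope from:", kept verbatim
    device: int          # the per-device identifier
    timestamp: datetime  # value of the "Timestamp:" line, as UTC


# Patterns modelled on the excerpts in this article (an assumption, not a
# documented signal-cli output format).
ENVELOPE_RE = re.compile(r"^Envelope from: (?P<sender>.+) \(device: (?P<device>\d+)\)$")
TIMESTAMP_RE = re.compile(r"^Timestamp: (?P<millis>\d+)")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_receipts(text: str) -> List[Receipt]:
    """Collect one Receipt per envelope that ends with 'Got receipt.'."""
    receipts = []
    sender = device = millis = None
    for raw in text.splitlines():
        line = raw.strip()
        match = ENVELOPE_RE.match(line)
        if match:
            # A new envelope starts: remember who sent it and from which device.
            sender, device, millis = match["sender"], int(match["device"]), None
            continue
        match = TIMESTAMP_RE.match(line)
        if match:
            millis = int(match["millis"])
            continue
        if line == "Got receipt." and sender is not None and millis is not None:
            when = EPOCH + timedelta(milliseconds=millis)
            receipts.append(Receipt(sender, device, when))
            sender = device = millis = None
    return receipts


SAMPLE = """\
Envelope from: “Bob” +15555555555 (device: 1)
Timestamp: 1532279834422 (2018-07-22T17:17:14.422Z)
Got receipt.
Envelope from: “Bob” +15555555555 (device: 2)
Timestamp: 1532279901951 (2018-07-22T17:18:21.951Z)
Got receipt.
"""

if __name__ == "__main__":
    for receipt in parse_receipts(SAMPLE):
        print(receipt.device, receipt.timestamp.isoformat())
```

Nothing here requires special access: the sketch only reads what the client already prints for any sender.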
An attacker can use those notifications to tell when my phone goes online. It is also possible to make reasonable assertions about the identity of each device: any device number above one is most likely a Signal Desktop client. This can be used to assert physical presence on different machines: the desktop at home, laptop in the office, etc. It might not seem like much, but it sure felt creepy to Bob.
While writing this article, I figured I would reproduce those results, so I wrote to Bob again to ask for help. Here's how the (redacted and reformatted) conversation went:
A-1> hey you there?
* B-1 message received
A-1> i want to see if i can freak you out with signal again
* B-1 message received
A-1> i'm going to write about the issue, and i want to reproduce the results
* B-1 message received
B-1> he's driving
B-1> sure, I'll be your guinea pig he says
A-1> all he needs to do is open his laptop and start signal-desktop :p
* B-1 message received
B-1> we'll be home in 1h30
A-1> i'll know, don't worry :p
* B-1 message received
After an hour or two, Bob gets home and opens his laptop, and you can see the key message that reveals it:
* B-2 message received
A-1> welcome home, sucker! ;)
B-2> dang dog.
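The switch from B-1 to B-2 in that log is all an observer needs. As a rough sketch, assuming the receipts have already been reduced to (device number, time) pairs, the following Python flags the moments when a device answers for the first time or after a long silence. The function names, the 30-minute threshold and the sample times are all invented for illustration. Only the "device 1 is probably the phone" heuristic comes from the text above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, datetime]  # (device number, time the receipt was observed)


def online_transitions(events: Iterable[Event],
                       quiet: timedelta = timedelta(minutes=30)) -> List[Event]:
    """Return the events where a device shows up for the first time, or
    again after at least `quiet` without any receipt from it."""
    last_seen: Dict[int, datetime] = {}
    transitions: List[Event] = []
    for device, when in sorted(events, key=lambda event: event[1]):
        previous = last_seen.get(device)
        if previous is None or when - previous >= quiet:
            transitions.append((device, when))
        last_seen[device] = when
    return transitions


def guess_kind(device: int) -> str:
    """Heuristic from the article: device 1 is most likely the phone,
    anything above is most likely a Signal Desktop client."""
    return "probably the phone" if device == 1 else "probably a desktop client"


if __name__ == "__main__":
    # Hypothetical observation times, loosely shaped like the chat log above.
    start = datetime(2018, 7, 22, 17, 0, tzinfo=timezone.utc)
    observed = [
        (1, start),
        (1, start + timedelta(minutes=1)),
        (1, start + timedelta(minutes=2)),
        (2, start + timedelta(minutes=95)),
    ]
    for device, when in online_transitions(observed):
        print(f"device {device} ({guess_kind(device)}) appeared at {when.isoformat()}")
```

A silence only means something while messages are actually being sent to the victim, which is why the attack relies on sending messages regularly.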
This attack can be carried out by anyone who knows Bob's phone number. Because Signal is an open network, you are free to send messages to anyone without their consent. An attacker only has to send spam messages to a victim to figure out how many devices they own and when each of them is online. There's no way for Bob to protect himself from this attack, other than trying to keep his phone number private.
Why Signal works that way
When I shared an earlier draft of this article with the Signal Security team, they stated this was a necessary trade-off, as each device carries a unique cryptographic key anyway, and that:
Signal encrypts messages individually to each recipient device. Thus as long as there is a "delivery receipt" feature, it will be possible to learn which recipient devices are online, for example by sending an encrypted message to a subset of the recipient devices, and seeing whether a delivery receipt is received or not.
The alternative seems to be either to disable receipt notifications or to share the same private key among different devices, which introduces other problems:
Having all recipient devices share the same encryption keys would render the Diffie-Hellman ratcheting which is part of the Signal protocol ineffective, since all devices (including offline ones) would have to use synchronized DH ratchet key pairs, preventing these values from adding fresh randomness. In addition, it would add massive protocol complexity and fragility to try to keep recipient devices synchronized, while trying to achieve the (probably-infeasible) goal of eliminating all ways to distinguish recipient devices.
I am not certain those tradeoffs are that clear-cut, however. I am not a cryptographer, and I am specifically not very familiar with the "ratcheting" algorithm behind the "Signal protocol" (or is it called Noise now?), but it seems to me there should be a way to provide multi-device, multi-key encryption without revealing per-device identifiers to other clients. In particular, I do not understand what purpose those integers serve: maybe they are automatically generated by signal-cli and are just a side-effect of a fundamental property of the protocol, in which case I would understand why they would be unavoidable.
To be fair, other cryptographic systems share similar problems: an encrypted OpenPGP email usually embeds metadata about source and destination addresses, as email headers are not encrypted. Even a normal OpenPGP encrypted blob includes OpenPGP key data by default, although there are ways to turn that off and make sure an encrypted blob is just an undecipherable blob. The problem with this, of course, is that many critics of OpenPGP present it as an old technology that should be replaced by more modern alternatives like Signal, so it's a bit disappointing to see Signal suffer from metadata exposure problems similar to those of older protocols.
But apart from cryptographic properties, there are certain user expectations regarding Signal, and my experience with this specific issue is that this property certainly breaks some privacy expectations for users. I'm not sure people would choose to have delivery notifications if they were given the choice.
Other metadata issues
There are other metadata issues in Signal, of course. Like receipt notifications, they are tradeoffs between usability and privacy. The most notable one is probably how Signal shares your contact list. The user-visible effect is the "Bob is on Signal!" message that pops up when the server figures that out. The Signal people have done extensive research to make this work securely while at the same time leveraging the contacts on your phone, but it's still a surprising phenomenon to new users who don't know about the specifics of how this is implemented.
Another one is how groups are opt-out only: anyone can add you to a group without your consent, which shares your phone number with the other members of the group, a bit like how carbon copies in emails reveal a social network.
Compared with groups and new-user notifications, the receipt notification issue is a little more pernicious: the leak is not visible at all to users unless they run signal-cli. While people clearly see each other's presence in a group, they definitely will not know that those little checkmarks disclose more information to other users than they seem to.
The bottom line is that crypto and security are hard to implement but also hard to make visible to users. Signal does a great job at making a solid communication application that provides decent security, but it can have surprising properties even for skilled engineers who thought they knew about the security properties of the system, so I am worried about my non-technical friends and their expectations of privacy...