One-Time Programs

One of the things I like to do on this blog is write about new research that has a practical angle. Most of the time (I swear) this involves writing about other folks’ research: it’s not that often that I write about work that comes out of my own lab. Today I’m going to make an exception to talk about a new paper that will be appearing at TCC ’22. This is joint work with my colleagues Abhishek Jain and Aarushi Goel along with our students Harry Eldridge and Max Zinkus.

This paper is fun for three reasons: (1) it addresses a cool problem, (2) writing about it gives me a chance to cover a bunch of useful, general background that fits the scope of this blog [indeed, our actual research won’t show up until late in the post!], and most critically (3) I want people to figure out how to do it better, since I think it would be neat to make these ideas more practical. (Note, if you will, that TCC stands for Theory of Cryptography conference, which is kind of a weird fit.)

Our work is about realizing a cryptographic primitive called the One-Time Program (OTP). This is a specific kind of cryptographically obfuscated computer program — that is, a program that is “encrypted” but that you can mail (literally) to someone who can run it on any untrusted computer, using input that the executing party provides. This ability to send “secure, unhackable” software to people is all by itself a holy grail of cryptography, since it would solve so many problems both theoretical and practical. One-time programs extend these ideas with a specific property that is foreshadowed by the name: the executing computer can only run an OTP once.

OTPs are one of the cooler ideas to pop out of our field, since they facilitate so many useful things. Aside from the obvious dark-side implications (perfect DRM! scary malware!) they’re useful for many constructive applications. Want to send a file encrypted under a weak password, but ensure nobody can just brute-force the thing? Just send an OTP that checks the password and outputs the file. Want to send a pile of sensitive data and let users compute statistical functions over it using differential privacy? Sure. Time-lock encryption? Why not. In fact, OTPs are powerful enough that you can re-invent many basic forms of cryptography from them… provided you’re not too worried about how efficient any of it will be.

As we’ll see in a minute, OTPs might be nice, but they are perhaps a little bit too good to be true. Most critically they have a fundamental problem: building them requires strong model-breaking assumptions. Indeed, many realizations of OTPs require the program author to deliver some kind of secure hardware to the person who runs the program.* This hardware can be ridiculously simple and lightweight (much simpler than typical smartcards) but the need to have some of it represents a very big practical limitation. This is likely why we don’t have OTPs out in the world — the hardware these schemes require does not actually exist.

In this work we tried to answer a very similar question. Specifically: can we use real hardware (that already exists inside of phones and cloud services) to build One-Time Programs? That’s the motivation at a high level.

Now for the details… and boy are there a lot of them.

One-Time Programs

One-Time Programs (OTPs) were first proposed by Goldwasser, Kalai and Rothblum (GKR) back in CRYPTO 2008. At a surface level, the idea is quite simple. Let’s imagine that Alice has some (secret) computer program P that takes in some inputs, cogitates for a bit, and then produces an output. Alice wishes to mail a copy of P to her untrustworthy friend Bob who will then be able to run it. However Alice (and Bob) have a few very strict requirements:

  1. Bob can run the program on any input he wants, and he will get a correct output.
  2. Bob won’t learn anything about the program beyond what he learns from the output (except, perhaps, an upper-bound on its size/runtime.)
  3. Bob can run the program exactly once.

Let’s use a very specific example to demonstrate how these programs might work. Imagine that Alice wants to email Bob a document encrypted under a relatively weak password such as a 4-digit PIN. If Alice employed a traditional password-based encryption scheme, this would be a very bad idea! Bob (or anyone else who intercepts the message before it reaches Bob) could attempt to decrypt the document by systematically testing each of the (10,000) different passwords until one worked correctly.

Using a one-time program, however, Alice can write a program with code that looks like this, and turn it into an OTP:

Program P: takes an argument "password"
  1. if password != "<secret password>",
      output "BAD"
  2. else output <secret document>
I don’t know if the kids even get my meme choices anymore.

When Bob receives an OTP of the program above, he can then run it on some password input he chooses — even if Alice is no longer around to help him. More critically, because it’s a one-time program, Bob will only be able to try a single password guess. Assuming Bob knows the right password this is just fine. But a recipient who does not know the password will have to guess it correctly the first time. The nice implication is that even a “weak” 4-digit PIN is reasonably safe to use as a password.

(Of course if Alice is worried about Bob innocently fat-fingering his password, she can send him a few different copies of the same program. Put differently: one-time programs trivially imply N-time programs.)

One-time programs have many other useful applications. Once I can make “unhackable” limited-use software, I can send you all sorts of useful functionalities based on secret intellectual property or private data rather than keeping that stuff locked up on my own server. But before we can do those things, we need to actually build OTPs.

Why hardware?

If you spend a few minutes thinking about this problem, it should become obvious that we can’t build OTPs using (pure) software: at least not the kind of software that can run on any general-purpose computer.

The problem here stems from the “can only run once” requirement.

Imagine that you send me a pure software version of a (purported) One-Time Program. (By this I mean: a piece of software I can run on a standard computer, whether that’s a Macbook or a VM/emulator like QEMU.) I’m supposed to be able to run the program once on any input I’d like, and then obtain a valid output. The program is never supposed to let me run it a second time on a different input. But of course if I can run the software once that way, we run into the following obvious problem:

What stops me from subsequently wiping clean my machine (or checkpointing my VM) and then re-installing a fresh copy of the same software you sent—and running it a second time on a different input?

Sadly the answer is: nothing can prevent any of this. If you implement your purported “OTP” using (only) software then I can re-run your program as many times as I want, and each time the program will “believe” it’s running for the very first time. (In cryptography this is sometimes called a “reset attack”.)

Keanu experiences a reset attack.

For those who are familiar with multi-party computation (MPC), you’ll recognize that such attacks can be thwarted by requiring some interaction between the sender and recipient each time they want to run the program. What’s unique about OTPs is that they don’t require any further interaction once Alice has sent the program: OTPs work in a “fire and forget” model.

In their original paper, GKR noted this problem and proposed to get around it by changing the model. Since pure software (on a single machine) can’t possibly work, they proposed to tap the power of tamper-resistant hardware. In this approach, the program author Alice sends Bob a digital copy of the OTP along with a physical tamper-resistant hardware token (such as a USB-based mini-HSM). The little token would be stateful and act like one of those old copy-protection dongles from the 1990s: that is, it would contain cryptographic key material that the program needs in order to run. To run the program, Bob would simply pop this “hardware token” into his computer.

A single USB token might contain thousands of “one-time memories.”

Now you might object: doesn’t using hardware make this whole idea kind of trivial? After all, if you’re sending someone a piece of specialized tamper-resistant hardware, why not just pop a general-purpose CPU into that thing and run the whole program on its CPU? Why use fancy cryptography in the first place?

The answer here has to do with what’s inside that token. Running general programs on tamper-proof hardware would require a token with a very powerful and expensive (not to mention complex) general-purpose CPU. This would be costly and, worse, would embed a large software and hardware attack surface — something we have learned a lot about recently thanks to Intel’s SGX, which keeps getting broken by researchers. By contrast, the tokens GKR propose are absurdly weak and simple: they’re just memory devices that spit out and erase secret keys when asked. The value of the cryptography here is that Bob’s (untrusted) computer can still do the overwhelming share of the actual computing work: the token merely provides the “icing” that makes the cake secure.

But while “absurdly weak and simple” might lower the hardware barrier to entry, this is not the same thing as having actual tokens exist. Indeed it’s worth noting that GKR proposed their ideas way back in 2008; it is now 2022 and nobody (to my knowledge) has ever built the necessary token hardware to deploy them in the world. (One could prototype such hardware using an open HSM platform, but how secure would that actually be — compared to a proper engineering effort by a major company like Apple, Google or Amazon?)

And yet One-Time Programs are neat. It would be useful to be able to write and run them on real devices! For many years I’ve run into problems that would melt away if we could deploy them easily on consumer devices. Wouldn’t it be great if we could build them using some hardware that already exists?

How to build a (GKR) One-Time Program

In order to explain the next part, it’s necessary to give an extremely brief overview of the GKR construction for One-Time Programs, and more specifically: their specialized tokens. This construction is based on garbled circuits and will make perfect sense if you’re already familiar with that technique. If not, it will require a little bit more explanation.

GKR’s idea is to rely on many individual tokens called One-Time Memories (OTM). An individual OTM token works like this:

  1. When a program author (Alice) sets one up, she gets to install two different strings into it: let’s call them K0 and K1. She can then mail the token to Bob.
  2. When Bob receives the token and wants to use it, he can ask the token for either of the two strings (0/1). The token will hand Bob the desired string (either K0 or K1.)
  3. Once the token has given Bob the string he asked for, it permanently erases the other string.

The strings themselves need not be very long: 128 bits is ideal. To use these tokens for building One-Time Programs, Alice might need to set up a few hundred or a few thousand of these “tokens” (which can technically all be glommed together inside a single USB device) and send them to Bob.
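
To make the interface concrete, here is a minimal sketch (in Python, with hypothetical names) of what a One-Time Memory looks like as an object. This is just an illustration of the behavior described above; the real thing is a tamper-resistant hardware token, not software:

import os

class OneTimeMemory:
    """Toy model of an OTM token: programmed with two strings, reveals one, erases the other."""
    def __init__(self, k0: bytes, k1: bytes):
        self._strings = {0: k0, 1: k1}
        self._used = False

    def read(self, bit: int) -> bytes:
        if self._used:
            raise RuntimeError("token already consumed")
        self._used = True
        chosen = self._strings[bit]
        self._strings = None   # "erase" both slots; only the returned copy survives
        return chosen

# Alice programs a token with two 128-bit strings and mails it to Bob.
token = OneTimeMemory(os.urandom(16), os.urandom(16))
k = token.read(1)   # Bob reads exactly one string; a second read() will fail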

Once you assume these tokens, the GKR style of building One-Time Programs is pretty straightforward if you’re a cryptographer. Summarized to someone who is familiar with garbled circuits: the basic idea is to take the classical Yao two-party computation (2PC) scheme and replace the (interactive) Oblivious Transfer portion by sending the evaluator a set of physical One-Time Memory tokens.

If that doesn’t work for you, a slightly more detailed explanation is as follows:

Alice first converts her program P into a boolean circuit, like the one below:

Having done that, she then assigns two random cryptographic keys (annoyingly called ‘labels’) to every single wire in the circuit. One key/label corresponds to the “0” value on that wire, and the other to the “1” value. Notice that the input wires (top-left) also count here: they each get their own pair of keys (labels) corresponding to the two possible input bits.

Alice next “garbles” the circuit (encrypting each gate) using a clever approach devised by Andrew Yao, which I won’t describe precisely here, but which Vitalik Buterin explains nicely in this blog post. The result is that each gate is replaced with an encrypted table of ciphertexts: anyone who has the two appropriate labels going into that gate will be able to evaluate it, and in return they will receive the appropriate label for the gate’s output wire. Hence all Bob needs is the appropriate keys/labels for the values he wishes to feed into the input wires at the top-left of the circuit — and then he can recursively evaluate the circuit all the way to the output wires.
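
If it helps to see the idea in code, here is a minimal sketch (in Python, with made-up helper names) of garbling and evaluating a single AND gate. It uses a hash as the encryption function and omits details like the point-and-permute bits that real schemes use, so treat it as an illustration rather than the actual construction:

import os, random, hashlib

def H(a: bytes, b: bytes) -> bytes:
    # Hash a pair of wire labels down to a 16-byte pad (illustrative only).
    return hashlib.sha256(a + b).digest()[:16]

def xor(x: bytes, y: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(x, y))

# Two random 128-bit labels per wire: index 0 = the "0" value, index 1 = the "1" value.
wa = [os.urandom(16), os.urandom(16)]   # input wire a
wb = [os.urandom(16), os.urandom(16)]   # input wire b
wc = [os.urandom(16), os.urandom(16)]   # output wire c

# Garbling an AND gate: for each input combination, encrypt the correct output
# label under the pair of input labels, then shuffle the four ciphertexts.
table = [xor(H(wa[x], wb[y]), wc[x & y]) for x in (0, 1) for y in (0, 1)]
random.shuffle(table)

# Evaluation: Bob holds exactly one label per input wire (here, the labels for a=1, b=1).
# He can only decrypt the row matching his labels; real schemes add a "pointer" bit
# so he knows which row that is without guessing.
pad = H(wa[1], wb[1])
assert wc[1 & 1] in [xor(row, pad) for row in table]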

From Bob’s perspective, therefore, all he needs to do is obtain the labels that correspond to the input arguments he wants to feed into the circuit. He will also need a way to translate the labels on the output wires into actual bits. (Alice can send Bob a lookup table to help him translate those labels into actual bits, or she can just make the circuit output raw binary values instead of labels on those wires.)

A critical warning is that Bob must receive only one input label for each wire: if he had more than that, he could run the circuit on two “different inputs.” (And in this case, many other bad things would probably happen.) Hence the high-level problem is how to deliver the appropriate input labels to Bob, while ensuring that he gets exactly one label for each wire and never more.

And this is where the One-Time Memory tokens fit in perfectly. Alice will set up exactly one OTM token for each input wire: it will contain both labels for that wire. She’ll send it to Bob. Bob can then query each token to obtain exactly one label for that wire, and then use those labels to evaluate the rest of the circuit. The OTM token will destroy all of the unused labels: this ensures that Bob can only run the program on exactly one input. And that’s the ballgame.**
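
To make the delivery step concrete, here is a sketch of how the tokens would plug in, reusing the toy OneTimeMemory class from earlier (again, hypothetical code rather than the paper’s construction):

import os

# Alice's side: one OTM per input wire, loaded with that wire's two labels.
input_labels = [(os.urandom(16), os.urandom(16)) for _ in range(8)]
tokens = [OneTimeMemory(label0, label1) for (label0, label1) in input_labels]

# Bob's side: he queries each token exactly once with his chosen input bit.
bob_input = [1, 0, 1, 1, 0, 0, 1, 0]
bob_labels = [t.read(bit) for (t, bit) in zip(tokens, bob_input)]

# Bob now evaluates the garbled circuit using bob_labels. The unused labels
# have been erased, so he can never evaluate on a second, different input.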

So where do I pick myself up some One-Time Memories?

You buy them at your local One-Time Memory store, obviously.

Seriously, this is a huge flaw in the hardware-based OTP concept. It would be awfully useful to have OTPs for all sorts of applications, assuming we had a bunch of One-Time Memory tokens lying around and it was easy to ship them to people. It would be even more awesome if we didn’t need to ship them to people.

For example:

  • Imagine there was a publicly-accessible cloud provider like Google or Apple that had lots of One-Time Memories sitting in its data centers that you could rent. Alice could log in to Google and set up a bunch of OTM “tokens” remotely, and then simply send Bob the URLs to access them (and hence evaluate an OTP that Alice mails him.) As long as the cloud provider uses really good trusted hardware (for example, fancy HSMs) to implement these memories, then even the cloud provider can’t hack into the secrets of Alice’s One-Time Programs or run them without Bob’s input.
  • Alternatively, imagine we had a bunch of One-Time Memories built into every smartphone. Alice couldn’t easily send these around to people, but she could generate One-Time Programs that she herself (or someone she sends the phone to) could later run. For example, Alice could build a sophisticated biometric analysis program that uses inputs from the camera to unlock secrets in her app, while ensuring that the program stays safe and can’t be “brute-forced” through repeated execution. Even if someone stole Alice’s phone, they would only be able to run the program (once) on some input, but they would never be able to run it twice.

The problem is that, of course, cloud providers and phone manufacturers are not incentivized to let users build arbitrary “unhackable” software, nor are they even thinking about esoteric things like One-Time Memories. No accounting for taste.

And yet cloud providers and manufacturers increasingly are giving consumers access to specialized APIs backed by secure hardware. For example, every single iPhone ships with a specialized security chip called the Secure Enclave Processor (SEP) that performs specific cryptographic operations. Not every Android phone has such a processor, but most include a small set of built-in TrustZone “trustlets” — these employ processor virtualization to implement “secure” mini-apps. Cloud services like Google, Apple iCloud, Signal and WhatsApp have begun to deploy Hardware Security Modules (HSMs) in their data centers — these store encryption keys for consumers to use in data backup, and critically, are set up in such a way that even the providers can’t hack in and get the keys.

Unfortunately none of the APIs for these hardware services offer anything as powerful as One-Time Programs or (even) One-Time Memories. If anything, these services are locked down specifically to prevent people from doing cool stuff: Apple’s SEP supports a tiny number of crypto operations. TrustZone does support arbitrary computation, but today your applet must be digitally signed by a phone manufacturer or a TEE-developer like Qualcomm before you can touch it. Consumer-facing cloud providers definitely don’t expose such powerful functionality (for that you’d need something like AWS Nitro.) The same goes for “simple” functionalities like One-Time Memories: they may be “simple”, but they are also totally weird and why would any consumer manufacturer bother to support them?

But let’s never forget that cryptography is a sub-field of computer security. In that field when someone doesn’t let you run the machine you want to run, you figure out how to build it.

What hardware APIs do manufacturers and cloud providers support?

Not very many, unfortunately.

The Keymaster meets the Gatekeeper. I think there was also a dog in this movie?

If you hunt around the Apple Developer Documentation for ways to use the iPhone SEP, for example, you’ll find APIs for generating public keys and using them, as well as ways to store keys under your passcode. Android (AOSP) provides similar features via the “Gatekeeper” and “Keymaster” trustlets. Consumer-facing cloud services provide HSM-backed services that also let you store your keys by encrypting them under a password.

None of this stuff is a “One-Time Memory,” unfortunately. Most of it isn’t even stateful… with one exception.

“Counter lockboxes”

Reading through documentation reveals one functionality that isn’t exactly what we need, but at least has some promise. In its documentation, Apple calls this function a “counter lockbox”, and it’s designed to store encryption keys that are protected with a user-selected password. The same person (or someone else) can later retrieve the key by sending in the right passcode. In the event that the entered passcode is not the right one, the lockbox increments an “attempt counter” that caps the number of incorrect guesses that will be permitted before the key is locked forever.

And I do mean forever. When the user exceeds the maximum number of guesses, something interesting happens (here is Apple’s version):

Source: Apple Platform Security guide.

While I’m going to use Apple’s terminology, lockboxes are not just an Apple thing: a nearly-identical functionality exists inside every phone and in basically all of the consumer-facing cloud services I mentioned above.

Since counter lockboxes erase things, and they’re pretty much ubiquitous, this gives us hope. Even though they are not OTMs, maybe we can somehow use them to build OTMs. To explain how this works, we first need to give a clear description of how a basic counter lockbox works. Here’s a simplified version (also sketched in code just after the list):***

  1. To set up a fresh lockbox, Alice provides an encryption key K and a password P as well as a “maximum attempt counter” M. The initial attempt counter is set to A := 0. The token stores (K, P, A, M).
  2. When Bob wants to access the lockbox, he sends in a “guess” P’.
    1. If P’ == P (i.e., the guess is correct), the token returns K and resets A := 0.
    2. Else if P’ != P (the guess is incorrect), the token sets A := A + 1 and returns “bad password”.
  3. If A == M (i.e., the maximum attempts have been reached) then the token wipes its memory completely.
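
Here is that state machine as a toy Python class (hypothetical names, no actual secure hardware involved), just to pin down the behavior we have to work with:

class CounterLockbox:
    """Toy model of a counter lockbox: stores one key behind a password and a guess limit."""
    def __init__(self, key: bytes, password: str, max_attempts: int):
        self._key = key
        self._password = password
        self._attempts = 0
        self._max = max_attempts

    def query(self, guess: str):
        if self._key is None:
            raise RuntimeError("lockbox has been wiped")
        if guess == self._password:
            self._attempts = 0
            return self._key
        self._attempts += 1
        if self._attempts >= self._max:
            self._key = None          # wipe the memory forever
            self._password = None
        return "bad password"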

If you remember our description of a one-time memory (OTM) further above, you’ll notice that a lockbox stores only one string (key), but that an OTM has two different strings stored in it. Moreover, the OTM will always give us one or the other string, but a counter lockbox only gives us its single string when we enter a password.

But perhaps we can simulate OTMs by using multiple lockboxes.

Imagine that Alice has access to two different (fresh) counter lockboxes, as illustrated below. Let’s assume she configures both lockboxes to permit exactly one password attempt (M=1). Next, she stores the string K0 into the lockbox on the left, and sets it up to use password “0”. She then places the string K1 into the lockbox on the right and sets it up with password “1”, as follows:

The picture above shows you what’s inside each lockbox. But you need to remember that to anyone other than Alice, lockboxes are kind of a black box. That is, they don’t reveal the key or password they’ve been set up with — until the lockbox is queried on a specific password. To exploit this fact, Alice can shuffle the two lockboxes and send them to Bob in a random order. Until he tries to access one, these lockboxes will look like this to Bob:

So how does Bob “open” the lockboxes to reliably obtain one of the two strings?

For a moment let’s assume that Bob is honest, and wants to get either K0 or K1. He does not know which of the two lockboxes contains the string he wants. He can, however, employ a simple strategy to obtain it.

Assuming Bob’s choice is “0”, he can simply query both lockboxes on password “0”. One of the two lockboxes will output K0, and the other will output “bad password”. (If his choice is “1”, then he can do the same using that as the password.) This strategy works perfectly. In either case Bob always obtains the string he wanted, and the other string ends up getting erased: just like a One-Time Memory!
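
In code, the two-lockbox trick looks something like this (reusing the toy CounterLockbox class above, with each lockbox allowed a single attempt):

import os, random

k0, k1 = os.urandom(16), os.urandom(16)
boxes = [CounterLockbox(k0, "0", max_attempts=1),
         CounterLockbox(k1, "1", max_attempts=1)]
random.shuffle(boxes)    # Bob cannot tell which box holds which string

# Honest Bob who wants bit b simply queries *both* boxes on password str(b).
b = 0
results = [box.query(str(b)) for box in boxes]
chosen = next(r for r in results if r != "bad password")
assert chosen == (k0 if b == 0 else k1)
# The other box has burned its single attempt, so the other string is gone.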

The problem with this approach is that Bob might not be honest.

A cheating Bob can query the first lockbox on password “0”, and if that lockbox hands him K0, he can switch strategies and query the remaining lockbox on password “1”. If this works, he will end up with both of the strings — something that should never happen in a One-Time Memory. And this is very bad! If we use such broken “memories” to build GKR one-time programs and Bob pulls this off for even a single input wire of our garbled circuit, he will be able to query the circuit on multiple distinct inputs. (And it gets worse: if Bob obtains two labels on the same input wire, the garbled circuit construction will often unravel like a cheap sweater.)

The good news here is that Bob’s strategy doesn’t always work. He will sometimes get both strings, but sometimes he won’t. Half the time he’ll query the first lockbox on “0” and it will say “bad password”… and the key it holds will be gone forever. (Bob can still go on to get the other string from the untouched lockbox, but that’s not a problem.) This means we have at best a 1/2 probability of thwarting a cheating Bob. That’s good… but can we reduce Bob’s chances even more?

The most obvious approach is to use more lockboxes. Instead of two lockboxes, imagine Alice sets up, say, 80 different shuffled lockboxes divided into two sets. The first forty lockboxes can be programmed with the “0” password, and the other forty can be programmed with password “1”: then all will be mixed together. Our goal is to ensure that Bob will obtain K0 only if he guesses where all of the “0” lockboxes are. To ensure this, Alice will use secret sharing to split K0 into 40 different “shares”, with the property that all of the shares are needed to recover the original string. Each share can be placed into one of the forty “0” boxes. (A similar process will be used to protect K1.)
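
Concretely, the amplified version looks something like the following sketch, which splits each string into XOR shares and again reuses the toy CounterLockbox class (real secret sharing and parameters would be chosen more carefully):

import os, random
from functools import reduce

def xor_bytes(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def xor_shares(secret: bytes, n: int):
    # Split secret into n shares; XORing all n shares recovers the secret.
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    shares.append(reduce(xor_bytes, shares, secret))
    return shares

K0, K1 = os.urandom(16), os.urandom(16)
boxes = [CounterLockbox(s, "0", max_attempts=1) for s in xor_shares(K0, 40)] + \
        [CounterLockbox(s, "1", max_attempts=1) for s in xor_shares(K1, 40)]
random.shuffle(boxes)

# Honest Bob who wants bit 0 queries every box on "0": he collects all forty
# shares of K0, while every share of K1 is erased along the way.
collected = [r for r in (box.query("0") for box in boxes) if r != "bad password"]
assert reduce(xor_bytes, collected) == K0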

Of course Bob can still cheat and try to guess his way through this maze Alice has built, but he will need to correctly guess where each of the “0”-password lockboxes are located without making a single mistake: since even a single wrong guess will doom his chances to get both strings. Such a lucky guessing strategy is extremely challenging for Bob to pull off. (The analysis is not quite equivalent to Bob flipping a coin “heads” forty times in a row [probability 1/2^40] but it results in a probability that’s similarly low.)

By adjusting the number of lockboxes she uses, Alice can carefully tune the security level of each “virtual one-time memory” so that cheating (almost) never works.

This sure seems like a lot of lockboxes

Well, I said this paper is appearing in the Theory of Cryptography conference, didn’t I?

Obviously the proposal above is not the end of the story. In that solution, for a desired S-bit security level the “naive” approach requires Alice to set up 2S lockboxes (or O(S) if you don’t like constants.) To build GKR one-time programs from this, Alice would need O(S) lockboxes for every input bit Bob might want to feed into the program. Surely we can do better.

The rest of our paper looks at ways to reduce the number of lockboxes required, mainly focusing on the asymptotic case. Our second proposal reduces the cost to about O(1) lockboxes for every input bit (when there are many input wires), and the final proposal removes the dependence on the number of input bits entirely. This means in principle we can run programs of arbitrary input length using some reasonable fixed number of lockboxes.

I can’t possibly do justice to the full details here, except to note that we rely on some very cool techniques proposed by others. Our first transform relies on a ‘robust garbling’ technique devised by Almashaqbeh, Benhamouda, Han, Jaroslawicz, Malkin, Nicita, Rabin, Shah and Tromer. The second uses an underappreciated tool called “Laconic OT” that was proposed by Cho, Dottling, Garg, Gupta, Miao and Polychroniadou. (Sadly, Laconic OT constructions are not quite “concretely” practical yet, so I hope these new applications will motivate some more practical research into that area.)

The upshot is that we can, in principle, manufacture programs that run on user inputs of arbitrary size with a fixed cost of at most several thousand lockboxes. While this is still quite a large number in practice(!), it isn’t infeasible. Such a quantity of lockboxes could be obtained by, for example, setting up boatloads of fake iCloud or Google accounts and then using Apple’s iCloud Key Vault or Google’s Titan Backup service to store a “backup key” for each of them.

(I do not recommend you actually do any of this: I suspect both Apple and Google find such things irritating, and will make your life unpleasant if you try it.)

Are there any downsides to this research?

Maybe. There are many applications of obfuscated programs (including one-time programs) that are extremely bad for the world. One of those applications is the ability to build extremely gnarly ransomware and malware. This is presumably one of the reasons that systems like TrustZone and Intel SGX require developers to possess a certificate in order to author code that can run in those environments.

For example: a ransomware author could encrypt a whole system and then store the keys inside of a One-Time Program that would, in turn, need to be executed on a valid chunk of a blockchain in order to release the keys. That blockchain fragment would incorporate a transaction proving that the system owner has paid some sum of money to the malware developer. This system would be fully ‘autonomous’ in the sense that your computer, once infected, would never need to phone home to any “command and control” infrastructure operated by the ransomware author. This would make malware networks harder to take down.

If systems designers are worried about malware on SGX, our research shows that (eventually) those designers may also need to think about the availability of “lockbox”-type functionalities as well. Or to give a shorter summary: don’t let the bad guys get more than a few thousand lockboxes, or they gain the power of secure computation and all that comes with it. And the exact number they need could be much smaller than the results we achieve in this paper.

Perhaps at a higher level, the main lesson of this research is that this kind of secure computation is much easier to achieve than you’d expect it to be. If someone wants to do it badly enough, they’ll always find a way.

Notes:

* There is a second line of work that uses very powerful cryptographic assumptions and blockchains to build such programs. I am very enthusiastic about this work [and my co-authors and I have also written about this], but this post is going to ignore those ideas and stick with the hardware approach.

** Of course that’s not really the ballgame. As with all simple descriptions of complex protocol papers, this simple explanation omits all the subtle details that matter for the security proof, but that aren’t terribly relevant for the purposes of this blog. These include all sorts of interesting questions about how to prove security in an environment where an adversary can query the tokens in a weird, arbitrary order. It’s a good paper!

*** Some implementations will pick the key K for you. Others fix the maximum attempt counter at some constant (usually 10 attempts) rather than supporting any amount. All of these details can be worked around in practice.

In defense of crypto(currency)

Last week a group of technologists, including Bruce Schneier, sent a letter to Congress outlining their concerns around cryptocurrency and urging Congress to regulate the space.

Now let me be the first to say that I broadly support this goal. I have no problem with the idea of legislators (intelligently) passing laws to regulate cryptocurrency. Indeed, given the level of insanity and the number of outright scams that are happening in this area, it’s pretty obvious that our current regulatory framework is not up to the task. If the recent letter simply asked for intelligent regulation, I’d gladly sign onto it. Unfortunately that’s not at all what this letter says. Instead, it argues that the entire technology field is worthless and cannot be used for any practical purpose.

Don’t take my word for it. I urge you to stop reading this post right now and take a look for yourself. I’ve helpfully reproduced some of the critical pieces below (emphasis added):

By its very design, blockchain technology, specifically so-called “public blockchains”, are poorly suited for just about every purpose currently touted as a present or potential source of public benefit. From its inception, this technology has been a solution in search of a problem and has now latched onto concepts such as financial inclusion and data transparency to justify its existence, despite far better solutions already in use. After more than thirteen years of development, it has severe limitations and design flaws that preclude almost all applications that deal with public customer data and regulated financial transactions and are not an improvement on existing non-blockchain solutions.

The catastrophes and externalities related to blockchain technologies and crypto-asset investments are neither isolated nor are they growing pains of a nascent technology. They are the inevitable outcomes of a technology that is not built for purpose and will remain forever unsuitable as a foundation for large-scale economic activity.

Frankly, this whole letter bums me out. Over the years I’ve spent a decent amount of time on Twitter calling out cryptocurrency shills who spout technical nonsense while promoting outright confidence games. I took for granted that my technical colleagues would be a little more reasonable, particularly when speaking as technical experts to Congress and regulators. This is not simply someone “being wrong on the Internet.” These are important claims that deserve serious attention, and there are real consequences to being wrong here.

So while I appreciate the authors’ intentions, against my better judgement — and you’d better believe my better judgement is shouting at me for writing these words — I feel compelled to say something in defense of this technology area. “Public blockchain” technology enables many stupid things: today’s cryptocurrency schemes can be venal, corrupt, overpromised. But the core technology is absolutely not useless. In fact, I think there are some pretty exciting things happening in the field, even if most of them are further away from reality than their boosters would admit. Moreover, many of crypto’s technical problems are also amenable to some really exciting technical solutions, many of which are already here or on their way to deployment.

So instead of enjoying a beautiful week of Baltimore summer, I’m going to spend my time indoors, writing about what I think those things are — and (more or less incidentally) why I think these distinguished authors are wrong that “public blockchain technology” is a technical dead end. This post is not precisely a rebuttal to the letter above: instead, I’ve decided to phrase it as a general response to some of the more common spurious objections I hear people make to public blockchain systems. It just happens to be the case that a few of these (but not all) come up in the letter.

Finally, in the interest of full disclosure: I have designed privacy-preserving cryptocurrency systems (and still serve on a Foundation board of one), and I’m currently working on a startup that is trying to add regulatory compliance capabilities to public blockchains. (You can decide for yourself if this makes me a shill for Big Blockchain or Big Regulation. Frankly I’m not sure.)

Objection: “Cryptocurrency is terrible for the environment”

It’s probably best not to beat around the bush on this one: the most serious current objection to cryptocurrency (as it’s currently construed) is the massive environmental impact of proof-of-work (PoW) mining. I won’t devote a single second towards minimizing this concern. Many defenders have tried to paint the electricity consumption of Bitcoin and other PoW currencies as “green” or define it as a form of energy storage. This is dishonest nonsense: estimates hold that at least 60% of mining energy consumption still comes from fossil sources.

And that’s a lot of energy.

The totals depend on which smallish nation we’re using as our standard unit of energy consumption: this September 2021 NYT article has Bitcoin (all by itself) consuming nearly as much energy as Finland. Yet whereas Finland produces Nokia cellphones and lovable wooden clogs, Bitcoin just produces… bitcoin. Not to mention a pathetic transaction rate of about 3.5 tx/second across the entire globe, as of this week.

Overview of mining pool activity on the Bitcoin chain. Does that look decentralized to you?

Regardless of where you stand on cryptocurrency as a technology, you should understand that this wasteful resource consumption colors the public’s views of cryptocurrency in a highly negative way, and it is absolutely right for people to have these feelings because we are in a climate crisis and wasting this much goddamn energy on a single consensus protocol is pointless, harmful, and quite possibly evil — particularly when the result of all this energy consumption isn’t even a particularly decentralized network.

However: before you call for some kind of misguided cryptocurrency ban, you should understand that this is a temporary situation and not an intrinsic component of public blockchain technology.

Proof-of-work was chosen early in Bitcoin’s history because Bitcoin was designed to be operated by volunteers with personal computers. The concern in this setting was that a single user could mount a “Sybil attack” and pretend to be many different computers, thus dominating the construction of blocks on the network. Since verifiable real-world identities didn’t exist on the Internet, Nakamoto chose proof-of-work as an elegant solution: this approach makes your “vote” in the network organization proportional to the amount of computing power you possess. Since the typical early Bitcoin user only had one or a small number of computer CPUs to mine with, this kept the early Bitcoin network relatively decentralized.

Unfortunately modern proof-of-work mining looks nothing at all like the early Bitcoin network: today’s mining is a capital-intensive industry that competes to build entire datacenters full of specialized ASIC-based mining hardware. This change has undone most of the early decentralization benefits, while pointlessly burning tons of coal and natural gas.

But all is not lost.

Proof-of-work is not the only technology we have on which to build consensus protocols. Today, many forward-looking networks are deploying proof-of-stake (PoS) for their consensus. In these systems, your “voting power” in the network is determined by your ownership stake in some valuable on-chain asset, such as a new or existing electronic token. Since cryptocurrency has coincidentally spent a lot of time distributing tokens, this means that new protocols can essentially “cut out the middleman” and simply use coin ownership directly as a proxy for voting power, rather than requiring operators to sell their coins to buy electricity and mining hardware. Proof-of-stake systems are not perfect: they still lead to some centralization of power, since in this paradigm the rich tend to get richer. However it’s hard to claim that the result will be worse than the semi-centralized mess that proof-of-work mining has turned into.
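
As a toy illustration of that last point (not any real protocol), the chance of being picked to produce the next block in a proof-of-stake system is proportional to the stake you hold, rather than the electricity you burn:

import random

stakes = {"alice": 600, "bob": 300, "carol": 100}   # hypothetical token balances

def pick_block_proposer(stakes, seed):
    rng = random.Random(seed)   # in practice the randomness comes from the chain itself
    holders, weights = zip(*stakes.items())
    return rng.choices(holders, weights=weights, k=1)[0]

# Over many rounds alice proposes roughly 60% of blocks, bob 30%, carol 10%.
print(pick_block_proposer(stakes, seed=2022))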

And proof-of-stake is no longer theory. It’s already been deployed in production within a number of successful projects, including Avalanche, Cardano, Algorand, and Tezos. The Ethereum project is in the process of rolling out a proof-of-stake upgrade they call “Ethereum 2”, and while the final plans still seem a little handwavy for this late date, there has at least been some real progress in launching parts of the system. Beyond proof-of-stake, there are other technologies in deployment, such as the proof-of-space-and-time construction used by Chia, or more centralized proof-of-authority systems.

Now, admittedly: none of this solves the problem that much cryptocurrency is dirty today.

The letter is actually surprisingly nuanced about this objection.

But the question we should be asking is not whether to be angry about the power consumption of proof-of-work mining. We should be trying to figure out the right path out of this mess. And more concretely, whether there’s a path forward which is more likely to produce a good outcome than what is already happening in the industry — namely, that projects are rapidly deploying cleaner technologies to replace proof-of-work. Because it’s very unlikely that shade or hypothetical cryptocurrency bans are going to fix the problem any faster, and in fact: government overreaction could make it much, much worse by driving resources away from the cleaner chains that are coming online to solve the problem.

Objection: “Public blockchains can never support banking features like transaction reversal.”

One of the biggest problems in the field of cryptocurrency is that too many technical experts stopped looking at the field around 2015. This means they’ve missed a lot of the more interesting developments that have occurred in the past couple of years.

To give an example of this phenomenon, let’s take one claim that occurs very near the top of my colleagues’ letter:

This claim is not technically accurate. And worse, it indicates that its authors have missed several years of business and technological development. Unfortunately, correcting mistakes like this requires diving pretty deep into the technical weeds.

Blockchains work by assembling a data structure called an append-only ledger. Much like a traditional pen-and-paper bank ledger (shown at right), this ledger represents a list of events, such as currency transactions. A common feature of blockchain tech is that the ledger is constructed using a kind of “adversarial collaboration” that runs between many different computers. The upshot of this process is that entries on the ledger are very difficult to tamper with.

In the earliest cryptocurrency systems (like Bitcoin), the ledger is used to record the ownership and transfer of made-up tokens, such as the bitcoin currency. Since there is no trusted party or “bank” that manages the accounts on these systems, Bitcoin’s transaction rules are very simple and work like cash. If I sent money to your account, only you (using a cryptographic private key) should be able to control where it goes next. This is exciting and also a bit scary: there is no “undo” feature in these cash-like tokens.

By contrast, the modern retail-facing credit card and banking industries work very differently. In those industries there exist trusted parties (your bank or credit card’s customer-service representative) who can and do “reverse” fraudulent or mistaken transactions under specific circumstances. (Whether they will do so is a very different question.)

While it’s technically accurate that blockchain ledgers cannot easily be overwritten, it’s critical to understand that this really has nothing to do with transaction reversibility — any more than the specific writing instrument used by a historical bank (pen vs. pencil) determines whether a bank can reverse transactions. In practice, transaction reversal is not about ledger technology at all: it’s about transaction rules and trust. It requires that there is someone you trust to make transactions in your accounts without your explicit permission.

In other words, transaction reversibility is not about the ledger, but rather about the transaction rules that a currency uses. A reversible currency requires that someone anoint this trusted party (or trusted parties) and that they use their powers to freeze/burn/transact currency in ways that are at odds with the recorded owners’ intentions. And indeed, this is a capability that many tokens now possess, thanks to the development of sophisticated smart contract systems like Ethereum, that allow parties to design currencies with basically any set of transaction rules they want.
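
A toy sketch (emphatically not any real stablecoin’s code) may help show why this is a property of the rules rather than the ledger: the ledger below is strictly append-only, yet an anointed administrator can freeze an address or claw funds back simply by appending new entries.

class ManagedToken:
    """Toy token with an append-only event log and an administrator who can intervene."""
    def __init__(self, admin: str):
        self.ledger = []      # append-only: events are added, never rewritten
        self.balances = {}
        self.frozen = set()
        self.admin = admin

    def _log(self, event):
        self.ledger.append(event)

    def mint(self, caller, dst, amount):
        assert caller == self.admin
        self.balances[dst] = self.balances.get(dst, 0) + amount
        self._log(("mint", dst, amount))

    def transfer(self, src, dst, amount):
        if src in self.frozen or self.balances.get(src, 0) < amount:
            raise ValueError("transfer refused")
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount
        self._log(("transfer", src, dst, amount))

    def freeze(self, caller, addr):
        assert caller == self.admin
        self.frozen.add(addr)
        self._log(("freeze", addr))

    def reverse(self, caller, thief, victim, amount):
        # "Burn" the thief's funds and reissue them to the victim, by appending events.
        assert caller == self.admin
        self.balances[thief] = self.balances.get(thief, 0) - amount
        self.balances[victim] = self.balances.get(victim, 0) + amount
        self._log(("reverse", thief, victim, amount))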

And what’s fascinating is that none of this is hypothetical!

One of the most interesting developments of the past several years is the deployment of several government-regulated “stablecoins” that represent tokenized versions of real fiat currency (e.g., dollars) in a bank account. More or less without exception, these regulated currencies, which are issued by licensed and government-regulated organizations such as the issuers of USDC and BUSD (and are not the same as unregulated algorithmic scam coins like UST), each possess a centralized party/committee that can “freeze”, mint, or “burn” money owned by any user in the system [BUSD code, USDC code]. A centralized manager of the currency can therefore “lock” the account of any illegitimate recipient and compensate the spender directly, or in some cases, the centralized manager can “burn” and mint new currency to send back to the originator.

Indeed, this capability is explicitly mandated by regulators:

It is reasonable to point out that, compared to mature banking systems, current stablecoins’ transaction reversal capabilities are quite rudimentary. From a business perspective there is no guarantee that a coin issuer will return your stolen money, nor that they can do so in the event that a thief has already passed your money on to a fence. (Just as there’s no guarantee that Zelle will do so.) Modern banks implement reversal with fraud detection and a combination of delayed settlement, insurance, and trust. Some of these are technical capabilities, but many are largely business questions. In either case, none are “antithetical to the design of public blockchains.” If reversibility is actually important to you as a feature, then you should pay attention to how these new regulated systems develop.

Two regulated stablecoins with freeze capability: Binance USD (issued by Paxos) and USD Coin (issued by Circle/Centre). Chart shows market capitalization from near-$0 in 2020 to about $60bn today.

Objection: “Cryptocurrency doesn’t scale [or the fees are too damned high]”

The early Bitcoin protocol was designed to be many things, but fast and efficient was not one of them. The system’s famously low transaction rate is effectively the result of several tradeoffs in the design of the network’s consensus algorithm: where centralized payment systems like Visa or Mastercard can scale horizontally — by assigning different computers to handle various subsets of user transactions — in Bitcoin (and Ethereum, and most other extant platforms) every single participating node must validate every single transaction made by anyone in the system. This means that simply adding more computing power doesn’t produce a faster network.

The result is pretty dismal. Bitcoin’s transaction rate has historically topped out around 7 tx/sec worldwide (although recent upgrades may improve things slightly.) Ethereum pulls off maybe 20-30 tx/sec by sailing a bit closer to the wind. Meanwhile: networks like Visa handle about 1700 tx/sec on an ordinary day (!), and 10x that rate on major holidays.

This scaling problem makes cryptocurrency unworkable for just about any mainstream payments application. A single popular mobile game like Candy Crush probably conducts enough in-game transactions to challenge the Ethereum network.

via Etherscan.

And because the transaction rate in these so-called Layer 1 (L1) cryptocurrency networks is so low, competition for scarce network resources translates into high transaction fees. That’s why it recently cost $22 (!) to send a single token transaction on Ethereum, and much more to handle sophisticated transactions like DeFi swaps. These prices are fine if you’re a crypto speculator doing $1,000+ trades. But they rule out anything as mundane as paying people.

This sounds pretty bad. However, an important lesson I’ve learned in my career is this: if people are sufficiently motivated and the only barrier is an engineering problem, then it’s probably wise not to bet against them.

With some exceptions, the cryptocurrency community has acknowledged that existing techniques are unscalable, and they are engineering to try to get around this. The result has gone in two different directions. Both are unproven, and the resolution of these developments will determine exactly how well the field can grow in the future.

More money, more networks. The most obvious way to scale L1 cryptocurrencies is just to build more of them. This describes a lot of the action in 2019-2021: as networks like Ethereum saturate with transactions and become expensive, new entrants are deploying compatible (and sometimes incompatible) chains that provide more transaction capacity without the fees. These networks often look like Ethereum (in that they’re Nakamoto-consensus, proof-of-work mined systems), but sometimes they’re stripped down or use faster consensus technologies (examples include Avalanche, Polygon, Celo and Solana.) Some are just centralized clones of Ethereum.

This might stave off total chaos, but it probably isn’t sustainable. Even if all these networks work perfectly — and that’s a big assumption — the problem with adding more networks is that your ecosystem fragments. If funds are on the Ethereum chain and you want to use an application that lives on a different network, how do you get your funds over there? The solution today mainly involves “bridges” — semi-centralized parties that will accept your money on one L1 network and provide you with funds on another. If you want to move your funds back to the original network, that’s another pass through the bridge, with fees on both networks. It also means you have to trust the integrity of both the bridge itself and the destination network, which might itself be much less robust (and safe) than the original network you started from. It’s possible this approach will work well enough to enable most applications, but I wouldn’t bet on it.

Rollups. The second approach, heavily promoted by the Ethereum Foundation, is to improve the scaling of individual L1 networks like Ethereum by performing transactions on some second-layer, with the most common proposal being a “rollup server.”

Rollup servers are centralized machines that can validate many transactions quickly. One rollup server doesn’t handle every payment or smart contract on the chain. Instead, it operates over one or a small number of applications (say, a few specific smart contracts.) Users submit their transactions either to the chain itself, or directly to the server, which then validates the transactions and posts a short “proof” to the L1 chain asserting that a large collection of transactions has been checked and found to be valid. The idea here is to reduce the amount of computation and storage that the L1 nodes must perform to check these transactions: rather than validating 10,000 individual transactions, the L1 nodes simply verify one short assertion posted by the rollup server.

This sounds like magic, and it is, sort of. There are two approaches to building rollups, both of which are in “experimental” production today:

  • Optimistic rollups use a “trust and punish” approach. The rollup server posts a financial bond to assure the world that it will correctly validate all of its transactions. In the unlikely event that the rollup server “cheats” and authorizes an invalid transaction, any third-party whistleblower can submit a “fraud proof” that proves the rollup server’s failure. The L1 network can check these proofs, which will invalidate the bad transactions and pay the whistleblower a large reward. (A toy sketch of this flow appears just after this list.)
  • ZK rollups use cryptographic technology drawn from the field of zero-knowledge protocols, such as SNARK or STARK proofs, so the server can “prove” that all transactions were validated correctly before it posts the summary results to the chain. In principle this means the L1 chain can verify a short “proof” that covers many thousand complicated transactions, with (essentially) no possibility of cheating.
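
Here is the promised toy sketch of the optimistic (“trust and punish”) flow, with entirely made-up names; real rollups use Merkle commitments and far more careful validity rules:

import hashlib, json

def batch_commitment(txs):
    return hashlib.sha256(json.dumps(txs, sort_keys=True).encode()).hexdigest()

def tx_is_valid(tx):
    return tx.get("amount", 0) >= 0          # toy validity rule

class ToyL1:
    def __init__(self):
        self.batches = {}                     # commitment -> posted bond

    def submit_batch(self, commitment, bond):
        self.batches[commitment] = bond       # the rollup server vouches for the batch

    def submit_fraud_proof(self, txs, bad_index):
        # A whistleblower reveals the batch and points at an invalid transaction;
        # in a real system this would be a compact proof, not the whole batch.
        c = batch_commitment(txs)
        if c in self.batches and not tx_is_valid(txs[bad_index]):
            return self.batches.pop(c)        # slash the bond and reward the whistleblower
        return 0

l1 = ToyL1()
batch = [{"from": "alice", "to": "bob", "amount": 5},
         {"from": "bob", "to": "carol", "amount": -7}]   # invalid!
l1.submit_batch(batch_commitment(batch), bond=1000)
print(l1.submit_fraud_proof(batch, bad_index=1))          # prints 1000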

Rollups sound like a terrific idea, and the Ethereum community has bet heavily on the tech. But it’s worth pointing out that in practice nobody knows how well things will actually bear out when these systems are in widespread deployment.

One problem is that rollups today are largely focused on reducing the computational burden of verifying transactions. This is a big deal, particularly for Ethereum where verifying complex smart contract executions is quite costly. But even with today’s rollups, L1 nodes are still expected to store and transmit the raw transaction data: without keeping these transactions around, the loss of a rollup server could freeze an entire smart contract in place, blocking any further progress. This means scaling bottlenecks still exist — they’ll just be hit when nodes run low on bandwidth and storage, rather than compute.

That’s still a potentially huge improvement and many folks are optimistic: Vitalik Buterin calculates maximum possible scaling improvements on the order of 100x for rollup servers, though he quickly tempers this calculation by noting practical concerns.

In any case, the point of this section is not to claim that scaling is a solved problem. Rather, the point here is that scale improvements are on the minds of a lot of smart engineers, and there are solutions on the way. The results might not be perfect and presumably there will be much experimentation before we settle on a workable approach, but the end result will almost certainly work fine at some level of scalability. The main question is simply how robust and decentralized the result will be.

Objection: “There is no privacy on blockchains (or there is too much privacy)”

Public blockchains rely on volunteers to operate a network that verifies transactions. The implication of this design is that the transaction data itself must be publicly viewable. While a few naive people still believe that these currencies are anonymous, the truth is quite different: these older public chains expose your transactions to anyone who wants to see them. In those systems, your main protection is that transactions use a pseudonym (called an address) in place of your real identity.

A random transaction from a Bitcoin block explorer.

Some advocates once felt that pseudonymity was good enough for privacy, but this belief has receded a bit as sophisticated “chain analysis” companies have grown up and made progress towards identifying the real owners of various accounts. The lack of privacy on these systems poses a real challenge for those who would like to build financial infrastructure on public chains.

The good news is that researchers have made a lot of progress on solving privacy problems around cryptocurrency. We now have deployed privacy tech that allows users to conduct transactions on public blockchains without revealing any information that they do not wish to reveal. These systems work by (effectively) encrypting transactions and using sophisticated zero-knowledge proofs to convince verifiers that the transaction data is consistent (i.e., that encrypted transaction amounts “add up” and do not create money out of thin air.) There now exist several deployed cryptocurrencies that provide strong privacy even against government surveillance, and law enforcement has expressed concerns about them.
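
If you want a feel for the kind of arithmetic that makes the “amounts add up” check possible without revealing the amounts, here is a toy Pedersen-commitment sketch (insecure parameters, purely illustrative; real systems use elliptic-curve groups plus range proofs):

import random

p = (1 << 127) - 1        # a Mersenne prime used as a toy modulus; not a secure choice
g, h = 3, 7               # toy "independent" generators

def commit(value, blinder):
    # Pedersen commitment: hides `value`, yet commitments multiply homomorphically.
    return (pow(g, value, p) * pow(h, blinder, p)) % p

# One hidden input of 100 is split into hidden outputs of 60 and 40.
r_in, r1, r2 = (random.randrange(p) for _ in range(3))
c_in = commit(100, r_in)
c_out = (commit(60, r1) * commit(40, r2)) % p

# A verifier who is handed the blinding difference (or a proof of knowledge of it)
# can check that the outputs commit to the same total as the input.
delta = (r1 + r2 - r_in) % (p - 1)
assert c_out == (c_in * pow(h, delta, p)) % p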

Text from US Executive Order dated March 2022.

Still, to criticize an early technology for being both private and not-private-enough has a “nobody goes there, it’s too crowded” feel to it. What is true is that over many years the traditional financial system has learned to walk a tightrope where customer privacy is balanced on one hand against anti-money-laundering regulations on the other (as regulated by laws like the Bank Secrecy Act in the US.) Achieving those goals in today’s financial infrastructure has not been without cost, however. Our privacy as individuals has been vastly degraded by technological developments, with little opportunity for democratic debate. And the result sucks: collecting all of our private data and securing it is expensive, an expense we all pay for in high fees and catastrophic breaches. The cryptographic privacy offered by cryptocurrency is exciting because it offers a different road forward: one that promises to keep irrelevant data in our own hands.

Even TornadoCash has regulatory compliance. Sort of.

The actual form that these mature systems take is unknown to me. Perhaps they’ll use zero-knowledge policies that keep smaller transactions private, while ensuring that larger transactions are known to regulators or other parties. I’m not excited to use those systems, because I think they’ll be risky. But at very least they’ll be better than our current collect-it-all-and-then-hand-it-to-hackers approach, which certainly has not done us very many favors.

So why do I care about any of this?

To put it simply: because payments are important. And because something is badly wrong.

Credit card merchant fees have risen since this ad was on TV.

Over the past four decades, computer networking has radically changed the economics of just about every industry that relies heavily on IT. Google made information access so easy that we can barely remember the world before it existed. My kids refuse to believe that I once paid $1/minute for long distance phone calls. It’s so inexpensive to start an online retailer that we now have more online pet food stores than we have drive-in movie theaters in the United States.

And yet if you looked at the money-transfer and payments industry, you’d see no such changes. Credit card merchant fees in the United States are the same as, or have actually risen above, what they were in the 1990s, and that is an absolute tragedy — since these fees are baked into the cost of most retail goods and thus fall heavily on the working poor (who pay them even if they use cash.)

If all you care about is technology, consumer payment tech has improved at a glacial pace. It took nearly two decades to roll out anti-fraud improvements like EMV chip cards and tap-to-pay (NFC) in the US. Shopping online in 1995 meant typing credit card numbers into webforms. In 2022… it mostly means exactly the same thing. As a result, online payment fraud ballooned to around $200bn in 2020. And forget about real innovation like payment privacy or pay-by-phone (which is ubiquitous in Kenya and China but still in its infancy here in the US, and will likely only be “solved” by giving Apple and Google total control over payments.)

Why are these IT-focused industries so consistently immune to the same technological improvements and cost reductions we see everywhere else?

I’m only a computer scientist, so I’m going to let someone else answer that question. I can only tell you that what we have right now is not functioning properly: I suspect that legacy industry and regulators have smothered two generations of technological improvement, largely by building a (mostly) closed and permissioned financial system. And this is a big deal: payments are too important to our economy to entrust them to 1970s-era technology and an extractive industry. We don’t even know what novel applications — Googles, Facebooks, Wikipedias, Instagrams — we’re missing out on because the industry simply won’t allow them to exist.

I don’t know if blockchains are the solution to this problem. I see indications that the technology is finally starting to grow up in ways that seem like a harbinger of major positive changes on the horizon. Progress here is slow, in some cases because the regulatory apparatus is throwing sand in the gears of cooperative products, and/or utterly failing to move expeditiously to uncover possible fraud. And maybe the result won’t even be a success for blockchain solutions: perhaps we’ll simply get more and better offerings from the “traditional finance” industry as they start to wake up to the fact that more open systems can compete with their closed offerings.

So while I don’t know if cryptocurrency will be the answer, I’m just hopeful that something will be.

Title image by Flickr user Joegoauk Goa, used under CC license.

An extremely casual code review of MetaMask’s crypto

NB: This post describes a very casual code review of a few cryptography functions used by MetaMask. It does not describe any vulnerabilities. If you’re the kind of person who likes a meandering and amateurish code review that goes absolutely nowhere, you’ll enjoy this post. Otherwise you might want to read something more exciting: I suggest Moxie’s recent post on web3.

For reasons I can’t fully explain, the other day I decided that it might be fun to spend an hour or two investigating the cryptography used by MetaMask.

For those who aren’t familiar with the tool, MetaMask is a browser-based cryptocurrency wallet that is used to access decentralized applications (dapps) on networks like Ethereum. My interest in MetaMask is mostly just based on curiosity: I recently invested about $100 into a decentralized finance application and I wanted to see how safe it really was. While there are lots of different ways I could lose that money, taking a look at MetaMask’s crypto seemed like as good a place to start as any.

I want to stress that this was an informal code review: I didn’t use any tooling, didn’t even download (most) code to my computer. In fact my “review” mostly involved poking around various Github repositories to see if I could find anything that immediately jumped out as incorrect and, failing that, to at least get a feeling for the quality of MetaMask’s crypto code. (I did about half the work on my phone while eating a burrito bowl at Chipotle.)

After an hour or two of hunting through dependencies, I made the mistake of tweeting about my feelings:

I swear that this offhand Tweet was not meant to make anyone unhappy or scare people that their funds were at risk. Although I had a couple of scary moments, those were entirely on me. So let me be clear as possible right now: I did not find any exploitable vulnerabilities in MetaMask’s crypto!

What I did come back with is an uncomfortable feeling about the complexity and quality of MetaMask’s (current) crypto code, and some unhappy feelings about its dependency structure. Some of this stuff is basically unavoidable: file it under “browser crypto is scary.” Some is specific to the way the code is laid out. And some of it is a (non-trivial) gripe: this code is much harder to audit than it should be.

And this last part isn’t just a personal gripe. My TL;DR is that finding and reviewing the correct code was made difficult because there were far too many different organizations owning the dependencies that led to the crypto routines themselves. This made me uncomfortable, given how much money these routines are responsible for securing.

In this post I’m going to justify these opinions by walking you through a casual skim of MetaMask’s code and crypto dependencies. To make you feel “like you were there”, I’m going to discuss the review more or less in the order it occurred — including the embarrassing part where I went off the rails and reviewed the wrong code.

If you enjoy reading crypto code for entertainment, this post might be for you. If you’re hoping that this will actually lead to anything exciting, I suspect you’ll be very disappointed.

Quick explainer: what is MetaMask?

If you already know what MetaMask does, skip this part.

Since many readers have probably never used DeFi, I figure I should give a quick background here on “web3” and MetaMask in particular.

From a technical perspective, web3 is pretty straightforward. It consists of many “decentralized apps” (dapps), each of which comprises a (typically Javascript) front-end web app, as well as some back-end business logic. What makes dapps special is the back-end portion, which is decentralized. Generally this means it relies on a smart contract running in a network like Ethereum.

Web3 front-ends are just web apps, and typically they’re served to your browser via standard web infrastructure. (Security here usually means HTTPS, so anyone who hacks a server or steals a Cloudflare API key can change these apps’ code in malicious ways.)

Of course, web apps can’t communicate directly to blockchains, nor are they a good place to store private keys. This is why MetaMask exists. In its most popular instantiation, MetaMask ships as a browser extension for Chrome and Firefox. When a web application needs to send a transaction to a smart contract (e.g., because you want to deposit money into Aave), MetaMask is responsible for signing the transaction and shipping the transaction to the chain.

On the bright side for this review, the actual cryptography in MetaMask is fairly limited: as a wallet, it must generate and store Ethereum public and private keys and it also needs to handle “simple” operations like ECDSA signing. On the other hand: we’re in a browser. So nothing is actually simple.

Climbing the dependency ladder

To put some bounds on the effort required, I decided to only look at the implementation of ECDSA signing and key generation (excluding the curve operations.) This seemed like an easy task I could get done in an hour or two while eating an extended lunch.

Finding the starting point for a review like this was harder than I expected. In short: it isn’t easy to grok the control flow of a complex event-based Javascript browser extension to find out exactly which calls are “real” and which are tests or dead code. To shortcut this problem, my approach amounts to “grepping and hoping”, starting from the “develop” branch of MetaMask’s extension repository.

A quick search for “sign” leads us to a promising call in the file metamask-controller.js:

Sadly, following this call path quickly leads us into dependency hell.

First, we reach a library called eth-keyring-controller, which presumably manages Ethereum keyrings. A quick scan of that library shows it calling a second dependency: eth-sig-util. (Both are NPM packages developed by MetaMask.) We’ll jump right to the latter package, which… you guessed it, calls yet another package:

This call takes us out of MetaMask-owned code and out to a new package called “ethereum-jsutil” that’s maintained by the “Ethereum Javascript community“, whoever they are. (Don’t worry, we won’t stay here long enough to care.) Of course, in this package we find yet another layer of dependency indirection:

From here, we head over to a package called “js-ethereum-cryptography“, which holy cow is actually a repository maintained by the Ethereum project itself!

At this point I’m about halfway into my allotted review time and asking myself questions like “why didn’t MetaMask just call this library directly” and “why do I make poor life choices”. But never mind, we’re almost there: surely we are going to see some actual crypto code soon!

Except no, we are not. This is what we find in the Ethereum library:

Yet another dependency: this time to something called “noble“. Here my quick-and-dirty approach to dependency resolution (Google “npm <package-name>”) fails me, since NPM says that “noble” is a “Node.js Bluetooth Low Energy library.” As cool as that would be — access the blockchain through Bluetooth! — I’m guessing this is not right.

A bit more searching leads me to believe that noble is actually the Noble cryptography library, which appears to have been developed by Paul Miller. And hey, this code looks promising! The page lists actual cryptography design goals that seem reasonable, and the code is written in TypeScript. Even better: the library has been subject to an audit by Cure53.

Nonetheless, and with no disrespect to Paul or anyone else in this community: I would like to take a moment to complain about this dependency structure:

  1. Resolving all these pointless dependencies has eaten up a lot more of my review time than you might imagine. (I’m leaving out all the times I accidentally visited the wrong libraries because they used some combination of “js” and “ethereum” and “cryptography” in their names; just Googling is risky here.)
  2. More substantively, I can’t help but notice that there are a lot of code owners in this critical path. So far I’ve traveled through repositories owned by four different organizations, and the last one is someone’s personal Github account. Is this normal for a system that secures billions of dollars of user funds? Maybe it should not be.

But enough complaining. There is actual crypto code here! We can finally look at ECDSA.

Except… it turns out that we can’t do that, because I made a stupid mistake.

After speaking with Paul Miller on Twitter, I learned that this whole code review has been premised on a very foolish assumption: namely that the code in the main (development) branches of MetaMask and its dependencies is actually the code people are using in MetaMask today. That turns out to be wrong. In fact, Paul tells me, the noble library is slated for deployment in an upcoming release. The current release of MetaMask relies on a library called “elliptic“, which was written by Fedor Indutny.

I’m now scraping up those last bits of cheese in the burrito bowl.

Let’s look at elliptic

Ok, so forget everything I did above. That was my mistake. I promise to come back to those newer code paths soon, but that’s for the future. My goal in this review was to look at MetaMask as it exists today. Apparently that means I need to look at Fedor’s elliptic library.

(Note: if I was being professional then what I would really do is review all the release branches of MetaMask and all of its dependencies just to make sure I’m in the right place, and to see what the calls looked like. But life is too short: I’m going to trust Paul.)

The elliptic library is written in “plain old” Javascript, so the lack of types tends to make things confusing. On the bright side, it’s relatively compact. The core library supports Ed25519 (for EdDSA) and secp256k1 for ECDSA on networks like Ethereum. Let’s focus on the latter.

I’m not a Javascript developer so certain things aggravate me about reviewing this code. One is that developers often put critical routines in stupidly-named files like “index.js”, which is where we finally reach the core signing implementation for ECDSA.

Signing

ECDSA is a stupid algorithm, but fortunately it’s not a very complicated one. Leaving aside the curve operations, there are basically two places where ECDSA implementations tend to go wrong. The first is in key generation. The other is in signing.

In ECDSA signing the main security risk is in how nonces are generated: it’s critical that the ECDSA nonce (“k”) is sampled uniformly from the range 1 … n-1, where n is the group order. Just as important, the same nonce must never be used with two distinct messages (really, message hashes.) If this ever happens, it’s possible to recover the private key from the two signatures, something that’s generally frowned upon.
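To make the risk concrete, here is a minimal sketch of the key-recovery attack, in toy TypeScript of my own (not code from elliptic or any wallet): given two signatures (r, s1) and (r, s2) over different message hashes that were produced with the same nonce, the private key falls right out of the algebra.

```typescript
// Toy illustration: recover an ECDSA private key from two signatures
// (r, s1) and (r, s2) over hashes h1 and h2 that reused the same nonce k.
// All arithmetic is modulo n, the group order.

const mod = (a: bigint, n: bigint): bigint => ((a % n) + n) % n;

// Modular inverse via the extended Euclidean algorithm.
function modInv(a: bigint, n: bigint): bigint {
  let [t, newT] = [0n, 1n];
  let [r, newR] = [n, mod(a, n)];
  while (newR !== 0n) {
    const q = r / newR;
    [t, newT] = [newT, t - q * newT];
    [r, newR] = [newR, r - q * newR];
  }
  return mod(t, n);
}

// Since s = k^-1 * (h + r*d) mod n, a shared k gives:
//   k = (h1 - h2) / (s1 - s2) mod n
//   d = (s1*k - h1) / r       mod n
function recoverPrivateKey(
  n: bigint, r: bigint,
  s1: bigint, h1: bigint,
  s2: bigint, h2: bigint
): bigint {
  const k = mod((h1 - h2) * modInv(s1 - s2, n), n);
  return mod((s1 * k - h1) * modInv(r, n), n);
}
```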

Let’s look at how signing is handled in this library:

So looking at the code above, how do we get “k“?

A first observation is that there is no actual randomness here, no call to a random number generator. Instead, the signing routine instantiates a deterministic random bit generator (DRBG) based on HMAC. That algorithm stretches a shorter “seed” into a longer sequence of pseudorandom bits that can then be used to obtain our random nonce.

I initially had a small heart attack when I (briefly!) misread the code and thought that the only seed for the DRBG was the ECDSA private key: this would definitely lead to nonce re-use. On closer inspection, the DRBG is seeded with two fields: entropy is set to the ECDSA private key, and nonce is set to the (hashed) message to be signed. Within the underlying DRBG implementation both values get hashed into a value called seed, which actually is the seed for the DRBG. This design should mean that each (private key, message) pair will get different random bits, and thus different nonces “k”.

While I don’t love a purely deterministic generation of “k” (and I’ll explain why below), this isn’t some roll-your-own idea: in fact it appears very similar to the approach recommended by RFC 6979. And going one step farther, it appears that elliptic is actually directly implementing an algorithm from that RFC, specifically the one in section 3.3, “Alternate Description of the Generation of k.” Annoyingly I did not learn this fact from the code itself, which would have saved me a lot of time. I realized it only while reading through the RFC for unrelated reasons (in fairness, elliptic’s README does mention it.)

(A brief aside to developers: if you implement a standard algorithm, please please add a reference in comments at the point where you’re using it, and comment each step so reviewers can see exactly how it maps to the pseudocode. For a good example of how to do this, take a look at how noble implemented the same standard.)

So are there any issues with this implementation?

A particularly weird feature is that the caller of the sign() function can pass in a data structure called options. One element of this structure appears to be a function, which is named (not very descriptively) options.k. If included in the options structure, this function appears to override the built-in nonce generation and replaces it with logic the caller provides. This is weird. While I can see an argument for allowing callers to provide extra entropy, I can’t see a good reason why a caller should be allowed to entirely override nonce generation within the signing routine.

Overall, this seems like a giant footgun and also a good opportunity for some caller to slip in malicious code that could easily slip past a review. (Indeed, thanks to all this dependency confusion I’m not even sure which of MetaMask’s dependencies is calling this routine.)

As a weird added bonus, the same options struct can also be passed in the enc argument as well; if this happens, the structure will be copied over into options. I’m sure there are perfectly good reasons for this.

A few smaller notes:

  • The function _truncateToN() seems poorly defined and serves two slightly different purposes, triggered by a boolean flag — which is just omitted in some calls.
  • In the first mode it simply truncates the message to ensure it has the same bit length as the group order n. This mode is only used to truncate nonces, which is not something that should ever need to happen — except when you’re using the crazy options.k function discussed above.
  • When called with “truncOnly = false“, however, _truncateToN() will also ensure that the truncated result is less than or equal to n, and if not it will subtract n. This appears to implement the bits2octets subroutine of RFC 6979, which is fine I guess but took too long to figure out. Why not just write two routines and name them appropriately? (A rough sketch of what that could look like follows below.)

Neither of these two issues is critical, but they made the code harder to read.
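For what it’s worth, here is roughly what I mean, sketched with hypothetical names of my own (this is not elliptic’s code): one routine that does the pure truncation, and a second one that also performs the reduction below n.

```typescript
// Hypothetical replacements for a single dual-purpose _truncateToN().

// bits2int-style step: keep only the leftmost bitLength(n) bits of the hash.
function truncateToGroupBitLength(hash: bigint, hashBits: number, n: bigint): bigint {
  const delta = hashBits - n.toString(2).length;
  return delta > 0 ? hash >> BigInt(delta) : hash;
}

// bits2octets-style step: truncate, then bring the result into [0, n-1].
function truncateAndReduceModN(hash: bigint, hashBits: number, n: bigint): bigint {
  const z = truncateToGroupBitLength(hash, hashBits, n);
  return z >= n ? z - n : z;
}
```

Two tiny functions with unambiguous names, and a reviewer can see at a glance which RFC 6979 subroutine each one corresponds to.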

As a final note: I would urge developers to avoid these purely deterministic ECDSA implementations, mostly because they make signing implementations very fragile. Since all of the “entropy” in this signing routine comes from the message and private key, this means that any non-determinism in the actual ECDSA implementation (e.g., caused by weird option flags or glitches in BigNum encoding) can potentially produce exploitable signatures if the same message is ever signed twice. (As Nadia points out: this could also happen if you install the same private key into two different wallet implementations and sign the same message.)

In either case: adding some genuine entropy to the nonce could remove these concerns in most cases. Section 3.6 of RFC 6979 describes how to do this by adding an additional random input.
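To make that suggestion concrete, here is a rough sketch of the overall pattern: derive the nonce deterministically from the private key and message hash, but fold in optional fresh randomness in the spirit of RFC 6979 section 3.6. To be clear about what is mine: the function names are hypothetical, and this is neither elliptic’s implementation nor the exact RFC 6979 HMAC-DRBG loop; it only illustrates the structure.

```typescript
import { createHmac, randomBytes } from "crypto";

// Illustrative only: derive an ECDSA nonce k from (privateKey, msgHash),
// optionally mixing in extra entropy as RFC 6979 section 3.6 suggests.
// This is NOT the exact RFC 6979 HMAC-DRBG construction.
function deriveNonce(
  privateKey: Buffer,    // 32-byte scalar
  msgHash: Buffer,       // 32-byte message hash
  groupOrder: bigint,    // curve order n
  extraEntropy?: Buffer  // optional randomness, per RFC 6979 section 3.6
): bigint {
  let counter = 0;
  for (;;) {
    // Secret key, message, optional randomness, and a retry counter all get
    // folded into a single HMAC evaluation.
    const mac = createHmac("sha256", privateKey)
      .update(msgHash)
      .update(extraEntropy ?? Buffer.alloc(0))
      .update(Buffer.from([counter & 0xff]))
      .digest();
    const k = BigInt("0x" + mac.toString("hex"));
    // Rejection sampling: only accept candidates in [1, n-1].
    if (k >= 1n && k < groupOrder) return k;
    counter++;
  }
}

// Purely deterministic (RFC 6979 style):  deriveNonce(priv, hash, n)
// With added randomness (the mitigation):  deriveNonce(priv, hash, n, randomBytes(32))
```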

Key generation

Update (1/14): the comments below discuss key generation in elliptic, but MetaMask uses a BIP39-based keyphrase, which means that most ECDSA keys are actually generated elsewhere (allegedly in bitcoinjs/bip39). In retrospect this should have been obvious to me, but thanks to @kumavis_ for pointing it out. I’m keeping the section below just to be complete, but this code (probably) isn’t used by MetaMask.

The last stop on this very brief tour is the ECDSA key generation algorithm. This can be found slightly higher up in the same file:

Once again we have an appearance from our old friend HMAC-DRBG. As with the signing routine, the caller can supply an options structure. Fortunately this one only contains optional entropy, and does not allow the caller to completely override key generation. Even better: the input to HMAC-DRBG appears to include actual randomness, drawn from a call to a function called rand().

Does rand() produce actual random numbers in all browsers? I think so, but really could not tell you for certain. This routine is implemented by a package called indutny/brorand, the entirety of which I’m pasting below, just because it made me burst into maniacal laughter:

Let’s just assume the entropy is good. The remainder of the (private) key generation function takes place in these lines:

This generates a random number priv in the range [0…2^256 – 1] and then makes sure that priv is not greater than the group order n, in which case it re-generates priv. It then adds 1, which I think gives us a priv in the range [1…n+1], which seems wrong, since it should be in the range [1…n-1]. Am I doing my arithmetic backwards?

A basic point to make here is that much of this review has come down to examining code that solves a single problem: (deterministically) sampling integers within a precise range. So far I’ve reviewed at least two different custom implementations of the same process, both with slightly different results. Why is the same code repeated so often? Just make this a subroutine so we can analyze it and be sure it’s working correctly.
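Here is the kind of shared helper I have in mind, again with hypothetical names and no relation to elliptic’s actual code: a single rejection-sampling routine that returns a uniform scalar in [1, n-1], which could then back both key generation and (randomized) nonce generation.

```typescript
import { randomBytes } from "crypto";

// Sample a uniformly random integer in [1, n-1] by rejection sampling.
// Out-of-range candidates are thrown away and re-drawn, so there is no
// modular-reduction bias and no "+1" arithmetic to second-guess.
function randomScalar(n: bigint): bigint {
  const byteLen = Math.ceil(n.toString(2).length / 8);
  for (;;) {
    const candidate = BigInt("0x" + randomBytes(byteLen).toString("hex"));
    if (candidate >= 1n && candidate < n) return candidate;
  }
}

// Usage (hypothetical): both private keys and randomized nonces come from here.
// const priv = randomScalar(secp256k1Order);
```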

Anyway the (possible) issues above probably don’t really matter. I’ll assume in most cases private keys are generated correctly using a secure random number generator, which takes the most obvious risks off the table.

Conclusions

Ok, I am basically just exhausted now. This was a lot of work to evaluate a tiny piece of a crypto library that, frankly, might not even be the actual crypto library that MetaMask is using. I’m not sure if any of this was worth it, and I’m starting to get indigestion from the Chipotle.

If I could summarize my overall findings above, they would look something like this:

  1. I do not like reviewing Javascript for many reasons, not least of which is the lack of types. Fortunately many MetaMask dependencies are written in TypeScript. This library should be too. (Fortunately Noble appears to have made this upgrade.)
  2. The stupid dependency chain between MetaMask and its crypto libraries makes reviews more difficult than they need to be, and adds too many parties that need to be trusted. This should be greatly simplified, unless there is a reason for it that I don’t understand.
  3. The crypto code may be well-written in a cryptographic sense, but it was not really optimized for humans to review. This makes any review much harder, and I think anything that makes reviews harder is bad for users.
  4. Allowing callers to specify dangerous optional arguments to crypto routines seems bad.
  5. If you’re writing a security-critical routine like “sample random integer in a precise range,” write that routine one time, not multiple times. Please!
  6. Too much dependency on random numbers can make ECDSA dangerous. I would argue that too much determinism can be just as risky. It might be good to find a solution that mixes the two, as discussed further above.

If it was up to me I would re-write the entire codebase in TypeScript and would try to use more standard libraries. I would remove layers of dependencies. I would tighten up the crypto APIs to make sure malicious calls are harder to get away with. Finally, I would make sure every single major block in this code had crystal-clear comments explaining precisely what it is doing.

As a last point, I would move all of this code back under the control of some more centralized organization(s), rather than leaving essential code in random personal repositories.

In summary: I don’t think this tiny review was entirely a waste of time. Although I don’t love everything about the way this code works, I’m now 15% more confident that MetaMask isn’t doing something utterly stupid with my cryptographic keys. That seems like a genuine win.

Next post: I’m going to review the noble-cryptography libraries to see what the new MetaMask code is planning to do.

Thinking about “traceability”

A few weeks back, the messaging service WhatsApp sued the Indian government over new legislation that could undermine its end-to-end encryption (E2EE) software. The legislation requires, among other things, that social media and messaging companies must include the ability to “trace” the source of harmful viral content.

This tracing capability has been a major issue in India due to several cases of misinformation content that led to brutal mob attacks. The ostensible goal of the new legislation is to make it possible for police to track down those who originate or disseminate this content. Put simply, what the authorities say they want is a means to identify a piece of content (for example, a video or a meme) that has gone to a large group of people, and then trace the content back to the WhatsApp account that originally sent it.

I don’t plan to weigh in on whether this policy is a good idea or viable on the merits, nor is it in my wheelhouse to say whether the Indian government is being forthright in their reasons for demanding this capability. (I will express a very grave degree of skepticism that this approach will catch any criminal who is remotely sophisticated and takes steps to cover their tracks.) In this post I mostly want to talk about the technology implications for encrypted messaging services, and what tracing features might mean for end-to-end encrypted systems like WhatsApp.

Why is content tracing hard in the first place?

A first thing to keep in mind is that content tracing is not really a natural feature for a messaging system. That is, tracing content poses a challenge even for completely unencrypted messaging apps: the kind where service providers can see all data traveling through the system. Tracing back to a content originator requires that the provider be able to identify a file received at some end-user’s account, and then “chase” the content backwards through time through each user account that forwarded it. Even this description doesn’t quite capture the difficulty of the problem: not every user literally hits a “forward” button to send content onward. Many will save and re-upload a file, which breaks the forwarding chain and can even produce a slightly different file — thanks to the magic of digital compression.

These problems aren’t intractable. In a system with no end-to-end encryption, they could perhaps be solved using perceptual hash functions that identify similar media files by constructing a “fingerprint” of each file that can easily be compared against other files, and can survive re-encoding or minor edits. (This is the same technology that’s used in detecting child sexual abuse imagery, something I wrote about here.) What’s important is that even with this technology, the obvious approach to content tracing requires the provider to have plaintext access to (at least the hashes of) user content.

This turns out to be a big problem for encrypted communication systems like WhatsApp, in which end-to-end encryption protects the confidentiality of content even from the gaze of the service provider. In WhatsApp, all messages (as well as file attachments) are encrypted directly from sender to recipient, using an encryption key that WhatsApp doesn’t possess. With a few engineering caveats,* tracing content in these systems is very difficult.

But difficult is not the same thing as impossible. A recent post by WhatsApp makes the case that tracing is fundamentally impossible to implement securely in an end-to-end encrypted system. While this claim seems intuitively correct, it’s also kind of unsatisfying. After all, “impossible” is a strong word, and it’s highly dependent on which assumptions you’re making. The problem with imprecise claims is that they invite arguments — and indeed WhatsApp’s claim has already been disputed by some in the field.

In this post I’m going to consider the following simple question: is traceability in end-to-end encrypted systems actually possible? And if so, what are the costs to privacy and security? For the record: I’m writing this post as much to answer the question for myself as to answer it for anyone else, so don’t expect this to be short or simple. I’m working things out as I go along.

Let’s start with the most basic question.

What is “traceability” in an end-to-end encrypted system?

The biggest problem with the debate over content tracing is that nobody seems to have a terribly rigorous definition of what, precisely, an end-to-end encrypted tracing scheme is supposed to do — or, just as important, what its limits will be. To understand whether these systems can be built, we need to think hard about what they’re supposed to do in the first place.

From the perspective of law enforcement, content tracing is a simple feature. A piece of “viral” content (say an image file) has been widely distributed through a messaging platform, and authorities have decided that this content is harmful. Since the content is widespread, we can assume that police have received a copy of the file at an account they control (e.g., their own accounts, or the account of a cooperating user.) The authorities wish to contact the service provider and ask them for the originator and/or major spreaders of the content. This gives us our first requirement:

  • Traceability: given a piece of “viral” content received by a device/account (plus any cryptographic keys at the device), a tracing scheme should be able to reliably trace the content back to the originator (or at least, earlier) accounts.

From the user’s side, E2EE systems are supposed to maintain the confidentiality of user communications. Confidentiality is a broad term and can mean a lot of things. In this case it has two specific flavors that are relevant, with names that I just made up now:

I wanted to illustrate this post with memes about the Swedish monarchy. Unfortunately, it turns out that Swedish Monarchy memes basically suck.
  • The confidentiality of the content itself: this ensures that forwarded files are known only to the sender and authorized recipient(s) of a conversation. Notice that for viral content, this property isn’t terribly important. Remember: our assumption is that the content itself has ultimately been forwarded widely, until (nearly) everyone has received a copy.
  • The confidentiality of “who sent what”: while the content itself may not be secret, the fact that a given user transmitted a piece of content is still quite sensitive. If I send you a political meme — perhaps the one at right, poking fun at the King of Sweden — then I might not care very much about the secrecy of the meme itself. But I sure might want to hide the fact that I sent it, to avoid retribution by a totalitarian Swedish government.** Proper end-to-end encryption is supposed to protect this sort of expression.

In short: traceability can really screw with the “who sent what” side of content confidentiality. It is a fairly harmless thing for, say, the tyrannical Swedish government to learn that specific memes about the King of Sweden exist. It is very different for them to know that I’ve been sending a lot of them to a specific group of friends.

Finally I need to clarify one more thing, since discussions with colleagues have made me realize that it is not obvious. Information revealed about “who sent what” in an E2E system is not the same as metadata. I feel stupid having to point this out, but metadata (information about data that we can’t easily hide from providers, such as the list of contacts you’ve communicated with) is a very different thing. WhatsApp might inevitably learn that I texted 500 people last month because they delivered my (encrypted) messages. They still shouldn’t learn that any of my messages are making fun of the Swedish monarchy.

So can traceability be accomplished without breaking E2E?

It really depends what you mean by “traceability” and what you mean by “breaking.”

While confidentiality and traceability may seem like they’re in conflict, it’s important to point out that some forms of tracing can be implemented in a non-coercive way that does not inherently violate confidentiality. For example, imagine Alice originates a meme, and this meme subsequently makes its way to police officer Eve via the following forwarding path:

Provided that Bob, Charlie and Dave are willing to cooperate with the police, then Eve can use shoe-leather detective work to trace the content backwards towards Alice. After all: each participant (1) is an authorized recipient of the data and (2) knows who they received the content from. Nobody is “breaking” E2E if they perform this sort of cooperative tracing: it’s just people sharing information they’re already entitled to have.

It’s now time to say a stupid and obvious thing: what’s being proposed in India is not cooperative tracing.

Let’s be clear: if detective work and cooperation were sufficient to trace the originators of harmful content, the police wouldn’t be asking for new encryption laws, and WhatsApp wouldn’t be suing the Indian government.

By passing these laws, police are tacitly admitting that voluntary content tracing is not sufficient to meet their investigative needs. This implies that when police try to follow a chain like the one shown above, they’re running into people who are either (1) unwilling to share this information in a timely way, or (2) simply don’t have the information anymore — maybe because they deleted the messages or lost their phone.

Let’s draw a picture of this situation. Below, each node represents a WhatsApp account, with the red node being the originator of some viral content, and the blue node representing police. Green nodes represent users who are willing to cooperate with the police, provided they are contacted. Here the gray nodes are users who won’t cooperate — either because they didn’t keep the information, or maybe because they just don’t like police.

Police (blue) can try to cooperatively trace content backwards through this forwarding graph by talking to cooperative users (green), but they’ll never reach the originator (red) because there are too many non-cooperating nodes (gray) in the way.

The prevalence of uncooperative nodes in the above graph makes it virtually impossible for cooperative tracing to find the originator. It seems obvious that real-world situations like this will make voluntary tracing very difficult to achieve.

This brings us to the central challenge of all content tracing proposals so far: to make tracing possible, a tracing system needs to turn every WhatsApp user (including the originator) into a cooperative green circle — regardless of whether users actually want to cooperate with police. Moreover, to prevent users from losing their phones and/or going offline, the system will need to force users to place the necessary tracing information into escrow as soon as they receive content, so it will remain available even if users leave the network or lose their phones.

Not only that, but each of these newly “cooperative” users might even be forced to admit to police that they also forwarded the content. Don’t want to tell the Swedish government that you made fun of their beloved King? Then you’d better not use a system that follows this pattern of enforced traceability.

How do we force users to cooperate?

The major questions facing an end-to-end tracing system are twofold: (1) precisely how much information are recipients going to be forced to reveal against their will, and (2) who will be able to access this information?

There are at least two proposals that I’ve seen for adding traceability to E2EE communications schemes, and both start from similar assumptions. They both rely on making changes to the end-users’ client software to ensure that tracing information is stored in “escrow” at the provider every time content is sent from one user to another.

One proposal is academic, and it takes something like the following “strawman” approach:

  1. Each time someone sends content to another user, they will generate some fresh encryption key K.
  2. They will use this key to encrypt a record that contains (A) the content (or a hash of it) and (B) the sender and receiver identities. They will store the encrypted record on WhatsApp’s servers as a kind of “key escrow.” Critically, at this point they will not send WhatsApp the key K.
  3. The sender will transmit the record encryption key K to its recipient, using end-to-end encryption.
  4. When the next user forwards the same content on to another user, it will repeat steps (1-4) and it will also send all the keys generated by previous users.

Now if the police receive a copy of some viral content on an account they control, they will have a list of encryption keys that correspond to everyone in the forwarding chain for that content. They can just go back to WhatsApp with a subpoena, obtain the encrypted records, and use the chain of keys to decrypt them. This will reveal the entire forwarding path back to the originator.

Alice sends a message to Bob and “escrows” some encrypted information on WhatsApp’s servers. She sends the encryption key to Bob. Bob forwards to Eve and “escrows” similar information, sending the key to Eve. Eve (a police officer) can now use her key to decrypt the records stored at WhatsApp until she learns who Alice is.

Of course sending thousands of keys along with each forwarded message is kind of a drag, so there are some efficiency optimizations one can use to compress this information. For example, each time a user forwards a message they can store the previous user’s encryption key inside the encrypted record they escrow with WhatsApp. That means if police get one key — corresponding to the last record in a chain — they can decrypt the escrow record, and then they will obtain the key for the previous record in the chain. They can repeat this process until the entire forwarding chain is “unzipped”.
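To make the “unzipping” idea a bit more concrete, here is a toy sketch of what a client might escrow at each forwarding hop. The record format, the field names, and the choice of AES-GCM are all mine, invented for illustration; this is not the Tyagi, Miers and Ristenpart construction and not anything WhatsApp actually implements.

```typescript
import { randomBytes, createCipheriv } from "crypto";

// Toy escrow record for one forwarding hop (strawman, illustrative only).
interface EscrowRecord {
  ciphertext: Buffer; // encrypts {sender, receiver, contentHash, prevHopKey}
  iv: Buffer;
  tag: Buffer;
}

// Called by a client each time it forwards a piece of content. Returns the
// record to upload to the provider, plus the fresh key K that gets sent
// (end-to-end encrypted) to the next recipient.
function escrowForwardingHop(
  sender: string,
  receiver: string,
  contentHash: Buffer,
  prevHopKey: Buffer | null // the key received from whoever sent us the content
): { record: EscrowRecord; nextHopKey: Buffer } {
  const nextHopKey = randomBytes(32); // fresh K for this hop
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", nextHopKey, iv);
  const plaintext = Buffer.from(
    JSON.stringify({
      sender,
      receiver,
      contentHash: contentHash.toString("hex"),
      // Embedding the previous hop's key is what lets one subpoenaed key
      // "unzip" the entire chain back to the originator.
      prevHopKey: prevHopKey ? prevHopKey.toString("hex") : null,
    })
  );
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { record: { ciphertext, iv, tag: cipher.getAuthTag() }, nextHopKey };
}
```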

A lazy diagram (at right) shows how this process might work with three participants. Essentially the whole thing is a form of key escrow, with WhatsApp acting as the escrow authority. If the police get included in any chain at all, they (Eve in this diagram) can subpoena WhatsApp to trace the chain back to the originator.

Of course, this is a very simple strawman explanation of the ideas: for a more fully-specified (academic) proposal, you can see this paper by Tyagi, Miers and Ristenpart. Not only does it support path traceback, but it also lets you figure out who else the message was forwarded to! The cryptography is a bit more optimized, but the security guarantees are roughly the same.

A second proposal by Dr V. Kamakoti of IIT Madras is far simpler: it essentially requires each person who originates new content into the network (as opposed to forwarding it) to attach a “watermark” to the content that identifies the account ID of the sender. This also assumes a trustworthy WhatsApp client, of course. Presumably that watermark could be encrypted using a key stored at WhatsApp, so this tracing will at least require the provider’s involvement.

Ok, so what’s wrong with these traceability proposals?

Well, if you’re ok with the fact that police can determine the identity of every single person who forwarded a piece of viral content, regardless of whether or not they’re the originator of that content, then, I guess, nothing.

That’s the essence of what the Tyagi, Miers and Ristenpart proposal offers, and frankly I’m not particularly ok with it. Even if I accepted the logic that we should have the means to trace “content originators” — the actual justification governments have offered for building systems like this one — I surely would not want to reveal every random user account that happened to forward the content. That seems like a recipe for persecuting innocent people.

Moreover, regardless of whether the system finds “content originators” or just “everyone on the forwarding path”, I think these ideas are pretty much synonymous with mass surveillance — and certainly they buttress WhatsApp’s technical claim that “traceability” breaks end-to-end encryption.

But I want to go just a little farther, and point out that these ideas have a major weakness that makes the entire approach feel confused.

The approaches I describe above rely on a critical assumption: that all participants in the system are going to behave honestly — that is, everyone will run the official WhatsApp client, which will contain logic designed to store an escrow record on WhatsApp’s servers. Nobody in this system will try to bypass this system by running an unofficial client, or by hacking their client to disable the escrow logic.

If you’re willing to make such a strong assumption, why bother with the complicated Tyagi, Miers and Ristenpart proposal? Why not just use the Kamakoti proposal: modify your WhatsApp client to add a small “watermark” to each fresh non-forwarded media file. After all: once you’ve assumed that everyone is running an honest client, you can assume that the content originator will be too — can’t you? This approach would still reveal a lot of information to police, but it wouldn’t reveal the identity of every random person who forwarded the content.

My guess is that Tyagi, Miers and Ristenpart have an answer to this that boils down to something like “maybe you can’t trust the originator to run the correct client, but loads of other people will be running it.” To me this invites a much more detailed discussion about what security assumptions you’re making, and how “bad” the bad guys really are.

One last note to academic authors: don’t help bad people build unrestricted surveillance systems and then punt “preventing abuse” to later papers, ok?

If you read this far to answer the (rarified) question of how traceability could work and whether it breaks E2E encryption, then you can stop here. The rest of this post is not about that. It’s just me alienating a whole bunch of my academic peers.

Here is what I want to say to them.

The debate around key escrow and law enforcement surveillance is a very hard one. People have a lot of opinions about whether this work is “helping the good guys” or “helping the bad guys”, i.e., whether it’s about helping police find criminals, or whether it’s going to build the infrastructure for authoritarianism and spying. I don’t know the answer to this. I suppose the answer depends to some extent on the integrity of the government(s) that are implementing them. I have opinions, but I don’t expect all of my colleagues to share them.

What I would ask my colleagues to think hard about is the following:

When you propose (or review a paper that proposes) a new “lawful access” system, is it solving the hard problems, or is it punting on the hard problems and solving only the easy ones?

Because at the end of the day, building systems that violate the confidentiality of E2E encryption is a relatively easy problem from a scientific perspective. We’ve known how to build key escrow systems from the earliest days of encryption. Building these systems is not interesting, scientifically. It is useful from an engineering perspective, of course — to parties who want to deploy such systems. When respected academics write such papers, it is also politically useful to those same parties.

What is scientifically interesting is whether we can build systems that actually prevent abuse, either by governments that misuse the technology or by criminals who steal keys. We don’t really know how to do that very well right now. This is the actual scientific problem of designing law enforcement access systems — not the “access” part, which is relatively easy and mostly consists of engineering. In short, the scientific problem is figuring out how to prevent the wrong type of access.

When I read a paper that builds a sophisticated surveillance system, I expect it to address those abuse problems in a meaningful way. If the paper punts the important problems to subsequent work — if what I get is a paragraph like the one at right — my thinking is that you aren’t solving the right problem. You’re just laying the engineering groundwork for a world I don’t want my kids to live in. I would politely ask you all to stop doing that.

Notes:

* Some messaging systems implement attachment forwarding by passing a pointer to an existing file that is stored on their servers. This is a nice storage optimization, since it avoids the need to make and store a full duplicate copy of each object whenever the user hits “forward”. The downside of this approach is that it makes tracing relatively easy for providers, since they can see exactly which users accessed a given file. Such optimizations are inimical to private systems and really should be avoided.

** All claims about the Swedish government are fictionalized.

A case against security nihilism

This week a group of global newspapers is running a series of articles detailing abuses of NSO Group’s Pegasus spyware. If you haven’t seen any of these articles, they’re worth reading — and likely will continue to be so as more revelations leak out. The impetus for the stories is a leak comprising more than 50,000 phone numbers that are allegedly the targets of NSO’s advanced iPhone/Android malware.

Notably, these targets include journalists and members of various nations’ political opposition parties — in other words, precisely the people who every thinking person worried would be the target of the mass-exploitation software that NSO sells. And indeed, that should be the biggest lesson of these stories: the bad thing everyone said would happen now has.

This is a technical blog, so I won’t advocate for, say, sanctioning NSO Group or demanding answers from the luminaries on NSO’s “governance and compliance” committee. Instead I want to talk a bit about some of the technical lessons we’ve learned from these leaks — and even more at a high level, precisely what’s wrong with shrugging these attacks away.

We should all want perfect security!

Don’t feel bad, targeted attacks are super hard!

A perverse reaction I’ve seen from some security experts is to shrug and say “there’s no such thing as perfect security.” More concretely, some folks argue, this kind of well-resourced targeted attack is fundamentally impossible to prevent — no matter how much effort companies like Apple put into stopping it.

And at the extremes, this argument is not wrong. NSO isn’t some script-kiddy toy. Deploying it costs hundreds of thousands of dollars, and fighting attackers with that level of resources is always difficult. Plus, the argument goes, even if we raise the bar for NSO, someone with even more resources will find their way into the gap — perhaps charging an even more absurd price. So let’s stop crapping on Apple, a company that works hard to improve the baseline security of their products, just because they’re failing to solve an impossible problem.

Still that doesn’t mean today’s version of those products are doing everything they could be to stop attacks. There is certainly more that corporations like Apple and Google could be doing to protect their users. However, the only way we’re going to get those changes is if we demand them.

Not all vectors are created equal

Because spyware is hard to capture, we don’t know precisely how Pegasus works. The forensic details we do have come from an extensive investigation conducted by Amnesty International’s technology group. They describe a sophisticated infection process that proved capable of infecting a fully-patched iPhone 12 running the latest version of Apple’s iOS (14.6).

Many attacks used “network injection” to redirect the victim to a malicious website. That technique requires some control of the local network, which makes it hard to deploy to remote users in other countries. A more worrying set of attacks appear to use Apple’s iMessage to perform “0-click” exploitation of iOS devices. Using this vector, NSO simply “throws” a targeted exploit payload at some Apple ID such as your phone number, and then sits back and waits for your zombie phone to contact its infrastructure.

This is really bad. While cynics are probably correct (for now) that we can’t shut down every avenue for compromise, there’s good reason to believe we can close down a vector for 0-interaction compromise. And we should try to do that.

What can we do to make NSO’s life harder?

What we do know is that these attacks take advantage of fundamental weaknesses in Apple iMessage: most critically, the fact that iMessage will gleefully parse all sorts of complex data received from random strangers, and will do that parsing using crappy libraries written in memory-unsafe languages. These issues are hard to fix, since iMessage can accept so many data formats and has been allowed to sprout so much complexity over the past few years.

There is good evidence that Apple realizes the bind they’re in, since they tried to fix iMessage by barricading it behind a specialized “firewall” called BlastDoor. But firewalls haven’t been particularly successful at preventing targeted network attacks, and there’s no reason to think that BlastDoor will do much better. (Indeed, we know it’s probably not doing its job now.)

Adding a firewall is the cheap solution to the problem, and this is probably why Apple chose this as their first line of defense. But actually closing this security hole is going to require a lot more. Apple will have to re-write most of the iMessage codebase in some memory-safe language, along with many system libraries that handle data parsing. They’ll also need to widely deploy ARM mitigations like PAC and MTE in order to make exploitation harder. All of this work has costs and (more importantly) risks associated with it — since activating these features can break all sorts of things, and people with a billion devices can’t afford to have .001% of them crashing every day.

An entirely separate area is surveillance and detection: Apple already performs some remote telemetry to detect processes doing weird things. This kind of telemetry could be expanded as much as possible while not destroying user privacy. While this wouldn’t necessarily stop NSO, it would make the cost of throwing these exploits quite a bit higher — and make them think twice before pushing them out to every random authoritarian government.

It’s the scale, stupid

Critics are correct that fixing these issues won’t stop exploits. The problem that companies like Apple need to solve is not preventing exploits forever, but a much simpler one: they need to screw up the economics of NSO-style mass exploitation.

Targeted exploits have been around forever. What makes NSO special is not that they have some exploits. Rather: NSO’s genius is that they’ve done something that attackers were never incentivized to do in the past: democratize access to exploit technology. In other words, they’ve done precisely what every “smart” tech business is supposed to do: take something difficult and very expensive, and make it more accessible by applying the magic of scale. NSO is basically the SpaceX of surveillance.

But this scalability is not inevitable.

NSO can afford to maintain a 50,000 number target list because the exploits they use hit a particular “sweet spot” where the risk of losing an exploit chain — combined with the cost of developing new ones — is low enough that they can deploy them at scale. That’s why they’re willing to hand out exploitation to every idiot dictator — because right now they think they can keep the business going even if Amnesty International or CitizenLab occasionally catches them targeting some human rights lawyer.

But companies like Apple and Google can raise both the cost and risk of exploitation — not just everywhere, but at least on specific channels like iMessage. This could make NSO’s scaling model much harder to maintain. A world where only a handful of very rich governments can launch exploits (under very careful vetting and controlled circumstances) isn’t a great world, but it’s better than a world where any tin-pot authoritarian can cut a check to NSO and surveil their political opposition or some random journalist.

So how do we get to that world?

In a perfect world, US and European governments would wake up and realize that arming authoritarianism really is bad for democracy — and that whatever trivial benefit they get from NSO is vastly outweighed by the very real damage this technology is doing to journalism and democratic governance worldwide.

But I’m not holding my breath for that to happen.

In the world I inhabit, I’m hoping that Ivan Krstić wakes up tomorrow and tells his bosses he wants to put NSO out of business. And I’m hoping that his bosses say “great: here’s a blank check.” Maybe they’ll succeed and maybe they’ll fail, but I’ll bet they can at least make NSO’s life interesting.

But Apple isn’t going to do any of this if they don’t think they have to, and they won’t think they have to if people aren’t calling for their heads. The only people who can fix Apple devices are Apple (very much by their own design) and that means Apple has to feel responsible each time an innocent victim gets pwned while using an Apple device. If we simply pat Apple on the head and say “gosh, targeted attacks are hard, it’s not your fault” then this is exactly the level of security we should expect to get — and we’ll deserve it.

Why the FBI can’t get your browsing history from Apple iCloud (and other scary stories)

It’s not every day that I wake up thinking about how people back up their web browsers. Mostly this is because I don’t feel the need to back up any aspect of my browsing. Some people lovingly maintain huge libraries of bookmarks and use fancy online services to organize them. I pay for one of those because I aspire to be that kind of person, but I’ve never been organized enough to use it.

In fact, the only thing I want from my browser is for my history to please go away, preferably as quickly as possible. My browser is a part of my brain, and backing my thoughts up to a cloud provider is the most invasive thing I can imagine. Plus, I’m constantly imagining how I’ll explain specific searches to the FBI.

All of these thoughts are apropos a Twitter thread I saw last night from the Engineering Director on Chrome Security & Privacy at Google, which explains why “browser sync” features (across several platforms) can’t provide end-to-end encryption by default.

This thread sent me down a rabbit hole that ended in a series of highly-scientific Twitter polls and frantic scouring of various providers’ documentation. Because while on the one hand Justin’s statement is mostly true, it’s also a bit wrong. Specifically, I learned that Apple really seems to have solved this problem. More interestingly, the specific way that Apple has addressed this problem highlights some strange assumptions that make this whole area unnecessarily messy.

This munging of expectations also helps to explain why “browser sync” features and the related security tradeoffs seem so alien and horrible to me, while other folks think these are an absolute necessity for survival.

Let’s start with the basics.

What is cloud-based browser “sync”, and how secure is it?

Most web browsers (and operating systems with a built-in browser) incorporate some means of “synchronizing” browsing history and bookmarks. By starting with this terminology we’ve already put ourselves on the back foot, since “synchronize” munges together three slightly different concepts:

  1. Synchronizing content across devices. Where, for example, you have a phone, a laptop and a tablet all active and in occasional use, and want your data to propagate from one to the others.
  2. Backing up your content. Wherein you lose all your device(s) and need to recover this data onto a fresh clean device.
  3. Logging into random computers. If you switch computers regularly (for example, back when we worked in offices) then you might want to be able to quickly download your data from the cloud.

(Note that the third case is kind of weird. It might be a subcase of #1 if you have another device that’s active and can send you the data. It might be a subcase of #2. I hate this one and am sending it to live on a farm upstate.)

You might ask why I call these concepts “very different” when they all seem quite similar. The answer is that I’m thinking about a very specific question: namely, how hard is it to end-to-end encrypt this data so that the cloud provider can’t read it? The answer is different between (at least) the first two cases.

If what we really want to do is synchronize your data across many active devices, then the crypto problem is relatively easy. The devices generate public keys and register them with your cloud provider, and then each one simply encrypts relevant content to the others. Apple has (I believe) begun to implement this across their device ecosystem.
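As a toy illustration of that first case (and only an illustration: this is not Apple’s actual protocol, and every name below is made up), here is a sketch in which the sending device wraps a fresh content key to each sibling device’s registered X25519 public key, so the provider only ever stores ciphertext.

```typescript
import {
  generateKeyPairSync, diffieHellman, createCipheriv,
  createHash, randomBytes, KeyObject,
} from "crypto";

// Toy multi-device sync: wrap a per-record content key to each registered
// device public key. (Illustrative only; not any real sync protocol.)
function wrapForDevice(
  senderPriv: KeyObject,
  devicePub: KeyObject,
  contentKey: Buffer
): { iv: Buffer; wrapped: Buffer; tag: Buffer } {
  // X25519 shared secret, hashed down to an AES-256 key-encryption key.
  const shared = diffieHellman({ privateKey: senderPriv, publicKey: devicePub });
  const kek = createHash("sha256").update(shared).digest();
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", kek, iv);
  const wrapped = Buffer.concat([cipher.update(contentKey), cipher.final()]);
  return { iv, wrapped, tag: cipher.getAuthTag() };
}

// Example: a laptop syncing a history record to a phone and a tablet.
const laptop = generateKeyPairSync("x25519");
const phone = generateKeyPairSync("x25519");
const tablet = generateKeyPairSync("x25519");

const contentKey = randomBytes(32); // encrypts the actual history record
const envelopes = [phone, tablet].map((d) =>
  wrapForDevice(laptop.privateKey, d.publicKey, contentKey)
);
// The provider stores only `envelopes` plus the encrypted record; without a
// device private key it cannot recover `contentKey`.
```

A real design would also need device registration, key rotation, and some way to authenticate the device public keys, which is where most of the actual complexity lives.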

If what we want is cloud backup, however, then the problem is much more challenging. Since the base assumption is that the device(s) might get lost, we can’t store decryption keys there. We could encrypt the data under the user’s device passcode or something, but most users choose terrible passcodes that are trivially subject to dictionary attacks. Services like Apple iCloud and Google (Android) have begun to deploy trusted hardware in their data centers to mitigate this: these “Hardware Security Modules” (HSMs) store encryption keys for each user, and only allow a limited number of password guesses before they wipe the keys forever. This keeps providers and hackers out of your stuff. Yay!

Except: not yay! Because, as Justin points out (and here I’m paraphrasing in my own words) users are the absolute worst. Not only do they choose lousy passcodes, but they constantly forget them. And when they forget their passcode and can’t get their backups, do they blame themselves? Of course not! They blame Justin. Or rather, they complain loudly to their cloud backup providers.

While this might sound like an extreme characterization, remember: when you have a billion users, the extreme ones will show up quite a bit.

The consequence of this, argues Justin, is that most cloud backup services don’t enable end-to-end encryption by default for browser synchronization, and hence your bookmarks (and, in this case, your browsing history) will be stored at your provider in plaintext. Justin’s point is that this decision flows from the typical user’s expectations and is not something providers have much discretion about.

And if that means your browsing history happens to get data-mined, well: the spice must flow.

Except none of this is quite true, thanks to Apple!

The interesting thing is that this explanation isn’t quite right. I was inclined to believe it, until I went spelunking through the Apple iCloud security docs and found that Apple does things slightly differently.

(Note that I don’t mean to blame Justin for not knowing this. The problem here is that Apple absolutely sucks at communicating their security features to an audience that isn’t obsessed with reading their technical documentation. My students and I happen to be obsessive, and sometimes it pays dividends.)

What I learned from my exploration (and here I pray the documentation is accurate) is that Apple actually does seem to provide end-to-end encryption for browser data. Or more specifically: they provide end-to-end encryption for browser history data, as of iOS 13.


More concretely, Apple claims that this data is protected “with a passcode”, and that “nobody else but you can read this data.” Presumably this means Apple is using their iCloud Keychain HSMs to store the necessary keys, in a way that Apple itself can’t access.

What’s interesting about the Apple decision is that it appears to explicitly separate browsing history and bookmarks, rather than lumping them into a single take-it-or-leave-it package. Apple doesn’t claim to provide any end-to-end encryption guarantees whatsoever for bookmarks: presumably someone who resets your iCloud account password can get those. But your browsing history is protected in a way that even Apple won’t be able to access, in case the FBI show up with a subpoena.

That seems like a big deal and I’m surprised that it’s gotten so little attention.

Why should browser history be lumped together with bookmarks?

This question gets at the heart of why I think browser synchronization is such an alien concept. From my perspective, browsing history is an incredibly sensitive and personal thing that I don’t want anywhere. Bookmarks, if I actually used them, would be the sort of thing I’d want to preserve.

I can see the case for keeping history on my local devices. It makes autocomplete faster, and it’s nice to find that page I browsed yesterday. I can see the case for (securely) synchronizing history across my active devices. But backing it up to the cloud in case my devices all get stolen? Come on. This is like the difference between backing up my photo library, and attaching a GoPro to my head while I’m using the bathroom.

(And Google’s “sync” service only stores 90 days of history, so it isn’t even a long-term backup.)

One cynical answer to this question is: these two very different forms of data are lumped together because one of them — browser history — is extremely valuable for advertising companies. The other one is valuable to consumers. So lumping them together gets consumers to hand over the sweet, sweet data in exchange for something they want. This might sound critical, but on the other hand, we’re just describing the financial incentive that we know drives most of today’s Internet.

A less cynical answer is that consumers really want to preserve their browsing history. When I asked on Twitter, a bunch of tech folks noted that they use their browsing history as an ad-hoc bookmarking system. This all seemed to make some sense, and so maybe there’s just something I don’t get about browser history.

However, the important thing to keep in mind here is that just because you do this doesn’t mean it should drive a few billion people’s security posture. The implication of prioritizing the availability of browser history backups (as a default) is that vast numbers of people will essentially have their entire history uploaded to the cloud, where it can be accessed by hackers, police and surveillance agencies.

Apple seems to have made a different calculation: not that history isn’t valuable, but that it isn’t a good idea to hold the detailed browser history of a billion human beings in a place where any two-bit police agency or hacker can access it. I have a very hard time faulting them in that.

And if that means a few users get upset, that seems like a good tradeoff to me.

Ok Google: please publish your DKIM secret keys

The Internet is a dangerous place in the best of times. Sometimes Internet engineers find ways to mitigate the worst of these threats, and sometimes they fail. Every now and then, however, a major Internet company finds a solution that actually makes the situation worse for just about everyone. Today I want to talk about one of those cases, and how a big company like Google might be able to lead the way in fixing it.

This post is about the situation with Domain Keys Identified Mail (DKIM), a harmless little spam protocol that has somehow become a monster. My request is simple and can be summarized as follows:

Dear Google: would you mind rotating and publishing your DKIM secret keys on a periodic basis? This would make the entire Internet quite a bit more secure, by removing a strong incentive for criminals to steal and leak emails. The fix would cost you basically nothing, and would remove a powerful tool from the hands of thieves.

That’s the short version. Read on for the long.

What the heck is DKIM, and how does it protect my email?

Electronic mail was created back in the days when the Internet was still the ARPANET. This was a gentler time, when modern security measures — and frankly, even the notion that the Internet would need security — were a distant, science-fiction future.

The early email protocols (like SMTP) work on the honor system. Emails can arrive at your mail server directly from a sender’s mail server, or they can pass through intermediaries. In either case, when an email says it comes from your friend Alice, you trust that it comes from Alice. What possible reason would there be for anyone to lie?

The mainstream adoption of email showed that this attitude was pretty badly miscalibrated. In the space of a few years, Internet users discovered that there were plenty of people who would lie about who they were. Most of these were email spammers, who were thrilled that SMTP allowed them to impersonate just about any sender — your friend Alice, your boss, the IRS, a friendly Nigerian prince. Without a reliable way to verify who actually sent a message, email proved hilariously vulnerable to spoofing.

To their great credit, email providers quickly realized that email without sender authenticity was basically unworkable. To actually filter emails properly, they needed to be able to verify at least which server an email had originated from. This property has a technical name; it’s called origin authenticity.

The solution to the origin authenticity problem, like all fixes to core Internet protocols, is kind of a band-aid. Email providers were asked to bolt on an (optional) new cryptographic extension called Domain Keys Identified Mail, or DKIM. DKIM bakes digital signatures into every email sent by a participating mail server. When a recipient mail server obtains a DKIM-signed email claiming to originate from, say, Google: it first looks up Google’s public key using the Domain Name System (DNS). The recipient can now verify a signature to ensure that the email is authentic and unmodified — the signature covers the content and most of the headers — and they can then use that knowledge as an input to spam filtering. (A related protocol called ARC offers a similar guarantee.)
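
If you want to poke at this yourself, the sketch below shows roughly what a receiving server does, using the dkimpy and dnspython packages. The file path, domain and selector are placeholders: in practice the selector and domain come from the s= and d= tags of the message’s DKIM-Signature header.

```python
import dkim           # pip install dkimpy
import dns.resolver   # pip install dnspython

raw_email = open("incoming.eml", "rb").read()   # a saved message; path is hypothetical

# The public key lives in DNS at <selector>._domainkey.<domain>, as a TXT record
# of the form "v=DKIM1; k=rsa; p=<base64 RSA public key>". Substitute real values:
try:
    for record in dns.resolver.resolve("selector._domainkey.example.com", "TXT"):
        print(record)
except dns.resolver.NXDOMAIN:
    print("placeholder selector/domain; use the s= and d= tags from a real message")

# dkimpy does the header parsing, DNS lookup and signature check in one call.
print("verified" if dkim.verify(raw_email) else "signature missing or invalid")
```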

Of course, this approach isn’t perfect. Since DKIM is optional, malicious intermediaries can “strip off” the DKIM signatures from a given email in an effort to convince recipients that the email was never DKIM-signed. A related protocol, DMARC, uses DNS to allow mail senders to broadcast preferences that enforce the checking of signatures on their email messages. These two protocols, when used together, should basically wipe out spoofing on the Internet.

Example DKIM signature on some automated email I received today.

What’s the problem with DKIM/ARC/DMARC, and what’s “deniability”?

As an anti-spam measure there’s nothing really wrong with DKIM, ARC and DMARC. The problem is that DKIM signing has an unexpected side effect that goes beyond its initial spam-filtering purpose. Put simply:

DKIM provides a life-long guarantee of email authenticity that anyone can use to cryptographically verify the authenticity of stolen emails, even years after they were sent.

This new non-repudiation feature was not part of DKIM’s design goals. The designers didn’t intend it, nobody discussed whether it was a good idea, and it seems to have largely taken them by surprise. Worse, this surprise feature has some serious implications: it makes us all more vulnerable to extortion and blackmail.

To see the problem, it helps to review the goals of DKIM.

The key design goal of DKIM was to prevent spammers from forging emails while in transit. This means that receiving servers really do need to be able to verify that an email was sent by the claimed origin mail server, even if that email transits multiple untrusted servers on the way.

However, once mail transmission is complete, the task of DKIM is complete. In short: the authenticity guarantee only needs to hold for a short period of time. Since emails generally take only a few minutes (or hours, in unusual cases) to reach their destination, the authenticity guarantee does not need to last for years, nor do these signatures need to be exposed to users. And yet here we are.

Until a few years ago, nobody thought much about this. In fact, early DKIM configurations were kind of a joke: mail providers chose DKIM signing keys that were trivial for motivated attackers to crack. Back in 2012 a security researcher named Zachary Harris pointed out that Google and several other companies were using 512-bit RSA to sign DKIM messages. He showed that these keys could be “cracked” in a matter of hours on rented cloud hardware, and then used these keys to forge emails from Larry and Sergey.

Providers like Google reacted to the whole “Larry and Sergey” embarrassment in the way you’d expect. Without giving the implications any serious thought, they quickly ramped up their keys to 1024-bit or 2048-bit RSA. This stopped the forgeries, but inadvertently turned a harmless anti-spam protocol into a life-long cryptographic authenticity stamp — one that can be used to verify the provenance of any email dump, regardless of how it reaches the verifier.

You’re crazy. Nobody uses DKIM to authenticate emails.

For better or for worse, the DKIM authenticity stamp has been widely used by the press, primarily in the context of political email hacks. It’s real, it’s important, and it’s meaningful.

The most famous example is also one of the most divisive: back in 2016, Wikileaks published a batch of emails stolen from John Podesta’s Google account. Since the sourcing of these emails was murky, WikiLeaks faced a high burden in proving to readers that these messages were actually authentic. DKIM provided an elegant solution: every email presented on Wikileaks’ pages publicly states the verification status of the attached DKIM signatures, something you can see today. The site also provides a helpful resource page for journalists, explaining how DKIM proves that the emails are real.

But the Podesta emails weren’t the end of the DKIM story. In 2017, ProPublica used DKIM to verify the authenticity of emails allegedly sent to a critic by President Trump’s personal lawyer Marc Kasowitz. In 2018, the Associated Press used it once again to verify leaked emails tying a Russian lawyer to Donald Trump Jr. And it happened again this year, when the recipients of an alleged “Hunter Biden laptop” provided a single 2015 email to Rob Graham for DKIM verification, in an effort to overcome journalistic skepticism at the sourcing of their information.

Some will say that DKIM verification doesn’t matter, and that leaked emails will be believed or rejected based on their content alone. Yet multiple news organizations’ decision to rely on DKIM clearly demonstrates how wrong that assumption is. News organizations, including Wikileaks, implicitly admit that questionable sourcing of emails creates doubt: perhaps so much so that credible publication in a national news organization is impossible. DKIM short-circuits this problem, and allows anyone to leap right over that hurdle.

The Associated Press even provides a DKIM verification tool.

In short, an anti-spam protocol originally designed to provide short-lived authenticity for emails traveling between email servers has mutated — without any discussion or consent from commercial mail customers — into a tool that provides cryptographically undeniable authentication of every email in your inbox and outbox. This is an amazing resource for journalists, hackers, and blackmailers.

But it doesn’t benefit you.

So what can we do about this?

DKIM was never intended to provide long-lived authenticity for your emails. The security guarantees it provides are important, but need only exist for a period of hours (perhaps days on the outside) from the moment a mail server transmits your email. The fact that DKIM can be used to prove authenticity of stolen email from as long ago as 2015 is basically a screwup: the result of misuse and misconfiguration by mail providers who should know better.

Fortunately there’s an easy solution.

DKIM allows providers to periodically “rotate”, or replace, the keys that they use to sign outgoing emails. The frequency of this rotation is slightly limited by the caching behavior of the DNS infrastructure, but these limits aren’t very tight. Even a big provider like Google can easily replace its signing keys at least every few weeks without disrupting email transit. This sort of key replacement is good practice in any case, and it represents part of the solution.

Of course, merely replacing DKIM keypairs does nothing by itself: smart people on the Internet routinely archive DKIM public keys. This is, in fact, how a 2015 Google email was verified in 2020: the key that Google used to sign DKIM emails during that long-ago time period (a single key was used from 2012-2016, seriously Google, this is just malpractice!) is no longer in use, but has been cached in various places on the Internet.

Solving this problem requires only a small additional element: Google needs to publish the secret key portion of its keypairs once they’ve been rotated out of service. They should post this secret key in an easily-accessible public location so that anyone can use it to forge alleged historical emails from any Google user. The public availability of Google’s signing key would make any new email leak cryptographically deniable. Since any stranger would have been able to forge a DKIM signature, DKIM signatures become largely worthless as evidence of authenticity.
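
Here is a hedged sketch of what rotate-then-publish could look like, using Python’s cryptography package. The selector naming, the example domain and the final “retired key” are all placeholders rather than anyone’s production setup, and a real deployment would push the TXT record through a DNS provider’s API. The point is how little machinery is involved.

```python
import base64
import datetime
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# 1. Generate the next signing key.
new_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# 2. Publish the *public* half as a DKIM TXT record under a fresh selector.
pub = new_key.public_key().public_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PublicFormat.SubjectPublicKeyInfo)
selector = datetime.date.today().strftime("%Y%m%d")
print(f'{selector}._domainkey.example.com. TXT "v=DKIM1; k=rsa; p={base64.b64encode(pub).decode()}"')

# 3. Once a key has been out of service long enough for DNS caches to expire,
#    publish its *private* half so old signatures lose their evidentiary value.
#    (The key below is a freshly generated stand-in for last month's retired key.)
retired_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
print(retired_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption()).decode())
```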

(People who run their own mail servers can achieve the same thing automatically by using this awesome script.)

Google could launch the process right now by releasing its ancient 2016-era private keys. Since the secrecy of these serves literally no security purpose at this point, except for allowing third parties to verify email leaks, there’s no case for keeping these values secret at all. Just dump them.

(A paranoid reader might also consider the possibility that motivated attackers might already have stolen Google’s historical secret DKIM keys. After all, DKIM signing keys aren’t exactly the crown jewels of Google’s ecosystem, and are unlikely to receive Google’s strongest security efforts. By keeping its keys secret in that case, Google is simply creating a situation where specific actors can counterfeit emails with impunity.)

But DKIM authenticity is great! Don’t we want to be able to authenticate politicians’ leaked emails?

Modern DKIM deployments are problematic because they incentivize a specific kind of crime: theft of private emails for use in public blackmail and extortion campaigns. An accident of the past few years is that this feature has been used primarily by political actors working in a manner that many people find agreeable — either because it suits a partisan preference, or because the people who got “caught” sort of deserved it.

But bad things happen to good people too. If you build a mechanism that incentivizes crime, sooner or later you will get crimed on.

Email providers like Google have made the decision, often without asking their customers, that anyone who guesses a customer’s email password — or phishes one of a company’s employees — should be granted a cryptographically undeniable proof that they can present to anyone in order to prove that the resulting proceeds of that crime are authentic. Maybe that proof will prove unnecessary for the criminals’ purposes. But it certainly isn’t without value. Removing that proof from criminal hands is an unalloyed good.

The fact that we even have to argue about this makes me really sad.

Timestamping, better crypto, and other technical objections

Every time I mention this idea of dumping old secret keys, I get a batch of technical objections from people who make really good points but are thinking about a stronger threat model than the one we generally face.

The most common objection is that publication of secret keys only works if the signed emails were obtained after the secret keys were published. By this logic, which is accurate, any emails stolen and published before the DKIM secret key is published are not deniable. For example, if someone hacks your account and immediately publishes any email you receive in real-time, then cryptographic verifiability is still possible.

Some even pose a clever attack where recipients (or hackers who have persistent access to your email account) use a public timestamping service, such as a blockchain, to verifiably “stamp” each email they receive with the time of receipt. This allows such recipients to prove that they had the signed email before the DKIM secret key became public — and checkmate.

This is a nice theoretical hack and it’s clever, but it’s also basically irrelevant in the sense that it addresses a stronger threat model. The most critical issue with DKIM today is that DKIM signatures sit around inside your archived mailbox. This means that a hacker who breaches my Gmail account today can present DKIM signatures on emails I sent/received years ago. Publishing old DKIM secret keys would take care of this problem instantly. Fixing the theoretical “real-time hacker” scenario can wait until later.

Another objection I receive sometimes is that cryptographic authentication is a useful feature. And I agree with that, in some settings. The problem with DKIM is that no customers asked for this feature as a default in their commercial mail account. If people want to cryptographically authenticate their emails, there exists a lovely set of tools they can use for this purpose.

Finally, there’s a question of whether the DKIM deniability problem can be fixed with exciting new cryptography. As a cryptographic researcher, I’m enthusiastic about that direction. In fact: my co-authors Mike Specter and Sunoo Park and I recently wrote a paper about how a long-term fix to DKIM might work. (Mike wrote a great blog post about it.) I don’t claim that our solution is necessarily the best approach, but I’m hoping that it will inspire more work in the future.

But sometimes the best solution is the obvious one. And right now Google, as the largest commercial mail provider, could make a huge impact (and protect their customers against future leaks) by doing something very simple. It’s a mystery to me why they won’t.

Attack of the week: Voice calls in LTE

I haven’t written an “attack of the week” post in a while, and it’s been bumming me out. This is not because there’s been a lack of attacks, but mostly because there hasn’t been an attack on something sufficiently widely-used that it can rouse me out of my blogging torpor.

But today brings a beautiful attack called ReVoLTE, on a set of protocols that I particularly love to see get broken: namely, cellular protocols. And specifically, the (voice over) LTE standards. I’m excited about these particular protocols — and this new attack — because it’s so rare to see actual cellular protocols and implementations get broken. This is mostly because these standards are developed in smoke-filled rooms and written up in 12,000 page documents that researchers never have the energy to deal with. Moreover, implementing the attacks requires researchers to mess with gnarly radio protocols.

And so, serious cryptographic vulnerabilities can spread all over the world, presumably only exploited by governments, before a researcher actually takes a look at them. But every now and then there’s an exception, and today’s attack is one of them.

The attack itself is by David Rupprecht, Katharina Kohls, Thorsten Holz, and Christina Pöpper at RUB and NYU Abu Dhabi. It’s a lovely key re-installation attack on a voice protocol that you’re probably already using, assuming you’re one of the older generation who still make phone calls using a cellular phone.

Let’s start with some background.

What is LTE, and what is VoLTE?

The basis for our modern cellular telephony standards began in Europe back in the 1980s, with a standard known as Global System for Mobile Communications (GSM). GSM was the first major digital cellular telephony standard, and it introduced a number of revolutionary features such as the use of encryption to protect phone calls. Early GSM was designed primarily for voice communications, although data could be sent over the air at some expense.

As data became more central to cellular communications, the Long Term Evolution (LTE) standards were devised to streamline this type of communication. LTE builds on a group of older standards such as GSM, EDGE and HSPA to make data communication much faster. There’s a lot of branding and misbranding in this area, but the TL;DR is that LTE is a data communications system that serves as a bridge between older packet data protocols and future 5G cellular data technologies.

Of course, history tells us that once you have enough (IP) bandwidth, concepts like “voice” and “data” start to blur together. The same is true with modern cellular protocols. To make this transition smoother, the LTE standards define Voice-over-LTE (VoLTE), which is an IP-based standard for transmitting voice calls directly over the data plane of the LTE system, bypassing the circuit-switched portion of the cellular network entirely. As with standard VoIP calls, VoLTE calls can be terminated by the cellular provider and connected to the normal phone network. Or (as is increasingly common) they can be routed directly from one cellular customer to another, even across providers.

Like standard VoIP, VoLTE is based on two popular IP-based protocols: Session Initiation Protocol (SIP) for call establishment, and Real Time Transport Protocol (which should be called RTTP but is actually called RTP) to actually handle voice data. VoLTE also adds some additional bandwidth optimizations, such as header compression.

Ok, what does this have to do with encryption?

Like GSM before it, LTE has a standard set of cryptographic protocols for encrypting packets while they travel over the air. These are mainly designed to protect your data while it travels between your handset (called the “User Equipment”, or UE) and the cellular tower (or wherever your provider decides to terminate the connection.) This is because cellular providers view outside eavesdroppers as the enemy, not themselves. Obviously.

(However, the fact that VoLTE connections can occur directly between customers on different provider networks does mean that the VoLTE protocol itself has some additional and optional encryption protocols that can happen at higher network layers. These aren’t relevant to the current paper except insofar as they could screw things up. We’ll talk about them briefly further below.)

Historical GSM encryption had many weaknesses: bad ciphers, protocols where only the handset authenticated itself to the tower (meaning an attacker could impersonate a tower, giving rise to the “Stingray“) and so on. LTE fixed many of the obvious bugs, while keeping a lot of the same structure.

Let’s start with the encryption itself. Assuming key establishment has already happened — and we’ll talk about that in just a minute — each data packet is encrypted in a stream-cipher mode built around a cipher called “EEA” (which in practice can be implemented with things like AES). The encryption mechanism is basically CTR-mode, as shown below:

Basic encryption algorithm for VoLTE packets (source: ReVoLTE paper). EEA is a cipher, “COUNT” is a 32-bit counter, “BEARER” is a unique session identifier that separates VoLTE connections from normal internet traffic, and “DIRECTION” indicates whether the traffic is going from UE to tower or vice-versa.
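
To make the structure concrete, here is a rough approximation of this construction using AES in counter mode. The exact bit layout of the LTE input block differs from what’s shown below; the only point that matters is that the keystream is a deterministic function of the key, COUNT, BEARER and DIRECTION, so repeating those inputs repeats the keystream.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def eea_like_encrypt(key: bytes, count: int, bearer: int, direction: int, plaintext: bytes) -> bytes:
    # Pack COUNT (32 bits), BEARER (5 bits) and DIRECTION (1 bit) into the
    # 16-byte initial counter block -- an approximation of the spec's layout.
    iv = count.to_bytes(4, "big") + bytes([(bearer << 3) | (direction << 2)]) + bytes(11)
    # CTR mode: AES turns (key, iv, block index) into a keystream, which is
    # simply XORed with the plaintext. Nothing here authenticates the packet.
    encryptor = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor()
    return encryptor.update(plaintext) + encryptor.finalize()

key = os.urandom(16)
packet = eea_like_encrypt(key, count=0, bearer=5, direction=0, plaintext=b"one voice frame")
# Encrypting anything else with the same (key, count, bearer, direction) would
# re-use exactly the same keystream -- which is the footgun discussed next.
print(packet.hex())
```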

Since the encryption algorithm itself (EEA) can be implemented using a strong cipher like AES, it’s unlikely that there’s any direct attack on the cipher itself, as there was back in the GSM days. However, even with a strong cipher, it’s obvious that this encryption scheme is a giant footgun waiting to go off.

CTR mode nonce re-use attacks were a thing when Poison was a thing.

Specifically: the LTE standard uses an (unauthenticated) stream cipher with a mode that will be devastatingly vulnerable should the counter — and other inputs, such as ‘bearer’ and ‘direction’ — ever be re-used. In modern parlance the term for this concept is “nonce re-use attack“, but the potential risks here are not modern. They’re well-known and ancient, going back to the days of hair-metal and even disco.

In fairness, the LTE standards say “don’t re-use these counters, please“. But the LTE standards are also like 7,000 pages long, and anyway, this is like begging toddlers not to play with a gun. Inevitably, they’re going to do that and terrible things will happen. In this case, the discharging gun is a keystream re-use attack in which two different confidential messages get XORed with the same keystream bytes. This is known to be utterly devastating for message confidentiality.

So what’s ReVoLTE?

The ReVoLTE attack paper points out that this highly vulnerable encryption construction is, in fact, misused by real equipment in the wild. Specifically, the authors analyze actual VoLTE calls made using commercial equipment, and show that they can exploit something called a “key re-installation attack”. (Much credit for the basic observation goes to Raza and Lu, who first pointed out the potential vulnerability. But the ReVoLTE research turns it into a practical attack.)

Let me give a quick overview of the attack here, although you should really read the paper.

You might assume that once LTE sets up a packet data connection, voice-over-LTE is just a question of routing voice packets over that connection alongside all of your other data traffic. In other words, VoLTE would be a concept that exists only above Layer 2. This isn’t precisely the case.

In fact, LTE’s data link layer introduces the concept of a “bearer“. Bearers are separate session identifiers that differentiate various kinds of packet traffic. Normal Internet traffic (your Twitter and Snapchat) goes over one bearer. SIP signalling for VoIP goes over another, and voice traffic packets are handled on a third. I don’t have much insight into the RF and network routing mechanisms of LTE, but I presume this is done because LTE networks want to enable quality of service mechanisms to ensure that these different packet flows are treated with different priority levels: i.e., your crummy TCP connections to Facebook can be prioritized at a lower level than your real-time voice calls.

This isn’t exactly a problem, but it raises an issue. Keys for LTE encryption are derived separately each time a new “bearer” is set up. In principle this should happen afresh each time you make a new phone call. This would result in a different encryption key for each call, thus eliminating the possibility that the same key will be re-used to encrypt two different sets of call packets. Indeed, the LTE standard says something like “you should use a different key each time you set up a new bearer to handle a new phone call.” But that doesn’t mean it happens.

In fact, in real implementations, two different calls that happen in close temporal proximity will end up using the exact same key — despite the fact that new (identically-named) bearers are configured between them. The only practical change that happens between those calls is that the encryption counter will reset back to zero. In the literature, this is sometimes called a key reinstallation attack. One can argue that this is basically an implementation error, although in this case the risks seem largely set up by the standard itself.

In practice, this attack leads to keystream re-use, where an attacker can obtain the encrypted packets C_1 = M_1 ⊕ KS and C_2 = M_2 ⊕ KS, which allows her to compute C_1 ⊕ C_2 = M_1 ⊕ M_2. Even better, if the attacker knows one of M_1 or M_2, she can immediately recover the other. This gives her a strong incentive to know one of the two plaintexts.
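
That arithmetic is short enough to demonstrate directly. The sketch below fakes a shared keystream with random bytes; no cipher is needed, because once two messages share a keystream the choice of cipher is irrelevant.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m1 = b"target call: the sensitive conversation "
m2 = b"attacker call: known filler audio frames"
ks = os.urandom(len(m1))           # the re-used keystream

c1, c2 = xor(m1, ks), xor(m2, ks)  # what the eavesdropper captures off the air

assert xor(c1, c2) == xor(m1, m2)  # ciphertext XOR leaks the plaintext XOR
print(xor(xor(c1, c2), m2))        # knowing m2 recovers m1 outright
```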

This brings us to the complete and most powerful attack scenario. Consider an attacker who can eavesdrop the radio connection between a target phone and the cellular tower, and who somehow gets “lucky enough” to record two different calls, where the second happens immediately after the first. Now imagine she can somehow guess the plaintext contents of one of the calls. In this eventuality, our attacker can completely decrypt the first call, using a simple XOR evaluation between the two sets of packets.

And of course, as it happens — luck has nothing to do with it. Since phones are designed to receive calls, an attacker who can eavesdrop that first call will be able to initiate a second call at exactly the moment the first call ends. This second call, should it re-use the same encryption key with a counter set back to zero, will enable plaintext recovery. Even better, since our attacker actually controls the data in the second call, she may be able to recover the contents of the first one — pending a whole lot of implementation-specific details all breaking in her favor.

Here’s a picture of the overall attack, taken from the paper:

Attack overview from the ReVoLTE paper. This diagram assumes that two different calls happen using the same key. The attacker controls a passive sniffer (top left) as well as a second handset that they can use to make a second call to the victim phone.

So does the attack actually work?

At one level, this is really the entire question for the ReVoLTE paper. All of the ideas above sound great in theory, but they leave a ton of questions. Such as:

  1. Is it feasible for (academic) researchers to actually sniff VoLTE connections?
  2. Do real LTE systems actually re-install keys?
  3. Can you actually initiate that second call quickly and reliably enough to make a handset and tower re-use a key?
  4. Even if systems do re-install keys, can you actually know the digital plaintext of the second call — given that things like codecs and transcoding may totally change the (bitwise) contents of that second call, even if you have access to the “bits” flowing out of your attacker phone?

The ReVoLTE paper answers several of these questions in the affirmative. The authors are able to use a commercial software-defined radio downlink sniffer called Airscope in order to eavesdrop the downlink side of a VoLTE call. (As is typical with academic research, I expect that simply getting hold of the software and figuring out how to work it took months off some poor graduate students’ lives.)

In order for key re-use to happen, the researchers discovered that a second call has to occur very rapidly after the termination of the first one, but not too rapidly — about ten seconds for the providers they experimented with. Fortunately, it doesn’t really matter if the target picks the call up within that time — the “ringing”, i.e., SIP communication itself causes the provider to re-use the same key.

Many of the gnarliest issues thus revolve around issue (4), obtaining all of the plaintext bits for the attacker-initiated call. This is because a lot of things can happen to your plaintext as it travels from your attacker handset out to the victim’s phone and through the cellular network. These include nastiness such as transcoding of encoded audio data, which makes the audio sound the same but totally changes the binary representation of the audio. LTE networks also use RTP header compression that can substantially change big portions of the RTP packet.

Finally, packets sent by the attacker need to roughly line up with packets that happened in the first phone call. This can be problematic, as silent patches in a phone call result in shorter messages (called comfort noise), which may not overlap well with the original call.

The “real world attack” section of the paper is worth reading for all the details. It addresses many of the above concerns — specifically, the authors find that some codecs are not transcoded, and that roughly 89% of the binary representation of the target call can be recovered, for at least two European providers that the attackers tested.

This is an astonishingly high level of success, and frankly much better than I anticipated when I started the paper.

So what can we do to fix this?

The immediate answer to this question is straightforward: since the vulnerability is a key re-use (re-installation) attack, just fix this attack. Make sure to derive a new key for each phone call, and never allow your packet counter to reset back to zero with the same key. Problem solved!

Or maybe not. Getting this right will require upgrading a lot of equipment, and frankly the fix itself isn’t terribly robust. It would be nice if standards could find a cleaner way to implement their encryption modes that isn’t instantly and catastrophically vulnerable to these nonce-reuse issues.

One possible direction is to use modes of encryption where nonce-misuse doesn’t result in catastrophic outcomes. This might be too expensive for some current hardware, but it’s certainly a direction that designers should be thinking about for the future, particularly as the 5G standards are about to take over the world.

This new result also raises a general question about why the same damned attacks keep cropping up in standard after standard, many of which use very similar designs and protocols. At a certain point when you’ve had this same key re-installation issue happen in multiple widely-deployed protocols such as WPA2, maybe it’s time to make your specifications and testing procedures more robust to it? Stop treating implementers as thoughtful partners who will pay attention to your warnings, and treat them as (inadvertent) adversaries who are inevitably going to implement everything incorrectly.

Or alternatively, we can do what the Facebooks and Apples of the world are increasingly doing: make voice call encryption happen at a higher level of the OSI network stack, without relying on cellular equipment manufacturers to get anything right. We could even promote end-to-end encryption of voice calls, as WhatsApp and Signal and FaceTime do, assuming the US government would just stop trying to trip us up. Then (with the exception of some metadata) many of these problems would go away. This solution is particularly pertinent in a world where governments aren’t even sure if they trust their equipment providers.

Alternatively, we could just do what our kids have already done: and just stop answering those annoying voice calls altogether.

Why is Signal asking users to set a PIN, or “A few thoughts on Secure Value Recovery”

Over the past several months, Signal has been rolling out a raft of new features to make its app more usable. One of those features has recently been raising a bit of controversy with users. This is a contact list backup feature based on a new system called Secure Value Recovery, or SVR. The SVR feature allows Signal to upload your contacts into Signal’s servers without — ostensibly — even Signal itself being able to access it.

The new Signal approach has created some trauma with security people, due to the fact that it was recently enabled without a particularly clear explanation. For a shorter summary of the issue, see this article. In this post, I want to delve a little bit deeper into why these decisions have made me so concerned, and what Signal is doing to try to mitigate those concerns.

What’s Signal, and why does it matter?

For those who aren’t familiar with it, Signal is an open-source app developed by Moxie Marlinspike’s Signal Technology Foundation. Signal has received a lot of love from the security community. There are basically two reasons for this. First: the Signal app has served as a sort of technology demo for the Signal Protocol, which is the fundamental underlying cryptography that powers popular apps like Facebook Messenger and WhatsApp, and all their billions of users.

Second, the Signal app itself is popular with security-minded people, mostly because the app, with its relatively smaller and more technical user base, has tended towards a no-compromises approach to the security experience. Wherever usability concerns have come into conflict with security, Signal has historically chosen the more cautious and safer approach — as compared to more commercial alternatives like WhatsApp. As a strategy for obtaining large-scale adoption, this is a lousy one. If your goal is to build a really secure messaging product, it’s very impressive.

Let me give an example.

Encrypted messengers like WhatsApp and Apple’s iMessage routinely back up your text message content and contact lists to remote cloud servers. These backups undo much of the strong security offered by end-to-end encryption — since they make it much easier for hackers and governments to obtain your plaintext content. You can disable these backups, but it’s surprisingly non-obvious to do it right (for me, at least). The larger services justify this backup default by pointing out that their less-technical users tend to be more worried about lost message history than by theoretical cloud hacks.

Signal, by contrast, has taken a much more cautious approach to backup. In June of this year, they finally added a way to manually transfer message history from one iPhone to another, and this transfer now involves scanning QR codes. For Android, cloud backup is possible, if users are willing to write down a thirty-digit encryption key. This is probably really annoying for many users, but it’s absolutely fantastic for security. Similarly, since Signal relies entirely on phone numbers in your contacts database (a point that, admittedly, many users hate), it never has to back up your contact lists to a server.

What’s changed recently is that Signal has begun to attract a larger user base. As users with traditional expectations enter the picture, they’ve been unhappy with Signal’s limitations. In response, the Signal developers have begun to explore ways by which they can offer these features without compromising security. This is just plain challenging, and I feel for the developers.

One area in which they’ve tried to square the circle is with their new solution for contacts backup: a system called “secure value recovery.”

What’s Secure Value Recovery?

Signal’s Secure Value Recovery (SVR) is a cloud-based system that allows users to store encrypted data on Signal’s servers — such that even Signal cannot access it — without the usability headaches that come from traditional encryption key management. At the moment, SVR is being used to store users’ contact lists and not message content, although that data may be on the menu for backup in the future.

The challenge in storing encrypted backup data is that strong encryption requires strong (or “high entropy”) cryptographic keys and passwords. Since most of us are terrible at selecting, let alone remembering strong passwords, this poses a challenging problem. Moreover, these keys can’t just be stored on your device — since the whole point of backup is to deal with lost devices.

The goal of SVR is to allow users to protect their data with much weaker passwords that humans can actually memorize, such as a 4-digit PIN. With traditional password-based encryption, such passwords would be completely insecure: a motivated attacker who obtained your encrypted data from the Signal servers could simply run a dictionary attack — trying all 10,000 such passwords in a few seconds or minutes, and thus obtaining your data.
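
To make that threat concrete, here is a toy version of the dictionary attack. This is not Signal’s encryption format, and the parameters are deliberately unrealistic (a tiny PBKDF2 iteration count so the demo finishes in seconds); a real system would use a slower, memory-hard KDF, which delays but does not prevent the attack when the password space is only 10,000 entries.

```python
import os
import hashlib
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def key_from_pin(pin: str, salt: bytes) -> bytes:
    # Iteration count kept absurdly low so the demo runs quickly.
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 1_000)

salt, nonce = os.urandom(16), os.urandom(12)
backup = AESGCM(key_from_pin("4921", salt)).encrypt(nonce, b"contact list goes here", None)

# The attacker's job, given only the ciphertext, salt and nonce: try every PIN.
for guess in (f"{i:04d}" for i in range(10_000)):
    try:
        plaintext = AESGCM(key_from_pin(guess, salt)).decrypt(nonce, backup, None)
        print("recovered PIN", guess, "and plaintext:", plaintext)
        break
    except InvalidTag:
        continue
```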

Signal’s SVR solves this problem in an age-old way: it introduces a computer that even Signal can’t hack. More specifically, Signal makes use of a new extension to Intel processors called Software Guard eXtensions, or SGX. SGX allows users to write programs, called “enclaves”, that run in a special virtualized processor mode. In this mode, enclaves are invisible to and untouchable by all other software on a computer, including the operating system. If storage is needed, enclaves can persistently store (or “seal”) data, such that any attempt to tamper with the program will render that data inaccessible. (Update: as a note, Signal’s SVR does not seal data persistently. I included this in the draft thinking that they did, but I misremembered this from the technology preview.)

Signal’s SVR deploys such an enclave program on the Signal servers. This program performs a simple function: for each user, it generates and stores a random 256-bit cryptographic secret “seed” along with a hash of the user’s PIN. When a user contacts the server, the client can present a hash of the user’s PIN to the enclave and ask for the cryptographic seed. If the hash matches what the enclave has stored, the server delivers the secret seed to the client, which can mix it together with the PIN. The result is a cryptographically strong encryption key that can be used to encrypt or decrypt backup data. (Update: thanks to Dino Dai Zovi for correcting some details in here.)

The key to this approach is that the encryption key now depends on both the user’s password and a strong cryptographic secret stored by an SGX enclave on the server. If SGX does its job, then even a user who hacks into the Signal servers — and here we include the Signal developers themselves, perhaps operating under duress — will be unable to retrieve this user’s secret value. The only way to access the backup encryption key is to actually run the enclave program and enter the user’s hashed PIN. To prevent brute-force guessing, the enclave keeps track of the number of incorrect PIN-entry attempts, and will only allow a limited number before it locks that user’s account entirely.
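
Putting the last two paragraphs together, here is a toy model of the flow, written as ordinary Python rather than enclave code. This is emphatically not Signal’s implementation: the retry limit, the PIN hashing, and the HKDF-based key derivation are stand-ins chosen for clarity, and in the real system the enclave state is invisible to the rest of the server.

```python
import os
import hashlib
import hmac
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

class ToyEnclave:
    MAX_TRIES = 10

    def __init__(self):
        self.users = {}   # user_id -> (pin_hash, seed, remaining_tries)

    def register(self, user_id: str, pin_hash: bytes):
        self.users[user_id] = (pin_hash, os.urandom(32), self.MAX_TRIES)

    def recover_seed(self, user_id: str, pin_hash: bytes) -> bytes:
        stored_hash, seed, tries = self.users[user_id]
        if tries == 0:
            raise PermissionError("account locked: too many bad guesses")
        if not hmac.compare_digest(stored_hash, pin_hash):
            self.users[user_id] = (stored_hash, seed, tries - 1)
            raise ValueError("wrong PIN")
        return seed

def backup_key(pin: str, seed: bytes) -> bytes:
    # Client-side: mix the low-entropy PIN with the enclave's 256-bit seed.
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=seed, info=b"backup").derive(pin.encode())

enclave = ToyEnclave()
enclave.register("alice", hashlib.sha256(b"9512").digest())
seed = enclave.recover_seed("alice", hashlib.sha256(b"9512").digest())
print(backup_key("9512", seed).hex())   # a strong key, despite the 4-digit PIN
```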

This is an elegant approach, and it’s conceptually quite similar to systems already deployed by Apple and Google, who use dedicated Hardware Security Modules to implement the trusted component, rather than SGX.

The key weakness of the SVR approach is that it depends strongly on the security and integrity of SGX execution. As we’ll discuss in just a moment, SGX does not exactly have a spotless record.

What happens if SGX isn’t secure?

Anytime you encounter a system that relies fundamentally on the trustworthiness of some component — particularly a component that exists in commodity hardware — your first question should be: “what happens if that component isn’t actually trustworthy?”

With SVR that question takes on a great deal of relevance.

Let’s step back. Recall that the goal of SVR is to ensure three things:

  1. The backup encryption key is based, at least in part, on the user’s chosen password. Strong passwords mean strong encryption keys.
  2. Even with a weak password, the encryption key will still have cryptographic strength. This comes from the integration of a random seed that gets chosen and stored by SGX.
  3. No attacker will be able to brute-force their way through the password space. This is enforced by SGX via guessing limits.

Example of a high-entropy passphrase (from this random manual). Please don’t use this as your Signal password.

Note that only the first goal is really enforced by cryptography itself. And this goal will only be achieved if the user selects a strong (high-entropy) password. For an example of what that looks like, see the picture at right.

The remaining goals rely entirely on the integrity of SGX. So let’s play devil’s advocate, and think about what happens to SVR if SGX is not secure.

If an attacker is able to dump the memory space of a running Signal SGX enclave, they’ll be able to expose secret seed values as well as user password hashes. With those values in hand, attackers can run a basic offline dictionary attack to recover the user’s backup keys and passphrase. The difficulty of completing this attack depends entirely on the strength of a user’s password. If it’s a BIP39 phrase, you’ll be fine. If it’s a 4-digit PIN, as strongly encouraged by the UI of the Signal app, you will not be.

(The sensitivity of this data becomes even worse if your PIN happens to be the same as your phone passcode. Make sure it’s not!)

Similarly, if an attacker is able to compromise the integrity of SGX execution: for example, to cause the enclave to run using stale “state” rather than new data, then they might be able to defeat the limits on the number of incorrect password (“retry”) attempts. This would allow the attacker to run an active guessing attack on the enclave until they recover your PIN. (Edit: As noted above, this shouldn’t be relevant in SVR because data is stored only in RAM, and never sealed or written to disk.)

A final, and more subtle concern comes from the fact that Signal’s SVR also allows for “replication” of the backup database. This addresses a real concern on Signal’s part that the backup server could fail — resulting in the loss of all user backup data. This would be a UX nightmare, and understandably, Signal does not want users to be exposed to it.

To deal with this, Signal’s operators can spin up a new instance of the Signal server on a cloud provider. The new instance will have a second copy of the SGX enclave software, and this software can request a copy of the full seed database from the original enclave. (There’s even a sophisticated consensus protocol to make sure the two copies stay in agreement about the state of the retry counters, once this copy is made.)

The important thing to keep in mind is that the security of this replication process depends entirely on the idea that the original enclave will only hand over its data to another instance of the same enclave software running on a secure SGX-enabled processor. If it was possible to trick the original enclave about the status of the new enclave — for example, to convince it to hand the database over to a system that was merely emulating an SGX enclave in normal execution mode — then a compromised Signal operator would be able to use this mechanism to exfiltrate a plaintext copy of the database. This would break the system entirely.

Prevention against this attack is accomplished via another feature of Intel SGX, which is called “remote attestation“. Essentially, each Intel processor contains a unique digital signing key that allows it to attest to the fact that it’s a legitimate Intel processor, and it’s running a specific piece of enclave software. These signatures can be verified with the assistance of Intel, which allows enclaves to verify that they’re talking directly to another legitimate enclave.
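
Here is a heavily simplified model of what that replication check hinges on. Real SGX attestation (quotes, measurements, Intel’s verification infrastructure) is far more involved; the names below stand for concepts rather than Intel’s actual API, and an ordinary Ed25519 key plays the role of the processor’s attestation key.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

EXPECTED_MEASUREMENT = b"hash-of-the-genuine-SVR-enclave-binary"   # placeholder value

# Stand-in for the per-processor signing key whose public half the original
# enclave trusts (via Intel). If this private key ever leaks, an attacker can
# "attest" to running the right enclave on hardware that isn't SGX at all.
processor_key = Ed25519PrivateKey.generate()

def make_quote(measurement):
    # A "quote": the claimed enclave measurement, signed by the processor key.
    return measurement, processor_key.sign(measurement)

def willing_to_replicate(quote):
    measurement, signature = quote
    processor_key.public_key().verify(signature, measurement)   # raises if forged
    return measurement == EXPECTED_MEASUREMENT                  # right enclave code?

print(willing_to_replicate(make_quote(EXPECTED_MEASUREMENT)))           # True
print(willing_to_replicate(make_quote(b"attacker-controlled-binary")))  # False
```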

The power of this system is also the source of its fragility: if a single SGX attestation key were to be extracted from a single SGX-enabled processor, this would provide a backdoor for any entity who is able to compromise the Signal developers.

With these concerns in mind, it’s worth asking how realistic it is that SGX will meet the high security bar it needs to make this system work.

So how has SGX done so far?

Not well, to be honest. A list of SGX compromises is given on Wikipedia here, but this doesn’t really tell the whole story.

The various attacks against SGX are many and varied, but largely have a common cause: SGX is designed to provide virtualized execution of programs on a complex general-purpose processor, and said processors have a lot of weird and unexplored behavior. If an attacker can get the processor to misbehave, this will in turn undermine the security of SGX.

This leads to attacks such as “Plundervolt“, where malicious software is able to tamper with the voltage level of the processor in real-time, causing faults that leak critical data. It includes attacks that leverage glitches in the way that enclaves are loaded, which can allow an attacker to inject malicious code in place of a proper enclave.

The scariest attacks against SGX rely on “speculative execution” side channels, which can allow an attacker to extract secrets from SGX — up to and including basically all of the working memory used by an enclave. This could allow extraction of values like the seed keys used by Signal’s SVR, or the sealing keys (used to encrypt that data on disk.) Worse, these attacks have not once but twice been successful at extracting cryptographic signing keys used to perform cryptographic attestation. The most recent one was patched just a few weeks ago. These are very much live attacks, and you can bet that more will be forthcoming.

This last part is bad for SVR, because if an attacker can extract even a single copy of one processor’s attestation signing keys, and can compromise a Signal admin’s secrets, they can potentially force Signal to replicate its database onto a simulated SGX enclave that isn’t actually running inside SGX. Once SVR replicated its database to that system, everyone’s secret seed data would be available in plaintext.

But what really scares me is that these attacks I’ve listed above are simply the result of academic exploration of the system. At any given point in the past two years I’ve been able to have a beer with someone like Daniel Genkin of U. Mich or Daniel Gruss of TU Graz, and know that either of these professors (or their teams) is sitting on at least one catastrophic unpatched vulnerability in SGX. These are very smart people. But they are not the only smart people in the world. And there are smart people with more resources out there who would very much like access to backed-up Signal data.

It’s worth pointing out that most of the above attacks are software-only attacks — that is, they assume an attacker who is only able to get logical access to a server. The attacks are so restricted because SGX is not really designed to defend against sophisticated physical attackers, who might attempt to tap the system bus or make direct attempts to unpackage and attach probes to the processor itself. While these attacks are costly and challenging, there are certainly agencies that would have no difficulty executing them.

Finally, I should also mention that the security of the SVR approach assumes Intel is honest. Which, frankly, is probably an assumption we’re already making. So let’s punt on it.

So what’s the big deal?

My major issue with SVR is that it’s something I basically don’t want, and don’t trust. I’m happy with Signal offering it as an option to users, as long as users are allowed to choose not to use it. Unfortunately, up until this week, Signal was not giving users that choice.

More concretely: a few weeks ago Signal began nagging users to create a PIN code. The app itself didn’t clearly explain that setting this PIN would activate SVR backups. Many people I spoke to said they believed that the PIN was for protecting local storage, or to protect their account from hijacking.

And Signal didn’t just ask for this PIN. It followed a common “dark pattern” born in Silicon Valley of basically forcing users to add the PIN, first by nagging them repeatedly and then ultimately by blocking access to the entire app with a giant modal dialog.

This is bad behavior on its merits, and more critically: it probably doesn’t result in good PIN choices. To make it go away, I chose the simplest PIN that the app would allow me to use, which was 9512. I assume many other users simply entered their phone passcodes, which is a nasty security risk all on its own.

Some will say that this is no big deal, since SVR currently protects only users’ contact lists — and those are already stored in cleartext on competing messaging systems. This is, in fact, one of the arguments Moxie has made.

But I don’t buy this. Nobody is going to engineer something as complex as Signal’s SVR just to store contact lists. Once you have a hammer like SVR, you’re going to want to use it to knock down other nails. You’ll find other critical data that users are tired of losing, and you’ll apply SVR to back that data up. Since message content backups are one of the bigger pain points in Signal’s user experience, sooner or later you’ll want to apply SVR to solving that problem too.

In the past, my view was that this would be fine — since Signal would surely give users the ability to opt into or out of message backups. The recent decisions by Signal have shaken my confidence.

Addendum: what does Signal say about this?

Originally this post had a section that summarized a discussion I had with Moxie around this issue. Out of respect for Moxie, I’ve removed some of this at his request because I think it’s more fair to let Moxie address the issue directly without being filtered through me.

So in this rewritten section I simply want to make the point that now (following some discussion on Twitter), there is a workaround to this issue. You can either choose to set a high-entropy passcode such as a BIP39 phrase, and then forget it. This will not screw up your account unless you turn on the “registration lock” feature. Or you can use the new “Disable PIN” advanced feature in Signal’s latest beta, which does essentially the same thing in an automated way. This seems like a good addition, and while I still think there’s a discussion to be had around consent and opt-in, this is a start for now.

Does Zoom use end-to-end encryption?

TL;DR: It’s complicated.

Yesterday Zoom (the videoconferencing company, not the defunct telecom) put out a clarification post describing their encryption practices. This is a nice example of a company making necessary technical clarifications during a difficult time, although it comes following widespread criticism the company received over their previous, and frankly slightly misleading, explanation.

Unfortunately, Citizen Lab just put out a few results of their own, based on reverse-engineering the Zoom software. These raise further concerns that Zoom isn’t being 100% clear about how much end-to-end security their service really offers.

This situation leaves Zoom users with a bit of a conundrum: now that everyone in the world is relying on this software for so many critical purposes, should we trust it? In this mostly non-technical post I’m going to talk about what we know, what we don’t know, and why it matters.

End-to-end encryption: the world’s shortest explainer

The controversy around Zoom stems from some misleading marketing material that could have led users to believe that Zoom offers “end-to-end encryption”, or E2E. The basic idea of E2E encryption is that each endpoint — e.g., a Zoom client running on a phone or computer — maintains its own encryption keys, and sends only encrypted data through the service.

In a truly E2E system, the data is encrypted such that the service provider genuinely cannot decrypt it, even if it wants to. This ensures that the service provider can’t read your data, nor can anyone who hacks into the service provider or its cloud services provider, etc. Ideally this would include various national intelligence agencies, which is important in the unlikely event that we start using the system to conduct sensitive government business.

While end-to-end encryption doesn’t necessarily stop all possible attacks, it represents the best path we have to building secure communication systems. It also has a good track record in practice. Videoconferencing apps like Apple’s FaceTime and messaging apps like WhatsApp and Signal already use this form of encryption routinely to protect your traffic, and it works.

Zoom: the good news

The great news from the recent Zoom blog post is that, if we take the company at its word, Zoom has already made some progress towards building a genuinely end-to-end encrypted videoconferencing app. Specifically, Zoom claims that:

[I]n a meeting where all of the participants are using Zoom clients, and the meeting is not being recorded, we encrypt all video, audio, screen sharing, and chat content at the sending client, and do not decrypt it at any point before it reaches the receiving clients.

Note that the emphasis is mine: the phrases “all of the participants are using Zoom clients” and “the meeting is not being recorded” represent important caveats.

Taken at face value, this statement seems like it should calm any fears about Zoom’s security. It indicates that the Zoom client — meaning the actual Zoom software running on a phone or desktop computer — is capable of encrypting audio/video data to other Zoom clients in the conversation, without exposing your sensitive data to Zoom servers. This isn’t a trivial technical problem to solve, so credit to Zoom for doing the engineering work.

Unfortunately the caveats matter quite a bit. And this is basically where the good news ends.

The “unavoidably bad”

The simplest caveats are already present in Zoom’s statement above. Zoom provides a number of value-added services, including “cloud recording” and support for dial-in telephony conferencing. Unfortunately, each of these features is fundamentally incompatible with end-to-end encryption.

Zoom supports these services in a fairly rational way. When those services are active, Zoom provisions a set of “endpoints” within its own network. These endpoints act like normal Zoom clients, meaning that they participate in your group conversation and obtain the keys needed to decrypt and access the audio/video data: either to record it, or to bridge it to normal phones.

In theory this isn’t so bad. Even an end-to-end encrypted system can optionally allow these features: a user (e.g., the conference host) could simply send its encryption keys to a Zoom endpoint, allowing it to participate in the call. This would represent a potential loss of security, but at least users would be making the decision themselves.

Unfortunately, in Zoom’s system the decision to share keys may not be entirely left up to the users. And this is where Zoom gets a little scary.

The “pretty-damn-bad”, AKA key management

The real magic in an end-to-end encrypted system is not necessarily the encryption. Rather, it’s the fact that decryption keys never leave the endpoint devices, and are therefore never available to the service provider.

(If you need a stupid analogy here, try this one: availability of keys is like the difference between me when I don’t have access to a cheesecake, and me when a cheesecake is sitting in my refrigerator.)

So the question we should all be asking is: does Zoom have access to the decryption keys? On this issue, Zoom’s blog post becomes maddeningly vague:

Zoom currently maintains the key management system for these systems in the cloud. Importantly, Zoom has implemented robust and validated internal controls to prevent unauthorized access to any content that users share during meetings, including – but not limited to – the video, audio, and chat content of those meetings.

In other words: it sounds an awful lot like Zoom has access to decryption keys.

Thankfully we don’t have to wait for Zoom to clarify their answers to this question. Bill Marczak and John Scott-Railton over at Citizen Lab have done it for us, by reverse-engineering and taking a close look at the Zoom protocol in operation. (I’ve worked with Bill, and his speed at REing things amazes me.)

What they found should make your hair curl:

By default, all participants’ audio and video in a Zoom meeting appears to be encrypted and decrypted with a single AES-128 key shared amongst the participants. The AES key appears to be generated and distributed to the meeting’s participants by Zoom servers…

In addition, during multiple test calls in North America, we observed keys for encrypting and decrypting meetings transmitted to servers in Beijing, China.

In short, Zoom clients may be encrypting their connections, but Zoom generates the keys for communication, sometimes overseas, and hands them out to clients. This makes it easy for Zoom to add participants and services (e.g., cloud recording, telephony) to a conversation without any user action.

It also, unfortunately, makes it easy for hackers or a government intelligence agency to obtain access to those keys. This is problematic.

The good news

From the limited information in the Zoom and Citizen Lab posts, the good news is that Zoom has already laid much of the groundwork for building a genuinely end-to-end encrypted service. That is, many of the hard problems have already been solved.

(NB: Zoom has some other cryptographic flaws, like using ECB mode encryption, eek, but compared to the key management issues this is a minor traffic violation.)
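
For readers who haven’t seen why ECB mode earns an “eek”: it encrypts every 16-byte block independently under the same key, so identical plaintext blocks produce identical ciphertext blocks and structural patterns leak straight through the encryption. A quick self-contained Python demo (completely unrelated to Zoom’s actual implementation):

```python
# Why ECB mode leaks structure: identical plaintext blocks -> identical
# ciphertext blocks, even though the key itself stays secret.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)                 # AES-128, the key size Citizen Lab observed
plaintext = b"ATTACK AT DAWN!!" * 4  # four identical 16-byte blocks

encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ciphertext = encryptor.update(plaintext) + encryptor.finalize()

blocks = [ciphertext[i:i + 16] for i in range(0, len(ciphertext), 16)]
print(len(set(blocks)))              # prints 1: all four ciphertext blocks match
```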

What Zoom needs now is to very rapidly deploy a new method of agreeing on cryptographic session keys, so that only legitimate participants will have access to them. Fortunately this “group key exchange” problem is relatively easy to solve, and an almost infinite number of papers have been written on the topic.

(The naive solution is simply to obtain the public encryption keys of each participating client, and then have the meeting host encrypt a random AES session key to each one, thus cutting Zoom’s servers out of the loop.)
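
Here’s a hedged sketch of that naive approach, again using the pyca/cryptography library: the host generates the session key locally and wraps one copy for each participant’s public key, so the server only ever shuttles ciphertext. A real design would also need authenticated identities, key rotation, and handling of people joining and leaving mid-call, which is exactly where the hard problems below come in.

```python
# Naive group key distribution sketch (illustrative, not a production design):
# the host picks the meeting key and wraps it to each participant's X25519
# public key; Zoom's servers see only the wrapped blobs.
import os
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

participants = {name: X25519PrivateKey.generate()
                for name in ("alice", "bob", "carol")}

session_key = os.urandom(32)  # chosen by the host, never sent in the clear

def wrap(recipient_pub, key):
    """Encrypt the session key to one recipient using an ephemeral X25519 key."""
    eph = X25519PrivateKey.generate()
    kek = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"group-key-demo").derive(eph.exchange(recipient_pub))
    nonce = os.urandom(12)
    return eph.public_key(), nonce, AESGCM(kek).encrypt(nonce, key, None)

def unwrap(my_priv, bundle):
    eph_pub, nonce, wrapped = bundle
    kek = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"group-key-demo").derive(my_priv.exchange(eph_pub))
    return AESGCM(kek).decrypt(nonce, wrapped, None)

# The host posts one wrapped copy per participant through the untrusted server.
wrapped_keys = {name: wrap(priv.public_key(), session_key)
                for name, priv in participants.items()}

# Each participant recovers the same session key with their own private key.
assert unwrap(participants["bob"], wrapped_keys["bob"]) == session_key
```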

This won’t be a panacea, of course. Even group key exchange will still suffer from potential attacks if Zoom’s servers are malicious. It will still be necessary to authenticate the identity and public key of different clients who join the system, because a malicious provider, or one compelled by a government, can simply modify public keys or add unauthorized clients to a conversation. (Some Western intelligence agencies have already proposed to do this in practice.) There will be many hard UX problems here, many of which we have not solved even in mature E2E systems.

We’ll also have to make sure the Zoom client software is trustworthy. All the end-to-end encryption in the world won’t save us if there’s a flaw in the endpoint software. And so far Zoom has given us some reasons to be concerned about this.

Still: the perfect is the enemy of the good, and the good news is that Zoom should be able to get better.

Are we being unfair to Zoom?

I want to close by saying that many people are doing the best they can during a very hard time. This includes Zoom’s engineers, who are dealing with an unprecedented surge of users, and somehow managing to keep their service from falling over. They deserve a lot of credit for this. It seems almost unfair to criticize the company over some hypothetical security concerns right now.

But at the end of the day, this stuff is important. The goal here isn’t to score points against Zoom, it’s to make the service more secure. And in the end, that will benefit Zoom as much as it will benefit all of the rest of us.