On Ashton Kutcher and Secure Multi-Party Computation

Back in March I was fortunate to spend several days visiting Brussels, where I had a chance to attend a panel on “chat control“: the new content scanning regime being considered by the EU Commission. Among various requirements, this proposed legislation would mandate that client-side scanning technology be incorporated into encrypted text messaging applications like Signal, WhatsApp and Apple’s iMessage. The scanning tech would examine private messages for certain types of illicit content, including child sexual abuse material (known as CSAM), along with a broad category of textual conversations that constitute “grooming behavior.”

I have many thoughts about the safety of the EU proposal, and you can read some of them here. (Or, if you’re interested in the policy itself, you can read this recent opinion by the EU Council’s Legal Service.) But although the EU proposal is the inspiration for today’s post, it’s not precisely what I want to talk about. Instead, I’d like to clear up some confusion I’ve noticed around the specific technologies that many have proposed to use for building these systems.

Also: I want to talk about Ashton Kutcher.

Ashton Kutcher visits the EU parliament in March 2023 (photo: Roberta Metsola.)

It turns out there were a few people visiting Brussels to talk about encryption this March. Only a few days before my own visit, Ashton Kutcher gave a major speech to EU Parliament members in support of the Commission’s content scanning proposal. (And yes, I’m talking about that Ashton Kutcher, the guy who played Steve Jobs and is married to Mila Kunis.)

Kutcher has been very active in the technical debate around client-side scanning. He’s the co-founder of an organization called Thorn, which aims to develop cryptographic technology to enable content scanning. In March he gave an impassioned speech to the EU Parliament urging the deployment of these technologies, and remarkably he didn’t just talk about the policy side of things. When asked how to balance user privacy against the needs of scanning, he even made a concrete technical proposal: to use fully-homomorphic encryption (FHE) as a means to evaluate encrypted messages.

Now let me take a breath here before my head explodes.

I promise I am not one of those researchers who believes only subject-matter experts should talk about cryptography. Really I’m not! I write this blog because I think cryptography is amazing and I want everyone talking about it all the time. Seeing mainstream celebrities toss around phrases like “homomorphic encryption” is literally a dream come true and I wish it happened every single day.

And yet, there are downsides to this much winning.

I ran face-first into some of those downsides when I spoke to policy experts about Kutcher’s proposal. Terms like fully homomorphic encryption can be confusing and off-putting to non-cryptographers. When filtered through people who are not themselves experts in the technology, these ideas can produce the impression that cryptography is magical pixie dust we can sprinkle on all the hard problems in the world. And oh how I wish that were true. But in the real world, cryptography is full of tradeoffs. Solving one problem often just makes new problems, or creates major new costs, or else shifts the risks and costs to other parts of the system.

So when people on various sides of the debate asked me whether “fully-homomorphic encryption” could really do what Kutcher said it would, I couldn’t give an easy five-word answer. The real answer is something like: (scream emoji) it’s very complicated. That’s a very unsatisfying thing to have to tell people. Out here in the real world the technical reality is eye-glazing and full of dragons.

Which brings me to this post.

What Kutcher is really proposing is that we develop systems that perform privacy-preserving computation on encrypted data. He wants to use these systems to enable “private” scanning of your text messages and media attachments, with the promise that these systems will only detect the “bad” content while keeping your legitimate private data safe. This is a complicated and fraught area of computer science. In what follows, I am going to discuss, at a high and relatively non-technical level, the concepts behind it: what we can do, what we can’t do, and how practical it all is.

In the process I’ll discuss the two most powerful techniques that we have developed to accomplish this task: namely, multi-party computation (MPC) and, as an ingredient towards achieving the former, fully-homomorphic encryption (FHE). Then I’ll try to clear up the relationship between these two things, and explain the various tradeoffs that can make one better than the other for specific applications. Although these techniques can be used for so many things, throughout this post I’m going to focus on the specific application being considered in the EU: the use of privacy-preserving computation to conduct content scanning.

This post will not require any mathematics or much computer science, but it will require some patience. So find a comfortable chair and buckle in.

Computing on private data

Encryption is an ancient technology. Roughly speaking, it provides the ability to convert meaningful messages (and data) into a form that only you, and your intended recipient(s) can read. In the modern world this is done using public algorithms that everyone can look at, combined with secret keys that are held only by the intended recipients.

Modern encryption is really quite excellent. So as long as keys are kept safe, encrypted data can be sent over insecure networks or stored in risky locations like your phone. And while occasionally people find a flaw in an implementation of encryption, the underlying technology works very well.

But sometimes encryption can get in the way. The problem with encrypted data is that it’s, well, encrypted. When stored in this form, such data is virtually useless for practical purposes like performing calculations. Before you can compute on that data, you often need to first decrypt it and thus remove all the beautiful protections we get from encryption.

If your goal is to compute on multiple pieces of data that originate from different parties, the problem can become even more challenging. Who can we trust to do the computing? An obvious solution is to decrypt all that data and hand it to one very trustworthy person, who will presumably swear not to steal it or get hacked. Finding those parties can be quite challenging.

Fortunately for us all, the first academic cryptographers also happened to be computer scientists, and so this was exactly the sort of problem that excited them. Those researchers quickly devised a set of specific and general techniques designed to solve these problems, and also came up with a cool name for them: secure multi-party computation, or MPC for short.

MPC: secure private computation (in six, or eight, or maybe ten paragraphs)

The setting of MPC is fairly simple: imagine that we have two (or more!) parties that each have some private data they don’t want to give to anyone else. Yet each of the parties is willing to provide their data as input to some specific computation, and all are willing to reveal the output of that computation — either to everyone involved, or perhaps just to some agreed subset of the parties. Can these parties now perform the computation jointly, without appointing a trusted party?

Let’s make this easier by using a concrete example.

Imagine a group of workers all know their own salaries, but don’t know anything about anyone else’s salary. Yet they wish to compute some statistics over their salary data: for example, the average of their salaries. These workers aren’t willing to share their own salary data with anyone else, but they are willing to submit it as one input in a large calculation under the strict condition that only the final result is ever revealed.

This might seem contrived to you, but it is in fact a real problem that some people have used MPC to solve.

An MPC protocol allows the workers to do this, without appointing a trusted central party or revealing their inputs (and intermediate calculations) to anyone else. At the conclusion of the protocol each party will learn only the result of the calculation:

The “cloud” at the center of this diagram is actually a complicated protocol where every party sends messages to every other party.

MPC protocols typically provide strong provable guarantees about their properties. The details vary, but typically speaking: no party will learn anything about the other parties’ inputs. Indeed they won’t even learn any partial information that might be produced during the calculation. Even better, all parties can be assured that the result will be correct: as long as all parties submit valid inputs to the computation, none of them should be able to force the calculation to go awry.
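
To make the salary example concrete, here is a toy sketch of one classic MPC ingredient, additive secret sharing, written in Python. It is purely illustrative: the field size, the salary values, and the simplification that the parties simply publish their per-party partial sums are my own assumptions, not a description of any real MPC deployment.

import secrets

PRIME = 2**61 - 1  # a Mersenne prime, large enough that toy salaries never wrap around

def share(value, n):
    """Split `value` into n random shares that sum to `value` mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def average_salary(salaries):
    n = len(salaries)
    # Each worker splits their salary; party i only ever sees the i-th share
    # from each worker, and each share on its own looks like uniform noise.
    all_shares = [share(s, n) for s in salaries]
    partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
    # Publishing the per-party partial sums reveals only the total, and hence the average.
    return (sum(partial_sums) % PRIME) / n

print(average_salary([65_000, 82_000, 54_000, 120_000]))  # 80250.0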

Now obviously there are caveats.

In practice, using MPC is a bit like making a deal with a genie: you need to pay very careful attention to the fine print. Even when the cryptography works perfectly, this does not mean that computing your function is actually “safe.” In fact, it’s entirely possible to choose functions that when computed securely are still devastating to your privacy.

For example: imagine that I use an MPC protocol to compute an average salary between myself and exactly one other worker. This could be a very bad idea! Note that if the other worker is curious, then she can figure out how much I make. That is: the average of our two wages reveals enough information that she can recover my wage, given knowledge of her own input. This (obvious) caveat applies to many other uses of MPC, even when the technology works perfectly.

This is not a criticism of MPC, just the observation that it’s a tool. In practice, MPC (or any other cryptographic technology) is not a privacy solution by itself, at least not in the sense of privacy that real-world human beings like to think about. It provides certain guarantees that may or may not be useful for providing privacy in the real world.

What does MPC have to do with client-side scanning?

We began this post by discussing client-side scanning for encrypted messaging apps. This is an application that fits the MPC (or two-party computation) use-case perfectly. That’s because in this setting we generally have multiple parties with secret data who want to perform some joint calculation on their inputs.

In this setting, the first party is typically a client (usually a person using an encrypted messaging app like WhatsApp or Signal), who possesses some private text message or perhaps a media file they wish to send to another user. Under proposed law in the EU, their app could be legally mandated to “scan” that image to see if it contains illegal content.

According to the EU Commission, this scanning can be done in a variety of ways. For example, the device could compare an image against a secret database of known illicit content (typically using a specialized perceptual hash function.) However, while the EU plan starts there, it gets much more ambitious: the Commission also intends to look for entirely new instances of illicit content, as well as textual “grooming” conversations, possibly using machine learning (ML) models, that is, deep neural networks trained to recognize data that fits these patterns. These various models must be sophisticated enough to understand entirely new images, as well as to derive meaning from complex interactive human conversation. None of this is likely to be very simple.
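
To illustrate the first, simpler flavor of scanning, here is a rough sketch of what plain hash-database matching looks like once you ignore all the privacy machinery. The hash values, the Hamming-distance threshold, and the choice of a 64-bit perceptual hash are illustrative stand-ins, not any real CSAM-scanning algorithm.

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# The provider's (secret) database of 64-bit perceptual hashes of known content.
known_hashes = {0x9F3A5C7E1B2D4680, 0x0123456789ABCDEF}
THRESHOLD = 6  # "near-duplicate" cutoff, in bits

def matches_database(image_hash: int) -> bool:
    return any(hamming(image_hash, h) <= THRESHOLD for h in known_hashes)

print(matches_database(0x9F3A5C7E1B2D4681))  # True: one bit away from a known hash
print(matches_database(0x1111111111111111))  # False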

Now most of this could be done using standard techniques on the client device, except for one major limitation. The challenge in this setting is that the provider doing the scanning usually wants to keep these hashes and/or ML models secret.

There are several reasons for this. The first reason is that knowledge of the scanning model (or database of illicit content) makes it relatively easy for bad actors to evade the model. In other words, with only modest transformations it’s possible to modify “bad” images so that they become invisible to ML scanners.

Knowledge of the model can also allow for the creation of “poisoned” imagery: these include apparently-benign images (e.g., a picture of a cat) that trigger false positives in the scanner. (Indeed, such “colliding” images have been developed for some hash-based CSAM scanning proposals.) More worryingly, in some cases the hashes and neural network models can be “reversed” to extract the imagery and textual content they were trained on: this has all kinds of horrifying implications, and could expose abuse victims to even more trauma.

So here the user doesn’t want to send its confidential data to the provider for scanning, and the provider doesn’t want to hand its confidential model parameters to the user (or even to expose them inside the user’s phone, where they might be stolen by reverse-engineers.) This is exactly the situation that MPC was designed to handle:

Sketch of a client-side scanning architecture that uses (two-party) MPC between the client and the Provider. The client inputs the content to be scanned, while the server provides its secret model and/or hash database. The protocol gives the provider a copy of the user’s content if and only if the model says it’s illicit content, otherwise the provider sees nothing. (Note in this variant, the output goes only to the Provider.)

This makes everything very complicated. In fact, there has only been one real-world proposal for client-side CSAM scanning that has ever come (somewhat) close to deployment: that system was designed by Apple for a (now abandoned) client-side photo scanning plan. The Apple approach is cryptographically very ambitious: it uses neural-network based perceptual hashing, and otherwise exactly follows the architecture described above. However, critically: it relied on a neural-network based hash function that was not kept secret. Disastrous results ensued (see further below.)

(If you’re interested in getting a sense of how complex this protocol is, here is a white paper describing how it works.)

A diagram from the Apple CSAM scanning protocol.

Ok, so what kind of MPC protocols are available to us?

Multi-party computation is a broad category. It describes a class of protocols. In practice there are many different cryptographic techniques that allow us to realize it. Some of these (like the Apple protocol) were designed for specific applications, while others are capable of performing general-purpose computation.

I promised this post would not go into the weeds, but it’s worth pointing out that general MPC techniques typically make use of (some combination of) three different techniques: secret sharing, circuit garbling, and homomorphic encryption. Often, efficient modern systems will use a mixture of two or three of those techniques in order to maximize efficiency (and, incidentally, to make everything more confusing).

What is it that you need to know about these techniques? Here I’ll attempt, in a matter of a few sentences (that will draw me endless grief), to summarize the strengths and disadvantages of each technique.

Both secret sharing and garbling techniques share a common feature, which is that they require a great deal of data to be sent between the parties. In practice the amount of data sent between the parties will grow with (at least) the size of the inputs they’re computing on, but often will grow according to the complexity of the calculation they’re performing. For things like deep neural networks where both the data and calculation are huge, this generally results in fairly surprising amounts of data transfer.

This is not usually considered to be a problem on the general Internet or within EC2 datacenters, where data transfer is cheap. It can be quite a challenge when one of those parties is using a cellphone, however. That makes any scheme using these technologies subject to some very serious limitations.

Homomorphic encryption schemes take a different approach. These systems make use of specialized encryption schemes that are malleable. This means that encrypted data can be “modified” in useful ways without ever decrypting it.

In a bit more detail: in fully-homomorphic encryption MPC systems, a first party can encrypt its data under a public key that it generates. It can then send the encrypted data to a second party. This second party can then perform calculations on the ciphertext while it is still encrypted — adding and multiplying it together with other data (including data encrypted by the second party) to perform some calculation. Throughout this process all of the data remains encrypted. At the conclusion of this process, the second party will end up with a “modified” ciphertext that internally contains a final calculation result, but that it cannot read. To finish the protocol, the second party can send that ciphertext back to the first party, who can then decrypt it using its secret key and obtain the final output.
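
To make the “compute on ciphertexts” workflow concrete, here is a toy sketch using Paillier encryption, which is additively homomorphic (not fully homomorphic, so it supports only addition). The key sizes below are far too small to be secure; they are illustrative assumptions chosen so the example runs instantly.

import secrets
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

# Tiny demo primes; a real deployment would use primes of roughly 1024 bits each.
p, q = 1000003, 1000033
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m):
    r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

# The first party encrypts its values and hands the ciphertexts away.
c1, c2 = encrypt(30000), encrypt(12345)

# The second party adds the underlying values *without ever decrypting them*.
c_sum = (c1 * c2) % n2

# Only the first party, holding the secret key, can read the result.
assert decrypt(c_sum) == 42345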

The major upshot of the pure-FHE technique is that it substantially reduces the amount of data that the two parties need to transmit between them, especially compared to the other MPC techniques. The downside of this approach is… well, there are several. One is that FHE calculations typically require vastly more computational effort (and hence time and carbon emissions) than the other techniques. Moreover, they may still require a good deal of data transfer — in part because the number of calculations that one can perform on a given ciphertext is usually limited by “noise” that turns up within the ciphertext. Hence, calculations must either be very simple or else broken up into “phases”, where the partial calculation result is decrypted and re-encrypted so that more computation can be done. This can be done interactively between the parties, or by the second party alone (using a technique called “bootstrapping”) but in both cases the cost is either much more bandwidth exchanged or a great deal of extra computation.

In practice, cryptographers rarely commit to a single approach. They instead combine all these techniques in order to achieve an appropriate balance of data-transfer and computational effort. These “mixed systems” tend to have merely large amounts of data transfer and large amounts of computation, but are still amazingly efficient compared to the alternatives.

For an example of this, consider this very optimized two-party MPC scheme aimed at performing neural network classification. This scheme takes (from the client) a 32×32 image, and evaluates a tiny 7-layer neural network held by a server in order to perform classification. As you can see, evaluating the model on even a single image requires about 8 seconds of computation and 200 megabytes of bandwidth exchange:

Source: MUSE paper, figure 8. These are the times for a 7-layer MiniONN network trained on the CIFAR-10 dataset.

These numbers may seem quite high, but in fact they’re actually really impressive as these things go. Previous systems used nearly an order of magnitude more time and bandwidth to do their work. Maybe there will be further improvements in the future! Even on a pure efficiency basis there is much work to be done.

What are the other risks of MPC in this setting?

The final point I would like to make is that secure MPC (or MPC built using FHE as a tool) is not itself enough to satisfy the requirements of a safe content scanning system. As I mentioned above, MPC systems merely evaluate some function on private data. The question of whether that function is safe is left largely to the system designer.

In the case of these content scanning systems, the safety of the resulting system really comes down to a question of whether the algorithms work well, particularly in settings where “bad guys” can find adversarial inputs that try to disrupt the system. It also requires new techniques to ensure that the system cannot be abused. That is: there must be guarantees within the computation to ensure that the provider (or a party who hacks the provider) cannot change the model parameters to allow them to access your private content.

This is a much longer conversation than I want to have in this post, because it fundamentally requires one to think about whether the entire system makes sense. For a much longer discussion of the risks, see this paper.

This was nice, but I would like to learn more about each of these technologies!

The purpose of this post was just to give the briefest explanation of the techniques that exist for performing all of these calculations. If you’re interested in knowing (a lot more!) about these technologies, take a look at this textbook by Evans, Kolesnikov and Rosulek. MPC is an exciting area, and one that is advancing every single (research) conference cycle.

And maybe that is the lesson of this post: these technologies are still research techniques. It’s probably not quite time to put them out in the world.

Attack of the week: Voice calls in LTE

I haven’t written an “attack of the week” post in a while, and it’s been bumming me out. This is not because there’s been a lack of attacks, but mostly because there hasn’t been an attack on something sufficiently widely-used that it can rouse me out of my blogging torpor.

But today brings a beautiful attack called ReVoLTE, on a set of protocols that I particularly love to see get broken: namely, cellular protocols. And specifically, the (voice over) LTE standards. I’m excited about these particular protocols — and this new attack — because it’s so rare to see actual cellular protocols and implementations get broken. This is mostly because these standards are developed in smoke-filled rooms and written up in 12,000 page documents that researchers never have the energy to deal with. Moreover, implementing the attacks requires researchers to mess with gnarly radio protocols.

And so, serious cryptographic vulnerabilities can spread all over the world, presumably only exploited by governments, before a researcher actually takes a look at them. But every now and then there’s an exception, and today’s attack is one of them.

The attack itself is by David Rupprecht, Katharina Kohls, Thorsten Holz, and Christina Pöpper at RUB and NYU Abu Dhabi. It’s a lovely key re-installation attack on a voice protocol that you’re probably already using, assuming you’re one of the older generation who still make phone calls using a cellular phone.

Let’s start with some background.

What is LTE, and what is VoLTE?

The basis for our modern cellular telephony standards began in Europe back in the 1980s, with a standard known as the Global System for Mobile Communications (GSM). GSM was the first major digital cellular telephony standard, and it introduced a number of revolutionary features such as the use of encryption to protect phone calls. Early GSM was designed primarily for voice communications, although data could be sent over the air at some expense.

As data became more central to cellular communications, the Long Term Evolution (LTE) standards were devised to streamline this type of communication. LTE builds on a group of older standards such as GSM, EDGE and HSPA to make data communication much faster. There’s a lot of branding and misbranding in this area, but the TL;DR is that LTE is a data communications system that serves as a bridge between older packet data protocols and future 5G cellular data technologies.

Of course, history tells us that once you have enough (IP) bandwidth, concepts like “voice” and “data” start to blur together. The same is true with modern cellular protocols. To make this transition smoother, the LTE standards define Voice-over-LTE (VoLTE), which is an IP-based standard for transmitting voice calls directly over the data plane of the LTE system, bypassing the circuit-switched portion of the cellular network entirely. As with standard VoIP calls, VoLTE calls can be terminated by the cellular provider and connected to the normal phone network. Or (as is increasingly common) they can be routed directly from one cellular customer to another, even across providers.

Like standard VoIP, VoLTE is based on two popular IP-based protocols: Session Initiation Protocol (SIP) for call establishment, and Real Time Transport Protocol (which should be called RTTP but is actually called RTP) to actually handle voice data. VoLTE also adds some additional bandwidth optimizations, such as header compression.

Ok, what does this have to do with encryption?

Like GSM before it, LTE has a standard set of cryptographic protocols for encrypting packets while they travel over the air. These are mainly designed to protect your data while it travels between your handset (called the “User Equipment”, or UE) and the cellular tower (or wherever your provider decides to terminate the connection.) This is because cellular providers view outside eavesdroppers as the enemy, not themselves. Obviously.

(However, the fact that VoLTE connections can occur directly between customers on different provider networks does mean that the VoLTE protocol itself has some additional and optional encryption protocols that can happen at higher network layers. These aren’t relevant to the current paper except insofar as they could screw things up. We’ll talk about them briefly further below.)

Historical GSM encryption had many weaknesses: bad ciphers, protocols where only the handset authenticated itself to the tower (meaning an attacker could impersonate a tower, giving rise to the “Stingray“) and so on. LTE fixed many of the obvious bugs, while keeping a lot of the same structure.

Let’s start with the encryption itself. Assuming key establishment has already happened — and we’ll talk about that in just a minute — each data packet is encrypted using a stream cipher mode built from some cipher called “EEA” (which in practice can be implemented with things like AES). The encryption mechanism is basically CTR-mode, as shown below:

Basic encryption algorithm for VoLTE packets (source: ReVoLTE paper). EEA is a cipher, “COUNT” is a 32-bit counter, “BEARER” is a unique session identifier that separates VoLTE connections from normal internet traffic. And “DIRECTION” indicates whether the traffic is going from UE to tower or vice-versa.

Since the encryption algorithm itself (EEA) can be implemented using a strong cipher like AES, it’s unlikely that there’s any direct attack on the cipher itself, as there was back in the GSM days. However, even with a strong cipher, it’s obvious that this encryption scheme is a giant footgun waiting to go off.

CTR mode nonce re-use attacks were a thing when Poison was a thing.

Specifically: the LTE standard uses an (unauthenticated) stream cipher with a mode that will be devastatingly vulnerable should the counter — and other inputs, such as ‘bearer’ and ‘direction’ — ever be re-used. In modern parlance the term for this concept is “nonce re-use attack“, but the potential risks here are not modern. They’re well-known and ancient, going back to the days of hair-metal and even disco.

In fairness, the LTE standards say “don’t re-use these counters, please“. But the LTE standards are also like 7,000 pages long, and anyway, this is like begging toddlers not to play with a gun. Inevitably, they’re going to do that and terrible things will happen. In this case, the discharging gun is a keystream re-use attack in which two different confidential messages get XORed with the same keystream bytes. This is known to be utterly devastating for message confidentiality.

So what’s ReVoLTE?

The ReVoLTE attack paper points out that this highly vulnerable encryption construction is, in fact, misused by real equipment in the wild. Specifically, the authors analyze actual VoLTE calls made using commercial equipment, and show that they can exploit something called a “key re-installation attack”. (Much credit for the basic observation goes to Raza and Lu, who first pointed out the potential vulnerability. But the ReVoLTE research turns it into a practical attack.)

Let me give a quick overview of the attack here, although you should really read the paper.

You might assume that once LTE sets up a packet data connection, voice-over-LTE is just a question of routing voice packets over that connection alongside all of your other data traffic. In other words, VoLTE would be a concept that exists only above Layer 2. This isn’t precisely the case.

In fact, LTE’s data link layer introduces the concept of a “bearer“. Bearers are separate session identifiers that differentiate various kinds of packet traffic. Normal Internet traffic (your Twitter and Snapchat) goes over one bearer. SIP signalling for VoIP goes over another, and voice traffic packets are handled on a third. I don’t have much insight into the RF and network routing mechanisms of LTE, but I presume this is done because LTE networks want to enable quality of service mechanisms to ensure that these different packet flows are treated with different priority levels: i.e., your crummy TCP connections to Facebook can be prioritized at a lower level than your real-time voice calls.

This isn’t exactly a problem, but it raises an issue. Keys for LTE encryption are derived separately each time a new “bearer” is set up. In principle this should happen afresh each time you make a new phone call. This would result in a different encryption key for each call, thus eliminating the possibility that the same key will be re-used to encrypt two different sets of call packets. Indeed, the LTE standard says something like “you should use a different key each time you set up a new bearer to handle a new phone call.” But that doesn’t mean it happens.

In fact, in real implementations, two different calls that happen in close temporal proximity will end up using the exact same key — despite the fact that new (identically-named) bearers are configured between them. The only practical change that happens between those calls is that the encryption counter will reset back to zero. In the literature, this is sometimes called a key reinstallation attack. One can argue that this is basically an implementation error, although in this case the risks seem largely set up by the standard itself.

In practice, this attack leads to keystream re-use, where an attacker can obtain the encrypted packets C1 = M1 ⊕ KS and C2 = M2 ⊕ KS, which allows her to compute C1 ⊕ C2 = M1 ⊕ M2. Even better, if the attacker knows one of M1 or M2, she can immediately recover the other. This gives her a strong incentive to know one of the two plaintexts.
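
Here is a minimal sketch of why this is so devastating. The real EEA algorithms are built from ciphers like AES, SNOW or ZUC; in this toy version SHA-256 stands in as the keystream generator, and the way COUNT, BEARER and DIRECTION are packed into the input is an assumption rather than the actual 3GPP layout. The point is only that when the key and counter repeat, the keystream cancels out.

import hashlib

def keystream(key, count, bearer, direction, length):
    # SHA-256 in counter mode stands in for the real EEA keystream generator.
    out, block = b"", 0
    while len(out) < length:
        seed = key + count.to_bytes(4, "big") + bytes([bearer, direction]) + block.to_bytes(4, "big")
        out += hashlib.sha256(seed).digest()
        block += 1
    return out[:length]

def encrypt(key, count, bearer, direction, plaintext):
    ks = keystream(key, count, bearer, direction, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

key = b"\x00" * 16                             # session key derived at bearer setup
m1 = b"call 1: the victim's secret audio frame"
m2 = b"call 2: audio frames chosen by attacker"

c1 = encrypt(key, 0, 1, 0, m1)                 # first call, COUNT starts at zero
c2 = encrypt(key, 0, 1, 0, m2)                 # key "re-installed", COUNT reset to zero

# The eavesdropper never touches the key, yet the shared keystream cancels out:
xor_of_plaintexts = bytes(a ^ b for a, b in zip(c1, c2))
# And because the attacker chose m2 (she placed the second call), m1 falls out:
recovered = bytes(x ^ y for x, y in zip(xor_of_plaintexts, m2))
assert recovered == m1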

This brings us to the complete and most powerful attack scenario. Consider an attacker who can eavesdrop the radio connection between a target phone and the cellular tower, and who somehow gets “lucky enough” to record two different calls, where the second happens immediately after the first. Now imagine she can somehow guess the plaintext contents of one of the calls. In this eventuality, our attacker can completely decrypt the first call, using a simple XOR evaluation between the two sets of packets.

And of course, as it happens — luck has nothing to do with it. Since phones are designed to receive calls, an attacker who can eavesdrop that first call will be able to initiate a second call at exactly the moment the first call ends. This second call, should it re-use the same encryption key with a counter set back to zero, will enable plaintext recovery. Even better, since our attacker actually controls the data in the second call, she may be able to recover the contents of the first one — pending a whole lot of implementation-specific details all breaking in her favor.

Here’s a picture of the overall attack, taken from the paper:

Attack overview from the ReVoLTE paper. This diagram assumes that two different calls happen using the same key. The attacker controls a passive sniffer (top left) as well as a second handset that they can use to make a second call to the victim phone.

So does the attack actually work?

At one level, this is really the entire question for the ReVoLTE paper. All of the ideas above sound great in theory, but they leave a ton of questions. Such as:

  1. Is it feasible for (academic researchers) to actually sniff VoLTE connections?
  2. Do real LTE systems actually re-install keys?
  3. Can you actually initiate that second call quickly and reliably enough to make a handset and tower re-use a key?
  4. Even if systems do re-install keys, can you actually know the digital plaintext of the second call — given that things like codecs and transcoding may totally change the (bitwise) contents of that second call, even if you have access to the “bits” flowing out of your attacker phone?

The ReVoLTE paper answers several of these questions in the affirmative. The authors are able to use a commercial software-defined radio downlink sniffer called Airscope in order to eavesdrop the downlink side of a VoLTE call. (As is typical with academic research, I expect that simply getting hold of the software and figuring out how to work it took months off some poor graduate students’ lives.)

In order for key re-use to happen, the researchers discovered that a second call has to occur very rapidly after the termination of the first one, but not too rapidly — about ten seconds for the providers they experimented with. Fortunately, it doesn’t really matter if the target picks the call up within that time — the “ringing”, i.e., SIP communication itself causes the provider to re-use the same key.

Many of the gnarliest issues thus revolve around issue (4), obtaining all of the plaintext bits for the attacker-initiated call. This is because a lot of things can happen to your plaintext as it travels from your attacker handset out to the victim’s phone and through the cellular network. These include nastiness such as transcoding of encoded audio data, which makes the audio sound the same but totally changes the binary representation of the audio. LTE networks also use RTP header compression that can substantially change big portions of the RTP packet.

Finally, packets sent by the attacker need to roughly line up with packets that happened in the first phone call. This can be problematic, as silent patches in a phone call result in shorter messages (called comfort noise), which may not overlap well with the original call.

The “real world attack” section of the paper is worth reading for all the details. It addresses many of the above concerns — specifically, the authors find that some codecs are not transcoded, and that roughly 89% of the binary representation of the target call can be recovered, for at least two European providers that the attackers tested.

This is an astonishingly high level of success, and frankly much better than I anticipated when I started the paper.

So what can we do to fix this?

The immediate answer to this question is straightforward: since the vulnerability is a key re-use (re-installation) attack, just fix this attack. Make sure to derive a new key for each phone call, and never allow your packet counter to reset back to zero with the same key. Problem solved!

Or maybe not. Getting this right will require upgrading a lot of equipment, and frankly the fix itself isn’t terribly robust. It would be nice if standards could find a cleaner way to implement their encryption modes that isn’t instantly and catastrophically vulnerable to these nonce-reuse issues.

One possible direction is to use modes of encryption where nonce-misuse doesn’t result in catastrophic outcomes. This might be too expensive for some current hardware, but it’s certainly a direction that designers should be thinking about for the future, particularly as the 5G standards are about to take over the world.

This new result also raises a general question about why the same damned attacks keep cropping up in standard after standard, many of which use very similar designs and protocols. At a certain point when you’ve had this same key re-installation issue happen in multiple widely-deployed protocols such as WPA2, maybe it’s time to make your specifications and testing procedures more robust to it? Stop treating implementers as thoughtful partners who will pay attention to your warnings, and treat them as (inadvertent) adversaries who are inevitably going to implement everything incorrectly.

Or alternatively, we can do what the Facebooks and Apples of the world are increasingly doing: make voice call encryption happen at a higher level of the OSI network stack, without relying on cellular equipment manufacturers to get anything right. We could even promote end-to-end encryption of voice calls, as WhatsApp and Signal and FaceTime do, assuming the US government would just stop trying to trip us up. Then (with the exception of some metadata) many of these problems would go away. This solution is particularly pertinent in a world where governments aren’t even sure if they trust their equipment providers.

Alternatively, we could just do what our kids have already done, and stop answering those annoying voice calls altogether.

Let’s talk about PAKE

The first rule of PAKE is: nobody ever wants to talk about PAKE. The second rule of PAKE is that this is a shame, because PAKE — which stands for Password Authenticated Key Exchange — is actually one of the most useful technologies that (almost) never gets used. It should be deployed everywhere, and yet it isn’t.

To understand why this is such a damn shame, let’s start by describing a very real problem.

Imagine I’m operating a server that has to store user passwords. The traditional way to do this is to hash each user password and store the result in a password database. There are many schools of thought on how to handle the hashing process; the most common recommendation these days is to use a memory-hard password hashing function like scrypt or argon2 (with a unique per-password salt), and then store only the hashed result. There are various arguments about which hash function to use, and whether it could help to also use some secret value (called “pepper“), but we’ll ignore these for the moment.
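
For reference, here is a minimal sketch of that traditional server-side approach: hash each password with a memory-hard KDF and a per-user salt, and store only the salt and the digest. The scrypt parameters below are illustrative, not a recommendation.

import hashlib, hmac, os

def store_password(password: str):
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest                      # this pair goes into the password database

def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)

salt, digest = store_password("correct horse battery staple")
assert check_password("correct horse battery staple", salt, digest)
assert not check_password("hunter2", salt, digest)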

Regardless of the approach you take, all of these solutions have a single Achilles’ heel:

When the user comes back to log into your website, they will still need to send over their (cleartext) password, since this is required in order for the server to do the check. 

This requirement can lead to disaster if your server is ever persistently compromised, or if your developers make a simple mistake. For example, earlier this year Twitter asked all of its (330 million!) users to change their passwords — because it turned out the company had been logging cleartext (unhashed) passwords.

Now, the login problem doesn’t negate the advantage of password hashing in any way. But it does demand a better solution: one where the user’s password never has to go to the server in cleartext. The cryptographic tool that can give this to us is PAKE, and in particular a new protocol called OPAQUE, which I’ll get to at the end of this post.

What’s a PAKE?

A PAKE protocol, first introduced by Bellovin and Merritt, is a special form of cryptographic key exchange protocol. Key exchange (or “key agreement”) protocols are designed to help two parties (call them a client and server) agree on a shared key, using public-key cryptography. The earliest key exchange protocols — like classical Diffie-Hellman — were unauthenticated, which made them vulnerable to man-in-the-middle attacks. The distinguishing feature of PAKE protocols is that the client will authenticate herself to the server using a password. For obvious reasons, the password, or a hash of it, is assumed to be already known to the server, which is what allows for checking.

If this was all we required, PAKE protocols would be easy to build. What makes a PAKE truly useful is that it should also provide protection for the client’s password. A stronger version of this guarantee can be stated as follows: after a login attempt (valid, or invalid) both the client and server should learn only whether the client’s password matched the server’s expected value, and no additional information. This is a powerful guarantee. In fact, it’s not dissimilar to what we ask for from a zero knowledge proof.

Ideal representation of a PAKE protocol. The two parties’ inputs also include some randomness, which isn’t shown. An eavesdropper should not learn the strong shared secret key K, which should itself be random and not simply a function of the password.

Of course, the obvious problem with PAKE is that many people don’t want to run a “key exchange” protocol in the first place! They just want to verify that a user knows a password.

The great thing about PAKE is that the simpler “login only” use-case is easy to achieve. If I have a standard PAKE protocol that allows a client and server to agree on a shared key K if (and only if) the client knows the right password, then all we need to add is a simple check that both parties have arrived at the same key. (This can be done, for example, by having the parties compute some cryptographic function with it and check the results.) So PAKE is useful even if all you’ve got in mind is password checking.
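
As a tiny illustration of that check, here is one way the two sides might confirm they derived the same key K: each computes a MAC over a fixed label using K and compares the results. The labels and key sizes are my own assumptions, not part of any particular PAKE standard.

import hashlib, hmac, secrets

def confirmation_tag(K: bytes, role: str) -> bytes:
    return hmac.new(K, b"PAKE key confirmation|" + role.encode(), hashlib.sha256).digest()

# Suppose a successful PAKE run left both parties holding the same key K...
K_client = K_server = secrets.token_bytes(32)

# ...then the client sends its tag, and the server checks it against its own copy of K.
client_tag = confirmation_tag(K_client, "client")
assert hmac.compare_digest(client_tag, confirmation_tag(K_server, "client"))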

SRP: The PAKE that Time Forgot

The PAKE concept seems like it provides an obvious security benefit when compared to the naive approach we use to log into servers today. And the techniques are old, in the sense that PAKEs have been known since way back in 1992! Despite this, they’ve seen almost no adoption. What’s going on?

There are a few obvious reasons for this. The most obvious has to do with the limitations of the web: it’s much easier to put a password form onto a web page than it is to do fancy crypto in the browser. But this explanation isn’t sufficient. Even native applications rarely implement PAKE for their logins. Another potential explanation has to do with patents, though most of these are expired now. To me there are two likely reasons for the ongoing absence of PAKE: (1) there’s a lack of good PAKE implementations in useful languages, which makes it a hassle to use, and (2) cryptographers are bad at communicating the value of their work, so most people don’t know PAKE is even an option.

Even though I said PAKE isn’t deployed, there are some exceptions to the rule.

One of the remarkable ones is a 1998 protocol designed by Tom Wu [correction: not Tim Wu] and called “SRP”. Short for “Secure Remote Password“, this is a simple three-round PAKE with a few elegant features that were not found in the earliest works. Moreover, SRP has the distinction of being (as far as I know) the most widely-deployed PAKE protocol in the world. I cite two pieces of evidence for this claim:

  1. SRP has been standardized as a TLS ciphersuite, and is actually implemented in libraries like OpenSSL, even though nobody seems to use it much.
  2. Apple uses SRP extensively in their iCloud Key Vault.

This second fact by itself could make SRP one of the most widely used cryptographic protocols in the world, so vast is the number of devices that Apple ships. So this is nothing to sneer at.

Industry adoption of SRP is nice, but also kind of a bummer: mainly because while any PAKE adoption is cool, SRP itself isn’t the best PAKE we can deploy. I was planning to go into the weeds about why I feel so strongly about SRP, but it got longwinded and it distracted from the really nice protocol I actually want to talk about further below. If you’re still interested, I moved the discussion onto this page.

In lieu of those details, let me give a quick and dirty TL;DR on SRP:

  1. SRP does some stuff “right”. For one thing, unlike early PAKEs it does not require you to store a raw password on the server (or, equivalently, a hash that could be used by a malicious client in place of the password). Instead, the server stores a “verifier” which is a one-way function of the password hash. This means a leak of the password database does not (immediately) allow the attacker to impersonate the user — unless they conduct further expensive dictionary attacks. (The technical name for this is “asymmetric” PAKE.)
  2. Even better, the current version of SRP (v6a) isn’t obviously broken!
  3. However (and with no offense to the designers) the SRP protocol design is completely bonkers, and earlier versions have been broken several times — which is why we’re now at revision 6a. Plus the “security proof” in the original research paper doesn’t really prove anything meaningful.
  4. SRP currently relies on integer (finite field) arithmetic, and for various reasons (see point 3 above) the construction is not obviously transferable to the elliptic curve setting. This requires more bandwidth and computation, and thus SRP can’t take advantage of the many efficiency improvements we’ve developed in settings like Curve25519.
  5. SRP is vulnerable to pre-computation attacks, due to the fact that it hands over the user’s “salt” to any attacker who can start an SRP session. This means I can ask a server for your salt, and build a dictionary of potential password hashes even before the server is compromised.
  6. Despite all these drawbacks, SRP is simple — and actually ships with working code. Plus there’s working code in OpenSSL that even integrates with TLS, which makes it relatively easy to adopt.

Out of all these points, the final one is almost certainly responsible for the (relatively) high degree of commercial success that SRP has seen when compared to other PAKE protocols. It’s not ideal, but it’s real. This is something for cryptographers to keep in mind.
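
To make point 1 above a bit more concrete, here is a rough sketch of the “verifier” idea: the server stores v = g^x mod N, where x is a hash of the salt and password, rather than anything directly usable as the password. The hash choices and the tiny group below are illustrative; real SRP uses the derivation and large safe-prime groups specified in SRP-6a and RFC 5054.

import hashlib, os

# Toy group parameters; real SRP uses the large safe-prime groups from RFC 5054.
N, g = 1019, 2

def srp_register(username: str, password: str):
    salt = os.urandom(16)
    inner = hashlib.sha256(f"{username}:{password}".encode()).digest()
    x = int.from_bytes(hashlib.sha256(salt + inner).digest(), "big")
    v = pow(g, x, N)
    # The server stores (salt, v); recovering the password from v requires
    # guessing candidate passwords and repeating this computation.
    return salt, v

salt, v = srp_register("alice", "correct horse battery staple")
print(v)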

OPAQUE: The PAKE of a new generation

When I started thinking about PAKEs a few months ago, I couldn’t help but notice that most of the existing work was kind of crummy. It either had weird problems like SRP, or it required the server to store the password (or an effective password), or it revealed the salt to an attacker — allowing pre-computation attacks.

Then earlier this year, Jarecki, Krawczyk and Xu proposed a new protocol called OPAQUE. Opaque has a number of extremely nice advantages:

  1. It can be implemented in any setting where Diffie-Hellman and discrete log (type) problems are hard. This means that, unlike SRP, it can be easily instantiated using efficient elliptic curves.
  2. Even better: OPAQUE does not reveal the salt to the attacker. It solves this problem by using an efficient “oblivious PRF” to combine the salt with the password, in a way that ensures the client does not learn the salt and the server does not learn the password.
  3. OPAQUE works with any password hashing function. Even better, since all the hashing work is done on the client, OPAQUE can actually take load off the server, freeing an online service up to use much stronger security settings — for example, configuring scrypt with large RAM parameters.
  4. In terms of number of messages and exponentiations, OPAQUE is not much different from SRP. But since it can be implemented in more efficient settings, it’s likely to be a lot more efficient.
  5. Unlike SRP, OPAQUE has a reasonable security proof (in a very strong model).

There’s even an Internet Draft proposal for OPAQUE, which you can read here. Unfortunately, at this point I’m not aware of any production quality implementations of the code (if you know of one, please link to it in the comments and I’ll update). (Update: There are several potential implementations listed in the comments — I haven’t looked closely enough to endorse any, but this is great!) But that should soon change.

The full OPAQUE protocol is given a little bit further below. In the rest of this section I’m going to go into the weeds on how OPAQUE works.

Problem 1: Keeping the salt secret. As I mentioned above, the main problem with earlier PAKEs is the need to transmit the salt from a server to a (so far unauthenticated) client. This enables an attacker to run pre-computation attacks, where they can build an offline dictionary based on this salt.

The challenge here is that the salt is typically fed into a hash function (like scrypt) along with the password. Intuitively someone has to compute that function. If it’s the server, then the server needs to see the password — which defeats the whole purpose. If it’s the client, then the client needs the salt.

In theory one could get around this problem by computing the password hashing function using secure two-party computation (2PC). In practice, solutions like this are almost certainly not going to be efficient — most notably because password hashing functions are designed to be complex and time consuming, which will basically explode the complexity of any 2PC system.

OPAQUE gets around this with the following clever trick. They leave the password hash on the client’s side, but they don’t feed it the stored salt. Instead, they use a special two-party protocol called an oblivious PRF to calculate a second salt (call it salt2) so that the client can use salt2 in the hash function — but does not learn the original salt.

The basic idea of such a function is that the server and client can jointly compute a function PRF(salt, password), where the server knows “salt” and the client knows “password”. Only the client learns the output of this function. Neither party learns anything about the other party’s input.

The gory details:

The actual implementation of the oblivious PRF relies on the idea that the client has the password P and the server has the salt, which is expressed as a scalar value s. The output of the PRF function should be of the form H(P)^s, where H: {0,1}* → G is a special hash function that hashes passwords into elements of a cyclic (prime-order) group G.

To compute this PRF requires a protocol between the client and server. In this protocol, the client first computes H(P) and then “blinds” this password by selecting a random scalar value r, and blinding the result to obtain C = H(P)^r. At this point, the client can send the blinded value C over to the server, secure in the understanding that (in a prime-order group), the blinding by r hides all partial information about the underlying password.

The server, which has a salt value s, now further exponentiates this value to obtain R = C^s and sends the result R back to the client. If we write this out in detail, the result can be expressed as R = H(P)^(rs). The client now computes the inverse of its own blinding value r and exponentiates one more time as follows: R' = R^(1/r) = H(P)^s. This element R', which consists of the hash of the password exponentiated by the salt, is the output of the desired PRF function.

A nice feature of this protocol is that, if the client enters the wrong password into the protocol, she should obtain a value that is very different from the actual value she wants. This guarantee comes from the fact that the hash function is likely to produce wildly different outputs for distinct passwords.
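
Here is a toy sketch of that blinded-exponentiation protocol. For readability it uses the subgroup of squares modulo a tiny safe prime; a real implementation would use an elliptic-curve group and a proper hash-to-group map, so every parameter below is an illustrative assumption.

import hashlib, secrets

p = 1019          # toy safe prime: p = 2q + 1
q = 509           # prime order of the subgroup of squares mod p

def hash_to_group(password: bytes) -> int:
    x = int.from_bytes(hashlib.sha256(password).digest(), "big") % (p - 1) + 1
    return (x * x) % p            # squaring lands us in the order-q subgroup

# Client: blind H(P) with a random exponent r and send the result.
password = b"hunter2"
r = secrets.randbelow(q - 1) + 1
C = pow(hash_to_group(password), r, p)

# Server: apply its secret per-user scalar s (the "salt") to the blinded value.
s = secrets.randbelow(q - 1) + 1
R = pow(C, s, p)

# Client: strip the blinding to obtain H(P)^s, without ever learning s.
prf_output = pow(R, pow(r, -1, q), p)

assert prf_output == pow(hash_to_group(password), s, p)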

Problem 2: Proving that the client got the right key K. Of course, at this point, the client has derived a key K, but the server has no idea what it is. Nor does the server know whether it’s the right key.

The solution OPAQUE uses is based on an old idea due to Gentry, MacKenzie and Ramzan. When the user first registers with the server, she generates a strong public and private key for a secure key agreement protocol (like HMQV), and encrypts the resulting private key under K, along with the server’s public key. The resulting authenticated ciphertext (and the public key) is stored in the password database.

C = Encrypt(K, client secret key | server’s public key)

Full OPAQUE protocol, excerpted from the paper.

When the client wishes to authenticate using the OPAQUE protocol, the server sends it the stored ciphertext C. If the client entered the right password into the first phase, she can derive K, and now decrypt this ciphertext. Otherwise it’s useless. Using the embedded secret key, she can now run a standard authenticated key agreement protocol to complete the handshake. (The server verifies the client’s inputs against its copy of the client’s public key, and the client does similarly.)
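
Here is a minimal sketch of that “envelope” idea, using an off-the-shelf AEAD (AES-GCM from the Python cryptography package) in place of whatever encryption a real OPAQUE implementation specifies. The key formats and the way the fields are concatenated are assumptions for illustration only.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def register(K: bytes, client_secret_key: bytes, server_public_key: bytes) -> bytes:
    nonce = os.urandom(12)
    envelope = AESGCM(K).encrypt(nonce, client_secret_key + b"|" + server_public_key, None)
    return nonce + envelope                     # stored in the server's password database

def open_envelope(K: bytes, stored: bytes):
    nonce, envelope = stored[:12], stored[12:]
    plaintext = AESGCM(K).decrypt(nonce, envelope, None)   # raises if K is wrong
    return plaintext.split(b"|", 1)

K = os.urandom(32)                              # stands in for the OPRF-derived key
stored = register(K, b"client-static-secret-key", b"server-static-public-key")
client_sk, server_pk = open_envelope(K, stored)
assert client_sk == b"client-static-secret-key"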

Putting it all together. All of these different steps can be merged together into a single protocol that has the same number of rounds as SRP. Leaving aside the key verification steps, it looks like the protocol above. Basically, just two messages: one from the client and one returned by the server.

The final aspect of the OPAQUE work is that it includes a strong security proof that shows the resulting protocol can be proven secure under the 1-more discrete logarithm assumption in the random oracle model, which is a (well, relatively) standard assumption that appears to hold in the settings we work with.

In conclusion

So in summary, we have this neat technology that could make the process of using passwords much easier, and could allow us to do it in a much more efficient way — with larger hashing parameters and more work done by the client. Why isn’t this everywhere?

Maybe in the next few years it will be.


Let’s talk about ZRTP

I just checked the date on my last post and it seems that I haven’t blogged in nearly a month. Believe me, this isn’t for lack of trying. The world has just been a very, very busy place.

But this is the Thanksgiving holiday and every (US) American knows the best part of Thanksgiving is sitting around for hours waiting for a huge bird to cook thoroughly enough that it won’t kill your family. And that means I finally have time to write about my favorite wonky topic: cryptographic protocols. And how utterly confusing they can be.

ZRTP

The subject of today’s post is the ZRTP key agreement protocol. ZRTP has recently gotten some press for being the primary key agreement used by Silent Circle, a new encrypted voice/video/text service launched by PGP inventor Phil Zimmermann and some other bright folks. But it’s also used in other secure VoIP solutions, like Zfone and Moxie’s RedPhone.

Silent Circle’s an interesting case, since it’s gotten some gentle criticism lately from a variety of folks — well, mostly Nadim Kobeissi — for being partially closed-source and for having received no real security audits. Nadim’s typical, understated critique goes like this:

And is usually followed by the whole world telling Nadim he’s being a jerk. Eventually a tech reporter notices the fracas and chimes in to tell us the whole Infosec community is a bunch of jerks:

And the cycle repeats, without anyone having actually learned much at all. (Really, it’s enough to make you think Twitter isn’t the right place to get your Infosec news.)

Now, as unhelpful as all this is, maybe we can make lemonade and let all of this serve as a teaching moment. For one thing, it gives us a (wonky) chance to learn a little bit about this ZRTP protocol that so many people seem to be using.

Overview of ZRTP

The ZRTP protocol is fully laid out in RFC 6189. This is one of the more confusing specs I’ve read — partly because the critical information is spread out across so many different sections, and partly because ZRTP seems determined to address every possible attack simultaneously.

Fortunately the Intro does a good job of telling us what the protocol’s up to:

ZRTP is a key agreement protocol that performs a Diffie-Hellman key exchange during call setup in the media path and is transported over the same port as the Real-time Transport Protocol (RTP) media stream which has been established using a signaling protocol such as Session Initiation Protocol (SIP). This generates a shared secret, which is then used to generate keys and salt for a Secure RTP (SRTP) session.

So: simple enough. ZRTP lets us establish keys over an insecure channel using Diffie-Hellman.
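
Here is a bare-bones sketch of that flow: an unauthenticated Diffie-Hellman exchange followed by deriving SRTP key and salt material from the shared secret. The group, the key-derivation function, and the labels are all illustrative assumptions; the real derivation is specified in RFC 6189.

import hashlib, secrets

# Toy parameters: a prime modulus and generator 2. Real ZRTP negotiates one of
# the finite-field or elliptic-curve groups listed in RFC 6189.
p = 2**255 - 19
g = 2

a = secrets.randbelow(p - 2) + 1               # initiator's DH secret
b = secrets.randbelow(p - 2) + 1               # responder's DH secret
A, B = pow(g, a, p), pow(g, b, p)              # public values exchanged in the media path

shared = pow(B, a, p)
assert shared == pow(A, b, p)                  # both sides compute the same secret

# Derive SRTP key and salt material from the shared secret (labels made up here).
s0 = hashlib.sha256(b"toy-zrtp-s0|" + shared.to_bytes(32, "big")).digest()
srtp_key = hashlib.sha256(b"key|" + s0).digest()[:16]
srtp_salt = hashlib.sha256(b"salt|" + s0).digest()[:14]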

However we all know that Diffie-Hellman isn’t secure against active (Man-in-the-Middle) attacks. Normally we prevent these by signing Diffie-Hellman messages using keys distributed via a PKI. ZRTP is having none of it:

Although it uses a public key algorithm, [ZRTP] does not rely on a public key infrastructure (PKI) … [ZRTP] allows the detection of man-in-the-middle (MiTM) attacks by displaying a short authentication string (SAS) for the users to read and verbally compare over the phone. … But even if the users are too lazy to bother with short authentication strings, we still get reasonable authentication against a MiTM attack, based on a form of key continuity. It does this by caching some key material to use in the next call, to be mixed in with the next call’s DH shared secret, giving it key continuity properties analogous to Secure SHell (SSH).

So our protection is twofold: (1) every time we establish a connection with some remote party, we can verbally compare a “short authentication string” (SAS) to ensure that we’ve both agreed on the same key. Assuming that our attacker isn’t a voice actor, this should let us easily detect a typical MiTM. And just in case we’re too lazy to bother with this, even completing one ‘un-attacked’ connection leaves us with (2) a long-term shared secret that we can use to validate future connection attempts.

So is this a reasonable model? Well, you can draw your own conclusions — but it works fine for me. Moreover, I’m willing to accept just about any assumption that allows us to not think about the ‘Bill Clinton’ or ‘Rich Little attacks‘. So let’s just assume that this voice thing works… and move on to the interesting bits.

The Guts of the Protocol

ZRTP is an interaction between two parties, defined as an Initiator and a Responder. This figure from the spec illustrates the flow of a typical transaction:

Note that the identity of the party acting as the Initiator is determined during the protocol run — it’s the one that sends the Commit message (F5). The protocol breaks down into roughly four phases:

  1. Discovery and protocol negotiation (F1-F4), in which the parties start up a protocol transaction and agree on a supported ZRTP version and ciphersuites.
  2. Diffie-Hellman key establishment (F5-F7). This is almost ‘missionary position’ Diffie-Hellman, with one exception (the F5 message), which we’ll talk more about in a second.
  3. Key confirmation (F8-F10), in which the parties verify that they’ve agreed on the same key.
  4. Secure communication. That’s the last section, labeled “SRTP begins”.

Discovery and Protocol Negotiation

Messages F1-F4 are responsible for setting up a ZRTP connection. This stuff is (almost) entirely non-cryptographic, which means this is the part of the protocol where stuff is most likely to go wrong.

Let me explain: prior to completing the Diffie-Hellman key exchange in messages F5-F7, we can’t assume that Alice or Bob share a cryptographic key yet. Without a key, they can’t authenticate their messages — at least not until after the Diffie-Hellman transaction is complete. That means an attacker can potentially re-write any portion of those messages and get away with it. At least for a while.
So what’s in those messages? Well, just a couple of small things:
  1. A 4-character string containing the version of the ZRTP protocol.
  2. A Client Identifier string (cid), which is 4 words long and identifies the vendor and release of the ZRTP software.
  3. The 96-bit-long unique identifier for the ZRTP endpoint (ZID).
  4. A Signature-capable flag (S), which indicates that this Hello message was sent by a ZRTP endpoint able to parse and verify digital signatures.
  5. The MiTM flag (M), which has something to do with PBXes.
  6. The Passive flag (P), which is set to true if and only if this Hello message is sent from a device that is configured to always act as a Responder (not Initiator).
  7. The supported Hash algorithms, Cipher algorithms (including Diffie-Hellman handshake type), SRTP Auth Tag Types, Key Agreement Types, and SAS Types.
Which is to say: quite a lot! And some of these values are pretty critical. Changing the version number might allow an attacker to roll us back to an earlier (potentially insecure) version of the protocol. Changing the ciphers might (theoretically) allow an attacker to switch us to a weaker set of Diffie-Hellman parameters, hash function or Short Authentication String algorithm.
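To keep these fields straight, here's a rough sketch of a Hello message as a data structure. The field names, types and example algorithm codes are my own shorthand for illustration, not the wire format from RFC 6189:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ZrtpHello:
        version: str        # 4-character protocol version string
        client_id: str      # Client Identifier (cid): vendor and release, 4 words
        zid: bytes          # 96-bit unique endpoint identifier
        sig_capable: bool   # (S) endpoint can parse and verify digital signatures
        mitm: bool          # (M) the PBX-related flag
        passive: bool       # (P) device only ever acts as Responder
        # Supported algorithm lists (example codes, purely illustrative):
        hash_algos: List[str] = field(default_factory=lambda: ["S256"])
        ciphers: List[str] = field(default_factory=lambda: ["AES1"])
        auth_tag_types: List[str] = field(default_factory=lambda: ["HS32"])
        key_agreement_types: List[str] = field(default_factory=lambda: ["DH3k"])
        sas_types: List[str] = field(default_factory=lambda: ["B32"])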
At this point a careful protocol reviewer will be asking 'what does ZRTP do to prevent this?' ZRTP gives us several answers, both good and not-so-good:
  • On the bad side, ZRTP allows us to send a hash of the Initiator's Hello message through a separate integrity-protected channel, e.g., secure SIP. (This protection gets folded into later messages using a crazy hash-chain and MAC construction.) In general I'm not a fan of this protection — it's like saying a life-preserver will keep you alive… as long as you have a separate lifeboat to wear it in. If you have a secure channel in the first place, why are you using ZRTP? (Even the ZRTP authors seem a little sheepish about this one.)
  • On the confusing side, Hello messages are MACed using a MAC key that isn’t revealed until the subsequent message (e.g., Commit). In theory this means you can’t forge a MAC until the Commit message has been delivered. In practice, an MiTM attacker can just capture both the Initiator Hello (F3) and the Commit message (F5), learn the MAC key, and then forge the Hello message to its heart’s content… before sending both messages on to the recipient. This entire mechanism baffles me. The less said about it the better.
  • A more reasonable protection comes from the key derivation process. In a later phase of the protocol, both ZRTP parties compute a hash over the handshake messages. This value is folded into the Key Derivation Function (KDF) that's ultimately used to compute session keys. The hashing process looks like:

    total_hash = hash(Hello of responder || Commit || DHPart1 || DHPart2)
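Here's a minimal sketch of that hashing step, treating each handshake message as an opaque byte string and using SHA-256 for concreteness (placeholder values, not real wire encodings):

    import hashlib

    def total_hash(responder_hello: bytes, commit: bytes,
                   dhpart1: bytes, dhpart2: bytes) -> bytes:
        # total_hash = hash(Hello of responder || Commit || DHPart1 || DHPart2)
        # Note what's absent: the Initiator's Hello (F3) is not an input, so
        # tampering with it leaves this value -- and the derived keys -- unchanged.
        return hashlib.sha256(responder_hello + commit + dhpart1 + dhpart2).digest()

    # Placeholder messages, purely for illustration:
    th = total_hash(b"responder-hello", b"commit", b"dhpart1", b"dhpart2")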

But… zounds! One important message is missing: the Initiator's Hello (F3). I can't for the life of me figure out why this message would be left out. And unless there's something really clever that I'm missing, this means an attacker can tamper with the Initiator's Hello message without affecting the key at all.*

So is this a problem? Well, in theory it means you could roll back any field in the Initiator Hello message without detection. It's not exactly clear what practical benefit this would have. You could certainly modify the Diffie-Hellman parameters or ciphers to turn off advanced options like ECDH or 256-bit AES. Fortunately ZRTP does not support 'export weakened' ciphers, so even the 'weak' options are pretty strong. Still: this seems like a pretty big oversight, and possibly an unforced error.

For the moment, let’s file it under ‘this protocol is really complicated‘ or ‘don’t analyze protocols at Thanksgiving‘. At very least I think this could use a good explanation.

(Update 11/25: After some back and forth, Moxie points out that the Hash, SAS and Cipher information is repeated in the Commit message [F5], so that should provide an extra layer of protection against rollback on those fields — but not the other fields like version or signature capability. And of course, an implementation might have the Responder prune its list of supported algorithms by basing it off the Initiator’s list, which would be a big problem.)
Diffie-Hellman Key Establishment

Assuming that the two parties have successfully completed the negotiation, the next phase of the protocol requires them to establish a shared key. This is done using (nearly) straight-up Diffie-Hellman. The parameters are negotiated in the previous phase, and are drawn from a list defined by RFC 3526 and a NIST publication.
Normally a Diffie-Hellman agreement would work something like this (all exponentiation is in a group, i.e., mod p):

  1. Alice picks a, sends g^a (message F6).
  2. Bob picks b, sends g^b (message F7).
  3. Both parties compute g^ab and HMAC this value together with lots of other things to derive a session key s0.
  4. The parties further compute a 32-bit Short Authentication value as a function of s0, convert this into a human-readable Short Authentication String (SAS), and compare notes verbally.

The problem with the traditional Diffie-Hellman protocol is that it doesn't play nice with the Short Authentication String mechanism. Say an MiTM attacker Eve has established a connection with Bob, during which they agreed on SAS value X. Eve now tries to run the Diffie-Hellman protocol with Alice. Once Alice sends her choice of g^a, Eve can now run an offline attack, trying millions of candidate b values until she gets one such that the derived SAS between her and Alice is also X. This is a problem, since Alice and Bob will now see the same SAS, even though Eve is duping them.

ZRTP deals with this by forcing one party (Bob) to commit to g^b before the protocol even begins. Thus, in ZRTP Bob picks b and sends H(g^b) before Alice sends her first message. This 'commitment' is transmitted in the "Commit" message (F5).

This prevents Bob from ‘changing his mind’ after seeing Alice’s input, and thus the remote party has at most one chance to get a colliding SAS per protocol run. Which means Eve is (probably) out of luck.**
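Here's a toy sketch of that commit-then-reveal ordering. The group parameters are deliberately tiny and nothing like a real RFC 3526 group; the point is only the ordering, namely that Bob binds himself to g^b before he ever sees g^a:

    import hashlib
    import secrets

    # Toy parameters for illustration only -- nowhere near a real DH group.
    p, g = 2**127 - 1, 5

    def commitment(value: int) -> bytes:
        return hashlib.sha256(str(value).encode()).digest()

    # Bob picks b and commits to g^b *before* seeing anything from Alice.
    b = secrets.randbelow(p - 2) + 2
    gb = pow(g, b, p)
    bob_commit = commitment(gb)            # carried in the Commit message (F5)

    # Alice sends g^a (F6); only then does Bob reveal g^b (F7).
    a = secrets.randbelow(p - 2) + 2
    ga = pow(g, a, p)

    # Alice checks the reveal against the earlier commitment.
    assert commitment(gb) == bob_commit

    # Both sides derive the shared secret; the SAS is a short function of it.
    assert pow(gb, a, p) == pow(ga, b, p)
    sas = hashlib.sha256(str(pow(gb, a, p)).encode()).digest()[:4]   # 32-bit SAS

    # Because Bob was locked in before seeing g^a, he gets exactly one shot at
    # producing a colliding SAS per protocol run -- no offline grinding.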

Key Confirmation & Secure Communication

The rest of the protocol does all of the stuff you expect from a key agreement protocol: the two parties, having successfully completed a Diffie-Hellman exchange, now derive a session key by HMACing together the value g^ab with total_hash and any pre-shared secrets they hold from older sessions. If all goes well, the result should be a strong secret that only the two parties know — or an SAS mismatch that reveals Eve’s tampering.

From this point it's all downhill, or it should be at least. Both parties now have a shared secret that they can use to derive secure encryption and MAC keys. They now construct "Confirm" messages (F8, F9), which they encrypt and (partially) MAC. In principle this exchange gives both parties a chance to detect a mismatch in keys and call the whole thing off.

There's not a ton to say about this section except for one detail: the MAC on these messages is computed only over the encrypted portion (between the == signs below), and leaves out critical details like the Initialization Vector that's used to encrypt them: ***

Structure of a ZRTP “Confirm” message (source).

Once again it’s not clear if there’s any practical impact here, but at least technically speaking, someone could tamper with this IV and thus change the decryption of the message (specifically, the first 64 bits of H0). I have no idea if this matters — or even if it’s a real attack — but in general it seems like the kind of thing you want to avoid. It’s another example of a place where I just plain don’t understand ZRTP.

Conclusion

I think it should be obvious at this point that I have no real point with this post — I just love protocols. If you’re still reading at this point, you must also love protocols… or else you did something really wrong, and reading this post is your punishment.

ZRTP is a fun protocol to analyze, since it’s a particularly complicated spec that deals with many use-cases at once. Plus it’s an old protocol and it’s been subjected to lots of analysis (including analysis by robots). That’s why I’m so surprised to see loose ends at this date, even if they’re not particularly useful loose ends.

Regardless of the results, anyone who’s interested in cryptography should try their hands at this from time to time. There are so many protocols that we rely on in our day-to-day existence, and far too few of these have really been poked at.

Notes:

* An older paper by Gupta and Shmatikov also notes the lack of authentication for ZID messages, and describes a clever attack that causes the peers to not detect mutually shared secrets. This seems to be fixed in later versions of the protocol, since the ZID values are now included in the Key Derivation Function. But not the rest of the fields in the Initiator Hello.

*** This is another place where the specification is a bit vague. It says that the MAC is computed over the "encrypted part of the message", but isn't entirely clear if the MAC is computed pre- or post-encryption. If it's pre-encryption this is a bit non-standard, but not a major concern.

Reposted: A cryptanalysis of HDCP v2.1

Update 8/27: This post was originally published three weeks ago under a different title. I subsequently took it down to give affected vendors time to patch the bugs. As a result of the notification, Digital Content Protection LLC (DCP) has updated the spec to v2.2. 

Contrary to my understanding when I wrote the original post, HDCP v2 actually is used by a number of devices.

I would like to give credit to Alon Ziv at Discretix, who had previously discovered the Locality Check issue, and to Martin Kaiser, who experimentally verified the master secret issue on a European Samsung TV and a Galaxy S II.

Finally, I would like to thank Hanni Fakhoury and Marcia Hofmann at the Electronic Frontier Foundation for all of their helpful advice. The EFF is one of the only organizations that represents security researchers. Please consider donating so they can keep doing it!

Over the past couple of weeks I’ve mostly been blogging about inconsequential things. Blame summer for this — it’s hard to be serious when it’s 104 degrees out. But also, the world just hasn’t been supplying much in the way of interesting stuff to write about.

Don’t get me wrong, this is a good thing! But in a (very limited) way it’s also too bad. One of the best ways to learn about security systems is to take them apart and see how they fail. While individual systems can be patched, the knowledge we collect from the process is invaluable.

Fortunately for us, we’re not completely helpless. If we want to learn something about system analysis, there are plenty of opportunities right out there in the wild. The best place to start is by finding a public protocol that’s been published, but not implemented yet. Download the spec and start poking!

This will be our task today. The system we’ll be looking at is completely public, and (to the best of my knowledge) has not yet been deployed anywhere (Update: see note above). It’s good practice for protocol cryptanalysis because it includes all kinds of complicated crypto that hasn’t been seriously reviewed by anyone yet.

(Or at least, my Google searches aren’t turning anything up. I’m very willing to be corrected.)

Best of all, I’ve never looked at this system before. So whatever we find (or don’t find), we’ll be doing it together.

A note: this obviously isn’t going to be a short post. And the TL;DR is that there is no TL;DR. This post isn’t about finding bugs (although we certainly will), it’s about learning how the process works. And that’s something you do for its own sake.

HDCPv2

The protocol we’ll be looking at today is the High Bandwidth Digital Content Protection (HDCP) protocol version 2. Before you get excited, let me sort out a bit of confusion. We are not going to talk about HDCP version 1, which is the famous protocol you probably have running in your TV right now.

HDCPv1 was analyzed way back in 2001 and found to be wanting. Things got much worse in 2010 when someone leaked the HDCPv1 master key — effectively killing the whole system.

What we’ll be looking at today is the replacement: HDCP v2. This protocol is everything that its predecessor was not. For one thing, it uses standard encryption: RSA, AES and HMAC-SHA256. It employs a certificate model with a revocation list. It also adds exciting features like ‘localization’, which allows an HDCP transmitter to determine how far away a receiver is, and stop people from piping HDCP content over the Internet. (In case they actually wanted to do that.)

HDCPv2 has barely hit shelves yet (Update: though it was recently selected as the transport security for MiraCast). The Digital Content Protection licensing authority has been keeping a pretty up-to-date set of draft protocol specifications on their site. The latest version at the time of this writing is 2.1, and it gives us a nice opportunity to see how industry ‘does’ protocols.

An overview of the protocol

As cryptographic protocols go, HDCPv2 has a pretty simple set of requirements. It’s designed to protect  high-value content running over a wire (or wireless channel) between a transmitter (e.g., a DVD player) and a receiver (a TV). The protocol accomplishes the following operations:

  1. Exchanging and verifying public key certificates.
  2. Establishing shared symmetric keys between the transmitter and receiver.
  3. Caching shared keys for use in later sessions.
  4. Verifying that a receiver is local, i.e., you’re not trying to proxy the data to some remote party via the Internet.

These functions are accomplished via three (mostly) separate protocols: a public-key Authenticated Key Agreement (AKE) protocol, a pairing protocol, where the derived key is cached for later use, and a locality check protocol to ensure that the devices are physically close.

I’m going to take these protocols one at a time, since each one involves its own messages and assumptions.

Phase (1): Authenticated Key Agreement (AKE)

The core of HDCPv2 is a custom key exchange protocol, which looks quite a bit like TLS. (In fact, the resemblance is so strong that you wonder why the designers didn’t just use TLS and save a lot of effort.) It looks like this:

 

HDCPv2 key agreement protocol (source). Click the image to enlarge.

Now, there’s lots going on here. But if we only look at the crypto, the summary is this:

The transmitter starts by sending ‘AKE_Init’ along with a random 64-bit nonce R_tx. In response, the receiver sends back its certificate, which contains its RSA public key and device serial number, all signed by the HDCP licensing authority.

If the certificate checks out (and is not revoked), the transmitter generates a random 128-bit 'master secret' K_m and encrypts it under the receiver's public key. The result goes back to the receiver, which decrypts it. Now both sides share K_m and R_tx, and can combine them using a wacky custom key derivation function. The result is a shared session key K_d.

The last step is to verify that both sides got the same K_d. The receiver computes a value H’, using HMAC-SHA256 on inputs K_d, R_tx and some other stuff. If the receiver’s H’ matches a similar value computed at the transmitter, the protocol succeeds.

Simple, right?
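Stripped down to that core exchange, the shape is something like the sketch below. The RSA step is elided, and both the key derivation and the exact HMAC inputs are placeholders of my own; the real 'wacky' KDF and the full H' inputs live in the spec:

    import hashlib
    import hmac
    import secrets

    def placeholder_kdf(k_m: bytes, r_tx: bytes) -> bytes:
        # Stand-in for HDCP's custom key derivation of K_d from K_m and R_tx.
        return hmac.new(k_m, b"kd" + r_tx, hashlib.sha256).digest()

    # Transmitter: 64-bit nonce and 128-bit master secret.
    r_tx = secrets.token_bytes(8)
    k_m = secrets.token_bytes(16)
    # ... K_m travels to the receiver encrypted under its RSA public key ...

    # Both sides can now derive the session key K_d.
    k_d = placeholder_kdf(k_m, r_tx)

    # Receiver proves it recovered K_m by sending H'; the transmitter recomputes
    # the same value and compares. ("Some other stuff" is omitted from the inputs.)
    h_prime = hmac.new(k_d, r_tx, hashlib.sha256).digest()
    assert hmac.compare_digest(h_prime, hmac.new(k_d, r_tx, hashlib.sha256).digest())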

Note that I’ve ignored one last message in the protocol, which turns out to be very important. Before we go there, let’s pause and take stock.

If you’re paying close attention, you’ve noticed a couple of worrying things:

  1. The transmitter doesn’t authenticate itself at all. This means anyone can pretend to be a transmitter.
  2. None of the handshake messages (e.g., AKE_Transmitter_Info) appear to be authenticated. An attacker can modify them as they transit the wire.
  3. The session key K_d is based solely on the inputs supplied by the transmitter. The receiver does generate a nonce R_rx, but it isn’t used in the above protocol.
None of these things by themselves are a problem, but they make me suspicious.

Phase (2): Pairing

Public-key operations are expensive. And you only really need to do them once. The designers recognized this, and added a feature called ‘pairing’ to cache the derived K_m for use in later sessions. This is quite a bit like what TLS does for session resumption.

However, there's one catch, and it's where things get complicated: some receivers don't have a secure non-volatile storage area for caching keys. This didn't faze the designers, who came up with a 'clever' workaround for the problem: the receiver can simply ask the transmitter to store K_m for it.

To do this, the receiver encrypts K_m under a fixed internal AES key K_h (which is derived by hashing the receiver’s RSA private key). In the last message of the AKE protocol the receiver now sends this ciphertext back to the transmitter for storage. This appears in the protocol diagram as the ciphertext E(K_h, K_m).

The obvious intuition here is that K_m is securely encrypted. What could possibly go wrong? The answer is to ask how K_m is encrypted. And that’s where things get worrying.

According to the spec, K_m is encrypted using AES in what amounts to CTR mode, where the ‘counter’ value is defined as some value m. On closer inspection, m turns out to be just the transmitter nonce R_tx padded with 0 bits. So that’s simple. Here’s what it looks like:

Encryption of the master key K_m with the receiver key K_h. The value m is equal to (R_tx || 0x000000000000000).

Now, CTR is a perfectly lovely encryption mode provided that you obey one unbreakable rule: the counter value must never be re-used. Is that satisfied here? Recall that the counter m is actually chosen by another party — the transmitter. This is worrying. If the transmitter wants, it could certainly ask the receiver to encrypt anything it wants under the same counter.
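To see why a repeated counter is fatal, here's a tiny demonstration of the general principle. SHA-256 stands in for the AES block purely so the sketch runs with the standard library; the property being shown is that a CTR-style keystream depends only on the key and counter, so two ciphertexts under the same (key, counter) pair leak the XOR of their plaintexts:

    import hashlib

    def keystream(key: bytes, counter: bytes) -> bytes:
        # Stand-in for AES(key, counter): one 16-byte keystream block.
        return hashlib.sha256(key + counter).digest()[:16]

    def ctr_encrypt(key: bytes, counter: bytes, block: bytes) -> bytes:
        return bytes(a ^ b for a, b in zip(keystream(key, counter), block))

    key = b"receiver-key-khh"                # stand-in for K_h
    m = b"R_tx-val" + b"\x00" * 8            # counter chosen by the transmitter!

    pt1, pt2 = b"first 16-byte pt", b"other 16-byte pt"
    c1 = ctr_encrypt(key, m, pt1)
    c2 = ctr_encrypt(key, m, pt2)

    # Same key, same counter: the ciphertexts XOR to the XOR of the plaintexts.
    assert bytes(a ^ b for a, b in zip(c1, c2)) == bytes(a ^ b for a, b in zip(pt1, pt2))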

Of course, an honest transmitter won’t do this. But what about a dishonest transmitter? Remember that the transmitter is not authenticated by HDCP. The upshot is that an attacker can pretend to be a transmitter, and submit her own K_m values to be encrypted under K_h and m.

Even this might be survivable, if it weren’t for one last fact: in CTR mode, encryption and decryption are the same operation.

All of this leads to the following attack: 

  1. Observe a legitimate communication between a transmitter and receiver. Capture the values R_tx and E(K_h, K_m) as they go over the wire.
  2. Now: pretend to be a transmitter and initiate your own session with the receiver.
  3. Replay the captured R_tx as your initial transmitter nonce. When you reach the point where you pick the master secret, don't use a random value for K_m. Instead, set K_m equal to the ciphertext E(K_h, K_m) that you captured earlier. Recall that this ciphertext has the form:

     AES(K_h, R_tx || 000…) ⊕ K_m

     Now encrypt this value under the receiver's public key and send it along.
  4. Sooner or later the receiver will encrypt the 'master secret' you chose above under its internal key K_h. The resulting ciphertext can be expanded to:

     AES(K_h, R_tx || 000…) ⊕ AES(K_h, R_tx || 000…) ⊕ K_m

Thanks to the beauty of XOR, the first two terms of this ciphertext simply cancel out. The result is the original K_m from the first session! Yikes!
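Here's the whole thing worked end to end, again with SHA-256 standing in for the AES block so the sketch is self-contained. pair_encrypt models the receiver's E(K_h, K_m) operation; the RSA wrapping of the attacker's chosen 'master secret' is elided:

    import hashlib

    def block(key: bytes, counter: bytes) -> bytes:
        # Stand-in for AES(key, counter).
        return hashlib.sha256(key + counter).digest()[:16]

    def pair_encrypt(k_h: bytes, r_tx: bytes, k_m: bytes) -> bytes:
        # E(K_h, K_m) = AES(K_h, R_tx || 0...) XOR K_m
        counter = r_tx + b"\x00" * (16 - len(r_tx))
        return bytes(a ^ b for a, b in zip(block(k_h, counter), k_m))

    k_h = b"receiver-secret!"                     # receiver's internal key K_h
    r_tx, k_m = b"legit-Rt", b"legit-master-km!"  # the legitimate session

    # Step 1: eavesdrop E(K_h, K_m) as it crosses the wire.
    captured = pair_encrypt(k_h, r_tx, k_m)

    # Steps 2-4: pose as a transmitter, replay R_tx, and hand the receiver the
    # captured ciphertext as our "master secret". It dutifully encrypts it:
    returned = pair_encrypt(k_h, r_tx, captured)

    # The two identical keystream blocks cancel, leaving the original K_m.
    assert returned == k_m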

This is a huge problem for two reasons. First, K_m is used to derive the session keys used to encrypt HDCP content, which means that you may now be able to decrypt any past HDCP content traces. And even worse, thanks to the ‘pairing’ process, you may be able to use this captured K_m to initiate or respond to further sessions involving this transmitter.
 

Did I mention that protocols are hard?

Phase (3): The Locality Check

For all practical purposes, the attack above should be our stopping point. Once you have the stored K_m you can derive the session keys and basically do whatever you want. But just for fun, let’s go on and see what else we can find.

At its heart, the locality check is a pretty simple thing. Let’s assume the transmitter and receiver are both trusted, and have successfully established a session key K_d by running the AKE protocol above. The locality check is designed to ensure that the receiver is nearby — specifically, that it can provide a cryptographic response to a challenge, and can do it in < 7 milliseconds. This is a short enough time that it should prevent people from piping HDCP over a WAN connection.

(Why anyone would want to do this is a mystery to me.)

 
In principle the locality check should be simple. In practice, it turns out to be pretty complicated. Here’s the ‘standard’ protocol:
Simple version of the locality check. K_d is a shared key and R_rx is a receiver nonce.
Now this isn’t too bad: in fact, it’s about the simplest challenge-response protocol you can imagine. The transmitter generates a random nonce R_n and sends it to the receiver. The receiver has 7 milliseconds to kick back a response L’, which is computed as HMAC-SHA256 of {the session key K_d, challenge nonce R_n, and a ‘receiver nonce’ R_rx}. You may recall that the receiver nonce was chosen during the AKE.
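In sketch form, the transaction looks something like this; the exact HMAC input ordering is the spec's, so treat the concatenation below as my own shorthand:

    import hashlib
    import hmac
    import secrets
    import time

    def locality_response(k_d: bytes, r_n: bytes, r_rx: bytes) -> bytes:
        # L' = HMAC-SHA256 over the session key, the challenge and the receiver nonce.
        return hmac.new(k_d, r_n + r_rx, hashlib.sha256).digest()

    k_d = secrets.token_bytes(32)      # session key from the AKE (shared)
    r_rx = secrets.token_bytes(8)      # receiver nonce chosen during the AKE

    # Transmitter: send a fresh challenge R_n and start the clock.
    r_n = secrets.token_bytes(8)
    start = time.monotonic()

    # Receiver: compute L' and send it straight back.
    l_prime = locality_response(k_d, r_n, r_rx)

    # Transmitter: accept only a correct answer that arrives within 7 ms.
    elapsed_ms = (time.monotonic() - start) * 1000
    ok = hmac.compare_digest(l_prime, locality_response(k_d, r_n, r_rx)) and elapsed_ms < 7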
 
So far this looks pretty hard to beat.
 

But here's a wrinkle: some devices are slow. Consider that the 7 milliseconds must cover the round-trip communication time, as well as the time required to compute the HMAC. There is a very real possibility that some slower, embedded devices might not be able to respond in time.

Will HDCP provide a second, optional protocol to deal with those devices? You bet it will.

The second protocol allows the receiver to pre-compute the HMAC response before the timer starts ticking. Here’s what it looks like:
 
‘Precomputed’ version of the protocol.

This is nearly the same protocol, with a few small differences. Notably, the transmitter gives the receiver all the time it wants to compute the HMAC. The timer doesn’t start until the receiver says it’s ready.

Of course, there has to be something keeping the RTT under 7ms. In this case the idea is to keep the receiver from speaking until it's received some authenticator from the transmitter. This consists of the least significant 128 bits of the expected HMAC result (L'), which is computed in the same way as above. The receiver won't speak until it sees those bits. Then it'll kick back its own response, which consists of the most significant 128 bits of the same value.

Ok, so here we have a protocol that's much more complicated. But considered on its own, this one looks pretty ok by me.

But here’s a funny question: what if we try running both protocols at once?

No, I’m not being ridiculous. You see, it turns out that the receiver and transmitter get to negotiate which protocol they support. By default they run the ‘simple’ protocol above. If both support the pre-computed version, they must indicate this in the AKE_Transmitter_Info and AKE_Receiver_Info messages sent during the handshake.

This leads to the following conjecture: what if, as a man-in-the-middle attacker, we can convince the transmitter to run the 'pre-computed' protocol, and at the same time convince the receiver to run the 'simple' one? Recall that none of the protocol flags (transmitted during the AKE) are authenticated. We might be able to trick both sides into seeing a different view of the other's capabilities.

Here’s the setup: we have a receiver running in China, and a transmitter located in New York. We’re a man-in-the-middle sitting next to the transmitter. We want to convince the transmitter that the receiver is close — close enough to be on a LAN, for example. Consider the following attack:

  1. Modify the message flags so that the transmitter thinks we’re running the pre-computed protocol. Since it thinks we’re running the pre-computed protocol, it will hand us R_n and then give us all the time in the world to do our pre-computation.
  2. Now convince the receiver to run the ‘simple’ protocol. Send R_n to it, and wait for the receiver to send back the HMAC result (L’).
  3. Take a long bath, mow the lawn. Watch Season 1 of Game of Thrones.
  4. At your leisure, send the RTT_READY message to the transmitter, which has been politely waiting for the receiver to finish the pre-computation.
  5. The transmitter will now send us some bits. Immediately send it back the most significant bits of the value L’, which we got in step (2).
  6. Send video to China.

Now this attack may not always work — it hinges on whether we can convince the two parties to run different protocols. Still, this is a great teaching example in that it illustrates a key problem in cryptographic protocol design: parties may not share the same view of what's going on.

A protocol designer's most important job is to ensure that such disagreements can never happen. The best way to do this is to ensure that there's only one view to be had — in other words, dispense with all the options and write a single clear protocol. But if you must have options, make sure that the protocol only succeeds if both sides agree on what those options are. This is usually accomplished by authenticating the negotiation messages, but even this can be a hard, hard problem.

Compared to the importance of learning those lessons, actually breaking localization is pretty trivial. It's a stupid feature anyway.

In Conclusion

This has been a long post. To the readers I have left at this point: thanks for sticking it out.

The only remaining thing I'd like to say is that this post is not intended to judge HDCPv2, or to make it look bad. It may or it may not be a good protocol, depending on whether I've understood the specification properly and depending on whether the above flaws make it into real devices. Which, hopefully they won't now.

What I’ve been trying to do is teach a basic lesson: protocols are hard. They can fail in ruinous, subtle, unexpected, exciting ways. The best cryptographers — working with BAN logic analyzers and security proofs — still make mistakes. If you don’t have those tools, steer clear.

The best 'fix' for the problem is to recognize how dangerous protocols can be, and to avoid designing your own. If you absolutely must do so, please try to make yours as simple as possible. Too many people fail to grok this lesson, and the result is, well, HDCPv2.

===

Update 8/27: As I mentioned above, DCP has released a new version of the specification. Version 2.2 includes several updates: it changes the encryption of Km to incorporate both the Transmitter and Receiver nonces. It also modifies the locality check to patch the bug described above. Both of these changes appear to mitigate the bugs above, at least in new devices.

A missing post (updated)

Update (8/27): I’ve put the original post back up.

Update (8/9): I’ve re-written this post to include a vague, non-specific explanation of the bug. I’ve now confirmed the problem with one vendor, who has asked for a week to issue a patch. So far I haven’t had a response from the DCP’s technical people. And yes, I do realize someone PasteBinned the original post while it was up.

A few people have asked what happened to the post that was in this space just a few hours ago. No, you’re not going crazy. It was here.

The post contained a long, detailed evaluation of the HDCP v2 protocol. My idea was to do real-time evaluation of an industry protocol that hasn’t been deployed yet — a kind of ‘liveblogging’ cryptanalysis. What I expected to find was some bad practices, which I would gently poke fun at. I didn’t expect to find anything serious.

I was wrong in that initial judgement, with some caveats. I’m going to give a vague and non-specific summary here, and I hope to re-post the detailed technical post in a few days when I’ve heard (something, anything!) from DCP, the organization that maintains HDCP.

In case you’ve never heard of it, HDCP is a security protocol used to ‘protect’ video traveling over wired and wireless networks. There are two versions. Version 1 is in your TV today, and was seriously compromised in 2010. Version 2 is much better, but has only been deployed in a few products — including those that implement MiraCast (formerly Wi-Fi Display).

Version 2 contains a key agreement protocol that’s designed to establish a session encryption key between a transmitter (your phone, for example) and a receiver (a TV). Once this key is established, the transmitter can encrypt all video data going over the wire.

What I discovered in my brief analysis is a flaw in the key agreement protocol that may allow a man-in-the-middle to recover the session key (actually the ‘master secret’ used to derive the session key). This could potentially allow them to decrypt content. More on that in a minute, though.

I also discovered some slightly less serious flaws elsewhere in the protocol. It turns out that the DCP already knows about those, thanks to some enterprising work by a smart guy at an unnamed vendor (who deserves credit, and will get it once I put the original post back up).

Now for a few big caveats about the session key bug.

The bug I found does not get you all the way to decrypting HDCPv2 streams in practice, thanks to a tiny additional protection I missed while writing the original post. I don’t think much of this protection, since it involves a secret that’s stored in every single HDCPv2-compliant device. That’s a pretty lousy way to keep a secret.

And of course I haven’t personally verified this in any real HDCP devices, since I don’t own any. Although if I did, I could use this nifty HDCP plugin for WireShark to do some of the work.

The issue has been confirmed by one vendor, who is working on a patch for their product. Their products are used in real things that you’ve heard of, so I’m trusting that they’d know.

The last thing I want to address is why I published this, and why I subsequently pulled it.

When I wrote the original post I thought HDCP v2 was just a ‘paper spec’ — that there were no devices actually using it. Shortly after posting, I came across one commercial product that does claim to support HDCPv2. Later I discovered a few others. To be on the safe side, I decided to pull the post until I could notify the vendors. Then through sheer ineptitude I briefly re-posted it. Now I’m doing my best to put the toothpaste back in the tube.

As soon as I get some feedback I intend to put the post back up. A post which, incidentally, was not intended to break anything, but rather to serve as a lesson in just how complicated it is to design your own protocol. I suppose it’s achieved that goal.

Anyway, I’m putting this up as a placeholder in case you’re curious about what happened or why the heck I’m not blogging. Writing a long technical post and then having to can it is a drag. But hopefully we’ll be back to our regularly-scheduled programming in no time at all.

Where Things Fall Apart: Protocols (Part 2 of 2)

This is the fourth installment in a series on cryptographic engineering.  You can see the previous posts here:
 

Test Drive

Your vehicle ignition system is the result of a Darwinian arms race between the people who build cars — and those who like to drive them.

Vehicle ignition switches through history.  From left: vintage floor-mounted ignition button; 60s-era cylinder lock; 90s/2000-era high security key; 2011 dash-mounted ignition button.

The very first electronic vehicle ignition was nothing more than a switch that completed an electrical circuit. This worked fine in small towns and out on the farm, but things weren’t so simple in the big city. So manufacturers adapted, first adding a mechanical lock cylinder then hardening the wiring. This worked for a while, until inevitably the thieves got smarter. Worse, at this point the answer wasn’t so obvious: ignition lock technology stagnated. By the late 1980s and early 1990s, vehicle theft was a multi-billion dollar industry.

A few luxury manufacturers tried to improve the physical security of their locks using high-security keys and non-standard templates. But for most manufacturers, there was already a more promising approach at hand. Cars themselves were becoming reliant on microcontrollers for engine control. Why not use a digital lock?

The result is the vehicle immobilizer. A typical first-gen immobilizer used a small chip embedded into the head of the car key. This chip had a single purpose: when the driver inserted the key into the lock cylinder, it would squirt out a code (or serial number), which could be received by an antenna in the lock cylinder. If this code matched what the vehicle expected to see, the engine would start. Expressed as a protocol, the transaction looked like this:

Immobilizers effectively shut down traditional hotwiring and lock-picking. But they had a fatal flaw that criminals soon discovered. Since the code never changed, someone with the right equipment could eavesdrop on the communication (or borrow your keys), and later replay it to the car. This sounds complicated, but quickly became practical thanks to inexpensive devices called “code-grabbers”.

Once again manufacturers adapted. The next generation of immobilizers dropped the fixed key in favor of a simple challenge/response authentication protocol. In this approach, the immobilizer chip and car share a cryptographic key of some sort. When you insert your car key, the car generates a random “challenge” number and sends it to the key. The chip in your car key uses the cryptographic key to compute a response based on the challenge. This tops code-grabbers, since the key itself never goes over the air, and the challenge always changes.

Challenge response protocol between vehicle and immobilizer key.  The key and car share a deterministic cryptographic algorithm F() and a secret key. The car computes F(key, challenge) and compares it to the response value.
So we’ve laid out a cryptographic protocol, but it’s a simple one, and one that’s likely to be effective. What could go wrong?

40 bits of personal history

Back in 2005, along with some colleagues, I looked at the immobilizer system used in a few million Ford, Toyota and Nissan vehicles. This particular system was based on a Texas Instruments chip called the Digital Signature Transponder (DST), a tiny RFID chip with a cipher F() and a 40-bit secret key.

Two DST form factors.  The big one is a Mobil Speedpass, which also relies on the DST technology.

The DST uses exactly the challenge-response protocol I describe above. The reader (car) sends it a 40 bit challenge, the DST encrypts that value with its cipher, truncates the result down to a 24 bit response, ships it back. The car also has a copy of the secret key which it uses to verify the response.

The problem with the DST is not the protocol. Rather, it's the number I mentioned above: 40. As in 40-bit key length. If an adversary — say, a malicious parking attendant — borrows your car key, she can issue a challenge to the chip. After collecting the response, she can, at her leisure, test every single one of the approximately 1.1 trillion possible Immobilizer keys until she finds one where F(key, challenge) is equal to the response she got from your DST chip.** This sounds hard, but it only takes a few hours on an FPGA.
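The attack is easy to sketch. The DST40 cipher itself is proprietary, so a hash-based stand-in plays the role of F() below, and the keyspace is shrunk to 16 bits so the loop actually finishes; the real search is over 2^40 keys, which is what the FPGA hours buy you:

    import hashlib

    def F(key: int, challenge: int) -> int:
        # Stand-in for the proprietary DST cipher, truncated to a 24-bit response.
        digest = hashlib.sha256(key.to_bytes(5, "big") + challenge.to_bytes(5, "big")).digest()
        return int.from_bytes(digest[:3], "big")

    KEYSPACE_BITS = 16          # stand-in for the real 40

    # The victim's key, and one (challenge, response) pair scanned from the chip.
    secret_key = 0xBEEF
    challenge = 0x123456789A
    response = F(secret_key, challenge)

    # Exhaustive search: test every candidate key against the observed pair.
    candidates = [k for k in range(2 ** KEYSPACE_BITS) if F(k, challenge) == response]
    assert secret_key in candidates
    # Since the response is only 24 bits there can be false positives; a second
    # (challenge, response) pair weeds them out (see the footnote below).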

This process is called “cloning”. It’s not the scariest attack since, in general, it requires the adversary to get your car key, or at least get close enough to scan it.

Now we come to the interesting part. Texas Instruments was well aware of the DST’s keysize limitation. To foil this attack, they deployed a new chip called the DST+. Rather than simply replace the weak 40-bit algorithm with a better one, which would have solved the problem, they decided to address cloning attacks using a protocol.
DST+ Mutual Authentication protocol.  From a presentation in the  Fourth Conference on the Advanced Encryption Standard (AES) (2004).
What I know about the DST+ protocol comes from a public presentation posted by a Dr. Ulrich Kaiser from Texas Instruments Germany. I freely admit that I’ve never verified this diagram against a real DST+, so caveat emptor.
The diagram is a little confusing: let’s walk through it step by step. The car (“Reader”) is on the left, and the DST+ chip (“Transponder”) is on the right. For the most part it’s exactly the same as the DST: the car generates a 40-bit challenge and sends it over to the chip.  The chip encrypts the challenge under its secret (“Immobilizer”) key, truncates the result down to 24 bits, and sends back the response.
 
The new stuff is "tacked on" to the original protocol. Along with the challenge, the car transmits an extra 24 bits that it derives by: 1) encrypting its own challenge under the Immobilizer key, 2) encrypting the result again under a second "Mutual Authentication Key" (MAK), and 3) truncating that result down to 24 bits.
Since the DST+ chip shares both keys, it can verify that the car’s transmission is correct before it responds. If the value isn’t correct, the DST+ clams up. No response for you!  
In principle this stumps our malicious valet.  He doesn’t know the right keys.  He can send the DST+ all the challenges he wants — but it won’t answer him.  No responses, no attack.
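In sketch form, and using the same kind of hash-based stand-in for the proprietary cipher, the DST+ exchange looks roughly like this:

    import hashlib

    def F(key: int, data: int) -> int:
        # Stand-in for the proprietary cipher: a 40-bit "ciphertext".
        digest = hashlib.sha256(key.to_bytes(5, "big") + data.to_bytes(8, "big")).digest()
        return int.from_bytes(digest[:5], "big")

    def trunc24(x: int) -> int:
        return x & (2**24 - 1)

    immobilizer_key = 0x12345678AB     # 40-bit key shared by car and transponder
    mak = 0x00CAFEBABE                 # Mutual Authentication Key (also shared)

    # Car: send the challenge plus the extra 24-bit authenticator.
    challenge = 0xDEADBEEF
    authenticator = trunc24(F(mak, F(immobilizer_key, challenge)))

    # Transponder: answer only if the car proves knowledge of both keys.
    if trunc24(F(mak, F(immobilizer_key, challenge))) == authenticator:
        response = trunc24(F(immobilizer_key, challenge))   # same 24-bit reply as DST
    else:
        response = None                                     # no response for you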

All’s well that ends well?

A first observation is that the DST+ protocol only protects against challenges sent by an unauthorized reader. If our valet can eavesdrop on the communication between the DST+ and the legitimate reader in the car, he can still obtain a (challenge, response) pair.  Since these values are identical to those in the original DST protocol, the same attacks apply. He can use an FPGA to brute force the 40-bit Immobilizer key.

Here’s something else. Once he’s got the car’s Immobilizer key, he can go back and find the Mutual Authentication Key (MAK).  Given the challenge sent by the car, along with the 24-bit “additional authentication” string, he can:

  1. compute I = F(Immobilizer key, challenge),
  2. use the FPGA to test every single possible MAK value
  3. stop when he finds a MAK value s.t. F(MAK, I) matches the “additional authentication”.
This might seem a little pointless. After all, we already have the Immobilizer key, which is all we need to simulate a DST+ and thus start the car. Why bother going back for the MAK?

Into hypothetical territory

Yet imagine… What if a car manufacturer made a tiny mistake? What if, speaking hypothetically, the manufacturer decided to use a single MAK across many different cars — say, every 2009 Toyota Camry? A tiny, harmless optimization.

And yet.

We know that knowledge of the Immobilizer Key makes it easy to find the car’s MAK.  But this works the other way, too. If many cars share a MAK, then anyone who knows that value can use it to derive the Immobilizer key for a car.

Even better (or worse, depending on your point of view) our attacker can do this without ever seeing the car key at all.  All you need is a challenge value and “additional authentication” value, both of which the car will give to you. The owner can be fast asleep with his keys safe on his nightstand next to him. You’re outside stealing his car.

So in other words, if you use DST+ mutual authentication, and make the small mistake of re-using a MAK across multiple vehicles, you've transformed a mild key cloning attack into something much worse. People can now steal your car even without scanning your key.

Please keep in mind that all of this is hypothetical and speculative. But the re-use of a MAK key could happen. There's evidence that it may have, at least in the past. What it goes to show is that if you're not very careful about your goals and security properties, protocols can do unexpected things. They can make you less secure.

Rolling it up

These posts were not intended to be an in-depth tutorial on the mysteries of protocol design and analysis.  I do hope to talk about that more in the future.  So far we’ve barely scratched the surface of what can go wrong in a cryptographic protocol.  And these are certainly not the best examples of “bad” protocols.

Instead, the purpose of this discussion was to provide a couple of case studies involving real protocols whose failure has implications for millions of people.  It was also to show you how tiny changes to a protocol can have a significant impact.

In the next few installments of this overview series we'll look a bit at hardware, physical security, and the kinds of things that can go wrong even when you build the best machines with the best intentions.

Footnotes:

** Note, since the response is only 24 bits long, there’s a possibility of collisions, i.e., many different keys will satisfy truncate(F(key, challenge)) == response.  To deal with this problem, all you need to do is obtain a second (challenge, response) pair and weed out the false positives.

Where Things Fall Apart: Protocols (Part 1 of 2)

This is the third installment in a series on cryptographic engineering.  You can see the previous posts here:
 

Protocols

Once upon a time there were no cryptographic protocols. Most cryptographic communication took the form of one-move operations like this one:

 

There are plenty of things that can go wrong in this scenario. But if you’re the sender (as opposed to the sap carrying the message) you could restrict your concerns to the security of the primitive (e.g., cipher), or perhaps to the method of key distribution.  Admittedly, through most of history it was almost impossible to get these things “right”.  Still, you knew where your weaknesses were.

Fast forward to present day. Modern cryptographic systems are almost never content with simple one-way communication. Consider the simple act of plugging in your high-definition TV via a digital connection. Thanks to modern rights management protocols such as HDCP and DTCP, this can lead to communication flows like this:

Not only is the communication here vastly more complicated, but it takes place in the presence of a more dangerous adversary.  For one thing, he owns the communication channel and the hardware — it’s probably sitting in his living room or lab.  Most importantly, he’s got time on his hands.  Your typical barbarian only got one chance to capture a ciphertext.  If this guy needs to, he can run the protocol 100,000 times.

Now, most people will tell you that cryptography is one of the most settled areas of computer security. Relative to the disaster that is software security, this is true. Particularly when it comes to standardized cryptographic primitives, we've made enormous progress — to the point where attacks lag by years, if not decades (though, see my previous post). If you use reasonable primitives, your attacker will not get you by breaking the algorithm.

As Gibson said, the future is here, but it’s not evenly distributed. And in the field of cryptography, protocol design is one area in which progress has definitely not been as fast.

We've learned a huge amount about what not to do. We have tools that can sometimes catch us when we trip up. Unfortunately, we still don't have a widely-accepted methodology for building provably-secure protocols that are as practical as they are safe to use.

Maybe I’m getting ahead of myself.  What do I even mean when I refer to a “cryptographic protocol”?

For the purposes of this discussion, a protocol is an interactive communication between two or more parties. Obviously a cryptographic protocol is one that combines cryptographic primitives like encryption, digital signing, and key agreement to achieve some security goal.

What makes cryptographic protocols special is the threat. When we analyze cryptographic protocols, we assume that the protocol will be conducted in the presence of an adversary, who (at minimum) eavesdrops on all communications. More commonly we assume that the adversary also has the ability to record, inject and modify communications flowing across the wire. This feature is what makes cryptographic protocols so much more interesting — and troublesome — than the other types of protocol that crop up in computer networking.

A first example: SSL and TLS

SSL and its younger cousin TLS are probably the best-known security protocols on the Internet. The most recent version is TLS 1.2 and we think it's a pretty solid protocol now. But it didn't achieve this status in one fell swoop. The history of each SSL/TLS revision includes a number of patches, each improving on various flaws found in the previous version.

The original SSL v1 was a proprietary protocol created by Netscape Inc. back in 1994 to secure web connections between a browser and a web server. SSL v2 was standardized the next year. The protocol is a highly flexible framework by which two parties who have never met before can authenticate each other, agree on transport keys, and encrypt/authenticate data. Part of the flexible nature of the protocol is that it was designed to support many different ciphersuites and modes of operation. For example, it’s possible to configure SSL to authenticate data without encrypting it, even though almost nobody actually does this.

Now let me be clear. There’s nothing wrong with flexibility, per se. The benefits of flexibility, extensibility and modularity are enormous in software design — there’s nothing worse than re-factoring 100,000 lines of production code because the original designers didn’t consider the possibility of new requirements. It’s just that when it comes to building a secure cryptographic protocol, flexibility almost always works against you.

Let’s consider a concrete example. The SSL v2 protocol included a very nice feature known as “ciphersuite negotiation”. This allowed both parties to negotiate which ciphers they wanted to use from a (potentially) distinct list of candidates that each one supported. In practice, some of those ciphers were likely to be better than others. Some were designed to be worse: for example, they were created to support export-controlled browsers that max out at a 40-bit key length.

There’s nothing fundamentally wrong with ciphersuite negotiation. It clearly gives a lot of flexibility to SSL, since the protocol can easily be upgraded to support new ciphersuites without breaking compatibility with older implementations.  From the spec, you can see that this section of the protocol was treated as something of an abstraction by the designers. The negotiation is handled via its own dedicated set of messages.  Looking at this, you can almost hear some long-ago dry erase marker scribbling “ciphersuite negotiation messages go here”.

Unfortunately, in providing all of this flexibility, the designers of SSL v2 essentially created many implicit protocols, each of which could be instantiated by changing the details of the ciphersuite negotiation process. Worse, unlike the messages in the “main” protocol, the ciphersuite negotiation messages exchanged between client and server in the SSLv2 protocol were not authenticated.

In practice, this means that our adversary can sit on the wire in between the two, replacing and editing the messages passing by to make it look like each party only supports the lowest common ciphersuite. Thus, two parties that both support strong 128-bit encryption can be easily tricked into settling on the 40-bit variant.  And 40 bits is not good.
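Here's a toy model of the problem. The suite names and strengths are illustrative rather than the actual SSLv2 cipher-kind codes, but the logic is the point: the offer lists travel unauthenticated, so a man-in-the-middle can quietly trim them in flight:

    # Illustrative suite names and strengths, not the actual SSLv2 cipher-kind codes.
    STRENGTH = {"RC4_128": 128, "DES_64": 64, "RC4_40_EXPORT": 40}

    def negotiate(client_offers, server_supports):
        # Pick the strongest suite both sides claim to support.
        common = [s for s in client_offers if s in server_supports]
        return max(common, key=lambda s: STRENGTH[s])

    client = ["RC4_128", "DES_64", "RC4_40_EXPORT"]
    server = ["RC4_128", "RC4_40_EXPORT"]

    print(negotiate(client, server))                 # RC4_128: what both sides wanted

    # The attacker strips everything but the export suite from the client's offer
    # as it crosses the wire; neither side can detect the modification.
    tampered = [s for s in client if s == "RC4_40_EXPORT"]
    print(negotiate(tampered, server))               # RC4_40_EXPORT: 40 bits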

Now, the good news is that this vulnerability is a piece of history.  It was closed in SSL v3, and the fix has held up well through the present day.  But that doesn’t mean TLS and SSL are home free.

This post is continued in the next section: “Where Things Fall Apart: Protocols (Part 2)”.