Why is plain-hash-then-encrypt not a secure MAC?

Question

It seems that even in MAC-then-encrypt systems like SSL, something like HMAC is used rather than a plain hash. Why?

Suppose we use some stream cipher; then why can't we use $Encrypt(m | H(m))$ as the MAC-then-encrypted version of the message? Assuming no bad relations between $Encrypt$ and $H$, what are the possible weaknesses? It seems that in this case we are encrypting the hash with part of a secretly keyed stream cipher keystream, and the hash is secure; does this not fulfil an approximation of the ideal MAC?

This must be somehow bad, since no secure protocol seems to use it. Why?

See also Does CBC encryption of a hash provide authenticity? — Gilles 'SO- stop being evil', Commented Jan 5, 2018 at 20:38

fgrieu · Accepted Answer · 2019-12-11 17:32:14Z

$\operatorname{Encrypt}(m\|H(m))$ is not an operating mode providing authentication; forgeries are possible in some very real scenarios. Depending on the encryption used, a forgery may require nothing more than known plaintext.

Here is a simple example with $\operatorname{Encrypt}$ a stream cipher, including any block cipher in CTR or OFB mode. Mallory wants to forge an authenticator for some message $m$ of his choice.

Mallory intercepts a cryptogram $\operatorname{Encrypt}(m_0\|H(m_0))$ of the form $IV\|C_0$, with $m_0$ known and at least the size of $m$; $C_0$ has the size of $m_0\|H(m_0)$ and is the exclusive-OR of the latter with a keystream $K$ that is a function of the key and $IV$.
Mallory computes $K$ as $C_0\oplus(m_0\|H(m_0))$, truncated to the size of $m\|H(m)$ if $m$ is shorter than $m_0$.
Mallory computes $C=(m\|H(m))\oplus K$.
Mallory replaces the cryptogram by $IV\|C$, which will pass the authentication check of the receiver, and decipher as $m$.
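The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: `keystream`, `seal` and `open_` are hypothetical helpers, and the keystream generator (SHA-256 in counter mode) is a toy stand-in for a real stream cipher such as AES-CTR. The attack only relies on the ciphertext being plaintext XOR keystream.

```python
import hashlib
import os

def keystream(key, iv, n):
    """Toy keystream (assumption): SHA-256 in counter mode, for illustration only."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor(a, b):
    """XOR two byte strings; the result has the length of the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def seal(key, m):
    """Encrypt(m || H(m)) under a fresh random IV; returns (IV, C)."""
    iv = os.urandom(16)
    p = m + hashlib.sha256(m).digest()
    return iv, xor(p, keystream(key, iv, len(p)))

def open_(key, iv, c):
    """Decrypt, then check the trailing hash; raises ValueError on mismatch."""
    p = xor(c, keystream(key, iv, len(c)))
    m, tag = p[:-32], p[-32:]
    if hashlib.sha256(m).digest() != tag:
        raise ValueError("hash check failed")
    return m

key = os.urandom(32)                          # shared by sender and receiver only
m0 = b"Pay Bob 10 dollars. Regards, Alice"    # plaintext known to Mallory
iv, c0 = seal(key, m0)                        # intercepted cryptogram IV || C0

# Mallory's side: no key needed.
m = b"Pay Mallory 9999 dollars"               # must not be longer than m0
assert len(m) <= len(m0)
k = xor(c0, m0 + hashlib.sha256(m0).digest())     # recovered keystream K
forged = xor(m + hashlib.sha256(m).digest(), k)   # xor() truncates K as needed

recovered = open_(key, iv, forged)            # receiver accepts the forgery
print(recovered)
```

The receiver's hash check passes because Mallory can compute $H(m)$ himself; nothing in the construction depends on a secret he lacks once $K$ is known.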

Another example: assume $\operatorname{Encrypt}$ is AES in CBC, CFB, or OFB mode with random $IV$; and $H$ is SHA-256. Mallory wants to forge an authenticator for some message $m$ of his choice, with size $s$ blocks of 16 bytes (the block size of AES).

Mallory computes $m_1=m\|H(m)$, of $s+2$ blocks.
Mallory manages to insert $m_1$ into some message $m_0\|m_1\|m_2$ sent by a legitimate holder of the AES key, who chooses $m_0$ and $m_2$, with $m_0$ non-empty and of a size, known to Mallory, that is a multiple of the block size.
Mallory intercepts the cryptogram $IV\|C$.
From the start of $C$, Mallory removes one block fewer than there are blocks in $m_0$; keeps the next block, forming $\widetilde{IV}$; then the next $s+2$ blocks, forming $\widetilde{C}$.
Mallory replaces the cryptogram with $\widetilde{IV}\|\widetilde{C}$, which will pass the authentication check of the receiver, and decipher as $m$.

That scenario is not far-fetched: if Mallory is in a position to choose some file in a CD-ROM image, like a movie he pretends will be a promotion, then he can make a fake enciphered-and-authenticated CD-ROM image, which will appear as an authentic CD-ROM image and decipher to something Mallory chooses arbitrarily.

If $m_0$ can be empty (which is less realistic), an even simpler attack works: $\widetilde{IV}$ is $IV$, $\widetilde{C}$ is the first $s+2$ blocks of $C$.
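The CBC variant of the cut-and-paste attack can be sketched as follows. Two assumptions are made so that the example runs with the standard library only: the block cipher is a toy 4-round Feistel permutation on 16-byte blocks standing in for AES (`enc_block`/`dec_block` are hypothetical helpers, not a real cipher), and all messages are block-aligned so that no padding is involved. The attack uses only the CBC chaining structure, not any property of the block cipher.

```python
import hashlib
import os

BLOCK = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key, rnd, half):
    # Feistel round function (assumption): truncated SHA-256
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]

def enc_block(key, block):
    """Toy 16-byte block permutation standing in for AES."""
    l, r = block[:8], block[8:]
    for rnd in range(4):
        l, r = r, xor(l, _f(key, rnd, r))
    return l + r

def dec_block(key, block):
    l, r = block[:8], block[8:]
    for rnd in reversed(range(4)):
        l, r = xor(r, _f(key, rnd, l)), l
    return l + r

def cbc_encrypt(key, iv, p):
    assert len(p) % BLOCK == 0          # no padding in this sketch
    prev, out = iv, []
    for i in range(0, len(p), BLOCK):
        prev = enc_block(key, xor(p[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(key, iv, c):
    prev, out = iv, []
    for i in range(0, len(c), BLOCK):
        blk = c[i:i + BLOCK]
        out.append(xor(dec_block(key, blk), prev))
        prev = blk
    return b"".join(out)

def seal(key, m):
    iv = os.urandom(BLOCK)
    return iv, cbc_encrypt(key, iv, m + hashlib.sha256(m).digest())

def open_(key, iv, c):
    p = cbc_decrypt(key, iv, c)
    m, tag = p[:-32], p[-32:]
    if hashlib.sha256(m).digest() != tag:
        raise ValueError("hash check failed")
    return m

key = os.urandom(32)

# Mallory prepares m1 = m || H(m), with m of s = 2 blocks.
m = b"Mallory owns this disc".ljust(2 * BLOCK, b".")
m1 = m + hashlib.sha256(m).digest()

# The legitimate sender embeds m1 between its own m0 and m2.
m0, m2 = b"A" * (3 * BLOCK), b"B" * BLOCK
iv, c = seal(key, m0 + m1 + m2)

# Mallory cuts the ciphertext; only the block count of m0 is needed.
k = len(m0) // BLOCK
iv_forged = c[(k - 1) * BLOCK:k * BLOCK]       # last ciphertext block of m0
c_forged = c[k * BLOCK:k * BLOCK + len(m1)]    # the s + 2 blocks covering m1

recovered = open_(key, iv_forged, c_forged)
print(recovered)
```

The forged cryptogram decrypts correctly because each CBC plaintext block depends only on its own ciphertext block and the one before it, so any run of consecutive ciphertext blocks is itself a valid CBC ciphertext with the preceding block as IV.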

The title of the question asks if $\operatorname{Encrypt}(m\|H(m))$ is a secure MAC, which to a Vulcan means that $\operatorname{Encrypt}(m\|H(m))$ is appended to $m$ sent in clear. This does not provide authentication either, and succumbs to trivial variants of the above.

otus · Accepted Answer · 2014-05-28 06:08:25Z

Two things are going on that together may make plain-hash-then-encrypt insecure.

First, there is the distinction between secure MACs and hashes: a hash function may allow you to derive $H(m')$ from $H(m)$ even if you only know how $m'$ and $m$ differ. Length extension attacks on SHA-1 and SHA-2 are a practical way that can happen, but there could be others if the hash function doesn't specifically guarantee that there aren't.

Second, stream ciphers allow you to make deterministic changes to the plaintext. Specifically, you can flip any bits you like in the ciphertext and they'll be flipped in the decrypted plaintext as well (assuming the cipher uses XOR as most do).

Put those together and it could be possible to e.g. flip the last message bit, determine which bits that would flip in the hash, then flip those.
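A minimal sketch of that combination follows. It deliberately uses CRC-32 in place of $H$: CRC-32 is not a cryptographic hash, but it is an extreme and well-known case of a checksum whose value for $m'$ can be derived from the difference between $m$ and $m'$ alone, which makes the mechanism easy to see. The random `key_stream` is an assumption standing in for the output of a stream cipher.

```python
import os
import zlib

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def crc(data):
    return zlib.crc32(data).to_bytes(4, "big")

key_stream = os.urandom(64)          # secret; stands in for a stream cipher

m = b"transfer 0100 to account 42"
c = xor(m + crc(m), key_stream)      # Encrypt(m || H(m)) with H = CRC-32

# Mallory knows only the length of m and which bits he wants to flip.
delta = bytearray(len(m))
delta[9] = ord("0") ^ ord("9")       # turn the first digit into a 9
delta = bytes(delta)

# CRC-32 is affine over GF(2): for equal-length inputs,
# crc(m ^ d) = crc(m) ^ crc(d) ^ crc(all-zero string)
crc_delta = xor(crc(delta), crc(bytes(len(m))))
forged = xor(c, delta + crc_delta)   # flip message bits and matching checksum bits

# Receiver side
p = xor(forged, key_stream)
m_new, tag = p[:-4], p[-4:]
print(m_new, tag == crc(m_new))
```

With a hash such as SHA-256 no such simple relation is known, so this exact attack does not carry over; the point is only that an unkeyed hash carries no guarantee against relations of this kind, whereas a MAC does.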

Additionally, even if you chose a secure MAC as $H$, having the MAC inside the encryption means you need to decrypt and then compute the MAC before deciding whether the ciphertext has been corrupted, increasing retransmission latency on a noisy channel compared to encrypt-then-MAC.

Community · Accepted Answer · 2017-04-13 12:48:22Z

…something like HMAC is used rather than a plain hash.

HMAC is a keyed hash… which means it additionally provides unforgeability.

A “plain hash” (which I assume to include cryptographic hashes) merely provides collision resistance, while an HMAC provides both collision resistance and unforgeability… because an attacker is unable to calculate a new, valid HMAC of a modified/forged message ($m$), unless that attacker also has the needed key to produce a new, valid, and verifiable HMAC.

If you would use a (let’s just call it) “unkeyed” hash, an attacker would be able to modify/change the message ($m$) if that attacker would have (for example) guessed or intercepted your encryption key, or if that attacker would be able to break your encryption in any way. The problem (better: security issue) in this case is that you would potentially be accepting and decrypting ciphertext from the attacker instead of the expected sender, and you would never learn about the fact that the message $m$ has been messed with. Your simple hash would be useless… as all it provides is collision resistance and that’s it.

When you use an HMAC, that same attacker would additionally have to successfully guess/intercept the key for your HMAC (that is, unless you use the same key for both the encryption and the HMAC… which certainly isn’t the smartest idea because that would make it easier for an attacker to forge-it-all). So, the use of a keyed hash does not only provide collision resistance like an unkeyed hash, it also lets you verify if $m$ is authentic (unforgeability).

Keeping it short and simple: creating a “simple hash” of some plaintext is a piece of cake for an attacker; but trying to create a valid and verifiable “keyed hash” is a completely different and more complicated beast. So, compared to the use of a regular hash function, HMAC practically adds an additional layer of security. That is one of the main reasons to prefer HMACs over “unkeyed” hashes.
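The difference between the two can be seen in a short sketch using Python's standard `hmac` and `hashlib` modules. The key, the messages and the `verify` helper are illustrative assumptions, not part of any particular protocol.

```python
import hashlib
import hmac
import os

mac_key = os.urandom(32)             # secret shared by sender and receiver
m = b"pay Bob 10 dollars"
tag = hmac.new(mac_key, m, hashlib.sha256).digest()

def verify(key, msg, candidate_tag):
    """Recompute the HMAC and compare in constant time."""
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, candidate_tag)

forged_msg = b"pay Mallory 9999 dollars"

# An unkeyed hash of the forged message needs no secret at all:
plain_hash = hashlib.sha256(forged_msg).digest()

# Without mac_key, the attacker can only guess a key (or a tag):
guessed_tag = hmac.new(os.urandom(32), forged_msg, hashlib.sha256).digest()

ok_genuine = verify(mac_key, m, tag)
ok_forged = verify(mac_key, forged_msg, guessed_tag)
ok_plain = verify(mac_key, forged_msg, plain_hash)
print(ok_genuine, ok_forged, ok_plain)
```

Anyone can produce `plain_hash`, so a receiver that checks only an unkeyed hash learns nothing about who produced the message; a valid HMAC tag requires knowledge of `mac_key`.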

Note that I’m not saying your Encrypt(m | H(m)) method would be insecure (that mainly depends on the encryption… better: the used algorithm and key choice). All I’m explaining is why it’s logical to prefer a MAC over a simple hash, and that a simple hash would give you nothing if your encryption fails/breaks.

…does this not fulfill an approximation of the ideal MAC?

No.

…no secure protocol seems to use it. Why?

If you can get an additional, well-vetted security layer for free, why would you want to avoid or ignore it?

You once said “HMAC seems a bit complicated”, but if you look at it you will notice that it also gives you a good chunk of “unforgeability” in return. See, sometimes, things are “a bit complicated” for a good reason… and people prefer to use it for exactly that reason. In this case, to gain unforgeability.

e-sushi · Accepted Answer · 2014-05-31 17:33:09Z

Why?

We want to be able to build off of more basic encryption schemes.

Suppose we use some stream cipher; then why can't we use Encrypt(m|H(m)) as the MAC-then-encrypted version of the message?

For "traditional" non-authenticated encryption schemes, encrypting plaintexts with the same length always produces ciphertexts with the same length, and for every message $m_0$ one can easily find a string $p$ such that, for all messages $m_1$, truncating $Encrypt(m_0|H(m_0)|p|m_1|H(m_0|H(m_0)|p|m_1))$ to the appropriate length will produce an encryption of $m_0|H(m_0)$.
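As a rough illustration of the truncation argument, the sketch below assumes a stream cipher, for which $p$ can be taken to be empty. The keystream generator is a toy (SHA-256 in counter mode) and `seal`/`open_` are hypothetical helpers; the adversary is assumed to get one chosen message encrypted by the key holder.

```python
import hashlib
import os

def keystream(key, iv, n):
    """Toy keystream (assumption): SHA-256 in counter mode."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def seal(key, m):
    """Encrypt(m || H(m)) under a fresh random IV; returns (IV, C)."""
    iv = os.urandom(16)
    p = m + hashlib.sha256(m).digest()
    return iv, xor(p, keystream(key, iv, len(p)))

def open_(key, iv, c):
    p = xor(c, keystream(key, iv, len(c)))
    m, tag = p[:-32], p[-32:]
    if hashlib.sha256(m).digest() != tag:
        raise ValueError("hash check failed")
    return m

key = os.urandom(32)

m0 = b"grant admin rights to mallory"            # what the adversary wants accepted
m1 = b"...rest of a harmless-looking upload"     # arbitrary trailing data
chosen = m0 + hashlib.sha256(m0).digest() + m1   # m0 | H(m0) | p | m1, with p empty

iv, c = seal(key, chosen)                        # key holder encrypts the chosen message
truncated = c[:len(m0) + 32]                     # keep only the part covering m0 | H(m0)

recovered = open_(key, iv, truncated)
print(recovered)
```

For a block cipher mode with padding, $p$ would instead be chosen so that $m_0|H(m_0)|p$ matches what the scheme would have produced when padding $m_0|H(m_0)$ on its own.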

Assuming no bad relations between $Encrypt$ and $H$, what are the possible weaknesses?

The possible weakness is an adversary's ability to choose a "ciphertext" that will decrypt to a highly non-random "plaintext".

It seems that in this case we are encrypting the hash with part of a secretly keyed stream cipher keystream, and the hash is secure; does this not fulfil an approximation of the ideal MAC?

Correct; as I explained earlier in this answer, that does not fulfill an approximation of the ideal MAC.

This must be somehow bad, since no secure protocol seems to use it. Why?

It only works when the underlying encryption scheme already has some resistance to active attacks.
