
I am new to cryptography and I know there are better ways to test a cipher's strength, but in this case I am trying to measure the Hamming distance and avalanche effect of several cryptographic algorithms (AES, XChaCha20, Twofish) for a school assignment/paper. Basically, I encrypt 50 files containing almost the same content, each differing from the original by a single byte, and then calculate the Hamming distance and avalanche effect between the resulting ciphertexts.

My question is: is there an established baseline for how much Hamming distance a cryptographic algorithm should produce? I can't find anything from NIST, so I'm asking here in the hope of finding an established standard that I can reference in my assignment.
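
For concreteness, this is roughly how I compute the distances (a minimal Python sketch; the file names are placeholders for the generated ciphertexts):

```python
# Minimal sketch of the comparison described above: read two equal-length
# ciphertext files and count the differing bits (Hamming distance).
# The file names are placeholders for the 50 generated ciphertexts.

def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions in which the two byte strings differ."""
    assert len(a) == len(b), "ciphertexts must have equal length"
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def avalanche_ratio(a: bytes, b: bytes) -> float:
    """Fraction of bits that changed; ~0.5 is the ideal for a strong cipher."""
    return hamming_distance(a, b) / (8 * len(a))

if __name__ == "__main__":
    with open("ciphertext_original.bin", "rb") as f:
        ct0 = f.read()
    with open("ciphertext_modified_01.bin", "rb") as f:
        ct1 = f.read()
    print(hamming_distance(ct0, ct1), f"{avalanche_ratio(ct0, ct1):.3f}")
```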

These are the results that I have (histograms of the measured Hamming distances, one image per algorithm): AES, Twofish, XChaCha20-Poly1305, and an AES + XChaCha20-Poly1305 hybrid.

  • Testing the avalanche effect is usually performed on the block cipher itself. One takes many plaintext blocks, flips each bit position in turn, and counts how many output bits change; in the end you expect to see a normal distribution. Select random plaintexts, run the single-bit-flip test, and calculate the distances. After that, move on to double-bit flips, and so on...
    – kelalaka
    Commented Oct 23, 2023 at 19:21
  • Well, in this case I have a plaintext, e.g. "apple", and I generate 50 files from the original plaintext, each with a single-byte modification such as "!pple", "upple", "aiple". Then I encrypt them and compare the ciphertexts to get the Hamming distance. Is this an acceptable way to visualize the avalanche effect?
    – Jake
    Commented Oct 23, 2023 at 20:53
  • Start with bit manipulations, not bytes, and don't limit yourself to files. Generate a random plaintext of size 128 bits, perform the 128 different single-bit manipulations of the original, and execute this millions and millions of times (see the sketch after these comments). That covers one-bit changes; after that you can move on to two-bit changes, though generating them all will be a problem...
    – kelalaka
    Commented Oct 23, 2023 at 21:23
  • AES is not a generic cipher, we've been over that; you need a mode of operation. If a cipher is sufficiently randomized, then what would you expect if you compare two (identically sized) ciphertexts with regard to the Hamming distance? As with testing for "entropy" or randomness, I don't think you can show the avalanche effect by just looking at the output of a cipher; you'd need to look inside the (block) cipher to see what is happening (but I'm happy to be corrected on that).
    – Maarten Bodewes
    Commented Oct 24, 2023 at 9:13
  • @MaartenBodewes It is the first test (for the avalanche) to check that one did not miss something in the design: you might see significantly less or more than 50% of the individual output bits change per bit flip of the plaintext. Modes of operation are not considered here; they are already proven/shown secure under the PRP/PRF assumption. We are looking at how good the cipher itself is as a primitive...
    – kelalaka
    Commented Oct 24, 2023 at 9:25
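
A minimal sketch of the single-bit-flip test described in the comments above, assuming the pycryptodome package for raw single-block AES (the key and trial count are illustrative choices):

```python
# Sketch of the single-bit-flip avalanche test from the comments: take random
# 128-bit plaintexts, flip each of the 128 bit positions in turn, encrypt one
# raw AES block (no mode of operation), and record how many ciphertext bits
# change. Assumes the pycryptodome package (pip install pycryptodome).
import os
from Crypto.Cipher import AES

def bit_flip(block: bytes, i: int) -> bytes:
    out = bytearray(block)
    out[i // 8] ^= 1 << (i % 8)
    return bytes(out)

def hamming(a: bytes, b: bytes) -> int:
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

key = os.urandom(16)                 # one fixed random key for the whole run
aes = AES.new(key, AES.MODE_ECB)     # raw block cipher on single 16-byte blocks

distances = []
for _ in range(10_000):              # illustrative trial count, not "millions"
    pt = os.urandom(16)
    ct = aes.encrypt(pt)
    for i in range(128):             # every single-bit flip of this plaintext
        distances.append(hamming(ct, aes.encrypt(bit_flip(pt, i))))

print(f"mean changed bits: {sum(distances) / len(distances):.2f} (ideal: 64 of 128)")
```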

2 Answers


Yes there is 😃

Hamming distance and the behaviour of the avalanche effect are closely related.

The Hamming distance $d(a,b)$ between two equal-length strings $a$ and $b$ is the number of positions in which they differ, i.e. the Hamming weight of $a \oplus b$.

The avalanche effect is the number of output bits that flip when one input bit is changed in a cryptographic primitive offering pseudorandom behaviour (a permutation or a function). We expect 50% of them to change on average, and there will be variance due to the pesky nature of randomness. We expect approximately a normal distribution:

$$ \mathcal{N}(0.5w ,0.25w) $$

where $w$ is the bit width of the block function (the second parameter being the variance). I suggest that if, after a few tests, the standardised statistic $z = \frac{d - 0.5w}{\sqrt{0.25w}}$ falls below $-1.96$ or above $1.96$, the function is broken. I've chosen $\alpha = 0.05$ somewhat arbitrarily, but trying to fall on the right side of security.
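
For example, a quick way to standardise one measured distance (a minimal sketch; the numbers in the example are illustrative only):

```python
# Standardise a measured Hamming distance d for a block of width w bits and
# compare it against the two-sided 5% critical value of the normal approximation.
import math

def z_score(d: int, w: int) -> float:
    return (d - 0.5 * w) / math.sqrt(0.25 * w)

w, d = 128, 70                 # e.g. AES block width and one measured distance
z = z_score(d, w)
print(f"z = {z:.2f} ->", "suspicious" if abs(z) > 1.96 else "consistent with random")
```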

Conclusion: your testing failed. Your testing, not the primitives. E.g. the top image is for AES, which has a block width of 128 bits; therefore we expect a mean of 64 bits to change, with a standard deviation of $\sqrt{0.25 \times 128} \approx 5.7$ bits.

  • This is very interesting, thank you so much for your input on the matter. By the way, I'm using AES-256 in this example; does that mean we should expect a mean of 128 bits to change? Would that have any effect on the standard deviation?
    – Jake
    Commented Oct 29, 2023 at 18:49
  • @Jake Ah, AES-256 means that the key is 256 bits. The input/output block size is always 128 bits.
    – Paul Uszak
    Commented Oct 30, 2023 at 0:58

An encryption algorithm is meant to model a pseudorandom permutation, and I think at its heart your experiment is about trying to determine how well AES does this. It sounds like you are analyzing the behavior of something like $E(x) \oplus E(x \oplus c_i)$, where $c_i$ is some low-weight constant, and your test is to look at the weight of the result. If the algorithm $E$ does act like a pseudorandom function, then $E(x) \oplus E(x \oplus c_i)$ should be indistinguishable from random. The Hamming weight should then follow a simple binomial distribution.
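
As a rough illustration of that experiment at the block level (a sketch assuming the pycryptodome package; the key, the constant $c_i$, and the trial count are illustrative):

```python
# Sketch of the described experiment: fix a low-weight constant c, draw random
# 128-bit blocks x, and record the Hamming weight of E(x) XOR E(x XOR c).
# Assumes the pycryptodome package (pip install pycryptodome); key and trial
# count are arbitrary choices for illustration.
import os
from Crypto.Cipher import AES

key = os.urandom(16)
E = AES.new(key, AES.MODE_ECB).encrypt   # raw 128-bit block permutation, no mode

c = (1).to_bytes(16, "big")              # low-weight constant: a single set bit

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def weight(b: bytes) -> int:
    return sum(bin(x).count("1") for x in b)

weights = []
for _ in range(10_000):
    x = os.urandom(16)
    weights.append(weight(xor(E(x), E(xor(x, c)))))

# If E behaves like a random permutation, each weight should look like a draw
# from Binomial(128, 1/2), so the average should be close to 64.
print(sum(weights) / len(weights))
```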

For you, I think the big questions are: what statistical tests will you use to assess if your data matches the expected distribution? And how much data do you need to collect to be able to draw any conclusions?
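
One simple option, as a sketch (it assumes SciPy ≥ 1.7 for `binomtest` and the `weights` list from the sketch above): pool all observed bit flips and test whether each output bit flips with probability 1/2.

```python
# Pool every observed output-bit flip and run a two-sided binomial test of
# H0: "each output bit flips with probability 1/2".
# Assumes scipy >= 1.7 (scipy.stats.binomtest) and the `weights` list above.
from scipy.stats import binomtest

def pooled_flip_test(weights, w=128):
    flips = sum(weights)              # total output bits that changed
    total = w * len(weights)          # total output bits observed
    return binomtest(flips, total, p=0.5).pvalue

# Example usage: a small p-value suggests the flip rate deviates from 50%.
# print(pooled_flip_test(weights))
```

Note that this only checks the overall flip rate; a goodness-of-fit test (e.g. chi-squared against the binomial) would be needed to check the shape of the distribution as well.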

Also note that if you are talking about strings and bytes you may already be operating at too high a level. The obvious place to start is with the all-zeroes input and single-bit differences. Presumably this experiment uses a single fixed key; will that be all zeroes also?

  • Err, why would the key be all zeros for avalanche effect measurement? Is that a weak AES key?
    – Paul Uszak
    Commented Oct 28, 2023 at 22:23
  • Yes, you are correct, I am using the same key with each file, although I just generated a random 256-bit key and used that same randomly generated key over and over. Can you please expand on what you meant by "operating at too high a level"?
    – Jake
    Commented Oct 29, 2023 at 18:45
  • @PaulUszak There are no known weak keys, and there is no reason to expect the behavior to depend on the key. So you might as well choose one that is easy to remember, enter, and communicate.
    – bmm6o
    Commented Oct 30, 2023 at 14:48
  • @Jake Maybe I'm splitting hairs, but the implementation of AES that you are using will likely accept and return byte arrays or similar. If you start with string inputs, you have to worry about character encodings or how your language/runtime represents strings as bytes. This is something that confuses many novice programmers, and so it might be beneficial to skip right to bytes.
    – bmm6o
    Commented Oct 30, 2023 at 14:55
  • Also, if you are going to be flipping individual bits of your inputs, you'll be operating on bytes and not characters. So you might as well frame the problem in terms of bytes and not strings.
    – bmm6o
    Commented Oct 30, 2023 at 21:00
