35
$\begingroup$

I'm not even sure if they are serious, but I've heard many times that some people not only refuse to trust their computer to generate a random string (which is understandable) but also don't trust themselves to do it. So, instead of simply typing/writing down:

fcC540dfdK45xslLDdfd7dDL92

And then randomly changing a few of those to other ones a few seconds later, such as one in the beginning, one in the middle and one in the end, they use dice which they roll again and again to generate random numbers, which are then treated as "truly random" and thus "truly secure".

Why would a dice roll be "more random" than simply coming up with a sequence in your head, and then changing some of them?

I simply don't believe that this could possibly be "not secure". Why the need to do the very tedious dice rolling? It feels like a ritual that they go through, based not on logic and reason, but on some sort of delusion that their brain is going to generate the same sequence as others who guess it, or a computer, even though they also change some of them after the phrase is "done".

I don't understand this. Is there anything that speaks for this practice as being "more random"/"more secure"?

$\endgroup$
23
  • 109
    $\begingroup$ I don't see any characters from the top row of a QWERTY keyboard in your random string. $\endgroup$ Commented Feb 4, 2021 at 3:32
  • 65
    $\begingroup$ Why do you think you're good at generating random numbers? $\endgroup$ Commented Feb 4, 2021 at 11:24
  • 40
    $\begingroup$ "I simply don't believe that this could possibly be". Yes, this kind of total confidence is required to maintain a position against a mountain of evidence: people's password choices, their performance in games like Man vs Machine below, tax fraud detected by number frequencies, and generally abiding beliefs in properties of random sequences that just aren't true - I'm sure you've met a person who sees a pattern in one and thinks it can't possibly have been output by a random generator, or who sincerely believes that rolling a few low numbers will make a high one more likely the next time... $\endgroup$ Commented Feb 4, 2021 at 16:28
  • 26
    $\begingroup$ @LeifWillerts brings up a good point. Humans are exceptionally good at seeing patterns. So good, they often see patterns even where none exists. That bias makes them bad at making random numbers as in the attempt at being "random", they will often start making choices to avoid the appearance of a pattern. Those choices actually reduce the randomness. $\endgroup$
    – Seth R
    Commented Feb 4, 2021 at 16:40
  • 35
    $\begingroup$ Input fcC540dfdK45xslLDdfd7dDL92 into Keyboard Heatmap. $\endgroup$
    – user76284
    Commented Feb 5, 2021 at 0:31

8 Answers

116
$\begingroup$

In short, it is more than a belief: there is strong evidence that humans are not good entropy sources. There is a test for this:

Try to win!

So we rely neither on numbers made up in the mind nor on random-seeming keyboard typing and mouse movements, which to an outsider look like a monkey playing with the computer. We rely on good entropy sources such as /dev/urandom, which are the product of solid research.
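The idea behind such a prediction game can be sketched in a few lines. This is only an illustrative guess at how a predictor might work, not the linked game's actual algorithm: it counts which bit has followed each recent context and guesses the more frequent one. The names `predict_bits` and `order` are made up for this example.

```python
from collections import defaultdict


def predict_bits(bits, order=3):
    """Return how many bits a simple n-gram predictor guesses correctly.

    Hypothetical sketch: for each context (the previous `order` bits) it
    remembers how often a 0 or a 1 followed, and guesses the more frequent one.
    """
    counts = defaultdict(lambda: [0, 0])  # context -> [times 0 followed, times 1 followed]
    correct = 0
    for i, bit in enumerate(bits):
        context = tuple(bits[max(0, i - order):i])
        seen = counts[context]
        guess = 1 if seen[1] > seen[0] else 0  # ties default to guessing 0
        if guess == bit:
            correct += 1
        seen[bit] += 1  # learn from the bit that was just revealed
    return correct


# Someone who tries to "look random" by strictly alternating is almost fully predictable.
alternating = [0, 1] * 50
print(predict_bits(alternating), "of", len(alternating), "guessed correctly")
```

A truly random input should leave a predictor like this hovering around 50%; a score far from 50% in either direction suggests structure in the input.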


$\endgroup$
16
  • 19
    $\begingroup$ Interestingly, entirely conscious decisions (not pounding the keys) of whether to use 0 or 1 still gave me an even 50% score (90-90) after 227 moves. It seems the main reason that any human would be considered a bad source of entropy is the strong tendency toward always alternating input so that it "appears more random." With the test, consciously tending toward using longish strings of consecutive bits will bring the bias back to "just about random." There is likely still some bias, but it is greatly decreased. $\endgroup$
    – owacoder
    Commented Feb 4, 2021 at 17:59
  • 34
    $\begingroup$ @Clockwork Interestingly, winning the game by a large margin is exactly as indicative of non-randomness as losing by a large margin. You could imagine setting up a second machine that always guesses the opposite of the first machine, and this machine would beat you 80-20. A machine that's always wrong is actually as predictive as one that's always right. $\endgroup$ Commented Feb 4, 2021 at 19:47
  • 2
    $\begingroup$ Minor nitpick: if there's evidence, then it's very likely to also be a belief. A justified belief, rather than "just a" belief. $\endgroup$ Commented Feb 5, 2021 at 21:07
  • 2
    $\begingroup$ @TobySpeight maybe I should have said the evidence is strong that we are not good entropy sources. Nice. $\endgroup$
    – kelalaka
    Commented Feb 5, 2021 at 21:12
  • 5
    $\begingroup$ Funny that challenge: I inserted 200 bits generated with random.org into that "game" and the pc won "56%". Allowing 'pass' gives it quite a hefty advantage even on truly random sequences. $\endgroup$
    – paul23
    Commented Feb 6, 2021 at 18:26
31
$\begingroup$

For me, the fraud-related applications of Benford's Law come to mind. When people make up data they tend to create overly uniform data, even when it's not appropriate. There's a definite psychology going on that may cause people to be less random than they are intending to be (Wikipedia links to a paper claiming humans are in fact bad at this). Or perhaps misconceptions about what randomness "looks like." In any case, knowledge of things like this may generate self-doubt about generating randomness. In fact, the very idea of explicitly changing some of the allegedly random data you just generated may seem error-prone to some, and potentially the root of any problems that could later arise.
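For readers who have not met Benford's Law: it predicts that the leading digit d of many naturally occurring figures appears with probability log10(1 + 1/d), so a 1 leads about 30% of the time. The sketch below compares that prediction with observed leading digits; the function names are invented for illustration, and the input is assumed to be positive integers.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_shares(values):
    """Observed share of each leading digit; assumes positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Powers of two are a classic sequence that follows Benford's Law closely.
observed = leading_digit_shares([2 ** n for n in range(100)])
expected = benford_expected()
for d in range(1, 10):
    print(d, round(observed[d], 3), round(expected[d], 3))
```

Made-up figures tend to spread their leading digits far more evenly than this, which is what fraud checks of this kind look for.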

Dice, on the other hand, people trust to be random despite any unconscious bias they may be introducing. By following the outcome of dice rolls people can be more certain that there is no "gotcha" that might make their data less random. They had no real input and therefore feel sufficiently removed from the generation of the data.

Perhaps people are different enough that no general analysis could be done to make a case for a reduction in apparent entropy in human-generated random data. But I think this is ultimately a risk assessment -- i.e. are you willing to bet whatever you're protecting with the password on the assumption that your attempt at random data is truly random?

All of that said, I question whether this matters much, provided enough data is being generated. For example, a human-generated 8-character password is probably fairly insecure no matter how good a job they did at making it "random." In contrast, a 32-character password is probably fairly secure if they were trying at all. In either case, the way the password is actually used and/or secured may well matter more to whether their account will ultimately get compromised.

Still, it would be frustrating, even embarrassing, to learn that your carefully generated "random" password was able to be guessed due to its human origin, or because other "random" strings you had previously generated were compromised. Eliminating all possibility of that scenario, no matter how unlikely, is undoubtedly attractive to some, if only for that reason.

$\endgroup$
8
  • 15
    $\begingroup$ +1 for "misconceptions about what randomness 'looks like.'" For example, in a string of random coin flips, people will think that getting 5 consecutive heads is less likely than it actually is, because it "doesn't look random". $\endgroup$ Commented Feb 4, 2021 at 16:28
  • 9
    $\begingroup$ @Graham : the birthday "paradox" is a failure to appreciate combinatorial explosion. Let's not mix up common misconceptions here, that just makes it harder to explain why each of them is wrong. $\endgroup$ Commented Feb 4, 2021 at 21:18
  • 3
    $\begingroup$ @LeifWillerts It's the same root of why we get randomness wrong though. We expect "random" to be "everything's different". So we underestimate the probability of random outcomes which look like a pattern, and for the same reason we overreach when we try to be random. $\endgroup$
    – Graham
    Commented Feb 4, 2021 at 22:13
  • 2
    $\begingroup$ @LeifWillerts More concretely than Graham’s response (which is accurate), humans are notoriously bad at telling simple correlation apart from actual dependency. The issue of randomness is a result of this applying in one way (we interpret ‘independent’ as ‘uncorrelated’), while issues such as the gambler’s fallacy are a result of the reverse application (it arises from interpreting correlation as dependence). Both directions are a result of how learning works (namely, learning is all based on pattern matching). $\endgroup$ Commented Feb 4, 2021 at 23:39
  • 1
    $\begingroup$ +1 for talking about uniform randomness. One of the biggest mistakes non-stats people make is thinking randomness only means drawn from a uniform distribution, probably because the most basic probability devices are things like coins and dice which generate discrete uniform distributions. $\endgroup$
    – eps
    Commented Feb 5, 2021 at 17:22
18
$\begingroup$

Why would a dice roll be "more random" than simply coming up with a sequence in your head, and then changing some of them?

Humans have too many biases about what a random sequence looks like. If you ask people to generate a random sequence, they will probably take care not to use the same character twice in a row, i.e., aa or bb, because they think that ab is more random than aa. They will also be biased by the language they use, where some combinations are more frequent than others. Humans easily, but wrongly, choose each value based on what they have generated before, so there is no true independence between values! Also note that many people attach meaning to certain numbers (7, 13, 666, etc.) and then avoid some of them! All of this is very well known, and many experiments exist to demonstrate it. You may think that rolling dice is not really random, but since there is no link between one roll and the next, the rolls are truly independent (at least, nobody can control any dependencies).
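To make the dice approach concrete, here is a minimal sketch of one way to turn rolls into characters without introducing bias. It assumes a fair six-sided die and a 36-symbol alphabet (a-z plus 0-9), so that each pair of rolls selects exactly one symbol; the name `dice_to_password` and the alphabet are choices made for this example, not an established scheme.

```python
import string

# 26 letters + 10 digits = 36 symbols = 6 * 6 outcomes of two dice rolls.
ALPHABET = string.ascii_lowercase + string.digits


def dice_to_password(rolls):
    """Map pairs of d6 rolls (values 1..6) to characters, one pair per character."""
    if len(rolls) % 2 != 0:
        raise ValueError("need an even number of rolls")
    if any(r not in (1, 2, 3, 4, 5, 6) for r in rolls):
        raise ValueError("each roll must be between 1 and 6")
    chars = []
    for first, second in zip(rolls[0::2], rolls[1::2]):
        index = (first - 1) * 6 + (second - 1)  # uniform over 0..35
        chars.append(ALPHABET[index])
    return "".join(chars)


# Example with rolls typed in by hand after throwing a physical die.
print(dice_to_password([1, 2, 6, 6, 3, 4]))
```

Because 36 outcomes map onto 36 symbols one-to-one, no outcome is favoured, and each character carries log2(36), or about 5.17, bits of entropy if the die is fair.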

I simply don't believe that this could possibly be "not secure". Why the need to do the very tedious dice rolling? It feels like a ritual that they go through, based not on logic and reason, but on some sort of delusion that their brain is going to generate the same sequence as others who guess it, or a computer, even though they also change some of them after the phrase is "done".

Alas, disbelief is not sufficient. There are many scientific results on the subject, and generating a "secure" random sequence is not easy. Even a small bias may be dangerous and exploitable. The human mind has real difficulty grasping what randomness is.

I don't understand this. Is there anything that speaks for this practice as being "more random"/"more secure"?

There is true physical randomness, for example, rolling dice. Of course, this can't be used in computer systems because the throughput would not be sufficient. Truly random sequences can be generated from radioactive decay, but that is not easy to integrate into a computer. So modern random sequences are generated by a mix of pseudo-random generators and physical events. Pseudo-random generators are algorithms, so they can't produce true randomness, only something very close to it. Mixing their output with true randomness then gives even more security.

$\endgroup$
1
  • 2
    $\begingroup$ Your first point is very true. According to official sources, 11 people died of COVID-19 on 2020-08-13 in Moscow. The next day, another 11 died. Since then, the figure has never once been the same on two consecutive days. Chart $\endgroup$ Commented Feb 5, 2021 at 14:16
15
$\begingroup$

Randomness is a measurable, statistical property of a set of values. It doesn't mean the same as "hard for a human to guess."

Your sample string is hard for a human to guess, but it isn't very random.

There is a tool called "ent" for most Unix systems that can quantify the randomness, by some measures, of a file.

Available here: https://www.fourmilab.ch/random/

Your string was 26 characters long (27 bytes in the test file, counting the trailing newline), all ASCII, and limited to the set [a-zA-Z0-9]. Let's compare your string to a string of the same length from /dev/urandom limited to that same range, using "ent".

Your string: fcC540dfdK23xslLDdfd7dDL92

Here are the results from "ent".

$ ent test1.txt

Entropy = 3.926572 bits per byte.

Optimum compression would reduce the size of this 27 byte file by 50 percent.

Chi-square distribution for 27 samples is 532.41, and randomly would exceed this value less than 0.01 percent of the times.

The arithmetic mean value of data bytes is 77.9259 (127.5 = random).

Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).

Serial correlation coefficient is 0.271042 (totally uncorrelated = 0.0).

Same-length string from /dev/urandom: Q9HpOpJrS3yYKlLc71yq003IMR

Here are the results from "ent".

$ ent test2.txt

Entropy = 4.458591 bits per byte.

Optimum compression would reduce the size of this 27 byte file by 44 percent.

Chi-square distribution for 27 samples is 304.85, and randomly would exceed this value 1.76 percent of the times.

The arithmetic mean value of data bytes is 78.8889 (127.5 = random).

Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).

Serial correlation coefficient is -0.024251 (totally uncorrelated = 0.0).


The program was easily able to quantify how much less "random" (in the statistical sense) your string was.
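The "Entropy" figure in that output is the Shannon entropy of the byte frequencies. As a rough sketch of that calculation (the function name is made up here, and this mirrors only the first line of the report, not the other tests), it can be computed like this:

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value frequencies, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# The two test strings, each assumed to be stored with a trailing newline (27 bytes).
human = b"fcC540dfdK23xslLDdfd7dDL92\n"
machine = b"Q9HpOpJrS3yYKlLc71yq003IMR\n"
print(round(entropy_bits_per_byte(human), 6))
print(round(entropy_bits_per_byte(machine), 6))
```

Keep in mind that a 27-byte sample can score at most log2(27), roughly 4.75 bits per byte, under this measure, so such short samples give only a coarse comparison.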

"People believe" we humans are bad at generating randomness because we are.

$\endgroup$
14
$\begingroup$

People are not that bad, but we're slow. See How were one-time pads and keys historically generated? In summary, megabytes of 100% secure key material were generated for one-time pads by people simply key-smashing on typewriters. Sufficient to win three world wars. It's just that a human's entropy rate is a little lower than that of a laser-phase-based TRNG.

fcC540dfdK23xslLDdfd7dDL92

is pretty much random. But do it again, and again and again. Randomness is a function of sample size, and the more you create by keyboard smashing, the more it becomes susceptible to frequency analysis.

That's not to say that raw irreducible information (entropy) isn't being generated, but it has to be uniformly distributed for use with cryptography. The uniformity aspect is the difficult bit. So try it. Write out 500 kB of 'randomness' and then run it through a program called ent. I can guarantee that your data will fail the test. And yes, comments below correctly highlight the speed issue.

That's not to say your typing wasn't random, but it won't have been random enough. Refer back to my linked answer, and read about randomness extraction which statistically reshapes biased randomness into useful cryptographic entropy.
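As one small illustration of randomness extraction, here is the classic von Neumann extractor. It is not necessarily the technique the linked answer describes, and it assumes the input bits are independent with a fixed bias, which human typing does not guarantee; treat it as a sketch of the principle only.

```python
def von_neumann_extract(bits):
    """Debias a bit sequence: read non-overlapping pairs, keep the first
    bit of each unequal pair (01 -> 0, 10 -> 1), discard 00 and 11."""
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out


# A heavily biased input still yields some output, at a much lower rate.
biased = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0]
print(von_neumann_extract(biased))
```

The price of removing the bias is throughput: many input bits are thrown away, which fits the point above that humans are slow rather than useless as an entropy source.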

$\endgroup$
10
  • 2
    $\begingroup$ Yes, just casually write out 500 kB. Estimating 5 characters per word, that is slightly shorter than Harry Potter and the Prisoner of Azkaban, at 107 253 words $\endgroup$
    – Suppen
    Commented Feb 4, 2021 at 15:15
  • 4
    $\begingroup$ @Suppen, right, it's going to get real tedious real fast. If a human were forced to do that, they would probably pretty quickly fall back on known, practiced patterns to make the task easier on them, even if they don't realize it. There's not going to be much entropy there. $\endgroup$
    – Seth R
    Commented Feb 4, 2021 at 17:19
  • 2
    $\begingroup$ @SethR et al: Guys, that's the reasoning behind my answer :-) But follow my OTP link. Megabytes of 100% secure key material were generated by people simply key-smashing on typewriters. Sufficient to win three world wars. It's just that a human's entropy rate is a bit lower than that of a laser-phase TRNG. $\endgroup$
    – Paul Uszak
    Commented Feb 4, 2021 at 18:02
  • 8
    $\begingroup$ [...] Sufficient to win three world wars -- I seem to have overslept World War Three, again. At least we won instead of them $\endgroup$ Commented Feb 5, 2021 at 17:33
  • 3
    $\begingroup$ No, people are bad. People not generating good enough randomness actually lost some world wars. (Well, that and other factors.) Some of the successes of Enigma cryptanalysis were due to process or operator errors, such as avoiding consecutive identical elements in sequences that should have been random, or using keyboard patterns instead of random input. $\endgroup$ Commented Feb 7, 2021 at 15:59
9
$\begingroup$

Evidence suggests that people asked to generate random data will produce repetition in the data substantially less often than random chance would.

For example, let's assume you were asked to generate random digits (i.e., just 0 through 9).

In purely random data, a sequence like NN (i.e., the same digit twice in a row) happens about 10% of the time. That is, given some arbitrary first digit, there's a one in ten chance that we'll randomly choose the same digit the next time.

But when people are producing (what they want to be) random digits, most people see this as something that's unlikely to happen by random chance, so what they produce will have substantially fewer instances of the same digit twice in a row than random chance would suggest.

Two digit runs are only the tip of the iceberg though. By the same logic, we see that runs of three identical digits should happen around 1% of the time. That is, given some arbitrary digit N, there's a one in ten chance that the next digit we select will also be N, and a one in ten chance that the third time, we'll select N again. 1/10 * 1/10 = 1/100 = 1%.

That continues with longer strings as well--4 digit runs should happen with a frequency of about 0.1%, 5 digit runs with a frequency of about 0.01%, and so on.

Testing indicates, however, that when people are asked to generate random numbers, they'll produce repeated strings like this considerably less often than random chance would. And the longer the string, the worse the disparity between human-generated and randomly-generated strings becomes, to the point that most people simply won't produce a run of the same digit (say) 4 or 5 times in a row, no matter how many random digits you ask them to produce. To most people, the chances of that happening randomly seem so remote that they simply never do it. The same happens with other things that seem like obvious patterns such as "1234" or "3210"--most people won't produce them nearly as often as they would occur by random chance.
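The expected rates above are easy to check by simulation. The following sketch counts how often a run of identical digits starts at a given position in machine-generated digits; `repeat_rate` is a name invented for this example, and Python's `random` module with a fixed seed is used only to make the simulation repeatable (it is not suitable for generating secrets).

```python
import random


def repeat_rate(digits, run_length):
    """Fraction of positions at which `run_length` identical digits begin."""
    windows = len(digits) - run_length + 1
    if windows <= 0:
        return 0.0
    hits = sum(
        1
        for i in range(windows)
        if len(set(digits[i:i + run_length])) == 1
    )
    return hits / windows


rng = random.Random(2021)  # fixed seed for repeatability; not for secrets
digits = [rng.randrange(10) for _ in range(200_000)]
for k in (2, 3, 4):
    print(k, repeat_rate(digits, k), "expected about", 10 ** -(k - 1))
```

Running the same function over a long sequence of digits typed "at random" by a person is a simple way to see the shortfall in repeats described above.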

$\endgroup$
5
  • 3
    $\begingroup$ People mailing stuff to my house have written the wrong house number more than once. It is 4321. They tend to write 4231, because "it just can't be 4321. The odds are too low." Hell, I wrote 4231 once myself. $\endgroup$
    – DKNguyen
    Commented Feb 4, 2021 at 22:58
  • $\begingroup$ According to official sources, 11 people died of COVID-19 on 2020-08-13 in Moscow. The next day, another 11 died. Since then, the figure has never once been the same on two consecutive days. Chart $\endgroup$ Commented Feb 5, 2021 at 14:20
  • $\begingroup$ @RomanOdaisky: Unless you believe that to be a random source, I don't quite see how you'd consider it relevant. If you do believe it's a random source...then I disagree, and think it's still not relevant. $\endgroup$ Commented Feb 5, 2021 at 23:08
  • 1
    $\begingroup$ It’s a nice real-life example of how lack of repetition can disprove the hypothesis that the data came from a random process. $\endgroup$ Commented Feb 5, 2021 at 23:11
  • $\begingroup$ @RomanOdaisky: I see what you're getting at. Yes, certainly indicates that the values aren't purely random. $\endgroup$ Commented Feb 5, 2021 at 23:17
0
$\begingroup$

I suppose the problem is not that a human would generate a biased random number. Computers also use biased random sources, but as long as there is entropy in them, they can be hashed into a shorter number that is random enough. However bad humans are, what they think of obviously has some entropy in it.
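Hashing a long, biased input down to a short value could look like the sketch below. SHA-256 and the 16-byte output length are choices made for this example, and `condense` is a made-up name. The important caveat is that the output cannot contain more entropy than the input did, so this only helps if the raw input really holds at least that much unpredictability, which is hard to verify for human input.

```python
import hashlib


def condense(raw_input: str, out_bytes: int = 16) -> str:
    """Hash a long, possibly biased input into a short hex string."""
    digest = hashlib.sha256(raw_input.encode("utf-8")).digest()
    return digest[:out_bytes].hex()


# Hypothetical usage: a long burst of keyboard mashing condensed to 128 bits.
print(condense("lkj3hr9 8fyq2ob;fh 0q2h3fp9 8ahsdf liu23hrp q98wefh"))
```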

The problem is that humans are bad at memorizing truly random numbers and don't have an internal hash mechanism (at least, not one that humans are known to be able to sense and make use of). If they hashed mechanically, it would take a lot of time and require memorizing even more numbers. Anyone who is lazy would just choose to use a computer. The people who remain are the ones who don't know how biased they are or how to make random numbers correctly. What they produce on average is what you would expect.

$\endgroup$
3
  • 2
    $\begingroup$ I don't understand how you are trying to answer OP with this. Remembering numbers is exactly why we fail at random, because we look at the previous numbers to pick something that "looks random". Dice have zero memory and zero hashing ability. $\endgroup$
    – pipe
    Commented Feb 5, 2021 at 5:55
  • $\begingroup$ @pipe I don't think it's because "we" want something that "looks random". Yes, many people do that, but that's because the people who know not to do that also realize how difficult it is to get everything right, and prefer the easier route of using computers. It's not that it's difficult to train people who know better to generate actual random numbers. It's just not worth it. $\endgroup$
    – user23013
    Commented Feb 5, 2021 at 6:34
  • 1
    $\begingroup$ @pipe (Technically, people with absolutely no training cannot speak a language and cannot understand the word "random", which shouldn't be used as evidence of how bad humans intrinsically are, only of how bad most people are. If the idea is actually about how most people are, I don't disagree with the common belief. But unlike the other answers, my opinion is that a human mind generally is a random source of usable quality. It only needs to be processed for use in cryptography.) $\endgroup$
    – user23013
    Commented Feb 5, 2021 at 6:39
0
$\begingroup$

It's mathematics and psychology. People tend to create patterns that aren't random even when they try not to.

Randomness isn't just any gibberish that doesn't mean anything, it's data NOT HAVING ANY PATTERN. Humans create patterns.

$\endgroup$
