Passwords - extended dictionary vs dictionary attacks

Question

Taking inspiration from Diceware and other passphrase generators, I took a dictionary of 20k English words and used a script to generate typos of them, resulting in 7M "words". That gives 22.7 bits of entropy per word compared to Diceware's 12.9. Five of those became part of my new master password (easier to remember than 9 words without typos, and only slightly weaker). The setup behind Tor bridges gave me an idea: take this to the extreme and build a passphrase generator with a dictionary so large that it's unbreakable... read on for the fine print.
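For anyone who wants to check the per-word figures, here is a minimal sketch. It assumes every word is drawn uniformly at random from its dictionary (as with dice rolls), and the function and constant names are mine, chosen for illustration.

```python
import math

def bits_per_word(dictionary_size: int) -> float:
    """Entropy in bits of one word picked uniformly at random (assumption)."""
    return math.log2(dictionary_size)

DICEWARE_SIZE = 6 ** 5       # 7,776 words: five dice rolls per word
TYPO_SIZE = 7_000_000        # the typo-expanded dictionary from the question

diceware_bits = bits_per_word(DICEWARE_SIZE)
typo_bits = bits_per_word(TYPO_SIZE)

print(f"Diceware word: {diceware_bits:.1f} bits")
print(f"Typo word:     {typo_bits:.1f} bits")
print(f"5 typo words:     {5 * typo_bits:.1f} bits")
print(f"9 Diceware words: {9 * diceware_bits:.1f} bits")
```

The totals only hold if the words really are chosen at random; hand-picking "memorable" typos would lower the effective entropy.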

Just to make the question obvious, I am asking how effective this would be in practice. If it's worth the effort I shall infect my friends and make them replace their correct horse battery staples with this.

The idea is that each user will take a copy of the already large dictionary of common English (or another language) words as their personal dictionary, and then add their own words - uncommon words, typos, names, words in other languages, short phrases, numbers, pseudowords, whatever they can think of (it's their own personal dictionary; if it's not memorable, it's their fault). Of course, individual users will have difficulty coming up with enough words to boost the entropy of their personal dictionary by any significant amount, which is why we would mix our dictionaries with our friends' to make them larger - they add some of our words to their dictionary and we add some of theirs to ours. There will be a spectrum of word rarity, with a large portion of words being in many users' dictionaries and a few words being unique, at least until you share them with other people.

At the end of the day, with all this mixing, we may have succeeded in adding a few bits of entropy to our own personal dictionary. This may have been equivalent to 1 extra Diceware word. But it gets a little better if we look at targeted attacks:

If they do know our personal dictionary, it still contains the standard dictionary, so it's at least as strong as an equivalent number of Diceware words, which is already secure*. Kerckhoffs's principle is still obeyed, and the setup can only get stronger over time as more words are added.
If they don't know our personal dictionary... If we have even a single rare word in our passphrase and it isn't included in the dictionary used by the attacker, they will never guess our password. To circumvent this, the attacker may snag other people's dictionaries in hopes of finding one that includes our rare words. They don't know which words are the rare words, so they have no way to filter the combined dictionary. This combined dictionary will also contain many more words that are not in our personal dictionary and will only slow down the dictionary attack. Guessing entropy is based on how hard the passphrase is to guess, so in this case it is based not on our personal dictionary but on the (estimated) size of the combined dictionary needed to crack it, and this combined dictionary would be much larger than our personal dictionary.

*But if it's already secure, then why add more security? We can get enough entropy to be secure while still using 6, 5, or even 4 (with a huge dictionary) words, which would otherwise be on or below the threshold for what's currently considered secure. It's only equivalent to Diceware if the standard dictionary is the same size and no user words are added.

I reference Diceware since that seems to be the most popular passphrase generator, or at least it's the one I was pointed to.

"They don't know which words are the rare words, so they have no way to filter the combined dictionary." - Can't the attacker just maintain their own combined dictionary of unique words? I don't see this distributed dictionary scheme adding any significant amount of entropy, tbh. — Maybe_Factor, Commented Sep 14, 2017 at 6:06

Royce Williams · Accepted Answer · 2017-09-14 16:08:23Z

This approach would add unnecessary complexity. More importantly, it would also defeat a core purpose of Diceware-style passwords: only having to remember individual words, not weird additional stuff about them.

First, to illustrate the complexity, let's do the math. (For any proposed password scheme, if you do the math informed by a model of the risk, you'll get most of the answer automatically.)

Assuming an extreme worst case - starting with your base 20,000 word dictionary, and assuming that your attacker can guess 100 trillion passwords per second (which is almost certainly beyond current nation-state capabilities), and further assuming that you want your password to resist cracking for 100 years - you would only need:

ceil(log base 20000 of (100000000000000 * 100 * 31556926))

... six words anyway. (Thanks to Jeremi Gosney for teaching me how to illustrate the math).
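The same calculation can be sketched in a few lines of Python. This is only an illustration of the formula above: the function name is hypothetical, and it uses integer arithmetic instead of a floating-point logarithm so the rounding is exact.

```python
SECONDS_PER_YEAR = 31_556_926  # same value as in the formula above

def words_needed(dictionary_size: int, guesses_per_second: int, years: int) -> int:
    """Smallest word count whose keyspace is at least the attacker's total guesses."""
    total_guesses = guesses_per_second * years * SECONDS_PER_YEAR
    words = 0
    keyspace = 1
    # Equivalent to ceil(log base dictionary_size of total_guesses).
    while keyspace < total_guesses:
        keyspace *= dictionary_size
        words += 1
    return words

# 100 trillion guesses per second for 100 years against a 20,000-word dictionary.
print(words_needed(20_000, 100 * 10**12, 100))
```

Like the formula, this sizes the keyspace against the total number of guesses the attacker can make in that time, so it is a deliberately pessimistic estimate.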

And that's for a hypothetically advanced future nation-state attacker - which is almost certainly not who you should be worried about. Based on our current understanding of math and physics, a Diceware dictionary with millions of words is completely unnecessary for most actual risk models.

Second, the whole point of Diceware is to make passwords that are easy to remember. Introducing millions of typos would make the cognitive burden unmanageable. Trying to remember those typos would be as bad as trying to remember which letter you capitalized, which 'A' you turned into an '@', etc. It's simply not worth the trade-off - and precisely what Diceware inventor Arnold Reinhold wanted to avoid.

Instead, for your 20,000-word dictionary, you'd be far better off by simply adding a word. Not only is this easier to memorize accurately ... it automatically (and simply!) increases the cost of cracking the passphrase by 20,000 times.

Alternatively, you could use the american-english-large dictionary that's distributed with some Linux distributions, which is 163,730 words. That would drop our extreme case down to 5 words. And instead of wasting energy on memorizing typo variations, you can improve your vocabulary! :)
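To see what one extra word or a larger dictionary buys, here is a rough sketch that converts keyspace size into years of guessing. It reuses the attacker speed assumed above (100 trillion guesses per second) and assumes the attacker has to walk the entire keyspace; the names are hypothetical.

```python
SECONDS_PER_YEAR = 31_556_926
GUESSES_PER_SECOND = 100 * 10**12  # the extreme attacker assumed above

def years_to_exhaust(dictionary_size: int, words: int) -> float:
    """Years needed to try every passphrase of `words` random words."""
    keyspace = dictionary_size ** words
    return keyspace / (GUESSES_PER_SECOND * SECONDS_PER_YEAR)

for size, words in [(20_000, 5), (20_000, 6), (163_730, 5)]:
    years = years_to_exhaust(size, words)
    print(f"{size:>7,}-word dictionary, {words} words: {years:,.0f} years")
```

Each added word multiplies the result by the dictionary size, which is where the factor of 20,000 comes from.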

Your wolframalpha link correctly uses base 20,000 but the text says base 62. — AndrolGenhald, Commented Sep 14, 2017 at 15:56
