52

I have been hearing more and more that the haveibeenpwned password list is a good way to check if a password is strong enough to use or not.

I am confused by this. My understanding is that the haveibeenpwned list comes from accounts which have been compromised, whether because they were stored in plain text, using a weak cipher, or some other reason. This seems to have little to do with password strength to me. There could be very strong passwords that were stored in plain text, and thus compromised, and would really be pretty fine to use as long as they weren't used in combination with the original email/username. The fact that their hashes are known (duh, any particular password's hash is known!) doesn't matter if the place you are storing them is salted. Although it really doesn't hurt to rule out these passwords, as perhaps a hacker would start with this list when brute forcing, and it is easy to choose another one.

But the inverse is where I am concerned - there will always be very easy to crack passwords that aren't on the list. "longishpassword" at this time has not had an account using this password that was hit by a leak. This does not mean however that were a leak of hashes to happen, this password would be safe. It would be very easy to break.

What is the rationale behind checking a password (without an email/username) against the haveibeenpwned list to see if it is worthy to be used? Is this a good use of the list or is it misguided?

edit:

It is way too late to change the scope of the question now, but I just wanted to be clear, this question came from a perspective of checking other people's passwords (for instance when users register on your website, or people in your organisation are given AD accounts) not for validating the strength of a personal password. So any comments saying "just use a password manager" have not been helpful to me.

17
  • 23
    Can you cite a source saying HIBP "is a good way to check if a password is strong enough to use or not"?
    – schroeder
    Commented Jun 3, 2019 at 9:11
  • 9
    In brief, HIBP has a huge list of real passwords, including both strong and weak ones. It is possible that the strong ones are filtered and not used in bruteforce attacks, but it's also possible that it's not worth filtering the list (after all, passwords that look strong might actually be weak and used by more than one user). So attackers might just use the whole list for bruteforcing, and therefore every password on that list is going to be at risk.
    – reed
    Commented Jun 3, 2019 at 9:30
  • 5
    I'd say that if your password was able to be cracked using any hash, it's not a good password. I'd also say that if your password was revealed in plaintext, it's now in the cracking dictionary, and thus a bad password. But I certainly wouldn't say that NOT being in haveIBeenPwned.com means it's a good password. The website primarily exists to show how common account cracking is, and how BAD your password is. It's nearly impossible to show how good a password is, unless it can be demonstrated to have a sufficient amount of entropy by the method used to generate it. Commented Jun 3, 2019 at 19:52
  • 13
    HIBP and pretty much any password strength meters including zxcvbn can tell you when a password is bad; they can't tell you that a password is strong.
    – Lie Ryan
    Commented Jun 4, 2019 at 2:51
  • 5
    You're not using HIBP to validate a good password, you're using it to exclude bad ones. You have to use some other method to evaluate password strength.
    – Neil_UK
    Commented Jun 5, 2019 at 10:00

10 Answers

27

It's definitely one of your validation steps, but can't be fully relied on.

Given the fact that most users reuse passwords, and build passwords using a relatively small base of words, a dictionary attack is a particularly effective means of guessing passwords. Since HIBP is regularly updated, it will have many passwords in frequent use, and thus probable candidates that a dictionary attacker would try. Thus, it is a good starting point to check. However, just because your password is not in the list, it doesn't mean your password won't be guessed easily. It's just that known passwords would be high on their list of passwords to try along with text mined from the internet, combinations of words with digits/symbols, transpositions, etc. As more password leaks happen, HIBP and other such tools become more useful, and hackers' lists of passwords to try become more effective to them as well.

I was quite surprised to see some passwords I know are quite easily guessed and are definitely being used in multiple sites, not on the HIBP list, so I can vouch for it not being the determinant of password strength (just like the example in the question). However, if I have come up with what I think is a strong password, and it's on the list, I would definitely not use it.

1
  • 1
    This answer seems to best sum up what I have learned from the answers and comments on this question. Thanks Kristopher
    – Nacht
    Commented Jun 5, 2019 at 5:59
67

"Strong" has always had the intention of meaning "not guessable". Length and complexity help to make a password more "not guessable", but a long, complex, but commonly used password is just as weak as Pa$$w0rd.

If a password is in the HIBP list, then attackers know that the password has a higher likelihood of being chosen by people, hence, might be used again. So those lists will be hit first.

So, if your password is on the list, then it is "guessable".

If your password is not on the list, then from a dictionary attack approach it is not something others are known to have chosen and so, by implication (for as much as that's worth), it is "less guessable". Many other factors, of course, can make your password "more guessable", even if it is not on the HIBP list.

As always, a randomly generated password is the most "unguessable" and a maximum length and randomly generated password is extremely difficult to bruteforce. And if you are randomly generating it, then why not go max length?

14
  • 27
    I think the confusion is compounded by "password strength" often being described by "entropy", and misapplication of Kerckhoffs's principle: the strength of a password is a property not of how you select it, but of how an attacker will attack it. Just as the attacker is trying to guess how the password was selected, the user can try to guess how the attacker will brute force it.
    – IMSoP
    Commented Jun 3, 2019 at 17:58
  • 14
    You mentioned it, but it may help OP to realize: Even if you created the strongest, most unguessable/crackable password in the universe for www.example.com, and that site gets hacked and the passwords are released for folks to buy/download, then that strong password is effectively worthless. All a hacker has to do (and likely will do) is download the "known cracked website passwords" and loop through that. It'll find that super strong password and try it - so they don't even have to guess the password. It's on a list. Therefore, worthless.
    – BruceWayne
    Commented Jun 3, 2019 at 21:29
  • 1
    @Drew - only by a very trivial amount. Even if you restrict the alphabet to single-case ascii letters, you will only lose about 1 part in 27 of the possible passwords. Commented Jun 4, 2019 at 17:01
  • 1
    @Mast The chance of choosing a 20 character password at random, and ending up with a sequence of words is non-zero, but I think it is negligible. Commented Jun 4, 2019 at 17:02
  • 6
    @StianYttervik Not if you use a password manager; in that case, you only need to memorise one master password, and all the services you use only see unique, long, random strings of characters. The tradeoff is that the password manager is a single point of failure, but it's less vulnerable than using the same memorised password directly for multiple services, where a breach in one would compromise your accounts on all the others.
    – IMSoP
    Commented Jun 4, 2019 at 17:21
29

To answer this question properly, you need to think like the hacker who wants to work out your password.

But to avoid having to dive straight into a mathsy way of thinking, let's start instead by thinking about a competitor on the Lego Movie game show "Where are my pants?"

Obviously, when the competitor wants to find their clothes, the first thing they'll do is go to their wardrobe. If that doesn't prove fruitful, they might check their drawers, followed by the chair in the corner of the room, followed by the laundry basket, and perhaps the dog's basket if the dog is of the naughty pants-stealing sort. That'll all happen before they start looking in the fridge.

What's going on here is of course that the competitor will look in the most likely places first. They could have systematically worked through every square foot of the house in a grid, in which case they would on average have to check half the house. On the other hand with this strategy they have a good chance of getting it on the first go, and certainly wouldn't expect to cover half the house.

A hacker ideally wants to do the same thing. Suppose they know that the password they are after is 8 lowercase letters long. They could try working through them one at a time, but there are 208,827,064,576 possible options, so a given completely random guess has about a 1 in 208 billion chance of being right. On the other hand, it's well known that "password" is the most common password (except when it's banned). In fact, looking at the data from haveibeenpwned, the chance of the right answer being "password" is about 1 in 151. Not 151 billion, just 151. So that's over a billion times more likely than some random guess, and they'd be stupid not to start with it. (And obviously, since you want your password not to be found, you want to avoid picking what they'd start with.)
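The arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch: the "1 in 151" figure is taken from the answer as given, not recomputed from the HIBP data, and the variable names are made up for illustration.

```python
import string

# Keyspace for passwords of exactly 8 lowercase letters.
alphabet_size = len(string.ascii_lowercase)  # 26 letters
keyspace = alphabet_size ** 8

# Chance that a single uniformly random guess is right.
p_random_guess = 1 / keyspace

# The answer's figure for "password", taken as given (about 1 in 151).
p_password = 1 / 151

# How many times more likely the "password" guess is than a random one.
advantage = p_password / p_random_guess

print(f"{keyspace:,} candidates")
print(f"'password' is roughly {advantage:,.0f} times more likely than a random guess")
```

The ratio comes out a little above one billion, which matches the "over a billion times more likely" claim.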

Now, the question is whether that generalises beyond "password." Is it worth their while working through a list of leaked passwords? For a bit of information, consider this quote from the original release write up.

I moved on to the Anti Public list which contained 562,077,488 rows with 457,962,538 unique email addresses. This gave me a further 96,684,629 unique passwords not already in the Exploit.in data. Looking at it the other way, 83% of the passwords in that set had already been seen before.

What that tells us is that, roughly speaking, a randomly selected password has a better than 80% chance of featuring in the list. The list has a few hundred million entries, compared with a few hundred billion options for random 8 letter passwords. So, roughly speaking our hacker trying 8 letter passwords would have a 0.1% chance without the list in the time they could get an 80% chance with the list. Obviously they'd want to use it. And again, you might as well avoid it. After all, you still have hundreds of billions of options to choose from, and you can get thousands of billions by just going to nine letters!
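As a rough check of the "0.1% versus 80%" comparison, here is a minimal sketch. The list size of 300 million is an assumption standing in for "a few hundred million entries"; the 80% figure is the answer's own estimate.

```python
# Number of possible passwords made of exactly 8 lowercase letters.
keyspace = 26 ** 8

# Assumption: "a few hundred million" list entries, taken here as 300 million.
list_size = 300_000_000

# Success chance if the attacker spends the same number of guesses
# walking blindly through the 8-letter keyspace instead of the list.
p_exhaustive = list_size / keyspace

# Success chance with the leaked list, per the answer's rough estimate.
p_list = 0.80

print(f"blind search: {p_exhaustive:.2%}, leaked list: {p_list:.0%}")
```

With these assumed numbers the blind search lands in the region of a tenth of a percent, the same order of magnitude the answer quotes.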

That's the justification for checking the list.

Now your first worry is that "there will always be very easy to crack passwords that aren't on the list." That may be true. For example, "kvym" is not on the list. It's only 4 letters. There are only half a million passwords that are 4 lowercase letters or shorter, so if people are likely to prefer short passwords then a hacker would blaze through them in a fraction of the time it would take to finish the leaks list. It's likely that they'd try both.
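The "half a million" figure for short passwords is easy to verify. A small sketch, counting every lowercase-only password of length 1 to 4:

```python
# Count all passwords made of 1, 2, 3 or 4 lowercase letters.
short_passwords = sum(26 ** length for length in range(1, 5))

print(f"{short_passwords:,} passwords of 4 lowercase letters or fewer")
```

That is small enough for an attacker to exhaust long before finishing a list with hundreds of millions of entries.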

The answer to that is obvious. Use both rules. Don't use a password that has appeared in a breach, and don't use a password that is very short. If you have a random password of any significant length, you have more than enough options that a hacker has no shortcut way to find.

16

Others go into why it's a good idea. I'll take a different direction.

From a compliance standpoint, the relevant NIST standard, NIST Special Publication 800-63, Digital Identity Guidelines, specifically requires that when users set their passwords, the passwords shall be checked against a list of previously compromised passwords. The relevant section is SP 800-63B, Authentication and Lifecycle Management, section 5.1.1.2, which says:

When processing requests to establish and change memorized secrets, verifiers SHALL compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised. For example, the list MAY include, but is not limited to:

  • Passwords obtained from previous breach corpuses.
  • Dictionary words.
  • Repetitive or sequential characters (e.g. ‘aaaaaa’, ‘1234abcd’).
  • Context-specific words, such as the name of the service, the username, and derivatives thereof.

If the chosen secret is found in the list, the CSP or verifier SHALL advise the subscriber that they need to select a different secret, SHALL provide the reason for rejection, and SHALL require the subscriber to choose a different value.

By definition, anything found via the Pwned Passwords API is a "value known to be [...] compromised."
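To make the quoted requirement concrete, here is a minimal sketch of a check covering the four example categories. Everything in it is hypothetical: the function names, the in-memory `breached` and `dictionary` sets, and the simple run-detection heuristic are illustrations, not anything NIST or HIBP prescribes.

```python
from typing import Optional, Set


def is_repetitive_or_sequential(password: str, min_run: int = 3) -> bool:
    """True if the whole password is made of runs such as 'aaaa', '1234' or 'dcba'.

    Heuristic only: each run must repeat one step of 0, +1 or -1 between
    neighbouring characters and be at least `min_run` characters long.
    """
    if len(password) < min_run:
        return False
    run_len, run_step = 1, None
    for a, b in zip(password, password[1:]):
        step = ord(b) - ord(a)
        if step in (0, 1, -1) and (run_step is None or step == run_step):
            run_len += 1
            run_step = step
        else:
            if run_len < min_run:
                return False
            run_len, run_step = 1, None  # start a new run at character b
    return run_len >= min_run


def rejection_reason(password: str, *, username: str, service_name: str,
                     breached: Set[str], dictionary: Set[str]) -> Optional[str]:
    """Return a human-readable reason to reject the password, or None to accept."""
    lowered = password.lower()
    if password in breached:
        return "found in a breach corpus"
    if lowered in dictionary:
        return "dictionary word"
    if is_repetitive_or_sequential(lowered):
        return "repetitive or sequential characters"
    for context in (username, service_name):
        if context and context.lower() in lowered:
            return "contains the username or service name"
    return None


# Usage with tiny stand-in lists (a real deployment would query a breach corpus).
print(rejection_reason("1234abcd", username="alice", service_name="example",
                       breached={"Pa$$w0rd"}, dictionary={"sunshine"}))
```

Returning the reason rather than a bare boolean lines up with the requirement to tell the subscriber why the secret was rejected.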

If your organization has to worry about compliance, be aware that the two main standards for passwords are incompatible. The Payment Card Industry Data Security Standard (PCI-DSS) says that passwords must be changed every 90 days, must be a combination of upper case, lower case, numbers, and symbols, etc., while the NIST standard says that passwords should not arbitrarily expire based on dates, and should not have complex rules about the class of characters allowed, but should be flexible enough to allow users to use any combination of character classes.

It is up to your organization to determine which standard to comply with, of course.

If you are developing for an agency under the US Department of Commerce, you must follow the NIST standards, full stop. It's the law. (And with all things regarding the law, check with your organization's legal department, don't trust me blindly.)

If you are working on any system that processes payment information, you are very strongly encouraged to follow the PCI-DSS. If you just have a web store, and are using a third party payment processor, then this doesn't apply to you. It does not have the weight of law, but you should check with your lawyers, as not following the PCI-DSS may expose you to being found negligent if things go wrong.

If none of these apply, then for me, the NIST standards make the most sense. Have several thorough discussions with your security team, do research, and figure out what makes the most sense to you.

As an example of figuring out what makes the most sense to you: in my organization, we do not reject passwords that had fewer than 10 hits in the Pwned Passwords API. We still show a warning message letting the user know that, even though the password was seen in a breach, we accepted it, and that they should consider switching to a password manager to generate truly random passwords. I'm lucky enough to be in an organization where we can talk to the users, and we can have honest discussions about password management. Others will have to adjust their approaches to meet the needs of their organization.
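The policy described above boils down to a three-way decision on the hit count. A minimal sketch, where the function name and the returned labels are hypothetical and only the threshold of 10 comes from the answer:

```python
def password_decision(hit_count: int, reject_threshold: int = 10) -> str:
    """Map a breach hit count to 'accept', 'accept_with_warning' or 'reject'."""
    if hit_count < 0:
        raise ValueError("hit_count cannot be negative")
    if hit_count == 0:
        return "accept"  # never seen in a breach
    if hit_count < reject_threshold:
        return "accept_with_warning"  # seen, but fewer than the threshold
    return "reject"


for hits in (0, 3, 10, 25_000):
    print(hits, password_decision(hits))
```

Setting `reject_threshold=1` turns this into the stricter policy some commenters argue for, where any appearance in a breach is enough to reject.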

3
  • 1
    Although NIST is in Commerce, under FISMA its standards apply (with some lag) to all Federal government and contractor systems except 'national security' systems which are under NSA instead -- and NSA mostly aligns its standards with NIST. To what extent anybody else (state/local, foreign, or private) should follow them is a matter of judgement. Commented Jun 5, 2019 at 0:20
  • @dave_thompson_085, thanks for the clarification. I'll look up FISMA and see if I can work it into the answer. And yes, following the NIST standards only makes it easier for US federal agencies -- the rest of us still have to do our own thinking.
    – Ghedipunk
    Commented Jun 5, 2019 at 15:43
  • Great answer! However, I'd say that if a password is on the list of known passwords at all, it shouldn't be allowed to be used for anything requiring real authentication. The Have I Been Pwned service has "only" about 700M passwords, and an offline attacker going through such a tiny list would definitely try every one of those, so no password on that list is safe for any use. Commented Jan 8 at 15:08
5

Let's do the math:

Let's say every person on earth has used ~1000 passwords so far. That makes approximately 10 trillion passwords, which is ~2^43 if I am not mistaken. Choosing any existing password at random is thus about as good as a truly random 8-9 character case-sensitive password. Not very good. See this answer.
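The estimate above can be reproduced in a few lines. This sketch assumes a 52-letter case-sensitive alphabet (upper and lower case letters only) for the comparison; a different alphabet gives a somewhat different equivalent length.

```python
import math

# The answer's estimate: roughly 10 trillion passwords ever used.
total_passwords = 10 ** 13

# Picking one of them uniformly at random is worth this many bits.
bits = math.log2(total_passwords)

# Assumption: compare against random strings over a-z and A-Z (52 symbols).
bits_per_char = math.log2(52)
equivalent_length = bits / bits_per_char

print(f"{bits:.1f} bits, about {equivalent_length:.1f} random case-sensitive letters")
```

Under that assumption the equivalent length comes out just under 8 characters, so the answer's "8-9 characters" is in the right range.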

That basically means that, in theory, not only should one not reuse a password, one should not reuse a password that has been used by anyone ever. Passwords that have been used before are basically one big dictionary attack waiting to happen.

4
  • 2
    Would you mind if I make a new question specifically about your answer? I think it's interesting and underrated, and I'd like to see analysis by other users. Commented Jul 3, 2019 at 16:14
  • 1
    @Michael Sure, go ahead.
    – kutschkem
    Commented Jul 4, 2019 at 18:15
  • 2
    Took long enough, but I did, eventually! Commented Nov 15, 2019 at 21:21
  • The Have I Been Pwned database contains around 700M passwords. If we assume that the total number of passwords used is around 2^40 (3 binary orders of magnitude less than your assumption, to be on the safe side), the database still contains only about 0.06% of all the passwords. I think it's safe to blacklist those passwords for good, no matter if the password was seen once or thousands of times. Commented Jan 8 at 15:11
2

I have to admit I'm a bit lost in what strong means nowadays. I like to think that strong means a complex and long password. But that doesn't make a good password since it can possibly still be guessed easily.

As you already note: "a hacker would start with this list when brute forcing". So if your password occurs in this list, your password will be quickly guessed and this means it is not a good password.

There's an explanation on the website when you enter a string that's not in the list:

This password wasn't found in any of the Pwned Passwords loaded into Have I Been Pwned. That doesn't necessarily mean it's a good password, merely that it's not indexed on this site.

Using the HIBP list is a way of checking how easily your password will be guessed, but it is not an indication of its strength. You need to use a password strength checker for this, which often will not check the leaked password lists. The HIBP password list and a password strength checker complement each other.

6
  • 2
    Password strength checkers have very limited utility as they assume certain criteria for brute-forcing and may not check any dictionary at all. They are useful for illustrative purposes, but not for choosing a strong password.
    – schroeder
    Commented Jun 3, 2019 at 9:51
  • 1
    @schroeder I don't see how your comment adds to what I already said. Can you explain?
    – LVDV
    Commented Jun 3, 2019 at 11:23
  • 5
    "You need to use a password strength checker for [an indication of its strength]" - password strength checkers should not be used for this and are not good at determining strength. They are illustrative at best, good for learning the basics of the effects of making certain changes to passwords. I just Googled "password strength checker" and the top hit returned "very strong, 82%" for the input of Pa$$w0rd.
    – schroeder
    Commented Jun 3, 2019 at 12:07
  • 1
    Checking strength is not the thing to do. The thing to do is to generate passwords that have strength.
    – schroeder
    Commented Jun 3, 2019 at 12:08
  • In the beginning of my answer I explained what I understand under strong, which is length and complexity. A password strength checker can help you define this. Pa$$w0rd is a strong password by my definition (although a bit short), but it is predictable which makes it a bad and ineffective pw. I can have a strong password according to your definition of length, complexity and predictability, but it still wouldn't be a good password if I've been using it for 10 years and for 50 different sites. That's why I prefer the simple term "good" when talking about the final effectiveness of a password.
    – LVDV
    Commented Jun 3, 2019 at 12:37
1

Once a password is sent to some random password checking site, it is no longer secure. Using such sites is definitely not a good idea with passwords you use (or are going to use).

There is nothing preventing such a site from adding the password you tested directly to a wordlist, and then selling it to hackers.

Again: using such sites with real passwords is IMHO a very bad idea.

3
  • 5
    This is sort of not a problem with HIBP. You can certainly go to the website and input your password, and in that case I agree with you. But if you use the HIBP API you can be fairly sure your password is still secure. The API uses a k-anonymity scheme where you only share the start of a hash of your password. The API returns all hashes that start with that prefix, leaving you to verify locally whether or not the password has been compromised. HIBP would never know your password. Commented Jun 5, 2019 at 15:01
  • 1
    @Michael Hancock Yes, if you are not sending actual password, it is way safer of course. :-) But that's not the case of the website. I would only be putting actual password for test into trusted open source, and preferably desktop applications.
    – Firzen
    Commented Jun 5, 2019 at 15:16
  • 3
    This would be a good answer if it discussed the difference between the API (and its protections against this) and using the website directly. However, the question just talks about using the list (not the website) and about a specific service (not "some random password checking site"), so IMO this answer jumps to a wrong conclusion.
    – IMSoP
    Commented Jun 5, 2019 at 17:08
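For readers who want to see the k-anonymity lookup mentioned in the comments above, here is a minimal sketch. It assumes the public Pwned Passwords range endpoint at api.pwnedpasswords.com, which takes the first 5 hex characters of the SHA-1 hash and returns `SUFFIX:COUNT` lines. The function names and the User-Agent string are hypothetical, and there is no error handling or retry logic.

```python
import hashlib
import urllib.request
from typing import Callable, Tuple

RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_hash(password: str) -> Tuple[str, str]:
    """Return (first 5 hex chars, remaining 35 hex chars) of the SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def fetch_range(prefix: str) -> str:
    """Download every known hash suffix for the given 5-character prefix."""
    request = urllib.request.Request(
        RANGE_URL.format(prefix=prefix),
        headers={"User-Agent": "hibp-range-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def pwned_count(password: str, fetch: Callable[[str], str] = fetch_range) -> int:
    """Number of times the password appears in the corpus (0 if not found).

    Only the 5-character prefix leaves this machine; the suffix comparison
    happens locally against the returned candidates.
    """
    prefix, suffix = split_hash(password)
    for line in fetch(prefix).splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0
```

Calling `pwned_count("longishpassword")` performs one HTTPS request. The `fetch` parameter exists so the lookup can be pointed at a local copy of the list or at a stub in tests.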
1

There are many good answers on this page, but I don't see anyone considering the concept of credential stuffing.

It relies on the fact that many users have the same username (email address, really) and password on multiple sites. So you can grab a list of username/passwords (similar to what HIBP uses), and simply fire off all the pairs on the list against the web site you want to break into.

By ensuring that none of your users have passwords present in any of the lists known to HIBP, you very effectively block this attack.

1

But the inverse is where I am concerned - there will always be very easy to crack passwords that aren't on the list. "longishpassword" at this time has not had an account using this password that was hit by a leak. This does not mean however that were a leak of hashes to happen, this password would be safe. It would be very easy to break.

You are 100% right that absence from HIBP's Pwned Passwords database doesn't guarantee that a password is strong. However, I think you're underestimating the enormous value of checking passwords against the HIBP database. The point is that the case that you're concerned about—a weak password that's not in HIBP's database—is considerably less common than weak passwords that are in the list.

Troy Hunt (the creator of HIBP) writes extensively about his projects, and his 2018 blog entry "86% of Passwords are Terrible (and Other Statistics)" gives what I think should be an extremely eye-opening example (edited for brevity):

But I always wondered - what sort of percentage of passwords would [Pwned Passwords] actually block? I mean if you had 1 million people in your system, is it a quarter of them using previously breached passwords? A half? More? What I needed to test this theory was a data breach that contained plain text passwords, had a significant volume of them and it had to be one I hadn't seen before and didn't form part of the sources I used to create the Pwned Passwords list in the first place.

And then CashCrate [a big breach and leak] came along.

Of those 6.8M records, 2,232,284 of the passwords were in plain text. So to the big question raised earlier, how many of these were already in Pwned Passwords? Or in other words, how many CashCrate subscribers were using terrible passwords already known to have been breached?

In total, there were 1,910,144 passwords out of 2,232,284 already in the Pwned Passwords set. In other words, 86% of subscribers were using passwords already leaked in other data breaches and available to attackers in plain text.

So while you are right to think that Pwned Passwords doesn't solve the whole problem, the volume of low-hanging fruit that it addresses is enormous. Combine it with a scientifically well grounded password strength checker like zxcvbn and you bite off another big chunk:

password:               longishpassword
guesses_log10:          8.09552
score:                  3 / 4
function runtime (ms):  2
guess times:
100 / hour:   centuries (throttled online attack)
10  / second: 5 months (unthrottled online attack)
10k / second: 3 hours (offline attack, slow hash, many cores)
10B / second: less than a second (offline attack, fast hash, many cores)
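The guess times in that output follow directly from the `guesses_log10` value and the four attack rates. A small sketch of the arithmetic (it does not reproduce zxcvbn's own rounding into labels like "centuries"; the dictionary keys are just descriptive names):

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86_400

# guesses_log10 from the output above.
guesses = 10 ** 8.09552

# The four attacker speeds from the output, expressed in guesses per second.
rates_per_second = {
    "throttled online (100/hour)": 100 / SECONDS_PER_HOUR,
    "unthrottled online (10/second)": 10,
    "offline, slow hash (10k/second)": 1e4,
    "offline, fast hash (10B/second)": 1e10,
}

# Time to exhaust the estimated number of guesses at each speed.
seconds_needed = {name: guesses / rate for name, rate in rates_per_second.items()}

for name, seconds in seconds_needed.items():
    print(f"{name}: {seconds:,.2f} s ({seconds / SECONDS_PER_DAY:,.2f} days)")
```

The results land on roughly 140 years, 144 days, 3.5 hours and about a hundredth of a second, consistent with the labels shown.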

And after you've knocked off the low-hanging fruit you probably hit rapidly diminishing returns.

-2

I would argue that simply ruling out billions of perhaps strong passwords because they appear on this list from previous breaches is not necessarily useful. In the context of your environment, it might just make it very hard to select a password when billions are already excluded, especially if people have to remember it for one reason or another (for example, because they can't use a password manager).

I think this should also be put in the context of whether you also employ MFA, in which case knowing the password only gets you so far. Also, brute-force attacks can be effectively countered by employing account lockout rules for wrong password entries.

4
  • 4
    A password in any breach is no longer a strong password. You always must assume, that an attacker tries passwords from breaches first. First the attacker does not judge each password if it is in the breach because it is a common one or if it is one of the stronger ones, second you may be the user with the password in the breach and third it is cheap to extend a wordlist with a million passwords and has a huge benefit for an attacker. So you must assume that any password in a breach will be tried.
    – allo
    Commented Jun 4, 2019 at 11:15
  • I agree with @allo here and would even emphasize this: If a password appears in a list from a breach, I consider it compromised. If an attacker has access to a hashed passwords file (or can access the account without rate limit), they will certainly try password lists first.
    – Dubu
    Commented Jun 4, 2019 at 12:33
  • Treat the master passwords list as a dictionary. A very good dictionary that has a high likelihood of containing a password. If you are going to protect against a dictionary attack, then you would most definitely want to protect from using the password list as the source of that attack. If you want to protect the users from using password123 as their password, then you also most definitely want to protect against a password list attack that will have stuff way more common than this.
    – VLAZ
    Commented Jun 5, 2019 at 15:01
  • All of these things don't take into account the (hopefully) setup "number of failed attempts before lockout/delay" it's all very well saying superpassword123! has appeared in the list once is "compromised" but you'd probably not try it as the 1st 100+ attempted passwords, not using it because it's appeared once is a bit crazy (assuming you've not used it yourself before elsewhere, means nothing if someone else has really) [superpassword123! is used as an example, not as a case of a proper smart to use password]
    – Stephen
    Commented Jun 6, 2019 at 13:03
