547

A developer, let's call him 'Dave', insists on using home-brew scripts for password security. See Dave's proposal below.

His team spent months adopting an industry standard protocol using Bcrypt. The software and methods in that protocol are not new, and are based on tried and tested implementations that support millions of users. This protocol is a set of specifications detailing the current state of the art, software components used, and how they should be implemented. The implementation is based on a known-good implementation.

Dave argued against this protocol from day one. His reasoning was that algorithms like Bcrypt, because they are published, have greater visibility to hackers, and are more likely to be targeted for attack. He also argued that the protocol itself was too bulky and difficult to maintain, but I believe Dave's primary hangup was the fact that Bcrypt is published.

What I'm hoping to accomplish by sharing his code here, is to generate consensus on:

  1. Why home-brew is not a good idea, and
  2. What specifically is wrong with his script
/** Dave's Home-brew Hash */

// user data
$user = '';
$password = '';

// timestamp, random #
$time = date('mdYHis');
$rand = mt_rand().'\n';

// crypt
$crypt = crypt($user.$time.$rand);

// hash
function hash_it($string1, $string2) {
    $pass = md5($string1);
    $nt = substr($pass,0,8);
    $th = substr($pass,8,8);
    $ur = substr($pass,16,8);
    $ps = substr($pass,24,8);

    $hash = 'H'.sha1($string2.$ps.$ur.$nt.$th);
    return $hash
}

$hash = hash_it($password, $crypt);
25
  • 668
    ... and now Dave's protocol is published? Commented Dec 18, 2012 at 15:14
  • 150
    Dave might have had a point if he ended up with a better and more secure approach. He did not. The reason we use published algorithms is so that we all benefit from the collective wisdom of others and don't have to rely on our personal and evolving understanding of a concept.
    – schroeder
    Commented Dec 18, 2012 at 16:01
  • 261
    Dave has a fundamentally false premise, that the security of an algorithm relies on (even partially) its obscurity - that's not the case. The security of a hashing algorithm relies on the limits of our understanding of mathematics, and, to a lesser extent, the hardware ability to brute-force it. Once Dave accepts this reality (and it really is reality, read the Wikipedia article on hashing), it's a question of who is smarter - Dave by himself, or a large group of specialists devoted to this very particular problem. Commented Dec 18, 2012 at 16:33
  • 111
    Worst of all his algorithm could be brute forced in a matter of minutes likely for your entire userbase.
    – Ramhound
    Commented Dec 18, 2012 at 18:24
  • 250
    The issue is not that Dave is dumb. We're all, in a sense, dumb. It's that he's fallen victim to the Dunning-Kruger effect and not only has no idea how dumb he is, but is completely convinced that he knows better than the experts in the field. This is the real danger. Dumb and aware of it is fine, and how we should all strive to be. Dumb and oblivious is bad, but is comparatively easy to remedy. Dumb and embarrassingly overconfident is both extremely dangerous and frequently impossible to fix. Commented Dec 18, 2012 at 23:19

11 Answers 11

524
/** Dave's Home-brew Hash */

// user data
$user = '';
$password = '';

// timestamp, "random" #
$time = date('mdYHis'); // known to attackers - totally pointless
// ^ also, as jdm pointed out in the comments, this changes daily. looks broken!

// different hashes for different days? huh? or is this stored as a salt?
$rand = mt_rand().'\n'; // mt_rand is not secure as a random number generator
// ^ it's even less secure if you only ask for a single 31-bit number. and why the \n?

// crypt is good if configured/salted correctly
// ... except you've used crypt on the username?
$crypt = crypt($user.$time.$rand); 

// hash
function hash_it($string1, $string2) {
    $pass = md5($string1); // why are we MD5'ing the same pass when crypt is available?
    $nt = substr($pass,0,8); // <--- BAD BAD BAD - why shuffle an MD5 hash?
    $th = substr($pass,8,8);
    $ur = substr($pass,16,8);
    $ps = substr($pass,24,8); // seriously. I have no idea. why?
    // ^ shuffling brings _zero_ extra security. it makes _zero_ sense to do this.
    // also, what's up with the variable names?

    // 
    $hash = 'H'.sha1($string2.$ps.$ur.$nt.$th); 
    return $hash
}

$hash = hash_it($password, $crypt); // 

summon_cthulhu();

Dave, you are not a cryptographer. Stop it.

This home-brew method offers no real resistance against brute force attacks, and gives a false impression of "complicated" security. In reality you're doing little more than sha1(md5(pass) + salt) with a possibly-broken and overly complicated hash. You seem to be under the illusion that complicated code gives better security, but it doesn't. A strong cryptosystem is strong regardless of whether the algorithm is known to an attacker - this fact is known as Kerckhoff's principle. I realise that it's fun to re-invent the wheel and do it all your own way, but you're writing code that's going into a business-critical application, which is going to have to protect customer credentials. You have a responsibility to do it correctly.

Stick to tried and tested key derivation algorithms like PBKDF2 or bcrypt, which have undergone years of in-depth analysis and scrutiny from a wide range of professional and hobbyist cryptographers.

If you'd like a good schooling on proper password storage, check out these links:

22
  • 16
    Could you explain why this is 'BAD BAD BAD'? I thought repeated hashing or processing the hash inputs can help thwart rainbow table or brute force attacks. How does this introduce vulnerabilities?
    – jdm
    Commented Dec 18, 2012 at 15:35
  • 77
    @jdm The "BAD BAD BAD" part refers to the pointless shuffling of the MD5 hash, since it offers zero extra security, but in this case the "repeated" hashing is badly designed. It relies on MD5, crypt (whatever that ends up being configured to) and SHA1 all at once, but only really does 3 computations. That's nowhere near enough to defeat GPU-based bruteforce. The whole thing brings nothing but obscurity, and is probably less secure than sha1(md5(pass+salt)) with a decent salt. It's important to use a proper key derivation algorithm (see the first link).
    – Polynomial
    Commented Dec 18, 2012 at 15:38
  • 31
    Thank you. I am equally frustrated, believe me. Dave is a great guy, and has done excellent work in the past. His reaction to all of this took me by surprise. He's missing some fundamental concepts that could be picked up with a simple Google search. I'm not trying to be vindictive or cruel, but I'm done trying to explain this to him. This is exactly the kind of feedback he needs ... Commented Dec 18, 2012 at 15:44
  • 54
    @nallenscott To be fair, I went through the same phase when I was learning to code - I wanted to do everything myself, and quickly fell into the trap of writing my own "encryption" and hashing. Then I actually learnt about real security, and discovered much of what I'd previously written was horribly broken. Massive wakeup call!
    – Polynomial
    Commented Dec 18, 2012 at 15:48
  • 76
    @Casey: offcourse, always invent your own. Make your own linked list, your own crypto, your own database, your own mergesort. Implement the algorithms and datastructures to understand them better. Then throw the implementation away - the existing are almost always better. If not, get on the project and improve them! Your peers will review your changes and give feedback. If you're wrong, you'll learn, if you're right, your changes benefit all humankind ;)
    – Konerak
    Commented Jan 24, 2014 at 9:07
250

Advantages of a public protocol:

  • Probably written by smarter people than you
  • Tested by a lot more people (probably some of them smarter than you)
  • Reviewed by a lot more people (probably some of them smarter than you), often has mathematical proof
  • Improved by a lot more people (probably some of them smarter than you)
  • At the moment just one of those thousands of people finds a flaw, a lot of people start fixing it

Disadvantages of a public protocol:

  • If indeed a flaw is found,
  • AND made public
  • AND not fixed fast enough

then attackers will start searching for vulnerable sites/applications. BUT:

  • They'll probably go after more important targets first
  • When the flaw is exposed, you know and can lock-down
  • Not all flaws are "critical" (most are like "collisions are possible under certain conditions" or "the time to bruteforce is not 1012 years, but 106 years")

Security by obscurity is deemed outright bad. Inventing your own falls under that category.

5
  • 13
    For well known algorithms if you're not a professional cryptographer probably should be replaced with definitely. Commented Dec 18, 2012 at 15:55
  • 76
    "Probably written by smarter people than you" - you're assuming 'Dave' admits the existence of such.
    – AakashM
    Commented Dec 18, 2012 at 16:33
  • 9
    Any password could be called "security by obscurity". But in a good system, it's been proven that the password is the only thing you must keep secret. An untested system almost certainly has other flaws, so that while you're protecting the password, somebody gets in using an attack you didn't expect. Which is why you want an algorithm which has been attacked brutally and found to be strong. Commented Dec 18, 2012 at 21:01
  • 1
    @NathanLong: An important point about passwords/keys: if one suspects that one's secrets have been compromised, one can re-establish security easily and with confidence by e.g. rolling 25 transparent dice and using the values to generate a password or key (64+ bits of entropy); any die roll will be essentially as good as any other, provided only that the enemy doesn't find out what it was.
    – supercat
    Commented Sep 12, 2014 at 20:31
  • 7
    If by smart you mean people more knowledgeable about cryptography...
    – monster
    Commented Nov 24, 2015 at 14:53
171

If Dave is really "your" developer, as in you have the authority to fire him, then you have the authority to direct him to use a more well-documented scheme, and you should.

In cryptography, the fewer secrets that are required to be kept, the better. This applies especially to "hard-coded" secrets, such as the hash function itself, which are not secrets as soon as the code containing the secret leaves your building.

This is why open standards for cryptography are a good thing; if the scheme has been around for years or even decades, with the basic algorithm and various implementations publicly known, and still nobody's found a way to get in, it's a good scheme.

Case in point, Polynomial provided a very good breakdown of what's wrong with the scheme, but here are the major failings of the proposed hash algorithm:

  • MD5 is completely broken; While a preimage attack isn't made much easier by the primary ways that MD5 is broken (it's extremely vulnerable to known-plaintext strong collisions; knowing the message and the digest, one can produce a collision in 2^24 time), the hashing algorithm is way too fast to be secure against modern distributed cracking hardware. It should never be used in applications requiring data security.

  • SHA-1 is considered vulnerable; Again, there's not an elegant vector for a preimage attack, but there is a strong collision attack (2^61 time for a 160-bit hash), and the hashing algorithm is fast enough and the keyspace small enough that current hardware can feasibly brute-force it.

  • More hashes don't necessarily mean better hashing. Hashes are a deterministic process; they stand in for a "random oracle" in theoretical cryptography, but since they must always produce the same output given the same input, they cannot add any entropy to what is already inherent in the input.

    Stated simply, using a secret combination of multiple hashes as Dave is doing, even if that combination is unknown, doesn't add a significant amount of work to a brute-force (the relative order of three hashes has 6 possibilities, increasing the complexity of trying all possibilities by less than 3 powers of 2), and that extra complexity disappears as soon as an attacker can decompile your code and discover the relative order of the hash functions.

  • Passwords are inherently low-entropy. The best "user-consumable" (non-gibberish) passwords max out in complexity at about 50 bits of entropy, meaning they can be cracked, if you have some idea of what you're looking for, in 2^50 or better time. That makes passwords easier to crack than other things you'd normally hash (like certificate digests), requiring additional "proof of work" to increase their security.

  • This scheme is not adding any significant proof of work; three hashes instead of one, in the grand scheme of things, adds the equivalent of about 1.25 bits of entropy to the scheme in extra required work; you could do better by just requiring passwords to be one character longer.

BCrypt, in contrast to all of the above, has as its main advantage an extremely slow key derivation function or KDF. Blowfish, the encryption cipher that forms the basis of BCrypt, uses an "SBox"; a large array of predetermined initial values, which is modified by the key and/or IV to produce the initial state of the cipher through which the plaintext is fed. In BCrypt, this SBox setup is made more complex by taking the smaller password and salt values and running them through the "unkeyed" cipher a predetermined number of times, "warming up" the cipher into a state that theoretically could only be produced by performing the same number of iterations of setup, using the same password and salt. This "warmed-up" cipher is then fed a constant plaintext to produce the hash value. In CS terms, the hash forms a "proof of work"; a computational task that is difficult (time- and/or resource-consuming) to perform, but easy to check for correctness (does the produced hash match the one we already have?).

That "predetermined number" of iterations of SBox setup is configurable; this is BCrypt's real power. A number of "rounds" of key setup are specified when hashing, and for n rounds, 2^n iterations of key setup are performed. That means that incrementing the number of rounds makes the hash perform twice as much work and take twice as long on the same hardware to produce the requisite proof of work. BCrypt therefore can easily keep up with Moore's Law, which has been the bane of many hash functions that came before.

BCrypt's main theoretical weakness is its low memory usage; while computation time can be exponentially increased, the amount of memory required remains effectively constant (the SBox doesn't get any bigger as the iterations increase and no "intermediate results" are required to be kept around). As such, while BCrypt stymies the pure pipelined computational power of a GPU, it remains vulnerable to cracking by more sophisticated "field-programmable gate arrays" (basically a massive, highly modular set of "soft-wired" logic units, that are the current state-of-the-art in highly distributed computing), because the low memory constraints mean you can just throw more relatively-inexpensive circuit boards at the problem.

A newer algorithm, SCrypt, combats FPGA crackers by exponentially increasing the memory demands as well as the computational expense of calculating the hash, so the limited memory available to each FPGA quickly makes them infeasible as well (distributed cracking would basically require large numbers of CPU/FSB/RAM combinations, essentially full computers, hooked together). The only problem there is that SCrypt is only 2 or 3 years old, so it doesn't have the cryptanalytic pedigree of BCrypt (13 years old). And really, if you have to worry about an FPGA-armed cracker getting a password in a feasible amount of time (like before you can change all of them anyway), you have pissed off some very powerful people, and have already exposed another serious vulnerability (allowing an attacker to obtain your stored password hashes in the first place, so they can crack one "offline").

Bottom line, use BCrypt for password hashing. It's 13 years old, based on a 20-year-old cipher algorithm, an in that time no known vulnerabilities have been found that would aid a password cracker. It's slow as molasses, and can be configured to always be so, provided you combine it with a requirement for users to change their password every 90 days or so, so new passwords keep getting hashed with more expensive configurations.

9
  • 23
    First, SHA1 is 160 bits. Second, being broken for collision attacks (2^(n/2) to brute force) doesn't matter the slightest in the domain of cracking a hashed password (where you worry about (first) pre-image attacks - 2^n to brute force); see: en.wikipedia.org/wiki/Preimage_attack . For crypto-signatures a collision broken hash (and SHA1 is broken in 2^61 time) can't be used (as signed messages could be altered). (Though the simplicity and speed of MD5 and the existence of large rainbow tables does mean its bad for passwords.) Agree with your arguments for bcrypt though.
    – dr jimbob
    Commented Dec 18, 2012 at 17:49
  • 1
    Deleted long discussion on preimage attacks, collisions, hashing strength etc
    – Rory Alsop
    Commented Dec 19, 2012 at 8:42
  • 12
    I use MD5 to create nice colours in my charting scripts - not for securing passwords
    – mplungjan
    Commented Mar 19, 2013 at 7:19
  • 4
    @mplungjan Nice idea! Commented Nov 12, 2018 at 13:36
  • 4
    Oh my God I hate companies so much when they require resetting a password every 90 days, it's so frustrating. Do not do this. Your company is not special, unless you're Facebook or Google, it's not even likely that you users visit your page that often, you're asking your users to reset their password nearly every single time they visit your website, which is absurd. Commented Nov 13, 2019 at 7:48
103

To be fair to Dave, in terms of homebrew password security this is one of the better cases as all it just a little obsfuscation (and really not much) masking hash = SHA1(salt + MD5'(Password)) where MD5' does a reversible swap of the order of the bytes of the MD5 hash. Now the username/time/random/crypt-part is just used to generate a salt, and the only requirement we have of salts is that they just need to have a very good likelihood of uniqueness; so while its over-complicated there's really no use talking about it.

Again hash = sha1($salt+md5'($password)). Rearranging the MD5 adds no security (swap(00112233445566778899aabbccddeeff) becomes ccddeeff8899aabb0011223344556677) the swap does not prevent you from using md5 rainbow tables after (e.g., look at the code and reverse the swap). However, the presence of the unique salt makes the rainbow tables infeasible.

Now to be critical, this boils down to a simple cryptographic hash function applied twice; which is better than storing plaintext password (see plaintext offenders ), and better than storing an unsalted hash (see linkedin leak ). However, in the age of cheap massively-parallel GPUs, this is too weak for modern use. Anyone with a bit of general-purpose GPU programming experience that got on the live server somehow (to obtain the hashes with their salt) can likely see his source code, try out their own password as a test case and then can brute-force any particular password at a rate of billions of attempts per second per GPU.

So if any user is using a password on a list of a million or so previously leaked password (say from linkedin), the attacker can near instantaneously crack it. If some user's password is a random 8-letter from the char set A-Za-z0-9; it would take about 60 hours on average to break per GPU (so if you had 60 GPUs; would take 1 hour). Using common cracking techniques exploiting forms of common passwords could speed it up dramatically. Its also worth noting that since $password passes through a 128-bit MD5 hashing function, there is absolutely no benefit in using more than 128-bits of entropy in the passphrase (though to be fair that's a very secure password; like a 10 word diceware passphrase or random 22 character alphanumeric password).

Really, you should be using iterated cryptographic hash functions; that is something like bcrypt or PBKDF that slows down the brute-force rate of attackers by a large constant factor (say 105) (so instead of 60 hours to crack a random 8-char password from a 62-char set (A-Za-z0-9); it takes 6 million hours (~700 years to likely be broken) with a single GPU, and this gets better with stronger passwords (e.g., a 10 char password would take ~3 million years; so even with a million GPUs it would take 3 years). So a little keystrengthening moves relatively weak password (8 random chars from 62-charset) out of the range of feasibility of being cracked by an attacker. For more on the upper limits of using a simple keystrengthened password see this answer.

Collision Attacks vs Pre-Image Attacks (or Why Collision Attacks on a hash function are irrelevant for password hashing)

KeithS's answer, while it gives good advice (use bcrypt and not a simple cryptographic hash function to hash your password) originally criticized MD5 and SHA1 for the wrong rationale (don't use MD5 as its vulnerable to collision attacks). There's a subtle difference between pre-image and collision attacks and the arguing in the comments was too condensed, so I am elaborating here. I highly suggest reading the wikipedia article on pre-image attacks.

A pre-image attack says given a specific 128-bit hash written in hexadecimal: h=ad2baf26a87795b3c8a8366a08b44112, a specific hash function H, please find any message m such that h=H(m); note that there are N=2128 different distinct hashes. Now if your cryptographic hash function is not broken, each bit in your hash has a 50% chance of being 0 or 1 for a random message m. Then I will need on average to generate hashes for roughly N=2128 hashes before I am lucky enough to find any message m such that h=H(m) (this message may not be the same input that was used to originally generate the hash -- but this still falls under the category of 'pre-image' attack).

A collision attack says find me any two messages m1 and m2 such that H(m1) = H(m2). Note that m1, m2, (and H(m1)) are all free to change. This case is a much easier problem since if I generate M hashes for M different messages, I am not just comparing the M messages to one specific hash (so have M chances of a finding a collision), now I have M*(M-1)/2 pairs of hashes, so roughly M^2 chances of having a collision. So in this case, I will need roughly to generate roughly sqrt(N)~264 hashes before its likely that one of them will collide with another one on an ideal 128-bit hash.

Let's look at the two types of birthday problems. The collision problem translates into the common birthday "paradox"; how many people do you need in a room before its quite that two people share a birthday with N=365 days in a year. The answer is a paradox as you only need about sqrt(N) ~ 23 people before its likely that two people share a birthday (as with 23 people in a room you have 253 different pairs of people that could match). (I do know that sqrt(365) != 23: I am using approximate math not focusing on insignificant constant factors. Re-doing the calculation with sqrt(365) ~ 19 people in the room then P(two share birthday) = 19! * comb(365,19)/365**n = 37.9%, which while not strictly 50% still means its fairly likely to happen). Note for the collision birthday problem, you can't have N+1=366 people in a room and there be a chance of not having a collision (ignoring leap days); at best the first 365 people had different birthdays and the last person generated a collision.

The pre-image problem is a very different question, how many people do I need in a room before it is quite likely that someone has one specific birthday (say B=December 18th). In this case, I need roughly N ~ 365 people before its likely to happen. E.g., with 365 people in the room you have a P(somebody has birthday B) = 1 - (1 - 1/365)^(# people) so for # people = 365 you have a 63% chance that someone's birthday will be some fixed date B. In this case, you can easily imagine having any number of people being in a room and there being no one who has a birthday on one specific date. (Say you only invited people into the room if there birthday was not a given date; there's no limit to the number of people you could invite).

When a hash function like MD5/SHA1 is broken for collision attacks that means you can generate a collision in less work than the brute force time of sqrt(N) ~ 2numbits/2. For MD5 it only takes about ~ 2^24 time to generate a collision; for SHA1 it takes ~ 2^61 time. That means collision attacks on MD5 are extremely easy to make; but practical attacks on SHA1 are still difficult. But collision attacks only matter if you don't care what hash you are trying to match. These collision attacks are quite relevant for some applications like cryptographically signing messages to ensure message integrity, so beware of using MD5/SHA1 in those cases. However, when you have a unique salt, and a specific hash you are trying to match to authenticate, collision attacks do not matter.

69

enter image description here

See the related Security Meme post

While this may seem very simplistic, the rules hold true - designing crypto algorithms and implementing them correctly/securely is very hard. Even the ones designed by experts and picked at by thousands of people over years have holes discovered in them eventually.

So Do Not Roll Your Own Crypto is good advice for everyone (possible exceptions include notables like Rivest, Adleman, Pornin, Shamir)

6
  • 4
    Funny and agree with the sentiment, but its irrelevant as he didn't write his own cipher. He took existing crypto-hash functions MD5 and SHA1 along with custom permutation function dumb_perm (dumb_perm('00112233445566778899aabbccddeeff') goes to 'ccddeeff8899aabb0011223344556677'), so hash = SHA1(salt++dumb_perm(MD5(pw))) and created their salt in an overly complicated manner. While they've increased their maintenance costs for no gain in security, they are not creating their own cipher--the flaw is that simple hashes are too quick nowadays, so key-strengthening is necessary.
    – dr jimbob
    Commented Dec 19, 2012 at 16:41
  • I think the greater problem is that the question's example is more of a case of key weakening than it is of key strengthening. It may not be a new cipher, but it's falling for the same pitfalls.
    – Jeff Ferland
    Commented Dec 19, 2012 at 18:08
  • 1
    @drjimbob cipher != hash Commented Dec 19, 2012 at 21:09
  • @bradley.ayers - I understand the difference (ciphers are used for encryption; hashing is not encryption as its not reversible.) The meme image first used the word "cipher", and I didn't criticize Rory as cryptographic hash function like MD5 and SHA1 are based on algorithms similar to block ciphers: en.wikipedia.org/wiki/… However, Dave didn't create his own cipher/hash functions or anything similar. He merely did a dumb permutation of an MD5 at one step of a weak hashing scheme (sha1(salt+md5(pw)).
    – dr jimbob
    Commented Dec 19, 2012 at 21:17
  • 3
    One does not simply write one's own cipher sounds better
    – mplungjan
    Commented Mar 19, 2013 at 7:21
44

While we can find plenty of flaws with Dave's algorithm, it really isn't horrible because it isn't 100% home brew; he does use hashing protocols that (albeit weak) are based on solid principles. On the other hand, he takes steps that increase complexity for the developer but do little to improve the security of his algorithm.

But the reason I am adding another answer here is to bring up the age-old argument that there is value in obscurity. The primary line of defense should always be strong, widely-accepted hashing algorithms but there is no harm in adding even the tiniest amount of obscurity as an additional layer of defense.

That layer of obscurity should never be considered a defense in itself, simply an additional factor that reduces the chances of compromise. Since a successful attack normally requires a number of factors to fall in place, even small defenses can have a great impact in reducing your overall risk.

I can't tell you how many times I have seen hash dumps that I have tried to crack where I simply get nothing because some Dave somewhere thought of this dumb non-standard algorithm that I can't crack short of doing an intense cryptoanalysis or breaking in to the web servers to get the algorithm. You can't deny that these dumb algorithms are a valid defensive factor.

Now if Dave took his dumb algorithm and added it as a layer to stronger algorithms such as PBKDF2 or bcrypt, then he has obscurity with the backing of solid security practices. Also, obscurity doesn't need to be as complex as Dave's code, all you really need is one bit to be different to be obscure. Remember this is not a defense, just a layer of obscurity.

Better ways to accomplish obscurity are using an HMAC, concatenating a server-side salt constant, and/or using a large, non-standard number of iterations. None of these measures will stand on their own but they certainly do address specific attack vectors that contribute to your overall risk.

tl:dr; security through obscurity is bad, but solid security plus a reasonable amount of obscurity is a valid strategy.

2
  • 3
    Indeed Dave's algorithm is essentially salted HMAC with a little twist. It appears to be as strong as HMAC. For example attacker won't be able to use someone else's rainbow table. My biggest concern with this system is where salt is stored.
    – user17497
    Commented Dec 20, 2012 at 9:50
  • 2
    @qarma HMAC is using the secret (i.e. the password) twice, this algorithm is using it only once. Commented Apr 6, 2013 at 13:43
30

OK, fire Dave. At the very least hit him with a very large clue-bat. Open protocols are good because anyone can look and attempt to find vulnerabilities and structural problems, and implement fixes. The visibility improves the protocol. Good security means that everyone can know how the system works and it is still secure.

4
  • 1
    Do you still belief that in all open systems all holes are found and fixed and therefore they are the most secure systems, when considering the leaks of the last few years that have revealed that the likes of NSA and CIA keep vulnerabilities to themselves and develop exploits off them for various barely legal reasons? If they can do it (the so-called 'good guys'), can't the so-called 'bad-guys' do it too ('Chinese/Russian hackers' and so on)?
    – a20
    Commented May 14, 2017 at 5:27
  • You are confusing design with implementation @a20, in any case I don't think being closed helps, most of the vulnerabilities that the US gov't are sitting on are in closed systems, for instance MS Operating Systems.
    – GdD
    Commented May 15, 2017 at 6:59
  • "most of the vulnerabilities that the US gov't are sitting on are in closed systems" .. fair point. What did you mean that I'm confusing design with implementation?
    – a20
    Commented May 16, 2017 at 8:01
  • @a20 I'm really late to this party, but I'm guessing that GdD was referring to the fact that those real-world vulnerabilities you're talking about are rarely fundamental flaws in a cryptographical standard, more often they're flaws in the way a software product uses or implements the standard. For example, if I write an implementation of an open protocol that is vulnerable because I used a timestamp instead of a crypto-PRNG as a seed, that vulnerability is evidence of a weakness in my implementation, not in the open protocol necessarily. And closed systems are arguably more susceptible here. Commented Jul 1, 2020 at 23:22
19

Convince him with good reasoning. Don't berate him.

You have to think about why we are hashing passwords: The reason is to protect the original password by making the hashing process take a lot of CPU time to execute. Brute force is the way the passwords are typically recovered.

If the attacker is able to steal your password database then they've managed to access your database. If they have access to the database then they probably have access to the code that produces these passwords as well. After reviewing the code they can start brute-forcing the passwords because the hashing algorithm takes too little time to execute.

The idea behind well-known hashing methods and practices like using PBKDF2 with 10,000 iterations is that they take a lot of time to execute on current hardware.

If you have some time on your hands you could also write a brute-forcing program and show him how many attacks per second you can execute with his algorithm vs prevailing standards.

5
  • 5
    Just because they can access the DB, does not mean they have access to his code.
    – jmoreno
    Commented Dec 18, 2012 at 16:48
  • 1
    @jmoreno Still doesn't justify his useless scheme.
    – Thomas
    Commented Dec 18, 2012 at 20:39
  • 2
    @Thomas: true, and the reason for rejecting bcrypt is simply unsupportable, but that doesn't mean that your code can't add (or remove) value from using the standard libraries and practicies.
    – jmoreno
    Commented Dec 18, 2012 at 22:02
  • 1
    @jmoreno In general, security by obscurity is perceived as a crutch for poor cryptography practices, and it often is. That doesn't mean it doesn't have value - it does have its uses, but in this situation it is not warranted.
    – Thomas
    Commented Dec 18, 2012 at 22:13
  • @jmoreno, right, which is why I said 'probably'. The main point is that the hashing must take time. Commented Dec 19, 2012 at 14:56
11

I've written several hashing algorithms. There's nothing wrong with it, if you know what you're doing. In a very slight way, he's right about the fact that tried-and-true algorithms may be a little more vulnerable to attack. So if you can create a good algorithm, you're golden. The only problem is that his totally sucks, for a number of reasons.

  1. // timestamp, random # $time = date('mdYHis');

    I don't even feel like I need to explain this one. He was trying to create a seeded "random" number when it doesn't make any sense to do so. The hashes will change day to day. I cannot imagine this working in any reasonable scenario.

  2. He says that he want's to stray away from known hashing algorithms, yet he uses MD5 (the known-est of them all, and a fairly weak one too) for the bulk of "his" algorithm. His method is only a very slight derivative of plain ole MD5 hashing. Which brings me to:
  3. He's shuffling his MD5'd result. Uh, why? As a fun little activity, try asking him to explain his reasoning there. And I'll give you a tip to help you out: Whatever his answer is -- it's wrong. Shuffling a hashed value is like shuffling a deck of all-blank cards. There is no extra benefit from it whatsoever.

There are more things wrong with it, too, but these are the big ones. Tell him to get back to his usual work, and just use bcrypt (or scrypt, which is probably even a little more secure to do the fact that it's really CPU intensive)

3
  • 3
    Yes you can write your own, but most people who think they are qualified to write their own are not qualified to write their own. It's easy to judge how smart we are, but difficult to judge how dumb we are. Commented Dec 21, 2012 at 19:44
  • 2
    I think the reason for the shuffling is to make the algorithm different from anything an attacker (who doesn't know the algorithm) expects. Using a application-specific secret salt input (aka "pepper") has this same effect and is more reliable. Commented Dec 22, 2012 at 2:04
  • Regarding #3: A shuffled MD5 hash cannot be looked up in a rainbow table. Commented Jul 3, 2019 at 21:09
10

Steganography would benefit from Dave's operational secrecy better than Cryptography. This might be what Dave was intuitively aiming for.

Cryptography

Each cryptographic solution benefits from widest possible exposure because relying on a secret other than the private key makes operational security for a crypto-consumer much harder. A key is a single easy-to-manage, portable, predictable, well-defined security risk that crypto-experts have focused great deal of effort on ensuring that all risk in the algorithm has been squeezed into only the key.

Keys can be made unguessable (high-entropy) as to require brute-force or side-channel attacks. Secrets in other aspects of an algorithm can not be made as unguessable because any other aspect of an encryption algorithm be generalised into a universal class of (possibly weaker) encryption algorithm + a constant value. This constant value representing all residual invariant entropy in the algorithm's structure or obfuscation. This value is never particularly large or entropic in a home-brew algorithm and is inherently constant in any non-self-modifying computer program. An attacker can either acquire the source and remove this (minor) entropy directly or run the ciphertext through a generalised class of algorithm. NSA for example would be doing precisely this for unknown variants of common algorithms in use by foreign powers.

Steganography

Each steganographic solution benefits from the least public exposure because hiding the location of secrets requires the attacker to customise their search in a fashion potentially unique for that victim; and scales to the amount possible hiding places and methods of hiding at the victim's disposal.

This could be as simple as having a fake hash password column and hiding the real one in four (64bit) time_t columns across various tables that record user information. This can be as complex as hiding a Portable Linux emulation inside a BluRay movie.

Steganography ultimately relies on creating or modifying an artefact that has a public semantic meaning or purpose and secondary private semantic meaning or purpose. An person or organisation benefits from hiding secondary semantic meanings in much the same way as any lie by omission. Be that "My sock is boring and doesn't contain diamonds" or "These are the hashes you are looking for, ignore that honeypot or that socket connection to an audit server".

In Combination

So Dave should use public vetted high-level libraries like BouncyCastle instead of a home-brew of low level crypto-primatives; but should feel free to make the hacker navigating the server ecology feel like they are reading a brainfuck program that's missing some characters.

Assuming full support documentation is kept in a secure offline location for Dave's successors...

8

I'm going to try a different approach to answering this. I'm not going to tell you something is bad, and not explain why. I'll explain exactly why it's bad, step-by-step, and show you how to break it, but I won't go too deeply into Rainbow Tables... it's assumed you know what that is.

Here's why your developer's hash is incorrect: because it just introduces a few factors that could be utterly brute-forced once the method is discovered, even if he were to randomize the shuffling.

How? Because if I have access to your server, I also know the method which your developer is using to hash your password because I've stolen that code too, and I can rearrange everything to suit my little crackpot application.

Let's say I hacked your company, and gained access to your database so that I can obtain these hashes. I know WHEN users register, correct? I know when they last changed their password because, unless you're a complete newbie, you should be logging that information to the database.

$time = date('mdYHis');

This timestamp is probably around the same time the the user registered his or her password, or changed it. We can know this because the last time it was updated should be stored in the database. The time you registered should be stored in the database. Part of the crypt has been defeated.

$rand = mt_rand().'\n';

Cool, a random number generator. You know, nothing is really random on computers. Random number generators aren't truly random. This is the Mersenne Twister. All I need are 624 different combinations to figure out all future numbers. I've stolen your "salt" from the database as well, and I know how you're calculating it.

Because I know the dates on which your users registered, I know how to recreate the correct number for the Mersenne Twister routine at that specific time. I can write a little cracking method to extract the exact number that would likely happen at that time. Your salt is no longer random. But you know what? It's irrelevant. I know the exact format you're using for that salt, and I likely got it from your database anyway. Then I can use Rainbow Tables to utterly annihilate your salt in under a second because mt_rand() alone produces a maximum of 10^10 hashes, and a minimum of 10^9, which means 11 billion possibilities, and since all of them are numbers, and I know the format of your salt, they will be cracked in less than a second.

So now I know who this username belongs to because I jacked your database. I know when they registered, I know when they last changed their password, and I know your random number. I see you, Dave, a thief on the roof. My new satellite link has both infrared and the x-ray spectrum. I see your heart beating. I see you are afraid.

Now that $crypt has been defeated, let's take a look at function hash_it().

function hash_it($string1, $string2) {
    $pass = md5($string1);
    $nt = substr($pass,0,8);
    $th = substr($pass,8,8);
    $ur = substr($pass,16,8);
    $ps = substr($pass,24,8);

    $hash = 'H'.sha1($string2.$ps.$ur.$nt.$th);
    return $hash
}

Now let's take a look at some sample input: hash_it(Dave, testpassword1);

Results:

$pass = "b7e055c6165da55c3e12c49ae5207455"
$nt = "b7e055c6";
$th = null;
$ur = "3e12c49a";
$ps = "e5207455";

$hash Input: "H" + "username + "RegistrationDate" + "1604716014" + "e5207455" + "3e12c49a" + "b7e055c6"(tldr: "HDaveRegistrationDate1604716014e52074553e12c49ab7e055c6")

$hash = 'H'.sha1("DaveRegistrationDate1604716014e52074553e12c49ab7e055c6);

`$hash output: "6b347a1521b6b84501806268614abe1e7324c703"; (same thing every time)

Here's how an attack might work:

  1. Break Mersenne Twister (mt_rand()) after 624 observations, or just Rainbow Table the salt.
  2. Grab Dave's REGISTRATION_DATE from the database, or the USER_LAST_MODIFIED_DATE
  3. Grab Dave's SHA1 hash
  4. Initiate brute force attack (it becomes a static sha1 hash after we've broken your "security") with the following parameters:
    • "Username" + "RegistrationDate" + YourRandomNumber + DictionaryAttack[]
  5. Harm your customers.

And that's the story of why you shouldn't roll your own cryptography unless you really know what you're doing. Imagine if I were an actual cracker, and not someone who just went backwards through your code.

4
  • 1
    And what if you don't have access to the code as you presume?
    – monster
    Commented Nov 24, 2015 at 9:51
  • 1
    @monster Of course I have access to the code. It's PHP, and he's trying to protect his database, which I've already stolen. The point of hashing passwords is to protect your users once you've been breached. If I can get a hold of your database, getting a hold of the rest of the important files on the web server is a trivial task. Commented Nov 24, 2015 at 11:56
  • Let us continue this discussion in chat. Commented Nov 24, 2015 at 15:21
  • 2
    @MarkBuffalo Of course I have access to the code. It's PHP, and he's trying to protect his database There are many situations where SQLi may allow for dumping the database without obtaining the PHP files. Of course, that also means that Dave should just be using a pepper, since it accomplishes the same threat model but isn't so... broken.
    – forest
    Commented Dec 12, 2017 at 6:51

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .