0

I'm doing some research on PHP's str_shuffle function, I already asked this question multiply times on StackOverflow but got no answer https://stackoverflow.com/questions/61968859/is-it-possible-to-predict-the-next-output-for-the-str-shuffle-function-in-php.

I'm interested in how would you break str_shuffle, since PHP's docs states "It does not generate cryptographically secure random values", hence I'm guessing it can be compromised. But how? What I've tried so far is a bruteforce attack wrote about it here: https://stackoverflow.com/questions/62106860/bruteforcing-a-32-bit-seed-in-php

Why is str_shuffle() used so frequently for generating random tokens? I see alot of this kind of code:

function generate($letters, $length = 8) // $letters is a pool of characters and numbers
{
return substr(str_shuffle($letters), 0, $length);
}

Is this enough for security or not?

1
  • 1
    "Why is str_shuffle() used so frequently?" - Because a lot of programmers, especially PHP programmers, don't know about security. A programmer wants a random token and that function seems to do that just fine. Even if they're told "it's not random enough", as long as they can't find a pattern in it, they will dismiss it as good enough.
    – user163495
    Commented Jul 27, 2020 at 8:05

1 Answer 1

4

Per https://www.php.net/manual/en/function.str-shuffle.php, str_shuffle uses PHP's internal PRNG, which is an implementation of Mersenne Twister (see http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html or https://en.wikipedia.org/wiki/Mersenne_Twister). According to Wikipedia, the standard Mersenne Twister implementation requires observing 624 iterations to be able to predict all future values. For a moderately low-traffic site, this would be easy; just write a script that logs in (or whatever else generates the output of such a function) as fast as possible, as many times as possible, until you get the desired number of outputs in sequence. Depending on how str_shuffle is implemented, a single output may involve multiple invocations of mt_rand (PHP's Mersenne Twister-based PRNG), so you might not actually need a large number of outputs in sequence if you can infer the PRNG outputs by looking at the generated strings. Conveniently, PHP is open source, so you can go look at exactly how these functions work.

Leaving that aside, though, there are other problems with the approach you've described.

  1. An output of length 8 is simply insufficient. Even if each value was a perfectly randomly chosen byte, that's only 64 bits of entropy; while 64 bits is enough that it's not practical to brute force any particular token quickly (assuming a network connection is involved), if there are enough tokens active at once then a birthday attack can be used to find some valid token in much less time than you might think.
  2. Even if a cryptographically secure random number generator were used, constraining the input set as done weakens the security of the output. For example, suppose the input set was "abcdefghijklmnopqrstuvwxyz0123456789", that's only 36 characters and you can guarantee that after you see a character once you won't see it again. Even if $length is 36 - the most it can be - the number of possible permutations is only 36! (factorial(36)) rather than 36^36 (what you'd get if you selected with replacement; that is, if each position's value was chosen from the entire input rather than just the characters that hadn't been chosen before), which is nearly 286 trillion times as many possibilities. Both would actually be secure enough (assuming a CSPRNG!) - 36! is about 2^138, which is plenty of entropy for most auth tokens - but it does matter. With only length 8, you'd have 36!/28!, which is ~1.22 trillion possible combinations, only 40 bits of entropy and just over 43% as many possibilities as the with-replacement approach. A smaller input set, or one the same length but including repeated characters, would be weaker still for those lengths.

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .