62

For a project, I need a method of creating thousands of random strings while keeping collisions low. I'm looking for them to be only 12 characters long and uppercase only. Any suggestions?

4
  • 3
    You mean you don't want any lowercase digits?
    – martineau
    Commented Aug 19, 2013 at 17:02
  • Hmm, yeah, that should be clarified :) Commented Aug 19, 2013 at 17:02
  • Don't forget to read this page about the default random number generator in python. The chance of collisions seems to be fully dependent on the size of the "random strings", but that does not mean that an attacker cannot re-create the random numbers; the random numbers generated are not cryptographically secure. Commented Aug 19, 2013 at 17:10
  • Hah, right. I meant alphanumeric.
    – Brandon
    Commented Aug 20, 2013 at 15:08

7 Answers

139

CODE:

from random import choice
from string import ascii_uppercase

print(''.join(choice(ascii_uppercase) for i in range(12)))

OUTPUT:

5 examples:

QPUPZVVHUNSN
EFJACZEBYQEB
QBQJJEEOYTZY
EOJUSUEAJEEK
QWRWLIWDTDBD

EDIT:

If you need only digits, use the digits constant instead of the ascii_uppercase one from the string module.

3 examples:

229945986931
867348810313
618228923380
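
If the character set needs to vary, one option is a small helper that takes the alphabet as a parameter. This is only a sketch: the name random_string and its parameters are made up for illustration, and it relies on the same non-cryptographic random module as the code above. Note that string.digits is available in Python 3 as well as Python 2.

```python
from random import choice
from string import ascii_uppercase, digits


def random_string(length=12, alphabet=ascii_uppercase):
    """Return `length` characters drawn uniformly from `alphabet` (hypothetical helper)."""
    return ''.join(choice(alphabet) for _ in range(length))


print(random_string())                                    # uppercase letters only
print(random_string(alphabet=digits))                     # digits only
print(random_string(alphabet=ascii_uppercase + digits))   # uppercase alphanumeric
```

Passing ascii_uppercase + digits gives 36 possible symbols per position instead of 26, which lowers the collision chance for the same length.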
7
  • 4
    yeah, well this is misleading: "12 digits long and uppercase" -- since digits can't be uppercased
    – Peter Varo
    Commented Aug 19, 2013 at 17:01
  • And if you need alphanumeric, i.e. ASCII uppercase plus digits, then also import digits from string and use print(''.join(choice(ascii_uppercase + digits) for i in range(12))) Commented Jan 5, 2017 at 12:45
  • Does this give a unique ID each time? What if I call this function from multiple threads (e.g. 2 of them) 10000 times? What is the probability of a collision, i.e. getting the same ID at some point?
    – AnilJ
    Commented Sep 6, 2017 at 22:43
  • @AnilJ for further info on how the random module works, please read the official documentation: docs.python.org/3/library/random.html
    – Peter Varo
    Commented Sep 7, 2017 at 7:44
  • Well, digits is not on Python3. You can use string.hexdigits to get a mix of '0123456789abcdefABCDEF', or just string.digits + string.ascii_letters for all letters.
    – goetz
    Commented Oct 31, 2017 at 1:20
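
On the collision question raised in the comments, a rough estimate is possible with the birthday approximation. The sketch below assumes every string is drawn uniformly and independently from all 26^12 possibilities; the function name collision_probability is made up for illustration. It says nothing about uniqueness guarantees, since random strings can always repeat.

```python
import math


def collision_probability(n, alphabet_size=26, length=12):
    """Approximate chance that n uniform random strings contain at least one duplicate."""
    space = alphabet_size ** length  # number of possible strings
    # Birthday approximation: p ~ 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (10_000, 1_000_000, 100_000_000):
    print(n, collision_probability(n))
```

For 10,000 strings the estimate is on the order of 1e-10, so collisions are very unlikely at that scale, but the probability grows roughly with the square of the number of strings generated.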
24

With Django, you can use the get_random_string function from the django.utils.crypto module.

get_random_string(length=12,
    allowed_chars=u'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
    Returns a securely generated random string.

    The default length of 12 with the a-z, A-Z, 0-9 character set returns
    a 71-bit value. log_2((26+26+10)^12) =~ 71 bits

Example:

get_random_string()
u'ngccjtxvvmr9'

get_random_string(4, allowed_chars='bqDE56')
u'DDD6'

If you don't want to depend on Django, here is a standalone version of the same code:

Code:

import random
import hashlib
import time

SECRET_KEY = 'PUT A RANDOM KEY WITH 50 CHARACTERS LENGTH HERE !!'

try:
    random = random.SystemRandom()
    using_sysrandom = True
except NotImplementedError:
    import warnings
    warnings.warn('A secure pseudo-random number generator is not available '
                  'on your system. Falling back to Mersenne Twister.')
    using_sysrandom = False


def get_random_string(length=12,
                      allowed_chars='abcdefghijklmnopqrstuvwxyz'
                                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'):
    """
    Returns a securely generated random string.

    The default length of 12 with the a-z, A-Z, 0-9 character set returns
    a 71-bit value. log_2((26+26+10)^12) =~ 71 bits
    """
    if not using_sysrandom:
        # This is ugly, and a hack, but it makes things better than
        # the alternative of predictability. This re-seeds the PRNG
        # using a value that is hard for an attacker to predict, every
        # time a random string is required. This may change the
        # properties of the chosen random sequence slightly, but this
        # is better than absolute predictability.
        random.seed(
            hashlib.sha256(
                ("%s%s%s" % (
                    random.getstate(),
                    time.time(),
                    SECRET_KEY)).encode('utf-8')
            ).digest())
    return ''.join(random.choice(allowed_chars) for i in range(length))
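
On Python 3.6 and later, the standard-library secrets module may be a simpler route to the same goal, since it draws from the operating system's secure random source. The following is a minimal sketch, not Django's implementation; the names secure_random_string and ALPHABET are made up for illustration.

```python
import secrets
import string

# Assumed character set for this example: uppercase letters plus digits.
ALPHABET = string.ascii_uppercase + string.digits


def secure_random_string(length=12, allowed_chars=ALPHABET):
    """Hypothetical helper: pick each character with the OS-backed CSPRNG."""
    return ''.join(secrets.choice(allowed_chars) for _ in range(length))


print(secure_random_string())
```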
4

Could make a generator:

from string import ascii_uppercase
import random
from itertools import islice

def random_chars(size, chars=ascii_uppercase):
    selection = iter(lambda: random.choice(chars), object())
    while True:
        yield ''.join(islice(selection, size))

random_gen = random_chars(12)
print(next(random_gen))
# LEQIITOSJZOQ
print(next(random_gen))
# PXUYJTOTHWPJ

Then just pull strings from the generator as they're needed, either one at a time with next(random_gen) or in batches, for instance random_200 = list(islice(random_gen, 200)).
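
A minimal sketch of taking a batch from such a generator, written for Python 3. The generator body here is a simplified variant (a plain loop rather than the iter/islice construction), so treat it as an illustration rather than the exact code from this answer.

```python
import random
from itertools import islice
from string import ascii_uppercase


def random_chars(size, chars=ascii_uppercase):
    """Yield an endless stream of random strings of the given size."""
    while True:
        yield ''.join(random.choice(chars) for _ in range(size))


random_gen = random_chars(12)
random_200 = list(islice(random_gen, 200))  # take exactly 200 strings
print(len(random_200), random_200[0])
```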

8
  • 2
    And the advantage of using a generator for this would be?
    – martineau
    Commented Aug 19, 2013 at 17:10
  • @martineau can take one at a time, set up ones with different variables, can slice off to take n many at a time etc... The main difference is that it's in effect an iterable itself, instead of repeatedly calling a function... Commented Aug 19, 2013 at 17:12
  • Why wouldn't you just repeatedly call a function? Commented Aug 19, 2013 at 17:50
  • functools.partial can fix parameters, and list(itertools.islice(gen, n)) isn't any better than [func() for _ in xrange(n)] Commented Aug 19, 2013 at 17:58
  • @user2357112 by building a generator, there's an advantage over resuming its state, than setting up and calling up a function repeatedly... Also the list and islice will work at the implementation level instead of as a list-comp that could leak its _ (in Py 2.x) variable and has to build an unnecessary range constraint that's otherwise handled... Also, it's also harder to build on top of functions, rather than streams... Commented Aug 19, 2013 at 18:05
1
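#!/usr/bin/env python3
import random
import string

def f(n: int) -> str:
    # Sample n byte values from the uppercase alphabet, then decode them in one step.
    return bytes(random.choices(string.ascii_uppercase.encode('ascii'), k=n)).decode('ascii')

This runs faster for very large n because it avoids repeated str concatenation. Note that random.choices requires Python 3.6 or later.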

0

For cryptographically strong pseudo-random bytes you might use the pyOpenSSL wrapper around OpenSSL.

It provides the bytes function to gather a pseudo-random sequence of bytes.

from OpenSSL import rand

b = rand.bytes(7)

BTW, 12 uppercase letters is a little bit more than 56 bits of entropy, so you will only have to read about 7 bytes.

2
  • 1
    Wouldn't 12 randomly selected uppercase letters correspond to ~56.4 bits worth of entropy?
    – DSM
    Commented Aug 19, 2013 at 17:40
  • 1
    rand.bytes is no longer supported in the latest versions of OpenSSL
    – david
    Commented Sep 6, 2018 at 12:15
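
The entropy figure discussed above can be checked directly, and os.urandom from the standard library is one way to read cryptographically strong bytes without pyOpenSSL. This is a sketch under the assumption that each of the 12 letters is chosen uniformly; the variable names are illustrative.

```python
import math
import os

bits = 12 * math.log2(26)            # entropy of 12 uniformly chosen uppercase letters
needed_bytes = math.ceil(bits / 8)   # whole bytes needed to cover that entropy
raw = os.urandom(needed_bytes)       # cryptographically strong bytes from the OS

print(round(bits, 1), needed_bytes, raw.hex())
```

Seven bytes give exactly 56 bits, which is slightly below the roughly 56.4 bits computed here, so rounding up yields 8 bytes.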
0

This function generates a random string of uppercase letters with the specified length.

For example, length = 6 generates a random string such as:

YLNYVQ

    import random as r

    def generate_random_string(length):
        random_str_seq = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        # The original '-' separator branch (i % length == 0 and i != 0) could
        # never trigger for i in range(length), so it is dropped here.
        # r.choice picks one letter uniformly; join avoids repeated concatenation.
        return ''.join(r.choice(random_str_seq) for _ in range(length))
1
  • With above code random_str_seq = "ABC@#$%^!&_+|*()OPQRSTUVWXYZ" can give you even more complex results.
    – Iqra.
    Commented Jan 18, 2019 at 11:39
0

A random string generator without duplicates, using a set to store the values that have already been generated. Note that the set costs memory when the strings are long or many are requested, and generation slows down as more draws have to be rejected. The generator stops after the requested amount or once all possible combinations have been produced.

Code:

#!/usr/bin/env python

from typing import Generator
from random import SystemRandom as RND
from string import ascii_uppercase, digits


def string_generator(size: int = 1, amount: int = 1) -> Generator[str, None, None]:
    """
    Return x random strings of a fixed length.

    :param size: string length, defaults to 1
    :type size: int, optional
    :param amount: amount of random strings to generate, defaults to 1
    :type amount: int, optional
    :yield: Yield composed random string if unique
    :rtype: Generator[str, None, None]
    """
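    CHARS = list(ascii_uppercase + digits)
    LIMIT = len(CHARS) ** size  # number of distinct strings of this length
    rng = RND()  # create the SystemRandom instance once, not on every iteration
    count, check = 0, set()
    # Stop at the requested amount or once every possible string was produced.
    while count < min(amount, LIMIT):
        string = ''.join(rng.choices(CHARS, k=size))
        if string not in check:
            check.add(string)
            yield string
            count += 1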


for my_count, my_string in enumerate(string_generator(12, 20)):
    print(my_count, my_string)

Output:

0 IESUASWBRHPD
1 JGGO1THKLC9K
2 BW04A5GWBA7K
3 KDQTY72BV1S9
4 FAOL5L28VVMN
5 NLDNNBGHTRTI
6 2RV6TE6BCQ8K
7 B79B8FBPUD07
8 89VXXRHPUN41
9 DFC8QJUY6HRB
10 FXYYDKVQHC5Z
11 57KTZE67RSCU
12 389H1UT7N6CI
13 AKZMN9XITAVB
14 6T9ACH3GDAYG
15 CH8RJUQMTMBE
16 SPQ7E02ZLFD3
17 YD6JFXGIF3YF
18 ZUSA2X6OVNCN
19 JQRH6LR229Y4
