Generate a lot of random strings

Question

I have a method that will generate 50,000 random strings, save them all to a file, and then run through the file, and delete all duplicates of the strings that occur. Out of those 50,000 random strings, after using set() to generate unique ones, on average 63 of them are left.

Function to generate the strings:

def random_strings(size=8, chars=string.ascii_uppercase + string.digits + string.ascii_lowercase):
    return ''.join(random.choice(chars) for _ in xrange(size))

Delete duplicates:

    with open("dicts/temp_dict.txt", "a+") as data:
        created = 0
        while created != 50000:
            string = random_strings()
            data.write(string + "\n")
            created += 1
            sys.stdout.write("\rCreating password: {} out of 50000".format(created))
            sys.stdout.flush()

        print "\nRemoving duplicates.."
        with open("dicts\\rainbow-dict.txt", "a+") as rewrite:
            rewrite.writelines(set(data))

Example of before and after: https://gist.github.com/Ekultek/a760912b40cb32de5f5b3d2fc580b99f

How can I generate completely random unique strings without duplicates?

Do you require 2 files or do you just want 50000 unique strings? — Simon Black, Commented Oct 18, 2016 at 18:26

Paul · Accepted Answer · 2016-10-18 19:36:34Z


You can use a set from the start:

created = set()
while len(created) < 50000:
    created.add(random_strings())

Then save once, outside the loop.
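A minimal sketch of that idea, written so it also runs on Python 3 (`range` instead of `xrange`). The helper names `unique_random_strings` and `save_strings` are hypothetical, introduced only for illustration; the output path in the usage comment is the one from the question.

```python
import random
import string

CHARS = string.ascii_uppercase + string.digits + string.ascii_lowercase


def random_string(size=8, chars=CHARS):
    """Return one random string of `size` characters drawn from `chars`."""
    return ''.join(random.choice(chars) for _ in range(size))


def unique_random_strings(count, size=8, chars=CHARS):
    """Collect random strings in a set until `count` distinct ones exist."""
    if count > len(chars) ** size:
        # Without this guard the loop below could never finish.
        raise ValueError("count exceeds the number of possible strings")
    created = set()
    while len(created) < count:
        # Adding a duplicate leaves the set unchanged, so the loop retries.
        created.add(random_string(size, chars))
    return created


def save_strings(strings, path):
    """Write every string on its own line, in a single pass."""
    with open(path, "w") as out:
        out.writelines(s + "\n" for s in strings)


# Usage (hypothetical call, path taken from the question):
# save_strings(unique_random_strings(50000), "dicts/rainbow-dict.txt")
```

Because duplicates are rejected while generating, the result always holds exactly `count` strings, and the file is written only once.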

edited Oct 18, 2016 at 19:36

Paul


answered Oct 18, 2016 at 18:25

volcano


Wouldn't this slow down the process a whole lot, though?
– Pyth0nicPenguin
Commented Oct 18, 2016 at 18:29
@Pyth0nicPenguin, less than rewriting files. And if you remove duplicates after you have created 50k words, you end up with fewer words.
– volcano
Commented Oct 18, 2016 at 18:32
BTW, if you are worried about execution time, you probably shouldn't log every one of the 50k combinations.
– volcano
Commented Oct 18, 2016 at 18:33
I'm not too worried about the execution time. I was just curious as to why I was only getting 63 out of 50k, is all. I'll give this a shot and see what happens, thank you.
– Pyth0nicPenguin
Commented Oct 18, 2016 at 18:35
@Pyth0nicPenguin, probably because you get too many repetitions. random is not as random as advertised :-). And you are welcome.
– volcano
Commented Oct 18, 2016 at 18:37
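
A rough check on the "only 63 left" observation. Two things are sketched here: the expected number of colliding pairs among 50,000 draws from 62⁸ possibilities (a standard birthday-style estimate), and what happens when a file opened in "a+" mode is iterated right after writing without rewinding. That the missing rewind contributes to the result in the question is an assumption, not something confirmed in this thread; the function names are hypothetical, and the behaviour shown is for Python 3 (on Python 2.7 the outcome of reading straight after writing can depend on the platform).

```python
import os
import tempfile


def expected_collisions(n, space):
    """Approximate expected number of equal pairs among n uniform draws."""
    return n * (n - 1) / 2.0 / space


def count_lines_after_write(rewind):
    """Write three lines through an "a+" handle, then iterate the same handle."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "a+") as data:
            data.write("one\ntwo\nthree\n")
            if rewind:
                data.seek(0)  # go back to the start before reading
            # Iteration starts at the current file position.
            return len(set(data))
    finally:
        os.remove(path)


# With 62**8 possible strings, duplicates among 50,000 draws are very unlikely.
print(expected_collisions(50000, 62 ** 8))
print(count_lines_after_write(rewind=False))
print(count_lines_after_write(rewind=True))
```

The estimate comes out far below one expected duplicate, which suggests that genuine repetition is not what shrinks the output. Calling `data.seek(0)` before `set(data)` is the one-line change the sketch points at.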


trincot · Accepted Answer · 2016-10-18 19:25:07Z

You can guarantee unique strings by generating unique numbers. Start with a random number in a range that is 1/50000th of the total number of possibilities (62⁸). Then generate further random numbers, each time choosing from a window that begins just above the previous number, so the sequence is strictly increasing and no value can repeat. This is not perfectly random, but I believe it is close enough in practice.

Each of these numbers can then be converted to a string by treating it as a base-62 number whose digits are drawn from chars. Here is the code, with a test at the end to check that all 50000 strings are indeed unique:

import string
import random

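def random_strings(count, size=8, chars=string.ascii_uppercase + string.digits + string.ascii_lowercase):
    # Largest number that fits in `size` digits of base len(chars)
    max_value = len(chars) ** size - 1
    start = 0
    choices = []
    for i in range(count):
        # Pick the next number from a window above the previous one,
        # leaving room for the numbers still to be drawn
        start = random.randint(start, start + (max_value - start) // (count - i))
        digits = []
        temp = start
        # Convert to `size` digits, least significant digit first
        while len(digits) < size:
            temp, digit = divmod(temp, len(chars))
            digits.append(chars[digit])
        choices.append(''.join(digits))
        start += 1  # the next number must be strictly larger
    return choices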

choices = random_strings(50000)
# optional shuffle, since they are produced in order of `chars`
random.shuffle(choices)
# Test: output how many distinct values there are:
print (len(set(choices)))

See it run on repl.it

This produces your strings in linear time. With the above parameters you'll have the answer within a second on the average PC.
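
For comparison, a different technique (not the windowing method above): draw the numbers with `random.sample`, which selects without replacement, and then apply the same base-62 conversion. This is a sketch assuming Python 3 on a 64-bit build, where `range(62 ** 8)` is a lazy sequence that `random.sample` accepts; the names `to_base` and `sample_unique_strings` are hypothetical.

```python
import random
import string

CHARS = string.ascii_uppercase + string.digits + string.ascii_lowercase


def to_base(number, size, chars):
    """Write `number` as exactly `size` digits in base len(chars)."""
    digits = []
    for _ in range(size):
        number, index = divmod(number, len(chars))
        digits.append(chars[index])
    # Digits were produced least significant first, so reverse them.
    return ''.join(reversed(digits))


def sample_unique_strings(count, size=8, chars=CHARS):
    """Return `count` distinct strings by sampling distinct integers."""
    total = len(chars) ** size
    # random.sample never repeats an element and raises ValueError
    # if count is larger than the population.
    numbers = random.sample(range(total), count)
    return [to_base(n, size, chars) for n in numbers]


choices = sample_unique_strings(50000)
print(len(set(choices)))
```

Distinct numbers below 62⁸ map to distinct 8-character strings, so no separate duplicate check is needed. For much larger alphabets or sizes the range may exceed what `len()` can report, in which case this sketch would not apply as written.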
