I have a method that generates 50,000 random strings, saves them all to a file, then runs through the file and deletes every duplicate string. Out of those 50,000 random strings, after using set() to keep only the unique ones, on average just 63 are left.
Function to generate the strings:
    import random
    import string

    def random_strings(size=8, chars=string.ascii_uppercase + string.digits + string.ascii_lowercase):
        return ''.join(random.choice(chars) for _ in xrange(size))
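For scale, here is a rough sanity check of how many duplicates to expect. It is a Python 3 sketch, not the code from this post: `make_string`, `ALPHABET`, and the fixed seed are assumptions made for illustration. With 62 possible characters and a length of 8 there are 62**8 possible strings, so 50,000 draws should almost never collide, and set() on an in-memory list should keep essentially all of them.

```python
import random
import string

# 26 uppercase + 10 digits + 26 lowercase = 62 characters
ALPHABET = string.ascii_uppercase + string.digits + string.ascii_lowercase


def make_string(rng, size=8, chars=ALPHABET):
    """Hypothetical stand-in for random_strings(), using an explicit RNG."""
    return ''.join(rng.choice(chars) for _ in range(size))


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
generated = [make_string(rng) for _ in range(50000)]
unique = set(generated)  # set() keeps one copy of each distinct string

space = len(ALPHABET) ** 8
# Birthday-style estimate of the number of colliding pairs: n(n-1) / (2 * space)
expected_pairs = 50000 * 49999 / (2.0 * space)

print(len(generated), len(unique), expected_pairs)
```

If this estimate is right, a result of 63 unique strings cannot be explained by real duplicates among the generated strings.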
Write the strings to a file, then delete duplicates:
    import sys

    with open("dicts/temp_dict.txt", "a+") as data:
        created = 0
        while created != 50000:
            string = random_strings()
            data.write(string + "\n")
            created += 1
            sys.stdout.write("\rCreating password: {} out of 50000".format(created))
            sys.stdout.flush()
        print "\nRemoving duplicates.."
        with open("dicts\\rainbow-dict.txt", "a+") as rewrite:
            rewrite.writelines(set(data))
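One thing that may be relevant here, offered as an assumption rather than a confirmed diagnosis: after the writes, the position of a file opened in "a+" mode is at the end of the file, so iterating over it without rewinding does not start from the first line. The sketch below is Python 3 (where this case is well defined) and uses a hypothetical temporary path and four sample words instead of the real files. In Python 2, reading right after writing with no seek in between may behave differently and less predictably.

```python
import os
import tempfile

# Hypothetical temporary path, so the sketch does not touch dicts/
path = os.path.join(tempfile.mkdtemp(), "temp_dict.txt")

with open(path, "a+") as data:
    for word in ["aaa", "bbb", "aaa", "ccc"]:
        data.write(word + "\n")

    # The position is at the end of the file after the writes,
    # so iterating here starts at the end rather than at line one.
    without_seek = set(data)

    data.seek(0)  # rewind to the start before reading back
    with_seek = set(data)

print(len(without_seek), len(with_seek))
```

Under that assumption, calling `data.seek(0)` before `set(data)` would be the smallest change to try in the code above.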
Example of before and after: https://gist.github.com/Ekultek/a760912b40cb32de5f5b3d2fc580b99f
How can I generate completely random unique strings without duplicates?
What is set(data) supposed to do here?
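For the first part of the question, one common approach is to collect the strings in a set while generating them and stop once the set holds the required count, so duplicates never reach the file. This is a minimal Python 3 sketch under that assumption: `unique_random_strings`, `CHARS`, and the `rng` parameter are hypothetical names, not part of the code above.

```python
import random
import string

CHARS = string.ascii_uppercase + string.digits + string.ascii_lowercase


def unique_random_strings(count, size=8, chars=CHARS, rng=random):
    """Hypothetical helper: draw strings until `count` distinct ones exist."""
    # Guard against asking for more distinct strings than can exist.
    if count > len(set(chars)) ** size:
        raise ValueError("not enough distinct strings of this size")
    seen = set()
    while len(seen) < count:
        # Adding a string that is already present leaves the set unchanged.
        seen.add(''.join(rng.choice(chars) for _ in range(size)))
    return seen


words = unique_random_strings(50000)
print(len(words))
```

The resulting set can be written out in one go, for example with `writelines(w + "\n" for w in words)`, which avoids the need to read the file back for deduplication.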