11
\$\begingroup\$

I just wrote a little Python 3 program to spit out every possible combination of a set of 99 characters. It gets the job done, but I would be very interested in what you think of it.

I am just a few days into Python, so I would be grateful for even seemingly obvious advice.


import sys

# List of 99 characters and a blank string:

lib=["","0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","°","!","\"","§","$","%","&","/","(",")","=","ß","´","`","+","#","-",".",",",">","<","@","€","|","^","~","–","{","[","]","}","Ä","Ö","Ü","ä","ö","ü"]


# 10 counters for up to 10-digit-combinations:

counter0=-1
counter1=0
counter2=0
counter3=0
counter4=0
counter5=0
counter6=0
counter7=0
counter8=0
counter9=0

# Repetitive if-statements adding to the counters:

for i in range(sys.maxsize**99999):
    counter0+=1
    
    if counter0>99:
        counter0=counter0*0
        counter1+=1
        
    elif counter1>99:
        counter1=counter1*0
        counter2+=1
        
    elif counter2>99:
        counter2=counter2*0
        counter3+=1
        
    elif counter3>99:
        counter3=counter3*0
        counter4+=1
        
    elif counter4>99:
        counter4=counter4*0
        counter5+=1
        
    elif counter5>99:
        counter5=counter5*0
        counter6+=1
        
    elif counter6>99:
        counter6=counter6*0
        counter7+=1
        
    elif counter7>99:
        counter7=counter7*0
        counter8+=1
        
    elif counter8>99:
        counter8=counter8*0
        counter9+=1
        
    elif counter9>99:
        print("DONE.")
        
# Printing the translation from counters to character - and deleting the former output so it stays in one line:
        
    else:
        print(lib[counter0]+lib[counter1]+lib[counter2]+lib[counter3]+lib[counter4]+lib[counter5]+lib[counter6]+lib[counter7]+lib[counter8]+lib[counter9], end="\r")
        sys.stdout.write("\b"*10+" "*10+"\b"*10)

\$\endgroup\$
0

3 Answers 3

21
\$\begingroup\$
  • We can convert a plain string to a list rather than maintain a list for the characters.

    It is far easier to modify and read the following then a list.

    lib = [''] + list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß´`+#-.,><@€|^~–{[]}ÄÖÜäöü')
    
  • When we have counter0, counter1, ..., countern it's a hint that we should use a list.

    counters = [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    

    We can then use counters[0] as a drop in replacement for counter0.

  • Now that we have counters which is a list we can simplify your print from the following.

    print(lib[counters[0]] + lib[counters[1]] + lib[counters[2]] + lib[counters[3]] + > lib[counters[4]] + lib[counters[5]] + lib[counters[6]] + lib[counters[7]] + lib[counters[8]] + lib[counters[9]], end="\r")
    

    We can use a for loop to loop through counters, index lib and print the character. We'll use end="" to get the same format that you have. Since we've changed from "\r" to "" we'll need to print that afterwards.

    for counter in counters:
        print(lib[counter], end="")
    print(end="\r")
    
  • It would be better to use len(lib) rather than hard coding 99 in your ifs. If we change the content of lib it's much easier to edit just lib rather than lib and 10 99's.

  • Rather than counter0=counter0*0 it would make more sense to remove the multiplication and just set the value to 0.

    counter0 = 0
    
  • It's convention in Python to put a space either side of operators. This means a+b should instead be a + b. This is as it makes it easier to see what is and is not an operator and either side of an operator.

  • It's convention in Python to use _ as a 'throw away' variable. This means it's normal to use _ rather than i in your for loop.

Together this gets:

import sys

lib = [''] + list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß´`+#-.,><@€|^~–{[]}ÄÖÜäöü')
counters = [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

for _ in range(sys.maxsize**99999):
    counters[0] += 1
    if counters[0] >= len(lib):
        counters[0] = 0
        counters[1] += 1
    elif counters[1] >= len(lib):
        counters[1] = 0
        counters[2] += 1
    elif counters[2] >= len(lib):
        counters[2] = 0
        counters[3] += 1
    elif counters[3] >= len(lib):
        counters[3] = 0
        counters[4] += 1
    elif counters[4] >= len(lib):
        counters[4] = 0
        counters[5] += 1
    elif counters[5] >= len(lib):
        counters[5] = 0
        counters[6] += 1
    elif counters[6] >= len(lib):
        counters[6] = 0
        counters[7] += 1
    elif counters[7] >= len(lib):
        counters[7] = 0
        counters[8] += 1
    elif counters[8] >= len(lib):
        counters[8] = 0
        counters[9] += 1
    elif counters[9] >= len(lib):
        print("DONE.")
    else:
        for counter in counters:
            print(lib[counter], end="")
        print(end="\r")
        sys.stdout.write("\b"*10 + " "*10 + "\b"*10)

There are still some changes that we can make to your code to make it easier to work with. These are a bit more advanced so don't worry if you don't get them straight away.

  • We can change your big if elif block into a single for loop.
    Lets see what we have so far:

    if counters[0] > len(lib):
        counters[0] = 0
        counters[1] += 1
    

    We know that this section is repeated each time for each index. So we can make this generic by changing 0 to index and 1 to index + 1.

    if counters[index] >= len(lib):
        counters[index] = 0
        counters[index + 1] += 1
    

    Now we just need to loop over range(len(counters) - 1) to duplicate the block 9 times.

    for index in range(len(counters) - 1):
        if counters[index] >= len(lib):
            counters[index] = 0
            counters[index + 1] += 1
    
  • We can use some Python sugar to make your print for loop 'cleaner'. Firstly we can remove all the prints by building a list.

    combination = []
    for counter in counters:
        combination.append(lib[counter])
    

    From here we can join all the strings together with "".join and pass it to print like you did before. This will join the list by empty strings so it converts it's like manually doing combination[0] + combination[1] + ....

    print("".join(combination), end="\r")
    

    We can then use a list comprehension to build combination in one line. This is just syntaxtic sugar and is the same as the for loop we used before. It's just different, cleaner, syntax.

    combination = [lib[counter] for counter in counters]
    
  • We can use either a while True loop or itertools.count rather than range(sys.maxsize**99999) to loop infinitely.

    while True:
        counters[0] += 1
    
    import itertools
    
    for _ in range(itertools.count()):
        counters[0] += 1
    
  • We can probably just use print rather than sys.stdout.write.

    To make it so there is no newline we can pass end="". This however doesn't flush the stream straight away, and so we need to pass flush=True.

lib = [''] + list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß´`+#-.,><@€|^~–{[]}ÄÖÜäöü')
counters = [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

while True:
    counters[0] += 1
    for index in range(len(counters) - 1):
        if counters[index] >= len(lib):
            counters[index] = 0
            counters[index + 1] += 1
    
    if counters[9] >= len(lib):
        print("DONE.")
    else:
        print("".join([lib[counter] for counter in counters]), end="\r")
        print("\b"*10 + " "*10 + "\b"*10, end="", flush=True)
\$\endgroup\$
6
  • 1
    \$\begingroup\$ Very nice answer! One thing I'd add, is that this kind of function seems to be much more useful as a generator. Maybe move the code into a function and a main where the desired amount of data is printed from the generator. \$\endgroup\$
    – jaaq
    Commented Oct 5, 2020 at 7:17
  • 2
    \$\begingroup\$ @jaaq Thank you! I would normally agree with you 110% however I feel bringing generator functions into the mix may cause significant confusion for the OP. I know the OP wants the code to run infinitely and figuring out two things at the some time is likely to be harder than just one. (generator functions and making it run infinitely) After the OP has figured out how to make it infinite I think it would make a good follow up question where explaining generator functions would be a reasonable stepping stone in growing the OP's programming abilities. But to be clear, generators would be good here \$\endgroup\$
    – Peilonrayz
    Commented Oct 5, 2020 at 11:47
  • 1
    \$\begingroup\$ Yeah, I agree it'd be a lot at once for someone just starting out. Just wanted to throw it out there; broaden the horizon a bit more you know :) \$\endgroup\$
    – jaaq
    Commented Oct 5, 2020 at 14:05
  • 1
    \$\begingroup\$ Why have a list of single character strings, when a string is already a list of characters? \$\endgroup\$
    – Voo
    Commented Oct 5, 2020 at 21:03
  • 1
    \$\begingroup\$ @Voo As in change lib to '0123...'? It's because it won't include ''. \$\endgroup\$
    – Peilonrayz
    Commented Oct 5, 2020 at 21:04
11
\$\begingroup\$

It might be useful to know that python has some built-in options for performing combinatorics. In particular, I found the itertools module very handy for these kind of operations. It might be a bit advanced when still starting with Python, but in time you'll pick up many of these useful things.

For the case of bruto forcing a password, the product method seems ideal.

For example, if you want all possible combinations with 5 digits, you can run:

from itertools import product

lib = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß´`+#-.,><@€|^~–{[]}ÄÖÜäöü'
for combination in product(lib, repeat=5):
  attempt="".join(combination) # This turns combination (a list of characters) into a string.
  # Use your password attempt here

So if you want to expand this to all number of digits up to 10, you can use:

for length in range(10):
  for combination in product(lib, repeat=length):
    attempt="".join(combination)
    # Use your password attempt here

A note on using generators

One advantage of this method is that the methods from itertools don't store all combinations, but instead they generate them on the go. As a result, their memory usage doesn't increase with the number of combinations.

This is quite important in a scenario like yours, where the amount of possible combinations has a factorial growth. This article gives a good introduction, with some extra use-cases.

Another nice part of this is, that it's quite easy to let this code keep trying all combinations of increasing length untill something is found.

This also uses the count() method from itertools, which is a generator that starts from a number and keeps increasing forever.

from itertools import product, count
lib = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß´`+#-.,><@€|^~–{[]}ÄÖÜäöü'

for length in count(0):
  for combination in product(lib, repeat=length):
    attempt="".join(combination)
    # Use your password attempt here
    if password_found:
      print(attempt)
      break
\$\endgroup\$
0
6
\$\begingroup\$

For better usability (will come in very handy in a moment), let's change your code into a generator. That's just a function that yields values one by one when requested (or rather, technically, the generator object it returns does). So the only change is that instead of printing, you yield:

def combinations():

    # List of 99 characters and a blank string:

    ...

        else:
            yield lib[counter0]+lib[counter1]+lib[counter2]+lib[counter3]+lib[counter4]+lib[counter5]+lib[counter6]+lib[counter7]+lib[counter8]+lib[counter9]

Now we can for example loop over its results and print them:

for comb in combinations():
    print(comb)

Output:


0
1
2
3
4
5
6
7
8
9
A
B
...

Or we can take some to build a list:

from itertools import islice

print(list(islice(combinations(), 13)))

Output:

['', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B']

Let's watch it switch from one to two characters:

>>> list(islice(combinations(), 98, 102))
['ö', 'ü', '00', '10']

Or from two to three:

>>> list(islice(combinations(), 99*100-1, 99*100+3))
['öü', 'üü', '10', '20']

Wait what? Why is 'üü' followed by '10'? Shouldn't that have come a lot earlier? Did we produce that twice?

>>> list(islice(combinations(), 99*100+3)).count('10')
2

Yeah we did. Oops. So there's some bug in your code. Much harder to notice in your version, btw, with all the combinations just being printed and immediately being overwritten :-)

Anyway, I don't really want to dig deeper into that but show a simple alternative. Let's start from scratch. While we're at it, let's call it words and make the alphabet a parameter. Start simple, only yield the words of lengths 0 and 1:

def words(alphabet):
    yield ''
    for letter in alphabet:
        yield letter

Demo:

>>> list(words('abc'))
['', 'a', 'b', 'c']

Now how to produce the longer words? Let's see what we want:

''         '' + ''
'a'        '' + 'a'
'b'        '' + 'b'
'c'        '' + 'c'
'aa'       'a' + 'a'
'ab'       'a' + 'b'
'ac'       'a' + 'c'
'ba'       'b' + 'a'
'bb'       'b' + 'b'
'bc'       'b' + 'c'
'ca'       'c' + 'a'
'cb'       'c' + 'b'
'cc'       'c' + 'c'
'aaa'      'aa' + 'a'
'aab'      'aa' + 'b'
'aac'      'aa' + 'c'
'aba'      'ab' + 'a'
'abb'      'ab' + 'b'
 ...           ...

On the left is the words how we want them, and on the right I split them into prefix and last letter (if any). If we look at the last letter, we can see that it keeps cycling through the alphabet. All letters for each prefix. Let's pretend we had a prefix function that gave us the prefixes. Then we could just write our solution like this:

def words(alphabet):
    yield ''
    for prefix in prefixes(alphabet):
        for letter in alphabet:
            yield prefix + letter

But wait. The first prefix is '', then 'a', 'b', 'c', 'aa', 'ab', etc. So the prefix just goes through the same sequence of words that we want overall. So... our words function can use itself to produce the prefixes:

def words(alphabet):
    yield ''
    for prefix in words(alphabet):
        for letter in alphabet:
            yield prefix + letter

That's it. That's the whole solution.

Demo:

>>> list(islice(words('abc'), 20))
['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca',
 'cb', 'cc', 'aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca']

Finally let's try it with your alphabet again and see that switch from two to three letters:

>>> alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß´`+#-.,><@€|^~–{[]}ÄÖÜäöü'
>>> list(islice(words(alphabet), 99*100-1, 99*100+3))
['üö', 'üü', '000', '001']

So... we ended up implementing the whole thing with a generator function that has just four simple lines, and it works with an arbitrary alphabet, and as generator it's easy to use in many ways.

It's probably also a lot faster than yours, although because of your bug, that's not easy to benchmark properly. Peilonrayz' version also has a bug at the moment, but we can compare with Ivo_Merchiers' solution and a few variations.

First ten million words using your long alphabet of 99 letters:

same first 9,999,999: True
same 10,000,000th: True {'9TT8'}

1.41  1.38  1.38  seconds   Stefan_Pochmann
1.66  1.64  1.63  seconds   Stefan_Pochmann_2
2.45  2.45  2.45  seconds   Ivo_Merchiers
2.19  2.20  2.21  seconds   Ivo_Merchiers_2
1.50  1.49  1.50  seconds   Ivo_Merchiers_3
1.20  1.20  1.20  seconds   Ivo_Merchiers_chain

First ten million words using alphabet abc:

same first 9,999,999: True
same 10,000,000th: True {'abcaccbbcccacbc'}

2.49  2.43  2.48  seconds   Stefan_Pochmann
4.16  4.17  4.19  seconds   Stefan_Pochmann_2
3.91  3.91  3.93  seconds   Ivo_Merchiers
3.64  3.66  3.64  seconds   Ivo_Merchiers_2
2.74  2.74  2.75  seconds   Ivo_Merchiers_3
2.45  2.46  2.45  seconds   Ivo_Merchiers_chain

Full benchmark code:

from itertools import product, count, islice, chain
from timeit import repeat
from collections import deque

lib = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß´`+#-.,><@€|^~–{[]}ÄÖÜäöü'

def Stefan_Pochmann(alphabet):
    yield ''
    for prefix in Stefan_Pochmann(alphabet):
        for letter in alphabet:
            yield prefix + letter

def Stefan_Pochmann_2(alphabet):
    yield ''
    for prefix in Stefan_Pochmann_2(alphabet):
        yield from map(prefix.__add__, alphabet)

def Ivo_Merchiers(lib):
    for length in count(0):
        for combination in product(lib, repeat=length):
            yield ''.join(combination)

def Ivo_Merchiers_2(lib):
    join = ''.join
    for length in count(0):
        for combination in product(lib, repeat=length):
            yield join(combination)

def Ivo_Merchiers_3(lib):
    for length in count(0):
        yield from map(''.join, product(lib, repeat=length))

def Ivo_Merchiers_chain(lib):     # from Peilonrayz
    join = ''.join
    return chain.from_iterable(
        map(join, product(lib, repeat=length))
        for length in count(0)
    )

solutions = Stefan_Pochmann, Stefan_Pochmann_2, Ivo_Merchiers, Ivo_Merchiers_2, Ivo_Merchiers_3, Ivo_Merchiers_chain

for alphabet in lib, 'abc':
    print(alphabet)

    n = 10**7
    
    # Correctness
    sets = map(set, zip(*(words(alphabet) for words in solutions)))
    print(f'same first {n-1:,}:', all(len(s) == 1 for s in islice(sets, n - 1)))
    s = next(sets)
    print(f'same {n:,}th:', len(s) == 1, s)
    print()
    
    # Speed
    tss = [[] for _ in solutions]
    for _ in range(3):
        for words, ts in zip(solutions, tss):
            t = min(repeat(lambda: deque(islice(words(alphabet), n), 0), number=1))
            ts.append(t)
    for words, ts in zip(solutions, tss):
        print(*('%.2f' % t for t in ts), 'seconds ', words.__name__, sep='  ')
    print()
\$\endgroup\$
5
  • 1
    \$\begingroup\$ @Peilonrayz Ah, I just finished the benchmark, see the update :-). Mine's fastest, as I actually expected because my "innermost loop" just adds a prefix and a letter. Faster than Ivo's join. I didn't include yours because you have some bug. Not sure it's the same bug you're talking about. I mean that between ü and 00 you produce 0 again. \$\endgroup\$ Commented Oct 5, 2020 at 18:07
  • 1
    \$\begingroup\$ Interesting I'm suprised at how slow itertools is. I even moved a lot of Ivo's code into C and it becomes only a smidge faster than yours. Timings & code. If you want to add just your two timings here's another picture. \$\endgroup\$
    – Peilonrayz
    Commented Oct 5, 2020 at 19:28
  • 1
    \$\begingroup\$ @Peilonrayz Updated. Your chain version is indeed the fastest. \$\endgroup\$ Commented Oct 5, 2020 at 20:48
  • \$\begingroup\$ Interesting analysis, I'm also surprised by how slow the itertools code works. Also surprising to see that chain has such a significant speed impact. \$\endgroup\$ Commented Oct 6, 2020 at 6:21
  • \$\begingroup\$ Nice answer. I wanted to propose another alternative : use 100 chars, pick any number written in base 10 and convert it to base 100, which can be done really fast. Your solution is cleaner and faster. \$\endgroup\$ Commented Oct 8, 2020 at 18:44

Not the answer you're looking for? Browse other questions tagged or ask your own question.