For better usability (will come in very handy in a moment), let's change your code into a generator. That's just a function that yields values one by one when requested (or rather, technically, the generator object it returns does). So the only change is that instead of printing, you yield:
def combinations():
# List of 99 characters and a blank string:
...
else:
yield lib[counter0]+lib[counter1]+lib[counter2]+lib[counter3]+lib[counter4]+lib[counter5]+lib[counter6]+lib[counter7]+lib[counter8]+lib[counter9]
Now we can for example loop over its results and print them:
for comb in combinations():
print(comb)
Output:
0
1
2
3
4
5
6
7
8
9
A
B
...
Or we can take some to build a list:
from itertools import islice
print(list(islice(combinations(), 13)))
Output:
['', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B']
Let's watch it switch from one to two characters:
>>> list(islice(combinations(), 98, 102))
['ö', 'ü', '00', '10']
Or from two to three:
>>> list(islice(combinations(), 99*100-1, 99*100+3))
['öü', 'üü', '10', '20']
Wait what? Why is 'üü'
followed by '10'
? Shouldn't that have come a lot earlier? Did we produce that twice?
>>> list(islice(combinations(), 99*100+3)).count('10')
2
Yeah we did. Oops. So there's some bug in your code. Much harder to notice in your version, btw, with all the combinations just being printed and immediately being overwritten :-)
Anyway, I don't really want to dig deeper into that but show a simple alternative. Let's start from scratch. While we're at it, let's call it words
and make the alphabet a parameter. Start simple, only yield the words of lengths 0 and 1:
def words(alphabet):
yield ''
for letter in alphabet:
yield letter
Demo:
>>> list(words('abc'))
['', 'a', 'b', 'c']
Now how to produce the longer words? Let's see what we want:
'' '' + ''
'a' '' + 'a'
'b' '' + 'b'
'c' '' + 'c'
'aa' 'a' + 'a'
'ab' 'a' + 'b'
'ac' 'a' + 'c'
'ba' 'b' + 'a'
'bb' 'b' + 'b'
'bc' 'b' + 'c'
'ca' 'c' + 'a'
'cb' 'c' + 'b'
'cc' 'c' + 'c'
'aaa' 'aa' + 'a'
'aab' 'aa' + 'b'
'aac' 'aa' + 'c'
'aba' 'ab' + 'a'
'abb' 'ab' + 'b'
... ...
On the left is the words how we want them, and on the right I split them into prefix and last letter (if any). If we look at the last letter, we can see that it keeps cycling through the alphabet. All letters for each prefix. Let's pretend we had a prefix
function that gave us the prefixes. Then we could just write our solution like this:
def words(alphabet):
yield ''
for prefix in prefixes(alphabet):
for letter in alphabet:
yield prefix + letter
But wait. The first prefix is ''
, then 'a'
, 'b'
, 'c'
, 'aa'
, 'ab'
, etc. So the prefix just goes through the same sequence of words that we want overall. So... our words
function can use itself to produce the prefixes:
def words(alphabet):
yield ''
for prefix in words(alphabet):
for letter in alphabet:
yield prefix + letter
That's it. That's the whole solution.
Demo:
>>> list(islice(words('abc'), 20))
['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca',
'cb', 'cc', 'aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca']
Finally let's try it with your alphabet again and see that switch from two to three letters:
>>> alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß´`+#-.,><@€|^~–{[]}ÄÖÜäöü'
>>> list(islice(words(alphabet), 99*100-1, 99*100+3))
['üö', 'üü', '000', '001']
So... we ended up implementing the whole thing with a generator function that has just four simple lines, and it works with an arbitrary alphabet, and as generator it's easy to use in many ways.
It's probably also a lot faster than yours, although because of your bug, that's not easy to benchmark properly. Peilonrayz' version also has a bug at the moment, but we can compare with Ivo_Merchiers' solution and a few variations.
First ten million words using your long alphabet of 99 letters:
same first 9,999,999: True
same 10,000,000th: True {'9TT8'}
1.41 1.38 1.38 seconds Stefan_Pochmann
1.66 1.64 1.63 seconds Stefan_Pochmann_2
2.45 2.45 2.45 seconds Ivo_Merchiers
2.19 2.20 2.21 seconds Ivo_Merchiers_2
1.50 1.49 1.50 seconds Ivo_Merchiers_3
1.20 1.20 1.20 seconds Ivo_Merchiers_chain
First ten million words using alphabet abc
:
same first 9,999,999: True
same 10,000,000th: True {'abcaccbbcccacbc'}
2.49 2.43 2.48 seconds Stefan_Pochmann
4.16 4.17 4.19 seconds Stefan_Pochmann_2
3.91 3.91 3.93 seconds Ivo_Merchiers
3.64 3.66 3.64 seconds Ivo_Merchiers_2
2.74 2.74 2.75 seconds Ivo_Merchiers_3
2.45 2.46 2.45 seconds Ivo_Merchiers_chain
Full benchmark code:
from itertools import product, count, islice, chain
from timeit import repeat
from collections import deque
lib = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß´`+#-.,><@€|^~–{[]}ÄÖÜäöü'
def Stefan_Pochmann(alphabet):
yield ''
for prefix in Stefan_Pochmann(alphabet):
for letter in alphabet:
yield prefix + letter
def Stefan_Pochmann_2(alphabet):
yield ''
for prefix in Stefan_Pochmann_2(alphabet):
yield from map(prefix.__add__, alphabet)
def Ivo_Merchiers(lib):
for length in count(0):
for combination in product(lib, repeat=length):
yield ''.join(combination)
def Ivo_Merchiers_2(lib):
join = ''.join
for length in count(0):
for combination in product(lib, repeat=length):
yield join(combination)
def Ivo_Merchiers_3(lib):
for length in count(0):
yield from map(''.join, product(lib, repeat=length))
def Ivo_Merchiers_chain(lib): # from Peilonrayz
join = ''.join
return chain.from_iterable(
map(join, product(lib, repeat=length))
for length in count(0)
)
solutions = Stefan_Pochmann, Stefan_Pochmann_2, Ivo_Merchiers, Ivo_Merchiers_2, Ivo_Merchiers_3, Ivo_Merchiers_chain
for alphabet in lib, 'abc':
print(alphabet)
n = 10**7
# Correctness
sets = map(set, zip(*(words(alphabet) for words in solutions)))
print(f'same first {n-1:,}:', all(len(s) == 1 for s in islice(sets, n - 1)))
s = next(sets)
print(f'same {n:,}th:', len(s) == 1, s)
print()
# Speed
tss = [[] for _ in solutions]
for _ in range(3):
for words, ts in zip(solutions, tss):
t = min(repeat(lambda: deque(islice(words(alphabet), n), 0), number=1))
ts.append(t)
for words, ts in zip(solutions, tss):
print(*('%.2f' % t for t in ts), 'seconds ', words.__name__, sep=' ')
print()