11
\$\begingroup\$

This little program is self-explanatory. I count letters in a string (can be any string), using a for loop to iterate through each letter. The problem is that this method is very slow and I want to avoid loops.

Any ideas? I thought that maybe if I remove checked letters from the string after each loop, then in some cases, where many letters repeat, that would make a difference.

def count_dict(mystring):
    d = {}
# count occurances of character
    for w in mystring: 
        d[w] = mystring.count(w)
# print the result
    for k in sorted(d):
        print (k + ': ' + str(d[k]))

mystring='qwertyqweryyyy'
count_dict(mystring)

The output:

e: 2
q: 2
r: 2
t: 1
w: 2
y: 5
\$\endgroup\$

3 Answers 3

14
\$\begingroup\$

Use the built in Counter in the collections module:

>>> from collections import Counter
>>> Counter('qwertyqweryyyy')
Counter({'y': 5, 'e': 2, 'q': 2, 'r': 2, 'w': 2, 't': 1})
\$\endgroup\$
0
6
\$\begingroup\$

Counter is definitely the way to go (and I've upvoted Jaime's answer).

If you want to do it yourself and iterate only once, this should work :

d={}
for l in s:
        d[l] = d.get(l,0) + 1

There might be a short/more pythonic way to do so but it works...

Edit : I must confess that Jaime's comment to this answer surprised me but I've just tested this code :

from profilehooks import profile

s="qwertyuiopasdfghjklzxcvbnm"

@profile
def function1(s):
        d={}
        for l in s:
                d[l] = d.get(l,0)+1
        return d

@profile
def function2(s):
        return dict((char_, s.count(char_)) for char_ in set(s))

for i in xrange(0,200):
        function1(s*i)
        function2(s*i)

And the results can hardly be contested :

*** PROFILER RESULTS ***
function2 (./fsdhfsdhjk.py:13)
function called 200 times

         10948 function calls in 0.161 seconds

   Ordered by: cumulative time, internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      200    0.083    0.000    0.161    0.001 fsdhfsdhjk.py:13(function2)
     5374    0.033    0.000    0.077    0.000 fsdhfsdhjk.py:15(<genexpr>)
     5174    0.044    0.000    0.044    0.000 {method 'count' of 'str' objects}
      200    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        0    0.000             0.000          profile:0(profiler)



*** PROFILER RESULTS ***
function1 (./fsdhfsdhjk.py:6)
function called 200 times

         517800 function calls in 2.891 seconds

   Ordered by: cumulative time, internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      200    1.711    0.009    2.891    0.014 fsdhfsdhjk.py:6(function1)
   517400    1.179    0.000    1.179    0.000 {method 'get' of 'dict' objects}
      200    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        0    0.000             0.000          profile:0(profiler)

TL;DR Jaime's solution (function2) is 18 times faster than mine (function1).

\$\endgroup\$
5
  • 4
    \$\begingroup\$ This would be the right way of doing things in a language such as C. Because Python loops have so much overhead, it actually may turn out to be faster to do something like: char_counts = dict((char_, test_string.count(char_)) for char_ in set(test_string)) It does run multiple times over the string, but because the loops run in C, not in Python, it is faster. \$\endgroup\$
    – Jaime
    Commented Jun 26, 2013 at 23:12
  • \$\begingroup\$ I am really really impressed. I've updated my answer accordingly. \$\endgroup\$
    – SylvainD
    Commented Jun 26, 2013 at 23:32
  • 3
    \$\begingroup\$ I tested it on a 10**6 character string and there it was only 4x faster. It's one of the few things I don't like about Python, that sometimes optimizing code is not about writing the most efficient algorithm, but about figuring out which built-in functions run faster. \$\endgroup\$
    – Jaime
    Commented Jun 27, 2013 at 3:38
  • \$\begingroup\$ Yes, as function2 loops roughly (x+1) times on the string (x being the number of different characters), I can imagine that the performance gain, compared to function 1 looping only once, gets smaller as x and the string get bigger. Still, this is a damn long string :-) \$\endgroup\$
    – SylvainD
    Commented Jun 27, 2013 at 6:31
  • \$\begingroup\$ This is my first experience with python profiler. I ran cProfile on several methods and surprisingly, my original method took less function calls than suggested Counter, even though all methods take exactly the same amount of time (I had a relatively short string). I am running profiler with $ python -m cProfile -s time test.py \$\endgroup\$
    – casper
    Commented Jun 28, 2013 at 20:37
1
\$\begingroup\$

This is the shortest answer I can think of:

{i:str.count(i) for i in str}

This is called Dictionary comprehension, which is an efficient way to get the count of each alphabet in the string as a letter(key):count(value) pair.

Example:

str = "StackExchange"  
{i:str.count(i) for i in str}  
{'a': 2, 'c': 2, 'E': 1, 'g': 1, 'h': 1, 'k': 1, 'n': 1, 'S': 1, 't': 1, 'x': 1, 'e': 1}
\$\endgroup\$

Not the answer you're looking for? Browse other questions tagged or ask your own question.