Counting letters in a string

Question

This little program is self-explanatory. I count letters in a string (can be any string), using a for loop to iterate through each letter. The problem is that this method is very slow and I want to avoid loops.

Any ideas? I thought that maybe if I remove checked letters from the string after each loop, then in some cases, where many letters repeat, that would make a difference.

def count_dict(mystring):
    d = {}
# count occurances of character
    for w in mystring: 
        d[w] = mystring.count(w)
# print the result
    for k in sorted(d):
        print (k + ': ' + str(d[k]))

mystring='qwertyqweryyyy'
count_dict(mystring)

The output:

e: 2
q: 2
r: 2
t: 1
w: 2
y: 5

Jaime · Accepted Answer · 2013-06-25 22:53:21Z

14

Use the built in Counter in the collections module:

>>> from collections import Counter
>>> Counter('qwertyqweryyyy')
Counter({'y': 5, 'e': 2, 'q': 2, 'r': 2, 'w': 2, 't': 1})

answered Jun 25, 2013 at 22:53

Jaime

6,59720 silver badges30 bronze badges

Add a comment |

SylvainD · Accepted Answer · 2013-06-26 23:32:06Z

Counter is definitely the way to go (and I've upvoted Jaime's answer).

If you want to do it yourself and iterate only once, this should work :

d={}
for l in s:
        d[l] = d.get(l,0) + 1

There might be a short/more pythonic way to do so but it works...

Edit : I must confess that Jaime's comment to this answer surprised me but I've just tested this code :

from profilehooks import profile

s="qwertyuiopasdfghjklzxcvbnm"

@profile
def function1(s):
        d={}
        for l in s:
                d[l] = d.get(l,0)+1
        return d

@profile
def function2(s):
        return dict((char_, s.count(char_)) for char_ in set(s))

for i in xrange(0,200):
        function1(s*i)
        function2(s*i)

And the results can hardly be contested :

*** PROFILER RESULTS ***
function2 (./fsdhfsdhjk.py:13)
function called 200 times

         10948 function calls in 0.161 seconds

   Ordered by: cumulative time, internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      200    0.083    0.000    0.161    0.001 fsdhfsdhjk.py:13(function2)
     5374    0.033    0.000    0.077    0.000 fsdhfsdhjk.py:15(<genexpr>)
     5174    0.044    0.000    0.044    0.000 {method 'count' of 'str' objects}
      200    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        0    0.000             0.000          profile:0(profiler)



*** PROFILER RESULTS ***
function1 (./fsdhfsdhjk.py:6)
function called 200 times

         517800 function calls in 2.891 seconds

   Ordered by: cumulative time, internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      200    1.711    0.009    2.891    0.014 fsdhfsdhjk.py:6(function1)
   517400    1.179    0.000    1.179    0.000 {method 'get' of 'dict' objects}
      200    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        0    0.000             0.000          profile:0(profiler)

TL;DR Jaime's solution (function2) is 18 times faster than mine (function1).

This would be the right way of doing things in a language such as C. Because Python loops have so much overhead, it actually may turn out to be faster to do something like: char_counts = dict((char_, test_string.count(char_)) for char_ in set(test_string)) It does run multiple times over the string, but because the loops run in C, not in Python, it is faster. — Jaime, Commented Jun 26, 2013 at 23:12
I am really really impressed. I've updated my answer accordingly. — SylvainD, Commented Jun 26, 2013 at 23:32
I tested it on a 10**6 character string and there it was only 4x faster. It's one of the few things I don't like about Python, that sometimes optimizing code is not about writing the most efficient algorithm, but about figuring out which built-in functions run faster. — Jaime, Commented Jun 27, 2013 at 3:38
Yes, as function2 loops roughly (x+1) times on the string (x being the number of different characters), I can imagine that the performance gain, compared to function 1 looping only once, gets smaller as x and the string get bigger. Still, this is a damn long string :-) — SylvainD, Commented Jun 27, 2013 at 6:31
This is my first experience with python profiler. I ran cProfile on several methods and surprisingly, my original method took less function calls than suggested Counter, even though all methods take exactly the same amount of time (I had a relatively short string). I am running profiler with $ python -m cProfile -s time test.py — casper, Commented Jun 28, 2013 at 20:37

Rajesh · Accepted Answer · 2015-03-29 04:21:25Z

1

This is the shortest answer I can think of:

{i:str.count(i) for i in str}

This is called Dictionary comprehension, which is an efficient way to get the count of each alphabet in the string as a letter(key):count(value) pair.

Example:

str = "StackExchange"  
{i:str.count(i) for i in str}  
{'a': 2, 'c': 2, 'E': 1, 'g': 1, 'h': 1, 'k': 1, 'n': 1, 'S': 1, 't': 1, 'x': 1, 'e': 1}

edited Mar 29, 2015 at 4:21

user34073

answered Mar 29, 2015 at 3:25

Rajesh

212 bronze badges

Add a comment |

Stack Exchange Network

Counting letters in a string

3 Answers 3

Not the answer you're looking for? Browse other questions tagged
python
strings
or ask your own question.

Hot Network Questions

Counting letters in a string

3 Answers 3

Not the answer you're looking for? Browse other questions tagged pythonstrings or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
python
strings
or ask your own question.