7
\$\begingroup\$

The question was to see if a word is an anagram of another word.

How would you improve this? It returns True if an anagram, or False otherwise.

# write the function is_anagram
def is_anagram(test, original):
    if len(test) != len(original):
        return False
    for letter in test.lower():
        if letter not in original.lower():
            return False
    for letter in original.lower():
        if letter not in test.lower():
            return False
    return True
\$\endgroup\$
3
  • 4
    \$\begingroup\$ What about the words Hallo and Holllllah. Thises aren't annagramms of each other but your algorithm would say they are \$\endgroup\$
    – MrSmith42
    Commented Jul 10, 2015 at 14:25
  • \$\begingroup\$ Thanks for that! I added in this statement to solve the issue. if len(test) != len(original): return False \$\endgroup\$
    – HussienK
    Commented Jul 10, 2015 at 18:31
  • 2
    \$\begingroup\$ Your algorithm gives a false-positive on words with the same length and the same letters, but in different quantities. For example, "goggle" and "google" are not anagrams, but your algorithm says that they are. \$\endgroup\$ Commented Jul 10, 2015 at 20:45

6 Answers 6

11
\$\begingroup\$

The collections module provides a Counter class that can do the counting and the comparing for you:

from collections import Counter
>>> Counter('nana') == Counter('anna')
True
>>> Counter('nana') == Counter('ana')
False

Counter is basically a dictionary with items as keys and counts as values, so you can build it yourself from more primitive Python types by doing something like:

def count_items(sequence):
    counts = {}
    for item in sequence:
        counts[item] = counts.get(item, 0) + 1
    return counts

def is_anagram(a, b):
    return count_items(a.lower()) == count_items(b.lower())
\$\endgroup\$
1
  • \$\begingroup\$ Best answer. Besides being the most laconic and readable, it's O(n), right? OP should accept this. \$\endgroup\$
    – tonytony
    Commented Jul 11, 2015 at 11:20
13
\$\begingroup\$

You need to check if each letter occurs the name number of times in both strings.

One method would be to sort the letters and compare the lists of letters for equality.

Here is my approach:

def is_anagram(test, original):
  return sorted(list(test.lower())) == sorted(list(original.lower()))

This algorithm is far from optimal runtime O(n*logn), but I have chosen it because of its simplicity. For an Efficient algorithm see the comment of @corsiKa for a quite simple O(n) algorithm.

\$\endgroup\$
5
  • \$\begingroup\$ Note that sorted will take a string (or any other iterable) just fine... \$\endgroup\$
    – jonrsharpe
    Commented Jul 10, 2015 at 15:03
  • 3
    \$\begingroup\$ There should be a way to do this in O(n) time instead of O(n*log n) time. To do this, you could create two hashes which map letters to integers, then compare the contents of the hashes. This answer is simpler though both in readability and length. \$\endgroup\$ Commented Jul 10, 2015 at 18:09
  • 1
    \$\begingroup\$ @John you basically have to make a 26 element array, count letters going up in the first one, and count letters going down in the second, then check your array for all 0s. You may find in practice that to's faster to sort and compare. The fastest implementation probably performs a selection sort and short circuits (immediately returns false) when it finds a mismatch. \$\endgroup\$
    – corsiKa
    Commented Jul 10, 2015 at 20:21
  • \$\begingroup\$ @corsiKa What if there are uppercase letters too? If they are the same length there is no need to check the array for 0s \$\endgroup\$
    – spyr03
    Commented Jul 11, 2015 at 3:55
  • 1
    \$\begingroup\$ Anagrams don't usually consider case sensitivity. If they did you'd need a 52 element array, but that seems contrary to the norm. As far as length check, yes you can optimize by short circuiting the length, since different length inputs are never anagrams. \$\endgroup\$
    – corsiKa
    Commented Jul 11, 2015 at 6:02
3
\$\begingroup\$

Inspired by MrSmith42's nice answer, I came up with a version that doesn't require sorting of the full arguments, just sorting of their Counter()s. And that sorting is triggered only if the set()s of the arguments aren't equal. That check should allow rejecting most non-anagrams very quickly, without sorting anything.

The .most_common() method of a Counter does the sorting. Counters are dictionaries that return the number of recurrences of each unique item in an iterable. The default order in which these unique items are listed is unpredictable, so sorting via .most_common() is required.

I haven't implemented case-insensitivity but that would be easy with .lower().

from collections import Counter
def is_anagram(test, original):
    if set(test) == set(original):
        return Counter(test).most_common() == Counter(original).most_common()
    return False

Update: I compared the two solutions so far (mine and MrSmith42's) for speed. The use of Counter is slower than I thought, slower than sorting the whole list. Also, I realized that in addition to set equality, anagrams must have len equality, which is another speed pre-check to use to discard non-anagrams before sorting. So combining the best elements of MrSmith42's and my answers so far, we get

def is_anagram(test, original):
    if len(test) == len(original):
        if set(test) == set(original):
            return sorted(list(test)) == sorted(list(original))
    return False

Timing:

from string import letters
from random import choice, sample
from copy import copy

import timeit 

# Make a very long random word
long_word = ''.join(choice(letters) for _ in xrange(10**6))

# Shuffle it to make an anagram
long_anagram = ''.join(sample(long_word, len(long_word)))

# A near anagram of the same length but with different character sets
wrong_chars = copy(long_anagram[:-1]) + '!'

# A near anagram but with a different length
wrong_len = copy(long_anagram) + 'a'

# the function works
print is_anagram(long_word, long_anagram)
print is_anagram(long_word, wrong_chars)
print is_anagram(long_word, wrong_len)

# when string are anagrams (or near anagrams, unchecked here) sorting is required and takes the longest
%timeit is_anagram(long_word, long_anagram)

# faster rejection of obviously wrong answers
%timeit is_anagram(long_word, wrong_chars)
%timeit is_anagram(long_word, wrong_len)

True
False
False
1 loops, best of 3: 486 ms per loop
10 loops, best of 3: 40 ms per loop
The slowest run took 8.11 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 264 ns per loop
\$\endgroup\$
7
  • 1
    \$\begingroup\$ Could you not just implement your own counter and then the order will be predictable? As in make a list and for each character in the string, increment the list at that character's index? Then when going through the second string you can decrement instead, and if you ever go negative, you know it isn't an anagram \$\endgroup\$
    – spyr03
    Commented Jul 10, 2015 at 17:00
  • \$\begingroup\$ That seems like a perfectly good idea to try. I also tried the .subtract() method available for Counters and then checking that all values in the resulting dictionary were zero. That was also slower than sorting, at least for my example above. I'm not sure how subtract was implemented, but I just figured if even that wasn't faster I should give up on Counter and on counting based methods in general. Perfectly happy to be proven wrong! \$\endgroup\$
    – Curt F.
    Commented Jul 10, 2015 at 17:08
  • \$\begingroup\$ if I post an answer, will you time the code? I'm not too sure on benchmarking in python \$\endgroup\$
    – spyr03
    Commented Jul 10, 2015 at 17:20
  • 1
    \$\begingroup\$ Not like it matters, but a small improvement: if len(test) == len(original) and set(test) == set(original): is more pythonic than having two nested if sentences \$\endgroup\$ Commented Jul 10, 2015 at 19:07
  • 1
    \$\begingroup\$ Dictionary comparison does not rely on the order of the keys, so the sorting is completely unnecessary. And if it where necessary, your code would break in case of words where more than one letter appeared the same number of times. From the docs: "Elements with equal counts are ordered arbitrarily." \$\endgroup\$
    – Jaime
    Commented Oct 4, 2016 at 10:20
3
\$\begingroup\$

First check if they are the same length, if not then we can return false immediately.

Next make a list which will store the frequency for each character in the first string. ord() converts a char to its ascii value.

Then we go through the second string, and subtract 1 from each corresponding frequency, and if we go below 0, we know that a character is not in the original string (enough times). Since they are the same length, we dont' have to check the frequency list for left-over characters.

def isAnagram(original, test):
    if len(original) == len(test):
        count = [0] * ord('z') #characters up to 'z' in ascii, increase if needed
        for c in original:
            count[ord(c)] += 1
        for c in test:
            if count[ord(c)] == 0:
                return False
            else:
                count[ord(c)] -= 1
        return True
    return False

original = raw_input()
test = raw_input()

print "%s and %s are %s" % (original, test, "anagrams" if (isAnagram(original, test)) else "not anagrams")
\$\endgroup\$
6
  • \$\begingroup\$ What are a and b and how are they different from test and original? \$\endgroup\$
    – Curt F.
    Commented Jul 11, 2015 at 3:58
  • \$\begingroup\$ a and b are the quick names I came up with to test the code, I probably should have just labelled them as original and test, my bad \$\endgroup\$
    – spyr03
    Commented Jul 11, 2015 at 3:59
  • \$\begingroup\$ I couldn't get your function to run because of the for c in a: and for c in b: lines. I tried changing a and b to test and original, but then got a IndexError: list index out of range at the count[ord(c)] += 1 line. Sorry, I feel like I'm missing something silly/obvious that would let me get this to work. \$\endgroup\$
    – Curt F.
    Commented Jul 11, 2015 at 4:10
  • \$\begingroup\$ @CurtF. I have edited it to remove all references to a and b, hopefully it works for you now \$\endgroup\$
    – spyr03
    Commented Jul 11, 2015 at 4:27
  • \$\begingroup\$ Success! I finally got your method to work. I kept getting list index errors until I changed count to a dictionary and initialized it with chars = set(original).union(set(test)); count = {}; for char in chars: count[char] = 0; \$\endgroup\$
    – Curt F.
    Commented Jul 11, 2015 at 5:34
2
\$\begingroup\$
def isAnagram(str1,str2):
    str1 = str1.lower()
    str2 = str2.lower()
    str1 = sorted(str1)
    str2 = sorted(str2)
    return str1 == str2

This is the simplest I can think of. Though, I am not a developer, but, just a systems engineer who likes to code

\$\endgroup\$
1
\$\begingroup\$

Its also easier to compare hashtables of each of the strings. For example,

def string2dict(st):
  d = {}
  for ch in st:
    d [ch] = d.get(ch,0) + 1
  return d

assert string2dict('aba') == string2dict('aab')
\$\endgroup\$

Not the answer you're looking for? Browse other questions tagged or ask your own question.