Determining if two words are anagrams

Question

I was given a this problem at a technical interview and failed miserably in the time given. Afterwards, I sat down and worked out what I think is a good solution. The task was actually to build an AngularJS app to do this, which I did, but the guts of my solution is in JavaScript. I wanted to ask if there is a better way to do this:

// convert strings to LC for case insensitivity
// split strings into arrays
// sort the arrays (spaces will sort first and be trimmed later)
// join the arrays back into strings
// we're only concerned about the printable characters in the anagram so,
// trim() to remove any spaces and then compare the resulting strings
var str1 = this.word1.toLowerCase().split('').sort().join('').trim();
var str2 = this.word2.toLowerCase().split('').sort().join('').trim();

if (str1 === str2) {
    this.isAnagram = true;
} else {
    this.isAnagram = false;
}

As it often happens, the best solution depends heavily on your constraints, which aren't clear at all. Can you assume that both word1 and word2 are actually strings? Can you assume that they only contain letters? Do you need type conversion? Do you need to treat "ÿ" like "y"? Do you need error handling? Etc... — GOTO 0, Commented Aug 7, 2015 at 7:52
If you're writing real code on a team for a real project, your solution above is perfect. ie, the only concern in software, for 20 years, is readability and clarity --so, you get an A. (In the extremely unusual case that performance is relevant, there are great explorations of that concept below, which you can take as an interesting, sort of historic footnote to the issue. Note though that, of course, if you present a "performance" solution that's a total fail: you're job, your existence, your life, your every heartbeat is one simple concept: readability.) — Fattie, Commented Aug 8, 2015 at 14:32
Regarding this now extremely confusing QA. I feel it is well worth restating that (1) matching an anagram (sort, equality) is an utterly, utterly, utterly trivial process (2) of course, obviously, you do it in a line of code using the utterly trivial .Net calls available to you for the purpose. It is, simply, utterly inconceivable you would start "writing code!" to do that. No more than you would start "writing code!" to, for example, divide two numbers or to make a string lowercase. — Fattie, Commented Aug 12, 2015 at 12:41
I'm sorry to use the word "utterly" so often but it's that type of QA! :) — Fattie, Commented Aug 12, 2015 at 12:43

rrowland · Accepted Answer · 2015-08-10 03:52:09Z

28

My original assertion (Outdated):

Sorting a string (Or any array) is inefficient because even the fastest algorithm will sort no faster than O(n log n) in an average case. The most efficient way would use a hash map to count letters in each word. Something like:

Although reading from a hash map can be as quick as O(1), writing to a hash map is significantly slower. By using a 26-value array (0-25) to represent lowercase letters, the speed of operations can be sped up significantly:

function isAnagram(word1, word2) {
  if (typeof word1 !== 'string' || typeof word2 !== 'string') {
    throw new Error('isAnagram requires two strings to be passed.')
  }

  var normalizedWord1 = word1.replace(/[^A-Za-z]+/g, '').toLowerCase();
  var normalizedWord2 = word2.replace(/[^A-Za-z]+/g, '').toLowerCase();

  var counts = [];
  var word1Length = normalizedWord1.length;
  var word2Length = normalizedWord2.length

  if (word1Length !== word2Length) { return false; }

  for (var i = 0; i < word1Length; i++) {
    var index = normalizedWord1.charCodeAt(i)-97;
    counts[index] = (counts[index] || 0) + 1;
  }

  for (var i = 0; i < word2Length; i++) {
    var index = normalizedWord2.charCodeAt(i)-97;
    if (!counts[index]) { return false; }
    else { counts[index]--; }
  }

  return true;
}

EDIT: A speed comparison between using a hash and using a 26-value array: http://jsperf.com/anagram-algorithms

edited Aug 10, 2015 at 3:52

answered Aug 7, 2015 at 3:50

rrowland

4403 silver badges9 bronze badges

\$\begingroup\$ Yes. 3n is an order of magnitude faster than n log n. \$\endgroup\$
– rrowland
Commented Aug 7, 2015 at 4:13
\$\begingroup\$ @Tushar If you're interested in learning a little about computational complexity: en.wikipedia.org/wiki/Big_O_notation \$\endgroup\$
– rrowland
Commented Aug 7, 2015 at 4:17
1

\$\begingroup\$ @EasyBB Yeah I wasn't aware for..in was that much slower. A quick Google search enlightened me. Thanks! \$\endgroup\$
– rrowland
Commented Aug 7, 2015 at 5:08
2

\$\begingroup\$ -1 Because a HashMap is typically implemented as a Heap, which is O(log n) for inserting. More to the point. You are in effect doing the equivilent of an Insert Sort on a Heap, which "dun dun dun" is O(n log n). \$\endgroup\$
– Aron
Commented Aug 7, 2015 at 6:37
1

\$\begingroup\$ @Aron You made a good point about insertion speed. I sped up the algorithm by converting the lowercase characters to values 0-25 so they can quickly be looked up in by array index rather than by hash key. \$\endgroup\$
– rrowland
Commented Aug 7, 2015 at 7:12

| Show 34 more comments

Community · Accepted Answer · 2017-04-13 12:40:41Z

24

Everybody is talking about your algorithm, but everybody does the same mistake.

Repeated code! Yeah, you have repeated code.

Look at the following lines:

var str1 = this.word1.toLowerCase().split('').sort().join('').trim();
var str2 = this.word2.toLowerCase().split('').sort().join('').trim();

Just look how long that is! And repeated! Move that to a new function:

var regularize = function(str) {
    return str.toLowerCase().split('').sort().join('').trim();
}

And now you can do like this:

var str1 = regularize(this.word1);
var str2 = regularize(this.word2);

But you have this a few lines below:

if (str1 === str2) {
    this.isAnagram = true;
} else {
    this.isAnagram = false;
}

So, you don't need those variables or anything... Cleaning up, you can just do:

this.isAnagram = regularize(this.word1) == regularize(this.word2);

As suggested many times before, you can do some cleanup to the string. Regular expressions come to my mind. Based on @Tushar's answer, I came up with this:

var regularize = function(str) {
    return str.toLowerCase().replace(/[^a-z\d]/g,'').split('').sort().join('');
}

All assembled together:

var regularize = function(str) {
    return str.toLowerCase().replace(/[^a-z\d]/g,'').split('').sort().join('');
}
this.isAnagram = regularize(this.word1) == regularize(this.word2);

Pretty short, isn't it?

edited Apr 13, 2017 at 12:40

CommunityBot

1

answered Aug 7, 2015 at 9:23

Ismael Miguel

6,1162 gold badges22 silver badges62 bronze badges

\$\begingroup\$ Please, have the kindness to explain the downvotes \$\endgroup\$
– Ismael Miguel
Commented Aug 10, 2015 at 8:02
\$\begingroup\$ While I like that someone is focusing on readability, I don't think that moving that section of doubled code into a separate function enhances readability - rather, it makes it necessary to refer elsewhere to see what the code does. When the code is simply duplicated on successive lines, it's very obvious that the code is the same. (The downvote wasn't from me, though.) \$\endgroup\$
– Brilliand
Commented Aug 10, 2015 at 21:16
3

\$\begingroup\$ @Brilliand The idea was to improve readability to making the code DRY and reducing other bloat. The same idiomatic meaning is still present on both ways. Mine simply removed all bloat. I believe it is easier to read this than the original. Yes, you have to refer to somewhere else, but if something is repeated then maybe it's time to have it's own function. \$\endgroup\$
– Ismael Miguel
Commented Aug 11, 2015 at 8:39
1

\$\begingroup\$ Moving the duplicated code into its own function is a prerequisite to improving its implementation (for example, we could populate a histogram instead). If, after refactoring the function, you think it's worth inlining again, that's a decision to make at that point. So I'm all for extracting repeated functionality like this - it's very helpful. \$\endgroup\$
– Toby Speight
Commented Aug 23, 2018 at 10:08

Add a comment |

Tushar · Accepted Answer · 2015-08-10 03:24:52Z

Your solution is not removing punctuations and spaces before checking if two strings are Anagrams.

Also, the code can be optimized.

First, remove all the spaces and punctuation marks from both the strings using Regular Expression
Check if the length of strings are equal, if not return false immedietly
Check for Anagrams only when both strings are of equal length.

Code:

var regex = /[^a-z0-9]/gi;

var str1 = this.word1.replace(regex, ''),
    str2 = this.word2.replace(regex, '');

this.isAnagram = str1.length > 0 && str1.length === str2.length && (str1.toLowerCase().split('').sort().join('') === str2.toLowerCase().split('').sort().join(''));

This will first check for the length of strings are equal, if equal then only (str1.toLowerCase().split('').sort().join('') === str2.toLowerCase().split('').sort().join('')) is evaluated.

Regex Explanation

/: Delimiter of regex
[]: Character class
[^..]: Not containing any of the following characters
a-z0-9: All alphanumeric characters
g: Global flag. Matches all the characters from the class
i: Case insensitive match

Demo

var regex = /[^a-z0-9]/gi;
document.getElementById('btn').addEventListener('click', function() {

  var str1 = (document.getElementById('str1').value).replace(regex, ''),
    str2 = (document.getElementById('str2').value).replace(regex, '');

  var isAnagram = str1.length > 0 && str1.length === str2.length && (str1.toLowerCase().split('').sort().join('') === str2.toLowerCase().split('').sort().join(''));

  alert('Is Anagram: ' + isAnagram);
}, false);

<input type="text" id="str1" />
<br />
<input type="text" id="str2" />
<br />
<input type="button" id="btn" value="check" />

Edit

Another approach:

var checkAnagram = (function () {
    var sanitizeRegex = /[^a-z0-9]/gi;

    var sanitizeString = function (str) {
        return str.replace(sanitizeRegex, '').toLowerCase().split('').sort().join('');
    };

    return function (str1, str2) {
        return sanitizeString(str1) === sanitizeString(str2);
    };
}());

var isAnagram = checkAnagram('Rust! Ha?', 'Tushar'); // Usage

Demo

Would "Rust Ha" be considered an anagram of "Tushar"? if so then this approach wouldn't quick cover all cases. — Jason Sperske, Commented Aug 7, 2015 at 3:43
Only that one of the side effects of .toLowerCase().split('').sort() would be to sort all of the spaces to the end (or beginning, I haven't tested which), but this would fail the length test (despite trimming) (took me a while to come up with a clever anagram to illustrate this point, please don't take any offense :) — Jason Sperske, Commented Aug 7, 2015 at 3:47
If I was interviewing someone and looking for how complete they attempted to solve a programming puzzle I would consider this a strong answer. Rust? Ha! Tushar :) — Jason Sperske, Commented Aug 7, 2015 at 3:56
Downvoter, Please comment here, why downvoted? Can something be improved? — Tushar, Commented Aug 7, 2015 at 4:04

Simon Kuang · Accepted Answer · 2015-08-08 01:42:55Z

I have some boilerplate code similar to rrowland's, but I feel like my algorithm could be a little bit faster. It operates in O(n) by using prime number multiplication to count letters, and is non-branching in the longest-time routine.

Instead of doing ind - 97 I keep 97 empty spots in the array that is accessed.

I think if you were more obsessive you could do the counting using bitwise operations, but this is good enough.

function isAnagram(word1, word2) {
  if (!word1 || !word2 || !word1.length || !word2.length) {
    throw new Error('isAnagram requires two strings to be passed.')
  }

  var nword1 = word1.replace(/\s+/g, '').toLowerCase();
  var nword2 = word2.replace(/\s+/g, '').toLowerCase();

  var length1 = nword1.length;
  var length2 = nword2.length;

  if (length1 !== length2) {
    return false;
  }

  var word1hash = 1;
  var word2hash = 1;

  var primes = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101];

  var ind;
  for (var i = 0; i < length1; i++) {
    ind = nword1.charCodeAt(i);
    word1hash *= primes[ind];
  }

  for (var i = 0; i < length2; i++) {
    ind = nword2.charCodeAt(i);
    word2hash *= primes[ind];
  }

  console.log(word1hash);
  console.log(word2hash);

  return word1hash == word2hash;
}

Edit 1

I guess we have a speed contest now, and this version is a great deal faster than all its precedents (benchmark).

function isAnagram(word1, word2) {
  if (!word1 || !word2 || !word1.length || !word2.length) {
    throw new Error('isAnagram requires two strings to be passed.')
  }

  var nword1 = word1;
  var nword2 = word2;

  var length1 = nword1.length;
  var length2 = nword2.length;

  if (length1 !== length2) {
    return false;
  }

  var word1hash = 1;
  var word2hash = 1;

  var tab = {'q': 2, 'w': 3, 'e': 5, 'r': 7, 't': 11, 'y': 13, 'u': 17, 'i': 19, 'o': 23, 'p': 29, 'a': 31, 's': 37, 'd': 41, 'f': 43, 'g': 47, 'h': 53, 'j': 59, 'k': 61, 'l': 67, 'z': 71, 'x': 73, 'c': 79, 'v': 83, 'b': 89, 'n': 97, 'm': 101, 'Q': 2, 'W': 3, 'E': 5, 'R': 7, 'T': 11, 'Y': 13, 'U': 17, 'I': 19, 'O': 23, 'P': 29, 'A': 31, 'S': 37, 'D': 41, 'F': 43, 'G': 47, 'H': 53, 'J': 59, 'K': 61, 'L': 67, 'Z': 71, 'X': 73, 'C': 79, 'V': 83, 'B': 89, 'N': 97, 'M': 101, ' ': 1}

  for (var i = 0; i < length1; i++) {
    word1hash *= tab[word1[i]]
  }

  for (var i = 0; i < length2; i++) {
    word2hash *= tab[word2[i]]
  }

  return word1hash == word2hash;
}

Edit 2

I know this is silly, but adding logarithms of prime numbers is actually a tiny bit faster. (benchmark)

function isAnagram(word1, word2) {
  if (!word1 || !word2 || !word1.length || !word2.length) {
    throw new Error('isAnagram requires two strings to be passed.')
  }

  var length1 = word1.length;
  var length2 = word2.length;

  var word1hash = 1;
  var word2hash = 1;

  var tab = {'q': 0.6931471805599453, 'w': 1.0986122886681098, 'e': 1.6094379124341003, 'r': 1.9459101490553132, 't': 2.3978952727983707, 'y': 2.5649493574615367, 'u': 2.833213344056216, 'i': 2.9444389791664403, 'o': 3.1354942159291497, 'p': 3.367295829986474, 'a': 3.4339872044851463, 's': 3.6109179126442243, 'd': 3.713572066704308, 'f': 3.7612001156935624, 'g': 3.8501476017100584, 'h': 3.970291913552122, 'j': 4.07753744390572, 'k': 4.110873864173311, 'l': 4.204692619390966, 'z': 4.2626798770413155, 'x': 4.290459441148391, 'c': 4.3694478524670215, 'v': 4.418840607796598, 'b': 4.48863636973214, 'n': 4.574710978503383, 'm': 4.61512051684126, 'Q': 0.6931471805599453, 'W': 1.0986122886681098, 'E': 1.6094379124341003, 'R': 1.9459101490553132, 'T': 2.3978952727983707, 'Y': 2.5649493574615367, 'U': 2.833213344056216, 'I': 2.9444389791664403, 'O': 3.1354942159291497, 'P': 3.367295829986474, 'A': 3.4339872044851463, 'S': 3.6109179126442243, 'D': 3.713572066704308, 'F': 3.7612001156935624, 'G': 3.8501476017100584, 'H': 3.970291913552122, 'J': 4.07753744390572, 'K': 4.110873864173311, 'L': 4.204692619390966, 'Z': 4.2626798770413155, 'X': 4.290459441148391, 'C': 4.3694478524670215, 'V': 4.418840607796598, 'B': 4.48863636973214, 'N': 4.574710978503383, 'M': 4.61512051684126, ' ': 0.0
}

  for (var i = 0; i < length1; i++) {
    word1hash += tab[word1[i]]
  }

  for (var i = 0; i < length2; i++) {
    word2hash += tab[word2[i]]
  }

  return word1hash == word2hash;
}

Edit 3: removed equal length check.

+1 The prime multiplication makes it faster. I'd probably find a way to turn your literal construction of 97 consecutive zeroes into something more readable and maintainable, though. — rrowland, Commented Aug 7, 2015 at 18:44
@rrowland, I'm actually not exactly sure as to whether the literal 97 zeros are faster than subtracting 97 from the index. It was just a conjecture. The funny thing was, on jsperf when I declared primes in the initializer's scope, the whole function was way slower—go figure. — Simon Kuang, Commented Aug 7, 2015 at 18:55
I ran a benchmark - rrowland's answer runs in about half the time the OP does, and this runs in about half the time of rrowland's answer. (Warning to anyone else benchmarking or using this: take out the console.logs before running this ten thousand times.) — Brilliand, Commented Aug 7, 2015 at 19:51
Here's a comparison on jsperf: jsperf.com/anagram-algorithms/3 — rrowland, Commented Aug 7, 2015 at 19:56
I suspect there will be some cases of long words that this function counts as anagrams when they really aren't, though - javascript integers start losing precision after a certain point, and this function will tend to cross that threshold after ten letters or so. — Brilliand, Commented Aug 7, 2015 at 19:57

Shivek Parmar · Accepted Answer · 2016-12-08 11:54:02Z

0

I have an easy example

function isAnagram(strFirst, strSecond) {

 if(strFirst.length != strSecond.length)
       return false;

 var tempString1 = strFirst.toLowerCase();
 var tempString2 = strSecond.toLowerCase();

 var matched = true ;
 var cnt = 0;
 while(tempString1.length){
    if(tempString2.length < 1)
        break;
    if(tempString2.indexOf(tempString1[cnt]) > -1 )
        tempString2 = tempString2.replace(tempString1[cnt],'');
    else
        return false;

    cnt++;
 }

 return matched ;

 }

Calling function will be isAnagram("Army",Mary); Function will return true or false

answered Dec 8, 2016 at 11:54

Shivek Parmar

1092 bronze badges

\$\begingroup\$ It would help if you explained why this code is better than what the OP and other people posted and how it is different from the other answers. \$\endgroup\$
– Graipher
Commented Dec 8, 2016 at 14:58

Add a comment |

Stack Exchange Network

Determining if two words are anagrams

5 Answers 5

Not the answer you're looking for? Browse other questions tagged
javascript
strings
or ask your own question.

Linked

Hot Network Questions

Determining if two words are anagrams

5 Answers 5

Not the answer you're looking for? Browse other questions tagged javascriptstrings or ask your own question.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
javascript
strings
or ask your own question.