13
$\begingroup$

Given two strings how can you check if they are a permutation of each other using O(1) space? Modifying the strings is not allowed in any way.
Note: O(1) space in relation to both the string length AND the size of the alphabet.

$\endgroup$
11
  • 3
    $\begingroup$ What do you think? What have you tried, and where did you get stuck? Are the strings over a constant-sized alphabet? Have you tried computing their histograms? $\endgroup$ Commented Jun 7, 2017 at 19:28
  • $\begingroup$ @YuvalFilmus it should be O(1) space both to the length of the string and the size of the alphabet $\endgroup$ Commented Jun 7, 2017 at 19:32
  • $\begingroup$ This seems clearly impossible. Any algorithm will require additional space to store at least a position in one string or a single character. Neither of these things is O(1). $\endgroup$ Commented Jun 8, 2017 at 12:38
  • $\begingroup$ @DavidSchwartz - how? O(1) means constant, not one bute. It doesn't matter how long the string is, position in it is one number. $\endgroup$
    – Davor
    Commented Jun 8, 2017 at 14:09
  • $\begingroup$ It depends on the machine model, obviously no problem in uniform models. In a logarithmic cost model storing the index is O(log n) for strings of length n which is neither constant by means of the length nor to the alphabet size. When the strings can be temporarily modified, I think there is a solution with increased alphabet which is linear in the alphabet size but constant in the string length in a logarithmic model. $\endgroup$
    – kap
    Commented Jun 8, 2017 at 14:57

7 Answers 7

7
$\begingroup$

The naive approach would be building histograms of both strings and checking whether they are the same. Since we are not allowed to store such a data structure (whose size would be linear to the size of the alphabet) that could be computed in one pass, we need to count the occurences of each possible symbol after the other:

function count(letter, string)
    var count := 0
    foreach element in string
        if letter = element
            count++
    return count

function samePermutation(stringA, stringB)
    foreach s in alphabet
        if count(s, stringA) != count(s, stringB)
            return false
    return true

This does of course assume that the counts and iterator indices are integers of constant size, instead of being dependent on the length of the strings.

$\endgroup$
7
  • $\begingroup$ As an optimization, you can go over one array and only compute the histograms of the letters you encounter. In this way the complexity becomes independent of the alphabet size. $\endgroup$ Commented Jun 8, 2017 at 5:40
  • $\begingroup$ To expand on @YuvalFilmus comment, you would need to also 1) check that the string lengths are the same or 2) iterate over both input strings. You need one of these as it's possible some letters in one are not in the other. Option 1 should have fewer calculations. $\endgroup$
    – BurnsBA
    Commented Jun 8, 2017 at 11:39
  • $\begingroup$ @YuvalFilmus I wanted to avoid that as it would mean quadratic time complexity, I'd expect the alphabet to be smaller than the average string size. For small strings and an ordered alphabet, I'd consider computing the next smallest present symbol along with the count in the inner loop, so that one can skip a few iterations of the alphabet loop - with a complexity of O(n * min(n, |Σ|)). Hm, now that I think about it, that sounds quite like the "allowed to repeat" solution from your answer, doesn't it? $\endgroup$
    – Bergi
    Commented Jun 8, 2017 at 12:04
  • $\begingroup$ count isn't O(1) (i.e. it may overflow) $\endgroup$ Commented Jun 8, 2017 at 15:17
  • 1
    $\begingroup$ @Eternalcode I never said that count was an int :-) Yes, it would not work, but in Java that cannot happen anyway $\endgroup$
    – Bergi
    Commented Jun 16, 2017 at 5:56
12
$\begingroup$

Denote the arrays by $A,B$, and suppose they are of length $n$.

Suppose first that the values in each array are distinct. Here is an algorithm that uses $O(1)$ space:

  1. Compute the minimum values of both arrays, and check that they are identical.

  2. Compute the second minimum values of both arrays, and check that they are identical.

  3. And so on.

Computing the minimum value of an array clearly uses $O(1)$ space. Given the $k$th smallest element, we can find the $(k+1)$st smallest element by finding the minimal value larger than the $k$th smallest element (here we use the fact that all elements are distinct).

When elements are allowed to repeat, we modify the algorithm as follows:

  1. Compute the minimum values $m_{A,1},m_{B,1}$ of both arrays, count how many times each appear, and verify the $m_{A,1} = m_{B,1}$ and that the counts are identical.

  2. Compute the minimum values $m_{A,2},m_{B,2}$ larger than $m_{A,1},m_{B,1}$ in the two arrays (respectively), and count how many times each appear. Verify that $m_{A,2} = m_{B,2}$, and that the counts are identical.

  3. And so on.

$\endgroup$
11
  • 1
    $\begingroup$ Would this approach be $O(n^2)$ since it seems like the only way to find the min element in $O(1)$ space and read-only access to the array is to iterate over all elements? $\endgroup$
    – ryan
    Commented Jun 8, 2017 at 3:21
  • 4
    $\begingroup$ This requires an ordering on the alphabet, though it's easy to change the algorithm not to require that. However, in the "has duplicates" case, this requires $O(\lg n)$ space, not $O(1)$. Counting takes space. $\endgroup$ Commented Jun 8, 2017 at 5:06
  • 7
    $\begingroup$ Counting does need (logarithmic) space, but - by this definition of space usage - so does even iterating over the array. Thus, under the strict meaning of space usage, there is no way to do it in constant space. $\endgroup$ Commented Jun 8, 2017 at 5:54
  • 4
    $\begingroup$ @DanielJour, it depends on which cost model you're using. Under uniform cost, this is possible in constant space. $\endgroup$
    – ryan
    Commented Jun 8, 2017 at 6:22
  • 7
    $\begingroup$ If you are only allowed a constant number of bits, you can only handle constant size alphabets (this follows from the theory of regular languages). $\endgroup$ Commented Jun 8, 2017 at 7:03
2
$\begingroup$

Define some function f(c) which maps some character c to a unique prime number (a = 2, b = 3, c = 5, etc).

set checksum = 1
set count = 0 <-- this is probably not even necessary, but it's another level of check
for character c in string 1
    checksum = checksum * f(c)
    count = count + 1
for character c in string 2
    checksum = checksum / f(c)
    count = count = 1

permutation = count == 0 and checksum == 1

Just declaring that you can use a prime number mapping function is a bit handwavey, and most likely where a problem would arise keeping $\textit{O}\textbf{(1)}$ space.

$\endgroup$
4
  • $\begingroup$ With a bound on the alphabet, $f(c)$ should use $O(1)$ space, otherwise I believe it would not be constant space. Furthermore, if you did compute it in $O(1)$ space it would be extremely inefficient based on current results. Still, +1 for the primality approach. $\endgroup$
    – ryan
    Commented Jun 8, 2017 at 5:13
  • $\begingroup$ Another issue I realised after posting is that checksum is going to be a gigantic number for large strings, to the extent that by itself it could violate the O(1) space requirement. This can be resolved by using floats and mutliplying by a character on one string then dividing on the other, then just saying checksum must be close to 1. The strings would have to be truly gigantic for floating point error to be a problem. $\endgroup$
    – Turksarama
    Commented Jun 8, 2017 at 5:29
  • 4
    $\begingroup$ Such answers are the reason we need to be careful of our computation model. The usual model we use when analyzing algorithms counts memory in units of machine words, which have size $O(\log n)$ bits. So you can't do the computation in integers. If you switch to floating point, your algorithm might fail even when the two strings are permutations of one another, and conversely won't necessarily give the right answer when they are not. $\endgroup$ Commented Jun 8, 2017 at 5:44
  • 4
    $\begingroup$ This doesn't use constant space. Even for a fixed alphabet, the size of the integer checksum is going to be $\Theta(n)$ bits for inputs of length $n$. $\endgroup$ Commented Jun 8, 2017 at 8:54
0
$\begingroup$

You can do this is O(nlogn). Sort the two strings, and compare them index by index. If they differ anywhere, they're not permutations of each other.

For an O(n) solution, hashing could be used. This hashing function would work, and e for any letter would be its ascii value. If the two hashes of the strings differ, they're not permutations of each other.

The hashing function in the link:

One potential candidate might be this. Fix a odd integer R. For each element e you want to hash compute the factor (R + 2*e). Then compute the product of all these factors. Finally divide the product by 2 to get the hash.

The factor 2 in (R + 2e) guarantees that all factors are odd, hence avoiding that the product will ever become 0. The division by 2 at the end is because the product will always be odd, hence the division just removes a constant bit.

E.g. I choose R = 1779033703. This is an arbitrary choice, doing some experiments should show if a given R is good or bad. Assume your values are [1, 10, 3, 18]. The product (computed using 32-bit ints) is

(R + 2) * (R + 20) * (R + 6) * (R + 36) = 3376724311 Hence the hash would be

3376724311/2 = 1688362155.

Using double hashing (or for overkill even more) by changing the value of R would successfully identify them as permutations with very high probability.

$\endgroup$
1
  • 1
    $\begingroup$ You cannot sort the strings since you are not allowed to modify them. As for hashing, it is a randomized algorithm that could give the wrong answer. $\endgroup$ Commented Jun 8, 2017 at 5:42
0
$\begingroup$

Let's say you have two strings called s and t.

You could use heuristics to make sure that they are not unequal.

  1. s.length == t.length
  2. sum of chars of s == sum of chars in t
  3. [the same as in 2. but with xor instead of sum]

After this you can easily run an algorithm to prove that the string are equal.

  1. sort one string to be equal to the other and compare (O(n^2))
  2. sort both and compare (O(2n log(n))
  3. check for each char in s if there are the same amounts in both strings (O(n^2))

Of course you can't sort that fast if you are not allowed to use additional space. So it does not matter which algorithm you choose - each algorithm will need will run in O(n^2) time when there is only O(1) space and if the heuristic was unable to prove that they can't be equal.

$\endgroup$
1
  • 3
    $\begingroup$ "Modifying the strings is not allowed in any way." $\endgroup$
    – Bergi
    Commented Jun 8, 2017 at 18:34
0
$\begingroup$

In C-style code for the whole routine:

for (int i = 0; i < n; i++) {
   int k = -1;
   next: for (int j = 0; j <= i; j++)
       if (A[j] == A[i]) {
          while (++k < n)
              if (B[k] == A[i])
                  continue next;
          return false; // note at this point j == i
       }
}
return true; 

Or in very verbose pseudo code (using 1-based indexing)

// our loop invariant is that B contains a permutation of the letters
// in A[1]..A[i-1]
for i=1..n
   if !checkLetters(A, B, i)
      return false
return true

where the function checkLetters(A,B,i) checks that if there are M copies of A[i] in A[1]..A[i], then there are at least M copies of A[i] in B:

checkLetters(A,B,i)
    k = 0 // scan index into B
    for j=1..i
      if A[j] = A[i]
         k = findNextValue(B, k+1, A[i])
         if k > n
            return false
    return true

and function findNextValue searches in B for a value starting from an index, and returns the index where it was found (or n+1 if not found).

The idea here is that we have already validated that we have a permutation of the characters before i in the outer loop. In the j loop, we proceed through all the characters that match A[i], and we need to be able to find unique indices in B that match for them (including the i-th position). Thus for each of those, we need to move our scan forward through B. Time is O($n^2$).

$\endgroup$
3
  • $\begingroup$ Can you please convert your C code to pseudocode? This is not a programming site. $\endgroup$ Commented Jun 9, 2017 at 9:20
  • $\begingroup$ This seems like another variant of Bergi's answer (with some inconsequential differences). $\endgroup$ Commented Jun 9, 2017 at 18:04
  • $\begingroup$ It is similar but not a variant. Bergi's answer is $O(nm)$ where m=alphabet size. This is $O(n^2)$. $\endgroup$
    – MotiNK
    Commented Jun 10, 2017 at 4:24
0
$\begingroup$

I think this is the most simplest algorithm (with $O(n^3$) time, $n$ length of strings)

Loop through string1 and string2, for every character check how often it can be found in string1 and string2. I a character is more often in one string than in the other one, it is not a permutation. If the frequencies of all characters are equal then the strings are permutations of each other.

Here is a piece of python to make this precise

s1="abcaba"
s2="aadbba"

def check_if_permutations(string1, string2):
  for string in [string1, string2]:
    # string references string1 
    #  string2, it is not a copy
    for char in string:
      count1=0
      for char1 in string1:
        if  char==char1:
          count1+=1
      count2=0
      for char2 in string2:
        if  char==char2:
          count2+=1
      if count1!=count2:
        print('unbalanced character',char)
        return()
  print ("permutations")
  return()

check_if_permutations(s1,s2)

The program needs some pointers to strings (string, string1, string2, char, char1, char2) and variables of size $O(\log n)$ for counting (count1, count2). It needs to check if characters are equal or not but it dows not need any order on these characters. Maybe it needs some variables for small integers (e.g. to hold boolean values or to represent the postition of string in [string1, string2].

Of course you don't even need the count variables but can use pointers.

s1="abcaba"
s2="aadbba"

def check_if_permutations(string1, string2):
  for string in [string1, string2]:
    # string references one of string1 
    # or string2, it is not a copy
    for char in string:
      # p1 and p2 should be views as pointers
      p1=0
      p2=0
      while (p1<len(string1)) and (p2<len(string2)):
        # p1>=len(string1): p1 points to beyond end of string
        while (p1<len(string1)) and (string1[p1]!=char) :
          p1+=1
        while(p2<len(string2)) and (string2[p2]!=char):
          p2+=1
        if (p1<len(string1)) != (p2<len(string2)):
          print('unbalanced character',char)
          return()
        p1+=1
        p2+=1
  print ("permutations")
  return()

check_if_permutations(s1,s2)

This second program needs similar variables as the first one except that it does not need the $O(\log(n))$-size variables for holding the count values.

So it actually does not depend of $n$ or the size of the alphabet.

$\endgroup$
3
  • $\begingroup$ This is the same as Bergi's solution below. $\endgroup$ Commented Jun 9, 2017 at 12:32
  • $\begingroup$ @YuvalFilmus No, it does not iterate over the whole alphabet and so its runtime therefore does not depend on the alphabet size. It only uses the two strings that should be tested. Also the second program avoids counting. $\endgroup$
    – miracle173
    Commented Jun 9, 2017 at 13:04
  • $\begingroup$ @YuvalFilmus I see now, that your and other comments point in the direction of the way I used in my program. $\endgroup$
    – miracle173
    Commented Jun 9, 2017 at 13:14

Not the answer you're looking for? Browse other questions tagged or ask your own question.