112

You are given a large range [a,b] where 'a' and 'b' can be typically between 1 and 4,000,000,000 inclusive. You have to find out the XOR of all the numbers in the given range.

This problem was used in TopCoder SRM. I saw one of the solutions submitted in the match and I'm not able to figure out how its working.

Could someone help explain the winning solution:

long long f(long long a) {
     long long res[] = {a,1,a+1,0};
     return res[a%4];
}

long long getXor(long long a, long long b) {
     return f(b)^f(a-1);
}

Here, getXor() is the actual function to calculate the xor of all number in the passed range [a,b] and "f()" is a helper function.

3
  • I edited your question just a bit. We don't mind explaining the why of some code, but we don't need a new list of other ways to solve this. Leave that to TopCoder.
    – Kev
    Commented May 21, 2012 at 0:00
  • @Kev No issues! I wrote that because some people love to give their own way rather than explaining the already written thing. And any new idea is never a waste... ;) Commented May 22, 2012 at 2:55
  • This has undefined behaviour for a<=0, or for b<0. long long is a signed type, so x%4 is negative (or 0) for negative inputs. Perhaps you want unsigned long long, and/or a & 3 to index the array? Commented Jan 30, 2019 at 0:12

6 Answers 6

166

This is a pretty clever solution -- it exploits the fact that there is a pattern of results in the running XORs. The f() function calculates the XOR total run from [0, a]. Take a look at this table for 4-bit numbers:

0000 <- 0  [a]
0001 <- 1  [1]
0010 <- 3  [a+1]
0011 <- 0  [0]
0100 <- 4  [a]
0101 <- 1  [1]
0110 <- 7  [a+1]
0111 <- 0  [0]
1000 <- 8  [a]
1001 <- 1  [1]
1010 <- 11 [a+1]
1011 <- 0  [0]
1100 <- 12 [a]
1101 <- 1  [1]
1110 <- 15 [a+1]
1111 <- 0  [0]

Where the first column is the binary representation and then the decimal result and its relation to its index (a) into the XOR list. This happens because all the upper bits cancel and the lowest two bits cycle every 4. So, that's how to arrive at that little lookup table.

Now, consider for a general range of [a,b]. We can use f() to find the XOR for [0,a-1] and [0,b]. Since any value XOR'd with itself is zero, the f(a-1) just cancels out all the values in the XOR run less than a, leaving you with the XOR of the range [a,b].

14
  • min range threshold is 1, not 0 Commented May 20, 2012 at 3:39
  • 3
    @PenchoIlchev Whether or not it includes 0 is kind of moot -- (n^0)==n
    – FatalError
    Commented May 20, 2012 at 4:03
  • 2
    @rajneesh2k10 Well, in runs of 4 (starting at a multiple of 4), all the bits except the lowest are the same, so they alternate between canceling each other or having their original value. It's true that the lowest bit cycles every 2, but 0^1 == 1 (i.e. they don't cancel). The reason the lowest two is special is because (00 ^ 01 ^ 10 ^ 11) == 00. In other words, every 4 values you cycle through gets you back to 0, and so you can cancel out all such cycles, which is why a%4 is significant.
    – FatalError
    Commented May 20, 2012 at 4:45
  • 3
    @Pandrei the a there is 2, not 0.
    – user555045
    Commented Jan 23, 2014 at 11:38
  • 1
    That column is the running xor and 1 xor 2 is 3, so the current value in that row looks correct to me.
    – FatalError
    Commented Aug 7, 2016 at 14:27
66

Adding to FatalError's great answer, the line return f(b)^f(a-1); could be explained better. In short, it's because XOR has these wonderful properties:

  • It's associative - Place brackets wherever you want
  • It's commutative - that means you can move the operators around (they can "commute")

Here's both in action:

(a ^ b ^ c) ^ (d ^ e ^ f) = (f ^ e) ^ (d ^ a ^ b) ^ c
  • It reverses itself

Like this:

a ^ b = c
c ^ a = b

Add and multiply are two examples of other associative/ commutative operators, but they don't reverse themselves. Ok, so, why are these properties important? Well, a simple route is to expand it out into what it really is, and then you can see these properties at work.

First, let's define what we want and call it n:

n      = (a ^ a+1 ^ a+2 .. ^ b)

If it helps, think of XOR (^) as if it was an add.

Let's also define the function:

f(b)   = 0 ^ 1 ^ 2 ^ 3 ^ 4 .. ^ b

b is greater than a, so just by safely dropping in a few extra brackets (which we can because it's associative), we can also say this:

f(b)   = ( 0 ^ 1 ^ 2 ^ 3 ^ 4 .. ^ (a-1) ) ^ (a ^ a+1 ^ a+2 .. ^ b)

Which simplifies to:

f(b)   = f(a-1) ^ (a ^ a+1 ^ a+2 .. ^ b)

f(b)   = f(a-1) ^ n

Next, we use that reversal property and commutivity to give us the magic line:

n      = f(b) ^ f(a-1)

If you've been thinking of XOR like an add, you would've dropped in a subtract there. XOR is to XOR what add is to subtract!

How do I come up with this myself?

Remember the properties of logical operators. Work with them almost like an add or multiply if it helps. It feels unusual that and (&), xor (^) and or (|) are associative, but they are!

Run the naive implementation through first, look for patterns in the output, then start finding rules which confirm the pattern is true. Simplify your implementation even further and repeat. This is probably the route that the original creator took, highlighted by the fact that it's not completely optimal (i.e. use a switch statement rather than an array).

2
  • 4
    This reminds me of my Discrete Mathematics course I took last year in university. Fun days. What came to my mind immediately after reading this is this XKCD comic. Commented Jan 18, 2017 at 13:40
  • It reverses itself is technically "a is the inverse of itself", or really we can extend this discussion all the way to basic Group Theory...
    – iBug
    Commented Dec 29, 2021 at 14:22
3

I found out that the below code is also working like the solution given in the question.

May be this is little optimized but its just what I got from observing repetition like given in the accepted answer,

I would like to know / understand the mathematical proof behind the given code, like explained in the answer by @Luke Briggs

Here is that JAVA code

public int findXORofRange(int m, int n) {
    int[] patternTracker;

    if(m % 2 == 0)
        patternTracker = new int[] {n, 1, n^1, 0};
    else
        patternTracker = new int[] {m, m^n, m-1, (m-1)^n};

    return patternTracker[(n-m) % 4];
}
2

I have solved the problem with recursion. I simply divide the dataset into an almost equal part for every iteration.

public int recursion(int M, int N) {
    if (N - M == 1) {
        return M ^ N;
    } else {
        int pivot = this.calculatePivot(M, N);
        if (pivot + 1 == N) {
            return this.recursion(M, pivot) ^ N;
        } else {
            return this.recursion(M, pivot) ^ this.recursion(pivot + 1, N);
        }
    }
}
public int calculatePivot(int M, int N) {
    return (M + N) / 2;
}

Let me know your thoughts over the solution. Happy to get improvement feedbacks. The proposed solution calculates the XOR in 0(log N) complexity.

Thank you

1
  • 2
    This one has same Computational complexity with normal m ^ (m+1) ^ ... ^ (n-1) ^ n calculation. This is 0(n). Commented Apr 23, 2018 at 9:49
0

To support XOR from 0 to N the code given needed to be modified as below,

int f(int a) {
    int []res = {a, 1, a+1, 0};
    return res[a % 4];
}

int getXor(int a, int b) {
    return f(b) ^ f(a);
}
0

Adding on even further to FatalError's answer, it's possible to prove (by induction) that the observed pattern in f() will cycle for every 4 numbers.

We're trying to prove that for every integer k >= 0,

f(4k + 1) = 1
f(4k + 2) = 4k + 3
f(4k + 3) = 0
f(4k + 4) = 4k + 4

where f(n) is 1 ^ 2 ^ ... ^ n.

As our base case, we can work out by hand that

f(1) = 1
f(2) = 1 ^ 2 = 3
f(3) = 3 ^ 3 = 0
f(4) = 0 ^ 4 = 4

For our inductive step, assume that these equations are true up to a particular integer 4x (i.e. f(4x) = 4x). We want to show that our equations are true for 4x + 1, 4x + 2, 4x + 3 and 4x + 4.

To help write and visualize the proof, we can let b(x) denote the binary (base-2) string representation of x, for example
b(7) = '111', b(9) = '1001'.

and

b(4x)     = 'b(x)00'
b(4x + 1) = 'b(x)01'
b(4x + 2) = 'b(x)10'
b(4x + 3) = 'b(x)11'

Here is the inductive step:

Assume: f(4x) = 4x = 'b(x)00'
Then:

f(4x + 1) = f(4x)    ^ (4x + 1)  // by definition
          = f(4x)    ^ 'b(x)01'  // by definition
          = 'b(x)00' ^ 'b(x)01'  // from assumption
          = '01'                 // as b(x) ^ b(x) = 0

f(4x + 2) = f(4x + 1) ^ (4x + 2)
          = f(4x + 1) ^ 'b(x)10'       
          = '01'      ^ 'b(x)10'
          = 'b(x)11'             // this is 4x + 3

f(4x + 3) = f(4x + 2) ^ (4x + 3)
          = f(4x + 2) ^ 'b(x)11' 
          = 'b(x)11'  ^ 'b(x)11'
          = '00'

For the last case, we don't use binary strings, 
since we don't know what b(4x + 4) is.

f(4x + 4) = f(4x + 3) ^ (4x + 4)
          = 0         ^ (4x + 4)
          = 4x + 4

So the pattern holds for the next four numbers after 4x, completing the proof.

Not the answer you're looking for? Browse other questions tagged or ask your own question.