So let's say I have the number 45, which is 101101 in binary and has four 1s in it. What's the most efficient algorithm for counting the 1 bits in a number?

Instead of writing an algorithm for this yourself, it's best to use the built-in function Integer.bitCount().

What makes this especially efficient is that the JVM can treat it as an intrinsic, i.e. recognise the call and replace the whole thing with a single machine-code instruction on platforms that support it, e.g. Intel/AMD (POPCNT).
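For the number in the question, a minimal usage sketch looks like this (the class name BitCountDemo is only for illustration). Whether the call is actually compiled to a single instruction depends on the JVM and CPU; the result is the same either way.

```java
final class BitCountDemo {
    public static void main(String[] args) {
        int n = 45; // 101101 in binary
        System.out.println(Integer.toBinaryString(n)); // prints 101101
        System.out.println(Integer.bitCount(n));       // prints 4
        // Negative ints are counted as their 32-bit two's complement pattern.
        System.out.println(Integer.bitCount(-1));      // prints 32
        // Long.bitCount does the same for 64-bit values.
        System.out.println(Long.bitCount(-1L));        // prints 64
    }
}
```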

To demonstrate how effective this optimisation is:

public class BitCountPerf {
    public static void main(String... args) {
        perfTestIntrinsic();
        perfTestACopy();
    }

    private static void perfTestIntrinsic() {
        long start = System.nanoTime();
        long countBits = 0;
        for (int i = 0; i < Integer.MAX_VALUE; i++)
            countBits += Integer.bitCount(i);
        long time = System.nanoTime() - start;
        System.out.printf("Intrinsic: Each bit count took %.1f ns, countBits=%d%n", (double) time / Integer.MAX_VALUE, countBits);
    }

    private static void perfTestACopy() {
        long start2 = System.nanoTime();
        long countBits2 = 0;
        for (int i = 0; i < Integer.MAX_VALUE; i++)
            countBits2 += myBitCount(i);
        long time2 = System.nanoTime() - start2;
        System.out.printf("Copy of same code: Each bit count took %.1f ns, countBits=%d%n", (double) time2 / Integer.MAX_VALUE, countBits2);
    }

    // Copied from Integer.bitCount()
    public static int myBitCount(int i) {
        // HD (Hacker's Delight), Figure 5-2
        i = i - ((i >>> 1) & 0x55555555);                // count per 2-bit field
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333); // count per 4-bit field
        i = (i + (i >>> 4)) & 0x0f0f0f0f;                // count per byte
        i = i + (i >>> 8);
        i = i + (i >>> 16);
        return i & 0x3f;                                 // total fits in the low 6 bits
    }
}


prints

Intrinsic: Each bit count took 0.4 ns, countBits=33285996513
Copy of same code: Each bit count took 2.4 ns, countBits=33285996513

With the intrinsic, each bit count (loop overhead included) takes just 0.4 ns on average. A copy of the same code takes 6x longer and gets the same result.

The most efficient way I know of to count the number of 1s in a 32-bit variable v is:

static int bitCount(int v) {
    v = v - ((v >>> 1) & 0x55555555);
    v = (v & 0x33333333) + ((v >>> 2) & 0x33333333);
    return (((v + (v >>> 4)) & 0x0F0F0F0F) * 0x01010101) >>> 24; // the result
}

Update: I want to make clear that it's not my code; it's actually older than me. According to Donald Knuth (The Art of Computer Programming Vol IV, p 11), the code first appeared in the first textbook on programming, The Preparation of Programs for an Electronic Digital Computer by Wilkes, Wheeler and Gill (2nd Ed 1957, reprinted 1984). Pages 191–193 of the 2nd edition presented the "Nifty Parallel Count" by D. B. Gillies and J. C. P. Miller.
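To see what each step of the parallel count above does, here is a small Java trace (the class and method names are hypothetical). For v = 45, binary 00101101, the 2-bit fields 00|10|11|01 hold 0, 1, 2 and 1 ones, so step 1 gives 00|01|10|01 = 25. Step 2 adds neighbouring fields into nibbles 0001|0011 = 19 (1 one in the high nibble, 3 in the low one). Step 3 sums nibbles into bytes, and multiplying by 0x01010101 accumulates all byte counts into the top byte, giving 4.

```java
final class ParallelCountTrace {
    // Returns the intermediate values of the parallel count:
    // {after step 1, after step 2, final count}.
    static int[] trace(int v) {
        // Step 1: each 2-bit field now holds the number of 1s in that field.
        int s1 = v - ((v >>> 1) & 0x55555555);
        // Step 2: each 4-bit field now holds the number of 1s in that field.
        int s2 = (s1 & 0x33333333) + ((s1 >>> 2) & 0x33333333);
        // Step 3: each byte holds its count; multiplying by 0x01010101 sums
        // all four bytes into the top byte, which >>> 24 extracts.
        int c = (((s2 + (s2 >>> 4)) & 0x0F0F0F0F) * 0x01010101) >>> 24;
        return new int[] {s1, s2, c};
    }

    public static void main(String[] args) {
        for (int x : trace(45)) {
            System.out.println(Integer.toBinaryString(x));
        }
    }
}
```

Running main prints 11001, 10011 and 100, i.e. 25, 19 and the final count 4.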

See Bit Twiddling Hacks and study all the 'counting bits set' algorithms. In particular, Brian Kernighan's method is simple and quite fast if you expect a small answer (few bits set). If you expect an evenly distributed answer, a lookup table might be better.
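Both methods can be sketched in Java roughly as follows (class and method names are illustrative, not from Bit Twiddling Hacks). Kernighan's trick relies on n & (n - 1) clearing the lowest set bit, so the loop runs once per 1 bit rather than once per bit position. The table version precomputes the count for all 256 byte values and sums four lookups.

```java
final class BitCountAlternatives {
    // Brian Kernighan's method: each iteration clears the lowest set bit,
    // so the loop runs as many times as there are 1 bits.
    static int kernighan(int n) {
        int count = 0;
        while (n != 0) {
            n &= n - 1;
            count++;
        }
        return count;
    }

    // Lookup table: number of 1 bits for every byte value 0..255.
    private static final int[] BYTE_COUNTS = new int[256];
    static {
        for (int i = 1; i < 256; i++) {
            // i has the bits of i >> 1 plus its own lowest bit.
            BYTE_COUNTS[i] = BYTE_COUNTS[i >> 1] + (i & 1);
        }
    }

    // Sum the table entries for the four bytes of n.
    static int lookup(int n) {
        return BYTE_COUNTS[n & 0xFF]
             + BYTE_COUNTS[(n >>> 8) & 0xFF]
             + BYTE_COUNTS[(n >>> 16) & 0xFF]
             + BYTE_COUNTS[n >>> 24];
    }
}
```

Using >>> keeps the byte indices in 0..255 even for negative n.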


This is called the Hamming weight. It is also known as the population count, popcount, or sideways sum.


The following is from either the "Bit Twiddling Hacks" page or Knuth's books (I don't remember which). It is adapted to unsigned 64-bit integers and works in C#. I don't know whether the lack of unsigned types in Java creates a problem.

By the way, I'm posting this code only for reference; the best answer is to use Integer.bitCount(), as @Lawrey said, since some (but not all) CPUs have a dedicated machine instruction for this operation.

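  const UInt64 m1 = 0x5555555555555555;  // binary: 0101...
  const UInt64 m2 = 0x3333333333333333;  // binary: 00110011...
  const UInt64 m4 = 0x0f0f0f0f0f0f0f0f;  // binary: 4 zeros, 4 ones, ...
  const UInt64 h01 = 0x0101010101010101; // 1 in every byte

  public int Count(UInt64 x)
  {
      x -= (x >> 1) & m1;             // count of each 2 bits into those 2 bits
      x = (x & m2) + ((x >> 2) & m2); // count of each 4 bits into those 4 bits
      x = (x + (x >> 4)) & m4;        // count of each 8 bits into those 8 bits
      return (int) ((x * h01) >> 56); // top byte holds the sum of all bytes
  }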
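As for the unsigned question: a Java port can be sketched as below, assuming the only change needed is to use the unsigned shift >>> instead of >>. Java's long multiplication wraps modulo 2^64, which yields the same bits as unsigned multiplication, so the top byte still holds the total. The class name PopCount64 is made up for this example; Long.bitCount remains the simpler choice.

```java
final class PopCount64 {
    private static final long M1 = 0x5555555555555555L;  // binary: 0101...
    private static final long M2 = 0x3333333333333333L;  // binary: 00110011...
    private static final long M4 = 0x0f0f0f0f0f0f0f0fL;  // binary: 4 zeros, 4 ones, ...
    private static final long H01 = 0x0101010101010101L; // 1 in every byte

    static int count(long x) {
        x -= (x >>> 1) & M1;              // count of each 2 bits into those 2 bits
        x = (x & M2) + ((x >>> 2) & M2);  // count of each 4 bits into those 4 bits
        x = (x + (x >>> 4)) & M4;         // count of each 8 bits into those 8 bits
        return (int) ((x * H01) >>> 56);  // top byte holds the sum of all bytes
    }
}
```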
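public int f(int n) {
    int result = 0;
    // Use >>> (unsigned shift) and loop while n != 0 so negative numbers are
    // counted too; with n > 0 as the condition they would return 0.
    for (; n != 0; n = n >>> 1)
        result += n & 1; // add the lowest bit

    return result;
}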
The following Ruby code works for non-negative numbers.

def count_bits(num)
  count = 0
  while num > 0
    count += num & 1 # add the lowest bit
    num >>= 1        # drop the lowest bit
  end
  count
end
The fastest method I have used, and also seen in a practical implementation (the open-source Sphinx search engine), is the MIT HAKMEM algorithm. It runs very fast over a large stream of 1s and 0s.
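The HAKMEM bit count (item 169) works on 3-bit (octal) fields instead of 2-bit ones, then uses a remainder by 63. The following Java sketch is a straightforward port of the well-known C version, not Sphinx's code; the class name is hypothetical. Because Java's int is signed and bit 31 may be set after masking, it assumes Java 8+ and uses Integer.remainderUnsigned instead of the C % operator.

```java
final class Hakmem169 {
    static int bitCount(int n) {
        // Each 3-bit (octal) field now holds the number of 1s in that field:
        // for field bits a + 2b + 4c, subtracting (b + 2c) and c leaves a + b + c.
        int tmp = n - ((n >>> 1) & 033333333333) - ((n >>> 2) & 011111111111);
        // Add adjacent 3-bit fields and keep every other one, giving six
        // 6-bit fields (the top one only 2 bits wide) holding partial counts.
        tmp = (tmp + (tmp >>> 3)) & 030707070707;
        // The fields are base-64 digits; since 64 % 63 == 1, the value mod 63
        // is the sum of the digits, i.e. the total count (at most 32 < 63).
        return Integer.remainderUnsigned(tmp, 63);
    }

    public static void main(String[] args) {
        System.out.println(bitCount(45)); // prints 4
        System.out.println(bitCount(-1)); // prints 32
    }
}
```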

