69

I need to count the number of zero elements in numpy arrays. I'm aware of the numpy.count_nonzero function, but there appears to be no analog for counting zero elements.

My arrays are not very large (typically fewer than 1E5 elements), but the operation is performed several million times.

Of course I could use len(arr) - np.count_nonzero(arr), but I wonder if there's a more efficient way to do it.

Here's a MWE of how I do it currently:

import numpy as np
import timeit

arrs = []
for _ in range(1000):
    arrs.append(np.random.randint(-5, 5, 10000))


def func1():
    for arr in arrs:
        zero_els = len(arr) - np.count_nonzero(arr)


print(timeit.timeit(func1, number=10))
  • 1
    count_nonzero is a very basic compiled operation. Whether you want to know the number of zeros or the number of nonzeros, you still have to loop through the whole array. Let numpy do that in compiled code and don't worry about efficiency.
    – hpaulj
    Commented Mar 21, 2017 at 0:30
  • 3
    Why do you think len(arr) - np.count_nonzero(arr) is inefficient? Commented Mar 21, 2017 at 0:35
  • 1
    You realize that len(arr) is a simple attribute lookup, right? It doesn't iterate the array again... Commented Mar 21, 2017 at 0:38
  • 2
    @juanpa.arrivillaga len(arr) is an attribute lookup through a function call. Pure attribute lookup a.size takes 25% less time.
    – DYZ
    Commented Mar 21, 2017 at 0:40
  • 1
    @DYZ yes, you should use a.size anyway, especially since len(a) will give the wrong answer for multidimensional arrays. But I don't think that is what OP was referring to... Commented Mar 21, 2017 at 0:43

1 Answer

94

A faster approach (about 1.6x in the timings below) is to call np.count_nonzero() directly on the boolean condition you need.

In [3]: arr
Out[3]: 
array([[1, 2, 0, 3],
       [3, 9, 0, 4]])

In [4]: np.count_nonzero(arr==0)
Out[4]: 2

In [5]: def func_cnt():
   ...:     for arr in arrs:
   ...:         # arr == 0 is True at zeros, so this counts the zeros
   ...:         zero_els = np.count_nonzero(arr == 0)
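As a quick sanity check, here is a minimal sketch comparing the direct form with the "size minus nonzero" form on the small array from above. It uses arr.size rather than len(arr), since len() only reports the length of the first axis; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 0, 3],
                [3, 9, 0, 4]])

# arr == 0 is a boolean array; True counts as nonzero.
zeros_direct = np.count_nonzero(arr == 0)

# Complement form: total number of elements minus the nonzero ones.
zeros_complement = arr.size - np.count_nonzero(arr)

# len(arr) is the length of the first axis only (2 here, not 8),
# so it is not a safe substitute for arr.size on 2-D input.
zeros_with_len = len(arr) - np.count_nonzero(arr)

print(zeros_direct, zeros_complement, zeros_with_len)  # 2 2 -4
```

For the 1-D arrays in the question, len(arr) and arr.size agree, so both forms give the same count there.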

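You can also use np.where(), but it's slower than np.count_nonzero(). Note that np.where() returns a tuple with one index array per dimension, so take the length of one of those arrays, not of the tuple:

In [6]: np.where(arr == 0)
Out[6]: (array([0, 1]), array([2, 2]))

In [7]: len(np.where(arr == 0)[0])
Out[7]: 2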
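One thing to watch with np.where(): for the 2-D example the tuple length happens to equal the number of zeros, which can hide the difference between the two. A small sketch with a hypothetical 1-D array (flat) makes it visible:

```python
import numpy as np

arr = np.array([[1, 2, 0, 3],
                [3, 9, 0, 4]])

idx = np.where(arr == 0)   # tuple: (row indices, column indices)
n_dims = len(idx)          # number of index arrays, equal to arr.ndim
n_zeros = len(idx[0])      # number of matching elements

# 1-D array with three zeros: the tuple has one entry, holding three indices.
flat = np.array([0, 5, 0, 0, 7])
flat_idx = np.where(flat == 0)

print(n_dims, n_zeros)                   # 2 2
print(len(flat_idx), len(flat_idx[0]))   # 1 3
```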

Efficiency (fastest first; func_where is the np.where() variant of the loop, definition not shown):

In [8]: %timeit func_cnt()
10 loops, best of 3: 29.2 ms per loop

In [9]: %timeit func1()
10 loops, best of 3: 46.5 ms per loop

In [10]: %timeit func_where()
10 loops, best of 3: 61.2 ms per loop
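To rerun this comparison outside IPython, something like the following sketch should work. The func_where body is an assumption, since its definition is not shown above, and the functions return their counts so the three variants can be checked against each other. Absolute numbers and even the ranking can differ across NumPy versions and machines, as one of the comments below reports.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
# Fewer arrays than in the question, to keep the run short.
arrs = [rng.integers(-5, 5, 10_000) for _ in range(100)]

def func1():
    return [arr.size - np.count_nonzero(arr) for arr in arrs]

def func_cnt():
    return [np.count_nonzero(arr == 0) for arr in arrs]

def func_where():
    # Assumed definition: count zeros via the first index array of np.where.
    return [len(np.where(arr == 0)[0]) for arr in arrs]

# All three variants must agree before their timings are worth comparing.
assert func1() == func_cnt() == func_where()

for f in (func_cnt, func1, func_where):
    seconds = timeit.timeit(f, number=10)
    print(f"{f.__name__}: {seconds / 10 * 1e3:.2f} ms per call")
```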

More speedups with accelerators

If you have access to accelerators (GPU/TPU), JAX can give a further speedup; the timing below is more than 3 orders of magnitude lower than the NumPy timings above. Another advantage of JAX is that NumPy code needs very little modification to become JAX compatible. Below is an example:

In [1]: import numpy as np
   ...: import jax.numpy as jnp
In [2]: from jax import jit

# set up inputs
In [3]: arrs = []
In [4]: for _ in range(1000):
   ...:     arrs.append(np.random.randint(-5, 5, 10000))

# JIT'd function that performs the counting task
In [5]: @jit
   ...: def func_cnt():
   ...:     for arr in arrs:
   ...:         zero_els = jnp.count_nonzero(arr==0)

# efficiency test
In [8]: %timeit func_cnt()
15.6 µs ± 391 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
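A caveat on that number: func_cnt above takes no arguments and returns nothing, so the timing may mostly reflect the overhead of calling an already compiled function rather than the counting itself. Below is a sketch of a variant that passes the data in and returns the counts. The names count_zeros_per_row and batch are hypothetical, and stacking the arrays into one 2-D array is an assumption that only holds when all arrays have the same length.

```python
import numpy as np
import jax
import jax.numpy as jnp

rng = np.random.default_rng(0)
# One (1000, 10000) array instead of a list of 1000 separate arrays.
batch = rng.integers(-5, 5, size=(1000, 10_000), dtype=np.int32)

@jax.jit
def count_zeros_per_row(x):
    # x is a traced argument, so the count is recomputed on every call.
    return jnp.count_nonzero(x == 0, axis=1)

counts = count_zeros_per_row(jnp.asarray(batch))
counts.block_until_ready()  # JAX dispatches asynchronously; wait for the result

# Cross-check against plain NumPy.
expected = (batch == 0).sum(axis=1)
print(np.array_equal(np.asarray(counts), expected))
```

When timing this, include block_until_ready() in the timed expression and move the data to the device beforehand; otherwise the measurement covers only the dispatch, or is dominated by the host-to-device copy.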
  • 2
    np.where uses np.count_nonzero (at compiled level) to determine the size of the arrays that it returns.
    – hpaulj
    Commented Mar 21, 2017 at 0:29
  • 1
    The new function is 2x faster than my original function. I think that's as good as it gets. Thank you!
    – Gabriel
    Commented Mar 21, 2017 at 1:24
  • 1
    On my computer with numpy version: 1.12.1 and 1000 arrays in the list, I get for func1: 1000 loops, best of 3: 376 µs per loop and func_cnt: 1000 loops, best of 3: 1.65 ms per loop. So I am puzzled now.
    – plasmon360
    Commented Mar 21, 2017 at 2:41
  • 1
    Just in case anyone else has the same idea I had: I thought I'd try to use zero_els = np.sum(arr == 0), but it is actually slower than any of the methods described above.
    – Brunox13
    Commented Jun 27, 2020 at 19:26
  • 3
    Using a function named 'count_nonzero' to count occurrences of zeroes. Good. Excellent. I would not change a thing here. This is good naming and makes sense.
    – Jarvis
    Commented Jul 30, 2022 at 8:57
