Why is the default value of the where argument False?

Question

The following code compares the computation time of three ways of calling np.sin.

import numpy as np

a = np.random.randn(10000000)
b = np.zeros(a.shape)

np.sin(a, out=b, where=False)
# 100 loops, best of 3: 6.61 ms per loop
b = np.sin(a)
# 10 loops, best of 3: 162 ms per loop
np.sin(a, out=b)
# 10 loops, best of 3: 146 ms per loop

I would like to use the syntax that gives the minimal computation time. My question is: why, if I define out=b, is the default value of the where argument still True? Is there a way to avoid it? It really makes the code more complicated.

The where is only needed when certain of the elements produce errors or bad values, such as with log or divide. It's not a performance tool. If you want all the values to be calculated leave the where parameter alone. — hpaulj, Commented Jun 19, 2018 at 16:06

FHTMitchell · Accepted Answer · 2018-06-19 11:39:15Z

Have you looked at the output of np.sin(a, out=b, where=False)?

a = np.linspace(0, 2*np.pi, 10)
# a: array([0.        , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
#           3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531])

b = np.zeros_like(a)  # [0, 0, 0, ...]

np.sin(a, out=b, where=False)
# --> b: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

It's all zero because a value of False means "don't calculate here". Here the scalar False applies to the entire array, so nothing is calculated and b keeps its initial zeros. That's why it's so fast!
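To make the "nothing is calculated" point easier to see, here is a minimal sketch that fills b with a sentinel instead of zeros. The sentinel value -99.0 is an arbitrary choice for illustration and is not part of the original question.

```python
import numpy as np

a = np.linspace(0, 2 * np.pi, 10)
b = np.full_like(a, -99.0)  # sentinel value, chosen arbitrarily for illustration

# where=False broadcasts to the whole array, so no element is computed
result = np.sin(a, out=b, where=False)

print(result is b)         # the ufunc hands back the out array itself
print(np.all(b == -99.0))  # every element still holds the sentinel
```

If sin had actually been evaluated, b could not contain -99.0, since sin only returns values between -1 and 1.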

https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.sin.html

where works like this

x = np.ones(3)  # [1,1,1]

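# Pass out= so the skipped element has a defined value; without it,
# positions where the mask is False are left uninitialized.
np.divide(x, 2, out=x.copy(), where=[True, False, True])
# --> array([0.5, 1. , 0.5])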

so we can say we only want to apply the function in some places.
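A more typical use, in line with hpaulj's comment on the question, is to skip inputs that would produce errors or bad values. The following is a minimal sketch, assuming we want 0 wherever the logarithm is undefined. The input values and the fill value are illustrative assumptions.

```python
import numpy as np

x = np.array([1.0, 0.0, -2.0, np.e])
mask = x > 0  # log is only well defined for positive inputs

# Pre-fill the output so the skipped positions hold a known value (0 here)
out = np.zeros_like(x)
np.log(x, out=out, where=mask)

print(out)  # positions 1 and 2 keep the pre-filled 0
```

Because the masked-out elements are skipped rather than computed, this should also avoid the divide-by-zero and invalid-value warnings that a plain np.log(x) would emit for this input.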

out simply says that we will store the result in a pre-allocated array. This allows us to save memory

np.log(x, out=x)  # only use x

or to save on array creation time (let's say we're doing many calculations in a loop).

The difference is that b = np.log(a) is effectively:

__temp = np.empty(a.shape)  # temporary hidden array
np.log(a, out=__temp)
b = __temp  # includes dereferencing old b -- no advantage to initialising b before
del __temp

whereas using out skips creating the temporary array, so is slightly faster.
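The loop case mentioned earlier could look something like the sketch below: one buffer is allocated up front and reused on every iteration. The array size, the seed and the names buf and totals are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily
buf = np.empty(1000)            # allocated once, reused on every iteration

totals = []
for _ in range(5):
    a = rng.standard_normal(1000)
    res = np.sin(a, out=buf)    # writes into buf instead of a new result array
    totals.append(buf.sum())

print(res is buf)  # the returned array is the buffer itself
```

Note that buf is overwritten on each pass, so anything you need from it has to be consumed or copied before the next iteration.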

On a side note I think that allowing False as a value is a bit silly since why would you ever want to not calculate the function anywhere?

edited Jun 19, 2018 at 11:39

answered Jun 19, 2018 at 11:15


You are right. Can you please then elaborate for me why the "out=" method should be faster, and what the use of the "where" argument is?
– Gideon Kogan
Commented Jun 19, 2018 at 11:19
maybe because you need the values updated only in several locations and preserve the previous value in other locations.
– Gideon Kogan
Commented Jun 19, 2018 at 11:33
I don't understand. np.log(a, where=False) is literally the same as a.copy()
– FHTMitchell
Commented Jun 19, 2018 at 11:34
say you want b[index_1:index_2] = sin(a[index_1:index_2]) without pre-allocation
– Gideon Kogan
Commented Jun 19, 2018 at 11:36
That's a NameError without preallocation. You must preallocate a. Still doesn't explain why I'd want to set where=False.
– FHTMitchell
Commented Jun 19, 2018 at 11:38
