Python numpy.corrcoef() RuntimeWarning: invalid value encountered in true_divide c /= stddev[:, None]

Question

It seems that NumPy's corrcoef() emits a RuntimeWarning when a constant list is passed to it. For example, the code below triggers the warning:

import numpy as np
X = [1.0, 2.0, 3.0, 4.0]
Y = [2, 2, 2, 2]
print(np.corrcoef(X, Y)[0, 1])

Warning :

/usr/local/lib/python3.6/site-packages/numpy/lib/function_base.py:3003: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]

Can anyone explain why this warning is raised when one of the lists is constant, and how to prevent it when a constant list is passed to the function?

The error is probably occurring because the standard deviation (stddev) of the constant list Y is 0. I'm not sure that it makes sense to calculate the covariance of something with respect to something that's constant... — Josh Karpel, Commented Aug 26, 2017 at 15:39
@JoshKarpel So the covariance of a constant variable is undefined? — Abdennacer Lachiheb, Commented Aug 26, 2017 at 15:50
A quick Google (Covariance Rule #4) indicates that the covariance of a random variable with respect to a constant is zero. So it's not undefined, but any algorithm for calculating the covariance numerically probably assumes that it won't be zero. — Josh Karpel, Commented Aug 26, 2017 at 17:47

andrew_reece · Accepted Answer · 2017-08-26 18:12:47Z

Correlation is a measure of how well two vectors track with each other as they change. You can't track mutual change when one vector doesn't change.

As noted in the comments on the question, the formula for Pearson's product-moment correlation coefficient divides the covariance of X and Y by the product of their standard deviations. Since Y has zero variance in your example, its standard deviation is also zero, and so is its covariance with X. That's why you get the true_divide warning: the computation divides zero by zero, which NumPy reports as an invalid value and returns as NaN.
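The division described above can be reproduced by hand. The following is a minimal sketch that mirrors the formula, not NumPy's actual internal code; the variable names `c`, `stddev`, and `r` are chosen for illustration.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.0, 2.0, 2.0, 2.0])

# Covariance matrix of the two variables (each input is one variable).
c = np.cov(X, Y)
# Standard deviations are the square roots of the diagonal (the variances).
stddev = np.sqrt(np.diag(c))

print(c[0, 1])   # covariance of X and Y
print(stddev)    # the second entry is the standard deviation of Y

# Pearson's r = cov(X, Y) / (std(X) * std(Y)); here that is 0.0 / 0.0.
# errstate is used only to keep this demonstration quiet.
with np.errstate(invalid="ignore"):
    r = c[0, 1] / (stddev[0] * stddev[1])
print(r)
```

Both the numerator and the denominator are zero for the constant input, so the quotient has no defined value.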

Note: It might seem tempting, from an engineering standpoint, to simply add a very small quantity (say, a value just above machine epsilon) onto one of the entries in Y, in order to get around the zero-division issue. But that's not statistically viable. Even adding 1e-15 will seriously derange your correlation coefficient, depending on which value you add it to.

Consider the difference between these two cases:

import numpy as np

X = [1.0, 2.0, 3.0, 4.0]

tiny = 1e-15

# add tiny amount to second element
Y1 = [2., 2. + tiny, 2., 2.]
print(np.corrcoef(X, Y1)[0, 1])
# output: -0.22360679775

# add tiny amount to fourth element
Y2 = [2., 2., 2., 2. + tiny]
print(np.corrcoef(X, Y2)[0, 1])
# output: 0.67082039325

This may be obvious to statisticians, but given the nature of the question it seems like a relevant caveat.

How to ignore the warning? — zhaozk, Commented Jul 7, 2023 at 6:20

baustin · Accepted Answer · 2024-02-13 01:55:46Z

If you would just like to suppress the warning, you can wrap the call in np.errstate like this:

import numpy as np
X = [1.0, 2.0, 3.0, 4.0]
Y = [2, 2, 2, 2]
with np.errstate(divide="ignore", invalid="ignore"): 
    print(np.corrcoef(X, Y))

That silences the divide-by-zero and invalid-value warnings for the call. The undefined entries of the result are still NaN.
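As a minimal sketch of handling the value returned under np.errstate, the snippet below checks it with np.isnan before using it. The variable name `r` and the message text are illustrative only.

```python
import numpy as np

X = [1.0, 2.0, 3.0, 4.0]
Y = [2, 2, 2, 2]

# Silence the floating-point warnings only for this call.
with np.errstate(divide="ignore", invalid="ignore"):
    r = np.corrcoef(X, Y)[0, 1]

# The warning is gone, but the value itself is still undefined.
if np.isnan(r):
    print("correlation is undefined (one input is constant)")
else:
    print(r)
```

Because the context manager restores the previous settings on exit, floating-point warnings elsewhere in the program are unaffected.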

EHadavi · Accepted Answer · 2022-04-12 09:28:07Z

The pandas .corr method is more versatile in such conditions, and even in the presence of NaNs. So a first option is to convert the lists to pandas Series and use the pandas correlation; the result is NaN:

import pandas as pd
X = [1.0, 2.0, 3.0, 4.0]
Y = [2, 2, 2, 2]
print(pd.Series(X).corr(pd.Series(Y)))

If you want to stick with NumPy, you can check the standard deviation of each series with an if statement and compute the correlation only when both are greater than zero. Here there is no valid output because one list has no variation, but the same check applies to other cases.

import numpy as np
X = [1.0, 2.0, 3.0, 4.0]
Y = [2, 2, 2, 2]
if np.std(Y) == 0 or np.std(X) == 0:
    print('The correlation could not be computed because the standard deviation of one of the series is equal to zero')
else:
    print(np.corrcoef(X, Y)[0, 1])
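The same check can be wrapped in a small function so callers get a value back instead of a printed message. This is only a sketch: `corr_or_nan` is a hypothetical helper name, and returning NaN for the undefined case is an assumption about what the caller wants.

```python
import numpy as np

def corr_or_nan(x, y):
    """Hypothetical helper: Pearson r, or NaN if either input is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Skip np.corrcoef entirely when one input has no variation.
    if np.std(x) == 0 or np.std(y) == 0:
        return float("nan")
    return float(np.corrcoef(x, y)[0, 1])

print(corr_or_nan([1.0, 2.0, 3.0, 4.0], [2, 2, 2, 2]))
print(corr_or_nan([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

An exact `== 0` comparison can miss constant inputs whose mean is not exactly representable in floating point (three copies of 0.1, for example). Comparing `np.max(x) == np.min(x)` is one alternative.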

So what is the solution when the question is repeated? Similar answers are required or to be able to merge the questions. — EHadavi, Commented Sep 21, 2021 at 15:07

Gaslight Deceive Subvert · Accepted Answer · 2023-09-06 12:28:04Z

Here is a "fixed" version of np.corrcoef:

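import numpy as np

def pearsonccs(samples):
    # Covariance matrix; each row of samples is one variable.
    C = np.cov(samples)
    diag = np.diag(C)
    # N[i, j] is the product of the standard deviations of variables i and j.
    N = np.sqrt(np.outer(diag, diag))
    # Avoid 0/0 for constant variables.
    N[N == 0] = 1
    return C / N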

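All it does is replace the zeroes in the normalization matrix N (the products of the standard deviations) with ones to avoid the divide-by-zero error. The covariance entries for a constant variable are already zero, so the corresponding correlations come out as 0 instead of NaN.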
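For illustration, here is how that function behaves on the inputs from the question. The function body is repeated so the snippet runs on its own; the name `R` is chosen for the example.

```python
import numpy as np

def pearsonccs(samples):
    # Same function as in the answer above, repeated for self-containment.
    C = np.cov(samples)
    diag = np.diag(C)
    N = np.sqrt(np.outer(diag, diag))
    N[N == 0] = 1
    return C / N

X = [1.0, 2.0, 3.0, 4.0]
Y = [2, 2, 2, 2]

# Each row of the input is one variable, as np.cov expects.
R = pearsonccs([X, Y])
print(R)
```

Every entry involving the constant variable is 0, including its diagonal entry. Reporting 0 for an undefined correlation is a convention of this function rather than a statistical result, so it may not suit every use.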
