
I have a two-dimensional array, i.e. an array of sequences which are also arrays. For each sequence I would like to calculate the autocorrelation, so that for a (5, 4) array I would get 5 results, i.e. an array of shape (5, 7).

I know I could just loop over the first dimension, but that's slow and my last resort. Is there another way?
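
For concreteness, the per-row loop I want to avoid looks something like this (numpy.correlate in 'full' mode returns all 2*4 - 1 = 7 lags per row, which is where the (5, 7) shape comes from):

import numpy as np

x = np.random.default_rng(0).standard_normal((5, 4))
# one full autocorrelation per row -- slow, since it loops in Python
result = np.array([np.correlate(row, row, mode='full') for row in x])
print(result.shape)  # (5, 7)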

Thanks!

EDIT:

Based on the chosen answer plus the comment from mtrw, I have the following function:

import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    """FFT-based autocorrelation function, which is faster than numpy.correlate."""
    # x is an array of sequences, of shape (totalelements, length)
    length = x.shape[1]
    fftx = fft(x, n=length * 2 - 1, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret

Note that length is just the length of each sequence, x.shape[1]. I also didn't restrict the result to real numbers, since I need to take complex values into account as well.
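
For real-valued input, the output should match numpy.correlate in 'full' mode applied row by row; a quick check along those lines:

import numpy as np

x = np.random.default_rng(0).standard_normal((5, 4))
fast = xcorr(x)
# reference: full linear autocorrelation of each row
slow = np.array([np.correlate(row, row, mode='full') for row in x])
print(np.allclose(fast.real, slow))  # should print True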

  • Great if you could add the computation of length -- is it simply x.shape[1]?
    – FooBar
    Commented May 16, 2019 at 12:28

3 Answers


Using FFT-based autocorrelation:

import numpy
from numpy.fft import fft, ifft

data = numpy.arange(5*4).reshape(5, 4)
print(data)
##[[ 0  1  2  3]
## [ 4  5  6  7]
## [ 8  9 10 11]
## [12 13 14 15]
## [16 17 18 19]]
dataFT = fft(data, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(dataAC)
##[[   14.     8.     6.     8.]
## [  126.   120.   118.   120.]
## [  366.   360.   358.   360.]
## [  734.   728.   726.   728.]
## [ 1230.  1224.  1222.  1224.]]

I'm a little confused by your statement about the answer having dimension (5, 7), so maybe there's something important I'm not understanding.

EDIT: At the suggestion of mtrw, a padded version that doesn't wrap around:

import numpy
from numpy.fft import fft, ifft

data = numpy.arange(5*4).reshape(5, 4)
padding = numpy.zeros((5, 3))
dataPadded = numpy.concatenate((data, padding), axis=1)
print(dataPadded)
##[[  0.   1.   2.   3.   0.   0.   0.]
## [  4.   5.   6.   7.   0.   0.   0.]
## [  8.   9.  10.  11.   0.   0.   0.]
## [ 12.  13.  14.  15.   0.   0.   0.]
## [ 16.  17.  18.  19.   0.   0.   0.]]
dataFT = fft(dataPadded, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(numpy.round(dataAC, 10))
##[[   14.     8.     3.     0.     0.     3.     8.]
## [  126.    92.    59.    28.    28.    59.    92.]
## [  366.   272.   179.    88.    88.   179.   272.]
## [  734.   548.   363.   180.   180.   363.   548.]
## [ 1230.   920.   611.   304.   304.   611.   920.]]

There must be a more efficient way to do this, especially because autocorrelation is symmetric and I don't take advantage of that.
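
One thing you can exploit for real-valued input is the conjugate symmetry of the spectrum: rfft/irfft compute only the non-negative half of the spectrum, roughly halving the transform work. A sketch (autocorr_real is just an illustrative name):

import numpy as np

def autocorr_real(data):
    # linear autocorrelation of each row of a real 2-D array
    m = data.shape[1]
    n = 2 * m - 1  # pad so the circular correlation doesn't wrap around
    spec = np.fft.rfft(data, n=n, axis=1)
    ac = np.fft.irfft(spec * np.conjugate(spec), n=n, axis=1)
    return ac[:, :m]  # lags 0..m-1; the negative lags mirror these

print(autocorr_real(np.arange(5 * 4, dtype=float).reshape(5, 4)))
# prints the first four columns of the padded result above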

  • +1 for the FFT based approach. As for the (5,7) shaped answer, you've computed the circular correlation (en.wikipedia.org/wiki/…). Simply pad each row with 3 zeros so that the spectral multiplication doesn't wrap around, and you'll get what the original question asked for.
    – mtrw
    Commented Dec 21, 2010 at 20:48
  • Thanks guys, that looks promising! For zero padding, I just need to add n=(length*2-1) to fft??
    – Christoph
    Commented Dec 21, 2010 at 21:13
  • For a 1-D sequence with n variables, this solution would pad with n-1 zeros. So if the data shape had been (5, 121), the padding shape would be (5, 120)
    – Andrew
    Commented Dec 21, 2010 at 21:22
  • I will update my question with the answer and choose yours as the "official" answer because it led me to the right solution, with the comment from mtrw. Thanks guys!
    – Christoph
    Commented Dec 21, 2010 at 21:26
  • Good call using the 'n=' option of the fft instead of padding by hand.
    – Andrew
    Commented Dec 21, 2010 at 22:12
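
As the comments above note, passing n= to fft zero-pads the input on the right, so it is equivalent to concatenating zeros by hand; a quick check:

import numpy as np
from numpy.fft import fft

x = np.arange(4.0)
# n=7 zero-pads x to length 7 before transforming
print(np.allclose(fft(x, n=7), fft(np.concatenate((x, np.zeros(3))))))  # True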

For really large arrays it becomes important to have n = 2 ** p, where p is an integer. This will save you huge amounts of time. For example:

import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    # round 2*length - 1 *down* to the nearest power of two;
    # this is where the wrap-around mentioned below can come from
    n = 2 ** int(np.log2(x.shape[1] * 2 - 1))
    fftx = fft(x, n=n, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret

This might give you wrap-around errors. For large arrays the autocorrelation should be insignificant near the edges, though.
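
If the wrap-around matters, a variant that rounds up to the next power of two keeps the speed benefit without aliasing; a sketch (xcorr_pow2 is an illustrative name, and the final reordering matches numpy.correlate's 'full' layout):

import numpy as np
from numpy.fft import fft, ifft

def xcorr_pow2(x):
    m = x.shape[1]
    # round *up* to the next power of two, so n >= 2*m - 1
    n = 2 ** int(np.ceil(np.log2(2 * m - 1)))
    fftx = fft(x, n=n, axis=1)
    c = ifft(fftx * np.conjugate(fftx), axis=1)
    # reorder to lags -(m-1) .. m-1, like numpy.correlate 'full'
    return np.concatenate((c[:, -(m - 1):], c[:, :m]), axis=1)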


Maybe it's just a preference, but I wanted to work directly from the definition, which I personally find a bit easier to follow. This is my implementation for an arbitrary nd array.

from itertools import product
from numpy import asarray, empty, roll

def autocorrelate(x):
    """
    Compute the multidimensional autocorrelation of an nd array.
    input: an nd array of floats
    output: an nd array of autocorrelations
    """
    # used for transposes
    t = roll(range(x.ndim), 1)

    # pairs of indexes: the first is for the autocorrelation array,
    # the second is the shift
    ii = [list(enumerate(range(1, s - 1))) for s in x.shape]

    # initialize the resulting autocorrelation array
    acor = empty(shape=[len(s0) for s0 in ii])

    # iterate over all combinations of directional shifts
    for i in product(*ii):
        # extract the indexes for the autocorrelation array
        # and the original array respectively
        i1, i2 = asarray(i).T

        x1 = x.copy()
        x2 = x.copy()

        for i0 in i2:
            # clip the unshifted array at the end
            x1 = x1[:-i0]
            # and the shifted array at the beginning
            x2 = x2[i0:]

            # prepare to do the same for the next axis
            x1 = x1.transpose(t)
            x2 = x2.transpose(t)

        # normalize shifted and unshifted arrays
        x1 -= x1.mean()
        x1 /= x1.std()
        x2 -= x2.mean()
        x2 /= x2.std()

        # compute the autocorrelation directly from the definition
        acor[tuple(i1)] = (x1 * x2).mean()

    return acor
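
A small usage note, as I read the code: the shifts along each axis run from 1 to s - 2, so the output has two fewer entries per dimension than the input:

import numpy as np

x = np.random.default_rng(0).standard_normal((6, 6))
print(autocorrelate(x).shape)  # (4, 4): one entry per shift 1..4 along each axis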
