Create a forecast matrix from time series samples

Question

I would like to create a delay matrix from a time series.

For example, given

y = [y_0, y_1, y_2, ..., y_N] and W = 5

I need to create this matrix:

| 0       | 0       | 0       | 0       | 0   |
| 0       | 0       | 0       | 0       | y_0 |
| 0       | 0       | 0       | y_0     | y_1 |
| ...     |         |         |         |     |
| y_{N-4} | y_{N-3} | y_{N-2} | y_{N-1} | y_N |

I know that TensorFlow's timeseries_dataset_from_array function does approximately the same thing when configured appropriately, but I would like to avoid using TensorFlow.

This is my current function to perform this task:

import numpy as np
from numpy import ndarray


def get_warm_up_matrix(_data: ndarray, W: int) -> ndarray:
    """
    Return a warm-up matrix
    If _data = [y_1, y_2, ..., y_N]
    The output matrix W will be
    W =     +---------+-----+---------+---------+-----+
            | 0       | ... | 0       | 0       | 0   |
            | 0       | ... | 0       | 0       | y_1 |
            | 0       | ... | 0       | y_1     | y_2 |
            | ...     | ... | ...     | ...     | ... |
            | y_1     | ... | y_{W-2} | y_{W-1} | y_W |
            | ...     | ... | ...     | ...     | ... |
            | y_{N-W} | ... | y_{N-2} | y_{N-1} | y_N |
            +---------+-----+---------+---------+-----+
    :param _data:
    :param W:
    :return:
    """
    N = len(_data)

    warm_up = np.zeros((N, W), dtype=_data.dtype)
    raw_data_with_zeros = np.concatenate((np.zeros(W, dtype=_data.dtype), _data), dtype=_data.dtype)

    for k in range(W, N + W):
        warm_up[k - W, :] = raw_data_with_zeros[k - W:k]

    return warm_up

It works well, but it's quite slow: both the concatenate call and the Python-level for loop take time. It also uses a lot of memory, since the data are first copied into a zero-padded array and then copied again, row by row, into the matrix.

Can I make it faster and more memory-friendly?

In C++, we would use a std::ranges::slide_view for this (with a take_view to limit it to W rows). I don't know NumPy well enough, but does it have a concept of views like C++ does? — Toby Speight, Commented Sep 19, 2022 at 6:41

Reinderien · Accepted Answer · 2022-09-19 11:48:01Z

Yes, NumPy already has a built-in sliding-window function, sliding_window_view (though it's perhaps a little obscure). This should indeed be memory-friendly: because the result is a view, it occupies about N + W elements rather than the roughly W(N + 1) that a fully materialised matrix would need.

Also, about your own code: don't prefix data with an underscore and don't capitalise w. Delete the boilerplate section of your docstring (the empty :param: and :return: lines), and prefix ndarray with np in your type hints.

Suggested

import numpy as np


def get_warm_up_matrix(data: np.ndarray, w: int) -> np.ndarray:
    """
    Return a warm-up matrix
    If data = [y_1, y_2, ..., y_N]
    The output matrix W will be
    W = +-----------+-----+---------+---------+-----+
        | 0         | ... | 0       | 0       | 0   |
        | 0         | ... | 0       | 0       | y_1 |
        | 0         | ... | 0       | y_1     | y_2 |
        | ...       | ... | ...     | ...     | ... |
        | y_1       | ... | y_{W-2} | y_{W-1} | y_W |
        | ...       | ... | ...     | ...     | ... |
        | y_{N-W+1} | ... | y_{N-2} | y_{N-1} | y_N |
        +-----------+-----+---------+---------+-----+
    """
    padded = np.zeros(shape=len(data) + w, dtype=data.dtype)
    padded[w:] = data
    return np.lib.stride_tricks.sliding_window_view(x=padded, window_shape=w)


print(get_warm_up_matrix(data=np.arange(1, 11), w=4))

Output

[[ 0  0  0  0]
 [ 0  0  0  1]
 [ 0  0  1  2]
 [ 0  1  2  3]
 [ 1  2  3  4]
 [ 2  3  4  5]
 [ 3  4  5  6]
 [ 4  5  6  7]
 [ 5  6  7  8]
 [ 6  7  8  9]
 [ 7  8  9 10]]
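
The memory claim can be checked directly. The following is a minimal sketch, assuming NumPy 1.20 or later (the release that introduced sliding_window_view); the variable names windows and materialised are illustrative and not part of the answer. It rebuilds the padded array the same way as the suggested function and inspects the resulting view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(1, 11)
w = 4

# Zero-pad on the left, exactly as in the suggested function
padded = np.zeros(shape=len(data) + w, dtype=data.dtype)
padded[w:] = data
windows = sliding_window_view(padded, window_shape=w)

# One row per window position: (len(data) + 1, w)
print(windows.shape)
# The windows reuse the padded buffer rather than copying it
print(np.shares_memory(windows, padded))
# Both axes advance by one element, so consecutive rows overlap in memory
print(windows.strides == (padded.itemsize, padded.itemsize))
# The view is read-only unless writeable=True is passed
print(windows.flags.writeable)

# An explicit copy materialises every row and needs far more memory
materialised = windows.copy()
print(padded.nbytes, materialised.nbytes)
```

Because the rows overlap in memory, the view is read-only by default. If a writable, independent matrix is needed, calling .copy() on the result gives one, at the cost of the full N-by-W memory footprint.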

For people wondering, the proposed code by @Reinderien is about 24 times faster than mine. — graille, Commented Sep 19, 2022 at 17:03
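
A comparison like this can be reproduced with a short script. This is only a sketch: the function names warm_up_loop and warm_up_view, the input size, and the repeat count are assumptions chosen for illustration, and the measured ratio will vary with machine, N, and w. It also shows one behavioural difference between the two versions: the loop-based function returns N rows, while the view-based one returns N + 1, the extra final row being the only one that contains the last sample.

```python
import timeit

import numpy as np


def warm_up_loop(data: np.ndarray, w: int) -> np.ndarray:
    """Loop-based version from the question (N rows)."""
    n = len(data)
    warm_up = np.zeros((n, w), dtype=data.dtype)
    padded = np.concatenate((np.zeros(w, dtype=data.dtype), data))
    for k in range(w, n + w):
        warm_up[k - w, :] = padded[k - w:k]
    return warm_up


def warm_up_view(data: np.ndarray, w: int) -> np.ndarray:
    """View-based version from the answer (N + 1 rows)."""
    padded = np.zeros(shape=len(data) + w, dtype=data.dtype)
    padded[w:] = data
    return np.lib.stride_tricks.sliding_window_view(padded, window_shape=w)


data = np.arange(1, 10_001)  # hypothetical test input
w = 5

loop_result = warm_up_loop(data, w)
view_result = warm_up_view(data, w)

# The view has one extra, final row that ends with the last sample
print(loop_result.shape, view_result.shape)
# Apart from that row, the two results are identical
print(np.array_equal(loop_result, view_result[:-1]))

# Timings depend on the machine, N and w, so no figures are claimed here
print("loop:", timeit.timeit(lambda: warm_up_loop(data, w), number=10))
print("view:", timeit.timeit(lambda: warm_up_view(data, w), number=10))
```

If the N-row shape of the original function has to be preserved, slicing the view with [:-1] keeps it a view and drops the extra row.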
