I'm considering two strategies for "data augmentation" in time-series forecasting.
First, a little bit of background. A predictor $P$ that forecasts the next step of a time series $\lbrace A_i\rbrace$ is a function that typically depends on two things: the past states of the time series and the predictor's own past state:
$$P(\lbrace A_{i\leq t-1}\rbrace,P_{S_{t-1}})$$
If we want to adjust/train our system to obtain a good $P$, then we'll need enough data. Sometimes available data won't be enough, so we consider doing data augmentation.
First approach
Suppose we have the time series $\lbrace A_i \rbrace$, with $1 \leq i \leq n$. Suppose also that we have an $\epsilon$ that meets the following condition: $0<\epsilon < |A_{i+1} - A_i|$ for all $i \in \lbrace 1, \ldots,n-1\rbrace$.
We can construct a new time series $\lbrace B_i = A_i+r_i\rbrace$, where $r_i$ is a realization of the distribution $N(0,\frac{\epsilon}{2}) $.
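A minimal sketch of this construction in Python with NumPy. Two things here are my own assumptions and not fixed by the description above: $\epsilon$ is taken as half of the smallest consecutive gap (any value strictly between $0$ and that gap would satisfy the condition), and $\frac{\epsilon}{2}$ is read as the standard deviation of the noise, not its variance. The function name `augment_with_noise` is hypothetical.

```python
import numpy as np

def augment_with_noise(a, rng=None):
    """Return (b, eps) with b_i = a_i + r_i and r_i ~ N(0, eps/2).

    Assumptions: eps is half the smallest gap |a[i+1] - a[i]|,
    and eps/2 is interpreted as the standard deviation.
    """
    a = np.asarray(a, dtype=float)
    rng = np.random.default_rng() if rng is None else rng

    # Smallest absolute difference between consecutive points.
    min_gap = np.abs(np.diff(a)).min()
    if min_gap == 0:
        raise ValueError("no eps > 0 exists: two consecutive values are equal")

    eps = 0.5 * min_gap   # satisfies 0 < eps < |a[i+1] - a[i]| for every i
    sigma = eps / 2.0     # assumed to be a standard deviation
    r = rng.normal(loc=0.0, scale=sigma, size=a.shape)
    return a + r, eps

a = np.array([0.0, 1.0, 3.0, 2.5, 5.0])
b, eps = augment_with_noise(a, rng=np.random.default_rng(0))
```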
Then, instead of minimizing the loss function only over $\lbrace A_i \rbrace$, we also minimize it over $\lbrace B_i \rbrace$. So, if the optimization process takes $m$ steps, we have to "initialize" the predictor $2m$ times, and we'll compute approximately $2m(n-1)$ predictor internal states.
Second approach
We compute $\lbrace B_i \rbrace$ as before, but we update the predictor's internal state using only $\lbrace A_i \rbrace$, not $\lbrace B_i \rbrace$. The two series are used together only when computing the loss function, so we'll compute approximately $m(n-1)$ predictor internal states.
Of course, there is less computational work here (although the algorithm is a little bit uglier), but it does not matter for now.
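To make the difference between the two approaches concrete, here is a rough sketch of the loss for a single optimization step under each one. The predictor is a toy of my own choosing (a scalar `tanh` state plus a direct linear term on the last observation), and all names (`run_states`, `predict`, `loss_first`, `loss_second`) are hypothetical. For the second approach I assume one possible reading: the states come from $\lbrace A_i \rbrace$ only, while $\lbrace B_i \rbrace$ enters as direct input and as target.

```python
import numpy as np

def run_states(x, u, v):
    """Roll the internal state over x[0..n-2]: one state update per step."""
    s = np.zeros(len(x) - 1)
    prev = 0.0  # "initialize" the predictor
    for t in range(len(x) - 1):
        prev = np.tanh(u * prev + v * x[t])
        s[t] = prev
    return s

def predict(x, s, w_x, w_s):
    """Predict x[t+1] from the last observation x[t] and the state s[t]."""
    return w_x * x[:-1] + w_s * s

def loss_first(a, b, params):
    """First approach: separate state trajectories for A and for B."""
    u, v, w_x, w_s = params
    s_a = run_states(a, u, v)
    s_b = run_states(b, u, v)
    err_a = predict(a, s_a, w_x, w_s) - a[1:]
    err_b = predict(b, s_b, w_x, w_s) - b[1:]
    loss = np.mean(err_a ** 2) + np.mean(err_b ** 2)
    return loss, len(s_a) + len(s_b)  # 2(n-1) state updates

def loss_second(a, b, params):
    """Second approach: states from A only, B used only in the loss."""
    u, v, w_x, w_s = params
    s_a = run_states(a, u, v)
    err_a = predict(a, s_a, w_x, w_s) - a[1:]
    err_b = predict(b, s_a, w_x, w_s) - b[1:]  # reuses A's states
    loss = np.mean(err_a ** 2) + np.mean(err_b ** 2)
    return loss, len(s_a)  # (n-1) state updates

a = np.array([0.0, 1.0, 3.0, 2.5, 5.0])
b = a + np.random.default_rng(0).normal(0.0, 0.125, size=a.shape)
params = (0.5, 0.3, 0.8, 0.2)  # (u, v, w_x, w_s), arbitrary values
loss1, updates1 = loss_first(a, b, params)
loss2, updates2 = loss_second(a, b, params)
```

In this toy, the noise in $\lbrace B_i \rbrace$ reaches the state weights `u` and `v` only in `loss_first`; in `loss_second` it touches the prediction only through `w_x` and the target.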
The question
The problem is: from a statistical point of view, which is the "best" option? And why?
My intuition tells me that the first one is better, because it helps to "regularize" the weights related to the internal state, while the second one only helps to regularize the weights related to the past of the observed time series.
Extra:
- Any other ideas to do data augmentation for time series forecasting?
- How to weight the synthetic data in the training set?