
I have a small time series with monthly intervals. I want to plot it and then decompose it into seasonality, trend, and residuals. I start by importing the CSV into pandas and then plotting the time series, which works fine. I followed this tutorial, and my code looks like this:

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

ali3 = pd.read_csv('C:\\Users\\ALI\\Desktop\\CSV\\index\\ZIAM\\ME\\ME_DATA_7_MONTH_AVG_PROFIT\\data.csv',
 names=['Date', 'Month','AverageProfit'],
 index_col=['Date'],
 parse_dates=True)

# Delete the Month column, which contains strings
del ali3['Month']


ali3
plt.plot(ali3)

[Image: data frame output]

At this stage I try to do the seasonal decompose like this:

import statsmodels.api as sm 
res = sm.tsa.seasonal_decompose(ali3.AverageProfit)  
fig = res.plot() 

which results in the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-afeab639d13b> in <module>()
      1 import statsmodels.api as sm
----> 2 res = sm.tsa.seasonal_decompose(ali3.AverageProfit)
      3 fig = res.plot()

C:\Users\D063375\AppData\Local\Continuum\Anaconda2\lib\site-packages\statsmodels\tsa\seasonal.py in seasonal_decompose(x, model, filt, freq)
     86             filt = np.repeat(1./freq, freq)
     87 
---> 88     trend = convolution_filter(x, filt)
     89 
     90     # nan pad for conformability - convolve doesn't do it

C:\Users\D063375\AppData\Local\Continuum\Anaconda2\lib\site-packages\statsmodels\tsa\filters\filtertools.py in convolution_filter(x, filt, nsides)
    287 
    288     if filt.ndim == 1 or min(filt.shape) == 1:
--> 289         result = signal.convolve(x, filt, mode='valid')
    290     elif filt.ndim == 2:
    291         nlags = filt.shape[0]

C:\Users\D063375\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\signal\signaltools.py in convolve(in1, in2, mode)
    468         return correlate(volume, kernel[slice_obj].conj(), mode)
    469     else:
--> 470         return correlate(volume, kernel[slice_obj], mode)
    471 
    472 

C:\Users\D063375\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\signal\signaltools.py in correlate(in1, in2, mode)
    158 
    159     if mode == 'valid':
--> 160         _check_valid_mode_shapes(in1.shape, in2.shape)
    161         # numpy is significantly faster for 1d
    162         if in1.ndim == 1 and in2.ndim == 1:

C:\Users\D063375\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\signal\signaltools.py in _check_valid_mode_shapes(shape1, shape2)
     70         if not d1 >= d2:
     71             raise ValueError(
---> 72                 "in1 should have at least as many items as in2 in "
     73                 "every dimension for 'valid' mode.")
     74 

ValueError: in1 should have at least as many items as in2 in every dimension for 'valid' mode.

Can anyone shed some light on what I'm doing wrong and how I can fix it? Much obliged.

Edit: This is what the data frame looks like:

Date            AverageProfit

2015-06-01          29.990231
2015-07-01          26.080038
2015-08-01          25.640862
2015-09-01          25.346447
2015-10-01          27.386001
2015-11-01          26.357709
2015-12-01          25.260644
  • Can you please include your dataframe as text in the question, not as a screenshot?
    – IanS
    Commented Sep 20, 2016 at 9:48
  • Ok, I will edit it.
    – ljourney
    Commented Sep 20, 2016 at 9:51
  • Sorry, misunderstood your question before. It may be that you are not passing the freq value, that is the seasonality timescale?
    – AlvaroP
    Commented Sep 20, 2016 at 11:03

1 Answer

You have 7 data points, which is usually far too few for a stationarity analysis.

You don't have enough points to use seasonal decomposition. To see this, you can concatenate your data to create an extended time series (just repeating your data for the following months). Let extendedData be this extended dataframe and data your original data.
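A minimal sketch of how such an extended series could be built, assuming the seven values from the question. The name `repeats` and the choice of four copies are hypothetical, picked only so that the series covers more than two years. Note that repeating the data manufactures an artificial pattern, so this is only useful for seeing that the decomposition runs once there are enough points.

```python
import pandas as pd

# The seven monthly values from the question
data = pd.Series(
    [29.990231, 26.080038, 25.640862, 25.346447,
     27.386001, 26.357709, 25.260644],
    index=pd.date_range("2015-06-01", periods=7, freq="MS"),
    name="AverageProfit",
)

# Hypothetical: number of copies, chosen only to exceed two years of months
repeats = 4

# Repeat the values and assign them to consecutive month starts
values = list(data.values) * repeats
extendedData = pd.Series(
    values,
    index=pd.date_range(data.index[0], periods=len(values), freq="MS"),
    name=data.name,
)

print(len(data), len(extendedData))
```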

data.plot()

[Plot of the original data]

extendedData.plot()

[Plot of the extended data]

res = sm.tsa.seasonal_decompose(extendedData.interpolate())
res.plot()

[Plot of the seasonal decomposition of the extended data]

The frequency (freq) for the seasonal estimate is automatically estimated from the data, but it can also be specified manually.
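To see why freq matters for the error in the question, here is a rough NumPy-only sketch of the moving-average window that the trend estimate uses. The odd-period branch mirrors the `np.repeat(1./freq, freq)` line visible in the traceback; the even-period branch and the helper names are my assumptions, not the library's code. With 'valid' convolution, a trend value exists only where the whole window fits inside the series.

```python
import numpy as np

def moving_average_filter(freq):
    """Centered moving-average weights for a seasonal period (assumed form)."""
    if freq % 2 == 0:
        # Even period: half weights at both ends keep the window centered
        return np.array([0.5] + [1.0] * (freq - 1) + [0.5]) / freq
    # Odd period: equal weights, as in the traceback line
    return np.repeat(1.0 / freq, freq)

def n_valid_trend_points(n_obs, freq):
    """Number of positions where the full window fits inside the series."""
    return max(n_obs - len(moving_average_filter(freq)) + 1, 0)

print(n_valid_trend_points(7, 12))   # 0: seven months cannot fill a yearly window
print(n_valid_trend_points(28, 12))  # 16
print(n_valid_trend_points(7, 3))    # 5
```

So with 7 monthly observations and a yearly period there is no position where the window fits, which is consistent with the ValueError about 'valid' mode. As far as I know, newer statsmodels releases call this argument `period` rather than `freq`, so check the documentation of your installed version.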


You can try taking a first difference: generate a new time series by subtracting the previous value from each data value. In your case it looks like this:

[Plot of the first-differenced series]
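The first difference can be computed directly in pandas. This is a small sketch using the seven values from the question; the variable name `first_diff` is only illustrative.

```python
import pandas as pd

# The seven monthly values from the question
data = pd.Series(
    [29.990231, 26.080038, 25.640862, 25.346447,
     27.386001, 26.357709, 25.260644],
    index=pd.date_range("2015-06-01", periods=7, freq="MS"),
    name="AverageProfit",
)

# diff() computes value[t] - value[t-1]; the first entry has no
# predecessor and becomes NaN, so it is dropped
first_diff = data.diff().dropna()

print(first_diff)
```

Differencing leaves one point fewer than the original series, so here only six values remain.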

A stationarity test can be applied next, as explained here

  • Thank you for the answer, but my dataset only contains seven months of data. Does that mean I can't decompose it?
    – ljourney
    Commented Sep 20, 2016 at 10:03
  • Are we talking about yearly seasonality (that is, 12 months)? If so, you would need more data, at least 2 years, although it all depends on the randomness of the data, etc. The link I posted before has valuable insights!
    – AlvaroP
    Commented Sep 20, 2016 at 10:21
  • I'm new to looking at time series statistically. I have seven months of data, and I want to see whether the time series is stationary or not. I thought decomposing it into trend, seasonality, and residuals for the seven months might make that more visible.
    – ljourney
    Commented Sep 20, 2016 at 10:40
  • Thank you for the comprehensive answer. I really appreciate it. It also made me realise that I need to learn more about the theory behind time series analysis.
    – ljourney
    Commented Sep 20, 2016 at 12:42
