When fitting a statsmodels model, I'm receiving a warning about the date frequency.

First, I import a dataset:

import pandas as pd
import statsmodels.api as sm

df = sm.datasets.get_rdataset(package='datasets', dataname='airquality').data

df['Year'] = 1973
df['Date'] = pd.to_datetime(df[['Year', 'Month', 'Day']])

df.drop(columns=['Year', 'Month', 'Day'], inplace=True)
df.set_index('Date', inplace=True, drop=True)

Next I try to fit an SES model:

fit = sm.tsa.SimpleExpSmoothing(df['Wind']).fit()

Which returns this warning:

/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/base/tsa_model.py:171: ValueWarning: No frequency information was provided, so inferred frequency D will be used.
  % freq, ValueWarning)

My dataset is daily so inferred 'D' is ok, but I was wondering how I can manually set the frequency.

Note that the DatetimeIndex has no frequency set (see freq=None on the last line):

DatetimeIndex(['1973-05-01', '1973-05-02', '1973-05-03', '1973-05-04',
               '1973-05-05', '1973-05-06', '1973-05-07', '1973-05-08',
               '1973-05-09', '1973-05-10',
               ...
               '1973-09-21', '1973-09-22', '1973-09-23', '1973-09-24',
               '1973-09-25', '1973-09-26', '1973-09-27', '1973-09-28',
               '1973-09-29', '1973-09-30'],
              dtype='datetime64[ns]', name='Date', length=153, freq=None)

As per this answer I've checked for missing dates, but there doesn't appear to be any:

pd.date_range(start = '1973-05-01', end = '1973-09-30').difference(df.index)

DatetimeIndex([], dtype='datetime64[ns]', freq='D')

How should I set the frequency for the index?

2 Answers


I think pd.to_datetime does not set a frequency by default; you need DataFrame.asfreq:

df = df.set_index('Date').asfreq('D')
print (df.index)

DatetimeIndex(['1973-05-01', '1973-05-02', '1973-05-03', '1973-05-04',
               '1973-05-05', '1973-05-06', '1973-05-07', '1973-05-08',
               '1973-05-09', '1973-05-10',
               ...
               '1973-09-21', '1973-09-22', '1973-09-23', '1973-09-24',
               '1973-09-25', '1973-09-26', '1973-09-27', '1973-09-28',
               '1973-09-29', '1973-09-30'],
              dtype='datetime64[ns]', name='Date', length=153, freq='D')
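
A minimal sketch of the same idea on a small synthetic series, so it runs without downloading the dataset. The values and the names `s` and `s_daily` are made up for illustration, and it assumes the index has no duplicate dates:

```python
import pandas as pd

# Hypothetical daily data; an index built from individual timestamps carries no freq.
idx = pd.to_datetime(['1973-05-01', '1973-05-02', '1973-05-03', '1973-05-04'])
s = pd.Series([7.4, 8.0, 12.6, 11.5], index=idx, name='Wind')
print(s.index.freq)        # None: nothing has set a frequency yet

# asfreq conforms the series to a regular daily grid and records the frequency
# on the resulting index.
s_daily = s.asfreq('D')
print(s_daily.index.freqstr)
```

If some dates were missing from the original index, asfreq would add them as rows filled with NaN.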

But if the index contains duplicated values, asfreq raises an error:

df = pd.concat([df, df])
df = df.set_index('Date')

print (df.asfreq('D').index)

ValueError: cannot reindex from a duplicate axis

The solution is to use resample with an aggregate function:

print (df.resample('2D').mean().index)

DatetimeIndex(['1973-05-01', '1973-05-03', '1973-05-05', '1973-05-07',
               '1973-05-09', '1973-05-11', '1973-05-13', '1973-05-15',
               '1973-05-17', '1973-05-19', '1973-05-21', '1973-05-23',
               '1973-05-25', '1973-05-27', '1973-05-29', '1973-05-31',
               '1973-06-02', '1973-06-04', '1973-06-06', '1973-06-08',
               '1973-06-10', '1973-06-12', '1973-06-14', '1973-06-16',
               '1973-06-18', '1973-06-20', '1973-06-22', '1973-06-24',
               '1973-06-26', '1973-06-28', '1973-06-30', '1973-07-02',
               '1973-07-04', '1973-07-06', '1973-07-08', '1973-07-10',
               '1973-07-12', '1973-07-14', '1973-07-16', '1973-07-18',
               '1973-07-20', '1973-07-22', '1973-07-24', '1973-07-26',
               '1973-07-28', '1973-07-30', '1973-08-01', '1973-08-03',
               '1973-08-05', '1973-08-07', '1973-08-09', '1973-08-11',
               '1973-08-13', '1973-08-15', '1973-08-17', '1973-08-19',
               '1973-08-21', '1973-08-23', '1973-08-25', '1973-08-27',
               '1973-08-29', '1973-08-31', '1973-09-02', '1973-09-04',
               '1973-09-06', '1973-09-08', '1973-09-10', '1973-09-12',
               '1973-09-14', '1973-09-16', '1973-09-18', '1973-09-20',
               '1973-09-22', '1973-09-24', '1973-09-26', '1973-09-28',
               '1973-09-30'],
              dtype='datetime64[ns]', name='Date', freq='2D')
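
As a rough sketch of how resample handles duplicated dates, here is a small synthetic series (made-up values, hypothetical names `dup` and `daily`) where every date appears twice. It uses a daily bin rather than '2D' and takes the mean, which is only one possible aggregate:

```python
import pandas as pd

idx = pd.to_datetime(['1973-05-01', '1973-05-02', '1973-05-03'])
s = pd.Series([7.4, 8.0, 12.6], index=idx, name='Wind')

# Duplicate every date with a shifted value, then sort by date.
dup = pd.concat([s, s + 2.0]).sort_index()

# Each daily bin collapses its duplicates into one row, and the
# resulting index carries the resampling frequency.
daily = dup.resample('D').mean()
print(daily.index.freqstr)
print(daily)
```

Whether averaging the duplicates is appropriate depends on the data; sum, first, or last may fit better in other cases.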

The problem is caused by the frequency not being set explicitly. In most cases you can't be sure that your data has no gaps, so generate a date range with

rng = pd.date_range(start = '1973-05-01', end = '1973-09-30', freq='D')

then reindex your DataFrame with this rng and fill the resulting NaN values with the method or value of your choice.
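
The reindex step could look something like this. It is a sketch on a synthetic series with one missing day (values and the name `filled` are made up), and linear interpolation is just one assumed choice of fill method:

```python
import pandas as pd

# Hypothetical series with a gap: 1973-05-03 is missing.
idx = pd.to_datetime(['1973-05-01', '1973-05-02', '1973-05-04'])
s = pd.Series([7.4, 8.0, 11.5], index=idx, name='Wind')

# A complete daily range; reindexing adopts it, including its freq.
rng = pd.date_range(start='1973-05-01', end='1973-05-04', freq='D')
reindexed = s.reindex(rng)          # the missing day becomes NaN

# Fill the gap; fillna(value) or ffill() would be alternatives.
filled = reindexed.interpolate()
print(filled.index.freqstr)
print(filled)
```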

