How to set frequency with pd.to_datetime()?

Question

When fitting a statsmodels model, I receive a warning about the date frequency.

First, I import a dataset:

import pandas as pd
import statsmodels.api as sm

df = sm.datasets.get_rdataset(package='datasets', dataname='airquality').data

df['Year'] = 1973
df['Date'] = pd.to_datetime(df[['Year', 'Month', 'Day']])

df.drop(columns=['Year', 'Month', 'Day'], inplace=True)
df.set_index('Date', inplace=True, drop=True)

Next, I try to fit a simple exponential smoothing (SES) model:

fit = sm.tsa.SimpleExpSmoothing(df['Wind']).fit()

This returns the following warning:

/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/base/tsa_model.py:171: ValueWarning: No frequency information was provided, so inferred frequency D will be used. % freq, ValueWarning)

My dataset is daily, so the inferred 'D' is fine, but I would like to know how to set the frequency manually.

Note that the DatetimeIndex has no frequency set (see freq=None on the last line):

DatetimeIndex(['1973-05-01', '1973-05-02', '1973-05-03', '1973-05-04',
               '1973-05-05', '1973-05-06', '1973-05-07', '1973-05-08',
               '1973-05-09', '1973-05-10',
               ...
               '1973-09-21', '1973-09-22', '1973-09-23', '1973-09-24',
               '1973-09-25', '1973-09-26', '1973-09-27', '1973-09-28',
               '1973-09-29', '1973-09-30'],
              dtype='datetime64[ns]', name='Date', length=153, freq=None)
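Because get_rdataset downloads the data from the Rdatasets repository, here is a minimal offline sketch that builds the same kind of index from a small hypothetical frame (the Month, Day and Wind values are made up). It shows that the index produced by pd.to_datetime carries freq=None, while pd.infer_freq can still detect the daily spacing without modifying the index.

```python
import pandas as pd

# Hypothetical stand-in for the airquality data: five consecutive days in May 1973
df = pd.DataFrame({
    "Month": [5, 5, 5, 5, 5],
    "Day": [1, 2, 3, 4, 5],
    "Wind": [7.4, 8.0, 12.6, 11.5, 14.3],
})

df["Year"] = 1973
# Assemble datetimes from the Year/Month/Day columns, as in the question
df["Date"] = pd.to_datetime(df[["Year", "Month", "Day"]])
df = df.drop(columns=["Year", "Month", "Day"]).set_index("Date")

print(df.index.freq)            # None: evenly spaced dates, but no frequency attached
print(pd.infer_freq(df.index))  # D: the spacing pandas detects; the index is unchanged
```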

As per this answer, I've checked for missing dates, but there don't appear to be any:

pd.date_range(start='1973-05-01', end='1973-09-30').difference(df.index)

DatetimeIndex([], dtype='datetime64[ns]', freq='D')
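The same check can be written without hard-coding the bounds by building the expected daily range from the index's own minimum and maximum. A minimal sketch on a hypothetical index with one missing day; note that an empty result only means there are no gaps, it does not attach a frequency to the index.

```python
import pandas as pd

# Hypothetical index with 1973-05-03 missing
idx = pd.DatetimeIndex(["1973-05-01", "1973-05-02", "1973-05-04", "1973-05-05"], name="Date")

# Every calendar day between the first and last date
expected = pd.date_range(start=idx.min(), end=idx.max(), freq="D")

# Days in the expected range that the index lacks
missing = expected.difference(idx)
print(missing)

# The index itself still has no frequency set
print(idx.freq)
```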

How should I set the frequency for the index?

jezrael · Accepted Answer · 2019-02-11 12:25:57Z

pd.to_datetime does not set a frequency on the resulting index, so you need DataFrame.asfreq:

df = df.set_index('Date').asfreq('D')
print(df.index)

DatetimeIndex(['1973-05-01', '1973-05-02', '1973-05-03', '1973-05-04',
               '1973-05-05', '1973-05-06', '1973-05-07', '1973-05-08',
               '1973-05-09', '1973-05-10',
               ...
               '1973-09-21', '1973-09-22', '1973-09-23', '1973-09-24',
               '1973-09-25', '1973-09-26', '1973-09-27', '1973-09-28',
               '1973-09-29', '1973-09-30'],
              dtype='datetime64[ns]', name='Date', length=153, freq='D')
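A minimal sketch of set_index followed by asfreq on hypothetical values (if Date is already the index, as in the question, df.asfreq('D') can be called directly). When a day is missing, asfreq inserts a row for it filled with NaN.

```python
import pandas as pd

# Hypothetical daily data with one missing day (1973-05-03)
df = pd.DataFrame({
    "Date": pd.to_datetime(["1973-05-01", "1973-05-02", "1973-05-04"]),
    "Wind": [7.4, 8.0, 11.5],
})

daily = df.set_index("Date").asfreq("D")

print(daily.index.freqstr)  # D
print(daily)                # 1973-05-03 appears with Wind = NaN
```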

But if the index contains duplicate values, asfreq raises an error:

df = pd.concat([df, df])
df = df.set_index('Date')

print(df.asfreq('D').index)

ValueError: cannot reindex from a duplicate axis
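Index.is_unique tells you in advance whether this error will occur. The sketch below reproduces it on hypothetical values; since the exact message differs between pandas versions, it only catches ValueError. Keeping the first row per timestamp with Index.duplicated is shown as an alternative to aggregating, and is only appropriate under the assumption that the repeated rows are true copies.

```python
import pandas as pd

# Hypothetical daily frame, then stacked on itself as in pd.concat([df, df])
idx = pd.to_datetime(["1973-05-01", "1973-05-02", "1973-05-03"])
df = pd.DataFrame({"Wind": [7.4, 8.0, 12.6]}, index=idx)
dup = pd.concat([df, df])

print(dup.index.is_unique)  # False

try:
    dup.asfreq("D")
    failed = False
except ValueError as exc:
    # Message is version dependent, e.g. "cannot reindex from a duplicate axis"
    failed = True
    print("asfreq failed:", exc)

# Alternative: keep only the first row for each timestamp, then set the frequency
dedup = dup[~dup.index.duplicated(keep="first")].asfreq("D")
print(dedup.index.freqstr)  # D
```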

The solution is to use resample with an aggregation function:

print(df.resample('2D').mean().index)

DatetimeIndex(['1973-05-01', '1973-05-03', '1973-05-05', '1973-05-07',
               '1973-05-09', '1973-05-11', '1973-05-13', '1973-05-15',
               '1973-05-17', '1973-05-19', '1973-05-21', '1973-05-23',
               '1973-05-25', '1973-05-27', '1973-05-29', '1973-05-31',
               '1973-06-02', '1973-06-04', '1973-06-06', '1973-06-08',
               '1973-06-10', '1973-06-12', '1973-06-14', '1973-06-16',
               '1973-06-18', '1973-06-20', '1973-06-22', '1973-06-24',
               '1973-06-26', '1973-06-28', '1973-06-30', '1973-07-02',
               '1973-07-04', '1973-07-06', '1973-07-08', '1973-07-10',
               '1973-07-12', '1973-07-14', '1973-07-16', '1973-07-18',
               '1973-07-20', '1973-07-22', '1973-07-24', '1973-07-26',
               '1973-07-28', '1973-07-30', '1973-08-01', '1973-08-03',
               '1973-08-05', '1973-08-07', '1973-08-09', '1973-08-11',
               '1973-08-13', '1973-08-15', '1973-08-17', '1973-08-19',
               '1973-08-21', '1973-08-23', '1973-08-25', '1973-08-27',
               '1973-08-29', '1973-08-31', '1973-09-02', '1973-09-04',
               '1973-09-06', '1973-09-08', '1973-09-10', '1973-09-12',
               '1973-09-14', '1973-09-16', '1973-09-18', '1973-09-20',
               '1973-09-22', '1973-09-24', '1973-09-26', '1973-09-28',
               '1973-09-30'],
              dtype='datetime64[ns]', name='Date', freq='2D')
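A minimal sketch of the resample approach on hypothetical values where each day appears twice with different readings. resample('D') keeps one bin per day and mean() averages the duplicates within it; '2D', as in the answer, merges pairs of days instead.

```python
import pandas as pd

# Hypothetical data: two readings per day, so the index has duplicates
idx = pd.to_datetime(["1973-05-01", "1973-05-02", "1973-05-03"])
first = pd.DataFrame({"Wind": [2.0, 4.0, 6.0]}, index=idx)
second = pd.DataFrame({"Wind": [4.0, 6.0, 8.0]}, index=idx)
dup = pd.concat([first, second])

# One row per calendar day; duplicate readings are averaged
daily = dup.resample("D").mean()
print(daily.index.freqstr)     # D
print(daily["Wind"].tolist())  # [3.0, 5.0, 7.0]

# Two-day bins, as in the answer: [05-01, 05-03) and [05-03, 05-05)
two_day = dup.resample("2D").mean()
print(two_day.index.freqstr)     # 2D
print(two_day["Wind"].tolist())  # [4.0, 7.0]
```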

JoergVanAken · Accepted Answer · 2019-02-11 12:10:42Z


The warning is caused by the frequency not being set explicitly. In most cases you can't be sure that your data has no gaps, so generate a date range with

rng = pd.date_range(start='1973-05-01', end='1973-09-30', freq='D')

then reindex your DataFrame with this rng and fill the resulting NaN values with the method or value of your choice.
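A minimal sketch of the reindex step on hypothetical values with one missing day. Linear interpolation is used here as one possible fill choice (an assumption; forward fill or a constant would work the same way). Because the new index is rng itself, the result carries freq='D'.

```python
import pandas as pd

# Hypothetical data with 1973-05-03 missing
idx = pd.to_datetime(["1973-05-01", "1973-05-02", "1973-05-04"])
df = pd.DataFrame({"Wind": [2.0, 4.0, 8.0]}, index=idx)

rng = pd.date_range(start="1973-05-01", end="1973-05-04", freq="D")

# reindex adds a NaN row for each date in rng that df lacks
filled = df.reindex(rng)
# Fill the gap by linear interpolation between neighbouring rows
filled = filled.interpolate()

print(filled.index.freqstr)     # D
print(filled["Wind"].tolist())  # [2.0, 4.0, 6.0, 8.0]
```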


By "method or value of choice", the method or fill_value parameters of the asfreq() function are meant; see the details at pandas.pydata.org/pandas-docs/stable/reference/api/….
– questionto42
Commented Aug 4, 2020 at 19:05
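Both options from the comment, sketched on hypothetical values: fill_value supplies a constant for the rows that asfreq creates, and method='ffill' copies the previous row into them. Per the pandas documentation, method does not fill NaN values that were already present in the data.

```python
import pandas as pd

# Hypothetical data with 1973-05-03 missing
idx = pd.to_datetime(["1973-05-01", "1973-05-02", "1973-05-04"])
df = pd.DataFrame({"Wind": [2.0, 4.0, 8.0]}, index=idx)

# The new row for 1973-05-03 gets the constant 0.0
zeros = df.asfreq("D", fill_value=0.0)
print(zeros["Wind"].tolist())  # [2.0, 4.0, 0.0, 8.0]

# The new row for 1973-05-03 copies the value from 1973-05-02
ffilled = df.asfreq("D", method="ffill")
print(ffilled["Wind"].tolist())  # [2.0, 4.0, 4.0, 8.0]
```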
