
I have a DataFrame of boolean variables, indexed by timestamps. The timestamps are irregular and I want to fill in the gaps. I know that the required frequency is 3 ms.

So far, I do the following:

import pandas as pd

df = pd.read_csv(path, sep=';')
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='raise')
df = df.sort_values('timestamp')
df = df.set_index('timestamp')
# reindex onto a regular 1 ms grid (date_range, not period_range), then forward-fill
df = df.reindex(pd.date_range(df.index[0], df.index[-1], freq='ms'))
df = df.ffill()

So I reindex on a 1 ms grid and forward-fill the missing values. This suits my case: all variables are boolean, so at any moment the current state is the last one that appeared in my data.
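The same reindex-and-forward-fill idea also works with a 3 ms step. Below is a minimal sketch on a small hypothetical sample (made-up columns `a` and `b` standing in for the CSV at `path`). Passing `method="ffill"` to `reindex` gives each grid point the last row at or before it, so readings that fall between grid points are not dropped, as they would be if you reindexed first and called `ffill()` afterwards.

```python
import pandas as pd

# Hypothetical sample standing in for the CSV at `path`: two boolean
# columns observed at irregular timestamps.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2019-01-25 12:00:00.000", "2019-01-25 12:00:00.007", "2019-01-25 12:00:00.012"]
        ),
        "a": [True, False, True],
        "b": [False, False, True],
    }
).set_index("timestamp")

# Regular 3 ms grid from the first to the last timestamp.
grid = pd.date_range(df.index[0], df.index[-1], freq="3ms")

# method="ffill": each grid point gets the last row at or before it,
# so the off-grid reading at .007 shows up at .009.
regular = df.reindex(grid, method="ffill")
print(regular)
```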

How can I resample every 3 milliseconds?

EDIT: It seems that DataFrame.resample can also be used for upsampling. Any suggestions on how to use it in my case? I don't quite understand how it works.

  • Is it possible to add some sample data with the expected output?
    – jezrael
    Commented Jan 25, 2019 at 12:54

2 Answers


Use DataFrame.asfreq:

import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2015-02-01 15:14:11.30',
                                 '2015-02-01 15:14:11.36',
                                 '2015-02-01 15:14:11.39']),
    'B': [7, 10, 3]
})
print(df)
                timestamp   B
0 2015-02-01 15:14:11.300   7
1 2015-02-01 15:14:11.360  10
2 2015-02-01 15:14:11.390   3

df = df.set_index('timestamp').asfreq('3ms', method='ffill')

print(df)
                          B
timestamp                  
2015-02-01 15:14:11.300   7
2015-02-01 15:14:11.303   7
2015-02-01 15:14:11.306   7
2015-02-01 15:14:11.309   7
2015-02-01 15:14:11.312   7
2015-02-01 15:14:11.315   7
2015-02-01 15:14:11.318   7
2015-02-01 15:14:11.321   7
2015-02-01 15:14:11.324   7
2015-02-01 15:14:11.327   7
2015-02-01 15:14:11.330   7
2015-02-01 15:14:11.333   7
2015-02-01 15:14:11.336   7
2015-02-01 15:14:11.339   7
2015-02-01 15:14:11.342   7
2015-02-01 15:14:11.345   7
2015-02-01 15:14:11.348   7
2015-02-01 15:14:11.351   7
2015-02-01 15:14:11.354   7
2015-02-01 15:14:11.357   7
2015-02-01 15:14:11.360  10
2015-02-01 15:14:11.363  10
2015-02-01 15:14:11.366  10
2015-02-01 15:14:11.369  10
2015-02-01 15:14:11.372  10
2015-02-01 15:14:11.375  10
2015-02-01 15:14:11.378  10
2015-02-01 15:14:11.381  10
2015-02-01 15:14:11.384  10
2015-02-01 15:14:11.387  10
2015-02-01 15:14:11.390   3
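With irregular timestamps it matters whether the fill happens inside `asfreq` or afterwards. A small sketch, assuming a hypothetical extra reading at .361 that does not lie on the 3 ms grid starting at .300: with `method='ffill'` that reading shows up at .363, whereas `asfreq('3ms').ffill()` first keeps only the on-grid rows, so .361 is lost and the column turns into float because of the intermediate NaNs.

```python
import pandas as pd

# Hypothetical variant of the sample above: the middle reading is at .361,
# which is not on the 3 ms grid that starts at .300.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2015-02-01 15:14:11.300", "2015-02-01 15:14:11.361", "2015-02-01 15:14:11.390"]
        ),
        "B": [7, 10, 3],
    }
).set_index("timestamp")

# Fill inside asfreq: each grid point takes the last reading at or before it.
with_method = df.asfreq("3ms", method="ffill")

# Fill afterwards: off-grid rows are discarded before ffill runs.
fill_after = df.asfreq("3ms").ffill()

t = pd.Timestamp("2015-02-01 15:14:11.363")
print(with_method.loc[t, "B"])  # 10
print(fill_after.loc[t, "B"])  # 7.0
```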
  • Honestly, it is no more efficient than resample. Commented Jan 25, 2019 at 13:21

If the timestamps are in the index:

df = df.resample('3ms').ffill()
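On the question's EDIT about how `resample` upsamples: `resample('3ms')` builds a regular grid of 3 ms bins and `.ffill()` forward-fills the data onto it. One difference from `asfreq` worth checking: as far as I can tell, the default `origin='start_day'` anchors the bins at midnight of the first day, so for the sample data from the `asfreq` answer the grid starts at 15:14:11.298 (with a leading NaN) rather than .300. The sketch below reuses that sample data and shows that `origin='start'` (available since pandas 1.1) aligns the grid with the first timestamp, which makes the result match `asfreq('3ms', method='ffill')`.

```python
import pandas as pd

# Sample data from the asfreq answer, timestamps already in the index.
df = pd.DataFrame(
    {"B": [7, 10, 3]},
    index=pd.to_datetime(
        ["2015-02-01 15:14:11.300", "2015-02-01 15:14:11.360", "2015-02-01 15:14:11.390"]
    ),
)

# Default origin="start_day": bins counted from midnight, so the first grid
# point is .298 and the last one is .388 (the .390 reading is never reached).
default = df.resample("3ms").ffill()

# origin="start": bins counted from the first timestamp (.300).
aligned = df.resample("3ms", origin="start").ffill()
reference = df.asfreq("3ms", method="ffill")

print(default.head(2))
print(aligned.index.equals(reference.index), aligned["B"].tolist() == reference["B"].tolist())
```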

EDIT: performance benchmark

import time
import pandas as pd

# Two regular timestamps about 73 minutes apart; upsampling
# them to 3 ms yields roughly 1.46 million rows.
dd = {'dt': ['2018-01-01 00:00:00', '2018-01-01 01:12:59'], 'v': [1, 1]}

df = pd.DataFrame(data=dd)
df['dt'] = pd.to_datetime(df['dt'])
df = df.set_index('dt')

# Time resample + forward fill
start = time.time()
df = df.resample('3ms').ffill()
print(time.time() - start)

# Rebuild the same input, then time asfreq with forward fill
df = pd.DataFrame(data=dd)
df['dt'] = pd.to_datetime(df['dt'])
df = df.set_index('dt')

start = time.time()
df = df.asfreq('3ms', method='ffill')
print(time.time() - start)

print(df.shape)

result:

0.03699994087219238
0.029999732971191406
(1459667, 1)
  • Thanks. Your answer is correct but much less efficient than jezrael's: on my machine, for my data set, your solution took 17 s whereas jezrael's took 0.24 s.
    – Matina G
    Commented Jan 25, 2019 at 13:13
  • This is not possible. I have run the benchmark; check the results in the edit to my answer. Commented Jan 25, 2019 at 13:19
  • @naivepredictor The data has irregular timestamps, so it is possible. You tested with regular data, so it is a different test.
    – jezrael
    Commented Jan 25, 2019 at 13:21
  • I do not know how to respond to this... That is what I saw on my computer with my data. I ran it over and over again and the result was the same.
    – Matina G
    Commented Jan 25, 2019 at 13:21
  • So this is a really good finding. I was not aware of it and always used resample without paying attention to performance, as I did not know that using asfreq makes such a huge difference. Commented Jan 25, 2019 at 13:29
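Following jezrael's point that the benchmark above uses only two regular rows, here is a sketch of the same comparison on irregular data. The data is hypothetical (random millisecond offsets from a fixed seed, with the first reading at midnight so that both methods produce the same grid), and timings will depend on the pandas version, the machine and the data, so no particular result is implied.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical irregular data: boolean readings at random millisecond
# offsets within about 73 minutes, first reading exactly at midnight.
rng = np.random.default_rng(0)
offsets_ms = np.unique(rng.integers(1, 4_380_000, size=200_000))
offsets_ms = np.concatenate(([0], offsets_ms))
index = pd.Timestamp("2018-01-01") + pd.to_timedelta(offsets_ms, unit="ms")
df = pd.DataFrame({"v": rng.integers(0, 2, size=len(index)).astype(bool)}, index=index)

# Time asfreq with forward fill
start = time.perf_counter()
by_asfreq = df.asfreq("3ms", method="ffill")
print("asfreq:  ", time.perf_counter() - start)

# Time resample + forward fill
start = time.perf_counter()
by_resample = df.resample("3ms").ffill()
print("resample:", time.perf_counter() - start)

print(by_asfreq.shape, by_resample.shape)
```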
