Here's a solution:
from pandas import DataFrame
from datetime import datetime, time

# Sample data: five transactions, two at 13:01:02 and two at 13:03:04
df = DataFrame([
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=16, hour=13, minute=1, second=2)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=16, hour=13, minute=3, second=4)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=17, hour=13, minute=5, second=6)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=18, hour=13, minute=1, second=2)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=19, hour=13, minute=3, second=4)}
])

# Count how often each time of day occurs in df (observed times only)
times = DataFrame(df['DateTime'].dt.time).groupby(['DateTime'])['DateTime'].count()

# Generate every second of the day and look up its count, defaulting to 0
time_counts = DataFrame((
    (t := time(h, m, s), int(times[t]) if t in times else 0)
    for h in range(24) for m in range(60) for s in range(60)
), columns=['Time', 'Counts'])

print(time_counts)
print(time_counts[time_counts['Time'] == time(13, 1, 2)])
This creates an additional Series, times, based on the original data (assuming you don't even have separate Time and Date columns; if you do, you can of course use those). times takes all the times of day from df, then groups and counts them.
However, that is missing the times which don't occur in df, so time_counts is constructed by generating all possible times and selecting either the count from times, or 0 if the time doesn't exist in times.
Result:
Time Counts
0 00:00:00 0
1 00:00:01 0
2 00:00:02 0
3 00:00:03 0
4 00:00:04 0
... ... ...
86395 23:59:55 0
86396 23:59:56 0
86397 23:59:57 0
86398 23:59:58 0
86399 23:59:59 0
[86400 rows x 2 columns]
Time Counts
46862 13:01:02 2
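The step that fills in the missing times could also be done without a Python-level lookup per row, by aligning the counts to a full list of times with reindex. This is only a sketch of that idea, not a tested drop-in replacement: all_times is a name introduced here for illustration, and the sample data is reduced to the DateTime column.

```python
from datetime import datetime, time

import pandas as pd

# Same timestamps as the sample data above, reduced to the DateTime column.
df = pd.DataFrame({'DateTime': [
    datetime(2022, 6, 16, 13, 1, 2),
    datetime(2022, 6, 16, 13, 3, 4),
    datetime(2022, 6, 17, 13, 5, 6),
    datetime(2022, 6, 18, 13, 1, 2),
    datetime(2022, 6, 19, 13, 3, 4),
]})

# Count how often each time of day occurs (only observed times appear here).
times = df['DateTime'].dt.time.value_counts()

# Hypothetical helper list: every second of the day, in order (86400 entries).
all_times = [time(h, m, s) for h in range(24) for m in range(60) for s in range(60)]

# Align the counts to the full list; times that never occur get a count of 0.
time_counts = (
    times.reindex(all_times, fill_value=0)
    .rename_axis('Time')
    .reset_index(name='Counts')
)

# Show only the times that actually occur.
print(time_counts[time_counts['Counts'] > 0])
```

The resulting frame should have the same two columns as time_counts above, so the later filter on time(13, 1, 2) can be applied to it unchanged.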
User @riley's suggestion seems to point at a nicer solution, something like:
# create df as before, then:
df['Time'] = df['DateTime'].dt.time.astype('category')
result = df.value_counts('Time')
But that seems to have the same problem as you originally stated, even though the dtype of the 'Time' column is category. Perhaps someone has additional suggestions to make that work.
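One possible way to make the categorical idea work: astype('category') infers its categories from the values that are present, so times that never occur are not categories and cannot show up with a count of 0. Declaring every second of the day as a category up front, and counting on the Series rather than the DataFrame, may give the zero rows. The following is a minimal sketch under that assumption; all_times is again a name introduced here for illustration.

```python
from datetime import datetime, time

import pandas as pd

# Same timestamps as the sample data above, reduced to the DateTime column.
df = pd.DataFrame({'DateTime': [
    datetime(2022, 6, 16, 13, 1, 2),
    datetime(2022, 6, 16, 13, 3, 4),
    datetime(2022, 6, 17, 13, 5, 6),
    datetime(2022, 6, 18, 13, 1, 2),
    datetime(2022, 6, 19, 13, 3, 4),
]})

# Hypothetical helper list: every second of the day (86400 entries).
all_times = [time(h, m, s) for h in range(24) for m in range(60) for s in range(60)]

# Declare all possible times as categories, not just the observed ones.
df['Time'] = pd.Categorical(df['DateTime'].dt.time, categories=all_times)

# Series.value_counts on a categorical column reports every category,
# including those that no row belongs to.
result = df['Time'].value_counts(sort=False)

print(len(result))         # number of categories counted
print(result[result > 0])  # only the times that actually occur
```

The trade-off is that the full list of categories still has to be generated once, so this mainly moves the zero-filling into pandas rather than avoiding it.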
pandas? (probably add that tag) Have you tried anything at all? Your problem is fairly common and the pandas documentation and available guides and tutorials often tackle it. What exactly are you stuck on? Please share some code, but also provide an example (a few rows) of the data and what output you expect.
.value_counts() is what you need, isn't the answer simply the result of .value_counts(), with an added row for 0 values, which would be the total number of rows in the original DataFrame, minus the total of values in the .value_counts() result?
value_counts will then include the zero values.