
I'm trying to build a frequency DataFrame in pandas (Python, in Jupyter) that counts how often each HH:MM:SS value occurs in a datetime column.

Is there a function that can do this for me, other than iterating over every HH:MM:SS combination and counting the matches? The result needs to include times with a count of 0.

.value_counts() produces almost what I need, but times with a count of 0 are not included.

Many thanks, in advance, for your assistance :)

EDIT: sample data:

TransactionID  DateTime           Date      Time
012sad9j20j    01/01/22 04:23:32  01/01/22  04:23:32
938hfd82dj2    07/04/22 23:12:59  07/04/22  23:12:59
s9j20jd902j    18/05/22 13:44:19  18/05/22  13:44:19

Expected to generate a dataframe containing:

Time Count
04:23:31 0
04:23:32 1
04:23:33 0
  • Since you're talking about a 'DataFrame', can we assume you're using pandas? (If so, please add that tag.) Have you tried anything at all? Your problem is fairly common, and the pandas documentation and available guides and tutorials often tackle it. What exactly are you stuck on? Please share some code, but also provide an example (a few rows) of the data and the output you expect.
    – Grismar
    Commented Jun 15, 2022 at 21:45
  • If .value_counts() is what you need, isn't the answer simply the result of .value_counts(), with an added row for 0 values, which would be the total number of rows in the original DataFrame, minus the total of values in the .value_counts() result?
    – Grismar
    Commented Jun 15, 2022 at 21:47
  • Thanks both; I've edited to include the tag, and yes, I am using pandas. Essentially, I have a table of transactions, each with a datetime field. I have separated this datetime into a date column and a time column. What I want to do is create a summary DataFrame counting all records for each specific time; however, there isn't a transaction for every single time in a day. The purpose is to create a line plot showing the times of increased transactions as well as the quietest periods.
    – Shaun R
    Commented Jun 15, 2022 at 22:49
  • My current plan is to iterate through each time element (00:00:00, 00:00:01, 00:00:02, etc.) and count rows. This is a long-winded way, and I was hoping for something a lot more efficient. I'm relatively new to Python and learning via Google, so I apologise if I am looking at this incorrectly (or even using SO incorrectly; I've had a few questions closed because of my errors!)
    – Shaun R
    Commented Jun 15, 2022 at 22:54
  • You can always change the dtype of your Time column to be a category, whose categorical values are all possible times. The value_counts will then include the zero values.
    – Riley
    Commented Jun 16, 2022 at 0:23

1 Answer

Here's a solution:

from pandas import DataFrame
from datetime import datetime, time

df = DataFrame([
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=16, hour=13, minute=1, second=2)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=16, hour=13, minute=3, second=4)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=17, hour=13, minute=5, second=6)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=18, hour=13, minute=1, second=2)},
    {'TransactionID': 'xyz1', 'DateTime': datetime(year=2022, month=6, day=19, hour=13, minute=3, second=4)}
])

times = DataFrame(df['DateTime'].dt.time).groupby(['DateTime'])['DateTime'].count()

time_counts = DataFrame((
    (t := time(h, m, s), int(times[t]) if t in times else 0)
    for h in range(24) for m in range(60) for s in range(60)
), columns=['Time', 'Counts'])
print(time_counts)

print(time_counts[time_counts['Time'] == time(13, 1, 2)])

This creates an additional Series, times, from the original data (assuming you don't already have separate Time and Date columns; if you do, you can of course use those). times takes all the times from df, then groups and counts them.

However, times is missing the times that don't occur in df, so time_counts is constructed by generating all possible times and either selecting the count from times, or using 0 if the time doesn't exist in times.

Result:

           Time  Counts
0      00:00:00       0
1      00:00:01       0
2      00:00:02       0
3      00:00:03       0
4      00:00:04       0
...         ...     ...
86395  23:59:55       0
86396  23:59:56       0
86397  23:59:57       0
86398  23:59:58       0
86399  23:59:59       0

[86400 rows x 2 columns]
           Time  Counts
46862  13:01:02       2
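
A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: count the times that do occur, then reindex against every second of the day so that missing times get a count of 0. It assumes the times can be handled as HH:MM:SS strings instead of datetime.time objects, and the names all_times and counts are only illustrative.

```python
import pandas as pd

# Same sample data as above, built from strings for brevity.
df = pd.DataFrame({
    'TransactionID': ['xyz1'] * 5,
    'DateTime': pd.to_datetime([
        '2022-06-16 13:01:02', '2022-06-16 13:03:04', '2022-06-17 13:05:06',
        '2022-06-18 13:01:02', '2022-06-19 13:03:04',
    ]),
})

# Every second of a day as an HH:MM:SS string (the date itself is arbitrary).
all_times = pd.date_range(
    '2000-01-01', periods=24 * 60 * 60, freq='s'
).strftime('%H:%M:%S')

# Count the times that occur, then add the missing ones with a count of 0.
counts = df['DateTime'].dt.strftime('%H:%M:%S').value_counts()
time_counts = (
    counts.reindex(all_times, fill_value=0)
    .rename_axis('Time')
    .reset_index(name='Counts')
)

print(time_counts[time_counts['Time'] == '13:01:02'])
```

Because the Time column holds strings here, it may need converting (for example back to datetimes) before plotting.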

User @riley's suggestion seems to point at a nicer solution, something like:

# create df as before, then:

df['Time'] = df['DateTime'].dt.time.astype('category')
result = df.value_counts('Time')

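But that has the same problem you originally described, even though the dtype of the 'Time' column is category, presumably because astype('category') only creates categories for the values that actually occur in the column. Perhaps someone has additional suggestions to make that work.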
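
One possible way to get zero counts out of the categorical approach, as a minimal sketch: declare every second of the day as a category up front, so that unobserved times still exist as categories. This assumes HH:MM:SS strings are acceptable as category labels; all_times and time_dtype are illustrative names, not part of the original code.

```python
import pandas as pd

# Same sample data as above, built from strings for brevity.
df = pd.DataFrame({
    'TransactionID': ['xyz1'] * 5,
    'DateTime': pd.to_datetime([
        '2022-06-16 13:01:02', '2022-06-16 13:03:04', '2022-06-17 13:05:06',
        '2022-06-18 13:01:02', '2022-06-19 13:03:04',
    ]),
})

# All 86400 possible HH:MM:SS labels become the categories.
all_times = pd.date_range(
    '2000-01-01', periods=24 * 60 * 60, freq='s'
).strftime('%H:%M:%S')
time_dtype = pd.CategoricalDtype(categories=all_times)

df['Time'] = df['DateTime'].dt.strftime('%H:%M:%S').astype(time_dtype)

# On a categorical Series, value_counts also reports unobserved categories
# with a count of 0; sort=False avoids reordering the result by count.
result = df['Time'].value_counts(sort=False)

print(result[result > 0])
```

Note that this calls value_counts on the Series (df['Time']) rather than on the DataFrame.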

  • Your first code snippet worked an absolute breeze - thanks so much. Just to add, I wasn't able to plot these values straight away; I received a "float() argument must be a string or a number, not 'datetime.time'" message. To fix this, I created an additional column which combines the current date with the time in every row, using time_counts['datetime'] = [datetime.datetime.combine(datetime.date.today(), t) for t in time_counts['Time']]
    – Shaun R
    Commented Jun 16, 2022 at 6:20
