0

Task:

Calculate the frequency of each ID for each month of 2021

Frequency formula: Activity period (Length of time between last activity and first activity) / (Number of activity Days - 1)

e.g. ID 1 - Month 2: Activity Period (2021-02-23 - 2021-02-18 = 5 days) / (3 active days - 1) == Frequency = 2,5

Sample:

times = [
    '2021-02-18',
    '2021-02-22',
    '2021-02-23',
    '2021-04-23',
    '2021-01-18',
    '2021-01-19',
    '2021-01-20',
    '2021-01-03',
    '2021-02-04',
    '2021-02-04'
] 

id = [1, 1, 1, 1, 44, 44, 44, 46, 46, 46]

df = pd.DataFrame({'ID':id, 'Date': pd.to_datetime(times)})

df = df.reset_index(drop=True)

print(df)

       ID       Date
0   1 2021-02-18
1   1 2021-02-22
2   1 2021-02-23
3   1 2021-04-23
4  44 2021-01-18
5  44 2021-01-19
6  44 2021-01-20
7  46 2021-01-03
8  46 2021-02-04
9  46 2021-02-04

Desired Output:

If frequency negative == 0

  id  01_2021  02_2021  03_2021  04_2021
0   1        0        2        0        0
1  44        1        0        0        0
2  46        0        0        0        0 
0

2 Answers 2

1

Try a pivot_table with a custom aggfunc:

# Create Columns For Later
dr = pd.date_range(start=df['Date'].min(),
                   end=df['Date'].max() + pd.offsets.MonthBegin(1), freq='M') \
    .map(lambda dt: dt.strftime('%m_%Y'))

new_df = (
    df.pivot_table(
        index='ID',
        # Columns are dates in MM_YYYY format
        columns=df['Date'].dt.strftime('%m_%Y'),
        # Custom Agg Function
        aggfunc=lambda x: (x.max() - x.min()) /
                          pd.offsets.Day(max(1, len(x) - 1))
        # max(1, len(x) -1) to prevent divide by 0
    )
        # Fix Axis Names and Column Levels
        .droplevel(0, axis=1)
        .rename_axis(None, axis=1)
        # Reindex  to include every month from min to max date
        .reindex(dr, axis=1)
        # Clip to exclude negatives
        .clip(lower=0)
        # Fillna with 0
        .fillna(0)
        # Reset index
        .reset_index()
)

print(new_df)

new_df:

   ID  01_2021  02_2021  03_2021  04_2021
0   1      0.0      2.5      0.0      0.0
1  44      1.0      0.0      0.0      0.0
2  46      0.0      0.0      0.0      0.0
2
  • Thanks Henry! What if I want to take the cumulative Number of activity Days instead?
    – Robin_Hcp
    Commented May 18, 2021 at 7:21
  • You’re welcome. That should be a separate question if there are not already answers.
    – Henry Ecker
    Commented May 18, 2021 at 10:38
0

You will need to pivot the table, but first if you want only the month and year of the date, you need to transform it.

df['Date'] = df.Date.map(lambda s: "{}_{}".format(s.year,s.month))
df['counts'] = 1
df_new = pd.pivot_table(df, index=['ID'],
                        columns=['Date'], aggfunc=np.sum)

Not the answer you're looking for? Browse other questions tagged or ask your own question.