I'm struggling to find an elegant way to create a running total column in my dataframe. It should start counting when two criteria are met and reset whenever they aren't.
If the user in the example frame below is the same as in the previous row and 'Value Col' is 0, the running total should start and increase by one for every row until either the user changes or 'Value Col' is not 0.
This is being run on a very large dataset (30+ million rows), so I'm hoping for a solution that uses built-in, optimised functions, but I can brute-force it with .apply if that's the only option.
Example:
User | Value Col | Running total |
---|---|---|
One | 2 | 0 |
One | 0 | 1 |
One | 0 | 2 |
One | 0 | 3 |
One | 1 | 0 |
One | 3 | 0 |
One | 0 | 1 |
One | 0 | 2 |
Two | 0 | 1 |
Two | 0 | 2 |
Two | 0 | 3 |
Two | 3 | 0 |
Two | 0 | 1 |
Two | 0 | 2 |
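
For reference, here is a rough sketch of one vectorised way this could be approached, reproducing the example frame above. It is an assumption-laden sketch, not a benchmarked solution: the names `is_zero`, `new_run` and `run_id` are purely illustrative, and it follows the example table, where the first row of a new user counts as 1 if its 'Value Col' is 0. The idea is to label each consecutive run of rows that share the same user and the same zero/non-zero state, number the rows within each run, and blank out the runs that aren't zeros.

```python
import pandas as pd

# Rebuild the example frame from the table above
df = pd.DataFrame({
    "User": ["One"] * 8 + ["Two"] * 6,
    "Value Col": [2, 0, 0, 0, 1, 3, 0, 0, 0, 0, 0, 3, 0, 0],
})

# True where the value is 0 (the rows that should be counted)
is_zero = df["Value Col"].eq(0)

# A new run starts whenever the user changes or the zero/non-zero state flips
new_run = df["User"].ne(df["User"].shift()) | is_zero.ne(is_zero.shift())

# Cumulative sum of the run starts gives every run its own id
run_id = new_run.cumsum()

# Number rows within each run starting at 1, then force non-zero rows to 0
df["Running total"] = (df.groupby(run_id).cumcount() + 1).where(is_zero, 0)

print(df)
```

This only uses `shift`, `cumsum` and `groupby(...).cumcount()`, so it avoids a row-wise `.apply`. I haven't timed it on 30+ million rows, and it assumes the frame is already sorted so that each user's rows sit together in the intended order.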