Python Apply function to create new rows in loop

Question

Goal: Here is a sample of a dataset with the columns "ID", "PHASENAME", "CDAYS", "MULTI_FACTOR", "DAY_COUNTER", and "DAILY_LABOR_PERCENT". For every "ID", "PHASENAME", and "CDAYS" combination, I want to extend "DAY_COUNTER" from whatever its last day is up to 100 days. I also need to do the same for "DAILY_LABOR_PERCENT".

currently

ID	PHASENAME	C_DAYS	Multi_Factor	DAY_COUNTER	DAILY_LABOR_%
BAR	C	14.0	7.142857	1	1.0
BAR	C	14.0	7.142857	2	5.0
BAR	C	14.0	7.142857	13	4.0
BAR	C	14.0	7.142857	14	0.0

need it to be

ID	PHASENAME	C_DAYS	Multi_Factor	DAY_COUNTER	DAILY_LABOR_%
BAR	C	14.0	7.142857	1	.14
BAR	C	14.0	7.142857	2	some number
BAR	C	14.0	7.142857	99	some number
BAR	C	14.0	7.142857	100	0.0

In this sample dataset, this person worked for 14 days and their labor percentage is recorded for each day. Instead of having 14 days of labor, I want 100 days of labor (formulas: 'DAY_COUNTER' * 'Multi_Factor' and 'DAILY_LABOR_PERCENT' / 'Multi_Factor').

Also, I'm worried that once I enlarge this dataset, hard-coding the range as 100 won't work.

Looking for

The labor values are repeating, so when I apply my logic I am not getting the true labor percentage.
Suggestions on how to approach this with a larger dataset.

code

import pandas as pd
import numpy as np


data={
    "ID": [ "BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR"],
    "PHASENAME": [ "C","C","C","C","C","C","C","C","C","C","C","C","C","C"],
    "C_DAYS": [ 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0],
    "Multi_Factor": [7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857],
    "DAY_COUNTER": [1,2,3,4,5,6,7,8,9,10,11,12,13,14],
    "DAILY_LABOR_PERCENT": [1.0,5.0,9.0,11.0,10.0,9.0,9.0,9.0,8.0,10.0,8.0,7.0,4.0,0.0],
    }

df=pd.DataFrame(data)
df1=df.copy()


n = 100
first_day = df["DAY_COUNTER"].iloc[0]

rep = np.tile(df.values, (n // len(df) + 1, 1))
out = pd.DataFrame(rep, columns=df.columns).iloc[:n]

out["DAY_COUNTER"] = range(first_day, n + first_day)


def labor_factor(row):
    return row['DAILY_LABOR_PERCENT'] / row['Multi_Factor']

#creating empty list to store rows for each id, phasename, and cdays
new_rows= []

#iterating through each index range and applying function to get day_counter to 100

for i in range(len(out)):
    out['new_labor']= out.apply(labor_factor,axis=1)

@Barmar I’m not sure if I’m understanding. I thought that my math function could create a new value and the iteration would have created new rows for that value in the new dataset. — Ty Kendall, Commented Jul 3 at 22:09
The problem is how you're passing the arguments to the function. .apply() isn't passing the entire row, it's just passing one value. — Barmar, Commented Jul 3 at 22:11
I think you may be confusing how Pandas.DataFrame.apply and Pandas.Series.apply work. .iloc returns a series. — Barmar, Commented Jul 3 at 22:12

e-motta · Accepted Answer · 2024-07-03 22:21:34Z

As a general rule, you want to avoid iterating over dataframe rows, as that's very inefficient.

IIUC, you can instead repeat your dataframe using something vectorized like numpy.tile until you have the desired number of rows, then adjust the values accordingly:

n = 100
first_day = df["DAY_COUNTER"].iloc[0]

rep = np.tile(df.values, (n // len(df) + 1, 1))
out = pd.DataFrame(rep, columns=df.columns).iloc[:n]

out["DAY_COUNTER"] = range(first_day, n + first_day)

     ID PHASENAME C_DAYS Multi_Factor  DAY_COUNTER DAILY_LABOR_PERCENT
0   BAR         C   14.0     7.142857            1                 1.0
1   BAR         C   14.0     7.142857            2                 5.0
2   BAR         C   14.0     7.142857            3                 9.0
3   BAR         C   14.0     7.142857            4                11.0
4   BAR         C   14.0     7.142857            5                10.0
..  ...       ...    ...          ...          ...                 ...
95  BAR         C   14.0     7.142857           96                 7.0
96  BAR         C   14.0     7.142857           97                 4.0
97  BAR         C   14.0     7.142857           98                 0.0
98  BAR         C   14.0     7.142857           99                 1.0
99  BAR         C   14.0     7.142857          100                 5.0
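
For a larger dataset with several "ID"/"PHASENAME" combinations, the same repeat-and-renumber idea can be applied per group. The following is only a sketch under a few assumptions: the helper name `extend_days` and the second ID "FOO" are made up for illustration, the sample values are shortened, and `pd.concat` is swapped in for `np.tile` so that numeric columns keep their dtypes instead of becoming `object`.

```python
import pandas as pd


def extend_days(group: pd.DataFrame, n: int = 100) -> pd.DataFrame:
    """Repeat one group's rows until it has n rows, then renumber DAY_COUNTER.

    Hypothetical helper, not part of pandas.
    """
    reps = -(-n // len(group))  # ceiling division: enough copies to reach n rows
    out = pd.concat([group] * reps, ignore_index=True).iloc[:n].copy()
    first_day = int(group["DAY_COUNTER"].iloc[0])
    out["DAY_COUNTER"] = range(first_day, first_day + n)
    return out


# Small illustrative frame; "FOO" is an invented second ID.
df = pd.DataFrame({
    "ID": ["BAR"] * 3 + ["FOO"] * 2,
    "PHASENAME": ["C"] * 5,
    "DAY_COUNTER": [1, 2, 3, 1, 2],
    "DAILY_LABOR_PERCENT": [20.0, 50.0, 30.0, 60.0, 40.0],
})

# One extended block per (ID, PHASENAME) combination, stacked back together.
pieces = [
    extend_days(group, n=10)
    for _, group in df.groupby(["ID", "PHASENAME"], sort=False)
]
out = pd.concat(pieces, ignore_index=True)
print(out)
```

Because `n` is a parameter and the number of copies is derived from each group's length, nothing here depends on a group having exactly 14 rows. Note that the labor values still repeat in cycles, exactly as in the output above.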

Where is the multiplication by Multi_Factor? – Barmar, Commented Jul 3 at 22:41
@Barmar it's not needed to "get day_counter to 100". – e-motta, Commented Jul 3 at 22:48
@e-motta thanks, for teaching me about numpy tile and for the refresher about for loops. However, now, I am stuck with expanding the "DAILY_LABOR_PERCENT". Right now, the labor is repeating. I will update my problem and query above. – Ty Kendall, Commented Jul 3 at 23:45
@Barmar thanks, for teaching me about numpy tile and for the refresher about for loops. However, now, I am stuck with expanding the "DAILY_LABOR_PERCENT". Right now, the labor is repeating. I will update my problem and query above. – Ty Kendall, Commented Jul 3 at 23:45
@TyKendall If you need to divide column 'DAILY_LABOR_PERCENT' proportionately by how much the dataframe was enlarged, you can do out["DAILY_LABOR_PERCENT"] = out["DAILY_LABOR_PERCENT"] / (n / len(df)). n / len(df) in this case is 7.142857, but this way you make it dynamic and don't need to store a repeated value in another column. – e-motta, Commented Jul 4 at 0:01
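
To make the last comment concrete, here is a rough sketch with the question's 14 labor values. Option 1 is the division by `n / len(df)` suggested in the comment. Option 2 is not from the thread: it is one possible way to avoid the repeating pattern, stretching the 14-day curve over 100 days with linear interpolation (`np.interp`) before scaling. Whether interpolation is the right model for the labor data is an assumption, and the names `tiled` and `stretched` are illustrative only.

```python
import numpy as np
import pandas as pd

labor = [1.0, 5.0, 9.0, 11.0, 10.0, 9.0, 9.0, 9.0, 8.0, 10.0, 8.0, 7.0, 4.0, 0.0]
df = pd.DataFrame({"DAY_COUNTER": range(1, 15), "DAILY_LABOR_PERCENT": labor})

n = 100
factor = n / len(df)  # 100 / 14, about 7.142857; computed instead of stored in a column

# Option 1 (from the comment): repeat the rows, then scale the labor column.
reps = -(-n // len(df))  # ceiling division
tiled = pd.concat([df] * reps, ignore_index=True).iloc[:n].copy()
tiled["DAY_COUNTER"] = range(1, n + 1)
tiled["DAILY_LABOR_PERCENT"] = tiled["DAILY_LABOR_PERCENT"] / factor

# Option 2 (assumption, not from the thread): stretch the curve instead of repeating it.
# Both day ranges are mapped onto [0, 1] so that day 14 lines up with day 100.
old_pos = np.linspace(0.0, 1.0, len(df))
new_pos = np.linspace(0.0, 1.0, n)
stretched = pd.DataFrame({
    "DAY_COUNTER": range(1, n + 1),
    "DAILY_LABOR_PERCENT": np.interp(
        new_pos, old_pos, df["DAILY_LABOR_PERCENT"].to_numpy()
    ) / factor,
})

print(tiled.tail(3))
print(stretched.tail(3))
```

With Option 1 the scaled values still cycle every 14 rows, so day 100 carries the scaled value of original day 2. With Option 2 day 1 comes out as 0.14 and day 100 as 0.0, which lines up with the first and last rows of the "need it to be" table in the question. Neither option is guaranteed to sum to exactly 100, so renormalize afterwards if that matters.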
