Goal: Here is a sample of a dataset that has "ID", "PHASENAME", "CDAYS", "MULTI_FACTOR", "DAY_COUNTER", and "DAILY_LABOR_PERCENT". The goal is to take every "ID", "PHASENAME", "CDAYS" and increase the "DAY_COUNTER" from whatever the last day is increase it to 100 days. I am also to repeat this for "DAILY_LABOR_PERCENT".
currently
ID | PHASENAME | C_DAYS | Multi_Factor | DAY_COUNTER | DAILY_LABOR_% |
---|---|---|---|---|---|
BAR | C | 14.0 | 7.142857 | 1 | 1.0 |
BAR | C | 14.0 | 7.142857 | 2 | 5.0 |
BAR | C | 14.0 | 7.142857 | 13 | 4.0 |
BAR | C | 14.0 | 7.142857 | 14 | 0.0 |
need it to be
ID | PHASENAME | C_DAYS | Multi_Factor | DAY_COUNTER | DAILY_LABOR_% |
---|---|---|---|---|---|
BAR | C | 14.0 | 7.142857 | 1 | .14 |
BAR | C | 14.0 | 7.142857 | 2 | some number |
BAR | C | 14.0 | 7.142857 | 99 | some number |
BAR | C | 14.0 | 7.142857 | 100 | 0.0 |
In this sample dataset, this person worked for 14 days and their labor percentage is recorded for each day. Instead of have 14 days of labor, I want to have 100 days of labor ( formulas: 'DAY_COUNTER' * 'Multi_Factor' and "DAILY_LABOR_PERCENT" / 'Multi_Factor ).
Also, I'm worried that once I enlarge this dataset having the range as 100 won't work.
Looking for
- The Labor is repeating. So when I apply my logic I am not getting the true labor percentage.
- Suggestions on how to approach this with a larger dataset.
code
import pandas as pd
import numpy as np
data={
"ID": [ "BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR"],
"PHASENAME": [ "C","C","C","C","C","C","C","C","C","C","C","C","C","C"],
"C_DAYS": [ 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0],
"Multi_Factor": [7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857],
"DAY_COUNTER": [1,2,3,4,5,6,7,8,9,10,11,12,13,14],
"DAILY_LABOR_PERCENT": [1.0,5.0,9.0,11.0,10.0,9.0,9.0,9.0,8.0,10.0,8.0,7.0,4.0,0.0],
}
df=pd.DataFrame(data)
df1=df.copy()
n = 100
first_day = df["DAY_COUNTER"].iloc[0]
rep = np.tile(df.values, (n // len(df) + 1, 1))
out = pd.DataFrame(rep, columns=df.columns).iloc[:n]
out["DAY_COUNTER"] = range(first_day, n + first_day)
def labor_factor(row):
return row['DAILY_LABOR_PERCENT'] / row['Multi_Factor']
#creating empty list to store rows for each id, phasename, and cdays
new_rows= []
#iterating through each index range and applying function to get day_counter to 100
for i in range(len(out)):
out['new_labor']= out.apply(labor_factor,axis=1)
row
is not a row, it's the stringBAR
..apply()
isn't passing the entire row, it's just passing one value.Pandas.DataFrame.apply
andPandas.Series.apply
work..iloc
returns a series.