reinforcement learning

Asked 1 year, 11 months ago

Modified 1 year, 11 months ago

Viewed 50 times

How can we be sure that the confounding (control) variables don't pick up the effect that our decisions (the values we chose for the decision variables) had on the outcome?

Since the term “control variable” means different things in regression and in control theory, I want to make clear that I use “control variable” for a variable meant to pick up confounding effects that we don't want absorbed into the decision variables. Decision variables are the variables that we can control in our optimization scenario.

Consider a resource allocation problem in which we want to allocate a portion of the total resource B to two investment options every day over a finite horizon H.

Consider this reward function: $Y_{t} = \text{intercept} + c_{1} x_{1t} + c_{2} x_{2t} + c_{3} \, \text{trend}_{t}$

where $Y_{t}$ is sales at timestep $t$, $x_{1t}$ is the budget for investment option 1 at timestep $t$, and $x_{2t}$ is the budget for investment option 2 at timestep $t$. $\text{trend}_{t}$ is the trend at timestep $t$, fitted to the dataset's historical sales data using Fourier decomposition. The trend variable is meant to capture the underlying market trend, which we cannot control.

Consider the following algorithm:

1. Fit our reward function to the historical data by performing OLS.
2. Maximize the reward function over the whole horizon subject to the constraints.
3. Set the allocation for the current timestep.
4. Observe the reward for the current timestep.
5. Add the allocation and reward to our dataset.
6. Repeat from step 1.
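To make the loop above concrete, here is a minimal sketch of it in Python with NumPy. Everything numeric in it is an assumption for illustration only: the "true" coefficients, the single-sinusoid `trend` standing in for the Fourier-fitted trend, the simulated `observe` function, and the per-day constraint $x_{1t} + x_{2t} \le B$, $x \ge 0$. Because the reward is linear, maximizing it under that assumed constraint reduces to putting the whole budget on the option with the larger fitted coefficient, so the hypothetical `plan` helper is a closed-form shortcut, not a general optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
B, H, T_HIST = 100.0, 30, 120                 # daily budget, horizon, history length (made up)
TRUE_COEF = np.array([50.0, 0.8, 0.5, 1.0])   # hypothetical intercept, c1, c2, c3


def trend(t):
    # Stand-in for the Fourier-fitted trend: one sinusoid with a 30-day period.
    return 10.0 * np.sin(2.0 * np.pi * t / 30.0)


def observe(x1, x2, t):
    # Simulated environment; in practice this would be the real sales figure.
    features = np.array([1.0, x1, x2, trend(t)])
    return float(TRUE_COEF @ features + rng.normal(0.0, 1.0))


def fit_ols(rows, ys):
    # Step 1: ordinary least squares on [1, x1, x2, trend].
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(ys), rcond=None)
    return coef


def plan(coef):
    # Step 2: linear reward + assumed constraint x1 + x2 <= B, x >= 0
    # means the maximizer is a corner, identical for every day of the horizon.
    c1, c2 = coef[1], coef[2]
    if max(c1, c2) <= 0.0:
        return 0.0, 0.0
    return (B, 0.0) if c1 >= c2 else (0.0, B)


def run_loop():
    rows, ys = [], []
    # Historical data with randomly chosen allocations.
    for t in range(T_HIST):
        x1 = rng.uniform(0.0, B)
        x2 = rng.uniform(0.0, B - x1)
        rows.append([1.0, x1, x2, trend(t)])
        ys.append(observe(x1, x2, t))
    allocations = []
    for t in range(T_HIST, T_HIST + H):
        coef = fit_ols(rows, ys)               # step 1
        x1, x2 = plan(coef)                    # steps 2 and 3
        y = observe(x1, x2, t)                 # step 4
        rows.append([1.0, x1, x2, trend(t)])   # step 5
        ys.append(y)
        allocations.append((x1, x2))
    return coef, allocations, ys


coef, allocations, sales = run_loop()
print("last fitted coefficients:", coef)
```

Note that in this sketch the trend is a known function of time, so it cannot drift in response to the allocations; my question is about the case where the trend is re-estimated from sales that the allocations themselves have changed.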

Consider a horizon of 30, meaning that we optimize over one month.

Once I start optimizing, the actual trend might change; hopefully sales will start increasing. How can I be sure that this change is reflected in the decision-variable coefficients and not in the trend variable? Two interesting scenarios are (a) leaving $c_{1}$ as it is, a time-invariant coefficient, and (b) letting it vary, for example fitting it to capture day-of-month effects on the decision variables, so that the algorithm also optimizes with respect to day of month.

I am wondering whether this poses a problem or whether I'm missing something. It seems like it should be a fairly common problem in stochastic programming, reinforcement learning, and model predictive control: basically any sequential decision problem with a predefined model that contains both decision variables and variables meant to control for confounding effects.

To check whether this poses a problem, I assume we could look at the variance inflation factor and the correlation between the trend variable and the decision variables?
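As a sketch of the check I have in mind, the variance inflation factor of regressor $j$ is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing column $j$ on the other regressors. The helper `vif` below is my own NumPy implementation of that definition, and the data are synthetic: `x2` is deliberately constructed to follow the trend, while `x1` is independent of it.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


rng = np.random.default_rng(1)
t = np.arange(300)
trend = 10.0 * np.sin(2.0 * np.pi * t / 30.0)        # stand-in for the fitted trend
x1 = rng.uniform(0.0, 100.0, size=t.size)            # allocation unrelated to the trend
x2 = 50.0 + 3.0 * trend + rng.normal(0.0, 5.0, size=t.size)  # allocation that tracks the trend

vifs = vif(np.column_stack([x1, x2, trend]))
corr_x2_trend = np.corrcoef(x2, trend)[0, 1]
print("VIF (x1, x2, trend):", vifs)
print("corr(x2, trend):", corr_x2_trend)
```

A VIF near 1 for `x1` and a large one for `x2` and `trend` would indicate that OLS cannot cleanly separate the effect of the second allocation from the trend in this synthetic setup.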

But what is the actual solution to the problem?

Maybe fit the trend component to the historical data only, and then, when we roll out the algorithm, purely extrapolate the trend we saw before the algorithm started?
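A minimal sketch of that idea, assuming the trend is a harmonic (Fourier-series) regression with a known period: the trend coefficients are estimated once on pre-rollout sales and then only evaluated at future timesteps, never refit. The function names, the 30-day period, the number of harmonics, and the synthetic history are all assumptions for illustration.

```python
import numpy as np


def fourier_design(t, period, n_harmonics):
    # Columns: constant, then sin/cos pairs for each harmonic.
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k * t / period
        cols += [np.sin(w), np.cos(w)]
    return np.column_stack(cols)


def fit_trend(t_hist, y_hist, period=30.0, n_harmonics=2):
    # Estimated once, on pre-rollout data only.
    A = fourier_design(t_hist, period, n_harmonics)
    coef, *_ = np.linalg.lstsq(A, np.asarray(y_hist, dtype=float), rcond=None)
    return coef


def extrapolate_trend(coef, t_future, period=30.0, n_harmonics=2):
    # Frozen coefficients: rollout observations never enter the trend.
    return fourier_design(t_future, period, n_harmonics) @ coef


rng = np.random.default_rng(2)
t_hist = np.arange(120)
y_hist = 5.0 + 10.0 * np.sin(2.0 * np.pi * t_hist / 30.0) + rng.normal(0.0, 0.5, size=120)

trend_coef = fit_trend(t_hist, y_hist)
t_future = np.arange(120, 150)
frozen_trend = extrapolate_trend(trend_coef, t_future)
print(frozen_trend[:5])
```

During the rollout, `frozen_trend` would be used as the $\text{trend}_{t}$ regressor, so any change in sales caused by the new allocations cannot be absorbed by a re-estimated trend. Whether the extrapolation stays a good description of the market over the horizon is a separate question.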

edited Aug 8, 2022 at 8:21

asked Aug 8, 2022 at 7:53

stewardbranson


Control variables and confounding effects in stochastic programming/model predictive control/reinforcement learning
