Least-bad stepwise procedure for a simulation that shows issues with stepwise regression

Asked 1 month ago

Modified 30 days ago

Viewed 57 times

I am well aware of the issues that stepwise regression causes. I want to demonstrate some of them via simulation in a particular situation.

I am thinking of a regression where I have some categorical variable of interest and then some covariates. I am operating under the assumption that some combination of these covariates matters, and that the stepwise selection will select a good combination of them (probably not, but that failure is what I want to show in my simulation).

However, if I just run a standard stepwise regression such as `MASS::stepAIC`, I run the risk that the categorical variable of interest is dropped from the model.

Ideally, I would run the stepwise selection on just the covariates and then compute the t-stat for the categorical variable of interest in the final model (after the stepwise elimination or inclusion), as if I had chosen that model from the beginning. However, there is no t-stat to compute if that variable has been excluded entirely.

What would be the remedy? Sure, I can code my own stepwise selection algorithm that does not consider a particular variable, but I am not even totally sure what I would do if the "correct" stepwise elimination step is to remove that main variable of interest. Is that the end of my backward elimination?

Citing a simulation study (or mathematical derivation) showing what happens to the test statistic distributions would make for an interesting answer.
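
To make the target of such a simulation concrete, here is a rough sketch of one possible setup, not a definitive design. It assumes a binary variable of interest `g`, `k` pure-noise covariates, and a response unrelated to all of them, so every rejection of the null for `g` is a false positive. The function name `one_run` and all settings (`n`, `k`, 200 replicates) are hypothetical choices. Base R's `step()` is used with `g` protected through `scope$lower`.

```r
set.seed(3)

# One replicate under the global null: neither g nor any covariate affects y.
one_run <- function(n = 50, k = 10) {
  d <- as.data.frame(matrix(rnorm(n * k), n, k))  # noise covariates V1..Vk
  d$g <- rbinom(n, 1, 0.5)                        # binary variable of interest
  d$y <- rnorm(n)                                 # response, independent of everything
  full <- lm(y ~ ., data = d)
  # Backward elimination; `lower = ~ g` keeps g in every model visited.
  sel <- step(full, scope = list(lower = ~ g),
              direction = "backward", trace = 0)
  # Naive p-value for g, computed as if `sel` had been specified in advance.
  summary(sel)$coefficients["g", "Pr(>|t|)"]
}

p <- replicate(200, one_run())
mean(p < 0.05)  # empirical rejection rate for g at the nominal 5% level
```

Comparing `mean(p < 0.05)` with 0.05, or plotting `hist(p)` against a uniform distribution, would show how far the naive test statistic drifts from its nominal behaviour under whatever settings are chosen.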

edited Jun 20 at 21:14

asked Jun 13 at 5:19

Dave

1

It seems like this might be a request for code, which would be off topic. In SAS, you can use the INCLUDE= option on the MODEL statement in PROC GLMSELECT. There is probably a way to do this in R, but I don't know what it is.
– Peter Flom
Commented Jun 13 at 11:32
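
A minimal sketch of one way this might be done in R: `step()` in base R and `MASS::stepAIC` both accept a `scope` list, and the `lower` formula in that list is the smallest model the search may visit, so a term named there cannot be dropped. The simulated data and the variable names (`g`, `x1` to `x3`, `y`) are invented for illustration.

```r
set.seed(1)
n <- 200
d <- data.frame(
  g  = factor(sample(c("a", "b", "c"), n, replace = TRUE)),  # variable of interest
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n)                # candidate covariates
)
d$y <- 0.5 * d$x1 + rnorm(n)  # in this toy example g has no true effect

full <- lm(y ~ g + x1 + x2 + x3, data = d)

# `lower` protects g; only x1, x2, x3 are candidates for removal.
sel <- step(full,
            scope = list(lower = ~ g, upper = ~ g + x1 + x2 + x3),
            direction = "backward", trace = 0)

formula(sel)
# Test of g in the selected model (an F test, because g has three levels).
drop1(sel, scope = ~ g, test = "F")
```

Swapping `step` for `MASS::stepAIC` with the same `scope` argument should behave the same way for a linear model like this one.
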
2

This has been studied enough, for example onlinelibrary.wiley.com/doi/abs/10.1002/sim.4780100504 - we don’t really need more demonstrations of what a disaster stepwise regression is.
– Frank Harrell
Commented Jun 13 at 12:29
1

The simplest workaround perhaps is to replace all variables--explanatory and response--by their residuals in a regression against the variable of interest. Run the model selection process on the residuals in place of the original variables. This is mathematically identical to forcing the variable of interest to be included in all the models visited in the stepwise algorithm.
– whuber ♦
Commented Jun 20 at 21:33
1

@whuber That's totally straightforward to implement! Why is that equivalent, though? $//$ You mean something like `L1 <- lm(y ~ x1 + x2 + x3)` and then `L2 <- lm(resid(L1) ~ x1 + x2 + x3)`, right?
– Dave
Commented Jun 20 at 21:35
1

Replace the $x_i$ with their residuals, too. This is explained at stats.stackexchange.com/a/46508/919 which includes R code to do the job (the take.out function). The idea merely generalizes the familiar concept of centering all variables for multiple regression to avoid including an explicit constant term. A mathematical demonstration is given at stats.stackexchange.com/a/113207/919.
– whuber ♦
Commented Jun 20 at 21:40
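
The residualization idea can be checked numerically with a small sketch. This is not the `take.out` function from the linked answer. It is an independent illustration on made-up data, with hypothetical names `g`, `x1`, `x2`, `y`. It regresses the response and each covariate on the variable of interest, then compares the fit on the residuals with the fit that includes `g` directly.

```r
set.seed(2)
n  <- 100
g  <- factor(sample(c("a", "b"), n, replace = TRUE))  # variable of interest
x1 <- rnorm(n) + (g == "b")                           # covariate correlated with g
x2 <- rnorm(n)
y  <- 1 + 0.5 * (g == "b") + 0.8 * x1 + rnorm(n)

# Residualize the response AND every covariate on g (intercept included).
ry  <- resid(lm(y  ~ g))
rx1 <- resid(lm(x1 ~ g))
rx2 <- resid(lm(x2 ~ g))

fit_full  <- lm(y ~ g + x1 + x2)     # g forced into the model
fit_resid <- lm(ry ~ rx1 + rx2 - 1)  # g and the intercept already partialled out

# The covariate slopes agree ...
all.equal(unname(coef(fit_full)[c("x1", "x2")]), unname(coef(fit_resid)))
# ... and so do the residual sums of squares.
all.equal(deviance(fit_full), deviance(fit_resid))
```

One caveat: `fit_resid` does not know that the intercept and the `g` coefficient were already estimated, so its residual degrees of freedom are `n - 2` rather than `n - 4`. Standard errors and p-values taken from it would need that correction, even though the coefficients and residual sum of squares match.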

Browse other questions tagged
regression
hypothesis-testing
feature-selection
simulation
stepwise-regression
or ask your own question.
