$\begingroup$

So I have a question about multiple linear regression.

$$Y^j=\beta_1 X_1^j+\beta_2 X_2^j +\cdots +\beta_p X_p^j + \epsilon \tag{*}$$

When I test the significance of the $\beta_i$ with a Student ($t$) or Fisher ($F$) statistic, using for instance Excel or R, some of the $\beta_i$ come out non-significant; that is, according to the test, some of the betas are equal to zero and some are not.

My question is simple: do I have to remove from model (*) the terms whose betas are equal to zero, giving a new model without them (or is it still the same model)?

Thanks in advance; I hope everyone understands me!

$\endgroup$
  • $\begingroup$ Yes, you omit the variables where the betas can be considered as 0. Then you can think about what you do with the omitted variables. You also should repeat the F-test for the whole model. Did the model get worse after reducing the variables? $\endgroup$ Commented Mar 22, 2020 at 20:53
  • $\begingroup$ In general you should also do a reality check. Is it plausible to omit a variable "height" when you want to predict weight? $\endgroup$ Commented Mar 22, 2020 at 20:59
  • $\begingroup$ Thanks for the answer. I've just read [link] math.stackexchange.com/questions/2269791/… and one of the answers says that you shouldn't omit the variables whose betas can be considered as 0. $\endgroup$ Commented Mar 22, 2020 at 21:00
  • $\begingroup$ I haven't found what you mentioned. But sure, as a first step you omit the variable where the t-test says that $\beta_i=0$, but only if it makes sense somehow; see my second comment. $\endgroup$ Commented Mar 22, 2020 at 21:06
  • $\begingroup$ the first answer by Yujie Zha $\endgroup$ Commented Mar 22, 2020 at 21:09

1 Answer

$\begingroup$

That is a model selection question and in general there are many approaches. Asking on https://stats.stackexchange.com/ will probably give you multiple detailed answers.

When you do a t-test or an F-test for a single variable, you test whether its coefficient is zero "in the presence of all the other variables". Dropping all the insignificant ones at once changes the model, and there is no clear-cut answer. You can do an F-test for the significance of the resulting model after the reduction (i.e., test whether all remaining non-intercept coefficients are zero), and keep the model if you reject that null; failing to reject it would mean the reduced model explains essentially nothing.
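
One way to check whether the fit got worse after dropping several variables at once is a partial F-test that compares the full model (*) with the reduced one. As a sketch, with notation introduced here rather than taken from the question: let $n$ be the number of observations, $q$ the number of dropped variables, and $\mathrm{RSS}_{\text{full}}$ and $\mathrm{RSS}_{\text{red}}$ the residual sums of squares of the two least-squares fits. Then

$$F=\frac{(\mathrm{RSS}_{\text{red}}-\mathrm{RSS}_{\text{full}})/q}{\mathrm{RSS}_{\text{full}}/(n-p)}$$

Under the null hypothesis that all $q$ dropped coefficients are zero, and with the usual assumption of independent normal errors, $F$ follows an $F_{q,\,n-p}$ distribution. A small value of $F$ means the reduced model fits about as well as the full one. The denominator uses $n-p$ because (*) has no intercept; with an intercept it would be $n-p-1$.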

There are many other (and better) approaches to model selection, i.e., to picking which covariates to include in the model. For example, you can use all-subset regression (an $\ell_0$ penalty) together with any of these criteria: AIC, BIC, Mallows's $C_p$, adjusted $R^2$, or the prediction error sum of squares (PRESS). You can also use the Lasso or concave penalties (like the MCP) to avoid going over all subsets.
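
To make the all-subset idea concrete, here is a minimal sketch in plain Python (standard library only). It assumes the no-intercept model (*) and scores each subset with the Gaussian form of AIC or BIC up to an additive constant, $n\log(\mathrm{RSS}/n)$ plus $2k$ or $k\log n$. The helper names `rss` and `best_subset` and the simulated data are illustrative assumptions, not part of the question.

```python
import itertools
import math
import random


def rss(X, y, cols):
    """Residual sum of squares of the no-intercept least-squares fit of y
    on the columns `cols` of X, computed by Gram-Schmidt projection."""
    resid = list(y)
    basis = []  # orthonormal basis of the span of the chosen columns
    for c in cols:
        v = [row[c] for row in X]
        for q in basis:
            d = sum(a * b for a, b in zip(v, q))
            v = [a - d * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))  # zero if columns are collinear
        u = [a / norm for a in v]
        basis.append(u)
        d = sum(a * b for a, b in zip(resid, u))
        resid = [a - d * b for a, b in zip(resid, u)]
    return sum(a * a for a in resid)


def best_subset(X, y, criterion="bic"):
    """Try every subset of columns and return (columns, score) with the
    smallest criterion value."""
    n, p = len(X), len(X[0])
    per_term = math.log(n) if criterion == "bic" else 2.0  # else: AIC
    best_score, best_cols = None, None
    for k in range(p + 1):
        for cols in itertools.combinations(range(p), k):
            score = n * math.log(rss(X, y, cols) / n) + per_term * k
            if best_score is None or score < best_score:
                best_score, best_cols = score, cols
    return best_cols, best_score


# Simulated data (an assumption for illustration): only columns 0 and 1 matter.
random.seed(0)
n = 200
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0, 1) for row in X]

chosen, score = best_subset(X, y, criterion="bic")
print("columns selected by BIC:", chosen)
```

The search visits all $2^p$ subsets, so it is only practical for a small number of covariates; that cost is what the Lasso and related penalties avoid.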

You can also use stepwise (forward and/or backward) regression, which basically applies the t- or F-test sequentially to include the most significant variable one at a time (or to eliminate the least significant one at each step).
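
Backward elimination can be sketched as follows, again in plain Python and for the no-intercept model (*). At each step it computes, for every remaining variable, the partial F statistic for dropping that one variable (which equals its squared t statistic) and removes the weakest variable while that statistic is below a cutoff. The cutoff `f_threshold=4.0` is an assumption chosen for illustration (it corresponds to $|t|<2$, roughly the 5% level when $n$ is large), and the names `rss` and `backward_eliminate` and the simulated data are hypothetical.

```python
import math
import random


def rss(X, y, cols):
    """Residual sum of squares of the no-intercept least-squares fit of y
    on the columns `cols` of X, computed by Gram-Schmidt projection."""
    resid = list(y)
    basis = []
    for c in cols:
        v = [row[c] for row in X]
        for q in basis:
            d = sum(a * b for a, b in zip(v, q))
            v = [a - d * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        u = [a / norm for a in v]
        basis.append(u)
        d = sum(a * b for a, b in zip(resid, u))
        resid = [a - d * b for a, b in zip(resid, u)]
    return sum(a * a for a in resid)


def backward_eliminate(X, y, f_threshold=4.0):
    """Drop the least significant variable, one at a time, until every
    remaining variable has a partial F statistic of at least f_threshold."""
    n = len(X)
    cols = list(range(len(X[0])))
    while cols:
        full = rss(X, y, cols)
        df = n - len(cols)  # residual degrees of freedom of the current model
        f_stats = {
            c: (rss(X, y, [k for k in cols if k != c]) - full) / (full / df)
            for c in cols
        }
        weakest = min(f_stats, key=f_stats.get)
        if f_stats[weakest] >= f_threshold:
            break  # everything left is significant at the chosen cutoff
        cols.remove(weakest)
    return cols


# Simulated data (an assumption for illustration): only columns 0 and 1 matter.
random.seed(1)
n = 200
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0, 1) for row in X]

kept = backward_eliminate(X, y)
print("columns kept:", kept)
```

Note that the statistics are recomputed after every removal, because each test is conditional on the variables still in the model.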

These are not the only approaches!

$\endgroup$
