$\begingroup$

I'm trying to capture the intuition behind a book's statements about fixed regressors in regression models. The book says that in general we wish to work with the random sampling assumption (where the population model is specified and an iid random sample can be drawn from the population). Then, any explanatory variables are treated as random outcomes, along with the data on response variables. Fixed regressors cannot be identically distributed across observations, and so the random sampling assumption technically excludes the classical linear model.

Why can't fixed regressors be identically distributed across observations, and why does the random sampling assumption technically exclude the classical linear model?

$\endgroup$
  • $\begingroup$ What book are you reading? Can you provide page numbers or a screenshot of the quote? $\endgroup$
    – dimitriy
    Commented Dec 31, 2022 at 18:02
  • $\begingroup$ The book is Wooldridge, Econometric Analysis of Cross Section and Panel Data, p. 5. $\endgroup$
    – Maximilian
    Commented Dec 31, 2022 at 18:18
  • $\begingroup$ Seems like a pretty narrow definition of the "classical linear model" to me. $\endgroup$ Commented Dec 31, 2022 at 18:41

1 Answer

$\begingroup$

The intuition is that if regressors are fixed, they are not random, so it makes no sense to call them either identically or non-identically distributed.

Take a sample from a one-variable population model, $y_i = \beta_0 + \beta_1 x_i + u_i$ for $i=1,\ldots, n$.

Compare the following two scenarios:

  1. $y_i$ is crop output from a plot of land, where you apply different amounts of fertilizer $x_i$ (an experiment).

  2. $y_i$ is wages and $x_i$ is years of education (observational data).

We derive the statistical properties of the OLS estimators conditional on the values of the $x_i$ in our sample. In statistical derivations, conditioning on the sample values of the independent variable is the same as treating the $x_i$ as if they were fixed in repeated samples (FIRS). FIRS means it is possible to redraw the sample with the same values of the independent variable. You can conceptualize that as first choosing $n$ sample values $x_1, x_2, \ldots , x_n$. These could be as simple as zero fertilizer versus some fertilizer (binary $x$), or something more complicated where you are attempting to trace out the curve over some range of $x$. Given these values, we obtain a sample on $y$ (effectively by getting a random sample of the $u_i$). Next, another sample of $y$ is obtained, using the same values for $x_1, x_2, \ldots , x_n$. Then another sample, and so on. This is like sampling plots with different levels of fertilizer, which you have selected as the experimenter (a different way of saying you fixed them). Here $u_i$ captures land quality, farmer ability, and other unobserved factors like pests and vermin, which are the sources of variation in $y$.
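The repeated-sampling thought experiment above can be sketched as a small simulation. This is only an illustration under assumed values: the fertilizer doses in `x_fixed`, the parameters `beta0`, `beta1`, `sigma`, and the helper names `ols_slope` and `firs_slopes` are all hypothetical, and the errors are assumed to be normal and independent of the design.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope from a simple regression of y on x with an intercept."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def firs_slopes(x_fixed, beta0, beta1, sigma, n_reps, seed=0):
    """Fixed in repeated samples: only u_i is redrawn, x_fixed is reused."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_reps):
        # Same design every replication; a fresh draw of the errors u_i.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x_fixed]
        slopes.append(ols_slope(x_fixed, y))
    return slopes


# Hypothetical design: doses chosen in advance by the experimenter.
x_fixed = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
slopes = firs_slopes(x_fixed, beta0=1.0, beta1=2.0, sigma=1.0, n_reps=5000)

# The average estimate should be close to beta1, and the spread close to
# sigma^2 / sum((x_i - x_bar)^2), which is 1/10 for this design.
print(statistics.fmean(slopes), statistics.pvariance(slopes))
```

Because `x_fixed` never changes, the only randomness in each replication comes from the errors, which is exactly what "fixed" means here.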

The fixed-in-repeated-samples scenario is not very realistic in nonexperimental contexts. For instance, in sampling individuals to estimate the Mincer earnings regression of wages on education, it makes little sense to think of choosing the values of education ahead of time and then sampling individuals with those particular levels of education. Random sampling, where individuals are drawn from the population and have their wages and education recorded, is more representative of how most data sets are obtained for empirical analysis in economics. This is why we often relax (or weaken) the FIRS assumption and worry about things like endogeneity.
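For contrast, here is a minimal sketch of the random sampling scheme, in which each replication draws new individuals and therefore new regressor values. Everything in it is an assumption made for illustration: education is drawn uniformly from 8 to 20 years, the error is normal and drawn independently of education (so there is no endogeneity), and the names `random_sampling_slopes` and `rs_slopes` are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope from a simple regression of y on x with an intercept."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def random_sampling_slopes(n, beta0, beta1, sigma, n_reps, seed=0):
    """Random sampling: both x_i and u_i are redrawn in every replication."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_reps):
        # Assumed population: years of education uniform on 8..20,
        # with the error drawn independently of education.
        x = [float(rng.randint(8, 20)) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


rs_slopes = random_sampling_slopes(
    n=50, beta0=1.0, beta1=0.1, sigma=1.0, n_reps=2000
)

# With x independent of u, the estimates should still center on beta1,
# even though the regressor values differ from sample to sample.
print(statistics.fmean(rs_slopes))
```

If the error were instead correlated with education, the same code would no longer center on `beta1`, which is the endogeneity concern mentioned above.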

$\endgroup$
  • $\begingroup$ Thank you very much for your detailed answer. Are you saying that fixing regressors means that the researcher sets the regressor values a priori, so they are deterministic, which means that every new sample will have the same X matrix? What is still a little unclear is this: if regressors are deterministic, does it make sense to talk about a distribution for the regressors? My intuition says that a sample with a deterministic, varying regressor is no longer an identically distributed sample. What am I still missing? $\endgroup$
    – Maximilian
    Commented Dec 31, 2022 at 23:36
  • $\begingroup$ Setting the values makes sense in the context of experiments. But you could also get fixed regressors by, say, sampling all 50 states in the US or all brands and vintages of champagne. In either case, it makes no sense to talk about a distribution since there's no randomness. $\endgroup$
    – dimitriy
    Commented Jan 1, 2023 at 0:00
  • $\begingroup$ Thanks again. One last point. With fixed regressors, $E[Y_i \mid x_i] = E[Y_i] = \beta x_i$, and so each $Y_i$ has its own distribution, because the $x_i$ vary. In the case of random regressors, this won't happen because the $x_i$ are sampled from the same distribution, right? $\endgroup$
    – Maximilian
    Commented Jan 1, 2023 at 0:44
  • $\begingroup$ I think that is incorrect. In the FIRS case, $y$ has a distribution because $u$ is a random variable and $y_i = \beta_0 + \beta_1 \cdot x_i + u_i$. And in the non-FIRS case, $\{ y_i,x_i,u_i \}_{i=1}^N$ are sampled from a joint distribution. $\endgroup$
    – dimitriy
    Commented Jan 1, 2023 at 1:06
