$\begingroup$

The data come from a survey with a complex sample design, built to provide estimates at the national level. Because of stratification and clustering, individuals from one state were more likely to be sampled than others: certain groups were intentionally oversampled for better representation.

The final dataset looks as follows:

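| ID | PSU (Primary Sampling Unit) | Stratum | Sampling Weight | Age | Ethnicity | Income |
|---|---|---|---|---|---|---|
| 001 | 102 | A | 1.5 | 35 | Hispanic | 65000 |
| 002 | 203 | B | 2.0 | 45 | Caucasian | 40000 |
| 003 | 102 | A | 1.5 | 28 | Asian | 90000 |
| ... | ... | ... | ... | ... | ... | ... |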

Since I know the location of each individual, I want to add a state column and regress Income on state and other variables at the individual level, to estimate the influence of location on salaries.

Naturally, the regression would use the sampling weights, through a package for the analysis of complex samples such as the survey package in R.
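To make concrete what "using the weights" does to the point estimates, here is a minimal Python sketch (standard library only) that solves the weighted normal equations $(X^\top W X)\hat\beta = X^\top W y$ for a regression of Income on a state indicator and Age. The state labels `S1` and `S2`, the last three data rows, and the function names are invented for illustration; only the first three rows reuse values from the table. The sketch does not compute standard errors. Design-based standard errors also need the PSU and stratum columns, which is the part that `svydesign` and `svyglm` from the survey package take care of in R.

```python
# Hypothetical records: (sampling weight, age, state, income).
# The first three rows reuse values from the table; the state labels and the
# last three rows are invented. PSU and stratum are left out because they
# affect the variance estimate, not the weighted point estimates.
rows = [
    (1.5, 35, "S1", 65000.0),
    (2.0, 45, "S2", 40000.0),
    (1.5, 28, "S1", 90000.0),
    (2.0, 52, "S2", 48000.0),
    (3.0, 41, "S1", 72000.0),
    (1.0, 30, "S2", 43000.0),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def weighted_ls(X, y, w):
    """Return the coefficients solving (X'WX) beta = X'Wy, with W = diag(w)."""
    k = len(X[0])
    xtwx = [
        [sum(wi * xi[a] * xi[b] for xi, wi in zip(X, w)) for b in range(k)]
        for a in range(k)
    ]
    xtwy = [sum(wi * xi[a] * yi for xi, yi, wi in zip(X, y, w)) for a in range(k)]
    return solve(xtwx, xtwy)


# Design matrix columns: intercept, indicator for state S2, age.
X = [[1.0, 1.0 if state == "S2" else 0.0, float(age)] for _, age, state, _ in rows]
y = [income for *_, income in rows]
w = [weight for weight, *_ in rows]

beta = weighted_ls(X, y, w)
print("intercept, state S2 effect, age slope:", beta)
```

With only an intercept and the state indicator in the model, these weighted coefficients reduce to weighted state means of Income, which is one way to see that the weights change what the coefficients estimate rather than merely adjusting their precision.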

What are the implications of adding state as a regressor, given the sampling design?

$\endgroup$
  • $\begingroup$ I don't know what is complex about this sample. However, if you recorded each individual's state and want to include it in a model, you can do that. What is worrying you about your sampling design? $\endgroup$ Commented Dec 6, 2023 at 12:16
  • $\begingroup$ Why were some individuals more likely to be sampled than others? How exactly did you sample? What was your design? In any case, if this is true then you almost surely must include state in your analysis to account for this design. $\endgroup$ Commented Dec 6, 2023 at 12:27
  • $\begingroup$ My concern is that this is not a random sample. Individuals from one state were more likely to be sampled than others because of the sampling design. Does this violate the OLS random-sampling assumption? Does the survey package correct for that? $\endgroup$
    – Oalvinegro
    Commented Dec 6, 2023 at 12:27
  • $\begingroup$ Individuals from one state are more likely to be sampled due to stratification and clustering: certain groups were intentionally oversampled for better representation. $\endgroup$
    – Oalvinegro
    Commented Dec 6, 2023 at 12:30
  • $\begingroup$ If you have any additional information that can be inferred, including state, add it on. There aren't any negative consequences to this. $\endgroup$
    – Alex J
    Commented Dec 7, 2023 at 22:49
