$\begingroup$

I am looking at e-cig prevalence within a city. I used surveys to collect data from residents, and I have a question about weighting the data.

Because some groups are over- or under-represented in the survey responses, I have assumed that the data should be weighted to match the age distribution of the city population. I took census counts of individuals in each age category (e.g. 18-24, 25-34, etc.), compared the city's age distribution with the distribution in my sample, and derived the target number of responses and the corresponding weight factor for each category. Applying these factors gives weighted data that better represent the city's demographics.
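
To make the weighting step concrete, here is a minimal sketch of the calculation as I understand it: each age category's weight is its population share divided by its sample share. The counts below are made up purely for illustration (they are not my census or survey figures), and the function name `poststratification_weights` is just a label chosen for this example.

```python
# Hypothetical counts for illustration only; not the real census or survey data.
census_counts = {"18-24": 12000, "25-34": 20000, "35-44": 18000, "45+": 50000}
sample_counts = {"18-24": 90, "25-34": 60, "35-44": 30, "45+": 20}


def poststratification_weights(population, sample):
    """Return one weight per age category: population share / sample share."""
    pop_total = sum(population.values())
    n = sum(sample.values())
    weights = {}
    for group, n_g in sample.items():
        pop_share = population[group] / pop_total
        sample_share = n_g / n
        weights[group] = pop_share / sample_share
    return weights


weights = poststratification_weights(census_counts, sample_counts)

# With this normalisation the weighted sample size equals the raw sample size.
weighted_n = sum(weights[g] * sample_counts[g] for g in sample_counts)

for group, w in weights.items():
    print(f"{group}: weight = {w:.3f}")
print(f"weighted n = {weighted_n:.1f}")
```

Over-represented categories get a weight below 1 and under-represented ones a weight above 1. Everyone in the same category shares one weight.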

My question is this, and correct me if I'm wrong: should I use the weighted data only when making comparisons across the generational age categories or for the population as a whole, and use the unweighted data when looking at specific responses within a single generation?

E.g. if I were looking at perception of harm of e-cigs among 18-24 year olds by usage status (current user, previous user, never used), should I use only the unweighted data for these comparisons? But if I were comparing perception of harm among current e-cig users between age groups, would I use the weighted data?

I just want to confirm this, before doing hundreds of statistical tests on different factors. Thank you.

$\endgroup$
  • $\begingroup$ If you're only doing comparisons within a group then everyone in the analysis will have the same weight and so the weight won't matter $\endgroup$ Commented Apr 13 at 4:58
  • $\begingroup$ @ThomasLumley and if I were to then use a statistical test, like chi-squared, to compare relationships between e.g. age and e-cig use, would it be better to use the raw unweighted data or the weighted data? I'm leaning toward raw unweighted, as weights could cause issues in statistical tests, and then using the weighted data only to estimate % use across the broad population. $\endgroup$
    – Aidan
    Commented Apr 13 at 5:14
  • $\begingroup$ If you were using sampling weights correctly it wouldn't matter, but it's a lot easier to just use unweighted analyses within groups $\endgroup$ Commented Apr 13 at 5:32
  • $\begingroup$ @ThomasLumley - okay, thank you. The reason I ask is that when I weighted my data based on the age distribution of the city population, the % values stayed the same but the chi-squared value was higher for the weighted data than for the unweighted data. For example, when I compared e-cig use (current user/previous/never) by age category, the proportions within each age group were identical whether weighted or unweighted, but the overall chi-squared was much higher for the weighted data. $\endgroup$
    – Aidan
    Commented Apr 13 at 5:41
  • $\begingroup$ That's because the chi-squared is not being calculated correctly for the weights, which is why it's easier to do the unweighted analysis. For the weighted analyses you need to use software that understands sampling weights (eg Stata or R's survey package) $\endgroup$ Commented Apr 13 at 5:44
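
The effect described in the comments above can be reproduced with a small sketch. It assumes the weighted chi-squared was obtained by multiplying each age group's counts by its weight and running an ordinary Pearson test on the result, which may or may not be what the asker's software did. The counts and the two weights are hypothetical, and the helpers `pearson_chi2` and `row_proportions` are written just for this illustration.

```python
def pearson_chi2(table):
    """Ordinary Pearson chi-squared statistic for a rows x columns table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def row_proportions(table):
    """Share of each column category within each row (age group)."""
    return [[cell / sum(row) for cell in row] for row in table]


# Hypothetical counts: rows = age groups, columns = current / previous / never.
unweighted = [
    [60, 30, 60],  # younger group, over-represented in the sample
    [5, 10, 35],   # older group, under-represented in the sample
]

# Hypothetical weights (population share / sample share) for the two rows.
row_weights = [0.4, 2.8]

# Naive approach: treat the weights as if they were frequency counts.
naive_weighted = [[w * cell for cell in row]
                  for w, row in zip(row_weights, unweighted)]

chi2_unweighted = pearson_chi2(unweighted)
chi2_naive = pearson_chi2(naive_weighted)

print("within-group proportions (unweighted):", row_proportions(unweighted))
print("within-group proportions (weighted):  ", row_proportions(naive_weighted))
print(f"chi-squared, unweighted:     {chi2_unweighted:.2f}")
print(f"chi-squared, naive weighted: {chi2_naive:.2f}")
```

In this example the within-group percentages are identical in both tables, yet the naive weighted statistic comes out larger. The weights shift the apparent sample size toward the under-represented group without any new respondents being added, so the ordinary formula overstates the information in the data. Survey-aware software corrects the test for this.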
