I am looking at e-cigarette prevalence within a city. I used surveys to collect data from residents, and I have a question about weighting the data.
Because some groups are over- or underrepresented in the survey results, I have assumed that the data should be weighted to match the age distribution of the city population. I took the number of individuals in each age category (e.g. 18-24, 25-34, etc.) from the census, then compared the city's age distribution with the distribution in my sample to get the desired number of responses per category and the corresponding weight factors. I then applied these to get weighted data that better represents the city's demographics.
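To make the weighting step concrete, here is a minimal sketch of how I understand the calculation, assuming each weight is simply the population share divided by the sample share for that age category. The counts, the category list, and the function name `post_stratification_weights` are all made up for illustration; they are not my real census or survey figures.

```python
# Hypothetical counts, for illustration only (not real census or survey figures).
census_counts = {"18-24": 2000, "25-34": 3000, "35-44": 5000}
survey_counts = {"18-24": 50, "25-34": 30, "35-44": 20}


def post_stratification_weights(population, sample):
    """Return one weight per age category: population share / sample share."""
    pop_total = sum(population.values())
    sample_total = sum(sample.values())
    weights = {}
    for group, n_sample in sample.items():
        pop_share = population[group] / pop_total
        sample_share = n_sample / sample_total
        weights[group] = pop_share / sample_share
    return weights


weights = post_stratification_weights(census_counts, survey_counts)

# Overrepresented categories get a weight below 1, underrepresented ones above 1.
for group, w in weights.items():
    print(group, round(w, 3))
```

With weights defined this way, the weighted number of respondents adds up to the original sample size, so only the relative contribution of each age category changes.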
My question is this, and correct me if I'm wrong: should I use the weighted data only when making comparisons across the age categories or for the population as a whole, and use the unweighted data when looking at specific responses within a single age category?
E.g. if I were looking at perception of harm of e-cigs among 18-24-year-olds by usage status (current user, previous user, never used), should I only look at the unweighted data for these comparisons? But if I were comparing perception of harm among current e-cig users across age groups, would I use the weighted data?
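To illustrate the two cases, here is a small sketch with made-up respondents, assuming the weights depend only on age category (as mine do). The respondent list, the weights, and the helper `proportion` are hypothetical and only there to show the mechanics.

```python
# Hypothetical respondents as (age_group, perceives_harm) pairs. Not real survey data.
respondents = (
    [("18-24", True)] * 30 + [("18-24", False)] * 20
    + [("35-44", True)] * 5 + [("35-44", False)] * 15
)

# Hypothetical age-only weights (population share / sample share).
age_weights = {"18-24": 0.4, "35-44": 2.5}


def proportion(rows, weights=None):
    """Share of rows with perceives_harm=True, optionally weighted by age group."""
    total = 0.0
    harm = 0.0
    for age_group, perceives_harm in rows:
        w = 1.0 if weights is None else weights[age_group]
        total += w
        if perceives_harm:
            harm += w
    return harm / total


# Within one age category every respondent carries the same weight.
young = [row for row in respondents if row[0] == "18-24"]
within_unweighted = proportion(young)
within_weighted = proportion(young, age_weights)

# Across the whole sample the weights change each category's contribution.
overall_unweighted = proportion(respondents)
overall_weighted = proportion(respondents, age_weights)

print(within_unweighted, within_weighted)
print(overall_unweighted, overall_weighted)
```

In this sketch the proportion within 18-24 is the same with or without weights, because a constant weight cancels out, while the overall proportion shifts once the age categories are reweighted. This only holds for age-only weights; if the weights also depended on another variable, the within-category figures could differ too.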
I just want to confirm this, before doing hundreds of statistical tests on different factors. Thank you.