Questions tagged [weighted-sampling]
If you have survey data with weights, please use "survey-sampling" instead. If you need to draw Monte Carlo samples from a distribution that is intractable/inconvenient, and have to use a sampler from a simpler distribution that you would then correct with weights, please use "importance-sampling", "monte-carlo" and/or "simulation" instead.
141
questions
0
votes
0
answers
23
views
Weighting Data Issue
I am looking at e-cig prevalence within a city. I used surveys to collect data from residents, and I have a query about weighting the data.
I have made the assumption, due to over- and underrepresentation ...
2
votes
2
answers
100
views
Is R's weighted sample without replacement function misleading?
Background
The 2023 article "Remarks on some misconceptions about unequal probability sampling without replacement" by Tillé suggests the sample function ...
0
votes
0
answers
17
views
Is there any point upsampling a minority class if it is 40% of the dataset? [duplicate]
The minority class of my target variable is 40% of the dataset. Is there any point in upsampling it to 50%, or is upsampling only used when there is severe class imbalance?
1
vote
0
answers
18
views
Is there any statistical advantage to using a deterministic sample size in unequal probability sampling with the Horvitz-Thompson estimator?
Say I'm sampling from a large population of size $N$ without replacement, and denote by $\pi_i$ the probability that unit $i$ is included in the sample, and $\pi_{ij}$ the probability that both $i$ ...
0
votes
0
answers
35
views
Assign weights to examples in a highly imbalanced dataset
I have a highly imbalanced dataset and I'd like to train a simple ANN classifier on it. My model currently is a simple 2-layer feed-forward neural network with ReLU activation in between. After a few ...
0
votes
1
answer
109
views
Stratified SRS vs. probability-proportional-to-size (PPS) sampling - what's the difference?
If my understanding is correct, the key difference is that:
In stratified SRS you intentionally draw $N_h$ samples from each of your $k$ strata ($h = 1, \ldots, k$, $\sum_{h=1}^{k} N_h = N$) and are ...
2
votes
1
answer
125
views
Upper bound for covariance of Horvitz-Thompson Estimators
I need a bound on a covariance quantity that has come up in a sampling problem. $\widehat{Y}$ and $\widehat{T}$ are Horvitz-Thompson estimators of population totals, $Y=\sum_{i=1}^N y_i$ and $T=\sum_{...
0
votes
0
answers
9
views
How to improve sample representativeness for longitudinal data collected via an online platform?
I am working with a longitudinal dataset exploring cognitive ageing (e.g., memory performance over time). Participants complete the study annually. Inclusion criteria for this study are 1) UK resident,...
1
vote
0
answers
23
views
Is there a relationship between propensity score based causal inference and sampling weights?
Consider an observational study with a single outcome $Y$, a single covariate $X$ and a treatment assignment variable $W$. Under the unconfounded treatment assignment assumption, $E_{sp}[Y(1)]=E[\frac{Y_i^{obs}W_i}...
1
vote
1
answer
187
views
Non-parametric bootstrap for 95%CI calculation in stratified sample in R
I am estimating the population mean of the 2023 value of cars from a stratified sample. The value of the cars is right skewed on visual inspection, and some basic diagnostics indicate normality ...
3
votes
0
answers
85
views
What is the (Ratio estimator for the) covariance of two weighted means? [closed]
In a previous question, I asked "How to estimate the (approximate) variance of the weighted mean?", specifically how to prove the following formula:
$$
\widehat{\sigma_{\bar{y}_w}^2} = \frac{1}{(\sum{...
1
vote
0
answers
25
views
Amplification effect of retweets on uncertainty
Suppose you are scoring tweets for tone using some sentiment analysis implementation. Each tweet hypothetically has a 90% chance of being scored correctly, while 10% are scored incorrectly for whatever ...
5
votes
1
answer
252
views
Density of sampled exponential data, with sampling weights proportional to x itself
Suppose $p(x) = \lambda e^{-\lambda x}$. However, our probability of observing a given sample of $x$ (denoted $z$) is further proportional to $x$ itself, i.e., $p(z\mid x) \propto x$. ...
1
vote
1
answer
104
views
Probability of drawing one element before another in weighted sampling without replacement
Setup:
The setup is weighted sampling without replacement. By which I mean:
You have a set of $n$ items, indexed by integers 1 through $n$, and the items have associated weights $\{w_1,\ldots,w_n\}$ ...
1
vote
0
answers
345
views
Logistic regression for case-control studies
Suppose I have designed a study in which participants are sampled from three disease groups of fixed size, A, B and C, with sizes $n_A=50$, $n_B=50$ and $n_C=100$. Group A is a ...