Questions tagged [weighted-sampling]

If you have survey data with weights, please use "survey-sampling" instead. If you need to draw Monte Carlo samples from a distribution that is intractable/inconvenient, and have to use a sampler from a simpler distribution that you would then correct with weights, please use "importance-sampling", "monte-carlo" and/or "simulation" instead.

0 votes
0 answers
23 views

Weighting Data Issue

I am looking at e-cig prevalence within a city. I used surveys to collect data from residents, and I have a query about weighting data. I have made the assumption, due to over- and underrepresentation ...
Aidan's user avatar
  • 1
2 votes
2 answers
100 views

Is R's weighted sample without replacement function misleading?

Background: The 2023 article "Remarks on some misconceptions about unequal probability sampling without replacement" by Tillé suggests the sample function ...
LBogaardt's user avatar
  • 582
0 votes
0 answers
17 views

Is there any point upsampling a minority class if it is 40% of the dataset? [duplicate]

The minority class of my target variable is 40% of the dataset. Is there any point in upsampling it to 50%, or is upsampling only used when there is severe class imbalance?
ibarbo's user avatar
  • 65
1 vote
0 answers
18 views

Is there any statistical advantage to using a deterministic sample size in unequal probability sampling with the Horvitz-Thompson estimator?

Say I'm sampling from a large population of size $N$ without replacement, and denote by $\pi_i$ the probability that unit $i$ is included in the sample, and $\pi_{ij}$ the probability that both $i$ ...
crf's user avatar
  • 309
0 votes
0 answers
35 views

Assign weights to examples in a highly imbalanced dataset

I have a highly imbalanced dataset and I'd like to train a simple ANN classifier on it. My model currently is a simple 2-layer feed-forward neural network with ReLU activation in between. After a few ...
Green绿色's user avatar
0 votes
1 answer
109 views

Stratified SRS vs. probability-proportional-to-size (PPS) sampling - what's the difference?

If my understanding is correct, the key difference is that: In stratified SRS you intentionally draw $N_h$ samples from each of your $k$ strata ($h = 1, \ldots, k$, $\sum_{h=1}^{k} N_h = N$) and are ...
k13's user avatar
  • 47
2 votes
1 answer
125 views

Upper bound for covariance of Horvitz-Thompson Estimators

I need a bound on a covariance quantity that has come up in a sampling problem. $\widehat{Y}$ and $\widehat{T}$ are Horvitz-Thompson estimators of population totals, $Y=\sum_{i=1}^N y_i$ and $T=\sum_{...
Eaman's user avatar
  • 41
0 votes
0 answers
9 views

How to improve sample representativeness for longitudinal data collected via an online platform?

I am working with a longitudinal dataset exploring cognitive ageing (e.g., memory performance over time). Participants complete the study annually. Inclusion criteria for this study are 1) UK resident,...
Aepkr's user avatar
  • 309
1 vote
0 answers
23 views

Is there a relationship between propensity-score-based causal inference and sampling weights?

Consider an observational study with a single outcome $Y$, a single covariate $X$ and a treatment assignment variable $W$. Under the unconfounded treatment assignment assumption, $E_{sp}[Y(1)]=E[\frac{Y_i^{obs}W_i}...
user45765's user avatar
  • 1,445
1 vote
1 answer
187 views

Non-parametric bootstrap for 95%CI calculation in stratified sample in R

I am estimating the population mean of the 2023 value of cars from a stratified sample. The value of the cars is right skewed on visual inspection, and some basic diagnostics indicate normality ...
burnt_pianos's user avatar
3 votes
0 answers
85 views

What is the (Ratio estimator for the) covariance of two weighted means? [closed]

In a previous question, I asked "How to estimate the (approximate) variance of the weighted mean?", specifically, how to prove the following formula: $$ \widehat{\sigma_{\bar{y}_w}^2} = \frac{1}{(\sum{...
Tal Galili's user avatar
  • 21.8k
1 vote
0 answers
25 views

Amplification effect of retweets on uncertainty

Suppose you are scoring tweets for tone based on some sentiment analysis implementation. Hypothetically, each tweet has a 90% chance of being scored correctly, while 10% are scored wrongly for whatever ...
geotheory's user avatar
  • 647
5 votes
1 answer
252 views

Density of sampled exponential data, with sampling weights proportional to x itself

Suppose $p(x) = \lambda e^{-\lambda x}$. However, our probability of observing a given sample of $x$ (denoted $z$) is further proportional to $x$ itself, i.e., $p(z\mid x) \propto x$. ...
jessexknight's user avatar
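As a rough worked sketch of the setup in this excerpt, assume the chance of observing a value is exactly proportional to $x$ (size-biased sampling) and that $x \ge 0$. The observed density is then the original density reweighted by $x$ and renormalised by the mean $E[x] = 1/\lambda$; the symbol $p_z$ is introduced here for illustration only:

$$ p_z(z) = \frac{z\,p(z)}{\int_0^\infty t\,p(t)\,dt} = \frac{z\,\lambda e^{-\lambda z}}{1/\lambda} = \lambda^2 z\, e^{-\lambda z}, \qquad z \ge 0 $$

Under these assumptions the observed values follow a Gamma distribution with shape 2 and rate $\lambda$, so their mean is $2/\lambda$ rather than $1/\lambda$.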
1 vote
1 answer
104 views

Probability of drawing one element before another in weighted sampling without replacement

Setup: The setup is weighted sampling without replacement. By which I mean: You have a set of $n$ items, indexed by integers 1 through $n$, and the items have associated weights $\{w_1,\ldots,w_n\}$ ...
postylem's user avatar
  • 155
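A minimal simulation sketch of the setup described in this excerpt, assuming successive sampling (each draw picks one remaining item with probability proportional to its weight) and assuming all $n$ items are eventually drawn. Under those assumptions the chance that item $i$ appears before item $j$ is $w_i/(w_i+w_j)$, which the code compares against a Monte Carlo estimate. The function names, the example weights, and the 0-based indexing are illustrative choices, not taken from the question.

```python
import random


def weighted_order(weights, rng):
    """Draw every item once, without replacement.

    Each draw picks one of the remaining items with probability
    proportional to its weight (successive sampling).
    """
    remaining = list(range(len(weights)))
    order = []
    while remaining:
        pick = rng.choices(remaining, weights=[weights[k] for k in remaining])[0]
        order.append(pick)
        remaining.remove(pick)
    return order


def prob_before(weights, i, j, n_sims=20000, seed=0):
    """Monte Carlo estimate of P(item i is drawn before item j)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        order = weighted_order(weights, rng)
        if order.index(i) < order.index(j):
            hits += 1
    return hits / n_sims


# Hypothetical weights; items are indexed from 0 here, not from 1.
weights = [5.0, 1.0, 3.0, 1.0]
estimate = prob_before(weights, 0, 2)
closed_form = weights[0] / (weights[0] + weights[2])
print(f"simulated: {estimate:.3f}  closed form: {closed_form:.3f}")
```

If only a partial sample of size $m < n$ is drawn, "before" needs a more careful definition (for example, $i$ is drawn and $j$ is not), and this comparison would have to be adapted.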
1 vote
0 answers
345 views

Logistic regression for case-control studies

Suppose I have designed a study in which participants were sampled from three disease groups of fixed size, A, B and C, with sizes $n_A=50$, $n_B=50$ and $n_C=100$. Group A is a ...
s.stats's user avatar
  • 477
