Correlation coefficient with noisy data

Question

I am trying to analyse the dependencies between some variables, looking at the Pearson and Spearman correlation coefficients (for now, at least). The problem is that one variable has known errors (for each data point, there is a confidence interval).

Is there any way to propagate the confidence intervals of the data to obtain a confidence interval on the coefficient itself?

It looks as though weighted correlation might be helpful here. Try this Q&A stats.stackexchange.com/questions/221246/… and see whether that helps. If not, perhaps edit your question to explain more? — mdewey, Commented Nov 9, 2018 at 13:33
@mdewey Thank you for your comment. It seems my question was not clear enough, but since I'm not sure what was unclear, I'll give you an example (I'll edit the question if this makes more sense). I have about 3000 samples of 2 variables (X, Y). For each sample, I know that the X variable has an error of ±Xerr. Thus, each sample is represented by a triplet (X, Y, Xerr), and I know that the actual value of the sample might lie anywhere between (X-Xerr, Y) and (X+Xerr, Y). Xerr is given for each sample; Y has no error. How can I use Xerr in the correlation between X and Y? — Paul92, Commented Nov 9, 2018 at 13:40
You could weight each observation by the inverse of its variance. That would admittedly ignore that $Y$ is known without error. — mdewey, Commented Nov 9, 2018 at 14:31
Is your data chronological? If so, all bets are off when using the correlation coefficient. — IrishStat, Commented Nov 9, 2018 at 20:15
@IrishStat No, it is not chronological. Y is a measure of some property of a group of individuals. The problem is that the number of individuals (X) cannot be determined exactly and thus has a quantifiable error associated with it. — Paul92, Commented Nov 9, 2018 at 21:01

rolando2 · Accepted Answer · 2018-11-09 20:00:10Z

Create a Monte Carlo simulation. Suppose you run 1,000 trials. Each observed value of X is then replaced by 1,000 draws from a normal distribution. The primary or "expected" value serves as the mean of this distribution. The lower and upper limits, allowing for error, determine its standard deviation; e.g., those limits could mark off values about 3 standard deviations below and above that mean, respectively.

Now when you compute the corresponding 1,000 simulated correlation coefficients between the Y series and each simulated X series, you will gain a sense of the variability of those coefficients in a more genuine way than would be indicated by the usual formula for the standard error of r.
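
A minimal sketch of this simulation in Python with NumPy. It assumes each Xerr is a symmetric half-width that corresponds to 3 standard deviations of a normal distribution, as suggested above. The function name `simulate_correlations`, its parameters, and the synthetic data are illustrative only and are not from the original post. It computes Pearson's r; a Spearman version would rank each simulated X series first.

```python
import numpy as np

def simulate_correlations(x, y, x_err, n_trials=1000, sigmas=3.0, seed=0):
    """Return n_trials Pearson correlations between y and perturbed copies of x.

    Assumption: x_err is a half-width equal to `sigmas` standard deviations
    of a normal distribution centred on the observed x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sd = np.asarray(x_err, dtype=float) / sigmas

    # One row per trial: every x value is redrawn around its observed value.
    x_sim = rng.normal(loc=x, scale=sd, size=(n_trials, x.size))

    # Pearson r of each simulated row against the fixed y series.
    xc = x_sim - x_sim.mean(axis=1, keepdims=True)
    yc = y - y.mean()
    return (xc @ yc) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum())

# Synthetic (made-up) data standing in for the (X, Y, Xerr) triplets.
demo_rng = np.random.default_rng(42)
x_obs = demo_rng.uniform(10, 100, size=300)
y_obs = 0.5 * x_obs + demo_rng.normal(0, 10, size=300)
x_err = demo_rng.uniform(1, 15, size=300)

r_sim = simulate_correlations(x_obs, y_obs, x_err)
lo, hi = np.percentile(r_sim, [2.5, 97.5])
print(f"observed r = {np.corrcoef(x_obs, y_obs)[0, 1]:.3f}")
print(f"simulated 95% range for r: [{lo:.3f}, {hi:.3f}]")
```

The percentile range of the simulated coefficients gives a rough sense of how much the measurement error in X alone moves r. It does not include ordinary sampling variability.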
