Say I have a sample of finite size $N$, and from it I compute some statistic $\theta$. I want to plot this sample estimate, $\hat{\theta}$, with an error bar.
To compute the error, I am using bootstrapping, i.e. taking samples with replacement from my original sample, each of the same size $N$. I then compute the statistic on each bootstrap sample, $\theta^*$, and bin these values into a bootstrap distribution.
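This resampling step can be sketched in a few lines. Here is a minimal example assuming NumPy; the choice of statistic (the sample median) and the sample itself are placeholders:

```python
# Minimal sketch of the bootstrap resampling described above; the
# statistic (the sample median) and the sample are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(size=100)   # the original sample, of size N
N, B = sample.size, 5000        # B = number of bootstrap replicates

theta_hat = np.median(sample)   # sample estimate of the statistic
theta_star = np.array([
    np.median(rng.choice(sample, size=N, replace=True))  # resample with replacement
    for _ in range(B)
])                              # the bootstrap distribution of the statistic
```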
If the bootstrap assumption holds true, meaning my sample is representative of the underlying population, then
- The bootstrap standard error will approximate the actual standard error. In other words, the standard deviation of the bootstrap distribution will approximate the standard deviation of the sampling distribution of $\theta$, built by taking many samples of size $N$ with replacement from the population, and binning all their sample estimates $\hat{\theta}$. Mathematically, $\sigma(\theta^*)\approx \sigma(\hat{\theta})$.
- The bias of the bootstrap distribution will approximate the bias of the sampling distribution. In other words, the distance between the mean of the bootstrap distribution and my original sample estimate will approximate the distance between the mean of the sampling distribution and the true value (computed from the population). Mathematically, $\overline{\theta^*} -\hat{\theta} \approx \overline{\hat{\theta}}-\theta$.
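The first of these approximations can be checked with a small simulation against a known population; everything here (the exponential population, the mean as the statistic, $N$, $B$) is an illustrative assumption:

```python
# Illustrative check that the bootstrap SE tracks the true sampling SE.
# The population, the statistic (the mean), N and B are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=1.0, size=100_000)
N, B = 50, 2000

# True sampling distribution: many size-N samples drawn from the population.
true_se = np.std([rng.choice(population, N).mean() for _ in range(B)])

# Bootstrap distribution: resample one size-N sample, with replacement.
sample = rng.choice(population, N)
boot_se = np.std([rng.choice(sample, N, replace=True).mean() for _ in range(B)])

print(true_se, boot_se)  # both should come out near 1/sqrt(N) for this population
```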
Coming back to my aim: I do not have access to the population, only to a sample of it, and I want to show a sensible error-bar on my sample estimate, $\hat{\theta}$.
If the bootstrap assumption holds true and I find that, after building the bootstrap distribution from my sample, the bias is negligible $(\overline{\theta^*} -\hat{\theta} \approx 0)$, then the bootstrap standard error, $\sigma(\theta^*)$, is a measure both of
- The precision of my sample estimate, as $\sigma(\theta^*)\approx\sigma(\hat{\theta})$, which is the variability of sample estimates around the mean.
- The accuracy of my sample estimate, as $\overline{\theta^*} -\hat{\theta} \approx \overline{\hat{\theta}}-\theta\approx 0$, which means that the standard error measures the variability of sample estimates around the true (or "correct") value.
This is in line with the Wikipedia page on error bars, which says:

> Error bars are graphical representations of the variability of data and used on graphs to indicate the error or uncertainty in a reported measurement. They give a general idea of how precise a measurement is, or conversely, how far from the reported value the true (error free) value might be.
As I understand it, the last part is a statement of accuracy.
## Problem
Now, all of that is fine, but the question is what to do when there is a non-zero bias. In this case, the first point above still holds: the bootstrap standard error measures the precision of my sample estimate. However, the second point does not: the variability that the standard error measures is no longer variability around the true value (which is displaced from the mean), so the standard error is no longer related to the accuracy of my sample estimate.
I am not happy with that, because I would like my errors to indicate, as the Wikipedia article states, how far from the reported value the true (error free) value might be.
## Potential solutions
I can think of two different ways of achieving what I want, but I am not sure they are sensible, as I am not an expert in statistics. Let me call the non-zero bias $\alpha = \overline{\theta^*} -\hat{\theta} \approx \overline{\hat{\theta}}-\theta$. The potential solutions are
- Given the last equality above, $\alpha$ estimates the average deviation of the sample estimates from the true value. A simple, perhaps naïve, way to deal with it is to "correct" for it, subtracting it from my sample estimate to report $\hat{\theta}-\alpha$. The result would be a bias-corrected estimate, and I would show it with the bootstrap standard error as error bar.
- Instead of computing the bootstrap standard error, $\sigma(\theta^*)$, i.e. the root of the mean squared deviation of the bootstrap estimates from their mean, $\overline{\theta^*}$, I could compute the deviation from my sample estimate, $\hat{\theta}$. Calling this quantity $\epsilon(\hat{\theta})$, and using $\hat{\theta}=\overline{\theta^*}-\alpha$, we have $$\begin{align} \epsilon^2(\hat{\theta})=& \frac{1}{B}\sum_{b=1}^{B}(\theta^*_b-\hat{\theta})^2\\ =& \frac{1}{B}\sum_{b=1}^{B}(\theta^*_b-\overline{\theta^*}+\alpha)^2\\ =& \frac{1}{B}\sum_{b=1}^{B}\left[(\theta^*_b-\overline{\theta^*})^2+\alpha^2+2\alpha(\theta^*_b-\overline{\theta^*})\right]\\ =& \frac{1}{B}\sum_{b=1}^{B}(\theta^*_b-\overline{\theta^*})^2+ \frac{1}{B}\sum_{b=1}^{B}\alpha^2+ \frac{2\alpha}{B}\sum_{b=1}^{B}\theta^*_b-\frac{2\alpha}{B}\sum_{b=1}^{B}\overline{\theta^*}\\ =& \sigma^2(\theta^*)+ \alpha^2+ 2\alpha\overline{\theta^*} - 2\alpha\overline{\theta^*}\\ =& \sigma^2(\theta^*)+ \alpha^2. \end{align}$$ So this quantity incorporates the bias in quadrature with the bootstrap standard error, $$\epsilon(\hat{\theta})=\sqrt{\sigma^2(\theta^*)+\alpha^2},$$ and, as a consequence, it measures both the precision and the accuracy of my sample estimate $\hat{\theta}$, as I would like. As the bias approaches zero, it reduces to the bootstrap standard error, $\epsilon(\hat{\theta})\to\sigma(\theta^*)$.
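The identity above is exact for the population-style (ddof = 0) standard deviation, which a quick numerical check confirms; the statistic (the mean) and the sample here are stand-ins, and the last line illustrates the first solution as well:

```python
# Numerical check of the identity eps^2 = sigma^2 + alpha^2 derived above.
# The sample and statistic (the mean) are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(1)
sample = rng.exponential(size=60)
theta_hat = sample.mean()
theta_star = np.array([rng.choice(sample, sample.size).mean()
                       for _ in range(4000)])

alpha = theta_star.mean() - theta_hat    # bootstrap estimate of the bias
sigma = theta_star.std()                 # bootstrap standard error (ddof=0)
eps = np.sqrt(np.mean((theta_star - theta_hat) ** 2))  # deviation from theta_hat

print(np.isclose(eps, np.sqrt(sigma**2 + alpha**2)))   # True: identity is exact
theta_corrected = theta_hat - alpha      # solution 1: bias-corrected estimate
```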
## Question(s)
Is either of my potential solutions sensible? Have I made any wrong assumptions? Any ideas would be highly appreciated.
## Edit
In response to @EdM, my statistic is the following: $$\theta=\frac{1}{2}\arctan\left(\frac{\sigma_{xy}^2}{|\sigma_{xx}^2-\sigma_{yy}^2|}\right),$$
where, given a 2D sample of $x$ and $y$ coordinates, $\sigma_{ij}^2=\langle ij \rangle - \langle i \rangle \langle j \rangle$.
This variable $\theta$ measures the angle by which the direction of highest dispersion in the plane is tilted with respect to the coordinate axes. In Galactic astronomy we call it the vertex deviation, with symbol $l_\mathrm{v}$. One important thing to note is that its range is limited to $[-45°, 45°]$.
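For reference, here is a direct implementation of this statistic following the definition above (with the question's convention of $|\sigma_{xx}^2-\sigma_{yy}^2|$ in the denominator, which confines the result to $[-45°,45°]$); the data-generation step is made up for illustration:

```python
# Vertex deviation as defined above; the 2D sample below is an
# arbitrary tilted Gaussian cloud, for illustration only.
import numpy as np

def vertex_deviation(x, y):
    """Tilt (degrees) of the direction of highest dispersion in the plane."""
    s_xy = np.mean(x * y) - np.mean(x) * np.mean(y)   # sigma_xy^2
    s_xx = np.mean(x * x) - np.mean(x) ** 2           # sigma_xx^2
    s_yy = np.mean(y * y) - np.mean(y) ** 2           # sigma_yy^2
    return np.degrees(0.5 * np.arctan(s_xy / np.abs(s_xx - s_yy)))

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
y = 0.3 * x + rng.normal(scale=0.5, size=1000)  # positively tilted cloud
lv = vertex_deviation(x, y)                     # lies in (-45, 45) degrees
```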
I simulate the type of bias this variable exhibits below. The left and right panels were produced starting from two different populations. Each panel shows several sampling distributions for different sample sizes, and the legend indicates their mean value of $l_\mathrm{v}$. Lower-$N$ distributions develop a positive bias relative to the true value (i.e. the magnitude of their mean $l_\mathrm{v}$ decreases). The limited range of the variable seems to be at least part of the issue.