
I am working in a cosmological context where I use the angular power spectrum quantities $C_{\ell}$ coming from a Legendre decomposition.

I am facing an issue: I want to prove the gain we obtain, in terms of variance, by using an estimator that is an integral of the $C_\ell$ (in practice a discrete sum), compared with the variance of a single $C_\ell$.

I made the following reasoning:

by defining this integral below:

$$\begin{aligned} \hat{\mathcal{D}}_{\mathrm{gal}}&=\int_{\ell_{\min}}^{\ell_{\max}} \hat{C}_{\ell,\mathrm{gal}}(\ell) \,\mathrm{d}\ell\\ &\simeq \dfrac{(\ell_{\max}-\ell_{\min})}{n} \sum_{i=1}^{n} \hat{C}_{\ell,\mathrm{gal}}(\ell_{i}) \end{aligned}$$

and by taking the definition of $\hat{C}_\ell(\ell_i)$:

$$\hat{C}_{\ell_i}=\left\langle|\hat{a}_{\ell_i m}|^{2}\right\rangle_{m=-\ell_i, \ldots, \ell_i}$$

I would like to demonstrate that there is a gain when I estimate the variance of $\hat{\mathcal{D}}_{\mathrm{gal}}$ compared to the situation where I consider only one $C_\ell$.

From old memories, I thought the following relation (inverse-variance addition) was correct:

$$\dfrac{1}{\sigma_{\hat{\mathcal{D}}}^{\,2}}=\dfrac{1}{\sigma_{\hat{C}_{\ell,1}}^{2}}+\dfrac{1}{\sigma_{\hat{C}_{\ell,2}}^{2}}+\ldots+\dfrac{1}{\sigma_{\hat{C}_{\ell,n}}^{2}}\gg \dfrac{1}{\sigma_{\hat{C}_{\ell,i}}^{2}} \quad\text{for any } i$$

But first, I don't know whether this formula is valid; and if it is not, I don't know how to express the benefit of considering the estimated integral $\hat{\mathcal{D}}_{\mathrm{gal}}$ rather than a single $C_\ell$. I mean this at the level of the variance of $\hat{\mathcal{D}}_{\mathrm{gal}}$, which should be smaller than the variance of a single $\hat{C}_\ell$.

I just need help to justify clearly how we get better accuracy (a smaller variance) by integrating.

If someone could explain how to prove this, or at least give some pointers or partial answers, I would be grateful.
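As a sanity check of the claim (not a proof), here is a Monte Carlo sketch; the flat spectrum $C_\ell = 1$, the per-mode error, and the mode count are all made-up illustrative values:

```python
import numpy as np

# Sketch: the rectangle-rule estimator
#   D_hat = (ell_max - ell_min)/n * sum_i C_hat(ell_i)
# should have a smaller *relative* spread than a single C_hat(ell_i).
# All numbers below (flat spectrum, per-mode error) are assumptions.
rng = np.random.default_rng(0)

ell_min, ell_max, n = 10.0, 1000.0, 100
C_true, sigma = 1.0, 0.3          # hypothetical flat spectrum and error
n_trials = 20000

# Independent unbiased estimates of C_true for each of the n modes.
C_hat = rng.normal(C_true, sigma, size=(n_trials, n))
D_hat = (ell_max - ell_min) / n * C_hat.sum(axis=1)

# Relative scatter: std/mean for one mode vs. for the integral estimator.
rel_single = C_hat[:, 0].std() / C_hat[:, 0].mean()
rel_integral = D_hat.std() / D_hat.mean()
print(rel_single, rel_integral)   # the integral wins by ~ sqrt(n)
```

Note that the overall prefactor $(\ell_{\max}-\ell_{\min})/n$ cancels in the relative scatter, which is why the comparison is made in relative rather than absolute terms.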

UPDATE 1: to justify the choice of taking the integral instead of a single $C_{\ell}(\ell_{i})$, someone suggested that I prove, taking $Z=\sum_{i=1}^{n} X_{i}$, that:

$$\frac{\sigma_{Z}}{Z}<\frac{\sigma_{X_{i}}}{X_{i}} \text { for any } i$$

But $\frac{\sigma_{Z}}{Z}$ is a relative error, that is to say, it involves the value of $Z$ and not just $\sigma_{Z}$, doesn't it?

Incidentally, what information do the absolute uncertainty (just $\sigma_{Z}$) and the relative uncertainty each provide? I mean, how should one interpret and compare them to extract meaningful information about the variables we want to estimate?

UPDATE 2: I think I may have found the solution in "inverse-variance weighting". Indeed, I was stuck on the condition that the sum of the weights $w_{i}$ should equal 1.


Consider a generic weighted sum $Y=\sum_{i} w_{i} X_{i}$, where the weights $w_{i}$ are normalised such that $\sum_i w_{i}=1$. If the $X_{i}$ are all independent, the variance of $Y$ is given by:

$$\operatorname{Var}(Y)=\sum_{i} w_{i}^{2} \sigma_{i}^{2}\quad(1)$$

Introducing a Lagrange multiplier $w_{0}$ to enforce the constraint, we minimise the Lagrangian

$$L=\sum_{i} w_{i}^{2} \sigma_{i}^{2}-w_{0}\left(\sum_{i} w_{i}-1\right)$$

Setting the derivative with respect to each $w_{k}$ to zero, $0=\frac{\partial L}{\partial w_{k}}=2 w_{k} \sigma_{k}^{2}-w_{0}$,

which implies that

$w_{k}=\frac{w_{0} / 2}{\sigma_{k}^{2}}$

The main takeaway here is that $w_{k} \propto 1 / \sigma_{k}^{2}$.

Since $\sum_{i} w_{i}=1$,

$$\frac{2}{w_{0}}=\sum_{i} \frac{1}{\sigma_{i}^{2}}:=\frac{1}{\sigma_{0}^{2}}$$

The individual normalised weights are $$w_{k}=\frac{1}{\sigma_{k}^{2}}\left(\sum_{i} \frac{1}{\sigma_{i}^{2}}\right)^{-1}$$

The variance of the estimator is then $$\operatorname{Var}(Y)=\sum_{i} \frac{\sigma_{0}^{4}}{\sigma_{i}^{4}} \sigma_{i}^{2}=\sigma_{0}^{4} \sum_{i} \frac{1}{\sigma_{i}^{2}}=\sigma_{0}^{4} \frac{1}{\sigma_{0}^{2}}=\sigma_{0}^{2}=\frac{1}{\sum_{i} 1 / \sigma_{i}^{2}}$$
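The result above can be checked numerically; a minimal sketch, with made-up standard deviations $\sigma_i$:

```python
import numpy as np

# Numerical check of the inverse-variance-weighting result: with optimal
# weights, Var(Y) = 1 / sum_i (1/sigma_i^2). The sigma values below are
# purely illustrative.
sigmas = np.array([0.5, 1.0, 2.0, 4.0])

# Optimal weights w_k = (1/sigma_k^2) / sum_i (1/sigma_i^2).
inv_var = 1.0 / sigmas**2
w = inv_var / inv_var.sum()
assert np.isclose(w.sum(), 1.0)   # normalisation constraint is satisfied

# Var(Y) = sum_i w_i^2 sigma_i^2 should equal 1 / sum_i (1/sigma_i^2),
# and be smaller than the smallest individual variance.
var_Y = np.sum(w**2 * sigmas**2)
print(var_Y, 1.0 / inv_var.sum(), sigmas.min()**2)
```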


I then realized that the weights $w_{i}$ may be equal, up to a factor, to $\Delta\ell=\dfrac{\ell_{\max}-\ell_{\min}}{N}$, where $N$ is the number of values summed when one computes the integral by the rectangle rule: the weights then sum to $\sum_{i=1}^{N}\Delta\ell$, not to 1.

How can I satisfy the condition that the weights $w_{i}$ be normalised such that $\sum_i w_{i}=1$, with:

$Y=\sum_{i} w_{i} X_{i}$

and where we can assimilate $\hat{\mathcal{D}}_{\mathrm{gal}}\equiv Y$ and $X_{i}\equiv \hat{C}_{\ell,\mathrm{gal}}(\ell_{i})$ :

What do you think of this approach? How should I normalise the sum of the weights $w_{i}$?

UPDATE 3: so that you understand the goal:

Initially, my tutor wanted to compare the ratio of two $C_\ell$ coming from two different probes (say 1 and 2): so I would have to compute the variance of the quantity $O=\dfrac{C_{\ell,1}}{C_{\ell,2}}$. Then my tutor told me to compute instead the ratio of the two integrals of $C_{\ell,1}$ and $C_{\ell,2}$:

that is to say, the ratio between

$$\begin{aligned} \hat{\mathcal{D}}_{\mathrm{gal,1}}&=\int_{\ell_{\min}}^{\ell_{\max}} \hat{C}_{\ell,\mathrm{gal,1}}(\ell) \,\mathrm{d}\ell\\ &\simeq \dfrac{(\ell_{\max}-\ell_{\min})}{n} \sum_{i=1}^{n} \hat{C}_{\ell,\mathrm{gal,1}}(\ell_{i}) \end{aligned}$$

and

$$\begin{aligned} \hat{\mathcal{D}}_{\mathrm{gal,2}}&=\int_{\ell_{\min}}^{\ell_{\max}} \hat{C}_{\ell,\mathrm{gal,2}}(\ell) \,\mathrm{d}\ell\\ &\simeq \dfrac{(\ell_{\max}-\ell_{\min})}{n} \sum_{i=1}^{n} \hat{C}_{\ell,\mathrm{gal,2}}(\ell_{i}) \end{aligned}$$

He justified this by telling me that we would gain accuracy, and that is why I am trying to prove the variance gain of the integral compared to a ratio of two single $C_\ell$ taken at the same redshift.

UPDATE 4: Following the answer from @Andrew below, I would like to mention that in my code I use the well-known (cosmic-variance) result that the standard deviation of a $C_\ell$ is $\sigma_{C_\ell}=\sqrt{\dfrac{2}{2\ell+1}}\,C_\ell$: how can I include this standard deviation in the reasoning of his answer?
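For what it's worth, a sketch of how this $\ell$-dependent standard deviation could be fed into the inverse-variance machinery of UPDATE 2; the flat fiducial spectrum $C_\ell=1$ and the $\ell$ range are illustrative assumptions:

```python
import numpy as np

# Sketch: plug the cosmic-variance standard deviation
#   sigma_ell = sqrt(2/(2*ell+1)) * C_ell
# into inverse-variance weighting. A flat fiducial spectrum C_ell = 1
# is assumed purely for illustration.
ells = np.arange(10, 1001)
C_ell = np.ones_like(ells, dtype=float)          # hypothetical flat spectrum
sigma_ell = np.sqrt(2.0 / (2.0 * ells + 1.0)) * C_ell

# Inverse-variance weights, normalised to sum to 1.
inv_var = 1.0 / sigma_ell**2
w = inv_var / inv_var.sum()

# Variance of the optimally weighted combination, vs. the single best mode.
var_combined = np.sum(w**2 * sigma_ell**2)       # = 1 / sum_i (1/sigma_i^2)
var_best_single = sigma_ell.min()**2
print(var_combined, var_best_single)
```

The combined variance is far below even the best single mode's cosmic variance, which is the quantitative content of the gain being asked about.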

  •
    I’m voting to close this question because it is about statistics, not physics. stats.stackexchange.com would be a better place to ask.
    – Buzz
    Commented Jul 8, 2021 at 23:10
    @Buzz I don't entirely agree, since statisticians don't know the Legendre $C_\ell$ and $a_{\ell m}$ quantities, nor, more generally, the cosmological context. This is indeed a statistics issue, but applied to cosmology, and I am afraid statisticians may be lost among all these concepts.
    – user87745
    Commented Jul 9, 2021 at 7:42
    @Buzz I have posted this question on stats.stackexchange.com, but the moderators objected that my question is unclear and too physics-oriented: it is impossible to post on that forum, they systematically close it. What should I do? I am desperate.
    – user87745
    Commented Jul 9, 2021 at 18:58
    Doesn't this boil down to the statement that the standard deviation of the sample mean scales like the inverse square root of the number of samples? wikipedia
    – Andrew
    Commented Jul 10, 2021 at 18:00
  •
    @Buzz I don't agree that there is no physics in this question just because it involves statistics. Otherwise why not close questions about Newton's laws because they are "just" calculus questions? While the text of the question itself could probably be streamlined, it is fundamentally a question about the CMB angular power spectrum and cosmic variance.
    – Andrew
    Commented Jul 10, 2021 at 18:12

1 Answer


I interpret your question in the following way.

You have a theoretical model for the $C_\ell$ which says that the $C_\ell$ are independent of $\ell$. We can express this by saying that \begin{equation} C_\ell = \mathcal{D} \end{equation} where $\mathcal{D}$ does not depend on $\ell$. In this expression, all quantities are theoretical in the sense that they do not depend on any data, and in principle could be computed from theory.

Furthermore, you have some estimators $\hat{C}^{A}_\ell$ for $C_\ell$, where $A$ is an index that labels the detector that is doing a measurement (in your case, there are two surveys, so $A$ runs over the values $1$ and $2$). For all $A$ we will assume the estimator is unbiased \begin{equation} \langle \hat{C}_\ell^A\rangle = C_\ell \end{equation} We further assume that the variance of the estimators $\hat{C}_\ell^A$ are (a) independent of $\ell$, (b) uncorrelated between different $\ell$ values (each $\ell$ mode can be measured independently), and (c) uncorrelated between different experiments (different surveys have different errors). We can express these assumptions in the equation \begin{equation} \langle \hat{C}_\ell^A \hat{C}_{\ell'}^B \rangle - \langle \hat{C}_\ell^A \rangle \langle \hat{C}_{\ell'}^B \rangle = \sigma^2 \delta_{\ell \ell'} \delta^{AB} \end{equation} where $\delta_{ab}$ is the Kronecker delta ($\delta_{ab}=1$ if $a=b$ and $0$ otherwise). Assumption (a) may not be a good assumption in all circumstances; for example cosmic variance limits measurements of $C_\ell$ for low $\ell$ in the CMB but is subdominant at higher $\ell$.

Given these properties, we can construct an unbiased estimator for $\mathcal{D}$ (which we will call $\hat{\mathcal{D}}^A$) as follows: \begin{equation} \hat{\mathcal{D}}^A = \frac{1}{\ell^A_{\rm max}-\ell^A_{\rm min}+1} \sum_{\ell=\ell^A_{\rm min}}^{\ell^A_{\rm max}} \hat{C}^A_{\ell} \end{equation} where $\ell^A_{\rm max}$ is the maximum value of $\ell$ that can be observed using survey $A$ (note $A$ is an index, not a power), and $\ell_{\rm min}^A$ is the minimum value.

Using the expression for $\langle \hat{C}^A_\ell \rangle$ given above, we can derive that $\hat{\mathcal{D}}^A$ is an unbiased estimator \begin{equation} \langle \hat{\mathcal{D}}^A \rangle = \mathcal{D} \end{equation} and we can also derive the variance of $\hat{\mathcal{D}}$ as \begin{equation} \langle \hat{\mathcal{D}}^A \hat{\mathcal{D}}^B \rangle - \langle \hat{\mathcal{D}}^A \rangle \langle \hat{\mathcal{D}}^B \rangle = \frac{\sigma^2}{\ell^A_{\rm max}-\ell^A_{\rm min}+1} \delta^{AB} \end{equation}
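These two formulas can be verified with a quick Monte Carlo sketch (made-up values for $\mathcal{D}$, $\sigma$, and the $\ell$ range, dropping the survey index):

```python
import numpy as np

# Monte Carlo check of the unbiasedness and variance formulas above,
# in the answer's notation; D_true, sigma, and the ell range are
# placeholder values.
rng = np.random.default_rng(1)

D_true, sigma = 2.0, 0.4
ell_min, ell_max = 2, 201
n_modes = ell_max - ell_min + 1          # 200 ell modes

# Draw unbiased, uncorrelated C_hat_ell with common variance sigma^2,
# then form D_hat as their plain average over ell.
C_hat = rng.normal(D_true, sigma, size=(20000, n_modes))
D_hat = C_hat.mean(axis=1)

# Expect mean ~ D_true and variance ~ sigma^2 / n_modes.
print(D_hat.mean(), D_hat.var(), sigma**2 / n_modes)
```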

Then as I understand, your questions are:

  1. How can I show that the variance in the estimate of $\mathcal{D}$ decreases as I include more $\ell$ modes?
  2. How do I compare the variance (uncertainty) of two different probes?

To address question 1, we can see from the explicit formula above that the variance for $\mathcal{D}$ scales as one over the number of $\ell$ modes in the sum, so including more $\ell$ modes decreases the uncertainty on $\mathcal{D}$.

To address question 2, first note that there is no covariance between $A$ and $B$, so we can look at the variances of each experiment separately.

The variances in the surveys (accounting only for statistical uncertainty due to measuring a finite range of $C_\ell$) will differ if and only if the ranges of $\ell$ values probed differ. Explicitly, the ratio of variances is \begin{equation} \frac{\ell_{\rm max}^B - \ell_{\rm min}^B + 1}{\ell_{\rm max}^A - \ell_{\rm min}^A + 1} \end{equation} which of course is 1 if the surveys probe the same range of $\ell$, i.e. $\ell_{\rm max}^A = \ell_{\rm max}^B$ and $\ell_{\rm min}^A = \ell_{\rm min}^B$.

Another source of difference between the two surveys is noise. We have so far concerned ourselves only with the expected signal in the detector, and its statistical variation from having access to only a finite range of $\ell$ modes. However, a detection also depends on the inherent noise in the instruments. Let $(n^A)^2$ be the noise variance in detector $A$. Then a natural definition of the expected signal-to-noise ratio in detector $A$ is \begin{equation} \rho_A = \frac{\mathcal{D}}{\sqrt{(n^A)^2 + \frac{\sigma^2}{\ell_{\rm max}^A - \ell_{\rm min}^A + 1}}} \end{equation} Taking a ratio of expected signal-to-noise ratios, $\rho_A/\rho_B$, gives us another way to compare the detectors, accounting for both inherent detector noise and statistical noise from having access to a finite number of $C_\ell$'s: \begin{equation} \left(\frac{\rho_A}{\rho_B}\right)^2 = \frac{(n^B)^2 + \frac{\sigma^2}{\ell_{\rm max}^B - \ell_{\rm min}^B + 1}}{(n^A)^2 + \frac{\sigma^2}{\ell_{\rm max}^A - \ell_{\rm min}^A + 1}} \end{equation} It is interesting to explore this ratio in several simplifying limits, for example when both detectors have the same range of $\ell$ ($\ell_{\rm max}^A=\ell_{\rm max}^B$ and $\ell_{\rm min}^A=\ell_{\rm min}^B$) and when the noise in both detectors is negligible, $n^A,n^B \rightarrow 0$.
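The signal-to-noise comparison can be sketched numerically; all the numbers below ($\mathcal{D}$, $\sigma$, noise levels, $\ell$ ranges) are placeholders, and the noiseless limit is taken so the squared ratio reduces to the ratio of mode counts:

```python
import numpy as np

# Sketch of the expected signal-to-noise comparison; every numeric value
# here is a placeholder for illustration.
def snr(D, sigma, noise, ell_min, ell_max):
    """rho = D / sqrt(noise^2 + sigma^2 / (ell_max - ell_min + 1))."""
    n_modes = ell_max - ell_min + 1
    return D / np.sqrt(noise**2 + sigma**2 / n_modes)

D, sigma = 1.0, 0.5
rho_1 = snr(D, sigma, noise=0.0, ell_min=2, ell_max=101)   # 100 modes
rho_2 = snr(D, sigma, noise=0.0, ell_min=2, ell_max=401)   # 400 modes

# With negligible noise and the same sigma, the squared-SNR ratio reduces
# to the ratio of the numbers of ell modes: (rho_2/rho_1)^2 = 400/100 = 4.
print((rho_2 / rho_1)**2)
```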

    Thanks for this detailed answer. Just a question: the definition of $C_\ell$ is $C_\ell = \langle|a_{\ell m}|^2\rangle = \dfrac{1}{2\ell+1}\sum_{m=-\ell}^{+\ell}|a_{\ell m}|^{2}$, so $C_\ell = \mathcal{D}$ depends on $\ell$, do you see what I mean?
    – user87745
    Commented Jul 17, 2021 at 6:02
    @youpilat13 Well it depends on what the $\ell$ dependence of $a_{\ell m}$ is! Based on the discussion in the comments, you said that in your case, the sum over $\ell$ of the $C_\ell$ (used to estimate $\mathcal{D}$) could be thought of as an average; if it is a literal average, this is only possible if the $C_\ell$ are independent of $\ell$, which implies a certain $\ell$ dependence for the $a_{\ell m}$. If you have a different, known $\ell$ dependence of the $a_{\ell m}$ and $C_\ell$, you can generalize my argument by using a "weighted average" of $C_\ell$ (or "Wiener filter").
    – Andrew
    Commented Jul 17, 2021 at 6:06
    It's not 100% clear to me what you are doing based on the question and comments, so I did my best to guess without simultaneously dealing with every possible case.
    – Andrew
    Commented Jul 17, 2021 at 6:06
    In other words, say $C_\ell = K \ell^4$ for some constant $K$, then you could define $B_\ell=C_\ell/\ell^4$ and run through my answer with $B_\ell$ instead of $C_\ell$.
    – Andrew
    Commented Jul 17, 2021 at 6:08
    @youpilat13 If you don't know anything about the $\ell$ dependence of the $C_\ell$ or $a_{\ell m}$, then you can't think of different $\hat{C}_\ell$'s as measurements of the same quantity; instead they are all measurements of different quantities (the value of $C_\ell$ at different $\ell$'s). Then you don't gain in the uncertainty by measuring more $\ell$ modes. Rather, you gain by measuring more things with "average" uncertainty, instead of one thing with "good" uncertainty.
    – Andrew
    Commented Jul 17, 2021 at 6:44