
Let $X\sim Exp(1)$ and $Y\sim Exp(\lambda)$, independent. Then, \begin{align} f_{X|Y=mX}(x) = \frac{f_{X,Y}(x,mx) }{\int f_{X,Y}(x,mx) \:dx }=\frac{f_X(x)f_Y(mx) }{\int f_X(x)f_Y(mx) \:dx } = \frac{e^{-(1+\lambda m)x}}{\int e^{-(1+\lambda m)x} dx} = (1+m\lambda)e^{-(1+\lambda m)x} \end{align} So $X|_{Y=mX} \sim Exp(1+m\lambda)$. That means $E[X|Y=mX]=\frac{1}{1+ \lambda m} < 1$ whenever $m>0$.

This makes sense mathematically: $X|_{Y=0}\sim Exp(1)$, and $f_{X,Y}(0,0)$ is the same for all $m$, but $f_{X,Y}(x,mx)<f_{X,Y}(x,m'x)$ for all $x>0$ whenever $m'<m$, so a larger slope pushes the conditional mass toward $0$.

The practical implication seems weird, though. Your friend gets to your house via one Poisson bus and one Poisson train. You expect someone to wait 1 minute for the train. But they tell you they waited $m$ times as long for the bus as they did for the train, and now you have to revise your expectation for the train downward?

Edit: I think the reason for the (seeming) violation of the Tower property is that I incorrectly defined $f_{X|Y=mX}$. It should instead be

$$ f_{X\mid Y=mX}(x) = \frac{x f_{X,Y}(x,mx)}{\int x f_{X,Y}(x,mx) \:dx} = \frac{x e^{-(1+\lambda m)x}}{\int x e^{-(1+\lambda m)x}\: dx}. $$ Think about it like the flag of Seychelles: the "ray" is twice as wide if you go twice as far out. (Aside: this is indeed the Borel–Kolmogorov paradox.) This means that \begin{align} E[X\mid Y=mX] = \frac{\int x^2 e^{-(1+\lambda m)x}\: dx}{\int x e^{-(1+\lambda m)x}\: dx} = \frac{2}{1+\lambda m}. \end{align}

The distribution of slopes is the distribution of $M:=Y/X$: \begin{align} f_{M}(m) &= \int_0^\infty f_Y(y)\, f_{1/X}(m/y)\, y^{-1}\:dy\\ &= \int_0^\infty \lambda e^{-\lambda y}\left(\frac{y}{m}\right)^2 e^{-y/m}\, y^{-1} \:dy\\ &= \frac{\lambda}{m^2} \int_0^\infty y e^{-y(\lambda + \frac{1}{m})} \:dy \\ &= \frac{\lambda}{m^2}\frac{1}{(\frac{1}{m} + \lambda)^2}\\ &= \frac{\lambda}{(1+\lambda m)^2}. \end{align}

The Law of Iterated Expectations holds for this definition of the conditional density: \begin{align} \int E[X\mid Y/X = m] \: dP(M\leq m) = \int_0^\infty \frac{2\lambda}{(1+\lambda m)^3} \:dm = -\frac{1}{(1+\lambda m)^2}\bigg|_0^\infty = 1 = EX. \end{align}
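To see the two competing definitions in action, here is a minimal Monte Carlo sketch (the values of $\lambda$, $m$, and the band width $h$ are my own illustrative choices, not from the post). Conditioning on the slope, $m \le Y/X \le m+h$, reproduces the weighted answer $2/(1+\lambda m)$; conditioning on the vertical gap, $0 \le Y - mX \le h$, reproduces the unweighted $1/(1+\lambda m)$.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, m, h, n = 2.0, 1.5, 0.01, 10_000_000   # illustrative values

x = rng.exponential(scale=1.0, size=n)         # X ~ Exp(1)
y = rng.exponential(scale=1.0 / lam, size=n)   # Y ~ Exp(lam); numpy's scale is 1/rate

# Conditioning on the slope: m <= Y/X <= m + h
slope = (y >= m * x) & (y <= (m + h) * x)
print(x[slope].mean(), 2 / (1 + lam * m))      # both close to 0.5

# Conditioning on the vertical gap: 0 <= Y - m*X <= h
band = (y >= m * x) & (y <= m * x + h)
print(x[band].mean(), 1 / (1 + lam * m))       # both close to 0.25
```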

  • I'm sorta confused: $X \mid Y=mX$ seems like a circular, or not well-defined, description of a variable. Would you please help me understand better?
    – Alborz
    Commented Aug 13, 2023 at 23:30
  • @Alborz It's shorthand for $$E(X\mid Y=mX) \overset{\text{def}}{=}\lim_{h\to 0}E(X\mid m\leq Y/X \leq m+h)$$ (amongst other things...)
    – Andrew
    Commented Aug 13, 2023 at 23:32
  • @AndrewZhang But $X$ and $Y$ are independent, so doesn't $E(X \mid Y = mX) = E(X)$?
    – Alborz
    Commented Aug 13, 2023 at 23:34
  • I'm actually not sure the proposed calculation is correct; I would have to check it. Unfortunately, for this sort of thing, there are (at least) two ways to interpret such a notation, and they generally lead to two different answers. Indeed, one can also interpret $$E(X\mid Y = mX) \overset{\text{def}}{=}\lim_{h\to 0}E(X\mid 0\leq Y-mX \leq h),$$ which is different from what I initially proposed. Regarding your question, I don't see how that follows @Alborz
    – Andrew
    Commented Aug 13, 2023 at 23:40
  • I know there are some theoretical issues when you use continuous RVs, but it makes just as much sense if you use some sort of discrete RV where the masses are decreasing. For example, let $X$ be the number of consecutive heads on one coin, let $Y$ be the number on a second coin. The expected value of $X$ is 2, but the expected value of $X$ conditional on knowing that $X$ is the same as $Y$ is not 2. Commented Aug 13, 2023 at 23:51

1 Answer


The first question to ask is: what is the set of outcomes for which $Y = mX$, for a fixed and known $m$? This is just the set of ordered pairs $$\mathcal S = \{(x,y) \in \mathbb R^2 \mid x > 0,\ y = mx \}.$$ Visualized geometrically in the Cartesian coordinate plane, this is an (open) ray in the first quadrant that is a subset of the line $y = mx$. The joint density of $X$ and $Y$ is simply $$f_{X,Y}(x,y) = \lambda e^{-x} e^{-\lambda y} \mathbb 1(x > 0) \mathbb 1 (y > 0),$$ and so, given that the outcome lies on this ray, the probability density of $X$ must be proportional to

$$f_{X \mid Y = mX}(x) \propto f_{X,Y}(x,mx) = f_X(x) f_Y(mx) = \lambda e^{-x} e^{-\lambda m x} \propto e^{-(\lambda m + 1) x}.$$ This implies the conditional distribution is exponential with rate $\lambda m + 1$, hence its expectation is $1/(\lambda m + 1)$.
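As a quick numerical sanity check (a sketch of mine; the values of $\lambda$ and $m$ are arbitrary), one can confirm that the normalized density integrates to one and has mean $1/(\lambda m + 1)$:

```python
import numpy as np
from scipy.integrate import quad

lam, m = 2.0, 1.5            # arbitrary illustrative values
rate = lam * m + 1.0

density = lambda t: rate * np.exp(-rate * t)
total, _ = quad(density, 0, np.inf)                  # should be 1
mean, _ = quad(lambda t: t * density(t), 0, np.inf)  # should be 1/rate
print(total, mean, 1.0 / rate)
```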

Note that your original calculation could not be correct because it did not depend on $\lambda$; $\lambda$ is informative of $Y$ and, in turn, informative of $X$.

I do not see any intrinsic paradox here. The idea is that the knowledge that the waiting time of one variable is exactly $m$ times the waiting time of the other gives you additional information about the outcome of both.

A simple way to see this is to look at the discrete analogue, the geometric distribution. Suppose $X \sim \operatorname{Geometric}(1/2)$ and $Y \sim \operatorname{Geometric}(1/4)$, where $X$ and $Y$ are independent. For convenience, let the parametrizations have strictly positive support, so their individual means are $2$ and $4$, respectively. Then the event $Y = 2X$ corresponds to the set of outcomes $$\{(1,2), (2,4), (3,6), \ldots\},$$ and so the probability mass function of $X$ is given by $$\Pr[X = x \mid Y = 2X] = \frac{\Pr[X = x]\Pr[Y = 2x]}{\sum_{x=1}^\infty \Pr[X = x]\Pr[Y = 2x]} = \frac{(1/2)^x (3/4)^{2x-1} (1/4)}{3/23} = \frac{23}{9} \left(\frac{9}{32}\right)^x,$$ for $x \in \mathbb Z^+$. That is to say, $X$ given that $Y = 2X$ is geometric with parameter $23/32 \ne 1/2$.

In this case, we can use a coin-flipping analogy to interpret such a result: we have coin A, which is fair, and coin B, which has a $1/4$ probability of showing heads. Your friend flips each coin until its first head and counts the numbers $(X, Y)$ of flips that were needed for coins A and B, respectively. He reports that coin B took exactly twice as many flips as coin A. This is additional information about $X$ that should affect your idea of the average number of flips for coin A; indeed, it also furnishes additional information about $Y$. In fact, it should be plainly obvious that this information must affect the posterior marginal of $Y$, since in the discrete case, $Y$ cannot be odd when $m = 2$.
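A short exact-arithmetic check of this computation (my own sketch, mirroring the numbers above):

```python
from fractions import Fraction

p1, p2, m = Fraction(1, 2), Fraction(1, 4), 2

# P[X = x] * P[Y = m*x] = C * q**x, with constant C and common ratio q:
q = (1 - p1) * (1 - p2) ** m                # 9/32
C = p1 * p2 / ((1 - p1) * (1 - p2))         # 1/3

Z = C * q / (1 - q)                         # normalizer: sum over x >= 1
print(Z)                                    # 3/23, as in the text

# The conditional pmf is proportional to q**x: geometric with success prob 1 - q.
print(1 - q)                                # 23/32
```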

As an exercise, what is the general case for geometric $X$ and $Y$ with parameters $p_1$ and $p_2$, respectively, and for some positive integer constant $m$?

Another exercise: Is there a choice of $p_1, p_2, m$ such that the posterior for $X$ remains unchanged? Why or why not?
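If you want to explore both exercises numerically before solving them, here is a small brute-force helper (the function name, truncation scheme, and parameter values are mine, not part of the answer):

```python
def conditional_pmf(p1, p2, m, x_max=200):
    """Truncated conditional pmf of X given Y = mX, for geometric X, Y on {1, 2, ...}."""
    pX = lambda x: (1 - p1) ** (x - 1) * p1      # P[X = x]
    pY = lambda y: (1 - p2) ** (y - 1) * p2      # P[Y = y]
    w = [pX(x) * pY(m * x) for x in range(1, x_max + 1)]
    Z = sum(w)                                   # truncated normalizer
    return [wi / Z for wi in w]

# Worked example from the answer: p1 = 1/2, p2 = 1/4, m = 2.
post = conditional_pmf(0.5, 0.25, 2)
print(post[0], 23 / 32)    # both ~0.71875

# Exercise 2 probe: compare post to the prior pmf of X for various (p1, p2, m).
prior = [(1 - 0.5) ** (x - 1) * 0.5 for x in range(1, 201)]
print(max(abs(a - b) for a, b in zip(post, prior)))   # far from 0 here
```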

  • I think the (unequivocal) assertion that $f_{X\mid Y=mX} \propto f(x,mx)$ is problematic, essentially due to the reasons I wrote in my comments.
    – Andrew
    Commented Aug 14, 2023 at 0:55
  • You're right, left out a lambda. Commented Aug 14, 2023 at 1:34
  • Of course the extra information can change the expectation. The "paradox" here is that the extra information (irrespective of the value of $m$) always changes it in the same direction. This is impossible since $E(E(X|Z))=E(X)$. Therefore the conditional distribution you obtain must be incorrect. Commented Aug 14, 2023 at 9:12
  • @EspeciallyLime Yes, I think you are right. In particular, rather than taking a line integral, one has to "weight" the line integral. Commented Aug 14, 2023 at 12:35
  • @EspeciallyLime "$E(E(X|Z))=E(X)$" doesn't apply to $E(X\mid Y=mX),$ as $\{Y=mX\}$ is an event, not a random variable. Of course $E(E(X\mid {1}_{\{Y=mX\}}))=E(X),$ but that's a different matter.
    – r.e.s.
    Commented Aug 14, 2023 at 17:36

