
Suppose I have a general random variable $X$, which is not necessarily continuous. It makes sense to me that I should be able to write:

$$E(X|X>c)=\dfrac{E(X\cdot 1_{X>c})}{P(X>c)}$$

If I assume that the variable is purely discrete or purely continuous, the usual conditional probability/density definitions can be used; but I can't find an argument for a general distribution.
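For concreteness, here is a minimal Monte Carlo sanity check (a sketch only, assuming NumPy and SciPy are available), using the mixed distribution $X=\max(Z,0)$ with $Z\sim N(0,1)$, which has an atom at $0$ and a continuous part on $(0,\infty)$:

```python
import numpy as np
from scipy.stats import norm

# Mixed example: X = max(Z, 0) with Z ~ N(0, 1) has P(X = 0) = 1/2 (an atom)
# plus a continuous part on (0, inf), so it is neither purely discrete nor
# purely continuous.
rng = np.random.default_rng(0)
x = np.maximum(rng.standard_normal(2_000_000), 0.0)
c = 0.5

mc = x[x > c].mean()              # Monte Carlo estimate of E(X | X > c)
# For this X and c > 0: E(X * 1_{X>c}) = phi(c) and P(X > c) = 1 - Phi(c).
exact = norm.pdf(c) / norm.sf(c)  # E(X * 1_{X>c}) / P(X > c) in closed form
print(mc, exact)                  # the two agree up to Monte Carlo error
```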

  • For any integrable random variable $X$ and event $A$ with $P(A)>0$, $E(X|A)$ is defined to be $E(X \cdot 1_A)/P(A)$.
    – Julius
    Commented May 29 at 11:22
  • Do you know a book that defines it this way?
    – Yeet
    Commented May 29 at 22:06
  • @Yeet In contrast to conditional (on an event) probability, I'd say that such a conditional expectation is more rarely defined explicitly in textbooks. Instead, conditional expectation with respect to a sigma-algebra is defined in most cases (whereas conditional probability with respect to a sigma-algebra is usually not discussed explicitly).
    – SBF
    Commented May 31 at 11:19

3 Answers


The solution hinges on how $\mathbf{E}[\cdot \mid X > c]$ is defined.

If it is defined as the expectation with respect to the conditional probability measure $\mathbf{P}(\cdot \mid X > c)$, then the proof goes by invoking the standard machinery as follows:

Theorem. For any random variable $Y$ that is either a.s.-non-negative or integrable with respect to $\mathbf{P}$, we have

$$ \mathbf{E}[Y \mid X > c] = \frac{\mathbf{E}[Y \cdot \mathbf{1}_{\{X > c\}}]}{\mathbf{P}(X > c)} \tag{*}\label{e:wtp} $$

Step 1. Note that if $\mathbf{Q}$ is any probability measure and $\mathbf{E}_{\mathbf{Q}}$ is the associated expectation, then we have $\mathbf{E}_{\mathbf{Q}}[\mathbf{1}_A] = \mathbf{Q}(A) $ for any event $A$. Now applying this to the choice $\mathbf{Q}(\cdot) = \mathbf{P}(\cdot \mid X > c)$, for any event $A$ we get

\begin{align*} \mathbf{E}[\mathbf{1}_A \mid X > c] = \mathbf{P}(A \mid X > c) &= \frac{\mathbf{P}(A \cap \{ X > c \})}{\mathbf{P}(X > c)}. \end{align*}

Then applying the same argument for $\mathbf{Q} = \mathbf{P}$, the right-hand side becomes

\begin{align*} \frac{\mathbf{P}(A \cap \{ X > c \})}{\mathbf{P}(X > c)} &= \frac{\mathbf{E}[\mathbf{1}_{A \cap \{ X > c \}}]}{\mathbf{P}(X > c)} = \frac{\mathbf{E}[\mathbf{1}_A \cdot \mathbf{1}_{\{ X > c \}}]}{\mathbf{P}(X > c)}. \end{align*}

This proves that $\eqref{e:wtp}$ holds when $Y = \mathbf{1}_A$ is an indicator function.

Step 2. By the linearity of expectation, $\eqref{e:wtp}$ holds when $Y$ is a linear combination of finitely many indicator functions.
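Explicitly, if $Y = \sum_{i=1}^{m} a_i \mathbf{1}_{A_i}$ for events $A_i$ and constants $a_i \in \mathbb{R}$, then

$$ \mathbf{E}[Y \mid X > c] = \sum_{i=1}^{m} a_i \, \mathbf{E}[\mathbf{1}_{A_i} \mid X > c] = \sum_{i=1}^{m} a_i \, \frac{\mathbf{E}[\mathbf{1}_{A_i} \cdot \mathbf{1}_{\{X > c\}}]}{\mathbf{P}(X > c)} = \frac{\mathbf{E}[Y \cdot \mathbf{1}_{\{X > c\}}]}{\mathbf{P}(X > c)}, $$

using Step 1 for the middle equality and linearity of the two expectations for the outer ones.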

Step 3. Now let $Y$ be any non-negative random variable. Then

$$ Y_n = \min\biggl\{ \frac{\lfloor 2^n Y \rfloor}{2^n}, n \biggr\} = \sum_{k=0}^{n2^n - 1} \frac{k}{2^n} \mathbf{1}_{\{\frac{k}{2^n} \leq Y < \frac{k+1}{2^n} \}} + n \mathbf{1}_{\{Y \geq n\}} $$

increases monotonically to $Y$ as $n \to \infty$. Also, by Step 2,

\begin{align*} \mathbf{E}[ Y_n \mid X > c] &= \frac{\mathbf{E}[Y_n \cdot \mathbf{1}_{\{X > c\}}]}{\mathbf{P}(X > c)}. \end{align*}

Letting $n \to \infty$ and applying the monotone convergence theorem to both sides (under $\mathbf{P}(\cdot \mid X > c)$ on the left and under $\mathbf{P}$ on the right) then shows that $\eqref{e:wtp}$ holds in this case.
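As an aside, the truncation $Y_n$ is easy to inspect numerically; a minimal sketch (illustration only, assuming NumPy, with an arbitrary sample vector):

```python
import numpy as np

def dyadic_truncation(y, n):
    """Y_n = min(floor(2**n * y) / 2**n, n): round y down to a multiple of 2**-n, capped at n."""
    return np.minimum(np.floor((2.0 ** n) * y) / (2.0 ** n), float(n))

y = np.array([0.3, 1.7, 4.9, 12.0])
for n in (1, 2, 4, 8):
    print(n, dyadic_truncation(y, n))  # each entry increases monotonically toward y
```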

Step 4. If $\mathbf{E}[|Y|] < \infty$, then writing $Y = Y_+ - Y_-$ for the positive part $Y_+ = \max\{Y, 0\}$ and the negative part $Y_- = \max\{-Y, 0\}$ and then applying Step 3 to each of $Y_{\pm}$ shows that $\eqref{e:wtp}$ holds in this case.
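Spelled out, using Step 3 for each of $Y_{\pm}$ (both conditional expectations are finite since $\mathbf{E}[Y_{\pm}] \leq \mathbf{E}[|Y|] < \infty$):

$$ \mathbf{E}[Y \mid X > c] = \mathbf{E}[Y_+ \mid X > c] - \mathbf{E}[Y_- \mid X > c] = \frac{\mathbf{E}[Y_+ \cdot \mathbf{1}_{\{X > c\}}] - \mathbf{E}[Y_- \cdot \mathbf{1}_{\{X > c\}}]}{\mathbf{P}(X > c)} = \frac{\mathbf{E}[Y \cdot \mathbf{1}_{\{X > c\}}]}{\mathbf{P}(X > c)}. $$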


By definition of $E[X\mid X>c]$:
$$E[X\mid\sigma(X>c)]=E[X\mid X>c]1_{X>c}+E[X\mid X\leq c]1_{X\leq c}$$
And by definition of conditional expectation:
$$E[X1_{X>c}]=E\big[E[X\mid\sigma(X>c)]1_{X>c}\big]=E\big[E[X\mid X>c]1_{X>c}\big]=E[X\mid X>c]E[1_{X>c}]=E[X\mid X>c]P(X>c)$$
I would like to clarify this point:

Let $X$ be a random variable defined on $(\Omega,\mathcal{F},\mathbf{P})$, and let $\mathcal{A},\mathcal{H}\in \mathcal{F}$ be events with $0<\mathbf{P}(\mathcal{H})<1$.
According to the definition of conditional expectation, it is easy to show that: $$\mathbf{E}[X\mid\sigma(\mathcal{H})]=\mathbf{E}[X\mid I_{\mathcal{H}}]=\frac{\mathbf{E}[XI_\mathcal{H}]}{\mathbf{P}(\mathcal{H})}I_{\mathcal{H}}+\frac{\mathbf{E}[XI_\mathcal{\bar{H}}]}{\mathbf{P}(\mathcal{\bar{H}})}I_{\mathcal{\bar{H}}}$$

We call these constants $\mathbf{E}[X\mid\mathcal{H}]$ and $\mathbf{E}[X\mid \mathcal{\bar{H}}]$, respectively. That is to say:

$$\mathbf{E}[X\mid\mathcal{H}]=\frac{\mathbf{E}[XI_\mathcal{H}]}{\mathbf{P}(\mathcal{H})}\quad\text{and}\quad \mathbf{E}[X\mid \mathcal{\bar{H}}]=\frac{\mathbf{E}[XI_\mathcal{\bar{H}}]}{\mathbf{P}(\mathcal{\bar{H}})}$$ Similarly: $$\mathbf{P}[\mathcal{A}\mid\sigma(\mathcal{H})]=\mathbf{P}[\mathcal{A}\mid I_{\mathcal{H}}]=\frac{\mathbf{P}[\mathcal{A}\cap\mathcal{H}]}{\mathbf{P}(\mathcal{H})}I_{\mathcal{H}}+\frac{\mathbf{P}[\mathcal{A}\cap\mathcal{\bar{H}}]}{\mathbf{P}(\mathcal{\bar{H}})}I_{\mathcal{\bar{H}}}$$

We call these constants $\mathbf{P}[\mathcal{A}\mid\mathcal{H}]$ and $\mathbf{P}[\mathcal{A}\mid \mathcal{\bar{H}}]$, respectively. That is to say:

$$\mathbf{P}[\mathcal{A}\mid\mathcal{H}]=\frac{\mathbf{P}[\mathcal{A}\cap\mathcal{H}]}{\mathbf{P}(\mathcal{H})}\quad\text{and}\quad\mathbf{P}[\mathcal{A}\mid \mathcal{\bar{H}}]=\frac{\mathbf{P}[\mathcal{A}\cap\mathcal{\bar{H}}]}{\mathbf{P}(\mathcal{\bar{H}})}$$

And since $\mathbf{E}[X\mid\sigma(\mathcal{H})]=\displaystyle\int X\,d\mathbf{P}(\cdot\mid\sigma(\mathcal{H}))$, it follows easily that: $$\mathbf{E}[X\mid\mathcal{H}]=\int X\,d\mathbf{P}(\cdot\mid\mathcal{H})\quad\text{and}\quad \mathbf{E}[X\mid \mathcal{\bar{H}}]=\int X\,d\mathbf{P}(\cdot\mid\mathcal{\bar{H}})$$
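For completeness, the "easy to show" identity can be verified against the defining property of conditional expectation: the right-hand side is $\sigma(\mathcal{H})$-measurable (constant on $\mathcal{H}$ and on $\mathcal{\bar{H}}$), and for the set $\mathcal{H}\in\sigma(\mathcal{H})=\{\emptyset,\mathcal{H},\mathcal{\bar{H}},\Omega\}$ we have $$\int_{\mathcal{H}}\left(\frac{\mathbf{E}[XI_\mathcal{H}]}{\mathbf{P}(\mathcal{H})}I_{\mathcal{H}}+\frac{\mathbf{E}[XI_\mathcal{\bar{H}}]}{\mathbf{P}(\mathcal{\bar{H}})}I_{\mathcal{\bar{H}}}\right)d\mathbf{P}=\frac{\mathbf{E}[XI_\mathcal{H}]}{\mathbf{P}(\mathcal{H})}\,\mathbf{P}(\mathcal{H})=\mathbf{E}[XI_\mathcal{H}]=\int_{\mathcal{H}}X\,d\mathbf{P},$$ with the same computation on $\mathcal{\bar{H}}$; the cases $\emptyset$ and $\Omega$ then follow by additivity.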

  • I'm not sure I follow. What is the definition of $\mathbf{E}\left( X \mid X > c \right)$ you use? Your first line suggests the definition is $\frac{\mathbf{E}\left( X \mid \sigma \left( X > c \right) \right) - \mathbf{E}\left( X \mid X \leq c \right) \mathbf{1}_{X \leq c}}{\mathbf{1}_{X > c}}$
    – msantama
    Commented May 30 at 3:25
  • If you meant to write $\mathbf{E}\left( X \mid \sigma \left( X > c \right) \right) = \mathbf{E}\left( X \mid \sigma \left( X > c \right) \right) \mathbf{1}_{X > c} + \mathbf{E}\left( X \mid \sigma \left( X \leq c \right) \right) \mathbf{1}_{X \leq c}$ then I certainly agree. But then $\mathbf{E}\left( \mathbf{E}\left( X \mid \sigma \left( X > c \right) \right) \mathbf{1}_{X > c} \right) = \mathbf{E}\left( \mathbf{E}\left( X \mid X > c \right) \mathbf{1}_{X > c} \right)$ does not follow.
    – msantama
    Commented May 30 at 3:35
  • $E[X\mid\sigma(X>c)]$ is necessarily of the form $A1_{X>c}+B1_{X\leq c}$. We denote the constants $A$ and $B$ by $E[X\mid X>c]$ and $E[X\mid X\leq c]$, respectively. Similarly $P[X\in A\mid\sigma(X>c)]=P[X\in A\mid X>c]1_{X>c}+P[X\in A\mid X\leq c ]1_{X\leq c}$. Certainly the conditional expectations are integrals with respect to these conditional probabilities.
    – Speltzu
    Commented May 30 at 6:02

I will expand on Sangchul's comment that the solution depends on how we define $\mathbf{E}\left( X \mid X \in B \right)$, where $B$ is any Borel set. Suppose we work in a probability space $\left( \Omega, \mathcal{F}, \mathbf{P} \right)$ and let $X, Y$ be Borel-measurable random variables.

For any sub-$\sigma$-algebra $\mathcal{H} \subseteq \mathcal{F}$ we define the conditional expectation $\mathbf{E}\left( X \mid \mathcal{H} \right)$ to be any $\mathcal{H}$-measurable random variable such that

$$\int_{H} X \ \text{d}\mathbf{P} = \int_H \mathbf{E}\left( X \mid \mathcal{H} \right) \ \text{d}\mathbf{P} \quad \forall H \in \mathcal{H}$$

We further extend this definition by understanding $\mathbf{E}\left( X \mid Y \right)$ to mean $\mathbf{E}\left( X \mid \sigma_Y \right)$, where $\sigma_Y$ denotes the $\sigma$-algebra generated by $Y$.

One option is to interpret $\mathbf{E}\left( X \mid X \in B \right)$ as $\mathbf{E}\left( X \mid \mathbf{1}_{X \in B} \right)$, i.e., conditioned on a random variable. By the preceding definition this makes $\mathbf{E}\left( X \mid X \in B \right)$ a well-defined random variable.

Another reasonable choice is to regard $\mathbf{E}\left( X \mid X \in B \right)$ as $\mathbf{E}\left( X \mid X^{-1}\left( B \right) \right)$, i.e., conditioned on an event. We still must define what this means, and I think Sangchul gives a great answer using a conditional probability measure.

A neat property I think worth mentioning concerns the consistency of the two interpretations. It can be shown that for any $A \in \mathcal{F}$ with $\mathbf{P}\left( A \right) > 0$,

$$\mathbf{E}\left( X \mid \mathbf{1}_A \right)\left( \omega \right) = \mathbf{E}\left( X \mid A \right)$$

for almost all $\omega \in A$.
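A sketch of why this holds: $\mathbf{E}\left( X \mid \mathbf{1}_A \right)$ is $\sigma\left( \mathbf{1}_A \right)$-measurable, hence a.s. equal to some constant $k$ on $A$, and taking $H = A$ in the defining property gives

$$ k \, \mathbf{P}\left( A \right) = \int_A \mathbf{E}\left( X \mid \mathbf{1}_A \right) \ \text{d}\mathbf{P} = \int_A X \ \text{d}\mathbf{P} = \mathbf{E}\left( X \cdot \mathbf{1}_A \right), $$

so $k = \mathbf{E}\left( X \cdot \mathbf{1}_A \right) / \mathbf{P}\left( A \right) = \mathbf{E}\left( X \mid A \right)$.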

