
I'm struggling with the concept of conditional expectation. First of all, if you have a link to any explanation that goes beyond showing that it is a generalization of elementary intuitive concepts, please let me know.

Let me get more specific. Let $\left(\Omega,\mathcal{A},P\right)$ be a probability space and $X$ an integrable real random variable defined on $(\Omega,\mathcal{A},P)$. Let $\mathcal{F}$ be a sub-$\sigma$-algebra of $\mathcal{A}$. Then $E[X|\mathcal{F}]$ is the a.s. unique random variable $Y$ such that $Y$ is $\mathcal{F}$-measurable and for any $A\in\mathcal{F}$, $E\left[X1_A\right]=E\left[Y1_A\right]$.

The common interpretation seems to be: "$E[X|\mathcal{F}]$ is the expectation of $X$ given the information of $\mathcal{F}$." I'm finding it hard to get any meaning from this sentence.

  1. In elementary probability theory, expectation is a real number. So the sentence above makes me think of a real number instead of a random variable. This is reinforced by $E[X|\mathcal{F}]$ sometimes being called "conditional expected value". Is there some canonical way of getting real numbers out of $E[X|\mathcal{F}]$ that can be interpreted as elementary expected values of something?

  2. In what way does $\mathcal{F}$ provide information? Knowing that some event occurred is something I would call information, and I have a clear picture of conditional expectation in this case. To me $\mathcal{F}$ is not a piece of information, but rather a "complete" set of pieces of information one could possibly acquire in some way.

Maybe you say there is no real intuition behind this, $E[X|\mathcal{F}]$ is just what the definition says it is. But then, how does one see that a martingale is a model of a fair game? Surely, there must be some intuition behind that!

I hope you have got some impression of my misconceptions and can rectify them.

    This is not the definition of conditional expectation with which I'm familiar. Do you have a reference?
    @Qiaochu: I'm using Klenke's Probability Theory, but it's the same on Wikipedia.
    – Stefan
    Commented Feb 24, 2011 at 21:19
  You may want to read the answer to this question, math.stackexchange.com/questions/23093/…, where user joriki explains what it means for event A to be conditionally dependent on event B.
    – Uticensis
    Commented Feb 24, 2011 at 22:48
  A wonderful explanation of conditional expectation can be found here: ma.utexas.edu/users/gordanz/notes/conditional_expectation.pdf
  Commented Aug 12, 2016 at 14:08

Maybe this simple example will help. I use it when I teach conditional expectation.

(1) The first step is to think of ${\mathbb E}(X)$ in a new way: as the best estimate for the value of a random variable $X$ in the absence of any information, where "best" means minimizing the mean squared error. For a guess $e$, this error is $${\mathbb E}[(X-e)^2]={\mathbb E}[X^2-2eX+e^2]={\mathbb E}(X^2)-2e{\mathbb E}(X)+e^2.$$ Differentiating with respect to $e$ gives $2e-2{\mathbb E}(X)$, which is zero at $e={\mathbb E}(X)$.

For example, if I throw a fair die and you have to estimate its value $X$, according to the analysis above, your best bet is to guess ${\mathbb E}(X)=3.5$. On specific rolls of the die, this will be an over-estimate or an under-estimate, but in the long run it minimizes the mean square error.
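The die example can be checked numerically. The following is a minimal sketch, not part of the original argument: it computes the exact mean squared error of a guess with rational arithmetic and compares ${\mathbb E}(X)$ against a grid of alternative guesses. The names `mean_squared_error` and `candidates`, and the grid spacing of $1/4$, are illustrative assumptions.

```python
from fractions import Fraction

# Faces of a fair die, each with probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)

def mean_squared_error(guess):
    """Exact value of E[(X - guess)^2] for a fair die roll X."""
    return sum(prob * (x - guess) ** 2 for x in faces)

expected_value = sum(prob * x for x in faces)  # E(X)

# Compare E(X) against a grid of alternative guesses from 1 to 6 in steps of 1/4.
candidates = [Fraction(k, 4) for k in range(4, 25)]
best_guess = min(candidates, key=mean_squared_error)

print(expected_value, best_guess, mean_squared_error(best_guess))
```

On this grid the minimizer coincides with ${\mathbb E}(X)=7/2$, and the minimal error is the variance of the die roll.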

(2) What happens if you do have additional information? Suppose that I tell you that $X$ is an even number. How should you modify your estimate to take this new information into account?

The mental process may go something like this: "Hmmm, the possible values were $\lbrace 1,2,3,4,5,6\rbrace$ but we have eliminated $1,3$ and $5$, so the remaining possibilities are $\lbrace 2,4,6\rbrace$. Since I have no other information, they should be considered equally likely and hence the revised expectation is $(2+4+6)/3=4$".

Similarly, if I were to tell you that $X$ is odd, your revised (conditional) expectation is 3.

(3) Now imagine that I will roll the die and I will tell you the parity of $X$; that is, I will tell you whether the die comes up odd or even. You should now see that a single numerical response cannot cover both cases. You would respond "3" if I tell you "$X$ is odd", while you would respond "4" if I tell you "$X$ is even". A single numerical response is not enough because the particular piece of information that I will give you is itself random. In fact, your response is necessarily a function of this particular piece of information. Mathematically, this is reflected in the requirement that ${\mathbb E}(X\ |\ {\cal F})$ must be $\cal F$ measurable.
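To connect step (3) with the formal definition, here is a small sketch (my own illustration, with hypothetical helper names `expectation` and `Y`) that builds the "respond 3 if odd, 4 if even" function on $\Omega=\lbrace 1,\dots,6\rbrace$ and checks the defining property ${\mathbb E}[X1_A]={\mathbb E}[Y1_A]$ for every event $A$ in the $\sigma$-algebra generated by the parity.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]          # outcomes of one fair die roll
P = {w: Fraction(1, 6) for w in omega}

def X(w):
    return w                         # the face shown

def expectation(f, event):
    """E[f * 1_event], an exact sum over the outcomes in `event`."""
    return sum(P[w] * f(w) for w in event)

odd = [w for w in omega if w % 2 == 1]
even = [w for w in omega if w % 2 == 0]

def Y(w):
    """Candidate for E(X | F): average of X over the parity class of w."""
    block = odd if w % 2 == 1 else even
    return expectation(X, block) / sum(P[v] for v in block)

# F = sigma(parity) has exactly four events.
F = [[], odd, even, omega]
checks = [expectation(X, A) == expectation(Y, A) for A in F]
print([Y(w) for w in omega], checks)
```

Here `Y` is a function on $\Omega$ that is constant on each parity class, which is exactly what $\cal F$-measurability means in this finite setting.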

I think this covers point 1 in your question, and tells you why a single real number is not sufficient. Also concerning point 2, you are correct in saying that the role of $\cal F$ in ${\mathbb E}(X\ |\ {\cal F})$ is not a single piece of information, but rather tells what possible specific pieces of (random) information may occur.

    Great answer, thank you. "In fact, your response is necessarily a function of this particular piece of information." But then, why is $E(X|\mathcal{F})$ defined on $\Omega$ and not on $\mathcal{F}$? This is central to my not understanding. If $\mathcal{F}$ is not generated by a disjoint family of sets, there is no such interpretation of $E(X|\mathcal{F})(\omega)$, is there?
    – Stefan
    Commented Feb 24, 2011 at 22:51
    The conditional expectation $E(X|{\cal F})$ is characterized by being $\cal F$ measurable and having certain integrals over $\cal F$ sets. This function is therefore only defined $P$ almost everywhere. Therefore, you are right; the pointwise value $E(X|{\cal F})(\omega)$ has no particular meaning.
    – user940
    Commented Feb 25, 2011 at 15:02
    Wonderful explanation.
    – JT_NL
    Commented Apr 20, 2011 at 23:05
    I want to add that if a r.v. $Z$ (defined on $\Omega$) is measurable with respect to $\mathcal F$, then one can indeed interpret $Z$ as a function of information of $\mathcal F$ in the following sense: if $\mathcal F$ represents the information for knowing outcome of another r.v. $Y$ (also defined on $\Omega$), i.e., if $\mathcal F = \sigma(Y)$, then $Z$ is a measurable function of $Y$, i.e., $Z = \phi(Y)$ (a.e.) for some $\phi$ where $\phi$ is some measurable function from the codomain of $Y$ (which would be $\mathbb R$ if $Y$ were a real-valued r.v.) to the codomain of $Z$. and...
    – Jisang Yoo
    Commented Apr 26, 2014 at 16:04
    .. If $\mathcal F$ represents the information for knowing joint outcome of a (finite or countable) family of r.v.s $Y_i$ ($i \in I$), i.e., if $\mathcal F = \sigma(Y_i, i \in I)$, then $Z$ is a measurable function of the $Y_i$, i.e., $Z = \phi(Y_i, i \in I)$ (a.e.) for some $\phi$ that is a measurable function from the product of codomains of $Y_i$ to the codomain of $Z$.
    – Jisang Yoo
    Commented Apr 26, 2014 at 16:04

I think a good way to answer question 2 is as follows.

I am performing an experiment, whose outcome can be described by an element $\omega$ of some set $\Omega$. I am not going to tell you the outcome, but I will allow you to ask certain yes/no questions about it. (This is like "20 questions", but infinite sequences of questions will be allowed, so it's really "$\aleph_0$ questions".) We can associate a yes/no question with the set $A \subset \Omega$ of outcomes for which the answer is "yes".

Now, one way to describe some collection of "information" is to consider all the questions which could be answered with that information. (For example, the 2010 Encyclopedia Britannica is a collection of information; it can answer the questions "Is the dodo extinct?" and "Is the elephant extinct?" but not the question "Did Justin Bieber win a 2011 Grammy?") This, then, would be a set $\mathcal{F} \subset 2^\Omega$.

If I know the answer to a question $A$, then I also know the answer to its negation, which corresponds to the set $A^c$ (e.g. "Is the dodo not-extinct?"). So any information that is enough to answer question $A$ is also enough to answer question $A^c$. Thus $\mathcal{F}$ should be closed under taking complements. Likewise, if I know the answer to questions $A,B$, I also know the answer to their disjunction $A \cup B$ ("Are either the dodo or the elephant extinct?"), so $\mathcal{F}$ must also be closed under (finite) unions. Countable unions require more of a stretch, but imagine asking an infinite sequence of questions "converging" on a final question. ("Can elephants live to be 90? Can they live to be 99? Can they live to be 99.9?" In the end, I know whether elephants can live to be 100.)

I think this gives some insight into why a $\sigma$-field can be thought of as a collection of information.
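On a finite $\Omega$ the closure argument above can be carried out mechanically. The sketch below is only an illustration under that finiteness assumption (where countable unions reduce to finite ones); the function name `generate_sigma_algebra` and the two sample questions about a die roll are hypothetical.

```python
from itertools import combinations

def generate_sigma_algebra(omega, questions):
    """Close a family of subsets of a finite set under complement and union."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(q) for q in questions}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # If A is answerable, so is its negation.
        new_sets = {omega - a for a in current}
        # If A and B are answerable, so is "A or B".
        new_sets |= {a | b for a, b in combinations(current, 2)}
        if not new_sets <= family:
            family |= new_sets
            changed = True
    return family

# Hypothetical example: a die roll, with the questions
# "is it even?" and "is it at most 2?".
omega = {1, 2, 3, 4, 5, 6}
F = generate_sigma_algebra(omega, [{2, 4, 6}, {1, 2}])
print(len(F))
print(frozenset({2}) in F, frozenset({4}) in F)
```

With these two questions you can pin down the outcome $2$ exactly (even and at most 2), but you cannot tell $4$ from $6$, so $\lbrace 4\rbrace$ is not an answerable question while $\lbrace 4,6\rbrace$ is.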

  Great job on this answer!
    – Don Shanil
    Commented Jul 11, 2016 at 12:16
  The Borel $\sigma$-field $\mathcal{B}$ contains every singleton $\{\omega\}$, so with the information in $\mathcal{B}$ we'd be able to answer every question $A\subseteq \mathbb{R}$ by asking the oracle whether $\omega\in\{\omega\}$. So we should add the caveat that we can only decide questions that we can understand. A Vitali set, for instance, is not intelligible.
  @LaconianThinker: Yeah, in the case of $\mathbb{R}$, you can start by saying that you can ask whether $\omega < b$ for any $b$ (i.e. the set $(-\infty, b)$). This is a "computable" question in the sense that if you start computing decimal digits of $\omega$, you will eventually be able to tell whether it is less than $b$. And if you're able to ask a countable number of such questions, then you can determine whether $\omega$ is in your favorite Borel set $B$. But a countable number of such questions can never tell you whether $\omega$ is in the Vitali set.
  @NateEldredge I'm not sure you understand my point (maybe I don't, either). Naively, we'd say a set $A$ is measurable if, regardless of which $\omega$ was sampled, we can decide whether $\omega\in A$ by asking the oracle countably many questions of the form "is $\omega$ less than $b$?" or "is $\omega$ equal to $b$?". (I know the second case is redundant, but it will simplify my explanation). (continued)
  The problem is that any set (even a Vitali set!) would be measurable: if say, $\omega = 5$, we can ask the question "is $\omega$ equal to $5$?", and the answer ("yes") will give us complete information about $\omega$. Similarly for any other value $\omega$ could possibly have.

An example. Suppose that $X \sim {\rm binomial}(m,p)$ and $Y \sim {\rm binomial}(n,p)$ are independent ($0 < p < 1$). For any integer $0 \leq s \leq m+n$, we have $$ {\rm E}[X|X + Y = s] = \frac{m}{m + n}s. $$ This means that $$ {\rm E}[X|X + Y] = \frac{m}{m + n}(X+Y). $$ Note that ${\rm E}[X|X + Y]$ is a random variable which is a function of $X+Y$.
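The identity for ${\rm E}[X|X+Y=s]$ can be verified exactly for small parameters. This is a hedged sketch rather than a proof: the helper names `binomial_pmf` and `conditional_mean_of_X` and the values $m=3$, $n=5$, $p=1/3$ are illustrative choices of mine.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(Bin(n, p) = k) as an exact fraction."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def conditional_mean_of_X(s, m, n, p):
    """E[X | X + Y = s] for independent X ~ Bin(m, p), Y ~ Bin(n, p)."""
    # Joint weight of {X = x, Y = s - x}, for every x compatible with X + Y = s.
    weights = {
        x: binomial_pmf(x, m, p) * binomial_pmf(s - x, n, p)
        for x in range(max(0, s - n), min(m, s) + 1)
    }
    total = sum(weights.values())  # equals P(X + Y = s)
    return sum(x * w for x, w in weights.items()) / total

m, n, p = 3, 5, Fraction(1, 3)   # illustrative values
results = {s: conditional_mean_of_X(s, m, n, p) for s in range(m + n + 1)}
print(results)
```

For each $s$ the computed value agrees with $\frac{m}{m+n}s$; note that $p$ cancels out of the conditional distribution, so the result does not depend on it.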

Note that, in general, the conditional expectation of $X$ given $Z$, denoted ${\rm E}[X|Z]$, is defined as ${\rm E}[X|\sigma(Z)]$, where $\sigma(Z)$ is the $\sigma$-algebra generated by $Z$.

EDIT. In response to the OP's request, I note that the binomial distribution (which is discrete) plays no special role in the above example. For completely analogous results for the normal and gamma distributions (both are continuous) see this and this, respectively; for a substantial generalization, see this.

  @Shai: +1, nice answer! I was wondering whether the conditional expectation of a r.v. $X$ given another r.v. $Y$ is defined on the common domain $\Omega$ of both $X$ and $Y$, or on the codomain of $Y$, or can be either? The Wikipedia article says it is defined on the codomain of $Y$. See en.wikipedia.org/wiki/…
    – Tim
    Commented Feb 24, 2011 at 22:10
  @Tim: The conditional expectation is defined on the common domain $\Omega$.
    – Shai Covo
    Commented Feb 24, 2011 at 22:23
  Thanks for your answer. In this example, there is a connection between conditional expectation and elementary expectation. But I fail to see how this generalizes a) to non-discrete random variables and b) to $\sigma$-algebras $\mathcal{F}$ which are not of the form $\sigma(Y)$ for any random variable $Y$.
    – Stefan
    Commented Feb 24, 2011 at 22:25
  @Shai: Define $f$ by $E[X|Z]=f(Z)$. In your example $E[X|Z=s]=f(s)$. This is absurd if $P[Z=s]=0$. If you have something similar in that case, it would be helpful, though it probably won't answer all my questions.
    – Stefan
    Commented Feb 24, 2011 at 23:26
  @Shai: I didn't see your edit before posting my comment. So is it correct that $E[X|Z=s]=f(s)$ defines $E[X|Z=s]$ with the terminology of my last comment?
    – Stefan
    Commented Feb 24, 2011 at 23:37

You can think of the conditional expectation as the orthogonal projection onto the closed subspace of $\mathcal F$-measurable random variables in the Hilbert space of square integrable random variables.

This is a detailed and elementary discussion of this viewpoint.
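To make the projection picture concrete, here is a minimal finite sketch (my own illustration, not taken from the linked discussion). It takes a fair die roll $X$, lets $\mathcal F$ be generated by the parity, and projects $X$ onto the span of the two indicators $1_{\text{odd}}$ and $1_{\text{even}}$ using the inner product $\langle f,g\rangle = E[fg]$. The names `inner`, `one_odd`, and `one_even` are assumptions of the example.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)                       # uniform probability of each outcome

def inner(f, g):
    """Inner product <f, g> = E[f g] on the finite probability space."""
    return sum(p * f[w] * g[w] for w in omega)

X = {w: Fraction(w) for w in omega}
# F-measurable functions are exactly a*1_odd + b*1_even
# (constant on each parity class).
one_odd = {w: Fraction(w % 2) for w in omega}
one_even = {w: Fraction(1 - w % 2) for w in omega}

# Project X onto span{1_odd, 1_even}; the two indicators are orthogonal.
a = inner(X, one_odd) / inner(one_odd, one_odd)
b = inner(X, one_even) / inner(one_even, one_even)
projection = {w: a * one_odd[w] + b * one_even[w] for w in omega}
residual = {w: X[w] - projection[w] for w in omega}

print(a, b, inner(residual, one_odd), inner(residual, one_even))
```

The residual $X - E(X|\mathcal F)$ is orthogonal to both indicators, which is the defining property $E[X1_A]=E[Y1_A]$ written in Hilbert-space language.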


I happened to read an article on Wikipedia today on Conditional Expectation. That clarified a lot of my questions. Hope it helps!

  1. For your first question, in the linked article, there is the definition for conditional expectation of a r.v. $X: \Omega \rightarrow \mathbb{R}$ given a sub sigma algebra $\mathcal{F}$ of the sigma algebra $\mathcal{A}$ on the domain $\Omega$. It is an $\mathcal{F}$-measurable function $\Omega \rightarrow \mathbb{R}$, denoted as $E(X \vert \mathcal{F})$. If you evaluate this conditional expectation at a point $\omega \in \Omega$, you will get a value $E(X \vert \mathcal{F})(\omega)$, which is called the conditional expectation of $X$ given $\mathcal{F}$ at $\omega$.

    When the r.v. $X$ is an indicator function on some measurable subset say $A \in \mathcal{A}$, its conditional expectation given the sub sigma algebra is called the conditional probability of the subset $A$ given the sub sigma algebra $\mathcal{F}$, denoted as $P( A \vert \mathcal{F})$. It is a mapping: $\Omega \rightarrow \mathbb{R}$.

    If we let $A$ vary within $\mathcal{A}$, the conditional probability $P( \cdot \vert \mathcal{F})$ is a mapping: $\mathcal{A} \times \Omega \rightarrow \mathbb{R}$. In some cases, $\forall \omega \in \Omega$, $P( \cdot \vert \mathcal{F})(\omega)$ is a probability measure on $(\Omega, \mathcal{A})$, in which case $P(\cdot \vert \mathcal{F})$ is called a regular conditional probability.

    When $\mathcal{F}$ is generated by another r.v. $Y$, then the conditional expectation and conditional probability will be called the ones given the r.v. $Y$.

  2. For your second question, I am still wondering what kind of information a sigma algebra (of a r.v.) can provide, and how it is provided.

I was only able to understand the notion of the conditional expectation with respect to a sub-$σ$-algebra $\mathcal F$, when I realized that this game is only interesting when $\mathcal F$ is "not Hausdorff", meaning that there might be points $x$ and $y$ which cannot be separated by an $\mathcal F$-measurable set. Any $\mathcal F$-measurable function must therefore coincide on $x$ and $y$, so $E(X|\mathcal F)$ tries to be the best photograph of the random variable $X$ which coincides on $x$ and $y$, as well as on any other similar pairs of points.

In the event that $\mathcal F$ is the smallest sub-$σ$-algebra possible, namely $\mathcal F = \{\emptyset, \Omega\}$, only constant functions are measurable. So $E(X|\mathcal F)$ must be a constant function, and that constant turns out to be the average of $X$, a.k.a. the expectation of $X$.
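In the finite case this "photograph" can be computed directly: a sub-$σ$-algebra corresponds to a partition of $\Omega$ into blocks of points it cannot separate, and $E(X|\mathcal F)$ averages $X$ over each block. The sketch below is an illustration under that finiteness assumption; the function name `conditional_expectation` and the die-roll example are my own.

```python
from fractions import Fraction

def conditional_expectation(X, P, partition):
    """E(X | F) on a finite space, where F is generated by `partition`.

    Returns a dict: outcome -> average of X over the block containing it.
    """
    result = {}
    for block in partition:
        mass = sum(P[w] for w in block)
        average = sum(P[w] * X[w] for w in block) / mass
        for w in block:
            result[w] = average   # same value on points F cannot separate
    return result

omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w) for w in omega}

# F = {empty set, Omega}: nothing can be separated.
trivial = conditional_expectation(X, P, [omega])
# F = sigma(parity): only odd and even can be told apart.
parity = conditional_expectation(X, P, [[1, 3, 5], [2, 4, 6]])
# F = all subsets: every point can be separated.
finest = conditional_expectation(X, P, [[w] for w in omega])
print(trivial, parity, finest, sep="\n")
```

The trivial partition gives the constant $E(X)$, the finest partition gives back $X$ itself, and everything in between is a blurrier or sharper picture of $X$.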

PS. This is a comment I made in a recent question (Refference for conditional expectation), which in turn brought me here to this 10-year-old question when I clicked on a "Related" post. Reading the answers, I did not find anyone referring to the above point of view, so I hope my little contribution will be useful to someone.


