24
$\begingroup$

A few days ago I had my last discussion session on probability theory as a TA. At the end I invited the students to ask me questions, since it was the last class. One of the students asked me about the (real) definition of expectation; he said he was confused by "simple facts" like $$ E(c)=c, c\in \mathbb{R} $$ and why the expectation is linear.

Embarrassed that I had not explained this well enough earlier, I tried to explain to him that expectation is a form of weighted sum, and that in the continuous case we use a probability measure coming from the density function. So the expectation is really a kind of integral. The student seemed to follow at this point. Then I tried to explain that the integration involved here is not the same as the Riemann integration he learned in calculus classes, which is partly why the expectation of the Cauchy distribution does not exist. I drew a graph and showed that the Lebesgue integral can be viewed as a kind of "horizontal" decomposition of the area under the graph. He asked me a very good question:

"What is the benefit of using horizontal instead of vertical? Isn't that the same thing?"

I did not really know how to answer this appropriately in a short time. I showed him that the horizontal decomposition would involve general (measurable) sets, not rectangles. I also defined, intuitively, the outer measure of a set using open boxes. As an exercise I showed that $\mathbb{Q}\cap [0,1]$ has measure zero using this definition, and that as a result the Dirichlet function has zero integral on the unit interval, while its Riemann integral cannot be defined. However, I noticed that he was more or less lost by the time I showed $m(\mathbb{Q}\cap [0,1])=0$. I told him that to understand it properly he needed to take a year of real analysis classes, and that he should consult a professor I know in my department.
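The covering argument for $m(\mathbb{Q}\cap [0,1])=0$ can be made concrete with a small computation. This is only an illustrative sketch: the function names and the cut-off `max_denominator` are assumptions introduced here, and a finite list of rationals stands in for the full countable enumeration. The $k$-th rational gets an open interval of length $\varepsilon/2^{k+1}$, so the total length stays below $\varepsilon$ no matter how many rationals are listed.

```python
from fractions import Fraction


def rationals_in_unit_interval(max_denominator):
    """List the distinct rationals p/q in [0, 1] with q <= max_denominator."""
    found = set()
    ordered = []
    for q in range(1, max_denominator + 1):
        for p in range(0, q + 1):
            r = Fraction(p, q)
            if r not in found:
                found.add(r)
                ordered.append(r)
    return ordered


def cover_total_length(points, eps):
    """Total length of a cover whose k-th interval has length eps / 2**(k+1)."""
    return sum(eps / 2 ** (k + 1) for k in range(len(points)))


eps = Fraction(1, 100)                      # any positive tolerance works
points = rationals_in_unit_interval(30)     # hypothetical cut-off
total = cover_total_length(points, eps)

print(len(points), "rationals covered, total length", float(total))
print("still below eps:", total < eps)
```

Since the same bound holds for every $\varepsilon > 0$, the only possible value for the outer measure is zero.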

I want to ask what is a good way to explain the ideas of the Lebesgue integral to a student like him next time without confusing him or her. I later learned that the student did not have a proper proof-writing background (for example, he did not know what it means to be injective or surjective). Since these students constitute the majority of my classes, I feel obliged to find a way to explain myself better without forcing them to pay a visit to my professor or read a serious textbook.

$\endgroup$
6
  • 1
    $\begingroup$ Not a duplicate but some of the information could be helpful matheducators.stackexchange.com/questions/135/… $\endgroup$
    – quid
    Commented Dec 7, 2014 at 11:44
  • 2
    $\begingroup$ I think I understand the issue. However, if instead of convergence of functions you use the fact that the measure is/should be sigma-additive, this might work. And, arguably, a main reason why one uses the Lebesgue integral in that context is just to have sigma-additivity. I think additivity is really intuitive as something one wants to have, and then generalizing to countable unions should also be alright. I would try the message: we use this integral since in this way we get this nice property, sigma-additivity, while otherwise we do not. (Might expand to answer later.) $\endgroup$
    – quid
    Commented Dec 7, 2014 at 12:40
  • 15
    $\begingroup$ Why go to Lebesgue integration at this point? There are plenty of Riemann integrable probability distributions (there are plenty of finite probability distributions!) and the question can be answered in terms of them. $\endgroup$ Commented Dec 7, 2014 at 15:20
  • 2
    $\begingroup$ @DavidSpeyer: I agree. Perhaps I should not have mentioned a concept I cannot explain very well. $\endgroup$ Commented Dec 7, 2014 at 16:23
  • 6
    $\begingroup$ I don't see a need to introduce L integration for non math majors. But if you do mention it, do so in a conceptual manner, so they have a concept of what it is about (something similar to the Wikipedia lead section). "Can integrate non continuous functions" or the like. DON'T try to teach them the method or explain by proofs. Explain as the concept and leave it at that. Plus you can better use time with your target audience to teach them some actual probability and statistics, methods and applications, rather than theory foundations. $\endgroup$
    – guest
    Commented Jan 29, 2018 at 7:36

6 Answers

14
$\begingroup$

As you told the student, the easiest way is to regard the Lebesgue integral as beginning with a partition of the range, rather than the domain. Perhaps a more refined way to view this is that the partition, rather than the "heights" of the rectangles, can be used to encode the "shape" of the function being integrated.

The way to encode the function in the partition is to build the parts using the natural sets $S_{\lambda}=\{t\in \mathbb{R}:f(t)> \lambda\}$. These sets behave well with respect to limits. (Of course to be honest you must talk about sigma algebras…but this is a trailer for the film!)

Since you are talking to a non-major, you could stop here and just say that following up with this we obtain natural and desirable properties that the Riemann integral fails to satisfy in general, like the Monotone Convergence Theorem. Also, you might note that the question of when a function is Riemann integrable is naturally answered using the resulting class of sets ($f$ on $[a,b]$ is Riemann integrable if and only if it is bounded and continuous almost everywhere with respect to Lebesgue measure). All of this is discussed on the Wikipedia page…so then you could point the student there!

Also, see the excellent book by David Bressoud for an extremely detailed account of the history and motivation.

$\endgroup$
8
$\begingroup$

To see the reason why the Lebesgue integral is preferred in probability theory one must go beyond the setting of real functions $f \colon \mathbb{R} \to \mathbb{R}$. In this setting both the Riemann and the Lebesgue integral can be defined, and the reasons for choosing one of them are quite subtle.

But in probability the interest is in random variables, which are functions $X \colon \Omega \to \mathbb{R}$, say. Here $\Omega$ is a probability space, an abstract space that can be pretty much anything (with a $\sigma$-algebra included). Generally we cannot assume that this space carries the structure necessary to define a Riemann integral! To define the Riemann integral we would need to be able to define intervals and partitions of them in $\Omega$, and we cannot.

But the Lebesgue integral does not need such things, as we use the structure of the value space (here $\mathbb{R}$) to define it. That way it can be used in the great generality we need in probability theory.

One can in part get away with defining expectation in a bits-and-pieces manner: an integral when the random variable $X$ has a density, a sum when $X$ has a probability mass function, some weird beast when it is a mixture of those two cases, ... But using the Lebesgue integral we get one unifying definition. I think that is the real reason for preferring the Lebesgue integral, not the subtleties with the convergence theorems (which are an added bonus).

$\endgroup$
3
  • 2
    $\begingroup$ Do you use non-standard Borel probability spaces often? If not, then all of your probability spaces just look like (are Borel isomorphic) the unit interval union finitely many atoms. I love this motivation you point to, but the Borel isomorphism here gives me pause, as most probability spaces any non-major would care about are certainly standard, and saying that these spaces are terribly more exotic than the real line may not be really telling the truth! $\endgroup$
    – Jon Bannon
    Commented Jan 27, 2016 at 17:59
  • 2
    $\begingroup$ @Jon Bannon: You might be right, but how do you, in practice, define a Riemann integral on such spaces? While the Borel isomorphism is there, probability theory does not really use the probability space as if it was a unit interval. $\endgroup$ Commented Jan 27, 2016 at 18:07
  • $\begingroup$ Indeed you are right! Anyhow, nice answer! $\endgroup$
    – Jon Bannon
    Commented Jan 27, 2016 at 19:14
6
$\begingroup$

I have the impression that the underlying problem is the expected value itself, not the integral (on which the expected value is based, of course). But since the question asks about the integral, I really don't see why people consider the Lebesgue integral inherently more difficult than the Riemann integral.

For all the examples below, consider $\mu$ to be a measure on (some $\sigma$-algebra of) $X$.

The opposition vertical vs horizontal is false

It's not like we cannot introduce the Lebesgue integral by vertical slices (see Wiki). The basic definition I was given in my second year analysis course was that the integral of a simple function $f = \sum_{i=1}^n a_i 1_{A_i}$ is given by $\int_X f d\mu = \sum_{i=1}^n a_i \mu(A_i)$. One can define integrals for larger classes of functions by using a limiting process.
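The simple-function formula can be tried out on a finite probability space, where it is just the familiar expectation. The following is a minimal sketch under assumptions made only for illustration: a fair die as the measure space and a made-up payoff taking three values. All names in the code are hypothetical.

```python
from fractions import Fraction


def integrate_simple(terms, measure):
    """Integral of f = sum a_i * 1_{A_i}, computed as sum a_i * measure(A_i)."""
    return sum(a * measure(A) for a, A in terms)


def fair_die(A):
    """Uniform probability measure on {1, ..., 6}."""
    return Fraction(len(A), 6)


# Payoff: 0 on odd faces, 10 on {2, 4}, 30 on {6}; the sets A_i are disjoint.
payoff_terms = [(0, {1, 3, 5}), (10, {2, 4}), (30, {6})]
expectation = integrate_simple(payoff_terms, fair_die)

# Cross-check by summing outcome by outcome.
payoff = {w: a for a, A in payoff_terms for w in A}
pointwise = sum(payoff[w] * fair_die({w}) for w in range(1, 7))

print(expectation, pointwise)
```

Both computations give the same number; the first groups outcomes by the value the function takes, which is the only information the formula $\sum a_i \mu(A_i)$ needs.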

Note that the integral $\int_X f d\mu$ above is nothing else than looking at vertical slices $A_i \times [0,a_i]$ and summing up their measures (OK, one would have to introduce the product measure $\mu \otimes \lambda^1$ on $X \times \mathbb{R}$, but that's not the point). The only difference between this and the Riemann integral is that the latter requires $A_i$ to be intervals in $\mathbb{R}$. Quite arbitrary, right? This takes us to the next point.

Riemann integration is built upon additional (and a posteriori, irrelevant) structure

Riemann integration requires us to look at intervals in $\mathbb{R}$, while Lebesgue integration only sees the $\sigma$-algebra of measurable sets, disregarding any (possible) existing metric or topological structure on $X$ (although for most applications we do care about measuring Borel sets, and about some metric compatibility, too). Mathematicians around 1900 needed some time to arrive at this generalization and simplification, so why not embrace their wisdom?

One application that a non-math major student could appreciate concerns summing up infinite series. Did you notice how confusing it is to study conditionally convergent series, the Riemann rearrangement theorem in particular? And how absolutely convergent series are better behaved, for some mysterious reasons?

In a way, it's an artifact of how we introduce infinite summation: \begin{equation} \label{eq:standard} \sum_{n=1}^\infty a_n = S \quad \Longleftrightarrow \quad \forall_{\varepsilon > 0} \, \exists_{N > 0} \, \forall_{n>N} \, |a_1+\ldots+a_n - S| < \varepsilon. \tag{$\star$} \end{equation} The order of the sequence $(a_n)$ is encoded into the definition - a fact which most students (I, for one) miss at first encounter. Hence all the problems with rearrangements.

The alternative Lebesgue summation approach is as follows. If $a_n \ge 0$, then let $$ \sum a := \sup \left\{ \sum_{j \in J} a_{j} : J \subseteq \mathbb{N} \text{ is finite} \right\}. $$ This supremum may well be infinite, but it always exists. And of course, the result does not depend on the order of $(a_n)$ in any way. One can check that $\sum a$ coincides with the sum in \eqref{eq:standard}.

For a general sequence $(a_n)$, consider its positive and negative part: $$ a^+_n := \max(a_n,0), \qquad a^-_n := \max(-a_n,0). $$ This way, $a_n = a^+_n - a^-_n$. If both $(a^+_n)$ and $(a^-_n)$ have finite sums, we put $$ \sum a := \sum a^+ - \sum a^-. $$ Again, the order doesn't matter. Note that the assumption $\sum a^+, \sum a^- < \infty$ is equivalent to $\sum |a| < \infty$, which is the same as saying that $\sum_{n=1}^\infty a_n$ is absolutely convergent.

The best thing is, the definition above is not an ad hoc construction. It's simply the result of applying Lebesgue integration theory to $\mathbb{N}$ with the counting measure (no measurability nuisance involved!).
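To see the order dependence of $(\star)$ in action, one can sum the alternating harmonic series in two different orders. This is a sketch for illustration only; the choice of series, the function names and the truncation lengths are assumptions made here. The standard order approaches $\ln 2$, while taking two positive terms for each negative one approaches $\tfrac32 \ln 2$.

```python
import math


def standard_order(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in the usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))


def two_positive_one_negative(n_blocks):
    """Partial sum of the rearrangement 1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ..."""
    total = 0.0
    for b in range(1, n_blocks + 1):
        # Block b uses the odd denominators 4b-3 and 4b-1 and the even one 2b.
        total += 1 / (4 * b - 3) + 1 / (4 * b - 1) - 1 / (2 * b)
    return total


s_standard = standard_order(300_000)
s_rearranged = two_positive_one_negative(100_000)

print("standard order:", s_standard, "vs ln 2 =", math.log(2))
print("rearranged:    ", s_rearranged, "vs 1.5 ln 2 =", 1.5 * math.log(2))
```

In the Lebesgue summation approach this series is simply not summable: its positive part $\sum a^+$ is the sum of the reciprocals of the odd numbers, which is infinite, and so is $\sum a^-$.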

The Lebesgue integral can be built on top of the Riemann integral

There's a number of reasons why horizontal is in fact better than vertical. BCLC's answer links to the money analogy Lebesgue himself made when explaining his integration theory. Jon Bannon's answer highlights the role of superlevel sets $S_{\lambda} := \{ x \in X : f(x) > \lambda \}$. Let me take it a bit further.

The horizontal slice between two values $\lambda < \lambda'$, i.e. the set $\{ (x,t) : \lambda < t \le \min(\lambda', f(x)) \}$, has measure between $(\lambda'-\lambda) \mu(S_{\lambda'})$ and $(\lambda'-\lambda) \mu(S_\lambda)$. If you sum up the contributions of all slices, and take the limit (as the size of the slices tends to zero), the integral turns out to be \begin{equation} \label{eq:horizontal} \int_X f d \mu = \int_0^\infty \mu(S_\lambda) d\lambda, \tag{$\star\star$} \end{equation} at least for $f \ge 0$. The cheapest way to have the Lebesgue integral is by taking \eqref{eq:horizontal} as the definition, where the left side is Lebesgue and the right side is Riemann. The necessary ingredients are:

  • determining $\mu(S_\lambda)$ (you need to have a measure $\mu$ to do this);
  • making sure that $\lambda \mapsto \mu(S_\lambda)$ is a Riemann-integrable function (it is, because it's monotone);
  • defining the Riemann integral $\int_0^\infty \mu(S_\lambda) d\lambda$.
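The three ingredients can be checked numerically in a simple case. The sketch below makes several assumptions for illustration: $X=[0,1]$ with Lebesgue measure, $f(x)=x^2$, and $\mu(S_\lambda)$ approximated by the fraction of sample points where $f$ exceeds $\lambda$. The function names and grid sizes are hypothetical.

```python
from bisect import bisect_right


def layer_cake(f, f_max, n_x=2000, n_levels=2000):
    """Approximate the integral of f >= 0 over [0, 1] by horizontal slices."""
    # Sample f at midpoints; sorting lets us count {f > lam} by bisection.
    values = sorted(f((i + 0.5) / n_x) for i in range(n_x))
    d_lam = f_max / n_levels
    total = 0.0
    for k in range(n_levels):
        lam = (k + 0.5) * d_lam
        # Fraction of sample points with f(x) > lam approximates mu(S_lam).
        mu_s = (n_x - bisect_right(values, lam)) / n_x
        total += mu_s * d_lam
    return total


def vertical(f, n_x=2000):
    """Ordinary midpoint Riemann sum of f over [0, 1]."""
    return sum(f((i + 0.5) / n_x) for i in range(n_x)) / n_x


def square(x):
    return x * x


horizontal_value = layer_cake(square, f_max=1.0)
vertical_value = vertical(square)

print("horizontal slices:", horizontal_value)
print("vertical slices:  ", vertical_value)
```

Here $\mu(S_\lambda) = 1 - \sqrt{\lambda}$ for $0 \le \lambda < 1$, so the right-hand side of $(\star\star)$ is $\int_0^1 (1-\sqrt{\lambda})\, d\lambda = 1/3$, in agreement with $\int_0^1 x^2\, dx$.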

There are reasons why more powerful tools are better than less powerful tools

For historical and pedagogical reasons, it's nice to start with something less powerful, but conceptually simpler. And many times it's perfectly OK to stop there. But students can benefit from knowing that the integration theory doesn't stop on Riemann, and for a good reason.

As already discussed in the other answers, Lebesgue's theory is useful for Fourier series and Fourier transform in general (due to limit theorems, I guess), calculus of variations (for the same reason, basically), probability theory (thanks to the general setting of measure spaces) and geometric applications (while one can extend Riemann integration to manifolds somehow, it's better to think about abstract measures).

All right, most students won't need it personally, but at least they should know that the world has made progress since the XIX century. And if they're OK with using smartphones or WolframAlpha as calculators - just because there's a powerful and easy-to-use tool at hand - they might be OK with Lebesgue's integral as well.

$\endgroup$
1
  • 3
    $\begingroup$ the last heading: "there are reasons" $\endgroup$
    – ryang
    Commented Jun 15, 2022 at 10:21
4
$\begingroup$

For nonnegative functions, define the integral to be the smallest member of $[0,+\infty]$ (closed at both ends) that is not too small to be the integral. And define "too small" to mean smaller than the integral of a nonnegative function that takes only finitely many values and that nowhere exceeds the function to be integrated. And the integral of a function that takes only finitely many values is defined the way you would expect in terms of the values and the measures of those subsets of the domain where it takes those values. The measures of subsets of the domain may be defined merely "intuitively" in some contexts.

For functions taking some positive and some negative values, treat the two parts separately and leave the integral undefined only when both parts are infinite.

From this it follows that something like $\int_{\mathbb R} dx/(1+x^2)$ is not an "improper" integral, since it is not defined by first integrating over a bounded interval and then taking a limit as the bounds approach infinity.

I would call attention to the distinction between iterated integrals and double integrals thus:

$$ \int\limits_{[a,b]} \left( \int\limits_{[c,d]} f(x,y)\, dy \right) \,dx \text{ versus } \int\limits_{[a,b]\times[c,d]} f(x,y)\, d(x,y) $$ and illustrate that a thing that guarantees they are equal is absolute convergence.

I would exhibit a simple case in which $$ \int\limits_A \lim_n f_n(x)\,dx \ne \lim_n\int\limits_A f_n(x)\,dx $$ and explain that a thing that guarantees they are equal is that $$ \int\limits_A \sup_n |f_n(x)|\,dx<+\infty. $$ And I would say that Lebesgue's definitions are what facilitate proving these propositions.
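One standard candidate for such a simple case (chosen here for illustration, not necessarily the example intended above) is $f_n = n \cdot 1_{(0,1/n)}$ on $A=[0,1]$. Each $f_n$ has integral $1$, yet $f_n(x) \to 0$ for every fixed $x$. The function names and the grid size in this sketch are assumptions.

```python
def f(n, x):
    """f_n(x) = n on the open interval (0, 1/n), and 0 elsewhere."""
    return n if 0 < x < 1 / n else 0


def integral_over_unit_interval(n, grid=100_000):
    """Midpoint sum over [0, 1]; exact for f_n whenever n divides grid."""
    return sum(f(n, (i + 0.5) / grid) for i in range(grid)) / grid


ns = (1, 10, 100, 1000)
integrals = [integral_over_unit_interval(n) for n in ns]
values_at_fixed_point = [f(n, 0.3) for n in ns]

print("integrals of f_n:", integrals)
print("f_n(0.3):        ", values_at_fixed_point)
```

The domination condition fails here: $\sup_n |f_n(x)|$ is comparable to $1/x$ near $0$, which has infinite integral over $[0,1]$.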

If this is well organized it could fit into an hour.

$\endgroup$
1
$\begingroup$

Supplement to other answers. Credit to both you and the student for the following question, which I believe is at the very heart of the difference between the two integrals.

"What is the benefit of using horizontal instead of vertical? Isn't that the same thing?"

I think of money. From Wikipedia:

I have to pay a certain sum, which I have collected in my pocket. I take the bills and coins out of my pocket and give them to the creditor in the order I find them until I have reached the total sum. This is the Riemann integral. But I can proceed differently. After I have taken all the money out of my pocket I order the bills and coins according to identical values and then I pay the several heaps one after the other to the creditor. This is my integral.

To a non-maths major, just let them take for granted that somehow vertical integrals (Riemann integrals) can't integrate certain functions over certain sets the way horizontal integrals (Lebesgue integrals) can.

P.S. Vertical vs horizontal should already be a great explanation. I think your question might be rephrased to asking about vertical vs horizontal specifically (unless you're open to other ways besides the vertical vs horizontal thing).

P.P.S. I've been mentioned by Michał Miśkiewicz. Coooool.

$\endgroup$
0
$\begingroup$

Reframing: I wouldn't attempt this. After all, consider, the point is not the construction, but the properties. And we know what properties we'd ideally like (modulo non-trivial technical details). Some not-so-crazy counter-examples show that, e.g., even if a pointwise limit of continuous functions on $[a,b]$ is continuous, the integral of the limit need not be the limit of the integrals, Riemann or otherwise. Yes, there are pointwise limits of continuous functions (on otherwise-nice compact sets, such as $[a,b]$) that are not Riemann-integrable... but, in my practice, this is subordinate to the larger problems (about not generally being able to interchange integrals and pointwise limits). That is, I think the serious issue is not about forming integrals, but about subtler notions of convergence than raw pointwise convergence.

$\endgroup$
