In this course so far, we have focused primarily on one specific example of a countably additive measure, namely Lebesgue measure. This measure was constructed from a more primitive concept of Lebesgue outer measure, which in turn was constructed from the even more primitive concept of elementary measure.
It turns out that both of these constructions can be abstracted. In this set of notes, we will give the Carathéodory lemma, which constructs a countably additive measure from any abstract outer measure; this generalises the construction of Lebesgue measure from Lebesgue outer measure. One can in turn construct outer measures from another concept known as a pre-measure, of which elementary measure is a typical example.
With these tools, one can start constructing many more measures, such as Lebesgue-Stieltjes measures, product measures, and Hausdorff measures. With a little more effort, one can also establish the Kolmogorov extension theorem, which allows one to construct a variety of measures on infinite-dimensional spaces, and is of particular importance in the foundations of probability theory, as it allows one to set up probability spaces associated to both discrete and continuous random processes, even if they have infinite length.
The most important result about product measure, beyond the fact that it exists, is that one can use it to evaluate iterated integrals, and to interchange their order, provided that the integrand is either unsigned or absolutely integrable. This fact is known as the Fubini-Tonelli theorem, and is an absolutely indispensable tool for computing integrals, and for deducing higher-dimensional results from lower-dimensional ones.
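As a toy numerical illustration of the Tonelli half of this principle (a sketch of ours, not part of the notes): for an unsigned integrand, the two iterated integrals must agree, and one can watch this happen with a crude midpoint rule. The helper `integrate` and the choice of integrand $e^{-xy}$ on $[0,1]^2$ are our own.

```python
# Numerical illustration of the Tonelli theorem: for an unsigned
# (non-negative) integrand, the two iterated integrals agree.
from math import exp

def integrate(g, a, b, n=400):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

f = lambda x, y: exp(-x * y)  # unsigned integrand on [0,1]^2

# integrate in x first, then in y
I_xy = integrate(lambda y: integrate(lambda x: f(x, y), 0, 1), 0, 1)
# integrate in y first, then in x
I_yx = integrate(lambda x: integrate(lambda y: f(x, y), 0, 1), 0, 1)

print(I_xy, I_yx)  # the two orders agree up to discretisation error
```

Of course, a finite check like this is no substitute for the theorem; it merely makes the statement concrete.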
We remark that these notes omit a very important way to construct measures, namely the Riesz representation theorem, but we will defer discussion of this theorem to 245B.
This is the final set of notes in this sequence. If time permits, the course will then begin covering the 245B notes, starting with the material on signed measures and the Radon-Nikodym-Lebesgue theorem.
Let $n$ be a large integer, and let $M_n$ be the Gaussian Unitary Ensemble (GUE), i.e. the random Hermitian matrix with probability distribution $C_n e^{-\mathrm{tr}(M^2)/2}\ dM$, where $dM$ is a Haar measure on Hermitian matrices and $C_n$ is the normalisation constant required to make the distribution of unit mass. The eigenvalues $\lambda_1 \leq \ldots \leq \lambda_n$ of this matrix are then a coupled family of $n$ real random variables. For any $1 \leq k \leq n$, we can define the $k$-point correlation function $\rho_k(x_1,\ldots,x_k)$ to be the unique symmetric measure on $\mathbb{R}^k$ such that
$$\int_{\mathbb{R}^k} F(x_1,\ldots,x_k)\, \rho_k(x_1,\ldots,x_k)\ dx_1 \ldots dx_k = \mathbb{E} \sum_{1 \leq i_1 < \ldots < i_k \leq n} F(\lambda_{i_1},\ldots,\lambda_{i_k})$$
for all symmetric test functions $F$.
A standard computation (given for instance in these lecture notes of mine) gives the Ginibre formula
$$\rho_n(x_1,\ldots,x_n) = C'_n\, e^{-(x_1^2 + \ldots + x_n^2)/2} \prod_{1 \leq i < j \leq n} |x_i - x_j|^2$$
for the $n$-point correlation function, where $C'_n$ is another normalisation constant. Using Vandermonde determinants, one can rewrite this expression in determinantal form as
$$\rho_n(x_1,\ldots,x_n) = C''_n \det(K_n(x_i,x_j))_{1 \leq i,j \leq n},$$
where the kernel $K_n$ is given by
$$K_n(x,y) := \sum_{k=0}^{n-1} \phi_k(x) \phi_k(y),$$
where $\phi_k(x) := P_k(x) e^{-x^2/4}$ and $P_0, P_1, P_2, \ldots$ are the ($L^2$-normalised) Hermite polynomials (thus the $\phi_k$ are an orthonormal family, with each $P_k$ being a polynomial of degree $k$). Integrating out one or more of the variables, one is led to the Gaudin-Mehta formula
$$\rho_k(x_1,\ldots,x_k) = \det(K_n(x_i,x_j))_{1 \leq i,j \leq k}. \qquad (1)$$
(In particular, the normalisation constant $C''_n$ in the previous formula turns out to simply be equal to $1$.) Again, see these lecture notes for details.
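To make the orthonormal family of Hermite functions concrete, here is a small numerical sketch (our own, not from the notes; for convenience it uses the standard quantum-mechanics normalisation with weight $e^{-x^2/2}$, which differs from the $e^{-x^2/4}$ convention above only by a rescaling of $x$). It builds the first few Hermite functions by the three-term recurrence and checks their orthonormality with a Riemann sum:

```python
# Hermite functions psi_k(x) = H_k(x) e^{-x^2/2} / (2^k k! sqrt(pi))^{1/2},
# generated by the stable recurrence
#   psi_{k+1}(x) = sqrt(2/(k+1)) x psi_k(x) - sqrt(k/(k+1)) psi_{k-1}(x).
# We verify numerically that they form an orthonormal family.
from math import exp, pi, sqrt

def hermite_functions(kmax, x):
    """Values psi_0(x), ..., psi_kmax(x) at a single point x."""
    psis = [pi ** -0.25 * exp(-x * x / 2)]
    if kmax >= 1:
        psis.append(sqrt(2.0) * x * psis[0])
    for k in range(1, kmax):
        psis.append(sqrt(2.0 / (k + 1)) * x * psis[k]
                    - sqrt(k / (k + 1.0)) * psis[k - 1])
    return psis

# Gram matrix of inner products, by a midpoint Riemann sum on [-12, 12]
K, n, a, b = 4, 6000, -12.0, 12.0
h = (b - a) / n
gram = [[0.0] * (K + 1) for _ in range(K + 1)]
for i in range(n):
    x = a + (i + 0.5) * h
    p = hermite_functions(K, x)
    for r in range(K + 1):
        for s in range(K + 1):
            gram[r][s] += h * p[r] * p[s]

for r in range(K + 1):
    for s in range(K + 1):
        target = 1.0 if r == s else 0.0
        assert abs(gram[r][s] - target) < 1e-6
print("orthonormal up to numerical error")
```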
The functions $\phi_k$ can be viewed as an orthonormal basis of eigenfunctions for the harmonic oscillator operator
$$L\phi := -\phi'' + \frac{x^2}{4}\phi;$$
indeed it is a classical fact that
$$L\phi_k = \left(k + \frac{1}{2}\right)\phi_k.$$
As such, the kernel $K_n$ can be viewed as the integral kernel of the spectral projection operator $1_{(-\infty,n)}(L)$.
From (1) we see that the fine-scale structure of the eigenvalues of GUE are controlled by the asymptotics of $K_n$ as $n \to \infty$. The two main asymptotics of interest are given by the following lemmas:
Lemma 1 (Asymptotics of $K_n$ in the bulk) Let $x_0 \in (-2,2)$, and let $\rho_{sc}(x_0) := \frac{1}{2\pi} (4 - x_0^2)_+^{1/2}$ be the semicircular law density at $x_0$. Then, we have
$$\frac{1}{\sqrt{n}\,\rho_{sc}(x_0)} K_n\left(\sqrt{n}\, x_0 + \frac{y}{\sqrt{n}\,\rho_{sc}(x_0)}, \sqrt{n}\, x_0 + \frac{z}{\sqrt{n}\,\rho_{sc}(x_0)}\right) \to \frac{\sin(\pi(y-z))}{\pi(y-z)}$$
as $n \to \infty$ for any fixed $y, z \in \mathbb{R}$ (removing the singularity at $y = z$ in the usual manner).
Lemma 2 (Asymptotics of $K_n$ at the edge) We have
$$\frac{1}{n^{1/6}} K_n\left(2\sqrt{n} + \frac{y}{n^{1/6}}, 2\sqrt{n} + \frac{z}{n^{1/6}}\right) \to \frac{\mathrm{Ai}(y)\, \mathrm{Ai}'(z) - \mathrm{Ai}'(y)\, \mathrm{Ai}(z)}{y - z}$$
as $n \to \infty$ for any fixed $y, z \in \mathbb{R}$, where $\mathrm{Ai}$ is the Airy function
$$\mathrm{Ai}(y) := \frac{1}{\pi} \int_0^\infty \cos\left( \frac{t^3}{3} + ty \right)\ dt,$$
and again removing the singularity at $y = z$ in the usual manner.
The proof of these asymptotics usually proceeds via computing the asymptotics of Hermite polynomials, together with the Christoffel-Darboux formula; this is for instance the approach taken in the previous notes. However, there is a slightly different approach that is closer in spirit to the methods of semi-classical analysis, which was briefly mentioned in the previous notes but not elaborated upon. For sake of completeness, I am recording some notes on this approach here, although to focus on the main ideas I will not be completely rigorous in the derivation (ignoring issues such as convergence of integrals or of operators, or (removable) singularities in kernels caused by zeroes in the denominator).
This is going to be a somewhat experimental post. In class, I mentioned that when solving the type of homework problems encountered in a graduate real analysis course, there are really only about a dozen or so basic tricks and techniques that are used over and over again. But I had not thought to actually try to make these tricks explicit, so I am going to try to compile here a list of some of these techniques. This list is going to be far from exhaustive; if other recent students of real analysis would like to share their own methods, then I encourage you to do so in the comments (even – or especially – if the techniques are somewhat vague and general in nature).
(See also the Tricki for some general mathematical problem solving tips. Once this page matures somewhat, I might migrate it to the Tricki.)
Note: the tricks occur here in no particular order, reflecting the stream-of-consciousness way in which they were arrived at. Indeed, this list will be extended on occasion whenever I find another trick that can be added to this list.
Emmanuel Breuillard, Ben Green, Robert Guralnick, and I have just uploaded to the arxiv our paper “Strongly dense free subgroups of semisimple algebraic groups“, submitted to Israel J. Math. This paper was originally motivated by (and provides a key technical tool for) another forthcoming paper of ours, on expander Cayley graphs in finite simple groups of Lie type, but also has some independent interest due to connections with other topics, such as the Banach-Tarski paradox.
Recall that one of the basic facts underlying the Banach-Tarski paradox is that the rotation group $SO(3)$ contains a copy of the free non-abelian group $F_2$ on two generators; thus there exist $a, b \in SO(3)$ such that $a, b$ obey no nontrivial word identities. In fact, using basic algebraic geometry, one can then deduce that a generic pair $(a,b)$ of group elements $a, b \in SO(3)$ has this property, where for the purposes of this paper “generic” means “outside of at most countably many algebraic subvarieties of strictly smaller dimension”. (In particular, using Haar measure on $SO(3)$, almost every pair has this property.) In fact one has a stronger property: given any non-trivial word $w \in F_2$, the associated word map $w: SO(3) \times SO(3) \to SO(3)$ is a dominant map, which means that its image is Zariski-dense. More succinctly, if $(a,b)$ is generic, then $w(a,b)$ is generic also.
In contrast, if one were working in a solvable, nilpotent, or abelian group (such as the rotation group $SO(2)$), then this property would not hold, since every subgroup of a solvable group is still solvable and thus not free (and similarly for nilpotent or abelian groups). (This already goes a long way to explain why the Banach-Tarski paradox holds in three or more dimensions, but not in two or fewer.) On the other hand, a famous result of Borel asserts that for any semisimple Lie group $G$ (over an algebraically closed field), and any nontrivial word $w \in F_2$, the word map $w: G \times G \to G$ is dominant, thus generalising the preceding discussion for $SO(3)$. (There is also the even more famous Tits alternative, that asserts that any linear group that is not (virtually) solvable will contain a copy of the free group $F_2$; as pointed out to me by Michael Cowling, this already shows that generic pairs of generators will generate a free group, and with a little more effort one can even show that it generates a Zariski-dense free group.)
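A classical concrete instance of such a free pair, which the discussion above does not spell out: the rotations by angle $\arccos(3/5)$ about two orthogonal axes generate a free subgroup of $SO(3)$. The following sketch is our own (the freeness of this particular pair is a well-known fact, and a finite check is of course only evidence, not a proof); it verifies in exact rational arithmetic that no short reduced word in these two rotations collapses to the identity:

```python
# Two rotations with cos(theta) = 3/5 about orthogonal axes: a classical
# example of a pair generating a free group.  We check in exact arithmetic
# that no reduced word of length <= 6 in them is the identity.
from fractions import Fraction as Fr

F = Fr(1)
A  = [[Fr(3, 5), Fr(-4, 5), 0], [Fr(4, 5), Fr(3, 5), 0], [0, 0, F]]
Ai = [[Fr(3, 5), Fr(4, 5), 0], [Fr(-4, 5), Fr(3, 5), 0], [0, 0, F]]
B  = [[F, 0, 0], [0, Fr(3, 5), Fr(-4, 5)], [0, Fr(4, 5), Fr(3, 5)]]
Bi = [[F, 0, 0], [0, Fr(3, 5), Fr(4, 5)], [0, Fr(-4, 5), Fr(3, 5)]]
I  = [[F, 0, 0], [0, F, 0], [0, 0, F]]

def mul(X, Y):
    """3x3 matrix product with exact rational entries."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

gens = [A, Ai, B, Bi]
inverse_of = {0: 1, 1: 0, 2: 3, 3: 2}

def reduced_words(max_len):
    """All nonempty reduced words over {A, A^-1, B, B^-1}, as index tuples."""
    frontier = [(g,) for g in range(4)]
    while frontier:
        new = []
        for w in frontier:
            yield w
            if len(w) < max_len:
                for g in range(4):
                    if g != inverse_of[w[-1]]:  # keep the word reduced
                        new.append(w + (g,))
        frontier = new

for w in reduced_words(6):
    M = I
    for g in w:
        M = mul(M, gens[g])
    assert M != I, w  # no reduced word of length <= 6 is the identity
print("no relations of length <= 6")
```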
Now suppose we take two words $w_1, w_2 \in F_2$, and look at the double word map
$$(w_1, w_2): G \times G \to G \times G$$
on a semisimple Lie group $G$. If $w_1, w_2$ are non-trivial, then Borel’s theorem tells us that each component of this map is dominant, but this does not mean that the entire map is dominant, because there could be constraints between $w_1(a,b)$ and $w_2(a,b)$. For instance, if the two words $w_1, w_2$ commute, then $w_1(a,b), w_2(a,b)$ must also commute and so the image of the double word map is not Zariski-dense. But there are also non-commuting examples of non-trivial constraints: for instance, if $w_1, w_2$ are conjugate, then $w_1(a,b), w_2(a,b)$ must also be conjugate, which is also a constraint that obstructs dominance.
It is still not clear exactly what pairs of words have the dominance property. However, we are able to establish that all pairs of non-commuting words have a weaker property than dominance:
Theorem. Let $w_1, w_2 \in F_2$ be non-commuting words, and let $a, b$ be generic elements of a semisimple Lie group $G$ over an algebraically closed field. Then $w_1(a,b), w_2(a,b)$ generate a Zariski-dense subgroup of $G$.
To put it another way, $G$ not only contains free subgroups, but contains what we call strongly dense free subgroups: free subgroups such that any two non-commuting elements generate a Zariski-dense subgroup.
Our initial motivation for this theorem is its implications for finite simple groups of Lie type. Roughly speaking, one can use this theorem to show that a generic random walk in such a group cannot be trapped in a (bounded complexity) proper algebraic subgroup $H$ of $G$, and this “escape from subgroups” fact is a key ingredient in our forthcoming paper in which we demonstrate that random Cayley graphs in such groups are expander graphs.
It also has implications for results of Banach-Tarski type; it shows that for any semisimple Lie group $G$, and for generic $a, b \in G$, one can use $a, b$ to create Banach-Tarski paradoxical decompositions for all homogeneous spaces of $G$. In particular there is one pair of $a, b$ that gives paradoxical decompositions for all homogeneous spaces simultaneously.
Our argument is based on a concept that we call “degeneration”. Let $a, b$ be generic elements of $G$, and suppose for contradiction that $w_1(a,b), w_2(a,b)$ generically generated a group whose algebraic closure was conjugate to a proper algebraic subgroup $H$ of $G$. Borel’s theorem lets us show that $w_1(a,b)$, $w_2(a,b)$, and $[w_1(a,b), w_2(a,b)]$ each generate maximal tori of $H$, which by basic algebraic group theory can be used to show that $H$ must be a proper semisimple subgroup of $G$ of maximal rank. If we were in the model case $G = SL_2$, then we would already be done, as there are no such maximal rank semisimple subgroups; but in the other groups, such proper maximal semisimple groups unfortunately exist. Fortunately, they have been completely classified, and we take advantage of this classification in our argument.
The degeneration argument comes in as follows. Let $(a', b')$ be a non-generic pair in $G \times G$. Then $(a', b')$ lies in the Zariski closure of the generic pairs, which means that $(w_1(a',b'), w_2(a',b'))$ lies in the Zariski closure of the set formed by $(w_1(a,b), w_2(a,b))$ and its conjugates. In particular, if the non-generic pair is such that $w_1(a',b'), w_2(a',b')$ generates a group that is dense in some proper algebraic subgroup $H'$, then $H'$ is in the Zariski closure of the union of the conjugates of $H$. When this happens, we say that $H'$ is a degeneration of $H$. (For instance, $H$ could be the stabiliser of some non-degenerate quadratic form, and $H'$ could be the stabiliser of a degenerate limit of that form.)
The key fact we need (that relies on the classification, and a certain amount of representation theory) is:
Proposition. Given any proper semisimple maximal rank subgroup $H$ of $G$, there exists another proper semisimple subgroup $H'$ of $G$ that is not a degeneration of $H$.
Using an induction hypothesis, we can find pairs $(a', b')$ such that $w_1(a',b'), w_2(a',b')$ generate a dense subgroup of $H'$, which together with the preceding discussion contradicts the proposition.
The proposition is currently proven by using some known facts about certain representation-theoretic invariants of all the semisimple subgroups of the classical and exceptional simple Lie groups. While the proof is of finite length, it is not particularly elegant, ultimately relying on the numerical value of one or more invariants of $H$ being sufficiently different from their counterparts for $H'$ that one can prevent the latter being a degeneration of the former. Perhaps there is another way to proceed here that is not based so heavily on classification.
One notable feature of mathematical reasoning is the reliance on counterfactual thinking – taking a hypothesis (or set of hypotheses) which may or may not be true, and following it (or them) to its logical conclusion. For instance, most propositions in mathematics start with a set of hypotheses (e.g. “Let $n$ be a natural number such that …”), which may or may not apply to the particular value of $n$ one may have in mind. Or, if one ever argues by dividing into separate cases (e.g. “Case 1: $n$ is even. … Case 2: $n$ is odd. …”), then for any given $n$, at most one of these cases would actually be applicable, with the other cases being counterfactual alternatives. But the purest example of counterfactual thinking in mathematics comes when one employs a proof by contradiction (or reductio ad absurdum) – one introduces a hypothesis that in fact has no chance of being true at all (e.g. “Suppose for sake of contradiction that $\sqrt{2}$ is equal to the ratio $p/q$ of two natural numbers.”), and proceeds to demonstrate this fact by showing that this hypothesis leads to absurdity.
Experienced mathematicians are so used to this type of counterfactual thinking that it is sometimes difficult for them to realise that this type of thinking is not automatically intuitive for students or non-mathematicians, who can anchor their thinking on the single, “real” world to the extent that they cannot easily consider hypothetical alternatives. This can lead to confused exchanges such as the following:
Lecturer: “Theorem. Let $p$ be a prime number. Then…”

Student: “But how do you know that $p$ is a prime number? Couldn’t it be composite?”
or
Lecturer: “Now we see what the function $f$ does when we give it the input of $y$ instead. …”

Student: “But didn’t you just say that the input was equal to $x$ just a moment ago?”
This is not to say that counterfactual thinking is not encountered at all outside of mathematics. For instance, an obvious source of counterfactual thinking occurs in fictional writing or film, particularly in speculative fiction such as science fiction, fantasy, or alternate history. Here, one can certainly take one or more counterfactual hypotheses (e.g. “what if magic really existed?”) and follow them to see what conclusions would result. The analogy between this and mathematical counterfactual reasoning is not perfect, of course: in fiction, consequences are usually not logically entailed by their premises, but are instead driven by more contingent considerations, such as the need to advance the plot, to entertain or emotionally affect the reader, or to make some moral or ideological point, and these types of narrative elements are almost completely absent in mathematical writing. Nevertheless, the analogy can be somewhat helpful when one is first coming to terms with mathematical reasoning. For instance, the mathematical concept of a proof by contradiction can be viewed as roughly analogous in some ways to such literary concepts as satire, dark humour, or absurdist fiction, in which one takes a premise specifically with the intent to derive absurd consequences from it. And if the proof of (say) a lemma is analogous to a short story, then the statement of that lemma can be viewed as analogous to the moral of that story.
Another source of counterfactual thinking outside of mathematics comes from simulation, when one feeds some initial data or hypotheses (that may or may not correspond to what actually happens in the real world) into a simulated environment (e.g. a piece of computer software, a laboratory experiment, or even just a thought-experiment), and then runs the simulation to see what consequences result from these hypotheses. Here, proof by contradiction is roughly analogous to the “garbage in, garbage out” phenomenon that is familiar to anyone who has worked with computers: if one’s initial inputs to a simulation are not consistent with the hypotheses of that simulation, or with each other, one can obtain bizarrely illogical (and sometimes unintentionally amusing) outputs as a result; and conversely, such outputs can be used to detect and diagnose problems with the data, hypotheses, or implementation of the simulation.
Despite the presence of these non-mathematical analogies, though, proofs by contradiction are still often viewed with suspicion and unease by many students of mathematics. Perhaps the quintessential example of this is the standard proof of Cantor’s theorem that the set of real numbers is uncountable. This is about as short and as elegant a proof by contradiction as one can have without being utterly trivial, and despite this (or perhaps because of this) it seems to offend the reason of many people when they are first exposed to it, to an extent far greater than most other results in mathematics. (The only other two examples I know of that come close to doing this are the fact that the real number $0.999\ldots$ is equal to $1$, and the solution to the blue-eyed islanders puzzle.)
Some time ago on this blog, I collected a family of well-known results in mathematics that were proven by contradiction, and specifically by a type of argument that I called the “no self-defeating object” argument; that any object that was so ridiculously overpowered that it could be used to “defeat” its own existence, could not actually exist. Many basic results in mathematics can be phrased in this manner: not only Cantor’s theorem, but Euclid’s theorem on the infinitude of primes, Gödel’s incompleteness theorem, or the conclusion (from Russell’s paradox) that the class of all sets cannot itself be a set.
I presented each of these arguments in the usual “proof by contradiction” manner; I made the counterfactual hypothesis that the impossibly overpowered object existed, and then used this to eventually derive a contradiction. Mathematically, there is nothing wrong with this reasoning, but because the argument spends almost its entire duration inside the bizarre counterfactual universe caused by an impossible hypothesis, readers who are not experienced with counterfactual thinking may view these arguments with unease.
It was pointed out to me, though (originally with regards to Euclid’s theorem, but the same point in fact applies to the other results I presented) that one can pull a large fraction of each argument out of this counterfactual world, so that one can see most of the argument directly, without the need for any intrinsically impossible hypotheses. This is done by converting the “no self-defeating object” argument into a logically equivalent “any object can be defeated” argument, with the former then being viewed as an immediate corollary of the latter. This change is almost trivial to enact (it is often little more than just taking the contrapositive of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.
For instance, consider the very first no-self-defeating result presented in the previous post:
Proposition 1 (No largest natural number). There does not exist a natural number $N$ that is larger than all the other natural numbers.
This is formulated in the “no self-defeating object” formulation. But it has a logically equivalent “any object can be defeated” form:
Proposition 1′. Given any natural number $N$, one can find another natural number $N'$ which is larger than $N$.

Proof. Take $N' := N + 1$.

While Proposition 1 and Proposition 1′ are logically equivalent to each other, note one key difference: Proposition 1′ can be illustrated with examples (e.g. take $N = 100$, so that the proof gives $N' = 101$), whilst Proposition 1 cannot (since there is, after all, no such thing as a largest natural number). So there is a sense in which Proposition 1′ is more “non-counterfactual” or “constructive” than the “counterfactual” Proposition 1.
In a similar spirit, Euclid’s theorem (which we give using the numbering from the previous post),
Proposition 3. There are infinitely many primes.
can be recast in “all objects can be defeated” form as
Proposition 3′. Let $p_1, \ldots, p_n$ be a collection of primes. Then there exists a prime $q$ which is distinct from any of the primes $p_1, \ldots, p_n$.

Proof. Take $q$ to be any prime factor of $p_1 \ldots p_n + 1$ (for instance, one could take the smallest prime factor, if one wished to be completely concrete). Since $p_1 \ldots p_n + 1$ is not divisible by any of the primes $p_1, \ldots, p_n$, $q$ must be distinct from all of these primes.

One could argue that there was a slight use of proof by contradiction in the proof of Proposition 3′ (because one had to briefly entertain and then rule out the counterfactual possibility that $q$ was equal to one of the $p_1, \ldots, p_n$), but the proposition itself is not inherently counterfactual, as it does not make as patently impossible a hypothesis as a finite enumeration of the primes. Incidentally, it can be argued that the proof of Proposition 3′ is closer in spirit to Euclid’s original proof of his theorem, than the proof of Proposition 3 that is usually given today. Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say $2, 3, 5$, and it will actually exhibit a prime not in that list (in this case, $31$). The same cannot be said of Proposition 3, despite the logical equivalence of the two statements.
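In code, the constructive content of Proposition 3′ is just trial division (a sketch of ours, not from the post):

```python
# A concrete run of Proposition 3': given a finite list of primes,
# the smallest prime factor of their product plus one is a new prime.
def smallest_prime_factor(m):
    """Return the smallest prime factor of m >= 2 by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m  # m itself is prime

primes = [2, 3, 5]
q = smallest_prime_factor(2 * 3 * 5 + 1)
print(q)  # 31, a prime not in the original list
```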
[Note: the article below may make more sense if one first reviews the previous blog post on the “no self-defeating object”. For instance, the section and theorem numbering here is deliberately chosen to match that of the preceding post.]
Let $[a,b]$ be a compact interval of positive length (thus $-\infty < a < b < +\infty$). Recall that a function $F: [a,b] \to \mathbb{R}$ is said to be differentiable at a point $x \in [a,b]$ if the limit
$$F'(x) := \lim_{y \to x; y \in [a,b] \setminus \{x\}} \frac{F(y) - F(x)}{y - x} \qquad (1)$$
exists. In that case, we call $F'(x)$ the strong derivative, classical derivative, or just derivative for short, of $F$ at $x$. We say that $F$ is everywhere differentiable, or differentiable for short, if it is differentiable at all points $x \in [a,b]$, and differentiable almost everywhere if it is differentiable at almost every point $x \in [a,b]$. If $F$ is differentiable everywhere and its derivative $F'$ is continuous, then we say that $F$ is continuously differentiable.
Remark 1 Much later in this sequence, when we cover the theory of distributions, we will see the notion of a weak derivative or distributional derivative, which can be applied to a much rougher class of functions and is in many ways more suitable than the classical derivative for doing “Lebesgue” type analysis (i.e. analysis centred around the Lebesgue integral, and in particular allowing functions to be uncontrolled, infinite, or even undefined on sets of measure zero). However, for now we will stick with the classical approach to differentiation.
Exercise 2 If $F: [a,b] \to \mathbb{R}$ is everywhere differentiable, show that $F$ is continuous and $F'$ is measurable. If $F$ is almost everywhere differentiable, show that the (almost everywhere defined) function $F'$ is measurable (i.e. it is equal to an everywhere defined measurable function on $[a,b]$ outside of a null set), but give an example to demonstrate that $F'$ need not be continuous.
Exercise 3 Give an example of a function $F: [a,b] \to \mathbb{R}$ which is everywhere differentiable, but not continuously differentiable. (Hint: choose an $F$ that vanishes quickly at some point, say at the origin, but which also oscillates rapidly near that point.)
In single-variable calculus, the operations of integration and differentiation are connected by a number of basic theorems, starting with Rolle’s theorem.
Theorem 4 (Rolle’s theorem) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbb{R}$ be a differentiable function such that $F(a) = F(b)$. Then there exists $x \in (a,b)$ such that $F'(x) = 0$.
Proof: By subtracting a constant from $F$ (which does not affect differentiability or the derivative) we may assume that $F(a) = F(b) = 0$. If $F$ is identically zero then the claim is trivial, so assume that $F$ is non-zero somewhere. By replacing $F$ with $-F$ if necessary, we may assume that $F$ is positive somewhere, thus $\sup_{x \in [a,b]} F(x) > 0$. On the other hand, as $F$ is continuous and $[a,b]$ is compact, $F$ must attain its maximum somewhere, thus there exists $x_0 \in [a,b]$ such that $F(x_0) \geq F(x)$ for all $x \in [a,b]$. Then $F(x_0)$ must be positive and so $x_0$ cannot equal either $a$ or $b$, and thus must lie in the interior $(a,b)$. From the right limit of (1) we see that $F'(x_0) \leq 0$, while from the left limit we have $F'(x_0) \geq 0$. Thus $F'(x_0) = 0$ and the claim follows.
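As a small numerical illustration of Rolle's theorem (our own example, not from the notes): the function $F(x) = \sin(x)$ satisfies $F(0) = F(\pi) = 0$, so its derivative $F' = \cos$ must vanish somewhere in $(0, \pi)$; one can locate the point by bisection.

```python
# Rolle's theorem, numerically: F = sin vanishes at 0 and pi, so
# F' = cos has a zero in (0, pi).  Bisection finds it.
from math import cos, pi

def bisect(g, lo, hi, tol=1e-12):
    """Find a zero of g in [lo, hi], assuming g changes sign there."""
    assert g(lo) * g(hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

c = bisect(cos, 0.0, pi)
print(c)  # approximately pi/2, where F'(c) = cos(c) = 0
```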
Remark 5 Observe that the same proof also works if $F$ is only differentiable in the interior $(a,b)$ of the interval $[a,b]$, so long as it is continuous all the way up to the boundary of $[a,b]$.
Exercise 6 Give an example to show that Rolle’s theorem can fail if $F$ is merely assumed to be almost everywhere differentiable, even if one adds the additional hypothesis that $F$ is continuous. This example illustrates that everywhere differentiability is a significantly stronger property than almost everywhere differentiability. We will see further evidence of this fact later in these notes; there are many theorems that assert in their conclusion that a function is almost everywhere differentiable, but few that manage to conclude everywhere differentiability.
Remark 7 It is important to note that Rolle’s theorem only works in the real scalar case when $F$ is real-valued, as it relies heavily on the least upper bound property for the range $\mathbb{R}$. If, for instance, we consider complex-valued scalar functions $F: [a,b] \to \mathbb{C}$, then the theorem can fail; for instance, the function $F: [0, 2\pi] \to \mathbb{C}$ defined by $F(x) := e^{ix} - 1$ vanishes at both endpoints and is differentiable, but its derivative $F'(x) = ie^{ix}$ is never zero. (Rolle’s theorem does imply that the real and imaginary parts of the derivative $F'$ both vanish somewhere, but the problem is that they don’t simultaneously vanish at the same point.) Similar remarks apply to functions taking values in a finite-dimensional vector space, such as $\mathbb{R}^2$.
One can easily amplify Rolle’s theorem to the mean value theorem:
Corollary 8 (Mean value theorem) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbb{R}$ be a differentiable function. Then there exists $x \in (a,b)$ such that $F'(x) = \frac{F(b) - F(a)}{b - a}$.
Proof: Apply Rolle’s theorem to the function $x \mapsto F(x) - \frac{F(b) - F(a)}{b - a} (x - a)$.
Remark 9 As Rolle’s theorem is only applicable to real scalar-valued functions, the more general mean value theorem is also only applicable to such functions.
Exercise 10 (Uniqueness of antiderivatives up to constants) Let $[a,b]$ be a compact interval of positive length, and let $F: [a,b] \to \mathbb{R}$ and $G: [a,b] \to \mathbb{R}$ be differentiable functions. Show that $F'(x) = G'(x)$ for every $x \in [a,b]$ if and only if $F(x) = G(x) + C$ for some constant $C$ and all $x \in [a,b]$.
We can use the mean value theorem to deduce one of the fundamental theorems of calculus:
Theorem 11 (Second fundamental theorem of calculus) Let $F: [a,b] \to \mathbb{R}$ be a differentiable function, such that $F'$ is Riemann integrable. Then the Riemann integral $\int_a^b F'(x)\ dx$ of $F'$ is equal to $F(b) - F(a)$. In particular, we have $\int_a^b F'(x)\ dx = F(b) - F(a)$ whenever $F$ is continuously differentiable.
Proof: Let $\varepsilon > 0$. By the definition of Riemann integrability, there exists a finite partition $a = t_0 < t_1 < \ldots < t_k = b$ such that
$$\left| \sum_{j=1}^k F'(t_j^*) (t_j - t_{j-1}) - \int_a^b F'(x)\ dx \right| \leq \varepsilon$$
for every choice of $t_j^* \in [t_{j-1}, t_j]$.
Fix this partition. From the mean value theorem, for each $1 \leq j \leq k$ one can find $t_j^* \in [t_{j-1}, t_j]$ such that
$$F'(t_j^*) (t_j - t_{j-1}) = F(t_j) - F(t_{j-1})$$
and thus by telescoping series
$$\left| F(b) - F(a) - \int_a^b F'(x)\ dx \right| \leq \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, the claim follows.
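As a numerical sanity check of Theorem 11 (our own sketch, not part of the notes): for $F(x) = x^3$ on $[0,2]$, a fine Riemann sum of $F'(x) = 3x^2$ should approach $F(2) - F(0) = 8$.

```python
# Second fundamental theorem of calculus, numerically: a Riemann sum of
# F'(x) = 3x^2 over [0, 2] approximates F(2) - F(0) = 8 for F(x) = x^3.
def riemann_sum(g, a, b, n):
    """Midpoint Riemann sum of g over [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

approx = riemann_sum(lambda x: 3 * x * x, 0.0, 2.0, 100000)
print(approx)  # close to 8
```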
Remark 12 Even though the mean value theorem only holds for real scalar functions, the fundamental theorem of calculus holds for complex or vector-valued functions, as one can simply apply that theorem to each component of that function separately.
Of course, we also have the other half of the fundamental theorem of calculus:
Theorem 13 (First fundamental theorem of calculus) Let $[a,b]$ be a compact interval of positive length. Let $f: [a,b] \to \mathbb{R}$ be a continuous function, and let $F: [a,b] \to \mathbb{R}$ be the indefinite integral $F(x) := \int_a^x f(t)\ dt$. Then $F$ is differentiable on $[a,b]$, with derivative $F'(x) = f(x)$ for all $x \in [a,b]$. In particular, $F$ is continuously differentiable.
Proof: It suffices to show that
$$\lim_{h \to 0^+} \frac{F(x+h) - F(x)}{h} = f(x)$$
for all $x \in [a,b)$, and
$$\lim_{h \to 0^-} \frac{F(x+h) - F(x)}{h} = f(x)$$
for all $x \in (a,b]$. After a change of variables, we can write
$$\frac{F(x+h) - F(x)}{h} = \int_0^1 f(x + ht)\ dt$$
for any $x \in [a,b)$ and any sufficiently small $h > 0$, or any $x \in (a,b]$ and any sufficiently small $h < 0$. As $f$ is continuous, the function $t \mapsto f(x + ht)$ converges uniformly to $f(x)$ on $[0,1]$ as $h \to 0$ (keeping $x$ fixed). As the interval $[0,1]$ is bounded, $\int_0^1 f(x + ht)\ dt$ thus converges to $\int_0^1 f(x)\ dt = f(x)$, and the claim follows.
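A quick numerical sketch of Theorem 13 (ours, not from the notes): for $f = \cos$ on $[0,2]$, the indefinite integral $F$ (computed here by quadrature) has difference quotients close to $f(x)$.

```python
# First fundamental theorem of calculus, numerically: the difference
# quotients of F(x) = int_0^x cos(t) dt approach f(x) = cos(x).
from math import cos

def F(x, n=20000):
    """Midpoint-rule approximation of the integral of cos over [0, x]."""
    h = x / n
    return h * sum(cos((i + 0.5) * h) for i in range(n))

x, h = 1.0, 1e-4
quotient = (F(x + h) - F(x)) / h
print(quotient, cos(x))  # the quotient is close to f(x) = cos(1)
```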
Corollary 14 (Differentiation theorem for continuous functions) Let $f: [a,b] \to \mathbb{R}$ be a continuous function on a compact interval. Then we have
$$\lim_{h \to 0^+} \frac{1}{h} \int_{[x, x+h]} f(t)\ dt = f(x)$$
for all $x \in [a,b)$,
$$\lim_{h \to 0^+} \frac{1}{h} \int_{[x-h, x]} f(t)\ dt = f(x)$$
for all $x \in (a,b]$, and thus
$$\lim_{h \to 0^+} \frac{1}{2h} \int_{[x-h, x+h]} f(t)\ dt = f(x)$$
for all $x \in (a,b)$.
In these notes we explore the question of the extent to which these theorems continue to hold when the differentiability or integrability conditions on the various functions are relaxed. Among the results proven in these notes are
- The Lebesgue differentiation theorem, which roughly speaking asserts that Corollary 14 continues to hold for almost every $x$ if $f$ is merely absolutely integrable, rather than continuous;
- A number of differentiation theorems, which assert for instance that monotone, Lipschitz, or bounded variation functions in one dimension are almost everywhere differentiable; and
- The second fundamental theorem of calculus for absolutely continuous functions.
The material here is loosely based on Chapter 3 of Stein-Shakarchi.
This week I once again gave some public lectures on the cosmic distance ladder in astronomy, once at Stanford and once at UCLA. The slides I used were similar to the “version 3.0” slides I used for the same talk last year in Australia and elsewhere, but the images have been updated (and the permissions for copyrighted images secured), and some additional data has also been placed on them. I am placing these slides here on this blog, in Powerpoint format and also in PDF format. (Video for the UCLA talk should also be available on the UCLA web site at some point; I’ll add a link when it becomes available.)
These slides have evolved over a period of almost five years, particularly with regards to the imagery, but this is likely to be close to the final version. Here are some of the older iterations of the slides:
- (Version 1.0, 2006) A text-based version of the slides, together with accompanying figures.
- (Version 2.0, 2007) First conversion to Powerpoint format.
- (Version 3.0, 2009) Second conversion to Powerpoint format, with completely new imagery and a slightly different arrangement.
- (Version 4.0, 2010) Images updated from the previous version, with copyright permissions secured.
- (Version 4.1, 2010) The version used for the UCLA talk, with some additional data and calculations added.
- (Version 4.2, 2010) A slightly edited version, incorporating some corrections and feedback.
- (Version 4.3, 2017) Some further corrections.
I have found that working on and polishing a single public lecture over a period of several years has been very rewarding and educational, especially given that I had very little public speaking experience at the beginning; there are several other mathematicians I know of who are also putting some effort into giving good talks that communicate mathematics and science to the general public, but I think there could potentially be many more such talks like this.
A note regarding copyright: I am happy to have the text or layout of these slides used as the basis for other presentations, so long as the source is acknowledged. However, some of the images in these slides are copyrighted by others, and permission by the copyright holders was granted only for the display of the slides in their current format. (The list of such images is given at the end of the slides.) So if you wish to adapt the slides for your own purposes, you may need to use slightly different imagery.
(Update, October 11: Version 4.2 uploaded, and notice on copyright added.)
(Update, October 20: Some photos from the UCLA talk are available here.)
(Update, October 25: Video from the talk is available on YouTube and on iTunes.)
The following question came up in my 245A class today:
Is it possible to express a non-closed interval in the real line, such as [0,1), as a countable union of disjoint closed intervals?
I was not able to answer the question immediately, but by the end of the class some of the students had come up with an answer. It is actually a nice little test of one’s basic knowledge of real analysis, so I am posing it here as well for anyone else who is interested. Below the fold is the answer to the question (whited out; one has to highlight the text in order to read it).
If one has a sequence $x_1, x_2, x_3, \ldots$ of real numbers $x_n$, it is unambiguous what it means for that sequence to converge to a limit $x$: it means that for every $\varepsilon > 0$, there exists an $N$ such that $|x_n - x| \le \varepsilon$ for all $n \ge N$. Similarly for a sequence $z_1, z_2, z_3, \ldots$ of complex numbers $z_n$ converging to a limit $z$.
More generally, if one has a sequence $v_1, v_2, v_3, \ldots$ of $d$-dimensional vectors $v_n$ in a real vector space ${\bf R}^d$ or complex vector space ${\bf C}^d$, it is also unambiguous what it means for that sequence to converge to a limit $v \in {\bf R}^d$ or $v \in {\bf C}^d$; it means that for every $\varepsilon > 0$, there exists an $N$ such that $\|v_n - v\| \le \varepsilon$ for all $n \ge N$. Here, the norm $\|v\|$ of a vector $v = (v_1, \ldots, v_d)$ can be chosen to be the Euclidean norm $\|v\|_2 := (\sum_{j=1}^d |v_j|^2)^{1/2}$, the supremum norm $\|v\|_\infty := \sup_{1 \le j \le d} |v_j|$, or any other number of norms, but for the purposes of convergence, these norms are all equivalent; a sequence of vectors converges in the Euclidean norm if and only if it converges in the supremum norm, and similarly for any other two norms on the finite-dimensional space ${\bf R}^d$ or ${\bf C}^d$.
If however one has a sequence $f_1, f_2, f_3, \ldots$ of functions $f_n: X \rightarrow {\bf R}$ or $f_n: X \rightarrow {\bf C}$ on a common domain $X$, and a putative limit $f: X \rightarrow {\bf R}$ or $f: X \rightarrow {\bf C}$, there can now be many different ways in which the sequence $f_n$ may or may not converge to the limit $f$. (One could also consider convergence of functions $f_n: X_n \rightarrow {\bf C}$ on different domains $X_n$, but we will not discuss this issue at all here.) This is in contrast with the situation with scalars $x_n$ or $z_n$ (which corresponds to the case when $X$ is a single point) or vectors $v_n$ (which corresponds to the case when $X$ is a finite set such as $\{1, \ldots, d\}$). Once $X$ becomes infinite, the functions $f_n$ acquire an infinite number of degrees of freedom, and this allows them to approach $f$ in any number of inequivalent ways.
What different types of convergence are there? As an undergraduate, one learns of the following two basic modes of convergence:
- We say that $f_n$ converges to $f$ pointwise if, for every $x \in X$, $f_n(x)$ converges to $f(x)$. In other words, for every $\varepsilon > 0$ and $x \in X$, there exists $N$ (that depends on both $\varepsilon$ and $x$) such that $|f_n(x) - f(x)| \le \varepsilon$ whenever $n \ge N$.
- We say that $f_n$ converges to $f$ uniformly if, for every $\varepsilon > 0$, there exists $N$ such that for every $n \ge N$, $|f_n(x) - f(x)| \le \varepsilon$ for every $x \in X$. The difference between uniform convergence and pointwise convergence is that with the former, the time $N$ at which $f_n(x)$ must be permanently $\varepsilon$-close to $f(x)$ is not permitted to depend on $x$, but must instead be chosen uniformly in $x$.
Uniform convergence implies pointwise convergence, but not conversely. A typical example: the functions $f_n: {\bf R} \rightarrow {\bf R}$ defined by $f_n(x) := x/n$ converge pointwise to the zero function $f \equiv 0$, but not uniformly.
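As a quick numerical sanity check (my own illustration, not part of the notes), one can see both halves of this example in a few lines: at each fixed point the values $x/n$ shrink below any tolerance, but for every $n$ there is a point (namely $x = n$) where $f_n$ still equals $1$.

```python
# A small numerical sketch (illustration only) of the example
# f_n(x) = x/n: pointwise convergence to zero, but not uniform.

def f(n, x):
    """The n-th function in the sequence, f_n(x) := x / n."""
    return x / n

# Pointwise: at each fixed x, |f_n(x)| is eventually below any eps > 0;
# the threshold N ~ |x|/eps depends on the point x.
assert f(10**6, 1000.0) < 1e-2

# Not uniform: for every n there is a point (x = n) where |f_n(x)| = 1,
# so sup_x |f_n(x) - 0| never drops below 1 (in fact it is infinite).
assert all(f(n, float(n)) == 1.0 for n in (1, 10, 100))
```

The second assertion is the obstruction to uniformity: no single $N$ works for all $x$ simultaneously.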
However, pointwise and uniform convergence are only two of a great many modes of convergence that are of importance in analysis. We will not attempt to exhaustively enumerate these modes here (but see this Wikipedia page, and see also these 245B notes on strong and weak convergence). We will, however, discuss some of the modes of convergence that arise from measure theory, when the domain $X$ is equipped with the structure of a measure space $(X, {\mathcal B}, \mu)$, and the functions $f_n$ (and their limit $f$) are measurable with respect to this space. In this context, we have some additional modes of convergence:
- We say that $f_n$ converges to $f$ pointwise almost everywhere if, for ($\mu$-)almost every $x \in X$, $f_n(x)$ converges to $f(x)$.
- We say that $f_n$ converges to $f$ uniformly almost everywhere, essentially uniformly, or in $L^\infty$ norm if, for every $\varepsilon > 0$, there exists $N$ such that for every $n \ge N$, $|f_n(x) - f(x)| \le \varepsilon$ for $\mu$-almost every $x \in X$.
- We say that $f_n$ converges to $f$ almost uniformly if, for every $\varepsilon > 0$, there exists an exceptional set $E \in {\mathcal B}$ of measure $\mu(E) \le \varepsilon$ such that $f_n$ converges uniformly to $f$ on the complement of $E$.
- We say that $f_n$ converges to $f$ in $L^1$ norm if the quantity $\|f_n - f\|_{L^1(\mu)} := \int_X |f_n(x) - f(x)|\ d\mu$ converges to $0$ as $n \rightarrow \infty$.
- We say that $f_n$ converges to $f$ in measure if, for every $\varepsilon > 0$, the measures $\mu(\{x \in X: |f_n(x) - f(x)| \ge \varepsilon\})$ converge to zero as $n \rightarrow \infty$.
Observe that each of these five modes of convergence is unaffected if one modifies $f_n$ or $f$ on a set of measure zero. In contrast, the pointwise and uniform modes of convergence can be affected if one modifies $f_n$ or $f$ even on a single point.
Remark 1 In the context of probability theory, in which $f_n$ and $f$ are interpreted as random variables, convergence in $L^1$ norm is often referred to as convergence in mean, pointwise convergence almost everywhere is often referred to as almost sure convergence, and convergence in measure is often referred to as convergence in probability.
Exercise 2 (Linearity of convergence) Let $(X, {\mathcal B}, \mu)$ be a measure space, let $f_n, g_n: X \rightarrow {\bf C}$ be sequences of measurable functions, and let $f, g: X \rightarrow {\bf C}$ be measurable functions.
- Show that $f_n$ converges to $f$ along one of the above seven modes of convergence if and only if $|f_n - f|$ converges to $0$ along the same mode.
- If $f_n$ converges to $f$ along one of the above seven modes of convergence, and $g_n$ converges to $g$ along the same mode, show that $f_n + g_n$ converges to $f + g$ along the same mode, and that $c f_n$ converges to $c f$ along the same mode for any $c \in {\bf C}$.
- (Squeeze test) If $f_n$ converges to $0$ along one of the above seven modes, and $|g_n| \le f_n$ pointwise for each $n$, show that $g_n$ converges to $0$ along the same mode.
We have some easy implications between modes:
Exercise 3 (Easy implications) Let $(X, {\mathcal B}, \mu)$ be a measure space, and let $f_n: X \rightarrow {\bf C}$ and $f: X \rightarrow {\bf C}$ be measurable functions.
- If $f_n$ converges to $f$ uniformly, then $f_n$ converges to $f$ pointwise.
- If $f_n$ converges to $f$ uniformly, then $f_n$ converges to $f$ in $L^\infty$ norm. Conversely, if $f_n$ converges to $f$ in $L^\infty$ norm, then $f_n$ converges to $f$ uniformly outside of a null set (i.e. there exists a null set $E$ such that the restriction $f_n|_{X \backslash E}$ of $f_n$ to the complement of $E$ converges to the restriction $f|_{X \backslash E}$ of $f$).
- If $f_n$ converges to $f$ in $L^\infty$ norm, then $f_n$ converges to $f$ almost uniformly.
- If $f_n$ converges to $f$ almost uniformly, then $f_n$ converges to $f$ pointwise almost everywhere.
- If $f_n$ converges to $f$ pointwise, then $f_n$ converges to $f$ pointwise almost everywhere.
- If $f_n$ converges to $f$ in $L^1$ norm, then $f_n$ converges to $f$ in measure.
- If $f_n$ converges to $f$ almost uniformly, then $f_n$ converges to $f$ in measure.
The reader is encouraged to draw a diagram that summarises the logical implications between the seven modes of convergence that the above exercise describes.
We give four key examples that distinguish between these modes, in the case when $X$ is the real line ${\bf R}$ with Lebesgue measure. The first three of these examples were already introduced in the previous set of notes.
Example 4 (Escape to horizontal infinity) Let $f_n := 1_{[n,n+1]}$. Then $f_n$ converges to zero pointwise (and thus, pointwise almost everywhere), but not uniformly, in $L^\infty$ norm, almost uniformly, in $L^1$ norm, or in measure.
Example 5 (Escape to width infinity) Let $f_n := \frac{1}{n} 1_{[0,n]}$. Then $f_n$ converges to zero uniformly (and thus, pointwise, pointwise almost everywhere, in $L^\infty$ norm, almost uniformly, and in measure), but not in $L^1$ norm.
Example 6 (Escape to vertical infinity) Let $f_n := n 1_{[\frac{1}{n}, \frac{2}{n}]}$. Then $f_n$ converges to zero pointwise (and thus, pointwise almost everywhere) and almost uniformly (and hence in measure), but not uniformly, in $L^\infty$ norm, or in $L^1$ norm.
Example 7 (Typewriter sequence) Let $f_n: {\bf R} \rightarrow {\bf R}$ be defined by the formula $f_n := 1_{[\frac{n-2^k}{2^k}, \frac{n-2^k+1}{2^k}]}$ whenever $k \ge 0$ and $2^k \le n < 2^{k+1}$. This is a sequence of indicator functions of intervals of decreasing length, marching across the unit interval $[0,1]$ over and over again. Then $f_n$ converges to zero in measure and in $L^1$ norm, but not pointwise almost everywhere (and hence also not pointwise, not almost uniformly, nor in $L^\infty$ norm, nor uniformly).
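One can verify the typewriter behaviour numerically; the sketch below (my own illustration, not from the notes, and using half-open dyadic intervals for simplicity) computes the interval supporting $f_n$, checks that the $L^1$ norms shrink geometrically, and confirms that a fixed point of $[0,1)$ is revisited once per "sweep", so the values at that point never settle down.

```python
# Numerical sketch (illustration only) of the typewriter sequence:
# for 2^k <= n < 2^(k+1), f_n is the indicator of the dyadic interval
# [ (n - 2^k)/2^k , (n - 2^k + 1)/2^k ].

def typewriter_interval(n):
    """Return the endpoints (a, b) of the interval supporting f_n."""
    k = n.bit_length() - 1          # the largest k with 2^k <= n
    a = (n - 2**k) / 2**k
    return a, a + 1 / 2**k

def l1_typewriter(n):
    """||f_n||_{L^1} is just the length 1/2^k of the supporting interval."""
    a, b = typewriter_interval(n)
    return b - a

# L^1 norms (and hence the measures in the "in measure" definition)
# tend to zero along the sequence.
assert l1_typewriter(2**10) == 2.0**-10

# Yet every point of [0,1) is covered once per sweep k = 0, 1, 2, ...,
# so f_n(x) = 1 for infinitely many n: no pointwise a.e. convergence.
x = 0.3
hits = [n for n in range(1, 2**8)
        if typewriter_interval(n)[0] <= x < typewriter_interval(n)[1]]
assert len(hits) == 8               # one hit for each sweep k = 0, ..., 7
```

The tension between the two assertions is the whole point of the example: smallness of the supports forces convergence in measure and in $L^1$ norm, while the endless sweeping destroys pointwise convergence everywhere on $[0,1)$.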
Remark 8 The $L^\infty$ norm $\|f\|_{L^\infty(\mu)}$ of a measurable function $f: X \rightarrow {\bf C}$ is defined to be the infimum of all the quantities $M \in [0, +\infty]$ that are essential upper bounds for $f$ in the sense that $|f(x)| \le M$ for almost every $x$. Then $f_n$ converges to $f$ in $L^\infty$ norm if and only if $\|f_n - f\|_{L^\infty(\mu)} \rightarrow 0$ as $n \rightarrow \infty$. The $L^1$ and $L^\infty$ norms are part of the larger family of $L^p$ norms, which we will study in more detail in 245B.
One particular advantage of $L^1$ convergence is that, in the case when the $f_n$ are absolutely integrable, it implies convergence of the integrals,

$\displaystyle \int_X f_n\ d\mu \rightarrow \int_X f\ d\mu,$

as one sees from the triangle inequality $|\int_X f_n\ d\mu - \int_X f\ d\mu| \le \|f_n - f\|_{L^1(\mu)}$. Unfortunately, none of the other modes of convergence automatically imply this convergence of the integral, as the above examples show.
The purpose of these notes is to compare these modes of convergence with each other. Unfortunately, the relationship between these modes is not particularly simple; unlike the situation with pointwise and uniform convergence, one cannot simply rank these modes in a linear order from strongest to weakest. This is ultimately because the different modes react in different ways to the three “escape to infinity” scenarios described above, as well as to the “typewriter” behaviour when a single set is “overwritten” many times. On the other hand, if one imposes some additional assumptions to shut down one or more of these escape to infinity scenarios, such as a finite measure hypothesis or a uniform integrability hypothesis, then one can obtain some additional implications between the different modes.