
In view of the sustained interest in new polymath projects, Tim Gowers, Gil Kalai, Michael Nielsen, and I have set up a new blog to propose, plan, and run these projects.  This blog is not intended to hold a “monopoly” on the polymath enterprise, but merely to serve as a convenient central location for discussing and running such projects, should one choose to use it.

We have started the ball rolling on this blog with some proposed rules for running a polymath, a mock-up of what a research thread and a discussion thread for a project would look like, two new proposals for the next polymath project (deterministic location of primes, and a problem of Michael Boshernitzan), and a thread on how one should select the next project (which we intend to do in a few months, with a tentative plan to actually start the next project around October).  Please come give the blog a visit, and contribute your thoughts and suggestions, though it should be noted that we are not planning to start a new polymath project right away, but merely to plan for the next one for now.

[Update, July 28: Actually, this may change.  There has already been enough progress on the “deterministic location of primes” project that a discussion thread and wiki page have been created, and this polymath project may in fact formally launch much sooner than anticipated, perhaps in a matter of weeks.  However, much work still needs to be done in laying the groundwork of this project, in particular developing preparatory materials on the wiki and elsewhere to allow participants to get up to speed.]

On a somewhat related note, now that we have a dedicated blog for these sorts of polymath projects, I am thinking of revisiting the ratings system for comments that I recently turned on here.   I guess this would be a good question to poll the readers on:

Note that the ratings system is somewhat customisable: see this wordpress page for details.

My two-volume book, “Poincaré’s legacies: pages from year two of a mathematical blog“, which was based on the 2008 posts of this blog, has now been published by the American Mathematical Society.

[Update, Aug 3: Actually, only the first volume has been published so far.  The second volume of this book will be available on Aug 10.]

The mini-polymath project to find solutions to Problem 6 of the 2009 IMO is still ongoing, but I thought that, while the memories of the experience are still fresh, it would be a good time to open a parallel thread to collect the impressions that participants and observers had of how the project was conducted, how successful it was, and how it (or future projects) could be made to run more smoothly.

Just to get the ball rolling, here are some impressions I got as a (rather passive) moderator:

  1. There is no shortage of potential interest in polymath projects. I was impressed by how the project could round up a dozen interested and qualified participants in a matter of hours; this is one particular strength of the polymath paradigm.  Of course, it helped that this particular project was elementary, and was guaranteed to have an elementary (and relatively short) solution.  Nevertheless, the availability of volunteers does bode well for future projects of this type.
  2. A wiki needs to be set up as soon as possible. The wiki for polymath1 was an enormously valuable resource, once it was set up.  I had naively thought that the mini-polymath1 project would be short enough that a wiki was not necessary, but now I see that it would have come in handy for organising and storing the arguments, strategies, insights, and ideas that arose through the linear blog thread format, but which were difficult to summarise in that format.  (I have belatedly set up a wiki for this project here.)  For the next polymath project (I have none planned yet, but can imagine that one would eventually arise), I will try to ensure a wiki is available early on.
  3. There is an increasing temptation to work offline as the project develops. In the rules of the polymath projects to date, the idea is for participants to avoid working “offline” for too long, instead reporting all partial progress and thoughts on the blog and/or the wiki as it occurs.  This ideal seems to be adhered to well in the first phases of the project, when the “easy” but essential observations are being made, and the various “low-hanging fruits” are harvested, but at some point it seems that one needs to do more non-trivial amounts of computation and thought, which is still much easier to do offline than online.  It is possible that future technological advances (e.g. the concurrent editing capabilities of platforms such as Google Wave) may change this; a culture and etiquette of collaborative thinking might also evolve over time, much as mathematical research has already adapted to happily absorb new modes of communication, such as email.  In the meantime, though, I think one has to accommodate both online and offline modes of thinking to make a polymath project as successful as possible, avoiding degeneration into a mass of low-quality observations on one hand, and a fracturing into isolated research efforts on the other.
  4. Without leadership or organisation, the big picture can be obscured by chaos. As I was distracted by other tasks (for instance, flying from Bremen back to Los Angeles), and already knew of a solution to the problem, I adopted a laissez faire attitude to the task of moderating the project.  This worked to some extent, and there was certainly no shortage of ideas being tossed back and forth, arguments being checked and repaired, etc., but I think that with more active moderation, one could have had a bit more focus on longer-term strategy and vision than there was.  Perhaps in future projects one could be more explicit in the rules about encouraging this sort of perspective (for instance, in encouraging periodic summaries of the situation either on the blog or on the wiki).
  5. Polymath projects tend to generate multiple solutions to a problem, rather than a single solution. A single researcher will tend to focus on only one idea at a time, and is thus generally led to just a single solution (if that idea ends up being successful); but a polymath project is more capable of pursuing several independent lines of attack simultaneously, and so often when the breakthrough comes, one gets multiple solutions as a result.  This makes it harder to do direct comparison of success between polymath projects and individual efforts; from the (limited) data points available, I tentatively hypothesise that polymath projects tend to be slower, but obtain broader and deeper results, than what a dedicated individual effort would accomplish.
  6. Polymath progress is both very fast and very slow. I’ve noticed something paradoxical about these projects.  On the one hand, progress can be very fast in the sense that ideas get tossed out there at a rapid rate; also, with all the proofreaders, errors in arguments get picked up much quicker than when only one mathematician is involved.  On the other hand, it can take a while for an idea or insight obtained by one participant to be fully absorbed by the others, and sometimes the key observation can be drowned out by a large number of less important observations.  The process seems somewhat analogous to that of evolution and natural selection in biology; consider for instance how the meme of “try using induction”, which was the ultimately successful approach, had to first fight among competing memes such as “try using contradiction”, “try counting arguments”, “try topological arguments on the cube”, etc., before gaining widespread acceptance.  In contrast, an individual might through luck (or experience) hit upon the right approach (in this case, induction) very early on and end up with a solution far quicker than a polymath effort; conversely, he or she may select the wrong approach and end up wasting far more time than a polymath would.
  7. The wordpress blog format is adequate, but far from ideal. Technical problems (most notably, the spam filter, the inability to preview or edit comments [except by myself], and the (temporary) lack of nesting and automatic comment numbering) made things more frustrating and clunky than they should have been.  Adding the wiki helps with some of these problems, but definitely not all, especially since there is no integration between the blog and the wiki.  But the LaTeX support included in the WordPress blog is valuable, even if it does act up sometimes. Hopefully future technologies will provide better platforms for this sort of thing.  (As a temporary fix, one might set up a dedicated blog (or other forum) for polymath projects with customised code, rather than relying on existing hosted services.)

Well, participation in the IMO 2009 Q6 polymath project has exceeded my expectations; it appears that the collaborative effort has scored some partial successes toward a solution in the first 24 hours of its existence, but is not quite there yet.

As the thread has become overly long, I am following established polymath practice and starting a new thread here, hopefully to try to impose more order onto the chaos.  In order to assist this effort, some of the participants may wish to summarise some of the “state of play” so far.  (I am unable to do this due to being biased by the knowledge of a solution.)

The comment numbering system has not worked as smoothly as I had hoped, but it is still better than nothing.  I propose that comments in this thread start at 200, so as not to collide with the previous thread.

Of course, all the rules of the polymath exercise (as discussed in the previous thread) continue to apply.  I am pleased to see that virtually everyone participating has adhered to the collaborative spirit of the exercise, for instance in keeping criticism constructive and technical rather than pejorative and personal.

As an experiment, I have enabled one level of comment threading; thus one can reply to a specific comment in the thread below.  If one does so, I would request that you number the comment by subsection, for instance the first response to comment 234 would be numbered 234.1, the next one 234.2, and so forth.  There is a tradeoff here; the threading adds more structure to the comments, but the flip side of this is that it makes it harder to read off at a glance all the comments made since the last time one checked the thread.  Hopefully participants will be able to use their discretion as to when to reply and when not to. [Update, Jul 21: Threading should work properly now; I didn’t set it up properly before.]

The International Mathematical Olympiad (IMO) consists of a set of six problems, to be solved in two sessions of four and a half hours each.  Traditionally, the last problem (Problem 6) is significantly harder than the others.  Problem 6 of the 2009 IMO, which was given out last Wednesday, reads as follows:

Problem 6. Let a_1, a_2, \ldots, a_n be distinct positive integers and let M be a set of n-1 positive integers not containing s = a_1 +a_2 +\ldots+a_n. A grasshopper is to jump along the real axis, starting at the point 0 and making n jumps to the right with lengths a_1, a_2, \ldots , a_n in some order. Prove that the order can be chosen in such a way that the grasshopper never lands on any point in M.
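
As a concrete way to get a feel for the statement (and emphatically not a hint of any proof), here is a small brute-force verification in Python for very small {n}; this is only a sketch, the helper names are arbitrary, and it simply tries every ordering exhaustively:

```python
# Brute-force sanity check of the grasshopper statement for small n (a sketch only;
# it gives no hint of a proof, as it merely tries all n! orderings).
from itertools import combinations, permutations

def some_order_avoids(jumps, mines):
    """Return True if some ordering of the jumps avoids every point of `mines`."""
    for order in permutations(jumps):
        position, hit = 0, False
        for step in order:
            position += step
            if position in mines:
                hit = True
                break
        if not hit:
            return True
    return False

# Only mines strictly between 0 and s = sum(jumps) matter, and s itself is excluded
# by hypothesis, so it suffices to let M range over (n-1)-element subsets of {1,...,s-1}.
n, limit = 3, 7
for jumps in combinations(range(1, limit + 1), n):   # distinct positive jump lengths
    s = sum(jumps)
    for mines in combinations(range(1, s), n - 1):
        assert some_order_avoids(jumps, set(mines)), (jumps, mines)
print("no counterexample among the small cases tested")
```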

Of the 500-odd participants in the Olympiad, only a half-dozen or so managed to solve this problem completely (I don’t have precise statistics yet).  I myself worked it out about seven hours after first hearing about the problem, though I was preoccupied with other things for most of that time period.

I thought that this problem might make a nice “mini-Polymath” project to be solved collaboratively; it is significantly simpler than an unsolved research problem (in particular, being an IMO problem, it is already known that there is a solution, which uses only elementary methods), and the problem is amenable to the incremental, one-trivial-observation-at-a-time polymath approach.  So I would like to invite people to try solving the problem collaboratively on this blog, by posting one’s own comments, thoughts, and partial progress on the problem here.

To keep with the spirit of the polymath approach, I would however like to impose some ground rules:

  1. Everyone who does not already know the solution, and has not worked on the problem already, is welcome to jump in and participate, regardless of mathematical level.
    1. However, in order not to spoil the experiment, I would ask that those of you who have already found a solution not give any hint of the solution here until after the collaborative effort has found its solution.  (At that point, everyone is welcome to give out their solutions here.)  For instance, I will not be participating in the project except as a moderator.
    2. For similar reasons, I would ask that competitors at the actual 2009 IMO refrain from commenting substantively on the problem on this thread until after the collaborative effort has succeeded.  (I know this may require some significant restraint, but I suspect the problem will become too easy if we get comments along the lines of “This was a tough problem!  I tried X and Y and Z, and they didn’t work; I tried W also but ran out of time.  I hear that someone solved the problem using U, though.”  Of course, after the collaborative effort has succeeded, you are more than welcome to share your own experiences with the problem.)
  2. Participants should avoid explicitly searching for solutions to this problem on the internet (I would imagine spoilers would become available in a few days). If you do accidentally find such a solution online, I would ask that you recuse yourself from the rest of the collaboration, until after the collaborative effort has found a solution as well.  (In particular, posting links to a solution is strongly discouraged until after the collaborative effort has succeeded.)
    1. In a similar vein, extensive searching of the mathematical literature should only be undertaken if there is a consensus to do so on this thread.
  3. Participants are also discouraged from working too hard on this problem “offline”; if you have a potentially useful observation, you should share it with the other collaborators here, rather than develop it further in private, unless it is “obvious” how to carry the observation further.
    1. Actually, even “frivolous” observations can (and should) be posted on this thread, if there is even a small chance that some other participant may be able to find them helpful for solving the problem.
    2. Similarly, “failed” attempts at a solution are also worth posting; another participant may be able to salvage the argument, or else the failure can be used as a data point to eliminate some approaches to the problem, and to isolate more promising ones.
  4. Participants should view themselves as contributing to a team effort, rather than competing with each other (in contrast to the actual IMO).  The point is not to obtain bragging rights for being the first or quickest to solve the problem (which has, after all, already been solved), but instead to experimentally test the hypothesis that a mathematical problem can be solved by a massive collaboration, without requiring serious effort on behalf of any one of the participants.  (See Tim Gowers’ essay “is massively collaborative mathematics possible?” for more discussion.)
  5. To make it easier to reference comments in this thread, I would ask commenters to number their comments (so that the first comment is labeled 1., the second comment is labeled 2., and so forth).
  6. Unlike the actual IMO, there is no artificial time limit on this exercise, though if there is insufficient participation, or the collaborative effort grinds to a halt, I may at my discretion close the experiment and give out solutions after a reasonable time period.

This week I am in Bremen, where the 50th International Mathematical Olympiad is being held.  A number of former Olympians (Béla Bollobás, Tim Gowers, Laci Lovasz, Stas Smirnov, Jean-Christophe Yoccoz, and myself) were invited to give a short talk (20 minutes in length) at the celebratory event for this anniversary.  I chose to talk on a topic I have spoken about several times before, on “Structure and randomness in the prime numbers“.  Given the time constraints, there was a limit as to how much substance I could put into the talk; but I tried to describe, in very general terms, what we know about the primes, and what we suspect to be true, but cannot yet establish.  As I have mentioned in previous talks, the key problem is that we suspect the distribution of the primes to obey no significant patterns (other than “local” structure, such as having a strong tendency to be odd (which is local information at the 2 place), or obeying the prime number theorem (which is local information at the infinity place)), but we still do not have fully satisfactory tools for establishing the absence of a pattern. (This is in contrast with many types of Olympiad problems, where the key to solving a problem often lies in discovering the right pattern or structure in the problem to exploit.)

The PDF of the talk is here; I decided to try out the Beamer LaTeX package for a change.

Seven videotaped lectures from 1964 by Richard Feynman, given at Cornell, on “The Character of Physical Law“, have recently been put online (by Microsoft Research, through the purchase of these lectures from the Feynman estate by Bill Gates, as described in this interview with Gates), with a number of multimedia enhancements (external links, subtitles, etc.).  These lectures, intended for a general audience, broadly cover the same type of material that is in his famous lectures on physics.

I have just finished the first lecture, describing the history and impact of the law of gravitation as a model example of a physical law; I had of course known of Feynman’s reputation as an outstandingly clear, passionate, and entertaining lecturer, but it is quite something else to see that lecturing style directly.  The lectures are each about an hour long, but I recommend setting aside the time to view at least one of them, both for the substance of the lecture and for the presentation.  His introduction to the first lecture is surprisingly poetic:

The artists of the Renaissance said that man’s main concern should be for man.

And yet, there are some other things of interest in this world: even the artist appreciates sunsets, and ocean waves, and the march of the stars across the heavens.

And there is some reason, then, to talk of other things sometimes.

As we look into these things, we get an aesthetic pleasure directly on observation, but there’s also a rhythm and pattern between the phenomena of nature, which isn’t apparent to the eye, but only to the eye of analysis.

And it’s these rhythms and patterns that we call physical laws.

What I want to talk about in this series of lectures is the general characteristics of these physical laws.  …

The talk then shifts to the very concrete and specific topic of gravitation, though, as can be seen in this portion of the video:

Coincidentally, I covered some of the material in Feynman’s first lecture in my own talk on the cosmic distance ladder, though I was approaching the topic from a rather different angle, and with a less elegant presentation.

[Via the New York Times.  Note that some browsers may need a Silverlight extension in order to view the videos. Youtube versions of three of the seven lectures are available here.  Another, much more recent, series of videotaped lectures of Feynman is available here.]

[Update, July 15: Of particular interest to mathematicians is his second lecture “The relation of mathematics and physics”.  He draws several important contrasts between the reasoning of physics and the axiomatic reasoning of formal, settled mathematics, of the type found in textbooks; but it is quite striking to me that the reasoning of unsettled mathematics – recent fields in which the precise axioms and theoretical framework have not yet been fully formalised and standardised – matches Feynman’s description of physical reasoning in many ways. I suspect that Feynman’s impressions of mathematics as performed by mathematicians in 1964 may differ a little from the way mathematics is performed today.]

The Riemann zeta function {\zeta(s)}, defined for {\hbox{Re}(s)>1} by

\displaystyle  \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s} \ \ \ \ \ (1)

and then continued meromorphically to other values of {s} by analytic continuation, is a fundamentally important function in analytic number theory, as it is connected to the primes {p=2,3,5,\ldots} via the Euler product formula

\displaystyle  \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1} \ \ \ \ \ (2)

(for {\hbox{Re}(s) > 1}, at least), where {p} ranges over primes. (The equivalence between (1) and (2) is essentially the generating function version of the fundamental theorem of arithmetic.) The function {\zeta} has a pole at {1} and a number of zeroes {\rho}. A formal application of the factor theorem gives

\displaystyle  \zeta(s) = \frac{1}{s-1} \prod_\rho (s-\rho) \times \ldots \ \ \ \ \ (3)

where {\rho} ranges over zeroes of {\zeta}, and we will be vague about what the {\ldots} factor is, how to make sense of the infinite product, and exactly which zeroes of {\zeta} are involved in the product. Equating (2) and (3) and taking logarithms gives the formal identity

\displaystyle  - \log \zeta(s) = \sum_p \log(1 - \frac{1}{p^s}) = \log(s-1) - \sum_\rho \log(s-\rho) + \ldots; \ \ \ \ \ (4)

using the Taylor expansion

\displaystyle  \log(1 - \frac{1}{p^s}) = - \frac{1}{p^s} - \frac{1}{2 p^{2s}} - \frac{1}{3p^{3s}} - \ldots \ \ \ \ \ (5)

and differentiating the above identity in {s} yields the formal identity

\displaystyle  - \frac{\zeta'(s)}{\zeta(s)} = \sum_n \frac{\Lambda(n)}{n^s} = \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} + \ldots \ \ \ \ \ (6)

where {\Lambda(n)} is the von Mangoldt function, defined to be {\log p} when {n} is a power of a prime {p}, and zero otherwise. Thus we see that the behaviour of the primes (as encoded by the von Mangoldt function) is intimately tied to the distribution of the zeroes {\rho}. For instance, if we knew that the zeroes were far away from the axis {\hbox{Re}(s)=1}, then we would heuristically have

\displaystyle  \sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \frac{1}{it}

for real {t}. On the other hand, the integral test suggests that

\displaystyle  \sum_n \frac{1}{n^{1+it}} \approx \frac{1}{it}

and thus we see that {\frac{\Lambda(n)}{n}} and {\frac{1}{n}} have essentially the same (multiplicative) Fourier transform:

\displaystyle  \sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \sum_n \frac{1}{n^{1+it}}.

Inverting the Fourier transform (or performing a contour integral closely related to the inverse Fourier transform), one is led to the prime number theorem

\displaystyle  \sum_{n \leq x} \Lambda(n) \approx \sum_{n \leq x} 1.

In fact, the standard proof of the prime number theorem basically proceeds by making all of the above formal arguments precise and rigorous.
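
As a crude numerical illustration of this heuristic (a sketch only, and of course not a substitute for the rigorous argument), one can compute the sum {\sum_{n \leq x} \Lambda(n)} with a simple sieve and compare it against {x}; the helper name below is arbitrary:

```python
# Compare psi(x) = sum_{n <= x} Lambda(n) with x, using a basic sieve of Eratosthenes.
import math

def chebyshev_psi(x):
    """Return the sum of Lambda(n) over n <= x."""
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    psi = 0.0
    for p in range(2, x + 1):
        if is_prime[p]:
            for multiple in range(p * p, x + 1, p):
                is_prime[multiple] = False
            pk = p
            while pk <= x:          # Lambda(p^k) = log p for every prime power p^k <= x
                psi += math.log(p)
                pk *= p
    return psi

for x in (10**3, 10**4, 10**5, 10**6):
    psi = chebyshev_psi(x)
    print(f"x = {x:>8}: psi(x) = {psi:>12.1f}, psi(x)/x = {psi / x:.4f}")
```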

Unfortunately, we don’t know as much about the zeroes {\rho} of the zeta function (and hence, about the {\zeta} function itself) as we would like. The Riemann hypothesis (RH) asserts that all the zeroes (except for the “trivial” zeroes at the negative even numbers) lie on the critical line {\hbox{Re}(s)=1/2}; this hypothesis would make the error terms in the above proof of the prime number theorem significantly more accurate. Furthermore, the stronger GUE hypothesis asserts in addition to RH that the local distribution of these zeroes on the critical line should behave like the local distribution of the eigenvalues of a random matrix drawn from the gaussian unitary ensemble (GUE). I will not give a precise formulation of this hypothesis here, except to say that the adjective “local” in the context of distribution of zeroes {\rho} means something like “at scale {O(1/\log T)} when {\hbox{Im}(s) = O(T)}“.

Nevertheless, we do know some reasonably non-trivial facts about the zeroes {\rho} and the zeta function {\zeta}, either unconditionally, or assuming RH (or GUE). Firstly, there are no zeroes for {\hbox{Re}(s)>1} (as one can already see from the convergence of the Euler product (2) in this case) or for {\hbox{Re}(s)=1} (this is trickier, relying on (6) and the elementary observation that

\displaystyle  \hbox{Re}( 3\frac{\Lambda(n)}{n^{\sigma}} + 4\frac{\Lambda(n)}{n^{\sigma+it}} + \frac{\Lambda(n)}{n^{\sigma+2it}} ) = 2\frac{\Lambda(n)}{n^\sigma} (1+\cos(t \log n))^2

is non-negative for {\sigma > 1} and {t \in {\mathbb R}}); from the functional equation

\displaystyle  \pi^{-s/2} \Gamma(s/2) \zeta(s) = \pi^{-(1-s)/2} \Gamma((1-s)/2) \zeta(1-s)

(which can be viewed as a consequence of the Poisson summation formula, see e.g. my blog post on this topic) we know that there are no zeroes for {\hbox{Re}(s) \leq 0} either (except for the trivial zeroes at negative even integers, corresponding to the poles of the Gamma function). Thus all the non-trivial zeroes lie in the critical strip {0 < \hbox{Re}(s) < 1}.

We also know that there are infinitely many non-trivial zeroes, and can approximately count how many zeroes there are in any large bounded region of the critical strip. For instance, for large {T}, the number of zeroes {\rho} in this strip with {\hbox{Im}(\rho) = T+O(1)} is {O(\log T)}. This can be seen by applying (6) to {s = 2+iT} (say); the trivial zeroes at the negative even integers end up giving a contribution of {O(\log T)} to this sum (this is a heavily disguised variant of Stirling’s formula, as one can view the trivial zeroes as essentially being poles of the Gamma function), the {\frac{1}{s-1}} and {\ldots} terms end up being negligible (of size {O(1)}), and each non-trivial zero {\rho} contributes a term which has a non-negative real part, and furthermore has size comparable to {1} if {\hbox{Im}(\rho) = T+O(1)}. (Here I am glossing over a technical renormalisation needed to make the infinite series in (6) converge properly.) Meanwhile, the left-hand side of (6) is absolutely convergent for {s=2+iT} and of size {O(1)}, and the claim follows. A more refined version of this argument shows that the number of non-trivial zeroes with {0 \leq \hbox{Im}(\rho) \leq T} is {\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T)}, but we will not need this more precise formula here. (A fair fraction – at least 40%, in fact – of these zeroes are known to lie on the critical line; see this earlier blog post of mine for more discussion.)
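
As an aside, the main term of this counting formula is easy to test numerically; the sketch below assumes the mpmath library (whose zetazero routine returns the n-th non-trivial zero on the critical line), and is of course not part of the argument above:

```python
# Count non-trivial zeroes with 0 < Im(rho) <= T and compare with the main term
# T/(2 pi) log(T/(2 pi)) - T/(2 pi).  (A sketch; assumes mpmath is installed.)
from mpmath import mp, zetazero, log, pi

mp.dps = 15
T = 100

count, n = 0, 1
while zetazero(n).imag <= T:   # walk up the list of zeroes until we pass height T
    count += 1
    n += 1

main_term = T / (2 * pi) * log(T / (2 * pi)) - T / (2 * pi)
print(f"zeroes with 0 < Im(rho) <= {T}: {count}")
print(f"main term of the counting formula: {float(main_term):.1f}")
# The difference is comfortably within the O(log T) error term.
```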

Another thing that we happen to know is how the magnitude {|\zeta(1/2+it)|} of the zeta function is distributed as {t \rightarrow \infty}; it turns out to be log-normally distributed with log-variance about {\frac{1}{2} \log \log t}. More precisely, we have the following result of Selberg:

Theorem 1 Let {T} be a large number, and let {t} be chosen uniformly at random from between {T} and {2T} (say). Then the distribution of {\frac{1}{\sqrt{\frac{1}{2} \log \log T}} \log |\zeta(1/2+it)|} converges (in distribution) to the normal distribution {N(0,1)}.

To put it more informally, {\log |\zeta(1/2+it)|} behaves like {\sqrt{\frac{1}{2} \log \log t} \times N(0,1)} plus lower order terms for “typical” large values of {t}. (Zeroes {\rho} of {\zeta} are, of course, certainly not typical, but one can show that one can usually stay away from these zeroes.) In fact, Selberg showed a slightly more precise result, namely that for any fixed {k \geq 1}, the {k^{th}} moment of {\frac{1}{\sqrt{\frac{1}{2} \log \log T}} \log |\zeta(1/2+it)|} converges to the {k^{th}} moment of {N(0,1)}.
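
As a quick (and quite crude) numerical illustration of this theorem, one can sample {\log |\zeta(1/2+it)|} at random heights and normalise by {\sqrt{\frac{1}{2} \log \log T}}; the sketch below assumes the mpmath library, and since the normalising factor only grows doubly-logarithmically, the agreement with {N(0,1)} at computationally feasible heights is only rough:

```python
# Monte Carlo check of Selberg's theorem: sample log|zeta(1/2+it)| for random t in [T, 2T],
# normalise by sqrt(0.5 log log T), and compare the first two moments with N(0,1).
# (A sketch; assumes mpmath, and uses a modest T and sample size, so expect rough agreement.)
import math
import random
from mpmath import mp, mpc, zeta

mp.dps = 15
T = 10**6
samples = 100          # kept small: zeta on the critical line is slow at these heights

scale = math.sqrt(0.5 * math.log(math.log(T)))
values = []
for _ in range(samples):
    t = random.uniform(T, 2 * T)
    values.append(math.log(abs(zeta(mpc(0.5, t)))) / scale)

mean = sum(values) / samples
variance = sum((v - mean) ** 2 for v in values) / samples
print(f"sample mean {mean:+.3f}, sample variance {variance:.3f} (N(0,1) would give 0 and 1)")
```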

Remarkably, Selberg’s result does not need RH or GUE, though it is certainly consistent with such hypotheses. (For instance, the determinant of a GUE matrix asymptotically obeys a remarkably similar log-normal law to that given by Selberg’s theorem.) Indeed, the net effect of these hypotheses only affects some error terms in {\log |\zeta(1/2+it)|} of magnitude {O(1)}, which are thus asymptotically negligible compared to the main term, which has magnitude about {\sqrt{\log \log T}}. So Selberg’s result, while very pretty, manages to finesse the question of what the zeroes {\rho} of {\zeta} are actually doing – he makes the primes do most of the work, rather than the zeroes.

Selberg never actually published the above result, but it is reproduced in a number of places (e.g. in this book by Joyner, or this book by Laurincikas). As with many other results in analytic number theory, the actual details of the proof can get somewhat technical; but I would like to record here (partly for my own benefit) an informal sketch of some of the main ideas in the argument.

Read the rest of this entry »

This is a continuation of the preceding two threads here on the polymath1 project, which are now full.  We are closing in on the computation of the sixth Moser number – the size of the largest subset of the six-dimensional cube \{1,2,3\}^6 that does not contain any lines: it is either 353 or 354, and 354 looks close to being eliminated.

Besides this, the nominal purpose of this thread is to coordinate the writing of the second paper of the project.  In addition to incorporating whatever results we get from the six-dimensional Moser problem, a lot of work still has to be done on other sections of the paper, notably the higher k component and the component on Fujimura’s problem, as well as the appendices.

A remarkable phenomenon in probability theory is that of universality – that many seemingly unrelated probability distributions, which ostensibly involve large numbers of unknown parameters, can end up converging to a universal law that may only depend on a small handful of parameters. One of the most famous examples of the universality phenomenon is the central limit theorem; another rich source of examples comes from random matrix theory, which is one of the areas of my own research.

Analogous universality phenomena also show up in empirical distributions – the distributions of a statistic {X} from a large population of “real-world” objects. Examples include Benford’s law, Zipf’s law, and the Pareto distribution (of which the Pareto principle or 80-20 law is a special case). These laws govern the asymptotic distribution of many statistics {X} which

  • (i) take values as positive numbers;
  • (ii) range over many different orders of magnitude;
  • (iii) arise from a complicated combination of largely independent factors (with different samples of {X} arising from different independent factors); and
  • (iv) have not been artificially rounded, truncated, or otherwise constrained in size.

Examples here include the population of countries or cities, the frequency of occurrence of words in a language, the mass of astronomical objects, or the net worth of individuals or corporations. The laws are then as follows:

  • Benford’s law: For {k=1,\ldots,9}, the proportion of {X} whose first digit is {k} is approximately {\log_{10} \frac{k+1}{k}}. Thus, for instance, {X} should have a first digit of {1} about {30\%} of the time, but a first digit of {9} only about {5\%} of the time.
  • Zipf’s law: The {n^{th}} largest value of {X} should obey an approximate power law, i.e. it should be approximately {C n^{-\alpha}} for the first few {n=1,2,3,\ldots} and some parameters {C, \alpha > 0}. In many cases, {\alpha} is close to {1}.
  • Pareto distribution: The proportion of {X} with at least {m} digits (before the decimal point), where {m} is above the median number of digits, should obey an approximate exponential law, i.e. be approximately of the form {c 10^{-m/\alpha}} for some {c, \alpha > 0}. Again, in many cases {\alpha} is close to {1}.

Benford’s law and Pareto distribution are stated here for base {10}, which is what we are most familiar with, but the laws hold for any base (after replacing all the occurrences of {10} in the above laws with the new base, of course). The laws tend to break down if the hypotheses (i)-(iv) are dropped. For instance, if the statistic {X} concentrates around its mean (as opposed to being spread over many orders of magnitude), then the normal distribution tends to be a much better model (as indicated by such results as the central limit theorem). If instead the various samples of the statistics are highly correlated with each other, then other laws can arise (for instance, the eigenvalues of a random matrix, as well as many empirically observed matrices, are correlated to each other, with the behaviour of the largest eigenvalues being governed by laws such as the Tracy-Widom law rather than Zipf’s law, and the bulk distribution being governed by laws such as the semicircular law rather than the normal or Pareto distributions).

To illustrate these laws, let us take as a data set the populations of 235 countries and regions of the world in 2007 (using the CIA world factbook); I have put the raw data here. This is a relatively small sample (cf. my previous post), but is already enough to discern these laws in action. For instance, here is how the data set tracks with Benford’s law (rounded to three significant figures):

| {k} | Countries | Number | Benford prediction |
| --- | --- | --- | --- |
| 1 | Angola, Anguilla, Aruba, Bangladesh, Belgium, Botswana, Brazil, Burkina Faso, Cambodia, Cameroon, Chad, Chile, China, Christmas Island, Cook Islands, Cuba, Czech Republic, Ecuador, Estonia, Gabon, (The) Gambia, Greece, Guam, Guatemala, Guinea-Bissau, India, Japan, Kazakhstan, Kiribati, Malawi, Mali, Mauritius, Mexico, (Federated States of) Micronesia, Nauru, Netherlands, Niger, Nigeria, Niue, Pakistan, Portugal, Russia, Rwanda, Saint Lucia, Saint Vincent and the Grenadines, Senegal, Serbia, Swaziland, Syria, Timor-Leste (East-Timor), Tokelau, Tonga, Trinidad and Tobago, Tunisia, Tuvalu, (U.S.) Virgin Islands, Wallis and Futuna, Zambia, Zimbabwe | 59 ({25.1\%}) | 71 ({30.1\%}) |
| 2 | Armenia, Australia, Barbados, British Virgin Islands, Cote d’Ivoire, French Polynesia, Ghana, Gibraltar, Indonesia, Iraq, Jamaica, (North) Korea, Kosovo, Kuwait, Latvia, Lesotho, Macedonia, Madagascar, Malaysia, Mayotte, Mongolia, Mozambique, Namibia, Nepal, Netherlands Antilles, New Caledonia, Norfolk Island, Palau, Peru, Romania, Saint Martin, Samoa, San Marino, Sao Tome and Principe, Saudi Arabia, Slovenia, Sri Lanka, Svalbard, Taiwan, Turks and Caicos Islands, Uzbekistan, Vanuatu, Venezuela, Yemen | 44 ({18.7\%}) | 41 ({17.6\%}) |
| 3 | Afghanistan, Albania, Algeria, (The) Bahamas, Belize, Brunei, Canada, (Rep. of the) Congo, Falkland Islands (Islas Malvinas), Iceland, Kenya, Lebanon, Liberia, Liechtenstein, Lithuania, Maldives, Mauritania, Monaco, Morocco, Oman, (Occupied) Palestinian Territory, Panama, Poland, Puerto Rico, Saint Kitts and Nevis, Uganda, United States of America, Uruguay, Western Sahara | 29 ({12.3\%}) | 29 ({12.5\%}) |
| 4 | Argentina, Bosnia and Herzegovina, Burma (Myanmar), Cape Verde, Cayman Islands, Central African Republic, Colombia, Costa Rica, Croatia, Faroe Islands, Georgia, Ireland, (South) Korea, Luxembourg, Malta, Moldova, New Zealand, Norway, Pitcairn Islands, Singapore, South Africa, Spain, Sudan, Suriname, Tanzania, Ukraine, United Arab Emirates | 27 ({11.4\%}) | 22 ({9.7\%}) |
| 5 | (Macao SAR) China, Cocos Islands, Denmark, Djibouti, Eritrea, Finland, Greenland, Italy, Kyrgyzstan, Montserrat, Nicaragua, Papua New Guinea, Slovakia, Solomon Islands, Togo, Turkmenistan | 16 ({6.8\%}) | 19 ({7.9\%}) |
| 6 | American Samoa, Bermuda, Bhutan, (Dem. Rep. of the) Congo, Equatorial Guinea, France, Guernsey, Iran, Jordan, Laos, Libya, Marshall Islands, Montenegro, Paraguay, Sierra Leone, Thailand, United Kingdom | 17 ({7.2\%}) | 16 ({6.7\%}) |
| 7 | Bahrain, Bulgaria, (Hong Kong SAR) China, Comoros, Cyprus, Dominica, El Salvador, Guyana, Honduras, Israel, (Isle of) Man, Saint Barthelemy, Saint Helena, Saint Pierre and Miquelon, Switzerland, Tajikistan, Turkey | 17 ({7.2\%}) | 14 ({5.8\%}) |
| 8 | Andorra, Antigua and Barbuda, Austria, Azerbaijan, Benin, Burundi, Egypt, Ethiopia, Germany, Haiti, Holy See (Vatican City), Northern Mariana Islands, Qatar, Seychelles, Vietnam | 15 ({6.4\%}) | 12 ({5.1\%}) |
| 9 | Belarus, Bolivia, Dominican Republic, Fiji, Grenada, Guinea, Hungary, Jersey, Philippines, Somalia, Sweden | 11 ({4.5\%}) | 11 ({4.6\%}) |
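
For those who want to reproduce this sort of comparison themselves, here is a minimal sketch in Python; rather than hard-coding the country populations (which are in the raw data file linked above), it uses the powers of two as a stand-in data set, these being known to obey Benford’s law in the limit:

```python
# Compare the first-digit frequencies of a data set with Benford's prediction log10((k+1)/k).
# (A sketch; powers of two are used as a stand-in for the country-population data.)
import math
from collections import Counter

def benford_comparison(values):
    """Print observed first-digit proportions of `values` against Benford's law."""
    digits = Counter(int(str(abs(int(v)))[0]) for v in values)
    total = len(values)
    for k in range(1, 10):
        observed = digits[k] / total
        predicted = math.log10((k + 1) / k)
        print(f"first digit {k}: observed {observed:.3f}, Benford {predicted:.3f}")

# Stand-in data: the powers 2, 4, 8, ..., 2^1000; the country populations from the
# raw data file could be substituted here directly.
benford_comparison([2**n for n in range(1, 1001)])
```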

Here is how the same data tracks Zipf’s law for the first twenty values of {n}, with the parameters {C \approx 1.28 \times 10^9} and {\alpha \approx 1.03} (selected by log-linear regression), again rounding to three significant figures:

| {n} | Country | Population | Zipf prediction | Deviation from prediction |
| --- | --- | --- | --- | --- |
| 1 | China | 1,330,000,000 | 1,280,000,000 | {+4.1\%} |
| 2 | India | 1,150,000,000 | 626,000,000 | {+83.5\%} |
| 3 | USA | 304,000,000 | 412,000,000 | {-26.3\%} |
| 4 | Indonesia | 238,000,000 | 307,000,000 | {-22.5\%} |
| 5 | Brazil | 196,000,000 | 244,000,000 | {-19.4\%} |
| 6 | Pakistan | 173,000,000 | 202,000,000 | {-14.4\%} |
| 7 | Bangladesh | 154,000,000 | 172,000,000 | {-10.9\%} |
| 8 | Nigeria | 146,000,000 | 150,000,000 | {-2.6\%} |
| 9 | Russia | 141,000,000 | 133,000,000 | {+5.8\%} |
| 10 | Japan | 128,000,000 | 120,000,000 | {+6.7\%} |
| 11 | Mexico | 110,000,000 | 108,000,000 | {+1.7\%} |
| 12 | Philippines | 96,100,000 | 98,900,000 | {-2.9\%} |
| 13 | Vietnam | 86,100,000 | 91,100,000 | {-5.4\%} |
| 14 | Ethiopia | 82,600,000 | 84,400,000 | {-2.1\%} |
| 15 | Germany | 82,400,000 | 78,600,000 | {+4.8\%} |
| 16 | Egypt | 81,700,000 | 73,500,000 | {+11.1\%} |
| 17 | Turkey | 71,900,000 | 69,100,000 | {+4.1\%} |
| 18 | Congo | 66,500,000 | 65,100,000 | {+2.2\%} |
| 19 | Iran | 65,900,000 | 61,600,000 | {+6.9\%} |
| 20 | Thailand | 65,500,000 | 58,400,000 | {+12.1\%} |

As one sees, Zipf’s law is not particularly precise at the extreme edge of the statistics (when {n} is very small), but becomes reasonably accurate (given the small sample size, and given that we are fitting twenty data points using only two parameters) for moderate sizes of {n}.
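
For concreteness, here is a sketch of the sort of log-linear regression used to select {C} and {\alpha}, applied to just the twenty populations in the table above (so the fitted values need not agree exactly with the quoted {C \approx 1.28 \times 10^9} and {\alpha \approx 1.03}, which may have been fitted to more of the data):

```python
# Least-squares fit of log(X_n) = log C - alpha * log n to the twenty populations above.
# (A sketch; populations copied from the table, no external libraries assumed.)
import math

populations = [
    1_330_000_000, 1_150_000_000, 304_000_000, 238_000_000, 196_000_000,
    173_000_000, 154_000_000, 146_000_000, 141_000_000, 128_000_000,
    110_000_000, 96_100_000, 86_100_000, 82_600_000, 82_400_000,
    81_700_000, 71_900_000, 66_500_000, 65_900_000, 65_500_000,
]

xs = [math.log(n) for n in range(1, len(populations) + 1)]
ys = [math.log(p) for p in populations]
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)

alpha = -slope
C = math.exp(ybar - slope * xbar)
print(f"fitted alpha ~ {alpha:.2f}, fitted C ~ {C:.3g}")
for rank, pop in enumerate(populations, start=1):
    print(f"{rank:2d}: population {pop:>13,d}, fitted Zipf prediction {C * rank ** (-alpha):>15,.1f}")
```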

This data set has too few scales in base {10} to illustrate the Pareto distribution effectively – over half of the country populations are either seven or eight digits in that base. But if we instead work in base {2}, then country populations range over a decent number of scales (the majority of countries have population between {2^{23}} and {2^{32}}), and we begin to see the law emerge. Here {m} is now the number of binary digits, and the best-fit parameters are {\alpha \approx 1.18} and {c \approx 1.7 \times 2^{26} / 235}:

| {m} | Countries with {\geq m} binary digit populations | Number | Pareto prediction |
| --- | --- | --- | --- |
| 31 | China, India | 2 | 1 |
| 30 | | 2 | 2 |
| 29 | “, United States of America | 3 | 5 |
| 28 | “, Indonesia, Brazil, Pakistan, Bangladesh, Nigeria, Russia | 9 | 8 |
| 27 | “, Japan, Mexico, Philippines, Vietnam, Ethiopia, Germany, Egypt, Turkey | 17 | 15 |
| 26 | “, (Dem. Rep. of the) Congo, Iran, Thailand, France, United Kingdom, Italy, South Africa, (South) Korea, Burma (Myanmar), Ukraine, Colombia, Spain, Argentina, Sudan, Tanzania, Poland, Kenya, Morocco, Algeria | 36 | 27 |
| 25 | “, Canada, Afghanistan, Uganda, Nepal, Peru, Iraq, Saudi Arabia, Uzbekistan, Venezuela, Malaysia, (North) Korea, Ghana, Yemen, Taiwan, Romania, Mozambique, Sri Lanka, Australia, Cote d’Ivoire, Madagascar, Syria, Cameroon | 58 | 49 |
| 24 | “, Netherlands, Chile, Kazakhstan, Burkina Faso, Cambodia, Malawi, Ecuador, Niger, Guatemala, Senegal, Angola, Mali, Zambia, Cuba, Zimbabwe, Greece, Portugal, Belgium, Tunisia, Czech Republic, Rwanda, Serbia, Chad, Hungary, Guinea, Belarus, Somalia, Dominican Republic, Bolivia, Sweden, Haiti, Burundi, Benin | 91 | 88 |
| 23 | “, Austria, Azerbaijan, Honduras, Switzerland, Bulgaria, Tajikistan, Israel, El Salvador, (Hong Kong SAR) China, Paraguay, Laos, Sierra Leone, Jordan, Libya, Papua New Guinea, Togo, Nicaragua, Eritrea, Denmark, Slovakia, Kyrgyzstan, Finland, Turkmenistan, Norway, Georgia, United Arab Emirates, Singapore, Bosnia and Herzegovina, Croatia, Central African Republic, Moldova, Costa Rica | 123 | 159 |

Thus, with each new scale, the number of countries introduced increases by a factor of a little less than {2}, on the average. This approximate doubling of countries with each new scale begins to falter at about the population {2^{23}} (i.e. at around {4} million), for the simple reason that one has begun to run out of countries. (Note that the median-population country in this set, Singapore, has a population with {23} binary digits.)
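
One can also check the quoted fit directly against the “Number” column of the table above; the sketch below copies those counts rather than recomputing them from the raw data, and the small discrepancies come from the rounding of the displayed parameters:

```python
# Compare the observed counts of countries with at least m binary digits against the
# Pareto prediction 235 * c * 2^(-m/alpha), using the quoted best-fit parameters.
alpha = 1.18
c = 1.7 * 2**26 / 235

observed = {31: 2, 30: 2, 29: 3, 28: 9, 27: 17, 26: 36, 25: 58, 24: 91, 23: 123}
for m in sorted(observed, reverse=True):
    prediction = 235 * c * 2 ** (-m / alpha)
    print(f"m = {m}: observed {observed[m]:3d}, Pareto prediction {prediction:6.1f}")
```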

These laws are not merely interesting statistical curiosities; for instance, Benford’s law is often used to help detect fraudulent statistics (such as those arising from accounting fraud), as many such statistics are invented by choosing digits at random, and will therefore deviate significantly from Benford’s law. (This is nicely discussed in Robert Matthews’ New Scientist article “The power of one“; this article can also be found on the web at a number of other places.) In a somewhat analogous spirit, Zipf’s law and the Pareto distribution can be used to mathematically test various models of real-world systems (e.g. formation of astronomical objects, accumulation of wealth, population growth of countries, etc.), without necessarily having to fit all the parameters of that model with the actual data.

Being empirically observed phenomena rather than abstract mathematical facts, Benford’s law, Zipf’s law, and the Pareto distribution cannot be “proved” the same way a mathematical theorem can be proved. However, one can still support these laws mathematically in a number of ways, for instance showing how these laws are compatible with each other, and with other plausible hypotheses on the source of the data. In this post I would like to describe a number of ways (both technical and non-technical) in which one can do this; these arguments do not fully explain these laws (in particular, the empirical fact that the exponent {\alpha} in Zipf’s law or the Pareto distribution is often close to {1} is still quite a mysterious phenomenon), and do not always have the same universal range of applicability as these laws seem to have, but I hope that they do demonstrate that these laws are not completely arbitrary, and ought to have a satisfactory basis of mathematical support. Read the rest of this entry »
