Hariharan Narayanan, Scott Sheffield, and I have just uploaded to the arXiv our paper “Sums of GUE matrices and concentration of hives from correlation decay of eigengaps”. This is a personally satisfying paper for me, as it connects the work I did as a graduate student (with Allen Knutson and Chris Woodward) on sums of Hermitian matrices, with more recent work I did (with Van Vu) on random matrix theory, as well as several other results by other authors scattered across various mathematical subfields.
Suppose $A, B$ are two $n \times n$ Hermitian matrices with eigenvalues $\lambda = (\lambda_1,\dots,\lambda_n)$ and $\mu = (\mu_1,\dots,\mu_n)$ respectively (arranged in non-increasing order). What can one say about the eigenvalues $\nu = (\nu_1,\dots,\nu_n)$ of the sum $A+B$? There are now many ways to answer this question precisely; one of them, introduced by Allen and myself many years ago, is that there exists a certain triangular array of numbers called a “hive” that has $\lambda, \mu, \nu$ as its boundary values. On the other hand, by the pioneering work of Voiculescu in free probability, we know in the large $n$ limit that if $\lambda, \mu$ are asymptotically drawn from some limiting distribution, and $A$ and $B$ are drawn independently at random (using the unitarily invariant Haar measure) amongst all Hermitian matrices with the indicated eigenvalues, then (under mild hypotheses on the distribution, and under suitable normalization), $\nu$ will almost surely have a limiting distribution that is the free convolution of the two original distributions.
One of my favourite open problems is to come up with a theory of “free hives” that allows one to explain the latter fact from the former. This is still unresolved, but we are now beginning to make a bit of progress towards this goal. We know (for instance from the calculations of Coquereaux and Zuber) that if $A, B$ are drawn independently at random with eigenvalues $\lambda, \mu$, then the eigenvalues $\nu$ of $A+B$ are distributed according to the boundary values of an “augmented hive” with two boundaries $\lambda, \mu$, drawn uniformly at random from the polytope of all such augmented hives. (This augmented hive is basically a regular hive with another type of pattern, namely a Gelfand-Tsetlin pattern, glued to one side of it.) So, if one could show some sort of concentration of measure for the entries of this augmented hive, and calculate what these entries concentrated to, one should presumably be able to recover Voiculescu’s result after some calculation.
In this paper, we are able to accomplish the first half of this goal, assuming that the spectra $\lambda, \mu$ are not deterministic, but rather drawn from the spectra of rescaled GUE matrices (thus $A, B$ are independent rescaled copies of the GUE ensemble). We have chosen to normalize matters so that the eigenvalues have size , so that the entries of the augmented hive have size . Our result is then that the entries of the augmented hive in fact have a standard deviation of , thus exhibiting a little bit of concentration. (Actually, from the Brunn-Minkowski inequality, the distribution of these entries is log-concave, so once one controls the standard deviation one also gets a bit of exponential decay beyond the standard deviation; Narayanan and Sheffield had also recently established the existence of a rate function for this sort of model.) Presumably one should get much better concentration, and one should be able to handle other models than the GUE ensemble, but this is the first advance that we were able to achieve.
Augmented hives seem tricky to work with directly, but by adapting the octahedron recurrence introduced for this problem by Knutson, Woodward, and myself some time ago (which is related to the associativity of addition for Hermitian matrices), one can construct a piecewise linear volume-preserving map between the cone of augmented hives, and the product of two Gelfand-Tsetlin cones. The problem then reduces to establishing concentration of measure for certain piecewise linear maps on products of Gelfand-Tsetlin cones (endowed with a certain GUE-type measure). This is a promising formulation because Gelfand-Tsetlin cones are by now quite well understood.
On the other hand, the piecewise linear map, initially defined by iterating the octahedron relation , looks somewhat daunting. Fortunately, there is an explicit formulation of this map due to Speyer, as the supremum of certain linear maps associated to perfect matchings of a certain “excavation graph”. For us it was convenient to work with the dual of this excavation graph, and associate these linear maps to certain “lozenge tilings” of a hexagon.
It would be more convenient to study the concentration of each linear map separately, rather than their supremum. By the Cheeger inequality, it turns out that one can relate the latter to the former provided that one has good control on the Cheeger constant of the underlying measure on the Gelfand-Tsetlin cones. Fortunately, the measure is log-concave, so one can use the very recent work of Klartag on the KLS conjecture to eliminate the supremum (up to a logarithmic loss which is only moderately annoying to deal with).
It remains to obtain concentration on the linear map associated to a given lozenge tiling. After stripping away some contributions coming from lozenges near the edge (using some eigenvalue rigidity results of Van Vu and myself), one is left with some bulk contributions which ultimately involve eigenvalue interlacing gaps such as
where is the eigenvalue of the top left minor of , and is in the bulk region for some fixed . To get the desired result, one needs some non-trivial correlation decay in for these statistics. If one were working with eigenvalue gaps rather than interlacing gaps, then such correlation decay was conveniently obtained for us by recent work of Cipolloni, Erdős, and Schröder. So the last remaining challenge is to understand the relation between eigenvalue gaps and interlacing gaps. For this we turned to the work of Metcalfe, who uncovered a determinantal process structure to this problem, with a kernel associated to Lagrange interpolation polynomials. It is possible to satisfactorily estimate various integrals of these kernels using the residue theorem and eigenvalue rigidity estimates, thus completing the required analysis.
Let be a large natural number, and let be a matrix drawn from the Gaussian Unitary Ensemble (GUE), by which we mean that is a Hermitian matrix whose upper triangular entries are iid complex gaussians with mean zero and variance one, and whose diagonal entries are iid real gaussians with mean zero and variance one (and independent of the upper triangular entries). The eigenvalues are then real and almost surely distinct, and can be viewed as a random point process on the real line. One can then form the -point correlation functions for every , which can be defined by duality by requiring
for any test function . For GUE, which is a continuous matrix ensemble, one can also define for distinct as the unique quantity such that the probability that there is an eigenvalue in each of the intervals is in the limit .
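As a quick numerical companion to this definition, here is a minimal sketch that samples a GUE matrix with the normalisation described above and computes its eigenvalue point process. The function name `sample_gue`, the seed and the size `n = 200` are illustrative choices, not anything taken from the post.

```python
import numpy as np

def sample_gue(n, rng):
    """Sample an n x n Hermitian matrix whose off-diagonal entries are
    complex gaussians of variance one and whose diagonal entries are
    real gaussians of variance one."""
    # g has iid complex gaussian entries with E|g_ij|^2 = 1.
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    # Symmetrising keeps unit variance off the diagonal; the diagonal
    # becomes sqrt(2) * Re(g_ii), which is real with variance one.
    return (g + g.conj().T) / np.sqrt(2)

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 200
m = sample_gue(n, rng)
eigenvalues = np.linalg.eigvalsh(m)  # real, sorted in increasing order
gaps = np.diff(eigenvalues)          # consecutive eigenvalue spacings

# With this normalisation the spectrum should fill out roughly [-2 sqrt(n), 2 sqrt(n)].
print(eigenvalues.min() / np.sqrt(n), eigenvalues.max() / np.sqrt(n))
```

The strictly positive entries of `gaps` reflect the fact that the eigenvalues are almost surely distinct.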
As is well known, the GUE process is a determinantal point process, which means that -point correlation functions can be explicitly computed as
for some kernel ; explicitly, one has
where are the (normalised) Hermite polynomials; see this previous blog post for details.
Using the asymptotics of Hermite polynomials (which then give asymptotics for the kernel), one can take a limit of a (suitably rescaled) sequence of GUE processes to obtain the Dyson sine process, which is a determinantal point process on the real line with correlation functions
$$ \rho_k(x_1,\dots,x_k) = \det( K(x_i,x_j) )_{1 \le i,j \le k} \qquad (1)$$
where $K$ is the Dyson sine kernel
$$ K(x,y) := \frac{\sin(\pi(x-y))}{\pi(x-y)}. \qquad (2)$$
A bit more precisely, for any fixed bulk energy , the renormalised point processes converge in distribution in the vague topology to as , where is the semi-circular law density.
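To make the determinantal structure concrete, the following sketch evaluates the correlation functions of the Dyson sine process as determinants of the standard sine kernel $\sin(\pi(x-y))/(\pi(x-y))$. The helper name `sine_correlation` is hypothetical; the only library fact used is that `np.sinc(t)` computes $\sin(\pi t)/(\pi t)$.

```python
import numpy as np

def sine_correlation(points):
    """k-point correlation function of the sine process at the given points,
    computed as det(K(x_i, x_j)) with K the sine kernel."""
    pts = np.asarray(points, dtype=float)
    # np.sinc(t) = sin(pi t) / (pi t), with the value 1 at t = 0.
    kernel = np.sinc(pts[:, None] - pts[None, :])
    return float(np.linalg.det(kernel))

print(sine_correlation([0.3]))        # one-point function: constant density
print(sine_correlation([0.0, 0.5]))   # two-point function: 1 - sinc(0.5)^2
print(sine_correlation([1.0, 1.0]))   # vanishes when two points coincide
```

The vanishing at coincident points is the usual eigenvalue repulsion, visible here directly from the determinant having two equal rows.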
On the other hand, an important feature of the GUE process is its stationarity (modulo rescaling) under Dyson Brownian motion
which describes the stochastic evolution of eigenvalues of a Hermitian matrix under independent Brownian motion of its entries, and is discussed in this previous blog post. To cut a long story short, this stationarity tells us that the self-similar -point correlation function
obeys the Dyson heat equation
(see Exercise 11 of the previously mentioned blog post). Note that vanishes to second order whenever two of the coincide, so there is no singularity on the right-hand side. Setting and using self-similarity, we can rewrite this equation in time-independent form as
One can then integrate out all but of these variables (after carefully justifying convergence) to obtain a system of equations for the -point correlation functions :
where the integral is interpreted in the principal value sense. This system is an example of a BBGKY hierarchy.
If one carefully rescales and takes limits (say at the energy level , for simplicity), the left-hand side turns out to rescale to be a lower order term, and one ends up with a hierarchy for the Dyson sine process:
Informally, these equations show that the Dyson sine process is stationary with respect to the infinite Dyson Brownian motion
where are independent Brownian increments, and the sum is interpreted in a suitable principal value sense.
I recently set myself the exercise of deriving the identity (3) directly from the definition (1) of the Dyson sine process, without reference to GUE. This turns out to not be too difficult when done the right way (namely, by modifying the proof of Gaudin’s lemma), although it did take me an entire day of work before I realised this, and I could not find it in the literature (though I suspect that many people in the field have privately performed this exercise in the past). In any case, I am recording the computation here, largely because I really don’t want to have to do it again, but perhaps it will also be of interest to some readers.
I’ve just uploaded to the arXiv my paper The asymptotic distribution of a single eigenvalue gap of a Wigner matrix, submitted to Probability Theory and Related Fields. This paper (like several of my previous papers) is concerned with the asymptotic distribution of the eigenvalues of a random Wigner matrix in the limit , with a particular focus on matrices drawn from the Gaussian Unitary Ensemble (GUE). This paper is focused on the bulk of the spectrum, i.e. to eigenvalues with for some fixed .
The location of an individual eigenvalue is by now quite well understood. If we normalise the entries of the matrix to have mean zero and variance , then in the asymptotic limit , the Wigner semicircle law tells us that with probability one has
where the classical location of the eigenvalue is given by the formula
and the semicircular distribution is given by the formula
Actually, one can improve the error term here from to for any (see this previous recent paper of Van and myself for more discussion of these sorts of estimates, sometimes known as eigenvalue rigidity estimates).
From the semicircle law (and the fundamental theorem of calculus), one expects the eigenvalue spacing to have an average size of . It is thus natural to introduce the normalised eigenvalue spacing
and ask what the distribution of is.
As mentioned previously, we will focus on the bulk case , and begin with the model case when is drawn from GUE. (In the edge case when is close to or to , the distribution is given by the famous Tracy-Widom law.) Here, the distribution was almost (but as we shall see, not quite) worked out by Gaudin and Mehta. By using the theory of determinantal processes, they were able to compute a quantity closely related to , namely the probability
that an interval near of length comparable to the expected eigenvalue spacing is devoid of eigenvalues. For in the bulk and fixed , they showed that this probability is equal to
where is the Dyson projection
to Fourier modes in , and is the Fredholm determinant. As shown by Jimbo, Miwa, Môri, and Sato, this determinant can also be expressed in terms of a solution to a Painlevé V ODE, though we will not need this fact here. In view of this asymptotic and some standard integration by parts manipulations, it becomes plausible to propose that will be asymptotically distributed according to the Gaudin-Mehta distribution , where
A reasonably accurate approximation for the density of this distribution is given by the Wigner surmise $\frac{1}{2} \pi x e^{-\pi x^2/4}$ [EDIT: as pointed out in comments, in this GUE setting the correct surmise is $\frac{32}{\pi^2} x^2 e^{-4x^2/\pi}$], which was presciently proposed by Wigner as early as 1957; it is exact for $n=2$ but not in the asymptotic limit $n \to \infty$.
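As a sanity check on the two surmises, here is a small sketch confirming numerically that both the classical (GOE-type) density $\frac{1}{2}\pi s e^{-\pi s^2/4}$ and the GUE-type density $\frac{32}{\pi^2} s^2 e^{-4s^2/\pi}$ have total mass one and mean spacing one. The function names, grid and cutoff are illustrative assumptions.

```python
import numpy as np

def surmise_goe(s):
    """Classical Wigner surmise (linear repulsion at s = 0)."""
    return 0.5 * np.pi * s * np.exp(-np.pi * s**2 / 4)

def surmise_gue(s):
    """GUE-type surmise (quadratic repulsion at s = 0)."""
    return (32 / np.pi**2) * s**2 * np.exp(-4 * s**2 / np.pi)

ds = 1e-4
s = np.arange(0.0, 12.0, ds) + ds / 2  # midpoint grid; the tails beyond 12 are negligible

for name, density in [("GOE", surmise_goe), ("GUE", surmise_gue)]:
    mass = np.sum(density(s)) * ds      # total probability
    mean = np.sum(s * density(s)) * ds  # mean normalised spacing
    print(name, mass, mean)
```

Both densities are normalised to unit mean, which matches the normalisation of the eigenvalue spacing used in the text.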
Unfortunately, when one tries to make this argument rigorous, one finds that the asymptotic for (1) does not control a single gap , but rather an ensemble of gaps , where is drawn from an interval of some moderate size (e.g. ); see for instance this paper of Deift, Kriecherbauer, McLaughlin, Venakides, and Zhou for a more precise formalisation of this statement (which is phrased slightly differently, in which one samples all gaps inside a fixed window of spectrum, rather than inside a fixed range of eigenvalue indices ). (This result is stated for GUE, but can be extended to other Wigner ensembles by the Four Moment Theorem, at least if one assumes a moment matching condition; see this previous paper with Van Vu for details. The moment condition can in fact be removed, as was done in this subsequent paper with Erdős, Ramírez, Schlein, Vu, and Yau.)
The problem is that when one specifies a given window of spectrum such as , one cannot quite pin down in advance which eigenvalues are going to lie to the left or right of this window; even with the strongest eigenvalue rigidity results available, there is a natural uncertainty of or so in the index (as can be quantified quite precisely by this central limit theorem of Gustavsson).
The main difficulty here is that there could potentially be some strange coupling between the event (1) of an interval being devoid of eigenvalues, and the number of eigenvalues to the left of that interval. For instance, one could conceive of a possible scenario in which the interval in (1) tends to have many eigenvalues when is even, but very few when is odd. In this sort of situation, the gaps may have different behaviour for even than for odd , and such anomalies would not be picked up in the averaged statistics in which is allowed to range over some moderately large interval.
The main result of the current paper is that these anomalies do not actually occur, and that all of the eigenvalue gaps in the bulk are asymptotically governed by the Gaudin-Mehta law without the need for averaging in the parameter. Again, this is shown first for GUE, and then extended to other Wigner matrices obeying a matching moment condition using the Four Moment Theorem. (It is likely that the moment matching condition can be removed here, but I was unable to achieve this, despite all the recent advances in establishing universality of local spectral statistics for Wigner matrices, mainly because the universality results in the literature are more focused on specific energy levels than on specific eigenvalue indices . To make matters worse, in some cases universality is currently known only after an additional averaging in the energy parameter.)
The main task in the proof is to show that the random variable is largely decoupled from the event in (1) when is drawn from GUE. To do this we use some of the theory of determinantal processes, and in particular the nice fact that when one conditions a determinantal process to the event that a certain spatial region (such as an interval) contains no points of the process, then one obtains a new determinantal process (with a kernel that is closely related to the original kernel). The main task is then to obtain a sufficiently good control on the distance between the new determinantal kernel and the old one, which we do by some functional-analytic considerations involving the manipulation of norms of operators (and specifically, the operator norm, Hilbert-Schmidt norm, and nuclear norm). Amusingly, the Fredholm alternative makes a key appearance, as I end up having to invert a compact perturbation of the identity at one point (specifically, I need to invert , where is the Dyson projection and is an interval). As such, the bounds in my paper become ineffective, though I am sure that with more work one can invert this particular perturbation of the identity by hand, without the need to invoke the Fredholm alternative.
Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math. It studies the asymptotic distribution of the determinant of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).
Before we get to these results, let us first discuss the simpler problem of studying the determinant $\det A$ of a random $n \times n$ iid matrix $A = (a_{ij})_{1 \le i,j \le n}$, such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution $N(0,1)_{\mathbf R}$), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution $N(0,1)_{\mathbf C}$, thus the real and imaginary parts are independent with law $N(0,1/2)_{\mathbf R}$), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution $\frac{1}{2}\delta_{+1} + \frac{1}{2}\delta_{-1}$, with a $1/2$ chance of either sign). More generally, one can consider a matrix $A$ in which all the entries $a_{ij}$ are independently and identically distributed with mean zero and variance $1$.
We can expand $\det A$ using the Leibniz expansion
$$ \det A = \sum_{\sigma \in S_n} I_\sigma $$
where $\sigma$ ranges over the permutations of $\{1,\dots,n\}$, and $I_\sigma$ is the product
$$ I_\sigma := \mathrm{sgn}(\sigma) \prod_{i=1}^n a_{i\sigma(i)}. $$
From the iid nature of the $a_{ij}$, we easily see that each $I_\sigma$ has mean zero and variance one, and that the $I_\sigma$ are pairwise uncorrelated as $\sigma$ varies. We conclude that $\det A$ has mean zero and variance $n!$ (an observation first made by Turán). In particular, from Chebyshev’s inequality we see that $\det A$ is typically of size $O(\sqrt{n!})$.
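For very small matrices the second moment computation can be checked exactly. The sketch below enumerates every random sign matrix of a given size, evaluates each determinant through the Leibniz expansion (signed products over permutations), and averages. All function names are hypothetical, and the enumeration is only feasible for tiny sizes such as 2 or 3.

```python
import itertools
import math

def perm_sign(p):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def leibniz_det(a):
    """Determinant as the sum over permutations of signed products of entries."""
    n = len(a)
    return sum(
        perm_sign(p) * math.prod(a[i][p[i]] for i in range(n))
        for p in itertools.permutations(range(n))
    )

def sign_matrix_moments(n):
    """Exact mean and second moment of det A over all 2^(n^2) sign matrices."""
    dets = [
        leibniz_det([signs[i * n:(i + 1) * n] for i in range(n)])
        for signs in itertools.product((-1, 1), repeat=n * n)
    ]
    mean = sum(dets) / len(dets)
    second_moment = sum(d * d for d in dets) / len(dets)
    return mean, second_moment

for n in (2, 3):
    print(n, sign_matrix_moments(n), math.factorial(n))
```

The second moment comes out as exactly the factorial of the size, as the uncorrelatedness of the terms in the expansion predicts.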
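It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, $\det A$ is clearly symmetrical, so we can focus attention on the magnitude $|\det A|$. We can interpret this quantity geometrically as the volume of an $n$-dimensional parallelepiped whose generating vectors $X_1,\dots,X_n$ are independent real gaussian vectors in $\mathbf{R}^n$ (i.e. their coefficients are iid with law $N(0,1)_{\mathbf R}$). Using the classical base-times-height formula, we thus have
$$ |\det A| = \prod_{i=1}^n \mathrm{dist}(X_i, V_i) $$
where $V_i$ is the $(i-1)$-dimensional linear subspace of $\mathbf{R}^n$ spanned by $X_1,\dots,X_{i-1}$ (note that $X_1,\dots,X_n$, having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude
$$ \log |\det A| = \sum_{i=1}^n \log \mathrm{dist}(X_i, V_i). \qquad (2)$$
Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group $O(n)$. Because of this, we see that if we fix $X_1,\dots,X_{i-1}$ (and thus $V_i$), the random variable $\mathrm{dist}(X_i,V_i)$ has the same distribution as $\mathrm{dist}(X_i,\mathbf{R}^{i-1})$, or equivalently the $\chi$ distribution
$$ \chi_{n-i+1} := \Big( \sum_{j=1}^{n-i+1} \xi_j^2 \Big)^{1/2} $$
where $\xi_1,\dots,\xi_{n-i+1}$ are iid copies of $N(0,1)_{\mathbf R}$. As this distribution does not depend on the $X_1,\dots,X_{i-1}$, we conclude that the law of $\log |\det A|$ is given by the sum of the logarithms of $n$ independent $\chi$-variables:
$$ \log |\det A| \equiv \sum_{j=1}^n \log \chi_j. $$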
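A standard computation shows that each $\chi_j^2$ has mean $j$ and variance $2j$, and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that $\log \chi_j$ has mean $\frac{1}{2} \log j - \frac{1}{2j} + O(1/j^2)$ and variance $\frac{1}{2j} + O(1/j^2)$. As such, $\log |\det A|$ has mean $\frac{1}{2} \log n! - \frac{1}{2} \log n + O(1)$ and variance $\frac{1}{2} \log n + O(1)$. Applying a suitable version of the central limit theorem, one obtains the asymptotic law
$$ \frac{\log |\det A| - \frac{1}{2} \log n! + \frac{1}{2} \log n}{\sqrt{\frac{1}{2} \log n}} \to N(0,1)_{\mathbf R} \qquad (3)$$
where $\to$ denotes convergence in distribution. A bit more informally, we have
$$ |\det A| \approx n^{-1/2} \sqrt{n!} \exp( N(0, \log n/2)_{\mathbf R} ) \qquad (4)$$
when $A$ is a real gaussian matrix; thus, for instance, the median value of $|\det A|$ is $n^{-1/2+o(1)} \sqrt{n!}$. At first glance, this appears to conflict with the second moment bound $\mathbf{E} |\det A|^2 = n!$ of Turán mentioned earlier, but once one recalls that $\exp(N(0,t)_{\mathbf R})$ has a second moment of $\exp(2t)$, we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.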
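The base-times-height decomposition used in this computation can be checked numerically. In the sketch below, a QR factorisation of the transpose orthogonalises the rows one at a time, so the absolute values of the diagonal entries of the triangular factor are the successive heights (the distance from each row to the span of the earlier rows). The function name, the seed and the size 30 are illustrative assumptions.

```python
import numpy as np

def log_abs_det_from_heights(a):
    """Return log|det a| as a sum of log heights, together with the heights."""
    # QR of a.T processes the rows of a in order; |r_ii| is the distance
    # from row i to the span of rows 0..i-1.
    _, r = np.linalg.qr(a.T)
    heights = np.abs(np.diag(r))
    return float(np.sum(np.log(heights))), heights

rng = np.random.default_rng(4)  # arbitrary seed
n = 30
a = rng.standard_normal((n, n))
log_det, heights = log_abs_det_from_heights(a)

# Independent check of one height via least squares projection.
i = 5
coeffs, *_ = np.linalg.lstsq(a[:i].T, a[i], rcond=None)
direct_height = np.linalg.norm(a[i] - a[:i].T @ coeffs)

print(log_det, np.linalg.slogdet(a)[1])
print(heights[i], direct_height)
```

The two printed pairs should agree up to rounding, which is the deterministic identity underlying the probabilistic argument.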
It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate has been recently obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the sum in (2) in terms of a martingale and apply the martingale central limit theorem.
If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real distribution with the complex distribution, in which the are distributed according to the complex gaussian instead of the real one). At the end of the day, one ends up with the law
(but note that this new asymptotic is still consistent with Turán’s second moment calculation).
We can now turn to the results of our paper. Here, we replace the iid matrices by Wigner matrices , which are defined similarly but are constrained to be Hermitian (or real symmetric), thus for all . Model examples here include the Gaussian Unitary Ensemble (GUE), in which for and for , the Gaussian Orthogonal Ensemble (GOE), in which for and for , and the symmetric Bernoulli ensemble, in which for (with probability of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.
The determinants of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of is still usually zero, but equals in the exceptional case when is a perfect matching (i.e. the union of exactly -cycles, a possibility that can of course only happen when is even). As such, the mean still vanishes when is odd, but for even it is equal to
(the fraction here simply being the number of perfect matchings on vertices). Using Stirling’s formula, one then computes that is comparable to when is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that is comparable to for GUE and for GOE. (The discrepancy here comes from the fact that in the GOE case, and can correlate when contains reversals of -cycles of for , but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brézin and Hikami, though we do not need these more sophisticated computations here.
Our main results are then as follows.
Theorem 1 Let be a Wigner matrix.
- If is drawn from GUE, then
- If is drawn from GOE, then
- The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)
Thus, we informally have
when is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and
when is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.
The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the four moment theorem for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral
of . Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdős, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.
Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the distribution of the cumulants of the log determinant, which then gives the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.
Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:
Proposition 2 (Tridiagonal form of GUE) Let be the random tridiagonal real symmetric matrix
where the are jointly independent real random variables, with being standard real Gaussians, and each having a -distribution:
where are iid complex gaussians. Let be drawn from GUE. Then the joint eigenvalue distribution of is identical to the joint eigenvalue distribution of .
Proof: Let be drawn from GUE. We can write
where is drawn from the GUE, , and is a random gaussian vector with all entries iid with distribution . Furthermore, are jointly independent.
We now apply the tridiagonal matrix algorithm. Let , then has the -distribution indicated in the proposition. We then conjugate by a unitary matrix that preserves the final basis vector , and maps to . Then we have
where is conjugate to . Now we make the crucial observation: because is distributed according to GUE (which is a unitarily invariant ensemble), and is a unitary matrix independent of , is also distributed according to GUE, and remains independent of both and .
We continue this process, expanding as
Applying a further unitary conjugation that fixes but maps to , we may replace by while transforming to another GUE matrix independent of . Iterating this process, we eventually obtain a coupling of to by unitary conjugations, and the claim follows.
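The conjugation step in this proof is a Householder-type reduction, and its deterministic content (a Hermitian matrix is unitarily conjugate to a tridiagonal one with the same spectrum) can be illustrated numerically. The sketch below is a generic Householder tridiagonalisation, not the exact bookkeeping of the proof: it works from the first column rather than the last basis vector, and it leaves the off-diagonal entries complex, with moduli equal to the norms of the eliminated vectors. All names and the seed are hypothetical.

```python
import numpy as np

def tridiagonalize(m):
    """Conjugate a Hermitian matrix to tridiagonal form by unitary reflections."""
    t = np.array(m, dtype=complex)
    n = t.shape[0]
    for k in range(n - 2):
        x = t[k + 1:, k].copy()          # part of column k below the diagonal
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue
        phase = x[0] / abs(x[0]) if x[0] != 0 else 1.0
        v = x
        v[0] += phase * norm_x           # Householder vector before normalising
        v = v / np.linalg.norm(v)
        u = np.eye(n, dtype=complex)
        # The reflection fixes the first k+1 basis vectors and sends x to a
        # multiple of a single basis vector, with modulus |x|.
        u[k + 1:, k + 1:] -= 2.0 * np.outer(v, v.conj())
        t = u @ t @ u.conj().T
    return t

rng = np.random.default_rng(3)  # arbitrary seed
n = 8
g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
m = (g + g.conj().T) / np.sqrt(2)   # a GUE sample with unit variance entries
t = tridiagonalize(m)

print(np.round(np.abs(t), 3))
print(np.linalg.eigvalsh(m) - np.linalg.eigvalsh(t))
```

The first off-diagonal modulus equals the norm of the first eliminated gaussian vector, which is where the $\chi$-type distributions in the proposition come from.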
The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant of the above matrix is given by solving the recursion
with and . Thus, instead of the product of a sequence of independent scalar distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent matrices whose entries are given by gaussians and distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some ad hoc manipulation of these matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)
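For reference, the standard three-term recursion for the determinant of a symmetric tridiagonal matrix can be sketched as follows. Here the leading principal minors satisfy $D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}$ with $D_0 = 1$ and $D_{-1} = 0$; the indexing convention, the function name and the placeholder random entries are assumptions for illustration (in particular, the off-diagonal entries below are not drawn from the $\chi$ law of the proposition).

```python
import numpy as np

def tridiagonal_det(a, b):
    """Determinant of the symmetric tridiagonal matrix with diagonal a
    (length n) and off-diagonal b (length n - 1)."""
    d_prev2, d_prev = 0.0, 1.0  # D_{-1} = 0 and D_0 = 1
    for i in range(len(a)):
        off = b[i - 1] ** 2 if i > 0 else 0.0
        # D_i = a_i * D_{i-1} - b_{i-1}^2 * D_{i-2}
        d_prev2, d_prev = d_prev, a[i] * d_prev - off * d_prev2
    return d_prev

rng = np.random.default_rng(1)  # arbitrary seed
n = 6
a = rng.standard_normal(n)
b = np.abs(rng.standard_normal(n - 1))  # placeholder positive off-diagonal entries
dense = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

print(tridiagonal_det(a, b), np.linalg.det(dense))
```

Writing the pair of consecutive minors as a vector turns each step of the loop into multiplication by a $2 \times 2$ matrix, which is the matrix product structure mentioned in the text.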
This week I am at the American Institute of Mathematics, as an organiser of a workshop on the universality phenomenon in random matrices. There have been a number of interesting discussions so far in this workshop. Percy Deift, in a lecture on universality for invariant ensembles, gave some applications of what he only half-jokingly termed “the most important identity in mathematics”, namely the formula
$$ \det(1 + AB) = \det(1 + BA) $$
whenever $A, B$ are $n \times k$ and $k \times n$ matrices respectively (or more generally, $A$ and $B$ could be linear operators with sufficiently good spectral properties that make both sides equal). Note that the left-hand side is an $n \times n$ determinant, while the right-hand side is a $k \times k$ determinant; this formula is particularly useful when computing determinants of large matrices (or of operators), as one can often use it to transform such determinants into much smaller determinants. In particular, the asymptotic behaviour of $n \times n$ determinants as $n \to \infty$ can be converted via this formula to determinants of a fixed size (independent of $n$), which is often a more favourable situation to analyse. Unsurprisingly, this trick is particularly useful for understanding the asymptotic behaviour of determinantal processes.
There are many ways to prove the identity. One is to observe first that when $A, B$ are invertible square matrices of the same size, $1+BA$ and $1+AB$ are conjugate to each other and thus clearly have the same determinant; a density argument then removes the invertibility hypothesis, and a padding-by-zeroes argument then extends the square case to the rectangular case. Another is to proceed via the spectral theorem, noting that $AB$ and $BA$ have the same non-zero eigenvalues.
By rescaling, one obtains the variant identity
$$ \det(z + AB) = z^{n-k} \det(z + BA) $$
which essentially relates the characteristic polynomial of $AB$ with that of $BA$. When $n=k$, a comparison of coefficients already gives important basic identities such as $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ and $\det(AB) = \det(BA)$; when $n$ is not equal to $k$, an inspection of the $z^{n-k}$ coefficient similarly gives the Cauchy-Binet formula (which, incidentally, is also useful when performing computations on determinantal processes).
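A quick numerical illustration of the identity and its rescaled variant, with a tall matrix and a wide matrix so that a large determinant is traded for a small one. The sizes, the seed and the value of `z` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n, k = 40, 3
a = rng.standard_normal((n, k))  # n x k
b = rng.standard_normal((k, n))  # k x n

# det(1 + AB) is an n x n determinant; det(1 + BA) is only k x k.
big = np.linalg.det(np.eye(n) + a @ b)
small = np.linalg.det(np.eye(k) + b @ a)
print(big, small)

# Rescaled variant: det(z + AB) = z^(n-k) det(z + BA).
z = 1.7
big_z = np.linalg.det(z * np.eye(n) + a @ b)
small_z = z ** (n - k) * np.linalg.det(z * np.eye(k) + b @ a)
print(big_z, small_z)
```

The extra power of `z` accounts for the zero eigenvalues that the larger product has and the smaller product lacks.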
Thanks to this formula (and with a crucial insight of Alice Guionnet), I was able to solve a problem (on outliers for the circular law) that I had in the back of my mind for a few months, and initially posed to me by Larry Abbott; I hope to talk more about this in a future post.
Today, though, I wish to talk about another piece of mathematics that emerged from an afternoon of free-form discussion that we managed to schedule within the AIM workshop. Specifically, we hammered out a heuristic model of the mesoscopic structure of the eigenvalues of the Gaussian Unitary Ensemble (GUE), where is a large integer. As is well known, the probability density of these eigenvalues is given by the Ginibre distribution
where is Lebesgue measure on the Weyl chamber , is a constant, and the Hamiltonian is given by the formula
At the macroscopic scale of , the eigenvalues are distributed according to the Wigner semicircle law
Indeed, if one defines the classical location of the eigenvalue to be the unique solution in to the equation
then it is known that the random variable is quite close to . Indeed, a result of Gustavsson shows that, in the bulk region when , is distributed asymptotically as a gaussian random variable with mean and variance . Note that from the semicircular law, the factor is the mean eigenvalue spacing.
At the other extreme, at the microscopic scale of the mean eigenvalue spacing (which is comparable to in the bulk, but can be as large as at the edge), the eigenvalues are asymptotically distributed with respect to a special determinantal point process, namely the Dyson sine process in the bulk (and the Airy process on the edge), as discussed in this previous post.
Here, I wish to discuss the mesoscopic structure of the eigenvalues, in which one involves scales that are intermediate between the microscopic scale and the macroscopic scale , for instance in correlating the eigenvalues and in the regime for some . Here, there is a surprising phenomenon; there is quite a long-range correlation between such eigenvalues. The result of Gustavsson shows that both and behave asymptotically like gaussian random variables, but a further result from the same paper shows that the correlation between these two random variables is asymptotic to (in the bulk, at least); thus, for instance, adjacent eigenvalues and are almost perfectly correlated (which makes sense, as their spacing is much less than either of their standard deviations), but that even very distant eigenvalues, such as and , have a correlation comparable to . One way to get a sense of this is to look at the trace
This is also the sum of the diagonal entries of a GUE matrix, and is thus normally distributed with a variance of . In contrast, each of the (in the bulk, at least) has a variance comparable to . In order for these two facts to be consistent, the average correlation between pairs of eigenvalues then has to be of the order of .
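The consistency check in the last few sentences can be written out as a one-line computation. This is only a back-of-the-envelope sketch: it assumes (as a simplification) that every eigenvalue has variance of size exactly $\log n / n$, including those near the edge, and it writes $\bar c$ for the average correlation over all pairs, a symbol introduced here for illustration.

```latex
\begin{aligned}
n = \mathrm{Var}(\lambda_1 + \ldots + \lambda_n)
  &= \sum_{i=1}^n \sum_{j=1}^n \mathrm{Cov}(\lambda_i, \lambda_j) \\
  &\approx n^2 \cdot \bar c \cdot \frac{\log n}{n}
   = \bar c \, n \log n,
\end{aligned}
\qquad \text{so that} \qquad \bar c \approx \frac{1}{\log n}.
```

In other words, if the eigenvalues were uncorrelated the trace would have variance of order $\log n$ only, and if they were perfectly correlated it would have variance of order $n \log n$; the observed value $n$ sits in between.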
Below the fold, I give a heuristic way to see this correlation, based on Taylor expansion of the convex Hamiltonian $H(\lambda)$ around the minimum $\gamma$, which gives a conceptual probabilistic model for the mesoscopic structure of the GUE eigenvalues. While this heuristic is in no way rigorous, it does seem to explain many of the features currently known or conjectured about GUE, and looks likely to extend also to other models.
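Let $n$ be a large integer, and let $M_n$ be the Gaussian Unitary Ensemble (GUE), i.e. the random Hermitian matrix with probability distribution
$$C_n e^{-\hbox{tr}(M_n^2)/2}\ dM_n$$
where $dM_n$ is a Haar measure on Hermitian matrices and $C_n$ is the normalisation constant required to make the distribution of unit mass. The eigenvalues $\lambda_1 < \ldots < \lambda_n$ of this matrix are then a coupled family of $n$ real random variables. For any $1 \leq k \leq n$, we can define the $k$-point correlation function $\rho_k(x_1,\ldots,x_k)$ to be the unique symmetric measure on ${\bf R}^k$ such that
$$\int_{{\bf R}^k} F(x_1,\ldots,x_k) \rho_k(x_1,\ldots,x_k)\ dx_1 \ldots dx_k = {\bf E} \sum_{1 \leq i_1, \ldots, i_k \leq n \hbox{ distinct}} F(\lambda_{i_1},\ldots,\lambda_{i_k})$$
for all test functions $F$. A standard computation (given for instance in these lecture notes of mine) gives the Ginibre formula
$$\rho_n(x_1,\ldots,x_n) = C'_n \left( \prod_{1 \leq i < j \leq n} |x_i - x_j|^2 \right) e^{-\sum_{j=1}^n x_j^2/2}$$
for the $n$-point correlation function, where $C'_n$ is another normalisation constant. Using Vandermonde determinants, one can rewrite this expression in determinantal form as
$$\rho_n(x_1,\ldots,x_n) = C''_n \det( K_n(x_i,x_j) )_{1 \leq i,j \leq n}$$
where the kernel $K_n$ is given by
$$K_n(x,y) := \sum_{k=0}^{n-1} \phi_k(x) \phi_k(y)$$
where $\phi_k(x) := P_k(x) e^{-x^2/4}$ and $P_0, P_1, \ldots$ are the ($L^2$-normalised) Hermite polynomials (thus the $\phi_k$ are an orthonormal family, with each $P_k$ being a polynomial of degree $k$). Integrating out one or more of the variables, one is led to the Gaudin-Mehta formula
$$\rho_k(x_1,\ldots,x_k) = \det( K_n(x_i,x_j) )_{1 \leq i,j \leq k}. \qquad (1)$$
(In particular, the normalisation constant $C''_n$ in the previous formula turns out to simply be equal to $1$.) Again, see these lecture notes for details.
The functions $\phi_k$ can be viewed as an orthonormal basis of eigenfunctions for the harmonic oscillator operator
$$L \phi := \left( -\frac{d^2}{dx^2} + \frac{x^2}{4} \right) \phi;$$
indeed it is a classical fact that
$$L \phi_k = \left( k + \frac{1}{2} \right) \phi_k.$$
As such, the kernel $K_n$ can be viewed as the integral kernel of the spectral projection operator $1_{(-\infty, n+\frac{1}{2}]}(L)$.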
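As a numerical sanity check of the eigenfunction property and of the size of the kernel on the diagonal, here is a small Python sketch. It assumes the normalisation $\phi_k(x) = P_k(x) e^{-x^2/4}$ with the $P_k$ orthonormal with respect to the weight $e^{-x^2/2}\,dx$; the helper names `phi`, `oscillator` and `kernel`, and the finite-difference step `h`, are illustrative choices rather than notation from the lecture notes.

```python
import math


def phi(k, x):
    """k-th Hermite function phi_k(x) = P_k(x) * exp(-x^2/4).

    Assumed normalisation: the P_k are orthonormal with respect to the
    weight exp(-x^2/2) dx, so the phi_k are orthonormal in L^2(R).
    """
    prev = 0.0
    cur = (2 * math.pi) ** -0.25 * math.exp(-x * x / 4)  # phi_0
    for j in range(k):
        # Normalised three-term recurrence:
        #   sqrt(j+1) * phi_{j+1} = x * phi_j - sqrt(j) * phi_{j-1}
        prev, cur = cur, (x * cur - math.sqrt(j) * prev) / math.sqrt(j + 1)
    return cur


def oscillator(f, x, h=1e-3):
    """Finite-difference approximation to (L f)(x) = -f''(x) + (x^2/4) f(x)."""
    second = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
    return -second + (x * x / 4) * f(x)


def kernel(n, x, y):
    """K_n(x, y) = sum of phi_k(x) * phi_k(y) over k = 0, ..., n-1."""
    return sum(phi(k, x) * phi(k, y) for k in range(n))


if __name__ == "__main__":
    x = 0.7
    for k in range(4):
        lhs = oscillator(lambda t: phi(k, t), x)
        rhs = (k + 0.5) * phi(k, x)
        print(k, lhs, rhs)  # the two columns should nearly agree
    n = 100
    print(kernel(n, 0.0, 0.0), math.sqrt(n) / math.pi)
```

For $n = 100$ the diagonal value $K_n(0,0)$ comes out within about one percent of $\sqrt{n}/\pi = \sqrt{n} \rho_{sc}(0)$, which is the density of eigenvalues at the origin predicted by the semicircular law.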
From (1) we see that the fine-scale structure of the eigenvalues of GUE is controlled by the asymptotics of $K_n$ as $n \to \infty$. The two main asymptotics of interest are given by the following lemmas:
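Lemma 1 (Asymptotics of $K_n$ in the bulk) Let $x_0 \in (-2,2)$, and let $\rho_{sc}(x_0) := \frac{1}{2\pi} (4-x_0^2)_+^{1/2}$ be the semicircular law density at $x_0$. Then, we have
$$\frac{1}{\sqrt{n} \rho_{sc}(x_0)} K_n\left( x_0 \sqrt{n} + \frac{y}{\sqrt{n} \rho_{sc}(x_0)}, x_0 \sqrt{n} + \frac{z}{\sqrt{n} \rho_{sc}(x_0)} \right) \rightarrow \frac{\sin(\pi(y-z))}{\pi(y-z)}$$
as $n \to \infty$ for any fixed $y, z \in {\bf R}$ (removing the singularity at $y=z$ in the usual manner).
Lemma 2 (Asymptotics of $K_n$ at the edge) We have
$$\frac{1}{n^{1/6}} K_n\left( 2\sqrt{n} + \frac{y}{n^{1/6}}, 2\sqrt{n} + \frac{z}{n^{1/6}} \right) \rightarrow \frac{\hbox{Ai}(y) \hbox{Ai}'(z) - \hbox{Ai}'(y) \hbox{Ai}(z)}{y-z}$$
as $n \to \infty$ for any fixed $y, z \in {\bf R}$, where $\hbox{Ai}$ is the Airy function
$$\hbox{Ai}(x) := \frac{1}{\pi} \int_0^\infty \cos\left( \frac{t^3}{3} + tx \right)\ dt$$
and again removing the singularity at $y=z$ in the usual manner.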
The proof of these asymptotics usually proceeds via computing the asymptotics of Hermite polynomials, together with the Christoffel-Darboux formula; this is for instance the approach taken in the previous notes. However, there is a slightly different approach that is closer in spirit to the methods of semi-classical analysis, which was briefly mentioned in the previous notes but not elaborated upon. For the sake of completeness, I am recording some notes on this approach here, although to focus on the main ideas I will not be completely rigorous in the derivation (ignoring issues such as convergence of integrals or of operators, or (removable) singularities in kernels caused by zeroes in the denominator).
Our study of random matrices, to date, has focused on somewhat general ensembles, such as iid random matrices or Wigner random matrices, in which the distribution of the individual entries of the matrices was essentially arbitrary (as long as certain moments, such as the mean and variance, were normalised). In these notes, we now focus on two much more special, and much more symmetric, ensembles:
- The Gaussian Unitary Ensemble (GUE), which is an ensemble of random $n \times n$ Hermitian matrices $M_n$ in which the upper-triangular entries are iid with distribution $N(0,1)_{\bf C}$, and the diagonal entries are iid with distribution $N(0,1)_{\bf R}$, and independent of the upper-triangular ones; and
- The Gaussian random matrix ensemble, which is an ensemble of random $n \times n$ (non-Hermitian) matrices $M_n$ whose entries are iid with distribution $N(0,1)_{\bf C}$.
The symmetric nature of these ensembles will allow us to compute the spectral distribution by exact algebraic means, revealing a surprising connection with orthogonal polynomials and with determinantal processes. This will, for instance, recover the semi-circular law for GUE, but will also reveal fine spacing information, such as the distribution of the gap between adjacent eigenvalues, which is largely out of reach of tools such as the Stieltjes transform method and the moment method (although the moment method, with some effort, is able to control the extreme edges of the spectrum).
Similarly, we will see for the first time the circular law for eigenvalues of non-Hermitian matrices.
There are a number of other highly symmetric ensembles which can also be treated by the same methods, most notably the Gaussian Orthogonal Ensemble (GOE) and the Gaussian Symplectic Ensemble (GSE). However, for simplicity we shall focus just on the above two ensembles. For a systematic treatment of these ensembles, see the text by Deift.
One theme in this course will be the central role played by the gaussian random variables $X \equiv N(\mu,\sigma^2)$. Gaussians have an incredibly rich algebraic structure, and many results about general random variables can be established by first using this structure to verify the result for gaussians, and then using universality techniques (such as the Lindeberg exchange strategy) to extend the results to more general variables.
One way to exploit this algebraic structure is to continuously deform the variance $t := \sigma^2$ from an initial variance of zero (so that the random variable is deterministic) to some final level $T$. We would like to use this to give a continuous family $t \mapsto X_t$ of random variables $X_t \equiv N(\mu, t)$ as $t$ (viewed as a “time” parameter) runs from $0$ to $T$.
At present, we have not completely specified what $X_t$ should be, because we have only described the individual distribution $X_t \equiv N(\mu,t)$ of each $X_t$, and not the joint distribution. However, there is a very natural way to specify a joint distribution of this type, known as Brownian motion. In these notes we lay the necessary probability theory foundations to set up this motion, and indicate its connection with the heat equation, the central limit theorem, and the Ornstein-Uhlenbeck process. This is the beginning of stochastic calculus, which we will not develop fully here.
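To illustrate the joint distribution in question, here is a minimal Python sketch of a discretised Brownian path, built by summing independent gaussian increments so that the value at time $t$ has distribution $N(\mu, t)$. The function names `brownian_path` and `sample_variance`, the number of steps, and the seed are all illustrative assumptions; the rigorous construction in the notes is of course a statement about continuous time, which this discretisation only approximates.

```python
import math
import random


def brownian_path(T, steps, rng, mu=0.0):
    """Sample a path at times 0, T/steps, 2T/steps, ..., T, starting at mu.

    Successive increments are independent N(0, dt) variables, so the value
    at time t has distribution N(mu, t).
    """
    dt = T / steps
    path = [mu]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def sample_variance(values):
    """Unbiased sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    T = 2.0
    endpoints = [brownian_path(T, 20, rng)[-1] for _ in range(20000)]
    # The empirical variance of X_T should be close to T.
    print(sample_variance(endpoints))
```

The same recipe applied entrywise to a Hermitian matrix (with independent increments in each independent entry) gives a discretised version of the matrix-valued Brownian motion mentioned next.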
We will begin with one-dimensional Brownian motion, but it is a simple matter to extend the process to higher dimensions. In particular, we can define Brownian motion on vector spaces of matrices, such as the space of Hermitian matrices. This process is equivariant with respect to conjugation by unitary matrices, and so we can quotient out by this conjugation and obtain a new process on the quotient space, or in other words on the spectrum of Hermitian matrices. This process is called Dyson Brownian motion, and turns out to have a simple description in terms of ordinary Brownian motion; it will play a key role in several of the subsequent notes in this course.
Given a set $S$, a (simple) point process is a random subset $A$ of $S$. (A non-simple point process would allow multiplicity; more formally, $A$ is no longer a subset of $S$, but is a Radon measure on $S$, where we give $S$ the structure of a locally compact Polish space, but I do not wish to dwell on these sorts of technical issues here.) Typically, $A$ will be finite or countable, even when $S$ is uncountable. Basic examples of point processes include
- (Bernoulli point process) $S$ is an at most countable set, $0 \leq p \leq 1$ is a parameter, and $A$ is a random set such that the events $x \in A$ for each $x \in S$ are jointly independent and occur with a probability of $p$ each. This process is automatically simple.
- (Discrete Poisson point process) $S$ is an at most countable space, $\lambda$ is a measure on $S$ (i.e. an assignment of a non-negative number $\lambda(\{x\})$ to each $x \in S$), and $A$ is a multiset where the multiplicity of $x$ in $A$ is a Poisson random variable with intensity $\lambda(\{x\})$, and the multiplicities of $x \in A$ as $x$ varies in $S$ are jointly independent. This process is usually not simple.
- (Continuous Poisson point process) $S$ is a locally compact Polish space with a Radon measure $\lambda$, and for each $\Omega \subset S$ of finite measure, the number of points $|A \cap \Omega|$ that $A$ contains inside $\Omega$ is a Poisson random variable with intensity $\lambda(\Omega)$. Furthermore, if $\Omega_1,\ldots,\Omega_n$ are disjoint sets, then the random variables $|A \cap \Omega_1|, \ldots, |A \cap \Omega_n|$ are jointly independent. (The fact that Poisson processes exist at all requires a non-trivial amount of measure theory, and will not be discussed here.) This process is almost surely simple iff all points in $S$ have measure zero.
- (Spectral point processes) The spectrum of a random matrix is a point process in ${\bf C}$ (or in ${\bf R}$, if the random matrix is Hermitian). If the spectrum is almost surely simple, then the point process is almost surely simple. In a similar spirit, the zeroes of a random polynomial are also a point process.
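The first two examples in this list are easy to simulate, which may help make the definitions concrete. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the Poisson sampler uses Knuth's classical multiplication method (suitable for small intensities only), and a multiset is represented as a dictionary from points to multiplicities.

```python
import math
import random


def bernoulli_process(points, p, rng):
    """Random subset A of `points`: each x lies in A independently with probability p."""
    return {x for x in points if rng.random() < p}


def poisson_sample(intensity, rng):
    """Poisson random variable via Knuth's multiplication method.

    Counts how many uniform factors can be multiplied together before the
    running product drops to exp(-intensity) or below.
    """
    threshold = math.exp(-intensity)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def discrete_poisson_process(intensities, rng):
    """Random multiset A, given a dict x -> lambda({x}).

    Returns a dict x -> multiplicity of x in A (points of multiplicity zero
    are omitted). Multiplicities at different points are independent.
    """
    multiset = {}
    for x, lam in intensities.items():
        m = poisson_sample(lam, rng)
        if m > 0:
            multiset[x] = m
    return multiset


if __name__ == "__main__":
    rng = random.Random(1)
    print(sorted(bernoulli_process(range(10), 0.3, rng)))
    print(discrete_poisson_process({"a": 0.5, "b": 2.0, "c": 4.0}, rng))
```

Note that the Bernoulli sample is a genuine set, reflecting the fact that the process is automatically simple, whereas the Poisson sample will typically contain points of multiplicity greater than one.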
A remarkable fact is that many natural (simple) point processes are determinantal processes. Very roughly speaking, this means that there exists a positive semi-definite kernel $K: S \times S \rightarrow {\bf R}$ such that, for any $x_1,\ldots,x_n \in S$, the probability that $x_1,\ldots,x_n$ all lie in the random set $A$ is proportional to the determinant $\det( K(x_i,x_j) )_{1 \leq i,j \leq n}$. Examples of processes known to be determinantal include non-intersecting random walks, spectra of random matrix ensembles such as GUE, and zeroes of polynomials with gaussian coefficients.
I would be interested in finding a good explanation (even at the heuristic level) as to why determinantal processes are so prevalent in practice. I do have a very weak explanation, namely that determinantal processes obey a large number of rather pretty algebraic identities, and so it is plausible that any other process which has a very algebraic structure (in particular, any process involving gaussians, characteristic polynomials, etc.) would be connected in some way with determinantal processes. I’m not particularly satisfied with this explanation, but I thought I would at least describe some of these identities below to support this case. (This is partly for my own benefit, as I am trying to learn about these processes, particularly in connection with the spectral distribution of random matrices.) The material here is partly based on this survey of Hough, Krishnapur, Peres, and Virág.
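The Riemann zeta function $\zeta(s)$, defined for $\hbox{Re}(s) > 1$ by
$$\zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s} \qquad (1)$$
and then continued meromorphically to other values of $s$ by analytic continuation, is a fundamentally important function in analytic number theory, as it is connected to the primes $p = 2, 3, 5, \ldots$ via the Euler product formula
$$\zeta(s) = \prod_p \left( 1 - \frac{1}{p^s} \right)^{-1} \qquad (2)$$
(for $\hbox{Re}(s) > 1$, at least), where $p$ ranges over primes. (The equivalence between (1) and (2) is essentially the generating function version of the fundamental theorem of arithmetic.) The function $\zeta$ has a pole at $1$ and a number of zeroes $\rho$. A formal application of the factor theorem gives
$$\zeta(s) = \frac{1}{s-1} \prod_\rho (s-\rho) \times \ldots \qquad (3)$$
where $\rho$ ranges over zeroes of $\zeta$, and we will be vague about what the $\ldots$ factor is, how to make sense of the infinite product, and exactly which zeroes of $\zeta$ are involved in the product. Equating (2) and (3) and taking logarithms gives the formal identity
$$- \log \zeta(s) = \sum_p \log\left( 1 - \frac{1}{p^s} \right) = \log(s-1) - \sum_\rho \log(s-\rho) + \ldots; \qquad (4)$$
using the Taylor expansion
$$\log\left( 1 - \frac{1}{p^s} \right) = - \frac{1}{p^s} - \frac{1}{2 p^{2s}} - \frac{1}{3 p^{3s}} - \ldots \qquad (5)$$
and differentiating the above identity in $s$ yields the formal identity
$$- \frac{\zeta'(s)}{\zeta(s)} = \sum_n \frac{\Lambda(n)}{n^s} = \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} + \ldots \qquad (6)$$
where $\Lambda(n)$ is the von Mangoldt function, defined to be $\log p$ when $n$ is a power of a prime $p$, and zero otherwise. Thus we see that the behaviour of the primes (as encoded by the von Mangoldt function) is intimately tied to the distribution of the zeroes $\rho$. For instance, if we knew that the zeroes were far away from the axis $\hbox{Re}(s) = 1$, then we would heuristically have
$$\sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \frac{1}{it}$$
for real $t$. On the other hand, the integral test suggests that
$$\sum_n \frac{1}{n^{1+it}} \approx \frac{1}{it}$$
and thus we see that $\frac{\Lambda(n)}{n}$ and $\frac{1}{n}$ have essentially the same (multiplicative) Fourier transform:
$$\sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \sum_n \frac{1}{n^{1+it}}.$$
Inverting the Fourier transform (or performing a contour integral closely related to the inverse Fourier transform), one is led to the prime number theorem
$$\sum_{n \leq x} \Lambda(n) \approx \sum_{n \leq x} 1.$$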
In fact, the standard proof of the prime number theorem basically proceeds by making all of the above formal arguments precise and rigorous.
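The objects in this formal argument are easy to experiment with numerically. The Python sketch below is a small illustration, not part of the proof: it tabulates the von Mangoldt function, compares a truncated Dirichlet series for $\zeta(2)$ with a truncated Euler product, and compares $\sum_{n \leq x} \Lambda(n)$ with $x$. All function names and the cutoff `N` are illustrative assumptions.

```python
import math


def primes_up_to(N):
    """Sieve of Eratosthenes: list of all primes p <= N."""
    sieve = bytearray([1]) * (N + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(N) + 1):
        if sieve[p]:
            # Cross out p^2, p^2 + p, ... up to N.
            sieve[p * p :: p] = bytearray(len(range(p * p, N + 1, p)))
    return [n for n in range(N + 1) if sieve[n]]


def von_mangoldt_table(N):
    """List lam with lam[n] = log p if n is a power of a prime p, else 0."""
    lam = [0.0] * (N + 1)
    for p in primes_up_to(N):
        q = p
        while q <= N:
            lam[q] = math.log(p)
            q *= p
    return lam


def chebyshev_psi(lam, x):
    """Sum of the von Mangoldt function over 1 <= n <= x."""
    return sum(lam[1 : x + 1])


def zeta_sum(s, N):
    """Truncated Dirichlet series: sum of n^(-s) over 1 <= n <= N."""
    return sum(n ** -s for n in range(1, N + 1))


def euler_product(s, N):
    """Truncated Euler product: product of (1 - p^(-s))^(-1) over primes p <= N."""
    product = 1.0
    for p in primes_up_to(N):
        product /= 1 - p ** -s
    return product


if __name__ == "__main__":
    N = 10000
    lam = von_mangoldt_table(N)
    print(zeta_sum(2, N), euler_product(2, N), math.pi ** 2 / 6)
    print(chebyshev_psi(lam, N) / N)  # close to 1, as the prime number theorem predicts
```

One can also check the identity $\sum_{d | n} \Lambda(d) = \log n$ from the table, which is the form the fundamental theorem of arithmetic takes after taking logarithms.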
Unfortunately, we don’t know as much about the zeroes $\rho$ of the zeta function (and hence, about the function $\zeta$ itself) as we would like. The Riemann hypothesis (RH) asserts that all the zeroes (except for the “trivial” zeroes at the negative even numbers) lie on the critical line $\{ s: \hbox{Re}(s) = 1/2 \}$; this hypothesis would make the error terms in the above proof of the prime number theorem significantly more accurate. Furthermore, the stronger GUE hypothesis asserts in addition to RH that the local distribution of these zeroes on the critical line should behave like the local distribution of the eigenvalues of a random matrix drawn from the gaussian unitary ensemble (GUE). I will not give a precise formulation of this hypothesis here, except to say that the adjective “local” in the context of distribution of zeroes $\rho$ means something like “at scale $O(1/\log T)$ when $\hbox{Im}(s) = O(T)$“.
Nevertheless, we do know some reasonably non-trivial facts about the zeroes $\rho$ and the zeta function $\zeta$, either unconditionally, or assuming RH (or GUE). Firstly, there are no zeroes for $\hbox{Re}(s) > 1$ (as one can already see from the convergence of the Euler product (2) in this case) or for $\hbox{Re}(s) = 1$ (this is trickier, relying on (6) and the elementary observation that
$$\hbox{Re}\left( 3 \frac{\Lambda(n)}{n^{\sigma}} + 4 \frac{\Lambda(n)}{n^{\sigma+it}} + \frac{\Lambda(n)}{n^{\sigma+2it}} \right) = 2 \frac{\Lambda(n)}{n^\sigma} (1 + \cos(t \log n))^2$$
is non-negative for $\sigma > 1$ and $t \in {\bf R}$); from the functional equation
$$\pi^{-s/2} \Gamma(s/2) \zeta(s) = \pi^{-(1-s)/2} \Gamma((1-s)/2) \zeta(1-s)$$
(which can be viewed as a consequence of the Poisson summation formula, see e.g. my blog post on this topic) we know that there are no zeroes for $\hbox{Re}(s) \leq 0$ either (except for the trivial zeroes at negative even integers, corresponding to the poles of the Gamma function). Thus all the non-trivial zeroes lie in the critical strip $\{ s: 0 < \hbox{Re}(s) < 1 \}$.
We also know that there are infinitely many non-trivial zeroes, and can approximately count how many zeroes there are in any large bounded region of the critical strip. For instance, for large $T$, the number of zeroes $\rho$ in this strip with $\hbox{Im}(\rho) = T + O(1)$ is $O(\log T)$. This can be seen by applying (6) to $s = 2 + iT$ (say); the trivial zeroes at the negative even integers end up giving a contribution of $O(\log T)$ to this sum (this is a heavily disguised variant of Stirling’s formula, as one can view the trivial zeroes as essentially being poles of the Gamma function), while the $\frac{1}{s-1}$ and $\ldots$ terms end up being negligible (of size $O(1)$), while each non-trivial zero $\rho$ contributes a term which has a non-negative real part, and furthermore has size comparable to $1$ if $\hbox{Im}(\rho) = T + O(1)$. (Here I am glossing over a technical renormalisation needed to make the infinite series in (6) converge properly.) Meanwhile, the left-hand side of (6) is absolutely convergent for $s = 2 + iT$ and of size $O(1)$, and the claim follows. A more refined version of this argument shows that the number of non-trivial zeroes with $0 \leq \hbox{Im}(\rho) \leq T$ is $\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T)$, but we will not need this more precise formula here. (A fair fraction of these zeroes, at least 40% in fact, are known to lie on the critical line; see this earlier blog post of mine for more discussion.)
Another thing that we happen to know is how the magnitude $|\zeta(\frac{1}{2}+it)|$ of the zeta function is distributed as $t \to \infty$; it turns out to be log-normally distributed with log-variance about $\frac{1}{2} \log\log t$. More precisely, we have the following result of Selberg:
Theorem 1 Let $T$ be a large number, and let $t$ be chosen uniformly at random from between $T$ and $2T$ (say). Then the distribution of $\frac{1}{\sqrt{\frac{1}{2} \log\log T}} \log |\zeta(\frac{1}{2}+it)|$ converges (in distribution) to the normal distribution $N(0,1)$.
To put it more informally, $|\zeta(\frac{1}{2}+it)|$ behaves like $\exp\left( \sqrt{\frac{1}{2} \log\log t} \times N(0,1) \right)$ plus lower order terms for “typical” large values of $t$. (Zeroes $\rho$ of $\zeta$ are, of course, certainly not typical, but one can show that one can usually stay away from these zeroes.) In fact, Selberg showed a slightly more precise result, namely that for any fixed $k \geq 1$, the $k^{th}$ moment of $\frac{1}{\sqrt{\frac{1}{2} \log\log T}} \log |\zeta(\frac{1}{2}+it)|$ converges to the $k^{th}$ moment of $N(0,1)$.
Remarkably, Selberg’s result does not need RH or GUE, though it is certainly consistent with such hypotheses. (For instance, the determinant of a GUE matrix asymptotically obeys a remarkably similar log-normal law to that given by Selberg’s theorem.) Indeed, the net effect of these hypotheses is only to change some error terms in $\log |\zeta(\frac{1}{2}+it)|$ of magnitude $O(1)$, which are thus asymptotically negligible compared to the main term, which has magnitude about $O(\sqrt{\log\log T})$. So Selberg’s result, while very pretty, manages to finesse the question of what the zeroes $\rho$ of $\zeta$ are actually doing: he makes the primes do most of the work, rather than the zeroes.
Selberg never actually published the above result, but it is reproduced in a number of places (e.g. in this book by Joyner, or this book by Laurincikas). As with many other results in analytic number theory, the actual details of the proof can get somewhat technical; but I would like to record here (partly for my own benefit) an informal sketch of some of the main ideas in the argument.