A Dynamical Approach to Random Matrix Theory 1470436485, 9781470436483

This book is a concise and self-contained introduction of recent techniques to prove local spectral universality for lar

678 116 1MB

English Pages x+227 [239] Year 2017

Report DMCA / Copyright

DOWNLOAD FILE

Polecaj historie

A Dynamical Approach to Random Matrix Theory
 1470436485, 9781470436483

Citation preview

C O U R A N T

28

˝S LÁSZLÓ ERDO

LECTURE

HO RNG-TZER YAU

A Dynamical Approach to Random Matrix Theory

American Mathematical Society Courant Institute of Mathematical Sciences

NOTES

A Dynamical Approach to Random Matrix Theory

Courant Lecture Notes in Mathematics Executive Editor Jalal Shatah Managing Editor Paul D. Monsour Production Editor Neelang Parghi Copy Editor Michael Munn

László Erdo˝s Institute of Science and Technology Austria

Horng-Tzer Yau Harvard University

28

A Dynamical Approach to Random Matrix Theory

Courant Institute of Mathematical Sciences New York University New York, New York American Mathematical Society Providence, Rhode Island

2010 Mathematics Subject Classification. Primary 15B52, 82B44.

For additional information and updates on this book, visit www.ams.org/bookpages/cln-28

Library of Congress Cataloging-in-Publication Data Names: Erd˝ os, L´ aszl´ o, 1966- | Yau, Horng-Tzer. | Courant Institute of Mathematical Sciences. Title: A dynamical approach to random matrix theory / L´ aszl´ o Erd˝ os, Institute of Science and Technology, Austria, Horng-Tzer Yau, Harvard University. Description: Providence, RI : American Mathematical Society, [2017] | Series: Courant lecture notes in mathematics ; volume 28 | “Courant Institute of Mathematical Sciences.” | Includes bibliographical references and index. Identifiers: LCCN 2017012815 | ISBN 9781470436483 (alk. paper) Subjects: LCSH: Random matrices. Classification: LCC QA196.5 .E73 2017 | DDC 512.9/434–dc23 LC record available at https://lccn.loc.gov/2017012815

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy select pages for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Permissions to reuse portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink service. For more information, please visit: http://www.ams.org/rightslink. Send requests for translation rights and licensed reprints to [email protected]. Excluded from these provisions is material for which the author holds copyright. In such cases, requests for permission to reuse or reprint material should be addressed directly to the author(s). Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the first page of each article within proceedings volumes. c 2017 by the authors. All rights reserved.  Printed in the United States of America. ∞ The paper used in this book is acid-free and falls within the guidelines 

established to ensure permanence and durability. Visit the AMS home page at http://www.ams.org/ 10 9 8 7 6 5 4 3 2 1

22 21 20 19 18 17

Contents Preface

ix

Chapter 1. Introduction

1

Chapter 2. Wigner Matrices and Their Generalizations

7

Chapter 3. Eigenvalue Density 3.1. Wigner Semicircle Law and Other Canonical Densities 3.2. The Moment Method 3.3. The Resolvent Method and the Stieltjes Transform

11 11 12 14

Chapter 4. Invariant Ensembles 4.1. Joint Density of Eigenvalues for Invariant Ensembles 4.2. Universality of Classical Invariant Ensembles via Orthogonal Polynomials

17 17

Chapter 5. Universality for Generalized Wigner Matrices 5.1. Different Notions of Universality 5.2. The Three-Step Strategy

29 29 31

Chapter 6. Local Semicircle Law for Universal Wigner Matrices 6.1. Setup 6.2. Spectral Information on 𝑆 6.3. Stochastic Domination 6.4. Statement of the Local Semicircle Law 6.5. Appendix: Behavior of Γ and Γ̃ and the Proof of Lemma 6.3

33 33 35 37 38 40

Chapter 7. Weak Local Semicircle Law 7.1. Proof of the Weak Local Semicircle Law, Theorem 7.1 7.2. Large-Deviation Estimates

45 45 57

Chapter 8. Proof of the Local Semicircle Law 8.1. Tools 8.2. Self-Consistent Equations on Two Levels 8.3. Proof of the Local Semicircle Law Without Using the Spectral Gap

61 61 64 67

Chapter 9. Sketch of the Proof of the Local Semicircle Law Using the Spectral Gap

79

v

20

vi

CONTENTS

Chapter 10. Fluctuation Averaging Mechanism 10.1. Intuition Behind the Fluctuation Averaging 10.2. Proof of Lemma 8.9 10.3. Alternative Proof of (8.47) of Lemma 8.9

83 83 84 94

Chapter 11. Eigenvalue Location: The Rigidity Phenomenon 11.1. Extreme Eigenvalues 11.2. Stieltjes Transform and Regularized Counting Function 11.3. Convergence Speed of the Empirical Distribution Function 11.4. Rigidity of Eigenvalues

99 99 100 103 106

Chapter 12. Universality for Matrices with Gaussian Convolutions 12.1. Dyson Brownian Motion 12.2. Derivation of Dyson Brownian Motion and Perturbation Theory 12.3. Strong Local Ergodicity of the Dyson Brownian Motion 12.4. Existence and Restriction of the Dynamics

109 109 112 114 119

Chapter 13. Entropy and the Logarithmic Sobolev Inequality (LSI) 13.1. Basic Properties of the Entropy 13.2. Entropy on Product Spaces and Conditioning 13.3. Logarithmic Sobolev Inequality 13.4. Hypercontractivity 13.5. Brascamp-Lieb Inequality 13.6. Remarks on the Applications of the LSI to Random Matrices 13.7. Extensions to the Simplex; Regularization of the DBM

123 123 126 128 136 138 140 144

Chapter 14. Universality of the Dyson Brownian Motion 14.1. Main Ideas Behind the Proof of Theorem 14.1 14.2. Proof of Theorem 14.1 14.3. Restriction to the Simplex via Regularization 14.4. Dirichlet Form Inequality for Any 𝛽 > 0 14.5. From Gap Distribution to Correlation Functions: Proof of Theorem 12.4 14.6. Details of the Proof of Lemma 14.8

151 153 154 161 163

Continuity of Local Correlation Functions Under the Matrix OU Process 15.1. Proof of Theorem 15.2 15.2. Proof of the Correlation Function Comparison Theorem, Theorem 15.3

165 166

Chapter 15.

Universality of Wigner Matrices in Small Energy Windows: GFT 16.1. Green Function Comparison Theorems 16.2. Conclusion of the Three-Step Strategy

171 174 177

Chapter 16.

Chapter 17.

Edge Universality

183 183 188 191

CONTENTS

vii

Chapter 18. Further Results and Historical Notes 18.1. Wigner Matrices: Bulk Universality in Different Senses 18.2. Historical Overview of the Three-Step Strategy for Bulk Universality 18.3. Invariant Ensembles and Log-Gases 18.4. Universality at the Spectral Edge 18.5. Eigenvectors 18.6. General Mean Field Models 18.7. Beyond Mean Field Models: Band Matrices

203 203

References

217

Index

225

205 207 210 211 213 215

Preface This book is a concise and self-contained introduction to the recent techniques to prove local spectral universality for large random matrices. Random matrix theory is a fast-expanding research area, and this book mainly focuses on the methods we participated in developing over the past few years. Many other interesting topics are not included, nor are several new developments within the framework of these methods. We have chosen instead to present key concepts that we believe are the core of these methods and should be relevant for future applications. We keep technicalities to a minimum to make the book accessible to graduate students. With this in mind, we include in this book the basic notions and tools for high-dimensional analysis such as large deviation, entropy, Dirichlet form, and logarithmic Sobolev inequality. The material in this book originates from our joint works with a group of collaborators over the past several years. Not only were the main mathematical results in this book taken from these works, but the presentation of many sections followed the routes laid out in these papers. In alphabetical order, these coauthors were Paul Bourgade, Antti Knowles, Sandrine Péché, Jose Ramírez, Benjamin Schlein, and Jun Yin. We would like to thank all of them. This manuscript was developed and continuously improved over the last five years. We have taught this material in several regular graduate courses at Harvard, Munich, and Vienna, in addition to various summer schools and short courses. We are thankful for the generous support of the Institute for Advanced Study, Princeton, where part of this manuscript was written during the special year devoted to random matrices in 2013–14. L.E. also thanks Harvard University for the continuous support during his numerous visits. L.E. was partially supported by the SFB TR 12 grant of the German Science Foundation and the ERC Advanced Grant, RANMAT 338804 of the European Research Council. H.-T. Y. would like to thank the National Center for the Theoretic Sciences at the National Taiwan University, where part of the manuscript was written, for the hospitality and support for his several visits. H.-T. Y. gratefully acknowledges the support from NSF DMS-1307444 and DMS-1606305 and a Simons Investigator award. Finally, we are grateful to the editorial support from the publisher, to Amol Aggarwal, Johannes Alt, and Patrick Lopatto for careful reading of the manuscript, and to Alex Gontar for his help in composing the bibliography.

ix

CHAPTER 1

Introduction Perhaps I am now too courageous when I try to guess the distribution of the distances between successive levels [of energies of heavy nuclei] …. Theoretically, the situation is quite simple if one attacks the problem in a simple-minded fashion. The question is simply ‘what are the distances of the characteristic values of a symmetric matrix with random coefficients?’ Eugene Wigner on the Wigner surmise, 1956 [71]

Random matrices appeared in the literature as early as 1928, when Wishart [139] used them in statistics. The natural question regarding their eigenvalue statistics, however, was not raised until the pioneering work [138] of Eugene Wigner in the 1950s. Wigner’s original motivation came from nuclear physics when he noticed from experimental data that gaps in energy levels of large nuclei tend to follow the same statistics irrespective of the material. Quantum mechanics predicts that energy levels are eigenvalues of a self-adjoint operator, but the correct Hamiltonian operator describing nuclear forces was not known at that time. In addition, the computation of the energy levels of large quantum systems would have been impossible even with the full Hamiltonian explicitly given. Instead of pursuing a direct solution to this problem, Wigner appealed to a phenomenological model to explain his observation. Wigner’s pioneering idea was to model the complex Hamiltonian by a random matrix with independent entries. All physical details of the system were ignored except one, the symmetry type: systems with time-reversal symmetry were modeled by real symmetric random matrices, while complex Hermitian random matrices were used for systems without time-reversal symmetry (e.g., with magnetic forces). This simple-minded model amazingly reproduced the correct gap statistics, indicating a profound universality principle working in the background. Notwithstanding their physical significance, random matrices are also very natural mathematical objects, and their studies could have been initiated by mathematicians driven by pure curiosity. A large number of random numbers and vectors have been known to exhibit universal patterns; the obvious examples are the law of large numbers and the central limit theorem. What are their analogues in the noncommutative setting, e.g., for matrices? Focusing on the spectrum, what do eigenvalues of typical large random matrices look like? 1

2

1. INTRODUCTION

As the first result of this type, Wigner proved a type of law of large numbers for the density of eigenvalues, which we now explain. The (real or complex) Wigner ensembles consist of 𝑁 × 𝑁 self-adjoint matrices 𝐻 = (ℎ𝑖𝑗 ) with matrix elements having mean zero and variance 1/𝑁 that are independent up to the symmetry constraint ℎ𝑖𝑗 = ℎ𝑗𝑖 . The Wigner semicircle law states that the empirical density of the eigenvalues of 𝐻 is given by the semicircle law, 1 𝜚sc (𝑥) = 2𝜋 √(4 − 𝑥 2 )+ , as 𝑁 → ∞, independent of the details of the distribution of ℎ𝑖𝑗 . On the scale of individual eigenvalues, Wigner predicted that the fluctuations of the gaps are universal and their distribution is given by a new law, the Wigner surmise. This might be viewed as the random matrix analogue of the central limit theorem. After Wigner’s discovery, Dyson, Gaudin, and Mehta achieved several fundamental mathematical results. In particular, they were able to compute the gap distribution and the local correlation functions of the eigenvalues for Gaussian ensembles. They are called the Gaussian orthogonal ensemble (GOE) and the Gaussian unitary ensemble (GUE), corresponding to the two most important symmetry classes, the real symmetric and complex Hermitian matrices. The Wigner surmise turned out to be slightly wrong and the correct law is given by the Gaudin distribution. Dyson and Mehta gradually formulated what is nowadays known as the Wigner-Dyson-Mehta (WDM) universality conjecture. As presented in the classical treatise of Mehta [106], this conjecture asserts that the local eigenvalue statistics for large random matrices with independent entries are universal; i.e., they do not depend on the particular distribution of the matrix elements. In particular, they coincide with those in the Gaussian case that were computed explicitly. On a more philosophical level, we can recast Wigner’s vision as the hypothesis that the eigenvalue gap distributions for large complicated quantum systems are universal in the sense that they depend only on the symmetry class of the physical system but not on other detailed structures. Therefore, the WignerDyson-Mehta universality conjecture is merely a test of Wigner’s hypothesis for a special class of matrix models, the Wigner ensembles, which are characterized by the independence of their matrix elements. The other large class of matrix models is the invariant ensembles. They are defined by a Radon-Nikodym density of the form exp(−Tr 𝑉(𝐻)) with respect to the flat Lebesgue measure on the space of real symmetric or complex Hermitian matrices 𝐻. Here 𝑉 is a real-valued function called the potential of the invariant ensemble. These distributions are invariant under orthogonal or unitary conjugation, but the matrix elements are not independent except for the Gaussian case. The universality conjecture for invariant ensembles asserts that the local eigenvalue gap statistics are independent of the function 𝑉. In contrast to the Wigner case, substantial progress was made for invariant ensembles in the last two decades. The key element of this success was that invariant ensembles, unlike Wigner matrices, have explicit formulas for

1. INTRODUCTION

3

the joint densities of the eigenvalues. These formulas express the eigenvalue correlation functions as determinants whose entries are given by functions of orthogonal polynomials. The limiting local eigenvalue statistics are thus determined by the asymptotic behavior of these formulas and in particular those of the orthogonal polynomials as the size of the matrix tends to infinity. A key important tool, the Riemann-Hilbert method [72], was brought into this subject by Fokas, Its, and Kitaev, and the universality of eigenvalue statistics was established for large classes of invariant ensembles by Bleher-Its [20] and by Deift and collaborators [37–41]. Behind the spectacular progress in understanding the local statistics of invariant ensembles, the cornerstone of this approach lies in the fact that there are explicit formulas to represent eigenvalue correlation functions by orthogonal polynomials—a key observation made in the original work of Gaudin and Mehta for Gaussian random matrices. For Wigner ensembles, there are no explicit formulas for the eigenvalue statistics, and the WDM conjecture was open for almost 50 years with virtually no progress. The first significant advance in this direction was made by Johansson [86], who proved the universality for complex Hermitian matrices under the assumption that the common distribution of the matrix entries has a substantial Gaussian component; i.e., the random matrix 𝐻 is of the form 𝐻 = 𝐻0 + 𝑎𝐻 G where 𝐻0 is a general Wigner matrix, 𝐻 G is the GUE matrix, and 𝑎 is a certain, not too small, positive constant independent of 𝑁. His proof relied on an explicit formula by Brézin and Hikami [30, 31] that uses a certain version of the Harish-Chandra-ItzyksonZuber formula [85]. These formulas are available for the complex Hermitian case only, which restricted the method to this symmetry class. If local spectral universality is so ubiquitous as Wigner, Dyson, and Mehta conjectured, there must be an underlying mechanism driving local statistics to their universal fixed point. In hindsight, the existence of such a mechanism is almost a synonym of “universality.” However, up to ten years ago there was no clear indication whether a solution of the WDM conjecture would rely on some yet-to-be discovered formulas or would come from some other deep insight. The goal of this book is to give a self-contained introduction to a new approach to local spectral universality which, in particular, resolves the WDM conjecture. To keep technicalities to a minimum, we will consider the simplest class of matrices that we call generalized Wigner matrices (Definition 2.1), and we will focus on proving universality in a simpler form, in the averaged energy sense (see Section 5.1) and the necessary prerequisites. We stress that these are not the strongest results that are currently available, but they are good representatives of the general ideas we have developed in the past several years. We believe that these techniques are sufficiently mature to be presented in a book format.

4

1. INTRODUCTION

This approach consists of the following three steps: Step 1. Local semicircle law: It provides an a priori estimate showing that the density of eigenvalues of generalized Wigner matrices is given by the semicircle law at very small microscopic scales, i.e., down to spectral intervals that contain 𝑁 𝜀 eigenvalues. Step 2. Universality for Gaussian divisible ensembles: It proves that the local statistics of Gaussian divisible ensembles 𝐻0 + 𝑎𝐻 G are the same as those of the Gaussian ensembles 𝐻 G as long as 𝑎 ≥ 𝑁 −1/2+𝜀 , i.e., already for very small 𝑎. Step 3. Approximation by a Gaussian divisible ensemble: It is a type of “density argument” that extends the local spectral universality from Gaussian divisible ensembles to all Wigner ensembles. The conceptually novel point of our method is Step 2. The eigenvalue distributions of the Gaussian divisible ensembles, written in the form 𝑒−𝑡/2 𝐻0 + √1 − 𝑒−𝑡 𝐻 G , are the same as that of the solution of a matrix-valued OrnsteinUhlenbeck (OU) process 𝐻𝑡 for any time 𝑡 ≥ 0. Dyson [45] observed half a century ago that the dynamics of the eigenvalues of 𝐻𝑡 is given by an interacting stochastic particle system, called the Dyson Brownian motion (DBM). In addition, the invariant measure of this dynamics is exactly the eigenvalue distribution of GOE or GUE. This invariant measure is also a Gibbs measure of point particles in one dimension interacting via a long-range, logarithmic potential. Using a heuristic physical argument, Dyson remarked [45] that the DBM reaches its “local equilibrium” on a short time scale 𝑡 ≳ 𝑁 −1 . We will refer to this as Dyson’s conjecture, although it was more an intuitive physical picture than an exact mathematical statement. Since Dyson’s work in the 1960s, there has been virtually no progress in proving this conjecture. Besides the limit of available mathematical tools, one main factor is the vague formulation of the conjecture involving the notion of “local equilibrium,” which even nowadays is not well-defined for a Gibbs measure with a general long-range interaction. Furthermore, a possible connection between Dyson’s conjecture and the solution of the WDM conjecture has never been elucidated in the literature. In fact, “relaxation to local equilibrium” in a time scale 𝑡 refers to the phenomenon that after time 𝑡 the dynamics has changed the system, at least locally, from its initial state to a local equilibrium. It therefore appears counterintuitive that one may learn anything useful in this way about the WDM conjecture, since the latter concerns the initial state. The key point is that by applying local relaxation to all initial states (within a reasonable class) simultaneously, Step 2 generates a large set of random matrix ensembles for which universality holds. We prove that, for the purpose of universality, this set is sufficiently dense so that any Wigner matrix 𝐻 is sufficiently close to a Gaussian divisible ensemble of the form 𝑒−𝑡/2 𝐻0 + √1 − 𝑒−𝑡 𝐻 G with a suitably chosen 𝐻0 . Originally, our motivation was driven not by Dyson’s conjecture, but by the desire to prove the

1. INTRODUCTION

5

universality for Gaussian divisible ensembles with the Gaussian component as small as possible. As it turned out, our method used in Step 2 actually proved a precisely formulated version of Dyson’s conjecture. The applicability of our approach actually goes well beyond the original scope. In recent years the method has been extended to invariant ensembles, sample covariance matrices, adjacency matrices of sparse random graphs, and certain band matrices. At the end of the book we will give a short overview of these applications. In the following, we outline the main contents of the book. In Chapters 2 through 4 we start with an elementary recapitulation of some well-known concepts, such as the moment method, the orthogonal polynomial method, the emergence of the sine kernel, the Tracy-Widom distribution, and the invariant ensembles. These sections are included only to provide backgrounds for the topics discussed in this book; details are often omitted, since these issues are discussed extensively in other books. The core material starts from Chapter 5 with a precise definition of different concepts of universality and with the formulation of a representative theorem (Theorem 5.1) that we eventually prove in this book. We also give here a detailed overview of the three-step strategy. The first step, the local version of Wigner’s semicircle law, is given in Chapter 6. For pedagogical reasons, we give the proof on several levels. A weaker version is given in Chapter 7, whose proof is easier; an ambitious reader may skip this section. The proof of the stronger version is found in Chapter 8, which gives the optimal result in the bulk of the spectrum. To obtain the optimal result at the edge, an additional idea is needed. This is sketched in Chapter 9, but we do not give all the technical details. This section may be skipped at first reading. A key ingredient of the proofs, the fluctuation averaging mechanism, is presented separately in Chapter 10. Important consequences of the local law, such as rigidity, speed of convergence, and eigenvector delocalization, are given in Chapter 11. The second step, the analysis of the Dyson Brownian motion and the proof of Dyson’s conjecture, is presented in Chapter 12 and Chapter 14. Before the proof in Chapter 14, we added a separate Chapter 13 devoted to general tools of very large dimensional analysis, where we introduce the concepts of entropy, Dirichlet form, Bakry-Émery method, and logarithmic Sobolev inequality. Some concepts (e.g., the Brascamp-Lieb inequality and hypercontractivity) are not used in this book, but we included them since they are useful and related topics. The third step, the perturbative argument to compare arbitrary Wigner matrices with Gaussian divisible matrices, is given in two versions. A shorter but somewhat weaker version, the continuity of the correlation functions, is found in Chapter 15. A technically more involved version, the Green function comparison theorem, is presented in Chapter 16. This completes the proof of our main result. Separately, in Chapter 17 we prove the universality at the edge; the main input is the strong form of the local semicircle law at the edge. Finally, we

6

1. INTRODUCTION

give a historical overview, discuss newer results, and provide some outlook on related models in Chapter 18. This book is not a comprehensive monograph on random matrices. Although we summarize a few general concepts at the beginning, we mostly focus on a concrete strategy and the necessary technical steps behind it. For readers interested in other aspects, in addition to the classical book of Mehta [106], several excellent works are available that present random matrices in a broader scope. The books by Anderson, Guionnet, and Zeitouni [8] and Pastur and Shcherbina [113] contain extensive material starting from the basics. Tao’s book [128] provides a different aspect to this subject and is self-contained as a graduate textbook. Forrester’s monograph [73] is a handbook for any explicit formulas related to random matrices. Finally, [6] is an excellent comprehensive overview of diverse applications of random matrix theory in mathematics, physics, neural networks, and engineering. Finally, we list a few notational conventions. In order to focus on the essentials, we will not follow the dependencies of various constants on different parameters. In particular, we will use the generic letters 𝐶 and 𝑐 to denote positive constants, whose values may change from line to line and which may depend on some fixed basic parameters of the model. For two positive quantities 𝐴 and 𝐵, we will write 𝐴 ≲ 𝐵 to indicate that there exists a constant 𝐶 such that 𝐴 ≤ 𝐶𝐵. If 𝐴 and 𝐵 are comparable in the sense that 𝐴 ≲ 𝐵 and 𝐵 ≲ 𝐴, then we write 𝐴 ≍ 𝐵. We introduce the notation J𝐴, 𝐵K ∶= ℤ ∩ [𝐴, 𝐵] for the set of integers between any two real numbers 𝐴 < 𝐵.

CHAPTER 2

Wigner Matrices and Their Generalizations Consider 𝑁 × 𝑁 square matrices of the form

(2.1)

𝐻 = 𝐻 (𝑁)

ℎ ℎ12 … ℎ1𝑁 ⎛ 11 ⎞ ℎ21 ℎ22 ⋯ ℎ2𝑁 ⎟ =⎜ ⋮ ⋱ ⋮ ⎟ ⎜ ⋮ ⎝ℎ𝑁1 ℎ𝑁2 ⋯ ℎ𝑁𝑁 ⎠

with centered entries (2.2)

𝔼 ℎ𝑖𝑗 = 0,

𝑖, 𝑗 = 1, … , 𝑁.

The random variables ℎ𝑖𝑗 for 𝑖, 𝑗 = 1, … , 𝑁, are real or complex random variables subject to the symmetry constraint ℎ𝑖𝑗 = ℎ𝑗𝑖 so that 𝐻 is either Hermitian (complex) or symmetric (real). Let 𝑆 = 𝑆 (𝑁) denote the matrix of variances

(2.3)

𝑠 𝑠12 ⎛ 11 𝑠21 𝑠22 𝑆 ∶= ⎜ ⋮ ⎜ ⋮ 𝑠 𝑠 ⎝ 𝑁1 𝑁2

… 𝑠1𝑁 ⎞ … 𝑠2𝑁 ⎟, ⋱ ⋮ ⎟ … 𝑠𝑁𝑁 ⎠

𝑠𝑖𝑗 ∶= 𝔼|ℎ𝑖𝑗 |2 .

We assume that 𝑁

∑ 𝑠𝑖𝑗 = 1

(2.4)

𝑗=1

for every 𝑖 = 1, … , 𝑁; i.e., the deterministic 𝑁 ×𝑁 matrix of variances, 𝑆 = (𝑠𝑖𝑗 ), is symmetric and doubly stochastic. The size of the random matrix 𝑁 is the fundamental limiting parameter throughout this book. Most results become sharp in the large-𝑁 limit. Many quantities, such as the distribution of 𝐻 and the matrix 𝑆, naturally depend on 𝑁 but, for notational simplicity, we will often omit this dependence from the notation. The simplest case, when 1 , 𝑖, 𝑗 = 1, … , 𝑁, 𝑁 is called the Wigner matrix ensemble. We summarize this setup in the following definition. 𝑠𝑖𝑗 =

Definition 2.1. An 𝑁 × 𝑁 symmetric or Hermitian random matrix (2.1) is called a universal Wigner matrix (ensemble) if the entries are centered (2.2), 7

8

2. WIGNER MATRICES AND THEIR GENERALIZATIONS

their variances 𝑠𝑖𝑗 = 𝔼|ℎ𝑖𝑗 |2 satisfy ∑ 𝑠𝑖𝑗 = 1,

(2.5)

𝑖 = 1, … , 𝑁,

𝑗

and {ℎ𝑖𝑗 ∶ 𝑖 ≤ 𝑗} are independent. An important subclass of the universal Wigner ensembles is called generalized Wigner matrices (ensembles) if, additionally, the variances are comparable; i.e., (2.6)

0 < 𝐶inf ≤ 𝑁𝑠𝑖𝑗 ≤ 𝐶sup < ∞,

𝑖, 𝑗 = 1, … , 𝑁,

holds with some fixed positive constants 𝐶inf , 𝐶sup . For Hermitian ensembles, we additionally require that for each 𝑖, 𝑗 the 2 × 2 covariance matrix is bounded by 𝐶/𝑁 in the matrix sense, i.e., Σ𝑖𝑗 ∶= (

𝔼(ℜℎ𝑖𝑗 )2 𝔼(ℜℎ𝑖𝑗 )(ℑℎ𝑖𝑗 ) 𝐶 )≥ . 𝑁 𝔼(ℜℎ𝑖𝑗 )(ℑℎ𝑖𝑗 ) 𝔼(ℑℎ𝑖𝑗 )2

1 In the special case 𝑠𝑖𝑗 = 1/𝑁 and Σ𝑖𝑗 = 2𝑁 𝐼2×2 , we recover the original definition of the Wigner matrix or Wigner ensemble [138].

In this book we will focus on the generalized Wigner matrices. These are mean field models in the sense that the typical size of the matrix elements ℎ𝑖𝑗 are comparable; see (2.6). In Section 18.7 we will discuss models beyond the mean field regime. The most prominent Wigner ensembles are the Gaussian orthogonal ensemble (GOE) and the Gaussian unitary ensemble (GUE), i.e., real symmetric and complex Hermitian Wigner matrices with rescaled matrix elements √𝑁ℎ𝑖𝑗 being standard Gaussian variables. More precisely, in the Hermitian case, √𝑁ℎ𝑖𝑗 is a standard complex Gaussian variable; i.e., 2

𝔼|√𝑁ℎ𝑖𝑗 | = 1 with the real and imaginary part independent and having the same variance when 𝑖 ≠ 𝑗. Furthermore, √𝑁ℎ𝑖𝑖 is a real standard Gaussian variable with variance 1. For the real symmetric case we require that 2 2 1 𝔼|√𝑁ℎ𝑖𝑗 | = 1 = 𝔼|√𝑁ℎ𝑖𝑖 | 2

for any 𝑖 ≠ 𝑗. For simplicity of the presentation, in the case of the Wigner ensembles we will assume that the variances of ℎ𝑖𝑗 for 𝑖 < 𝑗 are not only identical but also identically distributed. In this case we fix a distribution 𝜈 and we assume that the rescaled matrix elements √𝑁ℎ𝑖𝑗 are distributed according to 𝜈. Depending on the symmetry type, the diagonal elements may have a slightly different distribution, but we will omit this subtlety from the discussion. The distribution 𝜈 will be called the single-entry distribution of 𝐻. In the case of generalized

2. WIGNER MATRICES AND THEIR GENERALIZATIONS

9

−1/2 Wigner matrices, we assume that the rescaled matrix elements 𝑠𝑖𝑗 ℎ𝑖𝑗 with unit variance all have law 𝜈. We will need some decay property of 𝜈 and we will consider two decay types. Either we assume that 𝜈 has subexponential decay, i.e., that there are constants 𝐶, 𝜗 > 0 such that for any 𝑠 > 0

(2.7)

∫ 1(|𝑥| ≥ 𝑠)d𝜈(𝑥) ≤ 𝐶 exp(−𝑠𝜗 ); ℝ

or we assume that 𝜈 has a polynomial decay with arbitrary high power, i.e., (2.8)

∫ 1(|𝑥| ≥ 𝑠)d𝜈(𝑥) ≤ 𝐶𝑝 (1 + 𝑠)−𝑝 ℝ

for all 𝑠 > 0 and any 𝑝 ∈ ℕ. Another class of random matrices, which even predate Wigner, are the random sample covariance matrices. These are matrices of the form (2.9)

𝐻 = 𝑋∗𝑋

where 𝑋 is a rectangular 𝑀 × 𝑁 matrix with centered i.i.d. entries with variance 𝔼|𝑋𝑖𝑗 |2 = 𝑀 −1 . Note that the matrix elements of 𝐻 are not independent, but they are generated from the independent matrix elements of 𝑋 in a straightforward way. These matrices appear in statistical samples and were first considered by Wishart [139]. In the case when 𝑋𝑖𝑗 are centered Gaussian, the random covariance matrices are called Wishart matrices or ensembles.

CHAPTER 3

Eigenvalue Density 3.1. Wigner Semicircle Law and Other Canonical Densities For a real symmetric or complex Hermitian Wigner matrix 𝐻, let 𝜆1 ≤ 𝜆2 ≤ ⋯ ≤ 𝜆𝑁 denote the eigenvalues. The empirical distribution of eigenvalues follows a universal pattern, the Wigner semicircle law. To formulate it more precisely, note that the typical spacing between neighboring eigenvalues is of order 1/𝑁. So, in a fixed interval [𝑎, 𝑏] ⊂ ℝ one expects macroscopically many (of order 𝑁) eigenvalues. More precisely, it can be shown (first proof was given by Wigner [138]) that for any fixed 𝑎 ≤ 𝑏 real numbers, 𝑏

(3.1)

1 lim #{𝑖 ∶ 𝜆𝑖 ∈ [𝑎, 𝑏]} = ∫ 𝜚sc (𝑥)d𝑥, 𝑁→∞ 𝑁 𝑎 𝜚sc (𝑥) ∶=

1 (4 − 𝑥 2 )+ , 2𝜋 √

where (𝑎)+ ∶= max{𝑎, 0} denotes the positive part of the number 𝑎. Note the emergence of the universal density, the semicircle law, that is independent of the details of the distribution of the matrix elements. The semicircle law holds beyond Wigner matrices: it characterizes the eigenvalue density of the universal Wigner matrices (see Definition 2.1). For the random covariance matrices (2.9), the empirical density of eigenvalues 𝜆𝑖 of 𝐻 converges to the Marchenko-Pastur law [104] in the limit when 𝑀, 𝑁 → ∞ such that 𝑑 = 𝑁/𝑀 is fixed 0 < 𝑑 ≤ 1: 𝑏

1 #{𝑖 ∶ 𝜆𝑖 ∈ [𝑎, 𝑏]} = ∫ 𝜚MP (𝑥)d𝑥, 𝑁→∞ 𝑁 𝑎 lim

(3.2)

𝜚MP (𝑥) ∶=

[(𝜆+ − 𝑥)(𝑥 − 𝜆− )]+ 1 , √ 2𝑢𝜋𝑑 𝑥2

with 𝜆± ∶= (1 ± √𝑑)2 being the spectral edges. Note that in case 𝑀 ≤ 𝑁, the matrix 𝐻 has macroscopically many zero eigenvalues; otherwise, the spectra of 𝑋𝑋 ∗ and 𝑋 ∗ 𝑋 coincide so the Marchenko-Pastur law can be applied to all nonzero eigenvalues with the role of 𝑀 and 𝑁 exchanged. 11

12

3. EIGENVALUE DENSITY

3.2. The Moment Method The eigenvalue density is commonly approached via the fairly robust moment method (see [8] for an exposé) that was also the original approach of Wigner to prove the semicircle law [138]. The following proof is formulated for Wigner matrices with identically distributed entries, in particular with variances 𝑠𝑖𝑗 = 𝑁 −1 , but a slight modification of the argument also applies to more general ensembles satisfying (2.5). The simplest moment is the second moment, Tr 𝐻 2 : 𝑁

𝑁

∑ 𝜆2𝑖 = Tr 𝐻 2 = ∑ |ℎ𝑖𝑗 |2 . 𝑖=1

𝑖,𝑗=1

Taking expectation and using (2.5) we have 1 1 ∑ 𝔼 𝜆2𝑖 = ∑ 𝑠𝑖𝑗 = 1. 𝑁 𝑖 𝑁 𝑖𝑗 In general, we can compute traces of all even powers of 𝐻, i.e., 𝔼 Tr 𝐻 2𝑘 , by expanding the product as (3.3)

𝔼 ∑ ℎ𝑖1 𝑖2 ℎ𝑖2 𝑖3 ⋯ ℎ𝑖2𝑘 𝑖1 . 𝑖1 ,…,𝑖2𝑘

Since the expectation of each matrix element vanishes, each factor ℎ𝑥𝑦 must be paired with at least another copy ℎ𝑦𝑥 = ℎ𝑥𝑦 ; otherwise the expectation value is zero. In general, there is no need that the sequence form a perfect pairing. If we identify ℎ𝑦𝑥 with ℎ𝑥𝑦 , the only restriction is that the matrix element ℎ𝑥𝑦 should appear at least twice in the sequence. Under this condition, the main contribution comes from the index sequences that satisfy the perfect pairing condition such that each ℎ𝑥𝑦 is paired with a unique ℎ𝑦𝑥 in (3.3). These sequences can be classified according to their complexity, and it turns out that the main contribution comes from the so-called backtracking sequences. An index sequence 𝑖1 𝑖2 𝑖3 ⋯ 𝑖2𝑘 𝑖1 returning to the original index 𝑖1 is called backtracking if it can be successively generated by a substitution rule (3.4)

𝑎 → 𝑎𝑏𝑎,

𝑏 ∈ {1, … , 𝑁}, 𝑏 ≠ 𝑎,

with an arbitrary index 𝑏. For example, we represent the term (3.5)

ℎ𝑖1 𝑖2 ℎ𝑖2 𝑖3 ℎ𝑖3 𝑖2 ℎ𝑖2 𝑖4 ℎ𝑖4 𝑖5 ℎ𝑖5 𝑖4 ℎ𝑖4 𝑖2 ℎ𝑖2 𝑖1 ,

𝑖1 , … , 𝑖5 ∈ {1, … , 𝑁},

in the expansion of Tr 𝐻 8 (𝑘 = 4) by a walk of length 2 × 4 on the set {1, … , 𝑁}. This path is generated by the operation (3.4) in the following order: 𝑖1 → 𝑖1 𝑖2 𝑖1 → 𝑖1 𝑖2 𝑖3 𝑖2 𝑖1 → 𝑖1 𝑖2 𝑖3 𝑖2 𝑖4 𝑖2 𝑖1 → 𝑖1 𝑖2 𝑖3 𝑖2 𝑖4 𝑖5 𝑖4 𝑖2 𝑖1 with 𝑖1 ≠ 𝑖2 , 𝑖2 ≠ 𝑖3 , 𝑖3 ≠ 𝑖4 , 𝑖4 ≠ 𝑖5 . It may happen that two nonsuccessive labels coincide (e.g., 𝑖1 = 𝑖3 ), but the contribution of such terms is by a factor

3.2. THE MOMENT METHOD

13

1/𝑁 less than the leading term so we can neglect them. Thus, we assume that all labels 𝑖1 , 𝑖2 , 𝑖3 , 𝑖4 , 𝑖5 are distinct. We may bookkeep these paths by two objects: (a) a graph on vertices labeled by 1, 2, 3, 4, 5 (i.e., by the indices of the labels 𝑖𝑗 ) and with edges defined by walking over the vertices in following order 1, 2, 3, 2, 4, 5, 4, 2, 1, i.e., the order of the indices of the labels in (3.5) (b) an assignment of a distinct label 𝑖𝑗 , 𝑗 = 1, 2, 3, 4, 5, to each vertex. It is easy to see that the graph generated by backtracking sequences is a (double-edged) tree on the vertices 1, 2, 3, 4, 5. Now we count the combinatorics of the objects (a) and (b) separately. We start with (a). Lemma 3.1. The number of graphs with 𝑘 double-edges, derived from back1 tracking sequences, is explicitly given by the Catalan numbers, 𝐶𝑘 ∶= 𝑘+1 (2𝑘). 𝑘

Proof. There is a one-to-one correspondence between backtracking sequences and the number of nonnegative one-dimensional random walks of length 2𝑘 returning to the origin at the time 2𝑘. This is defined by the simple rule that a forward step 𝑎 → 𝑏 in the substitution rule (3.4) will be represented by a step to the right in the one-dimensional random walk while the backtracking step 𝑏 → 𝑎 in (3.4) is step to the left. Since at any moment the number of backtracking steps cannot be larger than the number of forward steps, the walk remains on the nonnegative semi-axis. From this interpretation, it is easy to show that the recursive relation 𝑛−1

𝐶𝑛 = ∑ 𝐶𝑘 𝐶𝑛−𝑘−1 ,

𝐶0 = 𝐶1 = 1,

𝑘=0

holds for the number 𝐶𝑛 of the backtracking sequences of total length 2𝑛. Elementary combinatorics shows that the solution to this recursion is given by the Catalan numbers, which proves the lemma. We remark that 𝐶𝑛 has many other definitions. Alternatively, it is also the number of planar binary trees with 𝑛 + 1 vertices and one could also use this definition to verify the lemma. □ The combinatorics of label assignments in step b) for backtracking paths of length 2𝑘 is 𝑁(𝑁 − 1) … (𝑁 − 𝑘) ≈ 𝑁 𝑘+1 for any fixed 𝑘 in the 𝑁 → ∞ limit. It is easy to see that among all paths of length 2𝑘, the backtracking paths support the maximal number (𝑘 + 1)-independent indices. The combinatorics for all non-backtracking paths is at most 𝐶𝑁 𝑘 ; i.e., by a factor 1/𝑁 less, thus they are negligible. Hence 𝔼 Tr 𝐻 2𝑘 can be computed fairly precisely for each finite 𝑘: (3.6)

1 1 2𝑘 𝔼 Tr 𝐻 2𝑘 = ( ) + 𝑂𝑘 (𝑁 −1 ). 𝑁 𝑘+1 𝑘

Note that the number of independent labels, 𝑁 𝑘+1 after dividing by 𝑁, exactly cancels the size of the 𝑘-fold product of variances, (𝔼|ℎ|2 )𝑘 = 𝑁 −𝑘 .

14

3. EIGENVALUE DENSITY

The expectation of the traces of odd powers of 𝐻 is negligible since they can never satisfy the pairing condition. One can easily check that (3.7)

∫ 𝑥 2𝑘 𝜌sc (𝑥)d𝑥 = 𝐶𝑘 ; ℝ

i.e., the semicircle law is identified as the probability measure on ℝ whose even moments are the Catalan numbers and the odd moments vanish. This proves that (3.8)

1 𝔼 Tr 𝑃(𝐻) → ∫ 𝑃(𝑥)𝜚sc (𝑥)d𝑥 𝑁 ℝ

for any polynomial 𝑃. With standard approximation arguments, the polynomials can be replaced with any continuous functions with compact support and even with characteristic functions of intervals. This proves the semicircle law (3.1) for the Wigner matrices in the “expectation sense.” To rigorously complete the proof of the semicircle law (3.1) without expectation, we also need to control the variance of Tr 𝑃(𝐻) in the left-hand side of (3.8). Although this can also be done by the moment method, we will not get into further details here; see [8] for a detailed proof. Finally, we also remark that there are index sequences in (3.3) that require computing higher than the second moment of ℎ. While the combinatorics of these terms is negligible, the proof above assumed polynomial decay (2.8), i.e., that all rescaled moments of ℎ are finite, 𝔼|ℎ𝑖𝑗 |𝑚 ≤ 𝐶𝑚 𝑁 −𝑚/2 . This condition can be relaxed by a cutoff argument, which we skip here. The interested reader should consult [8] for more details. 3.3. The Resolvent Method and the Stieltjes Transform An alternative approach to the eigenvalue density is via the Stieltjes transform. The empirical measure of the eigenvalues is defined by 𝑁

1 ∑ 𝛿(𝑥 − 𝜆𝑗 )d𝑥. 𝜚𝑁 (d𝑥) ∶= 𝑁 𝑗=1 The Stieltjes transform of the empirical measure is defined by 𝑁

(3.9)

d𝜚 (𝑥) 1 1 1 1 ∑ 𝑚(𝑧) = 𝑚𝑁 (𝑧) ∶= Tr = =∫ 𝑁 𝑁 𝐻−𝑧 𝑁 𝑗=1 𝜆𝑗 − 𝑧 𝑥−𝑧 ℝ

for any 𝑧 = 𝐸 + 𝑖𝜂, 𝐸 ∈ ℝ, 𝜂 > 0. Notice that 𝑚 is simply the normalized trace of the resolvent of the random matrix 𝐻 with spectral parameter 𝑧. The real part 𝐸 = Re 𝑧 will often be referred to as the “energy,” alluding to the quantum mechanical interpretation of the spectrum of 𝐻. An important property of the Stieltjes transform of any measure on ℝ is that its imaginary part is positive whenever Im 𝑧 > 0.

3.3. THE RESOLVENT METHOD AND THE STIELTJES TRANSFORM

15

For large 𝑧 one can expand 𝑚𝑁 as follows: ∞

(3.10)

𝑚𝑁 (𝑧) =

𝑚

1 1 1 𝐻 ∑( ) ; Tr =− 𝑁 𝐻−𝑧 𝑁𝑧 𝑚=0 𝑧

so after taking the expectation, using (3.6), and neglecting the error terms, we get ∞

(3.11)

1 2𝑘 ( ) 𝑧−(2𝑘+1) , 𝑘+1 𝑘 𝑘=0

𝔼 𝑚𝑁 (𝑧) ≈ − ∑

which, after some calculus, can be identified as the Laurent series of 12 (−𝑧 + √𝑧2 − 4). The approximation becomes exact in the 𝑁 → ∞ limit. Although the expansion (3.10) is valid only for large 𝑧, given that the limit is an analytic function of 𝑧 one can extend the relation 1 (3.12) lim 𝔼𝑚𝑁 (𝑧) = (−𝑧 + √𝑧2 − 4) 2 𝑁→∞ by analytic continuation to the whole upper half-plane 𝑧 = 𝐸+𝑖𝜂, 𝜂 > 0. It is an easy exercise to see that this is exactly the Stieltjes transform of the semicircle density, i.e., (3.13)

𝜚 (𝑥)d𝑥 1 𝑚sc (𝑧) ∶= (−𝑧 + √𝑧2 − 4) = ∫ sc . 2 𝑥−𝑧 ℝ

The square root function is chosen with a branch cut in the segment [−2, 2] so that √𝑧2 − 4 ≍ 𝑧 at infinity. This guarantees that Im 𝑚sc (𝑧) > 0 for Im 𝑧 > 0. Since the Stieltjes transform identifies the measure uniquely, and pointwise convergence of Stieltjes transforms implies weak convergence of measures, we obtain (3.14)

𝔼 𝜚𝑁 (d𝑥) ⇀ 𝜚sc (𝑥)d𝑥.

The relation (3.12) actually holds with high probability; i.e., for any 𝑧 with Im 𝑧 > 0 1 (3.15) lim 𝑚𝑁 (𝑧) = (−𝑧 + √𝑧2 − 4) 2 𝑁→∞ in probability. In the next sections we will prove this limit with an effective error term via the resolvent method. The semicircle law can be identified in many different ways. The moment method in Section 3.2 utilized the fact that the moments of the semicircle density are given by the Catalan numbers (3.7), which also emerged as the normalized traces of powers of 𝐻; see (3.6). The resolvent method relies on the fact that 𝑚𝑁 approximately satisfies a self-consistent equation, 𝑚𝑁 ≈ −(𝑧 +𝑚𝑁 )−1 , that is very close to the quadratic equation that 𝑚sc from (3.13) satisfies: 1 𝑚sc (𝑧) = − . 𝑧 + 𝑚sc (𝑧)

16

3. EIGENVALUE DENSITY

In other words, in the resolvent method the semicircle density emerges via a specific relation for its Stieltjes transform. It turns out that this approach allows us to perform a much more precise analysis, especially in the short-scale regime where Im 𝑧 approaches to 0 as a function of 𝑁. Since the Stieltjes transform of a measure at spectral parameter 𝑧 = 𝐸 + 𝑖𝜂 essentially identifies the measure around 𝐸 on scale 𝜂 > 0, a precise understanding of 𝑚𝑁 (𝑧) for small Im 𝑧 will yield a local version of the semicircle law. This will be explained in Chapter 6.

CHAPTER 4

Invariant Ensembles 4.1. Joint Density of Eigenvalues for Invariant Ensembles There is another natural way to define probability distributions on real symmetric or complex Hermitian matrices apart from directly imposing a given probability law 𝜈 on their entries. They are obtained by defining a density function directly on the set of matrices (4.1)

ℙ(𝐻)d𝐻 ∶= 𝑍 −1 exp (−𝑁 Tr 𝑉(𝐻))d𝐻.

Here d𝐻 = ∏𝑖≤𝑗 d𝐻𝑖𝑗 is the flat Lebesgue measure (in the case of complex Hermitian matrices and 𝑖 < 𝑗, d𝐻𝑖𝑗 is the Lebesgue measure on the complex plane ℂ). The function 𝑉 ∶ ℝ → ℝ is assumed to grow mildly at infinity (some logarithmic growth would suffice) to ensure that the measure defined in (4.2) is finite and 𝑍 is the normalization factor. Probability distributions of the form (4.1) are called invariant ensembles since they are invariant under orthogonal or unitary conjugation (in the case of symmetric or Hermitian matrices, respectively). For example, in the Hermitian case, for any fixed unitary matrix 𝑈 the transformation 𝐻 → 𝑈 ∗ 𝐻𝑈 leaves the distribution (4.1) invariant thanks to Tr 𝑉(𝑈 ∗ 𝐻𝑈) = Tr 𝑉(𝐻). Wigner matrices and invariant ensembles form two different universes with quite different mathematical tools available for their studies. In fact, these two classes are almost disjoint, the Gaussian ensembles being the only invariant Wigner matrices. This is the content of the following lemma. Lemma 4.1 ([37] or theorem 2.6.3 [106]). Suppose that the real symmetric or complex Hermitian matrix ensembles given in (4.1) have independent entries ℎ𝑖𝑗 , 𝑖 ≤ 𝑗. Then 𝑉(𝑥) is a quadratic polynomial, 𝑉(𝑥) = 𝑎𝑥 2 +𝑏𝑥+𝑐 with 𝑎 > 0. This means that apart from a trivial shift and normalization, the ensemble is GOE or GUE. For ensembles that remain invariant under the transformation 𝐻 → 𝑈 ∗ 𝐻𝑈 for any unitary matrix 𝑈 (or, in the case of symmetric matrices 𝐻, for any orthogonal matrix 𝑈), the joint probability density function of all the 𝑁 eigenvalues can be explicitly computed. These ensembles are typically given by (4.2)

𝛽 ℙ(𝐻)d𝐻 = 𝑍 −1 exp (− 𝑁 Tr 𝑉(𝐻))d𝐻, 2 17

18

4. INVARIANT ENSEMBLES

which is of the form (4.1) with a traditional extra factor 𝛽/2 that makes some later formulas nicer. The parameter 𝛽 is determined by the symmetry type: 𝛽 = 1 for real symmetric ensembles and 𝛽 = 2 for complex Hermitian ensembles. The joint (symmetrized) probability density of the eigenvalues of 𝐻 can be computed explicitly: (4.3)

𝑝𝑁 (𝜆1 , … , 𝜆𝑁 ) = const ∏(𝜆𝑖 − 𝜆𝑗 )𝛽 𝑒

𝑁

𝛽 2

− 𝑁 ∑𝑗=1 𝑉(𝜆𝑗 )

.

𝑖 0. This will imply the rigidity of eigenvalues; i.e., the eigenvalues are near their classical locations in the sense to be made clear in Theorem 11.5. We also obtain precise estimates on the matrix elements of the Green function which, in particular, imply complete delocalization of eigenvectors. Step 2. Universality for Gaussian divisible ensembles: The Gaussian divisible ensembles are matrices of the form 𝐻𝑡 = 𝑒−𝑡/2 𝐻0 + √1 − 𝑒−𝑡 𝐻 G where 𝑡 > 0 is a parameter, 𝐻0 is a (generalized) Wigner matrix, and 𝐻 G is an independent GOE matrix. The parametrization of 𝐻𝑡 is chosen so that 𝐻𝑡 can be obtained by an Ornstein-Uhlenbeck process starting from 𝐻0 . More precisely, consider the following matrix Ornstein-Uhlenbeck process 1 1 (5.7) d𝐻𝑡 = dB𝑡 − 𝐻𝑡 d𝑡 2 √𝑁 with initial data 𝐻0 where B𝑡 = {𝑏𝑖𝑗,𝑡 }𝑁 𝑖,𝑗=1 is a symmetric 𝑁 × 𝑁 matrix such that its matrix elements 𝑏𝑖𝑗,𝑡 for 𝑖 < 𝑗 and 𝑏𝑖𝑖,𝑡 /√2 are independent, standard

32

5. UNIVERSALITY FOR GENERALIZED WIGNER MATRICES

Brownian motions starting from zero. Then 𝐻𝑡 and 𝑒−𝑡/2 𝐻0 + √1 − 𝑒−𝑡 𝐻 G have the same distribution. The aim of Step 2 is to prove the bulk universality of 𝐻𝑡 for 𝑡 = 𝑁 −𝜏 for the entire range of 0 < 𝜏 < 1. This is connected to the local ergodicity of the Dyson Brownian motion, which we now define. Definition 5.2. Given a real parameter 𝛽 ≥ 1, consider the following system of stochastic differential equations (SDE) : (5.8)

d𝜆𝑖 =

√2 √𝛽𝑁

d𝐵𝑖 + (−

𝜆𝑖 1 1 + ∑ )d𝑡, 2 𝑁 𝑗≠𝑖 𝜆𝑖 − 𝜆𝑗

𝑖 ∈ J1, 𝑁K,

where (𝐵𝑖 ) is a collection of real-valued, independent, standard Brownian motions. The solution of this SDE is called the Dyson Brownian motion (DBM). In a seminal paper [45], Dyson observed that the eigenvalue flow of the matrix OU process is exactly the DBM with 𝛽 = 1, 2 corresponding to real symmetric or complex Hermitian ensembles. Furthermore, the invariant measure of the DBM is given by the Gaussian 𝛽-ensemble defined in (4.4). Dyson further conjectured that the time to “local equilibrium” for DBM is of order 1/𝑁, while the time to global equilibrium is of order one. It should be noted that there is no standard notion for the “local equilibrium”; we will instead take a practical point of view to interpret Dyson’s conjecture as that the local statistics of the DBM at any time 𝑡 ≫ 𝑁 −1 satisfy the universality defined earlier in this section. In other words, Dyson’s conjecture is exactly that the local statistics of 𝐻𝑡 are universal for 𝑡 = 𝑁 −𝜏 for any 0 < 𝜏 < 1. Step 3. Approximation by a Gaussian divisible ensemble. This is a simple density argument in the space of matrix ensembles which shows that for any probability distribution of the matrix elements there exists a Gaussian divisible distribution with a small Gaussian component, as in Step 2, such that the two associated Wigner ensembles have asymptotically identical local eigenvalue statistics. The general result to compare any two matrix ensembles with matching moments will be given in Theorem 16.1. Alternatively, to follow the evolution of the Green function under the OU flow, we can use the following continuity of the matrix OU process. Step 3a. Continuity of eigenvalues under the matrix OU process. In Theorem 15.2 we will show that the changes of the local statistics in the bulk under the flow (5.7) up to time scales 𝑡 ≪ 𝑁 −1/2 are negligible; see Lemma 15.4. This clearly can be used in combination with Step 2 to complete the proof of Theorem 5.1. The three-step strategy outlined here is very general, and it has been applied to many different models as we will explain in Chapter 18. It can also be extended to the edges of the spectrum, yielding the universality at the spectral edges. This will be reviewed in Chapter 17.

CHAPTER 6

Local Semicircle Law for Universal Wigner Matrices 6.1. Setup We recall the definition of the universal Wigner matrices (Definition 2.1); in particular, the matrix elements may have different distributions but independence (up to symmetry) is always assumed. The fundamental data of this model is the 𝑁 × 𝑁 matrix of variances 𝑆 = (𝑠𝑖𝑗 ) where 𝑠𝑖𝑗 ∶= 𝔼 |ℎ𝑖𝑗 |2 , and we assume that 𝑆 is (doubly) stochastic: (6.1)

∑ 𝑠𝑖𝑗 = 1 𝑗

for all 𝑖. We will assume the polynomial decay analogous to (5.6) (6.2)

𝔼|ℎ𝑖𝑗 |𝑝 ≤ 𝜇𝑝 [𝑠𝑖𝑗 ]𝑝/2

where 𝜇𝑝 depends only on 𝑝 and is uniform in 𝑖, 𝑗, and 𝑁. We introduce the parameter 𝑀 ∶= [max𝑖,𝑗 𝑠𝑖𝑗 ]−1 that expresses the maximal size of 𝑠𝑖𝑗 : (6.3)

𝑠𝑖𝑗 ≤ 𝑀 −1

for all 𝑖 and 𝑗. We regard 𝑁 as the fundamental parameter and 𝑀 = 𝑀𝑁 as a function of 𝑁. In this section we do not assume a lower bound on 𝑠𝑖𝑗 . Notice that for generalized Wigner matrices 𝑀 is comparable with 𝑁 and one may use 𝑁 everywhere instead of 𝑀 in all error bounds in this section. However, we wish to keep 𝑀 as a separate parameter and assume only that (6.4)

𝑁𝛿 ≤ 𝑀 ≤ 𝑁

for some fixed 𝛿 > 0. For standard Wigner matrices, ℎ𝑖𝑗 are identically distributed, hence 𝑠𝑖𝑗 = 𝑁1 and 𝑀 = 𝑁. Another motivating example where 𝑀 may substantially differ from 𝑁 is the random band matrices that play a key role interpolating between Wigner matrices and random Schrödinger operators (see Section 18.7). Example 6.1. Random band matrices are characterized by translation invariant variances of the form |𝑖 − 𝑗|𝑁 1 (6.5) 𝑠𝑖𝑗 = 𝑓( ), 𝑊 𝑊 33

34

6. LOCAL SEMICIRCLE LAW FOR UNIVERSAL WIGNER MATRICES

where 𝑓 is a smooth, symmetric probability density on ℝ, 𝑊 is a large parameter, called the band width, and |𝑖 − 𝑗|𝑁 denotes the periodic distance on the discrete torus 𝕋 of length 𝑁. In this case, 𝑀 is comparable with 𝑊, which is typically much smaller than 𝑁. The generalization in higher spatial dimensions is straightforward: in this case the rows and columns of 𝐻 are labeled by a discrete 𝑑-dimensional torus 𝕋𝑑𝐿 of length 𝐿 with 𝑁 = 𝐿𝑑 . Recall from (3.1) that the global eigenvalue density of 𝐻 in the 𝑁 → ∞ limit follows the celebrated Wigner semicircle law, 1 (4 − 𝑥 2 )+ , 2𝜋 √ and its Stieltjes transform with the spectral parameter 𝑧 = 𝐸 + i𝜂 is defined by (6.6)

(6.7)

𝜚sc (𝑥) ∶=

𝑚sc (𝑧) ∶= ∫ ℝ

𝜚sc (𝑥) d𝑥. 𝑥−𝑧

The spectral parameter 𝑧 will sometimes be omitted from the notation. The two endpoints ±2 of the support of 𝜚sc are called the spectral edges. It is well-known (see Section 3.3) that the Stieltjes transform 𝑚sc is the unique solution of the quadratic equation (6.8)

𝑚(𝑧) +

1 +𝑧 =0 𝑚(𝑧)

with Im 𝑚(𝑧) > 0 for Im 𝑧 > 0. Thus we have −𝑧 + √𝑧2 − 4 2 where the square root is chosen so that Im 𝑚sc (𝑧) > 0 for Im 𝑧 > 0. In particular, √𝑧2 − 4 ≍ 𝑧 for 𝑧 large so that it cancels the −𝑧-term in the above formula. Under this convention, we collect basic bounds on 𝑚sc in the following lemma. (6.9)

𝑚sc (𝑧) =

Lemma 6.2. We have for all 𝑧 = 𝐸 + 𝑖𝜂 with 𝜂 > 0 that (6.10)

|𝑚sc (𝑧)| = |𝑚sc (𝑧) + 𝑧|−1 ≤ 1.

Furthermore, there is a constant 𝑐 > 0 such that for 𝐸 ∈ [−10, 10] and 𝜂 ∈ (0, 10] we have (6.11)

𝑐 ≤ |𝑚sc (𝑧)| ≤ 1 − 𝑐𝜂,

(6.12)

2 (𝑧)| ≍ √𝜅 + 𝜂, |1 − 𝑚𝑠𝑐

as well as (6.13)

√𝜅 + 𝜂 Im 𝑚sc (𝑧) ≍ { 𝜂 √𝜅+𝜂

if |𝐸| ≤ 2, if |𝐸| ≥ 2,

where 𝜅 ∶= ||𝐸| − 2| denotes the distance of 𝐸 to the spectral edges. Proof. The proof is an elementary exercise using (6.9).



6.2. SPECTRAL INFORMATION ON 𝑺

35

We define the Green function or the resolvent of 𝐻 through 𝐺(𝑧) ∶= (𝐻 − 𝑧)−1 , and denote its entries by 𝐺𝑖𝑗 (𝑧). We recall from (3.9) that the Stieltjes transform of the empirical spectral measure 𝜚(d𝑥) = 𝜚𝑁 (d𝑥) ∶=

1 ∑ 𝛿(𝜆𝛼 − 𝑥)d𝑥 𝑁 𝛼

for the eigenvalues 𝜆1 ≤ ⋯ ≤ 𝜆𝑁 of 𝐻 is (6.14)

𝑚(𝑧) = 𝑚𝑁 (𝑧) ∶= ∫ ℝ

𝜚𝑁 (d𝑥) 1 1 1 = Tr 𝐺(𝑧) = ∑ . 𝑥−𝑧 𝑁 𝑁 𝛼 𝑧 − 𝜆𝛼

We remark that every quantity related to the random matrix 𝐻, such as the eigenvalues, Green function, empirical density of states, and its Stieltjes transform, all depend on 𝑁, but this dependence will often be omitted in the notation for brevity. In some formulas, especially in statements of the main results, we will put back the 𝑁 dependence to stress its presence. Since 𝜂 1 1 1 Im = 𝜃𝜂 (𝐸 − 𝜆𝛼 ) with 𝜃𝜂 (𝑥) ∶= 𝜋 𝑧 − 𝜆𝛼 𝜋 𝑥2 + 𝜂2 is an approximation to the identity (i.e., delta function) at the scale 𝜂 = Im 𝑧, we have 𝜋 −1 Im 𝑚𝑁 (𝑧) = 𝜚𝑁 ∗ 𝜃𝜂 (𝑧); i.e., the imaginary part of 𝑚𝑁 (𝑧) is the density of the eigenvalues “at the scale 𝜂.” Thus, the convergence of the Stieltjes transform 𝑚𝑁 (𝑧) to 𝑚sc (𝑧) as 𝑁 → ∞ will show that the empirical local density of the eigenvalues around the energy 𝐸 in a window of size 𝜂 converges to the semicircle law 𝜚sc (𝐸). Therefore, the key task is to control 𝑚𝑁 (𝑧) for small 𝜂. 6.2. Spectral Information on 𝑺 We will show that the diagonal matrix elements of the resolvent 𝐺𝑖𝑖 (𝑧) satisfy a system of self-consistent vector equations of the form 𝑁

(6.15)



1 ≈ 𝑧 + ∑ 𝑠𝑖𝑗 𝐺𝑗𝑗 (𝑧) 𝐺𝑖𝑖 (𝑧) 𝑗=1

with very high probability. This equation will be viewed as a small perturbation of the deterministic equation 𝑁

(6.16)

1 − = 𝑧 + ∑ 𝑠𝑖𝑗 𝑚𝑖 (𝑧), 𝑚𝑖 (𝑧) 𝑗=1

which, under the side condition that Im 𝑚𝑖 > 0, has a unique solution, namely 𝑚𝑖 (𝑧) = 𝑚sc (𝑧) for every 𝑖 (see (6.8)). Here the stochasticity condition (6.1) is essential. For the stability analysis of (6.16) the invertibility of the operator 2 (𝑧)𝑆 plays a key role. 1 − 𝑚sc

36

6. LOCAL SEMICIRCLE LAW FOR UNIVERSAL WIGNER MATRICES

Therefore an important parameter of the model is 1 ‖ ‖ (6.17) Γ(𝑧) ∶= ‖‖ , Im 𝑧 > 0. 2 1 − 𝑚sc (𝑧)𝑆 ‖‖∞→∞ Note that 𝑆, being a stochastic matrix, satisfies −1 ≤ 𝑆 ≤ 1, and 1 is an eigenvalue with eigenvector e = 𝑁 −1/2 (1, … , 1), 𝑆e = e. For convenience, we assume that 1 is a simple eigenvalue of 𝑆 (which holds if 𝑆 is irreducible and aperiodic). Another important parameter is 1 ‖ | ‖ ˜ | (6.18) Γ (𝑧) ∶= ‖‖ , 2 1 − 𝑚sc (𝑧)𝑆 |e⟂ ‖‖∞→∞ 2 i.e., the norm of (1 − 𝑚sc 𝑆)−1 restricted to the subspace orthogonal to the constants. Recalling that −1 ≤ 𝑆 ≤ 1 and using the upper bound |𝑚sc | ≤ 1 from (6.11), we find that there is a constant 𝑐 > 0 such that (6.19) 𝑐≤ ˜ Γ ≤ Γ. 2 Since |𝑚sc (𝑧)| ≤ 1 − 𝑐𝜂 , we can expand (1 − 𝑚𝑠𝑐 𝑆)−1 into a geometric series. Using that ‖𝑆‖ℓ∞ →ℓ∞ ≤ 1 from (6.1), we obtain the trivial upper bound

(6.20)

Γ ≤ 𝐶𝜂 −1 .

We remark that we also have the following easy lower bound on Γ: 1 2 −1 (6.21) Γ ≥ |1 − 𝑚sc | ≥ . 2 2 This can be seen by applying the matrix (1 − 𝑚sc 𝑆)−1 to the constant vector and using the definition of Γ. For standard Wigner matrices, 𝑠𝑖𝑗 = 𝑁1 , and for bounded energies |𝐸| ≤ 𝐶, we easily obtain that 1 1 ≍ (6.22) Γ(𝑧) = , ˜ Γ (𝑧) = 1, 2 |1 − 𝑚sc (𝑧)| √𝜅𝐸 + 𝜂 where (6.23)

𝜅𝐸 ∶= min{|𝐸 − 2|, |𝐸 + 2|}

is the distance of 𝐸 = Re 𝑧 from the spectral edges. To see this, we note that in this case 𝑆 = 𝑃e , the projection operator onto the direction e. Hence the 2 operator [1 − 𝑚sc (𝑧)𝑆]−1 can be computed explicitly. The comparison relation in (6.22) follows from (6.12). In the general case, we have the same relations up to a constant factor: Lemma 6.3. For generalized Wigner matrices (Definition 2.1) and a bounded range of energies, say for any |𝐸| ≤ 10, we have 𝐶 𝑐 𝐶 ≤ Γ(𝑧) ≤ (6.24) ≍ , 𝑐≤ ˜ Γ (𝑧) ≤ 𝐶, 2 2 |1 − 𝑚sc (𝑧)| |1 − 𝑚sc (𝑧)| √𝜅𝐸 + 𝜂 where the constant 𝐶 depends on 𝐶inf and 𝐶sup in (2.6) and the constant 𝑐 in the lower bound is universal. The proof will be given in Chapter 6.5.

6.3. STOCHASTIC DOMINATION

37

6.3. Stochastic Domination The following definition introduces a notion of a high-probability bound that is suited for our purposes. It first appeared in [55] and it relieves us from the burden of keeping track of exceptional sets of small probability where some bound does not hold. Definition 6.4 (Stochastic domination). Let 𝑋 = (𝑋 (𝑁) (𝑢) ∶ 𝑁 ∈ ℕ, 𝑢 ∈ 𝑈 (𝑁) ),

𝑌 = (𝑌 (𝑁) (𝑢) ∶ 𝑁 ∈ ℕ, 𝑢 ∈ 𝑈 (𝑁) ),

be two families of nonnegative random variables where 𝑈 (𝑁) is a possibly 𝑁dependent parameter set. We say that 𝑋 is stochastically dominated by 𝑌, uniformly in 𝑢, if for all (small) 𝜀 > 0 and (large) 𝐷 > 0 we have sup ℙ[𝑋 (𝑁) (𝑢) > 𝑁 𝜀 𝑌 (𝑁) (𝑢)] ≤ 𝑁 −𝐷 𝑢∈𝑈(𝑁)

for large enough 𝑁 ≥ 𝑁0 (𝜀, 𝐷). Unless stated otherwise, throughout this paper the stochastic domination will always be uniform in all parameters apart from the parameter 𝛿 in (6.4) and the sequence of constants 𝜇𝑝 in (5.6). Thus, 𝑁0 (𝜀, 𝐷) also depends on 𝛿 and 𝜇𝑝 . If 𝑋 is stochastically dominated by 𝑌, uniformly in 𝑢, we use the notation 𝑋 ≺ 𝑌. Moreover, if for some complex family 𝑋 we have |𝑋| ≺ 𝑌, we also write 𝑋 = 𝑂≺ (𝑌). The following proposition collects some basic properties of the stochastic domination. The proofs are left as an exercise. Proposition 6.5. The relation ≺ satisfies the following properties: (i) ≺ is transitive: 𝑋 ≺ 𝑌 and 𝑌 ≺ 𝑍 imply 𝑋 ≺ 𝑍. (ii) ≺ satisfies the familiar arithmetic rules of order relations; i.e., if 𝑋1 ≺ 𝑌1 and 𝑋2 ≺ 𝑌2 , then 𝑋1 + 𝑋2 ≺ 𝑌1 + 𝑌2 and 𝑋1 𝑋2 ≺ 𝑌1 𝑌2 . (iii) Moreover, the following cancellation property holds: (6.25)

if 𝑋 ≺ 𝑌 + 𝑁 −𝜀 𝑋 for some 𝜀 > 0, then 𝑋 ≺ 𝑌.

(iv) Furthermore, if 𝑋 ≺ 𝑌, 𝔼𝑌 ≥ 𝑁 −𝐶 , and |𝑋| ≤ 𝑁 𝐶 almost surely with some fixed exponent 𝐶, then for any 𝜀 > 0 and sufficiently large 𝑁 ≥ 𝑁0 (𝜀) we have (6.26)

𝔼𝑋 ≤ 𝑁 𝜀 𝔼𝑌.

Later in Lemma 10.1 the relation (6.26) will be extended to partial expectations. We now define appropriate subsets of the spectral parameter 𝑧. Definition 6.6 (Spectral domain). We call an 𝑁-dependent family 𝐃 ≡ 𝐃(𝑁) ⊂ {𝑧 ∶ |𝐸| ≤ 10, 𝑀 −1 ≤ 𝜂 ≤ 10} a spectral domain. (Recall that 𝑀 ≡ 𝑀𝑁 depends on 𝑁.) (𝑁)

We always consider families 𝑋 (𝑁) (𝑢) = 𝑋𝑖 (𝑧) indexed by 𝑢 = (𝑧, 𝑖) where 𝑧 takes on values in some spectral domain 𝐃, and 𝑖 takes on values in some finite (possibly 𝑁-dependent or empty) index set. The stochastic domination 𝑋 ≺ 𝑌

38

6. LOCAL SEMICIRCLE LAW FOR UNIVERSAL WIGNER MATRICES

of such families will always be uniform in 𝑧 and 𝑖, and we usually do not state this explicitly. Usually which spectral domain 𝐃 is meant will be clear from the context, in which case we shall not mention it explicitly. For example, using Chebyshev’s inequality and (6.2) one easily finds that ℎ𝑖𝑗 ≺ (𝑠𝑖𝑗 )1/2 ≺ 𝑀 −1/2

(6.27)

uniformly in 𝑖 and 𝑗, so that we may also write ℎ𝑖𝑗 = 𝑂≺ ((𝑠𝑖𝑗 )1/2 ). The definition of ≺ with the polynomial factors 𝑁 −𝜀 and 𝑁 −𝐷 are tailored for the assumption (6.2). We remark that if the analogous subexponential decay (2.7) is assumed then a stronger form of stochastic domination can be introduced, but we will not pursue this direction here. 6.4. Statement of the Local Semicircle Law The local semicircle law is very sensitive to the closeness of 𝜂 = Im 𝑧 to the real axis. When 𝜂 is too small the resolvent and its trace become strongly fluctuating. So our results will hold only above a certain threshold for 𝜂. We now define the lower threshold for 𝜂 that depends on the energy 𝐸 ∈ [−10, 10]

(6.28)

˜ 𝜂 𝐸 ∶= min{𝜂 ∶

𝑀 −2𝛾 1 𝑀 −𝛾 ≤ min{ , } ˜ 𝑀𝜉 Γ (𝐸 + i𝜉)3 ˜ Γ (𝐸 + i𝜉)4 Im 𝑚sc (𝐸 + i𝜉) holds for all 𝜉 ≥ 𝜂}.

Although this expression looks complicated, we shall see that it comes out naturally in the analysis of the self-consistent equations for the Green functions. Here 𝛾 > 0 is a parameter that can be chosen arbitrarily small; for all practical purposes, the reader can neglect it. For generalized Wigner matrices 𝑀 ≍ 𝑁, from (6.24) we have (6.29)

˜ 𝜂 𝐸 ≤ 𝐶𝑁 −1+2𝛾 ;

i.e., we will get the local semicircle law on the smallest possible scale 𝜂 ≫ 𝑁 −1 , modulo an 𝑀 𝛾 correction with an arbitrary, small exponent. We remark that if we assume subexponential decay (2.7) instead of the polynomial decay (6.2), then the small 𝑀 𝛾 correction can be replaced with a (log 𝑀)𝐶 factor. Finally, we define our fundamental control parameter (6.30)

Π(𝑧) ∶=

Im 𝑚sc (𝑧) 1 + . 𝑀𝜂 𝑀𝜂 √

We can now state the main result of this section, which in this full generality first appeared in [55]. Previous results that have cumulatively led to this general formulation will be summarized at the end of the section. Theorem 6.7 (Local semicircle law [55]). Consider a universal Wigner matrix satisfying the polynomial decay condition (6.2) and (6.3). Then, uniformly in

6.4. STATEMENT OF THE LOCAL SEMICIRCLE LAW

39

the energy |𝐸| ≤ 10 and 𝜂 ∈ [ ˜ 𝜂 𝐸 , 10], we have the bounds (6.31) max |𝐺𝑖𝑗 (𝑧) − 𝛿𝑖𝑗 𝑚sc (𝑧)| ≺ Π(𝑧) = 𝑖,𝑗

Im 𝑚sc (𝑧) 1 + , 𝑀𝜂 𝑀𝜂 √

𝑧 = 𝐸 + i𝜂,

as well as 1 . 𝑀𝜂 Moreover, outside of the spectrum we have the stronger estimate 1 1 (6.33) |𝑚𝑁 (𝑧) − 𝑚(𝑧)| ≺ + 𝑀(𝜅𝐸 + 𝜂) (𝑀𝜂)2 √𝜅𝐸 + 𝜂 (6.32)

|𝑚𝑁 (𝑧) − 𝑚sc (𝑧)| ≺

uniformly in 𝑧 ∈ {𝑧 ∶ 2 ≤ |𝐸| ≤ 10, 𝜂˜𝐸 ≤ 𝜂 ≤ 10, 𝑀𝜂 √𝜅𝐸 + 𝜂 ≥ 𝑀 𝛾 } for any fixed 𝛾 > 0 where 𝜅𝐸 ∶= ||𝐸| − 2|. For the generalized Wigner matrix, the threshold 𝜂˜𝐸 can be chosen 𝜂˜𝐸 = 𝑁 −1+2𝛾 . We point out two remarkable features of these bounds. The error term for the resolvent entries behaves essentially as (𝑀𝜂)−1/2 , with an improvement near the edges where Im 𝑚sc vanishes. The error bound for the Stieltjes transform, i.e., for the average of the diagonal resolvent entries, is one order better, (𝑀𝜂)−1 , but without improvement near the edge. The resolvent matrix element 𝐺𝑖𝑗 may be viewed as the scalar product ⟨e𝑖 , 𝐺e𝑗 ⟩ where e𝑖 is the 𝑖th coordinate vector. In fact, a more general version of (6.31), the isotropic local law, also holds for generalized Wigner matrices: Theorem 6.8 (Isotropic law [21]). For a generalized Wigner matrix with polynomial decay (5.6) and for any fixed unit vector 𝐯, 𝐰 we have (6.34)

|⟨𝐯, 𝐺(𝑧)𝐰⟩ − 𝑚sc (𝑧)⟨𝐯, 𝐰⟩| ≺

Im 𝑚sc (𝑧) 1 + 𝑁𝜂 𝑁𝜂 √

uniformly in the set {𝑧 = 𝐸 + 𝑖𝜂 ∶ |𝐸| ≤ 𝜔−1 , 𝑁 −1+𝜔 ≤ 𝜂 ≤ 𝜔−1 } for any fixed 𝜔 > 0. The isotropic law was first proven in [89] for Wigner matrices under a vanishing third moment condition. The general case in the form above was given in [21]. We will not prove this result here since it is not needed for the proof of Theorem 5.1. We will first prove a weaker version of Theorem 6.7 in Chapter 7, the socalled weak local semicircle law where the error term is not optimal. After that, we will prove Theorem 6.7 in Chapter 8 using Γ instead of ˜ Γ . This yields the same estimate as given in Theorem 6.7 but only on a smaller set of the spectral parameter for which the argument is somewhat simpler. The proof of Theorem 6.7 for the entire domain will only be sketched in Chapter 9, and we refer the reader to the original paper for the complete version. Chapter 7 is included mainly for pedagogical reasons to introduce the ideas of continuity argument and self-consistent equations. Chapter 8 demonstrates how to use the vector

40

6. LOCAL SEMICIRCLE LAW FOR UNIVERSAL WIGNER MATRICES

self-consistent equations in a simpler setup. In Chapter 9 we sketch the analysis of the same self-consistent equation but splitting it on space orthogonal to the constant vector and exploiting a spectral gap. Finally, Chapter 10 is devoted to a key technical lemma, the fluctuation averaging lemma. We close this section with a short summary of the main developments that have gradually led to the local semicircle law, Theorem 6.7, in its general form. In Section 8.2 we will demonstrate that all proofs of the local semicircle law rely on some version of a self-consistent equation. At the beginning this was a scalar equation for 𝑚𝑁 used in various forms by many authors; see, e.g., Pastur [110], Girko [77], and Bai [10]. The self-consistent vector equation for 𝑣𝑖 = 𝐺𝑖𝑖 − 𝑚sc (see Section 8.2.2) first appeared in [69]. This allowed one to deviate from the identical distributions for ℎ𝑖𝑗 and opened up the route to estimates on individual resolvent matrix elements. Finally, the self-consistent matrix equation for 𝔼|𝐺𝑥𝑦 |2 first appeared in [54], and it yielded the diffusion profile for the resolvent. In this book we only need the vector equation. The local semicircle law has three important features that have been gradually achieved. We explain them for the case when 𝑀 is comparable with 𝑁, the general case is only a technical extension. First, the local law in the bulk holds down to the scale expressed by the lower bound 𝜂 ≫ 𝑁 −1 . This is the smallest possible scale to control 𝑚(𝑧), since at scale 𝜂 ≲ 𝑁 −1 a few individual eigenvalues strongly influence its behavior, and 𝑚(𝑧) is not close to a deterministic quantity. This optimal scale in the bulk of the spectrum was first established for Wigner matrices in a series of papers [60–62] with the extension to generalized Wigner matrices in [69]. Second, the optimal speed of convergence in the entrywise bound (6.31) is 𝑁 −1/2 , while in the bound 𝑚 − 𝑚sc (6.32) it is 𝑁 −1 . These optimal 𝑁-dependences were first achieved in [68] by introducing the fluctuation averaging mechanism. Third, near the spectral edge the stability of the self-consistent equation deteriorates, manifested by the behavior of Γ(𝑧) in (6.24) when 𝜅𝐸 is small. This can be compensated by the fact that the density is small near the edge. After many attempts and weaker results in [64, 68, 69], the optimal form of this compensation was eventually found in [70], basically separating the analysis of the self-consistent equation onto the space orthogonal to constants. Since ˜ Γ does not deteriorate near the edge, see (6.24), the estimate becomes optimal. ˜ and the Proof of Lemma 6.3 6.5. Appendix: Behavior of 𝚪 and 𝚪 In this appendix we give basic bounds on the parameters Γ and ˜ Γ . Readers interested only in the Wigner case may skip this section entirely; if 𝑠𝑖𝑗 = 𝑁 −1 , then the explicit formulas in (6.22) already suffice. As it turns out, the behavior of Γ and ˜ Γ is intimately linked with the spectrum of 𝑆, more precisely with its spectral gaps. Recall that the spectrum of 𝑆 lies in [−1, 1], with 1 being a simple eigenvalue. Definition 6.9. Let 𝛿− be the distance from −1 to the spectrum of 𝑆 and 𝛿+ the distance from 1 to the spectrum of 𝑆 restricted to 𝐞⟂ . In other words, 𝛿±

˜ AND THE PROOF OF LEMMA 6.3 6.5. APPENDIX: BEHAVIOR OF 𝚪 AND 𝚪

41

are the largest numbers satisfying 𝑆 ≥ −1 + 𝛿− ,

𝑆|𝐞⟂ ≤ 1 − 𝛿+ .

For generalized Wigner matrices the lower and upper spectral gaps satisfy 𝛿± ≥ 𝑎 with a constant 𝑎 ∶= min{𝐶inf , 𝐶sup }; see (2.6). This simple fact follows easily by splitting 𝑆 = (𝑆 − 𝑎ee∗ ) + 𝑎ee∗ and noticing that the first term is (1−𝑎) times a doubly stochastic matrix; hence its spectrum lies in [−1 + 𝑎, 1 − 𝑎]. Proof of Lemma 6.3. The lower bound on Γ in (6.24) follows from (1 − 2 2 −1 𝑚sc 𝑆)−1 𝐞 = (1 − 𝑚sc ) 𝐞 combined with (6.12). For the upper bound, we first 2 notice that 1−𝑚sc 𝑆 is invertible since −1 ≤ 𝑆 ≤ 1 and |𝑚sc | < 1; see (6.11). Since 𝑚sc and its reciprocal are bounded (6.11), it is sufficient to bound the inverse −2 of 𝑚sc − 𝑆. Since the spectrum of 𝑆 lies in the set [−1 + 𝛿− , 1 − 𝛿+ ] ∪ {1} ⊂ −2 [−1 + 𝑎, 1 − 𝑎] ∪ {1}, and |𝑚sc | ≥ 1, we easily get

(6.35)

1 1 ‖ ‖ ‖ ‖ ‖‖ 1 − 𝑚2 𝑆 ‖‖ 2 2 ≤ 𝐶 ‖‖ 𝑚−2 − 𝑆 ‖‖ 2 2 sc sc ℓ →ℓ ℓ →ℓ 𝐶𝑎 𝐶 ≤ ≤ . −2 2 | min{𝑎, |𝑚sc − 1|} |1 − 𝑚sc

2 In order to find the ℓ∞ → ℓ∞ norm, we solve (1 − 𝑚sc 𝑆)𝐯 = 𝐮 directly using (6.10): 2 ‖𝐯‖∞ = ‖𝐮 + 𝑚sc 𝑆𝐯‖∞ ≤ ‖𝐮‖∞ + ‖𝑆‖ℓ1 →ℓ∞ ‖𝐯‖1

≤ ‖𝐮‖∞ + 𝑁 1/2 ‖𝑆‖ℓ1 →ℓ∞ ‖𝐯‖2 ‖ ≤ ‖𝐮‖∞ + 𝑁 1/2 ‖𝑆‖ℓ1 →ℓ∞ ‖ ≤ (1 +

1 ‖ ‖𝐮‖2 2 ‖ 1 − 𝑚sc 𝑆 ℓ2 →ℓ2

𝐶𝑎 𝑁‖𝑆‖ℓ1 →ℓ∞ )‖𝐮‖∞ . 2 | |1 − 𝑚sc

Here we used (6.35) and that ‖𝐯‖1 = ∑𝑖 |𝑣𝑖 | ≤ 𝑁 1/2 (∑𝑖 |𝑣𝑖 |2 )1/2 and ‖𝐮‖2 ≤ 𝑁 1/2 ‖𝐮‖∞ . Since for generalized Wigner matrices 𝑁‖𝑆‖ℓ1 →ℓ∞ = 𝑁 ⋅ max𝑖𝑗 𝑠𝑖𝑗 ≤ 𝐶sup from (2.6), this proves 1 𝐶 ‖ ‖ ‖‖ 1 − 𝑚2 𝑆 ‖‖ ∞ ∞ ≤ |1 − 𝑚2 | , sc sc ℓ →ℓ where the constant depends on 𝐶inf and 𝐶sup . This proves the upper bound on Γ in (6.24). Finally, we bound ˜ Γ . The lower bound was already given in (6.19). For the upper bound we follow the argument above, but we restrict 𝑆 to e⟂ . Since the spectrum of this restriction lies in [−1 + 𝑎, 1 − 𝑎], we immediately get the bound 2 𝑆)−1 |e⟂ in the right-hand side of (6.35). This can 𝐶/𝑎 for the ℓ2 -norm of (1 − 𝑚sc

42

6. LOCAL SEMICIRCLE LAW FOR UNIVERSAL WIGNER MATRICES

be lifted to the same estimate for the ℓ∞ -norm. This completes the proof of the lemma. □ This simple proof of Lemma 6.3 used both spectral gaps and that 𝑠𝑖𝑗 = 𝑂(𝑁 −1 ). Lacking this information in the general case, the following proposition gives explicit bounds on Γ and ˜ Γ depending on the spectral gaps 𝛿± in the general case. We recall the notation 𝑧 = 𝐸 + i𝜂 and 𝜅 = 𝜅𝐸 ∶= ||𝐸| − 2| and define 𝜂

𝜅+ √𝜅+𝜂 𝜃 ≡ 𝜃(𝑧) ∶= { √𝜅 + 𝜂

(6.36)

if |𝐸| ≤ 2, if |𝐸| > 2.

Proposition 6.10. For the matrix elements of 𝑆 we assume 0 ≤ 𝑠𝑖𝑗 = 𝑠𝑗𝑖 ≤ 𝑀 and ∑𝑗 𝑠𝑖𝑗 = 1. Then there is a universal constant 𝐶 such that the following holds uniformly in the domain {𝑧 = 𝐸 + 𝑖𝜂 ∶ |𝐸| ≤ 10, 𝑀 −1 ≤ 𝜂 ≤ 10} and, in particular, in any spectral domain 𝐃. −1

(i) We have the estimate (6.37)

𝐶 log 𝑁 𝐶 log 𝑁 1 ≤ Γ(𝑧) ≤ ≤ . 2 1±𝑚 sc | min{𝜂 + 𝐸 2 , 𝜃} 𝐶 √𝜅 + 𝜂 1 − max± || | 2

(ii) In the presence of a gap 𝛿− we may improve the upper bound to (6.38)

Γ(𝑧) ≤

𝐶 log 𝑁 . min{𝛿− + 𝜂 + 𝐸 2 , 𝜃}

(iii) For ˜ Γ we have the bounds (6.39)

𝐶 −1 ≤ ˜ Γ (𝑧) ≤

𝐶 log 𝑁 . min{𝛿− + 𝜂 + 𝐸 2 , 𝛿+ + 𝜃}

Proof. The first bound of (6.37) follows from 2 2 −1 (1 − 𝑚sc 𝑆)−1 𝐞 = (1 − 𝑚sc ) 𝐞

combined with (6.12). In order to prove the second bound of (6.37), we write 1 1 1 = 2𝑆 2 1+𝑚 sc 21− 1 − 𝑚sc 𝑆 2

and observe that (6.40)

2 2 𝑆‖ ‖ 1 + 𝑚sc | | 1 ± 𝑚sc | | ≤ max ± ‖‖ ‖ ‖ℓ2 →ℓ2 | 2 | =∶ 𝑞. 2

˜ AND THE PROOF OF LEMMA 6.3 6.5. APPENDIX: BEHAVIOR OF 𝚪 AND 𝚪

43

Therefore, 𝑛 −1

𝑛



𝑛

0 2 2 𝑆‖ 𝑆‖ 1 ‖ ‖ 1 + 𝑚sc ‖ 1 + 𝑚sc ‖ √ ∑ ∑ ≤ + 𝑁 ‖‖ 1 − 𝑚2 𝑆 ‖‖ ∞ ∞ ‖ ‖ ‖ ‖‖ 2 2 ‖ ‖ ‖ 2 2 sc ℓ →ℓ ℓ∞ →ℓ∞ ℓ →ℓ

𝑛=0

≤ 𝑛 0 + √𝑁 ≤

𝑛=𝑛0

𝑛0

𝑞 1−𝑞

𝐶 log 𝑁 , 1−𝑞 𝐶 log 𝑁

where in the last step we chose 𝑛0 = 01−𝑞 for large enough 𝐶0 . Here we used that ‖𝑆‖ℓ∞ →ℓ∞ ≤ 1 and (6.11) to estimate the summands in the first sum. This concludes the proof of the second bound of (6.37). The third bound of (6.37) follows from the elementary estimates

(6.41)

2 | 1 − 𝑚sc | 2 | | | 2 | ≤ 1 − 𝑐(𝜂 + 𝐸 ),

2 𝜂 | 1 + 𝑚sc | 2 | | | 2 | ≤ 1 − 𝑐((Im 𝑚sc ) + Im 𝑚sc + 𝜂 ) ≤ 1 − 𝑐𝜃, for some universal constant 𝑐 > 0, where in the last step we used Lemma 6.2. The estimate (6.38) follows similarly. Due to the gap 𝛿− in the spectrum of 𝑆, we may replace the estimate (6.40) with

(6.42)

2 2 𝑆‖ ‖ 1 + 𝑚sc 2 || 1 + 𝑚sc || ≤ max{1 − 𝛿 − 𝜂 − 𝐸 , − ‖‖ 2 2 ‖‖ | 2 |}. 2 ℓ →ℓ

Hence (6.38) follows using (6.41). The lower bound of (6.39) was proved in (6.19). The upper bound is proved similarly to (6.38), except that (6.42) is replaced with 2 2 𝑆| ‖ ‖ 1 + 𝑚sc | 1 + 𝑚sc | 2 | | | ≤ max{1 − 𝛿 − 𝜂 − 𝐸 , min{1 − 𝛿 , − + ‖‖ | 2 |}}. |𝐞⟂ ‖‖ℓ2 →ℓ2 2

This concludes the proof of (6.39).



CHAPTER 7

Weak Local Semicircle Law Before we prove the local semicircle law in the strong form, Theorem 6.7, for pedagogical reasons we first prove the following weaker version whose proof is easier. For simplicity, in this section we consider the Wigner case, i.e., 𝑠𝑖𝑗 = 𝑁1 and 𝑀 = 𝑁. In the bulk, this weaker estimate is still effective for all 𝜂 down to the smallest scales 𝑁 −1+𝜀 , but the power of 𝑁1 in the error estimate is not optimal ( 12 instead of 1 in (6.32)). Near the edge the bound is even weaker; the power of 1 is reduced to 41 , indicating that this proof is not sufficiently strong near the 𝑁 edge. Theorem 7.1 (Weak local semicircle law). Let 𝑧 = 𝐸 + 𝑖𝜂 and let 𝜅 ∶= ||𝐸| − 2|. Let 𝐻 be a Wigner matrix, let 𝐺(𝑧) = (𝐻 − 𝑧)−1 be its resolvent, and set 𝑚𝑁 (𝑧) = 𝑁1 Tr 𝐺(𝑧). We assume that the single-entry distribution satisfies the decay condition (5.6). Choose any 𝛾 > 0. Then for 𝑧 ∈ D𝛾 ∶= {𝐸 + 𝑖𝜂 ∶ |𝐸| ≤ 10, 𝑁 −1+𝛾 ≤ 𝜂 ≤ 10} we have (7.1)

|𝑚𝑁 (𝑧) − 𝑚sc (𝑧)| ≺ min {

1

1 }. 1/4 (𝑁𝜂) √𝑁𝜂𝜅 ,

For definiteness, we present the proof for the Hermitian case, but all formulas below carry over to the other symmetry classes with obvious modifications. 7.1. Proof of the Weak Local Semicircle Law, Theorem 7.1 The proof is divided into five steps. Step 1. Schur complement formula. The first step to prove the weak local semicircle law is the to use the Schur complement formula, which we state in the following lemma. The proof is an elementary exercise. Lemma 7.2 (Schur formula). Let 𝐴, 𝐵, and 𝐶 be 𝑛 × 𝑛, 𝑚 × 𝑛, and 𝑚 × 𝑚 matrices. We define the (𝑚 + 𝑛) × (𝑚 + 𝑛) matrix 𝐷 as (7.2)

𝐴 𝐵∗ 𝐷 ∶= ( ) 𝐵 𝐶

ˆ as and 𝑛 × 𝑛 matrix 𝐷 (7.3)

ˆ ∶= 𝐴 − 𝐵∗ 𝐶 −1 𝐵. 𝐷 45

46

7. WEAK LOCAL SEMICIRCLE LAW

ˆ is invertible if 𝐷 is invertible, and for any 1 ≤ 𝑖, 𝑗 ≤ 𝑛 we have Then 𝐷 ˆ −1 )𝑖𝑗 (𝐷 −1 )𝑖𝑗 = (𝐷

(7.4)

for the corresponding matrix elements. Recall that 𝐺𝑖𝑗 = 𝐺𝑖𝑗 (𝑧) denotes the matrix element of the resolvent 𝐺𝑖𝑗 = (

1 ) . 𝐻 − 𝑧 𝑖𝑗

Let 𝐻 (𝑖) be the matrix where all matrix elements of 𝐻 = (ℎ𝑎𝑏 ) in the 𝑖th column and row are set to be 0: (𝐻 (𝑖) )𝑎𝑏 ∶= ℎ𝑎𝑏 ⋅ 1(𝑎 ≠ 𝑖) ⋅ 1(𝑏 ≠ 𝑖), (𝑖)

𝑎, 𝑏 = 1, … , 𝑁.

[𝑖]

In other words, 𝐻 is the 𝑖th minor 𝐻 of 𝐻 augmented to an 𝑁 × 𝑁 matrix by adding a zero row and column. Recall that the minor 𝐻 [𝑖] is an (𝑁 − 1) × (𝑁 − 1) matrix with the 𝑖th row and column removed: [𝑖]

𝐻𝑎𝑏 ∶= ℎ𝑎𝑏 ,

𝑎, 𝑏 ≠ 𝑖.

Denote the Green function of 𝐻 (𝑖) by 𝐺 (𝑖) (𝑧) = (𝐻 (𝑖) − 𝑧)−1 , which is again an 𝑁 × 𝑁 matrix. Notice that −1

(7.5)

(𝑖) 𝐺𝑎𝑏

⎧(−𝑧) = 0 ⎨ [𝑖] −1 [𝑖] ⎩𝐺𝑎𝑏 = (𝐻 − 𝑧)𝑎𝑏

if 𝑎 = 𝑏 = 𝑖, if exactly one of 𝑎 or 𝑏 equals 𝑖, if 𝑎 ≠ 𝑖, 𝑏 ≠ 𝑖.

We warn the reader that in some earlier papers on the subject a different convention was used where 𝐻 (𝑖) and 𝐺 (𝑖) denoted the (𝑁 − 1) × (𝑁 − 1) minors (𝐻 [𝑖] with the current notation) and their Green functions. The current convention simplifies several formulas although the mathematical contents of both versions are identical. With similar conventions, we can define 𝐺 (𝑖𝑗) , 𝐺 (𝑖𝑗𝑘) , etc. The superscript in parenthesis for resolvents always means “after setting the corresponding row and column of 𝐻 to 0” (in some terminology, this procedure is also described as “zero out the corresponding row and column”). In particular, by the independence of matrix elements, this means that the matrix 𝐺 (𝑖𝑗) , say, is independent of the 𝑖th and 𝑗th row and column of 𝐻. This helps to decouple dependencies in formulae. Let (7.6)

𝐚𝑖 = (ℎ1𝑖 , ℎ2𝑖 , … , 0, … , ℎ𝑁𝑖 )T

be the 𝑖th column of 𝐻, after setting the 𝑖th entry to 0. Using Lemma 7.2 with 𝑛 = 1 and 𝑚 = 𝑁 − 1, (7.5), and (7.6), we have 1 , (7.7) 𝐺𝑖𝑖 = ℎ𝑖𝑖 − 𝑧 − 𝐚𝑖 ⋅ 𝐺 (𝑖) 𝐚𝑖 where (7.8)

(𝑖)

[𝑖]

𝐚𝑖 ⋅ 𝐺 (𝑖) 𝐚𝑖 = ∑ ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖 = ∑ ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖 . 𝑘,𝑙≠𝑖

𝑘,𝑙≠𝑖

7.1. PROOF OF THE WEAK LOCAL SEMICIRCLE LAW, THEOREM 7.1

47

(We sometimes use 𝐮 ⋅ 𝐯 instead of 𝐮∗ 𝐯 or (𝐮, 𝐯) for the usual Hermitian scalar product.) We now introduce some notation. Definition 7.3 (Partial expectation and independence). Let 𝑋 ≡ 𝑋(𝐻) be a random variable. For 𝑖 ∈ {1, … , 𝑁} define the operations 𝑃𝑖 and 𝑄𝑖 through 𝑃𝑖 𝑋 ∶= 𝔼(𝑋|𝐻 (𝑖) ),

𝑄𝑖 𝑋 ∶= 𝑋 − 𝑃𝑖 𝑋.

We call 𝑃𝑖 the partial expectation in the index 𝑖. Moreover, we say that 𝑋 is independent of a set 𝕋 ⊂ {1, … , 𝑁} if 𝑋 = 𝑃𝑖 𝑋 for all 𝑖 ∈ 𝕋. We can decompose 𝐚𝑖 ⋅ 𝐺 (𝑖) 𝐚𝑖 into its expectation and fluctuation 𝐚𝑖 ⋅ 𝐺 (𝑖) 𝐚𝑖 = 𝑃𝑖 [𝐚𝑖 ⋅ 𝐺 (𝑖) 𝐚𝑖 ] + 𝑍𝑖 where 𝑍𝑖 ∶= 𝑄𝑖 [𝐚𝑖 ⋅ 𝐺 (𝑖) 𝐚𝑖 ].

(7.9)

Since 𝐺 (𝑖) is independent of 𝐚𝑖 , we need to compute expectations and fluctuations of quadratic functions. The expectation is easy: 1 (𝑖) (𝑖) (𝑖) ∑ 𝐺𝑘𝑘 , 𝑃𝑖 [𝐚𝑖 ⋅ 𝐺 (𝑖) 𝐚𝑖 ] = 𝑃𝑖 ∑ 𝐚𝑘𝑖 𝐺𝑘𝑙 𝐚𝑙𝑖 = ∑ 𝑃𝑖 [ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖 ] = 𝑁 𝑘,𝑙 𝑘≠𝑖 𝑘,𝑙≠𝑖 where in the last step we used that different matrix elements are independent, i.e., 𝑃𝑖 [ℎ𝑖𝑘 ℎ𝑖𝑙 ] = 𝑁1 𝛿𝑘𝑙 . The summations always run over all indices from 1 to 𝑁, apart from those that are explicitly excluded. We define 1 1 (𝑖) (𝑖) ∑ 𝐺 (𝑧), Tr 𝐺 [𝑖] (𝑧) = 𝑚𝑁 (𝑧) ∶= 𝑁−1 𝑁 − 1 𝑘≠𝑖 𝑘𝑘 [𝑖]

(𝑖)

where we used 𝐺𝑘𝑘 = 𝐺𝑘𝑘 for 𝑘 ≠ 𝑖 from (7.5). Hence we have the identity (7.10)

𝐺𝑖𝑖 =

1 1 = . 1 (𝑖) (𝑖) 𝑖 𝑖 ℎ𝑖𝑖 − 𝑧 − 𝑃𝑖 [𝐚 ⋅ 𝐺 𝐚 ] − 𝑍𝑖 ℎ𝑖𝑖 − 𝑧 − (1 − )𝑚𝑁 (𝑧) − 𝑍𝑖 𝑁

Step 2. Interlacing of eigenvalues. We now estimate the difference between (𝑖) 𝑚sc and 𝑚𝑁 . The first step is the following well-known lemma. We include a short proof for completeness. For simplicity we consider the randomized setup with a continuous distribution to avoid multiple eigenvalues. The general case easily follows from standard approximation arguments. Lemma 7.4 (Interlacing of eigenvalues). Let 𝐻 be a symmetric or Hermitian 𝑁 × 𝑁 matrix with continuous distribution. Decompose 𝐻 as follows: (7.11)

ℎ 𝐚∗ 𝐻=( ), 𝐚 𝐵

where 𝐚 = (ℎ12 , … , ℎ1𝑁 )∗ and 𝐵 = 𝐻 [1] is the (𝑁 − 1) × (𝑁 − 1) minor of 𝐻 obtained by removing the first row and first column from 𝐻. Denote by 𝜇1 ≤ ⋯ ≤ 𝜇𝑁 the eigenvalues of 𝐻 and 𝜆1 ≤ ⋯ ≤ 𝜆𝑁−1 the eigenvalues of 𝐵. Then with

48

7. WEAK LOCAL SEMICIRCLE LAW

probability 1 the eigenvalues of 𝐵 are distinct and the eigenvalues of 𝐻 and 𝐵 are interlaced: (7.12)

𝜇1 < 𝜆1 < 𝜇2 < 𝜆2 < ⋯ < 𝜇𝑁−1 < 𝜆𝑁−1 < 𝜇𝑁 .

Proof. Since matrices with multiple eigenvalues form a lower-dimensional submanifold within the set of all Hermitian matrices, the eigenvalues are distinct almost surely. Let 𝜇 be one of the eigenvalues of 𝐻 and let 𝐯 = (𝑣1 , … , 𝑣𝑁 )T be a normalized eigenvector associated with 𝜇. From the eigenvalue equation 𝐻𝐯 = 𝜇𝐯 and from (7.11) we find that (7.13)

ℎ𝑣1 + 𝐚 ⋅ 𝐰 = 𝜇𝑣1

and 𝐚𝑣1 + 𝐵𝐰 = 𝜇𝐰

with 𝐰 = (𝑣2 , … , 𝑣𝑁 )T . From these equations we obtain 𝐰 = (𝜇 − 𝐵)−1 𝐚𝑣1

and thus

(𝜇 − ℎ)𝑣1 = 𝐚 ⋅ (𝜇 − 𝐵)−1 𝐚𝑣1

(7.14)

=

𝜉𝛼 𝑣1 ∑ 𝑁 𝛼 𝜇 − 𝜆𝛼

using the spectral representation of 𝐵 where we set 2

𝜉𝛼 = |√𝑁𝐚 ⋅ 𝐮𝛼 | , with 𝐮𝛼 being the normalized eigenvector of 𝐵 associated with the eigenvalue 𝜆𝛼 . From the continuity of the distribution it also holds that 𝑣1 ≠ 0 almost surely and thus we have 𝜉𝛼 1 (7.15) 𝜇−ℎ = ∑ , 𝑁 𝛼 𝜇 − 𝜆𝛼 where the 𝜉𝛼 ’s are strictly positive almost surely (notice that 𝐚 and 𝐮𝛼 are independent). In particular, this shows that 𝜇 ≠ 𝜆𝛼 for any 𝛼. In the open interval 𝜇 ∈ (𝜆𝛼−1 , 𝜆𝛼 ), the function Φ(𝜇) ∶=

𝜉𝛼 1 ∑ 𝑁 𝛼 𝜇 − 𝜆𝛼

is strictly decreasing from ∞ to −∞; therefore there is exactly one solution to the equation 𝜇 − ℎ = Φ(𝜇). A similar argument shows that there is also exactly one solution below 𝜆1 and above 𝜆𝑁−1 . This completes the proof. □ We can now compare the normalized traces of 𝐺 and 𝐺 [𝑖] : Lemma 7.5. Under the conditions of Lemma 7.4, for any 1 ≤ 𝑖 ≤ 𝑁 we have (7.16)

1 𝐶 | | (𝑖) |𝑚𝑁 (𝑧) − (1 − )𝑚𝑁 (𝑧)| ≤ | | 𝑁𝜂 , 𝑁

𝜂 = Im 𝑧 > 0.

Proof. With the notation of Lemma 7.4, let 1 1 𝐹(𝑥) ∶= #{𝜆𝑗 ≤ 𝑥} and 𝐹 (𝑖) (𝑥) ∶= #{𝜇𝑗 ≤ 𝑥} 𝑁 𝑁−1

7.1. PROOF OF THE WEAK LOCAL SEMICIRCLE LAW, THEOREM 7.1

49

denote the normalized counting functions of the eigenvalues. The interlacing property of the eigenvalues of 𝐻 and 𝐻 [𝑖] (see (7.12)) in terms of these functions means that sup |𝑁𝐹(𝑥) − (𝑁 − 1)𝐹 (𝑖) (𝑥)| ≤ 1. 𝑥

Then, after integrating by parts, 1 1 d𝐹 (𝑖) (𝑥) | | | | d𝐹(𝑥) (𝑖) |𝑚𝑁 (𝑧) − (1 − )𝑚𝑁 (𝑧)| = |∫ | ∫ − − (1 ) | | | 𝑥−𝑧 𝑁 𝑁 𝑥−𝑧 | 1 𝑁 1 ≤ 𝑁

(7.17)

=

| 𝑁𝐹(𝑥) − (𝑁 − 1)𝐹 (𝑖) (𝑥) | |∫ d𝑥 | (𝑥 − 𝑧)2 | | 𝐶 d𝑥 ∫ ≤ , 𝑁𝜂 |𝑥 − 𝑧|2

which proves (7.16). Note that the (7.16) would also hold without the prefactor 1 (1 − ) since we have the trivial bound |𝑚(𝑖) | ≤ 𝜂 −1 . □ 𝑁

We can now rewrite (7.10) into the following form, after summing over 𝑖: 1 1 1 (7.18) 𝑚𝑁 (𝑧) = ∑ with Ω𝑖 ∶= ℎ𝑖𝑖 − 𝑍𝑖 + 𝑂( ). 𝑁 𝑖 −𝑧 − 𝑚𝑁 (𝑧) + Ω𝑖 𝑁𝜂 At this stage we can explain the main idea for the proof of the local semicircle law (7.1). Equation (7.18) is the key self-consistent equation for 𝑚𝑁 , the Stieltjes transform of the empirical eigenvalue density of 𝐻. Notice that if Ω𝑖 were 0 then we would have the equation 1 (7.19) 𝑚=− 𝑚+𝑧 complemented with the side condition that Im 𝑚 > 0. This is exactly the defining equation (6.8) of the Stieltjes transform of the semicircle density. In the next step we will give bounds on ℎ𝑖𝑖 and 𝑍𝑖 , and then we will effectively control the stability of (7.19) against small perturbations. This will give the desired estimate (7.1) on |𝑚𝑁 − 𝑚sc |. Step 3. Large-deviation estimate of the fluctuations. The quantity 𝑍𝑖 defined in (7.9) will be viewed as a random variable in the probability space of the 𝑖th column. We will need an upper bound on it in the large-deviation sense, i.e., with a very high probability. Since it is a quadratic function of the independent matrix elements of the 𝑖th column, standard large-deviation theorems do not directly apply. To focus on the main line of the proof and to keep technicalities at a minimum, we postpone the discussion of the full version of the quadratic large-deviation bounds to Section 7.2. However, in order to get an idea of the size of 𝑍𝑖 , we compute its second moment as follows: (7.20) 𝑃𝑖 |𝑍𝑖 |2 = ∑

(𝑖)

(𝑖)

(𝑖)

(𝑖)

∑ 𝑃𝑖 [(ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖 − 𝑃𝑖 [ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖 ])(ℎ𝑖𝑘′ 𝐺 𝑘′ 𝑙′ ℎ𝑙′ 𝑖 − 𝑃𝑖 [ℎ𝑖𝑘′ 𝐺 𝑘′ 𝑙′ ℎ𝑙′ 𝑖 ])].

𝑘,𝑙≠𝑖 𝑘′ ,𝑙′ ≠𝑖

50

7. WEAK LOCAL SEMICIRCLE LAW

Since 𝔼ℎ = 0, the nonzero contributions to this sum come from index combinations when all ℎ and ℎ are paired. For pedagogical simplicity, assume that 𝔼ℎ2 = 0; this holds, for example, if the distribution of the real and imaginary parts are the same. Then ℎ factors in the above expression have to be paired in such a way that ℎ𝑖𝑘 = ℎ𝑖𝑘′ and ℎ𝑖𝑙 = ℎ𝑖𝑙′ , i.e., 𝑘 = 𝑘 ′ and 𝑙 = 𝑙 ′ . Note that pairing ℎ𝑖𝑘 = ℎ𝑖𝑙 would give 0 because the expectation is subtracted. The result is 𝑚 −1 1 (𝑖) 2 (𝑖) 2 (7.21) ℙ 𝑖 |𝑍𝑖 |2 = 2 ∑ |𝐺𝑘𝑙 | + 4 2 ∑ |𝐺𝑘𝑘 | , 𝑁 𝑘,𝑙≠𝑖 𝑁 𝑘≠𝑖 where 𝑚4 = 𝔼|√𝑁ℎ|4 is the fourth moment of the single-entry distribution. The first term can be computed as 1 1 1 (𝑖) 2 [𝑖] 2 ∑ |𝐺𝑘𝑙 | = 2 ∑ |𝐺𝑘𝑙 | = 2 ∑ (|𝐺 [𝑖] |2 )𝑘𝑘 2 𝑁 𝑘,𝑙≠𝑖 𝑁 𝑘,𝑙≠𝑖 𝑁 𝑘≠𝑖 (7.22)

=

1 1 [𝑖] ∑ Im 𝐺𝑘𝑘 𝑁𝜂 𝑁 𝑘≠𝑖

=

1 1 (𝑖) (1 − ) Im 𝑚𝑁 𝑁𝜂 𝑁

where |𝐺|2 = 𝐺𝐺 ∗ . In the middle step we have used the Ward identity valid for the resolvent 𝑅(𝑧) = (𝐴 − 𝑧)−1 of any Hermitian matrix 𝐴: |𝑅(𝑧)|2 =

(7.23)

1 1 = Im 𝑅(𝑧), 2 2 𝜂 |𝐴 − 𝐸| + 𝜂

𝑧 = 𝐸 + 𝑖𝜂.

We can estimate the second term in (7.21) by the general fact that the resolvent 𝑅 = (𝐴 − 𝑧)−1 of any Hermitian matrix 𝐴, we have ∑ |𝑅𝑘𝑘 |2 ≤ ∑

(7.24)

𝛼

𝑘

1 |𝜁𝛼 − 𝑧|2

where 𝜁𝛼 are the eigenvalues of 𝐴. To see this, let 𝐮𝛼 be the normalized eigenvectors. Then by the spectral theorem 𝑅𝑘𝑘 = ∑ 𝛼

|𝐮𝛼 (𝑘)|2 , 𝜁𝛼 − 𝑧

and thus we have 2

|𝐮𝛼 (𝑘)|2 |𝐮𝛽 (𝑘)| |𝜁𝛼 − 𝑧| |𝜁𝛽 − 𝑧| 𝛼,𝛽

∑ |𝑅𝑘𝑘 |2 ≤ ∑ ∑ 𝑘

𝑘

≤∑ 𝛼

1 1 ∑ |𝐮 (𝑘)|2 ∑ |𝐮𝛽 (𝑘)|2 = ∑ , |𝜁𝛼 − 𝑧|2 𝑘 𝛼 |𝜁 − 𝑧|2 𝛼 𝛼 𝛽

where we have used the Schwarz inequality and that {𝐮𝛽 } is an orthonormal basis. Applying this bound to the Green function of 𝐻 [𝑖] with eigenvalues 𝜇𝛼 ,

7.1. PROOF OF THE WEAK LOCAL SEMICIRCLE LAW, THEOREM 7.1

51

we have 2 2 1 |𝐺 (𝑖) | = 1 ∑ |𝐺 [𝑖] | ∑ | | 𝑁 2 𝑘≠𝑖 | 𝑘𝑘 | 𝑁 2 𝑘≠𝑖 𝑘𝑘 𝑁−1

𝜂 1 1 ∑ ≤ 𝑁𝜂 𝑁 𝛼=1 |𝜇𝛼 − 𝑧|2

(7.25)

=

1 1 (𝑖) (1 − ) Im 𝑚𝑁 . 𝑁𝜂 𝑁

(𝑖)

By (7.16), we can estimate 𝑚𝑁 by 𝑚𝑁 . Thus the estimates (7.22) and (7.25) confirm that the size of 𝑍𝑖 is roughly (7.26)

|𝑍𝑖 | ≲

1 √𝑁𝜂

√Im 𝑚𝑁

in the second-moment sense. In Section 7.2 we will prove that this inequality actually holds in large-deviation sense; i.e., we have (7.27)

|𝑍𝑖 | ≺

1 √𝑁𝜂

√Im 𝑚𝑁 .

The diagonal entry ℎ𝑖𝑖 can be easily estimated. Since the single-entry distribution has finite moments (5.6), we have ℙ(|ℎ𝑖𝑖 | ≥ 𝑁 𝜀 𝑁 −1/2 ) ≤ 𝐶𝑝 𝑁 −𝜀𝑝 for each 𝑖 and for any 𝜀 > 0. Hence we can guarantee that all diagonal elements ℎ𝑖𝑖 simultaneously satisfy |ℎ𝑖𝑖 | ≺ 𝑁 −1/2 . Step 4. Initial estimate at large scales. To control the error terms in the self-consistent equation (7.18), we need two inputs. First, from now on we assume that (7.27) holds. Since 𝑁𝜂 is large, this implies that 𝑍𝑖 is small provided that Im 𝑚𝑁 is bounded. Second, we need to ensure that the denominator in the right-hand side of (7.18) does not become too small. Since the main term in this denominator is 𝑧 + 𝑚𝑁 (𝑧), our task is to show that 1 (7.28) ≺ 1. |𝑧 + 𝑚𝑁 (𝑧)| Notice that both inputs are in terms of the yet uncontrolled quantity 𝑚𝑁 ; they would be trivially available if 𝑚𝑁 were replaced with 𝑚sc (see Lemma 6.2). Since the smallness of 𝑚𝑁 −𝑚sc is our goal, to break this apparently circular argument we will use a bootstrap strategy. The convenient bootstrap parameter is 𝜂. We first establish the result for large 𝜂 in this section, which is called the initial estimate. Then, in the next section, step by step we reduce the value 𝜂 by using the control from the previous scale to estimate Im 𝑚𝑁 and |𝑧 + 𝑚𝑁 (𝑧)|−1 . This control will use the large-deviation bounds on 𝑍𝑖 , which hold with very high probability. Hence at each step an exceptional event of very small probability will have to be excluded. This is the main reason why the bootstrap argument is

52

7. WEAK LOCAL SEMICIRCLE LAW

done in small discrete steps, although in essence this is a continuity argument. We will explain this important argument in more detail in the next section. Both the initial estimate and the continuity argument use the following simple idea. By expanding Ω𝑖 in the denominator in (7.18) and using (7.27) and ℎ𝑖𝑖 ≺ 𝑁 −1/2 , we have

(7.29)

1 | | | |𝑚𝑁 (𝑧) + | 𝑧 + 𝑚𝑁 (𝑧) | 1 max |Ω𝑖 | ≺ |𝑧 + 𝑚𝑁 (𝑧)|2 𝑖 ≺

Im 𝑚𝑁 1 + 𝑁 −1/2 + (𝑁𝜂)−1 ] [ |𝑧 + 𝑚𝑁 (𝑧)|2 √ 𝑁𝜂

provided that Ω𝑖 can be considered as a small perturbation of 𝑧 + 𝑚𝑁 (𝑧), i.e., 1 (7.30) max |Ω𝑖 | ≪ 1, |𝑧 + 𝑚𝑁 (𝑧)| 𝑖 so that the expansion is justified. We can now use the following elementary lemma. Lemma 7.6. Fix 𝑧 = 𝐸 + 𝑖𝜂, with |𝐸| ≤ 20, 0 < 𝜂 ≤ 10, and set 𝜅 ∶= ||𝐸| − 2|. Suppose that 𝑚 satisfies the inequality 1 | | |𝑚 + |≤𝛿 | 𝑧 + 𝑚|

(7.31) for some 𝛿 ≤ 1. Then (7.32)

1 | 𝐶𝛿 | |} ≤ ≤ 𝐶 √𝛿. min {|𝑚 − 𝑚sc (𝑧)|, |𝑚 − | 𝑚sc (𝑧) | √𝜅 + 𝜂 + 𝛿

Proof. For 𝛿 ≤ 1 and |𝑧| ≤ 20, (7.31) implies that |𝑚| ≤ 22. Write (7.31) as

1 =∶ Δ, |Δ| ≤ 𝛿, 𝑧+𝑚 and subtract this from the equation 𝑚sc + (𝑧 + 𝑚sc )−1 = 0. After some simple algebra we get 𝑚+

(7.33)

(𝑚 − 𝑚sc )[𝑚 −

1 ] = Δ(𝑚 + 𝑧), 𝑚sc

i.e., (7.34)

1 | | | ≤ 𝐶𝛿, |𝑚 − 𝑚sc |||𝑚 − 𝑚sc |

for some fixed constant 𝐶, where we have used (6.11). 2 | ≤ 𝐶 ′ √𝛿 for some large 𝐶 ′ , then we write We separate two cases. If |1 − 𝑚sc the above inequality as (7.35)

2 | | 1 − 𝑚sc | ≤ 𝐶𝛿. |𝑚 − 𝑚sc | |𝑚 − 𝑚sc − | 𝑚sc |

7.1. PROOF OF THE WEAK LOCAL SEMICIRCLE LAW, THEOREM 7.1

53

We claim that this implies |𝑚 − 𝑚sc | ≤ 2𝐶 ′ √𝛿/|𝑚sc |. Indeed, if |𝑚 − 𝑚sc | ≥ 2𝐶 ′ √𝛿/|𝑚sc | were true, the second factor in (7.35) would be at least 𝐶 ′ √𝛿/|𝑚sc |, so the left-hand side of (7.35) would be at least 2(𝐶 ′ /|𝑚sc |)2 𝛿. Since 𝑚sc ≍ 1 (see (6.11)), we would get a contradiction if 𝐶 ′ is large enough. Thus we proved |𝑚 − 𝑚sc | ≤ 2𝐶 ′ √𝛿/|𝑚sc | ≤ 𝐶 √𝛿 in this case, which is in agreement with 2 the first inequality in (7.32) since the condition |1 − 𝑚sc | ≤ 𝐶 ′ √𝛿 also implies √𝜅 + 𝜂 ≲ √𝛿 by (6.12). 2 In the second case we have |1 − 𝑚sc | ≥ 𝐶 ′ √𝛿, i.e., √𝜅 + 𝜂 ≳ √𝛿, so for −1 | ≲ 𝛿/√𝜅 + 𝜂. If (7.32) we need to prove that |𝑚 − 𝑚sc | ≲ 𝛿/√𝜅 + 𝜂 or |𝑚 − 𝑚sc 1

2 |𝑚 − 𝑚sc | ≤ |1 − 𝑚sc |/|𝑚sc |, then the second factor in (7.35) would be at least 1 2

2 2 |1 − 𝑚sc |/|𝑚sc |

≍ √𝜅 + 𝜂, so we would immediately get |𝑚 − 𝑚sc | ≲ 𝛿/√𝜅 + 𝜂. 1

2 We are left with the case |𝑚 − 𝑚sc | ≥ |1 − 𝑚sc |/|𝑚sc |, i.e., |𝑚 − 𝑚sc | ≳ √𝜅 + 𝜂. 2 Rewrite (7.35) as 2 | | | 1 − 𝑚sc 1 | 1 | |𝑚 − | ≤ 𝐶𝛿, |𝑚 − + | 𝑚sc 𝑚sc | | 𝑚sc |

(7.36)

and repeat the previous argument with the roles of 𝑚sc and 1/𝑚sc interchanged. −1 2 We conclude that either |𝑚 − 𝑚sc | ≤ 12 |1 − 𝑚sc |/|𝑚sc |, in which case we im−1 −1 mediately get |𝑚 − 𝑚sc | ≲ 𝛿/√𝜅 + 𝜂, or |𝑚 − 𝑚sc | ≳ √𝜅 + 𝜂. In the latter case, combining it with |𝑚 − 𝑚sc | ≳ √𝜅 + 𝜂 from the previous argument, we −1 2 would get |𝑚 − 𝑚sc ||𝑚 − 𝑚sc | ≳ 𝜅 + 𝜂. Since 𝜅 + 𝜂 ≳ |1 − 𝑚sc | ≥ 𝐶 ′ 𝛿, this ′ would contradict (7.34) if 𝐶 is large enough. This completes the proof of the lemma. □

Applying Lemma 7.6 in (7.29), we see that (7.37)

| min{|𝑚𝑁 (𝑧) − 𝑚sc (𝑧)|, ||𝑚𝑁 (𝑧) −

1 | |} ≺ 𝑚sc (𝑧) |

1/2 max𝑖 |Ω𝑖 | 1 min{(max |Ω𝑖 |) , }; |𝑧 + 𝑚𝑁 (𝑧)| 𝑖 √𝜅

i.e., a good bound on Ω𝑖 directly yields an estimate on 𝑚𝑁 − 𝑚sc provided a −1 (𝑧)| lower bound on |𝑧 + 𝑚𝑁 (𝑧)| is given and if the possibility that |𝑚𝑁 (𝑧) − 𝑚sc is small can be excluded. Using this scheme, we now derive the initial estimate, which is (7.1) for any 𝜂 ≥ 𝑁 −1/16 . To check (7.30), we start with the trivial bound Im 𝑚𝑁 ≤ 𝜂 −1 . By (7.27), we have 𝑍𝑖 ≺ 𝑁 −15/32 for 𝜂 ≥ 𝑁 −1/16 . Hence we have (7.38)

max |Ω𝑖 | ≤ max |𝑍𝑖 | + |ℎ𝑖𝑖 | + 𝐶𝑁 −15/16 ≺ 𝑁 −15/32 . 𝑖

𝑖

54

7. WEAK LOCAL SEMICIRCLE LAW

Since Im(𝑧 + 𝑚𝑁 (𝑧)) ≥ 𝜂 ≥ 𝑁 −1/16 , (7.30) is satisfied for 𝜂 ≥ 𝑁 −1/16 . Thus (7.29) implies that (7.39)

1 1 | | |𝑚𝑁 (𝑧) + |≺ 𝑂(𝑁 −15/32 ) ≺ 𝑁 −11/32 | 𝑧 + 𝑚𝑁 (𝑧) | |𝑧 + 𝑚𝑁 (𝑧)|2

for any 𝜂 ≥ 𝑁 −1/16 . The precise exponents do not matter here; our goal is to find ′ some 𝜂 = 𝑁 −𝑐 such that the last equation holds with an error 𝑁 −𝑐 for some 𝑐, 𝑐′ > 0. Applying Lemma 7.6 we obtain, for any 𝜂 ≥ 𝑁 −1/16 , that either (7.40)

|𝑚𝑁 (𝑧) − 𝑚sc (𝑧)| ≺ 𝑁 −11/64

or

1 | | |𝑚𝑁 (𝑧) − | ≺ 𝑁 −11/64 . | 𝑚sc (𝑧) |

However, the second option is excluded since Im 𝑚𝑁 ≥ 0 and thus (7.41)

|𝑚𝑁 (𝑧) −

1 1 | ≥ Im 𝑚𝑁 − Im 𝑚sc (𝑧) 𝑚sc (𝑧) ≥ 𝑐 Im 𝑚sc (𝑧) ≥ 𝑐𝜂 ≥ 𝑐𝑁 −1/16 ,

where we used (6.13). This proves that (7.42)

|𝑚𝑁 (𝑧) − 𝑚sc (𝑧)| ≺ 𝑁 −11/64 .

Once we know that |𝑚𝑁 (𝑧) − 𝑚sc (𝑧)| is small, we can simply perturb the relation 1 | | | | | 𝑧 + 𝑚sc (𝑧) | = |𝑚sc (𝑧)| ≤ 1 from Lemma 6.2 to obtain (7.43)

|𝑚𝑁 (𝑧)| ≤ 𝐶

1 | | | ≤ 𝐶. and || 𝑧 + 𝑚𝑁 (𝑧) |

Thus we can bound Ω𝑖 from (7.27) and ℎ𝑖𝑖 ≺ 𝑁 −1/2 by (7.44)

max |Ω𝑖 | ≺ 𝑖

1 √𝑁𝜂

,

and, instead of (7.39), we get the stronger bound 1 1 | | |𝑚𝑁 (𝑧) + |≺ . | 𝑧 + 𝑚𝑁 (𝑧) | √𝑁𝜂 Applying (7.32) once again, with 𝛿 = (𝑁𝜂)−1/2 and using (7.41) to exclude the possibility that 𝑚𝑁 is close to 1/𝑚sc , we obtain the better bound (7.45)

|𝑚𝑁 (𝑧) − 𝑚sc (𝑧)| ≺ min(

1 1 , ) for any 𝜂 ≥ 𝑁 −1/16 , 1/4 (𝑁𝜂) √𝑁𝜂𝜅

which is exactly (7.1) for 𝜂 ≥ 𝑁 −1/16 . Step 5. Continuity argument and completion of the proof. With (7.1) proven for any 𝜂 ≥ 𝑁 −1/16 , we now proceed to reduce the scale of 𝜂, while 𝐸, the real

7.1. PROOF OF THE WEAK LOCAL SEMICIRCLE LAW, THEOREM 7.1

55

part of 𝑧, is kept fixed. To do this, choose another scale 𝜂1 = 𝜂 − 𝑁 −4 slightly smaller than 𝜂 (but still 𝜂1 ≥ 𝑁 −1 ). Since (7.46)

′ |𝑚𝑁 (𝑧)| ≤ 𝑁 −1 ∑ 𝛼

1 ≤ 𝜂 −2 , |𝜆𝛼 − 𝑧|2

we have the deterministic relation (7.47)

|𝑚𝑁 (𝑧1 ) − 𝑚𝑁 (𝑧)| ≤ 𝑁 −2 ,

𝑧 = 𝐸 + i𝜂, 𝑧1 = 𝐸 + i𝜂1 .

Hence (7.1) holds as well for 𝜂1 instead of 𝜂 since 𝑁 −2 is smaller than all our error terms. But we cannot rely on this idea for the obvious reason that we would need to apply this procedure at least 𝑁 4 times and the many small errors would accumulate. We need to show that the estimate in (7.1) does not deteriorate even by the small amount 𝑁 −2 . Since the definition of ≺ includes additional factors 𝑁 𝜀 , it is not sufficiently sensitive to track such precision. For the continuity argument we will now abandon the formalism of stochastic domination and go back to tracking exceptional sets precisely. In Chapter 8 we will set up a continuity scheme fully within the framework of stochastic domination, but here we complete the proof in a more elementary way. The idea is to show that for any small exponent 𝜀 ∈ (0, 𝛾/10) there is a constant 𝐶1 large enough such that if −1 𝑆(𝑧) ∶= min {|𝑚𝑁 (𝑧) − 𝑚sc (𝑧)|, |𝑚𝑁 (𝑧) − 𝑚sc (𝑧)|}

(7.48)

≤ 𝐶1 𝑁 𝜀 min(

1 1 , ) 1/4 (𝑁𝜂) √𝑁𝜂𝜅

in a set 𝐴 in the probability space, then we not only have the bound (7.49)

𝑆(𝑧1 ) ≤ 𝐶1 𝑁 𝜀 min(

1 1 , ) + 2𝑁 −2 (𝑁𝜂)1/4 √𝑁𝜂𝜅

that trivially follows from (7.47), but we also have that (7.50)

𝑆(𝑧1 ) ≤ 𝐶1 𝑁 𝜀 min(

1 1 , ) (𝑁𝜂1 )1/4 √𝑁𝜂1 𝜅

holds with the same 𝐶1 and 𝜀 in a set 𝐴1 ⊂ 𝐴 with (7.51)

𝑃(𝐴 ⧵ 𝐴1 ) ≤ 𝑁 −𝐷

for any 𝐷 > 0.

Thus the deterioration of the estimate from (7.47) to (7.50) can be avoided at least with a very high probability. To see this, we note that in the regime 𝜂 ≥ 𝑁 −1+5𝜀 the bound (7.49) implies that in the set 𝐴 we have (7.52)

|Im 𝑚𝑁 (𝑧1 )| ≤ |𝑚𝑁 (𝑧1 )| ≤ |𝑚sc (𝑧1 )| + 𝑜(1) ≤ 2

and (7.53)

1 ≤ 2. |𝑧1 + 𝑚𝑁 (𝑧1 )|

56

7. WEAK LOCAL SEMICIRCLE LAW

The last estimate is a perturbative consequence of the bounds |𝑧 + 𝑚sc (𝑧)|−1 ≤ 1 −1 and |𝑧 + 𝑚sc (𝑧)|−1 ≤ 𝐶, where we used either (7.54) |(𝑧1 + 𝑚𝑁 (𝑧1 )) − (𝑧 + 𝑚sc (𝑧))| ≤ |𝑚𝑁 (𝑧1 ) − 𝑚𝑁 (𝑧)| + |𝑧1 − 𝑧| + |𝑚𝑁 (𝑧) − 𝑚sc (𝑧)| = 𝑜(1) or −1 |(𝑧1 + 𝑚𝑁 (𝑧1 )) − (𝑧 + 𝑚sc (𝑧))| ≤ −1 |𝑚𝑁 (𝑧1 ) − 𝑚𝑁 (𝑧)| + |𝑧1 − 𝑧| + |𝑚𝑁 (𝑧) − 𝑚sc (𝑧)| = 𝑜(1);

one of which follows from (7.47) and (7.48). Let 𝐴1 be the intersection of three sets: 𝐴, the set on which |ℎ𝑖𝑖 | ≤ 𝑁 −1/2+𝜀 holds, and the set that (7.27) holds at the scale 𝜂1 , i.e., at 𝑧 = 𝐸 +𝑖𝜂1 . Hence, together with (7.53), the condition (7.30) holds in 𝐴1 . Now we can use (7.37) together with (7.52) and (7.53) to prove that (7.50) holds if 𝐶1 is chosen large enough. At each step, we lose a set of probability 𝑁 −𝐷 due to checking (7.27) at that scale. The estimate from the previous scale is used only to check (7.52), (7.53), and (7.30). Finally, the constant 𝐶1 in the final bound of 𝑆(𝑧1 ) comes from the constant in (7.32), which is uniform in 𝜂. Therefore, 𝐶1 does not deteriorate when passing to the next scale. The only price to pay is the loss of an exceptional set of probability 𝑁 −𝐷 . Since the number of steps are 𝑁 4 , while 𝐷 can be arbitrary large, this loss is affordable. Since the exponent 𝜀 > 0 was arbitrary, this proves −1 (𝑧)|} 𝑆(𝑧) = min{|𝑚𝑁 (𝑧) − 𝑚sc (𝑧)|, |𝑚𝑁 (𝑧) − 𝑚sc

(7.55)

≺ min{

1

1 } 1/4 (𝑁𝜂) √𝑁𝜂𝜅 ,

ˆ 𝛾 ∶= 𝐃𝛾 ∩ for any fixed 𝑧 ∈ 𝐃𝛾 . Using this relation for a discrete net 𝐃 −4 𝑁 (ℤ + 𝑖ℤ) and a union bound, we obtain that (7.55) holds simultaneously for ˆ 𝛾 . Using the Lipschitz continuity of 𝑆(𝑧) (with a Lipschitz constant at all 𝑧 ∈ 𝐃 most 𝑁 2 ), we get (7.55) simultaneously for all 𝑧 ∈ 𝐃𝛾 . Finally, we need to show that the estimate holds not only for the minimum, but for |𝑚𝑁 (𝑧) − 𝑚sc (𝑧)|. We have already seen this for any 𝜂 ≥ 𝑁 −1/16 in (7.45). Fixing 𝐸 = Re 𝑧, we consider 𝑚𝑁 (𝐸 + 𝑖𝜂), 𝑚sc (𝐸 + 𝑖𝜂), and 𝑆(𝐸 + 𝑖𝜂) as functions of 𝜂. Since these are continuous functions, by reducing 𝜂 we obtain that 𝑚𝑁 (𝐸 + 𝑖𝜂) remains close to 𝑚sc (𝐸 + 𝑖𝜂) as long as the right-hand side of (7.55) is larger than the difference −1 |𝑚sc (𝐸 + 𝑖𝜂) − 𝑚sc (𝐸 + 𝑖𝜂)| ≍ √𝜅𝐸 + 𝜂.

The right-hand side of (7.55) increases as 𝜂 decreases, while the separation bound √𝜅𝐸 + 𝜂 decreases. Therefore, once 𝜂 is so small that the separation bound √𝜅𝐸 + 𝜂 becomes smaller than the right-hand side of (7.55), the differ−1 remains irrelevant for any smaller 𝜂. Thus (7.1) holds ence between 𝑚sc and 𝑚sc on the entire 𝐃𝛾 . This proves the weak local semicircle law, i.e., Theorem 7.1

7.2. LARGE-DEVIATION ESTIMATES

57

except for the large-deviation estimate (7.27), which we will prove in the next subsection. 7.2. Large-Deviation Estimates Finally, in order to estimate large sums of independent random variables as in (7.8) and later in (8.2), we will need a large-deviation estimate for linear and quadratic functionals of independent random variables. The case of the linear functionals is standard. Quadratic functionals were considered in [62, 69]; the current formulation is taken from [52]. (𝑁)

Theorem 7.7 (Large-deviation bounds). Let (𝑋𝑖 pendent families of random variables and

(𝑁) (𝑎𝑖𝑗 )

here 𝑁 ∈ ℕ and 𝑖, 𝑗 = 1, … , 𝑁. Suppose that all

(𝑁)

) and (𝑌𝑖

) be inde-

(𝑁) and (𝑏𝑖 ) be deterministic; (𝑁) (𝑁) entries 𝑋𝑖 and 𝑌𝑖 are inde-

pendent and satisfy (7.56)

𝔼𝑋 = 0,

𝔼|𝑋|2 = 1,

‖𝑋‖𝑝 ∶= (𝔼|𝑋|𝑝 )1/𝑝 ≤ 𝜇𝑝 ,

for all 𝑝 ∈ ℕ and some constants 𝜇𝑝 . Then we have the bounds 1/2

∑ 𝑏𝑖 𝑋𝑖 ≺ (∑|𝑏𝑖 |2 )

(7.57)

𝑖

,

𝑖 1/2

∑ 𝑎𝑖𝑗 𝑋𝑖 𝑌𝑗 ≺ (∑|𝑎𝑖𝑗 |2 )

(7.58)

𝑖,𝑗

1/2

∑ 𝑎𝑖𝑗 𝑋𝑖 𝑋𝑗 ≺ (∑|𝑎𝑖𝑗 |2 )

(7.59)

,

𝑖,𝑗

𝑖≠𝑗

.

𝑖≠𝑗

Our proof in fact generalizes trivially to arbitrary multilinear estimates for ∗ quantities of the form ∑𝑖 ,…,𝑖 𝑎𝑖1 …𝑖𝑘 (𝑢)𝑋𝑖1 (𝑢) ⋯ 𝑋𝑖𝑘 (𝑢), where the star indicates 1 𝑘 that the summation indices are constrained to be distinct. For our purposes the most important inequality is the last one, (7.59). It confirms the intuition from Step 3 on page 49 that for sums of the form ∑𝑖≠𝑗 𝑎𝑖𝑗 𝑋𝑖 𝑋𝑗 the second-moment calculation gives the correct order of magnitude, and it can be improved to a high probability statement in a large-deviation sense. Indeed, the second-moment calculation gives 2

| | 𝔼|| ∑ 𝑎𝑖𝑗 𝑋𝑖 𝑋𝑗 || = ∑ ∑ 𝑎𝑖𝑗 𝑎𝑖′ 𝑗′ 𝑋𝑖 𝑋𝑗 𝑋 𝑖′ 𝑋 𝑗′ ′ ′ 𝑖≠𝑗

𝑖≠𝑗 𝑖 ≠𝑗

= ∑ [|𝑎𝑖𝑗 |2 + 𝑎𝑖𝑗 𝑎𝑗𝑖 ] ≤ 2 ∑ |𝑎𝑖𝑗 |2 , 𝑖≠𝑗

𝑖≠𝑗

since only the pairings 𝑖 = 𝑖 ′ , 𝑗 = 𝑗 ′ and 𝑖 = 𝑗 ′ , 𝑗 = 𝑖 ′ give nonzero contribution. To prepare the proof of Theorem 7.7, we first recall the following version of the Marcinkiewicz-Zygmund inequality.

58

7. WEAK LOCAL SEMICIRCLE LAW

Lemma 7.8. Let 𝑋1 , … , 𝑋𝑁 be a family of independent random variables, each satisfying (7.56), and suppose that the family (𝑏𝑖 ) is deterministic. Then ‖∑ 𝑏 𝑋 ‖ ≤ (𝐶𝑝)1/2 𝜇 (∑|𝑏 |2 )1/2 . 𝑖 𝑖‖ 𝑝 𝑖 ‖ 𝑝

(7.60)

𝑖

𝑖

Proof. The proof is a simple application of Jensen’s inequality. Writing 𝐵 ∶= ∑𝑗 |𝑏𝑖 |2 , we get, by the classical Marcinkiewicz–Zygmund inequality [127] in the first line, that 2

𝑝

2

𝑝

𝑝/2

‖∑ 𝑏 𝑋 ‖ ≤ (𝐶𝑝)𝑝/2 ‖(∑|𝑏 |2 |𝑋 |2 )1/2 ‖ = (𝐶𝑝)𝑝/2 𝐵𝑝 𝔼[(∑ |𝑏𝑖 | |𝑋 |2 ) 𝑖 𝑖‖ 𝑖 𝑖 ‖ ‖ ‖𝑝 𝐵2 𝑖 𝑝 𝑖

𝑖

]

𝑖

≤ (𝐶𝑝)𝑝/2 𝐵𝑝 𝔼[∑ 𝑖



|𝑏𝑖 |2 |𝑋 |𝑝 ] 𝐵2 𝑖

𝑝 (𝐶𝑝)𝑝/2 𝐵𝑝 𝜇𝑝 .



Next, we prove the following intermediate result. Lemma 7.9. Let 𝑋1 , … , 𝑋𝑁 , 𝑌1 , … , 𝑌𝑁 be independent random variables, each satisfying (7.56), and suppose that the family (𝑎𝑖𝑗 ) is deterministic. Then for all 𝑝 ≥ 2 we have ‖∑ 𝑎 𝑋 𝑌 ‖ ≤ 𝐶𝑝𝜇2 (∑|𝑎 |2 )1/2 . 𝑖𝑗 𝑝 𝑖𝑗 𝑖 𝑗 ‖ ‖ 𝑝 𝑖,𝑗

𝑖,𝑗

Proof. Write ∑ 𝑎𝑖𝑗 𝑋𝑖 𝑌𝑗 = ∑ 𝑏𝑗 𝑌𝑗 , 𝑖,𝑗

𝑏𝑗 ∶= ∑ 𝑎𝑖𝑗 𝑋𝑖 .

𝑗

𝑖

Note that (𝑏𝑗 ) and (𝑌𝑗 ) are independent families. By conditioning on the family (𝑏𝑗 ), we therefore get from Lemma 7.8 and the triangle inequality that 1/2

1/2 ‖∑ 𝑏 𝑌 ‖ ≤ (𝐶𝑝)1/2 𝜇 ‖∑|𝑏 |2 ‖ 1/2 2 ≤ (𝐶𝑝) 𝜇 ‖𝑏 ‖ . (∑ ) 𝑗 𝑗‖ 𝑝‖ 𝑗 ‖ 𝑝 𝑗 𝑝 ‖ 𝑝 𝑝/2 𝑗

𝑗

𝑗

Using Lemma 7.8 again, we have ‖𝑏𝑗 ‖𝑝 ≤ (𝐶𝑝)1/2 𝜇𝑝 ‖‖∑|𝑎𝑖𝑗 |2 ‖‖

1/2

.

𝑖



This concludes the proof.

Lemma 7.10. Let 𝑋1 , … , 𝑋𝑁 be independent random variables, each satisfying (7.56), and suppose that the family (𝑎𝑖𝑗 ) is deterministic. Then we have ‖∑ 𝑎 𝑋 𝑋 ‖ ≤ 𝐶𝑝𝜇2 (∑|𝑎 |2 )1/2 . 𝑝 𝑖𝑗 𝑖 𝑗 ‖ 𝑖𝑗 ‖ 𝑝 𝑖≠𝑗

𝑖≠𝑗

7.2. LARGE-DEVIATION ESTIMATES

59

Proof. The proof relies on the identity (valid for 𝑖 ≠ 𝑗) 1 ∑ 𝟏(𝑖 ∈ 𝐼)𝟏(𝑗 ∈ 𝐽) (7.61) 1= 𝑍𝑁 𝐼⊔𝐽=ℕ 𝑁

where the sum ranges over all partitions of ℕ𝑁 = {1, … , 𝑁} into two sets 𝐼 and 𝐽, and 𝑍𝑁 ∶= 2𝑁−2 is independent of 𝑖 and 𝑗. Moreover, we have ∑

(7.62)

1 = 2𝑁 − 2,

𝐼⊔𝐽=ℕ𝑁

where the sum ranges over nonempty subsets 𝐼 and 𝐽. Now we may estimate ‖∑ 𝑎 𝑋 𝑋 ‖ ≤ 1 ∑ ‖‖∑ ∑ 𝑎𝑖𝑗 𝑋𝑖 𝑋𝑗 ‖‖ 𝑖𝑗 𝑖 𝑗 ‖ ‖ 𝑍 𝑝 𝑝 𝑁 𝐼⊔𝐽=ℕ 𝑖∈𝐼 𝑗∈𝐽 𝑖≠𝑗 𝑁



1 𝑍𝑁

1/2



2 𝐶𝑝𝜇𝑝 (∑|𝑎𝑖𝑗 |2 )

𝐼⊔𝐽=ℕ𝑁

𝑖≠𝑗

where we used that, for any partition 𝐼 ⊔𝐽 = ℕ𝑁 , the families (𝑋𝑖 )𝑖∈𝐼 and (𝑋𝑗 )𝑗∈𝐽 are independent, and hence Lemma 7.9 is applicable. The claim now follows from (7.62). □ As remarked above, the proof of Lemma 7.10 may be easily extended to ∗ multilinear expressions of the form ∑𝑖 ,…,𝑖 𝑎𝑖1 …𝑖𝑘 𝑋𝑖1 ⋯ 𝑋𝑖𝑘 . 1 𝑘 We may now complete the proof of Theorem 7.7. Proof of Theorem 7.7. The proof is a simple application of Chebyshev’s inequality. Part (i) follows from Lemma 7.8, part (ii) from Lemma 7.10, and part (iii) from Lemma 7.9. We give the details for part (iii). For 𝜖 > 0 and 𝐷 > 0 we have 1/2

ℙ[||∑ 𝑎𝑖𝑗 𝑋𝑖 𝑋𝑗 || ≥ 𝑁 𝜀 Ψ] ≤ ℙ[||∑ 𝑎𝑖𝑗 𝑋𝑖 𝑋𝑗 || ≥ 𝑁 𝜀 Ψ, (∑|𝑎𝑖𝑗 |2 ) 𝑖≠𝑗

≤ 𝑁 𝜀/2 Ψ]

𝑖≠𝑗

𝑖≠𝑗

+ ℙ[(∑|𝑎𝑖𝑗 |2 )

1/2

≥ 𝑁 𝜀/2 Ψ]

𝑖≠𝑗 1/2

≤ ℙ[||∑ 𝑎𝑖𝑗 𝑋𝑖 𝑋𝑗 || ≥ 𝑁 𝜀/2 (∑|𝑎𝑖𝑗 |2 ) 𝑖≠𝑗

≤(

] + 𝑁 −𝐷−1

𝑖≠𝑗

2 𝑝 𝐶𝑝𝜇𝑝 ) + 𝑁 −𝐷−1 𝑁 𝜀/2

for arbitrary 𝐷. In the second step we used the definition of (∑𝑖≠𝑗 |𝑎𝑖𝑗 |2 )1/2 ≺ Ψ with parameters 𝜀/2 and 𝐷 + 1. In the last step we used Lemma 7.10 by conditioning on (𝑎𝑖𝑗 ). Given 𝜀 and 𝐷, there is a large enough 𝑝 such that the first term on the last line is bounded by 𝑁 −𝐷−1 . Since 𝜖 and 𝐷 were arbitrary, the proof is complete. The claimed uniformity in 𝑢 in the case that 𝑎𝑖𝑗 and 𝑋𝑖 depend on an index 𝑢 also follows from the above estimate. □

CHAPTER 8

Proof of the Local Semicircle Law In this section we start the proof of the local semicircle law, Theorem 6.7. This section can be read independently of the previous Chapter 7, so some basic definitions and facts are repeated for convenience. For those readers who may wish to compare this argument with the proof of the weak law, Theorem 7.1, we mention that the basic strategy is similar except that we use an additional mechanism that we call fluctuation averaging. We point out that the first estimate in (7.29) was not optimal; here we estimated the average of Ω𝑖 by its maximum. Since Ω𝑖 ’s are almost centered random variables with weak correlation, a more precise estimate of their average leads to a considerable improvement. We now recall that the basic steps to prove the weak law were (1) (2) (3) (4) (5)

the self-consistent equation for 𝑚𝑁 , (𝑖) the interlacing of eigenvalues to compare 𝑚𝑁 and 𝑚𝑁 , the quadratic large-deviation estimate for the error term 𝑍𝑖 , the initial estimate for large 𝜂, and extending the estimate to small 𝜂 by the continuity argument.

Exploiting the fluctuation averaging mechanism requires a control on the individual matrix elements of the resolvent 𝐺 instead of just its normalized trace, 𝑚𝑁 = 𝑁 −1 Tr 𝐺. Therefore, instead of considering the scalar equation for 𝑚𝑁 , we will consider the vector self-consistent equation for the diagonal elements 𝐺𝑖𝑖 of the Green function and investigate the stability of this equation. There is no direct analogue of the interlacing property for 𝐺𝑖𝑖 ; thus, we introduce new resolvent decoupling identities to compare the resolvents of the original matrix and its minors. We will still use the quadratic large-deviation estimate to bound the error term. We also use a continuity argument similar to the one given in the weak law to derive a crude estimate on 𝐺𝑖𝑖 (formulated in terms of a certain dichotomy), but instead of making many small steps and keeping track of the exceptional sets, we follow a genuinely continuous approach within the framework of the stochastic domination. Finally, we use the fluctuation-averaging lemma (Lemma 8.9) to boost the error estimate by one order and an iteration argument to prove the local semicircle law, Theorem 6.7. We now start the rigorous proof. We will largely follow the presentation in [55]. 8.1. Tools In this subsection we collect some basic definitions and facts. First, we repeat Definition 7.3 of the partial expectation. 61

62

8. PROOF OF THE LOCAL SEMICIRCLE LAW

Definition 8.1 (Partial expectation and independence). Let 𝑋 ≡ 𝑋(𝐻) be a random variable. For 𝑖 ∈ {1, … , 𝑁} define the operations 𝑃𝑖 and 𝑄𝑖 through 𝑃𝑖 𝑋 ∶= 𝔼(𝑋|𝐻 (𝑖) ),

𝑄𝑖 𝑋 ∶= 𝑋 − 𝑃𝑖 𝑋.

We call 𝑃𝑖 partial expectation in the index 𝑖. Moreover, we say that 𝑋 is independent of a set 𝕋 ⊂ {1, … , 𝑁} if 𝑋 = 𝑃𝑖 𝑋 for all 𝑖 ∈ 𝕋. Next, we define matrices with certain columns and rows zeroed out. Definition 8.2. For 𝕋 ⊂ {1, … , 𝑁} we set 𝐻 (𝕋) to be the 𝑁 × 𝑁 matrix defined by (𝐻 (𝕋) )𝑖𝑗 ∶= 𝟏(𝑖 ∉ 𝕋)𝟏(𝑗 ∉ 𝕋)ℎ𝑖𝑗 ,

𝑖, 𝑗 = 1, … , 𝑁.

Moreover, we define the resolvent of 𝐻 (𝕋) and its normalized trace through (𝕋)

𝐺𝑖𝑗 (𝑧) ∶= (𝐻 (𝕋) − 𝑧)−1 𝑖𝑗 ,

𝑚(𝕋) (𝑧) ∶=

1 Tr 𝐺 (𝕋) (𝑧). 𝑁

We also set the notation (𝕋)

∑ ∶= ∑ . 𝑖

𝑖∶𝑖∉𝕋

These definitions are the natural generalizations of 𝐻 (𝑖) and 𝐺 (𝑖) introduced in Chapter 1. In particular, notice that 𝐻 (𝕋) is the matrix obtained by setting all rows and columns in 𝕋 to 0. This is different from considering the minors by removing the columns and rows in 𝕋. Similarly, 𝐺 (𝕋) is still an 𝑁 × 𝑁 matrix (𝕋) (𝕋) with 𝐺𝑖𝑖 = −𝑧−1 for 𝑖 ∈ 𝕋 and 𝐺𝑖𝑗 = 0 if 𝑖 ∈ 𝕋 and 𝑗 ∉ 𝕋. We will denote 𝐺 ({𝑖}) simply by 𝐺 (𝑖) and similarly for a few more indices. This is consistent with the notation we used in the earlier chapters. The following resolvent decoupling identities form the backbone of all of our calculations. The idea behind them is that a resolvent matrix element 𝐺𝑖𝑗 depends strongly on the 𝑖th and 𝑗th columns of 𝐻, but weakly on all other columns. The first identity determines how to make a resolvent matrix element 𝐺𝑖𝑗 independent of an additional index 𝑘 ≠ 𝑖, 𝑗. The second identity expresses the dependence of a resolvent matrix element 𝐺𝑖𝑗 on the matrix elements in the 𝑖th or 𝑗th column of 𝐻. We added a third identity that relates sums of off-diagonal resolvent entries with a diagonal one. The proofs are elementary. Lemma 8.3 (Resolvent decoupling identities). For any real or complex Hermitian matrix 𝐻 and 𝕋 ⊂ {1, … , 𝑁} the following identities hold: (i) First resolvent decoupling identity [69]: If 𝑖, 𝑗, 𝑘 ∉ 𝕋 and 𝑖, 𝑗 ≠ 𝑘, then (𝕋)

(8.1)

(𝕋) 𝐺𝑖𝑗

=

(𝕋𝑘) 𝐺𝑖𝑗

+

(𝕋)

𝐺𝑖𝑘 𝐺𝑘𝑗 (𝕋)

𝐺𝑘𝑘

.

8.1. TOOLS

63

(ii) Second resolvent decoupling identity [52]: If 𝑖, 𝑗 ∉ 𝕋 satisfy 𝑖 ≠ 𝑗, then (𝕋)

(8.2)

(𝕋)

𝐺𝑖𝑗 = −𝐺𝑖𝑖

(𝕋𝑖)

(𝕋𝑖)

(𝕋)

(𝕋𝑗)

(𝕋𝑗)

∑ ℎ𝑖𝑘 𝐺𝑘𝑗 = −𝐺𝑗𝑗 ∑ 𝐺𝑖𝑘 ℎ𝑘𝑗 𝑘

𝑘

where the superscript in the summation means omission; e.g., the summation in the first sum runs over all 𝑘 ∉ 𝕋 ∪ {𝑖}. (iii) Ward identity. For any 𝕋 ⊂ {1, … , 𝑁} we have 2 1 (𝕋) (𝕋) ∑ ||𝐺𝑖𝑗 || = Im 𝐺𝑖𝑖 . 𝜂 𝑗

(8.3)

Proof. We will prove this lemma only for 𝕋= 0; the general case is a straightforward modification. We first consider (8.2). Recall the resolvent expansion stating that for any two matrices 𝐴 and 𝐵, 1 1 1 1 1 1 1 = − 𝐵 = − 𝐵 𝐴+𝐵 𝐴 𝐴+𝐵 𝐴 𝐴 𝐴 𝐴+𝐵

(8.4)

provided that all the matrix inverses exist. To obtain the first formula in (8.2), we use the first resolvent identity (8.4) (𝑖) at the (𝑖𝑗)th matrix element with 𝐴 = 𝐻 (𝑖) − 𝑧 and 𝐵 = 𝐻 − 𝐻 (𝑖) . Since 𝐺𝑖𝑗 = (𝐴−1 )𝑖𝑗 = 0 if 𝑖 ≠ 𝑗, we immediately have (8.5)

(𝑖)

(𝑖)

(𝑖)

𝐺𝑖𝑗 = − ∑ 𝐺𝑖𝑖 ℎ𝑖𝑘 𝐺𝑘𝑗 − ∑ 𝐺𝑖𝑘 ℎ𝑘𝑖 𝐺𝑖𝑗 = −𝐺𝑖𝑖 ∑ ℎ𝑖𝑘 𝐺𝑘𝑗 , 𝑘

𝑘

𝑖 ≠ 𝑗.

𝑘≠𝑖

The second identity in (8.2) follows in the same way by using the second identity in (8.4). To prove the identity (8.1), we let 𝐴 = 𝐻 (𝑘) − 𝑧 and 𝐵 = 𝐻 − 𝐻 (𝑘) . Then, from the first formula in (8.4), we have (𝑘)

(𝑘)

(𝑘)

𝐺𝑖𝑗 = 𝐺𝑖𝑗 − 𝐺𝑖𝑘 ∑ ℎ𝑘ℓ 𝐺ℓ𝑗 = 𝐺𝑖𝑗 +

(8.6)

ℓ≠𝑘

𝐺𝑖𝑘 𝐺𝑘𝑗 , 𝐺𝑘𝑘

and in the second step we used (8.2). Finally, the Ward identity (8.3), already used in (7.23), is well-known. It follows from the spectral decomposition for 𝐻 with eigenvalues 𝜆𝛼 and eigenvectors 𝐮𝛼 , namely, ∗ ∑ |𝐺𝑖𝑗 |2 = ∑ 𝐺𝑖𝑗 𝐺𝑗𝑖 = [|𝐺|2 ]𝑖𝑖 𝑗

𝑗

=∑ 𝛼

|𝐮𝛼 (𝑖)|2 |𝐮𝛼 (𝑖)|2 1 1 ∑ = Im = Im 𝐺𝑖𝑖 . 𝜂 𝜆 − 𝑧 𝜂 |𝜆𝛼 − 𝑧|2 𝛼 𝛼



64

8. PROOF OF THE LOCAL SEMICIRCLE LAW

8.2. Self-Consistent Equations on Two Levels By using the notation from the previous section, the Schur formula (Lemma 7.2) can be written as (𝕋𝑖)

1

(8.7)

(𝕋) 𝐺𝑖𝑖

(𝕋𝑖)

= ℎ𝑖𝑖 − 𝑧 − ∑ ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖 𝑘,𝑙

where 𝑖 ∉ 𝕋 ⊂ {1, … , 𝑁}. We can take 𝕋 = ∅ to have (8.8)

𝐺𝑖𝑖 =

1 (𝑖)

(𝑖)

.

ℎ𝑖𝑖 − 𝑧 − ∑𝑘,𝑙 ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖

The partial expectation with respect to the index 𝑖 in the denominator can be (𝑖) computed explicitly since ℎ𝑖𝑘 ℎ𝑙𝑖 is independent of 𝐺𝑘𝑙 and its expectation is nonzero only if 𝑘 = 𝑙. We thus get (𝑖)

(𝑖) 𝑃𝑖 [∑ ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖 ] 𝑘,𝑙

(𝑖)

=

(𝑖) ∑ 𝑠𝑖𝑘 𝐺𝑘𝑘 𝑘

(𝑖)

(𝑖)

= ∑ 𝑠𝑖𝑘 𝐺𝑘𝑘 − ∑ 𝑠𝑖𝑘 𝑘

𝑘

= ∑ 𝑠𝑖𝑘 𝐺𝑘𝑘 − ∑ 𝑠𝑖𝑘 𝑘

𝑘

𝐺𝑖𝑘 𝐺𝑘𝑖 𝐺𝑖𝑖 𝐺𝑖𝑘 𝐺𝑘𝑖 , 𝐺𝑖𝑖

where in the second step we used (8.1). We will compare (8.8) with the defining equation of 𝑚sc : (8.9)

𝑚sc =

1 , −𝑧 − 𝑚sc

so we introduce the notation for the difference 𝑣𝑖 ∶= 𝐺𝑖𝑖 − 𝑚sc . Recalling (6.1), we get the following system of self-consistent equations for 𝑣𝑖 : (8.10)

𝑣𝑖 =

1 − 𝑚sc , −𝑧 − 𝑚sc − (∑𝑘 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 )

where Υ𝑖 ∶= 𝐴𝑖 + ℎ𝑖𝑖 − 𝑍𝑖 ,

𝐴𝑖 ∶= ∑ 𝑠𝑖𝑘 𝑘

(8.11)

(𝑖)

𝐺𝑖𝑘 𝐺𝑘𝑖 , 𝐺𝑖𝑖

(𝑖)

𝑍𝑖 ∶= 𝑄𝑖 [∑ ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖 ]. 𝑘,𝑙

All these quantities depend on 𝑧, but we omit it in the notation. We will show that Υ is a small error term. This is clear about ℎ𝑖𝑖 by (6.27). The term 𝐴𝑖 will be small since off-diagonal resolvent entries are small. Finally, 𝑍𝑖 will be small by a large-deviation estimate (7.59) from Theorem 7.7.

8.2. SELF-CONSISTENT EQUATIONS ON TWO LEVELS

65

Before we present more details, we heuristically show the power of this new system of self-consistent equations (8.10) and compare it with the single selfconsistent equation used in Chapter 7 and, in fact, used in all previous literature on the resolvent method for Wigner matrices. 8.2.1. A Scalar Self-Consistent Equation. Introduce the notation (8.12)

[𝑎] =

1 ∑𝑎 𝑁 𝑖 𝑖

for the average of a vector (𝑎𝑖 )𝑁 𝑖=1 . Consider the standard Wigner case, 𝑠𝑖𝑗 = Then 1 ∑ 𝑠𝑖𝑘 𝑣𝑘 = ∑ 𝑣𝑘 = [𝑣] (= 𝑚𝑁 − 𝑚sc ). 𝑁 𝑘 𝑘

1 . 𝑁

Neglecting Υ𝑖 in (8.10) and taking the average of this relation for each 𝑖 = 1, … , 𝑁, we get (8.13)

[𝑣] ≈

1 − 𝑚sc . −𝑧 − 𝑚sc − [𝑣]

Recall (8.9), the defining equation of 𝑚sc : 𝑚sc =

1 . −𝑧 − 𝑚sc

Using Lemma 7.6, which asserts that this equation is stable under small perturbations, at least away from the spectral edges 𝑧 = ±2, we conclude [𝑣] ≈ 0 from (8.13). This means that 𝑚𝑁 ≈ 𝑚sc and hence for the empirical density 𝜚𝑁 ≈ 𝜚sc ; i.e., we obtained Wigner’s original semicircle law. This is exactly the idea we used in Chapter 7 except that the interlacing argument is replaced by Lemma 8.3 this time. 8.2.2. A Vector Self-Consistent Equation. If we are interested in individual resolvent matrix elements 𝐺𝑖𝑖 instead of their average, 𝑁1 Tr 𝐺, then the scalar equation (8.13) discussed in the previous section is not sufficient. We have to consider (8.10) as a system of equations for the components of the vector 𝐯 = (𝑣1 , … , 𝑣𝑁 ). In order to analyze it, we will linearize this system of equations. There are two possible linearizations depending on how we expand the error terms. Linearization I. If we know that ∑𝑘 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 is small, we can expand the denominator in (8.10) to have 2

3 2 𝑣𝑖 = 𝑚sc (∑ 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 ) (∑ 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 ) + 𝑚sc

(8.14)

𝑘

𝑘 3

+ 𝑂((∑ 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 ) ), 𝑘

66

8. PROOF OF THE LOCAL SEMICIRCLE LAW

4 where we have used the defining equation (8.9) for 𝑚sc and used that |𝑚sc |≤1 in the error term. After rearranging (8.14), we get 2 (8.15) [(1 − 𝑚sc 𝑆)𝐯]𝑖 = ℰ𝑖 ∶= 2

3

2 3 − 𝑚sc Υ𝑖 + 𝑚sc (∑ 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 ) + 𝑂((∑ 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 ) ), 𝑘

𝑘

Linearization II. If we know that 𝑣𝑖 is small, then it is useful to rewrite (8.10) to the following form: (8.16)

− ∑ 𝑠𝑖𝑘 𝑣𝑘 + Υ𝑖 = 𝑘

1 1 − , 𝑚sc + 𝑣𝑖 𝑚sc

where we have also used the defining equation of 𝑚sc . If |𝑣𝑖 | = 𝑜(1), then we 2 expand 𝑣𝑖 to the second-order and multiply both sides by 𝑚sc to obtain (8.17)

2 2 𝑆)𝐯]𝑖 = ℰ𝑖 ∶= −𝑚sc Υ𝑖 + [(1 − 𝑚sc

1 2 𝑣 + 𝑂(|𝑣𝑖 |3 ), 𝑚sc 𝑖

2 where we have estimated |𝑚sc | ≥ 𝑐 in the last term (see Lemma 6.2). Notice that the definitions of ℰ𝑖 in (8.15) and (8.18) are different, although 2 their leading behavior −𝑚sc Υ𝑖 is the same. In both cases we can continue the 2 analysis by inverting the operator (1 − 𝑚sc 𝑆) to obtain

(8.18) 𝐯 =

1 ℰ, 2 𝑆 1 − 𝑚sc

1 ‖ ‖ hence ‖𝐯‖∞ ≤ ‖‖ ‖ℰ‖∞ = Γ‖ℰ‖∞ , 2 ‖ 𝑆 ‖∞→∞ 1 − 𝑚sc

and this relation shows how the quantity Γ, defined in (6.17), emerges. If the error term is indeed small and Γ is bounded, then we obtain that ‖𝐯‖∞ = max |𝐺𝑖𝑖 − 𝑚sc | is small. While the expansion logic behind equations (8.14) and (8.17) is the same and the resulting formulae are very similar, the structure of the main proof depends on which linearization of the self-consistent equation is used. In both cases we need to derive an a priori bound to ensure that the expansion is valid. Intuitively, the smallness of ∑𝑘 𝑠𝑖𝑘 𝑣𝑘 −Υ𝑖 seems easier than that of 𝑣𝑖 , since both terms are averaged quantities and extra averaging typically helps. But these are random objects and every estimate comes with an exceptional set in the probability space where it does not hold. It turns out that on the technical level it is worth minimizing the bookkeeping of these events, and this reason favors the second version of the linearization, which operates with controlling a single quantity, max𝑖 |𝑣𝑖 |. In this book, therefore, we will follow the linearization (8.17). We remark that the other option was used in [69, 70], which required first proving the weak semicircle law, Theorem 7.1, to provide the necessary a priori bound. The linearization (8.17) circumvents this step and the a priori bound on 𝑣𝑖 will be proved directly.

8.3. PROOF OF THE LOCAL SEMICIRCLE LAW WITHOUT USING THE SPECTRAL GAP

67

8.3. Proof of the Local Semicircle Law Without Using the Spectral Gap In this section we prove a restricted version of Theorem 6.7; namely, we replace threshold ˜ 𝜂 𝐸 with a larger threshold 𝜂𝐸 defined as 𝜂𝐸 ∶= min{𝜂 ∶ (8.19)

1 𝑀 −𝛾 𝑀 −2𝛾 ≤ min{ , } 𝑀𝜉 Γ(𝐸 + i𝜉)3 Γ(𝐸 + i𝜉)4 Im 𝑚sc (𝐸 + i𝜉) holds for all 𝜉 ≥ 𝜂}.

Γ is replaced with the larger This definition is exactly the same as (6.28), but ˜ quantity Γ; in other words, we do not make use of the spectral gap in 𝑆. This will pedagogically simplify the presentation, but it will prove the estimates in Theorem 6.7 only for the 𝜂 ≥ 𝜂𝐸 regime. In Chapter 9 we will give the proof for the entire 𝜂 ≥ ˜ 𝜂 𝐸 regime. We recall Lemma 6.3 showing that there is no difference between Γ and ˜ Γ for generalized Wigner matrices away from the edges (both are of order 1), so readers interested in the local semicircle law only in the bulk should be content with the simpler proof. Near the spectral edges, however, there is a substantial difference. Note that even in the Wigner case (see (6.22)), 𝜂𝐸 is much larger near the spectral edges than the optimal threshold ˜ 𝜂 𝐸 . For generalized Wigner matrices, while ˜ 𝜂 𝐸 ≫ 𝑁1 , the threshold 𝜂𝐸 is determined by the relation 𝑁𝜂𝐸 (𝜅𝐸 + 𝜂𝐸 )3/2 ≫ 1, where 𝜅𝐸 = ||𝐸| − 2| is the distance of 𝐸 from the spectral edges. We stress, however, that the proof given below does not use any modelspecific upper bound on Γ, such as (6.22) or (6.24); only the trivial lower and upper bounds, (6.19) and (6.20), are needed. The actual size of Γ enters only implicitly by determining the threshold 𝜂𝐸 . This makes the argument applicable to a wide class of problems beyond generalized Wigner matrices, including band matrices; see [55]. Definition 8.4. A deterministic nonnegative function Ψ ≡ Ψ(𝑁) (𝑧) is called an admissible control parameter if we have 𝑐𝑀 −1/2 ≤ Ψ ≤ 𝑀 −𝑐

(8.20)

for some constant 𝑐 > 0 and large enough 𝑁. Moreover, after fixing a 𝛾 > 0, we call any (possibly 𝑁-dependent) subset D = D(𝑁) ⊂ {𝑧 ∶ |𝐸| ≤ 10, 𝑀 −1 ≤ 𝜂 ≤ 10} a spectral domain. In this section we will mostly use the spectral domain (8.21)

S ∶= {𝑧 ∶ |𝐸| ≤ 10, 𝜂 ∈ [𝜂𝐸 , 10]}

where we note that (8.22)

𝜂𝐸 ≥

1 −1+𝛾 𝑀 , 8

68

8. PROOF OF THE LOCAL SEMICIRCLE LAW

using the lower bound Γ ≥ 𝑐 from (6.19) in the definition (8.19). Define the random control parameters (8.23)

Λo ∶= max|𝐺𝑖𝑗 |, 𝑖≠𝑗

Λd ∶= max|𝐺𝑖𝑖 − 𝑚sc |, 𝑖

Λ ∶= max(Λo , Λd ),

where the letters d and o refer to diagonal and off-diagonal elements. In the typical regime that we will work, all these quantities are small. The key quantity is Λ, and we will develop an iterative argument to control it. We first derive an estimate of Λo + |Υ𝑖 | in terms of Λ. This will be possible only in the event when Λ is already small, so we will need to introduce an indicator function 𝜙 = 1(Λ ≤ 𝑀 −𝑐 ) with some small 𝑐. More generally, we will consider any indicator function 𝜙 so that 𝜙Λ ≺ 𝑀 −𝑐 . Notice that this is a somewhat weaker concept than 𝜙Λ ≤ 𝑀 −𝑐 (even if the exponent 𝑐 is slightly adjusted), but it turns out to be more flexible since algebraic manipulations involving ≺ (see Proposition 6.5) can be directly used. 8.3.1. Large-Deviation Bounds on 𝚲o and 𝚼𝐢 . Lemma 8.5. The following statements hold for any spectral domain 𝐃: Let 𝜙 be the indicator function of some (possibly 𝑧-dependent) event. If 𝜙Λ ≺ 𝑀 −𝑐 for some 𝑐 > 0 holds uniformly in 𝑧 ∈ 𝐃, then (8.24)

𝜙(Λo + |𝑍𝑖 | + |Υ𝑖 |) ≺

Im 𝑚sc + Λ 𝑀𝜂 √

uniformly in 𝑧 ∈ 𝐃. Moreover, for any fixed (𝑁-independent) 𝜂 > 0 we have (8.25)

Λo + |𝑍𝑖 | + |Υ𝑖 | ≺ 𝑀 −1/2

uniformly in 𝑧 ∈ {𝑤 ∈ 𝐃 ∶ Im 𝑤 = 𝜂}. In other words, (8.24) means that (8.26)

Λo + |𝑍𝑖 | + |Υ𝑖 | ≺

Im 𝑚sc + Λ 𝑀𝜂 √

on the event where Λ ≺ 𝑀 −𝑐 has been a priori established. Proof of Lemma 8.5. We first observe that 𝜙Λ ≺ 𝑀 −𝑐 ≪ 1 and the positive lower bound |𝑚sc (𝑧)| ≥ 𝑐 implies that (8.27)

𝜙 ≺ 1. |𝐺𝑖𝑖 |

A simple iteration of the expansion formula (8.1) concludes that (8.28)

(𝕋) 𝜙 ||𝐺𝑖𝑗 || ≺ 𝑀 −𝑐 for 𝑖 ≠ 𝑗,

for any subset 𝕋 of fixed cardinality.

(𝕋) 𝜙||𝐺𝑖𝑖 || ≺ 1,

𝜙 |𝐺𝑖𝑖(𝕋) |

≺ 1,

8.3. PROOF OF THE LOCAL SEMICIRCLE LAW WITHOUT USING THE SPECTRAL GAP

69

We begin with the first statement in Lemma 8.5. First we estimate 𝑍𝑖 , which we split as (8.29)

| (𝑖) | (𝑖) | (𝑖) | (𝑖) 2 𝜙|𝑍𝑖 | ≤ 𝜙|∑(|ℎ𝑖𝑘 | − 𝑠𝑖𝑘 )𝐺𝑘𝑘 | + 𝜙| ∑ ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑖 |. |𝑘 | |𝑘≠𝑙 |

We estimate each term using Theorem 7.7 by conditioning on 𝐺 (𝑖) and using (𝑖) the fact that the family (ℎ𝑖𝑘 )𝑁 𝑘=1 is independent of 𝐺 . By (7.57) the first term of (8.29) is stochastically dominated by 1/2 2 | (𝑖) |2 𝜙[∑ 𝑠𝑖𝑘 𝐺𝑘𝑘 ] 𝑘 (𝑖)

≺ 𝑀 −1/2 ,

where (8.28), (6.3), and (6.1) were used. For the second term of (8.29) we apply 1/2 (𝑖) 1/2 −1/2 (7.59) from Theorem 7.7 with 𝑎𝑘𝑙 = 𝑠𝑖𝑘 𝐺𝑘𝑙 𝑠𝑙𝑖 and 𝑋𝑘 = 𝑠𝑖𝑘 ℎ𝑖𝑘 . We find (𝑖)

(8.30)

(𝑖)

2 2 𝜙 (𝑖) (𝑖) ∑ 𝑠𝑖𝑘 ||𝐺𝑘𝑙 || 𝜙 ∑ 𝑠𝑖𝑘 ||𝐺𝑘𝑙 || 𝑠𝑙𝑖 ≤ 𝑀 𝑘,𝑙 𝑘,𝑙 (𝑖)

=

Im 𝑚sc + Λ 𝜙 (𝑖) ∑ 𝑠𝑖𝑘 Im 𝐺𝑘𝑘 ≺ , 𝑀𝜂 𝑘 𝑀𝜂

where in the last step we used (8.1) and the estimate 1/𝐺𝑖𝑖 ≺ 1. Thus, we get (8.31)

Im 𝑚sc + Λ , 𝑀𝜂 √

𝜙|𝑍𝑖 | ≺

where we absorbed the bound 𝑀 −1/2 on the first term of (8.29) into the righthand side of (8.31). Here we only needed to use Im 𝑚sc (𝑧) ≥ 𝑐𝜂 as follows from an explicit estimate; see (6.13). Next, we estimate Λo . We can iterate (8.2) once to get, for 𝑖 ≠ 𝑗, (𝑖)

(8.32)

𝐺𝑖𝑗 =

(𝑖) −𝐺𝑖𝑖 ∑ ℎ𝑖𝑘 𝐺𝑘𝑗 𝑘

The term ℎ𝑖𝑗 is trivially 𝑂≺ (𝑀

−1/2

=

(𝑖) −𝐺𝑖𝑖 𝐺𝑗𝑗 (ℎ𝑖𝑗

(𝑖𝑗)

(𝑖𝑗)

− ∑ ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑗 ). 𝑘,𝑙

). In order to estimate the other term, we (𝑖𝑗)

1/2 1/2 −1/2 𝐺𝑘𝑙 𝑠𝑙𝑗 , 𝑋𝑘 = 𝑠𝑖𝑘 ℎ𝑖𝑘 , and invoke (7.58) from Theorem 7.7 with 𝑎𝑘𝑙 = 𝑠𝑖𝑘 −1/2 𝑌𝑙 = 𝑠𝑙𝑗 ℎ𝑙𝑗 . As in (8.30), we find (𝑖)

2 Im 𝑚sc + Λ (𝑖𝑗) 𝜙 ∑ 𝑠𝑖𝑘 ||𝐺𝑘𝑙 || 𝑠𝑙𝑗 ≺ , 𝑀𝜂 𝑘,𝑙

and thus (8.33)

𝜙Λo ≺

Im 𝑚sc + Λ , 𝑀𝜂 √

where we again absorbed the term ℎ𝑖𝑗 ≺ 𝑀 −1/2 into the right-hand side.

70

8. PROOF OF THE LOCAL SEMICIRCLE LAW

In order to estimate 𝐴𝑖 and ℎ𝑖𝑖 in the definition of Υ𝑖 , we use (8.28) to get 𝜙[|𝐴𝑖 | + |ℎ𝑖𝑖 |] ≺ 𝜙Λ2o + 𝑀 −1/2 ≤ 𝜙Λo + 𝐶

Im 𝑚sc Im 𝑚sc + Λ ≺ , 𝑀𝜂 √ √ 𝑀𝜂

where the second step follows from Im 𝑚sc ≥ 𝑐𝜂. Collecting (8.31) and (8.33), this completes the proof of (8.24). (𝑖) The proof of (8.25) is almost identical to that of (8.24). The quantities |𝐺𝑘𝑘 | (𝑖𝑗)

and |𝐺𝑘𝑘 | are estimated by the trivial deterministic bound 𝜂−1 = 𝑂(1). We omit the details. □ 8.3.2. Initial Bound on 𝚲. In this subsection, we prove an initial bound asserting that Λ ≺ 𝑀 −1/2 holds for 𝑧 with a large imaginary part 𝜂. This bound is rather easy to get after we have proved in (8.25) that the error term Υ𝑖 in the self-consistent equation (8.16) is small. Lemma 8.6. We have Λ ≺ 𝑀 −1/2 uniformly in 𝑧 ∈ [−10, 10] + 2i. Proof. We shall make use of the trivial bounds |𝐺 (𝕋) | ≤ 1 = 1 , |𝑚 | ≤ 1 = 1 , (8.34) sc | 𝑖𝑗 | 𝜂 2 𝜂 2 where the last inequality follows from the fact that 𝑚sc is the Stieltjes transform of a probability measure. From (8.25) we get Λo + |𝑍𝑖 | ≺ 𝑀 −1/2 .

(8.35)

Moreover, we use (8.1) and (8.32) to estimate (𝑖𝑗) (𝑖) | | | 𝐺𝑖𝑗 𝐺𝑗𝑖 | (𝑖𝑗) (𝑖) | ≤ 𝑀 −1 + ∑ 𝑠𝑖𝑗 ||𝐺𝑗𝑖 𝐺𝑗𝑗 || ||ℎ𝑖𝑗 − ∑ ℎ𝑖𝑘 𝐺𝑘𝑙 ℎ𝑙𝑗 || ≺ 𝑀 −1/2 , |𝐴𝑖 | ≤ ∑ 𝑠𝑖𝑗 | | 𝐺𝑖𝑖 | | | 𝑗 𝑗 𝑘,𝑙

where the last step follows by using (7.58), exactly as the estimate of the righthand side of (8.32) in the proof of Lemma 8.5. We conclude that |Υ𝑖 | ≺ 𝑀 −1/2 . Next, we write (8.16) as (8.36)

𝑣𝑖 =

𝑚sc (∑𝑘 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 ) −1 − ∑𝑘 𝑠𝑖𝑘 𝑣𝑘 + Υ𝑖 𝑚sc

.

−1 Using |𝑚sc | ≥ 2 and |𝑣𝑘 | ≤ 1 as follows from (8.34), we find

| −1 | |𝑚sc + ∑ 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 | ≥ 1 + 𝑂≺ (𝑀 −1/2 ). | | 𝑘

Using |𝑚sc | ≤ conclude that (8.37)

1 2

and taking the maximum over all 𝑖 in (8.36), we therefore Λd ≤

Λd + 𝑂≺ (𝑀 −1/2 ) Λd = + 𝑂≺ (𝑀 −1/2 ) , 2 2 + 𝑂≺ (𝑀 −1/2 )

from which the claim follows together with the estimate on Λo from (8.35).



8.3. PROOF OF THE LOCAL SEMICIRCLE LAW WITHOUT USING THE SPECTRAL GAP

71

8.3.3. A Rough Bound on 𝚲 and a Continuity Argument. The next step is to get a rough bound to Λ by a continuity argument. Proposition 8.7. We have Λ ≺ 𝑀 −𝛾/3 Γ−1 uniformly in the domain 𝐒 defined in (8.21). Proof. The core of the proof is a continuity argument. The first task is to establish a gap in the range of Λ by establishing a dichotomy. Roughly speaking, the following lemma asserts that, for all 𝑧 ∈ 𝐒, with high probability either Λ ≤ 𝑀 −𝛾/2 Γ−1 or Λ ≥ 𝑀 −𝛾/4 Γ−1 ; i.e., there is a gap or forbidden region in the range of Λ with very high probability. Lemma 8.8. We have the bound 𝟏(Λ ≤ 𝑀 −𝛾/4 Γ−1 )Λ ≺ 𝑀 −𝛾/2 Γ−1 uniformly in 𝐒. Proof of Lemma 8.8 . Set 𝜙 ∶= 𝟏(Λ ≤ 𝑀 −𝛾/4 Γ−1 ). Then by definition we have 𝜙Λ ≤ 𝑀 −𝛾/4 Γ−1 ≤ 𝐶𝑀 −𝛾/4 , where in the last step we have used that Γ is bounded below (6.19). Hence we may invoke (8.24) to estimate Λo and Υ𝑖 by √(Im 𝑚sc + Λ)/𝑀𝜂. In order to estimate Λd , we use (8.18) to get 𝜙Λd = 𝜙 max|𝑣𝑖 | ≺ Γ(Λ2 +

(8.38)

𝑖

Im 𝑚sc + Λ ). 𝑀𝜂 √

Recalling (6.19) and (8.24), we therefore get 𝜙Λ ≺ 𝜙Γ(Λ2 +

(8.39)

Im 𝑚sc + Λ ). 𝑀𝜂 √

Next, by definition of 𝜙, we may estimate 𝜙ΓΛ2 ≤ 𝑀 −𝛾/2 Γ−1 . Moreover, by definition of 𝐒, we have 1 𝑀 −𝛾 𝑀 −2𝛾 ≤ min{ 3 , 4 }. 𝑀𝜂 Γ Im 𝑚sc Γ

(8.40)

Together with the definition of 𝜙, we have 𝜙Γ

Im 𝑚sc + Λ Im 𝑚sc Γ−1 ≤Γ +Γ 𝑀𝜂 √ 𝑀𝜂 √ √ 𝑀𝜂 ≤ 𝑀 −𝛾 Γ−1 + 𝑀 −𝛾/2 Γ−1 ≤ 2𝑀 −𝛾/2 Γ−1 .

(Notice that the middle inequality is the crucial place where the definition of 𝜂𝐸 and the restriction 𝜂 ≥ 𝜂𝐸 are used.) Plugging this into (8.39) yields 𝜙Λ ≺ 𝑀 −𝛾/2 Γ−1 , which is the claim. □

72

8. PROOF OF THE LOCAL SEMICIRCLE LAW

Figure 8.1. The (𝜂, Λ)-plane for a fixed 𝐸 with the graph of 𝜂 → Λ(𝐸 + i𝜂). The shaded region is forbidden with high probability by Lemma 8.8. The initial estimate, Lemma 8.6, is marked with a black dot. The graph of Λ = Λ(𝐸 + i𝜂) is continuous and lies beneath the shaded region. Note that this method does not control Λ(𝐸 + 𝑖𝜂) in the regime 𝜂 ≤ 𝜂𝐸 . If we knew that Λ is excluded from the interval [𝑀 −𝛾/2 Γ−1 , 𝑀 −𝛾/4 Γ−1 ], then we could immediately finish the proof of Proposition 8.7. We could argue that Λ = Λ(𝐸 + 𝑖𝜂) is continuous in 𝜂 = Im 𝑧 and hence cannot jump from one side of the gap to the other; moreover, for 𝜂 = 2 it is below the gap by Lemma 8.6, so Λ is below the gap for all 𝑧 ∈ 𝐒 with high probability. For a pictorial illustration of this argument, see Figure 8.1 (borrowed from [55]). However, Lemma 8.8 guarantees a gap in the range of Λ only with a very high probability for each fixed 𝑧. We need to use a fine discrete grid in the space of 𝑧 to upgrade this statement to all 𝑧 with high probability. Then the continuity argument described in the previous paragraph will be valid with high probability. In the next step we explain the details. The continuity argument. Fix 𝐷 > 10. Lemma 8.8 implies that for each 𝑧 ∈ 𝐒 the probability that Λ falls into the gap (or the forbidden region) is very small; i.e., we have (8.41)

ℙ(𝑀 −𝛾/3 Γ(𝑧)−1 ≤ Λ(𝑧) ≤ 𝑀 −𝛾/4 Γ(𝑧)−1 ) ≤ 𝑁 −𝐷

for 𝑁 ≥ 𝑁0 where 𝑁0 ≡ 𝑁0 (𝛾, 𝐷) does not depend on 𝑧. All of the argument below is valid for 𝑁 ≥ 𝑁0 . Next, take a lattice Δ ⊂ 𝐒 such that |Δ| ≤ 𝑁 10 ; for each 𝑧 ∈ 𝐒 there exists a 𝑤 ∈ Δ such that |𝑧 − 𝑤| ≤ 𝑁 −4 . Then (8.41) applied to all elements of Δ, a

8.3. PROOF OF THE LOCAL SEMICIRCLE LAW WITHOUT USING THE SPECTRAL GAP

73

simple union bound gives (8.42)

ℙ(∃𝑤 ∈ Δ ∶ 𝑀 −𝛾/3 Γ(𝑤)−1 ≤ Λ(𝑤) ≤ 𝑀 −𝛾/4 Γ(𝑤)−1 ) ≤ 𝑁 −𝐷+10 .

From the definitions of Λ(𝑧), Γ(𝑧), and 𝐒 (recall (6.19)), we immediately find that Λ and Γ are Lipschitz continuous on 𝐒, with Lipschitz constant at most 𝑀 2 . Hence (8.42) implies that ℙ(∃𝑧 ∈ 𝐒 ∶ 2𝑀 −𝛾/3 Γ(𝑧)−1 ≤ Λ(𝑧) ≤ 2−1 𝑀 −𝛾/4 Γ(𝑧)−1 ) ≤ 𝑁 −𝐷+10 ; i.e., a slightly smaller gap is present simultaneously for all 𝑧 ∈ 𝐒 and not only for the discrete lattice Δ. We conclude that there is an event Ξ satisfying ℙ(Ξ) ≥ 1 − 𝑁 −𝐷+10 such that, for each 𝑧 ∈ 𝐒, either 𝟏(Ξ)Λ(𝑧) ≤ 2𝑀 −𝛾/3 Γ(𝑧)−1 or 𝟏(Ξ)Λ(𝑧) ≥ 2−1 𝑀 −𝛾/4 Γ(𝑧)−1 . Since Λ is continuous and 𝐒 is by definition connected, we conclude that either (8.43)

∀𝑧 ∈ 𝐒 ∶ 𝟏(Ξ)Λ(𝑧) ≤ 2𝑀 −𝛾/3 Γ(𝑧)−1

or (8.44)

∀𝑧 ∈ 𝐒 ∶ 𝟏(Ξ)Λ(𝑧) ≥ 2−1 𝑀 −𝛾/4 Γ(𝑧)−1 .

Here the bounds (8.43) and (8.44) each hold surely, i.e., for every realization of Λ(𝑧). It remains to show that (8.44) is impossible. In order to do so, it suffices to show that there exists a 𝑧 ∈ 𝐒 such that Λ(𝑧) < 2−1 𝑀 −𝛾/4 Γ(𝑧)−1 with probability greater than 1/2. But this holds for any 𝑧 with Im 𝑧 = 2, as follows from Lemma 8.6 and the bound Γ ≤ 𝐶𝜂 −1 (6.20). This concludes the proof of Proposition 8.7. □ 8.3.4. Fluctuation Averaging Lemma. Recall the definition of the small control parameter Π from (6.30) and the definition of the average [𝑎] of a vector (𝑎𝑖 )𝑁 𝑖=1 from (8.12). We have shown in Lemma 8.5 that |Υ𝑖 | ≺ Π, but in fact the average [Υ] is one order better; it is Π2 . This is due to the fluctuation averaging phenomenon, which we state as the following lemma. We will explain its proof and related earlier results in Chapter 10. We shall perform the averaging with respect to a family of complex weights 𝑇 = (𝑡𝑖𝑘 ) satisfying (8.45)

0 ≤ |𝑡𝑖𝑘 | ≤ 𝑀 −1 ,

∑|𝑡𝑖𝑘 | ≤ 1. 𝑘

Typical example weights are 𝑡𝑖𝑘 = 𝑠𝑖𝑘 and 𝑡𝑖𝑘 = 𝑁 −1 . Note that in both of these cases 𝑇 commutes with 𝑆. Lemma 8.9 (Fluctuation averaging). Fix a spectral domain 𝐃 and a deterministic control parameter Ψ satisfying (8.20). Let the weights 𝑇 = (𝑡𝑖𝑘 ) satisfy (8.45). (i) If Λ ≺ Ψ, then we have (8.46)

∑ 𝑡𝑖𝑘 𝑄𝑘 𝐺𝑘𝑘 = 𝑂≺ (Ψ2 ). 𝑘

74

8. PROOF OF THE LOCAL SEMICIRCLE LAW

(ii) If Λd ≺ 𝑀 −𝑐 for some 𝑐 > 0 and Λo ≺ Ψo (with Ψo also satisfying (8.20)), then we have 1 ∑ 𝑡𝑖𝑘 𝑄𝑘 (8.47) = 𝑂≺ (Ψ2o ). 𝐺 𝑘𝑘 𝑘 (iii) Assume that 𝑇 commutes with 𝑆 and Λ ≺ Ψ. Then we have ∑ 𝑡𝑖𝑘 𝑣𝑘 = 𝑂≺ (ΓΨ2 ).

(8.48)

𝑘

If, additionally, we have ∑ 𝑡𝑖𝑘 = 1

(8.49)

for all 𝑖,

𝑘

then (8.50)

∑ 𝑡𝑖𝑘 (𝑣𝑘 − [𝑣]) = 𝑂≺ ( ˜ Γ Ψ2 )

for all 𝑖

𝑘

where we defined 𝑣𝑖 ∶= 𝐺𝑖𝑖 − 𝑚. The estimates (8.46)–(8.48) and (8.50) are uniform in the index 𝑖. Notice that the last statement (8.50) involves the parameter ˜ Γ instead of Γ, indicating that the spectral gap of 𝑆 is used in its proof. We now prove a simple corollary of the bound (8.47). Corollary 8.10. Suppose that Λd ≺ 𝑀 −𝑐 for some 𝑐 > 0 and Λo ≺ Ψo for some deterministic control parameter Ψo satisfying (8.20). Suppose that the weights 𝑇 = (𝑡𝑖𝑘 ) satisfy (8.45). Then we have ∑𝑘 𝑡𝑎𝑘 Υ𝑘 = 𝑂≺ (Ψ2𝑜 ). Proof. The claim easily follows from the Schur complement formula (8.7) written in the form 1 Υ𝑖 = 𝐴𝑖 + 𝑄𝑖 . 𝐺𝑖𝑖 We may therefore estimate ∑𝑘 𝑡𝑎𝑘 Υ𝑘 using the trivial bound |𝐴𝑖 | ≺ Ψ2𝑜 as well as the fluctuation averaging bound from (8.47). □ 8.3.5. The Final Iteration Scheme. First note that Proposition 8.7 guarantees that 𝜙 ≡ 1 may be chosen in Lemma 8.5, since the condition 𝜙Λ ≺ 𝑀 −𝑐 is satisfied. Therefore, Lemma 8.5 asserts that Λo is stochastically dominated by Im 𝑚sc + Λ Im 𝑚sc 𝑀 𝜀 ≤ 𝑀 −𝜀 Λ + + , 𝑀𝜂 𝑀𝜂 √ √ 𝑀𝜂 where we have used the Schwarz inequality. The next lemma, the main estimate behind the proof of Theorem 6.7, extends this estimate to bound Λd with the same quantity. Thus, roughly speaking, we can estimate Λ by 𝑀 −𝜀 Λ plus a deterministic error term. This gives a recursive relation on the upper bound for Λ, which will be the basic step of our iteration scheme.

8.3. PROOF OF THE LOCAL SEMICIRCLE LAW WITHOUT USING THE SPECTRAL GAP

75

Proposition 8.11. Let Ψ be a control parameter satisfying 𝑐𝑀 −1/2 ≤ Ψ ≤ 𝑀 −𝛾/3 Γ−1

(8.51)

and fix 𝜀 ∈ (0, 𝛾/3). Then on the domain 𝐒 we have the implication (8.52)

Λ≺Ψ



Λ ≺ 𝐹(Ψ) ,

where we defined 𝐹(Ψ) ∶= 𝑀 −𝜀 Ψ +

Im 𝑚sc 𝑀 𝜀 + . 𝑀𝜂 √ 𝑀𝜂

Proof. Suppose that Λ ≺ Ψ for some deterministic control parameter Ψ satisfying (8.51). We invoke Lemma 8.5 with 𝜙 = 1 (recall the bound (6.19)) to get (8.53)

Λo + |𝑍𝑖 | + |Υ𝑖 | ≺

Im 𝑚sc + Λ Im 𝑚sc + Ψ ≺ . 𝑀𝜂 𝑀𝜂 √ √

Next, we estimate Λd . Define the 𝑧-dependent indicator function 𝜓 ∶= 𝟏(Λ ≤ 𝑀 −𝛾/4 ). By (8.51), (6.19), and the assumption Λ ≺ Ψ, we have 1 − 𝜓 ≺ 0. On the event {𝜓 = 1}, (8.17) is rigorous and we get the bound 𝜓|𝑣𝑖 | ≤ 𝐶𝜓 ||∑ 𝑠𝑖𝑘 𝑣𝑘 − Υ𝑖 || + 𝐶𝜓Λ2 . 𝑘

Using the fluctuation averaging estimate (8.48) to bound ∑𝑘 𝑠𝑖𝑘 𝑣𝑘 and (8.53) to bound Υ𝑖 , we find (8.54)

𝜓|𝑣𝑖 | ≺ ΓΨ2 +

Im 𝑚sc + Ψ , 𝑀𝜂 √

where we used the assumption Λ ≺ Ψ and the lower bound from (6.19) so that 𝐶𝜓Λ2 ≤ ΓΨ2 . Since the set {𝜓 = 0} has very small probability (using our notation, this is expressed by 1 − 𝜓 ≺ 0), we conclude (8.55)

Λd ≺ ΓΨ2 +

Im 𝑚sc + Ψ , 𝑀𝜂 √

which, combined with (8.53), yields (8.56)

Λ ≺ ΓΨ2 +

Im 𝑚sc + Ψ . 𝑀𝜂 √

Using the Schwarz inequality and the assumption Ψ ≤ 𝑀 −𝛾/3 Γ−1 , we conclude the proof. □

76

8. PROOF OF THE LOCAL SEMICIRCLE LAW

Finally, we complete the proof of Theorem 6.7. It is easy to check that, on the domain 𝐒, if Ψ satisfies (8.51), then so does 𝐹(Ψ). In fact, this step is the origin of the somewhat complicated definition of 𝜂𝐸 . We may therefore iterate (8.52). This yields a bound on Λ that is essentially the fixed point of the map Ψ ↦ 𝐹(Ψ), which is given by Π, defined in (6.30) (up to the factor 𝑀 𝜀 ). More precisely, the iteration is started with Ψ0 ∶= 𝑀 −𝛾/3 Γ−1 ; the initial hypothesis Λ ≺ Ψ0 is provided by Proposition 8.7. For 𝑘 ≥ 1 we set Ψ𝑘+1 ∶= 𝐹(Ψ𝑘 ). Hence, from (8.52) we conclude that Λ ≺ Ψ𝑘 for all 𝑘. Choosing 𝑘 ∶= ⌈𝜀−1 ⌉ yields Λ≺

Im 𝑚sc 𝑀 𝜀 + . 𝑀𝜂 √ 𝑀𝜂

Since 𝜀 was arbitrary, we have proved that (8.57)

Λ≺Π=

Im 𝑚sc (𝑧) 1 + , 𝑀𝜂 𝑀𝜂 √

which is (6.31). What remains is to prove (6.32). On the set {𝜓 = 1}, we once again use (8.17) to get (8.58)

2 𝜓𝑚sc (− ∑ 𝑠𝑖𝑘 𝑣𝑘 + Υ𝑖 ) = −𝜓𝑣𝑖 + 𝑂(𝜓Λ2 ). 𝑘

Averaging in (8.58) yields 2 𝜓𝑚sc (−[𝑣] + [Υ]) = −𝜓[𝑣] + 𝑂(𝜓Λ2 ).

(8.59)

By (8.57) and (8.53) with Ψ = Π, we have Λ + |Υ𝑖 | ≺ Π. Moreover, by Corollary 8.10 (with the choice of 𝑡𝑖𝑘 = 𝑁1 ), we have |[Υ]| ≺ Π2 . Thus, we get 2 𝜓[𝑣] = 𝑚sc 𝜓[𝑣] + 𝑂≺ (Π2 ). 2 Since 1 − 𝜓 ≺ 0, we conclude that [𝑣] = 𝑚sc [𝑣] + 𝑂≺ (Π2 ). Therefore,

|[𝑣]| ≺ (8.60)

Im 𝑚sc Π2 1 2 ≤( + ) 2 2 2 𝑀𝜂 |1 − 𝑚sc | |1 − 𝑚sc | |1 − 𝑚sc |𝑀𝜂 2 Γ 𝐶 ≤ (𝐶 + ≤ . ) 𝑀𝜂 𝑀𝜂 𝑀𝜂

2 | In the third step, we used the elementary explicit bound Im 𝑚sc ≤ 𝐶|1 − 𝑚sc 2 −1 from (6.12)–(6.13) and the bound Γ ≥ |1 − 𝑚sc | from (6.21). Because |𝑚𝑁 − 𝑚sc | = |[𝑣]|, this concludes the proof of (6.32). The proof of (6.33) is exactly the same, just in the third inequality of (8.60) we may use the stronger bound Im 𝑚sc ≍ 𝜂/√𝜅 + 𝜂 from (6.13) in the regime |𝐸| ≥ 2. This completes the proof of Theorem 6.7 in the entire regime 𝐒, i.e., for 𝜂 ≥ 𝜂𝐸 .

8.3. PROOF OF THE LOCAL SEMICIRCLE LAW WITHOUT USING THE SPECTRAL GAP

77

8.3.6. Summary of the Proof: A Recapitulation. In order to highlight the main ideas again, we summarize the key steps leading to the proof of Theorem 6.7 restricted to the regime 𝜂 ≥ 𝜂𝐸 . As a guiding principle we stress that the main control parameter in the proof is Λ = max𝑖𝑗 {|𝐺𝑖𝑖 − 𝑚sc |, |𝐺𝑖𝑗 |}. The main goal is to get a closed, self-improving inequality for Λ, at least with a very high probability. We needed the following ingredients: (1) A self-consistent system of equations for the diagonal elements of the resolvent, written as 1 1 (8.61) −(𝑆𝐯)𝑖 + Υ𝑖 = − where 𝑣𝑖 = 𝐺𝑖𝑖 − 𝑚sc . 𝑚sc + 𝑣𝑖 𝑚sc Since Υ𝑖 contains the resolvent 𝐺 (𝑖) , we also need perturbation formulas connecting 𝐺 and 𝐺 (𝑖) to make these equations self-consistent. (2) A large-deviation bound on the off-diagonal elements and on the fluctuation, i.e., on Λo + |𝑍𝑖 |, as (8.62)

Λo + |𝑍𝑖 | + |Υ𝑖 | ≺

Im 𝑚sc + Λ . 𝑀𝜂 √

(3) An initial estimate on Λ for large 𝜂, where the a priori bounds |𝐺𝑖𝑗 | ≤ 𝜂−1 ≤ 𝐶 are effective. (4) A dichotomy, showing that there is a forbidden region for Λ: 𝟏(Λ ≤ 𝑀 −𝛾/4 Γ−1 )Λ ≺ 𝑀 −𝛾/2 Γ−1 . (5) A crude bound on Λ down to small 𝜂 obtained from the dichotomy and the initial estimate via a continuity argument. (6) Application of the fluctuation averaging lemma, i.e., Lemma 8.9, to estimate 𝑆𝐯 in the self-consistent equation. Thus 𝑆𝐯 becomes of order Λ2 , i.e., one order higher than the trivial bound |𝑣𝑖 | ≤ Λ. This “boost” exploits the cancellations in averages of weakly correlated centered random variables, and it constitutes the crucial improvement over Chapter 7. Thus, we can use (8.17) to have (8.63)

𝑣𝑖 ≲ Υ𝑖 + 𝑂(Λ2 ).

(7) Combination of the large-deviation bound (8.62) for Υ𝑖 with (8.63) provides an estimate on Λd = max𝑖 |𝑣𝑖 | in terms of Λ that is better than the trivial bound Λd ≤ Λ. Together with the large-deviation bound on Λo in (8.62), we have a closed inequality for Λ: Λ ≤ 𝑀 −𝜀 Λ +

Im 𝑚sc 𝑀 𝜀 + . 𝑀𝜂 √ 𝑀𝜂

The error term 𝑀 −𝜀 Λ can be absorbed into the left side, and we have proved the estimate on Λ asserted in Theorem 6.7. In practice, the last inequality was formulated in terms of a control parameter and we used an iteration scheme.

78

8. PROOF OF THE LOCAL SEMICIRCLE LAW

(8) Averaging the self-consistent equation (8.17) to obtain a stronger estimate on the average quantity [𝑣]. Fluctuation averaging is used again, this time to the average of Υ, so that it becomes one order higher. This concludes the proof of Theorem 6.7 for 𝜂 ≥ 𝜂𝐸 .

CHAPTER 9

Sketch of the Proof of the Local Semicircle Law Using the Spectral Gap In Section 8.3 we proved the local semicircle law, Theorem 6.7, uniformly for 𝜂 ≥ 𝜂𝐸 instead of the larger regime 𝜂 ≥ ˜ 𝜂 𝐸 . Recall that for generalized Wigner matrices the two thresholds 𝜂𝐸 and ˜ 𝜂 𝐸 are determined by the relation 𝑁𝜂𝐸 (𝜅𝐸 + 𝜂𝐸 )3/2 ≫ 1 and ˜ 𝜂 𝐸 ≫ 1/𝑁. Hence these two thresholds coincide in the bulk but substantially differ near the edge. In this section we sketch the proof of Theorem 6.7 for any 𝜂 ≥ ˜ 𝜂 𝐸 , i.e., for the optimal domain for 𝜂. For the complete proof we refer to section 6 of [55]. This section can be read independently of Section 8.3. We point out that the difference between these two thresholds stem from the difference between Γ and ˜ Γ (see (6.28) and (8.19)). The bound Γ on the 2 −1 norm of (1 − 𝑚 𝑆) entered the proof when the self-consistent equation (8.17) was solved. The key idea in this section is that we can use ˜ Γ to solve the selfconsistent equation (8.17) separately on the subspace of constants (the span of the vector e) and on its orthogonal complement e⟂ . Step 1. We bound the control parameter Λ = Λ(𝑧) from (8.23) in terms of Θ ∶= |𝑚𝑁 − 𝑚sc |. This is the content of the following lemma. From now on, we assume that 𝑐 ≤ ˜ Γ ≤ 𝐶, which is valid for generalized Wigner matrices (see (6.24)). This will simplify several estimates in the argument that follows; for the general case, we refer to [55]. Lemma 9.1. Define the 𝑧-dependent indicator function 𝜙 ∶= 𝟏(Λ ≤ 𝑀 −𝛾/4 )

(9.1)

and the random control parameter (9.2)

𝑞(Θ) ∶=

Im 𝑚sc + Θ 1 + , 𝑀𝜂 𝑀𝜂 √

Θ = |𝑚𝑁 − 𝑚sc |.

Then we have (9.3)

𝜙Λ ≺ Θ + 𝑞(Θ).

Proof. For the whole proof we work on the event {𝜙 = 1}; i.e., every quantity is multiplied by 𝜙. We consistently drop these factors 𝜙 from our notation in order to avoid cluttered expressions. In particular, we set Λ ≤ 𝑀 −𝛾/4 throughout the proof. 79

80

9. SKETCH OF THE PROOF OF THE LOCAL SEMICIRCLE LAW

We begin by estimating Λo and Λd in terms of Θ. Recalling (6.19), we find that 𝜙 satisfies the hypotheses of Lemma 8.5, from which we get (9.4)

Λo + |Υ𝑖 | ≺ 𝑟(Λ),

𝑟(Λ) ∶=

Im 𝑚sc + Λ . 𝑀𝜂 √

In order to estimate Λd , from (8.17) we have, on the event {𝜙 = 1}, that 2 ∑ 𝑠𝑖𝑘 𝑣𝑘 = 𝑂≺ (Λ2 + 𝑟(Λ)); 𝑣𝑖 − 𝑚sc

(9.5)

𝑘

here we used the bound (9.4) on |Υ𝑖 |. Next, we subtract the average 𝑁 −1 ∑𝑖 from each side to get 2 ∑ 𝑠𝑖𝑘 (𝑣𝑘 − [𝑣]) = 𝑂≺ (Λ2 + 𝑟(Λ)) (𝑣𝑖 − [𝑣]) − 𝑚sc 𝑘

where we used ∑𝑘 𝑠𝑖𝑘 = 1. Note that the average over 𝑖 of the left-hand side vanishes, so that the average of the right-hand side also vanishes. Hence, the 2 right-hand side is perpendicular to 𝐞. Inverting the operator 1 − 𝑚sc 𝑆 on the ⟂ subspace 𝐞 and using ˜ Γ ≤ 𝐶 therefore yields |𝑣𝑖 − [𝑣]| ≺ Λ2 + 𝑟(Λ).

(9.6)

Combining this with the bound Λo ≺ 𝑟(Λ) from (9.4) and recalling that Θ = |[𝑣]|, we therefore get Λ ≺ Θ + Λ2 + 𝑟(Λ).

(9.7)

By definition of 𝜙 we have Λ2 ≤ 𝑀 −𝛾/4 Λ, so that the second term on the righthand side of (9.7) may be absorbed into the left-hand side, (9.8)

Λ ≺ Θ + 𝑟(Λ),

where we used the cancellation property (6.25). Using (9.8) and the CauchySchwarz inequality, we get 𝑟(Λ) ≤

Im 𝑚sc + Λ Im 𝑚sc + Θ 𝑟(Λ) ≺ + 𝑀𝜂 𝑀𝜂 √ 𝑀𝜂 √ √ ≤

Im 𝑚sc + Θ 1 + 𝑀 −𝜖 𝑟(Λ) + 𝑀 𝜖 𝑀𝜂 𝑀𝜂 √

for any 𝜖 > 0. Since 𝜖 > 0 can be arbitrarily small, we conclude that (9.9)

𝑟(Λ) ≺ 𝑞(Θ).

Clearly, (9.3) follows from (9.8) and (9.9).



Step 2. Having estimated Λ in terms of Θ, we can derive a closed relation involving only Θ. More precisely, we will show that the following self-consistent equation holds: (9.10)

−1 2 [𝑣]2 ) = 𝜙𝑂≺ (𝑞(Θ)2 + 𝑀 −𝛾/4 Θ2 ). 𝜙((1 − 𝑚sc )[𝑣] − 𝑚sc

9. SKETCH OF THE PROOF OF THE LOCAL SEMICIRCLE LAW

81

Considering the right-hand side as a small error, we will view (9.10) as a small perturbation of a quadratic equation for [𝑣]. Recalling that Θ = |[𝑣]|, the error term can be determined self-consistently. Thus, up to the accuracy of the error terms, we can solve (9.10) for [𝑣]. We now give a heuristic proof for (9.10). There are two main issues that we ignore in this sketch. First, we work only on the event {𝜙 = 1}; i.e., we assume that an a priori bound Λ ≪ 1 has already been proved uniformly for all 𝜂 ≥ ˜ 𝜂 𝐸. It will require a separate argument to show that the complement event {𝜙 = 0} is negligible and, in some sense, this constitutes the essential part to extend the proof of Theorem 6.7 from 𝜂 ≥ 𝜂𝐸 to the larger regime 𝜂 ≥ ˜ 𝜂 𝐸 . Second, we will neglect certain subtleties of the ≺ relation. While most arithmetics involving ≺ work in the same way as the usual inequality relation ≤, the cancellation rule (6.25) requires the coefficient of 𝑋 on the right-hand side to be small. We will disregard this requirement below and we will treat ≺ as the usual ≤. Since we work on the event {𝜙 = 1}, it is understood that every quantity below is multiplied by 𝜙, but we will ignore this fact in the formulas. Recall from (8.17) that (9.11)

2 −1 2 2 ∑ 𝑠𝑖𝑘 𝑣𝑘 + 𝑚sc Υ𝑖 = 𝑚sc 𝑣𝑖 + 𝑂(Λ3 ). 𝑣𝑖 − 𝑚sc 𝑘

In order to take the average over 𝑖 and get a closed equation for [𝑣], we write, using (9.6), 𝑣𝑖2 = ([𝑣] + 𝑣𝑖 − [𝑣])2 = [𝑣]2 + 2[𝑣](𝑣𝑖 − [𝑣]) + 𝑂≺ ((Λ2 + 𝑟(Λ))2 ). Plugging this back into (9.11) and taking the average over 𝑖 gives 2 2 −1 [Υ] = 𝑚sc [𝑣]2 + 𝑂≺ (Λ3 + 𝑟(Λ)2 ). (1 − 𝑚sc )[𝑣] + 𝑚sc

By Corollary 8.10 with 𝑡𝑖𝑘 = (9.12)

1 , 𝑁

we can estimate

[Υ] ≺ Λ2o ≺ 𝑟(Λ)2 ≺ 𝑞(Θ)2 ,

where we have used (8.24) and (9.9). Hence, we have 2 −1 [𝑣]2 = 𝑂≺ (Λ3 + 𝑟(Λ)2 ) ≤ 𝑂≺ (Θ3 + 𝑞(Θ)2 ) (1 − 𝑚sc )[𝑣] − 𝑚sc

where we have used (9.3). This shows (9.10). Step 3. Finally, we need to solve the approximately quadratic relation (9.10) for [𝑣], keeping in mind that Θ = |[𝑣]|. This will give the main estimates (6.32) and (6.33) in Theorem 6.7. The entrywise version of the local semicircle law, (6.31), will then directly follow from (9.3) on the event {𝜙 = 1}. Recall the definition of 𝑞 from (9.2), so that (9.13)

𝑞(Θ)2 ≍

Im 𝑚sc + Θ 1 , + 𝑀𝜂 (𝑀𝜂)2

82

9. SKETCH OF THE PROOF OF THE LOCAL SEMICIRCLE LAW

and recall from Lemma 6.2 that 𝑐 ≤ |𝑚sc (𝑧)| ≤ 1 − 𝑐𝜂,

2 |1 − 𝑚sc (𝑧)| ≍ √𝜅 + 𝜂,

√𝜅 + 𝜂 Im 𝑚sc (𝑧) ≍ { 𝜂

(9.14)

√𝜅+𝜂

if |𝐸| ≤ 2, if |𝐸| ≥ 2.

In particular, the coefficient of Θ2 = [𝑣]2 in the left-hand side of (9.10) is a nonvanishing constant. In the right-hand side it is at most 𝑀 −𝛾/4 ≪ 1, so the latter can be absorbed into the former. We can thus rewrite (9.10) as [𝑣]2 − 𝐵[𝑣] = 𝐷,

(9.15) where 𝐵=

2 1 − 𝑚sc + 𝑂((𝑀𝜂)−1 ) −1 𝑚sc + 𝑂(𝑀 −𝛾/4 ) Im 𝑚sc

|𝐷| ≤

𝑀𝜂 −1 𝑚sc +

+

2 ≍ 1 − 𝑚sc + 𝑂((𝑀𝜂)−1 ),

1 (𝑀𝜂)2

𝑂(𝑀 −𝛾/4 )



Im 𝑚sc 1 . + 𝑀𝜂 (𝑀𝜂)2

This quadratic equation has two solutions 𝐵 ± √𝐵2 + 4𝐷 ; 2 the correct one is selected by a continuity argument. Fix an energy |𝐸| ≤ 2, consider [𝑣] = [𝑣(𝐸 + 𝑖𝜂)] as a function of 𝜂, and look at the two solution branches (9.16) as continuous functions of 𝜂. It is easy to check that for 𝜂 ≫ 1 large, we have |𝐵| ≍ 1 and |𝐷| ≪ 1, so the two solutions are well-separated; one of them is close to 0, the other one is close to 𝐵. The a priori bound |[𝑣]| ≤ 𝜂 −1 ≪ 1 guarantees that the correct branch is the one near 0. A more careful analysis of (9.16) with the given coefficient functions 𝐵(𝜂) and 𝐷(𝜂) shows that on this branch the following inequality holds: (9.16)

[𝑣] =

(9.17)

Θ = |[𝑣]| ≲ (𝑀𝜂)−1 .

To see this more precisely, we distinguish two cases by reducing 𝜂 from a very large value to 0. In the first regime, where 𝑀𝜂 √𝜅 + 𝜂 ≫ 1, we have |𝐵| ≍ √𝜅 + 𝜂 and |𝐷| ≲ √𝜅 + 𝜂/𝑀𝜂 ≪ |𝐵|2 , so the two branches remain separated and the relevant branch satisfies Θ ≲ |𝐷/𝐵| ≲ (𝑀𝜂)−1 . As we decrease 𝜂 and enter the regime 𝑀𝜂 √𝜅 + 𝜂 ≲ 1, both solutions satisfy Θ ≲ |𝐵| + √|𝐷|. In this regime, |𝐵| ≲ (𝑀𝜂)−1 and |𝐷| ≲ (𝑀𝜂)−2 , so we also have (9.17). In both cases we conclude the final bound (6.32). The proof of (6.33) for the regime |𝐸| > 2 is similar. The improvement comes from the fact that in this regime (9.14) gives a better bound for Im 𝑚sc (𝑧).

CHAPTER 10

Fluctuation Averaging Mechanism 10.1. Intuition Behind the Fluctuation Averaging Before starting the proof of Lemma 8.9, we explain why it is so important in our context. We also give a heuristic second moment calculation to show the essential mechanism. The leading error in the self-consistent equation (8.58) for 𝑣𝑖 is Υ𝑖 . Among the three summands of Υ𝑖 (see (8.11)), typically 𝑍𝑖 is the largest; thus, we typically have (10.1)

𝑣𝑖 ≺ 𝑍𝑖

in the regime where Γ is bounded. The large-deviation bounds in Theorem 7.7 show that 𝑍𝑖 ≺ Λo , and a simple second-moment calculation shows that this bound is essentially optimal. On the other hand, (8.3) shows that the typical size of the off-diagonal resolvent matrix elements is at least of order (𝑁𝜂)−1/2 ; thus, the estimate 1 𝑍𝑖 ≺ Λ o ≺ √𝑁𝜂 is essentially optimal in the generalized Wigner case (𝑀 ≍ 𝑁). Together with (10.1) this shows that the natural bound for Λ is (𝑁𝜂)−1/2 , which is also reflected in the bound (6.31). However, the bound (6.32) for the average, 𝑚𝑁 − 𝑚 = [𝑣], is of order Λ2 ≍ (𝑁𝜂)−1 ; i.e., it is one order better than the bound for 𝑣𝑖 . For the purpose of estimating [𝑣], it is the average [Υ] of the leading errors Υ𝑖 that matters (see (8.59)). Since 𝑍𝑖 , the leading term in Υ𝑖 , is a fluctuating quantity with zero expectation, the improvement comes from the fact that fluctuations cancel out in the average. The basic example of this phenomenon is the central limit theorem; the average of 𝑁 independent centered random variables is typically of order 𝑁 −1/2 , much smaller than the size of each individual variable. In our case, however, the 𝑍𝑖 are not independent. In fact, their correlations do not decay, at least in the Wigner case where the column of 𝐻 with an index other than 𝑖 plays the same role in the function 𝑍𝑖 . Thus, standard results on central limit theorems for weakly correlated random variables do not apply here and a new argument is needed. We first give an idea how the fluctuation averaging mechanism works by a second-moment calculation. First, we claim that 1 | | |𝑄𝑘 | ≺ Ψo . (10.2) | 𝐺𝑘𝑘 | 83

84

10. FLUCTUATION AVERAGING MECHANISM

Indeed, from the Schur complement formula (8.7) we get |𝑄𝑘 (𝐺𝑘𝑘 )−1 | ≤ |ℎ𝑘𝑘 |+ |𝑍𝑘 |. The first term is estimated by |ℎ𝑘𝑘 | ≺ 𝑀 −1/2 ≤ Ψo . The second term is estimated exactly as in (8.29) and (8.30), giving |𝑍𝑘 | ≺ Ψo . In fact, the same (𝕋) bound holds if 𝐺𝑘𝑘 is replaced with 𝐺𝑘𝑘 as long as the cardinality of 𝕋 is bounded (see (10.10) later). Abbreviate 𝑋𝑘 ∶= 𝑄𝑘 (𝐺𝑘𝑘 )−1 and compute the variance: (10.3)

2

2 𝔼||∑ 𝑡𝑖𝑘 𝑋𝑘 || = ∑ 𝑡𝑖𝑘 𝑡𝑖𝑙 𝔼𝑋𝑘 𝑋𝑙 = ∑ 𝑡𝑖𝑘 𝔼𝑋𝑘 𝑋𝑘 + ∑ 𝑡𝑖𝑘 𝑡𝑖𝑙 𝔼𝑋𝑘 𝑋𝑙 . 𝑘

𝑘

𝑘,𝑙

𝑘≠𝑙

Using the bounds (8.45) on 𝑡𝑖𝑘 and (10.2), we find that the first term on the right-hand side of (10.3) is 𝑂≺ (𝑀 −1 Ψ2o ) = 𝑂≺ (Ψ4o ), where we used that Ψo is admissible, recalling (8.20). Let us therefore focus on the second term of (10.3). Using the fact that 𝑘 ≠ 𝑙, we apply the first resolvent decoupling formula (8.1) to 𝑋𝑘 and 𝑋𝑙 to get 𝔼𝑋𝑘 𝑋𝑙 = 𝔼𝑄𝑘 ( (10.4) = 𝔼𝑄𝑘 (

1 1 )𝑄𝑙 ( ) 𝐺𝑘𝑘 𝐺𝑙𝑙 1 (𝑙) 𝐺𝑘𝑘



𝐺𝑘𝑙 𝐺𝑙𝑘 (𝑙) 𝐺𝑘𝑘 𝐺𝑘𝑘 𝐺𝑙𝑙

)𝑄𝑙 (

1 (𝑘) 𝐺𝑙𝑙

𝐺𝑙𝑘 𝐺𝑘𝑙



(𝑘)

).

𝐺𝑙𝑙 𝐺𝑙𝑙 𝐺𝑘𝑘

Notice that we used (8.1) in the form (10.5)

1 (𝕋)

𝐺𝑘𝑘

=

(𝕋)

1 (𝕋𝑙)

𝐺𝑘𝑘



(𝕋)

𝐺𝑘𝑙 𝐺𝑙𝑘 (𝕋)

(𝕋𝑙)

(𝕋)

for any 𝑘 ≠ 𝑙, 𝑘, 𝑙 ∉ 𝕋

𝐺𝑘𝑘 𝐺𝑘𝑘 𝐺𝑙𝑙

with 𝕋 = ∅. We multiply out the parentheses on the right-hand side of (10.4). The crucial observation is that if the random variable 𝑌 is independent of 𝑖 (see Definition 7.3) then not only 𝑄𝑖 𝑌 = 0 but also 𝔼𝑄𝑖 (𝑋)𝑌 = 𝔼𝑄𝑖 (𝑋𝑌) = 0 hold for any 𝑋. Hence out of the four terms obtained from the right-hand side of (10.4), the only nonvanishing one is 𝔼𝑄𝑘 (

𝐺𝑘𝑙 𝐺𝑙𝑘 (𝑙) 𝐺𝑘𝑘 𝐺𝑘𝑘 𝐺𝑙𝑙

)𝑄𝑙 (

𝐺𝑙𝑘 𝐺𝑘𝑙 (𝑘) 𝐺𝑙𝑙 𝐺𝑙𝑙 𝐺𝑘𝑘

) ≺ Ψ4o

where we used that the denominators are harmless (see (8.27)–(8.28)). Together 2 with (8.45), this concludes the proof of 𝔼|∑𝑘 𝑡𝑖𝑘 𝑋𝑘 | ≺ Ψ4o , which means that ∑𝑘 𝑡𝑖𝑘 𝑋𝑘 is bounded by Ψ2o in the second-moment sense. 10.2. Proof of Lemma 8.9 Here we follow the presentation of the proof from [55], and we will comment on previous results in Section 10.3.1. We start with a simple lemma that summarizes the key properties of ≺ when combined with partial expectation. Lemma 10.1. Suppose that the deterministic control parameter Ψ satisfies Ψ ≥ 𝑁 −𝐶 , and that for all 𝑝 there is a constant 𝐶𝑝 such that the nonnegative

10.2. PROOF OF LEMMA 8.9

85

random variable 𝑋 satisfies 𝔼𝑋 𝑝 ≤ 𝑁 𝐶𝑝 . Suppose, moreover, that 𝑋 ≺ Ψ. Then, for any fixed 𝑛 ∈ ℕ we have 𝔼𝑋 𝑛 ≺ Ψ𝑛 .

(10.6)

(Note that this estimate involves deterministic quantities only; i.e., it means that 𝔼𝑋 𝑛 ≤ 𝑁 𝜀 Ψ𝑛 for any 𝜀 > 0 if 𝑁 ≥ 𝑁0 (𝑛, 𝜀).) Conversely, if (10.6) holds for any 𝑛 ∈ ℕ, then 𝑋 ≺ Ψ. Moreover, we have (10.7)

𝑃𝑖 𝑋 𝑛 ≺ Ψ𝑛 ,

𝑄𝑖 𝑋 𝑛 ≺ Ψ𝑛 ,

uniformly in 𝑖. If 𝑋 = 𝑋(𝑢) and Ψ = Ψ(𝑢) depend on some parameter 𝑢, and the above assumptions are uniform in 𝑢, then so are the conclusions. This lemma is a generalization of (6.26), but notice that the majoring quantity Ψ has to be deterministic. For general random variables, 𝑋 ≺ 𝑌 may not imply the analogous relation 𝔼[𝑋|ℱ] ≺ 𝔼[𝑌|ℱ] for conditional expectations with respect to a 𝜎-algebra ℱ. Proof of Lemma 10.1. To prove (10.6), it is enough to consider the case 𝑛 = 1; the case of larger 𝑛 follows immediately from the case 𝑛 = 1 using the basic property of the stochastic domination that 𝑋 ≺ Ψ implies 𝑋 𝑛 ≺ Ψ𝑛 . For the first claim, pick 𝜖 > 0. Then 𝔼𝑋 = 𝔼𝑋 ⋅ 𝟏(𝑋 ≤ 𝑁 𝜖 Ψ) + 𝔼𝑋 ⋅ 𝟏(𝑋 > 𝑁 𝜖 Ψ) ≤ 𝑁 𝜖 Ψ + √𝔼𝑋 2 √ℙ(𝑋 > 𝑁 𝜖 Ψ) ≤ 𝑁 𝜖 Ψ + 𝑁 𝐶2 /2−𝐷/2 for arbitrary 𝐷 > 0. The first claim therefore follows by choosing 𝐷 large enough. For the converse statement, we use the Chebyshev inequality: for any 𝜀 > 0 and 𝐷 > 0, ℙ(𝑋 ≥ 𝑁 𝜀 Ψ) ≤

𝔼𝑋 𝑛 ≤ 𝑁 𝜀−𝜀𝑛 ≤ 𝑁 −𝐷 𝑁 𝜀𝑛 Ψ𝑛

by choosing 𝑛 large enough. Finally, the claim (10.7) follows from Chebyshev’s inequality, using a highmoment estimate combined with Jensen’s inequality for partial expectation. We omit the details, which are similar to those of the first claim. □ We remark that it is tempting to generalize the converse of (10.6) and try to verify stochastic domination 𝑋 ≺ 𝑌 between two random variables by proving that 𝔼𝑋 𝑛 ≤ 𝔼𝑌 𝑛 . This implication in general is wrong, but it is correct if 𝑌 is deterministic. This subtlety is the reason why many of the following theorems about two random variables 𝐴 and 𝐵 are formulated in such a way that “if 𝐴 satisfies 𝐴 ≺ Ψ for some deterministic Ψ, then 𝐵 ≺ Ψ,” instead of directly saying the more natural (but sometimes wrong) assertion that 𝐴 ≺ 𝐵. We shall apply Lemma 10.1 to the resolvent entries of 𝐺. In order to verify its assumptions, we record the following bounds.

86

10. FLUCTUATION AVERAGING MECHANISM

Lemma 10.2. Suppose that Λ ≺ Ψ and Λo ≺ Ψo for some deterministic control parameters Ψ and Ψo both satisfying (8.20). Fix 𝑝 ∈ ℕ. Then, for any 𝑖 ≠ 𝑗 and 𝕋 ⊂ {1, … , 𝑁} satisfying |𝕋| ≤ 𝑝 and 𝑖, 𝑗 ∉ 𝕋 we have (10.8)

1

(𝕋)

𝐺𝑖𝑗 = 𝑂≺ (Ψo ),

(𝕋)

𝐺𝑖𝑖

= 𝑂≺ (1).

(𝕋) Moreover, we have the rough bounds |𝐺𝑖𝑗 | ≤ 𝑀 and 𝑛

| 1 | | ≤ 𝑁𝜖 𝔼| | 𝐺 (𝕋) |

(10.9)

𝑖𝑖

for any 𝜖 > 0 and 𝑁 ≥ 𝑁0 (𝑛, 𝜖). Proof. The bounds (10.8) follow easily by a repeated application of (8.1), the assumption Λ ≺ 𝑀 −𝑐 , and the lower bound in (6.11). The deterministic (𝕋) bound |𝐺𝑖𝑗 | ≤ 𝑀 follows immediately from 𝜂 ≥ 𝑀 −1 by definition of a spectral domain. In order to prove (10.9), we use the Schur complement formula (8.7) applied (𝕋) (𝕋) to 1/𝐺𝑖𝑖 where the expectation is estimated using (5.6) and |𝐺𝑖𝑗 | ≤ 𝑀. (Recall (6.4).) This gives 𝑝

| 1 | | ≺ 𝑁 𝐶𝑝 𝔼| | 𝐺 (𝕋) | 𝑖𝑖

for all 𝑝 ∈ ℕ. Since

(𝕋) 1/𝐺𝑖𝑖

≺ 1, (10.9) therefore follows from (10.6).



We now start the proof of Lemma 8.9. The main statement is (8.47), the other bounds will be relatively simple consequences. Finally, we present an alternative argument for (8.47) that organizes the stopping rule in the expansion somewhat differently. Proof of Lemma 8.9. Part I: Proof of (8.47). First we claim that, for any fixed 𝑝 ∈ ℕ, we have (10.10)

| 1 | |𝑄𝑘 | ≺ Ψo | 𝐺 (𝕋) | 𝑘𝑘

uniformly for 𝕋 ⊂ {1, … , ℕ}, |𝕋| ≤ 𝑝, and 𝑘 ∉ 𝕋. To simplify notation, for the proof we set 𝕋 = ∅; the proof for nonempty 𝕋 is the same. From the Schur complement formula (8.7), we get |𝑄𝑘 (𝐺𝑘𝑘 )−1 | ≤ |ℎ𝑘𝑘 | + |𝑍𝑘 |. The first term is estimated by |ℎ𝑘𝑘 | ≺ 𝑀 −1/2 ≤ Ψo . The second term is estimated exactly as in (8.29) and (8.30): (𝑘)

|𝑍𝑘 | ≺ ( ∑ 𝑥≠𝑦

1/2 (𝑘) 2 𝑠𝑘𝑥 |𝐺𝑥𝑦 | 𝑠𝑦𝑘 )

≺ Ψo

10.2. PROOF OF LEMMA 8.9

87

(𝑘) where in the last step we used that |𝐺𝑥𝑦 | ≺ Ψo as follows from (10.8) and the bound 1/|𝐺𝑘𝑘 | ≺ 1 (recall that Λ ≺ Ψ ≤ 𝑀 −𝑐 ). This concludes the proof of (10.10). Abbreviate 𝑋𝑘 ∶= 𝑄𝑘 (𝐺𝑘𝑘 )−1 . We shall estimate ∑𝑘 𝑡𝑖𝑘 𝑋𝑘 in probability 2𝑝

by estimating its 𝑝th moment by Ψo , from which the claim (8.47) will easily follow using Chebyshev’s inequality. After this preparation, the rest of the proof for (8.47) can be divided into four steps. Step 1. Coincidence structure in the expansion of the 𝐿𝑝 norm. Fix some even integer 𝑝 and write 𝑝

| | 𝔼||∑ 𝑡𝑖𝑘 𝑋𝑘 || = 𝑘

∑ 𝑡𝑖𝑘1 … 𝑡𝑖𝑘𝑝/2 𝑡 𝑖𝑘𝑝/2+1 … 𝑡𝑖𝑘𝑝 𝔼𝑋𝑘1 ⋯ 𝑋𝑘𝑝/2 𝑋 𝑘𝑝/2+1 ⋯ 𝑋 𝑘𝑝 . 𝑘1 ,…,𝑘𝑝

Next, we regroup the terms in the sum over 𝐤 ∶= (𝑘1 , … , 𝑘𝑝 ) according to the coincidence structure in 𝐤 as follows: given a sequence of indices 𝐤, define the partition 𝒫(𝐤) of {1, … , 𝑝} by the equivalence relation 𝑟 ∼ 𝑠 if and only if 𝑘𝑟 = 𝑘𝑠 . Denote the set of all partitions of {1, … , 𝑝} by 𝔓𝑝 . Then, we write 𝑝

(10.11)

| | 𝔼||∑ 𝑡𝑖𝑘 𝑋𝑘 || = ∑ ∑ 𝑡𝑖𝑘1 … 𝑡𝑖𝑘𝑝/2 𝑡 𝑖𝑘𝑝/2+1 … 𝑡 𝑖𝑘𝑝 𝟏(𝒫(𝐤) = 𝑃)𝑉(𝐤) 𝑘

𝑃∈𝔓𝑝 𝐤

where we defined 𝑉(𝐤) ∶= 𝔼𝑋𝑘1 … 𝑋𝑘𝑝/2 𝑋 𝑘𝑝/2+1 … 𝑋 𝑘𝑝 . Given a partition 𝑃, for any 𝑟 ∈ {1, … , 𝑝}, we denote by [𝑟] the block of 𝑟 in 𝑃, i.e., the set of all indices in the same block of the partition as 𝑟. Let 𝐿 ≡ 𝐿(𝑃) ∶= {𝑟 ∶ [𝑟] = {𝑟}} ⊂ {1, … , 𝑝} be the set of lone labels. We denote by 𝐤𝐿 ∶= (𝑘𝑟 )𝑟∈𝐿 the summation indices associated with lone labels. Step 2. Resolution of dependence in weakly dependent random variables. The resolvent entry 𝐺𝑘𝑘 depends strongly on the randomness in the 𝑘-column of 𝐻, but only weakly on the randomness in the other columns. We conclude that if 𝑟 is a lone label, then all factors 𝑋𝑘𝑠 with 𝑠 ≠ 𝑟 in 𝑉(𝐤) depend weakly on the randomness in the 𝑘𝑟 th column of 𝐻 (if 𝑟 is not a lone label, then this statement holds only for “all factors 𝑋𝑘𝑠 with 𝑘𝑠 ≠ 𝑘𝑟 ”). Thus, the idea is to make all resolvent entries inside the expectation of 𝑉(𝐤) as independent of the indices 𝐤𝐿 as possible (see Definition 7.3), using the first decoupling resolvent identity (8.1): for 𝑥, 𝑦, 𝑢 ∉ 𝕋 and 𝑥, 𝑦 ≠ 𝑢 (𝑥 can be equal to 𝑦), (𝕋)

(𝕋) 𝐺𝑥𝑦

(10.12)

=

(𝕋𝑢) 𝐺𝑥𝑦

+

(𝕋)

𝐺𝑥𝑢 𝐺𝑢𝑦 (𝕋)

𝐺𝑢𝑢

and for 𝑥, 𝑢 ∉ 𝕋 and 𝑥 ≠ 𝑢 (10.13)

1 (𝕋)

𝐺𝑥𝑥

=

1 (𝕋𝑢)

𝐺𝑥𝑥

(𝕋)



(𝕋)

𝐺𝑥𝑢 𝐺𝑢𝑥 (𝕋)

(𝕋𝑢)

(𝕋)

𝐺𝑥𝑥 𝐺𝑥𝑥 𝐺𝑢𝑢

.

88

10. FLUCTUATION AVERAGING MECHANISM (𝕋)

Definition 10.3. A resolvent entry 𝐺𝑥𝑦 with 𝑥, 𝑦 ∉ 𝕋 is maximally expanded with respect to a set 𝐵 ⊂ {1, … 𝑁} if 𝐵 ⊂ 𝕋 ∪ {𝑥, 𝑦}. (𝕋)

Given the set 𝐤𝐿 of lone indices, we say that a resolvent entry 𝐺𝑥𝑦 is maximally expanded if it is maximally expanded with respect to the set 𝐵 = 𝐤𝐿 . The motivation behind this definition is that using (8.1) we cannot add upper indices from the set 𝐤𝐿 to a maximally expanded resolvent entry. We shall apply (8.1) to all resolvent entries in 𝑉(𝐤). In this manner we generate a sum of monomials consisting of off-diagonal resolvent entries and inverses of diagonal resolvent entries. We can now repeatedly apply (8.1) to each factor until either they are all maximally expanded or a sufficiently large number of off-diagonal resolvent entries has been generated. The cap on the number of off-diagonal entries is introduced to ensure that this procedure terminates after a finite number of steps. In order to define the precise algorithm, let 𝒜 denote the set of monomials (𝕋) in the off-diagonal entries 𝐺𝑥𝑦 , with 𝕋 ⊂ 𝐤𝐿 , 𝑥 ≠ 𝑦, and 𝑥, 𝑦 ∈ 𝐤 ⧵ 𝕋, as well as (𝕋)

the inverse diagonal entries 1/𝐺𝑥𝑥 with 𝕋 ⊂ 𝐤𝐿 and 𝑥 ∈ 𝐤 ⧵ 𝕋. Starting from 𝑉(𝐤), the algorithm will recursively generate sums of monomials in 𝒜. Let 𝑑(𝐴) denote the number of off-diagonal entries in 𝐴 ∈ 𝒜. For 𝐴 ∈ 𝒜 we shall define 𝑤0 (𝐴), 𝑤1 (𝐴) ∈ 𝒜 satisfying 𝐴 = 𝑤0 (𝐴) + 𝑤1 (𝐴), 𝑑(𝑤0 (𝐴)) = 𝑑(𝐴), 𝑑(𝑤1 (𝐴)) ≥ max{2, 𝑑(𝐴) + 1}. The idea behind this splitting is to use (8.1) on one entry of 𝐴; the first term on the right-hand side of (8.1) gives rise to 𝑤0 (𝐴) and the second to 𝑤1 (𝐴). The precise definition of the algorithm applied to 𝐴 ∈ 𝒜 is as follows: (1) If all factors of 𝐴 are maximally expanded or 𝑑(𝐴) ≥ 𝑝 + 1, then stop the expansion of 𝐴. (2) Otherwise, choose some (arbitrary) factor of 𝐴 that is not maximally (𝕋) expanded. If this entry is off-diagonal, 𝐺𝑥𝑦 , we use (10.12) to de(𝕋) compose 𝐺𝑥𝑦 into a sum of two terms with 𝑢 the smallest element (𝕋) in 𝐤𝐿 ⧵ (𝕋 ∪ {𝑥, 𝑦}). If the chosen entry is diagonal, 1/𝐺𝑥𝑥 , we use (𝕋) (10.13) to decompose 𝐺𝑥𝑥 into two terms with 𝑢 the smallest element in 𝐤𝐿 ⧵ (𝕋 ∪ {𝑥}). The choice of 𝑢 to be the smallest element is not important, we just chose it for definiteness of the algorithm. From the (𝕋) (𝕋) splitting of the factor 𝐺𝑥𝑦 or 1/𝐺𝑥𝑥 in the monomial 𝐴, we obtain a (𝕋) (𝕋) splitting 𝐴 = 𝑤0 (𝐴) + 𝑤1 (𝐴) (i.e., we replace 𝐺𝑥𝑦 or 1/𝐺𝑥𝑥 by the right-hand sides of (10.12) or (10.13)). It is clear that (10.14) holds with the algorithm just defined. In fact, in most cases the last inequality in (10.14) is an equality, 𝑑(𝑤1 (𝐴)) = max{2, 𝑑(𝐴) + 1}. The only exception is when (10.12) is used for 𝑥 = 𝑦, then two new off-diagonal entries are added, i.e., 𝑑(𝑤1 (𝐴)) = 𝑑(𝐴) + 2. Notice also that this algorithm

10.2. PROOF OF LEMMA 8.9

89

Figure 10.1. The binary tree generated by applying the algorithm (1)–(2) to a monomial 𝐴𝑟 . Each vertex of the tree is indexed by a binary string 𝜎 and encodes a monomial 𝐴𝑟𝜍 . An arrow towards the left represents the action of 𝑤0 and an arrow towards the right the action of 𝑤1 . The monomial 𝐴𝑟11 satisfies the assumptions of step (1), and hence its expansion is stopped, so that the tree vertex 11 has no children. contains some arbitrariness in the choice of the factor of 𝐴 to be expanded but any choice will work for the proof. We now apply this algorithm recursively to each entry 𝐴𝑟 ∶= 1/𝐺𝑘𝑟 𝑘𝑟 in the definition of 𝑉(𝐤). More precisely, we start with 𝐴𝑟 and define 𝐴𝑟0 ∶= 𝑤0 (𝐴𝑟 ) and 𝐴𝑟1 ∶= 𝑤1 (𝐴𝑟 ). In the second step of the algorithm we define four monomials 𝐴𝑟00 ∶= 𝑤0 (𝐴𝑟0 ),

𝐴𝑟01 ∶= 𝑤0 (𝐴𝑟1 ),

𝐴𝑟10 ∶= 𝑤1 (𝐴𝑟0 ),

𝐴𝑟11 ∶= 𝑤1 (𝐴𝑟1 ),

and so on, at each iteration performing the steps (1) and (2) on each new monomial independently of the others. Note that the lower indices are binary sequences that describe the recursive application of the operations 𝑤0 and 𝑤1 . In this manner, we generate a binary tree whose vertices are given by finite binary strings 𝜎. The associated monomials satisfy 𝐴𝑟𝜍𝑖 ∶= 𝑤𝑖 (𝐴𝑟𝜍 ) for 𝑖 = 0, 1 where 𝜎𝑖 denotes the binary string obtained by appending 𝑖 to the right end of 𝜎. See Figure 10.1 (borrowed from [55]) for an illustration of the tree. Notice that any monomial that is obtained along these recursion is of the form (10.14)

(𝕋 )

(𝕋 )

(𝕋′ )

(𝕋′ )

𝐺𝑥1 𝑦1 1 𝐺𝑥2 𝑦2 2 ⋯ 𝐺𝑧1 𝑧1 1 𝐺𝑧2 𝑧2 2 ⋯

where all factors in the numerators are off-diagonal entries (𝑥𝑖 ≠ 𝑦𝑖 ) and all factors in the denominators are diagonal entries. For later usage, we shall refer to the sum (10.15)

|𝕋1 | + |𝕋2 | + ⋯ + |𝕋′1 | + |𝕋′2 | + ⋯

as the total number of upper indices in the monomial (10.14). (The absolute value denotes the cardinality of the set.)

90

10. FLUCTUATION AVERAGING MECHANISM

We stop the recursion of a tree vertex whenever the associated monomial satisfies the stopping rule of step (1). In other words, the set of leaves of the tree is the set of binary strings 𝜎 such that either all factors of 𝐴𝑟𝜍 are maximally expanded or 𝑑(𝐴𝑟𝜍 ) ≥ 𝑝 + 1. We list a few obvious facts of this algorithm: (i) 𝑑(𝐴𝑟𝜍 ) ≤ 𝑝 + 1 for any vertex 𝜎 of the tree (by the stopping rule in (1)). (ii) The number of ones in any 𝜎 is at most 𝑝 (since each application of 𝑤1 increases 𝑑( ⋅ ) by at least one). (iii) The number of resolvent entries (in the numerator and denominator) in 𝐴𝑟𝜍 is bounded by 4𝑝 + 1 (since each application of 𝑤1 increases the number of resolvent entries by at most four, and the application of 𝑤0 does not change this number). (iv) The maximal total number of upper indices in 𝐴𝑟𝜍 for any tree vertex 𝜎 is (4𝑝 + 1)𝑝 (since |𝕋𝑖 | ≤ 𝑝 and |𝕋′𝑖 | ≤ 𝑝 in the formula (10.15)). (v) 𝜎 contains at most (4𝑝 + 1)𝑝 zeros (since each application of 𝑤0 increases the total number of upper indices by one). (vi) The maximal length of the string 𝜎 (i.e., the depth of the tree) is at most (4𝑝 + 1)𝑝 + 𝑝 = 4𝑝2 + 2𝑝. (vii) The number of leaves of a tree is bounded by (𝐶𝑝2 )𝑝 (since this num2 𝑝 ber is bounded by ∑𝑘=0 (4𝑝 +2𝑝) ≤ (𝐶𝑝2 )𝑝 where 𝑘 is the number of 𝑘 ones in the string encoding the leaf). Denoting by ℒ𝑟 the set of leaves of the binary tree generated from 𝐴𝑟 , we have |ℒ𝑟 | ≤ (𝐶𝑝2 )𝑝 . In particular, the algorithm terminates after at most (𝐶𝑝2 )𝑝 steps, since each step generates a new leaf. By definition of the tree and 𝑤0 and 𝑤1 , we have the following resolution of dependence decomposition 𝑋𝑘𝑟 = 𝑄𝑘𝑟 ∑ 𝐴𝑟𝜍 ,

(10.16)

𝜍∈ℒ𝑟

𝐴𝑟𝜍

where each monomial for 𝜎 ∈ ℒ𝑟 either consists entirely of maximally expanded resolvent entries or satisfies 𝑑(𝐴𝑟𝜍 ) = 𝑝 + 1. (This is an immediate consequence of the stopping rule in (1)). Using (10.11) and (10.16) we have the representation (10.17)

𝑝

𝑉(𝐤) = ∑ ⋯ ∑ 𝔼(𝑄𝑘1 𝐴1𝜍1 ) … (𝑄𝑘𝑝 𝐴𝜍𝑝 ). 𝜍1 ∈ℒ1

𝜍𝑝 ∈ℒ𝑝

Step 3. Bounding the individual terms in the expansion (10.17). We now claim that any nonzero term on the right-hand side of (10.17) satisfies (10.18)

𝑝

𝑝+|𝐿|

𝔼(𝑄𝑘1 𝐴1𝜍1 ) ⋯ (𝑄𝑘𝑝 𝐴𝜍𝑝 ) = 𝑂≺ (Ψo

).

Proof of (10.18). Before embarking on the proof, we explain its idea. First, notice that for any string 𝜎 we have (10.19)

𝑏(𝜍)+1

𝐴𝑘𝜍 = 𝑂≺ (Ψo

)

10.2. PROOF OF LEMMA 8.9

91

where 𝑏(𝜎) is the number ones in the string 𝜎. Indeed, if 𝑏(𝜎) = 0 then this follows from (10.10); if 𝑏(𝜎) ≥ 1 this follows from the last statement in (10.14) which guarantees that every one in the string 𝜎 increases the exponent of Ψo by at least one (we also use (10.8)). In particular, each 𝐴𝑘𝜍 is bounded by at least Ψo . If we used only the trivial bound 𝐴𝑘𝜍 = 𝑂≺ (Ψo ) for each factor in (10.18); i.e., we did not exploit the gain from the 𝑤1 (𝐴) type terms in the expansion, 𝑝 then the naive size of the left-hand side of (10.18) would only be Ψo . The key observation behind (10.18) is that each lone label 𝑠 ∈ 𝐿 yields one extra factor Ψo to the estimate. This is because the expectation in (10.17) would vanish if all other factors (𝑄𝑘𝑟 𝐴𝑟𝜍𝑟 ), 𝑟 ≠ 𝑠, were independent of 𝑘𝑠 . The expansion of the binary tree makes this dependence explicit by exhibiting 𝑘𝑠 as a lower index. But this requires performing an operation 𝑤1 with the choice 𝑢 = 𝑘𝑠 in (10.12) or (10.13). However, 𝑤1 increases the number of off-diagonal elements by at least one. In other words, every index associated with a lone label must have a “partner” index in a different resolvent entry which arose by application of 𝑤1 . Such a partner index may only be obtained through the creation of at least one off-diagonal resolvent entry. The actual proof below shows that this effect applies cumulatively for all lone labels. In order to give the rigorous proof of (10.18), we consider two cases. Consider first the case where for some 𝑟 = 1, … , 𝑝 the monomial 𝐴𝑟𝜍𝑟 on the left-hand side of (10.18) is not maximally expanded. Then 𝑑(𝐴𝑟𝜍𝑟 ) = 𝑝 + 1, so that (10.8) 𝑝+1 yields 𝐴𝑟𝜍𝑟 ≺ Ψo . Therefore, the observation that 𝐴𝑠𝜍𝑠 ≺ Ψo for all 𝑠 ≠ 𝑟 2𝑝 together with (10.7) implies that the left-hand side of (10.18) is 𝑂≺ (Ψo ). Since |𝐿| ≤ 𝑝, (10.18) follows. Consider now the case where 𝐴𝑟𝜍𝑟 on the left-hand side of (10.18) is maximally expanded for all 𝑟 = 1, … , 𝑝. The key observation is the following claim about the left-hand side of (10.18) with a nonzero expectation. (∗)

For each 𝑠 ∈ 𝐿 there exists 𝑟 ∶= 𝜏(𝑠) ∈ {1, … , 𝑝} ⧵ {𝑠} such that the monomial 𝐴𝑟𝜍𝑟 contains a resolvent entry with lower index 𝑘𝑠 .

To prove (∗), suppose by contradiction that there exists an 𝑠 ∈ 𝐿 such that for all 𝑟 ∈ {1, … , 𝑝} ⧵ {𝑠} the lower index 𝑘𝑠 does not appear in the monomial 𝐴𝑟𝜍𝑟 . To simplify notation, we assume that 𝑠 = 1. Then, for all 𝑟 = 2, … , 𝑝, since 𝐴𝑟𝜍𝑟 is maximally expanded, we find that 𝐴𝑟𝜍𝑟 is independent of 𝑘1 . Therefore, we have 𝑝

𝑝

𝔼(𝑄𝑘1 𝐴1𝜍1 )(𝑄𝑘2 𝐴2𝜍2 ) ⋯ (𝑄𝑘𝑝 𝐴𝜍𝑝 ) = 𝔼𝑄𝑘1 (𝐴1𝜍1 (𝑄𝑘2 𝐴2𝜍2 ) ⋯ (𝑄𝑘𝑝 𝐴𝜍𝑝 )) = 0, where in the last step we used that 𝔼𝑄𝑖 (𝑋)𝑌 = 𝔼𝑄𝑖 (𝑋𝑌) = 0 if 𝑌 is independent of 𝑖. This concludes the proof of (∗). The statement (∗) can be reformulated as asserting that, after expansion, every lone label 𝑠 has a “partner” label 𝑟 = 𝜏(𝑠), such that the index 𝑘𝑠 appears

92

10. FLUCTUATION AVERAGING MECHANISM

also as a lower index in the expansion of 𝐴𝑟 (note that there may be several such partner labels 𝑟, we can choose 𝜏(𝑠) to be any one of them). For 𝑟 ∈ {1, … , 𝑝} we define ℓ(𝑟) ∶= ∑𝑠∈𝐿 𝟏(𝜏(𝑠) = 𝑟), the number of times that the label 𝑟 was chosen as a partner to some lone label 𝑠. We now claim that 1+ℓ(𝑟)

𝐴𝑟𝜍𝑟 = 𝑂≺ (Ψo

(10.20)

).

To prove (10.20), fix 𝑟 ∈ {1, … , 𝑝}. By definition, for each 𝑠 ∈ 𝜏 −1 ({𝑟}) the index 𝑘𝑠 appears as a lower index in the monomial 𝐴𝑟𝜍𝑟 . Since 𝑠 ∈ 𝐿 is by definition a lone label and 𝑠 ≠ 𝑟, we know that 𝑘𝑠 does not appear as an index in 𝐴𝑟 = 1/𝐺𝑘𝑟 𝑘𝑟 , i.e., 𝑘𝑟 ≠ 𝑘𝑠 . By definition of the monomials associated with the tree vertex 𝜎𝑟 , it follows that 𝑏(𝜎𝑟 ), the number of ones in 𝜎𝑟 , is at least |𝜏 −1 ({𝑟})| = ℓ(𝑟) since each application of 𝑤1 adds precisely one new lower index while application of 𝑤0 leaves the lower indices unchanged. Note that in this step it is crucial that 𝑠 ∈ 𝜏 −1 ({𝑟}) was a lone label. Recalling (10.19), we therefore get (10.20). Using (10.20) and Lemma 10.1, we find 𝑝

𝑝+|𝐿| |(𝑄 𝐴1 ) ⋯ (𝑄 𝐴𝑝𝜍 )| ≺ ∏ Ψ1+ℓ(𝑟) . = Ψo o 𝑘𝑝 | 𝑘1 𝜍1 𝑝 | 𝑟=1



This concludes the proof of (10.18).

Summing over the binary trees in (10.17) and using Lemma 10.1, we get from (10.18) 𝑝+|𝐿|

(10.21)

𝑉(𝐤) = 𝑂≺ (Ψo

).

Step 4. Summing over the expansion. We now return to the sum (10.11). We perform the summation by first fixing 𝑃 ∈ 𝔓𝑝 , with associated lone labels 𝐿 = 𝐿(𝑃). Using |𝑡𝑖𝑗 | ≤ 𝑀 −1 and (8.45), we have | | | | |∑ 𝟏(𝒫(𝐤) = 𝑃) 𝑡𝑖𝑘1 ⋯ 𝑡𝑖𝑘𝑝/2 𝑡𝑖𝑘𝑝/2+1 ⋯ 𝑡𝑖𝑘𝑝 | ≤ 𝑀 −𝑝 |∑ 𝟏(𝒫(𝐤) = 𝑃) |. | | | | 𝐤

𝐤

Now the number of 𝐤 with |𝐿| lone labels can be bounded easily by 𝑀 |𝐿|+(𝑝−|𝐿|)/2 since each block of 𝑃 that is not contained in 𝐿 consists of at least two labels. Thus, we can bound the last displayed equation by | | |∑ 𝟏(𝒫(𝐤) = 𝑃)𝑡𝑖𝑘1 ⋯ 𝑡𝑖𝑘𝑝/2 𝑡𝑖𝑘𝑝/2+1 ⋯ 𝑡𝑖𝑘𝑝 | ≤ 𝑀 −𝑝 𝑀 |𝐿|+(𝑝−|𝐿|)/2 | | 𝐤

= (𝑀 −1/2 )𝑝−|𝐿| . From (10.11) and (10.21) we get 𝑝

𝑝+|𝐿(𝑃)| 2𝑝 | | 𝔼||∑ 𝑡𝑖𝑘 𝑋𝑘 || ≺ ∑ (𝑀 −1/2 )𝑝−|𝐿(𝑃)| Ψo ≤ 𝐶𝑝 Ψo , 𝑘

𝑃∈𝔓𝑝

10.2. PROOF OF LEMMA 8.9

93

where in the last step we used the lower bound from (8.20) and estimated the summation over 𝔓𝑝 with a constant 𝐶𝑝 (which is bounded by (𝐶𝑝2 )𝑝 ). Summarizing, we have proved that 𝑝

2𝑝 | | 𝔼||∑ 𝑡𝑖𝑘 𝑋𝑘 || ≺ Ψo

(10.22)

𝑘

for any 𝑝 ∈ 2ℕ. We conclude the proof of (8.47) with a simple application of Chebyshev’s inequality. Fix 𝜖 > 0 and 𝐷 > 0. Using (10.22) and Chebyshev’s inequality we find | | ℙ(||∑ 𝑡𝑖𝑘 𝑋𝑘 || > 𝑁 𝜖 Ψ2o ) ≤ 𝑁 𝑁 −𝜖𝑝 𝑘

for large enough 𝑁 ≥ 𝑁0 (𝜖, 𝑝). Choosing 𝑝 ≥ 𝜖−1 (1+𝐷), we conclude the proof of (8.47). Remark 10.4. The first decoupling formula (8.1) is the only identity about the entries of 𝐺 that is needed in the proof of Lemma 8.9. In particular, the second decoupling formula (8.2) is never used, and the actual entries of 𝐻 never appear in the argument. Part II: Proof of (8.46). The bound (8.46) may be proved by following the proof of (8.47) verbatim; the only modification is the bound |𝑄 𝐺 (𝕋) | = |𝑄 (𝐺 (𝕋) − 𝑚)| ≺ Ψ, | 𝑘 𝑘𝑘 | | 𝑘 𝑘𝑘 | which replaces (10.10). Here we again use the same upper bound Ψo = Ψ for Λ and Λo . Part III: Proofs of (8.48) and (8.50). In order to prove (8.48), we note that the last term in (8.17) is of order 𝑂≺ (Ψ2 ). Hence we have 𝑤𝑎 ∶= ∑ 𝑡𝑎𝑖 𝑣𝑖 𝑖 2 2 2 ∑ 𝑠𝑎𝑖 𝑡𝑖𝑘 𝑣𝑘 + 𝑂≺ (Ψ2 ), ∑ 𝑡𝑎𝑖 Υ𝑖 + 𝑂≺ (Ψ2 ) = 𝑚sc ∑ 𝑡𝑎𝑖 𝑠𝑖𝑘 𝑣𝑘 − 𝑚sc = 𝑚sc 𝑖,𝑘

𝑖

𝑖,𝑘

where in the last step we used Corollary 8.10 (recall that the proof of this corollary used only (8.47), which was already proved in Part I) to bound ∑𝑖 𝑡𝑎𝑖 Υ𝑖 by 𝑂≺ (Ψ2 ) and that the matrices 𝑇 and 𝑆 commute by assumption. Introducing the vector 𝐰 = (𝑤𝑎 )𝑁 𝑎=1 , we therefore have the equation (10.23)

2 𝑆𝐰 + 𝑂≺ (Ψ2 ) , 𝐰 = 𝑚sc

where the error term is in the sense of the ℓ∞ -norm (uniform in the components 2 of the vector 𝐰). Inverting the matrix 1−𝑚sc 𝑆 and recalling the definition (6.17) yields (8.48).

94

10. FLUCTUATION AVERAGING MECHANISM

The proof of (8.50) is similar, except that we have to treat the subspace 𝐞⟂ separately. Using (8.49), we write 1 ∑ 𝑡𝑎𝑖 (𝑣𝑖 − [𝑣]) = ∑ 𝑡𝑎𝑖 𝑣𝑖 − ∑ 𝑣𝑖 , 𝑁 𝑖 𝑖 𝑖 and apply the above argument to each term separately. This yields 1 2 2 ∑ 𝑡𝑎𝑖 (𝑣𝑖 − [𝑣]) = 𝑚sc ∑ 𝑡𝑎𝑖 ∑ 𝑠𝑖𝑘 𝑣𝑘 − 𝑚sc ∑ ∑ 𝑠𝑖𝑘 𝑣𝑘 + 𝑂≺ (Ψ2 ) 𝑁 𝑖 𝑖 𝑖 𝑘 𝑘 2 ∑ 𝑠𝑎𝑖 𝑡𝑖𝑘 (𝑣𝑘 − [𝑣]) + 𝑂≺ (Ψ2 ) = 𝑚sc 𝑖,𝑘

where we used (6.1) in the second step. Note that the error term on the righthand side is perpendicular to 𝐞 when regarded as a vector indexed by 𝑎, since all other terms in the equation are. Hence we may invert the matrix (1 − 𝑚2 𝑆) on the subspace 𝐞⟂ , as above, to get (8.50). This completes the proof of Lemma 8.9. □ 10.3. Alternative Proof of (8.47) of Lemma 8.9 We conclude this section with an alternative proof of the key statement (8.47) of Lemma 8.9. While the underlying argument remains similar, the following proof makes use of an additional decomposition of the space of random variables that avoids the use of the stopping rule from Step (1 in the above proof of Lemma 8.9. This decomposition may be regarded as an abstract reformulation of the stopping rule. Alternative proof of (8.47) of Lemma 8.9. As before, we set 𝑋𝑘 ∶= 𝑄𝑘 (𝐺𝑘𝑘 )−1 . For simplicity of presentation, we set 𝑡𝑖𝑘 = 𝑁 −1 . The decomposition is defined using the operations 𝑃𝑖 and 𝑄𝑖 , introduced in Definition 7.3. It is immediate that 𝑃𝑖 and 𝑄𝑖 are projections, that 𝑃𝑖 + 𝑄𝑖 = 1, and that all of these projections commute with each other. For a set 𝐴 ⊂ {1, … , 𝑁} we use the notation 𝑃𝐴 ∶= ∏𝑖∈𝐴 𝑃𝑖 and 𝑄𝐴 ∶= ∏𝑖∈𝐴 𝑄𝑖 . Let 𝑝 be even and introduce the shorthand 𝑋˜𝑘𝑠 ∶= 𝑋𝑘𝑠 for 𝑠 ≤ 𝑝/2 and 𝑋˜𝑘𝑠 ∶= 𝑋 𝑘𝑠 for 𝑠 > 𝑝/2. Then we get 𝑝

𝑝

1 |1 | 𝔼|| ∑ 𝑋𝑘 || = 𝑝 𝑁 𝑘 𝑁 =

1 𝑁𝑝

˜𝑘 ∑ 𝔼∏𝑋 𝑠 𝑘1 ,…,𝑘𝑝

𝑠=1 𝑝

𝑝

˜𝑘 ). ∑ 𝔼 ∏(∏(𝑃𝑘𝑟 + 𝑄𝑘𝑟 )𝑋 𝑠 𝑘1 ,…,𝑘𝑝

𝑠=1 𝑟=1

Introducing the notation 𝐤 = (𝑘1 , … , 𝑘𝑝 ) and [𝐤] = {𝑘1 , … , 𝑘𝑝 }, we therefore get, by multiplying out the parentheses, 𝑝

𝑝

(10.24)

1 |1 | 𝔼|| ∑ 𝑋𝑘 || = 𝑝 ∑ 𝑁 𝑘 𝑁 𝐤 𝐴

∑ 1 ,…,𝐴𝑝 ⊂[𝐤]

𝔼 ∏(𝑃𝐴c𝑠 𝑄𝐴𝑠 𝑋˜𝑘𝑠 ). 𝑠=1

10.3. ALTERNATIVE PROOF OF (8.47) OF LEMMA 8.9

95

˜𝑘 , we have that 𝑋 ˜𝑘 = 𝑄𝑘 𝑋 ˜ , which implies that Next, by definition of 𝑋 𝑠 𝑠 𝑠 𝑘𝑠 ˜ 𝑃𝐴c𝑠 𝑋𝑘𝑠 = 0 if 𝑘𝑠 ∉ 𝐴𝑠 . Hence we may restrict the summation to 𝐴𝑠 satisfying (10.25)

𝑘𝑠 ∈ 𝐴 𝑠

for all 𝑠. Moreover, we claim that the right-hand side of (10.24) vanishes unless (10.26)

𝑘𝑠 ∈



𝐴𝑞

𝑞≠𝑠

for all 𝑠. Indeed, suppose that 𝑘𝑠 ∈ ⋂𝑞≠𝑠 𝐴c𝑞 for some 𝑠, say 𝑠 = 1. In this case, ˜𝑘 is independent of 𝑘1 (see Definition for each 𝑠 = 2, … , 𝑝 the factor 𝑃𝐴c 𝑄𝐴 𝑋 𝑠

𝑠

𝑠

7.3). Thus, we get 𝑝

𝑝

˜ ) ˜𝑘 ) = 𝔼(𝑃𝐴c 𝑄𝐴 𝑄𝑘 𝑋 ˜ ) ∏(𝑃𝐴c 𝑄𝐴 𝑋 𝔼 ∏(𝑃 𝑄𝐴𝑠 𝑋 𝑠 𝑠 1 1 𝑘1 𝑠 𝑘𝑠 1 𝐴c𝑠

𝑠=2

𝑠=1

𝑝

˜𝑘 ) ∏(𝑃𝐴c 𝑄𝐴 𝑋 ˜ )) = 0, = 𝔼𝑄𝑘1 ((𝑃𝐴c1 𝑄𝐴1 𝑋 𝑠 1 𝑠 𝑘𝑠 𝑠=2

where in the last step we used that 𝔼𝑄𝑖 (𝑋) = 0 for any 𝑖 and random variable 𝑋. We conclude that the summation on the right-hand side of (10.24) is restricted to indices satisfying (10.25) and (10.26). Under these two conditions we have 𝑝

∑ |𝐴𝑠 | ≥ 2 |[𝐤]|,

(10.27)

𝑠=1

since each index 𝑘𝑠 must belong to at least two different sets 𝐴𝑞 : to 𝐴𝑠 (by (10.25)) as well as to some 𝐴𝑞 with 𝑞 ≠ 𝑠 (by (10.26)). Next, we claim that for 𝑘 ∈ 𝐴 we have |𝐴|

(10.28)

|𝑄𝐴 𝑋𝑘 | ≺ Ψo .

(Note that if we were doing the case 𝑋𝑘 = 𝑄𝑘 𝐺𝑘𝑘 instead of 𝑋𝑘 = 𝑄𝑘 (𝐺𝑘𝑘 )−1 , then (10.28) would have to be weakened to |𝑄𝐴 𝑋𝑘 | ≺ Ψ|𝐴| , in accordance with (8.46). Indeed, in that case and for 𝐴 = {𝑘}, we only have the bound |𝑄𝑘 𝐺𝑘𝑘 | ≺ Ψ and not |𝑄𝑘 𝐺𝑘𝑘 | ≺ Ψo .) Before proving (10.28), we show how it may be used to complete the proof. Using (10.24), (10.28), and Lemma 10.1, we find 𝑝

1 2|[𝑘]| |1 | 𝔼|| ∑ 𝑋𝑘 || ≺ 𝐶𝑝 𝑝 ∑ Ψo 𝑁 𝑁 𝐤

𝑘

𝑝

= 𝐶𝑝 ∑ Ψ2𝑢 o 𝑢=1 𝑝

1 ∑ 𝟏(|[𝐤]| = 𝑢) 𝑁𝑝 𝐤 2𝑝

𝑢−𝑝 ≤ 𝐶𝑝 (Ψo + 𝑁 −1/2 )2𝑝 ≤ 𝐶𝑝 Ψo , ≤ 𝐶𝑝 ∑ Ψ2𝑢 o 𝑁 𝑢=1

96

10. FLUCTUATION AVERAGING MECHANISM

where in the first step we estimated the summation over the sets 𝐴1 , … , 𝐴𝑝 by a combinatorial factor 𝐶𝑝 depending on 𝑝, in the fourth step we used the elementary inequality 𝑎𝑛 𝑏𝑚 ≤ (𝑎 + 𝑏)𝑛+𝑚 for positive 𝑎, 𝑏, and in the last step we used (8.20) and the bound 𝑀 ≤ 𝑁. Thus, we have proved (10.22), from which the claim follows exactly as in the first proof of (8.47). What remains is the proof of (10.28). The case |𝐴| = 1 (corresponding to 𝐴 = {𝑘}) follows from (10.10), exactly as in the first proof of (8.47). To simplify notation, for the case |𝐴| ≥ 2 we assume that 𝑘 = 1 and 𝐴 = {1, … , 𝑡} with 𝑡 ≥ 2. It suffices to prove that | 1 | |𝑄𝑡 ⋯ 𝑄2 | ≺ Ψ𝑡o . 𝐺11 | |

(10.29)

We start by writing, using the first decoupling formula (8.1), 𝑄2

𝐺12 𝐺21 𝐺12 𝐺21 1 1 + 𝑄2 = 𝑄2 , = 𝑄2 (2) (2) (2) 𝐺11 𝐺11 𝐺11 𝐺11 𝐺22 𝐺11 𝐺11 𝐺22 (2)

where the first term vanishes since 𝐺11 is independent of 2 (see Definition 7.3). We now consider 𝐺12 𝐺21 1 = 𝑄2 𝑄3 , 𝑄3 𝑄2 (2) 𝐺11 𝐺 𝐺 𝐺 11 11

22

and apply again the first decoupling formula (8.1) with 𝑘 = 3 to each resolvent entry on the right-hand side,and multiply everything out. The result is a sum of fractions of entries of 𝐺, whereby all entries in the numerator are diagonal and all entries in the denominator are diagonal. The leading-order term vanishes, (3)

𝑄2 𝑄3

(3)

𝐺12 𝐺21 (3)

(23)

(3)

= 0,

𝐺11 𝐺11 𝐺22

so that the surviving terms have at least three (off-diagonal) resolvent entries in the numerator. We may now continue in this manner; at each step the number of (off-diagonal) resolvent entries in the numerator increases by at least 1. More formally, we obtain a sequence 𝐴2 , … , 𝐴𝑡 , where 𝐴2 ∶= 𝑄2

𝐺12 𝐺21 (2)

𝐺11 𝐺11 𝐺22

and 𝐴𝑖 is obtained by applying (8.1) with 𝑘 = 𝑖 to each entry of 𝑄𝑖 𝐴𝑖−1 and keeping only the nonvanishing terms. The following properties are easy to check by induction: (i) 𝐴𝑖 = 𝑄𝑖 𝐴𝑖−1 . (ii) 𝐴𝑖 consists of the projection 𝑄2 ⋯ 𝑄𝑖 applied to a sum of fractions such that all entries in the numerator are off-diagonal and all entries in the denominator are diagonal.

10.3. ALTERNATIVE PROOF OF (8.47) OF LEMMA 8.9

97

(iii) The number of (off-diagonal) entries in the numerator of each term of 𝐴𝑖 is at least 𝑖. By Lemma 10.1 combined with (ii) and (iii) we conclude that |𝐴𝑖 | ≺ Ψo𝑖 . From (i) we therefore get 1 𝑄𝑡 ⋯ 𝑄2 = 𝐴𝑡 = 𝑂≺ (Ψ𝑡o ). 𝐺11 This is (10.29). Hence the proof is complete. □ 10.3.1. History of the fluctuation averaging. Lemma 8.9 is a version of the fluctuation averaging mechanism, taken from [55], that is the most useful for our purpose in proving Theorem 6.7. We now briefly comment on its history. The first version of the fluctuation averaging mechanism appeared in [68] for the Wigner case, where [𝑍] = 𝑁 −1 ∑𝑘 𝑍𝑘 was bounded by Λ2o (recall 𝑍𝑖 from (8.11)). Since 𝑄𝑘 [𝐺𝑘𝑘 ]−1 is essentially 𝑍𝑘 (see (8.7)), this corresponds to the first bound in (8.46). A different proof (with a better bound on the constants) was given in [70]. A conceptually streamlined version of the original proof was extended to sparse matrices [56] and to sample covariance matrices [114]. Finally, an extensive analysis in [52] treated the fluctuation averaging of general polynomials of resolvent entries and identified the order of cancellations depending on the algebraic structure of the polynomial. Moreover, in [52] an additional cancellation effect was found for the quantity 𝑄𝑖 |𝐺𝑖𝑗 |2 . All proofs of the fluctuation averaging lemmas rely on computing expectations of high moments of the averages and carefully estimating various terms of different combinatorial structure. In [52] we have developed a Feynman diagrammatic representation for bookkeeping the terms, but this is necessary only for the case of general polynomials. For the special cases stated in Lemma 8.9, the proof presented here is relatively simple.

CHAPTER 11

Eigenvalue Location: The Rigidity Phenomenon The local semicircle law in Theorem 6.7 was proven for universal Wigner matrices, characterized by the upper bound 𝑠𝑖𝑗 ≤ 1/𝑀 on the variances of ℎ𝑖𝑗 . This result implies rigidity estimates on the location of the eigenvalues with a precision depending on 𝑀; e.g., in the bulk spectrum the eigenvalues can typically be located with a precision slightly above 1/𝑀. For the sake of simplicity, from now on we restrict our presentation to the special case of generalized Wigner matrices characterized by 𝐶inf /𝑁 ≤ 𝑠𝑖𝑗 ≤ 𝐶sup /𝑁 with two positive constants 𝐶inf and 𝐶sup (see Definition 2.1). So, from now on the parameter 𝑀 will be replaced with 𝑁. For rigidity results in the general case, see theorem 7.6 in [55]. In this section we will show that eigenvalues for generalized Wigner matrices are quite rigid; they may fluctuate only on a scale slightly above 1/𝑁. This is a manifestation that the eigenvalues are strongly correlated; the typical fluctuation scale of 𝑁 independent, say Poisson, points in a finite interval would be 𝑁 −1/2 . 11.1. Extreme Eigenvalues We first state a corollary to Theorem 6.7 on the extreme eigenvalues. Corollary 11.1. For any generalized Wigner matrix, the largest eigenvalue of 𝐻 is bounded by 2 + 𝑁 −2/3+𝜀 for any 𝜀 > 0 in the sense that for any 𝜀 > 0 and any 𝐷 ≥ 1 we have (11.1)

ℙ( max |𝜆𝛼 | ≥ 2 + 𝑁 −2/3+𝜀 ) ≤ 𝑁 −𝐷 𝛼=1,…,𝑁

for any 𝑁 ≥ 𝑁0 (𝜀, 𝐷). Proof. Set 𝜂 = 𝑁 −2/3 and choose an energy 𝐸 = 2 + 𝜅 outside of the spectrum with some 𝜅 ≥ 𝑁 −2/3+𝜀 ≫ 𝑁 𝜀/2 𝜂. From (6.33) with 𝑧 = 𝐸 + 𝑖𝜂, we have (11.2)

|Im 𝑚𝑁 (𝑧) − Im 𝑚sc (𝑧)| ≤

𝑁 𝜀/2 1 ≪ 𝑁𝜅 𝑁𝜂

with very high probability. On the other hand, if there is an eigenvalue 𝜆 with |𝜆 − 𝐸| ≤ 𝜂, then we have (11.3)

Im 𝑚𝑁 (𝑧) ≥

1 𝑁 𝜀/2 ≫ Im 𝑚sc (𝑧) + . 2𝑁𝜂 𝑁𝜅 99

100

11. EIGENVALUE LOCATION: THE RIGIDITY PHENOMENON

Here, the first inequality comes from 𝜂 𝜂 1 1 Im 𝑚𝑁 (𝑧) = ∑ ≥ , 2 2 𝑁 𝛼 |𝜆𝛼 − 𝐸| + 𝜂 𝑁 |𝜆 − 𝐸|2 + 𝜂 2 while the second follows from Im 𝑚sc (𝑧) ≍ 𝜂/√𝜅 from (6.13). The equations (11.2) and (11.3), however, contradict each other, showing that with very high probability there is no eigenvalue with |𝜆 − 𝐸| ≤ 𝑁 −2/3 . We can repeat this argument for a grid of energies 𝐸 = 𝐸𝑘 = 2+𝑁 −2/3+𝜀 +𝑘𝑁 −2/3 for 𝑘 = 0, 1, … , 𝐶𝑁 2/3+𝐷 for any 𝐷 finite. We can then use the union bound for the exceptional sets of small probabilities to exclude the existence of eigenvalues between 𝑁 −2/3+𝜀 and 𝑁 𝐷 . Finally, for a large enough 𝐷, it is trivial to prove that the bound ‖𝐻‖ ≤ ∑𝑖𝑗 |ℎ𝑖𝑗 | ≤ 𝑁 𝐷 holds with very high probability. This excludes eigenvalues |𝜆| ≥ 𝑁 𝐷 and proves the corollary. □ 11.2. Stieltjes Transform and Regularized Counting Function For any signed measure ˜ 𝜌 on the real line, define its Stieltjes transform by (11.4)

𝑆(𝑧) = ∫ ℝ

˜ 𝜌 (𝜆) d𝜆, 𝜆−𝑧

𝑧 ∈ ℂ ⧵ ℝ.

The Helffer-Sjöstrand formula (originally used to develop an alternative functional calculus for self-adjoint operators (see, e.g., [36])) yields an expression for ∫ℝ 𝑓(𝜆) ˜ 𝜌 (𝜆)d𝜆 for a large class of functions 𝑓(𝜆) on the real line in terms of the Stieltjes transform of ˜ 𝜌 . More precisely, let 𝑓 ∈ 𝐶 1 (ℝ) with compact support and let 𝜒(𝑦) be a smooth cutoff function with support in [−1, 1], with 𝜒(𝑦) = 1 for |𝑦| ≤ 12 and with bounded derivatives. Define ˜ (𝑥 + 𝑖𝑦) ∶= (𝑓(𝑥) + 𝑖𝑦𝑓 ′ (𝑥))𝜒(𝑦) 𝑓 to be an almost-analytic extension of 𝑓. With the standard convention 𝑧 = 𝑥+𝑖𝑦 and 𝜕𝑧 = [𝜕𝑥 + 𝑖𝜕𝑦 ]/2, we have (11.5)

𝑓(𝜆) =

˜ (𝑥 + 𝑖𝑦) 𝜕 𝑓 1 ∫ 𝑧 d𝑥 d𝑦. 𝜋 ℝ2 𝜆 − 𝑥 − 𝑖𝑦

To see this identity, we use 𝜕𝑧 (𝜆 − 𝑧)−1 = 0 to write (11.6)

˜ (𝑥 + 𝑖𝑦) ˜ (𝑥 + 𝑖𝑦) 𝜕 𝑓 𝑓 1 1 ∫ 𝑧 ∫ 𝜕𝑧 [ d𝑥 d𝑦 = ]d𝑧 d𝑧. 𝜋 ℝ2 𝜆 − 𝑥 − 𝑖𝑦 2𝜋𝑖 ℝ2 𝜆 − 𝑥 − 𝑖𝑦

We can rewrite the last term as (11.7)

˜ (𝑥 + 𝑖𝑦) 𝑓 1 ∫ d[ d𝑧], 2𝜋𝑖 ℝ2 𝜆 − 𝑥 − 𝑖𝑦

where the operator d is the differential in the sense of a 1-form. From the Green ˜ , we can integrate by parts to a small circle theorem and the compact support of 𝑓

11.2. STIELTJES TRANSFORM AND REGULARIZED COUNTING FUNCTION

101

𝐶𝜀 of radius 𝜀 around 𝜆. The contribution of the two-dimensional integration inside the circle will vanish in the limit 𝜀 → 0. Hence, the last term is equal to ˜ (𝑥 + 𝑖𝑦) 𝑓 1 ∫ (11.8) lim d𝑧 = 𝑓(𝜆), 𝜆 − 𝑥 − 𝑖𝑦 𝜀→0 2𝜋𝑖 𝐶 𝜀

and this proves (11.5). Computing the 𝜕𝑧̄ in (11.5) explicitly, we have (11.9)

𝑓(𝜆) =

𝑖𝑦𝑓 ″ (𝑥)𝜒(𝑦) + 𝑖(𝑓(𝑥) + 𝑖𝑦𝑓 ′ (𝑥))𝜒 ′ (𝑦) 1 ∫ d𝑥 d𝑦. 𝜋 ℝ2 𝜆 − 𝑥 − 𝑖𝑦

˜ (𝑥 + 𝑖𝑦) ∶= We remark that the same formula holds if we simply defined 𝑓 ′ ˜ 𝑓(𝑥)𝜒(𝑦). The addition of the term 𝑖𝑦𝑓 (𝑥)𝜒(𝑦) is to make 𝜕𝑧 𝑓 (𝑥 + 𝑖𝑦) = 𝑂(𝑦) for |𝑦| small, which will be needed in the following estimates. In fact, the concept of almost-analytic functions can be extended to arbitrary order 𝑛. We could have ˜ (𝑥 + 𝑖𝑦) = 𝑂(|𝑦|𝑛 ), but we will not need this result as ˜ such that 𝜕𝑧 𝑓 defined 𝑓 it would not improve our estimate. The following lemma in a slightly less general form appeared in [59,68,69]. It shows how to translate estimates on the Stieltjes transform 𝑆(𝑧) in the regime Im 𝑧 ≥ ˜ 𝜂 to the regularized distribution function of ˜ 𝜚 . Given two real numbers, 𝐸1 < 𝐸2 , we wish to estimate ˜ 𝜚 ([𝐸1 , 𝐸2 ]), the ˜ 𝜚 -measure of the interval [𝐸1 , 𝐸2 ]. Since the integral of the sharp indicator function cannot be directly controlled, we need to smooth it out; the function 𝑓2 − 𝑓1 in the lemma below plays this role. The scale of the smoothing, denoted by ˆ 𝜂 below, must be larger than ˜ 𝜂. Another feature of this lemma is that the condition on 𝑆(𝑧) = 𝑆(𝐸 + 𝑖𝜂) is local; only 𝐸-values in a ˜ 𝜂 -neighborhood of the endpoints 𝐸1 , 𝐸2 are needed. Lemma 11.2. Let ˜ 𝜚 be a signed measure on ℝ, and let 𝑆 be its Stieltjes transform. Fix two energies 𝐸1 < 𝐸2 . Suppose that for some 0 < ˜ 𝜂 ≤ 12 and 0 < 𝑈1 ≤ 𝑈2 we have 𝑈 |𝑆(𝑥 + i𝑦)| ≤ 1 for any 𝑥 ∈ [𝐸1 , 𝐸1 + ˜ 𝜂 ] ∪ [𝐸2 , 𝐸2 + ˜ 𝜂] (11.10) 𝑦 and ˜ 𝜂 ≤ 𝑦 ≤ 1, 𝑈1 1 for any 𝑥 ∈ [𝐸1 , 𝐸2 + ˜ 𝜂 ] and ≤ 𝑦 ≤ 1, 𝑦 2 𝑈 𝜂 ] ∪ [𝐸2 , 𝐸2 + ˜ 𝜂] |Im 𝑆(𝑥 + 𝑖𝑦)| ≤ 2 for any 𝑥 ∈ [𝐸1 , 𝐸1 + ˜ (11.12) 𝑦 and 0 < 𝑦 < ˜ 𝜂. For ˆ 𝜂 ≥ ˜ 𝜂 , define two functions 𝑓𝑗 = 𝑓𝐸𝑗 ,ˆ 𝜂 ∶ ℝ → ℝ such that 𝑓𝑗 (𝑥) = 1 for (11.11)

|Im 𝑆(𝑥 + 𝑖𝑦)| ≤

𝑥 ∈ (−∞, 𝐸𝑗 ], 𝑓𝑗 (𝑥) vanishes for 𝑥 ∈ [𝐸𝑗 + ˆ 𝜂 , ∞); moreover, |𝑓𝑗′ (𝑥)| ≤ 𝐶 ˆ 𝜂 and |𝑓𝑗″ (𝑥)| ≤ 𝐶 ˆ 𝜂

−2

for some constant 𝐶. Then, for some other constant 𝐶 > 0

independent of 𝑈1 , 𝑈2 , and ˜ 𝜂 we have (11.13)

−1

2 | | ˜ 𝜂 |∫(𝑓2 − 𝑓1 )(𝜆) ˜ 𝜚 (𝜆)d𝜆| ≤ 𝐶‖ ˜ 𝜚 ‖TV [𝑈1 |log ˜ 𝜂 | + 𝑈2 2 ] | | ˆ 𝜂

102

11. EIGENVALUE LOCATION: THE RIGIDITY PHENOMENON

where ‖ ⋅ ‖TV denotes the total variation norm of signed measures. In addition, if we assume that ˜ 𝜌 has compact support, then we also have 2 | | ˜ 𝜂 |∫ 𝑓2 (𝜆) ˜ 𝜚 (𝜆)d𝜆| ≤ 𝐶‖ ˜ 𝜚 ‖TV [𝑈1 |log ˜ 𝜂 | + 𝑈2 2 ] | | ˆ 𝜂

(11.14)

where 𝐶 depends on the size of the support of ˜ 𝜚. Proof. In this proof, we will consider only the case ˜ 𝜂 = ˆ 𝜂 and 𝑈1 = 𝑈2 . We will denote these common values by 𝜂 and 𝑈. We may also assume ‖˜ 𝜚 ‖TV = 1. These simplifications only streamline some notation; the general case is proven exactly in the same way. Let 𝑓 = 𝑓2 − 𝑓1 be a function that is smooth and compactly supported. From the representation (11.9) and since 𝑓 is real, we have ∞



| | | | |∫ 𝑓(𝜆) ˜ 𝜌 (𝜆)d𝜆| = |Re ∫ 𝑓(𝜆) ˜ 𝜌 (𝜆)d𝜆| | −∞ | | | −∞ | | ≤ 𝐶 |∬ 𝑦𝑓 ″ (𝑥)𝜒(𝑦) Im 𝑆(𝑥 + i𝑦)d𝑥 d𝑦 | | |

(11.15)

+ 𝐶 ∬ (|𝑓(𝑥)||𝜒 ′ (𝑦)||Im 𝑆(𝑥 + i𝑦)| + |𝑦||𝑓 ′ (𝑥)||𝜒 ′ (𝑦)||Re 𝑆(𝑥 + i𝑦)|)d𝑥 d𝑦 for some universal 𝐶 > 0 and where 𝜒 is a smooth cutoff function with support in [−1, 1] with 𝜒(𝑦) = 1 for |𝑦| ≤

1 2

and with bounded derivatives. Recall that 𝑓 ′

is O(𝜂−1 ) on two intervals of size O(𝜂) each and 𝜒 ′ is supported in

1 2

≤ |𝑦| ≤ 1.

By (11.10), we can bound (11.16)

∬ |𝑦||𝑓 ′ (𝑥)||𝜒 ′ (𝑦)||Re 𝑆(𝑥 + i𝑦)|d𝑥 d𝑦 ≤ 𝐶𝑈

(note that due to 𝑆(𝑥 + 𝑖𝑦) = 𝑆(𝑥 − 𝑖𝑦), the bounds analogous to (11.10)–(11.11) also hold for negative 𝑦’s). Using that 𝑓 is bounded with compact support, we have, by (11.11), (11.17)

∬ |𝑓(𝑥)||𝜒 ′ (𝑦)||Im 𝑆(𝑥 + i𝑦)|d𝑥 d𝑦 ≤ 𝐶𝑈.

For the first term on the right-hand side of (11.15), we split the integral into two regimes depending on 0 < 𝑦 < 𝜂 or 𝜂 < 𝑦 < 1. Note that by symmetry we only need to consider positive 𝑦. From (11.12), the integral on the first

11.3. CONVERGENCE SPEED OF THE EMPIRICAL DISTRIBUTION FUNCTION

103

integration regime is easily bounded by | | |∬ 𝑦𝑓 ″ (𝑥)𝜒(𝑦) Im 𝑆(𝑥 + i𝑦)d𝑥 d𝑦 | | | 0 0 we know that 1 (11.23) |𝑚𝑁 (𝑥 + i𝜂) − 𝑚sc (𝑥 + i𝜂)| ≺ 𝑁𝜂 holds for any |𝑥 − 𝐸| ≤ ˜ 𝜂 and any 𝜂 ∈ [ ˜ 𝜂 , 10]. Then, we have (11.24)

|𝔫𝑁 (𝐸) − 𝑛sc (𝐸)| ≺ ˜ 𝜂.

For generalized Wigner matrices, the condition (11.23) holds with the choice ˜ 𝜂 = 𝑁 −1+𝜀 uniformly in |𝐸| ≤ 10 (see (6.29) and (6.32)). Thus, we have proved the following corollary: Corollary 11.4. For generalized Wigner matrices, we have (11.25)

sup |𝔫𝑁 (𝐸) − 𝑛sc (𝐸)| ≤ 𝑁 −1+𝜀

|𝐸|≤10

with probability at least 1 − 𝑁 −𝐷 for any 𝐷, 𝜀 > 0 if 𝑁 ≥ 𝑁0 (𝜀, 𝐷) is sufficiently large. Proof of Lemma 11.3. Since both the semicircle measure and the empirical measure (by Corollary 11.1) have a compact support within [−3, 3] with a very high probability, (11.24) clearly holds for 𝐸 ≤ −3 and for 𝐸 ≥ 3. For an energy |𝐸| ≤ 3, we will apply Lemma 11.2 with the choice ˜ 𝜌 = 𝜌∆ , 𝑈1 = 𝑈2 = 𝜀 𝜂 , and 𝐸2 = 𝐸. The estimate (11.14) will immediately imply (11.24) provided 𝑁 ˜ the other assumptions on Lemma 11.2 are satisfied. For this purpose, we need the bounds (11.10), (11.11), and (11.12) on the Stieltjes transform with the choice 𝑈1 = 𝑈2 = 𝑁 𝜀 ˜ 𝜂 . We denote this common value by 𝑈 ∶= 𝑁 𝜀 ˜ 𝜂 . The assumption (11.23) implies (11.10) (with very high probability) if we choose 𝑈 ≥ 𝑁 −1+𝜀 . From Lemma 6.2, we find (11.26)

|Im 𝑚sc (𝑥 + i𝑦)| ≤ 𝐶√𝜅𝑥 + 𝑦,

recalling the definition 𝜅𝑥 = min{|𝑥 − 2|, |𝑥 + 2|} from (6.23). By spectral decomposition of 𝐻, it is easy to see that the function 𝑦 ↦ 𝑦 Im 𝑚𝑁 (𝑥 + i𝑦) is

11.3. CONVERGENCE SPEED OF THE EMPIRICAL DISTRIBUTION FUNCTION

105

monotone increasing. Thus, we get, using (11.26) and (11.23), that 1 ) 𝑁˜ 𝜂 1 ≺ ˜ 𝜂 + 𝜂 √𝜅𝑥 + ˜ 𝑁

𝑦 Im 𝑚𝑁 (𝑥 + i𝑦) ≤ ˜ 𝜂 Im 𝑚𝑁 (𝑥 + i ˜ 𝜂) ≺ ˜ 𝜂 (√𝜅𝑥 + ˜ 𝜂 + (11.27)

for 𝑦 ≤ ˜ 𝜂 and |𝑥| ≤ 10. Using the notation 𝑚∆ ∶= 𝑚𝑁 − 𝑚sc and recalling (11.26), we therefore get |𝑦 Im 𝑚∆ (𝑥 + i𝑦)| ≺ ˜ 𝜂 √𝜅𝑥 + ˜ 𝜂 +

(11.28)

1 𝜂 ≤ 𝐶˜ 𝑁

for 𝑦 ≤ ˜ 𝜂 and |𝑥| ≤ 10; here we have used the assumption ˜ 𝜂 ≥ 𝑁1 in the last step. Therefore, with the choice 𝑈 = ˜ 𝜂 𝑁 𝜀 , the bounds (11.10), (11.11), and (11.12) indeed hold. Recall from Lemma 11.2 that 𝑓𝐸,𝜂˜ denotes a smooth indicator function of [−∞, 𝐸] on scale ˜ 𝜂 , namely, 𝑓𝐸,𝜂˜ (𝑥) = 1 for 𝑥 ∈ [−∞, 𝐸] and 𝑓𝐸,𝜂˜ (𝑥) = 0 for −1 𝑥≥𝐸+ ˜ 𝜂 and having derivative bounded by ˜ 𝜂 . Using (11.14) from Lemma 11.2, we conclude that | | |∫ 𝑓𝐸,𝜂˜ (𝜆)(𝜆)𝜚∆ (𝜆)d𝜆| ≺ ˜ 𝜂. | |

(11.29) Hence, we have

𝔫𝑁 (𝐸) − 𝑛sc (𝐸) − ∫ 𝑓𝐸,𝜂˜ (𝜆) 𝜚∆ (𝜆)d𝜆 = ˜ 𝐸+𝜂

˜ 𝐸+𝜂

−∫

𝑓𝐸,𝜂˜ (𝜆) 𝜚𝑁 (𝜆)d𝜆 + ∫

𝐸

𝐸

𝑓𝐸,𝜂˜ (𝜆)𝜚sc (𝜆)d𝜆 ≤ 𝐶 ˜ 𝜂.

Notice that we only used the positivity of 𝜚𝑁 in the first term and the boundedness of 𝑓 and 𝜚sc in the second. Conversely, we have 𝔫𝑁 (𝐸) − 𝑛sc (𝐸) − ∫ 𝑓𝐸−𝜂˜,𝜂˜ (𝜆)𝜚∆ (𝜆)d𝜆 = 𝐸



𝐸

(1 − 𝑓𝐸−𝜂˜,𝜂˜ )(𝜆)𝜚𝑁 (𝜆)d𝜆 − ∫

˜ 𝐸−𝜂

(1 − 𝑓𝐸−𝜂˜,𝜂˜ )(𝜆)𝜚sc (𝜆)d𝜆 ≥ −𝐶 ˜ 𝜂.

˜ 𝐸−𝜂

Hence we have proved that −𝐶 ˜ 𝜂 + ∫ 𝑓𝐸−𝜂˜,𝜂˜ (𝜆)𝜚∆ (𝜆)d𝜆 ≤ 𝔫𝑁 (𝐸) − 𝑛sc (𝐸) ≤ ∫ 𝑓𝐸,𝜂˜ (𝜆)𝜚∆ (𝜆)d𝜆 + 𝐶 ˜ 𝜂. The right-hand side is directly bounded by (11.29); the left-hand side is estimated in the same way by using (11.29) for 𝑓𝐸−𝜂˜,𝜂˜ instead of 𝑓𝐸,𝜂˜ . This proves □ Lemma 11.3.

106

11. EIGENVALUE LOCATION: THE RIGIDITY PHENOMENON

11.4. Rigidity of Eigenvalues We now state and prove the rigidity theorem concerning the eigenvalue locations for generalized Wigner matrices. Recall the definition of 𝜚𝑁 from (11.22). Hence we have 𝜆

𝑗 𝑗 = ∫ 𝜚𝑁 (𝑥)d𝑥. 𝑁 −∞

(11.30)

We define the classical location of the 𝑗th eigenvalue by the equation 𝛾

𝑗 𝑗 = ∫ 𝜚sc (𝑥)d𝑥; 𝑁 −∞

(11.31)

in other words, 𝛾𝑗 is the 𝑗th 𝑁-quantile of the semicircle distribution. Theorem 11.5 (Rigidity of eigenvalues). For any generalized Wigner matrix ensemble, for any constant 𝜀 > 0 and any constant 𝐷 ≥ 1 we have (11.32)

ℙ{∃𝑗 ∶ |𝜆𝑗 − 𝛾𝑗 | ≥ 𝑁 𝜀 [min(𝑗, 𝑁 − 𝑗 + 1)]−1/3 𝑁 −2/3 } ≤ 𝑁 −𝐷

for any sufficiently large 𝑁 ≥ 𝑁0 (𝜀, 𝐷). Proof. We consider only the case 𝑗 ≤ 𝑁/2; the other case is similar. By (11.30) and (11.31) we have 𝜆𝑗

0=∫ (11.33)

𝛾𝑗

𝜌𝑁 (𝑥)d𝑥 − ∫ 𝜌sc (𝑥)d𝑥

−∞ 𝜆𝑗

−∞ 𝛾𝑗

= ∫ (𝜌𝑁 (𝑥) − 𝜌sc (𝑥))d𝑥 − ∫ 𝜚sc (𝑥)d𝑥, −∞

𝜆𝑗

i.e., | 𝛾𝑗 | | 𝜆𝑗 | |∫ 𝜚sc (𝑥)d𝑥 | ≤ |∫ (𝜌𝑁 (𝑥) − 𝜌sc (𝑥))d𝑥| = |𝔫𝑁 (𝜆𝑗 ) − 𝑛sc (𝜆𝑗 )| | 𝜆𝑗 | | −∞ | and thus by the uniform bound (11.25) | 𝛾𝑗 | 1 |∫ 𝜚sc (𝑥)d𝑥| ≺ . 𝑁 | 𝜆𝑗 |

(11.34)

Since 𝜚sc (𝑥) ≍ √𝜅𝑥 for 𝑥 ∈ [−2, 1], we have (11.35)

𝑛sc (𝑥) ≍ 𝜅𝑥3/2 ,

1/3 ′ (𝑥), (𝑥) ≍ 𝑛sc 𝜚sc (𝑥) = 𝑛sc

𝑥 ∈ [−2, 1].

This also implies for 𝑗 ≤ 𝑁/2 that 2/3

1/3

𝑗 𝑗 ) , 𝜚sc (𝛾𝑗 ) ≍ ( ) . 𝑁 𝑁 If we knew that 𝜚sc (𝛾𝑗 ) and 𝜚sc (𝜆𝑗 ) were comparable, then the monotonicity of 𝜚sc and the mean value theorem (11.34) would immediately imply

(11.36)

(11.37)

𝛾𝑗 + 2 ≍ (

|𝜆𝑗 − 𝛾𝑗 |𝜌sc (𝛾𝑗 ) ≺ 𝑁 −1 .

11.4. RIGIDITY OF EIGENVALUES

107

Combining this with the second asymptotics form (11.36), we would have (11.38)

|𝜆𝑗 − 𝛾𝑗 | ≺ 𝑁 −1 𝜚sc (𝛾𝑗 )−1 ≤ 𝐶𝑁 −1 (𝑗/𝑁)−1/3 .

For the complete proof, we first consider the indices 𝑗 ≥ 𝑗0 ∶= 𝑁 𝜀/2 . Since 𝑛sc (𝛾𝑗 ) ≥ 𝑁 −1+𝜀/2 and 𝑛sc (𝛾𝑗 ) = 𝔫𝑁 (𝜆𝑗 ), we have |𝑛sc (𝜆𝑗 ) − 𝑛sc (𝛾𝑗 )| ≤ 𝑁 −𝜀/4 𝑛sc (𝛾𝑗 ) with very high probability by (11.25). This shows that 𝑛sc (𝜆𝑗 ) ≍ 𝑛sc (𝛾𝑗 ), but then 𝜚sc (𝜆𝑗 ) ≍ 𝜚sc (𝛾𝑗 ) by (11.35), as we presumed above. Finally, for indices 𝑗 ≤ 𝑗0 = 𝑁 𝜀/2 we use monotonicity and rigidity (11.38) for the index 𝑗0 : 𝜆𝑗 ≤ 𝜆𝑗0 ≤ 𝛾𝑗0 + 𝑁 −2/3+𝜀/2 ≤ −2 + 𝑁 −2/3+𝜀 with very high probability. In the last step we used (11.36). For the lower bound on 𝜆𝑗 we refer to (11.1). This shows that |𝜆𝑗 + 2| ≤ 𝑁 −2/3+𝜀 with very high probability, and we also have |𝛾𝑗 + 2| ≤ 𝑁 −2/3+𝜀 , so |𝜆𝑗 − 𝛾𝑗 | ≤ 2𝑁 −2/3+𝜀 . This proves the rigidity estimate (11.32) for all indices 𝑗. □

CHAPTER 12

Universality for Matrices with Gaussian Convolutions 12.1. Dyson Brownian Motion Consider the following matrix-valued stochastic differential equation (12.1)

d𝐻(𝑡) =

1 √𝑁

dB(𝑡) −

1 𝐻(𝑡)d𝑡, 2

𝑡 ≥ 0,

with initial data 𝐻0 where B(𝑡) is defined somewhat differently in the two symmetry classes, namely: (i) In the real symmetric case (indicated by superscript), B(s) is an 𝑁 × 𝑁 (s) (s) matrix such that 𝑏𝑖𝑗 for 𝑖 < 𝑗 and 𝑏𝑖𝑖 /√2 are independent standard (s)

(s)

Brownian motions and 𝑏𝑖𝑗 = 𝑏𝑗𝑖 . (ii) In the complex Hermitian case, B(h) is an 𝑁 × 𝑁 matrix such that (h) (h) (h) √2 Re(𝑏𝑖𝑗 ), √2 Im(𝑏𝑖𝑗 ) for 𝑖 < 𝑗 and 𝑏𝑖𝑖 are independent standard (h)

(h)

Brownian motions and 𝑏𝑗𝑖 = (𝑏𝑖𝑗 )∗ . We will drop the s or h superscript; the following formulas hold for both cases. In coordinates, we have

(12.2)

dℎ𝑖𝑗 (𝑡) =

d𝑏𝑖𝑗 (𝑡) √𝑁



1 ℎ (𝑡)d𝑡 2 𝑖𝑗

where the entries 𝑏𝑖𝑗 (𝑡) have variance 𝑡 in the Hermitian case and in the symmetric case the off-diagonal entries (𝑏𝑖𝑗 (𝑡)) have variance 𝑡, while the diagonal entries have variance 2𝑡. The equation (12.1) defines a matrix-valued Ornstein-Uhlenbeck (OU) process. It plays a distinguished role in the theory of random matrices that was discovered by Dyson. Depending on the typographical convenience, the time dependence will sometimes be indicated as an argument, 𝐻(𝑡), and sometimes as an index, 𝐻𝑡 ; we keep both notations in parallel, i.e., 𝐻(𝑡) = 𝐻𝑡 . In particular, unlike in some PDE literature, subscripts do not mean derivatives. 109

110

12. UNIVERSALITY FOR MATRICES WITH GAUSSIAN CONVOLUTIONS

Next, we are concerned about the evolution of the eigenvalues 𝝀(𝑡) = (𝜆1 (𝑡), … , 𝜆𝑁 (𝑡)) of 𝐻𝑡 along the OU flow. We label the eigenvalues in an increasing order; i.e., we assume that 𝝀(𝑡) ∈ Σ𝑁 , where we define the open simplex Σ𝑁 ∶= {𝝀 ∈ ℝ𝑁 ∶ 𝜆1 < ⋯ < 𝜆𝑁 }.

(12.3)

A theorem below will guarantee that the eigenvalues are simple and are continuous functions of 𝑡, hence the labeling is consistently preserved along the evolution. In principle, the eigenvalues {𝜆𝑗 (𝑡)} and eigenvectors {𝐮𝑗 (𝑡)} of 𝐻𝑡 are strongly related, and one would expect a coupled system of stochastic differential equations for them. It was Dyson’s fundamental observation [45] that the eigenvalues themselves satisfy an autonomous system of stochastic differential equations (SDE) that do not involve the eigenvectors. This SDE is given in the following definition. Definition 12.1. Given a real parameter 𝛽 ≥ 1, consider the following system of SDE: (12.4)

d𝜆𝑖 =

√2 √𝛽𝑁

(𝑖)

d𝐵𝑖 + (−

𝜆𝑖 1 1 + ∑ )d𝑡, 2 𝑁 𝑗 𝜆𝑖 − 𝜆𝑗

𝑖 ∈ J1, 𝑁K

where (𝐵𝑖 ) is a collection of real-valued, independent standard Brownian motions. The solution of this SDE is called the Dyson Brownian motion (DBM) with parameter 𝛽. Notice that we defined the DBM for any 𝛽 ≥ 1 and not only for the classical values 𝛽 = 1, 2 that will correspond to an OU matrix flow. The following theorem summarizes the main properties of the DBM. For the proof, see lemma 4.3.3 and proposition 4.3.5 of [8]. We remark that the authors in [8] considered (12.4) without the drift term −𝜆𝑖 /2. However, any drift term of the form −𝑉 ′ (𝜆𝑖 ) with 𝑉 ∈ 𝐶 2 (ℝ) could be added without further complications. Theorem 12.2. Let 𝛽 ≥ 1 and suppose that the initial data satisfy 𝝀(0) ∈ Σ𝑁 . Then there exists a unique (strong) solution to (12.4) in the space of continuous functions (𝝀(𝑡))𝑡≥0 ∈ 𝐶(ℝ+ , Σ𝑁 ). Furthermore, for any 𝑡 > 0 we have 𝝀(𝑡) ∈ Σ𝑁 , and 𝝀(𝑡) depends continuously on 𝝀(0). In particular, if 𝝀(0) ∈ Σ𝑁 ; i.e., the multiplicity of the initial points is 1, then (𝝀(𝑡))𝑡≥0 ∈ 𝐶(ℝ+ , Σ𝑁 ); i.e., this property is preserved for all times along the evolution. We point out that the solution is considered in the strong sense. This means that despite the singularity in (12.4), the DBM admits a solution for almost all realizations of the Brownian motions 𝐵𝑖 . For the precise definition, we recall the concept of strong solution to a (scalar) SDE of the form (12.5)

d𝑋𝑡 = 𝑎(𝑋𝑡 )d𝐵𝑡 + 𝑏(𝑋𝑡 )d𝑡,

𝑋0 = 𝜉, 𝑡 ≥ 0,

12.1. DYSON BROWNIAN MOTION

111

with respect to a fixed realization of a Brownian motion 𝐵𝑡 where 𝑎 and 𝑏 are given coefficient functions. The strong solution is a process (𝑋𝑡 )𝑡≥0 on a filtrated probability space (Ω, ℱ𝑡 ) with continuous sample paths that satisfies the following properties: • 𝑋𝑡 is adapted to the filtration ℱ𝑡 = 𝜎(𝒢𝑡 ∪ 𝒩) where 𝒢𝑡 = 𝜎(𝐵𝑠 , 𝑠 ≤ 𝑡, 𝑋0 ),

𝒩 ∶= {𝑁 ⊂ Ω, ∃𝐺 ⊂ 𝒢∞ with 𝑁 ⊂ 𝐺, ℙ(𝐺) = 0};

i.e., the filtration of 𝑋𝑡 is the same as that of 𝐵𝑡 after completing it with events of zero measure. • 𝑋0 = 𝜉 almost surely. 𝑡 • For any time 𝑡, we have ∫0 [|𝑏(𝑋𝑠 )|2 + |𝑎(𝑋𝑠 )|2 ]d𝑠 < ∞ almost surely. • The integral version of (12.5); i.e., 𝑡

𝑡

𝑋𝑡 = 𝑋0 + ∫ 𝑎(𝑋𝑠 )d𝐵𝑠 + ∫ 𝑏(𝑋𝑠 )d𝑠 0

0

holds almost surely for any 𝑡 ≥ 0. For simplicity, we only presented the definition for the scalar case, 𝑋𝑡 ∈ ℝ; the extension to the vector-valued case, 𝑋𝑡 ∈ ℝ𝑁 , is straightforward. The relation between the OU matrix flow (12.1) and the DBM (12.4) is given by the following theorem, essentially due to Dyson. Theorem 12.3 ([45]). Let 𝐻𝑡 solve the matrix-valued SDE (12.1) in a strong sense. Then its eigenvalue process satisfies (12.4) where the parameter 𝛽 = 1 if the matrix B is real symmetric and 𝛽 = 2 if B is complex Hermitian. We remark that the Ornstein-Uhlenbeck drift term − 12 𝐻𝑡 d𝑡 in (12.1) is not essential. One may replace it with −𝑉 ′ (𝐻)d𝑡 with any 𝑉 ∈ 𝐶 2 (ℝ) potential; then the same theorem holds but the −𝜆𝑖 /2 term has to be replaced with −𝑉 ′ (𝜆𝑖 ) in (12.4). In particular, the drift term can be removed; for example, the monograph [8] defines DBM without the drift term. While there is no fundamental difference between these formulations, (12.1) is technically slightly more convenient since it preserves not only the zero expectation value but also the 1/𝑁 variance of the matrix elements if the variance of the initial data 𝐻0 is 1/𝑁. If technical complications related to the singularities (𝜆𝑖 − 𝜆𝑗 )−1 in (12.4) could be neglected, the proof of Theorem 12.3 would be a relatively straightforward Itô calculus combined with standard perturbation formulas for eigenvalues and eigenvectors of self-adjoint matrices. We will present this calculation in the next section. A rigorous proof is slightly more cumbersome since a priori the 𝐿2 -integrability of the singularity is not guaranteed. The actual proof first regularizes the singular term on a very short scale. Then a stopping time argument shows that the regularization can be removed since the dynamics has an effective level repulsion mechanism that keeps neighboring points separated. We will not present this argument here since it is not particularly instructive for this book. The full details can be found in section 4.3.1 of [8] for the case

112

12. UNIVERSALITY FOR MATRICES WITH GAUSSIAN CONVOLUTIONS

when the term − 12 𝐻𝑡 d𝑡 is not present in (12.1). The necessary modifications for (12.1) (or for some other drift term −𝑉 ′ (𝜆𝑖 )) are straightforward. 12.2. Derivation of Dyson Brownian Motion and Perturbation Theory Let 𝐮𝛼 , 𝛼 = 1, … , 𝑁, be the ℓ2 -normalized eigenvectors of 𝐻 = (ℎ𝑖𝑗 ) with (real) eigenvalues 𝜆𝛼 . For simplicity, we assume that all eigenvalues are distinct. Fix an index pair (𝑖, 𝑗) and abbreviate the partial derivative as 𝑓 ̇ ∶=

𝜕𝑓 . 𝜕ℎ𝑖𝑗

Differentiating 𝐻𝐮𝛼 = 𝜆𝛼 𝐮𝛼 and 𝐮∗𝛼 𝐮𝛽 = 𝛿𝛼,𝛽 yields (12.6)

̇ 𝛼 + 𝐻 𝐮̇ 𝛼 = 𝜆𝛼̇ 𝐮𝛼 + 𝜆𝛼 𝐮̇ 𝛼 , 𝐻𝐮

as well as 𝐮̇ ∗𝛼 𝐮𝛽 + 𝐮∗𝛼 𝐮̇ 𝛽 = 0,

𝐮̇ ∗𝛼 𝐮𝛼 = 0.

Taking the inner product with 𝐮𝛼 on both sides of (12.6), we get ̇ 𝛼. 𝜆𝛼̇ = 𝐮∗𝛼 𝐻𝐮

(12.7)

Moreover, multiplying (12.6) by 𝐮∗𝛽 yields, for 𝛽 ≠ 𝛼, ̇ 𝛼 + 𝐮∗𝛽 𝐻 𝐮̇ 𝛼 = 𝜆𝛼 𝐮∗𝛽 𝐮̇ 𝛼 ; 𝐮∗𝛽 𝐻𝐮 i.e., ̇ 𝛼 + 𝜆𝛽 𝐮∗𝛽 𝐮̇ 𝛼 = 𝜆𝛼 𝐮∗𝛽 𝐮̇ 𝛼 𝐮∗𝛽 𝐻𝐮 where we have used that 𝐻 is self-adjoint and 𝜆𝛽 is real in the last equation. Hence, (12.8)

𝐮̇ 𝛼 =

∑ (𝐮∗𝛽 𝐮̇ 𝛼 )𝐮𝛽 𝛽≠𝛼

= ∑ 𝛽≠𝛼

̇ 𝛼 𝐮∗𝛽 𝐻𝐮 𝜆𝛼 − 𝜆𝛽

𝐮𝛽 .

For simplicity of notation, from now on we consider only the real symmetric case. The complex Hermitian case can be treated similarly. In the real symmetric case, (12.7) reads as (12.9)

𝜕𝜆𝛼 = 𝑢𝛼 (𝑖)𝑢𝛼 (𝑗)[2 − 𝛿𝑖𝑗 ], 𝜕ℎ𝑖𝑗

where 𝑢𝛼 (1), … , 𝑢𝛼 (𝑁) denote the coordinates of the vector 𝐮𝛼 . We may assume that the eigenvectors are real. From (12.8) we get (12.10)

𝑢𝛽 (𝑖)𝑢𝛼 (𝑗) + 𝑢𝛽 (𝑗)𝑢𝛼 (𝑖)[1 − 𝛿𝑖𝑗 ] 𝜕𝑢𝛼 (𝑘) = ∑ 𝑢𝛽 (𝑘). 𝜕ℎ𝑖𝑗 𝜆𝛼 − 𝜆 𝛽 𝛽≠𝛼

12.2. DERIVATION OF DYSON BROWNIAN MOTION AND PERTURBATION THEORY

113

Combining these last two formulas allows us to compute the second partial derivatives; i.e., for any fixed indices 𝑖, 𝑗, 𝑘, and ℓ we have 𝜕 2 𝜆𝛼 𝜕ℎℓ𝑗 𝜕ℎ𝑖𝑘 = [2 − 𝛿𝑖𝑘 ][

𝜕𝑢𝛼 (𝑖) 𝜕𝑢 (𝑘) 𝑢 (𝑘) + 𝑢𝛼 (𝑖) 𝛼 ] 𝜕ℎℓ𝑗 𝛼 𝜕ℎℓ𝑗

1 [(𝑢 (𝑗)𝑢𝛼 (ℓ) + 𝑢𝛽 (ℓ)𝑢𝛼 (𝑗)[1 − 𝛿𝑗ℓ ])𝑢𝛽 (𝑖)𝑢𝛼 (𝑘) 𝜆 − 𝜆𝛽 𝛽 𝛼 𝛽≠𝛼

= [2 − 𝛿𝑖𝑘 ] ∑

(12.11)

+ (𝑢𝛽 (ℓ)𝑢𝛼 (𝑗) + 𝑢𝛽 (𝑗)𝑢𝛼 (ℓ)[1 − 𝛿𝑗ℓ ])𝑢𝛽 (𝑘)𝑢𝛼 (𝑖)] 1 [(𝑢 (𝑗)𝑢𝛼 (ℓ) + 𝑢𝛽 (ℓ)𝑢𝛼 (𝑗)[1 − 𝛿𝑗ℓ ]) 𝜆 − 𝜆𝛽 𝛽 𝛼 𝛽≠𝛼

= [2 − 𝛿𝑖𝑘 ] ∑

⋅ (𝑢𝛽 (𝑖)𝑢𝛼 (𝑘) + 𝑢𝛽 (𝑘)𝑢𝛼 (𝑖))].

By (12.2), (12.9), (12.11), and using Itô’s formula (neglecting the issue of singularity), we have 𝜕𝜆𝛼 𝜕 2 𝜆𝛼 1 dℎ𝑖𝑘 + ∑ ∑ (dℎ𝑖𝑘 )(dℎℓ𝑗 ) 𝜕ℎ𝑖𝑘 2 𝑖≤𝑘 𝑗≤ℓ 𝜕ℎ𝑖𝑘 𝜕ℎℓ𝑗 𝑖≤𝑘

d𝜆𝛼 = ∑

= ∑ 𝑢𝛼 (𝑖)𝑢𝛼 (𝑘)[

d𝑏𝑖𝑘



ℎ𝑖𝑘 d𝑡] 2

√𝑁 1 1 ∑∑ + [|𝑢 (𝑖)|2 |𝑢𝛼 (𝑘)|2 + |𝑢𝛼 (𝑖)|2 |𝑢𝛽 (𝑘)|2 ]d𝑡 2𝑁 𝑖,𝑘 𝛽≠𝛼 𝜆𝛼 − 𝜆𝛽 𝛽 𝑖,𝑘

(12.12)

=

1 √𝑁

∑ 𝑢𝛼 (𝑖)𝑢𝛼 (𝑘)d𝑏𝑖𝑘 − 𝑖,𝑘

𝜆𝛼 1 1 ∑ d𝑡 + d𝑡. 2 𝑁 𝛽≠𝛼 𝜆𝛼 − 𝜆𝛽

In the second line we used 𝑏𝑖𝑘 = 𝑏𝑘𝑖 for 𝑖 ≠ 𝑘 and that (dℎ𝑖𝑘 )(dℎℓ𝑗 ) =

1 𝛿 𝛿 [1 + 𝛿𝑖𝑘 ]d𝑡. 𝑁 𝑖ℓ 𝑘𝑗

Finally, in the last line we used the equation 𝐻𝐮𝛼 = 𝜆𝛼 𝐮𝛼 to get ∑ 𝑢𝛼 (𝑖)𝑢𝛼 (𝑘)ℎ𝑖𝑘 = ∑ 𝑢𝛼 (𝑖)(𝐻𝐮𝛼 )(𝑖) = 𝜆𝛼 ∑ |𝑢𝛼 (𝑖)|2 = 𝜆𝛼 , 𝑖,𝑘

𝑖

𝑖

which gives the −(𝜆𝑖 /2)d𝑡 term in (12.4). In the first term on the right-hand side of (12.12) we define a new real Gaussian process ˜ 𝛼 ∶= ∑ 𝑢𝛼 (𝑖)𝑢𝛼 (𝑘)𝑏𝑖𝑘 . 𝐵 𝑖,𝑘

114

12. UNIVERSALITY FOR MATRICES WITH GAUSSIAN CONVOLUTIONS

˜ 𝛼 = 0 and its covariance satisfies Clearly, 𝔼d 𝐵 ˜ 𝛼d 𝐵 ˜ 𝛼′ = 𝔼 ∑ ∑ 𝑢𝛼 (𝑖)𝑢𝛼 (𝑘)d𝑏𝑖𝑘 𝑢𝛼′ (ℓ)𝑢𝛼′ (𝑗)d𝑏ℓ𝑗 𝔼d 𝐵 𝑖,𝑘 ℓ,𝑗

= 2 ∑ 𝑢𝛼 (𝑖)𝑢𝛼′ (𝑖)𝑢𝛼 (𝑘)𝑢𝛼′ (𝑘)d𝑡 𝑖,𝑘

= 𝛿𝛼𝛼′ 2 d𝑡 where there are two pairings, (𝑖, ℓ) = (𝑘, 𝑗) and (𝑖, ℓ) = (𝑗, 𝑘), that contributed ˜ 𝛼 = √2𝐵𝛼 where (𝐵𝛼 )𝑁 to the contractions. Thus 𝐵 𝛼=1 is the standard (real) 𝑁 Brownian motion in ℝ , and this gives the martingale term in (12.4) with 𝛽 = 1. A similar formula can be derived for the Hermitian case, where the parameter becomes 𝛽 = 2; we omit the details. In the next section we put the Dyson Brownian motion in a more general context that is closer to the interpretation of GUE as an invariant ensemble in the spirit of Chapter 4. It turns out that the measure on the eigenvalues, explicitly given in (4.3), generates a DBM in a canonical way such that this measure will be invariant under the dynamics. This holds for any values 𝛽 ≥ 1, i.e., even if there is no underlying matrix ensemble. 12.3. Strong Local Ergodicity of the Dyson Brownian Motion One key property of DBM is that it leaves the Gaussian Wigner ensembles invariant. Moreover, the Gaussian measure is the only equilibrium of DBM, and the DBM dynamics converges to this equilibrium from any initial condition. In this section we will quantify these ideas. We start with introducing the concept of the Dirichlet form and the generator associated with any probability measure 𝜇 = 𝜇𝑁 on ℝ𝑁 . For definiteness, the reader may keep the invariant 𝛽-ensemble (4.3) in mind, i.e., 𝜇𝑁 (d𝝀) = (12.13)

𝑒−𝛽𝑁ℋ𝑁 (𝝀) d𝝀, 𝑍

𝑁

ℋ𝑁 (𝝀) ∶=

1 1 ∑ 𝑉(𝜆𝑖 ) − ∑ log |𝜆𝑗 − 𝜆𝑖 |. 2 𝑖=1 𝑁 𝑖 0; thus we see that for any initial condition 𝑓0 ∈ 𝐿2 (d𝜇) we have 𝑓𝑡 ∈ 𝐻 1 (d𝜇) for any 𝑡 > 0. The restriction 𝛽 ≥ 1 is essential to define the dynamics on the simplex Σ𝑁 without specifying additional boundary conditions; the same restriction was also necessary in Theorem 12.2 to define the strong solution to the DBM as a stochastic differential equation. Indeed, if 𝛽 < 1, the particles cross each other in the SDE formulation. In this section, we thus assume 𝛽 ≥ 1. Nevertheless, some ideas and results presented here are still applicable to any 𝛽 > 0 with a regularization; we will comment on this in Section 13.7. By writing the distribution of the eigenvalues as 𝑓𝑡 𝜇𝐺 , our formulation of the problem has already taken into account Dyson’s observation that the invariant measure for DBM is 𝜇𝐺 since, as we will see, at 𝑡 → ∞ the solution 𝑓𝑡 converges 2

116

12. UNIVERSALITY FOR MATRICES WITH GAUSSIAN CONVOLUTIONS

to 𝑓∞ = 1. A natural question regarding the DBM is how fast the dynamics reaches equilibrium. Dyson had already posed this question in 1962: Dyson’s conjecture [45]. The global equilibrium of DBM is reached in time of order 1 and the local equilibrium (in the bulk) is reached in time of order 1/𝑁. Dyson further remarked, The picture of the gas coming into equilibrium in two wellseparated stages, with microscopic and macroscopic time scales, is suggested with the help of physical intuition. A rigorous proof that this picture is accurate would require a much deeper mathematical analysis. [45, p. 1197] We will prove that Dyson’s conjecture is correct if the initial data of the DBM is the eigenvalues of a Wigner ensemble, which was Dyson’s original interest. Our result in fact is valid for DBM with much more general initial data, which we now survey. Briefly, it will turn out that the global equilibrium is indeed reached within a time of order 1, but local equilibrium is achieved much faster if an a priori estimate on the location of the eigenvalues (also called points) is satisfied. Recalling the definition of the classical locations 𝛾𝑗 from (11.31), the a priori bound (originally referred to as “Assumption III” in [63, 64]) is formulated as follows. A priori estimate. There exists a 𝜉 > 0 such that average rigidity on scale −1+𝜉 𝑁 holds, i.e., 𝑁

(12.18)

1 ∫ ∑ (𝜆𝑗 − 𝛾𝑗 )2 𝑓𝑡 (𝜆)𝜇𝐺 (d𝜆) ≤ 𝐶𝑁 −2+2𝜉 0≤𝑡≤𝑁 𝑁 𝑗=1

𝑄 = 𝑄𝜉 ∶= sup

with a constant 𝐶 uniformly in 𝑁. (We assumed (12.18) with a lower bound 𝑡 ≥ 0 in the supremum, but in fact the proof shows that 𝑡 ≥ 𝑁 −1+2𝜉 would be sufficient.) Notice that (12.18) requires that the rigidity of the eigenvalues on scale −1+𝜉 𝑁 holds, at least in an average sense. We recall that Theorem 11.5 guarantees that for Wigner eigenvalues rigidity holds on scale 𝑁 −1+𝜉 for any 𝜉 > 0 (in the bulk), so (12.18) holds for any 𝜉 > 0 in this case. However, our theory on the local ergodicity of the DBM applies to other models as well, so we wish to keep our formulation general and allow situations where (12.18) holds only for larger 𝜉. The main result on the local ergodicity of Dyson Brownian motion (12.17) states that if the a priori estimate (12.18) is satisfied, then the local correlation functions of the measure 𝑓𝑡 𝜇𝐺 are the same as the corresponding ones for the Gaussian measure, 𝜇𝐺 = 𝑓∞ 𝜇𝐺 , provided that 𝑡 is larger than 𝑁 −1+2𝜉 . The 𝑛-point correlation functions of the (symmetrized) probability measure 𝜈 are defined, similarly to (4.20), by (12.19)

(𝑛)

𝑝𝜈,𝑁 (𝑥1 , … , 𝑥𝑛 ) ∶= ∫

ℝ𝑁−𝑛

𝜈(𝐱)d𝑥𝑛+1 ⋯ d𝑥𝑁 ,

𝐱 = (𝑥1 , … , 𝑥𝑁 ).

12.3. STRONG LOCAL ERGODICITY OF THE DYSON BROWNIAN MOTION

117

In particular, when 𝜈 = 𝑓𝑡 d𝜇𝐺 , we denote the correlation functions by (𝑛)

(12.20)

(𝑛)

𝑝𝑡,𝑁 (𝑥1 , … , 𝑥𝑛 ) = 𝑝𝑓𝑡 𝜇𝐺 ,𝑁 (𝑥1 , … , 𝑥𝑛 ). (𝑛)

(𝑛)

We also use 𝑝𝐺,𝑁 for 𝑝𝜇𝐺 ,𝑁 . In general, if 𝐻 is an 𝑁 × 𝑁 Wigner matrix, then (𝑛)

we will also use 𝑝𝐻,𝑁 for the correlation functions of the eigenvalues of 𝐻. Due to the convention that one can view the locations of eigenvalues as the coordinates of particles (or points), we have used 𝐱, instead of 𝝀, in the last equation. From now on, we will use both conventions depending on which viewpoint we wish to emphasize. Notice that the probability distribution of the eigenvalues at time 𝑡, 𝑓𝑡 𝜇𝐺 , is the same as that of the Gaussian divisible matrix: (12.21)

𝐻𝑡 = 𝑒−𝑡/2 𝐻0 + (1 − 𝑒−𝑡 )1/2 𝐻 G ,

where 𝐻0 is the initial Wigner matrix and 𝐻 G is an independent standard GUE (or GOE) matrix. This establishes the universality of the Gaussian divisible ensembles. The precise statement is the following theorem (notice that [64] uses somewhat different notation). Theorem 12.4 ([64, theorem 2.1]). Suppose that for some exponent 𝜉 ∈ (0, ), the average rigidity (12.18) holds for the solution 𝑓𝑡 of the forward equation 1 2

(12.17) on scale 𝑁 −1+𝜉 . Additionally, suppose that in the bulk the rigidity holds on scale 𝑁 −1+𝜉 even without averaging; i.e., for any 𝜅 > 0 (12.22)

sup

|𝜆𝑗 − 𝛾𝑗 | ≺ 𝑁 −1+𝜉

𝜅𝑁≤𝑗≤(1−𝜅)𝑁

holds for any 𝑡 ∈ [𝑁 −1+2𝜉 , 𝑁] if 𝑁 ≥ 𝑁0 (𝜉, 𝜅) is large enough. Let 𝐸 ∈ (−2, 2) and 𝑏 = 𝑏𝑁 > 0 such that [𝐸 − 𝑏, 𝐸 + 𝑏] ⊂ (−2, 2). Then, for any integer 𝑛 ≥ 1 and for any compactly supported smooth test function 𝑂 ∶ ℝ𝑛 → ℝ we have, for any 𝑡 ∈ [𝑁 −1+2𝜉 , 𝑁], | 𝐸+𝑏 d𝐸 ′ 𝜶 | (𝑛) (𝑛) ∫ d𝜶 𝑂(𝜶)(𝑝𝑡,𝑁 − 𝑝𝐺,𝑁 )(𝐸 ′ + )| ≤ (12.23) |∫ 2𝑏 𝑁 | 𝐸−𝑏 | ℝ𝑛 𝑁𝜀[

𝑁 −1+𝜉 1 +√ ]‖𝑂‖𝐶1 𝑏 𝑏𝑁𝑡

for any 𝑁 sufficiently large, 𝑁 ≥ 𝑁0 (𝑛, 𝜉, 𝜅). The upper limit 𝑡 ≤ 𝑁 for the time interval in (12.18) and within the theorem is unimportant; one could have replaced it with 𝑁 𝜀 for any 𝜀 > 0. Beyond 𝑡 ≥ 𝑁 𝜀 the measure 𝑓𝑡 𝜇G is already exponentially close to equilibrium 𝜇G in the entropy sense, so all correlation functions are also close. Also, notice that for simplicity of notation we replaced 𝜶/𝑁𝜚sc (𝐸) by 𝜶/𝑁 in the argument of the correlation functions (compare (12.23) with (5.2)). This can be easily achieved by a change of variables and a redefinition of the test function 𝑂. This convention will be followed in the rest of the book.

118

12. UNIVERSALITY FOR MATRICES WITH GAUSSIAN CONVOLUTIONS

In other words, this theorem states that if we have rigidity on scale 𝑁 −1+𝜉 for some 𝜉 ∈ (0, 12 ), then the DBM has average energy universality in the bulk for any time 𝑡 ≫ 𝑁 −1+2𝜉 on scale 𝑏 ≫ max{𝑁 −1+𝜉 , (𝑁𝑡)−1 }. For generalized Wigner matrices we know that rigidity holds on the smallest possible scale, so we have the following corollary: Corollary 12.5. Consider the matrix OU process with initial matrix ensemble 𝐻0 being a generalized Wigner ensemble and let |𝐸| < 2. Then, for any 𝜀 > 0, for any 𝑡 ∈ [𝑁 −1+2𝜀 , 𝑁 𝜀 ], and for any 𝑏 ≥ (𝑁𝑡)−1 with 𝑏 < ||𝐸| − 2| we have (12.24)

| 𝐸+𝑏 d𝐸 ′ 𝑁 3𝜀 𝜶 | (𝑛) (𝑛) |∫ ∫ d𝜶 𝑂(𝜶)(𝑝𝑡,𝑁 − 𝑝𝐺,𝑁 ) (𝐸 ′ + )| ≤ ‖𝑂‖𝐶1 𝑁 | √𝑏𝑁𝑡 | 𝐸−𝑏 2𝑏 ℝ𝑛 (𝑛)

for any 𝑁 ≥ 𝑁0 (𝑛, 𝜀) where 𝑝𝑡,𝑁 is the correlation function of 𝐻𝑡 , or equivalently, that of 𝑒−𝑡/2 𝐻0 + √1 − 𝑒−𝑡 𝐻 G . Proof. Since 𝐻0 is a generalized Wigner ensemble, so is 𝑒−𝑡/2 𝐻0 + √1 − 𝑒−𝑡 𝐻 G for all 𝑡. Hence the optimal rigidity estimate (11.32) holds for all times. Therefore, (12.18) and (12.22) hold with any 𝜉 > 0. Choose 𝜉 = 𝜀/2 and apply Theorem 12.4. The right-hand side of (12.23) is bounded by 𝑁𝜀[

1 𝑁 3𝜀 𝑁 𝜀/2 + , ]≤ 𝑁𝑏 √𝑏𝑁𝑡 √𝑏𝑁𝑡

where we have used that 𝑡 ≤ 𝑁 𝜀 and 𝑏𝑁𝑡 ≥ 1.



The estimate (12.24) means that averaged energy universality holds in a window of size slightly bigger than 𝑡𝑁. A large energy window of size 𝑏 = 𝑁 −𝛿 , for some small 𝛿 > 0, can already be achieved after a very short time, i.e., for any 𝑡 ≥ 𝑁 −1+2𝛿 (by choosing 𝜀 = 𝛿/8). To achieve universality on very short energy scales, 𝑏 = 𝑁 −1+𝛿 , we will need relatively large times, 𝑡 ≥ 𝑁 −𝛿/2 (by choosing 𝜀 = 𝛿/16). In both cases, the right-hand side (12.24) is bounded by 𝐶𝑁 −𝜀 . In the sense of universality of large-energy windows of size 𝑏 = 𝑁 −𝛿 , our result essentially establishes the Dyson’s conjecture that the time to local equilibrium is 𝑁 −1 . A better error estimate can also be achieved if both 𝑏 and 𝑡 are large; e.g., with the choice 𝑏 = 𝑁 −𝛿 and 𝑡 ≥ 𝑁 −𝛿 , we get a convergence of order 𝑁 −1/2+𝛿+𝜀 for any 𝜀 > 0. Going back to Theorem 12.4, we also remark that there is considerable room in this argument, even if optimal rigidity is not available (this is the case for random band matrices, see [55]). In fact, Theorem 12.4 provides universality, albeit only for relatively large times 𝑡 ≥ 𝑁 −𝛿 and with a large-energy averaging, 𝑏 = 𝑁 −𝛿 , even if (12.18) and (12.22) hold only with 𝜉 = (1+𝛿)/2. This restriction comes from the requirement that 𝑡 ≥ 𝑁 −1+2𝜉 should be implied by 𝑡 ≥ 𝑁 −𝛿 . This exponent 𝜉 slightly above 12 corresponds to a rigidity control on a scale

12.4. EXISTENCE AND RESTRICTION OF THE DYNAMICS

119

just below 𝑁 −1/2 in the bulk. Clearly, 𝑁 −1/2 is a critical threshold; no local universality can be concluded with this argument unless a rigidity control on a scale below 𝑁 −1/2 is established a priori. 12.4. Existence and Restriction of the Dynamics This technical section was taken from appendix A of [64]; we include it here for the reader’s convenience. As in Section 12.3, we consider the euclidean space ℝ𝑁 with the normalized measure 𝜇 = exp(−𝛽𝑁ℋ)/𝑍. The Hamiltonian ℋ, of the form (12.13), is symmetric with respect to the permutation of the variables 𝐱 = (𝑥1 , … , 𝑥𝑁 ); thus, the measure can be restricted to the subset Σ𝑁 ⊂ ℝ𝑁 defined in (12.3). In this appendix we outline how to define the dynamics (12.17) with its generator, 1 formally given by 𝐿 = 2𝑁 Δ − 12 (∇ℋ)∇, on Σ𝑁 . The condition 𝛽 ≥ 1 and the specific factors ∏𝑖 0, on 𝐿2 [75, theorem 1.4.1], and it can be extended to a contraction semigroup to 𝐿1 (ℝ𝑁 , d𝜇), ‖𝑇𝑡 𝑓‖1 ≤ ‖𝑓‖1 [75, sec. 1.5]. The generator 𝐿 of the semigroup is defined via the Friedrichs extension [75, theorem 1.3.1], and it is a positive self-adjoint operator on its natural do1 main 𝐷(𝐿) with 𝐶0∞ being the core. The generator is given by 𝐿 = 2𝑁 Δ− 12 (∇ℋ)∇ on its domain [75, cor. 1.3.1]. By the spectral theorem, 𝑇𝑡 maps 𝐿2 into 𝐷(𝐿); thus with the notation 𝑓𝑡 = 𝑇𝑡 𝑓 for some 𝑓 ∈ 𝐿2 , it holds that (12.25)

𝜕𝑡 𝑓𝑡 = 𝐿𝑓𝑡 ,

𝑡 > 0,

and

lim ‖𝑓𝑡 − 𝑓‖2 = 0.

𝑡→0+

Moreover, by approximating 𝑓 by 𝐿2 functions and using that 𝑇𝑡 is a contraction in 𝐿1 (section 1.5 in [75]), the differential equation holds even if the initial condition 𝑓 is only in 𝐿1 . In this case, the convergence 𝑓𝑡 → 𝑓, as 𝑡 → 0+ , holds only in 𝐿1 . We remark that 𝑇𝑡 is also a contraction on 𝐿∞ , by duality. Now we restrict the dynamics to Σ = Σ𝑁 equipped with the probability measure 𝜇Σ (d𝐱) ∶= 𝑁! 𝜇(d𝐱) ⋅ 1(𝐱 ∈ Σ). Here 𝜇𝑁 is from (12.13) and the factor 𝑁! restores the normalization after restriction to Σ. Repeating the general construction with ℝ𝑁 replaced by Σ𝑁 , we obtain the corresponding generator 𝐿(Σ) (Σ) and the semigroup 𝑇𝑡 on the space 𝐻 1 (Σ, d𝜇Σ ) that is defined as the closure of 𝐶0∞ (Σ) with respect to the norm ‖ ⋅ ‖2+ = ℰ( ⋅ , ⋅ ) + ‖ ⋅ ‖22 on Σ.

120

12. UNIVERSALITY FOR MATRICES WITH GAUSSIAN CONVOLUTIONS

To establish the relation between 𝐿 and 𝐿(Σ) , we first define the symmetrized version of Σ, ˜ ∶= ℝ𝑁 ⧵ {𝐱 ∶ ∃𝑖 ≠ 𝑗 with 𝑥𝑖 = 𝑥𝑗 }. Σ ˜ ). The key information is that 𝑋 is dense in 𝐻 1 (ℝ𝑁 , d𝜇), Denote 𝑋 ∶= 𝐶0∞ ( Σ which is equivalent to the density of 𝑋 in 𝐶0∞ (ℝ𝑁 , d𝜇). We will check this property below. Then the general argument above directly applies if ℝ𝑁 is replaced ˜ 𝑁 , and it shows that the generator 𝐿 is the same (with the same domain) by Σ if we start from 𝑋 instead of 𝐶0∞ (ℝ𝑁 , d𝜇) as a core. Note that both 𝐿 and 𝐿(Σ) are local operators and 𝐿 is symmetric with respect to the permutation of the variables. For any function 𝑓 defined on Σ, we define its ˜ . Clearly, 𝐿 𝑓 ˜ = ˜ ˜ by 𝑓 symmetric extension onto Σ 𝐿(Σ) 𝑓 for any 𝑓 ∈ 𝐶0∞ (Σ). Since the generator is uniquely determined by its action on its core, and the generator uniquely determines the dynamics, we see that for any 𝑓 ∈ 𝐿1 (Σ, d𝜇), (Σ) ˜ and restricting it to Σ. In other one can determine 𝑇𝑡 𝑓 by computing 𝑇𝑡 𝑓 words, the dynamics (12.17) is well-defined when restricted to Σ = Σ𝑁 . Finally, we have to prove the density of 𝑋 in 𝐶0∞ (ℝ𝑁 , d𝜇), i.e., to show that if ˜ ) exists such that ℰ(𝑓−𝑓𝑛 , 𝑓−𝑓𝑛 ) → 0. 𝑓 ∈ 𝐶0∞ (ℝ𝑁 ), then a sequence 𝑓𝑛 ∈ 𝐶0∞ ( Σ ˜ is complicated since in addition to the one-codimensional The structure of Σ coalescence hyperplanes 𝑥𝑖 = 𝑥𝑗 (and 𝑥𝑖 = 0 in case of Σ+ ), it contains higherorder coalescence subspaces with higher codimensions. We will show the approximation argument in a neighborhood of a point 𝐱 such that 𝑥𝑖 = 𝑥𝑗 but 𝑥𝑖 ≠ 𝑥𝑘 for any other 𝑘 ≠ 𝑖, 𝑗. The proof uses the fact that the measure d𝜇 vanishes at least to first order, i.e., at least as |𝑥𝑖 − 𝑥𝑗 | around 𝐱, thanks to 𝛽 ≥ 1. This is the critical case; the argument near higher-order coalescence points is even easier, since they have lower codimension and the measure 𝜇 vanishes at even higher order. In a neighborhood of 𝐱 we can change to local coordinates such that 𝑟 ∶= 𝑥𝑖 − 𝑥𝑗 remains the only relevant coordinate. Thus, the task is equivalent to showing that any 𝑔 ∈ 𝐶0∞ (ℝ) can be approximated by a sequence 𝑔𝜀 ∈ 𝐶0∞ (ℝ ⧵ {0}) in the sense that (12.26)

∫ |𝑔′ (𝑟) − 𝑔𝜀′ (𝑟)|2 |𝑟|d𝑟 → 0 ℝ

as 𝜀 → 0 over a discrete set, e.g., 𝜀 = 𝜀𝑛 = 1/𝑛. It is sufficient to consider only the positive semi-axis, i.e., 𝑟 > 0. More precisely, define (12.27)

𝑔𝜀 (𝑟) ∶= 𝑔(𝑟)𝜙𝜀 (𝑟)

where 𝜙𝜀 is a continuous cutoff function defined as 𝜙𝜀 (𝑟) = 0 for 𝑟 ≤ 𝜀2 , 𝜙𝜀 (𝑟) = 1 for 𝑟 ≥ 𝜀, and (12.28)

𝜙𝜀 (𝑟) = 𝐶𝜀 (log 𝑟 − 2 log 𝜀), 𝜀2 ≤ 𝑟 ≤ 𝜀, 1 𝐶𝜀 = = |log 𝜀|−1 . log 𝜀 − log 𝜀2

12.4. EXISTENCE AND RESTRICTION OF THE DYNAMICS

121

Simple calculation shows that ∞

(12.29)

∫ |𝑔′ (𝑟) − 𝑔𝜀′ (𝑟)|2 𝑟 d𝑟 ≤ 0 ∞



𝐶 ∫ |𝑔′ (𝑟)|2 |1 − 𝜙𝜀 (𝑟)|2 𝑟 d𝑟 + 𝐶 ∫ |𝑔(𝑟)|2 |𝜙𝜀′ (𝑟)|2 𝑟 d𝑟. 0

0

The first term converges to 0 as 𝜀 → 0 by dominated convergence since 𝑔 ∈ 𝐻 1 (ℝ+ , 𝑟 d𝑟) and 𝜙𝜀 → 1 pointwise. The second term is bounded by ∞

∫ |𝑔(𝑟)|

𝜀

2

|𝜙𝜀′ (𝑟)|2 𝑟 d𝑟



𝐶𝜀2 ‖𝑔‖2∞

0

‖𝑔‖2∞ 12 ∫ ||| ||| 𝑟 d𝑟 = , 𝑟 |log 𝜀| 𝜀2

which also tends to 0 since 𝑔 ∈ 𝐿∞ . This shows that 𝑔𝜀 converges to 𝑔 in 𝐻 1 . The function 𝑔𝜀 is not yet smooth, but we can smooth it on a scale 𝛿 ≪ 𝜀2 by taking the convolution with a smooth function 𝜂𝛿 compactly supported in [−𝛿, 𝛿]. It is an elementary exercise to see that 𝜂𝛿 ⋆ 𝑔𝜀 converges to 𝑔𝜀 in 𝐻 1 as 𝛿 → 0. This verifies (12.26). The same proof shows that 𝐶0∞ (ℝ+ ⧵ {0}) is dense in 𝐻 1 (ℝ+ , 𝑟 d𝑟) ∩ 𝐿∞ (ℝ+ , 𝑟 d𝑟). This completes the proof that for 𝛽 ≥ 1 the dynamics is well-defined in the space 𝐻 1 (Σ, d𝜇Σ ). In fact, the boundedness condition 𝑓 ∈ 𝐿∞ (ℝ+ , 𝑟 d𝑟) can be removed in the above argument, and one can show that 𝐶0∞ (ℝ+ ⧵ {0}) is dense in 𝐻 1 (ℝ+ , 𝑟 d𝑟). This is a type of Meyers-Serrin theorem concerning the equivalence of two possible definitions of Sobolev spaces on the measure space (ℝ+ , 𝑟 d𝑟): the closure of 𝐶0∞ functions coincides with the set of functions with finite 𝐻 1 -norm. Another formulation is the following: If we extend the functions on ℝ+ to twodimensional radial functions, i.e., 𝐺(𝑥) ∶= 𝑔(|𝑥|), 𝐺𝜀 (𝑥) = 𝑔𝜀 (|𝑥|) for 𝑥 ∈ ℝ2 , then this statement is equivalent to the fact that a point in two dimensions has zero capacity. To see this stronger claim, we need a different cutoff function in (12.27). Let us now define 𝜙𝜀 (𝑟) ∶= log(𝑎 + 𝑏 log 𝑟), 𝜀2 ≤ 𝑟 ≤ 𝜀, with 𝑏 ∶= (1 − 𝑒)/ log 𝜀, 𝑎 ∶= 2𝑒 − 1, and 𝜙𝜀 (𝑟) ∶= 0 for 𝑟 < 𝜀2 , and 𝜙𝜀 (𝑟) ∶= 1 for 𝑟 ≥ 𝜀. The first term on the right-hand side of (12.29) goes to 0 as before. For the last term in (12.29), we have 𝜀



∫ |𝑔(𝑟)|

2

|𝜙𝜀′ (𝑟)|2 𝑟 d𝑟

2

=𝑏 ∫ 𝜀2

0

|𝑔(𝑟)|2 𝑟 d𝑟. 𝑟2 (𝑎 + 𝑏 log 𝑟)2

2

We view this as an integral on ℝ by radially extending the function 𝑔 to ℝ2 by 𝐺(𝑦) ∶= 𝑔(|𝑦|) for any 𝑦 ∈ ℝ2 . Then the left-hand side above gives 𝜀

𝑏2 ∫ 𝜀2

|𝑔(𝑟)|2 |𝐺(𝑦)|2 2 ∫ 𝑟 d𝑟 = 𝑏 d𝑦 𝑟2 (𝑎 + 𝑏 log 𝑟)2 |𝑦|2 (𝑎 + 𝑏 log |𝑦|)2 𝜀2 ≤|𝑦|≤𝜀 ≤ 𝐶∫ 𝜀2 ≤|𝑦|≤𝜀

|𝐺(𝑦)|2 d𝑦. |𝑦|2 (log |𝑦|)2

122

12. UNIVERSALITY FOR MATRICES WITH GAUSSIAN CONVOLUTIONS 𝑘

Define the sequence 𝜀𝑘 ∶= 2−2 ; clearly 𝜀𝑘2 = 𝜀𝑘+1 . Setting 𝐼𝑘 ∶= ∫ 𝜀𝑘+1 ≤|𝑦|≤𝜀𝑘

|𝐺(𝑦)|2 d𝑦, |𝑦|2 (log |𝑦|)2

we clearly have ∞



∑ 𝐼𝑘 = ∫ 𝑘=1

|𝑦|≤1/2

|𝐺(𝑦)|2 d𝑦 ≤ 𝐶 ∫ |∇𝐺|2 = 𝐶 ′ ∫ |𝑔′ (𝑟)|2 𝑟 d𝑟, |𝑦|2 (log |𝑦|)2 ℝ2 0

where in the last step we used the critical Hardy inequality with logarithmic correction (see, e.g., theorem 2.8 in [48]). Since in our case 𝑔 ∈ 𝐻 1 (ℝ+ , 𝑟 d𝑟), from the last displayed inequality we have 𝐼𝑘 → 0 as 𝑘 → ∞. Therefore, we can use the cutoff function 𝜙𝜀𝑘 to prove the claim. The Meyers-Serrin theorem we just proved for (𝑅+ , 𝑟 d𝑟) can be extended to (ℝ𝑁 , d𝜇) using local changes of coordinates if 𝛽 ≥ 1. Thus, one may prove that the form domain of the operator 𝐿(Σ) is 𝐻 1 (Σ, d𝜇Σ ) and the dynamics (12.25) is well-defined on this space.

CHAPTER 13

Entropy and the Logarithmic Sobolev Inequality (LSI) 13.1. Basic Properties of the Entropy In this section, we prove a few well-known inequalities for the entropy that will be used in next section for the logarithmic Sobolev inequality (LSI). In this section we work on an arbitrary probability space (Ω, ℬ) with an appropriate 𝜎-algebra. We will use the probabilistic and the analysis notation and concepts in parallel. In particular, we interchangeably use the concept of random variables and measurable functions on Ω. Similarly, both the expectation 𝔼𝜇 [𝑓] and the integral notation ∫Ω 𝑓 d𝜇 will be used. We will drop the Ω in the notation. Definition 13.1. For any two probability measures 𝜈, 𝜇 on the same probability space, we define the relative entropy of 𝜈 w.r.t. 𝜇 by (13.1)

𝑆(𝜈|𝜇) ∶= ∫ log

d𝜈 d𝜈 d𝜈 d𝜈 = ∫ log d𝜇 d𝜇 d𝜇 d𝜇

if 𝜈 is absolutely continuous with respect to 𝜇, and we set 𝑆(𝜈|𝜇) ∶= ∞ otherwise. Applying the definition (13.1) to 𝜈 = 𝑓𝜇 for any probability density 𝑓 with ∫ 𝑓 d𝜇 = 1, we may also write 𝑆𝜇 (𝑓) ∶= 𝑆(𝑓𝜇|𝜇) = ∫ 𝑓(log 𝑓)d𝜇. In most cases, the reference measure 𝜇 will be canonical (e.g., the natural equilibrium measure), so often we will drop the subscript 𝜇 if there is no confusion. We may then call 𝑆𝜇 (𝑓) = 𝑆(𝑓) the entropy of 𝑓. Using the convexity of the function 𝑥 ↦ 𝑥 log 𝑥 on ℝ+ , a simple Jensen’s inequality, 0 = (∫ 𝑓 d𝜇) log(∫ 𝑓 d𝜇) ≤ ∫ 𝑓 log 𝑓 d𝜇, shows that the relative entropy is always nonnegative, 𝑆(𝜈|𝜇) ≥ 0. Proposition 13.2 (Entropy inequality or Gibbs inequality). Let 𝜇, 𝜈 be two probability measures on a common probability space and let 𝑋 be a random variable. Then, for any positive number 𝛼 > 0, the following inequality holds as long as the right-hand side is finite: (13.2)

𝔼𝜈 [𝑋] ≤ 𝛼 −1 𝑆(𝜈|𝜇) + 𝛼 −1 log 𝔼𝜇 𝑒𝛼𝑋 . 123

124

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

Proof. First note that we only have to prove the case 𝛼 = 1 since we can redefine 𝛼𝑋 → 𝑋. From the concavity of the logarithm and Jensen’s inequality, we have d𝜇 ∫ 𝑋 d𝜈 − 𝑆(𝜈|𝜇) = ∫ log[𝑒𝑋 ]d𝜈 ≤ log 𝔼𝜈 𝑒𝑋 , d𝜈 □ and this proves (13.2). Although (13.2) is just an inequality, there is a variational characterization of the relative entropy behind it. Namely, (13.3)

𝑆(𝜈|𝜇) = sup[𝔼𝜈 [𝑋] − log 𝔼𝜇 𝑒𝑋 ], 𝑋

where the supremum is over all bounded random variables. As we will not need this relation in this book, we leave it to the interested reader to prove it. As a corollary of (13.2), we mention that for any set 𝐴 we have the bound ℙ𝜈 (𝐴) ≤

(13.4)

log 2 + 𝑆(𝜈|𝜇) log

1

.

ℙ𝜇 (𝐴)

The proof is left as an exercise. This inequality will be used in the following context. Suppose that the relative entropy of 𝜈 with respect to 𝜇 is finite. Then in order to show that a set has a small probability w.r.t. 𝜈, we need to verify that this set is exponentially small w.r.t. 𝜇. In this sense, entropy provides only a relatively weak link between two measures. However, it is still stronger than the total variational norm, which we will show now. For any two probability measures 𝑓𝜇 and 𝜇, the 𝐿𝑝 -distance between 𝑓𝜇 and 𝜇 is defined by 1/𝑝 𝑝

(13.5)

[∫ |𝑓 − 1| d𝜇]

.

When 𝑝 = 1, it is called the total variational norm between 𝑓𝜇 and 𝜇. Entropy is a weaker measure of distance between two probability measures than the 𝐿𝑝 distance for any 𝑝 > 1 but stronger than the total variational norm. The former statement can be expressed, for example, by the elementary inequality 1/𝑝

(13.6) ∫ 𝑓 log 𝑓 d𝜇 ≤ 2[∫ |𝑓 − 1|𝑝 d𝜇]

+

2 ∫ |𝑓 − 1|𝑝 d𝜇, 𝑝−1

𝑝 > 1,

whose proof is left as an exercise. (There is a more natural way to compare 𝐿𝑝 distance and the relative entropy in terms of the 𝐿 log 𝐿 Orlicz norm, but we will not use this norm in this book.) The latter statement will be made precise in the following proposition. Furthermore, we also remark the following easy relation among 𝐿𝑝 -norms and the entropy: (13.7)

d | | d𝑝 |

1/𝑝 𝑝

[∫ 𝑓 d𝜇] 𝑝=1

holds for any probability density 𝑓 w.r.t. 𝜇.

= ∫ 𝑓 log 𝑓 d𝜇

13.1. BASIC PROPERTIES OF THE ENTROPY

125

Proposition 13.3 (Entropy and total variation norm, Pinsker inequality). Suppose that ∫ 𝑓 d𝜇 = 1 and 𝑓 ≥ 0. Then, we have 2

(13.8)

2[∫ |𝑓 − 1|d𝜇] ≤ ∫ 𝑓 log 𝑓 d𝜇.

Proof. By the variational principle, we first rewrite (13.9)

∫ |𝑓 − 1|d𝜇 = sup ∫ 𝑓𝑔 d𝜇 − ∫ 𝑔 d𝜇. |𝑔|≤1

For any such function 𝑔, we have by the entropy inequality (13.2) that for any 𝑡>0 (13.10) ∫ 𝑓𝑔 d𝜇 − ∫ 𝑔 d𝜇 ≤ 𝑡 −1 log ∫ 𝑒𝑡𝑔 d𝜇 − ∫ 𝑔 d𝜇 + 𝑡 −1 ∫ 𝑓 log 𝑓 d𝜇. We now define the function ℎ(𝑡) ∶= log ∫ 𝑒𝑡𝑔 d𝜇 for any 𝑡 ≥ 0. A simple calculation shows that the second derivative of ℎ is given by ℎ′′ (𝑡) = ⟨𝑔; 𝑔⟩𝜔𝑡 ,

𝜔𝑡 ∶=

𝑒𝑡𝑔 𝜇 , ∫ 𝑒𝑡𝑔 d𝜇

where 𝜔𝑡 is a probability measure, and ⟨𝑓; 𝑔⟩𝜔 ∶= ∫ 𝑓𝑔 d𝜔 − (∫ 𝑓 d𝜔)(∫ 𝑔 d𝜔) denotes the covariance. Recall that the covariance is positive definite; i.e., it satisfies the usual Schwarz inequality, |⟨𝑓; 𝑔⟩𝜔 |2 ≤ ⟨𝑓; 𝑓⟩𝜔 ⟨𝑔; 𝑔⟩𝜔 . Since |𝑔| ≤ 1, we have 0 ≤ ⟨𝑔; 𝑔⟩𝜔𝑡 ≤ 1, so ℎ″ (𝑡) ≤ 1. Thus by Taylor’s theorem, ℎ(𝑡) ≤ ℎ(0) + 𝑡ℎ′ (0) + 𝑡 2 /2, i.e., 𝑡 𝑡−1 log ∫ 𝑒𝑡𝑔 d𝜇 ≤ ∫ 𝑔 d𝜇 + . 2 Together with (13.10), we have ∫ 𝑓𝑔 d𝜇 − ∫ 𝑔 d𝜇 ≤

1 𝑡 ∫ 𝑓 log 𝑓 d𝜇, + 𝑡 −1 ∫ 𝑓 log 𝑓 d𝜇 ≤ 2 √2

where we optimized 𝑡 in the last step. Since this bound holds for any 𝑔 with □ |𝑔| ≤ 1, using (13.9) we have proved (13.8).

126

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

13.2. Entropy on Product Spaces and Conditioning Now we consider a product measure; i.e., we assume that the probability space has a product structure, Ω = Ω1 × Ω2 and 𝜇 = 𝜇1 ⊗ 𝜇2 , 𝜈 = 𝜈1 ⊗ 𝜈2 where 𝜇𝑗 and 𝜈𝑗 are probability measures on Ω𝑗 , 𝑗 = 1, 2. For simplicity, we denote the elements of Ω by (𝑥, 𝑦) with 𝑥 ∈ Ω1 , 𝑦 ∈ Ω2 . Writing 𝜈𝑗 = 𝑓𝑗 𝜇𝑗 , clearly we have 𝑆(𝜈1 ⊗ 𝜈2 |𝜇1 ⊗ 𝜇2 ) = 𝑆𝜇1 ⊗𝜇2 (𝑓1 ⊗ 𝑓2 ) =∬

(13.11)

𝑓1 (𝑥)𝑓2 (𝑦)[log 𝑓1 (𝑥) + log 𝑓2 (𝑦)]𝜇1 (d𝑥)𝜇2 (d𝑦)

Ω1 ×Ω2

= 𝑆𝜇1 (𝑓1 ) + 𝑆𝜇2 (𝑓2 ) = 𝑆(𝜈1 |𝜇1 ) + 𝑆(𝜈2 |𝜇1 );

i.e., the entropy is additive for product measures. This property of the entropy makes it an especially suitable tool to measure closeness in very high-dimensional analysis. For example, if the probability space is a large product space Ω = Ω⊗𝑁 equipped with two product measures 1 𝜇 = 𝜇1⊗𝑁 and 𝜈 = 𝑓 ⊗𝑁 𝜇, then their relative entropy 𝑆𝜇 (𝑓 ⊗𝑁 ) = 𝑁𝑆𝜇1 (𝑓)

(13.12)

grows only linearly in 𝑁, the number of degrees of freedom. It is easy to check that any 𝐿𝑝 -distance (13.5) for 𝑝 > 1 grows exponentially in 𝑁, which usually renders it useless. Thus the entropy is still stronger than the 𝐿1 -norm, but its growth in 𝑁 is much more manageable. An additional advantage is that the entropy is often easier to compute than any other 𝐿𝑝 -norm (with the exception of 𝐿2 ). Next we discuss how the entropy decomposes under conditioning. For simplicity, we assume the product structure, Ω = Ω1 × Ω2 as before. Let 𝜔 be a probability measure on the product space Ω. For any integrable function (random variable) 𝑢(𝑥, 𝑦) on Ω we denote its conditional expectation by 𝑢 ˆ = 𝔼𝜔 [𝑢|ℱ1 ]

(13.13)

where ℱ1 is the 𝜎-algebra of Ω1 canonically lifted to Ω. Note that 𝑢 ˆ = 𝑢 ˆ(𝑥) depends only on the 𝑥-variable. The conditional expectation 𝑢 ˆ may also be characterized by the following relation: it is the unique measurable function on Ω1 such that for any bounded, measurable (w.r.t. ℱ1 ) function 𝑂(𝑥) the following identity holds: 𝔼𝜔 [ˆ 𝑢𝑂] = 𝔼𝜔 [𝑢𝑂]. Written out in coordinates, it means that (13.14)

∬𝑢 ˆ(𝑥)𝑂(𝑥)𝜔(d𝑥 d𝑦) = ∬ 𝑢(𝑥, 𝑦)𝑂(𝑥)𝜔(d𝑥 d𝑦). Ω



In particular, if 𝜔(d𝑥 d𝑦) = 𝜔(𝑥, 𝑦)d𝑥 d𝑦 is absolutely continuous with respect to some reference product measure d𝑥 d𝑦 on Ω, then we have, somewhat formally, ∫Ω 𝑢(𝑥, 𝑦)𝜔(𝑥, 𝑦)d𝑦 . (13.15) 𝑢 ˆ(𝑥) = 2 ∫Ω2 𝜔(𝑥, 𝑦)d𝑦

13.2. ENTROPY ON PRODUCT SPACES AND CONDITIONING

127

The notation d𝑥 d𝑦 for the reference measure already indicates that in our applications Ω𝑗 will be euclidean spaces ℝ𝑛𝑗 of some dimension 𝑛𝑗 , and the reference measure will be the Lebesgue measure. We remark that the concept of conditioning can be defined in full generality with respect to any sub-𝜎-algebra; the product structure of Ω is not essential. However, we will not need the general definition in this book and the above definition is conceptually simpler. The conditional expectation gives rise to a trivial martingale decomposition: (13.16)

𝑢=𝑢 ˆ + (𝑢 − 𝑢 ˆ),

where 𝑢 ˆ is ℱ1 -measurable, while 𝑢−ˆ 𝑢 has zero expectation on any ℱ1 -measurable set. Subtracting the expectation 𝑢 ∶= 𝔼𝜔 𝑢 = 𝔼𝜔 𝑢 ˆ and squaring this formula, we have the martingale decomposition of the variance of 𝑢: (13.17)

Var𝜔 (𝑢) ∶= 𝔼𝜔 (𝑢 − 𝑢)2 = 𝔼𝜔 (𝑢 − 𝑢 ˆ)2 + 𝔼𝜔 (ˆ 𝑢 − 𝑢)2 = 𝔼𝜔 Var(𝑢(𝑥, ⋅ )) + Var𝜔 (ˆ 𝑢)

where we defined the conditional variance Var(𝑢(𝑥, ⋅ )) ∶= 𝔼[(𝑢 − 𝑢 ˆ)2 |ℱ1 ] = 𝔼[𝑢2 |ℱ1 ] − [ˆ 𝑢 ]2 . The identity (13.17) is a triviality, but its interpretation is important. It means that the variance is additive w.r.t. the martingale decomposition (13.16). The first term 𝔼𝜔 Var(𝑢(𝑥, ⋅ )) is the expectation of the variance w.r.t. 𝑦 conditioned on 𝑥; the second term Var(ˆ 𝑢 − 𝑢)2 is the variance of the marginal w.r.t. 𝑥. In other words, we can compute the variance one by one. The martingale decomposition has an analogue for the entropy. For simplicity, we assume that 𝜔 has a density 𝜔(𝑥, 𝑦) w.r.t. a reference measure d𝑥 d𝑦. Denote by 𝜔˜ the marginal 𝜔˜ probability density on Ω1 , 𝜔(𝑥) ˜ = ∫ 𝜔(𝑥, 𝑦)d𝑦 , Ω2

and by 𝑓ˆ the marginal density of 𝑓𝜔 w.r.t. 𝜔, ˜ i.e., ˆ = 𝑓(𝑥)

∫Ω2 𝑓(𝑥, 𝑦)𝜔(𝑥, 𝑦)d𝑦 𝜔(𝑥) ˜

.

In particular, for any test function 𝑂(𝑥) we have (13.18)

ˆ 𝜔(𝑥)d𝑥. ∬ 𝑂(𝑥)𝑓(𝑥, 𝑦)𝜔(𝑥, 𝑦)d𝑥d𝑦 = ∫ 𝑂(𝑥)𝑓(𝑥) ˆ Ω1



Let

𝜔(𝑥, 𝑦) 𝜔(𝑥) ˜ be the probability density on Ω2 conditioned on a fixed 𝑥 ∈ Ω1 . Define 𝜔𝑥 (𝑦) ∶=

(13.19)

𝑓𝑥 (𝑦) ∶=

𝑓(𝑥, 𝑦) ˆ 𝑓(𝑥)

128

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

to be the corresponding density of the measure 𝑓𝜔 conditioned on a fixed 𝑥 ∈ Ω1 . Note that with these definitions, we have 𝑓(𝑥, 𝑦)𝜔(𝑥, 𝑦) (𝑓𝜔)𝑥 (𝑦) = = 𝑓𝑥 (𝑦)𝜔𝑥 (𝑦). ∫Ω2 𝑓(𝑥, 𝑦)𝜔(𝑥, 𝑦)d𝑦 Now we are ready to state the following proposition on the additivity of entropy w.r.t. martingale decomposition: Proposition 13.4. Using the notation above, we have (13.20)

𝑆𝜔 (𝑓) = 𝔼𝑓𝜔˜𝑆𝜔𝑥 (𝑓𝑥 ) + 𝑆𝜔˜(𝑓ˆ ). ˆ

In particular, the marginal entropy is bounded by the total entropy, i.e., 𝑆𝜔˜(𝑓ˆ ) ≤ 𝑆𝜔 (𝑓).

(13.21)

In other words, the entropy is additive w.r.t. the martingale decomposition; i.e., the entropy is the sum of the expectation of the entropy in 𝑦 conditioned on 𝑥 and the entropy of the marginal in 𝑥. The additivity of entropy in this sense is an important tool in the application of the LSI, as we will demonstrate shortly. It also indicates that entropy is an extensive quantity; i.e., the entropy of two probabilities on a product space Ω𝑁 is often of order 𝑁, generalizing the formulas (13.11)-(13.12) to nonproduct measures. Proof. We can decompose the entropy as follows: 𝑆𝜔 (𝑓) = ∫ 𝑓 log 𝑓 d𝜔 = ∫ 𝑓 log(𝑓/𝑓ˆ )d𝜔 + ∫ 𝑓 log 𝑓ˆ d𝜔 (13.22)





= ∫ 𝑓 log(𝑓/𝑓ˆ )d𝜔 + ∫ 𝑓ˆ log 𝑓ˆ d𝜔, ˜ Ω

Ω1

where in the last step we used (13.18). The first term we can rewrite as ∫ 𝑓 log(𝑓/𝑓ˆ )𝜔(𝑥, 𝑦)d𝑥 d𝑦 (13.23)

ˆ = ∫ d𝑥 𝜔(𝑥) ˜ 𝑓(𝑥)[∫ 𝑓𝑥 (𝑦) log 𝑓𝑥 (𝑦)𝜔𝑥 (𝑦)d𝑦] = 𝔼𝑓𝜔˜𝑆𝜔𝑥 (𝑓𝑥 ). ˆ



We have thus proved (13.20). 13.3. Logarithmic Sobolev Inequality

The logarithmic Sobolev inequality requires that the underlying probability space admit a concept of differentiation. While the theory can be developed for more general spaces, we restrict our attention to the probability space Ω = ℝ𝑁 in this subsection. We will work with probability measures 𝜇 on ℝ𝑁 that are defined by a Hamiltonian ℋ: (13.24)

d𝜇(𝐱) =

𝑒−ℋ(𝐱) d𝐱, 𝑍

13.3. LOGARITHMIC SOBOLEV INEQUALITY

129

where 𝑍 is a normalization factor. In Section 13.7 we will comment on how to extend the results of this section to certain subsets of ℝ𝑁 , most importantly, to the simplex Σ𝑁 = {𝐱 ∈ ℝ𝑁 ∶ 𝑥1 < ⋯ < 𝑥𝑁 }. Note that for simplicity in this presentation we neglect the 𝛽𝑁 prefactor compared with (12.13). Let ℒ be the generator of the dynamics associated with the Dirichlet form 𝐷(𝑓) = 𝐷𝜇 (𝑓) = − ∫ 𝑓(ℒ𝑓)d𝜇 (13.25)

𝑁

∶= ∫ |∇𝑓|2 d𝜇 = ∑ ∫(𝜕𝑗 𝑓)2 d𝜇,

𝜕𝑗 = 𝜕𝑥𝑗 .

𝑗=1

Formally, we have ℒ = Δ − (∇ℋ) ⋅ ∇. The operator ℒ is symmetric with respect to the measure d𝜇, i.e., (13.26)

∫ 𝑓(ℒ𝑔)d𝜇 = ∫(ℒ𝑓)𝑔 d𝜇 = −∫ ∇𝑓 ⋅ ∇𝑔 d𝜇.

We remark that in many books on probability, e.g., [43], the Dirichlet form (13.25) is defined with a factor 12 , but this convention is not compatible with the 1/(𝛽𝑁) prefactor in (12.14). The lack of this 12 factor in (13.25) causes slight deviations from their customary form in the following theorems. Definition 13.5. The probability measure 𝜇 on ℝ𝑁 satisfies the logarithmic Sobolev inequality if there exists a constant 𝛾 such that (13.27)

𝑆(𝑓) = ∫ 𝑓 log 𝑓 d𝜇 ≤ 𝛾 ∫ |∇√𝑓|2 d𝜇 = 𝛾𝐷(√𝑓)

holds for any smooth density function 𝑓 ≥ 0 with ∫ 𝑓 d𝜇 = 1. The smallest such 𝛾 is called the logarithmic Sobolev inequality constant of the measure 𝜇. A simple density argument shows that (13.27) extends from smooth functions to any nonnegative function 𝑓 ∈ 𝐶0∞ (ℝ𝑁 ), the space of smooth functions with compact support. In fact, it is easy to extend it to all √𝑓 ∈ 𝐻 1 (d𝜇). Since we will not use the space 𝐻 1 (d𝜇) in this book, we will just use 𝑓 ∈ 𝐶0∞ (ℝ𝑁 ) in the LSI. Theorem 13.6 (Bakry-Émery [13]). Consider a probability measure 𝜇 on ℝ𝑁 of the form (13.24), i.e., 𝜇 = 𝑒−ℋ /𝑍. Suppose that a convexity bound holds for the Hamiltonian; i.e., with some positive constant 𝐾 we have ∇2 ℋ(𝐱) ≥ 𝐾

(13.28)

for any 𝐱 (in the sense of quadratic forms). Then, the logarithmic Sobolev inequality (13.27) holds with an LSI constant 𝛾 ≤ 2/𝐾, i.e., (13.29)

𝑆(𝑓) ≤

2 𝐷(√𝑓) for any density 𝑓 with ∫ 𝑓 d𝜇 = 1. 𝐾

130

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

Furthermore, the dynamics (13.30)

𝜕𝑡 𝑓𝑡 = ℒ𝑓𝑡 ,

𝑡 > 0,

relaxes to equilibrium on the time scale 𝑡 ≍ 1/𝐾, both in the sense of entropy and Dirichlet form: (13.31)

𝑆(𝑓𝑡 ) ≤ 𝑒−2𝑡𝐾 𝑆(𝑓0 ),

𝐷(√𝑓𝑡 ) ≤

2 −𝑡𝐾 𝑒 𝑆(𝑓0 ). 𝑡

Proof. Let 𝑓𝑡 be the solution to the evolution equation (13.30) with a given smooth initial condition 𝑓0 . Simple calculation shows that the derivative of the entropy 𝑆(𝑓𝑡 ) is given by 𝜕𝑡 𝑆(𝑓𝑡 ) = ∫(ℒ𝑓𝑡 ) log 𝑓𝑡 d𝜇 + ∫ 𝑓𝑡 (13.32) = −∫

ℒ𝑓𝑡 d𝜇 𝑓𝑡

(∇𝑓𝑡 )2 d𝜇 = −4𝐷(√𝑓𝑡 ), 𝑓𝑡

where we used that ∫ ℒ𝑓𝑡 d𝜇 = 0 by (13.26). Similarly, we can compute the evolution of the Dirichlet form. Let ℎ𝑡 ∶= √𝑓𝑡 for simplicity; then 𝜕𝑡 ℎ𝑡 =

1 1 1 𝜕𝑡 ℎ𝑡2 = ℒℎ𝑡2 = ℒℎ𝑡 + (∇ℎ𝑡 )2 . 2ℎ𝑡 2ℎ𝑡 ℎ𝑡

We compute (dropping the 𝑡 subscript for brevity) 𝜕𝑡 𝐷(√𝑓𝑡 ) = 𝜕𝑡 ∫(∇ℎ)2 d𝜇 = 2 ∫ ∇ℎ ⋅ ∇𝜕𝑡 ℎ d𝜇 = 2 ∫(∇ℎ) ⋅ (∇ℒℎ)d𝜇 + 2 ∫(∇ℎ) ⋅ ∇

(∇ℎ)2 d𝜇 ℎ

= 2 ∫(∇ℎ) ⋅ [∇, ℒ]ℎ d𝜇 + 2 ∫(∇ℎ) ⋅ ℒ(∇ℎ)d𝜇

(13.33)

+ 2 ∫ ∑ 𝜕𝑖 ℎ[ 𝑖𝑗

2(𝜕𝑗 ℎ)𝜕𝑖 𝜕𝑗 ℎ (𝜕𝑗 ℎ)2 𝜕𝑖 ℎ − ]d𝜇 ℎ ℎ2

= −2 ∫(∇ℎ) ⋅ (∇2 ℋ)∇ℎ d𝜇 − 2 ∫ ∑(𝜕𝑖 𝜕𝑗 ℎ)2 d𝜇 𝑖𝑗

+ 2 ∫ ∑[ 𝑖𝑗

2(𝜕𝑗 ℎ)(𝜕𝑖 ℎ)𝜕𝑖𝑗 ℎ (𝜕𝑗 ℎ)2 (𝜕𝑖 ℎ)2 − ]d𝜇 ℎ ℎ2

= −2 ∫(∇ℎ) ⋅ (∇2 ℋ)∇ℎ d𝜇 − 2 ∫ ∑(𝜕𝑖𝑗 ℎ − 𝑖𝑗

(𝜕𝑖 ℎ)(𝜕𝑗 ℎ) 2 ) d𝜇 ℎ

13.3. LOGARITHMIC SOBOLEV INEQUALITY

131

where we used the commutator [∇, ℒ] = −(∇2 ℋ)∇. Therefore, under the convexity condition (13.28), we have (13.34)

𝜕𝑡 𝐷(√𝑓𝑡 ) ≤ −2𝐾𝐷(√𝑓𝑡 ).

Integrating (13.34), we have (13.35)

𝐷(√𝑓𝑡 ) ≤ 𝑒−2𝑡𝐾 𝐷(√𝑓0 ).

This proves that the equilibrium is achieved at 𝑡 = ∞ with 𝑓∞ = 1, and both the entropy and the Dirichlet form are zero. Integrating (13.32) from 𝑡 = 0 to 𝑡 = ∞, and using the monotonicity of 𝐷(√𝑓𝑡 ) from (13.34), we obtain ∞

−𝑆(𝑓0 ) = −4 ∫ 𝐷(√𝑓𝑡 )d𝑡 0

(13.36)



≥ −4𝐷(√𝑓0 ) ∫ 𝑒−2𝑡𝐾 d𝑡 0

2 = − 𝐷(√𝑓0 ). 𝐾 This proves the LSI (13.29) for any smooth function 𝑓 = 𝑓0 . By a standard density argument, it can be extended to any function 𝑓 with finite Dirichlet form, 𝐷(√𝑓) < ∞. For functions with unbounded Dirichlet form we interpret the LSI (13.29) as a tautology. In particular, (13.29) holds for 𝑓𝑡 as well, 𝑆(𝑓𝑡 ) ≤ (2/𝐾)𝐷(√𝑓𝑡 ). Inserting this back into (13.32), we have 𝜕𝑡 𝑆(𝑓𝑡 ) ≤ −2𝐾𝑆(𝑓𝑡 ). Integrating this inequality from time 0, we obtain the exponential relaxation of the entropy on time scale 𝑡 ≍ 1/𝐾 (13.37)

𝑆(𝑓𝑡 ) ≤ 𝑒−2𝑡𝐾 𝑆(𝑓0 ).

Finally, we can integrate (13.32) from time 𝑡/2 to 𝑡 to get 𝑡

𝑆(𝑓𝑡 ) − 𝑆(𝑓𝑡/2 ) = −4 ∫ 𝐷(√𝑓𝜏 )d𝜏. 𝑡/2

Using the positivity of the entropy 𝑆(𝑓𝑡 ) ≥ 0 on the left side and the monotonicity of the Dirichlet form (from (13.34)) on the right side, we get 2 𝑆(𝑓𝑡/2 ); 𝑡 thus, using (13.37), we obtain exponential relaxation of the Dirichlet form on time scale 𝑡 ≍ 1/𝐾, 2 𝐷(√𝑓𝑡 ) ≤ 𝑒−𝑡𝐾 𝑆(𝑓0 ). □ 𝑡 (13.38)

𝐷(√𝑓𝑡 ) ≤

132

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

Standard example. Let 𝜇 be the centered Gaussian measure with variance 𝑎2 on ℝ𝑛 , i.e., (13.39)

d𝜇(𝑥) =

1 2 2 𝑒−𝑥 /2𝑎 d𝑥. (2𝜋𝑎2 )𝑛/2

Written in the form 𝑒−ℋ /𝑍 with the quadratic Hamiltonian ℋ(𝑥) = 𝑥 2 /2𝑎2 , we see that (13.28) holds with 𝐾 = 𝑎−2 . Then, (13.29) with the replacement 𝑓 → 𝑓 2 yields that ∫ 𝑓 2 log 𝑓 2 d𝜇 ≤ 2𝑎2 ∫(∇𝑓)2 d𝜇

(13.40)

for any normalized ∫ 𝑓 2 d𝜇 = 1. The constant of this inequality is optimal. Alternative formulation of LSI. We remark that there is another formulation of the LSI (see [100]). To make this connection, let (13.41)

𝑔(𝑥) ∶= 𝑓(𝑥)

1 2 2 𝑒−𝑥 /4𝑎 , (2𝜋𝑎2 )𝑛/4

and notice that ∫ 𝑔2 (𝑥)d𝑥 = ∫ 𝑓 2 d𝜇 where d𝑥 is the Lebesgue measure and d𝜇 is of the form (13.39). Rewriting (13.40) to an inequality for 𝑔 and with the replacement 2𝜋𝑎2 → 𝑎2 , we get the following version of the LSI: (13.42) ∫ 𝑔2 log(𝑔2 /‖𝑔‖2 )d𝑥 + 𝑛[1 + log 𝑎] ∫ 𝑔2 d𝑥 ≤ (𝑎2 /𝜋) ∫ |∇𝑔|2 d𝑥, ℝ𝑛

ℝ𝑛

ℝ𝑛

which holds for any 𝑎 > 0 and any function 𝑔, where ‖𝑔‖ = (∫ 𝑔2 d𝑥)1/2 is the 𝐿2 -norm with respect to the Lebesgue measure. Proposition 13.7 (LSI implies spectral gap). Let 𝜇 satisfy the LSI (13.27) with an LSI constant 𝛾. Then, for any 𝑣 ∈ 𝐿2 (𝜇) with ∫ 𝑣 d𝜇 = 0, we have (13.43)

∫ 𝑣 2 d𝜇 ≤

𝛾 𝛾 ∫ |∇𝑣|2 d𝜇 = 𝐷(𝑣); 2 2

i.e., 𝜇 has a spectral gap of size at least 𝛾/2. Proof. By definition of the LSI constant, we have ∫ 𝑢 log 𝑢 d𝜇 ≤ 𝛾𝐷(√𝑢) for any 𝑢 with ∫ 𝑢 d𝜇 = 1. For any bounded, smooth function 𝑣 with ∫ 𝑣 d𝜇 = 0, define 𝑢 = 1 + 𝜀𝑣. Then, we have (13.44)

𝜀−2 ∫(1 + 𝜀𝑣) log(1 + 𝜀𝑣)d𝜇 ≤

|∇𝑣|2 𝛾 ∫ d𝜇. 4 1 + 𝜀𝑣

Taking the limit 𝜀 → 0, we get that the right-hand side converges to by dominated convergence. This proves the proposition.

1 2

∫ 𝑣2 d𝜇 □

13.3. LOGARITHMIC SOBOLEV INEQUALITY

133

Proposition 13.8 (Concentration inequality (Herbst bound)). Suppose that the measure 𝜇 satisfies the LSI with a constant 𝛾. Let 𝐹 be a function with 𝔼𝜇 𝐹 = 0. Then, we have 𝛾 (13.45) 𝔼𝜇 𝑒𝐹 ≤ exp ( ‖∇𝐹‖2∞ ) 4 where ∑ |𝜕𝑖 𝐹(𝐱)|2 . √ 𝑖

‖∇𝐹‖∞ ∶= sup 𝐱

In particular, we have ℙ𝜇 (|𝐹| ≥ 𝛼) ≤ exp(−

(13.46)

𝛼2 ) 𝛾‖∇𝐹‖2∞

for any 𝛼 > 0. Notice that we get an exponential tail estimate from the LSI. If we only have the spectral gap estimate, (13.43), we can only bound the variance of 𝐹 that yields a quadratic tail estimate 𝛾‖∇𝐹‖2∞ . 2𝛼2 In our typical applications, we have 𝛾‖∇𝐹‖2∞ ≍ 𝑁 −1 . We often need to control the concentration of many (≍ 𝑁 𝐶 ) different functions 𝐹 in parallel. Thus the simple union bound is applicable with the LSI but not with the spectral gap estimate. ℙ𝜇 (|𝐹| ≥ 𝛼) ≤

Proof. Denote by 𝑢 = 𝑢(𝑡) ∶=

exp (𝑒𝑡 𝐹) . 𝔼𝜇 exp (𝑒𝑡 𝐹)

By differentiation and the LSI, we have d −𝑡 [𝑒 log 𝔼𝜇 exp (𝑒𝑡 𝐹)] = 𝑒−𝑡 𝔼𝜇 𝑢 log 𝑢 ≤ 𝑒−𝑡 𝛾𝔼𝜇 |∇√𝑢|2 . d𝑡

(13.47) Clearly,

𝑒2𝑡 ‖∇𝐹‖2∞ . 4 Integrating from any 𝑡 < 0 to 0 yields that 𝔼𝜇 |∇√𝑢|2 ≤

(13.48)

0

𝜇

(13.49)

[log 𝔼 exp (𝐹)] − [𝑒

−𝑡

𝛾 log 𝔼 exp (𝑒 𝐹)] = ‖∇𝐹‖2∞ ∫ d𝑠 𝑒𝑠 . 4 𝑡 𝜇

𝑡

From the condition 𝔼𝜇 𝐹 = 0, we have 1 log 𝔼𝜇 𝑒𝜀𝐹 𝜀→0 𝜀 1 = lim log 𝔼𝜇 [1 + 𝜀𝐹 + 𝑂(𝜀2 𝑒𝜀𝐹 )] = 0 𝜀→0 𝜀

lim [𝑒−𝑡 log 𝔼𝜇 exp (𝑒𝑡 𝐹)] = lim

𝑡→−∞

134

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

by dominated convergence. This proves the inequality (13.45). The concentration bound (13.46) follows from an exponential Markov inequality: 𝛾 ℙ𝜇 (𝐹 ≥ 𝛼) ≤ 𝔼𝜇 𝑒𝑡(𝐹−𝛼) ≤ exp(−𝛼𝑡 + 𝑡 2 ‖∇𝐹‖2∞ ), 4 where we chose the optimal 𝑡 = 2𝛼/[𝛾‖∇𝐹‖2∞ ] in the last step. Changing 𝐹 → −𝐹, we obtain the opposite bound and thus we have proved (13.46). □ Proposition 13.9 (Stability of the LSI constant). Consider two measures 𝜇, 𝜈 on ℝ𝑁 that are related by 𝜈 = 𝑔𝜇 with some bounded function 𝑔. Let 𝛾𝜇 denote the LSI constant for 𝜇. Then 𝛾𝜈 ≤ ‖𝑔‖∞ ‖𝑔−1 ‖∞ 𝛾𝜇 . Proof. Take an arbitrary function 𝑓 ≥ 0 with ∫ 𝑓 d𝜈 = 1. Let 𝛼 ∶= ∫ 𝑓 d𝜇 ≤ ‖𝑔−1 ‖∞ and by definition of the entropy we have 𝑆𝜇 (𝑓/𝛼) = ∫(𝑓/𝛼) log(𝑓/𝛼)d𝜇. The following inequality for any two nonnegative numbers 𝑎, 𝑏 can be checked by elementary calculus: 𝑎 log 𝑎 − 𝑏 log 𝑏 − (1 + log 𝑏)(𝑎 − 𝑏) ≥ 0. Hence, ∫[𝑓 log 𝑓 − 𝛼 log 𝛼 − (1 + log 𝛼)(𝑓 − 𝛼)]d𝜈 ≤ ‖𝑔‖∞ ∫[𝑓 log 𝑓 − 𝛼 log 𝛼 − (1 + log 𝛼)(𝑓 − 𝛼)]d𝜇 = ‖𝑔‖∞ 𝛼𝑆𝜇 (𝑓/𝛼). The left-hand side of the last inequality is equal to 𝑆𝜈 (𝑓) − [log 𝛼 − (𝛼 − 1)] ≥ 𝑆𝜈 (𝑓) , where we have used the concavity of log. This proves that 𝑆𝜈 (𝑓) ≤ ‖𝑔‖∞ 𝛼𝑆𝜇 (𝑓/𝛼). Suppose that the LSI holds for 𝜇 with constant 𝛾𝜇 . Then, we have 𝑆𝜈 (𝑓) ≤ ‖𝑔‖∞ 𝛾𝜇 𝛼 ∫(∇√𝑓/𝛼)2 d𝜇 = ‖𝑔‖∞ 𝛾𝜇 ∫(∇√𝑓)2 𝑔−1 d𝜈 ≤ ‖𝑔‖∞ ‖𝑔−1 ‖∞ 𝛾𝜇 ∫(∇√𝑓)2 d𝜈, and this proves the lemma.



Proposition 13.10 (Tensorial property of the LSI). Consider two probability measures 𝜇, 𝜈 on ℝ𝑛 , ℝ𝑚 , respectively. Suppose that the LSI holds for them with LSI constants 𝛾𝜇 and 𝛾𝜈 , respectively. Let 𝜔 = 𝜇 ⊗ 𝜈 be the product measure on ℝ𝑚+𝑛 . Then, the LSI holds for 𝜔 with LSI constant 𝛾𝜔 ≤ max{𝛾𝜇 , 𝛾𝜈 }.

13.3. LOGARITHMIC SOBOLEV INEQUALITY

135

Proof. We will use the notation introduced in Section 13.2 with Ω1 = ℝ𝑛 , Ω2 = ℝ𝑚 . Recall the additivity of entropy (13.20) w.r.t. martingale decomposition. In the current situation, 𝜔 = 𝜇 ⊗ 𝜈 and thus 𝜔 ˆ = 𝜇 and 𝜔𝑥 = 𝜈 for any 𝑥. Furthermore, (13.50)

ˆ = 𝜔(𝑥) 𝑓(𝑥) ˆ −1 ∫ 𝑓(𝑥, 𝑦)𝜔(𝑥, 𝑦)d𝑦 = ∫ 𝑓(𝑥, 𝑦)𝜈(𝑦)d𝑦.

By the additivity of entropy and the LSI w.r.t. 𝜇 and 𝜈, we have ˆ 𝑆𝜔 (𝑓) = 𝔼𝑓𝜇 𝑆𝜈 (𝑓𝑥 ) + 𝑆𝜇 (𝑓ˆ ) 2

ˆ ∫ ≤ 𝛾𝜈 ∫ 𝑓(𝑥)𝜇(𝑥)d𝑥

(13.51)

|∇𝑦 √𝑓(𝑥, 𝑦)| 𝜈(𝑦)d𝑦 ˆ 𝑓(𝑥)

| | + 𝛾𝜇 ∫ ||∇𝑥 ∫ 𝑓(𝑥, 𝑦)𝜈(𝑦)d𝑦 || 𝜇(𝑥)d𝑥. | √ | The integral in the first term on the right-hand side is equal to 2

∬|∇𝑦 √𝑓(𝑥, 𝑦)| 𝜇(𝑥)𝜈(𝑦)d𝑥 d𝑦. The integral of the second term on the right-hand side is bounded by 2

2 | ∫ ∇𝑥 𝑓(𝑥, 𝑦)𝜈(𝑦)d𝑦 | ∫ 𝜇(𝑥)d𝑥 ≤ ∬||∇𝑥 √𝑓(𝑥, 𝑦)|| 𝜇(𝑥)𝜈(𝑦)d𝑥 d𝑦, 4 ∫ 𝑓(𝑥, 𝑦)𝜈(𝑦)d𝑦

where we have written ∇𝑥 𝑓 = 2(∇𝑥 √𝑓)√𝑓 and used the Schwarz inequality. Summarizing, we have proved that 2

2

𝑆𝜔 (𝑓) ≤ 𝛾𝜈 ∫|∇𝑦 √𝑓(𝑥, 𝑦)| 𝜔(𝑥, 𝑦)d𝑥 d𝑦 + 𝛾𝜇 ∫||∇𝑥 √𝑓(𝑥, 𝑦)|| 𝜔(𝑥, 𝑦)d𝑥 d𝑦, □

and this proves the proposition.

The following lemma is a useful tool to control the entropy flow w.r.t. nonequilibrium measure. Lemma 13.11 ([140]). Suppose we have evolution equation 𝜕𝑡 𝑓𝑡 = ℒ𝑓𝑡 with ℒ defined via the Dirichlet form ⟨𝑓, (−ℒ)𝑓⟩𝜇 = ∑𝑗 ∫(𝜕𝑗 𝑓)2 d𝜇. Then, for any time-dependent probability density 𝜓𝑡 w.r.t. 𝜇, we have the entropy flow identity 𝜕𝑡 𝑆𝜇 (𝑓𝑡 |𝜓𝑡 ) = −4 ∑ ∫(𝜕𝑗 √𝑔𝑡 )2 𝜓𝑡 d𝜇 + ∫ 𝑔𝑡 (ℒ − 𝜕𝑡 )𝜓𝑡 d𝜇 𝑗

where 𝑔𝑡 ∶= 𝑓𝑡 /𝜓𝑡 and 𝑆𝜇 (𝑓𝑡 |𝜓𝑡 ) ∶= ∫ 𝑓𝑡 log is the relative entropy of 𝑓𝑡 𝜇 w.r.t. 𝜓𝑡 𝜇.

𝑓𝑡 d𝜇 = 𝑆(𝑓𝑡 𝜇|𝜓𝑡 𝜇) 𝜓𝑡

136

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

Proof. A simple computation then yields that ℒ𝑓 d 𝑆𝜇 (𝑓𝑡 |𝜓𝑡 ) = ∫(ℒ𝑓𝑡 )(log 𝑔𝑡 )d𝜇 + ∫ 𝑓𝑡 𝑡 d𝜇 − ∫(𝜕𝑡 𝜓𝑡 ) 𝑔𝑡 d𝜇 d𝑡 𝑓𝑡 = ∫(ℒ𝑓𝑡 )(log 𝑔𝑡 )d𝜇 − ∫(𝜕𝑡 𝜓𝑡 ) 𝑔𝑡 d𝜇 = ∫(𝑔𝑡 𝜓𝑡 ) ℒ(log 𝑔𝑡 )d𝜇 − ∫ 𝑔𝑡 𝜕𝑡 𝜓𝑡 d𝜇 = ∫ 𝜓𝑡 [𝑔𝑡 ℒ(log 𝑔𝑡 ) − 𝑔𝑡

ℒ𝑔𝑡 ] d𝜇 + ∫ 𝑔𝑡 (ℒ − 𝜕𝑡 )𝜓𝑡 d𝜇. 𝑔𝑡

By definition of ℒ, we have 2

(13.52)

(𝜕𝑗 𝑔) 2 ℒ𝑔 ℒ(log 𝑔) − = −∑ = −4 ∑ (𝜕𝑗 √𝑔) , 𝑔 𝑔 𝑗 𝑗 □

and we have proved Lemma 13.11.

We remark that stochastic processes and their generators can be defined in more general setups, e.g., the underlying probability spaces may be different from ℝ𝑁 and the generators may involve discrete jumps. In this case, the stochastic generator ℒ is defined directly, without an a priori Dirichlet form. It can, however, be proved that −𝑔[ℒ(log 𝑔) − (ℒ𝑔)/𝑔] ≥ 0, which can then be viewed as a generalization of the “Dirichlet form” associated with a generator. 13.4. Hypercontractivity We now present an interesting connection between the LSI of a probability measure 𝜇 and the hypercontractivity properties of the semigroup generated by ℒ = ℒ𝜇 . Since this result will not be used later in this book, this section can be skipped. To state the result, we define the semigroup {𝑃𝑡 }𝑡≥0 by 𝑃𝑡 𝑓 ∶= 𝑓𝑡 , where 𝑓𝑡 solves the equation 𝜕𝑡 𝑓𝑡 = ℒ𝑓𝑡 with initial condition 𝑓0 = 𝑓. Theorem 13.12 (L. Gross [79]). For a measure 𝜇 on ℝ𝑛 and for any fixed constants 𝛽 ≥ 0 and 𝛾 > 0 the following two statements are equivalent: (i) The generalized LSI (13.53) ∫ 𝑓 log 𝑓 d𝜇 ≤ 𝛾 ∫ |∇√𝑓|2 d𝜇 + 𝛽

for any 𝑓 ≥ 0 with ∫ 𝑓 d𝜇 = 1

holds. (ii) The hypercontractivity estimate (13.54)

1 1 − ]} ‖𝑓‖𝐿𝑝 (𝜇) 𝑝 𝑞 holds for all exponents satisfying 𝑞−1 ≤ 𝑒4𝑡/𝛾 , 1 < 𝑝 ≤ 𝑞 < ∞. 𝑝−1 ‖𝑃𝑡 𝑓‖𝐿𝑞 (𝜇) ≤ exp {𝛽 [

13.4. HYPERCONTRACTIVITY

137

Proof. We will only prove (i) ⇒ (ii), i.e., that the generalized LSI implies the decay estimate, the proof of the converse statement can be found in [43]. First we assume that 𝑓 ≥ 0, hence 𝑓𝑡 ≥ 0. Direct differentiation yields the identity (13.55)

d log ‖𝑓𝑡 ‖𝑝(𝑡) = d𝑡 𝑝(𝑡) ̇ 4(𝑝(𝑡) − 1) 𝐷(𝑢(𝑡)) + ∫ 𝑢(𝑡)2 log(𝑢(𝑡)2 )d𝜇] [− 𝑝(𝑡) ̇ 𝑝(𝑡)2

with 𝑝(𝑡) ̇ =

d d𝑡

𝑝(𝑡) and where we defined 𝑝(𝑡)/2

𝑢(𝑡) ∶= 𝑓𝑡

−𝑝(𝑡)/2

‖𝑓𝑡 ‖𝑝(𝑡)

,

𝐷(𝑢) = ∫(∇𝑢)2 d𝜇.

Now choose 𝑝(𝑡) to solve the differential equation 𝛾=

4(𝑝(𝑡) − 1) 𝑝(𝑡) ̇

with 𝑝(0) = 𝑝, i.e., 𝑝(𝑡) = 1 + (𝑝 − 1)𝑒4𝑡/𝛾 ,

where 𝛾 is the constant given in the theorem. Using (13.53) for the 𝐿1 (𝜇)normalized function 𝑢(𝑡)2 , we have 𝛽 𝑝(𝑡) ̇ d log ‖𝑓𝑡 ‖𝑝(𝑡) ≤ . d𝑡 𝑝(𝑡)2 Integrating both sides, from 𝑡 = 0 to some 𝑇 we have log ‖𝑓𝑇 ‖𝑝(𝑇) − log ‖𝑓‖𝑝 ≤ 𝛽[

1 1 − ]. 𝑝 𝑝(𝑇)

Choosing 𝑇 such that 𝑝(𝑇) = 𝑞, we have proved (13.54) for 𝑓 ≥ 0. The general case follows from separating the positive and negative parts of 𝑓. □ Exercise. In this exercise, we show that the idea of the LSI can be useful even if the invariant measure is not a probability measure. The sketch below follows the paper by Carlen-Loss [33], and it works for any parabolic equation of the type 𝜕𝑡 𝑓𝑡 = [∇ ⋅ (𝐷(𝑥, 𝑡)∇) + b(𝑥, 𝑡) ⋅ ∇]𝑓𝑡 for any divergence free b and 𝐷(𝑥, 𝑡) ≥ 𝑐 > 0. For simplicity of notation, we consider only the heat equation on ℝ𝑛 (13.56)

𝜕𝑡 𝑓𝑡 = Δ𝑓𝑡 .

138

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

The invariant measure of this flow is the standard Lebesgue measure on ℝ𝑛 . (i) Check the formula (13.55) in this setup, i.e., show that the following identity holds:

(13.57)

𝑝(𝑡) ̇ d 1 −𝑝(𝑡) d 𝑝(𝑡) 𝑝 ∫ 𝑓𝑡 d𝑥 − log ‖𝑓𝑡 ‖𝑝(𝑡) = ‖𝑓𝑡 ‖𝑝(𝑡) log(‖𝑓𝑡 ‖𝑝 ) d𝑡 d𝑡 𝑝(𝑡) 𝑝(𝑡)2 𝑝(𝑡) ̇ 4(𝑝(𝑡) − 1) ∫(∇𝑢(𝑡))2 d𝑥 = [− 𝑝(𝑡) ̇ 𝑝(𝑡)2 + ∫ 𝑢(𝑡)2 log(𝑢(𝑡)2 )d𝑥], where all norms are w.r.t. the Lebesgue measure and 𝑝/2

𝑢(𝑡) = 𝑓𝑡

−𝑝/2

‖𝑓𝑡 ‖𝑝

,

𝑝 = 𝑝(𝑡).

(ii) For any 𝑡 fixed, choose the constant in (13.42) by setting 𝑎2 =

4(𝑝(𝑡) − 1) . 𝑝(𝑡) ̇

Thus we have 𝑛𝑝(𝑡) ̇ 4𝜋(𝑝(𝑡) − 1) 1 d log ‖𝑓𝑡 ‖𝑝(𝑡) ≤ − )]. [1 + log( 2 d𝑡 2 𝑝(𝑡) ̇ 𝑝(𝑡) For a suitable choice of the function 𝑝(𝑡) with 𝑝(0) = 1, 𝑝(𝑇) = ∞, show that (13.58)

‖𝑓𝑇 ‖∞ ≤ (4𝜋𝑇)−𝑛/2 ‖𝑓0 ‖1 .

We remark that in the case of 𝐷 ≡ 1 and b ≡ 0, this heat kernel estimate is also a trivial consequence of the explicit formula for the heat kernel 𝑒𝑡∆ (𝑥, 𝑦). The point is, as remarked earlier, that this proof works for a general class of second-order parabolic equations. 13.5. Brascamp-Lieb Inequality The following inequality will not be needed for the main result of this book, so the reader may skip it. Nevertheless, we included it here since it is an important tool in the analysis of probability measures in very high dimensions and was used in universality proofs for the invariant ensembles [25]. We will formulate it on ℝ𝑁 , and in Section 13.7 we comment on how to extend it to certain probability measures on the simplex Σ𝑁 defined in (12.3). Theorem 13.13 (Brascamp-Lieb inequality [29]). Consider a probability measure 𝜇 on ℝ𝑁 of the form (13.24), i.e., 𝜇 = 𝑒−ℋ /𝑍. Suppose that the Hamiltonian is strictly convex, i.e., (13.59)

∇2 ℋ(𝐱) ≥ 𝐾 > 0

13.5. BRASCAMP-LIEB INEQUALITY

139

as a matrix inequality for some positive constant 𝐾. Then, for any smooth function 𝑓 ∈ 𝐿2 (𝜇), we have (13.60)

⟨𝑓; 𝑓⟩𝜇 ≤ ⟨∇𝑓, [∇2 ℋ]−1 ∇𝑓⟩𝜇 .

Recall that ⟨𝑓, 𝑔⟩𝜇 = ∫ 𝑓𝑔 d𝜇 denotes the scalar product and ⟨𝑓; 𝑔⟩𝜇 = ⟨𝑓, 𝑔⟩𝜇 − ⟨1, 𝑓⟩𝜇 ⟨1, 𝑔⟩𝜇 is the covariance. With a slight abuse of notation, we 𝑁

also use the notation ⟨F, G⟩𝜇 = ∑𝑖=1 ⟨𝐹𝑖 , 𝐺𝑖 ⟩𝜇 for the scalar product of any two vector-valued functions F, G ∶ ℝ𝑁 → ℝ𝑁 ; this extended scalar product is used in the right-hand side of (13.60). Remark. Notice that the Brascamp-Lieb inequality is optimal in the sense that, if 𝜇 is a Gaussian measure, then we can find 𝑓 so that the inequality becomes equality. Moreover, if we use the matrix inequality ∇2 ℋ ≥ 𝐾, then (13.60) is reduced to the spectral gap inequality (13.43), but of course (13.60) is stronger. The following proof is due to Helffer-Sjöstrand [81] and Naddaf-Spencer [109]. Proof of Theorem 13.13 . Recall that ℒ is the generator for a reversible dynamics with reversible measure 𝜇 defined in (13.26). We have for the dynamics (13.30) d ⟨𝑓, 𝑒𝑡ℒ 𝑔⟩ = ⟨𝑓, ℒ𝑒𝑡ℒ 𝑔⟩ = − ∑⟨𝜕𝑥𝑗 𝑓, 𝜕𝑥𝑗 𝑒𝑡ℒ 𝑔⟩. d𝑡 𝑗 Define 𝐺𝑗 (𝑡, 𝐱) ∶= 𝜕𝑥𝑗 [𝑒𝑡ℒ 𝑔(𝐱)].

G(𝑡, 𝑥) ∶= (𝐺1 (𝑡, 𝐱), … , 𝐺𝑁 (𝑡, 𝐱)), Recall the explicit form of ℒ:

ℒ = Δ − (∇ℋ) ⋅ ∇ = Δ + ∑ 𝑏𝑘 𝜕𝑥𝑘 ,

𝑏𝑘 ∶= −𝜕𝑥𝑘 ℋ.

𝑘

Define a new generator L on any vector-valued function G ∶ ℝ𝑁 → ℝ𝑁 by (LG)𝑗 (𝐱) ∶= ℒ𝐺𝑗 (𝐱) + ∑(𝜕𝑥𝑗 𝑏𝑘 )𝐺𝑘 (𝐱). 𝑘

Clearly, by explicit computation, we have 𝜕𝑡 G(𝑡, 𝐱) = LG(𝑡, 𝐱). From the decay estimate (13.31), it follows that the dynamics are mixing, i.e., lim𝑡→∞ ⟨𝑓, 𝑒𝑡ℒ 𝑔⟩ = 0 for any smooth function 𝑓 with ∫ 𝑓 d𝜇 = 0. Thus for

140

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

any such function 𝑓 we have ∞

⟨𝑓, 𝑔⟩𝜇 = − ∫ 0



d ⟨𝑓, 𝑒𝑡ℒ 𝑔⟩𝜇 d𝑡 = ∑ ∫ ⟨𝜕𝑥𝑗 𝑓, 𝐺𝑗 (𝑡, 𝐱)⟩𝜇 d𝑡 d𝑡 𝑗 0 ∞

= ∫ ⟨∇𝑓, 𝑒𝑡L ∇𝑔⟩𝜇 d𝑡 0

= ⟨∇𝑓, (−L)−1 ∇𝑔⟩𝜇 . Taking 𝑔 = 𝑓 and from the definition of L, we have −(L)𝑖,𝑗 = −ℒ𝛿𝑖,𝑗 − (𝜕𝑥𝑖 𝑏𝑗 ) ≥ (𝜕𝑥𝑖 𝜕𝑥𝑗 ℋ) as an operator inequality, and we have proved that ⟨𝑓, 𝑓⟩𝜇 = ⟨∇𝑓, (−L)−1 ∇𝑓⟩𝜇 ≤ ⟨∇𝑓, [∇2 ℋ]−1 ∇𝑓⟩𝜇 .



13.6. Remarks on the Applications of the LSI to Random Matrices The Herbst bound provides strong concentration results for probability measures that satisfy the LSI. In this section we exploit this connection for random matrices. For definiteness, we will consider the Gaussian orthogonal ensemble (GOE), although the arguments below can be extended to more general Wigner as well as invariant ensembles. There are two ways to view Gaussian matrix ensembles in the context of the LSI: we can either work on the probability space of the matrix 𝐻 with Gaussian matrix elements, or we can work directly on the space of the eigenvalues 𝝀 by using the explicit formula (4.12) for invariant ensembles. Both measures satisfy the LSI (in Section 13.7 we will comment on how to generalize the LSI from ℝ𝑁 to the simplex Σ𝑁 , (12.3), a version of the LSI that we will actually use). We will explore both possibilities, and we start with the matrix point of view. 13.6.1. LSI from the Wigner matrix point of view. Let 𝜇 = 𝜇G be the standard Gaussian measure 𝜇 ∼ exp (−𝑁 Tr 𝐻 2 ) on symmetric matrices. Notice that for this measure the family {𝑥𝑖𝑗 = 𝑁 1/2 ℎ𝑖𝑗 , 𝑖 ≤ 𝑗} is a collection of independent standard Gaussian random variables (up to a factor 2). Hence the LSI holds for every matrix element and by its tensorial property, i.e., Proposition 13.10, the LSI holds for any function of the full matrix considered as a function of this collection {𝑥𝑖𝑗 = 𝑁 1/2 ℎ𝑖𝑗 , 𝑖 ≤ 𝑗}. In particular, using the spectral gap estimate (13.43) for the Gaussian variables 𝑥𝑖𝑗 , we have for any function 𝐹 = 𝐹(𝐻) (13.61)

⟨𝐹(𝐻); 𝐹(𝐻)⟩𝜇 ≤ 𝐶 ∑ ∫ |𝜕𝑥𝑖𝑗 𝐹(𝐻)|2 d𝜇 = 𝑖≤𝑗

2 𝐶 ∑ ∫|𝜕ℎ𝑖𝑗 𝐹(𝐻)| d𝜇, 𝑁 𝑖≤𝑗

where the additional 𝑁 −1 factor comes from the scaling ℎ𝑖𝑗 = 𝑁 −1/2 𝑥𝑖𝑗 .

13.6. REMARKS ON THE APPLICATIONS OF THE LSI TO RANDOM MATRICES

141

Suppose that 𝐹(𝐻) = 𝑅(𝝀(𝐻)) is a real function of the eigenvalues 𝝀 = (𝜆1 , … , 𝜆𝑁 ); then by the chain rule we have 2

| 𝜕𝜆 | 𝐶 𝐶 ∑ ∫ |𝜕ℎ𝑖𝑗 𝑅(𝝀(𝐻))|2 d𝜇 = ∑ ∫|∑ 𝜕𝜆𝛼 𝑅(𝝀) 𝛼 | d𝜇. 𝑁 𝑖≤𝑗 𝑁 𝑖≤𝑗 | 𝛼 𝜕ℎ𝑖𝑗 | Expanding the square and using the perturbation formula (12.9) with real eigenvectors, we can compute 2

𝜕𝜆 | 1 | ∑ ∫|∑ 𝜕 𝑅(𝝀) 𝛼 || d𝜇 𝑁 𝑖≤𝑗 | 𝛼 𝜆𝛼 𝜕ℎ𝑖𝑗 2

(13.62)



𝜕𝜆 | 1 2 | ∫||∑ 𝜕𝜆𝛼 𝑅(𝝀) 𝛼 || d𝜇 = ∑ 𝑁 𝑖≤𝑗 2 − 𝛿𝑖𝑗 𝜕ℎ𝑖𝑗 𝛼

=

𝜕𝜆 𝜕𝜆𝛽 1 2 ∑ ∑ ∫ 𝜕𝜆𝛼 𝑅(𝝀)𝜕𝜆𝛽 𝑅(𝝀) 𝛼 d𝜇 𝑁 𝑖≤𝑗 2 − 𝛿𝑖𝑗 𝛼,𝛽 𝜕ℎ𝑖𝑗 𝜕ℎ𝑖𝑗

=

2 ∑[2 − 𝛿𝑖𝑗 ] ∑ ∫ 𝜕𝜆𝛼 𝑅(𝝀)𝜕𝜆𝛽 𝑅(𝝀)𝑢𝛼 (𝑖)𝑢𝛼 (𝑗)𝑢𝛽 (𝑖)𝑢𝛽 (𝑗) d𝜇 𝑁 𝑖≤𝑗 𝛼,𝛽

=

2 ∑ ∑ ∫ 𝜕𝜆𝛼 𝑅(𝝀)𝜕𝜆𝛽 𝑅(𝝀)𝑢𝛼 (𝑖)𝑢𝛼 (𝑗)𝑢𝛽 (𝑖)𝑢𝛽 (𝑗) d𝜇 𝑁 𝑖,𝑗 𝛼,𝛽

=

2 ∑ ∫ |𝜕𝜆𝛼 𝑅(𝝀)|2 d𝜇, 𝑁 𝛼

where we have used the orthogonality property and the normalization convention of the eigenvectors in the last step. We remark that the annoying factor [2 − 𝛿𝑖𝑗 ] can be avoided if we first consider 𝜆𝛼 as a function of all {𝑥𝑖𝑗 ∶ 1 ≤ 𝑖, 𝑗 ≤ 𝑁} as independent variables. Then the perturbation formula (12.9) becomes (13.63)

𝜕𝜆𝛼 | | = 𝑢𝛼 (𝑖)𝑢𝛼 (𝑗), 𝜕ℎ𝑖𝑗 |ℎ𝑖𝑗 =ℎ𝑗𝑖

i.e., the derivative evaluated on the submanifold of Hermitian matrices. In this way, we can keep the summation in (13.61) unrestricted, and up to a constant factor, we will get the same final result as in (13.62). In summary, we proved (13.64)

⟨𝐹; 𝐹⟩𝜇 = ⟨𝑅(𝝀(𝐻)); 𝑅(𝝀(𝐻))⟩𝜇 ≤

𝐶 ∑ ∫ |𝜕𝜆𝛼 𝑅(𝝀)|2 d𝜇. 𝑁 𝛼

Notice that this argument holds for any generalized Wigner matrix as long as a spectral gap estimate (13.43) holds for the distribution of every rescaled matrix element 𝑁 1/2 ℎ𝑖𝑗 . Furthermore, it can be generalized for Wigner matrices with Bernoulli random variables for which there is a spectral gap (and LSI) in discrete form. Similar remarks apply to the following LSI estimates.

142

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

To see how good this estimate is, consider a local linear statistics of eigenvalues, i.e., the local average of 𝐾 consecutive eigenvalues with label near 𝑀, by setting 1 𝛼−𝑀 𝑅(𝝀) ∶= ∑ 𝐴( )𝜆𝛼 , 𝐾 𝛼 𝐾 where 𝐴 is a smooth function of compact support and 1 ≤ 𝑀, 𝐾 ≤ 𝑁. Computing the right-hand side of (13.64) and combining it with (13.61), we get (13.65)

⟨𝐹; 𝐹⟩𝜇 ≤

𝐶 𝐶 𝛼−𝑀 ∑ 𝐴2 ( )≤ 𝐴 𝐾 𝑁𝐾 𝑁𝐾 2 𝛼

where the last constant 𝐶𝐴 depends on the function 𝐴. For 𝐾 = 1, this inequality estimates the square of the fluctuation of a single eigenvalue (choosing 𝐴 appropriately). The bound (13.65) is off by a factor 𝑁 since the true fluctuation is of order almost 1/𝑁 by rigidity, Theorem 11.5, at least in the bulk, i.e., if 𝛿𝑁 ≤ 𝑀 ≤ (1 − 𝛿)𝑁. On the other hand, for 𝐾 = 𝑁 the bound (13.65) is much more precise; it shows that the variance of a macroscopic average of the eigenvalues is at most of order 𝑁 −2 . This is the correct order of magnitude; in 𝛼 fact, it is known that ∑𝛼 𝐴( 𝑁 )𝜆𝛼 converges to a Gaussian random variable (see, e.g., [103,118] and references therein). Hence, the spectral gap argument yields the optimal (up to a constant) result for any macroscopic average of eigenvalues. Another common quantity of interest is the Stieltjes transform of the empirical eigenvalue distribution, i.e., (13.66) 𝐹(𝐻) = 𝐺(𝝀(𝐻))

with 𝐺(𝝀) = 𝑚𝑁 (𝑧) =

1 1 ∑ , 𝑧 = 𝐸 + 𝑖𝜂. 𝑁 𝛼 𝜆𝛼 − 𝑧

In this case, using (13.64), we can estimate ⟨𝑚𝑁 (𝑧); 𝑚𝑁 (𝑧)⟩𝜇 ≤ (13.67) =

𝐶 ∑ ∫ |𝜕𝜆𝛼 𝐺(𝝀)|2 d𝜇 𝑁 𝛼 𝐶 𝐶 1 ∑∫ d𝜇 ≤ 2 4 |𝜆𝛼 − 𝑧|4 𝑁3 𝛼 𝑁 𝜂

where we used |𝜆𝛼 − 𝑧| ≥ Im 𝑧 = 𝜂. This shows that the variance of 𝑚𝑁 (𝑧) vanishes if 𝜂 ≫ 𝑁 −1/2 . The last estimate can be improved if we know that the density of the empirical measure is bounded. Roughly speaking, if we know that in the summation over 𝛼, the number of eigenvalues in an interval of size 𝜂 near 𝐸 is at most of order 𝑁𝜂; then the last estimate can be improved to

(13.68)

1 1 1 ∑∫ d𝜇 ≲ 3 4 3 |𝜆𝛼 − 𝑧| 𝑁 𝛼 𝑁 ≤

∑ 𝛼 |𝜆𝛼 −𝐸|≤𝜂



1 d𝜇 |𝜆𝛼 − 𝑧|4

𝐶 1 𝐶 (𝑁𝜂) 4 = 2 3 . 𝜂 𝑁3 𝑁 𝜂

13.6. REMARKS ON THE APPLICATIONS OF THE LSI TO RANDOM MATRICES

143

This shows that the variance of 𝑚𝑁 (𝑧) vanishes if 𝜂 ≫ 𝑁 −2/3 . Since vanishing fluctuations can be used to estimate the density, this argument can actually be made rigorous by a bootstrapping argument [61]. Notice that the scale 𝜂 ≫ 𝑁 −2/3 is still far from the resolution demonstrated in the local semicircle law, Theorem 6.7. Whenever the LSI is available, the variance bounds can be easily lifted to a concentration estimate with exponential tail. We now demonstrate this mechanism for the fluctuation of a single eigenvalue. In other words, we will apply (13.46) with 𝐹(𝐻) = 𝜆𝛼 (𝐻) − 𝔼𝜇 𝜆𝛼 (𝐻) for a fixed 𝛼. From (12.9), we have ∑ |∇𝑥𝑖𝑗 𝐹|2 ≤ 𝑖≤𝑗

2 ; 𝑁

thus (13.46) implies (13.69)

ℙ𝜇 (|𝜆𝛼 − 𝔼𝜇 𝜆𝛼 | ≥ 𝑡) ≤ exp(−𝑐𝑁𝑡 2 )

for any 𝛼 fixed. This inequality has two deficiencies compared with the rigidity bounds in Theorem 11.5. First, the control of |𝜆𝛼 − 𝔼𝜇 𝜆𝛼 | is only up to 𝑁 −1/2 , which is still far from the optimal 𝑁 −1 (in the bulk). Second, it does not give information on the difference between the expectation 𝔼𝜇 𝜆𝛼 and the corresponding classical location 𝛾𝛼 defined as the 𝛼th 𝑁-quantile of the limiting density (see (11.31)). 13.6.2. LSI from the invariant ensemble point of view. Now we pass to the second point of view, where the basic measure 𝜇𝐺 is the invariant ensemble on the eigenvalues. One might hope that the situation can be improved since we look directly at the Gaussian eigenvalue ensemble defined in (12.13) with the Gaussian choice for 𝑉(𝜆) = 12 𝜆2 . Notice that the role of ℋ in Theorem 13.6 will be played by 𝑁ℋ𝑁 defined in (12.13) (with 𝛽 = 1). The Hessian of ℋ𝑁 is given by (all inner products and norms in the following equation are w.r.t. the standard inner product in ℝ𝑁 ) (13.70)

(𝑣𝑖 − 𝑣𝑗 )2 1 1 1 (𝐯, ∇2 ℋ𝑁 (𝐱)𝐯) ≥ ‖𝐯‖2 + ∑ ≥ ‖𝐯‖2 , 2 2 𝑁 𝑖 0 or only for 𝛽 ≥ 1 depends on the statement, as we explain now. For simplicity, we work in the setup of the Gaussian 𝛽-ensemble; i.e., we set 𝑁

1 2 1 𝑥 − ∑ log(𝑥𝑗 − 𝑥𝑖 ) 4 𝑖 𝑁 𝑖 0. As usual, 𝑍𝜇 is a normalization constant. Recall from (12.4) that the DBM on the simplex Σ𝑁 is defined via the stochastic differential equation (13.73)

d𝑥𝑖 =

√2

1 1 1 d𝐵𝑖 + (− 𝑥𝑖 + ∑ )d𝑡 2 𝑁 𝑗≠𝑖 𝑥𝑖 − 𝑥𝑗 √𝛽𝑁

for 𝑖 = 1, … , 𝑁,

where 𝐵1 , … , 𝐵𝑁 is a family of independent standard Brownian motions. The unique strong solution exists only if 𝛽 ≥ 1 (Theorem 12.2) with equilibrium measure 𝜇𝛽 . Since DBM does not exist on Σ𝑁 unless 𝛽 ≥ 1, we cannot use DBM either in the SDE form (12.4) or in the PDE form (12.17) when 𝛽 < 1 (see a

13.7. EXTENSIONS TO THE SIMPLEX; REGULARIZATION OF THE DBM

145

remark below (12.17)). On the other hand, certain results may be extended to any 𝛽 > 0 if their final formulations do not involve DBM. In this section, we present a regularization procedure to show that substantial parts of the main results of Theorem 13.6, i.e., the LSI for any 𝛽 > 0 and exponential relaxation decay of the entropy for 𝛽 ≥ 1, remain valid on the simplex Σ𝑁 . A similar generalization holds for the Brascamp-Lieb inequality. In the next section, the same regularization will be used to show that the key Dirichlet form inequality (Theorem 14.3) also holds for 𝛽 > 0. For later applications, we work with a slightly bigger class of measures than just 𝜇𝛽 . We consider measures on Σ = Σ𝑁 of the form ˆ −1 −𝛽𝑁ℋ 𝜔 = 𝑍𝜔 𝑒 𝜇𝛽 , ˆ(𝐱) = ∑ 𝑈𝑗 (𝑥𝑗 ) for some convex real valued functions 𝑈𝑗 on ℝ. The where ℋ 𝑗 ˆ. Note that 𝑈𝑗 are defined and convex total Hamiltonian of 𝜔 is ℋ𝜔 ∶= ℋ + ℋ on the entire ℝ𝑁 . The entropy and the Dirichlet form are defined as before: 𝑆𝜔 (𝑓) = ∫ 𝑓 log 𝑓 d𝜔,

𝐷𝜔 (𝑓) =

Σ

1 ∫ |∇𝑓|2 d𝜔. 𝛽𝑁 Σ

The corresponding DBM is given by (13.74)

d𝑥𝑖 =

√2

1 1 1 d𝐵𝑖 + (− 𝑥𝑖 − 𝑈𝑖′ (𝑥𝑖 ) + ∑ )d𝑡. 2 𝑁 𝑥 − 𝑥𝑗 √𝛽𝑁 𝑗≠𝑖 𝑖

We assume a lower bound on the Hessian of the total Hamiltonian, (13.75)

″ ˆ″ ≥ ℋ𝜔 = ℋ″ + ℋ

1 + min min 𝑈𝑗″ (𝑥) ≥ 𝐾 2 𝑗 𝑥∈ℝ

for some positive constant 𝐾 on the entire ℝ𝑁 . This bound (13.75) plays the role of (13.28). Let 𝐷𝜔 , 𝑆𝜔 , and ℒ𝜔 denote the Dirichlet form, entropy, and generator corresponding to the measure 𝜔. Now we claim that Theorem 13.6 holds for the measure 𝜔 on Σ𝑁 in the following form: Theorem 13.14. Assume (13.75). Then for 𝛽 > 0, the LSI holds, i.e., 2 𝐷 (√𝑓) 𝐾 𝜔 for any nonnegative normalized density 𝑓 on Σ𝑁 , ∫ 𝑓 d𝜔 = 1, that satisfies 𝑓 ∈ 𝐿∞ and ∇√𝑓 ∈ 𝐿∞ . For 𝛽 ≥ 1 the requirement that 𝑓 and ∇√𝑓 are bounded can be removed. Moreover, the Brascamp-Lieb inequality also holds for any 𝛽 > 0; i.e., for any bounded function 𝑓 ∈ 𝐿2 (Σ𝑁 , d𝜔) we have (13.76)

(13.77)

𝑆𝜔 (𝑓) ≤

−1

″ ⟨𝑓; 𝑓⟩𝜔 ≤ ⟨∇𝑓, [ℋ𝜔 ] ∇𝑓⟩𝜔 .

For 𝛽 ≥ 1 the requirement that 𝑓 be bounded can be removed.

146

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

Furthermore, for any 𝛽 ≥ 1 the dynamics 𝜕𝑡 𝑓𝑡 = ℒ𝜔 𝑓𝑡 with initial condition 𝑓0 = 𝑓 is well-defined on Σ𝑁 , and it approaches to equilibrium in the sense that (13.78)

𝑆𝜔 (𝑓𝑡 ) ≤ 𝑒−2𝑡𝐾 𝑆𝜔 (𝑓).

The inequalities above are understood in the usual sense that they are relevant only when the right-hand side is finite. Moreover, by a standard density argument, they extend to the closure of the corresponding spaces. For example, (13.76) holds for any 𝑓 that can be approximated by a sequence of bounded normalized densities 𝑓𝑛 ∈ 𝐿∞ with ∇√𝑓𝑛 ∈ 𝐿∞ such that 𝐷𝜔 (√𝑓𝑛 − √𝑓) → 0. Before starting the formal proof, we explain the key idea. For the proofs of (13.76) and (13.77), we extend the measure 𝜔 from Σ to the entire ℝ𝑁 in a continuous way by relaxing the strict ordering 𝑥𝑖 < 𝑥𝑖+1 imposed by Σ but heavily penalizing the opposite order. In this way we can use Theorem 13.6 for the regularized measure and avoid the problematic boundary terms in the integration by parts in (13.33). At the end we remove the regularization using the additional boundedness assumptions on 𝑓 and ∇√𝑓. For 𝛽 ≥ 1 these additional assumptions are not necessary using that 𝐶0∞ (Σ) functions are dense in 𝐻 1 (Σ, d𝜔). The entropy decay (13.78) will follow from the LSI and the time-integral version of the entropy dissipation (13.32), which can be proven directly on Σ if 𝛽 ≥ 1. We remark that there is an alternative regularization method that mimics the proof of Theorem 13.6 directly on Σ for 𝛽 ≥ 1. This is based on introducing carefully selected cutoff functions in order to approximate 𝑓𝑡 by a function compactly supported on Σ. The compact support renders the boundary terms in the integration by parts zero, but the cutoff does not commute with the dynamics; the error has to be tracked carefully. The advantage of this alternative method is that it also gives the exponential decay of the Dirichlet form, and it is also the closest in spirit to the strategy of the proof of Theorem 13.6 on ℝ𝑁 . The disadvantage is that it works only for 𝛽 ≥ 1; in particular, it does not yield the LSI for 𝛽 ∈ (0, 1). We will not discuss this approach in this book; the interested reader may find details in appendix B of [64]. −1 −𝛽𝑁ℋ𝛿 Proof of Theorem 13.14. Define the extension 𝜇𝛽𝛿 = 𝑍𝜇,𝛿 of the 𝑒 𝑁 measure 𝜇𝛽 from Σ𝑁 to ℝ for any 𝛿 > 0 by replacing the singular logarithm with a 𝐶 2 -function. To that end, we introduce the approximation parameter 𝛿 > 0 and define for 𝐱 ∈ ℝ𝑁

(13.79)

ℋ𝛿 (𝐱) ∶= ∑ 𝑖

1 2 1 𝑥 − ∑ log (𝑥 − 𝑥𝑖 ) 4 𝑖 𝑁 𝑖 0, if 𝑥 ≤ 0.

The convergence is monotone decreasing, i.e., (13.80)

log𝛿 (𝑥) ≥ log𝛿′ (𝑥),

𝑥 ∈ ℝ,

for any 𝛿 ≥ 𝛿 ′ . Furthermore, we have 1

if 𝑥 > 𝛿,

𝑥

𝜕𝑥 log𝛿 (𝑥) = { 2𝛿−𝑥 𝛿2

if 𝑥 ≤ 𝛿,

and the lower bound 1

𝜕𝑥2

− 2 log𝛿 (𝑥) ≥ { 𝑥1 − 2 𝛿

if 𝑥 > 𝛿, if 𝑥 ≤ 𝛿;

in particular, ℋ𝛿 is convex with ℋ𝛿″ ≥ 12 . Similarly, we can extend the measure 𝜔 to ℝ𝑁 by setting ˆ 𝛿 −1 −𝛽𝑁ℋ 𝜔𝛿 ∶= 𝑍𝜔,𝛿 𝑒 𝜇𝛽 ,

(13.81)

and its Hamiltonian still satisfies (13.75). By the monotonicity (13.80), we know that 𝑒−𝛽𝑁𝐻𝛿 (𝐱) is pointwise monotonically decreasing in 𝛿 and clearly 𝑍𝜇,𝛿 and 𝑍𝜔,𝛿 converge to 1 as 𝛿 → 0. Clearly, 𝜔𝛿 → 𝜔 as 𝛿 → 0 in the sense that for any set 𝐴 ⊂ ℝ𝑁 we have 𝜔𝛿 (𝐴) → 𝜔(𝐴 ∩ Σ𝑁 ). We remark that the regularized DBM corresponding to the measure 𝜔𝛿 is given by d𝑥𝑖 = (13.82)

√2

1 1 ′ d𝐵𝑖 +(− 𝑥𝑖 − 𝑈𝑖′ (𝑥𝑖 ) + ∑ log𝛿 (𝑥𝑖 − 𝑥𝑗 ) 2 𝑁 √𝛽𝑁 𝑗𝑖 𝛿 𝑗

Given a function 𝑓 on Σ𝑁 such that ∫Σ𝑁 𝑓 d𝜔 = 1 and 𝐷𝜔 (√𝑓) = ∫Σ𝑁 |∇√𝑓|2 d𝜔 < ∞, we extend it by symmetry to ℝ𝑁 . To do so, we first define the symmetrized version of Σ𝑁 , i.e., (13.83)

˜ 𝑁 ∶= ℝ𝑁 ⧵ {𝐱 ∶ ∃𝑖 ≠ 𝑗, 𝑥𝑖 = 𝑥𝑗 }, Σ

which has the disjoint union structure ˜𝑁 = Σ



𝜋(Σ𝑁 )

𝜋∈𝑆𝑁

where 𝜋 runs through all 𝑁-element permutations and acts by permuting the ˜ 𝑁 there is a unique 𝜋 so that coordinates of any point 𝐱 ∈ ℝ𝑁 . For any 𝐱 ∈ Σ ˜ ˜ (𝐱) ∶= 𝑓(𝜋 −1 (𝐱)). Clearly, 𝐱 ∈ 𝜋(Σ𝑁 ), and we then define the extension 𝑓 by 𝑓

148

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI)

𝜋 −1 (𝐱) is simply the coordinates of 𝐱 = (𝑥1 , … , 𝑥𝑁 ) permuted in increasing ˜ is defined on ℝ𝑁 apart from a zero measure set and is bounded order. Thus 𝑓 ˜ )1/2 ] is also bounded since ∇√𝑓 ∈ 𝐿∞ . since 𝑓 ∈ 𝐿∞ . Furthermore, ∇[( 𝑓 ˜ is bounded, we have ∫ 𝑁 𝑓 ˜ d𝜔𝛿 → ∫ 𝑓 d𝜔 = 1, so there is a Since 𝑓 ℝ Σ𝑁 ˜ constant 𝐶𝛿 such that 𝑓𝛿 ∶= 𝐶𝛿 𝑓 is a probability density, i.e., ∫ 𝑁 𝑓𝛿 d𝜔𝛿 = 1, ℝ

and clearly 𝐶𝛿 → 1 as 𝛿 → 0. Applying Theorem 13.6 to the measure 𝜔𝛿 and the function 𝑓𝛿 on ℝ𝑁 , we see that the LSI holds for 𝜔𝛿 , i.e., 𝑆𝜔𝛿 (𝑓𝛿 ) ≤

2 𝐷 (√𝑓𝛿 ), 𝐾 𝜔𝛿

or, equivalently, ˜ log 𝑓 ˜ d𝜔𝛿 + log 𝐶𝛿 ≤ 2𝐶𝛿 𝐷𝜔 ( 𝑓 ˜ ). 𝐶𝛿 ∫ 𝑓 𝛿 √ 𝐾 𝑁 ℝ ˜ )1/2 ], the weak convergence Now we let 𝛿 → 0. Using the boundedness of ∇[( 𝑓 of 𝜔𝛿 to 𝜔, and that 𝑍𝜔,𝛿 and 𝐶𝛿 converge to 1, the Dirichlet form on the right side of the last inequality converges to 𝐷𝜔 (√𝑓). The first term on the left side of the inequality converges to ∫ 𝑓 log 𝑓 d𝜔 by dominated convergence, and the second term converges to 0. Thus we arrive at (13.76). In the above argument, the boundedness of 𝑓 and ∇√𝑓 were only used to ˜ has finite integral, and the Dirichlet ensure that 𝑓 or rather its extension 𝑓 ˜ ), converges to 𝐷𝜔 (√𝑓 ). For form w.r.t. the regularized measure 𝜔𝛿 , 𝐷𝜔𝛿 (√ 𝑓 𝛽 ≥ 1 we can remove these conditions by using a different extension of 𝑓 to ℝ𝑁 if 𝑓 ∈ 𝐻 1 (Σ, d𝜔). We may assume that 𝐷𝜔 (√𝑓) < ∞; otherwise (13.76) is a tautology. We first still assume that 𝑓 ∈ 𝐿∞ (Σ). We smoothly cut off 𝑓 to be 0 at the boundary of Σ𝑁 ; i.e., we find a nonnegative sequence 𝑓𝜀 ∈ 𝐶0∞ (Σ𝑁 ) such that √𝑓𝜀 → √𝑓 in 𝐻 1 (Σ, d𝜔) and ∫ 𝑓𝜀 d𝜔 = 1. For 𝛽 ≥ 1 the existence of a similar sequence but 𝑓𝜀 → 𝑓 in 𝐻 1 was shown in Section 12.4. In fact, the same construction shows that we can also guarantee √𝑓𝜀 → √𝑓 in 𝐻 1 (Σ, d𝜔). Now we use the LSI for the smooth functions 𝑓𝜀 , i.e., 𝑆𝜔 (𝑓𝜀 ) ≤

2 𝐷 (√𝑓𝜀 ), 𝐾 𝜔

and we let 𝜀 → 0. The right-hand side converges to 𝐷𝜔 (√𝑓) by the above choice of 𝑓𝜀 . For the left-hand side, recall that apart from a smoothing that can be dealt with via standard approximation arguments, the cutoff function was constructed in the form 𝑓𝜀 (𝐱) = 𝐶𝜀 𝜙𝜀 (𝐱)𝑓(𝐱), where 𝜙𝜀 (𝐱) ∈ (0, 1) with 𝜙𝜀 ↗ 1 monotonically pointwise and 𝐶𝜀 is a normalization such that 𝐶𝜀 → 1 as 𝜀 → 0. Clearly, 𝑆𝜔 (𝑓𝜀 ) = 𝐶𝜀 log 𝐶𝜀 ∫ 𝜙𝜀 𝑓 d𝜔 + 𝐶𝜀 ∫ 𝑓𝜙𝜀 log 𝜙𝜀 d𝜔 + 𝐶𝜀 ∫ 𝜙𝜀 𝑓 log 𝑓 d𝜔. Σ

Σ

Σ

13.7. EXTENSIONS TO THE SIMPLEX; REGULARIZATION OF THE DBM

149

The first term converges to 0 since ∫ 𝜙𝜀 𝑓 d𝜔 ≤ 1 and 𝐶𝜀 log 𝐶𝜀 → 0. The second term also converges to 0 by dominated convergence since |𝜙𝜀 log 𝜙𝜀 | is bounded by 1 and goes to 0 pointwise. Finally, the last term converges to 𝑆𝜔 (𝑓) by monotone convergence. This proves (13.76) for 𝛽 ≥ 1 for any bounded 𝑓. Finally, we remove this last condition. Given any 𝑓 with 𝐷𝜔 (√𝑓) < ∞, we define ˜ ∶= 𝐶𝑀 𝑓𝑀 where 𝐶𝑀 is the normalization. Clearly, 𝑓𝑀 ∶= min{𝑓, 𝑀} and 𝑓 𝑀 ˜ , we have 𝐶𝑀 → 1 and 𝑓𝑀 ↗ 𝑓 pointwise. Since (13.76) holds for 𝑓 𝑀 (13.84) 𝐶𝑀 log 𝐶𝑀 ∫ 𝑓𝑀 d𝜔 + 𝐶𝑀 ∫ 𝑓𝑀 log 𝑓𝑀 d𝜔 ≤

2 2 𝐶𝑀 ∫|∇√𝑓𝑀 | d𝜔. 𝐾

Now we let 𝑀 → ∞. The first term on the left is just log 𝐶𝑀 → 0. The second term on the left converges to 𝑆𝜔 (𝑓) by monotone convergence and 𝐶𝑀 → 1 and similarly the right-hand side converges to (2/𝐾)𝐷𝜔 (√𝑓) by monotone convergence. This proves (13.76) for 𝛽 ≥ 1 without any additional condition on 𝑓. The Brascamp-Lieb inequality, (13.77), is proved similarly, starting from its regularized version on ℝ𝑁 , ˆ″ ]−1 ∇𝑓⟩ ⟨𝑓; 𝑓⟩𝜔𝛿 ≤ ⟨∇𝑓, [ℋ𝛿″ + ℋ 𝜔

𝛿

that follows directly from Theorem 13.13. We can then take the limit 𝛿 → 0 using monotone convergence on the left and the dominated convergence on the right, ˆ″ ]−1 is uniformly bounded. using that ℋ𝛿″ → ℋ and the inverse [ℋ𝛿″ + ℋ For the third part of the theorem, for the proof of (13.78), we first note that the remark after (12.17) applies to the generator ℒ𝜔 as well; i.e., 𝛽 ≥ 1 is necessary for the well-posedness of the equation 𝜕𝑡 𝑓𝑡 = ℒ𝜔 𝑓𝑡 on Σ𝑁 with initial condition 𝑓0 supported on Σ𝑁 . The construction of the dynamics in Section 12.4 also implies that 𝑓𝑡 ∈ 𝐻 1 (d𝜔) for any 𝑡 > 0 if 𝑓 ∈ 𝐿2 (d𝜔). We now mimic the proof of the entropy dissipation (13.32) in our setup. Since we do not know that 𝐷𝜔 (√𝑓𝑡 ) < ∞, we have to introduce a regularization 𝑐 > 0 to keep 𝑓𝑡 away from 0. We compute ℒ𝑓𝑡 d ∫ 𝑓𝑡 log(𝑓𝑡 + 𝑐)d𝜔 = ∫(ℒ𝑓𝑡 ) log(𝑓𝑡 + 𝑐)d𝜔 + ∫ 𝑓𝑡 d𝜔 d𝑡 𝑓𝑡 + 𝑐

(13.85)

= −∫

|∇𝑓𝑡 |2 𝑐ℒ𝑓𝑡 d𝜔 − ∫ d𝜔 𝑓𝑡 + 𝑐 𝑓𝑡 + 𝑐

= −∫

|∇𝑓𝑡 |2 𝑐|∇𝑓𝑡 |2 d𝜔 − ∫ d𝜔. 𝑓𝑡 + 𝑐 (𝑓𝑡 + 𝑐)2

Owing to the regularization 𝑐 > 0, we used integration by parts for 𝐻 1 (d𝜔) functions only. Dropping the last term that is negative and integrating from 0 to 𝑡, we obtain 𝑡

(13.86)

∫ 𝑓𝑡 log(𝑓𝑡 + 𝑐)d𝜔 + ∫ d𝑠 ∫ 0

|∇𝑓𝑠 |2 d𝜔 ≤ ∫ 𝑓0 log(𝑓0 + 𝑐)d𝜔. 𝑓𝑠 + 𝑐

150

Since 𝑓𝑡 log

13. ENTROPY AND THE LOGARITHMIC SOBOLEV INEQUALITY (LSI) 𝑓𝑡 +𝑐 𝑓𝑡

≥ 0, we have 𝑡

(13.87)

∫ 𝑓𝑡 log 𝑓𝑡 d𝜔 + ∫ ∫ 0

|∇𝑓𝑠 |2 d𝑠 ≤ ∫ 𝑓0 log(𝑓0 + 𝑐)d𝜔. 𝑓𝑠 + 𝑐

Note that both terms on the left-hand side are nonnegative. Now we let 𝑐 → 0. By monotone convergence and 𝑆𝜔 (𝑓0 ) < ∞, we get 𝑡

(13.88)

∫ 𝑓𝑡 log 𝑓𝑡 d𝜇 + ∫ d𝑠 ∫ 0

|∇𝑓𝑠 |2 d𝜔 ≤ ∫ 𝑓0 log 𝑓0 d𝜔 𝑓𝑠

or 𝑡

(13.89)

𝑆𝜔 (𝑓𝑡 ) + 4 ∫ 𝐷𝜔 (√𝑓𝑠 )d𝑠 ≤ 𝑆𝜔 (𝑓0 ). 0

This is the entropy dissipation inequality in a time integral form. Notice that neither equality nor the differential version as in (13.32) is claimed. Note that by integrating (13.85) between 𝑡 and 𝜏, a similar argument yields 𝑡

(13.90)

𝑆𝜔 (𝑓𝑡 ) + 4 ∫ 𝐷𝜔 (√𝑓𝑠 )d𝑠 ≤ 𝑆𝜔 (𝑓𝜏 ),

𝑡 ≥ 𝜏 ≥ 0.

𝜏

In particular, the entropy decays: (13.91)

0 ≤ 𝑆𝜔 (𝑓𝑡 ) ≤ 𝑆𝜔 (𝑓𝜏 ),

𝑡 ≥ 𝜏 ≥ 0.

Now we use the LSI (13.76) to estimate 𝐷𝜔 (√𝑓𝑠 ) in (13.90) and recall that the LSI holds for any 𝑓𝑠 since 𝛽 ≥ 1. We get 𝑡

(13.92)

𝑆𝜔 (𝑓𝑡 ) + 2𝐾 ∫ 𝑆𝜔 (𝑓𝑠 )d𝑠 ≤ 𝑆𝜔 (𝑓𝜏 ),

𝑡 ≥ 𝜏 ≥ 0.

𝜏

A standard calculus exercise shows that 𝑆𝜔 (𝑓𝑡 ) ≤ 𝑒−2𝐾𝑡 𝑆𝜔 (𝑓0 ) for all 𝑡 ≥ 0. One possible argument is to fix any 𝛿 > 0 and choose 𝜏 = (𝑛 − 1)𝛿, 𝑡 = 𝑛𝛿 with 𝑛 = 1, 2, … in (13.92). By monotonicity of the entropy, we have (1 + 2𝐾𝛿)𝑆𝜔 (𝑓𝑛𝛿 ) ≤ 𝑆𝜔 (𝑓(𝑛−1)𝛿 ) for any 𝑛, and by iteration we obtain 𝑆𝜔 (𝑓𝑛𝛿 ) ≤ (1 + 2𝐾𝛿)−𝑛 𝑆𝜔 (𝑓0 ). Setting 𝛿 = 𝑡/𝑛 and letting 𝑛 → ∞ we get (13.78). This completes the proof of Theorem 13.14. □

CHAPTER 14

Universality of the Dyson Brownian Motion In this section we assume 𝛽 ≥ 1 since Dyson Brownian motion in a strong sense is well-defined only for these values of 𝛽. However, some results will hold for any 𝛽 > 0, which we will comment on separately. We consider the Gaussian equilibrium measure 𝜇 = 𝜇G ∼ exp(−𝛽𝑁ℋ) on Σ𝑁 , defined in (12.13) in Section 12.3, with the Gaussian potential 𝑉(𝜆) = 12 𝜆2 . From (12.14) we recall the corresponding Dirichlet form 𝑁

(14.1)

𝐷𝜇 (𝑓) ∶=

1 1 ∑ ∫(𝜕𝑖 𝑓)2 d𝜇 = ‖∇𝑓‖2𝐿2 (𝜇) 𝛽𝑁 𝑖=1 𝛽𝑁

(𝜕𝑖 ≡ 𝜕𝜆𝑖 )

and the generator ℒ = ℒG defined via (14.2)

𝐷𝜇 (𝑓) = − ∫ 𝑓ℒG 𝑓 d𝜇,

and explicitly given by ℒG =

1 𝛽𝑁

Δ − (∇ℋ) ⋅ ∇. The corresponding dynamics is

given by (12.17), i.e., (14.3)

𝜕𝑡 𝑓𝑡 = ℒG 𝑓𝑡 ,

𝑡 ≥ 0.

In this section, we will drop all subscripts G. As remarked in the previous section, the Hamiltonian ℋ is convex since the Hessian of the Hamiltonian of 𝜇 satisfies ∇2 (𝛽𝑁ℋ) ≥ 𝛽𝑁/2 by (13.70). Taking the different normalization of the Dirichlet form in (13.25) and (14.1) into account, Theorem 13.6 (actually, its extension to Σ𝑁 in Theorem 13.14 with 𝑈𝑗 ≡ 0) guarantees that 𝜇 satisfies the LSI in the form 𝑆𝜇 (𝑓) ≤ 4𝐷𝜇 (√𝑓), and the relaxation time to equilibrium is of order 1. Furthermore, we have the exponential convergence to the equilibrium 𝜇G in the sense of total variation norm (13.78) at exponential speed on time scales beyond order 1 for initial data with finite entropy. The following theorem is the main result of this section. It shows that under a certain condition on the rigidity of the eigenvalues the relaxation times of the DBM for observables depending only on the eigenvalue differences are in fact much shorter than order 1. The main assumption is the a priori estimate (12.18), which we restate here. 151

152

14. UNIVERSALITY OF THE DYSON BROWNIAN MOTION

A priori estimate. There exists 𝜉 > 0 such that 𝑁

(14.4)

1 ∫ ∑ (𝜆𝑗 − 𝛾𝑗 )2 𝑓𝑡 (𝝀)𝜇G (d𝝀) ≤ 𝐶𝑁 −2+2𝜉 𝑄 ∶= sup 𝑁 0≤𝑡≤𝑁 𝑗=1

with a constant 𝐶 uniformly in 𝑁. We also assume that after time 1/𝑁 the solution of the equation (14.3) satisfies the bound 𝑆𝜇 (𝑓1/𝑁 ) ≤ 𝐶𝑁 𝑚

(14.5)

for some fixed 𝑚. Later in Lemma 14.6, we will show that for 𝛽 = 1, 2 this bound automatically holds. Theorem 14.1 (Gap universality of the Dyson Brownian motion for short time [64, theorem 4.1]). Let 𝛽 ≥ 1 and assume (14.5). Fix 𝑛 ≥ 1 and an array 𝑛 of positive integers, 𝐦 = (𝑚1 , … , 𝑚𝑛 ) ∈ ℕ𝑛 + . Let 𝐺 ∶ ℝ → ℝ be a bounded, smooth function with compact support and define (14.6) 𝒢𝑖,𝐦 (𝐱) ∶= 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+𝑚1 ), 𝑁(𝑥𝑖+𝑚1 − 𝑥𝑖+𝑚2 ), … , 𝑁(𝑥𝑖+𝑚𝑛−1 − 𝑥𝑖+𝑚𝑛 )). 1

Then, for any 𝜉 ∈ (0, ) and any sufficiently small 𝜀 > 0, independent of 𝑁, 2 there exist constants 𝐶, 𝑐 > 0, depending only on 𝜀 and 𝐺, such that for any 𝐽 ⊂ {1, … , 𝑁 − 𝑚𝑛 } we have (14.7)

| | 1 𝑁2𝑄 𝜀 |∫ ∑ 𝒢𝑖,𝐦 (𝐱)(𝑓𝑡 d𝜇 − d𝜇)| ≤ 𝐶𝑁 𝜀 + 𝐶𝑒−𝑐𝑁 . |𝐽| 𝑖∈𝐽 | √ |𝐽|𝑡 𝑡≥𝑁−1+2𝜉+𝜀 | sup

In particular, if (14.4) holds, then for any 𝑡 ≥ 𝑁 −1+2𝜉+2𝜀+𝛿 with some 𝛿 ≥ 0, we have (14.8)

| | 1 |∫ ∑ 𝒢𝑖,𝐦 (𝐱)(𝑓𝑡 d𝜇 − d𝜇)| ≤ 𝐶 | | |𝐽| 𝑖∈𝐽

1 √|𝐽|𝑁

𝜀

+ 𝐶𝑒−𝑐𝑁 . 𝛿−1

Thus the gap distribution, averaged over 𝐽 indices, coincides for 𝑓𝑡 d𝜇 and d𝜇 provided that |𝐽|𝑁 𝛿−1 → ∞. We stated this result with a very general test function (14.6) because we will need this form later. Obviously, controlling test functions of the form (14.6) for all 𝑛 is equivalent to controlling test functions of neighboring gaps only (maybe with a larger 𝑛), so the reader may think only of the case 𝑚1 = 1, 𝑚2 = 2, … , 𝑚𝑛 = 𝑛. In our applications, 𝐽 is chosen to be the indices of the eigenvalues in the interval [𝐸 − 𝑏, 𝐸 + 𝑏] and thus |𝐽| ≍ 𝑁𝑏. This identifies the averaged gap distributions of eigenvalues. However, Theorem 12.4 concerns correlation functions and not gap distributions directly. It is intuitively clear that the information carried in gap distribution or correlation functions are equivalent if both statistics are averaged on a scale larger than the typical fluctuation of a single eigenvalue

14.1. MAIN IDEAS BEHIND THE PROOF OF THEOREM 14.1

153

(which is smaller than 𝑁 −1+𝜀 in the bulk by (11.32)). This standard fact will be proved in Section 14.5. As pointed out after Theorem 12.4, the input of this theorem, the a priori estimate (14.4), identifies the location of the eigenvalues only on a scale 𝑁 −1+𝜉 , which is much weaker than the 1/𝑁 precision encoded in the rescaled eigenvalue differences in (14.7). Moreover, by the rigidity estimate (11.32), the a priori estimate (14.4) holds for any 𝜉 > 0 if the initial data of the DBM is a generalized Wigner ensemble. Therefore, Theorem 14.1 holds for any 𝑡 ≥ 𝑁 −1+𝜀 for any 𝜀 > 0. This establishes Dyson’s conjecture (described in Section 12.3) in the sense of averaged gap distributions for any generalized Wigner matrices. 14.1. Main Ideas Behind the Proof of Theorem 14.1 The key method is to analyze the relaxation to equilibrium of the dynamics (13.30). This approach was first introduced in section 5.1 of [63]; the presentation here follows [64]. Recall the convexity inequality (13.70) for the Hessian of the Hamiltonian (14.9)

⟨𝐯, ∇2 ℋ(𝐱)𝐯⟩ ≥

(𝑣𝑖 − 𝑣𝑗 )2 1 1 1 ≥ ‖𝐯‖2 , ‖𝐯‖2 + ∑ 2 2 𝑁 𝑖 0 we have 𝐷𝜔 (√𝑞) | | 1 ∑ 𝒢𝑖,𝐦 (𝐱)(𝑞 d𝜔 − d𝜔)| ≤ 𝐶(𝑡 (14.21) |∫ ) |𝐽| | |𝐽| 𝑖∈𝐽 |

1/2

+ 𝐶 √𝑆𝜔 (𝑞)𝑒−𝑐𝑡/𝜏 .

Proof. We give the proof for the 𝛽 ≥ 1 case here since this is relevant for Theorem 14.1. The general case 𝛽 > 0 with additional assumptions will be discussed in Section 14.4. For simplicity of notation, we consider only the case 𝑛 = 1, 𝑚1 = 1, 𝒢𝑖,𝐦 (𝐱) = 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 )). Let 𝑞𝑡 satisfy ˜ 𝑡, 𝜕𝑡 𝑞𝑡 = ℒ𝑞

𝑡 ≥ 0,

with an initial condition 𝑞0 = 𝑞. We write (14.22)

∫[

1 ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 ))](𝑞 − 1)d𝜔 = |𝐽| 𝑖∈𝐽 ∫[

1 ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 ))](𝑞 − 𝑞𝑡 )d𝜔 |𝐽| 𝑖∈𝐽 + ∫[

1 ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 ))](𝑞𝑡 − 1)d𝜔. |𝐽| 𝑖∈𝐽

The second term in (14.22) can be estimated by (13.8), the decay of the entropy (14.18), and the boundedness of 𝐺; this gives the second term in (14.21). ˜ 𝑡 To estimate the first term in (14.22), by the evolution equation 𝜕𝑞𝑡 = ℒ𝑞 ˜ we have and the definition of ℒ ∫

1 1 ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 ))𝑞𝑡 d𝜔 − ∫ ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+1 ))𝑞0 d𝜔 = |𝐽| 𝑖∈𝐽 |𝐽| 𝑖∈𝐽 𝑡

∫ d𝑠 ∫ 0

1 ∑ 𝐺 ′ (𝑁(𝑥𝑖 − 𝑥𝑖+1 ))[𝜕𝑖 𝑞𝑠 − 𝜕𝑖+1 𝑞𝑠 ]d𝜔. |𝐽| 𝑖∈𝐽

From the Schwarz inequality and 𝜕𝑞 = 2√𝑞𝜕 √𝑞, the last term is bounded by 1/2

𝑡

2 𝑁2 ∑[𝐺 ′ (𝑁(𝑥𝑖 − 𝑥𝑖+1 ))] (𝑥𝑖 − 𝑥𝑖+1 )2 𝑞𝑠 d𝜔] 2[∫ d𝑠 ∫ 2 |𝐽| 𝑖∈𝐽 0 ℝ𝑁 𝑡

(14.23)

2 1 1 ∑ × [∫ d𝑠 ∫ [𝜕𝑖 √𝑞𝑠 − 𝜕𝑖+1 √𝑞𝑠 ] d𝜔] 2 2 𝑁 𝑖 (𝑥𝑖 − 𝑥𝑖+1 ) ℝ𝑁 0

𝐷𝜔 (√𝑞0 )𝑡 ≤ 𝐶( ) |𝐽|

1/2

1/2

,

where we have used (14.16) and that [𝐺 ′ (𝑁(𝑥𝑖 − 𝑥𝑖+1 ))]2 (𝑥𝑖 − 𝑥𝑖+1 )2 ≤ 𝐶𝑁 −2 due to 𝐺 being smooth and compactly supported. □

14.2. PROOF OF THEOREM 14.1

157

Alternatively, we could have directly estimated the left-hand side of (14.21) by using the total variation norm between 𝑞𝜔 and 𝜔, which in turn could be estimated by the entropy and the Dirichlet form using the logarithmic Sobolev inequality, i.e., by (14.24)

𝐶 ∫ |𝑞 − 1|d𝜔 ≤ 𝐶 √𝑆𝜔 (𝑞) ≤ 𝐶 √𝜏𝐷𝜔 (√𝑞).

However, compared with this simple bound, the estimate (14.21) gains an extra factor |𝐽| ≍ 𝑁 in the denominator; i.e., it is in terms of the Dirichlet form per particle. The improvement is due to the fact that the observable in (14.21) depends only on the gap, i.e., difference of points. This allows us to exploit the additional term (14.16) gained in the Bakry-Émery argument. This is a manifestation of the general observation that gap observables behave much better than point observables. The final ingredient in proving Theorem 14.1 is the following entropy and Dirichlet form estimates. Theorem 14.4. Let 𝛽 ≥ 1 be arbitrary. Suppose that (13.70) holds and recall the definition of 𝑄 from (12.18). Fix some (possibly 𝑁-dependent) 𝜏 > 0, and consider the local relaxation measure 𝜔 with this 𝜏. Set 𝜓 ∶= 𝜔/𝜇 and let 𝑔𝑡 ∶= 𝑓𝑡 /𝜓 with 𝑓𝑡 solving the evolution equation (14.3). Suppose there is a constant 𝑚 such that 𝑆(𝑓𝜏 𝜇|𝜔) ≤ 𝐶𝑁 𝑚 .

(14.25)

Fix any small 𝜀 > 0. Then, for any 𝑡 ∈ [𝜏𝑁 𝜀 , 𝑁] the entropy and the Dirichlet form satisfy the estimates (14.26)

𝑆(𝑔𝑡 𝜔|𝜔) ≤ 𝐶𝑁 2 𝑄𝜏 −1 ,

𝐷𝜔 (√𝑔𝑡 ) ≤ 𝐶𝑁 2 𝑄𝜏 −2 ,

where the constants depend on 𝜀 and 𝑚. Remark 14.5. It will not be difficult to check that if the initial data is given by a Wigner ensemble, then (14.25) holds without any assumption (see Lemma 14.6). Proof of Theorem 14.4. Using Lemma 13.11, we can compute the evolution of the entropy 𝑆(𝑓𝑡 𝜇|𝜔) = 𝑆(𝑓𝑡 𝜇|𝜓𝜇) as 𝜕𝑡 𝑆(𝑓𝑡 𝜇|𝜔) = −

4 ∑ ∫(𝜕𝑗 √𝑔𝑡 )2 𝜓 d𝜇 + ∫(ℒ𝑔𝑡 )𝜓 d𝜇 𝛽𝑁 𝑗

where ℒ is defined in (14.3) and we used that 𝜓 = 𝜔/𝜇 is time independent. Hence we have, by using (14.13), 𝜕𝑡 𝑆(𝑓𝑡 𝜇|𝜔) = −

4 ˜ 𝑡 d𝜔 + ∑ ∫ 𝑏𝑗 𝜕𝑗 𝑔𝑡 d𝜔. ∑ ∫(𝜕𝑗 √𝑔𝑡 )2 d𝜔 + ∫ ℒ𝑔 𝛽𝑁 𝑗 𝑗

158

14. UNIVERSALITY OF THE DYSON BROWNIAN MOTION

˜ Since 𝜔 is ℒ-invariant and time independent, the middle term on the right-hand side vanishes. From the Schwarz inequality and 𝜕𝑔 = 2√𝑔 𝜕 √𝑔, we have (14.27)

𝜕𝑡 𝑆(𝑓𝑡 𝜇|𝜔) ≤ −2𝐷𝜔 (√𝑔𝑡 ) + 𝐶𝑁 ∑ ∫ 𝑏𝑗2 𝑔𝑡 d𝜔 𝑗 2

≤ −2𝐷𝜔 (√𝑔𝑡 ) + 𝐶𝑁 𝑄𝜏 −2 . Notice that (14.27) is reminiscent to (13.32) for the derivative of the entropy of the measure 𝑔𝑡 𝜔 = 𝑓𝑡 𝜇 with respect to 𝜔. The difference is, however, that 𝑔𝑡 ˜ The last term in does not satisfy the evolution equation with the generator ℒ. (14.27) expresses the error. Together with the logarithmic Sobolev inequality (14.17), we have (14.28) 𝜕𝑡 𝑆(𝑓𝑡 𝜇|𝜔) ≤ −2𝐷𝜔 (√𝑔𝑡 ) + 𝐶𝑁 2 𝑄𝜏 −2 ≤ −𝐶𝜏 −1 𝑆(𝑓𝑡 𝜇|𝜔) + 𝐶𝑁 2 𝑄𝜏 −2 . Integrating the last inequality from 𝜏 to 𝑡 and using the assumption (14.25) and 𝑡 ≥ 𝜏𝑁 𝜀 , we have proved the first inequality of (14.26). Using this result and integrating (14.27), we have 𝑡

∫ 𝐷𝜔 (√𝑔𝑠 )d𝑠 ≤ 𝐶𝑁 2 𝑄𝜏 −1 .

(14.29)

𝜏

Notice that 𝐷𝜇 (√𝑓𝑠 ) = (14.30)



|∇(𝑔𝑠 𝜓)|2 d𝜔 1 ∫ 𝑔𝑠 𝜓 𝜓 𝛽𝑁 |∇𝑔𝑠 |2 𝐶 ∫[ + |∇ log 𝜓|2 𝑔𝑠 ]d𝜔 𝑔𝑠 𝛽𝑁

= 𝐶𝐷𝜔 (√𝑔𝑠 ) + 𝐶𝑁 2 𝑄𝜏 −2 by a Schwarz inequality. Thus from (14.29), after restricting the integration and using 𝑡 ≥ 2𝜏, we get 𝑡 2

𝐶𝑁 𝑄𝜏 (14.31)

−1

≥ ∫ 𝐷𝜔 (√𝑔𝑠 )d𝑠 𝑡−𝜏 𝑡

≥∫ [ 𝑡−𝜏

1 𝐷 (√𝑓𝑠 ) − 𝐶𝑁 2 𝑄𝜏 −2 ]d𝑠 𝐶 𝜇

𝜏 ≥ 𝐷𝜇 (√𝑓𝑡 ) − 𝐶𝑁 2 𝑄𝜏 −1 , 𝐶 where, in the last step, we used that 𝐷𝜇 (√𝑓𝑡 ) is decreasing in 𝑡, which follows from the convexity of the Hamiltonian of 𝜇 (see, e.g., (13.33)). Using the opposite inequality 𝐷𝜔 (√𝑔𝑡 ) ≤ 𝐶𝐷𝜇 (√𝑓𝑡 ) + 𝐶𝑁 2 𝑄𝜏 −2 , which can be proven similarly to (14.30), we obtain 𝐷𝜔 (√𝑔𝑡 ) ≤ 𝐶𝑁 2 𝑄𝜏 −2 , i.e., the second inequality of (14.26).



14.2. PROOF OF THEOREM 14.1

159

Finally, we complete the proof of Theorem 14.1. For any given 𝑡 > 0 we now choose 𝜏 ∶= 𝑡𝑁 −𝜀 and construct the local relaxation measure 𝜔 with this 𝜏 as in (14.12). Set 𝜓 = 𝜔/𝜇 and let 𝑞 ∶= 𝑔𝑡 = 𝑓𝑡 /𝜓 be the density 𝑞 in Theorem 14.3. We would like to apply Theorem 14.4, and for this purpose we need to verify the assumption (14.25). By the definitions of 𝜔, 𝜇, and 𝜓 = 𝜔/𝜇, we have (14.32) 𝑆(𝑓𝜏 𝜇|𝜔) = ∫ 𝑓𝜏 log 𝑓𝜏 d𝜇 − ∫ 𝑓𝜏 log 𝜓 d𝜇 = 𝑆𝜇 (𝑓𝜏 ) − ∫ 𝑓𝜏 log 𝜓 d𝜇. We can bound the last term by (14.33)

− ∫ 𝑓𝜏 log 𝜓 d𝜇 ≤ 𝐶𝑁 ∫ 𝑓𝜏 𝑊 d𝜇 + log

˜ 𝑍 ≤ 𝐶𝑁 2 𝑄𝜏 −1 ≤ 𝐶𝑁 𝑚 𝑍

˜ ≥ ℋ. To verify the ˜ ≤ 𝑍 since ℋ for some 𝑚, where we have used that 𝑍 assumption (14.25), it remains to prove that 𝑆𝜇 (𝑓𝜏 ) ≤ 𝐶𝑁 𝑚 for some 𝑚. This inequality follows from the following lemma. Lemma 14.6. Let 𝛽 = 1, 2. Suppose the initial data 𝑓0 of the DBM is given by the eigenvalue distribution of a Wigner matrix. Then for any 𝜏 > 0 we have (14.34)

𝑆𝜇 (𝑓𝜏 ) ≤ 𝐶𝑁 2 (1 − log(1 − e−𝜏 )).

Proof. For simplicity, we consider only the case 𝛽 = 1, i.e., the real Wigner matrices. Recall that the probability measure 𝑓𝜏 𝜇 is the same as the eigenvalue distribution of the Gaussian divisible matrix (12.21): (14.35)

𝐻𝜏 = 𝑒−𝜏/2 𝐻0 + (1 − 𝑒−𝜏 )1/2 𝐻 G

where 𝐻0 is the initial Wigner matrix and 𝐻 G is an independent standard GOE matrix. Since the entropy is monotonic w.r.t. taking a marginal, (13.21), we have 𝑆𝜇 (𝑓𝜏 ) = 𝑆(𝑓𝜏 𝜇|𝜇) ≤ 𝑆(𝜇𝐻𝜏 |𝜇𝐻G ) where 𝜇𝐻𝜏 and 𝜇𝐻G are the laws of the matrix 𝐻𝜏 and 𝐻 G , respectively. Since the laws of both 𝜇𝐻𝜏 and 𝜇𝐻G are given by the product of the laws of the matrix elements, from the additivity of entropy (13.11), 𝑆(𝜇𝐻𝜏 |𝜇𝐻G ) is equal to the sum of the relative entropies of the matrix elements. Recall that the variances of offdiagonal and diagonal entries for GOE differ by a factor of 2. For simplicity of notation, we consider only the off-diagonal terms. Let 𝛾 = 1 − e−𝜏 and denote by 𝑔𝛼 the standard Gaussian distribution with variance 𝛼, i.e., 𝑔𝛼 (𝑥) ∶=

1 √2𝜋𝛼

exp(−

𝑥2 ). 2𝛼

Let 𝜚𝛾 be the probability density of (1 − 𝛾)1/2 𝜁 = 𝑒−𝜏/2 𝜁 where 𝜁 is the random variable for an off-diagonal matrix element. By definition, the probability density of the matrix element of 𝐻𝜏 is given by 𝜁𝜏 = 𝜚𝛾 ∗ 𝑔2𝛾/𝑁 . Therefore Jensen’s

160

14. UNIVERSALITY OF THE DYSON BROWNIAN MOTION

inequality yields

(14.36)

| 𝑆(𝜁𝜏 |𝑔2/𝑁 ) = 𝑆(∫ d𝑦 𝜚𝛾 (𝑦) 𝑔2𝛾/𝑁 (⋅ − 𝑦) || 𝑔2/𝑁 ) ≤ ∫ d𝑦 𝜚𝛾 (𝑦)𝑆(𝑔2𝛾/𝑁 (⋅ − 𝑦)|𝑔2/𝑁 ).

By explicit computation one finds 𝑆(𝑔𝜍2 (⋅ − 𝑦)|𝑔𝑠2 ) = log

𝑦2 𝑠 𝜎2 1 + 2+ 2− . 𝜎 2𝑠 2 2𝑠

In our case, we have 1 𝑁 𝑆(𝑔2𝛾/𝑁 (⋅ − 𝑦)|𝑔2/𝑁 ) = ( 𝑦 2 − log 𝛾 + 𝛾 − 1). 2 2 We can now continue the estimate (14.36). Using ∫ 𝑦 2 𝜚𝛾 (𝑦)d𝑦 = 2/𝑁, we obtain 𝑆(𝜁𝜏 |𝑔2/𝑁 ) ≤ 𝐶 − log 𝛾, □

and the claim follows.

We now return to the proof of Theorem 14.1. We can apply Lemma 14.6 1 with the choice 𝜏 = 𝑁 −1+2𝜉 , where 𝜉 ∈ (0, ) is from the assumption (14.4). 2 Together with (14.32)–(14.33), this implies that (14.25) holds. Thus, Theorem 14.4 and Theorem 14.3 imply for any 𝑡 ∈ [𝜏𝑁 𝜀 , 𝑁] that 1/2

𝐷𝜔 (√𝑞) | 1 | |∫ ∑ 𝒢m,𝑖 (𝐱)(𝑓𝑡 d𝜇 − d𝜔)| ≤ 𝐶(𝑡 ) |𝐽| | 𝑁 𝑖∈𝐽 | (14.37)

≤ 𝐶(𝑡

𝑁2𝑄 ) |𝐽|𝜏 2

≤ 𝐶𝑁 𝜀

+ 𝐶 √𝑆𝜔 (𝑞) 𝑒−𝑐𝑁

1/2

+ 𝐶𝑒−𝑐𝑁

𝜀

𝜀

𝑁2𝑄 𝜀 + 𝐶𝑒−𝑐𝑁 ; √ |𝐽|𝑡

i.e., the local statistics of 𝑓𝑡 𝜇 and 𝜔 are the same for any initial data 𝑓𝜏 . Applying the same argument to the Gaussian initial data, 𝑓0 = 𝑓𝜏 = 1, we can also compare 𝜇 and 𝜔. We have thus proved the estimate (14.7). Finally, if 𝑡 ≥ 𝑁 −1+2𝜉+𝛿+2𝜀 , then the assumption (14.4) guarantees that 𝑁2𝑄 ≤ √ |𝐽|𝑡 which proves Theorem 14.1.

1 √

|𝐽|𝑁 𝛿−1

,

14.3. RESTRICTION TO THE SIMPLEX VIA REGULARIZATION

161

14.3. Restriction to the Simplex via Regularization In the previous section we tacitly assumed that the Bakry-Émery analysis from Section 13.3 extends to Σ. Strictly speaking, this is not fully rigorous. There are two options to handle this technical point. As we remarked after Theorem 13.14, there is a direct method via cutoff functions to make the Bakry-Émery argument rigorous on Σ if 𝛽 ≥ 1 (see appendix B of [64]). The same technique can also make every step in Section 14.2 rigorous on Σ. We present here a more transparent argument that relies on the regularized measure 𝜔𝛿 on ℝ𝑁 already used in the proof of the LSI in Theorem 13.14. This path is technically simpler since it performs the core of the analysis on ℝ𝑁 , and it uses an extra input, the strong solution to the DBM, Theorem 12.2, to remove the regularization at the end. We now explain the correct procedure of the proof of Theorem 14.1 using this regularization. The goal is to prove (14.37). For any small 𝛿 > 0 we introduce the regularized versions 𝜇𝛿 and 𝜔𝛿 of the measures 𝜇 and 𝜔 as in Section 13.7. Let ℒ𝜔𝛿 be the generator of the regularized measure 𝜔𝛿 on ℝ𝑁 , and let 𝑓𝑡,𝛿 be the solution to 𝜕𝑡 𝑓𝑡,𝛿 = ℒ𝜔𝛿 𝑓𝑡,𝛿 on ℝ𝑁 with the same initial condition as for 𝛿 = 0; i.e., 𝑓0,𝛿 is the extension of 𝑓0 to the entire ℝ𝑁 with 𝑓0,𝛿 (𝐱) = 0 for 𝐱 ∉ Σ𝑁 . The three key theorems, Theorems 14.2–14.4, of Section 14.2 were formulated and proven for the regularized setup on ℝ𝑁 , so they can be applied to the regularized dynamics. Using the notation employed in these theorems, the conclusion is that if 𝑆(𝑓𝜏,𝛿 𝜇𝛿 |𝜔𝛿 ) ≤ 𝐶𝑁 𝑚

(14.38) (see (14.25)), then

| 1 | (14.39) |∫ ∑ 𝒢m,𝑖 (𝐱)(𝑓𝑡,𝛿 d𝜇𝛿 − d𝜔𝛿 )| ≤ | 𝑁 𝑖∈𝐽 | 𝐶𝑁 𝜀

𝑁 2 𝑄𝛿 𝜀 + 𝐶𝑒−𝑐𝑁 , |𝐽|𝑡 √

𝑡 ∈ [𝜏𝑁 𝜀 , 𝑁],

where 𝑄𝛿 is defined exactly as in (14.4) with the regularized measures, i.e., 𝑁

(14.40)

1 ∫ ∑ (𝑥𝑗 − 𝛾𝑗 )2 𝑓𝑡,𝛿 (𝐱)𝜇𝛿 (d𝐱). 𝑁 0≤𝑡≤𝑁 𝑗=1

𝑄𝛿 ∶= sup

Now we let 𝛿 → 0. Since 𝜔𝛿 converges weakly to 𝜔 in the sense that 𝜔𝛿 (𝐴) → 𝜔(𝐴 ∩ Σ𝑁 ) for any subset 𝐴 ∈ ℝ𝑁 , we have ∫ 𝑂(𝐱)d𝜔𝛿 → ∫ 𝑂(𝐱)d𝜔 for any bounded observable 𝑂. Setting 1 (14.41) 𝑂(𝐱) = ∑ 𝒢m,𝑖 (𝐱), 𝑁 𝑖∈𝐽 we see that the second term within the integral in the left-hand side of (14.39) converges to the corresponding term in (14.37). On the right-hand side a similar

162

14. UNIVERSALITY OF THE DYSON BROWNIAN MOTION

argument shows that 𝑄𝛿 → 𝑄 as 𝛿 → 0. Strictly speaking, the observable (𝑥𝑗 − 𝛾𝑗 )2 is unbounded, but a very simple cutoff argument shows that ℙ𝑓𝑡,𝛿 𝜇𝛿 (max |𝑥𝑗 | ≥ 𝑁 2 ) ≤ 𝑒−𝑐𝑁 𝑗

uniformly in 𝛿 and 𝑡 ∈ [0, 𝑁]. For the first term in the left side of (14.39), we use the stochastic representation of the solution to 𝜕𝑡 𝑓𝑡,𝛿 = ℒ𝜔𝛿 𝑓𝑡,𝛿 to note that 𝐱

∫ 𝑂(𝐱)𝑓𝑡,𝛿 (𝐱)d𝜔𝛿 = 𝔼𝑓0 𝜔 𝔼𝛿0 𝑂(𝐱(𝑡)),

(14.42)

ℝ𝑁

𝐱

where 𝔼𝛿0 denotes the expectation with respect to the law of the regularized DBM (𝐱(𝑡))𝑡 (see (13.82)) starting from 𝐱0 , and 𝔼𝑓0 𝜔 denotes the expectation of 𝐱0 with respect to the measure 𝑓0 𝜔. Now we use the existence of the strong solution to the DBM (13.74) for any 𝛽 ≥ 1, that can be obtained exactly as in Theorem 12.2 for the 𝑈𝑗 ≡ 0 case. Since 𝐱(𝑡) is continuous and remains in the open set Σ𝑁 , the probability that up to a fixed time 𝑡 it remains away from a 𝛿-neighborhood of the boundary of Σ𝑁 goes to 1 as 𝛿 → 0, i.e., lim ℙ𝜔𝛿 (dist(𝐱(𝑠), 𝜕Σ𝑁 ) ≥ 𝛿 ∶ 𝑠 ∈ [0, 𝑡]) = 1.

𝛿→0

Notice that (13.74) and (13.82) are exactly the same for paths that stay away from the boundary of Σ𝑁 . This means that the right-hand side of (14.42) converges to 𝔼𝑓0 𝜔 𝔼𝐱0 𝑂(𝐱(𝑡)), where 𝔼𝐱0 denotes expectation with respect to the law of (13.74). This proves that (14.43)

lim ∫ 𝑂(𝐱)𝑓𝑡,𝛿 (𝐱)d𝜔𝛿 = ∫ 𝑂(𝐱)𝑓𝑡 (𝐱)d𝜔,

𝛿→0 ℝ𝑁

Σ𝑁

which completes the proof of (14.37). Finally, we need to verify the condition (14.38). Following (14.32)–(14.33), we only need to show that 𝑆𝜇𝛿 (𝑓𝜏,𝛿 ) ≤ 𝐶𝑁 𝑚 for sufficiently small 𝛿 > 0. We first run the DBM for a very short time and replace the original matrix ensemble 𝐻0 by 𝐻𝜍 , its tiny Gaussian convolution of variance 𝜎 = 𝑁 −10 (see (14.35)). This is not a restriction, since Theorem 14.1 concerns 𝑓𝑡 for much larger times, 𝑡 ≫ 𝑁 −1 . The corresponding eigenvalue density 𝑓𝜍 supported on Σ𝑁 satisfies 𝑆𝜇 (𝑓𝜍 ) ≤ 𝐶𝑁 12 by (14.34). As a second cutoff, for some large 𝐾 we set 𝑓ˆ𝜍,𝐾 ∶= 𝐶𝐾 max{𝑓𝜍 , 𝐾} with a normalization constant 𝐶𝐾 such that ∫ 𝑓ˆ𝜍,𝐾 d𝜇 = 1 and 𝐶𝐾 → 1 as 𝐾 → ∞. We will actually apply Theorem 14.1 to 𝑓ˆ𝜍,𝐾 as an initial condition. The regularized flow also starts with this initial condition; i.e., we define 𝑓𝑡,𝛿 to be the solution of 𝜕𝑡 𝑓𝑡,𝛿 = ℒ𝜇𝛿 𝑓𝑡,𝛿 with 𝑓0,𝛿 ∶= 𝑓ˆ𝜍,𝐾 . Since the

14.4. DIRICHLET FORM INEQUALITY FOR ANY 𝜷 > 𝟎

163

entropy decays along the regularized flow, we immediately have 𝑆𝜇𝛿 (𝑓𝜏,𝛿 ) ≤ 𝑆𝜇𝛿 (𝑓0,𝛿 ) = 𝑆𝜇𝛿 (𝑓ˆ𝜍,𝐾 ). Since 𝑓ˆ𝜍,𝐾 is supported on Σ𝑁 and is bounded, clearly 𝑆𝜇𝛿 (𝑓ˆ𝜍,𝐾 ) converges to 𝑆𝜇 (𝑓ˆ𝜍,𝐾 ) as 𝛿 → 0. We then have 𝑆𝜇 (𝑓ˆ𝜍,𝐾 ) = 𝐶𝐾 ∫ max{𝑓𝜍 , 𝐾}[log max{𝑓𝜍 , 𝐾} + log 𝐶𝐾 ]d𝜇 → 𝑆𝜇 (𝑓𝜍 ) as 𝐾 → ∞ by monotone convergence. Thus, first choosing 𝐾 sufficiently large so that 𝑆𝜇 (𝑓ˆ𝜍,𝐾 ) ≤ 2𝑆𝜇 (𝑓𝜍 ) ≤ 𝐶𝑁 12 , then choosing 𝛿 sufficiently small, we achieved 𝑆𝜇𝛿 (𝑓𝜏,𝛿 ) ≤ 𝐶𝑁 12 . This verifies the condition (14.38) for some sufficiently small 𝛿, so that (14.39) holds. Letting 𝛿 → 0, we obtain (14.7), which completes the fully rigorous proof of Theorem 14.1. 14.4. Dirichlet Form Inequality for Any 𝜷 > 𝟎 Here we extend the proof of the key Dirichlet form inequality, Theorem 14.3, from 𝛽 ≥ 1 to any 𝛽 > 0. This extension will not be needed for this book, so the reader may skip this section, but we remark that the Dirichlet form inequality for 𝛽 > 0 played a crucial role in the proof of the universality for invariant ensembles for any 𝛽 > 0 (see [25]). Lemma 14.7. Let 𝛽 > 0 and 𝑞 be a probability density with respect to 𝜔 defined in (14.12) with ∇√𝑞 ∈ 𝐿∞ (d𝜔) and 𝑞 ∈ 𝐿∞ (d𝜔). Then, for any 𝐽 ⊂ {1, … , 𝑁 − 𝑚𝑛 − 1} and any 𝑡 > 0 we have | | 1 1 ∑ 𝒢𝑖,𝐦 𝑞 d𝜔 − ∫ ∑ 𝒢𝑖,𝐦 d𝜔| ≤ (14.44) |∫ |𝐽| 𝑖∈𝐽 | | |𝐽| 𝑖∈𝐽 𝐶

𝐷𝜔 (√𝑞) 𝑡 + 𝐶 √𝑆𝜔 (𝑞) e−𝑐𝑡/𝜏 . |𝐽| √

For 𝛽 ≥ 1, the conditions ∇√𝑞 ∈ 𝐿∞ and 𝑞 ∈ 𝐿∞ (d𝜔) can be removed. We emphasize that this lemma holds for any 𝛽 > 0; i.e., it does not (directly) rely on the existence of the DBM. The parameter 𝑡 is not connected with the time parameter of a dynamics on Σ𝑁 (although it emerges as a time cutoff in a regularized dynamics on ℝ𝑁 within the proof). Proof of Lemma 14.7. First we record the result if 𝜔 on Σ𝑁 is replaced with the regularized measure 𝜔𝛿 on ℝ𝑁 , as defined in (13.81) with the choice ˆ(𝐱) = ∑ 𝑈𝑗 (𝑥𝑗 ), where 𝑈𝑗 is given in (14.10). In this case the proof of of ℋ 𝑗 Theorem 14.3 applies to the letter even for 𝛽 > 0, since now we work on ℝ𝑁 and complications arising from the boundary are absent. We immediately obtain for any probability density ˆ 𝑞 on ℝ𝑁 with ∫ ˆ 𝑞 d𝜔𝛿 = 1, for any 𝐽 ⊂ {1, 𝑁 − 𝑚𝑛 − 1}

164

14. UNIVERSALITY OF THE DYSON BROWNIAN MOTION

and any 𝑡 > 0 that 1 1 | | ∑ 𝒢𝑖,𝐦 𝑞 ̂ d𝜔𝛿 − ∫ ∑ 𝒢𝑖,𝐦 d𝜔𝛿 || ≤ (14.45) ||∫ |𝐽| |𝐽| 𝑖∈𝐽

𝑖∈𝐽

𝐶

√ √ 𝐷𝜔𝛿 (√𝑞)̂ 𝑡 √

|𝐽|

+ 𝐶 √𝑆𝜔𝛿 (𝑞)̂ e−𝑐𝑡/𝜏 .

Suppose now that √𝑞 ∈ 𝐻 1 (d𝜔) and 𝑞 is a bounded probability density in Σ𝑁 with respect to 𝜔. Similarly to the proof of (13.76) for the general 𝛽 > 0 case, ˜ by symmetrization. Let ˜ we may extend 𝑞 to Σ 𝑞 denote this extension, which 1/2 ˜ is also bounded since 𝑞 has these properties. Then there is bounded, and ∇ 𝑓 is a constant 𝐶𝛿 such that 𝑞𝛿 ∶= 𝐶𝛿 ˜ 𝑞 is a probability density with respect to 𝜔𝛿 . Since ˜ 𝑞 is bounded, we have ∫ℝ𝑁 ˜ 𝑞 d𝜔𝛿 → ∫Σ𝑁 𝑞 d𝜔 by dominated convergence, and thus 𝐶𝛿 → 1 as 𝛿 → 0. Now we apply (14.45) 𝑞ˆ = 𝑞𝛿 . Taking the limit 𝛿 → 0, the left-hand side converges to that of (14.44) since 𝒢𝑖,m is a bounded smooth function and 𝑞𝛿 d𝜔𝛿 = 𝐶𝛿 ˜ 𝑞 d𝜔𝛿 converges weakly to 𝑞(𝜔)1(𝜔 ∈ Σ𝑁 )d𝜔 by dominated convergence. We thus have | | 1 1 ∑ 𝒢𝑖,𝐦 𝑞 d𝜔 − ∫ ∑ 𝒢𝑖,𝐦 d𝜔| ≤ (14.46) |∫ |𝐽| 𝑖∈𝐽 | |𝐽| 𝑖∈𝐽 | 𝐶 lim sup 𝛿→0

√ √ 𝐶𝛿 𝐷𝜔𝛿 (√ ˜ 𝑞)𝑡 √

|𝐽|

+ 𝐶 lim sup √𝑆𝜔𝛿 (𝐶𝛿 ˜ 𝑞 ) e−𝑐𝑡/𝜏 . 𝛿→0

1/2

1/2

˜ ) → ˜ By dominated convergence, using that ∇ 𝑓 ∈ 𝐿∞ , we have 𝐷𝜔𝛿 ( 𝑓 𝐷𝜔 (√𝑓). Similarly, the entropy term also converges to 𝑆𝜔 (𝑞) by ˜ 𝑞 ∈ 𝐿∞ . ∞ ∞ This proves (14.44) under the condition 𝑞 ∈ 𝐿 , ∇√𝑞 ∈ 𝐿 . Finally, these conditions can be removed for 𝛽 ≥ 1 by a simple approximation. We may assume 𝐷𝜔 (√𝑞) < ∞; otherwise, (14.44) is an empty statement. By the LSI (13.76), we also know that 𝑆𝜔 (𝑞) < ∞. First, we still keep the assumption that 𝑞 ∈ 𝐿∞ . Since 𝐶0∞ (Σ) is dense in 𝐻 1 (d𝜔) (see Section 12.4), we can find a sequence of densities 𝑞𝑛 ∈ 𝐿∞ (Σ) such that ∇√𝑞𝑛 ∈ 𝐿2 (d𝜔) and √𝑞𝑛 → √𝑞 in 𝐿2 (d𝜔) and ∇√𝑞𝑛 → ∇√𝑞 in 𝐿2 (d𝜔). In fact, the construction in Section 12.4 guarantees that, apart from an irrelevant smoothing, 𝑞𝑛 may be chosen of the form 𝑞𝑛 = 𝐶𝑛 𝜙𝑛 𝑞 where 𝜙𝑛 is a cutoff function with 0 ≤ 𝜙𝑛 ≤ 1, converging to 1 pointwise and 𝐶𝑛 → 1. We know that (14.44) holds for every 𝑞𝑛 . Taking the limit 𝑛 → ∞, the left-hand side converges since | | |∫ 𝑂(𝑞𝑛 − 𝑞)d𝜔| ≤ ‖𝑂‖∞ ‖√𝑞𝑛 − √𝑞‖𝐿2 (d𝜔) ‖√𝑞𝑛 + √𝑞‖𝐿2 (d𝜔) → 0, | | where we have used that ‖√𝑞𝑛 + √𝑞‖2 ≤ ‖√𝑞𝑛 ‖2 + ‖√𝑞‖2 and ‖√𝑞‖2 = ∫ 𝑞 d𝜔 = 1. Here, 𝑂 is given by (14.41). For the right-hand side of (14.44)

14.5. FROM GAP DISTRIBUTION TO CORRELATION FUNCTIONS

165

applied to the approximate function 𝑞𝑛 , we note that 𝐷𝜔 (√𝑞𝑛 ) → 𝐷𝜔 (√𝑞) directly by the choice of 𝑞𝑛 . Finally, we need to show that 𝑆𝜔 (𝑞𝑛 ) → 𝑆𝜔 (𝑞) as 𝑛 → ∞. Clearly, 𝑆𝜔 (𝑞𝑛 ) − 𝑆𝜔 (𝑞) = ∫[𝑞𝑛 log 𝑞𝑛 − 𝑞 log 𝑞]d𝜔 Σ

= ∫[𝐶𝑛 𝜙𝑛 𝑞 log(𝐶𝑛 𝜙𝑛 ) + (𝐶𝑛 𝜙𝑛 − 1)𝑞 log 𝑞]d𝜔, Σ

and both integrals go to 0 by dominated convergence and by 𝑆𝜔 (𝑞) < ∞. This proves (14.44) for any 𝑞 ∈ 𝐿∞ . Finally, this last condition can be removed by approximating any density 𝑞 with 𝑞𝑀 ∶= 𝐶𝑀 max{𝑞, 𝑀}, where 𝐶𝑀 is the normalization and 𝐶𝑀 → 1 as 𝑀 → ∞. Similarly to the proof in (13.84), one can show that 𝑆𝜔 (𝑞𝑀 ) → 𝑆𝜔 (𝑞) and 𝐷𝜔 (𝑞𝑀 ) → 𝐷𝜔 (𝑞). Thus we can pass to the limit in the inequality (14.44) written up for 𝑞𝑀 . This proves Lemma 14.7. Notice that the proof did not use the existence of the DBM; instead, it used the existence of the regularized DBM. □ 14.5. From Gap Distribution to Correlation Functions: Proof of Theorem 12.4 Theorem 12.4 follows from Theorem 14.1 if we can show that the correlation function difference in (12.23) can be bounded in terms of the difference of the expectation of gap observables in (14.8). We state it as the following lemma, which clearly proves Theorem 12.4 with Theorem 14.1 as an input. Lemma 14.8. Let 𝑓 be a probability density with respect to the Gaussian equilibrium measure (12.13). Suppose that the estimate (12.22) holds with some exponent 𝜉 > 0 and that (14.47)

| | 1 |∫ ∑ 𝒢𝑖,𝐦 (𝐱)(𝑓 d𝜇 − d𝜇)| ≤ 𝐶 | |𝐽| 𝑖∈𝐽 |

1 𝛿−1 √|𝐽|𝑁

also holds for some 𝛿 > 0 and for any 𝐽 ⊂ {1, … , 𝑁 − 𝑚𝑛 }, where 𝒢𝑖,𝐦 is defined in (14.6). Then, for any 𝜀 > 0 and 𝑁 large enough, we have for any 𝑏 = 𝑏𝑁 with 𝑁 −1 ≪ 𝑏 ≪ 1 that | 𝐸+𝑏 d𝐸 ′ | 𝜶 (𝑛) (𝑛) ∫ d𝜶 𝑂(𝜶)(𝑝𝑓𝜇,𝑁 − 𝑝𝜇,𝑁 )(𝐸 ′ + (14.48) |∫ )| ≤ 𝑁𝜚sc (𝐸) | | 𝐸−𝑏 2𝑏 ℝ𝑛 𝑁 2𝜀 [

𝑁 −1+𝜉 𝑁 −𝛿 +√ ]. 𝑏 𝑏

This is a fairly standard technical argument, and the details will be given in Section 14.6. Here we just summarize the main points. To understand the difference between (14.47) and (14.48), consider 𝑛 = 1 for simplicity and let 𝑚1 = 1, say. The observable (14.7) answers the question, What is the empirical

166

14. UNIVERSALITY OF THE DYSON BROWNIAN MOTION

distribution of differences of consecutive eigenvalues?; in other words, (14.7) directly identifies the gap distribution. Correlation functions answer the question, What is the probability that there are two eigenvalues at a fixed distance away from each other, in other words, that they are not directly sensitive to the possible other eigenvalues in between? Of course, these two questions are closely related, and it is easy to deduce the answers from each other under some reasonable assumptions on the distribution of the eigenvalues. We now prove that the correlation functions can be derived from (generalized) gap distributions, which is the content of Lemma 14.8. 14.6. Details of the Proof of Lemma 14.8 By definition of the correlation function in (12.19), we have 𝐸+𝑏

∫ 𝐸−𝑏

d𝐸 ′ 𝜶 (𝑛) ∫ d𝜶𝑂(𝜶)𝑝𝑓𝜇,𝑁 (𝐸 ′ + ) 2𝑏 ℝ𝑛 𝑁 𝐸+𝑏

(14.49)

= 𝐶𝑁,𝑛 ∫ 𝐸−𝑏 𝐸+𝑏

= 𝐶𝑁,𝑛 ∫ 𝐸−𝑏

d𝐸 ′ ∫ ∑ 𝑂(𝑁(𝑥𝑖1 − 𝐸 ′ ), 𝑁(𝑥𝑖1 − 𝑥𝑖2 ), … , 𝑁(𝑥𝑖𝑛−1 − 𝑥𝑖𝑛 ))𝑓 d𝜇 2𝑏 𝑖 ≠⋯≠𝑖 1

𝑛

𝑁



d𝐸 ∫ ∑ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)𝑓 d𝜇 2𝑏 m∈𝑆 𝑖=1 𝑛

𝑛

with 𝐶𝑁,𝑛 ∶= 𝑁 (𝑁 − 𝑛)! /𝑁! = 1 + 𝑂𝑛 (𝑁 −1 ), where we let 𝑆𝑛 denote the set of increasing positive integers m = (𝑚2 , … , 𝑚𝑛 ) ∈ ℕ𝑛−1 + , 𝑚2 < ⋯ < 𝑚𝑛 , and we introduced (14.50)

𝑌𝑖,m (𝐸 ′ , 𝐱) ∶= 𝑂(𝑁(𝑥𝑖 − 𝐸 ′ ), 𝑁(𝑥𝑖 − 𝑥𝑖+𝑚2 ), … , 𝑁(𝑥𝑖 − 𝑥𝑖+𝑚𝑛 )). (𝑛)

We will set 𝑌𝑖,m = 0 if 𝑖 + 𝑚𝑛 > 𝑁. By permutational symmetry of 𝑝𝑓,𝑁 we can assume that 𝑂 is symmetric, and thus we restrict the summation to 𝑖1 < ⋯ < 𝑖𝑛 upon an overall factor 𝑛!. Then, we changed indices 𝑖 = 𝑖1 , 𝑖2 = 𝑖 + 𝑚2 , 𝑖3 = 𝑖+𝑚3 , … , and performed a resummation over all index differences encoded in m. Apart from the first variable 𝑁(𝑥𝑖1 − 𝐸 ′ ), the function 𝑌𝑖,m is of the form (14.6), so (14.47) will apply. The dependence on the first variable will be negligible after the d𝐸 ′ integration on a macroscopic interval. To control the error terms in this argument, especially to show that the error terms in the potentially infinite sum over m ∈ Sn converge, one needs an a priori bound on the local density. But this information is provided very precisely by the bulk rigidity estimate (12.22). The details for the rest of the argument will be presented in the next subsection. Since (14.49) also holds if we set 𝑓 = 1, to prove Lemma 14.8, we only need to estimate (14.51)

𝑁 | | 𝐸+𝑏 d𝐸 ′ ∫ ∑ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)(𝑓 − 1)d𝜇|. Θ ∶= |∫ | | 𝐸−𝑏 2𝑏 m∈𝑆𝑛 𝑖=1

Let 𝑀 be an 𝑁-dependent parameter chosen at the end of the proof. Let 𝑆𝑛 (𝑀) ∶= {m ∈ 𝑆𝑛 , 𝑚𝑛 ≤ 𝑀},

𝑆𝑛c (𝑀) ∶= 𝑆𝑛 ⧵ 𝑆𝑛 (𝑀),

14.6. DETAILS OF THE PROOF OF LEMMA 14.8

167 (1)

(2)

and note that |𝑆𝑛 (𝑀)| ≤ 𝑀 𝑛−1 . We have the simple bound Θ ≤ Θ𝑀 + Θ𝑀 + (3) Θ𝑀 where (1) Θ𝑀

(14.52)

𝑁 | 𝐸+𝑏 d𝐸 ′ | ∫ ∑ ∑ 𝑌𝑖,𝐦 (𝐸 ′ , 𝐱)(𝑓 − 1)d𝜇| ∶= |∫ | 𝐸−𝑏 2𝑏 | 𝐦∈𝑆 (𝑀) 𝑖=1 𝑛

and (2) Θ𝑀

(14.53)

𝑁 | 𝐸+𝑏 d𝐸 ′ | ∫ ∑ 𝑌𝑖,𝐦 (𝐸 ′ , 𝐱)𝑓 d𝜇|. ∶= ∑ |∫ 2𝑏 | | 𝑖=1 𝐦∈𝑆c (𝑀) 𝐸−𝑏 𝑛

(3)

(2)

We define Θ𝑀 to be the same as Θ𝑀 but with 𝑓 replaced by the constant 1, i.e., the equilibrium measure 𝜇. (1)

Step 1. Small 𝐦 case; estimate of Θ𝑀 . After performing the d𝐸 ′ integration, we will eventually apply Theorem 14.1 to the function 𝐺(𝑢1 , 𝑢2 , … ) ∶= ∫ 𝑂(𝑦, 𝑢1 , 𝑢2 , … )d𝑦, ℝ

i.e., to the quantity ∫ d𝐸 ′ 𝑌𝑖,𝐦 (𝐸 ′ , 𝐱) =

(14.54)



1 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+𝑚2 ), … ) 𝑁

for each fixed 𝑖 and m. ± For any 𝐸 and 0 < 𝜁 < 𝑏, define sets of integers 𝐽 = 𝐽𝐸,𝑏,𝜁 and 𝐽 ± = 𝐽𝐸,𝑏,𝜁 by 𝐽 ∶= {𝑖 ∶ 𝛾𝑖 ∈ [𝐸 − 𝑏, 𝐸 + 𝑏]},

𝐽 ± ∶= {𝑖 ∶ 𝛾𝑖 ∈ [𝐸 − (𝑏 ± 𝜁), 𝐸 + 𝑏 ± 𝜁]},

where 𝛾𝑖 was defined in (11.31). Clearly, 𝐽 − ⊂ 𝐽 ⊂ 𝐽 + . Using this notation, we have 𝐸+𝑏

(14.55)

∫ 𝐸−𝑏

𝐸+𝑏

𝑁 ′



d𝐸 ∑ 𝑌𝑖,𝐦 (𝐸 , 𝐱) = ∫ 𝑖=1

𝐸−𝑏

d𝐸 ′ ∑ 𝑌𝑖,𝐦 (𝐸 ′ , 𝐱) + Ω+ 𝐽,𝐦 (𝐱). 𝑖∈𝐽+

+ The error term Ω+ 𝐽,𝐦 , defined by (14.55) indirectly, comes from those 𝑖 ∉ 𝐽 −1 ′ indices, for which 𝑥𝑖 ∈ [𝐸 − 𝑏, 𝐸 + 𝑏] + 𝑂(𝑁 ) since 𝑌𝑖,𝐦 (𝐸 , 𝐱) = 0 unless |𝑥𝑖 − 𝐸 ′ | ≤ 𝐶/𝑁, the constant depending on the support of 𝑂. Thus, −1 |Ω+ 𝐽,𝐦 (𝐱)| ≤ 𝐶𝑁 #{𝑖 ∶ |𝑥𝑖 − 𝛾𝑖 | ≥ 𝜁/2, |𝑥𝑖 − 𝐸| ≤ 2𝑏}

for any sufficiently large 𝑁 assuming 𝜁 ≫ 1/𝑁 and using that 𝑂 is a bounded function. The additional 𝑁 −1 factor comes from the d𝐸 ′ integration. Due to the ′ rigidity estimate (12.22) and choosing 𝜁 = 𝑁 −1+𝜉+𝜀 with some 𝜀′ > 0, we get (14.56)

−𝐷 ∫ |Ω+ 𝐽,𝐦 (𝐱)|𝑓 d𝜇 ≤ 𝑁

168

14. UNIVERSALITY OF THE DYSON BROWNIAN MOTION

for any 𝐷. We can also estimate 𝐸+𝑏

d𝐸 ′ ∑ 𝑖, m(𝐸 ′ , 𝐱)



𝑖∈𝐽+

𝐸−𝑏

𝐸+𝑏

d𝐸 ′ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱) + 𝐶𝑁 −1 |𝐽 + ⧵ 𝐽 − |

≤∫

𝑖∈𝐽−

𝐸−𝑏

(14.57)

= ∫ d𝐸 ′ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱) + 𝐶𝑁 −1 |𝐽 + ⧵ 𝐽 − | + Ξ+ 𝐽,m (𝐱) 𝑖∈𝐽−



≤ ∫ d𝐸 ′ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱) + 𝐶𝑁 −1 |𝐽 + ⧵ 𝐽 − | ℝ

𝑖∈𝐽

+ 𝐶𝑁 −1 |𝐽 ⧵ 𝐽 − | + Ξ+ 𝐽,m (𝐱) − where the error term Ξ+ 𝐽,m , defined by (14.57), comes from indices 𝑖 ∈ 𝐽 such that 𝑥𝑖 ∉ [𝐸 −𝑏, 𝐸 +𝑏]+𝑂(1/𝑁). It satisfies the same bound (14.56) as Ω+ 𝐽,m . By the continuity of 𝜚, the density of the 𝛾𝑖 ’s is bounded by 𝐶𝑁, thus |𝐽 + ⧵𝐽 − | ≤ 𝐶𝑁𝜁 and |𝐽 ⧵ 𝐽 − | ≤ 𝐶𝑁𝜁. Therefore, summing up the formula (14.54) for 𝑖 ∈ 𝐽, we obtain from (14.55), (14.56), and (14.57) 𝐸+𝑏

∫ 𝐸−𝑏

𝑁

d𝐸 ′ ∫ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)𝑓 d𝜇 ≤ 𝑖=1



1 ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+𝑚2 ), … )𝑓 d𝜇 + 𝐶𝜁 + 𝐶𝑁 −𝐷 𝑁 𝑖∈𝐽

for each m ∈ 𝑆𝑛 . A similar lower bound can be obtained analogously, and we get 𝑁 | 𝐸+𝑏 ′ (14.58) | ∫ d𝐸 ∫ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)𝑓 d𝜇 | 𝐸−𝑏 𝑖=1

−∫

| 1 ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+𝑚2 ), … )𝑓 d𝜇| ≤ 𝐶𝜁 + 𝐶𝑁 −𝐷 𝑁 𝑖∈𝐽 |

for each m ∈ 𝑆𝑛 . The error term 𝑁 −𝐷 can be neglected. Adding up (14.58) for all m ∈ 𝑆𝑛 (𝑀), we get 𝑁 | 𝐸+𝑏 ′ (14.59) |∫ d𝐸 ∫ ∑ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)𝑓 d𝜇 | 𝐸−𝑏 m∈𝑆 (𝑀) 𝑖=1 𝑛

−∫

| 1 ∑ 𝐺(𝑁(𝑥𝑖 − 𝑥𝑖+𝑚2 ), … )𝑓 d𝜇| ≤ 𝐶𝑀 𝑛−1 𝜁. 𝑁 𝑖∈𝐽 | (𝑀)

∑ m∈𝑆𝑛

Clearly, the same estimate holds for the equilibrium, i.e., if we set 𝑓 = 1 in (14.59). Subtracting these two formulas and applying (14.47) to each summand

14.6. DETAILS OF THE PROOF OF LEMMA 14.8

169

on the second term in (14.58), we conclude that 𝑁 | | 𝐸+𝑏 d𝐸 ′ (1) ∫ ∑ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)(𝑓 d𝜇 − d𝜇)| Θ𝑀 = | ∫ | | 𝐸−𝑏 2𝑏 m∈𝑆 (𝑀) 𝑖=1

(14.60)

𝑛

≤ 𝐶𝑀

𝑛−1

−1

(𝑏 𝑁

−1+𝜉+𝜀′

+ 𝑏−1/2 𝑁 −𝛿/2 ), (1)

where we have used that |𝐽| ≤ 𝐶𝑁𝑏. This completes the estimate of Θ𝑀 . (2)

let

(3)

Step 2. Large m case; estimate of Θ𝑀 and Θ𝑀 . For a fixed 𝑦 ∈ ℝ, ℓ > 0, 𝑁

𝜒(𝑦, ℓ) ∶= ∑ 1(𝑥𝑖 ∈ [𝑦 − 𝑖=1

ℓ ℓ , 𝑦 + ]) 𝑁 𝑁

denote the number of points in the interval [𝑦 − ℓ/𝑁, 𝑦 + ℓ/𝑁]. Note that for a fixed m = (𝑚2 , … , 𝑚𝑛 ), we have 𝑁

∑ |𝑌𝑖,m (𝐸 ′ , 𝐱)| ≤ 𝐶 ⋅ 𝜒(𝐸 ′ , ℓ) ⋅ 1(𝜒(𝐸 ′ , ℓ) ≥ 𝑚𝑛 ) 𝑖=1

(14.61)

𝑁

≤ 𝐶 ∑ 𝑚 ⋅ 1(𝜒(𝐸 ′ , ℓ) ≥ 𝑚) 𝑚=𝑚𝑛

where ℓ denotes the maximum of |𝑢1 | + ⋯ + |𝑢𝑛 | in the support of 𝑂(𝑢1 , … , 𝑢𝑛 ). Since the summation over all increasing sequences m = (𝑚2 , … , 𝑚𝑛 ) ∈ ℕ𝑛−1 with a fixed 𝑚𝑛 contains at most 𝑚𝑛𝑛−2 terms, we have + (14.62)

𝑁 | | 𝐸+𝑏 ′ |∫ d𝐸 ∫ ∑ 𝑌𝑖,m (𝐸 ′ , 𝐱)𝑓 d𝜇| ≤ | | 𝑖=1 m∈𝑆c (𝑀) 𝐸−𝑏

∑ 𝑛

𝐸+𝑏

𝐶∫

𝑁

d𝐸 ′ ∑ 𝑚𝑛−1 ∫ 1(𝜒(𝐸 ′ , ℓ) ≥ 𝑚)𝑓 d𝜇.

𝐸−𝑏

𝑚=𝑀

The rigidity bound (12.22) clearly implies ∫ 1(𝜒(𝐸 ′ , ℓ) ≥ 𝑚)𝑓 d𝜇 ≤ 𝐶𝐷,𝜀′ 𝑁 −𝐷 ′

for any 𝐷 > 0 and 𝜀′ > 0, as long as 𝑚 ≥ ℓ𝑁 𝜀 . Therefore, choosing 𝐷 = 2𝑛 + 2, we see from (14.62) that Θ(2) is negligible, e.g., (2)

Θ𝑀 ≤ 𝐶𝑁 −1



if 𝑀 ≥ ℓ𝑁 𝜀 .

Since all rigidity bounds hold for the equilibrium measure, the last bound is valid ′ (3) for Θ𝑀 as well. We will choose 𝑀 = ℓ𝑁 𝜀 and recall that ℓ is independent of 𝑁. Combining these bounds with (14.60), we obtain ′



Θ ≤ 𝐶𝑁 𝜀 (𝑛+1) (𝑏−1 𝑁 −1+𝜉+𝜀 + 𝑏−1/2 𝑁 −𝛿/2 ) + 𝐶𝑁 −1 . Choosing 𝜀′ = 𝜀/(𝑛 + 1), we conclude the proof of Lemma 14.8.

CHAPTER 15

Continuity of Local Correlation Functions Under the Matrix OU Process We have completed the first two steps of the three-step strategy introduced in Chapter 5, i.e., the local semicircle law and the universality of Gaussian divisible ensembles (Theorem 12.4). In this section, we will complete this strategy by proving a continuity result for the local correlation functions of the matrix OU process in the following Theorem 15.2 and Lemma 15.3. This is Step 3a defined in Chapter 5. From these results, we obtain a weaker version of Theorem 5.1; namely, we get averaged energy universality of Wigner matrices but only on scale 𝑏 ≥ 𝑁 −1/2+𝜀 . In Section 16.1, we will use the idea of “approximation by a Gaussian divisible ensemble” and prove Theorem 5.1 down to any scale 𝑏 ≥ 𝑁 −1+𝜀 . Theorem 15.1 ([68, theorem 2.2]). Let 𝐻 be an 𝑁 × 𝑁 real symmetric or complex Hermitian Wigner matrix. In the Hermitian case we assume that the real and imaginary parts are i.i.d. Suppose that the distribution 𝜈 of the rescaled matrix elements √𝑁ℎ𝑖𝑗 satisfies the decay condition (5.6). Fix a small 𝜀 > 0, an integer 𝑛 ≥ 1 and let 𝑂 ∶ ℝ𝑛 → ℝ be a continuous, compactly supported function. Then, for any |𝐸| < 2 and 𝑏 ∈ [𝑁 −1/2+𝜀 , 𝑁 −𝜀 ], we have 𝐸+𝑏

(15.1)

𝜶 1 (𝑛) (𝑛) ∫ lim d𝐸 ′ ∫ d𝜶 𝑂(𝜶)(𝑝𝐻,𝑁 − 𝑝𝐺,𝑁 )(𝐸 ′ + ) = 0. 𝑁 𝑁→∞ 2𝑏 𝐸−𝑏 ℝ𝑛

To prove this theorem, we first recall the matrix OU process (12.1) defined by (15.2)

d𝐻𝑡 =

1

1 dB𝑡 − 𝐻𝑡 d𝑡 2 √𝑁

with the initial data 𝐻0 . The eigenvalue evolution of this process is the DBM and recall that we denote the eigenvalue distribution at the time 𝑡 by 𝑓𝑡 d𝜇 with 𝑓𝑡 satisfying (12.17). In this section, we assume that the initial data 𝐻0 is an 𝑁 × 𝑁 Wigner matrix and the distribution of matrix element satisfies the uniform polynomial decay condition (5.6). We have the following Green function continuity theorem for the matrix OU process. Theorem 15.2 (Continuity of Green function). Suppose that the initial data 𝐻0 is an 𝑁 × 𝑁 Wigner matrix with the distribution of matrix elements satisfying the uniform polynomial decay condition (5.6). Let 𝜅 > 0 be arbitrary and suppose 171

172

15. CONTINUITY OF LOCAL CORRELATION FUNCTIONS

that for some small parameter 𝜎 > 0 and for any 𝑦 ≥ 𝑁 −1+𝜍 we have the following estimate on the diagonal elements of the resolvent for any 0 ≤ 𝑡 ≤ 1: (15.3)

1 | | max max ||( ) | ≺ 𝑁 2𝜍 1≤𝑘≤𝑁 |𝐸|≤2−𝜅 𝐻𝑡 − 𝐸 − 𝑖𝑦 𝑘𝑘 |

with some constants 𝐶, 𝑐 depending only on 𝜎, 𝜅. Let 𝐺(𝑡, 𝑧) = (𝐻𝑡 − 𝑧)−1 denote the resolvent and 𝑚(𝑡, 𝑧) = 𝑁 −1 Tr 𝐺(𝑡, 𝑧). Suppose that 𝐹(𝑥1 , … , 𝑥𝑛 ) is a function such that for any multi-index 𝛼 = (𝛼1 , … , 𝛼𝑛 ) with 1 ≤ |𝛼| ≤ 5 and for any 𝜀′ > 0 sufficiently small, we have (15.4)



max{|𝜕 𝛼 𝐹(𝑥1 , … , 𝑥𝑛 )| ∶ max |𝑥𝑗 | ≤ 𝑁 𝜀 } ≤ 𝑁 𝐶0 𝜀



𝑗

and (15.5)

max{|𝜕 𝛼 𝐹(𝑥1 , … , 𝑥𝑛 )| ∶ max |𝑥𝑗 | ≤ 𝑁 2 } ≤ 𝑁 𝐶0 𝑗

for some constant 𝐶0 . Let 𝜀 > 0 be arbitrary and choose an 𝜂 with 𝑁 −1−𝜀 ≤ 𝜂 ≤ 𝑁 −1 . For any sequence of complex parameters 𝑧𝑗 = 𝐸𝑗 ± 𝑖𝜂, 𝑗 = 1, … , 𝑛, with |𝐸𝑗 | ≤ 2 − 𝜅, there is a constant 𝐶 such that for any choices of the signs in the imaginary part of 𝑧𝑗 and for any 𝑡 ∈ [0, 1] we have (15.6) |𝔼𝐹(𝑚(𝑡, 𝑧1 ), … , 𝑚(𝑡, 𝑧𝑛 )) − 𝔼𝐹(𝑚(0, 𝑧1 ), … , 𝑚(0, 𝑧𝑛 ))| ≤ 𝐶𝑁 20𝜍+6𝜀 (𝑡√𝑁). We defer the proof of Theorem 15.2 to the next section. The following result shows how the Green function continuity, i.e., estimate of the type (15.6), can be used to compare correlation functions. The proof will be given in Section 15.2. Next we will compare local statistics of two Wigner matrix ensembles. We will use the labels 𝐯 and 𝐰 to distinguish them, because later in Chapter 16 we will denote the matrix elements of the two ensembles by different letters, 𝑣𝑖𝑗 and 𝑤𝑖𝑗 . For any two (generalized) Wigner matrix ensembles 𝐻 𝐯 and 𝐻 𝐰 , we denote the probability laws of their eigenvalues 𝜆𝐯 and 𝜆𝐰 by 𝜇𝐯 and 𝜇𝐰 , respectively. Denote by 𝑚𝐯 (𝑧), 𝑚𝐰 (𝑧) the Stieltjes transforms of the eigenvalues, i.e., 𝑁

1 1 ∑ , 𝑚 (𝑧) = 𝑁 𝑗=1 𝜆𝑗𝐯 − 𝑧 𝐯

(𝑛)

(𝑛)

and similarly for 𝑚𝐯 (𝑧). Let 𝑝𝐯,𝑁 and 𝑝𝐰,𝑁 be the 𝑛-point correlation functions of the eigenvalues w.r.t. 𝜇𝐯 and 𝜇𝐰 . We remind the reader that sometimes we will use the notation 𝔼𝐯 𝐹(𝜆) to denote the expectation of 𝔼𝐹(𝜆𝐯 ). Although the setup is motivated by local correlation functions of eigenvalues, we point out that the concept of matrices or eigenvalues plays no role in the following theorem. It is purely a statement about comparing two point processes on the real line, asserting that if the finite-dimensional marginals of the Stieltjes transforms are close, then the local correlation functions are also close. It is essential that in (15.8) below the spectral parameter of the Stieltjes transform must be at distance

15. CONTINUITY OF LOCAL CORRELATION FUNCTIONS

173

𝑁 −1−𝜍 to the real axis for some 𝜎 > 0, i.e., below the scale of the eigenvalue spacing. Controlling the Stieltjes transform on this very short scale is necessary to identify and compare local correlation functions at the scale 𝑁 −1 . The following theorem is a slightly modified version of [69, theorem 6.4]. Theorem 15.3 (Correlation function comparison). Let 𝜅 > 0 be arbitrary, and suppose that for some small parameters 𝜎, 𝛿 > 0 the following two conditions hold: (i) For any 𝜀 > 0 and any 𝑘 integer (15.7)

𝔼[Im 𝑚𝐯 (𝐸 + 𝑖𝑁 −1+𝜀 )]𝑘 + 𝔼[Im 𝑚𝐰 (𝐸 + 𝑖𝑁 −1+𝜀 )]𝑘 ≤ 𝐶

holds for any |𝐸| ≤ 2 − 𝜅 and 𝑁 ≥ 𝑁0 (𝜀, 𝑘, 𝜅). (ii) For any sequence 𝑧𝑗 = 𝐸𝑗 + 𝑖𝜂𝑗 , 𝑗 = 1, … , 𝑛, with |𝐸𝑗 | ≤ 2 − 𝜅 and 𝜂𝑗 = 𝑁 −1−𝜍𝑗 for some 𝜎𝑗 ≤ 𝜎, we have (15.8)

|𝔼(Im 𝑚𝐯 (𝑧1 ) ⋯ Im 𝑚𝐯 (𝑧𝑛 )) − 𝔼(Im 𝑚𝐰 (𝑧1 ) ⋯ Im 𝑚𝐰 (𝑧𝑛 ))| ≤ 𝑁 −𝛿 .

Then, for any integer 𝑛 ≥ 1 there are positive constants 𝑐𝑛 = 𝑐𝑛 (𝜎, 𝛿) such that for any |𝐸| ≤ 2 − 2𝜅 and for any 𝐶 1 -function 𝑂 ∶ ℝ𝑛 → ℝ with compact support, (15.9)

(𝑛)

(𝑛)

∫ d𝜶 𝑂(𝜶)(𝑝𝐯,𝑁 − 𝑝𝐰,𝑁 )(𝐸 + ℝ𝑛

𝜶 ) ≤ 𝐶𝑁 −𝑐𝑛 𝑁

where 𝐶 depends on 𝑂 and 𝑁 is sufficiently large. We remark that in some applications we will use slightly different conditions. Instead of (15.8) we may assume (15.10) |𝔼𝐹 (Im 𝑚𝐯 (𝑧1 ), … , Im 𝑚𝐯 (𝑧𝑛 )) − 𝔼𝐹 (Im 𝑚𝐰 (𝑧1 ), … , Im 𝑚𝐰 (𝑧𝑛 )) | ≤ 𝑁 −𝛿 where 𝐹 is as in Theorem 15.2. Then, (15.8) holds since we can approximate 𝔼[Im 𝑚𝐰 (𝑧1 ) … Im 𝑚𝐰 (𝑧𝑛 )] by the expression in (15.10), where the function 𝐹 is chosen to be 𝐹(𝑥1 , … , 𝑥𝑛 ) ∶= 𝑥1 ⋯ 𝑥𝑛 if max𝑗 |𝑥𝑗 | ≤ 𝑁 c , and it is smoothly cut off to go to 0 in the regime max𝑗 |𝑥𝑗 | ≥ 𝑁 2𝑐 for some small 𝑐 > 0. To see this, recall the elementary inequality, 𝜂 𝜂 (15.11) 𝜃𝜂1 ≤ 2 𝜃𝜂2 implying Im 𝑚(𝐸 + 𝑖𝜂1 ) ≤ 2 Im 𝑚(𝐸 + 𝑖𝜂2 ) 𝜂1 𝜂1 for any 𝜂2 ≥ 𝜂1 > 0. Hence, (15.7) implies that (15.12)

𝔼[Im 𝑚𝐯 (𝑧)]𝑘 ≤ 𝐶(𝑁𝜂)𝑘

for any 𝑘 > 0. Let 𝜂 = 𝑁 −1+𝑎 , 𝑎 > 0, and consider 𝑛 = 1 for simplicity. By choosing 𝑐 = 2𝑎, 𝔼|𝐹(Im 𝑚𝐯 (𝑧)) − Im 𝑚𝐯 (𝑧)| ≤ 𝔼 Im 𝑚𝐯 (𝑧)1(Im 𝑚𝐯 (𝑧) ≥ 𝑁 c ) ≤ 𝑁 −𝑐(𝑘−1) 𝔼[Im 𝑚𝐯 (𝑧)]𝑘 ≤ 𝑁 𝑘𝑎−𝑐(𝑘−1) ≤ 𝑁 −𝛿

174

15. CONTINUITY OF LOCAL CORRELATION FUNCTIONS

provided that 𝑘 is large enough. Similar arguments can be used for general 𝑛, and this implies that the difference between the expectation of 𝐹 and the product is negligible. This proves that the condition (15.8) can be replaced by the condition (15.10). Notice that we pass through the continuity of traces of Green functions as an intermediate step to get continuity of the correlation functions. If we choose to follow the evolution of the correlation functions directly by differentiating them, it will involve higher derivatives of eigenvalues. From the formulas of derivatives of eigenvalues and eigenvectors (12.9), (12.10), higher derivatives of eigenvalues involve singularities of the form (𝜆𝑖 − 𝜆𝑗 )−𝑛 for some positive integers 𝑛. These singularities are very difficult to control precisely. Our approach to use the Green function as an intermediate step completely avoids this difficulty because the Green function has a natural regularization parameter, the imaginary part of the spectral parameter. Proof of Theorem 15.1. Recall that the matrix OU (15.2) can be solved by the formula (12.21) so that the probability distribution of the matrix OU is given by a Wigner matrix ensemble if the initial data is a Wigner matrix ensemble. More precisely, if we denote the initial Wigner matrix by 𝐻0 , then the distribution of 𝐻𝑡 is the same as 𝑒−𝑡/2 𝐻0 + (1 − 𝑒−𝑡 )1/2 𝐻G . Hence, the rigidity holds in this case by (11.32), and (15.3) holds with any sufficiently small 𝜎 > 0; (𝑛) we may choose 𝜎 = 𝜀1 . Recall that 𝑝𝑡,𝑁 denotes the correlation functions of 𝐻𝑡 . We now apply Theorem 15.3 with 𝐻 𝐯 = 𝐻0 and 𝐻 𝐰 = 𝐻𝑡 . The assumption (15.8) can be verified by (15.6) if 𝑡 ≤ 𝑁 −1/2−𝜀 for any 𝜀 > 0. From (15.9), we have 𝐸+𝑏

(15.13)

lim ∫

𝑁→∞

𝐸−𝑏

𝜶 d𝐸 ′ (𝑛) (𝑛) ∫ d𝜶 𝑂(𝜶)(𝑝0,𝑁 − 𝑝𝑡,𝑁 )(𝐸 ′ + ) = 0. 2𝑏 ℝ𝑛 𝑁

This compares the correlation functions of 𝐻0 and 𝐻𝑡 if 𝑡 is not too large. To compare 𝐻𝑡 with 𝐻∞ = 𝐻G , by (12.24), we have for 𝑡 = 𝑁 −1/2−𝜀 and 𝑏 ≥ (𝑛)

𝑁 −1/2+10𝜀 that (15.13) holds with 𝑝0,𝑁 (which are the correlation functions of (𝑛)

𝐻0 ) replaced by 𝑝𝐺,𝑁 . We have thus completed the proof of Theorem 15.1.



15.1. Proof of Theorem 15.2 The first ingredient to prove Theorem 15.2 is the following continuity estimate of the matrix OU process. To state it, we need to introduce the deformation of a matrix 𝐻 at certain matrix elements. For any 𝑖 ≤ 𝑗, let 𝜽𝑖𝑗 𝐻 denote a new matrix with matrix elements (𝜽𝑖𝑗 𝐻)𝑘ℓ = 𝐻𝑘ℓ if {𝑘, ℓ} ≠ {𝑖, 𝑗} and 𝑖𝑗 𝑖𝑗 (𝜽𝑖𝑗 𝐻)𝑘,ℓ = 𝜃𝑘,ℓ 𝐻𝑘ℓ with some 0 ≤ 𝜃𝑘,ℓ ≤ 1 if {𝑘, ℓ} = {𝑖, 𝑗}. In other words, the deformation operation 𝜽𝑖𝑗 multiplies the matrix elements 𝐻𝑖𝑗 and 𝐻𝑗𝑖 by a factor between 0 and 1 and leaves all other matrix elements intact.

15.1. PROOF OF THEOREM 15.2

175

Lemma 15.4. Suppose that 𝐻0 is a Wigner ensemble and fix 𝑡 ∈ [0, 1]. Let 𝑔 be a smooth function of the matrix elements (ℎ𝑖𝑗 )𝑖≤𝑗 and set (15.14)

𝑀𝑡 ∶= sup sup sup 𝔼((𝑁 3/2 |ℎ𝑖𝑗 (𝑠)3 | + √𝑁|ℎ𝑖𝑗 (𝑠)|)|𝜕ℎ3 𝑖𝑗 𝑔(𝜽𝑖𝑗 𝐻𝑠 )|) 0≤𝑠≤𝑡 𝑖≤𝑗 𝜽𝑖𝑗

where the last supremum runs through all deformations 𝜽𝑖𝑗 . Then, 𝔼𝑔 (𝐻𝑡 ) − 𝔼𝑔 (𝐻0 ) = O(𝑡√𝑁)𝑀𝑡 . This lemma also holds for generalized Wigner matrices. We refer the reader to [12] for the minor adjustment needed to this case. Proof. Denote 𝜕𝑖𝑗 = 𝜕ℎ𝑖𝑗 ; notice that despite the two indices, this is still a first- and not a second-order partial derivative. By Itô’s formula, we have 𝜕𝑡 𝔼𝑔(𝐻𝑡 ) = − ∑(𝔼(ℎ𝑖𝑗 (𝑡)𝜕𝑖𝑗 𝑔(𝐻𝑡 )) − 𝑖≤𝑗

1 2 𝔼(𝜕𝑖𝑗 𝑔(𝐻𝑡 ))). 2𝑁

A Taylor expansion of the first derivative 𝜕𝑖𝑗 𝑔 in the direction ℎ𝑖𝑗 yields 2 𝔼(ℎ𝑖𝑗 (𝑡)𝜕𝑖𝑗 𝑔(𝐻𝑡 )) = 𝔼ℎ𝑖𝑗 (𝑡)𝜕𝑖𝑗 𝑔ℎ𝑖𝑗 (𝑡)=0 + 𝔼(ℎ𝑖𝑗 (𝑡)2 𝜕𝑖𝑗 𝑔ℎ𝑖𝑗 (𝑡)=0 ) 3 + O(sup 𝔼(|ℎ𝑖𝑗 (𝑡)3 𝜕𝑖𝑗 𝑔(𝜽𝑖𝑗 𝐻𝑡 )|)) 𝜽𝑖𝑗

3 2 𝑔ℎ𝑖𝑗 (𝑡)=0 ) + O(sup 𝔼(|ℎ𝑖𝑗 (𝑡)3 𝜕𝑖𝑗 𝑔(𝜽𝑖𝑗 𝐻𝑡 )|)), = 𝑠𝑖𝑗 𝔼(𝜕𝑖𝑗 𝜽𝑖𝑗

3 2 2 𝔼(𝜕𝑖𝑗 𝑔(𝐻𝑡 )) = 𝔼(𝜕𝑖𝑗 𝑔ℎ𝑖𝑗 (𝑡)=0 ) + O(sup 𝔼(|ℎ𝑖𝑗 (𝑡)(𝜕𝑖𝑗 𝑔(𝜽𝑖𝑗 𝐻))|)). 𝜽𝑖𝑗

˜𝑡 ), where (𝐻 ˜𝑡 )𝑘ℓ = Here the shorthand notation 𝜕𝑖𝑗 𝑔ℎ𝑖𝑗 (𝑡)=0 means (𝜕𝑖𝑗 𝑔)(𝐻 ˜𝑡 )𝑖𝑗 = (𝐻 ˜𝑡 )𝑗𝑖 = 0. In the calculation above, we (𝐻𝑡 )𝑘ℓ if {𝑘, ℓ} ≠ {𝑖, 𝑗} and (𝐻 ˜ 𝑡 is indepenused the independence of the matrix elements, in particular, that 𝐻 dent of ℎ𝑖𝑗 (𝑡). We also used the fact that the OU process preserves the first and second moments, i.e., 𝔼ℎ𝑖𝑗 (𝑡) = 𝔼ℎ𝑖𝑗 (0) = 0 and 𝔼|ℎ𝑖𝑗 (𝑡)|2 = 𝔼|ℎ𝑖𝑗 (0)|2 = 1/𝑁. Thus we have 3 𝜕𝑡 𝔼𝑔(𝐻𝑡 ) = 𝑁 1/2 O(sup sup 𝔼(𝑁 3/2 |ℎ𝑖𝑗 (𝑡)3 | + 𝑁 1/2 |ℎ𝑖𝑗 (𝑡)|)|𝜕𝑖𝑗 𝑔(𝜽𝑖𝑗 𝐻𝑡 )|). 𝑖≤𝑗 𝜽𝑖𝑗

Integration over time finishes the proof.



The second ingredient to prove Theorem 15.2 is an estimate of the type (15.3), not only for diagonal matrix elements but also for off-diagonal ones and for spectral parameters with imaginary parts slightly below 𝑁 −1 .

176

15. CONTINUITY OF LOCAL CORRELATION FUNCTIONS

Lemma 15.5. Suppose for a Wigner matrix 𝐻 we have the following estimate: 1 | | sup ||Im( ) || ≺ 𝑁 3𝜍+𝜀 . 𝐻 − 𝐸 ± 𝑖𝜂 1≤𝑘≤𝑁 |𝐸|≤2−𝜅 𝜂≥𝑁−1−𝜀 𝑘𝑘

(15.15)

max

sup

Then, we have that for any 𝜂 ≥ 𝑁 −1−𝜀 1 | | sup ||( ) || ≺ 𝑁 3𝜍+𝜀 . 𝐻 − 𝐸 ± 𝑖𝜂 1≤𝑘,ℓ≤𝑁 |𝐸|≤2−𝜅 𝑘ℓ

(15.16)

sup

We remark that (15.16) could be strengthened by including the supremum over 𝜂 ≥ 𝑁 −1−𝜀 . This can be obtained by establishing the estimate first for a fine grid of 𝜂’s with spacing 𝑁 −10 and then extend the bound for all 𝜂 by using that the Green functions are Lipschitz continuous in 𝜂 with a Lipschitz constant 𝜂−2 . Proof. Let 𝜆𝑚 and 𝑢𝑚 denote the eigenvalues and eigenvectors of 𝐻; then by the definition of the Green function, we have 𝑁

|𝑢 (𝑗)||𝑢𝑚 (𝑘)| 1 | | |( |≤ ∑ 𝑚 ) | 𝐻 − 𝑧 𝑗𝑘 | |𝜆𝑚 − 𝑧| 𝑚=1

(15.17)

𝑁

|𝑢 (𝑗)|2 ≤[∑ 𝑚 ] |𝜆𝑚 − 𝑧| 𝑚=1

1/2

𝑁

|𝑢 (𝑘)|2 [∑ 𝑚 ] |𝜆𝑚 − 𝑧| 𝑚=1

1/2

.

Define a dyadic decomposition 𝑈𝑛 = {𝑚 ∶ 2𝑛−1 𝜂 ≤ |𝜆𝑚 − 𝐸| < 2𝑛 𝜂},

(15.18)

𝑈0 = {𝑚 ∶ |𝜆𝑚 − 𝐸| < 𝜂},

𝑛 = 1, … , 𝑛0 ∶= 𝐶 log 𝑁,

𝑈∞ ∶= {𝑚 ∶ 2𝑛0 𝜂 ≤ |𝜆𝑚 − 𝐸|},

and divide the summation over 𝑚 into ⋃𝑛 𝑈𝑛 : 𝑁

|𝑢𝑚 (𝑗)|2 |𝑢𝑚 (𝑗)|2 |𝑢𝑚 (𝑗)|2 =∑ ∑ ≤ 𝐶 ∑ ∑ Im |𝜆𝑚 − 𝑧| |𝜆𝑚 − 𝑧| 𝜆𝑚 − 𝐸 − 𝑖2𝑛 𝜂 𝑛 𝑚∈𝑈 𝑛 𝑚∈𝑈 𝑚=1 ∑

𝑛

𝑛

≤ 𝐶 ∑ Im( 𝑛

1 ) . 𝐻 − 𝐸 − 𝑖2𝑛 𝜂 𝑗𝑗

Now using (15.15) we can control the right-hand side of (15.17) and conclude (15.16). □ Now we can finish the proof of Theorem 15.2. First note that from the trivial bound (15.19)

Im(

𝑦 1 1 ) ≤ ( ) Im( ) , 𝐻 − 𝐸 − 𝑖𝜂 𝑗𝑗 𝜂 𝐻 − 𝐸 − 𝑖𝑦 𝑗𝑗

𝜂 ≤ 𝑦,

and (15.3), the assumption (15.15) in Lemma 15.5 holds. Therefore, the bounds (15.16) on the matrix elements are available.

15.2. PROOF OF THE CORRELATION FUNCTION COMPARISON THEOREM

177

Next, we will first consider the specific function (15.20)

𝑔(𝐻) = 𝐺𝑎𝑏 (𝑧)

for a fixed index pair 𝑎, 𝑏 where 𝑧 = 𝐸 + i𝜂 with 𝑁 −1−𝜀 ≤ 𝜂 ≤ 1 and |𝐸| ≤ 2 − 𝜅. While this function is not of the form appearing in (15.6), once the comparison for 𝐺𝑎𝑏 (𝑧) is established, we take 𝑎 = 𝑏 and average over 𝑎 to get the normalized trace of 𝐺, i.e., the Stieltjes transform. This will then prove (15.6) for the function 𝐹(𝑥) = 𝑥; i.e., compare 𝔼𝑚(𝑡, 𝑧) with 𝔼𝑚(0, 𝑧). The case of a polynomial 𝐹 with several arguments is analogous, and similarly any function satisfying (15.4)– (15.5) can be sufficiently well approximated by a Taylor expansion. Returning to the case (15.20), in order to apply Lemma 15.4 we need to bound the third derivatives of 𝑔(𝐻) to estimate 𝑀𝑡 from (15.14). We have 3 𝜕𝑖𝑗 𝐺(𝑧)𝑎𝑏 = − ∑ 𝐺(𝑧)𝑎𝛼1 𝐺(𝑧)𝛽1 ,𝛼2 𝐺(𝑧)𝛽2 ,𝛼3 𝐺(𝑧)𝛽3 ,𝑏 𝜶,𝜷

where {𝛼𝑘 , 𝛽𝑘 } = {𝑖, 𝑗} or {𝑗, 𝑖}. By (15.16), the four expressions 𝐺(𝑧)𝑎𝛼1 ,

𝐺(𝑧)𝛽1 𝛼2 ,

𝐺(𝑧)𝛽2 𝛼3 ,

𝐺(𝑧)𝛽3 𝑏 ,

4𝜍+𝜀

are bounded by 𝑁 with very high probability provided 𝑁 −1−𝜀 ≤ 𝜂 ≤ 1. Consequently, we proved that uniformly in 𝐸 ∈ (−2 + 𝜅, 2 − 𝜅), 𝑁 −1−𝜀 ≤ 𝜂 ≤ 1, 3 𝜕𝑖𝑗 𝐺(𝑧)𝑎𝑏 = O(𝑁 20𝜍+5𝜀 )

with very high probability. The same argument holds if some matrix elements are reduced by deformation; i.e., we have 3 |𝜕𝑖𝑗 𝑔(𝜽𝑖𝑗 𝐻𝑠 )| ≤ 𝑁 20𝜍+5𝜀 ,

and thus (15.14) holds with 𝑀𝑡 = 𝑁 20𝜍+6𝜀 using the bound ℎ𝑖𝑗 ≺ 𝑁 −1/2 . Hence, by Lemma 15.4, we have proved that for any 𝑡 ≤ 1 we have |𝔼𝑔(𝐻𝑡 ) − 𝔼𝑔(𝐻0 )| ≤ 𝐶𝑁 20𝜍+6𝜀 (𝑡√𝑁). This completes the proof of Theorem 15.2. 15.2. Proof of the Correlation Function Comparison Theorem, Theorem 15.3 Define an approximate delta function at the scale 𝜂 by 1 1 (15.21) 𝜃𝜂 (𝑥) ∶= Im . 𝜋 𝑥 − 𝑖𝜂 We will choose 𝜂 ∼ 𝑁 −1−𝑎 , with some small 𝑎 > 0, i.e., slightly smaller than the typical eigenvalue spacing. This means that an observable of the form 𝜃𝜂 has sufficient resolution to detect individual eigenvalues. Moreover, polynomials of such observables detect correlation functions. On the other hand, 1 ∑ 𝜃 (𝜆 − 𝐸) = 𝜋 Im 𝑚(𝑧); 𝑁 𝑖 𝜂 𝑖 therefore expectation values of such observables are covered by the condition (15.8). The rest of the proof consists of making this idea precise. There are two

178

15. CONTINUITY OF LOCAL CORRELATION FUNCTIONS

technicalities to resolve. First, correlation functions involve distinct eigenvalues (see (15.25) below), while polynomials of the resolvent include an overcounting of coinciding eigenvalues. Thus an exclusion-inclusion formula will be needed. Second, although 𝜂 is much smaller than the relevant scale 1/𝑁, it still does not give pointwise information on the correlation functions. However, the correlation functions in (4.38) are identified only as a weak limit, i.e., tested against a continuous function 𝑂. The continuity of 𝑂 can be used to show that the difference between the exact correlation functions and the smeared out ones on the scale 𝜂 ∼ 𝑁 −1−𝑎 is negligible. This last step requires an a priori upper bound on the density to ensure that not too many eigenvalues fall into an irrelevantly small interval; this bound is given in (15.7), and it will eventually be verified by the local semicircle law. Proof of Theorem 15.3. For notational simplicity, we give the detailed proof only for the case of 𝑛 = 3-point correlation functions; the proof is analogous for the general case. Denote by 𝜂 = 𝑁 −1−𝑎 for some small 𝑎 > 0; by following the proof one may check that any 𝑎 ≤ 𝑐 min{𝛿, 𝜎}/𝑛2 will do. Let 𝑂 be a compactly supported test function, and let 𝑂𝜂 (𝛽1 , 𝛽2 , 𝛽3 ) ∶=

𝛽 − 𝛼3 𝛽 − 𝛼1 1 ∫ d𝛼 d𝛼 d𝛼 𝑂(𝛼1 , 𝛼2 , 𝛼3 )𝜃𝜂 ( 1 ) ⋯ 𝜃𝜂 ( 3 ) 𝑁 𝑁 𝑁 3 ℝ3 1 2 3

be its smoothing on scale 𝑁𝜂. We note that for any nonnegative 𝐶 1 function 𝑂 with compact support, there is a constant depending on 𝑂 such that (15.22)

𝑂𝜂 ≤ 𝐶𝑂𝜂′

for any 0 ≤ 𝜂 ≤ 𝜂 ′ ≤ 1/𝑁.

We can apply this bound with 𝜂′ = 1/𝑁 and combine it with (15.11) with 𝜂1 = 1/𝑁 and 𝜂2 = 𝑁 −1+𝜀 for any small 𝜀 > 0 to have 𝑂𝜂 ≤ 𝐶𝑁 3𝜀 𝑂𝑁−1+𝜀

(15.23)

for any 0 ≤ 𝜂 ≤ 1/𝑁.

After the change of variables 𝑥𝑗 = 𝐸 + 𝛽𝑗 /𝑁 and with 𝐸𝑗 ∶= 𝐸 + 𝛼𝑗 /𝑁, we have (3)

∫ d𝛽1 d𝛽2 d𝛽3 𝑂𝜂 (𝛽1 , 𝛽2 , 𝛽3 )𝑝𝐰,𝑁 (𝐸 + ℝ3

(15.24)

𝛽 𝛽1 ,…,𝐸 + 3) 𝑁 𝑁

= ∫ d𝛼1 d𝛼2 d𝛼3 𝑂(𝛼1 , 𝛼2 , 𝛼3 ) ℝ3 (3)

⋅ ∫ d𝑥1 d𝑥2 d𝑥3 𝑝𝐰,𝑁 (𝑥1 , 𝑥2 , 𝑥3 )𝜃𝜂 (𝑥1 − 𝐸1 )𝜃𝜂 (𝑥2 − 𝐸2 )𝜃𝜂 (𝑥3 − 𝐸3 ). ℝ3

15.2. PROOF OF THE CORRELATION FUNCTION COMPARISON THEOREM

179

By definition of the correlation function, for any fixed 𝐸, 𝛼1 , 𝛼2 , 𝛼3 , (3)

∫ d𝑥1 d𝑥2 d𝑥3 𝑝𝐰,𝑁 (𝑥1 , 𝑥2 , 𝑥3 )𝜃𝜂 (𝑥1 − 𝐸1 )𝜃𝜂 (𝑥2 − 𝐸2 )𝜃𝜂 (𝑥3 − 𝐸3 ) 1 𝑁(𝑁 − 1)(𝑁 − 2) 𝛼 𝛼 𝛼 ⋅ ∑ 𝜃𝜂 (𝜆𝑖 − 𝐸 − 1 )𝜃𝜂 (𝜆𝑗 − 𝐸 − 2 )𝜃𝜂 (𝜆𝑘 − 𝐸 − 3 ), 𝑁 𝑁 𝑁 𝑖≠𝑗≠𝑘

= 𝔼𝐰

(15.25)

where 𝔼𝐰 indicates expectation w.r.t. the 𝐰-variables. By the exclusion-inclusion principle, (15.26) 𝔼𝐰

1 ∑ 𝜃 (𝑥 − 𝐸1 )𝜃𝜂 (𝑥2 − 𝐸2 )𝜃𝜂 (𝑥3 − 𝐸3 ) = 𝑁(𝑁 − 1)(𝑁 − 2) 𝑖≠𝑗≠𝑘 𝜂 1 𝔼𝐰 𝐴1 + 𝔼𝐰 𝐴2 + 𝔼𝐰 𝐴3

where 3

𝑁3 1 ∏[ ∑ 𝜃𝜂 (𝜆𝑖 − 𝐸𝑗 )], 𝐴1 ∶= 𝑁 𝑁(𝑁 − 1)(𝑁 − 2) 𝑗=1 𝑖 𝐴3 ∶=

2 ∑ 𝜃 (𝜆 − 𝐸1 )𝜃𝜂 (𝜆𝑖 − 𝐸2 )𝜃𝜂 (𝜆𝑖 − 𝐸3 ), 𝑁(𝑁 − 1)(𝑁 − 2) 𝑖 𝜂 𝑖

and 𝐴2 ∶= 𝐵1 + 𝐵2 + 𝐵3 with 𝐵𝑗 defined as follows: 𝐵3 = −

1 ∑ 𝜃 (𝜆 − 𝐸1 )𝜃𝜂 (𝜆𝑖 − 𝐸2 ) ∑ 𝜃𝜂 (𝜆𝑘 − 𝐸3 ). 𝑁(𝑁 − 1)(𝑁 − 2) 𝑖 𝜂 𝑖 𝑘

𝐵1 consists of terms with 𝑗 = 𝑘, and 𝐵2 consists of terms with 𝑖 = 𝑘 from the triple sum (15.26). We remark that 𝐴3 + 𝐴2 ≤ 0, and thus (15.27)

(3)

∫ d𝑥1 d𝑥2 d𝑥3 𝑝𝐰,𝑁 (𝑥1 , 𝑥2 , 𝑥3 )𝜃𝜂 (𝑥1 − 𝐸1 )𝜃𝜂 (𝑥2 − 𝐸2 )𝜃𝜂 (𝑥3 − 𝐸3 ) ≤ 3

𝐶𝔼𝐰 ∏ Im 𝑚(𝐸𝑗 + 𝑖𝜂). 𝑗=1 (3)

The same bound holds for 𝑝𝐯,𝑁 as well. Therefore, we can combine (15.23) and (15.7) to obtain for any small 𝜀 > 0 (3) (3) (15.28) ∫ d𝛽1 d𝛽2 d𝛽3 𝑂𝜂 (𝛽1 , 𝛽2 , 𝛽3 )(𝑝𝐰,𝑁 + 𝑝𝐯,𝑁 )(𝐸 + ℝ3

𝛽 𝛽1 , … , 𝐸 + 3 ) = 𝑂(𝑁 3𝜀 ). 𝑁 𝑁

To approximate 𝔼𝐰 𝐵3 , we define 𝜙𝐸1 ,𝐸2 (𝑥) = 𝜃𝜂 (𝑥 − 𝐸1 )𝜃𝜂 (𝑥 − 𝐸2 ). Recall ˆ𝜂ˆ = 𝜃 ˆ1 + 𝜃 ˆ2 where 𝜃 ˆ2 (𝑦) = ˆ = 𝑁 −1−9𝑎 . Decompose 𝜃 𝜂 = 𝑁 −1−𝑎 and let 𝜂 ˆ𝑁 3𝑎 ). Denote 𝜃𝜂ˆ (𝑦)1(|𝑦| ≥ 𝜂 (15.29)

𝜃˜ = (1 − 𝜓(𝑎))−1 𝜃1̂ ,

ˆ2 (𝑦)d𝑦 ≤ 𝑁 −3𝑎 , 𝜓(𝑎) ∶= ∫ 𝜃

180

15. CONTINUITY OF LOCAL CORRELATION FUNCTIONS

so that ∫ ˜ 𝜃 = 1. Using |𝜙𝐸′ 1 ,𝐸2 (𝑥)| ≤ 𝐶𝜂 −2 𝜃𝜂 (𝑥 − 𝐸1 ) and |𝜙𝐸″1 ,𝐸2 (𝑥)| ≤ 𝐶𝜂 −3 𝜃𝜂 (𝑥 − 𝐸1 ), we have |˜ 𝜃 ∗ 𝜙𝐸1 ,𝐸2 (𝑥) − 𝜙𝐸1 ,𝐸2 (𝑥)| ≤

| | 1 ˆ1 (𝑦)[𝜙𝐸 ,𝐸 (𝑥 − 𝑦) − 𝜙𝐸 ,𝐸 (𝑥)]| |∫ d𝑦 𝜃 1 2 1 2 1 − 𝜓(𝑎) | |

| | ″ 2 ˆ1 (𝑦)[−𝜙′ ≤ 𝐶 |∫ d𝑦 𝜃 𝐸1 ,𝐸2 (𝑥)𝑦 + 𝑂(𝜙𝐸1 ,𝐸2 (𝑥))𝑦 ]| | | ≤𝐶

ˆ2 𝑁 6𝑎 𝜂 𝜃𝜂 (𝑥 − 𝐸1 ) 𝜂3

ˆ1 , i.e., ∫ d𝑦 𝜃 ˆ1 (𝑦)𝑦 = 0 and that 𝜃 ˆ1 (𝑦) is supwhere we used the symmetry of 𝜃 3𝑎 ′ ˆ𝑁 . Together with (15.11) with the choice of 𝜂 = 𝑁 −1+𝜀 for ported on |𝑦| ≤ 𝜂 some small 𝜀 > 0, we have | 1 | | 𝔼 3 ∑[ ˜ 𝜃 ∗ 𝜙𝐸1 ,𝐸2 (𝜆𝑖 ) − 𝜙𝐸1 ,𝐸2 (𝜆𝑖 )] ∑ 𝜃𝜂 (𝜆𝑘 − 𝐸3 )|| | 𝑁 𝑖

ˆ2

𝑘

6𝑎



𝜂 𝑁 | 1 | |𝔼 ∑ 𝜃 (𝜆 − 𝐸1 )𝜃𝜂 (𝜆𝑘 − 𝐸3 )|| 𝑁𝜂 3 | 𝑁 2 𝑖,𝑘 𝜂 𝑖



ˆ2 𝑁 8𝑎+2𝜀 | 1 𝜂 | |𝔼 ∑ 𝜃𝑁−1+𝜀 (𝜆𝑖 − 𝐸1 )𝜃𝑁−1+𝜀 (𝜆𝑘 − 𝐸3 )|| 𝑁𝜂 3 | 𝑁 2

(15.30)

𝑖,𝑘

ˆ)2 𝑁 11𝑎+2𝜀 ≤ 𝑁 −6𝑎 , ≤ (𝑁 𝜂 where we used the a priori bound (15.7) and then 𝜀 ≤ 𝑎/2 in the two last steps. Hence, we can approximate 𝔼𝐰 𝐵3 by 𝜃 (𝜆𝑖 − 𝑦)𝜃𝜂 (𝜆𝑘 − 𝐸3 ) + 𝑂(𝑁 −𝑎 ). (15.31) 𝔼𝐰 𝐵3 = 𝔼𝐰 ∫ d𝑦 𝜙𝐸1 ,𝐸2 (𝑦)𝑁 −3 ∑ ˜ 𝑖,𝑘

ˆ1 + 𝜃 ˆ2 and ˜ ˆ1 . We wish to replace ˜ Recall 𝜃𝜂ˆ = 𝜃 𝜃 = (1 − 𝜓(𝑎))−1 𝜃 𝜃 in the −1 −𝑎 right-hand side of (15.31) by (1 − 𝜓(𝑎)) 𝜃𝜂ˆ with an error 𝑂(𝑁 ). For this ˆ2 is bounded purpose, we need to prove that the additional contribution from 𝜃 by 𝑂(𝑁 −𝑎 ). ˆ2 (𝑦) ≤ 𝐶𝑁 −3𝑎 𝜃 3𝑎 ˆ (𝑦), we have Since 𝜃 𝑁 𝜂 (15.32)

ˆ2 (𝜆𝑖 − 𝑦)𝜃𝜂 (𝜆𝑘 − 𝐸3 ) ≤ ∫ d𝑦 𝜙𝐸1 ,𝐸2 (𝑦)𝔼𝐰 𝑁 −3 ∑ 𝜃 𝑖,𝑘

∫ d𝑦 𝜙𝐸1 ,𝐸2 (𝑦)𝔼𝐰 𝑁 −3 ∑ 𝑁 −3𝑎 𝜃𝑁3𝑎 𝜂ˆ (𝜆𝑖 − 𝑦)𝜃𝜂 (𝜆𝑘 − 𝐸3 ). 𝑖,𝑘

Now we show that the contribution of this error term to the right-hand side of (15.24) is negligible. Recalling 𝐸1 = 𝐸 +𝛼1 /𝑁 and using ∫ d𝛼1 𝜃𝜂 (𝑦 −𝐸1 ) ≤ 𝐶𝑁,

15.2. PROOF OF THE CORRELATION FUNCTION COMPARISON THEOREM

181

we have | |∫ d𝛼1 d𝛼2 d𝛼3 𝑂(𝛼1 , 𝛼2 , 𝛼3 ) | ℝ3 | × ∫ d𝑦𝜃𝜂 (𝑦 − 𝐸1 )𝜃𝜂 (𝑦 − 𝐸2 )𝔼𝐰 𝑁 −3 ∑ 𝑁 −3𝑎 𝜃𝑁3𝑎 𝜂ˆ (𝜆𝑖 − 𝑦)𝜃𝜂 (𝜆𝑘 − 𝐸3 )| | 𝑖,𝑘

(15.33)

≤ 𝐶𝑁 −3𝑎 ∫

d𝛼2 d𝛼3 𝔼𝐰 𝑁 −2 ∑ 𝜃𝜂 ∗ 𝜃𝑁3𝑎 𝜂ˆ (𝜆𝑖 − 𝐸2 ) ∑ 𝜃𝜂 (𝜆𝑘 − 𝐸3 )

|𝛼2 |+|𝛼3 |≤𝐶

≤ 𝐶𝑁 −3𝑎 ∫

𝑖

𝑘

d𝛼2 d𝛼3 𝔼𝐰 𝑁 −2 ∑ 𝜃𝜂 (𝜆𝑖 − 𝐸2 ) ∑ 𝜃𝜂 (𝜆𝑘 − 𝐸3 ),

|𝛼2 |+|𝛼3 |≤𝐶

𝑖

𝑘

where we have used 𝜃𝜂 ∗ 𝜃𝜂′ ≤ 𝐶𝜃𝜂 if 𝜂 > 𝜂 ′ with 𝐶 independent of 𝜂, 𝜂 ′ . Notice that we have only used that 𝑂 is compactly supported and ‖𝑂‖∞ is bounded. Using (15.11) and (15.7), we can bound the last line by 𝑁 −𝑎+𝐶𝜀 . As we will choose 𝜀 ≪ 𝑎, neglecting the 𝐶𝜀 exponent, we can thus approximate the contribution of 𝔼𝐰 𝐵3 to the r.h.s. of (15.24) by ∫ d𝛼1 d𝛼2 d𝛼3 𝑂(𝛼1 , 𝛼2 , 𝛼3 )𝔼𝐰 𝐵3 ℝ3

= (1 − 𝜓(𝑎))−1 ∫ d𝛼1 d𝛼2 d𝛼3 𝑂(𝛼1 , 𝛼2 , 𝛼3 ) ℝ3

(15.34)

× ∫ d𝑦 𝜙𝐸1 ,𝐸2 (𝑦)𝔼𝐰 𝑁 −3 ∑ 𝜃𝜂ˆ (𝜆𝑖 − 𝑦)𝜃𝜂 (𝜆𝑘 − 𝐸3 ) 𝑖,𝑘

+ 𝑂(𝑁 −𝑎 ). The same estimate holds if we replace the expectation 𝔼𝐰 by 𝔼𝐯 . Recalling (15.8), we can thus estimate their difference by | | |∫ d𝛼1 d𝛼2 d𝛼3 𝑂(𝛼1 , 𝛼2 , 𝛼3 )[𝔼𝐰 − 𝔼𝐯 ]𝐵3 | | | ℝ3 | | 𝐰 𝐯 −2 ≤ 𝐶 ∫ d𝑦 𝜙𝐸1 ,𝐸2 (𝑦)||[𝔼 − 𝔼 ]𝑁 ∑ 𝜃𝜂ˆ (𝜆𝑖 − 𝑦)𝜃𝜂 (𝜆𝑘 − 𝐸3 )|| 𝑖,𝑘

(15.35)

+ 𝑂(𝑁

−𝑎

)

≤ 𝐶(𝑁𝜂)−1 𝑁 −𝛿 + 𝑂(𝑁 −𝑎 ) ≤ 𝑂(𝑁 −𝑎 ) provided that 𝑎 ≤ 𝛿/2. Similar arguments can be used to prove that (15.36)

∫ d𝛼1 d𝛼2 d𝛼3 𝑂(𝛼1 , 𝛼2 , 𝛼3 )[𝔼𝐰 − 𝔼𝐯 ][𝐴1 + 𝐴2 + 𝐴3 ] = 𝑂(𝑁 −𝑎 ). ℝ3

182

15. CONTINUITY OF LOCAL CORRELATION FUNCTIONS

Recalling (15.24), we have thus proved that (15.37)

(3)

(3)

∫ d𝛽1 d𝛽2 d𝛽3 𝑂𝜂 (𝛽1 , 𝛽2 , 𝛽3 )(𝑝𝐰,𝑁 − 𝑝𝐯,𝑁 )(𝐸 + ℝ3

𝛽 𝛽1 ,…,𝐸 + 3) 𝑁 𝑁 = 𝑂(𝑁 −𝑎 )

for any 𝑂 compactly supported with ‖𝑂‖∞ bounded. In order to prove Theorem 15.3, it remains to replace 𝑂𝜂 with 𝑂 in (15.37); at this point we will use that 𝑂 is differentiable. For this purpose, we only need to bound the error (15.38)

(3)

∫ d𝛽1 d𝛽2 d𝛽3 (𝑂 − 𝑂𝜂 )(𝛽1 , 𝛽2 , 𝛽3 )𝑝𝐰,𝑁 (𝐸 + ℝ3

𝛽 𝛽1 ,…,𝐸 + 3) 𝑁 𝑁 = 𝑂(𝑁 −𝑎+𝜀 )

and use the similar estimate with 𝐰 replaced by 𝐯. One can easily check that ˜ with compact support and ‖𝑂‖ ˜ ∞ ≤ 1 such there is a 𝐶 1 nonnegative function 𝑂 that ˜𝜂 . (15.39) |𝑂 − 𝑂𝜂 | ≤ 𝐶(𝑁𝜂)𝑂 ˜ Choosing 𝜀 much smaller Hence, (15.38) follows from applying (15.28) to 𝑂. than 𝑎 completes the proof of Theorem 15.3. □

CHAPTER 16

Universality of Wigner Matrices in Small Energy Windows: GFT We have proved the universality of Wigner matrices in energy windows bigger than 𝑁 −1/2+𝜀 in Chapter 15. In this section, we will use the Green function comparison theorem to improve it to any energy windows of size bigger than 𝑁 −1+𝜀 . This is Step 3 in the three-step strategy introduced in Chapter 5. In the following, we will first prove the Green function comparison theorem and then use it to prove our main universality theorem, i.e., Theorem 5.1. 16.1. Green Function Comparison Theorems The main ingredient to prove this result is the following Green function comparison theorem stating that the correlation functions of eigenvalues of two matrix ensembles are identical on scale 1/𝑁 provided that the first four moments of all matrix elements of these two ensembles are almost identical. Here we do not assume that the real and imaginary parts are i.i.d.; hence, the 𝑘th moment of ℎ𝑖𝑗 is understood as the collection of numbers ∫ ℎ𝑠 ℎ𝑘−𝑠 𝜈𝑖𝑗 (dℎ), 𝑠 = 0, 1, … , 𝑘. Before stating the result we explain a notation. We will distinguish between the two ensembles by using different letters, 𝑣𝑖𝑗 and 𝑤𝑖𝑗 , for their matrix elements, and we often use the notation 𝐻 (𝐯) and 𝐻 (𝐰) to indicate the difference. Alternatively, one could denote the matrix elements of 𝐻 by ℎ𝑖𝑗 and make the distinction of the two ensembles in the measure, especially in the expectation, by using the notation 𝔼𝐯 and 𝔼𝐰 . Since the matrix elements will be replaced one by one from one distribution to the other, the latter notation would have been cumbersome, and so we will follow the first convention in this section. Theorem 16.1 (Green function comparison [69, theorem 2.3]). Suppose that we have two generalized 𝑁 × 𝑁 Wigner matrices, 𝐻 (𝐯) and 𝐻 (𝐰) , with matrix elements ℎ𝑖𝑗 given by the random variables 𝑁 −1/2 𝑣𝑖𝑗 and 𝑁 −1/2 𝑤𝑖𝑗 , respectively, with 𝑣𝑖𝑗 and 𝑤𝑖𝑗 satisfying the uniform polynomial decay condition (5.6). Fix a bijective ordering map on the index set of the independent matrix elements, 𝑁(𝑁 + 1) , 2 and denote by 𝐻𝛾 the generalized Wigner matrix whose matrix elements ℎ𝑖𝑗 follow the 𝑣-distribution if 𝜙(𝑖, 𝑗) ≤ 𝛾 and the 𝐰-distribution otherwise; in particular, 𝐻 (𝐯) = 𝐻0 and 𝐻 (𝐰) = 𝐻𝛾(𝑁) . Let 𝜅 > 0 be arbitrary and suppose that, for any small parameter 𝜏 > 0 and for any 𝑦 ≥ 𝑁 −1+𝜏 , we have the following estimate on 𝜙 ∶ {(𝑖, 𝑗) ∶ 1 ≤ 𝑖 ≤ 𝑗 ≤ 𝑁} → {1, … , 𝛾(𝑁)},

183

𝛾(𝑁) ∶=

184

16. UNIVERSALITY OF WIGNER MATRICES IN SMALL ENERGY WINDOWS: GFT

the diagonal elements of the resolvent: 1 | | max max ||( ) || ≺ 𝑁 2𝜏 0≤𝛾≤𝛾(𝑁) 1≤𝑘≤𝑁 |𝐸|≤2−𝜅 𝐻𝛾 − 𝐸 − 𝑖𝑦 𝑘𝑘

(16.1)

max

with some constants 𝐶, 𝑐 depending only on 𝜏, 𝜅. Moreover, we assume that the first four moments of 𝑣𝑖𝑗 and 𝑤𝑖𝑗 satisfy that (16.2)

𝑠−𝑢 𝑠−𝑢 | |𝔼𝑣𝑢 − 𝔼𝑤 𝑢 ≤ 𝑁 −𝛿−2+𝑠/2 , 𝑠 = 0, 1, 2, 3, 4, 𝑢 = 0, 1, … , 𝑠, 𝑖𝑗 𝑣𝑖𝑗 𝑖𝑗 𝑤𝑖𝑗

for some given 𝛿 > 0. Let 𝜀 > 0 be arbitrary, and choose an 𝜂 with 𝑁 −1−𝜀 ≤ 𝜂 ≤ 𝑁 −1 . For any sequence of complex parameters 𝑧𝑗 = 𝐸𝑗 ± 𝑖𝜂, 𝑗 = 1, … , 𝑛, with |𝐸𝑗 | ≤ 2 − 2𝜅 and with an arbitrary choice of the ± signs. Let 𝐺 (𝐯) (𝑧) = (𝐻 (𝐯) − 𝑧)−1 denote the resolvent, and let 𝐹(𝑥1 , … , 𝑥𝑛 ) be a function such that for any multi-index 𝛼 = (𝛼1 , … , 𝛼𝑛 ) with 1 ≤ |𝛼| ≤ 5 and for any 𝜀′ > 0 sufficiently small, we have ′

max{|𝜕 𝛼 𝐹(𝑥1 , … , 𝑥𝑛 )| ∶ max |𝑥𝑗 | ≤ 𝑁 𝜀 } ≤ 𝑁 𝐶0 𝜀

(16.3)



𝑗

and max{|𝜕 𝛼 𝐹(𝑥1 , … , 𝑥𝑛 )| ∶ max |𝑥𝑗 | ≤ 𝑁 2 } ≤ 𝑁 𝐶0

(16.4)

𝑗

for some constant 𝐶0 . Then there is a constant 𝐶1 , depending on 𝜗, ∑𝑚 𝑘𝑚 , and 𝐶0 such that for any 𝜂 with 𝑁 −1−𝜀 ≤ 𝜂 ≤ 𝑁 −1 and for any choices of the signs in the imaginary part of 𝑧𝑗𝑚 , we have (𝐯) (𝐯) (16.5) ||𝔼𝐹(𝐺𝑎1 𝑏1 (𝑧1 ), … , 𝐺𝑎𝑛 𝑏𝑛 (𝑧𝑛 )) − 𝔼𝐹(𝐺 (𝐯) → 𝐺 (𝐰) )|| ≤

𝐶1 𝑁 −1/2+𝐶1 𝜀 + 𝐶1 𝑁 −𝛿+𝐶1 𝜀 , where the arguments of 𝐹 in the second term are changed from the Green functions of 𝐻 (𝐯) to 𝐻 (𝐰) and all other parameters remain unchanged. Moreover, for any |𝐸| ≤ 2 − 2𝜅, any 𝑘 ≥ 1, and any compactly supported smooth test function 𝑂 ∶ ℝ𝑛 → ℝ, we have (𝑛)

(𝑛)

lim ∫ d𝜶 𝑂(𝜶)(𝑝𝐯,𝑁 − 𝑝𝐰,𝑁 )(𝐸 +

(16.6)

𝑁→∞ (𝑛) 𝑝𝐯,𝑁

ℝ𝑛

𝜶 )=0 𝑁

(𝑛) 𝑝𝐰,𝑁

where and denote the 𝑛-point correlation functions of the 𝐯- and 𝐰ensembles, respectively. Remark 1. We formulated Theorem 16.1 for functions of traces of monomials of the Green function because this is the form we need in the application. However, the result (and the proof we are going to present) holds directly for matrix elements of monomials of Green functions as well (for the precise statement, see [69]). We also remark that Theorem 16.1 holds for generalized Wigner matrices if 𝐶sup = sup𝑖𝑗 𝑁𝑠𝑖𝑗 < ∞. The positive lower bound on the variances, 𝐶inf > 0 in (2.6), is not necessary for this theorem.

16.1. GREEN FUNCTION COMPARISON THEOREMS

185

Remark 2. Although we state Theorem 16.1 for Hermitian and symmetric ensembles, similar results hold for real and complex sample covariance ensembles; the modification of the proof is obvious. Proof of Theorem 16.1 (Green Function Comparison Theorem). The basic idea is to estimate the effect of changing matrix elements of the resolvent one by one by a resolvent expansion. Since each matrix element has a typical size of 𝑁 −1/2 and the resolvents are essentially bounded thanks to (16.1), a resolvent expansion up to the fourth order will identify the change of each element with a precision 𝑂(𝑁 −5/2 ) (modulo some tiny corrections of order 𝑁 𝑂(𝜏) ). The expectation values of the terms up to fourth order involve only the first four moments of the single-entry distribution, which can be directly compared. The error terms are negligible even when we sum them up 𝑁 2 times, the number of comparison steps needed to replace all matrix elements. Now we turn to the one-by-one replacement. For notational simplicity, we will consider the case when the test function 𝐹 has only 𝑛 = 1 variable and 𝑘1 = 1; i.e., we consider the trace of a first-order monomial; the general case follows analogously. Consider the telescopic sum of differences of expectations (16.7) 𝔼𝐹(

1 1 1 1 Tr ) − 𝔼𝐹( Tr )= (𝐯) (𝐰) 𝑁 𝑁 𝐻 −𝑧 𝐻 −𝑧 𝛾(𝑁)

∑ [𝔼𝐹( 𝛾=1

1 1 1 1 Tr ) − 𝔼𝐹( Tr )]. 𝑁 𝐻𝛾 − 𝑧 𝑁 𝐻𝛾−1 − 𝑧

Let 𝐸 (𝑖𝑗) denote the matrix whose matrix elements are 0 everywhere except at (𝑖𝑗) the (𝑖, 𝑗) position, where it is 1, i.e., 𝐸𝑘ℓ = 𝛿𝑖𝑘 𝛿𝑗ℓ . Fix an 𝛾 ≥ 1 and let (𝑖, 𝑗) be determined by 𝜙(𝑖, 𝑗) = 𝛾. We will compare 𝐻𝛾−1 with 𝐻𝛾 . Note that these two matrices differ only in the (𝑖, 𝑗) and (𝑗, 𝑖) matrix elements, and they can be written as 1 𝐻𝛾−1 = 𝑄 + 𝑉, 𝑉 ∶= 𝑣𝑖𝑗 𝐸 (𝑖𝑗) + 𝑣𝑗𝑖 𝐸 (𝑗𝑖) , √𝑁 1 𝐻𝛾 = 𝑄 + 𝑊, 𝑊 ∶= 𝑤𝑖𝑗 𝐸 (𝑖𝑗) + 𝑤𝑗𝑖 𝐸 (𝑗𝑖) , √𝑁 with a matrix 𝑄 that has zero matrix element at the (𝑖, 𝑗) and (𝑗, 𝑖) positions and where we set 𝑣𝑗𝑖 ∶= 𝑣 𝑖𝑗 for 𝑖 < 𝑗 and similarly for 𝑤. Define the Green functions 𝑅 ∶=

1 , 𝑄−𝑧

𝑆 ∶=

1 . 𝐻𝛾 − 𝑧

We first claim that the estimate (15.16) holds for the Green function 𝑅 as well. To see this, we have, from the resolvent expansion, 𝑅 = 𝑆 + 𝑁 −1/2 𝑆𝑉𝑆 + ⋯ + 𝑁 −9/5 (𝑆𝑉)9 𝑆 + 𝑁 −5 (𝑆𝑉)10 𝑅. Since 𝑉 has only at most two nonzero elements, when computing the (𝑘, ℓ) matrix element of this matrix identity, each term is a finite sum involving matrix

186

16. UNIVERSALITY OF WIGNER MATRICES IN SMALL ENERGY WINDOWS: GFT

elements of 𝑆 or 𝑅 and 𝑣𝑖𝑗 , e.g., (𝑆𝑉𝑆)𝑘ℓ = 𝑆𝑘𝑖 𝑣𝑖𝑗 𝑆𝑗ℓ + 𝑆𝑘𝑗 𝑣𝑗𝑖 𝑆𝑖ℓ . Using the bound (15.16) for the 𝑆 matrix elements, the subexponential decay for 𝑣𝑖𝑗 , and the trivial bound |𝑅𝑖𝑗 | ≤ 𝜂 −1 , we obtain that the estimate (15.16) holds for 𝑅. We can now start proving the main result by comparing the resolvents of (𝛾−1) 𝐻 and 𝐻 (𝛾) with the resolvent 𝑅 of the reference matrix 𝑄. By the resolvent expansion, 𝑆 = 𝑅 − 𝑁 −1/2 𝑅𝑉𝑅 + 𝑁 −1 (𝑅𝑉)2 𝑅 − 𝑁 −3/2 (𝑅𝑉)3 𝑅

(16.8)

+ 𝑁 −2 (𝑅𝑉)4 𝑅 − 𝑁 −5/2 (𝑅𝑉)5 𝑆,

so we can write 1 Tr 𝑆 = 𝑅ˆ + 𝜉, 𝑁

4

(𝑚)

𝜉 ∶= ∑ 𝑁 −𝑚/2 𝑅ˆ𝐯

+ 𝑁 −5/2 Ω𝐯

𝑚=1

with 1 (𝑚) 𝑅ˆ𝐯 ∶= (−1)𝑚 Tr (𝑅𝑉)𝑚 𝑅, 𝑁 (16.9) 1 Ω𝐯 ∶= − Tr (𝑅𝑉)5 𝑆. 𝑁 For each diagonal element in the computation of these traces, the contribution ˆ 𝑅ˆ(𝑚) to 𝑅, 𝐯 , and Ω𝐯 is a sum of a few terms, e.g., 1 𝑅ˆ ∶= Tr 𝑅, 𝑁

1 (2) 𝑅ˆ𝐯 = ∑[𝑅𝑘𝑖 𝑣𝑖𝑗 𝑅𝑗𝑗 𝑣𝑗𝑖 𝑅𝑖𝑘 + 𝑅𝑘𝑖 𝑣𝑖𝑗 𝑅𝑗𝑖 𝑣𝑖𝑗 𝑅𝑗𝑘 𝑁 𝑘 + 𝑅𝑘𝑗 𝑣𝑗𝑖 𝑅𝑖𝑖 𝑣𝑖𝑗 𝑅𝑗𝑘 + 𝑅𝑘𝑗 𝑣𝑗𝑖 𝑅𝑖𝑗 𝑣𝑗𝑖 𝑅𝑖𝑘 ] and similar formulas hold for the other terms. Then we have 1 1 𝔼𝐹( Tr ) 𝑁 𝐻𝛾 − 𝑧 = 𝔼𝐹(𝑅ˆ + 𝜉) (16.10)

= 𝔼[𝐹(𝑅ˆ) + 𝐹 ′ (𝑅ˆ)𝜉 + 𝐹 ′′ (𝑅ˆ)𝜉 2 + ⋯ + 𝐹 (5) (𝑅ˆ + 𝜉 ′ )𝜉 5 ] 5

(𝑚)

= ∑ 𝑁 −𝑚/2 𝔼𝐴𝐯 𝑚=0

(𝑚)

where 𝜉 ′ is a number between 0 and 𝜉 that depends on 𝑅ˆ and 𝜉, and the 𝐴𝐯 ’s are defined as (0)

ˆ 𝐴𝐯 = 𝐹(𝑅),

(1)

(1)

ˆ 𝑅ˆ𝐯 , 𝐴𝐯 = 𝐹 ′ (𝑅) (3)

(2)

(1)

(2)

ˆ 𝑅ˆ𝐯 )2 + 𝐹 ′ (𝑅) ˆ 𝑅ˆ𝐯 , 𝐴𝐯 = 𝐹 ″ (𝑅)(

(4)

and similarly for 𝐴𝐯 and 𝐴𝐯 . Finally, (5) 5 ˆ + 𝐹 (5) (𝑅ˆ + 𝜉 ′ )(𝑅ˆ(1) 𝐴𝐯 = 𝐹 ′ (𝑅)Ω 𝐯 ) + ⋯.

16.1. GREEN FUNCTION COMPARISON THEOREMS

187

(𝑚)

The expectation values of the terms 𝐴𝐯 , 𝑚 ≤ 4, with respect to 𝑣𝑖𝑗 are determined by the first four moments of 𝑣𝑖𝑗 ; e.g., (2)

𝔼𝐴𝐯 = 𝐹 ′ (𝑅ˆ )[

1 ∑ 𝑅 𝑅 𝑅 + ⋯ ]𝔼|𝑣𝑖𝑗 |2 𝑁 𝑘 𝑘𝑖 𝑗𝑗 𝑖𝑘

+ 𝐹 ″ (𝑅ˆ )[

1 ∑ 𝑅 𝑅 𝑅 𝑅 + ⋯ ]𝔼|𝑣𝑖𝑗 |2 𝑁 2 𝑘,ℓ 𝑘𝑖 𝑗ℓ ℓ𝑗 𝑖𝑘

1 2 + 𝐹 ′ (𝑅ˆ )[ ∑ 𝑅𝑘𝑖 𝑅𝑗𝑖 𝑅𝑗𝑘 + ⋯ ]𝔼 𝑣𝑖𝑗 𝑁 𝑘 + 𝐹 ″ (𝑅ˆ )[

1 2 ∑ 𝑅 𝑅 𝑅 𝑅 + ⋯ ]𝔼𝑣𝑖𝑗 . 𝑁 2 𝑘,ℓ 𝑘𝑖 𝑗ℓ ℓ𝑖 𝑗𝑘

Note that the coefficients involve up to four derivatives of 𝐹 and normalized sums of matrix elements of 𝑅. Using the estimate (15.16) for 𝑅 and the derivative ˆ we see that all these coefficients are bounds (16.3) for the typical values of 𝑅, 𝐶(𝜏+𝜀) bounded by 𝑁 with a very large probability, where 𝐶 is an explicit constant. ˆ but this event has a very We use the bound (16.4) for the extreme values of 𝑅, 𝑠 𝑢 small probability by (15.16). Therefore, the coefficients of the moments 𝔼𝑣 𝑖𝑗 𝑣𝑖𝑗 , (0) (4) 𝑢+𝑠 ≤ 4, in the quantities 𝐴𝐯 , … , 𝐴𝐯 are essentially bounded, modulo a factor 𝑁 𝐶(𝜏+𝜀) . Notice that the 𝑘th moment of 𝑣𝑖𝑗 appears only in the 𝑚 = 𝑘 term that already has a prefactor 𝑁 −𝑘/2 in (16.10). (5)

Finally, we have to estimate the error term 𝐴𝐯 . All terms without Ω can be dealt with as before; after estimating the derivatives of 𝐹 by 𝑁 𝐶(𝜏+𝜀) , one can perform the expectation with respect to 𝑣𝑖𝑗 that is independent of 𝑅ˆ(𝑚) . For the terms involving Ω, one can argue similarly by appealing to the fact that the matrix elements of 𝑆 are also essentially bounded by 𝑁 𝐶(𝜏+𝜀) (see (15.16)) and that 𝑣𝑖𝑗 has subexponential decay. Alternatively, one can use the Hölder inequality to decouple 𝑆 from the rest and use (15.16) directly, e.g., 1 ˆ ˆ Tr(𝑅𝑉)5 𝑆| 𝔼|𝐹 ′ (𝑅)Ω| = 𝔼|𝐹 ′ (𝑅) 𝑁 1 ˆ 2 Tr 𝑆 2 ]1/2 [𝔼 Tr(𝑅𝑉)5 (𝑉𝑅∗ )5 ]1/2 ≤ [𝔼(𝐹 ′ (𝑅)) 𝑁 ≤ 𝐶𝑁 −5/2+𝐶(𝜏+𝜀) . Note that exactly the same perturbation expansion holds for the resolvent of 𝐻𝛾−1 with 𝐴𝐯 replaced by 𝐴𝐰 ; i.e., 𝑣𝑖𝑗 is replaced with 𝑤𝑖𝑗 everywhere. By the moment matching condition (16.2), the expectation values (𝑚)

𝑁 −𝑚/2 |𝔼𝐴𝐯

(𝑚) − 𝔼𝐴𝐰 | ≤ 𝑁 𝐶(𝜏+𝜀)−𝛿−2

for 𝑚 ≤ 4 in (16.10). Choosing 𝜏 = 𝜀, we have 𝔼𝐹(

1 1 1 1 Tr ) − 𝔼𝐹( Tr ) ≤ 𝐶𝑁 −5/2+𝐶𝜀 + 𝐶𝑁 −2−𝛿+𝐶𝜀 . 𝑁 𝐻𝛾 − 𝑧 𝑁 𝐻𝛾−1 − 𝑧

188

16. UNIVERSALITY OF WIGNER MATRICES IN SMALL ENERGY WINDOWS: GFT

After summing up in (16.7) we have thus proved that 1 1 1 1 Tr ) − 𝔼𝐹( Tr ) ≤ 𝐶𝑁 −1/2+𝐶𝜀 + 𝐶𝑁 −𝛿+𝐶𝜀 . 𝑁 𝑁 𝐻 (𝐯) − 𝑧 𝐻 (𝐰) − 𝑧 The proof can be easily generalized to functions of several variables. This concludes the proof of Theorem 16.1. □ 𝔼𝐹(

16.2. Conclusion of the Three-Step Strategy In this section we put the previous results together to prove our main result, Theorem 5.1. Recall that the main result of the Step 2, Theorem 12.4, asserts the universality of the Gaussian divisible ensemble. However, in order to reach energy window 𝑁 −1+𝜀 , we need to add a substantially large Gaussian component. In terms of DBM, we need the time 𝑡 ≥ 𝑁 −𝑐𝜀 for some 𝑐 > 0. The eigenvalue continuity result in the previous section is unfortunately restricted to 𝑡 ≤ 𝑁 −1−𝜀 . To overcome the deficiency of the continuity, we will approximate the Wigner ensemble by a suitably chosen Gaussian divisible ensemble. Thus, we term this step “approximation by a Gaussian divisible ensemble.” We first review the classical moment matching problem for real random variables. For any real random variable 𝜉, denote by 𝑚𝑘 (𝜉) = 𝔼𝜉 𝑘 its 𝑘th moment. By the Schwarz inequality, the sequence of moments, 𝑚1 , 𝑚2 , … are not arbitrary numbers, e.g., |𝑚1 |𝑘 ≤ 𝑚𝑘 and 𝑚22 ≤ 𝑚4 , etc., but there are more subtle relations. For example, if 𝑚1 = 0, then 𝑚4 𝑚2 − 𝑚32 ≥ 𝑚23 ,

(16.11) which can be obtained by

𝑚32 = [𝔼𝜉 3 ]2 = [𝔼𝜉(𝜉 2 − 1)]2 ≤ [𝔼𝜉 2 ][𝔼(𝜉 2 − 1)2 ] = 𝑚2 (𝑚4 − 2𝑚22 + 1) and noticing that (16.11) is scale invariant, so it is sufficient to prove it for 𝑚2 = 1. In fact, it is easy to see that (16.11) saturates if and only if the support of 𝜉 consists of exactly two points (apart from the trivial case when 𝜉 ≡ 0). This restriction shows that given a sequence of four admissible moments, 𝑚1 = 0, 𝑚2 = 1, 𝑚3 , 𝑚4 , there may not exist a Gaussian divisible random variable 𝜉 with these moments; e.g., the moment sequence (𝑚1 , 𝑚2 , 𝑚3 , 𝑚4 ) = (0, 1, 0, 1) uniquely characterizes the standard Bernoulli variable (𝜉 = ±1 with 1/2-1/2 probability). However, if we allow a bit of room in the fourth moment, then one can match any four admissible moments by a small Gaussian divisible variable. This is the content of the next lemma [69, lemma 6.5]. This matching idea in the context of random matrices appeared earlier in [131]. For simplicity, we formulate it in the case of real variables; the extension to complex variables is standard. Lemma 16.2 (Moment matching). Let 𝑚3 and 𝑚4 be two real numbers such that (16.12)

𝑚4 − 𝑚32 − 1 ≥ 0,

𝑚4 ≤ 𝐶2 ,

16.2. CONCLUSION OF THE THREE-STEP STRATEGY

189

for some positive constant 𝐶2 . Let 𝜉 G be a Gaussian random variable with mean 0 and variance 1. Then, for any sufficiently small 𝛾 > 0 (depending on 𝐶2 ), there exists a real random variable 𝜉𝛾 with subexponential decay that is independent of 𝜉 G such that the first four moments of 𝜉 ′ = (1 − 𝛾)1/2 𝜉𝛾 + 𝛾1/2 𝜉 G

(16.13)

are 𝑚1 (𝜉 ′ ) = 0, 𝑚2 (𝜉 ′ ) = 1, 𝑚3 (𝜉 ′ ) = 𝑚3 , and 𝑚4 (𝜉 ′ ), and |𝑚4 (𝜉 ′ ) − 𝑚4 | ≤ 𝐶𝛾

(16.14) for some 𝐶 depending on 𝐶2 .

Proof. We first construct a random variable 𝑋 with the first four moments given by 0, 1, 𝑚3 , 𝑚4 satisfying 𝑚4 − 𝑚32 − 1 ≥ 0. We take the law of 𝑋 to be of the form 𝑝𝛿𝑎 + 𝑞𝛿−𝑏 + (1 − 𝑝 − 𝑞)𝛿0 , where 𝑎, 𝑏, 𝑝, 𝑞 ≥ 0 are parameters satisfying 𝑝+𝑞 ≤ 1. The conditions 𝑚1 (𝑋) = 0 and 𝑚2 (𝑋) = 1 imply 1 1 𝑝= , 𝑞= . 𝑎(𝑎 + 𝑏) 𝑏(𝑎 + 𝑏) Furthermore, the condition 𝑝 +𝑞 ≤ 1 reads 𝑎𝑏 ≥ 1. By an explicit computation, we find (16.15)

𝑚3 (𝑋) = 𝑎 − 𝑏,

𝑚4 (𝑋) = 𝑚3 (𝑋)2 + 𝑎𝑏.

Clearly, the condition 𝑚4 − 𝑚32 − 1 ≥ 0 implies that (16.15) has a solution with 𝑎𝑏 ≥ 1. This proves the existence of a random variable supported at three points given four moments 0, 1, 𝑚3 , and 𝑚4 satisfying 𝑚4 − 𝑚32 − 1 ≥ 0. Our main task is to construct a Gaussian divisible distribution that approximates the four moment matching conditions stated in the lemma. For any real random variable 𝜁, independent of 𝜉 G , and with the first four moments being 0, 1, 𝑚3 (𝜁) and 𝑚4 (𝜁) < ∞, the first four moments of 𝜁 ′ = (1 − 𝛾)1/2 𝜁 + 𝛾1/2 𝜉 G

(16.16) are 0, 1,

𝑚3 (𝜁 ′ ) = (1 − 𝛾)3/2 𝑚3 (𝜁),

(16.17) and (16.18)

𝑚4 (𝜁 ′ ) = (1 − 𝛾)2 𝑚4 (𝜁) + 6𝛾 − 3𝛾2 .

Since we can match four moments by a random variable, for any 𝛾 > 0 there exists a real random variable 𝜉𝛾 such that the first four moments are (16.19)

0,

1,

𝑚3 (𝜉𝛾 ) = (1 − 𝛾)−3/2 𝑚3 ,

and 𝑚4 (𝜉𝛾 ) = 𝑚3 (𝜉𝛾 )2 + (𝑚4 − 𝑚32 ).

With 𝑚4 ≤ 𝐶2 , we have 𝑚32 ≤ 𝐶23/2 ; thus, (16.20)

|𝑚4 (𝜉𝛾 ) − 𝑚4 | ≤ 𝐶𝛾

190

16. UNIVERSALITY OF WIGNER MATRICES IN SMALL ENERGY WINDOWS: GFT

for some 𝐶 depending on 𝐶2 . Hence with (16.17) and (16.18), we obtain that 𝜉 ′ = (1 − 𝛾)1/2 𝜉𝛾 + 𝛾1/2 𝜉 G satisfies 𝑚3 (𝜉 ′ ) = 𝑚3 and (16.14). This completes the proof of Lemma 16.2. □ We now prove Theorem 5.1, i.e., that the limit in (15.1) holds. We restrict ourselves to the real symmetric case since the moment matching Lemma 16.2 was formulated for real random variables. A similar argument works for the complex case. From the universality of Gaussian divisible ensembles (more precisely, (12.24)) for any 𝑏 = 𝑁 −1+10𝜀 and 𝑡 ≥ 𝑁 −𝜀 , we have 𝐸+𝑏

1 𝜶 (𝑛) (𝑛) ∫ d𝐸 ′ ∫ d𝜶 𝑂(𝜶)(𝑝𝑡,𝑁 − 𝑝𝐺,𝑁 )(𝐸 ′ + ) → 0 2𝑏 𝐸−𝑏 𝑁 ℝ𝑛

(16.21) (𝑛)

where 𝑝𝑡,𝑁 is the eigenvalue correlation function of the matrix ensemble 𝐻𝑡 = 𝑒−𝑡/2 𝐻0 + √1 − 𝑒−𝑡 𝐻 G . The initial matrix ensemble 𝐻0 can be any Wigner ensemble. Recall that 𝐻 is the Wigner ensemble for which we wish to prove (15.1). Given the first four moments 𝑚1 = 0, 𝑚2 = 1, 𝑚3 , and 𝑚4 of the rescaled matrix elements √𝑁ℎ𝑖𝑗 of 𝐻, we first use Lemma 16.2 to construct a distribution 𝜉𝛾 with 𝛾 = 1 − 𝑒−𝑡 . This distribution has the property that the first four moments of 𝜉 ′ , defined by (16.13), matches the first four moments of the rescaled matrix elements of 𝐻. We now choose 𝐻0 to be the Wigner ensemble with rescaled matrix elements distributed according to 𝜉𝛾 . Then, on one hand, the matrix 𝐻𝑡 will have universal local spectral statistics in the sense of (16.21); on the other hand, we can apply the Green function comparison Theorem 16.1 to the matrix ensembles 𝐻 and 𝐻𝑡 so that (15.9) holds for these two ensembles. Clearly, (15.1) follows from (16.21) and (15.9). Note that averaging in the energy parameter 𝐸 was necessary only for the first step in (16.21), the comparison argument works for any fixed energy 𝐸.

CHAPTER 17

Edge Universality The main result of edge universality for the generalized Wigner matrix is given by the following theorem. For simplicity, we formulate it only for the real symmetric case; the complex Hermitian case is analogous. Theorem 17.1 (Universality of extreme eigenvalues). Suppose that we have two real symmetric generalized 𝑁 × 𝑁 Wigner matrices, 𝐻 (𝐯) and 𝐻 (𝐰) , with matrix elements ℎ𝑖𝑗 given by the random variables 𝑁 −1/2 𝑣𝑖𝑗 and 𝑁 −1/2 𝑤𝑖𝑗 , respectively, with 𝑣𝑖𝑗 and 𝑤𝑖𝑗 satisfying the uniform subexponential decay condition (2.7). Suppose that 2 2 𝔼𝑣𝑖𝑗 = 𝔼𝑤𝑖𝑗 .

(17.1) (𝐯)

(𝐰)

Let 𝜆𝑁 and 𝜆𝑁 denote the largest eigenvalues of 𝐻 (𝐯) and 𝐻 (𝐰) , respectively. Then there is an 𝜀 > 0 and 𝛿 > 0 depending on 𝜗 in (2.7) such that for any real parameter 𝑠 (may depend on 𝑁) we have (𝐯)

ℙ(𝑁 2/3 (𝜆𝑁 − 2) ≤ 𝑠 − 𝑁 −𝜀 ) − 𝑁 −𝛿 (17.2)

(𝐰)

≤ ℙ(𝑁 2/3 (𝜆𝑁 − 2) ≤ 𝑠) (𝐯)

≤ ℙ(𝑁 2/3 (𝜆𝑁 − 2) ≤ 𝑠 + 𝑁 −𝜀 ) + 𝑁 −𝛿 for 𝑁 ≥ 𝑁0 sufficiently large, where 𝑁0 is independent of 𝑠. An analogous result holds for the smallest eigenvalue 𝜆1 . This theorem shows that the statistics of the extreme eigenvalues depend only on the second moments of the centered matrix entries, but the result does not determine the distribution. For the Gaussian case, the corresponding distribution was identified by Tracy and Widom in [134, 135]. Theorem 17.1 immediately gives the same result for Wigner matrices, but not yet for generalized Wigner matrices. In fact, the Tracy-Widom law for generalized Wigner matrices does not follow from Theorem 17.1, and its proof in [24] required a quite different argument (see Theorem 18.7). (𝐯) We first give an outline of the proof of Theorem 17.1. Denote by 𝜚𝑁 the empirical eigenvalue distribution (𝐯) 𝜚𝑁 (𝑥)

𝑁

1 (𝐯) ∑ 𝛿(𝑥 − 𝜆𝛼 ). = 𝑁 𝛼=1 191

192

17. EDGE UNIVERSALITY (𝐯)

(𝐰)

Our goal is to compare the difference of 𝜚𝑁 and 𝜚𝑁 via their Stieltjes trans(𝐯)

(𝐰)

forms, 𝑚𝑁 (𝑧) and 𝑚𝑁 (𝑧). We will be able to locate the largest eigenvalue via the distribution of certain functionals of the Stieltjes transforms. This will be done in the preparatory Lemma 17.2 and its corollary, Corollary 17.3. Therefore, (𝐯)

(𝐰)

(𝐯)

if 𝑚𝑁 (𝑧) and 𝑚𝑁 (𝑧) are sufficiently close, we will be able to compare 𝜆𝑁 and (𝐰)

𝜆𝑁 . The main ingredient of the proof of Theorem 17.1 is the Green function comparison theorem at the edge (Theorem 17.4), which shows that the distributions (𝐯) (𝐰) of 𝑚𝑁 (𝑧) and 𝑚𝑁 (𝑧) on the critical scale Im 𝑧 ≈ 𝑁 −2/3 are the same provided the second moments of the matrix elements match. This comparison will be done by a resolvent expansion as done for the Green function comparison theorem in the bulk (Theorem 16.1). However, the resolvent expansion is more efficient at the edge for a reason that we explain now. Recall (6.13), stating that 𝑚sc (𝑧) ≍ 𝜂 if 𝜅 = ||𝐸| − 2| ≤ 𝜂 and 𝑧 = 𝐸 + 𝑖𝜂. Since the extremal eigenvalue gaps are expected to be order 𝑁 −2/3 , we need to set 𝜂 ≪ 𝑁 −2/3 in order to identify individual eigenvalues. This is a smaller scale than that of the local semicircle law; in fact, 𝑚𝑁 (𝑧) does not have a deterministic limit; we need to identify its distribution. Hence, the reference size of Im 𝑚𝑁 (𝑧) that we should keep in mind is 𝑁 −1/3 . The largest eigenvalue can be located via the Stieltjes transform as follows. By definition of 𝑚𝑁 , if there is an eigenvalue in a neighborhood of 𝐸 within distance 𝜂, then Im 𝑚𝑁 (𝐸 + i𝜂) =

𝜂 1 1 ∑ ≥ ≫ 𝑁 −1/3 ; 2 2 𝑁 𝛼 (𝜆𝛼 − 𝐸) + 𝜂 𝑁𝜂

otherwise Im 𝑚𝑁 (𝐸 +i𝜂) ≲ 𝑁 −1/3 . Based upon this idea, we will construct a certain functional of Im 𝑚𝑁 that expresses the distribution of 𝜆𝑁 . In other words, we expect that if we can control the imaginary part of the trace of the Green function by 𝑁 −1/3 , then we can identify the individual extremal eigenvalues. Roughly speaking, our basic principle is that whenever we understand 𝑚𝑁 (𝑧) with a precision better than (𝑁𝜂)−1 (i.e., we can improve over the strong local semicircle law (6.32) 1 |𝑚𝑁 (𝑧) − 𝑚sc (𝑧)| ≺ , Im 𝑧 = 𝜂, 𝑁𝜂 at some scale 𝜂), then we can “identify” the eigenvalue distribution at that scale precisely. It is instructive to compare the edge situation with the bulk, where the critical scale 𝜂 ≍ 1/𝑁 identifies individual eigenvalues and the typical size of Im 𝑚𝑁 is of order 1. In other words, the critical scale Im 𝑚𝑁 is of order 1 in the bulk and is of order 𝑁 −1/3 at the edge. Owing to (6.31), this fact extends to each resolvent matrix element, i.e., |𝐺𝑖𝑗 | ≲ 𝑁 −1/3 at the edge. Thus a resolvent expansion is more powerful near the edge, which explains that less moment matching is needed than in the bulk.

17. EDGE UNIVERSALITY

193

Now we start the detailed proof of Theorem 17.1. We first introduce some notation. For any 𝐸1 ≤ 𝐸2 , let 𝒩(𝐸1 , 𝐸2 ) ∶= #{𝑗 ∶ 𝐸1 ≤ 𝜆𝑗 ≤ 𝐸2 } = Tr 1[𝐸1 ,𝐸2 ] (𝐻) be the unnormalized counting function of the eigenvalues. Set 𝐸+ ∶= 2 + 𝑁 −2/3+𝜀 . From rigidity, Theorem 11.5, we know that 𝜆𝑁 ≤ 𝐸+ , i.e., (17.3)

𝒩(𝐸, ∞) = 𝒩(𝐸, 𝐸+ )

with very high probability, so it is sufficient to consider the counting function 𝒩(𝐸, 𝐸+ ) = Tr 𝜒𝐸 (𝐻), with 𝜒𝐸 (𝑥) ∶= 1[𝐸,𝐸+ ] (𝑥). We will approximate the sharp cutoff function 𝜒𝐸 by its smoothened version 𝜒𝐸 ⋆𝜃𝜂 on scale 𝜂 ≪ 𝑁 −2/3 , where we recall the notation 𝜃𝜂 (𝑥) from (15.21). The following lemma shows that the error caused by this replacement is negligible. Lemma 17.2. For any 𝜀 > 0 set ℓ1 = 𝑁 −2/3−3𝜀 and 𝜂 = 𝑁 −2/3−9𝜀 . Then there exist constants 𝐶, 𝑐 such that for any 𝐸 satisfying |𝐸 − 2| ≤ 32 𝑁 −2/3+𝜀 and for any 𝐷 we have (17.4) ℙ{| Tr 𝜒𝐸 (𝐻) − Tr 𝜒𝐸 ⋆ 𝜃𝜂 (𝐻)| ≤ 𝐶(𝑁 −2𝜀 + 𝒩(𝐸 − ℓ1 , 𝐸 + ℓ1 ))} ≥ 1 − 𝑁 −𝐷 if 𝑁 ≥ 𝑁0 (𝐷) is large enough. We will not give the detailed proof of this lemma since it is a straightforward approximation argument. We only point out that 𝜃𝜂 is an approximate delta function on scale 𝜂 ≪ 𝑁 −2/3 , and if it were compactly supported, then the difference between Tr 𝜒𝐸 (𝐻) and Tr 𝜒𝐸 ⋆ 𝜃𝜂 (𝐻) were clearly bounded by the number of eigenvalues in an 𝜂-vicinity of 𝐸. Since 𝜃𝜂 (𝑥) = 𝜋 −1 𝜂/(𝑥 2 + 𝜂 2 ); i.e., it has a quadratically decaying tail, the above argument must be complemented by estimating the density of eigenvalues away from 𝐸. This estimate comes from two contributions. Eigenvalues within the ℓ1 -vicinity of 𝐸 are directly estimated by the term 𝒩(𝐸 − ℓ1 , 𝐸 + ℓ1 ). Eigenvalues farther away come with an additional factor 𝜂/ℓ1 = 𝑁 −6𝜀 due to the decay of 𝜃𝜂 ; thus, it is sufficient to estimate their density only up to an 𝑁 𝜀 factor precision, which is provided by the optimal local law at the edge. For precise details, see the proof of lemma 6.1 in [70] for essentially the same argument. Using (17.3) and the local law to estimate a slightly averaged version of 𝒩(𝐸 − ℓ1 , 𝐸 + ℓ1 ), we arrive at the following corollary: Corollary 17.3. Let |𝐸 − 2| ≤ 𝑁 −2/3+𝜀 , ℓ = 12 ℓ1 𝑁 2𝜀 = 12 𝑁 −2/3−𝜀 . Then the inequality (17.5)

Tr 𝜒𝐸+ℓ ⋆ 𝜃𝜂 (𝐻) − 𝑁 −𝜀 ≤ 𝒩(𝐸, ∞) ≤ Tr 𝜒𝐸−ℓ ⋆ 𝜃𝜂 (𝐻) + 𝑁 −𝜀

194

17. EDGE UNIVERSALITY

holds with probability bigger than 1 − 𝑁 −𝐷 for any 𝐷 if 𝑁 is large enough. Furthermore, we have (17.6)

𝔼𝐹(Tr 𝜒𝐸−ℓ ⋆ 𝜃𝜂 (𝐻)) ≤ ℙ(𝒩(𝐸, ∞) = 0) ≤ 𝔼𝐹(Tr 𝜒𝐸+ℓ ⋆ 𝜃𝜂 (𝐻)) + 𝐶𝑁 −𝐷 . 3

Proof. We have 𝐸+ − 𝐸 ≫ ℓ; thus |𝐸 − 2 − ℓ| ≤ 𝑁 −2/3−𝜀 , and therefore 2 (17.4) holds for 𝐸 replaced with 𝑦 ∈ [𝐸 − ℓ, 𝐸] as well. We thus obtain 𝐸

Tr 𝜒𝐸 (𝐻) ≤ ℓ

−1



d𝑦 Tr 𝜒𝑦 (𝐻)

𝐸−ℓ 𝐸

≤ ℓ−1 ∫

d𝑦 Tr 𝜒𝑦 ∗ 𝜃𝜂 (𝐻)

𝐸−ℓ 𝐸

+ 𝐶ℓ−1 ∫

d𝑦[𝑁 −2𝜀 + 𝒩(𝑦 − ℓ1 , 𝑦 + ℓ1 )]

𝐸−ℓ

≤ Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻) + 𝐶𝑁 −2𝜀 + 𝐶

ℓ1 𝒩(𝐸 − 2ℓ, 𝐸 + ℓ) ℓ

with a probability larger than 1 − 𝑁 −𝐷 for any 𝐷 > 0. From the rigidity estimate in the form (11.25), the conditions |𝐸 − 2| ≤ 𝑁 −2/3+𝜀 , ℓ1 /ℓ = 2𝑁 −2𝜀 , and ℓ ≤ 𝑁 −2/3 , we can bound 𝐸+ℓ

ℓ1 𝒩(𝐸 − 2ℓ, 𝐸 + ℓ) ≤ 𝑁 1−2𝜀 ∫ 𝜚sc (𝑥)d𝑥 + 𝑁 −2𝜀 𝑁 𝜀 ≤ 𝐶𝑁 −𝜀 ℓ 𝐸−2ℓ with a very high probability, where we estimated the explicit integral using that the integration domain is in a 𝐶𝑁 −2/3+𝜀 -vicinity of the edge at 2. We have thus proved 𝒩(𝐸, 𝐸+ ) = Tr 𝜒𝐸 (𝐻) ≤ Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻) + 𝑁 −𝜀 . By (17.3), we can replace 𝒩(𝐸, 𝐸+ ) by 𝒩(𝐸, ∞). This proves the upper bound of (17.5) and the lower bound can be proved similarly. In the event that (17.5) holds, the condition 𝒩(𝐸, ∞) = 0 implies that Tr 𝜒𝐸+ℓ ∗ 𝜃𝜂 (𝐻) ≤ 1/9. Thus we have (17.7)

ℙ(𝒩(𝐸, ∞) = 0) ≤ ℙ(Tr 𝜒𝐸+ℓ ∗ 𝜃𝜂 (𝐻) ≤ 1/9) + 𝐶𝑁 −𝐷 .

Together with the Markov inequality, this proves the upper bound in (17.6). For the lower bound, we use 𝔼𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) ≤ ℙ(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻) ≤ 2/9) ≤ ℙ(𝒩(𝐸, ∞) ≤ 2/9 + 𝑁 −𝜀 ) = ℙ(𝒩(𝐸, ∞) = 0), where we used the upper bound from (17.5) and that 𝒩 is an integer. This completes the proof of the corollary. □

17. EDGE UNIVERSALITY

195

Since ℙ(𝜆𝑁 < 𝐸) = ℙ(𝒩(𝐸, ∞) = 0), we will use (17.6) and the identity 𝐸+

(17.8)

Tr 𝜒𝐸 ⋆ 𝜃𝜂 (𝐻) = 𝑁 ∫

d𝑦 Im 𝑚𝑁 (𝑦 + 𝑖𝜂)

𝐸

to relate the distribution of 𝜆𝑁 with that of the Stieltjes transform 𝑚𝑁 . The following Green function comparison theorem shows that the distribution of the right-hand side of (17.8) depends only on the second moments of the matrix elements. Its proof will be given after we have completed the proof of Theorem 17.1. Theorem 17.4 (Green function comparison theorem on the edge). Suppose that the assumptions of Theorem 17.1, including (17.1), hold. Let 𝐹 ∶ ℝ → ℝ be a function whose derivatives satisfy (17.9)

max |𝐹 (𝛼) (𝑥)| (|𝑥| + 1)

−𝐶1

𝑥

≤ 𝐶1 ,

𝛼 = 1, 2, 3,

with some constant 𝐶1 > 0. Then, there exists 𝜀0 > 0 depending only on 𝐶1 such that for any 𝜀 < 𝜀0 and for any real numbers 𝐸1 and 𝐸2 satisfying |𝐸1 − 2| ≤ 𝐶𝑁 −2/3+𝜀 ,

|𝐸2 − 2| ≤ 𝐶𝑁 −2/3+𝜀 ,

and setting 𝜂 = 𝑁 −2/3−𝜀 , we have 𝐸2 | (17.10) ||𝔼𝐹(𝑁 ∫ d𝑦 Im 𝑚(𝐯) (𝑦 + 𝑖𝜂)) | 𝐸1 𝐸2

− 𝔼𝐹(𝑁 ∫ 𝐸1

| d𝑦 Im 𝑚(𝐰) (𝑦 + 𝑖𝜂))| ≤ 𝐶𝑁 −1/6+𝐶𝜀 |

for some constant 𝐶 and large enough 𝑁 depending only on 𝐶1 , 𝜗, 𝛿± , and 𝐶0 (in (14.26)). Note that the 𝑁 times the integration gives a factor 𝑁|𝐸1 − 𝐸2 | ∼ 𝑁 1/3+𝜀 on the left-hand side. This factor compensates for the “natural” size of the imaginary part of the Stieltjes transform obtained from the local semicircle law and makes the argument of 𝐹 to be of order 1. The bound (17.10) shows that the distributions of this order 1 quantity with respect to the 𝐯- and 𝐰-ensembles are asymptotically the same. Assuming that Theorem 17.4 holds, we now prove Theorem 17.1. Proof of Theorem 17.1. By the rigidity of the eigenvalues in the form of (11.25), we know that |𝜆𝑁 − 2| ≤ 𝑁 −2/3+𝜀 ,

𝒩(2 − 𝐶𝑁 −2/3+𝜀 , 2 + 𝐶𝑁 −2/3+𝜀 ) ≤ 𝐶𝑁 𝜀 ,

with very high probability. Thus we can assume that |𝑠| ≤ 𝑁 𝜀 holds for the parameter 𝑠 in Theorem 17.1; the statement is trivial for other values of 𝑠. We

196

17. EDGE UNIVERSALITY

define 𝐸 ∶= 2 + 𝑠𝑁 −2/3 . With the left side of (17.6), for any sufficiently small 𝜀 > 0, we have 𝔼𝐰 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) ≤ ℙ𝐰 (𝒩(𝐸, ∞) = 0)

(17.11) with the choice

1 ℓ ∶= 𝑁 −2/3−𝜀 , 𝜂 ∶= 𝑁 −2/3−9𝜀 . 2 The bound (17.10) applying to the case 𝐸1 = 𝐸 − ℓ and 𝐸2 = 𝐸+ shows that there exists 𝛿 > 0, for sufficiently small 𝜀 > 0, such that (17.12)

𝔼𝐯 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) ≤ 𝔼𝐰 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) + 𝑁 −𝛿

(note that 9𝜀 plays the role of the 𝜀 in the Green function comparison theorem). Then, applying the right side of (17.6) to the left-hand side of (17.12), we have (17.13)

ℙ𝐯 (𝒩(𝐸 − 2ℓ, ∞) = 0) ≤ 𝔼𝐯 𝐹(Tr 𝜒𝐸−ℓ ∗ 𝜃𝜂 (𝐻)) + 𝐶𝑁 −𝐷 .

Combining these inequalities, we have (17.14)

ℙ𝐯 (𝒩(𝐸 − 2ℓ, ∞) = 0) ≤ ℙ𝐰 (𝒩(𝐸, ∞) = 0) + 2𝑁 −𝛿

for sufficiently small 𝜀 > 0 and sufficiently large 𝑁. By recalling that 𝐸 = 2 + 𝑠𝑁 −2/3 , this proves the first inequality of (17.2) and, by switching the roles of 𝐯 and 𝐰, the second inequality of (17.2) as well. This completes the proof of Theorem 17.1. □ Proof of Theorem 17.4. We follow the notation and the setup in the proof of Theorem 16.1. First, we prove a simpler version of (17.10) with 𝐹(𝑥) = 𝑥 and without integration. Namely, we show that for any 𝐸 with |𝐸 − 2| ≤ 𝑁 −2/3+𝜀 we have (𝐰) (𝐯) 𝑁𝜂 ||𝔼𝑚𝑁 (𝑧) − 𝔼𝑚𝑁 (𝑧)|| 1 1 1 | 1 | (17.15) | = 𝑁𝜂 ||𝔼 Tr − 𝔼 Tr (𝐯) (𝐰) 𝑁 𝑁 𝐻 −𝑧 𝐻 − 𝑧| ≤ 𝐶𝑁 −1/6+𝐶𝜀 , 𝑧 = 𝐸 + 𝑖𝜂. The prefactor 𝑁𝜂 accounts for the 𝑁 times the integration from 𝐸1 to 𝐸2 in (17.10) since |𝐸1 − 𝐸2 | ≤ 𝐶𝜂. We write (17.16) 𝔼

1 1 1 1 Tr − 𝔼 Tr = (𝐯) (𝐰) 𝑁 𝑁 𝐻 −𝑧 𝐻 −𝑧 𝛾(𝑁)

∑ [𝔼 𝛾=1

1 1 1 1 Tr − 𝔼 Tr ] 𝑁 𝐻𝛾 − 𝑧 𝑁 𝐻𝛾−1 − 𝑧

with 𝑅 ∶= (𝑄 − 𝑧)−1 , 𝑆 ∶= (𝐻𝛾 − 𝑧)−1 , and (17.17)

1 Tr 𝑆 = 𝑅ˆ + 𝜉𝐯 , 𝑁

4

(𝑚)

𝜉𝐯 ∶= ∑ 𝑁 −𝑚/2 𝑅ˆ𝐯 𝑚=1

+ 𝑁 −5/2 Ω𝐯 ,

17. EDGE UNIVERSALITY

197

with 1 𝑅ˆ ∶= Tr 𝑅, 𝑁

1 (𝑚) 𝑅ˆ𝐯 ∶= (−1)𝑚 Tr(𝑅𝑉)𝑚 𝑅, 𝑁 1 5 Ω𝐯 ∶= − Tr(𝑅𝑉) 𝑆. 𝑁

(17.18)

All these quantities depend on 𝛾, or equivalently on the pair of indices (𝑖, 𝑗) with ˆ 𝜙(𝑖, 𝑗) = 𝛾 (see the proof of Theorem 16.1); i.e., 𝑅ˆ = 𝑅(𝛾), etc., but we will omit this fact from the notation. We also define the same quantities for 𝐯 replaced with 𝐰. By the moment matching condition (17.1), (𝑚)

𝔼𝑅ˆ𝐯

(𝑚)

= 𝔼𝑅ˆ𝐰 ,

𝑚 = 0, 1, 2,

so we can consider the 𝑚 = 3, 4 terms only in the summation in (17.17). Since |𝐸 −2| ≤ 𝑁 −2/3+𝜀 and 𝜂 = 𝑁 −2/3−𝜀 , the strong local semicircle law (6.32) implies that (17.19)

|𝑅𝑖𝑗 (𝑧) − 𝛿𝑖𝑗 𝑚sc (𝑧)| ≺

Im 𝑚sc (𝑧) 1 + ≤ 𝑁 −1/3+𝐶𝜀 , 𝑁𝜂 √ 𝑁𝜂

and a similar bound holds for 𝑆. In particular, the off-diagonal terms of the Green functions are of order 𝑁 −1/3+𝐶𝜀 and the diagonal terms are bounded with high (𝑚) probability. As a crude bound, we immediately obtain that |𝑅ˆ𝐯 |, |Ω𝐯 | ≤ 𝑁 𝐶𝜀 with high probability. Hence we have 𝛾(𝑁)

(17.20)

1 1 1 | 1 | |≤ ∑ (𝑁𝜂)||𝔼 Tr − 𝔼 Tr | 𝑁 𝐻 − 𝑧 𝑁 𝐻 − 𝑧 𝛾 𝛾−1 𝛾=1 4

(𝑚) (𝑚) ∑ 𝑁 2 (𝑁𝜂)𝑁 −𝑚/2 max|𝔼[𝑅ˆ𝐯 − 𝑅ˆ𝐰 ]| + 𝜂𝑁 1/2+𝐶𝜀 . 𝛾

𝑚=3

Given a fixed 𝛾 with 𝜙(𝑖, 𝑗) = 𝛾, we have the explicit formula (𝑚) (𝑚) 𝑅ˆ𝐯 = 𝑅ˆ𝐯 (𝛾)

(17.21)

𝑚

= (−1)𝑚

1 ∑∑ ∑ [𝑅 𝑣 𝑅 … 𝑣𝑎𝑚 𝑏𝑚 𝑅𝑏𝑚 𝑘 ]. 𝑁 𝑘 ℓ=1 {𝑎 ,𝑏 }={𝑖,𝑗} 𝑘𝑎1 𝑎1 𝑏1 𝑏1 𝑎2 ℓ



First we discuss the generic case, when all indices are different, in particular 𝑖 ≠ 𝑗, and later we comment on the case of the coinciding indices. Notice that, generically, the first and the last resolvents 𝑅𝑘𝑎1 , 𝑅𝑏𝑚 𝑘 are off-diagonal terms; thus their size is 𝑁 −1/3+𝐶𝜀 . Every other factor in (17.21) is 𝑂(𝑁 𝜀 ). Hence the generic contributions to the sum (17.20) give (17.22)

(𝑚)

𝑁 2 (𝑁𝜂)𝑁 −𝑚/2 𝑅ˆ𝐯

≤ 𝑁 2−𝑚/2 𝑁 −1/3+𝐶𝜀 .

198

17. EDGE UNIVERSALITY

For 𝑚 = 4 this is 𝑁 −1/3+𝐶𝜀 , so it is negligible. For 𝑚 = 3, we have the following: (3)

𝑁 2 (𝑁𝜂)𝑁 −𝑚/2 𝑅ˆ𝐯 ≤ 𝑁 1/6+𝐶𝜀 ,

(17.23)

so the straightforward power counting is not sufficient. Our goal is to show that the expectation of this term is in fact much smaller. To see this, we can assume that on the right-hand side of (17.21), all the Green functions except the first and the last one are diagonal since otherwise we have an additional (third) off-diagonal term, which yields an extra factor 𝑁 −1/3+𝐶𝜀 , which would be sufficient if combined with the trivial power counting yielding (17.23). If both intermediate Green functions in (17.21) are diagonal, then we can replace each of them with 𝑚sc by using the strong local semicircle law (17.19) at the expense of a multiplicative error 𝑁 −1/3+𝐶𝜀 . This improves the previous power counting of order 𝑁 1/6+𝐶𝜀 to 𝑁 −1/6+𝐶𝜀 as required in (17.15). Thus we only have to consider the contributions from the terms 1 ∑ ∑ (17.24) 𝑚2 𝔼[𝑅𝑘𝑎1 𝑣𝑎1 𝑏1 𝑣𝑎2 𝑏2 𝑣𝑎3 𝑏3 𝑅𝑏3 𝑘 ], 𝑁 𝑘 {𝑎 ,𝑏 }={𝑖,𝑗} sc ℓ



(3)

contributing to 𝑅ˆ𝐯 , where the summation is restricted to the choices consistent with the fact that the all Green functions in the middle are diagonal, i.e., 𝑏1 = 𝑎2 , 𝑏2 = 𝑎3 . Together with the restriction {𝑎ℓ , 𝑏ℓ } = {𝑖, 𝑗} and the assumption that all indices are distinct, it yields only two terms: (17.25)

1 1 2 ∑ 𝑚2 𝔼[𝑅𝑘𝑖 𝑣𝑖𝑗 𝑣𝑗𝑖 𝑣𝑖𝑗 𝑅𝑗𝑘 ] + ∑ 𝑚sc 𝔼[𝑅𝑘𝑗 𝑣𝑗𝑖 𝑣𝑖𝑗 𝑣𝑗𝑖 𝑅𝑖𝑘 ]. 𝑁 𝑘 sc 𝑁 𝑘

Considering the prefactors 𝑁 2 (𝑁𝜂)𝑁 −3/2 ≤ 𝑁 5/6 in (17.23), in order to prove (17.15) we need to show that both terms in (17.25) are bounded by 𝑁 −1+𝐶𝜀 . Note that the trivial power counting from the two off-diagonal elements would only give the bound 𝑁 −2/3+𝐶𝜀 . In other words, we need to improve this trivial estimate at least by a factor of 𝑁 −1/3 . We will focus on the first term in (17.25). For the last resolvent entry, 𝑅𝑗𝑘 , we use the identity 𝑅𝑗𝑖 𝑅𝑖𝑘 (𝑖) 𝑅𝑗𝑘 = 𝑅𝑗𝑘 + 𝑅𝑖𝑖 from (8.1). The second factor already has two off-diagonal elements, 𝑅𝑗𝑖 𝑅𝑖𝑘 , yielding an additional 𝑁 −1/3+𝐶𝜀 factor with high probability (i.e., even without taking the expectation). So we focus on the term (17.26)

1 (𝑖) 2 ∑ 𝑚sc 𝔼[𝑅𝑘𝑖 𝑣𝑖𝑗 𝑣𝑗𝑖 𝑣𝑖𝑗 𝑅𝑗𝑘 ]. 𝑁 𝑘

Recall the identity (8.2) (𝑖)

(𝑖)

𝑅𝑘𝑖 = −𝑅𝑘𝑘 ∑ 𝑅𝑘ℓ (𝑁 −1/2 𝑣ℓ𝑖 ). ℓ

17. EDGE UNIVERSALITY

199

Hence we have that (𝑖)

(17.27)

1 (𝑖) (𝑖) 2 −1/2 ∑ 𝔼[𝑅𝑘𝑘 𝑅𝑘ℓ 𝑣ℓ𝑖 𝑣𝑖𝑗 𝑣𝑗𝑖 𝑣𝑖𝑗 𝑅𝑗𝑘 ]. (17.26) = − ∑ 𝑚sc 𝑁 𝑁 𝑘 ℓ

We may again replace the diagonal term 𝑅𝑘𝑘 with 𝑚sc at a negligible error, and we are left with (𝑖)

(17.28)

(𝑖)

(𝑖)

3 −3/2 ∑ ∑ 𝔼[𝑅𝑘ℓ 𝑣ℓ𝑖 𝑣𝑖𝑗 𝑣𝑗𝑖 𝑣𝑖𝑗 𝑅𝑗𝑘 ]. −𝑚sc 𝑁 𝑘



The expectation with respect to the variable 𝑣ℓ𝑖 renders this term 0 unless ℓ = 𝑗, in which case we gain an additional factor 𝑁 −1/2+𝐶𝜀 in the power counting compared with (17.26). So the contribution of the last displayed expression to (17.23) is improved from 𝑁 1/6+𝐶𝜀 to 𝑁 −1/3+𝐶𝜀 . The argument so far assumed that all indices are distinct. In case of coinciding indices, each coinciding index results in a factor 1/𝑁 from the restriction in the summation. At the same time, one of the off-diagonal elements may become diagonal; hence a factor 𝑁 −1/3+𝐶𝜀 , attributed to this off-diagonal element, is “lost.” The total balance of a coinciding index in the power counting is thus a factor 𝑁 −2/3 , so we conclude that terms with coinciding indices are negligible. This proves the simplified version (16.5) of Theorem 17.4. To prove (17.10), for simplicity we ignore the integration and we prove only that (17.29) |𝔼𝐹(𝑁𝜂 Im 𝑚(𝐯) (𝑧)) − 𝔼𝐹(𝑁𝜂 Im 𝑚(𝐰) (𝑧))| ≤ 𝐶𝑁 −1/6+𝐶𝜀 ,

𝑧 = 𝐸 + 𝑖𝜂,

for any 𝐸 with |𝐸 − 2| ≤ 𝑁 −2/3+𝜀 . The 𝜂 factor, up to an irrelevant 𝑁 2𝜀 , accounts for the integration domain of length |𝐸2 − 𝐸1 | ≤ 𝑁 −2/3+𝜀 in the arguments in (17.10). The general form (17.10) follows in the same way. We point out that (17.29) with 𝐹(𝑁𝜂 Im 𝑚(𝑧)) replaced by 𝐹(𝑁𝜂 𝑚(𝑧)) also holds with the same argument given below. Taking into account the telescopic summation over all 𝛾’s, for the proof of (17.29) it is sufficient to prove that 1 1 1 1 | | (17.30) 𝑁 2 ||𝔼𝐹(𝑁𝜂 Im Tr ) − 𝔼𝐹(𝑁𝜂 Im Tr )| 𝑁 𝐻𝛾 − 𝑧 𝑁 𝐻𝛾 − 𝑧 | ≤ 𝐶𝑁 −1/6+𝐶𝜀 , and then sum it up for all 𝛾 = 𝜙(𝑖, 𝑗). As before, we assume the generic case, i.e., 𝑖 ≠ 𝑗. We Taylor expand the function 𝐹 around the point Ξ ∶= 𝑁𝜂 Im 𝑚(𝑅) (𝑧),

𝑧 = 𝐸 + 𝑖𝜂,

200

17. EDGE UNIVERSALITY

where 𝑚(𝑅) = 𝑁1 Tr 𝑅. Notice that 𝑚(𝑅) is independent of 𝑣𝑖𝑗 and 𝑤𝑖𝑗 . Recalling the definition of 4

(𝑚)

𝜉𝐯 = ∑ 𝑁 −𝑚/2 𝑅ˆ𝐯

(17.31)

+ 𝑁 −5/2 Ω𝐯

𝑚=1

from (17.17), and a similar definition for 𝜉𝐰 , we obtain 1 1 1 1 | | (17.32) ||𝔼𝐹(𝑁𝜂 Im Tr ) − 𝔼𝐹(𝑁𝜂 Im Tr )| ≤ 𝑁 𝐻𝛾 − 𝑧 𝑁 𝐻𝛾−1 − 𝑧 | 1 |𝔼𝐹 ′ (Ξ)(𝑁𝜂[𝜉𝐯 − 𝜉𝐰 ])| + |𝔼𝐹 ″ (Ξ)([𝑁𝜂𝜉𝐯 ]2 − [𝑁𝜂𝜉𝐰 ]2 )| + ⋯ . 2 We now substitute (17.31) into this expression. By the näive power counting (𝑚)

(𝑁𝜂)𝑁 −𝑚/2 |𝑅ˆ𝐯

| ≤ 𝑁 −𝑚/2−1/3+𝐶𝜀 ,

(𝑁𝜂)𝑁 −5/2 |Ω𝐯 | ≤ 𝑁 −13/6+𝐶𝜀 ,

from the previous proof. The contributions of all the Ω error terms as well as the 𝑚 = 4 terms clearly satisfy the aimed bound (17.30), and thus can be neglected. First, we prove that the quadratic term (as well as all higher-order terms) in the Taylor expansion (17.32) is negligible. Here the cancellation between the 𝐯 and 𝐰 contributions is essential. It is therefore sufficient to consider 3

(𝑁𝜂)2







′ (𝑚) (𝑚 ) (𝑚) (𝑚 ) − 𝑅ˆ𝐰 𝑅ˆ𝐰 ] 𝑁 −𝑚/2−𝑚 /2 [𝑅ˆ𝐯 𝑅ˆ𝐯

𝑚,𝑚′ =1

instead of [𝑁𝜂𝜉𝐯 ]2 − [𝑁𝜂𝜉𝐰 ]2 . Since 𝐹 ″ is bounded, any term with 𝑚 + 𝑚′ ≥ 4 is negligible, similarly to the previous by power counting. The contribution of the 𝑚 = 𝑚′ = 1 term to the quadratic term in (17.32) is explicitly given by 𝔼𝐹 ″ (Ξ)(

1 ∑[𝑅 𝑣 𝑅 + 𝑅𝑘𝑗 𝑣𝑗𝑖 𝑅𝑖𝑘 ]) 𝑁 𝑘 𝑘𝑖 𝑖𝑗 𝑗𝑘 ⋅(

1 ∑[𝑅 ′ 𝑣 𝑅 ′ + 𝑅𝑘′ 𝑗 𝑣𝑗𝑖 𝑅𝑖𝑘′ ]) − 𝔼𝐹 ″ (Ξ)(𝑣 ↔ 𝑤). 𝑁 𝑘′ 𝑘 𝑖 𝑖𝑗 𝑗𝑘

Since 𝑅 and Ξ are independent of 𝑣𝑖𝑗 and 𝑤𝑖𝑗 , the expectations with respect to 𝑣𝑖𝑗 and 𝑤𝑖𝑗 can be computed. Since their second moments coincide, the two expectations above exactly cancel each other. We are left with the 𝑚 + 𝑚′ = 3 terms, we may assume 𝑚 = 2 and 𝑚′ = 1. We do not expect any more cancellation from the 𝐯 and 𝐰 terms, so we consider them separately. A typical such term contributing to (17.30) with all the prefactors is of the form 𝑁 2 (𝑁𝜂)2 𝑁 −3/2 𝐹 ″ (Ξ)(

1 1 ∑ 𝑅 𝑣 𝑅 𝑣 𝑅 )( ∑ 𝑅 ′ 𝑣 𝑅 ′ ). 𝑁 𝑘 𝑘𝑖 𝑖𝑗 𝑗𝑗 𝑗𝑖 𝑖𝑘 𝑁 𝑘′ 𝑘 𝑖 𝑖𝑗 𝑗𝑘

The key point is that generically there are at least four off-diagonal elements, 𝑅𝑘𝑖 , 𝑅𝑖𝑘 , 𝑅𝑘′ 𝑖 , and 𝑅𝑗𝑘′ (we chose the “worst” term, where the resolvent 𝑅𝑗𝑗

17. EDGE UNIVERSALITY

201

in the middle is diagonal). So together with the boundedness of 𝐹 ″ , the direct power counting gives 𝑁 2 (𝑁𝜂)2 𝑁 −3/2 (𝑁 −1/3+𝐶𝜀 )4 = 𝑁 −1/6+𝐶𝜀 , which is exactly the precision required in (17.30). In case of each coinciding index (for example, 𝑘 = 𝑖), two off-diagonal elements may be “lost,” but we gain a factor 1/𝑁 from the restriction of the summation. So terms with coinciding indices are negligible, as before. Here we discussed only the quadratic term in the Taylor expansion (17.32), but it is easy to see that higher-order terms are even smaller, without any cancellation effect. So we are left with estimating the linear term in (17.32); i.e., we need to bound 3

(𝑚)

𝑁 2 (𝑁𝜂) ∑ 𝑁 −𝑚/2 𝔼𝐹 ′ (Ξ)[𝑅ˆ𝐯

(𝑚)

− 𝑅ˆ𝐰 ],

𝑚=1

where we already took into account that the 𝑚 = 4 term as well as the Ω𝐯 error term from (17.31) are negligible by direct power counting. If 𝐹 ′ (Ξ) were deterministic, then the previous argument leading to (17.15) would also prove the 𝑁 −1/6+𝐶𝜀 bound for the linear term in (17.32). In fact, the exact cancellation (𝑚) (𝑚) between the expectations of the 𝑅ˆ𝐯 and 𝑅ˆ𝐰 terms for 𝑚 = 1 and 𝑚 = 2 relied only on computing the expectation with respect to 𝑣𝑖𝑗 and 𝑤𝑖𝑗 and on the matching of the first and second moments. Since 𝐹 ′ (Ξ) is independent of 𝑣𝑖𝑗 and 𝑤𝑖𝑗 , this argument remains valid even with the 𝐹 ′ (Ξ) factor included. We thus need to control the 𝑚 = 3 term, i.e., show that (17.33)

(3) 𝑁 2 (𝑁𝜂)𝑁 −3/2 𝔼𝐹 ′ (Ξ)𝑅ˆ𝐯 ≤ 𝑁 −1/6+𝐶𝜀 .

The same bound then would also hold if 𝐯 is replaced with 𝐰. Using the boundedness of 𝐹 ′ and the argument between (17.24) and (17.28) in the previous proof, we see that for (17.33) it is sufficient to show that (𝑖) | 2 (𝑖) (𝑖) | −3/2 −3/2 ∑ ∑ 𝔼𝐹 ′ (Ξ)[𝑅𝑘ℓ 𝑣ℓ𝑖 𝑣𝑖𝑗 𝑣𝑗𝑖 𝑣𝑖𝑗 𝑅𝑗𝑘 ]| ≤ 𝑁 −1/6+𝐶𝜀 . (17.34) |𝑁 (𝑁𝜂)𝑁 𝑁 | | 𝑘 ℓ

Without the 𝐹 ′ (Ξ) term, the expectation with respect to 𝑣ℓ𝑖 would render this term 0, in the generic case when ℓ ≠ 𝑗. In order to exploit this effect we need to expand 𝐹 ′ (Ξ) in the single variable 𝑣ℓ𝑖 . Fix an index ℓ ≠ 𝑗 and let 𝑄ℓ be the matrix identical to 𝑄 except that the (ℓ, 𝑖) and (𝑖, ℓ) matrix elements are 0, and let 𝑇 = 𝑇 ℓ = (𝑄ℓ − 𝑧)−1 be its resolvent. Clearly 𝑇 is independent of 𝑣𝑖𝑗 and 𝑣ℓ𝑖 , and the local law (17.19) holds for the matrix elements of 𝑇 as well. Using the resolvent expansion, we find (17.35)

Ξ = (𝑁𝜂) Im[

1 ∑[𝑇 + 𝑁 −1/2 (𝑇𝑛ℓ 𝑣ℓ𝑖 𝑇𝑖𝑛 + 𝑇𝑛𝑖 𝑣𝑖ℓ𝑖 𝑇ℓ𝑛 ) + ⋯ ]] 𝑁 𝑛 𝑛𝑛

202

17. EDGE UNIVERSALITY

where the higher-order terms carry a prefactor 𝑁 −1 . We Taylor expand 𝐹 ′ (Ξ) around 1 Ξ0 ∶= (𝑁𝜂) Im ∑ 𝑇𝑛𝑛 , 𝑁 𝑛 and we obtain (17.36) 𝐹 ′ (Ξ) = 𝐹 ′ (Ξ0 ) +

1 ″ 1 𝐹 (Ξ0 )(𝑁𝜂) Im[ 3/2 ∑[𝑇𝑛ℓ 𝑣ℓ𝑖 𝑇𝑖𝑛 + 𝑇𝑛𝑖 𝑣𝑖ℓ𝑖 𝑇ℓ𝑛 + ⋯]] + ⋯ . 2 𝑁 𝑛

Since Ξ0 is independent of 𝑣ℓ𝑖 , when we insert the above expansion for 𝐹 ′ (Ξ) into (17.34), the contribution of the leading term 𝐹 ′ (Ξ0 ) gives 0 after taking the expectation for 𝑣ℓ𝑖 . The two subleading terms in (17.36), which are linear in 𝑣ℓ𝑖 , have generically two off-diagonal elements, 𝑇𝑛ℓ and 𝑇𝑖𝑛 (or 𝑇𝑛𝑖 and 𝑇ℓ𝑛 ), which have size 𝑁 −1/3+𝐶𝜀 each. So the contribution of this term to (17.34) by simple power counting is 𝑁 2 (𝑁𝜂)𝑁 −3/2 𝑁 −3/2 𝑁𝑁(𝑁𝜂)𝑁 −1/2 (𝑁 −1/3+𝐶𝜀 )4 ≤ 𝑁 −1/6+𝐶𝜀 . A similar argument shows that the higher-order terms in the Taylor expansion (17.36) as well as higher-order terms in the resolvent expansion (17.35) have a negligible contribution. As before, we presented the argument for generic indices; coinciding indices give smaller contributions as we explained before. We leave the details to the reader. This completes the proof of (17.30) and thus the proof of Theorem 17.4. □

CHAPTER 18

Further Results and Historical Notes Spectral statistics of large random matrices have been studied from many different perspectives. The main focus of this book was on one particular result: the Wigner-Dyson-Mehta bulk universality for Wigner matrices in the average energy sense (Theorem 5.1). In this last section we collect a few other recent results and open questions of this very active research area. We also give references to related results and summarize the history of the development. 18.1. Wigner Matrices: Bulk Universality in Different Senses Theorem 5.1, which is also valid for complex Hermitian ensembles, settles the average energy version of the Wigner-Dyson-Mehta conjecture for generalized Wigner matrices under the moment assumption (5.6) on the matrix elements. Theorem 5.1 was proved in increasing order of generality in [63,68–70]; see also [131] for the complex Hermitian Wigner ensemble and for a restricted class of real symmetric matrices (see comments about this class later on). The following theorem reduces the moment assumption to 4+𝜀 moments. The proof in [53] also provides an effective speed of convergence in (18.2). We also point out that the condition (2.6) can be somewhat relaxed (see corollary 8.3 in [55]). Theorem 18.1 (Universality with averaged energy [53, theorem 7.2]). Suppose that 𝐻 = (ℎ𝑖𝑗 ) is a complex Hermitian (respectively, real symmetric) generalized Wigner matrix. Suppose that for some constants 𝜀 > 0, 𝐶 > 0, 𝔼|√𝑁ℎ𝑖𝑗 |

(18.1)

4+𝜀

≤ 𝐶.

Let 𝑛 ∈ ℕ and 𝑂 ∶ ℝ𝑛 → ℝ be a test function (i.e., compactly supported and continuous). Fix |𝐸0 | < 2 and 𝜉 > 0, then with 𝑏𝑁 = 𝑁 −1+𝜉 we have 𝐸0 +𝑏𝑁

(18.2)

lim ∫

𝑁→∞

𝐸0 −𝑏𝑁

d𝐸 1 𝜶 (𝑛) (𝑛) ∫ d𝜶 𝑂(𝜶) ) = 0. (𝑝𝑁 − 𝑝G,𝑁 )(𝐸 + 𝑛 2𝑏𝑁 ℝ𝑛 𝜚sc (𝐸) 𝑁𝜚sc (𝐸) (𝑛)

Here 𝜚sc is the semicircle law defined in (3.1), 𝑝𝑁 is the 𝑛-point correlation func(𝑛)

tion of the eigenvalue distribution of 𝐻 (4.20), and 𝑝G,𝑁 is the 𝑛-point correlation function of an 𝑁 × 𝑁 GUE (respectively, GOE) matrix. A relatively straightforward consequence of Theorem 18.1 is the average gap universality formulated in the following corollary. The gap distributions of the comparison ensembles, i.e., the Gaussian ensembles, can be explicitly expressed 203

204

18. FURTHER RESULTS AND HISTORICAL NOTES

in terms of a Fredholm determinant [37, 40]. Recall the notation J𝐴, 𝐵K ∶= ℤ ∩ [𝐴, 𝐵] for any real numbers 𝐴 < 𝐵. Corollary 18.2 (Gap universality with averaged label). Let 𝐻 be as in Theorem 18.1 and 𝑂 be a test function of 𝑛 variables. Fix small positive constants 𝜉, 𝛼 > 0. Then for any integer 𝑗0 ∈ J𝛼𝑁, (1 − 𝛼)𝑁K we have (18.3)

1 𝑁→∞ 2𝑁 𝜉



lim

[𝔼 − 𝔼G ]𝑂(𝑁(𝜆𝑗 − 𝜆𝑗+1 ), … , 𝑁(𝜆𝑗 − 𝜆𝑗+𝑛 )) = 0.

|𝑗−𝑗0 |≤𝑁𝜉

Here 𝜆𝑗 ’s are the ordered eigenvalues, and for brevity we omitted the rescaling factor 𝜚(𝜆𝑗 ) from the argument of 𝑂. Moreover, 𝔼 and 𝔼G denote the expectation with respect to the Wigner ensemble 𝐻 and the Gaussian (GOE or GUE) ensemble, respectively. The next result shows that Theorem 18.1 holds under the same conditions without energy averaging. Theorem 18.3 (Universality at fixed energy [26]). Consider a complex Hermitian (respectively, real symmetric) generalized Wigner matrix with moment condition (18.1). Then, for any 𝐸 with |𝐸| < 2 we have (18.4)

lim ∫ d𝜶 𝑂(𝜶)

𝑁→∞

ℝ𝑛

𝜶 1 (𝑛) (𝑛) ) = 0. (𝑝𝑁 − 𝑝G,𝑁 )(𝐸 + 𝑛 𝜚sc (𝐸) 𝑁𝜚sc (𝐸)

The fixed energy universality (18.4) for the 𝛽 = 2 (complex Hermitian) case was already known earlier (see [57,66,132]). This case is exceptional since the Harish-Chandra-Itzykson-Zuber identity allows one to compute correlation functions for Gaussian divisible ensembles. This method relies on an algebraic identity and has no generalization to other symmetry classes. Finally, the gap universality with fixed label asserts that (18.3) holds without averaging. Theorem 18.4 (Gap universality with fixed label [67, theorem 2.2]). Consider a complex Hermitian (respectively, real symmetric) generalized Wigner matrix and assume subexponential decay of the matrix elements instead of (18.1). Then, Corollary 18.2 holds without averaging: lim [𝔼 − 𝔼G ]𝑂(𝑁(𝜆𝑗 − 𝜆𝑗+1 ), … , 𝑁(𝜆𝑗 − 𝜆𝑗+𝑛 )) = 0

(18.5)

𝑁→∞

for any 𝑗 ∈ J𝛼𝑁, (1 − 𝛼)𝑁K with a fixed 𝛼 > 0. More generally, for any 𝑘, 𝑚 ∈ J𝛼𝑁, (1 − 𝛼)𝑁K we have (18.6)

lim |𝔼𝑂((𝑁𝜚𝑘 )(𝜆𝑘 − 𝜆𝑘+1 ), … , (𝑁𝜚𝑘 )(𝜆𝑘 − 𝜆𝑘+𝑛 ))

𝑁→∞

− 𝔼G 𝑂((𝑁𝜚𝑚 )(𝜆𝑚 − 𝜆𝑚+1 ), … , (𝑁𝜚𝑚 )(𝜆𝑚 − 𝜆𝑚+𝑛 ))| = 0 where the local density 𝜚𝑘 is defined by 𝜚𝑘 ∶= 𝜚sc (𝛾𝑘 ) with 𝛾𝑘 from (11.31).

18.2. HISTORICAL OVERVIEW OF THE THREE-STEP STRATEGY

205

The second part (18.6) of this theorem asserts that the gap distribution is not only independent of the specific Wigner ensemble, but it is also universal throughout the bulk spectrum. This is the counterpart of the statement that the appropriately rescaled correlation functions (18.4) have a limit that is independent of 𝐸 (see (4.38)). Prior to [67], universality for a single gap was only achieved in the special case of the Gaussian unitary ensemble (GUE) in [129]; this statement then immediately implies the same result for complex Hermitian Wigner matrices satisfying the four moment matching conditions. 18.2. Historical Overview of the Three-Step Strategy for Bulk Universality We summarize the recent history related to the bulk universality results in Section 18.1. The three-step strategy first appeared implicitly in [57] in the context of complex Hermitian matrices and was conceptualized for all symmetry classes in [63] by linking it to the DBM. Several key components of this strategy appeared in the later work [63, 64, 69]. We will review the history of Steps 1–3 separately. 18.2.1. History of Step 1. The semicircle law was proved by Wigner for energy windows of order 1 [138]. Various improvements were made to shrink the spectral windows; in particular, results down to scale 𝑁 −1/2 were obtained by combining the results of [11] and [80]. The result at the optimal scale, 𝑁 −1 , referred to as the local semicircle law, was established for Wigner matrices in a series of papers [60–62]. The method was based on a self-consistent equation for the Stieltjes transform of the eigenvalues, 𝑚(𝑧), and the continuity in the imaginary part of the spectral parameter 𝑧. As a by-product, the optimal eigenvector delocalization estimate was proved. For generalized Wigner matrices there is no closed equation for 𝑚(𝑧) = 𝑁 −1 Tr 𝐺. In order to deal with this case, one needed to consider a self-consistent equation for the entire vector (𝐺𝑖𝑖 (𝑧))𝑁 𝑖=1 consisting of the diagonal matrix elements of the Green function [68, 69]. Together with the fluctuation averaging lemma, this method implied the optimal rigidity estimate of eigenvalues in the bulk in [68] and up to the edges in [70]. Furthermore, optimal control was obtained for individual matrix elements of the resolvent 𝐺𝑖𝑗 and not only for its trace. The estimate on 𝐺𝑖𝑖 also provided a simple alternative proof of the eigenvector delocalization estimate. A comprehensive summary of these results can be found in [55]. Several further extensions concern weaker moment conditions and improvements of (log 𝑁)-powers (see, e.g., [32,78,130] and references therein). 18.2.2. History of Step 2. The universality for Gaussian divisible ensembles was proved by Johansson [86] for complex Hermitian Wigner ensembles in a certain range of the bulk spectrum. This was the first result beyond exact

206

18. FURTHER RESULTS AND HISTORICAL NOTES

Gaussian calculations going back to Gaudin, Dyson, and Mehta. It was later extended to complex sample covariance matrices on the entire bulk spectrum by Ben Arous and Péché [19]. There were two major restrictions of this method: (i) the Gaussian component was required to be of order 1 independent of 𝑁; (ii) it relies on an explicit formula by Brézin-Hikami [30, 31] for the correlation functions of eigenvalues, which is valid only for Gaussian divisible ensembles with unitary invariant Gaussian component. The size of the Gaussian component was reduced to 𝑁 −1+𝜀 in [57] by using an improved formula for correlation functions and the local semicircle law from [60–62]. To eliminate the usage of an explicit formula, a conceptual approach for Step 2 via the local ergodicity of Dyson Brownian motion was initiated in [63]. In [64], a general theorem for the bulk universality was formulated that applies to all classical ensembles, i.e., real and complex Wigner matrices, real and complex sample covariance matrices, and quaternion Wigner matrices. The relaxation time to local equilibrium proved in these two papers were not optimal; the optimal relaxation time, conjectured by Dyson, was obtained later in [70]. The DBM approach in these papers yields the average gap and hence the average energy universality, while the method via the Brézin-Hikami formula gives universality in a fixed energy sense but is strictly restricted to the complex Hermitian case. The fixed energy universality for all symmetry classes was recently achieved in [26]. The method in this paper was still based on DBM, but the analysis of DBM was very different from the one discussed in this book.

18.2.3. History of Step 3. Once universality was established for Gaussian divisible ensembles 𝑒−𝑡/2 𝐻0 + (1 − 𝑒−𝑡 )1/2 𝐻 G (here we used the convention used in Chapter 5) for a sufficiently large class of 𝐻0 and sufficiently small 𝑡, the final step is to approximate the local eigenvalue distribution of a general Wigner matrix by that of a Gaussian divisible one. The first approximation result was obtained via the reversal heat flow in [57], which required smoothness of the distribution of matrix elements. Shortly after, Tao and Vu [131] proved the four-moment theorem, which asserts that the local statistics of two Wigner ensembles are the same provided that the first four moments of their matrix elements are identical. The third comparison method is the Green function comparison theorem, Theorem 16.1 [63, 68, 69], which was the main content of Chapter 16. It uses similar moment conditions as those appeared in [131], but it provides detailed information on the matrix elements of Green functions as well. Both [131] and the proof of the Green function comparison theorem [63, 68, 69] use the local semicircle law [60, 62] and the Lindeberg’s idea, which had appeared in the proof of the Wigner semicircle law by Chatterjee [34]. The approach [131] requires additional difficult estimates on level repulsions, while Theorem 16.1 follows directly from the estimates on the matrix elements of the Green function from Step 1 via standard resolvent expansions. Another

18.3. INVARIANT ENSEMBLES AND LOG-GASES

207

comparison method reviewed in this book, Theorem 15.2, can be viewed as a variation of the GFT. In summary, the WDM conjecture was first solved in the complex Hermitian case and then for the real symmetric case. In the first case, the bulk universality for Hermitian matrices with smooth distribution was proved in [57]. The smoothness requirement was removed in [131], but the complex Bernoulli distribution was still excluded; this was finally added in [58] (see also [66, 132] for related discussions). The result of [131] also implies bulk universality of symmetric Wigner matrices under the restriction that the first four moments of the matrix elements match those of the GOE. The major roadblock to the WDM conjecture for real symmetric matrices was the lack of Step 2 in this case. For symmetric matrices, we do not know any Brézin-Hikami type formula, which was the backbone for Step 2 in the Hermitian case. The resolution of this difficulty was to abandon any explicit formula and to prove Dyson’s conjecture on the relaxation of the Dyson Brownian motion (see Step 2). Together with the Green function comparison theorem [63], bulk universality was proved for real symmetric matrices without any smoothness requirement. In particular, this result covered the real Bernoulli case [68], which has been a major objective in this subject due to the application to the adjacency matrices for random graphs. The flexibility of the DBM approach also allowed one to establish universality beyond the traditional Wigner ensembles. The first such result was bulk universality for generalized Wigner matrices [69], followed by many other models (see Section 18.6). Finally, the technical condition assumed in all these papers, i.e., that the probability distributions of the matrix elements decay subexponentially, was reduced to the (4 + 𝜀)-moment assumption (18.1) by using the universality of Erdős-Rényi matrices [53]. This yielded the bulk universality as formulated in Theorem 18.1. 18.3. Invariant Ensembles and Log-Gases The explicit formula (4.3) for the joint density function of the eigenvalues for the invariant matrix ensemble defined in (4.2) gives rise to a direct statistical physics interpretation: it can be viewed as a measure 𝜇(𝑁) on 𝑁 point particles on the real line. Inspired by the Gibbs formalism, we may write it as (18.7)

𝜇(𝑁) (d𝝀) = 𝑝𝑁 (𝝀)d𝝀 = const ⋅ 𝑒−𝛽𝑁ℋ (𝝀) d𝝀,

𝝀 = (𝜆1 , … , 𝜆𝑁 ),

where 𝑁

1 1 𝑉(𝜆𝑗 ) − 2 𝑁 𝑗=1

ℋ(𝝀) ∶= ∑



log(𝜆𝑗 − 𝜆𝑖 ).

1≤𝑖 0, i.e., at any inverse temperature. In the case of invariant ensembles, it is well-known that for 𝑉 satisfying certain mild conditions the sequence of one-point correlation functions, or density, associated with (18.7) has a limit as 𝑁 → ∞, and the limiting equilibrium density 𝜚𝑉 (𝑠) can be obtained as the unique minimizer of the functional 𝐼(𝜈) = ∫ 𝑉(𝑡)𝜈(𝑡)d𝑡 − ∫ ∫ log |𝑡 − 𝑠|𝜈(𝑠)𝜈(𝑡)d𝑡 d𝑠. ℝ





This is the analogue of the Wigner semicircle law for log-gases. We assume that 𝜚 = 𝜚𝑉 is supported on a single compact interval, [𝐴, 𝐵], and 𝜚 ∈ 𝐶 2 (𝐴, 𝐵). Moreover, we assume that 𝑉 is regular in the sense that 𝜚 is strictly positive on (𝐴, 𝐵) and vanishes as a square root at the endpoints, i.e., 𝜚(𝑡) = 𝑠𝐴 √𝑡 − 𝐴 (1 + 𝑂 (𝑡 − 𝐴)) ,

(18.8)

𝑡 → 𝐴+ ,

for some constant 𝑠𝐴 > 0, and a similar condition holds at the upper edge. It is known that these conditions are satisfied if, for example, 𝑉 is strictly convex. In this case 𝜚𝑉 satisfies the equation 𝜚 (𝑠)d𝑠 1 ′ 𝑉 (𝑡) = ∫ 𝑉 2 𝑡−𝑠 ℝ

(18.9)

for any 𝑡 ∈ (𝐴, 𝐵). For the Gaussian case, 𝑉(𝑥) = 𝑥 2 /2, the equilibrium density is given by the semicircle law, 𝜚𝑉 = 𝜚sc (see (3.1)). Under these conditions, the density of 𝜇(𝑁) converges to 𝜚𝑉 even locally down to the optimal scale, and rigidity analogous to Theorem 11.5 holds: ℙ𝜇

(𝑁)

(|𝜆𝑗 − 𝛾𝑗,𝑉 | ≥ 𝑁 −2/3+𝜉 min{𝑘, 𝑁 − 𝑘 + 1}−1/3 ) ≤ 𝑒−𝑁

c

where the quantiles 𝛾𝑗,𝑉 of the density 𝜚𝑉 are defined by 𝛾𝑗,𝑉

(18.10)

𝑗 =∫ 𝑁 𝐴

𝜚𝑉 (𝑥)d𝑥,

similarly to (11.31) (see [24, theorem 2.4 ]). The higher-order correlation functions, given in Definition 4.2 and rescaled by the local density 𝜚𝛽,𝑉 (𝐸) around an energy 𝐸 in the bulk, exhibit a universal behavior in a sense that they depend only on 𝛽 but are independent of the potential 𝑉. A similar result holds at the edges of the support of the density 𝜚𝛽,𝑉 . This can be viewed as a special realization of the universality hypothesis for the invariant matrix ensemble. Historically, the universality for the special classical cases 𝛽 = 1, 2, 4 had been extensively studied via orthogonal polynomial methods. These methods rely on explicit expressions of the correlation functions in terms of Fredholm determinants involving orthogonal polynomials. The key advantage of this method is that it yields explicit formulas for the limiting gap distributions or correlation functions. However, these explicit expressions are

18.3. INVARIANT ENSEMBLES AND LOG-GASES

209

available only for the specific values 𝛽 = 1, 2, 4. Within the framework of this approach, the 𝛽 = 1, 2, 4 cases of Theorem 18.5 below was proved for very general potentials for 𝛽 = 2 and for analytic potentials with some additional conditions for 𝛽 = 1, 4. This is a whole subject by itself, and we can only refer the reader to some reviews or books [39, 91, 116]. The first general result for nonclassical 𝛽-values is the following theorem proved in [23–25]. In these works, a new approach to the universality of the 𝛽-ensemble was initiated. It departed from the previously mentioned traditional approach using explicit expressions for the correlation functions. Instead, it relies on comparing the probability measures of the 𝛽-ensembles to those of the Gaussian ones. This basic idea of comparing two probability measures directly was also used in the later works [18, 119], albeit in a different way, where Theorem 18.5 was reproved under different sets of conditions on the potential 𝑉. Theorem 18.5 (Universality with averaged energy). Assume the potential 𝑉 (𝑁) is regular, 𝑉 ∈ 𝐶 4 (ℝ), and let 𝛽 > 0. Consider the 𝛽-ensemble 𝜇(𝑁) = 𝜇𝛽,𝑉 given (𝑛) in (18.7) with correlation functions 𝑝𝑉,𝑁 defined analogously to (4.20). For the (𝑛) Gaussian case, 𝑉(𝑥) = 𝑥 2 /2, the correlation functions are denoted by 𝑝𝐺,𝑁 . Let 𝐸0 ∈ (𝐴, 𝐵) lie in the interior of the support for 𝜚, and similarly let 𝐸0′ ∈ (−2, 2) be inside the support for 𝜚sc . Then for 𝑏𝑁 = 𝑁 −1+𝜉 with some 𝜉 > 0 we have 𝐸0 +𝑏𝑁

lim ∫ d𝜶 𝑂(𝜶)[ ∫

𝑁→∞

(18.11)

ℝ𝑛

𝐸0 −𝑏𝑁

𝜶 d𝐸 1 (𝑛) 𝑝 (𝐸 + ) 2𝑏𝑁 𝜚(𝐸)𝑛 𝑉,𝑁 𝑁𝜚(𝐸)

𝐸′0 +𝑏𝑁

−∫ 𝐸′0 −𝑏𝑁

1 𝜶 d𝐸 ′ (𝑛) 𝑝 (𝐸 ′ + )] = 0; 2𝑏𝑁 𝜚sc (𝐸 ′ )𝑛 G,𝑁 𝑁𝜚sc (𝐸 ′ )

(𝑁)

i.e., the correlation functions of 𝜇𝛽,𝑉 averaged around 𝐸0 asymptotically coincide with those of the Gaussian case. In particular, they are independent of 𝐸0 . Universality at fixed energy, i.e., (18.11) without averaging in 𝐸, was proven by M. Shcherbina in [119] for any 𝛽 > 0 under certain regularity assumptions on the potentials. We remark that the gap distribution in the Gaussian case for any 𝛽 > 0 can be described by an interesting stochastic process [136]. Theorem 18.5 immediately implies gap universality with averaged labels, exactly in the same way as Corollary 18.2 was deduced from Theorem 18.1. To formulate the gap universality with a fixed label, we recall the quantiles 𝛾𝑗,𝑉 from (18.10) and set (18.12)

𝜚𝑗𝑉 ∶= 𝜚𝑉 (𝛾𝑗,𝑉 )

and 𝜚𝑗 ∶= 𝜚sc (𝛾𝑗 )

to be the limiting densities at the 𝑗th quantiles. Let 𝔼𝜇𝑉 and 𝔼G denote the expectation w.r.t. the measure 𝜇𝑉 and its Gaussian counterpart for 𝑉(𝜆) = 12 𝜆2 .

210

18. FURTHER RESULTS AND HISTORICAL NOTES

Theorem 18.6 (Gap universality with fixed label [67, theorem 2.3]). We consider the setup of Theorem 18.5 and also assume 𝛽 ≥ 1. Set some 𝛼 > 0; then (18.13)

lim |𝔼𝜇𝑉 𝑂((𝑁𝜚𝑘𝑉 )(𝜆𝑘 − 𝜆𝑘+1 ), … , (𝑁𝜚𝑘𝑉 )(𝜆𝑘 − 𝜆𝑘+𝑛 ))

𝑁→∞

− 𝔼𝜇𝐺 𝑂((𝑁𝜚𝑚 )(𝜆𝑚 − 𝜆𝑚+1 ), … , (𝑁𝜚𝑚 )(𝜆𝑚 − 𝜆𝑚+𝑛 ))| = 0

for any 𝑘, 𝑚 ∈ J𝛼𝑁, (1 − 𝛼)𝑁K. In particular, the distribution of the rescaled gaps w.r.t. 𝜇𝑉 does not depend on the index 𝑘 in the bulk. We point out that Theorem 18.5 holds for any 𝛽 > 0, but Theorem 18.6 requires 𝛽 ≥ 1. This is partly due to the fact that the De Giorgi-Nash-Moser regularity theory was used in [67]. This restriction was later removed by BekermanFigalli-Guionnet [18] under a higher regularity assumption on 𝑉 and some additional hypotheses. 18.4. Universality at the Spectral Edge So far we stated results for the bulk of the spectrum. Similar results hold at the edge; in this case the “averaged” results are meaningless. In Theorem 17.1 we already presented a universality result for the largest eigenvalue for the Wigner ensemble. That proof easily extends to the joint distribution of finitely many extreme eigenvalues. The following theorem gives a bit more: they control finite-dimensional distributions of the first 𝑁 1/4 eigenvalues and also identify the limit distribution. We then give the analogous result for log-gases. Theorem 18.7 (Universality at the edge for Wigner matrices [24]). Let 𝐻 be a generalized Wigner ensemble with subexponentially decaying matrix elements. Fix 𝑛 ∈ ℕ, 𝜅 < 1/4, and a test function 𝑂 of 𝑛 variables. Then for any Λ ⊂ J1, 𝑁 𝜅 K with |Λ| = 𝑛, we have |[𝔼 − 𝔼G ]𝑂((𝑁 2/3 𝑗 1/3 (𝜆𝑗 − 𝛾𝑗 )) )| ≤ 𝑁 −𝜒 𝑗∈Λ with some 𝜒 > 0, where 𝔼G is expectation w.r.t. the standard GOE or GUE ensemble depending on the symmetry class of 𝐻 and 𝛾𝑗 ’s are semicircle quantiles (11.31). The edge universality for Wigner matrices was first proved by Soshinikov [125] under the assumption that the distribution of the matrix elements is symmetric and has finite moments for all orders. Soshnikov used an elaborated moment matching method, and the conditions on the moments are not easy to improve. A completely different method based on the Green function comparison theorem was initiated in [70]. This method is more flexible in many ways and removes essentially all restrictions in previous works. In particular, the optimal moment condition on the distribution of the matrix elements was obtained by Lee and Yin [98]. All these works assume that the variances of the matrix elements are identical. The main point of Theorem 18.7 is to consider generalized Wigner matrices, i.e., matrices with nonconstant variances. It was

18.5. EIGENVECTORS

211

proved in [70, theorem 17.1] that the edge statistics for any generalized Wigner matrix coincide with those of a generalized Gaussian Wigner matrix with the same variances, but it was not shown that the statistics are independent of the variances themselves. Theorem 18.7 provides this missing step and thus proves the edge universality for the generalized Wigner matrices. Theorem 18.8 (Universality at the edge for log-gases [24]). Let 𝛽 ≥ 1 and ˜ be in 𝐶 4 (ℝ), regular such that the equilibrium density 𝜚𝑉 (resp., 𝜚 ) is 𝑉 (resp., 𝑉) ˜ 𝑉 ˜𝐵 ˜]). Without loss of generality we supported on a single interval [𝐴, 𝐵] (resp., [𝐴, assume that for both densities (18.8) holds with 𝐴 = 0 and with the same constant 𝑠𝐴 . Fix 𝑛 ∈ ℕ and 𝜅 < 2/5. Then for any Λ ⊂ J1, 𝑁 𝜅 K with |Λ| = 𝑛, we have (18.14)

|(𝔼𝜇𝑉 − 𝔼𝜇𝑉˜ )𝑂((𝑁 2/3 𝑗 1/3 (𝜆𝑗 − 𝛾𝑗 )) )| ≤ 𝑁 −𝜒 𝑗∈Λ

with some 𝜒 > 0. Here, 𝛾𝑗 are the quantiles w.r.t. the density 𝜚𝑉 (18.10). The history of edge universality runs in parallel to the bulk one in the sense that most bulk methods were extended to the edge case. Similar to the bulk case, the edge universality was first proved via orthogonal polynomial methods for the classical values of 𝛽 = 1, 2, 4 in [38, 42, 111, 117]. The first edge universality results were given independently in [24] (valid for general potentials and 𝛽 ≥ 1) and, with a completely different method, in [92] (valid for strictly convex potentials and any 𝛽 > 0). The later work [18] also covered the edge case and proved universality to 𝛽 > 0 under some higher regularity assumption on 𝑉 and an additional condition that can be checked for convex 𝑉. Some open questions concern the universal behavior at the nonregular edges where even the scaling may change. 18.5. Eigenvectors Universality questions have been traditionally formulated for eigenvalues, but they naturally extend to eigenvectors as well. They are closely related to two different important physical phenomena: delocalization and quantum ergodicity. The concept of delocalization stems from the basic theory of random Schrödinger operators describing a single quantum particle subject to a random potential 𝑉 in ℝ𝑑 . If the potential decays at infinity, then the operator −Δ + 𝑉 acting on 𝐿2 (ℝ𝑑 ) has a discrete pure point spectrum with 𝐿2 -eigenfunctions, localized or bound states, in the low-energy regime. In the high-energy regime it has absolutely continuous spectrum with bounded but not 𝐿2 -normalizable generalized eigenfunctions. The latter are also called delocalized or extended states, and they correspond to scattering and conductance. If the potential 𝑉 is a stationary, ergodic random field, then celebrated Anderson localization [9] occurs: at high disorder a dense pure point spectrum with exponentially decaying eigenfunctions appears even in the energy regime that would classically correspond to scattering states. The localized regime has been mathematically

212

18. FURTHER RESULTS AND HISTORICAL NOTES

well understood for both the high-disorder regime and the regime of low density of states (near the spectral edges) since the fundamental works of Fröhlich and Spencer [74] and later by Aizenman and Molchanov [1]. The local spectral statistics is Poisson [108], reflecting the intuition that exponentially decaying eigenfunctions typically have no overlap, so their energies are independent. The low- disorder regime, however, is mathematically wide open. In 𝑑 = 3 or higher dimensions it is conjectured that −Δ + 𝑉 has absolutely continuous spectrum (away from the spectral edges), the corresponding eigenfunctions are delocalized, and the local eigenvalue statistics of the restriction of −Δ + 𝑉 to a large finite box follow the GOE local statistics. In other words, a phase transition is believed to occur as the strength of the disorder varies; this is called the Anderson metal-insulator transition. Currently, even the existence of this extended states regime is not known; this is considered one of the most important open questions in mathematical physics. Only the Bethe lattice (regular tree) is understood [2, 87], but this model exhibits Poisson local statistics [3] due to its exponentially growing boundary. The strong link between local eigenvalue statistics and delocalization for random Schrödinger operators naturally raises the question about the structure of the eigenvectors for an 𝑁 × 𝑁 Wigner matrix 𝐻. Complete delocalization in this context would mean that any ℓ2 -normalized eigenvector 𝐮 = (𝑢1 , … , 𝑢𝑁 ) of 𝐻 is substantially supported on each coordinate; ideally |𝑢𝑖 |2 ≈ 𝑁 −1 for each 𝑖. Due to fluctuations, this can hold only in a somewhat weaker sense. The first type of results provide an upper bound on the ℓ∞ -norm in very high probability: Theorem 18.9. Let 𝐮𝛼 , 𝛼 = 1, … , 𝑁, be the ℓ2 -normalized eigenvectors of a generalized Wigner matrix 𝐻 satisfying the moment condition (5.6). Then, for any (small) 𝜀 > 0 and (large) 𝐷 > 0 we have (18.15)

ℙ(∃𝛼 ∶ ‖𝐮𝛼 ‖∞ ≥

𝑁𝜀 ) ≤ 𝑁 −𝐷 . 𝑁

The 𝑁 𝜀 tolerance factor may be changed to a log-power and the probability estimate improved to subexponential under stronger decay conditions on 𝐻. This theorem directly follows from the local semicircle law, Theorem 6.7. Indeed, by spectral decomposition Im 𝐺𝑖𝑖 (𝑧) = 𝜂 ∑ 𝛼

|𝑢𝛼 (𝑖)|2 ≥ 𝜂 −1 |𝑢𝛼 (𝑖)|2 , (𝐸 − 𝜆𝛼 )2 + 𝜂 2

𝑧 = 𝐸 + 𝑖𝜂,

where we chose 𝐸 in the 𝜂-vicinity of 𝜆𝛼 . Since from (6.31) the left-hand side is bounded by 𝑁 𝜀 (𝑁𝜂)−1 with very high probability, we obtain (18.15). Under stronger decay conditions on 𝐻, the local semicircle law can be slightly improved, and thus the 𝑁 𝜀 tolerance factor may be changed to a log-power and the probability estimate improved to subexponential.

18.6. GENERAL MEAN FIELD MODELS

213

Although this bound prevents concentration of eigenvectors onto a set of size less than 𝑁 1−𝜀 , it does not imply the “complete flatness” of eigenvectors in the sense that |𝑢𝛼 (𝑖)| ≈ 𝑁 −1/2 . Note that by a complete 𝑈(𝑁) or 𝑂(𝑁) symmetry, the eigenfunctions of a GUE or GOE matrix are distributed according to the Haar measure on the 𝑁-dimensional unit sphere; moreover, the distribution of a particular coordinate 𝑢𝛼 (𝑖) of an fixed eigenvector 𝐮𝛼 is asymptotically Gaussian. The same result holds for generalized Wigner matrices by theorem 1.2 of [28] (a similar result was also obtained in [88, 133] under the condition that the first four moments match with those of the Gaussian): Theorem 18.10 (Asymptotic normality of eigenvectors). Let 𝐮𝛼 be the eigenvectors of a generalized Wigner matrix with moment condition (5.6). Then there is a 𝛿 > 0 such that for any 𝑚 ∈ ℕ and 𝐼 ⊂ 𝑇𝑁 ∶= J1, 𝑁 1/4 K ∪ J𝑁 1−𝛿 , 𝑁 − 𝑁 1−𝛿 K ∪ J𝑁 − 𝑁 1/4 , 𝑁K with |𝐼| = 𝑚 and for any unit vector 𝐪 ∈ ℝ𝑁 , the vector (√𝑁(|⟨𝐪, 𝐮𝛼 ⟩|)𝛼∈𝐼 ) ∈ ℝ𝑚 is asymptotically normal in the sense of finite moments. Moreover, the study of eigenvectors leads to another fundamental problem of mathematical physics: quantum ergodicity. Recall that the quantum ergodicity theorem (Shnirel′ man [123], Colin de Verdière [35], and Zelditch [141]) asserts that “most” eigenfunctions for the Laplacian on a compact Riemannian manifold with ergodic geodesic flow are completely flat. For 𝑑-regular graphs under certain assumptions on the injectivity radius and spectral gap of the adjacency matrices, similar results were proved for eigenvectors of the adjacency matrices [7]. A stronger notion of quantum ergodicity, the quantum unique ergodicity (QUE) proposed by Rudnick-Sarnak [115], demands that all high energy eigenfunctions become completely flat, and it supposedly holds for negatively curved compact Riemannian manifolds. One case for which QUE was rigorously proved concerns arithmetic surfaces, thanks to tools from number theory and ergodic theory on homogeneous spaces [82, 83, 101]. For Wigner matrices, a probabilistic version of QUE was settled in corollary 1.4 of [28]: Theorem 18.11 (Quantum unique ergodicity for Wigner matrices). Let 𝐮𝛼 be the eigenvectors of a generalized Wigner matrix with moment condition (5.6). Then there exists 𝜀 > 0 such that for any deterministic 1 ≤ 𝛼 ≤ 𝑁 and 𝐼 ⊂ J1, 𝑁K, for any 𝛿 > 0 we have (18.16)

|𝐼| | 𝑁 −𝜀 | ℙ(||∑ |𝑢𝛼 (𝑖)|2 − || ≥ 𝛿) ≤ 2 . 𝑁 𝛿 𝑖∈𝐼 18.6. General Mean Field Models

Wigner matrices and their generalizations in Definition 2.1 are the most prominent mean field random matrix ensembles, but there are several other

214

18. FURTHER RESULTS AND HISTORICAL NOTES

models of interest. Here we give a partial list. A more detailed account of the results and citations can be found later on in the few representative references. The three-step strategy opens up a path to investigate local spectral statistics of a large class of matrices. In many examples the limiting density of the eigenvalues is not the semicircle anymore, so the Dyson Brownian motion is not in global equilibrium and Step 2 is more complicated. In the spirit of Dyson’s conjecture, however, gap universality already follows from thelocal equilibration of DBM. Indeed, a local version of the DBM was used as a basic tool in [25]. A recent development to compare local and global dynamics of DBM was implemented by a coupling and homogenization argument in [26]. It turns out that as long as the initial condition of the DBM is regular on a certain scale, the local gap statistics reaches its equilibrium, i.e., the Wigner-Dyson-Mehta distribition, on the corresponding time scale [93] (see also [65] for a similar result). This concept reduces the proof of universality to a proof of the corresponding local law. One natural class of mean field ensembles are the deformed Wigner matrices; these are of the form 𝐻 = 𝑉 + 𝑊, where 𝑉 is a diagonal matrix and 𝑊 is a standard Wigner matrix. In fact, after a trivial rescaling, these matrices may be viewed as instances of the DBM with initial condition 𝐻0 = 𝑉, recalling that the law of the DBM at time 𝑡 is given by 𝐻𝑡 = 𝑒−𝑡/2 𝑉 + (1 − 𝑒−𝑡 )1/2 𝑊. Assuming that the diagonal elements of 𝑉 have a limiting density 𝜚𝑉 , the limiting density of 𝐻𝑡 is given by the free convolution of 𝜚𝑉 with the semicircle density. The corresponding local law was proved in [94] followed by the proof of bulk universality in [97] and the edge universality [95, 97]. The free convolution of two arbitrary densities, 𝜚𝛼 ⊞ 𝜚𝛽 , naturally appear in random matrix theory as the limiting density of the sum of two asymptotically free matrices 𝐴 and 𝐵 whose limiting densities are given by 𝜚𝛼 and 𝜚𝛽 . Moreover, for any deterministic 𝐴 and 𝐵, the matrices 𝐴 and 𝑈𝐵𝑈 ∗ are asymptotically free [137] where 𝑈 is a Haar distributed unitary matrix. Thus, the eigenvalue density of 𝐴 + 𝑈𝐵𝑈 ∗ is given by the free convolution . The corresponding local law on the optimal scale in the bulk was given in [15]. Another way to generate a matrix ensemble with a limiting density different from the semicircle law is to consider Wigner-type matrices that are further generalizations of the universal Wigner matrices from Definition 2.1. These are complex Hermitian or real symmetric matrices with centered independent (up to symmetry) matrix elements but without the condition that the sum of the variances in each row is constant (2.5). The limiting density is determined by the matrix of variances 𝑆 by solving the corresponding Dyson-Schwinger equation. Depending on 𝑆, the density may exhibit square root singularity similar to the edge of the semicircle law or a cubic root singularity [5]. The corresponding optimal local law and bulk universality were proved in [4]. Different complications arise for adjacency matrices of sparse graphs. Each edge of the Erdős-Rényi graph is chosen independently with probability 𝑝, so it

18.7. BEYOND MEAN FIELD MODELS: BAND MATRICES

215

is a mean field model with a semicircle density, but a typical realization of the matrix contains many zero elements if 𝑝 ≪ 1. Optimal local law was proved in [56], and bulk universality for 𝑝 ≫ 𝑁 −1/3 in [53] and for 𝑝 ≫ 𝑁 −1 in [84]. Another prominent model of random graphs is the uniform measure on 𝑑-regular graphs . The elements of the adjacency matrix are weakly dependent since their sum in every row is 𝑑. For 𝑑 ≫ 1 the limiting density is the semicircle law, and optimal local law was obtained in [17]. Bulk universality was shown in the regime 1 ≪ 𝑑 ≪ 𝑁 2/3 in [16], Sample covariance matrices (2.9) and especially their deformations with finite rank deterministic matrices play an important role in statistics. The first application of the three-step strategy to prove bulk universality was presented in [64]. The edge universality was achieved in [90,96]. For applications in principal component analysis, the main focus is on the extreme eigenvalues and the outliers whose detailed analysis was given in [22]. Here we have listed references for papers using methods closely related to this book. There are many references in this subject that we are unable to include here. 18.7. Beyond Mean Field Models: Band Matrices Wigner predicted that universality should hold for any large quantum system, described by a Hamiltonian 𝐻, of sufficient complexity. After discretization of the underlying physical state space, the Hamilton operator is given by a matrix 𝐻 whose matrix elements 𝐻𝑥𝑦 describe the quantum transition rate from state 𝑥 to 𝑦. Generalized Wigner matrices correspond to a mean field situation where transition from any state 𝑥 to 𝑦 is possible. In contrast, random Schrödinger operators of the form −Δ + 𝑉 defined on ℤ𝑑 allow transition only to neighboring lattice sites with a random on-site interaction, so they are prototypes of disordered quantum systems with a nontrivial spatial structure. As explained in Section 18.5, from the point of view of Anderson phase transition, generalized Wigner matrices are always in the delocalized regime. Random Schrödinger operators in 𝑑 = 1 dimension, or more general random matrices 𝐻 representing one-dimensional Hamiltonians with short-range random hoppings, are in the localized regime. It is therefore natural to vary the range of interaction to detect the phase transition. One popular model interpolating between the Wigner matrices and random Schrödinger operator is the random band matrix (see Example 6.1). In this model the physical state space, which labels the matrix elements, is equipped with a distance. Band matrices are characterized by the property that 𝐻𝑖𝑗 becomes negligible if dist(𝑖, 𝑗) exceeds a certain parameter 𝑊, called the bandwidth. A fundamental conjecture [76] states that the local spectral statistics of a band matrix 𝐻 are governed by random matrix statistics for large 𝑊 and by Poisson statistics for small 𝑊. The transition is conjectured to be sharp [76,126] for the band matrices in one spatial dimension around the critical value 𝑊 = √𝑁.

216

18. FURTHER RESULTS AND HISTORICAL NOTES

In other words, if 𝑊 ≫ √𝑁, we expect the universality results of [26, 57, 63, 67] to hold. Furthermore, the eigenvectors of 𝐻 are expected to be completely delocalized in this range. For 𝑊 ≪ √𝑁, one expects that the eigenvectors are exponentially localized. This is the analogue of the celebrated Anderson metalinsulator transition for random band matrices. The only rigorous work indicating the √𝑁 threshold concerns the second mixed moments of the characteristic polynomial for a special class of Gaussian band matrices [120, 122]. The localization length for band matrices in one spatial dimension was recently investigated in numerous works. For general distribution of the matrix entries, eigenstates were proved to be localized [116] for 𝑊 ≪ 𝑁 1/8 , and delocalization of most eigenvectors in a certain averaged sense holds for 𝑊 ≫ 𝑁 6/7 [50, 51], improved to 𝑊 ≫ 𝑁 4/5 [54]. The Green’s function (𝐻 − 𝑧)−1 was controlled down to the scale Im 𝑧 ≫ 𝑊 −1 in [69], implying a lower bound of order 𝑊 for the localization length of all eigenvectors. When the entries are Gaussian with some specific covariance profiles, supersymmetry techniques are applicable to obtain stronger results. This approach was first developed by physicists (see [49] for an overview); the rigorous analysis was initiated by Spencer (see [126] for an overview) with an accurate estimate on the expected density of states on arbitrarily short scales for a three-dimensional band matrix ensemble in [44]. More recent works include universality for 𝑊 ≥ 𝑐𝑁 [121] and the control of the Green’s function down to the optimal scale Im 𝑧 ≫ 𝑁 −1 , hence delocalization in a strong sense for all eigenvectors, when 𝑊 ≫ 𝑁 6/7 [14] with first four moments matching the Gaussian ones (both results require a block structure and hold in part of the bulk spectrum). While delocalization and Wigner-Dyson-Mehta spectral statistics are expected to occur simultaneously, there is no rigorous argument directly linking them. The Dyson Brownian motion, the cornerstone of the three-step strategy, proves universality for matrices where each entry has a nontrivial Gaussian component and the comparison ideas require that second moments match exactly. Therefore, this approach cannot be applied to matrices with many zero entries. However, a combination of the quantum unique ergodicity with a mean field reduction strategy in [27] yields Wigner-Dyson-Mehta bulk universality for a large class of band matrices with general distribution in the large bandwidth regime 𝑊 ≥ 𝑐𝑁. In contrast to the bulk, universality at the spectral edge is much better understood: extreme eigenvalues follow the Tracy-Widom law for 𝑊 ≫ 𝑁 5/6 , an essentially optimal condition [124].

References [1] Aizenman, M., and Molchanov, S. Localization at large disorder and at extreme energies: an elementary derivation. Comm. Math. Phys. 157(2):245–278, 1993. [2] Aizenman, M., Sims, R., and Warzel, S. Absolutely continuous spectra of quantum tree graphs with weak disorder. Comm. Math. Phys. 264(2):371–389, 2006. doi:10.1007/s00220005-1468-5 [3] Aizenman, M., and Warzel, S. The canopy graph and level statistics for random operators on trees. Math. Phys. Anal. Geom. 9(4):291–333 (2007), 2006. doi:10.1007/s11040-0079018-3 [4] Ajanki,O., Erdős, L., and Krüger, T. Quadratic vector equations on complex upper halfplane. Preprint, 2015. arXiv:1506.05095 [math.PR]. [5] . Universality for general Wigner-type matrices. Probab. Theory Relat. Fields, to appear. doi:10.1007/s00440-016-0740-2. [6] Akemann, G., Baik, J., and Di Francesco, P., eds. The Oxford handbook of random matrix theory. Oxford University Press, Oxford, 2011. MR2920518. [7] Anantharaman, N., and Le Masson, E. Quantum ergodicity on large regular graphs. Duke Math. J. 164(4):723–765, 2015. doi:10.1215/00127094-2881592 [8] Anderson, G. W., Guionnet, A., and Zeitouni, O. An introduction to random matrices. Cambridge Studies in Advanced Mathematics, 118. Cambridge University Press, Cambridge, 2010. [9] Anderson, P. W. Absence of diffusion in certain random lattices. Phys. Rev. 109(5):1492– 1505, 1958. doi:10.1103/PhysRev.109.1492 [10] Bai, Z. D. Convergence rate of expected spectral distributions of large random matrices. I. Wigner matrices. Ann. Probab. 21(2):625–648, 1993. [11] Bai, Z. D., Miao, B., and Tsay, J. Convergence rates of the spectral distributions of large Wigner matrices. Int. Math. J. 1(1):65–90, 2002. [12] Bai, Z. D., and Yin, Y. Q. Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix. Ann. Probab. 21(3):1275–1294, 1993. [13] Bakry, D., and Émery, M. Diffusions hypercontractives. Séminaire de probabilités, XIX, 1983/84, 177–206. Lecture Notes in Mathematics, 1123. Springer, Berlin, 1985. doi:10.1007/BFb0075847 [14] Bao, Z., and Erdős, L. Delocalization for a class of random block band matrices. Probab. Theory Related Fields 1–104, 2016. doi:10.1007/s00440-015-0692-y [15] Bao, Z., Erdős, L., and Schnelli, K. Local law of addition of random matrices on optimal scale. Comm. Math. Phys. 349(3):947–990, 2017. doi:10.1007/s00220-016-2805-6 [16] Bauerschmidt, R., Huang, J., Knowles, A., and Yau, H.-T. Bulk eigenvalue statistics for random regular graphs. Preprint, 2015. arXiv:1505.06700 [math.PR]. [17] Bauerschmidt, R., Knowles, A., and Yau, H.-T. Local semicircle law for random regular graphs. Preprint, 2015. arXiv:1503.08702 [math.PR]. [18] Bekerman, F., Figalli, A., and Guionnet, A. Transport maps for 𝛽-matrix models and universality. Comm. Math. Phys. 338(2):589–619, 2015. doi:10.1007/s00220-015-2384-y 217

218

REFERENCES

[19] Ben Arous, G., and Péché, S. Universality of local eigenvalue statistics for some sample covariance matrices. Comm. Pure Appl. Math. 58(10):1316–1357, 2005. doi:10.1002/cpa.20070 [20] Bleher, P., and Its, A. Semiclassical asymptotics of orthogonal polynomials, RiemannHilbert problem, and universality in the matrix model. Ann. of Math. (2) 150(1):185–266, 1999. doi:10.2307/121101 [21] Bloemendal, A., Erdős, L., Knowles, A., Yau, H.-T., and Yin, J. Isotropic local laws for sample covariance and generalized Wigner matrices. Electron. J. Probab. 19(33):53 pp., 2014. [22] Bloemendal, A., Knowles, A, Yau, H.-T., and Yin, J. On the principal components of sample covariance matrices. Probab. Theory Related Fields 164(1-2):459–552, 2016. doi:10.1007/s00440-015-0616-x [23] Bourgade, P., Erdős, L., and Yau, H.-T. Bulk universality of general 𝛽-ensembles with nonconvex potential. J. Math. Phys. 53(9):095221, 19 pp., 2012. doi:10.1063/1.4751478 [24] . Edge universality of beta ensembles. Comm. Math. Phys. 332(1):261–353, 2014. doi:10.1007/s00220-014-2120-z [25] . Universality of general 𝛽-ensembles. Duke Math. J. 163(6):1127–1190, 2014. doi:10.1215/00127094-2649752 [26] Bourgade, P., Erdős, L., Yau, H.-T., and Yin, J. Fixed energy universality for generalized Wigner matrices. Comm. Pure Appl. Math. 69(10):1815–1881, 2016. doi:10.1002/cpa.21624 [27] . Universality for a class of random band matrices. Preprint, 2016. arXiv:1602.02312 [math.PR]. [28] Bourgade, P., and Yau, H.-T. The eigenvector moment flow and local quantum unique ergodicity. Comm. Math. Phys. 1–48, 2016. doi:10.1007/s00220-016-2627-6 [29] Brascamp, H. J., and Lieb, E. H. Best constants in Young’s inequality, its converse, and its generalization to more than three functions. Advances in Math. 20(2):151–173, 1976. doi:10.1016/0001-8708(76)90184-5 [30] Brézin, E., and Hikami, S. Correlations of nearby levels induced by a random potential. Nuclear Phys. B 479(3):697–706, 1996. doi:10.1016/0550-3213(96)00394-X [31] . Spectral form factor in a random matrix theory. Phys. Rev. E (3) 55(4):4067–4083, 1997. doi:10.1103/PhysRevE.55.4067 [32] Cacciapuoti, C., Maltsev, A., and Schlein, B. Bounds for the Stieltjes transform and the density of states of Wigner matrices. Probab. Theory Related Fields 163(1-2):1–59, 2015. doi:10.1007/s00440-014-0586-4 [33] Carlen, E. A., and Loss, M. Optimal smoothing and decay estimates for viscously damped conservation laws, with applications to the 2-D Navier-Stokes equation. Duke Math. J. 81(1):135–157 (1996), 1995. doi:10.1215/S0012-7094-95-08110-1 [34] Chen, T. Localization lengths and Boltzmann limit for the Anderson model at small disorders in dimension 3. J. Stat. Phys. 120(1):279–337, 2005. doi:10.1007/s10955-005-5255-7 [35] Colin de Verdière, Y. Ergodicité et fonctions propres du laplacien. Comm. Math. Phys. 102(3):497–502, 1985. [36] Davies, E. B. The functional calculus. J. London Math. Soc. (2) 52(1):166–176, 1995. doi:10.1112/jlms/52.1.166 [37] Deift, P. A. Orthogonal polynomials and random matrices: a Riemann-Hilbert approach. Courant Lecture Notes in Mathematics, 3. New York University, Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, R.I., 1999. [38] Deift, P., and Gioev, D. Universality at the edge of the spectrum for unitary, orthogonal, and symplectic ensembles of random matrices. Comm. Pure Appl. Math. 60(6):867–910, 2007. doi:10.1002/cpa.20164

REFERENCES

[39] [40]

[41]

[42]

[43] [44] [45] [46] [47] [48] [49] [50]

[51] [52] [53]

[54] [55] [56] [57] [58]

219

. Universality in random matrix theory for orthogonal and symplectic ensembles. Int. Math. Res. Pap. 2007(2):116, Art. ID rpm004, 2007. doi:10.1093/imrp/rpm004 . Random matrix theory: invariant ensembles and universality. Courant Lecture Notes in Mathematics, 18. Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, R.I., 2009. doi:10.1090/cln/018 Deift, P., Kriecherbauer, T., McLaughlin, K. T.-R., Venakides, S., and Zhou, X. Strong asymptotics of orthogonal polynomials with respect to exponential weights. Comm. Pure Appl. Math. 52(12):1491–1552, 1999. doi:10.1002/(SICI)10970312(199912)52:123.3.CO;2-R . Uniform asymptotics for polynomials orthogonal with respect to varying exponential weights and applications to universality questions in random matrix theory. Comm. Pure Appl. Math. 52(11):1335–1425, 1999. doi:10.1002/(SICI)10970312(199911)52:113.0.CO;2-1 Deuschel, J.-D., and Stroock, D. W. Large deviations. Pure and Applied Mathematics, 137. Academic Press, Boston, 1989. Disertori, M., Pinson, H., and Spencer, T. Density of states for random band matrices. Comm. Math. Phys. 232(1):83–124, 2002. doi:10.1007/s00220-002-0733-0 Dyson, F. J. A Brownian-motion model for the eigenvalues of a random matrix. J. Mathematical Phys. 3:1191–1198, 1962. doi:10.1063/1.1703862 . Statistical theory of the energy levels of complex systems. I, II, III. J. Mathematical Phys. 3:140–156, 1962. doi:10.1063/1.1703773 . Correlations between eigenvalues of a random matrix. Comm. Math. Phys. 19:235–250, 1970. Edmunds, D. E., and Triebel, H. Sharp Sobolev embeddings and related Hardy inequalities: the critical case. Math. Nachr. 207:79–92, 1999. doi:10.1002/mana.1999.3212070105 Efetov, K. Supersymmetry in disorder and chaos. Cambridge University Press, Cambridge, 1997. Erdős, L., and Knowles, A. Quantum diffusion and delocalization for band matrices with general distribution. Ann. Henri Poincaré 12(7):1227–1319, 2007. doi:10.1007/s00023-0110104-5 . Quantum diffusion and eigenfunction delocalization in a random band matrix model. Comm. Math. Phys. 303(2):509–554, 2011. doi:10.1007/s00220-011-1204-2 Erdős, L., Knowles, A., and Yau, H.-T. Averaging fluctuations in resolvents of random band matrices. Ann. Henri Poincaré 14(8):1837–1926, 2013. doi:10.1007/s00023-013-0235-y Erdős, L., Knowles, A., Yau, H.-T., and Yin, J. Spectral statistics of Erdős-Rényi Graphs II: Eigenvalue spacing and the extreme eigenvalues. Comm. Math. Phys. 314(3):587–640, 2012. doi:10.1007/s00220-012-1527-7 . Delocalization and diffusion profile for random band matrices. Comm. Math. Phys. 323(1):367–416, 2013. doi:10.1007/s00220-013-1773-3 . The local semicircle law for a general class of random matrices. Electron. J. Probab. 18(59):58 pp., 2013. doi:10.1214/EJP.v18-2473 . Spectral statistics of Erdős-Rényi graphs I: Local semicircle law. Ann. Probab. 41(3B):2279–2375, 2013. doi:10.1214/11-AOP734 Erdős, L., Péché, S., Ramírez, J. A., Schlein, B., and Yau, H.-T. Bulk universality for Wigner matrices. Comm. Pure Appl. Math. 63(7):895–925, 2010. doi:10.1002/cpa.20317 Erdős, L., Ramírez, J., Schlein, B., Tao, T., Vu, V., and Yau, H.-T. Bulk universality for Wigner Hermitian matrices with subexponential decay. Math. Res. Lett. 17(4):667–674, 2010. doi:10.4310/MRL.2010.v17.n4.a7

220

REFERENCES

[59] Erdős, L., Ramírez, J. A., Schlein, B., and Yau, H.-T. Universality of sine-kernel for Wigner matrices with a small Gaussian perturbation. Electron. J. Probab. 15(18):526–603, 2010. doi:10.1214/EJP.v15-768 [60] Erdős, L., Schlein, B., and Yau, H.-T. Local semicircle law and complete delocalization for Wigner random matrices. Comm. Math. Phys. 287(2):641–655, 2009. doi:10.1007/s00220008-0636-9 [61] . Semicircle law on short scales and delocalization of eigenvectors for Wigner random matrices. Ann. Probab. 37(3):815–852, 2009. doi:10.1214/08-AOP421 [62] . Wegner estimate and level repulsion for Wigner random matrices. Int. Math. Res. Not. IMRN 2010(3):436–479, 2010. doi:10.1093/imrn/rnp136 [63] . Universality of random matrices and local relaxation flow. Invent. Math. 185(1):75–119, 2011. doi:10.1007/s00222-010-0302-7 [64] Erdős, L., Schlein, B., Yau, H.-T., and Yin, J. The local relaxation flow approach to universality of the local statistics for random matrices. Ann. Inst. Henri Poincaré Probab. Stat. 48(1):1–46, 2012. doi:10.1214/10-AIHP388 [65] Erdős, L., and Schnelli, K. Universality for random matrix flows with time-dependent density. Preprint, 2015. arXiv:1504.00650 [math.PR]. [66] Erdős, L., and Yau, H.-T. A comment on the Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices. Electron. J. Probab. 17(28):5 pp., 2012. doi:10.1214/EJP.v171779 [67] . Gap universality of generalized Wigner and 𝛽-ensembles. J. Eur. Math. Soc. (JEMS) 17(8):1927–2036, 2015. doi:10.4171/JEMS/548 [68] Erdős, L., Yau, H.-T., and Yin, J. Universality for generalized Wigner matrices with Bernoulli distribution. J. Comb. 2(1):15–81, 2011. doi:10.4310/JOC.2011.v2.n1.a2 [69] . Bulk universality for generalized Wigner matrices. Probab. Theory Related Fields 154(1-2):341–407, 2012. doi:10.1007/s00440-011-0390-3 [70] . Rigidity of eigenvalues of generalized Wigner matrices. Adv. Math. 229(3):1435– 1515, 2012. doi:10.1016/j.aim.2011.12.010 [71] Firk, F. W. K., and Miller, S. J. Nuclei, primes and the random matrix connection. Symmetry 1:64–105. doi:10.3390/sym1010064 [72] Fokas, A. S., It.s, A. R., and Kitaev, A. V. The isomonodromy approach to matrix models in 2D quantum gravity. Comm. Math. Phys. 147(2):395–430, 1992. doi:10.1007/BF02096594 [73] Forrester, P. J. Log-gases and random matrices. London Mathematical Society Monographs Series, 34. Princeton University Press, Princeton, N.J., 2010. doi:10.1515/9781400835416 [74] Fröhlich, J., and Spencer, T. Absence of diffusion in the Anderson tight binding model for large disorder or low energy. Comm. Math. Phys. 88(2):151–184, 1983. [75] Fukushima, M., Ōshima, Y., and Takeda, M. Dirichlet forms and symmetric Markov processes. De Gruyter Studies in Mathematics, 19. Walter de Gruyter, Berlin, 1994. doi:10.1515/9783110889741 [76] Fyodorov, Y. V., and Mirlin, A. D. Scaling properties of localization in random band matrices: a 𝜍-model approach. Phys. Rev. Lett. 67(18):2405–2409, 1991. doi:10.1103/PhysRevLett.67.2405 [77] Girko, V. L. Asymptotics of the distribution of the spectrum of random matrices. Translated from Uspekhi Mat. Nauk 44(4(268)):7–34, 256, 1989; Russian Math. Surveys 44(4):3– 36, 1989. doi:10.1070/RM1989v044n04ABEH002143 [78] Götze, F., Naumov, A., and Tikhomirov, A. Local semicircle law under moment conditions. Part I: The Stieltjes transform. Preprint, 2015. arXiv:1510.07350 [math.PR]. [79] Gross, L. Logarithmic Sobolev inequalities. Amer. J. Math. 97(4):1061–1083, 1975. doi:10.2307/2373688

REFERENCES

221

[80] Guionnet, A., and Zeitouni, O. Concentration of the spectral measure for large matrices. Electron. Comm. Probab. 5:119–136, 2000. doi:10.1214/ECP.v5-1026 [81] Helffer, B., and Sjöstrand, J. On the correlation for Kac-like models in the convex case. J. Statist. Phys. 74(1-2):349–409, 1994. doi:10.1007/BF02186817 [82] Holowinsky, R. Sieving for mass equidistribution. Ann. of Math. (2) 172(2):1499–1516, 2010. [83] Holowinsky, R., and Soundararajan, K. Mass equidistribution for Hecke eigenforms. Ann. of Math. (2) 172(2):1517–1528, 2010. [84] Huang, J., Landon, B., and Yau, H.-T. Bulk universality of sparse random matrices. J. Math. Phys. 56(12):123301, 19 pp., 2015. doi:10.1063/1.4936139 [85] Itzykson, C., and Zuber, J. B. The planar approximation. II. J. Math. Phys. 21(3):411–421, 1980. doi:10.1063/1.524438 [86] Johansson, K. Universality of the local spacing distribution in certain ensembles of Hermitian Wigner matrices. Comm. Math. Phys. 215(3):683–705, 2001. doi:10.1007/s002200000328 [87] Klein, A. Absolutely continuous spectrum in the Anderson model on the Bethe lattice. Math. Res. Lett. 1(4):399–407, 1994. doi:10.4310/MRL.1994.v1.n4.a1 [88] Knowles, A., and Yin, J. Eigenvector distribution of Wigner matrices. Probab. Theory Related Fields 155(3-4):543–582, 2013. doi:10.1007/s00440-011-0407-y [89] . The isotropic semicircle law and deformation of Wigner matrices. Comm. Pure Appl. Math. 66(11):1663–1750, 2013. doi:10.1002/cpa.21450 [90] . Anisotropic local laws for random matrices. Probab. Theory Relat. Fields 2016, 1–96. doi:10.1007/s00440-016-0730-4 [91] Kriecherbauer, T., and Shcherbina, M. Fluctuations of eigenvalues of matrix models and their applications. Preprint, 2010. arXiv:1003.6121. [math-ph]. [92] Krishnapur, M., Rider, B., and Virág, B. Universality of the stochastic Airy operator. Comm. Pure Appl. Math. 69(1):145–199, 2016. doi:10.1002/cpa.21573 [93] Landon, B., and Yau, H.-T. Convergence of local statistics of Dyson Brownian motion. Comm. Math. Phys., to appear. [94] Lee, J. O., and Schnelli, K. Local deformed semicircle law and complete delocalization for Wigner matrices with random potential. J. Math. Phys. 54(10):103504, 62 pp., 2013. doi:10.1063/1.4823718 [95] . Edge universality for deformed Wigner matrices. Rev. Math. Phys. 27(8):1550018, 94 pp., 2015. doi:10.1142/S0129055X1550018X [96] . Tracy–Widom distribution for the largest eigenvalue of real sample covariance matrices with general population. Ann. Appl. Probab. 26(6):3786–3839, 2016. doi:10.1214/16-AAP1193 [97] Lee, J. O., Schnelli, K., Stetler, B., and Yau, H.-T. Bulk universality for deformed Wigner matrices. Ann. Probab. 44(3):2349–2425, 2016. doi:10.1214/15-AOP1023 [98] Lee, J. O., and Yin, J. A necessary and sufficient condition for edge universality of Wigner matrices. Duke Math. J. 163(1):117–173, 2014. doi:10.1215/00127094-2414767 [99] Levin, E., and Lubinsky, D. S. Universality limits in the bulk for varying measures. Adv. Math. 219(3):743–779, 2008. doi:10.1016/j.aim.2008.06.010 [100] Lieb, E. H., and Loss, M. Analysis. Second edition. Graduate Studies in Mathematics, 14. American Mathematical Society, Providence, R.I., 2001. doi:10.1090/gsm/014 [101] Lindenstrauss, E. Invariant measures and arithmetic quantum unique ergodicity. Ann. of Math. (2) 163(1):165–219, 2006. doi:10.4007/annals.2006.163.165 [102] Lubinsky, D. S. A new approach to universality limits involving orthogonal polynomials. Ann. of Math. (2) 170(2):915–939, 2009. doi:10.4007/annals.2009.170.915

222

REFERENCES

[103] Lytova, A., and Pastur, L. Central limit theorem for linear eigenvalue statistics of random matrices with independent entries. Ann. Probab. 37(5):1778–1840, 2009. doi:10.1214/09AOP452 [104] Marčenko, V. A., and Pastur, L. A. Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.) 72(114):507–536, 1967. [105] Mehta, M. L. A note on correlations between eigenvalues of a random matrix. Comm. Math. Phys. 20:245–250, 1971. [106] . Random matrices. Second edition. Academic Press, Boston, 1991. [107] Mehta, M. L., and Gaudin, M. On the density of eigenvalues of a random matrix. Nuclear Phys. 18:420–427, 1960. [108] Minami, N. Local fluctuation of the spectrum of a multidimensional Anderson tight binding model. Comm. Math. Phys. 177(3):709–725, 1996. [109] Naddaf, A., and Spencer, T. On homogenization and scaling limit of some gradient perturbations of a massless free field. Comm. Math. Phys. 183(1):55–84, 1997. doi:10.1007/BF02509796 [110] Pastur, L. A. Spectra of random selfadjoint operators. Uspehi Mat. Nauk 28(1(169)):3–64, 1973. [111] Pastur, L., and Shcherbina, M. On the edge universality of the local eigenvalue statistics of matrix models. Mat. Fiz. Anal. Geom. 10(3):335–365, 2003. [112] . Bulk universality and related properties of Hermitian matrix models. J. Stat. Phys. 130(2):205–250, 2008. doi:10.1007/s10955-007-9434-6 [113] . Eigenvalue distribution of large random matrices. Mathematical Surveys and Monographs, 171. American Mathematical Society, Providence, R.I., 2011. doi:10.1090/surv/171 [114] Pillai, N. S., and Yin, J. Universality of covariance matrices. Ann. Appl. Probab. 24(3):935– 1001, 2014. doi:10.1214/13-AAP939 [115] Rudnick, Z., and Sarnak, P. The behaviour of eigenstates of arithmetic hyperbolic manifolds. Comm. Math. Phys. 161(1):195–213, 1994. [116] Schenker, J. Eigenvector localization for random band matrices with power law band width. Comm. Math. Phys. 290(3):1065–1097, 2009. doi:10.1007/s00220-009-0798-0 [117] Shcherbina, M. Edge universality for orthogonal ensembles of random matrices. J. Stat. Phys. 136(1):35–50, 2009. doi:10.1007/s10955-009-9766-5 [118] . Central limit theorem for linear eigenvalue statistics of the Wigner and sample covariance random matrices. Zh. Mat. Fiz. Anal. Geom. 7(2):176–192, 197, 199, 2011. [119] . Change of variables as a method to study general 𝛽-models: bulk universality. J. Math. Phys. 55(4):043504, 23 pp., 2014. doi:10.1063/1.4870603 [120] Shcherbina, T. On the second mixed moment of the characteristic polynomials of 1D band matrices. Comm. Math. Phys. 328(1):45–82, 2014. doi:10.1007/s00220-014-1947-7 [121] . Universality of the local regime for the block band matrices with a finite number of blocks. J. Stat. Phys. 155(3):466–499, 2014. doi:10.1007/s10955-014-0964-4 [122] . Universality of the second mixed moment of the characteristic polynomials of the 1D band matrices: real symmetric case. J. Math. Phys. 56(6):063303, 23 pp., 2015. doi:10.1063/1.4922621 [123] Šnirel′ man, A. I. Ergodic properties of eigenfunctions. Uspehi Mat. Nauk 29(6(180)):181– 182, 1974. [124] Sodin, S. The spectral edge of some random band matrices. Ann. of Math. (2) 172(3):2223– 2251, 2010. doi:10.4007/annals.2010.172.2223 [125] Soshnikov, A. Universality at the edge of the spectrum in Wigner random matrices. Comm. Math. Phys. 207(3):697–733, 1999. doi:10.1007/s002200050743

REFERENCES

223

[126] Spencer, T. Random banded and sparse matrices. The Oxford handbook of random matrix theory, 471–488. Oxford University Press, Oxford, 2011. [127] Stroock, D. W. Probability theory, an analytic view. Cambridge University Press, Cambridge, 1993. [128] Tao, T. Topics in random matrix theory. Graduate Studies in Mathematics, 132. American Mathematical Society, Providence, R.I., 2012. doi:10.1090/gsm/132 [129] . The asymptotic distribution of a single eigenvalue gap of a Wigner matrix. Probab. Theory Related Fields 157(1-2):81–106, 2013. doi:10.1007/s00440-012-0450-3 [130] Tao, T., and Vu, V. Random matrices: localization of the eigenvalues and the necessity of four moments. Acta Math. Vietnam. 36(2):431–449, 2011. [131] . Random matrices: universality of local eigenvalue statistics. Acta Math. 206(1):127–204, 2011. doi:10.1007/s11511-011-0061-3 [132] . The Wigner-Dyson-Mehta bulk universality conjecture for Wigner matrices. Electron. J. Probab. 16(77):2104–2121, 2011. doi:10.1214/EJP.v16-944 [133] . Random matrices: universal properties of eigenvectors. Random Matrices Theory Appl. 1(1):1150001, 27 pp., 2012. doi:10.1142/S2010326311500018 [134] Tracy, C. A., and Widom, H. Level-spacing distributions and the Airy kernel. Comm. Math. Phys. 159(1):151–174, 1994. [135] . On orthogonal and symplectic matrix ensembles. Comm. Math. Phys. 177(3):727– 754, 1996. [136] Valkó, B., and Virág, B. Continuum limits of random matrices and the Brownian carousel. Invent. Math. 177(3):463–508, 2009. doi:10.1007/s00222-009-0180-z [137] Voiculescu, D. Limit laws for random matrices and free products. Invent. Math. 104(1):201–220, 1991. doi:10.1007/BF01245072 [138] Wigner, E. P. Characteristic vectors of bordered matrices with infinite dimensions. Ann. of Math. (2) 62:548–564, 1955. doi:10.2307/1970079 [139] Wishart, J. The generalised product moment distribution in samples from a normal multivariate population. Biometrika 20A(1/2):32–52, 1928. doi:10.2307/2331939 [140] Yau, H.-T. Relative entropy and hydrodynamics of Ginzburg-Landau models. Lett. Math. Phys. 22(1):63–80, 1991. doi:10.1007/BF00400379 [141] Zelditch, S. Uniform distribution of eigenfunctions on compact hyperbolic surfaces. Duke Math. J. 55(4):919–941, 1987. doi:10.1215/S0012-7094-87-05546-3

Index

𝛽-ensemble, 20

gap universality; averaged, fixed, 30, 204 Gaudin distribution, 2 Gaussian 𝛽-ensemble, 20, 32 Gaussian divisible ensemble, 4, 32 Gaussian orthogonal ensemble (GOE), 2, 8 Gaussian unitary ensemble (GUE), 2, 8 generalized Wigner matrix, 3, 8 generator, 115 Gibbs inequality, 123 Gibbs measure, 19 Green function, 35 Green function comparison, 183, 192, 195 Green function continuity, 171 Gross’ hypercontractivity bound, 136

admissible control parameter, 67 Airy function, 26 Airy kernel, 26 Anderson localization, 211 Anderson transition, 212, 215 backtracking sequence, 12 Bakry-Émery argument, 129 bandwidth, 215 Brascamp-Lieb inequality, 138, 145 Catalan numbers, 13 Christoffel-Darboux formula, 23 classical exponent, 19 classical location of eigenvalues, 106 concentration inequality, 133 correlation function, 22

Haar measure, 19, 214 Harish-Chandra-Itzykson-Zuber formula, 3 Helffer-Sjöstrand formula, 100 Herbst bound, 133 Hermite polynomial, 21

deformed Wigner matrix, 214 delocalization, 31, 212 determinantal correlation, 23 Dirichlet form, 114, 129, 151 Dirichlet form inequality, 155, 163 Dyson conjecture, 4 Dyson Brownian motion (DBM), 4, 32, 110 Dyson conjecture, 32, 116, 153

integrated density of states, 103 interlacing of eigenvalues, 47 invariant ensemble, 2, 17, 207 invariant measure, 32 isotropic local law, 39, 39 level repulsion, 111 local equilibrium, 4, 32 local relaxation flow, 154 local relaxation measure, 153 local semicircle law, 4, 39 log-gas, 20, 207 logarithmic Sobolev inequality (LSI), 129

edge universality, 191, 210 empirical distribution function, 103 empirical eigenvalue distribution, 103 empirical eigenvalue measure, 14 energy parameter, 14 energy universality; averaged, fixed, 29, 203 entropy inequality, 125

Marchenko-Pastur law, 11 Marcinkiewicz-Zygmund inequality, 58 mean field model, 8 moment matching, 188

fluctuation averaging, 40, 61, 73, 83 free convolution, 214 225

226

moment method, 12 Ornstein-Uhlenbeck process, 4, 31, 109, 171 partial expectation, 47, 62 Pinsker inequality, 125 Plancherel-Rotach asymptotics, 24, 26 quadratic large deviation, 58 quantum (unique) ergodicity, 213, 216 quaternion determinant, 25 random band matrix, 33, 215 random Schrödinger operator, 33, 211 regular graph, 215 regular potential, 208 relative entropy, 123 resolvent, 14, 35 resolvent decoupling identities, 62 Riemann-Hilbert method, 3 rigidity of eigenvalues, 31, 106, 208 sample covariance matrix, 9, 215 Schur formula, 45 self-consistent equation, 40, 49, 61

INDEX

sine-kernel, 24 single-entry distribution, 8 sparse graph, 214 spectral domain, 37, 67 spectral edge, 34 spectral gap, 132 spectral parameter, 14 Stieltjes transform, 14, 34, 100 stochastic domination, 37 three-step strategy, 4, 31, 205 Tracy-Widom law, 191 universal Wigner matrix, 7 Vandermonde determinant, 18 Ward identity, 63 weak local law, 39, 45 Wigner-Dyson-Mehta (WDM) conjecture, 2, 216 Wigner-type matrix, 214 Wigner ensemble, 2, 7 Wigner semicircle law, 11, 34 Wigner surmise, 2 Wishart matrix, 9

Published Titles in This Series 28 L´ aszl´ o Erd˝ os and Horng-Tzer Yau, A Dynamical Approach to Random Matrix Theory, 2017 27 S. R. S. Varadhan, Large Deviations, 2016 26 Jerome K. Percus and Stephen Childress, Mathematical Models in Developmental Biology, 2015 25 Kurt O. Friedrichs, Mathematical Methods of Electromagnetic Theory, 2014 24 Christof Sch¨ utte and Marco Sarich, Metastability and Markov State Models in Molecular Dynamics, 2013 23 Jerome K. Percus, Mathematical Methods in Immunology, 2011 22 Frank C. Hoppensteadt, Mathematical Methods for Analysis of a Complex Disease, 2011 21 Frank C. Hoppensteadt, Quasi-Static State Analysis of Differential, Difference, Integral, and Gradient Systems, 2010 20 Pierpaolo Esposito, Nassif Ghoussoub, and Yujin Guo, Mathematical Analysis of Partial Differential Equations Modeling Electrostatic MEMS, 2010 19 Stephen Childress, An Introduction to Theoretical Fluid Mechanics, 2009 18 Percy Deift and Dimitri Gioev, Random Matrix Theory: Invariant Ensembles and Universality, 2009 17 Ping Zhang, Wigner Measure and Semiclassical Limits of Nonlinear Schr¨ odinger Equations, 2008 16 S. R. S. Varadhan, Stochastic Processes, 2007 15 Emil Artin, Algebra with Galois Theory, 2007 14 Peter D. Lax, Hyperbolic Partial Differential Equations, 2006 13 Oliver B¨ uhler, A Brief Introduction to Classical, Statistical, and Quantum Mechanics, 2006 12 J¨ urgen Moser and Eduard J. Zehnder, Notes on Dynamical Systems, 2005 11 V. S. Varadarajan, Supersymmetry for Mathematicians: An Introduction, 2004 10 Thierry Cazenave, Semilinear Schr¨ odinger Equations, 2003 9 Andrew Majda, Introduction to PDEs and Waves for the Atmosphere and Ocean, 2003 8 Fedor Bogomolov and Tihomir Petrov, Algebraic Curves and One-Dimensional Fields, 2002 7 S. R. S. Varadhan, Probability Theory, 2001 6 Louis Nirenberg, Topics in Nonlinear Functional Analysis, 2001 5 Emmanuel Hebey, Nonlinear Analysis on Manifolds: Sobolev Spaces and Inequalities, 2000 3 Percy Deift, Orthogonal Polynomials and Random Matrices: A Riemann-Hilbert Approach, 2000 2 Jalal Shatah and Michael Struwe, Geometric Wave Equations, 2000 1 Qing Han and Fanghua Lin, Elliptic Partial Differential Equations, Second Edition, 2011

A Dynamical Approach to Random Matrix Theory L Á S Z L Ó E R DO˝ S A ND HO R NG-T ZER YAU

This book is a concise and self-contained introduction of recent techniques to prove local spectral universality for large random matrices. Random matrix theory is a fast expanding research area, and this book mainly focuses on the methods that the authors participated in developing over the past few years. Many other interesting topics are not included, and neither are several new developments within the framework of these methods. The authors have chosen instead to present key concepts that they believe are the core of these methods and should be relevant for future applications. They keep technicalities to a minimum to make the book accessible to graduate students. With this in mind, they include in this book the basic notions and tools for high-dimensional analysis, such as large deviation, entropy, Dirichlet form, and the logarithmic Sobolev inequality. This manuscript has been developed and continuously improved over the last five years. The authors have taught this material in several regular graduate courses at Harvard, Munich, and Vienna, in addition to various summer schools and short courses.

For additional information and updates on this book, visit www.ams.org/bookpages/cln-28

CLN/28

AMS on the Web

New York University

www.ams.org