Large deviations for random graphs : École d'Été de Probabilités de Saint-Flour XLV - 2015 978-3-319-65816-2, 3319658166, 978-3-319-65815-5

This book addresses the emerging body of literature on the study of rare events in random graphs and networks. For examp


English Pages 170 [175] Year 2017


Table of contents :
Front Matter ....Pages i-xi
Introduction (Sourav Chatterjee)....Pages 1-6
Preparation (Sourav Chatterjee)....Pages 7-25
Basics of Graph Limit Theory (Sourav Chatterjee)....Pages 27-41
Large Deviation Preliminaries (Sourav Chatterjee)....Pages 43-51
Large Deviations for Dense Random Graphs (Sourav Chatterjee)....Pages 53-70
Applications of Dense Graph Large Deviations (Sourav Chatterjee)....Pages 71-97
Exponential Random Graph Models (Sourav Chatterjee)....Pages 99-117
Large Deviations for Sparse Graphs (Sourav Chatterjee)....Pages 119-164
Back Matter ....Pages 165-170


Lecture Notes in Mathematics  2197

École d'Été de Probabilités de Saint-Flour

Sourav Chatterjee

Large Deviations for Random Graphs: École d'Été de Probabilités de Saint-Flour XLV - 2015

Lecture Notes in Mathematics Editors-in-Chief: Jean-Michel Morel, Cachan Bernard Teissier, Paris Advisory Board: Michel Brion, Grenoble Camillo De Lellis, Zurich Alessio Figalli, Zurich Davar Khoshnevisan, Salt Lake City Ioannis Kontoyiannis, Athens Gábor Lugosi, Barcelona Mark Podolskij, Aarhus Sylvia Serfaty, New York Anna Wienhard, Heidelberg

More information about this series at http://www.springer.com/series/304


Sourav Chatterjee Department of Statistics Stanford University Stanford, CA, USA

ISSN 0075-8434 ISSN 1617-9692 (electronic) Lecture Notes in Mathematics ISSN 0721-5363 École d’Été de Probabilités de Saint-Flour ISBN 978-3-319-65815-5 ISBN 978-3-319-65816-2 (eBook) DOI 10.1007/978-3-319-65816-2 Library of Congress Control Number: 2017951112 Mathematics Subject Classification (2010): 60-XX © Springer International Publishing AG 2017 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Printed on acid-free paper This Springer imprint is published by Springer Nature The registered company is Springer International Publishing AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To the memory of my grandfather, Tarapada Chatterjee

Preface

These lecture notes were prepared for the 45th Saint-Flour Probability Summer School in July 2015. They contain an exposition of the main developments in the large deviations theory for random graphs in the last few years. I have tried to make the exposition as self-contained as possible, so that the reader will have essentially no need for looking up external results. For example, the necessary components of graph limit theory are developed from scratch. Instead of going through Szemerédi’s regularity lemma, I have taken an alternative analytic approach that develops the discrete and continuous theory “at one go”, making it unnecessary to use martingales for passing from discrete to continuous. Similarly, the required results from classical large deviations theory and concentration of measure are also developed from scratch.

After the above preparatory materials, the main topics covered here are large deviations theory for dense random graphs, exponential random graph models, nonlinear large deviations, and large deviations for sparse random graphs.

I have tried to write the monograph in a way that is accessible to beginning graduate students in mathematics and statistics, with some background in graduate-level probability theory. Advanced readers may find some parts of the exposition to be too elementary. To avoid clutter, references to the literature are not given within the main material but summarized at the end of each chapter. I have tried to be as comprehensive as possible in my literature review, and I apologize for any inadvertent omission.

It was a matter of great honor for me to be invited to deliver these lectures, which have a legendary status in the world of probability theory. I thank the scientific committee for selecting me for this honor and, moreover, for allowing me to delay my lectures by 1 year after I had to cancel in 2014 due to the birth of my son. I am especially grateful to the local organizers, Laurent Serlet and Christophe Bahadoran, for taking care of every little practical detail during my visit. It was a great pleasure and a great learning experience to interact with my co-lecturers, Sara van de Geer and Lorenzo Zambotti. The students and other attendees at Saint-Flour were a source of inspiration. It is every lecturer’s dream to have an audience of that caliber.

The development of large deviations for random graphs involved a number of my colleagues, who have contributed greatly to the topic at various points. I would like to take this opportunity to acknowledge their contributions. In particular, I would like to thank Bhaswar Bhattacharya, Amir Dembo, Shirshendu Ganguly, Eyal Lubetzky, Charles Radin, Raghu Varadhan, Mei Yin, and Yufei Zhao for the many useful interactions I have had with them over the last few years. I am grateful to Alexander Dunlap for carefully reading the manuscript and pointing out many typos and mistakes. The responsibility for any error that still remains is solely mine. Finally, a very special thanks to my wife and son for bringing joy and fulfillment to my life, to my parents for their good wishes, and to Persi Diaconis for his advice and encouragement. Stanford, CA, USA June 2017

Sourav Chatterjee

Contents

1 Introduction
  1.1 Large Deviations
  1.2 The Problem with Nonlinearity
  1.3 Recent Developments
  References

2 Preparation
  2.1 Probabilistic Preliminaries
  2.2 Discrete Approximations of $L^2$ Functions
  2.3 The Weak Topology and Its Compactness
  2.4 Compact Operators
  2.5 A Generalized Hölder’s Inequality
  2.6 The FKG Inequality
  Bibliographical Notes
  References

3 Basics of Graph Limit Theory
  3.1 Graphons and Homomorphism Densities
  3.2 The Cut Metric
  3.3 Equivalence Classes of Graphons
  3.4 Graphons as Operators
  3.5 Compactness of the Space of Graphons
  Bibliographical Notes
  References

4 Large Deviation Preliminaries
  4.1 Definition of Rate Function
  4.2 A Local-to-Global Transference Principle
  4.3 A General Upper Bound
  4.4 The Azuma–Hoeffding Inequality
  4.5 McDiarmid’s Inequality
  Bibliographical Notes
  References

5 Large Deviations for Dense Random Graphs
  5.1 The Erdős–Rényi Model
  5.2 The Rate Function
  5.3 Large Deviation Upper Bound in the Weak Topology
  5.4 Large Deviation Principle for Dense Random Graphs
  5.5 Conditional Distributions
  Bibliographical Notes
  References

6 Applications of Dense Graph Large Deviations
  6.1 Graph Parameters
  6.2 Rate Functions for Graph Parameters
  6.3 Large Deviations for Graph Parameters
  6.4 Large Deviations for Subgraph Densities
  6.5 Euler–Lagrange Equations
  6.6 The Symmetric Phase
  6.7 Symmetry Breaking
  6.8 The Phase Boundary for Regular Subgraphs
  6.9 The Double Phase Transition
  6.10 The Lower Tail and Sidorenko’s Conjecture
  6.11 Large Deviations for the Largest Eigenvalue
  Bibliographical Notes
  References

7 Exponential Random Graph Models
  7.1 Formal Definition Using Graphons
  7.2 Normalizing Constant
  7.3 Asymptotic Structure
  7.4 An Explicitly Solvable Case
  7.5 Another Solvable Example
  7.6 Phase Transition in the Edge-Triangle Model
  7.7 Euler–Lagrange Equations
  7.8 The Symmetric Phase
  7.9 Symmetry Breaking
  Bibliographical Notes
  References

8 Large Deviations for Sparse Graphs
  8.1 An Approximation for Partition Functions
  8.2 Gradient Complexity of Homomorphism Densities
  8.3 Quantitative Estimates for Exponential Random Graphs
  8.4 Nonlinear Large Deviations
  8.5 Quantitative Estimates for Homomorphism Densities
  8.6 Explicit Rate Function for Triangles
  Bibliographical Notes
  References

Index

Chapter 1

Introduction

This introductory chapter lays out the general plan of the book and gives a quick description of the main issues, reproducing from the first three sections of my recent survey article [6].

© Springer International Publishing AG 2017 S. Chatterjee, Large Deviations for Random Graphs, Lecture Notes in Mathematics 2197, DOI 10.1007/978-3-319-65816-2_1

1.1 Large Deviations

The theory of large deviations studies two things: (a) the probabilities of rare events, and (b) the conditional probabilities of events given that some rare event has occurred. Often, the second question is more interesting than the first, but it is usually essential to answer the first question to be able to understand how to approach the second.

As an illustration, consider the following simple example. Toss a fair coin $n$ times, where $n$ is a large number. Under normal circumstances, you expect to get approximately $n/2$ heads. Also, you expect to get roughly $n/4$ pairs of consecutive heads. However, suppose that the following rare event occurs: the tosses yield $\ge 2n/3$ heads. Classical large deviations theory allows us to compute that the probability of this rare event is

$$e^{-n \log(2^{5/3}/3)(1+o(1))} \tag{1.1.1}$$

as $n \to \infty$. Moreover, it can be shown that if this rare event has occurred, then it is highly likely that there are approximately $4n/9$ pairs of consecutive heads instead of the usual $n/4$.

So, how is the estimate (1.1.1) obtained? The argument goes roughly as follows. Let $X_1, \ldots, X_n$ be independent random variables, such that $P(X_i = 0) = P(X_i = 1) = 1/2$ for each $i$. Then the number of heads in $n$ tosses of a fair coin can be modeled by the sum $S_n := X_1 + \cdots + X_n$. For any $\theta \ge 0$,

$$\begin{aligned}
P(S_n \ge 2n/3) &= P(e^{\theta S_n} \ge e^{2\theta n/3}) \\
&\le \frac{E(e^{\theta S_n})}{e^{2\theta n/3}} && \text{(Markov’s inequality)} \\
&= \frac{E\bigl(\prod_{i=1}^n e^{\theta X_i}\bigr)}{e^{2\theta n/3}} = \frac{\prod_{i=1}^n E(e^{\theta X_i})}{e^{2\theta n/3}} && \text{(independence)} \\
&= e^{-2\theta n/3} \left(\frac{1 + e^{\theta}}{2}\right)^n .
\end{aligned}$$

Optimizing over $\theta$ gives the desired upper bound. To prove the lower bound, take some $\epsilon > 0$ and define a random variable $Z$ as:

$$Z := \begin{cases} 1 & \text{if } 2n/3 \le S_n \le (1+\epsilon)2n/3, \\ 0 & \text{otherwise.} \end{cases}$$

Then for any $\theta \ge 0$,

$$\begin{aligned}
P(S_n \ge 2n/3) \ge E(Z) &\ge e^{-\theta(1+\epsilon)2n/3}\, E(e^{\theta S_n} Z) && \text{(since } S_n \le (1+\epsilon)2n/3 \text{ when } Z \ne 0\text{)} \\
&= e^{-\theta(1+\epsilon)2n/3} \left(\frac{1 + e^{\theta}}{2}\right)^n \frac{E(e^{\theta S_n} Z)}{E(e^{\theta S_n})} .
\end{aligned}$$

The proof is completed by showing that if $\theta$ is chosen to be the same number that optimized the upper bound and $\epsilon$ is sent to zero sufficiently slowly as $n \to \infty$, then

$$\frac{E(e^{\theta S_n} Z)}{E(e^{\theta S_n})} = e^{o(n)} .$$

Establishing the above claim is the most nontrivial part of the whole argument, but is by now standard. This is sometimes called the ‘change of measure trick’.

The above example has a built-in linearity, which allowed us to explicitly compute $E(e^{\theta S_n})$. Generalizing this idea, classical large deviations theory possesses a collection of powerful tools to deal with linear functionals of independent random variables, random vectors, random functions, random probability measures and other abstract random objects. The classic text of Dembo and Zeitouni [10] contains an in-depth introduction to this broad area.
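The optimization over $\theta$ in the upper bound of Section 1.1 can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names `chernoff_exponent` and `exact_rate` are illustrative choices, and the comparison with the exact binomial tail is an added sanity check. It verifies that the per-toss exponent $-2\theta/3 + \log((1+e^{\theta})/2)$ is minimized at $\theta = \log 2$, where it equals $-\log(2^{5/3}/3)$, the exponent in (1.1.1).

```python
import math

def chernoff_exponent(theta):
    """Per-toss exponent of the bound e^{-2*theta*n/3} * ((1 + e^theta) / 2)^n."""
    return -2.0 * theta / 3.0 + math.log((1.0 + math.exp(theta)) / 2.0)

def exact_rate(n):
    """Return -(1/n) * log P(S_n >= 2n/3) for n fair tosses, from the exact binomial tail."""
    k0 = (2 * n + 2) // 3  # smallest integer that is >= 2n/3
    tail = sum(math.comb(n, k) for k in range(k0, n + 1))
    return -(math.log(tail) - n * math.log(2.0)) / n

# Setting the derivative to zero gives e^theta / (1 + e^theta) = 2/3, i.e. theta = log 2.
theta_star = math.log(2.0)
rate = math.log(2.0 ** (5.0 / 3.0) / 3.0)  # exponent appearing in (1.1.1)

# Independent check of the minimizer by a coarse grid search over [0, 3].
grid = [i / 1000.0 for i in range(3001)]
theta_grid = min(grid, key=chernoff_exponent)

print("theta*:", theta_star, "grid minimizer:", theta_grid)
print("rate:", rate, "optimized bound exponent:", -chernoff_exponent(theta_star))
for n in (30, 300, 3000):
    print(n, exact_rate(n))
```

Because the Chernoff bound is an upper bound on the probability, `exact_rate(n)` never falls below `rate`; the gap shrinks as $n$ grows, which is the content of the $1 + o(1)$ factor in (1.1.1).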


1.2 The Problem with Nonlinearity

In spite of the remarkable progress with linear functionals, there are no general tools for large deviations of nonlinear functionals. Nonlinearity arises naturally in many contexts. For instance, the analysis of real-world networks has been one of the most popular scientific endeavors in the last two decades, and rare events on networks are often nonlinear in nature. This is demonstrated by the following simple example.

Construct a random graph on $n$ vertices by putting an undirected edge between any two with probability $p$, independently of each other. This is known as the Erdős–Rényi $G(n, p)$ model, originally defined by Erdős and Rényi [12]. The model is too simplistic to be a model for any real-world network, but has many nice mathematical properties and has led to the development of many new techniques in combinatorics and probability theory over the years. One can ask the following large deviation questions about this model:

(a) What is the probability that the number of triangles in a $G(n, p)$ random graph is at least $1 + \delta$ times the expected value of the number of triangles, where $\delta$ is some given number?
(b) What is the most likely structure of the graph, if we know that the above rare event has occurred?

This is an example of a nonlinear problem, because the number of triangles in $G(n, p)$ is a degree three polynomial of independent random variables. To see this, let $\{1, \ldots, n\}$ be the set of vertices, and let $X_{ij}$ be the random variable that is 1 if the edge $\{i, j\}$ is present in the graph and 0 if not. Then $(X_{ij})_{1 \le i}$ […] $p$; and if $\delta_1 < \delta < \delta_2$, then the conditional structure is not like an Erdős–Rényi graph. Explicit formulas for $\delta_1$ and $\delta_2$ were derived by Lubetzky and Zhao [17].

In other words, if the number of triangles exceeds the expected value by a little bit or by a lot, then the most likely scenario is that there is an excess number of edges spread uniformly; and if the surplus amount belongs to a middle range, then the structure of the graph is likely to be inhomogeneous. There is probably no way that the above result could have been guessed from intuition; it was derived purely from a set of mathematical formulas.

The general theory of [9] and its main results and applications are described in Chaps. 5 and 6, after reviewing some preparatory materials in Chap. 2, an introduction to graph limit theory in Chap. 3, and an introduction to classical large deviations theory in Chap. 4.

The large deviation theory for the Erdős–Rényi model has been extended to more realistic models of random graphs. For example, it was applied to exponential random graph models in Chatterjee and Diaconis [8] and a number of subsequent papers by Charles Radin, Mei Yin, Rick Kenyon and their collaborators [1, 13, 14, 19, 20, 23]. These models are widely used in the analysis of real social networks. Applications of random graph large deviations to exponential random graph models are discussed in Chap. 7.

The theory developed in [9] has one serious limitation: it applies only to dense graphs. A graph is called dense if the average vertex degree is comparable to the total number of vertices (recall that the number of neighbors of a vertex is called its degree). For example, in the Erdős–Rényi model with $n = 10000$ and $p = 0.3$, the average degree is roughly 3000. This is not true for real networks, which are usually sparse. Unfortunately, the graph theoretic tools used for the analysis of large deviations for random graphs are useful only in the dense setting. In spite of considerable progress in developing a theory of sparse graph limits [3–5], there is still no result that fully captures the power of Szemerédi’s lemma in the sparse setting. In the absence of such tools, a nascent theory of ‘nonlinear large deviations’, developed in Chatterjee and Dembo [7] and recently improved by Eldan [11], has been helpful in solving some questions about large deviations for sparse random graphs, for example in Lubetzky and Zhao [18] and Bhattacharya, Ganguly, Lubetzky and Zhao [2]. This theory is discussed in Chap. 8.
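The statement in Section 1.2 that the triangle count of $G(n, p)$ is a degree three polynomial in the edge indicators $X_{ij}$ can be made concrete with a short simulation. The Python sketch below is illustrative only: the helper names `sample_gnp`, `triangle_count` and `expected_triangles` are hypothetical, and it uses the standard formula $\sum_{i<j<k} X_{ij} X_{jk} X_{ik}$ for the count together with its mean $\binom{n}{3} p^3$. Brute-force enumeration of this kind is practical only for small $n$.

```python
import itertools
import math
import random

def sample_gnp(n, p, rng):
    """Sample the edge indicators X_ij (i < j) of G(n, p) as a dict keyed by (i, j)."""
    return {(i, j): 1 if rng.random() < p else 0
            for i, j in itertools.combinations(range(n), 2)}

def triangle_count(n, x):
    """Evaluate the degree-three polynomial sum_{i<j<k} X_ij * X_jk * X_ik."""
    return sum(x[i, j] * x[j, k] * x[i, k]
               for i, j, k in itertools.combinations(range(n), 3))

def expected_triangles(n, p):
    """Mean of the triangle count: each of the C(n, 3) triples is a triangle w.p. p^3."""
    return math.comb(n, 3) * p ** 3

rng = random.Random(0)  # fixed seed so that runs are reproducible
n, p = 30, 0.3
counts = [triangle_count(n, sample_gnp(n, p, rng)) for _ in range(200)]
print("sample mean:", sum(counts) / len(counts))
print("expected   :", expected_triangles(n, p))
```

The large deviation questions (a) and (b) concern the event that `triangle_count` exceeds $(1 + \delta)$ times `expected_triangles`; naive sampling of this kind almost never observes such events once $n$ is large, which is one reason an analytic theory is needed.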

References

1. Aristoff, D., & Radin, C. (2013). Emergent structures in large networks. Journal of Applied Probability, 50(3), 883–888.
2. Bhattacharya, B. B., Ganguly, S., Lubetzky, E., & Zhao, Y. (2015). Upper tails and independence polynomials in random graphs. arXiv preprint arXiv:1507.04074.
3. Bollobás, B., & Riordan, O. (2009). Metrics for sparse graphs. In Surveys in combinatorics 2009, vol. 365, London Mathematical Society Lecture Note Series (pp. 211–287). Cambridge: Cambridge University Press.
4. Borgs, C., Chayes, J. T., Cohn, H., & Zhao, Y. (2014). An Lp theory of sparse graph convergence I: Limits, sparse random graph models, and power law distributions. arXiv preprint arXiv:1401.2906.
5. Borgs, C., Chayes, J. T., Cohn, H., & Zhao, Y. (2014). An Lp theory of sparse graph convergence II: LD convergence, quotients, and right convergence. arXiv preprint arXiv:1408.0744.
6. Chatterjee, S. (2016). An introduction to large deviations for random graphs. Bulletin of the American Mathematical Society, 53(4), 617–642.
7. Chatterjee, S., & Dembo, A. (2016). Nonlinear large deviations. Advances in Mathematics, 299, 396–450.
8. Chatterjee, S., & Diaconis, P. (2013). Estimating and understanding exponential random graph models. Annals of Statistics, 41(5), 2428–2461.
9. Chatterjee, S., & Varadhan, S. R. S. (2011). The large deviation principle for the Erdős–Rényi random graph. European Journal of Combinatorics, 32(7), 1000–1017.
10. Dembo, A., & Zeitouni, O. (2010). Large deviations techniques and applications. Corrected reprint of the second (1998) edition. Berlin: Springer.
11. Eldan, R. (2016). Gaussian-width gradient complexity, reverse log-Sobolev inequalities and nonlinear large deviations. arXiv preprint arXiv:1612.04346.
12. Erdős, P., & Rényi, A. (1960). On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Sciences, 5, 17–61.
13. Kenyon, R., Radin, C., Ren, K., & Sadun, L. (2014). Multipodal structure and phase transitions in large constrained graphs. arXiv preprint arXiv:1405.0599.
14. Kenyon, R., & Yin, M. (2014). On the asymptotics of constrained exponential random graphs. arXiv preprint arXiv:1406.3662.
15. Kim, J. H., & Vu, V. H. (2000). Concentration of multivariate polynomials and its applications. Combinatorica, 20(3), 417–434.
16. Latała, R. (1997). Estimation of moments of sums of independent real random variables. The Annals of Probability, 25(3), 1502–1513.
17. Lubetzky, E., & Zhao, Y. (2015). On replica symmetry of large deviations in random graphs. Random Structures & Algorithms, 47(1), 109–146.
18. Lubetzky, E., & Zhao, Y. (2017). On the variational problem for upper tails of triangle counts in sparse random graphs. Random Structures & Algorithms, 50(3), 420–436.
19. Radin, C., & Sadun, L. (2013). Phase transitions in a complex network. Journal of Physics A, 46, 305002.
20. Radin, C., & Yin, M. (2011). Phase transitions in exponential random graphs. arXiv preprint arXiv:1108.0649.
21. Talagrand, M. (1995). Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l’Institut des Hautes Études Scientifiques, 81, 73–205.
22. Vu, V. H. (2002). Concentration of non-Lipschitz functions and applications. Probabilistic methods in combinatorial optimization. Random Structures & Algorithms, 20(3), 262–316.
23. Yin, M. (2013). Critical phenomena in exponential random graphs. Journal of Statistical Physics, 153(6), 1008–1021.

Chapter 2

Preparation

Let $[0,1]^d$ be the $d$-dimensional unit hypercube. The cases $d = 1$ and $d = 2$ are the relevant ones in this manuscript. This chapter summarizes some basic facts about $L^2([0,1]^d)$. I will assume that the reader is familiar with Lebesgue measure, Borel sigma-algebra, integration, conditional expectation, basic results about integrals such as Fatou’s lemma, monotone convergence theorem, dominated convergence theorem, Hölder’s inequality and Minkowski’s inequality, and elementary notions from topology such as the abstract definition of a topological space and the properties of continuous functions.

© Springer International Publishing AG 2017 S. Chatterjee, Large Deviations for Random Graphs, Lecture Notes in Mathematics 2197, DOI 10.1007/978-3-319-65816-2_2

2.1 Probabilistic Preliminaries

This section summarizes some basic results from probability that are widely used in this monograph. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $f : \Omega \to \mathbb{R}$ be a measurable function. Let $R$ be an interval containing the range of $f$, and let $\phi : R \to \mathbb{R}$ be a convex function, meaning that for all $x, y \in R$ and $t \in [0,1]$,

$$\phi(tx + (1-t)y) \le t\phi(x) + (1-t)\phi(y).$$

Proposition 2.1 (Jensen’s Inequality) Let $\Omega$, $P$, $\phi$ and $f$ be as above. Let

$$m := \int_\Omega f(x)\, dP(x).$$

Then

$$\phi(m) \le \int_\Omega \phi(f(x))\, dP(x).$$

Moreover, if $\phi$ is nonlinear in every open neighborhood of $m$, then equality holds in the above inequality if and only if $f = m$ $P$-almost surely.

Proof. Note that $m \in R$. Since $\phi$ is a convex function, there exist $a, b \in \mathbb{R}$ such that $ax + b \le \phi(x)$ for all $x \in R$ and $am + b = \phi(m)$. To see this, simply observe that by the convexity of $\phi$,

$$\liminf_{x \downarrow m} \frac{\phi(x) - \phi(m)}{x - m} \ge \limsup_{x \uparrow m} \frac{\phi(m) - \phi(x)}{m - x},$$

choose $a$ to be a number between these two limits, and choose $b$ to satisfy $am + b = \phi(m)$. With these choices, the required properties follow from convexity. Having obtained $a$ and $b$, observe that

$$\int_\Omega \phi(f(x))\, dP(x) \ge \int_\Omega (a f(x) + b)\, dP(x) = am + b = \phi(m),$$

which is the desired inequality. If $\phi$ is nonlinear in every open neighborhood of $m$, then the convexity of $\phi$ implies that with a suitable choice of $a$, it can be guaranteed that $\phi(x) > ax + b$ for all $x \ne m$. Thus, equality can hold if and only if $f = m$ $P$-almost surely. □

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Recall that $L^2(\Omega, \mathcal{F}, P)$ is the space of all $\mathcal{F}$-measurable $f : \Omega \to \mathbb{R}$ such that

$$\|f\| := \left( \int_\Omega f(x)^2\, dP(x) \right)^{1/2} < \infty.$$

This is actually a normed space of equivalence classes, where two functions $f$ and $g$ are said to be equivalent if $\|f - g\| = 0$, which is the same as saying $f = g$ except possibly on a set of measure zero. We will, however, always treat the elements of this space as functions rather than as equivalence classes. The most basic fact about $L^2$ is that it is a complete metric space. To prove this, we need two lemmas.

Lemma 2.1 (Chebychev’s Inequality) For any $f \in L^2(\Omega, \mathcal{F}, P)$ and $\epsilon > 0$,

$$P(\{x \in \Omega : |f(x)| \ge \epsilon\}) \le \frac{\|f\|^2}{\epsilon^2}.$$

Proof. Let $g$ be a function which is 1 if $|f(x)| \ge \epsilon$ and 0 otherwise. Then $g \le |f|/\epsilon$ everywhere. Therefore $\|g\|^2 \le \|f\|^2/\epsilon^2$, which is exactly the statement of the lemma. □


Lemma 2.2 (Borel–Cantelli Lemma) If $\{A_n\}_{n \ge 1}$ is a sequence of sets in $\mathcal{F}$ such that

$$\sum_{n=1}^{\infty} P(A_n) < \infty,$$

then

$$P(\{x : x \in \text{infinitely many } A_n\text{’s}\}) = 0.$$

Proof. Let $N(x)$ be the number of $n$ such that $x \in A_n$. By the monotone convergence theorem,

$$\|\sqrt{N}\|^2 = \int_\Omega N(x)\, dP(x) = \sum_{n=1}^{\infty} P(A_n) < \infty.$$

Therefore by Chebychev’s inequality, $P(\{x : N(x) \ge L\}) \to 0$ as $L \to \infty$, which proves that for $P$-almost all $x \in \Omega$, $N(x) < \infty$. □

Proposition 2.2 (Completeness of $L^2$) The space $L^2(\Omega, \mathcal{F}, P)$ is complete; that is, every Cauchy sequence converges to a limit. Moreover, any Cauchy sequence has a subsequence that converges $P$-almost everywhere to its limit.

Proof. Let $\{f_n\}_{n \ge 1}$ be a Cauchy sequence. Then there exists a sequence $n_k \to \infty$ such that $\|f_{n_k} - f_{n_{k+1}}\| \le 2^{-k}$ for each $k$. Let

$$A_k := \{x : |f_{n_k}(x) - f_{n_{k+1}}(x)| > 2^{-k/2}\}.$$

Then by Chebychev’s inequality, $P(A_k) \le 2^{-k}$. Therefore by the Borel–Cantelli lemma, the set of all $x$ such that $x \in$ infinitely many $A_k$’s has measure zero. Now if $x \in$ only finitely many $A_k$’s, then the sequence $\{f_{n_k}(x)\}_{k \ge 1}$ is a Cauchy sequence in $\mathbb{R}$. Let $f(x)$ denote the limit of this sequence. Then for each $k$, by Fatou’s lemma and Minkowski’s inequality,

$$\|f_{n_k} - f\| \le \liminf_{l \to \infty} \|f_{n_k} - f_{n_l}\| \le 2^{-k+1}.$$

The convergence of $f_n$ to $f$ now follows easily by the Cauchy property of the sequence $\{f_n\}_{n \ge 1}$. □

In the rest of this chapter, we will mostly specialize to $\Omega = [0,1]^d$, $\mathcal{F}$ = the Borel sigma-algebra generated by open sets, and $P$ = Lebesgue measure.


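2.2 Discrete Approximations of $L^2$ Functions

For each $k \ge 0$, let $\mathcal{D}_k$ be the set of all closed dyadic cubes of the form

$$\left[ \frac{i_1 - 1}{2^k}, \frac{i_1}{2^k} \right] \times \left[ \frac{i_2 - 1}{2^k}, \frac{i_2}{2^k} \right] \times \cdots \times \left[ \frac{i_d - 1}{2^k}, \frac{i_d}{2^k} \right]$$

for some integers $1 \le i_1, \ldots, i_d \le 2^k$. Let

$$\mathcal{D} := \bigcup_{k=0}^{\infty} \mathcal{D}_k.$$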

Suppose that a function f 2 L2 .Œ0; 1d / has the property that its integral over D is zero for every D 2 Dk . We will now show that such a function must be equal to zero almost everywhere. To prove this, we need a basic result from measure theory, known as the monotone class theorem. Proposition 2.3 (Monotone Class Theorem) Let ˝ be any set and let F be an algebra of subsets of ˝. That is, ˝ 2 F , whenever A 2 F , Ac WD ˝ n A is also in F , and whenever A; B 2 F , A [ B is also in F . Let G be the smallest collection of subsets of ˝ such that G  F and G is closed under monotone unions and intersection. That is, if A1  A2     are members of G , then A1 [ A2 [    is also in G , and if B1  B2     are members of G , then B1 \ B2 \    is also in G . Then G is a sigma-algebra. Proof Take any A 2 F , and let GA be the set of all B 2 G such that A \ B, A \ Bc and Ac \ B are all in G . Since G is closed under monotone unions and intersections, so is GA . Since F is an algebra and is contained in G , therefore F  GA . By the minimality of G , this implies that GA D G . Next, take any B 2 G and any A 2 F . Since GA D G , therefore A 2 GB . Thus, F  GB . Also, GB is a monotone class. Therefore GB D G . Since this is true for all B 2 G , this shows that G is an algebra. Since G is closed under monotone unions and intersections, G must be a sigmaalgebra. t u Proposition 2.4 Suppose that f 2 L2 .Œ0; 1d / satisfies Z f .x/ dx D 0 D

for every $D \in \mathcal{D}$, where $\mathcal{D}$ is the collection of all closed dyadic cubes defined above. Then $f = 0$ almost everywhere.

Proof For each $k \ge 0$, let $\mathcal{D}'_k$ be the set of all half-open dyadic cubes of the form
$$\left[\frac{i_1 - 1}{2^k}, \frac{i_1}{2^k}\right) \times \left[\frac{i_2 - 1}{2^k}, \frac{i_2}{2^k}\right) \times \cdots \times \left[\frac{i_d - 1}{2^k}, \frac{i_d}{2^k}\right)$$
for some integers $1 \le i_1, \ldots, i_d \le 2^k$. Let
$$\mathcal{D}' := \bigcup_{k=0}^{\infty} \mathcal{D}'_k.$$
Let $\mathcal{F}$ be the set of all finite unions of elements of $\mathcal{D}'$. It is easy to see that $\mathcal{F}$ is an algebra of subsets of $[0,1)^d$. Let $\mathcal{G}$ be the set of all Borel subsets $B \subseteq [0,1)^d$ such that
$$\int_B f(x)\,dx = 0.$$

By the dominated convergence theorem, $\mathcal{G}$ is closed under monotone unions and intersections. By assumption, $\mathcal{F} \subseteq \mathcal{G}$. Therefore $\mathcal{G}$ contains the smallest collection of subsets of $[0,1)^d$ that contains $\mathcal{F}$ and is closed under monotone unions and intersections. Since this collection is a sigma-algebra by the monotone class theorem, $\mathcal{G}$ contains a sigma-algebra that contains $\mathcal{F}$. It is easy to see that the smallest sigma-algebra containing $\mathcal{F}$ is the Borel sigma-algebra of $[0,1)^d$. Thus,
$$\int_B f(x)\,dx = 0$$
for every Borel subset $B \subseteq [0,1)^d$. Take any $\epsilon > 0$ and let $B_\epsilon^+ := \{x \in [0,1)^d : f(x) > \epsilon\}$. Then
$$0 = \int_{B_\epsilon^+} f(x)\,dx \ge \epsilon\,\mathrm{Leb}(B_\epsilon^+) \ge 0,$$
where $\mathrm{Leb}(B_\epsilon^+)$ is the Lebesgue measure of $B_\epsilon^+$. This shows that $\mathrm{Leb}(B_\epsilon^+) = 0$. Similarly if $B_\epsilon^- := \{x \in [0,1)^d : f(x) < -\epsilon\}$, then $\mathrm{Leb}(B_\epsilon^-) = 0$. This proves that $f = 0$ almost everywhere on $[0,1]^d$. □

Recall the definition of $\mathcal{D}_k$, the set of all closed dyadic cubes of side-length $2^{-k}$ in $[0,1]^d$. Take any $f \in L^2([0,1]^d)$ and an integer $k \ge 0$. Let $f_k$ be the function that equals
$$\frac{1}{\mathrm{Leb}(D)} \int_D f(x)\,dx$$
in the interior of every $D \in \mathcal{D}_k$, and equals zero on the boundaries. We will refer to $f_k$ as the ‘level $k$ dyadic approximant of $f$’.

Proposition 2.5 Take any $f \in L^2([0,1]^d)$ and let $f_k$ be the level $k$ dyadic approximant of $f$. Then $f_k$ converges to $f$ in $L^2$ as $k \to \infty$.
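The statement of Proposition 2.5 can be illustrated numerically in dimension $d = 1$. The sketch below is only an illustration under stated assumptions: the test function $f(x) = x^2$, the helper names `dyadic_approximant` and `l2_error`, and the midpoint-rule grid of $2^{12}$ points are all choices made here, not part of the text. It estimates $\|f - f_k\|$ for $k = 0, \ldots, 5$, which should decrease toward zero.

```python
import math

def dyadic_approximant(f, k, fine=12):
    """Level-k dyadic approximant of f on [0, 1], as a list of 2**k cell averages.

    Each average is estimated by a midpoint rule on a grid of 2**fine points,
    so this is a numerical stand-in for the exact integral (an assumption).
    """
    per_cell = 2 ** (fine - k)      # fine-grid points inside each dyadic cell
    h = 1.0 / 2 ** fine             # fine-grid spacing
    averages = []
    for c in range(2 ** k):
        total = sum(f((c * per_cell + j + 0.5) * h) for j in range(per_cell))
        averages.append(total / per_cell)
    return averages

def l2_error(f, k, fine=12):
    """Midpoint-rule estimate of the L^2([0, 1]) norm of f - f_k."""
    averages = dyadic_approximant(f, k, fine)
    per_cell = 2 ** (fine - k)
    h = 1.0 / 2 ** fine
    squared = 0.0
    for i in range(2 ** fine):
        x = (i + 0.5) * h
        squared += (f(x) - averages[i // per_cell]) ** 2 * h
    return math.sqrt(squared)

f = lambda x: x * x                      # hypothetical test function
errors = [l2_error(f, k) for k in range(6)]
for k, err in enumerate(errors):
    print(k, round(err, 5))
```

For this smooth test function the error roughly halves with each level, which is consistent with, but of course does not prove, the convergence asserted in the proposition.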

Proof Take any $0 \le k < l$, and a cube $D \in \mathcal{D}_k$. Let $\mathcal{C}$ be the set of all members of $\mathcal{D}_l$ that are contained in $D$. For each $C \in \mathcal{C}$, let $x_C$ be the value of $f_l$ in $C$. Similarly, let $x_D$ be the value of $f_k$ in $D$. Then note that
$$x_D = \frac{1}{|\mathcal{C}|} \sum_{C \in \mathcal{C}} x_C. \tag{2.2.1}$$
Therefore,
$$\frac{1}{\mathrm{Leb}(D)} \int_D (f_l(x) - f_k(x))^2\,dx = \frac{1}{|\mathcal{C}|} \sum_{C \in \mathcal{C}} (x_C - x_D)^2 = \frac{1}{|\mathcal{C}|} \sum_{C \in \mathcal{C}} \left(x_C^2 - x_D^2\right) = \frac{1}{\mathrm{Leb}(D)} \int_D (f_l(x)^2 - f_k(x)^2)\,dx.$$
Summing over all $D$, we get
$$\|f_l - f_k\|^2 = \|f_l\|^2 - \|f_k\|^2.$$
Using the Cauchy–Schwarz inequality, it is easy to see that for any $k$, $\|f_k\|^2 \le \|f\|^2$. Combining this with the identity displayed above, it follows that $\{f_k\}_{k \ge 0}$ is a Cauchy sequence in $L^2([0,1]^d)$. By the completeness of $L^2([0,1]^d)$, there exists $g$ such that $f_k \to g$ in $L^2$ as $k \to \infty$. Consequently, for any $D \in \mathcal{D}$,
$$\int_D g(x)\,dx = \lim_{k \to \infty} \int_D f_k(x)\,dx.$$
However, for any $D \in \mathcal{D}$,
$$\int_D f_k(x)\,dx = \int_D f(x)\,dx$$
for all large enough $k$. Thus, for every $D \in \mathcal{D}$,
$$\int_D (f(x) - g(x))\,dx = 0.$$

By Proposition 2.4, this implies that $f = g$ almost everywhere. □

For certain purposes, we will need a more general approximation result than Proposition 2.5. For a positive integer $n$, let $\mathcal{B}_n$ be the set of all cubes of the form
$$\left[\frac{i_1 - 1}{n}, \frac{i_1}{n}\right] \times \left[\frac{i_2 - 1}{n}, \frac{i_2}{n}\right] \times \cdots \times \left[\frac{i_d - 1}{n}, \frac{i_d}{n}\right].$$
Note that $\mathcal{D}_k = \mathcal{B}_{2^k}$. Given a function $f \in L^2([0,1]^d)$ and a positive integer $n$, define
$$\hat{f}_n(x) := \frac{1}{\mathrm{Leb}(B)} \int_B f(y)\,dy$$
if $x$ belongs to the interior of a cube $B \in \mathcal{B}_n$, and let $\hat{f}_n$ be zero on the boundaries of these cubes. Again, note that the dyadic approximant $f_k$ is nothing but $\hat{f}_{2^k}$. We will call $\hat{f}_n$ the ‘level $n$ approximant’ of $f$, dropping the word ‘dyadic’. The following result generalizes Proposition 2.5 to general approximants.

Proposition 2.6 Take any $f \in L^2([0,1]^d)$ and let $\hat{f}_n$ be the level $n$ approximant of $f$, as defined above. Then $\hat{f}_n$ converges to $f$ in $L^2$ as $n \to \infty$.

Proof Fix some $k \ge 0$. Let $f_k$ be the level $k$ dyadic approximant of $f$. Take any $n \ge 2^k$. Classify the elements of $\mathcal{B}_n$ into two groups, as follows. Let $\mathcal{B}'_n$ be the set of all $B \in \mathcal{B}_n$ that are fully contained in some $D \in \mathcal{D}_k$. Let $\mathcal{B}''_n$ be the set of all elements of $\mathcal{B}_n$ that do not have the above property. Take any $B \in \mathcal{B}'_n$. Then by the Cauchy–Schwarz inequality,
$$\int_B (\hat{f}_n(x) - f_k(x))^2\,dx = \int_B \left(\frac{1}{\mathrm{Leb}(B)} \int_B (f(y) - f_k(y))\,dy\right)^2 dx \le \int_B (f(x) - f_k(x))^2\,dx.$$
Therefore
$$\sum_{B \in \mathcal{B}'_n} \int_B (\hat{f}_n(x) - f_k(x))^2\,dx \le \|f - f_k\|^2. \tag{2.2.2}$$

Similarly for any $B \in \mathcal{B}''_n$,
$$\int_B (\hat{f}_n(x) - f_k(x))^2\,dx \le 2\int_B \hat{f}_n(x)^2\,dx + 2\int_B f_k(x)^2\,dx = 2\int_B \left(\frac{1}{\mathrm{Leb}(B)} \int_B f(y)\,dy\right)^2 dx + 2\int_B f_k(x)^2\,dx \le 2\int_B f(x)^2\,dx + 2\int_B f_k(x)^2\,dx.$$
This shows that
$$\sum_{B \in \mathcal{B}''_n} \int_B (\hat{f}_n(x) - f_k(x))^2\,dx \le 2\int_{[0,1]^d} (f(x)^2 + f_k(x)^2)\,\chi_n(x)\,dx, \tag{2.2.3}$$
where
$$\chi_n(x) = \begin{cases} 1 & \text{if } x \in B \text{ for some } B \in \mathcal{B}''_n, \\ 0 & \text{otherwise.} \end{cases}$$
It is easy to see that $\chi_n(x) \to 0$ as $n \to \infty$ for almost every $x \in [0,1]^d$. Therefore by the dominated convergence theorem, the right-hand side of (2.2.3) tends to zero as $n \to \infty$. Combining this with (2.2.2) gives
$$\limsup_{n \to \infty} \|\hat{f}_n - f_k\| \le \|f - f_k\|,$$
and therefore,
$$\limsup_{n \to \infty} \|\hat{f}_n - f\| \le 2\|f - f_k\|.$$
Since this holds for every $k$, Proposition 2.5 implies that $\hat{f}_n$ converges to $f$ in $L^2$. □

The following corollary of Proposition 2.6 will be useful later.

Corollary 2.1 Let $f \in L^2([0,1]^d)$ and let $\hat{f}_n$ be the level $n$ approximant of $f$, as defined above. Let $\phi : I \to \mathbb{R}$ be a bounded continuous function, where $I$ is any interval containing the range of (some version of) $f$. Then
$$\lim_{n \to \infty} \int_{[0,1]^d} \phi(\hat{f}_n(x))\,dx = \int_{[0,1]^d} \phi(f(x))\,dx.$$

Proof Let $L_n$ denote the integral on the left-hand side and $L$ be the integral on the right. Note that $L_n$ is well-defined since $I$ contains the range of $\hat{f}_n$ for any $n$. Let $\{n_k\}_{k \ge 1}$ be a sequence of integers tending to infinity. Since $\hat{f}_{n_k}$ tends to $f$ in $L^2$ by Proposition 2.6, Proposition 2.2 implies the existence of a further subsequence $\{n_{k_l}\}_{l \ge 1}$ such that $\hat{f}_{n_{k_l}} \to f$ almost everywhere. By the dominated convergence theorem, this implies that $L_{n_{k_l}} \to L$ as $l \to \infty$. Thus, we have shown that for any sequence $n_k$, there is a subsequence $n_{k_l}$ such that $L_{n_{k_l}} \to L$. This shows that $L_n \to L$. □
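A one-dimensional instance of Corollary 2.1 can be checked by hand or by machine. The sketch below assumes $f(x) = x$ on $[0,1]$ and $\phi = \cos$, both chosen here purely for illustration; the helper name `level_n_integral` is likewise hypothetical. For this $f$ the level $n$ approximant equals $(i - 1/2)/n$ on the $i$th cell, so the left-hand integral is a finite sum, and it should approach $\int_0^1 \cos x\,dx = \sin 1$.

```python
import math

def level_n_integral(phi, cell_average, n):
    """Integral over [0, 1] of phi(f_hat_n), where f_hat_n is constant on each of n cells."""
    return sum(phi(cell_average(i, n)) for i in range(1, n + 1)) / n

# For f(x) = x, the average over ((i-1)/n, i/n) is exactly (i - 1/2)/n.
average_of_identity = lambda i, n: (i - 0.5) / n

limit = math.sin(1.0)   # exact value of the integral of cos over [0, 1]
levels = (3, 10, 30, 100)
gaps = [abs(level_n_integral(math.cos, average_of_identity, n) - limit) for n in levels]
for n, gap in zip(levels, gaps):
    print(n, gap)
```

Note that the levels used are not powers of two, which is the situation Proposition 2.6 was introduced to handle.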

2.3 The Weak Topology and Its Compactness

There is a natural inner product on $L^2([0,1]^d)$, defined as
$$(f, g) := \int_{[0,1]^d} f(x) g(x)\,dx.$$
Note that for any $g \in L^2([0,1]^d)$, the map $f \mapsto (f, g)$ is continuous. The weak topology on $L^2([0,1]^d)$ is defined as the smallest topology under which the map $f \mapsto (f, g)$ is continuous for every $g \in L^2([0,1]^d)$.

Take any $g_1, \ldots, g_k \in L^2([0,1]^d)$, any $x_1, \ldots, x_k \in \mathbb{R}$ and $\epsilon_1, \ldots, \epsilon_k > 0$. Consider the set
$$V := \{f \in L^2([0,1]^d) : |(f, g_i) - x_i| < \epsilon_i \text{ for } i = 1, \ldots, k\}. \tag{2.3.1}$$
Clearly, $V$ is an open set in the weak topology. Consider the collection $\mathcal{T}$ of all possible unions of sets of the above form. Since sets like $V$ are open in the weak topology, $\mathcal{T}$ is therefore a collection of open sets in the weak topology. (In topological terminology, the collection of all sets like $V$ is called a basis for the topology $\mathcal{T}$.) Since sets like $V$ are closed under finite intersections, $\mathcal{T}$ is itself a topology. Moreover, for any $g \in L^2([0,1]^d)$, the map $f \mapsto (f, g)$ is continuous in this topology. Therefore, the topology $\mathcal{T}$ must be the same as the weak topology. In other words, any open set in the weak topology is a union of sets like $V$.

Let $B_1([0,1]^d)$ denote the closed unit ball of $L^2([0,1]^d)$. The weak topology on this set is of particular importance in this monograph. We will now show that $B_1([0,1]^d)$ is metrizable and compact under the weak topology. Recall the set $\mathcal{D}$ of closed dyadic cubes in $[0,1]^d$. Let $D_1, D_2, \ldots$ be an enumeration of the members of $\mathcal{D}$. For two functions $f, g \in B_1([0,1]^d)$, define
$$\delta(f, g) := \sum_{m=1}^{\infty} 2^{-m} \min\left\{\left|\int_{D_m} (f(x) - g(x))\,dx\right|, 1\right\}.$$
It is easy to verify that $\delta$ is symmetric and satisfies the triangle inequality. By Proposition 2.4, $\delta(f, g) = 0$ if and only if $f = g$ almost everywhere. Therefore $\delta$ is a metric on $B_1([0,1]^d)$.

Proposition 2.7 The metric $\delta$ defined above metrizes the restriction of the weak topology to $B_1([0,1]^d)$.

Proof Take any $f \in B_1([0,1]^d)$ and any $\epsilon > 0$. Let $M$ be so large that $2^{-M} < \epsilon/2$. Let
$$U := \left\{h \in B_1([0,1]^d) : \left|\int_{D_m} (h(x) - f(x))\,dx\right| < \epsilon/2 \text{ for } m = 1, \ldots, M\right\}.$$
Then $U$ is an open set in the weak topology restricted to $B_1([0,1]^d)$, and for any $h \in U$, $\delta(h, f)$
0. Suppose that $\lambda > 0$. Then by the definition of $\|K\|$,
$$(u + \delta v, K(u + \delta v)) \le \lambda \|u + \delta v\|^2.$$
Using the fact that $K$ is self-adjoint, this can be rewritten as
$$(u, Ku) + 2\delta (v, Ku) + \delta^2 (v, Kv) \le \lambda \|u\|^2 + 2\delta\lambda (v, u) + \delta^2 \lambda \|v\|^2,$$
which is the same as
$$2\delta (v, Ku - \lambda u) \le \lambda \|u\|^2 - (u, Ku) + \delta^2 \lambda \|v\|^2 - \delta^2 (v, Kv) = \delta^2 \lambda \|v\|^2 - \delta^2 (v, Kv).$$
Dividing both sides by $\delta$ and letting $\delta \to 0$, we get
$$(v, Ku - \lambda u) \le 0.$$
Taking $v = Ku - \lambda u$ in the above inequality shows that $Ku = \lambda u$, completing the proof in the case $\lambda > 0$. If $\lambda < 0$, the proof is completed in a similar manner starting from the inequality $(u + \delta v, K(u + \delta v)) \ge \lambda \|u + \delta v\|^2$ and choosing $v = -(Ku - \lambda u)$. □

We will say that a function $u \in B_1([0,1]^d)$ is nonnegative, and write $u \ge 0$, if $u \ge 0$ almost everywhere. More generally, $u \ge v$ will mean that $u \ge v$ almost everywhere. We will say that a linear operator $K$ on $L^2([0,1]^d)$ is nonnegative, and write $K \ge 0$, if $Ku \ge 0$ whenever $u \ge 0$. The following result is an improvement of Proposition 2.10 for nonnegative operators.

Proposition 2.11 (Existence of Perron–Frobenius Eigenvalue for Nonnegative Operators) If $K$ is a self-adjoint compact nonnegative linear operator on $L^2([0,1]^d)$, then there exists $u \in B_1([0,1]^d)$ that is nonnegative everywhere, and satisfies $Ku = \|K\| u$.

Proof If $K = 0$ then there is nothing to prove, so let us assume that $\|K\| > 0$. Let $S$ be the set of nonnegative elements of $B_1([0,1]^d)$. For each $u \in S$, let
$$L(u) := \sup\{\lambda \in \mathbb{R} : \lambda u \le Ku\}.$$
Since $K \ge 0$, $L(u) \ge 0$ for each $u \in S$. Define
$$L^* := \sup_{u \in S} L(u).$$
It is easy to see that $L(u) \le \|K\|$ for all $u \in S$, and therefore $L^* \le \|K\|$. For each $v \in L^2([0,1]^d)$, let $|v|$ denote the function obtained by taking the absolute value of $v(x)$ at every $x$. Note that $|v| \ge v$ and $|v| \ge -v$. Since $K$ is a nonnegative operator, this implies that $K|v| \ge Kv$ and $K|v| \ge -Kv$. Thus,
$$K|v| \ge |Kv|.$$
By Proposition 2.10, there exist $v \in B_1([0,1]^d)$ and $\lambda \in \mathbb{R}$ such that $Kv = \lambda v$ and $|\lambda| = \|K\|$. Let $u := |v|$. Then $u \in S$, and by the above inequality,
$$\|K\| u = |\lambda v| = |Kv| \le K|v| = Ku.$$
Thus, $L^* \ge L(u) \ge \|K\|$. Combining this with our previous observation that $L^* \le \|K\|$, we get
$$L^* = \|K\|. \tag{2.4.1}$$
Let $\{u_n\}_{n \ge 1}$ be a sequence in $S$ such that $L(u_n) \to L^*$. Then there is a sequence of nonnegative numbers $\{L_n\}_{n \ge 1}$ such that $L_n \to L^*$ and $L_n u_n \le Ku_n$ for each $n$. Let $v_n := Ku_n$. Since $K$ is a compact operator, we may assume without loss of generality that $v_n$ converges to some $v$ in $L^2$. Since $Ku_n \ge L_n u_n \ge 0$,
$$\|v\| = \lim_{n \to \infty} \|v_n\| = \lim_{n \to \infty} \|Ku_n\| \ge \lim_{n \to \infty} L_n \|u_n\| = L^*.$$
In particular, by (2.4.1),
$$\|v\| \ge \|K\| > 0. \tag{2.4.2}$$
Next, note that since $K \ge 0$ and $L_n u_n \le Ku_n$,
$$L_n v_n = K(L_n u_n) \le K(Ku_n) = Kv_n.$$
Since $K$ is continuous and $v_n \to v$ in $L^2$, this gives
$$L^* v \le Kv. \tag{2.4.3}$$
Lastly, observe that since $v_n \ge 0$ for every $n$, Proposition 2.2 implies that
$$v \ge 0. \tag{2.4.4}$$
By (2.4.2), we are allowed to define $w := v/\|v\|$. By (2.4.4), $w \ge 0$, and by (2.4.3), $L^* w \le Kw$. However, this means that $L^* w$ must be equal to $Kw$, since otherwise, the nonnegativity of $w$ and the inequality $L^* w \le Kw$ would imply that
$$L^* = \|L^* w\| < \|Kw\| \le \|K\|,$$
contradicting (2.4.1). By (2.4.1), this completes the proof. □
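A finite-dimensional analogue may help fix ideas: a symmetric matrix with nonnegative entries is a self-adjoint nonnegative operator on $\mathbb{R}^n$, and Proposition 2.11 then says that its operator norm is attained at a nonnegative eigenvector. The sketch below is only an illustration under assumptions made here: the $3 \times 3$ matrix and the use of power iteration (which is not the argument used in the proof above) are both hypothetical choices.

```python
import math

def apply_operator(K, u):
    """Matrix-vector product K u for a square matrix stored as a list of rows."""
    return [sum(K[i][j] * u[j] for j in range(len(u))) for i in range(len(K))]

def norm(u):
    return math.sqrt(sum(x * x for x in u))

def power_iteration(K, steps=200):
    """Repeatedly apply K to a positive start vector and renormalize."""
    u = [1.0] * len(K)
    for _ in range(steps):
        v = apply_operator(K, u)
        s = norm(v)
        u = [x / s for x in v]
    return norm(apply_operator(K, u)), u

# Hypothetical example: symmetric, entrywise nonnegative, eigenvalues 2 and 2 +/- sqrt(2).
K = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
top, u = power_iteration(K)
residual = max(abs(a - top * b) for a, b in zip(apply_operator(K, u), u))
print(top, u, residual)
```

Here the limiting vector has nonnegative coordinates and its eigenvalue equals the largest eigenvalue in absolute value, which for a symmetric matrix is the operator norm.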

2.5 A Generalized Hölder’s Inequality

The following non-trivial generalization of Hölder’s inequality will be useful later for our analysis of large deviations for random graphs.

Theorem 2.1 Let $\mu_1, \ldots, \mu_n$ be probability measures on $\Omega_1, \ldots, \Omega_n$, respectively, and let $\mu = \prod_{i=1}^n \mu_i$ be the product measure on $\Omega = \prod_{i=1}^n \Omega_i$. Let $A_1, \ldots, A_m$ be non-empty subsets of $[n] = \{1, 2, \ldots, n\}$ and write $\Omega_A = \prod_{l \in A} \Omega_l$ and $\mu_A = \prod_{l \in A} \mu_l$. Let $f_i \in L^{p_i}(\Omega_{A_i}, \mu_{A_i})$ with $p_i \ge 1$ for each $i \in [m]$ and suppose that for each $l \in [n]$,
$$\sum_{i \,:\, l \in A_i} \frac{1}{p_i} \le 1.$$
Each $f_i$ can be thought of as an element of $L^{p_i}(\Omega, \mu)$ in a natural way. With this interpretation, the following inequality holds:
$$\int_\Omega \prod_{i=1}^m |f_i|\,d\mu \le \prod_{i=1}^m \left(\int_{\Omega_{A_i}} |f_i|^{p_i}\,d\mu_{A_i}\right)^{1/p_i}.$$
In particular, when each $l \in [n]$ belongs to at most $d$ many $A_i$’s, then we can take $p_i = d$ for every $i \in [m]$ and get
$$\int_\Omega \prod_{i=1}^m |f_i|\,d\mu \le \prod_{i=1}^m \left(\int_{\Omega_{A_i}} |f_i|^{d}\,d\mu_{A_i}\right)^{1/d}.$$

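Proof The proof is by induction on $n$. The case $n = 1$ is ordinary Hölder’s inequality. Suppose that $n > 1$ and that the inequality holds for all smaller values of $n$. By Fubini’s theorem,
$$\int_\Omega \prod_{i=1}^m |f_i|\,d\mu = \int_\Omega \prod_{i \,:\, n \in A_i} |f_i| \prod_{i \,:\, n \notin A_i} |f_i|\,d\mu = \int_{\Omega_{[n-1]}} \left(\int_{\Omega_n} \prod_{i \,:\, n \in A_i} |f_i|\,d\mu_n\right) \prod_{i \,:\, n \notin A_i} |f_i|\,d\mu_{[n-1]}.$$
For each $i$ such that $n \in A_i$, define a function $f_i^* : \Omega_{[n-1]} \to \mathbb{R}$ as
$$f_i^* := \left(\int_{\Omega_n} |f_i|^{p_i}\,d\mu_n\right)^{1/p_i}.$$
By Hölder’s inequality,
$$\int_{\Omega_n} \prod_{i \,:\, n \in A_i} |f_i|\,d\mu_n \le \prod_{i \,:\, n \in A_i} \left(\int_{\Omega_n} |f_i|^{p_i}\,d\mu_n\right)^{1/p_i} = \prod_{i \,:\, n \in A_i} f_i^*.$$
Substituting this in the identity obtained above, we get
$$\int_\Omega \prod_{i=1}^m |f_i|\,d\mu \le \int_{\Omega_{[n-1]}} \prod_{i \,:\, n \in A_i} f_i^* \prod_{i \,:\, n \notin A_i} |f_i|\,d\mu_{[n-1]}.$$
Applying the induction hypothesis for $n - 1$ to the right side gives
$$\int_{\Omega_{[n-1]}} \prod_{i \,:\, n \in A_i} f_i^* \prod_{i \,:\, n \notin A_i} |f_i|\,d\mu_{[n-1]} \le \prod_{i \,:\, n \in A_i} \left(\int_{\Omega_{[n-1]}} (f_i^*)^{p_i}\,d\mu_{[n-1]}\right)^{1/p_i} \prod_{i \,:\, n \notin A_i} \left(\int_{\Omega_{[n-1]}} |f_i|^{p_i}\,d\mu_{[n-1]}\right)^{1/p_i} = \prod_{i=1}^m \left(\int_\Omega |f_i|^{p_i}\,d\mu\right)^{1/p_i},$$
as required. □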
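The special case $n = 3$, $A_1 = \{1,2\}$, $A_2 = \{2,3\}$, $A_3 = \{1,3\}$ and $d = 2$ of Theorem 2.1 is easy to test on finite spaces. The sketch below assumes each $\Omega_l$ is a four-point set with the uniform measure and fills the three functions with random values; these choices, the seed, and the helper names are all made here for illustration only.

```python
import math
import random

def lhs_rhs(f1, f2, f3):
    """Both sides of the inequality for tables f1[x][y], f2[y][z], f3[x][z].

    Each coordinate space is {0, ..., s-1} with the uniform probability measure.
    """
    s = len(f1)
    lhs = sum(abs(f1[x][y] * f2[y][z] * f3[x][z])
              for x in range(s) for y in range(s) for z in range(s)) / s ** 3
    l2 = lambda f: math.sqrt(sum(v * v for row in f for v in row) / s ** 2)
    return lhs, l2(f1) * l2(f2) * l2(f3)

rng = random.Random(0)  # fixed seed so that runs are repeatable

def random_table(s):
    return [[rng.uniform(-1.0, 1.0) for _ in range(s)] for _ in range(s)]

results = [lhs_rhs(random_table(4), random_table(4), random_table(4)) for _ in range(50)]
print(all(lhs <= rhs for lhs, rhs in results))
```

Each coordinate appears in exactly two of the three index sets, which is why the exponent $p_i = 2$ is admissible in this example.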

2.6 The FKG Inequality

Let $S$ be a finite or countable subset of $\mathbb{R}$. Let $n$ be a positive integer, and let $\mu$ be a probability mass function on $S^n$. If $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ are two elements of $S^n$, we will write $x \le y$ if $x_i \le y_i$ for each $i$. A function $f : S^n \to \mathbb{R}$ is called monotone increasing if $f(x) \le f(y)$ whenever $x \le y$. For $x, y \in S^n$, $x \vee y$ denotes the vector whose $i$th coordinate is the maximum of $x_i$ and $y_i$. Similarly, $x \wedge y$ denotes the vector whose $i$th coordinate is the minimum of $x_i$ and $y_i$. The probability mass function $\mu$ is said to satisfy the FKG lattice condition if for all $x, y \in S^n$,
$$\mu(x)\mu(y) \le \mu(x \vee y)\mu(x \wedge y). \tag{2.6.1}$$
Recall that the covariance of two random variables $X$ and $Y$ is defined as
$$\mathrm{Cov}(X, Y) := E(XY) - E(X)E(Y).$$
The FKG inequality says the following.

Theorem 2.2 (FKG Inequality) Let $S$, $n$ and $\mu$ be as above and suppose that $\mu$ satisfies (2.6.1) and is strictly positive everywhere on $S^n$. Let $X$ be a random vector with law $\mu$. Then for any monotone increasing $f, g : S^n \to \mathbb{R}$ such that $f(X)$ and $g(X)$ are square-integrable random variables, $\mathrm{Cov}(f(X), g(X)) \ge 0$.

Proof The proof is by induction on $n$. First, suppose that $n = 1$. Let $X, Y$ be independent random variables with law $\mu$. Then by monotonicity of $f$ and $g$, $(f(X) - f(Y))(g(X) - g(Y))$ is a nonnegative random variable. By the independence of $X$ and $Y$, it follows that
$$\mathrm{Cov}(f(X), g(X)) = E(f(X)g(X)) - E(f(X))E(g(X)) = \frac{1}{2} E[(f(X) - f(Y))(g(X) - g(Y))].$$
This proves the claim when $n = 1$. Next, suppose that $n > 1$ and that the theorem has been proved in all smaller dimensions. Define, for each $a \in S$,
$$f_1(a) := E(f(X) \mid X_1 = a), \qquad g_1(a) := E(g(X) \mid X_1 = a).$$
Then by a well-known and easy-to-prove identity about covariances,
$$\mathrm{Cov}(f(X), g(X)) = E(\mathrm{Cov}(f(X), g(X) \mid X_1)) + \mathrm{Cov}(E(f(X) \mid X_1), E(g(X) \mid X_1)) = E(\mathrm{Cov}(f(X), g(X) \mid X_1)) + \mathrm{Cov}(f_1(X_1), g_1(X_1)).$$
If the first coordinate is fixed, then $f$ and $g$ are monotone increasing functions of the remaining $n - 1$ coordinates. Also, it is easy to see that the conditional probability mass function of $(X_2, \ldots, X_n)$ given $X_1 = a$ satisfies the lattice condition (2.6.1) on $S^{n-1}$, for any $a$. Combining these two observations, it follows from the induction hypothesis that for any $a \in S$,
$$\mathrm{Cov}(f(X), g(X) \mid X_1 = a) \ge 0.$$
Thus, the proof will be complete if we can show that $f_1$ and $g_1$ are monotone increasing functions on $S$. By symmetry, it suffices to show this only for $f_1$. Take any $a, b \in S$, $a < b$. Let $X' := (X_2, \ldots, X_n)$ and for each $x' \in S^{n-1}$, let
$$h(x') := \frac{\mu(b, x')}{\mu(a, x')}.$$
Then by the monotonicity of $f$,
$$f_1(b) - f_1(a) = \frac{\sum_{x' \in S^{n-1}} f(b, x')\mu(b, x')}{\sum_{x' \in S^{n-1}} \mu(b, x')} - f_1(a) \ge \frac{\sum_{x' \in S^{n-1}} f(a, x')\mu(b, x')}{\sum_{x' \in S^{n-1}} \mu(b, x')} - f_1(a)$$
$$= \frac{E(f(a, X') h(X') \mid X_1 = a)}{E(h(X') \mid X_1 = a)} - E(f(a, X') \mid X_1 = a) = \frac{\mathrm{Cov}(f(a, X'), h(X') \mid X_1 = a)}{E(h(X') \mid X_1 = a)}.$$
Now $f(a, \cdot)$ is a monotone increasing function on $S^{n-1}$, and as observed before, the conditional law of $X'$ given $X_1 = a$ satisfies (2.6.1). Thus, the above display shows that the proof of the monotonicity of $f_1$ will be complete if we can prove that $h$ is a monotone increasing function on $S^{n-1}$. To prove this, take any $x', y' \in S^{n-1}$, $x' \le y'$. Then
$$h(y') - h(x') = \frac{\mu(b, y')\mu(a, x') - \mu(b, x')\mu(a, y')}{\mu(a, y')\mu(a, x')}.$$
But $\mu(b, y')\mu(a, x') - \mu(b, x')\mu(a, y') \ge 0$ by (2.6.1). This completes the proof of the theorem. □
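On a small state space the FKG inequality can be verified by exhaustive enumeration. The sketch below assumes $S = \{0, 1\}$, $n = 3$, and a probability mass function proportional to $\exp(J \sum_{i<j} x_i x_j)$ with $J = 0.7$; this particular measure, the two test functions, and all names are choices made here for illustration. The code first checks the lattice condition (2.6.1) directly and then computes the covariance of two monotone increasing functions.

```python
import itertools
import math

n, J = 3, 0.7                                     # hypothetical size and coupling
points = list(itertools.product((0, 1), repeat=n))

def weight(x):
    """Unnormalized mass exp(J * number of pairs i < j with x_i = x_j = 1)."""
    return math.exp(J * sum(x[i] * x[j] for i in range(n) for j in range(i + 1, n)))

Z = sum(weight(x) for x in points)
mu = {x: weight(x) / Z for x in points}           # strictly positive pmf on {0,1}^n

def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))

def meet(x, y):
    return tuple(min(a, b) for a, b in zip(x, y))

# Lattice condition (2.6.1), checked over all pairs with a small float tolerance.
lattice_ok = all(mu[x] * mu[y] <= mu[join(x, y)] * mu[meet(x, y)] + 1e-15
                 for x in points for y in points)

def expect(h):
    return sum(h(x) * mu[x] for x in points)

def cov(f, g):
    return expect(lambda x: f(x) * g(x)) - expect(f) * expect(g)

f = lambda x: sum(x)                 # monotone increasing
g = lambda x: x[0] * x[1] * x[2]     # monotone increasing on {0,1}^3
covariance = cov(f, g)
print(lattice_ok, covariance)
```

Replacing $J$ by a negative number makes the lattice condition fail, and the theorem then no longer applies to this measure.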

Bibliographical Notes

Most of the topics covered in this chapter are standard fare in graduate-level functional analysis and probability. I chose to restrict attention to $L^2([0,1]^d)$ instead of general $L^2$ spaces because this is all that we need in this monograph. The specialization to the hypercube allows shorter proofs for several theorems. Further discussions and applications of Chebychev’s inequality, Borel–Cantelli lemma and the monotone class theorem may be found in any graduate text on probability theory. Similarly, discussions on completeness of $L^2$, weak topology, the Banach–Alaoglu theorem and compact operators may be found in any graduate text on functional analysis. The discrete approximation presented in Sect. 2.2 is harder to find in textbooks; it is, however, very important for this monograph. The generalized Hölder’s inequality in Sect. 2.5 is less standard than the rest of the chapter. It is due to Finner [1], and also appears in Friedgut [3]. The statement and proof given here follow the presentation in Lubetzky and Zhao [4]. The FKG inequality is due to Fortuin et al. [2]. Many sophisticated generalized versions are now available, but the version stated and proved in Sect. 2.6 is the one we need in this monograph.

References

1. Finner, H. (1992). A generalization of Hölder’s inequality and some probability inequalities. The Annals of Probability, 20(4), 1893–1901.
2. Fortuin, C. M., Kasteleyn, P. W., & Ginibre, J. (1971). Correlation inequalities on some partially ordered sets. Communications in Mathematical Physics, 22, 89–103.
3. Friedgut, E. (2004). Hypergraphs, entropy, and inequalities. The American Mathematical Monthly, 111(9), 749–760.
4. Lubetzky, E., & Zhao, Y. (2015). On replica symmetry of large deviations in random graphs. Random Structures & Algorithms, 47(1), 109–146.

Chapter 3

Basics of Graph Limit Theory

This chapter summarizes some basic results from graph limit theory. The only background assumed here is the list of results from Chap. 2. As in Chap. 2, I will try to make the presentation and the proofs as self-contained as possible.

3.1 Graphons and Homomorphism Densities

A simple graph is an undirected graph with no multi-edges or self-loops. Let $G$ be a finite simple graph on $n$ vertices. Let $V(G) = \{1, 2, \ldots, n\}$ be the set of vertices of $G$ and let $E(G)$ be the set of edges. Recall that the adjacency matrix of $G$ is the $n \times n$ symmetric matrix whose $(i, j)$th entry is 1 if $\{i, j\} \in E(G)$ and 0 if not. The adjacency matrix may be converted into a function $f^G$ on $[0,1]^2$ in the following canonical way: Take any $(x, y) \in [0,1]^2$, and let $i$ and $j$ be the two unique integers such that … 0. Let $\mathcal{C}$ be a maximal collection of orthonormal eigenfunctions of $K_f$ such that none of the eigenvalues $(\lambda_u)_{u \in \mathcal{C}}$ belong to $(-\epsilon, \epsilon)$. Then $|\mathcal{C}| \le 1/\epsilon^2$ and for each $u \in \mathcal{C}$ and $x \in [0,1]$, $|u(x)| \le 1/\epsilon$. Moreover, if
$$g(x, y) := f(x, y) - \sum_{u \in \mathcal{C}} \lambda_u u(x) u(y),$$
then $\|K_g\| < \epsilon$.

Proof The bounds on $|\mathcal{C}|$ and $|u(x)|$ follow from Proposition 3.5. Since $g(x, y) = g(y, x)$, $K_g$ is self-adjoint. Since $f$ is bounded and the $u$’s are bounded and $|\mathcal{C}|$ is finite, $g$ is bounded. Thus, Proposition 3.4 implies that $K_g$ is a compact operator. So by Proposition 2.10, there exist $v \in B_1([0,1]^2)$ and $\lambda \in \mathbb{R}$ such that $|\lambda| = \|K_g\|$ and $K_g v = \lambda v$. Suppose that $|\lambda| \ge \epsilon$. Note that for any $u \in \mathcal{C}$, $K_g u = 0$, and therefore
$$(u, v) = \lambda^{-1} (u, K_g v) = \lambda^{-1} (K_g u, v) = 0.$$
Thus,
$$\lambda v = K_g v = K_f v - \sum_{u \in \mathcal{C}} \lambda_u (u, v) u = K_f v.$$
In other words, $\mathcal{C} \cup \{v\}$ is an orthonormal collection of eigenfunctions of $K_f$ such that all eigenvalues are outside $(-\epsilon, \epsilon)$. This gives a contradiction, since $\mathcal{C}$ is a maximal collection with this property. □

3.5 Compactness of the Space of Graphons

Given $n \ge 1$, let $\mathcal{W}_n$ be the set of all graphons that are constant in open squares of the form
$$\left(\frac{i - 1}{n}, \frac{i}{n}\right) \times \left(\frac{j - 1}{n}, \frac{j}{n}\right),$$
for every $1 \le i, j \le n$. Let $\mathcal{M}_n$ be the set of all measure-preserving bijections of $[0,1]$ that linearly translate any interval of the form $((i-1)/n, i/n)$ to another interval of the same type, and do not move points of the form $i/n$. Note that if $\sigma \in \mathcal{M}_n$ and $f \in \mathcal{W}_n$, then $f_\sigma \in \mathcal{W}_n$. Note also that $\mathcal{M}_n$ has a natural bijection with the symmetric group $S_n$ and therefore $|\mathcal{M}_n| = n!$. Let $\mathcal{U}_n$ be the set of all Borel measurable functions from $[0,1]$ into $\mathbb{R}$ that are constant in intervals of the form $((i-1)/n, i/n)$. Note that if $u \in \mathcal{U}_n$ and $\sigma \in \mathcal{M}_n$, then $u_\sigma \in \mathcal{U}_n$, where $u_\sigma(x) := u(\sigma x)$.

The goal of this section is to prove the following theorem, which says that the space $\widetilde{\mathcal{W}}$ defined in Sect. 3.1 is compact under the metric $\delta_\square$. The proof of this result will make use of the functional analytic results derived in Chap. 2 and the preceding section.

Theorem 3.1 (Weak Regularity Theorem for Graphons) Given any $\epsilon \in (0, 1)$, there exists a set $\mathcal{W}(\epsilon) \subseteq \mathcal{W}$ with the following properties:

(i) There are universal constants $C_1$, $C_2$ and $C_3$ such that
$$|\mathcal{W}(\epsilon)| \le C_1 \epsilon^{-C_2 \epsilon^{-C_3 \epsilon^{-2}}}.$$
(ii) For any $f \in \mathcal{W}$, there exists $\sigma \in \mathcal{M}$ and $h \in \mathcal{W}(\epsilon)$ such that $d_\square(f_\sigma, h) < \epsilon$. In particular, the metric space $(\widetilde{\mathcal{W}}, \delta_\square)$ defined in Sect. 3.1 is compact.
(iii) If $f \in \mathcal{W}_n$, then the $\sigma$ in part (ii) can be chosen to be in the set $\mathcal{M}_n$ defined above.

Proof Throughout the proof, $C$, $C_1$, $C_2$ and $C_3$ will denote positive universal constants, whose values may change from line to line. Fix $\epsilon > 0$. Take any $n \ge 1$ and a graphon $f \in \mathcal{W}_n$. Let $\mathcal{C}$ be a maximal collection of orthonormal eigenfunctions of $K_f$ such that none of the eigenvalues belong to $(-\epsilon/5, \epsilon/5)$. Then by Proposition 3.6, $|\mathcal{C}| \le 25/\epsilon^2$ and for each $u \in \mathcal{C}$ and $x \in [0,1]$, $|u(x)| \le 5/\epsilon$. Enumerate the elements of $\mathcal{C}$ as $\{u_1, \ldots, u_m\}$ and the corresponding eigenvalues as $\lambda_1, \ldots, \lambda_m$. Define
$$r(x, y) := \sum_{i=1}^m \lambda_i u_i(x) u_i(y).$$
Then again by Proposition 3.6, $\|K_f - K_r\|$
sup.I.x//:

k!1 n!1

x2F

(4.2.1)

By the compactness of $F$, we may assume, after passing to a subsequence if necessary, that $x_k \to x \in F$. Let
$$\eta'_k := \eta_k + d(x_k, x),$$
where $d$ is the metric on $\mathcal{X}$. Then $\eta'_k \to 0$ and $B(x_k, \eta_k) \subseteq B(x, \eta'_k)$. Therefore, by the assumed inequality in the statement of the lemma and the inequality (4.2.1),
$$-I(x) \ge \lim_{k \to \infty} \limsup_{n \to \infty} \epsilon_n \log \mu_n(B(x, \eta'_k)) \ge \lim_{k \to \infty} \limsup_{n \to \infty} \epsilon_n \log \mu_n(B(x_k, \eta_k)) > \sup_{x \in F} (-I(x)),$$
which is a contradiction. □

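4.3 A General Upper Bound

Recall that a topological vector space is a vector space endowed with a topology under which the vector space operations are continuous. Let $\mathcal{X}$ be a real topological vector space whose topology has the Hausdorff property. Let $\mathcal{X}^*$ be the dual space of $\mathcal{X}$, that is, the space of all continuous linear functionals on $\mathcal{X}$. Let $\mathcal{B}$ be the Borel sigma-algebra of $\mathcal{X}$ and let $\{\mu_n\}_{n \ge 1}$ be a sequence of probability measures on $(\mathcal{X}, \mathcal{B})$. Define the logarithmic moment generating function $\Lambda_n : \mathcal{X}^* \to (-\infty, \infty]$ of $\mu_n$ as
$$\Lambda_n(\lambda) := \log \int_{\mathcal{X}} e^{\lambda(x)}\,d\mu_n(x).$$
Let $\{\epsilon_n\}_{n \ge 1}$ be a sequence of positive real numbers tending to zero. Define a function $\bar{\Lambda} : \mathcal{X}^* \to [-\infty, \infty]$ as
$$\bar{\Lambda}(\lambda) := \limsup_{n \to \infty} \epsilon_n \Lambda_n(\lambda/\epsilon_n). \tag{4.3.1}$$
The Fenchel–Legendre transform of $\bar{\Lambda}$ is the function $\bar{\Lambda}^* : \mathcal{X} \to [-\infty, \infty]$ defined as
$$\bar{\Lambda}^*(x) := \sup_{\lambda \in \mathcal{X}^*} (\lambda(x) - \bar{\Lambda}(\lambda)). \tag{4.3.2}$$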

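The following result shows that $\bar{\Lambda}^*$ is an upper bound for the rate function of $\mu_n$ if the $\mu_n$’s are supported on a compact set. This is one of the commonly used tools in large deviation theory.

Theorem 4.1 For any compact set $\Gamma \subseteq \mathcal{X}$,
$$\limsup_{n \to \infty} \epsilon_n \log \mu_n(\Gamma) \le -\inf_{x \in \Gamma} \bar{\Lambda}^*(x).$$

Proof Fix a compact set $\Gamma \subseteq \mathcal{X}$ and a number $\delta > 0$. Let
$$I^\delta(x) := \min\{\bar{\Lambda}^*(x) - \delta, 1/\delta\}.$$
The definition (4.3.2) of $\bar{\Lambda}^*$ shows that for any $x \in \Gamma$, there exists $\lambda_x \in \mathcal{X}^*$ such that
$$I^\delta(x) \le \lambda_x(x) - \bar{\Lambda}(\lambda_x). \tag{4.3.3}$$
Since $\lambda_x$ is a continuous linear functional, there exists an open neighborhood $A_x$ of $x$ such that
$$\inf_{y \in A_x} (\lambda_x(y) - \lambda_x(x)) \ge -\delta. \tag{4.3.4}$$
For any $\lambda \in \mathcal{X}^*$,
$$\mu_n(A_x) \le \int_{\mathcal{X}} \frac{e^{\lambda(y) - \lambda(x)}}{e^{\inf_{z \in A_x} (\lambda(z) - \lambda(x))}}\,d\mu_n(y).$$
Taking $\lambda = \lambda_x/\epsilon_n$ and using (4.3.4) gives
$$\epsilon_n \log \mu_n(A_x) \le \delta - \lambda_x(x) + \epsilon_n \Lambda_n(\lambda_x/\epsilon_n). \tag{4.3.5}$$
Since $\Gamma$ is compact, there exists a finite collection of points $x_1, \ldots, x_N \in \Gamma$ such that
$$\Gamma \subseteq \bigcup_{i=1}^N A_{x_i},$$
and hence
$$\mu_n(\Gamma) \le \sum_{i=1}^N \mu_n(A_{x_i}) \le N \max_{1 \le i \le N} \mu_n(A_{x_i}).$$
Therefore by (4.3.5),
$$\epsilon_n \log \mu_n(\Gamma) \le \epsilon_n \log N + \delta - \min_{1 \le i \le N} (\lambda_{x_i}(x_i) - \epsilon_n \Lambda_n(\lambda_{x_i}/\epsilon_n)).$$
By the definition (4.3.1) of $\bar{\Lambda}$ and the inequality (4.3.3), this gives
$$\limsup_{n \to \infty} \epsilon_n \log \mu_n(\Gamma) \le \delta - \min_{1 \le i \le N} (\lambda_{x_i}(x_i) - \bar{\Lambda}(\lambda_{x_i})) \le \delta - \min_{1 \le i \le N} I^\delta(x_i) \le \delta - \inf_{x \in \Gamma} I^\delta(x).$$
The proof is completed by taking $\delta \to 0$. □

For the uninitiated reader, the following exercises may help clarify the nature of the upper bound from Theorem 4.1.

Exercise 4.2 Let $X_1, X_2, \ldots$ be i.i.d. random variables that take value 1 with probability 1/2 and $-1$ with probability 1/2. Let $\mu_n$ be the probability law of the sample mean $(X_1 + \cdots + X_n)/n$. With $\mathcal{X} = \mathbb{R}$ and $\epsilon_n = 1/n$, compute $\bar{\Lambda}^*$ in the problem.

Exercise 4.3 Let $\mu_n$ be as in Exercise 4.2. Show that the function $\bar{\Lambda}^*$ for this problem is in fact the large deviation rate function for the sequence $\{\mu_n\}_{n \ge 1}$.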
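The setting of Exercise 4.2 also lends itself to a quick numerical experiment. In that setting $\Lambda_n(\lambda) = n \log\cosh(\lambda/n)$, so $\epsilon_n \Lambda_n(\lambda/\epsilon_n) = \log\cosh\lambda$ for every $n$. The sketch below is an illustration only: the grid search over $\lambda$, the point $x = 0.5$, the sample size $n = 400$ and the helper names are assumptions made here. It approximates the supremum in (4.3.2) on a grid and compares it with the exact value of $\frac{1}{n}\log P(\text{sample mean} \ge x)$.

```python
import math

def lambda_bar(lam):
    """Limiting log-moment generating function for fair +/-1 signs: log cosh(lam)."""
    return math.log(math.cosh(lam))

def legendre(x, lam_max=5.0, steps=5000):
    """Grid approximation of sup over lam in [-lam_max, lam_max] of lam*x - lambda_bar(lam)."""
    grid = (-lam_max + 2 * lam_max * j / steps for j in range(steps + 1))
    return max(lam * x - lambda_bar(lam) for lam in grid)

def log_upper_tail(n, x):
    """Exact log P(sample mean >= x) for n independent fair +/-1 signs."""
    k_min = math.ceil(n * (1 + x) / 2)          # number of +1 signs required
    count = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return math.log(count) - n * math.log(2)

x, n = 0.5, 400
rate = legendre(x)                    # grid estimate of the transform at x
empirical = log_upper_tail(n, x) / n  # (1/n) log of the exact tail probability
print(rate, empirical)
```

The exact normalized log-probability sits slightly below the negative of the grid estimate, as the upper bound requires; the remaining gap shrinks as $n$ grows.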

4.4 The Azuma–Hoeffding Inequality

In this section we will state and prove a useful probability inequality, known as the Azuma–Hoeffding inequality. In contrast with Theorem 4.1, it gives a finite sample bound with no limits. Recall that a filtration of sigma-algebras $\{\mathcal{F}_i\}_{i \ge 0}$ is a sequence of sigma-algebras such that $\mathcal{F}_i \subseteq \mathcal{F}_{i+1}$ for each $i$. Given a filtration $\{\mathcal{F}_i\}_{i \ge 0}$ on a probability space, with $\mathcal{F}_0$ being the trivial sigma-algebra, recall that a sequence of random variables $\{X_i\}_{i \ge 1}$ defined on this space is called a martingale difference sequence if $X_i$ is $\mathcal{F}_i$-measurable, $E|X_i| < \infty$ and $E(X_i \mid \mathcal{F}_{i-1}) = 0$ for each $i \ge 1$. (I am assuming that the reader is familiar with the abstract definition of conditional expectation.)

Theorem 4.2 (Azuma–Hoeffding Inequality) Suppose that $\{X_i\}_{1 \le i \le n}$ is a martingale difference sequence with respect to some filtration $\{\mathcal{F}_i\}_{0 \le i \le n}$ with $\mathcal{F}_0$ = the trivial sigma-algebra. Suppose that $A_1, \ldots, A_n, B_1, \ldots, B_n$ are random variables and $c_1, \ldots, c_n$ are constants such that for each $i$, $A_i$ and $B_i$ are $\mathcal{F}_{i-1}$-measurable, and with probability one, $A_i \le X_i \le B_i$ and $B_i - A_i \le c_i$. Let $S_n := X_1 + \cdots + X_n$. Then for any $x \ge 0$,
$$\max\{P(S_n \ge x), P(S_n \le -x)\} \le \exp\left(-\frac{2x^2}{\sum_{i=1}^n c_i^2}\right).$$

The key ingredient in the proof of the Azuma–Hoeffding inequality is Hoeffding’s lemma, stated below.

Lemma 4.2 (Hoeffding’s Lemma) Let $X$ be a random variable that is bounded between two constants $a$ and $b$ with probability one. If $E(X) = 0$, then for any $\theta \ge 0$,
$$E(e^{\theta X}) \le e^{\theta^2 (b-a)^2/8}.$$

Proof Without loss of generality, suppose that $a < 0 < b$. For any $a \le x \le b$ and $\theta \in \mathbb{R}$,
$$e^{\theta x} = e^{t\theta b + (1-t)\theta a},$$
where
$$t = \frac{x - a}{b - a} \in [0, 1].$$
Therefore by Jensen’s inequality (Proposition 2.1),
$$e^{\theta x} \le t e^{\theta b} + (1 - t) e^{\theta a}.$$
Since $E(X) = 0$, this implies that
$$E(e^{\theta X}) \le \frac{b e^{\theta a} - a e^{\theta b}}{b - a}. \tag{4.4.1}$$
For $0 \le t \le 1$, let
$$h(t) := \log\left(\frac{b e^{t\theta a} - a e^{t\theta b}}{b - a}\right) = t\theta a + \log(1 - p + p e^{t\theta(b-a)}),$$
where
$$p := \frac{-a}{b - a} \in (0, 1).$$
An easy verification shows that $h(0) = h'(0) = 0$. Furthermore,
$$h''(t) = \theta^2 (b - a)^2 y(1 - y),$$
where
$$y = \frac{p e^{t\theta(b-a)}}{1 - p + p e^{t\theta(b-a)}} \in (0, 1).$$
This shows that $h''(t) \le \theta^2 (b - a)^2/4$ for every $t \in [0, 1]$. Together with the inequality (4.4.1), this completes the proof of the lemma. □

Armed with Hoeffding’s lemma, we can now easily prove the Azuma–Hoeffding inequality.

Proof (Proof of Theorem 4.2) Let $m_i(\theta) := E(e^{\theta X_i} \mid \mathcal{F}_{i-1})$. Then by the given conditions and Lemma 4.2,
$$m_i(\theta) \le e^{\theta^2 c_i^2/8}.$$
Let $\phi_i(\theta) := E(e^{\theta S_i})$, where $S_i = X_1 + \cdots + X_i$. Then the above inequality implies that
$$\phi_n(\theta) = E(e^{\theta S_{n-1}} m_n(\theta)) \le \phi_{n-1}(\theta) e^{\theta^2 c_n^2/8}.$$
Proceeding inductively, we get
$$\phi_n(\theta) \le e^{\theta^2 (c_1^2 + \cdots + c_n^2)/8}.$$
Therefore Chebychev’s inequality implies that for any $x \ge 0$ and $\theta \ge 0$,
$$P(S_n \ge x) \le P(e^{\theta S_n} \ge e^{\theta x}) \le e^{-\theta x} \phi_n(\theta) \le e^{-\theta x + \theta^2 (c_1^2 + \cdots + c_n^2)/8}.$$
Choosing $\theta = 4x/(c_1^2 + \cdots + c_n^2)$ gives
$$P(S_n \ge x) \le \exp\left(-\frac{2x^2}{c_1^2 + \cdots + c_n^2}\right).$$
By a symmetrical argument, the same bound is proved for $P(S_n \le -x)$. The proof is completed by combining the two bounds. □
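For a simple random walk the bound of Theorem 4.2 can be compared with exact tail probabilities. The sketch below assumes independent fair $\pm 1$ steps, for which $A_i = -1$, $B_i = 1$ and $c_i = 2$, so the bound reads $P(S_n \ge x) \le e^{-x^2/(2n)}$; the choice $n = 50$ and the helper names are made here for illustration.

```python
import math

def exact_tail(n, x):
    """P(S_n >= x) where S_n is a sum of n independent fair +/-1 steps."""
    k_min = math.ceil((n + x) / 2)       # S_n = 2k - n, with k the number of +1 steps
    return sum(math.comb(n, k) for k in range(max(k_min, 0), n + 1)) / 2 ** n

def azuma_bound(x, c):
    """Right-hand side exp(-2 x^2 / sum of c_i^2) of the Azuma-Hoeffding inequality."""
    return math.exp(-2.0 * x * x / sum(ci * ci for ci in c))

n = 50
c = [2.0] * n    # each step lies in [-1, 1], so B_i - A_i = 2
rows = [(x, exact_tail(n, x), azuma_bound(x, c)) for x in range(0, n + 1, 5)]
for x, p, b in rows:
    print(x, p, b)
```

The exact probabilities stay below the bound at every threshold, with the bound becoming quite loose far out in the tail.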

4.5 McDiarmid’s Inequality

The Azuma–Hoeffding inequality has the following important corollary, which is known as McDiarmid’s inequality or the bounded differences inequality.

Theorem 4.3 (McDiarmid’s Inequality) Let $\mathcal{X}$ be a set endowed with a sigma-algebra. Let $X_1, \ldots, X_n$ be independent $\mathcal{X}$-valued random variables. Let $X'_1, \ldots, X'_n$ be independent copies of $X_1, \ldots, X_n$. Suppose that $f : \mathcal{X}^n \to \mathbb{R}$ is a measurable function and $c_1, \ldots, c_n$ are constants such that for each $i$, with probability one,
$$|f(X_1, \ldots, X_n) - f(X_1, \ldots, X'_i, \ldots, X_n)| \le c_i.$$
Let $W := f(X_1, \ldots, X_n)$. Then for any $x \ge 0$,
$$\max\{P(W - E(W) \ge x), P(W - E(W) \le -x)\} \le \exp\left(-\frac{2x^2}{\sum_{i=1}^n c_i^2}\right).$$

Proof Let $\mathcal{F}_0$ be the trivial sigma-algebra, and let $\mathcal{F}_i$ be the sigma-algebra generated by $X_1, \ldots, X_i$. Let
$$Y_i := E(W \mid \mathcal{F}_i) - E(W \mid \mathcal{F}_{i-1}),$$
so that $Y_1, \ldots, Y_n$ is a martingale difference sequence and $Y_1 + \cdots + Y_n = W - E(W)$. Thus, we are in the setting of the Azuma–Hoeffding inequality. Take any $i$. Let $W_i := f(X_1, \ldots, X'_i, \ldots, X_n)$ and let $\mathcal{G}_i$ be the sigma-algebra generated by $(X_1, \ldots, X'_i, \ldots, X_n)$. Let $U_i$ and $L_i$ denote the essential supremum and essential infimum of $W - W_i$ given $\mathcal{G}_i$. By the given condition on $f$, conditional on $\mathcal{G}_i$, the maximum possible variation of $W$ as $X_i$ varies is bounded above by $c_i$. Thus, $U_i - L_i \le c_i$. Now, since $L_i \le W - W_i \le U_i$ with probability one, it follows that $A_i \le E(W - W_i \mid \mathcal{F}_i) \le B_i$ with probability one, where $A_i = E(L_i \mid \mathcal{F}_i)$ and $B_i = E(U_i \mid \mathcal{F}_i)$. Clearly, $B_i - A_i \le c_i$. By the independence of the $X_i$’s, $A_i$ and $B_i$ are $\mathcal{F}_{i-1}$-measurable. Also by independence, $E(W_i \mid \mathcal{F}_i) = E(W \mid \mathcal{F}_{i-1})$, and therefore $E(W - W_i \mid \mathcal{F}_i) = Y_i$. The proof is now completed by an application of the Azuma–Hoeffding inequality (Theorem 4.2). □

Bibliographical Notes

The materials contained in Sects. 4.1 and 4.3 are extracted from the textbook of Dembo and Zeitouni [3], with minor notational modifications. Adhering to the general principle being followed in this monograph, I have presented only as much as necessary. For example, I have avoided discussing general lower bounds, because the large deviation lower bounds required in this monograph will be derived directly without appealing to the general theory. The abstract framework for large deviations in topological spaces was formulated by Varadhan [9]. The idea behind the general upper bound of Theorem 4.1 goes back to results of Cramér and Chernoff for sums of i.i.d. random variables. An early version of the general result under additional assumptions was given in Gärtner [4]. The version presented here was proved by Stroock [8] and de Acosta [2]. Hoeffding’s lemma and the Azuma–Hoeffding inequality for sums of independent random variables were proved by Hoeffding [5]. The inequality was later generalized by Azuma [1] to sums of martingale differences. It took a while for mathematicians to realize the usefulness of the Azuma–Hoeffding inequality, until it was pointed out by McDiarmid [6]. The central idea behind McDiarmid’s inequality was, however, already exploited in the earlier work of Shamir and Spencer [7].

References

1. Azuma, K. (1967). Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, Second Series, 19(3), 357–367.
2. de Acosta, A. (1985). Upper bounds for large deviations of dependent random vectors. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 69, 551–565.
3. Dembo, A., & Zeitouni, O. (2010). Large deviations techniques and applications. Corrected reprint of the second (1998) edition. Berlin: Springer.
4. Gärtner, J. (1977). On large deviations from the invariant measure. Theory of Probability and its Applications, 22, 24–39.
5. Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13–30.
6. McDiarmid, C. (1989). On the method of bounded differences. In J. Siemons (Ed.), Surveys in combinatorics. London Mathematical Society Lecture Notes Series, vol. 141, pp. 148–188. Cambridge: Cambridge University Press.
7. Shamir, E., & Spencer, J. (1987). Sharp concentration of the chromatic number on random graphs $G_{n,p}$. Combinatorica, 7(1), 121–129.
8. Stroock, D. W. (1984). An introduction to the theory of large deviations. Berlin: Springer.
9. Varadhan, S. R. S. (1966). Asymptotic probabilities and differential equations. Communications on Pure and Applied Mathematics, 19, 261–286.

Chapter 5

Large Deviations for Dense Random Graphs

A dense graph is a graph whose number of edges is comparable to the square of the number of vertices. The main result of this chapter is the formulation and proof of the large deviation principle for dense Erdős–Rényi random graphs. We will see later that this result can be used to derive large deviation principles for a large class of models. These and other applications will be given in later chapters. The results and definitions from the previous chapters will be used extensively in the proofs of this chapter.

5.1 The Erdős–Rényi Model

Let $n$ be a positive integer and $p$ be an element of $[0,1]$. The Erdős–Rényi model defines an undirected random graph on the vertex set $\{1,2,\ldots,n\}$ by declaring that any two vertices are connected by an edge with probability $p$, and that these assignments are independent of each other. The resulting random graph model is denoted by $G(n,p)$. Let $G$ be an Erdős–Rényi random graph with parameters $n$ and $p$, defined on an abstract probability space $(\Omega,\mathcal{F},\mathbb{P})$. Recall the definition of the graphon $f^G$ of $G$ from Chap. 3. Recall the space $\mathcal{W}$ and the cut metric on this space. Equip $\mathcal{W}$ with the Borel sigma-algebra induced by the cut metric, and define a probability measure $\mathbb{P}_{n,p}$ on this space as

$$\mathbb{P}_{n,p}(B) := \mathbb{P}(f^G \in B) \tag{5.1.1}$$

for every Borel measurable subset $B$ of $\mathcal{W}$. In other words, $\mathbb{P}_{n,p}$ is the probability measure on $\mathcal{W}$ induced by the random graph $G$. Since $f^G$ can take only finitely many values, the event $\{f^G \in B\}$ is guaranteed to be measurable.
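The construction of $G(n,p)$ and of its graphon can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the convention that $f^G$ equals 1 on the cell $[i/n,(i+1)/n)\times[j/n,(j+1)/n)$ (0-based indices) exactly when $\{i,j\}$ is an edge is an assumption standing in for the definition from Chap. 3, which is not reproduced here.

```python
import random


def sample_gnp(n, p, rng):
    """Sample the adjacency matrix of G(n, p): each pair {i, j} with i < j
    is joined independently with probability p."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


def graphon_value(adj, x, y):
    """Evaluate the step-function graphon of the graph at (x, y) in [0, 1]^2.
    Assumed convention: the unit square is cut into n x n equal cells and the
    graphon is 1 on cell (i, j) iff {i, j} is an edge (0-based indices)."""
    n = len(adj)
    i = min(int(x * n), n - 1)  # clamp so that x = 1.0 lands in the last cell
    j = min(int(y * n), n - 1)
    return adj[i][j]


def edge_density(adj):
    """Integral of the graphon over [0, 1]^2, i.e. 2 * (#edges) / n^2."""
    n = len(adj)
    return sum(map(sum, adj)) / (n * n)
```

For a large sample, `edge_density(sample_gnp(n, p, random.Random(0)))` should be close to $p$, in line with the law-of-large-numbers behavior that Exercise 5.1 asks for in the stronger cut-metric sense.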

© Springer International Publishing AG 2017 S. Chatterjee, Large Deviations for Random Graphs, Lecture Notes in Mathematics 2197, DOI 10.1007/978-3-319-65816-2_5


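Let $\tilde f^G$ be the image of $f^G$ in the quotient space $\widetilde{\mathcal{W}}$. When $\widetilde{\mathcal{W}}$ is equipped with the Borel sigma-algebra induced by the metric $\delta_\square$, the random element $\tilde f^G$ induces a probability measure $\widetilde{\mathbb{P}}_{n,p}$ the same way that $\mathbb{P}_{n,p}$ was defined:

$$\widetilde{\mathbb{P}}_{n,p}(\widetilde B) = \mathbb{P}(\tilde f^G \in \widetilde B) \tag{5.1.2}$$

for every Borel set $\widetilde B \subseteq \widetilde{\mathcal{W}}$. Again, since $\tilde f^G$ can take only finitely many values, there are no measurability issues. The following exercise describes the asymptotic behavior of $\widetilde{\mathbb{P}}_{n,p}$.

Exercise 5.1 Prove that if $\tilde p$ denotes the element of $\widetilde{\mathcal{W}}$ representing the graphon that is identically equal to $p$, and $G$ is an Erdős–Rényi $G(n,p)$ graph, then $\delta_\square(\tilde f^G, \tilde p) \to 0$ in probability as $n\to\infty$. Equivalently, show that for any open set $\widetilde U$ containing $\tilde p$, $\widetilde{\mathbb{P}}_{n,p}(\widetilde U)\to 1$ as $n\to\infty$.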

5.2 The Rate Function

Given $p\in(0,1)$, let $I_p:[0,1]\to\mathbb{R}$ be the function

$$I_p(u) := u\log\frac{u}{p} + (1-u)\log\frac{1-u}{1-p}. \tag{5.2.1}$$

Lemma 5.1 The function $I_p$ can be alternately expressed as

$$I_p(u) = \sup_{a\in\mathbb{R}}\,\bigl(au - \log(pe^{a} + 1 - p)\bigr).$$

Proof Fixing $u$, let the term within the supremum be denoted by $J(a)$. When $u\in(0,1)$, it is an easy calculus exercise to verify that $J$ is concave and $J(a)\to-\infty$ as $a\to\pm\infty$, and the maximum of $J$ is indeed $I_p(u)$. When $u=1$, $J(a)<I_p(1)$ for all $a<\infty$ and $J(a)\to I_p(1)$ as $a\to\infty$. Similarly, when $u=0$, $J(a)<I_p(0)$ for all $a>-\infty$ and $J(a)\to I_p(0)$ as $a\to-\infty$. □

The domain of the function $I_p$ can be extended to $\mathcal{W}$ by defining

$$I_p(h) := \int_{[0,1]^2} I_p(h(x,y))\,dx\,dy.$$
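The duality in Lemma 5.1 can be checked numerically. The sketch below is illustrative only (the function names are hypothetical); it evaluates $I_p(u)$ from (5.2.1), the objective $J(a) = au - \log(pe^a + 1 - p)$, and the stationary point $a^* = \log(u/p) - \log((1-u)/(1-p))$ obtained by setting $J'(a)=0$ for $u\in(0,1)$.

```python
import math


def rate(u, p):
    """I_p(u) = u log(u/p) + (1-u) log((1-u)/(1-p)), with 0 log 0 := 0."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(u, p) + term(1.0 - u, 1.0 - p)


def dual_objective(a, u, p):
    """J(a) = a*u - log(p*e^a + 1 - p), the quantity maximized in Lemma 5.1."""
    return a * u - math.log(p * math.exp(a) + 1.0 - p)


def maximizer(u, p):
    """Stationary point of J for u in (0, 1), from J'(a) = 0."""
    return math.log(u / p) - math.log((1.0 - u) / (1.0 - p))
```

Evaluating `dual_objective(maximizer(u, p), u, p)` should reproduce `rate(u, p)` up to rounding, while any other value of `a` gives something no larger. At the endpoints $u\in\{0,1\}$ the supremum is not attained, which matches the limiting argument in the proof.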

We would now like to show that $I_p$ is a lower semi-continuous function on $\mathcal{W}$. Recall that a function $f$ from a topological space $\mathcal{X}$ into $\mathbb{R}$ is called lower semi-continuous if $f^{-1}((a,\infty))$ is open for every $a\in\mathbb{R}$. Clearly, continuous functions are lower semi-continuous. From the definition it follows easily that if $\{f_\alpha\}_{\alpha\in A}$ is an arbitrary collection of lower semi-continuous functions on $\mathcal{X}$, then $f := \sup_{\alpha\in A} f_\alpha$ is again lower semi-continuous, because

$$f^{-1}((a,\infty)) = \bigcup_{\alpha\in A} f_\alpha^{-1}((a,\infty)).$$

An important characterization of lower semi-continuous functions on metric spaces says that if $\mathcal{X}$ is a metric space, then a function $f:\mathcal{X}\to\mathbb{R}$ is lower semi-continuous if and only if for every sequence $\{x_n\}_{n\ge1}$ in $\mathcal{X}$ that converges to a point $x$,

$$\liminf_{n\to\infty} f(x_n) \ge f(x).$$

An easy consequence of the above inequality is that on compact metric spaces, a lower semi-continuous function $f$ must necessarily attain its minimum, and moreover the set of minima is compact.

Lemma 5.2 Let $S$ be the set of $a\in L^2([0,1]^2)$ that satisfy the symmetry condition $a(x,y)=a(y,x)$. The function $I_p$ on $\mathcal{W}$ can be alternately expressed as

$$I_p(h) = \sup_{a\in S} J_{p,a}(h),$$

where

$$J_{p,a}(h) := \int_{[0,1]^2} \bigl(a(x,y)h(x,y) - \log(pe^{a(x,y)} + 1 - p)\bigr)\,dx\,dy.$$

Proof Lemma 5.1 implies that

$$I_p(h) \ge \sup_{a\in S} J_{p,a}(h).$$

For the opposite inequality, let

$$a^*(x,y) := \log\frac{h(x,y)}{p} - \log\frac{1-h(x,y)}{1-p},$$

and notice that

$$I_p(h(x,y)) = a^*(x,y)h(x,y) - \log(pe^{a^*(x,y)} + 1 - p). \tag{5.2.2}$$

This would suffice to complete the proof if $a^*$ were guaranteed to be in $L^2$. However, there is no such guarantee, and therefore we need to work a bit more. For each $\epsilon\in(0,1)$, let

$$A_\epsilon := \{(x,y)\in[0,1]^2 : \epsilon \le h(x,y) \le 1-\epsilon\},$$
$$B_\epsilon := \{(x,y)\in[0,1]^2 : 0 < h(x,y) < \epsilon \text{ or } 1-\epsilon < h(x,y) < 1\},$$


and define

$$E := \{(x,y)\in[0,1]^2 : h(x,y) = 1\}, \qquad F := \{(x,y)\in[0,1]^2 : h(x,y) = 0\}.$$

For each $\epsilon\in(0,1)$ and $M\ge 1$, define […]

[…] For any $g\in\mathcal{W}$ and $\epsilon>0$, define

$$B(g,\epsilon) := \{h\in\mathcal{W} : d_\square(h,g)\le\epsilon\}.$$

The following lemma provides an important bridge between the weak and cut topologies.

Lemma 5.4 For any $g\in\mathcal{W}$ and $\epsilon>0$, $B(g,\epsilon)$ is weakly closed.


Proof Suppose that $\{g_n\}_{n\ge1}$ is a sequence in $\mathcal{W}$ such that $g_n\in B(g,\epsilon)$ for each $n$ and $g_n\to h$ weakly. Take any two Borel measurable functions $a,b:[0,1]\to[-1,1]$. Since $g_n\to h$ weakly,

$$\left|\int_{[0,1]^2} a(x)b(y)\bigl(h(x,y)-g(x,y)\bigr)\,dx\,dy\right| = \lim_{n\to\infty}\left|\int_{[0,1]^2} a(x)b(y)\bigl(g_n(x,y)-g(x,y)\bigr)\,dx\,dy\right| \le \epsilon.$$

Taking supremum over all $a$ and $b$ gives $d_\square(h,g)\le\epsilon$. □

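Lemma 5.5 There exists a function $\delta(\tilde h,\epsilon)$, depending only on $\tilde h$ and $\epsilon$, with $\delta(\tilde h,\epsilon)\to0$ as $\epsilon\to0$, such that for each $\tilde h\in\widetilde{\mathcal{W}}$, $\eta>0$ and $\epsilon>0$,

$$\lim_{\eta\to0}\limsup_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p}\bigl(B(\tilde h,\eta)\cap B(\mathcal{W}(\epsilon),\epsilon)\bigr) \le -I_p(\tilde h) + \delta(\tilde h,\epsilon).$$

Proof Since $\mathcal{W}(\epsilon)$ is a finite set, it suffices to show that for fixed $g\in\mathcal{W}(\epsilon)$,

$$\lim_{\eta\to0}\limsup_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p}\bigl(B(\tilde h,\eta)\cap B(g,\epsilon)\bigr) \le -I_p(\tilde h) + \delta(\tilde h,\epsilon).$$

If $B(\tilde h,\eta)\cap B(g,\epsilon)$ is empty for sufficiently small $\eta$, then there is nothing to prove. So let us assume that this is not the case. Then

$$g\in B(\tilde h,\epsilon). \tag{5.4.3}$$

By lower semi-continuity of $I_p$, $I_p(f)\ge I_p(\tilde h)-\delta(\tilde h,\epsilon)$ for $f\in B(\tilde h,2\epsilon)$, where $\delta(\tilde h,\epsilon)\to0$ as $\epsilon\to0$. But by (5.4.3), $B(g,\epsilon)\subseteq B(\tilde h,2\epsilon)$ and by Lemma 5.4, $B(g,\epsilon)$ is weakly closed. Therefore by Theorem 5.1,

$$\begin{aligned}
\lim_{\eta\to0}\limsup_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p}\bigl(B(\tilde h,\eta)\cap B(g,\epsilon)\bigr)
&\le \limsup_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p}(B(g,\epsilon)) \\
&\le -\inf_{f\in B(g,\epsilon)} I_p(f) \\
&\le -\inf_{f\in B(\tilde h,2\epsilon)} I_p(f) \le -I_p(\tilde h)+\delta(\tilde h,\epsilon).
\end{aligned}$$

This completes the proof of the lemma. □

We are now ready to prove the upper bound of Theorem 5.2.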

5.4 Large Deviation Principle for Dense Random Graphs


Proof (of the Upper Bound in Theorem 5.2) The combination of Lemma 5.3 and Lemma 5.5 gives the inequality (5.4.2) after taking $\epsilon\to0$, which completes the proof of the upper bound of Theorem 5.2. □

Let us now turn our attention to the lower bound of Theorem 5.2. As for the upper bound, we will first go through a sequence of reductions. First, note that for the lower bound it suffices to prove that for all $\tilde h\in\widetilde{\mathcal{W}}$ and $\eta\in(0,1)$,

$$\liminf_{n\to\infty}\frac{1}{n^2}\log\widetilde{\mathbb{P}}_{n,p}(S(\tilde h,\eta)) \ge -I_p(\tilde h),$$

since for any open $\widetilde U$ and $\tilde h\in\widetilde U$, there exists $\eta\in(0,1)$ such that $S(\tilde h,\eta)\subseteq\widetilde U$. Again, since $B(h,\eta)\subseteq B(\tilde h,\eta)$ and $\mathbb{P}_{n,p}(B(\tilde h,\eta)) = \widetilde{\mathbb{P}}_{n,p}(S(\tilde h,\eta))$, it suffices to prove that for any $h\in\mathcal{W}$ and $\eta\in(0,1)$,

$$\liminf_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p}(B(h,\eta)) \ge -I_p(h).$$

Take any $h\in\mathcal{W}$ and $\eta>0$. Let $\hat h_n$ be the level $n$ approximant of $h$, as defined in Chap. 2, Sect. 2.2. Fix $\epsilon\in(0,\eta)$. By Proposition 2.6, $\hat h_n\to h$ in $L^2$. An easy application of the Cauchy–Schwarz inequality shows that the $L^2$ topology is finer than the cut topology. Therefore $\hat h_n\to h$ in the cut metric. Consequently, $B(\hat h_n,\epsilon)\subseteq B(h,\eta)$ for all large $n$. Thus, it suffices to prove that

$$\liminf_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p}(B(\hat h_n,\epsilon)) \ge -I_p(h).$$

As in the proof of Theorem 5.1, let $B(i,j,n)$ denote the square $[(i-1)/n,\,i/n]\times[(j-1)/n,\,j/n]$ and

$$B_n = \bigcup_{i=1}^{n} B(i,i,n).$$

Note that $\hat h_n$ is constant in any such square (except possibly at the boundaries). Define a function $q_n$ that equals $\hat h_n$ everywhere except on $B_n$, where it is zero. Since the Lebesgue measure of $B_n$ tends to zero as $n\to\infty$ and $q_n$ and $\hat h_n$ are uniformly bounded functions, $\hat h_n - q_n\to0$ in $L^2$ and so $d_\square(\hat h_n,q_n)\to0$. Therefore, it suffices to prove that for any $\epsilon\in(0,1)$,

$$\liminf_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p}(B(q_n,\epsilon)) \ge -I_p(h). \tag{5.4.4}$$

Let $q(i,j,n)$ be the value of $q_n$ in $B(i,j,n)$. Construct a random graph $H_n$ on $n$ vertices (on some abstract probability space $(\Omega,\mathcal{F},\mathbb{P})$) by declaring that vertices $i$ and $j$ are connected by an edge with probability $q(i,j,n)$, for every $1\le i<j\le n$.


Let $\xi(i,j,n)$ be a random variable that is 1 if the edge $\{i,j\}$ is present in $H_n$ and 0 otherwise. Let $f_n$ denote the graphon of $H_n$, and let $\mathbb{P}_{n,h}$ denote the law of $f_n$.

Lemma 5.6 Let $f_n$ and $q_n$ be as above. Then for any Borel measurable $a,b:[0,1]\to[-1,1]$ and any $x\ge0$,

$$\mathbb{P}\left(\left|\int_{[0,1]^2} a(x)b(y)\bigl(f_n(x,y)-q_n(x,y)\bigr)\,dx\,dy\right| \ge x\right) \le 2e^{-n^2x^2/4}.$$

Proof For $1\le i\le n$, let

$$a_i := n\int_{(i-1)/n}^{i/n} a(x)\,dx,$$

and define $b_i$ similarly. Then

$$\int_{[0,1]^2} a(x)b(y)\bigl(f_n(x,y)-q_n(x,y)\bigr)\,dx\,dy = \frac{2}{n^2}\sum_{1\le i<j\le n} a_ib_j\bigl(\xi(i,j,n)-q(i,j,n)\bigr).$$

[…]

$$\cdots \le 2\,|U_n'(1/4)|\,e^{-n\cdots}.$$

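The bound on $|U_n'(1/4)|$ from Lemma 5.9 shows that the last expression tends to zero as $n\to\infty$. □

We are now ready to prove the lower bound of Theorem 5.2. We will continue to use the notations introduced in the last few pages.

Proof (of the Lower Bound in Theorem 5.2) Note that

$$\begin{aligned}
\mathbb{P}_{n,p}(B(q_n,\epsilon)) &= \int_{B(q_n,\epsilon)} d\mathbb{P}_{n,p} = \int_{B(q_n,\epsilon)} \exp\left(-\log\frac{d\mathbb{P}_{n,h}}{d\mathbb{P}_{n,p}}\right) d\mathbb{P}_{n,h} \\
&= \mathbb{P}_{n,h}(B(q_n,\epsilon))\,\frac{1}{\mathbb{P}_{n,h}(B(q_n,\epsilon))}\int_{B(q_n,\epsilon)} \exp\left(-\log\frac{d\mathbb{P}_{n,h}}{d\mathbb{P}_{n,p}}\right) d\mathbb{P}_{n,h}.
\end{aligned}$$

Therefore, by Jensen's inequality (Proposition 2.1),

$$\log\mathbb{P}_{n,p}(B(q_n,\epsilon)) \ge \log\mathbb{P}_{n,h}(B(q_n,\epsilon)) - \frac{1}{\mathbb{P}_{n,h}(B(q_n,\epsilon))}\int_{B(q_n,\epsilon)} \log\frac{d\mathbb{P}_{n,h}}{d\mathbb{P}_{n,p}}\,d\mathbb{P}_{n,h}.$$

Since $\mathbb{P}_{n,h}(B(q_n,\epsilon))\to1$ by Lemma 5.11, this implies that

$$\liminf_{n\to\infty}\frac{2}{n^2}\log\mathbb{P}_{n,p}(B(q_n,\epsilon)) \ge -\lim_{n\to\infty}\frac{2}{n^2}\int \log\frac{d\mathbb{P}_{n,h}}{d\mathbb{P}_{n,p}}\,d\mathbb{P}_{n,h}.$$

By Lemma 5.7, the expression on the right equals $-I_p(h)$. This proves (5.4.4) and hence the lower bound of Theorem 5.2. □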
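The quantity appearing on the right-hand side of the last display is a relative entropy between two product measures on edge indicators, so it can be evaluated in closed form: for independent edges with probabilities $q(i,j,n)$ against independent edges with probability $p$, it equals $\sum_{i<j} I_p(q(i,j,n))$. The Python sketch below computes the normalized version $\frac{2}{n^2}\sum_{i<j} I_p(q(i,j,n))$. It is an illustration only: the function names are hypothetical, and the link to Lemma 5.7 (whose statement is not reproduced in this part of the text) is an assumption.

```python
import math


def rate(u, p):
    """I_p(u) = u log(u/p) + (1-u) log((1-u)/(1-p)), with 0 log 0 := 0."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(u, p) + term(1.0 - u, 1.0 - p)


def normalized_entropy_cost(q, p):
    """(2 / n^2) * sum over i < j of I_p(q[i][j]).

    q is an n x n symmetric matrix of edge probabilities (diagonal ignored).
    The sum is the relative entropy of the inhomogeneous random graph with
    independent edges of probabilities q[i][j] with respect to G(n, p)."""
    n = len(q)
    total = sum(rate(q[i][j], p) for i in range(n) for j in range(i + 1, n))
    return 2.0 * total / (n * n)
```

For a constant matrix with entries $u$, the result is $\frac{n-1}{n} I_p(u)$, which tends to $I_p(u)$ as $n$ grows, consistent with the limit used in the proof.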

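5.5 Conditional Distributions

Let $G$ be a $G(n,p)$ random graph. Let $f^G$ be the graphon of $G$ and let $\tilde f^G\in\widetilde{\mathcal{W}}$ be the equivalence class of this graphon. In this section we will denote $\tilde f^G$ simply by $G$, for ease of notation. Theorem 5.2 gives estimates of the probabilities of rare events for $G$. However, it does not answer the following question: given that some particular rare event has occurred, what does the graph look like? Naturally, one might expect that if $G\in\widetilde F$ for some closed set $\widetilde F\subseteq\widetilde{\mathcal{W}}$ satisfying

$$\inf_{\tilde h\in\widetilde F^{\circ}} I_p(\tilde h) = \inf_{\tilde h\in\widetilde F} I_p(\tilde h) > 0, \tag{5.5.1}$$

then $G$ should resemble one of the minimizers of $I_p$ in $\widetilde F$. (Here $\widetilde F^{\circ}$ denotes the interior of $\widetilde F$.) In other words, given that $G\in\widetilde F$, one might expect that $\delta_\square(G,\widetilde F^*)\approx0$, where $\widetilde F^*$ is the set of minimizers of $I_p$ in $\widetilde F$ and

$$\delta_\square(G,\widetilde F^*) := \inf_{\tilde h\in\widetilde F^*}\delta_\square(G,\tilde h). \tag{5.5.2}$$

However, it is not obvious that a minimizer must exist in $\widetilde F$. The compactness of $\widetilde{\mathcal{W}}$ comes to the rescue: since the function $I_p$ is lower semicontinuous on $\widetilde F$ and $\widetilde F$ is closed, a minimizer must necessarily exist. The following theorem formalizes this argument.

Theorem 5.3 Take any $p\in(0,1)$ and $n\ge1$. Let $G$ be a random graph from the $G(n,p)$ model. Let $\widetilde F$ be a closed subset of $\widetilde{\mathcal{W}}$ satisfying (5.5.1). Let $\widetilde F^*$ be the subset of $\widetilde F$ where $I_p$ is minimized. Then $\widetilde F^*$ is non-empty and compact, and for each $n$, and each $\epsilon>0$,

$$\mathbb{P}\bigl(\delta_\square(G,\widetilde F^*)\ge\epsilon \,\big|\, G\in\widetilde F\bigr) \le e^{-C(\epsilon,\widetilde F)n^2},$$

where $C(\epsilon,\widetilde F)$ is a positive constant depending only on $\epsilon$ and $\widetilde F$ and $\delta_\square(G,\widetilde F^*)$ is defined as in (5.5.2). In particular, if $\widetilde F^*$ contains only one element $\tilde h^*$, then the conditional distribution of $G$ given $G\in\widetilde F$ converges to the point mass at $\tilde h^*$ as $n\to\infty$.

Proof Since $\widetilde{\mathcal{W}}$ is compact and $\widetilde F$ is a closed subset, $\widetilde F$ is also compact. Since $I_p$ is a lower semicontinuous function on $\widetilde F$ (Proposition 5.1) and $\widetilde F$ is compact, it must attain its minimum on $\widetilde F$. Thus, $\widetilde F^*$ is non-empty. By the lower semicontinuity of $I_p$, $\widetilde F^*$ is closed (and hence compact). Fix $\epsilon>0$ and let

$$\widetilde F_\epsilon := \{\tilde h\in\widetilde F : \delta_\square(\tilde h,\widetilde F^*)\ge\epsilon\}.$$

Then $\widetilde F_\epsilon$ is again a closed subset. Observe that

$$\mathbb{P}\bigl(\delta_\square(G,\widetilde F^*)\ge\epsilon \,\big|\, G\in\widetilde F\bigr) = \frac{\mathbb{P}(G\in\widetilde F_\epsilon)}{\mathbb{P}(G\in\widetilde F)}.$$

Thus, with

$$I_1 := \inf_{\tilde h\in\widetilde F} I_p(\tilde h), \qquad I_2 := \inf_{\tilde h\in\widetilde F_\epsilon} I_p(\tilde h),$$

Theorem 5.2 and condition (5.5.1) give

$$\limsup_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}\bigl(\delta_\square(G,\widetilde F^*)\ge\epsilon \,\big|\, G\in\widetilde F\bigr) \le I_1 - I_2.$$

The proof will be complete if it is shown that $I_1<I_2$. Now clearly, $I_1\le I_2$. If $I_1=I_2$, the compactness of $\widetilde F_\epsilon$ implies that there exists $\tilde h\in\widetilde F_\epsilon$ satisfying $I_p(\tilde h)=I_2$. However, this means that $\tilde h\in\widetilde F^*$ and hence $\widetilde F^*\cap\widetilde F_\epsilon\ne\emptyset$, which is impossible. □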

Bibliographical Notes

The Erdős–Rényi model was introduced by Gilbert [5] and Erdős and Rényi [3]. It has been the subject of extensive investigations over the years. See Bollobás [1] and Janson et al. [6] for partial surveys of this literature. The key feature of the Erdős–Rényi model, which makes it amenable to a lot of beautiful mathematics, is that the edges are independent. In the large deviation regime, however, the independence is lost: conditional on a rare event, the edges typically stop behaving as independent random objects. This is the main difficulty behind the large deviation analysis of the Erdős–Rényi model, which remained open for many years. The results of this chapter were proved by Chatterjee and Varadhan [2], generalizing a conjecture of Bolthausen et al. (Large deviations for random matrices and random graphs, Private communication, 2003) concerning large deviations for subgraph counts. The original proof of Theorem 5.2, as it appeared in Chatterjee and Varadhan [2], was based on Szemerédi's regularity lemma. The proof given in this chapter uses Theorem 3.1, which can be called a version of the weak regularity theorem of Frieze and Kannan [4], adapted to the setting of graphons. While the published proof of Theorem 5.2 was combinatorial in nature, the proof given in this monograph is analytic.

References
1. Bollobás, B. (2001). Random graphs, 2nd ed., vol. 73, Cambridge studies in advanced mathematics. Cambridge: Cambridge University Press.
2. Chatterjee, S., & Varadhan, S. R. S. (2011). The large deviation principle for the Erdős-Rényi random graph. European Journal of Combinatorics, 32(7), 1000–1017.
3. Erdős, P., & Rényi, A. (1960). On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Sciences, 5, 17–61.
4. Frieze, A., & Kannan, R. (1999). Quick approximation to matrices and applications. Combinatorica, 19, 175–220.
5. Gilbert, E. N. (1959). Random graphs. Annals of Mathematical Statistics, 30(4), 1141–1144.
6. Janson, S., Łuczak, T., & Ruciński, A. (2000). Random graphs. Wiley-Interscience series in discrete mathematics and optimization. New York: Wiley-Interscience.

Chapter 6

Applications of Dense Graph Large Deviations

This chapter contains some simple applications of the large deviation principle for dense Erdős–Rényi random graphs that was derived in the previous chapter. The abstract theory yields surprising phase transition phenomena when applied to concrete problems. We will continue to use notations and terminologies introduced previously in the monograph.

6.1 Graph Parameters

A graph parameter is a continuous function from $\widetilde{\mathcal{W}}$ into $\mathbb{R}$. Any graph parameter has a natural interpretation as a continuous function on $\mathcal{W}$. We will generally use the same letter to denote a graph parameter and its lift to $\mathcal{W}$. The $L^\infty$ norm of a graphon $f$, denoted by $\|f\|_\infty$, is the essential supremum of $|f(x,y)|$ as $(x,y)$ ranges over $[0,1]^2$. A graphon $f\in\mathcal{W}$ is called a local maximum with respect to the $L^\infty$ norm for a graph parameter $\theta$ if there exists $\epsilon>0$ such that for any $g\in\mathcal{W}$ with $\|f-g\|_\infty\le\epsilon$, we have $\theta(g)\le\theta(f)$. Local minimum is defined similarly. A graphon $f$ is called a global maximum if $\theta(f)\ge\theta(g)$ for all $g$, and a global minimum if $\theta(f)\le\theta(g)$ for all $g$. The following special kind of graph parameter is important in this chapter.

Definition 6.1 A graph parameter $\theta$ will be called a 'nice graph parameter' if every local maximum of $\theta$ with respect to the $L^\infty$ norm is a global maximum and every local minimum of $\theta$ with respect to the $L^\infty$ norm is a global minimum.

The next two lemmas identify two examples of nice graph parameters. For the first, recall the definition of the homomorphism density $t(H,f)$ from Chap. 3.

Lemma 6.1 For any simple graph $H$ with at least one edge, the function $\theta(f) := t(H,f)$ is a nice graph parameter.

© Springer International Publishing AG 2017 S. Chatterjee, Large Deviations for Random Graphs, Lecture Notes in Mathematics 2197, DOI 10.1007/978-3-319-65816-2_6


Proof Combining Proposition 3.2 and Exercise 3.4 from Chap. 3, it follows that $\theta$ is continuous with respect to the $\delta_\square$ metric. In other words, $\theta$ is a graph parameter. Let $f$ be a local maximum of $\theta$. Choose some $\epsilon>0$ and let $g^+ := \min\{f+\epsilon,1\}$. Unless $t(H,f)=1$, it is easy to see that $t(H,g^+)>t(H,f)$. On the other hand, $\|g^+-f\|_\infty\le\epsilon$. This violates the assumption that $f$ is a local maximum of $\theta$. Therefore $t(H,f)$ must be 1. In other words, $f$ must be a global maximum of $\theta$. Next, suppose that $f$ is a local minimum of $\theta$. Choose some $\epsilon>0$ and let $g^- := (1-\epsilon)f$. Unless $t(H,f)=0$, $t(H,g^-)$ must be strictly less than $t(H,f)$. Arguing as in the previous case, it follows that $f$ must be a global minimum. Thus, $\theta$ is a nice graph parameter. □

For the next lemma, recall the definition of the operator $K_f$ associated with a graphon $f$, as defined in Sect. 3.4 of Chap. 3. Recall also the definition of the operator norm $\|K_f\|$ of the operator $K_f$.

Exercise 6.1 If $\sigma$ is a measure preserving bijection of $[0,1]$ and $f\in\mathcal{W}$, show that $\|K_f\| = \|K_{f_\sigma}\|$.

Exercise 6.2 For a simple graph $G$ on $n$ vertices, let $\lambda_1(G)$ be the largest eigenvalue of the adjacency matrix of $G$. Show that $\|K_{f^G}\| = \lambda_1(G)/n$. (Hint: Use the Perron–Frobenius theorem, for example the version presented as Proposition 2.11 in this monograph.)

The following lemma shows that the operator norm is a nice graph parameter.

Lemma 6.2 Let $\theta(f) := \|K_f\|$. Then $\theta$ is a nice graph parameter.

Proof Let $g$ and $h$ be two graphons and let $f := g-h$. Take any $u\in L^2([0,1])$ with $\|u\|=1$. Then by the Cauchy–Schwarz inequality,

$$\begin{aligned}
\|K_fu\|^4 &= \left(\int_0^1\left(\int_0^1 f(x,y)u(y)\,dy\right)^2 dx\right)^2 \\
&= \left(\int_{[0,1]^3} f(x,y)f(x,y')u(y)u(y')\,dx\,dy\,dy'\right)^2 \\
&\le \left(\int_{[0,1]^2}\left(\int_0^1 f(x,y)f(x,y')\,dx\right)^2 dy\,dy'\right)\left(\int_{[0,1]^2} u(y)^2u(y')^2\,dy\,dy'\right) \\
&= \int_{[0,1]^4} f(x,y)f(x,y')f(x',y)f(x',y')\,dx\,dx'\,dy\,dy'.
\end{aligned}$$

Note that $|f|\le1$ everywhere. Thus, for any $x',y'$, the definition of the cut distance implies that

$$\left|\int_{[0,1]^2} f(x,y)f(x,y')f(x',y)\,dx\,dy\right| \le d_\square(g,h),$$


and so the last expression in the previous display is bounded by $d_\square(g,h)$. Thus, for any $g,h\in\mathcal{W}$,

$$|\theta(g)-\theta(h)|^4 \le \|K_g-K_h\|^4 \le d_\square(g,h).$$

Together with Exercise 6.1, this inequality shows that $\theta$ is a graph parameter. To show that $\theta$ is nice, first observe that there cannot exist any local minima, because for any $\epsilon\in(0,1)$ and $f\in\mathcal{W}$,

$$\|(1-\epsilon)K_f\| = (1-\epsilon)\|K_f\| < \|K_f\|$$

unless $\theta(f)=\|K_f\|=0$. Next, take any $f\in\mathcal{W}$ and $\epsilon>0$, and let $g := f+\epsilon(1-f)$. Then $g\in\mathcal{W}$ and $\|g-f\|_\infty\le\epsilon$. By Proposition 3.4, $K_f$ is a compact operator. Moreover, $K_f$ is obviously a nonnegative operator, according to the definition of nonnegativity given in the paragraph preceding Proposition 2.11 in Chap. 2. Therefore by Proposition 2.11, there exists $u\in B_1([0,1])$ such that $K_fu=\|K_f\|u$. Since

$$K_gu = \bigl(\|K_f\|+\epsilon(1-\|K_f\|)\bigr)u,$$

it follows that $\|K_g\|>\|K_f\|$ unless $\|K_f\|=1$. Thus, any local maximum of $\theta$ must be a global maximum. □
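For the graphon of a finite graph, Exercise 6.2 reduces the operator norm to the top adjacency eigenvalue divided by $n$. The following Python sketch approximates that quantity by power iteration. It is a rough illustration, not a robust eigenvalue solver: the function name is hypothetical, and convergence is only assumed for connected, non-bipartite graphs (for bipartite graphs the iteration started from the all-ones vector can oscillate).

```python
import math


def graphon_operator_norm(adj, iterations=500):
    """Approximate lambda_1(G) / n for a simple graph with 0/1 symmetric
    adjacency matrix adj, using power iteration from the all-ones direction."""
    n = len(adj)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # no edges: the operator is zero
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v for the (unit) limiting vector
    av = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * av[i] for i in range(n)) / n
```

For the complete graph on $n$ vertices the top eigenvalue is $n-1$, so the function should return approximately $(n-1)/n$, which approaches the global maximum value 1 of the parameter as $n$ grows.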

6.2 Rate Functions for Graph Parameters

Recall the function $I_p$ defined in Sect. 5.2 of Chap. 5. Let $\theta$ be a nice graph parameter, as defined in the previous section. For any $p\in(0,1)$ and $t\in\theta(\mathcal{W})$, define

$$\phi^+(p,t) := \inf\{I_p(f) : f\in\mathcal{W},\ \theta(f)\ge t\},$$
$$\phi^-(p,t) := \inf\{I_p(f) : f\in\mathcal{W},\ \theta(f)\le t\}.$$

The following result establishes an important property of nice graph parameters.

Proposition 6.1 If $\theta$ is a nice graph parameter, then for any $p\in(0,1)$, the map $t\mapsto\phi^+(p,t)$ is non-decreasing and continuous and the map $t\mapsto\phi^-(p,t)$ is non-increasing and continuous.

Proof Since $-\theta$ is also a nice graph parameter, it suffices to prove the result for $\phi^+$. Fix $p\in(0,1)$. By definition, $t\mapsto\phi^+(p,t)$ is non-decreasing. Take any $t\in\theta(\mathcal{W})$ and any $f$ such that $\theta(f)\ge t$. First, suppose that $t<\sup\theta(\mathcal{W})$ and let $\{t_n\}_{n\ge1}$ be a sequence that is strictly decreasing to $t$. Since every local maximum of $\theta$ with respect to the $L^\infty$ norm is a global maximum, there exists a sequence of graphons $\{f_n\}_{n\ge1}$ such that $\|f_n-f\|_\infty\to0$ and $\theta(f_n)\ge t_n$ for each $n$. By the uniform continuity of the function $I_p$ on $[0,1]$ and the dominated convergence theorem, it follows that $I_p(f_n)\to I_p(f)$. Thus,

$$\lim_{n\to\infty}\phi^+(p,t_n) \le \lim_{n\to\infty} I_p(f_n) = I_p(f).$$

Since this is true for every $f$ such that $\theta(f)\ge t$ and $\phi^+$ is non-decreasing in $t$, this proves the right continuity of $\phi^+$. Next, assume that $t>\inf\theta(\mathcal{W})$ and take a sequence $\{t_n\}_{n\ge1}$ that is strictly increasing to $t$. Let $\{f_n\}_{n\ge1}$ be a sequence of graphons such that $\theta(f_n)\ge t_n$ for each $n$ and $\phi^+(p,t_n)-I_p(f_n)\to0$. By the invariance of $\theta$ and $I_p$ under measure preserving bijections and the compactness of $\widetilde{\mathcal{W}}$ (Theorem 3.1), we may assume without loss of generality that there exists a graphon $f$ such that $d_\square(f_n,f)\to0$. By the continuity of $\theta$, $\theta(f)\ge t$. By the lower semi-continuity of $I_p$ on $\mathcal{W}$ (Corollary 5.1),

$$\liminf_{n\to\infty} I_p(f_n) \ge I_p(f).$$

Thus,

$$\liminf_{n\to\infty}\phi^+(p,t_n) \ge I_p(f) \ge \phi^+(p,t).$$

Since $\phi^+$ is non-decreasing, this proves that $\phi^+$ is left continuous. □

6.3 Large Deviations for Graph Parameters

Let $\theta$ be a nice graph parameter, and let $\phi^+$ and $\phi^-$ be defined as in the previous section. For a simple graph $G$ on a finite set of vertices, we will use the notation $\theta(G)$ to denote $\theta(f^G)$. The following theorem is the main result of this section. It gives the large deviation rate functions for the upper and lower tails of nice graph parameters.

Theorem 6.1 Fix $p\in(0,1)$. For each $n$, let $G_{n,p}$ be a random graph from the $G(n,p)$ model. Let $\theta$ be a nice graph parameter. Then for any $t\in\theta(\mathcal{W})$,

$$\lim_{n\to\infty}\frac{2}{n^2}\log\mathbb{P}(\theta(G_{n,p})\ge t) = -\phi^+(p,t),$$
$$\lim_{n\to\infty}\frac{2}{n^2}\log\mathbb{P}(\theta(G_{n,p})\le t) = -\phi^-(p,t).$$

Proof Since $-\theta$ is also a nice graph parameter, it suffices to prove the first identity. From Theorem 5.2 and the continuity of $\theta$ on $\widetilde{\mathcal{W}}$, it follows that

$$\limsup_{n\to\infty}\frac{2}{n^2}\log\mathbb{P}(\theta(G_{n,p})\ge t) \le -\phi^+(p,t).$$

Next, let

$$\widetilde U := \{\tilde h\in\widetilde{\mathcal{W}} : \theta(\tilde h)>t\}.$$

By the continuity of $\theta$, $\widetilde U$ is an open set. Therefore by Theorem 5.2,

$$\liminf_{n\to\infty}\frac{2}{n^2}\log\mathbb{P}(\theta(G_{n,p})\ge t) \ge \liminf_{n\to\infty}\frac{2}{n^2}\log\widetilde{\mathbb{P}}_{n,p}(\widetilde U) \ge -\inf_{\tilde h\in\widetilde U} I_p(\tilde h).$$

But it is easy to see that for every $\epsilon>0$,

$$\inf_{\tilde h\in\widetilde U} I_p(\tilde h) \le \phi^+(p,t+\epsilon).$$

By the continuity of $\phi^+$ that we know from Proposition 6.1, the proof is complete. □

The next theorem describes the structure of an Erdős–Rényi graph conditional on the rare event that a nice graph parameter has a large deviation from its expected value. Following the convention introduced in Sect. 5.5 of Chap. 5, we will simply write $G$ to denote the graphon $f^G$ and the equivalence class $\tilde f^G$ of a simple graph $G$. Also, as in (5.5.2), we will use the notation $\delta_\square(G,\widetilde F)$ to denote the infimum of $\delta_\square(G,\tilde h)$ over all $\tilde h\in\widetilde F$.

Theorem 6.2 Let $\theta$ and $G_{n,p}$ be as in Theorem 6.1. Take any $t\in\theta(\mathcal{W})$. Let $\widetilde F^+(p,t)$ be the set of minimizers of $I_p(\tilde f)$ subject to the constraint $\theta(\tilde f)\ge t$ and let $\widetilde F^-(p,t)$ be the set of minimizers of $I_p(\tilde f)$ subject to the constraint $\theta(\tilde f)\le t$. Then $\widetilde F^+(p,t)$ and $\widetilde F^-(p,t)$ are nonempty compact subsets of $\widetilde{\mathcal{W}}$. Moreover, for any $\epsilon>0$ there exist constants $C^+$ and $C^-$ depending only on $\theta$, $p$, $t$ and $\epsilon$ such that

$$\mathbb{P}\bigl(\delta_\square(G_{n,p},\widetilde F^+(p,t))\ge\epsilon \,\big|\, \theta(G_{n,p})\ge t\bigr) \le e^{-C^+n^2},$$
$$\mathbb{P}\bigl(\delta_\square(G_{n,p},\widetilde F^-(p,t))\ge\epsilon \,\big|\, \theta(G_{n,p})\le t\bigr) \le e^{-C^-n^2}.$$

The constant $C^+$ is positive if $\phi^+(p,t)$ is nonzero, and the constant $C^-$ is positive if $\phi^-(p,t)$ is nonzero.

Proof This result is a simple consequence of Theorem 5.3 and Proposition 6.1. The condition (5.5.1) required for Theorem 5.3 can be easily shown to follow from the continuity of $\phi^+$ and $\phi^-$ in $t$ and the assumed positivity of these quantities, because any $\tilde f$ with $\theta(\tilde f)>t$ lies in the interior of the set $\{\tilde h : \theta(\tilde h)\ge t\}$. □


6.4 Large Deviations for Subgraph Densities

In this section we will specialize the results of Sect. 6.3 to subgraph densities. Given two simple graphs $G$ and $H$, recall the definition of the homomorphism density $t(H,G)$. Fix a simple graph $H$ with at least one edge, and define

$$\phi^+(H,p,t) := \inf\{I_p(f) : f\in\mathcal{W},\ t(H,f)\ge t\},$$
$$\phi^-(H,p,t) := \inf\{I_p(f) : f\in\mathcal{W},\ t(H,f)\le t\}.$$

Let $G_{n,p}$ be a random graph from the $G(n,p)$ model. The following theorem specializes Theorem 6.1 to subgraph densities.

Theorem 6.3 Let $H$, $G_{n,p}$, $\phi^+$ and $\phi^-$ be as above. Fix $p\in(0,1)$. Then for any $t\in[0,1]$,

$$\lim_{n\to\infty}\frac{2}{n^2}\log\mathbb{P}(t(H,G_{n,p})\ge t) = -\phi^+(H,p,t),$$
$$\lim_{n\to\infty}\frac{2}{n^2}\log\mathbb{P}(t(H,G_{n,p})\le t) = -\phi^-(H,p,t).$$

Proof Recall that by Exercise 3.1, for any graph $G$, $t(H,G)$ is the same as $t(H,f^G)$, where $f^G$ is the graphon of $G$. The result now follows easily by Lemma 6.1 and Theorem 6.1. □

Let $e(H)$ denote the number of edges in $H$. The following exercise describes the asymptotic behavior of $t(H,G_{n,p})$.

Exercise 6.3 Prove that $\mathbb{E}(t(H,G_{n,p})) = p^{e(H)}$, and that as $n\to\infty$, $t(H,G_{n,p})$ converges to $p^{e(H)}$ in probability.

The next lemma lists some basic properties of $\phi^+$ and $\phi^-$.

Lemma 6.3 The functions $\phi^+$ and $\phi^-$ and the related variational problems have the following properties:
(i) The function $\phi^+$ is continuous and non-decreasing in $t$, and the function $\phi^-$ is continuous and non-increasing in $t$.
(ii) The minimum is attained in the variational problems defining $\phi^+$ and $\phi^-$.
(iii) For any $t$, if $f$ minimizes $I_p(f)$ under the constraint $t(H,f)\ge t$, then $f\ge p$ almost everywhere. If $f$ minimizes $I_p(f)$ under the constraint $t(H,f)\le t$, then $f\le p$ almost everywhere.
(iv) The function $\phi^+$ is zero when $t\le p^{e(H)}$ and strictly increasing in $t$ when $t>p^{e(H)}$. Similarly, $\phi^-$ is zero when $t\ge p^{e(H)}$ and strictly decreasing in $t$ when $t<p^{e(H)}$.

Proof The continuity and monotonicity claims follow from Proposition 6.1. The existence of minimizers follows from Theorem 6.2. The graphon $f\equiv p$ satisfies $I_p(f)=0$ and $t(H,f)=p^{e(H)}$. Since $I_p\ge0$ everywhere, this shows that $\phi^+(H,p,t)=0$ if $t\le p^{e(H)}$ and $\phi^-(H,p,t)=0$ if $t\ge p^{e(H)}$.

Take any $f\in\mathcal{W}$ that minimizes $I_p(f)$ subject to $t(H,f)\ge t$. Let $f_0 := \max\{f,p\}$. Then

$$t(H,f_0)\ge t(H,f)\ge t.$$

Since $I_p$ is a strictly decreasing function in $[0,p]$, $I_p(f_0)<I_p(f)$ unless $f\ge p$ almost everywhere. Since $f$ minimizes $I_p$ subject to the constraint $t(H,f)\ge t$, this shows that $f\ge p$ almost everywhere. Similarly, take any $f\in\mathcal{W}$ that minimizes $I_p(f)$ subject to $t(H,f)\le t$. Let $f_0 := \min\{f,p\}$. Then

$$t(H,f_0)\le t(H,f)\le t.$$

Since $I_p$ is a strictly increasing function in $[p,1]$, $I_p(f_0)<I_p(f)$ unless $f\le p$ almost everywhere. Since $f$ minimizes $I_p$ subject to the constraint $t(H,f)\le t$, this shows that $f\le p$ almost everywhere.

Next, take $t>s\ge p^{e(H)}$. Take any $f\in\mathcal{W}$ that minimizes $I_p(f)$ subject to the constraint $t(H,f)\ge t$. Then $f\ge p$ almost everywhere, as shown above. Moreover, $f>p$ on a set of positive Lebesgue measure, since $t(H,f)\ge t>p^{e(H)}$. Since $I_p$ is strictly increasing in $[p,1]$, this shows that if $g=(1-\epsilon)f+\epsilon p$ for some $\epsilon\in(0,1)$, then $I_p(g)<I_p(f)$. But if $\epsilon$ is small enough, then $t(H,g)\ge s$. This shows that $\phi^+(H,p,s)<\phi^+(H,p,t)$. The strict monotonicity of $\phi^-$ follows by a similar argument. □

The following theorem specializes Theorem 6.2 to subgraph densities.

Theorem 6.4 Let $H$ and $G_{n,p}$ be as above. Take any $t\in[0,1]$. Let $\widetilde F^+(H,p,t)$ be the set of minimizers of $I_p(\tilde f)$ subject to the constraint $t(H,\tilde f)\ge t$ and let $\widetilde F^-(H,p,t)$ be the set of minimizers of $I_p(\tilde f)$ subject to the constraint $t(H,\tilde f)\le t$. Then $\widetilde F^+(H,p,t)$ and $\widetilde F^-(H,p,t)$ are nonempty compact subsets of $\widetilde{\mathcal{W}}$. Moreover, for any $\epsilon>0$ there exist positive constants $C^+$ and $C^-$ depending only on $H$, $p$, $t$ and $\epsilon$ such that if $t>p^{e(H)}$, then

$$\mathbb{P}\bigl(\delta_\square(G_{n,p},\widetilde F^+(H,p,t))\ge\epsilon \,\big|\, t(H,G_{n,p})\ge t\bigr) \le e^{-C^+n^2},$$

and if $t<p^{e(H)}$, then

$$\mathbb{P}\bigl(\delta_\square(G_{n,p},\widetilde F^-(H,p,t))\ge\epsilon \,\big|\, t(H,G_{n,p})\le t\bigr) \le e^{-C^-n^2}.$$

Proof This theorem is an immediate corollary of Theorem 6.2. The positivity of $C^+$ and $C^-$ follows from Theorem 6.2 and Lemma 6.3. □

At this point, the obvious next step is to try to understand the nature of the sets $\widetilde F^+(H,p,t)$ and $\widetilde F^-(H,p,t)$ and to explicitly evaluate $\phi^+(H,p,t)$ and $\phi^-(H,p,t)$ if possible. Although explicit solutions to these problems are not known for all possible $H$, $p$ and $t$, we have a substantial amount of information. This is discussed in the next four sections.
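To make the objects of this section concrete, the Python sketch below computes the triangle homomorphism density of a finite graph and a simple upper bound on the upper-tail rate function. The bound comes from restricting the infimum to constant graphons: the constant graphon $u = t^{1/e(H)}$ has $t(H,u) = t$, so the upper-tail rate is at most $I_p(t^{1/e(H)})$. This is only an upper bound, not a claim about the true minimizer, and the function names are hypothetical.

```python
import math


def triangle_density(adj):
    """Homomorphism density t(K_3, G): number of ordered triples (i, j, k)
    with all three pairs adjacent, divided by n^3."""
    n = len(adj)
    count = 0
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                for k in range(n):
                    if adj[j][k] and adj[k][i]:
                        count += 1
    return count / n ** 3


def rate(u, p):
    """I_p(u) = u log(u/p) + (1-u) log((1-u)/(1-p)), with 0 log 0 := 0."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(u, p) + term(1.0 - u, 1.0 - p)


def constant_graphon_bound(p, t, num_edges):
    """Upper bound on the upper-tail rate function obtained from the constant
    graphon u = t^(1/e(H)), whose H-density is exactly t."""
    return rate(t ** (1.0 / num_edges), p)
```

For example, `constant_graphon_bound(0.3, 0.3 ** 3, 3)` is essentially zero, matching the fact that the rate function vanishes at the typical value $p^{e(H)}$.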

6.5 Euler–Lagrange Equations

In this section we will derive the Euler–Lagrange equations for the variational problems defining the rate functions $\phi^+$ and $\phi^-$ for subgraph densities. The main technical challenge is that the domain for these variational problems is not an open set, so one has to take care of boundary effects. The Euler–Lagrange equations will be used later to explicitly solve the variational problems in certain regimes.

For a finite simple graph $H$, let $V(H)$ and $E(H)$ denote the sets of vertices and edges of $H$. As before, let $e(H)$ be the number of edges of $H$. Given a symmetric Borel measurable function $h:[0,1]^2\to\mathbb{R}$, for each $\{r,s\}\in E(H)$ and each pair of points $x_r,x_s\in[0,1]$, define

$$\Delta_{H,r,s}h(x_r,x_s) := \int_{[0,1]^{|V(H)|-2}} \prod_{\substack{\{r',s'\}\in E(H)\\ \{r',s'\}\ne\{r,s\}}} h(x_{r'},x_{s'}) \prod_{\substack{v\in V(H)\\ v\ne r,s}} dx_v.$$

For $x,y\in[0,1]$ define

$$\Delta_Hh(x,y) := \sum_{\{r,s\}\in E(H)} \Delta_{H,r,s}h(x,y). \tag{6.5.1}$$

For example, when $H$ is a triangle, then $V(H)=\{1,2,3\}$ and

$$\Delta_{H,1,2}h(x,y) = \Delta_{H,1,3}h(x,y) = \Delta_{H,2,3}h(x,y) = \int_0^1 h(x,z)h(y,z)\,dz$$

and therefore $\Delta_Hh(x,y) = 3\int_0^1 h(x,z)h(y,z)\,dz$. When $H$ contains exactly one edge, define $\Delta_Hh\equiv1$ for any $h$, by the usual convention that the empty product is 1.

Theorem 6.5 Fix $p\in(0,1)$, a finite simple graph $H$ containing at least one edge, and a number $t\in(0,1)$. Let $\widetilde F^+(H,p,t)$ and $\widetilde F^-(H,p,t)$ be defined as in the statement of Theorem 6.4. Let $\alpha := \log(p/(1-p))$ and $\Delta_H$ be defined as above. Then for any $h\in\mathcal{W}$ such that $\tilde h\in\widetilde F^+(H,p,t)$ or $\tilde h\in\widetilde F^-(H,p,t)$, there exists $\beta\in\mathbb{R}$ such that for almost every $(x,y)\in[0,1]^2$,

$$h(x,y) = \frac{e^{\alpha+\beta\Delta_Hh(x,y)}}{1+e^{\alpha+\beta\Delta_Hh(x,y)}}.$$
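The fixed-point equation in Theorem 6.5 can be explored numerically for step-function graphons. The Python sketch below treats the triangle case, where $\Delta_Hh(x,y) = 3\int_0^1 h(x,z)h(y,z)\,dz$, and measures how far a given step function is from satisfying the equation for a given $\beta$. It is illustrative only: the function names are hypothetical, and the check that every constant graphon $u\in(0,1)$ solves the equation with $\beta = (\operatorname{logit}(u)-\alpha)/(3u^2)$ says nothing about whether that constant is actually a minimizer.

```python
import math


def delta_triangle(h):
    """Delta_H h for H a triangle and h a step function given by an n x n
    symmetric matrix of block values: (3 / n) * sum_k h[i][k] * h[j][k]."""
    n = len(h)
    return [[3.0 * sum(h[i][k] * h[j][k] for k in range(n)) / n
             for j in range(n)] for i in range(n)]


def euler_lagrange_residual(h, p, beta):
    """Largest absolute gap between h and e^z / (1 + e^z), where
    z = alpha + beta * Delta_H h and alpha = log(p / (1 - p))."""
    alpha = math.log(p / (1.0 - p))
    d = delta_triangle(h)
    n = len(h)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            z = alpha + beta * d[i][j]
            rhs = 1.0 / (1.0 + math.exp(-z))  # equals e^z / (1 + e^z)
            worst = max(worst, abs(h[i][j] - rhs))
    return worst


def beta_for_constant(u, p):
    """Value of beta for which the constant graphon u solves the equation
    in the triangle case, where Delta_H u = 3 u^2."""
    logit = lambda v: math.log(v / (1.0 - v))
    return (logit(u) - logit(p)) / (3.0 * u * u)
```

With `h` the constant matrix of value 0.6 and `p = 0.3`, the residual at `beta_for_constant(0.6, 0.3)` should vanish up to rounding, while other values of `beta` leave a visible gap.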


Proof Fix $h$ such that $\tilde h\in\widetilde F^+(H,p,t)\cup\widetilde F^-(H,p,t)$. The strict monotonicities of $\phi^+$ and $\phi^-$ (Lemma 6.3) imply that $t(H,h)=t$. Therefore $h$ minimizes $I_p(f)$ among all $f$ satisfying $t(H,f)=t$. Let $g$ be a bounded symmetric Borel measurable function from $[0,1]^2$ into $\mathbb{R}$. For each $u\in\mathbb{R}$, let

$$f_u(x,y) := h(x,y)+u\,g(x,y),$$

and let $h_u := \alpha(u)f_u$, where

$$\alpha(u) = \left(\frac{t(H,h)}{t(H,f_u)}\right)^{1/e(H)}.$$

Note that $t(H,h_u)=t(H,h)$ for any $u$ such that $t(H,f_u)\ne0$. First suppose that $h$ is bounded away from 0 and 1. Then $t(H,f_u)\ne0$ and $h_u\in\mathcal{W}$ for every $u$ sufficiently small in magnitude. Thus,

$$\frac{d}{du}I_p(h_u)\Big|_{u=0} = 0. \tag{6.5.2}$$

(Using the assumption that $h$ is bounded away from 0 and 1, it is easy to check that $I_p(h_u)$ is differentiable in $u$ for any $h$ and $g$ when $|u|$ is small enough.) A simple computation shows that

$$\frac{d}{du}I_p(h_u)\Big|_{u=0} = \int_{[0,1]^2} I_p'(h(x,y))\bigl(\alpha'(0)h(x,y)+\alpha(0)g(x,y)\bigr)\,dx\,dy. \tag{6.5.3}$$

Note that

$$\begin{aligned}
\frac{d}{du}t(H,f_u) &= \int_{[0,1]^{V(H)}} \sum_{\{r,s\}\in E(H)} g(x_r,x_s) \prod_{\substack{\{r',s'\}\in E(H)\\ \{r',s'\}\ne\{r,s\}}} f_u(x_{r'},x_{s'}) \prod_{v\in V(H)} dx_v \\
&= \int_{[0,1]^2} g(x,y)\,\Delta_Hf_u(x,y)\,dy\,dx.
\end{aligned}$$

Now $\Delta_Hf_u=\Delta_Hh$ when $u=0$. Thus, whenever $g$ is such that

$$\int_{[0,1]^2} g(x,y)\,\Delta_Hh(x,y)\,dy\,dx = 0, \tag{6.5.4}$$

then $\alpha'(0)=0$, and hence by (6.5.2) and (6.5.3),

$$0 = \frac{d}{du}I_p(h_u)\Big|_{u=0} = \int_{[0,1]^2} I_p'(h(x,y))\,g(x,y)\,dx\,dy. \tag{6.5.5}$$


We claim that this implies that there exists $\beta \in \mathbb{R}$ such that for almost all $(x,y) \in [0,1]^2$,
\[
I_p'(h(x,y)) = \beta \Delta_H h(x,y), \tag{6.5.6}
\]
which is the same as the assertion of the theorem. To prove this, first suppose that $\Delta_H h(x,y) = 0$ almost everywhere. Then taking $g(x,y) = I_p'(h(x,y))$, we see that (6.5.4) is satisfied, and hence (6.5.5) holds. This, in turn, implies that $I_p'(h(x,y)) = 0$ almost everywhere, and therefore (6.5.6) holds with any value of $\beta$. Next, suppose that $\Delta_H h$ is nonzero on a set of positive measure. Define
\[
\beta := \frac{\int_{[0,1]^2} I_p'(h(x,y))\, \Delta_H h(x,y)\, dx\, dy}{\int_{[0,1]^2} \Delta_H h(x,y)^2\, dx\, dy},
\]
and
\[
g(x,y) := I_p'(h(x,y)) - \beta \Delta_H h(x,y).
\]
By the definition of $\beta$, (6.5.4) holds. Therefore (6.5.5) also holds. Subtracting $\beta$ times the left-hand side of (6.5.4) from the left-hand side of (6.5.5) shows that $\int_{[0,1]^2} g(x,y)^2\, dx\, dy = 0$, so that (6.5.6) is satisfied almost everywhere.

Note that the above proof was carried out under the assumption that $h$ is bounded away from 0 and 1. We will now prove that this assumption holds. For this proof, it is important to recall some basic properties of $I_p$, namely, that $I_p$ is convex on $[0,1]$, $I_p(p) = 0$, $I_p$ is strictly increasing in $[p,1]$ and strictly decreasing in $[0,p]$, $I_p'(x) \to -\infty$ as $x \to 0$, and $I_p'(x) \to \infty$ as $x \to 1$.

First, suppose that $\tilde{h} \in \widetilde{F}^-(H,p,t)$. By Lemma 6.3, $h \le p$ almost everywhere. So we only have to show that $h$ is bounded away from zero. Fix $\epsilon > 0$ and let
\[
w(x,y) := \begin{cases} 0 & \text{if } h(x,y) \ge \epsilon, \\ 1 & \text{if } h(x,y) < \epsilon. \end{cases}
\]
Suppose that $h < \epsilon$ on a set of positive Lebesgue measure, so that
\[
\int_{[0,1]^2} w(x,y)\, dx\, dy > 0. \tag{6.5.7}
\]
For each $u \ge 0$, let
\[
g_u(x,y) := (1 - Au)\bigl( h(x,y) + u\, w(x,y) \bigr),
\]
where
\[
A := B \int_{[0,1]^2} w(x,y)\, dx\, dy,
\]
where $B$ is a constant, to be chosen later. We have already observed at the beginning of this proof that $t(H,h) = t$ by a consequence of Lemma 6.3. A simple computation using this fact shows that
\[
\frac{d}{du} t(H,g_u) \Big|_{u=0} = -e(H) A\, t(H,h) + \int_{[0,1]^2} w(x,y)\, \Delta_H h(x,y)\, dx\, dy = \int_{[0,1]^2} \bigl( -e(H) B t + \Delta_H h(x,y) \bigr) w(x,y)\, dx\, dy.
\]
Choose $B$ so large (depending only on $t$, $H$ and $h$) such that for all $x, y$,
\[
-e(H) B t + \Delta_H h(x,y) < 0.
\]
Then by (6.5.7) and the above identity,
\[
\frac{d}{du} t(H,g_u) \Big|_{u=0} < 0. \tag{6.5.8}
\]

Now note that
\[
\frac{d}{du} I_p(g_u) \Big|_{u=0} = \int_{[0,1]^2} I_p'(h(x,y)) \bigl( -A h(x,y) + w(x,y) \bigr)\, dx\, dy \le \int_{[0,1]^2} \bigl( -CB + I_p'(h(x,y)) \bigr) w(x,y)\, dx\, dy, \tag{6.5.9}
\]
where
\[
C = \int_{[0,1]^2} I_p'(h(x,y))\, h(x,y)\, dx\, dy.
\]
It is easy to see that $C$ is finite, using the fact that $h \le p$ almost everywhere. Note also that $C$ does not depend on $\epsilon$. Therefore, if $\epsilon$ is so small that
\[
I_p'(\epsilon) < CB, \tag{6.5.10}
\]
then by (6.5.9) and (6.5.7) (and the properties of $I_p$ listed before),
\[
\frac{d}{du} I_p(g_u) \Big|_{u=0} < 0. \tag{6.5.11}
\]

Since $h$ is bounded away from 1, $g_u \in \mathcal{W}$ for all sufficiently small positive $u$. Therefore by the minimizing property of $h$, the inequalities (6.5.8) and (6.5.11) cannot hold simultaneously. Therefore, if $\epsilon$ is so small that (6.5.10) is satisfied, then (6.5.7) must be invalid. In other words, $h \ge \epsilon$ almost everywhere. This completes the proof that any $\tilde{h} \in \widetilde{F}^-(H,p,t)$ is bounded away from 0 and 1.

Next, take any $h$ such that $\tilde{h} \in \widetilde{F}^+(H,p,t)$. The proof is quite similar to the previous case, with minor modifications. By Lemma 6.3, $h \ge p$ almost everywhere. So we only have to show that $h$ is bounded away from 1. Fix $\epsilon > 0$ and let
\[
w(x,y) := \begin{cases} 0 & \text{if } h(x,y) \le 1 - \epsilon, \\ 1 & \text{if } h(x,y) > 1 - \epsilon. \end{cases}
\]
Suppose that $h > 1 - \epsilon$ on a set of positive Lebesgue measure, so that
\[
\int_{[0,1]^2} w(x,y)\, dx\, dy > 0. \tag{6.5.12}
\]

For each $u \ge 0$, let
\[
g_u(x,y) := (1 - Au)\bigl( h(x,y) - u\, w(x,y) \bigr) + Au,
\]
where
\[
A := B \int_{[0,1]^2} w(x,y)\, dx\, dy,
\]
where $B$ is a constant, to be chosen later. A simple computation shows that
\[
\frac{d}{du} t(H,g_u) \Big|_{u=0} = AD - \int_{[0,1]^2} w(x,y)\, \Delta_H h(x,y)\, dx\, dy = \int_{[0,1]^2} \bigl( BD - \Delta_H h(x,y) \bigr) w(x,y)\, dx\, dy, \tag{6.5.13}
\]
where
\[
D = \sum_{\{r,s\} \in E(H)} \int_{[0,1]^{V(H)}} \bigl( 1 - h(x_r,x_s) \bigr) \prod_{\substack{\{r',s'\} \in E(H) \\ \{r',s'\} \ne \{r,s\}}} h(x_{r'},x_{s'}) \prod_{v \in V(H)} dx_v .
\]
Since $h \ge p$ almost everywhere, the above formula shows that $D$ can be 0 only if $h = 1$ almost everywhere. Since $t(H,h) = t < 1$, this is not true. Therefore $D > 0$. Thus, $B$ can be chosen (depending only on $t$, $H$ and $h$) such that for all $x, y$,
\[
BD - \Delta_H h(x,y) > 0.
\]


Then by (6.5.12) and (6.5.13),
\[
\frac{d}{du} t(H,g_u) \Big|_{u=0} > 0. \tag{6.5.14}
\]
Now note that
\[
\frac{d}{du} I_p(g_u) \Big|_{u=0} = \int_{[0,1]^2} I_p'(h(x,y)) \bigl( A(1 - h(x,y)) - w(x,y) \bigr)\, dx\, dy \le \int_{[0,1]^2} \bigl( CB - I_p'(h(x,y)) \bigr) w(x,y)\, dx\, dy, \tag{6.5.15}
\]
where
\[
C = \int_{[0,1]^2} I_p'(h(x,y)) \bigl( 1 - h(x,y) \bigr)\, dx\, dy.
\]
It is easy to see that $C$ is finite, using the fact that $h \ge p$ almost everywhere. Note also that $C$ does not depend on $\epsilon$. Therefore, if $\epsilon$ is so small that
\[
I_p'(1 - \epsilon) > CB, \tag{6.5.16}
\]
then by (6.5.15) and (6.5.12) (and the properties of $I_p$ listed before),
\[
\frac{d}{du} I_p(g_u) \Big|_{u=0} < 0. \tag{6.5.17}
\]
Since $h$ is bounded away from 0, $g_u \in \mathcal{W}$ for all sufficiently small positive $u$. Therefore by the minimizing property of $h$, the inequalities (6.5.14) and (6.5.17) cannot hold simultaneously. Therefore, if $\epsilon$ is so small that (6.5.16) is satisfied, then (6.5.12) must be invalid. In other words, $h \le 1 - \epsilon$ almost everywhere. This completes the proof that any $\tilde{h} \in \widetilde{F}^+(H,p,t)$ is bounded away from 0 and 1. □

6.6 The Symmetric Phase

Let $\phi^+$ and $\phi^-$ be the rate functions for subgraph densities, defined in Sect. 6.4. Fix $H$ and $p$. Take any $t > p^{e(H)}$. We will say that $t$ is in the symmetric phase if the set of minimizers $\widetilde{F}^+(H,p,t)$ consists of a unique constant function. Similarly, for $t < p^{e(H)}$, we will say that $t$ is in the symmetric phase if the set of minimizers $\widetilde{F}^-(H,p,t)$ consists of a unique constant function. The following result clarifies the significance of the symmetric phase.

Proposition 6.2 Let $H$ be a finite simple graph with at least one edge and $p$ be an element of $(0,1)$. Suppose that $t > p^{e(H)}$ belongs to the symmetric phase. Let $r := t^{1/e(H)}$. Let $G_{n,p}$ and $G_{n,r}$ be independent Erdős–Rényi graphs defined on the same probability space. Then for any $\epsilon > 0$,
\[
\lim_{n \to \infty} P\bigl( \delta_\square(G_{n,p}, G_{n,r}) > \epsilon \mid t(H,G_{n,p}) \ge t \bigr) = 0.
\]
Similarly, if $t < p^{e(H)}$ belongs to the symmetric phase, then
\[
\lim_{n \to \infty} P\bigl( \delta_\square(G_{n,p}, G_{n,r}) > \epsilon \mid t(H,G_{n,p}) \le t \bigr) = 0.
\]

Proof This result is an easy consequence of the definition of symmetric phase, Theorem 6.2 and Exercise 5.1. □

The next theorem, which is the main result of this section, says that if $t$ is close enough to $p^{e(H)}$, it is in the symmetric phase. The proof uses the Euler–Lagrange equations derived in the previous section.

Theorem 6.6 Take any finite simple graph $H$ consisting of at least one edge and a number $p \in (0,1)$. For any $t \in [0,1]$, let $c_{H,t}$ be the graphon that is identically equal to $t^{1/e(H)}$. Then there exists $\delta > 0$ depending only on $H$ and $p$ such that $\widetilde{F}^+(H,p,t) = \{\tilde{c}_{H,t}\}$ if $p^{e(H)} < t < p^{e(H)} + \delta$ and $\widetilde{F}^-(H,p,t) = \{\tilde{c}_{H,t}\}$ if $p^{e(H)} - \delta < t < p^{e(H)}$. Consequently, in the first case, $\phi^+(H,p,t) = I_p(t^{1/e(H)})$ and in the second case, $\phi^-(H,p,t) = I_p(t^{1/e(H)})$.

Proof In this proof, $C$ will throughout denote any function from $[0,\infty)$ into $[0,\infty]$, depending only on $p$ and $H$ but not on $t$, such that
\[
\lim_{x \to 0} C(x) = 0.
\]

The function $C$ may change from line to line. Let $\epsilon := |t - p^{e(H)}|$. We claim that for any $h$ such that $\tilde{h} \in \widetilde{F}^+(H,p,t) \cup \widetilde{F}^-(H,p,t)$,
\[
\int_{[0,1]^2} (h(x,y) - p)^2\, dx\, dy < C(\epsilon). \tag{6.6.1}
\]
To see this, first observe that $t(H,c_{H,t}) = t$ and $I_p(c_{H,t}) < C(\epsilon)$. Therefore, for any $h$ such that $\tilde{h} \in \widetilde{F}^+(H,p,t) \cup \widetilde{F}^-(H,p,t)$,
\[
I_p(h) < C(\epsilon). \tag{6.6.2}
\]


The map $I_p$ on $[0,1]$ satisfies $I_p(p) = I_p'(p) = 0$ and
\[
I_p''(x) = \frac{1}{x(1-x)} \ge 4 \quad \text{for all } x \in (0,1),
\]
and therefore
\[
I_p(x) \ge (x - p)^2 \quad \text{for all } x \in [0,1].
\]
This, combined with (6.6.2), proves (6.6.1). Now fix any $h$ such that
\[
\tilde{h} \in \widetilde{F}^+(H,p,t) \cup \widetilde{F}^-(H,p,t).
\]

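By Theorem 6.5, there is some $\beta \in \mathbb{R}$ such that $h$ satisfies
\[
\log \frac{h(x,y)}{1 - h(x,y)} - \log \frac{p}{1-p} = \beta \Delta_H h(x,y) \tag{6.6.3}
\]
for almost all $x, y$. Since $\Delta_H$ applied to the constant function $p$ equals $e(H) p^{e(H)-1}$, it is easy to see from (6.6.1) that
\[
\int_{[0,1]^2} \bigl( \Delta_H h(x,y) - e(H) p^{e(H)-1} \bigr)^2\, dx\, dy < C(\epsilon).
\]
From this and (6.6.1), it follows that there exists $(x,y) \in [0,1]^2$ that satisfies (6.6.3), such that
\[
\left| \log \frac{h(x,y)}{1 - h(x,y)} - \log \frac{p}{1-p} \right| < C(\epsilon)
\]
and
\[
\bigl| \Delta_H h(x,y) - e(H) p^{e(H)-1} \bigr| < C(\epsilon).
\]
Consequently, from (6.6.3), we get
\[
|\beta| < C(\epsilon). \tag{6.6.4}
\]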

Let $\|\cdot\|_\infty$ denote the $L^\infty$ norm on $\mathcal{W}$. Let $\sigma$ be any measure preserving bijection of $[0,1]$ and let $g(x,y) := h(\sigma x, \sigma y)$. Then $g$ also satisfies (6.6.3). A simple computation shows that
\[
\|\Delta_H h - \Delta_H g\|_\infty \le \sum_{\{r,s\} \in E(H)} \|\Delta_{H,r,s} h - \Delta_{H,r,s} g\|_\infty \le e(H)(e(H) - 1)\, \|h - g\|_\infty .
\]
Let $\alpha := \log(p/(1-p))$. Using the above inequality, Theorem 6.5 and the inequality
\[
\left| \frac{e^a}{1 + e^a} - \frac{e^b}{1 + e^b} \right| \le \frac{|a - b|}{4}
\]
(easily proved by the mean value theorem) it follows that for almost all $x, y$,
\[
|h(x,y) - g(x,y)| = \left| \frac{e^{\alpha + \beta \Delta_H h(x,y)}}{1 + e^{\alpha + \beta \Delta_H h(x,y)}} - \frac{e^{\alpha + \beta \Delta_H g(x,y)}}{1 + e^{\alpha + \beta \Delta_H g(x,y)}} \right| \le \frac{1}{4} |\beta|\, \|\Delta_H h - \Delta_H g\|_\infty \le \frac{1}{4} \|h - g\|_\infty\, |\beta|\, e(H)(e(H) - 1).
\]
If the coefficient of $\|h - g\|_\infty$ in the last expression is strictly less than 1, it follows that $h$ must be equal to $g$ almost everywhere. Since this would hold for any bijection $\sigma$, $h$ must be a constant function. Combined with (6.6.4), this completes the proof. □
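The last step of the proof rests on two elementary facts: the logistic map is $1/4$-Lipschitz, and the contraction coefficient $\frac{1}{4}|\beta|\, e(H)(e(H)-1)$ drops below 1 once $|\beta|$ is small. The following sketch (function names are hypothetical, and the grid check is only a sanity test, not a proof) verifies the Lipschitz bound numerically and computes the corresponding threshold on $|\beta|$.

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def max_lipschitz_ratio(lo=-8.0, hi=8.0, steps=400):
    """Largest |logistic(a) - logistic(b)| / |a - b| over a grid of distinct pairs."""
    pts = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    best = 0.0
    for i, a in enumerate(pts):
        for b in pts[i + 1:]:
            best = max(best, abs(logistic(a) - logistic(b)) / (b - a))
    return best

def beta_threshold(num_edges):
    """|beta| below this value makes (1/4) |beta| e(H) (e(H) - 1) < 1."""
    if num_edges < 2:
        return math.inf  # the coefficient vanishes when H has a single edge
    return 4.0 / (num_edges * (num_edges - 1))

ratio = max_lipschitz_ratio()           # stays below 1/4
triangle_threshold = beta_threshold(3)  # 4 / (3 * 2) for the triangle
```

For the triangle the threshold is $2/3$, so any minimizer whose multiplier satisfies $|\beta| < 2/3$ is forced to be constant by the argument above.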

6.7 Symmetry Breaking

Let all notation be as in Sect. 6.4. Let $H$ be a finite simple graph. Take any $p \in (0,1)$. We will say that a number $t > p^{e(H)}$ belongs to the region of broken symmetry if $\widetilde{F}^+(H,p,t)$ contains only non-constant graphons, and a number $t < p^{e(H)}$ belongs to the region of broken symmetry if $\widetilde{F}^-(H,p,t)$ contains only non-constant graphons. Note that the definition leaves open the possibility that a number $t$ may belong to neither the symmetric phase nor the region of broken symmetry, for example if $\widetilde{F}^+(H,p,t)$ contains both constant and non-constant graphons.

The following theorem shows that a region of broken symmetry for the upper tail exists under a mild condition on $H$. Recall that the degree of a vertex in a graph is the number of vertices that are adjacent to it. The average degree of a graph is the average of the vertex degrees.

Theorem 6.7 Let $H$ be a finite simple graph with average degree strictly greater than one. Let $\widetilde{C}$ denote the set of constant functions in $\widetilde{\mathcal{W}}$. Then for each $t \in (0,1)$, there exists $p' > 0$ such that for all $0 < p < p'$, $\widetilde{F}^+(H,p,t) \cap \widetilde{C} = \emptyset$. Moreover, for such $p$, there exists $\epsilon > 0$ such that
\[
\lim_{n \to \infty} P\bigl( \delta_\square(G_{n,p}, \widetilde{C}) > \epsilon \mid t(H,G_{n,p}) \ge t \bigr) = 1.
\]

Proof Take any $t \in (0,1)$. Let $v(H)$ be the number of vertices of $H$, and let
\[
\chi_{H,t}(x,y) := \begin{cases} 1 & \text{if } \max\{x,y\} \le t^{1/v(H)}, \\ 0 & \text{otherwise.} \end{cases}
\]
Then $t(H,\chi_{H,t}) \ge t$, and for any $t \in (0,1)$,
\[
\lim_{p \to 0} \frac{I_p(c_{H,t})}{\log(1/p)} = t^{1/e(H)} > t^{2/v(H)} = \lim_{p \to 0} \frac{I_p(\chi_{H,t})}{\log(1/p)},
\]
by the assumed condition that the average degree is strictly greater than one (which is the same as saying $2e(H)/v(H) > 1$). Moreover, $t(H,c_{H,t'}) \ge t$ if and only if $t' \ge t$, and if $t' \ge t$, then $I_p(c_{H,t'}) \ge I_p(c_{H,t})$. Thus, the above inequality shows that for each $t \in (0,1)$, there exists $p' > 0$ such that for all $0 < p < p'$ we have
\[
\widetilde{F}^+(H,p,t) \cap \widetilde{C} = \emptyset.
\]
Since $\widetilde{F}^+(H,p,t)$ and $\widetilde{C}$ are non-empty compact subsets of $\widetilde{\mathcal{W}}$, this implies that
\[
\delta_\square\bigl( \widetilde{F}^+(H,p,t), \widetilde{C} \bigr) > 0.
\]
It is now easy to complete the proof using Theorem 6.2. □

If $H$ is a simple graph and $H'$ is obtained from $H$ by eliminating isolated vertices (if any), then $t(H',G) = t(H,G)$ for any $G$. Therefore in our study of subgraph densities we may consider only those $H$ that have no isolated vertices. If $H$ is a finite simple graph with no isolated vertices, then it is easy to see that the average degree of $H$ is at least one. The following exercise shows that for the existence of a region of broken symmetry in the upper tail, it is necessary that the average degree is strictly bigger than one.

Exercise 6.4 If $H$ has no isolated vertices and has average degree exactly equal to one, show that for any $p \in (0,1)$, there is no region of broken symmetry in the upper tail.

Notice that Theorem 6.7 only says that regions of broken symmetry exist for small enough $p$. We will see later that unless $p$ is small enough (depending on $H$), a region of broken symmetry in the upper tail may not exist.

The next theorem is the analog of Theorem 6.7 for lower tails. Recall that the chromatic number of a graph is the minimum number of colors required to color the vertices such that no two vertices connected by an edge receive the same color.

Theorem 6.8 Let $H$ be a finite simple graph with chromatic number at least three. Then for each $p \in (0,1)$, there exists $t' \in (0,1)$ such that for all $0 < t < t'$, $\widetilde{F}^-(H,p,t) \cap \widetilde{C} = \emptyset$. Moreover, for such $t$, there exists $\epsilon > 0$ such that
\[
\lim_{n \to \infty} P\bigl( \delta_\square(G_{n,p}, \widetilde{C}) > \epsilon \mid t(H,G_{n,p}) \le t \bigr) = 1.
\]

Proof Let $k$ be the chromatic number of $H$. For a real number $x$, let $\lfloor x \rfloor$ denote the largest integer that is $\le x$. Define
\[
\chi_{H,p}(x,y) := \begin{cases} p & \text{if } \lfloor (k-1)x \rfloor \ne \lfloor (k-1)y \rfloor, \\ 0 & \text{otherwise.} \end{cases}
\]
Let $n$ be the number of vertices of $H$. Label these vertices as $1, 2, \ldots, n$. Suppose that $x_1, \ldots, x_n \in (0,1)$ are points such that $\chi_{H,p}(x_i,x_j) \ne 0$ for all $\{i,j\} \in E(H)$. Let $r_i := \lfloor (k-1)x_i \rfloor$. Then by the definition of $\chi_{H,p}$, $r_i \ne r_j$ for all $\{i,j\} \in E(H)$. Thus, if we color vertex $i$ with color $r_i$, then no two adjacent vertices receive the same color. However, $r_i$ can take only $k-1$ possible values. This contradicts the fact that $k$ is the chromatic number of $H$. Therefore, such $x_1, \ldots, x_n$ cannot exist. This proves that
\[
t(H,\chi_{H,p}) = 0. \tag{6.7.1}
\]
Since $k \ge 3$ and $I_p(p) = 0$,
\[
\lim_{t \to 0} \frac{I_p(c_{H,t})}{\log(1/(1-p))} = 1 > \frac{1}{k-1} = \frac{I_p(\chi_{H,p})}{\log(1/(1-p))} .
\]
By (6.7.1), by a similar argument as in the proof of Theorem 6.7, this shows that for all sufficiently small $t$,
\[
\widetilde{F}^-(H,p,t) \cap \widetilde{C} = \emptyset.
\]
The rest of the proof is similar to that of Theorem 6.7. □

A graph with chromatic number one is just a collection of isolated vertices, and therefore quite uninteresting. It is conjectured that a graph with chromatic number two cannot have a region of broken symmetry in the lower tail, which means that the condition on $H$ in Theorem 6.8 is necessary for the existence of a region of broken symmetry. The conjecture has been proved in some special cases. We will discuss more about this later.
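The two comparisons that drive Theorems 6.7 and 6.8 are easy to evaluate numerically. The sketch below assumes the normalization $I_p(x) = x \log(x/p) + (1-x)\log((1-x)/(1-p))$, which matches the derivative formula used in the proof of Lemma 6.4; the function names are hypothetical. It compares the cost of the constant graphon with the cost of the clique-like graphon (upper tail, small $p$) and of the $(k-1)$-partite graphon (lower tail, small $t$) when $H$ is a triangle.

```python
import math

def rel_entropy(x, p):
    """I_p(x) = x log(x/p) + (1-x) log((1-x)/(1-p)), with 0 log 0 = 0."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / p)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - p))
    return total

def upper_tail_costs(t, p, e_H, v_H):
    """I_p of the constant graphon c_{H,t} and of the clique-like graphon of Theorem 6.7."""
    constant = rel_entropy(t ** (1.0 / e_H), p)
    area = t ** (2.0 / v_H)  # measure of {max(x, y) <= t^(1/v(H))}
    clique = area * rel_entropy(1.0, p) + (1 - area) * rel_entropy(0.0, p)
    return constant, clique

def lower_tail_costs(t, p, e_H, k):
    """I_p of c_{H,t} and of the (k-1)-partite graphon of Theorem 6.8."""
    constant = rel_entropy(t ** (1.0 / e_H), p)
    # The graphon equals p off the k-1 diagonal blocks (cost 0) and 0 on them.
    partite = rel_entropy(0.0, p) / (k - 1)
    return constant, partite

# Triangle: e(H) = 3, v(H) = 3, chromatic number k = 3.
up_const, up_clique = upper_tail_costs(t=0.1, p=1e-6, e_H=3, v_H=3)
low_const, low_partite = lower_tail_costs(t=1e-9, p=0.5, e_H=3, k=3)
```

In both regimes the non-constant graphon is cheaper than the constant one, which is exactly why no constant graphon can be a minimizer there.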

6.8 The Phase Boundary for Regular Subgraphs

Let us continue to work in the setting of Sect. 6.4. The additional assumption in this section is that the graph $H$ is regular. (Recall that a graph is called regular if all its vertices have the same degree.) When $H$ is regular, the exact boundary between the symmetric phase and the phase of broken symmetry for the upper tail can be identified. This is the content of the following theorem. To understand the statement of the theorem, recall that the convex minorant of a function $f$ is the greatest convex function $g$ such that $g \le f$ everywhere on the domain of $f$. Note that this definition makes sense because the pointwise supremum of an arbitrary collection of convex functions is convex. Recall also the definitions of symmetric phase and symmetry breaking from Sects. 6.6 and 6.7.

Theorem 6.9 Let $H$ be a finite, simple, regular graph of degree $d \ge 2$. Take any $p \in (0,1)$ and $t \in (p^{e(H)}, 1)$. Let $r := t^{1/e(H)}$. Then $t$ belongs to the symmetric phase if the point $(r^d, I_p(r))$ lies on the convex minorant of the function $J_p(x) := I_p(x^{1/d})$. On the other hand, if $(r^d, I_p(r))$ does not lie on the convex minorant of $J_p$, then $t$ belongs to the region of broken symmetry.

To prove this theorem, we need a small amount of preparation.

Lemma 6.4 Let $J_p''$ denote the second derivative of $J_p$. Then $J_p''$ cannot have more than two zeroes in $(0,1)$.

Proof A simple computation gives
\[
I_p'(x) = \log \frac{x}{1-x} - \log \frac{p}{1-p}
\]
and
\[
I_p''(x) = \frac{1}{x(1-x)} .
\]
Thus,
\[
J_p''(x) = \frac{1}{d} \left( \frac{1}{d} - 1 \right) x^{1/d - 2}\, I_p'(x^{1/d}) + \frac{1}{d^2}\, x^{2/d - 2}\, I_p''(x^{1/d}),
\]
which implies that $J_p''(x) = 0$ if and only if
\[
(d-1)\, I_p'(x^{1/d}) = x^{1/d}\, I_p''(x^{1/d}).
\]
In other words, $x$ is a zero of $J_p''$ if and only if $x^{1/d}$ is a zero of
\[
K_p(y) := (d-1)\, I_p'(y) - y\, I_p''(y).
\]
Now note that
\[
K_p'(y) = \frac{d-1}{y(1-y)} - \frac{1}{(1-y)^2} = \frac{(d-1)(1-y) - y}{y(1-y)^2},
\]
which shows that $K_p'$ has exactly one zero in $(0,1)$. By Rolle's theorem, this implies that $K_p$ can have at most two zeroes in $(0,1)$. This completes the proof of the lemma. □

Lemma 6.5 Take any $r \in (p,1)$. If the point $(r^d, I_p(r))$ lies on the graph of the convex minorant $\hat{J}_p$ of $J_p$, then $\hat{J}_p$ cannot be linear in a neighborhood of $r^d$.

Proof Suppose that $\hat{J}_p$ is linear in a neighborhood of $r^d$. First, note that by the formulas for $I_p'$, $I_p''$ and $J_p''$ derived in the proof of Lemma 6.4, it follows that $J_p''(x) > 0$ when $x = p^d$ and also when $x$ is sufficiently close to 1.


Next, recall that $J_p(r^d) = \hat{J}_p(r^d)$, $\hat{J}_p$ is linear in a neighborhood of $r^d$ and $J_p \ge \hat{J}_p$ everywhere. From these it follows easily that $J_p'(r^d) = \hat{J}_p'(r^d)$ and $J_p''(r^d) \ge \hat{J}_p''(r^d) = 0$. Now suppose that $J_p$ is convex in $(p^d, r^d)$. Then by the given conditions, the function that equals $J_p$ in $(0, r^d]$ and equals $\hat{J}_p$ in $[r^d, 1)$ is a convex function (since its derivative is nondecreasing) and lies between $\hat{J}_p$ and $J_p$ everywhere. Therefore it must be equal to $\hat{J}_p$ everywhere. Thus, the convexity of $J_p$ in $(p^d, r^d)$ would imply that $J_p = \hat{J}_p$ in $(p^d, r^d)$. But by Lemma 6.4, $J_p$ cannot be linear in an interval. Therefore the above scenario is impossible; $J_p$ cannot be convex in $(p^d, r^d)$. In particular, there exists a point in this interval where $J_p''$ is strictly negative. By a similar argument, $J_p$ cannot be convex in $(r^d, 1)$, and therefore there exists a point in this interval where $J_p''$ is strictly negative.

Let us now collect our deductions. Under the assumption that $\hat{J}_p$ is linear in a neighborhood of $r^d$, we have argued that $J_p''(r^d) \ge 0$, and there exist $x_1 \in (p^d, r^d)$ and $x_2 \in (r^d, 1)$ such that $J_p''(x_1) < 0$ and $J_p''(x_2) < 0$. We have also observed that $J_p''(x) > 0$ for $x$ sufficiently close to $p^d$ and for $x$ sufficiently close to 1. These deductions jointly imply that $J_p''$ must have at least three zeroes, which is impossible by Lemma 6.4. □

Lemma 6.6 If $(r^d, I_p(r))$ does not lie on the convex minorant of $J_p$, then there exist $0 < r_1 < r < r_2 < 1$ such that $(r^d, I_p(r))$ lies strictly above the line segment joining $(r_1^d, I_p(r_1))$ and $(r_2^d, I_p(r_2))$.

Proof First, note that the function that is identically equal to zero on $(0,1)$ is a convex function that lies below $J_p$, and equals $J_p$ at $p^d$. Thus, $J_p(p^d) = \hat{J}_p(p^d)$. Next, note that by the formula for $J_p''$ from the proof of Lemma 6.4, we know that $J_p$ is convex near 1. Moreover, it is easy to verify that $J_p'(x) \to \infty$ as $x \to 1$. From these two facts and the observation that $\hat{J}_p$ is bounded below on $(0,1)$, it follows easily that if $x$ is sufficiently close to 1, then the tangent line to $J_p$ at $x$ lies entirely below $J_p$ in the interval $(0,1)$. Consequently, for such $x$, $J_p(x) = \hat{J}_p(x)$. To summarize, $J_p = \hat{J}_p$ at $p^d$ and at all $x$ sufficiently close to 1.

Let $r_1$ be the largest number less than $r$ such that $J_p(r_1^d) = \hat{J}_p(r_1^d)$ and $r_2$ be the smallest number bigger than $r$ such that $J_p(r_2^d) = \hat{J}_p(r_2^d)$. Choose $a \in (r_1, r)$ and $b \in (r, r_2)$. Let $h$ be the continuous function that equals $\hat{J}_p$ in $(0, a^d) \cup (b^d, 1)$, and is linear in the interval $[a^d, b^d]$. Since $\hat{J}_p$ is convex, so is $h$. Moreover, $h \ge \hat{J}_p$ everywhere. For each $s \in [0,1]$, let $h_s := sh + (1-s)\hat{J}_p$. Then for each $s$, $h_s$ is a convex function lying between $\hat{J}_p$ and $h$. Since $J_p > \hat{J}_p$ in the compact interval $[a^d, b^d]$, there must exist $\epsilon > 0$ such that $J_p(x) \ge \hat{J}_p(x) + \epsilon$ for all $x$ in this interval. Thus, for small enough positive $s$, $h_s$ lies below $J_p$. Since $\hat{J}_p$ is the convex minorant of $J_p$, this is possible only if $h = \hat{J}_p$ in $[a^d, b^d]$. In other words, $\hat{J}_p$ is linear in this interval. Taking $a \to r_1$ and $b \to r_2$, we see that $\hat{J}_p$ is linear in $(r_1^d, r_2^d)$. This completes the proof of the lemma. □

We are now ready to prove Theorem 6.9.
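The criterion of Theorem 6.9 can be explored numerically by approximating the convex minorant of $J_p$ with the lower convex hull of its values on a grid. The sketch below is an approximation under stated assumptions: the grid size, the tolerance and the function names are arbitrary choices, $I_p$ is normalized as in the proof of Lemma 6.4, and the hull of grid values only approximates $\hat{J}_p$ from above.

```python
import math
from bisect import bisect_right

def rel_entropy(x, p):
    """I_p(x) = x log(x/p) + (1-x) log((1-x)/(1-p)) for 0 < x < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def lower_hull(points):
    """Lower convex hull of points sorted by x (Andrew's monotone chain)."""
    hull = []
    for px, py in points:
        while len(hull) >= 2:
            (ax, ay), (bx, by) = hull[-2], hull[-1]
            # Drop b if it lies on or above the segment from a to the new point.
            if (bx - ax) * (py - ay) - (by - ay) * (px - ax) <= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    return hull

def on_convex_minorant(r, p, d, grid=2000, tol=1e-9):
    """Grid test of whether (r^d, I_p(r)) lies on the convex minorant of J_p."""
    x0 = r ** d
    xs = sorted({i / grid for i in range(1, grid)} | {x0})
    hull = lower_hull([(x, rel_entropy(x ** (1.0 / d), p)) for x in xs])
    hull_x = [x for x, _ in hull]
    j = bisect_right(hull_x, x0)  # hull[j-1] is the last hull vertex with x <= x0
    ax, ay = hull[j - 1]
    if ax == x0:
        minorant = ay
    else:
        bx, by = hull[j]
        minorant = ay + (by - ay) * (x0 - ax) / (bx - ax)
    return rel_entropy(r, p) - minorant <= tol

symmetric = on_convex_minorant(0.5, 0.3, 2)   # J_p is convex for this p
broken = on_convex_minorant(0.5, 0.05, 2)     # J_p is concave near 0.25 for this p
```

With $d = 2$, the point $r = 0.5$ lies on the minorant when $p = 0.3$ but not when $p = 0.05$, consistent with symmetry breaking appearing only for small $p$.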


Proof (Proof of Theorem 6.9) First, suppose that $(r^d, I_p(r))$ lies on the convex minorant of $J_p$, which we denote by $\hat{J}_p$. Recall the generalized Hölder's inequality (Theorem 2.1) proved in Sect. 2.5 of Chap. 2. By that inequality and the regularity of $H$, it follows that for any $f \in \mathcal{W}$,
\[
t(H,f) \le \left( \int_{[0,1]^2} f(x,y)^d\, dx\, dy \right)^{e(H)/d} .
\]
If $t(H,f) \ge t = r^{e(H)}$, then by the above inequality, Jensen's inequality and the assumption that $\hat{J}_p(r^d) = J_p(r^d)$, we get
\[
I_p(f) = \int_{[0,1]^2} J_p(f(x,y)^d)\, dx\, dy \ge \int_{[0,1]^2} \hat{J}_p(f(x,y)^d)\, dx\, dy \ge \hat{J}_p\left( \int_{[0,1]^2} f(x,y)^d\, dx\, dy \right) \ge \hat{J}_p(r^d) = J_p(r^d) = I_p(r). \tag{6.8.1}
\]
Moreover, equality holds if and only if $f = r$ almost everywhere since $\hat{J}_p$ is not linear in any neighborhood of $r^d$ by Lemma 6.5 and $\hat{J}_p$ is strictly increasing in $(p^d, 1)$ (which is easy to prove using the properties of $J_p$). This proves the first part of the theorem.

Next, suppose that $(r^d, I_p(r))$ does not lie on the convex minorant of $J_p$. Then by Lemma 6.6, there exist $0 < r_1 < r < r_2 < 1$ and $s \in (0,1)$ such that $s r_1^d + (1-s) r_2^d = r^d$ and
\[
s I_p(r_1) + (1-s) I_p(r_2) < I_p(r). \tag{6.8.2}
\]
Choose some $\epsilon > 0$ and define
\[
a := s\epsilon^2, \qquad b := (1-s)\epsilon^2 + \epsilon^3 .
\]
Let $\epsilon$ be chosen so small that $0 < a < 1 - b < 1$. Define three intervals $I_0 := (a, 1-b)$, $I_1 := (0,a)$ and $I_2 := (1-b, 1)$. Define a graphon […]

[…] $> \hat{J}_p(r^d)$. As in the proof of Lemma 6.6, let $r_1$ be the largest number less than $r$ such that $J_p(r_1^d) = \hat{J}_p(r_1^d)$ and $r_2$ be the smallest number bigger than $r$ such that $J_p(r_2^d) = \hat{J}_p(r_2^d)$. We saw in the proof of Lemma 6.6 that $\hat{J}_p$ is linear in the interval $[r_1^d, r_2^d]$. Since $J_p$ and $\hat{J}_p$ coincide at $r_1^d$ and $r_2^d$, it is easy to see that $J_p''$ is nonnegative at these two points. Since $\hat{J}_p$ is linear in $(r_1^d, r_2^d)$ and $J_p$ lies strictly above $\hat{J}_p$ in this interval, $J_p''$ must be strictly negative somewhere in this interval. Thus, $J_p''$ has at least two zeroes in $[r_1^d, r_2^d]$. If $J_p$ and $\hat{J}_p$ do not coincide somewhere outside $(r_1^d, r_2^d)$, a repetition of the above argument would imply the existence of more zeroes, going against the assertion of Lemma 6.4. Thus, $[p^d, r_1^d] \cup [r_2^d, 1]$ is the symmetric phase. Lastly, recall that by Theorem 6.7, the region of broken symmetry is nonempty if $p$ is sufficiently small. □

6.10 The Lower Tail and Sidorenko's Conjecture

Let us continue using the notation of Sect. 6.4. The object of interest now is the set $\widetilde{F}^-(H,p,t)$. We have already seen in Theorem 6.6 that if $t$ is sufficiently close to $p^{e(H)}$, then $\widetilde{F}^-(H,p,t)$ consists of a single constant graphon. Theorem 6.8 tells us that if $H$ has chromatic number at least three, then any sufficiently small $t$ must belong to the region of broken symmetry. What about graphs of chromatic number two, that is, bipartite graphs?

There is a famous conjecture about bipartite graphs, known as Sidorenko's conjecture, which claims that for any bipartite graph $H$ and any finite simple graph $G$,
\[
t(H,G) \ge t(K_2,G)^{e(H)},
\]
where $K_2$ is the complete graph on two vertices and $e(H)$ is the number of edges in $H$. Note that $t(K_2,G)$ is simply the edge density of $G$. The conjecture has been verified in many special cases, such as trees, even cycles, hypercubes and bipartite graphs with one vertex complete to the other part (see Sect. 6.11 for references). The following theorem gives a complete solution to the lower tail problem for bipartite graphs conditional on Sidorenko's conjecture.

Theorem 6.11 If $H$ is a bipartite graph that satisfies Sidorenko's conjecture, then the lower tail does not have a region of broken symmetry. In particular, $\phi^-(H,p,t) = I_p(t^{1/e(H)})$ for all $t \in (0, p^{e(H)})$.

Proof Suppose that $H$ satisfies Sidorenko's conjecture. Take any $t \in (0, p^{e(H)})$ and let $r = t^{1/e(H)}$. Take any $f$ such that $\tilde{f} \in \widetilde{F}^-(H,p,t)$. By Proposition 3.1 and the Sidorenko property of $H$,
\[
\int_{[0,1]^2} f(x,y)\, dx\, dy = t(K_2,f) \le t(H,f)^{1/e(H)} \le r.
\]
Since $I_p$ is convex and decreasing in $[0,p]$, the above inequality and Jensen's inequality imply that
\[
I_p(f) = \int_{[0,1]^2} I_p(f(x,y))\, dx\, dy \ge I_p\left( \int_{[0,1]^2} f(x,y)\, dx\, dy \right) \ge I_p(r). \tag{6.10.1}
\]
Moreover, since $I_p$ is nonlinear everywhere, equality holds if and only if $f = r$ almost everywhere. □
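Sidorenko's inequality is easy to test on small examples when $H$ is the 4-cycle, one of the cases stated above to be known. The sketch below is an illustration only and its function names are hypothetical. It uses the standard identity that the number of homomorphisms from the 4-cycle into $G$ is the trace of the fourth power of the adjacency matrix, so that $t(C_4,G) = \mathrm{tr}(A^4)/n^4$ and $t(K_2,G) = \sum_{i,j} A_{ij}/n^2$.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def hom_density_c4(adj):
    """t(C_4, G) = trace(A^4) / n^4 for a symmetric 0/1 adjacency matrix A."""
    n = len(adj)
    a2 = mat_mul(adj, adj)
    # For symmetric A, trace(A^4) is the sum of squared entries of A^2.
    return sum(a2[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 4

def edge_density(adj):
    """t(K_2, G): the fraction of ordered vertex pairs that are adjacent."""
    n = len(adj)
    return sum(map(sum, adj)) / n ** 2

# Path on four vertices 0 - 1 - 2 - 3.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
c4_density = hom_density_c4(path)       # 14 / 256
sidorenko_bound = edge_density(path) ** 4   # (6 / 16) ** 4
```

For the path, the 4-cycle density $14/256$ comfortably exceeds the fourth power of the edge density, as the inequality requires.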

6.11 Large Deviations for the Largest Eigenvalue

Let $G_{n,p}$ be a random graph from the Erdős–Rényi $G(n,p)$ model. Let $\lambda_{n,p}$ be the largest eigenvalue of the adjacency matrix of $G_{n,p}$. We have seen in Exercise 6.2 that $\lambda_{n,p}$ equals $n$ times the operator norm of the graphon of $G_{n,p}$. Lemma 6.2 tells us that the operator norm is a nice graph parameter. Therefore the large deviation rate function for the largest eigenvalue and the conditional behavior under large deviations can be obtained by straightforward applications of Theorems 6.1 and 6.2. Explicitly, the result would be that for any $t \in [0,1]$ (since $[0,1]$ is the range of the operator norm),
\[
\lim_{n \to \infty} \frac{2}{n^2} \log P(\lambda_{n,p} \ge tn) = -\psi^+(p,t)
\]
and
\[
\lim_{n \to \infty} \frac{2}{n^2} \log P(\lambda_{n,p} \le tn) = -\psi^-(p,t),
\]
where
\[
\psi^+(p,t) := \inf\{ I_p(f) : f \in \mathcal{W},\ \|K_f\| \ge t \}
\]
and
\[
\psi^-(p,t) := \inf\{ I_p(f) : f \in \mathcal{W},\ \|K_f\| \le t \}.
\]

As for subgraph densities, it is interesting to understand more about the variational problems defining $\psi^+$ and $\psi^-$. In particular, one can define the symmetric phase and the region of broken symmetry as before. Note that in this setting, the unique constant optimizer in the symmetric phase is the function that is identically equal to $t$. If $t \in (p,1)$ belongs to the symmetric phase, then $\psi^+(p,t) = I_p(t)$, and if $t \in (0,p)$ belongs to the symmetric phase, then $\psi^-(p,t) = I_p(t)$. Curiously, the phase boundary for the largest eigenvalue turns out to be exactly the same as that of the square root of the homomorphism density of a bipartite graph of degree two (for example, any even cycle). This is the content of the following theorem.

Theorem 6.12 Take any $p \in (0,1)$ and $t \in (p,1)$. If the point $(t^2, I_p(t))$ lies on the convex minorant of the function $J_p(x) := I_p(x^{1/2})$, then $t$ belongs to the symmetric phase of the upper tail in the largest eigenvalue problem. On the other hand, if $(t^2, I_p(t))$ does not lie on the convex minorant of $J_p$ then $t$ belongs to the region of broken symmetry for the upper tail. Lastly, any $t \in (0,p)$ belongs to the symmetric phase of the lower tail.

Proof Let $\hat{J}_p$ denote the convex minorant of $J_p$. Take any $t \in (p,1)$. Suppose that $(t^2, I_p(t))$ lies on the convex minorant of $J_p$. Take any $f$ such that $\|K_f\| \ge t$. Recall the inequality (3.4.2) from Chap. 3, which says that $\|f\| \ge \|K_f\|$. Therefore, applying the inequality (6.8.1) with $d = 2$ and proceeding as in the proof of Theorem 6.9, we get
\[
I_p(f) \ge \hat{J}_p(\|f\|^2) \ge \hat{J}_p(t^2) = J_p(t^2) = I_p(t),
\]
with equality if and only if $f = t$ almost everywhere. This proves the first part of the theorem.

Next, suppose that $(t^2, I_p(t))$ does not lie on the convex minorant of $J_p$. For each $\epsilon > 0$, construct $f$ as in the proof of Theorem 6.9, with $d = 2$, and $r$, $r_1$ and $r_2$ replaced by $t$, $t_1$ and $t_2$. Then, as we have seen, $I_p(f) < I_p(t)$ for sufficiently small $\epsilon$. Let $a$, $b$, $I_0$, $I_1$ and $I_2$ be as in the proof of Theorem 6.9. Define a function $u : [0,1] \to \mathbb{R}$ as […]

[…] $> (1 - a - b)\, t_1 t = t\, u(x)$. Similarly, for $x \in I_2$,
\[
K_f u(x) > (1 - a - b)\, t_2 t = t\, u(x).
\]
Finally, if $x \in I_0$, then
\[
K_f u(x) = a(1 - a - b) t_1^2 + b(1 - a - b) t_2^2 + (1 - a - b) t^2 = (1 - a - b)(t^2 + a t_1^2 + b t_2^2).
\]
Plugging in the values of $a$ and $b$ and the relation $t^2 = s t_1^2 + (1-s) t_2^2$, this gives
\[
K_f u(x) = t^2 + (t_2^2 - t^2)\epsilon^3 + O(\epsilon^4)
\]
as $\epsilon \to 0$, proving that when $\epsilon$ is sufficiently small, $K_f u(x) > t\, u(x)$ for all $x \in I_0$. Thus, for $\epsilon$ sufficiently small, $K_f u(x) > t\, u(x)$ for almost all $x \in [0,1]$. Since $u$ and $K_f$ are nonnegative functions, this implies that $\|K_f u\| > t\|u\|$, and therefore $\|K_f\| > t$. This proves that $t$ belongs to the region of broken symmetry.

Next, take any $t \in (0,p)$. Take any $f$ such that $\|K_f\| \le t$. Let $1$ denote the graphon that is identically equal to 1. Then
\[
\int_{[0,1]^2} f(x,y)\, dx\, dy = (1, K_f 1) \le \|1\|\, \|K_f 1\| \le \|K_f\| \le t.
\]
Applying inequality (6.10.1), this gives $I_p(f) \ge I_p(t)$, with equality if and only if $f = t$ almost everywhere, completing the proof of the theorem. □
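The operator norm $\|K_f\|$ that appears throughout this section can be approximated for step-function graphons by power iteration. The sketch below makes the following assumptions: $f$ is constant on an $m \times m$ grid of equal blocks, so that $K_f$ acts on block-constant functions as the value matrix divided by $m$; the iteration starts from the all-ones vector; and the function name is hypothetical. For the graphon of a graph on $n$ vertices, $n$ times the result is the largest adjacency eigenvalue, in line with the statement recalled from Exercise 6.2.

```python
import math

def operator_norm_step(values, iters=200):
    """Power iteration for ||K_f|| when f is an m x m symmetric nonnegative step function."""
    m = len(values)
    u = [1.0] * m
    lam = 0.0
    for _ in range(iters):
        # (K_f u)(x) = integral of f(x, y) u(y) dy, which is a block average here.
        v = [sum(values[i][j] * u[j] for j in range(m)) / m for i in range(m)]
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_v == 0.0:
            return 0.0
        lam = norm_v / math.sqrt(sum(x * x for x in u))
        u = [x / norm_v for x in v]
    return lam

# The constant graphon t has operator norm t.
const_norm = operator_norm_step([[0.4] * 5 for _ in range(5)])

# Complete graph on three vertices: largest adjacency eigenvalue is 2.
k3 = [[0, 1, 1],
      [1, 0, 1],
      [1, 1, 0]]
k3_lambda = 3 * operator_norm_step(k3)
```

The same routine can be applied to a three-block graphon of the kind used in the proof to observe $\|K_f\|$ exceeding $t$ numerically.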


Bibliographical Notes

The replica symmetric regime for the upper tail of the density of triangles was partially identified in Chatterjee and Dey [1] using techniques based on Stein's method for concentration inequalities. The variational form of the large deviation rate function for triangle density was computed in Chatterjee and Varadhan [2], as an application of the general theory developed in that paper. The double phase transition in the upper tail for triangle density was established in Chatterjee and Varadhan [2], although the identification of the exact phase boundary was left as an open problem. The problem was finally solved by Lubetzky and Zhao [6], in a paper that contains most of the important results presented in this chapter, including Theorem 6.9 (the phase boundary theorem), Theorem 6.10 (double phase transition), Theorem 6.11 (lower tail for Sidorenko graphs) and Theorem 6.12 (large deviations for the largest eigenvalue). The notion of a nice graph parameter was also introduced in Lubetzky and Zhao [6]. A more general investigation of large deviations for random matrices was undertaken in Chatterjee and Varadhan [3].

The Euler–Lagrange equations (Theorem 6.5), the general symmetric phase theorem (Theorem 6.6) and the general symmetry breaking theorem (Theorem 6.7) are new contributions of this monograph.

Sidorenko's conjecture was posed by Erdős and Simonovits in Simonovits [8] and in a slightly more general form by Sidorenko [7]. The conjecture has been proved for trees and even cycles in Sidorenko [7], for hypercubes in Hatami [5] and for bipartite graphs with one vertex complete to the other part in Conlon et al. [4].

One topic that was investigated in Chatterjee and Varadhan [2] but is omitted in this monograph is the behavior of the rate functions when p is sent to zero. The behavior is predictable but the analysis is quite technical, which is the main reason for the omission.

References

1. Chatterjee, S., & Dey, P. S. (2010). Applications of Stein's method for concentration inequalities. Annals of Probability, 38, 2443–2485.
2. Chatterjee, S., & Varadhan, S. R. S. (2011). The large deviation principle for the Erdős–Rényi random graph. European Journal of Combinatorics, 32(7), 1000–1017.
3. Chatterjee, S., & Varadhan, S. R. S. (2012). Large deviations for random matrices. Communications on Stochastic Analysis, 6(1), 1–13.
4. Conlon, D., Fox, J., & Sudakov, B. (2010). An approximate version of Sidorenko's conjecture. Geometric and Functional Analysis, 20(6), 1354–1366.
5. Hatami, H. (2010). Graph norms and Sidorenko's conjecture. Israel Journal of Mathematics, 175, 125–150.
6. Lubetzky, E., & Zhao, Y. (2015). On replica symmetry of large deviations in random graphs. Random Structures & Algorithms, 47(1), 109–146.
7. Sidorenko, A. (1993). A correlation inequality for bipartite graphs. Graphs and Combinatorics, 9(2), 201–204.
8. Simonovits, M. (1984). Extremal graph problems, degenerate extremal problems, and supersaturated graphs. In Progress in graph theory (Waterloo, Ontario, 1982) (pp. 419–437). Toronto, ON: Academic Press.

Chapter 7

Exponential Random Graph Models

Let $\mathcal{G}_n$ be the space of all simple graphs on $n$ labeled vertices. A variety of probability models on this space can be presented in exponential form
\[
p_\beta(G) = \exp\left( \sum_{i=1}^k \beta_i T_i(G) - \psi(\beta) \right),
\]
where $\beta = (\beta_1, \ldots, \beta_k)$ is a vector of real parameters, $T_1, T_2, \ldots, T_k$ are real-valued functions on $\mathcal{G}_n$, and $\psi(\beta)$ is the normalizing constant. Usually, $T_i$ are taken to be counts of various subgraphs, for example $T_1(G) =$ number of edges in $G$, $T_2(G) =$ number of triangles in $G$, etc. These are known as exponential random graph models (ERGM). Our goal in this chapter will be to understand the behavior of random graphs drawn from this class of models, and to calculate the asymptotic values of the normalizing constants. We will continue to use all the notations and terminologies introduced in the preceding chapters.

7.1 Formal Definition Using Graphons

Fix $n$ and let $\mathcal{G}_n$ denote the set of simple graphs on the vertex set $\{1, \ldots, n\}$, as above. For any $G \in \mathcal{G}_n$, recall the definition of the graphon $f^G$ from Chap. 3. Let $\tilde{f}^G$ be the image of $f^G$ in the quotient space $\widetilde{\mathcal{W}}$. For simplicity, we will write $\tilde{G}$ instead of $\tilde{f}^G$. Let $T$ be a graph parameter, that is, a real-valued continuous function on the space $\widetilde{\mathcal{W}}$. Since $\widetilde{\mathcal{W}}$ is a compact space, $T$ is automatically a bounded function. Define the probability mass function $p_n$ on $\mathcal{G}_n$ induced by $T$ as
$$p_n(G) := e^{n^2 (T(\tilde{G}) - \psi_n)},$$

© Springer International Publishing AG 2017 S. Chatterjee, Large Deviations for Random Graphs, Lecture Notes in Mathematics 2197, DOI 10.1007/978-3-319-65816-2_7

where $\psi_n$ is a constant such that the total mass of $p_n$ is 1. Explicitly,
$$\psi_n = \frac{1}{n^2} \log \sum_{G \in \mathcal{G}_n} e^{n^2 T(\tilde{G})}. \tag{7.1.1}$$
The coefficient $n^2$ is meant to ensure that $\psi_n$ tends to a non-trivial limit as $n \to \infty$. Note that $T$ does not vary with $n$.

7.2 Normalizing Constant

Define a function $I : [0,1] \to \mathbb{R}$ as
$$I(u) := u \log u + (1-u) \log(1-u) \tag{7.2.1}$$
and extend $I$ to $\widetilde{\mathcal{W}}$ in the usual manner:
$$I(\tilde{h}) = \int_{[0,1]^2} I(h(x,y)) \, dx \, dy, \tag{7.2.2}$$
where $h$ is a representative element of the equivalence class $\tilde{h}$. It follows from Proposition 5.1 of Chap. 5 (taking $p = 1/2$) that $I$ is well-defined and lower semi-continuous on $\widetilde{\mathcal{W}}$. The following theorem gives the asymptotic value of the normalizing constant in any exponential random graph model as a variational problem involving the functional $I$ defined above.

Theorem 7.1 Let $T$ be a graph parameter and $\psi_n$ be the normalizing constant of the exponential random graph model induced by $T$ on the set of simple graphs on $n$ vertices, as defined in Eq. (7.1.1). Let $I$ be the function defined above. Then
$$\lim_{n \to \infty} \psi_n = \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr).$$

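Proof For each Borel set $\tilde{A} \subseteq \widetilde{\mathcal{W}}$ and each $n$, define
$$\tilde{A}_n := \{\tilde{h} \in \tilde{A} : \tilde{h} = \tilde{G} \text{ for some } G \in \mathcal{G}_n\}.$$
Let $\tilde{\mathbb{P}}_{n,p}$ be the Erdős–Rényi measure on $\widetilde{\mathcal{W}}$, as defined in Chap. 5. Note that $\tilde{A}_n$ is a finite set and
$$|\tilde{A}_n| = 2^{n(n-1)/2} \, \tilde{\mathbb{P}}_{n,1/2}(\tilde{A}_n) = 2^{n(n-1)/2} \, \tilde{\mathbb{P}}_{n,1/2}(\tilde{A}).$$
Thus, if $\tilde{F}$ is a closed subset of $\widetilde{\mathcal{W}}$, then by Theorem 5.2,
$$\limsup_{n \to \infty} \frac{\log |\tilde{F}_n|}{n^2} \le \frac{\log 2}{2} - \frac{1}{2} \inf_{\tilde{h} \in \tilde{F}} I_{1/2}(\tilde{h}) = -\frac{1}{2} \inf_{\tilde{h} \in \tilde{F}} I(\tilde{h}). \tag{7.2.3}$$
Similarly, if $\tilde{U}$ is an open subset of $\widetilde{\mathcal{W}}$,
$$\liminf_{n \to \infty} \frac{\log |\tilde{U}_n|}{n^2} \ge -\frac{1}{2} \inf_{\tilde{h} \in \tilde{U}} I(\tilde{h}). \tag{7.2.4}$$
Fix $\epsilon > 0$. Since $T$ is a bounded function, there is a finite set $R$ such that the intervals $\{(a, a+\epsilon) : a \in R\}$ cover the range of $T$. For each $a \in R$, let $\tilde{F}^a := T^{-1}([a, a+\epsilon])$. By the continuity of $T$, each $\tilde{F}^a$ is closed. Now,
$$e^{n^2 \psi_n} \le \sum_{a \in R} e^{n^2 (a+\epsilon)} |\tilde{F}^a_n| \le |R| \sup_{a \in R} e^{n^2 (a+\epsilon)} |\tilde{F}^a_n|.$$
By (7.2.3), this shows that
$$\limsup_{n \to \infty} \psi_n \le \sup_{a \in R} \Bigl( a + \epsilon - \frac{1}{2} \inf_{\tilde{h} \in \tilde{F}^a} I(\tilde{h}) \Bigr).$$
Each $\tilde{h} \in \tilde{F}^a$ satisfies $T(\tilde{h}) \ge a$. Consequently,
$$\sup_{\tilde{h} \in \tilde{F}^a} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr) \ge \sup_{\tilde{h} \in \tilde{F}^a} \Bigl( a - \frac{1}{2} I(\tilde{h}) \Bigr) = a - \frac{1}{2} \inf_{\tilde{h} \in \tilde{F}^a} I(\tilde{h}).$$
Substituting this in the earlier display gives
$$\limsup_{n \to \infty} \psi_n \le \epsilon + \sup_{a \in R} \sup_{\tilde{h} \in \tilde{F}^a} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr) = \epsilon + \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr). \tag{7.2.5}$$
For each $a \in R$, let $\tilde{U}^a := T^{-1}((a, a+\epsilon))$. By the continuity of $T$, $\tilde{U}^a$ is an open set. Note that
$$e^{n^2 \psi_n} \ge \sup_{a \in R} e^{n^2 a} |\tilde{U}^a_n|.$$
Therefore by (7.2.4), for each $a \in R$,
$$\liminf_{n \to \infty} \psi_n \ge a - \frac{1}{2} \inf_{\tilde{h} \in \tilde{U}^a} I(\tilde{h}).$$
Each $\tilde{h} \in \tilde{U}^a$ satisfies $T(\tilde{h}) < a + \epsilon$. Therefore,
$$\sup_{\tilde{h} \in \tilde{U}^a} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr) \le \sup_{\tilde{h} \in \tilde{U}^a} \Bigl( a + \epsilon - \frac{1}{2} I(\tilde{h}) \Bigr) = a + \epsilon - \frac{1}{2} \inf_{\tilde{h} \in \tilde{U}^a} I(\tilde{h}).$$
Together with the previous display, this shows that
$$\liminf_{n \to \infty} \psi_n \ge -\epsilon + \sup_{a \in R} \sup_{\tilde{h} \in \tilde{U}^a} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr) = -\epsilon + \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr). \tag{7.2.6}$$
Since $\epsilon$ is arbitrary in (7.2.5) and (7.2.6), this completes the proof. □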
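As a sanity check on Theorem 7.1, consider the simplest case, where both sides can be computed by hand. Assume $T(\tilde{h}) = \beta \, t(H_1, \tilde{h})$, where $H_1$ is a single edge, $t$ is the homomorphism density from Chap. 3, and $\beta$ is a real parameter introduced here only for this illustration. For $G \in \mathcal{G}_n$ with $e(G)$ edges, $n^2 T(\tilde{G}) = 2\beta e(G)$, so the sum in (7.1.1) factorizes over the $\binom{n}{2}$ possible edges:
$$\psi_n = \frac{1}{n^2} \log \sum_{G \in \mathcal{G}_n} e^{2\beta e(G)} = \frac{1}{n^2} \binom{n}{2} \log(1 + e^{2\beta}) \longrightarrow \frac{1}{2} \log(1 + e^{2\beta}).$$
On the variational side, $T(h) - \frac{1}{2} I(h) = \int_{[0,1]^2} \bigl( \beta h(x,y) - \frac{1}{2} I(h(x,y)) \bigr) \, dx \, dy$, so the supremum is attained by maximizing the integrand pointwise. Setting the derivative of $u \mapsto \beta u - \frac{1}{2} I(u)$ to zero gives
$$\beta = \frac{1}{2} \log \frac{u^*}{1 - u^*}, \qquad u^* = \frac{e^{2\beta}}{1 + e^{2\beta}},$$
and, using $I(u^*) = 2\beta u^* + \log(1 - u^*)$,
$$\beta u^* - \frac{1}{2} I(u^*) = -\frac{1}{2} \log(1 - u^*) = \frac{1}{2} \log(1 + e^{2\beta}),$$
which agrees with the limit of $\psi_n$ computed directly.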

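7.3 Asymptotic Structure

Theorem 7.1 gives an asymptotic formula for $\psi_n$. However, it says nothing about the behavior of a random graph drawn from the exponential random graph model. Some aspects of this behavior can be described as follows. Let $\tilde{F}^*$ be the subset of $\widetilde{\mathcal{W}}$ where $T(\tilde{h}) - \frac{1}{2} I(\tilde{h})$ is maximized. By the compactness of $\widetilde{\mathcal{W}}$, the continuity of $T$ and the lower semi-continuity of $I$, $\tilde{F}^*$ is a non-empty compact set. Let $G_n$ be a random graph on $n$ vertices drawn from the exponential random graph model defined by $T$. The following theorem shows that for $n$ large, $\tilde{G}_n$ must lie close to $\tilde{F}^*$ with high probability. In particular, if $\tilde{F}^*$ is a singleton set, then the theorem gives a weak law of large numbers for $G_n$.

Theorem 7.2 Let $\tilde{F}^*$ and $G_n$ be defined as in the above paragraph. Let $\mathbb{P}$ denote the probability measure on the underlying probability space on which $G_n$ is defined. Then for any $\eta > 0$ there exist $C, \gamma > 0$ such that for any $n$,
$$\mathbb{P}(\delta_\square(\tilde{G}_n, \tilde{F}^*) > \eta) \le C e^{-n^2 \gamma}.$$

Proof Take any $\eta > 0$. Let
$$\tilde{A} := \{\tilde{h} : \delta_\square(\tilde{h}, \tilde{F}^*) \ge \eta\}.$$
It is easy to see that $\tilde{A}$ is a closed set. By compactness of $\widetilde{\mathcal{W}}$ and $\tilde{F}^*$, and upper semi-continuity of $T - \frac{1}{2} I$, it follows that
$$2\gamma := \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr) - \sup_{\tilde{h} \in \tilde{A}} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr) > 0.$$
Choose $\epsilon = \gamma$ and define $\tilde{F}^a$ and $R$ as in the proof of Theorem 7.1. Let $\tilde{A}^a := \tilde{A} \cap \tilde{F}^a$. Then
$$\mathbb{P}(G_n \in \tilde{A}) \le e^{-n^2 \psi_n} \sum_{a \in R} e^{n^2 (a+\epsilon)} |\tilde{A}^a_n| \le e^{-n^2 \psi_n} |R| \sup_{a \in R} e^{n^2 (a+\epsilon)} |\tilde{A}^a_n|.$$
While bounding the last term above, it can be assumed without loss of generality that $\tilde{A}^a$ is non-empty for each $a \in R$, for the other $a$’s can be dropped without upsetting the bound. By (7.2.3) and Theorem 7.1 (noting that $\tilde{A}^a$ is compact), the above display gives
$$\limsup_{n \to \infty} \frac{\log \mathbb{P}(G_n \in \tilde{A})}{n^2} \le \sup_{a \in R} \Bigl( a + \epsilon - \frac{1}{2} \inf_{\tilde{h} \in \tilde{A}^a} I(\tilde{h}) \Bigr) - \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr).$$
Each $\tilde{h} \in \tilde{A}^a$ satisfies $T(\tilde{h}) \ge a$. Consequently,
$$\sup_{\tilde{h} \in \tilde{A}^a} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr) \ge \sup_{\tilde{h} \in \tilde{A}^a} \Bigl( a - \frac{1}{2} I(\tilde{h}) \Bigr) = a - \frac{1}{2} \inf_{\tilde{h} \in \tilde{A}^a} I(\tilde{h}).$$
Substituting this in the earlier display gives
$$\begin{aligned} \limsup_{n \to \infty} \frac{\log \mathbb{P}(G_n \in \tilde{A})}{n^2} &\le \epsilon + \sup_{a \in R} \sup_{\tilde{h} \in \tilde{A}^a} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr) - \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr) \\ &= \epsilon + \sup_{\tilde{h} \in \tilde{A}} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr) - \sup_{\tilde{h} \in \widetilde{\mathcal{W}}} \Bigl( T(\tilde{h}) - \frac{1}{2} I(\tilde{h}) \Bigr) = \epsilon - 2\gamma = -\gamma. \end{aligned} \tag{7.3.1}$$
This completes the proof. □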


7.4 An Explicitly Solvable Case

Let $H_1, \ldots, H_k$ be finite simple graphs, where $H_1$ is the complete graph on two vertices (that is, just a single edge), and each $H_i$ contains at least one edge. Let $\beta_1, \ldots, \beta_k$ be $k$ real numbers. For any $h \in \mathcal{W}$, let
$$T(h) := \sum_{i=1}^k \beta_i \, t(H_i, h) \tag{7.4.1}$$
where $t(H_i, h)$ is the homomorphism density of $H_i$ in $h$, defined in Chap. 3. By Proposition 3.2, any such $T$ is a graph parameter. For any finite simple graph $G$ that has at least as many nodes as the largest of the $H_i$’s,
$$T(\tilde{G}) = \sum_{i=1}^k \beta_i \, t(H_i, G),$$
where $t(H_i, G)$ is the homomorphism density of $H_i$ in $G$. For example, if $k = 2$, $H_2$ is a triangle and $G$ has $n \ge 3$ nodes, then
$$T(\tilde{G}) = 2\beta_1 \frac{\text{number of edges in } G}{n^2} + 6\beta_2 \frac{\text{number of triangles in } G}{n^3}. \tag{7.4.2}$$
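The identity (7.4.2) can be checked on a small graph by counting homomorphisms directly. The following is a minimal sketch, assuming the standard definition of $t(H, G)$ as the fraction of maps $V(H) \to V(G)$ that send every edge of $H$ to an edge of $G$; the helper names `adjacency` and `hom_density` are hypothetical and introduced only for this illustration.

```python
import itertools


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of a simple graph on vertices 0..n-1."""
    adj = [[0] * n for _ in range(n)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1
    return adj


def hom_density(h_edges, k, adj):
    """t(H, G): fraction of maps from the k vertices of H into G that preserve all edges of H."""
    n = len(adj)
    hits = sum(
        all(adj[phi[r]][phi[s]] for r, s in h_edges)
        for phi in itertools.product(range(n), repeat=k)
    )
    return hits / n**k


if __name__ == "__main__":
    # A 4-cycle with one chord: 5 edges and 2 triangles on n = 4 vertices.
    adj = adjacency(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
    edge_density = hom_density([(0, 1)], 2, adj)
    triangle_density = hom_density([(0, 1), (1, 2), (0, 2)], 3, adj)
    print(edge_density, 2 * 5 / 4**2)      # both sides of the edge term in (7.4.2)
    print(triangle_density, 6 * 2 / 4**3)  # both sides of the triangle term in (7.4.2)
```

The factors 2 and 6 in (7.4.2) arise because each edge is the image of 2 ordered pairs and each triangle is the image of 6 ordered triples.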

When $T$ is of the form (7.4.1) and $\beta_2, \ldots, \beta_k$ are nonnegative, the following theorem says that the variational problem of Theorem 7.1 can be reduced to a simple maximization problem in one real variable. The theorem moreover says that each solution of the variational problem is a constant function, and there are only a finite number of solutions. By Theorem 7.2, this implies that when $\beta_2, \ldots, \beta_k$ are nonnegative, exponential random graphs from this class of models behave like random graphs drawn from a finite mixture of Erdős–Rényi models.

Theorem 7.3 Let $H_1, \ldots, H_k$ and $T$ be as above. Suppose that the parameters $\beta_2, \ldots, \beta_k$ are nonnegative. Let $\psi_n$ be the normalizing constant of the exponential random graph model induced by $T$ on the set of simple graphs on $n$ vertices, as defined in Eq. (7.1.1). Then
$$\lim_{n \to \infty} \psi_n = \sup_{0 \le u \le 1} \Bigl( \sum_{i=1}^k \beta_i u^{e(H_i)} - \frac{1}{2} I(u) \Bigr) \tag{7.4.3}$$
where $I$ is the function defined in (7.2.1) and $e(H_i)$ is the number of edges in $H_i$. Moreover, there are only a finite number of solutions of the variational problem of Theorem 7.1 for this $T$, and each solution is a constant function, where the constant solves the scalar maximization problem (7.4.3).
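Because (7.4.3) is a maximization over a single variable $u \in [0,1]$, it is easy to explore numerically. The sketch below uses a plain grid search, chosen for transparency rather than accuracy; the names `entropy_I`, `scalar_objective` and `scalar_sup` are hypothetical helpers introduced here, and the grid size is an arbitrary assumption.

```python
import math


def entropy_I(u):
    """I(u) = u log u + (1 - u) log(1 - u), with the convention 0 log 0 = 0."""
    return sum(t * math.log(t) for t in (u, 1.0 - u) if t > 0.0)


def scalar_objective(u, betas, edge_counts):
    """sum_i beta_i * u^{e(H_i)} - I(u)/2, the function maximized in (7.4.3)."""
    return sum(b * u**e for b, e in zip(betas, edge_counts)) - 0.5 * entropy_I(u)


def scalar_sup(betas, edge_counts, grid=20001):
    """Approximate maximizer and maximum of the scalar objective on a uniform grid of [0, 1]."""
    us = [i / (grid - 1) for i in range(grid)]
    best_u = max(us, key=lambda u: scalar_objective(u, betas, edge_counts))
    return best_u, scalar_objective(best_u, betas, edge_counts)


if __name__ == "__main__":
    # Edge-triangle model: H_1 is an edge (1 edge), H_2 is a triangle (3 edges).
    print(scalar_sup([0.3, 0.2], [1, 3]))
    # Edge-only model, where the maximum is (1/2) log(1 + e^{2 beta_1}).
    print(scalar_sup([0.3], [1]), 0.5 * math.log(1 + math.exp(0.6)))
```

A grid search only locates maximizers up to the grid spacing; when two local maxima have nearly equal values, a finer grid or a root-finder applied to the derivative would be needed to decide between them.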

Proof By Theorem 7.1,
$$\lim_{n \to \infty} \psi_n = \sup_{h \in \mathcal{W}} \Bigl( T(h) - \frac{1}{2} I(h) \Bigr). \tag{7.4.4}$$
By Hölder’s inequality,
$$t(H_i, h) \le \int_{[0,1]^2} h(x,y)^{e(H_i)} \, dx \, dy.$$
Thus, by the nonnegativity of $\beta_2, \ldots, \beta_k$,
$$T(h) \le \beta_1 t(H_1, h) + \sum_{i=2}^k \beta_i \int_{[0,1]^2} h(x,y)^{e(H_i)} \, dx \, dy = \int_{[0,1]^2} \sum_{i=1}^k \beta_i h(x,y)^{e(H_i)} \, dx \, dy.$$
On the other hand, the inequality in the above display becomes an equality if $h$ is a constant function. Therefore, if $u$ is a point in $[0,1]$ that maximizes
$$\sum_{i=1}^k \beta_i u^{e(H_i)} - \frac{1}{2} I(u),$$
then the constant function $h(x,y) \equiv u$ solves the variational problem (7.4.4).

To see that constant functions are the only solutions, assume that there is at least one $i$ such that the graph $H_i$ has at least one vertex with two or more neighbors. The above steps show that if $h$ is a maximizer, then for each $i$,
$$t(H_i, h) = \int_{[0,1]^2} h(x,y)^{e(H_i)} \, dx \, dy. \tag{7.4.5}$$
In other words, equality holds in Hölder’s inequality. Suppose that $H_i$ has vertex set $\{1, 2, \ldots, k\}$ and vertices 2 and 3 are both neighbors of 1 in $H_i$. Recall that
$$t(H_i, h) = \int_{[0,1]^k} \prod_{\{j,l\} \in E(H_i)} h(x_j, x_l) \, dx_1 \cdots dx_k.$$
In particular, the integrand contains the product $h(x_1, x_2) h(x_1, x_3)$. From this and the criterion for equality in Hölder’s inequality, it follows that $h(x_1, x_2)$ is a constant multiple of $h(x_1, x_3)$ for almost every $(x_1, x_2, x_3) \in [0,1]^3$. Using the symmetry of $h$ one can now easily conclude that $h$ is almost everywhere a constant function.

If the condition does not hold, then each $H_i$ is a union of vertex-disjoint edges. Assume that some $H_i$ has more than one edge. Then again by (7.4.5) it follows that $h$ must be a constant function. Finally, if each $H_i$ is just a single edge, then the maximization problem (7.4.4) can be explicitly solved and the solutions are all constant functions.

Lastly, note that since the second derivative of the map
$$u \mapsto \sum_{i=1}^k \beta_i u^{e(H_i)} - \frac{1}{2} I(u)$$
is a rational function of $u$, Rolle’s theorem implies that the set of maximizers is a finite set. □

7.5 Another Solvable Example

A $j$-star is an undirected graph with one root vertex and $j$ other vertices connected to the root vertex, with no edges between any of these $j$ vertices. Let $H_j$ be a $j$-star for $j = 1, \ldots, k$. Let $T$ be the graph parameter defined in (7.4.1) with these $H_1, \ldots, H_k$. It turns out that the exponential random graph model for this $T$ can be explicitly solved for any $\beta_1, \ldots, \beta_k$.

Theorem 7.4 For the sufficient statistic $T$ defined above, the conclusions of Theorem 7.3 hold for any $\beta_1, \ldots, \beta_k \in \mathbb{R}$.

Proof Take any $h \in \mathcal{W}$. Since a $j$-star has $j+1$ vertices, with $x_1$ corresponding to the root,
$$t(H_j, h) = \int_{[0,1]^{j+1}} h(x_1, x_2) h(x_1, x_3) \cdots h(x_1, x_{j+1}) \, dx_1 \cdots dx_{j+1} = \int_0^1 M(x)^j \, dx$$
where
$$M(x) = \int_0^1 h(x,y) \, dy.$$
Since $I$ is a convex function, Jensen’s inequality gives
$$\int_0^1 I(h(x,y)) \, dy \ge I(M(x)),$$
with equality if and only if $h(x,y)$ is the same for almost all $y$. Thus, putting
$$P(u) := \sum_{j=1}^k \beta_j u^j,$$
we get
$$T(h) - \frac{1}{2} I(h) = \int_0^1 P(M(x)) \, dx - \frac{1}{2} I(h) \le \int_0^1 \Bigl( P(M(x)) - \frac{1}{2} I(M(x)) \Bigr) dx.$$
The right-hand side is in turn at most $\sup_{0 \le u \le 1} \bigl( P(u) - \frac{1}{2} I(u) \bigr)$. Equality holds throughout if and only if for almost all $x$, (a) $h(x,y)$ is constant as a function of $y$, and (b) $M(x)$ equals a value $u^*$ that maximizes $P(u) - \frac{1}{2} I(u)$. By the symmetry of $h$, the condition (a) implies that $h$ is constant almost everywhere. The condition (b) gives the set of possible values of this constant. The rest follows as in the proof of Theorem 7.3. □

7.6 Phase Transition in the Edge-Triangle Model

The computations of Sect. 7.4 imply the existence of phase transitions in exponential random graph models. In this section we will demonstrate this through one example. Consider the exponential random graph model corresponding to the graph parameter $T$ defined in (7.4.2). This is sometimes called the ‘edge-triangle model’. Let $G_n$ be a random graph on $n$ vertices drawn from this model. Fix $\beta_1$ and $\beta_2$ and let
$$\ell(u) := \beta_1 u + \beta_2 u^3 - \frac{1}{2} I(u) \tag{7.6.1}$$
where $I$ is the function defined in (7.2.1). Let $U$ be the set of maximizers of $\ell(u)$ in $[0,1]$. Theorem 7.3 describes the limiting behavior of $G_n$ in terms of the set $U$. In particular, if $U$ consists of a single point $u^* = u^*(\beta_1, \beta_2)$, then $G_n$ behaves like the Erdős–Rényi graph $G(n, u^*)$ when $n$ is large. It is unlikely that $u^*(\beta_1, \beta_2)$ has a closed form expression, other than when $\beta_2 = 0$, in which case
$$u^*(\beta_1, 0) = \frac{e^{2\beta_1}}{1 + e^{2\beta_1}}.$$
We will now present a theorem which shows that for $\beta_1$ below a threshold, $u^*$ has a single jump discontinuity in $\beta_2$, signifying a first-order phase transition in the parlance of statistical physics.

Theorem 7.5 Let $G_n$ be a random graph from the edge-triangle model, as defined above. Let
$$c_1 := \frac{e^{\beta_1}}{1 + e^{\beta_1}}, \qquad c_2 := 1 + \frac{1}{2\beta_1}.$$
Suppose that $\beta_1 < 0$ and $|\beta_1|$ is so large that $c_1 < c_2$. Let $e(G_n)$ be the number of edges in $G_n$ and let $f(G_n) := e(G_n) / \binom{n}{2}$ be the edge density. Then there exists $q = q(\beta_1) \in [0, \infty)$ such that if $-\infty < \beta_2 < q$, then
$$\lim_{n \to \infty} \mathbb{P}(f(G_n) > c_1) = 0,$$
and if $\beta_2 > q$, then
$$\lim_{n \to \infty} \mathbb{P}(f(G_n) < c_2) = 0.$$
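The jump described in Theorem 7.5 can be seen numerically by maximizing $\ell$ from (7.6.1) on a grid while $\beta_2$ is varied. The sketch below is only illustrative: the value $\beta_1 = -2$ (for which $c_1 < c_2$), the particular values of $\beta_2$ and the grid size are arbitrary assumptions, and `ell` and `grid_maximizer` are hypothetical helper names.

```python
import math


def ell(u, beta1, beta2):
    """ell(u) = beta1*u + beta2*u^3 - I(u)/2 from (7.6.1), with 0 log 0 = 0."""
    entropy = sum(t * math.log(t) for t in (u, 1.0 - u) if t > 0.0)
    return beta1 * u + beta2 * u**3 - 0.5 * entropy


def grid_maximizer(beta1, beta2, grid=20001):
    """Grid point of [0, 1] at which ell is largest (first one found in case of ties)."""
    us = (i / (grid - 1) for i in range(grid))
    return max(us, key=lambda u: ell(u, beta1, beta2))


beta1 = -2.0
c1 = math.exp(beta1) / (1.0 + math.exp(beta1))  # threshold c_1 of Theorem 7.5
c2 = 1.0 + 1.0 / (2.0 * beta1)                  # threshold c_2 of Theorem 7.5
sweep = [(b2, grid_maximizer(beta1, b2)) for b2 in (0.0, 1.0, 2.0, 2.25, 3.0, 5.0)]

if __name__ == "__main__":
    print(f"c1 = {c1:.4f}, c2 = {c2:.4f}")
    for b2, u in sweep:
        side = "below c1" if u < c1 else "above c2" if u > c2 else "between"
        print(f"beta2 = {b2:5.2f}: maximizer near {u:.5f} ({side})")
```

In this sweep the approximate maximizer stays below $c_1$ for the smaller values of $\beta_2$ and lies above $c_2$ for the larger ones, never landing in $(c_1, c_2)$, in line with the theorem. The grid cannot resolve maximizers closer to 1 than the grid spacing, so the reported values near 1 are only approximate.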

Proof Fix $\beta_1 < 0$ such that $c_1 < c_2$. As a preliminary step, let us prove that for any $\beta_2 > 0$,
$$\lim_{n \to \infty} \mathbb{P}(f(G_n) \in (c_1, c_2)) = 0. \tag{7.6.2}$$
Fix $\beta_2 > 0$. Let $u$ be any maximizer of the function $\ell$ defined in (7.6.1). Then by Theorem 7.3, it suffices to prove that either $u < e^{\beta_1}/(1 + e^{\beta_1})$ or $u > 1 + 1/(2\beta_1)$. This is proved as follows. Define a function $g : [0,1] \to \mathbb{R}$ as
$$g(v) := \ell(v^{1/3}).$$
Then $\ell$ is maximized at $u$ if and only if $g$ is maximized at $u^3$. Since $\ell$ is a bounded continuous function and $\ell'(0) = \infty$ and $\ell'(1) = -\infty$, $\ell$ cannot be maximized at 0 or 1. Therefore the same is true for $g$. Let $v$ be a point in $(0,1)$ at which $g$ is maximized. Then $g''(v) \le 0$. A simple computation shows that
$$g''(v) = \frac{1}{9 v^{5/3}} \Bigl( -2\beta_1 + \log \frac{v^{1/3}}{1 - v^{1/3}} - \frac{1}{2(1 - v^{1/3})} \Bigr).$$
Thus, $g''(v) \le 0$ only if
$$\log \frac{v^{1/3}}{1 - v^{1/3}} \le \beta_1 \quad \text{or} \quad -\frac{1}{2(1 - v^{1/3})} \le \beta_1.$$
This shows that a maximizer $u$ of $\ell$ must satisfy $u \le c_1$ or $u \ge c_2$. Now, if $u = c_1$, then $u < c_2$, and therefore the above computations show that $g''(v) > 0$, where $v = u^3$. Similarly, if $u = c_2$ then $u > c_1$ and again $g''(v) > 0$. Thus, we have proved that $u < c_1$ or $u > c_2$. By Theorem 7.2, this completes the proof of (7.6.2) when $\beta_2 > 0$.

Now notice that as $\beta_2 \to \infty$, $\sup_{u \le a} \ell(u) \sim \beta_2 a^3$ for any fixed $a \le 1$. This shows that as $\beta_2 \to \infty$, any maximizer of $\ell$ must eventually be larger than $1 + 1/(2\beta_1)$. Therefore, for sufficiently large $\beta_2$,
$$\lim_{n \to \infty} \mathbb{P}(f(G_n) < c_2) = 0. \tag{7.6.3}$$
Next consider the case $\beta_2 \le 0$. Let $\tilde{F}^*$ be the set of maximizers of $T(\tilde{h}) - \frac{1}{2} I(\tilde{h})$. Take any $\tilde{h} \in \tilde{F}^*$ and let $h$ be a representative element of $\tilde{h}$. Let $p = e^{2\beta_1}/(1 + e^{2\beta_1})$. An easy verification shows that
$$T(h) - \frac{1}{2} I(h) = \beta_2 \, t(H_2, h) - \frac{1}{2} I_p(h) - \frac{1}{2} \log(1 - p),$$
where $I_p(h)$ is defined in Eq. (5.2.1) of Chap. 5. Define a new function
$$h_1(x,y) := \min\{h(x,y), p\}.$$
Since the function $I_p$ defined in (5.2.1) is minimized at $p$, it follows that for all $x, y \in [0,1]$, $I_p(h_1(x,y)) \le I_p(h(x,y))$. Consequently, $I_p(h_1) \le I_p(h)$. Again, since $\beta_2 \le 0$ and $h_1 \le h$ everywhere, $\beta_2 \, t(H_2, h_1) \ge \beta_2 \, t(H_2, h)$. Combining these observations, we see that $T(h_1) - \frac{1}{2} I(h_1) \ge T(h) - \frac{1}{2} I(h)$. Since $h$ maximizes $T - \frac{1}{2} I$, it follows that equality must hold at every step in the above deductions, from which it is easy to conclude that $h = h_1$ a.e. In other words, $h(x,y) \le p$ a.e. This is true for every $\tilde{h} \in \tilde{F}^*$. Since $p < c_1$, the above deduction coupled with Theorem 7.2 proves that when $\beta_2 \le 0$,
$$\lim_{n \to \infty} \mathbb{P}(f(G_n) > c_1) = 0. \tag{7.6.4}$$
Recalling that $\beta_1$ is fixed, define
$$a_n(\beta_2) := \mathbb{P}(f(G_n) > c_1), \qquad b_n(\beta_2) := \mathbb{P}(f(G_n) < c_2).$$
Let $A_n$ and $B_n$ denote the events in brackets in the above display. A simple computation shows that
$$a_n'(\beta_2) = \frac{6}{n} \operatorname{Cov}(1_{A_n}, \Delta(G_n)) \quad \text{and} \quad b_n'(\beta_2) = \frac{6}{n} \operatorname{Cov}(1_{B_n}, \Delta(G_n)),$$
where $\Delta(G_n)$ is the number of triangles in $G_n$. It is easy to verify that the edge-triangle model with $\beta_2 \ge 0$ satisfies the FKG lattice criterion (2.6.1) stated in Chap. 2. Moreover, $1_{A_n}$ and $\Delta$ are increasing functions of the edge variables, and $1_{B_n}$ is a decreasing function. Therefore the above identities and the FKG inequality (Theorem 2.2 of Chap. 2) show that on the nonnegative axis, $a_n$ is a non-decreasing function and $b_n$ is a non-increasing function.

Let $q_1 := \sup\{x \in \mathbb{R} : \lim_{n \to \infty} a_n(x) = 0\}$. By Eq. (7.6.3), $q_1 < \infty$ and by Eq. (7.6.4), $q_1 \ge 0$. Similarly, if $q_2 := \inf\{x \in \mathbb{R} : \lim_{n \to \infty} b_n(x) = 0\}$, then $0 \le q_2 < \infty$. Also, clearly, $q_1 \le q_2$ since $a_n + b_n \ge 1$ everywhere. We claim that $q_1 = q_2$. This would complete the proof by the monotonicity of $a_n$ and $b_n$.

To prove that $q_1 = q_2$, suppose not. Then $q_1 < q_2$. Then for any $\beta_2 \in (q_1, q_2)$, $\limsup a_n(\beta_2) > 0$ and $\limsup b_n(\beta_2) > 0$. Now,
$$0 \le a_n(\beta_2) + b_n(\beta_2) - 1 = \mathbb{P}(f(G_n) \in (c_1, c_2)).$$
Therefore by (7.6.2),
$$\lim_{n \to \infty} (a_n(\beta_2) + b_n(\beta_2) - 1) = 0.$$
Thus, for any $\beta_2 \in (q_1, q_2)$, $\limsup (1 - b_n(\beta_2)) > 0$. By Theorem 7.3, this implies that the function $\ell$ has a maximum in $[c_2, 1]$. Similarly, for any $\beta_2 \in (q_1, q_2)$, $\limsup (1 - a_n(\beta_2)) > 0$ and therefore the function $\ell$ has a maximum in $[0, c_1]$.

Now fix $q_1 < \beta_2 < \tilde{\beta}_2 < q_2$, and let $\ell$ and $\tilde{\ell}$ denote the two $\ell$-functions corresponding to $\beta_2$ and $\tilde{\beta}_2$ respectively. That is,
$$\ell(u) = \beta_1 u + \beta_2 u^3 - \frac{1}{2} I(u), \qquad \tilde{\ell}(u) = \beta_1 u + \tilde{\beta}_2 u^3 - \frac{1}{2} I(u).$$
By the above argument, $\ell$ attains its maximum at some point $u_1 \in [0, c_1]$ and at some point $u_2 \in [c_2, 1]$. (There may be other maxima, but that is irrelevant for us.) Note that
$$\max_{u \le c_1} \tilde{\ell}(u) = \max_{u \le c_1} \bigl( \ell(u) + (\tilde{\beta}_2 - \beta_2) u^3 \bigr) \le \ell(u_1) + (\tilde{\beta}_2 - \beta_2) c_1^3.$$
On the other hand,
$$\max_{u \ge c_2} \tilde{\ell}(u) \ge \tilde{\ell}(u_2) = \ell(u_2) + (\tilde{\beta}_2 - \beta_2) u_2^3 \ge \ell(u_2) + (\tilde{\beta}_2 - \beta_2) c_2^3.$$
Since $\ell(u_1) = \ell(u_2)$, $\tilde{\beta}_2 > \beta_2$ and $c_2 > c_1$, this shows that
$$\max_{u \le c_1} \tilde{\ell}(u) < \max_{u \ge c_2} \tilde{\ell}(u),$$
contradicting our previous deduction that $\tilde{\ell}$ has maxima in both $[0, c_1]$ and $[c_2, 1]$. This proves that $q_1 = q_2$. □


7.7 Euler–Lagrange Equations

In this section we will derive the Euler–Lagrange equation for the solution of the variational problem of Theorem 7.1. The analysis is very similar to the one carried out in Sect. 6.5 of Chap. 6. Let the operator $\Delta_H$ be defined as in Eq. (6.5.1) of that section.

Theorem 7.6 Let $T$ be as in Eq. (7.4.1). If $h \in \mathcal{W}$ maximizes $T(h) - \frac{1}{2} I(h)$, then for almost all $(x,y) \in [0,1]^2$,
$$h(x,y) = \frac{e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} h(x,y)}}{1 + e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} h(x,y)}}.$$
Moreover, $h$ is bounded away from 0 and 1.

Proof Let $g$ be a symmetric bounded measurable function from $[0,1]^2$ into $\mathbb{R}$. For each $u \in \mathbb{R}$, let
$$h_u(x,y) := h(x,y) + u g(x,y).$$
Then $h_u$ is a symmetric bounded measurable function from $[0,1]^2$ into $\mathbb{R}$. First suppose that $h$ is bounded away from 0 and 1. Then $h_u \in \mathcal{W}$ for every $u$ sufficiently small in magnitude. Since $h$ maximizes $T(h) - \frac{1}{2} I(h)$ among all elements of $\mathcal{W}$, under the above assumption, for all $u$ sufficiently close to zero,
$$T(h_u) - \frac{1}{2} I(h_u) \le T(h) - \frac{1}{2} I(h).$$
In particular,
$$\frac{d}{du} \Bigl( T(h_u) - \frac{1}{2} I(h_u) \Bigr) \Big|_{u=0} = 0. \tag{7.7.1}$$
It is easy to check that $T(h_u) - \frac{1}{2} I(h_u)$ is differentiable in $u$ for any $h$ and $g$. In particular, the derivative is given by
$$\frac{d}{du} \Bigl( T(h_u) - \frac{1}{2} I(h_u) \Bigr) = \sum_{i=1}^k \beta_i \frac{d}{du} t(H_i, h_u) - \frac{1}{2} \frac{d}{du} I(h_u).$$
Now,
$$\frac{1}{2} \frac{d}{du} I(h_u) = \frac{1}{2} \int_{[0,1]^2} \frac{d}{du} I(h(x,y) + u g(x,y)) \, dy \, dx = \frac{1}{2} \int_{[0,1]^2} g(x,y) \log \frac{h_u(x,y)}{1 - h_u(x,y)} \, dy \, dx.$$
Consequently,
$$\frac{1}{2} \frac{d}{du} I(h_u) \Big|_{u=0} = \frac{1}{2} \int_{[0,1]^2} g(x,y) \log \frac{h(x,y)}{1 - h(x,y)} \, dy \, dx.$$
Next, note that
$$\begin{aligned} \frac{d}{du} t(H_i, h_u) &= \int_{[0,1]^{V(H_i)}} \sum_{(r,s) \in E(H_i)} g(x_r, x_s) \prod_{\substack{\{r',s'\} \in E(H_i) \\ \{r',s'\} \ne \{r,s\}}} h_u(x_{r'}, x_{s'}) \prod_{v \in V(H_i)} dx_v \\ &= \int_{[0,1]^2} g(x,y) \, \Delta_{H_i} h_u(x,y) \, dy \, dx. \end{aligned}$$
Combining the above computations and (7.7.1), we see that for any symmetric bounded measurable $g : [0,1]^2 \to \mathbb{R}$,
$$\iint g(x,y) \Bigl( \sum_{i=1}^k \beta_i \Delta_{H_i} h(x,y) - \frac{1}{2} \log \frac{h(x,y)}{1 - h(x,y)} \Bigr) dy \, dx = 0.$$
Taking $g(x,y)$ equal to the function within the brackets (which is bounded since $h$ is assumed to be bounded away from 0 and 1), the conclusion of the theorem follows.

Now recall that the theorem was proved under the assumption that $h$ is bounded away from 0 and 1. We claim that this is true for any $h$ that maximizes $T(h) - \frac{1}{2} I(h)$. To prove this claim, take any such $h$. Fix $p \in (0,1)$. For each $u \in [0,1]$, let
$$h_{p,u}(x,y) := h(x,y) + u (p - h(x,y))_+.$$
In other words, $h_{p,u}$ is simply $h_u$ with $g = (p - h)_+$. Then $h_{p,u}$ is a symmetric bounded measurable function from $[0,1]^2$ into $[0,1]$. Note that
$$\frac{d}{du} h_{p,u}(x,y) = (p - h(x,y))_+.$$
Using this, an easy computation as above shows that
$$\begin{aligned} \frac{d}{du} \Bigl( T(h_{p,u}) - \frac{1}{2} I(h_{p,u}) \Bigr) \Big|_{u=0} &= \int_{[0,1]^2} \Bigl( \sum_{i=1}^k \beta_i \Delta_{H_i} h(x,y) - \frac{1}{2} \log \frac{h(x,y)}{1 - h(x,y)} \Bigr) (p - h(x,y))_+ \, dy \, dx \\ &\ge \int_{[0,1]^2} \Bigl( -C - \frac{1}{2} \log \frac{h(x,y)}{1 - h(x,y)} \Bigr) (p - h(x,y))_+ \, dy \, dx \end{aligned}$$
where $C$ is a positive constant depending only on $\beta_1, \ldots, \beta_k$ and $H_1, \ldots, H_k$ (and not on $p$ or $h$). When $h(x,y) = 0$, the integrand is interpreted as $\infty$, and when $h(x,y) = 1$, the integrand is interpreted as 0. Now, if $p$ is so small that
$$-C - \frac{1}{2} \log \frac{p}{1 - p} > 0,$$
then the previous display proves that the derivative of $T(h_{p,u}) - \frac{1}{2} I(h_{p,u})$ with respect to $u$ is strictly positive at $u = 0$ if $h < p$ on a set of positive Lebesgue measure. Hence $h$ cannot be a maximizer of $T - \frac{1}{2} I$ unless $h \ge p$ almost everywhere. This proves that any maximizer of $T - \frac{1}{2} I$ must be bounded away from zero. A similar argument with the perturbation $h - u(h - p)_+$ (that is, $g = -(h - p)_+$) and $p$ close to 1 shows that it must be bounded away from 1 and hence completes the proof of the theorem. □
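As a consistency check, the Euler–Lagrange equation of Theorem 7.6 can be specialized to constant functions. The operator $\Delta_H$ is defined in Sect. 6.5 and is not reproduced here; the computation below relies only on the identity $\frac{d}{du} t(H_i, h_u) = \int_{[0,1]^2} g \, \Delta_{H_i} h_u \, dy \, dx$ from the proof, and on the assumption that $\Delta_{H_i} h$ is itself constant when $h$ is constant. Taking $h \equiv u$ and $g \equiv 1$, so that $t(H_i, u + s) = (u + s)^{e(H_i)}$, this identity forces
$$\Delta_{H_i} h \equiv e(H_i) \, u^{e(H_i) - 1}.$$
Taking logarithms of odds in the equation of Theorem 7.6 then gives
$$\frac{1}{2} \log \frac{u}{1 - u} = \sum_{i=1}^k \beta_i \, e(H_i) \, u^{e(H_i) - 1},$$
which is exactly the first-order condition
$$\frac{d}{du} \Bigl( \sum_{i=1}^k \beta_i u^{e(H_i)} - \frac{1}{2} I(u) \Bigr) = 0$$
for the scalar problem (7.4.3), since $I'(u) = \log \frac{u}{1-u}$. For the edge-triangle model this reads $\beta_1 + 3\beta_2 u^2 = \frac{1}{2} \log \frac{u}{1-u}$, that is, $\ell'(u) = 0$ for the function $\ell$ of (7.6.1).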

7.8 The Symmetric Phase

This section is the analog of Sect. 6.6 of Chap. 6 for exponential random graph models. The following theorem shows that for the statistic $T$ defined in Eq. (7.4.1), there is a unique maximizer of $T(h) - \frac{1}{2} I(h)$ if $|\beta_2|, \ldots, |\beta_k|$ are small enough. The proof involves a simple application of the Euler–Lagrange equation presented in Theorem 7.6. Recall that the same statement was proved in Theorem 7.3 under the hypothesis that $\beta_2, \ldots, \beta_k$ are nonnegative.

Theorem 7.7 Let $T$ be as in Eq. (7.4.1). Suppose $\beta_1, \ldots, \beta_k$ are such that
$$\sum_{i=2}^k |\beta_i| \, e(H_i) (e(H_i) - 1) < 2$$
where $e(H_i)$ is the number of edges in $H_i$. Then the conclusions of Theorem 7.3 hold.

Proof It suffices to prove that the maximizer of $T - \frac{1}{2} I$ is unique. This is because if $h$ is a maximizer, then so is $h_\sigma(x,y) := h(\sigma x, \sigma y)$ for any measure preserving bijection $\sigma : [0,1] \to [0,1]$. The only functions that are invariant under such transforms are functions that are constant almost everywhere. Let $\Delta_H$ be the operator defined in Sect. 6.5 of Chap. 6. Let $\|\cdot\|_\infty$ denote the $L^\infty$ norm on $\mathcal{W}$. Let $h$ and $g$ be two maximizers of $T - \frac{1}{2} I$. For any finite simple graph $H$, a simple computation shows that
$$\|\Delta_H h - \Delta_H g\|_\infty \le \sum_{(r,s) \in E(H)} \|\Delta_{H,r,s} h - \Delta_{H,r,s} g\|_\infty \le e(H)(e(H) - 1) \|h - g\|_\infty.$$
Using the above inequality, Theorem 7.6 and the inequality
$$\Bigl| \frac{e^x}{1 + e^x} - \frac{e^y}{1 + e^y} \Bigr| \le \frac{|x - y|}{4},$$
it follows that for almost all $x, y$,
$$\begin{aligned} |h(x,y) - g(x,y)| &= \Bigl| \frac{e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} h(x,y)}}{1 + e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} h(x,y)}} - \frac{e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} g(x,y)}}{1 + e^{2 \sum_{i=1}^k \beta_i \Delta_{H_i} g(x,y)}} \Bigr| \\ &\le \frac{1}{2} \sum_{i=1}^k |\beta_i| \, \|\Delta_{H_i} h - \Delta_{H_i} g\|_\infty \\ &\le \frac{1}{2} \|h - g\|_\infty \sum_{i=1}^k |\beta_i| \, e(H_i) (e(H_i) - 1). \end{aligned}$$
If the coefficient of $\|h - g\|_\infty$ in the last expression is strictly less than 1, it follows that $h$ must be equal to $g$ almost everywhere. □

7.9 Symmetry Breaking

Let $T$ be as in (7.4.1). Theorems 7.3 and 7.7 give sufficient conditions for $T - \frac{1}{2} I$ to be maximized by a constant function. Is it possible that $T - \frac{1}{2} I$ has a non-constant maximizer in certain situations? In analogy with Sect. 6.7 of Chap. 6, we call this ‘symmetry breaking’. The following theorem shows that symmetry breaking is indeed possible, by considering the specific case of the edge-triangle model. As discussed in Sect. 6.7, symmetry breaking implies that a typical random graph from the model does not ‘look like’ an Erdős–Rényi graph.

Theorem 7.8 Let $T$ be as in Eq. (7.4.2). Then for any given value of $\beta_1$, there is a positive constant $C(\beta_1)$ sufficiently large so that whenever $\beta_2 < -C(\beta_1)$, $T - \frac{1}{2} I$ is not maximized at any constant function.

Proof Fix $\beta_1$. Let $p = e^{2\beta_1}/(1 + e^{2\beta_1})$ and $\gamma = -\beta_2$, so that for any $h \in \mathcal{W}$,
$$T(h) - \frac{1}{2} I(h) = -\gamma \, t(H_2, h) - \frac{1}{2} I_p(h) - \frac{1}{2} \log(1 - p).$$
Assume without loss of generality that $\beta_2 < 0$. Suppose that $u$ is a constant such that the constant function $h(x,y) \equiv u$ maximizes $T(h) - \frac{1}{2} I(h)$, that is, minimizes $\gamma \, t(H_2, h) + \frac{1}{2} I_p(h)$. Note that
$$\gamma \, t(H_2, h) + \frac{1}{2} I_p(h) = \gamma u^3 + \frac{1}{2} I_p(u).$$
Clearly, the definition of $u$ implies that $\gamma u^3 + \frac{1}{2} I_p(u) \le \gamma x^3 + \frac{1}{2} I_p(x)$ for all $x \in [0,1]$. This implies that $u$ must be in $(0,1)$, because the derivative of $x \mapsto \gamma x^3 + \frac{1}{2} I_p(x)$ is $-\infty$ at 0 and $\infty$ at 1. Thus,
$$0 = \frac{d}{dx} \Bigl( \gamma x^3 + \frac{1}{2} I_p(x) \Bigr) \Big|_{x=u} = 3\gamma u^2 + \frac{1}{2} \log \frac{u}{1 - u} - \frac{1}{2} \log \frac{p}{1 - p}$$
which shows that $u \le c(\gamma)$, where $c(\gamma)$ is a function of $\gamma$ such that
$$\lim_{\gamma \to \infty} c(\gamma) = 0.$$
This shows that
$$\lim_{\gamma \to \infty} \min_{0 \le x \le 1} \Bigl( \gamma x^3 + \frac{1}{2} I_p(x) \Bigr) = \frac{1}{2} I_p(0) = \frac{1}{2} \log \frac{1}{1 - p}. \tag{7.9.1}$$
Next let $g$ be the function
$$g(x,y) := \begin{cases} 0 & \text{if } x, y \text{ are on the same side of } 1/2, \\ p & \text{if not.} \end{cases}$$
Clearly, for almost all $(x,y,z) \in [0,1]^3$, $g(x,y) g(y,z) g(z,x) = 0$. Thus, $t(H_2, g) = 0$. A simple computation shows that
$$\frac{1}{2} I_p(g) = \frac{1}{4} \log \frac{1}{1 - p}.$$
Therefore $\gamma \, t(H_2, g) + \frac{1}{2} I_p(g) = \frac{1}{4} \log \frac{1}{1-p}$. By (7.9.1), this is strictly smaller than the cost of the best constant function once $\gamma$ is large. This shows that if $\gamma$ is large enough (depending on $p$ and hence $\beta_1$), then $T - \frac{1}{2} I$ cannot be maximized at a constant function. □
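The comparison at the heart of this proof is easy to reproduce numerically: the best constant function has cost $\min_x (\gamma x^3 + \frac{1}{2} I_p(x))$, while the two-block function $g$ has cost $\frac{1}{4} \log \frac{1}{1-p}$. The sketch below assumes the form $I_p(x) = x \log \frac{x}{p} + (1-x) \log \frac{1-x}{1-p}$ for the function of Eq. (5.2.1), which is consistent with the identity $I_p(0) = \log \frac{1}{1-p}$ used in (7.9.1). The helper names, the grid search and the sample parameter values are illustrative assumptions.

```python
import math


def rel_entropy(x, p):
    """I_p(x) = x log(x/p) + (1-x) log((1-x)/(1-p)), with 0 log 0 = 0 (assumed form of (5.2.1))."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / p)
    if x < 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - p))
    return total


def best_constant_cost(gamma, p, grid=20001):
    """Grid approximation of min over x in [0,1] of gamma*x^3 + I_p(x)/2."""
    xs = (i / (grid - 1) for i in range(grid))
    return min(gamma * x**3 + 0.5 * rel_entropy(x, p) for x in xs)


def two_block_cost(p):
    """Cost gamma*t(H_2, g) + I_p(g)/2 of the two-block function g, which has no triangles."""
    return 0.25 * math.log(1.0 / (1.0 - p))


def two_block_beats_constants(beta1, beta2):
    """True if the two-block function g has strictly lower cost than every constant on the grid."""
    p = math.exp(2.0 * beta1) / (1.0 + math.exp(2.0 * beta1))
    return two_block_cost(p) < best_constant_cost(-beta2, p)


if __name__ == "__main__":
    for beta2 in (-1.0, -10.0, -50.0):
        print(beta2, two_block_beats_constants(0.0, beta2))
```

A `False` result only means that this particular $g$ does not beat the constants at those parameter values; it does not rule out symmetry breaking by some other non-constant function.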

Bibliographical Notes

Exponential random graph models have a long history in the statistics literature. They were used by Holland and Leinhardt [10] in the directed case. Frank and Strauss [7] developed them, showing that if $T_i$ are chosen as edges, triangles and stars of various sizes, the resulting random graph edges form a Markov random field. A general development is in Wasserman and Faust [20]. Newer developments, consisting mainly of new sufficient statistics and new ranges for parameters that give interesting and practically relevant structures, are summarized in Snijders et al. [19]. Rinaldo et al. [18] developed the geometric theory for this class of models with extensive further references.

A major problem in this field is the evaluation of the normalizing constant, which is crucial for carrying out maximum likelihood and Bayesian inference. For a long time, there used to exist no feasible analytic method for approximating the normalizing constant when $n$ is large. Physicists had tried the unrigorous technique of mean field approximations; see Park and Newman [14, 15] for the case where $T_1$ is the number of edges and $T_2$ is the number of two-stars or the number of triangles. For exponential graph models, Chatterjee and Dey [4] proved that these approximations work in certain subregions of the symmetric phase. This was further developed in Bhamidi et al. [2]. The problem of analytically computing the normalizing constant in exponential random graph models was finally solved in Chatterjee and Diaconis [5] using the large deviation results of Chatterjee and Varadhan [6]. The theorems presented in this chapter are all reproduced from Chatterjee and Diaconis [5]. Recently, the mean field approach for analyzing exponential random graph models was made fully rigorous in Chatterjee and Dembo [3]. The results of this chapter could also have been derived using the methods of Chatterjee and Dembo [3] instead of graph limit theory. This is further discussed in Chap. 8.

The phase transition discussed in Sect. 7.6 is a mathematically precise version of the phenomenon of ‘near-degeneracy’ that was observed and studied in Handcock [9] (see also Park and Newman [14] and Häggström and Jonasson [8]). Near-degeneracy, in this context, means that when $\beta_1$ is a large negative number, then as $\beta_2$ varies, the model transitions from being a very sparse graph for low values of $\beta_2$, to a very dense graph for large values of $\beta_2$, completely skipping all intermediate structures. Significant extensions of Theorem 7.5 have been made in Aristoff and Radin [1], Radin and Sadun [16], Radin and Yin [17] and Yin [21].

The symmetry breaking phenomenon discussed in Sect. 7.9 has been further investigated by Lubetzky and Zhao [13]. They showed that if in the edge-triangle model, the homomorphism density of triangles is replaced by the homomorphism density raised to the power $\alpha$ for some $\alpha \in (0, 2/3)$, then the model can exhibit symmetry breaking even if $\beta_2$ is nonnegative. As far as I know, there is no $T$ for which a non-constant optimizer of $T - \frac{1}{2} I$ has been explicitly computed, either analytically or using a computer. There is, however, one related class of models, called constrained graph models, where a similar theory has been developed and non-constant optimizers have been explicitly computed. See Kenyon et al. [12] and Kenyon and Yin [11] for details.

References

1. Aristoff, D., & Radin, C. (2013). Emergent structures in large networks. Journal of Applied Probability, 50(3), 883–888.
2. Bhamidi, S., Bresler, G., & Sly, A. (2008). Mixing time of exponential random graphs. In 2008 IEEE 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (pp. 803–12).
3. Chatterjee, S., & Dembo, A. (2016). Nonlinear large deviations. Advances in Mathematics, 299, 396–450.
4. Chatterjee, S., & Dey, P. S. (2010). Applications of Stein’s method for concentration inequalities. Annals of Probability, 38, 2443–2485.
5. Chatterjee, S., & Diaconis, P. (2013). Estimating and understanding exponential random graph models. Annals of Statistics, 41(5), 2428–2461.
6. Chatterjee, S., & Varadhan, S. R. S. (2011). The large deviation principle for the Erdős-Rényi random graph. European Journal of Combinatorics, 32(7), 1000–1017.
7. Frank, O., & Strauss, D. (1986). Markov graphs. Journal of the American Statistical Association, 81, 832–842.
8. Häggström, O., & Jonasson, J. (1999). Phase transition in the random triangle model. Journal of Applied Probability, 36(4), 1101–1115.
9. Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks, Working Paper 39. Technical report. Center for Statistics and the Social Sciences, University of Washington.
10. Holland, P. W., & Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76(373), 33–65.
11. Kenyon, R., & Yin, M. (2014). On the asymptotics of constrained exponential random graphs. arXiv preprint arXiv:1406.3662.
12. Kenyon, R., Radin, C., Ren, K., & Sadun, L. (2014). Multipodal structure and phase transitions in large constrained graphs. arXiv preprint arXiv:1405.0599.
13. Lubetzky, E., & Zhao, Y. (2015). On replica symmetry of large deviations in random graphs. Random Structures & Algorithms, 47(1), 109–146.
14. Park, J., & Newman, M. E. J. (2004). Solution of the two-star model of a network. Physical Review E (3), 70, 066146, 5.
15. Park, J., & Newman, M. E. J. (2005). Solution for the properties of a clustered network. Physical Review E (3), 72, 026136, 5.
16. Radin, C., & Sadun, L. (2013). Phase transitions in a complex network. Journal of Physics A, 46, 305002.
17. Radin, C., & Yin, M. (2011). Phase transitions in exponential random graphs. arXiv preprint arXiv:1108.0649.
18. Rinaldo, A., Fienberg, S. E., & Zhou, Y. (2009). On the geometry of discrete exponential families with application to exponential random graph models. Electronic Journal of Statistics, 3, 446–484.
19. Snijders, T. A. B., Pattison, P. E., Robins, G. L., & Handcock, M. S. (2006). New specifications for exponential random graph models. Sociological Methodology, 36, 99–153.
20. Wasserman, S., & Faust, K. (2010). Social network analysis: Methods and applications (2nd ed.). In Structural Analysis in the Social Sciences. Cambridge: Cambridge University Press.
21. Yin, M. (2013). Critical phenomena in exponential random graphs. Journal of Statistical Physics, 153(6), 1008–1021.

Chapter 8

Large Deviations for Sparse Graphs

The development in this monograph so far has been based on results from graph limit theory. This theory, however, is inadequate for understanding the behavior of sparse graphs. The goal of this chapter is to describe an alternative approach, called nonlinear large deviations, that allows us to prove similar results for sparse graphs. Nonlinear large deviation theory gives a way of getting quantitative error bounds in some of the large deviation theorems proved in earlier chapters. The quantitative error bounds make it possible to extend the results to the sparse regime. At the time of writing this monograph, this theory is not as well-developed as the theory for dense graphs, but it is developed enough to make some progress on sparse graphs.

8.1 An Approximation for Partition Functions

Take any $N \ge 1$. Let $\|f\|$ denote the supremum norm of any function $f : [0,1]^N \to \mathbb{R}$. Suppose that $f : [0,1]^N \to \mathbb{R}$ is twice continuously differentiable in $(0,1)^N$, such that $f$ and all its first and second order derivatives extend continuously to the boundary. For each $i$ and $j$, let

$$f_i := \frac{\partial f}{\partial x_i} \quad \text{and} \quad f_{ij} := \frac{\partial^2 f}{\partial x_i \, \partial x_j}.$$

Define

$$a := \|f\|, \quad b_i := \|f_i\| \quad \text{and} \quad c_{ij} := \|f_{ij}\|.$$

Given $\epsilon > 0$, let $D(\epsilon)$ be a finite subset of $\mathbb{R}^N$ such that for all $x \in \{0,1\}^N$, there exists $d = (d_1, \ldots, d_N) \in D(\epsilon)$ such that

$$\sum_{i=1}^N (f_i(x) - d_i)^2 \le N \epsilon^2. \tag{8.1.1}$$

© Springer International Publishing AG 2017 S. Chatterjee, Large Deviations for Random Graphs, Lecture Notes in Mathematics 2197, DOI 10.1007/978-3-319-65816-2_8


Define

$$F := \log \sum_{x \in \{0,1\}^N} e^{f(x)}.$$

In the terminology of statistical mechanics, $F$ is the logarithm of the partition function of the probability measure on $\{0,1\}^N$ with Hamiltonian $f$. For $x \in [0,1]$, let

$$I(x) := x \log x + (1 - x) \log(1 - x), \tag{8.1.2}$$

and for $x = (x_1, \ldots, x_N) \in [0,1]^N$, let

$$I(x) := \sum_{i=1}^N I(x_i).$$

The following theorem gives a sufficient condition on $f$ under which the approximation

$$F = \sup_{x \in [0,1]^N} (f(x) - I(x)) + \text{lower order terms} \tag{8.1.3}$$

is valid. This is sometimes called the ‘naive mean field approximation’ in the statistical physics literature. Roughly speaking, the condition is that in addition to some smoothness assumptions, the gradient vector $\nabla f(x) = (\partial f/\partial x_1, \ldots, \partial f/\partial x_N)$ may be approximately encoded by $o(N)$ bits of information. A bit more precisely, we need $|D(\epsilon)| = e^{o(N)}$ for some $\epsilon = o(1)$, where the implicit assumption is that $N \to \infty$ and $f$ varies with $N$. We will refer to this as the ‘low complexity gradient’ condition.

Theorem 8.1 Let $f$, $a$, $b_i$, $c_{ij}$, $D(\epsilon)$, $F$ and $I$ be as above. Then for any $\epsilon > 0$,

$$F \le \sup_{x \in [0,1]^N} (f(x) - I(x)) + \text{complexity} + \text{smoothness},$$

where

$$\text{complexity} = \frac{1}{4} \Big( N \sum_{i=1}^N b_i^2 \Big)^{1/2} \epsilon + 3N\epsilon + \log |D(\epsilon)|,$$

and

$$\text{smoothness} = 4 \Big( \sum_{i=1}^N (a c_{ii} + b_i^2) + \frac{1}{4} \sum_{i,j=1}^N \big( a c_{ij}^2 + b_i b_j c_{ij} + 4 b_i c_{ij} \big) \Big)^{1/2} + \frac{1}{4} \Big( \sum_{i=1}^N b_i^2 \Big)^{1/2} \Big( \sum_{i=1}^N c_{ii}^2 \Big)^{1/2} + 3 \sum_{i=1}^N c_{ii} + \log 2.$$

Moreover, $F$ satisfies the lower bound

$$F \ge \sup_{x \in [0,1]^N} (f(x) - I(x)) - \frac{1}{2} \sum_{i=1}^N c_{ii}.$$
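To get a feel for the two sides of Theorem 8.1, the following Python sketch compares a brute-force value of $F$ with the mean field quantity $f(x) - I(x)$ for a small system. The Hamiltonian used here, $f(x) = (\beta/N) \sum_{i<j} x_i x_j$, and the names `hamiltonian`, `log_partition` and `mean_field_value` are illustrative assumptions and do not come from the text. For this $f$ all pure second derivatives $f_{ii}$ vanish, so the lower bound of the theorem reduces to $F \ge f(x) - I(x)$ for every $x \in [0,1]^N$; the sketch only evaluates constant vectors $x = (s, \ldots, s)$ on a grid.

```python
import itertools
import math


def entropy_term(s):
    """I(s) = s log s + (1 - s) log(1 - s), with the convention 0 log 0 = 0."""
    total = 0.0
    for v in (s, 1.0 - s):
        if v > 0.0:
            total += v * math.log(v)
    return total


def hamiltonian(x, beta):
    """Illustrative choice (an assumption): f(x) = (beta / N) * sum_{i<j} x_i x_j."""
    n = len(x)
    s = sum(x)
    sq = sum(v * v for v in x)
    return beta / n * (s * s - sq) / 2.0


def log_partition(n, beta):
    """F = log sum_{x in {0,1}^N} exp(f(x)), computed by full enumeration."""
    vals = [hamiltonian(x, beta) for x in itertools.product((0, 1), repeat=n)]
    top = max(vals)  # subtract the maximum for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in vals))


def mean_field_value(n, beta, grid=2000):
    """Largest value of f(x) - I(x) over constant vectors x = (s, ..., s) on a grid."""
    best = -math.inf
    for k in range(grid + 1):
        s = k / grid
        best = max(best, hamiltonian([s] * n, beta) - n * entropy_term(s))
    return best


N, BETA = 10, 3.0
F = log_partition(N, BETA)
MF = mean_field_value(N, BETA)
print(F, MF)  # F should be at least MF, since c_ii = 0 for this f
```

With $\beta = 0$ both quantities equal $N \log 2$, which is a quick sanity check of the normalization of $I$.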

The rest of this section is devoted to the proof of Theorem 8.1. Examples are worked in subsequent sections. We will generally denote the $i$th coordinate of a vector $x \in \mathbb{R}^N$ by $x_i$. Similarly, the $i$th coordinate of a random vector $X$ will be denoted by $X_i$. Given $x \in [0,1]^N$, define $x^{(i)}$ to be the vector $(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_N)$. For a random vector $X$ define $X^{(i)}$ similarly. Given a function $g : [0,1]^N \to \mathbb{R}$, define the discrete derivative $\Delta_i g$ as

$$\Delta_i g(x) := g(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_N) - g(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_N). \tag{8.1.4}$$

For each $i$, define a function $\hat{x}_i : [0,1]^N \to [0,1]$ as

$$\hat{x}_i(x) = \frac{1}{1 + e^{-\Delta_i f(x)}}.$$

Let $\hat{x} : [0,1]^N \to [0,1]^N$ be the vector-valued function whose $i$th coordinate function is $\hat{x}_i$. When the vector $x$ is understood from the context, we will simply write $\hat{x}$ and $\hat{x}_i$ instead of $\hat{x}(x)$ and $\hat{x}_i(x)$. The proof of Theorem 8.1 requires two key lemmas.

Lemma 8.1 Let $X = (X_1, \ldots, X_N)$ be a random vector that has probability density proportional to $e^{f(x)}$ on $\{0,1\}^N$ with respect to the counting measure. Let $\hat{X} = \hat{x}(X)$. Then

$$E\big[ (f(X) - f(\hat{X}))^2 \big] \le \sum_{i=1}^N (a c_{ii} + b_i^2) + \frac{1}{4} \sum_{i,j=1}^N \big( a c_{ij}^2 + b_i b_j c_{ij} \big).$$

Proof It is easy to see that

$$\hat{x}_i(x) = E(X_i \mid X_j = x_j, \ 1 \le j \le N, \ j \ne i).$$

Let $D := f(X) - f(\hat{X})$. Then clearly

$$|D| \le 2a. \tag{8.1.5}$$

Define

$$h(x) := f(x) - f(\hat{x}(x)),$$


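so that $D = h(X)$. Note that for $i \ne j$,

$$\frac{\partial \hat{x}_j}{\partial x_i} = \frac{e^{-\Delta_j f(x)}}{(1 + e^{-\Delta_j f(x)})^2} \int_0^1 f_{ij}(x_1, \ldots, x_{j-1}, t, x_{j+1}, \ldots, x_N) \, dt,$$

and for $i = j$, the above derivative is identically equal to zero. Since

$$\frac{e^{-x}}{(1 + e^{-x})^2} \le \frac{1}{4}$$

for all $x \in \mathbb{R}$, this shows that for all $i$ and $j$,

$$\Big\| \frac{\partial \hat{x}_j}{\partial x_i} \Big\| \le \frac{c_{ij}}{4}. \tag{8.1.6}$$

Thus,

$$\Big\| \frac{\partial h}{\partial x_i} \Big\| \le \|f_i\| + \sum_{j=1}^N \|f_j\| \, \Big\| \frac{\partial \hat{x}_j}{\partial x_i} \Big\| \le b_i + \frac{1}{4} \sum_{j=1}^N b_j c_{ij}. \tag{8.1.7}$$

Consequently, if $D_i := h(X^{(i)})$, then

$$|D - D_i| \le b_i + \frac{1}{4} \sum_{j=1}^N b_j c_{ij}. \tag{8.1.8}$$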

For $t \in [0,1]$ and $x \in [0,1]^N$ define

$$u_i(t, x) := f_i(tx + (1 - t)\hat{x}),$$

so that

$$h(x) = \int_0^1 \sum_{i=1}^N (x_i - \hat{x}_i) \, u_i(t, x) \, dt.$$

Thus,

$$E(D^2) = \int_0^1 \sum_{i=1}^N E\big( (X_i - \hat{X}_i) \, u_i(t, X) \, D \big) \, dt. \tag{8.1.9}$$


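Now,

$$\|u_i\| \le b_i, \tag{8.1.10}$$

and by (8.1.6),

$$\Big\| \frac{\partial u_i}{\partial x_i} \Big\| \le t \|f_{ii}\| + (1 - t) \sum_{j=1}^N \|f_{ij}\| \, \Big\| \frac{\partial \hat{x}_j}{\partial x_i} \Big\| \le t c_{ii} + \frac{1 - t}{4} \sum_{j=1}^N c_{ij}^2. \tag{8.1.11}$$

The bounds (8.1.5), (8.1.8), (8.1.10) and (8.1.11) imply that

$$\big| E\big( (X_i - \hat{X}_i) u_i(t, X) D \big) - E\big( (X_i - \hat{X}_i) u_i(t, X^{(i)}) D_i \big) \big| \le E\big| \big( u_i(t, X) - u_i(t, X^{(i)}) \big) D \big| + E\big| u_i(t, X^{(i)}) (D - D_i) \big|$$

$$\le 2 a t c_{ii} + \frac{a(1 - t)}{2} \sum_{j=1}^N c_{ij}^2 + b_i^2 + \frac{1}{4} \sum_{j=1}^N b_i b_j c_{ij}.$$

But $u_i(t, X^{(i)}) D_i$ is a function of the random variables $(X_j)_{j \ne i}$ only. Therefore by the definition of $\hat{X}_i$,

$$E\big( (X_i - \hat{X}_i) u_i(t, X^{(i)}) D_i \big) = 0.$$

Thus,

$$\big| E\big( (X_i - \hat{X}_i) u_i(t, X) D \big) \big| \le 2 a t c_{ii} + \frac{a(1 - t)}{2} \sum_{j=1}^N c_{ij}^2 + b_i^2 + \frac{1}{4} \sum_{j=1}^N b_i b_j c_{ij}.$$

Using this bound in (8.1.9) gives

$$E(D^2) \le \int_0^1 \sum_{i=1}^N \Big( 2 a t c_{ii} + \frac{a(1 - t)}{2} \sum_{j=1}^N c_{ij}^2 + b_i^2 + \frac{1}{4} \sum_{j=1}^N b_i b_j c_{ij} \Big) dt = \sum_{i=1}^N (a c_{ii} + b_i^2) + \frac{1}{4} \sum_{i,j=1}^N \big( a c_{ij}^2 + b_i b_j c_{ij} \big),$$

completing the proof. □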


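Lemma 8.2 Let all notation be as in Lemma 8.1. Then

$$E\Big[ \Big( \sum_{i=1}^N (X_i - \hat{X}_i) \Delta_i f(X) \Big)^2 \Big] \le \sum_{i=1}^N b_i^2 + \frac{1}{4} \sum_{i,j=1}^N b_i (b_j + 4) c_{ij}.$$

Proof Let $g_i$ denote the function $\Delta_i f$, for notational simplicity. Note that

$$g_i(x) = \int_0^1 f_i(x_1, \ldots, x_{i-1}, t, x_{i+1}, \ldots, x_N) \, dt,$$

which shows that

$$\|g_i\| \le \|f_i\| = b_i \tag{8.1.12}$$

and for all $j$,

$$\Big\| \frac{\partial g_i}{\partial x_j} \Big\| \le \|f_{ij}\| = c_{ij}. \tag{8.1.13}$$

Let

$$G(x) := \sum_{i=1}^N (x_i - \hat{x}_i(x)) \, g_i(x).$$

Then

$$\frac{\partial G}{\partial x_i} = \sum_{j=1}^N \Big( \Big( 1_{\{j = i\}} - \frac{\partial \hat{x}_j}{\partial x_i} \Big) g_j(x) + (x_j - \hat{x}_j) \frac{\partial g_j}{\partial x_i} \Big)$$

and therefore by (8.1.6), (8.1.12) and (8.1.13),

$$\Big\| \frac{\partial G}{\partial x_i} \Big\| \le b_i + \frac{1}{4} \sum_{j=1}^N c_{ij} b_j + \sum_{j=1}^N c_{ij}. \tag{8.1.14}$$

Note that for any $x$,

$$|G(x) - G(x^{(i)})| \le \Big\| \frac{\partial G}{\partial x_i} \Big\|. \tag{8.1.15}$$

Again, $g_i(X)$ and $G(X^{(i)})$ are both functions of $(X_j)_{j \ne i}$ only. Therefore

$$E\big( (X_i - \hat{X}_i) \, g_i(X) \, G(X^{(i)}) \big) = 0. \tag{8.1.16}$$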


Combining (8.1.14), (8.1.15) and (8.1.16) gives

$$E(G(X)^2) = \sum_{i=1}^N E\big( (X_i - \hat{X}_i) \, g_i(X) \, G(X) \big) \le \sum_{i=1}^N b_i \Big( b_i + \frac{1}{4} \sum_{j=1}^N c_{ij} b_j + \sum_{j=1}^N c_{ij} \Big).$$

This completes the proof of the lemma. □

With the aid of Lemmas 8.1 and 8.2, we are now ready to prove Theorem 8.1.

Proof (Upper Bound in Theorem 8.1) For $x, y \in [0,1]^N$, let

$$g(x, y) := \sum_{i=1}^N \big( x_i \log y_i + (1 - x_i) \log(1 - y_i) \big). \tag{8.1.17}$$

Note that

$$g(x, \hat{x}) - I(\hat{x}) = \sum_{i=1}^N (x_i - \hat{x}_i) \log \frac{\hat{x}_i}{1 - \hat{x}_i} = \sum_{i=1}^N (x_i - \hat{x}_i) \Delta_i f(x). \tag{8.1.18}$$

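Let

$$B := 4 \Big( \sum_{i=1}^N (a c_{ii} + b_i^2) + \frac{1}{4} \sum_{i,j=1}^N \big( a c_{ij}^2 + b_i b_j c_{ij} + 4 b_i c_{ij} \big) \Big)^{1/2}.$$

Let

$$A_1 := \{ x \in \{0,1\}^N : |I(\hat{x}) - g(x, \hat{x})| \le B/2 \},$$

and

$$A_2 := \{ x \in \{0,1\}^N : |f(x) - f(\hat{x})| \le B/2 \}.$$

Let $A = A_1 \cap A_2$. By Lemma 8.2, the identity (8.1.18) and Chebyshev's inequality, $P(X \notin A_1) \le 1/4$. By Lemma 8.1 and Chebyshev's inequality, $P(X \notin A_2) \le 1/4$. Thus, $P(X \in A) \ge 1/2$, which is the same as

$$\frac{\sum_{x \in A} e^{f(x)}}{\sum_{x \in \{0,1\}^N} e^{f(x)}} \ge \frac{1}{2}.$$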


Therefore by the definition of the set $A$,

$$F = \log \sum_{x \in \{0,1\}^N} e^{f(x)} \le \log \sum_{x \in A} e^{f(x)} + \log 2 \le B + \log \sum_{x \in A} e^{f(\hat{x}) - I(\hat{x}) + g(x, \hat{x})} + \log 2. \tag{8.1.19}$$

Now take some $x \in \{0,1\}^N$ and let $d$ satisfy (8.1.1). Then by the Cauchy–Schwarz inequality,

$$\sum_{i=1}^N |f_i(x) - d_i| \le N \epsilon.$$

Fix such an $x$ and $d$. Note that for each $i$,

$$|\Delta_i f(x) - f_i(x)| \le \int_0^1 |f_i(x_1, \ldots, x_{i-1}, t, x_{i+1}, \ldots, x_N) - f_i(x)| \, dt \le \|f_{ii}\| = c_{ii}.$$

By the last two inequalities and (8.1.1),

$$\sum_{i=1}^N |\Delta_i f(x) - d_i| \le N \epsilon + \sum_{i=1}^N c_{ii} \tag{8.1.20}$$

and

$$\Big( \sum_{i=1}^N (\Delta_i f(x) - d_i)^2 \Big)^{1/2} \le N^{1/2} \epsilon + \Big( \sum_{i=1}^N c_{ii}^2 \Big)^{1/2}. \tag{8.1.21}$$

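Let $u(x) = 1/(1 + e^{-x})$. Note that for all $x$,

$$|u'(x)| = \frac{1}{(e^{x/2} + e^{-x/2})^2} \le \frac{1}{4}.$$

Therefore if a vector $p = p(d)$ is defined as $p_i = u(d_i)$, then by (8.1.21),

$$\Big( \sum_{i=1}^N (\hat{x}_i - p_i)^2 \Big)^{1/2} \le \Big( \frac{1}{16} \sum_{i=1}^N (\Delta_i f(x) - d_i)^2 \Big)^{1/2} \le \frac{N^{1/2} \epsilon}{4} + \frac{1}{4} \Big( \sum_{i=1}^N c_{ii}^2 \Big)^{1/2}.$$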


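Thus, if

$$L := \Big( \sum_{i=1}^N b_i^2 \Big)^{1/2},$$

then

$$|f(\hat{x}) - f(p)| \le L \Big( \sum_{i=1}^N (\hat{x}_i - p_i)^2 \Big)^{1/2} \le \frac{L N^{1/2} \epsilon}{4} + \frac{L}{4} \Big( \sum_{i=1}^N c_{ii}^2 \Big)^{1/2}. \tag{8.1.22}$$

Next, let $v(x) = \log(1 + e^{-x})$. Then for all $x$,

$$|v'(x)| = \frac{e^{-x}}{1 + e^{-x}} \le 1.$$

Consequently,

$$|\log \hat{x}_i - \log p_i| \le |\Delta_i f(x) - d_i|$$

and

$$|\log(1 - \hat{x}_i) - \log(1 - p_i)| \le |\Delta_i f(x) - d_i|.$$

Therefore by (8.1.20),

$$|g(x, \hat{x}) - g(x, p)| \le 2 \sum_{i=1}^N |\Delta_i f(x) - d_i| \le 2 N \epsilon + 2 \sum_{i=1}^N c_{ii}.$$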

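Finally, let $w(x) = I(u(x))$. Then

$$w'(x) = u'(x) \, I'(u(x)) = \frac{e^{-x}}{(1 + e^{-x})^2} \log \frac{u(x)}{1 - u(x)} = \frac{x e^{-x}}{(1 + e^{-x})^2}.$$

Thus, for all $x$,

$$|w'(x)| \le \sup_{x \in \mathbb{R}} \frac{|x| e^{-x}}{(1 + e^{-x})^2} \le \sup_{x \ge 0} x e^{-x} = \frac{1}{e}. \tag{8.1.23}$$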
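The two elementary derivative bounds used in this part of the proof, $|u'| \le 1/4$ for the logistic function $u$ and $|w'| \le 1/e$ for $w = I \circ u$, can be checked numerically. The following Python sketch is only a sanity check on a finite grid (the grid range and spacing are arbitrary choices made here), not a proof; it uses the closed forms $u'(x) = (e^{x/2} + e^{-x/2})^{-2}$ and $w'(x) = x \, u'(x)$.

```python
import math


def u_prime(x):
    """Derivative of the logistic function u(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (math.exp(x / 2.0) + math.exp(-x / 2.0)) ** 2


def w_prime(x):
    """Derivative of w(x) = I(u(x)); since I'(u(x)) = x, this equals x * u'(x)."""
    return x * u_prime(x)


# Grid on [-30, 30] with spacing 0.001 (an arbitrary choice for this check).
grid = [k / 1000.0 for k in range(-30000, 30001)]
max_u = max(abs(u_prime(x)) for x in grid)
max_w = max(abs(w_prime(x)) for x in grid)
print(max_u, max_w, 1.0 / math.e)
```

The printed maximum of $|w'|$ on the grid is noticeably smaller than $1/e$, which suggests that the constant $1/e$ in (8.1.23) is a convenient bound rather than a sharp one.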


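Consequently,

$$|I(\hat{x}_i) - I(p_i)| \le \frac{1}{e} |\Delta_i f(x) - d_i|,$$

and so by (8.1.20),

$$|I(\hat{x}) - I(p)| \le \frac{N \epsilon}{e} + \frac{1}{e} \sum_{i=1}^N c_{ii}. \tag{8.1.24}$$

For each $d \in D(\epsilon)$ let $\mathcal{C}(d)$ be the set of all $x \in \{0,1\}^N$ such that (8.1.1) holds, and let $p(d)$ be the vector $p$ defined above. Then by (8.1.22), (8.1.23) and (8.1.24),

$$\log \sum_{x \in A} e^{f(\hat{x}) - I(\hat{x}) + g(x, \hat{x})} \le \log \sum_{d \in D(\epsilon)} \sum_{x \in \mathcal{C}(d)} e^{f(\hat{x}) - I(\hat{x}) + g(x, \hat{x})}$$

$$\le \frac{L N^{1/2} \epsilon}{4} + \frac{L}{4} \Big( \sum_{i=1}^N c_{ii}^2 \Big)^{1/2} + 2 N \epsilon + 2 \sum_{i=1}^N c_{ii} + \frac{N \epsilon}{e} + \frac{1}{e} \sum_{i=1}^N c_{ii} + \log \sum_{d \in D(\epsilon)} \sum_{x \in \mathcal{C}(d)} e^{f(p(d)) - I(p(d)) + g(x, p(d))}. \tag{8.1.25}$$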

Now note that for any $p \in [0,1]^N$,

$$\sum_{x \in \{0,1\}^N} e^{g(x, p)} = 1.$$
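The identity above says that $e^{g(x,p)}$ is the probability mass function of a vector of independent Bernoulli($p_i$) coordinates. A small Python check, with an arbitrarily chosen vector `p` in $(0,1)^5$ (an assumption made only for illustration), can be sketched as follows.

```python
import itertools
import math


def g(x, p):
    """g(x, p) = sum_i [x_i log p_i + (1 - x_i) log(1 - p_i)], for p in (0,1)^N."""
    return sum(xi * math.log(pi) + (1 - xi) * math.log(1.0 - pi)
               for xi, pi in zip(x, p))


# Arbitrary illustrative choice of p; any vector in (0,1)^N would do.
p = [0.1, 0.35, 0.5, 0.8, 0.95]
total = sum(math.exp(g(x, p)) for x in itertools.product((0, 1), repeat=len(p)))
print(total)  # equals 1 up to floating-point rounding
```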

Thus,

$$\log \sum_{d \in D(\epsilon)} \sum_{x \in \mathcal{C}(d)} e^{f(p(d)) - I(p(d)) + g(x, p(d))} \le \log \sum_{d \in D(\epsilon)} e^{f(p(d)) - I(p(d))} \le \log |D(\epsilon)| + \sup_{p \in [0,1]^N} (f(p) - I(p)). \tag{8.1.26}$$

Combining (8.1.19), (8.1.25) and (8.1.26), the proof is complete. □

Proof (Lower Bound in Theorem 8.1) Let $y$ be a point in the cube $[0,1]^N$. Let $Y = (Y_1, \ldots, Y_N)$ be a random vector with independent components, where $Y_i$ is a Bernoulli($y_i$) random variable. Then by Jensen’s inequality,

$$\sum_{x \in \{0,1\}^N} e^{f(x)} = \sum_{x \in \{0,1\}^N} e^{f(x) - g(x, y) + g(x, y)} = E\big( e^{f(Y) - g(Y, y)} \big) \ge \exp\big( E( f(Y) - g(Y, y) ) \big) = \exp\big( E(f(Y)) - I(y) \big).$$

Let $S := f(Y) - f(y)$. For $t \in [0,1]$ and $x \in [0,1]^N$ define

$$v_i(t, x) := f_i(tx + (1 - t)y),$$

so that

$$S = \int_0^1 \sum_{i=1}^N (Y_i - y_i) \, v_i(t, Y) \, dt. \tag{8.1.27}$$

By the independence of $Y_i$ and $Y^{(i)}$,

$$\big| E\big( (Y_i - y_i) v_i(t, Y) \big) \big| = \big| E\big( (Y_i - y_i)( v_i(t, Y) - v_i(t, Y^{(i)}) ) \big) \big| \le \Big\| \frac{\partial v_i}{\partial x_i} \Big\| \le t c_{ii}.$$

Using this bound in (8.1.27) gives

$$E(S) \ge - \int_0^1 \sum_{i=1}^N t c_{ii} \, dt = - \frac{1}{2} \sum_{i=1}^N c_{ii}.$$

This completes the proof. □

8.2 Gradient Complexity of Homomorphism Densities

The goal of this section is to compute estimates for the smoothness and complexity terms of Theorem 8.1 for homomorphism densities. These estimates will be used later for establishing quantitative versions of some of the theorems proved earlier in this monograph, which, in turn, will be used for understanding large deviations in sparse graphs.

Let $n \ge 2$ be a positive integer. Let $\mathcal{P}_n$ denote the set of all upper triangular arrays like $x = (x_{ij})_{1 \le i < j \le n}$ […] Let

$$f(x) := n^2 t(H, x). \tag{8.2.2}$$

We will now obtain estimates for the smoothness and complexity terms for this $f$. The smoothness term is straightforward. The estimates for the smoothness term are given in Theorem 8.2 below. The bulk of this section is devoted to the analysis of the complexity term. The estimates for the complexity term are given in Theorem 8.3.

Theorem 8.2 Let $H$ and $f$ be as above, and let $m := |E|$. Then $\|f\| \le n^2$, and for any $i < j$ and $i' < j'$,

$$\Big\| \frac{\partial f}{\partial x_{ij}} \Big\| \le 2m,$$

and

$$\Big\| \frac{\partial^2 f}{\partial x_{ij} \, \partial x_{i'j'}} \Big\| \le \begin{cases} 4m(m-1) n^{-1} & \text{if } |\{i, j, i', j'\}| = 2 \text{ or } 3, \\ 4m(m-1) n^{-2} & \text{if } |\{i, j, i', j'\}| = 4. \end{cases}$$

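Proof It is clear that $\|f\| \le n^2$ since the $x_{ij}$’s are all in $[0,1]$ and there are exactly $n^k$ terms in the sum that defines $f$. Next, note that for any $i < j$,

$$\frac{\partial f}{\partial x_{ij}} = \frac{1}{n^{k-2}} \sum_{\{a,b\} \in E} \ \sum_{\substack{q \in [n]^k \\ \{q_a, q_b\} = \{i, j\}}} \ \prod_{\substack{\{l, l'\} \in E \\ \{l, l'\} \ne \{a, b\}}} x_{q_l q_{l'}}, \tag{8.2.3}$$

and therefore

$$\Big\| \frac{\partial f}{\partial x_{ij}} \Big\| \le \frac{2 m n^{k-2}}{n^{k-2}} = 2m.$$

Next, for any $i < j$ and $i' < j'$,

$$\frac{\partial^2 f}{\partial x_{ij} \, \partial x_{i'j'}} = \frac{1}{n^{k-2}} \sum_{\{a,b\} \in E} \ \sum_{\substack{\{c,d\} \in E \\ \{c,d\} \ne \{a,b\}}} \ \sum_{\substack{q \in [n]^k \\ \{q_a, q_b\} = \{i, j\} \\ \{q_c, q_d\} = \{i', j'\}}} \ \prod_{\substack{\{l, l'\} \in E \\ \{l, l'\} \ne \{a, b\} \\ \{l, l'\} \ne \{c, d\}}} x_{q_l q_{l'}}.$$

Take any two edges $\{a, b\}, \{c, d\} \in E$ such that $\{a, b\} \ne \{c, d\}$. Then the number of choices of $q \in [n]^k$ such that $\{q_a, q_b\} = \{i, j\}$ and $\{q_c, q_d\} = \{i', j'\}$ is at most $4 n^{k-3}$ if $|\{i, j, i', j'\}| = 2$ or $3$ (since we are constraining $q_a$, $q_b$, $q_c$ and $q_d$ and $|\{a, b, c, d\}| \ge 3$ always), and at most $4 n^{k-4}$ if $|\{i, j, i', j'\}| = 4$ (since $|\{a, b, c, d\}|$ must be $4$ if there is at least one possible choice of $q$ for these $i, j, i', j'$). This gives the upper bound for the second derivatives. □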
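The first-derivative bound of Theorem 8.2 can be illustrated numerically for a triangle. The Python sketch below assumes that the homomorphism density has the form $t(H, x) = n^{-k} \sum_{q \in [n]^k} \prod_{\{l,l'\} \in E} x_{q_l q_{l'}}$ with $x_{ij} = x_{ji}$ and $x_{ii} = 0$, which is the form suggested by formula (8.2.3); the helper names `symmetric_array` and `partial_derivative` are hypothetical and introduced only for this example. For the triangle, $f$ is linear in each coordinate $x_{ij}$, so the central difference used here is exact up to rounding.

```python
import itertools
import random

# Edge set E of H = triangle on vertices {0, 1, 2}, so k = 3 and m = 3.
TRIANGLE = [(0, 1), (1, 2), (0, 2)]


def f(x, edges=TRIANGLE, k=3):
    """f(x) = n^2 t(H, x), assuming t(H, x) = n^{-k} sum_q prod_{(l,l') in E} x[q_l][q_l']."""
    n = len(x)
    total = 0.0
    for q in itertools.product(range(n), repeat=k):
        term = 1.0
        for l, lp in edges:
            term *= x[q[l]][q[lp]]
        total += term
    return total / n ** (k - 2)


def symmetric_array(n, rng):
    """Random symmetric n x n array with entries in [0, 1] and zero diagonal."""
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            x[i][j] = x[j][i] = rng.random()
    return x


def partial_derivative(x, i, j, h=1e-4):
    """Central difference of f in the coordinate x_ij (kept symmetric)."""
    hi = [row[:] for row in x]
    lo = [row[:] for row in x]
    hi[i][j] = hi[j][i] = x[i][j] + h
    lo[i][j] = lo[j][i] = x[i][j] - h
    return (f(hi) - f(lo)) / (2 * h)


rng = random.Random(0)
n, m = 5, len(TRIANGLE)
x = symmetric_array(n, rng)
grads = [partial_derivative(x, i, j) for i in range(n) for j in range(i + 1, n)]
ones = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
print(max(abs(v) for v in grads), 2 * m, f(ones))
```

For the all-ones array the sketch returns $f = (n-1)(n-2)$, consistent with the bound $\|f\| \le n^2$.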


To understand the complexity of the gradient of $f$, we need some preparation. For an $n \times n$ matrix $M$, recall the definition of the operator norm:

$$\|M\| := \max\{ \|Mz\| : z \in \mathbb{R}^n, \ \|z\| = 1 \},$$

where $\|z\|$ denotes the Euclidean norm of $z$. Equivalently, $\|M\|$ is the largest singular value of $M$. For $x \in \mathcal{P}_n$, let $M(x)$ be the symmetric matrix whose $(i, j)$th entry is $x_{ij}$, with the convention that $x_{ij} = x_{ji}$ and $x_{ii} = 0$. Define the operator norm of $x$ as

$$\|x\|_{op} := \|M(x)\|.$$

The following lemma estimates the entropy of the unit cube under this norm.

Lemma 8.3 For any $\tau \in (0, 1)$, there is a finite set of $n \times n$ matrices $\mathcal{W}(\tau)$ such that

$$|\mathcal{W}(\tau)| \le e^{34 (n/\tau^2) \log(51/\tau^2)},$$

and for any $n \times n$ matrix $M$ with entries in $[0,1]$, there exists $W \in \mathcal{W}(\tau)$ such that

$$\|M - W\| \le n \tau.$$

In particular, for any $x \in \mathcal{P}_n$ there exists $W \in \mathcal{W}(\tau)$ such that $\|M(x) - W\| \le n \tau$.

Proof Suppose that $n < 17/\tau^2$. Let $\mathcal{W}(\tau)$ consist of all $n \times n$ matrices whose entries are all integer multiples of $\tau$ and belong to the interval $[0,1]$. It is then easy to see that for any $M$ with entries in $[0,1]$, there exists $W \in \mathcal{W}(\tau)$ such that $\|M - W\| \le n \tau$. Moreover, since $n < 17/\tau^2$,

$$|\mathcal{W}(\tau)| \le \tau^{-n^2} \le e^{9 (n/\tau^2) \log(1/\tau^2)},$$

completing the proof in this case.

Next, suppose that $n \ge 17/\tau^2$. Let $l$ be the integer part of $17/\tau^2$ and $\delta = 1/l$. Let $\mathcal{A}$ be a finite subset of the unit ball of $\mathbb{R}^n$ such that any vector inside the ball is at Euclidean distance $\le \delta$ from some element of $\mathcal{A}$. (In other words, $\mathcal{A}$ is a $\delta$-net of the unit ball under the Euclidean metric.) The set $\mathcal{A}$ may be defined as a maximal set of points in the unit ball such that any two are at a distance greater than $\delta$ from each other. Since the balls of radius $\delta/2$ around these points are disjoint and their union is contained in the ball of radius $1 + \delta/2$ centered at zero, it follows that $|\mathcal{A}| \, C (\delta/2)^n \le C (1 + \delta/2)^n$, where $C$ is the volume of the unit ball. Therefore,

$$|\mathcal{A}| \le (3/\delta)^n. \tag{8.2.4}$$


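Take any $x \in \mathcal{P}_n$. Suppose that $M$ has singular value decomposition

$$M = \sum_{i=1}^n \lambda_i u_i v_i^t,$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$ are the singular values of $M$, and $u_1, \ldots, u_n$ and $v_1, \ldots, v_n$ are singular vectors, and $v_i^t$ denotes the transpose of the column vector $v_i$. Assume that the $u_i$’s and $v_i$’s are orthonormal systems. Since the elements of $M$ all belong to the interval $[0,1]$, it is easy to see that $\lambda_1 \le n$ and $\sum \lambda_i^2 \le n^2$. Due to the second inequality, there exists $y \in \mathcal{A}$ such that

$$\sum_{i=1}^n (n^{-1} \lambda_i - y_i)^2 \le \delta^2. \tag{8.2.5}$$

Let $z_1, \ldots, z_n$ and $w_1, \ldots, w_n$ be elements of $\mathcal{A}$ such that for each $i$,

$$\sum_{j=1}^n (u_{ij} - z_{ij})^2 \le \delta^2 \quad \text{and} \quad \sum_{j=1}^n (v_{ij} - w_{ij})^2 \le \delta^2, \tag{8.2.6}$$

where $u_{ij}$ denotes the $j$th component of the vector $u_i$, etc. Define two matrices $V$ and $W$ as

$$V := \sum_{i=1}^{l-1} \lambda_i u_i v_i^t \quad \text{and} \quad W := \sum_{i=1}^{l-1} n y_i z_i w_i^t.$$

Note that since $\sum \lambda_i^2 \le n^2$ and $\lambda_i$ decreases with $i$, therefore for each $i$, $\lambda_i^2 \le n^2/i$. Thus,

$$\|M - W\| \le \|M - V\| + \|V - W\| \le \frac{n}{\sqrt{l}} + \|V - W\|.$$

Next, note that by (8.2.6), the operator norms of the rank-one matrices $(u_i - z_i) v_i^t$ and $z_i (v_i - w_i)^t$ are bounded by $\delta$. And by (8.2.5), $|\lambda_i - n y_i| \le n \delta$ for each $i$. Therefore

$$\|V - W\| \le \Big\| \sum_{i=1}^{l-1} (\lambda_i - n y_i) u_i v_i^t \Big\| + \Big\| \sum_{i=1}^{l-1} n y_i (u_i - z_i) v_i^t \Big\| + \Big\| \sum_{i=1}^{l-1} n y_i z_i (v_i - w_i)^t \Big\|$$

$$\le \max_{1 \le i \le l-1} |\lambda_i - n y_i| + 2 \sum_{i=1}^{l-1} n |y_i| \delta$$

$$\le n \delta + 2 n \delta \Big( (l - 1) \sum_{i=1}^{l-1} y_i^2 \Big)^{1/2} \le n \delta + 2 n \delta \sqrt{l - 1} \le \frac{3n}{\sqrt{l}}.$$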


Thus,

$$\|M - W\| \le \frac{4n}{\sqrt{l}} \le \frac{4n}{\sqrt{17/\tau^2 - 1}} \le \frac{4n}{\sqrt{16/\tau^2}} = n \tau.$$

Let $\mathcal{W}(\tau)$ be the set of all possible $W$’s constructed in the above manner. Then $\mathcal{W}(\tau)$ has the required property, and by (8.2.4), $|\mathcal{W}(\tau)|$ is at most the number of ways of choosing $y, z_1, \ldots, z_{l-1}, w_1, \ldots, w_{l-1} \in \mathcal{A}$, that is,

$$|\mathcal{W}(\tau)| \le |\mathcal{A}|^{2l-1} \le (3/\delta)^{2nl} = e^{2nl \log(3l)}.$$

This completes the proof of the lemma. □

Let $r$ be a positive integer. Let $K_r$ be the complete graph on the vertex set $\{1, \ldots, r\}$. For any set of edges $A$ of $K_r$, any $q = (q_1, \ldots, q_r) \in [n]^r$, and any $x \in \mathcal{P}_n$, let

$$P(x, q, A) := \prod_{\{a, b\} \in A} x_{q_a q_b},$$

with the usual convention that the empty product is $1$. Note that if $q_a = q_b$ for some $\{a, b\} \in A$, then $P(x, q, A) = 0$ due to our convention that $x_{ii} = 0$ for each $i$. Next, note that if $A$ and $B$ are disjoint sets of edges, then

$$P(x, q, A \cup B) = P(x, q, A) \, P(x, q, B). \tag{8.2.7}$$

Lemma 8.4 Let $A$ and $B$ be sets of edges of $K_r$, and let $e = \{\alpha, \beta\}$ be an edge that is neither in $A$ nor in $B$. Then for any $x, y \in \mathcal{P}_n$,

$$\Big| \sum_{q \in [n]^r} P(x, q, A) \, P(y, q, B) \, (x_{q_\alpha q_\beta} - y_{q_\alpha q_\beta}) \Big| \le n^{r-1} \|x - y\|_{op}.$$

Proof By relabeling the vertices of $K_r$ and redefining $A$ and $B$, we may assume that $\alpha = 1$ and $\beta = 2$. Let $A_1$ be the set of all edges in $A$ that are incident to $1$. Let $A_2$ be the set of all edges in $A$ that are incident to $2$. Note that since $\{1, 2\} \notin A$, therefore $A_1$ and $A_2$ must be disjoint. Similarly, let $B_1$ be the set of all edges in $B$ that are incident to $1$ and let $B_2$ be the set of all edges in $B$ that are incident to $2$. Let $A_3 = A \setminus (A_1 \cup A_2)$ and $B_3 = B \setminus (B_1 \cup B_2)$. By (8.2.7),

$$P(x, q, A) = P(x, q, A_1) \, P(x, q, A_2) \, P(x, q, A_3)$$


and

$$P(y, q, B) = P(y, q, B_1) \, P(y, q, B_2) \, P(y, q, B_3).$$

Thus,

$$\sum_{q \in [n]^r} P(x, q, A) \, P(y, q, B) \, (x_{q_1 q_2} - y_{q_1 q_2}) = \sum_{q_3, \ldots, q_r} P(x, q, A_3) \, P(y, q, B_3) \Big( \sum_{q_1, q_2} Q(x, y, q) \, (x_{q_1 q_2} - y_{q_1 q_2}) \Big),$$

where

$$Q(x, y, q) = P(x, q, A_1) \, P(x, q, A_2) \, P(y, q, B_1) \, P(y, q, B_2).$$

Now fix $q_3, \ldots, q_r$. Then $P(x, q, A_1) P(y, q, B_1)$ is a function of $q_1$ only, and does not depend on $q_2$. Let $g(q_1)$ denote this function. Similarly, $P(x, q, A_2) P(y, q, B_2)$ is a function of $q_2$ only, and does not depend on $q_1$. Let $h(q_2)$ denote this function. Both $g$ and $h$ are uniformly bounded by $1$. Therefore

$$\Big| \sum_{q_1, q_2} Q(x, y, q) \, (x_{q_1 q_2} - y_{q_1 q_2}) \Big| = \Big| \sum_{q_1, q_2} g(q_1) \, h(q_2) \, (x_{q_1 q_2} - y_{q_1 q_2}) \Big| \le n \|x - y\|_{op}.$$

Since this is true for all choices of $q_3, \ldots, q_r$ and $P$ is also uniformly bounded by $1$, this completes the proof of the lemma. □

Let $A$ and $B$ be two sets of edges of $K_r$. For $x, y \in \mathcal{P}_n$, define

$$R(x, y, A, B) := \sum_{q \in [n]^r} P(x, q, A) \, P(y, q, B).$$

Lemma 8.5 Let $A$, $B$, $A'$ and $B'$ be sets of edges of $K_r$ such that $A \cap B = A' \cap B' = \emptyset$ and $A \cup B = A' \cup B'$. Then

$$|R(x, y, A, B) - R(x, y, A', B')| \le \frac{1}{2} r (r - 1) n^{r-1} \|x - y\|_{op}.$$

Proof First, suppose that $e = \{\alpha, \beta\}$ is an edge such that $e \notin A'$ and $A = A' \cup \{e\}$. Since $A \cup B = A' \cup B'$ and $A \cap B = A' \cap B' = \emptyset$, this implies that $e \notin B$ and $B' = B \cup \{e\}$. Thus,

$$R(x, y, A, B) - R(x, y, A', B') = \sum_{q \in [n]^r} P(x, q, A') \, P(y, q, B) \, (x_{q_\alpha q_\beta} - y_{q_\alpha q_\beta}),$$


and the proof is completed using Lemma 8.4. For the general case, simply ‘move’ from the pair $(A, B)$ to the pair $(A', B')$ by ‘moving one edge at a time’ and apply Lemma 8.4 at each step. □

Lemma 8.6 Let $g_{ij} := \partial f / \partial x_{ij}$, where $f$ is the function defined in Eq. (8.2.2), and let $m := |E|$, as before. Then for any $x, y \in \mathcal{P}_n$,

$$\sum_{1 \le i < j \le n} (g_{ij}(x) - g_{ij}(y))^2 \le 8 m^2 k^2 n \|x - y\|_{op}.$$

[…] $j$, let $g_{ij} = g_{ji}$. Fix $x, y \in \mathcal{P}_n$. Define for any $q \in [n]^k$ and $\{a, b\} \in E$

$$D(q, \{a, b\}) := \prod_{\substack{\{l, l'\} \in E \\ \{l, l'\} \ne \{a, b\}}} x_{q_l q_{l'}} - \prod_{\substack{\{l, l'\} \in E \\ \{l, l'\} \ne \{a, b\}}} y_{q_l q_{l'}}.$$

Define ( ij WD

2

if i D j;

1=2

if i ¤ j;

( ij WD

2

if i D j;

1

if i ¤ j:

Then note that n X

ij .gij .x/  gij . y//2

i; jD1

D

D

D

1 n2k4 1 n2k4 1 n2k4

n X

ij

i; jD1

 X fa;bg2E

n X X

2

X

D.q; fa; bg/

q2Œnk fqa ;qb gDfi;jg

X

X

ij D.q; fa; bg/D.s; fc; dg/

i; jD1 fa;bg2E q2Œnk s2Œnk fc;dg2E fqa ;qb gDfi;jg fsc ;sd gDfi;jg

X X fa;bg2E fc;dg2E

q2Œnk

X s2Œnk

fsc ;sd gDfqa ;qb g

qa qb D.q; fa; bg/D.s; fc; dg/:


Now fix two edges fa; bg and fc; dg in E. Relabeling vertices if necessary, assume that c D k  1 and d D k. Let r D 2k  2. For any t 2 Œnr , define two vectors q.t/ and s.t/ in Œnk as follows. For i D 1; : : : ; k, let qi .t/ D ti . For i D 1; : : : ; k  2, let si .t/ D tiCk . Let sk1 .t/ D ta and sk .t/ D tb . Note that X

X

q2Œnk

s2Œnk

qa qb D.q; fa; bg/D.s; fc; dg/

fsc ;sd gDfqa ;qb g

X

X

q2Œnk

s2Œnk sc Dqa ; sd Dqb

D

C

X

X

q2Œnk

s2Œnk

D.q; fa; bg/D.s; fc; dg/

D.q; fa; bg/D.s; fc; dg/:

sc Dqb ; sd Dqa

With the definition of q.t/ and s.t/ given above, it is clear that the first term on the right-hand side equals X

D.q.t/; fa; bg/D.s.t/; fc; dg/:

t2Œnr

Below, we will get a bound on this term. The same upper bound will hold for the other term by symmetry. Define two subsets of edges A and B of Kr as follows. Let A be the set of all edges fl; l0 g such that fl; l0 g 2 E n ffa; bgg. Let B be the set of all edges f.l/; .l0 /g such that fl; l0 g 2 E n ffk  1; kgg, where  W Œk ! Œr is the map

.x/ D

8 ˆ ˆ 0 and let

$$\tau = \frac{\epsilon^2}{64 m^2 k^2}.$$

Let $\mathcal{W}(\tau)$ be as in Lemma 8.3. For each $W \in \mathcal{W}(\tau)$, let $y(W) \in \mathcal{P}_n$ be a vector such that $\|M(y) - W\| \le n \tau$. If for some $W$ there does not exist any such $y$, leave $y(W)$ undefined. Let $g_{ij} = \partial f / \partial x_{ij}$, as in Lemma 8.6. Let $g : \mathcal{P}_n \to \mathbb{R}^{n(n-1)/2}$ be the function whose $(i, j)$th coordinate is $g_{ij}$. Define

$$D(\epsilon) := \{ g(y) : y = y(W) \text{ for some } W \in \mathcal{W}(\tau) \}.$$

Then by Lemma 8.3

$$|D(\epsilon)| \le |\mathcal{W}(\tau)| \le e^{34 (n/\tau^2) \log(51/\tau^2)}.$$

We claim that the set $D(\epsilon)$ satisfies the requirements of Theorem 8.5. To see this, take any $x \in \mathcal{P}_n$. By Lemma 8.3, there exists $W \in \mathcal{W}(\tau)$ such that $\|M(x) - W\| \le n \tau$. In particular, this means that $y := y(W)$ is defined, and so

$$\|x - y\|_{op} = \|M(x) - M(y)\| \le \|M(x) - W\| + \|W - M(y)\| \le 2 n \tau.$$


Therefore by Lemma 8.6, X

.gij .x/  gij . y//2  16m2 k2 n2 :

1i 0 and let D./ be as in Sect. 8.1 (for the function f ). Let 

 0 WD

2k

0k

 ; WD P

1=2 : N 1 2 N iD1 b2i

Let l 2 RN be the vector whose coordinates are all equal to log. p=.1  p// and define D 0 ./ WD fd C l W d 2 D. 0 /;  D j for some integer 0  j < k

0

k= g:

Let gi WD @g=@xi . Take any x 2 Œ0; 1N , and choose d 2 D./ satisfying (8.1.1). Choose an integer j between 0 and k 0 k= such that j

0

. f .x/=N/  j j  :

Let d0 WD j d C l, so that d0 2 D 0 ./. Then N N X X 0 2 .gi .x/  di / D . iD1

0

. f .x/=N/fi .x/  j di /2

iD1

 2.

0

. f .x/=N/  j /2

N X

fi .x/2 C 2k

iD1

 2 2

N X

b2i C 2k

0 2

k

N X . fi .x/  di /2 iD1

0 2

k N 02 D N 2 :

iD1

This shows that D 0 ./ plays the role of D./ for the function g. Note that jD 0 ./j 

0

k

k

jD. 0 /j:

This gives the upper bound on the complexity term for the function $g$. The proof is completed by applying Theorem 8.1. □

Proof (Lower Bound in Theorem 8.5) Fix any $z \in [0,1]^N$ such that

$$f(z) \ge (t + \delta_0) N.$$

Let $Z = (Z_1, \ldots, Z_N)$ be a random vector with independent components, where $Z_i \sim$ Bernoulli($z_i$). Let $\mathcal{A}$ be the set of all $x \in \{0,1\}^N$ such that $f(x) \ge tN$. Let $g$ be defined as in (8.1.17), and let $\mathcal{A}'$ be the subset of $\mathcal{A}$ where $|g(x, z) - g(x, p) - I_p(z)| \le \epsilon_0 N$. Then

$$P(f(Y) \ge tN) = \sum_{x \in \mathcal{A}} e^{g(x, p)} = \sum_{x \in \mathcal{A}} e^{g(x, p) - g(x, z) + g(x, z)} \ge \sum_{x \in \mathcal{A}'} e^{g(x, p) - g(x, z) + g(x, z)} \ge e^{-I_p(z) - \epsilon_0 N} P(Z \in \mathcal{A}'). \tag{8.4.5}$$

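Note that

$$E(g(Z, z) - g(Z, p)) = I_p(z),$$

and

$$\mathrm{Var}(g(Z, z) - g(Z, p)) = \sum_{i=1}^N \mathrm{Var}\Big( Z_i \log \frac{z_i}{p} + (1 - Z_i) \log \frac{1 - z_i}{1 - p} \Big) = \sum_{i=1}^N z_i (1 - z_i) \Big( \log \frac{z_i / p}{(1 - z_i)/(1 - p)} \Big)^2.$$

Using the inequalities $|\sqrt{x} \log x| \le 2/e \le 1$ and $x(1 - x) \le 1/4$, we see that for any $x \in [0,1]$,

$$x(1 - x) \Big( \log \frac{x/p}{(1 - x)/(1 - p)} \Big)^2 \le \Big( |\sqrt{x} \log x| + |\sqrt{1 - x} \log(1 - x)| + \frac{1}{2} \Big| \log \frac{p}{1 - p} \Big| \Big)^2 \le \Big( 2 + \frac{1}{2} \Big| \log \frac{p}{1 - p} \Big| \Big)^2.$$

Combining the last three displays, we see that

$$P\big( |g(Z, z) - g(Z, p) - I_p(z)| > \epsilon_0 N \big) \le \frac{1}{\epsilon_0^2 N} \Big( 2 + \frac{1}{2} \Big| \log \frac{p}{1 - p} \Big| \Big)^2 = \frac{1}{4}. \tag{8.4.6}$$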


Let $S := f(Z) - f(z)$ and $v_i(t, x) := f_i(tx + (1 - t)z)$. Let $Z^{(i)}$ be defined following the same convention as in the proof of Theorem 8.1. Let $S_i := f(Z^{(i)}) - f(z)$, so that $|S - S_i| \le b_i$. Since

$$S = \int_0^1 \sum_{i=1}^N (Z_i - z_i) \, v_i(t, Z) \, dt,$$

we have

$$E(S^2) = \int_0^1 \sum_{i=1}^N E\big( (Z_i - z_i) \, v_i(t, Z) \, S \big) \, dt. \tag{8.4.7}$$

By the independence of $Z_i$ and the pair $(S_i, Z^{(i)})$,

$$\big| E\big( (Z_i - z_i) v_i(t, Z) S \big) \big| = \big| E\big( (Z_i - z_i)( v_i(t, Z) S - v_i(t, Z^{(i)}) S_i ) \big) \big| \le \Big\| \frac{\partial v_i}{\partial x_i} \Big\| E|S| + \|v_i\| \, E|S - S_i| \le 2 a t c_{ii} + b_i^2.$$

By (8.4.7), this gives

$$E(S^2) \le \sum_{i=1}^N (a c_{ii} + b_i^2).$$

Therefore,

$$P(f(Z) < tN) \le \frac{1}{\delta_0^2 N^2} \sum_{i=1}^N (a c_{ii} + b_i^2) = \frac{1}{4}. \tag{8.4.8}$$

Inequalities (8.4.6) and (8.4.8) give

$$P(Z \in \mathcal{A}') \ge \frac{1}{2}.$$

Plugging this into (8.4.5) and taking supremum over $z$ completes the argument. □


8.5 Quantitative Estimates for Homomorphism Densities

Let $G = G(n, p)$ be an Erdős–Rényi random graph on $n$ vertices, with edge probability $p$. Let $H$ be a fixed finite simple graph. Recall the definition of the homomorphism density $t(H, G)$ from Chap. 3. Let $\mathcal{P}_n$ and $t(H, x)$ be defined as in Sect. 8.2. Recall the function $I_p$ defined in Eq. (8.4.1), and for $x \in \mathcal{P}_n$, let

$$I_p(x) := \sum_{1 \le i < j \le n} I_p(x_{ij}).$$

For $u > 1$ define

$$\psi_p(u) := \inf\{ I_p(x) : x \in \mathcal{P}_n \text{ such that } t(H, x) \ge u \, E(t(H, G)) \}.$$

The following theorem shows that for any $u > 1$,

$$P\big( t(H, G) \ge u \, E(t(H, G)) \big) = \exp\big( -\psi_p(u) + \text{lower order terms} \big), \tag{8.5.1}$$

provided that $n$ is large and $p$ is not too small. This is a quantitative version of Theorem 6.3 from Chap. 6.

Theorem 8.6 Take any finite simple graph $H$ and let $\psi_p$ be defined as above. Let $X := t(H, G)$. Let $k$ be the number of vertices and $m$ be the number of edges of $H$. Suppose that $m \ge 1$ and $p \ge n^{-1/(m+3)}$. Then for any $u > 1$ and any $n$ sufficiently large (depending only on $H$ and $u$),

$$1 - \frac{c (\log n)^{b_1}}{n^{b_2} p^{b_3}} \le \frac{\psi_p(u)}{-\log P(X \ge u \, E(X))} \le 1 + \frac{C (\log n)^{B_1}}{n^{B_2} p^{B_3}},$$

where $c$ and $C$ are constants that depend only on $H$ and $u$, and

$$b_1 = 1, \quad b_2 = \frac{1}{2m}, \quad b_3 = 2m,$$

$$B_1 = \frac{9 + 8m}{5 + 8m}, \quad B_2 = \frac{1}{5 + 8m}, \quad B_3 = 2m - \frac{16m}{k(5 + 8m)}.$$

The remainder of this section is devoted to the proof of Theorem 8.6. The proof is a direct application of Theorem 8.5, using the estimates obtained in Sect. 8.2. The first step is to understand the properties of the rate function $\phi_p(t)$ defined in Eq. (8.4.3), put in the context of this problem. We begin with a technical lemma.

Lemma 8.7 For any $r$ and any $a_1, \ldots, a_r, b \in [0,1]$,

$$\prod_{i=1}^r (a_i + b(1 - a_i)) \ge (1 - b^r) \prod_{i=1}^r a_i + b^r.$$


Proof The proof is by induction on $r$. The inequality is an equality for $r = 1$. Suppose that it holds for $r - 1$. Then

$$\prod_{i=1}^r (a_i + b(1 - a_i)) \ge \Big( (1 - b^{r-1}) \prod_{i=1}^{r-1} a_i + b^{r-1} \Big) \big( (1 - b) a_r + b \big)$$

$$= (1 - b^{r-1})(1 - b) \prod_{i=1}^r a_i + b^{r-1}(1 - b) a_r + (1 - b^{r-1}) b \prod_{i=1}^{r-1} a_i + b^r$$

$$\ge \big( (1 - b^{r-1})(1 - b) + b^{r-1}(1 - b) + (1 - b^{r-1}) b \big) \prod_{i=1}^r a_i + b^r$$

$$= (1 - b^r) \prod_{i=1}^r a_i + b^r.$$

This completes the induction. □
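As a quick numerical sanity check of Lemma 8.7, the following Python sketch evaluates both sides of the inequality on randomly drawn inputs. The function names `lhs` and `rhs`, the random seed and the range of $r$ are choices made here for illustration; a random search of this kind can only fail to find a counterexample and is no substitute for the induction above.

```python
import math
import random


def lhs(a, b):
    """Left-hand side of Lemma 8.7: prod_i (a_i + b (1 - a_i))."""
    return math.prod(ai + b * (1.0 - ai) for ai in a)


def rhs(a, b):
    """Right-hand side of Lemma 8.7: (1 - b^r) prod_i a_i + b^r, with r = len(a)."""
    r = len(a)
    return (1.0 - b ** r) * math.prod(a) + b ** r


rng = random.Random(1)
worst_gap = math.inf
for _ in range(10000):
    r = rng.randint(1, 6)
    a = [rng.random() for _ in range(r)]
    b = rng.random()
    worst_gap = min(worst_gap, lhs(a, b) - rhs(a, b))
print(worst_gap)  # should be nonnegative up to rounding error
```

The case $a = (1, 0)$, $b = 1/2$ gives $1/2$ on the left and $1/4$ on the right, showing that the inequality can be strict for $r \ge 2$.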

Lemma 8.8 Let $\phi_p(t)$ be defined as in (8.4.3), with $f(x) = n^2 t(H, x)$ and $N = n(n - 1)/2$. Let $l$ be the element of $\mathcal{P}_n$ whose coordinates are all equal to $1$, and let $t_0 := f(l)/N$. Then for any $0 < \delta < t < t_0$,

$$\phi_p(t - \delta) \ge \phi_p(t) - \Big( \frac{\delta}{t_0 - t} \Big)^{1/m} N \log(1/p).$$

Proof Take any $x \in \mathcal{P}_n$ such that $f(x) \ge (t - \delta) N$ and $x$ minimizes $I_p(x)$ among all $x$ satisfying this inequality. If $f(x) \ge tN$, then we immediately have $\phi_p(t) \le I_p(x) = \phi_p(t - \delta)$, and there is nothing more to prove. So let us assume that $f(x) < tN$. Let

$$\epsilon := \Big( \frac{tN - f(x)}{f(l) - f(x)} \Big)^{1/m}.$$

For each $1 \le i < j \le n$, let

$$y_{ij} := x_{ij} + \epsilon (1 - x_{ij}).$$

Then $y \in \mathcal{P}_n$, and by Lemma 8.7,

$$f(y) \ge (1 - \epsilon^m) f(x) + \epsilon^m f(l) = tN.$$

Thus, by the convexity of $I_p$,

$$\phi_p(t) \le I_p(y) = I_p((1 - \epsilon) x + \epsilon l) \le (1 - \epsilon) I_p(x) + \epsilon I_p(l) \le I_p(x) + \epsilon N \log(1/p) = \phi_p(t - \delta) + \epsilon N \log(1/p).$$


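Since $f(x) \ge (t - \delta) N$,

$$\epsilon^m \le \frac{tN - (t - \delta) N}{f(l) - (t - \delta) N} \le \frac{\delta}{t_0 - t}.$$

This completes the proof of the lemma. □

Lemma 8.9 For any $p$ and $t$,

$$\phi_p(t) \le \frac{1}{2} \big( \lceil t^{1/k} n \rceil + k \big)^2 \log(1/p).$$

Proof Let $r := \lceil t^{1/k} n \rceil + k$. Define $x \in [0,1]^N$ as

$$x_{ij} := \begin{cases} 1 & \text{if } 1 \le i < j \le r, \\ p & \text{otherwise.} \end{cases}$$

Then

$$f(x) \ge \frac{1}{n^{k-2}} \sum_{q \in [r]^k} \prod_{\{l, l'\} \in E} x_{q_l q_{l'}} \ge \frac{r(r - 1) \cdots (r - k + 1)}{n^{k-2}} \ge t n^2 \ge tN,$$

and since $I_p(p) = 0$, $I_p(x) =$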

X i k. Then for any u > 1, 2 p2m

P.X  uE.X//  eCn

;

where C is a positive constant that depends only on u and k. Proof Recall that X is a function of n.n  1/=2 i.i.d. Bernoulli. p/ random variables. If the value of one of these variables changes, then from the definition of t.H; G/ we see that X can change by at most n2 . Moreover, observe that .u  1/E.X/  .u  1/n.n  1/    .n  k C 1/nk pm ;


since any map that takes the vertices of H to distinct vertices of G has a probability pm of being a homomorphism. The claim now follows easily by McDiarmid’s inequality (Theorem 4.3). t u We are now ready to prove Theorem 8.6. Proof (Proof of the Upper Bound in Theorem 8.6) The task now is to pull together all the information obtained above, for use in Theorem 8.5. As before, we work with f .x/ D n2 t.H; x/. Take t D pm for some fixed  > 0. Let ı and  be two positive real numbers, both less than t, to be chosen later. Note that ı < t < p2m=k since t D pm and k > 2. Assume that ı and  are bigger than n1=2 . Note that p is already assumed to be bigger than n1=2 in the statement of the theorem. Recall that the indexing set for quantities like bi and cij , instead of being f1; : : : ; Ng, is now f.i; j/ W 1  i < j  ng. For simplicity, we will write .ij/ instead of .i; j/. Throughout, C will denote any constant that depends only on the graph H, the constant , and nothing else. From Theorem 8.2, we have the estimates a  n2 ; b.ij/  C; and ( c.ij/.i0 j0 / 

Cn1

if jfi; j; i0 ; j0 gj D 2 or 3;

Cn2

if jfi; j; i0 ; j0 gj D 4:

Let  WD ı 1 p2m=k . By Lemma 8.9, K  Cp2m=k log n: Using the above bounds, we get ˛  Cn2 log n; ˇ.ij/  C log n; and ( .ij/.i0 j0 / 

Cn1  log n

if jfi; j; i0 ; j0 gj D 2 or 3;

Cn2 ı 1  log n

if jfi; j; i0 ; j0 gj D 4:

Therefore, we have the estimates X X 2 ˇ.ij/  Cn2  2 .log n/2 ; b2.ij/  Cn2 ; .ij/

.ij/


and by Theorem 8.3, log jD..ı/=.4K//j  

CK Cn 4 log 4 ı Cn 4 .log n/5 : 4

Combining the last three estimates, we see that the complexity term in Theorem 8.5 is bounded above by Cn2  log n C

Cn 4 .log n/5 : 4

Taking  D n1=5  3=5 .log n/4=5 , the above bound simplifies to Cn9=5  8=5 .log n/9=5 : Next, note that by the bounds obtained above and the inequality ı > n1=2 , X

˛.ij/.ij/  Cn3 .log n/2 ;

.ij/

X

.ij/;.i0 j0 /

X

2 3 2 3 ˛.ij/.i 0 j0 /  Cn  .log n/ ;

ˇ.ij/ .ˇ.i0 j0 / C 4/.ij/.i0 j0 /  Cn2 ı 1  3 .log n/3 ;

.ij/;.i0 j0 /

X .ij/

X

2 ˇ.ij/

1=2  X

2 .ij/.ij/

1=2

 Cn 2 .log n/2 ;

.ij/

.ij/.ij/  Cn log n:

.ij/

The above estimates show that the smoothness term in Theorem 8.5 is bounded above by a constant times n3=2 .log n/3=2 C nı 1=2  3=2 .log n/3=2 C n 2 .log n/2 : Putting WD p2m=k , we see that this is bounded by a constant times n3=2 ı 1 .log n/3=2 C nı 2 3=2 .log n/2 :


Since ı > n1=2 , we can further simplify this upper bound to n3=2 ı 1 .log n/2 : Combining the bounds on the complexity term and the smoothness term, we get that log P. f .Y/  tN/  p .t  ı/ C Cn9=5 ı 8=5 8=5 .log n/9=5 C Cn3=2 ı 1 .log n/2 : By Lemma 8.8, p .t  ı/  p .t/ C Cı 1=m n2 log n: Taking ı D nm=.5C8m/ 8m=.5C8m/ .log n/4m=.5C8m/ gives log P. f .Y/  tN/

(8.5.2)

 p .t/ C Cn.9C16m/=.5C8m/ 8=.5C8m/ .log n/.9C8m/=.5C8m/ C Cn.15C26m/=.10C16m/ 5=.5C8m/ .log n/.10C12m/=.5C8m/: Now note that since p > n1=2 , n.9C16m/=.5C8m/ 8=.5C8m/ D n.3C6m/=.10C16m/p6m=.5kC8mk/ n.15C26m/=.10C16m/ 5=.5C8m/  n.3C6m/=.10C16m/n3m=.5C8m/ D n3=.10C16m/ : This shows that the first term on the right-hand side in (8.5.2) dominates the second when n is sufficiently large. Therefore, when n is large enough, log P. f .Y/  tN/  p .t/ C Cn.9C16m/=.5C8m/p16m=.5kC8mk/ .log n/.9C8m/=.5C8m/:


Written differently, this is p .t/  log P. f .Y/  tN/ 1C

Cn.9C16m/=.5C8m/p16m=.5kC8mk/ .log n/.9C8m/=.5C8m/ :  log P. f .Y/  tN/

By Lemma 8.10,  log P. f .Y/  tN/  Cn2 p2m ;

(8.5.3)

Therefore, p .t/  log P. f .Y/  tN/  1 C Cn1=.5C8m/ p2mC16m=.5kC8mk/ .log n/.9C8m/=.5C8m/: A minor verification using the assumption p  n1=4 shows that the  and ı chosen above are both bigger than n1=2 , as required. To complete the proof of the upper bound, notice that E.X/ is asymptotic to pm since p  n1=.mC3/ . t u Proof (Proof of the Lower Bound in Theorem 8.6) First, observe that by Lemma 8.8, Theorem 8.2, and the lower bound in Theorem 8.5, log P. f .Y/  tN/  p .t/  Cn1=2m n2 log n: Therefore, again applying (8.5.3), we get p .t/  1  Cn1=2m p2m log n:  log P. f .Y/  tN/ This completes the proof of the lower bound.

t u

8.6 Explicit Rate Function for Triangles

Theorem 8.6 converts the large deviation question for subgraph densities into a question of solving a variational problem analogous to the variational problem for dense graphs that we saw in Theorem 5.2. Unlike the dense case, however, the sparse regime allows us to explicitly solve the variational problem in many examples. The simplest example, that of triangle density, is solved in this section. We will prove the following result.

Theorem 8.7 Let $T_{n,p}$ be the number of triangles in an Erdős–Rényi graph $G(n,p)$. Then for any fixed $\delta > 0$, as $n \to \infty$ and $p \to 0$ simultaneously such that $p \gg n^{-1/158}(\log n)^{33/158}$,

$$P(T_{n,p} \ge (1+\delta)E(T_{n,p})) = \exp\left(-(1-o(1))\min\left\{\frac{\delta^{2/3}}{2}, \frac{\delta}{3}\right\} n^2 p^2 \log(1/p)\right).$$

Let $H$ be a triangle, and for simplicity, let $T(f) := t(H,f)$. Let $\mathcal{P}_n$ and $t(H,x)$ be as in Sect. 8.2. For $x \in \mathcal{P}_n$, let $T(x) := t(H,x)$. For $\delta > 0$, let

$$\phi_p(n,\delta) := \inf\{I_p(x) : x \in \mathcal{P}_n,\ T(x) \ge (1+\delta)p^3\}.$$

By Theorem 8.6, it suffices to show that under the given conditions,

$$\frac{\phi_p(n,\delta)}{n^2 p^2 \log(1/p)} \to \min\left\{\frac{\delta^{2/3}}{2}, \frac{\delta}{3}\right\}. \tag{8.6.1}$$

The remainder of this section is devoted to the proof of (8.6.1). The proof requires a number of technical estimates, which are worked out below. Throughout, we will use the notation $a \ll b$ to mean that the two quantities $a$ and $b$ are varying in such a way that $a/b \to 0$. Similarly, $a \sim b$ will mean that $a/b \to 1$. The symbol $o(a)$ will denote any quantity that, when divided by $a$, tends to zero as the relevant parameters vary in some prescribed manner. Let $I_p$ be defined as in (8.4.1).

Lemma 8.11 If $p \to 0$ and $0 \le x \ll p$, then $I_p(p+x) \sim x^2/(2p)$. On the other hand, if $p \to 0$, $x \to 0$ and $p \ll x$, then $I_p(p+x) \sim x\log(x/p)$.

Proof Note that $I_p(p) = I_p'(p) = 0$, $I_p''(p) = 1/(p(1-p))$ and

$$I_p'''(x) = \frac{1}{(1-x)^2} - \frac{1}{x^2}. \tag{8.6.2}$$

The first assertion of the lemma follows easily by Taylor expansion, using the above identities. For the second assertion, note that if $p \to 0$, $x \to 0$ and $p \ll x$, then

$$I_p(p+x) = (p+x)\log\frac{p+x}{p} + (1-p-x)\log\frac{1-p-x}{1-p} = x\log\frac{x}{p} + O(x).$$

This completes the proof of the lemma. □
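The two regimes of Lemma 8.11 can be illustrated numerically. The sketch below is not part of the original argument: the helper name `rel_entropy_shift` and the sample values of $p$ and $x$ are illustrative assumptions, and `log1p` is used only to limit floating-point cancellation.

```python
import math

def rel_entropy_shift(p, x):
    """I_p(p + x) = (p+x) log((p+x)/p) + (1-p-x) log((1-p-x)/(1-p)).

    Written with log1p, since (p+x)/p = 1 + x/p and
    (1-p-x)/(1-p) = 1 - x/(1-p).
    """
    return (p + x) * math.log1p(x / p) + (1 - p - x) * math.log1p(-x / (1 - p))

# Regime 1 (x << p): compare with x^2 / (2p).
p, x = 1e-6, 1e-9
ratio_small = rel_entropy_shift(p, x) / (x ** 2 / (2 * p))

# Regime 2 (p << x -> 0): compare with x log(x/p).
# The O(x) correction is only of relative size 1/log(x/p),
# so p is taken extremely small to see a ratio close to 1.
p, x = 1e-200, 1e-3
ratio_large = rel_entropy_shift(p, x) / (x * math.log(x / p))

print(ratio_small, ratio_large)
```

Both ratios should be close to 1; the second converges only logarithmically, which reflects the $O(x)$ error term in the proof.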

Lemma 8.12 There exists $p_0 > 0$ such that for all $0 < p \le p_0$ and $0 \le x \le b \le 1 - p - 1/\log(1/p)$,

$$I_p(p+x) \ge (x/b)^2 I_p(p+b).$$

Proof Let $q := 1 - p - 1/\log(1/p)$. Let $f(t) := I_p(p+\sqrt{t})$. Note that

$$f''(t) = \frac{d}{dt}\left(\frac{1}{2\sqrt{t}}\, I_p'(p+\sqrt{t})\right) = \frac{1}{4t}\, I_p''(p+\sqrt{t}) - \frac{1}{4t^{3/2}}\, I_p'(p+\sqrt{t}).$$

Let $g(x) := 4x^3 f''(x^2)$. Then by the above formula,

$$g(x) = x I_p''(p+x) - I_p'(p+x).$$

By (8.6.2), this shows that

$$g'(x) = x I_p'''(p+x) \begin{cases} \le 0 & \text{if } p+x \le 1/2, \\ \ge 0 & \text{if } p+x \ge 1/2. \end{cases} \tag{8.6.3}$$

Now note that $g(0) = 0$, and as $p \to 0$,

$$g(q) = \frac{q\log(1/p)}{1 - 1/\log(1/p)} - \log\frac{(1-p)(\log(1/p) - 1)}{p} = -\log\log(1/p) + O(1).$$

Thus, there exists $p_0$ small enough such that if $p \le p_0$, then $g(q) < 0$. By (8.6.3), this implies that $g(x) \le 0$ for all $x \in [0,q]$. In particular, $f$ is concave in the interval $[0,q^2]$. Thus, for any $0 \le x \le b \le q$,

$$I_p(p+x) = f(x^2) \ge (x^2/b^2) f(b^2) = (x/b)^2 I_p(p+b),$$

which completes the proof. □

Corollary 8.1 There is some $p_0 > 0$ such that when $0 < p \le p_0$ and $0 \le x \le 1-p$,

$$I_p(p+x) \ge x^2 I_p(1 - 1/\log(1/p)).$$

Proof Let $p_0$ be as in Lemma 8.12 and let $b = 1 - p - 1/\log(1/p)$. When $0 \le x \le b$, the inequality follows by Lemma 8.12 since $b < 1$. When $b \le x \le 1-p$, we trivially deduce $I_p(p+x) \ge I_p(p+b) \ge x^2 I_p(p+b)$. □
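As a sanity check on Lemma 8.12, the inequality $I_p(p+x) \ge (x/b)^2 I_p(p+b)$ can be tested on a grid. This is a minimal sketch under stated assumptions: the value $p = 10^{-4}$ is an illustrative choice for which $g(q) < 0$ can be verified by hand, and the function names are hypothetical.

```python
import math

def rel_entropy(x, p):
    """I_p(x) = x log(x/p) + (1-x) log((1-x)/(1-p)), with 0 log 0 = 0."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / p)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - p))
    return total

def worst_ratio(p, steps=60):
    """Smallest value of I_p(p+x) b^2 / (x^2 I_p(p+b)) over a grid 0 < x <= b <= q."""
    q = 1 - p - 1 / math.log(1 / p)
    worst = math.inf
    for j in range(1, steps + 1):
        b = q * j / steps
        for i in range(1, j + 1):
            x = q * i / steps
            ratio = rel_entropy(p + x, p) * b * b / (x * x * rel_entropy(p + b, p))
            worst = min(worst, ratio)
    return worst

# The lemma predicts a ratio of at least 1, with equality when x = b.
print(worst_ratio(1e-4))
```

A grid check of this kind does not replace the concavity argument; it only confirms that the reconstructed statement is consistent for one small value of $p$.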

Now let

$$\phi_p(\delta) := \frac{1}{2}\inf\{I_p(f) : f \in \mathcal{W},\ T(f) \ge (1+\delta)p^3\}.$$

Note that unlike $\phi_p(n,\delta)$, $\phi_p(\delta)$ does not depend on $n$. The following lemma compares $\phi_p(n,\delta)$ and $\phi_p(\delta)$.

Lemma 8.13 For any $p$, $n$ and $\delta$,

$$\phi_p(\delta) \le \frac{1}{n^2}\,\phi_p(n,\delta) + \frac{1}{2n}\log\frac{1}{1-p}.$$

Proof For any $x \in \mathcal{P}_n$, construct a graphon $f$ in the natural way, by defining $f(s,t) = x_{ij}$ if either $(s,t)$ or $(t,s)$ belongs to the square $[(i-1)/n, i/n) \times [(j-1)/n, j/n)$ for some $1 \le i \le j \le n$, where $x_{ii}$ is taken to be zero. It is easy to verify that $T(f) = T(x)$, and

$$n^2 I_p(f) = 2I_p(x) + nI_p(0).$$

Since $I_p(0) = \log(1/(1-p))$, dividing by $2n^2$ on both sides and taking infimum over $x$ completes the proof. □

We are now ready to prove Theorem 8.7.

Proof (Proof of Theorem 8.7) Throughout this proof, $C$ will denote any universal constant and $o(1)$ will denote any quantity that depends only on $p$ and tends to zero as $p \to 0$. Take any $f$ such that $T(f) \ge (1+\delta)p^3$. First, we wish to show that

$$\frac{1}{2} I_p(f) \ge (1-o(1))\min\left\{\frac{\delta^{2/3}}{2}, \frac{\delta}{3}\right\} p^2\log(1/p). \tag{8.6.4}$$

By Lemma 8.13, this will show that

$$\liminf \frac{\phi_p(n,\delta)}{n^2 p^2\log(1/p)} \ge \min\left\{\frac{\delta^{2/3}}{2}, \frac{\delta}{3}\right\},$$

proving one direction of (8.6.1). Assume that

$$\frac{1}{2} I_p(f) \le \min\left\{\frac{\delta^{2/3}}{2}, \frac{\delta}{3}\right\} p^2\log(1/p). \tag{8.6.5}$$

We can make this assumption because if there exists no such $f$ (among all $f$ satisfying $T(f) \ge (1+\delta)p^3$), then the proof of (8.6.4) is automatically complete. Since $I_p$ is decreasing in $[0,p]$ and increasing in $[p,1]$, we may assume without loss of generality that $f \ge p$ everywhere. (Otherwise, it suffices to prove (8.6.4) for $f' := \max\{f,p\}$, because $T(f') \ge T(f) \ge (1+\delta)p^3$ and $I_p(f') \le I_p(f)$.) Let $g := f - p$. Then

$$T(f) = T(p+g) = p^3 + T(g) + 3pR(g) + 3p^2 S(g), \tag{8.6.6}$$

where

$$R(g) := \int_{[0,1]^3} g(x,y)g(y,z)\,dx\,dy\,dz$$

and

$$S(g) := \int_{[0,1]^2} g(x,y)\,dx\,dy.$$

Now, if $S(g) \ge p^{3/2}\log(1/p)$, then by Lemma 8.11,

$$I_p(f) = I_p(p+g) \ge I_p(p+S(g)) \ge I_p(p + p^{3/2}\log(1/p)) \gg p^2\log(1/p),$$

which proves (8.6.4). So let us assume that

$$S(g) < p^{3/2}\log(1/p). \tag{8.6.7}$$

Under the above assumption, (8.6.6) gives

$$T(g) + 3pR(g) \ge (\delta - o(1))p^3.$$

For each $x \in [0,1]$, let

$$d(x) := \int_0^1 g(x,y)\,dy,$$

and for each $b \in [0,1]$, let

$$B_b := \{x \in [0,1] : d(x) \ge b\}.$$

Note that by Jensen's inequality,

$$I_p(f) = I_p(p+g) \ge \int_0^1 I_p(p+d(x))\,dx.$$

Thus,

$$\mathrm{Leb}(B_b) \le \frac{\int_0^1 I_p(p+d(x))\,dx}{I_p(p+b)} \le \frac{I_p(f)}{I_p(p+b)}. \tag{8.6.8}$$

Therefore by the assumption (8.6.5), we get

$$\mathrm{Leb}(B_b) \le \frac{Cp^2\log(1/p)}{I_p(p+b)}.$$

If $b$ varies with $p$ such that $p \ll b < 1-p$, then by Lemma 8.11, the above inequality gives

$$\mathrm{Leb}(B_b) \le \frac{Cp^2\log(1/p)}{b\log(b/p)} \le \frac{Cp^2}{b}. \tag{8.6.9}$$

Thus,

$$\int_{B_b\times B_b} g(x,y)^2\,dx\,dy \le \mathrm{Leb}(B_b)^2 \le \frac{Cp^4}{b^2}. \tag{8.6.10}$$

Let $D$ be the set of all $(x,y,z) \in [0,1]^3$ such that at least one of $x$, $y$ and $z$ belongs to $B_b$. Choose $b$ depending on $p$ such that $\sqrt{p}\,\log(1/p) \ll b \ll 1$. Then by (8.6.7) and (8.6.9),

$$\int_D g(x,y)g(y,z)g(z,x)\,dx\,dy\,dz \le 3\,\mathrm{Leb}(B_b)\,S(g) \ll p^3. \tag{8.6.11}$$

Let $B_b^c := [0,1]\setminus B_b$, and let

$$\delta_1 := \frac{1}{p^2}\int_{B_b\times B_b^c} g(x,y)^2\,dx\,dy$$

and

$$\delta_2 := \frac{1}{p^2}\int_{B_b^c\times B_b^c} g(x,y)^2\,dx\,dy.$$

By the inequality (8.6.11) and the generalized Hölder inequality from Chap. 2 (Theorem 2.1),

$$T(g) = \int_{B_b^c\times B_b^c\times B_b^c} g(x,y)g(y,z)g(z,x)\,dx\,dy\,dz + o(p^3) \le (\delta_2^{3/2} + o(1))p^3. \tag{8.6.12}$$

Next, by Lemma 8.12 and Jensen's inequality,

$$I_p(f) = I_p(p+g) \ge \int_0^1 I_p(p+d(x))\,dx \ge \int_{B_b^c} I_p(p+d(x))\,dx \ge \frac{I_p(p+b)}{b^2}\int_{B_b^c} d(x)^2\,dx.$$

Proceeding as in the proof of (8.6.9), this gives

$$\int_{B_b^c} d(x)^2\,dx \le \frac{b^2 I_p(f)}{I_p(p+b)} \le Cp^2 b = o(p^2).$$

Therefore,

$$R(g) = \int_0^1 d(x)^2\,dx = \int_{B_b} d(x)^2\,dx + o(p^2) \le \int_{B_b}\int_0^1 g(x,y)^2\,dy\,dx + o(p^2).$$

Combining this with Eq. (8.6.10), we get

$$R(g) \le (\delta_1 + o(1))p^2. \tag{8.6.13}$$

By Eqs. (8.6.8), (8.6.12) and (8.6.13), we get

$$3\delta_1 + \delta_2^{3/2} \ge \delta - o(1).$$

Finally, by Corollary 8.1,

$$I_p(f) = I_p(p+g) \ge (1-o(1))(2\delta_1 + \delta_2)p^2\log(1/p),$$

where the $o(1)$ term appears because

$$I_p(1 - 1/\log(1/p)) = (1-o(1))\log(1/p).$$

Combining all of the above, we get

$$\liminf_{p\to 0} \frac{\phi_p(\delta)}{p^2\log(1/p)} \ge \inf\left\{\delta_1 + \frac{1}{2}\delta_2 : \delta_1 \ge 0,\ \delta_2 \ge 0,\ 3\delta_1 + \delta_2^{3/2} \ge \delta\right\}. \tag{8.6.14}$$

Now take any $a \ge 0$, and for $0 \le x \le a/3$, let

$$f(x) := x + \frac{1}{2}(a - 3x)^{2/3}.$$

Since $f'(x) = 1 - (a-3x)^{-1/3}$, an easy verification shows that depending on the value of $a$, either $f$ is decreasing throughout the interval $[0,a/3]$ (when $a \le 1$), or increases up to a maximum value and then decreases (when $a > 1$). In either case, the minimum value of $f$ is achieved at one of the two endpoints of the interval $[0,a/3]$. This shows that under the constraint $3\delta_1 + \delta_2^{3/2} = a$, the minimum value of $\delta_1 + \delta_2/2$ is attained at either $\delta_1 = 0$ or $\delta_2 = 0$. Since this is true for any $a$, the minimization in (8.6.14) is attained when either $\delta_1 = 0$ or $\delta_2 = 0$. From this, (8.6.4) follows easily.

Let us now prove the other direction of (8.6.1), that is,

$$\limsup \frac{\phi_p(n,\delta)}{n^2 p^2\log(1/p)} \le \min\left\{\frac{\delta^{2/3}}{2}, \frac{\delta}{3}\right\}. \tag{8.6.15}$$
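The endpoint claim used to pass from (8.6.14) to (8.6.4) can be checked numerically. The following sketch is an illustration only: it parametrizes the boundary $3\delta_1 + \delta_2^{3/2} = \delta$ on a grid (the objective is increasing in both variables, so the infimum lies on this boundary) and compares the grid minimum with $\min\{\delta^{2/3}/2, \delta/3\}$. The function names and grid size are assumptions made for the example.

```python
def constrained_min(delta, steps=20000):
    """Grid minimum of d1 + d2/2 subject to 3*d1 + d2**1.5 == delta, d1, d2 >= 0."""
    best = float("inf")
    for i in range(steps + 1):
        d1 = (delta / 3) * i / steps
        # Clamp at zero to guard against tiny negative values from rounding.
        d2 = max(delta - 3 * d1, 0.0) ** (2 / 3)
        best = min(best, d1 + d2 / 2)
    return best

def closed_form(delta):
    """min{delta^(2/3)/2, delta/3}, the constant appearing in Theorem 8.7."""
    return min(delta ** (2 / 3) / 2, delta / 3)

for delta in (0.5, 1.0, 27 / 8, 5.0, 20.0):
    print(delta, constrained_min(delta), closed_form(delta))
```

The two branches cross at $\delta = 27/8$: below that value the minimum is $\delta/3$ (attained at $\delta_2 = 0$), and above it the minimum is $\delta^{2/3}/2$ (attained at $\delta_1 = 0$).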

First let $a$ be an integer, to be chosen later, and let $x \in \mathcal{P}_n$ be the element defined as $x_{ij} = 1$ if $1 \le i < j \le a$, and $x_{ij} = p$ otherwise. Then it is easy to see that

$$T(x) \ge \frac{1}{n^3}\left(a(a-1)(a-2) + (n(n-1)(n-2) - a(a-1)(a-2))p^3\right)$$

and

$$I_p(x) = \frac{a(a-1)}{2}\log(1/p).$$

Thus, we can choose $a$ such that $a = (\delta^{1/3} + o(1))np$, $T(x) \ge (1+\delta)p^3$, and

$$I_p(x) = \left(\frac{\delta^{2/3}}{2} + o(1)\right) n^2 p^2\log(1/p).$$

Next, let $x$ be defined as $x_{ij} = 1$ if $1 \le i \le a$ and $i < j$, and $x_{ij} = p$ otherwise. Then

$$T(x) \ge \frac{1}{n^3}\left(3a(n-a)(n-a-1)p + (n-a)(n-a-1)(n-a-2)p^3\right)$$

and

$$I_p(x) = a\left(n - \frac{a+1}{2}\right)\log(1/p).$$

Thus, we can choose $a$ such that $a = (\delta/3 + o(1))p^2 n$, $T(x) \ge (1+\delta)p^3$ and

$$I_p(x) = \left(\frac{\delta}{3} + o(1)\right) n^2 p^2\log(1/p).$$

Combining the two bounds proves (8.6.15). □
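The two constructions in the proof of (8.6.15), a planted clique and a planted set of hub vertices, can be verified by brute force for small $n$. This is a sketch under assumptions: $T(x)$ is taken to be $n^{-3}\sum_{i,j,k} x_{ij}x_{jk}x_{ki}$ with zero diagonal and $I_p(x) = \sum_{i<j} I_p(x_{ij})$, which is how the displayed bounds read; the helper names and the values of $n$, $p$ and $a$ are illustrative.

```python
import math

def rel_entropy(x, p):
    """I_p(x) = x log(x/p) + (1-x) log((1-x)/(1-p)), with 0 log 0 = 0."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / p)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - p))
    return total

def planted(n, p, is_one):
    """Symmetric n x n matrix with zero diagonal; entry 1 where is_one(i, j) for i < j, else p."""
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            x[i][j] = x[j][i] = 1.0 if is_one(i + 1, j + 1) else p
    return x

def triangle_density(x):
    """Assumed form of T(x): n^{-3} times the sum of x_ij x_jk x_ki over all triples."""
    n = len(x)
    total = sum(x[i][j] * x[j][k] * x[k][i]
                for i in range(n) for j in range(n) for k in range(n))
    return total / n ** 3

def total_entropy(x, p):
    """Assumed form of I_p(x): sum of I_p(x_ij) over pairs i < j."""
    n = len(x)
    return sum(rel_entropy(x[i][j], p) for i in range(n) for j in range(i + 1, n))

n, p, a = 12, 0.1, 4
clique = planted(n, p, lambda i, j: j <= a)  # x_ij = 1 if 1 <= i < j <= a
hub = planted(n, p, lambda i, j: i <= a)     # x_ij = 1 if 1 <= i <= a and i < j

clique_bound = (a * (a - 1) * (a - 2)
                + (n * (n - 1) * (n - 2) - a * (a - 1) * (a - 2)) * p ** 3) / n ** 3
hub_bound = (3 * a * (n - a) * (n - a - 1) * p
             + (n - a) * (n - a - 1) * (n - a - 2) * p ** 3) / n ** 3

print(triangle_density(clique), clique_bound, total_entropy(clique, p))
print(triangle_density(hub), hub_bound, total_entropy(hub, p))
```

The entropy identities hold exactly because $I_p(1) = \log(1/p)$ and $I_p(p) = 0$, so only the entries equal to 1 contribute.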

Bibliographical Notes

The theory developed in the earlier chapters applies only to dense random graphs. However, most graphs that arise in applications are sparse. The main roadblock to developing the analogous theory for sparse graphs is the unavailability of a suitable sparse version of graph limit theory. Although a number of attempts have been made towards extending graph limit theory to the sparse setting, such as in Bollobás and Riordan [2] and Borgs et al. [3, 4], an effective version of a sparse 'counting lemma' that connects the cut metric with subgraph counts is still missing.

There are numerous powerful upper bounds for tail probabilities of nonlinear functions of independent Bernoulli random variables, such as the bounded difference inequality popularized by McDiarmid [18], and the inequalities of Talagrand [19], Latała [16], Kim and Vu [14, 15], Vu [20] and Janson et al. [13]. However, these upper bounds generally do not yield the right constants in the exponent, which is the goal of large deviations theory. Even establishing the right powers of $n$ and $p$ in the rate functions for subgraph densities has been quite challenging; see, for example, Chatterjee [7] and DeMarco and Kahn [10, 11].

Motivated by the problem of understanding the large deviations of subgraph densities in sparse random graphs, the nonlinear large deviation theory presented in this chapter was developed in Chatterjee and Dembo [8]. The technique used for proving Lemmas 8.1 and 8.2 was originally developed in Chatterjee [5, 6] and Chatterjee and Dey [9] in the context of developing rigorous formulations and proofs of mean field equations. The move from mean field equations to the mean field approximation of Theorem 8.1 required the low complexity gradient condition, which was the main new idea in Chatterjee and Dembo [8]. The proof of Theorem 8.6 presented here is slightly shorter than the one in [8], eliminating the need for using a result of Janson et al. [13], which would have required a large number of additional pages if I attempted to present the proof here. This comes at the cost of a slightly worse error bound.

Theorem 8.7 was proved in Lubetzky and Zhao [17]. The proof presented here is almost a verbatim copy of the proof in Lubetzky and Zhao [17]. The error bounds in Chatterjee and Dembo [8] allow $p$ to go to zero slower than $n^{-1/42}$ in Theorem 8.7. The slightly worse error bound of Theorem 8.6 is the reason why $p$ needs to decay slower than $n^{-1/158}$ here. However, these are all very far from the conjectured threshold: it is believed that Theorem 8.7 holds whenever $p \to 0$ slower than $n^{-1/2}$. Incidentally, Lubetzky and Zhao [17] show that with the use of a weak version of


Szemerédi's regularity lemma, it is possible to prove a sparse large deviation result as long as $p$ tends to zero slower than a negative power of $\log n$. Lubetzky and Zhao [17] also proved analogous results for the density of cliques. These results were extended to general homomorphism densities by Bhattacharya et al. [1], who obtained the following beautiful explicit formula for the rate function for general homomorphism densities.

Take any finite simple graph $H$ with maximum degree $\Delta$. Let $H^*$ be the induced subgraph of $H$ on all vertices whose degree in $H$ is $\Delta$. Recall that an independent set in a graph is a set of vertices such that no two are connected by an edge. Also, recall that a graph is called regular if all its vertices have the same degree, and irregular otherwise. Define a polynomial

$$P_{H^*}(x) := \sum_k i_{H^*}(k)\,x^k,$$

where $i_{H^*}(k)$ is the number of $k$-element independent sets in $H^*$.

Theorem 8.8 (Bhattacharya et al. [1]) Let $H$ be a connected finite simple graph on $k$ vertices with maximum degree $\Delta \ge 2$. Then for any $\delta > 0$, there is a unique positive number $\theta = \theta(H,\delta)$ that solves $P_{H^*}(\theta) = 1 + \delta$, where $P_{H^*}$ is the polynomial defined above. Let $H_{n,p}$ be the number of homomorphisms of $H$ into an Erdős–Rényi $G(n,p)$ random graph. Then there is a constant $\alpha_H > 0$ depending only on $H$, such that if $n \to \infty$ and $p \to 0$ slower than $n^{-\alpha_H}$, then for any $\delta > 0$,

$$P(H_{n,p} \ge (1+\delta)E(H_{n,p})) = \exp\left(-(1+o(1))\,c(\delta)\,n^2 p^{\Delta}\log\frac{1}{p}\right),$$

where

$$c(\delta) = \begin{cases} \min\{\theta, \tfrac{1}{2}\delta^{2/k}\} & \text{if } H \text{ is regular,} \\ \theta & \text{if } H \text{ is irregular.} \end{cases}$$
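To make the recipe in Theorem 8.8 concrete, here is a small sketch that computes $c(\delta)$ by brute force for a given graph. It assumes the empty set is counted as an independent set, so that $i_{H^*}(0) = 1$ and $P_{H^*}(0) = 1$; the function names and the bisection solver are illustrative choices and are not taken from [1].

```python
from itertools import combinations

def independence_counts(num_vertices, edges):
    """Return [i(0), i(1), ...], where i(k) counts k-element independent sets."""
    edge_set = {frozenset(e) for e in edges}
    counts = []
    for k in range(num_vertices + 1):
        counts.append(sum(
            1 for subset in combinations(range(num_vertices), k)
            if all(frozenset(pair) not in edge_set for pair in combinations(subset, 2))
        ))
    return counts

def solve_theta(counts, delta, tol=1e-12):
    """Positive root of P(theta) = 1 + delta by bisection (P is increasing on [0, inf))."""
    poly = lambda t: sum(c * t ** k for k, c in enumerate(counts))
    lo, hi = 0.0, 1.0
    while poly(hi) < 1 + delta:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if poly(mid) < 1 + delta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rate_constant(num_vertices, edges, delta):
    """c(delta) as in Theorem 8.8, for a connected graph on vertices 0..num_vertices-1."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    max_deg = max(degree)
    top = [v for v in range(num_vertices) if degree[v] == max_deg]
    index = {v: i for i, v in enumerate(top)}
    # H* is the subgraph induced on the vertices of maximum degree.
    star_edges = [(index[u], index[v]) for u, v in edges if u in index and v in index]
    theta = solve_theta(independence_counts(len(top), star_edges), delta)
    if len(top) == num_vertices:  # H is regular
        return min(theta, 0.5 * delta ** (2 / num_vertices))
    return theta

triangle = [(0, 1), (1, 2), (0, 2)]
print(rate_constant(3, triangle, 1.0))
```

For the triangle, $H^* = H$ and $P_{H^*}(x) = 1 + 3x$, so $\theta = \delta/3$ and $c(\delta) = \min\{\delta/3, \delta^{2/3}/2\}$, which matches the constant in Theorem 8.7.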

The formula given in Theorem 8.8 is more than just a formula. It gives a hint at the conditional structure of the graph, and at the nature of phase transitions as ı varies. Unlike the dense case, it is hard to give a precise meaning to claims about the conditional structure in the sparse setting due to the lack of an adequate sparse graph limit theory. The paper [1] also gives a number of examples where the coefficient c.ı/ in Theorem 8.8 can be explicitly computed. For instance, if H D C4 , the cycle of length four, then

c.ı/ D

8 p