Mathematical foundations and applications of graph entropy 9783527339099, 3527339094, 9783527693221, 352769322X, 9783527693245, 3527693246

The book introduces to the reader a number of cutting edge statistical methods which can be used for the analysis of geno


English Pages [299] Year 2016




Edited by Matthias Dehmer, Frank Emmert-Streib, Zengqiang Chen, Xueliang Li, and Yongtang Shi Mathematical Foundations and Applications of Graph Entropy

“Quantitative and Network Biology” Series editors M. Dehmer and F. Emmert-Streib

Advisory Board: Albert-László Barabási Northeastern University & Harvard Medical School, USA Douglas Lauffenburger Massachusetts Institute of Technology, USA Satoru Miyano University of Tokyo, Japan Ilya Shmulevich Institute for Systems Biology & University of Washington, USA

Previous Volumes of this Series:

Volume 1
Dehmer, M., Emmert-Streib, F., Graber, A., Salvador, A. (eds.)
Applied Statistics for Network Biology: Methods in Systems Biology
2011, ISBN: 978-3-527-32750-8

Volume 2
Dehmer, M., Varmuza, K., Bonchev, D. (eds.)
Statistical Modelling of Molecular Descriptors in QSAR/QSPR
2012, ISBN: 978-3-527-32434-7

Volume 3
Emmert-Streib, F., Dehmer, M. (eds.)
Statistical Diagnostics for Cancer: Analyzing High-Dimensional Data
2013, ISBN: 978-3-527-32434-7

Volume 4
Emmert-Streib, F., Dehmer, M. (eds.)
Advances in Network Complexity
2013, ISBN: 978-3-527-33291-5

Volume 5
Dehmer, M., Emmert-Streib, F., Pickl, S. (eds.)
Computational Network Theory
2015, ISBN: 978-3-527-33724-8

Volume 6
Dehmer, M., Chen, Z., Li, X., Shi, Y., Emmert-Streib, F.
Mathematical Foundations and Applications of Graph Entropy
2017, ISBN: 978-3-527-33909-9

Quantitative and Network Biology Series Editors M. Dehmer and F. Emmert-Streib Volume 6

Mathematical Foundations and Applications of Graph Entropy

Edited by Matthias Dehmer, Frank Emmert-Streib, Zengqiang Chen, Xueliang Li, and Yongtang Shi

The Editors

Prof. Matthias Dehmer
Nankai University, College of Computer and Control Engineering, Tianjin 300071, PR China; and UMIT – The Health & Life Sciences University, Department of Biomedical Computer Sciences and Mechatronics, Eduard-Wallnöfer-Zentrum 1, 6060 Hall/Tyrol, Austria

Prof. Frank Emmert-Streib
Tampere University of Technology, Predictive Medicine and Analytics Lab, Department of Signal Processing, Tampere, Finland

Prof. Zengqiang Chen
Nankai University, College of Computer and Control Engineering, Tianjin 300071, PR China

Prof. Xueliang Li
Nankai University, Center for Combinatorics, 94 Weijin Road, Tianjin 300071, China

Prof. Yongtang Shi
Nankai University, Center for Combinatorics, No. 94 Weijin Road, Tianjin 300071, China

All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Library of Congress Card No.: applied for

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at .

© 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Boschstr. 12, 69469 Weinheim, Germany

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Print ISBN: 978-3-527-33909-9
ePDF ISBN: 978-3-527-69322-1
ePub ISBN: 978-3-527-69325-2
Mobi ISBN: 978-3-527-69323-8
oBook ISBN: 978-3-527-69324-5

Typesetting: SPi Global, Chennai, India

Printed on acid-free paper

Contents

List of Contributors XI
Preface XV

1 Entropy and Renormalization in Chaotic Visibility Graphs 1
Bartolo Luque, Fernando Javier Ballesteros, Alberto Robledo, and Lucas Lacasa
1.1 Mapping Time Series to Networks 2
1.1.1 Natural and Horizontal Visibility Algorithms 4
1.1.2 A Brief Overview of Some Initial Applications 8
1.1.2.1 Seismicity 8
1.1.2.2 Hurricanes 8
1.1.2.3 Turbulence 9
1.1.2.4 Financial Applications 9
1.1.2.5 Physiology 9
1.2 Visibility Graphs and Entropy 10
1.2.1 Definitions of Entropy in Visibility Graphs 10
1.2.2 Pesin Theorem in Visibility Graphs 12
1.2.3 Graph Entropy Optimization and Critical Points 19
1.3 Renormalization Group Transformations of Horizontal Visibility Graphs 26
1.3.1 Tangent Bifurcation 29
1.3.2 Period-Doubling Accumulation Point 31
1.3.3 Quasi-Periodicity 32
1.3.4 Entropy Extrema and RG Transformation 34
1.3.4.1 Intermittency 35
1.3.4.2 Period Doubling 35
1.3.4.3 Quasi-periodicity 35
1.4 Summary 36
1.5 Acknowledgments 37
References 37

2 Generalized Entropies of Complex and Random Networks 41
Vladimir Gudkov
2.1 Introduction 41
2.2 Generalized Entropies 42
2.3 Entropy of Networks: Definition and Properties 43
2.4 Application of Generalized Entropy for Network Analysis 45
2.5 Open Networks 53
2.6 Summary 59
References 60

3 Information Flow and Entropy Production on Bayesian Networks 63
Sosuke Ito and Takahiro Sagawa
3.1 Introduction 63
3.1.1 Background 63
3.1.2 Basic Ideas of Information Thermodynamics 64
3.1.3 Outline of this Chapter 65
3.2 Brief Review of Information Contents 66
3.2.1 Shannon Entropy 66
3.2.2 Relative Entropy 67
3.2.3 Mutual Information 68
3.2.4 Transfer Entropy 69
3.3 Stochastic Thermodynamics for Markovian Dynamics 70
3.3.1 Setup 70
3.3.2 Energetics 72
3.3.3 Entropy Production and Fluctuation Theorem 73
3.4 Bayesian Networks 76
3.5 Information Thermodynamics on Bayesian Networks 79
3.5.1 Setup 79
3.5.2 Information Contents on Bayesian Networks 80
3.5.3 Entropy Production 83
3.5.4 Generalized Second Law 84
3.6 Examples 86
3.6.1 Example 1: Markov Chain 86
3.6.2 Example 2: Feedback Control with a Single Measurement 86
3.6.3 Example 3: Repeated Feedback Control with Multiple Measurements 89
3.6.4 Example 4: Markovian Information Exchanges 91
3.6.5 Example 5: Complex Dynamics 94
3.7 Summary and Prospects 95
References 96

4 Entropy, Counting, and Fractional Chromatic Number 101
Seyed Saeed Changiz Rezaei
4.1 Entropy of a Random Variable 102
4.2 Relative Entropy and Mutual Information 104
4.3 Entropy and Counting 104
4.4 Graph Entropy 107
4.5 Entropy of a Convex Corner 107
4.6 Entropy of a Graph 108
4.7 Basic Properties of Graph Entropy 110
4.8 Entropy of Some Special Graphs 112
4.9 Graph Entropy and Fractional Chromatic Number 116
4.10 Symmetric Graphs with respect to Graph Entropy 119
4.11 Conclusion 120
Appendix 4.A 121
References 130

5 Graph Entropy: Recent Results and Perspectives 133
Xueliang Li and Meiqin Wei
5.1 Introduction 133
5.2 Inequalities and Extremal Properties on (Generalized) Graph Entropies 139
5.2.1 Inequalities for Classical Graph Entropies and Parametric Measures 139
5.2.2 Graph Entropy Inequalities with Information Functions fV, fP and fC 141
5.2.3 Information Theoretic Measures of UHG Graphs 143
5.2.4 Bounds for the Entropies of Rooted Trees and Generalized Trees 146
5.2.5 Information Inequalities for If(G) based on Different Information Functions 148
5.2.6 Extremal Properties of Degree- and Distance-Based Graph Entropies 153
5.2.7 Extremality of Ifλ(G), If2(G), If3(G) and Entropy Bounds for Dendrimers 157
5.2.8 Sphere-Regular Graphs and the Extremality of Entropies If2(G) and Ifσ(G) 163
5.2.9 Information Inequalities for Generalized Graph Entropies 166
5.3 Relationships between Graph Structures, Graph Energies, Topological Indices, and Generalized Graph Entropies 171
5.4 Summary and Conclusion 179
References 180

6 Statistical Methods in Graphs: Parameter Estimation, Model Selection, and Hypothesis Test 183
Suzana de Siqueira Santos, Daniel Yasumasa Takahashi, João Ricardo Sato, Carlos Eduardo Ferreira, and André Fujita
6.1 Introduction 183
6.2 Random Graphs 184
6.3 Graph Spectrum 187
6.4 Graph Spectral Entropy 189
6.5 Kullback–Leibler Divergence 192
6.6 Jensen–Shannon Divergence 192
6.7 Model Selection and Parameter Estimation 193
6.8 Hypothesis Test between Graph Collections 195
6.9 Final Considerations 198
6.9.1 Model Selection for Protein–Protein Networks 199
6.9.2 Hypothesis Test between the Spectral Densities of Functional Brain Networks 200
6.9.3 Entropy of Brain Networks 200
6.10 Conclusions 200
6.11 Acknowledgments 201
References 201

7 Graph Entropies in Texture Segmentation of Images 203
Martin Welk
7.1 Introduction 203
7.1.1 Structure of the Chapter 203
7.1.2 Quantitative Graph Theory 204
7.1.3 Graph Models in Image Analysis 205
7.1.4 Texture 206
7.1.4.1 Complementarity of Texture and Shape 206
7.1.4.2 Texture Models 207
7.1.4.3 Texture Segmentation 208
7.2 Graph Entropy-Based Texture Descriptors 209
7.2.1 Graph Construction 210
7.2.2 Entropy-Based Graph Indices 211
7.2.2.1 Shannon's Entropy 212
7.2.2.2 Bonchev and Trinajstić's Mean Information on Distances 212
7.2.2.3 Dehmer Entropies 213
7.3 Geodesic Active Contours 214
7.3.1 Basic GAC Evolution for Grayscale Images 214
7.3.2 Force Terms 215
7.3.3 Multichannel Images 216
7.3.4 Remarks on Numerics 216
7.4 Texture Segmentation Experiments 217
7.4.1 First Synthetic Example 217
7.4.2 Second Synthetic Example 218
7.4.3 Real-World Example 220
7.5 Analysis of Graph Entropy-Based Texture Descriptors 221
7.5.1 Rewriting the Information Functionals 221
7.5.2 Infinite Resolution Limits of Graphs 222
7.5.3 Fractal Analysis 223
7.6 Conclusion 226
References 227

8 Information Content Measures and Prediction of Physical Entropy of Organic Compounds 233
Chandan Raychaudhury and Debnath Pal
8.1 Introduction 233
8.2 Method 236
8.2.1 Information Content Measures 236
8.2.2 Information Content of Partition of a Positive Integer 240
8.2.3 Information Content of Graph 243
8.2.3.1 Information Content of Graph on Vertex Degree 245
8.2.3.2 Information Content of Graph on Topological Distances 246
8.2.3.3 Information Content of Vertex-Weighted Graph 251
8.2.4 Information Content on the Shortest Molecular Path 251
8.2.4.1 Computation of Example Indices 252
8.3 Prediction of Physical Entropy 253
8.3.1 Prediction of Entropy using Information Theoretical Indices 254
8.4 Conclusion 256
8.5 Acknowledgment 257
References 257

9 Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data 259
André Calero Valdez, Matthias Dehmer, and Andreas Holzinger
9.1 Introduction 259
9.1.1 Challenges in Bibliometric Data Sets, or Why Should We Consider Entropy Measures? 260
9.1.2 Structure of this Chapter 261
9.2 State of the Art 261
9.2.1 Graphs and Text Mining 262
9.2.2 Graph Entropy for Data Mining and Knowledge Discovery 263
9.2.3 Graphs from Bibliometric Data 264
9.3 Identifying Collaboration Styles using Graph Entropy from Bibliometric Data 266
9.4 Method and Materials 266
9.5 Results 267
9.6 Discussion and Future Outlook 271
9.6.1 Open Problems 271
9.6.2 A Polite Warning 272
References 272

Index 275

List of Contributors

Fernando Javier Ballesteros
University of Valencia (UV), Astronomic Observatory, 2 Jose Beltran Street, CP E-46980 Paterna, Valencia, Spain

André Calero Valdez
RWTH Aachen University, Human Technology Research Center, Campus Boulevard 57, 52074 Aachen, Germany

Seyed Saeed Changiz Rezaei
1QBit Information Technology, 458–550 Burrard Street, Vancouver, BC V6C 2B5, Canada

Matthias Dehmer
Nankai University, College of Computer and Control Engineering, Tianjin 300071, PR China; and UMIT – The Health & Life Sciences University, Department of Biomedical Computer Sciences and Mechatronics, Eduard-Wallnöfer-Zentrum 1, 6060 Hall/Tyrol, Austria

Suzana de Siqueira Santos
University of São Paulo, Department of Computer Science, Institute of Mathematics and Statistics, Rua do Matão 1010 – Cidade Universitária, São Paulo – SP 05508-090, Brazil

Carlos Eduardo Ferreira
University of São Paulo, Department of Computer Science, Institute of Mathematics and Statistics, Rua do Matão 1010 – Cidade Universitária, São Paulo – SP 05508-090, Brazil

André Fujita
University of São Paulo, Department of Computer Science, Institute of Mathematics and Statistics, Rua do Matão 1010 – Cidade Universitária, São Paulo – SP 05508-090, Brazil

Vladimir Gudkov
University of South Carolina, Department of Physics and Astronomy, 712 Main Street, Columbia, SC 29208, USA

Andreas Holzinger
Medical University Graz, Research Unit HCI-KDD, Institute for Medical Informatics, Statistics and Documentation, Austria; and Graz University of Technology, Institute for Information Systems and Computer Media, Austria

Sosuke Ito
Tokyo Institute of Technology, Department of Physics, Oh-okayama 2-12-1, Meguro-ku, Tokyo 152-8551, Japan

Lucas Lacasa
Queen Mary University of London, School of Mathematical Sciences, Mile End Road, London E1 4NS, UK

Xueliang Li
Nankai University, Center for Combinatorics, 94 Weijin Road, Tianjin 300071, PR China

Bartolo Luque
Polytechnic University of Madrid (UPM), Department of Mathematics, 3, Cardenal Cisneros, CP 28040 Madrid, Spain

Debnath Pal
Indian Institute of Science, Department of Computational and Data Sciences, C. V. Raman Avenue, Bangalore 560012, India

Chandan Raychaudhury
Indian Institute of Science, Department of Computational and Data Sciences, C. V. Raman Avenue, Bangalore 560012, India

Alberto Robledo
National University of Mexico (UNAM), Institute of Physics and Center for Complexity Sciences, Mexico City, Distrito Federal CP 01000, Mexico

Takahiro Sagawa
The University of Tokyo, Department of Applied Physics, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan

João Ricardo Sato
Federal University of ABC, Center of Mathematics, Computation, and Cognition, Rua Santa Adelia 166, Santo André, SP 09210-170, Brazil

Daniel Yasumasa Takahashi
Princeton University, Department of Psychology and Neuroscience Institute, Green Hall, Princeton, NJ 08544, USA

Meiqin Wei
Nankai University, Center for Combinatorics, 94 Weijin Road, Tianjin 300071, PR China

Martin Welk
University for Health Sciences, Medical Informatics and Technology (UMIT), Institute for Biomedical Computer Science, Biomedical Image Analysis Division, Eduard-Wallnöfer-Zentrum 1, 6060 Hall/Tyrol, Austria

Preface

Graph entropy measures represent information-theoretic measures for characterizing networks quantitatively. The first concepts in this framework were developed in the 1950s for investigating biological and chemical systems. Seminal work on this problem was done by Rashevsky, Trucco, and Mowshowitz, who investigated entropy measures for quantifying the so-called structural information content of a graph. To date, numerous graph entropies have been developed and applied to various problems in theoretical and applied disciplines. Examples are biology, computational biology, mathematical chemistry, Web mining, and knowledge engineering. Many of the described quantitative measures have been used for capturing network complexity. However, network complexity can be captured in many different ways, and hence there is no single right measure. Consequently, developing efficient information-theoretic graph measures (graph entropies) has been intricate, and the overall process depends on the practical application at hand. That is why several of the graph entropies that have been developed have not been investigated mathematically. Graph entropies have been explored from different perspectives in a variety of disciplines, including discrete mathematics, computer science, finance, computational biology, knowledge mining, structural chemistry, and applications thereof such as structure-oriented drug design (quantitative structure–activity relationship/quantitative structure–property relationship, QSAR/QSPR). From a theoretical viewpoint, exploring properties of graph entropy measures has been crucial but intricate. Because some well-known graph entropies are not computable on large networks (e.g., Körner's entropy), proving mathematical properties and interrelations has been even more important. The main goal of this book is to present and explain methods for defining graph entropies meaningfully. Furthermore, it deals with applying different graph entropy concepts to completely different application areas. This book is intended for researchers, graduates, and advanced undergraduate students in the fields of mathematics, computer science, chemistry, computational physics, bioinformatics, and systems biology.

Many colleagues have provided us with input, help, and support before and during the preparation of this book. In particular, we would like to thank Abbe Mowshowitz, Maria and Gheorghe Duca, Andrey A. Dobrynin, Boris Furtula, Ivan Gutman, Bo Hu, D. D. Lozovanu, Alexei Levitchi, Andrei Perjan, Ricardo de Matos Simoes, Fred Sobik, Shailesh Tripathi, Kurt Varmuza, Guihai Yu, and Dongxiao Zhu. Apologies go to those whose names have been inadvertently missed. Furthermore, we would like to thank the editors from Wiley-Blackwell, who have always been available and helpful. Last but not least, Matthias Dehmer thanks the Austrian Science Funds (project P22029-N13) for supporting this book. Zengqiang Chen, Matthias Dehmer, Xueliang Li, and Yongtang Shi thank the National Natural Science Foundation of China and Nankai University for their support. Matthias Dehmer also thanks his sister Marion Dehmer-Sehn, who passed away in 2012, for giving love and mental support.

To the best of our knowledge, this book is the first of its kind that is dedicated exclusively to graph entropy. Existing books dealing with related topics such as network complexity and complex networks have limited scope in the sense that they only consider specialized graph measures for specific applications. Therefore, we believe that this book will broaden the scope of scientists dealing with graph entropies. Finally, we hope this book conveys the enthusiasm and joy we have for this field and inspires fellow researchers in their own practical or theoretical work.

Hall/Tyrol, Tianjin, and Tampere
Matthias Dehmer, Frank Emmert-Streib, Zengqiang Chen, Xueliang Li, and Yongtang Shi

1 Entropy and Renormalization in Chaotic Visibility Graphs
Bartolo Luque, Fernando Javier Ballesteros, Alberto Robledo, and Lucas Lacasa

In this chapter, we concentrate on a mapping from time series to graphs: the visibility algorithm introduced by Lacasa et al. [1]. Among its most relevant features, we stress its intrinsic nonlocality, low computational cost, straightforward implementation, and rather "simple" way of inheriting the properties of the time series in the structure of the associated graphs. These features make it easier to find connections between the underlying processes and the networks obtained from them by a direct analysis of the latter. In particular, in this chapter we will focus on the implementation of the visibility algorithm for three known routes to chaos. We will define a graph entropy and a renormalization process for visibility graphs that characterize these routes, and analyze the relationship between the renormalization flow and the extrema of the entropy function.

Disregarding any underlying process, we can consider a time series simply as an ordered set of values and transform this set into a different mathematical object with the aid of an abstract mapping [2, 3]. We can then ask which properties of the original set are conserved, which are transformed and how, and what we can say about one of the mathematical representations just by looking at the other. This exercise is of mathematical interest by itself. In addition, it turns out that time series (signals) are a universal way of extracting information from dynamical systems in any field of science. Therefore, the preceding mathematical mapping gains some unexpected practical interest, as it opens the possibility of analyzing a time series from an alternative point of view. Of course, the relevant information stored in the original time series should be somehow conserved in the mapping. The motivation is completed when the new representation belongs to a relatively mature mathematical field, where information encoded in such a representation can be effectively disentangled and processed. This is, precisely, the first motivation to map time series into networks. This motivation is reinforced by two interconnected factors: (i) Although a mature field, time series analysis has some limitations when it comes to studying so-called complex signals. Beyond the linear regime, there exists a wide range of phenomena which are usually embraced in the field of so-called complex systems. Dynamical phenomena such as chaos, long-range correlated stochastic processes, intermittency, and multifractality are examples of complex phenomena, where time series analysis is pushed to its own limits. Nonlinear time series analysis develops from techniques such as nonlinear correlation functions, embedding algorithms, multifractal spectra, and projection theorem tools, which increase in complexity in parallel with the complexity of the process/series under study. New approaches to deal with complexity are not only welcome, but needed. Approaches that deal with the intrinsic nonlinearity by being intrinsically nonlinear themselves, and that handle the possible multiscale character of the underlying process by being designed to naturally incorporate multiple scales: such is the framework of networks, of graph theory. (ii) The technological era brings us the possibility of digitally analyzing myriads of data in a glimpse. Massive data sets can nowadays be parsed, and with the aid of well-suited algorithms, we can gain access to and filter data from many processes, be they of physical, technological, or even social origin.

1.1 Mapping Time Series to Networks

The idea of mapping time series into graphs seems attractive because it bridges two prolific fields of modern science, nonlinear signal analysis and complex network theory; indeed, it has attracted the attention of several research groups, which have contributed to the topic with different mapping strategies. We shall briefly outline some of them. Zhang and Small [4] developed a method that mapped each cycle of a pseudoperiodic time series into a node in a graph. The connection between nodes was established by a distance threshold in the reconstructed phase space when possible, or by the linear correlation coefficient between cycles in the presence of noise. Noisy periodic time series mapped into random graphs, while chaotic time series mapped into scale-free, small-world networks due to the presence of unstable periodic orbits. This method was subsequently applied to characterize cardiac dynamics. Xu, in collaboration with Zhang and Small [5], concentrated on the relative frequencies of appearance of four-node motifs inside a particular graph to classify it into a particular superfamily of networks, which corresponded to specific underlying dynamics of the mapped time series. In this case, the mapping method consisted of embedding the time series in an appropriate phase space, where each point corresponded to a node in the network. A threshold was imposed not only on the minimum distance between two neighbors for them to be eligible (the temporal separation should be greater than the mean period of the data), but also on the maximum number of neighbors a node could have. Different superfamilies were found for chaotic, hyperchaotic, random, and noisy periodic underlying dynamics, and unique fingerprints were also found for specific dynamical systems within a family.

Donner et al. [6–8] presented a technique based on the properties of recurrence in the phase space of a dynamical system. More precisely, the recurrence matrix obtained by imposing a threshold on the minimum distance between two points in the phase space was interpreted as the adjacency matrix of an undirected, unweighted graph (as in Ref. [5]). Properties of such graphs at three different scales (local, intermediate, and global) were presented and studied on several paradigmatic systems (Hénon map, Rössler system, Lorenz system, and Bernoulli map). The variation of some of the properties of the graphs with the distance threshold was analyzed; the use of specific measures, such as the local clustering coefficient, was proposed as a way to detect dynamically invariant objects (saddle points or unstable periodic orbits); and studying the graph properties as a function of the embedding dimension was suggested as a means to distinguish between chaotic and stochastic systems. The Amaral Lab [9] contributed, along the lines of Shirazi et al. [10], Strozzi et al. [11], and Haraguchi et al. [12], the idea of a surjective mapping which admits an inverse operation. This approach opens the reciprocal possibility of benefiting from time series analysis to study the structure and properties of networks. Time series are treated as Markov processes, and values are grouped in quantiles, which correspond to nodes in the associated graph. Weighted and directed connections are established between nodes as a function of the probability of transition between quantiles. An inverse operation can be defined without a priori knowledge of the correspondence between nodes and quantiles just by imposing a continuity condition on the time series by means of a cost function defined on the weighted adjacency matrix of the graph. A random walk is performed on the network, and a time series with properties equivalent to the original one is recovered. This method was applied to a battery of cases, which included a periodic-to-random family of processes parameterized by a probability of transition, a pair of chaotic systems (Lorenz and Rössler attractors), and two human heart rate time series. Reciprocally, the inverse map was applied to the metabolic network of Arabidopsis thaliana and to the 1997 Internet network. In the same vein of an inverse transformation, Shimada et al. [13] proposed a framework to transform a complex network into a time series, realized by multidimensional scaling. Applying the transformation method to a model proposed by Watts and Strogatz [14], they show that ring lattices are transformed into periodic time series, small-world networks into noisy periodic time series, and random networks into random time series. They also show that these relationships hold analytically by using circulant matrix theory and the perturbation theory of linear operators. They generalize the results to several high-dimensional lattices. Gao and Jin proposed in Ref. [15] a method for constructing complex networks from a time series, with each vector point of the reconstructed phase space represented by a single node and edges determined by the phase-space distance. Through investigating an extensive range of network topology statistics, they find that the constructed network inherits the main properties of the time series in its structure. Specifically, periodic series and noisy series convert into regular networks and random networks, respectively, and networks generated from chaotic series typically exhibit small-world and scale-free features. Furthermore, they associate different aspects of the dynamics of the time series with the topological indices of the network and demonstrate how such statistics can be used to distinguish different dynamical regimes. Through analyzing chaotic time series corrupted by measurement noise, they also demonstrate the robustness of the method against noise. Sinatra et al. [16] introduced a method to convert an ensemble of sequences of symbols into a weighted directed network whose nodes are motifs, while the directed links and their weights are defined from statistically significant co-occurrences of two motifs in the same sequence. The analysis of communities of networks of motifs is shown to be able to correlate sequences with functions in the human proteome database, to detect hot topics from online social dialogs, and to characterize trajectories of dynamical systems. Sun et al. [17] have also proposed a novel method to transform a time series into a weighted and directed network. For a given time series, they first generate a set of segments via a sliding window, and then use a doubly symbolic scheme to characterize every windowed segment by combining absolute amplitude information with an ordinal pattern characterization. On the basis of this construction, a network can be directly constructed from the given time series: segments corresponding to different symbol pairs are mapped to network nodes, and the temporal succession between nodes is represented by directed links. With this conversion, the dynamics underlying the time series is encoded into the network structure. They illustrate the potential of their networks with a well-studied dynamical model as a benchmark example. Results show that network measures for characterizing global properties can detect the dynamical transitions in the underlying system. Moreover, they used a random walk algorithm to sample loops in networks, and found that time series with different dynamics exhibit distinct cycle structure. That is, the relative prevalence of loops with different lengths can be used to identify the underlying dynamics.

In the following, we will first present two versions of the visibility algorithm, our own alternative to these mapping methods, along with their most notable properties, which, in many cases, can be derived analytically. On the basis of these latter properties, several applications are addressed.

1.1.1 Natural and Horizontal Visibility Algorithms

Let {x(t_i)}_{i=1,…,N} be a time series of N data. The natural visibility algorithm [1] assigns each datum of the series to a node in the natural visibility graph (NVg). Two nodes i and j in the graph are connected if one can draw a straight line in the time series joining x(t_i) and x(t_j) that does not intersect any intermediate data height x(t_k) (see Figure 1.1 for a graphical illustration). Hence, i and j are two connected nodes if the following geometrical criterion is fulfilled within the time series:

x(t_k) < x(t_i) + (x(t_j) − x(t_i)) (t_k − t_i)/(t_j − t_i).   (1.1)

Figure 1.1 Illustrative example of the natural visibility algorithm, applied to the periodic series 0.87, 0.49, 0.36, 0.83, 0.87, 0.49, 0.36, 0.83, … In the upper part, we plot the periodic time series, and in the bottom part we represent the graph generated through the natural visibility algorithm. Each datum in the series corresponds to a node in the graph, such that two nodes are connected if their corresponding data heights fulfill the visibility criterion of Equation 1.1. Note that the degree distribution of the visibility graph is composed of a finite number of peaks, much in the vein of the discrete Fourier transform (DFT) of a periodic signal. We can thus interpret the visibility algorithm as a geometric transform. (Luque et al. [18]. Reproduced with permission of American Physical Society.)
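The criterion of Equation 1.1 translates almost directly into code. The following Python sketch is ours, not part of the chapter, and its function names are illustrative; it builds the natural visibility edge list by brute force, checking every intermediate datum against the chord joining two candidate nodes, and applies it to the periodic series of Figure 1.1.

```python
def natural_visibility_edges(x, t=None):
    """Edge list of the natural visibility graph of series x.

    Indices i < j are linked if every intermediate point k lies strictly
    below the straight line joining (t_i, x_i) and (t_j, x_j), cf. Eq. (1.1).
    """
    n = len(x)
    t = list(range(n)) if t is None else t
    edges = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = True
            for k in range(i + 1, j):
                # height of the chord from (t_i, x_i) to (t_j, x_j) at t_k
                chord = x[i] + (x[j] - x[i]) * (t[k] - t[i]) / (t[j] - t[i])
                if x[k] >= chord:
                    visible = False
                    break
            if visible:
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    series = [0.87, 0.49, 0.36, 0.83] * 4   # the periodic series of Figure 1.1
    edges = natural_visibility_edges(series)
    degree = [0] * len(series)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    print(edges)
    print(degree)   # interior nodes repeat a small set of degrees: the graph is a motif concatenation
```

The triple loop is deliberately naive (cubic in the series length) so that the criterion stays explicit; faster constructions exist, but they are not needed for short illustrative series.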

It can be easily checked that, by means of the present algorithm, the associated graph extracted from a time series is always:

(i) Connected: each node sees at least its nearest neighbors (left- and right-hand sides).
(ii) Undirected: the way the algorithm is built up, there is no direction defined in the links.
(iii) Invariant under affine transformations of the series data: the visibility criterion is invariant under (unsigned) linear rescaling of both horizontal and vertical axes, as well as under horizontal and vertical translations.
(iv) "Lossy": some information regarding the time series is inevitably lost in the mapping, owing to the fact that the network structure is completely determined by the adjacency matrix. For instance, two periodic series with the same period, such as T1 = …, 3, 1, 3, 1, … and T2 = …, 3, 2, 3, 2, …, would have the same visibility graph, albeit being quantitatively different.

One straightforward question is: what does the visibility algorithm stand for? In order to deepen the geometric interpretation of the visibility graph, let us focus on a periodic series. It is straightforward that its visibility graph is a concatenation of a motif: a repetition of a pattern (see Figure 1.1). Now, what is the degree distribution p(k) of this visibility graph? Since the graph is just a motif's repetition, the degree distribution will be formed by a finite number of nonnull values, this number being related to the period of the associated periodic series.

This behavior reminds us of the DFT, in which a periodic series is formed by a finite number of peaks (vibration modes) related to the series period. Using this analogy, we can understand the visibility algorithm as a geometric transform. Whereas a DFT decomposes a signal into a sum of (eventually infinitely many) modes, the visibility algorithm decomposes a signal into a concatenation of graph motifs, and the degree distribution simply makes a histogram of such "geometric modes." While the time series is defined in the time domain and the DFT is defined in the frequency domain, the visibility graph is then defined in the "visibility domain." In fact, this analogy is, so far, only a metaphor to aid our intuition; this transform is not a reversible one, for instance.

An alternative criterion for the construction of the visibility graph is defined as follows: let {x(t_i)}_{i=1,…,N} be a time series of N data. The so-called horizontal visibility algorithm [18] assigns each datum of the series to a node in the horizontal visibility graph (HVg). Two nodes i and j in the graph are connected if one can draw a horizontal line in the time series joining x(t_i) and x(t_j) that does not intersect any intermediate data height (see Figure 1.2 for a graphical illustration). Hence, i and j are two connected nodes if the following geometrical criterion is fulfilled within the time series:

x(t_i), x(t_j) > x(t_n) for all n such that i < n < j.   (1.2)

This algorithm is a simplification of the natural visibility algorithm (NVa). In fact, the HVg is always a subgraph of its associated NVg for the same time series (see Figure 1.2). Besides, the HVg will also be (i) connected, (ii) undirected, (iii) invariant under affine transformations of the series, and (iv) "lossy." Some concrete properties of these graphs can be found in Refs [18–21]. The HVg method is considerably more tractable analytically than the NVg. For example, if {x_i} is a bi-infinite sequence of independent and identically distributed random variables extracted from a continuous probability density f(x), then its associated HVg has degree distribution

p(k) = (1/3) (2/3)^{k−2}, k = 2, 3, 4, ….   (1.3)

A lengthy constructive proof can be found in Ref. [18], and alternative, shorter proofs can be found in Ref. [22]. The mean degree k̄ of the HVg associated to an uncorrelated random process is then

k̄ = Σ_{k=2}^{∞} k p(k) = Σ_{k=2}^{∞} (k/3) (2/3)^{k−2} = 4.   (1.4)

In fact, the mean degree of an HVg associated to an infinite periodic series of period T (with no repeated values within a period) is

k̄(T) = 4 (1 − 1/(2T)).   (1.5)

A proof can be found in Ref. [22]. An interesting consequence is that every time series has an associated HVg with a mean degree 2 ≤ k̄ ≤ 4, where the lower bound is reached for constant series, whereas the upper bound is reached for aperiodic series [18].

Figure 1.2 Illustrative example of the natural (a) and horizontal (b) visibility algorithms. We plot the same time series and represent the graphs generated through both visibility algorithms below. Each datum in the series corresponds to a node in the graph, such that two nodes are connected if their corresponding data heights fulfill the visibility criteria of Equations 1.1 and 1.2, respectively. Observe that the horizontal visibility graph is a subgraph of the natural visibility graph for the same time series. (Luque et al. [18]. Reproduced with permission of American Physical Society.)
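As a complementary sketch (again ours, with invented names, not the chapter's code), the horizontal visibility criterion of Equation 1.2 needs only height comparisons. The script below builds the degree sequence of the HVg of a long i.i.d. series and compares the empirical degree distribution and mean degree against Equations 1.3 and 1.4, and also checks the periodic bound of Equation 1.5 for a period-2 series.

```python
import random
from collections import Counter

def horizontal_visibility_degrees(x):
    """Degree sequence of the horizontal visibility graph of series x (Eq. 1.2)."""
    n = len(x)
    degree = [0] * n
    for i in range(n - 1):
        top = float("-inf")              # running maximum of the data between i and j
        for j in range(i + 1, n):
            if top < min(x[i], x[j]):    # every intermediate datum lies below both ends
                degree[i] += 1
                degree[j] += 1
            if x[j] >= x[i]:             # no node beyond j can see i any more
                break
            top = max(top, x[j])
    return degree

random.seed(1)
series = [random.random() for _ in range(20000)]          # i.i.d. uniform data
deg = horizontal_visibility_degrees(series)
hist = Counter(deg)
for k in range(2, 8):                                      # empirical p(k) vs Eq. (1.3)
    print(k, hist[k] / len(deg), (1 / 3) * (2 / 3) ** (k - 2))
print("mean degree:", sum(deg) / len(deg))                 # close to 4, Eq. (1.4)

periodic = [0.3, 0.7] * 1000                               # period T = 2
deg_p = horizontal_visibility_degrees(periodic)
print("periodic mean degree:", sum(deg_p) / len(deg_p))    # close to 4(1 - 1/(2T)) = 3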


1.1.2 A Brief Overview of Some Initial Applications

To end this introduction, and without intending to be exhaustive, we believe it appropriate to point out some of the areas where, despite the method being recent, the visibility algorithm has been applied with interesting results.

1.1.2.1 Seismicity

Aguilar-San Juan and Guzman-Vargas presented, in Ref. [23], a statistical analysis of earthquake magnitude sequences in terms of the visibility graph method. Magnitude time series from Italy, southern California, and Mexico are transformed into networks, and some organizational graph properties are discussed. Connectivities are characterized by a scale-free distribution, with a notable effect at large scales due to either the presence or absence of large events. In addition, a scaling behavior is observed between different node measures such as betweenness centrality, clustering coefficient, nearest-neighbor connectivity, and earthquake magnitude. Moreover, parameters which quantify the difference between forward and backward links are proposed to evaluate the asymmetry of the visibility attachment mechanism. Their results show an alternating average behavior of these parameters as the earthquake magnitude changes. Finally, they evaluate the effects of reducing the temporal and spatial windows of observation upon the visibility network properties for main shocks. Telesca et al. [24–26] have analyzed the synthetic seismicity generated by a simple stick-slip system with asperities by using the visibility graph method. The stick-slip system mimics the interaction between tectonic plates, whose asperities are given by sandpapers of different granularity degrees. The visibility graph properties of the seismic sequences have been put in relationship with the typical seismological parameter, the b-value of the Gutenberg–Richter law. A close linear relationship is found between the b-value of the synthetic seismicity and the slope of the least-squares line fitting the k–M plot (the relationship between the magnitude M of each synthetic event and its connectivity degree k); this relationship is also verified by real seismicity.

1.1.2.2 Hurricanes

Elsner et al. [27] demonstrated the construction of a network from a time series of US hurricane counts and showed how it can be used to identify unusual years in the record. The network links years based on a line-of-sight visibility algorithm applied to the time series plot and is physically related to the variation of hurricanes from one year to the next. The authors find that the distribution of node degree is consistent with a random Poisson process. High hurricane-occurrence years that are surrounded by years with few hurricanes have many linkages. Of the environmental conditions known to affect coastal hurricane activity, they find that years with little sunspot activity during September (the peak month of the hurricane season) best correspond with the unusually high linkage years.

1.1.2.3 Turbulence

A classic topic in fluid mechanics is the complex behavior exhibited by some fluids within a certain regime, characterized basically by a dimensionless quantity known as the Reynolds number: a high-dimensional spatiotemporal form of chaos called turbulence. The multiscale nature of this phenomenon is reflected in the distributions of velocity increments and energy dissipation rates, which exhibit anomalous scalings suggesting some kind of multifractality. A first attempt to characterize an energy dissipation rate time series by means of the visibility algorithm was made by Liu et al. [28]. In this work, a series obtained from wind tunnel experimental measurements was mapped into a graph by the natural visibility version of the algorithm, yielding a power law of exponent γ = 3.0 for the degree distribution. An edge-covering box-counting method was used to prove the nonfractality of the graph, and allometric scalings for the skeleton and random spanning trees of the graph were proposed, but no functional relation to any physical magnitude characterizing the phenomenon could be derived.

1.1.2.4 Financial Applications

Yang et al. [29] mapped six exchange rate series and their corresponding detrended series into graphs by means of the NVa. The results suggest that, for certain purposes, these series can be modeled as fractional Brownian motions. The multifractal structure of the series was broken by shuffling them, and so the shuffled series mapped into graphs with exponential degree distributions. Qian et al. [30], in the same spirit as Liu et al. [28], built three different classes of spanning trees from the graphs associated to 30 world stock market indices and studied their allometric properties, finding universal allometric scaling behavior in one of the classes. No satisfactory explanation was found for this fact. They also built spanning trees from graphs associated to fractional Brownian motions with different Hurst exponents, finding discrepancies in their allometric behavior with respect to the ones mapped from the stock market indices. These discrepancies were attributed to the nonlinear long-term correlations and fat-tailed distributions of the financial series.

1.1.2.5 Physiology

Shao [31] used the visibility algorithm to construct the associated networks of time series of filtered data of five healthy subjects and five patients with congestive heart failure (CHF). He used the assortativity coefficient of the networks to distinguish healthy subjects from CHF patients. On the contrary, Dong and Li [32], in a comment on the first work, calculated the assortativity coefficients of heartbeat networks extracted from time series of healthy subjects and CHF patients and concluded that the assortativity coefficients of such networks fail, by and large, as an effective indicator to differentiate healthy subjects from CHF patients. Ahmadlou et al. [33] presented a new chaos–wavelet approach for electroencephalogram (EEG)-based diagnosis of Alzheimer's disease (AD) using the visibility graph. The approach is based on the research ideology that nonlinear features may not reveal differences between AD and control groups in the band-limited EEG, but may represent noticeable differences in certain subbands. Hence, the complexity of EEGs is computed using the VGs of EEGs and of EEG subbands produced by wavelet decomposition. Two methods are used for the computation of the complexity of the VGs: one based on the power of scale-freeness of a graph structure and the other based on the maximum eigenvalue of the adjacency matrix of a graph. Analysis of variation is used for feature selection. Two classifiers are applied to the selected features to distinguish AD and control EEGs: a radial basis function neural network (RBFNN) and a two-stage classifier consisting of principal component analysis (PCA) and an RBFNN. After comprehensive statistical studies, effective classification features and mathematical markers are presented.

1.2 Visibility Graphs and Entropy

1.2.1 Definitions of Entropy in Visibility Graphs

Following the pioneering works of Rashevsky [34] and Trucco [35], the use of entropy in graphs was introduced by A. Mowshowitz in 1968 [36] to characterize the complexity of a graph. Soon afterward, Körner in 1971 [37] applied a different definition of the concept to solve a coding problem formulated in information theory. Since then, various graph entropy measures have been developed, reinvented, and applied to a diversity of situations (see [38] for a review). Shannon [39] defined the entropy of any set of probabilities {p_i} as H = −Σ_i p_i log p_i ≥ 0, where p_i is the probability of occurrence of the event i; but what is the meaning of p_i in a graph? Here we can consider several possibilities. For example, p_i could be the probability that a vertex in the graph has degree k = i; hence we can rename it p(k), and thus the graph entropy becomes:

h = −Σ_k p(k) log p(k).   (1.6)
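In code, Equation 1.6 is simply the Shannon entropy of the empirical degree distribution. The helper below is our own illustrative sketch (the function name is invented); it accepts any degree sequence, for instance one produced by a visibility-graph construction such as the sketches above.

```python
from collections import Counter
from math import log

def degree_distribution_entropy(degrees):
    """Graph entropy h = -sum_k p(k) log p(k) of Eq. (1.6), natural logarithm."""
    n = len(degrees)
    counts = Counter(degrees)
    return -sum((c / n) * log(c / n) for c in counts.values())

# Toy check: interior nodes of a period-2 HVg alternate between degrees 2 and 4,
# so p(2) = p(4) = 1/2 and h = log 2.
print(degree_distribution_entropy([2, 4] * 100))

# Entropy of the analytic HVg distribution of an i.i.d. series, Eq. (1.3),
# truncated at a large degree (the neglected tail mass is negligible).
p = {k: (1 / 3) * (2 / 3) ** (k - 2) for k in range(2, 60)}
print(-sum(q * log(q) for q in p.values()))
```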

Another possibility could be to consider clustering rather than degree, using p(C) instead. However, clustering is computationally harder to obtain, and it is very easy to prove that −Σ p(C) log p(C) produces exactly the same value as Eq. 1.6. Other alternatives (such as the probability of having two nodes connected, etc.) do not produce significantly different results; thus, in the following, we will use the degree distribution and take as graph entropy the one defined by Eq. 1.6 (note that in directed graphs one could also consider the in- and out-degree distributions and hence define h_in and h_out). In the case of visibility graphs coming from time series, the graph entropy h is strongly linked to the Shannon entropy H of the corresponding time series x(t), given by H = −Σ p(x) log p(x), since, at the end of the day, the information in the visibility graph comes from the time series. Therefore, the graph entropy of the visibility graph is a very good proxy of the Shannon entropy of the associated time series.

Graph entropy is obtained from the whole structure of the graph, and hence is a static magnitude. However, as visibility graphs come from dynamic processes, one should consider the role of entropies linked to such processes. Thus, for an iterative process, one can consider the Kolmogorov–Sinai entropy [40, 41], which was introduced to solve the "isomorphism problem," that is, whether there is a mapping between two seemingly different dynamical systems preserving the dynamical and statistical relationships between the successive states. As H_KS is invariant under any isomorphism, two dynamical systems with different values of H_KS are nonisomorphic. H_KS can be defined as the rate of increase of entropy along the transformation T. Let us consider an abstract space Ω with a probability measure μ that assigns probabilities to subsets of Ω. Let us make a partition A of Ω composed of disjoint subsets A_1, A_2, …, A_n such that their union is Ω. The probability assigned to each subset is p_i = μ(A_i), P_A = (p_1, p_2, …, p_n), and its Shannon entropy is h(P_A) = −Σ p_i log p_i. Let us consider a dynamic transformation T in Ω that leaves the probabilities invariant: p_i = μ(A_i) = μ(T^{−1}(A_i)). After m iterations of this transformation, we define A_m as A ∨ T^{−1}(A) ∨ T^{−2}(A) ∨ … ∨ T^{−m+1}(A). For this given partition A and iterative process T, the Kolmogorov–Sinai entropy is given by

H_KS(T, A) = lim_{m→∞} (1/m) H(P_{A_m}).   (1.7)

It represents the increase of entropy due to the transformation T on the partition A. Note that, for m = 1, that is, for a single-step process, the Kolmogorov–Sinai entropy is equal to the Shannon entropy. In order to consider the entropy rate due purely to T, regardless of the partition considered, one has to take into account all the infinitely many possible partitions and keep the one that produces the highest value:

H_KS(T) = sup_A H_KS(T, A).   (1.8)

In the case of chaotic time series, the Kolmogorov–Sinai entropy exhibits a very interesting property, thanks to the Pesin theorem [42]. This theorem states an intimate relationship between H_KS and the positive Lyapunov exponents, given by

H_KS ≤ Σ_{i: γ_i ≥ 0} γ_i = Σ_i γ_i^+.   (1.9)

11

12

1 Entropy and Renormalization in Chaotic Visibility Graphs

entropy of the array (x1 ,…,xn ) is termed block entropy of order n and denoted Hn . It is the Shannon entropy of the n-word distribution, namely: 1 ∑ p(x1 … xn ) log p(x1 … xn ). (1.10) Hn = − n x …x 1

n

The n-block entropy captures quantitatively correlations of range shorter than n, by contrast with the simple entropy H = H1 , which is only sensitive to the frequencies of the different elementary states. The Kolmogorov–Sinai entropy can be recovered from the block entropy [45] as the asymptotic limit of block entropies. Taking advantage of this fact, we can define for a visibility graph an analogous set of graph block entropies related to the degrees of the graph, as 1 ∑ hn = − p(k1 … kn ) log p(k1 … kn ). (1.11) n k …k 1

n

And its asymptotic limit will be an analogue to the Kolmogorov–Sinai entropy, but for the visibility graph, that is, the graph Kolmogorov–Sinai entropy: hKS = lim hn . n→∞

(1.12)

1.2.2 Pesin Theorem in Visibility Graphs

The period-doubling bifurcation cascade, or Feigenbaum scenario, is perhaps the best known route to chaos [46, 47]. This route to chaos appears an infinite number of times among the family of attractors spawned by unimodal maps, within the so-called periodic windows that interrupt stretches of chaotic attractors. Their shared bifurcation accumulation points form transitions between order and chaos that are known to exhibit universal properties [48]. A general observation is that the HVg extracts not only universal elements of the dynamics, free of the peculiarities of the individual unimodal map, but also elements of the universality classes characterized by the degree of nonlinearity. Therefore, all the results presented in the following, while referring to the specific logistic map for illustrative reasons, apply to any unimodal map. In the case of the Feigenbaum scenario, these graphs are named Feigenbaum graphs. The logistic map is defined by the quadratic difference equation x_{t+1} = f(x_t) = μx_t(1 − x_t), where x_t ∈ [0, 1] and the control parameter μ ∈ [0, 4]. According to the horizontal visibility (HV) algorithm, a time series generated by the logistic map for a specific value of μ (after an initial transient of approach to the attractor) is converted into a Feigenbaum graph (see Figure 1.3). This is a well-defined subclass of HV graphs in which consecutive nodes of degree k = 2, that is, consecutive data with the same value, do not appear, which is actually the case for series extracted from maps (besides the trivial case of a constant series). A deep-seated feature of the period-doubling cascade is that the order in which the positions of a periodic attractor are visited is universal [50], the same for all unimodal maps.

Figure 1.3 Feigenbaum graphs from the logistic map x_{t+1} = f(x_t) = μx_t(1 − x_t). The main figure portrays the family of attractors of the logistic map and indicates a transition from periodic to chaotic behavior at μ∞ = 3.569946… through period-doubling bifurcations. For μ ≥ μ∞, the figure shows merging of chaotic-band attractors, where aperiodic behavior appears interrupted by windows that, when entered from their left-hand side, display periodic motion of period T = m·2⁰ with m > 1 (for μ < μ∞, m = 1) that subsequently develops into m period-doubling cascades with new accumulation points μ∞(m). (Luque et al. [49]. Reproduced with permission of American Institute of Physics.)

This ordering turns out to be a decisive property in the derivation of the structure of the Feigenbaum graphs. See Figure 1.4, which plots the graphs for a family of attractors of increasing period T = 2ⁿ, that is, for increasing values of μ < μ∞.

Figure 1.4 Periodic Feigenbaum graphs for μ < μ∞. The sequence of graphs associated with periodic attractors with increasing period T = 2ⁿ undergoing a period-doubling cascade. The pattern that occurs for increasing values of the period is related to the universal ordering with which an orbit visits the points of the attractor. Observe that the hierarchical self-similarity of these graphs requires that the graph for n − 1 is a subgraph of that for n. (Luque et al. [49]. Reproduced with permission of American Institute of Physics.)

This basic pattern also leads to the expression for their associated degree distributions at the nth period-doubling bifurcation:
$$p(n,k) = \left(\frac{1}{2}\right)^{k/2},\quad k = 2, 4, 6, \ldots, 2n; \qquad p(n,k) = \left(\frac{1}{2}\right)^{n},\quad k = 2(n+1), \qquad (1.13)$$
and zero for k odd or k > 2(n + 1). At the accumulation point μ∞, the period diverges (n → ∞) and the distribution becomes exponential for all even values of the degree,
$$p(\infty,k) = \left(\frac{1}{2}\right)^{k/2},\quad k = 2, 4, 6, \ldots, \qquad (1.14)$$
and zero for k odd. By making use of the expression for the degree distribution p(n, k) in the region μ < μ∞, we obtain for the graph entropy h(n) after the nth period-doubling bifurcation (not to be confused with the block entropy hₙ) the following result:
$$h(n) = -\sum_{k=2}^{2(n+1)} p(n,k)\log p(n,k) = -\sum_{k=2}^{2n}\left(\frac{1}{2}\right)^{k/2}\log\left(\frac{1}{2}\right)^{k/2} - \left(\frac{1}{2}\right)^{n}\log\left(\frac{1}{2}\right)^{n} = \frac{\log 2}{2}\left(\bar{k} - \frac{2}{2^{n}}\right) = \log 4\left(1 - \frac{1}{2^{n}}\right). \qquad (1.15)$$
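A quick numerical check of Eqs 1.13 and 1.15 (ours, not part of the original derivation): the distributions p(n, k) are normalized, and their entropies match both the middle and the final forms of Eq. 1.15.

```python
import math

def p_feigenbaum(n):
    """Degree distribution of Eq. (1.13) as {k: probability}."""
    p = {k: 0.5 ** (k // 2) for k in range(2, 2 * n + 1, 2)}
    p[2 * (n + 1)] = 0.5 ** n
    return p

for n in range(1, 6):
    p = p_feigenbaum(n)
    h = -sum(q * math.log(q) for q in p.values())
    kbar = sum(k * q for k, q in p.items())
    print(n, round(sum(p.values()), 12),                         # normalization -> 1
          round(h, 6), round(math.log(4) * (1 - 0.5 ** n), 6),   # final form of Eq. (1.15)
          round(math.log(2) / 2 * (kbar - 2 * 0.5 ** n), 6))     # middle form of Eq. (1.15)
```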

We observe that the graph entropy increases with n and, interestingly, depends linearly on the mean degree k̄. This linear dependence between h and k̄ is related to the fact that, for exponentially distributed functions, the entropy and the mean of the probability distribution are proportional, a property that holds exactly at the accumulation point (Eq. 1.14) and approximately in the periodic region (Eq. 1.13), where there is a finite cutoff that ends the exponential law. Finally, note that in the limit n → ∞ (accumulation point), the entropy converges to the finite value h(∞) = log 4.

Something similar happens in the chaotic region of the logistic map. Here, we find a period-doubling bifurcation cascade of chaotic bands that takes place as μ decreases from μ = 4 to μ∞. For the largest value of the control parameter, at μ = 4, the attractor is fully chaotic and occupies the entire interval [0, 1] (see Figure 1.3). This is the first chaotic band, n = 0, at its maximum amplitude. As μ decreases in value within μ∞ < μ < 4, band narrowing and successive band splittings [46–48, 50] occur. In general, after n reverse bifurcations, the phase space is partitioned in 2ⁿ disconnected chaotic bands, which are self-affine copies of the first chaotic band [51]. The values of μ at which the bands split are called Misiurewicz points [50], and their location converges to the accumulation point μ∞ for n → ∞. Significantly, while in the chaotic zone orbits are aperiodic, for reasons of continuity they visit each of the 2ⁿ chaotic bands in the same order as positions are visited in the attractors of period T = 2ⁿ [50]. In Figure 1.5, we have plotted the Feigenbaum graphs generated through chaotic time series at different values of μ that correspond to an increasing number of reverse bifurcations. Since chaotic bands do not overlap, one can derive the following degree distribution for a Feigenbaum graph in the chaotic zone after n chaotic-band reverse bifurcations, by using only the universal order of visits:
$$p_\mu(n,k) = \left(\frac{1}{2}\right)^{k/2},\quad k = 2, 4, 6, \ldots, 2n; \qquad p_\mu(n, k \ge 2(n+1)) = \left(\frac{1}{2}\right)^{n}, \qquad (1.16)$$
and zero for k = 3, 5, 7, …, 2n + 1. We note that this time the degree distribution retains some dependence on the specific value of μ, concretely, for those nodes with degree k ≥ 2(n + 1), all of which belong to the top chaotic band (labeled with dashed links in Figure 1.5). The HV algorithm filters out chaotic motion within all bands except for that taking place in the top band, whose contribution decreases as n → ∞ and appears coarse-grained in the cumulative distribution p_μ(n, k ≥ 2(n + 1)). As would be expected, at the accumulation point μ∞ we recover the exponential degree distribution (Eq. 1.14), that is, lim_{n→∞} p_μ(n, k) = p(∞, k). Regarding the graph entropy in the chaotic zone, in general h cannot be derived exactly, since the precise shape of p_μ(k) is unknown (albeit the asymptotic shape is also exponential). However, arguments of self-affinity similar to those used for describing the degree distribution of Feigenbaum graphs can be used to find some regularity properties of the entropy h_μ(n). Concretely, the entropy after n chaotic-band reverse bifurcations can be expressed as a function of n and of the entropy in the first chaotic band h_μ(0). Using the expression of the degree distribution, a little algebra yields
$$h_\mu(n) = \log 4 + \frac{h_\mu^{top}(0)}{2^{n}}.$$
The chaotic-band reverse bifurcation process in the chaotic region, from right to left, leads in this case to a decrease of the entropy, with an asymptotic value of log 4 for n → ∞ at the accumulation point.

Figure 1.5 Aperiodic Feigenbaum graphs for μ > μ∞. A sequence of graphs associated with chaotic series after n chaotic-band reverse bifurcations, starting at μ = 4 for n = 0, when the attractor extends along a single band and the degree distribution does not present any regularity (dashed links). For n > 0, the phase space is partitioned in 2ⁿ disconnected chaotic bands, and the nth self-affine image of μ = 4 is the nth Misiurewicz point μ_{2^{n−1}−2^n}. In all cases, the orbit visits each chaotic band in the same order as in the periodic region μ < μ∞. This order of visits induces an ordered structure in the graphs (black links) analogous to that found for the period-doubling cascade. (Luque et al. [49]. Reproduced with permission of American Institute of Physics.)

These results show that the graph entropy behaves qualitatively as the map's Lyapunov exponent λ, with the peculiarity of having a shift of log 4, as confirmed numerically in Figure 1.6. This agreement is expected in the chaotic region in view of the Pesin theorem [42], which relates the positive Lyapunov exponents of a map with its Kolmogorov–Sinai entropy (see Eq. 1.9) and which for unimodal maps reads h_KS = λ, ∀λ > 0; recall that, as we stated, the graph entropy h can be used as a proxy for H_KS. Unexpectedly, this qualitative agreement seems also valid in the periodic windows (λ < 0), since the graph entropy is positive and approximately varies with the value of the associated (negative) Lyapunov exponent although H_KS = 0, hinting at a Pesin-like relation valid also out of chaos. In short, the graph entropy obtained from the whole structure of the HVg stores much of the information in the dynamical process, and it is a good estimate of the Kolmogorov–Sinai entropy of the original time series.

We will see that the same happens with graph block entropies. For this example, we will use another common transition to chaos: intermittency, the seemingly random alternation of long quasi-regular or laminar phases, the so-called intermissions, and relatively short irregular or chaotic bursts. Intermittency is omnipresent in nonlinear science and has been observed in comparable phenomena in nature, such as Belousov–Zhabotinsky chemical reactions, Rayleigh–Bénard instabilities, and turbulence [47, 52–54]. Pomeau and Manneville [55] introduced a classification as types I–III for different kinds of intermittency. For definiteness, we chose the case of type I intermittency, as it occurs just preceding an (inverse) tangent bifurcation in nonlinear iterated maps, although the very same methodology can be extended to other situations.

Figure 1.6 Horizontal visibility network entropy h and Lyapunov exponent λ for the logistic map. We plot the numerical values of h and λ for 3.5 < μ < 4 (the numerical step is δμ = 5 × 10⁻⁴ and in each case the processed time series have a size of 2¹² data). The inset reproduces the same data but with a rescaled entropy h − log(4). The surprisingly good match between both quantities is due to the Pesin identity (see text). Unexpectedly, the Lyapunov exponent within the periodic windows (λ < 0 inside the chaotic region) is also well captured by h. (Luque et al. [49]. Reproduced with permission of American Institute of Physics.)
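For readers who want to reproduce the λ curve of Figure 1.6, a minimal sketch (ours, not the authors' code): the Lyapunov exponent of the logistic map can be estimated as the orbit average of log|f′(x)|, and by the Pesin identity quoted above h_KS = λ for λ > 0, which is what the graph entropy (shifted by log 4) tracks.

```python
import math

def lyapunov_logistic(mu, n_iter=200_000, n_transient=1_000, x0=0.3):
    """Lyapunov exponent of x -> mu*x*(1-x): orbit average of log|mu*(1-2x)|."""
    x = x0
    for _ in range(n_transient):
        x = mu * x * (1.0 - x)
    acc = 0.0
    for _ in range(n_iter):
        x = mu * x * (1.0 - x)
        acc += math.log(abs(mu * (1.0 - 2.0 * x)))
    return acc / n_iter

for mu in (3.6, 3.8, 4.0):
    print(mu, round(lyapunov_logistic(mu), 4))   # lambda at mu = 4 should be close to log 2
```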

Type I intermittency can be observed infinitely many times in the logistic map
$$x_{t+1} = F(x_t) = \mu x_t(1 - x_t),\quad 0 \le x \le 1,\; 0 \le \mu \le 4, \qquad (1.17)$$
close to the control parameter values μ = μ_T at which windows of periodicity with period T open, for values μ > μ∞. For instance, at μ₃ = 1 + √8, this map exhibits a cycle of period T = 3 with subsequent bifurcations. This is the most visible window of periodicity in the chaotic regime (and the one in whose vicinity the following results have been obtained). The regular periodic orbits hold slightly above μ_T, but below μ_T the dynamics consists of laminar episodes interrupted by chaos (i.e., intermittency). In the bottom part of Figure 1.7, we show the HV graph of the associated intermittent series, which consists of several repetitions of a three-node motif (periodic backbone) linked to the first node of the subsequent laminar trend, interwoven with groups of nodes irregularly (chaotically) connected among them. We observe that the motif repetitions in the graph correspond to the laminar regions in the trajectory (pseudo-periodic data with pseudo-period three) and the chaotically connected groups correspond to the chaotic bursts in the trajectory.

Figure 1.7 Graphical illustration of how the horizontal visibility (HV) graph inherits in its structure the dynamics of the associated intermittent series. In the top of the figure, we show a sample intermittent series generated by the logistic map close to μc (ε > 0), producing laminar regions (black) mixed with chaotic bursts (white). In the bottom, we plot the associated HV graph. Laminar regions are mapped into nodes with a periodic backbone, whereas the actual pseudo-periodicity of the series is inherited in the graph by the existence of the so-called peak or interfacial nodes. Chaotic bursts are mapped into chaotic nodes, with a characteristic degree distribution. (Núñez et al. [56]. Reproduced with permission of American Physical Society.)

As laminar trends are indeed pseudo-periodic, in the sense that they can be decomposed into a periodic signal plus a drift, this pseudo-periodicity is expressed in the graph structure by allowing a node for each period-three motif to be connected to the first node in the next laminar region (the so-called peak or interfacial node), since the values of the time series in the chaotic bursts are always smaller than those in the former laminar trend. Therefore, the connectivity of this node is a direct function of the length of the previous laminar phase. Trajectories generated by canonical models evidencing type I intermittency show power-law scaling in the Lyapunov exponent of the trajectories [55, 57], which reads λ ∼ ε^{0.5} as ε → 0, where ε, called the channel width of the Poincaré section, is the distance between the local Poincaré map and the diagonal [58]. In our case, it is equal to ε = μ_T − μ. In Figure 1.8, we show a log–log plot of the values of the graph block entropies hₙ as a function of the channel width ε and the block size n. A power-law scaling is recovered, albeit with a different scaling exponent α < 0.5:
$$h_n \sim \epsilon^{\alpha(n)}. \qquad (1.18)$$
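In practice, the exponent α(n) in Eq. 1.18 is obtained from a least-squares fit in log–log coordinates. A minimal sketch of that fitting step (ours), run here on synthetic data standing in for measured (ε, h₁) pairs, with the value α(1) ≈ 0.12 used only to generate the stand-in data:

```python
import numpy as np

eps = np.logspace(-7, -3, 20)           # channel widths
h1 = 0.8 * eps ** 0.12                  # synthetic h_1 values built with alpha(1) ~ 0.12
alpha_fit, _ = np.polyfit(np.log(eps), np.log(h1), 1)
print(round(alpha_fit, 3))              # recovers the exponent used to build the data
```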

Figure 1.8 Log–log plot of the block entropies hₙ constructed from degree distributions of n-sequences of connectivities in the HV graphs as a function of ε: n = 1 (squares), n = 2 (upward-pointing triangles), n = 3 (downward-pointing triangles), and n = 4 (right triangles). A scaling of the form hₙ ∼ ε^{α(n)} is found. (Inset panel) Log–log plot of the convergence of α(n) to the exponent associated with the Lyapunov exponent, as a function of n. A relation of the form [0.5 − α(n)] ∼ n^{−0.19} is found. (Núñez et al. [56]. Reproduced with permission of American Physical Society.)

As we stated, h₁ (corresponding to the graph entropy, Eq. 1.6) is only a proxy of the Kolmogorov–Sinai entropy of the time series, and thus a comparison with the Lyapunov exponent is only approximate. The same is valid for graph block entropies with n > 1, but note that as n increases the deviation [0.5 − α(n)] decreases as n^{−0.19}, so that α(n) converges to 0.5. Here we recall that the limit n → ∞ of the graph block entropies hₙ was our graph Kolmogorov–Sinai entropy, Eq. 1.12, giving
$$h_{KS} = \lim_{n\to\infty} h_n \sim \lim_{n\to\infty} \epsilon^{\alpha(n)} = \epsilon^{0.5} \propto \lambda, \qquad (1.19)$$

proving again the relationship between graph entropies and the Pesin theorem. We remark that, whereas the graph entropy and the graph Kolmogorov–Sinai entropy are magnitudes defined on the graph, the Lyapunov exponent is only defined in the system. Still, the strong numerical evidence in favor of a Pesin-like identity between the map's Lyapunov exponent and the entropies defined on the graph supports the idea that a graph analogue of the Lyapunov exponent can be defined in graph space.

1.2.3 Graph Entropy Optimization and Critical Points

The information stored in the graph entropy (Eq. 1.6) also allows us to identify the critical points in maps with order-to-chaos transitions. We can arrive at this result via optimization of the entropy. In order to illustrate this, we will consider the logistic map and the period-doubling bifurcation cascade, or Feigenbaum scenario, already considered at the beginning of the previous section. Consider the Lagrangian
$$\mathcal{L} = -\sum_{k=2}^{\infty} p(k)\log p(k) - (\lambda_0 - 1)\left(\sum_{k=2}^{\infty} p(k) - 1\right) - \lambda_1\left(\sum_{k=2}^{\infty} k\,p(k) - \bar{k}\right),$$
for which the extremum condition reads
$$\frac{\partial \mathcal{L}}{\partial p(k)} = -\log p(k) - \lambda_0 - \lambda_1 k = 0,$$
and has the general solution p(k) = e^{−λ₀−λ₁k}. The Lagrange multipliers λ₀ and λ₁ can be calculated from their associated constraints. First, the normalization of the probability density,
$$\sum_{k=2}^{\infty} e^{-\lambda_0-\lambda_1 k} = 1,$$
implies the following relation between λ₀ and λ₁:
$$e^{\lambda_0} = \sum_{k=2}^{\infty} e^{-\lambda_1 k} = \frac{e^{-\lambda_1}}{e^{\lambda_1}-1},$$
and differentiation of this last expression with respect to λ₁ yields
$$-\sum_{k=2}^{\infty} k\,e^{-\lambda_1 k} = \frac{e^{-\lambda_1}-2}{\left(e^{\lambda_1}-1\right)^2}.$$
Second, the assumption that the mean degree is a well-defined quantity (true for HV graphs) yields
$$\sum_{k=2}^{\infty} k\,e^{-\lambda_0-\lambda_1 k} = \bar{k} = \frac{2-e^{-\lambda_1}}{1-e^{-\lambda_1}}.$$
Combining the above results, we find
$$\lambda_1 = \log\left(\frac{\bar{k}-1}{\bar{k}-2}\right) \quad\text{and}\quad \lambda_0 = \log\left(\frac{(\bar{k}-2)^2}{\bar{k}-1}\right).$$
Hence, the degree distribution that maximizes h is
$$p(k) = \frac{\bar{k}-1}{(\bar{k}-2)^2}\left(\frac{\bar{k}-2}{\bar{k}-1}\right)^{k},$$
and the corresponding entropy is an increasing function of k̄. The maximal entropy is therefore found for the maximal mean degree, which, as we saw in the section "Natural and Horizontal Visibility Algorithms," is k̄ = 4. This yields the associated degree distribution
$$p(k) = \frac{3}{4}\left(\frac{2}{3}\right)^{k} = \frac{1}{3}\left(\frac{2}{3}\right)^{k-2},$$
which coincides with the one expected for a random uncorrelated series, as we saw in the aforementioned section. Remarkably, we conclude that the HV graph with maximal entropy is that associated with a purely uncorrelated random process.
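A short numerical check of this result (ours): the distribution p(k) = (1/3)(2/3)^{k−2}, k ≥ 2, is normalized, has mean degree k̄ = 4, and has entropy log(27/4), the value quoted for the random uncorrelated case later in Section 1.3.4.

```python
import math

p = {k: (1 / 3) * (2 / 3) ** (k - 2) for k in range(2, 300)}   # truncated tail
print(round(sum(p.values()), 10),                               # normalization -> 1
      round(sum(k * q for k, q in p.items()), 6),               # mean degree -> 4
      round(-sum(q * math.log(q) for q in p.values()), 6),      # entropy
      round(math.log(27 / 4), 6))                               # log(27/4)
```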

So far, we have not used any property of the logistic map and its associated Feigenbaum graphs, and hence the previous result is a completely general one. Now we will use them in the form of restrictions to the maximization of the graph entropy. Note that, by construction, the Feigenbaum graphs from the logistic map along the period-doubling route to chaos (μ < μ∞) do not have odd values of the degree. Let us now assume this additional constraint in the former entropy optimization procedure. The derivation proceeds along similar steps, although summations now run only over even terms. Concretely, we have
$$e^{\lambda_0} = \sum_{k=1}^{\infty} e^{-2\lambda_1 k} = \frac{1}{e^{2\lambda_1}-1},$$
which after differentiation with respect to λ₁ gives
$$\sum_{k=1}^{\infty} k\,e^{-2\lambda_1 k} = \frac{e^{2\lambda_1}}{\left(e^{2\lambda_1}-1\right)^2},$$
and
$$\sum_{k=1}^{\infty} 2k\,e^{-\lambda_0-2\lambda_1 k} = \bar{k} = \frac{2e^{2\lambda_1}}{e^{2\lambda_1}-1}.$$
We obtain for the Lagrange multipliers
$$\lambda_1 = \frac{1}{2}\log\left(\frac{\bar{k}}{\bar{k}-2}\right) \quad\text{and}\quad \lambda_0 = \log\left(\frac{\bar{k}-2}{2}\right).$$
The degree distribution that maximizes the graph entropy now turns out to be
$$p(k) = \frac{2}{\bar{k}-2}\left(\frac{\bar{k}-2}{\bar{k}}\right)^{k/2}.$$
As before, the entropy is an increasing function of k̄, attaining its largest value for the upper-bound value k̄ = 4, for which the distribution reduces to p(k) = (1/2)^{k/2}, k = 2, 4, 6, …, that is, Equation 1.14. We conclude that the maximum entropy of the entire family of Feigenbaum graphs, if we require that odd values of the degree are not allowed, is achieved at the logistic map accumulation point. Finally, the network entropy is trivially minimized for a degree distribution p(2) = 1, that is, the HV degree distribution coming from a constant series. In short, graph entropy optimization leads to three special regimes: random dynamics, a constant time series, and the critical accumulation point, where the order-to-chaos transition takes place.
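Analogously, a quick check of ours for the even-degree case: p(k) = (1/2)^{k/2} on even k is normalized, has mean degree 4, and has entropy log 4, the value h(∞) found at the accumulation point.

```python
import math

p = {k: 0.5 ** (k // 2) for k in range(2, 400, 2)}    # even degrees only
print(round(sum(p.values()), 10),                      # normalization -> 1
      round(sum(k * q for k, q in p.items()), 6),      # mean degree -> 4
      round(-sum(q * math.log(q) for q in p.values()), 6),
      round(math.log(4), 6))                           # entropy -> log 4
```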


In the case of intermittency, also studied in the previous section, it is straightforward to show that the graph entropy reaches a minimum at the transition point. Eq. 1.18 showed that, in this scenario, the graph entropy (corresponding to the block entropy with n = 1) was h₁ ∼ ε^{α(1)} (with α(1) ≃ 0.12, see the inset in Figure 1.8). Clearly, as we approach the transition point, ε → 0 (i.e., as μ → μc, coming from the chaotic zone toward the ordered period-three window), we obtain h₁ → 0. Hence, the graph entropy reaches a global minimum for the HV graph at tangency, ε = 0 (but note that there is no continuity in h₁: when we actually arrive at ε = 0, the graph suddenly changes radically to an ordered graph with h₁ = log 3).

Finally, we will study a third route to chaos, quasi-periodicity. Quasi-periodicity is observed along time evolution in nonlinear dynamical systems [47, 48, 59] and also in the spatial arrangements of crystals with forbidden symmetries [50, 60]. These two manifestations of quasi-periodicity are rooted in self-similarity and are seen to be related through analogies between incommensurate quantities in the time and spatial domains [50]. Quasi-periodicity can also be visualized in the graphs generated when the HV algorithm is applied to the stationary trajectories of the universality class of low-dimensional nonlinear iterated maps with a cubic inflexion point, as represented by the circle map [50]. We briefly recall that the critical circle map [47, 48, 59] is the one-dimensional iterated map given by
$$\theta_{t+1} = f_{\Omega,K}(\theta_t) = \theta_t + \Omega - \frac{1}{2\pi}\sin(2\pi\theta_t), \mod 1, \qquad (1.20)$$
representative of the general class of nonlinear circle maps θ_{t+1} = f_{Ω,K}(θ_t) = θ_t + Ω + g(θ_t), mod 1, where g(θ) is a periodic function that fulfills g(θ + 1) = g(θ). The dynamical variable 0 ≤ θ_t < 1 can be interpreted as a measure of the angle that specifies the trajectory on the unit circle, and the control parameter Ω is the so-called bare winding number. The dressed winding number for the map is defined as the limit of the ratio ω ≡ lim_{t→∞}(θ_t − θ₀)/t and represents an averaged increment of θ_t per iteration. Trajectories are periodic (locked motion) when the corresponding dressed winding number ω(Ω) is a rational number p/q and quasi-periodic when it is irrational. The resulting hierarchy of mode-locking steps at K = 1 can be conveniently represented by a Farey tree, which orders all the irreducible rational numbers p/q ∈ [0, 1] according to their increasing denominators q. The HV algorithm assigns each datum θᵢ of a time series {θᵢ}_{i=1,2,…} to a node i in its associated HV graph, and i and j are two connected nodes if θᵢ, θⱼ > θₙ for all n such that i < n < j. The associated HV graph is a periodic repetition of a motif with q nodes, p of which have connectivity k = 2. (Observe that p in the map indicates the number of turns around the circle needed to complete a period.) For K ≤ 1, the order of visits of positions in the attractors and their relative values remain invariant within a locked region with ω = p/q [61], such that the HV graphs associated with them are the same. In Figure 1.9 we present an example, in which the first and last nodes in the motif correspond to the largest value in the attractor. In Figure 1.10, we depict the associated HV periodic motifs for each p/q in the Farey tree.

Figure 1.9 Examples of two standard circle map periodic series with dressed winding number ω = 5/8, K = 0 (a) and K = 1 (b). As can be observed, the order of visits on the circle and the relative values of θₙ remain invariant, and the associated HV graph is therefore the same in both cases.

Figure 1.10 Six levels of the Farey tree and the periodic motifs of the graphs associated with the corresponding rational fractions p/q taken as dressed winding numbers ω in the circle map (for space reasons, only two of these are shown at the sixth level). (a) In order to show how graph concatenation works, we have highlighted an example using different gray tones on the left-hand side: as 1/3 > 1/4, G(1/3) is placed on the left-hand side, G(1/4) on the right-hand side, and their extremes are connected to an additional link closing the motif G(2/7). (b) Five steps in the Golden ratio route, b = 1 (thick solid line); (c) three steps in the Silver ratio route, b = 2 (thick dashed line).

We directly observe that the graphs can be constructed by means of the following inflation process: let p/q be a Farey fraction with "parents" p′/q′ < p″/q″, that is, p/q = (p′ + p″)/(q′ + q″). The "offspring" graph G(p/q) associated with ω = p/q can be constructed by the concatenation G(p″/q″) ⊕ G(p′/q′) of the graphs of its parents. By means of this recursive construction, we can systematically explore the structure of every graph along a sequence of periodic attractors leading to quasi-periodicity. A standard procedure to study the quasi-periodic route to chaos is to select an irrational number ω∞ ∈ [0, 1]. Then, a sequence ωₙ of rational numbers approaching ω∞ is taken. This sequence can be obtained through successive truncations of the continued fraction expansion of ω∞. The corresponding bare winding numbers Ω(ωₙ) provide attractors whose periods grow toward the onset of chaos, where the period of the attractor must be infinite. A well-studied case is the sequence of rational approximations of ω∞ = φ⁻¹ = (√5 − 1)/2 ≃ 0.6180…, the reciprocal of the Golden ratio, which yields winding numbers {ωₙ = F_{n−1}/F_n}_{n=1,2,3,…}, where F_n is the Fibonacci number generated by the recurrence F_n = F_{n−1} + F_{n−2} with F₀ = 1 and F₁ = 1. The first few steps of this route are shown in Figure 1.10(b): ω₁ = 1/1, ω₂ = 1/2, ω₃ = 2/3, ω₄ = 3/5, ω₅ = 5/8, ω₆ = 8/13, …. Within the range Ω(F_{n−1}/F_n), one observes trajectories of period F_n and, therefore, this route to chaos consists of an infinite family of periodic orbits with increasing periods of values F_n, n → ∞. If we denote by G_{φ⁻¹}(n) the graph associated with ωₙ = F_{n−1}/F_n in the Golden ratio route, it is easy to prove that the associated connectivity distribution P(k) for G_{φ⁻¹}(n) with n ≥ 3 and k ≤ n + 1 is p_n(2) = F_{n−2}/F_n, p_n(3) = F_{n−3}/F_n, p_n(4) = 0, and p_n(k) = F_{n−k+1}/F_n. In the limit n → ∞, the connectivity distribution at the accumulation point G_{φ⁻¹}(∞), the quasi-periodic graph at the onset of chaos, takes the form
$$p_\infty(k) = \begin{cases} 1-\phi^{-1} & k=2 \\ 2\phi^{-1}-1 & k=3 \\ 0 & k=4 \\ \phi^{1-k} & k \ge 5. \end{cases} \qquad (1.21)$$

A straightforward generalization of this scheme is obtained by considering the routes {ωₙ = F_{n−1}/F_n}_{n=1,2,3,…} with F_n = bF_{n−1} + F_{n−2}, F₀ = 1, F₁ = 1, where b is a natural number. It can easily be seen that lim_{n→∞} F_{n−1}/F_n = (−b + √(b² + 4))/2, which is a solution of the equation x² + bx − 1 = 0. Interestingly, all the positive solutions of the above family of quadratic equations happen to be positive quadratic irrationals in [0, 1] with pure periodic continued fraction representation: φ_b⁻¹ = [b, b, b, …] (b = 1 corresponds to the Golden number, b = 2 to the Silver number, and so on). Every b > 1 fulfills the condition F_{n−1}/F_n < 1/2. For fixed b ≥ 2, we can deduce from the construction process illustrated in Figure 1.10, and from the balance equation p_∞(k) = φ_b⁻¹ p_∞(k + b), that the degree distribution p_∞(k) for quasi-periodic graphs with b ≥ 2 is
$$p_\infty(k) = \begin{cases} \phi_b^{-1} & k=2 \\ 1-2\phi_b^{-1} & k=3 \\ \left(1-\phi_b^{-1}\right)\phi_b^{(3-k)/b} & k = bn+3,\; n\in\mathbb{N} \\ 0 & \text{otherwise.} \end{cases} \qquad (1.22)$$
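As a sanity check of Eqs 1.21 and 1.22 (ours), the following snippet verifies numerically that these distributions are normalized and have mean degree 4, the value used as a constraint in the optimization below; the truncation cutoff is an implementation choice.

```python
import math

def p_quasiperiodic(b, n_terms=500):
    """Degree distribution of Eq. (1.21) for b = 1 and Eq. (1.22) for b >= 2."""
    phi_inv = 2.0 / (b + math.sqrt(b * b + 4))      # positive root of x^2 + b*x - 1 = 0
    if b == 1:
        p = {2: 1 - phi_inv, 3: 2 * phi_inv - 1}
        p.update({k: phi_inv ** (k - 1) for k in range(5, n_terms)})
    else:
        p = {2: phi_inv, 3: 1 - 2 * phi_inv}
        p.update({b * n + 3: (1 - phi_inv) * phi_inv ** n for n in range(1, n_terms)})
    return p

for b in (1, 2, 3):
    p = p_quasiperiodic(b)
    print(b, round(sum(p.values()), 10), round(sum(k * q for k, q in p.items()), 6))
```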

Let us now proceed with the optimization of the graph entropy. It has to take into account the constraints found in the HV graphs from the circle map: p(2) = φ_b⁻¹, p(3) = 1 − 2φ_b⁻¹, mean connectivity ⟨k⟩ = 4 (as it comes from a non-periodic series), and p(k) = 0 ∀k ≠ bn + 3, n ∈ ℕ. Note that these are in fact the constraints for b ≥ 2; for b = 1, the first two constraints should be p(2) = 1 − φ⁻¹ and p(3) = 2φ⁻¹ − 1 (see Eq. 1.21), but to make this proof as general as possible, we will proceed with b ≥ 2 and leave the case b = 1 as an exercise for the reader. In order to take into account the first two constraints, we define
$$P := 1 - p(2) - p(3) = 1 - \phi_b^{-1} - \left(1 - 2\phi_b^{-1}\right) = \phi_b^{-1}, \qquad (1.23)$$
that is, the sum of p(k) for k > 3. For the third constraint, we define the reduced mean connectivity
$$Q := 4 - 2p(2) - 3p(3) = 1 + 4\phi_b^{-1}; \qquad (1.24)$$
therefore, introducing these constraints in the Lagrangian, we have
$$\mathcal{L} = -\sum_{k=3+bn}^{\infty} p(k)\log p(k) - (\lambda_0-1)\left(\sum_{k=3+bn}^{\infty} p(k) - P\right) - \lambda_1\left(\sum_{k=3+bn}^{\infty} k\,p(k) - Q\right),$$
for which the extremum condition reads
$$\frac{\partial\mathcal{L}}{\partial p(k)} = -\log p(k) - \lambda_0 - \lambda_1 k = 0,$$
and has the solution p(k) = e^{−λ₀−λ₁k}. From this, and using the definition of P, we get
$$P = \sum_{k>3} p(k) = \sum_{k=3+bn} e^{-\lambda_0-\lambda_1 k} = e^{-\lambda_0}\sum_{k=3+bn} e^{-\lambda_1 k} = \phi_b^{-1}.$$
As the infinite sum gives
$$\sum_{k=3+bn} e^{-\lambda_1 k} = \frac{e^{-(3+b)\lambda_1}}{1-e^{-b\lambda_1}}, \qquad (1.25)$$
we get the following relationship between the Lagrange multipliers:
$$e^{-\lambda_0} = \frac{\phi_b^{-1}\left(1-e^{-b\lambda_1}\right)}{e^{-(3+b)\lambda_1}}. \qquad (1.26)$$
Using now the reduced mean connectivity Q, we get
$$Q = \sum_{k>3} k\,p(k) = \sum_{k=3+bn} k\,e^{-\lambda_0-\lambda_1 k} = e^{-\lambda_0}\sum_{k=3+bn} k\,e^{-\lambda_1 k} = 1 + 4\phi_b^{-1}. \qquad (1.27)$$
In order to calculate the sum, we can differentiate Eq. 1.25 with respect to λ₁, which gives
$$\sum_{k=3+bn} k\,e^{-\lambda_1 k} = e^{-3\lambda_1}\left\{ \frac{3e^{-b\lambda_1}}{1-e^{-b\lambda_1}} + \frac{be^{-b\lambda_1}\left(1-e^{-b\lambda_1}\right) + be^{-2b\lambda_1}}{\left(1-e^{-b\lambda_1}\right)^2} \right\}.$$
Substituting this sum into Eq. 1.27 and using Eq. 1.26, we get
$$1 + 4\phi_b^{-1} = \frac{\phi_b^{-1}\left(1-e^{-b\lambda_1}\right)}{e^{-(3+b)\lambda_1}}\, e^{-3\lambda_1}\left\{ \frac{3e^{-b\lambda_1}}{1-e^{-b\lambda_1}} + \frac{be^{-b\lambda_1}\left(1-e^{-b\lambda_1}\right) + be^{-2b\lambda_1}}{\left(1-e^{-b\lambda_1}\right)^2} \right\},$$

which, after some algebra, yields
$$\phi_b + 4 = 3 + b + \frac{be^{-b\lambda_1}}{1-e^{-b\lambda_1}},$$
giving for the second Lagrange multiplier
$$e^{-\lambda_1} = \left(\frac{\phi_b+1-b}{\phi_b+1}\right)^{1/b}.$$
This can be simplified by multiplying and dividing by φ_b and making use of the relationship for metallic numbers φ_b² = 1 + bφ_b, giving
$$e^{-\lambda_1} = \phi_b^{-1/b}.$$
Introducing it in Eq. 1.26, we get for the first Lagrange multiplier the result
$$e^{-\lambda_0} = \frac{\phi_b^{-1}\left(1-e^{-b\lambda_1}\right)}{e^{-(3+b)\lambda_1}} = \frac{\phi_b^{-1}\left(1-\phi_b^{-1}\right)}{\phi_b^{-(3+b)/b}} = \phi_b^{3/b}\left(1-\phi_b^{-1}\right).$$
Therefore, the degree distribution maximizing the graph entropy in the circle map case is given by
$$p(k) = \begin{cases} \phi_b^{-1} & k=2 \\ 1-2\phi_b^{-1} & k=3 \\ \phi_b^{3/b}\left(1-\phi_b^{-1}\right)\phi_b^{-k/b} & k = bn+3,\; n\in\mathbb{N} \\ 0 & \text{otherwise,} \end{cases} \qquad (1.28)$$
which is exactly the same as Eq. 1.22. Q.E.D.

1.3 Renormalization Group Transformations of Horizontal Visibility Graphs

The infinite families of graphs generated by the HV algorithm from time series formed by trajectories obtained along the three routes to chaos in low-dimensional maps are particularly suitable objects for exploration via the renormalization group (RG) transformation. The RG method was originally developed in quantum field theory and in the statistical mechanics of phase transitions to remove unwanted divergences in relevant quantities by redefining parameters iteratively [62, 63]. The method is capable of handling problems involving many length scales, and it was found to be especially tractable and fruitful in nonlinear dynamics, where functional composition appears as the basic operation [47]. The central objects of study of the RG method are self-affine structures, and these appear profusely in the prototypical nonlinear one-dimensional iterated maps we chose to use for the assessment of the HV procedure. Some time ago, the transitions from periodic to chaotic motion present in these maps were studied via the RG method with celebrated results [47]. The transformation ℛ in this case consists of functional composition and rescaling, such as
$$\mathcal{R}\{f_\mu(x)\} = \alpha f_\mu\left[f_\mu\left(\alpha^{-1}x\right)\right], \qquad (1.29)$$
where f_μ(x) is the one-dimensional nonlinear map, for instance the logistic map, with control parameter μ. Repeated application of ℛ modifies the original map f_μ(x) into another map ℛ{f_μ(x)}, a second application into yet another map ℛ^{(2)}{f_μ(x)}, and so on, with ℛ^{(n)}{f_μ(x)} after n applications. A "flow" is generated in the set of maps that terminates when n → ∞ at a fixed-point map f*_μ(x) that satisfies
$$f_\mu^*(x) = \alpha f_\mu^*\left[f_\mu^*\left(\alpha^{-1}x\right)\right], \qquad (1.30)$$
for a given value of α. The fixed points that occur are classified as trivial or nontrivial according to whether they are reached, respectively, for all nonzero values of a small set of variables called relevant, or only for vanishing values of these variables. In our example, there is only one relevant variable, Δμ ≡ μ − μc, where μc is the value of the control parameter μ at which a transition from regular to chaotic behavior takes place. The fixed-point maps enjoy a universal quality in the sense that a whole class of maps lead to and share the properties of these maps. This is the case of unimodal (one-hump) maps of nonlinearity z > 1, where z is the degree of the extremum, so that the logistic map is one member of the universality class of quadratic maps, z = 2. There is an infinite number of irrelevant variables, those that specify the differences between any given map for a given value of z and its nontrivial fixed-point map f*_{μc}(x).

An important feature of HV graphs is that each one of them represents a large number of nonlinear map trajectories, that is, many time series lead to the same HV graph, and each of them captures significant characteristics of a class of trajectories. In our case studies, the three routes to chaos, each HV graph represents an attractor. This is illustrated by the HV graphs obtained for the period-doubling cascade shown in Figure 1.4. These sets of graphs are independent of the details of the unimodal map, including the value of z. Therefore, we anticipate that application of RG transformations directly on the HV graphs would lead to a comprehensive description of their self-similar properties and characterization via their fixed-point graphs, in particular those that represent the transitions to chaos.

A guide for the construction of the RG transformation ℛ appropriate for the HV graphs already described is to observe in them the effect of functional composition of the map under consideration. Thus, we look at the HV graphs obtained for the period-doubling cascade of unimodal maps when μ < μ∞. See the consecutive graphs in Figure 1.4 that are obtained from the original map f_μ via the compositions f_μ^{(2)}, f_μ^{(4)}, …, f_μ^{(2ⁿ)}, …. Note that each of these graphs transforms into the previous one if ℛ is defined as the coarse-graining of every couple of adjacent nodes, where at least one of them has degree k = 2, into a block node that inherits the links of the previous two nodes (see Figure 1.11). That is, ℛ{G(1, n)} = G(1, n − 1), and therefore an iteration of this process yields an RG flow that converges to the trivial fixed point ℛ^{(n)}{G(1, n)} = G(1, 0) ≡ G₀ = ℛ{G₀}. This is the stable (trivial) fixed point of the RG flow for all μ < μ∞. We note that there is also only one relevant variable in our RG scheme, represented by the reduced control parameter Δμ ≡ μc − μ, where in this case μc = μ∞. Hence, in order to identify a nontrivial fixed point, we set Δμ = 0 or, equivalently, n → ∞, where the structure of the HV graph turns out to be completely self-similar under ℛ.

Figure 1.11 Renormalization process and network RG flow structure. (a) Illustration of the renormalization process: a node with degree k = 2 is coarse-grained with one of its neighbors (indistinctively) into a block node that inherits the links of both nodes. This process coarse-grains every such node at each renormalization step. (b) Example of an iterated renormalization process in a sample Feigenbaum graph at a periodic window after period-doubling bifurcations. (c) RG flow diagram.
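To make the graph-theoretical transformation ℛ concrete, here is a minimal Python sketch (ours). The left-to-right sweep order and the tie-breaking are implementation choices not specified in the text, so this is an illustration of the coarse-graining rule rather than the authors' algorithm; the example series approximates a period-4 orbit of the logistic map.

```python
def hv_adjacency(series):
    """Adjacency sets of the horizontal visibility graph (O(N^2), for clarity)."""
    n = len(series)
    adj = [set() for _ in range(n)]
    for i in range(n):
        running_max = float("-inf")
        for j in range(i + 1, n):
            if running_max < min(series[i], series[j]):
                adj[i].add(j)
                adj[j].add(i)
            running_max = max(running_max, series[j])
            if series[j] >= series[i]:
                break
    return adj

def rg_step(adj):
    """One sweep of the coarse-graining: merge adjacent (consecutive-in-time)
    node pairs in which at least one node has degree 2; the block node
    inherits the links of both members."""
    n = len(adj)
    group = list(range(n))                 # block label of each original node
    i = 0
    while i < n - 1:
        if len(adj[i]) == 2 or len(adj[i + 1]) == 2:
            group[i + 1] = group[i]        # merge node i+1 into the block of node i
            i += 2                         # each node is merged at most once per sweep
        else:
            i += 1
    relabel = {g: k for k, g in enumerate(sorted(set(group)))}
    new_adj = [set() for _ in range(len(relabel))]
    for u in range(n):
        for v in adj[u]:
            a, b = relabel[group[u]], relabel[group[v]]
            if a != b:
                new_adj[a].add(b)
                new_adj[b].add(a)
    return new_adj

# Approximate period-4 orbit of the logistic map; the flow should drive the
# graph toward the chain G0 (interior degrees 2), up to finite-size effects.
series = [0.5, 0.8746, 0.3834, 0.8271] * 16
g = hv_adjacency(series)
for step in range(5):
    print(f"step {step}: {len(g)} nodes, degrees present: {sorted({len(s) for s in g})}")
    g = rg_step(g)
```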

1.3.1 Tangent Bifurcation

A common description of the tangent bifurcation [47] that mediates the transition between a chaotic attractor and an attractor of period T starts with the composition f^{(T)} of a one-dimensional map f, that is, the logistic map, at such bifurcation, followed by an expansion around the neighborhood of one of the T points tangent to the line with unit slope. In general, in the neighborhood of the bifurcation, we have
$$x' = f^{(T)}(x) = x + u\,\mathrm{sign}(x)\,x^{z} + \cdots,\quad z > 1, \qquad (1.31)$$
where the most common value for the degree of nonlinearity at tangency is z = 2, obtained when the map is analytic at x = 0 with nonzero second derivative. When a small constant term ε ≲ 0 is added to Eq. (1.31), we observe in the original map f regular period-T orbits, but for ε ≳ 0, the dynamics associated with f consists of quasi-regular motion while x_t ≃ 0, named laminar episodes, interrupted by irregular motion until there is reinjection at a position x < 0, which leads to a second laminar episode, and so on, after reinjections at varying positions x < 0. The succession of laminar episodes and irregular bursts is known as intermittency of type I [47]. This can be observed at the windows of periodicity of the logistic map that open with period T for values of the control parameter μ = μ_T > μ∞, in which case ε = μ_T − μ. For convenience, we relabel μ_T ≡ μc. When ε = 0, trajectories initiated at x₀ < 0 evolve monotonically toward x = 0, performing asymptotically a period-T orbit in the original map f(x), while trajectories initiated at x₀ > 0 move away, also monotonically, from x = 0, escaping soon from the local map in Eq. (1.31). In the original map f this leads, after a finite number of iterations, to reinjection at a position x < 0 of f^{(T)}(x), followed by repetition of the case x₀ < 0. The RG fixed-point map at the tangent bifurcation, the solution of Eq. (1.30), was obtained in analytical closed form in Ref. [47], together with the specific value α = 2^{1/(1−z)}, which upon expansion around x = 0 reproduces Eq. (1.31). Here, we are interested in reporting the effect of the transformation ℛ on the intermittent graphs G(ε) already described. Results for the RG flows include the following [56]:

(i) When ε < 0 (μ ≳ μc), trajectories are periodic and every HV graph trivially renormalizes toward the chain graph G₀ (an infinite chain with k = 2 for all nodes) [56]. The graph G₀ is invariant under renormalization, ℛ{G₀} = G₀, and indeed constitutes a trivial (attractive) fixed point of the RG flow, ℛ^{(n)}{G(ε < 0)} = G₀.

(ii) When ε > 0 (μ ≲ μc), repeated RG transformations progressively eliminate the links in the graph associated with correlated elements in the time series,

leading ultimately to the HV graph that corresponds to a random time series, G_rand. The links between laminar nodes stem mainly from temporally correlated data, whereas the links between burst and peak nodes originate from uncorrelated segments of the time series. If the laminar episodes are eliminated from the time series, the burst and reinjection data values form a new time series, which upon renormalization leads to the random time series. We have lim_{n→∞} ℛ^{(n)}{G(ε > 0)} = G_rand, where G_rand is the HV graph associated with a random uncorrelated process with the aforementioned graph properties. This constitutes the second trivial (attractive) fixed point of the RG flow [56].

(iii) When ε = 0 (μ = μc), the HV graph generated by trajectories at tangency converges, after repeated application of ℛ, to a nontrivial fixed point. This occurs after only two steps when T = 3, ℛ²{G(ε = 0)} = G_c = ℛ{G_c}, and remains invariant under ℛ afterward. This feature can be demonstrated by explicit application of ℛ upon G(ε = 0) (see Figure 1.12 for a graphical illustration of this process). The fixed-point graph G_c is the HV graph of a monotonically decreasing time series bounded at infinity by a large value, that of the initial position x₀. The fixed-point graph G_c is unstable under perturbations in ε, and it is thus a saddle point of the RG flow, attractive only along the critical manifold [spanned by G(ε = 0) and its replicas within other periodic windows of period T]. The RG flow diagram is shown in Ref. [56].

Figure 1.12 Illustration of the renormalization operator ℛ applied on the HV graph at ε = 0. This graph renormalizes, after two iterations of ℛ, into an HV graph G_c which is itself (i) invariant under ℛ and (ii) unstable under perturbations in ε, thus constituting a nontrivial (saddle) fixed point of the graph-theoretical RG flow.

1.3.2 Period-Doubling Accumulation Point

A classic example of a functional-composition RG fixed-point map is the solution of Eq. (1.30) associated with the period-doubling accumulation points shared by all unimodal maps [64],
$$f_\mu(x) = 1 - \mu|x|^{z},\quad z > 1,\; -1 \le x \le 1,\; 0 \le \mu \le 2. \qquad (1.32)$$
In practice, it is often numerically illustrated by use of a single map, the quadratic z = 2 logistic map with the control parameter located at μ = μ∞(1) = 1.401155189092, the value for the accumulation point of the main period-doubling cascade [47, 64]. Iterating ℛ on the Feigenbaum graphs, we can trace the RG flows of the period-doubling and band-splitting graphs G(Δμ) already described. A complete schematic representation of the RG flows can be seen in Figure 1.11. Results include the following [21, 49]:

(i) We have seen that when Δμ(1) < 0 (μ < μ∞(1)), the RG flow produced by repeated application of ℛ on the period-doubling cascade of graphs G(1, n) leads to G₀ ≡ G(1, 0), the infinite chain with k = 2 for all nodes, that is, ℛ{G₀} = G₀ is the trivial (attractive) fixed point of this flow.

(ii) We have also seen that when Δμ(1) = 0 (μ = μ∞(1)), the graph G(1, ∞) ≡ G∞ that represents the accumulation point of the cascade of period 2^∞ is the nontrivial (repulsive) fixed point of the RG flow, ℛ{G∞} ≡ G∞. In connection with this, let p_t(k) be the degree distribution of a generic Feigenbaum graph G_t in the period-doubling cascade after t iterations of ℛ, and point out that the RG operation ℛ{G_t} ≡ G_{t+1} implies the recurrence relation (1 − p_t(2)) p_{t+1}(k) = p_t(k + 2), whose fixed point coincides with the degree distribution found for the period-doubling cascade. This confirms that the nontrivial fixed point of the flow is indeed G∞.

(iii) When Δμ(1) > 0 (μ > μ∞(1)) and μ does not fall within a window of periodicity m > 1, under the same RG transformation the self-affine structure of the family of 2ⁿ-band attractors yields ℛ{G_μ(1, n)} = G_μ(1, n − 1), generating an RG flow that converges to the Feigenbaum graph associated with the first chaotic band, ℛ^{(n)}{G_μ(1, n)} = G_μ(1, 0). Repeated application of ℛ breaks temporal correlations in the series, and the RG flow leads to a second trivial fixed point, ℛ^{(∞)}{G_μ(1, 0)} = G_rand = ℛ{G_rand}, where G_rand is the HV graph generated by a purely uncorrelated random process. As mentioned above, this graph has a universal degree distribution

p(k) = (1/3)(2/3)^{k−2}, independent of the probability density underlying the random process.

(iv) When Δμ(m) < 0 (Δμ(m) ≡ μ∞(m) − μ), where μ∞(m) is the accumulation point of the period-doubling cascades in the window of periodicity with initial period m, since the RG transformation specifically applies to nodes with degree k = 2, the initial applications of ℛ only change the core structure of the graph associated with the specific value of m (see Figure 1.11 for an illustrative example). The RG flow will therefore converge to the trivial fixed point G₀ via the initial path ℛ^{(p)}{G(m, n)} = G(1, n), with p ≤ m, whereas it converges to the trivial fixed point G_rand for G_μ(m, n) via ℛ^{(p)}{G_μ(m, n)} = G_μ(1, n). In the limit n → ∞, the RG flow proceeds toward the nontrivial fixed point G∞ via the path ℛ^{(p)}{G(m, ∞)} = G(1, ∞). Incidentally, extending the definition of the reduced control parameter to Δμ(m) ≡ μ∞(m) − μ, the family of accumulation points is found at Δμ(m) = 0.

In summary, the repeated application of the RG transformation ℛ generates flows terminating at two different trivial fixed points, G₀ and G_rand, or at the nontrivial fixed point G∞. The graph G₀ is a chain graph, in which every node has two links, G_rand is a graph associated with a purely random uncorrelated process, whereas G∞ is a self-similar graph that represents the onset of chaos. The RG properties within the periodic windows are incorporated into a general RG flow diagram. As is common to all RG applications, a crossover phenomenon between these fixed points is present when n is large (or μ ≃ μ∞), for both μ < μ∞ and μ > μ∞. In both cases, the graphs G(1, n − j) and G_μ(1, n − j) with j ≪ n closely resemble the self-similar G∞ (obtained only when μ = μ∞) for a range of values of the number j of repeated applications of the transformation ℛ, until a clear departure takes place toward G₀ or G_rand when j becomes comparable to n. Hence, for instance, the graph ℛ^{(j)}{G_μ(1, n)} will only show its true chaotic nature (and therefore converge to G_rand) once j and n are of the same order. In other words, this happens once its degree distribution becomes dominated by the top contribution of p_μ(n, k) (alternatively, once the core of the graph, related to the chaotic-band structure and the order of visits to chaotic bands, is removed by the iteration of the renormalization process).
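A one-line check (ours) that the exponential distribution of Eq. 1.14 is indeed a fixed point of the recurrence quoted in item (ii) above:

```python
p = lambda k: 0.5 ** (k // 2)                        # Eq. (1.14), even k
err = max(abs((1 - p(2)) * p(k) - p(k + 2)) for k in range(2, 200, 2))
print(err)                                           # 0.0 (exact for powers of two)
```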

1.3.3 Quasi-Periodicity

As with the intermittency and the period-doubling routes, the quasi-periodic route to chaos exhibits universal scaling properties, and an RG approach, analogous to that for the tangent bifurcation and the period-doubling cascade, has been carried out for the critical circle map [47]. The fixed-point map f*(θ) of an RG transformation that consists of functional composition and rescaling appropriate for maps with a zero-slope cubic inflection point satisfies
$$f^*(\theta) = \alpha_{gm} f^*\left(\alpha_{gm} f^*\left(\alpha_{gm}^{-2}\theta\right)\right), \qquad (1.33)$$
where (for the golden mean route) α_gm = −1.288575 is a universal constant [47]. We proceed as above and apply the same RG graph transformation ℛ to the families of HV graphs that represent the quasi-periodic route to chaos associated with the golden mean [65]. Then, we consider other routes associated with other metallic mean numbers. The results are as follows:

(i) We have ℛ{G_{φ⁻¹}(n)} = G_{1−φ⁻¹}(n − 1) and ℛ{G_{1−φ⁻¹}(n)} = G_{φ⁻¹}(n − 1), and hence the RG flow alternates between the two mirror routes described previously. If we define the "time reverse" operator by $\overline{G}_{\phi^{-1}}(n) \equiv G_{1-\phi^{-1}}(n)$, the transformation becomes $\overline{\mathcal{R}}\{G_{\phi^{-1}}(n)\} = G_{\phi^{-1}}(n-1)$ and $\overline{\mathcal{R}}\{G_{1-\phi^{-1}}(n)\} = G_{1-\phi^{-1}}(n-1)$ (with $\overline{\mathcal{R}}$ denoting ℛ followed by time reversal). Repeated application of ℛ yields two RG flows that converge, for n finite, to the trivial fixed point G₀ (a graph with p(2) = 1). On the contrary, the quasi-periodic graphs, the accumulation points n → ∞, are nontrivial fixed points of the RG flow: $\overline{\mathcal{R}}\{G_{\phi^{-1}}(\infty)\} = G_{\phi^{-1}}(\infty)$ and $\overline{\mathcal{R}}\{G_{1-\phi^{-1}}(\infty)\} = G_{1-\phi^{-1}}(\infty)$. However, the above RG procedure works only in the case of the golden ratio route. This can be noted by looking at the silver ratio route shown in Figure 1.10. For this reason, the RG transformation was extended to other irrational numbers by constructing an explicit algebraic version of ℛ and then applying it to the Farey fractions associated with the graphs [65]. This is
$$R\left(\frac{p}{q}\right) = \begin{cases} R_1\left(\dfrac{p}{q}\right) = \dfrac{p}{q-p} & \text{if } \dfrac{p}{q} < \dfrac{1}{2}, \\[2ex] R_2\left(\dfrac{p}{q}\right) = \dfrac{q-p}{p} & \text{if } \dfrac{p}{q} > \dfrac{1}{2}, \end{cases} \qquad (1.34)$$
along with the algebraic analog of the "time reverse" operator, $\overline{R}(x) = 1 - R(x)$. Observe that along the golden ratio route, fractions are always greater than 1/2, and we can therefore renormalize this route by setting
$$R\left(\frac{F_{n-1}}{F_n}\right) = R_2\left(\frac{F_{n-1}}{F_n}\right) = \frac{F_{n-2}}{F_{n-1}}, \qquad (1.35)$$
whose fixed-point equation R(x) = x is x² + x − 1 = 0, with φ⁻¹ a solution of it. The generalization of this scheme to the metallic number ratios, irrational numbers with simple continued fractions, is obtained by considering the routes {ωₙ = F_{n−1}/F_n}_{n=1,2,3,…} with F_n = bF_{n−1} + F_{n−2}, F₀ = 1, F₁ = 1 and b a natural number. It can easily be observed that lim_{n→∞} F_{n−1}/F_n = (−b + √(b² + 4))/2, which is a solution of the equation x² + bx − 1 = 0. Interestingly, all the positive solutions of the above family of quadratic equations happen to be positive quadratic irrationals in [0, 1] with pure periodic continued fraction representation: φ_b⁻¹ = [b, b, b, …] (b = 1 corresponds to the golden route, b = 2 to the silver route, etc.). Every b > 1 fulfills the condition F_{n−1}/F_n < 1/2, and, as a result, we have
$$R\left(\frac{F_{n-1}}{F_n}\right) = R_1\left(\frac{F_{n-1}}{F_n}\right) = \frac{F_{n-1}}{(b-1)F_{n-1} + F_{n-2}}. \qquad (1.36)$$

The transformation R₁ can only be applied (b − 1) times before the result becomes greater than 1/2, so the subsequent application of R₂ followed by reversion yields
$$R^{(b)}\left(\frac{F_{n-1}}{F_n}\right) = R_2\left[R_1^{(b-1)}\left(\frac{F_{n-1}}{F_n}\right)\right] = \frac{F_{n-2}}{F_{n-1}}. \qquad (1.37)$$
It is easy to demonstrate by induction that
$$R_1^{(b-1)}(x) = \frac{x}{1-(b-1)x}, \qquad (1.38)$$
whose fixed-point equation R^{(b)}(x) = R₂[R₁^{(b−1)}(x)] = x leads in turn to x² + bx − 1 = 0, with φ_b⁻¹ a solution of it. We can proceed in an analogous way for the symmetric case ωₙ = 1 − (F_{n−1}/F_n), but, as the sense of the inequalities with respect to 1/2 is reversed, the roles of the operators R₁ and R₂ must be exchanged. The RG flow results are:

(ii) The graphs for fixed b ≥ 2 are renormalized via R^{(b)}{G_{φ_b⁻¹}(n)} = G_{φ_b⁻¹}(n − 1), and, as before, it is found that iteration of this process yields two RG flows that converge to the trivial fixed point G₀ for n finite. The quasi-periodic graphs, reached as accumulation points (n → ∞), act as nontrivial fixed points of the RG flow, since R^{(b)}{G_{φ_b⁻¹}(∞)} = G_{φ_b⁻¹}(∞).

(iii) Again for fixed b ≥ 2, it is found, with the help of the construction process illustrated in Figure 1.10, that p_∞(2) = φ_b⁻¹, p_∞(3) = 1 − 2φ_b⁻¹, and p_∞(k ≠ bn + 3) = 0, n = 1, 2, 3, …, whereas p_∞(k = bn + 3), n = 1, 2, 3, …, can be obtained from the condition of RG fixed-point invariance of the distribution, as it implies a balance equation p_∞(k) = φ_b⁻¹ p_∞(k + b), whose solution has the form of an exponential tail. The degree distribution p_∞(k) for these sets of quasi-periodic graphs was given earlier.
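The algebraic operators R₁ and R₂ are easy to experiment with. The short sketch below (ours) applies the composite R^(b) of Eq. 1.37 to a metallic-mean convergent and checks that φ_b⁻¹ is a fixed point; variable names are illustrative.

```python
from fractions import Fraction
import math

def R1(x):          # R1(p/q) = p/(q - p), i.e. x/(1 - x)
    return x / (1 - x)

def R2(x):          # R2(p/q) = (q - p)/p, i.e. 1/x - 1
    return 1 / x - 1

def R_b(x, b):      # composite R^(b): R2 applied after (b - 1) applications of R1
    for _ in range(b - 1):
        x = R1(x)
    return R2(x)

for b in (2, 3):
    F = [1, 1]
    for _ in range(10):                        # F_n = b*F_{n-1} + F_{n-2}
        F.append(b * F[-1] + F[-2])
    omega_n = Fraction(F[-2], F[-1])
    print(b, omega_n, "->", R_b(omega_n, b))   # expect F_{n-2}/F_{n-1}
    phi_b_inv = (-b + math.sqrt(b * b + 4)) / 2
    print("  fixed-point residual:", abs(R_b(phi_b_inv, b) - phi_b_inv))
```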

1.3.4 Entropy Extrema and RG Transformation

An important question pointed out some time ago [66] is whether there exists a connection between the extremal properties of entropy expressions and the RG approach, namely, whether the fixed points of RG flows can be obtained through a process of entropy optimization, adding to the RG approach a variational quality. The families of HV graphs obtained for the three routes to chaos offer a valuable opportunity to examine this issue. As we have seen, they possess simple closed expressions for the degree distribution p(k), and through them there is, when not analytical, exact quantitative access to their entropy
$$h[p(k)] = -\sum_{k} p(k)\log p(k). \qquad (1.39)$$

On the other hand, these families have been ordered along RG flows and their basic fixed points have been determined. The answer provided by HV graphs to the question posed above is clearly in the affirmative. We give some details below.


1.3.4.1 Intermittency

It is found that the entropy h reaches a minimum value at tangency and that this value is retained for ε < 0 [56]. The approach to the minimum, at h(ε = 0) = log 3, for the window of periodicity T = 3 of the logistic map can be seen in Figure 1.8. This value is maintained within the window, provided |ε| remains below the period-doubling bifurcations that take place there. Hence, entropy reaches a global minimum for the HV graph at tangency. Next, we inquire about the effect of the RG transformations on h. The entropy at the nontrivial fixed point vanishes, as h[p_{G_c}(k)] → 0 when the number of nodes N → ∞, that is, the RG reduces h when ε = 0. Also, the RG transformations increase h when ε > 0 (as h[p_{G_rand}(k)] = log(27/4) [21, 49]) and reduce it when ε < 0 (since h[p_{G_0}(k)] = 0 [21, 49]). When ε > 0, the renormalization process of removing at each stage all nodes with k = 2 leads to a limiting renormalized system that consists only of a collection of uncorrelated variables, generating an irreversible flow along which the entropy grows. On the contrary, when ε < 0, renormalization increases the fraction of nodes with degree k = 2 at each stage, driving the graph structure toward the simple chain G₀ and thus decreasing its entropy to its minimum value.

1.3.4.2 Period Doubling

As we have seen, the degree distribution p(k) that maximizes h is exactly p(k) = (1/3)(2/3)^{k−2}, which corresponds to the distribution for the second trivial fixed point of the RG flow, G_rand. Alternatively, with the incorporation of the additional constraint that allows only even values of the degree (the topological restriction for Feigenbaum graphs G(1, n)), entropy maximization yields a degree distribution that coincides with the one found at the nontrivial fixed point of the RG flow, G∞. Finally, the degree distribution that minimizes h trivially corresponds to G₀, the first trivial fixed point of the RG flow. Remarkably, these results indicate that the fixed-point structure of the RG flow is obtained via optimization of the entropy for the entire family of networks. The network entropy is trivially minimized for a degree distribution p(2) = 1, that is, at G₀ with h = 0. The entropy h is an increasing function of k̄, attaining its largest value for the upper-bound value k̄ = 4, for which the distribution reduces to p(k) = (1/2)^{k/2}, k = 2, 4, 6, …. We conclude that the maximum entropy of the entire family of Feigenbaum graphs (when we require that odd values of the degree are not allowed) is achieved at the accumulation point, that is, at the nontrivial fixed point G∞ of the RG flow. These results indicate that the fixed-point structure of an RG flow can be obtained from an entropy optimization process, confirming the aforementioned connection.

1.3.4.3 Quasi-periodicity

Notably, all the aforementioned RG flow directions and fixed points for this route to chaos can be derived directly from the information contained in the degree distribution via optimization of the graph entropy functional h[p(k)]. The optimization is for a fixed b and takes into account the constraints p(2) = φ_b⁻¹, p(3) = 1 − 2φ_b⁻¹, the maximum possible mean connectivity k̄ = 4, and p(k) = 0 for all k ≠ bn + 3, n = 1, 2, 3, …. The degree distributions p(k) that maximize h[p(k)] can be proven to be exactly the connectivity distributions in Eqs (1.21) and (1.22) for the quasi-periodic graphs at the accumulation points found above. This establishes a functional relation between the fixed points of the RG flow and the extrema of h[p(k)], as was verified for the intermittency and the period-doubling routes. Thus, we observe the familiar picture of the RG treatment of a model phase transition: two trivial fixed points that represent disordered and ordered, or high- and low-temperature, phases, and a nontrivial fixed point with scale-invariant properties that represents the critical point. There is only one relevant variable, Δμ = μc − μ, which must vanish to enable the RG transformation to access the nontrivial fixed point.

1.4 Summary

The visibility algorithm is a tool that maps time series into graphs. This algorithm has been applied with interesting results to several research areas. In this chapter, we have introduced several definitions of entropy applied to visibility graphs: the graph entropy and the graph Kolmogorov–Sinai entropy. These entropies, defined on the graph, are the counterparts of the Shannon entropy and the Kolmogorov–Sinai entropy of the time series. In fact, we have seen that the former are very good proxies of the latter, and we have found that there is a very good agreement between these entropies and the Lyapunov exponent of the corresponding chaotic time series, in view of the Pesin theorem. Graph entropy also allows us to identify the critical points in chaotic maps, via optimization of this entropy. We have seen that critical points correspond to extrema in the process of graph entropy maximization, which reproduces the degree distributions of the visibility graphs at the critical points and at two trivial points: the random series and the constant series. Finally, we have defined some renormalization processes in the visibility graphs that generate flows leading to the same points as the graph entropy maximization does: two fixed points that represent ordered and disordered phases (i.e., the constant series and the random series, respectively), and a nontrivial fixed point that represents the critical point. The property that is seldom observed [66] is that an entropy functional, in the present case h[p(k)], varies monotonically along the RG flows and is extremal at the fixed points. A salient feature of the HV studies of the routes to chaos in low-dimensional nonlinear iterated maps, intermittency [56], period doubling [21, 49], and quasi-periodicity [65], is the demonstration that the entropy functional h[p(k)] attains extremal (maxima, minima, or saddle-point) values at the RG fixed points.


1.5 Acknowledgments

A. Robledo acknowledges support by DGAPA-UNAM-IN103814 and CONACyT-CB-2011-167978 (Mexican Agencies). F.J. Ballesteros acknowledges support by the project AYA2013-48623-C2-2 and B. Luque by the project FIS2013-41057-P both from the Spanish Ministry of Economy and Competitiveness.

References

1. Lacasa, L., Luque, B., Ballesteros, F., Luque, J., and Nuño, J.C. (2008) From time series to complex networks: the visibility graph. Proc. Natl. Acad. Sci. U.S.A., 105 (13), 4972–4975.
2. Núñez, A., Lacasa, L., Luque, B., and Gómez, J.P. (2011) Visibility Algorithms, in Graph Theory, InTech, ISBN: 979-953-307-303-2.
3. Lacasa, L. and Luque, B. (2011) Mapping time series to networks: a brief overview of visibility algorithms, in Computer Science Research and Technology, vol. 3 (ed. J.P. Bauer), Nova Publishers, ISBN: 978-1-61122-074-2.
4. Zhang, J. and Small, M. (2006) Complex network from pseudoperiodic time series: topology versus dynamics. Phys. Rev. Lett., 96 (23), 238701.
5. Xu, X., Zhang, J., and Small, M. (2008) Superfamily phenomena and motifs of networks induced from time series. Proc. Natl. Acad. Sci. U.S.A., 105 (50), 19601–19605.
6. Donner, R.V., Zou, Y., Donges, J.F., Marwan, N., and Kurths, J. (2010) Recurrence networks: a novel paradigm for nonlinear time series analysis. New J. Phys., 12 (3), 033025.
7. Donner, R.V., Small, M., Donges, J.F., Marwan, N., Zou, Y., Xiang, R., and Kurths, J. (2010) Recurrence-based time series analysis by means of complex network methods. Int. J. Bifurcation Chaos, 21 (4), 1019–1046.
8. Donner, R.V., Donges, J.F., Zou, Y., and Feldhoff, J.H. (2015) Complex network analysis of recurrences, in Recurrence Quantification Analysis, Springer International Publishing, pp. 101–163.
9. Campanharo, A.S.L.O., Sirer, M.I., Malmgren, R.D., Ramos, F.M., and Amaral, L.A.N. (2011) Duality between time series and networks. PLoS ONE, 6 (8), e23378.
10. Shirazi, A.H., Jafari, G.R., Davoudi, J., Peinke, J., Tabar, M.R.R., and Sahimi, M. (2009) Mapping stochastic processes onto complex networks. J. Stat. Mech. Theory Exp., 2009 (07), P07046.
11. Strozzi, F., Zaldívar, J.M., Poljansek, K., Bono, F., and Gutiérrez, E. (2009) From Complex Networks to Time Series Analysis and Viceversa: Application to Metabolic Networks. JRC Scientific and Technical Reports, EUR 23947, JRC52892.
12. Haraguchi, Y., Shimada, Y., Ikeguchi, T., and Aihara, K. (2009) Transformation from complex networks to time series using classical multidimensional scaling, in Proceedings of the 19th International Conference on Artificial Neural Networks (ICANN 2009), Springer-Verlag, Berlin, Heidelberg.
13. Shimada, Y., Ikeguchi, T., and Shigehara, T. (2012) From networks to time series. Phys. Rev. Lett., 109 (15), 158701.
14. Watts, D.J. and Strogatz, S.H. (1998) Collective dynamics of 'small-world' networks. Nature, 393, 440–442.
15. Gao, Z. and Jin, N. (2009) Complex network from time series based on phase space reconstruction. Chaos, 19 (3), 033137.
16. Sinatra, R., Condorelli, D., and Latora, V. (2010) Networks of motifs from sequences of symbols. Phys. Rev. Lett., 105 (17), 178702.
17. Sun, X., Small, M., Zhao, Y., and Xue, X. (2014) Characterizing system dynamics with a weighted and directed network constructed from time series data. Chaos, 24, 024402, doi: 10.1063/1.4868261.
18. Luque, B., Lacasa, L., Ballesteros, F., and Luque, J. (2009) Horizontal visibility graphs: exact results for random time series. Phys. Rev. E, 80 (4), 046103.
19. Gutin, G., Mansour, T., and Severini, S. (2011) A characterization of horizontal visibility graphs and combinatorics on words. Physica A, 390 (12), 2421–2428.
20. Lacasa, L. and Toral, R. (2010) Description of stochastic and chaotic series using visibility graphs. Phys. Rev. E, 82 (3), 036120.
21. Luque, B., Lacasa, L., Ballesteros, F.J., and Robledo, A. (2011) Feigenbaum graphs: a complex network perspective of chaos. PLoS ONE, 6 (9), e22411.
22. Núñez, A.M., Lacasa, L., Valero, E., Gómez, J.P., and Luque, B. (2012) Detecting series periodicity with horizontal visibility graphs. Int. J. Bifurcation Chaos, 22 (7), 1250160.
23. Aguilar-San Juan, B. and Guzmán-Vargas, L. (2013) Earthquake magnitude time series: scaling behavior of visibility networks. Eur. Phys. J. B, 86, 454.
24. Telesca, L. and Lovallo, M. (2012) Analysis of seismic sequences by using the method of visibility graph. Eur. Phys. Lett., 97 (5), 50002.
25. Telesca, L., Lovallo, M., and Laszlo, T. (2014) Visibility graph analysis of 2002–2011 Pannonian seismicity. Physica A, 416, 219–224.
26. Telesca, L., Lovallo, M., Ramirez-Rojas, A., and Flores-Marquez, L. (2014) Relationship between the frequency magnitude distribution and the visibility graph in the synthetic seismicity generated by a simple stick-slip system with asperities. PLoS ONE, 9 (8), e106233, doi: 10.1371/journal.pone.0106233.
27. Elsner, J.B., Jagger, T.H., and Fogarty, E.A. (2009) Visibility network of United States hurricanes. Geophys. Res. Lett., 36 (16), L16702.
28. Liu, C., Zhou, W.-X., and Yuan, W.-K. (2010) Statistical properties of visibility graph of energy dissipation rates in three-dimensional fully developed turbulence. Physica A, 389 (13), 2675–2681.
29. Yang, Y., Jianbo, W., Yang, H., and Mang, J. (2009) Visibility graph approach to exchange rate series. Physica A, 388 (20), 4431–4437.
30. Qian, M.-C., Jiang, Z.-Q., and Zhou, W.-X. (2010) Universal and nonuniversal allometric scaling behaviors in the visibility graphs of world stock market indices. J. Phys. A: Math. Theor., 43 (33), 335002.
31. Shao, Z.-G. (2010) Network analysis of human heartbeat dynamics. Appl. Phys. Lett., 96 (7), 073703.
32. Dong, Z. and Li, X. (2010) Comment on "Network analysis of human heartbeat dynamics". Appl. Phys. Lett., 96 (26), 266101.
33. Ahmadlou, M., Adeli, H., and Adeli, A. (2010) New diagnostic EEG markers of the Alzheimer's disease using visibility graph. J. Neural Transm., 117 (9), 1099–1109.
34. Rashevsky, N. (1955) Life, information theory, and topology. Bull. Math. Biophys., 17 (3), 229–235.
35. Trucco, E. (1956) A note on the information content of graphs. Bull. Math. Biophys., 18 (2), 129–135.
36. Mowshowitz, A. (1968) Entropy and the complexity of graphs (I–IV). Bull. Math. Biophys., 30 (3), 387–414.
37. Körner, J. (1973) Coding of an information source having ambiguous alphabet and the entropy of graphs, in Transactions of the 6th Prague Conference on Information Theory, 1971, Academia, Prague, pp. 411–425.
38. Dehmer, M. and Mowshowitz, A. (2011) A history of graph entropy measures. Inf. Sci., 181 (1), 57–78.
39. Shannon, C.E. (1948) A mathematical theory of communication. Bell Syst. Tech. J., 27 (3), 379–423.
40. Kolmogorov, A.N. (1965) Three approaches to the quantitative definition of information. Probab. Inf. Transm., 1 (1), 1–7.
41. Sinai, Ya.G. (1959) On the concept of entropy for dynamical systems. Dokl. Akad. Nauk SSSR, 124, 768–771 (in Russian).
42. Pesin, Y. (1997) Dimension Theory in Dynamical Systems: Contemporary Views and Applications, University of Chicago Press, Chicago.
43. Laguës, M. and Lesne, A. (2011) Invariances d'échelle, 2nd edn, Belin, Paris; English translation: Scaling, Springer, Berlin.
44. Castiglione, P., Falcioni, M., Lesne, A., and Vulpiani, A. (2008) Chaos and Coarse-Graining in Statistical Mechanics, Cambridge University Press, Cambridge.
45. Karamanos, K. and Nicolis, G. (1999) Symbolic dynamics and entropy analysis of Feigenbaum limit sets. Chaos, Solitons Fractals, 10 (7), 1135–1150.
46. Peitgen, H.O., Jürgens, H., and Saupe, D. (1992) Chaos and Fractals: New Frontiers of Science, Springer-Verlag, New York.
47. Schuster, H.G. (1988) Deterministic Chaos: An Introduction, 2nd revised edn, VCH Publishers, Weinheim.
48. Strogatz, S.H. (1994) Nonlinear Dynamics and Chaos, Perseus Books Publishing, LLC.
49. Luque, B., Lacasa, L., Ballesteros, F.J., and Robledo, A. (2012) Analytical properties of horizontal visibility graphs in the Feigenbaum scenario. Chaos, 22 (1), 013109.
50. Schroeder, M. (1991) Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise, Freeman and Co., New York.
51. Crutchfield, J.P., Farmer, J.D., and Huberman, B.A. (1982) Fluctuations and simple chaotic dynamics. Phys. Rep., 92 (2), 45–82.
52. Maurer, J. and Libchaber, A. (1980) Effect of the Prandtl number on the onset of turbulence in liquid 4He. J. Phys. Lett., 41 (21), 515–518.
53. Pomeau, Y., Roux, J.C., Rossi, A., Bachelart, S., and Vidal, C. (1981) Intermittent behaviour in the Belousov–Zhabotinsky reaction. J. Phys. Lett., 42 (13), 271–273.
54. Bergé, P., Dubois, M., Manneville, P., and Pomeau, Y. (1980) Intermittency in Rayleigh–Bénard convection. J. Phys. Lett., 41 (15), 341–345.
55. Manneville, P. and Pomeau, Y. (1980) Intermittent transition to turbulence in dissipative dynamical systems. Commun. Math. Phys., 74 (2), 189–197.
56. Núñez, A., Luque, B., Lacasa, L., Gómez, J.P., and Robledo, A. (2013) Horizontal visibility graphs generated by type-I intermittency. Phys. Rev. E, 87 (5), 052801.
57. Hirsch, J.E., Huberman, B.A., and Scalapino, D.J. (1982) Theory of intermittency. Phys. Rev. A, 25 (1), 519–532.
58. Kim, M.C., Kwon, O.J., Lee, E.K., and Lee, H. (1994) New characteristic relations in type-I intermittency. Phys. Rev. Lett., 73 (4), 525–528.
59. Hilborn, R.C. (1994) Chaos and Nonlinear Dynamics, Perseus Books Publishing, LLC.
60. Shechtman, D., Blech, I., Gratias, D., and Cahn, J.W. (1984) Metallic phase with long-range orientational order and no translational symmetry. Phys. Rev. Lett., 53 (20), 1951–1953.
61. Hao, B.-H. and Zeng, W.-M. (1998) Applied Symbolic Dynamics and Chaos, World Scientific Publishing Co., Singapore.
62. Zinn-Justin, J. (2002) Quantum Field Theory and Critical Phenomena, Clarendon Press, Oxford, ISBN: 0-19-850923-5.
63. Maris, H.J. and Kadanoff, L.P. (1978) Teaching the renormalization group. Am. J. Phys., 46 (6), 652–657.
64. van der Weele, J.P., Capel, H.W., and Kluiving, R. (1987) Period doubling in maps with a maximum of order z. Physica A, 145 (3), 425–460.
65. Luque, B., Núñez, A., Ballesteros, F., and Robledo, A. (2013) Quasiperiodic graphs: structural design, scaling and entropic properties. J. Nonlinear Sci., 23 (2), 335–342.
66. Robledo, A. (1999) Renormalization group, entropy optimization, and nonextensivity at criticality. Phys. Rev. Lett., 83 (12), 2289–2292.


2 Generalized Entropies of Complex and Random Networks
Vladimir Gudkov

2.1 Introduction

The topology of complex networks has become a subject of growing interest in recent years. Knowledge of the network topology is crucial for understanding the structure, functionality, and evolution of the whole network and its building constituents. It can be used for many practical applications, including the study of network vulnerabilities, the identification of functional relations between subgroups in a given network, and the discovery of hidden group activities. Real-world networks are usually very large; therefore, community detection in complex networks is known to be a difficult problem that demands very large computational resources, especially if a good level of accuracy is needed. Many methods have been proposed to solve the problem (see, e.g., [1–7] and references therein). However, the properties of modularity have not been fully studied, and the resolution of the clustering method based on its optimization is intrinsically limited by the number of links in the network [8]. The existence of a resolution limit for community detection implies that it is impossible to tell a priori whether a module contains substructure (i.e., whether smaller clusters can be resolved inside it). This is particularly important if the network has a self-similar character (e.g., a scale-free network), in which case a single partition does not describe the structure completely, and a tree-like partition that digs into different levels of structure is more appropriate. Another important topic in the study of complex networks is the possible existence of a correlation between the structure of the whole network and a representative part of it (a set of randomly chosen nodes). For instance, in the framework of social networks, choosing a sample of people and their links could provide important information about the structure of the large unknown network to which they belong, a complete description of which might not be feasible because of the commonly large size of complex networks or for other reasons. Therefore, the development of a method that can characterize the whole network from the incomplete information available about it is a helpful tool for analyzing network vulnerability, topology, and evolution.


One such method is related to the concept of entropy. It should be noted that there is no unique definition of the entropy of a network. In recent years, different definitions of network entropy have been used that are useful for different fields of network studies (see, e.g., [5, 9–11]). For a comprehensive review of the history of graph/network entropy measures, see [12]. The main contribution of this chapter is based on the results of [13, 14], with the entropy defined according to [15]. The concept of entropy is fundamental in statistical physics. Entropy is a characteristic of the state of a system that contains information related to the general structure of the system, such as its level of disorder. Since real networks are usually very large, one can attempt to describe them in terms of macroscopic parameters using methods similar to those of statistical physics. Being interested in a function that can describe the general (topological) features of the network as well as its interconnectivity simultaneously, we use an entropy that takes into account not only the properties of the nodes in the network (e.g., their degree of connectivity), but also the parameters of the interactions (information exchange) between each pair of nodes in the network. Thus, we are able to address the problem of how organized (or disorganized) the network is in general by describing it in terms of the entropy and mutual entropy functions. There are further reasons to explore a generalized entropy approach for the description of network dynamics. For example, it has been shown in Refs [16–18] that the Rényi entropy can be considered a measure of localization in complex systems. Another observation [19, 20] is the relationship between Rényi entropies and the multifractal dimensions of local substructures of complex systems, which may be helpful for describing and understanding the dynamics and topology of complex networks.

2.2 Generalized Entropies

The Shannon entropy corresponds to the Boltzmann entropy in thermodynamics:

S = -\sum_{k=1}^{n} p_k \log_2 p_k ,    (2.1)

where p_k is the probability of finding the system in the kth microstate. However, if we consider the general relation between entropy, as a measure of the disorder of a system, and information, as a measure of our knowledge of the system (i.e., entropy is equal to information with the opposite sign), we immediately arrive at the question: in how many ways can we define the entropy/information of the system? The answer to this question leads to the Kolmogorov–Nagumo theorem [21, 22], which admits only one additional option,

R_q = \frac{1}{1-q} \log \left( \sum_{k=1}^{n} p_k^{q} \right),    (2.2)


known as the Rényi information/entropy of order q (q ≥ 0). It should be noted that for q = 1 it coincides with the Shannon entropy/information. Furthermore, the meaning of the order q has been associated with the density of the multifractal dimension of the phase space of a quantum system [16, 17], and the Rényi entropies of orders q = 0, 1, 2 have been related, respectively, to the total number of components of the wave function, Shannon's entropy, and the participation number [18] (the participation number tells how many of the components possess a probability significantly larger than zero). The differences between entropies of different order q are a very useful tool for studying the shape of probability distributions, and therefore the network structure. Thus, the difference between the entropies of order q = 1 and q = 2, which is referred to as the "structural entropy" [16, 17], characterizes [14] local properties of the system. In other words, the difference R_1 - R_2 acts as a filter for the information in the distribution that can be attributed to the fractal structures of the network. It is worth noting that we are interested in describing the state of the nodes in a network in a general way. This requires a class of functions that provides us not only with information about the probabilities of nodes having a particular degree, but also about how the connectivity of a particular node is related to the connectivities of the other nodes in the network. Therefore, it is natural to use not only the entropy of the whole network described above, but also the mutual entropy (information) [23], which is defined in terms of conditional information (in a way similar to the definition of conditional probability). Following the standard definition (see, e.g., [23]), the mutual information (Shannon or Rényi) for two probability distributions ξ and η can be written as

I(\xi, \eta) = I(\xi) - I(\xi|\eta) = I(\xi) + I(\eta) - I((\xi, \eta)),    (2.3)

where I(ξ) is the usual Shannon/Rényi information for the ξ probability distribution, I(ξ|η) is the conditional information, and I((ξ, η)) = I(η) + I(ξ|η) is the two-dimensional (joint) information, which is equal to the sum I(ξ) + I(η) when ξ and η are independent distributions.
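For readers who wish to experiment with Eqs. (2.1) and (2.2), the short Python sketch below (our own helper functions, not code from the cited works; natural logarithms are used throughout) evaluates the Shannon and Rényi entropies of a probability vector, illustrates numerically that R_q approaches the Shannon value as q approaches 1, and prints the "structural entropy" difference R_1 - R_2 discussed above.

import math

def shannon_entropy(p):
    """Shannon entropy of a probability vector (natural logarithm), cf. Eq. (2.1)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def renyi_entropy(p, q):
    """Renyi entropy of order q, Eq. (2.2); reduces to the Shannon entropy as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return shannon_entropy(p)
    return math.log(sum(pk ** q for pk in p if pk > 0)) / (1.0 - q)

p = [0.5, 0.25, 0.125, 0.125]
print(shannon_entropy(p))                          # Shannon entropy
print(renyi_entropy(p, 1.000001))                  # numerically close to the q -> 1 limit
print(renyi_entropy(p, 2.0))                       # order-2 (collision) entropy
print(renyi_entropy(p, 1.0) - renyi_entropy(p, 2.0))  # "structural entropy" R1 - R2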

2.3 Entropy of Networks: Definition and Properties

In order to apply these definitions of entropy to networks, we represent a network in terms of its adjacency (or connectivity) matrix C. Thus, if a network consists of n nodes, one can define the elements of the matrix C as

C_{ij} = C_{ji} = 1    (2.4)

if nodes i and j are connected, and C_{ij} = C_{ji} = 0 if they are disconnected, assuming that

C_{ii} = 0.    (2.5)

In the general case, we can define the nonzero elements of the matrix C as real numbers, or by functions that represent the intensity of connections and describe interactions between nodes in the given network. The elements of the matrix C can then contain information about the connectivity between nodes, such as the intensity of connections and details of the information exchange. For any choice of the connectivity matrix C, one can renormalize it by requiring that

\sum_{i,j=1}^{n} C_{ij} = 1.    (2.6)

Then the parameter p_k, defined as

p_k = \sum_{j=1}^{n} C_{kj},    (2.7)

gives the level of connectivity of node k to all other nodes of the network. Taking into account that \sum_k p_k = 1, one can consider p_k as the "probability" for node k to be connected to other nodes in the network. Now, using Eq. (2.1), we can calculate the Shannon entropy of the network [15] as

H(row) = -\sum_{i=1}^{n} p_i \log p_i,    (2.8)

which can be considered a measure of the uncertainty of the connections of the rows of the matrix C for the given network. The amount of uncertainty for the connection of the column nodes, given that the row nodes are connected, is

H(column|row) = -\sum_{i,j}^{n} C_{ij} \log C_{ij} - H(row).    (2.9)

As a result, the amount of mutual information I(C) gained via the given connectivity of the network is

I(C) = H(row) + H(column) - H((column, row)) = \sum_{i,j}^{n} C_{ij} \log\!\left( C_{ij} / (p_i p_j) \right),    (2.10)

where

H((column, row)) = -\sum_{i,j}^{n} C_{ij} \log C_{ij}.    (2.11)

It should be noted that, due to the double summation and the symmetry of the connectivity matrix, I(C) does not depend on the vertex relabeling, and is a permutation-invariant measure of the connectivity matrix. For the case of the Rényi entropy/information with the probability defined in Eq. (2.7), one has [14]

I_q = -H_q = \frac{1}{1-q} \log\left( \sum_{k=1}^{n} p_k^{q} \right)    (2.12)

and

I_q(column, row) = -\frac{1}{1-q} \log\left( \sum_{i,j}^{n} C_{ij}^{q} \right).    (2.13)

It is worth mentioning that there is no unique way to define generalized network entropies; another definition of the probability may lead to different expressions for the generalized network entropies (see, e.g., [11]). Within the context of the Rényi entropies, and with the definition of probability given in Eq. (2.7), we can relate the mutual entropies of orders q = 0, 1, and 2, respectively, to the number of nodes, the information supplied by the degree distribution, and the information supplied by the stronger structural nodes with a large degree of connectivity. This implies that the difference between the Rényi entropies of order q = 1 and q = 2 can be interpreted as a measure of the amount of information contained in the network that is contributed by the most structural nodes (those with the largest degree of connectivity) but is not supplied solely by the degree distribution. Considering the possibility that the elements of the connectivity matrix can be real numbers, we call "simple" or S-type C matrices the connectivity matrices built with only 0s and 1s. On the contrary, if the connectivity matrix contains real numbers, we refer to them as "real" or R-type C matrices.
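The following Python/NumPy sketch (our own illustration, not code from Refs [13–15]; the function names and the small example matrix are assumptions) collects the definitions of Eqs. (2.6)–(2.13): it normalizes a connectivity matrix, forms the node "probabilities" p_k, and evaluates the Shannon mutual information I(C) of Eq. (2.10) together with the Rényi quantities of Eqs. (2.12) and (2.13).

import numpy as np

def normalize(C):
    """Rescale a symmetric connectivity matrix so that sum_ij C_ij = 1, Eq. (2.6)."""
    C = np.asarray(C, dtype=float)
    return C / C.sum()

def node_probabilities(C):
    """p_k = sum_j C_kj, Eq. (2.7)."""
    return C.sum(axis=1)

def mutual_information(C):
    """I(C) = sum_ij C_ij log(C_ij / (p_i p_j)), Eq. (2.10)."""
    p = node_probabilities(C)
    nz = C > 0
    P = np.outer(p, p)
    return float(np.sum(C[nz] * np.log(C[nz] / P[nz])))

def renyi_information(p, q):
    """I_q = (1/(1-q)) log sum_k p_k^q, Eq. (2.12); Shannon value at q = 1."""
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** q)) / (1.0 - q))

def renyi_joint(C, q):
    """I_q(column, row) = -(1/(1-q)) log sum_ij C_ij^q, Eq. (2.13), taken literally."""
    c = C[C > 0]
    if np.isclose(q, 1.0):
        return float(np.sum(c * np.log(c)))      # analytic q -> 1 limit of the expression above
    return float(-np.log(np.sum(c ** q)) / (1.0 - q))

# Small S-type example: a 4-node path graph
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
C = normalize(A)
print(mutual_information(C))
print(renyi_information(node_probabilities(C), 2.0))
print(renyi_joint(C, 2.0))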

2.4 Application of Generalized Entropy for Network Analysis

Equations (2.8)–(2.13) allow us to find relationships between the generalized mutual entropy of a network (using the standard equivalence of an entropy as negative information) and other functions, namely the (one-dimensional) generalized entropy of the node degree distribution and an average correlator, K_q, whose role is to quantify the divergence of the mutual information function of an R-type network from that of an S-type network with the same topological structure. These relations can be formulated as a theorem [14]:

Theorem 2.1. The generalized mutual entropy H_q(C) for the connectivity matrix C can be represented as the sum of the generalized entropy D_q of the degree-of-connectivity distribution of the nodes and the average correlator K_q(C):

H_q = 2 D_q - \log N - K_q(C),

where N is the total number of links in the network,

D_q(C) = \frac{1}{1-q} \log\left( \sum_{k=1}^{n} p_k^{q} \right),

and

K_q(C) = \begin{cases} \dfrac{1}{1-q} \log \dfrac{\sum_{ij}^{n} C_{ij}^{q}}{N} & \text{for } q \neq 1, \\[1ex] \dfrac{1}{N} \left( \sum_{ij}^{n} C_{ij} \log C_{ij} \right) & \text{for } q = 1. \end{cases}


Proof. This theorem can be easily proved by direct substitution of Eqs. (2.12) and (2.13) into Eq. (2.10). ◽

The interpretation of the theorem is immediate: the mutual entropy of a network is the sum of the entropy of the degree distribution, which corresponds to the topological structure (the factor of 2 comes from the symmetry of the C matrix), and the entropy due to the "link structure" between nodes, which corresponds to the interactions between nodes. For example, for evenly connected nodes (each element of the C matrix having the same link intensity), all correlators K_q(C) are equal to zero. Therefore, all nodes interact "equally" and the entropy of interactions (information exchange between nodes) has the maximum value log N (since for an even degree distribution p_i = 1/N, and the entropy is equal to log N), which corresponds to a completely disordered state of the system. An increase of order in the information-exchange scheme leads to nonzero correlators, which decrease the total entropy. It is obvious that for S-type connectivity matrices (no structure in the information exchange), the entropy of the network can be attributed to the particular degree distribution and the maximum "interaction" entropy log N (all correlators are zero). This fact can be stated as a corollary of the theorem above, as follows:

Corollary 2.2. For an S-type connectivity matrix C, the generalized mutual entropy H_q(C) of the network contains exactly the same information as the (one-dimensional) generalized entropy D_q of the degree of connectivity of the nodes. They are related by the equation H_q = 2 D_q - \log N.

The last corollary is of particular importance for structural analysis purposes, since it shows clearly that for a network containing only binary information about the connections between nodes, the bulk of the information is contained in the degree distribution, whereas for a network with different intensities associated with its links, part of the mutual entropy can come from the structure of the information exchange between nodes.

In order to study the properties of the generalized entropies and their possible applications, we need to apply them to networks of different natures (scale-free, random, etc.) and different sizes. It is also important to identify which contributions to the parameters under investigation come from the topological network structure, from interactions between nodes, and from random noise that does not change the major network characteristics. For this purpose, we use the results presented in Ref. [14], where simulated networks of size ≥ 3000 nodes were studied. The algorithm for the simulations used a preferential attachment rule [24] for the creation of each new link according to the probability Π_i:

\Pi_i = \frac{(1-p)\, d_i + p}{\sum_{j=1}^{t} \left[ (1-p)\, d_j + p \right]},    (2.14)


where d_i is the degree of the ith node and p is a "weighting" parameter whose value lies in the range 0 ≤ p ≤ 1. The parameter p makes it possible to simulate networks with different properties: the algorithm produces a scale-free network for p = 0, a random network for p = 1, and mixed networks for intermediate values of p (see [24] for details). First, we consider the case of scale-free networks (p = 0) to clarify the contributions to the entropies from the topological structure of the network and from the "information exchange" between nodes. Figure 2.1 shows the components of the mutual entropy for a scale-free network of 5000 nodes as functions of the entropy order q. In order to distinguish between topological and "information exchange" contributions, we assign the values of the nonzero elements of the adjacency matrices of the simulated networks using three essentially different prescriptions: (a) the matrix element values C_ij are equal to the product of the degrees of nodes i and j; (b) their values are equal to the inverse product of the node degrees; and (c) the values of the matrix elements are real numbers uniformly distributed between 0 and 1 (see Figure 2.1a–c, respectively). In these figures, the generalized mutual entropy H of the network (circles), the generalized entropy D of the degree distribution (squares), and the average correlator K (stars) are plotted against q; filled symbols represent the values for the S-type matrix, and hollow symbols those for the R-type matrix.
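Before turning to the results shown in Figure 2.1, a minimal sketch of the growth rule of Eq. (2.14) is given below (our own illustration; the function name, the fully connected seed, and the choice of m links per new node are assumptions, not details taken from Refs [14, 24]).

import random

def grow_network(n_nodes, m=3, p=0.0, seed=1):
    """Grow a network with the mixed preferential/random attachment rule of Eq. (2.14)."""
    rng = random.Random(seed)
    edges = set()
    degree = {i: 0 for i in range(n_nodes)}
    # fully connected seed of m + 1 nodes
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            edges.add((i, j))
            degree[i] += 1
            degree[j] += 1
    for new in range(m + 1, n_nodes):
        existing = list(range(new))
        weights = [(1.0 - p) * degree[i] + p for i in existing]   # numerator of Eq. (2.14)
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(existing, weights=weights)[0])
        for t in targets:
            edges.add((t, new))
            degree[t] += 1
            degree[new] += 1
    return edges, degree

edges, degree = grow_network(5000, m=3, p=0.0)   # p = 0: scale-free; p = 1: random
print(max(degree.values()), min(degree.values()))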


Figure 2.1 The generalized mutual entropy H, the generalized entropy D, and the average correlator K plotted versus the mutual entropy order q when the nonzero matrix elements of the connectivity matrix are replaced by the products of the node degrees (a), the inverses of the products of the node degrees (b), and a uniform distribution of real numbers in the range between 0 and 1 (c).


One can see that the mutual entropy for S- and R-type networks (and their components) differs significantly only when the strengths of the links (the values of the matrix elements) vary over a wide range. If the connection strength follows a flat distribution of random numbers, the mutual information does not differ very much between the R- and S-types. This is because the information-exchange part becomes practically unstructured (disordered) and, as a consequence, variations of the total entropy are determined mostly by contributions from the topological part. Thus, in order to determine the relationship between network topology and mutual entropy, one can use S-type networks. In order to see how the entropy depends on the size of a subnetwork, we calculate the mutual entropy of different q-orders for randomly selected subsets of nodes of different sizes in a scale-free network. The mutual entropies of order 0, 1, and 2 (and the difference between orders 1 and 2) plotted versus the size of the randomly chosen subnetworks are shown in Figure 2.2a, where filled circles are used for q = 0, filled triangles for q = 1, filled squares for q = 2, and filled stars for the difference between q = 1 and q = 2. In order to understand how sensitive the entropies are to the basic topological structure of a network and to small changes (perturbations) of the network structure, a procedure to simulate a perturbed network was used (see [14]). The main idea is that the topological structure of a network is mainly defined by the subset of nodes with large degrees of connectivity. In order to recognize this relatively steady topological structure and filter it from the background of random connections, the network was perturbed without visible changes of its structure by applying the algorithm described in Ref. [25], in such a way that the most relevant links between nodes are reconnected according to a probability proportional to the product of the degrees of each pair of nodes in the network (see [14, 25] for details). Thus, the hollow circles, triangles, squares, and stars in Figure 2.2a show the mutual entropies calculated for the corresponding perturbed subnetworks. One can see that the differences between the original and perturbed networks become larger as the size of the subnetwork decreases, reflecting the fact that smaller subnetworks contain less information about the whole network. Taking into account that the subnetworks are chosen randomly (without any preference for a particular cluster in the given network), one can consider this a worst case in the determination of a representative part of a network. However, Figure 2.2b shows a plot of the average mutual entropies with q = 0, 1, 2, 3.5, and 5 over a denser population of sizes for the randomly chosen subsets of a scale-free network with 5000 nodes. These averages are calculated over 100 samples for each subnetwork size, and the plot includes bars that represent the standard deviations of the mutual entropies around their average values. It can be seen that the standard deviations increase for smaller sizes of the randomly chosen subsets (this is not very noticeable for q = 0, since the entropy for that value of q is related to the size of the subnetwork, and of course we should not expect large deviations of the values of the mutual entropies around their averages in that case).



(c) Figure 2.2 (a) Mutual entropies of randomly selected subsets of nodes of a matrix with size of 5000 nodes; circles, triangles, squares, and stars represent values of q = 0, 1, 2, and the difference of the entropies for q = 1 and q = 2, respectively. Filled symbols are used for the original network, and the hollow symbols are for the perturbed version of it. (b) Plot of the average mutual entropies of randomly chosen subsets of nodes from

a scale-free network with 5000 nodes, the bars represent the standard deviations of the mutual entropy at the corresponding subnetwork sizes. Averages and standard deviations have been calculated over a sample of 100 subnetworks for each subset size. (c) Semilog plot of the average mutual entropies of the randomly chosen subsets of nodes. (Gudkov [14]. Reproduced with permission of Elsevier.)

The bars in Figure 2.2b show that randomly chosen subsets with sizes smaller than one-half of the size of the whole network are too scattered to be considered representative samples. Figure 2.2c shows a semi-log plot of the average mutual entropies of Figure 2.2b. The mutual entropies show a smooth logarithmic dependence on the size of the subnetwork for sizes larger than 2500 nodes (which is a natural property of entropy as a function of the size of the system; we can easily see this from a calculation of the entropy of ideal gases). For sizes smaller than 2500 nodes, the curves diverge from the logarithmic trend. These facts lead us to the observation that, even in this worst case of choosing the subsets of nodes, the mutual entropies reach a saturation value at a subnetwork size of about one-half of the size of the whole network.



Figure 2.3 Six main clusters are identified for a scale-free network of size of 5000 nodes showing a pattern of self-similarity. (Gudkov [14]. Reproduced with permission of Elsevier.)

Therefore, the results in Figure 2.2 clearly support our conjecture that the entropies of the subnetworks can be used to characterize the whole network. In other words, the loss of information about small parts of a network is not critical if we use entropies as a measure of network characteristics. It should be noted that the saturation behavior of the entropies shown in Figure 2.2 is practically independent of the nature of the network (in our case, of the mixture parameter p), which indicates a universal property of the mutual entropy as a characteristic of a network. This result can be used in a number of applications since it does not depend on the nature of the network; however, one can also choose a subnetwork selectively, based on a closed (or otherwise preferred) cluster. In order to study this option, we consider the same scale-free network, but we first identify the cluster (subnetwork) structure and then organize the network according to its cluster structure, as shown in Figure 2.3. In order to relate the information we can obtain from a selected cluster to the entropy of the whole network structure, we plot the mutual entropy corresponding to the cropped section of the connectivity matrix that contains the cluster versus its size (see Figure 2.4). The mutual entropies for clusters in the 5000-node scale-free network are shown for order q = 0 in Figure 2.4a, q = 1 in Figure 2.4b, q = 2 in Figure 2.4c, and for the difference of the mutual entropies with q = 1 and q = 2 in Figure 2.4d. These figures show that the mutual entropy of order 0 (Figure 2.4a) follows the same pattern that could be expected if the subnetworks were chosen by picking up nodes randomly.



Figure 2.4 Mutual entropy for clusters in the 5000-node scale-free network. The graphs correspond to q = 0 (a), q = 1 (b), q = 2 (c), and the difference between q = 1 and q = 2 (d). (Gudkov [14]. Reproduced with permission of Elsevier.)

This is in agreement with the fact that the mutual entropy of order 0 simply accounts for the number of connections in the network (subnetwork). Moving beyond order 0, we observe a cluster differentiation (Figure 2.4b): two points depart from the trend of the sequence. Those two points correspond to cluster 6 and to the subnetwork comprising the nodes in clusters 3, 4, 5, and 6, which have different structures (Table 2.1). For order 2 (Figure 2.4c), we observe an even more accentuated entropy difference for these clusters, and a deviation from the trend for the entropy of clusters 4 and 5. This corresponds to the fact that, for higher orders of the mutual entropy, the cluster structure differentiation is more refined because of the power dependence on the degrees of the probabilities (see, e.g., Eq. (2.2)). It is interesting that the difference of the mutual entropies of orders 1 and 2 shows a clear flat trend for the scale-free subnetworks (Figure 2.4d). Their values for clusters 3, 4, 6, and the combined clusters (3 + 4 + 5 + 6) are grouped separately, clearly showing that the mutual entropy difference is sensitive to the internal structure of clusters and could be used for the identification of subnetwork structure.


Table 2.1 Correspondence of the symbols in Figure 2.4 to the clusters identified in Figure 2.3.

Clusters contained    Symbol used
1                     Filled circle
2                     Filled square
3                     Filled upward-pointing triangle
4                     Filled downward-pointing triangle
5                     Filled diamond
6                     Plus sign
1 and 2               Circle
3–5                   Square
1–5                   Upward-pointing triangle
3–6                   Downward-pointing triangle
Whole network         Filled star


Figure 2.5 Changes in the mutual entropy as a function of the size of the subnetwork with nodes included according to the size of the clusters. (Gudkov [14]. Reproduced with permission of Elsevier.)

Figure 2.5 presents the dependence of the mutual entropy on the choice of the sequence of clusters in a given network. The mutual entropy obtained by accumulating the effects of the larger clusters first, and then the most random ones, shows rapid fluctuations in the clustered region.


This fluctuation is related to contributions from nodes with high degrees in this large cluster. After the larger cluster has been absorbed, the fluctuations disappear and the function becomes smooth and monotonically increasing, since the nodes with small degrees increase the entropy smoothly by adding disorder to the structure of the system. For the reversely ordered nodes, the mutual entropy is monotonically increasing for orders 0 and 1, and starts showing some fluctuations for order 2. One can see a drop in the mutual entropy of order 1 just after the first large cluster has been included completely. This fact indicates the possibility of using the mutual entropy to estimate the level of organization of a subnetwork. For example, mutual entropies can be used as a tool to differentiate a random subnetwork from a structured one. For completeness, we present the mutual entropy dependence for a random choice of the subnetwork in Figure 2.5f (see the corresponding node order in Figure 2.5c), which was discussed in detail in relation to Figure 2.2. It is worth mentioning a rather simple but powerful method of testing network equivalence using network entropy techniques. Let us consider an example of a network with 100 nodes, each node randomly connected to seven other nodes (see Figure 2.6). In order to describe this network in terms of the adjacency matrix, one needs to choose an order in which to number the nodes (1, 2, …). This leads to 100! combinations for the possible numbering of the nodes and, as a consequence, to 100! possible adjacency matrices describing this network. Two arbitrarily chosen adjacency matrices representing this network are shown in Figures 2.7 and 2.8. Let us now interchange only two nodes, (20,30) and (20,5), in the matrix of Figure 2.8 and randomly permute it (see Figure 2.9). How can we find out whether these matrices represent the same network or different ones? The recipe is simple: the calculation of Rényi entropies for these matrices gives R2 = 2.5268 for the first two matrices and R2 = 2.5293 for the last one. Therefore, we can conclude that the matrix presented in Figure 2.9 describes a different (permuted) network in comparison to the first two matrices.
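The equivalence test described above is easy to reproduce. The hedged sketch below is our own code (the order-2 Rényi entropy is built from the normalized matrix as in Eqs. (2.6), (2.7), and (2.12), the random construction gives each node roughly seven links, and the printed values 2.5268/2.5293 quoted in the text are not expected to be reproduced exactly); it shows that a pure relabeling of the nodes leaves R_2 unchanged, while rewiring even a single link generally shifts it.

import numpy as np

def renyi2_of_matrix(A):
    """Order-2 Renyi information of the degree 'probabilities' p_k, cf. Eqs. (2.6), (2.7), (2.12)."""
    C = A / A.sum()
    p = C.sum(axis=1)
    return -np.log(np.sum(p ** 2))

rng = np.random.default_rng(0)
n, k = 100, 7
A = np.zeros((n, n))
for i in range(n):                       # each node initiates links to 7 random partners
    for j in rng.choice([x for x in range(n) if x != i], size=k, replace=False):
        A[i, j] = A[j, i] = 1

perm = rng.permutation(n)                # relabel the nodes
B = A[np.ix_(perm, perm)]
print(renyi2_of_matrix(A), renyi2_of_matrix(B))   # identical: R_2 is permutation invariant

B2 = B.copy()
B2[20, 5] = B2[5, 20] = 1 - B2[20, 5]    # toggle a single link between nodes 20 and 5
print(renyi2_of_matrix(B2))              # generally differs from the value above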

2.5 Open Networks

The concept of open network, as an arbitrary selection of nodes of a large unknown network, which is a representative part of the whole network, was considered in Ref. [13]. For instance, in the framework of social networks, choosing a sample of people and their links could provide important information about the structure of the large unknown network to which they belong, and for which a complete description might not be feasible due to the commonly large size of complex networks or another reason. Therefore, the development of a method that can characterize the whole network from the incomplete information available for its subnetwork is a helpful tool to analyze network vulnerability, topology, and evolution.


Figure 2.6 The network with 100 nodes, each of them randomly connected to seven other nodes.


Figure 2.7 Arbitrarily chosen adjacency matrix for the network shown in Figure 2.6.


Figure 2.8 Randomly permuted adjacency matrix in Figure 2.7.


Figure 2.9 Randomly permuted adjacency matrix in Figure 2.7, after interchanging nodes (20,30) and (20,5).

The type of network analysis that considers not only node connectivity but also the network's structure, in an attempt to represent the whole network with only a part of it, requires an approach similar to the one used in statistical physics, where the entropy function, defined on a subsystem, contains information about the macroscopic state of the whole system. Therefore, we use the concept of mutual entropy to calculate the amount of information contained in a set of interconnected nodes, which can be either the whole network or a selected sample of it. Being an (uncountable) set of functions that includes Shannon's definition of entropy, the Rényi entropy supplies a more complete description of the network structure, with the important feature that it provides a more refined measure of the (sub)network's level of disorder as the value of the q-order becomes larger. Since the Rényi entropy can be considered a measure of localization [16–18] in complex systems, two hypotheses were suggested and tested in Ref. [13]: (i) the set of entropies can be used to characterize the main properties related to the structure (topological and information exchange) and dynamics of networks, and (ii) the mutual entropies, calculated over a subset of the network, contain enough information to represent the whole network. These two conjectures outline the approach for a structural analysis of networks when the available data are not complete. Let us define the concept of an open network as an arbitrarily chosen subset of nodes that belong to a large unknown network; thus, in the study of real-world networks, the set of available data can be considered an open network whose entropy measurements can be "extrapolated" to give a reasonably accurate measure of the whole network. A crucial requirement to demand of the open network is the minimum critical size it must have to represent the whole network without a significant loss of information related to the main structure. Knowledge of this minimal size permits the definition of a representative subnetwork as an open network whose size is larger than the critical one. In order to find this threshold, a simulated scale-free network with 5000 nodes was used [13]. It should be noted that the size of 5000 nodes was chosen to avoid possible systematic errors in the network simulations (see [14] for details). Then, randomly chosen subsets of different sizes (which we call random open networks) were taken by selecting a reasonably large number of these open networks at each size, and the mutual entropy was calculated for each open network. These calculations were done 100 times for each size of the open network. As a result, the average mutual entropy H_q(s) and its uncertainty σ_q(s) as a function of the size s of the open network (the chosen subset of the whole network) are shown in Figure 2.10a. Different q-orders in the figure are represented by q = 0 (triangles), q = 1 (circles), q = 2 (squares), and the entropy difference between q = 1 and q = 2 (stars).


Figure 2.10 Plots of (a) the mutual entropy H_q, (b) the rescaled mutual entropy F_q, and (c) the relative entropy uncertainty z_q versus the size of the open network. The triangles, circles, and squares represent the q-degree values 0, 1, and 2, respectively, and the stars represent the difference between q-degrees 1 and 2.


One can see that, for q = 0, 1, and 2, the mutual entropy H_q increases rapidly with the size of the open network and, beyond half of the whole network's size, reaches approximately the entropy of the whole network. The relative value of the mutual entropy, defined as F_q(s) = H_q(s)/H_q(5000), is presented in Figure 2.10b. One can see that, for s > 2500, the relative entropies are F_q > 0.9, which means that an open network of half the size of the whole network or larger has the same mutual entropy as the whole network to within about 10%. It should be noted that when the value of q becomes larger, the uncertainties σ_q(s) in the entropies become more noticeable, which is consistent with the fact that higher orders of the mutual entropy enhance the contribution of nodes with larger degrees. Therefore, we can use σ_q(s) as an additional criterion to define the minimal size of a representative open network. One can then see from the plot of the relative uncertainty z_q(s) = σ_q(s)/H_q(s) of the entropy as a function of the open network size (Figure 2.10c) that z_{q=0,1,2} < 0.05 for s > 2500, which agrees with the conclusion that the critical size of the open network is about one-half of the size of the whole network. It is important that the nodes of each subset used to create the plots in Figure 2.10 are chosen randomly; therefore, the mutual entropies H_q(s) found for each size should be considered a worst case in the choice of the size of a representative network. If the nodes of the subsets are chosen according to a selection criterion, then the structural information contained in the sample may represent the whole network with much better fidelity, and this should be observed in the mutual entropy plot of the selected subsets. In order to show this, we rearrange the nodes according to the network cluster structure based on their connectivity. Then, using a physics-based clustering algorithm [26], the sequence of the nodes in the connectivity matrix used to create Figure 2.10 can be reordered to show the clusters, as in Figure 2.3. We can identify six clusters, which are arranged in such a way that the denser clusters contain the sparser ones, with intercluster connections indicating that the nodes are ordered hierarchically even inside each cluster. The density of cluster 6 is relatively smaller than that seen in clusters 1–5, which might lead to the naïve interpretation that cluster 6 is structurally different from clusters 1–5; however, the structure of cluster 6 is still scale-free, as can be seen in Figure 2.11, where the degree distribution of cluster 6 indeed follows a power law, but differs from that of the whole network in that it does not contain nodes with the very high degrees of connectivity found in clusters 1–5. Therefore, large structural components in that cluster are not expected. Moreover, it is seen that cluster 6 covers a very large fraction of the nodes of the whole network and its size is much larger than the sizes of the other identified clusters, which indicates that an open network of any size (randomly chosen) is very likely to contain a significant number of nodes from cluster 6; this reduces the probability of "fishing" strongly structural nodes in a random sample, which justifies the argument that Figure 2.10 represents the worst case in the choice of nodes.
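A sketch of the sampling experiment behind Figure 2.10 is given below (our own illustration; the adjacency matrix A is assumed to be given, for example from the generator sketched earlier, and the entropy used here is the Rényi information of the subnetwork degree probabilities). It estimates the mean H_q(s), the rescaled F_q(s) = H_q(s)/H_q(n), and the relative uncertainty z_q(s) = σ_q(s)/H_q(s) over random open networks of size s.

import numpy as np

def subnetwork_entropy(A, nodes, q=1.0):
    """Renyi entropy of order q for the subgraph induced by `nodes` (Eqs. (2.6), (2.7), (2.12))."""
    sub = A[np.ix_(nodes, nodes)]
    total = sub.sum()
    if total == 0:
        return 0.0
    p = sub.sum(axis=1) / total
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** q)) / (1.0 - q))

def open_network_statistics(A, sizes, q=1.0, samples=100, seed=0):
    """Mean H_q(s), rescaled F_q(s), and relative uncertainty z_q(s) over random node subsets."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    H_whole = subnetwork_entropy(A, np.arange(n), q)
    stats = {}
    for s in sizes:
        vals = [subnetwork_entropy(A, rng.choice(n, size=s, replace=False), q)
                for _ in range(samples)]
        mean, std = float(np.mean(vals)), float(np.std(vals))
        stats[s] = (mean, mean / H_whole, std / mean)   # (H_q, F_q, z_q)
    return stats

# Usage, assuming an adjacency matrix A has been built (e.g., from the generator above):
# for s, (H, F, z) in open_network_statistics(A, [500, 1000, 2500, 4000], q=2.0).items():
#     print(s, H, F, z)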



Figure 2.11 Degree distribution of the whole 5000-node network compared to that of cluster 6.


Figure 2.12 Plot of the mutual entropies of (a) order 2 H2 and (b) the difference of the mutual entropies H1 and H2 for the clusters identified in a scale-free network with 5000 nodes.

In order to compare the contributions of each cluster to the network's structure, we plot their mutual entropies for order q = 2 (see Figure 2.12a), from which it can be seen that clusters 1, 2, (1+2), and (1+2+3+4+5) are aligned with the mutual entropy of the whole network, while every other combination of clusters possesses a mutual entropy that increases rapidly with the size of the cluster considered (the same behavior as a random open network); in particular, cluster 6 (plus sign) and the joint clusters 3+4+5+6 (downward-pointing hollow triangle) have mutual entropies that are even larger than those of the whole network and the averages of Figure 2.10a. This can be attributed to the fact that clusters composed of nodes with a low degree of connectivity are expected to possess small substructures, and therefore a much larger amount of disorder than that contained in a hierarchical structure. Figure 2.12b shows a plot of the difference of the mutual entropies ΔH(s) = H_1(s) − H_2(s); three groups of values of ΔH are found: group A contains clusters 1, 2, (1+2), and (1+2+3+4+5), with ΔH ≃ 2.5; group B contains clusters 3, 4, 5, and (3+4+5), with ΔH ≃ 1.5; and group C contains clusters 6 and (3+4+5+6), with ΔH ≤ 1.5. The mutual entropy difference found for the whole network is ΔH ≃ 2.0, which is also the highest value observed for the average of ΔH calculated for random open networks. This indicates that clusters containing nodes with high connectivity degrees are more prone to possess a well-defined interconnected structure that is enhanced for larger q-orders, and therefore they are associated with a larger value of ΔH, which can indeed exceed the entropy difference of the whole network. On the contrary, clusters containing nodes with lower connectivity degrees must be made up of small substructures, which manifests itself in a significantly larger value of the mutual entropy of order q = 2 (group C) and results in a value of ΔH that can be smaller than the average for randomly selected open networks. This indicates that a structural subset of nodes in the network can be recognized if it has a significantly larger value of ΔH than a randomly chosen set of nodes.
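As a sketch of how the ΔH criterion could be applied in practice (our own illustration; the two-standard-deviation threshold and the helper names are assumptions, and each sampled subgraph is assumed to contain at least one link), one can compare ΔH = H_1 − H_2 of a candidate cluster with the same quantity for random node subsets of equal size.

import numpy as np

def renyi_q(p, q):
    """Renyi information of order q of a probability vector (Shannon value at q = 1)."""
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** q)) / (1.0 - q))

def delta_H(A, nodes):
    """Structural-entropy difference ΔH = H_1 - H_2 of the subgraph induced by `nodes`."""
    sub = A[np.ix_(nodes, nodes)]
    p = sub.sum(axis=1) / sub.sum()
    return renyi_q(p, 1.0) - renyi_q(p, 2.0)

def looks_structural(A, cluster, samples=100, seed=0):
    """Flag `cluster` as structural if its ΔH clearly exceeds that of random subsets of equal size."""
    rng = np.random.default_rng(seed)
    n, s = A.shape[0], len(cluster)
    random_dH = [delta_H(A, rng.choice(n, size=s, replace=False)) for _ in range(samples)]
    return delta_H(A, cluster) > np.mean(random_dH) + 2.0 * np.std(random_dH)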

2.6 Summary

In conclusion, one can see that by calculating the Rényi entropy for identifiable clusters of nodes within a given network, it is possible to differentiate network substructures. Scale-free networks possess a hierarchical structure that mimics itself in its main building blocks, which can be identified by comparing the whole network's mutual entropy to that of a perturbed version of it. Thus, this method can be used to identify the most sensitive groups of nodes that make a scale-free network more vulnerable, because they contain most of the information about the global network structure, which can be extracted from a selected representative part of the whole network. Moreover, the network description provided by the mutual entropy can be used as a measure of the level of organization of the network's structure after it has been modified by a perturbation, and to indicate which part of the network has been changed. Therefore, this is a promising tool for studying the network's evolution. We can also see that the analysis of different q-degree entropies of the network is an efficient means of distinguishing the contribution of each subnetwork to the global network structure, both to the topology and to the information exchanges between nodes. This is achievable because we use not only Shannon's entropy but the whole set of possible entropies (in general, the infinite set of Rényi entropies, defined for any positive q), each of which is sensitive to specific properties of the network. Since we can draw conclusions about the structure and dynamics of a whole network by analyzing only a representative part of it, this approach can lead to a promising method for the analysis of real networks when the number of nodes is unknown and/or changes with time. We have proposed a method to determine the minimum critical size that an open network must have to represent the whole network in a reliable way; we have found this threshold to be about one-half of the size of the network for the scale-free case. We also showed that the main topological features of the scale-free type of network structure can be found in certain clusters of nodes that contain the largest connectivity degrees in the network; such subsets of nodes can be chosen by a clustering algorithm or another preferred selection method. Comparing the mutual entropy of those selected clusters to the mutual entropies of random open networks of the same size shows that the clusters that possess most of the structural features of the network tend to have large differences between the entropies of different q-orders (with values that can be larger than those of the whole network), while clusters formed by small substructures tend to show very small differences in the same quantity (lower than the average of the open networks), in agreement with the fact that the mutual entropies of clusters with few important structural properties should not vary much between different q-orders. Therefore, the use of the mutual information at different q-orders is a promising tool for analyzing the structure of a large network without requiring knowledge of the totality of the information contained in it. The application of this analysis to real-world networks could provide a way to simplify or speed up current network structural analysis by eliminating the requirement of full knowledge of the network being analyzed; it can also be used to study network evolution by measuring entropy changes.

References

1. Girvan, M. and Newman, M.E.J. (2002) Proc. Natl. Acad. Sci. U.S.A., 99, 7821.
2. Newman, M.E.J. (2004) Eur. Phys. J. B, 38, 321.
3. Newman, M.E.J. and Girvan, M. (2004) Phys. Rev. E, 69, 026113.
4. Newman, M.E.J. (2004) Phys. Rev. E, 69, 066133.
5. Danon, L., Diaz-Guilera, A., Duch, J., and Arenas, A. (2005) J. Stat. Mech.: Theory Exp., P09008.
6. Newman, M.E.J. (2006) Proc. Natl. Acad. Sci. U.S.A., 103, 8577.
7. Boccaletti, S., Ivanchenko, M., Latora, V., Pluchino, A., and Rapisarda, A. (2007) Phys. Rev. E, 75, 045102(R).
8. Fortunato, S. and Barthélemy, M. (2007) Proc. Natl. Acad. Sci. U.S.A., 104, 36.
9. Ronhovde, P. and Nussinov, Z. (2009) Phys. Rev. E, 80, 016109.
10. Dehmer, M. and Mowshowitz, A. (2011) Complexity, 17, 45.
11. Dehmer, M., Li, X., and Shi, Y. (2014) Complexity, doi: 10.1002/cplx.21539.
12. Dehmer, M. and Mowshowitz, A. (2011) Inf. Sci., 181, 57.
13. Gudkov, V. and Montealegre, V. (2007) in Complexity, Metastability, and Nonextensivity, AIP Conference Proceedings, vol. 965, p. 336.
14. Gudkov, V. and Montealegre, V. (2008) Physica A, 387, 2620.
15. Gudkov, V. and Nussinov, S. (2002) arXiv: cond-mat/0209112.
16. Halsey, T.C., Jensen, M.H., Kadanoff, L.P., Procaccia, I., and Schraiman, B.I. (1986) Phys. Rev. A, 33, 1141.
17. Hentschel, H.G.E. and Procaccia, I. (1983) Physica D, 8, 435.
18. Varga, I. and Pipek, J. (2003) Phys. Rev. E, 68, 026202.
19. Falconer, K. (2003) Fractal Geometry: Mathematical Foundations and Applications, 2nd edn, John Wiley & Sons, Inc., Sussex.
20. Jizba, P. and Arimitsu, T. (2004) Ann. Phys., 312, 17.
21. Kolmogorov, A.N. (1930) Sur la notion de la moyenne. Atti Accad. Naz. Lincei Rend., 12, 388.
22. Nagumo, M. (1930) Jpn. J. Math., 7, 71.
23. Rényi, A. (1970) Probability Theory, North-Holland Publishing Co., Amsterdam.
24. Liu, Z., Lai, Y., Ye, N., and Dasgupta, P. (2002) Phys. Lett. A, 303, 337.
25. Chung, F. and Lu, L. (2002) Ann. Comb., 6, 125.
26. Gudkov, V., Montealegre, V., Nussinov, S., and Nussinov, Z. (2008) Phys. Rev. E, 78, 016113.


3 Information Flow and Entropy Production on Bayesian Networks
Sosuke Ito and Takahiro Sagawa

3.1 Introduction

3.1.1 Background

The second law of thermodynamics is one of the most fundamental laws in physics, identifying the upper bound on the efficiency of heat engines [1]. This law was established in the nineteenth century, after numerous failed attempts to invent a perpetual motion machine of the second kind. Today, we understand why such a machine is impossible: one can never extract a positive amount of work from a single heat bath in a cyclic way or, equivalently, the entropy of the whole universe never decreases. While thermodynamics was originally formulated for macroscopic systems, a thermodynamics of small systems has been developed over the last two decades. Imagine a single Brownian particle in water. The particle relaxes to thermal equilibrium in the absence of external driving, because the water plays the role of a huge heat bath. In this case, even a single small particle can behave as a thermodynamic system. Moreover, if we drive the particle by applying a time-dependent external force, it is driven far from equilibrium. Such small stochastic systems are an interesting arena in which to investigate "stochastic thermodynamics" [2, 3], a generalization of thermodynamics that explicitly includes the role of thermal fluctuations. One can show that, in small systems, the second law of thermodynamics can be violated stochastically, but is never violated on average. The probability of a violation of the second law can be characterized quantitatively by the fluctuation theorem [4–9], a prominent discovery of stochastic thermodynamics. From the fluctuation theorem, we can recover the second law of thermodynamics on average. Stochastic thermodynamics is applicable not only to a simple Brownian particle [10], but also to much more complex systems such as RNA folding [11, 12] and biological molecular motors [13]. More recently, stochastic thermodynamics has been extended to information processing [14]. The central idea is that one can utilize the information about thermal fluctuations to control small thermodynamic systems.

64

3 Information Flow and Entropy Production on Bayesian Networks

thermal fluctuations to control small thermodynamic systems. Such an idea dates back to the thought experiment of “Maxwell’s demon” in the 19th century [15]. The demon can perform a measurement of the position and the velocity of each molecule, and manipulate it by utilizing the obtained measurement outcome. By doing so, the demon can apparently violate the second law of thermodynamics, by adiabatically decreasing the entropy. The demon has puzzled many physicist over a century [16–20], and it is now understood that the key to understand the consistency between the demon and the second law is the concept of information [21–23], and that the demon can be regarded as a feedback controller. The recent theoretical progress in this field has led to a unified theory of information and thermodynamics, which may be called information thermodynamics [14, 24–39]. The thermodynamic quantities and information contents are treated on an equal footing in information thermodynamics. In particular, the second law of thermodynamics has been generalized by including an informational quantity called the mutual information. The demon is now regarded as a special setup in the general framework of information thermodynamics. The entropy of the whole universe does not decrease even in the presence of the demon, if we take into account the mutual information as a part of the total entropy. Information thermodynamics has recently been experimentally studied with a colloidal particle [40–43] and a single electron [44]. Furthermore, the general theory of information thermodynamics is not restricted to the conventional setup of Maxwell’s demon, but is applicable to a variety of dynamics with complex information exchanges. In particular, information thermodynamics is applicable to autonomous information processing [45–58], and is further applied to sensory networks and biochemical signal transduction [59–63]. Such complex and autonomous information processing can be formulated in a unified way based on Bayesian networks [52]; this is the main topic of this chapter. An informational quantity called the transfer entropy [23], which represents the directional information transfer, is shown to play a significant role in the generalized second law of thermodynamics on Bayesian networks. 3.1.2 Basic Ideas of Information Thermodynamics

Before proceeding to the main part of this chapter, we briefly sketch the basic idea of information thermodynamics. The simplest model of Maxwell’s demon is known as the Szilard engine [17], which is shown in Figure 3.1. We consider a single particle in a box with volume V that is in contact with a heat bath at temperature T. The time evolution of the Szilard engine is as follows: (i) The particle is in thermal equilibrium, and the position of the particle is uniformly distributed. (ii) We divide the box by inserting a barrier at the center of the box. (iii) The demon performs a measurement of the position of the particle, and finds it in the left or right box with probability 1∕2. The obtained information is 1 bit, or equivalently ln 2 in the natural logarithm. (iv) If the particle is found in the left (right) box, then

3.1

(i)

T

Introduction

(ii)

V

L

R

Measurement (iii) (v)

(iv) m=L

m=R

Maxwell’ s demon

m=L

m=R

m=L?m=R? Feedback

Figure 3.1 Schematic of the Szilard engine. The demon obtains measurement outcome m = L (left) or m = R (right), corresponding to one bit (= ln 2) of information. It then extracts kB T ln 2 of work by feedback control.

the demon slowly moves the barrier to the right (left) direction, which is feedback control, depending on the measurement outcome. This process is assumed to be isothermal and quasi-static. (v) The partition is removed, and the particle returns to the initial equilibrium state. In step (iv), the single-particle gas is isothermally expanded and a positive amount of work is extracted. The amount of the work can be calculated by using the equation of states of the single-particle ideal gas (i.e., 𝑝𝑉 = kB T with kB the Boltzmann constant): V

kB T dV ′ = kB T ln 2. ∫V ∕2 V ′

(3.1)

This is obviously positive, while the entire process seems to be cyclic. The crucial point here is that the extracted work kB T ln 2 is proportional to the obtained information ln 2, which suggests the fundamental information–thermodynamics link. 3.1.3 Outline of this Chapter

In the following, we present an introduction to a theoretical framework of information thermodynamics based on Bayesian networks. This chapter is organized as follows: In Section 3.2, we briefly review the basic properties of information contents: the Shannon entropy, the relative entropy, the mutual information, and the transfer entropy. In Section 3.3, we review stochastic thermodynamics by focusing on a simple case of Markovian dynamics. In particular, we discuss the concept

65

66

3 Information Flow and Entropy Production on Bayesian Networks

of entropy production. In Section 3.4, we review the basic concepts and terminologies of Bayesian networks. In Section 3.5, we discuss the general theory of information thermodynamics on Bayesian networks, and derive the generalized second law of thermodynamics, including the transfer entropy. In Section 3.6, we apply the general theory to special situations such as repeated measurements and feedback control. In particular, we discuss the relationship between our approach based on the transfer entropy and another approach based on the dynamic information flow [53–58]. In Section 3.7, we summarize this chapter and discuss the future prospects of information thermodynamics.

3.2 Brief Review of Information Contents

In this section, we review the basic properties of several informational quantities. We first discuss various types of entropy: the Shannon entropy, the relative entropy, and the mutual information [21, 22]. We then describe the transfer entropy that quantifies the directional information transfer [23]. 3.2.1 Shannon Entropy

We first discuss the Shannon entropy, which characterizes the randomness of probability variables. Let x be a probability variable with probability distribution p(x). We first define a quantity called the stochastic Shannon entropy: s(x) ∶= − ln p(x),

(3.2)

which is large if p(x) is small. The ensemble average of s(x) over all x is equal to the Shannon entropy: ∑ ⟨s(x)⟩ ∶= − p(x) ln p(x). (3.3) x

We note that ⟨· · · ⟩ describes the ensemble average throughout this chapter. Since 0 ≤ p(x) ≤ 1, we have s(x) ≥ 0, and therefore, ⟨s(x)⟩ ≥ 0.

(3.4)

Let y be another probability variable that has the joint probability distribution with x as p(x, y). The conditional probability of x under the condition of y is given by p(x|y) ∶= p(x, y)∕p(y), which is the Bayes rule. We define the stochastic conditional Shannon entropy as s(x|y) ∶= − ln p(x|y), whose ensemble average is the conditional Shannon entropy: ∑ ⟨s(x|y)⟩ = − p(x, y) ln p(x|y). x,y

(3.5)

(3.6)

3.2

Brief Review of Information Contents

3.2.2 Relative Entropy

We next introduce the relative entropy (or the Kullback–Leibler divergence), which is a measure of the difference of two probability distributions. We consider two probability distributions p and q on the same probability variable x. The relative entropy between the probability distributions is defined as ∑ p(x) ∑ = p(x) ln p(x)[ln p(x) − ln q(x)]. (3.7) DKL (p||q) ∶= q(x) x x By introducing the stochastic relative entropy as dKL (p(x)||q(x)) ∶= ln p(x) − ln q(x),

(3.8)

we write the relative entropy as DKL (p||q) = ⟨dKL (p(x)||q(x))⟩.

(3.9)

The relative entropy is always nonnegative. In order to show this, we use the Jensen inequality [22]: ⟨− ln[q(x)∕p(x)]⟩ ≥ − ln⟨q(x)∕p(x)⟩,

(3.10)

which is a consequence of the concavity of the logarithmic function. We then have ⟩ ⟨ q(x) DKL (p||q) ≥ − ln p(x) ∑ q(x) = − ln p(x) p(x) x ∑ q(x) = − ln x

= 0, (3.11) ∑ where we used x q(x) = 1. We note that DKL (p(x)||q(x)) = 0 only if q(x) = p(x). We can also show the nonnegativity of the relative entropy in a slightly different way as follows. We first note that ⟨e−dKL (p(x)||q(x)) ⟩ = 1, because ⟨e−dKL (p(x)||q(x)) ⟩ =



(3.12) q(x) p(x)

⟩ =

∑ x

p(x)

q(x) ∑ = q(x) = 1. p(x) x

(3.13)

By applying the Jensen inequality to the exponential function that is convex, we have ⟨exp(−dKL (p(x)||q(x)))⟩ ≥ exp(−⟨dKL (p(x)||q(x))⟩).

(3.14)

Therefore, we obtain 1 ≥ exp(−DKL (p||q)),

(3.15)

which implies the nonnegativity of the relative entropy. We note that this proof is closely related to the fluctuation theorem as shown in Section 3.3.

67

68

3 Information Flow and Entropy Production on Bayesian Networks

3.2.3 Mutual Information

We discuss the mutual information between two probability variables x and y, which is an informational measure of correlation [21, 22]. The stochastic mutual information between x and y is defined as I(x ∶ y) ∶= ln

p(x, y) = ln p(x, y) − ln p(x) − ln p(y), p(x)p(y)

(3.16)

which can be rewritten as the stochastic relative entropy between p(x, y) and p(x)p(y) as I(x ∶ y) = dKL (p(x, y)||p(x)p(y)). Its ensemble average is the mutual information: ∑ p(x, y) = ⟨dKL (p(x, y)||p(x)p(y))⟩. p(x, y) ln ⟨I(x ∶ y)⟩ = p(x)p(y) x,y

(3.17)

(3.18)

From the nonnegativity of the relative entropy, we have ⟨I(x ∶ y)⟩ ≥ 0.

(3.19)

The equality is achieved only if x and y are stochastically independent, that is, p(x, y) = p(x)p(y). The mutual information can also be rewritten as the difference of the Shannon entropy as ⟨I(x ∶ y)⟩ = ⟨s(x)⟩ + ⟨s(y)⟩ − ⟨s(x, y)⟩ = ⟨s(x)⟩ − ⟨s(x|y)⟩ = ⟨s(y)⟩ − ⟨s(y|x)⟩.

(3.20)

From the nonnegativity of the conditional Shannon entropy, we find that the mutual information is bounded by the Shannon entropy: ⟨I(x ∶ y)⟩ ≤ ⟨s(x)⟩,

⟨I(x ∶ y)⟩ ≤ ⟨s(y)⟩.

(3.21)

Figure 3.2 shows a Venn diagram that summarizes the relationship between the Shannon entropy and the mutual information. We can also define the stochastic conditional mutual information between x and y under the condition of another probability variable z as I(x ∶ y|z) ∶= ln

p(x, y|z) = dKL (p(x, y|z)||p(x|z)p(y|z)). p(x|z)p(y|z)

Its ensemble average is the conditional mutual information: ∑ p(x, y|z) . ⟨I(x ∶ y|z)⟩ ∶= p(x, y, z) ln p(x|z)p(y|z) x,y,z

(3.22)

(3.23)

We have, ⟨I(x ∶ y|z)⟩ ≥ 0, where the equality is achieved only if x and y are conditionally independent, that is, p(x, y|z) = p(x|z)p(y|z).

3.2

Brief Review of Information Contents

⟨s(x,y)⟩

⟨s(x /y)⟩

⟨I(x : y)⟩

⟨s(y /x)⟩

⟨s(y)⟩

⟨s(x)⟩

Figure 3.2 Venn diagram showing the relationship between the Shannon entropy and the mutual information.

3.2.4 Transfer Entropy

The directional information transfer between two stochastic systems can be characterized by an informational quantity called the transfer entropy [23]. We consider a sequence of two probability variables: (x1 , x2 , … , xN , y1 , y2 , … , yN ). Intuitively, the states of interacting two systems X and Y at time k (= 1, 2, … , N) is given by (xk , yk ). The time evolution of the composite system is characterized by the transition probability p(xk+1 , yk+1 |x1 , y1 , … , xk , yk ), which is the probability of (xk+1 , yk+1 ) under the condition of (x1 , y1 , … , xk , yk ). The joint probability of all the variables is given by ∏

N−1

p(x1 , … , xN , y1 , … , yN ) =

p(xk+1 , yk+1 |x1 , y1 , … , xk , yk ) ⋅ p(x1 , y1 ). (3.24)

k=1

We now consider the information transfer from system X to Y during time k and k + 1. We define the stochastic transfer entropy as the stochastic conditional mutual information: tr (X → Y ) ∶= I((x1 , … , xk ) ∶ yk+1 |y1 , … , yk ) Ik+1 p(x1 , … , xk , yk+1 |y1 , … , yk ) . ∶= ln p(x1 , … , xk |y1 , … , yk )p(yk+1 |y1 , … , yk )

(3.25)

Its ensemble average is the transfer entropy: tr (X → Y )⟩ ∶= ⟨I((x1 , … , xk ) ∶ yk+1 |y1 , … , yk )⟩ ⟨Ik+1 ∑ ∶= p(x1 , … , xk , y1 , … , yk , yk+1 ) x1 ,…,xk ,y1 ,…,yk+1

× ln

p(x1 , … , xk , yk+1 |y1 , … , yk ) , p(x1 , … , xk |y1 , … , yk )p(yk+1 |y1 , … , yk )

(3.26)

69

70

3 Information Flow and Entropy Production on Bayesian Networks

which represents the information about past trajectory (x1 , x2 , … , xk ) of system X, which is newly obtained by system Y from time k to k + 1. While the mutual information is symmetric between two variables in general, the transfer entropy is asymmetric between two systems X and Y , as the transfer entropy represents the directional transfer of information. Equality (3.25) can be rewritten as tr Ik+1 (X → Y ) = I((x1 , … , xk ) ∶ (y1 , … , yk , yk+1 )) − I((x1 , … , xk ) ∶ (y1 , … , yk )),

(3.27) because I((x1 , … , xk ) ∶ yk+1 |y1 , … , yk ) p(x1 , … , xk , y1 , … , yk , yk+1 )p(y1 , … , yk ) = ln p(x1 , … , xk , y1 , … , yk )p(y1 , … , yk , yk+1 ) p(x1 , … , xk , y1 , … , yk , yk+1 ) p(x1 , … , xk , y1 , … , yk ) = ln − ln p(x1 , … , xk )p(y1 , … , yk , yk+1 ) p(x1 , … , xk )p(y1 , … , yk ) = I((x1 , … , xk ) ∶ (y1 , … , yk , yk+1 )) − I((x1 , … , xk ) ∶ (y1 , … , yk )).

(3.28)

Equality (3.27) clearly shows the meaning of the transfer entropy: the information about X newly obtained by Y . We note that Eq. (3.25) can also be rewritten by using the stochastic conditional Shannon entropy: tr Ik+1 (X → Y ) = s(yk+1 |y1 , … , yk ) − s(yk+1 |x1 , … , xk , y1 , … , yk ).

(3.29)

tr (X → Y )⟩ describes the reduction of the conditional Shannon Therefore, ⟨Ik+1 entropy of yk+1 due to the information gain about system X, which again confirms the meaning of the transfer entropy.

3.3 Stochastic Thermodynamics for Markovian Dynamics

We review stochastic thermodynamics of Markovian dynamics [2, 3], which is a theoretical framework to describe thermodynamic quantities such as the work, heat, and entropy production, at the level of thermal fluctuations. In particular, we discuss the second law of thermodynamics and the fluctuation theorem [4–9]. 3.3.1 Setup

We consider system X that stochastically evolves. We assume the physical situation that system X is attached to a single heat bath at inverse temperature 𝛽 ∶= (kB T)−1 , and also that the system is driven by external control parameter 𝜆 that describes, for example, the volume of the gas. We also assume that nonconservative force is not applied to system X for simplicity. Moreover, we assume that system X does not include any odd variable that changes its sign with

3.3

Stochastic Thermodynamics for Markovian Dynamics

External parameters

λN λN–1 ⃛

λ3 λ2 λ1 k=1

2

3

4 ⃛

N–1 N

Time

Figure 3.3 Discretization of the time evolution of the external parameter.

the time-reversal transformation (e.g., momentum). The generalization beyond these simplifications is straightforward. Although real physical dynamics are continuous in time, our formulation in this chapter is discrete in time. Therefore, we discretize time as follows. Suppose that the real stochastic dynamics of system X is parameterized by continuous time t. We then focus on the state of system X only at discrete time tk ∶= kΔt (k = 1, 2, … , N), where Δt is a finite time interval. In the following, we refer to time tk just as “time k.” Let xk be the state of system X at time k. We next assume that 𝜆 takes a fixed value 𝜆k during time interval tk ≤ t < tk+1 . The value of 𝜆 is changed from 𝜆k to 𝜆k+1 immediately before time tk+1 (see also Fig. 3.3). We here assume that the time evolution of 𝜆 is predetermined independent of the state of X. Let p(xk |xk−1 , … , x1 ) be the conditional probability of state xk under the condition of past trajectory x1 → · · · → xk−1 . It is natural to assume that the conditional probability is determined by external parameter 𝜆k that is fixed during time interval tk ≤ t < tk+1 ; we can explicitly show the 𝜆k -dependence by writing p(xk |xk−1 , … , x1 ; 𝜆k ). We also assume that the correlation time of the heat bath in the continuoustime dynamics is much shorter than Δt. Under this assumption, the discretized time evolution x1 → x2 → · · · → xN can be regarded as Markovian. We note that, if the continuous-time dynamics itself is Markovian, the discretized dynamics is obviously Markovian. From the Markovian assumption, we have p(xk |xk−1 , … , x1 ; 𝜆k ) = p(xk+1 |xk ; 𝜆k ),

(3.30)

which we sometimes write as, for simplicity of notation, p(xk |xk−1 ) ∶= p(xk+1 |xk ; 𝜆k ).

(3.31)

The joint probability distribution of (x1 , x2 , … , xN ) is then given by p(x1 , x2 , … , xN ) ∶= p(xN |xN−1 ) · · · p(x3 |x2 )p(x2 |x1 )p(x1 ).

(3.32)

In order to simplify the notation, we define set  ∶= {x1 , x2 , … , xN }, and denote p() ∶= p(x1 , x2 , … , xN ).

(3.33)

71

72

3 Information Flow and Entropy Production on Bayesian Networks

Strictly, set {x1 , x2 , … , xN } is not the same as vector (x1 , x2 , … , xN ). However, we sometimes do not distinguish them by notations for the sake of simplicity. 3.3.2 Energetics

We now consider the energy change in system X, and discuss the first law of thermodynamics. Let E(xk ; 𝜆k ) be the energy (or the Hamiltonian) of system X at time tk , which depends on external parameter 𝜆k as well as state xk . The energy change in system X is decomposed into two parts: heat and work. Heat is the energy change in X due to the stochastic change of the state of X induced by the heat bath, and work is the energy change due to the change of external parameter 𝜆. We stress that the heat and work are defined at the level of stochastic trajectories in stochastic thermodynamics [2]. The heat absorbed by system X from the heat bath during time interval tk ≤ t < tk+1 is given by Qk ∶= E(xk+1 ; 𝜆k ) − E(xk ; 𝜆k ),

(3.34)

which is a stochastic quantity due to the stochasticity of xk and xk+1 . On the contrary, the work is performed at time k at which the external parameter is changed. The work performed on system X at time k is given by (see also Fig. 3.3) Wk ∶= E(xk ; 𝜆k ) − E(xk ; 𝜆k−1 ),

(3.35)

which is also a stochastic quantity. The total heat absorbed by system X from time 1 to N along the trajectory (x1 , x2 , … , xN ) is then given by ∑

N−1

Q ∶=

Qk ,

(3.36)

k=1

and the total work is given by W ∶=

N ∑

Wk .

(3.37)

k=2

It is easy to check that the total heat and the work satisfy the first law of thermodynamics: ΔE = Q + W ,

(3.38)

ΔE ∶= E(xN , 𝜆N ) − E(x1 , 𝜆1 )

(3.39)

where

is the total energy change. We note that Eq. (3.38) is the first law at the level of individual trajectories.

3.3

Stochastic Thermodynamics for Markovian Dynamics

3.3.3 Entropy Production and Fluctuation Theorem

We next consider the second law of thermodynamics. We start from the concept of the detailed balance, which is satisfied in the absence of any nonconservative force. The detailed balance is given by, from time k to k + 1, p(xk+1 |xk ; 𝜆k )e−𝛽E(xk ;𝜆k ) = pk (xk |xk+1 ; 𝜆k )e−𝛽E(xk+1 ;𝜆k ) ,

(3.40)

where pk (xk |xk+1 ; 𝜆k ) describes the “backward” transition probability from xk+1 to xk under an external parameter 𝜆k . Equality (3.40) can also be written as, from the definition of heat (3.34), p(xk+1 |xk ; 𝜆k ) = e−𝛽Qk . p(xk |xk+1 ; 𝜆k )

(3.41)

The detailed balance condition (3.40) implies that, if the external parameter is fixed at 𝜆k and is not changed in time, the steady distribution of system X becomes the canonical distribution peq (x; 𝜆k ) = e𝛽(F(𝜆k )−E(x;𝜆k )) , (3.42) ∑ where F(𝜆k ) ∶= −𝛽 −1 ln x e−𝛽E(x;𝜆k ) is the free energy. In fact, it is easy to check that ∑ p(xk+1 |xk ; 𝜆k )peq (xk ; 𝜆k ) = peq (xk+1 ; 𝜆k ). (3.43) xk

It is known that the expression of the detailed balance (3.40) is valid for a much broader class of dynamics than the present setup. In fact, it is known that Eq. (3.40) is valid for Langevin dynamics even in the presence of nonconservative force [9]. Moreover, a slightly modified form of Eq. (3.40) is valid for nonequilibrium dynamics with multiple heat baths at different temperatures [8]. Therefore, we regard Eq. (3.40) as a starting point of the following argument. We now consider the entropy production, which is the sum of the entropy changes in system X and the heat bath. The stochastic entropy change in system X from time k to k + 1 is given by ΔsXk ∶= s(xk+1 ) − s(xk ),

(3.44)

where s(xk ) ∶= − ln p(xk ) is the stochastic Shannon entropy. The ensemble average of (3.44) gives the change in the Shannon entropy as ⟨ΔsXk ⟩ ∶= ⟨s(xk+1 )⟩ − ⟨s(xk )⟩. The total stochastic entropy change in X from time 1 to N is given by ∑

N−1

ΔsX ∶=

ΔsXk = s(xN ) − s(x1 ),

(3.45)

k=1

which is also written as p(x1 ) ΔsX = ln . p(xN )

(3.46)

73

74

3 Information Flow and Entropy Production on Bayesian Networks

The stochastic entropy change in the heat bath is identified with the heat dissipation into the bath [9]: Δsbath ∶= −𝛽Qk . k

(3.47)

From Eq. (3.41), Eq. (3.47) can also be rewritten as = ln Δsbath k

p(xk+1 |xk ; 𝜆k ) . p(xk |xk+1 ; 𝜆k )

(3.48)

The total stochastic entropy change in the heat bath from time 1 to N is then given by ∑

N−1

Δsbath ∶=

Δsbath = −𝛽Q, k

(3.49)

k=1

which can be rewritten as p(xN |xN−1 ; 𝜆N−1 ) · · · p(x3 |x2 ; 𝜆2 )p(x2 |x1 ; 𝜆1 ) . Δsbath = ln p(x1 |x2 ; 𝜆1 )p(x2 |x3 ; 𝜆2 ) · · · p(xN−1 |xN ; 𝜆N−1 )

(3.50)

The total stochastic entropy production of system X and the heat bath from time k to k + 1 is then defined as 𝜎k ∶= ΔsXk + Δsbath , k

(3.51)

and that from time 1 to N is defined as 𝜎 ∶= ΔsX + Δsbath .

(3.52)

The entropy production ⟨𝜎⟩ is defined as the average of 𝜎, where ⟨· · ·⟩ denotes the ensemble average over probability distribution p(). From Eqs (3.46) and (3.50), we obtain p(xN |xN−1 ; 𝜆N−1 ) · · · p(x3 |x2 ; 𝜆2 )p(x2 |x1 ; 𝜆1 )p(x1 ) 𝜎 = ln , (3.53) p(x1 |x2 ; 𝜆1 )p(x2 |x3 ; 𝜆2 ) · · · p(xN−1 |xN ; 𝜆N−1 )p(xN ) which is sometimes referred to as the detailed fluctuation theorem [8]. We discuss the meaning of the probability distributions in the right-hand side of Eq. (3.53). First, we recall that the probability distribution of  is given by p() ∶= p(xN |xN−1 ; 𝜆N−1 ) · · · p(x3 |x2 ; 𝜆2 )p(x2 |x1 ; 𝜆1 )p(x1 ),

(3.54)

which describes the probability of trajectory x1 → x2 → · · · → xN with the time evolution of the external parameter 𝜆1 → 𝜆2 → · · · 𝜆N . On the contrary, pB () ∶= p(x1 |x2 ; 𝜆1 )p(x2 |x3 ; 𝜆2 ) · · · p(xN−1 |xN ; 𝜆N−1 )p(xN )

(3.55)

is regarded as the probability of the “backward” trajectory xN → xN−1 → · · · → x1 starting from the initial distribution p(xN ), where the time evolution of the external prarameter is also time-reversed as 𝜆N → 𝜆N−1 → · · · → 𝜆1 . In other words, pB () describes the probability of the time-reversal of the original dynamics. In order to emphasize this, we introduced suffix “B” in pB () that represents “backward.” We also write pB (xk−1 |xk ) ∶= p(xk−1 |xk ; 𝜆k−1 ).

(3.56)

3.3

Stochastic Thermodynamics for Markovian Dynamics

We again stress that pB () is different from the original probability p(), but describes the probability of the time-reversed trajectory with the time-reversed time evolution of the external parameter. By using notations (3.54) and (3.55), Eq. (3.53) can be written in a simplified way as follows: 𝜎 = ln

p() . pB ()

(3.57)

In Eqs (3.53) and (3.57), the entropy production is determined by the ratio of the probabilities of a trajectory and its time-reversal. This implies that the entropy production is a measure of irreversibility. We consider the second law of thermodynamics, which states that the average entropy production is nonnegative: ⟨𝜎⟩ ≥ 0.

(3.58)

This is a straightforward consequence of the definition of 𝜎 as shown below. We first note that Eq. (3.57) can be rewritten by using the stochastic relative entropy defined in Eq. (3.8): 𝜎 = dKL (p()||pB ()).

(3.59)

By taking the ensemble average of dKL (p()||pB ()) by the probability distribution p(), we find that ⟨𝜎⟩ is equal to the relative entropy between p() and pB (): ⟨𝜎⟩ = ⟨dKL (p()||pB ())⟩ =∶ D(p||pB ),

(3.60)

which is nonnegative and implies inequality (3.58). The second law (3.58) can be shown in another way as follows. We first show that ⟨exp(−𝜎)⟩ = 1, because



⟨exp(−𝜎)⟩ =

(3.61) pB () p()

⟩ =

∑ 

p()

pB () ∑ = pB () = 1. p() 

(3.62)

Equality (3.61) is called the integral fluctuation theorem [7, 9]. By applying the Jensen inequality, we obtain ⟨exp(−𝜎)⟩ ≥ exp(−⟨𝜎⟩),

(3.63)

which, together with Eq. (3.61), leads to the second law (3.58). We note that Eq. (3.61) can be regarded as a special case of Eq. (3.12), and the above proof of inequality (3.58) is parallel to the argument below Eq. (3.12). We next consider the physical meaning of the entropy production for a special case, and relate the entropy production to the work and free energy. Suppose that the initial and the final probability distributions are given by the canonical distributions such that p(x1 ) = peq (x1 ; 𝜆1 ) and p(xN ) = peq (xN ; 𝜆N ). In this case, the stochastic Shannon entropy change is given by ΔsX = ln

peq (x1 ; 𝜆1 ) peq (xN ; 𝜆N )

= −𝛽(ΔF − ΔE),

(3.64)

75

76

3 Information Flow and Entropy Production on Bayesian Networks

where ΔF ∶= F(𝜆N ) − F(𝜆1 ) is the free energy change and ΔE ∶= E(xN ; 𝜆N ) − E(x1 ; 𝜆1 ) is the energy change. Therefore, the stochastic entropy production is given by 𝜎 = ΔsX − 𝛽Q = 𝛽(−ΔF + ΔE − Q).

(3.65)

By using the first law of thermodynamics (3.38), we obtain 𝜎 = 𝛽(W − ΔF).

(3.66)

Equality (3.66) gives the energetic interpretation of the entropy production for transitions between equilibrium states. In this case, the integral fluctuation theorem (3.61) reduces to ⟨e𝛽(ΔF−W ) ⟩ = 1,

(3.67)

which is called the Jarzynski equality [7]. It can also be shown that Eq. (3.67) is still valid even when the final distribution is out of equilibrium [7]. The second law of thermodynamics (3.58) then reduces to ⟨W ⟩ ≥ ΔF,

(3.68)

which is a well-known energetic expression of the second law; the free energy increase cannot be larger than the performed work.

3.4 Bayesian Networks

In this section, we review the basic concepts of Bayesian networks [64–69], which represent causal structures of stochastic dynamics with directed acyclic graphs. We first define the directed acyclic graph (see also Fig. 3.4). The directed graph  ∶= {, } is given by a finite set of nodes  and a finite set of directed edges . We write the set of nodes as  = {a1 , … , aN },

(3.69)

where aj is a node and N is the number of nodes. The set of directed edges  is given by a subset of all ordered pairs of nodes in :  ∶= {aj → aj′ |aj , aj′ ∈ , aj ≠ aj′ }.

(3.70)

Intuitively,  is the set of events, and their causal relationship is represented by . If (aj → aj′ ) ∈ , we say that aj is a parent of aj′ (or equivalently, aj′ is a child of aj ). We write as pa(aj ) the set of parents of aj (see also Fig. 3.5): pa(aj ) ∶= {aj′ |(aj′ → aj ) ∈ }. a3 a2 a1

: Node : Edge

(3.71)

Figure 3.4 Example of a simple directed acyclic graph  = {, } with  = {a1 , a2 , a3 } and  = {a1 → a2 , a1 → a3 }.

3.4

Bayesian Networks

p(aj an(aj)) = p(aj pa(aj),V ′) = p(aj pa(aj))

aj

pa(aj) V′

an(aj)

Figure 3.5 Schematic of the parents of aj . The set of the parents, pa(aj ), is defined as the set of the nodes that have directed edges toward aj . This figure also illustrates the setup of Eq. (3.80).

A directed graph is called acyclic if  does not include any directed cyclic path. In other words, a directed graph is cyclic if there exists (j, j(1) , j(2) , … , j(n) ), such that {aj → aj(1) , aj(1) → aj(2) , … , aj(n−1) → aj(n) , aj(n) → aj } ⊂ ; otherwise, it is acyclic. The acyclic property implies that the causal structure does not include any “time loop.” If a directed graph is acyclic, we can define the concept of topological ordering. An ordering of , written as (a1 , a2 , … , aN ), is called topological ordering, if aj is not a parent of aj′ for j > j′ . We then define the set of ancestors of aj by an (aj ) ∶= {aj−1 , … , a1 } (an(a1 ) ∶= ∅). We note that a topological ordering is not necessary unique. We show a simple example of a directed acyclic graph  = {, } with  = {a1 , a2 , a3 } and  = {a1 → a2 , a1 → a3 } in Figure 3.4. A node is described by a circle with variable aj , and a directed edge is described by a directed arrow between two nodes. In Figure 3.4, the sets of parents are given by pa(a1 ) = ∅, pa(a2 ) = {a1 }, and pa(a3 ) = {a1 }, where ∅ denotes the empty set. In this case, we have two topological orderings: {a1 , a2 , a3 } and {a1 , a3 , a2 }. We next consider a probability distribution on a directed acyclic graph  = {, }, which is a key concept of Bayesian networks. A directed edge aj → aj′ ∈  on a Bayesian network represents the probabilistic dependence (i.e., causal relationship) between two nodes aj and aj′ . Therefore, variable aj only depends on its parents pa(aj ). The causal relationship can be described by the conditional probability of aj under the condition of pa(aj ), written as p(aj |pa(aj )). If pa(aj ) = ∅, p(aj ) ∶= p(aj |∅) is just the probability of aj . The joint probability distribution of all the nodes in a Bayesian network is then defined as p() ∶=

V ∏

p(aj |pa(aj )),

(3.72)

j=1

which implies that the probability of a node is only determined by its parents. This definition represents the causal structure of Bayesian networks; the cause of a node is given by its parents.

77

78

3 Information Flow and Entropy Production on Bayesian Networks

(a)

(b) a6

a3 a4

a5

a2

a3

a2

a1 a1

Figure 3.6 Simple examples of Bayesian networks.

In Figure 3.6, we show two simple examples of Bayesian networks. For Figure 3.6(a), the joint distribution is given by p(a1 , a2 , a3 ) ∶= p(a3 |a2 )p(a2 |a1 )p(a1 ),

(3.73)

which describes a simple Markovian process. Figure 3.6(b) is a little less trivial, whose joint distribution is given by p(a1 , a2 , … , a6 ) = p(a6 |a1 , a4 , a5 )p(a5 |a3 )p(a4 |a2 )p(a3 |a1 )p(a2 |a1 )p(a1 ). (3.74) For any subset of nodes  ⊆ , the probability distribution on  is given by ∑ p(). (3.75) p() = ⧵

For , ′ ⊆ , the joint probability distribution is given by ∑ p(). p(, ′ ) =

(3.76)

⧵(∪′ )

The conditional probability is then given by the Bayes rule: p(|′ ) =

p(, ′ ) . p(′ )

(3.77)

Let A() be a probability variable that depends on nodes in . The ensemble average of A() is defined as ∑ ⟨A()⟩ ∶= A()p(). (3.78) 

In particular, if A depends only on  ⊆ , Eq. (3.78) reduces to ∑ ∑ A()p() = A()p(). ⟨A()⟩ ∶= 

(3.79)



We note that p(aj |an(aj )) = p(aj |pa(aj )) holds by definition, which implies that any probability variable directly depends on the nearest ancestors (i.e., parents).

3.5

Information Thermodynamics on Bayesian Networks

This is consistent with the description of directed acyclic graphs. In general, we have p(aj |pa(aj ),  ′ ) = p(aj |pa(aj ))

(3.80)

for any  ′ ⊆ {an(aj ) ⧵ pa(aj )} (see also Fig. 3.5).

3.5 Information Thermodynamics on Bayesian Networks

We now discuss a general framework of stochastic thermodynamics for complex dynamics described by Bayesian networks [52], where system X is in contact with systems C in addition to the heat bath. In particular, we derive the generalized second law of thermodynamics, which states that the entropy production is bounded by an informational quantity that consists of the initial and final mutual between X and C, and the transfer entropy from X to C. 3.5.1 Setup

First, we discuss how Bayesian networks represent the causal relationships in physical dynamics. We consider a situation that several physical systems interact with each other and stochastically evolve in time. A probability variable associated with a node, aj ∈ , represents a state of one of the systems at a particular time. We assume that the topological ordering (a1 , … , aN ) describes the time ordering; the time of state aj should not be later than the time of state aj+1 . This assumption does not exclude a situation that aj and aj+1 can be states of different systems at the same time. Each edge in  describes the causal relationship between states of the systems at different times. Correspondingly, the conditional probability p(aj |pa(aj )) characterizes the stochastic dynamics. The joint probability p() represents the probability of trajectories of the whole system. We focus on a particular system X, whose time evolution is described by a set of nodes. Let  ∶= {x1 , … , xN } ⊆  be the set of nodes that describe states of X and (x1 , … , xN ) be the topological ordering of the elements of , where we refer to the suffixes as “time.” A probability variable xk in  describes the state of system X at time k. We assume that there is a causal relationship between xk and xk+1 such that xk ∈ pa(xk+1 ).

(3.81)

For simplicity, we also assume that pa(xk+1 ) ∩  = {xk },

(3.82)

which does not exclude the situation that there are nodes in pa(xk+1 ) outside of  (see Fig. 3.7).

79

80

3 Information Flow and Entropy Production on Bayesian Networks

 

c

cN’ xN

x2 x1

c2 c1

Figure 3.7 Schematic of the physical setup of Bayesian networks. The time evolution of system X is given by the sequence of nodes  = {x1 , … , xN }, and the time evolution of C is given by  ∶=  ⧵  = {c1 , … , cN′ }.

We next consider the systems other than X, which we refer to as C. The states of C are given by the nodes in set  ∶=  ⧵  (see also Fig. 3.7). Let (c1 , c2 , … , cN ′ ) be the topological ordering of , where we again refer to the suffixes as “time.” A probability variable cl describes the state of C at time l. Since  =  ∪ , we can define a joint topological ordering of  as (c1 , … , cl(1) , x1 , cl(1) +1 , … , cl(2) , x2 , cl(2) +1 , … , … cl(N) , xN , cl(N) +1 , … , cN ′ ), (3.83) where the ordering (c1 , … , cl(1) , … , cl(2) , … , cN ′ ) is the same as the ordering (c1 , c2 , … , cN ′ ). The joint probability distribution p(, ) can be obtained from Eq. (3.72): p(, ) =

N ∏ k=1



p(xk |pa(xk ))

N ∏

p(cl |pa(cl )),

(3.84)

l=1

where the conditional probability p(xk+1 |pa(xk+1 )) represents the transition probability of system X from time k to k + 1. We note that the dynamics in  can be non-Markovian due to the non-Markovian property of C. We summarize the notations in Table 3.1. 3.5.2 Information Contents on Bayesian Networks

We consider information contents on Bayesian networks; the initial and final mutual informations between X and C and the transfer entropy from X to C. We first consider the initial correlation of the dynamics. The initial state x1 of X is initially correlated with its parents pa(x1 ) ⊆ . The initial correlation between system X and C is then characterized by the mutual information between x1 and

3.5

Information Thermodynamics on Bayesian Networks

81

Table 3.1 Summary of notations. Notation

Meaning

pa(a) (Parents of a) an(a) (Ancestors of a) xk  ∶= {x1 , … , xN }  ∶= {c1 , … , cN ′ }  ′ ∶= an(xN ) ∩  pa (cl ) ∶= pa(cl ) ∩  pa (cl ) ∶= pa(cl ) ∩  k+1 ∶= pa(xk+1 ) ⧵ {xk } Iini ∶= I(x1 : pa(x1 )) Iltr ∶= I(cl ∶ pa (cl )|c1 , … , cl−1 ) ∑ I tr ∶= l ∶ c ∈ ′ Iltr l Ifin ∶= I(xN ∶  ′ ) Θ ∶= Ifin − I tr − Iini 𝜎

Set of nodes that have causal relationship to node a. Set of nodes before node a in the topological ordering. State of system X at time k. Set of states of system X. Set of states of other systems C. Set of the ancestors of xN in C. Set of the parents of cl in . Set of the parents of cl in . Set of parents of xk+1 outside of . Initial mutual information between X and C. Transfer entropy from X to cl . Total transfer entropy from X to C. Final mutual information between X and C. −⟨Θ⟩ is the available information about X obtained by C. Stochastic entropy change in X and the heat bath.

(a)

(b)

(c) C

C





 x N





⫶ cI

Ifin Iltr

pax(cI)



x1





pac(cI)

Iini



pa(x1) ⫶

C′



⫶ {cl–1,…,c1}

Figure 3.8 Schematics of informational quantities on Bayesian networks. (a) The initial correlation between x1 and pa(x1 ), (b) the final correlation between xN and  ′ , and (c) the transfer entropy from X to cl .

pa(x1 ). The corresponding stochastic mutual information is given by (see also Fig. 3.8 (a)) Iini ∶= I(x1 : pa(x1 )).

(3.85)

Its ensemble average ⟨Iini ⟩ is the mutual information of the initial correlation. It vanishes only if p(x1 |pa(x1 )) = p(x1 ), or equivalently, pa(x1 ) = ∅. We next consider the final correlation of the dynamics. The final state of X is given by xN ∈ , which is correlated with its ancestors an(xN ). The final correlation between system X and C is then characterized by the mutual information

C

82

3 Information Flow and Entropy Production on Bayesian Networks

between xN and  ′ ∶= an(xN ) ∩ . The corresponding stochastic mutual information is given by (see also Fig. 3.8 (b)) Ifin ∶= I(xN ∶  ′ ).

(3.86)

Its ensemble average ⟨Ifin ⟩ is the mutual information of the final correlation. It vanishes only if p(xN | ′ ) = p(xN ). We next consider the transfer entropy from X to C during the dynamics. The transfer entropy on Bayesian networks has been discussed in Ref. [70]. We here focus on the role of the transfer entropy on Bayesian networks in terms of information thermodynamics. Let cl ∈ . Let pa (cl ) ∶= pa(cl ) ∩  be the set of the parents of cl in  and pa (cl ) ∶= pa(cl ) ∩  be the set of the parents of cl in  (see also Fig. 3.8 (c)). We note that pa (cl ) ∪ pa (cl ) = pa(cl ) and pa (cl ) ∩ pa (cl ) = ∅. We then have p(cl |pa(cl )) = p(cl |pa (cl ), pa (cl )) = p(cl |pa (cl ), c1 , … , cl−1 ),

(3.87)

= {c1 , … , cl−1 } ⧵ pa (cl ). where we used Eq. (3.80) with The transfer entropy from system X to state cl is defined as the conditional mutual information between cl and pa (cl ) under the condition of {c1 , … , cl−1 }. The corresponding stochastic transfer entropy is given by ′

Iltr ∶= I(cl ∶ pa (cl )|c1 , … , cl−1 ) p(cl , pa (cl )|c1 , … , cl−1 ) ∶= ln p(cl |c1 , … , cl−1 )p(pa (cl )|c1 , … , cl−1 ) p(cl |pa (cl ), c1 , … , cl−1 ) . = ln p(cl |c1 , … , cl−1 )

(3.88)

It can also be rewritten by using the conditional stochastic Shannon entropy: Iltr = s(cl |c1 , … , cl−1 ) − s(cl |pa (cl ), c1 , … , cl−1 ),

(3.89) Iltr

which is analogous to Eq. (3.29). The ensemble average of is the transfer entropy ⟨Iltr ⟩, which describes the amount of information about X that is newly obtained by C at time l. ⟨Iltr ⟩ is nonnegative from the definition, and is zero only if ln p(cl |pa (cl ), c1 , … , cl−1 ) = ln p(cl |c1 , … , cl−1 ), or equivalently pa (cl ) = ∅. The total transfer entropy from X to C during the dynamics from x1 to xN is then given by ∑ Iltr . (3.90) I tr ∶= l ∶ cl ∈ ′

By summing up the foregoing information contents, we introduce a key informational quantity Θ: Θ ∶= Ifin − I tr − Iini ,

(3.91)

which plays a crucial role in the generalized second law that will be discussed in Section 3.5. Here, the minus of the ensemble average of Θ (i.e., −⟨Θ⟩) characterizes

3.5

Information Thermodynamics on Bayesian Networks

the available information about X obtained by C during the dynamics from x1 to xN (see also Fig. 3.8). 3.5.3 Entropy Production

We next focus on the entropy production that is defined as the sum of the entropy changes in system X and the heat bath. While the key idea of the definition is the same as the case for the Markovian dynamics discussed in the previous section, a careful argument is necessary for the entropy production on Bayesian networks, because of the presence of other system C. We consider the subset of probability variables in C (i.e., nodes in ) that affect the time evolution of X from time k to k + 1, which is defined as (see Fig. 3.9) k+1 ∶= pa(xk+1 ) ⧵ {xk } ⊆ .

(3.92)

The transition probability of X from time k to k + 1 is then written as p(xk+1 |pa(xk+1 )) = p(xk+1 |xk , k+1 ).

(3.93)

We note that p(xk+1 |xk , k+1 ) describes the transition probability from xk to xk+1 under the condition that the states of C that affect X are given by k+1 . We define the functional form of p(xk+1 |xk , k+1 ) with arguments (xk+1 , xk , k+1 ) by f (xk+1 , xk , k+1 ) ∶= p(xk+1 |xk , k+1 ).

(3.94)

We then define the backward transition probability as pB (xk |xk+1 , k+1 ) ∶= f (xk , xk+1 , k+1 ),

(3.95)

which describes the transition probability from xk+1 to xk under the same condition k+1 as the forward process. Here, pB (xk |xk+1 , k+1 ) is different from the conditional probability p(xk |xk+1 , k+1 ) ∶= p(xk , xk+1 , k+1 )∕p(xk+1 , k+1 ), which is obtained from

Bk+1

xk+1

cl

Time evolution xk

X

Figure 3.9 Schematic of k+1 .





cl′

⫶ C



Other systems

83

84

3 Information Flow and Entropy Production on Bayesian Networks

the Bayes rule (3.77) of the Bayesian network. In order to emphasize the difference, we used the suffix “B” that represents “backward.” We note that pB (xk |xk+1 , k+1 ) is analogous to p(xk |xk+1 ; 𝜆k ) in Eq. (3.56) of Section 3.3, by replacing 𝜆k by k+1 . In fact, in many situations, we can assume that external parameter 𝜆k is determined by k+1 ; a typical case is feedback control as will be discussed in Sections 6.2 and 6.3. We also note that the backward probability pB (xk |xk+1 , k+1 ) can be defined even in the presence of odd variables such as momentum, by slightly modifying definition (3.95). We then define the entropy change in the heat bath from time k to k + 1 in the form of Eq. (3.48): ∶= ln Δsbath k

p(xk+1 |xk , k+1 ) . pB (xk |xk+1 , k+1 )

(3.96)

can be identified with −𝛽Qk in many situations. In fact, as We note that Δsbath k mentioned above, if k+1 affects xk only through the external parameter, Eq. (3.96) is equivalent to (3.48). In such a case, we can show that Δsbath = −𝛽Qk , as disk cussed in Section 3.3. The entropy change in the heat bath from time 1 to N is then given by ∑

N−1

Δsbath ∶=

Δsbath k

k=1

= ln

p(xN |xN−1 , N ) · · · p(x3 |x2 , 3 )p(x2 |x1 , 2 ) , pB (x1 |x2 , 2 )pB (x2 |x3 , 3 ) · · · pB (xN−1 |xN , N )

(3.97)

which is analogous to Eq. (3.50). The total entropy change in X and the heat bath from time 1 to N is then defined as 𝜎 ∶= ΔsX + Δsbath , which is also written as p(xN |xN−1 , N ) · · · p(x3 |x2 , 3 )p(x2 |x1 , 2 )p(x1 ) . 𝜎 = ln pB (x1 |x2 , 2 )pB (x2 |x3 , 3 ) · · · pB (xN−1 |xN , N )p(xN )

(3.98)

(3.99)

3.5.4 Generalized Second Law

We now consider the relationship between the second law of thermodynamics and informational quantities. The lower bound of the entropy change in system X and the heat bath is given by ⟨Θ⟩: ⟨𝜎⟩ ≥ ⟨Θ⟩,

(3.100)

or equivalently, ⟨𝜎⟩ ≥ ⟨Ifin ⟩ − ⟨I tr ⟩ − ⟨Iini ⟩,

(3.101)

which is the generalized second law of thermodynamics on Bayesian networks.

3.5

Information Thermodynamics on Bayesian Networks

The proof of the generalized second law (3.100) is as follows. We first show that 𝜎 − Θ can be rewritten as the stochastic relative entropy as follows: ] [ N−1 p(xN ,  ′ ) p(x1 ) ∏ p(xk+1 |xk , k+1 ) − ln 𝜎 − Θ = ln p(xN ) k=1 pB (xk |xk+1 , k+1 ) p(xN )p( ′ ) [ ] ∏ p(cl |pa(cl )) p(x1 |pa(x1 )) + ln + ln (3.102) p(x1 ) p(cl |c1 , … , cl−1 ) l ∶ c ∈ ′ l

∏ k=1 p(xk |pa(xk )) l ∶ cl ∈ ′ p(cl |pa(cl )) ∏N−1 ′ k=1 pB (xk |xk+1 , k+1 )p(xN ,  )

∏N = ln

=dKL (p()||pB ()), where we defined ∏

N−1

pB () ∶= p(xN ,  ′ )

pB (xk |xk+1 , k+1 )



p(cl |pa(cl )).

(3.103)

l ∶ cl ∉ ′ ,cl ∈

k=1

We can confirm that pB () is normalized, and can be regarded as a probability distribution: ∑ 

pB () =

∑ X, ′

=





N−1

p(xN ,  ′ )

pB (xk |xk+1 , k+1 )

k=1

p(xN ,  ′ )

xN , ′

(3.104)

=1, ∑  ′ , and xk pB (xk |xk+1 , k+1 )

where we used xN ∈ , k+1 ⊆ = 1. From Eq. (3.102) and the nonnegativity of the relative entropy, we show that the ensemble average of 𝜎 − Θ is nonnegative: ⟨𝜎 − Θ⟩ = DKL (p||pB ) ≥ 0,

(3.105)

which implies the generalized second law (3.100). The equality in (3.100) holds only if p() = pB (). We consider the integral fluctuation theorem corresponding to inequality (3.100). From Eq. (3.12) for the stochastic relative entropy, we have ⟨e−dKL (p()||pB ()) ⟩ = 1,

(3.106)

or equivalently, ⟨e−𝜎+Θ ⟩ = 1.

(3.107)

This is the generalized integral fluctuation theorem for Bayesian networks. By applying the Jensen inequality to Eq. (3.107), we again obtain inequality (3.100).

85

86

3 Information Flow and Entropy Production on Bayesian Networks

We note that, from inequality (3.100) and ⟨Ifin ⟩ ≥ 0, we obtain a weaker bound of the entropy production: ⟨𝜎⟩ ≥ −⟨I tr ⟩ − ⟨Iini ⟩.

(3.108)

This weaker inequality can also be rewritten as the nonnegativity of the relative entropy DKL (p||̃pB ) ≥ 0, where the probability p̃ B () is defined as ∏

N−1

p̃ B () ∶= p(xN )p( ′ )

k=1

pB (xk |xk+1 , k+1 )



p(cl |pa(cl )).

(3.109)

l ∶ cl ∉ ′ ,cl ∈

The corresponding integral fluctuation theorem is given by ⟨e−𝜎−I

tr −I

ini

⟩ = 1.

(3.110)

3.6 Examples

In the following, we illustrate special examples and discuss the physical meaning of the generalized second law (3.100). 3.6.1 Example 1: Markov Chain

As the simplest example, we revisit the Markovian dynamics discussed in Section 3.3 from the viewpoint of Bayesian networks. In this case,  =  = {x1 , … , xN } and  = ∅. The Markovian property is characterized by pa(xk ) = {xk−1 } with k ≥ 2, and pa(x1 ) = ∅ (see also Fig. 3.10). Since k+1 ∶= pa(xk+1 ) ⧵ {xk } = ∅, the entropy production (3.98) is equivalent to Eq. (3.52). From  = ∅, we have Ifin = 0, Iini = 0, Iltr = 0, and therefore Θ = 0. Therefore, the definition of 𝜎 in Eq. (3.99) reduces to Eq. (3.57), and the generalized second law (3.100) just reduces to ⟨𝜎⟩ ≥ 0. 3.6.2 Example 2: Feedback Control with a Single Measurement

We consider the system under feedback control with a single measurement as is the case of the Szilard engine. In this case, system X is the measured system, and the other system C is a memory that stores the measurement outcome. At time k = 1, a measurement on state x1 is performed, and the obtained outcome is stored in memory state m1 . The probability of outcome m1 under the condition of state x1 is denoted by p(m1 |x1 ), which characterizes the measurement error. If p(m1 |x1 ) is the delta function 𝛿m1 ,x1 , the measurement is errorfree. After the measurement, the time evolution of X is affected by m1 such that the transition probability of X from time k to k + 1 is given by p(xk+1 |xk , m1 ) (k = 1, 2, … , N − 1), which is feedback control. In terms of the physical interpretation discussed in Section 3.3, the dynamics of system X is determined by the

3.6

Example 1

Example 3

Example 2

xN

xN





x2

x2

xN mN–1

Ifin



x1

C′

Iltr





m2

x2

m1 x1

Examples

m1 x1

C′



Figure 3.10 Bayesian networks of the examples. Example 1: Markov chain. Example 2: Feedback control with a single measurement. Example 3: Repeated feedback control with multiple adaptive measurements.

external parameter. In the presence of feedback control, the time evolution of the external parameter is determined by m1 . The joint probability distribution of all the variables is then given by p(xN , … , x1 , m1 ) = p(xN |xN−1 , m1 ) · · · p(x2 |x1 , m1 )p(m1 |x1 )p(x1 ).

(3.111)

The Bayesian network corresponding to the above dynamics is characterized as follows. Let  ∶= {x1 , … , xN } be the set of the states of the measured system X,  ∶= {m1 } be the memory state, and  ∶=  ∪  = {x1 , … , xN , m1 } be the set of all notes. The causal structure described by Eq. (3.111) is given by pa(xk ) = {xk−1 , m1 } for k ≥ 2, pa(m1 ) = {x1 }, and pa(x1 ) = ∅ (see also Fig. 3.10). Since k+1 ∶= pa(xk+1 ) ⧵ {xk } = {m1 } for k ≥ 1, the entropy production (3.96) in the heat bath from time k to k + 1 is given by Δsbath = ln k

p(xk+1 |xk , m1 ) . pB (xk |xk+1 , m1 )

(3.112)

Considering the foregoing argument that p(xk+1 |xk , m1 ) depends on m1 through as the heat such that Δsbath = −𝛽Qk . external parameter 𝜆k , we can identify Δsbath k k The total entropy production (3.98) from time 1 to N is given by ] [ N−1 p(x1 ) ∏ p(xk+1 |xk , m1 ) . (3.113) 𝜎 = ln p(xN ) k=1 pB (xk |xk+1 , m1 ) From pa(x1 ) = ∅,  ′ =  = {m1 }, and pa (m1 ) = {x1 }, we have Ifin = I(xN ∶ m1 ), Iini = 0, Itr1 = I(x1 ∶ m1 ), and therefore Θ = I(xN ∶ m1 ) − I(x1 ∶ m1 ),

(3.114)

87

88

3 Information Flow and Entropy Production on Bayesian Networks

which is the difference between the initial and the final mutual information. Therefore, the generalized second law (3.100) reduces to ⟨𝜎⟩ ≥ ⟨I(xN ∶ m1 )⟩ − ⟨I(x1 ∶ m1 )⟩.

(3.115)

We note that inequality (3.115) is equivalent to the generalized second law obtained in Refs [36, 39]. Several simple models that achieve equality in inequality (3.115) have been proposed [32, 34, 35]. In general, the equality in inequality (3.115) is achieved only if a kind of reversibility with feedback control is satisfied [32]; the reversibility condition is given by ∏

N−1



N−1

p(xk+1 |xk , m1 ) ⋅ p(x1 , m1 ) =

k=1

pB (xk |xk+1 , m1 ) ⋅ p(xN , m1 ).

(3.116)

k=1

The left-hand side of Eq. (3.116) represents the probability of the forward trajectory with feedback control. The physical meaning of the right-hand side is as follows. Suppose that we start a backward process just after a forward process by keeping m1 for each trajectory. In the backward process, we use outcome m1 obtained in the forward process to determine the external parameter; we do not perform feedback control in the backward process. The probability distribution of the backward trajectories is then given by the right-hand side of Eq. (3.116). We next consider a special case that the initial and final states of system X are in thermal equilibrium. The initial distribution is given by p(x1 ) = peq (x1 ) ∶= e𝛽(F(1)−E(x1 ;1)) ,

(3.117)

where F(1) is the initial free energy and E(x1 ; 1) is the initial Hamiltonian. Since the final Hamiltonian may depend on outcome m1 due to the feedback control, the final distribution under the condition of m1 is the conditional canonical distribution: p(xN |m1 ) = peq (xN |m1 ) ∶= e𝛽(F(m1 )−E(x1 ;m1 )) ,

(3.118)

where F(m1 ) is the final free energy and E(x1 ; m1 ) is the final Hamiltonian, both of which may depend on outcome m1 . The generalized second law (3.115) is then equivalent to 𝛽⟨W − ΔF⟩ ≥ −⟨I(x1 ∶ m1 )⟩,

(3.119)

where W is the work and ΔF ∶= F(m1 ) − F(1) is the free-energy difference. We note that the ensemble average is needed for ΔF, because F(m1 ) is a stochastic quantity due to the stochasticity of m1 . Inequality (3.119) has been derived in Ref. [27]. The derivation of inequality (3.119) from (3.115) is as follows. We first note that s(xN ) ∶= − ln p(xN ) ] [ p(m1 ) = − ln p(xN |m1 ) p(xN |m1 ) = s(xN |m1 ) + I(xN ∶ m1 ).

(3.120)

3.6

Examples

From Eqs (3.117) and (3.118), we have s(x1 ) = −𝛽(F(1) − E(x1 ; 1)), s(xN |m1 ) = −𝛽(F(m1 ) − E(x1 ; m1 )).

(3.121)

Therefore, we obtain s(xN ) − s(x1 ) = −𝛽(ΔF − ΔE) + I(xN ∶ m1 ).

(3.122)

By substituting the ensemble average of Eq. (3.122) into inequality (3.115), we obtain − 𝛽⟨ΔF − ΔE⟩ + ⟨I(xN ∶ m1 )⟩ − ⟨Q⟩ ≥ ⟨I(xN ∶ m1 )⟩ − ⟨I(x1 ∶ m1 )⟩. (3.123) By noting the first law ΔE = W + Q, we find that inequality (3.123) is equivalent to inequality (3.119). The simplest example of the present setup is the Szilard engine discussed in Section 1.2 (see also Fig. 3.1). In this case, the measurement is error-free and the outcome is m1 = L or R with probability 1∕2, and therefore ⟨I(x1 ∶ m1 )⟩ = ln 2. The final state is no longer correlated with m1 such that ⟨I(xN ∶ m1 )⟩ = 0. The extracted work is −⟨W ⟩ = 𝛽 −1 ln 2, and the free-energy change is ⟨ΔF⟩ = 0. Therefore, for the Szilard engine, both sides of inequality (3.119) are given by ln 2, and the equality in (3.119) is achieved. In this sense, the Szilard engine is an optimal information-thermodynamic engine. 3.6.3 Example 3: Repeated Feedback Control with Multiple Measurements

We consider the case of multiple measurements and feedback control. Let xk be the state of system X at time k (= 1, … , N). Suppose that the measurement outcome obtained at time k (= 1, … , N − 1), written as mk , is affected by past trajectory (x1 , x2 , … , xk ) of system X. In other words, the measurement at time k is performed on trajectory (x1 , x2 , … , xk ). Moreover, we assume that outcome mk is also affected by sequence (m1 , … , mk−1 ) of the past measurement outcomes, which describes the situation that the way of measuring X is changed depending upon the past measurement outcomes; such a measurement is called adaptive. The conditional probability of mk is then given by p(mk |x1 , … , xk−1 , xk , m1 · · · , mk−1 ). Next, outcome mk is used for feedback control after time k, and the transition probability from xk to xk+1 is written as p(xk+1 |xk , m1 , … , mk−1 , mk ). In this case, we assume that external parameter 𝜆k at time k is determined by memory states (m1 , … , mk−1 , mk ). The joint probability distribution of all the variables is then given by p(x1 , … , xN , m1 , … , mN−1 ) ∏

N−1

=

k=1

p(xk+1 |xk , m1 , … , mk )p(mk |x1 , … , xk , m1 , … , mk−1 ) ⋅ p(x1 ). (3.124)

89

90

3 Information Flow and Entropy Production on Bayesian Networks

If outcome mk is affected only by xk such that p(mk |x1 , … , xk , m1 , … , mk−1 ) = p(mk |xk ),

(3.125)

the measurement is Markovian and nonadaptive. If the transition probability from xk to xk+1 depends only on mk such that p(xk+1 |xk , m1 , … , mk−1 , mk ) = p(xk+1 |xk , mk ),

(3.126)

the feedback control is called Markovian. On the contrary, if p(xk+1 |xk , m1 , … , mk−1 , mk ) depends on ml with l < k, the feedback control is called non-Markovian, which describes the effect of time delay of the feedback loop. The Bayesian network corresponding to the above dynamics is as follows. Let  ∶= {x1 , … , xN },  ∶= {m1 , … , mN }, and  ∶=  ∪ . The causal structure is characterized by pa(xk ) = {xk−1 , m1 , … , mk−1 } for k ≥ 2, pa(x1 ) = ∅, pa(mk ) = {x1 , … , xk−1 , xk , m1 , … , mk } for k ≥ 2, and pa(m1 ) = {x1 }. Figure 3.10 describes the Bayesian network of a special case that p(mk |xk , … , x1 , m1 , … , mk−1 ) = p(mk |xk , mk−1 )

(3.127)

and pa(mk ) = {xk , mk−1 } for k ≥ 2. Since k+1 = {m1 , … , mk }, the entropy change (3.96) in the heat bath from time k to k + 1 is given by Δsbath = ln k

p(xk+1 |xk , m1 , … , mk ) . pB (xk |xk+1 , m1 , … , mk )

(3.128)

If we assume that p(xk+1 |xk , m1 , … , mk ) depends on (m1 , … , mk ) only through external parameter 𝜆k , the entropy change is identified with the heat: Δsbath = k −𝛽Qk . The total entropy production (3.98) from time 1 to N is defined as [ ] N−1 p(x1 ) ∏ p(xk+1 |xk , m1 , … , mk ) 𝜎 = ln . (3.129) p(xN ) k=1 pB (xk |xk+1 , m1 , … , mk ) From pa(x1 ) = ∅,  ′ =  = {m1 , … , mN−1 }, and pa (mk ) = {x1 , … , xk }, we have Iini = 0, Ifin = I(xN ∶ (m1 , … , mN−1 )), Iktr = I((x1 , … , xk ) ∶ mk |m1 , … , mk−1 ), and therefore ∑

N−1

Θ = I(xN ∶ (m1 , … , mN−1 )) −

I((x1 , … , xk ) ∶ mk |m1 , … , mk−1 ). (3.130)

l=1

Therefore, the generalized second law (3.100) reduces to ∑

N−1

⟨𝜎⟩ ≥ ⟨I(xN ∶ (m1 , … , mN−1 ))⟩ −

⟨I((x1 , … , xk ) ∶ mk |m1 , … , mk−1 )⟩. (3.131)

k=1

We note that, in the special case illustrated in Figure 3.10, we have pa (mk ) = {xk } and Iktr = I(xk ∶ mk |m1 , … , mk−1 ). Therefore, Θ in Eq. (3.130) reduces to ∑

N−1

Θ = I(xN ∶ (m1 , … , mN−1 )) −

k=1

I(xk ∶ mk |m1 , … , mk−1 ).

(3.132)

3.6

Examples

The equality in Eq. (3.131) holds only if the feedback reversibility is satisfied [32]: ∏

N−1

p(xk+1 |xk , m1 , … , mk )p(mk |x1 , … , xk , m1 , … , mk−1 ) ⋅ p(x1 )

k=1



N−1

=

pB (xk |xk+1 , m1 , … , mk ) ⋅ p(xN , m1 , … , mN−1 ).

(3.133)

k=1

The right-hand side of Eq. (3.133) represents the probability distribution of the backward trajectories. In a backward process, any feedback control is not performed, and the external parameter is changed by using the measurement outcomes obtained in the corresponding forward process. If the initial and final states of system X are in thermal equilibrium such that p(x1 ) = peq (x1 )∶= e𝛽(F(m1 )−E(x1 ;m1 )) and p(xN |m1 , … , mN ) = peq (xN |m1 , … , mN )∶= e𝛽(F(m1 ,…,mN )−E(x1 ;m1 ,…,mN )) , inequality (3.131) reduces to, from a similar argument of the derivation of inequality (3.119), ∑

N−1

𝛽⟨W − ΔF⟩ ≥ −

⟨I((x1 , … , xk ) ∶ mk |m1 , … , mk−1 )⟩,

(3.134)

k=1

which was obtained in Refs [30, 35] for the case of nonadaptive measurements. 3.6.4 Example 4: Markovian Information Exchanges

We consider information exchanges between two interacting systems X and Y. Let xk and yk be the states of systems X and Y at times k = 1, … , N. Suppose that the transition from xk to x_{k+1} is affected by yk, and the transition from yk to y_{k+1} is affected by x_{k+1} (see also Fig. 3.11 (a)). These assumptions imply that the interaction of X and Y is Markovian. During the dynamics, the transfer entropy from X to Y and vice versa can be positive, and the mutual information between the two systems can change. Therefore, such dynamics can describe Markovian information exchanges. In the continuous-time limit, such dynamics are called Markov jump processes of bipartite systems [54, 55]. We note that "bipartite systems" here does not refer to bipartite graphs in the terminology of Bayesian networks. The joint probability distribution of all the variables is given by

p(x1, y1, … , xN, yN) = p(x1) p(y1|x1) ∏_{k=1}^{N−1} p(x_{k+1}|xk, yk) p(y_{k+1}|x_{k+1}, yk).   (3.135)

The transition probability of each step from (xk, yk) to (x_{k+1}, y_{k+1}) is given by

p(x_{k+1}, y_{k+1}|xk, yk) = p(y_{k+1}|x_{k+1}, yk) p(x_{k+1}|xk, yk),   (3.136)

and correspondingly, the joint probability of (xk, yk, x_{k+1}, y_{k+1}) is given by

p(x_{k+1}, y_{k+1}, xk, yk) = p(y_{k+1}|x_{k+1}, yk) p(x_{k+1}|xk, yk) p(yk|xk) p(xk).   (3.137)


Figure 3.11 Bayesian networks of the examples. Example 4: Markovian information exchanges between two systems. (a) Entire dynamics. (b) A single transition. Example 5: Complex dynamics of three interacting systems.

First, we apply our general argument in Section 3.4 to the entire dynamics (3.135) illustrated in Figure 3.11 (a). Let 𝒳 = {x1, … , xN} be the set of the states of X, 𝒴 = {y1, … , yN} be the set of the states of Y, and 𝒱 := 𝒳 ∪ 𝒴 = {x1, y1, … , xN, yN} be the set of all states. The causal structure described by Eq. (3.135) is given by pa(xk) = {x_{k−1}, y_{k−1}} for k ≥ 2, pa(yk) = {xk, y_{k−1}} for k ≥ 2, pa(y1) = {x1}, and pa(x1) = ∅. Since 𝒞_{k+1} = {yk}, the entropy change (3.96) in the heat bath from time k to k + 1 is given by

Δs^bath_k = ln [ p(x_{k+1}|xk, yk) / pB(xk|x_{k+1}, yk) ].   (3.138)

The entropy production (3.98) from time 1 to N is then given by

𝜎 = ln [ (p(x1)/p(xN)) ∏_{k=1}^{N−1} p(x_{k+1}|xk, yk) / pB(xk|x_{k+1}, yk) ].   (3.139)

From pa(x1) = ∅, 𝒞′ = an(xN) ∩ 𝒞 = {y1, … , y_{N−1}}, and pa_𝒳(yk) = {xk}, we have I_ini = 0, I_fin = I(xN : (y1, … , y_{N−1})), I^tr_k = I(xk : yk|y1, … , y_{k−1}), and therefore

Θ = I(xN : (y1, … , y_{N−1})) − ∑_{k=1}^{N−1} I(xk : yk|y1, … , y_{k−1}).   (3.140)


The generalized second law (3.100) then reduces to

⟨𝜎⟩ ≥ ⟨Θ⟩ = ⟨I(xN : (y1, … , y_{N−1}))⟩ − ∑_{k=1}^{N−1} ⟨I(xk : yk|y1, … , y_{k−1})⟩.   (3.141)

Next, we apply our general argument in Section 3.4 only to a single transition described by Eq. (3.137), which is illustrated in Figure 3.11 (b). Let 𝒳 = {xk, x_{k+1}} be the set of the states of X, 𝒴 = {yk, y_{k+1}} be the set of the states of Y, and 𝒱 := 𝒳 ∪ 𝒴 = {xk, yk, x_{k+1}, y_{k+1}} be the set of all states. The causal structure described by Eq. (3.137) is given by pa(x_{k+1}) = {xk, yk}, pa(y_{k+1}) = {x_{k+1}, yk}, pa(yk) = {xk}, and pa(xk) = ∅. Since 𝒞_{k+1} = {yk}, the entropy change (3.96) in the heat bath from time k to k + 1 is equal to Eq. (3.138). The entropy production of the single transition, written as 𝜎k, is given by

𝜎k = ln [ (p(xk)/p(x_{k+1})) · p(x_{k+1}|xk, yk) / pB(xk|x_{k+1}, yk) ].   (3.142)

Here, the sum ∑_{k=1}^{N−1} 𝜎k is equal to the entire entropy production 𝜎 given in Eq. (3.139). From pa(xk) = ∅, 𝒞′ = an(x_{k+1}) ∩ 𝒞 = {yk}, and pa_𝒳(yk) = {xk}, we have I_ini = 0, I_fin = I(x_{k+1} : yk), and I^tr_k = I(xk : yk). Denoting Θ for the single transition by Θk, we obtain

Θk = I(x_{k+1} : yk) − I(xk : yk).   (3.143)

Therefore, the generalized second law (3.100) reduces to

⟨𝜎k⟩ ≥ ⟨Θk⟩ = ⟨I(x_{k+1} : yk)⟩ − ⟨I(xk : yk)⟩.   (3.144)

By summing up inequality (3.144) for k = 1, 2, … , N − 1, we obtain

⟨𝜎⟩ ≥ ⟨Θd⟩,   (3.145)

where

Θd := ∑_{k=1}^{N−1} Θk.   (3.146)

Inequality (3.145) gives another bound on the entire entropy production ⟨𝜎⟩. The informational quantity ⟨Θd⟩ is called the dynamic information flow, which has been studied for bipartite Markovian jump processes and coupled Langevin dynamics [53–58]. To summarize the foregoing argument, we have shown two inequalities, (3.141) and (3.145), for the same dynamics described in Fig. 3.11 (a). Inequality (3.145) is obtained by summing up inequality (3.144) for k = 1, 2, … , N − 1, where inequality (3.144) is obtained by applying our general inequality (3.100) only to the single transition illustrated in Figure 3.11 (b). A small numerical illustration of the single-transition quantity Θk is sketched below.
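The following sketch, added here for illustration, numerically evaluates Θk = I(x_{k+1} : yk) − I(xk : yk) from Eq. (3.143) for a toy two-state bipartite chain; the transition matrix and the joint distribution of (xk, yk) are invented for the example and are not taken from the chapter.

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in bits for a joint distribution given as a 2D array."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum())

# Toy bipartite dynamics (hypothetical numbers, two states per system):
# x_{k+1} depends on (x_k, y_k); p_x_next[x, y, x'] = p(x'|x, y).
p_x_next = np.array([[[0.9, 0.1], [0.4, 0.6]],
                     [[0.2, 0.8], [0.7, 0.3]]])
pxy = np.array([[0.3, 0.2], [0.1, 0.4]])          # joint p(x_k, y_k)

# Joint of (x_{k+1}, y_k): sum over x_k of p(x'|x, y) p(x, y).
pxy_next = np.einsum('xyz,xy->zy', p_x_next, pxy)

theta_k = mutual_information(pxy_next) - mutual_information(pxy)
print("Theta_k = I(x_{k+1}:y_k) - I(x_k:y_k) =", round(theta_k, 4))
```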


We now discuss the relationship between the two inequalities (3.141) and (3.145). We can calculate the difference between ⟨Θd⟩ and ⟨Θ⟩ as

⟨Θd⟩ − ⟨Θ⟩ = ∑_{k=1}^{N−1} [ ⟨I(x_{k+1} : yk)⟩ − ⟨I(xk : yk)⟩ + ⟨I(xk : yk|y1, … , y_{k−1})⟩ ] − ⟨I(xN : (y1, … , y_{N−1}))⟩
  = ⟨ ln ∏_{k=2}^{N−1} [ p(x_{k+1}, yk) p(xk, y1, … , yk) ] / [ p(xk, yk) p(x_{k+1}, y1, … , yk) ] ⟩
  = ∑_{k=2}^{N−1} [ ⟨I(xk : (y1, … , y_{k−1})|yk)⟩ − ⟨I(x_{k+1} : (y1, … , y_{k−1})|yk)⟩ ]
  ≥ 0,   (3.147)

where we used the data processing inequality [22]

⟨I(xk : (y1, … , y_{k−1})|yk)⟩ ≥ ⟨I(x_{k+1} : (y1, … , y_{k−1})|yk)⟩,   (3.148)

for the following conditional Markov chain:

p(xk, x_{k+1}, y1, … , y_{k−1}|yk) = p(x_{k+1}|xk, yk) p(xk|y1, … , y_{k−1}, yk) p(y1, … , y_{k−1}|yk).   (3.149)

Therefore, we obtain

⟨𝜎⟩ ≥ ⟨Θd⟩ ≥ ⟨Θ⟩,   (3.150)

which implies that the dynamic information flow ⟨Θd⟩ gives a tighter bound on the entire entropy production than ⟨Θ⟩ does. This hierarchy has also been shown in Ref. [56] for coupled Langevin dynamics. A small numerical check of the data processing inequality (3.148) is sketched below.
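The following minimal sketch, added here, checks inequality (3.148) numerically. The joint distribution of (y1, … , y_{k−1}), xk, and yk and the transition kernel are generated at random and stand in for the quantities in Eq. (3.149); none of the numbers come from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def cond_mutual_information(p):
    """I(A;B|C) in bits for a joint array p[a, b, c]."""
    pc = p.sum(axis=(0, 1))
    total = 0.0
    for c in range(p.shape[2]):
        if pc[c] == 0:
            continue
        pab = p[:, :, c] / pc[c]
        pa = pab.sum(axis=1, keepdims=True)
        pb = pab.sum(axis=0, keepdims=True)
        mask = pab > 0
        total += pc[c] * (pab[mask] * np.log2(pab[mask] / (pa @ pb)[mask])).sum()
    return float(total)

# Random joint p(w, x_k, y_k), with w playing the role of (y_1, ..., y_{k-1}),
# and a random kernel p(x_{k+1} | x_k, y_k): given y_k the chain is w -- x_k -- x_{k+1}.
p_wxy = rng.random((3, 3, 2)); p_wxy /= p_wxy.sum()
kernel = rng.random((3, 2, 3)); kernel /= kernel.sum(axis=2, keepdims=True)

# p(w, x_{k+1}, y_k) = sum over x_k of p(x_{k+1}|x_k, y_k) p(w, x_k, y_k).
p_wx1y = np.einsum('xyz,wxy->wzy', kernel, p_wxy)

lhs = cond_mutual_information(p_wxy)    # I(x_k ; w | y_k)
rhs = cond_mutual_information(p_wx1y)   # I(x_{k+1} ; w | y_k)
print(lhs >= rhs - 1e-12, round(lhs, 4), ">=", round(rhs, 4))
```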

3.6.5 Example 5: Complex Dynamics

We consider three systems that interact with each other as illustrated in Figure 3.11. In this case, 𝒱 := {y1, x1, z1, x2, z2, y2, x3, z3}, pa(y1) = ∅, pa(x1) = {y1}, pa(z1) = {y1}, pa(x2) = {x1, z1}, pa(z2) = {x1, z1}, pa(y2) = {y1, x2, z2}, pa(x3) = {x2, y2}, and pa(z3) = {x2, z2}. The joint probability of 𝒱 is given by

p(𝒱) = p(z3|x2, z2) p(x3|x2, y2) p(y2|y1, x2, z2) p(z2|x1, z1) p(x2|x1, z1) p(z1|y1) p(x1|y1) p(y1).   (3.151)

We focus on system X with 𝒳 := {x1, x2, x3}. The other systems are given by Y and Z, which constitute C with 𝒞 = {c1 = y1, c2 = z1, c3 = z2, c4 = y2, c5 = z3}. Since 𝒞_2 = {z1} and 𝒞_3 = {y2}, the total entropy production (3.98) is defined as

𝜎 := ln [ p(x3|x2, y2) p(x2|x1, z1) p(x1) ] / [ pB(x1|z1, x2) pB(x2|y2, x3) p(x3) ].   (3.152)


From 𝒞′ = {y1, z1, z2, y2}, pa(x1) = {y1}, pa_𝒳(y1) = ∅, pa_𝒳(z1) = ∅, pa_𝒳(z2) = {x1}, and pa_𝒳(y2) = {x2}, we have I_fin = I(x3 : {y1, z1, z2, y2}), I_ini = I(x1 : y1), I^tr_1 = 0, I^tr_2 = 0, I^tr_3 = I(x1 : z2|y1, z1), and I^tr_4 = I(x2 : y2|y1, z1, z2). The generalized second law (3.100) then reduces to

⟨𝜎⟩ ≥ ⟨I(x3 : {y1, z1, z2, y2})⟩ − ⟨I(x1 : y1)⟩ − ⟨I(x1 : z2|y1, z1)⟩ − ⟨I(x2 : y2|y1, z1, z2)⟩.   (3.153)

3.7 Summary and Prospects

In this chapter, we have explored a general framework of information thermodynamics based on Bayesian networks [52]. In our framework, Bayesian networks are used to graphically characterize stochastic dynamics of nonequilibrium thermodynamic systems. Each node of a Bayesian network describes a state of a physical system at a particular time, and each edge describes the causal relationship in the stochastic dynamics. A simple application of our framework is the setup of "Maxwell's demon," which performs measurements and feedback control, and can extract work by using information. Moreover, our framework is not restricted to such simple measurement-feedback situations, but is applicable to a broader class of nonequilibrium dynamics with information exchanges. Our main result is the generalized second law of thermodynamics (3.100). The entropy production ⟨𝜎⟩, which is the sum of the entropy changes in system X and the heat bath, is bounded by an informational quantity ⟨Θ⟩, which consists of the initial and final mutual information between system X and the other systems C, and the transfer entropy from X to C during the dynamics. A key ingredient here is the transfer entropy, which quantifies the directional information transfer from one stochastic system to the other. The physical meaning of the generalized second law is that the entropy reduction of system X is bounded by the available information about X obtained by C. We note that the generalized second law is derived as a consequence of the nonnegativity of the relative entropy, as shown in (3.105), and also as a consequence of the integral fluctuation theorem (3.107). We have also discussed, in Section 3.6.4, the relationship between the generalized second law with the transfer entropy (3.141) and that with the dynamic information flow (3.145); the latter second law is stronger. While we have focused on discrete-time dynamics in this chapter, we can also formulate continuous-time dynamics by Bayesian networks, where we assume that edges represent infinitesimal transitions [52, 63]. For the case of quantum systems, the effect of a single quantum measurement and feedback control has been studied, and generalizations of the second law and the fluctuation theorem have been derived in the quantum regime [71–79]. However, the generalization of the formulation with Bayesian networks to the quantum regime has been elusive, which is a fundamental open problem. Potential applications of information thermodynamics beyond the conventional setup of Maxwell's demon can be found in the field of biophysics. In fact,


there have been several works that analyze the adaptation process of living cells in terms of information thermodynamics [61–63]. For example, by applying the generalized second law to biological signal transduction of Escherichia coli (E. coli) chemotaxis, we found that the robustness of adaptation is quantitatively characterized by the transfer entropy inside a feedback loop of the signal transduction [63]. Moreover, it has been found that the E. coli chemotaxis is inefficient (dissipative) as a conventional thermodynamic engine, but is efficient as an information-thermodynamic engine. These results suggest that information thermodynamics is indeed useful to analyze autonomous information processing in biological systems. Another potential application of information thermodynamics would be machine learning, because neural networks perform stochastic information processing on complex networks. In fact, there has been an attempt to analyze neural networks in terms of information thermodynamics [80]. Moreover, information thermodynamics of neural information processing in brains would also be another fundamental open problem.

References

1. Callen, H.B. (1985) Thermodynamics and an Introduction to Thermostatistics, 2nd edn, John Wiley & Sons, Inc., New York.
2. Sekimoto, K. (2010) Stochastic Energetics, Springer-Verlag, New York.
3. Seifert, U. (2012) Stochastic thermodynamics, fluctuation theorems and molecular machines. Rep. Prog. Phys., 75, 126001.
4. Evans, D.J., Cohen, E.G.D., and Morris, G.P. (1993) Probability of second law violations in shearing steady states. Phys. Rev. Lett., 71, 2401.
5. Gallavotti, G. and Cohen, E.G.D. (1995) Dynamical ensembles in nonequilibrium statistical mechanics. Phys. Rev. Lett., 74, 2694.
6. Evans, D.J. and Searles, D.J. (2002) The fluctuation theorem. Adv. Phys., 51, 1529.
7. Jarzynski, C. (1997) Nonequilibrium equality for free energy differences. Phys. Rev. Lett., 78, 2690.
8. Jarzynski, C. (2000) Hamiltonian derivation of a detailed fluctuation theorem. J. Stat. Phys., 98, 77.
9. Seifert, U. (2005) Entropy production along a stochastic trajectory and an integral fluctuation theorem. Phys. Rev. Lett., 95, 040602.
10. Wang, G.M., Sevick, E.M., Mittag, E., Searles, D.J., and Evans, D.J. (2002) Experimental demonstration of violations of the second law of thermodynamics for small systems and short time scales. Phys. Rev. Lett., 89, 050601.
11. Liphardt, J., Dumont, S., Smith, S.B., Tinoco, I., and Bustamante, C. (2002) Equilibrium information from nonequilibrium measurements in an experimental test of Jarzynski's equality. Science, 296, 1832.
12. Collin, D., Ritort, F., Jarzynski, C., Smith, S.B., Tinoco, I., and Bustamante, C. (2005) Verification of the Crooks fluctuation theorem and recovery of RNA folding free energies. Nature, 437, 231.
13. Hayashi, K., Ueno, H., Iino, R., and Noji, H. (2010) Fluctuation theorem applied to F1-ATPase. Phys. Rev. Lett., 104, 218103.
14. Parrondo, J.M.R., Horowitz, J.M., and Sagawa, T. (2015) Thermodynamics of information. Nat. Phys., 11, 131–139.
15. Maxwell, J.C. (1871) Theory of Heat, Appleton, London.
16. Leff, H.S. and Rex, A.F. (eds) (2003) Maxwell's Demon 2: Entropy, Classical and Quantum Information, Computing, Princeton University Press, Princeton, NJ.
17. Szilard, L. (1929) Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen. Z. Phys., 53, 840.
18. Brillouin, L. (1951) Maxwell's demon cannot operate: information and entropy. I. J. Appl. Phys., 22, 334.
19. Bennett, C.H. (1982) The thermodynamics of computation – a review. Int. J. Theor. Phys., 21, 905.
20. Landauer, R. (1961) Irreversibility and heat generation in the computing process. IBM J. Res. Dev., 5, 183.
21. Shannon, C.E. (1948) A mathematical theory of communication. Bell Syst. Tech. J., 27, 379–423.
22. Cover, T.M. and Thomas, J.A. (1991) Elements of Information Theory, John Wiley & Sons, Inc., New York.
23. Schreiber, T. (2000) Measuring information transfer. Phys. Rev. Lett., 85, 461.
24. Touchette, H. and Lloyd, S. (2000) Information-theoretic limits of control. Phys. Rev. Lett., 84, 1156.
25. Touchette, H. and Lloyd, S. (2004) Information-theoretic approach to the study of control systems. Physica A, 331, 140.
26. Cao, F.J. and Feito, M. (2009) Thermodynamics of feedback controlled systems. Phys. Rev. E, 79, 041118.
27. Sagawa, T. and Ueda, M. (2010) Generalized Jarzynski equality under nonequilibrium feedback control. Phys. Rev. Lett., 104, 090602.
28. Ponmurugan, M. (2010) Generalized detailed fluctuation theorem under nonequilibrium feedback control. Phys. Rev. E, 82, 031129.
29. Fujitani, Y. and Suzuki, H. (2010) Jarzynski equality modified in the linear feedback system. J. Phys. Soc. Jpn., 79, 104003.
30. Horowitz, J.M. and Vaikuntanathan, S. (2010) Nonequilibrium detailed fluctuation theorem for repeated discrete feedback. Phys. Rev. E, 82, 061120.
31. Esposito, M. and Van den Broeck, C. (2011) Second law and Landauer principle far from equilibrium. Europhys. Lett., 95, 40004.
32. Horowitz, J.M. and Parrondo, J.M. (2011) Thermodynamic reversibility in feedback processes. Europhys. Lett., 95, 10005.
33. Ito, S. and Sano, M. (2011) Effects of error on fluctuations under feedback control. Phys. Rev. E, 84, 021123.
34. Abreu, D. and Seifert, U. (2011) Extracting work from a single heat bath through feedback. Europhys. Lett., 94, 10001.
35. Sagawa, T. and Ueda, M. (2012) Nonequilibrium thermodynamics of feedback control. Phys. Rev. E, 85, 021104.
36. Sagawa, T. and Ueda, M. (2012) Fluctuation theorem with information exchange: role of correlations in stochastic thermodynamics. Phys. Rev. Lett., 109, 180602.
37. Kundu, A. (2012) Nonequilibrium fluctuation theorem for systems under discrete and continuous feedback control. Phys. Rev. E, 86, 021107.
38. Still, S., Sivak, D.A., Bell, A.J., and Crooks, G.E. (2012) Thermodynamics of prediction. Phys. Rev. Lett., 109, 120604.
39. Sagawa, T. and Ueda, M. (2013) Role of mutual information in entropy production under information exchanges. New J. Phys., 15, 125012.
40. Toyabe, S., Sagawa, T., Ueda, M., Muneyuki, E., and Sano, M. (2010) Experimental demonstration of information-to-energy conversion and validation of the generalized Jarzynski equality. Nat. Phys., 6, 988.
41. Bérut, A., Arakelyan, A., Petrosyan, A., Ciliberto, S., Dillenschneider, R., and Lutz, E. (2012) Experimental verification of Landauer's principle linking information and thermodynamics. Nature, 483, 187.
42. Bérut, A., Petrosyan, A., and Ciliberto, S. (2013) Detailed Jarzynski equality applied to a logically irreversible procedure. Europhys. Lett., 103, 60002.
43. Roldán, E., Martinez, I.A., Parrondo, J.M.R., and Petrov, D. (2014) Universal features in the energetics of symmetry breaking. Nat. Phys., 10, 457.
44. Koski, J.V., Maisi, V.F., Sagawa, T., and Pekola, J.P. (2014) Experimental observation of the role of mutual information in the nonequilibrium dynamics of a Maxwell demon. Phys. Rev. Lett., 113, 030601.
45. Kim, K.H. and Qian, H. (2007) Fluctuation theorems for a molecular refrigerator. Phys. Rev. E, 75, 022102.
46. Munakata, T. and Rosinberg, M.L. (2012) Entropy production and fluctuation theorems under feedback control: the molecular refrigerator model revisited. J. Stat. Mech.: Theory Exp., 2012, P05010.
47. Munakata, T. and Rosinberg, M.L. (2014) Entropy production and fluctuation theorems for Langevin processes under continuous non-Markovian feedback control. Phys. Rev. Lett., 112, 180601.
48. Mandal, D. and Jarzynski, C. (2012) Work and information processing in a solvable model of Maxwell's demon. Proc. Natl. Acad. Sci. U.S.A., 109, 11641.
49. Barato, A.C. and Seifert, U. (2014) Unifying three perspectives on information processing in stochastic thermodynamics. Phys. Rev. Lett., 112, 090601.
50. Strasberg, P., Schaller, G., Brandes, T., and Esposito, M. (2013) Thermodynamics of a physical model implementing a Maxwell demon. Phys. Rev. Lett., 110, 040601.
51. Horowitz, J.M., Sagawa, T., and Parrondo, J.M. (2013) Imitating chemical motors with optimal information motors. Phys. Rev. Lett., 111, 010602.
52. Ito, S. and Sagawa, T. (2013) Information thermodynamics on causal networks. Phys. Rev. Lett., 111, 180603.
53. Allahverdyan, A.E., Dominik, J., and Guenter, M. (2009) Thermodynamic efficiency of information and heat flow. J. Stat. Mech.: Theory Exp., 2009, P09011.
54. Hartich, D., Barato, A.C., and Seifert, U. (2014) Stochastic thermodynamics of bipartite systems: transfer entropy inequalities and a Maxwell's demon interpretation. J. Stat. Mech.: Theory Exp., 2014, P02016.
55. Horowitz, J.M. and Esposito, M. (2014) Thermodynamics with continuous information flow. Phys. Rev. X, 4, 031015.
56. Horowitz, J.M. and Sandberg, H. (2014) Second-law-like inequalities with information and their interpretations. New J. Phys., 16, 125007.
57. Shiraishi, N. and Sagawa, T. (2015) Fluctuation theorem for partially masked nonequilibrium dynamics. Phys. Rev. E, 91, 012130.
58. Shiraishi, N., Ito, S., Kawaguchi, K., and Sagawa, T. (2015) Role of measurement-feedback separation in autonomous Maxwell's demons. New J. Phys., 17, 045012.
59. Barato, A.C., Hartich, D., and Seifert, U. (2013) Information-theoretic versus thermodynamic entropy production in autonomous sensory networks. Phys. Rev. E, 87, 042104.
60. Bo, S., Del Giudice, M., and Celani, A. (2015) Thermodynamic limits to information harvesting by sensory systems. J. Stat. Mech.: Theory Exp., 2015, P01014.
61. Barato, A.C., Hartich, D., and Seifert, U. (2014) Efficiency of cellular information processing. New J. Phys., 16, 103024.
62. Sartori, P., Granger, L., Lee, C.F., and Horowitz, J.M. (2014) Thermodynamic costs of information processing in sensory adaptation. PLoS Comput. Biol., 10, e1003974.
63. Ito, S. and Sagawa, T. (2015) Maxwell's demon in biochemical signal transduction with feedback loop. Nat. Commun., 6, 7498.
64. Minsky, M. (1963) Steps toward artificial intelligence, in Computers and Thought (eds E.A. Feigenbaum and J. Feldman), McGraw-Hill, New York, pp. 406–450.
65. Pearl, J. (1986) Fusion, propagation, and structuring in belief networks. Artif. Intell., 29, 241.
66. Bishop, C.M. (2006) Pattern Recognition and Machine Learning, Springer-Verlag, New York.
67. Pearl, J. (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann.
68. Pearl, J. (2000) Causality: Models, Reasoning and Inference, MIT Press, Cambridge.
69. Jensen, F.V. and Nielsen, T.D. (2009) Bayesian Networks and Decision Graphs, Springer-Verlag.
70. Ay, N. and Polani, D. (2008) Information flows in causal networks. Adv. Complex Syst., 11, 17–41.
71. Sagawa, T. and Ueda, M. (2008) Second law of thermodynamics with discrete quantum feedback control. Phys. Rev. Lett., 100, 080403.
72. Jacobs, K. (2009) Second law of thermodynamics and quantum feedback control: Maxwell's demon with weak measurements. Phys. Rev. A, 80, 012322.
73. Sagawa, T. and Ueda, M. (2009) Minimal energy cost for thermodynamic information processing: measurement and information erasure. Phys. Rev. Lett., 102, 250602; ibid., (2011) 106, 189901(E).
74. Morikuni, Y. and Tasaki, H. (2011) Quantum Jarzynski–Sagawa–Ueda relations. J. Stat. Phys., 143, 1.
75. Albash, T., Lidar, D.A., Marvian, M., and Zanardi, P. (2013) Fluctuation theorems for quantum processes. Phys. Rev. E, 88, 032146.
76. Funo, K., Watanabe, Y., and Ueda, M. (2013) Integral quantum fluctuation theorems under measurement and feedback control. Phys. Rev. E, 88, 052121.
77. Tajima, H. (2013) Second law of information thermodynamics with entanglement transfer. Phys. Rev. E, 88, 042143.
78. Sagawa, T. (2012) Second law-like inequalities with quantum relative entropy: an introduction, arXiv:1202.0983; chapter of Lectures on Quantum Computing, Thermodynamics and Statistical Physics (Kinki University Series on Quantum Computing, World Scientific, 2012).
79. Goold, J., Huber, M., Riera, A., del Rio, L., and Skrzypczyk, P. (2016) The role of quantum information in thermodynamics – a topical review. J. Phys. A, 49, 143001.
80. Hayakawa, T. and Aoyagi, T. (2015) Learning in neural networks based on a generalized fluctuation theorem. Phys. Rev. E, 92, 052710.


4 Entropy, Counting, and Fractional Chromatic Number
Seyed Saeed Changiz Rezaei

In several applications in mathematical chemistry, the entropy of a graph is an indication of its structural information content and is considered a complexity measure. Complexity measures defined on graphs are obtained by applying Shannon's entropy formula to partitions induced by structural properties of a graph. For instance, such a measure can be used to quantify molecular complexity, which makes it a powerful tool for analyzing molecular structures. In contrast, there are few works on the application of entropy measures to the analysis of social networks. For a comprehensive survey on different notions of entropy of graphs, see [1]. Here, by the entropy of a graph, we mean an information-theoretic functional, which is defined on a graph with a probability density on its vertex set. This functional was originally proposed by Körner in 1973 to study the minimum number of codewords required for representing an information source (see [2]). However, it is worth mentioning that another notion of entropy of graphs was proposed before Körner by Mowshowitz in 1968 (see [1]). Körner investigated the basic properties of graph entropy in several papers from 1973 to 1992 (see [2–8]). Let F and G be two graphs on the same vertex set V. Then, the union of the graphs F and G is the graph F ∪ G with vertex set V whose edge set is the union of the edge sets of F and G. That is, V(F ∪ G) = V and E(F ∪ G) = E(F) ∪ E(G). The most important property of the entropy of a graph is that it is subadditive with respect to the union of graphs. This leads to the application of graph entropy to the graph-covering problem as well as to the problem of perfect hashing. The graph-covering problem can be described as follows. Given a graph G and a family of graphs 𝒢, where each graph Gi ∈ 𝒢 has the same vertex set as G, we want to cover the edge set of G with the minimum number of graphs from 𝒢. Using the subadditivity of graph entropy, one can obtain lower bounds on this number.


Graph entropy was used in a paper by Fredman and Komlós to bound the minimum number of perfect hash functions of a given range that hash all k-element subsets of a set of a given size (see [9]). As another application of graph entropy, Kahn and Kim [10] proposed a sorting algorithm based on the entropy of an appropriate comparability graph. In 1990, Csiszár et al. characterized minimal pairs of convex corners that generate the probability density P = (p1, … , pk) in a k-dimensional space. Their study led to another definition of graph entropy in terms of the vertex packing polytope of the graph. They also gave another characterization of perfect graphs using the subadditivity property of graph entropy. The subadditivity property of graph entropy was further studied in Körner [3], Körner and Longo [5], Körner et al. [6], and Körner and Marton [7]. These studies led to the notion of a class of graphs called normal graphs [11]. In this chapter, we first recall the definition of the entropy of a random variable. Furthermore, we give some applications of entropy methods in counting problems. Specifically, we elaborate on the proof of Brégman's theorem for counting the number of perfect matchings of a bipartite graph using entropy. Then, we revisit the different definitions of entropy of graphs and reprove that they are all the same. Simonyi [12] showed that the maximum of the entropy of a given graph over the probability densities on its vertex set is equal to the logarithm of its fractional chromatic number. We call a graph symmetric with respect to graph entropy if the uniform density maximizes its entropy. We give an extended proof of his theorem. Then, we introduce the notion of symmetric graphs with respect to graph entropy and state our results about characterizations of symmetric graphs with respect to graph entropy.

4.1 Entropy of a Random Variable

Let X be a random variable with probability density p(x). We denote the expectation by E. The expected value of the random variable X is written as

E(X) = ∑_{x∈𝒳} x p(x),

and for a function g(⋅), the expected value of the random variable g(X) is written as

E_p(g(X)) = ∑_{x∈𝒳} g(x) p(x),

or more simply as E(g(X)) when the probability density function is understood from the context. Let X be a random variable drawn according to the probability density function p(x). The entropy of X, H(X), is defined as the expected value of the random variable log(1/p(X)); therefore, we have

H(X) = − ∑_{x∈𝒳} p(x) log p(x).


The log is to the base 2 and entropy is expressed in bits. Furthermore, by convention, 0 log 0 = 0. Since 0 ≤ p(x) ≤ 1, we have log(1/p(x)) ≥ 0, which implies that H(X) ≥ 0. Let us recall our coin toss example from the previous section with n = 1, where the coin is not necessarily fair. That is, denoting the event head by H and the event tail by T, let P(H) = p and P(T) = 1 − p. Then, the corresponding random variable X is defined by X(H) = 1 and X(T) = 0; that is, X = 1 with Pr{X = 1} = p and X = 0 with Pr{X = 0} = 1 − p. Then, H(X) = −p log p − (1 − p) log(1 − p). Note that the maximum of H(X) is equal to 1, which is attained when p = 1/2. Thus, the entropy of a fair coin toss, that is, P(H) = P(T) = 1/2, is 1 bit. More generally, for any random variable X,

H(X) ≤ log |𝒳|,   (4.1)

with equality only if X is uniformly distributed. The joint entropy H(X, Y) of a pair of discrete random variables (X, Y) with a joint probability density function p(x, y) is defined as

H(X, Y) = − ∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log p(x, y).

Note that we can also express H(X, Y) as H(X, Y) = −E(log p(X, Y)). We can also define the conditional entropy of a random variable given another random variable. Let

H(Y|X = x) = − ∑_{y∈𝒴} p(y|x) log p(y|x).

Then, the conditional entropy H(Y|X) is defined as

H(Y|X) = ∑_{x∈𝒳} p(x) H(Y|X = x).   (4.2)

We can again obtain another description of the conditional entropy, in terms of the conditional expectation of a random variable, as follows:

H(Y|X) = − ∑_{x∈𝒳} p(x) ∑_{y∈𝒴} p(y|x) log p(y|x)
  = − ∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log p(y|x)
  = −E log p(Y|X).

The following theorem is proved by Cover and Thomas [13, pp. 17 and 18].


Theorem 4.1. Let X, Y , and Z be random variables with joint probability distribution p(x, y, z). Then, we have H(X, Y ) = H(X) + H(Y |X), H(X, Y |Z) = H(X|Z) + H(Y |X, Z). Furthermore, letting f (⋅) be any function (see [13, pp. 34 and 35]), we have 0 ≤ H(X|Y ) ≤ H(X|f (Y )) ≤ H(X).

(4.3)
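As a quick numerical illustration of Theorem 4.1 and inequality (4.3), the following sketch (added here, not part of the original text) computes the entropies of a small, arbitrarily chosen joint distribution and verifies the chain rule H(X, Y) = H(X) + H(Y|X) and the fact that conditioning cannot increase entropy.

```python
import numpy as np

def H(p):
    """Shannon entropy in bits; zero probabilities are ignored."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# A small joint distribution p(x, y); the numbers are arbitrary.
pxy = np.array([[0.25, 0.10, 0.05],
                [0.05, 0.30, 0.25]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# Conditional entropies computed directly from (4.2).
H_Y_given_X = sum(px[i] * H(pxy[i, :] / px[i]) for i in range(pxy.shape[0]))
H_X_given_Y = sum(py[j] * H(pxy[:, j] / py[j]) for j in range(pxy.shape[1]))

print(np.isclose(H(pxy), H(px) + H_Y_given_X))  # chain rule H(X,Y) = H(X) + H(Y|X)
print(H_X_given_Y <= H(px))                     # conditioning cannot increase entropy
```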

4.2 Relative Entropy and Mutual Information

Let X be a random variable and consider two different probability density functions p(x) and q(x) for X. The relative entropy D(p||q) is a measure of the distance between the two distributions p(x) and q(x). The relative entropy, or Kullback–Leibler distance, between two probability densities p(x) and q(x) is defined as

D(p||q) = ∑_{x∈𝒳} p(x) log [p(x)/q(x)].   (4.4)

We can see that D(p||q) = E_p log [p(X)/q(X)]. Now consider two random variables X and Y with a joint probability density p(x, y) and marginal densities p(x) and p(y). The mutual information I(X; Y) is the relative entropy between the joint distribution and the product distribution p(x)p(y). More precisely, we have

I(X; Y) = ∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log [p(x, y)/(p(x)p(y))] = D(p(x, y)||p(x)p(y)).

It is proved in Cover and Thomas [13, pp. 28 and 29] that we have

I(X; Y) = H(X) − H(X|Y),   (4.5)
I(X; Y) = H(Y) − H(Y|X),
I(X; Y) = H(X) + H(Y) − H(X, Y),
I(X; Y) = I(Y; X),
I(X; X) = H(X).
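The following sketch (added here) computes the relative entropy and mutual information for the same arbitrary joint distribution used in the previous sketch, and checks the identity I(X; Y) = H(X) + H(Y) − H(X, Y).

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def kl(p, q):
    """D(p||q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float).ravel(), np.asarray(q, float).ravel()
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

pxy = np.array([[0.25, 0.10, 0.05],
                [0.05, 0.30, 0.25]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# I(X;Y) as a relative entropy between p(x,y) and p(x)p(y), Eq. (4.4).
I_xy = kl(pxy, np.outer(px, py))
print(round(I_xy, 4), "bits")
print(np.isclose(I_xy, H(px) + H(py) - H(pxy)))  # I(X;Y) = H(X) + H(Y) - H(X,Y)
```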

4.3 Entropy and Counting

In this section, we consider the application of the entropy method to counting problems. The following lemmas are two examples of using entropy to solve well-known combinatorial problems (see [14]).


Lemma 4.2. (Shearer's lemma). Suppose n distinct points in ℝ³ have n1 distinct projections on the XY-plane, n2 distinct projections on the XZ-plane, and n3 distinct projections on the YZ-plane. Then, n² ≤ n1 n2 n3.

Proof. Let P = (A, B, C) be one of the n points picked at random with uniform distribution, and let P1 = (A, B), P2 = (A, C), and P3 = (B, C) be its three projections. Then, we have

H(P) = H(A) + H(B|A) + H(C|A, B).   (4.6)

Furthermore, H(P1) = H(A) + H(B|A), H(P2) = H(A) + H(C|A), and H(P3) = H(B) + H(C|B). Adding both sides of these equations and considering (4.3) and (4.6), we have 2H(P) ≤ H(P1) + H(P2) + H(P3). Now, noting that H(P) = log n and H(Pi) ≤ log ni, the lemma is proved. ◽

As another application of the entropy method, we can give an upper bound on the number of perfect matchings of a bipartite graph (see [14]); a brute-force numerical check of Shearer's lemma is sketched below, followed by Brégman's theorem.
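The following brute-force sketch (added here; the point set is an arbitrary example) checks the inequality n² ≤ n1·n2·n3 from Shearer's lemma.

```python
# Brute-force check of Shearer's lemma for an arbitrary small point set.
points = {(0, 0, 0), (0, 1, 2), (1, 1, 0), (2, 0, 1), (2, 2, 2), (1, 0, 2)}

n = len(points)
n1 = len({(x, y) for x, y, z in points})  # projections on the XY-plane
n2 = len({(x, z) for x, y, z in points})  # projections on the XZ-plane
n3 = len({(y, z) for x, y, z in points})  # projections on the YZ-plane

print(n * n, "<=", n1 * n2 * n3, ":", n * n <= n1 * n2 * n3)
```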

Theorem 4.3. (Brégman's theorem). Let G be a bipartite graph with parts V1 and V2 such that |V1| = |V2| = n. Let d(v) denote the degree of a vertex v in G. Then, the number of perfect matchings in G is at most ∏_{v∈V1} (d(v)!)^{1/d(v)}.

Proof. Let ℳ be the set of perfect matchings of G. Let X be a random variable corresponding to the elements of ℳ with uniform density. Then, H(X) = log |ℳ|. The following remark is useful in our discussion. Let Y be any random variable with set of possible values 𝒴. First note that the conditional entropy H(Y|X) is obtained using (4.2). Let 𝒴_x denote the set of possible values for the random variable Y given X = x, that is, 𝒴_x = {y ∈ 𝒴 : P(Y = y|X = x) > 0}. We partition the set 𝒳 into sets 𝒳_1, 𝒳_2, … , 𝒳_r such that for i = 1, 2, … , r and all x ∈ 𝒳_i, we have

|𝒴_x| = i.   (4.7)

Letting Y_x be a random variable taking its values on the set 𝒴_x with uniform density, and noting equations (4.1) and (4.7), for all x ∈ 𝒳_i we have

H(Y_x) = log i.   (4.8)


However, note that

H(Y|X) = E_X(H(Y|X = x)) ≤ E_X(H(Y_x)).   (4.9)

Then, using (4.8) and (4.9), we get

H(Y|X) ≤ ∑_{i=1}^{r} P(X ∈ 𝒳_i) log i.   (4.10)

We define the random variable X(v) for all v ∈ V1 as X(v) := u such that u ∈ V2 and u is matched to v in X. For a fixed ordering v1, … , vn of the vertices of V1, log |ℳ| = H(X) = H(X(v1)) + H(X(v2)|X(v1)) + · · · + H(X(vn)|X(v1), … , X(v_{n−1})). Now, pick a random permutation 𝜏 : [n] → V1, and consider X in the order determined by 𝜏. Then, for every permutation 𝜏, we have

H(X) = H(X(𝜏(1))) + H(X(𝜏(2))|X(𝜏(1))) + · · · + H(X(𝜏(n))|X(𝜏(1)), … , X(𝜏(n − 1))).   (4.11)

By averaging over all 𝜏, we get

H(X) = E_𝜏(H(X(𝜏(1))) + H(X(𝜏(2))|X(𝜏(1))) + · · · + H(X(𝜏(n))|X(𝜏(1)), … , X(𝜏(n − 1)))).   (4.12)

For a fixed 𝜏, fix v ∈ V1 and let k = 𝜏^{−1}(v). Then, we let 𝒩_{v,𝜏} be the set of vertices u in V2 that are adjacent to vertex v ∈ V1 and satisfy u ∉ {x(𝜏(1)), x(𝜏(2)), … , x(𝜏(k − 1))}. Letting Γ(v) be the set of neighbors of v ∈ V1 in V2, we have 𝒩_{v,𝜏} = Γ(v) ⧵ {x(𝜏(1)), x(𝜏(2)), … , x(𝜏(k − 1))}. Letting d(v) be the degree of vertex v, the quantity Y_{v,𝜏} = |𝒩_{v,𝜏}| is a random variable taking its values in {1, … , d(v)}, and we write Y_{v,𝜏} = j for j ∈ {1, … , d(v)}.

Using (4.9) and noting that P_{X(v),𝜏}(Y_{v,𝜏} = j) = 1/d(v), we have

H(X) = ∑_{v∈V1} E_𝜏(H(X(v)|X(𝜏(1)), X(𝜏(2)), … , X(𝜏(k − 1))))
  ≤ ∑_{v∈V1} E_𝜏( ∑_{j=1}^{d(v)} P_{X(v)}(Y_{v,𝜏} = j) · log j )
  = ∑_{v∈V1} ∑_{j=1}^{d(v)} E_𝜏(P_{X(v)}(Y_{v,𝜏} = j)) · log j
  = ∑_{v∈V1} ∑_{j=1}^{d(v)} P_{X(v),𝜏}(Y_{v,𝜏} = j) · log j
  = ∑_{v∈V1} ∑_{j=1}^{d(v)} (1/d(v)) log j
  = ∑_{v∈V1} log (d(v)!)^{1/d(v)}.

Then, using (4.12), we get |ℳ| ≤ ∏_{v∈V1} (d(v)!)^{1/d(v)}. ◽



4.4 Graph Entropy

In this section, we revisit the entropy of a graph that was defined by Körner in 1973 [2]. We present several equivalent definitions of this parameter. However, we will focus mostly on the combinatorial definition. We elaborate on the proof of the theorem by Simonyi relating the entropy of a graph with its fractional chromatic number in more detail.

4.5 Entropy of a Convex Corner

A subset 𝒞 of ℝ^n_+ is called a convex corner if it is compact, convex, has nonempty interior, and for every 𝐚 ∈ 𝒞 and 𝐚′ ∈ ℝ^n_+ with 𝐚′ ≤ 𝐚, we have 𝐚′ ∈ 𝒞. For example, the vertex packing polytope VP(G) of a graph G, which is the convex hull of the characteristic vectors of its independent sets, is a convex corner. Now, let 𝒞 ⊆ ℝ^n_+ be a convex corner and P = (p1, … , pn) ∈ ℝ^n_+ a probability density, that is, its coordinates add up to 1. The entropy of P with respect to 𝒞 is

H_𝒞(P) = min_{𝐚∈𝒞} ∑_{i=1}^{n} pi log(1/ai).

Consider the convex corner 𝒮 := {x ≥ 0 : ∑_i xi ≤ 1}, which is called the unit corner. The following lemma relates the entropy of a random variable, defined in the previous section, to the entropy with respect to the unit corner. Lemma 4.4. The entropy H_𝒮(P) of a probability density P with respect to the unit corner 𝒮 is just the regular (Shannon) entropy H(P) = −∑_i pi log pi.

107

108

4 Entropy, Counting, and Fractional Chromatic Number

Proof. Note that H (𝐩) = min − 𝐬∈



pi log si =

i

min ∑

𝐬∈{x≥0,

i xi =1}





pi log si .

i

Thus, the above minimum is attained by a probability density vector 𝐬. More precisely, we have H (𝐩) = D(𝐩||𝐬) + H(𝐩). Noting that D(𝐩||𝐬) ≥ 0 and D(𝐩||𝐬) = 0 only if 𝐬 = 𝐩, we get H (𝐩) = H(𝐩).



4.6 Entropy of a Graph

Let G be a graph on vertex set V(G) = {1, … , n}, let P = (p1, … , pn) be a probability density on V(G), and let VP(G) denote the vertex packing polytope of G. The entropy of G with respect to P is then defined as

H_k(G, P) = min_{𝐚∈VP(G)} ∑_{i=1}^{n} pi log(1/ai).
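The definition can be evaluated numerically. The following sketch (added here; it assumes SciPy is available) parameterizes a point of VP(C5) as a convex combination of independent-set indicator vectors and minimizes ∑ pi log(1/ai) for the uniform distribution on the 5-cycle; by Lemma 4.14 below and the symmetry of vertex-transitive graphs discussed in Section 4.10, the value should come out as log2 χf(C5) = log2 2.5 ≈ 1.322 bits.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

# Independent sets of the 5-cycle C5 (vertices 0..4, edges i ~ i+1 mod 5).
edges = {(i, (i + 1) % 5) for i in range(5)}
def independent(S):
    return all((a, b) not in edges and (b, a) not in edges for a, b in combinations(S, 2))
ind_sets = [S for r in range(1, 3) for S in combinations(range(5), r) if independent(S)]

chi = np.array([[1.0 if v in S else 0.0 for v in range(5)] for S in ind_sets])
p = np.full(5, 0.2)  # uniform distribution on V(C5)

def objective(lam):
    a = lam @ chi  # a point of VP(C5) as a convex combination of indicator vectors
    return float(-(p * np.log2(np.maximum(a, 1e-12))).sum())

lam0 = np.full(len(ind_sets), 1.0 / len(ind_sets))
res = minimize(objective, lam0, bounds=[(0, 1)] * len(ind_sets),
               constraints=[{"type": "eq", "fun": lambda lam: lam.sum() - 1}])
print(round(res.fun, 3), "bits  (expected: log2(2.5) ≈ 1.322)")
```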

Let G = (V, E) be a graph with vertex set V and edge set E. Let Vⁿ be the set of sequences of length n from V. Then, the graph G^(n) is the nth conormal power graph with vertex set Vⁿ, and two distinct vertices x and y of G^(n) are adjacent in G^(n) if there is some i ∈ [n] such that xi and yi are adjacent in G, that is, E^(n) = {(x, y) ∈ Vⁿ × Vⁿ : ∃i : (xi, yi) ∈ E}. For a graph F and Z ⊆ V(F), we denote by F[Z] the induced subgraph of F on Z. The chromatic number of F is denoted by 𝜒(F). Let T^(n)_𝜖 = {U ⊆ Vⁿ : Pⁿ(U) ≥ 1 − 𝜖}. We define the functional H(G, P) with respect to the probability distribution P on the vertex set V(G) as follows:

H(G, P) = lim_{n→∞} min_{U∈T^(n)_𝜖} (1/n) log 𝜒(G^(n)[U]).   (4.13)

Let X and Y be two discrete random variables taking their values on some (possibly different) finite sets, and consider the vector-valued random variable formed by the pair (X, Y) (see [13, p. 16]). Now, let X denote a random variable taking its values on the vertex set of G and Y a random variable taking its values on the independent sets of G. Having a fixed distribution P over the vertices, the set of feasible joint distributions 𝒬 consists of the joint distributions Q of X and Y such that ∑_y Q(X, Y = y) = P(X).


As an example, let the graph G be the five-cycle C5 with vertex set V(C5) = {x1, x2, x3, x4, x5}, and let ℐ denote the set of independent sets of G. Let P be the uniform distribution over the vertices of G, that is, P(X = xi) = 1/5 for all i ∈ {1, … , 5}. Noting that each vertex of C5 lies in two maximal independent sets, the joint distribution Q defined by

Q(X = x, Y = y) = 1/10 if y is maximal and y ∋ x, and Q(X = x, Y = y) = 0 otherwise,   (4.14)

is a feasible joint distribution. Now, given a graph G, we define the functional H′(G, P) with respect to the probability distribution P on the vertex set V(G) as

H′(G, P) = min_{𝒬} I(X; Y).   (4.15)
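As a quick check (added here), the mutual information of the particular feasible distribution Q in (4.14) can be computed directly; it evaluates to log2 5 − 1 = log2 2.5 ≈ 1.322 bits, matching the value of H_k(C5, P) computed in the sketch above, so this Q already attains the minimum in (4.15).

```python
import numpy as np

# Maximal independent sets of C5 (vertices 0..4): {i, i+2 mod 5}.
max_ind = [frozenset({i, (i + 2) % 5}) for i in range(5)]

# Joint distribution Q from (4.14): mass 1/10 on each (vertex, maximal set containing it).
Q = np.array([[0.1 if x in S else 0.0 for S in max_ind] for x in range(5)])

Qx, Qy = Q.sum(axis=1), Q.sum(axis=0)
mask = Q > 0
I = (Q[mask] * np.log2(Q[mask] / np.outer(Qx, Qy)[mask])).sum()
print(round(float(I), 4), "bits; log2(2.5) =", round(np.log2(2.5), 4))
```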

The following lemmas relate the functionals defined above. I. Csiszar et al. in [17] proved the following lemma. Lemma 4.5. (I. Csiszár et al.). For every graph G, we have Hk (G, P) = H ′ (G, P). Proof. First, we show that Hk (G, P) = H ′ (G, P). Let X be a random variable taking its values on the vertices of G with probability density P = (p1 , … , pn ). Furthermore, let Y be the random variable associated with the independent sets of G and  (G) be the family of independent sets of G. Let q be the conditional distribution of Y , which achieves the minimum in (4.15) and r be the corresponding distribution of Y . Then, we have ∑ ∑ r(F) H ′ (G, P) = I(X; Y ) = − . pi q(F|i) log q(F|i) i i∈F∈ (G) From the concavity of the log function, we have ∑ ∑ r(F) ≤ log q(F|i) log r(F). q(F|i) i∈F∈ (G) i∈F∈ (G) Now, we define the vector 𝐚 by setting ∑ r(F). ai = i∈F∈ (G)

Note that 𝐚 ∈ 𝑉 𝑃 (G). Hence, ∑ H ′ (G, P) ≥ − pi log ai , i

and consequently, H ′ (G, P) ≥ Hk (G, P).

109

110

4 Entropy, Counting, and Fractional Chromatic Number

Now, we prove the reverse inequality. Let 𝐚 ∈ 𝑉 𝑃 (G). Then, letting s be a probability density on  (G), we have ∑ s(F). ai = i∈F∈ (G)

We define transition probabilities as { s(F) i ∈ F, ai q(F|i) = 0 i ∉ F. ∑ Then, setting r(F) = i pi q(F|i), we get ∑ q(F|i) . pi q(F|i) log H ′ (G, P) ≤ r(F) i,F

(4.16)

By the concavity of the log function, we get ∑ ∑ r(F) log r(F) ≤ − r(F) log s(F). − F

Thus, −



F

pi q(F|i) log r(F) ≤ −

i,F



pi q(F|i) log s(F).

i,F

And therefore, H ′ (G, P) ≤



pi q(F|i) log

i,F

∑ q(F|i) =− pi log ai . s(F) i



J. Körner in [2] proved the following lemma. Lemma 4.6. (J. Körner). For every graph G, we have H ′ (G, P) = H(G, P). Proof. See Appendix A.



Lemma 4.5 and Lemma 4.6 above indicate that the three functional defined above are all equal. 4.7 Basic Properties of Graph Entropy

The main properties of graph entropy are monotonicity, subadditivity, and additivity under vertex substitution. Monotonicity is formulated in the following lemma. Lemma 4.7. (J. Körner). Let F be a spanning subgraph of a graph G. Then, for any probability density P, we have (F, P) ≤ H(G, P). Proof. For graphs F and G, we have 𝑉 𝑃 (G) ⊆ 𝑉 𝑃 (F). This immediately implies the statement by the definition of graph entropy. ◽ Subadditivity was first recognized by Körner [4] and he proved the following lemma.

4.7

Basic Properties of Graph Entropy

Lemma 4.8. (J. Körner). Let F and G be two graphs on the same vertex set V and F ∪ G denote the graph on V with edge set E(F) ∪ E(G). For any fixed probability density P, we have H(F ∪ G, P) ≤ H(F, P) + H(G, P). Proof. Let 𝐚 ∈ 𝑉 𝑃 (F) and 𝐛 ∈ 𝑉 𝑃 (G) be the vectors achieving the minima in the definition of graph entropy for H(F, P) and H(G, P), respectively. Note that the vector 𝐚 ⚬ 𝐛 = (a1 b1 , a2 b2 , … , an bn ) is in 𝑉 𝑃 (F ∪ G), simply because the intersection of an independent set of F with an independent set of G is always an independent set in F ∪ G. Hence, we have H(F, P) + H(G, P) =

n ∑

pi log

1 ∑ 1 + p log ai i=1 i bi

pi log

1 ai bi

n

i=1

=

n ∑ i=1

≥ H(F ∪ G, P).



The notion of substitution is defined as follows. Let F and G be two vertex disjoint graphs and v be a vertex of G. By substituting F for v, we mean deleting v and joining every vertex of F to those vertices of G which have been adjacent with v. We will denote the resulting graph Gv←F . We extend this concept also to distributions. If we are given a probability distribution P on V (G) and a probability distribution Q on V (F), then by Pv←Q , we denote the distribution on V (Gv←F ) given by Pv←Q (x) = P(x) if x ∈ V (G) ⧵ v and Pv←Q (x) = P(x)Q(x) if x ∈ V (F). This operation is illustrated in Figure 4.1. Now, we state the following lemma whose proof can be found in Körner et al. [6]. Lemma 4.9. (J. Körner, G. Simonyi, and Zs. Tuza). Let F and G be two vertex disjoint graphs, v be a vertex of G, and P and Q be probability distributions on V (G) and V (F), respectively. Then, we have H(Gv←F , Pv←Q ) = H(G, P) + P(v)H(F, Q). Note that the entropy of an empty graph (a graph with no edges) is always zero (regardless of the distribution on its vertices). Noting this fact, we have the following corollary as a consequence of Lemma 4.9. Corollary 4.10. Let the connected components of the graph G be the subgraphs Gi ’s and P be a probability distribution on V (G). Set Pi (x) = P(x)(P(V (Gi )))−1 , Then, H(G, P) =

∑ i

x ∈ V (Gi ).

P(V (Gi ))H(Gi , Pi ).

111

112

4 Entropy, Counting, and Fractional Chromatic Number


u3

Figure 4.1 (a) A 5-cycle G. (b) A triangle F. (c) The graph G_{u1←F}.

Proof. Consider the empty graph on as many vertices as the number of connected components of G. Let a distribution be given on its vertices by P(V (Gi )) being the probability of the vertex corresponding to the ith component of G. Now, substituting each vertex by the component it belongs to and applying Lemma 4.9, the statement follows. ◽ 4.8 Entropy of Some Special Graphs

Now, we look at the entropy of some graphs, which are also mentioned in Simonyi [15, 12]. The first one is the complete graph. Lemma 4.11. For Kn , the complete graph on n vertices, one has H(Kn , P) = H(P). And the next one is the complete multipartite graph.

4.8

Entropy of Some Special Graphs

Lemma 4.12. Let G = Km1 ,m2 ,…,mk , that is, a complete k-partite graph with maximal independent sets of size m1 , m2 , … , mk . Given a distribution P on V (G) let Q be the distribution on S(G), the set of maximal independent sets of G, given by ∑ Q(J) = x∈J P(x) for each J ∈ S(G). Then H(G, P) = H(Kk , Q). A special case of the above Lemma is the entropy of a complete bipartite graph with uniform probability distribution over its vertex set which is equal to 1. Now, let G be a bipartite graph with color classes A and B. For a set D ⊆ A, let  (D) denote the set of neighbors of D in B, that is, a subset of the vertices in B that are adjacent to a vertex in A. Given a distribution P on V (G), we have ∑ pi , ∀D ⊆ V (G). P(D) = i∈D

Furthermore, defining the binary entropy as h(x) ∶= −x log x − (1 − x) log(1 − x),

0 ≤ x ≤ 1,

Körner and Marton proved the following theorem in Ref. [7]. Theorem 4.13. (J. Körner and K. Marton). Let G be a bipartite graph with no isolated vertices and P be a probability distribution on its vertex set. If P(D) P( (D)) ≤ , P(A) P(B) for all subsets D of A, then H(G, P) = h(P(A)). And if P(D) P( (D)) > , P(A) P(B) then, there exists a partition of A = D1 ∪ · · · ∪ Dk and a partition of B = U1 ∪ · · · ∪ Uk such that ) ( k ∑ P(Di ) . P(Di ∪ Ui )h H(G, P) = P(Di ∪ Ui ) i=1 Proof. Let us assume that the condition in the theorem statement holds. Then, using max-flow min-cut theorem (see [16, p. 150]), we show that there exists a probability density Q on the edges of G such that for all vertices v ∈ A, we have ∑ p(v) . (4.17) Q(e) = P(A) v∈e∈E(G) We define a digraph D′ by V (D′ ) = V (G) ∪ {s, t}, and joining vertices s and t to all vertices in parts A and B, respectively. The edges between A and B are the same in G. Furthermore, we orient edges from s to A,

113

114

4 Entropy, Counting, and Fractional Chromatic Number

from A to B, and from B to t. We define a capacity function c ∶ E(D′ ) → ℝ+ as ⎧ p(v) , ⎪ P(A) c(e) = ⎨ 1, ⎪ p(u) , ⎩ P(B)

e = (s, v), v ∈ A, e = (v, u), v ∈ A and u ∈ B, e = (u, t), u ∈ B.

(4.18)

By the definition of c, we note that the maximum 𝑠𝑡-flow is at most 1. Now, by showing that the minimum capacity of an 𝑠𝑡-cut is at least 1, we are done. Let 𝛿(U) be a 𝑠𝑡-cut for some subset U = {s} ∪ A′ ∪ B′ of V (D′ ) with A′ ⊆ A and ′ B ⊆ B. If  (A′ ) ⊄ B′ , then c(𝛿(U)) ≥ 1. Therefore, suppose that  (A′ ) ⊆ B′ . Then, using the assumption P(A′ ) P (A′ ) ≤ , P(A) P(A) we get P(B′ ) P(A ⧵ A′ ) + P(B) P(A) ′ P(A ⧵ A′ ) P(A ) ≥ + = 1. P(A) P(A)

c(𝛿(U)) ≥

(4.19)

(G)| as follows: Now, we define the vector 𝐛 ∈ ℝ|V +

(𝐛)v ∶=

p(v) . P(A)

Then, using (4.18), we have 𝐛 ∈ 𝑉 𝑃 (G). Thus, H(G, P) ≤



p(v) log

v∈V (G)

1 = H(P) − h(P(A)). bv

Then, using Lemmas 4.7 and 4.12, we have H(G, P) ≤ h(P(A)), Now, adding the last two inequalities, we get H(G, P) + H(G, P) ≤ H(P).

(4.20)

4.8

Entropy of Some Special Graphs

On the contrary, by Lemma 4.8, we also have H(P) ≤ H(G, P) + H(G, P).

(4.21)

Comparing (4.21) and (4.22), we get H(P) = H(G, P) + H(G, P), which implies that H(G, P) = h(P(A)). This proves the first part of the theorem. Now, suppose that the condition does not hold. Let D1 be a subset of A such that P(D1 ) P(B) ⋅ P(A) P( (D1 )) is maximal. Now, consider the subgraph (A ⧵ D1 ) ∪ (B ⧵  (D1 )) and for i = 2, … , k, let Di ⊆ A ⧵

i−1 ⋃

Dj ,

j=1

such that

⋃i−1 P(B ⧵ j=1  (Dj )) P(Di ) . ⋃i−1 P( (Di )) P(A ⧵ j=1 Dj )

is maximal. Let Ui =  (Di ) ⧵  (Di ∪ · · · ∪ Di−1 ),

for i = 1, … , k.

Consider the independent sets J0 , … , Jk of the following form: J0 = B, J1 = D1 ∪ B ⧵ U1 , … , Ji = D1 ∪ · · · ∪ Di ∪ B ⧵ U1 ⧵ · · · ⧵ Ui , … , Jk = A. Set P(U1 ) , P(U1 ∪ D1 ) P(Ui+1 ) P(Ui ) − , 𝛼(Ji ) = P(Ui+1 ∪ Di+1 ) P(Ui ∪ Di ) P(Uk ) . 𝛼(Jk ) = 1 − P(Uk ∪ Dk )

𝛼(J0 ) =

for i = 1, … , k − 1,

Note that by the choice of Di ’s, all 𝛼(Ji )’s are nonnegative and add up to one. This (G)| implies that the vector 𝐚 ∈ ℝ|V , defined as + ∑ 𝛼(Jr ), ∀j ∈ V (G), aj = j∈Jr

115

116

4 Entropy, Counting, and Fractional Chromatic Number

is in 𝑉 𝑃 (G). Furthermore, { P(Di ) , j ∈ Di , i ∪Ui ) aj = P(D P(Ui ) , j ∈ Ui . P(D ∪U ) i

i

By the choice of the Dj ’s and using the same max-flow min-cut argument, we obtain a probability density Qi on edges of G[Di ∪ Ui ] such that ∑ pj , ∀j ∈ Di , b′j = Qi (e) = P(D i) j∈e∈E(G[D ∪U ]) b′j

=



i

i

j∈e∈E(G[Di ∪Ui ])

Qi (e) =

pj P(Ui )

,

∀j ∈ Ui .

Now, we define the probability density Q on the edges of G as follows: ( ) { P(Di ∪ Ui )Qi (e), e ∈ E G[Di ∪ Ui ] , Q(e) = 0, e ∉ E(G[Di ∪ Ui ]). The corresponding vector 𝐛 ∈ 𝑉 𝑃 (G) is given by bj = P(Di ∪ Ui )b′j ,

for j ∈ Di ∪ Ui .

The vectors 𝐚 ∈ 𝑉 𝑃 (G) and 𝐛 ∈ 𝑉 𝑃 (G) are the minimizer vectors in the definition of H(G, P) and H(G, P), respectively. Suppose that it is not true. Then, noting that by the definition of 𝐚 and 𝐛, we have ∑ ∑ ∑ 1 1 1 pj log + pj log = pj log = H(P), a b p j j j j∈V (G) j∈V (G) j∈V (G) the subadditivity of graph entropy is violated. Now, it can be verified that H(G, P) is equal to what has been stated in the theorem. ◽ 4.9 Graph Entropy and Fractional Chromatic Number

In this section, we investigate the relationship between the entropy of a graph and its fractional chromatic number, which was already established by Simonyi [12]. First, we recall that the fractional chromatic number of a graph G, denoted by 𝜒f (G), is the minimum sum of nonnegative weights on the independent sets of G, such that for any vertex, the sum of the weights on the independent sets of G containing that vertex is at least one (see [17]). Csiszár et al. [18] showed that for every probability density P, the entropy of a graph G is attained by a point 𝐚 ∈ 𝑉 𝑃 (G) such that there is not any other point 𝐚′ ∈ 𝑉 𝑃 (G) majorizing the point 𝐚 coordinate-wise. Furthermore, for any such point 𝐚 ∈ 𝑉 𝑃 (G), there is some probability density P on 𝑉 𝑃 (G) such that the value of H(G, P) is attained by 𝐚. Using this fact, Simonyi [12] proved the following lemma.

4.9

Graph Entropy and Fractional Chromatic Number

Lemma 4.14. (G. Simonyi). For a graph G and probability density P on its vertices with fractional chromatic number 𝜒f (G), we have max H(G, P) = log 𝜒f (G). P

( Proof. Note that for every graph G, we have

1 , … , 𝜒 1(G) 𝜒f (G) f

)

∈ 𝑉 𝑃 (G). Thus, for

every probability density P, we have H(G, P) ≤ log 𝜒f (G). Now, we show that graph G has an induced subgraph G′ with 𝜒f (G′ ) = 𝜒f (G) = 𝜒f such that if 𝐲 ∈ 𝑉 𝑃 (G′ ) and 𝐲 ≥ 𝜒𝟏 , then 𝐲 = 𝜒𝟏 . f

f

Suppose the above statement does not hold for graph G. Consider all 𝐲 ∈ 𝑉 𝑃 (G) such that 𝟏 . 𝐲≥ 𝜒f (G)

Note that there is not any 𝐲 ∈ 𝑉 𝑃 (G) such that 𝐲>

𝟏 , 𝜒f (G)

because then we have a fractional coloring with value strictly less than 𝜒f (G). Thus, for every 𝐲 ≥ 𝜒 𝟏(G) , there is some v ∈ V (G) such that yv = 𝜒 1(G) . For such a fixed f

𝐲, let

f

{ Ω𝐲 =

}

1 v ∈ V (G) ∶ yv > 𝜒f (G)

.

Let 𝐲∗ be one of those 𝐲’s with |Ω𝐲 | of maximum size. Let G′ = G[V (G) ⧵ Ω𝐲∗ ]. From our definition of G′ and fractional chromatic number, we have either 𝜒f (G′ ) < 𝜒f (G) or ∃𝐲 ∈ 𝑉 𝑃 (G′ ), such that 𝐲 ≥ Suppose 𝜒f (G′ ) < 𝜒f (G). Therefore, 𝐳=

𝟏 ∈ 𝑉 𝑃 (G′ ) 𝜒f (G′ )

𝟏 𝟏 and 𝐲 ≠ . 𝜒f 𝜒f

117

118

4 Entropy, Counting, and Fractional Chromatic Number

and consequently 𝐳>

𝟏 . 𝜒f (G)

Without loss of generality, assume that V (G) ⧵ V (G′ ) = {1, … , |V (G) ⧵ V (G′ )|}. Set

( 1 𝜖 ∶= 2

) 1 min yv − v∈Ω𝐲∗ 𝜒f (G)

> 0,

𝐳∗ = (𝟎T|V (G)⧵V (G′ )| , 𝐳T )T ∈ 𝑉 𝑃 (G). Then (1 − 𝜖)𝐲∗ + 𝜖𝐳∗ ∈ 𝑉 𝑃 (G), which contradicts the maximality assumption of Ω𝐲∗ . Thus, we have 𝜒f (G′ ) = 𝜒f (G). Now we prove that if y ∈ 𝑉 𝑃 (G′ ) and 𝐲 ≥

𝟏 𝜒f

, then 𝐲 =

Suppose 𝐳′ be a point in 𝑉 𝑃 (G′ ) such that 𝐳′ ≥

1 𝜒f

𝟏 𝜒f

.

but 𝐳′ ≠

1 𝜒f

. Set

T

𝐲′ = (𝟎T|V (G)⧵V (G′ )| , 𝐳′ )T ∈ 𝑉 𝑃 (G). Then, using the 𝜖 > 0 defined above, we have (1 − 𝜖)𝐲∗ + 𝜖𝐲′ ∈ 𝑉 𝑃 (G), which contradicts the maximality assumption of Ω𝐲∗ . Now, by Csiszár et al. [18], there exists a probability density P′ on 𝑉 𝑃 (G′ ), such that H(G′ , P′ ) = log 𝜒f . Extending P′ to a probability distribution P as { ′ i ∈ V (G), pi , pi = 0, i ∈ V (G) ⧵ V (G′ ), the lemma is proved. Indeed, suppose that H(G, P) < H(G′ , P′ ) and let 𝐲 ∈ 𝑉 𝑃 (G) be a point in 𝑉 𝑃 (G) which gives H(G, P). Let 𝐲𝑉 𝑃 (G′ ) be the restriction of 𝐲 in 𝑉 𝑃 (G′ ). Then, there exists 𝐳 ∈ 𝑉 𝑃 (G′ ) such that 𝐳 ≥ 𝐲𝑉 𝑃 (G′ ) . This contradicts the fact that H(G′ , P′ ) = log 𝜒f .



Remark 4.15. Note that the maximizer probability distribution of the graph entropy is not unique. Consider C4 with vertex set V (C4 ) = {v1 , v2 , v3 , v4 } with parts (A = {v1 , v) 3 } and B = {v ) Theorem 4.13, probability distributions ( 2 , v4 }. Using P1 = is 1.

1 1 1 1 , , , 4 4 4 4

and P2 =

1 1 3 1 , , , 8 4 8 4

give the maximum graph entropy, which

4.10

Symmetric Graphs with respect to Graph Entropy

4.10 Symmetric Graphs with respect to Graph Entropy

A graph G with distribution P on its vertices is called symmetric with respect to graph entropy H(G, P) if the uniform probability distribution on its vertices maximizes H(G, P). It is worth noting that the notion of a symmetric graph with respect to a functional was already defined by Greco [19]. The characterization of symmetric graphs with respect to graph entropy was comprehensively studied in Refs [20, 21], whose important results are stated as follows. In Ref. [20], we show that every vertex-transitive is symmetric with respect to graph entropy. Furthermore, we characterize all perfect graphs, which are symmetric with respect to graph entropy, as follows. Theorem 4.16. (S. Saeed C. Rezaei, and Chris Godsil). Let G = (V , E) be a perfect graph and P be a probability distribution on V (G). Then, G is symmetric with respect to graph entropy H(G, P) only if G can be covered by its cliques of maximum size. The fractional vertex packing polytope of G = (V , E), that is, 𝐹 𝑉 𝑃 (G) is defined as

{ 𝐹 𝑉 𝑃 (G) ∶=

𝐱∈

| ℝ|V +





} xv ≤ 1 for all cliques K of G

.

v∈K

The following theorem was previously proved in Refs [22, 23]. Theorem 4.17. (V. Chvátal and D. R. Fulkerson). A graph G is perfect only if 𝑉 𝑃 (G) = 𝐹 𝑉 𝑃 (G). The aforementioned theorem and the weak perfect graph theorem, which was proved by Lovász [24], are the main tools in proving Theorem 4.16. Theorem 4.16 implies that every bipartite graph without isolated vertices is symmetric with respect to graph entropy only if it has a perfect matching. Following Schrijver [16], we call a graph G1 a k-graph if it is k-regular and its fractional edge coloring number 𝜒f′ (G1 ) is equal to k. The following theorem, which was proved in Ref. [21], introduces another class of symmetric line graphs with respect to graph entropy. Authors used Karush–Kuhn–Tucker (KKT) optimality conditions in convex optimization (see [25]) to prove the following theorem. Theorem 4.18. (S. Saeed C. Rezaei, and Chris Godsil). Let G1 be a k-graph with k ≥ 3. Then, the line graph G2 = L(G1 ) is symmetric with respect to graph entropy. Let  be a multiset of independent sets of a graph G. We say  is uniform over a subset of vertices W of the vertex set of G if each vertex v ∈ W is covered by a constant number of independent sets in . Having the results of symmetric graphs with respect to graph entropy motivates one to find a characterization for all symmetric graphs with respect to graph

119

120

4 Entropy, Counting, and Fractional Chromatic Number

entropy. The author and Chiniforooshan in Ref. [21] answered this question by proving the following theorems. First, we have the following result. Theorem 4.19. (S. Saeed C. Rezaei, Ehsan Chiniforooshan). For every graph G and every probability distribution P over V (G), we have H(G, P) = log 𝜒f (G[{v ∈ V (G) ∶ pv > 0}]) only if there exists a multiset of independent sets , such that (1)  is uniform over {v ∈ V (G) ∶ pv > 0}, and (2) every independent set I ∈  is a maximum weighted independent set with respect to P. We say a graph is symmetric with respect to graph entropy if the uniform probability distribution maximizes its entropy. A corollary of the aforementioned theorem is the following characterization for symmetric graphs. Theorem 4.20. (S. Saeed C. Rezaei, Ehsan Chiniforooshan). A graph G is symn metric only if 𝜒f (G) = 𝛼(G) . Finally, the author and Chiniforooshan in Ref. [21] consider the complexity of deciding whether a graph is symmetric with respect to its entropy by proving the following theorem. Theorem 4.21. (S. Saeed C. Rezaei, Ehsan Chiniforooshan). It is co-NP-hard to decide whether a given graph G is symmetric. 4.11 Conclusion

In this chapter, we revisited the notion of entropy of a random variable and its application in counting problems. In particular, we restated Shearer’s lemma and Brégman’s theorem. We also gave a detailed proof of Brégman’s theorem. Furthermore, we studied the notion of entropy of graphs proposed by Körner. This notion of graph entropy helps to determine the minimum number of codewords required for encoding messages emitted by an information source. It is note worthy that symbols of messages belong to a finite set represented by the vertex set of an appropriate graph. We explained the relationship between the entropy of a graph and its fractional chromatic number established already by Simonyi, and we gave an extended proof of Simonyi’s theorem. In this respect, we introduced the notion of a symmetric graph with respect to graph entropy and cited our results about characterizations of them.

Appendix 4.A

Appendix 4.A Proof of Lemma 4.6

Here, we explain the proof of Lemma 4.6 with more detail than the proof already stated in the literature. First, we state a few lemmas as follows. Lemma 4.A.1. The chromatic number of a graph G, that is, 𝜒(G) is equal to the minimum number of maximal independent sets covering G. Proof. Let 𝜅(G) be the minimum number of maximal independent sets covering the vertices of G. Then 𝜅(G) ≤ 𝜒(G), since the color classes of any proper coloring of V (G) can be extended to maximal independent sets. On the contrary, consider a covering system consisting of maximal independent sets  with a minimum number of maximal independent sets. Let  = {S1 , … , S𝜅(G) }. We define a coloring c of the vertices of graph G as c(v) = i,

∀v ∈ Si ⧵ Si−1 , and ∀i ∈ {1, … , 𝜅(G)}.

The proposed coloring is a proper coloring of the vertices of V (G), in which each color class corresponds to a maximal independent set in our covering system . That is, 𝜅(G) ≥ 𝜒(G).



Let 𝒳 be a finite set, P be a probability density on its elements, and K be a constant. A sequence 𝐱 ∈ 𝒳^n is called P-typical if for every y ∈ 𝒳 the number of occurrences N(y|𝐱) of the element y in 𝐱 satisfies

|N(y|𝐱) − np(y)| ≤ K √(np(y)).

Then, we have the following lemma.

Lemma 4.A.2. Let T^n(P) be the set of P-typical n-sequences. Then,

(1) For every ε > 0, there exists K > 0 such that P(𝒳^n ∖ T^n(P)) < ε for this K, that is, the non-typical sequences carry probability less than ε.

(2) For every typical sequence 𝐱, we have

2^{−(nH(P)+C√n)} ≤ P(𝐱) ≤ 2^{−(nH(P)−C√n)},

for some constant C > 0 depending on |𝒳| and ε and independent of n and P.


(3) The number of typical sequences N(n) is bounded as

2^{nH(P)−C√n} ≤ N(n) ≤ 2^{nH(P)+C√n},

for some constant C > 0 depending on |𝒳| and ε and independent of n and P.
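The following short Python computation (our own illustration; the constants K and C are chosen for this example only) makes parts (1) and (2) of Lemma 4.A.2 concrete for a binary alphabet: the typical set carries almost all of the probability, and every typical sequence has probability within a factor 2^{±C√n} of 2^{−nH(P)}.

# Numerical illustration of Lemma 4.A.2 for the alphabet {0,1} with
# P = (0.3, 0.7): enumerate the types of length-n sequences, mark the
# P-typical ones, and check the probability bounds.
from math import comb, log2, sqrt

p = {0: 0.3, 1: 0.7}
n, K, C = 200, 3.0, 3.0                     # C depends on K and the alphabet
H = -sum(q * log2(q) for q in p.values())   # Shannon entropy of P

def is_typical(k1):                         # k1 = number of 1s in the sequence
    counts = {0: n - k1, 1: k1}
    return all(abs(counts[y] - n * p[y]) <= K * sqrt(n * p[y]) for y in p)

prob_typical = 0.0
for k1 in range(n + 1):
    if is_typical(k1):
        prob_typical += comb(n, k1) * p[1] ** k1 * p[0] ** (n - k1)
        # part (2): every sequence of a typical type has probability between
        # 2^{-(nH + C*sqrt(n))} and 2^{-(nH - C*sqrt(n))}
        log_prob = k1 * log2(p[1]) + (n - k1) * log2(p[0])
        assert abs(-log_prob - n * H) <= C * sqrt(n)

print(f"P(T^n(P)) = {prob_typical:.4f}")    # close to 1, so the complement is small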

Having 𝒳 defined as above, let (G, P) be a probabilistic graph with vertex set V(G) = 𝒳. We define the relation e by

x e y  ⇔  either {x, y} ∈ E(G) or x = y.

If e determines an equivalence relation on the vertex set V(G), then the graph G is the union of pairwise disjoint cliques. Let H(P|e) denote the conditional entropy given the equivalence classes of e, that is,

H(P|e) = ∑_{x∈𝒳} p(x) log( (∑_{y: x e y} p(y)) / p(x) ).

Let 𝒜 denote the collection of equivalence classes under e, and let P_e be the probability density on the elements A of 𝒜 given by

p_e(A) = ∑_{x∈A} p(x).

Then, we have the following lemma (see [26, 2]).

Lemma 4.A.3. (V. Anantharam). The number of P-typical n-sequences in a P_e-typical n-sequence of equivalence classes is bounded below by 2^{nH(P|P_e)−C√n} and above by 2^{nH(P|P_e)+C√n} for some constant C > 0.

Proof. Let 𝐀 = (A_1, …, A_n) be a P_e-typical n-sequence, that is, for each A ∈ 𝒜,

|N(A|𝐀) − np_e(A)| ≤ K √(np_e(A)).    (A.1)

Then, for all A ∈ 𝒜, we have np_e(A) ≤ max(4K², 2N(A|𝐀)). Indeed, suppose np_e(A) ≥ 4K². Then √(np_e(A)) ≥ 2K and therefore

N(A|𝐀) ≥ np_e(A) − K √(np_e(A)) ≥ np_e(A)/2.

Let 𝐱 = (x_1, …, x_n) be a P-typical n-sequence in 𝐀, that is, x_i ∈ A_i for 1 ≤ i ≤ n. From the P-typicality of 𝐱, we have

|N(x|𝐱) − np(x)| ≤ K √(np(x)).    (A.2)


Now, we prove that for each A ∈ 𝒜, the restriction of 𝐱 to those coordinates i with A_i = A is (p(x)/p_e(A) : x ∈ A)-typical (with the constant 2 max(2K², 2K) in place of K). For x ∈ A, we have

|N(x|𝐱) − N(A|𝐀) p(x)/p_e(A)|
  ≤ |N(x|𝐱) − np(x)| + |np(x) − N(A|𝐀) p(x)/p_e(A)|
  ≤ K √(np(x)) + (p(x)/p_e(A)) K √(np_e(A))
  = K ( √(p(x)/p_e(A)) + p(x)/p_e(A) ) √(np_e(A)).

Using the bound np_e(A) ≤ max(4K², 2N(A|𝐀)), and noting that N(A|𝐀) ≥ 1 and p(x)/p_e(A) ≤ √(p(x)/p_e(A)), we get

|N(x|𝐱) − N(A|𝐀) p(x)/p_e(A)|
  ≤ K ( √(p(x)/p_e(A)) + p(x)/p_e(A) ) √(max(4K², 2N(A|𝐀)))
  ≤ K ( √(p(x)/p_e(A)) + p(x)/p_e(A) ) max( √(4K²/N(A|𝐀)), √2 ) √(N(A|𝐀))
  ≤ max(2K², 2K) ( √(p(x)/p_e(A)) + p(x)/p_e(A) ) √(N(A|𝐀))
  ≤ 2 max(2K², 2K) √( N(A|𝐀) p(x)/p_e(A) ).    (A.3)

Now, letting H(P|e = A) denote ∑_{x∈A} (p(x)/p_e(A)) log(p_e(A)/p(x)), we give lower and upper bounds on the number of P-typical n-sequences 𝐱 in 𝐀. Let C > 0 be a constant depending on K and |𝒳| as in Lemma 4.A.2. Then, using Lemma 4.A.2 and (A.3), the number of P-typical n-sequences 𝐱 in 𝐀 is at most

∏_{A∈𝒜} 2^{N(A|𝐀) H(P|e=A) + C√(N(A|𝐀))}
  = 2^{∑_{A∈𝒜} ( N(A|𝐀) H(P|e=A) + C√(N(A|𝐀)) )}
  ≤ 2^{∑_{A∈𝒜} ( np_e(A) + K√(np_e(A)) ) H(P|e=A) + ∑_{A∈𝒜} C√n}
  ≤ 2^{n ∑_{A∈𝒜} p_e(A) H(P|e=A) + K ∑_{A∈𝒜} √(np_e(A)) H(P|e=A) + C|𝒜|√n}
  ≤ 2^{nH(P|P_e) + √n ( C|𝒜| + K ∑_{A∈𝒜} log|A| )}.

Now, setting C_1 = C|𝒜| + K ∑_{A∈𝒜} log|A|,


the number of P-typical n-sequences 𝐱 in 𝐀 is upper bounded by 2^{nH(P|e)+C_1√n}. Similarly, the number of P-typical n-sequences 𝐱 in 𝐀 is lower bounded by 2^{nH(P|e)−C_1√n}. □

Let 0 < ε < 1, and let M(n, ε) denote

min_{U∈T_ε^{(n)}} χ(G^{(n)}[U]),

for sufficiently large n. Let λ > 0 be a positive number. First, we show that

M(n, ε) ≥ 2^{n(H′(G,P)−λ)}.

Consider G^{(n)}[U] for some U ∈ T_ε^{(n)}. Using Lemma 4.A.2, for any δ > 0, there is a K > 0 such that for any sufficiently large n we have P(T^n(P)) ≥ 1 − δ. First, note that

1 − δ − ε ≤ P(U ∩ T^n(P)).    (A.4)

Now, we estimate the chromatic number of G^{(n)}[U ∩ T^n(P)]. Let 𝒮^n denote the family of the maximal independent sets of G^{(n)}. Note that every color class in a minimum coloring of a graph can be enlarged to a maximal independent set. Thus,

P(U ∩ T^n(P)) ≤ χ(G^{(n)}[U ∩ T^n(P)]) · max_{𝐒∈𝒮^n} P(𝐒 ∩ T^n(P)).    (A.5)

Furthermore, we have

max_{𝐒∈𝒮^n} P(𝐒 ∩ T^n(P)) ≤ max_{𝐱∈T^n(P)} p(𝐱) · max_{𝐒∈𝒮^n} |𝐒 ∩ T^n(P)|.    (A.6)

It is worth mentioning that |𝐒 ∩ T^n(P)| is the number of typical sequences contained in 𝐒. Furthermore, note that 𝐒 can be considered as an n-sequence of maximal independent sets taken from 𝒮. Let N(y, R|𝐱, 𝐒) denote the number of occurrences of the pair (y, R) in the double n-sequence

( x_1 x_2 ⋯ x_n ; S_1 S_2 ⋯ S_n ).

In other words, N(y, R|𝐱, 𝐒) is the number of occurrences of the letter y selected from the maximal independent set R in the n-sequence 𝐱 taken from the sequence 𝐒 of maximal independent sets. Similarly, N(y|𝐱) denotes the number of occurrences of the source letter y in the n-sequence 𝐱. Setting

q(y, R) = ( N(y, R|𝐱, 𝐒) / N(y|𝐱) ) · p(y),    (A.7)


we have

|N(y, R|𝐱, 𝐒) − nq(y, R)| = ( nq(y, R) / np(y) ) · |N(y|𝐱) − np(y)|
  ≤ ( q(y, R)/p(y) ) · K √(np(y)) = K √n · q(y, R)/√(p(y)) ≤ K √(nq(y, R)),    (A.8)

since 𝐱 is a P-typical sequence and q(y, R) ≤ p(y). Let

a(R) = ∑_{y: y∈R} q(y, R).

Then,

N(R|𝐒) − na(R) = ∑_{y∈R} ( N(y, R|𝐱, 𝐒) − nq(y, R) ),

and, therefore, using (A.8) and the Cauchy–Schwarz inequality (with K_1 := K √|𝒳|),

|N(R|𝐒) − na(R)| ≤ ∑_{y∈𝒳} K √(nq(y, R)) ≤ K_1 √( n ∑_{y∈𝒳} q(y, R) ) = K_1 √(na(R)).    (A.9)

Now, we define an auxiliary graph Γ of G as follows. Letting S be a maximal independent set of G containing a vertex x of G, the vertex set of Γ consists of the pairs (x, S). Furthermore, two vertices (x, S) and (y, R) are adjacent if and only if S ≠ R. Let K_2 > 0 be some constant. Then, applying Lemma 4.A.3 with the equivalence relation a, defined by ((x, S), (y, R)) ∉ E(Γ), and a probability density Q on the vertices of Γ, the number of Q-typical n-sequences in each a-typical n-sequence of equivalence classes (each class being a maximal independent set of G) lies in the interval

[ 2^{nH(Q|a)−K_2√n}, 2^{nH(Q|a)+K_2√n} ].    (A.10)

Noting that every pair (y, R) may occur 0, 1, …, or n times in the n-sequence (𝐱, 𝐒), and that for a given y knowing N(y, R|𝐱, 𝐒) for all R uniquely determines N(y|𝐱), there are at most (n + 1)^{|V(Γ)|} different auxiliary densities of the type given by (A.7). Now, we bound max_{𝐒∈𝒮^n} |𝐒 ∩ T^n(P)| as follows. Note that 𝐒 ∩ T^n(P) is the set of P-typical n-sequences contained in a given maximal independent set 𝐒 of G^{(n)}. Then, letting 𝒬 be the set of feasible joint distributions for (X, S), for all 𝐒 ∈ 𝒮^n and all Q ∈ 𝒬, set

T^n(S, Q) := { 𝐱 : 𝐱 ∈ 𝒳^n, x_i ∈ S_i, (𝐱, 𝐒) is Q-typical }.


From (A.8), for all 𝐒 ∈ 𝒮^n and all 𝐱 ∈ 𝐒 ∩ T^n(P), there is some Q ∈ 𝒬 such that 𝐱 ∈ T^n(S, Q). Therefore, for all 𝐒 ∈ 𝒮^n, we get

|𝐒 ∩ T^n(P)| ≤ | ⋃_{Q∈𝒬} T^n(S, Q) | ≤ ∑_{Q∈𝒬} |T^n(S, Q)| ≤ |𝒬| · max_{Q∈𝒬} |T^n(S, Q)|.

Then, using (A.10), we obtain

max_{𝐒∈𝒮^n} |𝐒 ∩ T^n(P)| ≤ (n + 1)^{|V(Γ)|} · 2^{n·max_{Q′∈𝒬} H(Q′|a) + K_2√n}.    (A.11)

Further,

∑_{R: y∈R} q(y, R) = ∑_{R: y∈R} ( p(y) / N(y|𝐱) ) · N(y, R|𝐱, 𝐒) = p(y).

From Lemma 4.A.2 part (2), we get

max_{𝐱∈T^n(P)} p(𝐱) ≤ 2^{−(nH(P)−C√n)}.    (A.12)

Thus, using the inequalities (A.4)–(A.6), (A.11), and (A.12), we have

1 − δ − ε ≤ χ(G^{(n)}[U ∩ T^n(P)]) · exp_2( n·( max_{Q′∈𝒬} H(Q′|a) − H(P) ) + K_2√n + |V(Γ)|·log_2(n + 1) ),

and consequently,

χ(G^{(n)}[U ∩ T^n(P)]) ≥ (1 − δ − ε) · exp_2( n( H(P) − max_{Q′∈𝒬} H(Q′|a) ) − K_2√n − |V(Γ)|·log_2(n + 1) ).    (A.13)

Note that

H(P) − max_{Q′∈𝒬} H(Q′|a) = min_{Q′∈𝒬} ∑_{x,S} q′(x, S) log_2( q′(x, S) / ( p(x)·q′(S) ) ) = min_{Q′∈𝒬} I(Q′).

Now, considering χ(G^{(n)}[U]) ≥ χ(G^{(n)}[U ∩ T^n(P)]) and using (A.13), for every U ∈ T_ε^{(n)} we get

χ(G^{(n)}[U]) ≥ (1 − δ − ε) · exp_2( nH′(G, P) − K_2√n − |V(Γ)|·log_2(n + 1) ).

Thus,

(1/n) log_2( min_{U∈T_ε^{(n)}} χ(G^{(n)}[U]) ) ≥ (1/n) log_2(1 − δ − ε) + H′(G, P) − K_2/√n − (|V(Γ)|/n) log_2(n + 1).


Therefore, we get

lim inf_{n→∞} (1/n) log_2 M(n, ε) ≥ H′(G, P).    (A.14)

Now, we show that for every 0 < ε < 1 and δ > 0 and sufficiently large n, there exist subgraphs G^{(n)}[U] of G^{(n)}, for some U ⊆ V(G^{(n)}), such that

χ(G^{(n)}[U]) ≤ 2^{n(H′(G,P)+δ)}.

Let Q* be the joint density on vertices and maximal independent sets of G which minimizes the mutual information I(Q*), that is, I(Q*) = H′(G, P). Then, the probability of every maximal independent set S is

Q*(S) = ∑_{y: y∈S} Q*(y, S).

Letting 𝐒 = (S_1, S_2, …, S_n) ∈ 𝒮^n, we have

𝐐*(𝐒) = ∏_{i=1}^{n} Q*(S_i).

Let L be a fixed parameter. For a family of L maximal independent sets, not necessarily distinct and not necessarily covering, we define the corresponding probability density Q*_L as follows. We assume that the L maximal independent sets of a given system are chosen independently. Thus,

Q*_L(𝐒_1, 𝐒_2, …, 𝐒_L) = ∏_{j=1}^{L} 𝐐*(𝐒_j).

Now consider a fixed n. Let G^{(n)} be the nth conormal power of the graph G. Consider systems of maximal independent sets consisting of L maximal independent sets, each in the form of an n-sequence of maximal independent sets; we call such a system an L-system. For each L-system (𝐒_1, 𝐒_2, …, 𝐒_L), let U(𝐒_1, 𝐒_2, …, 𝐒_L) be the set of all vertices of V(G^{(n)}) which are not covered by the L-system (𝐒_1, 𝐒_2, …, 𝐒_L). For a given L, we show that the expected value of P(U(𝐒_1, 𝐒_2, …, 𝐒_L)) is less than ε. This implies that there exists at least one system 𝐒_1, …, 𝐒_L covering a subgraph of G^{(n)} with probability greater than or equal to 1 − ε. For an L-system chosen with probability Q*_L, let Q*_{L,𝐱} be the probability that a given n-sequence 𝐱 is not covered by the L-system, that is,

Q*_{L,𝐱} = Q*_L( {(𝐒_1, …, 𝐒_L) : 𝐱 ∈ U(𝐒_1, …, 𝐒_L)} ) = ∑_{(𝐒_1,…,𝐒_L): 𝐱∈U(𝐒_1,…,𝐒_L)} Q*_L(𝐒_1, …, 𝐒_L).


Then, we have

E( P(U(𝐒_1, …, 𝐒_L)) ) = ∑_{𝐒_1,…,𝐒_L} Q*_L(𝐒_1, …, 𝐒_L) · P(U(𝐒_1, …, 𝐒_L))
  = ∑_{(𝐒_1,…,𝐒_L)} Q*_L(𝐒_1, …, 𝐒_L) ( ∑_{𝐱∈U(𝐒_1,…,𝐒_L)} P(𝐱) )
  = ∑_{𝐱∈𝒳^n} P(𝐱) ( ∑_{(𝐒_1,…,𝐒_L): U(𝐒_1,…,𝐒_L)∋𝐱} Q*_L(𝐒_1, …, 𝐒_L) )
  = ∑_{𝐱∈𝒳^n} P(𝐱) · Q*_{L,𝐱}.    (A.15)

For a given ε with 0 < ε < 1, by Lemma 4.A.2, there exists a set of typical sequences with total probability greater than or equal to 1 − ε/2. Then, we can write the right-hand side of the above equation as

∑_{𝐱∈𝒳^n} P(𝐱) · Q*_{L,𝐱} = ∑_{𝐱∈T^n(P)} P(𝐱) · Q*_{L,𝐱} + ∑_{𝐱∉T^n(P)} P(𝐱) · Q*_{L,𝐱}.    (A.16)

The second term in (A.16) is upper bounded by P(𝒳^n ∖ T^n(P)), which is less than ε/2. We give an upper bound for the first term and show that for L = 2^{n(H′(G,P)+δ)} it tends to 0 as n → ∞. Now,

∑_{𝐱∈T^n(P)} P(𝐱) · Q*_{L,𝐱} ≤ P(T^n(P)) · max_{𝐱∈T^n(P)} Q*_{L,𝐱} ≤ max_{𝐱∈T^n(P)} Q*_{L,𝐱}.

If an n-sequence 𝐱 is not covered by an L-system, then 𝐱 is not covered by any element of this system. Letting 𝒮_𝐱 be the set of maximal independent sets of G^{(n)} covering the n-sequence 𝐱, we have

max_{𝐱∈T^n(P)} Q*_{L,𝐱} = max_{𝐱∈T^n(P)} ( 1 − 𝐐*(𝒮_𝐱) )^L.    (A.17)

We obtain a lower bound for 𝐐*(𝒮_𝐱) by counting the Q*-typical n-sequences of maximal independent sets covering 𝐱 ∈ T^n(P). This number is greater than or equal to the number of Q*-typical sequences (𝐲, 𝐁) with the first coordinate equal to 𝐱. The equality of the first coordinates of the ordered pairs in V(Γ) is an equivalence relation p on the set V(Γ). Thus, using Lemma 4.A.3, the number of Q*-typical n-sequences of maximal independent sets covering 𝐱 is bounded from below by

2^{nH(Q*|p)−K_3√n}.    (A.18)


Let K_4 be a constant independent of n and of the density a(Q*). Then, applying Lemma 4.A.2 to 𝒮 and the marginal distribution a(Q*) of Q* over the maximal independent sets, we obtain the following lower bound on the 𝐐*-probability of each a(Q*)-typical n-sequence of maximal independent sets:

𝐐*(𝐒) ≥ 2^{−(nH(a(Q*))+K_4√n)}.    (A.19)

Combining (A.17)–(A.19), we get

max_{𝐱∈T^n(P)} Q*_{L,𝐱} ≤ ( 1 − exp_2( −(nH(a(Q*)) + K_4√n) + nH(Q*|p) − K_3√n ) )^L.    (A.20)

Note that, using (4.5), we have H′(G, P) = I(Q*) = H(a(Q*)) − H(Q*|p). Therefore,

max_{𝐱∈T^n(P)} Q*_{L,𝐱} ≤ ( 1 − 2^{−(nH′(G,P)+K_5√n)} )^L,

for some constant K_5.

Then, using the inequality (1 − x)^L ≤ exp_2(−Lx), the above inequality becomes

max_{𝐱∈T^n(P)} Q*_{L,𝐱} ≤ exp_2( −L · 2^{−(nH′(G,P)+K_5√n)} ).    (A.21)

Setting L = 2^{n(H′(G,P)+δ)}, (A.21) becomes

max_{𝐱∈T^n(P)} Q*_{L,𝐱} ≤ exp_2( −2^{n(H′(G,P)+δ)} · 2^{−(nH′(G,P)+K_5√n)} ) ≤ exp_2( −2^{nδ−K_6√n} ).    (A.22)

Substituting (A.22) into (A.16), we get

∑_{𝐱∈𝒳^n} P(𝐱) · Q*_{L,𝐱} ≤ exp_2( −2^{nδ−K_6√n} ) + ε/2,

for L = 2^{n(H′(G,P)+δ)}. For sufficiently large n, the term exp_2(−2^{nδ−K_6√n}) tends to zero, and (A.15) implies

∑_{𝐒_1,…,𝐒_L} Q*_L(𝐒_1, …, 𝐒_L) · P(U(𝐒_1, …, 𝐒_L)) ≤ ε.

Thus, we conclude that for every 0 < ε < 1 and δ > 0, there exists a 2^{n(H′(G,P)+δ)}-system covering a subgraph G^{(n)}[U] of G^{(n)} with probability of U at least 1 − ε. Now, from Lemma 4.A.1, the chromatic number of a graph is equal to the minimum number of maximal independent sets covering the graph. Therefore, for every δ > 0, there exists a subgraph G^{(n)}[U] of G^{(n)} with U ∈ T_ε^{(n)} such that

χ(G^{(n)}[U]) ≤ 2^{n(H′(G,P)+δ)}.

Consequently,

min_{U⊆V(G^{(n)}), U∈T_ε^{(n)}} χ(G^{(n)}[U]) ≤ 2^{n(H′(G,P)+δ)},  for every δ > 0.


Then, using the definition of M(n, ε), we get

(1/n) log_2 M(n, ε) ≤ H′(G, P) + δ,  for every δ > 0,

and consequently,

lim sup_{n→∞} (1/n) log_2 M(n, ε) ≤ H′(G, P).    (A.23)

Comparing (A.14) and (A.23), we obtain

lim_{n→∞} (1/n) log_2 M(n, ε) = H′(G, P),  for every ε with 0 < ε < 1.  □



References

1. Dehmer, M. and Mowshowitz, A. (2011) A history of graph entropy measures. Inf. Sci., 181 (1), 57–78.
2. Körner, J. (1973) Coding of an information source having ambiguous alphabet and the entropy of graphs. Transactions of the 6th Prague Conference on Information Theory, Academia, Prague, pp. 411–425.
3. Körner, J. (1973) An extension of the class of perfect graphs. Studia Sci. Math. Hung., 8, 405–409.
4. Körner, J. (1986) Fredman–Komlós bounds and information theory. SIAM J. Algebraic Discrete Methods, 7, 560–570.
5. Körner, J. and Longo, G. (1973) Two-step encoding of finite memoryless sources. IEEE Trans. Inf. Theory, 19, 778–782.
6. Körner, J., Simonyi, G., and Tuza, Zs. (1992) Perfect couples of graphs. Combinatorica, 12, 179–192.
7. Körner, J. and Marton, K. (1988) Graphs that split entropies. SIAM J. Discrete Math., 1, 71–79.
8. Körner, J. and Marton, K. (1988) New bounds for perfect hashing via information theory. Eur. J. Comb., 9, 523–530.
9. Fredman, M. and Komlós, J. (1984) On the size of separating systems and perfect hash functions. SIAM J. Algebraic Discrete Methods, 5, 61–68.
10. Kahn, J. and Kim, J.H. (1992) Entropy and sorting. Proceedings of the 24th Annual ACM Symposium on the Theory of Computing, pp. 178–187.
11. De Simone, C. and Körner, J. (1999) On the odd cycles of normal graphs, Proceedings of the 3rd International Conference on Graphs and Optimization, GO-III (Leukerbad, 1998). Discrete Appl. Math., 94 (1–3), 161–169.
12. Simonyi, G. (2001) Perfect graphs and graph entropy: an updated survey, in Perfect Graphs (eds J.L. Ramirez-Alfonsin and B.A. Reed), John Wiley & Sons, Inc., New York, pp. 293–328.
13. Cover, T.M. and Thomas, J.A. (2006) Elements of Information Theory, 2nd edn, Wiley-Interscience.
14. Radhakrishnan, J. (2001) Entropy and counting, in Computational Mathematics, Modelling and Algorithms (ed. J.C. Misra), Narosa Publishers, New Delhi.
15. Simonyi, G. (1995) Graph entropy: a survey, in Combinatorial Optimization, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 20 (eds W. Cook, L. Lovász, and P. Seymour), AMS, pp. 399–441.
16. Schrijver, A. (2003) Combinatorial Optimization, Springer-Verlag, Berlin, Heidelberg.
17. Godsil, C. and Royle, G.F. (2013) Algebraic Graph Theory, Vol. 207, Springer Science & Business Media.
18. Csiszár, I., Körner, J., Lovász, L., Marton, K., and Simonyi, G. (1990) Entropy splitting for antiblocking corners and perfect graphs. Combinatorica, 10, 27–40.
19. Greco, G. (1998) Capacities of graphs and 2-matchings. Discrete Math., 186 (1–3), 135–143.
20. C. Rezaei, S. Saeed (2016) Entropy of symmetric graphs. Discrete Math., 339 (2), 475–483.
21. C. Rezaei, S. Saeed and Chiniforooshan, E. (2015) Symmetric graphs with respect to graph entropy, http://arxiv.org/abs/1510.01415 (accessed 30 March 2016).
22. Chvátal, V. (1975) On certain polytopes associated with graphs. J. Comb. Theory B, 18, 138–154.
23. Fulkerson, D.R. (1971) Blocking and anti-blocking pairs of polyhedra. Math. Program., 1, 168–194.
24. Lovász, L. (1972) Normal hypergraphs and the perfect graph conjecture. Discrete Math., 2 (3), 253–267.
25. Boyd, S. and Vandenberghe, L. (2004) Convex Optimization, Cambridge University Press.
26. Anantharam, V. (1994) Error exponents in a source coding problem of Körner. J. Combin. Inform. System Sci., 19 (1–4), 141–151.


5 Graph Entropy: Recent Results and Perspectives
Xueliang Li and Meiqin Wei

5.1 Introduction

Graph entropy measures play an important role in a variety of subjects, including information theory, biology, chemistry, and sociology. Graph entropy was first introduced by Rashevsky [1] and Trucco [2]. Mowshowitz [3–6] first defined and investigated the entropy of graphs, and Körner [7] introduced a different definition of graph entropy closely linked to problems in information and coding theory. In fact, there may be no "right" or "good" measure, since what is useful in one domain may not be serviceable in another. Distinct graph entropies have been used extensively to characterize the structures of graph-based systems in various fields. In these applications, the entropy of a graph is interpreted as the structural information content of the graph and serves as a complexity measure. It is worth mentioning that two different approaches to measuring the complexity of graphs have been developed: deterministic and probabilistic. The deterministic category encompasses the encoding, substructure count, and generative approaches, while the probabilistic category includes measures that apply an entropy function to a probability distribution associated with a graph. The probabilistic category is subdivided into intrinsic and extrinsic subcategories. Intrinsic measures use structural features of a graph to partition the graph (usually the set of vertices or edges) and thereby determine a probability distribution over the components of the partition. Extrinsic measures impose an arbitrary probability distribution on graph elements. Both subcategories employ the probability distribution to compute an entropy value. Shannon's entropy function is the most commonly used, but several different families of entropy functions are also considered. In fact, three survey papers [8–10] on graph entropy measures have already been published. However, Dehmer and Mowshowitz [8] and Simonyi [9] focused narrowly on the properties of Körner's entropy measure, and Simonyi [10] provided an overview of the most well-known graph entropy measures, which contains not so


many results and concentrates mainly on concepts. Here we focus on the development of graph entropy measures and aim to provide a broad overview of the main results and applications of the most well-known graph entropy measures. We start our survey by providing some mathematical preliminaries. Note that all graphs discussed in this chapter are assumed to be connected.

Definition 5.1. We use G = (V, E) with |V| < ∞ and E ⊆ (V 2) (the set of two-element subsets of V) to denote a finite undirected graph. If G = (V, E), |V| < ∞ and E ⊆ V × V, then G is called a finite directed graph. We use ∪C to denote the set of finite undirected connected graphs.

Definition 5.2. Let G = (V, E) be a graph. The quantity d(v_i) is called the degree of a vertex v_i ∈ V, where d(v_i) equals the number of edges e ∈ E incident with v_i. In the following, we simply denote d(v_i) by d_i. If a graph G has a_i vertices of degree μ_i (i = 1, 2, …, t), where Δ(G) = μ_1 > μ_2 > ⋯ > μ_t = δ(G) and ∑_{i=1}^{t} a_i = n, we define the degree sequence of G as D(G) = [μ_1^{a_1}, μ_2^{a_2}, …, μ_t^{a_t}]. If a_i = 1, we use μ_i instead of μ_i^1 for convenience.

Definition 5.3. The distance between two vertices u, v ∈ V, denoted by d(u, v), is the length of a shortest path between u and v. A path P connecting u and v in G is called a geodesic path if the length of P is exactly d(u, v). We call σ(v) = max_{u∈V} d(u, v) the eccentricity of v ∈ V. In addition, r(G) = min_{v∈V} σ(v) and ρ(G) = max_{v∈V} σ(v) are the radius and diameter of G, respectively. Without causing any confusion, we simply denote r(G) and ρ(G) by r and ρ, respectively.

A path graph is a simple graph whose vertices can be arranged in a linear sequence in such a way that two vertices are adjacent if they are consecutive in the sequence and nonadjacent otherwise. Similarly, a cycle graph on three or more vertices is a simple graph whose vertices can be arranged in a cyclic sequence in such a way that two vertices are adjacent if they are consecutive in the sequence and nonadjacent otherwise. Denote by P_n and C_n the path graph and the cycle graph on n vertices, respectively. A connected graph without any cycle is a tree. A star of order n, denoted by S_n, is the tree with n − 1 pendant vertices; its unique vertex of degree n − 1 is called the center vertex of S_n. A simple connected graph is called unicyclic if it has exactly one cycle. We use S_n^+ to denote the unicyclic graph obtained from the star S_n by adding an edge between two pendant vertices of S_n. Observe that a tree and a unicyclic graph of order n have exactly n − 1 and n edges, respectively. A bicyclic graph is a graph of order n with n + 1 edges. A tree is called a double star S_{p,q} if it is obtained from S_{p+1} and S_q by identifying a leaf of S_{p+1} with the center vertex of S_q; therefore, for the double star S_{p,q} with n vertices, we have p + q = n. We call a double star S_{p,q} balanced if p = ⌊n/2⌋ and q = ⌈n/2⌉. A comet is a tree composed of a star and a pendant path. For any integers n and t with 2 ≤ t ≤ n − 1, we denote by CS(n, t) the comet of order n with t pendant vertices, that is, a tree formed by a path P_{n−t} of which one end vertex coincides with a pendant vertex of a star S_{t+1} of order t + 1.


Definition 5.4. The j-sphere of a vertex v_i in G = (V, E) ∈ ∪C is defined by the set S_j(v_i, G) := {v ∈ V | d(v_i, v) = j}, j ≥ 1.

Definition 5.5. Let X be a discrete random variable with alphabet 𝒜, and let p(x_i) = Pr(X = x_i) be the probability mass function of X. The mean entropy of X is then defined by

H(X) := − ∑_{x_i∈𝒜} p(x_i) log(p(x_i)).

The concept of graph entropy introduced by Rashevsky [1] and Trucco [2] is used to measure structural complexity. Several graph invariants, such as the number of vertices, the vertex degree sequence, and extended degree sequences, have been used in the construction of graph entropy measures. The main graph entropy measures can be divided into two classes: classical measures and parametric measures. Classical measures, denoted by I(G, τ), are defined relative to a partition of a set X of graph elements induced by an equivalence relation τ on X. More precisely, let X be a set of graph elements (typically vertices), and let {X_i}, 1 ≤ i ≤ k, be a partition of X induced by τ. Suppose further that p_i := |X_i|/|X|. Then,

I(G, τ) = − ∑_{i=1}^{k} p_i log(p_i).

As mentioned in Ref. [10], Rashevsky [1] defined the following graph entropy measure:

I^V(G) := − ∑_{i=1}^{k} (|N_i|/|V|) log(|N_i|/|V|),    (5.1)

where |N_i| denotes the number of topologically equivalent vertices in the ith vertex orbit of G and k is the number of different orbits. Vertices are considered topologically equivalent if they belong to the same orbit of a graph. According to [11], if a graph G is vertex-transitive [12, 13], then I^V(G) = 0. In addition, Trucco [2] introduced a similar graph entropy measure

I^E(G) := − ∑_{i=1}^{k} (|N_i^E|/|E|) log(|N_i^E|/|E|),    (5.2)

where |N_i^E| stands for the number of edges in the ith edge orbit of G. These two entropies are both classical measures, in which special graph invariants (e.g., numbers of vertices, edges, degrees, and distances) and equivalence relations have given rise to measures of information content. Thus far, a number of specialized measures have been developed that are used primarily to characterize the structural complexity of chemical graphs [14–16].
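As a small illustration of (5.1) (our own example, not from Ref. [1]), the following Python snippet computes the vertex orbits of the path P_4 by brute-force enumeration of automorphisms and then evaluates I^V; the two end vertices and the two inner vertices form the two orbits, so I^V(P_4) = 1 bit.

# Compute Rashevsky's orbit-based entropy I^V(G) of Eq. (5.1) for a small graph
# by brute force: enumerate all automorphisms, extract vertex orbits, and
# apply the Shannon formula to the orbit sizes.
from itertools import permutations
from math import log2

n = 4
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}     # the path P4

def automorphisms():
    for perm in permutations(range(n)):
        if {frozenset((perm[u], perm[v])) for u, v in edges} == edges:
            yield perm

orbits = set()
for v in range(n):
    # orbit of v = all images of v under the automorphism group
    orbits.add(frozenset(perm[v] for perm in automorphisms()))

I_V = -sum((len(o) / n) * log2(len(o) / n) for o in orbits)
print(sorted(map(sorted, orbits)), I_V)      # [[0, 3], [1, 2]] 1.0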


In recent years, rather than inducing partitions and determining their probabilities, researchers assign a probability value to each individual element of a graph to derive graph entropy measures. This leads to the other class of graph entropy measures: parametric measures. Parametric measures are defined on graphs relative to information functions. Such functions are not identically zero and map graph elements (typically vertices) to nonnegative reals. We now give the precise definition of the entropies belonging to the parametric measures.

Definition 5.6. Let G ∈ ∪C and let S be a given set, for example, a set of vertices or paths. Functions f : S → ℝ+ play a role in defining information measures on graphs, and we call them information functions of G.

Definition 5.7. Let f be an information function of G. Then,

p^f(v_i) := f(v_i) / ∑_{j=1}^{|V|} f(v_j).

Obviously, p^f(v_1) + p^f(v_2) + ⋯ + p^f(v_n) = 1, where n = |V|. Hence, (p^f(v_1), p^f(v_2), …, p^f(v_n)) forms a probability distribution.

Definition 5.8. Let G be a finite graph and f be an information function of G. Then,

I_f(G) := − ∑_{i=1}^{|V|} ( f(v_i) / ∑_{j=1}^{|V|} f(v_j) ) log( f(v_i) / ∑_{j=1}^{|V|} f(v_j) ),    (5.3)

I_f^λ(G) := λ ( log(|V|) + ∑_{i=1}^{|V|} ( f(v_i) / ∑_{j=1}^{|V|} f(v_j) ) log( f(v_i) / ∑_{j=1}^{|V|} f(v_j) ) )    (5.4)

are families of information measures representing the structural information content of G, where λ > 0 is a scaling constant. I_f is the entropy of G, which belongs to the parametric measures, and I_f^λ is the information distance between the maximum entropy and I_f. The meaning of I_f(G) and I_f^λ(G) has been investigated by calculating the information content of real and synthetic chemical structures [17]. In addition, the information measures were calculated using specific graph classes to study extremal values and, hence, to detect the kind of structural information captured by the measures.
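For instance, taking the information function to be the vertex degree, f(v_i) = d_i (one simple choice among many), the entropy (5.3) can be evaluated directly. The short Python sketch below (our own illustration) does so for the star S_5, where the hub concentrates the degree mass and the entropy stays well below log_2(5).

# Evaluate the parametric graph entropy I_f(G) of Eq. (5.3) with f(v) = deg(v),
# here for the star S5 (hub 0 joined to vertices 1..4).
from math import log2

edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
n = 5
deg = [0] * n
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

total = sum(deg)                       # = 2|E| = 8
p = [d / total for d in deg]           # p^f(v_i) from Definition 5.7
I_f = -sum(q * log2(q) for q in p if q > 0)

print(I_f, log2(n))                    # about 2.0 versus log2(5) = 2.32...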


In fact, there also exist graph entropy measures based on integrals, although we do not focus on them in this chapter. We introduce here one such entropy, the tree entropy; for more details, we refer to [18, 19]. A graph G = (V, E) with a distinguished vertex o is called a rooted graph, which is denoted by (G, o) here. A rooted isomorphism of rooted graphs is an isomorphism of the underlying graphs that takes the root of one to the root of the other. The simple random walk on G is the Markov chain whose state space is V and whose transition probability from x to y equals the number of edges joining x to y divided by d(x). The average degree of G is ∑_{x∈V} d(x) / |V|.

Let p_k(x; G) denote the probability that the simple random walk on G started at x is back at x after k steps. Given a positive integer R, a finite rooted graph H, and a probability distribution ρ on rooted graphs, let p(R, H, ρ) denote the probability that H is rooted isomorphic to the ball of radius R about the root of a graph chosen with distribution ρ. Define the expected degree of a probability measure ρ on rooted graphs to be

d(ρ) := ∫ d(o) dρ(G, o).

For a finite graph G, let U(G) denote the distribution of rooted graphs obtained by choosing a uniform random vertex of G as the root of G. Suppose that ⟨G_n⟩ is a sequence of finite graphs and that ρ is a probability measure on rooted infinite graphs. We say that the random weak limit of ⟨G_n⟩ is ρ if, for any positive integer R, any finite graph H, and any ε > 0, we have lim_{n→∞} P[ |p(R, H, U(G_n)) − p(R, H, ρ)| > ε ] = 0. Lyons [18] proposed the tree entropy of a probability measure ρ on rooted infinite graphs:

h(ρ) := ∫ ( log d(o) − ∑_{k≥1} (1/k) p_k(o, G) ) dρ(G, o).

For labeled networks, that is, labeled graphs, Lyons [19] also gave a definition of an information measure which is more general than the tree entropy.

Definition 5.9. [19] Let ρ be a probability measure on rooted networks. We call ρ unimodular if

∫ ∑_{x∈V(G)} f(G, o, x) dρ(G, o) = ∫ ∑_{x∈V(G)} f(G, x, o) dρ(G, o)

for all nonnegative Borel functions f on locally finite connected networks with an ordered pair of distinguished vertices that are invariant in the sense that for any (nonrooted) network isomorphism γ of G and any x, y ∈ V(G), we have f(γG, γx, γy) = f(G, x, y).

In fact, following the seminal paper of Shannon and Weaver [20], many generalizations of the entropy measure have been proposed. An important example of such a measure is the Rényi entropy [21], which is defined by

I_α^r(P) := (1/(1−α)) log( ∑_{i=1}^{n} (p_i)^α ),  α ≠ 1,

138

5 Graph Entropy: Recent Results and Perspectives

where n = |V | and P ∶= (p1 , p2 , … , pn ). For further discussion of the properties of Rényi entropy, see [22]. Rényi and other general entropy functions allow for specifying families of information measures that can be applied to graphs. Similarly to some generalized information measures that have been investigated in information theory, Dehmer and Mowshowitz call these families generalized graph entropies. And in Ref. [23], they introduced six distinct such entropies, which are stated as follows. Definition 5.10. Let G = (V , E) be a graph on n vertices. Then, ( k ( ) ) ∑ |Xi | 𝛼 1 1 log , I𝛼 (G) ∶= 1−𝛼 |X| i=1 )𝛼 ) ( n ( ∑ f (vi ) 1 2 I𝛼 (G)f ∶= log , ∑n 1−𝛼 i=1 j=1 f (vj ) ∑k ( |Xi | )𝛼 −1 i=1 |X| I𝛼3 (G) ∶= , 21−𝛼 − 1 ( )𝛼 ∑n f (vi ) ∑ −1 n i=1 j=1 f (vj ) , I𝛼4 (G)f ∶= 21−𝛼 − 1 [ ] |X | 1− i , |X| |X| i=1 [ ] n ∑ f (vi ) f (vi ) 6 1 − ∑n , I (G)f ∶= ∑n i=1 j=1 f (vj ) j=1 f (vj ) I 5 (G) ∶=

k ∑ |Xi |

(5.5)

(5.6)

(5.7)

(5.8)

(5.9)

(5.10)

where X is a set of graph elements (typically vertices), {Xi } for 1 ≤ i ≤ k is a partition of X induced by the equivalence relation 𝜏, f is an information function of G, and 𝛼 ≠ 1. Parametric complexity measures have been proved useful in the study of complexity associated with machine learning. And Dehmer et al. [24] showed that generalized graph entropies can be applied to problems in machine learning such as graph classification and clustering. Interestingly, these new generalized entropies have been proved useful in demonstrating that hypotheses can be learned by using appropriate data sets and parameter optimization techniques. This chapter is organized as follows. Section 5.2 shows some inequalities and extremal properties of graph entropies and generalized graph entropies. Relationships between graph structures, graph energies, topological indices, and generalized graph entropies are presented in Section 5.3, and the last section is a simple summary.

5.2

Inequalities and Extremal Properties on (Generalized) Graph Entropies

5.2 Inequalities and Extremal Properties on (Generalized) Graph Entropies

Thanks to the fact that graph entropy measures have been applied to characterize the structures and complexities of graph-based systems in various areas, identity and inequality relationships between distinct graph entropies have been an interesting and popular research topic. In the meantime, extremal properties of graph entropies have also been widely studied and various results were obtained. 5.2.1 Inequalities for Classical Graph Entropies and Parametric Measures

Most of the graph entropy measures developed thus far have been applied in mathematical chemistry and biology [10, 14, 25]. These measures have been used to quantify the complexity of chemical and biological systems that can be represented as graphs. Given the profusion of such measures, it is useful to prove bounds for special graph classes or to study interrelations among them. Dehmer et al. [26] gave interrelationship between the parametric entropy and a classical entropy measure that is based on certain equivalence classes associated with an arbitrary equivalence relation. Theorem 5.1. [26] Let G = (V , E) be an arbitrary graph and Xi , 1 ≤ i ≤ k, be the equivalence classes associated with an arbitrary equivalence relation on X. Suppose further that f is an information function with f (vi ) > |Xi | for 1 ≤ i ≤ k and c ∶= ∑|V |1 . Then, j=1

f (vj )

∑ |Xi | log(|X|) ∑ f 1 If (G) < c ⋅ I(G, 𝜏) − c ⋅ log(c) − p (vi ) |X| |X| |X| i=1 i=1 ) ( |V | k |X| 1 ∑ f 1 ∑ f − p (vi ) log(pf (vi )) + p (vi ) log 1 + |X| i=k+1 |X| i=1 c ⋅ f (vi ) ) ( f k ∑ p (vi ) +1 . + log |X| i=1 k

k

Assume that f (vi ) > |Xi |, 1 ≤ i ≤ k, for some special graph classes and take the set X to be the vertex set V of G. Three corollaries of the above theorem on the upper bounds of If (G) can be obtained. Corollary 5.2. [26] Let Sn be a star graph on n vertices and suppose that v1 is the vertex with degree n − 1. The remaining n − 1 nonhub vertices are labeled arbitrarily, and v𝜇 stands for a nonhub vertex. Let f be an information function satisfying the conditions of Theorem 5.1. Let V1 ∶= {v1 } and V2 ∶= {v2 , v3 , … , vn } denote the

139

140

5 Graph Entropy: Recent Results and Perspectives

orbits of the automorphism group of Sn forming a partition of V . Then, ( ) ) ( 1 1 + pf (v𝜇 ) log 1 + f If (Sn ) < pf (v1 ) log 1 + f p (v1 ) p (v𝜇 ) n ∑

+ log(1 + pf (v1 )) + log(1 + pf (v𝜇 )) −

pf (vi ) log(pf (vi ))

i=2, i≠𝜇

−(n − 1) ⋅ c ⋅ log[(n − 1)c] − c log(c). Corollary 5.3. [26] Let GnI be an identity graph (a graph possessing a single graph automorphism) on n ≥ 6 vertices. GnI has only the identity automorphism and therefore each orbit is a singleton set, that is, |Vi | = 1, 1 ≤ i ≤ n. Let f be an information function satisfying the conditions of Theorem 5.1. Then, ( ) n n ∑ ∑ 1 I f p (vj ) log 1 + f log(1 + pf (vj )) − n ⋅ c ⋅ log(c). If (Gn ) < + p (vj ) j=1 j=1 Corollary 5.4. [26] Let GnP be a path graph on n vertices and f be an information function satisfying the conditions of Theorem 5.1. If n is even, GnP posses n2 equivalence classes Vi and each Vi contains two vertices. Then, n n ( ) 2 2 ∑ ∑ 1 P f If (Gn ) < p (vj ) log 1 + f log(1 + pf (vj )) + p (v ) j j=1 j=1 −

n ∑

pf (vj ) log(1 + pf (vj )) − n ⋅ c ⋅ log(2c).

j= n2 +1

If n is odd, then there exist n − ⌊ n2 ⌋ equivalence classes, n − ⌊ n2 ⌋ − 1 that have two elements and only one class containing a single element. This implies that ( ) n−⌊ n ⌋ n−⌊ n2 ⌋ ∑ ∑2 1 P f If (Gn ) < p (vj ) log 1 + f log(1 + pf (vj )) + p (vj ) j=1 j=1 −

) ( n pf (vj ) log(pf (vj )) − n − ⌊ ⌋ − 1 ⋅ 2c ⋅ log(2c) 2 j=n−⌊ n ⌋+1 n ∑ 2

−c ⋅ log(c). Assuming different initial conditions, Dehmer et al. [26] derived additional inequalities between classical and parametric measures. Theorem 5.5. [26] Let G be an arbitrary graph and pf (vi ) < |Xi |. Then, |V | k log(|X|) ∑ f 1 ∑ f 1 p (vi ) log(pf (vi )) − p (vi ) If (G) > I(G, 𝜏) − |X| |X| i=k+1 |X| i=1

5.2

Inequalities and Extremal Properties on (Generalized) Graph Entropies

) ) ( ( k k ∑ |Xi | |X| 1 ∑ − . − |X | log 1 + log 1 + |X| i=1 i |Xi | |X| i=1 Theorem 5.6. [26] Let G be an arbitrary graph with pi being the probabilities such that pi < f (vi ). Then, |V | ∑ log(c) 1 I(G, 𝜏) > If (G) + + pf (vi ) log(pf (vi )) c c i=k+1



k ∑

log(pf (vi )) −

i=1

k ∑

( log 1 +

i=1

1 pf (vi )

) (1 + pf (vi )).

For identity graphs, they also obtained a general upper bound for the parametric entropy measure. Corollary 5.7. [26] Let GnI be an identity graph on n vertices. Then, If (GnI ) < log(n) − c ⋅ log(c) + +

n ∑ i=1

( log 1 +

n ∑

log(pf (vi ))

i=1

1 pf (vi )

) (1 + pf (vi )).

5.2.2 Graph Entropy Inequalities with Information Functions f V , f P , and f C

In complex networks, information-theoretical methods are important for analyzing and understanding information processing. One major problem is to quantify structural information in networks based on the so-called information functions. Consider a complex network as an undirected connected graph, and based on such information functions, one can directly obtain different graphs entropies. Now we define two information functions, f V (vi ), f P (vi ), based on metrical properties of graphs, and a novel information function, f C (vi ), based on a vertex centrality measure. Definition 5.11. [27] Let G = (V , E) ∈ ∪C . For a vertex vi ∈ V , we define the information function as f V (vi ) ∶= 𝛼 c1 |S1 (vi ,G)|+c2 |S2 (vi ,G)|+···+c𝜌 |S𝜌 (vi ,G)| ,

ck > 0, 1 ≤ k ≤ 𝜌, 𝛼 > 0,

where ck are arbitrary real positive coefficients, Sj (vi , G) denotes the j-sphere of vi regarding G and |Sj (vi , G)| its cardinality, respectively. Before giving the definition of the information function f P (vi ), we introduce the following concepts first.

141

142

5 Graph Entropy: Recent Results and Perspectives

Definition 5.12. [27] Let G = (V , E) ∈ UC . For a vertex vi ∈ V , we determine the set Sj (vi , G) = {vuj , vwj , … , vxj } and define associated paths j

P1 (vi ) = (vi , vu1 , vu2 , … , vuj ), j

P2 (vi ) = (vi , vw1 , vw2 , … , vwj ), ⋮ j Pk (vi ) j

= (vi , vx1 , vx2 , … , vxj )

and their edge sets E1 = {{vi , vu1 }, {vu2 , vu3 }, … , {vuj−1 , vuj }}, E2 = {{vi , vw1 }, {vw2 , vw3 }, … , {vwj−1 , vwj }}, ⋮ Ekj = {{vi , vx1 }, {vx2 , vx3 }, … , {vxj−1 , vxj }}. Now we define the graph ℒG (vi , j) = (Vℒ , Eℒ ) ⊆ G as the local information graph regarding vi ∈ V with respect to f , where Vℒ ∶= {vi , vu1 , vu2 , … , vuj } ∪ {vi , vw1 , vw2 , … , vwj } ∪ · · · ∪ {vi , vx1 , vx2 , … , vxj } and Eℒ ∶= E1 ∪ E2 ∪ · · · ∪ Ekj . Further, j = j(vi ) is called the local information radius regarding vi . Definition 5.13. [27] Let G = (V , E) ∈ ∪C . For each vertex vi ∈ V and for j ∈ 1, 2, … , 𝜌, we determine the local information graph ℒG (vi , j), where ℒG (vi , j) j j j j is induced by the paths P1 (vi ), P2 (vi ), … , Pk (vi ). The quantity l(P𝜇 (vi )) ∈ ℕ, 𝜇 ∈ j

j

{1, 2, … , kj } denotes the length of P𝜇 (vi ) and k

l(P(ℒG (vi , j))) ∶=

j ∑

𝜇=1

j

l(P𝜇 (vi ))

expresses the sum of the path lengths associated with each ℒG (vi , j). Now we define the information function f P (vi ) as f P (vi ) ∶= 𝛼 b1 l(P(ℒG (vi ,1)))+b2 l(P(ℒG (vi ,2)))+···+b𝜌 l(P(ℒG (vi ,𝜌))) , where bk > 0, 1 ≤ k ≤ 𝜌, 𝛼 > 0 and bk are arbitrary real positive coefficients. Definition 5.14. [27] Let G = (V , E) ∈ ∪C and ℒG (vi , j) denote the local information graph defined as above for each vertex vi ∈ V . We define f C (vi ) as f C (vi ) ∶= 𝛼 a1 𝛽

ℒG (vi ,1) (v

i )+a2 𝛽

ℒG (vi ,2) (v

i )+···+a𝜌 𝛽

ℒG (vi ,𝜌) (v

i)

,

where 𝛽 ≤ 1, ak > 0, 1 ≤ k ≤ 𝜌, 𝛼 > 0, 𝛽 is a certain vertex centrality measure, 𝛽 ℒG (vi ,j) (vi ) expresses that we apply 𝛽 to vi regarding ℒG (vi , j) and ak are arbitrary real positive coefficients.

5.2

Inequalities and Extremal Properties on (Generalized) Graph Entropies

By applying Definitions 5.11, 5.13, 5.14, and Eq. (5.3), we obviously obtain the following three special graph entropies: If V (G) = −

|V | ∑ i=1

If P (G) = −

|V | ∑ i=1

f V (vi ) f V (v ) log ∑|V | i , ∑|V | V V (v ) f (v ) f j j j=1 j=1 f P (vi )

(5.11)

f P (vi ) log , ∑|V | P ∑|V | P f (vj ) f (vj ) j=1 j=1

(5.12)

f C (vi ) f C (v ) log ∑|V | i . ∑|V | C f (vj ) f C (vj ) j=1 j=1

(5.13)

and If C (G) = −

|V | ∑ i=1

The entropy measures based on the defined information functions (f V , f P , and can detect the structural complexity between graphs and therefore capture important structural information meaningfully. In Ref. [27], Dehmer investigated relationships between the above graph entropies and analyzed the computational complexity of these entropy measures.

f C)

Theorem 5.8. [27] Let G = (V , E) ∈ ∪C and f V , f P , and f C be information functions defined above. For the associated graph entropies, it holds the inequality ( [ )] P P P P If V (G) > 𝛼 𝜌[𝜙 𝜔 −𝜑] If P (G) − log 𝛼 𝜌[𝜙 𝜔 −𝜑] , 𝛼 > 1 where 𝜔P = max1≤i≤|V | 𝜔P (vi ), 𝜔P (vi ) = max1≤j≤𝜌 l(P(ℒG (vi , j))), 𝜙P = max1≤j≤𝜌 bj , and 𝜑 = min1≤j≤𝜌 cj ; and ( [ )] C C C C If V (G) < 𝛼 𝜌[𝜑 m −𝜙𝜔] If C (G) − log 𝛼 𝜌[𝜑 m −𝜙𝜔] , 𝛼 > 1, where 𝜙 = max1≤j≤𝜌 cj , 𝜑C = min1≤j≤𝜌 aj , mC = min1≤i≤|V | mC (vi ), 𝜔 = max1≤i≤|V | (𝜔(vi )), and 𝜔(vi ) = max1≤j≤𝜌 |Sj (vi , G)|. Theorem 5.9. [27] The time complexity to compute the entropies If V (G), If P (G), and If C (G) for G ∈ ∪C is O(|V |3 ). 5.2.3 Information Theoretic Measures of UHG Graphs

Let G be an undirected graph with vertex set V , edge set E, and N = |V | vertices. We call the function L ∶ V → ℒ multilevel function, which assigns to all vertices of G an element l ∈ ℒ that corresponds to the level it will be assigned. Then, a universal hierarchical graph (UHG) is defined by a vertex set V , an edge set E, a level set ℒ , and a multilevel function L. The vertex and edge sets define the connectivity and the level set and multilevel function induce a hierarchy between the vertices of G. We denote the class of UHG by 𝒢UH . Rashevsky [1] suggested to partition a graph and assign probabilities pi to all partitions in a certain way. Here, for a graph G = (V , E) ∈ 𝒢UH , such a partition is

143

144

5 Graph Entropy: Recent Results and Perspectives

given naturally by the hierarchical levels of G. This property directly leads to the definition of its graph entropies. Definition 5.15. We assign a discrete probability distribution Pn to a graph G ∈ n 𝒢UH with ℒ in the following way: Pn ∶ ℒ → [0, 1]|ℒ | with pni ∶= Ni , where ni is the number of vertices on level i. The vertex entropy of G is defined as H n (G) = −

|ℒ | ∑

pni log(pni ).

i

Definition 5.16. We assign a discrete probability distribution Pe to a graph G ∈ e 𝒢UH with ℒ in the following way: Pe ∶ ℒ → [0, 1]|ℒ | with pei ∶= Ei0 , where ei is 0 the number of edges incident with the vertices on level i and E = 2|E|. The edge entropy of G is defined as H e (G) = −

|ℒ | ∑

pei log(pei ).

i

Emmert-Streib and Dehmer [28] focused on the extremal properties of entropy measures of UHG graphs. In addition, they proposed the concept of joint entropy of UHG and further studied its extremal properties. Theorem 5.10. [28] For G ∈ 𝒢UH with N vertices and |ℒ | Levels, the condition for G to have maximum vertex entropy is (1) if (2) if

N |ℒ | N |ℒ |

pi =

∈ℕ∶

pi =

∈ℝ∶ {n ∶ N n−1 N



n N

with n =

N , |ℒ |

or

1 ≤ i ≤ I1 , I1 + 1 ≤ i ≤ |ℒ | = I1 + I2 .

Theorem 5.11. [28] For G ∈ 𝒢UH with |E| edges and |ℒ | Levels, the condition for G to have maximum edge entropy is (1) if

E0 |ℒ | E0 |ℒ |

∈ℕ∶

∈ℝ∶ {e ∶ 0 pi = Ee−1 ∶ E0

(2) if

pi =

e E0

with e =

E0 , |ℒ |

or

1 ≤ i ≤ I1 , I1 + 1 ≤ i ≤ |ℒ | = I1 + I2 .

Now we give two joint probability distributions on G ∈ 𝒢UH and introduce two joint entropies for G.

5.2

Inequalities and Extremal Properties on (Generalized) Graph Entropies

Definition 5.17. A discrete joint probability distribution on G ∈ 𝒢UH is naturally given by p𝑖𝑗 ∶= pni pej . The resulting joint entropy of G is given by H2 (G) = −

|ℒ | |ℒ | ∑ ∑ i

p𝑖𝑗 log(p𝑖𝑗 ).

j

Definition 5.18. A discrete joint probability distribution on G ∈ 𝒢UH can also be given by { pn pe ∑ i ni e ∶ i = j, j pj pj p𝑖𝑗 = 0∶ i ≠ j. The resulting joint entropy of G is given by H2′ (G) = −

|ℒ | |ℒ | ∑ ∑ i

p𝑖𝑗 log(p𝑖𝑗 ).

j

Interestingly, the extremal property of joint entropy in Definition 5.18 for ℕ or

E0 |ℒ |

N |ℒ |



∈ ℕ is similar to that of joint entropy in Definition 5.17.

Theorem 5.12. [28] For G ∈ 𝒢UH with N vertices, |E| edges and |ℒ | levels, the condition for G to have maximum joint entropy is (1)

if

(2)

if

(3)

if

(4)

N |ℒ | N |ℒ |

∈ ℕ and

E0 |ℒ | E0 |ℒ |

∈ ℕ: pni =

E0 |ℒ |

∈ ℕ: pei =

n N n N

with n =

e E0

with e =

N |ℒ | N |ℒ |

and pei =

∈ ℕ and ∈ ℝ: pni = with n = and {e ∶ 1 ≤ i ≤ I1e , 0 pei = Ee−1 ∶ I1e + 1 ≤ i ≤ L = |ℒ | = I1e + I2e , E0 N |ℒ |

∈ ℝ and {n ∶ pni = N n−1 ∶ N

1 ≤ i ≤ I1n , I1n + 1 ≤ i ≤ |ℒ | = I1n + I2n , 0

E ∈ ℝ and |ℒ ∈ ℝ: | {e ∶ 1 ≤ i ≤ I1e , 0 pei = Ee−1 ∶ I1e + 1 ≤ i ≤ |ℒ | = I1e + I2e , E0 {n ∶ 1 ≤ i ≤ I1n , n pi = N n−1 ∶ I1n + 1 ≤ i ≤ |ℒ | = I1n + I2n . N

if

N |ℒ |

E0 |ℒ |

and

e E0

with e =

E0 |ℒ |

145

146

5 Graph Entropy: Recent Results and Perspectives

Note that the algorithmic computation of information-theoretical measures always requires polynomial time complexity. Also in Ref. [28], Emmert-Streib and Dehmer provided some results about the time complexity to compute the vertex and edge entropy introduced as above. Theorem 5.13. [28] The time complexity to compute the vertex entropy (or edge entropy, which is defined in Definition 5.16) of an UHG graph G with N vertices and |ℒ | hierarchical levels is O(N)(𝑜𝑟 O(N 2 )). Let e𝑙𝑖 denote the number of edges the ith vertex has on level l and 𝜋l (⋅) be a permutation function on level l that orders the e𝑙𝑖 ’s such that e𝑙𝑖 ≥ el,i+1 with i = 𝜋l (k) and i + 1 = 𝜋l (m). This leads to an L × Nmax matrix M, whose elements correspond to e𝑙𝑖 , where i is the column index and l the row index. The number Nmax is the maximal number of vertices a level can have. In addition, EmmertStreib and Dehmer [28] also introduced another edge entropy and studied the time complexity to compute it, which we will state it in the following. Definition 5.19. We assign a discrete probability distribution Pe to a graph G ∈ ∑ e𝑙𝑖 𝒢UH with ℒ in the following way: Pe ∶ ℒ → [0, 1]|ℒ | with pei ∶= N 1 i M , Mi = max i ∑ e . The edge entropy of G is now defined as i 𝑙𝑖 H e (G) = −

|ℒ | ∑

pei log(pei ).

i

Theorem 5.14. [28] The time complexity to compute the edge entropy in Definition 5.19 of an UHG graph G with N vertices and |ℒ | hierarchical levels is O(|ℒ | ⋅ max((N 0 )2 , (N 1 )2 , … , (N |ℒ | )2 )). Here, N l with l ∈ {0, … , |ℒ |} is the number of vertices on level l. 5.2.4 Bounds for the Entropies of Rooted Trees and Generalized Trees

The investigation of topological aspects of chemical structures constitutes a major part of the research in chemical graph theory and mathematical chemistry [29–32]. There are a lot of problems dealing with trees for modeling and analyzing chemical structures. However, also rooted trees have wide applications in chemical graph theory such as enumeration and coding problems of chemical structures, and so on. Here, a hierarchical graph means a graph having a distinct vertex that is called a root and we also call it a rooted graph. Dehmer et al. [33] derived bounds for the entropies of hierarchical graphs in which they chose the classes of rooted trees and the so-called generalized trees. To start with the results of entropy bounds, we first define the graph classes mentioned above. Definition 5.20. An undirected graph is called undirected tree if this graph is connected and cycle-free. An undirected rooted tree  = (V , E) is an undirected

5.2

Inequalities and Extremal Properties on (Generalized) Graph Entropies

graph, which has exactly one vertex r ∈ V for which every edge is directed away from the root r. Then, all vertices in  are uniquely accessible from r. The level of a vertex v in a rooted tree  is simply the length of the path from r to v. The path with the largest path length from the root to a leaf is denoted by h. Definition 5.21. As a special case of  = (V , E) we also define an ordinary w-tree denoted as w , where w is a natural number. For the root vertex r, it holds d(r) = w and for all internal Vertices, r ∈ V holds d(v) = w + 1. Leaves are vertices without successors. A w-tree is fully occupied, denoted by wo if all leaves possess the same height h. Definition 5.22. Let  = (V , E1 ) be an undirected finite rooted tree. |L| denotes the cardinality of the level set L ∶= {l0 , l1 , … , lh }. The longest length of a path in  is denoted by h. It holds h = |L| − 1. The mapping Λ ∶ V → L is surjective and it is called a multilevel function if it assigns to each vertex an element of the level set L. A graph H = (V , EG ) is called a finite, undirected generalized tree if its edge set can be represented by the union EG ∶= E1 ∪ E2 ∪ E3 , where

• E1 forms the edge set of the underlying undirected rooted tree  . • E2 denotes the set of horizontal across-edges, that is, an edge whose incident vertices are at the same level i.

• E3 denotes the set of edges, whose incident vertices are at different levels. Note that the definition of graph entropy here are the same as Definition 5.11 and Eq. (5.11). Inspired by the technical assertion proved in Ref. [28], Dehmer et al. [33] studied bounds for the entropies of rooted trees and the so-called generalized trees. Here, we give the entropy bounds of rooted trees first. Theorem 5.15. [33] Let  be a rooted tree. For the entropy of  , it holds the inequality [ )] ( If V ( ) > 𝛼 𝜌[𝜙⋅𝜔−𝜑] Ig ( ) − log 𝛼 𝜌[𝜙⋅𝜔−𝜑] , ∀𝛼 > 1, where

[ Ig ( ) ∶= − g(v01 ) +

h 𝜎i ∑ ∑

] (v𝑖𝑘 ) log(g(v𝑖𝑘 ))

i=1 k=1

and 𝜔 ∶= max0≤i≤h,1≤k≤𝜎i 𝜔(v𝑖𝑘 ), 𝜔(v𝑖𝑘 ) ∶= max1≤j≤𝜌 |Sj (v𝑖𝑘 ,  )|, 𝜙 ∶= max1≤j≤𝜌 cj , 𝜑 ∶= min1≤j≤𝜌 cj , v𝑖𝑘 denotes the kth vertex on the ith level, 1 ≤ i ≤ h, 1 ≤ k ≤ 𝜎i and 𝜎i denotes the number of vertices on level i. As directed corollaries, special bounds for the corresponding entropies have been obtained by considering special classes of rooted trees. Corollary 5.16. [33] Let wo be a fully occupied w-tree. For the entropy of wo holds ( [ )] h h If V ( o ) > 𝛼 2h[𝜙⋅𝜔 −𝜑] Ig (wo ) − log 𝛼 2h[𝜙⋅𝜔 −𝜑] , ∀𝛼 > 1.

147

148

5 Graph Entropy: Recent Results and Perspectives

Corollary 5.17. [33] Let w be an ordinary w-tree. For the entropy of w holds ( [ )] h h If V ( ) > 𝛼 𝜌[𝜙⋅𝜔 −𝜑] Ig ( ) − log 𝛼 𝜌[𝜙⋅𝜔 −𝜑] , ∀𝛼 > 1. Next, we will state the entropy bounds for generalized trees. In fact, the entropy of a specific generalized tree can be characterized by the entropy of another generalized tree that is extremal with respect to a certain structural property; see the following theorems. Theorem 5.18. [33] Let H = (V , EG ) be a generalized tree with EG ∶= E1 ∪ E2 , that is, H possesses across-edges only. Starting from H, we define H ∗ as the generalized tree with the maximal number of across-edges on each level i, 1 ≤ i ≤ h.

• First, there exist positive real coefficients ck which satisfy the inequality system c1 |S1 (v𝑖𝑘 , H ∗ )| + c2 |S2 (v𝑖𝑘 , H ∗ )| + · · · + c𝜌 |S𝜌 (v𝑖𝑘 , H ∗ )| > c1 |S1 (v𝑖𝑘 , H)| + c2 |S2 (v𝑖𝑘 , H)| + · · · + c𝜌 |S𝜌 (v𝑖𝑘 , H)|, where 0 ≤ i ≤ h, 1 ≤ k ≤ 𝜎i , cj ≥ 0, 1 ≤ j ≤ 𝜌 and 𝜎i denotes the number of vertices on level i. • Second, it holds [ )] ( ∗ ∗ ∗ ∗ If V (H) > 𝛼 𝜌[𝜙 ⋅𝜔 −𝜑] If V (H ∗ ) − log 𝛼 𝜌[𝜙 ⋅𝜔 −𝜑] , ∀𝛼 > 1. Theorem 5.19. [33] Let H = (V H , E) be an arbitrary generalized tree and let H|V |,|V | be the complete generalized tree such that |V H | ≤ |V |. It holds If V (H) ≤ If V (H|V |,|V | ). 5.2.5 Information Inequalities for If (G) based on Different Information Functions

We begin this section with some definition and notation. Definition 5.23. Parameterized j-spheres: 𝜌(G)



fP (vi ) = 𝛽

j=1

cj |Sj (vi ,G)|

exponential

,

information

function

using

(5.14)

where 𝛽 > 0 and ck > 0 for 1 ≤ k ≤ 𝜌(G). Definition 5.24. Parameterized linear information function using j-spheres: 𝜌(G)

fP′ (vi ) =



cj |Sj (vi , G)|,

j=1

where ck > 0 for 1 ≤ k ≤ 𝜌(G).

(5.15)

5.2

Inequalities and Extremal Properties on (Generalized) Graph Entropies

Let LG (v, j) be the subgraph induced by the shortest path starting from the vertex v to all the vertices at distance j in G. Then, LG (v, j) is called the local information graph regarding v with respect to j, which is defined as in Definition 5.12 [27]. A local centrality measure that can be applied to determine the structural information content of a network [27] is then defined as follows. We assume that G = (V , E) is a connected graph with |V | = n vertices. Definition 5.25. The closeness centrality of the local information graph is defined by 𝛾(v; LG (v, j)) = ∑

1

x∈LG (v,j) d(v, x)

.

Similar to the j-sphere functions, we define further functions based on the local centrality measure as follows. Definition 5.26. Parameterized exponential information function using local centrality measure: fC (vi ) = 𝛼

∑n

j=1 cj 𝛾(vi ;LG (vi ,j))

,

where 𝛼 > 0, ck > 0 for 1 ≤ k ≤ 𝜌(G). Definition 5.27. Parameterized linear information function using local centrality measure: n ∑ fC ′ (vi ) = cj 𝛾(vi ; LG (vi , j)), j=1

where ck > 0 for 1 ≤ k ≤ 𝜌(G). Recall that entropy measures have been used to quantify the information content of the underlying networks and functions became more meaningful when we choose the coefficients to emphasize certain structural characteristics of the underlying graphs. Now, we first present closed form expressions for the graph entropy If (Sn ). Theorem 5.20. [34] Let Sn be a star graph on n vertices. Let f ∈ {fP , fP′ , fC , fC ′ } be the information functions as defined above. The graph entropy is given by ( ) 1−x If (Sn ) = −xlog2 x − (1 − x)log2 , n−1 where x is the probability of the central vertex of Sn : 1 , if f = fP , 1 + (n − 1)𝛽 (c2 −c1 )(n−2) c1 x= , if f = fP′ , 2c1 + c2 (n − 2) 1 x= if f = fC , ( ) ( ), 1 c1 n−2 +c 1 + (n − 1)𝛼 n−1 2 2n−3 x=

149

150

5 Graph Entropy: Recent Results and Perspectives

c1

x=

c1 (1 + (n − 1)2 ) + c2

(

(n−1)2 2n−3

),

if f = fC ′ .

Note that to compute a closed-form expression, a path is not always simple. In order to illustrate this, we present the graph entropy IfP′ (Pn ) by choosing particular values for its coefficients. Theorem 5.21. [34] Let Pn be a path graph and set c1 ∶= 𝜌(Pn ) = n − 1, c2 ∶= 𝜌(Pn ) − 1 = n − 2, … , c𝜌 ∶= 1. We have ) ⌈n∕2⌉ ( 2 ∑ n + n(2r − 3) − 2r(r − 1) IfP′ (Pn ) = 3 n(n − 1)(2n − 1) r=1 ) ( 2n(n − 1)(2n − 1) . ⋅ log2 3n2 + 3n(2r − 3) − 6r(r − 1) In Ref. [34], the authors presented explicit bounds or information inequalities for any connected graph if the measure is based on the information function using j-spheres, that is, f = fP or f = fP′ . Theorem 5.22. [34] Let G = (V , E) be a connected graph on n vertices. Then, we infer the following bounds: { X if 𝛽 > 1, 𝛽 log2 (n ⋅ 𝛽 X ), IfP (G) ≤ 𝛽 −X log2 (n ⋅ 𝛽 −X ), if 𝛽 < 1, ⎧ ⎪𝛽 X log2 (n ⋅ 𝛽 X ), ⎪ IfP (G) ≥ ⎨𝛽 −X log2 (n ⋅ 𝛽 −X ), ⎪ ⎪0, ⎩

( )1 if

1 n

X

≤ 𝛽 ≤ 1, 1

if 1 ≤ 𝛽 ≤ n X , ( )1 1 X or 𝛽 ≥ n X , if 0 < 𝛽 ≤ n1

where X = (cmax − cmin )(n − 1) with cmax = max{cj ∶ 1 ≤ j ≤ 𝜌(G)} and cmin = min{cj ∶ 1 ≤ j ≤ 𝜌(G)}. Theorem 5.23. [34] Let G = (V , E) be a connected graph on n vertices. Then, we infer the following bounds: ( ) n ⋅ cmax cmax , log IfP′ (G) ≤ cmin 2 cmin { c 0, if n ≤ cmax , min ( ) IfP′ (G) ≥ cmax cmin n⋅cmin , if n > log , 2 c c c max

max

min

where cmax = max{cj ∶ 1 ≤ j ≤ 𝜌(G)} and cmin = min{cj ∶ 1 ≤ j ≤ 𝜌(G)}. Let If1 (G) and If2 (G) be entropies of graph G defined using the information functions f1 and f2 , respectively. Further, we define another function f (v) = c1 f1 (v) + c2 f2 (v), v ∈ V . In the following, we will give the relations between

5.2

Inequalities and Extremal Properties on (Generalized) Graph Entropies

the graph entropy If (G) and the entropies If1 (G) and If2 (G), which were found and proved by Dehmer and Sivakumar [34]. Theorem 5.24. [34] Suppose f1 (v) ≤ f2 (v) for all v ∈ V . Then, If (G) can be bounded by If1 (G) and If2 (G) as follows: ( ) c A c (c + c2 )A2 (c + c2 )A1 If1 (G) − log2 1 1 − 2 1 , If (G) ≥ 1 A A c1 A ln(2) ( ) (c + c2 )A2 c A If (G) ≤ 1 If2 (G) − log2 2 2 , A A ∑ ∑ where A = c1 A1 + c2 A2 , A1 = f1 (v), and A2 = f2 (v). v∈V

v∈V

Theorem 5.25. [34] Given two information functions f1 (v), f2 (v) such that f1 (v) ≤ f2 (v) for all v ∈ V , then A log e A2 A A1 A2 I (G) + log2 − 2 log + 2 2 , A1 f2 A 1 + A2 A 1 2 A 1 + A2 A1 ∑ ∑ where A1 = f1 (v) and A2 = f2 (v). If1 (G) ≤

v∈V

v∈V

The next theorem gives another bound for If (G) in terms of both If1 (G) and If2 (G) by using the concavity property of the logarithmic function. Theorem 5.26. [34] Let f1 (v) and f2 (v) be two arbitrary functions defined on a graph G. If f (v) = c1 f1 (v) + c2 f2 (v) for all v ∈ V , we infer [ ] [ ] c A c A c A c A If (G) ≥ 1 1 If1 (G) − log2 1 1 + 2 2 If2 (G) − log2 2 2 − log2 e, A A A A [ ] [ ] c A c A c A c A If (G) ≤ 1 1 If1 (G) − log2 1 1 + 2 2 If2 (G) − log2 2 2 , A A A A ∑ ∑ where A = c1 A1 + c2 A2 , A1 = f1 (v), and A2 = f2 (v). v∈V

v∈V

The following corollary is a direct extension of the previous statement. Here, an information function is expressed as a linear combination of $k$ arbitrary information functions.

Corollary 5.27. [34] Let $k \ge 2$ and $f_1(v), f_2(v), \dots, f_k(v)$ be arbitrary functions defined on a graph $G$. If $f(v) = c_1 f_1(v) + c_2 f_2(v) + \cdots + c_k f_k(v)$ for all $v \in V$, we infer
$$I_f(G) \ge \sum_{i=1}^{k}\left\{\frac{c_i A_i}{A}\left[I_{f_i}(G) - \log_2\frac{c_i A_i}{A}\right]\right\} - (k-1)\log_2 e,$$
$$I_f(G) \le \sum_{i=1}^{k}\left\{\frac{c_i A_i}{A}\left[I_{f_i}(G) - \log_2\frac{c_i A_i}{A}\right]\right\},$$
where $A = \sum_{i=1}^{k} c_i A_i$ and $A_j = \sum_{v\in V} f_j(v)$ for $1 \le j \le k$.


Let G1 = (V1 , E1 ) and G2 = (V2 , E2 ) be two arbitrary connected graphs on vertices n1 and n2 , respectively. The union of the graphs G1 ∪ G2 is the disjoint union of G1 and G2 . The join of the graphs G1 + G2 is defined as the graph G = (V , E) with vertex set V = V1 ∪ V2 and edge set E = E1 ∪ E2 ∪ {(x, y) ∶ x ∈ V1 , y ∈ V2 }. In the following, we will state the results of entropy If (G) based on union of graphs and join of graphs. Theorem 5.28. [34] Let G = (V , E) = G1 ∪ G2 be the disjoint union of graphs G1 = (V1 , E1 ) and G2 = (V2 , E2 ). Let f be an arbitrary information function. Then, ( ) ( ) A A A A If (G) = 1 If (G1 ) − log2 1 + 2 If (G2 ) − log2 2 , A A A A ∑ ∑ fG1 (v) and A2 = fG2 (v). where A = A1 + A2 with A1 = v∈V1

v∈V2

As an immediate generalization of the previous theorem by taking k disjoint graphs into account, we have the following corollary. Corollary 5.29. [34] Let G1 = (V1 , E1 ), G2 = (V2 , E2 ), … , Gk = (Vk , Ek ) be k arbitrary connected graphs on n1 , n2 , … , nk vertices, respectively. Let f be an arbitrary information function and G = (V , E) = G1 ∪ G2 ∪ · · · ∪ Gk be the disjoint union of graphs Gi . Then, ( )} k { ∑ Ai Ai If (Gi ) − log2 , If (G) = A A i=1 ∑ where A = A1 + A2 + · · · + Ak with Ai = fGi (v) for 1 ≤ i ≤ k. v∈Vi
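Theorem 5.28 is easy to check numerically. The sketch below is our own illustration (not code from Ref. [34]); the choice of the degree as the information function, the helper name, and the use of the networkx library are assumptions made purely for the example.

```python
import math
import networkx as nx

def f_entropy(G, f):
    """Generic entropy I_f(G) = -sum p(v) log2 p(v) with p(v) = f(v) / sum_u f(u)."""
    vals = [f(G, v) for v in G]
    total = sum(vals)
    return -sum((x / total) * math.log2(x / total) for x in vals)

deg = lambda G, v: G.degree(v)          # information function f(v) = d(v)

G1 = nx.path_graph(6)
G2 = nx.cycle_graph(5)
G = nx.disjoint_union(G1, G2)           # degrees are unchanged in the disjoint union

A1 = sum(dict(G1.degree()).values())
A2 = sum(dict(G2.degree()).values())
A = A1 + A2
lhs = f_entropy(G, deg)
rhs = (A1 / A) * (f_entropy(G1, deg) - math.log2(A1 / A)) + \
      (A2 / A) * (f_entropy(G2, deg) - math.log2(A2 / A))
print(lhs, rhs)                         # the two values coincide, as in Theorem 5.28
```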

Next we focus on the value of IfP (G) and IfP′ (G) depending on the join of graphs. Theorem 5.30. [34] Let G = (V , E) = G1 + G2 be the join of graphs G1 = (V1 , E1 ) and G2 = (V2 , E2 ), where |Vi | = ni , i = 1, 2. The graph entropy IfP (G) can then be expressed in terms of IfP (G1 ) and IfP (G2 ) as follows: ( ) A 𝛽 c1 n1 A 𝛽 c1 n2 A 𝛽 c1 n2 IfP (G) = 1 IfP (G1 ) − log2 1 + 2 A A A ) ( A2 𝛽 c1 n2 , × IfP (G2 ) − log2 A ∑ ∑ where A = A1 𝛽 c1 n2 + A2 𝛽 c1 n2 with A1 = fG1 (v) and A2 = fG2 (v). v∈V1

v∈V2

Theorem 5.31. [34] Let G = (V , E) = G1 + G2 be the join of graphs G1 = (V1 , E1 ) and G2 = (V2 , E2 ), where |Vi | = ni , i = 1, 2. Then, ( ) ( ) A 2c n n A A A IfP′ (G) ≥ 1 IfP′ (G1 ) − log2 1 + 2 IfP′ (G2 ) − log2 2 − 1 1 2 , A A A A A ln(2) ∑ ∑ where A = 2c1 n1 n2 + A1 + A2 with A1 = fG1 (v) and A2 = fG2 (v). v∈V1

v∈V2


Furthermore, an alternate set of bounds has been achieved in Ref. [34]. Theorem 5.32. [34] Let G = (V , E) = G1 + G2 be the join of graphs G1 = (V1 , E1 ) and G2 = (V2 , E2 ), where |Vi | = ni , i = 1, 2. Then, ( ) ( ) A A A A IfP′ (G) ≤ 1 IfP′ (G1 ) − log2 1 + 2 IfP′ (G2 ) − log2 2 A A A A c2 n1 n2 c1 n1 n2 log2 1 2 , A A ( ) ( ) A2 A1 A1 A2 IfP′ (G1 ) − log2 + IfP′ (G2 ) − log2 IfP′ (G) ≥ A A A A −

c2 n1 n2 c1 n1 n2 log2 1 2 − log2 e, A A ∑ ∑ fG1 (v) and A2 = fG2 (v). where A = 2c1 n1 n2 + A1 + A2 with A1 = −

v∈V1

v∈V2

5.2.6 Extremal Properties of Degree- and Distance-Based Graph Entropies

Many graph invariants have been used to construct entropy-based measures to characterize the structure of complex networks or deal with inferring and characterizing relational structures of graphs in discrete mathematics, computer science, information theory, statistics, chemistry, biology, and so on. In this section, we will state the extremal properties of graph entropies that are based on information functions fdl (vi ) = dil and fkn (vi ) = nk (vi ), where l is an arbitrary real number and nk (vi ) is the number of vertices with distance k to vi , 1 ≤ k ≤ 𝜌(G). In this section, we assume that G = (V , E) is a simple connected graph with n vertices and m edges. By applying Eq. (5.3) in Definition 5.8, we can obtain two special graph entropies based on information functions fdl and fkn . ( n ) n n ∑ ∑ ∑ dil dil dil l If l (G) ∶= − − log = log d log dil , ∑ ∑ ∑ i n n n l l l d i=1 i=1 i=1 j=1 dj j=1 dj j=1 dj ) ( n ∑ nk (vi ) nk (vi ) If n (G) ∶= − log ∑n ∑n k i=1 j=1 nk (vj ) j=1 nk (vj ) ( n ) n ∑ ∑ 1 = log nk (vi ) − ∑n nk (vi ) log nk (vi ). i=1 j=1 nk (vj ) i=1 The entropy If l (G) is based on an information function by using degree powers, d which is one of the most important graph invariants and has been proved useful in information theory, social networks, network reliability, and mathematical chemistry [35, 36]. In addition, the sum of degree powers has received considerable attention in graph theory and extremal graph theory, which is related to the


famous Ramsey problem [37, 38]. Meanwhile, the entropy $I_{f_k^n}(G)$ relates to a new information function, which is the number of vertices with distance $k$ to a given vertex. Distance is one of the most important graph invariants. For a given graph, the number of pairs of vertices with distance three, which is related to the clustering coefficient of networks [39], is also called the Wiener polarity index introduced by Wiener [40].
Since $\sum_{i=1}^{n} d_i = 2m$, we have
$$I_{f_d^1}(G) = \log(2m) - \frac{1}{2m}\sum_{i=1}^{n} d_i \log d_i.$$
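As a concrete illustration, the following short Python sketch computes $I_{f_d^l}(G)$ directly from the degree sequence (base-2 logarithms). It is our own example, not code from Refs [41, 44]; the helper name and the use of the networkx library are assumptions.

```python
import math
import networkx as nx

def degree_entropy(G, l=1):
    """Degree-based graph entropy I_{f_d^l}(G) = -sum p_i log2 p_i with p_i = d_i^l / sum_j d_j^l.

    Assumes G has no isolated vertices, so every degree d_i >= 1.
    """
    weights = [d ** l for _, d in G.degree()]
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights)

# Example: compare a path and a star on n vertices (cf. Theorem 5.33 below).
n = 10
print(degree_entropy(nx.path_graph(n)))      # path P_n
print(degree_entropy(nx.star_graph(n - 1)))  # star S_n (center plus n-1 leaves)
```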

In Ref. [41], the authors focused on extremal properties of graph entropy If 1 (G) d and obtained the maximum and minimum entropies for certain families of graphs, that is, trees, unicyclic graphs, bicyclic graphs, chemical trees, and chemical graphs. Furthermore, they proposed some conjectures for extremal values of those measures of trees. Theorem 5.33. [41] Let T be a tree on n vertices. Then, we have If 1 (T) ≤ If 1 (Pn ), d d the equality holds only if T ≅ Pn ; If 1 (T) ≥ If 1 (Sn ), the equality holds only if T ≅ Sn . d

d

A dendrimer is a tree with two additional parameters, the progressive degree p and the radius r. Every internal vertex of the tree has degree p + 1. In Ref. [42], the authors obtained the following result. Theorem 5.34. [42] Let D be a dendrimer with n vertices. The star and path graphs attain the minimal and maximal value of If 1 (D). d

Theorem 5.35. [41] Let G be a unicyclic graph with n vertices. Then, we have If 1 (G) ≤ If 1 (Cn ), the equality holds only if G ≅ Cn ; If 1 (G) ≥ If 1 (Sn+ ), the equality d d d d holds only if G ≅ Sn+ . Denote by G∗ and G∗∗ the bicyclic graphs with degree sequences [32 , 2n−2 ] and [n − 1, 3, 22 , 1n−4 ], respectively. Theorem 5.36. [41] Let G be a bicyclic graph of order n. Then, we have If 1 (G) ≤ d If 1 (G∗ ), the equality holds only if G ≅ G∗ ; If 1 (G) ≥ If 1 (G∗∗ ), the equality holds only d d d if G ≅ G∗∗ . In chemical graph theory, a chemical graph is a representation of the structural formula of a chemical compound in terms of graph theory. In this case, a graph corresponds to a chemical structural formula, in which a vertex and an edge correspond to an atom and a chemical bond, respectively. Since carbon atoms are 4-valent, we obtain graphs in which no vertex has degree greater than four. A chemical tree is a tree T with maximum degree at most four. We call chemical graphs with n vertices and m edges (n, m)-chemical graphs. For a more thorough introduction on chemical graphs, we refer to [29, 43].


Let T ∗ be a tree with n vertices and n − 2 = 3a + i, i = 0, 1, 2, whose degree sequence is [4a , i + 1, 1n−a−1 ]. Let G1 be the (n, m)-chemical graph with degree sequence [d1 , d2 , … , dn ] such that |di − dj | ≤ 1 for any i ≠ j and G2 be an (n, m)chemical graph with at most one vertex of degree 2 or 3. Theorem 5.37. [41] Let T be a chemical tree of order n such that n − 2 = 3a + i, i = 0, 1, 2. Then, we have If 1 (T) ≤ If 1 (Pn ), the equality holds only if T ≅ Pn ; d d If 1 (T) ≥ If 1 (T ∗ ), the equality holds only if T ≅ T ∗ . d

d

Theorem 5.38. [41] Let G be an (n, m)-chemical graph. Then, we have If 1 (G) ≤ d If 1 (G1 ), the equality holds only if G ≅ G1 ; If 1 (G) ≥ If 1 (G2 ), the equality holds only d d d if G ≅ G2 . By performing numerical experiments, Cao et al. [41] proposed the following conjecture, while several attempts to prove the statement by using different methods failed. Conjecture 5.39. [41] Let T be a tree with n vertices and l > 0. Then, we have If l (T) ≤ If l (Pn ), the equality holds only if T ≅ Pn ; If l (T) ≥ If l (Sn ), the equality holds d d d d only if T ≅ Sn . Furthermore, Cao and Dehmer [44] extended the results obtained in Ref. [41]. They explored the extremal values of If l (G) and the relations between this entropy d and the sum of degree powers for different values of l. In addition, they demonstrated those results by generating numerical results using trees with 11 vertices and connected graphs with seven vertices. Theorem 5.40. [44] Let G be a graph with n vertices. Denote by 𝛿 and Δ the minimum degree and maximum degree of G, respectively. Then, we have ( n ) ( n ) ∑ ∑ l di − l log Δ ≤ If l (G) ≤ log dil − l log 𝛿. log i=1

d

i=1

The following corollary can be obtained directly from the above theorem. Corollary 5.41. [44] If G is a d-regular graph, then If l (G) = log n for any l. d

Observe that if G is regular, then If l (G) is a function only on n. For the trees d with 11 vertices and connected graphs with seven vertices, Cao and Dehmer [44] ∑n l gave numerical results on i=1 di and If l (G), which gives support for the following d conjecture. Conjecture 5.42. [44] For l > 0, If l (G) is a monotonously increasing function on l d for connected graphs. In Ref. [45], the authors discuss the extremal properties of the graph entropy If n (G), thereof leading to a better understanding of this new information-theoretic k


quantity. For $k = 1$, $I_{f_1^n}(G) = \log(2m) - \frac{1}{2m}\sum_{i=1}^{n} d_i \log d_i$, because $n_1(v_i) = d_i$ and $\sum_{i=1}^{n} d_i = 2m$. Denote by $p_k(G)$ the number of geodesic paths with length $k$ in graph $G$. Then, we have $\sum_{i=1}^{n} n_k(v_i) = 2p_k(G)$, since each path of length $k$ is counted twice in $\sum_{i=1}^{n} n_k(v_i)$. Therefore,
$$I_{f_k^n}(G) = \log\bigl(2p_k(G)\bigr) - \frac{1}{2p_k(G)}\sum_{i=1}^{n} n_k(v_i)\log n_k(v_i).$$
As is well known, there are some good algorithms for finding shortest paths in a graph. From this aspect, the authors obtained the following result first.

Proposition 5.43. [45] Let $G$ be a graph with $n$ vertices. For a given integer $k$, the value of $I_{f_k^n}(G)$ can be computed in polynomial time.
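To make Proposition 5.43 concrete, here is a small Python sketch (our own illustration, not code from Ref. [45]) computing $I_{f_k^n}(G)$ from a breadth-first search started at every vertex; the function name and the use of networkx are assumptions.

```python
import math
import networkx as nx

def nk_entropy(G, k=2):
    """Distance-based entropy I_{f_k^n}(G) with f(v_i) = n_k(v_i),
    the number of vertices at distance exactly k from v_i (base-2 logarithm)."""
    nk = []
    for v in G:
        dist = nx.single_source_shortest_path_length(G, v)
        nk.append(sum(1 for d in dist.values() if d == k))
    total = sum(nk)  # equals 2 * p_k(G)
    # vertices with n_k(v) = 0 contribute nothing (0 * log 0 = 0)
    return math.log2(total) - sum(x * math.log2(x) for x in nk if x > 0) / total

print(nk_entropy(nx.star_graph(9)))   # log2(n - 1) for the star S_10
print(nk_entropy(nx.path_graph(10)))  # log2(n - 2) + 2/(n - 2) for the path P_10
```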

Let $T$ be a tree with $n$ vertices and $V(T) = \{v_1, v_2, \dots, v_n\}$. In the following, we present the properties of $I_{f_2^n}(T)$ for $k = 2$ proved by Chen et al. [45]. By some elementary calculations, they [45] found that
$$I_{f_2^n}(T) = \log\left(\sum_{i=1}^{n} d_i^2 - 2(n-1)\right) - \frac{\sum_{i=1}^{n} n_2(v_i)\log n_2(v_i)}{\sum_{i=1}^{n} d_i^2 - 2(n-1)},$$
$$I_{f_2^n}(S_n) = \log(n-1),$$
$$I_{f_2^n}(P_n) = \log(n-2) + \frac{2}{n-2},$$
$$I_{f_2^n}\bigl(S_{\lfloor n/2\rfloor,\lceil n/2\rceil}\bigr) = \begin{cases} \log(n), & \text{if } n = 2k,\\[2pt] \frac{3k-1}{2k}\log(k) - \frac{k-1}{2k}\log(k-1) + 1, & \text{if } n = 2k+1, \end{cases}$$
$$I_{f_2^n}\bigl(CS(n,t)\bigr) = \log\bigl(t^2 - 3t + 2n - 2\bigr) - \frac{2(n-t-3) + t\log t + (t-1)^2\log(t-1)}{t^2 - 3t + 2n - 2}, \qquad n - t \ge 3.$$
Then they obtained the following result.

Theorem 5.44. [45] Let $S_n$, $P_n$, and $S_{\lfloor n/2\rfloor,\lceil n/2\rceil}$ be the star, path, and balanced double star with $n$ vertices, respectively. Then, $I_{f_2^n}(S_n) < I_{f_2^n}(P_n) < I_{f_2^n}\bigl(S_{\lfloor n/2\rfloor,\lceil n/2\rceil}\bigr)$.

Depending on the above extremal trees of $I_{f_2^n}(T)$, Chen et al. [45] proposed the following conjecture.

Conjecture 5.45. [45] For a tree $T$ with $n$ vertices, the balanced double star and the comet $CS(n, t_0)$ can attain the maximum and the minimum values of $I_{f_2^n}(T)$.

By calculating the values If n (T) for n = 7–10, the authors obtained the trees 2 with extremal values of entropy, which are shown in Figures 5.1 and 5.2, respectively.
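The exhaustive computation behind Figures 5.1 and 5.2 is easy to reproduce. The sketch below is our own illustration (not the authors' code): it enumerates all nonisomorphic trees on $n$ vertices with networkx and records the extremal values of $I_{f_2^n}(T)$, re-using the same small helper as above.

```python
import math
import networkx as nx

def nk_entropy(T, k=2):
    """I_{f_k^n}(T), base-2 logarithm; see the sketch above."""
    nk = []
    for v in T:
        dist = nx.single_source_shortest_path_length(T, v)
        nk.append(sum(1 for d in dist.values() if d == k))
    total = sum(nk)
    return math.log2(total) - sum(x * math.log2(x) for x in nk if x > 0) / total

for n in range(7, 11):
    values = [nk_entropy(T) for T in nx.nonisomorphic_trees(n)]
    print(n, "min:", round(min(values), 4), "max:", round(max(values), 4))
```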


Figure 5.1 The trees with maximum value of $I_{f_2^n}(T)$ among all trees with $n$ vertices for $7 \le n \le 10$.


Figure 5.2 The trees with minimum value of $I_{f_2^n}(T)$ among all trees with $n$ vertices for $7 \le n \le 10$.

Observe that the extremal graphs for $n = 10$ are not unique. From this observation, they [45] obtained the following result.

Theorem 5.46. [45] Let $CS(n,t)$ be a comet with $n - t \ge 4$. Denote by $T$ a tree obtained from $CS(n,t)$ by deleting the leaf that is not adjacent to the vertex of maximum degree and attaching a new vertex to one leaf that is adjacent to the vertex of maximum degree. Then, $I_{f_2^n}(T) = I_{f_2^n}\bigl(CS(n,t)\bigr)$.

5.2.7 Extremality of $I_{f^\lambda}(G)$, $I_{f^2}(G)$, $I_{f^3}(G)$ and Entropy Bounds for Dendrimers

In the setting of information-theoretic graph measures, we will often consider a tuple (𝜆1 , 𝜆2 , … , 𝜆k ) of nonnegative integers 𝜆i ∈ N. Let G = (V , E) be a connected graph with |V | = n vertices. Here, we define f 𝜆 (vi ) = 𝜆i , for all vi ∈ V . Next, we define f 2 , f 3 as follows.


Definition 5.28. [46] Let $G = (V, E)$. For a vertex $v_i \in V$, we define
$$f^2(v_i) := c_1|S_1(v_i,G)| + c_2|S_2(v_i,G)| + \cdots + c_\rho|S_\rho(v_i,G)|, \qquad c_k > 0,\ 1 \le k \le \rho,\ \alpha > 0,$$
$$f^3(v_i) := c_i\,\sigma(v_i), \qquad c_k > 0,\ 1 \le k \le n,$$
where $\sigma(v)$ and $S_j(v,G)$ are the eccentricity and the $j$-sphere of the vertex $v$, respectively. For the information function $f^2$, by applying Eqs (5.3) and (5.4) in Definition 5.8, we can obtain the following two entropy measures $I_{f^2}(G)$ and $I_{f^2}^\lambda(G)$:
$$I_{f^2}(G) := -\sum_{i=1}^{|V|} \frac{f^2(v_i)}{\sum_{j=1}^{|V|} f^2(v_j)} \log\frac{f^2(v_i)}{\sum_{j=1}^{|V|} f^2(v_j)},$$
$$I_{f^2}^\lambda(G) := \lambda\left(\log(|V|) + \sum_{i=1}^{|V|} \frac{f^2(v_i)}{\sum_{j=1}^{|V|} f^2(v_j)} \log\frac{f^2(v_i)}{\sum_{j=1}^{|V|} f^2(v_j)}\right).$$
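A small Python sketch of $I_{f^2}(G)$, using the linearly decreasing weights $c_j = \rho(G) - j + 1$ (the "constant decrease" choice discussed in the next paragraph). It is our own illustration; the function names and the use of networkx are assumptions, and the graph is assumed connected.

```python
import math
import networkx as nx

def sphere_counts(G, v):
    """|S_j(v, G)| for j = 1, ..., diameter(G): vertices at distance exactly j from v."""
    rho = nx.diameter(G)
    dist = nx.single_source_shortest_path_length(G, v)
    counts = [0] * rho
    for d in dist.values():
        if d >= 1:
            counts[d - 1] += 1
    return counts

def sphere_entropy(G, weights=None):
    """I_{f^2}(G) with f^2(v) = sum_j c_j |S_j(v, G)|, base-2 logarithm."""
    rho = nx.diameter(G)
    if weights is None:                       # linear decrease: c_1 = rho, ..., c_rho = 1
        weights = [rho - j for j in range(rho)]
    f = [sum(c * s for c, s in zip(weights, sphere_counts(G, v))) for v in G]
    total = sum(f)
    return -sum((x / total) * math.log2(x / total) for x in f)

G = nx.cycle_graph(8)                         # a regular graph: the entropy equals log2(n)
print(sphere_entropy(G), math.log2(8))
```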

In Ref. [11], the authors proved that if the graph $G = (V, E)$ is $k$-regular, then $I_{f^2}(G) = \log(|V|)$, and hence, $I_{f^2}^\lambda(G) = 0$. For our purpose, we will mainly use decreasing sequences $c_1, \dots, c_{\rho(G)}$ of (1) constant decrease: $c_1 := S$, $c_2 := S-k$, $\dots$, $c_{\rho(G)} := S-(\rho(G)-1)k$, (2) quadratic decrease: $c_1 := S^2$, $c_2 := (S-k)^2$, $\dots$, $c_{\rho(G)} := (S-(\rho(G)-1)k)^2$, and (3) exponential decrease: $c_1 := S$, $c_2 := S e^{-k}$, $\dots$, $c_{\rho(G)} := S e^{-(\rho(G)-1)k}$. Intuitive choices for the parameters are $S = \rho(G)$ and $k = 1$. Applying Eq. (5.3) in Definition 5.8, we can obtain three graph entropies as follows:
$$I(\lambda_1, \lambda_2, \dots, \lambda_n) = I_{f^\lambda}(G) = -\sum_{i=1}^{n} \frac{\lambda_i}{\sum_{j=1}^{n}\lambda_j} \log\frac{\lambda_i}{\sum_{j=1}^{n}\lambda_j},$$
$$I_{f^2}(G) = -\sum_{i=1}^{n} \frac{f^2(v_i)}{\sum_{j=1}^{n} f^2(v_j)} \log\frac{f^2(v_i)}{\sum_{j=1}^{n} f^2(v_j)},$$
$$I_{f^3}(G) = -\sum_{i=1}^{n} \frac{f^3(v_i)}{\sum_{j=1}^{n} f^3(v_j)} \log\frac{f^3(v_i)}{\sum_{j=1}^{n} f^3(v_j)}.$$

As described in Definition 5.7, $p_i := p^f(v_i) = \frac{\lambda_i}{\sum_{j=1}^{n}\lambda_j}$, $i = 1, 2, \dots, n$. Let $p = (p_1,$

$p_2, \dots, p_n)$ be the probability distribution vector. Depending on the probability distribution vector, we denote the entropy $I_{f^\lambda}(G)$ as $I_p(G)$ as well. Now, we present some extremal properties of the entropy measure $I_{f^\lambda}(G)$, that is, $I(\lambda_1, \lambda_2, \dots, \lambda_n)$.

Lemma 5.47. [46] If
(i) $\lambda_m + x \le \frac{\Sigma_m}{n-1}$ and $x \ge 0$, or
(ii) $(\lambda_m + x)\,\Sigma_m \ge \sum_{i\in\{1,\dots,n\}-m} \lambda_i^2$ and $-\lambda_m < x < 0$,
then

I(𝜆1 , … , 𝜆m−1 , 𝜆m , 𝜆m+1 , … , 𝜆n ) ≤ I(𝜆1 , … , 𝜆m−1 , 𝜆m + x, 𝜆m+1 , … , 𝜆n ); on the contrary, if Σm (iii) 𝜆m ≤ n−1 and −𝜆m < x < 0, or ∑ (iv) 𝜆m Σm ≥ i∈{1,…,n}−m 𝜆2i and x > 0, then I(𝜆1 , … , 𝜆m−1 , 𝜆m , 𝜆m+1 , … , 𝜆n ) ≥ I(𝜆1 , … , 𝜆m−1 , 𝜆m + x, 𝜆m+1 , … , 𝜆n ), ∑n ∑ where x ≥ −𝜆m , Σ = j=1 𝜆j and Σm = j∈{1,…,n}−m 𝜆j . Let p = (p1 , p2 , … , pn ) be the original probability distribution vector and p = (p1 , p2 , … , pn ) be the changed one, both ordered in increasing order. Further, let ∑n Δp = p − p = (𝛿1 , … , 𝛿n ) where 𝛿1 , … , 𝛿n ∈ ℝ. Obviously, i=1 𝛿i = 0. Lemma 5.48. [46] If (i) there exists a k such that for all 1 ≤ i ≤ k, 𝛿i ≤ 0 and for all k + 1 ≤ i ≤ n, 𝛿 ≥ 0 or, more generally, if ∑i 𝓁 (ii) i=1 𝛿i ≤ 0 for all 𝓁 = 1, … , n, then, Ip (G) ≤ Ip (G). Lemma 5.49. [46] For two probability distribution vectors p and p fulfilling condition (ii) of Lemma 5.48, we have Ip (G) − Ip (G) ≥

n ∑

𝛿i log pi ,

i=1

where 𝛿i are the entries of Δp = p − p. Lemma 5.50. [46] Assume that for two probability distribution vectors p and p, ∑n the opposite of condition (ii) in Lemma 5.48 is true, that is, i=𝓁 𝛿i ≥ 0 for all 𝓁 = 1, … , n. Then, ∑ ∑ 0 > Ip (G) − Ip (G) ≥ 𝛿i log(pi − 𝜌) + 𝛿i log(pi + 𝜌), i∶𝛿i 0

where 𝜌 = maxi∈{2,…,n} (pi − pi−1 ). Proposition 5.51. [46] For two probability distribution vectors p and p with ∑𝓁 i=1 𝛿i ≤ 0 for all 𝓁 in {0, … , 𝓁1 − 1} ∪ {𝓁2 , … , n} (1 ≤ 𝓁1 < 𝓁2 ≤ n), we have 𝓁1 −1

Ip (G) − Ip (G) ≥



𝛿i log pi +

i=1

(

𝓁1 −1

+

∑ i=1

𝛿i log

n ∑

𝓁2 −1

𝛿i log pi +

i=𝓁2

p𝓁1 − 𝜌 p𝓁1



𝛿i log(pi + 𝜌)

i=𝓁1

)

𝓁2 −1

+

∑ i=1

𝛿i log

(

p𝓁2 p𝓁2 + 𝜌

) ,


where 𝜌 = maxi∈{2,…,n} (pi − pi−1 ). Hence, if ) (𝓁 −1 ( 𝓁1 −1 𝓁1 −1 n 2 p𝓁1 − 𝜌 ∑ ∑ ∑ ∑ 𝛿i log pi + 𝛿i log pi ≥ − 𝛿i log(pi + 𝜌) + 𝛿i log p𝓁1 i=1 i=1 i=𝓁2 i=𝓁1 )) ( 𝓁2 −1 p𝓁2 ∑ , + 𝛿i log p𝓁2 + 𝜌 i=1 then Ip (G) ≥ Ip (G). In the following, we will show some results [42, 46] regarding the maximum and minimum entropy by using certain families of graphs. As in every tree, a dendrimer has one (monocentric dendrimer) or two (dicentric dendrimer) central vertices, the radius r denotes the (largest) distance from an external vertex to the (closer) center. If all external vertices are at distance r from the center, the dendrimer is called homogeneous. Internal vertices different from the central vertices are called branching nodes and are said to be on the ith orbit if their distance to the (nearer) center is r. Let Dn denote a homogeneous dendrimer on n vertices with radius r and progressive degree p, and let z be its (unique) center. Further denote by Vi (Dn ) the set of vertices in the ith orbit. Now we consider the function f 3 (vi ) = ci 𝜎(vi ), where ci = cj for vi , vj ∈ Vi . We denote ci = c(v), v ∈ Vi . Lemma 5.52. [46] For ci = 1 with i = 0, … , n, the entropy fulfills log n −

$\frac{1}{4\ln 2} \le I_{f^3}(D_n) \le \log n$. For $c_i = r - i + 1$ with $i = 0, \dots, n$, we have
$$\log n - \frac{(r-1)^2}{4\ln 2\,(r+1)} \le I_{f^3}(D_n) \le \log n.$$
In general, for a weight sequence $c(i)$, $i = 0, \dots, r$, where $c(i)(r+i)$ is monotonic in $i$, we have
$$\log n - \frac{(\rho-1)^2}{2\rho\ln 2} \le I_{f^3}(D_n) \le \log n,$$
where $\rho = \frac{c(1)}{2c(r)}$ for decreasing and $\rho = \frac{2c(r)}{c(1)}$ for increasing sequences. The latter estimate is also true for any sequence $c(i)$, when $\rho = \frac{\max_i\bigl(c(i)(r+i)\bigr)}{\min_j\bigl(c(j)(r+j)\bigr)}$.

Lemma 5.53. [46] For dendrimers, the entropy If 3 (Dn ) is of order log n as n tends to infinity. By performing numerical experiments, Dehmer and Kraus [46] raised the following conjecture and also gave some ideas on how to prove it.
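Lemmas 5.52 and 5.53 are easy to observe numerically. The sketch below is our own illustration: it uses networkx's balanced_tree as a stand-in for a monocentric dendrimer (note that in this generator the root has degree $p$ rather than $p+1$, so it is only an approximation of a homogeneous dendrimer), evaluates $I_{f^3}(D_n)$ with $c_i = 1$, that is, $f^3(v) = \sigma(v)$, and compares it with $\log_2 n$.

```python
import math
import networkx as nx

def eccentricity_entropy(G):
    """I_{f^3}(G) with c_i = 1, i.e., f^3(v) = sigma(v) (eccentricity), base-2 logarithm."""
    ecc = nx.eccentricity(G)
    total = sum(ecc.values())
    return -sum((e / total) * math.log2(e / total) for e in ecc.values())

p = 2
for r in range(2, 7):
    D = nx.balanced_tree(p, r)      # tree of radius r; internal vertices have degree p + 1
    n = D.number_of_nodes()
    print(r, n, round(eccentricity_entropy(D), 4), round(math.log2(n), 4))
```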


Conjecture 5.54. [46] Let $D$ be a dendrimer on $n$ vertices. For all sequences $c_0 \ge c_1 \ge \cdots \ge c_r$, the star graph ($r = 1$, $p = n-2$) has maximal and the path graph ($r = \lceil (n-1)/2\rceil$, $p = 1$) has minimal value of the entropy $I_{f^3}(D)$.

Furthermore, in Ref. [42], the authors proposed another conjecture, which is stated as follows.

Conjecture 5.55. [42] Let $D$ be a dendrimer on $n$ vertices. For all sequences $c_i = c_j$ with $i \ne j$, the star graph ($r = 1$, $p = n-2$) has the minimal value of the entropy $I_{f^3}(D)$.

Let $G$ be a generalized tree of height $h$, which is defined as in Section 5.12. Denote by $|V|$ and $|V_i|$ the total number of vertices and the number of vertices on the $i$th level, respectively. A probability distribution based on the vertices of $G$ is assigned as follows:
$$p_i^{V'} = \frac{|V_i|}{|V| - 1}.$$
Then, another entropy of a generalized tree $G$ is defined by
$$I^{V'}(G) = -\sum_{i=1}^{h} p_i^{V'} \log\bigl(p_i^{V'}\bigr).$$
Similarly, denote by $|E|$ and $|E_i|$ the total number of edges and the number of edges on the $i$th level, respectively. A probability distribution based on the edges of $G$ is assigned as follows:
$$p_i^{E'} = \frac{|E_i|}{|E| - 1}.$$
Then, another entropy of a generalized tree $G$ is defined by
$$I^{E'}(G) = -\sum_{i=1}^{h} p_i^{E'} \log\bigl(p_i^{E'}\bigr).$$
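A minimal Python sketch of the level-based quantities $I^{V'}$ and $I^{E'}$ for a rooted tree. It is our own illustration: rooting the tree by breadth-first search, assigning an edge to the level of its deeper endpoint, and the helper names are all assumptions made for the example, which follows the normalizations $|V|-1$ and $|E|-1$ used above.

```python
import math
import networkx as nx

def level_entropies(T, root):
    """I^{V'} and I^{E'} of a tree T rooted at `root` (base-2 logarithm)."""
    depth = nx.single_source_shortest_path_length(T, root)
    h = max(depth.values())
    v_level = [0] * (h + 1)
    for d in depth.values():
        v_level[d] += 1
    # an edge is placed on level i if its deeper endpoint lies on level i
    e_level = [0] * (h + 1)
    for u, v in T.edges():
        e_level[max(depth[u], depth[v])] += 1

    def entropy(counts, denom):
        return -sum((c / denom) * math.log2(c / denom) for c in counts if c > 0)

    return (entropy(v_level[1:], T.number_of_nodes() - 1),
            entropy(e_level[1:], T.number_of_edges() - 1))

T = nx.balanced_tree(2, 3)
print(level_entropies(T, root=0))
```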



Now we give some extremal properties [42] of I V (D) and I E (D), where D is a dendrimer. Theorem 5.56. [42] Let D be a dendrimer on n vertices. The star graph attains ′ ′ the minimal value of I V (D) and I E (D), and the dendrimer with parameter t = ′ V′ t0 attains the maximal value of I (D) and I E (D), where t = t0 ∈ (1, n − 2) is the integer that is closest to the root of the equation ( ) ( ) t(t + 1) 𝑛𝑡 − n + 2 2t n ln − ln − = 0. n−1 t+1 n−1 t+1 According to Rashevsky [1], |Xi | denotes the number of topologically equivalent vertices in the ith vertex orbit of G, where k is the number of different orbits. ′ |Vi | Suppose |X| = |V | − 1. Then, the probability of Xi can be expressed as pVi = |V |−1 . Therefore, by applying Eqs (5.9), (5.5), (5.7) in Definition 5.10, we can obtain the entropies as follows:


(i) $I^5(G) := \sum_{i=1}^{k} p_i^{V'}\bigl(1 - p_i^{V'}\bigr),$

(ii) $I_\alpha^1(G) := \dfrac{1}{1-\alpha}\log\left(\sum_{i=1}^{k}\bigl(p_i^{V'}\bigr)^{\alpha}\right), \qquad \alpha \ne 1,$

(iii) $I_\alpha^3(G) := \dfrac{\sum_{i=1}^{k}\bigl(p_i^{V'}\bigr)^{\alpha} - 1}{2^{1-\alpha} - 1}, \qquad \alpha \ne 1.$

Theorem 5.57. [42] Let $D$ be a dendrimer on $n$ vertices. (i) The star graph and path graph attain the minimal and maximal value of $I^5(D)$. (ii) For $\alpha \ne 1$, the star graph and path graph attain the minimal and maximal value of $I_\alpha^1(D)$. (iii) For $\alpha \ne 1$, the star graph and path graph attain the minimal and maximal value of $I_\alpha^3(D)$.

Next, we describe the algorithm for uniquely decomposing a graph $G \in UC$ into a set of undirected generalized trees [47].

Algorithm 5.1. A graph $G \in UC$ with $|V|$ vertices can be locally decomposed into a set of generalized trees as follows. Assign vertex labels to all vertices from 1 to $|V|$; these labels form the label set $L_S = \{1, \dots, |V|\}$. Choose a desired height of the trees, denoted by $h$. Choose an arbitrary label from $L_S$, for example, $i$; the vertex with this label is the root vertex of a tree. Now, perform the following steps:
(1) Calculate the shortest distance from the vertex $i$ to all other vertices in the graph $G$, for example, by the algorithm of Dijkstra [48].
(2) The vertices with distance $k$ from the vertex $i$ are the vertices on the $k$th level of the resulting generalized tree. Select all vertices of the graph up to distance $h$, including the connections between the vertices. Connections to vertices with distance $> h$ are deleted.
(3) Delete the label $i$ from the label set $L_S$.
(4) Repeat this procedure if $L_S$ is not empty by choosing an arbitrary label from $L_S$; otherwise, terminate.
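A compact Python rendering of Algorithm 5.1, given purely as our own sketch: breadth-first search replaces Dijkstra's algorithm (equivalent for unweighted graphs), and the function name and example graph are assumptions.

```python
import networkx as nx

def decompose_into_generalized_trees(G, h):
    """Algorithm 5.1: for every vertex i, keep the vertices within distance h of i
    (together with the edges among them) as one generalized tree rooted at i."""
    trees = []
    for i in G.nodes():
        dist = nx.single_source_shortest_path_length(G, i, cutoff=h)
        H = G.subgraph(dist.keys()).copy()      # drops connections to vertices farther than h
        levels = dict(dist)                      # level of a vertex = its distance from the root i
        trees.append((i, H, levels))
    return trees

G = nx.erdos_renyi_graph(12, 0.3, seed=1)
for root, H, levels in decompose_into_generalized_trees(G, h=2)[:3]:
    print(root, H.number_of_nodes(), max(levels.values()))
```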

Now we replace $p_i^{E'} = \frac{|E_i|}{|E|-1}$ by $p_i^{E'} = \frac{|E_i|}{2|E| - d(r)}$, where $r$ is the root of the gener-

alized tree and d(r) is the degree of r. Then, we can obtain a new I E (G), which is defined similarly as above. In addition, we give another definition of the structural information content of a graph as follows. Definition 5.29. [47] Let G ∈ UC and SGH ∶= {H1 , H2 , … , H|V | } be the associated set of generalized trees obtained from Algorithm 5.1. We now define the


structural information content of $G$ by
$$I^{V'}(G) := -\sum_{i=1}^{|V|} I^{V'}(H_i)$$
and
$$I^{E'}(G) := -\sum_{i=1}^{|V|} I^{E'}(H_i).$$

In Ref. [47], Dehmer analyzed the time complexities for calculating the entropies ′ ′ I V (G) and I E (G) depending on the decomposition given by Algorithm 5.1. ′



Theorem 5.58. [47] The overall time complexity to calculate I V (G) and I E (G) is ∑|V | finally O(|V |3 + i=1 |VHi |2 ). Let Tn,d be the family of trees of order n with a fixed diameter d. We call a tree consisting of a star on n − d + 1 vertices together with a path of length d − 1 attached to the central vertex, a comet of order n with tail length d − 1, and denote it by Cn,d−1 . Analogously, we call a tree consisting of a star on n − d vertices together with two paths of lengths ⌊ d2 ⌋ and ⌈ d2 ⌉, respectively, attached to the central vertex, a two-tailed comet of order n and denote it by Cn,⌊ d ⌋,⌈ d ⌉ . 2

2

Theorem 5.59. [46] For every linearly or exponentially decreasing sequence c1 > c2 > · · · > cd with d ≥ 4 as well as every quadratically decreasing sequence with d ≥ 5, for large enough n, the probability distribution q(n, d) of the two-tailed comet Cn,⌊ d ⌋,⌈ d ⌉ is the majority by the probability distribution p(n, d) of the comet Cn,d−1 . 2 2 This is equivalent to the fact that Δp = q − p fulfills condition (ii) of Lemma 5.48. Hence, ( ) If 2 Cn,⌊ d ⌋,⌈ d ⌉ ≥ If 2 (Cn,d−1 ). 2

2

Conjecture 5.56. [46] Among all trees Tn,d , with d ≪ n, the two-tailed comet Cn,⌊ n ⌋,⌈ n ⌉ achieves maximal value of the entropies If 2 (G) and If 3 (G). 2

2

5.2.8 Sphere-Regular Graphs and the Extremality of the Entropies $I_{f^2}(G)$ and $I_{f^\sigma}(G)$

Let $G = (V, E)$ be a connected graph with $|V| = n$ vertices. As we have defined before, the information function $f^2(v_i) = c_1|S_1(v_i,G)| + c_2|S_2(v_i,G)| + \cdots + c_\rho|S_\rho(v_i,G)|$, where $c_k > 0$, $1 \le k \le \rho$, $\alpha > 0$, and $S_j(v,G)$ is the $j$-sphere of the vertex $v$. Now we define another information function.

Definition 5.30. The eccentricity function $f^\sigma$ is defined by
$$f^\sigma: V \to \mathbb{Z}, \qquad f^\sigma(v) = \sigma(v).$$


Applying Eq. (5.3) in Definition 5.8, we can obtain the following two graph entropy measures [49]:
$$I_{f^2}(G) := -\sum_{i=1}^{n} \frac{f^2(v_i)}{\sum_{j=1}^{n} f^2(v_j)} \log\frac{f^2(v_i)}{\sum_{j=1}^{n} f^2(v_j)},$$
$$I_{f^\sigma}(G) := -\sum_{i=1}^{n} \frac{f^\sigma(v_i)}{\sum_{j=1}^{n} f^\sigma(v_j)} \log\frac{f^\sigma(v_i)}{\sum_{j=1}^{n} f^\sigma(v_j)}.$$

In Ref. [49], the authors proposed the concept of sphere-regular. Definition 5.31. [49] We call a graph sphere-regular if there exist positive integers s1 , … , s𝜌(G) , such that (|S1 (v, G)|, |S2 (v, G)|, … , |S𝜌(G) (v, G)|) = (s1 , … , s𝜌(G) ) for all vertices v ∈ V . In Ref. [49], the authors also tried to classify those graphs which return maximal value of entropy If 2 (G) for the sphere-function and an arbitrary decreasing weight sequence. In the following, we state their results. Proposition 5.61. [49] Every sphere-regular graph with n vertices has maximum entropy If 2 (G) = log n. Lemma 5.62. [49] Sphere-regular graphs are the only maximal graphs for If 2 (G) when using a weight sequence such that there exists no number ai , i = 1, … , 𝜌(G), ∑𝜌(G) ai ∈ ℤ with i=1 ai = 0, where 𝜌(G)



aj cj = 0.

j=1

Theorem 5.63. [49] There are maximal graphs with respect to If 2 (G) which are not sphere-regular. Next, we will present some restrictions on maximal graphs for If 2 (G), which are valid for any decreasing weight sequence. Lemma 5.64. [49] A graph of diameter 2 is maximal for If 2 (G) only if it is sphereregular. Lemma 5.65. [49] Maximal graphs for If 2 (G) cannot have unary vertices (vertices with degree 1). Hence, in particular, trees cannot be maximal for If 2 (G). Corollary 5.66. [49] The last nonzero entries of the sphere-sequence of a vertex in a maximal graph cannot be two or more consecutive ones. Lemma 5.67. [49] A maximal graph for If 2 (G) different from the complete graph Kn cannot contain a vertex of degree n − 1.


Figure 5.3 Minimal graphs for the entropy If 2 (G) of orders 8 and 9.

In Ref. [49], the authors gave the minimal graphs for If 2 (G) of orders 8 and 9 by computations, which are depicted in Figure 5.3. The graphs on the left-hand side are minimal for the linear sequence and those on the right-hand side are minimal for the exponential sequence. Unfortunately, there is very little known about minimal entropy graphs. And the authors gave the following conjecture in Ref. [49]. Conjecture 5.68. [49] The minimal graph for If 2 (G) with the exponential sequence √ is a tree. Further, it is√ a generalized star of diameter approximately 2n, and hence, with approximately 2n branches. Interestingly, the graph 9(𝑙𝑖𝑛𝑒𝑎𝑟) is also one of the maximal graphs for If 𝜎 (G) in N9 , where Ni is the set of all nonisomorphic graphs on i vertices. In addition, one elementary result on maximal graphs with respect to f 𝜎 is also obtained. Lemma 5.69. [49] (i) A graph G is maximal with respect to If 𝜎 (G) only if its every vertex is an endpoint of a maximal path in G. (ii) A maximal graph different from the complete graph Kn cannot contain a vertex of degree n − 1. Similar to the case of If 2 (G), there is still very little known about minimal entropy graphs respect to If 𝜎 (G). For N8 and N9 , computations show that there are two minimal graphs. For n = 8, they are depicted in Figure 5.4, for n = 9, they contain five vertices of degree 8 each. Kraus et al. [49] gave another conjecture as follows. Conjecture 5.70. [49] A minimal graph for If 𝜎 (G) is a highly connected graph, that is, it is a graph obtained from the complete graph Kn by removal of a small number


Figure 5.4 Two minimal graphs for the graph entropy If 𝜎 (G) of order 8.

of edges. In particular, we conjecture that a minimal graph for If 𝜎 (G) on n vertices will have m ≥ n2 vertices of degree n − 1. 5.2.9 Information Inequalities for Generalized Graph Entropies

Sivakumar and Dehmer [50] discussed the problem of establishing relationships between information measures for network structures. Two types of entropy measures, namely the Shannon entropy and its generalization, the Rényi entropy, have been considered for their study. They established formal relationships, by means of inequalities, between these two types of measures. In addition, they proved inequalities connecting the classical partition-based graph entropies and partition-independent entropy measures, and also gave several explicit inequalities for special classes of graphs. To begin with, we give the theorem which provides the bounds for the Rényi entropy in terms of the Shannon entropy.

Theorem 5.71. [50] Let $p^f(v_1), p^f(v_2), \dots, p^f(v_n)$ be the probability values on the vertices of a graph $G$ with $n$ vertices. Then, the Rényi entropy can be bounded by the Shannon entropy as follows: when $0 < \alpha < 1$,
$$I_f(G) \le I_\alpha^2(G)_f < I_f(G) + \frac{n(n-1)(1-\alpha)\,\rho^{\alpha-2}}{2\ln 2},$$
when $\alpha > 1$,
$$I_f(G) - \frac{(\alpha-1)\,n(n-1)}{2\ln 2 \cdot \rho^{\alpha-2}} < I_\alpha^2(G)_f \le I_f(G),$$
where $\rho = \max_{i,k} \frac{p^f(v_i)}{p^f(v_k)}$.

Observe that Theorem 5.71, in general, holds for any probability distribution with nonzero probability values. The following theorem illustrates this fact with the help of a probability distribution obtained by partitioning a graph object.
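The ordering asserted by Theorem 5.71 can be explored numerically. The sketch below is our own illustration; the degree-based probabilities are just one convenient example of vertex probability values, and the helper names are assumptions.

```python
import math
import networkx as nx

def shannon(p):
    """Shannon entropy of a probability vector, base 2."""
    return -sum(x * math.log2(x) for x in p)

def renyi(p, alpha):
    """Renyi entropy of order alpha (alpha != 1), base 2."""
    return math.log2(sum(x ** alpha for x in p)) / (1 - alpha)

G = nx.erdos_renyi_graph(30, 0.2, seed=7)
deg = [d for _, d in G.degree() if d > 0]   # keep the probability values nonzero
p = [d / sum(deg) for d in deg]

for alpha in (0.5, 2.0):
    # for 0 < alpha < 1 the Renyi value is >= the Shannon value, for alpha > 1 it is <=
    print(alpha, round(shannon(p), 4), round(renyi(p, alpha), 4))
```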


Theorem 5.72. [50] Let p1 , p2 , … , pk be the probabilities of the partitions obtained using an equivalence relation 𝜏 as stated before. Then, when 0 < 𝛼 < 1, I(G, 𝜏) ≤ I𝛼1 (G) < I(G, 𝜏) +

k(k − 1)(1 − 𝛼)𝜌𝛼−2 , 2 ln 2

when 𝛼 > 1, I(G, 𝜏) ≥ I𝛼1 (G) > I(G, 𝜏) −

k(k − 1)(𝛼 − 1) , 2 ln 2 ⋅ 𝜌𝛼−2

p

where 𝜌 = maxi,j pi . j

In the next theorem, bounds between like-entropy measures are established, by considering the two different probability distributions. Theorem 5.73. [50] Let G be a graph with n vertices. Suppose |Xi | < f (vi ) for 1 ≤ i ≤ k. Then, ) ( 𝛼 S log2 I𝛼1 (G) < I𝛼2 (G)f + 1−𝛼 |X| if 0 < 𝛼 < 1, and I𝛼1 (G) > I𝛼2 (G)f − if 𝛼 > 1. Here S =

∑n i=1

𝛼 log 𝛼−1 2

(

S |X|

)

f (vi ).

Furthermore, Sivakumar and Dehmer [50] also paid attention to generalized graph entropies, which is inspired by the Rényi entropy, and presented various bounds when two different functions and their probability distributions satisfy certain initial conditions. Let f1 and f2 be two information functions defined on ∑n ∑n G = (V , E) with |V | = n. Let S1 = i=1 f1 (vi ) and S2 = i=1 f2 (vi ). Let pf1 (v) and pf2 (v) denote the probabilities of f1 and f2 , respectively, on a vertex v ∈ V . Theorem 5.74. [50] Suppose pf1 (v) ≤ 𝜓pf2 (v), ∀v ∈ V , and 𝜓 > 0 a constant. Then, if 0 < 𝛼 < 1, I𝛼2 (G)f1 ≤ I𝛼2 (G)f2 + and if 𝛼 > 1, I𝛼2 (G)f1 ≥ I𝛼2 (G)f2 −

𝛼 log 𝜓, 1−𝛼 2 𝛼 log 𝜓. 𝛼−1 2

Theorem 5.75. [50] Suppose pf1 (v) ≤ pf2 (v) + 𝜙, ∀v ∈ V , and 𝜙 > 0 a constant. Then, n ⋅ 𝜙𝛼 1 I𝛼2 (G)f1 − I𝛼2 (G)f2 < ∑ 1 − 𝛼 v∈V (pf2 (v))𝛼


if 0 < 𝛼 < 1, and I𝛼2 (G)f2 − I𝛼2 (G)f1
1. Theorem 5.76. [50] Let f (v) = c1 f1 (v) + c2 f2 (v), ∀v ∈ V . Then, for 0 < 𝛼 < 1,
1, 𝛼 𝛼 A2 log A − I𝛼2 (G)f > I𝛼2 (G)f1 − 𝛼 − 1 2 1 𝛼 − 1 A1 where A1 =

c1 S1 c1 S1 +c2 S2

and A2 =

(∑

( f )𝛼 ) 𝛼1 2 v∈V p (v) , ∑ f1 𝛼 v∈V (p (v))

c2 S2 . c1 S1 +c2 S2

Theorem 5.77. [50] Let f (v) = c1 f1 (v) + c2 f2 (v), ∀v ∈ V . Then, if 0 < 𝛼 < 1, I𝛼2 (G)f
1, I𝛼2 (G)f >

where A1 =

1 2 𝛼 [I (G)f1 + I𝛼2 (G)f2 ] − log (A A ) 2 𝛼 2(𝛼 − 1) 2 1 2 ( )1 ( )1 ⎡ A ∑ (pf2 (v))𝛼 𝛼 A ∑ (pf1 (v))𝛼 𝛼 ⎤ 𝛼 v∈V v∈V 2 1 ⎢ ⎥, − + ∑ ∑ ⎥ 2(𝛼 − 1) ⎢ A1 A2 (pf1 (v))𝛼 (pf2 (v))𝛼 v∈V v∈V ⎣ ⎦

c1 S1 c1 S1 +c2 S2

and A2 =

c2 S2 . c1 S1 +c2 S2

Let Sn be a star on n vertices, whose central vertex is denoted by u. Let 𝜏 be an automorphism defined on Sn such that 𝜏 partitions V (Sn ) into two orbits, V1 and V2 , where V1 = {u} and V2 = V (Sn ) − {u}. Theorem 5.78. [50] If 𝜏 is the automorphism, as defined above, on Sn , then, for 0 < 𝛼 < 1, I𝛼1 (Sn ) < log2 n −

(1 − 𝛼)(n − 1)𝛼−2 n−1 log2 (n − 1) + , n ln 2

5.2

Inequalities and Extremal Properties on (Generalized) Graph Entropies

and for 𝛼 > 1, I𝛼1 (Sn ) > log2 n −

n−1 𝛼−1 log2 (n − 1) − . n (n − 1)𝛼−2 ln 2

Theorem 5.79. [50] Let 𝜏 be an automorphism on V (Sn ) and let f be any information function defined on V (Sn ) such that |V1 | < f (vi ) and |V2 | < f (vj ) for some i, j, 1 ≤ i ≠ j ≤ n. Then, for 0 < 𝛼 < 1, I𝛼2 (Sn )f >

1 𝛼 log (1 + (n − 1)𝛼 ) − log S, 1−𝛼 2 1−𝛼 2

and for 𝛼 > 1, 1 𝛼 I𝛼2 (Sn )f < log (1 + (n − 1)𝛼 ) + log S, 1−𝛼 2 𝛼−1 2 ∑ where S = v∈V f (v). The path graph, denoted by Pn , are the only trees with maximum diameter among all the trees on n vertices. Let 𝜏 be an automorphism defined on Pn , where 𝜏 partitions the vertices of Pn into n2 orbits (Vi ) of size 2, when n is even, and n−1 2 orbits of size 2 and one orbit of size 1, when n is odd. Sivakumar and Dehmer [50] derived equalities and inequalities on generalized graph entropies I𝛼1 (Pn ) and I𝛼2 (Pn )f depending on the parity of n. Theorem 5.80. [50] Let n be an even integer and f be any information function such that f (v) > 2 for at least n2 vertices of Pn and let 𝜏 be stated as above. Then, n I𝛼1 (Pn ) = log2 , 2 and I𝛼2 (Pn )f >

1 𝛼 log n − log S − 1 1−𝛼 2 1−𝛼 2

if 0 < 𝛼 < 1, 1 𝛼 log n + log S − 1 1−𝛼 2 𝛼−1 2 ∑ if 𝛼 > 1, where S = v∈V f (v). I𝛼2 (Pn )f
1, log2 n −

] [ (n + 1) ⋅ (𝛼 − 1) 1 n−1 ≥ I𝛼1 (Pn ) > log2 n − (n − 1) . + n n ln 2 ⋅ 2𝛼+1

169

170

5 Graph Entropy: Recent Results and Perspectives

Further, if f is an information function such that f (v) > 2 for at least of Pn , then,

n+1 2

vertices

1 𝛼 n−1 log n − log S − 1−𝛼 2 1−𝛼 2 n if 0 < 𝛼 < 1, and I𝛼2 (Pn )f >

1 𝛼 n−1 log n + log S − 1−𝛼 2 𝛼−1 2 n ∑ if 𝛼 > 1, where S = v∈V f (v). I𝛼2 (Pn )f
1, 𝛼(n − 1)X 𝛼(n − 1)X log2 𝛽 ≤ I𝛼2 (G)fP ≤ log2 n + log2 𝛽, 𝛼−1 𝛼−1 where X = cmax − cmin . log2 n −

Theorem 5.83. [50] Let fP′ be given by Eq. (5.15). Let cmax = max{ci ∶ 1 ≤ i ≤ 𝜌(G)} and cmin = min{ci ∶ 1 ≤ i ≤ 𝜌(G)}, where ci is defined in fP′ . Then, the value of I𝛼2 (G)fP′ can be bounded as follows. If 0 < 𝛼 < 1, log2 n −

𝛼 𝛼 log Y ≤ I𝛼2 (G)fP′ ≤ log2 n + log Y , 1−𝛼 2 1−𝛼 2

and if 𝛼 > 1, log2 n − where Y =

𝛼 𝛼 log Y ≤ I𝛼2 (G)fP′ ≤ log2 n + log Y , 𝛼−1 2 𝛼−1 2

cmax . cmin

5.3

Relationships between Graph Structures, Graph Energies, Topological Indices

5.3 Relationships between Graph Structures, Graph Energies, Topological Indices, and Generalized Graph Entropies

In this section, we introduce 10 generalized graph entropies based on distinct graph matrices. Connections between such generalized graph entropies and the graph energies, the spectral moments, and topological indices are provided. Moreover, we will give some extremal properties of these generalized graph entropies and several inequalities between them. Let G be a graph of order n and M be a matrix related to the graph G. Let 𝜇1 , 𝜇2 , … , 𝜇n be the eigenvalues of M (or the singular values for some matrices). If f ∶= |𝜇i |, then, as defined in Definition 5.7, |𝜇 | . pf (vi ) = ∑n i j=1 |𝜇j | Therefore, the generalized graph entropies are defined as follows: (i) 6

I (G)𝜇 =

n ∑

∑n

i=1

(ii)

[

|𝜇i |

j=1

|𝜇j |

]

|𝜇i |

1 − ∑n

j=1

|𝜇j |

,

( n ( )𝛼 ) ∑ |𝜇 | 1 i I𝛼2 (G)𝜇 = log , ∑n 1−𝛼 i=1 j=1 |𝜇j |

(iii)

( I𝛼4 (G)𝜇

=

1

n ∑

21−𝛼 − 1

i=1

(

|𝜇i |

∑n

j=1

|𝜇j |

)𝛼

𝛼 ≠ 1, )

−1 ,

𝛼 ≠ 1.

1. Let A(G) be the adjacency matrix of graph G, and the eigenvalues of A(G), 𝜆1 , 𝜆2 , … , 𝜆n , are said to be the eigenvalues of the graph G. The energy of G is ∑n 𝜀(G) = |𝜆 |. The kth spectral moment of the graph G is defined as Mk (G) = ∑n k i=1 i 𝜆i . In Ref. [51], the authors defined the moment-like quantities, Mk∗ (G) = i=1 ∑n k i=1 |𝜆i | . Theorem 5.84. [52] Let G be a graph with n vertices and m edges. Then, for 𝛼 ≠ 1, we have (i) I 6 (G)𝜆 = 1 − (ii) I𝛼2 (G)𝜆 =

2m , 𝜀2

M∗ 1 log 𝛼𝛼 , 1−𝛼 𝜀

171

172

5 Graph Entropy: Recent Results and Perspectives

(iii) I𝛼4 (G)𝜆 =

1 21−𝛼 − 1

(

) M𝛼∗ − 1 , 𝜀𝛼

where 𝜀 denotes the energy of graph G and M𝛼∗ =

n ∑ i=1

|𝜆i |𝛼 .

The above theorem directly implies that for a graph G, each upper (lower) bound of energy can be used to deduce an upper (a lower) bound of I 6 (G)𝜆 . Corollary 5.85. [52] (i) For a graph G with m edges, we have 1 1 ≤ I 6 (G)𝜆 ≤ 1 − . 2 2m (ii) Let G be a graph with n vertices and m edges. Then, 1 . n (iii) Let T be a tree of order n. We have I 6 (G)𝜆 ≤ 1 −

I 6 (Sn )𝜆 ≤ I 6 (T)𝜆 ≤ I 6 (Pn )𝜆 , where Sn and Pn denote the star graph and path graph of order n, respectively. (iv) Let G be a unicyclic graph of order n. Then, we have I 6 (G)𝜆 ≤ I 6 (Pn6 )𝜆 ,

(v)

where Pn6 [53, 54] denotes the unicyclic graph obtained by connecting a vertex of C6 with a leaf of order Pn−6 . Let G be a graph with n vertices and m edges. If its cyclomatic number is k = m − n + 1, then, we have 2m I 6 (G)𝜆 ≤ 1 − , (4n∕𝜋 + ck )2 where ck is a constant, which only depends on k.

2. Let Q(G) be the signless Laplacian matrix of a graph G. Then, Q(G) = D(G) + A(G), where D(G) = diag(d1 , d2 , … , dn ) denotes the diagonal matrix of vertex degrees of G and A(G) is the adjacency matrix of G. Let q1 , q2 , … , qn be the eigenvalues of Q(G). Theorem 5.86. [55] Let G be a graph with n vertices and m edges. Then, for 𝛼 ≠ 1, we have (i) I 6 (G)q = 1 − (ii) I𝛼2 (G)q =

1 (M + 2m), 4m2 1

M𝛼∗ 1 log , 1−𝛼 (2m)𝛼

5.3

Relationships between Graph Structures, Graph Energies, Topological Indices

(iii) I𝛼4 (G)q =

1 21−𝛼 − 1

(

) M𝛼∗ − 1 , (2m)𝛼

where M1 denotes the first Zagreb index and M𝛼∗ =

n ∑ i=1

|qi |𝛼 .

Corollary 5.87. [55] (i) For a graph G with n vertices and m edges, we have I 6 (G)q ≤ 1 −

1 1 − . 2m n

(ii) Let G be a graph with n vertices and m edges. The minimum degree of G is 𝛿 and the maximum degree of G is Δ. Then, I 6 (G)q ≥ 1 −

1 Δ2 + 𝛿 2 1 − − , 2m 2n 4nΔ𝛿

with equality only if G is a regular graph, or G is a graph whose vertices have exactly two degrees Δ and 𝛿 such that Δ + 𝛿 divides 𝛿n and there are exactly 𝛿n Δn p = 𝛿+Δ vertices of degree Δ and q = 𝛿+Δ vertices of degree 𝛿. 3. Let ℒ (G) and (G) be the normalized Laplacian matrix and the normalized signless Laplacian matrix, respectively. By definition, ℒ (G) = 1 1 1 1 D(G)− 2 L(G)D(G)− 2 and (G) = D(G)− 2 Q(G)D(G)− 2 , where D(G) is the diagonal matrix of vertex degrees, and L(G) = D(G) − A(G), Q(G) = D(G) + A(G) are, respectively, the Laplacian and the signless Laplacian matrices of the graph G. Denote the eigenvalues of ℒ (G) and (G) by 𝜇1 , 𝜇2 , … , 𝜇n and q1 , q2 , … , qn , respectively. Theorem 5.88. [55] Let G be a graph with n vertices and m edges. Then, for 𝛼 ≠ 1, we have (i) I 6 (G)𝜇 = I 6 (G)q = 1 − (ii)

1 (n + 2R−1 (G)), n2 ′

M∗ M∗ 1 1 log 𝛼𝛼 , I𝛼2 (G)q = log 𝛼𝛼 , = 1−𝛼 n 1−𝛼 n ( ′ ) ( ∗ ) M𝛼 M𝛼∗ 1 1 4 4 − 1 , I𝛼 (G)q = 1−𝛼 −1 , I𝛼 (G)𝜇 = 1−𝛼 n𝛼 2 − 1 n𝛼 2 −1 I𝛼2 (G)𝜇

(iii)

where R−1 (G) denotes the general Randi´c index R𝛽 (G) of G with 𝛽 = −1 and n n ∑ ∑ ′ M𝛼∗ = |𝜇i |𝛼 , M𝛼∗ = |qi |𝛼 . i=1

i=1

173

174

5 Graph Entropy: Recent Results and Perspectives

Corollary 5.89. [55] (i)

For a graph G with n vertices and m edges, if n is odd, then we have 2 1 1 ≤ I 6 (G)𝜇 = I 6 (G)q ≤ 1 − + , n n2 n−1 if n is even, then we have 1−

1 2 ≤ I 6 (G)𝜇 = I 6 (G)q ≤ 1 − n n−1 with right equality only if G is a complete graph, and with left equality only if G is the disjoint union of n2 paths of length 1 for n is even, and is the disjoint union of n−3 paths of length 1 and a path of length 2 for n is odd. 2 (ii) Let G be a graph with n vertices and m edges. The minimum degree of G is 𝛿 and the maximum degree of G is Δ. Then, 1−

1 1 1 1 − ≤ I 6 (G)𝜇 = I 6 (G)q ≤ 1 − − . n n𝛿 n nΔ Equality occurs in both bounds only if G is a regular graph. 1−

4. Let I(G) be the incidence matrix of a graph G with vertex set V (G) = {v1 , v2 , … , vn } and edge set E(G) = {e1 , e2 , … , em }, such that the (i, j)-entry of I(G) is 1 if the vertex vi is incident with the edge ej , and is 0 otherwise. As we know, Q(G) = D(G) + A(G) = I(G) ⋅ I T (G). If the eigenvalues of Q(G) are √ √ √ q1 , q2 , … , qn , then q1 , q2 , … , qn are the singular values of I(G). In addition, ∑n √ the incidence energy of G is defined as IE(G) = i=1 qi . Theorem 5.90. [55] Let G be a graph with n vertices and m edges. Then, for 𝛼 ≠ 1, we have (i) I 6 (G)√q = 1 − (ii)

2m , (IE(G))2

M𝛼∗ 1 log , 1−𝛼 (IE(G))𝛼 ( ) M𝛼∗ 1 = 1−𝛼 − 1 , 2 − 1 (IE(G))𝛼

I𝛼2 (G)√q = (iii) I𝛼4 (G)√q

where IE(G) denotes the incidence energy of G and M𝛼∗ =

n √ ∑ ( qi )𝛼 .

i=1

Corollary 5.91. [55] (i) For a graph G with n vertices and m edges, we have 1 . n The left equality holds only if m ≤ 1, whereas the right equality holds only if m = 0. 0 ≤ I 6 (G)√q ≤ 1 −

5.3

Relationships between Graph Structures, Graph Energies, Topological Indices

(ii) Let T be a tree of order n. Then, we have I 6 (Sn )√q ≤ I 6 (T)√q ≤ I 6 (Pn )√q , where Sn and Pn denote the star and path of order n, respectively. 5. Let the graph G be a connected graph, whose vertices are v1 , v2 , … , vn . The distance matrix of G is defined as D(G) = [d𝑖𝑗 ], where d𝑖𝑗 is the distance between the vertices vi and vj in G. We denote the eigenvalues of D(G) by 𝜇1 , 𝜇2 , … , 𝜇n . The ∑n distance energy of the graph G is DE(G) = i=1 |𝜇i |. ∑ The kth distance moment of G is defined as Wk (G) = 12 1≤i (1−2 𝛼−1) ln 2 I𝛼4 . (ii) When 𝛼 ≥ 2 and 0 < 𝛼 < 1, we have I𝛼4 > I 6 ; when 1 < 𝛼 < 2, we have I 6 > (1 − 21−𝛼 )I𝛼4 . 1−𝛼 (iii) When 𝛼 ≥ 2, we have I𝛼2 > (1−2 𝛼−1) ln 2 I 6 ; when 1 < 𝛼 < 2, we have (i)

I𝛼2 >

(1−21−𝛼 )2 ln 2 6 I ; 𝛼−1

when 0 < 𝛼 < 1, we have I𝛼2 > I 6 .

5.4 Summary and Conclusion

The entropy of a probability distribution can be interpreted as a measure of not only uncertainty but also information, and the entropy of a graph is an information-theoretic quantity for measuring the complexity of a graph. Information-theoretic network complexity measures have already been intensely used in mathematical and medicinal chemistry including drug design. So far, numerous such measures have been developed such that it is meaningful to show relationships between them. This chapter mainly attempts to capture the extremal properties of different (generalized) graph entropy measures and describe various connections and relationships between (generalized) graph entropies and other variables in graph theory. The first section aims to introduce various entropy measures contained in distinct entropy measure classes. Inequalities and extremal properties of graph entropies and generalized graph entropies, which are based on different information functions or distinct graph classes, have been described in Section 5.2. The last section focuses on the generalized graph entropies and shows the relationships between graph structures, graph energies, topological indices, and some selected generalized graph entropies. In addition, throughout this chapter, we also state various applications of graph entropies together with some open problems and conjectures for further research. In fact, graph entropy measures can be used to derive the so-called implicit information inequalities for graphs. In general, information inequalities describe relationships between information measures for graphs. In Ref. [17], the authors found and proved implicit information inequalities, which were also stated in the survey paper [10]. As a consequence, we will not give the detail results in this aspect. It is worth mentioning that many analyses have been conducted and numerical results have been obtained, which we refer to [17, 23, 27, 28, 33, 42, 56, 57] for details. These numerical results imply that the change of different entropies corresponds to different structural properties of graphs. Even for special graphs, such as trees, stars, paths, and regular graphs, the increase or decrease of graph entropies implies special properties of these graphs. As is known to all, graph entropy measures have important applications in a variety of problem areas,


including information theory, biology, chemistry, and sociology, which we refer to [11, 24, 58–64] for details. This further inspires researchers to explore the extremal properties and relationships among these (generalized) graph entropies.

References 1. Rashevsky, N. (1955) Life, informa-

2.

3.

4.

5.

6.

7.

8.

9.

10.

11.

tion theory and topology. Bull. Math. Biophys., 17, 229–235. Trucco, E. (1956) A note on the information content of graphs. Bull. Math. Biophys., 18 (2), 129–135. Mowshowitz, A. (1968) Entropy and the complexity of the graphs I: an index of the relative complexity of a graph. Bull. Math. Biophys., 30, 175–204. Mowshowitz, A. (1968) Entropy and the complexity of graphs II: the information content of digraphs and infinite graphs. Bull. Math. Biophys., 30, 225–240. Mowshowitz, A. (1968) Entropy and the complexity of graphs III: graphs with prescribed information content. Bull. Math. Biophys., 30, 387–414. Mowshowitz, A. (1968) Entropy and the complexity of graphs IV: entropy measures and graphical structure. Bull. Math. Biophys., 30, 533–546. Körner, J. (1973) Coding of an information source having ambiguous alphabet and the entropy of graphs. Transactions of the 6th Prague Conference on Information Theory, pp. 411–425. Simonyi, G. (1995) Graph entropy: a survey, in Combinatorial Optimization, DIMACS Series in Discrete Mathematics and Theoretical Computer Science (eds W. Cook, L. Lovász, and P. Seymour) vol. 20, AMS, pp. 399–441. Simonyi, G. (2001) Perfect graphs and graph entropy: an updated survey, in Perfect Graphs (eds J. Ramirez-Alfonsin and B. Reed), John Wiley & Sons, pp. 293–328. Dehmer, M. and Mowshowitz, A. (2011) A history of graph entropy measures. Inf. Sci., 181, 57–78. Dehmer, M., Barbarini, N., Varmuza, K., and Graber, A. (2009) A large scale analysis of information-theoretic network complexity measures using chemical structures. PLoS ONE, 4 (12), e8057.

12. Mowshowitz, A. and Mitsou, V. (2009)

13. 14.

15.

16.

17.

18.

19.

20.

21.

22. 23.

24.

25.

Entropy, Orbits and Spectra of Graphs, Analysis of Complex Networks: From Biology to Linguistics, Wiley-VCH Verlag GmbH, pp. 1–22. Harary, F. (1969) Graph Theory, Addison Wesley Publishing Company. Bonchev, D. (1983) Information Theoretic Indices for Characterization of Chemical Structures, Research Studies Press. Bonchev, D. (1979) Information indices for atoms and molecules. Commun. Math. Comput. Chem., 7, 65–113. Bonchev, D. and Rouvray, D. (2005) Complexity in Chemistry, Biology, and Ecology, Springer-Verlag. Dehmer, M. and Mowshowitz, A. (2010) Inequalities for entropy-based measures of network information content. Appl. Math. Comput., 215, 4263–4271. Lyons, R. (2005) Asymptotic enumeration of spanning trees. Comb. Probab. Comput., 14, 491–522. Lyons, R. (2010) Identities and inequalities for tree entropy. Comb. Probab. Comput., 19, 303–313. Shannon, C.E. and Weaver, W. (1949) The Mathematical Theory of Communication, University of Illinois Press. Rényi, P. (1961) On measures of information and entropy. Stat. Probab., 1, 547–561. Arndt, C. (2004) Information Measures, Springer-Verlag. Dehmer, M. and Mowshowitz, A. (2011) Generalized graph entropies. Complexity, 17, 45–50. Dehmer, M., Barbarini, N., Varmuza, K., and Graber, A. (2010) Novel topological descriptors for analyzing biological networks. BMC Struct. Biol., 10 (1), 18. Bonchev, D. (2009) Information theoretic measures of complexity. Encycl. Compl. Syst. Sci., 5, 4820–4838.

References 26. Dehmer, M., Mowshowitz, A., and

27.

28.

29.

30. 31.

32. 33.

34.

35.

36.

37.

38.

39.

40.

Emmert-Streib, F. (2011) Connections between classical and parametric network entropies. PLoS ONE, 6 (1), e15733. Dehmer, M. (2008) Information processing in complex networks: graph entropy and information functions. Appl. Math. Comput., 201, 82–94. Emmert-Streib, F. and Dehmer, M. (2007) Information theoretic measures of UHG graphs with low computational complexity. Appl. Math. Comput., 190, 1783–1794. Bonchev, D. and Rouvray, D.H. (1991) Chemical Graph Theory: Introduction and Fundamentals, Mathematical Chemistry. Diudea, M.V., Gutman, I., and Jäntschi, L. (2001) Molecular Topology, Nova. Gutman, I. and Polansky, O.E. (1986) Mathematical Concepts in Organic Chemistry, Springer-Verlag. Trinajsti´c, . (1992) Chemical Graph Theory, CRC. Dehmer, M., Borgert, S., and Emmert-Streib, F. (2008) Entropy bounds for hierarchical molecular networks. PLoS ONE, 3 (8), e3079. Dehmer, M. and Sivakumar, L. (2012) Recent developments in quantitative graph theory: information inequalities for networks. PLoS ONE, 7 (2), e31395. BollobÍás, B. and Nikiforov, V. (2004) Degree powers in graphs with forbidden subgraphs. Electron. J. Comb., 11, R42. Bollobás, B. and Nikiforov, V. (2012) Degree powers in graphs: the ErdösStone theorem. Comb. Probab. Comput., 21, 89–105. Goodman, A.W. (1959) On sets of acquaintances and strangers at any party. Am. Math. Mon., 66, 778–783. Goodman, A.W. (1985) Triangles in a complete chromatic graph. J. Aust. Math. Soc., Ser. A, 39, 86–93. Costa, L.D.F., Rodrigues, F.A., Travieso, G., and Villas Boas, P.R. (2007) Characterization of complex networks: a survey of measurements. Adv. Phys., 56, 167–242. Wiener, H. (1947) Structural determination of paraffin boiling points. J. Am. Chem. Soc., 69, 17–20.

41. Cao, S., Dehmer, M., and Shi, Y. (2014)

42.

43. 44.

45.

46.

47.

48.

49.

50.

51.

52.

53.

54.

55.

Extremality of degree-based graph entropies. Inform. Sci., 278, 22–33. Chen, Z., Dehmer, M., Emmert-Streib, F., and Shi, Y. (2014) Entropy bounds for dendrimers. Appl. Math. Comput., 242, 462–472. Trinajstic, N. (1992) Chemical Graph Theory, CRC Press. Cao, S. and Dehmer, M. (2015) Degreebased entropies of networks revisited. Appl. Math. Comput., 261, 141–147. Chen, Z., Dehmer, M., and Shi, Y. (2014) A note on distance-based graph entropies. Entropy, 16, 5416–5427. Dehmer, M. and Kraus, V. (2012) On extremal properties of graph entropies. MATCH Commun. Math. Comput. Chem., 68, 889–912. Dehmer, M. (2008) Informationtheoretic concepts for the analysis of complex networks. Appl. Artif. Intell., 22, 684–706. Dijkstra, E.W. (1959) A note on two problems in connection with graphs. Numerische Math., 1, 269–271. Kraus, V., Dehmer, M., and Schutte, M. (2013) On sphere-regular graphs and the extremality of information-theoretic network measures. MATCH Commun. Math. Comput. Chem., 70, 885–900. Sivakumar, L. and Dehmer, M. (2012) Towards information inequalities for generalized graph entropies. PLoS ONE, 7 (6), e38159. Zhou, B., Gutman, I., de la P˜ena, J.A., Rada, J., and Mendoza, L. (2007) On the spectral moments and energy of graphs. MATCH Commun. Math. Comput. Chem., 57, 183–191. Dehmer, M., Li, X., and Shi, Y. (2014) Connections Between Generalized Graph Entropies and Graph Energy, Complexity. Li, X., Mao, Y., and Wei, M. (2015) More on a conjecture about tricyclic graphs with maximal energy. MATCH Commun. Math. Comput. Chem., 73, 11–26. Li, X., Shi, Y., Wei, M., and Li, J. (2014) On a conjecture about tricyclic graphs with maximal energy. MATCH Commun. Math. Comput. Chem., 72, 183–214. Li, X., Qin, Z., Wei, M., Gutman, I., and Dehmer, M. (2015) Novel inequalities for generalized graph entropies-Graph

181

182

5 Graph Entropy: Recent Results and Perspectives

56.

57.

58.

59.

60.

energies and topological indices. Appl. Math. Comput., 259, 470–479. Dehmer, M., Moosbrugger, M., and Shi, Y. (2015) Encoding structural information uniquely with polynomial-based descriptors by employing the Randi´c matrix. Appl. Math. Comput., 268, 164–168. Du, W., Li, X., Li, Y., and Severini, S. (2010) A note on the von Neumann entropy of random graphs. Linear Algebra Appl., 433, 1722–1725. Dehmer, M., Varmuza, K., Borgert, S., and Emmert-Streib, F. (2009) On entropy-based molecular descriptors: statistical analysis of real and synthetic chemical structures. J. Chem. Inf. Model., 49, 1655–1663. Dehmer, M., Sivakumar, L., and Varmuzab, K. (2012) Uniquely discriminating molecular structures using novel eigenvalue-based descriptors. MATCH Commun. Math. Comput. Chem., 67, 147–172. Dehmera, M. and Emmert-Streib, F. (2008) Structural information content of

61.

62.

63.

64.

networks: graph entropy based on local vertex functions. Comput. Biol. Chem., 32, 131–138. Holzinger, A., Ofner, B., Stocker, C., Valdez, A.C., Schaar, A.K., Ziefle, M., and Dehmer, M. (2013) On graph entropy measures for knowledge discovery from publication network data, in Availability, Reliability, and Security in Information Systems and HCI, CD-ARES, LNCS, vol. 8127, pp. 354–362. Emmert-Streib, F. and Dehmera, M. (2012) Exploring statistical and population aspects of network complexity. PLoS ONE, 7 (5), e34523. Ignac, T.M., Sakhanenko, N.A., and Galas, D.J. (2012) Complexity of networks II: the set complexity of edge-colored graphs. Complexity, 17, 23–36. Sakhanenko, N.A. and Galas, D.J. (2011) Complexity of networks I: the setcomplexity of binary graphs. Complexity, 17, 51–64.

183

6 Statistical Methods in Graphs: Parameter Estimation, Model Selection, and Hypothesis Test Suzana de Siqueira Santos, Daniel Yasumasa Takahashi, João Ricardo Sato, Carlos Eduardo Ferreira, and André Fujita

6.1 Introduction

Graph theory dates back to 1735, when Leonhard Euler solved the Königsberg bridge problem [1]. Since then, this field has been contributing to several areas of knowledge, such as discrete mathematics, computer science, biology, chemistry, operations research, and social sciences. In 1959, the study of probability in graphs gained attention with the works of Erd˝os and Rényi [2], and Gilbert [3] about random graphs. In 1999, in the beginning of the Information Age, a new family of graphs was studied and gained much importance in the 21st century. This family contains several graphs, such as the WWW graphs, the graphs of coauthors, social graphs, and biological graphs that share unexpected similarities. Examples of characteristics shared by some of those networks include the sparsity (the number of edges are usually linear on the number of vertices), the small-world structure (small distance between vertices and the presence of groups of densely connected vertices), and the power law degree distribution (the number of vertices with degree d is proportional to d−𝛽 , for some exponent 𝛽 > 0) [1]. Because of their nontrivial structure, which is neither totally random nor totally regular, these graphs are called complex networks. To study this type of network, several random graph models have been proposed, such as the Barabási–Albert [4] and the Watts–Strogatz models [5], which aim to generate graphs with power law degree distribution and small-world structure, respectively. We can pose several questions about random graphs. For example, how predictable is the structure of a random graph? If we observe a graph, can we infer which random graph model generated it? How similar/different are two random graphs? To answer these questions, it is fundamental to construct statistical methods in graphs. However, the construction of statistical methods in graphs is not trivial due to the complexity to deal with sets of vertices and edges. To build the link between statistics and graph theory, Takahashi et al. [6] proposed the use of the spectrum of the graph (set of eigenvalues of the graph adjacency matrix) to Mathematical Foundations and Applications of Graph Entropy, First Edition. Edited by Matthias Dehmer, Frank Emmert-Streib, Zengqiang Chen, Xueliang Li, and Yongtang Shi. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA. Published 2016 by Wiley-VCH Verlag GmbH & Co. KGaA.


define the spectral graph entropy. That measure of entropy is based on the differential entropy, which extends the classical Shannon measure [7] for continuous variables. It is important to determine the differences between the graph spectral entropy and other existing graph entropy measures. While the graph spectral entropy is based on the differential entropy, other measures are usually based on the discrete Shannon entropy (e.g., we refer to the review written by Dehmer and Mowshowitz [8]). The discrete graph entropy usually aims to measure the complexity of a given graph. By contrast, the graph spectral entropy measures how uncertain/random is the structure of a random graph [6]. Each graph entropy measure is based on a different graph feature. Examples of features used by the discrete graph entropy measures include vertex centralities, distances between vertices, classes of topologically equivalent vertices, and graph decomposition into special subgraphs of complexity [8]. For random graphs, we are interested in features that are shared by all graphs generated by the same random graph model and that are different among graphs from distinct models. The graph spectrum, which is tightly associated with several graph structural properties, is considered an adequate characterization of a random graph [6]. Based on that feature, the graph spectral entropy was proposed as a measure of the randomness of a random graph. On the basis of the graph spectral entropy, Takahashi et al. [6] introduced formal statistical methods in graphs for parameter estimation, model selection, and a hypothesis test to discriminate two populations of graphs. Applications of these methods were useful to uncover structural properties of the networks in molecular biology [6] and neuroscience [9]. The main contribution of this chapter is a review of statistical methods in graphs based on the graph spectral entropy. In the following sections, we provide an essential background regarding random graphs and then describe the details of the statistical framework in graphs. Monte Carlo simulations are also shown with the purpose of illustrating the performance of the methods.
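To give a flavor of the spectral approach before turning to the formal development, the following sketch is our own rough illustration: the histogram-based density estimate, the bin count, and the scaling of the eigenvalues are assumptions, not the estimator used by Takahashi et al. [6]. It approximates the differential entropy of the adjacency spectrum of an Erdős–Rényi graph.

```python
import numpy as np
import networkx as nx

def spectral_entropy(G, bins=50):
    """Crude approximation of the differential entropy of the adjacency spectrum:
    estimate the spectral density with a histogram and sum -f(x) log f(x) over the bins."""
    A = nx.to_numpy_array(G)
    eig = np.linalg.eigvalsh(A) / np.sqrt(G.number_of_nodes())
    density, edges = np.histogram(eig, bins=bins, density=True)
    widths = np.diff(edges)
    mask = density > 0
    return -np.sum(density[mask] * np.log(density[mask]) * widths[mask])

G = nx.erdos_renyi_graph(500, 0.05, seed=0)
print(spectral_entropy(G))
```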

6.2 Random Graphs

A graph G = (V, E) is an ordered pair, where V = {1, 2, …, n} is a set of vertices and E is a set of edges connecting the elements of V. All graphs considered in this chapter are undirected, that is, each edge e ∈ E is an unordered pair of vertices. To learn statistical methods in graphs, we must first understand the concept of a probability distribution over graphs. The theory behind it is the theory of random graphs, which studies the intersection between graph theory and probability theory. A random graph is a probability space (Ω, ℱ, P), where the sample space Ω is a nonempty set of graphs, the set of events ℱ is a collection of subsets of the sample space (usually ℱ is the power set of Ω), and P is a function that assigns a probability to each event. It is usual to describe a random graph by a sequence of steps to construct it.


An algorithm that describes the construction of a random graph is called a random graph model. An example of a random graph model is the Erdős–Rényi algorithm [2], in which the sample space Ω is the set of all graphs having n labeled vertices and m edges (usually m is a function of n). Each graph of Ω can be generated by selecting m edges from the (n choose 2) possible edges. Therefore, the set Ω has size ((n choose 2) choose m), and the probability of choosing a particular graph from Ω is ((n choose 2) choose m)^(−1).

A similar example of a random graph model is the Gilbert model [3], in which each pair of the n vertices of a graph is connected with probability 0 ≤ p ≤ 1. The sample space of the Gilbert random graph is therefore the set of all graphs having n labeled vertices, and the probability of obtaining a graph with m edges is p^m (1 − p)^((n choose 2) − m) [10]. The Erdős–Rényi and Gilbert models are almost interchangeable when m ∼ p (n choose 2) [11]. Another simple example, in terms of construction, is the random geometric graph, in which n vertices are drawn randomly and uniformly on a unit square and a pair of vertices is connected by an edge if the distance between them is at most a given parameter r [12]. In some applications, the random geometric graph is considered more realistic than the Gilbert and Erdős–Rényi models [12]. However, it is not suited for many real networks, such as biological networks, which usually have a particular type of degree distribution.

A widely used random graph model that aims to construct graphs with a property of the degree distribution shared by many real networks is the Barabási–Albert algorithm [4]. In that model, we consider a small number (n0) of initial vertices. Then, in each step, we add a new vertex and connect it to a fixed number (≤ n0) of vertices that already exist in the network. The probability that the new vertex will be connected to a vertex i is proportional to d(i)^(ps), where d(i) is the degree of i and ps is a parameter called the scaling exponent. This algorithm generates graphs in which the frequency of vertices with degree d is, asymptotically, proportional to d^(−3) [13]. Because of this power relationship between frequency and degree, we say that the vertex degrees follow a power law distribution. Another widely used model is the Watts–Strogatz algorithm [5], which generates graphs with groups of densely connected vertices. The Watts–Strogatz model is as follows:

(1) Construct a ring lattice with n vertices and connect each vertex to its K nearest vertices (K/2 on each side of the ring).


(2) Choose a vertex i and the edge e that connects i to its nearest neighbor in a clockwise manner.
(3) With probability pr, replace the edge e by an edge that connects i to a vertex taken at random according to a uniform distribution over the entire ring.
(4) Repeat steps (2) and (3) for each vertex in a clockwise manner.
(5) Choose a vertex i and the edge e that connects i to its second nearest neighbor in a clockwise sense, and repeat steps (2)–(4).
(6) Repeat the process considering the third nearest neighbor and so on, until each edge of the original lattice has been considered.

Our last example is the k-regular model [14], which describes a probability space of all graphs of size n such that each vertex is connected to exactly k other vertices. Therefore, it is a particular type of the Erdős–Rényi random graph. We illustrate graphs generated by the Erdős–Rényi, Gilbert, geometric, Barabási–Albert, Watts–Strogatz, and k-regular models in Figure 6.1. All these graphs were generated with n = 500 vertices and the parameters indicated in the figure.
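All six models are available in standard network libraries. The following sketch, which is not part of the original chapter, uses Python with NetworkX to draw one sample from each model with parameters similar to those of Figure 6.1; note that NetworkX's barabasi_albert_graph corresponds to a scaling exponent ps = 1, and these generators are stand-ins for the exact algorithms described in the text.

# Illustrative sketch: one sample from each random graph model (NetworkX).
import networkx as nx

n = 500
N = n * (n - 1) // 2                                   # number of possible edges, (n choose 2)

graphs = {
    "ER": nx.gnm_random_graph(n, int(0.007 * N)),      # Erdos-Renyi G(n, m)
    "GI": nx.gnp_random_graph(n, 0.007),               # Gilbert G(n, p)
    "GE": nx.random_geometric_graph(n, 0.1),           # geometric graph, radius r = 0.1
    "BA": nx.barabasi_albert_graph(n, 1),              # preferential attachment (ps = 1)
    "WS": nx.watts_strogatz_graph(n, k=4, p=0.07),     # ring lattice, K = 4, rewiring prob. 0.07
    "KR": nx.random_regular_graph(3, n),               # k-regular graph with k = 3
}

for name, G in graphs.items():
    print(name, G.number_of_nodes(), G.number_of_edges())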

















Figure 6.1 Random graph models. Graphs with n = 500 vertices generated by the Erdős–Rényi (a), Gilbert (b), Geometric (c), Barabási–Albert (d), Watts–Strogatz (e), and k-regular (f) random graph models. In (a), the number of edges is equal to 0.007N, where N = (500 choose 2). In (b), the probability p of connecting two vertices is equal to 0.007. In (c), we have r = 0.1, where r is the radius used by the geometric model. In (d), we set the scale exponent ps to 1. In (e), the probability of reconnecting a vertex pr is equal to 0.07. In (f), the degree of each vertex is k = 3. (Takahashi et al. [6]. Used under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/.)


Now, suppose that we take at random two graphs G1 and G2 , each one of size n. If both graphs are from the same random graph, then it is reasonable to expect that in the limit as n → ∞ they share some structural properties. By contrast, if G1 and G2 are from different random graphs, we may expect to find fundamental differences between their structural properties. Given the graphs G1 and G2 , can we measure the similarities between their structures? Is the probability of G1 and G2 being from the same random graph high? To answer these questions, we need a mathematical way to describe graph structural properties that are equal for graphs from the same random graph, but different for elements from distinct random graphs. Takahashi et al. [6] proposed that the spectrum of a graph is an adequate summarization of the graph structure for this problem. In the following section, we define the graph spectrum and other spectrum-based concepts that describe a set of graph structural properties.

6.3 Graph Spectrum

Let G = (V, E) be an undirected graph and n the number of vertices (i.e., n = |V|). The spectrum of G is the set of eigenvalues of its adjacency matrix, which is denoted by A_G. As G is undirected, if two vertices i and j are connected by an edge, then (A_G)_ij = (A_G)_ji = 1; otherwise (i.e., if i and j are not connected), (A_G)_ij = (A_G)_ji = 0. We have A_G = A_G^T, and, therefore, all eigenvalues of the matrix A_G are real. Let {λ1, λ2, …, λn} be the spectrum of G such that λ1 ≥ λ2 ≥ ⋯ ≥ λn. Spectral graph theory studies graph spectrum properties and their association with the graph structure. We show some examples of this relationship below.

(1) Let d(i) denote the number of edges in E connected to i (the degree of the vertex i). The eigenvalue λ1 is at least the average degree, (1/n) ∑_{i=1}^{n} d(i), and at most the maximum degree, max_{i∈V} d(i) [15].
(2) The graph G is bipartite only if λn = −λ1 [15].
(3) If G is connected, then the eigenvalue λ1 is strictly larger than λ2, and there exists a positive eigenvector of λ1 [16].
(4) Each vertex in V is connected to exactly λ1 vertices (i.e., G is λ1-regular) only if the vector of 1s is an eigenvector of λ1 [17].
(5) Let C ⊆ V be such that each pair of vertices in C is connected in G (i.e., C is a clique in G). Then, the size of C is at most λ1 + 1 [18].
(6) Let k be the diameter of G. If G is connected, then A_G has at least k + 1 distinct eigenvalues [19].
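As a small illustration (an addition of this rewrite, not from the original text), properties (1) and (5) can be checked numerically on a sampled graph; the sketch below assumes Python with NetworkX and NumPy.

import networkx as nx
import numpy as np

G = nx.gnp_random_graph(50, 0.2, seed=1)
A = nx.to_numpy_array(G)                       # adjacency matrix A_G (symmetric)
lam = np.sort(np.linalg.eigvalsh(A))[::-1]     # spectrum with lambda_1 >= ... >= lambda_n

deg = A.sum(axis=1)
print(deg.mean() <= lam[0] <= deg.max())       # property (1)
clique_size = max(len(c) for c in nx.find_cliques(G))
print(clique_size <= lam[0] + 1)               # property (5)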


The facts above illustrate some of the many relationships between the structure of a graph and its spectrum. Now, let us recall that we are particularly interested in random graphs. What happens if we take the spectrum of a random graph? Given a set of n labeled vertices V = {1, 2, …, n}, let g be a random graph whose sample space Ω contains graphs on V. We define the spectrum of g as a random vector containing n random variables λ1, λ2, …, λn. Each function λi : Ω → ℝ maps a graph in the sample space Ω to the ith largest eigenvalue of its adjacency matrix. Let δ be the Dirac delta, which is the probability measure satisfying (1) δ(x) = 0 for x ≠ 0, (2) δ(0) = ∞, and (3) ∫_{−∞}^{+∞} δ(x) dx = 1. Let G be a graph in the sample space of g. The empirical spectral density of G is defined as [20]

ρ(λ, G) = (1/n) ∑_{i=1}^{n} δ(λ − λi(G)/√n).

It is pertinent to ask whether we can deduce anything about the empirical spectral densities of graphs from g, particularly in the limit n → ∞. It is then natural to take the limit of the expectation (denoted by ⟨⋅⟩) of the empirical spectral density according to the probability law of g:

ρ(λ) = lim_{n→∞} ⟨ (1/n) ∑_{i=1}^{n} δ(λ − λi/√n) ⟩.

We refer to the expected empirical spectral density defined above as the spectral density of g. To illustrate the spectral distribution of different random graphs, we show in Figure 6.2 the empirical spectral density for graphs generated by the Erdős–Rényi, Gilbert, geometric, Barabási–Albert, and Watts–Strogatz models. Once we have observed the tight relationship between the spectrum and the graph structure and defined the spectral density, we can introduce measures that describe or compare the structural properties of random graphs. Let g1 and g2 be two Gilbert random graphs with n vertices such that the probability p of connecting a pair of vertices is 0.5 in g1 and 0.9 in g2. If we take two graphs G1 and G2 from g1 and g2, respectively, can we deduce anything about their structures? Intuitively, as the probability of connecting a pair of vertices is high in g2 and intermediate in g1, the structure of G2 seems to be more predictable than that of G1. This notion of predictability of random variable outcomes is studied in information theory, in which an amount of uncertainty is associated with each random variable. To quantify how well an outcome can be predicted, Shannon introduced the entropy measure [7]. In the following section, we show this concept for a random graph.
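As a concrete illustration (an assumption of this rewrite, not code from the chapter), the scaled eigenvalues λi/√n, whose distribution defines ρ(λ, G), can be computed as follows with NetworkX and NumPy.

import networkx as nx
import numpy as np

def scaled_spectrum(G):
    """Eigenvalues of the adjacency matrix of G, divided by sqrt(n)."""
    A = nx.to_numpy_array(G)
    return np.linalg.eigvalsh(A) / np.sqrt(A.shape[0])

G = nx.gnp_random_graph(500, 0.007, seed=0)
lam = scaled_spectrum(G)
hist, edges = np.histogram(lam, bins=50, density=True)   # crude estimate of rho(lambda, G)
print(lam.min(), lam.max())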

Figure 6.2 Graph spectral density. Spectral density estimators for graphs with 500 vertices generated by the Erdős–Rényi (a), Gilbert (b), Geometric (c), Barabási–Albert (d), Watts–Strogatz (e), and k-regular (f) random graph models; the x-axis of each panel shows the eigenvalue and the y-axis the density. In (a), the number of edges is equal to 0.007N = 0.007 (500 choose 2). In (b), the probability p of connecting two vertices is equal to 0.007. In (c), we have r = 0.1, where r is the radius used by the geometric model. In (d), we set the scale exponent ps to 1. In (e), the probability of reconnecting a vertex pr is equal to 0.07. In (f), the degree of each vertex is k = 3.

6.4 Graph Spectral Entropy

The entropy of a random graph quantifies the randomness of its structure. Let g be a random graph and ρ its spectral density. The spectral entropy of g is defined as

H(ρ) = −∫_{−∞}^{+∞} ρ(λ) log ρ(λ) dλ,    (6.1)

where 0 log 0 = 0 [6]. The spectral entropy of the Gilbert random graph can be approximated by

H(ρ) ∼ (1/2) ln(4π² p(1 − p)) − 1/2,

where p is the probability of connecting a pair of vertices [6]. Thus, the maximum spectral entropy of the Gilbert random graph is achieved when p = 0.50. This is consistent with the intuitive idea that when all possible outcomes have the same probability of occurring, the ability to predict the system is poor. By contrast, when p → 0 or p → 1, the construction of the graph becomes deterministic, and the amount of uncertainty associated with the graph structure achieves its minimum value.


To approximate the entropy of a random graph, one may use the random graph model to construct graphs and then estimate the spectral density from that data set. For example, given a random graph model, we construct a set of graphs {G1, G2, …, GN} with n vertices using the model, and then, for each Gj, 1 ≤ j ≤ N, we apply a density function estimator. In the examples shown in this chapter, we have considered an estimator based on the Gaussian kernel. It can be interpreted as a smoothed version of a histogram. Given a graph Gj and its spectrum {λ1^(j), λ2^(j), …, λn^(j)}, each eigenvalue λi contributes to the estimate of the density at a point λ according to the difference between λi and λ. That contribution is weighted by the kernel function K and depends on a parameter known as the bandwidth (h), which controls the size of the neighborhood around λ. Formally, the density function estimator at a point λ is

f̂(λ) = (1/n) ∑_{i=1}^{n} K((λ − λi)/h),

where

K(u) = (1/√(2π)) e^(−u²/2).

For the examples given in this chapter, we have chosen Silverman's criterion [21] to select the bandwidth h. To obtain an estimator for the random graph, we apply the procedure described above to each observed graph G1, G2, …, GN and then take the average of all the estimators. We illustrate the empirical spectral entropy of the Erdős–Rényi, Gilbert, geometric, Barabási–Albert, Watts–Strogatz, and k-regular random graphs in Figure 6.3 by varying the parameter of each model as indicated in the images. For each model and each parameter, we have generated 50 graphs of size n = 500. The spectral entropy tells us that the uncertainties of the Erdős–Rényi, the geometric, and the Gilbert models are associated with their parameters (m, r, and p, respectively) in a similar way. When the number of edges is m = 0.5 (n choose 2), where n is the number of vertices, the entropy of the Erdős–Rényi random graph achieves its maximum value. Conversely, when the graph approaches a full or an empty graph (i.e., when m → (n choose 2) or m → 0, respectively), the entropy achieves its lowest values. The k-regular random graph also achieves a lower entropy when the graph approaches an empty graph (i.e., when k → 0). If k increases, then the entropy increases until k reaches an intermediate value. For the geometric random graph, it is intuitive that when the radius r is close to zero, a graph taken at random will probably have few edges, and when the parameter r is close to √2, the graph will probably approach a full graph. Therefore, in those scenarios, the amount of uncertainty is low. By contrast, if the radius r is intermediate, then the entropy of the geometric random graph achieves its highest values.
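A minimal sketch of this estimation procedure — Gaussian kernel density estimates with Silverman's bandwidth, averaged over sampled graphs, followed by numerical integration of Eq. (6.1) — is given below. It assumes Python with NetworkX, NumPy, and SciPy, and the scaled_spectrum() helper from the earlier sketch; it is an illustration rather than the authors' implementation.

import networkx as nx
import numpy as np
from scipy.stats import gaussian_kde

def average_spectral_density(spectra, grid):
    """Average of per-graph Gaussian KDEs (Silverman bandwidth) evaluated on a grid."""
    densities = [gaussian_kde(lam, bw_method="silverman")(grid) for lam in spectra]
    return np.mean(densities, axis=0)

def spectral_entropy(rho, grid):
    """H(rho) = -integral of rho log rho, with the convention 0 log 0 = 0."""
    safe = np.where(rho > 0, rho, 1.0)
    return -np.trapz(rho * np.log(safe), grid)

graphs = [nx.gnp_random_graph(500, 0.5, seed=s) for s in range(50)]
spectra = [scaled_spectrum(G) for G in graphs]
lo, hi = min(s.min() for s in spectra), max(s.max() for s in spectra)
grid = np.linspace(lo - 0.5, hi + 0.5, 2048)
rho = average_spectral_density(spectra, grid)
print(spectral_entropy(rho, grid))   # compare with 0.5*ln(4*pi^2*p*(1-p)) - 0.5 for p = 0.5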

Figure 6.3 Graph spectral entropy. The empirical graph spectral entropy (y-axis) for the Erdős–Rényi (a), Gilbert (b), Geometric (c), Barabási–Albert (d), Watts–Strogatz (e), and k-regular (f) random graph models. In (a), (b), (c), and (e), the values on the x-axis vary from 0 to 1. In (d), the values vary from 1 to 2. In (f), the values vary from 0 to 0.5. In (b), (c), (d), and (e), the x-axis corresponds to the parameters p, r, ps, and pr, respectively. In (a), the parameter m is obtained by multiplying the value on the x-axis by N = (n choose 2). In (f), we multiply the value on the x-axis by n to obtain k. For each model, the empirical spectral entropy was obtained from 50 graphs of size n = 500.

For the Barabási–Albert and Watts–Strogatz models, the spectral entropy also satisfies the intuitive notion of uncertainty. As illustrated in Figure 6.3d, the Barabási–Albert empirical entropy decreases as the scale exponent (ps) increases. When ps is low, the randomness of the graph construction is high, because the influence of the vertex degrees on the probability of connecting two vertices is low. Conversely, when ps is high, the vertex degrees greatly contribute to the insertion of edges, and the amount of uncertainty is therefore low. Finally, in Figure 6.3e, we note that in the Watts–Strogatz model the spectral entropy increases as the parameter pr increases. Remember that pr is the probability of replacing the last inserted edge, which connects a vertex i to another vertex that is near to it in the lattice, by an edge that connects the vertex i to another vertex chosen randomly. Thus, when pr = 1, we have a graph constructed in a completely random way. Conversely, when pr = 0, the graph construction is determined by the lattice structure, in which each vertex is connected to the K nearest vertices.


6.5 Kullback–Leibler Divergence

The Kullback–Leibler (KL) divergence measures the amount of information lost when one probability distribution is used to approximate another distribution. For graphs, the KL divergence can be used to discriminate random graphs, select a graph model, and estimate the parameters that best describe an observed graph. Clearly, if two spectral densities are different, then the corresponding random graphs are different. However, different random graphs may have the same spectral density. Let g1 and g2 be two random graphs with spectral densities ρ1 and ρ2, respectively. The Kullback–Leibler divergence is defined as follows. If the support of ρ2 contains the support of ρ1, then the KL divergence between ρ1 and ρ2 is

KL(ρ1 | ρ2) = ∫_{−∞}^{+∞} ρ1(λ) log( ρ1(λ) / ρ2(λ) ) dλ,

where 0 log 0 = 0 and 𝜌2 is the reference measure [6]. If the support of 𝜌2 does not contain the support of 𝜌1 , then KL(𝜌1 |𝜌2 ) = +∞. The KL divergence is nonnegative, and it is zero only if 𝜌1 and 𝜌2 are equal. Note that in many cases, KL(𝜌1 |𝜌2 ) and KL(𝜌2 |𝜌1 ) are different when 𝜌1 ≠ 𝜌2 , that is, KL is asymmetric. The asymmetric property of the KL divergence is particularly useful when we want to find the reference measure that best describes the observed spectrum (we explain details about applications of the KL divergence in Section 6.7). However, if the goal is to discriminate graph structural properties between two random graphs, then we usually do not have a reference measure. In this case, it is more adequate to use a symmetric divergence between graphs, such as the Jensen–Shannon divergence, which is described in the next section.
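In practice, the densities are only available on a discretization grid; the following sketch (an illustrative assumption, using NumPy) computes the KL divergence between two spectral density estimates evaluated on a common grid, for example those returned by average_spectral_density() above.

import numpy as np

def kl_divergence(rho1, rho2, grid):
    """KL(rho1 | rho2) approximated by the trapezoidal rule on a common grid."""
    mask = rho1 > 0
    if np.any(rho2[mask] <= 0):          # support of rho2 must contain support of rho1
        return np.inf
    integrand = np.zeros_like(rho1)
    integrand[mask] = rho1[mask] * np.log(rho1[mask] / rho2[mask])
    return np.trapz(integrand, grid)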

6.6 Jensen–Shannon Divergence

The Jensen–Shannon (JS) divergence is a symmetric alternative to the Kullback–Leibler divergence. The JS divergence between two spectral densities ρ1 and ρ2 is

JS(ρ1, ρ2) = (1/2) KL(ρ1 | ρM) + (1/2) KL(ρ2 | ρM),

where ρM = (1/2)(ρ1 + ρ2) [6]. We can interpret the Jensen–Shannon divergence as a measure of the structural differences between two random graphs. The square root of the Jensen–Shannon divergence is a metric, that is, √JS(ρ1, ρ2) ≥ 0, √JS(ρ1, ρ2) = 0 only if ρ1 = ρ2, √JS(ρ1, ρ2) = √JS(ρ2, ρ1), and √JS(ρ1, ρ3) ≤ √JS(ρ1, ρ2) + √JS(ρ2, ρ3) for any spectral density ρ3.
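A corresponding sketch of the Jensen–Shannon divergence, reusing the kl_divergence() helper defined above (again an illustration rather than the authors' code):

import numpy as np

def js_divergence(rho1, rho2, grid):
    rho_m = 0.5 * (rho1 + rho2)
    return 0.5 * kl_divergence(rho1, rho_m, grid) + 0.5 * kl_divergence(rho2, rho_m, grid)

def js_distance(rho1, rho2, grid):
    return np.sqrt(js_divergence(rho1, rho2, grid))   # the square root of JS is a metric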


6.7 Model Selection and Parameter Estimation

Once we have defined random graphs and divergences between random graphs, several questions become pertinent. Given a random graph g and a random graph model M(θ) with a parameter θ, can we measure how well M(θ) describes g? If we take a set of random graph models S = {M1, M2, …, MP}, can we find which of them best describes g? As discussed in the previous sections, the spectral density describes several structural properties of random graphs. We may therefore address the first question by measuring the dissimilarity between the spectral densities of the random graph g and of the model M(θ) (i.e., the spectral density of the random graph generated by the model M with a given parameter θ). To measure that dissimilarity, we can, for example, choose the spectral density of the random graph model as the reference measure, and then use the KL divergence to measure how much information is lost when the model is used to estimate the random graph spectral density. To address the second question, it is reasonable to select the model that minimizes the dissimilarity between the random graph spectral density and the model spectral density. We can break this approach into two steps: first, for a fixed model, we estimate the parameter; then, given the parameter estimator for each model, we select a model to describe the random graph. Formally, we describe the first step as follows. Let ρg be the spectral density of the random graph g. Given a random graph model M, let θ be a real vector containing values for each parameter of M. If we consider all possible choices for θ, then the model M generates a parametric family of spectral densities {ρθ}. Assuming that there exists a value of θ that minimizes KL(ρg | ρθ), which is denoted by θ*, we have

θ* = arg min_θ KL(ρg | ρθ).

However, in real applications, the spectral density ρg is unknown. Therefore, in practice, an estimator ρ̂g of ρg is used, as described in Section 6.4. An estimator θ̂ of θ* is then

θ̂ = arg min_θ KL(ρ̂g | ρθ).    (6.2)

The second step of the model selection consists in using Eq. (6.2) to estimate the parameters of each model and then choosing the model that minimizes the KL divergence. Let {ρθ1}, {ρθ2}, …, {ρθP} be parametric families of spectral densities, let θ̂i for i = 1, 2, …, P be the estimates of θi obtained by Eq. (6.2), and let #(θ̂i) be the dimension of θi. Then, the best candidate model j is selected by

j = arg min_i [ 2 KL(ρ̂g | ρθ̂i) + 2 #(θ̂i) ].    (6.3)
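A minimal sketch of this two-step procedure for a single candidate model is given below. The parameter is estimated by a grid search that minimizes the KL divergence between the observed average spectral density and model densities estimated from sampled graphs, and Eq. (6.3) is then evaluated. It assumes Python with NetworkX/NumPy and the scaled_spectrum(), average_spectral_density(), and kl_divergence() helpers from the earlier sketches; the grid search and sample sizes are illustrative choices, not the authors' exact settings.

import networkx as nx
import numpy as np

def model_density(generator, theta, grid, n_samples=50):
    """Estimated spectral density of the model with parameter theta (the reference measure)."""
    spectra = [scaled_spectrum(generator(theta)) for _ in range(n_samples)]
    return average_spectral_density(spectra, grid)

def estimate_parameter(rho_hat_g, generator, theta_grid, grid):
    kls = [kl_divergence(rho_hat_g, model_density(generator, t, grid), grid) for t in theta_grid]
    best = int(np.argmin(kls))
    return theta_grid[best], kls[best]

# Example: fit the Gilbert model to a collection of observed graphs.
observed = [nx.gnp_random_graph(100, 0.3, seed=s) for s in range(10)]
grid = np.linspace(-2.0, 10.0, 1024)
rho_hat_g = average_spectral_density([scaled_spectrum(G) for G in observed], grid)

gilbert = lambda p: nx.gnp_random_graph(100, p)
p_hat, kl_min = estimate_parameter(rho_hat_g, gilbert, np.linspace(0.1, 0.9, 9), grid)
print(p_hat, 2 * kl_min + 2 * 1)       # Eq. (6.3) with one estimated parameter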


Table 6.1 Parameter estimation.

Model            n = 20              n = 50              n = 100            n = 500
ER (m = 0.5N)    0.503N ± 0.013N     0.500N ± 0.002N     0.500N ± 0         0.499N ± 0.003N
GI (p = 0.5)     0.506 ± 0.039       0.501 ± 0.014       0.501 ± 0.008      0.499 ± 0.003
GE (r = 0.5)     0.493 ± 0.061       0.506 ± 0.037       0.502 ± 0.022      0.500 ± 0.010
BA (ps = 1)      1.128 ± 0.309       1.044 ± 0.125       1.026 ± 0.047      1.020 ± 0.025
WS (pr = 0.07)   0.129 ± 0.155       0.069 ± 0.011       0.071 ± 0.008      0.070 ± 0.003
KR (k = 0.25n)   0.264n ± 0.013n     0.245n ± 0.005n     0.250n ± 0         0.249n ± 0.004n

Average and standard deviations of the parameters estimated by the Kullback–Leibler divergence for the Erdős–Rényi (ER), Gilbert (GI), Geometric (GE), Barabási–Albert (BA), Watts–Strogatz (WS), and k-regular (KR) models. For each random graph model, we have applied the parameter estimation to 1000 graphs of sizes 20, 50, 100, and 500. The reference measure of a model was estimated by randomly generating 100 graphs from the model and then taking the average spectral density estimator. The true parameters used to generate the graphs are shown in parentheses (the number of edges m, the probability of connecting two vertices p, the radius r, the scale exponent ps, the probability of reconnecting a vertex pr, and the degree k). In general, the parameter m is a function of N = (n choose 2) and the parameter k is a function of n, where n is the number of vertices.

Equation (6.3) is based on the Akaike information criterion (AIC), which includes, in addition to the KL divergence, a penalty on the number of estimated parameters. That penalty aims to avoid overfitting, since fitting the model is easier when the number of parameters is high. In particular, all models considered in this chapter have the same number of parameters, and therefore we can select the model by minimizing the KL divergence without the penalty. To illustrate the performance of the parameter estimation and model selection approaches, we have generated 1000 graphs of sizes n = 20, 50, 100, 500, using the Erdős–Rényi (ER), Gilbert (GI), Geometric (GE), Barabási–Albert (BA), Watts–Strogatz (WS), and k-regular (KR) random graph models with parameters m = 0.5N, p = 0.5, r = 0.5, ps = 1, pr = 0.07, and k = 0.25n. In Table 6.1, we can see that the average estimator is very close to the true parameter for all models when the graph is sufficiently large. The results of our illustrative model selection experiments are shown in Table 6.2. We can observe that when the true model is GE, BA, or WS, the number of right choices increases with the size of the graph. As expected, the ER graphs approximate GI graphs asymptotically, and therefore, when the true model is ER or GI, the model selection approach cannot discriminate between them. Since the k-regular graph is a particular type of ER graph, it is natural that some graphs generated by the ER/GI models will approximate a regular graph. As we can see in Table 6.2, about 14% of the ER and GI graphs were classified as KR. Similarly, some large KR graphs are classified as ER/GI.


Table 6.2 Model selection simulation.

True model   n     ER    GI    GE     BA     WS     KR
ER           20    835   36    0      0      0      129
ER           50    564   282   0      0      0      154
ER           100   841   28    0      0      0      131
ER           500   378   480   0      0      0      142
GI           20    762   135   0      0      0      103
GI           50    725   213   0      0      0      62
GI           100   587   337   0      0      0      76
GI           500   346   510   0      0      0      144
GE           20    0     0     1000   0      0      0
GE           50    0     0     1000   0      0      0
GE           100   0     0     1000   0      0      0
GE           500   0     0     1000   0      0      0
BA           20    571   134   197    98     0      0
BA           50    313   70    2      615    0      0
BA           100   6     40    0      954    0      0
BA           500   0     0     0      1000   0      0
WS           20    75    2     0      0      923    0
WS           50    0     0     0      0      1000   0
WS           100   0     0     0      0      1000   0
WS           500   0     0     0      0      1000   0
KR           20    7     0     0      0      0      993
KR           50    0     0     0      0      0      1000
KR           100   0     0     0      0      0      1000
KR           500   56    72    0      0      0      872

The rows show the true models used to generate the graphs, and the columns ER–KR show the predicted models; each entry is the number of graphs classified as the corresponding column. The model selection was performed on 1000 graphs of sizes n = 20, 50, 100, 500 for the Erdős–Rényi (ER), Gilbert (GI), Geometric (GE), Barabási–Albert (BA), Watts–Strogatz (WS), and k-regular (KR) random graphs. To estimate the spectral density of each random graph, we have taken the average among 100 graphs generated by the model.

6.8 Hypothesis Test between Graph Collections

Let T1 and T2 be two collections of graphs. Assuming that all graphs from T1 were generated by the same random graph model M1 with parameter θ1 and the graphs from T2 were generated by a model M2 with parameter θ2, can we check whether M1 = M2 and θ1 = θ2? To answer this question, a natural approach is to measure the dissimilarity between the two graph collections. As we have shown in the previous section, comparing spectral densities is a reasonable approach for model selection and parameter estimation. However, differently from the previous problem, in the hypothesis test between graph collections it is not clear which distribution is the reference measure.


Therefore, as the Kullback–Leibler divergence is an asymmetric measure, it is not suited for this hypothesis test. Recall that the Jensen–Shannon divergence is the symmetric version of the Kullback–Leibler divergence, and it is therefore a natural candidate for the statistic of the hypothesis test between collections of graphs, which we describe formally as follows. Let g1 and g2 be two random graphs with spectral densities ρ1 and ρ2, respectively. We want to test the following hypotheses:

H0: JS(ρ1, ρ2) = 0
H1: JS(ρ1, ρ2) > 0

Given a sample from g1, denoted by T1, and a sample from g2, denoted by T2, the test statistic is JS(ρ̂1, ρ̂2), where ρ̂1 and ρ̂2 are the average spectral density estimates obtained from T1 and T2, respectively. After choosing the test statistic and the null and alternative hypotheses, our goal is to obtain the p-value, which is the probability that the test statistic will be at least as extreme as the value observed in the data, assuming that the null hypothesis is true. Thus, to obtain a p-value for the hypothesis test, we need the distribution of the statistic under the null hypothesis. This is usually obtained from an asymptotic distribution or from random resamplings of the data. In the first case, an analytic formula is necessary, which is unknown for many statistics and usually depends on assumptions about the population distribution that cannot be verified. In the second case, the random resamplings are usually constructed by the Monte Carlo method, which is based on pseudorandom numbers generated by the computer. In general, for that approach, we assume only that the data are independent and identically distributed. Takahashi et al. [6] proposed a test for the Jensen–Shannon divergence between spectral densities that relies on random resamplings of the data with replacement. This approach is known as the bootstrap procedure. The bootstrap was proposed by Efron [22] to estimate the sampling distribution of a statistic from random resamplings of the data. It is used in several applications, such as standard error estimation for a statistic, confidence intervals for population parameters, and hypothesis tests. The idea behind the bootstrap is that the observed sample is usually the best approximation of the population. When the data are independent and identically distributed, the bootstrap approach can be implemented by resampling the original data set with replacement. Let n1 and n2 be the numbers of graphs in T1 and T2, respectively; B the number of desired bootstrap replications; and T = T1 ∪ T2. The bootstrap procedure for the hypothesis test between T1 and T2 is as follows:

(1) Calculate JS(ρ̂1, ρ̂2).
(2) Resample with replacement n1 graphs from T and construct a new set (bootstrap sample) T̃1. Obtain the average spectral density estimator of T̃1, which is denoted by ρ̂_{T̃1}.


(3) Resample with replacement n2 graphs from T and construct a new set (bootstrap sample) T̃2. Obtain the average spectral density estimator of T̃2, which is denoted by ρ̂_{T̃2}.
(4) Calculate JS(ρ̂_{T̃1}, ρ̂_{T̃2}).
(5) Repeat steps (2)–(4) B times.
(6) The p-value is the proportion of bootstrap replications such that JS(ρ̂_{T̃1}, ρ̂_{T̃2}) ≥ JS(ρ̂1, ρ̂2).

Note that T̃1 and T̃2 are constructed by taking at random, with replacement, graphs from the set T, which has elements from both T1 and T2. Therefore, the new data set containing T̃1 and T̃2 simulates the null hypothesis that the graphs in both sets are from the same population.

To show that the bootstrap procedure for the Jensen–Shannon divergence rejects/accepts the null hypothesis as expected, we have generated, for each scenario, 1000 data sets of graphs using the Erdős–Rényi (ER), Gilbert (GI), Geometric (GE), Barabási–Albert (BA), Watts–Strogatz (WS), and k-regular (KR) random graph models. To evaluate the proportion of rejections under the null hypothesis that both collections of graphs being tested are from the same population, we have constructed each data set by generating, with the same random procedure, two collections of 50 graphs. For the ER, GI, GE, BA, WS, and KR models, we have used, respectively, the parameters m = 0.5N, p = 0.5, r = 0.1, ps = 0.5, pr = 0.07, and k = 0.05n, where n = 100 is the number of vertices and N = (100 choose 2).

We have also generated data sets under the alternative hypothesis (H1: the collections being tested were generated by different random processes) to illustrate the empirical statistical power of the Jensen–Shannon test. In our experiment under the alternative hypothesis, each data set has two collections of 50 graphs, all of them generated by the same random graph model but using different parameter values for each collection. The graphs from the ER, GI, GE, BA, WS, and KR models were constructed using, respectively, the parameters m = 0.4N versus 0.41N, p = 0.4 versus 0.41, r = 0.25 versus 0.26, ps = 0.5 versus 0.6, pr = 0.2 versus 0.21, and k = 0.05n versus 0.06n, where n = 100 is the number of vertices and N = (100 choose 2).

For both experiments (under the null and the alternative hypotheses), we have evaluated the results by constructing an ROC curve, described as follows. The ROC curve is constructed over a two-dimensional plot, in which the x-axis corresponds to the significance level α of the test and the y-axis corresponds to the proportion of rejected null hypotheses. The y-axis thus represents the empirical statistical power of the test. If we reject the null hypothesis randomly, we expect the proportion of rejected null hypotheses to be equal or close to the significance level, and the ROC curve will then lie on the diagonal. Conversely, if we reject the null hypothesis with high power, then the ROC curve will be drawn above the diagonal. Under the null hypothesis, we want the proportion of rejected null hypotheses (false positive rate) to be controlled by the significance level α; we then expect the ROC curve to lie on the diagonal. In Figure 6.4, we show the ROC curves for the experiments under the null hypothesis.
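A minimal sketch of the bootstrap test described above is given below, under the assumption that the helper functions scaled_spectrum(), average_spectral_density(), and js_divergence() from the earlier sketches are available; the collection sizes and B are illustrative choices rather than the settings used by the authors.

import networkx as nx
import numpy as np

def js_between_collections(T1, T2, grid):
    rho1 = average_spectral_density([scaled_spectrum(G) for G in T1], grid)
    rho2 = average_spectral_density([scaled_spectrum(G) for G in T2], grid)
    return js_divergence(rho1, rho2, grid)

def bootstrap_js_test(T1, T2, grid, B=200, seed=0):
    rng = np.random.default_rng(seed)
    observed = js_between_collections(T1, T2, grid)
    pooled = list(T1) + list(T2)
    exceed = 0
    for _ in range(B):
        idx1 = rng.integers(0, len(pooled), size=len(T1))   # resample with replacement
        idx2 = rng.integers(0, len(pooled), size=len(T2))
        stat = js_between_collections([pooled[i] for i in idx1],
                                      [pooled[i] for i in idx2], grid)
        exceed += int(stat >= observed)
    return observed, exceed / B      # test statistic and bootstrap p-value

T1 = [nx.gnp_random_graph(100, 0.40, seed=s) for s in range(50)]
T2 = [nx.gnp_random_graph(100, 0.41, seed=1000 + s) for s in range(50)]
grid = np.linspace(-2.0, 8.0, 1024)
print(bootstrap_js_test(T1, T2, grid, B=100))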

Figure 6.4 ROC curves under H0 (x-axis: significance level of the test, FPR; y-axis: proportion of rejected null hypotheses, TPR). The dashed lines show the expected outcome. The continuous lines indicate the observed ROC curve (significance level vs. proportion of rejected null hypotheses) for the Jensen–Shannon test between two collections of 50 graphs of size n = 100 generated by the same random process (null hypothesis). In each scenario, the Jensen–Shannon test was applied to 1000 data sets generated by the Erdős–Rényi (a), Gilbert (b), Geometric (c), Barabási–Albert (d), Watts–Strogatz (e), and k-regular (f) random graph models. For each model, all graphs were constructed with the same parameters. In (a), the number of edges is equal to 0.5N = 0.5 (100 choose 2). In (b), the probability of connecting two vertices p is equal to 0.5. In (c), we have r = 0.1, where r is the radius used by the geometric model. In (d), we set the scale exponent ps to 0.5. In (e), the probability of reconnecting a vertex pr is equal to 0.07. In (f), the vertex degrees are equal to k = 0.05n = 5.

The dashed and continuous lines show, respectively, the expected and observed results. We can observe that the ROC curves approximate the expected results for all graph models. Under the alternative hypothesis, it is desirable that the ROC curve be as far as possible from the diagonal. We can see in Figure 6.5 that the ROC curve is above the diagonal for all models. In particular, the empirical power for the ER, GI, WS, and KR models achieves its maximum value at all significance levels.

6.9 Final Considerations

In the previous sections, we have explained some of the concepts behind statistical methods in graphs. It is then pertinent to ask which types of real problems can be addressed using those methods. In the following sections, we illustrate interesting applications to biological problems.

Figure 6.5 ROC curves under H1 (x-axis: significance level of the test, FPR; y-axis: proportion of rejected null hypotheses, TPR). The dashed lines show the poorest possible outcome. The continuous lines indicate the observed ROC curve (significance level vs. proportion of rejected null hypotheses) for the Jensen–Shannon test between two collections of 50 graphs of size n = 100 generated by different random processes (alternative hypothesis). In each scenario, the Jensen–Shannon test was applied to 1000 data sets generated by the Erdős–Rényi (a), Gilbert (b), Geometric (c), Barabási–Albert (d), and Watts–Strogatz (e) random graph models. For each model, two different parameters are used to generate the two collections of graphs. In (a), the number of edges is equal to 0.4N and 0.41N, where N = (100 choose 2). In (b), the probability of connecting two vertices p is equal to 0.4 and 0.41. In (c), we have r = 0.25 for one collection and r = 0.26 for the other, where r is the radius used by the geometric model. In (d), we set the scale exponent ps to 0.5 and 0.6. In (e), the probability of reconnecting a vertex pr is equal to 0.2 and 0.21. In (f), the vertex degrees are equal to k = 0.05n = 5 and k = 0.06n = 6.

6.9.1 Model Selection for Protein–Protein Networks

In Section 6.7, we have described approaches for estimating parameters and selecting random graph models. Takahashi et al. [6] have applied the model selection approach based on the Kullback–Leibler divergence to protein–protein networks of eight species. In accordance with the literature, all graphs were classified as scale-free.


6.9.2 Hypothesis Test between the Spectral Densities of Functional Brain Networks

In addition to the model selection approach, Takahashi et al. [6] applied the method described in Section 6.8, which is based on the Jensen–Shannon divergence between graph spectral densities, to test whether two collections of functional brain networks have similar structures. The functional brain networks were inferred from fMRI (functional magnetic resonance imaging) data of two groups of people: the first containing 479 individuals with typical development and the second containing 159 individuals diagnosed with ADHD (attention-deficit hyperactivity disorder). The Jensen–Shannon test was then performed between the group with typical development and the group diagnosed with ADHD. The resulting p-value was lower than 0.05, suggesting differences between the functional brain networks of typical-development and ADHD individuals.

6.9.3 Entropy of Brain Networks

As we discussed in Section 6.4, the graph spectral entropy can be interpreted as a measure of the amount of uncertainty associated with the graph structure. Given a graph that represents the functional connectivity of the brain, can the entropy reveal anything about the brain's patterns of connectivity? Sato et al. [9] measured the entropy of the functional brain networks in the data set described in Section 6.9.2, containing individuals with typical development and individuals diagnosed with ADHD. After inferring the functional brain connections, the authors clustered the regions of the brain into four groups, each one representing a functional subnetwork of the brain. For each cluster, Sato et al. [9] tested whether the entropy of the corresponding subgraph differs between individuals with typical development and individuals with ADHD. The results were consistent with the literature, identifying brain regions associated with ADHD. Furthermore, in that study, the entropy measure identified clusters that other commonly used measures of complex networks (e.g., average degree, average clustering coefficient, and average shortest path length) did not. Thus, the results suggested that the entropy could reveal abnormalities in the functional connectivity of the brain.

6.10 Conclusions

The development of statistical methods in graphs is crucial to better understand the mechanisms underlying biological systems such as functional brain networks and gene regulatory networks. This chapter has presented a statistical framework composed of methods to (i) measure the entropy of a network; (ii) estimate the parameters of the graphs; (iii) select a model; and (iv) test whether two populations of graphs are generated by the same random graph model. Experiments were conducted using Monte Carlo simulation data to illustrate the strengths of the statistical approaches. These approaches are flexible and allow generalizations to other families of graphs that are not limited to the ones illustrated in this chapter.

6.11 Acknowledgments

Suzana de Siqueira Santos was supported by grants from FAPESP (2012/25417-9; 2014/09576-5). Daniel Yasumasa Takahashi was supported by grants from FAPESP (2014/09576-5; 2013/07699-0) and CNPq (473063/2013-1). João Ricardo Sato was supported by grants from FAPESP (2013/10498-6; 2014/09576-5). Carlos Eduardo Ferreira was supported by a grant from FAPESP (2013/07699-0). André Fujita was supported by grants from FAPESP (2014/09576-5; 2013/034476), CNPq (304020/2013-3; 473063/2013-1), NAP-eScience–PRP–USP, and CAPES.

References

1. Chung, F. and Lu, L. (2006) Complex Graphs and Networks (CBMS Regional Conference Series in Mathematics), American Mathematical Society, Boston, MA.
2. Erdős, P. and Rényi, A. (1959) On random graphs. Publ. Math. Debrecen, 6, 290–297.
3. Gilbert, E.N. (1959) Random graphs. Ann. Math. Stat., 30 (4), 1141–1144, doi: 10.1214/aoms/1177706098.
4. Barabási, A.L. and Albert, R. (1999) Emergence of scaling in random networks. Science, 286 (5439), 509–512, doi: 10.1126/science.286.5439.509.
5. Watts, D.J. and Strogatz, S.H. (1998) Collective dynamics of 'small-world' networks. Nature, 393 (6684), 440–442, doi: 10.1038/30918.
6. Takahashi, D.Y., Sato, J.R., Ferreira, C.E., and Fujita, A. (2012) Discriminating different classes of biological networks by analyzing the graphs spectra distribution. PLoS ONE, 7 (12), e49949, doi: 10.1371/journal.pone.0049949.
7. Shannon, C.E. (1948) A mathematical theory of communication. Bell Syst. Tech. J., 27 (3), 379–423, doi: 10.1002/j.1538-7305.1948.tb01338.x.
8. Dehmer, M. and Mowshowitz, A. (2011) A history of graph entropy measures. Inf. Sci., 181 (1), 57–78, doi: 10.1016/j.ins.2010.08.041.
9. Sato, J.R., Takahashi, D.Y., Hoexter, M.Q., Massirer, K.B., and Fujita, A. (2013) Measuring network's entropy in ADHD: a new approach to investigate neuropsychiatric disorders. NeuroImage, 77, 44–51, doi: 10.1016/j.neuroimage.2013.03.035.
10. Bollobás, B. and Chung, F.R.K. (1991) Probabilistic Combinatorics and its Applications, American Mathematical Society.
11. Bollobás, B. and Riordan, O.M. (2002) Mathematical results on scale-free random graphs, in Handbook of Graphs and Networks (eds S. Bornholdt and H.G. Schuster), Wiley-VCH Verlag GmbH & Co. KGaA, pp. 1–34.
12. Penrose, M. (2003) Random Geometric Graphs, Oxford Studies in Probability, vol. 5, Oxford University Press.
13. Bollobás, B., Riordan, O., Spencer, J., and Tusnády, G. (2001) The degree sequence of a scale-free random graph process. Random Struct. Algorithms, 18 (3), 279–290, doi: 10.1002/rsa.1009.
14. Bollobás, B. (2001) Random Graphs, Cambridge University Press.
15. Cvetković, D.M., Doob, M., and Sachs, H. (1980) Spectra of Graphs: Theory and Application, Academic Press.
16. Bapat, R.B. and Raghavan, T.E.S. (1997) Nonnegative Matrices and Applications, Cambridge University Press, Cambridge.
17. Spielman, D. (2012) Spectral graph theory, in Combinatorial Scientific Computing, 1st edn (eds U. Naumann and O. Schenk), Chapman and Hall/CRC, Boca Raton, FL, pp. 495–517.
18. Wilf, H.S. (1967) The eigenvalues of a graph and its chromatic number. J. London Math. Soc., 42 (1), 330–332.
19. Brouwer, A.E. and Haemers, W.H. (2011) Spectra of Graphs, Springer Science & Business Media.
20. Rogers, T. (2010) New results on the spectral density of random matrices. PhD thesis, King's College London.
21. Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis, Chapman and Hall, Boca Raton, FL.
22. Efron, B. (1979) Bootstrap methods: another look at the jackknife. Ann. Stat., 7 (1), 1–26, doi: 10.1214/aos/1176344552.


7 Graph Entropies in Texture Segmentation of Images
Martin Welk

7.1 Introduction

Graph models have been used in image analysis for a long time. The edited book [1] gives an overview of methods in this field. However, approaches from quantitative graph theory such as graph indices have not played a significant role in these applications so far. This is to some extent surprising as it is not a far-fetched idea to model information contained in small patches of a textured image by graphs, and once this has been done, graph indices with their ability to extract in a quantitative form structural information from large collections of graphs lend themselves as a promising tool specifically for texture analysis. A first step in this direction has been made in Ref. [2], where a set of texture descriptors was introduced that combines a construction of graphs from image patches with well-known graph indices. This set of texture descriptors was evaluated in Ref. [2] in the context of a texture discrimination task. In Ref. [3], an example for texture-based image segmentation was presented based on this work. This chapter continues the work begun in Refs [2] and [3]. Its purpose is twofold. On the one hand, it restates and slightly extends the experimental work from [3] on texture segmentation, focusing on those descriptors that are based on entropy measures, which turned out particularly useful in the previous contributions. On the other hand, it undertakes a first attempt to analyze the graph index-based texture descriptors with regard to what kind of information they actually extract from a textured image.

7.1.1 Structure of the Chapter

The remaining part of Section 7.1 briefly outlines the fields of research that are combined in this work, namely quantitative graph theory, see Section 7.1.2, graph models in image analysis with emphasis on the pixel graph and its edge-weighted variant, see Section 7.1.3, and finally texture analysis, Section 7.1.4.


In Section 7.2, the construction of graph entropy-based texture descriptors from [2] is detailed. Section 7.3 gives a brief account of the geodesic active contour method, a well-established approach for image segmentation that is based on the numerical solution of a partial differential equation (PDE). Texture segmentation combining the graph entropy-based texture descriptors with the geodesic active contour (GAC) method is demonstrated on two synthetic examples that represent typical realistic texture segmentation tasks, and a real-world example, in Section 7.4. Some theoretical analysis is presented in Section 7.5, where (one setup of) graph entropy-based texture descriptors is put into relation with fractal dimension measurements on a metric space derived from the pixel graph, and thus a connection is established between graph entropy methods and fractal-based texture descriptors. A short conclusion, Section 7.6, ends the chapter.

7.1.2 Quantitative Graph Theory

Quantitative measures for graphs have been developed for almost 60 years in mathematical chemistry as a means to analyze molecular graphs [4–8]. In the course of time, numerous graph indices have been derived based on edge connectivity, vertex degrees, distances, and information-theoretic concepts, see for example, [9] that classifies over 900 descriptors from literature and subjects them to a large-scale statistical evaluation on several test data sets. Recently, interesting new graph indices based on the so-called Hosoya polynomial have been proposed [10]. Fields of application have diversified in the last decades to include, for example, biological and social networks and other structures that can mathematically be modeled as graphs, see [11]. Effort to apply statistical methods to graph indices across these areas has been bundled in the emerging field of quantitative graph theory [11, 12]. Many contributions in this field group around the tasks of distinguishing and classifying graphs, and quantifying the differences between graphs. The first task focuses on the ability of indices to uniquely distinguish large sets of individual graphs, termed discrimination power [10, 13, 14]. For the latter task, inexact graph matching, the graph edit distance [15, 16], or other measures quantifying the size of substructures that are shared or not shared between two graphs are of particular importance, see also [17–23]. The concept of discrimination power has to be complemented for this purpose by the principle of smoothness of measures, see [24], which describes how similar the values of a measure are when it is applied to similar graphs. In Ref. [25], the quantitative measures of structure sensitivity and abruptness have been introduced in order to precisely analyze the discrimination power and smoothness of graph indices. These measures are based, respectively, on the average and maximum of the changes of graph index values when the underlying graph is modified by one elementary edit step of the graph edit distance.

7.1

Introduction

Discrimination of graph structures by graph indices is also a crucial part of the texture analysis approach discussed here. Thus, discrimination power and the related notions of high structure sensitivity and low abruptness also matter in this chapter. However, unique identification of individual graphs is somewhat less important in texture analysis than when examining single graphs as in Refs [10, 25], as in texture analysis one is confronted with large collections of graphs associated with image pixels, and one is interested in separating these into a small number of classes representing regions. Not only will each class contain numerous graphs, but also the spatial arrangement of the associated pixels is to be taken into account as an additional source of information, as segments are expected to be connected. 7.1.3 Graph Models in Image Analysis

As can be seen in Ref. [1], there are several ways in which image analysis can be linked to graph concepts. A large class of approaches is based on graphs in which the pixels of a digital image play the role of vertices, and the set of edges is based on neighborhood relations, with 4- and 8-neighborhoods as most popular choices in 2D, and similar constructions in 3D, see [26, Section 1.5.1] In order to imprint actual image information on such a graph, one can furnish it with edge weights that are based on image contrasts. Among others, the graph cut methods [27] that have recently received much attention for applications such as image segmentation [28, 29] and correspondence problems [30, 31] make use of this concept. This setup is also central for the work presented here, see a more detailed account of the pixel graph and edge-weighted pixel graph of an image in Section 7.2.1. Generalizing the pixel graph framework, the graph perspective allows to transfer image-processing methods from the regular mesh underlying standard digital images to nonregular meshes that can be related to scanned surface data [32], but arise also from standard images when considering nonlocal models [33] that have recently received great attention in image enhancement. Graph morphology, see for example, [34], is one of these generalizations of image-processing methods to nonregular meshes, but variational and PDE frameworks have also been generalized in this way [35]. We briefly mention that graphs can also be constructed, after suitable preprocessing, from vertices representing image regions, see [26, Section 1.5.2], opening avenues to high-level semantic image interpretation by means of partition trees. Comparison of hierarchies of semantically meaningful partitions can then be achieved, for example, using graph edit distance or related concepts [15]. Returning to the pixel graph setup, which we will also use in this chapter, see Section 7.2.1, we point out a difference of our approach from those that represent the entire image in a single pixel graph. We focus here on subgraphs related

205

206

7 Graph Entropies in Texture Segmentation of Images

to small image patches, thus generating large sets of graphs, whose vertex sets, connectivity, and/or edge weights encode local image information. In order to extract meaningful information from such collections, statistical methods such as entropy-based graph indices are particularly suited. 7.1.4 Texture

In image analysis, the term texture refers to the small-scale structure of image regions, and as such it has been an object of intensive investigation since the beginnings of digital image analysis. For example, [36–40] undertook approaches to define and analyze textures. 7.1.4.1 Complementarity of Texture and Shape

Real-world scenes often consist of collections of distinct objects, which, in the process of imaging, are mapped to regions in an image delineated by more or less sharp boundaries. While the geometric description of region boundaries is afforded by the concept of shape and typically involves large-scale structures, texture represents the appearance of the individual objects, either their surfaces in the case of reflection imaging (such as photography of opaque objects), or their interior if transmission-based imaging modalities (such as transmission microscopy, X-ray, and magnetic resonance) are being considered. Texture is then expressed in the distribution of intensities and their short-scale correlations within a region. A frequently used mathematical formulation of this distinction is the cartoontexture model that underlies many works on image restoration and enhancement, see for example, [41]. In this approach, (space-continuous) images are described as the sum of two functions: a cartoon component from the space BV of functions of bounded variation and a texture component from a suitable Sobolev space. In a refined version of this decomposition [42], noise is modeled as a third component assigned to a different function space. Note that also in image synthesis (computer graphics), the complementarity of shape and texture is used: Here, textures are understood as intensity maps that are mapped on the surfaces of geometrically described objects. The exact frontier between shape and texture information in a scene or image, however, is model-dependent. The intensity variation of a surface is partly caused by geometric details of that surface. With a coarse-scale modeling of shape, smallscale variations are included in the texture description, whereas with a refined modeling of shape, some of these variations become part of the shape information. For example, in the texture samples shown in Figure 7.1a and b, a geometric description with sufficiently fine granularity could capture individual leaves or blossoms as shapes, whereas the large-scale viewpoint treats the entire ensemble of leaves or blossoms as texture.

7.1

(a)

(b)

Figure 7.1 Left to right: (a) Texture patch flowers, 128 × 128 pixels. – (b) Texture patch leaves, same size. – (c) Test image composed from (a) and (b), 120 × 120 pixels. – Both texture patches are converted to gray scale, down scaled, and clipped from the VisTex database [43]. ©1995 Massachusetts

Introduction

(c) Institute of Technology. (Developed by Rosalind Picard, Chris Graczyk, Steve Mann, Josh Wachman, Len Picard, and Lee Campbell at the Media Laboratory, MIT, Cambridge, Massachusetts. Under general permission for scholarly use.)

7.1.4.2 Texture Models

Capturing texture is virtually never possible based on a single pixel. Only the simplest of all textures, homogeneous intensity, can be described by a single intensity. For all other textures, intensities within neighborhoods of suitable size (that differs from texture to texture) need to be considered to detect and classify textures. Moreover, there is a large variety of structures that can be constitutive of textures, ranging from periodic patterns in which the arrangement of intensity values follows strict geometric rules, via near- and quasi-periodic structures to irregular patterns, where just statistical properties of intensities within a neighborhood are characteristic of the texture. The texture samples in Figure 7.1a and b are located rather in the middle of the scale, where both geometric relations and statistics of the intensities are characteristic of the texture; near-periodic stripe patterns as in the later examples, Figures 7.4 and 7.5, are more geometrically dominated. With emphasis on different categories of textures within this continuum, numerous geometric and statistic approaches have been made over the decades to describe textures. For example, frequency-based models [44–46] emphasize the periodic or quasi-periodic aspect of textures. Statistics on intensities such as [38] mark the opposite end of the scale, whereas models based on statistics of image-derivative quantities such as gradients [37] or structure tensor entries [47] attempt to combine statistical with geometrical information. A concept that differs significantly from both approaches has been proposed in Ref. [48], where textures are described generatively via grammars. In addition, fractals [49] have been proposed as a means to describe, distinguish, and classify textures. Remember that a fractal is a geometric object, in fact, a topological space, for which it is possible to determine, at least locally, a Minkowski dimension (or almost identical, Hausdorff dimension), which differs from its topological dimension. Assume that the fractal is embedded in a

207

208

7 Graph Entropies in Texture Segmentation of Images

surrounding Euclidean space, and it is compact. Then, it can be covered by a finite number of boxes, or balls, of prescribed size. When the size of the boxes or balls is set to zero, the number of them needed to cover the structure grows with some power of the inverse box or ball size. The Minkowski dimension of the fractal is essentially the exponent in this power law. The Minkowski dimension of a fractal is often noninteger (which is the reason for its name); however, a more precise definition is that Minkowski and topological dimensions differ, which also includes cases such as the Peano curve, whose Minkowski dimension is an integer (2) but still different from the topological dimension (1). See also [50, 51] for different concepts of fractal dimension. Textured images can be associated with fractals by considering the image manifold, that is, the function graph if the image is modeled as a function over the image plane, which is naturally embedded in a product space of image domain and the range of intensity values. For example, a planar gray value image u ∶ ℝ2 ⊃ Ω → ℝ has the image manifold {(x, u(x))|x ∈ Ω} ⊂ ℝ3 . The dimension of this structure can be considered as a fractal-based texture descriptor. This approach has been stated in Ref. [52], where fractal dimension was put into relation with image roughness and the roughness of physical structures depicted in the image. Many works followed this approach, particularly in the 1980s and the early 1990s when fractals were under particularly intensive investigation in theoretical and applied mathematics. In Ref. [50], several of these approaches are reviewed. An attempt to analyze fractal dimension concepts for texture analysis is found in Ref. [53]. The concept has also been transferred to the analysis of 1D signals, see [54, 55]. During the last two decades, the interest in fractal methods has somewhat reduced, but research in the field remains ongoing as can be seen from more recent publications, see for example, [56] for signal analysis, [57, 58] for image analysis with application in material science. With regard to our analysis in Section 7.5 that leads to a relationship between graph methods and fractals, it is worth mentioning that already [59] linked graph and fractal methods in image analysis, albeit not considering texture, but shape description. 7.1.4.3 Texture Segmentation

The task of texture segmentation, that is, decomposing an image into several segments based on texture differences, has been studied for more than 40 years, see [38–40]. A great variety of different approaches to the problem have been proposed since then. Many of these combined generic segmentation approaches that could also be implemented for merely intensity-based segmentation, with sets of quantitative texture descriptors that are used as inputs to the segmentation. For example, [46, 47, 60, 61] are based on active contour or active region models for segmentation, whereas [62] is an example of a clustering-based method. Nevertheless, texture segmentation continues to challenge researchers; in particular, improvements on the side of texture descriptors are still desirable. Note that the task of texture segmentation involves a conflict: On the one hand, textures cannot be detected on single-pixel level, necessitating the inclusion of neighborhoods in texture descriptor computation. On the other hand, the

7.2

Graph Entropy-Based Texture Descriptors

intended output of a segmentation is a unique assignment of each pixel to a segment, which means to fix the segment boundaries at pixel resolution. In order to allow sufficiently precise location of boundaries, texture descriptors should therefore not use larger patches than necessary to distinguish the textures present in an image. The content of this chapter is centered on a set of texture descriptors that have been introduced in Ref. [2] based on graph representations of local image structure. This model seems to be the first that exploited graph models in discrimination of textures. Note that even texture segmentation approaches in the literature that use graph cuts for the segmentation task use non-graph-based texture descriptors to bring the texture information into the graph-cut formulation, see for example, [29]. Our texture segmentation approach that was already shortly demonstrated in Ref. [3] integrates instead graph-based texture descriptors into a non-graph-based segmentation framework, compare Section 7.3.

7.2 Graph Entropy-Based Texture Descriptors

Throughout the history of texture processing, quantitative texture descriptors have played an important role. Assigning a tuple of numerical values to a texture provides an interface to establish image-processing algorithms that were originally designed to act on intensities, and thereby to devise modular frameworks for image- processing tasks that involve texture information. Following this modular approach, we will focus on the texture segmentation task by combining a set of texture descriptors with a well-established image segmentation algorithm. In this section, we will introduce the texture descriptors, whereas the following section will be devoted to describing the segmentation method. Given the variety of different textures that exist in natural images, it cannot be expected that one single texture descriptor will be suitable to discriminate arbitrary textures. Instead, it will be sensible to come up with a set of descriptors that complement each other well in distinguishing different kinds of textures. To keep the set of descriptors at a manageable size, the individual descriptors should nevertheless be able to discriminate substantial classes of textures. On the contrary, it will be useful for both theoretical analysis and practical computation if the set of descriptors is not entirely disparate but based on some powerful common concept. In Ref. [2], a family of texture descriptors was introduced based on the application of several graph indices to graphs representing local image information. In combining six sets of graphs derived from an image, whose computation is based on common principles, with a number of different but related graph indices, this family of descriptors is indeed built on a common concept. The descriptors were evaluated in Ref. [2] in a simple texture discrimination task, and turned out to yield results competitive with Haralick features [36, 37], a well-established concept in texture analysis. In this comparison, graph indices based on entropy measures stood out by their texture discrimination rate.

209

210

7 Graph Entropies in Texture Segmentation of Images

In the following, we recall the construction of texture descriptors from [2], focusing on a subset of the descriptors discussed there. The first step is the construction of graphs from image patches. In the second step, graph indices are computed from these graphs. 7.2.1 Graph Construction

A discrete grayscale image is given as an array of real intensity values sampled at the nodes of a regular grid. The nodes are points (xi , yj ) in the plane, where xi = x0 + ihx , yj = y0 + jhy . The spatial mesh sizes hx and hy are often assumed to be 1 in image processing, which we will do also here for simplicity. Denoting the intensity values by ui,j and assuming that i ∈ {0, … , nx }, j ∈ {0, … , ny }, the image is then described as the array u = (ui,j ). The nodes of the grid, thus the pixels of the image, can naturally be considered as vertices of a graph in which neighboring pixels are connected by edges. We will call this graph the pixel graph Gu of the image. Two common choices for what pixels are considered as neighbors are based on 4-neighborhoods, in which pixel (i, j) has two horizontal neighbors (i ± 1, j) and two vertical neighbors (i, j ± 1), or 8-neighborhoods, in which also the four diagonal neighbors (i ± 1, j ± 1) are included in the neighborhood. Although the 4-neighborhood setting leads to a somewhat simpler pixel graph (particularly, it is planar), 8-neighborhoods are better suited to capture the geometry of the underlying (Euclidean) plane. In this chapter, we will mostly use 8-neighborhoods. For more variants of graphs assigned to images, we refer to [26, Sec. 1.5]. We upgrade the pixel graph to an edge-weighted pixel graph Gw by defining edge weights wp,q for neighboring pixels p, q via wp,q ∶= (||p − q||2 + 𝛽 2 |up − uq |2 )1∕2 ,

(7.1)

that is an l2 sum of the spatial distance of grid nodes ||p − q|| (where || ⋅ || denotes the Euclidean norm), and the contrast |up − uq | of their corresponding intensity values, weighted by a positive contrast scale 𝛽. This construction can in a natural way be generalized by replacing the Euclidean norm in the image plane, and the l2 sum by other norms. With various settings for these norms, it has been used in Refs [63–66] and further works to construct larger spatially adaptive neighborhoods in images, the so-called morphological amoebas. See also [2] for a more detailed description of the amoeba framework in a graph-based terminology. All graphs that will enter the texture descriptor construction are derived from the pixel graph or the edge-weighted pixel graph of the image. First, given a pixel p and radius 𝜚 > 0, we define the Euclidean patch graph GwE (p, 𝜚) as the subgraph of Gw , which includes all nodes q with the Euclidean distance ||q − p|| ≤ 𝜚. In this graph, the image information is encoded solely in the edge weights. Second, we define the adaptive patch graph GwA (p, 𝜚) as the subgraph of Gw , which includes all nodes q for which Gw contains a path from p to q with total weight less than or equal to 𝜚. In the terminology of [63–66], the node set of GwA (p, 𝜚) is a morphological amoeba of radius 𝜚 around p, which we will denote

7.2

Graph Entropy-Based Texture Descriptors

by 𝜚 (p). Note that the graph GwA (p, 𝜚) encodes image information not only in its edge weights, but also in its node set 𝜚 (p). One obvious way to compute 𝜚 (p) is by Dijkstra’s shortest path algorithm [67] with p as the starting point. A natural by-product of this algorithm, which is not used in amoeba-based image filtering as in Ref. [63], is the Dijkstra search tree, which we denote as TwA (p, 𝜚). This is the third candidate graph for our texture description. Image information is encoded in this graph in three ways: the edge weights, the node set, and the connectivity of the tree. Dropping the edge weights from TwA (p, 𝜚), we obtain an unweighted tree A Tu (p, 𝜚), which still encodes image information in its node set and connectivity. Finally, a Dijkstra search tree TwE (p, 𝜚) and its unweighted counterpart TuE (p, 𝜚) can be obtained by applying Dijkstra’s shortest path algorithm within the Euclidean patch graph GwE (p, 𝜚). Whereas TwE (p, 𝜚) encodes image information in the edge weights and connectivity, TuE (p, 𝜚) does so only in the connectivity. Applying these procedures to all pixels p = (i, j) of a discrete image u, we therefore have six collections of graphs, which represent different combinations of three cues to local image information (edge weights, node sets, and connectivity) and can therefore be expected to be suitable for texture discrimination. In the following, we will drop the arguments p, 𝜚 and use simply GwA etc. to refer to the collections of graphs. 7.2.2 Entropy-Based Graph Indices

In order to turn the collections of graphs into quantitative texture descriptors suitable for texture analysis tasks, the realm of graph indices developed in quantitative graph theory lends itself as a powerful tool. In Ref. [2], a selection of graph indices was considered for this purpose, including, on the one hand, distance-based graph indices (the Wiener index [8], the Harary index [7], and the Balaban index [68]) and on the other hand, entropybased indices (Bonchev-Trinajsti´c indices [4, 13] and Dehmer entropies [69]). We emphasize that the selection of graph indices used to construct texture descriptors in Ref. [2] is not exhaustive. Degree-based indices like the Randi´c index and Zagreb index [70] as well as spectrum-based indices such as the graph energy [71] are further well-studied classes of graph indices. Their potential for texture analysis might be a subject of further research. Regarding recent developments in the theoretical analysis of entropy-based graph indices, let us mention [72] and [73]; the latter work is also of interest for combining entropy ideas with the concept of the Randi´c index. The so-obtained set of 42 texture descriptors was evaluated in Ref. [2] with respect to their discrimination power and diversity. Using nine textures from a database [43], texture discrimination power was quantified based on simple statistics (mean value and standard deviation) of the descriptor values within singletexture patches, calibrating thresholds for certain and uncertain discrimination of textures within the set of textures. Diversity of descriptors was measured based

211

212

7 Graph Entropies in Texture Segmentation of Images

on the overlap in the sets of texture pairs discriminated by different descriptors. Despite the somewhat ad hoc character of the threshold calibration, the study provides valuable hints for the selection of powerful subsets of the 42 texture descriptors. Among the descriptors being analyzed, particularly the entropy-based descriptors are ranked medium to high regarding discrimination power for the sample set of textures. For the present work, we focus therefore on three entropy measures, which we will recall in the following, namely the Dehmer entropies If V and If P E

as well as Bonchev and Trinajsti´c’s mean information on distances I D . The latter is restricted by its construction to unweighted graphs, and is therefore used with the unweighted Dijkstra trees TuA and TuE . The Dehmer entropies can be combined with all six graph collections. In Ref. [2], the Dehmer entropies on the patch graphs GwA and GwE achieved the highest rates of certain discrimination of textures, and outperformed the Haralick features included in the study. Some of the other descriptors based on Dehmer entropies as well as the Bonchev-Trinajsti´c information measures achieved middle ranks, combining medium rates of certain discrimination with uncertain discrimination of almost all other texture pairs; thereby, they were still comparable to Haralick features and distance-based graph indices. 7.2.2.1 Shannon’s Entropy

The measures considered here are based on Shannon’s entropy [74] H(p) = −

n ∑

pi ld pi

(7.2)

i=1

that measures the information content of a discrete probability measure ∑n p ∶ {1, · · · , n} → ℝ+0 , i → pi , i=1 pi = 1. (Note that for pi = 0, one has to set in (7.2) pi ldpi = 0 by limit.) Following [69], a discrete probability measure can be assigned to an arbitrary nonnegative-valued function f ∶ {1, … , n} → ℝ+0 , an information functional, via f pi ∶= ∑n i

j=1 fj

.

(7.3)

An entropy measure on an arbitrary information functional f is then obtained by applying (7.2) to (7.3). 7.2.2.2 Bonchev and Trinajsti´c’s Mean Information on Distances

Introduced in Ref. [4] and further investigated in Ref. [13], the mean information on distances is the entropy measure resulting from an information functional on the path lengths in a graph. Let a graph G with n vertices v1 , … , vn be given, and let d(vi , vj ) denote the length of the shortest path from vi to vj in G (unweighted, i.e., each edge counting 1). Let D(G) ∶= max d(vi , vj ) be the diameter of G. For each d ∈ {1, · · · , D(G)}, let

i,j

kd ∶= #{(i, j)|1 ≤ i < j ≤ n, d(i, j) = d}.

(7.4)

7.2

Graph Entropy-Based Texture Descriptors

Then, the mean information on distances is the entropy measure based on the information functional kd , that is ∑ kd kd ( ) ld ( ) .

D(G)

E

I D (G) = −

n 2

d=1

(7.5)

n 2

Let us shortly mention that [4] also introduces the mean information on realized W distances I D (G), which we will not further consider here. As an entropy measure, W I D (G) can be derived from the information functional d(vi , vj ) on the set of all W

vertex pairs (vi , vj ), 1 ≤ i < j ≤ n, of G. As pointed out in Ref. [2], I D (G) can be generalized directly to edge-weighted graphs by measuring distances d(vi , vj ) with E

edge weights, but a similar generalization of I D (G) would be degenerated, because in generic cases, all edge-weighted distances in a graph will be distinct, leading to kd = 1 for all realized values d. Therefore, we will use the mean information on E

distances I D (G) only with the unweighted graphs TuE and TuA . 7.2.2.3 Dehmer Entropies

The two entropy measures If V (G) and If P (G) for unweighted graphs G were introduced in Ref. [69]. Their high discriminative power for large sets of graphs was impressively demonstrated in Ref. [14]. Both measures rely on information functionals on the vertex set {v1 , … , vn } of G, whose construction involves spheres Sd (vi ) of varying radius d around vi . Note that the sphere Sd (vi ) in G is the set of vertices vj with d(vi , vj ) ≤ d. For If V , the information functional f V on vertices vi of an unweighted graph G is defined as [69] (D(G) ) ∑ V f (vi ) ∶= exp cd sd (j) , (7.6) d=1

where (7.7)

sd (j) ∶= #Sd (vj )

is the cardinality of the d-sphere around vj , with positive parameters c1 , … , cD(G) . (Note that [69] used a general exponential with base 𝛼. For the purpose of the present chapter, however, this additional parameter is easily eliminated by multiplying the coefficients ci with ln 𝛼.) For If P , the information functional f P relies on the quantities ∑ d(vi , vj ), (7.8) ld (i) ∶= j∶vj ∈Sd (vi )

that is, ld (i) is the sum of distances from vi to all points in its d-sphere. With similar parameters c1 , … , cD(G) as before, one defines (D(G) ) ∑ P f (vi ) ∶= exp cd ld (j) . (7.9) d=1

213

214

7 Graph Entropies in Texture Segmentation of Images

As pointed out in Ref. [2], both information functionals, and thus the resulting entropy measures If V , If P , can be adapted to edge-weighted graphs G via ( n ) ∑ V f (vi ) = exp C(d(vi , vj )) , (7.10) j=1

P

f (vi ) = exp

( n ∑

) C(d(vi , vj ))d(vi , vj ) ,

(7.11)

j=1

where distances d(vi , vj ) are now measured using the edge weights, and C ∶ [0, D(G)] → ℝ+0 is a decreasing function interpolating a reverse partial sum series of the original cd coefficients. Further following [2], we focus on the specific choice cd = qd ,

q ∈ (0, 1)

(7.12)

(an instance of the exponential weighting scheme from [14]) and obtain accordingly C(d) = Mqd with a positive constant M, which yields ( ) n ∑ d(vi ,vj ) V q , (7.13) f (vi ) = exp M ( P

f (vi ) = exp M

j=1 n ∑

) q

d(vi ,vj )

d(vi , vj ) ,

(7.14)

j=1

with a positive constant M, as the final form of the information functionals for our construction of texture descriptors. 7.3 Geodesic Active Contours

We use for our experiments a well-established segmentation method based on PDEs. Introduced in Refs [75, 76], GACs perform a contrast-based segmentation of a (grayscale) input image f . In fact, other contrast-based segmentation methods could be chosen, including clustering [77, 78] or graph-cut methods [27, 79]. Advantages or disadvantages of these methods in connection with graph entropy-based texture descriptors may be studied in the future works. For the time being, we focus on the texture descriptors themselves, thus it matters to use just one well-established standard method. 7.3.1 Basic GAC Evolution for Grayscale Images

From the input image f , a Gaussian-smoothed image f𝜎 ∶= G𝜎 ∗ f is computed, where G𝜎 is a Gaussian kernel of standard deviation 𝜎. From f𝜎 , one computes an edge map g(|𝛁f𝜎 |) with the help of a decreasing and bounded function g ∶ ℝ+0 → ℝ+0 with lims→∞ g(s) = 0. A popular choice for g is g(s) =

1 1 + s2 ∕𝜆2

(7.15)

7.3

Geodesic Active Contours

which has originally been introduced by Perona and Malik [80] as a diffusivity function for nonlinear diffusion filtering of images. Herein, 𝜆 > 0 is a contrast parameter that acts as a threshold distinguishing high gradients (indicating probable edges) from small ones. In addition to the input image, GAC require an initial contour C0 (a regular closed curve) specified, for example, by user input. This contour is embedded into a level set function u0 , that is, u0 is a sufficiently smooth function in the image plane whose zero-level set (set of all points (x, y) in the image plane for which u0 (x, y) = 0) is the given contour. For example, u0 can be introduced as a signed distance function: u0 (x, y) is zero if (x, y) lies on C0 ; it is minus the distance of (x, y) to C0 if (x, y) lies in the region enclosed by C0 , and plus the same distance if (x, y) lies in the outer region. One takes then u0 as initial condition at time t = 0 for the parabolic PDE ) ( 𝛁u (7.16) ut = |𝛁u|div g(|𝛁f𝜎 |) |𝛁u| for a time-dependent level-set function u(x, y, t). At each time t ≥ 0, an evolved contour can be extracted from u(⋅, ⋅, t) as zero level set. For suitable input images and initializations and with appropriate parameters, the contours lock in at a steady state that provides a contrast-based segmentation. In order to understand equation (7.16), one can compare it to the curvature motion equation ut = div(𝛁u∕|𝛁u|) that would evolve all level curves of u by an inward movement proportional to their curvature. In (7.16), this inward movement of level curves is modulated by the edge map g(|𝛁f𝜎 |), which slows down the curve displacement at high-contrast locations, such that contours stick there. The name GACs is due to the fact that the contour evolution associated with (7.16) can be understood as gradient descent for the curve length of the contour in an image-adaptive metric (a Riemannian metric whose metric tensor is g(|𝛁f𝜎 |) times the unit matrix), thus yielding a final contour that is a geodesic with respect to this metric. 7.3.2 Force Terms

In their pure form (7.16), GACs require the initial contour (at least most of it) to be placed outside the object to be segmented. In some situations, however, it is easier to specify an initial contour inside an object, particularly if the intensities within the object are fairly homogeneous, but many spurious edges irritating the segmentation exist in the background region. Moreover, despite being able to handle also topology changes such as a splitting from one to several level curves encircling distinct objects to some extent, it has limitations when the topology of the segmentation becomes too complex. As a remedy to both difficulties, one can modify (7.16) by adding a force term 𝜈g(|𝛁f𝜎 |)|𝛁f𝜎 | to its right-hand side. This type of addition was proposed first in Ref. [81] (by the name of balloon force), whereas the specific form of the force term weighted with g was proposed in Refs [75, 76, 82]. Depending on the sign of 𝜈,

215

216

7 Graph Entropies in Texture Segmentation of Images

this force exercises an inward (𝜈 > 0) or outward (𝜈 < 0) pressure on the contour, which (i) speeds up the curve evolution, (ii) supports the handling of complex segmentation topologies, and (iii) enables for 𝜈 < 0 also segmentation of objects from initial contours placed inside. The modified GAC evolution with force term, ) ( ( ) 𝛁u ut = |𝛁u| div g(|𝛁f𝜎 |) + 𝜈 g(|𝛁f𝜎 |) (7.17) |𝛁u| will be our segmentation method when performing texture segmentation based on only one quantitative texture descriptor. In this case, the texture descriptor will be used as an input image f , from which the edge map g is computed. 7.3.3 Multichannel Images

It is straightforward to extend the GAC method, including its modified version with force term to multichannel input images f , where each location (x, y) in the image plane is assigned an r-tuple (f1 (x, y), … , fr (x, y)) of intensities. A common case, with r = 3, are RGB color images. In fact, equations (7.16) and (7.17) incur almost no change as even for multichannel input images, one computes the evolution of a simple real-valued level-set function u. What is changed is the computation of the edge map g: Instead of the gradient norm |𝛁f𝜎 |, one uses the Frobenius norm ||𝐃f 𝜎 || of the Jacobian 𝐃f 𝜎 , where f 𝜎 is the Gaussian-smoothed input image, f 𝜎 = (f𝜎;1 , … , f𝜎;r ) with f𝜎;i = G𝜎 ∗ fi , yielding g(||𝐃f 𝜎 ||) as edge map. Equation (7.17) with this edge map will be our segmentation method when performing texture segmentation with multiple texture descriptors. The input image f will have the individual texture descriptors as channels. In order to weigh the influence of texture descriptors, the channels may be multiplied by scalar factors. 7.3.4 Remarks on Numerics

For numerical computation, we rewrite PDE (7.17) as ) ( 𝛁u + ⟨𝛁g, 𝛁u⟩ + 𝜈 g|𝛁u| ut = g|𝛁u|div |𝛁u|

(7.18)

(where we have omitted the argument of g, which is a fixed input function to the PDE anyway). Following established practice, we use an explicit (Euler forward) numerical scheme, where the right-hand side is spatially discretized as follows. The first term, g|𝛁u|div(𝛁u∕|𝛁u|), is discretized using central differences. For the second term, ⟨𝛁g, 𝛁u⟩, an upwind discretization [83, 84] is used, in which the upwind direction for u is determined based on the central difference approximations of 𝛁g. The third term, 𝜈 g|𝛁u|, is discretized with an upwind discretization, too. Here, the upwind direction depends on the components of 𝛁u and the sign of 𝜈.

7.4

Texture Segmentation Experiments

Although a detailed stability analysis for this widely used type of explicit scheme for the GAC equation seems to be missing, the scheme works for time step sizes 𝜏 up to approximately 0.25 (for spatial mesh sizes of hx = hy = 1) for 𝜈 = 0, which needs to be reduced somewhat for nonzero 𝜈. In our experiments in Section 7.4, we use consistently 𝜏 = 0.1. For the level-set function u, we use the signed distance function of the initial contour as initialization. Since during the evolution the shape of the level-set function changes, creating steeper ascents in some regions but flattening slopes elsewhere, we re-initialize u to the signed distance function of its current zerolevel set every 100 iterations.

7.4 Texture Segmentation Experiments

In this section, we present experiments on two synthetic and one real-world test image, which demonstrate that graph entropy-based texture descriptors can be used for texture-based segmentation. An experiment similar to our second synthetic example was already presented in Ref. [3]. 7.4.1 First Synthetic Example

In our first example, we use a synthetic image, shown in Figure 7.1c, which is composed of two textures, see Figure 7.1a an b, with a simple shape (the letter “E”) switching between the two. Note that the two textures were also among the nine textures studied in Ref. [2] for the texture discrimination task. With its use of real-world textures, this synthetic example mimicks a realistic segmentation task. Its synthetic construction warrants at the same time a ground truth to compare segmentation results with. Figure 7.2 shows the results of eight graph entropy-based texture descriptors for the test image. In particular, the combination of the Dehmer entropy If V with all six graph variants from Section 7.2.1 is shown as well as If P on the weighted Dijkstra trees in nonadaptive and adaptive patches. Patch radii were fixed to 𝜚 = 5 for both nonadaptive and adaptive patches, whereas the contrast scale was chosen as 𝛽 = 0.1. These parameter settings have already been used in Ref. [2]; they are based on values that work across various test images in the context of morphological amoeba image filtering. Further investigation of variation of these parameters is left for future work. Visual inspection of Figure 7.2 indicates that for this specific textured image, the entropy measure If V separates the two textures well, particularly when combined with the weighted Dijkstra tree settings, in both adaptive and nonadaptive patches, see frames (b) and (f ). The other If V results in frames (c), (e), and (g) show insufficient contrast along some parts of the contour of the letter “E.” The index If V (GwE ) in frame (a), which was identified in Ref. [2] as a descriptor with

217

218

7 Graph Entropies in Texture Segmentation of Images

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

Figure 7.2 Graph entropy-based texture descriptors applied to the test image from Figure 7.1c (values rescaled to [0, 255]). Patch radii were fixed to 𝜚 = 5, contrast scale to 𝛽 = 0.1, weighting parameter

to q = 0.1. Top row, left to right: (a) If V (GEw ). – (b) If V (TwE ). – (c) If V (TuE ). – (d) If P (TwE ). – Bottom row, left to right: (e) If V (GAw ). – (f) If V (TwA ). – (g) If V (TuA ). – (h) If P (TwA ).

high texture discrimination power, does not distinguish these two textures clearly but creates massive oversegmentation within each of them. In a sense, this oversegmentation is the downside of the high texture discrimination power of the descriptor. Note, however, that also other GwE -based descriptors tend to this type of oversegmentation. Regarding the If P index, Figure 7.2 (d) and (h), there is a huge difference between the adaptive and nonadaptive patch settings. Distinction of the two textures is much better when using nonadaptive patches. Finally, we show in Figure 7.3 GAC segmentation of the test image with the descriptor If V (TwA ). We start from an initial contour inside the “E” shape, see Figure 7.3a, and use an expansion force (𝜈 = −1) to drive the contour evolution in an outward direction. Frames (b) and (c) show two intermediary stages of the evolution, where it is evident that the contour starts to align with the boundary between the two textures. Frame (d) shows the steady state that was reached after 4900 iterations (t = 490). Here, the overall shape of the letter “E” is reasonably approximated, with deviations coming from small-scale texture details. Precision of the segmentation could be increased slightly by combining more than one texture descriptor. We do not follow this direction at this point. 7.4.2 Second Synthetic Example

In our second experiment, Figure 7.4, we use again a synthetic test image, where foreground and background segments are defined using the letter “E” shape such

7.4

(a)

(b)

(c)

Texture Segmentation Experiments

(d)

Figure 7.3 Geodesic active contour segmentation of the image shown in Figure 7.1c. The edge map is computed from the graph entropy-based texture descriptor If V (TwA ) from Figure 7.2f using presmoothing with 𝜎 = 2,

Perona–Malik edge-stopping function (7.15) with 𝜆 = 0.1, with expansion force 𝜈 = −1. Left to right: (a) Initial contour (t = 0). – (b) t = 40. – (c) t = 160. – (d) t = 490 (steady state).

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

Figure 7.4 Texture segmentation of a synthetic image. Top row, left to right: (a) Test image (80 × 80 pixels) showing a stripe-textured shape in front of a noise background. – (b–d) Texture descriptors based on graph entropies applied in adaptive patches, 𝜚 = 5, 𝛽 = 0.1, q = 0.1; values rescaled to [0, 255]. (b) If V (GAw ). – (c)

If V (TwA ). – (d) If V (TuA ). – Bottom row: Geodesic active contour segmentation of (a) using the texture descriptor (If V on GAw ) from (b), same parameters as in Figure 7.2 except for 𝜎 = 1. Left to right: (e) Initial contour (t = 0). – (f) t = 10. – (g) t = 30. – (h) t = 110 (steady state).

that again the desired segmentation result is known as a ground truth. Also in this image, we combine two realistic textures, which can be seen as a simplified version of the foreground and background textures of the real-world test image, Figure 7.5a, used in the next section. This time, the foreground is filled with a stripe pattern, whereas the background is noise with uniform distribution in the intensity range [0, 255], see Figure 7.4a. In frames (b)–(d) of Figure 7.4, we show the texture descriptors based on If V with the three graph settings in adaptive patches, using again 𝜚 = 5 and 𝛽 = 0.1. The descriptor If V (TwA ) that was used for the segmentation in Section 7.4.1 visually does not distinguish foreground from background satisfactorily here, whereas If V (GwA ) that provided no clear distinction of the two textures in Section 7.4.1 clearly stands out here. This underlines the necessity of

219

220

7 Graph Entropies in Texture Segmentation of Images

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

(i)

Figure 7.5 Texture segmentation of a realworld image. Top row, left to right: (a) Photograph of a zebra (320 × 240 pixels), converted to gray scale. (Original image from http://en.wikipedia.org/wiki/File:Grants_Zebra .jpg, author Ruby 1x2, released to public domain.) – (b) Texture descriptor If V (GEw ), 𝜚 = 7, 𝛽 = 0.1, q = 0.1. – (c) Texture descriptor If V (TwE ), parameters as in (b). – Middle row, left to right: (d) Texture descriptor If P (TwE ), parameters as in (b). – (e) Texture descriptor

E

ID (TuA ), parameters as in (b). – (f) Initial contour for geodesic active contour segmentation (t = 0). – Bottom row, left to right: (g) Geodesic active contours with edge-stopping function computed from the texture descripE

tors If P (TwE ) shown in (d) and ID (TuA ) shown in (e) with 𝜎 = 7, Perona–Malik edge-stopping function, 𝜆 = 0.014, t = 100. – (h) Same as (g) but t = 400. – (i) Same as (g) but t = 1 340 (steady state).

considering multiple descriptors, which complement each other in distinguishing textures. Our GAC segmentation of the test image shown in frames (e)–(h) is based on the texture descriptor If V (GwA ) and quickly converges to a fairly good approximation of the segment boundary. 7.4.3 Real-World Example

In our last experiment, Figure 7.5, we consider a real-world image showing a zebra, see frame (a). In a sense, this experiment resembles the synthetic case from Section 7.4.2, because again a foreground dominated by a clear stripe pattern is to be distinguished from a background filled with small-scale detail. In frames (b)–(e),

7.5

Analysis of Graph Entropy-Based Texture Descriptors

four texture descriptors are shown. With regard to the higher resolution of the test image, the patch radius has been chosen slightly larger than that in the previous examples, 𝜚 = 7, whereas 𝛽 = 0.1 was retained. As can be seen in frame (b), If V (GwE ) shows the same kind of oversegmentation behavior as observed in Section 7.4.1; however, it also separates a large part of the zebra shape well from the background. The second descriptor, If V (TwE ) in frame (c), appears unsuitable here, because it does not yield sufficiently similar values within the black and white E stripes to recognize these as a common texture. By contrast, If P (TwE ) and I D (TuA ) in Figure 7.5(d) and (e), respectively, achieve this largely. Our GAC segmentation in frames (f )–(i) uses a larger Gaussian kernel for presmoothing than before, 𝜎 = 7, to flatten out small-scale inhomogeneities in the texture descriptors, and combines the two descriptors from (d) and (e). With these data, a large part of the zebra, including the head and front parts of the torso, is segmented in the final steady state. Not included are the rear part and the forelegs. Note that in the foreleg part, the stripes are much thinner than those in the segmented region, apparently preventing the recognition of this texture as a continuation of the one from the head and front parts of the torso. By contrast, the rear part of the torso shows very thick stripes, which, under the patch size chosen, decompose into separate (homogeneous) textures for black and white stripes, as is also visible in the texture descriptors (d) and (e) themselves. Further investigation of parameter variations and inclusion of more texture descriptors might improve this preliminary result in the future.

7.5 Analysis of Graph Entropy-Based Texture Descriptors

In this section, we undertake an attempt to analyze the texture descriptors based on the entropy measures If V and If P , focusing on the question what properties of textures are actually encoded in their information functionals f V and f P . Part of this analysis is on a heuristic level at the present stage of research, and future work will have to be invested to add precision to these arguments. This applies to the limiting procedure in Section 7.5.2 as well as to the concept of local fractal dimension arising in Section 7.5.3. We believe, however, that even in its present shape, the analysis provided in the following gives valuable intuition about the principles underlying our texture descriptors. 7.5.1 Rewriting the Information Functionals

For the purpose of our analysis, we generalize the information functional f V from (7.6) directly to edge-weighted graphs by replacing the series sd of cardinalities from (7.7) with the monotone-increasing function s ∶ [0, ∞) → ℝ, s(d) ∶= vol(Sd (vj ))

(7.19)

221

222

7 Graph Entropies in Texture Segmentation of Images

that measures volumes of spheres with arbitrary radius. Assuming the exponential weighting scheme (7.12) and large D(G), this yields ( ∞ ) qd s(d)dd . (7.20) f V (vi ) ≈ exp ∫0 An analogous generalization of (7.9) is ( ∞ ) P d f (vi ) ≈ exp q 𝑑𝑠(d)dd . ∫0

(7.21)

7.5.2 Infinite Resolution Limits of Graphs

We assume now that the image is sampled successively on finer grids, with decreasing hx = hy =∶ h. Note that the number of vertices of any region of the edge-weighted pixel graph, or any of the derived edge-weighted graphs introduced in Section 7.2.1, grows in this process with h−2 . By using the volumes of spheres rather than the original cardinalities, (7.19) provides a renormalization that compensates this effect in (7.20). Thus, it is possible to consider the limit case h → 0. In this limit case, graphs turn into metric spaces representing the structure of a space-continuous image. In addition, these metric spaces are endowed with a volume measure, which is the limit case of the discrete measures on graphs given by vertex counting. In simple cases, these metric spaces with volume measure can be manifolds. For example, for a homogeneous gray image without any contrast, the limit of the edge-weighted pixel graph is an approximation to a plane, that is, a two-dimensional manifold. For an image with extreme contrasts in one direction, for example, a stripe pattern, the edge-weighted pixel graphs will be path graphs, resulting in a metric space as limit, which is essentially a one-dimensional manifold. Finally, in the extreme case of a noise image, in which neighboring pixels have almost nowhere similar gray-values, the graph will practically decompose into numerous isolated connected components, corresponding to a discrete space of dimension 0. For more general textured images, the limit spaces will possess a more complicated topological structure. At the same time, it remains possible, in fact, to measure volumes of spheres of different radii in these spaces. Clearly, sphere volumes will increase with sphere radius. If they fulfil a power law, the (possible noninteger) exponent can immediately be interpreted as a dimension. The space itself is then interpreted as some type of fractal [49]. The dimension concept underlying here is almost that of the Minkowski dimension (closely related to Hausdorff dimension) that is frequently used in fractal theory, with the difference that the volume measure here is inside the object being measured instead of in an embedding space. On the basis of the above reasoning, values of the dimension will range between 0 and 2.

7.5

Analysis of Graph Entropy-Based Texture Descriptors

Note that even in situations in which there is no global power law for the sphere volumes, and therefore no global dimension, power laws, possibly with varying exponents, will still be approximated for a given sphere center in suitable ranges of the radius, thus allowing to define the fractal dimension as a quantity varying in space and resolution. This resembles the situation with most fractal concepts being applied to real-world data: the power laws that are required to hold for an ideal fractal across all scales will be found only for certain ranges of scales in reality. Dijkstra trees, too, turn into one-dimensional manifolds in the case of sharp stripe images; for other cases, they will also yield fractals. Fractal dimensions, wherever applicable, will be below those observed with the corresponding full edge-weighted pixel graphs; thus, the relevant range of dimensions is again bounded by 0 from below and 2 from above. One word of care must be said at this point. The fractal structures obtained here as limit cases of graphs for h → 0 are not identical with the image manifolds, whose fractal structures are studied as a means of texture analysis in Refs [50, 52, 53] and others. In fact, fractal dimensions of the latter, measured as Minkowski dimensions by means of the embedding of the image manifold of a grayscale image in three-dimensional Euclidean space, range from 2 to 3 with increasing roughness of the image, whereas the dimensions measured in the present work decrease from 2 to 0 with increasing image roughness. Whereas it can be conjectured that these two fractal structures are related, future work will be needed to gain clarity about this relationship. 7.5.3 Fractal Analysis

On the basis of the findings from the previous section, let us now assume that the limit h → 0 from one of the graph structures results in a measured metric space F of dimension 𝛿 ∈ [0, 2], in which sphere volumes are given by the equation s(d) = d𝛿 U(𝛿),

(7.22)

where U(𝛿) =

𝜋 𝛿∕2 Γ(𝛿∕2 + 1)

(7.23)

is the volume of a unit sphere, Γ denoting the Gamma function. Thus, we assume that s(d) interpolates the sphere volumes of Euclidean spaces for integer 𝛿. Note that this assumption has indeed two parts. First, (7.22) means that a volume measure on the metric space F exists that behaves homogeneously with degree 𝛿 with regard to distances. In the manifold case (integer 𝛿), this is the case of vanishing curvature; for general manifolds of integer dimension 𝛿, (7.22) would hold as an approximation for small radii.

223

224

7 Graph Entropies in Texture Segmentation of Images

The second assumption, (7.23), corresponds to the Euclideanness of the metric. For edge-weighted pixel graphs based on 4- or 8-neighborhoods, the volume of unit spheres actually deviates from (7.23), even in the limit. However, with increasing neighborhood size, (7.23) is approximated better and better. Most of the following analysis does not depend specifically on (7.23); thus, we will return to (7.23) only later for numerical evaluation of information functionals. With (7.22), we have ∞

∫0



qd s(d)dd = U(𝛿)

∫0

exp(d ln q)d𝛿 dd

= (− ln q)𝛿+1 U(𝛿)



∫0

(7.24)

exp(−w)w𝛿 dw

= (− ln q)𝛿+1 U(𝛿)Γ(𝛿 + 1)

(7.25)

where we have substituted w ∶= −d ln q. As a result, we obtain f V (vi ) ≈ exp((− ln q)𝛿+1 U(𝛿)Γ(𝛿 + 1)).

(7.26)

P

Analogous considerations for f from (7.21) lead to f P (vi ) ≈ exp((− ln q)𝛿+2 U(𝛿)Γ(𝛿 + 2)).

(7.27)

As pointed out earlier, the metric structure of the fractal F will, in general, be more complicated such that it does not possess a well-defined global dimension. However, such a dimension can be measured at each location and scale. The quantities f V and f P as stated in (7.26) and (7.27) can then be understood as functions of the local fractal dimension in a neighborhood of vertex vi , where the size of the neighborhood – the scale – is controlled by the decay of the function qd in the integrands of (7.20) and (7.21), respectively. As a result, we find that the information functionals f V and f P represent distributions over the input pixels of an image patch (nonadaptive or adaptive), in which the pixels are assigned different weights depending on a local fractal dimension measure. The entropies If V and If P then measure the local homogeneity or inhomogeneity of this dimension distribution: For very homogeneous dimension values within a patch, the density resulting from each of the information functionals f V , f P will be fairly homogeneous, implying high entropy. The more the dimension values are spread out, the more will the density be dominated by a few pixels with high values of f V or f P , thus yielding low entropy. The precise dependency of the entropy on the dimension distribution will be slightly different for f V and f P , and will also depend on the choice of q. Details of this dependency will be a topic of future work. To give a basic intuition, we present in Figure 7.6 graphs of ln f V and ln f P as functions of the dimension 𝛿 ∈ [0, 2] for selected values of q. In computing these values, the specific choice (7.23) for U(𝛿) has been used. In the left column, frames (a) and (b), we use q = 0.1 as in our experiments in Section 7.4. Here, both ln f V and ln f P increase drastically by almost 10 from 𝛿 = 0

2.2

70

2

60

1.8

50

1.6

1

10

0.8

0

0.6

0

0.5

1

1.5

2

δ

4.5 4

60

3.5

0

0.5

1

1.5

0.24

2

30

2.5

ln f P

ln f P

40

2 1.5

20

1

10

0.5 0

0.5

1

δ

1.5

0

2

(e)

0

0.5

1

δ

0

0.5

1.5

2

0.32 0.3 0.28 0.26 0.24 0.22 0.2 0.18 0.16 0.14 0.12

(f)

1

1.5

2

1.5

2

δ

(c)

3

50

ln f P

0.26

δ

70

0.3 0.28

(b)

80

0

0.32

1.4

20

(a)

(d)

0.34

1.2

30

225

0.36

ln f

40

Analysis of Graph Entropy-Based Texture Descriptors

V

80

ln f V

ln f V

7.5

0

0.5

1

δ

Figure 7.6 Information functionals (in logarithmic scale) as functions of local dimension. Top row: ln f V . Left to right: (a) q = 0.1. – (b) q = 0.5. – (c) q = 0.7. – Bottom row: ln f P . Left to right: (d) q = 0.1. – (e) q = 0.5. – (f) q = 0.7.

to 𝛿 = 1 and even by 70 from 𝛿 = 1 to 𝛿 = 2. In the resulting information functionals f V and f P , that is, after applying exp to the functions shown in the figure, even pixels with only slightly higher values of the dimension strongly dominate the entire information density within the patch. For increasing q, the rate of increment in f V and f P with 𝛿 becomes lower. For q = 0.5, as shown in the second column, frames (b) and (e), of Figure 7.6, the variation of ln f V and ln f P is already reduced to 2 and 4, respectively, such that vertices across the entire dimension range [0, 2] will have a relevant effect on the information density. For even larger q, the dependency of f V and f P on 𝛿 becomes nonmonotonic (as shown in (c) for f V with q = 0.7) and even monotonically decreasing (for both f V and f P at q = 0.9; not shown). It will therefore be interesting for further investigation to evaluate also the texture discrimination behavior of the entropy measures for varying q, as this may be a way to target the sensitivity of the measures specifically at certain dimension ranges. In this context, however, it becomes evident that the parameter q plays two different roles at the same time. First, it steers the approximate radius of influence for the fractal dimension estimation. Here, it is important that this radius of influence is smaller than the patch size underlying the graph construction, such that the cut-off of the graphs has no significant effect on the values of the information functional at the individual vertices. Second, q determines the shape and steepness of the function (compare Figure 7.6) that relates the local fractal dimension to the values of the information functionals. This makes it desirable to refine in future work the parametrization of the exponential weighting scheme (7.12), such that the two roles of q are distributed to two parameters.

226

7 Graph Entropies in Texture Segmentation of Images

7.6 Conclusion

In this chapter, we have presented the framework of graph index-based texture descriptors that was first introduced in Ref. [2]. Particular emphasis was put on entropy-based graph indices, which were shown in Ref. [2] to afford medium to high sensitivity to texture differences. We have extended the work from Ref. [2] in two directions. First, we have presented an approach to texture-based image segmentation in which the texture descriptor framework is integrated with geodesic active contours [75, 76], a standard method for intensity-based image segmentation. This approach was briefly introduced in Ref. [3] and is demonstrated here by a larger set of experiments, including two synthetic and one real-world example. Second, we have analyzed one representative of the graph entropy-based texture descriptors to gain insight into the image properties on which this descriptor relies. It turned out that it is closely related to measurements of the fractal dimension of certain metric spaces that arise from the graphs in the local image patches underlying our texture descriptors. Although this type of fractal dimension measurement in images differs from existing applications of fractal theory in image (and texture) analysis, which treat the image manifold itself as a fractal object, our results indicate that the two fractal approaches are interrelated.

Our texture descriptor framework as a whole, as well as both novel contributions presented here, requires further research. To mention some of these topics, we start with the parameter selection of the texture descriptors. In Refs [2, 3], as well as in this chapter, most parameters were fixed to specific values based on heuristics. A systematic investigation of the effect of varying all these parameters is part of ongoing work. The inclusion of novel graph descriptors proposed in the literature, for example, [10], is a further option. The algorithms currently used for the computation of graph entropy-based texture descriptors require computation times in the range of minutes even for small images. As the algorithms have not been designed for efficiency, there is much room for improvement, which will also be addressed in future work.

Both texture discrimination and texture segmentation have so far been demonstrated at a proof-of-concept level. Extensive evaluation on larger sets of image data is ongoing. This is also necessary to gain more insight into the suitability of particular texture descriptors from our set for specific classes of textures. Regarding the texture segmentation framework, the conceptual break between the graph-based set of texture descriptors and the partial differential equation used for segmentation could be reduced by using, for example, a graph-cut segmentation method. One may ask whether such a combination even allows for some synergy between the computation steps. This is not clear so far, since the features used to weight graph edges differ: intensity contrasts in the texture descriptor phase, and texture descriptor differences in the graph-cut phase. Furthermore, the integration of graph entropy-based texture descriptors into more complex segmentation frameworks will be a challenge. Unsupervised segmentation approaches are not capable of handling involved segmentation tasks (such as in medical diagnostics), where highly accurate segmentation can only be achieved by including prior information on the shape and appearance of the objects to be segmented. State-of-the-art segmentation frameworks therefore combine the mechanisms of unsupervised segmentation approaches with model-based methods as introduced, for example, in Ref. [85].

On the theoretical side, the analysis of the fractal limit of the descriptor construction will have to be refined and extended to include all six graph settings from Section 7.2.1. Relationships between the fractal structures arising from the graph construction and the image manifolds more commonly treated in fractal-based image analysis will have to be analyzed. In general, much more theoretical work deserves to be invested in understanding the connections and possible equivalences between the very disparate approaches to texture description found in the literature. A graph-based approach like ours admits different directions for such comparisons. It can thus be speculated that it could play a pivotal role in understanding the relationships between texture description methods and in creating a unifying view of different methods, which would also have implications for the understanding of texture itself.

References

1. Lezoray, O. and Grady, L. (eds) (2012) Image Processing and Analysis with Graphs: Theory and Practice, CRC Press, Boca Raton, FL.
2. Welk, M. (2014) Discrimination of image textures using graph indices, in Quantitative Graph Theory: Mathematical Foundations and Applications, Chapter 12 (eds M. Dehmer and F. Emmert-Streib), CRC Press, pp. 355–386.
3. Welk, M. (2016) Amoeba techniques for shape and texture analysis, in Perspectives in Shape Analysis (eds M. Breuss, A. Bruckstein, P. Maragos, and S. Wuhrer), Springer, Cham, in press.
4. Bonchev, D. and Trinajstić, N. (1977) Information theory, distance matrix, and molecular branching. J. Chem. Phys., 67 (10), 4517–4533.
5. Hosoya, H. (1971) Topological index: a newly proposed quantity characterizing the topological nature of structural isomers of saturated hydrocarbons. Bull. Chem. Soc. Jpn., 44 (9), 2332–2339.
6. Ivanciuc, O., Balaban, T.S., and Balaban, A. (1993) Design of topological indices. Part 4. Reciprocal distance matrix, related local vertex invariants and topological indices. J. Math. Chem., 12 (1), 309–318.
7. Plavšić, D., Nikolić, S., and Trinajstić, N. (1993) On the Harary index for the characterization of chemical graphs. J. Math. Chem., 12 (1), 235–250.
8. Wiener, H. (1947) Structural determination of paraffin boiling points. J. Am. Chem. Soc., 69 (1), 17–20.
9. Dehmer, M., Emmert-Streib, F., and Tripathi, S. (2013) Large-scale evaluation of molecular descriptors by means of clustering. PLoS ONE, 8 (12), e83956, doi: 10.1371/journal.pone.0083956.
10. Dehmer, M., Shi, Y., and Mowshowitz, A. (2015) Discrimination power of graph measures based on complex zeros of the partial Hosoya polynomial. Appl. Math. Comput., 250, 352–355.
11. Dehmer, M., Emmert-Streib, F., and Mehler, A. (eds) (2012) Towards an Information Theory of Complex Networks: Statistical Methods and Applications, Birkhäuser Publishing.
12. Dehmer, M. and Emmert-Streib, F. (eds) (2014) Quantitative Graph Theory: Mathematical Foundations and Applications, CRC Press, Boca Raton, FL.
13. Bonchev, D., Mekenyan, O., and Trinajstić, N. (1981) Isomer discrimination by topological information approach. J. Comput. Chem., 2 (2), 127–148.
14. Dehmer, M., Grabner, M., and Varmuza, K. (2012) Information indices with high discriminative power for graphs. PLoS ONE, 7 (2), e31214.
15. Ferrer, M. and Bunke, H. (2012) Graph edit distance – theory, algorithms, and applications, in Image Processing and Analysis with Graphs: Theory and Practice, Chapter 13 (eds O. Lezoray and L. Grady), CRC Press, Boca Raton, FL, pp. 383–422.
16. Sanfeliu, A. and Fu, K.S. (1983) A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern., 13 (3), 353–362.
17. Cross, A., Wilson, R., and Hancock, E. (1997) Inexact graph matching using genetic search. Pattern Recognit., 30 (6), 953–970.
18. Dehmer, M. and Emmert-Streib, F. (2007) Comparing large graphs efficiently by margins of feature vectors. Appl. Math. Comput., 188, 1699–1710.
19. Dehmer, M., Emmert-Streib, F., and Kilian, J. (2006) A similarity measure for graphs with low computational complexity. Appl. Math. Comput., 182, 447–459.
20. Emmert-Streib, F. and Dehmer, M. (2007) Topological mappings between graphs, trees and generalized trees. Appl. Math. Comput., 186, 1326–1333.
21. Riesen, K. and Bunke, H. (2009) Approximate graph edit distance computation by means of bipartite graph matching. Image Vision Comput., 27, 950–959.
22. Wang, J., Zhang, K., and Chen, G.W. (1995) Algorithms for approximate graph matching. Inf. Sci., 82, 45–74.
23. Zhu, L., Ng, W., and Han, S. (2011) Classifying graphs using theoretical metrics: a study of feasibility, in Database Systems for Advanced Applications, Lecture Notes in Computer Science, vol. 6637 (eds J. Xu, G. Yu, S. Zhou, and R. Unland), Springer-Verlag, Berlin, pp. 53–64.
24. Emmert-Streib, F. and Dehmer, M. (2012) Exploring statistical and population aspects of network complexity. PLoS ONE, 7 (5), e34523.
25. Furtula, B., Gutman, I., and Dehmer, M. (2013) On structure-sensitivity of degree-based topological indices. Appl. Math. Comput., 219, 8973–8978.
26. Lezoray, O. and Grady, L. (2012) Graph theory concepts and definitions used in image processing and analysis, in Image Processing and Analysis with Graphs: Theory and Practice, Chapter 1 (eds O. Lezoray and L. Grady), CRC Press, Boca Raton, FL, pp. 1–24.
27. Ishikawa, H. (2012) Graph cuts – combinatorial optimization in vision, in Image Processing and Analysis with Graphs: Theory and Practice, Chapter 2 (eds O. Lezoray and L. Grady), CRC Press, Boca Raton, FL, pp. 25–63.
28. Ishikawa, H. and Geiger, D. (1998) Segmentation by grouping junctions. Proceedings of 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, pp. 125–131.
29. Shi, J. and Malik, J. (2000) Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 22 (8), 888–906.
30. Boykov, Y., Veksler, O., and Zabih, R. (1998) Markov random fields with efficient approximation. Proceedings of 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, pp. 648–655.
31. Roy, S. and Cox, I. (1998) Maximum-flow formulation of the n-camera stereo correspondence problem. Proceedings of 1998 IEEE International Conference on Computer Vision, Bombay, pp. 492–499.
32. Clarenz, U., Rumpf, M., and Telea, A. (2004) Surface processing methods for point sets using finite elements. Comput. Graph., 28, 851–868.
33. Buades, A., Coll, B., and Morel, J. (2005) A non-local algorithm for image denoising, in Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, IEEE Computer Society Press, San Diego, CA, pp. 60–65.
34. Najman, L. and Meyer, F. (2012) A short tour of mathematical morphology on edge and vertex weighted graphs, in Image Processing and Analysis with Graphs: Theory and Practice, Chapter 6 (eds O. Lezoray and L. Grady), CRC Press, Boca Raton, FL, pp. 141–173.
35. Elmoataz, A., Lezoray, O., Ta, V.T., and Bougleux, S. (2012) Partial difference equations on graphs for local and nonlocal image processing, in Image Processing and Analysis with Graphs: Theory and Practice, Chapter 7 (eds O. Lezoray and L. Grady), CRC Press, Boca Raton, FL, pp. 174–206.
36. Haralick, R. (1979) Statistical and structural approaches to texture. Proc. IEEE, 67 (5), 786–804.
37. Haralick, R., Shanmugam, K., and Dinstein, I. (1973) Textural features for image classification. IEEE Trans. Syst. Man Cybern., 3 (6), 610–621.
38. Rosenfeld, A. and Thurston, M. (1971) Edge and curve detection for visual scene analysis. IEEE Trans. Comput., 20 (5), 562–569.
39. Sutton, R. and Hall, E. (1972) Texture measures for automatic classification of pulmonary disease. IEEE Trans. Comput., 21 (7), 667–676.
40. Zucker, S. (1976) Toward a model of texture. Comput. Graph. Image Process., 5, 190–202.
41. Osher, S., Solé, A., and Vese, L. (2003) Image decomposition and restoration using total variation minimization and the H^{-1} norm. Multiscale Model. Simul., 1 (3), 349–370.
42. Aujol, J.F. and Chambolle, A. (2005) Dual norms and image decomposition models. Int. J. Comput. Vision, 63 (1), 85–104.
43. Picard, R., Graczyk, C., Mann, S., Wachman, J., Picard, L., and Campbell, L. (1995) VisTex Database, Online Resource, http://vismod.media.mit.edu/vismod/imagery/VisionTexture/vistex.html.
44. Gabor, D. (1946) Theory of communication. J. Inst. Electr. Eng., 93, 429–457.
45. Lendaris, G. and Stanley, G. (1970) Diffraction pattern sampling for automatic pattern recognition. Proc. IEEE, 58 (2), 198–216.
46. Sandberg, B., Chan, T., and Vese, L. (2002) A Level-Set and Gabor-Based Active Contour Algorithm for Segmenting Textured Images. Tech. Rep. CAM-02-39, Department of Mathematics, University of California at Los Angeles, CA, USA.
47. Brox, T., Rousson, M., Deriche, R., and Weickert, J. (2010) Colour, texture, and motion in level set based segmentation and tracking. Image Vision Comput., 28, 376–390.
48. Lu, S. and Fu, K. (1978) A syntactic approach to texture analysis. Comput. Graph. Image Process., 7, 303–330.
49. Mandelbrot, B. (1982) The Fractal Geometry of Nature, W. H. Freeman and Company.
50. Avadhanam, N. (1993) Robust fractal characterization of one-dimensional and two-dimensional signals. Master's thesis, Graduate Faculty, Texas Tech University, USA.
51. Barbaroux, J.M., Germinet, F., and Tcheremchantsev, S. (2001) Generalized fractal dimensions: equivalences and basic properties. J. Math. Pures Appl., 80 (10), 977–1012.
52. Pentland, A.P. (1984) Fractal-based description of natural scenes. IEEE Trans. Pattern Anal. Mach. Intell., 6 (6), 661–674.
53. Soille, P. and Rivest, J.F. (1996) On the validity of fractal dimension measurements in image analysis. J. Visual Commun. Image Represent., 7 (3), 217–229.
54. Maragos, P. (1991) Fractal aspects of speech signals: dimension and interpolation. Proceedings of 1991 International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Toronto, Ontario, Canada, pp. 417–420.
55. Maragos, P. and Sun, F.K. (1993) Measuring the fractal dimension of signals: morphological covers and iterative optimization. IEEE Trans. Signal Process., 41 (1), 108–121.
56. Pitsikalis, V. and Maragos, P. (2009) Analysis and classification of speech signals by generalized fractal dimension features. Speech Commun., 51, 1206–1223.
57. Cikalova, U., Kroening, M., Schreiber, J., and Vertyagina, Y. (2011) Evaluation of Al-specimen fatigue using a "smart sensor". Phys. Mesomech., 14 (5–6), 308–315.
58. Kuznetsov, P.V., Panin, V.E., and Schreiber, J. (2001) Fractal dimension as a characteristic of deformation stages of austenite stainless steel under tensile load. Theor. Appl. Fract. Mech., 35, 171–177.
59. da S. Torres, R., Falcão, A.X., and da F. Costa, L. (2004) A graph-based approach for multiscale shape analysis. Pattern Recognit., 37, 1163–1174.
60. Paragios, N. and Deriche, R. (2002) Geodesic active regions: a new paradigm to deal with frame partition problems in computer vision. J. Visual Commun. Image Represent., 13 (1/2), 249–268.
61. Sagiv, C., Sochen, N., and Zeevi, Y. (2006) Integrated active contours for texture segmentation. IEEE Trans. Image Process., 15 (6), 1633–1646.
62. Georgescu, B., Shimshoni, I., and Meer, P. (2003) Mean shift based clustering in high dimensions: a texture classification example. Proceedings of 2003 IEEE International Conference on Computer Vision, Nice, vol. 1, pp. 456–463.
63. Lerallut, R., Decencière, É., and Meyer, F. (2005) Image processing using morphological amoebas, in Mathematical Morphology: 40 Years On, Computational Imaging and Vision, vol. 30 (eds C. Ronse, L. Najman, and É. Decencière), Springer-Verlag, Dordrecht, pp. 13–22.
64. Lerallut, R., Decencière, É., and Meyer, F. (2007) Image filtering using morphological amoebas. Image Vision Comput., 25 (4), 395–404.
65. Welk, M., Breuß, M., and Vogel, O. (2011) Morphological amoebas are self-snakes. J. Math. Imaging Vision, 39, 87–99.
66. Welk, M. and Breuß, M. (2014) Morphological amoebas and partial differential equations, in Advances in Imaging and Electron Physics, vol. 185 (ed. P.W. Hawkes), Elsevier Academic Press, pp. 139–212.
67. Dijkstra, E. (1959) A note on two problems in connexion with graphs. Numer. Math., 1, 269–271.
68. Balaban, A. (1982) Highly discriminating distance-based topological index. Chem. Phys. Lett., 89, 399–404.
69. Dehmer, M. (2008) Information processing in complex networks: graph entropy and information functionals. Appl. Math. Comput., 201, 82–94.
70. Li, X. and Shi, Y. (2008) A survey on the Randić index. MATCH Commun. Math. Comput. Chem., 59 (1), 127–156.
71. Li, X., Shi, Y., and Gutman, I. (2012) Graph Energy, Springer, New York.
72. Cao, S., Dehmer, M., and Shi, Y. (2014) Extremality of degree-based graph entropies. Inform. Sci., 278, 22–33.
73. Chen, Z., Dehmer, M., Emmert-Streib, F., and Shi, Y. (2015) Entropy of weighted graphs with Randić weights. Entropy, 17 (6), 3710–3723.
74. Shannon, C. (1948) A mathematical theory of communication. Bell Syst. Tech. J., 27, 379–423, 623–656.
75. Caselles, V., Kimmel, R., and Sapiro, G. (1995) Geodesic active contours, in Proceedings of the 5th International Conference on Computer Vision, IEEE Computer Society Press, Cambridge, MA, pp. 694–699.
76. Kichenassamy, S., Kumar, A., Olver, P., Tannenbaum, A., and Yezzi, A. (1995) Gradient flows and geometric active contour models. Proceedings of the 5th International Conference on Computer Vision, Cambridge, MA, pp. 810–815.
77. Comaniciu, D. and Meer, P. (1997) Robust analysis of feature spaces: color image segmentation. Proceedings of 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, pp. 750–755.
78. Lucchese, L. and Mitra, S.K. (1999) Unsupervised segmentation of color images based on k-means clustering in the chromaticity plane. Proceedings of IEEE Workshop on Content-Based Access of Image and Video Libraries, Fort Collins, CO, pp. 74–78.
79. Boykov, Y., Veksler, O., and Zabih, R. (2001) Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell., 23 (11), 1222–1239.
80. Perona, P. and Malik, J. (1990) Scale space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell., 12, 629–639.
81. Cohen, L.D. (1991) On active contour models and balloons. CVGIP: Image Understanding, 53 (2), 211–218.
82. Malladi, R., Sethian, J., and Vemuri, B. (1995) Shape modeling with front propagation: a level set approach. IEEE Trans. Pattern Anal. Mach. Intell., 17, 158–175.
83. Courant, R., Isaacson, E., and Rees, M. (1952) On the solution of nonlinear hyperbolic differential equations by finite differences. Commun. Pure Appl. Math., 5 (3), 243–255.
84. Rouy, E. and Tourin, A. (1992) A viscosity solutions approach to shape-from-shading. SIAM J. Numer. Anal., 29 (3), 867–884.
85. Cootes, T.F. and Taylor, C.J. (2001) Statistical Models of Appearance for Computer Vision. Tech. Rep., University of Manchester, UK.


8 Information Content Measures and Prediction of Physical Entropy of Organic Compounds

Chandan Raychaudhury and Debnath Pal

8.1 Introduction

Physical entropy, its measurement, interpretation, and applications have kept scientists curious and interested for a long time, perhaps ever since the concept was introduced by Clausius [1]. For many practical purposes it is important that physical measures of entropy be formalized in terms of mathematical or statistically derived equations, and that molecular structure as well as the physical and chemical properties of chemical compounds be conceptualized in ways that can be related to physical measures of entropy. For example, Zhao et al. [2] have proposed a mathematical formula for the entropy of boiling of organic compounds that takes several structural properties into account. Some early mathematical work was done by Gordon and Kennedy [3], who considered a graph-like state of matter in which a molecule is represented by a graph [4] whose vertices represent atoms and whose edges represent chemical bonds (usually covalent bonds), and proposed an LCGI (linear combination of graph invariants) scheme for explaining physical entropy. Pal and Chakrabarti [5], in an interesting work, have shown how the individual contributions of amino acids to the loss of main-chain conformational entropy in protein folding differ, although the backbone is the same for all amino acids. Ever since the publication of the pioneering work of Shannon and Weaver [6] on information theory, the scope for relating information theory to physical entropy has remained, owing to the close resemblance of their mathematical forms. A renewed interest in and emphasis on the subject is apparent from a recent publication by Reis et al. [7]. Bonchev et al. [8] were, to our knowledge, the first to apply the information theoretical formalism to the point group symmetry of molecular structures in order to predict entropy; however, they [8] worked on a small set of compounds. More recently, Raychaudhury and Pal [9] have used information theoretical measures of molecular structure to predict the gas-phase thermal entropy of a series of over 100 molecules comprising cyclic and acyclic organic compounds.



The concept of measuring information theoretical molecular descriptors using graph theoretical models of chemical compounds is due to Rashevsky [10]. Trucco [11] developed the idea further on the basis of the automorphism group (orbits) of a graph, and Mowshowitz [12] worked on graph complexity and entropy. Sarkar et al. [13] proposed a partition scheme for the atoms (vertices) of a molecule by defining the first-order neighborhoods of the atoms in a multigraph model of a chemical compound. This was possibly the first time that a multigraph model was used to measure the information content (IC) of a molecular graph; they [13] applied the measure to compute the first-order IC of nucleotides. Recently, Dehmer and coworkers [14–16] have defined so-called information functionals on various graph invariants, such as vertex degrees and topological distances in a graph, and carried out studies on their mathematical properties and applications, for example, the discrimination of very large numbers of nonisomorphic unweighted graphs, which is one of the most challenging tasks in the search for a unique value (or string of values) for each molecular structure [17–20].

Over the years, researchers have defined and used various measures of the IC of a system, particularly by considering different elements of a molecular graph representing a molecular structure, such as atoms (vertices), chemical bonds (edges), and topological distances between pairs of vertices, as the elements of a discrete system. Brillouin [21] defined the total information content (TIC) by multiplying IC by the total number of elements, say N, in the system, in connection with his negentropy principle; it includes the size factor in the measure of IC. In contrast, in order to obtain an information theoretical measure that minimizes or eliminates the effect of the size of a system (e.g., of a molecular structure in chemistry), Basak et al. [22] proposed a normalized measure of IC obtained by dividing IC by log N, where log N is the maximum IC of a system having N elements. Having thus a measure of IC with minimal size effects for working with molecular structures and their properties, Basak et al. [22] called it the structural information content (SIC) and applied it to the quantitative prediction of molecular properties and activities. Looking further into the mathematical and conceptual aspects of information measures, Raychaudhury and Ghosh [23] noted that the residual information [13] obtained by subtracting IC from the maximum IC, log N, gives the amount of information that is not available because of the equivalence of several elements of the system; it complements the information obtained by IC, which is defined on the basis of an equivalence relation on the elements of the system partitioned into disjoint classes. Because of this complementary nature, Raychaudhury and Ghosh [23] named it the "complementary information content (CIC)." In a further study, Raychaudhury and Ghosh [24] used the normalized measure of CIC, called the redundant information content (RIC), as a measure of similarity and applied it in studying the role of this similarity measure in classifying some antibacterial compounds. However, all these measures are defined by considering a partition of all the elements of a system, and the elements of a given partitioned class are equivalent to each other with respect to some criterion. As a result, the elements of a given partitioned class are interchangeable, and interchanging them makes no difference to the IC measures.
According to the same criterion, elements of any two different partitioned classes have different characteristics. In a chemical sense, if one considers the partition of the atoms of a molecule according to their chemical nature alone, all the carbon atoms will be in one class, all the hydrogen atoms in another class, and so on. However, if the atoms are partitioned according to, say, the topological distance distribution associated with each atom (the topological distances of all the atoms in the molecule from the atom under consideration), the carbon, hydrogen, oxygen, and other atoms may not all fall into the same partitioned class. The partition then depends on the topological position of each atom in the molecule, and elements of different partitioned classes correspond to atoms at different topological positions. It is therefore interesting and important to investigate the contribution of each element of a system, whether it belongs to a one-, two-, or multi-element partitioned class, to a measure of IC. Henceforth, we will call such a measure the "partial information content (PIC)" for an individual element of a system. In this chapter, we define PIC as well as a related measure for individual partitioned classes, the "class partial information content (CPIC)," and discuss their possible applications in predicting molecular properties, for example, physical entropy. The reviews by Dehmer and Mowshowitz [25], Barigya et al. [26], and Basak [27], and the book by Bonchev [28], take the reader nicely through these areas of study.

IC measures in the form of information theoretical topological indices (ITTIs) have been used extensively in chemistry and in structure–property/structure–activity relationship studies aimed at predicting molecular properties and activities [9, 27–29]. Even substructural ITTIs (for molecular fragments) have been used in predicting biological activities of chemical (organic) compounds [30–32] with a significantly high rate of success. Since information theoretical molecular descriptors have been found useful in explaining the molecular properties and activities of several series of compounds, we feel that bringing together certain mathematical aspects of information theoretical measures with some of their applications in predicting an important thermodynamic property such as entropy will be useful for researchers working in this field, the more so because the two measures have almost identical mathematical constructs. Thus, for the purpose of discussing the mathematical aspects of certain information theoretical measures and the application of some of those measures in predicting physical entropy, we have structured the main contribution of this chapter in the following manner. We first discuss some mathematical aspects of a few information theoretical measures derived from Shannon's information measure [6]. This is followed by their application to measuring the IC of graphs; in this part, we define IC and related information theoretical measures for molecular graphs. However, since one of the objectives of this chapter is to discuss IC measures that can be applied to the prediction of the physical entropy of chemical compounds, we restrict our discussion to nonempty discrete systems composed of elements of connected (molecular) graphs when defining and working with the information theoretical measures. The following section then discusses applications of theoretical methods for predicting physical entropy, with an emphasis on the application of information theoretical indices derived from molecular graph models of organic compounds.

8.2 Method

The IC measures used here are due to Shannon and Weaver [6]. Although they originally proposed it as a mathematical theory of communication, the mathematical properties of this measure have been found suitable for use in different fields of study, for example, chemistry and biology, which are the areas of our current interest. As a part of research in mathematical chemistry, the IC of graphs has been widely used for the last few decades. Our main interest in this chapter is to discuss different information theoretical measures proposed over the years on the basis of Shannon's information [6], together with some mathematical theorems. These measures will then be used to derive information theoretical molecular descriptors in the form of the IC of (molecular) graphs, explained with illustrations.

8.2.1 Information Content Measures

Let N elements be present in a finite discrete system S. A measure of IC due to Shannon and Weaver [6] may be obtained by defining an equivalence relation on the elements of S and partitioning them into, say, k disjoint classes following a partition scheme:

Class:         1     2     3     …     k
Probability:   p_1   p_2   p_3   …     p_k

The partition scheme satisfies the conditions p_i ≥ 0, i = 1, 2, 3, …, k, and \sum_{i=1}^{k} p_i = 1, and the measure of IC of the system S is given by

IC(S) = -\sum_{i=1}^{k} p_i \log p_i.    (8.1)

Also, p_i = N_i/N, N_i being the number of elements in (the cardinality of) the ith partitioned class, i = 1, …, k. Therefore, Equation 8.1 may also be written as

IC(S) = -\sum_{i=1}^{k} \left(\frac{N_i}{N}\right) \log\left(\frac{N_i}{N}\right) = \sum_{i=1}^{k} \left(\frac{N_i}{N}\right) \log\left(\frac{N}{N_i}\right).    (8.2)

Computation of IC values using Equation 8.2 is more convenient, since we always need to count the total number of elements in a system as well as the number of elements in the partitioned classes. It may also be noted that IC is bounded as 0 ≤ IC(S) ≤ log N. The IC measure becomes 0 when all the elements of a system lie in one class (i.e., when all the elements are equivalent to each other and there is no partition of the elements). In contrast, IC attains the value log N, the maximum IC of a system of N elements, when no two elements are equivalent to each other, that is, when all of them are distinct under the given condition of equivalence. It may be noted that if the base of the logarithm is taken as 2, the computed value of IC is expressed in bits.

A number of other information theoretical measures may be obtained from IC. One such measure is due to Brillouin [21], formulated in connection with his "negentropy" principle. This measure takes into account the size of the system S, that is, N, and is called the "total information content." It may be obtained by multiplying IC by the number of elements N in the system and is given by

TIC(S) = N \times IC(S) = \sum_{i=1}^{k} N_i \log\left(\frac{N}{N_i}\right).    (8.3)

In contrast, the effect of the size of a system may be eliminated or reduced by considering the normalized value of IC. This may be called the normalized information content (NIC) of a system S, given by

NIC(S) = \frac{IC(S)}{\log N} = 1 - \frac{\sum_{i=1}^{k} N_i \log N_i}{N \log N}.    (8.4)

It may be noted that this normalized measure of IC was used as the SIC by Basak et al. [22], who wanted an information theoretical measure that minimizes the effect of the size of chemical compounds when explaining the biological activities of several series of chemical compounds using SIC. Being a normalized measure, the NIC value lies between 0 and 1: 0 ≤ NIC(S) ≤ 1.

When IC is obtained from the equivalence of the elements of a system, some information is not available compared with what would be obtained if every element belonged to a single-element partitioned class (one element in each partitioned class), which yields the maximum IC value log N. This deviation from the maximum IC value gives another measure of information, where this residual value [13] is obtained by subtracting IC from the maximum IC, log N, of a system composed of N elements. Conceptualizing this measure of unavailable information as complementary to the IC obtained from Shannon's information given by Equation 8.1, Raychaudhury and Ghosh [23] called it the "complementary information content" (CIC), which may be obtained from Equation 8.5 as follows:

CIC(S) = \log N - IC(S) = \sum_{i=1}^{k} \left(\frac{N_i}{N}\right) \log N_i.    (8.5)

The measure CIC is bounded as 0 ≤ CIC(S) ≤ \log N. Furthermore, normalization of CIC gives another measure, RIC [24], a measure of redundancy in a system, given by

RIC(S) = \frac{CIC(S)}{\log N} = 1 - NIC(S).    (8.6)

Clearly, RIC assumes the value 1 when all N elements of a system are in a single partitioned class and 0 when each of them is in a different partitioned class (i.e., one element in each of N partitioned classes); therefore, RIC is bounded as 0 ≤ RIC(S) ≤ 1. It is to be noted that Raychaudhury and Ghosh [24] used RIC as a measure of similarity, where RIC = 1 corresponds to maximum similarity and RIC = 0 corresponds to no similarity (maximum dissimilarity). It may also be noted that a similar measure has been used by Basak et al. [22] in molecular structure–activity relationship studies, where they called it the relative nonstructural information content (RNSIC).

All the aforementioned information theoretical measures are obtained by partitioning the elements of a system into disjoint classes. It is interesting to note that the measure of IC due to Shannon and Weaver [6] is obtained by adding the contributions of the individual partitioned classes and the elements therein. It is therefore of interest to know the amount of information contributed by each partitioned class as well as by each element of a partitioned class. We call the contribution made by each element of the ith partitioned class to the IC of a system S the "PIC" of S "for an element (e) of the ith partitioned class," given by

PIC(S_{e_i}) = \left(\frac{1}{N}\right) \log\left(\frac{N}{N_i}\right).    (8.7)

In addition, a measure of partial IC for the ith partitioned class may also be obtained. We call it the "CPIC" of the ith partitioned class of the system S, given by

CPIC(S_i) = \left(\frac{N_i}{N}\right) \log\left(\frac{N}{N_i}\right).    (8.8)

Therefore, both measures, PIC and CPIC, depend on both N and N_i, that is, on the cardinalities of the original set of all elements of S and of the ith partitioned class, respectively.

Now, in order to illustrate the computation of the different information theoretical measures for a given system S having 10 elements, let us consider the following partition of 10 with respect to a given equivalence relation defined on the elements of S:

10 (3, 3, 2, 2).

In this partition of 10, there are three elements in each of the first two partitioned classes, that is, N_1 = N_2 = 3, and two elements in each of the other two partitioned classes, that is, N_3 = N_4 = 2. The different information theoretical measures for this partition of 10 may therefore be computed as follows:

(1) IC(S) (Equation 8.2):
IC(S) = \left[2 \times \frac{3}{10} \log_2\left(\frac{10}{3}\right)\right] + \left[2 \times \frac{2}{10} \log_2\left(\frac{10}{2}\right)\right] = 1.9709.

(2) TIC(S) (Equation 8.3):
TIC(S) = 10 \times IC(S) = 19.7090.

(3) NIC(S) (Equation 8.4):
NIC(S) = \frac{1.9709}{\log_2 10} = 0.5933.

(4) CIC(S) (Equation 8.5):
CIC(S) = \log_2 10 - IC(S) = 1.3510.

(5) RIC(S) (Equation 8.6):
RIC(S) = 1 - NIC(S) = 0.4067.

(6) PIC(S) (Equation 8.7): Let us consider the first partitioned class S_1, containing three of the 10 elements of the system S. The PIC for each element of that partitioned class, PIC(S_{e_1}), may be computed as
PIC(S_{e_1}) = \left(\frac{1}{10}\right) \log_2\left(\frac{10}{3}\right) = 0.1737.

(7) CPIC(S) (Equation 8.8): The CPIC of the first partitioned class, CPIC(S_1), may be computed as
CPIC(S_1) = \left(\frac{3}{10}\right) \log_2\left(\frac{10}{3}\right) = 0.5211.

IC computed using the logarithm to the base 2 is expressed in bits.
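For readers who wish to reproduce these numbers, the following short Python sketch (the function names are ours, not from the chapter) evaluates Equations 8.2–8.8 directly from the class cardinalities, assuming base-2 logarithms; it reproduces the IC, TIC, NIC, CIC, and RIC values of the worked example up to rounding.

```python
from math import log2

def ic(classes):
    """Shannon information content IC(S) from class cardinalities (Equation 8.2)."""
    n = sum(classes)
    return sum((ni / n) * log2(n / ni) for ni in classes)

def tic(classes):
    """Total information content TIC(S) = N * IC(S) (Equation 8.3)."""
    return sum(classes) * ic(classes)

def nic(classes):
    """Normalized information content NIC(S) = IC(S) / log2 N (Equation 8.4)."""
    return ic(classes) / log2(sum(classes))

def cic(classes):
    """Complementary information content CIC(S) = log2 N - IC(S) (Equation 8.5)."""
    return log2(sum(classes)) - ic(classes)

def ric(classes):
    """Redundant information content RIC(S) = 1 - NIC(S) (Equation 8.6)."""
    return 1.0 - nic(classes)

def pic(classes, i):
    """Partial IC of one element of the i-th class, 0-based index (Equation 8.7)."""
    n = sum(classes)
    return (1 / n) * log2(n / classes[i])

def cpic(classes, i):
    """Class partial IC of the i-th class, 0-based index (Equation 8.8)."""
    n = sum(classes)
    return (classes[i] / n) * log2(n / classes[i])

p = [3, 3, 2, 2]   # the partition 10(3, 3, 2, 2) from the worked example
for name, value in [("IC", ic(p)), ("TIC", tic(p)), ("NIC", nic(p)),
                    ("CIC", cic(p)), ("RIC", ric(p))]:
    # prints values close to those in the worked example above
    print(f"{name}(S) = {value:.4f}")
```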


8.2.2 Information Content of Partition of a Positive Integer

We have seen that the measure of Shannon's information [6] for a system S depends on the partition of the elements of the system. The elements of S may be partitioned in different ways depending on the kind of equivalence relation defined on them. For example, let there be 10 books on a table. Say five of them have a hard cover and five of them do not. Again, say four of them are written in English, three in French, and the remaining three in Spanish. Clearly, if the books are partitioned based on the hard cover, the 10 books will be partitioned into two disjoint classes containing five books each. Therefore, the partition, say P_1, of the books based on this equivalence relation may be given by

P_1(S) = 10(5, 5),

and the IC value of S corresponding to this partition P_1 may be obtained using Equation 8.2:

IC(P_1) = 2 \times \left(\frac{5}{10}\right) \log_2\left(\frac{10}{5}\right) = 1.0000.

Now, if the books are partitioned based on language, one gets a partition, say P_2, as

P_2(S) = 10(4, 3, 3).

Therefore, the corresponding IC value may be computed using Equation 8.2:

IC(P_2) = \left(\frac{4}{10}\right) \log_2\left(\frac{10}{4}\right) + 2 \times \left(\frac{3}{10}\right) \log_2\left(\frac{10}{3}\right) = 1.5709.

It may be noted that as the elements of a system are partitioned into more disjoint classes with smaller cardinalities, the IC value increases. Clearly, if all 10 elements of the system are partitioned into 10 disjoint classes having one element in each partitioned class, IC attains its maximum value, \log_2 10 = 3.3219, which is greater than the two values computed above. Clearly, the IC measures depend on the cardinalities of the partitioned classes. Therefore, it may be useful to derive some mathematical results on the IC of partitions of the elements of a system, which may help us understand how IC values change with the partition. This may, in turn, help decide how the equivalence relation may be induced on the elements of a system. We give here mathematical results on the IC of a partition for some of the information theoretical measures defined earlier.

Shannon's information content [6] for two partitions P_1 and P_2 of a positive integer N [33]:


Theorem 8.1. If

P_1 = N(N_1, N_2, …, N_i, …, N_j, …, N_k),
P_2 = N(N_1, N_2, …, N_i + α, …, N_j − α, …, N_k)

are two partitions of a positive integer N under the conditions N_1 ≥ N_2 ≥ … ≥ N_k, \sum_{i=1}^{k} N_i = N, k ≤ N, i < j, and α < N_j, α being a positive integer, then the IC of partition P_1 is greater than the IC of partition P_2.

Proof. Let us denote the IC of partitions P_1 and P_2 of the positive integer N by IC(P_1) and IC(P_2), respectively. To prove the theorem, we need to consider the following two cases.

Case 1: α < N_j < N_i. In this case, IC(P_1) − IC(P_2) may be written, using Equation 8.2, as

IC(P_1) - IC(P_2) = \frac{N_i}{N} \log\frac{N}{N_i} + \frac{N_j}{N} \log\frac{N}{N_j} - \frac{N_i + α}{N} \log\frac{N}{N_i + α} - \frac{N_j - α}{N} \log\frac{N}{N_j - α}.

After carrying out a few necessary steps, this may be expressed as

IC(P_1) - IC(P_2) = \frac{1}{N} \log\left[\left(\frac{N_i}{N_j}\right)^{α} \times \frac{(1 + α/N_i)^{N_i + α}}{(1 - α/N_j)^{α - N_j}}\right].

Since α + N_i > α − N_j, 1 + α/N_i > 1 − α/N_j, and N_i/N_j > 1, we get IC(P_1) − IC(P_2) > 0. Thus, the information content of partition P_1 is greater than that of partition P_2.

Case 2: α < N_j = N_i. In this case, the corresponding expression obtained using Equation 8.2 is

IC(P_1) - IC(P_2) = \frac{1}{N} \log\left[\frac{(1 + α/N_i)^{N_i + α}}{(1 - α/N_i)^{α - N_i}}\right].

Since α + N_i > α − N_i and 1 + α/N_i > 1 − α/N_i, we get IC(P_1) − IC(P_2) > 0. Thus, in this case too, the information content of P_1 is greater than that of P_2. Hence, the theorem is proved. ◽
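As a quick numerical spot-check of Theorem 8.1 (the example partition and the helper function below are our own illustrative choices, not part of the original text), one can compare the IC before and after moving α elements from a smaller class to a larger one:

```python
from math import log2

def ic(classes):
    # IC of a partition from class cardinalities (Equation 8.2), base-2 logarithm
    n = sum(classes)
    return sum((c / n) * log2(n / c) for c in classes)

# P1 = 20(8, 6, 4, 2); P2 is obtained by moving alpha = 2 elements
# from the smaller class N_j = 4 to the larger class N_i = 8.
p1 = [8, 6, 4, 2]
p2 = [10, 6, 2, 2]

print(ic(p1) > ic(p2))   # True: IC(P1) exceeds IC(P2), as Theorem 8.1 states
```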



It may be noted that Theorem 8.1 can be extended by considering α = N_j, in which case the jth component of partition P_2 becomes zero [33]. In addition, an empty partitioned class does not contribute anything to the IC measure. Now, similar results for a special situation may be given for another information theoretical measure, namely CIC [23], defined earlier in this chapter.

Theorem 8.2. If

P_3 = N(N_1, N_2, …, N_k),
P_4 = N(N_1 + 1, N_2, …, N_k − 1)

are two partitions of a positive integer N under the conditions N_1 ≥ N_2 ≥ … ≥ N_k, \sum_{i=1}^{k} N_i = N, and k ≤ N, then the CIC of partition P_3 is less than the CIC of partition P_4.

Proof. Let us denote the CIC of P_3 and P_4 by CIC(P_3) and CIC(P_4), respectively. Using Equation 8.5, we get

CIC(P_3) - CIC(P_4) = \frac{N_1}{N} \log N_1 + \frac{N_k}{N} \log N_k - \frac{N_1 + 1}{N} \log(N_1 + 1) - \frac{N_k - 1}{N} \log(N_k - 1),

because the contributions of all other partitioned classes of P_3 and P_4 cancel each other. Thus, by simplifying the calculation, one gets CIC(P_3) − CIC(P_4) < 0. Hence, the theorem is proved. ◽

Thus, CIC decreases if one element from a partitioned class of lower cardinality is moved to a partitioned class of higher or equal cardinality. These results support the idea of calling the residual or unavailable information measure CIC, since the IC value increases in such cases, as can be verified with Theorem 8.1. We will illustrate these results with a few example graphs in the section on the IC of graphs.

Another interesting result [9] may be obtained for the TIC measure given by Equation 8.3. This result deals with a situation in which one system has more elements than the other and the elements of both systems are partitioned into an equal number of disjoint classes.

Theorem 8.3. If

P_5 = N(N_1, N_2, …, N_i, …, N_k),
P_6 = (N + 1)(N_1, N_2, …, N_i + 1, …, N_k)

are two partitions of the positive integers N and N + 1, respectively, i = 1, 2, …, k, with the condition k ≥ 2, then the TIC of partition P_6 is greater than the TIC of partition P_5.

Proof. Let us denote the total information content of partition P_5 by TIC(P_5) and that of partition P_6 by TIC(P_6). Now,

TIC(P_6) - TIC(P_5) = (N + 1) \log(N + 1) - N \log N - (N_i + 1) \log(N_i + 1) + N_i \log N_i = \log \frac{(N + 1)^{N+1} \times N_i^{N_i}}{N^{N} \times (N_i + 1)^{N_i + 1}}.

After a few steps of calculation, one gets TIC(P_6) − TIC(P_5) > 0. Hence, the theorem is proved. ◽
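A similarly hedged numerical spot-check of Theorems 8.2 and 8.3 can be written as follows (the partitions are again our own illustrative choices):

```python
from math import log2

def ic(classes):
    # IC of a partition from class cardinalities (Equation 8.2)
    n = sum(classes)
    return sum((c / n) * log2(n / c) for c in classes)

def cic(classes):
    # CIC(S) = log2 N - IC(S) (Equation 8.5)
    return log2(sum(classes)) - ic(classes)

def tic(classes):
    # TIC(S) = N * IC(S) (Equation 8.3)
    return sum(classes) * ic(classes)

# Theorem 8.2: move one element from the smallest class to the largest class
p3 = [5, 3, 2]          # partition of N = 10
p4 = [6, 3, 1]          # (N1 + 1, ..., Nk - 1)
print(cic(p3) < cic(p4))    # True

# Theorem 8.3: add one element to one class (N grows from 10 to 11, same k)
p5 = [5, 3, 2]
p6 = [6, 3, 2]
print(tic(p6) > tic(p5))    # True
```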



We will discuss the usefulness of this result (Theorem 8.3) in the applications section, in connection with explaining the physical entropy of organic compounds using information theoretical molecular descriptors.

8.2.3 Information Content of Graph

The idea of measuring the IC of a graph was introduced by Rashevsky [10] by applying Shannon's information theoretical formalism [6] to graph theoretical models of molecular structures [34]. The graph model of a chemical compound provides the topological architecture of the molecule, showing how its atoms, represented by vertices, are connected by chemical (usually covalent) bonds, represented by edges. Thus, a graph model depicts molecular topology [35] in terms of connectivity. However, in order to use connectivity information for practical purposes, for example, in structure–property/structure–activity relationship studies, quantitative descriptors of molecular topology (connectivity) are usually computed [36]. Such descriptors are known as topological indices [37]. Accordingly, any measure of the IC of a graph may be called an information theoretical topological index (ITTI). From the graph theoretical point of view, any number that can be associated with or derived from a graph is a graph invariant [4], and it gives the same value for two isomorphic graphs; from that point of view, any ITTI is a graph invariant. However, no such ITTI has yet been found to be unique, and two or more graphs may have the same ITTI value. Nevertheless, information theoretical molecular descriptors have found useful applications in chemistry and in explaining a number of molecular properties and activities, as mentioned earlier. It is, therefore, important to discuss how various ITTIs have been defined for graphs and graph models of chemical compounds. In doing so, we need to give a few basic definitions of graph theory [4] as well as a few definitions related to graph models of molecular structure.

Let G(V, E) be a connected graph [4] with a nonempty set of vertices V and a nonempty set of (unordered) edges E connecting the vertices. For G, we first give the following graph theoretical definitions:

(1) Degree of a vertex [4]: the number of edges incident with the vertex.
(2) Path between a pair of vertices [4]: a path between two vertices in a graph is an alternating sequence of vertices and edges starting and ending with vertices, for example, v_1, e_1, v_2, e_2, …, e_{n-1}, v_n, where there are n vertices v_1, v_2, …, v_n and n − 1 edges e_1, e_2, …, e_{n-1} in the path. It may be noted that there may be multiple shortest paths of a given length between two vertices in a graph containing a cycle.
(3) Shortest path: a shortest path between two vertices in a connected graph is a path that contains the minimum number of edges.
(4) Distance between a pair of vertices [4]: the number of edges in the shortest path between the two vertices in a connected graph.

Now, we give some definitions for the graph model of molecular structure:

(1) Molecular graph: a molecular graph is a connected graph that represents the structural formula of a chemical compound, giving the connectivity of the atoms (vertices) by chemical (primarily covalent) bonds (edges).
(2) Molecular path: a molecular path is a sequence of vertices and edges in a molecular graph in which chemically meaningful labels are assigned to its vertices and/or edges; for example, if we assign C for carbon, O for oxygen, and H for hydrogen, then a path carbon → oxygen → hydrogen may be written as (C–O–H). It may be noted that a similar definition was termed a chemical path by the authors in an earlier work [9].
(3) Shortest molecular path: we define a shortest molecular path between two vertices in a molecular graph as the molecular path that contains the minimum number of edges.
(4) Vertex-weighted molecular graph: a vertex-weighted molecular graph is a molecular graph in which weights reflecting some molecular property of the atoms of the chemical compound are assigned to its vertices.
(5) TopoChemical Molecular Descriptor (TCMD): a TCMD is a real number derived from a molecular graph whose vertices and/or edges are given chemically meaningful labels (alphabetical or numerical), for example, C for carbon, N for nitrogen, and so on, as vertex labels for the corresponding atoms, and the bond order as the label for the edges.
(6) TopoPhysical Molecular Descriptor (TPMD): a TPMD is a real number derived from a molecular graph whose vertices and/or edges are assigned a physical property of the atoms and bonds, for example, the atomic number or atomic weight as vertex weight and the bond length (in angstroms) as the weight of the corresponding edges.

It may be noted that Sarkar et al. [13] defined the (first-order) neighborhood of the vertices of a multigraph and computed the first-order IC of the graphs representing nucleotides. However, we do not deal with multigraphs here. Interested readers may find a good account of this measure and its applications in the papers by Sarkar et al. [13], Raychaudhury et al. [29], and Basak [27]. Now, we give some IC measures of graphs together with illustrations and a few results for some of them. In doing so, we consider some of the most widely used graph theoretical elements, namely the degree of a vertex, the distances between pairs of vertices, and the paths between pairs of vertices, as well as vertex-weighted graphs representing molecular structures.

8.2.3.1 Information Content of Graph on Vertex Degree

Let d_1, d_2, …, d_n be the degrees of the n vertices (the degree sequence) of a connected graph G, and let d be the sum of the degree values, d = d_1 + d_2 + … + d_n. Considering d to be partitioned into k disjoint classes from the equivalence of vertex degrees, one can obtain an IC measure IC_d(G) of G from the partition of the vertices of G as

IC_d(G) = \sum_{i=1}^{k} \left(\frac{d_i}{d}\right) \log_2\left(\frac{d}{d_i}\right); \quad \sum_{i=1}^{k} d_i = d.    (8.9)

Illustration

Let G1 be a chain graph of six vertices:

G_1: o – o – o – o – o – o

In graph G_1, there are four vertices of degree 2 and two vertices of degree 1. From the equivalence of the degree values, the vertices are partitioned into two disjoint classes, one containing the four vertices of degree 2 and the other containing the two vertices of degree 1. Therefore, from Equation 8.2 we get IC_d(G_1) as

IC_d(G_1) = \left(\frac{4}{6}\right) \log_2\left(\frac{6}{4}\right) + \left(\frac{2}{6}\right) \log_2\left(\frac{6}{2}\right) = 0.9182.

Now, we can compute the other information theoretical indices on vertex degree for the graph G_1:

TIC_d(G_1) = 6 \times IC_d(G_1) = 5.5092,
NIC_d(G_1) = IC_d(G_1) / \log_2 6 = 0.3552,
CIC_d(G_1) = \log_2 6 - IC_d(G_1) = 1.6668,
RIC_d(G_1) = CIC_d(G_1) / \log_2 6 = 0.6448.

It may be noted that the connectivity information of a (connected) graph is encoded in its adjacency matrix [4], and the sum of the nonzero elements of a given row/column of the matrix gives the degree of the corresponding vertex. Therefore, if we denote the adjacency matrix of G_1 by A(G_1) and label the vertices 1, 2, …, 6 from the left, then the labeled graph G_1, its adjacency matrix A(G_1), and the vertex degrees d(v) may be given by

G_1:
  1   2   3   4   5   6
  o – o – o – o – o – o

A(G_1):
        1  2  3  4  5  6    Degree of vertex
   1    0  1  0  0  0  0    d(v_1) = 1
   2    1  0  1  0  0  0    d(v_2) = 2
   3    0  1  0  1  0  0    d(v_3) = 2
   4    0  0  1  0  1  0    d(v_4) = 2
   5    0  0  0  1  0  1    d(v_5) = 2
   6    0  0  0  0  1  0    d(v_6) = 1
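The degree-based indices above can be reproduced from the adjacency matrix; the following Python sketch (variable names are ours) builds A(G_1) for the six-vertex chain, derives the degree sequence, and evaluates IC_d and the related indices.

```python
from math import log2
from collections import Counter

# Adjacency matrix of the chain graph G1: vertices 1-2-3-4-5-6
n = 6
A = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

degrees = [sum(row) for row in A]              # [1, 2, 2, 2, 2, 1]
classes = list(Counter(degrees).values())      # degree-class sizes: [2, 4]

ic_d = sum((c / n) * log2(n / c) for c in classes)
tic_d = n * ic_d
nic_d = ic_d / log2(n)
cic_d = log2(n) - ic_d
ric_d = cic_d / log2(n)

print(f"ICd  = {ic_d:.4f}")   # ≈ 0.9183
print(f"TICd = {tic_d:.4f}")  # ≈ 5.5098
print(f"NICd = {nic_d:.4f}")  # ≈ 0.3552
print(f"CICd = {cic_d:.4f}")  # ≈ 1.6667
print(f"RICd = {ric_d:.4f}")  # ≈ 0.6448
```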

8.2.3.2 Information Content of Graph on Topological Distances

Similar to vertex degrees, the topological distances [4] between pairs of vertices in a connected graph are important graph theoretical elements, and a number of ITTIs have been defined on their basis. If there are n vertices in a connected graph, there are [n × (n − 1)/2] distance values of various magnitudes when the distances between all pairs of vertices are taken together. These may be obtained from the entries of the distance matrix associated with the graph. If D(G_1) is the distance matrix of the graph G_1, then D(G_1), together with the distance sums, may be given by

D(G_1):
        1  2  3  4  5  6    Distance sum
   1    0  1  2  3  4  5    D(v_1) = 15
   2    1  0  1  2  3  4    D(v_2) = 11
   3    2  1  0  1  2  3    D(v_3) = 9
   4    3  2  1  0  1  2    D(v_4) = 9
   5    4  3  2  1  0  1    D(v_5) = 11
   6    5  4  3  2  1  0    D(v_6) = 15

In the above distance matrix representation of G_1, there are 5, 4, 3, 2, and 1 distances of magnitudes 1, 2, 3, 4, and 5, respectively. The column next to the distance matrix gives the sum of all distances in each row, also referred to as the distance sum [28], D(v), of the corresponding vertex. In G_1, there are six D(v_i) values for the six vertices of G_1, i = 1, 2, …, 6.

Bonchev and Trinajstić [38] first used a partition of the distances based on the equivalence of their magnitude. Since there are [n × (n − 1)/2] distances in a graph of n vertices, the IC on distances may be computed from the partition of these distances based on their magnitude. Denoting this measure by IC_D(G) for a connected graph G, it may be obtained from the following derivation. For the sake of simplicity, let us define

N_D = \frac{n(n - 1)}{2},    (8.10)

where D denotes distance. If there are n_i distances of magnitude D_i in a connected graph G, i = 1, 2, …, max(D), then the information content of G on distances, IC_D(G), may be obtained from the following equation using Equation 8.2:

IC_D(G) = \sum_{i} \frac{n_i}{N_D} \log_2 \frac{N_D}{n_i}.    (8.11)

Now, a measure of TIC on graph distances, TIC_D(G), may also be obtained from the following equation using Equation 8.3:

TIC_D(G) = N_D \times IC_D(G).    (8.12)

Bonchev and Trinajstić [38] also proposed an information theoretical measure on the magnitude of the distances. Denoting this measure by IC_M(G), it may be derived in the following way. Let us denote the sum of all the n_i distances of magnitude D_i by D_M, that is,

D_M = \sum_{i=1}^{\max(D)} n_i D_i.    (8.13)

Now, considering D_M to be partitioned into disjoint classes in such a way that there are n_i partitioned classes of cardinality D_i, IC_M(G) may be obtained from the following equation using Equation 8.2:

IC_M(G) = \sum_{i} n_i \times \frac{D_i}{D_M} \log_2 \frac{D_M}{D_i}.    (8.14)

Now, TIC on the magnitude of graph distances may also be defined. Denoting it by TIC_M(G), it may be defined by the following equation using Equation 8.3:

TIC_M(G) = D_M \times IC_M(G).    (8.15)
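To make the distance-based definitions concrete, here is a small Python sketch (helper names are ours) that extracts the distance multiset of the six-vertex chain graph and evaluates N_D, the counts n_i, D_M, and the indices IC_D and TIC_D of Equations 8.10–8.13.

```python
from math import log2
from collections import Counter

n = 6
# Topological distances in the chain graph G1: d(i, j) = |i - j|
distances = [abs(i - j) for i in range(n) for j in range(i + 1, n)]

N_D = n * (n - 1) // 2                       # Equation 8.10: number of vertex pairs
counts = Counter(distances)                  # n_i distances of each magnitude D_i
D_M = sum(d * c for d, c in counts.items())  # Equation 8.13: sum of all distances

ic_D = sum((c / N_D) * log2(N_D / c) for c in counts.values())   # Equation 8.11
tic_D = N_D * ic_D                                                # Equation 8.12

print(N_D, D_M)                  # 15 35
print(f"IC_D  = {ic_D:.4f}")     # ≈ 2.1493
print(f"TIC_D = {tic_D:.4f}")    # ≈ 32.24
```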

Evidently, one can always define the other information theoretical measures, namely NIC, CIC, and RIC, on graph distances and on the magnitudes of graph distances using the corresponding equations given earlier.

Illustration

In order to illustrate the computation of the information theoretical indices on graph distances and their magnitudes, we take those of G_1 as an example. In G_1, there are 5, 4, 3, 2, and 1 distances of magnitudes 1, 2, 3, 4, and 5, respectively. Therefore, there are 15 distances of different magnitudes in G_1. Now, considering a partition of these 15 distances from the equivalence of their magnitude, we can compute IC_D(G_1) and TIC_D(G_1) from Equations 8.11 and 8.12, respectively, as

IC_D(G_1) = \frac{5}{15} \log_2 \frac{15}{5} + \frac{4}{15} \log_2 \frac{15}{4} + \frac{3}{15} \log_2 \frac{15}{3} + \frac{2}{15} \log_2 \frac{15}{2} + \frac{1}{15} \log_2 \frac{15}{1} = 2.1493.

Therefore, the TIC on graph distances for G_1 is

TIC_D(G_1) = 15 \times IC_D(G_1) = 32.2395.

We can now compute the IC on the magnitude of distances in the following way. In G_1, the sum of all distances, D_M, according to Equation 8.13 is

D_M = (1 \times 5) + (2 \times 4) + (3 \times 3) + (4 \times 2) + (5 \times 1) = 35.

Therefore, from this value of D_M, IC_M(G_1) may be computed using Equation 8.14 as

IC_M(G_1) = 1 \times \frac{5}{35} \log_2 \frac{35}{5} + 2 \times \frac{4}{35} \log_2 \frac{35}{4} + 3 \times \frac{3}{35} \log_2 \frac{35}{3} + 4 \times \frac{2}{35} \log_2 \frac{35}{2} + 5 \times \frac{1}{35} \log_2 \frac{35}{1} = 3.8544.

Now, the TIC on the magnitude of graph distances for G_1, TIC_M(G_1), may be calculated using Equation 8.15 as

TIC_M(G_1) = D_M \times IC_M(G_1) = 134.9040.

It may be noted that the information theoretical measures IC_D, TIC_D, IC_M, and TIC_M given here are the same as the distance-based measures proposed by Bonchev and Trinajstić [38], where they were denoted by \bar{I}_D, I_D, \bar{I}_D^{W}, and I_D^{W}, respectively.

While distance-based information theoretical measures are computed for entire molecules, Raychaudhury et al. [19] proposed that such measures may also be obtained for individual vertices; those measures may subsequently be used to derive indices for the entire molecule. Accordingly, they considered the distances of all vertices from a given vertex in a connected graph as the elements of a system and partitioned these elements into disjoint classes from the equivalence of the distance values (the distances of all vertices of a connected graph from a given vertex may be obtained from the corresponding row/column of the distance matrix of the graph). It is interesting to note that this, in fact, gives a partition of the vertices of the graph if the given vertex, lying at distance zero from itself, is also considered. Taking this into account, one can obtain an IC measure for a vertex v based on the equivalence of distances. Therefore, if there are n vertices in G and m_j vertices at a distance D_j from a vertex v_i, i = 1, 2, …, n, in G, and the measure is denoted by IC^V(G_i) for vertex v_i, we get

IC^V(G_i) = \sum_{j=0}^{\max(D)} \frac{m_j}{n} \log_2 \frac{n}{m_j}.    (8.16)

ICA (G) =

(8.17)

where ICV (Gi ) is the IC of ith vertex in G. Similarly to the entire graph, an IC measure for individual vertices of G may be obtained on the magnitude of distances too. In fact, in this case, the distance values Dj of mj vertices of G from the ith vertex vi of G are the components of the partition of distance sum D(vi ) for the ith vertex in G, i = 1, 2, …, n. In this case, D0 , that is, the distance of the vertex vi from itself, which is zero, does not come into consideration. Denoting this measure for ith vertex by ICD (Gi ), we get ∑ Di D(vi ) ICD (Gi ) = log2 . (8.18) D(v ) Di i i Now, from the IC values on the magnitude of distances for n vertices of a graph G, one can get a measure for the graph G. Raychaudhury et al. [19] considered distance sum D(v) of individual vertices as the components of a partition of the sum of all the entries in the distance matrix D(G) of G, which is (DM × 2), where DM is obtained from Equation 8.13. In order to have a statistical average measure by considering partition of a partition (successive partition) scheme, Raychaudhury et al. [19] proposed an information theoretical measure on magnitude of distances for graph G from those of its n vertices. Denoting it by ICM (G), it may be computed from the following equation: ICM (G) =

n ∑ D(vi ) × ICD (Gi ). DM × 2 i=1

(8.19)

It may be noted that these information theoretical indices ICV , ICD , ICA , and ICM for vertex and entire graph were denoted by Vc (vertex complexity), Vd (vertex distance complexity), HV (graph vertex complexity), and HD (graph distance complexity) in the paper where the authors originally proposed the measures [19]. The vertex indices belong to a class of indices referred to as local graph invariants (LOVI) [39] and the index Vd has found useful applications in the prediction of biological activities of different series of chemical compounds [30–32].

249

250

8 Information Content Measures and Prediction of Physical Entropy of Organic Compounds

Illustration We will consider the distances in graph G1 for computing the vertex and graph indices. We find from the distance matrix D(G1 ) of graph G1 that there are two vertices at distance 1, two vertices at distance 2, and one vertex at distance 3 from vertex 3 (v3 ) in G1 . Now, there are six vertices in G1 , including v3 , which is at a distance zero from itself and belongs to its own single-element partitioned class. Considering this information, we get a partition, say, P(G1 ) of six vertices of G1 as

P(G1 ) = 6(2, 2, 1, 1). From this partition, one can compute ICV(G13 ) for vertex v3 as ) ( ) ( 1 2 6 6 + 2 × log2 ICV(G13 ) = 2 × log2 6 2 6 1 = 1.9184. In a similar way, one can compute the values of ICV (Gi ) for other vertices of G1 . The values are given below: ICV(G11 ) = 2.5850; ICV(G12 ) = 2.2516; ICV(G14 ) = 1.9184; ICV(G15 ) = 2.2516; ICV(G16 ) = 2.5850. Now, from these vertex information values, one can compute the value of information measure ICA (G1 ) for graph G1 using Equation 8.17 as follows: 1 × [(2 × 2.5850) + (2 × 2.2516) + (2 × 1.9184)] 6 = 2.2517.

ICA (G1 ) =

In order to compute the information theoretical measures ICD (G1i ) for the vertices of G1 and ICM (G1 ) for G1 on magnitude of the distances in G1 , we proceed as follows: The distance sum D(v3 ) of vertex 3 in G1 is 9. Let us consider 9 to be partitioned into its component distance values as 9 (3, 2, 2, 1, 1). Therefore, using this partition scheme, one can compute ICD (G13 ) as ] [ ] [ ] [ 2 1 9 9 9 3 ICD (G13 ) = log2 + 2 × log2 + 2 × log2 9 3 9 2 9 1 = 2.1968. Now, this information theoretical measure for other vertices of G1 may be computed in a similar way. The values are given below: ICD (G11 ) = 2.1493; ICD (G12 ) = 2.1181; ICD (G14 ) = 2.1968; ICD (G15 ) = 2.1181; ICD (G16 ) = 2.1493. From these vertex information values, ICM (G1 ) for G1 may be computed in the following way: Here, sum of all the D(vi ) values (i.e., DM × 2) is (2 × 9) + (2 × 11) + (2 × 15) = 70.

8.2

Method

Therefore, ICM (G1 ) may be computed using Equation 8.19 as [ ] [ ] [ ] 15 11 9 ICM (G1 ) = 2 × × 2.1493 + 2 × × 2.1181 + 2 × × 2.1968 70 70 70 = 2.1517. 8.2.3.3 Information Content of Vertex-Weighted Graph

As of this section, we have discussed on using information theoretical formalism for computing ITTIs considering elements of unweighted graphs. While such information theoretical molecular descriptors can serve useful purposes, in some cases, assigning weights on vertices may give some useful descriptors that may be used for explaining molecular properties. In this section, we define two such indices based on vertex-weighted graph models of chemical compounds. Let Zi be the atomic number of the ith atom in a chemical compound, where i = 1, 2, …, n, n being the number of atoms in the molecule. We discussed in the previous section that information theoretical measures may be computed on the magnitude of the partitioned elements of a graph. In a similar way, consider Zi as the magnitude of the partitioned vertices obtained from the weights assigned to V of the compound. the vertices of a vertex-weighted molecular graph model GW In order to have the measure of IC of this vertex-weighted graph, we proceed as follows: Let Z=

n ∑

Zi .

(8.20)

i=1

Now, considering Zi values as a partition of Z into n disjoint classes, we can use Shannon’s information formula [6] to have a measure of “IC on atomic number” V using Equation 8.20 as ICz for GW ) n ( ∑ Zi Z V log2 . (8.21) )= ICZ (GW Z Zi i=1 Having this measure of IC on atomic number, one can also have a measure of TIC on atomic number, TICZ , and it may be obtained from V V ) = Z × ICZ (GW ). TICZ (GW

(8.22)

8.2.4 Information Content on the Shortest Molecular Path

Measures of IC may also be obtained from the partition of the vertices of a molecular graph from path equivalences. Two vertices u and v in a vertex-labeled molecular graph GP are said to be equivalent if (1) the number of shortest molecular paths between u and all other vertices of Gp is the same as those between v and all other vertices in GP ;

251

252

8 Information Content Measures and Prediction of Physical Entropy of Organic Compounds

(2) for each shortest molecular path of length l from u, there is one shortest molecular path of length l from v; (3) chemical labels of u and other vertices in that shortest molecular path from u are the same as those of v and other vertices in the shortest molecular path from v. Now, from the equivalence of two vertices of GP satisfying above conditions, one can have a partition of the vertex set of GP and a measure of IC on the shortest molecular path, ICP , for GP . Thus, ICP for GP may be given by ∑ Xi X log2 . ICP (GP ) = (8.23) X X i i In Equation 8.23, Xi is the number of vertices in the ith partitioned class and ∑ X= Xi . i

Subsequently, total IC on the shortest path, TICP , may also be defined as TICP (GP ) = X × IC P (GP ).

(8.24)

8.2.4.1 Computation of Example Indices

Computation of the indices ICZ , TICZ , ICP , and TICP is illustrated below (Figure 8.1), taking ethane as an example. V In graph GW of ethane (Figure 8.1), the respective atomic numbers have been put as weights on the vertices, which are representing the atoms in ethane molecule. With two carbon atoms (atomic number 6) and six hydrogen atoms (atomic number 1) in this molecule, the sum of the atomic numbers Z is Illustration

V Z[GW ethane] = (2 × 6) + (6 × 1) = 18.

Therefore, ICZ for ethane may be obtained using Equation 8.22 as follows: ) ( ) ( 1 6 18 18 V ICZ [GW + 6 × log2 ethane] = 2 × log2 18 6 18 1 = 3.1133. In addition, TICZ for ethane may be obtained using Equation 8.23 as follows: V TICZ[GW ethane] = 18 × 3.1133 = 56.0394.

o1 V of ethane: GW

o1 o4 o4 o1 o1 H2

GP of ethane:

Figure 8.1 Vertex-weighted graph GVW and atomic symbol-labeled graph GP of ethane.

o1

H1

C1 H3

o1 H4 C2 H5

H6

8.3

Prediction of Physical Entropy

Again, in GP of ethane (Figure 8.1), molecular paths of different lengths (LX)* from its vertices (atoms) are

H1 –H6

C1 and C2

H–C: 6 × L1 H–C–H: 2 × L2 H–C–C: 1 × L2 H–C–C–H: 3 × L3

C–H: 3 × L1 C–C: 1 × L1 C–C–H: 3 × L2

* LX stands for the length of the path.

Looking into these molecular paths, one can define an equivalence relation on the vertex set (atoms) of GP based on path types and the number of such paths and can partition the vertex set of GP of ethane into disjoint classes. Subsequently, using this partition scheme, one can compute the IC indices ICP and TICP for ethane using Equations 8.23 and 8.24, respectively. Clearly, the eight vertices (atoms) of GP are partitioned into two disjoint classes, where one class contains six vertices representing six hydrogen (H) atoms (H1 –H6 ) and the other class contains two vertices representing two carbon (C) atoms (C1 and C2 ). Now, this partition of eight vertices, that is, 8 (6, 2), may be used to compute the indices. Therefore, ) ( ) ( 2 6 8 8 log2 + log2 ICP [GP ethane] = 8 6 8 2 = 0.8112, TICP [GP ethane] = 8 × 0.8112 = 6.4896.

8.3 Prediction of Physical Entropy

Prediction of physical entropy deserves attention since it is an important thermodynamic property which has many practical applications [2, 9]. This is true for both small molecules and macromolecules, including proteins, which are the key players for regulating many biological functions. For example, Pal and Chakrabarti [5] considered some structural properties of proteins taken from a three-dimensional (3D) data set of protein structures to estimate loss of main chain conformational entropy due to protein folding. Here, however, our primary interest is to discuss prediction of physical entropy of small organic compounds. In this regard, we will first discuss two interesting works for the prediction of entropy – one using graph theoretical concepts and the other from molecular properties. This will be followed by a discussion on two information theoretical approaches for the prediction of physical entropy.

253

254

8 Information Content Measures and Prediction of Physical Entropy of Organic Compounds

Few decades ago, Gordon and Kennedy [3] considered a graph-like state of matter and proposed that physical entropy could be calculated from an LCGI. It was emphasized [3] that if the coefficients of graph invariants in a mathematical equation could be determined empirically or purely from graph theory, then a physical measurable chemical system could be expressed in the form of LCGI. Subsequently, authors of [3] used graph-like state input terms to compute physical entropy, for example, input terms for deriving combinatorial (rotational symmetry) entropy by considering the order of the automorphism group of a molecular graph taken in its floppy graph-like state. It was perhaps for the first time that graph theoretical consideration was brought into picture for measuring physical entropy. However, there was a need for the development of theoretical models that could predict entropy for a greater variety of organic compounds. More recently, Zhao et al. [2] have considered a number of molecular structural and other properties to explain entropy of boiling of organic compounds. The authors [2] considered a number of parameters to build a multiple regression model for the prediction purpose. Reviewing an earlier work, they chose four independent variables to generate the regression equation model – rotational symmetry number (𝜎), effective number of torsional angle (𝜏), planarity number (𝜔), and a variable that takes into account the contribution of most recognized hydrogen bonding groups (group contribution) – since hydrogen bonding plays an important role in the change of entropy (increase in the value) when molecules undergoes change of phase from liquid to gas state at the time of vaporization. The regression model having a high statistical significance was built from the entropy values and those of other parameters of 903 organic compounds of diverse nature and produced closeness in the observed and predicted entropy values for a set of test compounds as evidenced from low root-mean-square error (RMSE) value obtained from statistical analysis on the data. This indicated that this regression model might be useful for predicting entropy of boiling for a large number of organic compounds. 8.3.1 Prediction of Entropy using Information Theoretical Indices

Although physical entropy could be measured using various theoretical methods, our primary interest was to discuss the application of information theoretical measures in predicting physical entropy as there was a close resemblance in their mathematical structure. To the best of our knowledge, information theoretical formalism to predict entropy was first used by Bonchev et al. [8] by partitioning the atoms of a compound from point group symmetry consideration, which takes into account geometrical aspects of molecular structure such as bond length and bond angle that had considerable impact on the properties of molecules. In their studies, they proposed two measures of IC on symmetry, I sym (TIC) and I sym (IC), and computed the values of these indices for small sets of some of the well-known chemical compounds, for example, alkanes, alcohols, and aromatic hydrocarbons. However, the necessity for developing more generalized mathematical/statistical

8.3

Prediction of Physical Entropy

models for the prediction of entropy using suitable information theoretical formalism on molecular graphs still remained. More recently, Raychaudhury et al. [9] have proposed and used some of the ITTIs, described earlier in this chapter, in explaining gas-phase thermal entropy of a series of organic compounds. The idea was to develop statistical regression model by using meaningful information theoretical molecular descriptors as well as a mathematical result that could be used as more generalized methods to predict gas-phase thermal entropy (S0 ) for a large number of organic compounds composed of both cyclic and acyclic structures. For developing a useful regression model, they defined two sets of ITTIs, reflecting important structural features such as size, bulk, and topological symmetry of the molecular structure, which could be believed to determine gas-phase thermal entropy of organic compounds [9]. They considered observed entropy (S0 ) values (taken from the literature) and the computed values of some of the ITTIs defined in their paper [9] for 100 compounds as a training set for finding correlation between S0 and ITTIs, and for developing a regression equation model, which could be used to predict gas-phase thermal entropy. Two ITTIs, TICZ and TICP , were individually found to produce high correlation (r2 ≥ 0.86; r being correlation coefficient) for 100 training set compounds. However, by taking both together for developing multiple regression model, the correlation became much higher (r2 = 0.92). Subsequently, entropy values were predicted for a test set of 10 compounds, comprising both cyclic and acyclic molecules, using the regression model corresponding to this higher correlation with two ITTIs and the predicted values were found to be close to the observed values. This gave an indication that such regression models could be used for the prediction of gas-phase thermal entropy for a large number of organic compounds. Since it was important to use meaningful molecular descriptors, which could reflect molecular bulk, size as well as some kind of structural symmetry, Raychaudhury et al. [9] found TIC indices, TICZ and TICP , more suitable since they included the sum of atomic numbers as well as the number of atoms in a molecule and not merely their partition into disjoint classes, which were measured from ICZ and ICP . Together with the statistical model, Raychaudhury et al. [9] also discussed on how some of the mathematical results could help rationalize prediction of entropy in a deterministic manner. For example, one could easily say from the result of Theorem 8.3 that the value of TICZ would increase if an atom of lower atomic number in a molecule was replaced by another atom of higher atomic number (other than monoatomic species). Again, symmetry in molecular structure was believed to be another factor determining entropy. Although, information theoretical measures from symmetry of 3D structures of compounds were proposed based on point group of symmetry [8], the indices ICP and TICP seemed to reflect some types of topological symmetry in a molecule, since it was based on the partition of its atoms (vertices) from the emergence of a similar type of shortest molecular path from them. This, in a way, put topologically similarly positioned atoms in one partitioned class. Furthermore, total information index TICP included the number of atoms in a molecule

255

256

8 Information Content Measures and Prediction of Physical Entropy of Organic Compounds

that took into account the size aspect too and was considered to be a more suitable index for the prediction of entropy. Therefore, it appeared from that study [9] that statistically significant regression model obtained using ITTIs, TICZ , and TICP , together with Theorem 8.3 might be useful in predicting gas-phase thermal entropy of a large number of organic compounds. All these seemed to indicate that information theoretical approach could be useful and might be explored more toward predicting physical entropy. 8.4 Conclusion

The purpose of this chapter is to discuss on different information theoretical measures that stem from Shannon’s information [6] and their applications in predicting physical entropy. It is believed that such measures have the potentiality of characterizing elements of a discrete system, such as elements of molecular graphs. In a way, these measures may be viewed as measures of “eliminated homogeneity” or “a measure of variety” as Ashby called it [40] rather than a measure of “eliminated uncertainty.” This notion seems to be fitting with situations when various atoms constitute a chemical compound, where the atoms can assume different chemical natures and properties. Representation and characterization of chemical structures using graph give an ample opportunity to use information theoretical formalism to work with chemical compounds. In particular, we have seen that some of the information theoretical indices can effectively explain physical entropy of organic compounds. We started this chapter with the idea of giving an account of different information theoretical measures and on finding how such measures could be related to physical entropy, since the mathematical construct of both of them is very close in nature. The statistically significant correlation between some of the ITTIs and gas-phase thermal entropy (S0 ) of a series of organic compounds and the closeness in the observed and predicted S0 values of test compounds seem to support that idea. The statistical predictive model developed by the authors of [9] may be regarded as quite a general one to find successful application in predicting gas-phase thermal entropy of a large number of organic compounds. In general, we have seen that researchers have always looked for suitable molecular descriptors/properties in explaining physical entropy. This approach would possibly help determine and understand entropy on a more rational basis. We have also given some mathematical results in the form of theorems that may help work on both IC measures and physical entropy in a more deterministic way. Some definitions have also been given that may help prepare a set of standard definitions to work on molecular graphs. We have also proposed two information theoretical measures, namely PIC for individual elements of a system belonging to different partitioned classes and CPIC for partitioned classes. This may help understand the contribution of individual elements of a system and of the elements belonging to a given partitioned class in explaining molecular properties such as the contribution of a group of atoms or of individual molecular fragments/

References

substructures in predicting physical entropy. Therefore, it is quite apparent from the results presented here that more mathematical and statistical models on two closely related measures such as IC of a system and physical entropy may uncover many more interesting relationships between them and may help predict physical entropy using information theoretical indices for a wide variety of organic compounds.

8.5 Acknowledgment

We sincerely acknowledge the financial support by the Department of Biotechnology, New Delhi. We would also like to thank the Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India for extending the facilities for doing this work.

References 1. Clausius, R.J.E. (1865) Ueber ver-

2.

3.

4. 5.

6.

7.

8.

schiedene für die Anwendung bequeme Formen der Hauptgleichungen der mechanischen Wärmetheorie. Ann. Phys., 125, 353–400. doi: 10.1002/andp.18652010702. Zhao, L., Li, P., and Yalkowsky, S.H. (1999) Predicting the entropy of boiling for organic compounds. J. Chem. Inf. Comput. Sci., 39, 1112–1116. Gordon, M. and Kennedy, J.W. (1973) The graph-like state of matter. J. Chem. Soc., Faraday Trans. 2, 69, 484–504. Harary, F. (1972) Graph Theory, Addison-Wesley, Reading, MA. Pal, D. and Chakrabarti, P. (1999) Estimation of the loss of main-chain conformational entropy of different residues on protein folding. Proteins Struct. Funct. Genet., 36, 332–339. Shannon, C.E. and Weaver, W. (1949) Mathematical Theory of Communication, University of Illinois Press, Urbana, IL. Reis, M.C., Florindo, C.C.F., and Bassi, A.B.M.S. (2015) Entropy and its mathematical properties: consequences for thermodynamics. ChemTexts., 1, 9. doi: 10.1007/s40828-015-0009-x Bonchev, D., Kamenski, D., and Kamenska, V. (1976) Symmetry and information content of chemical structures. Bull. Math. Biol., 38, 119–133.

9. Raychaudhury, C. and Pal, D. (2013)

10.

11.

12.

13.

14.

15.

16.

Information content of molecular graph and prediction of gas phase thermal entropy of organic compounds. J. Math. Chem., 51, 2718–2730. Rashevsky, N. (1955) Life, information theory and topology. Bull. Math. Biophys., 17, 229–235. Trucco, E. (1956) A note on the information content of graph. Bull. Math. Biophys., 18, 129–135. Mowshowitz, A. (1968) Entropy and the complexity of the graphs I: an index of the relative complexity of a graph. Bull. Math. Biophys., 30, 174–204. Sarkar, R., Roy, A.B., and Sarkar, P.K. (1978) Topological information content of genetic molecules-I. Math. Biosci., 39, 299–312. Dehmer, M., Grabner, M., and Varmuza, K. (2012) Information indices with high discriminative power for graphs. PLoS One, 7 (2), e31214. doi: 10.1371/journal.pone.0031214 Chen, Z., Dehmer, M., and Shi, Y. (2014) A note on distance-based graph entropies. Entropy, 16, 5416–5427. Cao, S. and Dehmer, M. (2015) Degreebased entropies of networks revisited. Appl. Math. Comput., 261, 141–147.

257

258

8 Information Content Measures and Prediction of Physical Entropy of Organic Compounds 17. Bonchev, D., Mekenyan, O., and

18.

19.

20.

21. 22.

23.

24.

25.

26.

27.

28.

29.

Trinajstic, N. (1981) Isomer discrimination by topological information approach. J. Comput. Chem., 2, 127–148. Balaban, A.T. (1982) Highly discriminating distance-based topological index. Chem. Phys. Lett., 89, 399–404. Raychaudhury, C., Ray, S.K., Ghosh, J.J., Roy, A.B., Basak, S.C. (1984) Discrimination of isomeric structure using information theoretic topological indices. J. Comput. Chem., 5, 581–588. Randic, M. (1984) On molecular identification number. J. Chem. Inf. Comput. Sci., 24, 164–175. Brillouin, L. (1956) Science and Information Theory, Academic Press, New York. Basak, S.C., Roy, A.B., and Ghosh, J.J. (1979) Proceedings of the Second International Conference on Mathematical Modeling, vol. 2, University of Missouri, Rolla, MO, p. 851. Raychaudhury, C. and Ghosh, J.J. (1984) Proceedings of the Third Annual Conference of the Indian Society for Theory of Probability and its Applications, August 22–24, 1981, Wiley Eastern Limited, New Delhi. Raychaudhury, C. and Ghosh, I. (2004) An information-theoretical measure of similarity and a topological shape and size descriptor for molecular similarity analysis. Internet Electron. J. Mol. Des., 3, 350–360. Dehmer, M. and Mowshowitz, A. (2011) A history of graph entropy measures. Inf. Sci., 181, 57–78. Barigya, S.J., Marrero-Ponce, Y., Perez-Gimenez, F., and Bonchev, D. (2014) Trends in information theorybased chemical structure codification. Mol. Diversity, 18 (3), 673–686. Basak, S.C. (1987) Use of molecular complexity indices in predictive pharmacology and toxicology. Med. Sci. Res., 15, 605–609. Bonchev, D. (1983) Information Theoretic Indices for Characterization of Chemical Structures, Wiley-Research Studies Press, Chichester. Raychaudhury, C., Basak, S.C., Roy, A.B., and Ghosh, J.J. (1980) Quantitative structure-activity relationship (QSAR)

30.

31.

32.

33.

34.

35. 36.

37.

38.

39.

40.

studies of pharmacological agents using topological information content. Indian Drugs, 18, 97–102. Raychaudhury, C. and Pal, D. (2012) Use of vertex index in structure-activity analysis and design of molecules. Curr. Comput.-Aided Drug Des., 8, 128–134. Klopman, G. and Raychaudhury, C. (1988) A novel approach to the use of graph theory in structure-activity relationship studies. Application to the qualitative evaluation of Mutagenicity in a series of Nonfused ring aromatic compounds. J. Comput. Chem., 9, 232–243. Klopman, G. and Raychaudhury, C. (1990) Vertex indices of molecular graphs in structure-activity relationships: a study of the convulsant-anticonvulsant activity of barbiturates and the carcinogenicity of unsubstituted polycyclic aromatic hydrocarbons. J. Chem. Inf. Comput. Sci., 30, 12–19. Raychaudhury, C. (1983) Studies on Topological Information Content of linear and multigraphs and their applications. PhD thesis. Jadavpur University, India. Balaban, A.T. (ed) (1967) Chemical Application of Graph Theory, Academic Press, London. Trinajstic, N. (1983) Chemical Graph Theory, CRC Press, Boca Raton, FL. Kier, L.B. and Hall, L.H. (1986) Molecular Connectivity in Structure-Activity Analysis, Wiley-Research Studies Press, Letchworth. Hosoya, H. (1971) Topological index. A newly proposed quantity characterizing the topological nature of structural isomers of saturated hydrocarbons. Bull. Chem. Soc. Jpn., 44, 2332–2339. Bonchev, D. and Trinajstic, N. (1977) Information theory, distance matrix, and molecular branching. J. Chem. Phys., 67, 4517–4533. Balaban, A.T. (1992) Using real numbers as vertex invariants for third-generation topological indexes. J. Chem. Inf. Comput. Sci., 32, 23–28. Ashby, W. (1956) An Introduction to Cybernetics, John Wiley & Sons, Inc., New York.

259

9 Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data André Calero Valdez, Matthias Dehmer, and Andreas Holzinger

9.1 Introduction

Entropy, originating from statistical physics, is an interesting and challenging concept with many diverse definitions and various applications. Considering all the diverse meanings, entropy can be used as a measure of disorder in the range between total order (structured) and total disorder (unstructured) [1, 2], as long as by “order” we understand that objects are segregated by their properties or parameter values. States of lower entropy occur when objects become organized, and ideally when everything is in complete order, the entropy value is 0. These observations generated a colloquial meaning of entropy [3]. Following the concept of the mathematical theory of communication by Shannon and Weaver [4], entropy can be used as a measure of the uncertainty in a data set. The application of entropy became popular as a measure for system complexity with the paper by Pincus [5], who described Approximate Entropy as a statistic quantifying regularity within a wide variety of relatively short (>100 points) and noisy time series data. The development of this approach was initially motivated by data of length constraints, which is commonly encountered in typical biomedical signals including heart rate and electroencephalography (EEG), but also in endocrine hormone secretion data sets [6]. Hamilton et al. [7] were the first to apply the concept of entropy to bibliometrics to measure interdisciplinarity from diversity. While Hamilton et al. worked on citation data, a similar approach has been applied by Holzinger et al. [8, 9] using enriched metadata for a large research cluster.

Mathematical Foundations and Applications of Graph Entropy, First Edition. Edited by Matthias Dehmer, Frank Emmert-Streib, Zengqiang Chen, Xueliang Li, and Yongtang Shi. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA. Published 2016 by Wiley-VCH Verlag GmbH & Co. KGaA.

260

9 Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data

9.1.1 Challenges in Bibliometric Data Sets, or Why Should We Consider Entropy Measures?

The challenges in bibliometric data stem from various sources. First, data integrity and data completeness can never be assumed. Thus, bibliometrics faces the following problems:

• • • •

Heterogeneous data sources: need for data integration and data fusion. Complexity of the data: network dimensionality. Large data sets: manual handling of the data nearly impossible. Noisy, uncertain, missing, and dirty data: careful data preprocessing necessary.

Beyond these data integrity problems, problems of interpretation and application are important. Meyer [10] lists six stylized facts that represent recurring patterns in bibliometric data:

• Lotka’s law [11] (Frequency of publication per author in a field), • The Matthew effect: Famous researchers receive a lot more citations than less• • • •

prominent researchers [12], Exponential growth of the number of scientists and journals [13], Invisible schools of specialties for every 100 scientists [13], Short-citation half-life, and Bradfort’s law of scattering of information.

These stylized facts lead to the consideration of analyzing publication data using graph-based entropy analysis. Bibliometric data are similar to social network data (e.g., small-world phenomenon) and obey the aforementioned laws. In these types of network data graphs, entropy may reveal potentials unavailable to standard social-network analysis methodology. Entropy measures have successfully been tested for analyzing short, sparse, and noisy time series data. However, they have not yet been applied to weakly structured data in combination with techniques from computational topology. Consequently, the inclusion of entropy measures for discovery of knowledge in bibliometric data promises to be a big future research issue, and there are a lot of promising research routes. Particularly, for data mining and knowledge discovery from noisy, uncertain data, graph entropy-based methods may bring some benefits. However, in the application of entropy for such purposes are several unsolved problems. In this chapter, we focus on the application of topological entropy and open research issues involved. In general, graph theory provides powerful tools to map data structures and find novel connections between single-data objects [14, 15]. The inferred graphs can be further analyzed by using graph-theoretical and statistical techniques [16]. A mapping of the aforementioned hidden schools as a conceptual graph and the subsequent visual and graph-theoretical analysis may bring novel insights into the

9.2

State of the Art

hidden patterns of the data, which exactly is the goal of knowledge discovery [17]. Another benefit of the graph-based data structure is the applicability of methods from network topology and network analysis and data mining (e.g., small-world phenomenon [18, 19] and cluster analysis [20, 21]). 9.1.2 Structure of this Chapter

This chapter is organized as follows: We have already seen a short introduction into the problems of bibliometrics and how entropy could be used to tackle these problems. Next, we investigate the state of the art in graph-theoretical approaches and how they are connected to text mining (see Section 9.2.1). This prepares us to understand how graph entropy could be used in data-mining processes (see Section 9.2.2). Next, we show how different graphs can be constructed from bibliometric data and what research problems can be addressed by each of those (see Section 9.2.3). We then focus on coauthorship graphs to identify collaboration styles using graph entropy (see Section 9.3). For this purpose, we selected a subgroup of the DBLP database and prepared it for our analysis (see Section 9.4). The results (see Section 9.5) show how two entropy measures describe our data set. From these results, we conclude our discussion of the results and consider different extensions on how to improve our approach (see Section 9.6).

9.2 State of the Art

Many problems in the real world can be described as relational structures. Graph theory [22] provides powerful tools to map such data structures and find novel connections between single-data objects [14, 15]. The inferred graphs can be further analyzed by using graph-theoretical and statistical techniques [16]. A mapping of the existing and in-medical-practice-approved knowledge spaces as a conceptual graph and the subsequent visual and graph-theoretical analysis may bring novel insights on hidden patterns in the data, which exactly is the goal of knowledge discovery [17]. Another benefit of a graph-based data structure is the applicability of methods from network topology and network analysis and data mining (e.g., small-world phenomenon [18, 19] and cluster analysis [20, 21]). The first question is “How to get a graph?,” or simply “How to get point sets?,” because point cloud data sets (PCD) are used as primitives for such approaches. Apart from “naturally available” point clouds (e.g., from laser scanners, or resulting from protein structures or protein interaction networks [23], or also text can be mapped into a set of points (vectors) in ℝn ), the answer to this question is not trivial (for some solutions (see [24]).

261

262

9 Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data

9.2.1 Graphs and Text Mining

Graph-theoretical approaches for text mining emerged from the combination of the fields of data mining and topology, especially graph theory [25]. Graphs are intuitively more informative as example words/phrase representations [26]. Moreover, graphs are the best-studied data structure in computer science and mathematics, and they also have a strong relation with logical languages [25]. Its structure of data is suitable for various fields such as biology, chemistry, material science, and communication networking [25]. Furthermore, graphs are often used to represent text information in natural language processing [26]. Dependency graphs have been proposed as a representation of syntactic relations between lexical constituents of a sentence. This structure is argued to more closely capture the underlying semantic relationships, such as subject or object of a verb, among those constituents [27]. The beginning of graph-theoretical approaches in the field of data mining was in the mid-1990s [25] and there are some pioneering studies such as [28–30]. According to [25], there are five theoretical bases of graph-based data-mining approaches: (i) subgraph categories, (ii) subgraph isomorphism, (iii) graph invariants, (iv) mining measures, and (v) solution methods. Furthermore, there are five groups of different graph-theoretical approaches for data mining: (i) greedy search-based approach, (ii) inductive logic programming-based approach, (iii) inductive database-based approach, (iv) mathematical graph theory-based approach, and (v) -kernel function based approach [25]. There remain many unsolved questions about the graph characteristics and the isomorphism complexity [25]. Moreover, the main disadvantage of graph-theoretical text mining is the computational complexity of the graph representation. The goal of future research in the field of graph-theoretical approaches for text mining is to develop efficient graph-mining algorithms, which implement effective search strategies and data structures [26]. Graph-based approaches in text mining have many applications from biology and chemistry to Internet applications [31]. According to Morales et al. [32], graph-based text-mining approach combined with an ontology (e.g., the Unified Medical Language System – UMLS) can lead to better automatic summarization results. In Ref. [33], a graph-based data-mining approach was used to systematically identify frequent coexpression gene clusters. A graph-based approach was used to disambiguate word sense in biomedical documents in Agirre et al. [34]. Liu et al. [35] proposed a supervised learning method for extraction of biomedical events and relations, based directly on subgraph isomorphism of syntactic dependency graphs. The method extended earlier work [36] that required sentence subgraphs to exactly match a training example, and introduced a strategy to enable approximate subgraph matching. These methods have resulted in high-precision extraction of biomedical events from the literature. While graph-based approaches have the disadvantage of being computationally expensive, they have the following advantages:

9.2

State of the Art

• It offers a far more expressive document encoding than other methods [26]. • Data which are graph-structured widely occur in different fields such as bibliometrics, biology, chemistry, material science, and communication networking [25]. A good example for graph learning has been presented by Liu et al. [37], who proposed a graph-learning framework for image annotation, where, first, the image-based graph learning is performed to obtain candidate annotations for each image and then word-based graph learning is developed to refine the relationships between images and words to get final annotations for each image. In order to enrich the representation of the word-based graph, they designed two types of word correlations based on Web search results besides the word co-occurrence in the training set. In general, image annotation methods aim to learn the semantics of untagged images from already annotated images to ensure an efficient image retrieval. 9.2.2 Graph Entropy for Data Mining and Knowledge Discovery

Rashevsky [38], Trucco [39], and Mowshowitz [40] were among the first researchers to define and investigate the entropy of graphs. Graph entropy was described by Mowshowitz [41] to measure structural information content of graphs, and a different definition, more focused on problems in information and coding theory, was introduced by Körner [42]. Graph entropy is often used for not only the characterization of the structure of graph-based systems, for example, in mathematical biochemistry, but also any complex network [43]. In these applications, the entropy of a graph is interpreted as its structural information content and serves as a complexity measure, and such a measure is associated with an equivalence relation defined on a finite graph. By applying Shannon’s Eq. (2.4) in Ref. [44] with the probability distribution, we get a numerical value that serves as an index of the structural feature captured by the equivalence relation [44]. The open-source graph visualization tool Gephi allows for several different graph analyses of network graphs. Traditionally, these are used with social network graphs (i.e.,coauthorship graphs). Interpretation of graph statistics must be reevaluated for mixed-node graphs. Graph statistics that are of interest in regard to publication networks are:

• Network entropies have been developed to determine the structural information content of a graph [44, 45]. We have to mention that the term network entropy cannot be uniquely defined. A reason for this is that by using Shannon’s entropy [41, 46, 47], the probability distribution cannot be assigned to a graph uniquely. In the scientific literature, two major classes have been reported [45, 48, 49]: 1. Information-theoretic measures for graphs, which are based on a graph invariant X (e.g., vertex degrees and distances) and equivalence criterion

263

264

9 Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data

[41]. By starting from an arbitrary graph invariant X of a given graph and equivalence criterion, we derive a partitioning. Thus, one can further derive a probability distribution. An example thereof is to partition the vertex degrees (abbreviated as 𝛿(v)) of a graph into equivalence classes, that is, those classes only contain vertices with degree i = 1, 2, … , max𝛿(v) (see, e.g., [16]). 2. Instead of determining partitions of elements based on a given invariant, Dehmer [48] developed an approach, which is based on using the so-called information functionals. An information functional f is a mapping, which maps sets of vertices to the positive reals. The main difference to partitionbased measures (see previous item) is that we assign probability values to every individual vertex of a graph (and not to a partition), that is, f (v ) pf (vi ) ∶= ∑|V | i . f (v ) j j=1

(9.1)

As the probability values depend on the functional f , we infer a family of graph entropy measures If (G) ∶= −

|V | ∑

pf (vi ) log pf (vi ),

(9.2)

i=1

where |V | is the size of the vertex set of G. Those measures have been extensively discussed in Ref. [16]. Evidently, both graph measures can be interpreted as graph complexity measures [16]. The latter outperforms partition-based entropy measures, because they integrate features from every vertex instead of subgraphs. This is important because when we look at bibliometric data (e.g., coauthorship graphs), we often differ to small degrees. Measuring these with partition-based entropy could lead to highly similar data for dissimilar graph data. 9.2.3 Graphs from Bibliometric Data

Graph entropy in bibliometric data can be applied to various forms of data. Depending on how the metadata are interpreted, different types of graphs can be constructed [50]. The question that we can apply differs depending on the type of graph. One must ask: What does graph entropy mean when bibliometric graph is analyzed. For this purpose, we first list various types of bibliometric graph representations.

• Collaboration-based graphs/coauthorship graphs: In a coauthorship graph, vertices represent unique authors that have published articles. Edges are inserted by connecting vertices that have published articles as coauthors.

9.2

State of the Art

Edge weights can be mapped to the frequency of collaboration. A key benefit of this type of analysis is that it can be applied using metadata alone that is publicly available. Authorship graphs are undirected graphs. Typical analyses are conducted to understand patterns of collaboration, interdisciplinarity, and the evolution of scientific subjects [51]. – Author level: When edges represent individual authors, we speak about author-level coauthorship graphs. – Institutional level: When edges represent institutions, we speak about institutional-level coauthorship graphs.

• Citation-based graphs: Mapping citations from articles requires more data than often available [52]. As vertices, we use articles that are joined by citation edges. Obviously, these graphs can only be constructed when citation data are available. A citation-based graph is a directed graph. No weights are assigned to edges. In these graphs, analyses can be conducted in various forms. Typical analyses are cocitation analysis (i.e., what documents get cited together [53]), centrality analyses (e.g., what documents form the core of knowledge), and bibliographic coupling [50] (i.e., what documents cite similar documents). These measures are also often used to identify scientific subject or the degree of interdisciplinarity of a journal. – Article level: When nodes represent individual articles, we speak about article-level citation graphs. – Journal level: When nodes represent journals, we speak about journallevel citation graphs. – Subject level: When nodes represent scientific subjects, we speak about subject-level citation graphs.

• Content/topic-based graphs: When full text or abstract data are available, content of articles may also be used in a graph-based representation. Using different text-mining approaches, topics may be identified. These can be used to map various information. Often topic-based graphs are multimodal, representing the relationships between different entities. These graphs are often used for recommendation purposes or to identify trends. – Author–topic mapping: When nodes represent authors and topics, analyses can be performed to understand how authors contribute to different topics. – Journal–topic mapping: When nodes represent journals and topics, we can analyze how topics are formed and which journals are the main contributors to a topic, and how they do so. – Article–topic mapping: When nodes represent articles and topics, we can analyze which articles (and thus which authors) have formed a topic and how it develops over time.

• Other/combined graphs: Using the aforementioned graphs, we can factor in various forms of metadata (e.g., time series data and citation data) to combine

265

266

9 Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data

different approaches. For example, we can use the publication data and citation data to identify how certain groups of authors have formed topics and where central ideas come from.

9.3 Identifying Collaboration Styles using Graph Entropy from Bibliometric Data

Bibliometrics or scientometrics is the discipline of trying to discover knowledge from scientific publication to understand science, technology, and innovation. Various analyses have been conducted in scientometrics using coauthorship networks as reviewed by Kumar [51]. Collaboration styles have been investigated by Hou et al. [54], identifying patterns of social proximity for the field of scientometrics itself. Topics were identified using co-occurrence analysis, and collaborative fields were identified. Application of graph entropy to publication data could be used to determine how scientific collaboration differs in various subfields. By analyzing coauthorship graphs in subcommunities, we could be able to identify structural differences in and between groups.

9.4 Method and Materials

In our example, we need to address the simplest form of bibliometric graph data. We use this type of data to test how graph entropy works in our scenario and combine it with other methods (i.e., community detection). The aim of this approach is to identify how different communities (i.e., group of authors that coauthor articles) differ in their topology. For this purpose, we evaluate the DBLP database of computer science. The XML database contains metadata on publications in the field of computer science and covers over 3 million articles (as of September 2015). In order to limit computation times, we focus on data only from the largest journals that deal with graph theory (see Table 9.1). Because we are interested in the structure of a collaboration graph, we focus on measures that account for symmetry in the graph. Topological information content is used to measure local symmetry within communities, and parametric graph entropy is used to measure the overall symmetry of a subcommunity. By reviewing the influence of both, we see how symmetry plays out from a detailand meta-perspective. A total of about 4811 publications were made over the years. By extracting author names and constructing a coauthorship graph, we get a network of 6081 vertices (i.e., authors) and 8760 edges (i.e., collaborations). No correction for multiple author names was performed. Duplicate entries with different spellings are considered as two distinct entries. We could remove duplicates by applying

9.5

Results

Table 9.1 Largest 10 journals from DBLP and their article count. Journal name

Articles

Graphs and Combinatorics SIAM Journal on Discrete Mathematics Ars Combinatoria IEEE Transactions on Information Theory Discrete Applied Mathematics Electronic Notes in Discrete Mathematics Journal of Combinatorial Theory, Series B SIAM Journal on Computing Combinatorics, Probability and Computing IEEE Transactions on Knowledge and Data Engineering

710 644 641 489 480 432 412 402 327 277

similarity measures (e.g., Levensthein distance), but for our approach, this is not necessary. For community detection we ran the Louvain algorithm supplied by the igraph R package. After sorting communities, we measure topological information content to determine the characteristics of collaboration in these subcommunities. We evaluated the following graph entropies: (1) A partition-based graph entropy measure called topological information content based on vertex orbits due to [41]. (2) Parametric graph entropies based on a special information functional f due to Dehmer [48]. The information functional we used is 𝜌(G)

f (vi ) ∶=



ck |Sk (vi , G)|,

with ck > 0,

(9.3)

k=1

summing the product of both the size of the k-sphere (i.e., the amount of nodes in G with a distance of k from vi given as |Sk (vi , G)|) and arbitrary positive correction coefficients ck for all possible k from 1 to the diameter of the graph G. The resulting graph entropies have been defined by If ∶= −

|V | ∑

pf (vi ) log pf (vi ).

(9.4)

i=1

9.5 Results

The largest 10 communities range from a size of 225 to 115. In order to identify communities, we measure eigenvector centrality for the identified subcommunities and identify the top three most central authors (see Table 9.2). We then determine our two entropy measures for the given subcommunities to characterize the collaboration properties within these communities.

267

9 Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data

Table 9.2 The largest 10 identified communities. ID

Size

Most central three authors

1 2

225 217

3

172

4 5 6 7 8

166 161 141 132 127

9 10

127 115

Noga Alon; Alan M. Frieze; Vojtech Rödl Douglas B. West; Ronald J. Gould; Alexandr V. Kostochka Daniel Král; Ken-ichi Kawarabayashi; Bernard Lidický Muriel Médard; Tracey Ho; Michelle Effros Hajo Broersma; Zsolt Tuza; Andreas Brandstädt Xueliang Li; Cun-Quan Zhang; Xiaoyan Zhang Hong-Jian Lai; Guizhen Liu; Hao Li Syed Ali Jafar; Abbas El Gamal; Massimo Franceschetti Michael A. Henning; Ping Zhang; Odile Favaron Jayme Luiz Szwarcfiter; Celina M. Herrera de Figueiredo; Dominique de Werra

Imowsh

Idehm

7.3 7.21

7.804 7.751

6.97

7.414

6.849 6.889 6.563 6.44 6.385

7.363 7.317 7.153 7.031 6.98

6.561 6.232

6.975 6.838

8

6

Entropy

268

Entropy 4

Topological information content Information functional

2

0 0

500 1000 Communities ordered by size (desc)

Figure 9.1 Topological information content and parametric graph entropy distributions.

We can note that the used graph entropies evaluate the complexity of our communities differently (see Figures 9.1 and 9.2). Both seem to plot logarithmic curves but show different dispersion from an ideal curve. The distribution plot also shows typical properties of bibliometric data. Both entropies follow a power law distribution (see Figure 9.1). The topological information content seems to scatter more strongly than parametric graph entropy. On the contrary, the information functional-based graph entropy seems to follow the steps of the community size more precisely.

9.5

Results

8

Entropy

6

Entropy 4

Topological information content Information functional

2

0 0

50

100 150 Community size

200

Figure 9.2 Both entropies plotted over community size.

Now we will explore this problem with an example, namely by considering the measures Imowsh < Idehm for the largest subcommunity (i.e., ID = 1). In this context, the inequality Imowsh < Idehm can be understood by the fact that those entropies have been defined on different concepts. As mentioned, Imowsh is based on the automorphism group of a graph and, therefore, can be interpreted as a measure of symmetry. This measure becomes small when all vertices are located in only one orbit. By contrast, the measure is maximal (= log2 (|V |)) if the input graph equals the so-called identity graph; that means, all vertex orbits are singleton sets. In our case, we obtain Imowsh = 7.3 < log2 (225) = 7.814 and conclude that according to the definition of Imowsh , the community is rather symmetrical. Instead, the entropy Idehm characterizes the diversity of the vertices in terms of their neighborhood, see [45]. The higher the value of Idehm , the less topologically different vertices are present in the graph and, finally, the higher is the inner symmetry of our subcommunity. Again, maximum entropy for our network equals log2 (225) = 7.814. On the basis of the fact that for the complete graph K, Idehm (Kn ) = log(n) holds; thus, we conclude from the result Idehm = 7.804 that the community network is highly symmetrical and connected and could theoretically be obtained by deleting edges from K225 (see also Figure 9.3). A similar conclusion can be derived from looking at Imowsh = 7.3. In comparison, the values of community ID = 30 differ with respect to these values. Its topological information content is Imowsh = 2.355, while its parametric graph entropy is Idehm = 3.571. The theoretical maximum for this graph is log2 (12) = 3.58, which is very near to the parametric graph entropy. When looking at the resulting network plot (see Figure 9.4), we can see that the graph is symmetrical on a higher level. We have three subcommunities, all held together

269

270

9 Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data

Degree 10 20 30

Figure 9.3 Plot of the largest community in the coauthorship graph.

Dana Pardubská Stefan Dobrev Rastislav Královic L’ubomír Török Jan Trdlicka Imrich Vrto José D. P. Ro

va Czabarka

Degree 2.5 5.0 7.5

Pavel Tvrdík

10.0

Drago Bokal László A. Székely

Farhad Shahrokhi

Figure 9.4 Plot of the 30th community in the coauthorship graph with author names.

9.6

Discussion and Future Outlook

by the central author “Imrich Vrto.” The graph is thus less symmetrical on a higher order, but the inner symmetry is still high.

9.6 Discussion and Future Outlook

Different entropy measures deliver different results, because they are based on different graph properties. When using the aforementioned entropy measures in a coauthorship graph, measures of symmetry Idehm (based on vertex neighborhood diversity) or Imowsh (based on the graph automorphism) deliver different measures of entropy. Interpreted, we could say that authors can be similar in regard to their neighborhoods (i.e., authors show similar publication patterns), while the whole graph shows low measures of automorphism-based symmetry to itself. This could mean that authors cannot be exchanged for one another without changing basic properties of the graph. On the contrary, when Imowsh is significantly lower than Idehm , we could argue that symmetry differs on different levels of the graph. Interpreting these differences could be more interesting than looking at the individual symmetry measures themselves. Since publications and thus collaboration are time-related, one could extend this approach to Markov networks. Applying various graph entropy measures in this context could reveal changes in collaboration and indicate a shift in topics for authors or subgroups of authors. 9.6.1 Open Problems

From our work, we must say that deriving coauthorship communities based on Louvain clustering naturally leads to specific structures in community building. The created communities are probabilistic estimates of real communities. The investigated communities tend to show high similarity for the parametric graph entropy. This is expected, as they are constructed by removing edges from the full graph that is separated into subgraphs that should be coherent clusters. Our analysis shows that we can derive properties from coauthorship graphs that represent collaboration behavior, but our method is biased. It is likely to fail with small collaboration groups, as their entropy cannot take up that many different values. One approach to tackle this problem could be the use of bimodal graphs that include publication nodes. This, however, leads to drastically larger graphs, which in turn require more processing power. For further investigations, one could use empirical data or integrate text-mining approaches to identify more accurate clusters. The use of nonexclusive clustering methods could also improve our results. Additional measures of entropy should also be used to evaluate found communities.

271

272

9 Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data

9.6.2 A Polite Warning

Bibliometric analyses tend to be used in evaluations of scientific success quite often. Unfortunately, they are often used with only introductory knowledge in bibliometric evaluation. The purpose of this chapter is not to propose a method to evaluate research performance, but to provide new methods for the analysis of collaboration. Major deficits in this approach for performance measurement stem from typical bibliometric limitations (e.g., database coverage and author identification). Using these methods for performance evaluations without considering these limitations reveals a lack of understanding of bibliometrics and should therefore be left to bibliometric experts.

References 1. Holzinger, A. (2012) DATA 2012, 2.

3.

4.

5. 6. 7.

8.

9.


Index

a
Akaike information criterion (AIC) 194

b
Barabási–Albert model 185, 191
bare winding number 22
Bayesian networks
– conditional probability 78
– information contents 80–83
– probability distribution 77
– topological ordering 77
– entropy production 83–84
– generalized second law of thermodynamics 84–86
bibliometric datasets
– cluster analysis 261
– co-authorship graph measures 270, 271
– collaboration properties 267, 268
– data-integrity problems 260
– database coverage 272
– entropy measures 260
– graph-theoretical and statistical techniques 260, 261
– heterogeneous data sources 260
– methods 266, 267
– network topology and analysis 261
– parametric graph entropy 269
– performance evaluations 272
– topological information content and parametric graph entropy distributions 268
block entropy 11, 12, 14, 22
Bonchev–Trinajstić indices 211, 212
Brégman's theorem 105
Bradford's law of scattering of information 260

c
chemical graph theory 146
citation-based graphs 265
chaotic-band reverse bifurcation process 15
circulant-matrix theory 3
class partial information content (CPIC) 235, 238, 239, 256
collaboration-based graphs 264
collaboration styles, bibliometrics 266
complementary information content (CIC) 234, 237–239, 242
complex networks theory 2
complex and random networks
– generalized entropy See generalized mutual entropy
– network entropies 42
– open network 53
– social networks 41
content/topic-based graphs 265
contrast-based segmentation methods 214
convex corner entropy 107–108
counting problem 104–107

d
data mining applications 262
– disadvantage 262
– graph invariants 262
– high-precision extraction of biomedical events 262
– image annotation methods 263
– mining measures 262
– and ontology 262
– solution methods 262
– subgraph isomorphism 262
discrete Fourier transform (DFT) 6
discrete Shannon entropy 184
dressed winding number 22



e
entropy 102
– conditional entropy 103
– of a convex corner 107
– of a graph 108
– joint entropy 103
– relative entropy 104
Erdős–Rényi random graph 185, 186
extremal properties, graph entropy
– degree-based and distance-based graph entropies 153–157
– If1(G), If2(G), If3(G) and entropy bounds for dendrimers 157–163
– information inequalities 166–171
– sphere-regular graphs 163–166

f
feedback control
– multiple measurements 89–91
– single measurement 86–89
Feigenbaum graphs 15, 21
finite undirected graph 134
fractional Brownian motions 9
fractional chromatic number 116–118

g
Gephi tool 263
generalized graph entropies
– definition 171
– vs. graph energy 175
– vs. graph matrix 171, 172, 174, 176, 177
– vs. spectral moments 171, 175
– vs. topological indices 171, 179
generalized mutual entropy
– cluster differentiation 51
– corollary of theorem 46
– R-type matrix 48
– S-type connectivity matrix 46
– scale-free network 50
– semi-log plot 49
– sub-network size 48
– network entropy techniques 53
– scale-free network 47
geodesic active contours (GAC)
– contrast-based segmentation 214
– evolution, grayscale images 214, 215
– force terms 215, 216
– multi-channel images 216
– numerics 216, 217
Gilbert random graph 185, 189
graph entropy
– applications 134
– bicyclic graph 134
– CHF patients 9
– classical measures 135
– complex networks 3
– convex corner 107–108
– counting problem 104–107
– definition 133
– deterministic category 133
– extrinsic probabilistic category 133
– financial applications 9
– fractional chromatic number 116–118
– graph-based systems 133
– hurricanes 8
– integral 136
– intrinsic probabilistic category 133
– joint distribution 109
– local clustering coefficient 3
– macroscopic parameters 42
– max-flow min-cut theorem 113
– multidimensional scaling 3
– natural and horizontal visibility algorithms 4–6
– networks, definition and properties 43–45
– optimization and critical points 19–26
– parametric measures 135, 136
– probability distribution 136
– properties 3, 110–112
– random variable 102–104, 117
– relative entropy and mutual information 104
– seismicity 8–10
– structural complexity 135
– symmetric graphs 119–120
– turbulence 9
– unicyclic graph 134
– visibility graph 10–13
graph entropy, inequalities
– computational complexity 143
– discrete joint probability distribution 145
– discrete probability distribution 144
– edge entropy 146
– information functions 141–143
– information theoretic measures of UHG graphs 143–146
– maximum joint entropy 145
– multi-level function 143
– and parametric measures 139–141
– parameterized exponential information functions 148–153
– rooted trees and generalized trees, bounds 146–148
– time complexity 143
– vertex entropy 144, 146
graph spectral entropy
– definition 189
– empirical graph spectral entropy 191


– of brain networks 200
– structure 189
graph spectrum
– empirical spectral density 188
– predictability of random variable outcomes 188
– spectral density estimators 189
– spectral graph theory 187
– spectral density 188
– structure 188
graph-entropy-based texture descriptors
– application, graph indices 209
– Bonchev and Trinajstić's mean information, distances 212, 213
– Dehmer entropies 213, 214
– entropy-based graph indices 211–214
– fractal analysis 223–225
– graph construction 210, 211
– Haralick features 209
– image processing algorithms 209
– infinite resolution limits, graphs 222, 223
– information functionals, rewriting 221, 222
– Shannon's entropy 212
– theoretical analysis 209
greedy search based approach 262
Gutenberg–Richter law 8

h
hierarchical graph 146
horizontal visibility (HV) graph 18, 22
hurricanes 8
hyper-Wiener index 175
hypothesis test, graph collections
– bootstrap procedure 196, 197
– Jensen–Shannon divergence 196
– Kullback–Leibler divergence 196
– null and alternative hypotheses 197
– of functional brain networks 200
– random resamplings 196
– ROC curves 186, 197–199

i
image analysis, graph models
– transfer image processing methods 205
– graph morphology 205
– pixel-graph framework 205
inductive database based approach 262
inductive logic programming based approach 262
information content (IC) measures
– finite discrete system 236
– molecular graph 244
– negentropy principle 237
– on shortest molecular path 251, 252
– on vertex degree 245, 246
– Shannon's information 237, 243
– shortest path 244
– theoretical definitions 243, 244
– topological architecture, molecule 243
– topological distances 246–250
– topological indices 243
– vertex weighted graph 251–253
information theoretical topological indices (ITTI) 235, 243
information theory 188
information thermodynamics 64–65
information-theoretical methods 141

j
Jensen–Shannon divergence 192

k
Karush–Kuhn–Tucker (KKT) optimality conditions 119
kernel function based approach 262
knowledge discovery, bibliometrics
– graph complexity measures 264
– information functionals 264
– information-theoretic measures 263
– network entropies 263
– partition-based entropy 264
– partition-based measures 264
– structural information content 263
Kolmogorov–Sinai entropy 11, 12, 16
Kullback–Leibler divergence 104, 192, 194

l
linear combination of graph invariants (LCGI) 254
local graph invariants (LOVI) 249
Lotka's law 260
Louvain clustering 271

m
Markov chain networks 137, 271
Markovian dynamics 86–87
– energetics 72
– entropy production and fluctuation theorem 73–76
– information exchange 91–94
– time evolution, of external parameter 71
mathematical graph theory based approach 262
Matthew effect 260
Maxwell's demon 64
model selection and parameter estimation
– definition 193


– Kullback–Leibler divergence 194
– model selection simulation 195
– protein–protein networks 199
– spectral density 193
mutual information 68–69, 104

n
natural visibility algorithm (NVa) 4–6
negentropy principle 234, 237
network dimensionality 260
nonlinear signal analysis 2
non-negative Borel functions 137
normalized information content (NIC) 237, 239

o
open network
– inter-cluster connections 57
– scale-free network 56
– size of 57
open source graph visualization tool 263
organic compounds, physical entropy
– applications 235
– graph theoretical models 234
– information content measures 236
– information theoretical measures 235
– molecular graph models 236
– multigraph model 234
– representation and characterization 256
– statistical predictive model 256

p
parametric complexity measures 138
parametric graph entropies 267
partial information content (PIC) 235, 238, 239
partition-based graph entropy measure 267
path graph 134
perturbation theory of linear operators 3
Pesin theorem 12, 19
physical entropy
– applications 233
– biological functions 253
– gas phase thermal entropy 233
– graph-theoretical concepts 253
– information theoretical approaches 253
– information theory 233
– interpretation 233
– measures 233
– regression equation model 254
– thermodynamic property 253
– using information theoretical indices 254–256
positive integer partition, IC measures
– cardinalities 240
– Shannon's IC 240–243
– Shannon's information 240
power law distribution 185
probability density 107

q
quantitative graph theory
– abruptness 204
– description 204
– discrimination power 204, 205
– graph index values 204
– inexact graph matching 204
– statistical methods 204
– structure sensitivity 204
quasi-periodicity 22, 35

r
Randić index 176–178
random graph model
– biological networks 185
– description 184
– spectrum-based concepts 187
random Poisson process 8
random networks 3
random variable
– conditional entropy 103
– joint entropy 103
– joint probability distribution 104
– probability density function 102
recurrence matrix 3
redundant information content (RIC) 234, 238, 239
relative entropy 67–68
– and mutual information 104
relative non-structural information content (RNSIC) 238
renormalization group (RG) method
– entropy extrema and transformation 34–36
– network RG flow structure 28
– period doubling accumulation point 31–32
– quantum field theory 26
– quasi-periodicity 32–34
– self-affine structures 26
– tangent bifurcation 29–31
Rényi entropy 42, 137, 185
rooted isomorphism 136
rooted networks 137


s
Scientometrics See bibliometric datasets
seismicity 8
Shannon entropy 11, 42–43, 66, 68, 133, 235
short citation half-life 260
shortest molecular path, graph 244
Silverman's criterion 190
simple random walk 137
statistical methods, graphs
– differential entropy 184
– graph theory 183
structural information content (SIC) 234, 237
symmetric graphs 119–120

t
tangent bifurcation 29–31
text mining
– dependency graphs 262
– graph-theoretical approaches 262
text mining See also data mining
texture segmentation, images
– complementarity, texture and shape 206
– adaptive and non-adaptive patches 217
– GAC method 204, 219
– graph models 205
– graph-entropy-based texture descriptors 218
– morphological amoeba image filtering 217
– precision, segmentation 218
– quantitative measures 204, 205
– real-world images, zebra 220, 221
– synthetic images 207, 217–219
– texture models 207–209
TopoChemical Molecular Descriptor (TCMD) 244
TopoPhysical Molecular Descriptor (TPMD) 244
total information content (TIC) 234, 237, 239, 242
transfer entropy 66, 69–70
tree entropy 136, 137

u
unified medical language system (UMLS) 262
unit corner 107

v
vertex packing polytope 107
vertex weighted molecular graph 244, 251
visibility algorithm 4–9, 36
visibility graph 8
– EEGs 10
– entropy 10–13
– horizontal visibility graphs 26–36

w
Watts–Strogatz models 183, 185, 186, 191
Wiener index 175

