Network Science


English Pages 479 Year 2016


X ALBERT-LÁSZLÓ BARABÁSI

NETWORK SCIENCE PREFACE

ACKNOWLEDGEMENTS

NICOLE SAMAY

SARAH MORRISON

INDEX

Introduction How to Use This Book

1

Acknowledgements

2

Homework Bibliography

Figure x.0

Networks of Dispossession

In 2013 a popular protest movement shook Turkey, prompting thousands of activists and protesters to encamp at Gezi Park. The protests were accompanied by online campaigns, using Twitter or the WWW to mobilize supporters. A central component of this campaign was Networks of Dispossession, generated by a coalition of artists, lawyers, activists and journalists that mapped the complex financial relationships behind Istanbul's political and business elite. First exhibited at the Istanbul Biennial in 2013, the map reproduced here shows “dispossession” projects as black circles. The size of each circle represents the monetary value of the project. Corporations and media outlets, shown in blue, are directly linked to their projects. Work-related crimes are noted in red and supporters of Turkey's Olympic bid are shown in purple, while the sponsors of the Istanbul Biennial are in turquoise. The map was developed by Yaşar Adanalı, Burak Arıkan, Özgül Şen, Zeyno Üstün, Özlem Zıngıl and anonymous participants using the Graph Commons (http://graphcommons.com/).

This book is licensed under a Creative Commons: CC BY-NC-SA 2.0. PDF V26, 05.09.2014

SECTION x.1

HOW TO USE THIS BOOK

The perspective offered by networks is indispensable for those who wish to understand today’s interlinked world. This textbook is the best avenue I found to share this perspective, offering anyone the opportunity to become a bit of a network scientist. Many of the choices I made in selecting the topics and in presenting the material were guided by the desire to offer a quantitative yet easy-to-follow introduction to the field. At the same time I tried to pass on the many insights networks offer about the complex systems that surround us. To resolve these often conflicting desires, I paired the technical advances with historical notes and boxes that illustrate the roots and the applications of the key discoveries.

This preface has two purposes. On the one hand, by describing the class that motivated this text, it offers some practical tips on how to best use the textbook. Equally important, it acknowledges the long list of individuals who helped move this textbook forward.

ONLINE COMPENDIUM

Network science is rich in content and knowledge that is best appreciated online. Therefore throughout the chapters we encounter numerous ONLINE RESOURCES that point to pertinent online material: videos, software, interactive tools, datasets and data sources. These resources are available on the http://barabasi.com/NetworkScienceBook website. The website also contains the PowerPoint slides that I used to teach network science, mirroring the content of this textbook. Anyone teaching networks should feel free to use these slides and modify them as they see fit to offer the best classroom experience. There is no need to ask the author for permission to use these slides in educational settings.

Given the empirical roots of network science, the book has a strong emphasis on the analysis of real networks. We have therefore assembled ten network maps that are frequently used in the literature to test various network characteristics. They were chosen to represent the diversity of the networks explored in network science, describing social, biological, technological and informational systems. The Online Compendium offers access to these datasets, which are used throughout the book to illustrate the tools of network science.

Finally, for those teaching the book in different languages, the website also mirrors the ongoing translation projects.

Online Resource x.1

barabasi.com/NetworkScienceBook

The website offers online access to the textbook, the videos, software and interactive tools mentioned in the Online Resources in the chapters, the slides I use to teach network science and the datasets analyzed in the book.

WIKI ASSIGNMENT

1. Select a keyword related to network science and check that it is not already covered in Wikipedia. “Related” is defined widely: you can select a technical concept (degree distribution), a network-related concept (terrorist networks), an application of networks (networks in finance), you can write about a network scientist, or anything else that you can convincingly relate to networks.

2. You are not expected to generate original material. Instead you need to identify 2-5 sources on the subject (research papers, books, etc.) and write a succinct, self-contained encyclopedic style summary with references, graphs, tables, images, photos, as required to best cover the material. Observe Wikipedia’s copyright and notability guidelines.

3. Upload your page on Wikipedia and send us the link. You will need to sign up for an account in Wikipedia, as anonymous editors cannot add new pages. Make sure that the page is not deleted by the Wikipedia administrators, which happens when the concept is not well documented or referenced, or is not written in an encyclopedic style.

4. The grade will reflect how understandable, pertinent, self-contained and accurate the content of your page is.

Figure x.1

Wikipedia Assignment Guidelines

TEACHING NETWORK SCIENCE

I have taught network science in two different settings. The first is a full semester class that attracts graduate and advanced undergraduate students with physics, computer science and engineering backgrounds. The second is a three-week two-credit class for students with economics and social science backgrounds. The textbook builds on both teaching experiences: In the full semester class I cover the full text, integrating into the lectures the proofs and derivations contained in the Advanced Topics. In the shorter class I only cover the content of the main sections, omitting the Advanced Topics and the chapter on degree correlations. In both settings, key components of the class are the assignments and the research project described next.

Homework Problems

For the longer class we assign as homework a subset of the problems listed at the end of each chapter, testing the technical proficiency of the students with the material and their problem-solving ability. Two rounds of homework cover the material as we progress with the class.

Wiki Assignment

We ask each student to select a concept or a term related to network science and write a Wikipedia page on it (Figure x.1). What makes this assignment somewhat challenging is that the topic must not be already covered by Wikipedia, yet must be sufficiently notable to be covered. The Wiki assignment tests the students' ability to synthesize and distill material in an easy-to-understand encyclopedic style, potentially turning them into regular Wikipedia contributors. At the same time the assignment enriches Wikipedia with network science content, offering a service to the whole community. Those teaching network science in other languages should consider contributing to Wikipedia in their native language.

Social Network Analysis

As a warmup to network analysis, students are asked to analyze the social network of the class. This requires a bit of preparation and the help of a teaching assistant. In the very first class the instructor hands out the class list and asks everyone to check that they are on the list or add their name if they are missing. The teaching assistant takes the final list, and during the class prints an accurate class list for each student. At the end of the class each student is asked to mark everyone they knew before coming to the class. To help students match the faces with the names, each student is asked to briefly introduce themselves, which also offers a chance for the instructor to learn more about the students in the class. These lists are then compiled to generate a social network of the class, enriching the nodes with gender and the name of the program the students are engaged in. The anonymized version of this network is returned to the class halfway through the course, the assignment being to analyze its properties using the network science tools the students acquired up to that point. This allows them to explore a relatively small network that they are invested in and understand. The assignment offers a preparation for the more extensive network analysis they will perform for their final research project. This homework is assigned after the hands-on class on software, so that the students are already familiar with the online tools available for network analysis.

PRELIMINARY PROJECT

Present 5 slides in no more than five minutes:

• Introduce your network, discussing its nodes and links.
• Tell us how you will collect the data and estimate the size of the network (N, L). Make sure that N > 100.
• Tell us what questions you are planning to ask. We understand that they may change as you advance with your project and the class.
• Tell us why you care about your network.

Figure x.2

Preliminary Project Guidelines

Final Research Project

The final project is the most rewarding part of the class, offering the students the opportunity to combine and utilize all the knowledge they acquired. Students are asked to select a network of interest to them, map it out and analyze it. Some procedural details enrich this assignment:

(a) The project is carried out in pairs. If the class composition allows, the students are asked to form professionally heterogeneous pairs: undergraduate students are asked to pair up with graduate students, or students from different programs are asked to work together, like a physics student with a biology student. This forces the students to collaborate outside their expertise level and comfort zone, a common ingredient of interdisciplinary research. The instructor does not do the pairing; instead, students are encouraged to find their own partners.

(b) A few weeks into the course one class is devoted to preliminary project presentations. Each group is asked to offer a five minute presentation with no more than five slides, offering a preview of the dataset they selected (Figure x.2). Students are advised to collect their own data: simply downloading a dataset already prepared for network analysis is not acceptable. Indeed, one of the goals of the project is to experience the choices and compromises one must make in network mapping. Manual mapping is allowed, like looking up the ingredients of recipes in a cookbook or the interaction of characters in a novel or a historical text. Digital mapping is encouraged, like scraping data from a website or a database that is not explicitly organized as a network map, but the students must reinterpret and clean the data to make it amenable to network analysis. For example, one can systematically scrape data from Wikipedia to identify relationships between writers, scientists or concepts.

(c) It is important to always emphasize that the purpose of the final project is to test a student's ability to analyze a network. Consequently students must stay focused on exploring the network aspect of the data, and avoid being carried away by other tempting questions their dataset poses that would take them away from this goal.

(d) The course ends with the final project presentations. Depending on the size of the class, we devote one or two classes to this (Figure x.3).

The choice of the Wikipedia keywords, the partner selection for the research project, and the choice of the topic for the final project require repeated feedback from the instructor, making sure that all students are on track. To achieve this, the last ten minutes of each class are devoted to asking everyone: Have you chosen a network that you wish to analyze? What are your nodes and your links? Do you know how to get the data? Do you have a partner for your final project? What is your Wiki word? Did you check if it is already covered by Wikipedia? Did you collect literature pertaining to it? The answers range from "Not yet" to firm or vague ideas the students are entertaining. Providing public feedback about the appropriateness and the feasibility of their plans helps those who are behind to crystallize their ideas, and to identify potential partners with common interests. In a few classes typically everyone finds a partner, identifies a research project and a Wikipedia keyword, at which point this end-of-class ritual ends.

Software

We devote one class to various network analysis and visualization software, like Gephi, Cytoscape, or NetworkX. In the longer class we devote another one to other numerical procedures, like fitting, log-binning or network visualization. We ask students to bring their laptops to these classes, so that they can try out these tools immediately.

Movie Night

We devote one night, typically outside the class time, to a movie night, where we screen the documentary Connected by Annamaria Talas. The one-hour documentary features many contributors to network science, and offers a compelling narrative of the field's importance. Movie Night is advertised university wide, offering a chance to reach out to a wider community.

Guest Speakers

In the full semester class we invite researchers from the area to give research seminars about their work pertaining to networks. This offers the students a sense of what cutting edge research looks like in this area. This is typically (but not always) done towards the end of the class, by which point most theoretical tools are covered and the students are focusing on their final project. Such talks, advertised and open to the local research community, often inspire additional perspectives and ideas for the final project.

To aid the planning of the class, Figure x.4 offers the schedule of the full semester class I co-taught before this book went to print.

FINAL PROJECT

Each group has 10 minutes to present their final project. The time limit is strictly enforced.

On the first slide, give your title, name and program.

Tell us about your data and the data collection method. Show an entry of the data source to offer a sense of where you started from.

Measure: N, L, and their time dependence if you have a time dependent network; degree distribution, average path length, clustering coefficient, C(k), the weight distribution P(w) if you have a weighted network. Visualize communities; discuss network robustness and spreading, degree correlations, whichever is appropriate for your project.

It is not sufficient to simply measure things; you need to discuss the insights you gained, always asking:

• What was your expectation?
• What is the proper random reference?
• How do the results compare to your expectation?
• What did you learn from each quantity?

Grading criteria:

• Use of network tools (completeness/correctness);
• Ability to extract information/insights from your data using the network tools;
• Overall quality of the project/presentation.

No need to write a report; email us the presentation as a PDF file.

Figure x.3

Final Project Guidelines
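The basic measurements requested in the Final Project Guidelines (N, L, degree distribution, clustering coefficient, average path length) can be illustrated on a toy network. The following is only a minimal sketch, not the procedure used in the class: it relies on the Python standard library alone, the function names and the four-node example network are hypothetical, and packages such as NetworkX offer ready-made equivalents for larger datasets.

```python
from collections import Counter, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Return p_k, the fraction of nodes that have degree k."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Each link between two neighbors is seen from both ends, hence the / 2.
    links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) / 2
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Average of the local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all connected node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy network: a triangle (a, b, c) plus a pendant node d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adj = build_adjacency(edges)
N = len(adj)                                      # number of nodes
L = sum(len(nbrs) for nbrs in adj.values()) // 2  # number of links

print("N =", N, "L =", L)
print("p_k =", degree_distribution(adj))
print("<C> =", average_clustering(adj))
print("<d> =", average_path_length(adj))
```

For a real project the same quantities would be compared against a suitable random reference, as the guidelines request; that step is not shown here.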

GRADE DISTRIBUTION

(1) Assignment 1 (Homework 1): 15%
(2) Assignment 2 (Homework 2): 15%
(3) Assignment 3 (Class Network): 15%
(4) Assignment 4 (Wikipedia): 15%
(5) Preliminary Project Presentation: No grading, only feedback.
(6) Final Project: 40%

Figure x.4

Grading
The grading system used in the one semester class.

COMPLEX NETWORKS: SYLLABUS

Week 1
• Class 1: Ch. 1: Introduction
• Class 2: Ch. 2: Graph Theory

Week 2
• Class 1: Ch. 3: Random Networks
• Class 2: Ch. 3: Random Networks

Week 3
• Class 1: Ch. 4: The Scale-Free Property
• Class 2: Ch. 4: The Scale-Free Property
  Hand out Assignment 1 (Problems for Chapters 1-5)

Week 4
• Class 1: Ch. 5: The Barabási-Albert model
• Class 2: Ch. 5: The Barabási-Albert model

Week 5
• Class 1: Preliminary Project Presentations
• Class 2: Hands-on Class: Graph representation, binning, fitting

Week 6
• Class 1: Hands-on Class: Gephi and Python
  Collect Assignment 1; Hand out Assignment 2: Class Network Analysis
• Class 2: Guest Speaker

Week 7
• Class 1: Ch. 6: Evolving Networks
• Class 2: Ch. 6: Evolving Networks

Week 8
• Class 1: Guest Speaker
  Collect Assignment 2
• Class 2: Ch. 7: Degree Correlations
  Hand out Assignment 3 (Problems for Chapters 6-10)

Week 9
• Class 1: Ch. 8: Network Robustness
  Hand out Assignment 4: Wikipedia Page
• Class 2: Ch. 8: Network Robustness

Week 10
• Class 1: Ch. 9: Communities
• Class 2: Ch. 9: Communities
  Movie Night: Connected, by Annamaria Talas

Week 11
• Class 1: Ch. 10: Spreading Phenomena
• Class 2: Ch. 10: Spreading Phenomena

Week 12
• Class 1: Guest Speaker
• Class 2: Ch. 10: Spreading Phenomena
  Collect Assignment 4

Week 13
• Class 1: Guest Speaker
• Class 2: Open-Door class (Research Project Discussions)
  Collect Assignment 3

Week 14
• Exam Week: Final Project Presentations (10 min per group)

Figure x.5

The Syllabus
The week-by-week schedule of the four-credit network science class, which meets twice a week.
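The hands-on class on binning listed in the syllabus, and the log-binning mentioned among the numerical procedures taught in the longer class, can be illustrated with a short example. This is a hedged sketch under stated assumptions rather than the procedure taught in the class: the function name `log_bin_degrees` and the sample degree sequence are hypothetical, and the bins are fixed to powers of two, [1, 2), [2, 4), [4, 8), and so on. Each bin's count is divided by the number of nodes and by the bin width, so that wide bins at high degrees are not artificially inflated.

```python
from collections import Counter


def log_bin_degrees(degrees):
    """Bin positive integer degrees into [1,2), [2,4), [4,8), ...

    Returns a list of (k_min, k_max, density) tuples, where density is the
    fraction of nodes in the bin divided by the bin width.
    """
    positive = [k for k in degrees if k >= 1]  # degree-0 nodes cannot be log-binned
    total = len(positive)
    # For an integer k >= 1, bit_length() - 1 equals floor(log2(k)) exactly.
    counts = Counter(k.bit_length() - 1 for k in positive)
    bins = []
    for n in sorted(counts):
        k_min, k_max = 2 ** n, 2 ** (n + 1)
        width = k_max - k_min  # number of integer degrees covered by the bin
        bins.append((k_min, k_max, counts[n] / (total * width)))
    return bins


# Hypothetical degree sequence of a small network.
degrees = [1, 1, 1, 1, 2, 2, 3, 5]
for k_min, k_max, density in log_bin_degrees(degrees):
    print(f"[{k_min}, {k_max}): {density}")
```

Because every density is multiplied back by its bin width when summed, the binned values still add up to one, which is a quick sanity check on any binning routine.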


SECTION x.2

ACKNOWLEDGEMENTS

Writing a book, any book, is an exercise in lonely endurance. This project was no different, dominating all my free time between 2011 and 2015. It was mostly time spent alone, working in one of the many coffeehouses I frequent in Boston and Budapest, or wherever in the world the morning found me. Despite this the book is far from being a lonely achievement: During these four years a number of individuals have donated their time and expertise to help move forward the project, offering me the opportunity to discuss the subject with colleagues, friends and lab members. I also shared the chapters on the internet for everyone to use, receiving valuable feedback from many individuals. In this section I wish to acknowledge the professional network that stepped in to help at various stages of this long journey.

Figure x.6

The Math Team Márton Pósfai was responsible for the calculations, simulations and measurements in the textbook.

FORMULAS, GRAPHS, SIMULATIONS

A textbook must ensure that everything works as promised: that one can derive the key formulas, and that the measures described in the text, when applied to real data, work as the theory predicts. There is only one way to achieve this: One must check and repeat each calculation, measurement and simulation. This was a heroic job, most of it done by Márton Pósfai, who joined the project when he was a visiting student in my lab in Boston and stayed with it throughout his PhD work in Budapest, Hungary. He checked all derivations, helped re-derive key formulas where needed, performed all the simulations and measurements and prepared the book’s figures and tables. Many figures and tables amounted to small research projects, their outcome either forcing us to de-emphasize some quantities because they did not work as promised, or helping us appreciate and understand the importance of others. His deep understanding of the network science literature and his careful work offered many subtle insights that enriched the book. There is no way I could have achieved this depth and reliability were it not for Márton’s tireless dedication to the project.

THE DESIGN

The ambition to create a book that had a clear aesthetic and visual appeal was planted by Mauro Martino, a data visualization expert in my lab. He created the first face of the chapters and many visual elements designed by him stayed with us until the end. After Mauro moved on to lead a team of designers at IBM Research, Gabriele Musella took over the design. He standardized the color palette and designed the basic elements of the info-graphics appearing throughout the book, also redrawing most images. He worked with us until the fall of 2014, when he too had to return to London to take up his dream job. At that time the design was taken over by Nicole Samay, who tirelessly and gently retouched the whole book as we neared the finish line. The website for the book was designed by Kim Albrecht, who currently collaborates with Mauro to design the online experience that trails the book.

An important component of the visual design is the images included at the beginning of each chapter, illustrating the interplay between networks and art. In selecting these images I have benefited from advice and discussions with several artists and designers, academics and practicing artists alike. Many thanks to Isabel Meirelles and Dietmar Offenhuber from the Art and Design Department at Northeastern, Mathew Ritchie from Columbia University, and Meredith Tromble from the San Francisco Art Institute, for helping me navigate the boundaries of art, data and network science.

Figure x.5

The Design Team
Mauro Martino, Gabriele Musella and Nicole Samay have developed the look and feel of the chapters and the figures, offering the book an elegant and consistent style.

THE DAILY DRILL: TYPING, EDITING

I remain an old-fashioned writer, who writes with a pencil rather than a computer. I am lost, therefore, without editors and typists, who integrate my hand-written notes, corrections and recommendations into each chapter. Sabrina Rabello and Galen Wilkerson have helped get this project started. Yet, the bulk of editing fell on the shoulders of three individuals. Payam Parsinejad worked with me during the first year of the project. After he had to refocus on his research, Amal Al-Husseini, a former student from my network science class, joined us, and stayed until the very end. Equally defining was the help of Sarah Morrison, my former assistant, who joined the project after she moved to Lucca, Italy. Her timely and accurate editing was essential to finish the book.

Each chapter, before it was released on our webpage, has undergone a final check by Philipp Hoevel, who joined the project while visiting my lab, and continued to work with us even after he returned to Berlin to run his own lab. Philipp methodically reviewed everything, from the science to notations, becoming our first reader and final filter.

Brett Common has worked tirelessly to secure all the permissions for the visual materials used throughout the textbook. This was a major project on its own, whose magnitude and difficulty were hard to anticipate.

Figure x.6

The Editorial Team
Payam Parsinejad, Amal Al-Husseini and Sarah Morrison have worked daily on the book, editing and correcting it.

Figure x.7

Accuracy and Rights
Philipp Hoevel acted as our first reader and last editor. The rights were obtained and managed by Brett Common.

HOMEWORK

The homework problems at the end of each chapter were conceived and curated by Roberta Sinatra. As a research faculty member affiliated with my lab, Roberta co-taught the network science class with me in the fall of 2014, also helping to catch and correct many typos and misunderstandings that surfaced while teaching the material.

Figure x.8

Homework
Roberta Sinatra has conceived and compiled the homework after each chapter in the textbook.

SCIENCE INPUT

Throughout the project I have received comments, recommendations, advice, clarifications, and key materials from numerous scientists and students. It is impossible to recall them all, but I will try. Chaoming Song helped estimate the degree exponent of scale-free networks and helped me uncover the literature pertaining to cascading failures. The mathematician Endre Csóka helped clarify the subtle details of the Bollobás model. I have benefited from a great discussion with Raissa D’Souza on optimization models, with Ginestra Bianconi on the fitness model, and with Erzsébet Ravasz Reagan on the Ravasz algorithm. Alex Vespignani was a great resource on spreading processes and degree correlations. Marian Boguña snapped the picture of the Karate Club Trophy. Huawei Shen calculated the future citations of research papers. Gergely Palla and Tamás Vicsek helped me understand the CFinder algorithm and Martin Rosvall pointed us to some key material on the InfoMap algorithm.

Gergely Palla, Sune Lehmann and Santo Fortunato offered critical comments on the community detection chapter. Yong-Yeol Ahn helped me develop the early version of the material on spreading phenomena. Ramis Movassagh, Hiroki Sayama and Sid Redner have provided careful feedback on several chapters, and Kate Coronges has helped improve the clarity of the first four chapters.


PUBLISHING

Simon Capelin, my longtime editor at Cambridge University Press, encouraged this project even before I was ready to write it. He also had the patience to see the book to its completion, through many missed deadlines. Róisín Munnelly has helped move the book through production within Cambridge.

INSTITUTIONS

This book would not have been possible had several institutions not offered inspiring environments and a supporting infrastructure. First and foremost I need to thank the leadership of Northeastern University, from its President, Joseph Aoun, its Provost, Steve Director, my deans, Murray Gibson and Larry Finkelstein, and my department chair, Paul Champion, who were true champions of network science, turning it into a major cross-disciplinary topic within Northeastern. Their relentless support has led to the hiring of several superb faculty focusing on networks, spanning all domains of inquiry, from physics and mathematics to social, political, computer and health sciences, turning Northeastern into the leading institution in this area. They have also urged and supported the creation of a network science PhD program and helped found the Network Science Institute led by Alessandro Vespignani.

My appointment at Harvard Medical School, through the Network Medicine Division at Brigham and Women's Hospital and the Center for Cancer Systems Biology at Dana Farber Cancer Institute, offered a window on the applications of network science in cell biology and medicine. Many thanks to Marc Vidal from DFCI and Joe Loscalzo from Brigham, who, as colleagues and mentors, have defined my work in this area, an experience that found its way into this book as well.

My visiting appointment at Central European University, and the network science class I teach there in the summer, have exposed me to a student body with economics and social science backgrounds, an experience that has shaped this textbook. Balázs Vedres had the vision to bring network science to CEU, George Soros convinced me to get involved with the university and President John Shattuck and Provosts Farkas Katalin and Liviu Matei, with their relentless support, have smoothed the path toward CEU's superb program in this area, giving birth to CEU's PhD program in network science.

Finally, thanks to the place where it all began: When I was a young assistant professor, the University of Notre Dame offered me the support and the serene environment to think about something different. And big thanks to Suzanne Aleva, who followed my lab from Notre Dame to Northeastern, and worked tirelessly for over a decade to foster an environment where I can focus, uninterrupted, on science.


1 ALBERT-LÁSZLÓ BARABÁSI

NETWORK SCIENCE INTRODUCTION

ACKNOWLEDGEMENTS

MÁRTON PÓSFAI GABRIELE MUSELLA MAURO MARTINO ROBERTA SINATRA

PHILIPP HOEVEL SARAH MORRISON AMAL HUSSEINI

INDEX

1. Vulnerability Due to Interconnectivity
2. Networks at the Heart of Complex Systems
3. Two Forces Helped the Emergence of Network Science
4. The Characteristics of Network Science
5. Societal Impact
6. Scientific Impact
7. Summary
8. Homework
9. Bibliography

Figure 1.0 (front cover)

Mark Lombardi: Global International Airway and Indian Spring State Bank Mark Lombardi (1951 – 2000) was an American artist who documented “the uses and abuses of power.” His work was preceded by careful research, resulting in thousands of index cards, whose number began to overwhelm his ability to deal with them. Hence Lombardi began assembling them into hand-drawn diagrams, intended to focus his work. Eventually these diagrams became a form of art on their own [1]. The image shows one such drawing, created between 1977 and 1983 in colored pencil and graphite on paper.

This work is licensed under a Creative Commons: CC BY-NC-SA 2.0. PDF V26, 03.09.2014

SECTION 1.1

VULNERABILITY DUE TO INTERCONNECTIVITY

At first glance the two satellite images of Figure 1.1 are indistinguishable, showing lights shining brightly in highly populated areas and dark spaces that mark vast uninhabited forests and oceans. Yet, upon closer inspection we notice differences: Toronto, Detroit, Cleveland, Columbus and Long Island, bright and shining in (a), have gone dark in (b). This is not a doctored shot from the next Armageddon movie but represents a real image of the US Northeast on August 14, 2003, before and after the blackout that left an estimated 45 million people in eight US states and another 10 million in Ontario without power.

Figure 1.1

2003 North American Blackout
(a) Satellite image of the Northeast United States on August 13th, 2003, at 9:29pm (EDT), 20 hours before the 2003 blackout.
(b) The same as above, but 5 hours after the blackout.

The 2003 blackout is a typical example of a cascading failure. When a network acts as a transportation system, a local failure shifts loads to other nodes. If the extra load is negligible, the system can seamlessly absorb it, and the failure goes unnoticed. If, however, the extra load is too much for the neighboring nodes, they too will tip and redistribute the load to their neighbors. In no time, we are faced with a cascading event, whose magnitude depends on the position and the capacity of the nodes that failed initially.

Cascading failures have been observed in many complex systems. They take place on the Internet, when traffic is rerouted to bypass malfunctioning routers. This routine operation can occasionally create denial of service attacks, which make fully functional routers unavailable by overwhelming them with traffic. We witness cascading events in financial systems, like in 1997, when the International Monetary Fund pressured the central banks of several Pacific nations to limit their credit, which caused multiple corporations to default, eventually resulting in stock market crashes worldwide. The 2009-2011 financial meltdown is often seen as a classic example of a cascading failure, the US credit crisis paralyzing the economy of the globe, leaving behind scores of failed banks, corporations, and even bankrupt states.

Cascading failures can also be induced artificially. An example is the worldwide effort to dry up the money supply of terrorist organizations, aimed at crippling their ability to function. Similarly, cancer researchers aim to induce cascading failures in our cells to kill cancer cells.
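The load-redistribution mechanism behind a cascading failure can be made concrete with a toy simulation. The sketch below is an illustration under simplifying assumptions, not a model of the power grid and not taken from the text: the function name `simulate_cascade`, the four-node chain and the load and capacity values are all hypothetical, and the rule that a failed node splits its load equally among its working neighbors is an assumption made for simplicity.

```python
from collections import deque


def simulate_cascade(adj, load, capacity, initial_failure):
    """Return the set of nodes that fail after one node is knocked out.

    Toy rule (an assumption): a failed node's load is split equally among
    its still-working neighbors, and a neighbor fails as soon as its load
    exceeds its capacity.
    """
    load = dict(load)  # work on a copy so the caller's data is untouched
    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        node = queue.popleft()
        alive = [v for v in adj[node] if v not in failed]
        if not alive:
            continue  # no working neighbor left to take over the load
        share = load[node] / len(alive)
        load[node] = 0.0
        for v in alive:
            load[v] += share
            if load[v] > capacity[v]:
                failed.add(v)
                queue.append(v)
    return failed


# Hypothetical chain of four nodes: a - b - c - d, each carrying one unit of load.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
load = {node: 1.0 for node in adj}

tight = {node: 1.5 for node in adj}  # little spare capacity
roomy = {node: 2.5 for node in adj}  # enough spare capacity to absorb one failure

print(sorted(simulate_cascade(adj, load, tight, "a")))
print(sorted(simulate_cascade(adj, load, roomy, "a")))
```

With little spare capacity the failure of node a propagates down the whole chain, while with more headroom the extra load is absorbed by node b and the failure stays local, mirroring the two outcomes described above.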

The Northeast blackout illustrates several important themes of this book: First, to avoid damaging cascades, we must understand the structure of the network on which the cascade propagates. Second, we must be able to model the dynamical processes taking place on these networks, like the flow of electricity. Finally, we need to uncover how the interplay between the network structure and dynamics affects the robustness of the whole system. Although cascading failures may appear random and unpredictable, they follow reproducible laws that can be quantified and even predicted using the tools of network science.

The blackout also illustrates a bigger theme: vulnerability due to interconnectivity. Indeed, in the early years of electric power each city had its own generators and electric network. Electricity cannot be stored, however: Once produced, electricity must be immediately consumed. It made economic sense, therefore, to link neighboring cities up, allowing them to share the extra production and borrow electricity if needed. We owe the low price of electricity today to the power grid, the network that emerged through these pairwise connections, linking all producers and consumers into a single network. It allows cheaply produced power to be instantly transported anywhere. Electricity hence offers a wonderful example of the huge positive impact networks have on our lives.

Being part of a network has its catch, however: local failures, like the breaking of a fuse somewhere in Ohio, may not stay local any longer. Their impact can travel along the network’s links and affect other nodes, consumers and individuals apparently removed from the original problem. In general, interconnectivity induces a remarkable non-locality: It allows information, memes, business practices, power, energy, and viruses to spread on their respective social or technological networks, reaching us no matter our distance from the source. Hence networks carry both benefits and vulnerabilities. Uncovering the factors that can enhance the spread of traits deemed positive, and limit others that make networks weak or vulnerable, is one of the goals of this book.


SECTION 1.2

NETWORKS AT THE HEART OF COMPLEX SYSTEMS

“I think the next century will be the century of complexity.”
Stephen Hawking

We are surrounded by systems that are hopelessly complicated. Consider for example the society that requires cooperation between billions of individuals, or communications infrastructures that integrate billions of cell phones with computers and satellites. Our ability to reason and comprehend our world requires the coherent activity of billions of neurons in our brain. Our biological existence is rooted in seamless interactions between thousands of genes and metabolites within our cells.

These systems are collectively called complex systems, capturing the fact that it is difficult to derive their collective behavior from a knowledge of the system’s components. Given the important role complex systems play in our daily life, in science and in economy, their understanding, mathematical description, prediction, and eventually control is one of the major intellectual and scientific challenges of the 21st century.

BOX 1.1
COMPLEX
[adj., v. kuhm-pleks, kom-pleks; n. kom-pleks]
1) composed of many interconnected parts; compound; composite: a complex highway system
2) characterized by a very complicated or involved arrangement of parts, units, etc.: complex machinery
3) so complicated or intricate as to be hard to understand or deal with: a complex problem
Source: Dictionary.com

The emergence of network science at the dawn of the 21st century is a vivid demonstration that science can live up to this challenge. Indeed, behind each complex system there is an intricate network that encodes the interactions between the system’s components:

(a) The network encoding the interactions between genes, proteins, and metabolites integrates these components into live cells. The very existence of this cellular network is a prerequisite of life.

(b) The wiring diagram capturing the connections between neurons, called the neural network, holds the key to our understanding of how the brain functions and to our consciousness.


(c) The sum of all professional, friendship, and family ties, often called the social network, is the fabric of the society and determines the spread of knowledge, behavior and resources.

(d) Communication networks, describing which communication devices interact with each other, through wired internet connections or wireless links, are at the heart of the modern communication system.

(e) The power grid, a network of generators and transmission lines, supplies virtually all modern technology with energy.

(f) Trade networks maintain our ability to exchange goods and services, being responsible for the material prosperity that the world has enjoyed since WWII (Figure 1.2).

Figure 1.2
Subtle Networks Behind the Economy
A credit card selected as the 99th object in The History of the World in 100 Objects exhibit by the British Museum. This card is a vivid demonstration of the highly interconnected nature of the modern economy, relying on subtle economic and social connections that normally go unnoticed. The card was issued in the United Arab Emirates in 2009 by the Hong Kong and Shanghai Banking Corporation, known as HSBC, a London-based bank. The card functions through protocols provided by VISA, a USA-based credit association. Yet, the card adheres to Islamic banking principles, which operate in accordance with Fiqh al-Muamalat (Islamic rules of transactions), most notably eliminating interest or riba. The card is not limited to Muslims in the United Arab Emirates, but is offered in non-Muslim countries as well, to anyone who agrees with its strict ethical guidelines.

Networks are also at the heart of some of the most revolutionary technologies of the 21st century, empowering everything from Google to Facebook, CISCO, and Twitter. In the end, networks permeate science, technology, business and nature to a much higher degree than may be evident upon a casual inspection. Consequently, we will never understand complex systems unless we develop a deep understanding of the networks behind them.

The exploding interest in network science during the first decade of the 21st century is rooted in the discovery that despite the obvious diversity of complex systems, the structure and the evolution of the networks behind each system are driven by a common set of fundamental laws and principles. Therefore, notwithstanding the amazing differences in form, size, nature, age, and scope of real networks, most networks are driven by common organizing principles. Once we disregard the nature of the components and the precise nature of the interactions between them, the obtained networks are more similar to than different from each other. In the following sections we discuss the forces that have led to the emergence of this new research field and its impact on science, technology, and society.


SECTION 1.3

TWO FORCES THAT HELPED NETWORK SCIENCE

Network science is a new discipline. One may debate its precise beginning, but by all accounts the field has emerged as a separate discipline only in the 21st century.

Why didn’t we have network science two hundred years earlier? After all many of the networks that the field explores are by no means new: metabolic networks date back to the origins of life, with a history of four billion years, and the social network is as old as humanity. Furthermore, many disciplines, from biochemistry to sociology and brain science, have been dealing with their own networks for decades. Graph theory, a prolific subfield of mathematics, has explored graphs since 1735. Is there a reason, therefore, to call network science the science of the 21st century?

Something special happened at the dawn of the 21st century that transcended individual research fields and catalyzed the emergence of a new discipline (Figure 1.3). To understand why this happened now and not two hundred years earlier, we need to discuss the two forces that have contributed to the emergence of network science.

Figure 1.3
The Emergence of Network Science
[Plot: yearly citations (0 to 600) versus year (1960 to 2008) for two papers, Erdős-Rényi (1959) and Granovetter (1973).]
While the study of networks has a long history, with roots in graph theory and sociology, the modern chapter of network science emerged only during the first decade of the 21st century. The explosive interest in networks is well documented by the citation pattern of two classic papers, the 1959 paper by Paul Erdős and Alfréd Rényi that marks the beginning of the study of random networks in graph theory [2] and the 1973 paper by Mark Granovetter, the most cited social network paper [3]. The figure shows the yearly citations each paper acquired since its publication. Both papers were highly regarded within their discipline, but had only limited impact outside their field. The explosive growth of citations to these papers in the 21st century is a consequence of the emergence of network science, drawing a new, interdisciplinary attention to these classic publications.

THE EMERGENCE OF NETWORK MAPS

To describe the detailed behavior of a system consisting of hundreds to billions of interacting components, we need a map of the system’s wiring diagram. In a social system this would require an accurate list of your friends, your friends’ friends, and so on. In the WWW this map tells us which webpages link to each other. In the cell the map corresponds to a detailed list of binding interactions and chemical reactions involving genes, proteins, and metabolites. In the past, we lacked the tools to map these networks. It was equally difficult to keep track of the huge amount of data behind them. The Internet revolution, offering effective and fast data sharing methods and cheap digital storage, fundamentally changed our ability to collect, assemble, share, and analyze data pertaining to real networks.


Thanks to these technological advances, at the turn of the millennium we witnessed an explosion of map making (BOX 1.2). Examples range from the CAIDA and DIMES projects, which offered the first large-scale maps of the Internet; to the hundreds of millions of dollars spent by biologists to experimentally map out protein-protein interactions in human cells; to the efforts made by social network companies, like Facebook, Twitter, or LinkedIn, to develop accurate repositories of our friendships and professional ties; and to the Connectome project of the US National Institutes of Health, which aims to systematically trace the neural connections in mammalian brains. The sudden availability of these maps at the end of the 20th century has catalyzed the emergence of network science.

THE UNIVERSALITY OF NETWORK CHARACTERISTICS

It is easy to list the differences between the various networks we encounter in nature or society: the nodes of the metabolic network are tiny molecules and the links are chemical reactions governed by the laws of chemistry and quantum mechanics; the nodes of the WWW are web documents and the links are URLs guaranteed by computer algorithms; the nodes of the social network are individuals and the links represent family, professional, friendship, and acquaintance ties. The processes that generated these networks also differ greatly: metabolic networks were shaped by billions of years of evolution; the WWW is built by the collective actions of millions of individuals and organizations; social networks are shaped by social norms whose roots go back thousands of years. Given this diversity in size, nature, scope, history, and evolution, one would not be surprised if the networks behind these systems differed greatly.

A key discovery of network science is that the architecture of networks emerging in various domains of science, nature, and technology is similar across domains, a consequence of being governed by the same organizing principles. Consequently we can use a common set of mathematical tools to explore these systems. This universality is one of the guiding principles of this book: we will not only seek to uncover specific network properties, but each time we will ask how widely they apply. We will also aim to understand their origins, uncovering the laws that shape network evolution and their consequences for network behavior.

In summary, while many disciplines have made important contributions to network science, the emergence of a new field was partly made possible by data availability, offering accurate maps of networks encountered in different disciplines. These diverse maps allowed network scientists to identify the universal properties of various network characteristics. This universality offers the foundation of the new discipline of network science.


BOX 1.2
THE ORIGINS OF NETWORK MAPS

Only a few of the maps studied today by network scientists were generated with the purpose of studying networks. Most are the byproduct of other projects and morphed into maps only in the hands of network scientists.

(a) The chemical reactions in a cell were discovered one-by-one over a 150 year period by biochemists. In the 1990s they were collected in central databases, offering the first chance to assemble the biochemical networks within a cell.

(b) The lists of actors that play in each movie were traditionally scattered in newspapers, books and encyclopedias. With the advent of the Internet, these data were assembled into central databases, like imdb.com, feeding the curiosity of movie aficionados. The database allowed network scientists to reconstruct the affiliation network behind Hollywood.

(c) The lists of authors of millions of research papers were traditionally scattered in the tables of contents of thousands of journals. Recently Web of Science, Google Scholar, and other services have assembled them into comprehensive databases, allowing network scientists to reconstruct accurate maps of scientific collaboration networks.

Much of the early history of network science relied on the investigators’ ingenuity to recognize and extract networks from preexisting databases. Network science changed that: Today well-funded research collaborations focus on map making, capturing accurate wiring diagrams of biological, communication and social systems.
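The reconstruction described in (b) and (c) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used by any particular study: it assumes the database has already been reduced to a mapping from each work (a movie or a paper) to its participants, and it links two participants whenever they share a work. The function name `co_appearance_network` and the placeholder records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_appearance_network(records):
    """Link every pair of participants that appear together in a record.

    records: dict mapping a work identifier to a list of participants
             (a hypothetical input format, assumed for illustration).
    Returns a dict mapping a sorted pair (a, b) to the number of shared works.
    """
    weights = defaultdict(int)
    for participants in records.values():
        # Deduplicate and sort so that each pair has one canonical key.
        for a, b in combinations(sorted(set(participants)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Placeholder records; the identifiers are made up.
records = {
    "work_1": ["A", "B", "C"],
    "work_2": ["B", "C"],
    "work_3": ["C", "D"],
}
links = co_appearance_network(records)
for pair, shared in sorted(links.items()):
    print(pair, shared)
```

The link weight counts shared works, so repeated collaborations remain visible instead of being collapsed into a single tie.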


SECTION 1.4

THE CHARACTERISTICS OF NETWORK SCIENCE

Network science is defined not only by its subject matter, but also by its methodology. In this section we discuss the key characteristics of the approach network science adopted to understand complex systems.

INTERDISCIPLINARY NATURE

Network science offers a language through which different disciplines can seamlessly interact with each other. Indeed, cell biologists, brain scientists (Figure 1.4) and computer scientists alike are faced with the task of characterizing the wiring diagram behind their system, extracting information from incomplete and noisy datasets, and understanding their systems’ robustness to failures or attacks.

To be sure, each discipline brings a different set of goals, technical details and challenges, which are important on their own. Yet, the common nature of many issues these fields struggle with has led to a cross-disciplinary fertilization of tools and ideas. For example, the concept of betweenness centrality, which emerged in the social network literature in the 1970s, today plays a key role in identifying high traffic nodes on the Internet. Similarly, algorithms developed by computer scientists for graph partitioning have found novel applications in identifying disease modules in medicine or detecting communities within large social networks.

Figure 1.4
Mapping the Brain
An exploding application area for network science is brain research. The wiring diagram of a complete nervous system has long been available for C. elegans, a small roundworm, but neuronal connectivity data for larger animals have been missing until recently. That is changing thanks to major efforts by the scientific community to develop technologies that can map out the brain’s wiring diagram. The image shows the cover of the April 10, 2014 issue of Nature, reporting an extensive map of the laboratory mouse brain [4] generated by researchers at the Allen Institute in Seattle.
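To make the betweenness centrality mentioned above concrete, the sketch below counts, for each node, how many shortest paths between other node pairs pass through it. It is a minimal illustration, assuming an unweighted, undirected network stored as an adjacency dictionary; it follows the accumulation scheme commonly attributed to Brandes, and the function name and toy network are hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected network.

    adj: dict mapping each node to a list of its neighbors (assumed symmetric).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # w is reached for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


# Toy network: two triangles (a, b, c) and (d, e, f) joined by the link c-d.
toy = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = betweenness_centrality(toy)
print(scores)
```

In the toy network the two bridge nodes, c and d, carry all traffic between the triangles and therefore receive the highest scores, which is the sense in which the measure flags high traffic nodes.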

EMPIRICAL, DATA DRIVEN NATURE

Several key concepts of network science have their roots in graph theory, a fertile field of mathematics. What distinguishes network science from graph theory is its empirical nature, i.e. its focus on data, function and utility. As we will see in the coming chapters, in network science we are never satisfied with developing abstract mathematical tools to describe a certain network property. Each tool we develop is tested on real data and its value is judged by the insights it offers about a system’s properties and behavior.

QUANTITATIVE AND MATHEMATICAL NATURE

To contribute to the development of network science and to properly use its tools, it is essential to master the mathematical formalism behind it. Network science borrowed the formalism to deal with graphs from graph theory and the conceptual framework to deal with randomness and seek universal organizing principles from statistical physics. Lately, the field is benefiting from concepts borrowed from engineering, like control and information theory, allowing us to understand the control principles of networks, and from statistics, helping us extract information from incomplete and noisy datasets.

The development of network analysis software has made the tools of network science available to a wider community, even those who may not be familiar with the intellectual foundations and the full mathematical depths of the discipline. Yet, to further the field and to efficiently use its tools, we need to master its theoretical formalism.

COMPUTATIONAL NATURE

Given the size of many of the networks of practical interest, and the exceptional amount of auxiliary data behind them, network scientists are regularly confronted by a series of formidable computational challenges. Hence, the field has a strong computational character, actively borrowing from algorithms, database management and data mining. A series of software tools are available to address these computational problems, enabling practitioners with diverse computational skills to analyze the networks of interest to them.

In summary, a mastery of network science requires familiarity with each of these aspects of the field. It is their combination that offers the multi-faceted tools and perspectives necessary to understand the properties of real networks.


SECTION 1.5

SOCIETAL IMPACT

The impact of a new research field is measured both by its intellectual achievements and by its societal impact, indicated by the reach and the potential of its applications. While network science is a young field, its impact is everywhere.

ECONOMIC IMPACT: FROM WEB SEARCH TO SOCIAL NETWORKING

The most successful companies of the 21st century, from Google to Facebook, Twitter, LinkedIn, Cisco, Apple and Akamai, base their technology and business model on networks. Indeed, Google not only runs the biggest network mapping operation that humanity has ever built, generating a comprehensive and constantly updated map of the WWW, but its search technology is deeply interlinked with the network characteristics of the Web.

Networks have gained particular popularity with the emergence of Facebook, the company with the ambition to map out the social network of the whole planet. Facebook was not the first social networking site and it is likely not the last either: An impressive ecosystem of social networking tools, from Twitter to LinkedIn, is fighting for the attention of millions of users. Algorithms conceived by network scientists fuel these sites, aiding everything from friend recommendation to advertising.

HEALTH: FROM DRUG DESIGN TO METABOLIC ENGINEERING

Completed in 2001, the human genome project offered the first comprehensive list of all human genes [5, 6]. Yet, to fully understand how our cells function, and the origin of disease, a full list of genes is not sufficient: We also need an accurate map of how genes, proteins, metabolites and other cellular components interact with each other. Indeed, most cellular processes, from food processing to sensing changes in the environment, rely on molecular networks. The breakdown of these networks is responsible for human diseases.

The increasing awareness of the importance of molecular networks has led to the emergence of network biology, a new subfield of biology that aims to understand the behavior of cellular networks. A parallel movement within medicine, called network medicine, aims to uncover the role of networks in human disease (Figure 1.5). The importance of these advances is illustrated by the fact that Harvard University in 2012 started the Division of Network Medicine, which employs researchers and medical doctors who apply network-based ideas towards understanding human disease.

Networks play a particularly important role in drug development. The ultimate goal of network pharmacology [7] is to develop drugs that can cure diseases without significant side effects. This goal is pursued at many levels, from millions of dollars invested to map out cellular networks, to the development of tools and databases to store, curate, and analyze patient and genetic data.

Several new companies take advantage of the opportunities offered by networks for health and medicine. For example, GeneGo collects maps of cellular interactions from the scientific literature and Genomatica uses the predictive power behind metabolic networks to identify drug targets in bacteria and humans. Recently major pharmaceutical companies, like Johnson & Johnson, have made significant investments in network medicine, seeing it as the path towards future drugs.

Figure 1.5
Network Biology and Medicine
The covers of two issues of Nature Reviews Genetics, the leading review journal in genetics. The journal has devoted exceptional attention to the impact of networks: the 2004 cover focuses on network biology [8] (top), the 2011 cover discusses network medicine [9] (bottom).

SECURITY: FIGHTING TERRORISM

Terrorism is a malady of the 21st century, requiring significant resources to combat it worldwide. Network thinking is increasingly present in the arsenal of various law enforcement agencies in charge of responding to terrorist activities. It is used to disrupt the financial network of terrorist organizations and to map adversarial networks, helping to uncover the role of their members and their capabilities. While much of the work in this area is classified, several well documented case studies have been made public. Examples include the use of social networks to find Saddam Hussein [10] or those responsible for the March 11, 2004 Madrid train bombings through the examination of the mobile call network.

Network concepts have impacted military doctrine as well, leading to the concept of network-centric warfare, aimed at fighting low intensity conflicts against terrorist and criminal networks that employ decentralized flexible network organization [11] (Figure 1.6). Given the numerous potential military applications, it is perhaps not surprising that one of the first academic programs in network science was started at West Point, the United States Military Academy. Furthermore, starting in 2009 the Army Research Lab devoted over $300 million to support network science centers across the US.

The knowledge and the capabilities offered by networks can also be abused. Such misuse was well illustrated by the indiscriminate network mapping operation of the National Security Agency [12]. Under the pretext of stopping future terrorist attacks, NSA monitored the communications of hundreds of millions of individuals, from the US and abroad, rebuilding their social networks. With that, network scientists have awoken to a new social responsibility: to ensure the ethical use of our tools and knowledge.

Figure 1.6
The Network Behind a Military Engagement
This diagram was designed during the Afghan war in 2012 to portray the American operational plans in Afghanistan. While it has been ridiculed in the press for displaying too much complexity and detail in one chart, it vividly illustrates the interconnected nature of a modern military engagement. Today this example is studied by officers and military students to demonstrate the power and utility of network models for decision-making and operational coordination. Indeed, the job of military generals is not limited to ensuring the necessary military capacities, but must also factor in the beliefs and the living conditions of the local population or the impact of the narcotics trade that finances the operations of the insurgents. Image from New York Times.


EPIDEMICS: FROM FORECASTING TO HALTING DEADLY VIRUSES

While the H1N1 pandemic was not as devastating as it was feared at the beginning of the outbreak in 2009, it gained a special role in the history of epidemics: It was the first pandemic whose course and time evolution were accurately predicted months before the pandemic reached its peak (Online Resource 1.1) [13]. This was possible thanks to fundamental advances in understanding the role of transportation networks in the spread of viruses.

Online Resource 1.1
Predicting the H1N1 Epidemic
The predicted spread of the H1N1 epidemic during 2009, representing the first successful real-time prediction of a pandemic [13]. The project, relying on data describing the structure and the dynamics of the worldwide transportation network, foresaw that H1N1 would peak in October 2009, in contrast with the expected January-February peak of influenza. This meant that the vaccines timed for November 2009 were too late, eventually having little impact on the outcome of the epidemic. The success of this project shows the power of network science in facilitating advances in areas of key importance for humanity. Video courtesy of Alessandro Vespignani.

Before 2000 epidemic modeling was dominated by compartment-based models, assuming that everyone can infect everyone else in the same socio-physical compartment. The emergence of a network-based framework has brought a fundamental change, offering a new level of predictability. Today epidemic prediction is one of the most active applications of network science [13, 14], being used to foresee the spread of influenza or to contain Ebola. It is also the source of several fundamental results covered in this book, allowing us to model and predict the spread of biological, digital and social viruses (memes).

The impact of these advances is felt beyond epidemiology. Indeed, in January 2010 network science tools predicted the conditions necessary for the emergence of viruses spreading through mobile phones [15]. The first major mobile epidemic outbreak, which started in the fall of 2010 in China and infected over 300,000 phones each day, closely followed the predicted scenario.
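The contrast drawn above between compartment-based and network-based models can be illustrated with a toy simulation. The sketch below is a minimal discrete-time susceptible-infected-recovered (SIR) process, not the model used in the H1N1 forecast: the parameter names `beta` (per-contact infection probability) and `mu` (recovery probability), the helper functions and the four-node networks are all assumptions made for illustration. Running the same process on a fully connected network mimics the "everyone can infect everyone" assumption, while a chain of contacts shows how the wiring diagram alone changes the course of the outbreak.

```python
import random


def sir_on_network(adj, seed_node, beta, mu, steps, rng):
    """Discrete-time SIR spreading on a contact network.

    adj: dict mapping each node to a list of its neighbors.
    Returns the list of (S, I, R) counts recorded at the start of each step.
    """
    state = {v: "S" for v in adj}
    state[seed_node] = "I"
    history = []
    for _ in range(steps):
        values = list(state.values())
        history.append((values.count("S"), values.count("I"), values.count("R")))
        new_state = dict(state)
        for v in adj:
            if state[v] != "I":
                continue
            # An infected node may infect each susceptible neighbor...
            for w in adj[v]:
                if state[w] == "S" and rng.random() < beta:
                    new_state[w] = "I"
            # ...and may then recover.
            if rng.random() < mu:
                new_state[v] = "R"
        state = new_state
    return history


def complete_graph(nodes):
    """Everyone is in contact with everyone (the compartment assumption)."""
    return {v: [w for w in nodes if w != v] for v in nodes}


def path_graph(nodes):
    """Each node is in contact only with its neighbors along a chain."""
    adj = {v: [] for v in nodes}
    for a, b in zip(nodes, nodes[1:]):
        adj[a].append(b)
        adj[b].append(a)
    return adj


nodes = [0, 1, 2, 3]
rng = random.Random(0)
# With beta = mu = 1 the process is deterministic, isolating the role of structure.
well_mixed = sir_on_network(complete_graph(nodes), 0, 1.0, 1.0, 5, rng)
on_chain = sir_on_network(path_graph(nodes), 0, 1.0, 1.0, 5, rng)
print(well_mixed)
print(on_chain)
```

With identical parameters, the fully connected case infects every node in a single step, whereas on the chain the infection advances one contact per step. Realistic forecasts use far richer data, but the dependence on network structure is already visible here.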

NEUROSCIENCE: MAPPING THE BRAIN

The human brain, consisting of on the order of a hundred billion interlinked neurons, is one of the least understood networks from the perspective of network science. The reason is simple: We lack maps telling us which neurons are linked together. The only fully mapped brain available for research is that of the C. elegans worm, consisting of only 302 neurons. Detailed maps of mammalian brains could lead to a revolution in brain science, allowing the understanding and curing of numerous neurological and brain diseases. With that, brain research could become one of the most prolific application areas of network science [16]. Driven by the potential transformative impact of such maps, in 2010 the National Institutes of Health in the U.S. initiated the Connectome project, aimed at developing technologies that could provide accurate neuron-level maps of mammalian brains (Figure 1.4).

MANAGEMENT: UNCOVERING THE INTERNAL STRUCTURE OF AN ORGANIZATION

While management tends to rely on the official chain of command, it is increasingly evident that the informal network, capturing who really communicates with whom, plays the most important role in the success of an organization. Accurate maps of such organizational networks can expose the potential lack of interactions between key units, help identify individuals who play an important role in bringing different departments and products together, and help higher management diagnose diverse organizational issues. Furthermore, there is increasing evidence in the management literature that the productivity of an employee is determined by his/her position in this informal organizational network [17].

Therefore, numerous companies, like Maven 7, Activate Networks or Orgnet, offer tools and methodologies to map out the true structure of an organization. These companies offer a host of services, from identifying opinion leaders to reducing employee churn, optimizing knowledge and product diffusion and designing teams with the diversity, size and expertise to be the most effective for specific tasks (Figure 1.7). Established firms, from IBM to SAP, have added social networking capabilities to their business. Overall, network science tools are indispensable in management and business, enhancing productivity and boosting innovation within an organization.


Figure 1.7
Mapping Organizations

(a) Employees of a Hungarian company with three main locations (purple, yellow and blue). The management realized that information reaching the workers about the intentions of the higher management often had nothing to do with their real plans. Seeking to enhance information flow within the company, they turned to Maven 7, a company that applies network science in organizational settings.

(b) Maven 7 developed an online platform to ask each employee to whom they turn for advice when it comes to decisions impacting the company. This platform provided the map shown in (b), where two individuals are connected if one nominated the other as his/her source of information on organizational and professional issues. The map identifies several highly influential individuals, appearing as large hubs.

(c) The position of the leadership within the company’s informal network, nodes being colored based on their rank within the company. Note that none of the directors, shown in red, are hubs. Nor are the top managers, shown in blue. The hubs come from lower ranks: they are managers, group leaders and associates. The biggest hub, hence the most influential individual, is an ordinary employee, appearing as a gray node in the center.

(d) The links of the largest hub (red) and those two links away from this hub (orange) demonstrate that a significant fraction of employees are at most two links from this hub. But who is this hub? He is the employee in charge of safety and environmental issues. Hence he regularly visits each location and talks with the employees. He is connected to everyone except the top management. With little knowledge of the true intentions of the management, he passes on information that he collects along his trail, effectively running a gossip center.

Should they fire or promote the biggest hub? What is the best solution to this problem?


SECTION 1.6

SCIENTIFIC IMPACT

Nowhere is the impact of network science more evident than in the scientific community. The most prominent scientific journals, from Nature to Science, Cell and PNAS, have devoted reviews and editorials to the impact of networks on various topics, from biology to the social sciences. For example, Science published a special issue on networks, marking the ten-year anniversary of the discovery of scale-free networks [18] (Figure 1.8).

Figure 1.8

Complex Systems and Networks
Special issue of Science magazine devoted to networks, published on July 24, 2009, on the 10th anniversary of the 1999 discovery of scale-free networks [18].

During the past decade about a dozen international conferences, workshops, and summer and winter schools each year have focused on network science. A highly successful network science conference series, called NetSci, has attracted the field’s practitioners since 2005. Several general-interest books have made bestseller lists in many countries, bringing network science to the general public. Most major universities offer network science courses, attracting a diverse student body, and in 2014 Northeastern University in Boston and the Central European University in Budapest launched PhD programs in network science.

To see the impact of networks on the scientific community it is useful to inspect the citation patterns of the most cited papers in the area of complex systems. Each of these papers is a citation classic, reporting discoveries like the butterfly effect, the renormalization group, spin glasses, fractals and neural networks, and cumulatively amassing anywhere between 2,000 and 5,000 citations. To see how the interest in network science compares to the impact of these foundational papers, in Figure 1.9 we compare their citation patterns to the citations of the two most cited network science papers: the 1998 paper on small-world phenomena [19] and the 1999 Science paper reporting the discovery of scale-free networks [18]. As one can see, the rapid rise of yearly citations to these two papers is without precedent in the area of complex systems.

Figure 1.9

Complexity and Network Science
The scientific impact of network science, as seen through citation patterns, compared to the citations of the most cited papers in complexity. The study of complex systems in the 60s and 70s was dominated by Edward Lorenz’s 1963 classic work on chaos [20], Kenneth G. Wilson’s renormalization group [21], and Samuel F. Edwards and Philip W. Anderson’s work on spin glasses [22]. In the 1980s the community shifted its focus to pattern formation, following Benoit Mandelbrot’s book on fractals [23] and Thomas Witten and Len Sander’s introduction of the diffusion-limited aggregation model [24]. Equally influential were John Hopfield’s paper on neural networks [25] and Per Bak, Chao Tang and Kurt Wiesenfeld’s work on self-organized criticality [26]. These papers continue to define our understanding of complex systems. The figure compares the yearly citations of these landmark papers with the citations of the two most cited papers in network science: the paper by Watts and Strogatz on small-world networks [19] and the paper by Barabási and Albert reporting the discovery of scale-free networks [18].
Legend: Chaos: Lorenz (1963); Spin Glasses: Edwards-Anderson (1975); Renormalization: Wilson (1975); Neural Networks: Hopfield (1982); Fractals: Mandelbrot (1982); Networks: Watts-Strogatz (1998); Networks: Barabási-Albert (1999). Horizontal axis: year (1960 to 2010); vertical axis: yearly citations (0 to 1000).

Several other metrics indicate that network science is having a defining impact on numerous disciplines. For example, in several research fields network papers became the most cited papers in their leading journals:

(a) The 1998 paper by Watts and Strogatz in Nature on small-world phenomena [19] and the 1999 paper by Barabási and Albert in Science on scale-free networks [18] were identified by Thomson Reuters as being among the top ten most cited papers in physical sciences during the decade after their publication. Currently (2011) the Watts-Strogatz paper is the second most cited of all papers published in Nature in 1998 and the Barabási-Albert paper is the most cited paper among all papers published in Science in 1999.

(b) Four years after its publication the SIAM Review paper by Mark Newman on network science became the most cited paper of any journal published by the Society for Industrial and Applied Mathematics [27].

(c) Reviews of Modern Physics, published since 1929, is the physics journal with the highest impact factor. Until 2012 the most cited paper of the journal was the classic 1943 review by Nobel Prize winner Subrahmanyan Chandrasekhar, entitled Stochastic Problems in Physics and Astronomy [28]. During the 70 years since its publication, the paper gathered over 5,000 citations. Yet, in 2012 it was overtaken by the first review of network science, published in 2001 and entitled Statistical Mechanics of Complex Networks [29].

(d) The paper reporting the discovery that in scale-free networks the epidemic threshold vanishes, by Pastor-Satorras and Vespignani [30], is the most cited paper among the papers published in 2001 by Physical Review Letters, a distinction it shares with a paper on quantum computing.

(e) The paper by Michelle Girvan and Mark Newman on community discovery in networks [31] is the most cited paper published in 2002 by Proceedings of the National Academy of Sciences.

(f) The 2004 review entitled Network Biology [8] is the second most cited paper in the history of Nature Reviews Genetics, the top review journal in genetics.

Prompted by this extraordinary enthusiasm within the scientific community, network science was examined by the National Research Council (NRC), the arm of the US National Academies in charge of offering policy recommendations to the US government. The NRC assembled two panels, resulting in recommendations summarized in two NRC reports [32, 33], defining the field of network science (Figure 1.10). These reports not only documented the emergence of a new research field, but also highlighted the field’s role for science, national competitiveness and security. Following these reports, the National Science Foundation (NSF) in the US established a network science directorate and several Network Science Centers were funded at US universities by the Army Research Labs.

Network science has excited the public as well. This was fueled by the success of several general audience books, like Linked, Nexus, Six Degrees and Connected (Figure 1.11). Connected, an award-winning documentary by Australian filmmaker Annamaria Talas, has brought the field to our TV screens, being broadcast all over the world and winning several prestigious prizes (Online Resource 1.2).

Networks have inspired artists as well, leading to a wide range of network-related art projects, and an annual symposium series that brings together artists and network scientists [38]. Fueled by successful movies like The Social Network or Six Degrees of Separation, and a series of science fiction novels and short stories exploiting the network paradigm, today networks are deeply ingrained in popular culture.

Figure 1.10

National Research Council
Two National Research Council reports on network science have documented the emergence of the new discipline and highlighted its long-term impact on research and national competitiveness [32, 33]. They have recommended dedicated support for the field, prompting the establishment of network science centers at US universities and a network science program within NSF.

Online Resource 1.2

Connected
The trailer of the award-winning documentary entitled Connected, directed by Annamaria Talas, offering an introduction to network science. It features the actor Kevin Bacon and several well-known network scientists.

Figure 1.11

Wide Impact
Four widely read books, translated into over twenty languages, have brought network science to the general public [34, 35, 36, 37].

SECTION 1.7

SUMMARY

Figure 1.12

The Rise of Networks
The frequency of use of the words evolution, quantum, and networks in books since 1880. The plot indicates the exploding societal awareness of networks in the last decades of the 20th century, laying the ground for the emergence of network science. The plots were generated by Google’s ngram platform, calculating the fraction of books published in a year that mention evolution, quantum or networks.
Legend: NETWORK, QUANTUM, EVOLUTION. Horizontal axis: year; vertical axis: usage frequency (%).

While the emergence of network science may appear to have been a rather sudden phenomenon (Figures 1.3 & 1.9), the field was responding to a wider social awareness of the role and importance of networks. This is illustrated in Figure 1.12, which shows the usage frequency of words that capture two important scientific revolutions of the past two centuries: evolution, the most common term referring to Darwin’s theory of evolution, and quantum, the most frequently used term when one refers to quantum mechanics. As expected, the use of evolution increases after the 1859 publication of Darwin’s On the Origin of Species. The word quantum, first used in 1902, remained virtually absent until the 1920s, when quantum mechanics gained acceptance among physicists and reached public consciousness. The figure compares these words with the usage of network, which enjoyed a spectacular increase following the 1980s, surpassing both evolution and quantum. While the term network has many uses (as do evolution and quantum), its dramatic rise captures the increasing societal awareness of networks.

There is something common between the advances facilitated by evolutionary theory, quantum mechanics and network science: They are not only important scientific fields with their own intellectual core and body of knowledge, but they are also enabling platforms. Indeed, the current revolution in genetics is built on evolutionary theory, and quantum mechanics offers a platform for a wide range of advances in contemporary science, from chemistry to electronics. In a similar fashion, network science is an enabling platform, offering novel tools and perspectives for a wide range of scientific problems, from social networking to drug design.

Given the exceptional impact networks have both in science and in society, we must master the tools to study and quantify them. The rest of this book is devoted to this worthy subject.

SECTION 1.8

HOMEWORK

1.1. Networks Everywhere
List three different real networks and state the nodes and links for each of them.

1.2. Your Interest
Tell us of the network you are personally most interested in. Address the following questions:
(a) What are its nodes and links?
(b) How large is it?
(c) Can it be mapped out?
(d) Why do you care about it?

1.3. Impact
In your view, what is the area where network science could have the biggest impact in the next decade? Explain your answer.

SECTION 1.9

BIBLIOGRAPHY

[1] J. Richards and R. Hobbs. Mark Lombardi: Global Networks. Independent Curators International, New York, 2003.
[2] P. Erdős and A. Rényi. On random graphs. Publicationes Mathematicae, 6: 290, 1959.
[3] M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78: 1360, 1973.
[4] S. W. Oh et al. A mesoscale connectome of the mouse brain. Nature, 508: 207-214, 2014.
[5] International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature, 409: 6822, 2001.
[6] J. C. Venter et al. The Sequence of the Human Genome. Science, 291: 1304, 2001.
[7] A. L. Hopkins. Network Pharmacology. Nature Biotechnology, 25: 1110-1111, 2007.
[8] Z. N. Oltvai and A.-L. Barabási. Network Biology: Understanding the cell’s functional organization. Nature Reviews Genetics, 5: 101, 2004.
[9] N. Gulbahce, A.-L. Barabási, and J. Loscalzo. Network medicine: A network-based approach to human disease. Nature Reviews Genetics, 12: 56, 2011.
[10] C. Wilson. Searching for Saddam: A five-part series on how the US military used social networking to capture the Iraqi dictator. 2010. www.slate.com/id/2245228/.
[11] J. Arquilla and D. Ronfeldt. Networks and Netwars: The Future of Terror, Crime, and Militancy. RAND: Santa Monica, CA, 2001.
[12] A.-L. Barabási. Scientists must spearhead ethical use of big data. Politico.com, September 30, 2013.
[13] D. Balcan, H. Hu, B. Goncalves, P. Bajardi, C. Poletto, J. J. Ramasco, D. Paolotti, N. Perra, M. Tizzoni, W. Van den Broeck, V. Colizza, and A. Vespignani. Seasonal transmission potential and activity peaks of the new influenza A(H1N1): a Monte Carlo likelihood analysis based on human mobility. BMC Medicine, 7: 45, 2009.
[14] L. Hufnagel, D. Brockmann, and T. Geisel. Forecast and control of epidemics in a globalized world. PNAS, 101: 15124, 2004.
[15] P. Wang, M. Gonzalez, C. A. Hidalgo, and A.-L. Barabási. Understanding the spreading patterns of mobile phone viruses. Science, 324: 1071, 2009.
[16] O. Sporns, G. Tononi, and R. Kötter. The Human Connectome: A Structural Description of the Human Brain. PLoS Computational Biology, 1: 4, 2005.
[17] L. Wu, B. N. Waber, S. Aral, E. Brynjolfsson, and A. Pentland. Mining Face-to-Face Interaction Networks using Sociometric Badges: Predicting Productivity in an IT Configuration Task. Proceedings of the International Conference on Information Systems, Paris, France, December 14-17, 2008.
[18] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286: 509, 1999.
[19] D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393: 440, 1998.
[20] E. N. Lorenz. Deterministic Nonperiodic Flow. Journal of the Atmospheric Sciences, 20: 130, 1963.
[21] K. G. Wilson. The renormalization group: Critical phenomena and the Kondo problem. Reviews of Modern Physics, 47: 773, 1975.
[22] S. F. Edwards and P. W. Anderson. Theory of Spin Glasses. Journal of Physics F, 5: 965, 1975.
[23] B. B. Mandelbrot. The Fractal Geometry of Nature. W.H. Freeman and Company, 1982.
[24] T. Witten, Jr. and L. M. Sander. Diffusion-Limited Aggregation, a Kinetic Critical Phenomenon. Physical Review Letters, 47: 1400, 1981.
[25] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. PNAS, 79: 2554, 1982.
[26] P. Bak, C. Tang, and K. Wiesenfeld. Self-organized criticality: an explanation of 1/f noise. Physical Review Letters, 59: 4, 1987.
[27] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45: 167, 2003.
[28] S. Chandrasekhar. Stochastic Problems in Physics and Astronomy. Reviews of Modern Physics, 15: 1, 1943.
[29] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74: 47, 2002.
[30] R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters, 86: 3200, 2001.
[31] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. PNAS, 99: 7821, 2002.
[32] National Research Council. Network Science. Washington, DC: The National Academies Press, 2005.
[33] National Research Council. Strategy for an Army Center for Network Science, Technology, and Experimentation. Washington, DC: The National Academies Press, 2007.
[34] A.-L. Barabási. Linked: The New Science of Networks. Perseus Books Group, 2002.
[35] M. Buchanan. Nexus: Small Worlds and the Groundbreaking Science of Networks. Norton, 2003.
[36] D. Watts. Six Degrees: The Science of a Connected Age. Norton, 2004.
[37] N. Christakis and J. Fowler. Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives. Back Bay Books, 2011.
[38] M. Schich, R. Malina, and I. Meirelles (Editors). Arts, Humanities, and Complex Networks [Kindle Edition], 2012.

2 ALBERT-LÁSZLÓ BARABÁSI

NETWORK SCIENCE GRAPH THEORY

ACKNOWLEDGEMENTS

MÁRTON PÓSFAI GABRIELE MUSELLA MAURO MARTINO ROBERTA SINATRA

PHILIPP HOEVEL SARAH MORRISON AMAL HUSSEINI

INDEX

1. The Bridges of Königsberg
2. Networks and Graphs
3. Degree, Average Degree and Degree Distribution
4. Adjacency Matrix
5. Real Networks are Sparse
6. Weighted Networks
7. Bipartite Networks
8. Paths and Distances
9. Connectedness
10. Clustering Coefficient
11. Summary
12. Homework
13. ADVANCED TOPIC 2.A Global Clustering Coefficient
14. Bibliography

Figure 2.0 (front cover)

Human Disease Network
The Human Disease Network, whose nodes are diseases, connected if they have a common genetic origin. Published as a supplement of the Proceedings of the National Academy of Sciences [1], the map was created to illustrate the genetic interconnectedness of apparently distinct diseases. With time it crossed disciplinary boundaries, taking on a life of its own. The New York Times created an interactive version of the map and the London-based Serpentine Gallery, one of the top contemporary art galleries in the world, has exhibited it as part of its focus on networks and maps [2]. It is also featured in numerous books on design and maps [3, 4, 5].

This work is licensed under a Creative Commons: CC BY-NC-SA 2.0. PDF V27, 05.09.2014

SECTION 2.1

THE BRIDGES OF KÖNIGSBERG

Few research fields can trace their birth to a single moment and place in history. Graph theory, the mathematical scaffold behind network science, can. Its roots go back to 1735 in Königsberg, the capital of Eastern Prussia, a thriving merchant city of its time. The trade supported by its busy fleet of ships allowed city officials to build seven bridges across the river Pregel that surrounded the town. Five of these connected to the mainland the elegant island Kneiphof, caught between the two branches of the Pregel. The remaining two crossed the two branches of the river (Figure 2.1). This peculiar arrangement gave birth to a contemporary puzzle: Can one walk across all seven bridges and never cross the same one twice? Despite many attempts, no one could find such a path. The problem remained unsolved until 1735, when Leonhard Euler, a Swiss-born mathematician, offered a rigorous mathematical proof that such a path does not exist [6, 7].

Euler represented each of the four land areas separated by the river with letters A, B, C, and D (Figure 2.1). Next he connected with lines each pair of land areas that had a bridge between them. He thus built a graph, whose nodes were pieces of land and whose links were the bridges. Then Euler made a simple observation: if there is a path crossing all bridges, but never the same bridge twice, then nodes with an odd number of links must be either the starting or the end point of this path. Indeed, if you arrive at a node with an odd number of links, you may find yourself having no unused link to leave it.

A walking path that goes through all bridges can have only one starting and one end point. Thus such a path cannot exist on a graph that has more than two nodes with an odd number of links. The Königsberg graph had four nodes with an odd number of links, A, B, C, and D, so no path could satisfy the problem.

Figure 2.1

The Bridges of Königsberg
(a) A contemporary map of Königsberg (now Kaliningrad, Russia) during Euler’s time.
(b) A schematic illustration of Königsberg’s four land pieces and the seven bridges across them.
(c) Euler constructed a graph that has four nodes (A, B, C, D), each corresponding to a patch of land, and seven links, each corresponding to a bridge. He then showed that there is no continuous path that would cross the seven bridges while never crossing the same bridge twice. The people of Königsberg gave up their fruitless search and in 1875 built a new bridge between B and C, increasing the number of links of these two nodes to four. Now only two nodes were left with an odd number of links. Consequently we should be able to find the desired path. Can you find one yourself?

Euler’s proof was the first time someone solved a mathematical problem using a graph. For us the proof has two important messages: The first is that some problems become simpler and more tractable if they are represented as a graph. The second is that the existence of the path does not depend on our ingenuity to find it. Rather, it is a property of the graph. Indeed, given the structure of the Königsberg graph, no matter how smart we are, we will never find the desired path. In other words, networks have properties encoded in their structure that limit or enhance their behavior.

To understand the many ways networks can affect the properties of a system, we need to become familiar with graph theory, a branch of mathematics that grew out of Euler’s proof. In this chapter we learn how to represent a network as a graph and introduce the elementary characteristics of networks, from degrees to degree distributions and from paths to distances, and learn to distinguish weighted, directed and bipartite networks. We will introduce a graph-theoretic formalism and language that will be used throughout this book.

Online Resource 2.1

The Bridges of Königsberg
Watch a short video introducing the Königsberg problem and Euler’s solution.
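Euler’s argument reduces to counting the nodes that have an odd number of links. The following minimal Python sketch applies that count to the Königsberg graph. The bridge list is an assumption made for illustration: it takes A to be the island Kneiphof with five bridges, which agrees with the odd link counts described above but is not spelled out in the text, and the function names are hypothetical. The check is only Euler’s necessary condition and assumes the graph is connected.

```python
from collections import Counter

def odd_degree_nodes(links):
    """Return the sorted nodes that have an odd number of links."""
    degree = Counter()
    for u, v in links:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, k in degree.items() if k % 2 == 1)

def euler_path_possible(links):
    """Euler's criterion for a connected graph: at most two odd-degree nodes."""
    return len(odd_degree_nodes(links)) <= 2

# Assumed bridge layout: A is the island, B and C the river banks, D the remaining land.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]

print(odd_degree_nodes(konigsberg))      # ['A', 'B', 'C', 'D']
print(euler_path_possible(konigsberg))   # False

# Adding the 1875 bridge between B and C makes the degrees of B and C even.
with_new_bridge = konigsberg + [("B", "C")]
print(odd_degree_nodes(with_new_bridge))     # ['A', 'D']
print(euler_path_possible(with_new_bridge))  # True
```

With the extra bridge the criterion is satisfied, so a path crossing every bridge exactly once can exist, and it must start at one odd-degree node and end at the other.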

SECTION 2.2

NETWORKS AND GRAPHS

If we want to understand a complex system, we first need to know how its components interact with each other. In other words we need a map of its wiring diagram. A network is a catalog of a system’s components, often called nodes or vertices, and the direct interactions between them, called links or edges (BOX 2.1). This network representation offers a common language to study systems that may differ greatly in nature, appearance, or scope. Indeed, as shown in Figure 2.2, three rather different systems have exactly the same network representation.

Figure 2.2 introduces two basic network parameters:

Number of nodes, or N, represents the number of components in the system. We will often call N the size of the network. To distinguish the nodes, we label them with i = 1, 2, ..., N.

Number of links, which we denote with L, represents the total number of interactions between the nodes. Links are rarely labeled, as they can be identified through the nodes they connect. For example, the (2, 4) link connects nodes 2 and 4.

The networks shown in Figure 2.2 have N = 4 and L = 4.

Figure 2.2

Different Networks, Same Graph
The figure shows a small subset of (a) the Internet, where routers (specialized computers) are connected to each other; (b) the Hollywood actor network, where two actors are connected if they played in the same movie; (c) a protein-protein interaction network, where two proteins are connected if there is experimental evidence that they can bind to each other in the cell. While the nature of the nodes and the links differs, these networks have the same graph representation, consisting of N = 4 nodes and L = 4 links, shown in (d).

The links of a network can be directed or undirected. Some systems have directed links, like the WWW, whose uniform resource locators (URL) point from one web document to the other, or phone calls, where one person calls the other. Other systems have undirected links, like romantic ties: if I date Janet, Janet also dates me, or like transmission lines on the power grid, on which the electric current can flow in both directions.

A network is called directed (or digraph) if all of its links are directed; it is called undirected if all of its links are undirected. Some networks simultaneously have directed and undirected links. For example in the metabolic network some reactions are reversible (i.e., bidirectional or undirected) and others are irreversible, taking place in only one direction (directed).
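As a minimal sketch of these definitions, the Python snippet below stores the graph of Figure 2.2(d) as a list of links and recovers N and L from it. The link list {(1, 2), (1, 3), (2, 3), (2, 4)} is the one given for this network in Section 2.4; the function and variable names are illustrative assumptions. A directed network could use the same structure by reading each pair as (source, target).

```python
def network_size(links):
    """Return (N, L): the number of distinct nodes and the number of links.

    Nodes are inferred from the link list, so isolated nodes (nodes without
    any link) would not be counted by this simple sketch.
    """
    nodes = set()
    for i, j in links:
        nodes.add(i)
        nodes.add(j)
    return len(nodes), len(links)

# Undirected links of the graph in Figure 2.2(d); (2, 4) connects nodes 2 and 4.
links = [(1, 2), (1, 3), (2, 3), (2, 4)]

N, L = network_size(links)
print(N, L)  # 4 4
```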

BOX 2.1

NETWORKS OR GRAPHS?
In the scientific literature the terms network and graph are used interchangeably:

Network Science | Graph Theory
Network | Graph
Node | Vertex
Link | Edge

Yet, there is a subtle distinction between the two terminologies: the {network, node, link} combination often refers to real systems: The WWW is a network of web documents linked by URLs; society is a network of individuals linked by family, friendship or professional ties; the metabolic network is the sum of all chemical reactions that take place in a cell. In contrast, we use the terms {graph, vertex, edge} when we discuss the mathematical representation of these networks: We talk about the web graph, the social graph (a term made popular by Facebook), or the metabolic graph. Yet, this distinction is rarely made, so these two terminologies are often synonyms of each other.

The choices we make when we represent a system as a network will determine our ability to use network science successfully to solve a particular problem. For example, the way we define the links between two individuals dictates the nature of the questions we can explore:

(a) By connecting individuals that regularly interact with each other in the context of their work, we obtain the organizational or professional network, which plays a key role in the success of a company or an institution, and is of major interest to organizational research (Figure 1.7).

(b) By linking friends to each other, we obtain the friendship network, which plays an important role in the spread of ideas, products and habits and is of major interest to sociology, marketing and health sciences.

(c) By connecting individuals that have an intimate relationship, we obtain the sexual network, of key importance for the spread of sexually transmitted diseases, like AIDS, and of major interest for epidemiology.

(d) By using phone and email records to connect individuals that call or email each other, we obtain the acquaintance network, capturing a mixture of professional, friendship or intimate links, of importance to communications and marketing.

While many links in these four networks overlap (some coworkers may be friends or may have an intimate relationship), these networks have different uses and purposes.

We can also build networks that may be valid from a graph theoretic perspective, but may have little practical utility. For example, if we link all individuals with the same first name, Johns with Johns and Marys with Marys, we do obtain a well-defined graph, whose properties can be analyzed with the tools of network science. Its utility is questionable, however. Hence in order to apply network theory to a system, careful considerations must precede our choice of nodes and links, ensuring their significance to the problem we wish to explore.

Throughout this book we will use ten networks to illustrate the tools of network science. These reference networks, listed in Table 2.1, span social systems (mobile call graph or email network), collaboration and affiliation networks (science collaboration network, Hollywood actor network), information systems (WWW), technological and infrastructural systems (Internet and power grid), biological systems (protein interaction and metabolic network), and reference networks (citations). They differ widely in their sizes, from as few as N = 1,039 nodes in the E. coli metabolism, to almost half a million nodes in the citation network. They cover several areas where networks are actively applied, representing ‘canonical’ datasets frequently

used by researchers to illustrate key network properties. As we indicate in Table 2.1, some of them are directed, others are undirected. In the coming chapters we will discuss in detail the nature and the characteristics of each of these datasets, turning them into the guinea pigs of our journey to understand complex networks.

NETWORK | NODES | LINKS | DIRECTED / UNDIRECTED | N | L | ⟨k⟩
Internet | Routers | Internet connections | Undirected | 192,244 | 609,066 | 6.34
WWW | Webpages | Links | Directed | 325,729 | 1,497,134 | 4.60
Power Grid | Power plants, transformers | Cables | Undirected | 4,941 | 6,594 | 2.67
Mobile Phone Calls | Subscribers | Calls | Directed | 36,595 | 91,826 | 2.51
Email | Email addresses | Emails | Directed | 57,194 | 103,731 | 1.81
Science Collaboration | Scientists | Co-authorship | Undirected | 23,133 | 93,439 | 8.08
Actor Network | Actors | Co-acting | Undirected | 702,388 | 29,397,908 | 83.71
Citation Network | Paper | Citations | Directed | 449,673 | 4,689,479 | 10.43
E. Coli Metabolism | Metabolites | Chemical reactions | Directed | 1,039 | 5,802 | 5.58
Protein Interactions | Proteins | Binding interactions | Undirected | 2,018 | 2,930 | 2.90

Table 2.1

Canonical Network Maps
The basic characteristics of ten networks used throughout this book to illustrate the tools of network science. The table lists the nature of their nodes and links, indicating if links are directed or undirected, the number of nodes (N) and links (L), and the average degree ⟨k⟩ for each network. For directed networks the average degree shown is the average in- or out-degree, $\langle k^{in} \rangle = \langle k^{out} \rangle$ (see Equation (2.5)).
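The average degrees listed in Table 2.1 follow directly from N and L. The short Python sketch below recomputes three of them, assuming ⟨k⟩ = 2L/N for undirected networks and ⟨k⟩ = L/N for directed ones (Equations (2.2) and (2.5) in Section 2.3). The function name and the choice of rows are illustrative.

```python
def average_degree(n_nodes, n_links, directed):
    """Average degree: L/N for directed networks, 2L/N for undirected ones."""
    if directed:
        return n_links / n_nodes
    return 2 * n_links / n_nodes

# (N, L, directed) copied from three rows of Table 2.1.
rows = {
    "Internet": (192_244, 609_066, False),
    "WWW": (325_729, 1_497_134, True),
    "Power Grid": (4_941, 6_594, False),
}

for name, (n, l, directed) in rows.items():
    print(name, round(average_degree(n, l, directed), 2))
# Internet 6.34
# WWW 4.6
# Power Grid 2.67
```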

SECTION 2.3

DEGREE, AVERAGE DEGREE, AND DEGREE DISTRIBUTION

A key property of each node is its degree, representing the number of links it has to other nodes. The degree can represent the number of mobile phone contacts an individual has in the call graph (i.e. the number of different individuals the person has talked to), or the number of citations a research paper gets in the citation network.

BOX 2.2

BRIEF STATISTICS REVIEW
Four key quantities characterize a sample of N values $x_1, \dots, x_N$:

Average (mean):
$$\langle x \rangle = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

The nth moment:
$$\langle x^n \rangle = \frac{x_1^n + x_2^n + \dots + x_N^n}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i^n$$

Standard deviation:
$$\sigma_x = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i - \langle x \rangle \right)^2}$$

Distribution of x:
$$p_x = \frac{1}{N} \sum_{i} \delta_{x, x_i},$$
where $p_x$ follows the normalization $\sum_x p_x = 1$.

DEGREE
We denote with $k_i$ the degree of the ith node in the network. For example, for the undirected networks shown in Figure 2.2 we have $k_1 = 2$, $k_2 = 3$, $k_3 = 2$, $k_4 = 1$. In an undirected network the total number of links, L, can be expressed as the sum of the node degrees:

$$L = \frac{1}{2} \sum_{i=1}^{N} k_i. \tag{2.1}$$

Here the 1/2 factor corrects for the fact that in the sum (2.1) each link is counted twice. For example, the link connecting the nodes 2 and 4 in Figure 2.2 will be counted once in the degree of node 2 and once in the degree of node 4.

AVERAGE DEGREE
An important property of a network is its average degree (BOX 2.2), which for an undirected network is

$$\langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i = \frac{2L}{N}. \tag{2.2}$$

In directed networks we distinguish between incoming degree, $k_i^{in}$, representing the number of links that point to node i, and outgoing degree, $k_i^{out}$, representing the number of links that point from node i to other nodes. Finally, a node’s total degree, $k_i$, is given by

$$k_i = k_i^{in} + k_i^{out}. \tag{2.3}$$

For example, on the WWW the number of pages a given document points to represents its outgoing degree, $k^{out}$, and the number of documents that point to it represents its incoming degree, $k^{in}$. The total number of links in a directed network is

$$L = \sum_{i=1}^{N} k_i^{in} = \sum_{i=1}^{N} k_i^{out}. \tag{2.4}$$

The 1/2 factor seen in (2.1) is now absent, as for directed networks the two sums in (2.4) separately count the outgoing and the incoming degrees. The average degree of a directed network is

$$\langle k^{in} \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i^{in} = \langle k^{out} \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i^{out} = \frac{L}{N}. \tag{2.5}$$
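Equations (2.1) to (2.5) can be checked on a small example. The Python sketch below computes the degrees of the undirected graph of Figure 2.2 and then treats the same link list as directed, reading each pair (i, j) as a link pointing from i to j. That directed reading is an assumption made purely for illustration, as are the function names.

```python
from collections import Counter

links = [(1, 2), (1, 3), (2, 3), (2, 4)]  # graph of Figure 2.2
N = 4

def undirected_degrees(links):
    """k_i: each link adds one to the degree of both of its end nodes."""
    k = Counter()
    for i, j in links:
        k[i] += 1
        k[j] += 1
    return k

def directed_degrees(links):
    """(k_in, k_out), reading each pair (i, j) as a link from i to j."""
    k_in, k_out = Counter(), Counter()
    for i, j in links:
        k_out[i] += 1
        k_in[j] += 1
    return k_in, k_out

k = undirected_degrees(links)
L = sum(k.values()) // 2            # Eq. (2.1): each link is counted twice
avg_k = sum(k.values()) / N         # Eq. (2.2): equals 2L/N
print(dict(k), L, avg_k)            # {1: 2, 2: 3, 3: 2, 4: 1} 4 2.0

k_in, k_out = directed_degrees(links)
L_dir = sum(k_in.values())          # Eq. (2.4): no factor 1/2
print(L_dir == sum(k_out.values()), L_dir / N)  # True 1.0, i.e. L/N of Eq. (2.5)
```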

DEGREE DISTRIBUTION
The degree distribution, $p_k$, provides the probability that a randomly selected node in the network has degree k. Since $p_k$ is a probability, it must be normalized, i.e.

$$\sum_{k=0}^{\infty} p_k = 1. \tag{2.6}$$

For a network with N nodes the degree distribution is the normalized histogram (Figure 2.3) given by

$$p_k = \frac{N_k}{N}, \tag{2.7}$$

where $N_k$ is the number of degree-k nodes. Hence the number of degree-k nodes can be obtained from the degree distribution as $N_k = N p_k$.

The degree distribution has assumed a central role in network theory following the discovery of scale-free networks [8]. One reason is that the calculation of most network properties requires us to know $p_k$. For example, the average degree of a network can be written as

$$\langle k \rangle = \sum_{k=0}^{\infty} k p_k. \tag{2.8}$$

The other reason is that the precise functional form of $p_k$ determines many network phenomena, from network robustness to the spread of viruses.
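The definitions (2.6) to (2.8) translate into a few lines of code. The Python sketch below builds the normalized histogram for the degrees of the graph of Figure 2.2 (2, 3, 2, 1, as listed in the DEGREE subsection) and checks the normalization and the average degree. The function name is a hypothetical choice for this illustration.

```python
from collections import Counter

def degree_distribution(degrees):
    """Return {k: p_k} with p_k = N_k / N (Eq. (2.7))."""
    n = len(degrees)
    counts = Counter(degrees)           # N_k: number of nodes with degree k
    return {k: counts[k] / n for k in sorted(counts)}

degrees = [2, 3, 2, 1]                  # k_1..k_4 for the graph of Figure 2.2
p = degree_distribution(degrees)
print(p)                                # {1: 0.25, 2: 0.5, 3: 0.25}

print(sum(p.values()))                  # 1.0, the normalization (2.6)
print(sum(k * pk for k, pk in p.items()))  # 2.0, the average degree (2.8)
```

Degrees that do not occur are simply absent from the dictionary, which corresponds to $p_k = 0$ for those values of k.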

Figure 2.3

Degree Distribution
The degree distribution of a network is provided by the ratio (2.7).
(a) For the network in (a) with N = 4 the degree distribution is shown in (b).
(b) We have $p_1 = 1/4$ (one of the four nodes has degree $k_1 = 1$), $p_2 = 1/2$ (two nodes have $k_3 = k_4 = 2$), and $p_3 = 1/4$ (as $k_2 = 3$). As we lack nodes with degree k > 3, $p_k = 0$ for any k > 3.
(c) A one dimensional lattice for which each node has the same degree k = 2.
(d) The degree distribution of (c) is a Kronecker delta function, $p_k = \delta(k - 2)$.
Panels (b) and (d) plot $p_k$ (vertical axis, 0 to 1) against k (horizontal axis, 0 to 4).

Figure 2.4

Degree Distribution of a Real Network
In real networks the node degrees can vary widely.
(a) A layout of the protein interaction network of yeast (Table 2.1). Each node corresponds to a yeast protein and links correspond to experimentally detected binding interactions. Note that the proteins shown on the bottom have self-loops, hence for them k = 2.
(b) The degree distribution of the protein interaction network shown in (a). The observed degrees vary between k = 0 (isolated nodes) and k = 92, which is the degree of the most connected node, called a hub. There are also wide differences in the number of nodes with different degrees: Almost half of the nodes have degree one (i.e. $p_1 = 0.48$), while we have only one copy of the biggest node (i.e. $p_{92} = 1/N = 0.0005$).
(c) The degree distribution is often shown on a log-log plot, in which we either plot $\log p_k$ as a function of $\log k$ or, as we do in (c), use logarithmic axes. The advantages of this representation are discussed in Chapter 4.

SECTION 2.4

ADJACENCY MATRIX

A complete description of a network requires us to keep track of its links. The simplest way to achieve this is to provide a complete list of the links. For example, the network of Figure 2.2 is uniquely described by listing its four links: {(1, 2), (1, 3), (2, 3), (2, 4)}. For mathematical purposes we often represent a network through its adjacency matrix. The adjacency matrix of a directed network of N nodes has N rows and N columns, its elements being:

Aij = 1 if there is a link pointing from node j to node i

Aij = 0 if there is no link pointing from node j to node i

The adjacency matrix of an undirected network has two entries for each link, e.g. link (1, 2) is represented as A12 = 1 and A21 = 1. Hence, the adjacency matrix of an undirected network is symmetric, Aij = Aji (Figure 2.5b).

The degree ki of node i can be directly obtained from the elements of the adjacency matrix. For undirected networks a node’s degree is a sum over either the rows or the columns of the matrix, i.e.

$$k_i = \sum_{j=1}^{N} A_{ij} = \sum_{j=1}^{N} A_{ji}. \qquad (2.9)$$

For directed networks the sums over the adjacency matrix’ rows and columns provide the incoming and outgoing degrees, respectively:

$$k_i^{\text{in}} = \sum_{j=1}^{N} A_{ij}, \qquad k_i^{\text{out}} = \sum_{j=1}^{N} A_{ji}. \qquad (2.10)$$

Given that in an undirected network the number of outgoing links equals the number of incoming links, we have

$$2L = \sum_{i=1}^{N} k_i^{\text{in}} = \sum_{i=1}^{N} k_i^{\text{out}} = \sum_{i,j=1}^{N} A_{ij}. \qquad (2.11)$$

The number of nonzero elements of the adjacency matrix is 2L, or twice the number of links. Indeed, an undirected link connecting nodes i and j appears in two entries: Aij = 1, a link pointing from node j to node i, and Aji = 1, a link pointing from i to j (Figure 2.5b).
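The relations (2.9)-(2.11) can be checked on the four-link network of Figure 2.2. The sketch below is only illustrative: the helper `adjacency_matrix` is our own name, plain nested lists stand in for a matrix type, and reading the same link list as directed (each pair taken as source, target) is a hypothetical example rather than the network of Figure 2.5c.

```python
def adjacency_matrix(n, links, directed=False):
    """Build an n x n matrix with A[i][j] = 1 for a link pointing from node j to node i.

    Nodes are labeled 1..n; each link is a (source, target) pair.
    """
    A = [[0] * n for _ in range(n)]
    for source, target in links:
        A[target - 1][source - 1] = 1
        if not directed:
            A[source - 1][target - 1] = 1  # undirected links appear twice
    return A

links = [(1, 2), (1, 3), (2, 3), (2, 4)]  # link list of Figure 2.2
n = 4

# Undirected case: row sums and column sums both give the degrees (2.9).
A = adjacency_matrix(n, links)
row_sums = [sum(row) for row in A]
col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
L = sum(row_sums) // 2          # the matrix holds 2L nonzero entries
avg_k = 2 * L / n

# Directed reading of the same list (hypothetical): rows give k_in, columns give k_out (2.10).
D = adjacency_matrix(n, links, directed=True)
k_in = [sum(row) for row in D]
k_out = [sum(D[i][j] for i in range(n)) for j in range(n)]

print(row_sums, col_sums, L, avg_k)  # [2, 3, 2, 1] [2, 3, 2, 1] 4 2.0
print(k_in, k_out)                   # [0, 1, 2, 1] [2, 2, 0, 0]
```

In the directed case the total of the in-degrees equals the total of the out-degrees, and both equal L, since every link leaves one node and enters another.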


Figure 2.5 The Adjacency Matrix

(a) The labeling of the elements of the adjacency matrix.

(b) The adjacency matrix of an undirected network. The figure shows that the degree of a node (in this case node 2) can be expressed as the sum over the appropriate column or the row of the adjacency matrix. It also shows a few basic network characteristics, like the total number of links, L, and average degree, 〈k〉, expressed in terms of the elements of the adjacency matrix.

(c) The same as in (b) but for a directed network.

Relations shown in the panels:

(b) Undirected network:

$$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \qquad k_2 = \sum_{j=1}^{4} A_{2j} = \sum_{i=1}^{4} A_{i2} = 3,$$

$$A_{ij} = A_{ji}, \qquad A_{ii} = 0, \qquad L = \frac{1}{2}\sum_{i,j=1}^{N} A_{ij}, \qquad \langle k \rangle = \frac{2L}{N}.$$

(c) Directed network:

$$k_2^{\text{in}} = \sum_{j=1}^{4} A_{2j} = 2, \qquad k_2^{\text{out}} = \sum_{i=1}^{4} A_{i2} = 1,$$

$$A_{ij} \neq A_{ji}, \qquad A_{ii} = 0, \qquad L = \sum_{i,j=1}^{N} A_{ij}, \qquad \langle k^{\text{in}} \rangle = \langle k^{\text{out}} \rangle = \frac{L}{N}.$$

SECTION 2.5

REAL NETWORKS ARE SPARSE

In real networks the number of nodes (N) and links (L) can vary widely. For example, the neural network of the worm C. elegans, the only fully mapped nervous system of a living organism, has N = 302 neurons (nodes). In contrast the human brain is estimated to have about a hundred billion ($N \approx 10^{11}$) neurons. The genetic network of a human cell has about 20,000 genes as nodes; the social network consists of seven billion individuals ($N \approx 7 \times 10^{9}$) and the WWW is estimated to have over a trillion web documents ($N > 10^{12}$).

These wide differences in size are noticeable in Table 2.1, which lists N and L for several network maps. Some of these maps offer a complete wiring diagram of the system they describe (like the actor network or the E. coli metabolism), while others are only samples, representing a subset of the full network (like the WWW or the mobile call graph).

Figure 2.6 Complete Graph

A complete graph with N = 16 nodes and Lmax = 120 links, as predicted by (2.12). The adjacency matrix of a complete graph is Aij = 1 for all i ≠ j and Aii = 0. The average degree of a complete graph is 〈k〉 = N - 1. A complete graph is often called a clique, a term frequently used in community identification, a problem discussed in CHAPTER 9.

Table 2.1 indicates that the number of links also varies widely. In a network of N nodes the number of links can change between L = 0 and Lmax, where

$$L_{\max} = \binom{N}{2} = \frac{N(N-1)}{2} \qquad (2.12)$$

is the total number of links present in a complete graph of size N (Figure 2.6). In a complete graph each node is connected to every other node.

In real networks L is much smaller than Lmax, reflecting the fact that most real networks are sparse. We call a network sparse if L ≪ Lmax.
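To get a feel for how far real networks are from the complete graph of (2.12), one can compare L with Lmax directly. The sketch below uses hypothetical helper names (`max_links`, `link_density`); the values N = 2,018 and L = 2,930 are those quoted for the yeast protein interaction network in this chapter's summary.

```python
def max_links(n):
    """Number of links in a complete graph of n nodes, L_max = n(n-1)/2, as in (2.12)."""
    return n * (n - 1) // 2

def link_density(n, links):
    """Fraction of the possible links that are actually present, L / L_max."""
    return links / max_links(n)

print(max_links(16))                # 120, the complete graph of Figure 2.6
print(max_links(2018))              # 2035153 possible links among 2,018 proteins
print(link_density(2018, 2930))     # well below 1%: the network is sparse
```

A density this small means the adjacency matrix is almost entirely zeros, which is why large networks are usually stored as link lists rather than as full matrices.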

Finally, one can also define multipartite networks, like the tripartite recipe-ingredient-compound network shown in Figure 2.11.

Online Resource 2.2 Human Disease Network

Download the high resolution version of the Human Disease Network [1], or explore it using the online interface built by the New York Times.

Figure 2.10 Human Disease Network

(a) One projection of the diseasome is the gene network, whose nodes are genes, and where two genes are connected if they are associated with the same disease.

(b) The Human Disease Network (or diseasome) is a bipartite network, whose nodes are diseases (U) and genes (V). A disease is connected to a gene if mutations in that gene are known to affect the particular disease [4].

(c) The second projection is the disease network, whose nodes are diseases. Two diseases are connected if the same genes are associated with them, indicating that the two diseases have common genetic origin. Panels (a)-(c) show a subset of the diseasome, focusing on cancers.

(d) The full diseasome, connecting 1,283 disorders via 1,777 shared disease genes. After [1]. See Online Resource 2.2 for the detailed map.

[Panel titles in the figure: Human Disease Network, Disease Phenome, Diseasome, Disease Genome, Disease Gene Network.]

Figure 2.11 Tripartite Network

(a) The construction of the tripartite recipe-ingredient-compound network, in which one set of nodes are recipes, like Chicken Marsala; the second set corresponds to the ingredients each recipe has (like flour, sage, chicken, wine, and butter for Chicken Marsala); the third set captures the flavor compounds, or chemicals that contribute to the taste of each ingredient.

(b) The ingredient or the flavor network represents a projection of the tripartite network. Each node denotes an ingredient; the node color indicates the food category and the node size indicates the ingredient’s prevalence in recipes. Two ingredients are connected if they share a significant number of flavor compounds. Link thickness represents the number of shared compounds.

After [11].

[Panel (a) columns: Recipes, Ingredients, Compounds. Panel (b) legend: Categories (fruits, dairy, spices, alcoholic beverages, nuts and seeds, seafoods, meats, herbs, plant derivatives, vegetables, flowers, animal products, plants, cereal), Prevalence, Shared compounds.]

SECTION 2.8

PATHS AND DISTANCES

Physical distance plays a key role in determining the interactions between the components of physical systems. For example the distance between two atoms in a crystal or between two galaxies in the universe determines the forces that act between them.

In networks distance is a challenging concept. Indeed, what is the distance between two webpages, or between two individuals who do not know each other? The physical distance is not relevant here: Two webpages could be sitting on computers on the opposite sides of the globe, yet have a link to each other. At the same time two individuals that live in the same building may not know each other.

In networks physical distance is replaced by path length. A path is a route that runs along the links of the network. A path’s length represents the number of links the path contains (Figure 2.12a). Note that some texts require that each node a path visits is distinct.

In network science paths play a central role. Next we discuss some of their most important properties, many more being summarized in Figure 2.13.

Figure 2.12 Paths

(a) A path between nodes $i_0$ and $i_n$ is an ordered list of n links $P = \{(i_0, i_1), (i_1, i_2), (i_2, i_3), \dots, (i_{n-1}, i_n)\}$. The length of this path is n. The path shown in orange in (a) follows the route 1→2→5→7→4→6, hence its length is n = 5.

(b) The shortest paths between nodes 1 and 7, or the distance d17, correspond to the path with the fewest number of links that connect nodes 1 to 7. There can be multiple paths of the same length, as illustrated by the two paths shown in orange and grey. The network diameter is the largest distance in the network, being dmax = 3 here.

SHORTEST PATH

The shortest path between nodes i and j is the path with the fewest number of links (Figure 2.12b). The length of the shortest path is often called the distance between nodes i and j, and is denoted by dij, or simply d. We can have multiple shortest paths of the same length d between a pair of nodes (Figure 2.12b). The shortest path never contains loops or intersects itself.

In an undirected network dij = dji, i.e. the distance between node i and j is the same as the distance between node j and i. In a directed network often dij ≠ dji. Furthermore, in a directed network the existence of a path from node i to node j does not guarantee the existence of a path from j to i.

In real networks we often need to determine the distance between two nodes. For a small network, like the one shown in Figure 2.12, this is an easy task. For a network with millions of nodes finding the shortest path between two nodes can be rather time consuming. The length of the shortest path and the number of such paths can be formally obtained from the adjacency matrix (BOX 2.4). In practice we use the breadth first search (BFS) algorithm discussed in BOX 2.5 for this purpose.

Figure 2.13 Pathology

(a) Path. A sequence of nodes such that each node is connected to the next node along the path by a link. Each path consists of n+1 nodes and n links. The length of a path is the number of its links, counting multiple links multiple times. For example, the orange line 1→2→5→4→3 covers a path of length four.

(b) Shortest Path (Geodesic Path, d). The path with the shortest distance d between two nodes. We also call d the distance between two nodes. Note that the shortest path does not need to be unique: between nodes 1 and 4 we have two shortest paths, 1→2→3→4 (blue) and 1→2→5→4 (orange), having the same length d1,4 = 3.

(c) Diameter (dmax). The longest shortest path in a graph, or the distance between the two furthest nodes. In the graph shown here the diameter is between nodes 1 and 4, hence dmax = 3.

(d) Average Path Length (〈d〉). The average of the shortest paths between all pairs of nodes. For the graph shown in the figure we have

$$\langle d \rangle = \frac{d_{1\to2} + d_{1\to3} + d_{1\to4} + d_{1\to5} + d_{2\to3} + d_{2\to4} + d_{2\to5} + d_{3\to4} + d_{3\to5} + d_{4\to5}}{10} = 1.6.$$

(e) Cycle. A path with the same start and end node. In the graph shown in the figure we have only one cycle, as shown by the orange line.

(f) Eulerian Path. A path that traverses each link exactly once. The image shows two such Eulerian paths, one in orange and the other in blue.

(g) Hamiltonian Path. A path that visits each node exactly once. We show two Hamiltonian paths in orange and in blue.

BOX 2.4 NUMBER OF SHORTEST PATHS BETWEEN TWO NODES

The number of shortest paths, Nij, and the distance dij between nodes i and j can be calculated directly from the adjacency matrix Aij.

dij = 1: If there is a direct link between i and j, then Aij = 1 (Aij = 0 otherwise).

dij = 2: If there is a path of length two between i and j, then Aik Akj = 1 (Aik Akj = 0 otherwise). The number of dij = 2 paths between i and j is

$$N_{ij}^{(2)} = \sum_{k=1}^{N} A_{ik} A_{kj} = \left[A^{2}\right]_{ij}$$

where [...]ij denotes the (ij)th element of a matrix.

dij = d: If there is a path of length d between i and j, then Aik ... Alj = 1 (Aik ... Alj = 0 otherwise). The number of paths of length d between i and j is

$$N_{ij}^{(d)} = \left[A^{d}\right]_{ij}.$$

These equations hold for directed and undirected networks. The distance between nodes i and j is the smallest d for which $N_{ij}^{(d)} > 0$. Despite the elegance of this approach, faced with a large network, it is more efficient to use the breadth-first-search algorithm described in BOX 2.5.
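The matrix-power counting of BOX 2.4 can be sketched in a few lines. This is an illustration under stated assumptions, not an efficient implementation: the helper names are ours, plain nested lists stand in for matrices, and the five-node link list is inferred from the paths quoted in the caption of Figure 2.13 (1→2→3→4 and 1→2→5→4), so it should be read as an assumption about that figure.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def distance_and_count(A, i, j):
    """Return (d_ij, N_ij): the smallest d with [A^d]_ij > 0 and that matrix entry.

    Indices are 0-based. Returns (None, 0) if no path exists.
    """
    if i == j:
        return 0, 1
    n = len(A)
    power = A                       # holds A^d at the start of each iteration
    for d in range(1, n):           # a shortest path has at most n - 1 links
        if power[i][j] > 0:
            return d, power[i][j]
        power = mat_mul(power, A)
    return None, 0

# Undirected links assumed for the graph of Figure 2.13 (nodes 1..5).
links = [(1, 2), (2, 3), (3, 4), (2, 5), (4, 5)]
n = 5
A = [[0] * n for _ in range(n)]
for u, v in links:
    A[u - 1][v - 1] = 1
    A[v - 1][u - 1] = 1

print(distance_and_count(A, 0, 3))  # (3, 2): two shortest paths of length 3 between nodes 1 and 4
```

Each matrix product costs on the order of N cubed operations in this naive form, which illustrates why the text recommends BFS for large networks.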

NETWORK DIAMETER

The diameter of a network, denoted by dmax, is the maximum shortest path in the network. In other words, it is the largest distance recorded between any pair of nodes. One can verify that the diameter of the network shown in Figure 2.13 is dmax = 3. For larger networks the diameter can be determined using the BFS algorithm described in BOX 2.5.
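A minimal sketch of this procedure runs one BFS from every node and keeps the largest distance found; the same distances also give the average path length of (2.14). The function names are ours, and the link list is inferred from the paths quoted in the caption of Figure 2.13, so it is an assumption about that figure. Only node pairs joined by a path contribute.

```python
from collections import deque

def bfs_distances(adj, source):
    """Distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def diameter_and_average(adj):
    """Return (d_max, <d>) over all ordered pairs of distinct, connected nodes."""
    distances = []
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                distances.append(d)
    if not distances:               # no connected pairs at all
        return 0, 0.0
    return max(distances), sum(distances) / len(distances)

# Undirected links assumed for the graph of Figure 2.13.
links = [(1, 2), (2, 3), (3, 4), (2, 5), (4, 5)]
adj = {}
for u, v in links:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

d_max, d_avg = diameter_and_average(adj)
print(d_max, round(d_avg, 2))  # 3 1.6
```

Running N searches costs roughly N times the cost of a single BFS, which is manageable for sparse networks but becomes slow for the largest maps.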


AVERAGE PATH LENGTH

The average path length, denoted by 〈d〉, is the average distance between all pairs of nodes in the network. For a directed network of N nodes, 〈d〉 is

$$\langle d \rangle = \frac{1}{N(N-1)} \sum_{\substack{i,j = 1, \dots, N \\ i \neq j}} d_{ij}. \qquad (2.14)$$

Note that (2.14) is measured only for node pairs that are in the same component (SECTION 2.9). We can use the BFS algorithm to determine the average path length for a large network. For this we first determine the distances between the first node and all other nodes in the network using the algorithm described in BOX 2.5. We then determine the distances between the second node and all other nodes but the first one (if the network is undirected). We then repeat this procedure for all nodes. The sum

BOX 2.5 BREADTH-FIRST SEARCH (BFS) ALGORITHM

BFS is a frequently used algorithm in network science. Similar to throwing a pebble in a pond and watching the ripples spread from it, BFS starts from a node and labels its neighbors, then the neighbors’ neighbors, until it reaches the target node. The number of “ripples” needed to reach the target provides the distance.

The identification of the shortest path between node i and j follows the following steps (Figure 2.14):

1. Start at node i, that we label with “0”.

2. Find the nodes directly linked to i. Label them distance “1” and put them in a queue.

3. Take the first node, labeled n, out of the queue (n = 1 in the first step). Find the unlabeled nodes adjacent to it in the graph. Label them with n + 1 and put them in the queue.

4. Repeat step 3 until you find the target node j or there are no more nodes in the queue.

5. The distance between i and j is the label of j. If j does not have a label, then dij = ∞.

The computational complexity of the BFS algorithm, representing the approximate number of steps the computer needs to find dij on a network of N nodes and L links, is O(N + L). It is linear in N and L as each node needs to be entered and removed from the queue at most once, and each link has to be tested only once.

Figure 2.14 Applying the BFS Algorithm

(a) Starting from the orange node, labeled “0”, we identify all its neighbors, labeling them “1”.

(b)-(d) Next we label “2” the unlabeled neighbors of all nodes labeled “1”, and so on, in each iteration increasing the label number, until no node is left unlabeled. The length of the shortest path or the distance d0i between node 0 and any other node i in the network is given by the label of node i. For example, the distance between node 0 and the leftmost node is d = 3.
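The five steps of BOX 2.5 translate almost line by line into code. The sketch below is one possible rendering, not a reference implementation: the name `bfs_distance` and the adjacency-list dictionary are our own choices, and the example graph (the links inferred from the Figure 2.13 caption plus an isolated node 6) is hypothetical.

```python
from collections import deque

def bfs_distance(adj, i, j):
    """Return d_ij by breadth-first search, or float('inf') if j cannot be reached."""
    labels = {i: 0}                     # step 1: label the start node with 0
    queue = deque([i])
    while queue:                        # step 4: repeat until the queue is empty
        node = queue.popleft()          # step 3: take the first node out of the queue
        if node == j:
            return labels[node]         # step 5: the label of j is the distance
        for neighbor in adj.get(node, ()):
            if neighbor not in labels:  # each node is labeled, and queued, only once
                labels[neighbor] = labels[node] + 1
                queue.append(neighbor)
    return float("inf")                 # step 5: j never received a label

# Hypothetical example graph; node 6 has no links.
adj = {
    1: [2],
    2: [1, 3, 5],
    3: [2, 4],
    4: [3, 5],
    5: [2, 4],
    6: [],
}

print(bfs_distance(adj, 1, 4))  # 3
print(bfs_distance(adj, 1, 6))  # inf
```

Using a `deque` keeps both queue operations constant-time, so the running time stays proportional to N + L as stated in the box.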


SECTION 2.9

CONNECTEDNESS

A phone would be of limited use as a communication device if we could not call any valid phone number; email would be rather useless if we could send emails to only certain email addresses, and not to others. From a network perspective this means that the network behind the phone or the Internet must be capable of establishing a path between any two nodes. This is in fact the key utility of most networks: they ensure connectedness. In this section we discuss the graph-theoretic formulation of connectedness.

In an undirected network nodes i and j are connected if there is a path between them. They are disconnected if such a path does not exist, in which case we have dij = ∞. This is illustrated in Figure 2.15a, which shows a network consisting of two disconnected clusters. While there are paths between any two nodes in the same cluster (for example nodes 4 and 6), there are no paths between nodes that belong to different clusters (nodes 1 and 6).

A network is connected if all pairs of nodes in the network are connected. A network is disconnected if there is at least one pair with dij = ∞. Clearly the network shown in Figure 2.15a is disconnected, and we call its two subnetworks components or clusters. A component is a subset of nodes in a network, so that there is a path between any two nodes that belong to the component, but one cannot add any more nodes to it that would have the same property.

If a network consists of two components, a properly placed single link can connect them, making the network connected (Figure 2.15b). Such a link is called a bridge. In general a bridge is any link that, if cut, disconnects the network.

While for a small network visual inspection can help us decide if it is connected or disconnected, for a network consisting of millions of nodes connectedness is a challenging question. Mathematical and algorithmic tools can help us identify the connected components of a graph. For example, for a disconnected network the adjacency matrix can be rearranged into a block diagonal form, such that all nonzero elements in the matrix are contained in square blocks along the matrix’ diagonal and all other elements are zero (Figure 2.15a). Each square block corresponds to a component. We can use the tools of linear algebra to decide if the adjacency matrix is block diagonal, helping us to identify the connected components. In practice, for large networks the components are more efficiently identified using the BFS algorithm (BOX 2.6).

Figure 2.15 Connected and Disconnected Networks

(a) A small network consisting of two disconnected components. Indeed, there is a path between any pair of nodes in the (1,2,3) component, as well as in the (4,5,6,7) component. However, there are no paths between nodes that belong to the different components.

The right panel shows the adjacency matrix of the network. If the network has disconnected components, the adjacency matrix can be rearranged into a block diagonal form, such that all nonzero elements of the matrix are contained in square blocks along the diagonal of the matrix and all other elements are zero.

(b) The addition of a single link, called a bridge, shown in grey, turns a disconnected network into a single connected component. Now there is a path between every pair of nodes in the network. Consequently the adjacency matrix cannot be written in a block diagonal form.

BOX 2.6 FINDING THE CONNECTED COMPONENTS OF A NETWORK

1. Start from a randomly chosen node i and perform a BFS (BOX 2.5). Label all nodes reached this way with n = 1.

2. If the total number of labeled nodes equals N, then the network is connected. If the number of labeled nodes is smaller than N, the network consists of several components. To identify them, proceed to step 3.

3. Increase the label n → n + 1. Choose an unmarked node j, label it with n. Use BFS to find all nodes reachable from j, label them all with n. Return to step 2.
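The labeling procedure of BOX 2.6 can be sketched as follows. Two simplifications are ours: the start nodes are taken in list order rather than at random (the resulting components are the same), and the links inside the two components are hypothetical, chosen only to reproduce the node groups (1,2,3) and (4,5,6,7) named in Figure 2.15a.

```python
from collections import deque

def connected_components(nodes, links):
    """Return {node: component label}, with labels 1, 2, ... assigned as in BOX 2.6."""
    adj = {v: [] for v in nodes}
    for u, v in links:
        adj[u].append(v)
        adj[v].append(u)

    label = {}
    n = 0
    for start in nodes:             # next unlabeled node, in list order
        if start in label:
            continue
        n += 1                      # step 3: increase the label
        label[start] = n
        queue = deque([start])
        while queue:                # BFS labels everything reachable from start
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in label:
                    label[neighbor] = n
                    queue.append(neighbor)
    return label

nodes = [1, 2, 3, 4, 5, 6, 7]
links = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (6, 7)]  # hypothetical wiring

labels = connected_components(nodes, links)
print(labels)  # {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2}

# Adding a single bridge (here between nodes 3 and 4) merges the two components.
bridged = connected_components(nodes, links + [(3, 4)])
print(set(bridged.values()))  # {1}
```

Because every node and link is visited once across all the searches, the whole procedure still runs in time proportional to N + L.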


SECTION 2.10

CLUSTERING COEFFICIENT

The clustering coefficient captures the degree to which the neighbors of a given node link to each other. For a node i with degree ki the local clustering coefficient is defined as [12]

$$C_i = \frac{2L_i}{k_i(k_i - 1)} \qquad (2.15)$$

where Li represents the number of links between the ki neighbors of node i. Note that Ci is between 0 and 1 (Figure 2.16a):

• Ci = 0 if none of the neighbors of node i link to each other.

• Ci = 1 if the neighbors of node i form a complete graph, i.e. they all link to each other.

• Ci is the probability that two neighbors of a node link to each other. Consequently C = 0.5 implies that there is a 50% chance that two neighbors of a node are linked.

In summary Ci measures the network’s local link density: The more densely interconnected the neighborhood of node i, the higher is its local clustering coefficient.

The degree of clustering of a whole network is captured by the average clustering coefficient, 〈C〉, representing the average of Ci over all nodes i = 1, ..., N [12],

$$\langle C \rangle = \frac{1}{N} \sum_{i=1}^{N} C_i. \qquad (2.16)$$

In line with the probabilistic interpretation 〈C〉 is the probability that two neighbors of a randomly selected node link to each other.

While (2.16) is defined for undirected networks, the clustering coefficient can be generalized to directed and weighted [13, 14, 15, 16] networks as well. In the network literature we may encounter the global clustering coefficient as well, discussed in ADVANCED TOPICS 2.A.

Figure 2.16 Clustering Coefficient

(a) The local clustering coefficient, Ci, of the central node with degree ki = 4 for three different configurations of its neighborhood. The local clustering coefficient measures the local density of links in a node’s vicinity. The values shown in the panel are Ci = 0, Ci = 1/2 and Ci = 1.

(b) A small network, with the local clustering coefficient of each node shown next to it. We also list the network’s average clustering coefficient 〈C〉, according to (2.16), and its global clustering coefficient CΔ, defined in SECTION 2.12, Eq. (2.17). Note that for nodes with degrees ki = 0,1, the clustering coefficient is zero. The values shown in the panel are 〈C〉 = 13/42 ≈ 0.310 and CΔ = 3/8 = 0.375.
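The definitions (2.15) and (2.16) can be tried out on a small example. The sketch below uses our own function names and a hypothetical five-node network (a central node 0 with four neighbors, three of whose pairs are linked); it is not the network drawn in Figure 2.16. Nodes with fewer than two neighbors are assigned Ci = 0, following the convention stated in the figure caption.

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i = 2 L_i / (k_i (k_i - 1)), with C_i = 0 for nodes of degree 0 or 1."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # L_i: number of links among the neighbors of node i.
    links_between = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links_between / (k * (k - 1))

def average_clustering(adj):
    """<C>: the mean of C_i over all nodes, as in (2.16)."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Hypothetical network: node 0 is linked to 1, 2, 3, 4; links 1-2, 2-3, 3-4 join its neighbors.
links = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4)]
adj = {}
for u, v in links:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(local_clustering(adj, 0))          # 0.5: 3 of the 6 possible neighbor pairs are linked
print(round(average_clustering(adj), 3)) # 0.767
```

Storing each neighborhood as a set makes the test "is v a neighbor of u" fast, so the cost for one node grows with the square of its degree.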

SECTION 2.11

SUMMARY

The crash course offered in this chapter introduced some of the basic graph theoretical concepts and tools used in network science. The set of elementary network characteristics, summarized in Figure 2.17, offer a formal language through which we can explore networks. Many of the networks we study in network science consist of thousands or even millions of nodes and links (Table 2.1). To explore them, we need to go beyond the small graphs shown in Figure 2.17. A glimpse of what we are about to encounter is offered by the protein-protein interaction network of yeast (Figure 2.4a). The network is too complex to understand its properties through a visual inspection of its wiring diagram. We therefore need to turn to the tools of network science to characterize its topology. Let us use the measures we introduced so far to explore some basic characteristics of this network. The undirected network, shown in Figure 2.4a, has N = 2,018 proteins as nodes and L=2,930 binding interactions as links. Hence its average degree, according to (2.2), is 〈k〉 = 2.90, suggesting that a typical protein interacts with approximately two to three other proteins. Yet, this number is somewhat misleading. Indeed, the degree distribution pk shown in Figure 2.4b,c, indicates that the vast majority of nodes have only a few links. To be precise, in this network 69% of nodes have fewer than three links, i.e. for these k < 〈k〉 . These numerous nodes with few links coexist with a few highly connected nodes, or hubs, the largest having as many as 92 links. Such wide differences in node degrees is a consequence of the network’s scale-free property, discussed in CHAPTER 4. We will see that the shape of the degree distribution determines a wide range of network properties, from the network’s robustness to the spread of viruses. The breadth-first-search algorithm (BOX 2.5) helps us determine the network’s diameter, finding dmax = 14. 
We might be tempted to expect wide variations in d, as some nodes are close to each other, others, however, may

be quite far. The distance distribution (Figure 2.18a) indicates otherwise: pd has a prominent peak between 5 and 6, telling us that most distances are rather short, being in the vicinity of 〈d〉 =5.61. Also, pd decays fast for

GRAPH THEORY

27

10-2

(a)

large d, suggesting that large distances are absent. Indeed, the variance of the distances is σd = 1.64, indicating that most path lengths are in the close

100

101

102

k

103

0.25 0 10 0.2

vicinity of 〈d〉 . These are manifestations of the small world property dis-

p10 d pk 0.15 -1

cussed in CHAPTER 3.

10-2

HUBS

0.1

The breadth first search algorithm also tells us that the protein interaction network is not connected, but consists of 185 components, shown

⟨d⟩

10-3 0.05

as isolated clusters and nodes in Figure 2.4a. The largest, called the giant component, contains 1,647 of the 2,018 nodes; all other components are tiny. As we will see in the coming chapters, such fragmentation is common in real networks.

The average clustering coefficient of the protein interaction network is 〈C〉 = 0.12, which, as we will come to appreciate in the coming chapters, indicates a significant degree of local clustering. A further caveat is provided by the dependence of the clustering coefficient on the node’s degree, or the C(k) function (Figure 2.18b). The fact that C(k) decreases for large k indicates that the local clustering coefficient of the small nodes is significantly higher than the local clustering coefficient of the hubs. Hence the small degree nodes are located in dense local network neighborhoods, while the neighborhood of the hubs is much sparser. This is a consequence of hierarchy, a network property discussed in CHAPTER 9.

Finally, a visual inspection reveals an interesting pattern: hubs have a tendency to connect to small nodes, giving the network a hub and spoke character (Figure 2.4a). This is a consequence of degree correlations, discussed in CHAPTER 7. Such correlations influence a number of network based processes, from spreading phenomena to the number of driver nodes needed to control a network.

Taken together, Figures 2.4 and 2.18 illustrate that the quantities we introduced in this chapter can help us diagnose several key properties of real networks. The purpose of the coming chapters is to study systematically these network characteristics and understand what they tell us about a particular complex system.

Figure 2.18 Characterizing a Real Network

The protein-protein interaction (PPI) network of yeast is frequently studied by biologists and network scientists. The detailed wiring diagram of the network is shown in Figure 2.4a. The figure indicates that the network, consisting of N = 2,018 nodes and L = 2,930 links, has a large component that connects 81% of the proteins. We also have several smaller components and numerous isolated proteins that do not interact with any other node.

(a) The distance distribution, p_d, for the PPI network, providing the probability that two randomly chosen nodes have a distance d between them (shortest path). The grey vertical line shows the average path length, which is 〈d〉 = 5.61.

(b) The dependence of the average local clustering coefficient on the node’s degree, k. The C(k) function is obtained by averaging over the local clustering coefficient of all nodes with the same degree k.
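The averaging that defines C(k) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce Figure 2.18b: the toy graph, the function names and the convention C_i = 0 for nodes with fewer than two neighbors are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

def local_clustering(adj):
    """Return {node: C_i} for an undirected graph given as {node: set_of_neighbors}."""
    c = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            c[node] = 0.0  # assumed convention: C_i = 0 when k_i < 2
            continue
        # Count the links that exist between pairs of neighbors of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        c[node] = 2.0 * links / (k * (k - 1))
    return c

def clustering_by_degree(adj):
    """Return {k: C(k)}: the mean C_i over all nodes that have degree k."""
    c = local_clustering(adj)
    buckets = defaultdict(list)
    for node, nbrs in adj.items():
        buckets[len(nbrs)].append(c[node])
    return {k: sum(vals) / len(vals) for k, vals in sorted(buckets.items())}

# Hypothetical toy graph: a triangle 0-1-2 whose node 0 also carries two leaves.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
adj = dict(adj)  # freeze into a plain dict before iterating

ck = clustering_by_degree(adj)
avg_c = sum(local_clustering(adj).values()) / len(adj)
print(ck, avg_c)
```

Even in this tiny example the best-connected node has the lowest clustering among nodes with at least two neighbors, the same qualitative trend the text describes for the hubs of the PPI network.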


Figure 2.17 Graphology

In network science we often distinguish networks by some elementary property of the underlying graph. Here we summarize the most commonly encountered network types. We also list real systems that share the particular property. Note that many real networks combine several of these elementary network characteristics. For example the WWW is a directed multi-graph with self-interactions; the mobile call network is directed and weighted, without self-loops.

Undirected Network
A network whose links do not have a defined direction.
Examples: Internet, power grid, science collaboration networks.
$A_{ij} = A_{ji}$, $A_{ii} = 0$, $L = \frac{1}{2}\sum_{i,j=1}^{N} A_{ij}$, $\langle k \rangle = \frac{2L}{N}$

Self-loops
In many networks nodes do not interact with themselves, so the diagonal elements of the adjacency matrix are zero, A_ii = 0, i = 1,..., N. In some systems self-interactions are allowed; in such networks, self-loops represent the fact that node i interacts with itself.
Examples: WWW, protein interactions.
$\exists\, i,\ A_{ii} \neq 0$, $L = \frac{1}{2}\sum_{i,j=1,\, i \neq j}^{N} A_{ij} + \sum_{i=1}^{N} A_{ii}$

Multigraph/Simple Graphs
In a multigraph nodes are permitted to have multiple links (or parallel links) between them. Hence A_ij can be any positive integer. Networks that do not allow multiple links are called simple.
Multigraph Examples: Social networks, where we distinguish friendship, family and professional ties.
$A_{ij} = A_{ji}$, $A_{ii} = 0$

Directed Network
A network whose links have selected directions.
Examples: WWW, mobile phone calls, citation network.
$A_{ij} \neq A_{ji}$, $L = \sum_{i,j=1}^{N} A_{ij}$, $\langle k \rangle = \frac{L}{N}$

Weighted Network
A network whose links have a defined weight, strength or flow parameter. The elements of the adjacency matrix are A_ij = w_ij if there is a link with weight w_ij between them. For unweighted (binary) networks, the adjacency matrix only indicates the presence (A_ij = 1) or the absence (A_ij = 0) of a link.
Examples: Mobile phone calls, email network.
$A_{ij} = A_{ji}$, $A_{ii} = 0$, $\langle k \rangle = \frac{2L}{N}$

Complete Graph (Clique)
In a complete graph, or a clique, all nodes are connected to each other.
Examples: Actors in the cast of the same movie, as they are all linked to each other in the actor network.
$A_{i \neq j} = 1$, $A_{ii} = 0$, $L = L_{max} = \frac{N(N-1)}{2}$, $\langle k \rangle = N - 1$
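As a small illustration of how the link count and the average degree follow from the adjacency matrix, the sketch below applies L = (1/2) Σ A_ij and 〈k〉 = 2L/N to an undirected network, and L = Σ A_ij and 〈k〉 = L/N to a directed one. Both 4-node matrices and the function names are hypothetical, chosen only for the example.

```python
def link_count_undirected(A):
    """L = (1/2) * sum_ij A_ij for a symmetric 0/1 matrix with zero diagonal."""
    return sum(sum(row) for row in A) // 2

def link_count_directed(A):
    """L = sum_ij A_ij for a directed 0/1 matrix: every nonzero entry is one link."""
    return sum(sum(row) for row in A)

# Hypothetical undirected network with links 0-1, 0-2, 1-2, 1-3.
A_undirected = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
# Hypothetical directed network; a nonzero entry in row i, column j is one directed link.
A_directed = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

N = 4
L_u = link_count_undirected(A_undirected)
k_u = 2 * L_u / N          # average degree of the undirected network
L_d = link_count_directed(A_directed)
k_d = L_d / N              # average in-degree (equal to the average out-degree)
print(L_u, k_u, L_d, k_d)
```

The factor 1/2 appears only in the undirected case because each undirected link shows up twice in a symmetric matrix.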


SECTION 2.12

HOMEWORK

2.1. Königsberg Problem

Which of the icons in Figure 2.19 can be drawn without raising your pencil from the paper, and without drawing any line more than once? Why?

Figure 2.19 Königsberg Problem
Four icons, labeled (a), (b), (c) and (d).

2.2. Matrix Formalism

Let A be the N x N adjacency matrix of an undirected unweighted network, without self-loops. Let 1 be a column vector of N elements, all equal to 1. In other words 1 = (1, 1, ..., 1)^T, where the superscript T indicates the transpose operation. Use the matrix formalism (multiplicative constants, multiplication row by column, matrix operations like transpose and trace, etc, but avoid the sum symbol ∑) to write expressions for:

(a) The vector k whose elements are the degrees k_i of all nodes i = 1, 2,..., N.

(b) The total number of links, L, in the network.

(c) The number of triangles T present in the network, where a triangle means three nodes, each connected by links to the other two (Hint: you can use the trace of a matrix).

(d) The vector k_nn whose element i is the sum of the degrees of node i's neighbors.

(e) The vector k_nnn whose element i is the sum of the degrees of node i's second neighbors.

2.3. Graph Representation

The adjacency matrix is a useful graph representation for many analytical calculations. However, when we need to store a network in a computer, we can save computer memory by storing the list of links in an L x 2 matrix, whose rows contain the starting and end point i and j of each link. Construct for the networks (a) and (b) in Figure 2.20:

(a) The corresponding adjacency matrices.

(b) The corresponding link lists.

(c) Determine the average clustering coefficient of the network shown in Figure 2.20a.

(d) If you switch the labels of nodes 5 and 6 in Figure 2.20a, how does that move change the adjacency matrix? And the link list?

(e) What kind of information can you not infer from the link list representation of the network that you can infer from the adjacency matrix?

(f) In the (a) network, how many paths (with possible repetition of nodes and links) of length 3 exist starting from node 1 and ending at node 3? And in (b)?

(g) With the help of a computer, count the number of cycles of length 4 in both networks.

Figure 2.20 Graph Representation
(a) Undirected graph of 6 nodes and 7 links.
(b) Directed graph of 6 nodes and 8 directed links.

2.4. Degree, Clustering Coefficient and Components

(a) Consider an undirected network of size N in which each node has degree k = 1. Which condition does N have to satisfy? What is the degree distribution of this network? How many components does the network have?

(b) Consider now a network in which each node has degree k = 2 and clustering coefficient C = 1. What does the network look like? What condition does N satisfy in this case?

2.5. Bipartite Networks

Consider the bipartite network of Figure 2.21.

(a) Construct its adjacency matrix. Why is it a block-diagonal matrix?

(b) Construct the adjacency matrix of its two projections, on the purple and on the green nodes, respectively.

(c) Calculate the average degree of the purple nodes and the average degree of the green nodes in the bipartite network.

(d) Calculate the average degree in each of the two network projections. Is it surprising that the values are different from those obtained in point (c)?

Figure 2.21 Bipartite network
Bipartite network with 6 nodes in one set and 5 nodes in the other, connected by 10 links.

2.6. Bipartite Networks - General Considerations

Consider a bipartite network with N1 and N2 nodes in the two sets.

(a) What is the maximum number of links L_max the network can have?

(b) How many links cannot occur compared to a non-bipartite network of size N = N1 + N2?

(c) If N1 ≪ N2, what can you say about the network density, that is the total number of links over the maximum number of links, L_max?

(d) Find an expression connecting N1, N2 and the average degree for the two sets in the bipartite network, 〈k1〉 and 〈k2〉.

SECTION 2.13

ADVANCED TOPICS 2.A GLOBAL CLUSTERING COEFFICIENT

In the network literature we occasionally encounter the global clustering coefficient, which measures the total number of closed triangles in a network. Indeed, L_i in (2.15) is the number of triangles that node i participates in, as each link between two neighbors of node i closes a triangle (Figure 2.17). Hence the degree of a network’s global clustering can be also captured by the global clustering coefficient, defined as

$$C_\Delta = \frac{3 \times \text{Number of Triangles}}{\text{Number of Connected Triples}}, \tag{2.17}$$

where a connected triplet is an ordered set of three nodes ABC such that A connects to B and B connects to C. For example, an A, B, C triangle is made of three triplets, ABC, BCA and CAB. In contrast a chain of connected nodes A, B, C, in which B connects to A and C, but A does not link to C, forms a single open triplet ABC. The factor three in the numerator of (2.17) is due to the fact that each triangle is counted three times in the triplet count. The roots of the global clustering coefficient go back to the social network literature of the 1940s [17, 18], where C_Δ is often called the ratio of transitive triplets.

Note that the average clustering coefficient 〈C〉 defined in (2.16) and the global clustering coefficient (2.17) are not equivalent. Indeed, take a network that is a double star, consisting of N nodes, where nodes 1 and 2 are joined to each other and to all other nodes, and there are no other links. Then the local clustering coefficient C_i is 1 for i ≥ 3 and 2/(N − 1) for i = 1, 2. It follows that the average clustering coefficient of the network is 〈C〉 = 1 − O(1/N), while the global clustering coefficient vanishes for large N as C_Δ ~ 1/N: the double star has N − 2 triangles and N(N − 2) connected triplets, so (2.17) gives C_Δ = 3/N. In less extreme networks the two definitions will give more comparable values, but they still differ from each other [19]. For example, for the network in Figure 2.16b we have 〈C〉 = 0.31 and C_Δ = 0.375.
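The gap between the two clustering measures on the double star can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, nodes with fewer than two neighbors are taken to contribute C_i = 0 to the average, and connected triplets are counted as pairs of neighbors around a central node, which matches the counting used for (2.17).

```python
from itertools import combinations

def double_star(n):
    """Adjacency sets of the double star: nodes 1 and 2 are linked to each other and to 3..n."""
    adj = {i: set() for i in range(1, n + 1)}
    adj[1].add(2)
    adj[2].add(1)
    for i in range(3, n + 1):
        for hub in (1, 2):
            adj[hub].add(i)
            adj[i].add(hub)
    return adj

def average_clustering(adj):
    """Mean of the local clustering coefficients C_i (C_i = 0 assumed when k_i < 2)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            closed = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            total += 2.0 * closed / (k * (k - 1))
    return total / len(adj)

def global_clustering(adj):
    """3 * triangles / connected triplets, counting triplets around each central node."""
    closed = 0    # closed triplets; every triangle is counted once per corner, i.e. 3 times
    triplets = 0  # all connected triplets, open or closed
    for nbrs in adj.values():
        k = len(nbrs)
        triplets += k * (k - 1) // 2
        closed += sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return closed / triplets

for n in (10, 100, 1000):
    adj = double_star(n)
    print(n, average_clustering(adj), global_clustering(adj), 3 / n)
```

As N grows, the average clustering coefficient approaches 1 while the global one shrinks toward zero, which is the contrast the double star is meant to expose.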


SECTION 2.14

BIBLIOGRAPHY

[1] K.-I. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal, and A.-L. Barabási. The human disease network. PNAS, 104:8685–8690, 2007.

[2] H. U. Obrist. Mapping it out: An alternative atlas of contemporary cartographies. Thames and Hudson, London, 2014.

[3] I. Meirelles. Design for Information. Rockport, 2013.

[4] K. Börner. Atlas of Science: Visualizing What We Know. The MIT Press, 2010.

[5] L. B. Larsen. Networks: Documents of Contemporary Art. MIT Press, 2014.

[6] L. Euler. Solutio Problematis ad Geometriam Situs Pertinentis. Commentarii Academiae Scientiarum Imperialis Petropolitanae, 8:128–140, 1741.

[7] G. Alexanderson. Euler and Königsberg’s bridges: a historical view. Bulletin of the American Mathematical Society, 43:567, 2006.

[8] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.

[9] G. Gilder. Metcalfe’s law and legacy. Forbes ASAP, 1993.

[10] B. Briscoe, A. Odlyzko, and B. Tilly. Metcalfe’s law is wrong. IEEE Spectrum, 43:34–39, 2006.

[11] Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, and A.-L. Barabási. Flavor network and the principles of food pairing. Scientific Reports, 196, 2011.

[12] D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:440–442, 1998.

[13] A. Barrat, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani. The architecture of complex weighted networks. PNAS, 101:3747–3752, 2004.

[14] J. P. Onnela, J. Saramäki, J. Kertész, and K. Kaski. Intensity and coherence of motifs in weighted complex networks. Physical Review E, 71:065103, 2005.

[15] B. Zhang and S. Horvath. A general framework for weighted gene coexpression network analysis. Statistical Applications in Genetics and Molecular Biology, 4:17, 2005.

[16] P. Holme, S. M. Park, J. B. Kim, and C. R. Edling. Korean university life in a network perspective: Dynamics of a large affiliation network. Physica A, 373:821–830, 2007.

[17] R. D. Luce and A. D. Perry. A method of matrix analysis of group structure. Psychometrika, 14:95–116, 1949.

[18] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.

[19] B. Bollobás and O. M. Riordan. Mathematical results on scale-free random graphs. In S. Bornholdt and H. G. Schuster (eds.), Handbook of Graphs and Networks: From the Genome to the Internet. Wiley-VCH, 2003.

3 ALBERT-LÁSZLÓ BARABÁSI

NETWORK SCIENCE RANDOM NETWORKS

ACKNOWLEDGEMENTS

MÁRTON PÓSFAI GABRIELE MUSELLA MAURO MARTINO ROBERTA SINATRA

SARAH MORRISON AMAL HUSSEINI PHILIPP HOEVEL

INDEX

1 Introduction
2 The Random Network Model
3 Number of Links
4 Degree Distribution
5 Real Networks are Not Poisson
6 The Evolution of a Random Network
7 Real Networks are Supercritical
8 Small Worlds
9 Clustering Coefficient
10 Summary: Real Networks are Not Random
11 Homework
12 ADVANCED TOPICS 3.A Deriving the Poisson Distribution
13 ADVANCED TOPICS 3.B Maximum and Minimum Degrees
14 ADVANCED TOPICS 3.C Giant Component
15 ADVANCED TOPICS 3.D Component Sizes
16 ADVANCED TOPICS 3.E Fully Connected Regime
17 ADVANCED TOPICS 3.F Phase Transitions
18 ADVANCED TOPICS 3.G Small World Corrections
19 Bibliography

Figure 3.0 (cover image) Erdős Number

The Hungarian mathematician Pál Erdős authored hundreds of research papers, many of them in collaboration with other mathematicians. His relentless collaborative approach to mathematics inspired the Erdős Number, which works like this: Erdős’ Erdős number is 0. Erdős’ coauthors have Erdős number 1. Those who have written a paper with someone with Erdős number 1 have Erdős number 2, and so on. If there is no chain of coauthorships connecting someone to Erdős, then that person’s Erdős number is infinite. Many famous scientists have low Erdős numbers: Albert Einstein has Erdős Number 2 and Richard Feynman has 3. The image shows the collaborators of Pál Erdős, as drawn in 1970 by Ronald Graham, one of Erdős’ close collaborators. As Erdős’ fame rose, this image has achieved an iconic status.

This work is licensed under a Creative Commons: CC BY-NC-SA 2.0. PDF V32, 08.09.2014

SECTION 3.1

INTRODUCTION

Imagine organizing a party for a hundred guests who initially do not know each other [1]. Offer them wine and cheese and you will soon see them chatting in groups of two to three. Now mention to Mary, one of your guests, that the red wine in the unlabeled dark green bottles is a rare vintage, much better than the one with the fancy red label. If she shares this information only with her acquaintances, your expensive wine appears to be safe, as she only had time to meet a few others so far.

The guests will continue to mingle, however, creating subtle paths between individuals that may still be strangers to each other. For example, while John has not yet met Mary, they have both met Mike, so there is an invisible path from John to Mary through Mike. As time goes on, the guests will be increasingly interwoven by such elusive links. With that the secret of the unlabeled bottle will pass from Mary to Mike and from Mike to John, escaping into a rapidly expanding group.

To be sure, when all guests had gotten to know each other, everyone would be pouring the superior wine. But if each encounter took only ten minutes, meeting all ninety-nine others would take about sixteen hours. Thus, you could reasonably hope that a few drops of your fine wine would be left for you to enjoy once the guests are gone.

Yet, you would be wrong. In this chapter we show you why. We will see that the party maps into a classic model in network science called the random network model. And random network theory tells us that we do not have to wait until all individuals get to know each other for our expensive wine to be in danger. Rather, soon after each person meets at least one other guest, an invisible network will emerge that will allow the information to reach all of them. Hence in no time everyone will be enjoying the better wine.

Figure 3.1 From a Cocktail Party to Random Networks

The emergence of an acquaintance network through random encounters at a cocktail party.

(a) Early: Early on the guests form isolated groups.

(b) Later: As individuals mingle, changing groups, an invisible network emerges that connects all of them into a single network.

SECTION 3.2

THE RANDOM NETWORK MODEL

Network science aims to build models that reproduce the properties of real networks. Most networks we encounter do not have the comforting regularity of a crystal lattice or the predictable radial architecture of a spider web. Rather, at first inspection they look as if they were spun randomly (Figure 2.4). Random network theory embraces this apparent randomness by constructing and characterizing networks that are truly random.

From a modeling perspective a network is a relatively simple object, consisting of only nodes and links. The real challenge, however, is to decide where to place the links between the nodes so that we reproduce the complexity of a real system. In this respect the philosophy behind a random network is simple: We assume that this goal is best achieved by placing the links randomly between the nodes. That takes us to the definition of a random network (BOX 3.1):

A random network consists of N nodes where each node pair is connected with probability p.

To construct a random network we follow these steps:

1) Start with N isolated nodes.

2) Select a node pair and generate a random number between 0 and 1. If the number is smaller than p, connect the selected node pair with a link, otherwise leave them disconnected.

3) Repeat step (2) for each of the N(N-1)/2 node pairs.

The network obtained after this procedure is called a random graph or a random network. Two mathematicians, Pál Erdős and Alfréd Rényi, have played an important role in understanding the properties of these networks. In their honor a random network is called the Erdős-Rényi network (BOX 3.2).

BOX 3.1 DEFINING RANDOM NETWORKS

There are two definitions of a random network:

G(N, L) Model
N labeled nodes are connected with L randomly placed links. Erdős and Rényi used this definition in their string of papers on random networks [2-9].

G(N, p) Model
Each pair of N labeled nodes is connected with probability p, a model introduced by Gilbert [10].

Hence, the G(N, p) model fixes the probability p that two nodes are connected and the G(N, L) model fixes the total number of links L. While in the G(N, L) model the average degree of a node is simply 〈k〉 = 2L/N, other network characteristics are easier to calculate in the G(N, p) model. Throughout this book we will explore the G(N, p) model, not only for the ease that it allows us to calculate key network characteristics, but also because in real networks the number of links rarely stays fixed.
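The construction of a G(N, p) network can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name, the link-list output and the seed argument are assumptions of the example. A pair is linked when the drawn number falls below p, so that each pair ends up connected with probability p.

```python
import random
from itertools import combinations

def random_network(n, p, seed=None):
    """Generate one G(N, p) realization as a list of links (i, j) with i < j."""
    rng = random.Random(seed)  # a seed is only used to make the example repeatable
    links = []
    # Step 1: start with n isolated nodes, labeled 0..n-1.
    # Steps 2-3: visit each of the n(n-1)/2 node pairs once and draw a number in [0, 1).
    for i, j in combinations(range(n), 2):
        if rng.random() < p:  # happens with probability p
            links.append((i, j))
    return links

links = random_network(12, 1 / 6, seed=1)
print(len(links), links[:5])
```

Calling the function with different seeds gives different wiring diagrams and, in general, different numbers of links, even though N and p stay the same.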

BOX 3.2 RANDOM NETWORKS: A BRIEF HISTORY

Anatol Rapoport (1911-2007), a Russian immigrant to the United States, was the first to study random networks. Rapoport’s interests turned to mathematics after realizing that a successful career as a concert pianist would require a wealthy patron. He focused on mathematical biology at a time when mathematicians and biologists hardly spoke to each other. In a paper written with Ray Solomonoff in 1951 [11], Rapoport demonstrated that if we increase the average degree of a network, we observe an abrupt transition from disconnected nodes to a graph with a giant component.

The study of random networks reached prominence thanks to the fundamental work of Pál Erdős and Alfréd Rényi (Figure 3.2). In a sequence of eight papers published between 1959 and 1968 [2-9], they merged probability theory and combinatorics with graph theory, establishing random graph theory, a new branch of mathematics [2].

The random network model was independently introduced by Edgar Nelson Gilbert (1923-2013) [10] the same year Erdős and Rényi published their first paper on the subject. Yet, the impact of Erdős and Rényi’s work is so overwhelming that they are rightly considered the founders of random graph theory.

“A mathematician is a device for turning coffee into theorems”
Alfréd Rényi (a quote often attributed to Erdős)

Figure 3.2

(a) Pál Erdős (1913-1996)
Hungarian mathematician known for both his exceptional scientific output and eccentricity. Indeed, Erdős published more papers than any other mathematician in the history of mathematics. He co-authored papers with over five hundred mathematicians, inspiring the concept of Erdős number. His legendary personality and profound professional impact have inspired two biographies [12, 13] and a documentary [14] (Online Resource 3.1).

(b) Alfréd Rényi (1921-1970)
Hungarian mathematician with fundamental contributions to combinatorics, graph theory, and number theory. His impact goes beyond mathematics: The Rényi entropy is widely used in chaos theory and the random network theory he co-developed is at the heart of network science. He is remembered through the hotbed of Hungarian mathematics, the Alfréd Rényi Institute of Mathematics in Budapest.

Online Resource 3.1
N is a Number: A Portrait of Paul Erdős
The 1993 biographical documentary of Pál Erdős, directed by George Paul Csicsery, offers a glimpse into Erdős' life and scientific impact [14].

SECTION 3.3

NUMBER OF LINKS

Each random network generated with the same parameters N, p looks slightly different (Figure 3.3). Not only does the detailed wiring diagram change between realizations, but so does the number of links L. It is useful, therefore, to determine how many links we expect for a particular realization of a random network with fixed N and p.

The probability that a random network has exactly L links is the product of three terms:

1) The probability that L of the attempts to connect the N(N-1)/2 pairs of nodes have resulted in a link, which is p^L.

2) The probability that the remaining N(N-1)/2 - L attempts have not resulted in a link, which is (1-p)^{N(N-1)/2-L}.

3) A combinatorial factor,

$$\binom{\frac{N(N-1)}{2}}{L}, \tag{3.0}$$

counting the number of different ways we can place L links among N(N-1)/2 node pairs.

We can therefore write the probability that a particular realization of a random network has exactly L links as

$$p_L = \binom{\frac{N(N-1)}{2}}{L} p^L (1-p)^{\frac{N(N-1)}{2}-L}. \tag{3.1}$$

As (3.1) is a binomial distribution (BOX 3.3), the expected number of links in a random graph is

$$\langle L \rangle = \sum_{L=0}^{\frac{N(N-1)}{2}} L\, p_L = p\,\frac{N(N-1)}{2}. \tag{3.2}$$

Hence 〈L〉 is the product of the probability p that two nodes are connected and the number of pairs we attempt to connect, which is L_max = N(N-1)/2 (CHAPTER 2).

Using (3.2) we obtain the average degree of a random network

$$\langle k \rangle = \frac{2\langle L \rangle}{N} = p(N-1). \tag{3.3}$$

Hence 〈k〉 is the product of the probability p that two nodes are connected and (N-1), which is the maximum number of links a node can have in a network of size N.

In summary the number of links in a random network varies between realizations. Its expected value is determined by N and p. If we increase p a random network becomes denser: The average number of links increases linearly from 〈L〉 = 0 to L_max and the average degree of a node increases from 〈k〉 = 0 to 〈k〉 = N-1.

Figure 3.3 Random Networks are Truly Random

Top Row: Three realizations of a random network generated with the same parameters p=1/6 and N=12. Despite the identical parameters, the networks not only look different, but they have a different number of links as well (L=10, 10, 8).

Bottom Row: Three realizations of a random network with p=0.03 and N=100. Several nodes have degree k=0, shown as isolated nodes at the bottom.
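The expectations (3.2) and (3.3) can be compared with a direct simulation. The sketch below is an illustration only: the parameter values N = 100 and p = 0.03 are borrowed from the bottom row of Figure 3.3, while the number of runs, the seed and the function name are assumptions of the example.

```python
import random

def count_links(n, p, rng):
    """Number of links in one G(N, p) realization: one trial per node pair."""
    pairs = n * (n - 1) // 2
    return sum(1 for _ in range(pairs) if rng.random() < p)

n, p, runs = 100, 0.03, 500
rng = random.Random(42)  # fixed seed so that the example is repeatable
samples = [count_links(n, p, rng) for _ in range(runs)]

mean_L = sum(samples) / runs
expected_L = p * n * (n - 1) / 2   # Eq. (3.2)
mean_k = 2 * mean_L / n
expected_k = p * (n - 1)           # Eq. (3.3)

print("links:  simulated", mean_L, "expected", expected_L)
print("degree: simulated", mean_k, "expected", expected_k)
print("range of L across realizations:", min(samples), max(samples))
```

Individual realizations scatter around the expected value, while the average over many realizations should settle close to pN(N-1)/2.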

BOX 3.3 BINOMIAL DISTRIBUTION: MEAN AND VARIANCE

If we toss a fair coin N times, tails and heads occur with the same probability p = 1/2. The binomial distribution provides the probability p_x that we obtain exactly x heads in a sequence of N throws. In general, the binomial distribution describes the number of successes in N independent experiments with two possible outcomes, in which the probability of one outcome is p, and of the other is 1-p.

The binomial distribution has the form

$$p_x = \binom{N}{x} p^x (1-p)^{N-x}.$$

The mean of the distribution (first moment) is

$$\langle x \rangle = \sum_{x=0}^{N} x\, p_x = Np. \tag{3.4}$$

Its second moment is

$$\langle x^2 \rangle = \sum_{x=0}^{N} x^2 p_x = p(1-p)N + p^2 N^2, \tag{3.5}$$

providing its standard deviation as

$$\sigma_x = \left( \langle x^2 \rangle - \langle x \rangle^2 \right)^{\frac{1}{2}} = \left[ p(1-p)N \right]^{\frac{1}{2}}. \tag{3.6}$$

Equations (3.4) - (3.6) are used repeatedly as we characterize random networks.
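Equations (3.4) - (3.6) can be verified numerically by summing over the binomial distribution directly. The values N = 20 and p = 0.3 below are arbitrary choices made for this sketch.

```python
from math import comb, sqrt

def binomial_pmf(n, p):
    """Return [p_0, p_1, ..., p_n] for the binomial distribution."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

n, p = 20, 0.3  # arbitrary illustrative values
px = binomial_pmf(n, p)

mean = sum(x * q for x, q in enumerate(px))        # left side of (3.4)
second = sum(x * x * q for x, q in enumerate(px))  # left side of (3.5)
sigma = sqrt(second - mean**2)                     # left side of (3.6)

print(mean, n * p)
print(second, p * (1 - p) * n + p**2 * n**2)
print(sigma, sqrt(p * (1 - p) * n))
```

Each printed pair should agree up to floating point rounding.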

SECTION 3.4

DEGREE DISTRIBUTION

In a given realization of a random network some nodes gain numerous links, while others acquire only a few or no links (Figure 3.3). These differences are captured by the degree distribution, p_k, which is the probability that a randomly chosen node has degree k. In this section we derive p_k for a random network and discuss its properties.

BINOMIAL DISTRIBUTION

In a random network the probability that node i has exactly k links is the product of three terms [15]:

• The probability that k of its links are present, or p^k.

• The probability that the remaining (N-1-k) links are missing, or (1-p)^{N-1-k}.

• The number of ways we can select k links from N-1 potential links a node can have, or

$$\binom{N-1}{k}.$$

Consequently the degree distribution of a random network follows the binomial distribution

$$p_k = \binom{N-1}{k} p^k (1-p)^{N-1-k}. \tag{3.7}$$

The shape of this distribution depends on the system size N and the probability p (Figure 3.4). The binomial distribution (BOX 3.3) allows us to calculate the network’s average degree 〈k〉, recovering (3.3), as well as its second moment 〈k^2〉 and variance σ_k^2 (Figure 3.4).

Figure 3.4 Binomial vs. Poisson Degree Distribution

The exact form of the degree distribution of a random network is the binomial distribution (left half). For N ≫ 〈k〉 the binomial is well approximated by a Poisson distribution (right half). As both formulas describe the same distribution, they have the identical properties, but they are expressed in terms of different parameters: The binomial distribution depends on p and N, while the Poisson distribution has only one parameter, 〈k〉. It is this simplicity that makes the Poisson form preferred in calculations.

POISSON DISTRIBUTION

Most real networks are sparse, meaning that for them 〈k〉 ≪ N (Table 2.1). In this limit the degree distribution (3.7) is well approximated by the Poisson distribution (ADVANCED TOPICS 3.A)

$$p_k = e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!}, \tag{3.8}$$

which is often called, together with (3.7), the degree distribution of a random network.

The binomial and the Poisson distribution describe the same quantity, hence they have similar properties (Figure 3.4):

• Both distributions have a peak around 〈k〉. If we increase p the network becomes denser, increasing 〈k〉 and moving the peak to the right.

• The width of the distribution (dispersion) is also controlled by p or 〈k〉. The denser the network, the wider is the distribution, hence the larger are the differences in the degrees.

When we use the Poisson form (3.8), we need to keep in mind that:

• The exact result for the degree distribution is the binomial form (3.7), thus (3.8) represents only an approximation to (3.7) valid in the 〈k〉 ≪ N limit. As most networks of practical importance are sparse, this condition is typically satisfied.

• The advantage of the Poisson form is that key network characteristics, like 〈k〉, 〈k^2〉 and σ_k, have a much simpler form (Figure 3.4), depending on a single parameter, 〈k〉.

• The Poisson distribution in (3.8) does not explicitly depend on the number of nodes N. Therefore, (3.8) predicts that the degree distribution of networks of different sizes but the same average degree 〈k〉 are indistinguishable from each other (Figure 3.5).

In summary, while the Poisson distribution is only an approximation to the degree distribution of a random network, thanks to its analytical simplicity, it is the preferred form for p_k. Hence throughout this book, unless noted otherwise, we will refer to the Poisson form (3.8) as the degree distribution of a random network. Its key feature is that its properties are independent of the network size and depend on a single parameter, the average degree 〈k〉.

Figure 3.5 Degree Distribution is Independent of the Network Size

The degree distribution of a random network with 〈k〉 = 50 and N = 10^2, 10^3, 10^4.

Small Networks: Binomial
For a small network (N = 10^2) the degree distribution deviates significantly from the Poisson form (3.8), as the condition for the Poisson approximation, N ≫ 〈k〉, is not satisfied. Hence for small networks one needs to use the exact binomial form (3.7) (green line).

Large Networks: Poisson
For larger networks (N = 10^3, 10^4) the degree distribution becomes indistinguishable from the Poisson prediction (3.8), shown as a continuous grey line. Therefore for large N the degree distribution is independent of the network size. In the figure we averaged over 1,000 independently generated random networks to decrease the noise.
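How quickly the binomial form (3.7) approaches the Poisson form (3.8) can be checked by evaluating both for a fixed average degree and growing N, using the same values as Figure 3.5 (〈k〉 = 50 and N = 10^2, 10^3, 10^4). The sketch below is an illustration, not the code behind the figure: it evaluates the formulas directly instead of averaging over generated networks, works in log space to avoid overflow in the factorials, and the cutoff at k = 150 is an assumption that comfortably covers the region where either distribution is non-negligible.

```python
from math import exp, lgamma, log, log1p

def binomial_degree(k, n, p):
    """Exact degree distribution (3.7), computed through logarithms."""
    log_choose = lgamma(n) - lgamma(k + 1) - lgamma(n - k)  # log of C(n-1, k)
    return exp(log_choose + k * log(p) + (n - 1 - k) * log1p(-p))

def poisson_degree(k, avg_k):
    """Poisson approximation (3.8), computed through logarithms."""
    return exp(-avg_k + k * log(avg_k) - lgamma(k + 1))

avg_k = 50
gaps = []
for n in (10**2, 10**3, 10**4):
    p = avg_k / (n - 1)      # chosen so that p(N-1) equals the target average degree
    k_max = min(n - 1, 150)  # assumed cutoff, far beyond the peak at k = 50
    gap = max(abs(binomial_degree(k, n, p) - poisson_degree(k, avg_k))
              for k in range(k_max + 1))
    gaps.append(gap)
    print(n, gap)
```

The largest pointwise difference between the two distributions shrinks as N grows, in line with the statement that the Poisson form is valid for 〈k〉 ≪ N.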

SECTION 3.5

REAL NETWORKS ARE NOT POISSON

As the degree of a node in a random network can vary between 0 and N-1, we must ask, how big are the differences between the node degrees in a particular realization of a random network? That is, can high degree nodes coexist with small degree nodes? We address these questions by estimating the size of the largest and the smallest node in a random network. Let us assume that the world’s social network is described by the random network model. This random society may not be as far fetched as it first sounds: There is significant randomness in whom we meet and whom we choose to become acquainted with. Sociologists estimate that a typical person knows about 1,000 individuals on a first name basis, prompting us to assume that ≈ 1,000. Using the results obtained so far about random networks, we arrive to a number of intriguing conclusions about a random society of N ≃ 7 x 109 of individuals (ADVANCED TOPICS 3.B): • The most connected individual (the largest degree node) in a random society is expected to have kmax = 1,185 acquaintances. • The degree of the least connected individual is kmin = 816, not that different from kmax or .

• The dispersion of a random network is σk = ⟨k⟩^(1/2), which for ⟨k⟩ = 1,000 is σk = 31.62. This means that the number of friends a typical individual has is in the ⟨k⟩ ± σk range, or between 968 and 1,032, a rather narrow window.

Taken together, in a random society all individuals are expected to have a comparable number of friends. Hence if people are randomly connected to each other, we lack outliers: There are no highly popular individuals, and no one is left behind, having only a few friends. This surprising conclusion is a consequence of an important property of random networks: in a large random network the degree of most nodes is in the narrow vicinity of ⟨k⟩ (BOX 3.4).
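The narrow degree window of the random society can be explored numerically. The sketch below is one possible reading, not the derivation of ADVANCED TOPICS 3.B: it assumes that kmax is the smallest degree k for which fewer than one of the N nodes is expected to have degree k or higher under the Poisson form (3.8), and that kmin is defined symmetrically from the lower tail. The function names are hypothetical, and the exact-tail estimate need not reproduce the quoted 1,185 and 816, which rest on the approximations of that section.

```python
import math


def poisson_log_pmf(k: int, mean: float) -> float:
    """Natural log of the Poisson probability p_k = e^{-mean} mean^k / k!."""
    return -mean + k * math.log(mean) - math.lgamma(k + 1)


def upper_tail(k_start: int, mean: float, terms: int = 3000) -> float:
    """P(K >= k_start); the sum is truncated after `terms` terms."""
    return sum(math.exp(poisson_log_pmf(k, mean))
               for k in range(k_start, k_start + terms))


def lower_tail(k_end: int, mean: float) -> float:
    """P(K <= k_end)."""
    return sum(math.exp(poisson_log_pmf(k, mean)) for k in range(k_end + 1))


def estimate_k_max(n_nodes: float, mean: float) -> int:
    """Smallest k for which at most one node is expected to have degree >= k."""
    k = int(mean)
    while n_nodes * upper_tail(k, mean) > 1.0:
        k += 1
    return k


def estimate_k_min(n_nodes: float, mean: float) -> int:
    """Largest k for which at most one node is expected to have degree <= k."""
    k = int(mean)
    while k > 0 and n_nodes * lower_tail(k, mean) > 1.0:
        k -= 1
    return k


N_NODES = 7e9        # size of the random society used in the text
MEAN_DEGREE = 1000   # <k>, the sociologists' estimate quoted in the text

sigma_k = math.sqrt(MEAN_DEGREE)             # dispersion of a Poisson distribution
k_max = estimate_k_max(N_NODES, MEAN_DEGREE)
k_min = estimate_k_min(N_NODES, MEAN_DEGREE)

print(f"sigma_k = {sigma_k:.2f}")
print(f"typical window: {MEAN_DEGREE - sigma_k:.0f} to {MEAN_DEGREE + sigma_k:.0f}")
print(f"estimated k_max = {k_max}, estimated k_min = {k_min}")
```

Whatever the precise definition of the extremes, both estimates stay within a few σk of ⟨k⟩, which is the point of the argument: a Poisson degree distribution leaves no room for outliers.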

This prediction blatantly conflicts with reality. Indeed, there is extensive evidence of individuals who have considerably more than 1,185 acquaintances. For example, US president Franklin Delano Roosevelt’s appointment book has about 22,000 names, individuals he met personally [16, 17]. Similarly, a study of the social network behind Facebook has documented numerous individuals with 5,000 Facebook friends, the maximum allowed by the social networking platform [18]. To understand the origin of these discrepancies we must compare the degree distribution of real and random networks.

In Figure 3.6 we show the degree distribution of three real networks, together with the corresponding Poisson fit. The figure documents systematic differences between the random network predictions and the real data:
• The Poisson form significantly underestimates the number of high degree nodes. For example, according to the random network model the maximum degree of the Internet is expected to be around 20. In contrast the data indicates the existence of routers with degrees close to 10^3.
• The spread in the degrees of real networks is much wider than expected in a random network. This difference is captured by the dispersion σk (Figure 3.4). If the Internet were to be random, we would expect σk = 2.52. The measurements indicate σinternet = 14.14, significantly higher than the random prediction. These differences are not limited to the networks shown in Figure 3.6, but all networks listed in Table 2.1 share this property.

In summary, the comparison with the real data indicates that the random network model does not capture the degree distribution of real networks. In a random network most nodes have comparable degrees, forbidding hubs. In contrast, in real networks we observe a significant number of highly connected nodes and there are large differences in node degrees. We will resolve these differences in CHAPTER 4.

BOX 3.4
WHY ARE HUBS MISSING?

To understand why hubs, nodes with a very large degree, are absent in random networks, we turn to the degree distribution (3.8). We first note that the 1/k! term in (3.8) significantly decreases the chances of observing large degree nodes. Indeed, the Stirling approximation

$$ k! \approx \sqrt{2\pi k} \left( \frac{k}{e} \right)^{k} $$

allows us to rewrite (3.8) as

$$ p_k = \frac{e^{-\langle k \rangle}}{\sqrt{2\pi k}} \left( \frac{e \langle k \rangle}{k} \right)^{k}. $$  (3.9)

For degrees k > e⟨k⟩ the term in the parenthesis is smaller than one, hence for large k both k-dependent terms in (3.9), i.e. 1/√k and (e⟨k⟩/k)^k, decrease rapidly with increasing k. Overall (3.9) predicts that in a random network the chance of observing a hub decreases faster than exponentially.
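The Stirling form (3.9) can be checked against the exact Poisson probability. The following is a minimal sketch, assuming ⟨k⟩ = 6.34 (the Internet's average degree quoted in this chapter); the function names are illustrative only.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Exact Poisson probability (3.8), evaluated in log space for stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def stirling_pmf(k: int, mean: float) -> float:
    """Approximation (3.9): e^{-<k>} / sqrt(2 pi k) * (e <k> / k)^k."""
    return math.exp(-mean) / math.sqrt(2 * math.pi * k) * (math.e * mean / k) ** k


MEAN_DEGREE = 6.34  # assumed value: <k> of the Internet map used in the text

for k in (10, 20, 50, 100):
    exact = poisson_pmf(k, MEAN_DEGREE)
    approx = stirling_pmf(k, MEAN_DEGREE)
    # The ratio of successive probabilities, <k>/(k+1), keeps shrinking with k,
    # which is what "faster than exponential" decay means here.
    step_ratio = poisson_pmf(k + 1, MEAN_DEGREE) / exact
    print(f"k={k:4d}  exact={exact:.3e}  stirling={approx:.3e}  p(k+1)/p(k)={step_ratio:.3f}")
```

For a pure exponential the ratio p(k+1)/p(k) would be constant; here it falls as ⟨k⟩/(k+1), so each additional link makes a hub less likely by a growing factor.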

Figure 3.6
Degree Distribution of Real Networks
[Three panels plotting pk against k on logarithmic axes: (a) Internet, (b) Science Collaboration, (c) Protein Interactions. The position of ⟨k⟩ is marked in each panel.]
The degree distribution of the (a) Internet, (b) science collaboration network, and (c) protein interaction network (Table 2.1). The green line corresponds to the Poisson prediction, obtained by measuring ⟨k⟩ for the real network and then plotting (3.8). The significant deviation between the data and the Poisson fit indicates that the random network model underestimates the size and the frequency of the high degree nodes, as well as the number of low degree nodes. Instead the random network model predicts a larger number of nodes in the vicinity of ⟨k⟩ than seen in real networks.


SECTION 3.6

THE EVOLUTION OF A RANDOM NETWORK

The cocktail party we encountered at the beginning of this chapter captures a dynamical process: Starting with N isolated nodes, the links are added gradually through random encounters between the guests. This corresponds to a gradual increase of p, with striking consequences on the network topology (Online Resource 3.2). To quantify this process, we first inspect how the size of the largest connected cluster within the network, NG, varies with ⟨k⟩. Two extreme cases are easy to understand:
• For p = 0 we have ⟨k⟩ = 0, hence all nodes are isolated. Therefore the largest component has size NG = 1 and NG/N→0 for large N.
• For p = 1 we have ⟨k⟩ = N-1, hence the network is a complete graph and all nodes belong to a single component. Therefore NG = N and NG/N = 1.

One would expect that the largest component grows gradually from NG = 1 to NG = N if ⟨k⟩ increases from 0 to N-1. Yet, as Figure 3.7a indicates, this is not the case: NG/N remains zero for small ⟨k⟩, indicating the lack of a large cluster. Once ⟨k⟩ exceeds a critical value, NG/N increases, signaling the rapid emergence of a large cluster that we call the giant component.

Online Resource 3.2
Evolution of a Random Network
A video showing the change in the structure of a random network with increasing p. It vividly illustrates the absence of a giant component for small p and its sudden emergence once p reaches a critical value.

Erdős and Rényi in their classical 1959 paper predicted that the condition for the emergence of the giant component is [2]

$$ \langle k \rangle = 1. $$  (3.10)

In other words, we have a giant component if and only if each node has on average more than one link (ADVANCED TOPICS 3.C). The fact that we need at least one link per node to observe a giant component is not unexpected. Indeed, for a giant component to exist, each of its nodes must be linked to at least one other node. It is somewhat counterintuitive, however, that one link is sufficient for its emergence. We can express (3.10) in terms of p using (3.3), obtaining

$$ p_c = \frac{1}{N-1} \approx \frac{1}{N}. $$  (3.11)

Therefore the larger a network, the smaller the p that is sufficient for the giant component. The emergence of the giant component is only one of the transitions characterizing a random network as we change ⟨k⟩. We can distinguish four topologically distinct regimes (Figure 3.7a), each with its unique characteristics:

Subcritical Regime: 0 < ⟨k⟩ < 1 (p < 1/N, Figure 3.7b).

For ⟨k⟩ = 0 the network consists of N isolated nodes. Increasing ⟨k⟩ means that we are adding N⟨k⟩/2 = pN(N-1)/2 links to the network. Yet, given that ⟨k⟩ < 1, we have only a small number of links in this regime, hence we mainly observe tiny clusters (Figure 3.7b). We can designate at any moment the largest cluster to be the giant component. Yet in this regime the relative size of the largest cluster, NG/N, remains zero. The reason is that for ⟨k⟩ < 1 the largest cluster is a tree with size NG ~ lnN, hence its size increases much slower than the size of the network. Therefore NG/N ≃ lnN/N→0 in the N→∞ limit.

In summary, in the subcritical regime the network consists of numerous tiny components, whose size follows the exponential distribution (3.35). Hence these components have comparable sizes, lacking a clear winner that we could designate as a giant component.

Critical Point: ⟨k⟩ = 1 (p = 1/N, Figure 3.7c).

The critical point separates the regime where there is not yet a giant component (⟨k⟩ < 1) from the regime where there is one (⟨k⟩ > 1). At this point the relative size of the largest component is still zero (Figure 3.7c). Indeed, the size of the largest component is NG ~ N^(2/3). Consequently NG grows much slower than the network’s size, so its relative size decreases as NG/N ~ N^(-1/3) in the N→∞ limit.

Note, however, that in absolute terms there is a significant jump in the size of the largest component at ⟨k⟩ = 1. For example, for a random network with N = 7 × 10^9 nodes, comparable to the globe’s social network, for ⟨k⟩ < 1 the largest cluster is of the order of NG ≃ lnN = ln(7 × 10^9) ≃ 22.7. In contrast at ⟨k⟩ = 1 we expect NG ~ N^(2/3) = (7 × 10^9)^(2/3) ≃ 3.7 × 10^6, a jump of about five orders of magnitude. Yet, both in the subcritical regime and at the critical point the largest component contains only a vanishing fraction of the total number of nodes in the network.

In summary, at the critical point most nodes are located in numerous small components, whose size distribution follows (3.36). The power law form indicates that components of rather different sizes coexist. These numerous small components are mainly trees, while the giant component may contain loops. Note that many properties of the network at the critical point resemble the properties of a physical system undergoing a phase transition (ADVANCED TOPICS 3.F).

Figure 3.7
Evolution of a Random Network
(a) The relative size of the giant component, NG/N, as a function of the average degree ⟨k⟩ in the Erdős-Rényi model. The figure illustrates the phase transition at ⟨k⟩ = 1, responsible for the emergence of a giant component with nonzero NG.
(b-e) A sample network and its properties in the four regimes that characterize a random network.
(b) Subcritical Regime, ⟨k⟩ < 1
• No giant component
• Cluster size distribution: ps ~ s^(-3/2) e^(-αs)
• Size of the largest cluster: NG ~ lnN
• The clusters are trees
(c) Critical Point, ⟨k⟩ = 1
• No giant component
• Cluster size distribution: ps ~ s^(-3/2)
• Size of the largest cluster: NG ~ N^(2/3)
• The clusters may contain loops
(d) Supercritical Regime, ⟨k⟩ > 1
• Single giant component
• Cluster size distribution: ps ~ s^(-3/2) e^(-αs)
• Size of the giant component: NG ~ (p - pc)N
• The small clusters are trees
• Giant component has loops
(e) Connected Regime, ⟨k⟩ » lnN
• Single giant component
• No isolated nodes or clusters
• Size of the giant component: NG = N
• Giant component has loops

Supercritical Regime: ⟨k⟩ > 1 (p > 1/N, Figure 3.7d).

This regime has the most relevance to real systems, as for the first time we have a giant component that looks like a network. In the vicinity of the critical point the size of the giant component varies as

$$ N_G / N \sim \langle k \rangle - 1, $$  (3.12)

or

$$ N_G \sim (p - p_c) N, $$  (3.13)

where pc is given by (3.11). In other words, the giant component contains a finite fraction of the nodes. The further we move from the critical point, the larger the fraction of nodes that belong to it. Note that (3.12) is valid only in the vicinity of ⟨k⟩ = 1. For large ⟨k⟩ the dependence between NG and ⟨k⟩ is nonlinear (Figure 3.7a).

In summary, in the supercritical regime numerous isolated components coexist with the giant component, their size distribution following (3.35). These small components are trees, while the giant component contains loops and cycles. The supercritical regime lasts until all nodes are absorbed by the giant component.

Connected Regime: ⟨k⟩ > lnN (p > lnN/N, Figure 3.7e).

For sufficiently large p the giant component absorbs all nodes and components, hence NG ≃ N. In the absence of isolated nodes the network becomes connected. The average degree at which this happens depends on N as (ADVANCED TOPICS 3.E)

$$ \langle k \rangle = \ln N. $$  (3.14)

Note that when we enter the connected regime the network is still relatively sparse, as lnN / N → 0 for large N. The network turns into a complete graph only at ⟨k⟩ = N - 1.

In summary, the random network model predicts that the emergence of a network is not a smooth, gradual process: The isolated nodes and tiny components observed for small ⟨k⟩ collapse into a giant component through a phase transition (ADVANCED TOPICS 3.F). As we vary ⟨k⟩ we encounter four topologically distinct regimes (Figure 3.7). The discussion offered above follows an empirical perspective, fruitful if we wish to compare a random network to real systems. A different perspective, with its own rich behavior, is offered by the mathematical literature (BOX 3.5).
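The four regimes can be observed in a small simulation. The sketch below is an illustration under stated assumptions rather than the procedure behind Figure 3.7: it samples a single G(N, p) network per ⟨k⟩ value with N = 1,000 (an arbitrary choice), uses p = ⟨k⟩/(N-1), and tracks components with a union-find structure. All names are hypothetical.

```python
import random
from collections import Counter


def largest_component_fraction(n_nodes: int, mean_degree: float, seed: int = 0) -> float:
    """Sample one G(N, p) network and return N_G / N for it."""
    rng = random.Random(seed)
    p = mean_degree / (n_nodes - 1)      # inverts <k> = p (N - 1)
    parent = list(range(n_nodes))        # union-find forest, one tree per component

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Visit every node pair once and link it with probability p.
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < p:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j

    component_sizes = Counter(find(i) for i in range(n_nodes))
    return max(component_sizes.values()) / n_nodes


N_NODES = 1000  # small enough to run in seconds; ln(1000) is about 6.9
giant_fraction = {k: largest_component_fraction(N_NODES, k)
                  for k in (0.5, 1.0, 2.0, 4.0, 8.0)}

for k, frac in giant_fraction.items():
    print(f"<k> = {k:3.1f}   N_G / N = {frac:.3f}")
```

With a network this small the transition at ⟨k⟩ = 1 is smeared out, yet the qualitative picture survives: a negligible largest cluster below the critical point, a giant component holding most nodes above it, and a nearly connected network once ⟨k⟩ exceeds lnN.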


BOX 3.5 NETWORK EVOLUTION IN GRAPH THEORY.

In the random graph literature it is often assumed that the connection probability p(N) scales as N^z, where z is a tunable parameter between -∞ and 0 [15]. In this language Erdős and Rényi discovered that as we vary z, key properties of random graphs appear quite suddenly. A graph has a given property Q if the probability of having Q approaches 1 as N → ∞. That is, for a given z either almost every graph has the property Q or almost no graph has it. For example, for z less than -3/2 almost all graphs contain only isolated nodes and pairs of nodes connected by a link. Once z exceeds -3/2, most networks will contain paths connecting three or more nodes (Figure 3.8).

Figure 3.8
Evolution of a Random Graph
[Axis: the exponent z in p ~ N^z, with thresholds marked at z = -∞, -2, -3/2, -4/3, -5/4, -1, -2/3 and -1/2.]
The threshold probabilities at which different subgraphs appear in a random graph, as defined by the exponent z in the p(N) ~ N^z relationship. For z < -3/2 the graph consists of isolated nodes and edges. When z passes -3/2 trees of order 3 appear, while at z = -4/3 trees of order 4 appear. At z = -1 trees of all orders are present, together with cycles of all orders. Complete subgraphs of order 4 appear at z = -2/3, and as z increases further, complete subgraphs of larger and larger order emerge. After [19].
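The thresholds listed in Figure 3.8 follow a simple pattern. The sketch below assumes the standard random graph result, not stated explicitly in this box, that a subgraph with n nodes and l links appears at z = -n/l; the function names are illustrative.

```python
from fractions import Fraction


def subgraph_threshold(n_nodes: int, n_links: int) -> Fraction:
    """Exponent z in p ~ N^z at which a subgraph with the given node and
    link counts appears (assumed rule: z = -n/l)."""
    return Fraction(-n_nodes, n_links)


def tree_threshold(order: int) -> Fraction:
    """A tree of the given order has order - 1 links."""
    return subgraph_threshold(order, order - 1)


def cycle_threshold(order: int) -> Fraction:
    """A cycle of the given order has as many links as nodes."""
    return subgraph_threshold(order, order)


def complete_subgraph_threshold(order: int) -> Fraction:
    """A complete subgraph of the given order has order (order - 1) / 2 links."""
    return subgraph_threshold(order, order * (order - 1) // 2)


for order in (2, 3, 4, 5):
    print(f"tree of order {order}: z = {tree_threshold(order)}")
for order in (3, 4, 5):
    print(f"cycle of order {order}: z = {cycle_threshold(order)}")
for order in (4, 5):
    print(f"complete subgraph of order {order}: z = {complete_subgraph_threshold(order)}")
```

Under this rule the tree thresholds -2, -3/2, -4/3, -5/4 approach -1 from below, all cycles share z = -1, and complete subgraphs of order 4 and 5 appear at -2/3 and -1/2, matching the values marked on the axis of Figure 3.8.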


SECTION 3.7

REAL NETWORKS ARE SUPERCRITICAL

Two predictions of random network theory are of direct importance for real networks:
1) Once the average degree exceeds ⟨k⟩ = 1, a giant component should emerge that contains a finite fraction of all nodes. Hence only for ⟨k⟩ > 1 the nodes organize themselves into a recognizable network.
2) For ⟨k⟩ > lnN all components are absorbed by the giant component, resulting in a single connected network.

Table 3.1
Are Real Networks Connected?

NETWORK                  N          L             ⟨k⟩      lnN
Internet                 192,244    609,066       6.34     12.17
Power Grid               4,941      6,594         2.67     8.51
Science Collaboration    23,133     93,439        8.08     10.05
Actor Network            702,388    29,397,908    83.71    13.46
Protein Interactions     2,018      2,930         2.90     7.61

The number of nodes N and links L for the undirected networks of our reference network list of Table 2.1, shown together with ⟨k⟩ and lnN. A giant component is expected for ⟨k⟩ > 1 and all nodes should join the giant component for ⟨k⟩ > lnN. While for all networks ⟨k⟩ > 1, for most ⟨k⟩ is under the lnN threshold (see also Figure 3.9).

Do real networks satisfy the criteria for the existence of a giant component, i.e. ⟨k⟩ > 1? And will this giant component contain all nodes for ⟨k⟩ > lnN, or will we continue to see some disconnected nodes and components? To answer these questions we compare the structure of a real network for a given ⟨k⟩ with the theoretical predictions discussed above.

The measurements indicate that real networks extravagantly exceed the ⟨k⟩ = 1 threshold. Indeed, sociologists estimate that an average person has around 1,000 acquaintances; a typical neuron in the human brain has about 7,000 synapses; in our cells each molecule takes part in several chemical reactions. This conclusion is supported by Table 3.1, which lists the average degree of several undirected networks, in each case finding ⟨k⟩ > 1. Hence the average degree of real networks is well beyond the ⟨k⟩ = 1 threshold, implying that they all have a giant component. The same is true for the reference networks listed in Table 2.1.

Let us now turn to the second prediction, inspecting if we have a single component (i.e. if ⟨k⟩ > lnN), or if the network is fragmented into multiple components (i.e. if ⟨k⟩ < lnN). For social networks the transition between the supercritical and the fully connected regime should be at ⟨k⟩ > ln(7 × 10^9) ≈ 22.7. That is, if the average individual has more than two dozen acquaintances, then a random society must have a single component, leaving no individual disconnected. With ⟨k⟩ ≈ 1,000 this condition is clearly satisfied. Yet, according to Table 3.1 many real networks do not obey the fully connected criterion. Consequently, according to random network theory these networks should be fragmented into several disconnected components. This is a disconcerting prediction for the Internet, indicating that some routers should be disconnected from the giant component, being unable to communicate with other routers. It is equally problematic for the power grid, indicating that some consumers should not get power. These predictions are clearly at odds with reality.

In summary, we find that most real networks are in the supercritical regime (Figure 3.9). Therefore these networks are expected to have a giant component, which is in agreement with the observations. Yet, this giant component should coexist with many disconnected components, a prediction that fails for several real networks. Note that these predictions should be valid only if real networks are accurately described by the Erdős-Rényi model, i.e. if real networks are random. In the coming chapters, as we learn more about the structure of real networks, we will understand why real networks can stay connected despite failing the ⟨k⟩ > lnN criterion.
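The comparison summarized in Table 3.1 takes only a few lines to reproduce. The sketch below assumes undirected networks, so that ⟨k⟩ = 2L/N, and places each network in a regime by comparing ⟨k⟩ with 1 and lnN. The node and link counts are those listed for the reference networks in this chapter (the science collaboration link count is taken as 93,439, the value given in Table 3.2); the function name is hypothetical.

```python
import math

# (N, L) for the undirected reference networks of Table 3.1.
NETWORKS = {
    "Internet": (192_244, 609_066),
    "Power Grid": (4_941, 6_594),
    "Science Collaboration": (23_133, 93_439),  # L as listed in Table 3.2
    "Actor Network": (702_388, 29_397_908),
    "Protein Interactions": (2_018, 2_930),
}


def classify(n_nodes: int, n_links: int) -> tuple[float, float, str]:
    """Return (<k>, ln N, regime) for an undirected network."""
    mean_degree = 2 * n_links / n_nodes   # each link contributes to two degrees
    log_n = math.log(n_nodes)
    if mean_degree < 1:
        regime = "subcritical"
    elif mean_degree == 1:
        regime = "critical"
    elif mean_degree <= log_n:
        regime = "supercritical"
    else:
        regime = "connected"
    return mean_degree, log_n, regime


results = {name: classify(n, l) for name, (n, l) in NETWORKS.items()}
for name, (mean_degree, log_n, regime) in results.items():
    print(f"{name:22s} <k> = {mean_degree:6.2f}   lnN = {log_n:5.2f}   {regime}")
```

Only the actor network clears the lnN threshold; the other four sit between 1 and lnN, which is why random network theory predicts that they should be fragmented.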

Figure 3.9
Most Real Networks are Supercritical
[Diagram: for each of the Internet, Power Grid, Science Collaboration, Actor Network and Yeast Protein Interactions networks, a ⟨k⟩ axis (ticks at 1 and 10) divided into SUBCRITICAL, SUPERCRITICAL and FULLY CONNECTED regions.]
The four regimes predicted by random network theory, marking with a cross the location (⟨k⟩) of the undirected networks listed in Table 3.1. The diagram indicates that most networks are in the supercritical regime, hence they are expected to be broken into numerous isolated components. Only the actor network is in the connected regime, meaning that all nodes are part of a single giant component. Note that while the boundary between the subcritical and the supercritical regime is always at ⟨k⟩ = 1, the boundary between the supercritical and the connected regime is at lnN, which varies from system to system.

SECTION 3.8

SMALL WORLDS

The small world phenomenon, also known as six degrees of separation, has long fascinated the general public. It states that if you choose any two individuals anywhere on Earth, you will find a path of at most six acquaintances between them (Figure 3.10). The fact that individuals who live in the same city are only a few handshakes from each other is by no means surprising. The small world concept states, however, that even individuals who are on the opposite side of the globe can be connected to us via a few acquaintances.

In the language of network science the small world phenomenon implies that the distance between two randomly chosen nodes in a network is short. This statement raises two questions: What does short (or small) mean, i.e. short compared to what? How do we explain the existence of these short distances?

Figure 3.10
Six Degrees of Separation
[Diagram: a chain of acquaintances linking Sarah, Ralph, Jane and Peter.]
According to six degrees of separation two individuals, anywhere in the world, can be connected through a chain of six or fewer acquaintances. This means that while Sarah does not know Peter, she knows Ralph, who knows Jane and who in turn knows Peter. Hence Sarah is three handshakes, or three degrees from Peter. In the language of network science six degrees, also called the small world property, means that the distance between any two nodes in a network is unexpectedly small.

Both questions are answered by a simple calculation. Consider a random network with average degree ⟨k⟩. A node in this network has on average:
⟨k⟩ nodes at distance one (d = 1).
⟨k⟩^2 nodes at distance two (d = 2).
⟨k⟩^3 nodes at distance three (d = 3).
...
⟨k⟩^d nodes at distance d.

For example, if ⟨k⟩ ≈ 1,000, which is the estimated number of acquaintances an individual has, we expect 10^6 individuals at distance two and about a billion, i.e. almost the whole earth’s population, at distance three from us. To be precise, the expected number of nodes up to distance d from our starting node is

$$ N(d) \approx 1 + \langle k \rangle + \langle k \rangle^{2} + \dots + \langle k \rangle^{d} = \frac{\langle k \rangle^{d+1} - 1}{\langle k \rangle - 1}. $$  (3.15)
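The geometric sum in (3.15) is easy to evaluate directly. The sketch below uses an integer ⟨k⟩ so that the arithmetic stays exact, which is a simplification made for illustration; the function names are hypothetical.

```python
def nodes_within(distance: int, mean_degree: int) -> int:
    """Left-hand side of (3.15): 1 + <k> + <k>^2 + ... + <k>^d."""
    return sum(mean_degree ** d for d in range(distance + 1))


def nodes_within_closed_form(distance: int, mean_degree: int) -> int:
    """Right-hand side of (3.15): (<k>^(d+1) - 1) / (<k> - 1), for <k> > 1."""
    return (mean_degree ** (distance + 1) - 1) // (mean_degree - 1)


def smallest_distance_covering(n_nodes: int, mean_degree: int) -> int:
    """Smallest d for which N(d) reaches the size of the network."""
    d = 0
    while nodes_within(d, mean_degree) < n_nodes:
        d += 1
    return d


MEAN_DEGREE = 1000            # <k> of the random society discussed in the text
POPULATION = 7_000_000_000    # N, the size of the world's social network

for d in range(5):
    print(f"d = {d}: N(d) = {nodes_within(d, MEAN_DEGREE):,}")
print("distance needed to cover the population:",
      smallest_distance_covering(POPULATION, MEAN_DEGREE))
```

With ⟨k⟩ = 1,000 the count reaches roughly a billion nodes at d = 3 and overshoots the world population at d = 4, so N(d) cannot keep growing at this rate beyond a handful of steps.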

N(d) must not exceed the total number of nodes, N, in the network. Therefore the distances cannot take up arbitrary values. We can identify the maximum distance, dmax, or the network’s diameter by setting

$$ N(d_{max}) \approx N. $$  (3.16)

Assuming that ⟨k⟩ » 1, we can neglect the (-1) term in the numerator and the denominator of (3.15), obtaining

$$ \langle k \rangle^{d_{max}} \approx N. $$  (3.17)

Therefore the diameter of a random network follows

$$ d_{max} \approx \frac{\ln N}{\ln \langle k \rangle}, $$  (3.18)

which represents the mathematical formulation of the small world phenomenon. The key, however, is its interpretation:
• As derived, (3.18) predicts the scaling of the network diameter, dmax, with the size of the system, N. Yet, for most networks (3.18) offers a better approximation to the average distance between two randomly chosen nodes, ⟨d⟩, than to dmax (Table 3.2). This is because dmax is often dominated by a few extreme paths, while ⟨d⟩ is averaged over all node pairs, a process that suppresses the fluctuations. Hence typically the small world property is defined by

$$ \langle d \rangle \approx \frac{\ln N}{\ln \langle k \rangle}, $$  (3.19)

describing the dependence of the average distance in a network on N and ⟨k⟩.
• In general lnN « N, hence the dependence of ⟨d⟩ on lnN implies that the distances in a random network are orders of magnitude smaller than the size of the network. Consequently by small in the "small world phenomenon" we mean that the average path length or the diameter depends logarithmically on the system size. Hence, “small” means that ⟨d⟩ is proportional to lnN, rather than N or some power of N (Figure 3.11).
• The 1/ln⟨k⟩ term implies that the denser the network, the smaller is the distance between the nodes.
• In real networks there are systematic corrections to (3.19), rooted in the fact that the number of nodes at distance d > ⟨d⟩ drops rapidly (ADVANCED TOPICS 3.F).

Figure 3.11
Why are Small Worlds Surprising?
[Panels (a) and (b): ⟨d⟩ against N for a 1D lattice (⟨d⟩ ~ N), a 2D lattice (⟨d⟩ ~ N^(1/2)), a 3D lattice (⟨d⟩ ~ N^(1/3)) and a random network (⟨d⟩ ~ lnN).]
Much of our intuition about distance is based on our experience with regular lattices, which do not display the small world property:
1D: For a one-dimensional lattice (a line of length N) the diameter and the average path length scale linearly with N: dmax ~ ⟨d⟩ ~ N.
2D: For a square lattice dmax ~ ⟨d⟩ ~ N^(1/2).
3D: For a cubic lattice dmax ~ ⟨d⟩ ~ N^(1/3).
4D: In general, for a d-dimensional lattice dmax ~ ⟨d⟩ ~ N^(1/d).
These polynomial dependences predict a much faster increase with N than (3.19), indicating that in lattices the path lengths are significantly longer than in a random network. For example, if the social network formed a square lattice (2D), where each individual knows only its neighbors, the average distance between two individuals would be roughly (7 × 10^9)^(1/2) = 83,666. Even if we correct for the fact that a person has about 1,000 acquaintances, not four, the average separation will be orders of magnitude larger than predicted by (3.19).
(a) The figure shows the predicted N-dependence of ⟨d⟩ for regular and random networks on a linear scale.
(b) The same as in (a), but shown on a log-log scale.

Let us illustrate the implications of (3.19) for social networks. Using N ≈ 7 × 10^9 and ⟨k⟩ ≈ 10^3, we obtain

$$ \langle d \rangle \approx \frac{\ln (7 \times 10^{9})}{\ln (10^{3})} = 3.28. $$

(3.20)

Therefore, all individuals on Earth should be within three to four handshakes of each other [20]. The estimate (3.20) is probably closer to the real value than the frequently quoted six degrees (BOX 3.7).

Much of what we know about the small world property in random networks, including the result (3.19), is in a little known paper by Manfred Kochen and Ithiel de Sola Pool [20], in which they mathematically formulated the problem and discussed in depth its sociological implications. This paper inspired the well known Milgram experiment (BOX 3.7), which in turn inspired the six-degrees of separation phrase.

While discovered in the context of social systems, the small world property applies beyond social networks (BOX 3.6). To demonstrate this in Table 3.2 we compare the prediction of (3.19) with the average path length ⟨d⟩ for several real networks, finding that despite the diversity of these systems and the significant differences between them in terms of N and ⟨k⟩, (3.19) offers a good approximation to the empirically observed ⟨d⟩.

In summary the small world property has not only ignited the public’s imagination (BOX 3.8), but plays an important role in network science as well. The small world phenomenon can be reasonably well understood in the context of the random network model: It is rooted in the fact that the number of nodes at distance d from a node increases exponentially with d. In the coming chapters we will see that in real networks we encounter systematic deviations from (3.19), forcing us to replace it with more accurate predictions. Yet the intuition offered by the random network model on the origin of the small world phenomenon remains valid.

BOX 3.6
19 DEGREES OF SEPARATION

How many clicks do we need to reach a randomly chosen document on the Web? The difficulty in addressing this question is rooted in the fact that we lack a complete map of the WWW: we only have access to small samples of the full map. We can start, however, by measuring the WWW’s average path length in samples of increasing sizes, a procedure called finite size scaling. The measurements indicate that the average path length of the WWW increases with the size of the network as [21]

$$ \langle d \rangle \approx 0.35 + 0.89 \ln N. $$

In 1999 the WWW was estimated to have about 800 million documents [22], in which case the above equation predicts ⟨d⟩ ≈ 18.69. In other words in 1999 two randomly chosen documents were on average 19 clicks from each other, a result that became known as 19 degrees of separation. Subsequent measurements on a sample of 200 million documents found ⟨d⟩ ≈ 16 [23], in good agreement with the ⟨d⟩ ≈ 17 prediction. Currently the WWW is estimated to have about a trillion nodes (N ~ 10^12), in which case the formula predicts ⟨d⟩ ≈ 25. Hence ⟨d⟩ is not fixed but as the network grows, so does the distance between two documents.

The average path length of 25 is much larger than the proverbial six degrees (BOX 3.7). The difference is easy to understand: The WWW has smaller average degree and larger size than the social network. According to (3.19) both of these differences increase the Web’s diameter.

Table 3.2
Six Degrees of Separation

NETWORK                  N          L             ⟨k⟩      ⟨d⟩      dmax    lnN/ln⟨k⟩
Internet                 192,244    609,066       6.34     6.98     26      6.58
WWW                      325,729    1,497,134     4.60     11.27    93      8.31
Power Grid               4,941      6,594         2.67     18.99    46      8.66
Mobile Phone Calls       36,595     91,826        2.51     11.72    39      11.42
Email                    57,194     103,731       1.81     5.88     18      18.4
Science Collaboration    23,133     93,439        8.08     5.35     15      4.81
Actor Network            702,388    29,397,908    83.71    3.91     14      3.04
Citation Network         449,673    4,707,958     10.43    11.21    42      5.55
E. Coli Metabolism       1,039      5,802         5.58     2.98     8       4.04
Protein Interactions     2,018      2,930         2.90     5.61     14      7.14

The average distance ⟨d⟩ and the maximum distance dmax for the ten reference networks. The last column provides ⟨d⟩ predicted by (3.19), indicating that it offers a reasonable approximation to the measured ⟨d⟩. Yet, the agreement is not perfect; we will see in the next chapter that for many real networks (3.19) needs to be adjusted. For directed networks the average degree and the path lengths are measured along the direction of the links.
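The last column of Table 3.2 and the estimate (3.20) both come from the same formula, (3.19). The sketch below recomputes them from the N and ⟨k⟩ values quoted in this section; only a subset of the table is included, and the function name is illustrative.

```python
import math


def predicted_distance(n_nodes: float, mean_degree: float) -> float:
    """Small world prediction (3.19): <d> is approximately ln N / ln <k>."""
    return math.log(n_nodes) / math.log(mean_degree)


# (name, N, <k>, measured <d>) for a few rows of Table 3.2.
ROWS = [
    ("Internet", 192_244, 6.34, 6.98),
    ("Power Grid", 4_941, 2.67, 18.99),
    ("Science Collaboration", 23_133, 8.08, 5.35),
    ("Actor Network", 702_388, 83.71, 3.91),
    ("Protein Interactions", 2_018, 2.90, 5.61),
]

for name, n_nodes, mean_degree, measured in ROWS:
    predicted = predicted_distance(n_nodes, mean_degree)
    print(f"{name:22s} measured <d> = {measured:5.2f}   predicted = {predicted:5.2f}")

# The random society of (3.20): N = 7e9 individuals, <k> = 1,000 acquaintances.
social_distance = predicted_distance(7e9, 1e3)
print(f"random society: <d> = {social_distance:.2f}")
```

The prediction tracks the measured ⟨d⟩ closely for the Internet and the actor network, and far less well for the power grid, in line with the caveat that (3.19) needs adjusting for many real networks.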

BOX 3.7 SIX DEGREES: EXPERIMENTAL CONFIRMATION

The first empirical study of the small world phenomenon took place in 1967, when Stanley Milgram, building on the work of Pool and Kochen [20], designed an experiment to measure the distances in social networks [24, 25]. Milgram chose a stock broker in Boston and a divinity student in Sharon, Massachusetts as targets. He then randomly selected residents of Wichita and Omaha, sending them a letter containing a short summary of the study’s purpose, a photograph, the name, address and information about the target person. They were asked to forward the letter to a friend, relative or acquaintance who is most likely to know the target person.

Within a few days the first letter arrived, passing through only two links. Eventually 64 of the 296 letters made it back, some, however, requiring close to a dozen intermediates [25]. These completed chains allowed Milgram to determine the number of individuals required to get the letter to the target (Figure 3.12a). He found that the mean number of intermediates was 5.2, a relatively small number that was remarkably close to Frigyes Karinthy’s 1929 insight (BOX 3.8).

Milgram lacked an accurate map of the full acquaintance network, hence his experiment could not detect the true distance between his study’s participants. Today Facebook has the most extensive social network map ever assembled. Using Facebook’s social graph of May 2011, consisting of 721 million active users and 68 billion symmetric friendship links, researchers found an average distance of 4.74 between the users (Figure 3.12b). Therefore, the study detected only ‘four degrees of separation’ [18], closer to the prediction of (3.20) than to Milgram’s six degrees [24, 25].

“I asked a person of intelligence how many steps he thought it would take, and he said that it would require 100 intermediate persons, or more, to move from Nebraska to Sharon.”
Stanley Milgram, 1969

Figure 3.12
Six Degrees? From Milgram to Facebook
[Panel (a): number of chains against number of intermediaries, N = 64. Panel (b): pd against d, for Facebook users worldwide and in the USA.]
(a) In Milgram's experiment 64 of the 296 letters made it to the recipient. The figure shows the length distribution of the completed chains, indicating that some letters required only one intermediary, while others required as many as ten. The mean of the distribution was 5.2, indicating that on average six ‘handshakes’ were required to get a letter to its recipient. The playwright John Guare renamed this ‘six degrees of separation’ two decades later. After [25].
(b) The distance distribution, pd, for all pairs of Facebook users worldwide and within the US only. Using Facebook’s N and L, (3.19) predicts the average distance to be approximately 3.90, not far from the reported four degrees. After [18].
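The 3.90 figure quoted for Facebook can be reproduced from the two numbers given in the box. This is a minimal sketch, assuming symmetric friendships so that ⟨k⟩ = 2L/N; the variable names are illustrative.

```python
import math

FACEBOOK_USERS = 721e6   # N, active users in the May 2011 social graph
FACEBOOK_LINKS = 68e9    # L, symmetric friendship links

# Every friendship contributes to the degree of two users.
mean_degree = 2 * FACEBOOK_LINKS / FACEBOOK_USERS

# Small world prediction (3.19) applied to Facebook's N and <k>.
predicted_distance = math.log(FACEBOOK_USERS) / math.log(mean_degree)

print(f"<k> = {mean_degree:.1f} friends per user")
print(f"predicted <d> = {predicted_distance:.2f} (measured: 4.74)")
```

The prediction falls somewhat below the measured 4.74, a gap of the kind the chapter attributes to systematic corrections to (3.19) in real networks.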

BOX 3.8
[Timeline of milestones, ordered by publication date.]

1929: Frigyes Karinthy (1887-1938)
Hungarian writer, journalist and playwright, the first to describe the small world property. In his short story entitled ‘Láncszemek’ (Chains) he links a worker in Ford’s factory to himself [26, 27].
“The worker knows the manager in the shop, who knows Ford; Ford is on friendly terms with the general director of Hearst Publications, who last year became good friends with Árpád Pásztor, someone I not only know, but to the best of my knowledge a good friend of mine.” (Karinthy, 1929)

1958: Manfred Kochen (1928-1989), Ithiel de Sola Pool (1917-1984)
Scientific interest in small worlds started with a paper by political scientist Ithiel de Sola Pool and mathematician Manfred Kochen. Written in 1958 and published 20 years later, in 1978, their work addressed in mathematical detail the small world effect, predicting that most individuals can be connected via two to three acquaintances. Their paper inspired the experiments of Stanley Milgram.

1967: Stanley Milgram (1933-1984)
American social psychologist who carried out the first experiment testing the small world phenomenon (BOX 3.7).

1991: John Guare (1938), six degrees of separation
The phrase ‘six degrees of separation’ was introduced by the playwright John Guare, who used it as the title of his Broadway play [28].
“Everybody on this planet is separated by only six other people. Six degrees of separation. Between us and everybody else on this planet. The president of the United States. A gondolier in Venice. It’s not just the big names. It’s anyone. A native in a rain forest. A Tierra del Fuegan. An Eskimo. I am bound to everyone on this planet by a trail of six people. It’s a profound thought. How every person is a new door, opening up into other worlds.” (Guare, 1991)

1998: Duncan J. Watts (1971), Steven Strogatz (1959)
A new wave of interest in small worlds followed the study of Watts and Strogatz, finding that the small world property applies to natural and technological networks as well [29].

1999: 19 degrees of the WWW
Measurements on the WWW indicate that the separation between two randomly chosen documents is 19 [21] (BOX 3.6).

2011: Four degrees of separation
The Facebook Data Team measures the average distance between its users, finding “4 degrees” (BOX 3.7).

SECTION 3.9

CLUSTERING COEFFICIENT

The degree of a node contains no information about the relationship between a node's neighbors. Do they all know each other, or are they perhaps isolated from each other? The answer is provided by the local clustering coefficient Ci, which measures the density of links in node i’s immediate neighborhood: Ci = 0 means that there are no links between i’s neighbors; Ci = 1 implies that all of i’s neighbors link to each other (SECTION 2.10).

To calculate Ci for a node in a random network we need to estimate the expected number of links Li between the node’s ki neighbors. In a random network the probability that two of i’s neighbors link to each other is p. As there are ki(ki - 1)/2 possible links between the ki neighbors of node i, the expected value of Li is

$$\langle L_i \rangle = p\,\frac{k_i (k_i - 1)}{2}. \tag{3.20}$$

Thus the local clustering coefficient of a random network is

$$C_i = \frac{2 \langle L_i \rangle}{k_i (k_i - 1)} = p = \frac{\langle k \rangle}{N}. \tag{3.21}$$
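The prediction (3.21) can be checked numerically. The following is a minimal sketch, assuming a pure-Python G(N, p) generator; the helper names (`random_graph`, `local_clustering`, `average_clustering`) and the parameter values are illustrative choices, not part of the text.

```python
import random
from itertools import combinations


def random_graph(n, p, seed=0):
    """Return the adjacency sets of a G(N, p) random network."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:  # each node pair is linked with probability p
            adj[i].add(j)
            adj[j].add(i)
    return adj


def local_clustering(adj, i):
    """C_i = 2 L_i / (k_i (k_i - 1)); taken to be 0 when k_i < 2."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Average of the local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, i) for i in range(len(adj))) / len(adj)


n, p = 400, 0.05  # illustrative values
adj = random_graph(n, p)
avg_c = average_clustering(adj)
print(f"p = {p}, measured <C> = {avg_c:.4f}")
```

For a single realization the measured value should fluctuate around p; averaging over several seeds would tighten the comparison.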

Equation (3.21) makes two predictions:

(1) For fixed 〈k〉, the larger the network, the smaller is a node’s clustering coefficient. Consequently a node's local clustering coefficient Ci is expected to decrease as 1/N. Note that the network's average clustering coefficient, 〈C〉, also follows (3.21).

(2) The local clustering coefficient of a node is independent of the node’s degree.

To test the validity of (3.21) we plot 〈C〉/〈k〉 as a function of N for several undirected networks (Figure 3.13a). We find that 〈C〉/〈k〉 does not decrease as N⁻¹, but is largely independent of N, in violation of the prediction (3.21) and point (1) above. In Figure 3.13b-d we also show the dependency of C on the node’s degree ki for three real networks, finding that C(k) systematically decreases with the degree, again in violation of (3.21) and point (2).

In summary, we find that the random network model does not capture the clustering of real networks. Instead real networks have a much higher clustering coefficient than expected for a random network of similar N and L. An extension of the random network model proposed by Watts and Strogatz [29] addresses the coexistence of high 〈C〉 and the small world property (BOX 3.9). It fails to explain, however, why high-degree nodes have a smaller clustering coefficient than low-degree nodes. Models explaining the shape of C(k) are discussed in Chapter 9.

Figure 3.13

Clustering in Real Networks

(a) All Networks (〈C〉/〈k〉 versus N). Comparing the average clustering coefficient of real networks with the prediction (3.21) for random networks. The circles and their colors correspond to the networks of Table 3.2. Directed networks were made undirected to calculate 〈C〉 and 〈k〉. The green line corresponds to (3.21), predicting that for random networks the average clustering coefficient decreases as N⁻¹. In contrast, for real networks 〈C〉 appears to be independent of N.

(b)-(d) The dependence of the local clustering coefficient, C(k), on the node’s degree for (b) the Internet, (c) science collaboration network and (d) protein interaction network. C(k) is measured by averaging the local clustering coefficient of all nodes with the same degree k. The green horizontal line corresponds to 〈C〉.

BOX 3.9 WATTS-STROGATZ MODEL

Duncan Watts and Steven Strogatz proposed an extension of the random network model (Figure 3.14) motivated by two observations [29]:

(a) Small World Property
In real networks the average distance between two nodes depends logarithmically on N (3.18), rather than following a polynomial expected for regular lattices (Figure 3.11).

(b) High Clustering
The average clustering coefficient of real networks is much higher than expected for a random network of similar N and L (Figure 3.13a).

The Watts-Strogatz model (also called the small-world model) interpolates between a regular lattice, which has high clustering but lacks the small-world phenomenon, and a random network, which has low clustering, but displays the small-world property (Figure 3.14a-c). Numerical simulations indicate that for a range of rewiring parameters the model's average path length is low but the clustering coefficient is high, hence reproducing the coexistence of high clustering and small-world phenomena (Figure 3.14d).

Being an extension of the random network model, the Watts-Strogatz model predicts a Poisson-like bounded degree distribution. Consequently high degree nodes, like those seen in Figure 3.6, are absent from it. Furthermore it predicts a k-independent C(k), being unable to recover the k-dependence observed in Figures 3.13b-d.

As we show in the next chapters, understanding the coexistence of the small world property with high clustering must start from the network's correct degree distribution.
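The interpolation described in this box can be sketched in a few lines. This is a simplified illustration rather than the exact procedure of [29]: the function names, the network size, and the rule of rewiring only the far endpoint of each lattice link are assumptions made here. In the starting ring with m = 2 each node's four neighbors share three of the six possible links, so the sketch returns 〈C(0)〉 = 0.5.

```python
import random
from collections import deque
from itertools import combinations


def watts_strogatz(n, m, p, seed=0):
    """Ring of n nodes, each linked to m neighbors on either side;
    every lattice link (i, i + step) is rewired with probability p
    to a uniformly chosen node that i is not yet linked to."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for step in range(1, m + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for i in range(n):
        for step in range(1, m + 1):
            j = (i + step) % n
            if rng.random() < p:
                choices = [t for t in range(n) if t != i and t not in adj[i]]
                if choices:
                    t = rng.choice(choices)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(t)
                    adj[t].add(i)
    return adj


def average_clustering(adj):
    """Average local clustering coefficient (0 for nodes with degree < 2)."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_distance(adj):
    """Mean shortest-path length over all reachable node pairs (BFS)."""
    total, pairs = 0, 0
    for s in range(len(adj)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


results = {}
for p in (0.0, 0.05, 1.0):
    g = watts_strogatz(n=200, m=2, p=p)
    results[p] = (average_clustering(g), average_distance(g))
    print(f"p = {p}: <C> = {results[p][0]:.3f}, <d> = {results[p][1]:.2f}")
```

With these illustrative settings the intermediate value of p should keep most of the lattice's clustering while shortening the paths considerably.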

Figure 3.14

The Watts-Strogatz Model

(a) REGULAR (p = 0). We start from a ring of nodes, each node being connected to its immediate and next neighbors. Hence initially each node has 〈C〉 = 1/2 (p = 0).

(b) SMALL-WORLD. With probability p each link is rewired to a randomly chosen node. For small p the network maintains high clustering but the random long-range links can drastically decrease the distances between the nodes.

(c) RANDOM (p = 1). For p = 1 all links have been rewired, so the network turns into a random network.

(d) The dependence of the average path length d(p) and clustering coefficient 〈C(p)〉 on the rewiring parameter p. Note that d(p) and 〈C(p)〉 have been normalized by d(0) and 〈C(0)〉 obtained for a regular lattice (i.e. for p = 0 in (a)). The rapid drop in d(p) signals the onset of the small-world phenomenon. During this drop, 〈C(p)〉 remains high. Hence in the range 0.001 …

… 〈k〉 > ln N condition, implying that they should be broken into isolated clusters (Table 3.1). Some networks are indeed fragmented, most are not.

Average Path Length
Random network theory predicts that the average path length follows (3.19), a prediction that offers a reasonable approximation for the observed path lengths. Hence the random network model can account for the emergence of small world phenomena.

Clustering Coefficient
In a random network the local clustering coefficient is independent of the node’s degree and depends on the system size as 1/N. In contrast, measurements indicate that for real networks C(k) decreases with the node degrees and is largely independent of the system size (Figure 3.13).

Taken together, it appears that the small world phenomenon is the only property reasonably explained by the random network model. All other network characteristics, from the degree distribution to the clustering coefficient, are significantly different in real networks. The extension of the Erdős-Rényi model proposed by Watts and Strogatz successfully predicts the coexistence of high C and low 〈d〉, but fails to explain the degree distribution and C(k). In fact, the more we learn about real networks, the more we will arrive at the startling conclusion that we do not know of any real network that is accurately described by the random network model.

This conclusion begs a legitimate question: If real networks are not random, why did we devote a full chapter to the random network model? The answer is simple: The model serves as an important reference as we proceed to explore the properties of real networks. Each time we observe some network property we will have to ask if it could have emerged by chance. For this we turn to the random network model as a guide: If the property is present in the model, it means that randomness can account for it. If the property is absent in random networks, it may represent some signature of order, requiring a deeper explanation. So, the random network model may be the wrong model for most real systems, but it remains quite relevant for network science (BOX 3.10).

AT A GLANCE: RANDOM NETWORKS

Definition: N nodes, where each node pair is connected with probability p.

Average Degree: $\langle k \rangle = p(N-1)$.

Average Number of Links: $\langle L \rangle = \frac{pN(N-1)}{2}$.

Degree Distribution:
Binomial Form: $p_k = \binom{N-1}{k} p^k (1-p)^{N-1-k}$.
Poisson Form: $p_k = e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!}$.

Giant Component (GC) ($N_G$):
〈k〉 < 1: $N_G \sim \ln N$
1 < 〈k〉 < ln N: $N_G \sim N^{2/3}$
〈k〉 > ln N: $N_G \sim (p - p_c)N$

Average Distance: $\langle d \rangle \propto \frac{\ln N}{\ln \langle k \rangle}$.

Clustering Coefficient: $\langle C \rangle = \frac{\langle k \rangle}{N}$.

BOX 3.10 RANDOM NETWORKS AND NETWORK SCIENCE

The lack of agreement between random and real networks raises an important question: How could a theory survive so long given its poor agreement with reality? The answer is simple: Random network theory was never meant to serve as a model of real systems.

Erdős and Rényi write in their first paper [2] that random networks “may be interesting not only from a purely mathematical point of view. In fact, the evolution of graphs may be considered as a rather simplified model of the evolution of certain communication nets (railways, road or electric network systems, etc.) of a country or some unit.” Yet, in the string of eight papers authored by them on the subject [2-9], this is the only mention of the potential practical value of their approach. The subsequent development of random graphs was driven by the problem's inherent mathematical challenges, rather than its applications.

It is tempting to follow Thomas Kuhn and view network science as a paradigm change from random graphs to a theory of real networks [30]. In reality, there was no network paradigm before the end of the 1990s. This period is characterized by a lack of systematic attempts to compare the properties of real networks with graph theoretical models. The work of Erdős and Rényi gained prominence outside mathematics only after the emergence of network science (Figure 3.15).

Network theory does not lessen the contributions of Erdős and Rényi, but celebrates the unintended impact of their work. When we discuss the discrepancies between random and real networks, we do so mainly for pedagogical reasons: to offer a proper foundation on which we can understand the properties of real systems.

Figure 3.15

Network Science and Random Networks

While today we perceive the Erdős-Rényi model as the cornerstone of network theory, the model was hardly known outside a small subfield of mathematics. This is illustrated by the yearly citations (1960-2010) of the first two papers by Erdős and Rényi, published in 1959 and 1960 [2,3]. For four decades after their publication the papers gathered fewer than 10 citations each year. The number of citations exploded after the first papers on scale-free networks [21, 31, 32] turned Erdős and Rényi’s work into the reference model of network theory.

SECTION 3.11

HOMEWORK

3.1. Erdős-Rényi Networks

Consider an Erdős-Rényi network with N = 3,000 nodes, connected to each other with probability p = 10⁻³.
(a) What is the expected number of links, 〈L〉?
(b) In which regime is the network?
(c) Calculate the probability pc so that the network is at the critical point.
(d) Given the linking probability p = 10⁻³, calculate the number of nodes Ncr so that the network has only one component.
(e) For the network in (d), calculate the average degree 〈kcr〉 and the average distance between two randomly chosen nodes 〈d〉.
(f) Calculate the degree distribution pk of this network (approximate with a Poisson degree distribution).

3.2. Generating Erdős-Rényi Networks

Relying on the G(N, p) model, generate with a computer three networks with N = 500 nodes and average degree (a) 〈k〉 = 0.8, (b) 〈k〉 = 1 and (c) 〈k〉 = 8. Visualize these networks.

3.3. Circle Network

Consider a network with N nodes placed on a circle, so that each node connects to m neighbors on either side (consequently each node has degree 2m). Figure 3.14(a) shows an example of such a network with m = 2 and N = 20. Calculate the average clustering coefficient 〈C〉 of this network and the average shortest path 〈d〉. For simplicity assume that N and m are chosen such that (N-1)/2m is an integer. What happens to 〈C〉 if N≫1? And what happens to 〈d〉?

3.4. Cayley Tree

A Cayley tree is a symmetric tree, constructed starting from a central node of degree k. Each node at distance d from the central node has degree k, until we reach the nodes at distance P that have degree one and are called leaves (see Figure 3.16 for a Cayley tree with k = 3 and P = 5).
(a) Calculate the number of nodes reachable in t steps from the central node.
(b) Calculate the degree distribution of the network.
(c) Calculate the diameter dmax.
(d) Find an expression for the diameter dmax in terms of the total number of nodes N.
(e) Does the network display the small-world property?

Figure 3.16

Cayley Tree

A Cayley tree with k = 3 and P = 5.

3.5. Snobbish Network

Consider a network of N red and N blue nodes. The probability that there is a link between nodes of identical color is p and the probability that there is a link between nodes of different color is q. A network is snobbish if p > q, capturing a tendency to connect to nodes of the same color. For q = 0 the network has at least two components, containing nodes with the same color.
(a) Calculate the average degree of the "blue" subnetwork made of only blue nodes, and the average degree in the full network.
(b) Determine the minimal p and q required to have, with high probability, just one component.
(c) Show that for large N even very snobbish networks (p≫q) display the small-world property.

3.6. Snobbish Social Networks

Consider the following variant of the model discussed above: We have a network of 2N nodes, consisting of an equal number of red and blue nodes, while an f fraction of the 2N nodes are purple. Blue and red nodes do not connect to each other (q = 0), while they connect with probability p to nodes of the same color. Purple nodes connect with the same probability p to both red and blue nodes.
(a) We call the red and blue communities interactive if a typical red node is just two steps away from a blue node and vice versa. Evaluate the fraction of purple nodes required for the communities to be interactive.
(b) Comment on the size of the purple community if the average degree of the blue (or red) nodes is 〈k〉≫1.
(c) What are the implications of this model for the structure of social (and other) networks?

SECTION 3.12

ADVANCED TOPICS 3.A DERIVING THE POISSON DISTRIBUTION

To derive the Poisson form of the degree distribution we start from the exact binomial distribution (3.7)

$$p_k = \binom{N-1}{k} p^k (1-p)^{N-1-k} \tag{3.22}$$

that characterizes a random graph. We rewrite the first term on the r.h.s. as

$$\binom{N-1}{k} = \frac{(N-1)(N-1-1)(N-1-2)\cdots(N-1-k+1)}{k!} \approx \frac{(N-1)^k}{k!}, \tag{3.23}$$

where in the last term we used that k ≪ N. The last term of (3.22) can be simplified as

$$\ln\left[(1-p)^{(N-1)-k}\right] = (N-1-k)\ln\left(1 - \frac{\langle k \rangle}{N-1}\right)$$

and using the series expansion

$$\ln(1+x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots, \qquad -1 < x \le 1,$$

we obtain

$$\ln\left[(1-p)^{N-1-k}\right] \approx -(N-1-k)\frac{\langle k \rangle}{N-1} = -\langle k \rangle\left(1 - \frac{k}{N-1}\right) \approx -\langle k \rangle,$$

which is valid if N ≫ k. This represents the small degree approximation at the heart of this derivation. Therefore the last term of (3.22) becomes

$$(1-p)^{N-1-k} = e^{-\langle k \rangle}. \tag{3.24}$$

Combining (3.22), (3.23), and (3.24) we obtain the Poisson form of the degree distribution

$$p_k = \binom{N-1}{k} p^k (1-p)^{(N-1)-k} \approx \frac{(N-1)^k}{k!}\, p^k e^{-\langle k \rangle} = \frac{(N-1)^k}{k!} \left(\frac{\langle k \rangle}{N-1}\right)^k e^{-\langle k \rangle},$$

or

$$p_k = e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!}. \tag{3.25}$$
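The quality of the approximation leading from (3.22) to (3.25) can be inspected numerically. The sketch below is an illustration under assumed values (〈k〉 = 10 and three network sizes); the function names are hypothetical. It holds 〈k〉 fixed and reports the largest pointwise gap between the two forms as N grows.

```python
import math


def binomial_pk(k, n, p):
    """Exact binomial degree distribution (3.22) for a network of n nodes."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_pk(k, mean_k):
    """Poisson form (3.25), evaluated in log space to avoid overflow."""
    return math.exp(-mean_k + k * math.log(mean_k) - math.lgamma(k + 1))


mean_k = 10.0  # illustrative average degree
diffs = []
for n in (50, 500, 5000):
    p = mean_k / (n - 1)  # keep <k> = p (N - 1) fixed while N grows
    max_diff = max(
        abs(binomial_pk(k, n, p) - poisson_pk(k, mean_k)) for k in range(40)
    )
    diffs.append(max_diff)
    print(f"N = {n:5d}: max |binomial - Poisson| = {max_diff:.2e}")
```

The gap should shrink as N increases, in line with the k ≪ N condition used in the derivation.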

SECTION 3.13

ADVANCED TOPICS 3.B MAXIMUM AND MINIMUM DEGREES

To determine the expected degree of the largest node in a random network, called the network’s upper natural cutoff, we define the degree kmax such that in a network of N nodes we have at most one node with degree higher than kmax. Mathematically this means that the area behind the Poisson distribution pk for k > kmax should be approximately 1/N (Figure 3.17). Since the area is given by 1 - P(kmax), where P(k) is the cumulative degree distribution of pk, the network’s largest node satisfies:

$$N\left[1 - P(k_{max})\right] \approx 1. \tag{3.26}$$

We write ≈ instead of =, because kmax is an integer, so in general the exact equation does not have a solution. For a Poisson distribution

$$1 - P(k_{max}) = 1 - e^{-\langle k \rangle} \sum_{k=0}^{k_{max}} \frac{\langle k \rangle^k}{k!} = e^{-\langle k \rangle} \sum_{k=k_{max}+1}^{\infty} \frac{\langle k \rangle^k}{k!} \approx e^{-\langle k \rangle} \frac{\langle k \rangle^{k_{max}+1}}{(k_{max}+1)!}, \tag{3.27}$$

where in the last term we approximate the sum with its largest term.

For N = 10⁹ and 〈k〉 = 1,000, roughly the size and the average degree of the globe’s social network, (3.26) and (3.27) predict kmax = 1,185, indicating that a random network lacks extremely popular individuals, or hubs.

We can use a similar argument to calculate the expected degree of the smallest node, kmin. By requiring that there should be at most one node with degree smaller than kmin we can write

$$N P(k_{min} - 1) \approx 1. \tag{3.28}$$

For the Erdős-Rényi network we have

$$P(k_{min} - 1) = e^{-\langle k \rangle} \sum_{k=0}^{k_{min}-1} \frac{\langle k \rangle^k}{k!}. \tag{3.29}$$

Solving (3.28) with N = 10⁹ and 〈k〉 = 1,000 we obtain kmin = 816.

Figure 3.17

Minimum and Maximum Degree

The estimated maximum degree of a network, kmax, is chosen so that there is at most one node whose degree is higher than kmax. This is often called the natural upper cutoff of a degree distribution. To calculate it, we need to set kmax such that the area under the degree distribution pk for k > kmax equals 1/N, hence the total number of nodes expected in this region is exactly one. We follow a similar argument to determine the expected smallest degree, kmin. (The plot shows pk versus k, marking kmin and kmax; the area under the curve beyond each cutoff should be less than 1/N.)
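Equations (3.26)-(3.29) have no closed-form solution, but they are easy to solve by scanning over integer degrees. The following is a minimal sketch, assuming the largest-term approximation of (3.27) for kmax and the direct cumulative sum (3.29) for kmin; the function names are illustrative, and the precise integer returned depends on how the ≈ in (3.26) and (3.28) is resolved.

```python
import math


def log_poisson(k, mean_k):
    """Logarithm of the Poisson probability p_k."""
    return -mean_k + k * math.log(mean_k) - math.lgamma(k + 1)


def natural_upper_cutoff(n, mean_k):
    """Smallest k_max for which N * p_(k_max + 1) <= 1, i.e. (3.26)
    with the tail replaced by its largest term as in (3.27)."""
    k = int(mean_k)
    while math.log(n) + log_poisson(k + 1, mean_k) > 0:
        k += 1
    return k


def natural_lower_cutoff(n, mean_k):
    """Largest k_min for which N * P(k_min - 1) <= 1, using (3.29)."""
    cumulative = 0.0
    k = 0
    while True:
        cumulative += math.exp(log_poisson(k, mean_k))  # cumulative = P(k)
        if n * cumulative > 1:
            return k  # P(k - 1) was still at most 1/N
        k += 1


n, mean_k = 10**9, 1000.0
k_max = natural_upper_cutoff(n, mean_k)
k_min = natural_lower_cutoff(n, mean_k)
print(f"k_max = {k_max}, k_min = {k_min}")
```

Both cutoffs should land within a few hundred of 〈k〉, close to the values quoted in the text, which illustrates how narrow the degree range of a random network is.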

SECTION 3.14

ADVANCED TOPICS 3.C GIANT COMPONENT

In this section we introduce the argument, proposed independently by Solomonoff and Rapoport [11], and by Erdős and Rényi [2], for the emergence of the giant component at 〈k〉 = 1 [33].

Let us denote with u = 1 - NG/N the fraction of nodes that are not in the giant component (GC), whose size we take to be NG. If node i is part of the GC, it must link to another node j, which must also be part of the GC. Hence if i is not part of the GC, that could happen for two reasons:

• There is no link between i and j (probability for this is 1 - p).
• There is a link between i and j, but j is not part of the GC (probability for this is pu).

Therefore the total probability that i is not part of the GC via node j is 1 - p + pu. The probability that i is not linked to the GC via any other node is therefore (1 - p + pu)^(N - 1), as there are N - 1 nodes that could serve as potential links to the GC for node i. As u is the fraction of nodes that do not belong to the GC, for any p and N the solution of the equation

$$u = (1 - p + pu)^{N-1} \tag{3.30}$$

provides the size of the giant component via NG = N(1 - u). Using p = 〈k〉/(N - 1) and taking the logarithm of both sides, for 〈k〉 ≪ N we obtain

$$\ln u = (N-1)\ln\left[1 - \frac{\langle k \rangle}{N-1}(1-u)\right] \approx (N-1)\left[-\frac{\langle k \rangle}{N-1}(1-u)\right] = -\langle k \rangle (1-u), \tag{3.31}$$

where we used the series expansion for ln(1+x). Taking an exponential of both sides leads to u = exp[-〈k〉(1 - u)]. If we denote with S the fraction of nodes in the giant component, S = NG/N, then S = 1 - u and (3.31) results in

$$S = 1 - e^{-\langle k \rangle S}. \tag{3.32}$$

This equation provides the size of the giant component S as a function of 〈k〉 (Figure 3.18). While (3.32) looks simple, it does not have a closed solution. We can solve it graphically by plotting the right hand side of (3.32) as a function of S for various values of 〈k〉. To have a nonzero solution, the obtained curve must intersect with the dotted diagonal, representing the left hand side of (3.32). For small 〈k〉 the two curves intersect each other only at S = 0, indicating that for small 〈k〉 the size of the giant component is zero. Only when 〈k〉 exceeds a threshold value does a non-zero solution emerge.

To determine the value of 〈k〉 at which we start having a nonzero solution we take a derivative of (3.32), as the phase transition point is when the r.h.s. of (3.32) has the same derivative as the l.h.s. of (3.32), i.e. when

$$\frac{d}{dS}\left(1 - e^{-\langle k \rangle S}\right) = 1,$$

$$\langle k \rangle e^{-\langle k \rangle S} = 1. \tag{3.33}$$

Setting S = 0, we obtain that the phase transition point is at 〈k〉 = 1 (see also ADVANCED TOPICS 3.F).
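Besides the graphical construction, (3.32) can be solved by fixed-point iteration. The sketch below is one possible approach, not the method of [33]; the function name, starting point S = 1, tolerance and iteration cap are assumptions chosen for illustration. Starting from S = 1 the iterates decrease monotonically toward the largest solution.

```python
import math


def giant_component_fraction(mean_k, tol=1e-12, max_iter=100_000):
    """Iterate S <- 1 - exp(-<k> S), equation (3.32), starting from S = 1."""
    s = 1.0
    for _ in range(max_iter):
        s_next = 1.0 - math.exp(-mean_k * s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s  # not converged within max_iter (slow close to <k> = 1)


fractions = {}
for mean_k in (0.5, 1.5, 3.0):
    fractions[mean_k] = giant_component_fraction(mean_k)
    print(f"<k> = {mean_k}: S = {fractions[mean_k]:.3f}")
```

Convergence becomes very slow near 〈k〉 = 1, which is the numerical counterpart of the two curves becoming tangent at the critical point.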

Figure 3.18

Graphical Solution

(a) The three purple curves correspond to y = 1 - exp[-〈k〉S] for 〈k〉 = 0.5, 1, 1.5. The green dashed diagonal corresponds to y = S, and the intersection of the dashed and purple curves provides the solution to (3.32). For 〈k〉 = 0.5 there is only one intersection at S = 0, indicating the absence of a giant component. The 〈k〉 = 1.5 curve has a solution at S = 0.583 (green vertical line). The 〈k〉 = 1 curve is precisely at the critical point, representing the separation between the regime where a nonzero solution for S exists and the regime where there is only the solution at S = 0.

(b) The size of the giant component as a function of 〈k〉 as predicted by (3.32). After [33].

SECTION 3.15

ADVANCED TOPICS 3.D COMPONENT SIZES

In Figure 3.7 we explored the size of the giant component, leaving an important question open: How many components do we expect for a given 〈k〉? What is their size distribution? The aim of this section is to discuss these topics.

Component Size Distribution

For a random network the probability that a randomly chosen node belongs to a component of size s (which is different from the giant component G) is [33]

$$p_s \sim \frac{(s \langle k \rangle)^{s-1}}{s!}\, e^{-\langle k \rangle s}. \tag{3.34}$$

Replacing 〈k〉^(s-1) with exp[(s-1) ln〈k〉] and using the Stirling formula

$$s! \approx \sqrt{2 \pi s} \left(\frac{s}{e}\right)^s$$

for large s we obtain

$$p_s \sim s^{-3/2}\, e^{-(\langle k \rangle - 1)s + (s-1)\ln \langle k \rangle}. \tag{3.35}$$

Therefore the component size distribution has two contributions: a slowly decreasing power law term s^(-3/2) and a rapidly decreasing exponential term e^(-(〈k〉-1)s+(s-1)ln〈k〉). Given that the exponential term dominates for large s, (3.35) predicts that large components are prohibited. At the critical point, 〈k〉 = 1, all terms in the exponential cancel, hence ps follows the power law

$$p_s \sim s^{-3/2}. \tag{3.36}$$

As a power law decreases relatively slowly, at the critical point we expect to observe clusters of widely different sizes, a property consistent with the behavior of a system during a phase transition (ADVANCED TOPICS 3.F). These predictions are supported by the numerical simulations shown in Figure 3.19.

Figure 3.19

Component Size Distribution

Component size distribution ps in a random network, excluding the giant component.

(a)-(c) ps for different 〈k〉 values (〈k〉 = 1/2, 1 and 3) and N (N = 10², 10³, 10⁴ and ∞), indicating that ps converges for large N to the prediction (3.34).

(d) ps for N = 10⁴, shown for different 〈k〉. While for 〈k〉 < 1 and 〈k〉 > 1 the ps distribution has an exponential form, right at the critical point 〈k〉 = 1 the distribution follows the power law (3.36). The continuous green lines correspond to (3.35). The first numerical study of the component size distribution in random networks was carried out in 1998 [34], preceding the exploding interest in complex networks.
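The competition between the power law and the exponential factor in (3.34)-(3.35) can be made concrete by evaluating ln ps directly. The sketch below is illustrative: the function names are hypothetical, proportionality constants are dropped as in the text, and the slope is estimated between two arbitrarily chosen sizes, s = 100 and s = 1000.

```python
import math


def log_ps(s, mean_k):
    """Logarithm of the r.h.s. of (3.34): (s<k>)^(s-1) e^(-<k> s) / s!."""
    return (s - 1) * math.log(s * mean_k) - mean_k * s - math.lgamma(s + 1)


def log_ps_stirling(s, mean_k):
    """Logarithm of the r.h.s. of (3.35), up to an additive constant."""
    return -1.5 * math.log(s) - (mean_k - 1) * s + (s - 1) * math.log(mean_k)


def log_log_slope(mean_k, s1=100, s2=1000):
    """Slope of ln p_s versus ln s between sizes s1 and s2."""
    return (log_ps(s2, mean_k) - log_ps(s1, mean_k)) / (math.log(s2) - math.log(s1))


critical_slope = log_log_slope(1.0)
print(f"<k> = 1: log-log slope = {critical_slope:.4f}")
for mean_k in (0.5, 1.0, 3.0):
    print(f"<k> = {mean_k}: ln p_100 = {log_ps(100, mean_k):.2f}")
```

At 〈k〉 = 1 the slope should sit close to the -3/2 of (3.36), while away from the critical point ln ps at s = 100 is far more negative, reflecting the exponential cutoff.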

Average Component Size

The calculations also indicate that the average component size (once again, excluding the giant component) follows [33]

$$\langle s \rangle = \frac{1}{1 - \langle k \rangle + \langle k \rangle N_G / N}. \tag{3.37}$$

For 〈k〉 < 1 we lack a giant component (NG = 0), hence (3.37) becomes

$$\langle s \rangle = \frac{1}{1 - \langle k \rangle}, \tag{3.38}$$

which diverges when the average degree approaches the critical point 〈k〉 = 1. Therefore as we approach the critical point, the size of the clusters increases, signaling the emergence of the giant component at 〈k〉 = 1. Numerical simulations support these predictions for large N (Figure 3.20).

To determine the average component size for 〈k〉 > 1 using (3.37), we need to first calculate the size of the giant component. This can be done in a self-consistent manner, obtaining that the average cluster size decreases for 〈k〉 > 1, as most clusters are gradually absorbed by the giant component.

Note that (3.37) predicts the size of the component to which a randomly chosen node belongs. This is a biased measure, as the chance of belonging to a larger cluster is higher than the chance of belonging to a smaller one. The bias is linear in the cluster size s. If we correct for this bias, we obtain the average size of the small components that we would get if we were to inspect each cluster one by one and then measure their average size [33]

$$\langle s' \rangle = \frac{2}{2 - \langle k \rangle + \langle k \rangle N_G / N}. \tag{3.39}$$

Figure 3.20 offers numerical support for (3.39).

Figure 3.20

Average Component Size

(a) The average size 〈s〉 of a component to which a randomly chosen node belongs, as predicted by (3.37) (purple). The green curve shows the overall average size 〈s'〉 of a component as predicted by (3.39). (After [33]).

(b) The average cluster size in a random network for N = 10², 10³, 10⁴, compared with the theory. We choose a node and determine the size of the cluster it belongs to. This measure is biased, as each component of size s will be counted s times. The larger N becomes, the more closely the numerical data follows the prediction (3.37). As predicted, 〈s〉 diverges at the 〈k〉 = 1 critical point, supporting the existence of a phase transition (ADVANCED TOPICS 3.F).

(c) The average cluster size in a random network, where we corrected for the bias in (b) by selecting each component only once. The larger N becomes, the more closely the numerical data follows the prediction (3.39).
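The self-consistent evaluation of (3.37) and (3.39) can be sketched as follows, assuming that NG/N is obtained by iterating S = 1 - exp(-〈k〉S); the function names and the fixed iteration count are illustrative assumptions, and very close to 〈k〉 = 1 the fixed number of iterations would not be enough.

```python
import math


def giant_fraction(mean_k, iterations=10_000):
    """Approximate S = N_G / N by iterating S <- 1 - exp(-<k> S) from S = 1."""
    s = 1.0
    for _ in range(iterations):
        s = 1.0 - math.exp(-mean_k * s)
    return s


def avg_size_per_node(mean_k):
    """<s> of (3.37): size of the component of a randomly chosen node."""
    return 1.0 / (1.0 - mean_k + mean_k * giant_fraction(mean_k))


def avg_size_per_component(mean_k):
    """<s'> of (3.39): average over components, each counted once."""
    return 2.0 / (2.0 - mean_k + mean_k * giant_fraction(mean_k))


for mean_k in (0.5, 0.9, 2.0):
    print(
        f"<k> = {mean_k}: <s> = {avg_size_per_node(mean_k):.3f}, "
        f"<s'> = {avg_size_per_component(mean_k):.3f}"
    )
```

Below the critical point the first function reduces to (3.38), so 〈s〉 grows as 〈k〉 approaches 1 and then falls again once the giant component starts absorbing the small clusters.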

SECTION 3.16

ADVANCED TOPICS 3.E FULLY CONNECTED REGIME

To determine the value of 〈k〉 at which most nodes become part of the giant component, we calculate the probability that a randomly selected node does not have a link to the giant component, which is (1 - p)^NG ≈ (1 - p)^N, as in this regime NG ≃ N. The expected number of such isolated nodes is

$$I_N = N(1-p)^N = N\left(1 - \frac{N \cdot p}{N}\right)^N \approx N e^{-Np}, \tag{3.40}$$

where we used (1 - x/n)^n ≈ e^(-x), an approximation valid for large n. If we make p sufficiently large, we arrive at the point where only one node is disconnected from the giant component. At this point IN = 1, hence according to (3.40) p needs to satisfy N e^(-Np) = 1. Consequently, the value of p at which we are about to enter the fully connected regime is

$$p = \frac{\ln N}{N}, \tag{3.41}$$

which leads to (3.14) in terms of 〈k〉.
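A quick numerical check of (3.40) and (3.41) is sketched below; the network size N = 10,000 and the function names are illustrative assumptions. It compares the exact expression N(1 - p)^N with its exponential approximation at the threshold p = ln N / N.

```python
import math


def isolated_nodes_exact(n, p):
    """Expected number of nodes with no link to the giant component, N (1 - p)^N."""
    return n * (1.0 - p) ** n


def isolated_nodes_approx(n, p):
    """Large-N approximation N e^(-N p) from (3.40)."""
    return n * math.exp(-n * p)


n = 10_000
p_connected = math.log(n) / n  # threshold (3.41)
exact = isolated_nodes_exact(n, p_connected)
approx = isolated_nodes_approx(n, p_connected)
print(f"p = ln N / N = {p_connected:.6f}")
print(f"<k> = p (N - 1) = {p_connected * (n - 1):.3f}, ln N = {math.log(n):.3f}")
print(f"I_N exact = {exact:.4f}, I_N approx = {approx:.4f}")
```

Both expressions should give roughly one isolated node at this value of p, and the corresponding average degree sits just below ln N.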

SECTION 3.17

ADVANCED TOPICS 3.F PHASE TRANSITIONS

The emergence of the giant component at 〈k〉 = 1 in the random network model is reminiscent of a phase transition, a much studied phenomenon in physics and chemistry [35]. Consider two examples:

i. Water-Ice Transition (Figure 3.21a): At high temperatures the H2O molecules engage in a diffusive motion, forming small groups and then breaking apart to group up with other water molecules. If cooled, at 0˚C the molecules suddenly stop this diffusive dance, forming an ordered rigid ice crystal.

ii. Magnetism (Figure 3.21b): In ferromagnetic metals like iron at high temperatures the spins point in randomly chosen directions. Below some critical temperature Tc all atoms orient their spins in the same direction and the metal turns into a magnet.

The freezing of a liquid and the emergence of magnetization are examples of phase transitions, representing transitions from disorder to order. Indeed, relative to the perfect order of the crystalline ice, liquid water is rather disordered. Similarly, the randomly oriented spins in a ferromagnet take up the highly ordered common orientation below Tc.

Many properties of a system undergoing a phase transition are universal. This means that the same quantitative patterns are observed in a wide range of systems, from magma freezing into rock to a ceramic material turning into a superconductor. Furthermore, near the phase transition point, called the critical point, many quantities of interest follow power-laws.

The phenomena observed near the critical point 〈k〉 = 1 in a random network are in many ways similar to a phase transition:

• The similarity between Figure 3.7a and the magnetization diagram of Figure 3.21b is not accidental: they both show a transition from disorder to order. In random networks this corresponds to the emergence of a giant component when 〈k〉 exceeds 〈k〉 = 1.

• As we approach the freezing point, ice crystals of widely different sizes are observed, and so are domains of atoms with spins pointing in the same direction. The size distribution of the ice crystals or magnetic domains follows a power law. Similarly, while for 〈k〉 < 1 and 〈k〉 > 1 the cluster sizes follow an exponential distribution, right at the phase transition point ps follows the power law (3.36), indicating the coexistence of components of widely different sizes.

• At the critical point the average size of the ice crystals or of the magnetic domains diverges, assuring that the whole system turns into a single frozen ice crystal or that all spins point in the same direction. Similarly in a random network the average cluster size 〈s〉 diverges as we approach 〈k〉 = 1 (Figure 3.20).

Figure 3.21

Phase Transitions

(a) Water-Ice Phase Transition
The hydrogen bonds that hold the water molecules together (dotted lines) are weak, constantly breaking up and re-forming, maintaining partially ordered local structures (left panel). The temperature-pressure phase diagram (center panel, showing the solid, liquid and gas phases, with 0°C and 100°C marked at a pressure of 1.0 atm) indicates that by lowering the temperature, the water undergoes a phase transition, moving from a liquid (purple) to a frozen solid (green) phase. In the solid phase each water molecule binds rigidly to four other molecules, forming an ice lattice (right panel). After http://www.lbl.gov/Science-Articles/Archive/sabl/2005/February/water-solid.html.

(b) Magnetic Phase Transition
In ferromagnetic materials the magnetic moments of the individual atoms (spins) can point in two different directions. At high temperatures they choose their direction randomly (right panel). In this disordered state the system’s total magnetization (m = ∆M/N, where ∆M is the number of up spins minus the number of down spins) is zero. The phase diagram (middle panel, showing m versus T with the ordered phase below Tc and the disordered phase above it) indicates that by lowering the temperature T, the system undergoes a phase transition at T = Tc, when a nonzero magnetization emerges. Lowering T further allows m to converge to one. In this ordered phase all spins point in the same direction (left panel).

SECTION 3.18

ADVANCED TOPICS 3.G SMALL WORLD CORRECTIONS

Equation (3.18) offers only an approximation to the network diameter, valid for very large N and small d. Indeed, as soon as ⟨k⟩^d approaches the system size N the ⟨k⟩^d scaling must break down, as we do not have enough nodes to continue the ⟨k⟩^d expansion. Such finite size effects result in corrections to (3.18). For a random network with average degree ⟨k⟩, the network diameter is better approximated by [36]

$$ d_{max} = \frac{\ln N}{\ln \langle k \rangle} + \frac{2 \ln N}{\ln[-W(\langle k \rangle \exp(-\langle k \rangle))]}, \qquad (3.42) $$

where the Lambert W-function W(z) is the principal inverse of f(z) = z exp(z). The first term on the r.h.s. is (3.18), while the second is the correction that depends on the average degree. The correction increases the diameter, accounting for the fact that when we approach the network’s diameter the number of nodes must grow slower than ⟨k⟩. The magnitude of the correction becomes more obvious if we consider the various limits of (3.42). In the ⟨k⟩ → 1 limit we can calculate the Lambert W-function, finding for the diameter [36]

$$ d_{max} = 3 \frac{\ln N}{\ln \langle k \rangle}. \qquad (3.43) $$

Hence at the moment when the giant component emerges the network diameter is three times our prediction (3.18). This is due to the fact that at the critical point ⟨k⟩ = 1 the network has a tree-like structure, consisting of long chains with hardly any loops, a configuration that increases d_max. In the ⟨k⟩ → ∞ limit, corresponding to a very dense network, (3.42) becomes

$$ d_{max} = \frac{\ln N}{\ln \langle k \rangle} + \frac{2 \ln N}{\langle k \rangle} + \ln N \left( \frac{\ln \langle k \rangle}{\langle k \rangle^{2}} \right). \qquad (3.44) $$

Hence if ⟨k⟩ increases, the second and the third terms vanish and the solution (3.42) converges to the result (3.18).


SECTION 3.19

BIBLIOGRAPHY

[1] A.-L. Barabási. Linked: The new science of networks. Plume Books, 2003.
[2] P. Erdős and A. Rényi. On random graphs, I. Publicationes Mathematicae (Debrecen), 6:290-297, 1959.
[3] P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 5:17-61, 1960.
[4] P. Erdős and A. Rényi. On the evolution of random graphs. Bull. Inst. Internat. Statist., 38:343-347, 1961.
[5] P. Erdős and A. Rényi. On the strength of connectedness of a random graph. Acta Math. Acad. Sci. Hungary, 12:261-267, 1961.
[6] P. Erdős and A. Rényi. Asymmetric graphs. Acta Mathematica Acad. Sci. Hungarica, 14:295-315, 1963.
[7] P. Erdős and A. Rényi. On random matrices. Publ. Math. Inst. Hung. Acad. Sci., 8:455-461, 1966.
[8] P. Erdős and A. Rényi. On the existence of a factor of degree one of a connected random graph. Acta Math. Acad. Sci. Hungary, 17:359-368, 1966.
[9] P. Erdős and A. Rényi. On random matrices II. Studia Sci. Math. Hungary, 13:459-464, 1968.
[10] E. N. Gilbert. Random graphs. The Annals of Mathematical Statistics, 30:1141-1144, 1959.
[11] R. Solomonoff and A. Rapoport. Connectivity of random nets. Bulletin of Mathematical Biology, 13:107-117, 1951.
[12] P. Hoffman. The Man Who Loved Only Numbers: The Story of Paul Erdős and the Search for Mathematical Truth. Hyperion Books, 1998.
[13] B. Schechter. My Brain is Open: The Mathematical Journeys of Paul Erdős. Simon and Schuster, 1998.
[14] G. P. Csicsery. N is a Number: A Portrait of Paul Erdős, 1993.
[15] B. Bollobás. Random Graphs. Cambridge University Press, 2001.
[16] L. C. Freeman and C. R. Thompson. Estimating Acquaintanceship. Volume, pg. 147-158, in The Small World, Edited by Manfred Kochen (Ablex, Norwood, NJ), 1989.
[17] H. Rosenthal. Acquaintances and contacts of Franklin Roosevelt. Unpublished thesis. Massachusetts Institute of Technology, 1960.
[18] L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four degrees of separation. In ACM Web Science 2012: Conference Proceedings, pages 45-54. ACM Press, 2012.
[19] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74:47-97, 2002.
[20] I. de Sola Pool and M. Kochen. Contacts and Influence. Social Networks, 1:5-51, 1978.
[21] H. Jeong, R. Albert and A.-L. Barabási. Internet: Diameter of the world-wide web. Nature, 401:130-131, 1999.
[22] S. Lawrence and C. L. Giles. Accessibility of information on the Web. Nature, 400:107, 1999.
[23] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. Computer Networks, 33:309-320, 2000.
[24] S. Milgram. The Small World Problem. Psychology Today, 2:60-67, 1967.
[25] J. Travers and S. Milgram. An Experimental Study of the Small World Problem. Sociometry, 32:425-443, 1969.
[26] K. Frigyes, “Láncszemek,” in Minden másképpen van (Budapest: Atheneum Irodai es Nyomdai R.-T. Kiadása, 1929), 85-90. English translation is available in [27].
[27] M. Newman, A.-L. Barabási, and D. J. Watts. The Structure and Dynamics of Networks. Princeton University Press, 2006.
[28] J. Guare. Six degrees of separation. Dramatist Play Service, 1992.
[29] D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:409-10, 1998.
[30] T. S. Kuhn. The Structure of Scientific Revolutions. University of Chicago Press, 1962.
[31] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509-512, 1999.
[32] A.-L. Barabási, R. Albert, and H. Jeong. Mean-field theory for scale-free random networks. Physica A, 272:173-187, 1999.
[33] M. Newman. Networks: An Introduction. Oxford University Press, 2010.
[34] K. Christensen, R. Donangelo, B. Koiller, and K. Sneppen. Evolution of Random Networks. Physical Review Letters, 81:2380-2383, 1998.
[35] H. E. Stanley. Introduction to Phase Transitions and Critical Phenomena. Oxford University Press, 1987.
[36] D. Fernholz and V. Ramachandran. The diameter of sparse random graphs. Random Structures and Algorithms, 31:482-516, 2007.

4 ALBERT-LÁSZLÓ BARABÁSI

NETWORK SCIENCE THE SCALE-FREE PROPERTY

ACKNOWLEDGEMENTS

MÁRTON PÓSFAI GABRIELE MUSELLA MAURO MARTINO ROBERTA SINATRA

SARAH MORRISON AMAL HUSSEINI PHILIPP HOEVEL

INDEX

1. Introduction
2. Power Laws and Scale-Free Networks
3. Hubs
4. The Meaning of Scale-Free
5. Universality
6. Ultra-Small Property
7. The Role of the Degree Exponent
8. Generating Networks with Arbitrary Degree Distribution
9. Summary
10. Homework
11. ADVANCED TOPICS 4.A Power Laws
12. ADVANCED TOPICS 4.B Plotting Power-laws
13. ADVANCED TOPICS 4.C Estimating the Degree Exponent
14. Bibliography

Figure 4.0 (cover image)
“Art and Networks” by Tomás Saraceno
Tomás Saraceno creates art inspired by spider webs and neural networks. Trained as an architect, he deploys insights from engineering, physics, chemistry, aeronautics, and materials science, using networks as a source of inspiration and metaphor. The image shows his work displayed in the Miami Art Museum, an example of the artist’s take on complex networks.

This book is licensed under a Creative Commons: CC BY-NC-SA 2.0. PDF V53 09.09.2014

SECTION 4.1

INTRODUCTION

The World Wide Web is a network whose nodes are documents and whose links are the uniform resource locators (URLs) that allow us to “surf” with a click from one web document to the other. With an estimated size of over one trillion documents (N ≈ 10^12), the Web is the largest network humanity has ever built. It exceeds in size even the human brain (N ≈ 10^11 neurons).

It is difficult to overstate the importance of the World Wide Web in our daily life. Similarly, we cannot exaggerate the role the WWW played in the development of network theory: it facilitated the discovery of a number of fundamental network characteristics and became a standard testbed for most network measures.

We can use software called a crawler to map out the Web’s wiring diagram. A crawler can start from any web document, identifying the links (URLs) on it. Next it downloads the documents these links point to and identifies the links on these documents, and so on. This process iteratively returns a local map of the Web. Search engines like Google or Bing operate crawlers to find and index new documents and to maintain a detailed map of the WWW.

Online Resource 4.1
Zooming into the World Wide Web
Watch an online video that zooms into the WWW sample that has led to the discovery of the scale-free property [1]. This is the network featured in Table 2.1 and shown in Figure 4.1, whose characteristics are tested throughout this book.

The first map of the WWW obtained with the explicit goal of understanding the structure of the network behind it was generated by Hawoong Jeong at the University of Notre Dame. He mapped out the nd.edu domain [1], consisting of about 300,000 documents and 1.5 million links (Online Resource 4.1). The purpose of the map was to compare the properties of the Web graph to the random network model. Indeed, in 1998 there were reasons to believe that the WWW could be well approximated by a random network. The content of each document reflects the personal and professional interests of its creator, from individuals to organizations. Given the diversity of these interests, the links on these documents might appear to point to randomly chosen documents.

A quick look at the map in Figure 4.1 supports this view: There appears to be considerable randomness behind the Web’s wiring diagram. Yet, a closer inspection reveals some puzzling differences between this map and a random network. Indeed, in a random network highly connected nodes, or hubs, are effectively forbidden. In contrast, in Figure 4.1 numerous small-degree nodes coexist with a few hubs, nodes with an exceptionally large number of links.

In this chapter we show that hubs are not unique to the Web, but we encounter them in most real networks. They represent a signature of a deeper organizing principle that we call the scale-free property. We therefore explore the degree distribution of real networks, which allows us to uncover and characterize scale-free networks. The analytical and empirical results discussed here represent the foundations of the modeling efforts the rest of this book is based on. Indeed, we will come to see that no matter what network property we are interested in, from communities to spreading processes, it must be inspected in the light of the network’s degree distribution.
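The crawling procedure described in this section can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: it does not touch the real Web, the dictionary TOY_WEB and the helper toy_fetch are hypothetical stand-ins for downloading a document and extracting its URLs, and the breadth-first order is one possible choice of traversal.

```python
from collections import deque

# Hypothetical miniature "web": each document maps to the URLs found on it.
TOY_WEB = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html", "d.html"],
    "d.html": [],
    "e.html": ["a.html"],  # no crawled document links here, so it is never found
}

def toy_fetch(doc):
    """Stand-in for downloading a document and identifying the links on it."""
    return TOY_WEB.get(doc, [])

def crawl(start, fetch_links):
    """Breadth-first crawl returning the documents found and the directed links."""
    seen = {start}
    links = []
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        for target in fetch_links(doc):
            links.append((doc, target))
            if target not in seen:  # visit each document only once
                seen.add(target)
                queue.append(target)
    return seen, links

nodes, links = crawl("a.html", toy_fetch)

# In- and out-degrees of the local map returned by the crawl.
in_degree = {doc: 0 for doc in nodes}
out_degree = {doc: 0 for doc in nodes}
for source, target in links:
    out_degree[source] += 1
    in_degree[target] += 1
```

Because the crawl only follows outgoing links, a document that nothing points to (e.html in the toy data) stays invisible, which is why a crawl returns a local map rather than the full network.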

Figure 4.1 The Topology of the World Wide Web

Snapshots of the World Wide Web sample mapped out by Hawoong Jeong in 1998 [1]. The sequence of images shows an increasingly magnified local region of the network. The first panel displays all 325,729 nodes, offering a global view of the full dataset. Nodes with more than 50 links are shown in red and nodes with more than 500 links in purple. The closeups reveal the presence of a few highly connected nodes, called hubs, that accompany scale-free networks. Courtesy of M. Martino.

SECTION 4.2

POWER LAWS AND SCALE-FREE NETWORKS

If the WWW were to be a random network, the degrees of the Web documents should follow a Poisson distribution. Yet, as Figure 4.2 indicates, the Poisson form offers a poor fit for the WWW’s degree distribution. Instead, on a log-log scale the data points form an approximate straight line, suggesting that the degree distribution of the WWW is well approximated with

$$ p_k \sim k^{-\gamma}. \qquad (4.1) $$

Equation (4.1) is called a power law distribution and the exponent γ is its degree exponent (BOX 4.1). If we take a logarithm of (4.1), we obtain

$$ \log p_k \sim -\gamma \log k. \qquad (4.2) $$

If (4.1) holds, log p_k is expected to depend linearly on log k, the slope of this line being −γ, the negative of the degree exponent (Figure 4.2).
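To make the linear relation (4.2) concrete, the following sketch builds an exact, noise-free power law and recovers the exponent from the slope of log p_k versus log k. The function name loglog_slope and the range of degrees are illustrative assumptions; the plain least-squares fit is used here only because the synthetic data lie exactly on a line, and it should not be read as a recommended procedure for empirical degree distributions.

```python
import math

def loglog_slope(ks, pks):
    """Least-squares slope of log(p_k) plotted against log(k)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in pks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

gamma = 2.1                       # illustrative exponent, the value quoted for the WWW in-degrees
ks = list(range(1, 1001))         # assumed range of degrees
pks = [k ** -gamma for k in ks]   # unnormalized power law; a constant prefactor does not change the slope

slope = loglog_slope(ks, pks)     # equals -gamma up to rounding
gamma_estimate = -slope
```

The normalization constant only shifts the line vertically on the log-log plot, so it plays no role in the slope.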

Figure 4.2
The Degree Distribution of the WWW

The incoming (a) and outgoing (b) degree distribution of the WWW sample mapped in the 1999 study of Albert et al. [1]. The degree distribution is shown on double logarithmic axes (log-log plot), in which a power law follows a straight line. The symbols correspond to the empirical data and the line corresponds to the power-law fit, with degree exponents γ_in = 2.1 and γ_out = 2.45. We also show as a green line the degree distribution predicted by a Poisson function with the average degree ⟨k_in⟩ = ⟨k_out⟩ = 4.60 of the WWW sample.
Axes: (a) p_{k_in} versus k_in and (b) p_{k_out} versus k_out, with k ranging from 10^0 to 10^5 and p_k from 10^-10 to 10^0.

The WWW is a directed network, hence each document is characterized by an out-degree k_out, representing the number of links that point from the document to other documents, and an in-degree k_in, representing the number of other documents that point to the selected document. We must therefore distinguish two degree distributions: the probability that a randomly chosen document points to k_out web documents, or p_{k_out}, and the probability that a randomly chosen node has k_in web documents pointing to it, or p_{k_in}. In the case of the WWW both p_{k_in} and p_{k_out} can be approximated by a power law

$$ p_{k_{in}} \sim k^{-\gamma_{in}}, \qquad (4.3) $$

$$ p_{k_{out}} \sim k^{-\gamma_{out}}, \qquad (4.4) $$

where γ_in and γ_out are the degree exponents for the in- and out-degrees, respectively (Figure 4.2). In general γ_in can differ from γ_out. For example, in Figure 4.2 we have γ_in ≈ 2.1 and γ_out ≈ 2.45.

The empirical results shown in Figure 4.2 document the existence of a network whose degree distribution is quite different from the Poisson distribution characterizing random networks. We will call such networks scale-free, defined as [2]:

A scale-free network is a network whose degree distribution follows a power law.

As Figure 4.2 indicates, for the WWW the power law persists for almost four orders of magnitude, prompting us to call the Web graph a scale-free network. In this case the scale-free property applies to both in- and out-degrees. To better understand the scale-free property, we have to define the power-law distribution in more precise terms. Therefore next we discuss the discrete and the continuum formalisms used throughout this book.

Discrete Formalism
As node degrees are non-negative integers, k = 0, 1, 2, ..., the discrete formalism provides the probability p_k that a node has exactly k links

$$ p_k = C k^{-\gamma}. \qquad (4.5) $$

The constant C is determined by the normalization condition

$$ \sum_{k=1}^{\infty} p_k = 1. \qquad (4.6) $$

Using (4.5) we obtain

$$ C \sum_{k=1}^{\infty} k^{-\gamma} = 1, $$

hence

$$ C = \frac{1}{\sum_{k=1}^{\infty} k^{-\gamma}} = \frac{1}{\zeta(\gamma)}, \qquad (4.7) $$

where ζ(γ) is the Riemann zeta function. Thus for k > 0 the discrete power-law distribution has the form

$$ p_k = \frac{k^{-\gamma}}{\zeta(\gamma)}. \qquad (4.8) $$

Note that (4.8) diverges at k = 0. If needed, we can separately specify p_0, representing the fraction of nodes that have no links to other nodes. In that case the calculation of C in (4.7) needs to incorporate p_0.

Continuum Formalism
In analytical calculations it is often convenient to assume that the degrees can have any positive real value. In this case we write the power-law degree distribution as

$$ p(k) = C k^{-\gamma}. \qquad (4.9) $$

Using the normalization condition

$$ \int_{k_{min}}^{\infty} p(k)\,dk = 1 \qquad (4.10) $$

we obtain

$$ C = \frac{1}{\int_{k_{min}}^{\infty} k^{-\gamma}\,dk} = (\gamma - 1)\,k_{min}^{\gamma - 1}. \qquad (4.11) $$

Therefore in the continuum formalism the degree distribution has the form

$$ p(k) = (\gamma - 1)\,k_{min}^{\gamma - 1}\,k^{-\gamma}. \qquad (4.12) $$

Here k_min is the smallest degree for which the power law (4.8) holds.

Note that p_k encountered in the discrete formalism has a precise meaning: it is the probability that a randomly selected node has degree k. In contrast, only the integral of p(k) encountered in the continuum formalism has a physical interpretation:

$$ \int_{k_1}^{k_2} p(k)\,dk \qquad (4.13) $$

is the probability that a randomly chosen node has degree between k_1 and k_2.
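The two formalisms can be compared numerically. The sketch below is a minimal illustration, not a reference implementation: the zeta function is approximated by a direct sum plus an integral estimate of the tail (an assumption made to stay within the Python standard library), and the function names zeta, p_discrete and prob_between are hypothetical.

```python
import math

def zeta(gamma, terms=10_000):
    """Approximate the Riemann zeta function for gamma > 1.

    The first `terms` terms are summed directly; the remaining tail is
    estimated by the integral of k**(-gamma) from terms + 0.5 to infinity.
    """
    head = sum(k ** -gamma for k in range(1, terms + 1))
    tail = (terms + 0.5) ** (1.0 - gamma) / (gamma - 1.0)
    return head + tail

def p_discrete(k, gamma):
    """Discrete power law (4.8), defined for k >= 1."""
    return k ** -gamma / zeta(gamma)

def prob_between(k1, k2, gamma, k_min=1.0):
    """Continuum formalism: the integral (4.13) of p(k) in (4.12) from k1 to k2."""
    return k_min ** (gamma - 1.0) * (k1 ** (1.0 - gamma) - k2 ** (1.0 - gamma))

gamma = 2.1                                     # illustrative exponent
C_discrete = 1.0 / zeta(gamma)                  # normalization (4.7)
C_continuum = (gamma - 1.0) * 1.0 ** (gamma - 1.0)  # normalization (4.11) with k_min = 1

# Probability of a degree between 10 and 100 in the continuum formalism.
share_10_to_100 = prob_between(10.0, 100.0, gamma)
```

Integrating from k_min to infinity with prob_between returns one, which is the normalization condition (4.10).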

In summary, networks whose degree distribution follows a power law are called scale-free networks. If a network is directed, the scale-free property applies separately to the in- and the out-degrees. To mathematically study the properties of scale-free networks, we can use either the discrete or the continuum formalism. The scale-free property is independent of the formalism we use.


BOX 4.1
THE 80/20 RULE AND THE TOP ONE PERCENT

Vilfredo Pareto, a 19th century economist, noticed that in Italy a few wealthy individuals earned most of the money, while the majority of the population earned rather small amounts. He connected this disparity to the observation that incomes follow a power law, representing the first known report of a power-law distribution [3]. His finding entered the popular literature as the 80/20 rule: Roughly 80 percent of money is earned by only 20 percent of the population.

The 80/20 rule emerges in many areas. For example, in management it is often stated that 80 percent of profits are produced by only 20 percent of the employees. Similarly, 80 percent of decisions are made during 20 percent of meeting time. The 80/20 rule is present in networks as well: 80 percent of links on the Web point to only 15 percent of webpages; 80 percent of citations go to only 38 percent of scientists; 80 percent of links in Hollywood are connected to 30 percent of actors [4]. Most quantities following a power law distribution obey the 80/20 rule.

During the 2009 economic crisis power laws gained a new meaning: The Occupy Wall Street Movement drew attention to the fact that in the US 1% of the population earns a disproportionate 15% of the total US income. This 1% phenomenon, a signature of a profound income disparity, is again a consequence of the power-law nature of the income distribution.

Figure 4.3
Vilfredo Federico Damaso Pareto (1848 – 1923)
Italian economist, political scientist, and philosopher, who made important contributions to our understanding of income distribution and to the analysis of individual choices. A number of fundamental principles are named after him, like Pareto efficiency, Pareto distribution (another name for a power-law distribution), and the Pareto principle (or 80/20 law).

SECTION 4.3

HUBS

The main difference between a random and a scale-free network comes in the tail of the degree distribution, representing the high-k region of p_k. To illustrate this, in Figure 4.4 we compare a power law with a Poisson function. We find that:

• For small k the power law is above the Poisson function, indicating that a scale-free network has a large number of small degree nodes, most of which are absent in a random network.

• For k in the vicinity of ⟨k⟩ the Poisson distribution is above the power law, indicating that in a random network there is an excess of nodes with degree k ≈ ⟨k⟩.

• For large k the power law is again above the Poisson curve. The difference is particularly visible if we show p_k on a log-log plot (Figure 4.4b), indicating that the probability of observing a high-degree node, or hub, is several orders of magnitude higher in a scale-free than in a random network.

Let us use the WWW to illustrate the magnitude of these differences. The probability to have a node with k = 100 is about p_100 ≈ 10^-94 in a Poisson distribution while it is about p_100 ≈ 4×10^-5 if p_k follows the power law (4.8) with γ = 2.1. Consequently, if the WWW were to be a random network with ⟨k⟩ = 4.6 and size N ≈ 10^12, we would expect

$$ N_{k \geq 100} = 10^{12} \sum_{k=100}^{\infty} \frac{(4.6)^{k}}{k!}\,e^{-4.6} \simeq 10^{-82} \qquad (4.14) $$

nodes with at least 100 links, or effectively none. In contrast, given the WWW’s power law degree distribution, with γ_in = 2.1 we have N_{k≥100} ≈ 4×10^9, i.e. roughly four billion nodes with degree k ≥ 100.
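The orders of magnitude quoted above can be checked with a short calculation. The sketch below is an illustration under stated assumptions: the power law is taken in the discrete form (4.8) with k_min = 1, the zeta sums are approximated by a direct sum plus an integral estimate of the remainder, and the Poisson tail in (4.14) is represented by its dominant k = 100 term. The helper names log10_poisson_pmf and zeta_tail are hypothetical.

```python
import math

def log10_poisson_pmf(k, mean):
    """log10 of the Poisson probability exp(-mean) * mean**k / k!, via lgamma to avoid underflow."""
    return (-mean + k * math.log(mean) - math.lgamma(k + 1)) / math.log(10)

def zeta_tail(gamma, k_start, terms=10_000):
    """Approximate the sum of k**(-gamma) over k >= k_start.

    `terms` terms are summed directly; the rest is estimated by an integral.
    """
    stop = k_start + terms
    head = sum(k ** -gamma for k in range(k_start, stop))
    tail = (stop - 0.5) ** (1.0 - gamma) / (gamma - 1.0)
    return head + tail

N = 1e12       # estimated size of the WWW used in the text
mean_k = 4.6   # average degree of the WWW sample
gamma = 2.1    # in-degree exponent

# Random network: successive Poisson terms shrink by a factor of about 4.6/101,
# so the k = 100 term sets the order of magnitude of the sum in (4.14).
log10_p100_poisson = log10_poisson_pmf(100, mean_k)
log10_expected_hubs_random = math.log10(N) + log10_p100_poisson

# Scale-free network in the discrete formalism (4.8).
zeta_gamma = zeta_tail(gamma, 1)
p100_power_law = 100 ** -gamma / zeta_gamma
expected_hubs_scale_free = N * zeta_tail(gamma, 100) / zeta_gamma
```

Working with logarithms for the Poisson case is a design choice: the probabilities involved are far smaller than anything that is convenient to inspect directly, while their base-10 logarithms are easy to compare with the exponents quoted in the text.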

Figure 4.4
Poisson vs. Power-law Distributions

(a) Comparing a Poisson function with a power-law function (γ = 2.1) on a linear plot. Both distributions have ⟨k⟩ = 11. Axes: p_k (0 to 0.15) versus k (0 to 50).

(b) The same curves as in (a), but shown on a log-log plot, allowing us to inspect the difference between the two functions in the high-k regime. Axes: p_k (10^-6 to 10^0) versus k (10^0 to 10^3).

(c) A random network with ⟨k⟩ = 3 and N = 50, illustrating that most nodes have comparable degree k ≈ ⟨k⟩.

(d) A scale-free network with γ = 2.1 and ⟨k⟩ = 3, illustrating that numerous small-degree nodes coexist with a few highly connected hubs. The size of each node is proportional to its degree.

The Largest Hub
All real networks are finite. The size of the WWW is estimated to be N ≈ 10^12 nodes; the size of the social network is the Earth’s population, about N ≈ 7 × 10^9. These numbers are huge, but finite. Other networks pale in comparison: The genetic network in a human cell has approximately 20,000 genes while the metabolic network of the E. coli bacterium has only about a thousand metabolites. This prompts us to ask: How does the network size affect the size of its hubs? To answer this we calculate the maximum degree, k_max, called the natural cutoff of the degree distribution p_k. It represents the expected size of the largest hub in a network.

It is instructive to perform the calculation first for the exponential distribution

$$ p(k) = C e^{-\lambda k}. $$

For a network with minimum degree k_min the normalization condition

$$ \int_{k_{min}}^{\infty} p(k)\,dk = 1 \qquad (4.15) $$

provides C = λ e^{λ k_min}. To calculate k_max we assume that in a network of N nodes we expect at most one node in the (k_max, ∞) regime (ADVANCED TOPICS 3.B). In other words, the probability to observe a node whose degree exceeds k_max is 1/N:

$$ \int_{k_{max}}^{\infty} p(k)\,dk = \frac{1}{N}. \qquad (4.16) $$

Equation (4.16) yields

$$ k_{max} = k_{min} + \frac{\ln N}{\lambda}. \qquad (4.17) $$

As ln N is a slow function of the system size, (4.17) tells us that the maximum degree will not be significantly different from k_min. For a Poisson degree distribution the calculation is a bit more involved, but the obtained dependence of k_max on N is even slower than the logarithmic dependence predicted by (4.17) (ADVANCED TOPICS 3.B).

For a scale-free network, according to (4.12) and (4.16), the natural cutoff follows

$$ k_{max} = k_{min}\,N^{\frac{1}{\gamma - 1}}. \qquad (4.18) $$

Hence the larger a network, the larger is the degree of its biggest hub. The polynomial dependence of k_max on N implies that in a large scale-free network there can be orders of magnitude differences in size between the smallest node, k_min, and the biggest hub, k_max (Figure 4.5).

Figure 4.5
Hubs are Large in Scale-free Networks
The estimated degree of the largest node (natural cutoff) in scale-free and random networks with the same average degree ⟨k⟩ = 3. For the scale-free network we chose γ = 2.5. For comparison, we also show the linear behavior, k_max ∼ N − 1, expected for a complete network. Overall, hubs in a scale-free network are several orders of magnitude larger than the biggest node in a random network with the same N and ⟨k⟩.
Axes: k_max (10^0 to 10^10) versus N (10^2 to 10^12), both logarithmic. Curves: RANDOM NETWORK, k_max ~ ln N; SCALE-FREE, k_max ~ N^{1/(γ−1)}; complete network, k_max ~ N − 1.

To illustrate the difference in the maximum degree of an exponential and a scale-free network let us return to the WWW sample of Figure 4.1, consisting of N ≈ 3 × 10^5 nodes. As k_min = 1, if the degree distribution were to follow an exponential, (4.17) predicts that the maximum degree should be k_max ≈ 14 for λ = 1. In a scale-free network of similar size and γ = 2.1, (4.18) predicts k_max ≈ 95,000, a remarkable difference. Note that the largest in-degree of the WWW map of Figure 4.1 is 10,721, which is comparable to k_max predicted by a scale-free network. This reinforces our conclusion that in a random network hubs are effectively forbidden, while in scale-free networks they are naturally present.
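The two natural cutoffs can be evaluated directly from (4.17) and (4.18). The sketch below is a minimal illustration; the function names kmax_exponential and kmax_scale_free are hypothetical, and the inputs are the values quoted in the text for the WWW sample.

```python
import math

def kmax_exponential(n_nodes, k_min=1.0, lam=1.0):
    """Natural cutoff (4.17) for an exponential distribution p(k) ~ exp(-lam * k)."""
    return k_min + math.log(n_nodes) / lam

def kmax_scale_free(n_nodes, gamma, k_min=1.0):
    """Natural cutoff (4.18) for a power law p(k) ~ k**(-gamma).

    Follows from (4.12) and (4.16): the integral of p(k) from k_max to infinity
    equals (k_max / k_min)**(-(gamma - 1)), which is set equal to 1/N.
    """
    return k_min * n_nodes ** (1.0 / (gamma - 1.0))

N = 3e5  # approximate number of nodes in the WWW sample

kmax_exp = kmax_exponential(N)      # close to 14 for lambda = 1
kmax_sf = kmax_scale_free(N, 2.1)   # close to 95,000 for gamma = 2.1

# Growth with system size: logarithmic versus polynomial.
for n in (1e4, 1e8, 1e12):
    print(f"N={n:.0e}  exponential: {kmax_exponential(n):6.1f}  scale-free: {kmax_scale_free(n, 2.1):.3g}")
```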

In summary, the key difference between a random and a scale-free network is rooted in the different shape of the Poisson and of the power-law function: In a random network most nodes have comparable degrees and hence hubs are forbidden. Hubs are not only tolerated, but are expected in scale-free networks (Figure 4.6). Furthermore, the more nodes a scale-free network has, the larger are its hubs. Indeed, the size of the hubs grows polynomially with network size, hence they can grow quite large in scale-free networks. In contrast, in a random network the size of the largest node grows logarithmically or slower with N, implying that hubs will be tiny even in a very large random network.

Figure 4.6
Random vs. Scale-free Networks

Panel labels: POISSON (most nodes have the same number of links; no highly connected nodes) and POWER LAW (many nodes with only a few links; a few hubs with a large number of links). Both distributions plot the number of nodes with k links against the number of links (k). The maps mark Chicago, Boston and Los Angeles.

(a) The degrees of a random network follow a Poisson distribution, rather similar to a bell curve. Therefore most nodes have comparable degrees and nodes with a large number of links are absent.

(b) A random network looks a bit like the national highway network in which nodes are cities and links are the major highways. There are no cities with hundreds of highways and no city is disconnected from the highway system.

(c) In a network with a power-law degree distribution most nodes have only a few links. These numerous small nodes are held together by a few highly connected hubs.

(d) A scale-free network looks like the air-traffic network, whose nodes are airports and links are the direct flights between them. Most airports are tiny, with only a few flights. Yet, we have a few very large airports, like Chicago or Los Angeles, that act as major hubs, connecting many smaller airports. Once hubs are present, they change the way we navigate the network. For example, if we travel from Boston to Los Angeles by car, we must drive through many cities. On the airplane network, however, we can reach most destinations via a single hub, like Chicago. After [4].

SECTION 4.4

THE MEANING OF SCALE-FREE

The term “scale-free” is rooted in a branch of statistical physics called the theory of phase transitions that extensively explored power laws in the 1960s and 1970s (ADVANCED TOPICS 3.F). To best understand the meaning of the scale-free term, we need to familiarize ourselves with the moments of the degree distribution. The nth moment of the degree distribution is defined as

$$ \langle k^{n} \rangle = \sum_{k_{min}}^{\infty} k^{n} p_k \approx \int_{k_{min}}^{\infty} k^{n} p(k)\,dk. \qquad (4.19) $$

The lower moments have important interpretations:

• n=1: The first moment is the average degree, ⟨k⟩.

• n=2: The second moment, ⟨k²⟩, helps us calculate the variance σ² = ⟨k²⟩ − ⟨k⟩², measuring the spread in the degrees. Its square root, σ, is the standard deviation.

• n=3: The third moment, ⟨k³⟩, determines the skewness of a distribution, telling us how symmetric p_k is around the average ⟨k⟩.

For a scale-free network the nth moment of the degree distribution is

$$ \langle k^{n} \rangle = \int_{k_{min}}^{k_{max}} k^{n} p(k)\,dk = C\,\frac{k_{max}^{\,n-\gamma+1} - k_{min}^{\,n-\gamma+1}}{n-\gamma+1}. \qquad (4.20) $$

While typically k_min is fixed, the degree of the largest hub, k_max, increases with the system size, following (4.18). Hence to understand the behavior of ⟨k^n⟩ we need to take the asymptotic limit k_max → ∞ in (4.20), probing the properties of very large networks. In this limit (4.20) predicts that the value of ⟨k^n⟩ depends on the interplay between n and γ:

• If n − γ + 1 ≤ 0 then the first term on the r.h.s. of (4.20), k_max^{n−γ+1}, goes to zero as k_max increases. Therefore all moments that satisfy n ≤ γ − 1 are finite.

• If n − γ + 1 > 0 then ⟨k^n⟩ goes to infinity as k_max → ∞. Therefore all moments larger than γ − 1 diverge.
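The convergence and divergence of the moments can be seen by evaluating (4.20) for growing k_max. The sketch below is an illustration under stated assumptions: it uses the continuum constant C from (4.11), ignoring the small change in normalization caused by the finite cutoff, it takes k_min = 1, and γ = 2.5 is an arbitrary example value between 2 and 3. The function name moment is hypothetical.

```python
def moment(n, gamma, k_max, k_min=1.0):
    """n-th moment (4.20) of the continuum power law with cutoff k_max.

    Assumes n - gamma + 1 != 0; at equality the integral is logarithmic
    and this closed form does not apply.
    """
    C = (gamma - 1.0) * k_min ** (gamma - 1.0)   # normalization (4.11)
    a = n - gamma + 1.0
    return C * (k_max ** a - k_min ** a) / a

gamma = 2.5  # example exponent: n = 1 satisfies n <= gamma - 1, n = 2 does not

for k_max in (1e2, 1e4, 1e6):
    first = moment(1, gamma, k_max)    # approaches a finite limit
    second = moment(2, gamma, k_max)   # keeps growing with k_max
    print(f"k_max={k_max:.0e}  <k>={first:.4f}  <k^2>={second:.1f}")
```

For this choice of γ the first moment settles near 3 while the second moment grows as the square root of k_max, which is the behavior described by the two cases above.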

For many scale-free networks the degree exponent γ is between 2 and 3 (Table 4.1). Hence for these in the N → ∞ limit the first moment ⟨k⟩ is finite, but the second and higher moments, ⟨k²⟩, ⟨k³⟩, go to infinity. This divergence helps us understand the origin of the “scale-free” term. Indeed, if the degrees follow a normal distribution, then the degree of a randomly chosen node is typically in the range

$$ k = \langle k \rangle \pm \sigma_k. \qquad (4.21) $$

Yet, the average degree ⟨k⟩ and the standard deviation σ_k have rather different magnitudes in random and in scale-free networks:

• Random Networks Have a Scale
For a random network with a Poisson degree distribution σ_k = ⟨k⟩^{1/2}, which is smaller than ⟨k⟩ whenever ⟨k⟩ > 1. Hence the network’s nodes have degrees in the range k = ⟨k⟩ ± ⟨k⟩^{1/2}. In other words, nodes in a random network have comparable degrees and the average degree ⟨k⟩ serves as the “scale” of a random network.

• Scale-free Networks Lack a Scale
For a network with a power-law degree distribution with γ < 3 the first moment is finite but the second moment is infinite. The divergence of ⟨k²⟩ (and of σ_k) for large N indicates that the fluctuations around the average can be arbitrarily large. This means that when we randomly choose a node, we do not know what to expect: The selected node’s degree could be tiny or arbitrarily large. Hence networks with γ < 3 do not have a meaningful internal scale, but are “scale-free” (Figure 4.7).

For example, the average degree of the WWW sample is ⟨k⟩ = 4.60 (Table 4.1). Given that γ ≈ 2.1, the second moment diverges, which means that our expectation for the in-degree of a randomly chosen WWW document is k = 4.60 ± ∞ in the N → ∞ limit. That is, a randomly chosen web document could easily yield a document of degree one or two, as 74.02% of nodes have in-degree less than ⟨k⟩. Yet, it could also yield a node with hundreds of millions of links, like google.com or facebook.com.

Figure 4.7
Lack of an Internal Scale
For any exponentially bounded distribution, like a Poisson or a Gaussian, the degree of a randomly chosen node is in the vicinity of ⟨k⟩. Hence ⟨k⟩ serves as the network’s scale. For a power law distribution the second moment can diverge, and the degree of a randomly chosen node can be significantly different from ⟨k⟩. Hence ⟨k⟩ does not serve as an intrinsic scale. As a network with a power law degree distribution lacks an intrinsic scale, we call it scale-free.
Panel labels (p_k versus k, with ⟨k⟩ marked): Random Network, randomly chosen node k = ⟨k⟩ ± ⟨k⟩^{1/2}, scale ⟨k⟩; Scale-Free Network, randomly chosen node k = ⟨k⟩ ± ∞, scale none.

Strictly speaking ⟨k²⟩ diverges only in the N → ∞ limit. Yet, the divergence is relevant for finite networks as well. To illustrate this, Table 4.1 lists ⟨k²⟩ and Figure 4.8 shows the standard deviation σ = (⟨k²⟩ − ⟨k⟩²)^{1/2} for ten real networks. For most of these networks σ is significantly larger than ⟨k⟩, documenting large variations in node degrees. For example, with ⟨k_in²⟩ = 1546.0 the degree of a randomly chosen node in the WWW sample is k_in = 4.60 ± 39.05, indicating once again that the average is not informative.
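The standard deviation follows directly from the first two moments, σ = (⟨k²⟩ − ⟨k⟩²)^{1/2}. The sketch below applies this to a few rows of Table 4.1 and compares the result with ⟨k⟩^{1/2}, the value expected for a Poisson distribution with the same average degree. The list name SAMPLE_ROWS and the helper std_dev are hypothetical, and only the second moments listed in the table are used (the in-degree moment for the directed WWW).

```python
import math

# (network, <k>, <k^2>) copied from Table 4.1.
SAMPLE_ROWS = [
    ("Internet", 6.34, 240.1),
    ("WWW (in-degree)", 4.60, 1546.0),
    ("Power Grid", 2.67, 10.3),
    ("Actor Network", 83.71, 47353.7),
]

def std_dev(mean_k, second_moment):
    """Standard deviation from the first and second moments of the degree distribution."""
    return math.sqrt(second_moment - mean_k ** 2)

for name, mean_k, second_moment in SAMPLE_ROWS:
    sigma = std_dev(mean_k, second_moment)
    poisson_sigma = math.sqrt(mean_k)  # sigma of a Poisson distribution with the same <k>
    print(f"{name:16s} <k>={mean_k:6.2f}  sigma={sigma:8.2f}  Poisson sigma={poisson_sigma:5.2f}")

sigma_www_in = std_dev(4.60, 1546.0)
```

For the power grid, which the table reports as not scale-free, σ comes out close to the Poisson value, while for the other rows it exceeds it by a wide margin.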

In summary, the scale-free name captures the lack of an internal scale, a consequence of the fact that nodes with widely different degrees coexist in the same network. This feature distinguishes scale-free networks from lattices, in which all nodes have exactly the same degree (σ = 0), or from random networks, whose degrees vary in a narrow range (σ = ⟨k⟩^{1/2}). As we will see in the coming chapters, this divergence is the origin of some of the most intriguing properties of scale-free networks, from their robustness to random failures to the anomalous spread of viruses.

Table 4.1 Degree Fluctuations in Real Networks

NETWORK                 N         L            ⟨k⟩     ⟨kin²⟩    ⟨kout²⟩   ⟨k²⟩       γin      γout     γ
Internet                192,244   609,066      6.34    -         -         240.1      -        -        3.42*
WWW                     325,729   1,497,134    4.60    1546.0    482.4     -          2.00     2.31     -
Power Grid              4,941     6,594        2.67    -         -         10.3       -        -        Exp.
Mobile Phone Calls      36,595    91,826       2.51    12.0      11.7      -          4.69*    5.01*    -
Email                   57,194    103,731      1.81    94.7      1163.9    -          3.43*    2.03*    -
Science Collaboration   23,133    93,439       8.08    -         -         178.2      -        -        3.35*
Actor Network           702,388   29,397,908   83.71   -         -         47,353.7   -        -        2.12*
Citation Network        449,673   4,689,479    10.43   971.5     198.8     -          3.03**   4.00*    -
E. Coli Metabolism      1,039     5,802        5.58    535.7     396.7     -          2.43*    2.90*    -
Protein Interactions    2,018     2,930        2.90    -         -         32.3       -        -        2.89*

The table shows the first moment ⟨k⟩ and the second moment ⟨k²⟩ (⟨kin²⟩ and ⟨kout²⟩ for directed networks) for ten reference networks. For directed networks we list ⟨k⟩ = ⟨kin⟩ = ⟨kout⟩. We also list the estimated degree exponent, γ, for each network, determined using the procedure discussed in ADVANCED TOPICS 4.A. The stars next to the reported values indicate the confidence of the fit to the degree distribution. That is, * means that the fit shows statistical confidence for a power-law (k^{−γ}); while ** marks statistical confidence for a fit (4.39) with an exponential cutoff. Note that the power grid is not scale-free. For this network a degree distribution of the form e^{−λk} offers a statistically significant fit, which is why we placed an “Exp.” in the last column.

Figure 4.8 Standard Deviation is Large in Real Networks
[Figure: σ (vertical axis) versus ⟨k⟩ (horizontal axis) for the reference networks, with points labeled WWW (IN), EMAIL (OUT), CITATIONS (IN), METABOLIC (IN), WWW (OUT), METABOLIC (OUT), INTERNET, SCIENCE COLLABORATION, EMAIL (IN), CITATIONS (OUT), PROTEIN, PHONE CALLS (IN, OUT) and POWER GRID; the dashed line shows ⟨k⟩^{1/2}.]
For a random network the standard deviation follows σ = ⟨k⟩^{1/2}, shown as a green dashed line on the figure. The symbols show σ for nine of the ten reference networks, calculated using the values shown in Table 4.1. The actor network has a very large ⟨k⟩ and σ, hence it is omitted for clarity. For each network σ is larger than the value expected for a random network with the same ⟨k⟩. The only exception is the power grid, which is not scale-free. While the phone call network is scale-free, it has a large γ, hence it is well approximated by a random network.

SECTION 4.5

UNIVERSALITY

While the terms WWW and Internet are often used interchangeably in the media, they refer to different systems. The WWW is an information network, whose nodes are documents and links are URLs. In contrast the Internet is an infrastructural network, whose nodes are computers called routers and whose links correspond to physical connections, like copper and optical cables or wireless links.

This difference has important consequences: The cost of linking a Boston-based web page to a document residing on the same computer or to one on a Budapest-based computer is the same. In contrast, establishing a direct Internet link between routers in Boston and Budapest would require us to lay a cable between North America and Europe, which is prohibitively expensive. Despite these differences, the degree distribution of both networks is well approximated by a power law [1, 5, 6]. The signatures of the Internet’s scale-free nature are visible in Figure 4.9, showing that a few high-degree routers hold together a large number of routers with only a few links.

Figure 4.9 The topology of the Internet
An iconic representation of the Internet topology at the beginning of the 21st century. The image was produced by CAIDA, an organization based at University of California in San Diego, devoted to collecting, analyzing, and visualizing Internet data. The map illustrates the Internet’s scale-free nature: A few highly connected hubs hold together numerous small nodes.

In the past decade many real networks of major scientific, technological and societal importance were found to display the scale-free property. This is illustrated in Figure 4.10, where we show the degree distribution of an infrastructural network (Internet), a biological network (protein interactions), a communication network (emails) and a network characterizing scientific communications (citations). For each network the degree distribution significantly deviates from a Poisson distribution, being better approximated with a power law.

The diversity of the systems that share the scale-free property is remarkable (BOX 4.2). Indeed, the WWW is a man-made network with a history of little more than two decades, while the protein interaction network is the product of four billion years of evolution. In some of these networks the nodes are molecules, in others they are computers. It is this diversity that prompts us to call the scale-free property a universal network characteristic.

From the perspective of a researcher, a crucial question is the following: How do we know if a network is scale-free? On the one hand, a quick look at the degree distribution will immediately reveal whether the network could be scale-free: In scale-free networks the degrees of the smallest and the largest nodes are widely different, often spanning several orders of magnitude. In contrast, these nodes have comparable degrees in a random network. As the value of the degree exponent plays an important role in predicting various network properties, we need tools to fit the pk distribution

and to estimate γ. This prompts us to address several issues pertaining to plotting and fitting power laws:

Plotting the Degree Distribution
The degree distributions shown in this chapter are plotted on a double logarithmic scale, often called a log-log plot. The main reason is that when we have nodes with widely different degrees, a linear plot is unable to display them all. To obtain the clean-looking degree distributions shown throughout this book we use logarithmic binning, ensuring that each datapoint has a sufficient number of observations behind it. The practical tips for plotting a network’s degree distribution are discussed in ADVANCED TOPICS 4.B.

Measuring the Degree Exponent
A quick estimate of the degree exponent can be obtained by fitting a straight line to pk on a log-log plot. Yet, this approach can be affected by systematic biases, resulting in an incorrect γ. The statistical tools available to estimate γ are discussed in ADVANCED TOPICS 4.C.

The Shape of pk for Real Networks
Many degree distributions observed in real networks deviate from a pure power law. These deviations can be attributed to data incompleteness or data collection biases, but can also carry important information about processes that contribute to the emergence of a particular network. In ADVANCED TOPICS 4.B we discuss some of these deviations and in CHAPTER 6 we explore their origins.

In summary, since the 1999 discovery of the scale-free nature of the WWW, a large number of real networks of scientific and technological interest have been found to be scale-free, from biological to social and linguistic networks (BOX 4.2). This does not mean that all networks are scale-free. Indeed, many important networks, from the power grid to networks observed in materials science, do not display the scale-free property (BOX 4.3).
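The logarithmic binning mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of ADVANCED TOPICS 4.B: the bins are taken to be [2^n, 2^(n+1)), each bin count is divided by the number of integer degrees the bin contains, and the function name `log_binned_pk` is hypothetical.

```python
from collections import defaultdict

def log_binned_pk(degrees):
    """Log-binned estimate of p_k for a list of integer degrees.

    Returns a list of (representative degree, estimated p_k) pairs,
    one per non-empty bin [2^n, 2^(n+1)).
    """
    degrees = [k for k in degrees if k >= 1]   # k = 0 cannot be shown on a log axis
    bins = defaultdict(list)
    for k in degrees:
        bins[k.bit_length() - 1].append(k)     # exact floor(log2(k)) for integers
    points = []
    for n in sorted(bins):
        members = bins[n]
        width = 2 ** n                          # number of integer degrees in bin n
        k_repr = sum(members) / len(members)    # average degree of the nodes in the bin
        p_est = len(members) / (len(degrees) * width)
        points.append((k_repr, p_est))
    return points

# Small worked example: eight nodes with degrees between 1 and 4.
for k_repr, p_est in log_binned_pk([1, 1, 1, 1, 2, 2, 3, 4]):
    print(f"k = {k_repr:.2f}   p_k = {p_est:.5f}")
```

Because each bin collects all nodes within a factor of two in degree, even the sparsely populated high-degree tail contributes points backed by several observations.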

Figure 4.10 Many Real Networks are Scale-free
[Figure: four log-log panels of pk versus k (kin and kout for the directed networks): (a) INTERNET, (b) PROTEIN INTERACTIONS, (c) EMAILS, (d) CITATIONS.]
The degree distribution of four networks listed in Table 4.1.
(a) Internet at the router level.
(b) Protein-protein interaction network.
(c) Email network.
(d) Citation network.
In each panel the green dotted line shows the Poisson distribution with the same ⟨k⟩ as the real network, illustrating that the random network model cannot account for the observed pk. For directed networks we show separately the incoming and outgoing degree distributions.

BOX 4.2 TIMELINE: SCALE-FREE NETWORKS

1965: Derek de Solla Price (1922 - 1983) discovers that citations follow a power-law distribution [7], a finding later attributed to the scale-free nature of the citation network [2].

Michalis, Petros, and Christos Faloutsos discover the scale-free nature of the internet [15].

Réka Albert, Hawoong Jeong, and Albert-László Barabási discover the power-law nature of the WWW [1] and introduce scale-free networks [2, 10].

“we expect that the scale-invariant state observed in all systems for which detailed data has been available to us is a generic property of many complex networks, with applicability reaching far beyond the quoted examples.”
Barabási and Albert, 1999

[Timeline figure: the horizontal axis shows the publication date, with 1965 followed by the years 1998 to 2013; the bars show the number of papers on “scale-free networks” per year (Google Scholar). The labels mark the networks reported to be scale-free: CITATIONS [7], CITATIONS [8], WWW [1, 2, 9, 10], ACTORS [2], INTERNET [5], METABOLIC [11, 12], PHONE CALLS [13], PROTEINS [14, 15], COAUTHOR. [16, 17], SEXUAL CONTACTS [18], LINGUISTICS [19], ELECT. CIRCUITS [20], SOFTWARE [21], EMAIL [22], ENERGY LANDSCAPE [23], MOBILE CALLS [24], TWITTER [25, 26], FACEBOOK [27].]

BOX 4.3 NOT ALL NETWORKS ARE SCALE-FREE

The ubiquity of the scale-free property does not mean that all real networks are scale-free. To the contrary, several important networks do not share this property:

• Networks appearing in materials science, describing the bonds between the atoms in crystalline or amorphous materials. In these networks each node has exactly the same degree, determined by chemistry (Figure 4.11).
• The neural network of the C. elegans worm [28].
• The power grid, consisting of generators and switches connected by transmission lines.

For the scale-free property to emerge the nodes need to have the capacity to link to an arbitrary number of other nodes. These links do not need to be concurrent: We do not constantly chat with each of our acquaintances and a protein in the cell does not simultaneously bind to each of its potential interaction partners. The scale-free property is absent in systems that limit the number of links a node can have, effectively restricting the maximum size of the hubs. Such limitations are common in materials (Figure 4.11), explaining why they cannot develop a scale-free topology.

Figure 4.11 The Material Network
A carbon atom can share only four electrons with other atoms, hence no matter how we arrange these atoms relative to each other, in the resulting network a node can never have more than four links. Hence, hubs are forbidden and the scale-free property cannot emerge. The figure shows several carbon allotropes, i.e. materials made of carbon that differ in the structure of the network the carbon atoms arrange themselves in. This different arrangement results in materials with widely different physical and electronic characteristics, like (a) diamond; (b) graphite; (c) lonsdaleite; (d) C60 (buckminsterfullerene); (e) C540 (a fullerene); (f) C70 (another fullerene); (g) amorphous carbon; (h) single-walled carbon nanotube.

SECTION 4.6

ULTRA-SMALL WORLD PROPERTY

The presence of hubs in scale-free networks raises an interesting question: Do hubs affect the small world property? Figure 4.4 suggests that they do: Airlines build hubs precisely to decrease the number of hops between two airports. The calculations support this expectation, finding that distances in a scale-free network are smaller than the distances observed in an equivalent random network.

The dependence of the average distance ⟨d⟩ on the system size N and the degree exponent γ is captured by the formula [29, 30]

$$
\langle d \rangle \sim
\begin{cases}
\text{const.} & \gamma = 2 \\
\ln \ln N & 2 < \gamma < 3 \\
\dfrac{\ln N}{\ln \ln N} & \gamma = 3 \\
\ln N & \gamma > 3
\end{cases}
\qquad (4.22)
$$

Next we discuss the behavior of ⟨d⟩ in the four regimes predicted by (4.22), as summarized in Figure 4.12:

Anomalous Regime (γ = 2)
According to (4.18) for γ = 2 the degree of the biggest hub grows linearly with the system size, i.e. kmax ∼ N. This forces the network into a hub and spoke configuration in which all nodes are close to each other because they all connect to the same central hub. In this regime the average path length does not depend on N.

Ultra-Small World (2 < γ < 3)
Equation (4.22) predicts that in this regime the average distance increases as lnlnN, a significantly slower growth than the lnN derived for random networks. We call networks in this regime ultra-small, as the hubs radically reduce the path length [29]. They do so by linking to a large number of small-degree nodes, creating short distances between them.

To see the implication of the ultra-small world property consider again the world’s social network with N ≈ 7 × 10^9. If the society is described by a random network, the N-dependent term is lnN = 22.66. In contrast for a scale-free network the N-dependent term is lnlnN = 3.12, indicating that the hubs radically shrink the distance between the nodes.
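The N-dependent terms of (4.22) are easy to evaluate for any network size. The sketch below is an illustration only: the function name `distance_terms` is an assumption, and the prefactors that (4.22) leaves unspecified are ignored, so the numbers indicate scaling rather than actual path lengths.

```python
import math

def distance_terms(n_nodes):
    """N-dependent terms of the average distance in the regimes of (4.22)."""
    ln_n = math.log(n_nodes)
    lnln_n = math.log(ln_n)
    return {
        "ln ln N (2 < gamma < 3)": lnln_n,
        "ln N / ln ln N (gamma = 3)": ln_n / lnln_n,
        "ln N (gamma > 3, random)": ln_n,
    }

# World social network (N of about 7 billion) and a much smaller network.
for n_nodes in (7e9, 1e3):
    print(f"N = {n_nodes:.0e}")
    for regime, value in distance_terms(n_nodes).items():
        print(f"  {regime:28s} {value:6.2f}")
```

For small N the three terms are of similar magnitude, while for N of the order of the world population they differ by a large factor.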

Figure 4.12 Distances in Scale-free Networks
[Figure: (a) ⟨d⟩ versus N, with N up to 10^12 on a logarithmic axis, for the four regimes: lnN (γ > 3 and random), lnN/lnlnN (γ = 3), lnlnN (2 < γ < 3) and constant (γ = 2); dotted lines mark HUMAN PPI, INTERNET (2011), SOCIETY and WWW. (b), (c), (d) Distance distributions pd versus d for N = 10^2, 10^4 and 10^6, each showing γ = 2.1, γ = 3.0, γ = 5.0 and a random network (RN).]
(a) The scaling of the average path length in the four scaling regimes characterizing a scale-free network: constant (γ = 2), lnlnN (2 < γ < 3), lnN/lnlnN (γ = 3), lnN (γ > 3 and random networks). The dotted lines mark the approximate size of several real networks. Given their modest size, in biological networks, like the human protein-protein interaction network (PPI), the differences in the node-to-node distances are relatively small in the four regimes. The differences in ⟨d⟩ are quite significant for networks of the size of the social network or the WWW. For these the small-world formula lnN significantly overestimates the real ⟨d⟩, which grows only as lnlnN.
(b) (c) (d) Distance distribution for networks of size N = 10^2, 10^4, 10^6, illustrating that while for small networks (N = 10^2) the distance distributions are not too sensitive to γ, for large networks (N = 10^6) pd and ⟨d⟩ change visibly with γ.
The networks were generated using the static model [32] with ⟨k⟩ = 3.

Critical Point (γ = 3)

This value is of particular theoretical interest, as the second moment of the degree distribution does not diverge any longer. We therefore call γ = 3 the critical point. At this critical point the lnN dependence encountered for random networks returns. Yet, the calculations indicate the presence of a double logarithmic correction lnlnN [29, 31], which shrinks the distances compared to a random network of similar size.

Small World (γ > 3)
In this regime ⟨k²⟩ is finite and the average distance follows the small world result derived for random networks. While hubs continue to be present, for γ > 3 they are not sufficiently large and numerous to have a significant impact on the distance between the nodes.

Taken together, (4.22) indicates that the more pronounced the hubs are, the more effectively they shrink the distances between nodes. This conclusion is supported by Figure 4.12a, which shows the scaling of the average path length for scale-free networks with different γ. The figure indicates that while for small N the distances in the four regimes are comparable, for large N we observe remarkable differences.

Further support is provided by the path length distribution for scale-free networks with different γ and N (Figure 4.12b-d). For N = 10^2 the path length distributions overlap, indicating that at this size differences in γ result in undetectable differences in the path length. For N = 10^6, however, pd observed for different γ are well separated. Figure 4.12d also shows that the larger the degree exponent, the larger are the distances between the nodes.

In summary the scale-free property has several effects on network distances:

• Shrinks the average path lengths. Therefore most scale-free networks of practical interest are not only “small”, but are “ultra-small”. This is a consequence of the hubs, which act as bridges between many small degree nodes.
• Changes the dependence of ⟨d⟩ on the system size, as predicted by (4.22). The smaller γ is, the shorter are the distances between the nodes.
• Only for γ > 3 we recover the ln N dependence, the signature of the small-world property characterizing random networks (Figure 4.12).

BOX 4.4 WE ARE ALWAYS CLOSE TO THE HUBS

Frigyes Karinthy in his 1929 short story [33] that first described the small world concept cautions that “it’s always easier to find someone who knows a famous or popular figure than some run-the-mill, insignificant person”. In other words, we are typically closer to hubs than to less connected nodes. This effect is particularly pronounced in scale-free networks (Figure 4.13).

The implications are obvious: There are always short paths linking us to famous individuals like well known scientists or the president of the United States, as they are hubs with an exceptional number of acquaintances. It also means that many of the shortest paths go through these hubs.

In contrast to this expectation, measurements aiming to replicate the six degrees concept in the online world find that individuals involved in chains that reached their target were less likely to send a message to a hub than individuals involved in incomplete chains [34]. The reason may be self-imposed: We perceive hubs as being busy, so we contact them only in real need. We therefore avoid them in online experiments of no perceived value to them.

Figure 4.13 Closing on the hubs
[Figure: ⟨dtarget⟩ (vertical axis, 0 to 12) versus ktarget (horizontal axis, 10 to 100) for a RANDOM NETWORK and a SCALE-FREE network.]
The distance ⟨dtarget⟩ of a node with degree k ≈ ⟨k⟩ to a target node with degree ktarget in a random and a scale-free network. In scale-free networks we are closer to the hubs than in random networks. The figure also illustrates that in a random network the largest-degree nodes are considerably smaller and hence the path lengths are visibly longer than in a scale-free network. Both networks have ⟨k⟩ = 2 and N = 1,000 and for the scale-free network we choose γ = 2.5.

SECTION 4.7

THE ROLE OF THE DEGREE EXPONENT

Many properties of a scale-free network depend on the value of the degree exponent γ. A close inspection of Table 4.1 indicates that:

• γ varies from system to system, prompting us to explore how the properties of a network change with γ.
• For most real systems the degree exponent is above 2, making us wonder: Why don’t we see networks with γ < 2?

To address these questions next we discuss how the properties of a scale-free network change with γ (BOX 4.5).

BOX 4.5 THE γ DEPENDENT PROPERTIES OF SCALE-FREE NETWORKS
ANOMALOUS REGIME (γ ≤ 2): ⟨k⟩ diverges; ⟨k²⟩ diverges; ⟨d⟩ ∼ const; kmax grows faster than N (kmax ∼ N at γ = 2). No large network can exist here.
SCALE-FREE REGIME (2 < γ < 3): ⟨k⟩ finite; ⟨k²⟩ diverges; ⟨d⟩ ∼ lnlnN (ultra-small world).
CRITICAL POINT (γ = 3): ⟨d⟩ ∼ lnN/lnlnN.
RANDOM REGIME (γ > 3): ⟨k⟩ finite; ⟨k²⟩ finite; ⟨d⟩ ∼ lnN/ln⟨k⟩ (small world). Indistinguishable from a random network.
For γ > 2: kmax ∼ N^{1/(γ−1)}.
[Figure: the reference networks of Table 4.1 are placed along the γ axis according to their degree exponents.]

Anomalous Regime (γ ≤ 2)
For γ < 2 the exponent 1/(γ − 1) in (4.18) is larger than one, hence the number of links connected to the largest hub grows faster than the size of the network. This means that for sufficiently large N the degree of the largest hub must exceed the total number of nodes in the network, hence it will run out of nodes to connect to. Similarly, for γ < 2 the average degree ⟨k⟩ diverges in the N → ∞ limit. These odd predictions are only two of the many anomalous features of scale-free networks in this regime. They are signatures of a deeper problem: Large scale-free networks with γ < 2 that lack multi-links cannot exist (BOX 4.6).

Scale-Free Regime (2 < γ < 3)
In this regime the first moment of the degree distribution is finite but the second and higher moments diverge as N → ∞. Consequently scale-free networks in this regime are ultra-small (SECTION 4.6). Equation (4.18) predicts that kmax grows with the size of the network with exponent 1/(γ − 1), which is smaller than one. Hence the market share of the largest hub, kmax/N, representing the fraction of nodes that connect to it, decreases as kmax/N ∼ N^{−(γ−2)/(γ−1)}.

As we will see in the coming chapters, many interesting features of scale-free networks, from their robustness to anomalous spreading phenomena, are linked to this regime.

Random Network Regime (γ > 3)
According to (4.20) for γ > 3 both the first and the second moments are finite. For all practical purposes the properties of a scale-free network in this regime are difficult to distinguish from the properties of a random network of similar size. For example (4.22) indicates that the average distance between the nodes converges to the small-world formula derived for random networks. The reason is that for large γ the degree distribution pk decays sufficiently fast to make the hubs small and less numerous.

Note that scale-free networks with large γ are hard to distinguish from a random network. Indeed, to document the presence of a power-law degree distribution we ideally need 2-3 orders of magnitude of scaling, which means that kmax should be at least 10^2 - 10^3 times larger than kmin. By inverting (4.18) we can estimate the network size necessary to observe the desired scaling regime, finding

$$
N = \left( \frac{k_{max}}{k_{min}} \right)^{\gamma - 1}. \qquad (4.23)
$$

For example, if we wish to document the scale-free nature of a network with γ = 5 and require scaling that spans at least two orders of magnitude (e.g. kmin ∼ 1 and kmax ≃ 10^2), according to (4.23) the size of the network must exceed N > 10^8. There are very few network maps of this size.
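The estimate (4.23) can be tabulated for several degree exponents to see how quickly the required network size grows with γ. A minimal sketch, assuming kmin = 1 and two orders of magnitude of scaling as in the example above; the function name `min_network_size` is hypothetical.

```python
def min_network_size(gamma, k_max, k_min=1.0):
    """Network size needed to observe k_max, following (4.23)."""
    return (k_max / k_min) ** (gamma - 1)

# Two orders of magnitude of scaling: k_max / k_min = 100.
for gamma in (2.1, 2.5, 3.0, 4.0, 5.0):
    n_required = min_network_size(gamma, k_max=100)
    print(f"gamma = {gamma:3.1f}   N must exceed about {n_required:.1e}")
```

The required size rises from a few hundred nodes for γ close to 2 to 10^8 nodes for γ = 5, which is the value quoted in the text.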

Therefore, there may be many networks with a large degree exponent. Given, however, their limited size, it is difficult to obtain convincing evidence of their scale-free nature.

In summary, we find that the behavior of scale-free networks is sensitive to the value of the degree exponent γ. Theoretically the most interesting regime is 2
2 are graphical, it is impossible to find graphical networks in the 0 < γ < 2 range. After [39].


SECTION 4.8

GENERATING NETWORKS WITH ARBITRARY DEGREE DISTRIBUTION

Networks generated by the Erdős-Rényi model have a Poisson degree distribution. The empirical results discussed in this chapter indicate, however, that the degree distribution of real networks significantly deviates from a Poisson form, raising an important question: How do we generate networks with an arbitrary pk? In this section we discuss three frequently used algorithms designed for this purpose.

Configuration Model
The configuration model, described in Figure 4.15, helps us build a network with a pre-defined degree sequence. In the network generated by the model each node has a pre-defined degree ki, but otherwise the network is wired randomly. Consequently the network is often called a random network with a pre-defined degree sequence. By repeatedly applying this procedure to the same degree sequence we can generate different networks with the same pk (Figure 4.15b-d).

Figure 4.15 The Configuration Model
[Figure: (a) four nodes with stubs, k1 = 3, k2 = 2, k3 = 2, k4 = 1; (b), (c), (d) three networks assembled from this degree sequence.]
The configuration model builds a network whose nodes have pre-defined degrees [40, 41]. The algorithm consists of the following steps:
(a) Degree Sequence
Assign a degree to each node, represented as stubs or half-links. The degree sequence is either generated analytically from a preselected pk distribution (BOX 4.7), or it is extracted from the adjacency matrix of a real network. We must start from an even number of stubs, otherwise we are left with unpaired stubs.
(b, c, d) Network Assembly
Randomly select a stub pair and connect them. Then randomly choose another pair from the remaining 2L - 2 stubs and connect them. This procedure is repeated until all stubs are paired up. Depending on the order in which the stubs were chosen, we obtain different networks. Some networks include cycles (b), others self-loops (c) or multi-links (d). Yet, the expected number of self-loops and multi-links goes to zero in the N → ∞ limit.

There are a couple of caveats to consider:

• The probability to have a link between nodes of degree ki and kj is

$$
p_{ij} = \frac{k_i k_j}{2L - 1}. \qquad (4.24)
$$

Indeed, a stub starting from node i can connect to 2L - 1 other stubs. Of these, kj are attached to node j. So the probability that a particular stub is connected to a stub of node j is kj/(2L - 1). As node i has ki stubs, it has ki attempts to link to j, resulting in (4.24).

• The obtained network contains self-loops and multi-links, as there is nothing in the algorithm to forbid a node connecting to itself, or to generate multiple links between two nodes. We can choose to reject stub pairs that lead to these, but if we do so, we may not be able to complete the network. Rejecting self-loops or multi-links also means that not all possible matchings appear with equal probability. Hence (4.24) will not be valid, making analytical calculations difficult. Yet, the number of self-loops and multi-links remains negligible, as the number of choices to connect to increases with N, so typically we do not need to exclude them [42].

• The configuration model is frequently used in calculations, as (4.24) and its inherently random character help us analytically calculate numerous network measures.
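The stub-matching procedure of the configuration model can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `configuration_model` is an assumption, and self-loops and multi-links are deliberately kept, as in the basic algorithm described above.

```python
import random

def configuration_model(degree_sequence, seed=None):
    """Pair stubs uniformly at random and return the list of (i, j) links.

    Self-loops and multi-links are not rejected.
    """
    if sum(degree_sequence) % 2 != 0:
        raise ValueError("the total number of stubs must be even")
    rng = random.Random(seed)
    # Node i contributes degree_sequence[i] stubs (half-links).
    stubs = [node for node, k in enumerate(degree_sequence) for _ in range(k)]
    rng.shuffle(stubs)
    # Consecutive entries of the shuffled list form a uniformly random stub pairing.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

# Degree sequence of Figure 4.15a, with nodes numbered from 0.
for realization in range(3):
    print(configuration_model([3, 2, 2, 1], seed=realization))
```

Running the function with different seeds yields different networks built from the same degree sequence, some of which contain self-loops or multi-links.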

BOX 4.7 GENERATING A DEGREE SEQUENCE WITH POWER-LAW DISTRIBUTION

The degree sequence of an undirected network is a sequence of node degrees. For example, the degree sequence of each of the networks shown in Figure 4.15a is {3, 2, 2, 1}. As Figure 4.15a illustrates, the degree sequence does not uniquely identify a graph, as there are multiple ways we can pair up the stubs.

To generate a degree sequence from a pre-defined degree distribution we start from an analytically pre-defined degree distribution, like pk ∼ k^{−γ}, shown in Figure 4.16a. Our goal is to generate a degree sequence {k1, k2, ..., kN} that follows the distribution pk. We start by calculating the function

$$
D(k) = \sum_{k' \geq k} p_{k'}, \qquad (4.25)
$$

shown in Figure 4.16b. D(k) is between 0 and 1, and the step size at any k equals pk. To generate a sequence of N degrees following pk, we generate N random numbers ri, i = 1, ..., N, chosen uniformly from the (0, 1) interval. For each ri we use the plot in (b) to assign a degree ki. The obtained ki = D^{−1}(ri) set of numbers follows the desired pk distribution. Note that the degree sequence assigned to a pk is not unique - we can generate multiple sets of {k1, ..., kN} sequences compatible with the same pk.

Figure 4.16 Generating a Degree Sequence
[Figure: (a) pk ∼ k^{−γ} versus k on a log-log plot; (b) D(k) versus k, with a random number r on the vertical axis mapped to the degree k = D^{−1}(r).]
(a) The power law degree distribution of the degree sequence we wish to generate.
(b) The function (4.25), that allows us to assign degrees k to uniformly distributed random numbers r.
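The procedure of BOX 4.7 can be sketched in code as follows. The sketch rests on assumptions that the box does not make: the power law is truncated to the range kmin to kmax so that it can be normalized numerically, and the names `power_law_degree_sequence`, `k_min` and `k_max` are illustrative.

```python
import random
from bisect import bisect_right

def power_law_degree_sequence(n, gamma, k_min=1, k_max=1000, seed=None):
    """Draw n degrees from p_k ~ k^(-gamma) on k_min..k_max using D(k) of (4.25)."""
    rng = random.Random(seed)
    ks = list(range(k_min, k_max + 1))
    weights = [k ** (-gamma) for k in ks]
    norm = sum(weights)
    # D[i] is the sum of p_k' over all k' >= ks[i]; it decreases from 1 at k_min.
    D, tail = [0.0] * len(ks), 0.0
    for i in range(len(ks) - 1, -1, -1):
        tail += weights[i] / norm
        D[i] = tail
    neg_D = [-d for d in D]  # ascending copy, so that bisect can be applied
    sequence = []
    for _ in range(n):
        r = rng.random()
        # Largest index with D[i] >= r; this happens with probability p_k.
        i = max(0, bisect_right(neg_D, -r) - 1)
        sequence.append(ks[i])
    return sequence

degrees = power_law_degree_sequence(10, gamma=2.5, k_max=100, seed=1)
print(degrees)
```

A sequence produced this way can have an odd sum, in which case one degree has to be redrawn or adjusted before the stubs can be paired.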


Degree-Preserving Randomization
As we explore the properties of a real network, we often need to ask if a certain network property is predicted by its degree distribution alone, or if it represents some additional property not contained in pk. To answer this question we need to generate networks that are wired randomly, but whose pk is identical to the original network. This can be achieved through degree-preserving randomization [43] described in Figure 4.17b. The idea behind the algorithm is simple: We randomly select two links and swap them, if the swap does not lead to multi-links. Hence the degree of each of the four involved nodes in the swap remains unchanged. Consequently, hubs stay hubs and small-degree nodes retain their small degree, but the wiring diagram of the generated network is randomized.

Note that degree-preserving randomization is different from full randomization, where we swap links without preserving the node degrees (Figure 4.17a). Full randomization turns any network into an Erdős-Rényi network with a Poisson degree distribution that is independent of the original pk.
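The link swap at the heart of degree-preserving randomization can be sketched as follows. This is an illustration under stated assumptions: the input is a simple undirected network given as a list of links, the procedure stops after a fixed number of successful swaps (the hypothetical parameter `n_swaps`) instead of tracking whether each link has been rewired, and swaps that would create self-loops are rejected along with those creating multi-links.

```python
import random

def degree_preserving_randomization(edges, n_swaps, seed=None):
    """Randomize a simple undirected network while keeping every node degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        a, b = rng.sample(range(len(edges)), 2)
        s1, t1 = edges[a]
        s2, t2 = edges[b]
        if rng.random() < 0.5:
            s2, t2 = t2, s2                  # links are undirected: try both orientations
        if s1 == t2 or s2 == t1:
            continue                         # the swap would create a self-loop
        new1, new2 = frozenset((s1, t2)), frozenset((s2, t1))
        if new1 in present or new2 in present:
            continue                         # the swap would create a multi-link
        present.discard(frozenset((s1, t1)))
        present.discard(frozenset((s2, t2)))
        present.add(new1)
        present.add(new2)
        # S1-T1 and S2-T2 become S1-T2 and S2-T1; all four degrees are unchanged.
        edges[a], edges[b] = (s1, t2), (s2, t1)
        done += 1
    return edges

original = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]
print(degree_preserving_randomization(original, n_swaps=10, seed=3))
```

The cap on the number of attempts is a practical safeguard for small or dense networks, where few admissible swaps may exist.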

Figure 4.17 Degree Preserving Randomization
[Figure: (a) FULL RANDOMIZATION and (b) DEGREE-PRESERVING RANDOMIZATION, drawn with source nodes S1, S2 and target nodes T1, T2; the bottom panels show the fully randomized network (left), the ORIGINAL NETWORK (middle) and the degree-preserving randomization (right).]
Two algorithms can generate a randomized version of a given network [43], with different outcomes.
(a) Full Randomization
This algorithm generates a random (Erdős-Rényi) network with the same N and L as the original network. We select randomly a source node (S1) and two target nodes, where the first target (T1) is linked directly to the source node and the second target (T2) is not. We rewire the S1-T1 link, turning it into an S1-T2 link. As a result the degree of the target nodes T1 and T2 changes. We perform this procedure once for each link in the network.
(b) Degree-Preserving Randomization
This algorithm generates a network in which each node has exactly the same degree as in the original network, but the network’s wiring diagram has been randomized. We select two source (S1, S2) and two target nodes (T1, T2), such that initially there is a link between S1 and T1, and a link between S2 and T2. We then swap the two links, creating an S1-T2 and an S2-T1 link. The swap leaves the degree of each node unchanged. We repeat this procedure until we rewire each link at least once.
Bottom Panels: Starting from a scale-free network (middle), full randomization eliminates the hubs and turns the network into a random network (left). In contrast, degree-preserving randomization leaves the hubs in place and the network remains scale-free (right).

Hidden Parameter Model
The configuration model generates self-loops and multi-links, features that are absent in many real networks. We can use the hidden parameter model (Figure 4.18) to generate networks with a pre-defined pk but without multi-links and self-loops [44, 45, 46].

Figure 4.18 Hidden Parameter Model
[Figure: (a) four nodes with hidden parameters η1 = 1, η2 = 1.5, η3 = 2, η4 = 0.5, so that ⟨η⟩ = 1.25, with the connection probabilities p1,3 = 0.4 and p3,4 = 0.2 indicated; (b), (c) two networks generated from these parameters.]
(a) We start with N isolated nodes and assign to each node a hidden parameter ηi, which is either selected from a ρ(η) distribution or it is provided by a sequence {ηi}. We connect each node pair with probability

$$
p(\eta_i, \eta_j) = \frac{\eta_i \eta_j}{\langle \eta \rangle N}.
$$

The figure shows the probability to connect nodes (1,3) and (3,4).
(b, c) After connecting the nodes, we obtain the networks shown in (b) or (c), representing two independent realizations generated by the same hidden parameter sequence (a).

We start from N isolated nodes and assign each node i a hidden parameter ηi, chosen from a distribution ρ(η). The nature of the generated network depends on the selection of the {ηi} hidden parameter sequence. There are two ways to generate the appropriate hidden parameters:

• ηi can be a sequence of N random numbers chosen from a pre-defined ρ(η) distribution. The degree distribution of the obtained network is

$$
p_k = \int \frac{e^{-\eta} \eta^k}{k!} \rho(\eta) \, d\eta. \qquad (4.26)
$$

• ηi can come from a deterministic sequence {η1, η2, ..., ηN}. The degree distribution of the obtained network is

$$
p_k = \frac{1}{N} \sum_j \frac{e^{-\eta_j} \eta_j^k}{k!}. \qquad (4.27)
$$

The hidden parameter model offers a particularly simple method to generate a scale-free network. Indeed, using

$$
\eta_i = \frac{c}{i^{\alpha}}, \quad i = 1, \ldots, N \qquad (4.28)
$$

as the sequence of hidden parameters, according to (4.27) the obtained network will have the degree distribution

$$
p_k \sim k^{-\left(1 + \frac{1}{\alpha}\right)} \qquad (4.29)
$$

for large k. Hence by choosing the appropriate α we can tune γ = 1 + 1/α. We can also use ⟨η⟩ to tune ⟨k⟩ as (4.26) and (4.27) imply that ⟨k⟩ = ⟨η⟩.

The expected number of links in the network generated by the model is

In summary, the configuration model, degree-preserving randomiza-

L=

tion and the hidden parameter model can generate networks with a pre-defined degree distribution and help us analytically calculate key network characteristics. We will turn to these algorithms each time we explore

1 N ηiη j 1 = η N. ∑ 2 i, j ' η N 2

Similar to the random network model, L will vary from network to network, following an exponentially bounded distribution. If we wish to control the average degree ⟨k⟩ we can add L links to the network one by one. The end points i and j of each link are then chosen randomly with a probability proportional to ηi and ηj. In this case we connect i and j only if they were not connected previously.

whether a certain network property is a consequence of the network’s degree distribution, or if it represents some emergent property (BOX 4.8). As we use these algorithms, we must be aware of their limitations: • The algorithms do not tell us why a network has a certain degree distribution. Understanding the origin of the observed pk will be the subject of CHAPTERS 6 and 7.

• Several important network characteristics, from clustering (CHAPTER 9) to degree correlations (CHAPTER 7), are lost during randomization.
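Of the three algorithms summarized above, the hidden parameter model is the most direct to write down. The sketch below is one possible Python rendering under stated assumptions: the function names are hypothetical, the connection probability is capped at 1 (a choice made here for pairs with very large ηi ηj, not something the text specifies), and the example values are those shown in Figure 4.18.

```python
import random

def connection_probability(eta, i, j):
    """p(eta_i, eta_j) = eta_i * eta_j / (<eta> * N); note sum(eta) = <eta> * N."""
    return eta[i] * eta[j] / sum(eta)

def hidden_parameter_network(eta, seed=None):
    """Link each node pair independently with probability p(eta_i, eta_j)."""
    rng = random.Random(seed)
    n = len(eta)
    norm = sum(eta)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):      # each pair visited once: no self-loops or multi-links
            p = min(1.0, eta[i] * eta[j] / norm)   # cap at 1 (assumption of this sketch)
            if rng.random() < p:
                edges.append((i, j))
    return edges

def scale_free_eta(n, alpha, c=1.0):
    """Deterministic sequence eta_i = c / i**alpha for i = 1..N, as in (4.28)."""
    return [c / i ** alpha for i in range(1, n + 1)]

# Values from Figure 4.18 (nodes 1..4 stored at indices 0..3).
eta_example = [1.0, 1.5, 2.0, 0.5]
print(connection_probability(eta_example, 0, 2))   # nodes 1 and 3
print(connection_probability(eta_example, 2, 3))   # nodes 3 and 4

# alpha = 0.5 targets gamma = 1 + 1/alpha = 3; c rescales <eta> and hence <k>.
edges = hidden_parameter_network(scale_free_eta(200, alpha=0.5, c=5.0), seed=3)
print(len(edges))
```

Because every pair is tested independently, the number of links fluctuates between realizations, as noted in the figure caption.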


BOX 4.8 TESTING THE SMALL-WORLD PROPERTY

In the literature the distances observed in a real network are often compared to the small-world formula (3.19). Yet, (3.19) was derived for random networks, while real networks do not have a Poisson degree distribution. If the network is scale-free, then (4.22) offers the appropriate formula. Yet, (4.22) provides only the scaling of the distance with N, and not its absolute value. Instead of fitting the average distance, we often ask: Are the distances observed in a real network comparable with the distances observed in a randomized network with the same degree distribution? Degree-preserving randomization helps answer this question. We illustrate the procedure on the protein interaction network.

(i) Original Network
We start by measuring the distance distribution pd of the original network, obtaining ⟨d⟩ = 5.61 (Figure 4.19).

(ii) Full Randomization
We generate a random network with the same N and L as the original network. The obtained pd visibly shifts to the right, providing ⟨d⟩ = 7.13, much larger than the original ⟨d⟩ = 5.61. It is tempting to conclude that the protein interaction network is affected by some unknown organizing principle that keeps the distances shorter. This would be a flawed conclusion, however, as the bulk of the difference is due to the fact that full randomization changed the degree distribution.

(iii) Degree-Preserving Randomization
As the original network is scale-free, the proper random reference should maintain the original degree distribution. Hence we determine pd after degree-preserving randomization, finding that it is comparable to the original pd.

In summary, a random network overestimates the distances between the nodes, as it is missing the hubs. The network obtained by degree-preserving randomization retains the hubs, so the distances of the randomized network are comparable to those of the original network. This example illustrates the importance of choosing the proper randomization procedure when exploring networks.

Figure 4.19 Randomizing Real Networks

The distance distribution pd (vertical axis) as a function of the distance d (horizontal axis) between each node pair in the protein-protein interaction network (Table 4.1). The green line provides the path-length distribution obtained under full randomization, which turns the network into an Erdős-Rényi network, while keeping N and L unchanged (Figure 4.17). The light purple curve corresponds to the pd of the network obtained after degree-preserving randomization, which keeps the degree of each node unchanged. We have: ⟨d⟩ = 5.61 ± 1.64 (original), ⟨d⟩ = 7.13 ± 1.62 (full randomization), ⟨d⟩ = 5.08 ± 1.34 (degree-preserving randomization).
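The first two steps of this comparison, measuring ⟨d⟩ and building a fully randomized reference, can be sketched as follows. The code is illustrative only. The function names are hypothetical, the average is taken over connected node pairs (an assumption, since a randomized network may be disconnected), and the reference network is drawn directly as a random simple graph with the same N and L rather than by the link-by-link rewiring of Figure 4.17a. The toy star network stands in for real data.

```python
import random
from collections import deque

def mean_distance(n, edges):
    """Average shortest-path length over all connected node pairs (BFS from every node)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total, pairs = 0, 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1          # reachable nodes other than the source
    return total / pairs if pairs else float("nan")

def full_randomization(n, num_links, seed=None):
    """Random simple graph with the same N and L as the original network."""
    if num_links > n * (n - 1) // 2:
        raise ValueError("too many links for a simple graph")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < num_links:
        u, v = rng.sample(range(n), 2)
        chosen.add((min(u, v), max(u, v)))
    return sorted(chosen)

# Hypothetical toy network: one hub connected to 20 small-degree nodes.
n = 21
star = [(0, i) for i in range(1, n)]
print(mean_distance(n, star))
print(mean_distance(n, full_randomization(n, len(star), seed=0)))
```

Applying a degree-preserving randomization to the same edge list and calling `mean_distance` again would complete step (iii).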


Hence, the networks generated by these algorithms are a bit like a photograph of a painting: at first look they appear to be the same as the original. Upon closer inspection we realize, however, that many details, from the texture of the canvas to the brush strokes, are lost. The three algorithms discussed above raise the following question: How do we decide which one to use? Our choice depends on whether we start from a degree sequence {ki} or a degree distribution pk and whether we can tolerate self-loops and multi-links between two nodes. The decision tree involved in this choice is provided in Figure 4.20.

Figure 4.20 Choosing a Generative Algorithm

[Decision tree: NETWORK, exactly the same degree sequence: DEGREE-PRESERVING RANDOMIZATION. DEGREE DISTRIBUTION, simple pk: CONFIGURATION MODEL. DEGREE DISTRIBUTION, adjustable ⟨k⟩: HIDDEN PARAMETER MODEL.]

The choice of the appropriate generative algorithm depends on several factors. If we start from a real network or a known degree sequence, we can use degree-preserving randomization, which guarantees that the obtained networks are simple and have the degree sequence of the original network. The model allows us to forbid multi-links or self-loops, while maintaining the degree sequence of the original network.

If we wish to generate a network with a given pre-defined degree distribution pk, we have two options. If pk is known, the configuration model offers a convenient algorithm for network generation. For example, the model allows us to generate a network with a pure power-law degree distribution $p_k = Ck^{-\gamma}$ for k ≥ kmin.

However, tuning the average degree ⟨k⟩ of a scale-free network within the configuration model is a tedious task, because the only available free parameter is kmin. Therefore, if we wish to alter ⟨k⟩, it is more convenient to use the hidden parameter model with parameter sequence (4.28). This way the tail of the degree distribution follows $\sim k^{-\gamma}$ and by changing the number of links L we can control ⟨k⟩.


SECTION 4.9

SUMMARY

The scale-free property has played an important role in the development of network science for two main reasons:

• Many networks of scientific and practical interest, from the WWW to the subcellular networks, are scale-free. This universality made the scale-free property an unavoidable issue in many disciplines.

• Once the hubs are present, they fundamentally change the system’s behavior. The ultra-small property offers a first hint of their impact on a network’s properties; we will encounter many more examples in the coming chapters.

As we continue to explore the consequences of the scale-free property, we must keep in mind that the power-law form (4.1) is rarely seen in this pure form in real systems. The reason is that a host of processes affect the topology of each network, which also influence the shape of the degree distribution. We will discuss these processes in the coming chapters. The diversity of these processes and the complexity of the resulting pk confuse those who approach these networks through the narrow perspective of the quality of fit to a pure power law. Instead the scale-free property tells us that we must distinguish two rather different classes of networks:

Exponentially Bounded Networks
We call a network exponentially bounded if its degree distribution decreases exponentially or faster for high k. As a consequence the standard deviation σk is smaller than ⟨k⟩, implying that we lack significant degree variations. Examples of pk in this class include the Poisson, Gaussian, or the simple exponential distribution (Table 4.2). Erdős-Rényi and Watts-Strogatz networks are the best known network models belonging to this class. Exponentially bounded networks lack outliers, consequently most nodes have comparable degrees. Real networks in this class include highway networks and the power grid.

Fat Tailed Networks
We call a network fat tailed if its degree distribution has a power-law tail in the high-k region. As a consequence the standard deviation σk is much larger than ⟨k⟩, resulting in considerable degree variations. Scale-free networks with a power-law degree distribution (4.1) offer the best known example of networks belonging to this class. Outliers, or exceptionally high-degree nodes, are not only allowed but are expected in these networks. Networks in this class include the WWW, the Internet, protein interaction networks, and most social and online networks.

While it would be desirable to statistically validate the precise form of the degree distribution, often it is sufficient to decide if a given network has an exponentially bounded or a fat tailed degree distribution (see ADVANCED TOPICS 4.A). If the degree distribution is exponentially bounded, the random network model offers a reasonable starting point to understand its topology. If the degree distribution is fat tailed, a scale-free network offers a better approximation. We will also see in the coming chapters that the key signature of the fat tailed behavior is the magnitude of ⟨k2⟩: If ⟨k2⟩ is large, systems behave like scale-free networks; if ⟨k2⟩ is small, being comparable to ⟨k⟩(⟨k⟩+1), systems are well approximated by random networks.

In summary, to understand the properties of real networks, it is often sufficient to remember that in scale-free networks a few highly connected hubs coexist with a large number of small nodes. The presence of these hubs plays an important role in the system’s behavior. In this chapter [...] scale-free? The next chapter provides the answer.

BOX 4.9 AT A GLANCE: SCALE-FREE NETWORKS

DEGREE DISTRIBUTION
Discrete form: $p_k = \dfrac{k^{-\gamma}}{\zeta(\gamma)}$.
Continuous form: $p(k) = (\gamma - 1)\,k_{\min}^{\gamma-1}\,k^{-\gamma}$.

SIZE OF THE LARGEST HUB
$k_{\max} = k_{\min} N^{\frac{1}{\gamma-1}}$.

MOMENTS OF pk for N → ∞
2 < γ ≤ 3: ⟨k⟩ finite, ⟨k2⟩ diverges.
γ > 3: ⟨k⟩ and ⟨k2⟩ finite.

DISTANCES
$$\langle d\rangle \sim \begin{cases} \text{const.} & \gamma = 2 \\ \ln\ln N & 2 < \gamma < 3 \\ \dfrac{\ln N}{\ln\ln N} & \gamma = 3 \\ \ln N & \gamma > 3 \end{cases}$$

SECTION 4.10

HOMEWORK

4.1. Hubs
Calculate the expected maximum degree kmax for the undirected networks listed in Table 4.1.

4.2. Friendship Paradox
The degree distribution pk expresses the probability that a randomly selected node has k neighbors. However, if we randomly select a link, the probability that a node at one of its ends has degree k is qk = Akpk, where A is a normalization factor.

(a) Find the normalization factor A, assuming that the network has a power-law degree distribution with 2 < γ < 3, with minimum degree kmin and maximum degree kmax.

(b) In the configuration model qk is also the probability that a randomly chosen node has a neighbor with degree k. What is the average degree of the neighbors of a randomly chosen node?

(c) Calculate the average degree of the neighbors of a randomly chosen node in a network with $N = 10^4$, γ = 2.3, kmin = 1 and kmax = 1,000. Compare the result with the average degree of the network, ⟨k⟩.

(d) How can you explain the "paradox" of (c), that is, that a node's friends have more friends than the node itself?

4.3. Generating Scale-Free Networks
Write a computer code to generate networks of size N with a power-law degree distribution with degree exponent γ. Refer to SECTION 4.9 for the procedure. Generate three networks with γ = 2.2 and with $N = 10^3$, $N = 10^4$ and $N = 10^5$ nodes, respectively. What is the percentage of multi-links and self-loops in each network? Generate more networks to plot this percentage as a function of N. Do the same for networks with γ = 3.

4.4. Mastering Distributions
Use software that includes a statistics package, like Matlab, Mathematica or Numpy in Python, to generate three synthetic datasets, each containing 10,000 integers that follow a power-law distribution with γ = 2.2, γ = 2.5 and γ = 3. Use kmin = 1. Apply the techniques described in ADVANCED TOPICS 4.C to fit the three distributions.


SECTION 4.11

ADVANCED TOPICS 4.A POWER LAWS

Power laws have a convoluted history in natural and social sciences, being interchangeably (and occasionally incorrectly) called fat-tailed, heavy-tailed, long-tailed, Pareto, or Bradford distributions. They also have a series of close relatives, like log-normal, Weibull, or Lévy distributions. In this section we discuss some of the most frequently encountered distributions in network science and their relationship to power laws.

Exponentially Bounded Distributions
Many quantities in nature, from the height of humans to the probability of being in a car accident, follow bounded distributions. A common property of these is that px decays either exponentially ($e^{-x}$), or faster than exponentially ($e^{-x^2/\sigma^2}$) for high x. Consequently the largest expected x is bounded by some upper value xmax that is not too different from ⟨x⟩. Indeed, the expected largest x obtained after we draw N numbers from a bounded px grows as xmax ∼ log N or slower. This means that outliers, representing unusually high x-values, are rare. They are so rare that they are effectively forbidden, meaning that they do not occur with any meaningful probability. Instead, most events drawn from a bounded distribution are in the vicinity of ⟨x⟩.

The high-x regime is called the tail of a distribution. Given the absence of numerous events in the tail, these distributions are also called thin tailed.

Analytically the simplest bounded distribution is the exponential distribution $e^{-\lambda x}$. Within network science the most frequently encountered bounded distribution is the Poisson distribution (or its parent, the binomial distribution), which describes the degree distribution of a random network. Outside network science the most frequently encountered member of this class is the normal (Gaussian) distribution (Table 4.2).

Fat Tailed Distributions
The terms fat tailed, heavy tailed, or long tailed refer to px whose decay at large x is slower than exponential. In these distributions we often encounter events characterized by very large x values, usually called outliers or rare events. The power-law distribution (4.1) represents the best known example of a fat tailed distribution. An instantly recognizable feature of a fat tailed distribution is that the magnitude of the events x drawn from it can span several orders of magnitude. Indeed, in these distributions the size of the largest event after N trials scales as $x_{\max} \sim N^{\zeta}$, where ζ is determined by the exponent γ characterizing the tail of the px distribution. As $N^{\zeta}$ grows fast, rare events or outliers occur with a noticeable frequency, often dominating the properties of the system.
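The contrast between the two growth laws can be made concrete with a short calculation. The sketch below assumes that the largest expected value is estimated by requiring that exactly one of the N draws is expected to exceed xmax, applied to the continuous exponential density λe^{-λx} and to the continuous power law of BOX 4.9. Under that assumption the exponential gives xmax = ln N / λ, while the power law gives xmax = xmin N^{1/(γ-1)}, so that ζ = 1/(γ-1). The function names are illustrative.

```python
import math

def xmax_exponential(n, lam=1.0):
    """Solve exp(-lam * x) = 1/n: the tail beyond x holds one expected draw."""
    return math.log(n) / lam

def xmax_power_law(n, gamma, xmin=1.0):
    """Solve (x / xmin)**(1 - gamma) = 1/n for a power law with exponent gamma > 1."""
    return xmin * n ** (1.0 / (gamma - 1.0))

# Compare how the largest expected value grows with the sample size.
for n in (10**2, 10**4, 10**6):
    print(n, xmax_exponential(n), xmax_power_law(n, gamma=2.5))
```

For the exponential the largest value grows only logarithmically with the sample size, whereas for γ = 2.5 it grows as N^{2/3}.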

The relevance of fat tailed distributions to networks is provided by several factors:

• Many quantities occurring in network science, like degrees, link weights and betweenness centrality, follow a power-law distribution in both real and model networks.

• The power-law form is analytically predicted by appropriate network models (CHAPTER 5).

Crossover Distribution (Log-Normal, Stretched Exponential)
When an empirically observed distribution appears to be between a power law and exponential, crossover distributions are often used to fit the data. These distributions may be exponentially bounded (power law with exponential cutoff), or not bounded but decay faster than a power law (log-normal or stretched exponential). Next we discuss the properties of several frequently encountered crossover distributions.

Power law with exponential cut-off is often used to fit the degree distribution of real networks. Its density function has the form:

$$p(x) = C x^{-\gamma} e^{-\lambda x}, \qquad (4.30)$$

$$C = \frac{\lambda^{1-\gamma}}{\Gamma(1-\gamma, \lambda x_{\min})}, \qquad (4.31)$$

where x > 0 and γ > 0 and Γ(s,y) denotes the upper incomplete gamma function. The analytical form (4.30) directly captures its crossover nature: it combines a power-law term, a key component of fat tailed distributions, with an exponential term, responsible for its exponentially bounded tail. To highlight its crossover characteristics we take the logarithm of (4.30),

$$\ln p(x) = \ln C - \gamma \ln x - \lambda x. \qquad (4.32)$$

For x ≪ 1/λ the second term on the r.h.s. dominates, suggesting that the distribution follows a power law with exponent γ. Once x ≫ 1/λ, the λx term overcomes the ln x term, resulting in an exponential cutoff for high x.
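The crossover can also be read off from the local slope of ln p(x) on a log-log plot, which according to (4.32) equals -γ - λx. The following sketch checks this numerically. The function names, the step size `h` and the parameter values are choices made for illustration, and ln C is set to zero because a constant does not affect the slope.

```python
import math

def log_density(x, gamma, lam, log_c=0.0):
    """ln p(x) = ln C - gamma * ln x - lam * x, following (4.32)."""
    return log_c - gamma * math.log(x) - lam * x

def local_slope(x, gamma, lam, h=1e-4):
    """Numerical d ln p / d ln x, using a central difference in ln x."""
    up, down = x * math.exp(h), x * math.exp(-h)
    return (log_density(up, gamma, lam) - log_density(down, gamma, lam)) / (2 * h)

gamma, lam = 2.5, 0.01          # the crossover is expected near x = 1/lam = 100
for x in (1.0, 10.0, 100.0, 1000.0):
    print(x, local_slope(x, gamma, lam))
```

Well below 1/λ the slope stays close to -γ, and well above it the slope is dominated by the -λx term.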


Stretched exponential (Weibull distribution) is formally similar to (4.30) except that there is a fractional power law in the exponential. Its name comes from the fact that its cumulative distribution function is one minus a stretched exponential function

$$P(x) = e^{-(\lambda x)^{\beta}}, \qquad (4.32)$$

which leads to the density function

$$p(x) = C x^{\beta-1} e^{-(\lambda x)^{\beta}}, \qquad (4.33)$$

$$C = \beta\lambda^{\beta}. \qquad (4.34)$$

In most applications x varies between 0 and +∞. In (4.32) β is the stretching exponent, determining the properties of p(x):

• For β = 1 we recover a simple exponential function.

• If β is between 0 and 1, the graph of log p(x) versus x is “stretched”, meaning that it spans several orders of magnitude in x. This is the regime where a stretched exponential is difficult to distinguish from a pure power law. The closer β is to 0, the more similar p(x) is to the power law $x^{-1}$.

• If β > 1 we have a “compressed” exponential function, meaning that x varies in a very narrow range.

• For β = 2, (4.33) reduces to the Rayleigh distribution.

As we will see in CHAPTERS 5 and 6, several network models predict a stretched exponential degree distribution.

A log-normal distribution (Galton or Gibrat distribution) emerges if ln x follows a normal distribution. Typically a variable follows a log-normal distribution if it is the product of many independent positive random numbers. We encounter log-normal distributions in finance, representing the compound return from a sequence of trades. The probability density function of a log-normal distribution is

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right]. \qquad (4.35)$$

Hence a log-normal is like a normal distribution except that its variable in the exponential term is not x, but ln x. To understand why a log-normal is occasionally used to fit a power-law distribution, we note that

$$\sigma^2 = \langle(\ln x)^2\rangle - \langle\ln x\rangle^2 \qquad (4.36)$$

captures the typical variation of the order of magnitude of x. Therefore now ln x follows a normal distribution, which means that x can vary rather widely. Depending on the value of σ the log-normal distribution may resemble a power law for several orders of magnitude. This is also illustrated in Table 4.2, which shows that ⟨x2⟩ grows exponentially with σ², hence it can be very large.

In summary, in most areas where we encounter fat-tailed distributions, there is an ongoing debate asking which distribution offers the best fit to the data. Frequently encountered candidates include a power law, a stretched exponential, or a log-normal function. In many systems empirical data is not sufficient to distinguish these distributions. Hence as long as there is empirical data to be fitted, the debate surrounding the best fit will never die out. The debate is resolved by accurate mechanistic models, which analytically predict the expected degree distribution. We will see in the coming chapters that in the context of networks the models predict Poisson, simple exponential, stretched exponential, and power-law distributions. The remaining distributions in Table 4.2 are occasionally used to fit the degrees of some networks, despite the fact that we lack a theoretical basis for their relevance for networks.


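NAME | $p_x$ / $p(x)$ | $\langle x\rangle$ | $\langle x^2\rangle$
Poisson (discrete) | $e^{-\mu}\mu^{x}/x!$ | $\mu$ | $\mu(1+\mu)$
Exponential (discrete) | $(1-e^{-\lambda})e^{-\lambda x}$ | $1/(e^{\lambda}-1)$ | $(e^{\lambda}+1)/(e^{\lambda}-1)^2$
Exponential (continuous) | $\lambda e^{-\lambda x}$ | $1/\lambda$ | $2/\lambda^2$
Power law (discrete) | $x^{-\alpha}/\zeta(\alpha)$ | $\zeta(\alpha-1)/\zeta(\alpha)$ if $\alpha > 2$; $\infty$ if $\alpha \le 2$ | $\zeta(\alpha-2)/\zeta(\alpha)$ if $\alpha > 3$; $\infty$ if $\alpha \le 3$
Power law (continuous) | $(\alpha-1)x^{-\alpha}$ | $(\alpha-1)/(\alpha-2)$ if $\alpha > 2$; $\infty$ if $\alpha \le 2$ | $(\alpha-1)/(\alpha-3)$ if $\alpha > 3$; $\infty$ if $\alpha \le 3$
Power law with cutoff (continuous) | $\dfrac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}x^{-\alpha}e^{-\lambda x}$ | $\lambda^{-1}\dfrac{\Gamma(2-\alpha)}{\Gamma(1-\alpha)}$ | $\lambda^{-2}\dfrac{\Gamma(3-\alpha)}{\Gamma(1-\alpha)}$
Stretched exponential (continuous) | $\beta\lambda^{\beta}x^{\beta-1}e^{-(\lambda x)^{\beta}}$ | $\lambda^{-1}\Gamma(1+\beta^{-1})$ | $\lambda^{-2}\Gamma(1+2\beta^{-1})$
Log-normal (continuous) | $\dfrac{1}{x\sqrt{2\pi\sigma^2}}e^{-(\ln x-\mu)^2/(2\sigma^2)}$ | $e^{\mu+\sigma^2/2}$ | $e^{2(\mu+\sigma^2)}$
Normal (continuous) | $\dfrac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/(2\sigma^2)}$ | $\mu$ | $\mu^2+\sigma^2$

The two power-law rows are given for xmin = 1.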

Table 4.2 Distributions in Network Science

The table lists frequently encountered distributions in network science. For each distribution we show the density function px, with the appropriate normalization constant C chosen such that

$$\int_{x_{\min}}^{\infty} C f(x)\,dx = 1$$

for the continuous case or

$$\sum_{x = x_{\min}}^{\infty} C f(x) = 1$$

for the discrete case. Given that ⟨x⟩ and ⟨x2⟩ play an important role in network theory, we show the analytical form of these two quantities for each distribution. As some of these distributions diverge at x = 0, for most of them ⟨x⟩ and ⟨x2⟩ are calculated assuming that there is a small cutoff xmin in the system. In networks xmin often corresponds to the smallest degree, kmin, or the smallest degree for which the appropriate distribution offers a good fit.


Figure 4.21 Distributions Visualized

Linear (lin-lin) and log-log plots of pk versus k for the most frequently encountered distributions in network science: (a) Poisson, (b) Exponential, (c) Power Law, (d) Power Law with Exponential Cutoff, (e) Stretched Exponential, (f) Log-normal, (g) Gaussian. For definitions see Table 4.2.


SECTION 4.12

ADVANCED TOPICS 4.B PLOTTING POWER-LAWS

Plotting the degree distribution is an integral part of analyzing the properties of a network. The process starts with obtaining Nk, the number of nodes with degree k. This can be provided by direct measurement or by a model. From Nk we calculate pk = Nk/N. The question is how to plot pk to best extract its properties.

Use a Log-Log Plot
In a scale-free network numerous nodes with one or two links coexist with a few hubs, representing nodes with thousands or even millions of links. Using a linear k-axis compresses the numerous small degree nodes in the small-k region, rendering them invisible. Similarly, as there can be orders of magnitude differences in pk for k = 1 and for large k, if we plot pk on a linear vertical axis, its value for large k will appear to be zero (Figure 4.22a). The use of a log-log plot avoids these problems. We can either use logarithmic axes, with powers of 10 (used throughout this book, Figure 4.22b) or we can plot log pk as a function of log k (equally correct, but slightly harder to read). Note that points with pk = 0 or k = 0 are not shown on a log-log plot as log 0 = -∞.

Avoid Linear Binning
The most flawed method (yet frequently seen in the literature) is to simply plot pk = Nk/N on a log-log plot (Figure 4.22b). This is called linear binning, as each bin has the same size Δk = 1. For a scale-free network linear binning results in an instantly recognizable plateau at large k, consisting of numerous data points that form a horizontal line (Figure 4.22b). This plateau has a simple explanation: Typically we have only one copy of each high degree node, hence in the high-k region we either have Nk = 0 (no node with degree k) or Nk = 1 (a single node with degree k). Consequently linear binning will either provide pk = 0, not shown on a log-log plot, or pk = 1/N, which applies to all hubs, generating a plateau at pk = 1/N.

This plateau affects our ability to estimate the degree exponent γ. For example, if we attempt to fit a power law to the data shown in Figure 4.22b using linear binning, the obtained γ is quite different from the real value γ = 2.5. The reason is that under linear binning we have a large number of nodes in small k bins, allowing us to confidently fit pk in this regime. In the large-k bins we have too few nodes for a proper statistical estimate of pk. Instead the emerging plateau biases our fit. Yet, it is precisely this high-k regime that plays a key role in determining γ. Increasing the bin size will not solve this problem. It is therefore recommended to avoid linear binning for fat tailed distributions.

Figure 4.22 Plotting a Degree Distribution

A degree distribution of the form $p_k \sim (k + k_0)^{-\gamma}$, with k0 = 10 and γ = 2.5, plotted using the four procedures described in the text:

(a) Linear Scale, Linear Binning. It is impossible to see the distribution on a lin-lin scale. This is the reason why we always use a log-log plot for scale-free networks.

(b) Log-Log Scale, Linear Binning. Now the tail of the distribution is visible but there is a plateau in the high-k regime, a consequence of linear binning.

(c) Log-Log Scale, Log-Binning. With log-binning the plateau disappears and the scaling extends into the high-k regime. For reference we show in light grey the data of (b) with linear binning.

(d) Log-Log Scale, Cumulative. The cumulative degree distribution Pk shown on a log-log plot.

Use Logarithmic Binning
Logarithmic binning corrects the non-uniform sampling of linear binning. For log-binning we let the bin sizes increase with the degree, making sure that each bin has a comparable number of nodes. For example, we can choose the bin sizes to be multiples of 2, so that the first bin has size b0 = 1, containing all nodes with k = 1; the second has size b1 = 2, containing nodes with degrees k = 2, 3; the third bin has size b2 = 4, containing nodes with degrees k = 4, 5, 6, 7. By induction the nth bin has size $2^{n-1}$ and contains all nodes with degrees $k = 2^{n-1}, 2^{n-1}+1, \ldots, 2^{n}-1$. Note that the bin size can increase with arbitrary increments, $b_n = c^n$, where c > 1. The degree distribution is given by $p_{\langle k_n\rangle} = N_n/b_n$, where Nn is the number of nodes found in the bin n of size bn and ⟨kn⟩ is the average degree of the nodes in bin bn.
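The binning rule just described translates into a short routine. The sketch below is an illustration under a few assumptions of its own: the function name is hypothetical, bin n covers the integer degrees in [c^n, c^(n+1)), and the bin count is divided by N as well as by the bin size so that the result is directly comparable to pk = Nk/N.

```python
import math

def log_binned_distribution(degrees, base=2.0):
    """Return [(mean degree in bin, N_n / (N * b_n)), ...] for the non-empty bins.

    Bin n (n = 0, 1, 2, ...) holds the integer degrees k with base**n <= k < base**(n + 1).
    """
    total = len(degrees)
    bins = {}
    for k in degrees:
        if k < 1:
            continue                      # k = 0 cannot be placed on a logarithmic axis
        n = 0
        while base ** (n + 1) <= k:       # find the bin that contains k
            n += 1
        bins.setdefault(n, []).append(k)

    result = []
    for n in sorted(bins):
        members = bins[n]
        low = math.ceil(base ** n)
        high = math.ceil(base ** (n + 1))  # first integer degree of the next bin
        width = high - low                 # number of integer degrees in the bin, b_n
        mean_k = sum(members) / len(members)
        result.append((mean_k, len(members) / (total * width)))
    return result

# Hypothetical degree list; with base 2 the bins are {1}, {2, 3}, {4..7}, {8..15}.
sample = [1, 1, 1, 1, 2, 2, 3, 4, 5, 8]
for mean_k, p in log_binned_distribution(sample):
    print(mean_k, p)
```

Plotting the returned pairs on logarithmic axes gives one point per bin, placed at the average degree of the nodes in that bin.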

The logarithmically binned pk is shown in Figure 4.22c. Note that the scaling now extends into the high-k region that was hidden by the plateau under linear binning. Therefore logarithmic binning extracts useful information from the rare high degree nodes as well (BOX 4.10).

Use Cumulative Distribution
Another way to extract information from the tail of pk is to plot the complementary cumulative distribution

$$P_k = \sum_{q=k+1}^{\infty} p_q, \qquad (4.37)$$

which again enhances the statistical significance of the high-degree region. If pk follows the power law (4.1), then the cumulative distribution scales as

$$P_k \sim k^{-\gamma+1}. \qquad (4.38)$$

The cumulative distribution again eliminates the plateau observed for linear binning and leads to an extended scaling region (Figure 4.22d), allowing for a more accurate estimate of the degree exponent.

In summary, plotting the degree distribution to extract its features requires special attention. Mastering the appropriate tools can help us better explore the properties of real networks (BOX 4.10).
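The complementary cumulative distribution (4.37) is straightforward to compute from a list of node degrees. The function below is a minimal sketch with an illustrative name. It returns Pk only for degrees that occur in the data.

```python
from collections import Counter

def complementary_cumulative(degrees):
    """Return {k: P_k} with P_k = sum over q > k of p_q, following (4.37)."""
    n = len(degrees)
    counts = Counter(degrees)
    result = {}
    remaining = n                     # nodes with degree >= the current k
    for k in sorted(counts):
        remaining -= counts[k]        # now the nodes with degree strictly above k
        result[k] = remaining / n
    return result

# Hypothetical degree list.
print(complementary_cumulative([1, 1, 2, 3]))
```

The value for the largest observed degree is zero by construction, so that point does not appear on a log-log plot.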


BOX 4.10 DEGREE DISTRIBUTION OF REAL NETWORKS

In real systems we rarely observe a degree distribution that follows a pure power law. Instead, for most real systems pk has the shape shown in Figure 4.23a, with some recurring features:

• Low-degree saturation is a common deviation from the power-law behavior. Its signature is a flattened pk for k < ksat. This indicates that we have fewer small degree nodes than expected for a pure power law. The origin of the saturation will be explained in CHAPTER 6.

• High-degree cutoff appears as a rapid drop in pk for k > kcut, indicating that we have fewer high-degree nodes than expected in a pure power law. This limits the size of the largest hub, making it smaller than predicted by (4.18). High-degree cutoffs emerge if there are inherent limitations in the number of links a node can have. For example, in social networks individuals have difficulty maintaining meaningful relationships with an exceptionally large number of acquaintances.

Given the widespread presence of such cutoffs the degree distribution is occasionally fitted to

$$p_k = a (k + k_{sat})^{-\gamma} \exp\left(-\frac{k}{k_{cut}}\right) , \qquad (4.39)$$

where ksat accounts for degree saturation, and the exponential term accounts for high-k cutoff. To extract the full extent of the scaling we plot

$$\tilde{p}_k = p_k \exp\left(\frac{k}{k_{cut}}\right) \qquad (4.40)$$

as a function of $\tilde{k} = k + k_{sat}$. According to (4.40) $\tilde{p}_k \sim \tilde{k}^{-\gamma}$, correcting for the two cutoffs, as seen in Figure 4.23b.

It is occasionally claimed that the presence of low-degree or high-degree cutoffs implies that the network is not scale-free. This is a misunderstanding of the scale-free property: Virtually all properties of scale-free networks are insensitive to the low-degree saturation. Only the high-degree cutoff affects the system’s properties by limiting the divergence of the second moment, ⟨k2⟩. The presence of such cutoffs indicates the presence of additional phenomena that need to be understood.

Figure 4.23 Rescaling the Degree Distribution
(a) In real networks the degree distribution frequently deviates from a pure power law by showing a low degree saturation (ksat) and a high degree cutoff (kcut).
(b) By plotting the rescaled $\tilde{p}_k$ as a function of (k + ksat), as suggested by (4.40), the degree distribution follows a power law for all degrees.
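The rescaling behind (4.39) and (4.40) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rescale_distribution` and the parameter values (a, γ, ksat, kcut) are made up, and the input distribution is generated from (4.39) itself rather than measured on a real network.

```python
import math

def rescale_distribution(pk, k_sat, k_cut):
    """Map each (k, p_k) to (k + k_sat, p_k * exp(k / k_cut)), following (4.40)."""
    return [(k + k_sat, p * math.exp(k / k_cut)) for k, p in sorted(pk.items())]

# Synthetic p_k built from (4.39) with illustrative parameter values.
a, gamma, k_sat, k_cut = 1.0, 2.5, 10, 500
pk = {k: a * (k + k_sat) ** -gamma * math.exp(-k / k_cut) for k in range(1, 1001)}
rescaled = rescale_distribution(pk, k_sat, k_cut)

# On doubly logarithmic axes the rescaled points fall on a line of slope -gamma.
(x0, y0), (x1, y1) = rescaled[0], rescaled[-1]
slope = (math.log(y1) - math.log(y0)) / (math.log(x1) - math.log(x0))
print(slope)
```

For empirical data ksat and kcut are not known in advance, so they have to be estimated before the rescaling can be applied.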


SECTION 4.13

ADVANCED TOPICS 4.C ESTIMATING THE DEGREE EXPONENT

As the properties of scale-free networks depend on the degree exponent (SECTION 4.7), we need to determine the value of γ. We face several difficulties, however, when we try to fit a power law to real data. The most important is the fact that the scaling is rarely valid for the full range of the degree distribution. Rather we observe low- and high-degree cutoffs (BOX 4.10), denoted in this section with Kmin and Kmax, within which we have a clear scaling region. Note that Kmin and Kmax are different from kmin and kmax, the latter corresponding to the smallest and largest degrees in a network. They can be the same as ksat and kcut discussed in BOX 4.10. Here we focus on estimating the small degree cutoff Kmin, as the high degree cutoff can be determined in a similar fashion. The reader is advised to consult the discussion on systematic problems provided at the end of this section before implementing this procedure.

Online Resource 4.2 Fitting power-law
The algorithmic tools to perform the fitting procedure described in this section are available at http://tuvalu.santafe.edu/~aaronc/powerlaws/.

Fitting Procedure
As the degree distribution is typically provided as a list of positive integers kmin , ..., kmax, we aim to estimate γ from a discrete set of data points [47]. We use the citation network to illustrate the procedure. The network consists of N=384,362 nodes, each node representing a research paper published between 1890 and 2009 in journals published by the American Physical Society. The network has L = 2,353,984 links, each representing a citation from a published research paper to some other publication in the dataset (outside citations are ignored). Note that this is not the citation dataset listed in Table 4.1. See [48] for an overall characterization of this data. The steps of the fitting process are [47]:

1. Choose a value of Kmin between kmin and kmax. Estimate the value of the degree exponent corresponding to this Kmin using

$$\gamma = 1 + N \left[ \sum_{i=1}^{N} \ln \frac{k_i}{K_{min} - \frac{1}{2}} \right]^{-1} . \qquad (4.41)$$

2. With the obtained (γ, Kmin) parameter pair assume that the degree distribution has the form

$$p_k = \frac{1}{\zeta(\gamma, K_{min})} k^{-\gamma} , \qquad (4.42)$$

hence the associated cumulative distribution function (CDF) is

$$P_k = 1 - \frac{\zeta(\gamma, k)}{\zeta(\gamma, K_{min})} . \qquad (4.43)$$

3. Use the Kolmogorov-Smirnov test to determine the maximum distance D between the CDF of the data S(k) and the fitted model provided by (4.43) with the selected (γ, Kmin) parameter pair,

$$D = \max_{k \ge K_{min}} | S(k) - P_k | . \qquad (4.44)$$

Equation (4.44) identifies the degree for which the difference D between the empirical distribution S(k) and the fitted distribution (4.43) is the largest.

4. Repeat steps (1-3) by scanning the whole Kmin range from kmin to kmax. We aim to identify the Kmin value for which D provided by (4.44) is minimal. To illustrate the procedure, we plot D as a function of Kmin for the citation network (Figure 4.24b). The plot indicates that D is minimal for Kmin= 49, and the corresponding γ estimated by (4.41), representing the optimal fit, is γ=2.79. The standard error for the obtained degree exponent is

$$\sigma_\gamma = \frac{1}{\sqrt{N \left[ \frac{\zeta''(\gamma, K_{min})}{\zeta(\gamma, K_{min})} - \left( \frac{\zeta'(\gamma, K_{min})}{\zeta(\gamma, K_{min})} \right)^{2} \right]}} , \qquad (4.45)$$

which implies that the best fit is γ ± σγ. For the citation network we obtain σγ=0.003, hence γ=2.79(3).

Note that in order to estimate γ datasets smaller than N=50 should be treated with caution.

Figure 4.24 Maximum Likelihood Estimation
(a) The degree distribution pk of the citation network, where the straight purple line represents the best fit based on the model (4.39).
(b) The values of the Kolmogorov-Smirnov test vs. Kmin for the citation network.
(c) p(Dsynthetic) for M=10,000 synthetic datasets, where the grey line corresponds to the Dreal value extracted for the citation network.

Goodness-of-fit
Just because we obtained a (γ, Kmin) pair that represents an optimal fit to our dataset does not mean that the power law itself is a good model for the studied distribution. We therefore need to use a goodness-of-fit test, which generates a p-value that quantifies the plausibility of the power law hypothesis. The most often used procedure consists of the following steps:

1. Use the cumulative distribution (4.43) to estimate the KS distance between the real data and the best fit, which we denote by Dreal. This is step 3 above, taking the value of D for the Kmin that offered the best fit to the data. For the citation data we obtain Dreal = 0.01158 for Kmin= 49 (Figure 4.24c).


2. Use (4.42) to generate a degree sequence of N degrees (i.e. the same number of random numbers as the number of nodes in the original dataset) and substitute the obtained degree sequence for the empirical data, determining Dsynthetic for this hypothetical degree sequence. Hence Dsynthetic represents the distance between a synthetically generated degree sequence, consistent with our degree distribution, and the real data.

3. The goal is to see if the obtained Dsynthetic is comparable to Dreal. For this we repeat step (2) M times (M ≫ 1), and each time we generate a new degree sequence and determine the corresponding Dsynthetic, eventually obtaining the p(Dsynthetic) distribution. Plot p(Dsynthetic) and show Dreal as a vertical bar (Figure 4.24c). If Dreal is within the p(Dsynthetic) distribution, it means that the distance between the model providing the best fit and the empirical data is comparable with the distance expected from random degree samples chosen from the best fit distribution. Hence the power law is a reasonable model for the data. If, however, Dreal falls outside the p(Dsynthetic) distribution, then the power law is not a good model, and some other function is expected to describe the original pk better.

While the distribution shown in Figure 4.24c may be in some cases useful to illustrate the statistical significance of the fit, in general it is better to assign a p-value to the fit, given by

$$p = \int_{D_{real}}^{\infty} P(D_{synthetic}) \, dD_{synthetic} . \qquad (4.46)$$
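The fitting steps and the goodness-of-fit test described above can be sketched in plain Python. This is a rough illustration under several assumptions that are not in the original text: following [47], the sum in (4.41) is restricted to the nodes with k ≥ Kmin; the Hurwitz zeta function is approximated by a truncated sum with a tail correction; synthetic degrees are drawn with an approximate sampler (a continuous power law rounded to integers); and each synthetic sample is compared with the fitted model using the fixed (γ, Kmin) pair rather than being refitted. All function names are hypothetical, and the example runs on synthetic data, not on the citation network.

```python
import math
import random
from collections import Counter

def hurwitz_zeta(gamma, q, terms=2000):
    """Approximate zeta(gamma, q) = sum_{n>=0} (n + q)**-gamma, valid for gamma > 1."""
    head = sum((n + q) ** -gamma for n in range(terms))
    a = terms + q
    # Euler-Maclaurin estimate of the neglected tail sum_{n>=terms} (n + q)**-gamma.
    return head + a ** (1 - gamma) / (gamma - 1) + 0.5 * a ** -gamma

def gamma_mle(degrees, K_min):
    """Exponent estimate of (4.41), using only degrees >= K_min (assumption from [47])."""
    tail = [k for k in degrees if k >= K_min]
    return 1 + len(tail) / sum(math.log(k / (K_min - 0.5)) for k in tail)

def ks_distance(degrees, gamma, K_min):
    """D of (4.44): largest gap between the empirical CDF S(k) and the model CDF (4.43)."""
    tail = [k for k in degrees if k >= K_min]
    counts, n = Counter(tail), len(tail)
    norm = hurwitz_zeta(gamma, K_min)
    s_emp = p_mod = d = 0.0          # both hold the probability of a degree < k
    for k in range(K_min, max(tail) + 1):
        d = max(d, abs(s_emp - p_mod))
        s_emp += counts[k] / n
        p_mod += k ** -gamma / norm
    return max(d, abs(s_emp - p_mod))

def scan_K_min(degrees, candidates):
    """Steps 1-4: return (D, gamma, K_min) for the candidate K_min with minimal D."""
    fits = []
    for K_min in candidates:
        gamma = gamma_mle(degrees, K_min)
        fits.append((ks_distance(degrees, gamma, K_min), gamma, K_min))
    return min(fits)

def sample_power_law(n, gamma, K_min, rng):
    """Approximate discrete power-law sample: continuous draw rounded to an integer."""
    return [int((K_min - 0.5) * (1 - rng.random()) ** (-1 / (gamma - 1)) + 0.5)
            for _ in range(n)]

def p_value(d_real, n, gamma, K_min, M=200, seed=0):
    """Discrete analogue of (4.46): share of synthetic samples with D_synthetic >= D_real."""
    rng = random.Random(seed)
    hits = sum(ks_distance(sample_power_law(n, gamma, K_min, rng), gamma, K_min) >= d_real
               for _ in range(M))
    return hits / M

# Illustrative run on synthetic data (not the citation network).
rng = random.Random(1)
degrees = sample_power_law(2000, 3.0, 5, rng)
d_real, gamma, K_min = scan_K_min(degrees, range(5, 11))
n_tail = sum(k >= K_min for k in degrees)
print(gamma, K_min, d_real, p_value(d_real, n_tail, gamma, K_min))
```

The loop in `ks_distance` walks over every integer between Kmin and the largest observed degree, which keeps the code short but is slow for very large hubs. The tools referenced in Online Resource 4.2 are the better choice for real datasets.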

The closer p is to 1, the more likely that the difference between the empirical data and the model can be attributed to statistical fluctuations alone. If p is very small, the model is not a plausible fit to the data. Typically, the model is accepted if p > 1%. For the citation network we obtain p < 10-4, indicating that a pure power law is not a suitable model for the original degree distribution. This outcome is somewhat surprising, as the power-law nature of citation data has been documented repeatedly since the 1960s [7, 8]. This failure indicates the limitation of the blind fitting to a power law, without an analytical understanding of the underlying distribution.

Fitting Real Distributions
To correct the problem, we note that the fitting model (4.42) eliminates all the data points with k < Kmin. As the citation network is fat tailed, choosing Kmin = 49 forces us to discard over 96% of the data points. Yet, there is statistically useful information in the k < Kmin regime that is ignored by the previous fit. We must introduce an alternate model that resolves this problem.

As we discussed in BOX 4.10, the degree distribution of many real networks, like the citation network, does not follow a pure power law. It often has low degree saturations and high degree cutoffs, described by the form

$$p_k = \frac{1}{\sum_{k'=1}^{\infty} (k' + k_{sat})^{-\gamma} e^{-k'/k_{cut}}} (k + k_{sat})^{-\gamma} e^{-k/k_{cut}} \qquad (4.47)$$

and the associated CDF is

$$P_k = \frac{1}{\sum_{k'=1}^{\infty} (k' + k_{sat})^{-\gamma} e^{-k'/k_{cut}}} \sum_{k'=1}^{k} (k' + k_{sat})^{-\gamma} e^{-k'/k_{cut}} , \qquad (4.48)$$

where ksat and kcut correspond to low-k saturation and the large-k cutoff, respectively. The difference between our earlier procedure and (4.47) is that we now do not discard the points that deviate from a pure power law, but instead use a function that offers a better fit to the whole degree distribution, from kmin to kmax.
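A minimal sketch of the model (4.47) and its CDF (4.48) follows. It makes one assumption that is not in the text: the normalizing sum is truncated at a finite k_max, which is harmless when k_max is much larger than kcut because the exponential term suppresses the tail. The function names are hypothetical, and the parameter values are the ones quoted for the citation network later in this section, used here purely as an illustration.

```python
import math

def saturated_cutoff_pmf(gamma, k_sat, k_cut, k_max):
    """p_k of (4.47) for k = 1..k_max, normalised over that finite range (assumption)."""
    weights = [(k + k_sat) ** -gamma * math.exp(-k / k_cut) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def cumulative(pmf):
    """P_k of (4.48): running sum of p_k' for k' = 1..k."""
    out, running = [], 0.0
    for p in pmf:
        running += p
        out.append(running)
    return out

# Illustrative parameters; index 0 of each list corresponds to k = 1.
pmf = saturated_cutoff_pmf(gamma=3.028, k_sat=12, k_cut=5691, k_max=100000)
cdf = cumulative(pmf)
print(pmf[0], cdf[9], cdf[-1])
```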

Our goal is to find the fitting parameters ksat, kcut, and γ of the model (4.47), which we achieve through the following steps (Figure 4.25):

1. Pick a value for ksat and kcut between Kmin and Kmax. Estimate the value of the degree exponent γ using the steepest descent method that maximizes the log-likelihood function

$$\log \mathcal{L}(\gamma \,|\, k_{sat}, k_{cut}) = \sum_{i=1}^{N} \log p(k_i \,|\, \gamma, k_{sat}, k_{cut}) . \qquad (4.49)$$

That is, for fixed (ksat, kcut) we vary γ until we find the maximum of (4.49).

2. With the obtained γ(ksat, kcut) assume that the degree distribution has the form (4.47). Calculate the Kolmogorov-Smirnov parameter D between the cumulative degree distribution (CDF) of the original data and the fitted model provided by (4.47).

3. Change ksat and kcut, and repeat steps (1-2), scanning with ksat from kmin= 0 to kmax and scanning with kcut from kmin= k0 to kmax. The goal is to identify ksat and kcut values for which D is minimal. We illustrate this by plotting D as a function of ksat for several kcut values in Figure 4.25a for our citation network. The (ksat, kcut) for which D is minimal, and the corresponding γ provided by (4.49), represent the optimal parameters of the fit. For our dataset the optimal fit is obtained for ksat= 12 and kcut= 5,691, providing the degree exponent γ= 3.028. We find that now D for the real data is within the generated p(D) distribution (Figure 4.25c), and the associated p-value is 69%.

Figure 4.25 Estimating the Scaling Parameters for Citation Networks
(a) The Kolmogorov-Smirnov parameter D vs. ksat for kcut = 3,000, 6,000, 9,000, respectively. The curve indicates that ksat= 12 corresponds to the minimal D. Inset: D vs. kcut for ksat= 12, indicating that kcut =5,691 minimizes D.
(b) Degree distribution pk where the straight line represents the best estimate from (a). Now the fit accurately captures the whole curve, not only its tail, as it did in Figure 4.24a.
(c) p(Dsynthetic) for M = 10,000 synthetic datasets. The grey line corresponds to the Dreal value from the citation network. For this fit p = 0.69.

Systematic Fitting Issues
The procedure described above may offer the impression that determining the degree exponent is a cumbersome but straightforward process. In reality these fitting methods have some well known limitations:

1. A pure power law is an idealized distribution that emerges in its

form (4.1) only in simple models (CHAPTER 5). In reality, a whole range of processes contribute to the topology of real networks, affecting the precise shape of the degree distribution. These processes will be discussed in CHAPTER 6. If pk does not follow a pure power law, the methods described above, designed to fit a power law to the data, will inevitably fail to detect statistical significance. While this finding can mean that the network is not scale-free, it most often means that we have not yet gained a proper understanding of the precise form of the degree distribution. Hence we are fitting the wrong functional form of pk to the dataset.

2. The statistical tools used above to test the goodness-of-fit rely on the Kolmogorov-Smirnov criterion, which measures the maximum distance between the fitted model and the dataset. If almost all data points follow a perfect power law, but a single point for some reason deviates from the curve, we will lose the fit’s statistical significance. In real systems there are numerous reasons for such local deviations that have little impact on the system’s overall behavior. Yet, removing these “outliers” could be seen as data manipulation; if kept, however, one cannot detect the statistical significance of the power law fit.

A good example is provided by the actor network, whose degree distribution follows a power law for most degrees. There is, however, a prominent outlier at k = 1,287, thanks to the 1956 movie Around the World in Eighty Days. This is the only movie where imdb.com, the source of the actor network, lists all the normally uncredited extras in the cast. Hence the movie appears to have 1,288 actors. The second largest movie in the dataset has only 340 actors. Since each extra has links only to the 1,287 extras that played in the same movie, we have a local peak in pk at k=1,287. Thanks to this peak, the degree distribution, fitted to a power law, fails to pass the Kolmogorov-Smirnov criterion. Indeed, as indicated in Table 4.4, neither the pure power law fit, nor a power law with high-degree cutoff offers a statistically significant fit. Yet, ultimately this single point does not alter the power law nature of the degree distribution.

4. As a result of the issues discussed above, the methodology described to fit a power law distribution often predicts a small scaling regime, forcing us to remove a huge fraction of the nodes (often as many as 99%, see Table 4.4) to obtain a statistically significant fit. Once plotted next to the original dataset, the obtained fit can be at times ridiculous, even if the method predicts statistical significance.

Table 4.3 Exponential Fitting

NETWORK      λ       kmin   P-VALUE   PERCENTAGE
Power Grid   0.517   4      0.91      12%

For the power grid a power law degree distribution does not offer a statistically significant fit. Indeed, we will encounter ample evidence that the underlying network is not scale-free. We used the fitting procedure described in this section to fit the exponential function e-λk to the degree distribution of the power grid, obtaining a statistically significant fit. The table shows the obtained λ parameter, the kmin over which the fit is valid, the obtained p-value, and the percentage of data points included in the fit.
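The three-step procedure for fitting the model (4.47), described under Fitting Real Distributions, can be sketched as follows. The sketch departs from the text in ways that should be read as assumptions: γ is found with a golden-section search instead of the steepest descent method, the normalizing sum of (4.47) is truncated at the largest observed degree, all degrees are assumed to be at least 1, and (ksat, kcut) are scanned over a small hand-picked grid. Function names, grids and the synthetic input are hypothetical.

```python
import math
from collections import Counter

def weights(gamma, k_sat, k_cut, k_max):
    """Unnormalised (k + k_sat)**-gamma * exp(-k / k_cut) for k = 1..k_max."""
    return [(k + k_sat) ** -gamma * math.exp(-k / k_cut) for k in range(1, k_max + 1)]

def log_likelihood(counts, gamma, k_sat, k_cut, k_max):
    """(4.49) for the model (4.47); counts maps each degree to its multiplicity."""
    log_norm = math.log(sum(weights(gamma, k_sat, k_cut, k_max)))
    return sum(c * (-gamma * math.log(k + k_sat) - k / k_cut - log_norm)
               for k, c in counts.items())

def best_gamma(counts, k_sat, k_cut, k_max, lo=1.01, hi=10.0, tol=1e-4):
    """Step 1: maximise (4.49) over gamma by golden-section search."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - phi * (b - a), a + phi * (b - a)
        if (log_likelihood(counts, c, k_sat, k_cut, k_max)
                > log_likelihood(counts, d, k_sat, k_cut, k_max)):
            b = d        # the maximum lies in [a, d]
        else:
            a = c        # the maximum lies in [c, b]
    return (a + b) / 2

def ks_distance(counts, gamma, k_sat, k_cut, k_max):
    """Step 2: largest gap between the empirical CDF and the model CDF (4.48)."""
    w = weights(gamma, k_sat, k_cut, k_max)
    norm, n = sum(w), sum(counts.values())
    s_emp = p_mod = d = 0.0
    for k in range(1, k_max + 1):
        s_emp += counts[k] / n
        p_mod += w[k - 1] / norm
        d = max(d, abs(s_emp - p_mod))
    return d

def fit_saturated_cutoff(degrees, k_sat_grid, k_cut_grid):
    """Step 3: scan (k_sat, k_cut) and return (D, gamma, k_sat, k_cut) with minimal D."""
    counts, k_max = Counter(degrees), max(degrees)
    fits = []
    for k_sat in k_sat_grid:
        for k_cut in k_cut_grid:
            gamma = best_gamma(counts, k_sat, k_cut, k_max)
            fits.append((ks_distance(counts, gamma, k_sat, k_cut, k_max), gamma, k_sat, k_cut))
    return min(fits)

# Illustrative data: degree counts proportional to (4.47) with made-up parameters.
true_w = weights(2.5, 3, 50, 200)
true_norm = sum(true_w)
degrees = [k for k, w in enumerate(true_w, start=1)
           for _ in range(round(100000 * w / true_norm))]
print(fit_saturated_cutoff(degrees, [0, 3, 6], [25, 50, 100]))
```

A grid this coarse only illustrates the logic. A real fit would scan ksat and kcut much more finely, as in Figure 4.25a.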


In summary, estimating the degree exponent is not yet an exact science. We continue to lack methods that would estimate the statistical significance in a manner that would be acceptable to a practitioner. The blind application of the tools described above often leads either to fits that obviously do not capture the trends in the data, or to a false rejection of the power-law hypothesis. An important improvement is our ability to derive the expected form of the degree distribution, a problem discussed in CHAPTER 6.

Table 4.4 Fitting Parameters for Real Networks

                              k^-γ on [Kmin, ∞)                    (k + ksat)^-γ e^(-k/kcut)
NETWORK                       γ      Kmin   P-VALUE   PERCENT      γ       ksat   kcut   P-VALUE
INTERNET                      3.42   72     0.13      0.6%         3.55    8      8500   0.00
WWW (IN)                      2.00   1      0.00      100%         1.97    0      660    0.00
WWW (OUT)                     2.31   7      0.00      15%          2.82    8      8500   0.00
POWER GRID                    4.00   5      0.00      12%          8.56    19     14     0.00
MOBILE PHONE CALLS (IN)       4.69   9      0.34      2.6%         6.95    15     10     0.00
MOBILE PHONE CALLS (OUT)      5.01   11     0.77      1.7%         7.23    15     10     0.00
EMAIL-PRE (IN)                3.43   88     0.11      0.2%         2.27    0      8500   0.00
EMAIL-PRE (OUT)               2.03   3      0.00      1.2%         2.55    0      8500   0.00
SCIENCE COLLABORATION         3.35   25     0.0001    5.4%         1.50    17     12     0.00
ACTOR NETWORK                 2.12   54     0.00      33%          -       -      -      0.00
CITATION NETWORK (IN)         2.79   51     0.00      3.0%         3.03    12     5691   0.69
CITATION NETWORK (OUT)        4.00   19     0.00      14%          -0.16   5      10     0.00
E.COLI METABOLISM (IN)        2.43   3      0.00      57%          3.85    19     12     0.00
E.COLI METABOLISM (OUT)       2.90   5      0.00      34%          2.56    15     10     0.00
YEAST PROTEIN INTERACTIONS    2.89   7      0.67      8.3%         2.95    2      90     0.52

The estimated degree exponents and the appropriate fit parameters for the reference networks studied in this book. We implement two fitting strategies, the first aiming to fit a pure power law in the region (Kmin, ∞) and the second fitting a power law with saturation and exponential cutoff to the whole dataset. For the first strategy the table shows the obtained γ exponent and Kmin for the fit with the best statistical significance, the p-value for the best fit and the percentage of the data included in the fit. For the second strategy we again show the exponent γ, the two fit parameters, ksat and kcut, and the p-value of the obtained fit. Note that p > 0.01 is considered to be statistically significant.


SECTION 4.14

BIBLIOGRAPHY

[1] H. Jeong, R. Albert, and A.-L. Barabási. Internet: Diameter of the world-wide web. Nature, 401:130-131, 1999.
[2] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509-512, 1999.
[3] V. Pareto. Cours d’Économie Politique: Nouvelle édition par G.-H. Bousquet et G. Busino, Librairie Droz, Geneva, 299-345, 1964.
[4] A.-L. Barabási. Linked: The New Science of Networks. Plume, New York, 2002.
[5] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. Proceedings of SIGCOMM. Comput. Commun. Rev. 29: 251-262, 1999.
[6] R. Pastor-Satorras and A. Vespignani. Evolution and Structure of the Internet: A Statistical Physics Approach. Cambridge University Press, Cambridge, 2004.
[7] D. J. De Solla Price. Networks of Scientific Papers. Science 149: 510-515, 1965.
[8] S. Redner. How Popular is Your Paper? An Empirical Study of the Citation Distribution. Eur. Phys. J. B 4: 131, 1998.
[9] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting Large-Scale Knowledge Bases from the Web. Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, pp. 639-650, 1999.
[10] A.-L. Barabási, R. Albert, and H. Jeong. Mean-field theory of scale-free random networks. Physica A 272:173-187, 1999.
[11] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási. The large-scale organization of metabolic networks. Nature 407: 651-654, 2000.
[12] A. Wagner and D.A. Fell. The small world inside large metabolic networks. Proc. R. Soc. Lond. B 268: 1803-1810, 2001.
[13] W. Aiello, F. Chung, and L. Lu. A random graph model for massive graphs, Proc. 32nd ACM Symp. Theor. Comp, 2000.
[14] H. Jeong, B. Tombor, S. P. Mason, A.-L. Barabási, and Z.N. Oltvai. Lethality and centrality in protein networks. Nature 411: 41-42, 2001.
[15] A. Wagner. How the global structure of protein interaction networks evolves. Proc. R. Soc. Lond. B 270: 457-466, 2003.
[16] M. E. J. Newman. The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. 98: 404-409, 2001.
[17] A.-L. Barabási, H. Jeong, E. Ravasz, Z. Néda, A. Schubert, and T. Vicsek. Evolution of the social network of scientific collaborations. Physica A 311: 590-614, 2002.
[18] F. Liljeros, C.R. Edling, L.A.N. Amaral, H.E. Stanley, and Y. Aberg. The Web of Human Sexual Contacts. Nature 411: 907-908, 2001.
[19] R. Ferrer i Cancho and R.V. Solé. The small world of human language. Proc. R. Soc. Lond. B 268: 2261-2265, 2001.
[20] R. Ferrer i Cancho, C. Janssen, and R.V. Solé. Topology of technology graphs: Small world patterns in electronic circuits. Phys. Rev. E 64: 046119, 2001.
[21] S. Valverde and R.V. Solé. Hierarchical Small Worlds in Software Architecture. arXiv:cond-mat/0307278, 2003.
[22] H. Ebel, L.-I. Mielsch, and S. Bornholdt. Scale-free topology of email networks. Phys. Rev. E 66: 035103(R), 2002.
[23] J.P.K. Doye. Network Topology of a Potential Energy Landscape: A Static Scale-Free Network. Phys. Rev. Lett. 88: 238701, 2002.
[24] J.-P. Onnela, J. Saramaki, J. Hyvonen, G. Szabó, D. Lazer, K. Kaski, J. Kertesz, and A.-L. Barabási. Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences 104: 7332-7336, 2007.
[25] H. Kwak, C. Lee, H. Park, S. Moon. What is Twitter, a social network or a news media? Proceedings of the 19th international conference on World Wide Web, 591-600, 2010.
[26] M. Cha, H. Haddadi, F. Benevenuto and K. P. Gummadi. Measuring user influence in Twitter: The million follower fallacy. Proceedings of international AAAI Conference on Weblogs and Social, 2010.
[27] J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The Anatomy of the Facebook Social Graph. ArXiv:1111.4503, 2011.
[28] L.A.N. Amaral, A. Scala, M. Barthelemy and H.E. Stanley. Classes of small-world networks. Proceeding National Academy of Sciences U. S. A. 97:11149-11152, 2000.
[29] R. Cohen and S. Havlin. Scale free networks are ultrasmall. Phys. Rev. Lett. 90, 058701, 2003.
[30] B. Bollobás and O. Riordan. The Diameter of a Scale-Free Random Graph. Combinatorica, 24: 5-34, 2004.
[31] R. Cohen and S. Havlin. Complex Networks - Structure, Robustness and Function. Cambridge University Press, Cambridge, 2010.
[32] K.-I. Goh, B. Kahng, and D. Kim. Universal behavior of load distribution in scale-free networks. Phys. Rev. Lett. 87: 278701, 2001.
[33] F. Karinthy. Láncszemek, in Minden másképpen van. Budapest, Atheneum Irodai es Nyomdai R.-T. Kiadása, 85-90, 1929. English translation in: M.E.J. Newman, A.-L. Barabási, and D. J. Watts. The Structure and Dynamics of Networks. Princeton University Press, Princeton, 2006.
[34] P.S. Dodds, R. Muhamad and D.J. Watts. An experimental study to search in global social networks. Science 301: 827-829, 2003.
[35] P. Erdős and T. Gallai. Graphs with given degrees of vertices. Matematikai Lapok, 11:264-274, 1960.
[36] C.I. Del Genio, H. Kim, Z. Toroczkai, and K.E. Bassler. Efficient and exact sampling of simple graphs with given arbitrary degree sequence. PLoS ONE, 5: e10012, 04 2010.
[37] V. Havel. A remark on the existence of finite graphs. Casopis Pest. Mat., 80:477-480, 1955.
[38] S. Hakimi. On the realizability of a set of integers as degrees of the vertices of a graph. SIAM J. Appl. Math., 10:496-506, 1962.
[39] I. Charo Del Genio, G. Thilo, and K.E. Bassler. All scale-free networks are sparse. Phys. Rev. Lett. 107:178701, 10 2011.
[40] B. Bollobás. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European J. Combin. 1: 311-316, 1980.
[41] M. Molloy and B. A. Reed. Critical Point for Random Graphs with a Given Degree Sequence. Random Structures and Algorithms, 6: 161-180, 1995.
[42] M. Newman. Networks: An Introduction. Oxford University, Oxford, 2010.
[43] S. Maslov and K. Sneppen. Specificity and stability in topology of protein networks. Science, 296:910-913, 2002.
[44] G. Caldarelli, I. A. Capocci, P. De Los Rios, and M.A. Muñoz. Scale-Free Networks from Varying Vertex Intrinsic Fitness. Phys. Rev. Lett. 89: 258702, 2002.
[45] B. Söderberg. General formalism for inhomogeneous random graphs. Phys. Rev. E 66: 066121, 2002.
[46] M. Boguñá and R. Pastor-Satorras. Class of correlated random networks with hidden variables. Phys. Rev. E 68: 036112, 2003.
[47] A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. SIAM Review 51: 661-703, 2009.
[48] S. Redner. Citation statistics from 110 years of physical review. Physics Today, 58:49, 2005.


5 ALBERT-LÁSZLÓ BARABÁSI

NETWORK SCIENCE THE BARABÁSI-ALBERT MODEL

ACKNOWLEDGEMENTS

MÁRTON PÓSFAI GABRIELE MUSELLA MAURO MARTINO ROBERTA SINATRA

SARAH MORRISON AMAL HUSSEINI PHILIPP HOEVEL

INDEX

Introduction
Growth and Preferential Attachment
The Barabási-Albert Model
Degree Dynamics
Degree Distribution
The Absence of Growth or Preferential Attachment
Measuring Preferential Attachment
Non-linear Preferential Attachment
The Origins of Preferential Attachment
Diameter and Clustering Coefficient
Homework
Summary
ADVANCED TOPICS 5.A Deriving the Degree Distribution
ADVANCED TOPICS 5.B Nonlinear Preferential Attachment
ADVANCED TOPICS 5.C The Clustering Coefficient
Bibliography

Figure 5.0 (cover image) Scale-free Sonata

Composed by Michael Edward Edgerton in 2003, 1 sonata for piano incorporates growth and preferential attachment to mimic the emergence of a scale-free network. The image shows the beginning of what Edgerton calls Hub #5. The relationship between the music and networks is explained by the composer: “6 hubs of different length and procedure were distributed over the 2nd and 3rd movements. Musically, the notion of an airport was utilized by diverting all traffic into a limited landing space, while the density of procedure and duration were varied considerably between the 6 differing occurrences.“

This book is licensed under a Creative Commons: CC BY-NC-SA 2.0. PDF V48 19.09.2014

SECTION 5.1

INTRODUCTION

Hubs represent the most striking difference between a random and a scale-free network. On the World Wide Web, they are websites with an exceptional number of links, like google.com or facebook.com; in the metabolic network they are molecules like ATP or ADP, energy carriers involved in an exceptional number of chemical reactions. The very existence of these hubs and the related scale-free topology raises two fundamental questions:

• Why do systems as different as the WWW and the cell converge to a similar scale-free architecture?

• Why does the random network model of Erdős and Rényi fail to reproduce the hubs and the power laws observed in real networks?

Online Resource 5.1 Scale-free Sonata
Listen to a recording of Michael Edward Edgerton's 1 sonata for piano, music inspired by scale-free networks.

The first question is particularly puzzling given the fundamental differences in the nature, origin, and scope of the systems that display the scale-free property:

• The nodes of the cellular network are metabolites or proteins, while the nodes of the WWW are documents, representing information without a physical manifestation.

• The links within the cell are chemical reactions and binding interactions, while the links of the WWW are URLs, or small segments of computer code.

• The history of these two systems could not be more different: The cellular network is shaped by 4 billion years of evolution, while the WWW is less than three decades old.

• The purpose of the metabolic network is to produce the chemical components the cell needs to stay alive, while the purpose of the WWW is information access and delivery.

To understand why such different systems converge to a similar architecture we need to first understand the mechanism responsible for the emergence of the scale-free property. This is the main topic of this chapter. Given the diversity of the systems that display the scale-free property, the explanation must be simple and fundamental. The answers will change the way we model networks, forcing us to move from describing a network’s topology to modeling the evolution of a complex system.


SECTION 5.2

GROWTH AND PREFERENTIAL ATTACHMENT

We start our journey by asking: Why are hubs and power laws absent in random networks? The answer emerged in 1999, highlighting two hidden assumptions of the Erdős-Rényi model that are violated in real networks [1]. Next we discuss these assumptions separately.

Networks Expand Through the Addition of New Nodes

The random network model assumes that we have a fixed number of nodes, N. Yet, in real networks the number of nodes continually grows thanks to the addition of new nodes. Consider a few examples:

• In 1991 the WWW had a single node, the first webpage built by Tim Berners-Lee, the creator of the Web. Today the Web has over a trillion (10^12) documents, an extraordinary number that was reached through the continuous addition of new documents by millions of individuals and institutions (Figure 5.1a).

• The collaboration and the citation networks continually expand through the publication of new research papers (Figure 5.1b).

• The actor network continues to expand through the release of new movies (Figure 5.1c).

• The protein interaction network may appear to be static, as we inherit our genes (and hence our proteins) from our parents. Yet, it is not: The number of genes grew from a few to the over 20,000 genes present in a human cell over four billion years.

Consequently, if we wish to model these networks, we cannot resort to a static model. Our modeling approach must instead acknowledge that networks are the product of a steady growth process.


Nodes Prefer to Link to the More Connected Nodes

The random network model assumes that we randomly choose the interaction partners of a node. Yet, in most real networks new nodes prefer to link to the more connected nodes, a process called preferential attachment (Figure 5.2). Consider a few examples:

• We are familiar with only a tiny fraction of the trillion or more documents available on the WWW. The nodes we know are not entirely random: We have all heard about Google and Facebook, but we rarely encounter the billions of less-prominent nodes that populate the Web. As our knowledge is biased towards the more popular Web documents, we are more likely to link to a high-degree node than to a node with only few links.

• No scientist can attempt to read the more than a million scientific papers published each year. Yet, the more cited a paper is, the more likely that we hear about it and eventually read it. As we cite what we read, our citations are biased towards the more cited publications, representing the high-degree nodes of the citation network.

• The more movies an actor has played in, the more familiar a casting director is with her skills. Hence, the higher the degree of an actor in the actor network, the higher are the chances that she will be considered for a new role.

Figure 5.1 The Growth of Networks
Networks are not static, but grow via the addition of new nodes:
(a) The evolution of the number of WWW hosts, documenting the Web’s rapid growth. After http://www.isc.org/solutions/survey/history.
(b) The number of scientific papers published in Physical Review since the journal’s founding. The increasing number of papers drives the growth of both the science collaboration network as well as of the citation network shown in the figure.
(c) Number of movies listed in IMDB.com, driving the growth of the actor network.

In summary, the random network model differs from real networks in two important characteristics:

(A) Growth
Real networks are the result of a growth process that continuously increases N. In contrast the random network model assumes that the number of nodes, N, is fixed.

(B) Preferential Attachment
In real networks new nodes tend to link to the more connected nodes. In contrast nodes in random networks randomly choose their interaction partners.

There are many other differences between real and random networks, some of which will be discussed in the coming chapters. Yet, as we show next, these two, growth and preferential attachment, play a particularly important role in shaping a network’s degree distribution.


Figure 5.2 Preferential Attachment: A Brief History

Preferential attachment has emerged independently in many disciplines, helping explain the presence of power laws characterising various systems. In the context of networks preferential attachment was introduced in 1999 to explain the scale-free property. The milestones, ordered by publication date:

• 1923, György Pólya (1887-1985), mathematician, Pólya process: Preferential attachment made its first appearance in 1923 in the celebrated urn model of the Hungarian mathematician György Pólya [2]. Hence, in mathematics preferential attachment is often called a Pólya process.

• 1925, George Udny Yule (1871-1951), statistician, Yule process: Yule used preferential attachment to explain the power-law distribution of the number of species per genus of flowering plants [3]. Hence, in statistics preferential attachment is often called a Yule process.

• 1931, Robert Gibrat (1904-1980), economist, proportional growth: Gibrat proposed that the size and the growth rate of a firm are independent. Hence, larger firms grow faster in absolute terms [4]. Called proportional growth, this is a form of preferential attachment.

• 1941, George Kingsley Zipf (1902-1950), economist, wealth distribution: Zipf used preferential attachment to explain the fat-tailed distribution of wealth in society [5].

• 1955, Herbert Alexander Simon (1916-2001), political scientist, master equation: Simon used preferential attachment to explain the fat-tailed nature of the distributions describing city sizes, word frequencies, or the number of papers published by scientists [6].

• 1968, Robert Merton (1910-2003), sociologist, Matthew effect: In sociology preferential attachment is often called the Matthew effect, named by Merton [8] after a passage in the Gospel of Matthew: “For everyone who has will be given more, and he will have an abundance.”

• 1976, Derek de Solla Price (1922-1983), physicist, cumulative advantage: Price used preferential attachment to explain the citation statistics of scientific publications, calling it cumulative advantage [7].

• 1999, Albert-László Barabási (1967) and Réka Albert (1972), network scientists, preferential attachment: Barabási and Albert introduced the term preferential attachment to explain the origin of scale-free networks [1].

SECTION 5.2

THE BARABÁSI-ALBERT MODEL

The recognition that growth and preferential attachment coexist in real networks has inspired a minimal model called the Barabási-Albert model, which can generate scale-free networks [1]. Also known as the BA model or the scale-free model, it is defined as follows:

We start with m0 nodes, the links between which are chosen arbitrarily, as long as each node has at least one link. The network develops following two steps (Figure 5.3):

(A) Growth
At each timestep we add a new node with m (≤ m0) links that connect the new node to m nodes already in the network.

(B) Preferential attachment
The probability Π(ki) that a link of the new node connects to node i depends on the degree ki as

$$\Pi(k_i) = \frac{k_i}{\sum_j k_j}. \qquad (5.1)$$

Figure 5.3 Evolution of the Barabási-Albert Model

The sequence of images shows nine subsequent steps of the Barabási-Albert model. Empty circles mark the newly added node to the network, which decides where to connect its two links (m=2) using preferential attachment (5.1). After [9].

Preferential attachment is a probabilistic mechanism: A new node is free to connect to any node in the network, whether it is a hub or has a single link. Equation (5.1) implies, however, that if a new node has a choice between a degree-two and a degree-four node, it is twice as likely that it connects to the degree-four node.

After t timesteps the Barabási-Albert model generates a network with N = t + m0 nodes and m0 + mt links. As Figure 5.4 shows, the obtained network has a power-law degree distribution with degree exponent γ=3. A mathematically self-consistent definition of the model is provided in BOX 5.1.

As Figure 5.3 and Online Resource 5.2 indicate, while most nodes in the network have only a few links, a few gradually turn into hubs. These hubs are the result of a rich-gets-richer phenomenon: Due to preferential attachment new nodes are more likely to connect to the more connected nodes than to the smaller nodes. Hence, the larger nodes will acquire links at the expense of the smaller nodes, eventually becoming hubs.

In summary, the Barabási-Albert model indicates that two simple mechanisms, growth and preferential attachment, are responsible for the emergence of scale-free networks. The origin of the power law and the associated hubs is a rich-gets-richer phenomenon induced by the coexistence of these two ingredients. To understand the model’s behavior and to quantify the emergence of the scale-free property, we need to become familiar with the model’s mathematical properties, which is the subject of the next section.

Online Resource 5.2 Emergence of a Scale-free Network

Watch a video that shows the growth of a scale-free network and the emergence of the hubs in the Barabási-Albert model. Courtesy of Dashun Wang.

Figure 5.4 The Degree Distribution

The degree distribution of a network generated by the Barabási-Albert model, shown as pk versus k on logarithmic axes. The figure shows pk for a single network of size N=100,000 and m=3. It shows both the linearly-binned (purple) and the log-binned version (green) of pk. The straight line is added to guide the eye and has slope −γ = −3 on the log-log plot, corresponding to the network’s predicted degree exponent γ=3.
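The growth and preferential attachment steps defined in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind Figure 5.4: the function names `barabasi_albert` and `degrees` are hypothetical, the seed network is assumed to be a complete graph on m + 1 nodes, and duplicate targets are redrawn to avoid multi-links, which departs slightly from drawing each link independently according to (5.1).

```python
import random
from collections import Counter


def barabasi_albert(n_nodes, m, seed=None):
    """Grow a network by growth plus preferential attachment (illustrative sketch)."""
    rng = random.Random(seed)
    m0 = m + 1  # assumption: the seed network is a complete graph on m + 1 nodes
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Each node appears in `stubs` once per link end, so a uniform draw
    # from `stubs` selects node i with probability k_i / sum_j k_j, as in (5.1).
    stubs = [node for edge in edges for node in edge]
    for new in range(m0, n_nodes):
        targets = set()
        while len(targets) < m:  # redraw duplicates to avoid multi-links
            targets.add(rng.choice(stubs))
        for old in targets:
            edges.append((new, old))
            stubs.extend((new, old))
    return edges


def degrees(edges):
    """Count the degree of every node that appears in the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = barabasi_albert(n_nodes=2000, m=2, seed=1)
deg = degrees(edges)
print("links:", len(edges), "largest degree:", max(deg.values()))
```

Keeping one list entry per link end turns the degree-proportional draw into a uniform draw, which avoids recomputing the normalization in (5.1) at every step. Plotting a histogram of `deg.values()` on logarithmic axes should show a heavy tail of the kind seen in Figure 5.4, although a network of this size is too small for a clean estimate of the exponent.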


BOX 5.1

THE MATHEMATICAL DEFINITION OF THE BARABÁSI-ALBERT MODEL

The definition of the Barabási-Albert model leaves many mathematical details open:

• It does not specify the precise initial configuration of the first m0 nodes.

• It does not specify whether the m links assigned to a new node are added one by one, or simultaneously. This leads to potential mathematical conflicts: If the links are truly independent, they could connect to the same node i, resulting in multi-links.

Bollobás and collaborators [10] proposed the Linearized Chord Diagram (LCD) to resolve these problems, making the model more amenable to mathematical approaches.

According to the LCD, for m=1 we build a graph G1(t) as follows (Figure 5.5):

(1) Start with G1(0), corresponding to an empty graph with no nodes.

(2) Given G1(t-1) generate G1(t) by adding the node vt and a single link between vt and vi, where vi is chosen with probability

$$p = \begin{cases} \dfrac{k_i}{2t-1}, & \text{if } 1 \le i \le t-1, \\ \dfrac{1}{2t-1}, & \text{if } i = t. \end{cases} \qquad (5.2)$$

That is, we place a link from the new node vt to node vi with probability ki/(2t-1), where the new link already contributes to the degree of vt. Consequently node vt can also link to itself with probability 1/(2t-1), the second term in (5.2). Note also that the model permits self-loops and multi-links. Yet, their number becomes negligible in the t→∞ limit.

For m > 1 we build Gm(t) by adding m links from the new node vt one by one, in each step allowing the outward half of the newly added link to contribute to the degrees.

Figure 5.5 The Linearized Chord Diagram (LCD)

The construction of the LCD, the version of the Barabási-Albert model amenable to exact mathematical calculations [10]. The figure shows the first four steps of the network's evolution for m=1:

G1(0): We start with an empty network.

G1(1): The first node can only link to itself, forming a self-loop. Self-loops are allowed, and so are multi-links for m>1.

G1(2): Node 2 can either connect to node 1 with probability 2/3, or to itself with probability 1/3. According to (5.2), half of the link that the new node 2 brings along is already counted as present. Consequently node 1 has degree k1=2 and node 2 has degree k2=1, the normalization constant being 3.

G1(3): Let us assume that the first of the two G1(2) possibilities has materialized. When node 3 comes along, it again has three choices: It can connect to node 2 with probability 1/5, to node 1 with probability 3/5 and to itself with probability 1/5.
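The m=1 construction described in BOX 5.1 can be sketched as follows. This is an illustrative reading of rule (5.2), not code from [10]; the function name `lcd_m1` and the list-of-link-ends representation are assumptions made for the example.

```python
import random


def lcd_m1(t_max, rng):
    """Build G1(t_max) under rule (5.2); returns a list of (new_node, target) links."""
    links = []
    stubs = []  # one entry per link end, so len(stubs) equals the sum of all degrees
    for t in range(1, t_max + 1):
        # The outgoing half of the new link already counts toward node t,
        # which gives node t weight 1 and makes the total weight 2t - 1.
        stubs.append(t)
        target = rng.choice(stubs)
        stubs.append(target)
        links.append((t, target))
    return links


# Empirical check of the G1(2) step: node 2 should link to node 1
# in roughly two thirds of the realizations.
rng = random.Random(0)
trials = 20_000
hits = sum(lcd_m1(2, rng)[1][1] == 1 for _ in range(trials))
print(hits / trials)
```

Appending the new node's own link end before drawing the target is what allows self-loops, including the forced self-loop of the first node in G1(1).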


SECTION 5.3

DEGREE DYNAMICS

To understand the emergence of the scale-free property, we need to focus on the time evolution of the Barabási-Albert model. We begin by exploring the time-dependent degree of a single node [11]. In the model an existing node can increase its degree each time a new node enters the network. This new node will link to m of the N(t) nodes already present in the system. The probability that one of these links connects to node i is given by (5.1). Let us approximate the degree ki with a continuous real variable, representing its expectation value over many realizations of the growth process. The rate at which an existing node i acquires links as a result of new nodes connecting to it is

$$\frac{dk_i}{dt} = m\,\Pi(k_i) = m\,\frac{k_i}{\sum_{j=1}^{N-1} k_j}. \qquad (5.3)$$

The factor m reflects the fact that each new node arrives with m links. Hence, node i has m chances to be chosen. The sum in the denominator of (5.3) goes over all nodes in the network except the newly added node, thus

$$\sum_{j=1}^{N-1} k_j = 2mt - m. \qquad (5.4)$$

Therefore (5.3) becomes

$$\frac{dk_i}{dt} = \frac{k_i}{2t-1}. \qquad (5.5)$$

For large t the (-1) term can be neglected in the denominator, obtaining

$$\frac{dk_i}{k_i} = \frac{1}{2}\,\frac{dt}{t}. \qquad (5.6)$$

By integrating (5.6) from ti to t and using the fact that ki(ti)=m, meaning that node i joins the network at time ti with m links, we find ln(ki(t)/m) = (1/2) ln(t/ti), hence

$$k_i(t) = m\left(\frac{t}{t_i}\right)^{\beta}. \qquad (5.7)$$

We call β the dynamical exponent. It has the value

$$\beta = \frac{1}{2}.$$

Equation (5.7) offers a number of predictions:

• The degree of each node increases following a power law with the same dynamical exponent β = 1/2 (Figure 5.6a). Hence all nodes follow the same dynamical law.

• The growth in the degrees is sublinear (i.e. β < 1). This is a consequence of the growing nature of the Barabási-Albert model: Each new node has more nodes to link to than the previous node. Hence, with time the existing nodes compete for links with an increasing pool of other nodes.

• The earlier node i was added, the higher is its degree ki(t). Hence, hubs are large because they arrived earlier, a phenomenon called first-mover advantage in marketing and business.

• The rate at which node i acquires new links is given by the derivative of (5.7),

$$\frac{dk_i(t)}{dt} = \frac{m}{2}\,\frac{1}{\sqrt{t_i\,t}}, \qquad (5.8)$$

indicating that in each time step older nodes acquire more links (as they have smaller ti). Furthermore the rate at which a node acquires links decreases with time as t^(-1/2). Hence, fewer and fewer links go to a given node.

In summary, the Barabási-Albert model captures the fact that in real networks nodes arrive one after the other, offering a dynamical description of a network’s evolution. This generates a competition for links during which the older nodes have an advantage over the younger ones, eventually turning into hubs.

BOX 5.2

TIME IN NETWORKS

As we compare the predictions of the network models with real data, we have to decide how to measure time in networks. Real networks evolve over rather different time scales:

World Wide Web
The first webpage was created in 1991. Given its trillion documents, the WWW added a node each millisecond (10^-3 sec).

Cell
The cell is the result of 4 billion years of evolution. With roughly 20,000 genes in a human cell, on average the cellular network added a node every 200,000 years (~10^13 sec).

Given these enormous time-scale differences it is impossible to use real time to compare the dynamics of different networks. Therefore, in network theory we use event time, advancing our time-step by one each time there is a change in the network topology.

For example, in the Barabási-Albert model the addition of each new node corresponds to a new time step, hence t=N. In other models time is also advanced by the arrival of a new link or the deletion of a node. If needed, we can establish a direct mapping between event time and the physical time.
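The step from (5.6) to (5.7) in the degree dynamics above can be checked numerically. The sketch below integrates dk/dt = k/(2t) with a simple midpoint scheme and compares the result with the closed form m(t/ti)^(1/2). The function names and the parameter values (m = 2, ti = 10, t = 1000) are chosen only for illustration.

```python
import math


def degree_closed_form(t, t_i, m):
    """Prediction (5.7) with dynamical exponent beta = 1/2."""
    return m * math.sqrt(t / t_i)


def degree_numeric(t, t_i, m, steps=100_000):
    """Integrate dk/dt = k / (2t), i.e. (5.6), from t_i to t with a midpoint scheme."""
    h = (t - t_i) / steps
    k, s = float(m), float(t_i)
    for _ in range(steps):
        k_mid = k + 0.5 * h * k / (2 * s)      # half step using the slope at s
        k += h * k_mid / (2 * (s + 0.5 * h))   # full step using the midpoint slope
        s += h
    return k


m, t_i, t = 2, 10, 1000
print("closed form:", degree_closed_form(t, t_i, m))
print("numeric:    ", degree_numeric(t, t_i, m))
```

The two values should agree closely. Repeating the comparison for several values of ti illustrates the first-mover advantage: at a fixed t, a smaller ti gives a larger degree.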


Figure 5.6 Degree Dynamics

(a) The growth of the degrees of nodes added at time t = 1, 10, 10^2, 10^3, 10^4, 10^5 (continuous lines from left to right) in the Barabási-Albert model, shown for a single network as k versus t on logarithmic axes. Each node increases its degree following (5.7). Consequently at any moment the older nodes have higher degrees. The dotted line corresponds to the analytical prediction (5.7) with β = 1/2.

(b) Degree distribution of the network after adding N = 10^2, 10^4, and 10^6 nodes, i.e. at time t = 10^2, 10^4, and 10^6 (illustrated by arrows in (a)). The larger the network, the more obvious is the power-law nature of the degree distribution. Note that we used linear binning for pk to better observe the gradual emergence of the scale-free state.

ADVANCED TOPICS 6.A SOLVING THE FITNESS MODEL

To determine the degree distribution in the large N limit, we first calculate the number of nodes with fitness η and with degree greater than k, i.e. those that satisfy kη(t) > k. Using (6.3) we find that this condition implies

$$t_0 < t\left(\frac{m}{k}\right)^{C/\eta}.$$

Hence the cumulative degree distribution is

$$P(k) = P(k_i \le k) = 1 - P(k_i > k) \approx 1 - \frac{t}{m_0 + t}\int_0^{\eta_{\max}} \left(\frac{m}{k}\right)^{C/\eta} \rho(\eta)\,d\eta \approx 1 - \int_0^{\eta_{\max}} \left(\frac{m}{k}\right)^{C/\eta} \rho(\eta)\,d\eta, \qquad (6.43)$$

where the last equation is valid asymptotically, for large t. The probability density function for the degree distribution is

$$p(k) = P'(k) = \int_0^{\eta_{\max}} \frac{C}{\eta}\, m^{C/\eta}\, k^{-(C/\eta + 1)} \rho(\eta)\,d\eta,$$

recovering (6.6).

SECTION 6.9

BIBLIOGRAPHY

[1] A.-L. Barabási. Linked: The New Science of Networks. Perseus, Boston, 2001.
[2] G. Bianconi and A.-L. Barabási. Competition and multiscaling in evolving networks. Europhysics Letters, 54:436-442, 2001.
[3] A.-L. Barabási, R. Albert, H. Jeong, and G. Bianconi. Power-law distribution of the world wide web. Science, 287:2115, 2000.
[4] P.L. Krapivsky and S. Redner. Statistics of changes in lead node in connectivity-driven networks. Phys. Rev. Lett., 89:258703, 2002.
[5] C. Godreche and J.M. Luck. On leaders and condensates in a growing network. J. Stat. Mech., P07031, 2010.
[6] J.H. Fowler, C.T. Dawes, and N.A. Christakis. Model of Genetic Variation in Human Social Networks. PNAS, 106:1720-1724, 2009.
[7] M.O. Jackson. Genetic influences on social network characteristics. PNAS, 106:1687-1688, 2009.
[8] S.A. Burt. Genes and popularity: Evidence of an evocative gene environment correlation. Psychol. Sci., 19:112-113, 2008.
[9] J.S. Kong, N. Sarshar, and V.P. Roychowdhury. Experience versus talent shapes the structure of the Web. PNAS, 105:13724-9, 2008.
[10] A.-L. Barabási, C. Song, and D. Wang. Handful of papers dominates citation. Nature, 491:40, 2012.
[11] D. Wang, C. Song, and A.-L. Barabási. Quantifying Long term scientific impact. Science, 342:127-131, 2013.
[12] M. Medo, G. Cimini, and S. Gualdi. Temporal effects in the growth of networks. Phys. Rev. Lett., 107:238701, 2011.
[13] C. Venter et al. The sequence of the human genome. Science, 291:1304-1351, 2001.
[14] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509-512, 1999.
[15] G. Bianconi and A.-L. Barabási. Bose-Einstein condensation in complex networks. Phys. Rev. Lett., 86:5632-5635, 2001.
[16] C. Borgs, J. Chayes, C. Daskalakis, and S. Roch. First to market is not everything: analysis of preferential attachment with fitness. STOC’07, San Diego, California, 2007.
[17] S.N. Dorogovtsev, J.F.F. Mendes, and A.N. Samukhin. Structure of growing networks with preferential linking. Phys. Rev. Lett., 85:4633, 2000.
[18] C. Godreche, H. Grandclaude, and J.M. Luck. Finite-time fluctuations in the degree statistics of growing networks. J. of Stat. Phys., 137:1117-1146, 2009.
[19] Y.-H. Eom and S. Fortunato. Characterizing and Modeling Citation Dynamics. PLoS ONE, 6:e24926, 2011.
[20] A.-L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek. Evolution of the social network of scientific collaborations. Physica A, 311:590-614, 2002.
[21] R. Albert and A.-L. Barabási. Topology of evolving networks: local events and universality. Phys. Rev. Lett., 85:5234-5237, 2000.
[22] G. Ghoshal, L. Chi, and A.-L. Barabási. Uncovering the role of elementary processes in network evolution. Scientific Reports, 3:1-8, 2013.
[23] J.H. Schön, Ch. Kloc, R.C. Haddon, and B. Batlogg. A superconducting field-effect switch. Science, 288:656-8, 2000.
[24] D. Agin. Junk Science: An Overdue Indictment of Government, Industry, and Faith Groups That Twist Science for Their Own Gain. Macmillan, New York, 2007.
[25] S. Saavedra, F. Reed-Tsochas, and B. Uzzi. Asymmetric disassembly and robustness in declining networks. PNAS, 105:16466-16471, 2008.
[26] F. Chung and L. Lu. Coupling on-line and off-line analyses for random power-law graphs. Int. Math., 1:409-461, 2004.
[27] C. Cooper, A. Frieze, and J. Vera. Random deletion in a scale-free random graph process. Int. Math., 1:463-483, 2004.
[28] S.N. Dorogovtsev and J. Mendes. Scaling behavior of developing and decaying networks. Europhys. Lett., 52:33-39, 2000.
[29] C. Moore, G. Ghoshal, and M.E.J. Newman. Exact solutions for models of evolving networks with addition and deletion of nodes. Phys. Rev. E, 74:036121, 2006.
[30] H. Bauke, C. Moore, J. Rouquier, and D. Sherrington. Topological phase transition in a network model with preferential attachment and node removal. The European Physical Journal B, 83:519-524, 2011.
[31] M. Pascual and J. Dunne (eds). Ecological Networks: Linking Structure to Dynamics in Food Webs. Oxford Univ Press, Oxford, 2005.
[32] R. Sole and J. Bascompte. Self-Organization in Complex Ecosystems. Princeton University Press, Princeton, 2006.
[33] U.T. Srinivasan, J.A. Dunne, J. Harte, and N.D. Martinez. Response of complex food webs to realistic extinction sequences. Ecology, 88:671-682, 2007.
[34] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. ACM SIGCOMM Computer Communication Review, 29:251-262, 1999.
[35] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, and A. Tomkins. Graph structure in the web. Computer Networks, 33:309-320, 2000.
[36] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. ACM TKDD07, ACM Transactions on Knowledge Discovery from Data, 1:1, 2007.
[37] H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.-L. Barabási. The large-scale organization of metabolic networks. Nature, 407:651-655, 2000.
[38] S. Dorogovtsev and J. Mendes. Effect of the accelerating growth of communications networks on their structure. Phys. Rev. E, 63:025101(R), 2001.
[39] M.J. Gagen and J.S. Mattick. Accelerating, hyperaccelerating, and decelerating networks. Phys. Rev. E, 72:016123, 2005.
[40] C. Cooper and P. Prałat. Scale-free graphs of increasing degree. Random Structures & Algorithms, 38:396-421, 2011.
[41] N. Deo and A. Cami. Preferential deletion in dynamic models of web-like networks. Inf. Proc. Lett., 102:156-162, 2007.
[42] S.N. Dorogovtsev and J.F.F. Mendes. Evolution of networks with aging of sites. Phys. Rev. E, 62:1842, 2000.
[43] L.A.N. Amaral, A. Scala, M. Barthélémy, and H.E. Stanley. Classes of small-world networks. Proc. National Academy of Sciences USA, 97:11149, 2000.
[44] K. Klemm and V.M. Eguiluz. Highly clustered scale free networks. Phys. Rev. E, 65:036123, 2002.
[45] X. Zhu, R. Wang, and J.-Y. Zhu. The effect of aging on network structure. Phys. Rev. E, 68:056121, 2003.

7 ALBERT-LÁSZLÓ BARABÁSI

NETWORK SCIENCE DEGREE CORRELATION

ACKNOWLEDGEMENTS

MÁRTON PÓSFAI GABRIELE MUSELLA MAURO MARTINO NICOLE SAMAY

ROBERTA SINATRA SARAH MORRISON AMAL HUSSEINI PHILIPP HOEVEL

INDEX

1 Introduction
2 Assortativity and Disassortativity
3 Measuring Degree Correlations
4 Structural Cutoffs
5 Correlations in Real Networks
6 Generating Correlated Networks
7 The Impact of Degree Correlations
8 Summary
9 Homework
10 ADVANCED TOPICS 7.A Degree Correlation Coefficient
11 ADVANCED TOPICS 7.B Structural Cutoffs
12 Bibliography

Figure 7.0 (cover image) TheyRule.net by Josh On

Created by Josh On, a San Francisco-based designer, the interactive website TheyRule.net uses a network representation to illustrate the interlocking relationships of the U.S. economic class. By mapping out the shared board membership of the most powerful U.S. companies, it reveals the influential role of a small number of individuals who sit on multiple boards. Since its release in 2001, the project has been viewed interchangeably as art or science.

This book is licensed under a Creative Commons: CC BY-NC-SA 2.0. PDF V24 18.09.2014

SECTION 7.1

INTRODUCTION

Angelina Jolie and Brad Pitt, Ben Affleck and Jennifer Garner, Harrison Ford and Calista Flockhart, Michael Douglas and Catherine Zeta-Jones, Tom Cruise and Katie Holmes, Richard Gere and Cindy Crawford (Figure 7.1). An odd list, yet instantly recognizable to those immersed in the headline-driven world of celebrity couples. They are Hollywood stars who are or were married. Their weddings (and breakups) have drawn countless hours of media coverage and sold millions of gossip magazines. Thanks to them we take for granted that celebrities marry each other. We rarely pause to ask: Is this normal? In other words, what is the true chance that a celebrity marries another celebrity?

Figure 7.1 Hubs Dating Hubs

Celebrity couples, representing a highly visible proof that in social networks hubs tend to know, date and marry each other (Images from http://www.whosdatedwho.com).

Assuming that a celebrity could date anyone from a pool of about a hundred million (10^8) eligible individuals worldwide, the chance that their mate would be another celebrity from a generous list of 1,000 other celebrities is only 10^-5. Therefore, if dating were driven by random encounters, celebrities would almost never marry each other.

Even if we do not care about the dating habits of celebrities, we must pause and explore what this phenomenon tells us about the structure of the social network. Celebrities, political leaders, and CEOs of major corporations tend to know an exceptionally large number of individuals and are known by even more. They are hubs. Hence celebrity dating (Figure 7.1) and joint board memberships (Figure 7.0) are manifestations of an interesting property of social networks: hubs tend to have ties to other hubs.

As obvious as this may sound, this property is not present in all networks. Consider for example the protein-interaction network of yeast, shown in Figure 7.2. A quick inspection of the network reveals its scale-free nature: numerous one- and two-degree proteins coexist with a few highly connected hubs. These hubs, however, tend to avoid linking to each other. They link instead to many small-degree nodes, generating a hub-and-spoke pattern. This is particularly obvious for the two hubs highlighted in Figure 7.2: they almost exclusively interact with small-degree proteins.

Figure 7.2 Hubs Avoiding Hubs

The protein interaction map of yeast. Each node corresponds to a protein and two proteins are linked if there is experimental evidence that they can bind to each other in the cell. We highlighted the two largest hubs, with degrees k = 56 and k′ = 13. They both connect to many small degree nodes and avoid linking to each other. The network has N = 1,870 proteins and L = 2,277 links, representing one of the earliest protein interaction maps [1, 2]. Only the largest component is shown. Note that the protein interaction network of yeast in TABLE 4.1 represents a later map, hence it contains more nodes and links than the network shown in this figure. Node color corresponds to the essentiality of each protein: the removal of the red nodes kills the organism, hence they are called lethal or essential proteins. In contrast the organism can survive without one of its green nodes. After [3].

A brief calculation illustrates how unusual this pattern is. Let us assume that each node chooses randomly the nodes it connects to. Therefore the probability that nodes with degrees k and k′ link to each other is

$$p_{k,k'} = \frac{k k'}{2L}. \qquad (7.1)$$

Equation (7.1) tells us that hubs, by virtue of the many links they have, are much more likely to connect to each other than to small-degree nodes. Indeed, if k and k′ are large, so is pk,k′. Consequently, the likelihood that hubs with degrees k = 56 and k′ = 13 have a direct link between them is pk,k′ = 0.16, which is 400 times larger than p1,2 = 0.0004, the likelihood that a degree-two node links to a degree-one node. Yet, there are no direct links between the hubs in Figure 7.2, but we observe numerous direct links between small-degree nodes.

Instead of linking to each other, the hubs highlighted in Figure 7.2 almost exclusively connect to degree-one nodes. By itself this is not unexpected: We expect that a hub with degree k = 56 should link to N1 p1,56 ≈ 12 nodes with k = 1, where N1 is the number of degree-one nodes. The problem is that this hub connects to 46 degree-one neighbors, i.e. four times the expected number.

In summary, while in social networks hubs tend to “date” each other, in the protein interaction network the opposite is true: The hubs avoid linking to other hubs, connecting instead to many small-degree nodes. While it is dangerous to derive generic principles from two examples, the purpose of this chapter is to show that these patterns are manifestations of a general property of real networks: they exhibit a phenomenon called degree correlations. We discuss how to measure degree correlations and explore their impact on the network topology.
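The arithmetic behind the two probabilities quoted above follows directly from (7.1) with L = 2,277. The short sketch below reproduces it; the function name `link_probability` is chosen for illustration.

```python
def link_probability(k, k_prime, n_links):
    """Probability (7.1) that nodes of degree k and k' are linked under random wiring."""
    return k * k_prime / (2 * n_links)


L = 2277  # number of links in the yeast protein interaction map of Figure 7.2
p_hubs = link_probability(56, 13, L)
p_small = link_probability(1, 2, L)
print(f"p(56,13) = {p_hubs:.2f}")
print(f"p(1,2)   = {p_small:.4f}")
print(f"ratio    = {p_hubs / p_small:.0f}")
```

The unrounded ratio is 56 × 13 / 2 = 364. The factor of 400 quoted in the text follows from dividing the rounded values 0.16 and 0.0004.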


SECTION 7.2

ASSORTATIVITY AND DISASSORTATIVITY

By virtue of the many links they have, hubs are expected to link to each other. In some networks they do, in others they don’t. This is illustrated in Figure 7.3, which shows three networks with identical degree sequences but different topologies:

• Neutral Network
Figure 7.3b shows a network whose wiring is random. We call this network neutral, meaning that the number of links between the hubs coincides with what we expect by chance, as predicted by (7.1).

• Assortative Network
The network of Figure 7.3a has precisely the same degree sequence as the one in Figure 7.3b. Yet, the hubs in Figure 7.3a tend to link to each other and avoid linking to small-degree nodes. At the same time the small-degree nodes tend to connect to other small-degree nodes. Networks displaying such trends are assortative. An extreme manifestation of this pattern is a perfectly assortative network, in which each degree-k node connects only to other degree-k nodes (Figure 7.4).

• Disassortative Network
In Figure 7.3c the hubs avoid each other, linking instead to small-degree nodes. Consequently the network displays a hub-and-spoke character, making it disassortative.

In general a network displays degree correlations if the number of links between the high and low-degree nodes is systematically different from what is expected by chance. In other words, the number of links between nodes of degrees k and k′ deviates from (7.1).

Figure 7.3 Degree Correlation Matrix

(a,b,c) Three networks that have precisely the same degree distribution (Poisson pk), but display different degree correlations: assortative (a), neutral (b) and disassortative (c). We show only the largest component and we highlight in orange the five highest degree nodes and the direct links between them.

(d,e,f) The degree correlation matrix eij for an assortative (d), a neutral (e) and a disassortative network (f) with Poisson degree distribution, N=1,000, and 〈k〉=10. The colors correspond to the probability that a randomly selected link connects nodes with degrees k1 and k2 (axes: k1 and k2, each from 0 to 20; color scale from 0 to 0.02).

(a,d) Assortative Networks
For assortative networks eij is high along the main diagonal. This indicates that nodes of comparable degree tend to link to each other: small-degree nodes to small-degree nodes and hubs to hubs. Indeed, the network in (a) has numerous links between its hubs as well as between its small degree nodes.

(b,e) Neutral Networks
In neutral networks nodes link to each other randomly. Hence the density of links is symmetric around the average degree, indicating the lack of correlations in the linking pattern.

(c,f) Disassortative Networks
In disassortative networks eij is higher along the secondary diagonal, indicating that hubs tend to connect to small-degree nodes and small-degree nodes to hubs. Consequently these networks have a hub and spoke character, as seen in (c).

The information about potential degree correlations is captured by the degree correlation matrix, eij, which is the probability of finding nodes with degrees i and j at the two ends of a randomly selected link. As eij is a probability, it is normalized, i.e.

$$\sum_{i,j} e_{ij} = 1. \qquad (7.2)$$

In (5.27) we derived the probability qk that there is a degree-k node at the end of the randomly selected link, obtaining

$$q_k = \frac{k p_k}{\langle k \rangle}. \qquad (7.3)$$

We can connect qk to eij via

$$\sum_j e_{ij} = q_i. \qquad (7.4)$$

In neutral networks, we expect

$$e_{ij} = q_i q_j. \qquad (7.5)$$

A network displays degree correlations if eij deviates from the random expectation (7.5). Note that (7.2) - (7.5) are valid for networks with an arbitrary degree distribution, hence they apply to both random and scale-free networks.

Figure 7.4 Perfect Assortativity

In a perfectly assortative network each node links only to nodes with the same degree. Hence ejk = δjkqk, where δjk is the Kronecker delta. In this case all non-diagonal elements of the ejk matrix are zero. The figure shows such a perfectly assortative network, consisting of complete k-cliques.

Given that eij encodes all information about potential degree correlations, we start with its visual inspection. Figures 7.3d,e,f show eij for an assortative, a neutral and a disassortative network. In a neutral network small and high-degree nodes connect to each other randomly, hence eij lacks any trend (Figure 7.3e). In contrast, assortative networks show high correlations along the main diagonal, indicating that nodes predominantly connect to nodes with comparable degree. Therefore low-degree nodes tend to link to other low-degree nodes and hubs to hubs (Figure 7.3d). In disassortative networks eij displays the opposite trend: it has high correlations along the secondary diagonal. Therefore high-degree nodes tend to connect to low-degree nodes (Figure 7.3f).

In summary, information about degree correlations is carried by the degree correlation matrix eij. Yet, the study of degree correlations through the inspection of eij has numerous disadvantages:

• It is difficult to extract information from the visual inspection of a matrix.

• As we cannot infer the magnitude of the correlations from such an inspection, it is difficult to compare networks with different correlations.

• ejk contains approximately kmax^2/2 independent variables, representing a huge amount of information that is difficult to model in analytical calculations and simulations.

We therefore need to develop a more compact way to detect degree correlations. This is the goal of the subsequent sections.

7

ASSORTATIVITY AND DISASSORTATIVITY
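The quantities defined in (7.2) - (7.5) can be computed directly from a list of links. The sketch below is only an illustration under stated assumptions: the function name `degree_correlation_matrix` and the four-link toy network are hypothetical, the network is assumed to be undirected and simple, and each link is counted once in each direction so that eij is symmetric.

```python
from collections import Counter, defaultdict


def degree_correlation_matrix(edges):
    """Return (e, q, degree) for an undirected simple network.

    e[(i, j)] estimates the probability that a randomly selected link,
    read in a random direction, runs from a degree-i to a degree-j node.
    q[i] is the row sum of e, as in (7.4).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Each link contributes once per direction, so e is symmetric
    # and sums to one over all (i, j), as required by (7.2).
    weight = 1.0 / (2 * len(edges))
    e = defaultdict(float)
    for u, v in edges:
        e[(degree[u], degree[v])] += weight
        e[(degree[v], degree[u])] += weight

    q = defaultdict(float)
    for (i, _), value in e.items():
        q[i] += value
    return dict(e), dict(q), dict(degree)


# Hypothetical toy network: a hub with three neighbors, two of which are linked.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
e, q, degree = degree_correlation_matrix(edges)

# Independent check of q_k through (7.3): q_k = k p_k / <k>.
n_nodes = len(degree)
mean_degree = sum(degree.values()) / n_nodes
degree_counts = Counter(degree.values())
q_from_pk = {k: k * (count / n_nodes) / mean_degree
             for k, count in degree_counts.items()}

# Largest deviation of e_ij from the neutral expectation (7.5), q_i q_j.
max_deviation = max(abs(e.get((i, j), 0.0) - q[i] * q[j]) for i in q for j in q)

print("sum of e_ij:", sum(e.values()))
print("q from e_ij:", q)
print("q from p_k :", q_from_pk)
print("max |e_ij - q_i q_j|:", max_deviation)
```

For a network this small the deviation from (7.5) carries no statistical meaning; the example only shows how the bookkeeping works.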

SECTION 7.3

MEASURING DEGREE CORRELATIONS

While eij contains the complete information about the degree correlations characterizing a particular network, it is difficult to interpret its content. The goal of this section is to introduce the degree correlation function, which offers a simpler way to quantify degree correlations.

Figure 7.5
Nearest Neighbor Degree: knn
To determine the degree correlation function knn(ki) we calculate the average degree of a node's neighbors. The figure illustrates the calculation of knn(ki) for node i. As the degree of the node i is ki = 4, by averaging the degree of its neighbors j1, j2, j3 and j4, we obtain knn(4) = (4 + 3 + 3 + 1)/4 = 2.75.

Degree correlations capture the relationship between the degrees of nodes that link to each other. One way to quantify their magnitude is to measure for each node i the average degree of its neighbors (Figure 7.5)

$$k_{nn}(k_i) = \frac{1}{k_i} \sum_{j=1}^{N} A_{ij} k_j. \tag{7.6}$$

The degree correlation function calculates (7.6) for all nodes with degree k [4, 5]

$$k_{nn}(k) = \sum_{k'} k' P(k'|k), \tag{7.7}$$

where P(k'|k) is the conditional probability that following a link of a degree-k node we reach a degree-k' node. Therefore knn(k) is the average degree of the neighbors of all degree-k nodes. To quantify degree correlations we inspect the dependence of knn(k) on k.

• Neutral Network
For a neutral network (7.3)-(7.5) predict

$$P(k'|k) = \frac{e_{kk'}}{\sum_{k'} e_{kk'}} = \frac{e_{kk'}}{q_k} = \frac{q_{k'} q_k}{q_k} = q_{k'}. \tag{7.8}$$

This allows us to express knn(k) as

$$k_{nn}(k) = \sum_{k'} k' q_{k'} = \sum_{k'} k' \frac{k' p_{k'}}{\langle k \rangle} = \frac{\langle k^2 \rangle}{\langle k \rangle}. \tag{7.9}$$

Therefore, in a neutral network the average degree of a node's neighbors is independent of the node's degree k and depends only on the global network characteristics ⟨k⟩ and ⟨k²⟩. So plotting knn(k) as a function of k should result in a horizontal line at ⟨k²⟩/⟨k⟩, as observed for the power grid (Figure 7.6b). Equation (7.9) also captures an intriguing property of real networks: our friends are more popular than we are, a phenomenon called the friendship paradox (BOX 7.1).

• Assortative Network
In assortative networks hubs tend to connect to other hubs, hence the higher the degree k of a node, the higher the average degree of its nearest neighbors. Consequently for assortative networks knn(k) increases with k, as observed for scientific collaboration networks (Figure 7.6a).

• Disassortative Network
In disassortative networks hubs prefer to link to low-degree nodes. Consequently knn(k) decreases with k, as observed for the metabolic network (Figure 7.6c).
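To make the definitions (7.6) and (7.7) concrete, the following sketch computes the average neighbor degree of every node and then averages it over all nodes of the same degree. It is a minimal illustration, not the procedure used to produce Figure 7.6: the function name and the example network are hypothetical, and the network is assumed to be undirected and simple. The example is built so that node i has four neighbors with degrees 4, 3, 3 and 1, as in Figure 7.5; the extra leaf nodes a to g exist only to give the neighbors those degrees.

```python
from collections import defaultdict


def degree_correlation_function(edges):
    """Return (knn_node, knn_k) for an undirected simple network.

    knn_node[i] implements (7.6): the average degree of node i's neighbors.
    knn_k[k] averages knn_node over all nodes with degree k, as in (7.7).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    degree = {node: len(adjacent) for node, adjacent in neighbors.items()}
    knn_node = {
        node: sum(degree[other] for other in adjacent) / degree[node]
        for node, adjacent in neighbors.items()
    }

    values_by_degree = defaultdict(list)
    for node, value in knn_node.items():
        values_by_degree[degree[node]].append(value)
    knn_k = {k: sum(values) / len(values)
             for k, values in sorted(values_by_degree.items())}
    return knn_node, knn_k


# Node "i" and its neighbors j1..j4 mimic Figure 7.5 (neighbor degrees 4, 3, 3, 1).
edges = [
    ("i", "j1"), ("i", "j2"), ("i", "j3"), ("i", "j4"),
    ("j1", "a"), ("j1", "b"), ("j1", "c"),
    ("j2", "d"), ("j2", "e"),
    ("j3", "f"), ("j3", "g"),
]
knn_node, knn_k = degree_correlation_function(edges)

print(knn_node["i"])  # (4 + 3 + 3 + 1) / 4 = 2.75
print(knn_k)
```

Note that knn(4) returned in `knn_k` averages over both degree-4 nodes of this toy network (i and j1), so it differs from the single-node value 2.75.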

NEUTRAL

The behavior observed in Figure 7.6 prompts us to approximate the degree correlation function with [4]

knn (k) = ak µ .

knn(k)

(7.10)

Random prediction ~k-0.04

If the scaling (7.10) holds, then the nature of degree correlations is determined by the sign of the correlation exponent μ:

100

102

METABOLIC NETWORK

(c)

• Assortative Networks: μ > 0

k

101

103

A fit to knn(k) for the science collaboration network provides μ = 0.37 DISASSORTATIVE

± 0.11 (Figure 7.6a).

• Neutral Networks: μ = 0 According to (7.9) knn(k) is independent of k. Indeed, for the power grid

we obtain μ = 0.04 ± 0.05, which is indistinguishable from zero (Figure 7.6b). • Disassortative Networks: μ < 0 For the metabolic network we obtain μ = − 0.76 ± 0.04 (Figure 7.6c).
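One simple way to estimate the correlation exponent μ of (7.10) is an ordinary least-squares fit of log knn(k) against log k. The sketch below is an assumption-laden illustration rather than the fitting procedure behind the values quoted above: it fits all points with equal weight, whereas a careful analysis would choose a fitting interval and bin the data. The function name and the synthetic input are hypothetical.

```python
import math


def fit_correlation_exponent(knn_k):
    """Least-squares fit of knn(k) = a * k**mu on a log-log scale.

    knn_k maps each degree k to the measured knn(k). Returns (a, mu).
    """
    points = [(math.log(k), math.log(value))
              for k, value in knn_k.items() if k > 0 and value > 0]
    if len(points) < 2:
        raise ValueError("need at least two (k, knn) points to fit a slope")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    mu = s_xy / s_xx                        # slope on the log-log plot
    a = math.exp(mean_y - mu * mean_x)      # prefactor from the intercept
    return a, mu


# Synthetic input that follows (7.10) exactly, with a = 5 and mu = 0.37.
synthetic = {k: 5.0 * k ** 0.37 for k in range(1, 200)}
a, mu = fit_correlation_exponent(synthetic)
print(a, mu)
```

On real data the sign of the fitted μ, together with its uncertainty, is what distinguishes the three cases listed above.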

In summary, the degree correlation function helps us capture the presence or absence of correlations in real networks. The knn(k) function also plays an important role in analytical calculations, allowing us to predict the impact of degree correlations on various network characteristics (SECTION 7.6). Yet, it is often convenient to use a single number to capture the magnitude of correlations present in a network. This can be achieved either through the correlation exponent μ defined in (7.10), or using the degree correlation coefficient introduced in BOX 7.2.

Figure 7.6
Degree Correlation Function
The degree correlation function knn(k) for three real networks. The panels show knn(k) on a log-log plot to test the validity of the scaling law (7.10).

(a) Collaboration Network
The increasing knn(k) with k indicates that the network is assortative (fit: knn(k) ~ k^0.37).

(b) Power Grid
The horizontal knn(k) indicates the lack of degree correlations, in line with (7.9) for neutral networks (fit: knn(k) ~ k^−0.04).

(c) Metabolic Network
The decreasing knn(k) documents the network's disassortative nature (fit: knn(k) ~ k^−0.76).

On each panel the horizontal line corresponds to the prediction (7.9) and the green dashed line is a fit to (7.10).

BOX 7.1

FRIENDSHIP PARADOX

The friendship paradox makes a surprising statement: On average my friends are more popular than I am [6,7]. This claim is rooted in (7.9), telling us that the average degree of a node's neighbors is not simply ⟨k⟩, but depends on ⟨k²⟩ as well.

Consider a random network, for which ⟨k²⟩ = ⟨k⟩(1 + ⟨k⟩). According to (7.9) knn(k) = 1 + ⟨k⟩. Therefore the average degree of a node's neighbors is always higher than the average degree of a randomly chosen node, which is ⟨k⟩.

The gap between ⟨k⟩ and our friends' degree can be particularly large in scale-free networks, for which ⟨k²⟩/⟨k⟩ significantly exceeds ⟨k⟩ (Figure 4.8). Consider for example the actor network, for which ⟨k²⟩/⟨k⟩ = 565 (Table 4.1). In this network the average degree of a node's friends is hundreds of times the degree of the node itself. The friendship paradox has a simple origin: We are more likely to be friends with hubs than with small-degree nodes, simply because hubs have more friends than the small nodes.

BOX 7.2

DEGREE CORRELATION COEFFICIENT

If we wish to characterize degree correlations using a single number, we can use either μ or the degree correlation coefficient. Proposed by Mark Newman [8,9], the degree correlation coefficient is defined as

$$r = \sum_{jk} \frac{jk \, (e_{jk} - q_j q_k)}{\sigma^2} \tag{7.11}$$

with

$$\sigma^2 = \sum_{k} k^2 q_k - \left[ \sum_{k} k q_k \right]^2. \tag{7.12}$$

Hence r is the Pearson correlation coefficient between the degrees found at the two ends of the same link. It takes values in the range −1 ≤ r ≤ 1: for r > 0 the network is assortative, for r = 0 the network is neutral and for r < 0 the network is disassortative. For example, for the scientific collaboration network we obtain r = 0.13, in line with its assortative nature; for the protein interaction network r = −0.04, supporting its disassortative nature; and for the power grid we have r = 0.

The assumption behind the degree correlation coefficient is that knn(k) depends linearly on k with slope r. In contrast the correlation exponent μ assumes that knn(k) follows the power law (7.10). Naturally, both cannot be valid simultaneously. The analytical models of SECTION 7.7 offer some guidance, supporting the validity of (7.10). As we show in ADVANCED TOPICS 7.A, in general r correlates with μ.
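The definition (7.11) - (7.12) translates into a few lines of code. The sketch below is illustrative only: the function name is hypothetical, the input is assumed to be an undirected simple network given as a list of links, and the two test networks (a star and two disjoint cliques) are chosen because their degree correlations are extreme. It uses the identity that the sum of jk qj qk over all j and k equals the square of the sum of k qk.

```python
from collections import Counter, defaultdict
from itertools import combinations


def degree_correlation_coefficient(edges):
    """Degree correlation coefficient r of (7.11) for an undirected simple network."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Symmetric degree correlation matrix e_jk, normalized as in (7.2).
    weight = 1.0 / (2 * len(edges))
    e = defaultdict(float)
    for u, v in edges:
        e[(degree[u], degree[v])] += weight
        e[(degree[v], degree[u])] += weight

    # q_k as the row sum of e_jk, see (7.4).
    q = defaultdict(float)
    for (j, _), value in e.items():
        q[j] += value

    mean_q = sum(k * q_k for k, q_k in q.items())
    sigma2 = sum(k * k * q_k for k, q_k in q.items()) - mean_q ** 2   # (7.12)
    if sigma2 < 1e-12:
        raise ValueError("all link ends have the same degree; r is undefined")

    # sum_jk jk (e_jk - q_j q_k) = sum_jk jk e_jk - (sum_k k q_k)^2
    numerator = sum(j * k * value for (j, k), value in e.items()) - mean_q ** 2
    return numerator / sigma2


# A star: the hub links only to degree-1 nodes (maximally disassortative).
star = [(0, leaf) for leaf in range(1, 6)]

# A triangle and a 4-clique: every link joins nodes of equal degree
# (perfectly assortative, in the spirit of Figure 7.4).
cliques = list(combinations(range(3), 2)) + list(combinations(range(3, 7), 2))

r_star = degree_correlation_coefficient(star)
r_cliques = degree_correlation_coefficient(cliques)
print(r_star, r_cliques)
```

The star yields r = −1 and the disjoint cliques yield r = 1 up to floating-point rounding, matching the sign convention stated in BOX 7.2.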


SECTION 7.4

STRUCTURAL CUTOFFS

Throughout this book we assumed that networks are simple, meaning that there is at most one link between two nodes (Figure 2.17). For example, in the email network we place a single link between two individuals that are in email contact, despite the fact that they may have exchanged multiple messages. Similarly, in the actor network we connect two actors with a single link if they acted in the same movie, independent of the number of joint movies. All datasets discussed in Table 4.1 are simple networks.

In simple networks there is a puzzling conflict between the scale-free property and degree correlations [10, 11]. Consider for example the scale-free network of Figure 7.7a, whose two largest hubs have degrees k = 55 and k' = 46. In a network with degree correlations ekk' the expected number of links between k and k' is

$$E_{kk'} = e_{kk'} \langle k \rangle N. \tag{7.13}$$

For a neutral network ekk' is given by (7.5), which, using (7.3), predicts

$$E_{kk'} = \frac{k p_k \, k' p_{k'}}{\langle k \rangle} N = \frac{\frac{55}{300} \cdot \frac{46}{300}}{3} \cdot 300 = 2.8. \tag{7.14}$$

Therefore, given the size of these two hubs, they should be connected to each other by two to three links to comply with the network's neutral nature. Yet, in a simple network we can have only one link between them, causing a conflict between degree correlations and the scale-free property. The goal of this section is to understand the origin and the consequences of this conflict.

Figure 7.7
Structural Disassortativity
(a) A scale-free network with N = 300, L = 450, and γ = 2.2, generated by the configuration model (Figure 4.15). By forbidding self-loops and multi-links, we made the network simple. We highlight the two largest nodes in the network. As (7.14) predicts, to maintain the network's neutral nature, we need approximately three links between these two nodes. The fact that we do not allow multi-links (simple network representation) makes the network disassortative, a phenomenon called structural disassortativity.
(b) To illustrate the origins of structural correlations we start from a fixed degree sequence, shown as individual stubs on the left. Next we randomly connect the stubs (configuration model). In this case the expected number of links between the nodes with degree 8 and 7 is 8 × 7/28 = 2. Yet, if we do not allow multi-links, there can only be one link between these two nodes, making the network structurally disassortative.

For small k and k' (7.14) predicts that Ekk' is also small, i.e. we expect fewer than one link between the two nodes. Only for nodes whose degree exceeds some threshold ks does (7.14) predict multiple links. As we show in ADVANCED TOPICS 7.B, ks, called the structural cutoff, scales as

$$k_s(N) \sim (\langle k \rangle N)^{1/2}. \tag{7.15}$$
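The arithmetic behind (7.14) and (7.15) can be checked numerically. The sketch below is a rough illustration under stated assumptions: the function names are hypothetical, each of the two degrees is assumed to occur exactly once (pk = pk' = 1/N), so that (7.14) reduces to k k'/(⟨k⟩N) = k k'/(2L), and the unknown prefactor of the scaling relation (7.15) is set to one.

```python
import math


def expected_links_between(k, k_prime, n_links):
    """Expected number of links between two specific nodes of degree k and k'
    in a neutral network: (7.14) with p_k = p_k' = 1/N, i.e. k k' / (2L)."""
    return k * k_prime / (2 * n_links)


def structural_cutoff(n_nodes, mean_degree):
    """Structural cutoff of (7.15), assuming a prefactor of one."""
    return math.sqrt(mean_degree * n_nodes)


def natural_cutoff_exponent(gamma):
    """Exponent of N in the natural cutoff k_max ~ N^(1/(gamma-1))."""
    return 1.0 / (gamma - 1.0)


# Figure 7.7a: N = 300, L = 450, hubs with k = 55 and k' = 46.
n_nodes, n_links = 300, 450
mean_degree = 2 * n_links / n_nodes

print(expected_links_between(55, 46, n_links))   # about 2.8, as in (7.14)
print(expected_links_between(8, 7, 14))          # 2.0, as in Figure 7.7b
print(structural_cutoff(n_nodes, mean_degree))   # 30.0

# For gamma = 2.2 the natural cutoff grows faster than N^(1/2).
print(natural_cutoff_exponent(2.2) > 0.5)
```

Under these assumptions both hubs of Figure 7.7a have degrees above the estimated cutoff of 30, which is why (7.14) asks for more than one link between them.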

In other words, for nodes whose degrees exceed (7.15) we have Ekk' > 1, a conflict that, as we show below, gives rise to degree correlations.

To understand the consequences of the structural cutoff we must first ask if a network has nodes whose degrees exceed (7.15). For this we compare the structural cutoff, ks, with the natural cutoff, kmax, which is the expected largest degree in a network. According to (4.18), for a scale-free network $k_{max} \sim N^{\frac{1}{\gamma - 1}}$. Comparing kmax to ks allows us to distinguish two regimes:

• No Structural Cutoff
For random networks and scale-free networks with γ ≥ 3 the exponent of kmax is at most 1/2, hence kmax does not grow faster than ks. In other words the degree at which the structural cutoff turns on exceeds the degree of the biggest hub. Consequently we have no nodes for which Ekk' > 1. For these networks we do not have a conflict between degree correlations and the simple network requirement.

• Structural Disassortativity
For scale-free networks with γ < 3 we have 1/(γ−1) > 1/2, i.e. ks can be smaller than kmax. Consequently for nodes whose degrees fall between ks and kmax we can have Ekk' > 1, which a simple network cannot accommodate. In other words the network has fewer links between its hubs than (7.14) would predict. These networks will therefore become disassortative, a phenomenon we call structural disassortativity. This is illustrated in Figures 7.8a,b, which show a simple scale-free network generated by the configuration model. The network shows disassortative scaling, despite the fact that we did not impose degree correlations during its construction.

We have two avenues to generate networks that are free of structural disassortativity:

(i) We can relax the simple network requirement, allowing multiple links between the nodes. The conflict disappears and the network will be neutral (Figures 7.8c,d).

(ii) If we insist on having a simple scale-free network that is neutral or assortative, we must remove all hubs with degrees larger than ks. This is illustrated in Figures 7.8e,f: a network that lacks nodes with k ≥ 100 is neutral.

Finally, how can we decide whether the correlations observed in a particular network are a consequence of structural disassortativity, or are generated by some unknown process that leads to degree correlations? Degree-preserving randomization (Figure 4.17) helps us distinguish these two possibilities:

(i) Degree Preserving Randomization with Simple Links (R-S)
We apply degree-preserving randomization to the original network and at each step we make sure that we do not permit more than one link between a pair of nodes. On the algorithmic side this means that each rewiring that generates multi-links is discarded. If the real knn(k) and the randomized $k_{nn}^{\text{R-S}}(k)$ are indistinguishable, then the correlations observed in a real system are all structural, fully explained by the degree distribution. If the randomized $k_{nn}^{\text{R-S}}(k)$ does not show degree correlations while knn(k) does, there is some unknown process that generates the observed degree correlations.

(ii) Degree Preserving Randomization with Multiple Links (R-M)
For a self-consistency check it is sometimes useful to perform degree-preserving randomization that allows for multiple links between the nodes. On the algorithmic side this means that we allow each random rewiring, even if it leads to multi-links. This process eliminates all degree correlations.

We performed the randomizations discussed above for three real networks. As Figure 7.9a shows, the assortative nature of the scientific collaboration network disappears under both randomizations. This indicates that the assortative correlations of the collaboration network are not linked to its scale-free nature. In contrast, for the metabolic network the observed disassortativity remains unchanged under R-S (Figure 7.9c). Consequently the disassortativity of the metabolic network is structural, being induced by its degree distribution.

In summary, the scale-free property can induce disassortativity in simple networks. Indeed, in neutral or assortative networks we expect multiple links between the hubs. If multiple links are forbidden (simple graph), the network will display disassortative tendencies. This conflict vanishes for scale-free networks with γ ≥ 3 and for random networks. It also vanishes if we allow multiple links between the nodes.

Figure 7.8
Natural and Structural Cutoffs
The figure illustrates the tension between the scale-free property and degree correlations. We show the degree distribution pk (left panels) and the degree correlation function knn(k) (right panels) of a scale-free network with N = 10,000 and γ = 2.5, generated by the configuration model (Figure 4.15).
(a,b) If we generate a scale-free network with the power-law degree distribution shown in (a), and we forbid self-loops and multi-links, the network displays structural disassortativity, as indicated by knn(k) in (b). In this case, we lack a sufficient number of links between the high-degree nodes to maintain the neutral nature of the network, hence for high k the knn(k) function must decay.
(c,d) We can eliminate structural disassortativity by relaxing the simple network requirement, i.e. allowing multiple links between two nodes. As shown in (c,d), in this case we obtain a neutral scale-free network.
(e,f) If we impose an upper cutoff by removing all nodes with k ≥ ks ≃ 100, as predicted by (7.15), the network becomes neutral, as seen in (f).

Figure 7.9
Randomization and Degree Correlations
To uncover the origin of the observed degree correlations, we must compare knn(k) (grey symbols) with $k_{nn}^{\text{R-S}}(k)$ and $k_{nn}^{\text{R-M}}(k)$ obtained after degree-preserving randomization. Two degree-preserving randomizations are informative in this context:
Randomization with Simple Links (R-S): At each step of the randomization process we check that we do not have more than one link between any pair of nodes.
Randomization with Multiple Links (R-M): We allow multi-links during the randomization process.
We performed these two randomizations for the networks of Figure 7.6. The R-M procedure always generates a neutral network, consequently $k_{nn}^{\text{R-M}}(k)$ is always horizontal. The true insight is obtained when we compare knn(k) with $k_{nn}^{\text{R-S}}(k)$, helping us to decide if the observed correlations are structural:
(a) Scientific Collaboration Network
The increasing knn(k) differs from the horizontal $k_{nn}^{\text{R-S}}(k)$, indicating that the network's assortativity is not structural. Consequently the assortativity is generated by some process that governs the network's evolution. This is not unexpected: structural effects can generate only disassortativity, not assortativity.
(b) Power Grid
The horizontal knn(k), $k_{nn}^{\text{R-S}}(k)$ and $k_{nn}^{\text{R-M}}(k)$ all support the lack of degree correlations (neutral network).
(c) Metabolic Network
As both knn(k) and $k_{nn}^{\text{R-S}}(k)$ decrease, we conclude that the network's disassortativity is induced by its scale-free property. Hence the observed degree correlations are structural.
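The two randomization procedures described in this section, R-S and R-M, differ only in whether a rewiring that creates a multi-link is discarded. The sketch below illustrates this with a link-swap scheme. It is a minimal sketch under stated assumptions and not necessarily the implementation behind Figure 7.9: the function names, the `allow_multilinks` flag and the example network are hypothetical, the input is assumed to be a simple undirected network, and rewirings that would create a self-loop are skipped in both variants.

```python
import random
from collections import Counter


def degree_preserving_randomization(edges, n_swaps, allow_multilinks=False, seed=None):
    """Rewire link pairs (a, b), (c, d) -> (a, d), (c, b), keeping all degrees fixed.

    allow_multilinks=False mimics R-S: swaps creating a multi-link are discarded.
    allow_multilinks=True mimics R-M: such swaps are accepted.
    """
    rng = random.Random(seed)
    edges = [tuple(link) for link in edges]
    multiplicity = Counter(frozenset(link) for link in edges)

    accepted, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard for networks where few swaps are possible
    while accepted < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # choose one of the two possible ways to reconnect
        if a == d or c == b:
            continue  # would create a self-loop (skipped here by assumption)
        new_1, new_2 = frozenset((a, d)), frozenset((c, b))
        if not allow_multilinks and (multiplicity[new_1] > 0 or multiplicity[new_2] > 0):
            continue  # R-S: discard rewirings that generate multi-links
        multiplicity[frozenset((a, b))] -= 1
        multiplicity[frozenset((c, d))] -= 1
        multiplicity[new_1] += 1
        multiplicity[new_2] += 1
        edges[i], edges[j] = (a, d), (c, b)
        accepted += 1
    return edges


def degree_sequence(edges):
    """Degree of every node, counting each link end once."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


# Hypothetical example: a ring of 20 nodes with three chords.
ring = [(i, (i + 1) % 20) for i in range(20)] + [(0, 10), (5, 15), (2, 7)]
rs = degree_preserving_randomization(ring, 200, allow_multilinks=False, seed=1)
rm = degree_preserving_randomization(ring, 200, allow_multilinks=True, seed=1)

print(degree_sequence(rs) == degree_sequence(ring))   # degrees are preserved
print(len({frozenset(link) for link in rs}) == len(rs))  # R-S output stays simple
```

Comparing knn(k) of the original network with knn(k) measured on `rs` and on `rm` then follows the logic of Figure 7.9.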

SECTION 7.5

CORRELATIONS IN REAL NETWORKS

To understand the prevalence of degree correlations we need to inspect the correlations characterizing real networks. In Figure 7.10 we show the knn(k) function for the ten reference networks, observing several patterns:

• Power Grid
For the power grid knn(k) is flat and indistinguishable from its randomized version, indicating a lack of degree correlations (Figure 7.10a). Hence the power grid is neutral.

• Internet
For small degrees (k ≤ 30) knn(k) shows a clear assortative trend, an effect that levels off for high degrees (Figure 7.10b). The degree correlations vanish in the randomized version of the Internet map. Hence the Internet is assortative, but structural cutoffs eliminate the effect for high k.

• Social Networks
The three networks capturing social interactions, the mobile phone network, the science collaboration network and the actor network, all have an increasing knn(k), indicating that they are assortative (Figures 7.10c-e). Hence in these networks hubs tend to link to other hubs and low-degree nodes tend to link to low-degree nodes. The fact that the observed knn(k) differs from $k_{nn}^{\text{R-S}}(k)$ indicates that the assortative nature of social networks is not due to their scale-free degree distribution.

• Email Network
While the email network is often seen as a social network, its knn(k) decreases with k, documenting a clear disassortative behavior (Figure 7.10f). The randomized $k_{nn}^{\text{R-S}}(k)$ also decays, indicating that we are observing structural disassortativity, a consequence of the network's scale-free nature.

• Biological Networks
The protein interaction and the metabolic network both have a negative μ, suggesting that these networks are disassortative. Yet, the scaling of $k_{nn}^{\text{R-S}}(k)$ is indistinguishable from knn(k), indicating that we are observing structural disassortativity, rooted in the scale-free nature of these networks (Figure 7.10g,h).

• WWW
The decaying knn(k) implies disassortative correlations (Figure 7.10i). The randomized $k_{nn}^{\text{R-S}}(k)$ also decays, but not as rapidly as knn(k). Hence the disassortative nature of the WWW is not fully explained by its degree distribution.

• Citation Network
This network displays a puzzling behavior: for k ≤ 20 the degree correlation function knn(k) shows a clear assortative trend; for k > 20, however, we observe disassortative scaling (Figure 7.10j). Such mixed behavior can emerge in networks that display extreme assortativity (Figure 7.13b). This suggests that the citation network is strongly assortative, but its scale-free nature induces structural disassortativity, changing the slope of knn(k) for k ≫ ks.

In summary, Figure 7.10 indicates that to understand degree correlations, we must always compare knn(k) to the degree randomized $k_{nn}^{\text{R-S}}(k)$. It also allows us to draw some interesting conclusions:

(i) Of the ten reference networks the power grid is the only truly neutral network. Hence most real networks display degree correlations.

(ii) All networks that display disassortative tendencies (email, protein, metabolic) do so thanks to their scale-free property. Hence, these are all structurally disassortative. Only the WWW shows disassortative correlations that are only partially explained by its degree distribution.

(iii) The degree correlations characterizing assortative networks are not explained by their degree distribution. Most social networks (mobile phone calls, scientific collaboration, actor network) are in this class, and so are the Internet and the citation network.

A number of mechanisms have been proposed to explain the origin of the observed assortativity. For example, the tendency of individuals to form communities, the topic of CHAPTER 9, can induce assortative correlations [12]. Similarly, society has countless mechanisms, from professional committees to TV shows, to bring hubs together, enhancing the assortative nature of social and professional networks. Finally, homophily, a well documented social phenomenon [13], indicates that individuals tend to associate with other individuals of similar background and characteristics, hence individuals with comparable degree tend to know each other. This degree-homophily may be responsible for the celebrity marriages as well (Figure 7.1).

Figure 7.10
Randomization and Degree Correlations
The degree correlation function knn(k) for the ten reference networks (Table 4.1). The grey symbols show the knn(k) function using linear binning; purple circles represent the same data using log-binning (SECTION 4.11). The green dotted line corresponds to the best fit to (7.10) within the fitting interval marked by the arrows at the bottom. Orange squares represent $k_{nn}^{\text{R-S}}(k)$ obtained for 100 independent degree-preserving randomizations, while maintaining the simple character of these networks. Note that we made directed networks undirected when we measured knn(k). To fully characterize the correlations emerging in directed networks we must use the directed correlation function (BOX 7.3).
Panels and fitted exponents (axes: knn(k) versus k, log-log): (a) Power Grid, μ = −0.04; (b) Internet, μ = 0.56; (c) Mobile Phone Calls, μ = 0.33; (d) Scientific Collaboration, μ = 0.16; (e) Actor, μ = 0.34; (f) Email, μ = −0.74; (g) Protein, μ = −0.10; (h) Metabolic, μ = −0.76; (i) WWW, μ = −0.82; (j) Citation, μ = −0.18.

BOX 7.3

CORRELATIONS IN DIRECTED NETWORKS

The degree correlation function (7.7) is defined for undirected networks. To measure correlations in directed networks we must take into account that each node i is characterized by an incoming $k_i^{in}$ and an outgoing $k_i^{out}$ degree [14]. We therefore define four degree correlation functions, $k_{nn}^{\alpha,\beta}(k)$, where α and β refer to the in and out indices (Figures 7.11a-d). In Figure 7.11e we show $k_{nn}^{\alpha,\beta}(k)$ for citation networks, indicating a lack of in-out correlations and the presence of assortativity for small k for the other three correlations (in-in, out-in, out-out).

Figure 7.11
Correlations in Directed Networks
(a)-(d) The four possible correlations characterizing a directed network: in-in (a), in-out (b), out-in (c) and out-out (d). We show in purple and green the (α, β) indices that define the appropriate correlation function [14]. For example, (a) describes the $k_{nn}^{in,in}(k)$ correlations between the in-degrees of two nodes connected by a link.
(e) The $k_{nn}^{\alpha,\beta}(k)$ correlation function for citation networks, a directed network. For example $k_{nn}^{in,in}(k)$ is the average in-degree of the in-neighbors of nodes with in-degree $k_{in}$. These functions show a clear assortative tendency for three of the four functions up to degree k ≃ 100. The empty symbols capture the degree randomized $k_{nn}^{\alpha,\beta}(k)$ for each degree correlation function (R-S randomization).

SECTION 7.6

GENERATING CORRELATED NETWORKS

To explore the impact of degree correlations on various network characteristics we must first understand the correlations characterizing the network models discussed thus far. It is equally important to develop algorithms that can generate networks with tunable correlations. As we show in this section, given the conflict between the scale-free property and degree correlations, this is not a trivial task.

DEGREE CORRELATIONS IN STATIC MODELS

Erdős-Rényi Model
The random network model is neutral by definition. As it lacks hubs, it does not develop structural correlations either. Hence for the Erdős-Rényi network knn(k) is given by (7.9), predicting μ = 0 for any ⟨k⟩ and N.

Configuration Model
The configuration model (Figure 4.15) is also neutral, independent of the choice of the degree distribution pk. This is because the model allows for both multi-links and self-loops. Consequently, any conflicts caused by the hubs are resolved by the multiple links between them. If, however, we force the network to be simple, then the generated network will develop structural disassortativity (Figure 7.8).

Hidden Parameter Model
In the model ejk is proportional to the product of the randomly chosen hidden variables ηj and ηk (Figure 4.18). Consequently the network is technically uncorrelated. However, if we do not allow multi-links, for scale-free networks we again observe structural disassortativity. Analytical calculations indicate that in this case [18]

$$k_{nn}(k) \sim k^{-1}, \tag{7.16}$$

i.e. the degree correlation function follows (7.10) with μ = −1.

Taken together, the static models explored so far generate either neutral networks, or networks characterized by structural disassortativity following (7.16).
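The role of multi-links in the configuration model can be seen by matching stubs at random and counting how many of the resulting links would have to be dropped to make the network simple. The sketch below is an illustration only: the function names are hypothetical, and the degree sequence is a made-up example with 28 stubs that contains nodes of degree 8 and 7, in the spirit of Figure 7.7b, not the sequence used in that figure.

```python
import random
from collections import Counter


def configuration_model(degree_sequence, seed=None):
    """Randomly pair stubs; multi-links and self-loops are allowed."""
    if sum(degree_sequence) % 2:
        raise ValueError("the sum of the degrees must be even")
    rng = random.Random(seed)
    # Node i appears once in the stub list for each of its k_i link ends.
    stubs = [node for node, k in enumerate(degree_sequence) for _ in range(k)]
    rng.shuffle(stubs)
    return list(zip(stubs[::2], stubs[1::2]))


def count_conflicts(edges):
    """Return (self_loops, extra_links): what a simple network would forbid."""
    self_loops = sum(1 for u, v in edges if u == v)
    multiplicity = Counter(frozenset((u, v)) for u, v in edges if u != v)
    extra_links = sum(m - 1 for m in multiplicity.values())
    return self_loops, extra_links


# Hypothetical degree sequence with 28 stubs, including degrees 8 and 7.
degrees = [8, 7, 3, 2, 2, 2, 1, 1, 1, 1]

# Average number of links between the two hubs (nodes 0 and 1) over many
# realizations; compare with the neutral expectation 8 * 7 / 28 = 2.
n_runs = 2000
hub_links = 0
for run in range(n_runs):
    edges = configuration_model(degrees, seed=run)
    hub_links += sum(1 for u, v in edges if {u, v} == {0, 1})
print(hub_links / n_runs)

edges = configuration_model(degrees, seed=0)
print(count_conflicts(edges))
```

With stubs paired uniformly at random the exact expectation for this tiny example is k k'/(2L − 1) = 56/27, slightly above 2. The average printed above is typically well above one, which is the conflict that a simple network resolves by removing hub-to-hub links.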

DEGREE CORRELATIONS IN EVOLVING NETWORKS

To understand the emergence (or the absence) of degree correlations in growing networks, we start with the initial attractiveness model (SECTION 6.5), which includes as a special case the Barabási-Albert model.

Initial Attractiveness Model
Consider a growing network in which preferential attachment follows (6.23), i.e. Π(k) ∼ A + k, where A is the initial attractiveness. Depending on the value of A, we observe three distinct scaling regimes [15]:

(i) Disassortative Regime: γ < 3
If −m < A < 0 we have

$$k_{nn}(k) \sim m \frac{(m+A)^{1-\frac{A}{m}}}{2m+A} \, \varsigma\!\left(\frac{2m}{2m+A}\right) N^{-\frac{A}{2m+A}} k^{\frac{A}{m}}. \tag{7.17}$$

Hence the resulting network is disassortative, knn(k) decaying following the power-law [15, 16]

$$k_{nn}(k) \sim k^{-\frac{|A|}{m}}. \tag{7.18}$$

(ii) Neutral Regime: γ = 3
If A = 0 the initial attractiveness model reduces to the Barabási-Albert model. In this case

$$k_{nn}(k) \sim \frac{m}{2} \ln N. \tag{7.19}$$

Consequently knn(k) is independent of k, hence the network is neutral.

(iii) Weak Assortativity: γ > 3
If A > 0 the calculations predict

$$k_{nn}(k) \approx (m+A) \ln\left(\frac{k}{m+A}\right). \tag{7.20}$$

As knn(k) increases logarithmically with k, the resulting network displays a weak assortative tendency, but does not follow (7.10).

In summary, (7.17) - (7.20) indicate that the initial attractiveness model generates rather complex degree correlations, from disassortativity to weak assortativity. Equation (7.19) also shows that the network generated by the Barabási-Albert model is neutral. Finally, (7.17) predicts a power law k-dependence for knn(k), offering analytical support for the empirical scaling (7.10).

Figure 7.12
Correlations in the Bianconi-Barabási Model
The degree correlation function of the Bianconi-Barabási model for N = 10,000, m = 3 and uniform fitness distribution (SECTION 6.2). As the green dotted line, a fit to (7.10), indicates, the network is disassortative, consistent with μ ≃ −0.5. The orange symbols correspond to $k_{nn}^{\text{R-S}}(k)$. As $k_{nn}^{\text{R-S}}(k)$ also decreases, the bulk of the observed disassortativity is structural. But the difference between knn(k) and $k_{nn}^{\text{R-S}}(k)$ suggests that structural effects cannot fully account for the observed degree correlation.

Figure 7.13
Xulvi-Brunet & Sokolov Algorithm
The algorithm generates networks with maximal degree correlations.
(a) The basic steps of the algorithm: Step 1, link selection (four nodes a, b, c, d ordered such that ka ≥ kb ≥ kc ≥ kd); Step 2, rewire (assortative or disassortative).
(b) knn(k) for networks generated by the algorithm for a scale-free network with N = 1,000, L = 2,500, γ = 3.0 (assortative, neutral and disassortative curves).
(c,d) A typical network configuration and the corresponding Aij matrix for the maximally assortative network generated by the algorithm, where the rows and columns of Aij were ordered according to increasing node degrees k.
(e,f) Same as in (c,d) for a maximally disassortative network.
The Aij matrices (d) and (f) capture the inner regularity of networks with maximal correlations, consisting of blocks of nodes that connect to nodes with similar degree in (d) and of blocks of nodes that connect to nodes with rather different degrees in (f).

Bianconi-Barabási Model
With a uniform fitness distribution the Bianconi-Barabási model generates a disassortative network [5] (Figure 7.12). The fact that the randomized version of the network is also disassortative indicates that the model's disassortativity is structural. Note, however, that the real knn(k)

R-S and the randomized knn (k) do not overlap, indicating that the disassor-

tativity of the model is not fully explained by its scale-free nature. TUNING DEGREE CORRELATIONS Several algorithms can generate networks with desired degree correlations [8, 17, 18]. Next we discuss a simplified version of the algorithm proposed by Xalvi-Brunet and Sokolov that aims to generate maximally correlated networks with a predefined degree sequence [19, 20, 21]. It consists of the following steps (Figure 7.13a): • Step 1: Link Selection Choose at random two links. Label the four nodes at the end of these two links with a, b, c, and d such that their degrees are ordered as ka ≥ kb ≥ kc ≥ kd. • Step 2: Rewiring Break the selected links and rewire them to form new pairs. Depending on the desired degree correlations the rewiring is done in two ways: • Step 2A: Assortative By pairing the two highest degree nodes (a with b) and the two lowest degree nodes (c with d), we connect nodes with comparable degrees, enhancing the network’s assortative nature. • Step 2B: Disassortative By pairing the highest and the lowest degree nodes (a with d and b with c), we connect nodes with different degrees, enhancing the network’s disassortative nature. By iterating these steps we gradually enhance the network’s assortative (Step 2A) or disassortative (Step 2B) features. If we aim to generate a simple network (free of multi-links), after Step 2 we check whether the particular rewiring leads to multi-links. If it does, we reject it, returning to Step 1. The correlations characterizing the networks generated by this algorithm converge to the maximal (assortative) or minimal (disassortative) value that we can reach for the given degree sequence (Figure 7.13b). The model has no difficulty creating disassortative correlations (Figures 7.13e,f). In the assortative limit simple networks display a mixed knn(k): assortative

for small k and disassortative for high k (Figures 7.13b). This is a consequence

of structural cutoffs: For scale-free networks the system is unable to sustain assortativity for high k. The observed behavior is reminiscent of the knn(k) function of citation networks (Figure 7.10j). The version of the Xalvi-Brunet & Sokolov algorithm introduced in Figure 7.13 generates maximally assortative or disassortative networks. We can tune the magnitude of the generated degree correlations if we use the algorithm discussed in Figure 7.14. DEGREE CORRELATIONS

23

In summary, static models, like the configuration or hidden parameter model, are neutral if we allow multi-links, and develop structural disassortativity if we force them to generate simple networks. To generate networks with tunable correlations, we can use, for example, the Xulvi-Brunet & Sokolov algorithm. Important results of this section are (7.16) and (7.18), which offer the analytical form of the degree correlation function for the hidden parameter model and for a growing network, in both cases predicting a power-law k-dependence. These results offer analytical backing for the scaling hypothesis (7.10), indicating that both structural and dynamical effects can result in a degree correlation function that follows a power law.
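The link-selection and rewiring steps of the Xulvi-Brunet & Sokolov algorithm described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `xbs_rewire`, its parameters, and the decision to skip link pairs that share a node are choices made here. The parameter `p` plays the role described in Figure 7.14, and moves that would create a multi-link are rejected so that the network stays simple.

```python
import random


def xbs_rewire(edges, assortative=True, p=1.0, steps=10_000, seed=0):
    """Degree-preserving rewiring sketch (hypothetical helper, not from the text).

    edges: list of (u, v) pairs describing a simple undirected network.
    Returns a new edge list with the same degree sequence.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    present = {frozenset(e) for e in edges}

    for _ in range(steps):
        # Step 1: choose two links at random and collect their four end nodes.
        i, j = rng.sample(range(len(edges)), 2)
        nodes = list(edges[i] + edges[j])
        if len(set(nodes)) < 4:
            continue  # assumption: skip pairs sharing a node (avoids self-loops)

        if rng.random() < p:
            # Step 2: order so that k_a >= k_b >= k_c >= k_d, then pair deterministically.
            a, b, c, d = sorted(nodes, key=lambda n: degree[n], reverse=True)
            new = [(a, b), (c, d)] if assortative else [(a, d), (b, c)]
        else:
            # With probability 1 - p the four nodes are paired at random.
            rng.shuffle(nodes)
            new = [(nodes[0], nodes[1]), (nodes[2], nodes[3])]

        old_keys = {frozenset(edges[i]), frozenset(edges[j])}
        new_keys = {frozenset(e) for e in new}
        # Reject the move if it would duplicate a link that stays in the network.
        if any(key in present and key not in old_keys for key in new_keys):
            continue
        present -= old_keys
        present |= new_keys
        edges[i], edges[j] = new
    return edges
```

Calling `xbs_rewire(edges, assortative=False, p=0.5)` applies the disassortative pairing in roughly half of the moves and a random pairing in the rest. Each move removes one link end and adds one link end at each of the four nodes, so the degree sequence is unchanged.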

Figure 7.14 Tuning Degree Correlations
We can use the Xulvi-Brunet & Sokolov algorithm to tune the magnitude of degree correlations.
(a) We execute the deterministic rewiring step with probability p, and with probability 1 − p we randomly pair the a, b, c, d nodes with each other. For p = 1 we are back to the algorithm of Figure 7.13, generating maximal degree correlations; for p < 1 the induced noise tunes the magnitude of the effect.
(b) Typical network configurations generated for p = 0.5.
(c) The knn(k) functions for various p values (p = 0.2, 0.4, 0.6, 0.8, 1.0) for a network with N = 10,000, ⟨k⟩ = 1 and γ = 3.0. The legend of the disassortative panel lists μ = -0.064 for p = 0.2, μ = -0.080 for p = 0.4, μ = -0.085 for p = 0.6 and μ = -0.095 for p = 0.8.
Note that the correlation exponent μ depends on the fitting region, especially in the assortative case.

SECTION 7.7

THE IMPACT OF DEGREE CORRELATIONS

As we have seen in Figure 7.10, most real networks are characterized by some degree correlations. Social networks are assortative; biological networks display structural disassortativity. These correlations raise an important question: Why do we care? In other words, do degree correlations alter the properties of a network? And which network properties do they influence? This section addresses these important questions.

An important property of a random network is the emergence of a phase transition at ⟨k⟩ = 1, marking the appearance of the giant component (SECTION 3.6). Figure 7.15 shows the relative size of the giant component for networks with different degree correlations, documenting several patterns [8, 19, 20]:

• Assortative Networks
For assortative networks the phase transition point moves to a lower ⟨k⟩, hence a giant component emerges for ⟨k⟩ < 1. The reason is that it is easier to start a giant component if the high-degree nodes seek out each other.

• Disassortative Networks
The phase transition is delayed in disassortative networks, as in these the hubs tend to connect to small-degree nodes. Consequently, disassortative networks have difficulty forming a giant component.

• Giant Component
For large ⟨k⟩ the giant component is smaller in assortative networks than in neutral or disassortative networks. Indeed, assortativity forces the hubs to link to each other, hence they fail to attract the numerous small-degree nodes to the giant component.

These changes in the size and the structure of the giant component have implications for the spread of diseases [22, 23, 24], the topic of CHAPTER 10. Indeed, as we have seen in Figure 7.10, social networks tend to be assortative. The high-degree nodes therefore form a giant component that acts as a “reservoir” for the disease, sustaining an epidemic even when on average the network is not sufficiently dense for the virus to persist.

The altered giant component has implications for network robustness as well [25]. As we discuss in CHAPTER 8, the removal of a network's hubs fragments a network. In assortative networks hub removal does less damage because the hubs form a core group, hence many of them are redundant. Hub removal is more damaging in disassortative networks, as in these the hubs connect to many small-degree nodes, which fall off the network once a hub is deleted.

Let us mention a few additional consequences of degree correlations:

• Figure 7.16 shows the path-length distribution of a random network rewired to display different degree correlations. It indicates that in assortative networks the average path length is shorter than in neutral networks. The most dramatic difference is in the network diameter, dmax, which is significantly higher for assortative networks. Indeed, assortativity favors links between nodes with similar degree, resulting in long chains of k = 2 nodes, enhancing dmax (Figure 7.13c).

• Degree correlations influence a system’s stability against stimuli and perturbations [26] as well as the synchronization of oscillators placed on a network [27, 28].

• Degree correlations have a fundamental impact on the vertex cover problem [29], a much-studied problem in graph theory that requires us to find the minimal set of nodes (cover) such that each link is connected to at least one node in the cover (BOX 7.4).

• Degree correlations impact our ability to control a network, altering the number of input signals one needs to achieve full control [30].

In summary, degree correlations are not only of academic interest: they influence numerous network characteristics and have a discernible impact on many processes that take place on a network.

Figure 7.15 Degree Correlations and the Phase Transition Point
Relative size of the giant component, S/N, as a function of ⟨k⟩ for an Erdős-Rényi network of size N = 10,000 (green curve), which is then rewired using the Xulvi-Brunet & Sokolov algorithm with p = 0.5 to induce degree correlations (Figure 7.14). The figure indicates that as we move from assortative to disassortative networks, the phase transition point is delayed and the size of the giant component increases for large ⟨k⟩. Each point represents an average over 10 independent runs.

Figure 7.16 Degree Correlations and Path Lengths
Distance distribution pd for a random network with size N = 10,000 and ⟨k⟩ = 3. Correlations are induced using the Xulvi-Brunet & Sokolov algorithm with p = 0.5 (Figure 7.14). The plots show that as we move from disassortative to assortative networks, the average path length decreases, indicated by the gradual move of the peaks to the left. At the same time the diameter, dmax, grows. The curves are annotated with dmax = 18, 21 and 24. Each curve represents an average over 10 independent networks.
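The quantities plotted in Figures 7.15 and 7.16 (relative giant component size S/N, distance distribution pd, average path length and diameter dmax) can be measured on any edge list with a breadth-first search. The sketch below is an illustration only: the function name and the convention that nodes are labeled 0 to n − 1 are assumptions made here, and distances are measured inside the largest component.

```python
from collections import deque


def giant_component_and_distances(n, edges):
    """Return (S/N, distance distribution p_d, average path length, diameter).

    Nodes are assumed to be labeled 0..n-1; distances are measured inside the
    largest connected component only.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def bfs(source):
        # Shortest-path distances from source to every reachable node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        return dist

    # Find the largest connected component.
    seen, giant = set(), []
    for node in range(n):
        if node not in seen:
            component = list(bfs(node))
            seen.update(component)
            if len(component) > len(giant):
                giant = component

    # Count shortest-path lengths over ordered node pairs of the giant component.
    counts = {}
    for node in giant:
        for d in bfs(node).values():
            if d > 0:
                counts[d] = counts.get(d, 0) + 1
    total = sum(counts.values())
    p_d = {d: c / total for d, c in sorted(counts.items())} if total else {}
    mean_d = sum(d * p for d, p in p_d.items())
    d_max = max(counts) if counts else 0
    return len(giant) / n, p_d, mean_d, d_max
```

Running one breadth-first search per node costs on the order of N·L operations, which is adequate for small test networks but slow for the N = 10,000 networks used in the figures.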

BOX 7.4 VERTEX COVER AND MUSEUM GUARDS
Imagine that you are the director of an open-air museum located in a large park. You wish to place guards on the crossroads to observe each path. Yet, to save cost you want to use as few guards as possible. How many guards do you need?

Let N be the number of crossroads and m < N the number of guards you can afford to hire. While there are $\binom{N}{m}$ ways of placing the m guards at N crossroads, most configurations leave some paths unsupervised [31].

The number of trials one needs to place the guards so that they cover all paths grows exponentially with N. Indeed, this is one of the six basic NP-complete problems, called the vertex cover problem. The vertex cover of a network is a set of nodes such that each link is connected to at least one node of the set (Figure 7.17). NP-completeness means that there is no known algorithm which can identify a minimal vertex cover substantially faster than an exhaustive search, i.e. checking each possible configuration individually. The number of nodes in the minimal vertex cover depends on the network topology, being affected by the degree distribution and degree correlations [29].

Figure 7.17 The Minimum Cover
Formally, a vertex cover of a network is a set C of nodes such that each link of the network connects to at least one node in C. A minimum vertex cover is a vertex cover of smallest possible size. The figure shows examples of minimum vertex covers in two small networks, where the set C is shown in purple. We can check that if we turn any of the purple nodes into green nodes, at least one link will not connect to a purple node.
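The exhaustive search mentioned in BOX 7.4 can be written in a few lines. The sketch below is only an illustration (the function name is an assumption made here): it tries guard placements of increasing size and returns the first one that covers every link, so its running time grows exponentially with the number of nodes.

```python
from itertools import combinations


def minimum_vertex_cover(nodes, edges):
    """Brute-force search for a smallest set of nodes touching every link."""
    nodes = list(nodes)
    for size in range(len(nodes) + 1):
        # Check every way of placing `size` guards.
        for candidate in combinations(nodes, size):
            chosen = set(candidate)
            if all(u in chosen or v in chosen for u, v in edges):
                return chosen
    return set(nodes)


# Example: a path of four crossroads, 0 - 1 - 2 - 3.
cover = minimum_vertex_cover(range(4), [(0, 1), (1, 2), (2, 3)])
print(len(cover))
```

Because the search proceeds from small to large sets, the first cover it finds is guaranteed to be a minimum one. This is practical only for networks with a few dozen nodes.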


SECTION 7.8

SUMMARY

Degree correlations were first discovered in 2001 in the context of the Internet by Romualdo Pastor-Satorras, Alexei Vazquez, and Alessandro Vespignani [4, 5], who also introduced the degree correlation function knn(k) and the scaling (7.10). A year later Kim Sneppen and Sergey Maslov used the full p(ki,kj), related to the eij matrix, to characterize the degree correlations of protein interaction networks [32]. In 2003 Mark Newman introduced the degree correlation coefficient [8, 9] together with the assortative, neutral, and disassortative distinction. These terms have their roots in social sciences [13]:

Assortative mating reflects the tendency of individuals to date or marry individuals that are similar to them. For example, low-income individuals marry low-income individuals and college graduates marry college graduates. Network theory uses assortativity in the same spirit, capturing the degree-based similarities between nodes: In assortative networks hubs tend to connect to other hubs and small-degree nodes to other small-degree nodes. In a network environment we can also encounter the traditional assortativity, when nodes of similar properties link to each other (Figure 7.18).

Disassortative mixing, when individuals link to individuals who are unlike them, is also common in some social and economic systems. Sexual networks are perhaps the best example, as most sexual relationships are between individuals of different gender. In economic settings trade typically takes place between individuals of different skills: the baker does not sell bread to other bakers, and the shoemaker rarely fixes other shoemakers' shoes.

Taken together, there are several reasons why we care about degree correlations in networks (BOX 7.5):

• Degree correlations are present in most real networks (SECTION 7.5).

• Once present, degree correlations change a network’s behavior (SECTION 7.7).

• Degree correlations force us to move beyond the degree distribution, representing quantifiable patterns that govern the way nodes link to each other that are not captured by pk alone.

Despite the considerable effort devoted to characterizing degree correlations, our understanding of the phenomenon remains incomplete. For example, while in SECTION 7.6 we offered an algorithm to tune degree correlations, the problem is far from being fully resolved. Indeed, the most accurate description of a network's degree correlations is contained in the eij matrix. Generating networks with an arbitrary eij remains a difficult task.

Finally, in this chapter we focused on the knn(k) function, which captures two-point correlations. In principle higher order correlations are also present in some networks (BOX 7.6). The impact of such three or four point correlations remains to be understood.

BOX 7.5 AT A GLANCE: DEGREE CORRELATIONS

Degree Correlation Matrix eij
Neutral networks:
$$e_{ij} = q_i q_j = \frac{k_i p_{k_i} k_j p_{k_j}}{\langle k \rangle^2}$$

Degree Correlation Function
$$k_{nn}(k) = \sum_{k'} k' P(k'|k)$$
Neutral networks:
$$k_{nn}(k) = \frac{\langle k^2 \rangle}{\langle k \rangle}$$

Scaling Hypothesis
$$k_{nn}(k) \sim k^{\mu}$$
μ > 0: Assortative
μ = 0: Neutral
μ < 0: Disassortative

Degree Correlation Coefficient
$$r = \sum_{jk} \frac{jk(e_{jk} - q_j q_k)}{\sigma^2}$$
r > 0: Assortative
r = 0: Neutral
r < 0: Disassortative

Figure 7.18 Politics is Never Neutral
The network behind the US political blogosphere illustrates the presence of assortative mixing, as used in sociology, meaning that nodes of similar characteristics tend to link to each other. In the map each blue node corresponds to a liberal blog and each red node to a conservative blog. Blue links connect liberal blogs, red links connect conservative blogs, yellow links go from liberal to conservative, and purple from conservative to liberal. As the image indicates, very few blogs link across the political divide, demonstrating the strong assortativity of the political blogosphere. After [33].
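The two measures summarized in BOX 7.5 can be computed directly from an edge list. The sketch below is a minimal illustration, not a reference implementation: the function name is an assumption made here, knn(k) is obtained by averaging the mean neighbor degree over all nodes of degree k, and r is evaluated as the Pearson correlation between the degrees found at the two ends of a link, counting each link in both directions. Returning r = 0 when all degrees are equal is a convention chosen here, since the variance in the denominator vanishes in that case.

```python
from collections import defaultdict


def degree_correlations(edges):
    """Return (knn, r) for a simple undirected network given as (u, v) pairs."""
    degree = defaultdict(int)
    neighbors = defaultdict(list)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        neighbors[u].append(v)
        neighbors[v].append(u)

    # knn(k): mean neighbor degree, averaged over all nodes of degree k.
    sums, counts = defaultdict(float), defaultdict(int)
    for node, nbrs in neighbors.items():
        k = degree[node]
        sums[k] += sum(degree[n] for n in nbrs) / k
        counts[k] += 1
    knn = {k: sums[k] / counts[k] for k in sorted(sums)}

    # r: correlation of the degrees at the two ends of a link (both directions).
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / len(xs)
    r = cov / var if var > 0 else 0.0  # convention chosen here for regular networks
    return knn, r


# Example: a path of four nodes, 0 - 1 - 2 - 3.
knn, r = degree_correlations([(0, 1), (1, 2), (2, 3)])
print(knn, r)
```

Fitting the returned knn(k) to a power law gives an estimate of μ. As noted in Figure 7.14, the fitted value depends on the fitting region.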

BOX 7.6 TWO-POINT, THREE-POINT CORRELATIONS
The complete degree correlations characterizing a network are determined by the conditional probability P(k(1), k(2), ..., k(k)|k) that a node with degree k connects to nodes with degrees k(1), k(2), ..., k(k).

Two-point Correlations
The simplest of these is the two-point correlation discussed in this chapter, being the conditional probability P(k′|k) that a node with degree k is connected to a node with degree k′. For uncorrelated networks this conditional probability is independent of k, i.e. P(k′|k) = k′pk′/⟨k⟩ [18]. As the empirical evaluation of P(k′|k) in real networks is cumbersome, it is more practical to analyze the degree correlation function knn(k) defined in (7.7).

Three-point Correlations
Correlations involving three nodes are determined by P(k(1),k(2)|k). This conditional probability is connected to the clustering coefficient. Indeed, the average clustering coefficient C(k) [22, 23] can be formally written as the probability that a degree-k node is connected to nodes with degrees k(1) and k(2), and that those two are joined by a link, averaged over all the possible values of k(1) and k(2),

$$C(k) = \sum_{k^{(1)}, k^{(2)}} P(k^{(1)}, k^{(2)} | k) \, p^{k}_{k^{(1)}, k^{(2)}},$$

where $p^{k}_{k^{(1)}, k^{(2)}}$ is the probability that nodes k(1) and k(2) are connected, provided that they have a common neighbor with degree k [18]. For neutral networks C(k) is independent of k, following

$$C = \frac{\left(\langle k^2 \rangle - \langle k \rangle\right)^2}{\langle k \rangle^3 N}.$$
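As a short worked step, the two-point statement of BOX 7.6 can be connected to the neutral form of knn(k) quoted in BOX 7.5. Inserting the uncorrelated conditional probability P(k′|k) = k′pk′/⟨k⟩ into the definition of the degree correlation function gives, with the sums running over all degrees k′,

```latex
k_{nn}(k) = \sum_{k'} k' P(k'|k)
          = \sum_{k'} k' \, \frac{k' p_{k'}}{\langle k \rangle}
          = \frac{1}{\langle k \rangle} \sum_{k'} k'^2 p_{k'}
          = \frac{\langle k^2 \rangle}{\langle k \rangle}
```

The result does not depend on k, which is why a flat knn(k) is the signature of a neutral network.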

SECTION 7.9

HOMEWORK

7.1. Detailed Balance for Degree Correlations
Express the joint probability ekk′, the conditional probability P(k′|k) and the probability qk, discussed in this chapter, in terms of the number of nodes N, the average degree ⟨k⟩, the number of nodes with degree k, Nk, and the number of links connecting nodes of degree k and k′, Ekk′ (note that Ekk′ is twice the number of links when k = k′). Based on these expressions, show that for any network we have

$$e_{kk'} = q_k P(k'|k).$$

7.2. Star Network
Consider a star network, where a single node is connected to N – 1 degree-one nodes. Assume that N ≫ 1.
(a) What is the degree distribution pk of this network?
(b) What is the probability qk that moving along a randomly chosen link we find at its end a node with degree k?
(c) Calculate the degree correlation coefficient r for this network. Use the expressions of ekk′ and P(k′|k) calculated in HOMEWORK 7.1.
(d) Is this network assortative or disassortative? Explain why.

7.3. Structural Cutoffs
Calculate the structural cutoff ks for the undirected networks listed in Table 4.1. Based on the plots in Figure 7.10, predict for each network whether ks is larger or smaller than the maximum expected degree kmax. Confirm your prediction by calculating kmax.

7.4. Degree Correlations in Erdős-Rényi Networks
Consider the Erdős-Rényi G(N,L) model of random networks, introduced in CHAPTER 3 (BOX 3.1 and SECTION 3.2), where N labeled nodes are connected with L randomly placed links. In this model, the probability that there is a link connecting nodes i and j depends on the existence of a link between nodes l and s.
(a) Write the probability that there is a link between i and j, eij, and the probability that there is a link between i and j conditional on the existence of a link between l and s.
(b) What is the ratio of these two probabilities for small networks? And for large networks?
(c) What do you obtain for the quantities discussed in (a) and (b) if you use the Erdős-Rényi G(N,p) model?
Based on the results found for (a)-(c), discuss the implications of using the G(N,L) model instead of the G(N,p) model for generating random networks with a small number of nodes.

SECTION 7.10

ADVANCED TOPICS 7.A DEGREE CORRELATION COEFFICIENT

In BOX 7.2 we defined the degree correlation coefficient r as an alternative measure of degree correlations [8, 9]. The use of a single number to characterize degree correlations is attractive, as it offers a way to compare the correlations observed in networks of different nature and size. Yet, to effectively use r we must be aware of its origin.

The hypothesis behind the correlation coefficient r implies that the knn(k) function can be approximated by the linear function

$$k_{nn}(k) \sim rk. \tag{7.21}$$

This is different from the scaling (7.10), which assumes a power law dependence on k. Equation (7.21) raises several issues:

• The initial attractiveness model predicts a power law (7.18) or a logarithmic k-dependence (7.20) for the degree correlation function. A similar power law is derived in (7.16) for the hidden parameter model. Consequently, r forces a linear fit to an inherently nonlinear function. This linear dependence is not supported by numerical simulations or analytical calculations. Indeed, as we show in Figure 7.19, (7.21) offers a poor fit to the data for both assortative and disassortative networks.

• As we have seen in Figure 7.10, the dependence of knn(k) on k is complex, often changing trends for large k thanks to the structural cutoff. A linear fit ignores this inherent complexity.

• The maximally correlated model has a vanishing r for large N, despite the fact that the network maintains its degree correlations (BOX 7.7). This suggests that the degree correlation coefficient has difficulty detecting correlations characterizing large networks.

Table 7.1 Degree Correlations in Reference Networks
The table shows the estimated r and μ for the ten reference networks. Directed networks were made undirected to measure r and μ. Alternatively, we can use the directed correlation coefficient to characterize such directed networks (BOX 7.8).

NETWORK                  N          r        μ
Internet                 192,244    0.02     0.56
WWW                      325,729    -0.05    -1.11
Power Grid               4,941      0.003    0.0
Mobile Phone Calls       36,595     0.21     0.33
Email                    57,194     -0.08    -0.74
Science Collaboration    23,133     0.13     0.16
Actor Network            702,388    0.31     0.34
Citation Network         449,673    -0.02    -0.18
E. Coli Metabolism       1,039      -0.25    -0.76
Protein Interactions     2,018      0.04     -0.1

Figure 7.19 Degree Correlation Function
The degree correlation function knn(k) for three real networks: (a) scientific collaboration (assortative), (b) power grid (neutral), (c) metabolic network (disassortative). The left panels show the cumulative function knn(k) on a log-log plot to test the validity of (7.10). The right panels show knn(k) on a lin-lin plot to test the validity of (7.21), i.e. the assumption that knn(k) depends linearly on k. This is the hypothesis behind the correlation coefficient r. The slope of the dotted line corresponds to the correlation coefficient r. As the lin-lin plots on the right illustrate, (7.21) offers a poor fit for both assortative and disassortative networks.

Relationship Between μ and r
On the positive side, r and μ are not independent of each other. To show this we calculated r and μ for the ten reference networks (TABLE 7.1). The results are plotted in Figure 7.20, indicating that μ and r correlate for positive r. Note, however, that this correlation breaks down for negative r. To understand the origin of this behavior, next we derive a direct relationship between μ and r. To be specific we assume the validity of (7.10) and determine the value of r for a network with correlation exponent μ.

We start by determining a from (7.10). We can write the second moment of the degree distribution as

$$\langle k^2 \rangle = \langle k_{nn}(k) k \rangle = \sum_k a k^{\mu+1} p_k = a \langle k^{\mu+1} \rangle,$$

which leads to

$$a = \frac{\langle k^2 \rangle}{\langle k^{\mu+1} \rangle}.$$

We now calculate r for a network with a given μ:

$$r = \frac{\sum_k k a k^{\mu} q_k - \frac{\langle k^2 \rangle^2}{\langle k \rangle^2}}{\sigma_r^2} = \frac{\sum_k a k^{\mu+2} \frac{p_k}{\langle k \rangle} - \frac{\langle k^2 \rangle^2}{\langle k \rangle^2}}{\sigma_r^2} = \frac{\frac{\langle k^2 \rangle}{\langle k^{\mu+1} \rangle} \frac{\langle k^{\mu+2} \rangle}{\langle k \rangle} - \frac{\langle k^2 \rangle^2}{\langle k \rangle^2}}{\sigma_r^2} = \frac{1}{\sigma_r^2} \frac{\langle k^2 \rangle}{\langle k \rangle} \left( \frac{\langle k^{\mu+2} \rangle}{\langle k^{\mu+1} \rangle} - \frac{\langle k^2 \rangle}{\langle k \rangle} \right). \tag{7.22}$$

For μ = 0 the term in the last parenthesis vanishes, obtaining r = 0. Hence if μ = 0 (neutral network), the network will be neutral based on r as well. For k > 1 (7.22) suggests that for μ > 0 the parenthesis is positive, hence r > 0, and for μ < 0 the parenthesis is negative, hence r < 0. Therefore r and μ predict degree correlations of similar kind.

In summary, if the degree correlation function follows (7.10), then the sign of the degree correlation exponent μ will determine the sign of the coefficient r:

μ < 0 → r < 0
μ = 0 → r = 0
μ > 0 → r > 0

Figure 7.20 Correlation Between r and μ
To illustrate the relationship between r and μ, we estimated μ by fitting the knn(k) function to (7.10), whether or not the power law scaling was statistically significant. The plot shows μ against r for the ten reference networks of TABLE 7.1.

BOX 7.7 THE PROBLEM WITH LARGE NETWORKS
The Xulvi-Brunet & Sokolov algorithm helps us calculate the maximal (rmax) and the minimal (rmin) correlation coefficient for a scale-free network [21].
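The sign relation implied by (7.22) can be checked numerically for any degree distribution. The sketch below is an illustration under stated assumptions: the function names are chosen here, σr² is taken to be the variance of qk = k pk/⟨k⟩, and the truncated power-law distribution (γ = 3, k = 1, ..., 100) is an arbitrary example rather than one of the reference networks.

```python
def moment(pk, s):
    """s-th moment <k^s> of a degree distribution given as {k: p_k}."""
    return sum(p * k ** s for k, p in pk.items())


def r_from_mu(pk, mu):
    """Evaluate the last expression of (7.22) for a given correlation exponent mu."""
    k1, k2, k3 = moment(pk, 1), moment(pk, 2), moment(pk, 3)
    # Assumption: sigma_r^2 is the variance of q_k = k p_k / <k>.
    sigma2 = k3 / k1 - (k2 / k1) ** 2
    bracket = moment(pk, mu + 2) / moment(pk, mu + 1) - k2 / k1
    return (k2 / k1) * bracket / sigma2


# Illustrative truncated power-law distribution (gamma = 3, k = 1..100).
weights = {k: k ** -3.0 for k in range(1, 101)}
norm = sum(weights.values())
pk = {k: w / norm for k, w in weights.items()}

for mu in (-0.5, 0.0, 0.5):
    print(mu, r_from_mu(pk, mu))
```

The printed values change sign together with μ, in line with the argument above. The magnitude of r depends on the assumed degree distribution, so the numbers themselves should not be read as predictions for any real network.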

Let Ekk′ be the number of links between nodes of degrees k and k′ for k ≠ k′ and twice the number of connecting links for k = k′, and let

$$m_{kk'} = \min\{k N_k, k' N_{k'}, N_k N_{k'}\} \tag{7.25}$$

be the largest possible value of Ekk′. The origin of (7.25) is explained in Figure 7.22. We can then write the ratio rkk′ as

$$r_{kk'} = \frac{E_{kk'}}{m_{kk'}} = \frac{\langle k \rangle e_{kk'}}{\min\{k P(k), k' P(k'), N P(k) P(k')\}}.$$

As mkk′ is the maximum of Ekk′, we must have rkk′ ≤ 1 for any k and k′.

When k > Npk′ and k′ > Npk, the effects of the restriction on the multiple links are felt, turning the expression for rkk′ into

$$r_{kk'} = \frac{\langle k \rangle e_{kk'}}{N p_k p_{k'}}. \tag{7.28}$$

Figure 7.22 Calculating mkk′
The maximum number of links one can have between two groups. The figure shows two groups of nodes, with degree k = 3 and k′ = 2. The total number of links between these two groups must not exceed:
(a) kNk = 9
(b) k′Nk′ = 8
(c) NkNk′ = 12
Hence mkk′ = min{kNk, k′Nk′, NkNk′} = 8.

For scale-free networks these conditions are fulfilled in the region k, k′ > (aN)^{1/(γ+1)}, where a is a constant that depends on pk. Note that this value is below the natural cutoff. Consequently this scaling provides a lower bound for the structural cutoff, in the sense that whenever the cutoff of the degree distribution falls below this limit, the condition rkk′ < 1 is always satisfied.

For neutral networks the joint distribution factorizes as

$$e_{kk'} = \frac{k k' p_k p_{k'}}{\langle k \rangle^2}. \tag{7.29}$$

Hence, the ratio (7.28) becomes

$$r_{kk'} = \frac{k k'}{\langle k \rangle N}. \tag{7.30}$$

Therefore, the structural cutoff needed to preserve the condition rkk′ ≤ 1 has the form [11, 34, 35, 36]

$$k_s(N) \sim (\langle k \rangle N)^{1/2}, \tag{7.31}$$

which is (7.15). Note that (7.31) is independent of the degree distribution of the underlying network. Consequently, for a scale-free network ks(N) is independent of the degree exponent γ.
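Equations (7.30) and (7.31) are simple enough to evaluate directly. The sketch below uses hypothetical function names and illustrative values (N = 10,000 and ⟨k⟩ = 4, not taken from any of the reference networks), and treats the proportionality in (7.31) as an equality.

```python
import math


def neutral_ratio(k, k_prime, avg_k, n):
    """r_kk' of (7.30) for a neutral network with average degree avg_k and n nodes."""
    return k * k_prime / (avg_k * n)


def structural_cutoff(avg_k, n):
    """k_s of (7.31), with the proportionality constant set to one."""
    return math.sqrt(avg_k * n)


n, avg_k = 10_000, 4          # illustrative values only
k_s = structural_cutoff(avg_k, n)
print(k_s)
print(neutral_ratio(k_s, k_s, avg_k, n))
print(neutral_ratio(2 * k_s, 2 * k_s, avg_k, n))
```

At k = k′ = ks the ratio equals one, and for any pair of degrees above ks it exceeds one, which is the non-physical situation that the structural cutoff excludes in simple networks.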


SECTION 7.12

BIBLIOGRAPHY

[1] P. Uetz, L. Giot, G. Cagney, T. A. Mansfield, RS Judson, JR Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang, M. Johnston, S. Fields, J. M. Rothberg. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 403: 623–627, 2000. [2] I. Xenarios, D. W. Rice, L. Salwinski, M. K. Baron, E. M. Marcotte, D. Eisenberg. DIP: the database of interacting proteins. Nucleic Acids Res., 28: 289–29, 2000. [3] H. Jeong, S.P. Mason, A.-L. Barabási, and Z.N. Oltvai. Lethality and centrality in protein networks. Nature, 411: 41-42, 2001. [4] R. Pastor-Satorras, A. Vázquez, and A. Vespignani. Dynamical and correlation properties of the Internet. Phys. Rev. Lett., 87: 258701, 2001. [5] A. Vazquez, R. Pastor-Satorras, and A. Vespignani. Large-scale topological and dynamical properties of Internet. Phys. Rev., E 65: 066130, 2002. [6] S.L. Feld. Why your friends have more friends than you do. American Journal of Sociology, 96: 1464–1477, 1991. [7] E.W. Zuckerman and J.T. Jost. What makes you think you’re so popular? Self evaluation maintenance and the subjective side of the “friendship paradox”. Social Psychology Quarterly, 64: 207–223, 2001. [8] M. E. J. Newman. Assortative mixing in networks. Phys. Rev. Lett., 89: 208701, 2002. [9] M. E. J. Newman. Mixing patterns in networks. Phys. Rev. E, 67: 026126, 2003. [10] S. Maslov, K. Sneppen, and A. Zaliznyak. Detection of topological pattern in complex networks: Correlation profile of the Internet. Physica DEGREE CORRELATIONS

40

A, 333: 529-540, 2004. [11] M. Boguna, R. Pastor-Satorras, and A. Vespignani. Cut-offs and finite size effects in scale-free networks. Eur. Phys. J. B, 38: 205, 2004. [12] M. E. J. Newman and Juyong Park. Why social networks are different from other types of networks. Phys. Rev. E, 68: 036122, 2003. [13] M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: homophily in social networks. Annual Review of Sociology, 27:415-444, 2001. [14] J. G. Foster, D. V. Foster, P. Grassberger, and M. Paczuski. Edge direction and the structure of networks. PNAS, 107: 10815, 2010. [15] A. Barrat and R. Pastor-Satorras. Rate equation approach for correlations in growing network models. Phys. Rev. E, 71: 036127, 2005. [16] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of networks. Adv. Phys., 51: 1079, 2002. [17] J. Berg and M. Lässig. Correlated random networks. Phys. Rev. Lett., 89: 228701, 2002. [18] M. Boguñá and R. Pastor-Satorras. Class of correlated random networks with hidden variables. Phys. Rev. E, 68: 036112, 2003. [19] R. Xulvi-Brunet and I. M. Sokolov. Reshuffling scale-free networks: From random to assortative. Phys. Rev. E, 70: 066102, 2004. [20] R. Xulvi-Brunet and I. M. Sokolov. Changing correlations in networks: assortativity and dissortativity. Acta Phys. Pol. B, 36: 1431, 2005. [21] J. Menche, A. Valleriani, and R. Lipowsky. Asymptotic properties of degree-correlated scale-free networks. Phys. Rev. E, 81: 046103, 2010. [22] V. M. Eguíluz and K. Klemm. Epidemic threshold in structured scale-free networks. Phys. Rev. Lett., 89:108701, 2002. [23] M. Boguñá and R. Pastor-Satorras. Epidemic spreading in correlated complex networks. Phys. Rev. E, 66: 047104, 2002. [24] M. Boguñá, R. Pastor-Satorras, and A. Vespignani. Absence of epidemic threshold in scale-free networks with degree correlations. Phys. Rev. Lett., 90: 028701, 2003. [25] A. Vázquez and Y. Moreno. Resilience to damage of graphs with degree correlations. Phys. Rev. E, 67: 015101R, 2003. 
[26] S.J. Wang, A.C. Wu, Z.X. Wu, X.J. Xu, and Y.H. Wang. Response of degree-correlated scale-free networks to stimuli. Phys. Rev. E, 75: 046113,
2007. [27] F. Sorrentino, M. Di Bernardo, G. Cuellar, and S. Boccaletti. Synchronization in weighted scale-free networks with degree–degree correlation. Physica D, 224: 123, 2006. [28] M. Di Bernardo, F. Garofalo, and F. Sorrentino. Effects of degree correlation on the synchronization of networks of oscillators. Int. J. Bifurcation Chaos Appl. Sci. Eng., 17: 3499, 2007. [29] A. Vazquez and M. Weigt. Computational complexity arising from degree correlations in networks. Phys. Rev. E, 67: 027101, 2003. [30] M. Posfai, Y Y. Liu, J-J Slotine, and A.-L. Barabási. Effect of correlations on network controllability. Scientific Reports, 3: 1067, 2013. [31] M. Weigt and A. K. Hartmann. The number of guards needed by a museum: A phase transition in vertex covering of random graphs. Phys. Rev. Lett., 84: 6118, 2000. [32] S. Maslov and K. Sneppen. Specificity and stability in topology of protein networks. Science, 296: 910–913, 2002. [33] L. Adamic and N. Glance. The political blogosphere and the 2004 U.S. election: Divided they blog (2005). [34] J. Park and M. E. J. Newman. The origin of degree correlations in the Internet and other networks. Phys. Rev. E, 66: 026112, 2003. [35] F. Chung and L. Lu. Connected components in random graphs with given expected degree sequences. Annals of Combinatorics, 6: 125, 2002. [36] Z. Burda and Z. Krzywicki. Uncorrelated random networks. Phys. Rev. E, 67: 046118, 2003.


8 ALBERT-LÁSZLÓ BARABÁSI

NETWORK SCIENCE NETWORK ROBUSTNESS

ACKNOWLEDGEMENTS

MÁRTON PÓSFAI GABRIELE MUSELLA NICOLE SAMAY ROBERTA SINATRA

SARAH MORRISON AMAL HUSSEINI PHILIPP HOEVEL

INDEX

1 Introduction
2 Percolation Theory
3 Robustness of Scale-free Networks
4 Attack Tolerance
5 Cascading Failures
6 Modeling Cascading Failures
7 Building Robustness
8 Summary: Achilles' Heel
9 Homework
10 ADVANCED TOPICS 8.A Percolation in Scale-free Network
11 ADVANCED TOPICS 8.B Molloy-Reed Criteria
12 ADVANCED TOPICS 8.C Critical Threshold Under Random Failures
13 ADVANCED TOPICS 8.D Breakdown of a Finite Scale-free Network
14 ADVANCED TOPICS 8.E Attack and Error Tolerance of Real Networks
15 ADVANCED TOPICS 8.F Attack Threshold
16 ADVANCED TOPICS 8.G The Optimal Degree Distribution
Homework

Figure 8.0 (cover image)

Networks & Art: Facebook Users
Created by Paul Butler, a Toronto-based data scientist, during a Facebook internship in 2010, the image depicts the network connecting the users of the social network company. It highlights the links within and across continents. The presence of dense local links in the U.S., Europe and India is just as revealing as the lack of links in some areas, like China, where the site is banned, and Africa, reflecting a lack of Internet access.

This book is licensed under a Creative Commons: CC BY-NC-SA 2.0. PDF V26, 05.09.2014

SECTION 8.1

INTRODUCTION

Errors and failures can corrupt all human designs: The failure of a component in your car’s engine may force you to call for a tow truck, and a wiring error in your computer chip can make your computer useless. Many natural and social systems have, however, a remarkable ability to sustain their basic functions even when some of their components fail. Indeed, while there are countless protein misfolding errors and missed reactions in our cells, we rarely notice their consequences. Similarly, large organizations can function despite numerous absent employees. Understanding the origins of this robustness is important for many disciplines:

• Robustness is a central question in biology and medicine, helping us understand why some mutations lead to diseases and others do not.
• It is of concern for social scientists and economists, who explore the stability of human societies and institutions in the face of such disrupting forces as famine, war, and changes in social and economic order.
• It is a key issue for ecologists and environmental scientists, who seek to predict the failure of an ecosystem when faced with the disruptive effects of human activity.
• It is the ultimate goal in engineering, aiming to design communication systems, cars, or airplanes that can carry out their basic functions despite occasional component failures.

Networks play a key role in the robustness of biological, social and technological systems. Indeed, a cell's robustness is encoded in intricate regulatory, signaling and metabolic networks; a society’s resilience cannot be divorced from the interwoven social, professional, and communication web behind it; an ecosystem’s survivability cannot be understood without a careful analysis of the food web that sustains each species. Whenever nature seeks robustness, it resorts to networks.

Figure 8.1
Achilles’ Heel of Complex Networks
The cover of the 27 July 2000 issue of Nature, highlighting the paper entitled Attack and error tolerance of complex networks that began the scientific exploration of network robustness [1].


The purpose of this chapter is to understand the role networks play in ensuring the robustness of a complex system. We show that the structure of the underlying network plays an essential role in a system’s ability to survive random failures or deliberate attacks. We explore the role of networks in the emergence of cascading failures, a damaging phenomenon frequently encountered in real systems. Most importantly, we show that the laws governing the error and attack tolerance of complex networks, and the emergence of cascading failures, are universal. Hence uncovering them helps us understand the robustness of a wide range of complex systems.

Figure 8.2 Robust, Robustness

“Robust” comes from the Latin Quercus Robur, meaning oak, the symbol of strength and longevity in the ancient world. The tree in the figure stands near the Hungarian village Diósviszló and is documented at www.dendromania.hu, a site that catalogs Hungary's oldest and largest trees. Image courtesy of György Pósfai.


SECTION 8.2

PERCOLATION THEORY

The removal of a single node has only limited impact on a network’s integrity (Figure 8.3a). The removal of several nodes, however, can break a network into several isolated components (Figure 8.3d). Obviously, the more nodes we remove, the higher the chances that we damage a network, prompting us to ask: How many nodes do we have to delete to fragment a network into isolated components? For example, what fraction of Internet routers must break down so that the Internet turns into clusters of computers that are unable to communicate with each other? To answer these questions, we must first familiarize ourselves with the mathematical underpinnings of network robustness, offered by percolation theory.

Figure 8.3
The Impact of Node Removal
The gradual fragmentation of a small network following the breakdown of its nodes, shown in panels (a)-(d). In each panel we remove a different node (highlighted with a green circle), together with its links. While the removal of the first node has only limited impact on the network’s integrity, the removal of the second node isolates two small clusters from the rest of the network. Finally, the removal of the third node fragments the network, breaking it into five non-communicating clusters of sizes s = 2, 2, 2, 5, 6.

Percolation
Percolation theory is a highly developed subfield of statistical physics and mathematics [2, 3, 4, 5]. A typical problem addressed by it is illustrated in Figure 8.4a,b, showing a square lattice, where we place pebbles with probability p at each intersection. Neighboring pebbles are considered connected, forming clusters of size two or more. Given that the position of each pebble is decided by chance, we ask:

• What is the expected size of the largest cluster?
• What is the average cluster size?

Obviously, the higher p is, the larger the clusters are. A key prediction of percolation theory is that the cluster size does not change gradually with p. Rather, for a wide range of p the lattice is populated with numerous tiny clusters (Figure 8.4a). As p approaches a critical value pc, these small clusters grow and coalesce, leading to the emergence of a large cluster at pc. We call this the percolating cluster, as it reaches the end of the lattice. In other words, at pc we observe a phase transition from many small clusters to a percolating cluster that spans the whole lattice (Figure 8.4b). To quantify the nature of this phase transition, we focus on three quantities:

• Average Cluster Size: ⟨s⟩
According to percolation theory the average size of all finite clusters follows

\[ \langle s \rangle \sim |p - p_c|^{-\gamma_p} \tag{8.1} \]

In other words, the average cluster size diverges as we approach pc (Figure 8.4c).

• Order Parameter: P∞
The probability P∞ that a randomly chosen pebble belongs to the largest cluster follows

\[ P_\infty \sim (p - p_c)^{\beta_p} \tag{8.2} \]

Therefore as p decreases towards pc the probability that a pebble belongs to the largest cluster drops to zero (Figure 8.4d).

• Correlation Length: ξ
The mean distance between two pebbles that belong to the same cluster follows

\[ \xi \sim |p - p_c|^{-\nu} \tag{8.3} \]

Therefore while for p < pc the distance between the pebbles in the same cluster is finite, at pc this distance diverges. This means that at pc the size of the largest cluster becomes infinite, allowing it to percolate the whole lattice.

Figure 8.4
Percolation
A classical problem in percolation theory explores the random placement with probability p of pebbles on a square lattice.
(a) p = 0.1. For small p most pebbles are isolated. In this case the largest cluster has only three nodes, highlighted in purple.
(b) p = 0.7. For large p most (but not all) pebbles belong to a single cluster, colored purple. This is called the percolating cluster, as it spans the whole lattice (see also Figure 8.6).
(c) The average cluster size, ⟨s⟩, as a function of p. As we approach pc from below, numerous small clusters coalesce and ⟨s⟩ diverges, following (8.1). The same divergence is observed above pc, where to calculate ⟨s⟩ we remove the percolating cluster from the average. The same exponent γp characterizes the divergence on both sides of the critical point.
(d) A schematic illustration of the p-dependence of the probability P∞ that a pebble belongs to the largest connected component. For p < pc all components are small, so P∞ is zero. Once p reaches pc a giant component emerges. Consequently beyond pc there is a finite probability that a node belongs to the largest component, as predicted by (8.2).

The exponents γp, βp, and ν are called critical exponents, as they characterize the system’s behavior near the critical point pc. Percolation theory predicts that these exponents are universal, meaning that they are independent of the nature of the lattice or the precise value of pc. Therefore, whether we place the pebbles on a triangular or a hexagonal lattice, the behavior of ⟨s⟩, P∞, and ξ is characterized by the same γp, βp, and ν exponents. Consider the following examples to better understand this universality:

• The value of pc depends on the lattice type, hence it is not universal. For example, for a two-dimensional square lattice (Figure 8.4) we have pc ≈ 0.593, while for a two-dimensional triangular lattice pc = 1/2 (site percolation).
• The value of pc also changes with the lattice dimension: for a square lattice pc ≈ 0.593 (d = 2); for a simple cubic lattice (d = 3) pc ≈ 0.3116. Therefore in d = 3 we need to cover a smaller fraction of the nodes with pebbles to reach the percolation transition.
• In contrast with pc, the critical exponents do not depend on the lattice type, but only on the lattice dimension. In two dimensions, the case shown in Figure 8.4, we have γp = 43/18, βp = 5/36, and ν = 4/3, for any lattice. In three dimensions γp = 1.80, βp = 0.41, and ν = 0.88. For any d > 6 we have γp = 1, βp = 1, ν = 1/2, hence for large d the exponents are independent of d as well [2].
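The pebble experiment described above is easy to reproduce numerically. The following is a minimal sketch, not the procedure used to produce the book's figures: it places pebbles on a square lattice with probability p and measures the fraction of pebbles in the largest cluster, a finite-size stand-in for P∞. The function name, the lattice size and the breadth-first cluster search are choices made here for illustration.

```python
import random
from collections import deque


def largest_cluster_fraction(size, p, rng):
    """Place pebbles on a size x size lattice with probability p and
    return the fraction of pebbles that sit in the largest cluster."""
    occupied = [[rng.random() < p for _ in range(size)] for _ in range(size)]
    seen = [[False] * size for _ in range(size)]
    pebbles = 0
    largest = 0
    for i in range(size):
        for j in range(size):
            if not occupied[i][j]:
                continue
            pebbles += 1
            if seen[i][j]:
                continue
            # Breadth-first search over nearest-neighbor pebbles.
            seen[i][j] = True
            queue = deque([(i, j)])
            cluster = 0
            while queue:
                x, y = queue.popleft()
                cluster += 1
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < size and 0 <= ny < size:
                        if occupied[nx][ny] and not seen[nx][ny]:
                            seen[nx][ny] = True
                            queue.append((nx, ny))
            largest = max(largest, cluster)
    return largest / pebbles if pebbles else 0.0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    for p in (0.1, 0.4, 0.55, 0.593, 0.65, 0.9):
        print(p, round(largest_cluster_fraction(200, p, rng), 3))
```

On a finite lattice the transition is smeared out, so the measured fraction rises steeply around pc ≈ 0.593 rather than jumping from zero; larger lattices sharpen the rise.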

Inverse Percolation Transition and Robustness
The phenomenon of primary interest in robustness is the impact of node failures on the integrity of a network. We can use percolation theory to describe this process. Let us view a square lattice as a network whose nodes are the intersections (Figure 8.5). We randomly remove an f fraction of nodes, asking how their absence impacts the integrity of the lattice. If f is small, the missing nodes do little damage to the network. Increasing f, however, can isolate chunks of nodes from the giant component. Finally, for sufficiently large f the giant component breaks into tiny disconnected components (Figure 8.5). This fragmentation process is not gradual, but is characterized by a critical threshold fc: For any f < fc we continue to have a giant component. Once f exceeds fc, the giant component vanishes. This is illustrated by the f-dependence of P∞, representing the probability that a node is part of the giant component (Figure 8.5): P∞ is nonzero below fc, but it drops to zero as we approach fc. The critical exponents characterizing this breakdown, γp, βp, ν, are the same as those encountered in (8.1)-(8.3). Indeed, the two processes can be mapped into each other by choosing f = 1 − p.

What, however, if the underlying network is not as regular as a square lattice? As we will see in the coming sections, the answer depends on the precise network topology. Yet, for random networks the answer continues to be provided by percolation theory: Random networks under random node failures share the same scaling exponents as infinite-dimensional percolation. Hence the critical exponents for a random network are γp = 1, βp = 1 and ν = 1/2, corresponding to the d > 6 percolation exponents encountered earlier. The critical exponents for a scale-free network are provided in ADVANCED TOPICS 8.A.

In summary, the breakdown of a network under random node removal is not a gradual process. Rather, removing a small fraction of nodes has only limited impact on a network’s integrity. But once the fraction of removed nodes reaches a critical threshold, the network abruptly breaks into disconnected components. In other words, random node failures induce a phase transition from a connected to a fragmented network. We can use the tools of percolation theory to characterize this transition in both regular and random networks. For scale-free networks key aspects of the described phenomena change, however, as we discuss in the next section.
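As a quick worked check of the mapping f = 1 − p, the site percolation thresholds quoted earlier for regular lattices translate into breakdown thresholds as follows. This uses only the pc values given in the text; the lattice labels are added here for readability.

```latex
\begin{align*}
f_c^{\text{square}}     &= 1 - p_c \approx 1 - 0.593 = 0.407, \\
f_c^{\text{triangular}} &= 1 - p_c = 1 - \tfrac{1}{2} = 0.5, \\
f_c^{\text{cubic}}      &= 1 - p_c \approx 1 - 0.3116 = 0.6884 .
\end{align*}
```

Hence a square lattice falls apart once roughly 41% of its nodes are removed, while a simple cubic lattice tolerates the loss of about 69% of its nodes.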

Figure 8.5
Network Breakdown as Inverse Percolation
The consequences of node removal are accurately captured by the inverse of the percolation process discussed in Figure 8.4. We start from a square lattice, which we view as a network whose nodes are the intersections. We randomly select and remove an f fraction of nodes and measure the size of the largest component formed by the remaining nodes. This size is accurately captured by P∞, which is the probability that a randomly selected node belongs to the largest component. The top panel shows P∞ as a function of f. The observed networks are shown on the bottom panels (f = 0.1, f = fc, f = 0.8). Under each panel we list the characteristics of the corresponding phases:
0 < f < fc: There is a giant component, with P∞ ~ |f − fc|^β.
f = fc: The giant component vanishes.
f > fc: The lattice breaks into many tiny components.

BOX 8.1
From Forest Fires to Percolation Theory
We can use the spread of a fire in a forest to illustrate the basic concepts of percolation theory. Let us assume that each pebble in Figure 8.4a,b is a tree and that the lattice describes a forest. If a tree catches fire, it ignites the neighboring trees; these, in turn, ignite their neighbors. The fire continues to spread until no burning tree has a non-burning neighbor. We must therefore ask: If we randomly ignite a tree, what fraction of the forest burns down? And how long does it take the fire to burn out?

The answer depends on the tree density, controlled by the parameter p. For small p the forest consists of many small islands of trees (p = 0.55, Figure 8.6a), hence igniting any tree will at most burn down one of these small islands. Consequently, the fire will die out quickly. For large p most trees belong to a single large cluster, hence the fire rapidly sweeps through the dense forest (p = 0.62, Figure 8.6c).

The simulations indicate that there is a critical pc at which it takes an extremely long time for the fire to end. This pc is the critical threshold of the percolation problem. Indeed, at p = pc the giant component just emerges through the union of many small clusters (Figure 8.6b). Hence the fire has to follow a long winding path to reach all trees in the loosely connected clusters, which can be rather time consuming.

Figure 8.6
Forest Fire
The emergence of the giant component as we change the occupation probability p. Each panel corresponds to a different p in the vicinity of pc, shown for a lattice of 250x250 sites: (a) p = 0.55, (b) p = 0.593, (c) p = 0.62. The largest cluster is colored black. For p < pc the largest cluster is tiny, as seen in (a). If this is a forest and the pebbles are trees, any fire can consume at most a small fraction of the trees, burning out quickly. Once p reaches pc ≈ 0.593, shown in (b), the largest cluster percolates the whole lattice and the fire can reach many trees, burning slowly through the forest. Increasing p beyond pc connects more pebbles (trees) to the largest component, as seen for p = 0.62 in (c). Hence, the fire can sweep through the forest, burning out quickly again.
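The forest fire picture of BOX 8.1 can be tried out with a few lines of code. The sketch below is an illustration under simple assumptions, not the simulation behind Figure 8.6: trees occupy lattice sites with probability p, a burning tree ignites its four nearest neighbors in one time step, and the helper names make_forest and burn are invented here.

```python
import random


def make_forest(size, p, rng):
    """Occupy each site of a size x size lattice with a tree with probability p."""
    return [[rng.random() < p for _ in range(size)] for _ in range(size)]


def burn(forest, start):
    """Ignite the tree at `start` and return (trees burned, time steps).

    In every time step each burning tree ignites its non-burning nearest
    neighbors, so the number of steps is the depth of a breadth-first search.
    """
    size = len(forest)
    x0, y0 = start
    if not forest[x0][y0]:
        return 0, 0  # no tree at the ignition site
    burned = {start}
    frontier = [start]
    steps = 0
    while True:
        new_frontier = []
        for x, y in frontier:
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < size and 0 <= ny < size:
                    if forest[nx][ny] and (nx, ny) not in burned:
                        burned.add((nx, ny))
                        new_frontier.append((nx, ny))
        if not new_frontier:
            break
        frontier = new_frontier
        steps += 1
    return len(burned), steps


if __name__ == "__main__":
    rng = random.Random(0)
    size, trials = 100, 200
    for p in (0.55, 0.593, 0.62):
        forest = make_forest(size, p, rng)
        times = []
        for _ in range(trials):
            start = (rng.randrange(size), rng.randrange(size))
            times.append(burn(forest, start)[1])
        print(p, "mean burn time:", sum(times) / trials)
```

Averaging the burn time over many ignition points and several forests is a simple way to look for the slowdown near pc described in the box; on small lattices the effect is noisy.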

SECTION 8.3

ROBUSTNESS OF SCALE-FREE NETWORKS

Percolation theory focuses mainly on regular lattices, whose nodes have identical degrees, or on random networks, whose nodes have comparable degrees. What happens, however, if the network is scale-free? How do the hubs affect the percolation transition?

To answer these questions, let us start from the router level map of the Internet and randomly select and remove nodes one-by-one. According to percolation theory once the fraction of removed nodes reaches a critical value fc, the Internet should fragment into many isolated subgraphs (Figure 8.5). The simulations indicate otherwise: The Internet refuses to break apart even under rather extensive node failures. Instead the size of the largest component decreases gradually, vanishing only in the vicinity of f = 1 (Figure 8.7a). This means that the network behind the Internet shows an unusual robustness to random node failures: we must remove all of its nodes to destroy its giant component. This conclusion disagrees with percolation on lattices, which predicts that a network must fall apart after the removal of a finite fraction of its nodes.

The behavior observed above is not unique to the Internet. To show this we repeated the above measurement for a scale-free network with degree exponent γ = 2.5, observing an identical pattern (Figure 8.7b): Under random node removal the giant component fails to collapse at some finite fc, but vanishes only gradually near f = 1 (Online Resource 8.1). This hints that the Internet's observed robustness is rooted in its scale-free topology. The goal of this section is to uncover and quantify the origin of this remarkable robustness.

Figure 8.7
Robustness of Scale-free Networks
Both panels show P∞(f)/P∞(0) as a function of f.
(a) INTERNET. The fraction of Internet routers that belong to the giant component after an f fraction of routers are randomly removed. The ratio P∞(f)/P∞(0) provides the relative size of the giant component. The simulations use the router level Internet topology of Table 4.1.
(b) SCALE-FREE NETWORK. The fraction of nodes that belong to the giant component after an f fraction of nodes are removed from a scale-free network with γ = 2.5, N = 10,000 and kmin = 1.
The plots indicate that the Internet, and in general a scale-free network, does not fall apart after the removal of a finite fraction of nodes. We need to remove almost all nodes (i.e. fc = 1) to fragment these networks.
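The measurement behind Figure 8.7 can be imitated on synthetic networks. The sketch below is only an illustration under stated assumptions: it grows a scale-free network with a simple preferential attachment rule (in the spirit of the Barabási-Albert model, whose degree exponent is γ = 3 rather than the γ = 2.5 used in the figure), builds a random network with the same average degree for comparison, and tracks P∞(f)/P∞(0) under random node removal. All function names are invented here, and the network size is kept small so that plain Python suffices.

```python
import random


def barabasi_albert(n, m, rng):
    """Grow a network by preferential attachment, m links per new node."""
    adj = {i: set() for i in range(n)}
    stubs = []  # every node appears once per link end: sampling is degree-proportional
    for i in range(m + 1):  # fully connected seed of m + 1 nodes
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            stubs += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj


def random_network(n, mean_degree, rng):
    """Random network with int(n * mean_degree / 2) links placed between random pairs."""
    adj = {i: set() for i in range(n)}
    links, target = 0, int(n * mean_degree / 2)
    while links < target:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            links += 1
    return adj


def giant_component_size(adj, removed):
    """Size of the largest connected component among the non-removed nodes."""
    seen = set(removed)
    best = 0
    for source in adj:
        if source in seen:
            continue
        seen.add(source)
        stack, size = [source], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def relative_giant_component(adj, f, rng):
    """P_inf(f) / P_inf(0) after removing a random fraction f of the nodes."""
    nodes = list(adj)
    removed = set(rng.sample(nodes, int(round(f * len(nodes)))))
    return giant_component_size(adj, removed) / giant_component_size(adj, set())


if __name__ == "__main__":
    rng = random.Random(1)
    scale_free = barabasi_albert(2000, 2, rng)
    random_net = random_network(2000, 4.0, rng)
    for f in (0.0, 0.25, 0.5, 0.75, 0.9):
        print(f,
              round(relative_giant_component(scale_free, f, rng), 3),
              round(relative_giant_component(random_net, f, rng), 3))
```

A single run at N = 2,000 is noisy; averaging over several removal sequences, or increasing N, gives smoother curves to compare against the figure.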

Molloy-Reed Criterion
To understand the origin of the anomalously high fc characterizing the Internet and scale-free networks, we calculate fc for a network with an arbitrary degree distribution. To do so we rely on a simple observation: For a network to have a giant component, most nodes that belong to it must be connected to at least two other nodes (Figure 8.8). This leads to the Molloy-Reed criterion (ADVANCED TOPICS 8.B), stating that a randomly wired network has a giant component if [6]

\[ \kappa = \frac{\langle k^2 \rangle}{\langle k \rangle} > 2 \tag{8.4} \]

Networks with κ < 2 lack a giant component, being fragmented into many disconnected components. The Molloy-Reed criterion (8.4) links the network’s integrity, as expressed by the presence or the absence of a giant component, to ⟨k⟩ and ⟨k²⟩. It is valid for any degree distribution pk.

To illustrate the predictive power of (8.4), let us apply it to a random network. As in this case ⟨k²⟩ = ⟨k⟩(1 + ⟨k⟩), a random network has a giant component if

\[ \kappa = \frac{\langle k^2 \rangle}{\langle k \rangle} = \frac{\langle k \rangle (1 + \langle k \rangle)}{\langle k \rangle} = 1 + \langle k \rangle > 2 \tag{8.5} \]

or

\[ \langle k \rangle > 1 \tag{8.6} \]

This prediction coincides with the necessary condition (3.10) for the existence of a giant component.

Critical Threshold
To understand the mathematical origin of the robustness observed in Figure 8.7, we ask at what threshold a scale-free network will lose its giant component. By applying the Molloy-Reed criterion to a network with an arbitrary degree distribution, we find that the critical threshold follows [7] (ADVANCED TOPICS 8.C)

\[ f_c = 1 - \frac{1}{\frac{\langle k^2 \rangle}{\langle k \rangle} - 1} \tag{8.7} \]

Online Resource 8.1
Scale-free Network Under Node Failures
To illustrate the robustness of a scale-free network we start from the network we constructed in Online Resource 4.1, i.e. a scale-free network generated by the Barabási-Albert model. Next we randomly select and remove nodes one-by-one. As the movie illustrates, despite the fact that we remove a significant fraction of the nodes, the network refuses to break apart. Visualization by Dashun Wang.

Figure 8.8

Molloy-Reed Criterion Each individual must hold the hand of two other individuals to form a chain. Similarly, to have a giant component in a network, on average each of its nodes should have at least two neighbors. The Molloy-Reed criterion (8.4) exploits this property, allowing us to calculate the critical point at which a network breaks apart. See ADVANCED TOPICS 8.B for the derivation.
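Both (8.4) and (8.7) need only the first two moments of the degree distribution, so they can be evaluated directly from a list of node degrees. The helper functions below are a minimal sketch with names chosen here; returning 0 when κ ≤ 2 is a convention adopted in this sketch to signal that there is no giant component to destroy, not something prescribed by the text.

```python
def degree_moments(degrees):
    """Return (<k>, <k^2>) for a list of node degrees."""
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    return k1, k2


def molloy_reed_kappa(degrees):
    """kappa = <k^2> / <k>, the quantity compared with 2 in Eq. (8.4)."""
    k1, k2 = degree_moments(degrees)
    return k2 / k1


def critical_threshold(kappa):
    """f_c = 1 - 1 / (kappa - 1), Eq. (8.7).

    For kappa <= 2 the network has no giant component according to (8.4),
    so this sketch returns 0.0 (an assumption made here for convenience).
    """
    if kappa <= 2:
        return 0.0
    return 1.0 - 1.0 / (kappa - 1.0)


if __name__ == "__main__":
    regular = [4] * 10      # every node has degree 4
    star = [1] * 9 + [9]    # one hub connected to nine leaves
    for name, degrees in (("regular", regular), ("star", star)):
        kappa = molloy_reed_kappa(degrees)
        print(name, "kappa =", kappa, "f_c =", round(critical_threshold(kappa), 3))
```

Both toy sequences have κ > 2, but the star has the larger κ despite its lower average degree, which hints at how a few high degree nodes raise ⟨k²⟩ and with it the threshold. The formulas assume a randomly wired network, so for such small, highly structured examples the numbers are illustrative only.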

The most remarkable prediction of (8.7) is that the critical threshold fc depends only on ⟨k⟩ and ⟨k²⟩, quantities that are uniquely determined by the degree distribution pk. Let us illustrate the utility of (8.7) by calculating the breakdown threshold of a random network. Using ⟨k²⟩ = ⟨k⟩(⟨k⟩ + 1), we obtain (ADVANCED TOPICS 8.D)

\[ f_c^{ER} = 1 - \frac{1}{\langle k \rangle} \tag{8.8} \]

Hence, the denser a random network is, the higher its fc, i.e. the more nodes we need to remove to break it apart. Furthermore (8.8) predicts that fc is always finite, hence a random network must break apart after the removal of a finite fraction of nodes.

Equation (8.7) helps us understand the roots of the enhanced robustness observed in Figure 8.7. Indeed, for scale-free networks with γ < 3 the second moment ⟨k²⟩ diverges in the N → ∞ limit. If we insert ⟨k²⟩ → ∞ into (8.7), we find that fc converges to fc = 1. This means that to fragment a scale-free network we must remove all of its nodes. In other words, the random removal of a finite fraction of its nodes does not break apart a large scale-free network.

To better understand this result we express ⟨k⟩ and ⟨k²⟩ in terms of the parameters characterizing a scale-free network: the degree exponent γ and the minimal and maximal degrees, kmin and kmax, obtaining

\[ f_c = \begin{cases} 1 - \dfrac{1}{\frac{\gamma - 2}{3 - \gamma}\, k_{min}^{\gamma - 2}\, k_{max}^{3 - \gamma} - 1} & 2 < \gamma < 3 \\[3ex] 1 - \dfrac{1}{\frac{\gamma - 2}{\gamma - 3}\, k_{min} - 1} & \gamma > 3 \end{cases} \tag{8.9} \]

Equation (8.9) predicts that:

• For γ > 3 the critical threshold fc depends only on γ and kmin, hence fc is independent of the network size N. In this regime a scale-free network behaves like a random network: it falls apart once a finite fraction of its nodes are removed.
• For γ < 3 the kmax diverges for large N, following (4.18). Therefore in the N → ∞ limit (8.9) predicts fc → 1. In other words, to fragment an infinite scale-free network we must remove all of its nodes.

Equations (8.6)-(8.9) are the key results of this chapter, predicting that scale-free networks can withstand an arbitrary level of random failures without breaking apart. The hubs are responsible for this remarkable robustness. Indeed, random node failures by definition are blind to degree, affecting with the same probability a small or a large degree node. Yet, in a scale-free network we have far more small degree nodes than hubs. Therefore, random node removal will predominantly remove one of the numerous small nodes, as the chance of randomly selecting one of the few large hubs is negligible. These small nodes contribute little to a network’s integrity, hence their removal does little damage.

Returning to the airport analogy of Figure 4.6, if we close a randomly selected airport, we will most likely shut down one of the numerous small airports. Its absence will be hardly noticed elsewhere in the world: you can still travel from New York to Tokyo, or from Los Angeles to Rio de Janeiro.

Figure 8.9
Robustness and Degree Exponent
The probability that a node belongs to the giant component after the removal of an f fraction of nodes from a scale-free network with degree exponent γ, plotted as P∞(f)/P∞(0) against f for γ = 2.0, 3.0 and 4.0. For γ = 4 we observe a finite critical point fc ≃ 2/3, as predicted by (8.9). For γ < 3, however, fc → 1. The networks were generated with the configuration model using kmin = 2 and N = 10,000.
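Equation (8.9) is simple enough to evaluate directly, which makes the two regimes easy to compare. The sketch below is illustrative: the function names are chosen here, and the expression used for kmax, kmin N^(1/(γ−1)), is assumed to be the form of (4.18), which is referenced but not reproduced in this chapter.

```python
def k_max_estimate(k_min, n, gamma):
    """Largest expected degree, k_max = k_min * N**(1/(gamma - 1)).

    Assumed here to be the form of Eq. (4.18), which this chapter only cites.
    """
    return k_min * n ** (1.0 / (gamma - 1.0))


def f_c_scale_free(gamma, k_min, k_max):
    """Critical threshold of a scale-free network following Eq. (8.9).

    A negative result means kappa < 2, i.e. no giant component exists
    even before any node is removed.
    """
    if gamma > 3:
        kappa = (gamma - 2.0) / (gamma - 3.0) * k_min
    elif 2 < gamma < 3:
        kappa = ((gamma - 2.0) / (3.0 - gamma)
                 * k_min ** (gamma - 2.0) * k_max ** (3.0 - gamma))
    else:
        raise ValueError("Eq. (8.9) covers 2 < gamma < 3 and gamma > 3 only")
    return 1.0 - 1.0 / (kappa - 1.0)


if __name__ == "__main__":
    # gamma > 3: the threshold does not depend on the network size.
    print("gamma = 4, k_min = 2:", f_c_scale_free(4.0, 2, k_max=None))
    # gamma < 3: the threshold creeps towards 1 as N grows.
    for n in (10**4, 10**6, 10**8):
        k_max = k_max_estimate(2, n, 2.5)
        print("gamma = 2.5, N =", n, "f_c =", f_c_scale_free(2.5, 2, k_max))
```

For γ = 4 and kmin = 2 the function returns 2/3, matching the critical point quoted in the caption of Figure 8.9; kmax is ignored in that branch, which is why None can be passed.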

Robustness of Finite Networks
Equation (8.9) predicts that for a scale-free network fc converges to one only if kmax → ∞, which corresponds to the N → ∞ limit. While many networks of practical interest are very large, they are still finite, prompting us to ask if the observed anomaly is relevant for finite networks. To address this we insert (4.18) into (8.9), obtaining that fc depends on the network size N as (ADVANCED TOPICS 8.C)

\[ f_c \approx 1 - \frac{C}{N^{\frac{3 - \gamma}{\gamma - 1}}} \tag{8.10} \]

where C collects all terms that do not depend on N. Equation (8.10) indicates that the larger a network, the closer its critical threshold is to fc = 1.

To see how close fc can get to the theoretical limit fc = 1, we calculate fc for the Internet. The router level map of the Internet has ⟨k²⟩/⟨k⟩ = 37.91 (Table 4.1). Inserting this ratio into (8.7) we obtain fc = 0.972. Therefore, we need to remove 97% of the routers to fragment the Internet into disconnected components. The probability that by chance 186,861 routers fail simultaneously, representing 97% of the N = 192,244 routers on the Internet, is effectively zero. This is the reason why the topology of the Internet is so robust to random failures.

In general a network displays enhanced robustness if its breakdown threshold deviates from the random network prediction (8.8), i.e. if

\[ f_c > f_c^{ER} \tag{8.11} \]

Enhanced robustness has several ramifications:

• The inequality (8.11) is satisfied for most networks for which ⟨k²⟩ deviates from ⟨k⟩(⟨k⟩ + 1). According to Figure 4.8, for virtually all reference networks ⟨k²⟩ exceeds the random expectation. Hence the robustness predicted by (8.7) affects most networks of practical interest. This is illustrated in Table 8.1, which shows that for most reference networks (8.11) holds.
• Equation (8.7) predicts that the degree distribution of a network does not need to follow a strict power law to display enhanced robustness. All we need is a larger ⟨k²⟩ than expected for a random network of similar size.
• The scale-free property changes not only fc, but also the critical exponents γp, βp and ν in the vicinity of fc. Their dependence on the degree exponent γ is discussed in ADVANCED TOPICS 8.A.
• Enhanced robustness is not limited to node removal, but emerges under link removal as well (Figure 8.10).

Figure 8.10
Robustness and Link Removal
What happens if we randomly remove the links rather than the nodes? The calculations predict that the critical threshold fc is the same for random link and node removal [7, 8]. To illustrate this, we compare the impact of random node and link removal on a random network with ⟨k⟩ = 2, plotting P∞(f)/P∞(0) against f for both cases. The plot indicates that the network falls apart at the same critical threshold fc ≃ 0.5. The difference is in the shape of the two curves. Indeed, the removal of an f fraction of nodes leaves us with a smaller giant component than the removal of an f fraction of links. This is not unexpected: on average each node removes ⟨k⟩ links. Hence the removal of an f fraction of nodes is equivalent to the removal of an f⟨k⟩ fraction of links, which clearly does more damage than the removal of an f fraction of links.
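The Internet estimate quoted in this section can be retraced step by step from (8.7). The short calculation below uses only the numbers given in the text, ⟨k²⟩/⟨k⟩ = 37.91 and N = 192,244; it is a worked check rather than a new result.

```latex
\begin{align*}
f_c &= 1 - \frac{1}{\frac{\langle k^2 \rangle}{\langle k \rangle} - 1}
     = 1 - \frac{1}{37.91 - 1}
     = 1 - \frac{1}{36.91} \approx 0.9729, \\
f_c\, N &\approx 0.972 \times 192{,}244 \approx 186{,}861 .
\end{align*}
```

The text truncates 0.9729 to 0.972, which is the value that reproduces the quoted count of 186,861 routers.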

In summary, in this section we encountered a fundamental property of real networks: their robustness to random failures. Equation (8.7) predicts that the breakdown threshold of a network depends on ⟨k⟩ and ⟨k²⟩, which in turn are uniquely determined by the network's degree distribution. Therefore random networks have a finite threshold, but for scale-free networks with γ < 3 the breakdown threshold converges to one. In other words, we need to remove all nodes to break a scale-free network apart, indicating that these networks show an extreme robustness to random failures. The origin of this extreme robustness is the large ⟨k²⟩ term. Given that for most real networks ⟨k²⟩ is larger than the random expectation, enhanced robustness is a generic property of many networks. This robustness is rooted in the fact that random failures affect mainly the numerous small nodes, which play only a limited role in maintaining a network’s integrity.

Table 8.1
Breakdown Thresholds Under Random Failures and Attacks

The table shows the estimated fc for random node failures (second column) and attacks (fourth column) for ten reference networks. The procedure for determining fc is described in ADVANCED TOPICS 8.E. The third column (randomized network) offers fc for a network whose N and L coincide with the original network, but whose nodes are connected randomly to each other (randomized network, fcER, determined by (8.8)). For most networks fc for random failures exceeds fcER for the corresponding randomized network, indicating that these networks display enhanced robustness, as they satisfy (8.11). Three networks lack this property: the power grid, a consequence of the fact that its degree distribution is exponential (Figure 8.31a), and the actor and the citation networks, which have a very high ⟨k⟩, diminishing the role of the high ⟨k²⟩ in (8.7).

NETWORK | RANDOM FAILURES (REAL NETWORK) | RANDOM FAILURES (RANDOMIZED NETWORK) | ATTACK (REAL NETWORK)
Internet
WWW
Power Grid
Mobile-Phone Call
Email
Science Collaboration
Actor Network 0.98
Citation Network
E. Coli Metabolism
Yeast Protein Interactions

SECTION 8.4

ATTACK TOLERANCE

The important role the hubs play in holding together a scale-free network motivates our next question: What if we do not remove the nodes randomly, but go after the hubs? That is, we first remove the highest degree node, followed by the node with the next highest degree and so on. The likelihood that nodes would break in this particular order under normal conditions is essentially zero. Instead this process mimics an attack on the network, as it assumes a detailed knowledge of the network topology, an ability to target the hubs, and a desire to deliberately cripple the network [1].

The removal of a single hub is unlikely to fragment a network, as the remaining hubs can still hold the network together. After the removal of a few hubs, however, large chunks of nodes start falling off (Online Resource 8.2). If the attack continues, it can rapidly break the network into tiny clusters. The impact of hub removal is quite evident in the case of a scale-free network (Figure 8.11): the critical point, which is absent under random failures, reemerges under attacks. Not only does it reemerge, but it has a remarkably low value. Therefore the removal of a small fraction of the hubs is sufficient to break a scale-free network into tiny clusters. The goal of this section is to quantify this attack vulnerability.

Figure 8.11
Scale-free Network Under Attack
The probability that a node belongs to the largest connected component, P∞(f)/P∞(0), in a scale-free network under attack (purple) and under random failures (green), as a function of the fraction f of removed nodes. For an attack we remove the nodes in a decreasing order of their degree: we start with the biggest hub, followed by the next biggest and so on. In the case of failures the order in which we choose the nodes is random, independent of the node’s degree. The plot illustrates a scale-free network’s extreme fragility to attacks: fc is small, implying that the removal of only a few hubs can disintegrate the network. The initial network has degree exponent γ = 2.5, kmin = 2 and N = 10,000.

Online Resource 8.2
Scale-free Networks Under Attack
During an attack we aim to inflict maximum damage on a network. We can do this by removing first the highest degree node, followed by the next highest degree, and so on. As the movie illustrates, it is sufficient to remove only a few hubs to break a scale-free network into disconnected components. Compare this with the network’s refusal to break apart under random node failures, shown in Online Resource 8.1. Visualization by Dashun Wang.

Critical Threshold Under Attack

An attack on a scale-free network has two consequences (Figure 8.11):
• The critical threshold fc is smaller than fc = 1, indicating that under attacks a scale-free network can be fragmented by the removal of a finite fraction of its hubs.
• The observed fc is remarkably low, indicating that we need to remove only a tiny fraction of the hubs to cripple the network.

To quantify this process we need to analytically calculate fc for a network under attack. To do this we rely on the fact that hub removal changes the network in two ways [9]:
• It changes the maximum degree of the network from kmax to k'max, as all nodes with degree larger than k'max have been removed.
• The degree distribution of the network changes from pk to p'k', as nodes connected to the removed hubs will lose links, altering the degrees of the remaining nodes.

By combining these two changes we can map the attack problem into the robustness problem discussed in the previous section. In other words, we can view an attack as random node removal from a network with adjusted k'max and p'k'. The calculations predict that the critical threshold fc for attacks on a scale-free network is the solution of the equation [9, 10] (ADVANCED TOPICS 8.F)

$$ f_c^{\frac{2-\gamma}{1-\gamma}} = 2 + \frac{2-\gamma}{3-\gamma}\, k_{\min} \left( f_c^{\frac{3-\gamma}{1-\gamma}} - 1 \right). \qquad (8.12) $$
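Equation (8.12) has no closed-form solution for general γ, so fc is obtained numerically. The following is a minimal sketch that reads (8.12) as f^a = 2 + c·kmin·(f^b − 1), with a = (2−γ)/(1−γ), b = (3−γ)/(1−γ) and c = (2−γ)/(3−γ), and finds its root in (0, 1) by bisection. The function name, the bracketing interval and the tolerance are assumptions made for illustration, and this is not necessarily the procedure used to draw Figure 8.12.

```python
def attack_threshold(gamma, k_min, tol=1e-12):
    """Solve Eq. (8.12) for f_c by bisection on (0, 1).

    Assumes gamma > 2 and gamma != 3, where the coefficient (2-gamma)/(3-gamma) is finite.
    """
    if gamma <= 2 or gamma == 3:
        raise ValueError("this sketch needs gamma > 2 and gamma != 3")
    a = (2 - gamma) / (1 - gamma)
    b = (3 - gamma) / (1 - gamma)
    c = (2 - gamma) / (3 - gamma)

    def g(f):
        # Left-hand side minus right-hand side of (8.12).
        return f ** a - 2 - c * k_min * (f ** b - 1)

    lo, hi = 1e-12, 1.0
    if not (g(lo) > 0 > g(hi)):
        raise ValueError("no sign change on (0, 1) for these parameters")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


fc_attack = attack_threshold(2.5, 2)
print(fc_attack)
```

For γ = 2.5 and kmin = 2 the equation reduces to a quadratic in fc^(1/3), whose root below one gives fc = (2 − √2)³ ≈ 0.20, which offers a convenient check of the numerical solution.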

Figure 8.12 shows the numerical solution of (8.12) as a function of the degree exponent γ, allowing us to draw several conclusions:

• While fc for failures decreases monotonically with γ, fc for attacks can have a non-monotonic behavior: it increases for small γ and decreases for large γ.
• fc for attacks is always smaller than fc for random failures.

• For large γ a scale-free network behaves like a random network. As a random network lacks hubs, the impact of an attack is similar to the impact of random node removal. Consequently the failure and the attack thresholds converge to each other for large γ. Indeed, if γ → ∞ then pk → δ(k − kmin), meaning that all nodes have the same degree kmin. Therefore random failures and targeted attacks become indistinguishable in the γ → ∞ limit, obtaining

$$ f_c \to 1 - \frac{1}{k_{\min} - 1}. \qquad (8.13) $$

• As Figure 8.13 shows, a random network has a finite percolation threshold under both random failures and attacks, as predicted by Figure 8.12 and (8.13) for large γ.

Figure 8.12
Critical Threshold Under Attack
The dependence of the breakdown threshold, fc, on the degree exponent γ for scale-free networks with kmin = 2, 3. The curves are predicted by (8.12) for attacks (purple) and by (8.7) for random failures (green).

The airport analogy helps us understand the fragility of scale-free networks to attacks: The closing of two large airports, like Chicago’s O’Hare Airport or the Atlanta International Airport, for only a few hours would be headline news, altering travel throughout the U.S. Should some series of events lead to the simultaneous closure of the Atlanta, Chicago, Denver, and New York airports, the biggest hubs, air travel within the North American continent would come to a halt within hours.


In summary, while random node failures do not fragment a scale-free network, an attack that targets the hubs can easily destroy such a network. This fragility is bad news for the Internet, as it indicates that it is inherently vulnerable to deliberate attacks. It can be good news in medicine, as the vulnerability of bacteria to the removal of their hub proteins offers avenues to design drugs that kill unwanted bacteria.

Figure 8.13
Attacks and Failures in Random Networks
The fraction of nodes that belong to the giant component, P∞(f)/P∞(0), in a random network if an f fraction of nodes is removed randomly (green) or in decreasing order of their degree (purple). Both curves indicate the existence of a finite threshold, in contrast with scale-free networks, for which fc → 1 under random failures. The simulations were performed for random networks with N = 10,000 and ⟨k⟩ = 3.
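The contrast between attacks and random failures can be reproduced with a small simulation. The sketch below is not the procedure behind Figures 8.11 or 8.13. It assumes a network grown by preferential attachment (which has γ = 3 rather than the γ = 2.5 of Figure 8.11), ranks nodes by their initial degree without recomputing degrees during the attack, and uses N = 2,000 to keep the run short. All function names are hypothetical.

```python
import random


def preferential_attachment(n, m, rng):
    """Grow a connected graph in which each new node links to m existing nodes,
    chosen with probability proportional to their degree. Returns adjacency sets."""
    adj = {i: set() for i in range(n)}
    stubs = []  # each node appears once per link it holds
    for i in range(m + 1):  # small complete core
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            stubs += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj


def giant_fraction(adj, removed):
    """Largest connected component among the surviving nodes, divided by N."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / len(adj)


def removal_curve(adj, order, fractions):
    """Giant component size after removing the first f*N nodes of `order`."""
    n = len(adj)
    return [giant_fraction(adj, order[: int(f * n)]) for f in fractions]


rng = random.Random(42)
adj = preferential_attachment(2000, 2, rng)
nodes = list(adj)
attack_order = sorted(nodes, key=lambda u: len(adj[u]), reverse=True)  # hubs first
random_order = nodes[:]
rng.shuffle(random_order)

fractions = [0.0, 0.1, 0.2, 0.3, 0.4]
attack = removal_curve(adj, attack_order, fractions)
failure = removal_curve(adj, random_order, fractions)
for f, a, r in zip(fractions, attack, failure):
    print(f"f={f:.1f}  attack={a:.3f}  random={r:.3f}")
```

Under these assumptions the giant component should shrink far faster when hubs are removed first than when the same number of nodes is removed at random, in line with the summary above.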


BOX 8.2 PAUL BARAN AND THE INTERNET

In 1959 RAND, a Californian think-tank, assigned Paul Baran, a young engineer at that time, to develop a communication system that could survive a Soviet nuclear attack. As a nuclear strike handicaps all equipment within the range of the detonation, Baran had to design a system whose users outside this range do not lose contact with one another. He described the communication network of his time as a “hierarchical structure of a set of stars connected in the form of a larger star,” offering an early description of what we call today a scale-free network [11]. He concluded that this topology is too centralized to be viable under attack. He also discarded the hub-and-spoke topology shown in Figure 8.14a, noting that the “centralized network is obviously vulnerable as destruction of a single central node destroys communication between the end stations.” Baran decided that the ideal survivable architecture was a distributed mesh-like network (Figure 8.14c). This network is sufficiently redundant, so that even if some of its nodes fail, alternative paths can connect the remaining nodes. Baran’s ideas were ignored by the military, so when the Internet was born a decade later, it relied on distributed protocols that allowed each node to decide where to link. This decentralized philosophy paved the way to the emergence of a scale-free Internet, rather than the uniform mesh-like topology envisioned by Baran.

Figure 8.14
Baran’s Network
Possible configurations of communication networks, as envisioned by Paul Baran in 1959: (a) centralized, (b) decentralized, (c) distributed. Each panel shows stations connected by links. After [11].

SECTION 8.5

CASCADING FAILURES

Throughout this chapter we assumed that each node failure is a random event, hence the nodes of a network fail independently of each other. In reality, the activity of each node in a network depends on the activity of its neighboring nodes. Consequently the failure of a node can induce the failure of the nodes connected to it. Let us consider a few examples:

• Blackouts (Power Grid) After the failure of a node or a link the electric currents are instantaneously reorganized on the rest of the power grid. For example, on August 10, 1996, a hot day in Oregon, a line carrying 1,300 megawatts sagged close to a tree and snapped. Because electricity cannot be stored, the current it carried was automatically shifted to two lower voltage lines. As these were not designed to carry the excess current, they too failed. Seconds later the excess current led to the malfunction of thirteen generators, eventually causing a blackout in eleven U.S. states and two Canadian provinces [12].
• Denial of Service Attacks (Internet) If a router fails to transmit the packets received by it, the Internet protocols will alert the neighboring routers to avoid the troubled equipment by re-routing the packets using alternative routes. Consequently a failed router increases traffic on other routers, potentially inducing a series of denial of service attacks throughout the Internet [13].
• Financial Crises Cascading failures are common in economic systems. For example, the drop in house prices in 2008 in the U.S. spread along the links of the financial network, inducing a cascade of failed banks, companies and even nations [14, 15, 16]. It eventually caused the worst global financial meltdown since the 1930s Great Depression.

Figure 8.15
Domino Effect
The domino effect is the fall of a series of dominos induced by the fall of the first domino. The term is often used to refer to a sequence of events induced by a local change that propagates through the whole system. Hence the domino effect represents perhaps the simplest illustration of cascading failures, the topic of this section.

While they cover different domains, these examples have several common characteristics. First, the initial failure had only limited impact on the network structure. Second, the initial failure did not stay localized, but it spread along the links of the network, inducing additional failures. Eventually, multiple nodes lost their ability to carry out their normal functions. Consequently each of these systems experienced cascading failures, a dangerous phenomenon in most networks [17]. In this section we discuss the empirical patterns governing such cascading failures. The modeling of these events is the topic of the next section.

EMPIRICAL RESULTS

Cascading failures are well documented in the case of the power grid, information systems and tectonic motion, offering detailed statistics about their frequency and magnitude.

• Blackouts A blackout can be caused by power station failures, damage to electric transmission lines, a short circuit, and so on. When the operating limits of a component are exceeded, it is automatically disconnected to protect it. Such a failure redistributes the power previously carried by the failed component to other components, altering the power flow, the frequency, the voltage and the phase of the current, and the operation of the control, monitoring and alarm systems. These changes can in turn disconnect other components as well, starting an avalanche of failures. A frequently recorded measure of blackout size is the energy unserved. Figure 8.17a shows the probability distribution p(s) of energy unserved in all North American blackouts between 1984 and 1998. Electrical engineers approximate the obtained distribution with the power law [18],

$$ p(s) \sim s^{-\alpha}, \qquad (8.14) $$

where the avalanche exponent α is listed in Table 8.2 for several countries. The power law nature of this distribution indicates that most blackouts are rather small, affecting only a few consumers. These coexist, however, with occasional major blackouts, when millions of consumers lose power (Figure 8.16).

Figure 8.16
Northeast Blackout of 2003
One of the largest blackouts in North America took place on August 14, 2003, just before 4:10 p.m. Its cause was a software bug in the alarm system at a control room of the First Energy Corporation in Ohio. Missing the alarm, the operators were unaware of the need to redistribute the power after an overloaded transmission line hit a tree. Consequently a normally manageable local failure began a cascading failure that shut down more than 508 generating units at 265 power plants, leaving an estimated 10 million people without electricity in Ontario and 45 million in eight U.S. states. The figure highlights the states affected by the August 14, 2003 blackout. For a satellite image of the blackout, see Figure 1.1.

• Information Cascades Modern communication systems, from email to Facebook or Twitter, facilitate the cascade-like spreading of information along the links of the social network. As the events pertaining to the spreading process often leave digital traces, these platforms allow researchers to detect the underlying cascades. The micro-blogging service Twitter has been particularly studied in this context. On Twitter the network of who follows whom can be reconstructed by crawling the service's follower graph. As users frequently share web-content using URL shorteners, one can also track each spreading/sharing process. A study tracking 74 million such events over two months followed the diffusion of each URL from a

particular seed node through its reposts until the end of a cascade (Figure 8.18). As Figure 8.17b indicates, the size distribution of the observed cascades follows the power law (8.14) with an avalanche exponent α ≈ 1.75 [19]. The power law indicates that the vast majority of posted URLs do not spread at all, a conclusion supported by the fact that the average cascade size is only ⟨s⟩ = 1.14. Yet, a small fraction of URLs are reposted thousands of times.

• Earthquakes Geological fault surfaces are irregular and sticky, prohibiting their smooth slide against each other. Once a fault has locked, the continued relative motion of the tectonic plates accumulates an increasing amount of strain energy around the fault surface. When the stress becomes sufficient to break through the asperity, a sudden slide releases the stored energy, causing an earthquake. Earthquakes can also be induced by the natural rupture of geological faults, by volcanic activity, landslides, mine blasts and even nuclear tests. Each year around 500,000 earthquakes are detected with instrumentation. Only about 100,000 of these are sufficiently strong to be felt by humans. Seismologists approximate the distribution of earthquake amplitudes with the power law (8.14) with α ≈ 1.67 (Figure 8.17c) [20].

Earthquakes are rarely considered a manifestly network phenomenon, given the difficulty of mapping out the precise network of interdependencies that causes them. Yet, the resulting cascading failures bear many similarities to network-based cascading events, suggesting common mechanisms.

The power-law distribution (8.14) followed by blackouts, information cascades and earthquakes indicates that most cascading failures are relatively small. These small cascades capture the loss of electricity in a few houses, tweets of little interest to most users, or earthquakes so small that one needs sensitive instruments to detect them. Equation (8.14) predicts that these numerous small events coexist with a few exceptionally large events. Examples of such major cascades include the 2003 power outage in North America (Figure 8.16), the tweet Iran Election Crisis: 10 Incredible YouTube Videos http://bit.ly/vPDLo that was shared 1,399 times [21], or the January 2010 earthquake in Haiti, with over 200,000 victims. Interestingly, the avalanche exponents reported by electrical engineers, media researchers and seismologists are surprisingly close to each other, being between 1.6 and 2 (Table 8.2).

Figure 8.17
Cascade Size Distributions
(a) POWER FAILURES. The distribution of energy loss for all North American blackouts between 1984 and 1998, as documented by the North American Electrical Reliability Council. Horizontal axis: s, energy unserved (MWh); vertical axis: p(s). The distribution is typically fitted to (8.14). The reported exponents for different countries are listed in Table 8.2. After [18].
(b) TWITTER CASCADES. The distribution of cascade sizes on Twitter. Horizontal axis: s, retweet number; vertical axis: p(s). While most tweets go unnoticed, a tiny fraction of tweets are shared thousands of times. Overall the retweet numbers are well approximated with (8.14) with α ≃ 1.75. After [19].
(c) EARTHQUAKES. The cumulative distribution P(s) of earthquake amplitudes recorded between 1977 and 2000, shown separately for shallow (0-70 km), intermediate (70-300 km) and deep (300-700 km) events. The dashed lines indicate the power law fit (8.14) used by seismologists to characterize the distribution. The earthquake magnitude shown on the horizontal axis is the logarithm of s, which is the amplitude of the observed seismic waves. After [20].

Cascading failures are documented in many other environments:
• The consequences of bad weather or mechanical failures can cascade through airline schedules, delaying multiple flights and stranding thousands of passengers (BOX 8.3) [22].
• The disappearance of a species can cascade through the food web of an ecosystem, inducing the extinction of numerous species and altering the habitat of others [23, 24, 25, 26].
• The shortage of a particular component can cripple supply chains. For example, the 2011 floods in Thailand resulted in a chronic shortage of car components that disrupted the production chain of more than 1,000 automotive factories worldwide. Therefore the damage was not limited to the flooded factories, but resulted in worldwide insurance claims reaching $20 billion [27].

Figure 8.18
Information Cascades
Examples of information cascades on Twitter. Nodes denote Twitter accounts, the top node corresponding to the account that first posted a certain shortened URL. The links correspond to those who retweeted it. These cascades capture the heterogeneity of information avalanches: most URLs are not retweeted at all, appearing as single nodes in the figure. Some, however, start major retweet avalanches, like the one seen in the bottom panel. After [19].

In summary, cascading effects are observed in systems of rather different nature. Their size distribution is well approximated with the power law (8.14), implying that most cascades are too small to be noticed; a few, however, are huge, having a global impact. The goal of the next section is to understand the origin of these phenomena and to build models that can reproduce their salient features.
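To make the fitting step behind (8.14) concrete, the sketch below draws synthetic cascade sizes from a power law and recovers the avalanche exponent. It uses the continuous maximum-likelihood estimator α = 1 + n / Σ ln(s_i / s_min), a standard technique that is not necessarily the one used in [18, 19, 20]. The function names, the sample size and the choice s_min = 1 are assumptions, and real cascade sizes such as retweet counts are discrete, for which this estimator is only approximate.

```python
import math
import random


def sample_power_law(alpha, s_min, n, rng):
    """Draw n values with density p(s) ~ s^(-alpha) for s >= s_min (inverse transform)."""
    return [s_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_exponent(sizes, s_min):
    """Continuous maximum-likelihood estimate of alpha from the values at or above s_min."""
    tail = [s for s in sizes if s >= s_min]
    return 1.0 + len(tail) / sum(math.log(s / s_min) for s in tail)


rng = random.Random(0)
sizes = sample_power_law(alpha=1.75, s_min=1.0, n=50_000, rng=rng)  # synthetic data
alpha_hat = estimate_exponent(sizes, s_min=1.0)
print(alpha_hat)
```

With this many samples the estimate should land close to the exponent used to generate the data. With empirical data the choice of s_min matters, since the power law typically holds only in the tail of the distribution.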

Table 8.2
Avalanche Exponents in Real Systems

The reported avalanche exponents of the power law distribution (8.14) for energy loss in various countries [18], Twitter cascades [19] and earthquake sizes [20]. The third column indicates the nature of the measured cascade size s, corresponding to power or energy not served, the number of retweets generated by a typical tweet and the amplitude of the seismic wave.

SOURCE | EXPONENT | CASCADE
Power grid (North America) | |
Power grid (Sweden) | |
Power grid (Norway) | |
Power grid (New Zealand) | |
Power grid (China) | |
Twitter Cascades | 1.75 | Retweets
Earthquakes | 1.67 | Seismic Wave

BOX 8.3 CASCADING FLIGHT CONGESTIONS

Flight delays in the U.S. have an economic impact of over $40 billion per year [28], caused by the need for enhanced operations, passenger loss of time, decreased productivity and missed business and leisure opportunities. A flight delay is the time difference between the expected and actual departure/arrival times of a flight. Airline schedules include a buffer period between consecutive flights to accommodate short delays. When a delay exceeds this buffer, subsequent flights that use the same aircraft, crew or gate are also delayed. Consequently a delay can propagate in a cascade-like fashion through the airline network. While most flights in 2010 were on time, 37.5% arrived or departed late [22]. The delay distribution follows (8.14), implying that while most flights were delayed by just a few minutes, a few were hours behind schedule. These long delays induce correlated delay patterns, a signature of cascading congestions in the air transportation system (Figure 8.19).

Figure 8.19
Clusters of Congested Airports
U.S. aviation map showing congested airports as purple nodes and those with normal traffic as green nodes. The lines correspond to the direct flights between them on March 12, 2010. The clustering of the congested airports indicates that the delays are not independent of each other, but cascade through the airport network. After [22].

SECTION 8.6

MODELING CASCADING FAILURES

The emergence of a cascading event depends on many variables, from the structure of the network on which the cascade propagates, to the nature of the propagation process and the breakdown criteria of each individual component. The empirical results indicate that despite the diversity of these variables, the size distribution of the observed avalanches is universal, being independent of the particularities of the system. The purpose of this section is to understand the mechanisms governing cascading phenomena and to explain the power-law nature of the avalanche size distribution. Numerous models have been proposed to capture the dynamics of cascading events [18, 29, 30, 31, 32, 33, 34, 35]. While these models differ in the degree of fidelity they employ to capture specific phenomena, they indicate that systems that develop cascades share three key ingredients: (i) The system is characterized by some flow over a network, like the flow of electric current in the power grid or the flow of information in communication systems. (ii) Each component has a local breakdown rule that determines when it contributes to a cascade, either by failing (power grid, earthquakes) or by choosing to pass on a piece of information (Twitter). (iii) Each system has a mechanism to redistribute the traffic to other nodes upon the failure or the activation of a component. Next, we discuss two models that predict the characteristics of cascading failures at different levels of abstraction.


FAILURE PROPAGATION MODEL

Introduced to model the spread of ideas and opinions [30], the failure propagation model is frequently used to describe cascading failures as well [35]. The model is defined as follows:

Consider a network with an arbitrary degree distribution, where each node contains an agent. An agent i can be in the state 0 (active or healthy) or 1 (inactive or failed), and is characterized by a breakdown threshold φi = φ for all i.

All agents are initially in the healthy state 0. At time t = 0 one agent switches to state 1, corresponding to an initial component failure or to the release of a new piece of information. In each subsequent time step we randomly pick an agent and update its state following a threshold rule:
• If the selected agent i is in state 0, it inspects the state of its ki neighbors. The agent i adopts state 1 (i.e. it also fails) if at least a φ fraction of its ki neighbors are in state 1, otherwise it retains its original state 0.
• If the selected agent i is in state 1, it does not change its state.

In other words, a healthy node i changes its state if a φ fraction of its neighbors have failed. Depending on the local network topology, an initial perturbation can die out immediately, failing to induce the failure of any other node. It can also lead to the failure of multiple nodes, as illustrated in Figure 8.20a,b. The simulations document three regimes with distinct avalanche characteristics (Figure 8.20c):
• Subcritical Regime If ⟨k⟩ is high, changing the state of a node is unlikely to move other nodes over their threshold, as the healthy nodes have many healthy neighbors. In this regime cascades die out quickly and their sizes follow an exponential distribution. Hence the system is unable to support large global cascades (blue symbols, Figure 8.20c,d).
• Supercritical Regime If ⟨k⟩ is small, flipping a single node can put several of its neighbors over the threshold, triggering a global cascade. In this regime perturbations induce major breakdowns (purple symbols, Figure 8.20c,d).
• Critical Regime At the boundary of the subcritical and supercritical regime the avalanches have widely different sizes. Numerical simulations indicate that in this regime the avalanche sizes s follow (8.14) (green and orange symbols, Figure 8.20d) with α = 3/2 if the underlying network is random.

Figure 8.20
Failure Propagation Model
(a,b) The development of a cascade in a small network in which each node has the same breakdown threshold φ = 0.4. Initially all nodes are in state 0, shown as green circles. After node A changes its state to 1 (purple), its neighbors B and E will have a fraction f = 1/2 > 0.4 of their neighbors in state 1. Consequently they also fail, changing their state to 1, as shown in (b). In the next time step C and D will also fail, as both have f > 0.4. Consequently the cascade sweeps the whole network, reaching a size s = 5. One can check that if we initially flip node B, it will not induce an avalanche.
(c) The phase diagram of the failure propagation model in terms of the threshold function φ (horizontal axis) and the average degree ⟨k⟩ (vertical axis) of the network on which the avalanche propagates. The continuous line encloses the region of the (⟨k⟩, φ) plane in which the cascades can propagate in a random graph, separating the supercritical region from the subcritical one.
(d) Cascade size distributions p(s) for N = 10,000 and φ = 0.18, ⟨k⟩ = 1.05 (green, lower critical point), ⟨k⟩ = 3.0 (purple, supercritical), ⟨k⟩ = 5.76 (orange, upper critical point) and ⟨k⟩ = 10.0 (blue, subcritical). At the lower critical point we observe a power law p(s) with exponent α = 3/2. In the supercritical regime we have only a few small avalanches, as most cascades are global. In the upper critical and subcritical regime we see only small avalanches. After [30].
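The threshold rule of the failure propagation model can be traced on the five-node example of Figure 8.20a,b. In the sketch below the edge list is an assumption, reconstructed from the fractions f quoted in the figure caption rather than taken from the figure itself. The update also sweeps over all nodes in turn instead of picking them at random; because a failed node never recovers, the final set of failed nodes should not depend on the update order.

```python
def cascade(adj, seed, phi):
    """Return the set of failed nodes once the threshold rule changes nothing more."""
    failed = {seed}
    changed = True
    while changed:
        changed = False
        for node, neighbors in adj.items():
            if node in failed or not neighbors:
                continue
            # Fraction of this node's neighbors that are in state 1 (failed).
            frac = sum(n in failed for n in neighbors) / len(neighbors)
            if frac >= phi:
                failed.add(node)
                changed = True
    return failed


# Assumed topology: consistent with f = 1/2 for B and E, f = 1/3 for C and
# f = 0 for D after the failure of A, as quoted in the Figure 8.20 caption.
edges = [("A", "B"), ("A", "C"), ("A", "E"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

size_from_a = len(cascade(adj, "A", phi=0.4))
size_from_b = len(cascade(adj, "B", phi=0.4))
print(size_from_a, size_from_b)  # prints: 5 1
```

On this assumed network an initial failure of A sweeps all five nodes, while an initial failure of B leaves every other node below the threshold, matching the behavior described in the caption.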

BRANCHING MODEL

Given the complexity of the failure propagation model, it is hard to analytically predict the scaling behavior of the obtained avalanches. To understand the power-law nature of p(s) and to calculate the avalanche exponent α, we turn to the branching model. This is the simplest model that still captures the basic features of a cascading event.

The model builds on the observation that each cascading failure follows a branching process. Indeed, let us call the node whose initial failure triggers the avalanche the root of the tree. The branches of the tree are the nodes whose failure was triggered by this initial failure. For example, in Figures 8.20a,b, the breakdown of node A starts the avalanche, hence A is the root of the tree. The failure of A leads to the failure of B and E, representing the two branches of the tree. Subsequently E induces the failure of D and B leads to the failure of C (Figure 8.21a).

The branching model captures the essential features of avalanche propagation (Figure 8.21). The model starts with a single active node. In the next time step each active node produces k offspring, where k is selected from a pk distribution. If a node selects k = 0, that branch dies out (Figure 8.21b). If it selects k > 0, it will have k new active sites. The size of an avalanche corresponds to the size of the tree when all active sites have died out (Figure 8.21c).

Figure 8.21
Branching Model
(a) The branching process mirroring the propagation of the failure shown in Figure 8.20a,b. The perturbation starts from node A, whose failure flips B and E, which in turn flip C and D, respectively.
(b) An elementary branching process. Each active link (green) can become inactive with probability p0 = 1/2 (top) or give birth to two new active links with probability p2 = 1/2 (bottom).
(c) To analytically calculate p(s) we map the branching process into a diffusion problem. For this we show the number of active sites, x(t), as a function of time t. A nonzero x(t) means that the avalanche persists. When x(t) becomes zero, we lose all active sites and the avalanche ends. In the example shown in the image this happens at t = 5, hence the size of the avalanche is tmax + 1 = 6. An exact mapping between the branching model and a one dimensional random walk helps us calculate the avalanche exponent. Consider a branching process starting from a stub with one active end. When the active site becomes inactive, it decreases the number of active sites, i.e. x → x − 1. When the active site branches, it creates two active sites, i.e. x → x + 1. This maps the avalanche size s to the time it takes for the walk that starts at x = 1 to reach x = 0 for the first time. This is a much studied process in random walk theory, predicting that the return time distribution follows a power law with exponent 3/2 [32]. For a branching process corresponding to a scale-free pk, the avalanche exponent depends on γ, as shown in Figure 8.22.
(d,e,f) Typical avalanches generated by the branching model in the subcritical (d), supercritical (e) and critical regime (f). The green node in each cascade marks the root of the tree, representing the first perturbation. In (d) and (f) we show multiple trees, while in (e) we show only one, as each tree (avalanche) grows indefinitely.
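The elementary branching process of Figure 8.21b is simple to simulate through its random walk mapping. The sketch below assumes that each active site either dies or creates two active sites, takes the avalanche size to be the number of sites processed before the walk first reaches x = 0, and truncates runaway avalanches at a cap. The function name, the cap and the number of runs are hypothetical choices.

```python
import random


def avalanche_size(p2, rng, cap=10_000):
    """Simulate one avalanche of the two-outcome branching process.

    Each processed site dies (x -> x - 1) with probability 1 - p2 or creates
    two active sites (x -> x + 1) with probability p2. The size is the number
    of sites processed before x first reaches 0, truncated at `cap`.
    """
    active, size = 1, 0
    while active > 0 and size < cap:
        size += 1
        active += 1 if rng.random() < p2 else -1
    return size


rng = random.Random(1)
critical = [avalanche_size(0.5, rng) for _ in range(5000)]     # <k> = 2 * 0.5 = 1
subcritical = [avalanche_size(0.3, rng) for _ in range(5000)]  # <k> = 2 * 0.3 = 0.6

frac_size_one = sum(s == 1 for s in critical) / len(critical)
mean_subcritical = sum(subcritical) / len(subcritical)
print(frac_size_one, mean_subcritical, max(critical))
```

At p2 = 1/2 about half of the avalanches should end after the first site, while a few reach the cap, reflecting the broad size distribution at the critical point. At p2 = 0.3 the mean size should stay close to 1/(1 − ⟨k⟩) = 2.5, the expected total progeny of a subcritical branching process.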


The branching model predicts the same phases as those observed in the cascading failures model. The phases are now determined only by ⟨k⟩, hence by the pk distribution:

• Subcritical Regime: ⟨k⟩ < 1
For ⟨k⟩ < 1 on average each branch has fewer than one offspring. Consequently each tree will terminate quickly (Figure 8.21d). In this regime the avalanche sizes follow an exponential distribution.

• Supercritical Regime: ⟨k⟩ > 1
For ⟨k⟩ > 1 on average each branch has more than one offspring. Consequently the tree will continue to grow indefinitely (Figure 8.21e). Hence in this regime all avalanches are global.

• Critical Regime: ⟨k⟩ = 1
For ⟨k⟩ = 1 on average each branch has exactly one offspring. Consequently some trees are large and others die out shortly (Figure 8.21f). Numerical simulations indicate that in this regime the avalanche size distribution follows the power law (8.14).

Figure 8.22 The Avalanche Exponent
The dependence of the avalanche exponent α (vertical axis) on the degree exponent γ (horizontal axis) of the network on which the avalanche propagates, according to (8.15). The plot indicates that for 2 < γ < 3 the avalanche exponent depends on the degree exponent. Beyond γ = 3, however, the avalanches behave as if they were spreading on a random network, in which case we have α = 3/2.

The branching model can be solved analytically, allowing us to determine the avalanche size distribution for an arbitrary pk. If pk is exponentially bounded, e.g. it has an exponential tail, the calculations predict α = 3/2. If, however, pk is scale-free, then the avalanche exponent depends on the power-law exponent γ, following (Figure 8.22) [32, 33]

$$\alpha = \begin{cases} 3/2, & \gamma \ge 3 \\ \gamma/(\gamma - 1), & 2 < \gamma < 3. \end{cases} \tag{8.15}$$
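The three regimes of the branching model can be explored with a short simulation. The following is a minimal sketch, not the procedure used to produce the figures: the function names, the offspring distributions and the `max_size` cap (needed because supercritical trees never terminate) are illustrative assumptions. Each active site either dies (k = 0) or branches into two (k = 2), which matches the random walk mapping x → x ± 1.

```python
import random

def mean_offspring(offspring_probs):
    """Average number of offspring <k> for a distribution given as [p_0, p_1, ...]."""
    return sum(k * p for k, p in enumerate(offspring_probs))

def avalanche_size(offspring_probs, rng, max_size=10_000):
    """Grow one tree of the branching model and return its size.

    The tree starts from a single active node (the root). Each active node
    draws its number of offspring k from offspring_probs and then becomes
    inactive. Growth stops when no active node is left, or when the tree
    reaches max_size (an illustrative cap for the supercritical regime).
    """
    ks = list(range(len(offspring_probs)))
    active = 1   # number of currently active sites (the walker position x)
    size = 1     # total number of nodes in the tree so far
    while active > 0 and size < max_size:
        k = rng.choices(ks, weights=offspring_probs)[0]
        active += k - 1   # the processed site becomes inactive, k new ones appear
        size += k
    return size

if __name__ == "__main__":
    rng = random.Random(1)
    regimes = {
        "subcritical (<k> = 0.5)": [0.75, 0.0, 0.25],
        "critical (<k> = 1.0)": [0.5, 0.0, 0.5],
        "supercritical (<k> = 1.5)": [0.25, 0.0, 0.75],
    }
    for name, probs in regimes.items():
        sizes = [avalanche_size(probs, rng) for _ in range(500)]
        capped = sum(s >= 10_000 for s in sizes)
        print(name, "largest:", max(sizes), "hit the cap:", capped)
```

In the subcritical case the trees stay small, in the supercritical case a finite fraction of the trees reaches the cap, and in the critical case the sizes are broadly distributed, which is the regime in which the power law (8.14) is expected.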


SECTION 8.7

BUILDING ROBUSTNESS

Can we enhance a network’s robustness? In this section we show that the insights we gained about the factors that influence robustness allow us to design networks that can simultaneously resist random failures and attacks. We also discuss how to stop a cascading failure, allowing us to enhance a system’s dynamical robustness. Finally, we apply the developed tools to the power grid, linking its robustness to its reliability.

Designing Robust Networks

Designing networks that are simultaneously robust to attacks and random failures appears to be a conflicting desire [36, 37, 38, 39]. For example, the hub-and-spoke network of Figure 8.23a is robust to random failures, as only the failure of its central node can break the network into isolated components. Therefore, the probability that a random failure will fragment the network is 1/N, which is negligible for large N. At the same time this network is vulnerable to attacks, as the removal of a single node, its central hub, breaks the network into isolated nodes.

We can enhance this network’s attack tolerance by connecting its peripheral nodes (Figure 8.23b), so that the removal of the hub does not fragment the network. There is a price, however, for this enhanced robustness: it requires us to double the number of links. If we define the cost to build and maintain a network to be proportional to its average degree ⟨k⟩, the cost of the network of Figure 8.23b is 24/7, double the cost 12/7 of the network of Figure 8.23a. The increased cost prompts us to refine our question: Can we maximize the robustness of a network to both random failures and targeted attacks without changing the cost?

Figure 8.23 Enhancing Robustness
(a) A hub-and-spoke network (⟨k⟩ = 12/7) is robust to random failures but has a low tolerance to an attack that removes its central hub.
(b) By connecting some of the small degree nodes, the reinforced network (⟨k⟩ = 24/7) has a higher tolerance to targeted attacks. This increases the cost measured by ⟨k⟩, which is higher for the reinforced network.
(c) Random (fcrand), targeted (fctarg) and total (fctot) percolation thresholds for scale-free networks as a function of the degree exponent γ, for a network with kmin = 3.

A network’s robustness against random failures is captured by its percolation threshold fc, which is the fraction of the nodes we must remove for the network to fall apart. To enhance a network's robustness we must increase fc. According to (8.7) fc depends only on ⟨k⟩ and ⟨k²⟩. Consequently the degree distribution which maximizes fc needs to maximize ⟨k²⟩ if we wish to keep the cost ⟨k⟩ fixed. This is achieved by a bimodal distribution, corresponding to a network with only two kinds of nodes, with degrees kmin and kmax (Figure 8.23a,b).
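The argument above can be illustrated numerically. The sketch below assumes the form of (8.7) summarized in BOX 8.6, fc = 1 − 1/(⟨k²⟩/⟨k⟩ − 1), and compares two degree sequences with the same cost ⟨k⟩ = 4. The function names and the two example sequences are illustrative choices, not taken from the text, and the formula is strictly valid only for large uncorrelated networks.

```python
def degree_moments(degrees):
    """Return (<k>, <k^2>) for a list of node degrees."""
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    return k1, k2

def critical_threshold(degrees):
    """Random-failure threshold f_c = 1 - 1 / (<k^2>/<k> - 1).

    Returns 0.0 when <k^2>/<k> <= 2, i.e. when the Molloy-Reed
    criterion says there is no giant component to destroy.
    """
    k1, k2 = degree_moments(degrees)
    kappa = k2 / k1
    if kappa <= 2:
        return 0.0
    return 1.0 - 1.0 / (kappa - 1.0)

# Two hypothetical networks with N = 1000 nodes and the same cost <k> = 4.
regular = [4] * 1000                  # every node has degree 4
bimodal = [3] * 900 + [13] * 100      # two kinds of nodes, k_min = 3 and k_max = 13

for name, degrees in (("regular", regular), ("bimodal", bimodal)):
    k1, k2 = degree_moments(degrees)
    print(name, "<k> =", k1, "<k^2> =", k2, "f_c =", round(critical_threshold(degrees), 4))
```

At equal ⟨k⟩ the bimodal sequence has the larger ⟨k²⟩ and therefore the larger fc, in line with the reasoning of this paragraph.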

If we wish to simultaneously optimize the network topology against both random failures and attacks, we search for topologies that maximize the sum (Figure 8.23c)

$$f_c^{\mathrm{tot}} = f_c^{\mathrm{rand}} + f_c^{\mathrm{targ}}. \tag{8.16}$$

A combination of analytical arguments and numerical simulations indicates that this too is best achieved by the bimodal degree distribution [36, 37, 38, 39]

$$p_k = (1 - r)\,\delta(k - k_{\min}) + r\,\delta(k - k_{\max}), \tag{8.17}$$

describing a network in which an r fraction of nodes have degree kmax and the remaining (1 − r) fraction have degree kmin.

As we show in ADVANCED TOPICS 8.G, the maximum of fctot is obtained when r = 1/N, i.e. when there is a single node with degree kmax and the remaining nodes have degree kmin. In this case the value of kmax depends on the system size as

$$k_{\max} = A N^{2/3}. \tag{8.18}$$

In other words, a network that is robust to both random failures and attacks has a single hub with degree (8.18), and the rest of the nodes have the same degree kmin. This hub-and-spoke topology is obviously robust against random failures as the chance of removing the central hub is 1/N, tiny for large N.

The obtained network may appear to be vulnerable to an attack that removes its hub, but it is not necessarily so. Indeed, the network’s giant component is held together by both the central hub and the many nodes with degree kmin, which for kmin > 1 form a giant component on their own. Hence while the removal of the kmax hub causes a major one-time loss, the remaining low degree nodes are robust against subsequent targeted removal (Figure 8.24c).

Figure 8.24 Optimizing Attack and Failure Tolerance
The figure illustrates the optimal network topologies predicted by (8.16) and (8.17), consisting of a single hub of size (8.18), with the rest of the nodes having the same degree kmin determined by ⟨k⟩. The left panels show the network topology for N = 300; the right panels show the failure/attack curves (P∞ as a function of f) for N = 10,000.
(a) ⟨k⟩ = 2. For small ⟨k⟩ the hub holds the network together. Once we remove this central hub the network breaks apart. Hence the attack and error curves are well separated, indicating that the network is robust to random failures but fragile to attacks.
(b) ⟨k⟩ = 3. For larger ⟨k⟩ a giant component emerges that exists even without the central hub. Hence while the hub enhances the system’s robustness to random failures, it is no longer essential for the network. In this case both the attack threshold fctarg and the error threshold fcrand are large.
(c) ⟨k⟩ = 5. For even larger ⟨k⟩ the error and the attack curves are indistinguishable, indicating that the network's response to attacks and random failures is indistinguishable. In this case the network is well connected even without its central hub.
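The role of the hub in these optimal topologies can be probed with a small experiment. The sketch below is an illustration under stated assumptions, not the procedure behind Figure 8.24: it wires the degree sequence with a simple stub-matching (configuration model) step that discards self-loops and multi-links, it sets the unspecified prefactor A of (8.18) to 1, and the function names are hypothetical.

```python
import random
from collections import deque

def hub_network(n, k_min, a=1.0, seed=0):
    """One hub (node 0) of degree ~ a * n^(2/3); all other nodes have degree k_min.

    Returns an adjacency list (list of sets). Stubs are matched at random;
    self-loops are dropped and repeated links collapse, so a few nodes may
    end up with a slightly smaller degree than requested.
    """
    rng = random.Random(seed)
    k_max = round(a * n ** (2 / 3))
    degrees = [k_max] + [k_min] * (n - 1)
    if sum(degrees) % 2:          # the total number of stubs must be even
        degrees[0] += 1
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    adj = [set() for _ in range(n)]
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def giant_fraction(adj, removed=()):
    """Largest connected component among the surviving nodes, divided by N."""
    seen = set(removed)
    best = 0
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:               # breadth-first search of one component
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(adj)

if __name__ == "__main__":
    for k_min in (1, 3):
        net = hub_network(2000, k_min, seed=1)
        print("k_min =", k_min,
              "intact:", giant_fraction(net),
              "hub removed:", giant_fraction(net, {0}))
```

For kmin = 1 the surviving nodes can form components of at most two nodes once the hub is gone, while for kmin = 3 the low-degree nodes keep a giant component on their own, which is the contrast the text describes.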


BOX 8.4 HALTING CASCADING FAILURES

Can we avoid cascading failures? The first instinct is to reinforce the network by adding new links. The problem with reinforcement is that in most real systems the time needed to establish a new link is much larger than the timescale of a cascading failure. For example, thanks to regulatory, financial and legal barriers, building a new transmission line on the power grid can take up to two decades. In contrast, a cascading failure can sweep the power grid in a few seconds.

In a counterintuitive fashion, the impact of cascading failures can be reduced through selective node and link removal [40]. To do so we note that each cascading failure has two parts:

(i) Initial failure is the breakdown of the first node or link, representing the source of the subsequent cascade.
(ii) Propagation is when the initial failure induces the failure of additional nodes and starts cascading through the network.

Typically the time interval between (i) and (ii) is much shorter than the time scale over which the network could be reinforced. Yet, simulations indicate that the size of a cascade can be reduced if we intentionally remove additional nodes right after the initial failure (i), but before the failure could propagate. Even though the intentional removal of a node or a link causes further damage to the network, the removal of a well chosen component can suppress the cascade propagation [40]. Simulations indicate that to limit the size of the cascades we must remove nodes with small loads and links with large excess load in the vicinity of the initial failure. The mechanism is similar to the method used by firefighters, who set a controlled fire in the fireline to consume the fuel in the path of a wildfire.

A dramatic manifestation of this approach is provided by the Lazarus effect, the ability to revive a previously "dead" bacterium, i.e. one that is unable to grow and multiply. This can be achieved through the knockout of a few well selected genes (Figure 8.25) [41]. Therefore, in a counterintuitive fashion, controlled damage can be beneficial to a network.

Figure 8.25 Lazarus Effect
The growth rate of a bacterium is determined by its ability to generate biomass, the molecules it needs to build its cell wall, DNA and other cellular components. If some key genes are missing, the bacterium is unable to generate the necessary biomass. Unable to multiply, it will eventually die. Genes in whose absence the biomass flux is zero are called essential.
The plot shows the biomass flux (vertical axis) as a function of the number of genes (horizontal axis) for E. coli, a bacterium frequently studied by biologists. The original mutant is missing an essential gene, hence its biomass flux is zero, as shown on the vertical axis. Consequently, it cannot multiply. Yet, as the figure illustrates, by removing five additional genes we can turn on the biomass flux. Therefore, counterintuitively, we can revive a dead organism through the removal of further genes, a phenomenon called the Lazarus effect [41].


CASE STUDY: ESTIMATING ROBUSTNESS

The European power grid is an ensemble of more than twenty national power grids consisting of over 3,000 generators and substations (nodes) and 200,000 km of transmission lines (Figure 8.26a-d). The network's degree distribution can be approximated with (Figure 8.26e) [42, 43]

$$p_k = \frac{e^{-k/\langle k \rangle}}{\langle k \rangle}, \tag{8.19}$$

indicating that its topology is characterized by a single parameter, ⟨k⟩. Such exponential pk emerges in growing networks that lack preferential attachment (SECTION 5.5).

By knowing ⟨k⟩ for each national power grid, we can predict the respective network's critical threshold fctarg for attacks. As Figure 8.26f shows, for national power grids with ⟨k⟩ > 1.5 there is a reasonable agreement between the observed and the predicted fctarg (Group 1). However, for power grids with ⟨k⟩ < 1.5 (Group 2) the predicted fctarg underestimates the real fctarg, indicating that these national networks are more robust to attacks than expected based on their degree distribution. As we show next, this enhanced robustness correlates with the reliability of the respective national networks.

To test the relationship between robustness and reliability, we use several quantities, collected and reported for each power failure: (1) energy not supplied; (2) total loss of power; (3) average interruption time, measured in minutes per year. The measurements indicate that Group 1 networks, for which the real and the theoretical fctarg agree, represent two thirds of the full network size and carry almost as much power and energy as the Group 2 networks. Yet, Group 1 accumulates more than five times the average interruption time, more than two times the recorded power losses and almost four times the undelivered energy compared to Group 2 [42]. Hence, the national power grids in Group 1 are significantly more fragile than the power grids in Group 2. This result offers direct evidence that networks that are topologically more robust are also more reliable. At the same time this finding is rather counterintuitive: One would expect the denser networks to be more robust. We find, however, that the sparser power grids display enhanced robustness.

In summary, a better understanding of the network topology is essential to improve the robustness of complex systems. We can enhance robustness by either designing network topologies that are simultaneously robust to both random failures and attacks, or by interventions that limit the spread of cascading failures. These results may suggest that we should redesign the topology of the Internet and the power grid to enhance their robustness [44]. Given the opportunity to do so, this could indeed be achieved. Yet, these infrastructural networks were built incrementally over decades, following the self-organized growth process described in the previous chapters. Given the enormous cost of each node and link, it is unlikely that we would ever be given a chance to rebuild them.


Figure 8.26 The Power Grid
(a) The power grid is a complex infrastructure consisting of (1) power generators, (2) switching units, (3) the high voltage transmission grid, (4) transformers, (5) low voltage lines, (6) consumers, like households or businesses. When we study the network behind the power grid, many of these details are ignored.
(b,c,d) The Italian power grid with the details of production and consumption. Once we strip these details from the network, we obtain the spatial network shown in (c). Once the spatial information is also removed, we arrive at the network (d), which is the typical object of study at the network level.
(e) The complementary cumulative degree distribution Pk of the European power grid. The plot shows the data for the full network (UCTE) and separately for Italy, and the joint network of UK and Ireland, indicating that the national grid’s Pk also follows (8.19).
(f) The phase space (fctarg, ⟨k⟩) of exponential uncorrelated networks under attack, where fctarg is the fraction of hubs we must remove to fragment the network. The continuous curve corresponds to the critical boundary for attacks, below which the network retains its giant component. The plot also shows the estimated fctarg(⟨k⟩) for attacks for the thirty-three national power grids within the EU, each shown as a separate circle. The plot indicates the presence of two classes of power grids. For countries with ⟨k⟩ > 1.5 (Group 1), the analytical prediction for fctarg agrees with the numerically observed values. For countries with ⟨k⟩ < 1.5 (Group 2) the analytical prediction underestimates the numerically observed values. Therefore, Group 2 national grids show enhanced robustness to attacks, meaning that they are more robust than expected for a random network with the same degree sequence. After [42].

SECTION 8.8

SUMMARY: ACHILLES' HEEL

The masterminds of the September 11, 2001 attacks did not choose their targets at random: the World Trade Center in New York, the Pentagon, and the White House (an intended target) in Washington DC are the hubs of America’s economic, military, and political power [45]. Yet, while causing a human tragedy far greater than any other event America has experienced since the Vietnam war, the attacks failed to topple the network. They did offer, however, an excuse to start new wars, like the Iraq and the Afghan wars, triggering a series of cascading events whose impact was far more devastating than the 9/11 terrorist attacks themselves. Yet, all networks, ranging from the economic to the military and the political web, survived. Hence, we can view 9/11 as a tale of robustness and network resilience (BOX 8.5). The roots of this robustness were uncovered in this chapter: Real networks have a whole hierarchy of hubs. Taking out any one of them is not sufficient to topple the underlying network.

The remarkable robustness of real networks represents good news for most complex systems. Indeed, there are uncountable errors in our cells, from misfolding proteins to the late arrival of a transcription factor. Yet, the robustness of the underlying cellular network allows our cells to carry on their normal functions. Network robustness also explains why we rarely notice the effect of router errors on the Internet or why the disappearance of a species does not result in an immediate environmental catastrophe.

This topological robustness has its price, however: fragility against attacks. As we showed in this chapter, the simultaneous removal of several hubs will break any network. This is bad news for the Internet, as it allows crackers to design strategies that can harm this vital communication system. It is bad news for economic systems, as it indicates that hub removal can cripple the whole economy, as vividly illustrated by the 2009 financial meltdown. Yet, it is good news for drug design, as it suggests that an accurate map of cellular networks can help us develop drugs that can kill unwanted bacteria or cancer cells.

The message of this chapter is simple: Network topology, robustness, and fragility cannot be separated from one another. Rather, each complex system has its own Achilles’ Heel: the networks behind them are simultaneously robust to random failures but vulnerable to attacks.

When considering robustness, we cannot ignore the fact that most systems have numerous controls and feedback loops that help them survive in the face of errors and failures. Internet protocols were designed to ‘route around the trouble’, guiding the traffic away from routers that malfunction; cells have numerous mechanisms to dismantle faulty proteins and to shut down malfunctioning genes. This chapter documented a new contribution to robustness: the structure of the underlying network offers a system an enhanced failure tolerance.

The robustness of scale-free networks prompts us to ask: Could this enhanced robustness be the reason why many real networks are scale-free? Perhaps real systems have developed a scale-free architecture to satisfy their need for robustness. If this hypothesis is correct we should be able to set robustness as an optimization criterion and obtain a scale-free network. Yet, as we showed in SECTION 8.7, a network with maximal robustness has a hub-and-spoke topology. Its degree distribution is bimodal, rather than a power law. This suggests that robustness is not the principle that drives the development of real networks. Rather, networks are scale-free thanks to growth and preferential attachment. It so happens that scale-free networks also have enhanced robustness. Yet, they are not the most robust networks we could design.

BOX 8.5 ROBUSTNESS, RESILIENCE, REDUNDANCY

Redundancy and resilience are concepts deeply linked to robustness. It is useful to clarify the differences between them.

Robustness
A system is robust if it can maintain its basic functions in the presence of internal and external errors. In a network context robustness refers to the system's ability to carry out its basic functions even when some of its nodes and links may be missing.

Resilience
A system is resilient if it can adapt to internal and external errors by changing its mode of operation, without losing its ability to function. Hence resilience is a dynamical property that requires a shift in the system's core activities.

Redundancy
Redundancy implies the presence of parallel components and functions that, if needed, can replace a missing component or function. Networks show considerable redundancy in their ability to navigate information between two nodes, thanks to the multiple independent paths between most node pairs.

Figure 8.27 From Percolation to Robustness: A Brief History
The systematic study of network robustness started with a paper published in Nature (Figure 8.1) by Réka Albert, Hawoong Jeong and Albert-László Barabási [1], reporting the robustness of scale-free networks to random failures and their fragility to attacks. Yet, the analytical understanding of network robustness relies on percolation theory. In this context, particularly important were the contributions of Shlomo Havlin and collaborators, who established the formal link between robustness and percolation theory and showed that the percolation threshold of a scale-free network is determined by the moments of the degree distribution. A statistical physicist from Israel, Havlin has made multiple contributions to the study of networks, from discovering the self-similar nature of real networks [46] to exploring the robustness of layered networks [47].
Timeline entries (years marked on the axis: 1957, 1964, 2000, 2001):
• Mathematicians Simon Broadbent and John Hammersley introduce percolation and formalize many of its mathematical concepts [5]. The theory rose to prominence in the 1960s and 70s, finding applications from oil exploration to superconductivity.
• Paul Baran explores the vulnerability of communication networks to Soviet nuclear attacks, concluding that they are too centralized to be viable under attack. He proposes instead a mesh-like network architecture (BOX 8.2).
• Albert, Jeong and Barabási study the error and attack tolerance of complex networks, discovering their joint robustness to failures and fragility to attacks.
• Shlomo Havlin and his collaborators establish a formal link between network robustness and percolation theory, showing that the percolation threshold of a scale-free network is determined by the first two moments of the degree distribution.


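BOX 8.6 AT A GLANCE: NETWORK ROBUSTNESS

Molloy-Reed criterion: A giant component exists if
$$\frac{\langle k^2 \rangle}{\langle k \rangle} > 2$$

Random failures:
$$f_c = 1 - \frac{1}{\frac{\langle k^2 \rangle}{\langle k \rangle} - 1}$$

Random network:
$$f_c^{ER} = 1 - \frac{1}{\langle k \rangle}$$

Enhanced robustness:
$$f_c > f_c^{ER}$$

Attacks:
$$f_c^{\frac{2-\gamma}{1-\gamma}} = 2 + \frac{2-\gamma}{3-\gamma}\, k_{\min} \left( f_c^{\frac{3-\gamma}{1-\gamma}} - 1 \right)$$

Cascading failures:
$$p(s) \sim s^{-\alpha}, \qquad \alpha = \begin{cases} 3/2, & \gamma > 3 \\ \dfrac{\gamma}{\gamma - 1}, & 2 < \gamma < 3 \end{cases}$$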
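The attack threshold in the box is given only implicitly, so in practice it has to be solved numerically. The following sketch uses plain bisection on the interval (0, 1); the function name, the bracketing interval and the tolerance are illustrative assumptions, and the routine expects γ different from 1, 2 and 3, where the exponents or the prefactor are singular or the equation degenerates.

```python
def attack_threshold(gamma, k_min, tol=1e-12):
    """Solve f^((2-g)/(1-g)) = 2 + (2-g)/(3-g) * k_min * (f^((3-g)/(1-g)) - 1) for f.

    Returns the root found in (0, 1), or None if the residual does not
    change sign on that interval.
    """
    a = (2 - gamma) / (1 - gamma)
    b = (3 - gamma) / (1 - gamma)
    c = (2 - gamma) / (3 - gamma) * k_min

    def residual(f):
        # Left-hand side minus right-hand side of the attack equation.
        return f ** a - 2 - c * (f ** b - 1)

    lo, hi = 1e-12, 1.0
    if residual(lo) * residual(hi) > 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual(lo) * residual(mid) <= 0:
            hi = mid      # the sign change is in the lower half
        else:
            lo = mid      # the sign change is in the upper half
    return (lo + hi) / 2

if __name__ == "__main__":
    for gamma in (2.2, 2.5, 2.8, 3.5):
        print("gamma =", gamma, "k_min = 2, f_c =", attack_threshold(gamma, 2))
```

For γ = 2.5 and kmin = 2 the equation reduces to a quadratic in fc^(1/3), whose root in (0, 1) is fc = (2 − √2)³, which offers a convenient check of the solver.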


SECTION 8.9

HOMEWORK

8.1. Random Failure: Beyond Scale-Free Networks
Calculate the critical threshold fc for networks with
(a) Power law with exponential cutoff.
(b) Lognormal distribution.
(c) Delta distribution (all nodes have the same degree).
Assume that the networks are uncorrelated and infinite. Refer to Table 4.2 for the functional form of the distribution and the corresponding first and second moments. Discuss the consequences of the obtained results for network robustness.

8.2. Critical Threshold in Correlated Networks
Generate three networks with 10^4 nodes, that are assortative, disassortative and neutral and have a power-law degree distribution with degree exponent γ = 2.2. Use the Xulvi-Brunet & Sokolov algorithm described in SECTION 7.5 to generate the networks. With the help of a computer, study the robustness of the three networks against random failures, and compare their P∞(f)/P∞(0) ratio. Which network is the most robust? Can you explain why?

8.3. Failure of Real Networks
Determine the number of nodes that need to fail to break the networks listed in Table 4.1. Assume that each network is uncorrelated.

8.4. Conspiracy in Social Networks
In a Big Brother society, the thought police wants to follow a "divide and conquer" strategy by fragmenting the social network into isolated components. You belong to the resistance and want to foil their plans. There are rumours that the police wants to detain individuals that have many friends and individuals whose friends tend to know each other. The resistance puts you in charge to decide which individuals to protect: those whose friendship circle is highly interconnected or those with many friends. To decide you simulate two different attacks on your network, by removing (i) the nodes that have the highest clustering coefficient and (ii) the nodes that have the largest degree. Study the size of the giant component as a function of the fraction of removed nodes for the two attacks on the following networks:
(a) A network with N = 10^4 nodes generated with the configuration model (SECTION 4.8) and power-law degree distribution with γ = 2.5.
(b) A network with N = 10^4 nodes generated with the hierarchical model described in Figure 9.16 and ADVANCED TOPIC 9.B.
Which topological information is the most sensitive, clustering coefficient or degree, i.e. which one, if protected, limits the damage best? Would it be better if all individuals' information (clustering coefficient, degree, etc.) could be kept secret? Why?

8.5. Avalanches in Networks
Generate a random network with the Erdős-Rényi G(N,p) model and a scale-free network with the configuration model, with N = 10^3 nodes and average degree ⟨k⟩ = 2. Assume that on each node there is a bucket which can hold as many sand grains as the node degree. Simulate then the following process:
(a) At each time step add a grain to a randomly chosen node i.
(b) If the number of grains at node i reaches or exceeds its bucket size, then it becomes unstable and all the grains at the node topple to the buckets of its adjacent nodes.
(c) If this toppling causes any of the adjacent nodes' buckets to be unstable, subsequent topplings follow on those nodes, until there is no unstable bucket left. We call this sequence of topplings an avalanche, its size s being equal to the number of nodes that turned unstable following an initial perturbation (adding one grain).
Repeat (a)-(c) 10^4 times. Assume that at each time step a fraction 10^−4 of sand grains is lost in the transfer, so that the network buckets do not become saturated with sand. Study the avalanche distribution P(s).
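As a starting point for Problem 8.5, steps (a)-(c) can be sketched as follows. This is only one possible reading of the problem statement, with several assumptions made explicit: an unstable node sends one grain to each neighbor and keeps any excess, every transferred grain is lost independently with probability `loss`, a grain dropped on an isolated node is discarded, and the avalanche size counts the distinct nodes that toppled. The function names are hypothetical, and only the Erdős-Rényi case is shown.

```python
import random

def er_graph(n, p, rng):
    """Erdős-Rényi G(N, p) network as an adjacency list."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def sandpile(adj, steps, loss, rng):
    """Run the toppling process and return the list of avalanche sizes."""
    n = len(adj)
    grains = [0] * n
    sizes = []
    for _ in range(steps):
        i = rng.randrange(n)
        if not adj[i]:               # isolated node: the grain is discarded (assumption)
            sizes.append(0)
            continue
        grains[i] += 1               # step (a): add one grain
        toppled = set()
        unstable = [i] if grains[i] >= len(adj[i]) else []
        while unstable:              # steps (b) and (c): topple until stable
            u = unstable.pop()
            k = len(adj[u])
            if grains[u] < k:        # already relaxed by an earlier toppling
                continue
            grains[u] -= k
            toppled.add(u)
            for v in adj[u]:
                if rng.random() >= loss:     # the grain survives the transfer
                    grains[v] += 1
                    if grains[v] >= len(adj[v]):
                        unstable.append(v)
            if grains[u] >= k:       # still over capacity, topple again later
                unstable.append(u)
        sizes.append(len(toppled))
    return sizes

if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    net = er_graph(n, 2 / (n - 1), rng)      # average degree <k> = 2
    sizes = sandpile(net, steps=10_000, loss=1e-4, rng=rng)
    print("largest avalanche:", max(sizes))
```

With the small loss rate of the problem statement individual avalanches can become long, so a larger `loss` is convenient while testing the code.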


SECTION 8.10

ADVANCED TOPICS 8.A PERCOLATION IN SCALE-FREE NETWORKS

To understand how a scale-free network breaks apart as we approach the threshold (8.7), we need to determine the corresponding critical exponents γp, βp and ν. The calculations indicate that the scale-free property alters the value of these exponents, leading to systematic deviations from the exponents that characterize random networks (SECTION 8.2).

Let us start with the probability P∞ that a randomly selected node belongs to the giant component. According to (8.2) this follows a power law near pc (or fc in the case of node removal). The calculations predict that for a scale-free network the exponent βp depends on the degree exponent γ as [7, 48, 49, 50, 51]

$$\beta_p = \begin{cases} \dfrac{1}{3 - \gamma}, & 2 < \gamma < 3 \\ \dfrac{1}{\gamma - 3}, & 3 < \gamma < 4 \\ 1, & \gamma > 4. \end{cases} \tag{8.20}$$

Hence, while for a random network (corresponding to γ > 4) we have βp = 1, for most scale-free networks of practical interest βp > 1. Therefore, the giant component collapses faster in the vicinity of the critical point in a scale-free network than in a random network. The exponent characterizing the average component size near pc follows [48]

$$\gamma_p = \begin{cases} 1, & \gamma > 3 \\ -1, & 2 < \gamma < 3. \end{cases} \tag{8.21}$$

The negative γp for γ < 3 may appear surprising. Note, however, that for γ < 3 we always have a giant component. Hence, the divergence (8.1) cannot be observed in this regime.

For a randomly connected network with arbitrary degree distribution the size distribution of the finite clusters follows [48, 50, 51]

$$n_s \sim s^{-\tau} e^{-s/s^*}. \tag{8.22}$$

Here, ns is the number of clusters of size s and s* is the crossover cluster size. At criticality

$$s^* \sim |p - p_c|^{-\sigma}. \tag{8.23}$$

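The critical exponents are

$$\tau = \begin{cases} \dfrac{5}{2}, & \gamma > 4 \\ \dfrac{2\gamma - 3}{\gamma - 2}, & 2 < \gamma < 4, \end{cases} \tag{8.24}$$

$$\sigma = \begin{cases} \dfrac{3 - \gamma}{\gamma - 2}, & 2 < \gamma < 3 \\ \dfrac{\gamma - 3}{\gamma - 2}, & 3 < \gamma < 4 \\ \dfrac{1}{2}, & \gamma > 4. \end{cases} \tag{8.25}$$

The random network values τ = 5/2 and σ = 1/2 are recovered for γ > 4.

In summary, the exponents describing the breakdown of a scale-free network depend on the degree exponent γ. This is true even in the range 3 < γ < 4, where the percolation transition occurs at a finite threshold fc. The mean-field behavior predicted for percolation in infinite dimensions, capturing the response of a random network to random failures, is recovered only for γ > 4.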


SECTION 8.11

ADVANCED TOPICS 8.B MOLLOY-REED CRITERION

The purpose of this section is to derive the Molloy-Reed criterion, which allows us to calculate the percolation threshold of an arbitrary network [6]. For a giant component to exist each node that belongs to it must be connected to at least two other nodes on average (Figure 8.8). Therefore, the average degree ki of a randomly chosen node i that is part of the giant com-

ponent should be at least 2. Denote with P(ki ∣ i ↔ j) the conditional probability that a node in a network with degree ki is connected to a node j that

is part of the giant component. This conditional probability allows us to determine the expected degree of node i as [51]

〈ki∣i ↔ j〉 = ∑ ki P(ki∣i ↔ j) = 2 .

(8.26)

ki

In other words, ⟨ki ∣ i ↔ j⟩ should be equal or exceed two, the condition

for node i to be part of the giant component. We can write the probability appearing in the sum (8.26) as

P(ki∣i ↔ j) =

P(ki ,i ↔ j) P(i ↔ j∣ki )p(ki ) , = P(i ↔ j) P(i ↔ j)

(8.27)

where we used Bayes’ theorem in the last term. For a network with degree distribution pk, in the absence of degree correlations, we can write

$$P(i \leftrightarrow j) = \frac{2L}{N(N-1)} = \frac{\langle k \rangle}{N-1}, \qquad P(i \leftrightarrow j \mid k_i) = \frac{k_i}{N-1}, \tag{8.28}$$

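which express the fact that we can choose between N − 1 nodes to link to, each with probability 1/(N − 1), and that we can try this $k_i$ times. We can now return to (8.26), obtaining

$$\sum_{k_i} k_i P(k_i \mid i \leftrightarrow j) = \sum_{k_i} k_i \frac{P(i \leftrightarrow j \mid k_i)\, p(k_i)}{P(i \leftrightarrow j)} = \sum_{k_i} k_i \frac{k_i\, p(k_i)}{\langle k \rangle} = \frac{\sum_{k_i} k_i^2\, p(k_i)}{\langle k \rangle}. \tag{8.29}$$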

With that we arrive at the Molloy-Reed criterion (8.4), providing the condition to have a giant component as

$$\kappa = \frac{\langle k^2 \rangle}{\langle k \rangle} > 2. \tag{8.30}$$
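The criterion (8.30) only needs the first two moments of the degree sequence, so it is easy to evaluate. The following is a minimal sketch, assuming an uncorrelated network described by a plain list of node degrees; the function names and the three example degree sequences are hypothetical and chosen only for illustration.

```python
def molloy_reed_kappa(degrees):
    """Return kappa = <k^2>/<k> for a list of node degrees, as in (8.30)."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return mean_k2 / mean_k


def has_giant_component(degrees):
    """Molloy-Reed criterion: a giant component is expected if kappa > 2."""
    return molloy_reed_kappa(degrees) > 2


# Hypothetical degree sequences, used only as examples.
pairs = [1] * 100                  # every node has one link: kappa = 1
ring_like = [2] * 100              # every node has two links: kappa = 2 (marginal)
with_hubs = [1] * 90 + [10] * 10   # a few hubs raise <k^2> sharply

for name, seq in [("pairs", pairs), ("ring_like", ring_like), ("with_hubs", with_hubs)]:
    print(name, molloy_reed_kappa(seq), has_giant_component(seq))
```

The third sequence illustrates why heterogeneous networks satisfy the criterion so easily: a small fraction of hubs dominates the second moment even though the average degree stays below two.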

SECTION 8.12

ADVANCED TOPICS 8.C CRITICAL THRESHOLD UNDER RANDOM FAILURES

The purpose of this section is to derive (8.7), which provides the critical threshold for random node removal [7, 51]. The random removal of an f fraction of nodes has two consequences:

• It alters the degree of some nodes, as nodes that were previously connected to the removed nodes will lose some links [k → k' ≤ k].
• Consequently, it changes the degree distribution, as the neighbors of the missing nodes will have an altered degree [pk → p'k'].

To be specific, after we randomly remove an f fraction of nodes, a node with degree k becomes a node with degree k' with probability

$$\binom{k}{k'} f^{k-k'} (1-f)^{k'}, \qquad k' \le k. \tag{8.31}$$

The first f-dependent term in (8.31) accounts for the fact that the selected node lost (k − k') links, each with probability f; the next term accounts for the fact that node removal leaves k' links untouched, each with probability (1 − f). The probability that we have a degree-k node in the original network is $p_k$; the probability that we have a node with degree k' in the new network is

$$p'_{k'} = \sum_{k=k'}^{\infty} p_k \binom{k}{k'} f^{k-k'} (1-f)^{k'}. \tag{8.32}$$

Let us assume that we know $\langle k \rangle$ and $\langle k^2 \rangle$ for the original degree distribution $p_k$. Our goal is to calculate $\langle k' \rangle$ and $\langle k'^2 \rangle$ for the new degree distribution $p'_{k'}$, obtained after we randomly removed an f fraction of the nodes. For this we write

$$\begin{aligned}
\langle k' \rangle_f &= \sum_{k'=0}^{\infty} k'\, p'_{k'} \\
&= \sum_{k'=0}^{\infty} k' \sum_{k=k'}^{\infty} p_k \left( \frac{k!}{k'!\,(k-k')!} f^{k-k'} (1-f)^{k'} \right) \\
&= \sum_{k'=0}^{\infty} \sum_{k=k'}^{\infty} p_k \frac{k\,(k-1)!}{(k'-1)!\,(k-k')!} f^{k-k'} (1-f)^{k'-1} (1-f).
\end{aligned} \tag{8.33}$$

The sum above is performed over the triangle shown in Figure 8.28. We can check that we are performing the same sum if we change the order of summation together with the limits of the sums as

$$\sum_{k'=0}^{\infty} \sum_{k=k'}^{\infty} = \sum_{k=0}^{\infty} \sum_{k'=0}^{k}. \tag{8.34}$$

Figure 8.28 The Integration Domain

In (8.34) we change the integration order, i.e. the order of the two sums. We can do so because both sums are defined over the triangle shown in purple in the figure, whose axes are k (horizontal) and k' (vertical).

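Hence we obtain

$$\begin{aligned}
\langle k' \rangle_f &= \sum_{k=0}^{\infty} \sum_{k'=0}^{k} p_k \frac{k\,(k-1)!}{(k'-1)!\,(k-k')!} f^{k-k'} (1-f)^{k'-1} (1-f) \\
&= \sum_{k=0}^{\infty} (1-f)\, k\, p_k \sum_{k'=0}^{k} \frac{(k-1)!}{(k'-1)!\,(k-k')!} f^{k-k'} (1-f)^{k'-1} \\
&= \sum_{k=0}^{\infty} (1-f)\, k\, p_k \sum_{k'=0}^{k} \binom{k-1}{k'-1} f^{k-k'} (1-f)^{k'-1} \\
&= \sum_{k=0}^{\infty} (1-f)\, k\, p_k \\
&= (1-f) \langle k \rangle.
\end{aligned} \tag{8.35}$$

In the fourth line we used the fact that the inner sum is a binomial expansion of $[f + (1-f)]^{k-1} = 1$. This connects $\langle k' \rangle$ to the original $\langle k \rangle$ after the random removal of an f fraction of nodes. We perform a similar calculation for $\langle k'^2 \rangle$:

$$\langle k'^2 \rangle_f = \langle k'(k'-1) + k' \rangle_f = \langle k'(k'-1) \rangle_f + \langle k' \rangle_f = \sum_{k'=0}^{\infty} k'(k'-1)\, p'_{k'} + \langle k' \rangle_f. \tag{8.36}$$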

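Again, we change the order of the sums (Figure 8.28), obtaining

$$\begin{aligned}
\langle k'(k'-1) \rangle_f &= \sum_{k'=0}^{\infty} k'(k'-1)\, p'_{k'} \\
&= \sum_{k'=0}^{\infty} k'(k'-1) \sum_{k=k'}^{\infty} p_k \binom{k}{k'} f^{k-k'} (1-f)^{k'} \\
&= \sum_{k=0}^{\infty} \sum_{k'=0}^{k} k'(k'-1)\, p_k \frac{k!}{k'!\,(k-k')!} f^{k-k'} (1-f)^{k'} \\
&= \sum_{k=0}^{\infty} \sum_{k'=0}^{k} p_k \frac{k!}{(k'-2)!\,(k-k')!} f^{k-k'} (1-f)^{k'-2} (1-f)^2 \\
&= \sum_{k=0}^{\infty} (1-f)^2\, k(k-1)\, p_k \sum_{k'=0}^{k} \frac{(k-2)!}{(k'-2)!\,(k-k')!} f^{k-k'} (1-f)^{k'-2} \\
&= \sum_{k=0}^{\infty} (1-f)^2\, k(k-1)\, p_k \sum_{k'=0}^{k} \binom{k-2}{k'-2} f^{k-k'} (1-f)^{k'-2} \\
&= \sum_{k=0}^{\infty} (1-f)^2\, k(k-1)\, p_k \\
&= (1-f)^2 \langle k(k-1) \rangle.
\end{aligned} \tag{8.37}$$

Hence we obtain

$$\begin{aligned}
\langle k'^2 \rangle_f &= \langle k'(k'-1) + k' \rangle_f \\
&= \langle k'(k'-1) \rangle_f + \langle k' \rangle_f \\
&= (1-f)^2 \langle k(k-1) \rangle + (1-f) \langle k \rangle \\
&= (1-f)^2 \left( \langle k^2 \rangle - \langle k \rangle \right) + (1-f) \langle k \rangle \\
&= (1-f)^2 \langle k^2 \rangle - (1-f)^2 \langle k \rangle + (1-f) \langle k \rangle \\
&= (1-f)^2 \langle k^2 \rangle + \left( -f^2 + 2f - 1 + 1 - f \right) \langle k \rangle \\
&= (1-f)^2 \langle k^2 \rangle + f(1-f) \langle k \rangle.
\end{aligned} \tag{8.38}$$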

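Equation (8.38) connects $\langle k'^2 \rangle$ to the original $\langle k^2 \rangle$ after the random removal of an f fraction of nodes. Let us put the results (8.35) and (8.38) together:

$$\langle k' \rangle_f = (1-f) \langle k \rangle, \tag{8.39}$$

$$\langle k'^2 \rangle_f = (1-f)^2 \langle k^2 \rangle + f(1-f) \langle k \rangle. \tag{8.40}$$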
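The two moment relations (8.39) and (8.40) can be checked numerically by applying the binomial thinning (8.32) to a small degree distribution and comparing both sides. This is only a sanity-check sketch: the distribution `pk` and the removal fraction `f` below are hypothetical values, not data from the text.

```python
from math import comb


def thinned_distribution(pk, f):
    """Apply (8.32): map {k: p_k} to {k': p'_k'} after removing a fraction f of nodes."""
    new = {}
    for k, p in pk.items():
        for kp in range(k + 1):
            # Probability that a degree-k node keeps exactly kp links, see (8.31).
            weight = comb(k, kp) * f ** (k - kp) * (1 - f) ** kp
            new[kp] = new.get(kp, 0.0) + p * weight
    return new


def moment(pk, m):
    """m-th moment of a degree distribution given as {k: p_k}."""
    return sum(p * k ** m for k, p in pk.items())


# Hypothetical degree distribution and removal fraction.
pk = {1: 0.5, 2: 0.3, 5: 0.15, 20: 0.05}
f = 0.3

pk_new = thinned_distribution(pk, f)
k1, k2 = moment(pk, 1), moment(pk, 2)

print(moment(pk_new, 1), (1 - f) * k1)                          # both sides of (8.39)
print(moment(pk_new, 2), (1 - f) ** 2 * k2 + f * (1 - f) * k1)  # both sides of (8.40)
```

Because the thinning is applied exactly rather than by sampling, the two sides should agree up to floating-point rounding.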

According to the Molloy-Reed criterion (8.4) the breakdown threshold is given by

$$\kappa = \frac{\langle k'^2 \rangle_f}{\langle k' \rangle_f} = 2. \tag{8.41}$$

Inserting (8.39) and (8.40) into (8.41) we obtain our final result (8.7),

$$f_c = 1 - \frac{1}{\dfrac{\langle k^2 \rangle}{\langle k \rangle} - 1}, \tag{8.42}$$

providing the breakdown threshold of networks with arbitrary pk under random node removal.
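As a worked illustration of (8.42), the sketch below evaluates the threshold directly from the two moments. The function name is hypothetical, and returning 0.0 when κ ≤ 2 is a convention assumed here (the network has no giant component to destroy), not something stated in the text. The first example uses the fact that a Poisson degree distribution has ⟨k²⟩ = ⟨k⟩(⟨k⟩ + 1), for which (8.42) reduces to fc = 1 − 1/⟨k⟩.

```python
def critical_threshold(mean_k, mean_k2):
    """Breakdown threshold f_c of (8.42) from the first two degree moments."""
    kappa = mean_k2 / mean_k
    if kappa <= 2:
        # Assumed convention: no giant component exists, so nothing needs removing.
        return 0.0
    return 1 - 1 / (kappa - 1)


# Random (Poisson) network with <k> = 4: <k^2> = <k>(<k> + 1) = 20.
fc_random = critical_threshold(4.0, 20.0)

# Hypothetical heterogeneous network with the same <k> but a much larger <k^2>.
fc_heterogeneous = critical_threshold(4.0, 400.0)

print(fc_random)         # equals 1 - 1/<k> for the Poisson case
print(fc_heterogeneous)  # much closer to 1
```

The comparison shows the mechanism behind the robustness of heterogeneous networks: at fixed ⟨k⟩, a larger ⟨k²⟩ pushes fc toward 1.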


SECTION 8.13

ADVANCED TOPICS 8.D BREAKDOWN OF A FINITE SCALE-FREE NETWORK

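In this section we derive the dependence (8.10) of the breakdown threshold of a scale-free network on the network size N. We start by calculating the mth moment of a power-law distribution

$$\langle k^m \rangle = (\gamma - 1)\, k_{\min}^{\gamma-1} \int_{k_{\min}}^{k_{\max}} k^{m-\gamma}\, dk = \frac{(\gamma - 1)}{(m - \gamma + 1)}\, k_{\min}^{\gamma-1} \left[ k^{m-\gamma+1} \right]_{k_{\min}}^{k_{\max}}. \tag{8.43}$$

Using (4.18)

$$k_{\max} = k_{\min} N^{\frac{1}{\gamma-1}} \tag{8.44}$$

we obtain

$$\langle k^m \rangle = \frac{(\gamma - 1)}{(m - \gamma + 1)}\, k_{\min}^{\gamma-1} \left[ k_{\max}^{m-\gamma+1} - k_{\min}^{m-\gamma+1} \right]. \tag{8.45}$$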

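To calculate fc we need to determine the ratio

$$\kappa = \frac{\langle k^2 \rangle}{\langle k \rangle} = \frac{(2 - \gamma)}{(3 - \gamma)}\, \frac{k_{\max}^{3-\gamma} - k_{\min}^{3-\gamma}}{k_{\max}^{2-\gamma} - k_{\min}^{2-\gamma}}, \tag{8.46}$$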

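which for large N (and hence for large kmax) depends on γ as

$$\kappa = \frac{\langle k^2 \rangle}{\langle k \rangle} = \left| \frac{2 - \gamma}{3 - \gamma} \right| \begin{cases} k_{\min}, & \gamma > 3, \\ k_{\max}^{3-\gamma}\, k_{\min}^{\gamma-2}, & 3 > \gamma > 2, \\ k_{\max}, & 2 > \gamma > 1. \end{cases} \tag{8.47}$$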

The breakdown threshold is given by (8.7)

$$f_c = 1 - \frac{1}{\kappa - 1}, \tag{8.48}$$

where κ is given by (8.46). Inserting (8.44) into (8.47) and (8.48), for 2 < γ < 3 we obtain

$$f_c \approx 1 - \frac{C}{N^{\frac{3-\gamma}{\gamma-1}}}, \tag{8.49}$$

which is (8.10).
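The size dependence in (8.49) can be made concrete by evaluating (8.44), (8.46) and (8.48) for a few network sizes. The sketch below assumes the continuum approximation used in this section; the function names and the parameter values (γ = 2.5, kmin = 1) are illustrative choices, and the formula is singular at γ = 2 and γ = 3, which the code simply rejects.

```python
def kappa_scale_free(gamma, k_min, n):
    """kappa = <k^2>/<k> of (8.46), with the natural cutoff k_max of (8.44)."""
    if gamma in (2, 3):
        raise ValueError("(8.46) is singular at gamma = 2 and gamma = 3")
    k_max = k_min * n ** (1.0 / (gamma - 1))
    num = k_max ** (3 - gamma) - k_min ** (3 - gamma)
    den = k_max ** (2 - gamma) - k_min ** (2 - gamma)
    return (2 - gamma) / (3 - gamma) * num / den


def fc_scale_free(gamma, k_min, n):
    """Breakdown threshold (8.48) of a finite scale-free network."""
    return 1 - 1 / (kappa_scale_free(gamma, k_min, n) - 1)


# Illustrative parameters: gamma = 2.5, k_min = 1.
for n in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    print(n, round(kappa_scale_free(2.5, 1, n), 3), round(fc_scale_free(2.5, 1, n), 4))
```

For these parameters κ grows as N^(1/3), so fc creeps toward 1 as the network grows, in line with the exponent (3 − γ)/(γ − 1) in (8.49).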

SECTION 8.14

ADVANCED TOPICS 8.E ATTACK AND ERROR TOLERANCE OF REAL NETWORKS

In this section we explore the attack and error curves for the ten reference networks discussed in Table 4.1 and Table 8.2. The corresponding curves are shown in Figure 8.29. Their inspection reveals several patterns, confirming the results discussed in this chapter:

• For all networks the error and attack curves separate, confirming the Achilles' Heel property (SECTION 8.8): Real networks are robust to random failures but are fragile to attacks.
• The separation between the error and attack curves depends on the average degree and the degree heterogeneity of each network. For example, for the citation and the actor networks fc for the attacks is in the vicinity of 0.5 and 0.75, respectively, rather large values. This is because these networks are rather dense, with ⟨k⟩ = 20.8 for the citation network and ⟨k⟩ = 83.7 for the actor network. Hence these networks can survive the removal of a very high fraction of their hubs.

Figure 8.29 Error and Attack Curves

[Ten panels, (a) to (j), one for each reference network: Internet, WWW, Power Grid, Mobile Phone Calls, Email, Scientific Collaboration, Actor, Citation, Metabolic and Protein. Each panel plots P∞ against f.]

The error (green) and attack (purple) curves for the ten reference networks listed in Table 4.1. The green vertical line corresponds to the estimated $f_c^{rand}$ for errors, while the purple vertical line corresponds to $f_c^{targ}$ for attacks. The estimated fc corresponds to the point where the giant component first drops below 1% of its original size. In most systems this procedure offers a good approximation for fc. The only exception is the metabolic network, for which $f_c^{targ} < 0.25$, but a small cluster persists, pushing the reported $f_c^{targ}$ to $f_c^{targ} \simeq 0.5$.

SECTION 8.15

ADVANCED TOPICS 8.F ATTACK THRESHOLD

The goal of this section is to derive (8.12), providing the attack threshold of a scale-free network. We aim to calculate fc for an uncorrelated scale-free network, generated by the configuration model with $p_k = c \cdot k^{-\gamma}$, where $k = k_{\min}, \ldots, k_{\max}$ and $c \approx (\gamma - 1)/(k_{\min}^{-\gamma+1} - k_{\max}^{-\gamma+1})$.

The removal of an f fraction of nodes in a decreasing order of their degree (hub removal) has two effects [9, 51]:

(i) The maximum degree of the network changes from kmax to k'max.
(ii) The links connected to the removed hubs are also removed, changing the degree distribution of the remaining network.

The resulting network is still uncorrelated, therefore we can use the Molloy-Reed criterion to determine the existence of a giant component. We start by considering the impact of (i). The new upper cutoff, k'max, is given by

$$f = \int_{k'_{\max}}^{k_{\max}} p_k\, dk = \frac{\gamma - 1}{\gamma - 1}\, \frac{k'^{\,-\gamma+1}_{\max} - k_{\max}^{-\gamma+1}}{k_{\min}^{-\gamma+1} - k_{\max}^{-\gamma+1}}. \tag{8.50}$$

If we assume that kmax ≫ k'max and kmax ≫ kmin (true for large scale-free networks with natural cutoff), we can ignore the kmax terms, obtaining

$$f = \left( \frac{k'_{\max}}{k_{\min}} \right)^{-\gamma+1}, \tag{8.51}$$

which leads to

$$k'_{\max} = k_{\min}\, f^{\frac{1}{1-\gamma}}. \tag{8.52}$$

Equation (8.52) provides the new maximum degree of the network after we remove an f fraction of the hubs.

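Next we turn to (ii), accounting for the fact that hub removal changes the degree distribution $p_k \to p'_k$. In the absence of degree correlations we assume that the links of the removed hubs connect to randomly selected stubs. Consequently, we calculate the fraction of links removed 'randomly', $\tilde{f}$, as a consequence of removing an f fraction of the hubs:

$$\tilde{f} = \frac{\int_{k'_{\max}}^{k_{\max}} k\, p_k\, dk}{\langle k \rangle} = \frac{1}{\langle k \rangle}\, c \int_{k'_{\max}}^{k_{\max}} k^{-\gamma+1}\, dk = \frac{1}{\langle k \rangle}\, \frac{1 - \gamma}{2 - \gamma}\, \frac{k'^{\,-\gamma+2}_{\max} - k_{\max}^{-\gamma+2}}{k_{\min}^{-\gamma+1} - k_{\max}^{-\gamma+1}}. \tag{8.53}$$

Ignoring the kmax term again and using $\langle k \rangle \approx \frac{\gamma - 1}{\gamma - 2} k_{\min}$ we obtain

$$\tilde{f} = \left( \frac{k'_{\max}}{k_{\min}} \right)^{-\gamma+2}. \tag{8.54}$$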

Using (8.51) we obtain

$$\tilde{f} = f^{\frac{2-\gamma}{1-\gamma}}. \tag{8.55}$$

For γ → 2 we have $\tilde{f} \to 1$, which means that the removal of a tiny fraction of the hubs removes all links, potentially destroying the network. This is consistent with the finding of CHAPTER 4 that for γ = 2 the hubs dominate the network. In general the degree distribution of the remaining network is

$$p'_{k'} = \sum_{k=k_{\min}}^{k'_{\max}} \binom{k}{k'} \tilde{f}^{\,k-k'} (1 - \tilde{f})^{k'}\, p_k. \tag{8.56}$$

Note that (8.56) has the same form as the degree distribution (8.32) obtained in ADVANCED TOPICS 8.C. This means that now we can proceed with the calculation method developed there for random node removal. To be specific, we calculate κ for a scale-free network with kmin and k'max using (8.45):

$$\kappa = \frac{2 - \gamma}{3 - \gamma}\, \frac{k'^{\,3-\gamma}_{\max} - k_{\min}^{3-\gamma}}{k'^{\,2-\gamma}_{\max} - k_{\min}^{2-\gamma}}. \tag{8.57}$$

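Substituting (8.52) into (8.57) we have

$$\kappa = \frac{2 - \gamma}{3 - \gamma}\, \frac{k_{\min}^{3-\gamma} f^{(3-\gamma)/(1-\gamma)} - k_{\min}^{3-\gamma}}{k_{\min}^{2-\gamma} f^{(2-\gamma)/(1-\gamma)} - k_{\min}^{2-\gamma}} = \frac{2 - \gamma}{3 - \gamma}\, k_{\min}\, \frac{f^{(3-\gamma)/(1-\gamma)} - 1}{f^{(2-\gamma)/(1-\gamma)} - 1}. \tag{8.58}$$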

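The network breaks apart when the randomly removed fraction of links $\tilde{f}$ reaches the threshold (8.42), i.e. when $\tilde{f} = 1 - 1/(\kappa - 1)$. Inserting (8.55) and (8.58) into this condition, after simple transformations we obtain

$$f_c^{\frac{2-\gamma}{1-\gamma}} = 2 + \frac{2 - \gamma}{3 - \gamma}\, k_{\min} \left( f_c^{\frac{3-\gamma}{1-\gamma}} - 1 \right). \tag{8.59}$$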
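Equation (8.59) defines fc only implicitly, so in practice it has to be solved numerically. The sketch below uses plain bisection and is restricted to 2 < γ < 3, where the difference between the two sides of (8.59) decreases monotonically on (0, 1]; the function name, the bracketing interval and the tolerance are assumptions made for this illustration.

```python
def attack_threshold(gamma, k_min, tol=1e-12):
    """Solve (8.59) for f_c by bisection, assuming 2 < gamma < 3 and k_min >= 1."""
    if not 2 < gamma < 3:
        raise ValueError("this sketch only covers 2 < gamma < 3")
    a = (2 - gamma) / (1 - gamma)   # exponent on the left-hand side of (8.59)
    b = (3 - gamma) / (1 - gamma)   # exponent inside the bracket
    c = (2 - gamma) / (3 - gamma)   # prefactor of k_min

    def g(f):
        # Left-hand side minus right-hand side of (8.59); the root is f_c.
        return f ** a - 2 - c * k_min * (f ** b - 1)

    lo, hi = 1e-12, 1.0             # g(lo) > 0 and g(hi) = -1 < 0 in this range
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for k_min in (1, 2, 3):
    print(k_min, attack_threshold(2.5, k_min))
```

For γ = 2.5 and kmin = 1 equation (8.59) reduces to x + 1/x = 3 with x = fc^(1/3), which gives a closed-form value to compare the numerical root against. Raising kmin increases fc, reflecting that denser networks tolerate the loss of more hubs.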

SECTION 8.17

ADVANCED TOPICS 8.G THE OPTIMAL DEGREE DISTRIBUTION

In this section we derive the bimodal degree distribution that simultaneously optimizes a network's topology against attacks and failures, as discussed in SECTION 8.7 [37]. Let us assume, as we did in (8.17), that the degree distribution is bimodal, consisting of two delta functions:

$$p_k = (1 - r)\, \delta(k - k_{\min}) + r\, \delta(k - k_{\max}). \tag{8.62}$$

We start by calculating the total threshold, $f_c^{tot}$, as a function of r and kmax for a fixed ⟨k⟩. To obtain analytical expressions for $f_c^{rand}$ and $f_c^{targ}$ we calculate the moments of the bimodal distribution (8.62),

$$\begin{aligned}
\langle k \rangle &= (1 - r)\, k_{\min} + r\, k_{\max}, \\
\langle k^2 \rangle &= (1 - r)\, k_{\min}^2 + r\, k_{\max}^2 = \frac{(\langle k \rangle - r k_{\max})^2}{1 - r} + r\, k_{\max}^2.
\end{aligned} \tag{8.63}$$

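Inserting these into (8.7) we obtain

$$f_c^{rand} = \frac{\langle k \rangle^2 - 2r \langle k \rangle k_{\max} - 2(1 - r) \langle k \rangle + r k_{\max}^2}{\langle k \rangle^2 - 2r \langle k \rangle k_{\max} - (1 - r) \langle k \rangle + r k_{\max}^2}. \tag{8.64}$$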

To determine the threshold for targeted attack, we must consider the fact that we have only two types of nodes, i.e. an r fraction of nodes have degree kmax and the remaining (1 − r) fraction have degree kmin. Hence hub removal can either remove all hubs (case (i)), or only some fraction of them (case (ii)):

(i) $f_c^{targ} > r$. In this case all hubs have been removed, hence the nodes left after the targeted attack have degree kmin. We therefore obtain

$$f_c^{targ} = r + \frac{1 - r}{\langle k \rangle - r k_{\max}} \left\{ \langle k \rangle\, \frac{\langle k \rangle - r k_{\max} - 2(1 - r)}{\langle k \rangle - r k_{\max} - (1 - r)} - r k_{\max} \right\}. \tag{8.65}$$

(ii) $f_c^{targ} < r$. In this case the removed nodes are all from the high-degree group, leaving behind some kmax nodes. Hence we obtain

$$f_c^{targ} = \frac{\langle k \rangle^2 - 2r \langle k \rangle k_{\max} - 2(1 - r) \langle k \rangle + r k_{\max}^2}{k_{\max} (k_{\max} - 1)(1 - r)}. \tag{8.66}$$

With the thresholds (8.64) - (8.66) we can now evaluate the total threshold $f_c^{tot}$ (8.16). To obtain an expression for the optimal value of kmax as a function of r we determine the value of kmax for which $f_c^{tot}$ is maximal. Using (8.64) and (8.66), we find that for small r the optimal value of kmax can be approximated by

$$k_{\max} \sim \left\{ \frac{2 \langle k \rangle^2 (\langle k \rangle - 1)^2}{2 \langle k \rangle - 1} \right\}^{1/3} r^{-2/3} = A\, r^{-2/3}. \tag{8.67}$$

Using this result and (8.16), for small r we have

$$f_c^{tot} = 2 - \frac{1}{\langle k \rangle - 1} - \frac{3 \langle k \rangle}{A^2}\, r^{1/3} + O(r^{2/3}). \tag{8.68}$$

Thus $f_c^{tot}$ approaches the theoretical maximum when r approaches zero. For a network of N nodes the maximum value of $f_c^{tot}$ is obtained when r = 1/N, being the smallest value consistent with having at least one node of degree kmax. Given this r the equation determining the optimal kmax, representing the size of the central hubs, is [37]

$$k_{\max} = A N^{2/3}, \tag{8.69}$$

where A is defined in (8.67).
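The bimodal formulas are easy to cross-check against each other. The sketch below computes the moments (8.63), compares the closed form (8.64) with the general threshold (8.7) applied to those moments, and evaluates the optimal hub size from (8.67) and (8.69). All function names and the parameter values (⟨k⟩ = 3, r = 0.01, kmax = 50, N = 10^6) are hypothetical choices for illustration.

```python
def bimodal_moments(mean_k, r, k_max):
    """Return (k_min, <k^2>) of the bimodal distribution (8.62) at fixed <k>, see (8.63)."""
    k_min = (mean_k - r * k_max) / (1 - r)
    mean_k2 = (1 - r) * k_min ** 2 + r * k_max ** 2
    return k_min, mean_k2


def fc_rand_general(mean_k, mean_k2):
    """Random-failure threshold (8.7) from the two moments."""
    return 1 - 1 / (mean_k2 / mean_k - 1)


def fc_rand_bimodal(mean_k, r, k_max):
    """Closed form (8.64) for the bimodal distribution."""
    base = mean_k ** 2 - 2 * r * mean_k * k_max + r * k_max ** 2
    return (base - 2 * (1 - r) * mean_k) / (base - (1 - r) * mean_k)


def optimal_k_max(mean_k, n):
    """Optimal hub degree k_max = A N^(2/3) of (8.69), with A from (8.67)."""
    a = (2 * mean_k ** 2 * (mean_k - 1) ** 2 / (2 * mean_k - 1)) ** (1 / 3)
    return a * n ** (2 / 3)


# Hypothetical parameters.
mean_k, r, k_max = 3.0, 0.01, 50.0
k_min, mean_k2 = bimodal_moments(mean_k, r, k_max)

print(k_min, mean_k2)
print(fc_rand_general(mean_k, mean_k2), fc_rand_bimodal(mean_k, r, k_max))
print(optimal_k_max(mean_k, 10 ** 6))
```

The two values printed on the second line should coincide, since (8.64) is (8.7) with the moments (8.63) substituted.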


SECTION 8.18

BIBLIOGRAPHY

[1] R. Albert, H. Jeong, and A.-L. Barabási. Attack and error tolerance of complex networks. Nature, 406: 378, 2000.
[2] D. Stauffer and A. Aharony. Introduction to Percolation Theory. Taylor and Francis. London, 1994.
[3] A. Bunde and S. Havlin. Fractals and Disordered Systems. Springer, 1996.
[4] B. Bollobás and O. Riordan. Percolation. Cambridge University Press. Cambridge, 2006.
[5] S. Broadbent and J. Hammersley. Percolation processes I. Crystals and mazes. Proceedings of the Cambridge Philosophical Society, 53: 629, 1957.
[6] M. Molloy and B. Reed. A critical point for random graphs with a given degree sequence. Random Structures and Algorithms, 6: 161, 1995.
[7] R. Cohen, K. Erez, D. ben-Avraham and S. Havlin. Resilience of the Internet to random breakdowns. Phys. Rev. Lett., 85: 4626, 2000.
[8] D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Network robustness and fragility: Percolation on random graphs. Phys. Rev. Lett., 85: 5468–5471, 2000.
[9] R. Cohen, K. Erez, D. ben-Avraham and S. Havlin. Breakdown of the Internet under intentional attack. Phys. Rev. Lett., 86: 3682, 2001.
[10] B. Bollobás and O. Riordan. Robustness and Vulnerability of Scale-Free Random Graphs. Internet Mathematics, 1: 1-35, 2003.
[11] P. Baran. Introduction to Distributed Communications Networks. Rand Corporation Memorandum, RM-3420-PR, 1964.
[12] D.N. Kosterev, C.W. Taylor and W.A. Mittlestadt. Model Validation of the August 10, 1996 WSCC System Outage. IEEE Transactions on Power Systems, 14: 967-979, 1999.
[13] C. Labovitz, A. Ahuja and F. Jahasian. Experimental Study of Internet Stability and Wide-Area Backbone Failures. Proceedings of IEEE FTCS, Madison, WI, 1999.
[14] A. G. Haldane and R. M. May. Systemic risk in banking ecosystems. Nature, 469: 351-355, 2011.
[15] T. Roukny, H. Bersini, H. Pirotte, G. Caldarelli and S. Battiston. Default Cascades in Complex Networks: Topology and Systemic Risk. Scientific Reports, 3: 2759, 2013.
[16] G. Tedeschi, A. Mazloumian, M. Gallegati, and D. Helbing. Bankruptcy cascades in interbank markets. PLoS One, 7: e52749, 2012.
[17] D. Helbing. Globally networked risks and how to respond. Nature, 497: 51-59, 2013.
[18] I. Dobson, B. A. Carreras, V. E. Lynch and D. E. Newman. Complex systems analysis of series of blackouts: Cascading failure, critical points, and self-organization. CHAOS, 17: 026103, 2007.
[19] E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts. Everyone's an influencer: quantifying influence on twitter. Proceedings of the fourth ACM international conference on Web search and data mining (WSDM '11). ACM, New York, NY, USA, 65-74, 2011.
[20] Y. Y. Kagan. Accuracy of modern global earthquake catalogs. Phys. Earth Planet. Inter., 135: 173, 2003.
[21] M. Nagarajan, H. Purohit, and A. P. Sheth. A Qualitative Examination of Topical Tweet and Retweet Practices. ICWSM, 295-298, 2010.
[22] P. Fleurquin, J.J. Ramasco and V.M. Eguiluz. Systemic delay propagation in the US airport network. Scientific Reports, 3: 1159, 2013.
[23] B. K. Ellis, J. A. Stanford, D. Goodman, C. P. Stafford, D.L. Gustafson, D. A. Beauchamp, D. W. Chess, J. A. Craft, M. A. Deleray, and B. S. Hansen. Long-term effects of a trophic cascade in a large lake ecosystem. PNAS, 108: 1070, 2011.
[24] V. R. Sole, M. M. Jose. Complexity and fragility in ecological networks. Proc. R. Soc. Lond. B, 268: 2039, 2001.
[25] F. Jordán, I. Scheuring and G. Vida. Species Positions and Extinction Dynamics in Simple Food Webs. Journal of Theoretical Biology, 215: 441-448, 2002.
[26] S.L. Pimm and P. Raven. Biodiversity: Extinction by numbers. Nature, 403: 843, 2000.
[27] World Economic Forum. Building Resilience in Supply Chains. World Economic Forum, 2013.
[28] Joint Economic Committee of US Congress. Your flight has been delayed again: Flight delays cost passengers, airlines and the U.S. economy billions. Available at http://www.jec.senate.gov, May 22, 2008.
[29] I. Dobson, A. Carreras, and D.E. Newman. A loading dependent model of probabilistic cascading failure. Probability in the Engineering and Informational Sciences, 19: 15, 2005.
[30] D.J. Watts. A simple model of global cascades on random networks. PNAS, 99: 5766, 2002.
[31] K.-I. Goh, D.-S. Lee, B. Kahng, and D. Kim. Sandpile on scale-free networks. Phys. Rev. Lett., 91: 148701, 2003.
[32] D.-S. Lee, K.-I. Goh, B. Kahng, and D. Kim. Sandpile avalanche dynamics on scale-free networks. Physica A, 338: 84, 2004.
[33] M. Ding and W. Yang. Distribution of the first return time in fractional Brownian motion and its application to the study of on-off intermittency. Phys. Rev. E, 52: 207-213, 1995.
[34] A. E. Motter and Y.-C. Lai. Cascade-based attacks on complex networks. Physical Review E, 66: 065102, 2002.
[35] Z. Kong and E. M. Yeh. Resilience to Degree-Dependent and Cascading Node Failures in Random Geometric Networks. IEEE Transactions on Information Theory, 56: 5533, 2010.
[36] G. Paul, S. Sreenivas, and H. E. Stanley. Resilience of complex networks to random breakdown. Phys. Rev. E, 72: 056130, 2005.
[37] G. Paul, T. Tanizawa, S. Havlin, and H. E. Stanley. Optimization of robustness of complex networks. European Physical Journal B, 38: 187–191, 2004.
[38] A.X.C.N. Valente, A. Sarkar, and H. A. Stone. Two-peak and three-peak optimal complex networks. Phys. Rev. Lett., 92: 118702, 2004.
[39] T. Tanizawa, G. Paul, R. Cohen, S. Havlin, and H. E. Stanley. Optimization of network robustness to waves of targeted and random attacks. Phys. Rev. E, 71: 047101, 2005.
[40] A.E. Motter. Cascade control and defense in complex networks. Phys. Rev. Lett., 93: 098701, 2004.
[41] A. Motter, N. Gulbahce, E. Almaas, and A.-L. Barabási. Predicting synthetic rescues in metabolic networks. Molecular Systems Biology, 4: 1-10, 2008.
[42] R.V. Sole, M. Rosas-Casals, B. Corominas-Murtra, and S. Valverde. Robustness of the European power grids under intentional attack. Phys. Rev. E, 77: 026102, 2008.
[43] R. Albert, I. Albert, and G.L. Nakarado. Structural Vulnerability of the North American Power Grid. Phys. Rev. E, 69: 025103 R, 2004.
[44] C.M. Schneider, N. Yazdani, N.A.M. Araújo, S. Havlin and H.J. Herrmann. Towards designing robust coupled networks. Scientific Reports, 3: 1969, 2013.
[45] A.-L. Barabási. Linked: The New Science of Networks. Plume, New York, 2002.
[46] C.M. Song, S. Havlin, and H.A. Makse. Self-similarity of complex networks. Nature, 433: 392, 2005.
[47] S.V. Buldyrev, R. Parshani, G. Paul, H.E. Stanley and S. Havlin. Catastrophic cascade of failures in interdependent networks. Nature, 464: 08932, 2010.
[48] R. Cohen, D. ben-Avraham and S. Havlin. Percolation critical exponents in scale-free networks. Phys. Rev. E, 66: 036113, 2002.
[49] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin. Anomalous percolation properties of growing networks. Phys. Rev. E, 64: 066110, 2001.
[50] M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 64: 026118, 2001.
[51] R. Cohen and S. Havlin. Complex Networks: Structure, Robustness and Function. Cambridge University Press. Cambridge, UK, 2010.

9

ALBERT-LÁSZLÓ BARABÁSI

NETWORK SCIENCE

COMMUNITIES

ACKNOWLEDGEMENTS

MÁRTON PÓSFAI
NICOLE SAMAY
ROBERTA SINATRA
SARAH MORRISON
AMAL HUSSEINI

INDEX

Introduction
Basics of Communities
Hierarchical Clustering
Modularity
Overlapping Communities
Characterizing Communities
Testing Communities
Summary
Homework
ADVANCED TOPICS 9.A Counting Partitions
ADVANCED TOPICS 9.B Hierarchical Modularity
ADVANCED TOPICS 9.C Modularity
ADVANCED TOPICS 9.D Fast Algorithms for Community Detection
ADVANCED TOPICS 9.E Threshold for clique percolation
Bibliography

Figure 9.0 (cover image)

NORMCORE. ONCE UPON A TIME PEOPLE WERE BORN INTO COMMUNITIES AND HAD TO FIND THEIR INDIVIDUALITY. TODAY PEOPLE ARE BORN INDIVIDUALS AND HAVE TO FIND THEIR COMMUNITIES. MASS INDIE RESPONDS TO THIS SITUATION BY CREATING CLIQUES OF PEOPLE IN THE KNOW, WHILE NORMCORE KNOWS THE REAL FEAT IS HARNESSING THE POTENTIAL FOR CONNECTION TO SPRING UP. IT'S ABOUT ADAPTABILITY, NOT EXCLUSIVITY.

Art & Networks: K-Mode: Youth Mode

K-Mode is an art collective that publishes trend reports with an unusual take on various concepts. The image shows a page from Youth Mode: A Report on Freedom, discussing the subtle shift in the origins and the meaning of communities, the topic of this chapter [1].

This book is licensed under a Creative Commons: CC BY-NC-SA 2.0. PDF V26, 05.09.2014

SECTION 9.1

INTRODUCTION

Belgium appears to be the model bicultural society: 59% of its citizens are Flemish, speaking Dutch, and 40% are Walloons, who speak French. As multiethnic countries break up all over the world, we must ask: How has this country fostered the peaceful coexistence of these two ethnic groups since 1830? Is Belgium a densely knitted society, where it does not matter if one is Flemish or Walloon? Or do we have two nations within the same borders that have learned to minimize contact with each other?

The answer was provided by Vincent Blondel and his students in 2007, who developed an algorithm to identify the country's community structure. They started from the mobile call network, placing individuals next to those whom they regularly called on their mobile phone [2]. The algorithm revealed that Belgium's social network is broken into two large clusters of communities and that individuals in one of these clusters rarely talk with individuals from the other cluster (Figure 9.1). The origin of this separation became obvious once they assigned to each node the language spoken by each individual, learning that one cluster consisted almost exclusively of French speakers and the other collected the Dutch speakers.

Figure 9.1 Communities in Belgium

Communities extracted from the call pattern of the consumers of the largest Belgian mobile phone company. The network has about two million mobile phone users. The nodes correspond to communities, the size of each node being proportional to the number of individuals in the corresponding community. The color of each community on a red–green scale represents the language spoken in the particular community, red for French and green for Dutch. Only communities of more than 100 individuals are shown. The community that connects the two main clusters consists of several smaller communities with less obvious language separation, capturing the culturally mixed Brussels, the country’s capital. After [2].

In network science we call a community a group of nodes that have a higher likelihood of connecting to each other than to nodes from other communities. To gain intuition about community organization, next we discuss two areas where communities play a particularly important role:

• Social Networks
Social networks are full of easy to spot communities, something that scholars noticed decades ago [3,4,5,6,7]. Indeed, the employees of a company are more likely to interact with their coworkers than with employees of other companies [3]. Consequently work places appear as densely interconnected communities within the social network. Communities could also represent circles of friends, or a group of individuals who pursue the same hobby together, or individuals living in the same neighborhood.

A social network that has received particular attention in the context of community detection is known as Zachary's Karate Club (Figure 9.2) [7], capturing the links between 34 members of a karate club. Given the club's small size, each club member knew everyone else. To uncover the true relationships between club members, sociologist Wayne Zachary documented 78 pairwise links between members who regularly interacted outside the club (Figure 9.2a).

The interest in the dataset is driven by a singular event: A conflict between the club's president and the instructor split the club into two. About half of the members followed the instructor and the other half the president, a breakup that unveiled the ground truth, representing the club's underlying community structure (Figure 9.2a). Today community finding algorithms are often tested based on their ability to infer these two communities from the structure of the network before the split.

• Biological Networks
Communities play a particularly important role in our understanding of how specific biological functions are encoded in cellular networks. Two years before receiving the Nobel Prize in Medicine, Lee Hartwell argued that biology must move beyond its focus on single genes. It must explore instead how groups of molecules form functional modules to carry out specific cellular functions [10]. Ravasz and collaborators [11] made the first attempt to systematically identify such modules in metabolic networks. They did so by building an algorithm to identify groups of molecules that form locally dense communities (Figure 9.3).

Communities play a particularly important role in understanding human diseases. Indeed, proteins that are involved in the same disease tend to interact with each other [12,13]. This finding inspired the disease module hypothesis [14], stating that each disease can be linked to a well-defined neighborhood of the cellular network.

The examples discussed above illustrate the diverse motivations that drive community identification. The existence of communities is rooted in who connects to whom, hence they cannot be explained based on the degree distribution alone. To extract communities we must therefore inspect a network's detailed wiring diagram. These examples inspire the starting hypothesis of this chapter:

H1: Fundamental Hypothesis
A network's community structure is uniquely encoded in its wiring diagram.

According to the fundamental hypothesis there is a ground truth about a network's community organization that can be uncovered by inspecting Aij. The purpose of this chapter is to introduce the concepts necessary to understand and identify the community structure of a complex network. We will ask how to define communities, explore the various community characteristics and introduce a series of algorithms, relying on different principles, for community identification.

Figure 9.2 Zachary's Karate Club

(a) The connections between the 34 members of Zachary's Karate Club. Links capture interactions between the club members outside the club. The circles and the squares denote the two factions that emerged after the club split in two. The colors capture the best community partition predicted by an algorithm that optimizes the modularity coefficient M (SECTION 9.4). The community boundaries closely follow the split: The white and purple communities capture one faction and the green-orange communities the other. After [8].

(b) The citation history of the Zachary karate club paper [7] mirrors the history of community detection in network science. Indeed, there was virtually no interest in Zachary's paper until Girvan and Newman used it as a benchmark for community detection in 2002 [9]. Since then the number of citations to the paper exploded, reminiscent of the citation explosion to Erdős and Rényi's work following the discovery of scale-free networks (Figure 3.15).

The frequent use of Zachary's Karate Club network as a benchmark in community detection inspired the Zachary Karate Club Club, whose tongue-in-cheek statute states: "The first scientist at any conference on networks who uses Zachary's karate club as an example is inducted into the Zachary Karate Club Club, and awarded a prize." Hence the prize is not based on merit, but on the simple act of participation. Yet, its recipients are prominent network scientists (http://networkkarate.tumblr.com/). The figure shows the Zachary Karate Club trophy, which is always held by the latest inductee. Photo courtesy of Marián Boguñá.

Figure 9.3 Communities in Metabolic Networks

The E. coli metabolism offers a fertile ground to investigate the community structure of biological systems [11].

(a) The biological modules (communities) identified by the Ravasz algorithm [11] (SECTION 9.3). The color of each node, capturing the predominant biochemical class to which it belongs, indicates that different functional classes are segregated in distinct network neighborhoods. The highlighted region selects the nodes that belong to the pyrimidine metabolism, one of the predicted communities.

(b) The topological overlap matrix of the E. coli metabolism and the corresponding dendrogram that allows us to identify the modules shown in (a). The color of the branches reflects the predominant biochemical role of the participating molecules, like carbohydrates (blue), nucleotide and nucleic acid metabolism (red), and lipid metabolism (cyan).

(c) The red right branch of the dendrogram shown in (b), highlighting the region corresponding to the pyrimidine module.

(d) The detailed metabolic reactions within the pyrimidine module. The boxes around the reactions highlight the communities predicted by the Ravasz algorithm.

After [11].


SECTION 9.2

BASICS OF COMMUNITIES

What do we really mean by a community? How many communities are in a network? How many different ways can we partition a network into communities? In this section we address these frequently emerging questions in community identification.

DEFINING COMMUNITIES

Our sense of communities rests on a second hypothesis (Figure 9.4):

H2: Connectedness and Density Hypothesis
A community is a locally dense connected subgraph in a network.

In other words, all members of a community must be reachable through other members of the same community (connectedness). At the same time we expect that nodes that belong to a community have a higher probability to link to the other members of that community than to nodes that do not belong to the same community (density). While this hypothesis considerably narrows what would be considered a community, it does not uniquely define it. Indeed, as we discuss below, several community definitions are consistent with H2.

Figure 9.4 Connectedness and Density Hypothesis
Communities are locally dense connected subgraphs in a network. This expectation relies on two distinct hypotheses:

Connectedness Hypothesis
Each community corresponds to a connected subgraph, like the subgraphs formed by the orange, green or the purple nodes. Consequently, if a network consists of two isolated components, each community is limited to only one component. The hypothesis also implies that on the same component a community cannot consist of two subgraphs that do not have a link to each other. Consequently, the orange and the green nodes form separate communities.

Density Hypothesis
Nodes in a community are more likely to connect to other members of the same community than to nodes in other communities. The orange, the green and the purple nodes satisfy this expectation.

Maximum Cliques

One of the first papers on community structure, published in 1949, defined a community as a group of individuals whose members all know each other [5]. In graph theoretic terms this means that a community is a complete subgraph, or a clique. A clique automatically satisfies H2: it is a connected subgraph with maximal link density. Yet, viewing communities as cliques has several drawbacks:

• While triangles are frequent in networks, larger cliques are rare.
• Requiring a community to be a complete subgraph may be too restrictive, missing many other legitimate communities. For example, none of the communities of Figures 9.2 and 9.3 correspond to complete subgraphs.


Strong and Weak Communities

To relax the rigidity of cliques, consider a connected subgraph C of $N_C$ nodes in a network. The internal degree $k_i^{\mathrm{int}}$ of node i is the number of links that connect i to other nodes in C. The external degree $k_i^{\mathrm{ext}}$ is the number of links that connect i to the rest of the network. If $k_i^{\mathrm{ext}}=0$, each neighbor of i is within C, hence C is a good community for node i. If $k_i^{\mathrm{int}}=0$, then node i should be assigned to a different community. These definitions allow us to distinguish two kinds of communities (Figure 9.5):

• Strong Community
C is a strong community if each node within C has more links within the community than with the rest of the graph [15,16]. Specifically, a subgraph C forms a strong community if for each node i ∈ C,

$$k_i^{\mathrm{int}}(C) > k_i^{\mathrm{ext}}(C). \tag{9.1}$$

• Weak Community
C is a weak community if the total internal degree of a subgraph exceeds its total external degree [16]. Specifically, a subgraph C forms a weak community if

$$\sum_{i \in C} k_i^{\mathrm{int}}(C) > \sum_{i \in C} k_i^{\mathrm{ext}}(C). \tag{9.2}$$
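The two conditions (9.1) and (9.2) can be checked directly on a small graph. The sketch below is only an illustration: the six-node network, the helper names (`build_adjacency`, `degrees`, `is_strong`, `is_weak`) and the two candidate groups are assumptions made for this example, not taken from Figure 9.5, and the code assumes the candidate subgraph is already connected.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degrees(adj, community, node):
    """Return (k_int, k_ext) of `node` with respect to `community`."""
    k_int = sum(1 for nbr in adj[node] if nbr in community)
    return k_int, len(adj[node]) - k_int


def is_strong(adj, community):
    """Condition (9.1): every member has k_int > k_ext."""
    return all(k_int > k_ext
               for k_int, k_ext in (degrees(adj, community, i) for i in community))


def is_weak(adj, community):
    """Condition (9.2): total internal degree exceeds total external degree."""
    pairs = [degrees(adj, community, i) for i in community]
    return sum(p[0] for p in pairs) > sum(p[1] for p in pairs)


# Hypothetical example: two triangles joined by the links (2, 3) and (2, 4).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3), (2, 4)]
adj = build_adjacency(edges)
left, right = {0, 1, 2}, {3, 4, 5}

print("right strong:", is_strong(adj, right))  # every node has k_int > k_ext
print("left strong: ", is_strong(adj, left))   # node 2 has k_int = k_ext = 2
print("left weak:   ", is_weak(adj, left))     # totals: 6 internal vs 2 external
```

In this example node 2 has as many external as internal links, so the left triangle fails (9.1) while still satisfying (9.2).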

A weak community relaxes the strong community requirement by allowing some nodes to violate (9.1). In other words, the inequality (9.2) applies to the community as a whole rather than to each node individually. Note that each clique is a strong community, and each strong community is a weak community. The converse is generally not true (Figure 9.5).

The community definitions discussed above (cliques, strong and weak communities) refine our notions of communities. At the same time they indicate that we do have some freedom in defining communities.

Figure 9.5 Defining Communities

(a) Cliques
A clique corresponds to a complete subgraph. The highest order clique of this network is a square, shown in orange. There are several three-node cliques on this network. Can you find them?

(b) Strong Communities
A strong community, defined in (9.1), is a connected subgraph whose nodes have more links to other nodes in the same community than to nodes that belong to other communities. Such a strong community is shown in purple. There are additional strong communities on the graph. Can you find at least two more?

(c) Weak Communities
A weak community defined in (9.2) is a subgraph whose nodes’ total internal degree exceeds their total external degree. The green nodes represent one of the several possible weak communities of this network.

NUMBER OF COMMUNITIES

How many ways can we group the nodes of a network into communities? To answer this question consider the simplest community finding problem, called graph bisection: We aim to divide a network into two non-overlapping subgraphs, such that the number of links between the nodes in the two groups, called the cut size, is minimized (BOX 9.1).

Graph Partitioning

We can solve the graph bisection problem by inspecting all possible divisions into two groups and choosing the one with the smallest cut size. To determine the computational cost of this brute force approach we note that the number of distinct ways we can partition a network of N nodes into groups of $N_1$ and $N_2$ nodes is

$$\frac{N!}{N_1!\,N_2!}. \tag{9.3}$$
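The brute-force strategy can be written down in a few lines, which also makes the count (9.3) tangible. This is a minimal sketch under stated assumptions: the six-node graph (two triangles joined by one link) and the function names `cut_size` and `brute_force_bisection` are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import comb


def cut_size(edges, group):
    """Number of links with exactly one endpoint inside `group`."""
    return sum(1 for u, v in edges if (u in group) != (v in group))


def brute_force_bisection(nodes, edges, n1):
    """Inspect every way of choosing n1 nodes for the first group."""
    best_cut, best_group, inspected = None, None, 0
    for group in combinations(nodes, n1):
        inspected += 1
        c = cut_size(edges, set(group))
        if best_cut is None or c < best_cut:
            best_cut, best_group = c, set(group)
    return best_cut, best_group, inspected


# Hypothetical test graph: triangles {0,1,2} and {3,4,5} joined by the link (2, 3).
nodes = range(6)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

best_cut, best_group, inspected = brute_force_bisection(nodes, edges, 3)
print("smallest cut size:", best_cut, "for group", sorted(best_group))
print("divisions inspected:", inspected, "= N!/(N1! N2!) =", comb(6, 3))
```

The loop visits every choice of the first group, so the number of inspected divisions equals (9.3) exactly; this is what makes the approach impractical beyond very small networks.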

BOX 9.1 GRAPH PARTITIONING

Chip designers face a problem of exceptional complexity: They need to place on a chip 2.5 billion transistors such that their wires do not intersect. To simplify the problem they first partition the wiring diagram of an integrated circuit (IC) into smaller subgraphs, chosen such that the number of links between them is minimal. Then they lay out different blocks of an IC individually, and reconnect these blocks. A similar problem is encountered in parallel computing, when a large computational problem is partitioned into subtasks and assigned to individual chips. The assignment must minimize the typically slow communication between the processors.

The problem faced by chip designers or software engineers is called graph partitioning in computer science [17]. The algorithms developed for this purpose, like the widely used Kernighan-Lin algorithm (Figure 9.6), are the predecessors of the community finding algorithms discussed in this chapter.

There is an important difference between graph partitioning and community detection: Graph partitioning divides a network into a predefined number of smaller subgraphs. In contrast community detection aims to uncover the inherent community structure of a network. Consequently in most community detection algorithms the number and the size of the communities is not predefined, but needs to be discovered by inspecting the network’s wiring diagram.

Figure 9.6 Kernighan-Lin Algorithm
The best known algorithm for graph partitioning was proposed in 1970 [18]. We illustrate this with graph bisection, which starts by randomly partitioning the network into two groups of predefined sizes. Next we select a node pair (i,j), where i and j belong to different groups, and swap them, recording the resulting change in the cut size. By testing all (i,j) pairs we identify the pair that results in the largest reduction of the cut size, like the pair highlighted in (a). By swapping them we arrive at the partition shown in (b). In some implementations of the algorithm, if no pair reduces the cut size, we swap the pair that increases the cut size the least.
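The swap procedure described in the caption of Figure 9.6 can be sketched as follows. This is a simplified greedy variant written for illustration, not the full Kernighan-Lin algorithm of [18]: it stops as soon as no swap reduces the cut size and does not implement the variant that accepts the least harmful swap. The graph, the starting partition and the name `greedy_swap_bisection` are assumptions of this example.

```python
def cut_size(edges, group_a):
    """Number of links running between group_a and the rest."""
    return sum(1 for u, v in edges if (u in group_a) != (v in group_a))


def greedy_swap_bisection(edges, group_a, group_b):
    """Repeatedly apply the single swap that reduces the cut size the most."""
    a, b = set(group_a), set(group_b)
    current = cut_size(edges, a)
    while True:
        best = None
        for i in sorted(a):
            for j in sorted(b):
                trial = (a - {i}) | {j}          # group a after swapping i and j
                c = cut_size(edges, trial)
                if best is None or c < best[0]:
                    best = (c, i, j)
        if best is None or best[0] >= current:   # no swap improves the cut
            return a, b, current
        current, i, j = best
        a = (a - {i}) | {j}
        b = (b - {j}) | {i}


# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the link (2, 3),
# started from a deliberately poor partition with cut size 5.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
final_a, final_b, final_cut = greedy_swap_bisection(edges, {0, 1, 3}, {2, 4, 5})
print(sorted(final_a), sorted(final_b), "cut size:", final_cut)
```

Here a single swap of nodes 3 and 2 lowers the cut size from 5 to 1, after which no further swap helps. Each pass tests every cross-group pair, so the group sizes stay fixed throughout, as graph partitioning requires.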

Using Stirling's formula $n! \simeq \sqrt{2\pi n}\,(n/e)^n$ we can write (9.3) as

$$\frac{N!}{N_1!\,N_2!} \simeq \frac{\sqrt{2\pi N}\,(N/e)^{N}}{\sqrt{2\pi N_1}\,(N_1/e)^{N_1}\,\sqrt{2\pi N_2}\,(N_2/e)^{N_2}} \sim \frac{N^{N+1/2}}{N_1^{N_1+1/2}\,N_2^{N_2+1/2}}. \tag{9.4}$$

To simplify the problem let us set the goal of dividing the network into two equal sizes $N_1 = N_2 = N/2$. In this case (9.4) becomes

$$\frac{2^{N+1}}{\sqrt{N}} = e^{(N+1)\ln 2 - \frac{1}{2}\ln N}, \tag{9.5}$$

indicating that the number of bisections increases exponentially with the size of the network.

To illustrate the implications of (9.5) consider a network with ten nodes which we bisect into two subgraphs of size $N_1 = N_2 = 5$. According to (9.3) we need to check 252 bisections to find the one with the smallest cut size. Let us assume that our computer can inspect these 252 bisections in one millisecond ($10^{-3}$ sec). If we next wish to bisect a network with a hundred nodes into groups with $N_1 = N_2 = 50$, according to (9.3) we need to check approximately $10^{29}$ divisions, requiring about $10^{16}$ years on the same computer. Therefore our brute-force strategy is bound to fail, as it is impossible to inspect all bisections for even a modest size network.

Community Detection

While in graph partitioning the number and the size of communities is predefined, in community detection both parameters are unknown. We call a partition a division of a network into an arbitrary number of groups, such that each node belongs to one and only one group. The number of possible partitions follows [19-22]

$$B_N = \frac{1}{e}\sum_{j=0}^{\infty}\frac{j^N}{j!}. \tag{9.6}$$

As Figure 9.7 indicates, $B_N$ grows faster than exponentially with the network size for large N.

Figure 9.7 Number of Partitions
The number of partitions of a network of size N is provided by the Bell number (9.6). The figure compares the Bell number $B_N$ to an exponential function $e^N$, illustrating that the number of possible partitions grows faster than exponentially. Given that there are over $10^{40}$ partitions for a network of size N=50, brute-force approaches that aim to identify communities by inspecting all possible partitions are computationally infeasible.

Equations (9.5) and (9.6) signal the fundamental challenge of community identification: The number of possible ways we can partition a network into communities grows exponentially or faster with the network size N. Therefore it is impossible to inspect all partitions of a large network (BOX 9.2). In summary, our notion of communities rests on the expectation that each community corresponds to a locally dense connected subgraph. This hypothesis leaves room for numerous community definitions, from cliques to weak and strong communities. Once we adopt a definition, we could identify communities by inspecting all possible partitions of a network, selecting the one that best satisfies our definition. Yet, the number of partitions grows faster than exponentially with the network size, making such brute-force approaches computationally infeasible. We therefore need algorithms that can identify communities without inspecting all partitions. This is the subject of the next sections.
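The numbers quoted in this section can be reproduced with a few lines of arithmetic. The sketch below is an illustration only: the helper `bell_numbers` uses the Bell triangle recurrence, a standard alternative to evaluating the infinite sum in (9.6), and the variable names are assumptions of this example.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max] computed with the Bell triangle."""
    row, bells = [1], [1]
    for _ in range(n_max):
        new_row = [row[-1]]                 # each row starts with the last entry above
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells


# Bisections according to (9.3).
bisections_10 = math.comb(10, 5)
bisections_100 = math.comb(100, 50)

# Time for N = 100 if 252 bisections take one millisecond.
years_100 = bisections_100 / bisections_10 * 1e-3 / SECONDS_PER_YEAR

# Exact count versus the scaling form (9.5), which drops constant prefactors.
ratio = bisections_100 / (2 ** 101 / math.sqrt(100))

bells = bell_numbers(50)

print("bisections, N=10: ", bisections_10)
print("bisections, N=100: %.3e" % bisections_100)
print("years needed:      %.2e" % years_100)
print("exact / (9.5):     %.3f" % ratio)
print("B_50 = %.3e, e^50 = %.3e" % (bells[50], math.exp(50)))
```

The ratio of the exact count to (9.5) stays close to $1/\sqrt{2\pi}$, the constant dropped in the last step of (9.4), so (9.5) captures the exponential growth rather than the precise value.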


BOX 9.2 NP COMPLETENESS

How long does it take to execute an algorithm? The answer is not given in minutes and hours, as the execution time depends on the speed of the computer on which we run the algorithm. We count instead the number of computations the algorithm performs. For example an algorithm that aims to find the largest number in a list of N numbers has to compare each number in the list with the maximum found so far. Consequently its execution time is proportional to N. In general, we call an algorithm polynomial if its execution time follows $N^x$.

An algorithm whose execution time is proportional to $N^3$ is slower on any computer than an algorithm whose execution time is N. But this difference dwindles in significance compared to an exponential algorithm, whose execution time increases as $2^N$. For example, if an algorithm whose execution time is proportional to N takes a second for N = 100 elements, then an $N^3$ algorithm takes almost three hours on the same computer. Yet an exponential algorithm ($2^N$) will take $10^{20}$ years to complete.

A problem that an algorithm can solve in polynomial time is called a class P problem. Several computational problems encountered in network science have no known polynomial time algorithms, and the available algorithms require exponential running time. Yet, the correctness of the solution can be checked quickly, i.e. in polynomial time. Such problems, called NP-complete, include the traveling salesman problem (Figure 9.8), the graph coloring problem, maximum clique identification, partitioning a graph into subgraphs of specific type, and the vertex cover problem (BOX 7.4).

The ramifications of NP-completeness have captured the fascination of the popular media as well. Charlie Eppes, the main character of the CBS TV series Numbers, spends the last three months of his mother's life trying to solve an NP-complete problem, convinced that the solution will cure her disease. Similarly the motive for a double homicide in the CBS TV series Elementary is the search for a solution of an NP-complete problem, driven by its enormous value for cryptography.

Figure 9.8 Night at the Movies
Traveling Salesman is a 2012 intellectual thriller about four mathematicians who have solved the P versus NP problem, and are now struggling with the implications of their discovery. The P versus NP problem asks whether every problem whose solution can be verified in polynomial time can also be solved in polynomial time. This is one of the seven Millennium Prize Problems, hence a $1,000,000 prize waits for the first correct solution. The Traveling Salesman refers to a salesman who tries to find the shortest route to visit several cities exactly once, at the end returning to his starting city. While the problem appears simple, it is in fact NP-complete: we need to try all combinations to find the shortest path.


SECTION 9.3

HIERARCHICAL CLUSTERING

To uncover the community structure of large real networks we need algorithms whose running time grows polynomially with N. Hierarchical clustering, the topic of this section, helps us achieve this goal. The starting point of hierarchical clustering is a similarity matrix, whose elements xij indicate the distance of node i from node j. In community identification the similarity is extracted from the relative position of nodes i and j within the network. Once we have xij, hierarchical clustering iteratively identifies groups of nodes with high similarity. We can use two different procedures to achieve this: agglomerative algorithms merge nodes with high similarity into the same community, while divisive algorithms isolate communities by removing low similarity links that tend to connect communities. Both procedures generate a hierarchical tree, called a dendrogram, that predicts the possible community partitions. Next we explore the use of agglomerative and divisive algorithms to identify communities in networks.

AGGLOMERATIVE PROCEDURES: THE RAVASZ ALGORITHM

We illustrate the use of agglomerative hierarchical clustering for community detection by discussing the Ravasz algorithm, proposed to identify functional modules in metabolic networks [11]. The algorithm consists of the following steps:

Step 1: Define the Similarity Matrix
In an agglomerative algorithm similarity should be high for node pairs that belong to the same community and low for node pairs that belong to different communities. In a network context nodes that connect to each other and share neighbors likely belong to the same community, hence their xij should be large. The topological overlap matrix (Figure 9.9) [11]

$$x_{ij}^{o} = \frac{J(i,j)}{\min(k_i,k_j) + 1 - \Theta(A_{ij})} \tag{9.7}$$

captures this expectation. Here Θ(x) is the Heaviside step function, which is zero for x≤0 and one for x>0; J(i, j) is the number of common neighbors of nodes i and j, to which we add one (+1) if there is a direct link between i and j; min(ki,kj) is the smaller of the degrees ki and kj. Consequently:

• $x_{ij}^{o}=1$ if nodes i and j have a link to each other and have the same neighbors, like A and B in Figure 9.9a.
• $x_{ij}^{o}=0$ if i and j do not have common neighbors, nor do they link to each other, like A and E.
• Members of the same dense local network neighborhood have high topological overlap, like nodes H, I, J, K or E, F, G.

Figure 9.9 The Ravasz Algorithm
The agglomerative hierarchical clustering algorithm proposed by Ravasz was designed to identify functional modules in metabolic networks, but it can be applied to arbitrary networks.

(a) Topological Overlap
A small network illustrating the calculation of the topological overlap $x_{ij}^{o}$. For each node pair i and j we calculate the overlap (9.7). The obtained $x_{ij}^{o}$ for each connected node pair is shown on each link. Note that $x_{ij}^{o}$ can be nonzero for nodes that do not link to each other, but have a common neighbor. For example, $x_{ij}^{o}=1/3$ for C and E.

(b) Topological Overlap Matrix
The topological overlap matrix $x_{ij}^{o}$ for the network shown in (a). The rows and columns of the matrix were reordered after applying average linkage clustering, placing next to each other nodes with the highest topological overlap. The colors denote the degree of topological overlap between each node pair, as calculated in (a). Cutting the dendrogram at the orange line recovers the three modules built into the network. The dendrogram indicates that the EFG and the HIJK modules are closer to each other than they are to the ABC module.

After [11].
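The topological overlap (9.7) is straightforward to evaluate from an adjacency list. The sketch below is an illustration under stated assumptions: the five-node graph is hypothetical (it is not the network of Figure 9.9a), and the function names are chosen for this example. Exact fractions are used so the results can be compared with values such as 1/3.

```python
from fractions import Fraction


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def topological_overlap(adj, i, j):
    """Evaluate (9.7) for the node pair (i, j)."""
    linked = 1 if j in adj[i] else 0        # A_ij is 0 or 1, so Theta(A_ij) = A_ij
    common = len(adj[i] & adj[j])           # number of shared neighbors
    J = common + linked                     # J(i, j) as defined in the text
    k_min = min(len(adj[i]), len(adj[j]))
    return Fraction(J, k_min + 1 - linked)


# Hypothetical graph: a triangle A-B-C with a tail C-D-E.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = build_adjacency(edges)

print("x(A,B) =", topological_overlap(adj, "A", "B"))  # linked, same neighbors
print("x(A,D) =", topological_overlap(adj, "A", "D"))  # not linked, share C
print("x(A,E) =", topological_overlap(adj, "A", "E"))  # nothing in common
```

The three printed pairs correspond to the three cases listed above: full overlap, partial overlap through a common neighbor, and zero overlap.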

Step 2: Decide Group Similarity
As nodes are merged into small communities, we must measure how similar two communities are. Three approaches, called single, complete and average cluster similarity, are frequently used to calculate the community similarity from the node-similarity matrix xij (Figure 9.10). The Ravasz algorithm uses the average cluster similarity method, defining the similarity of two communities as the average of xij over all node pairs i and j that belong to distinct communities (Figure 9.10d).

Step 3: Apply Hierarchical Clustering
The Ravasz algorithm uses the following procedure to identify the communities:

1. Assign each node to a community of its own and evaluate xij for all node pairs.
2. Find the community pair or the node pair with the highest similarity and merge them into a single community.
3. Calculate the similarity between the new community and all other communities.
4. Repeat Steps 2 and 3 until all nodes form a single community.

Step 4: Dendrogram

The pairwise mergers of Step 3 will eventually pull all nodes into a single community. We can use a dendrogram to extract the underlying community organization. The dendrogram visualizes the order in which the nodes are assigned to specific communities. For example, the dendrogram of Figure 9.9b tells us that the algorithm first merged nodes A with B, K with J and E with F, as each of these pairs has $x_{ij}^{o}=1$. Next node C was added to the (A, B) community, I to (K, J) and G to (E, F).

To identify the communities we must cut the dendrogram. Hierarchical clustering does not tell us where that cut should be. Using for example the cut indicated as a dashed line in Figure 9.9b, we recover the three obvious communities (ABC, EFG, and HIJK).

Applied to the E. coli metabolic network (Figure 9.3a), the Ravasz algorithm identifies the nested community structure of bacterial metabolism. To check the biological relevance of these communities, we color-coded the branches of the dendrogram according to the known biochemical classification of each metabolite. As shown in Figure 9.3b, substrates with similar biochemical role tend to be located on the same branch of the tree. In other words the known biochemical classification of these metabolites confirms the biological relevance of the communities extracted from the network topology.

Figure 9.10 Cluster Similarity
In agglomerative clustering we need to determine the similarity of two communities from the node similarity matrix xij. We illustrate this procedure for a set of points whose similarity xij is the physical distance rij between them. In networks xij corresponds to some network-based distance measure, like $x_{ij}^{o}$ defined in (9.7).

(a) Similarity Matrix
Seven nodes forming two distinct communities, 1 (nodes A, B, C) and 2 (nodes D, E, F, G). The table shows the distance rij between each node pair, acting as the similarity xij.

xij = rij      D      E      F      G
A           2.75   2.22   3.46   3.08
B           3.38   2.68   3.97   3.40
C           2.31   1.59   2.88   2.34

(b) Single Linkage Clustering
The similarity between communities 1 and 2 is the smallest of all xij, where i and j are in different communities. Hence the similarity is x12=1.59, corresponding to the distance between nodes C and E.

(c) Complete Linkage Clustering
The similarity between two communities is the maximum of xij, where i and j are in distinct communities. Hence x12=3.97.

(d) Average Linkage Clustering
The similarity between two communities is the average of xij over all node pairs i and j that belong to different communities. This is the procedure implemented in the Ravasz algorithm, providing x12=2.84.

Computational Complexity

How many computations do we need to run the Ravasz algorithm? The algorithm has four steps, each with its own computational complexity:

Step 1: The calculation of the similarity matrix $x_{ij}^{o}$ requires us to compare $N^2$ node pairs, hence the number of computations scales as $N^2$. In other words its computational complexity is $O(N^2)$.

Step 2: Group similarity requires us to determine in each step the distance of the new cluster to all other clusters. Doing this N times requires $O(N^2)$ calculations.

Steps 3 & 4: The construction of the dendrogram can be performed in $O(N \log N)$ steps.

Combining Steps 1-4, we find that the number of required computations scales as $O(N^2) + O(N^2) + O(N \log N)$. As the slowest step scales as $O(N^2)$, the algorithm’s computational complexity is $O(N^2)$. Hence hierarchical clustering is much faster than the brute force approach, which generally scales as $O(e^N)$.

Figure 9.11 Centrality Measures
Divisive algorithms require a centrality measure that is high for node pairs that belong to different communities and is low for node pairs in the same community. Two frequently used measures can achieve this:

(a) Link Betweenness
Link betweenness captures the role of each link in information transfer. Hence xij is proportional to the number of shortest paths between all node pairs that run along the link (i,j). Consequently, inter-community links, like the central link in the figure with xij=0.57, have large betweenness. The calculation of link betweenness scales as $O(LN)$, or $O(N^2)$ for a sparse network [23].

(b) Random-Walk Betweenness
A pair of nodes m and n are chosen at random. A walker starts at m, following each adjacent link with equal probability until it reaches n. Random walk betweenness xij is the probability that the link i→j was crossed by the walker after averaging over all possible choices for the starting nodes m and n. The calculation requires the inversion of an N×N matrix, with $O(N^3)$ computational complexity, and averaging the flows over all node pairs, with $O(LN^2)$. Hence the total computational complexity of random walk betweenness is $O[(L + N) N^2]$, or $O(N^3)$ for a sparse network.
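Returning to the cluster similarity rules of Step 2, the three values quoted in Figure 9.10 can be recomputed from its distance table. The sketch below is only an illustration: the dictionary layout and the function names (`single_linkage`, `complete_linkage`, `average_linkage`) are assumptions of this example, while the twelve distances are those listed in Figure 9.10a.

```python
# Distances r_ij between community 1 (A, B, C) and community 2 (D, E, F, G),
# as listed in the table of Figure 9.10a.
distance = {
    ("A", "D"): 2.75, ("A", "E"): 2.22, ("A", "F"): 3.46, ("A", "G"): 3.08,
    ("B", "D"): 3.38, ("B", "E"): 2.68, ("B", "F"): 3.97, ("B", "G"): 3.40,
    ("C", "D"): 2.31, ("C", "E"): 1.59, ("C", "F"): 2.88, ("C", "G"): 2.34,
}


def pair_values(x, cluster_1, cluster_2):
    """All x_ij with i in cluster_1 and j in cluster_2."""
    return [x[(i, j)] for i in cluster_1 for j in cluster_2]


def single_linkage(x, c1, c2):
    return min(pair_values(x, c1, c2))


def complete_linkage(x, c1, c2):
    return max(pair_values(x, c1, c2))


def average_linkage(x, c1, c2):
    values = pair_values(x, c1, c2)
    return sum(values) / len(values)


community_1 = ["A", "B", "C"]
community_2 = ["D", "E", "F", "G"]

single = single_linkage(distance, community_1, community_2)
complete = complete_linkage(distance, community_1, community_2)
average = average_linkage(distance, community_1, community_2)

print("single linkage:   %.2f" % single)
print("complete linkage: %.2f" % complete)
print("average linkage:  %.2f" % average)
```

Average linkage, the rule used by the Ravasz algorithm, needs every cross-community pair, which is why updating the group similarity in Step 2 contributes to the $O(N^2)$ cost.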

DIVISIVE PROCEDURES: THE GIRVAN-NEWMAN ALGORITHM

Divisive procedures systematically remove the links connecting nodes that belong to different communities, eventually breaking a network into isolated communities. We illustrate their use by introducing an algorithm proposed by Michelle Girvan and Mark Newman [9,23], consisting of the following steps:

Step 1: Define Centrality
While in agglomerative algorithms xij selects node pairs that belong to the same community, in divisive algorithms xij, called centrality, selects node pairs that are in different communities. Hence we want xij to be high if nodes i and j belong to different communities and low if they are in the same community. Two centrality measures that satisfy this expectation are discussed in Figure 9.11. The faster of the two is link betweenness, defining xij as the number of shortest paths that go through the link (i, j). Links connecting different communities are expected to have large xij while links within a community have small xij.
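Link betweenness can be computed with one breadth-first search per node, accumulating path counts from the farthest nodes back to the source. The sketch below is a minimal illustration of that idea, not the implementation used in [9,23]; the test graph (two triangles joined by one link) and the function names are assumptions of this example. Shortest paths that tie are shared equally among the links they use.

```python
from collections import deque


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def link_betweenness(adj):
    """Number of shortest paths (over unordered node pairs) crossing each link."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma = {s: 0}, {s: 1.0}       # distance and shortest-path counts
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):            # farthest nodes first
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return {link: value / 2.0 for link, value in score.items()}  # each pair seen twice


def connected_components(adj):
    seen, components = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        components.append(comp)
    return components


# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the link (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = build_adjacency(edges)
betweenness = link_betweenness(adj)
top_link = max(betweenness, key=betweenness.get)

# Removing the highest-betweenness link separates the two triangles.
u, v = tuple(top_link)
adj[u].discard(v)
adj[v].discard(u)
components = connected_components(adj)
print("removed link:", sorted(top_link), "components:", [sorted(c) for c in components])
```

All nine shortest paths between the two triangles cross the link (2, 3), so it has the largest betweenness, and removing it isolates the two groups.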

Step 2: Hierarchical Clustering
The final steps of a divisive algorithm mirror those we used in agglomerative clustering (Figure 9.12):

1. Compute the centrality xij of each link.
2. Remove the link with the largest centrality. In case of a tie, choose one link randomly.
3. Recalculate the centrality of each link for the altered network.
4. Repeat steps 2 and 3 until all links are removed.

Girvan and Newman applied their algorithm to Zachary’s Karate Club (Figure 9.2a), finding that the predicted communities matched almost perfectly the two groups after the break-up. Only node 3 was classified incorrectly.

Computational Complexity

The rate limiting step of divisive algorithms is the calculation of centrality. Consequently the algorithm’s computational complexity depends on which centrality measure we use. The most efficient is link betweenness, with $O(LN)$ [24,25,26] (Figure 9.11a). Step 3 of the algorithm introduces an additional factor L in the running time, hence the algorithm scales as $O(L^2N)$, or $O(N^3)$ for a sparse network.

Figure 9.12 The Girvan-Newman Algorithm

(a) The divisive hierarchical algorithm of Girvan and Newman uses link betweenness (Figure 9.11a) as centrality. In the figure the link weights, assigned proportionally to xij, indicate that links connecting different communities have the highest xij. Indeed, each shortest path between these communities must run through them.

(b)-(d) The sequence of images illustrates how the algorithm removes one-by-one the three highest xij links, leaving three isolated communities behind. Note that betweenness needs to be recalculated after each link removal.

(e) The dendrogram generated by the Girvan-Newman algorithm. The cut at level 3, shown as an orange dotted line, reproduces the three communities present in the network.

(f) The modularity function, M, introduced in SECTION 9.4, helps us select the optimal cut. Its maximum agrees with our expectation that the best cut is at level 3, as shown in (e).

HIERARCHY IN REAL NETWORKS

Hierarchical clustering raises two fundamental questions:

Nested Communities
First, it assumes that small modules are nested into larger ones. These nested communities are well captured by the dendrogram (Figures 9.9b and 9.12e). How do we know, however, if such hierarchy is indeed present in a network? Could this hierarchy be imposed by our algorithms, whether or not the underlying network has a nested community structure?

Communities and the Scale-Free Property
Second, the density hypothesis states that a network can be partitioned into a collection of subgraphs that are only weakly linked to other subgraphs. How can we have isolated communities in a scale-free network, if the hubs inevitably link multiple communities?

The hierarchical network model, whose construction is shown in Figure 9.13, resolves the conflict between communities and the scale-free property and offers intuition about the structure of nested hierarchical communities. The obtained network has several key characteristics:

Scale-free Property
The hierarchical model generates a scale-free network with degree exponent (Figure 9.14a, ADVANCED TOPICS 9.A)

$$\gamma = 1 + \frac{\ln 5}{\ln 4} = 2.161.$$

Size Independent Clustering Coefficient
While for the Erdős-Rényi and the Barabási-Albert models the clustering coefficient decreases with N (SECTION 5.9), for the hierarchical network we have C=0.743 independent of the network size (Figure 9.14c). Such N-independent clustering coefficient has been observed in metabolic networks [11].

Hierarchical Modularity
The model consists of numerous small communities that form larger communities, which again combine into ever larger communities. The quantitative signature of this nested hierarchical modularity is the dependence of a node’s clustering coefficient on the node’s degree [11,27,28]

$$C(k) \sim k^{-1}. \tag{9.8}$$

In other words, the higher a node’s degree, the smaller is its clustering coefficient.

Figure 9.13 Hierarchical Network
The iterative construction of a deterministic hierarchical network.

(a) Start from a fully connected module of five nodes. Note that the diagonal nodes are also connected, but the links are not visible.

(b) Create four identical replicas of the starting module and connect the peripheral nodes of each module to the central node of the original module. This way we obtain a network with N=25 nodes.

(c) Create four replicas of the 25-node module and connect the peripheral nodes again to the central node of the original module, obtaining an N=125-node network. This process is continued indefinitely.

Equation (9.8) captures the way the communities are organized in a network. Indeed, small degree nodes have high C because they reside in dense communities. High degree nodes have small C because they connect to different communities. For example, in Figure 9.13c the nodes at the center of the five-node modules have k=4 and clustering coefficient C=1. Those at the center of a 25-node module have k=20 and C=3/19. Those at the center of the 125-node modules have k=84 and C=3/83.

Hence the higher the degree of a node, the smaller is its C. The hierarchical network model suggests that inspecting C(k) allows us to decide if a network is hierarchical. For the Erdős-Rényi and the

After [27].

Barabási-Albert models C(k) is independent of k, indicating that they do not display hierarchical modularity. To see if hierarchical modularity is present in real systems, we calculated C(k) for ten reference networks, finding that (Figure 9.36): • Only the power grid lacks hierarchical modularity, its C(k) being independent of k (Figure 9.36a). • For the remaining nine networks C(k) decreases with k. Hence in


these networks small nodes are part of small dense communities, while hubs link disparate communities to each other.
• For the scientific collaboration, metabolic, and citation networks C(k) follows (9.8) in the high-k region. The form of C(k) for the Internet, mobile, email, protein interactions, and the WWW needs to be derived individually, as for those C(k) does not follow (9.8). More detailed network models predict C(k)~k^(-β), where β is between 0 and 2 [27,28].

In summary, in principle hierarchical clustering does not require preliminary knowledge about the number and the size of communities. In practice it generates a dendrogram that offers a family of community partitions characterizing the studied network. This dendrogram does not tell us which partition captures best the underlying community structure. Indeed, any cut of the hierarchical tree offers a potentially valid partition (Figure 9.15). This is at odds with our expectation that in each network there is a ground truth, corresponding to a unique community structure.

Figure 9.14 Scaling in Hierarchical Networks
Three quantities characterize the hierarchical network shown in Figure 9.13:
(a) Degree Distribution The scale-free nature of the generated network is illustrated by the scaling of pk with degree exponent γ=1+ln 5/ln 4, shown as a dashed line. See ADVANCED TOPICS 9.A for the derivation of the degree exponent.
(b) Hierarchical Clustering C(k) follows (9.8), shown as a dashed line. The circles show C(k) for a randomly wired scale-free network, obtained from the original model by degree-preserving randomization. The lack of scaling indicates that the hierarchical architecture is lost under rewiring. Hence C(k) captures a property that goes beyond the degree distribution.
(c) Size Independent Clustering Coefficient The dependence of the clustering coefficient C on the network size N. For the hierarchical model C is independent of N (filled symbols), while for the Barabási-Albert model C(N) decreases (empty symbols).

After [27].

Figure 9.15 Ambiguity in Hierarchical Clustering
Hierarchical clustering does not tell us where to cut a dendrogram. Indeed, depending on where we make the cut in the dendrogram of Figure 9.9a, we obtain (b) two, (c) three or (d) four communities. While for a small network we can visually decide which cut captures best the underlying community structure, it is impossible to do so in larger networks. In the next section we discuss modularity, which helps us select the optimal cut.

While there are multiple notions of hierarchy in networks [29,30], inspecting C(k) helps decide if the underlying network has hierarchical modularity. We find that C(k) decreases in most real networks, indicating that most real systems display hierarchical modularity. At the same time C(k) is independent of k for the Erdős-Rényi or Barabási-Albert models, indicating that these canonical models lack a hierarchical organization.
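The construction of Figure 9.13 and the hub clustering values quoted in this section can be checked numerically. What follows is a minimal Python sketch under stated assumptions, not the code used to produce the figures: the function names `hierarchical_network` and `clustering`, the integer node labels, and the choice of node 0 as the central hub are all illustrative.

```python
from itertools import combinations

def hierarchical_network(iterations):
    """Build the deterministic hierarchical model (illustrative sketch).

    Returns (adj, center): adj maps node -> set of neighbors,
    center is the hub of the original module.
    """
    # Step (a): a fully connected five-node module; node 0 is the center.
    adj = {i: set(range(5)) - {i} for i in range(5)}
    center, peripheral = 0, [1, 2, 3, 4]
    for _ in range(iterations):
        n = len(adj)
        base = {u: set(vs) for u, vs in adj.items()}  # snapshot to replicate
        new_peripheral = []
        for r in range(1, 5):                          # four replicas
            offset = r * n
            for u, vs in base.items():
                adj[u + offset] = {v + offset for v in vs}
            # Link the replica's peripheral nodes to the original center.
            for p in peripheral:
                adj[p + offset].add(center)
                adj[center].add(p + offset)
                new_peripheral.append(p + offset)
        peripheral = new_peripheral
    return adj, center

def clustering(adj, node):
    """Local clustering coefficient of a node."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

for it in range(3):
    adj, hub = hierarchical_network(it)
    print(len(adj), len(adj[hub]), clustering(adj, hub))
```

Under these assumptions the three printed rows correspond to N=5, 25 and 125, with hub degrees 4, 20 and 84 and hub clustering 1, 3/19 and 3/83, in line with C(k) decreasing with k.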


SECTION 9.4

MODULARITY

In a randomly wired network the connection pattern between the nodes is expected to be uniform, independent of the network's degree distribution. Consequently these networks are not expected to display systematic local density fluctuations that we could interpret as communities. This expectation inspired the third hypothesis of community organization:

H3: Random Hypothesis
Randomly wired networks lack an inherent community structure.

This hypothesis has some actionable consequences: by comparing the link density of a community with the link density obtained for the same group of nodes in a randomly rewired network, we can decide if the original community corresponds to a dense subgraph, or if its connectivity pattern emerged by chance.

In this section we show that systematic deviations from a random configuration allow us to define a quantity called modularity, which measures the quality of each partition. Hence modularity allows us to decide if a particular community partition is better than some other one. Finally, modularity optimization offers a novel approach to community detection.

MODULARITY
Consider a network with N nodes and L links and a partition into nc communities, each community having Nc nodes connected to each other by Lc links, where c = 1,...,nc. If Lc is larger than the expected number of links between the Nc nodes given the network’s degree sequence, then the nodes of the subgraph Cc could indeed be part of a true community, as expected based on the Density Hypothesis H2 (Figure 9.2). We therefore measure the difference between the network’s real wiring diagram (Aij) and the expected number of links between i and j if the network is randomly wired (pij),

$$M_c = \frac{1}{2L} \sum_{(i,j) \in C_c} (A_{ij} - p_{ij}). \tag{9.9}$$

Here pij can be determined by randomizing the original network, while keeping the expected degree of each node unchanged. Using the degree preserving null model (7.1) we have

$$p_{ij} = \frac{k_i k_j}{2L}. \tag{9.10}$$

If Mc is positive, then the subgraph Cc has more links than expected by chance, hence it represents a potential community. If Mc is zero then the connectivity between the Nc nodes is random, fully explained by the degree distribution. Finally, if Mc is negative, then the nodes of Cc do not form a community.

Using (9.10) we can derive a simpler form for the modularity (9.9) (ADVANCED TOPICS 9.B)

$$M_c = \frac{L_c}{L} - \left(\frac{k_c}{2L}\right)^2, \tag{9.11}$$

where Lc is the total number of links within the community Cc and kc is the total degree of the nodes in this community.

To generalize these ideas to a full network consider the complete partition that breaks the network into nc communities. To see if the local link density of the subgraphs defined by this partition differs from the expected density in a randomly wired network, we define the partition’s modularity by summing (9.11) over all nc communities [23]

$$M = \sum_{c=1}^{n_c} \left[ \frac{L_c}{L} - \left(\frac{k_c}{2L}\right)^2 \right]. \tag{9.12}$$

Figure 9.16 Modularity
To better understand the meaning of modularity, we show M defined in (9.12) for several partitions of a network with two obvious communities.
(a) Optimal Partition The partition with maximal modularity M=0.41 closely matches the two distinct communities.
(b) Suboptimal Partition A partition with a sub-optimal but positive modularity, M=0.22, fails to correctly identify the communities present in the network.
(c) Single Community If we assign all nodes to the same community we obtain M=0, independent of the network structure.
(d) Negative Modularity If we assign each node to a different community, modularity is negative, obtaining M=-0.12.

Modularity has several key properties:
• Higher Modularity Implies Better Partition
The higher is M for a partition, the better is the corresponding community structure. Indeed, in Figure 9.16a the partition with the maximum modularity (M=0.41) accurately captures the two obvious communities. A partition with a lower modularity clearly deviates from these communities (Figure 9.16b). Note that the modularity of a partition cannot exceed one [31,32].
• Zero and Negative Modularity
By taking the whole network as a single community we obtain M=0, as in this case the two terms in the parenthesis of (9.12) are equal (Figure 9.16c). If each node belongs to a separate community, we have Lc=0 and the sum (9.12) has nc negative terms, hence M is negative (Figure 9.16d).

We can use modularity to decide which of the many partitions predicted by a hierarchical method offers the best community structure, selecting the one for which M is maximal. This is illustrated in Figure 9.12f, which shows M for each cut of the dendrogram, finding a clear maximum when

the network breaks into three communities.
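To make (9.12) concrete, here is a small Python sketch that evaluates M for a given partition. It is an illustration under assumptions rather than a reference implementation: the function name `modularity`, the edge-list input and the toy network (two triangles joined by a single link) are chosen only for this example.

```python
def modularity(edges, communities):
    """Modularity M of a partition, following (9.12).

    edges: list of (u, v) pairs of an undirected simple network.
    communities: list of node collections that together cover every node once.
    """
    L = len(edges)
    label = {node: c for c, members in enumerate(communities) for node in members}
    Lc = [0] * len(communities)   # links inside each community
    kc = [0] * len(communities)   # total degree of each community
    for u, v in edges:
        kc[label[u]] += 1
        kc[label[v]] += 1
        if label[u] == label[v]:
            Lc[label[u]] += 1
    return sum(l / L - (k / (2 * L)) ** 2 for l, k in zip(Lc, kc))

# Toy network (an assumption for illustration): two triangles and one bridge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))      # the two triangles
print(modularity(edges, [set(range(6))]))             # single community
print(modularity(edges, [{i} for i in range(6)]))     # one node per community
```

For this toy network the two-triangle partition gives M = 5/14, the single-community partition gives M = 0, and the partition into isolated nodes gives a negative M, mirroring the properties listed for Figure 9.16.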

THE GREEDY ALGORITHM
The expectation that partitions with higher modularity correspond to partitions that more accurately capture the underlying community structure prompts us to formulate our final hypothesis:

H4: Maximal Modularity Hypothesis
For a given network the partition with maximum modularity corresponds to the optimal community structure.

The hypothesis is supported by the inspection of small networks, for which the maximum M agrees with the expected communities (Figures 9.12 and 9.16).

The maximum modularity hypothesis is the starting point of several community detection algorithms, each seeking the partition with the largest modularity. In principle we could identify the best partition by checking M for all possible partitions, selecting the one for which M is largest. Given, however, the exceptionally large number of partitions, this brute-force approach is computationally not feasible. Next we discuss an algorithm that finds partitions with close to maximal M, while bypassing the need to inspect all partitions.

Greedy Algorithm
The first modularity maximization algorithm, proposed by Newman [33], iteratively joins pairs of communities if the move increases the partition's modularity. The algorithm follows these steps:
1. Assign each node to a community of its own, starting with N communities of single nodes.
2. Inspect each community pair connected by at least one link and compute the modularity difference ∆M obtained if we merge them. Identify the community pair for which ∆M is the largest and merge them. Note that modularity is always calculated for the full network.
3. Repeat Step 2 until all nodes merge into a single community, recording M for each step.
4. Select the partition for which M is maximal.

To illustrate the predictive power of the greedy algorithm consider the collaboration network between physicists, consisting of N=56,276 scientists in all branches of physics who posted papers on arxiv.org (Figure 9.17).
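The four steps of the greedy algorithm can be sketched in a few lines of Python. This is a naive illustration, not the optimized implementation of ONLINE RESOURCE 9.1: it assumes a simple, connected, undirected network given as an edge list without self-loops or duplicate links, it evaluates the modularity change of a merger as l_AB/L − k_A k_B/(2L²), the standard expression for merging two communities, and the name `greedy_communities` is hypothetical.

```python
def greedy_communities(edges):
    """Naive greedy modularity maximization; returns (best_partition, best_M)."""
    L = len(edges)
    members, k, links = {}, {}, {}
    # Step 1: every node starts in its own community.
    for u, v in edges:
        for x in (u, v):
            members.setdefault(x, {x})
            k[x] = k.get(x, 0) + 1            # total degree of community x
        key = frozenset((u, v))
        links[key] = links.get(key, 0) + 1    # links between two communities
    M = -sum((deg / (2 * L)) ** 2 for deg in k.values())
    best_M, best = M, [set(s) for s in members.values()]

    def delta(pair):
        # Modularity change if the two communities in `pair` are merged.
        a, b = tuple(pair)
        return links[pair] / L - k[a] * k[b] / (2 * L * L)

    # Steps 2-3: merge the connected pair with the largest delta M, repeat.
    while links:
        pair = max(links, key=delta)
        dM = delta(pair)
        a, b = tuple(pair)
        members[a] |= members.pop(b)
        k[a] += k.pop(b)
        del links[pair]
        for other in [p for p in links if b in p]:
            (c,) = other - {b}
            new = frozenset((a, c))
            links[new] = links.get(new, 0) + links.pop(other)
        M += dM
        # Step 4: remember the partition with the largest M seen so far.
        if M > best_M:
            best_M, best = M, [set(s) for s in members.values()]
    return best, best_M

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
partition, M = greedy_communities(edges)
print(sorted(sorted(c) for c in partition), M)
```

On this toy network of two triangles joined by one link the sketch recovers the two triangles as communities, with M = 5/14. Each iteration scans all connected community pairs, so this version is meant for small networks only.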
The greedy algorithm predicts about 600 communities with peak modularity M = 0.713. Four of these communities are very large, together containing 77% of all nodes (Figure 9.17a). In the largest community 93% of the authors publish in condensed matter physics while 87% of the authors in the second largest community publish in high energy physics, indicating that each community contains physicists of similar professional interests. The accuracy of the greedy algorithm is also illustrated in Figure 9.2a, showing that the community structure with the highest M for the Zachary Karate Club accurately captures the club’s subsequent split.

Computational Complexity
Since the calculation of each ∆M can be done in constant time, Step 2 of the greedy algorithm requires O(L) computations. After deciding which communities to merge, the update of the matrix can be done in a worst-case time O(N). Since the algorithm requires N–1 community mergers, its complexity is O[(L + N)N], or O(N²) on a sparse graph. Optimized implementations reduce the algorithm’s complexity to O(N log² N) (ONLINE RESOURCE 9.1).

Figure 9.17 The Greedy Algorithm
(a) Clustering Physicists The community structure of the collaboration network of physicists. The greedy algorithm predicts four large communities, each composed primarily of physicists of similar interest. To see this on each cluster we show the percentage of members who belong to the same subfield of physics. Specialties are determined by the subsection(s) of the e-print archive in which individuals post papers. C.M. indicates condensed matter, H.E.P. high-energy physics, and astro astrophysics. These four large communities coexist with 600 smaller communities, resulting in an overall modularity M=0.713.

(b) Identifying Subcommunities We can identify subcommunities by applying the greedy algorithm to each community, treating them as separate networks. This procedure splits the condensed matter community into many smaller subcommunities, increasing the modularity of the partition to M=0.807.
(c) Research Groups One of these smaller communities is further partitioned, revealing individual researchers and the research groups they belong to.
After [33].

LIMITS OF MODULARITY
Given the important role modularity plays in community identification, we must be aware of some of its limitations.

Resolution Limit
Modularity maximization forces small communities into larger ones [34]. Indeed, if we merge communities A and B into a single community, the network’s modularity changes with (ADVANCED TOPICS 9.B)

$$\Delta M_{AB} = \frac{l_{AB}}{L} - \frac{k_A k_B}{2L^2}, \tag{9.13}$$

where lAB is the number of links that connect the nodes in community A with total degree kA to the nodes in community B with total degree kB. If A and B are distinct communities, they should remain distinct when M is maximized. As we show next, this is not always the case.

Consider the case when kAkB/2L < 1, in which case (9.13) predicts ∆MAB > 0 if there is at least one link between the two communities (lAB ≥ 1).

Hence we must merge A and B to maximize modularity. Assuming for simplicity that kA ~ kB = k, if the total degree of the communities satisfies

$$k \le \sqrt{2L} \tag{9.14}$$

then modularity increases by merging A and B into a single community, even if A and B are otherwise distinct communities. This is an artifact of modularity maximization: if kA and kB are under the threshold (9.14), the expected number of links between them is smaller than one. Hence even a single link between them will force the two communities together when we maximize M. This resolution limit has several consequences:

• Modularity maximization cannot detect communities that are smaller than the resolution limit (9.14). For example, for the WWW sample with L=1,497,134 (Table 2.1) modularity maximization will have difficulties resolving communities with total degree kC ≲ 1,730.
• Real networks contain numerous small communities [36-38]. Given the resolution limit (9.14), these small communities are systematically forced into larger communities, offering a misleading characterization of the underlying community structure.

To avoid the resolution limit we can further subdivide the large communities obtained by modularity optimization [33,34,39]. For example, treating the smaller of the two condensed-matter groups of Figure 9.17a as a separate network and feeding it again into the greedy algorithm, we obtain about 100 smaller communities with an increased modularity M = 0.807 (Figure 9.17b) [33].

Online Resource 9.1
Modularity-based Algorithms
There are several widely used community finding algorithms that maximize modularity.
Optimized Greedy Algorithm: The use of data structures for sparse matrices can decrease the greedy algorithm’s computational complexity to O(N log² N) [35]. See http://cs.unm.edu/~aaron/research/fastmodularity.htm for the code.
Louvain Algorithm: The modularity optimization algorithm achieves a computational complexity of O(L) [2]. Hence it allows us to identify communities in networks with millions of nodes, as illustrated in Figure 9.1. The algorithm is described in ADVANCED TOPICS 9.C. See https://sites.google.com/site/findcommunities/ for the code.

Modularity Maxima
All algorithms based on maximal modularity rely on the assumption that a network with a clear community structure has an optimal partition with a maximal M [40]. In practice we hope that Mmax is an easy to find maximum and that the communities predicted by all other partitions are distinguishable from those corresponding to Mmax. Yet, as we show next, this optimal partition is difficult to identify among a large number of close to optimal partitions.

Consider a network composed of nc subgraphs with comparable link

densities kC ≈ 2L/nc. The best partition should correspond to the one where each cluster is a separate community (Figure 9.18a), in which case M=0.867. Yet, if we merge the neighboring cluster pairs into a single community we obtain a higher modularity M=0.871 (Figure 9.18b). In general (9.13) and (9.14) predict that if we merge a pair of clusters, we change modularity with

$$\Delta M = \frac{l_{AB}}{L} - \frac{2}{n_c^2}. \tag{9.15}$$

In other words the drop in modularity is less than ∆M = −2/nc². For a network with nc = 20 communities, this change is at most ∆M = −0.005, tiny compared to the maximal modularity M≃0.87 (Figure 9.18b). As the number of groups increases, ∆M goes to zero, hence it becomes increasingly difficult to distinguish the optimal partition from the numerous suboptimal alternatives whose modularity is practically indistinguishable from Mmax. In other words, the modularity function is not peaked around a single optimal partition, but has a high modularity plateau (Figure 9.18d).

Figure 9.18 Modularity Maxima
A ring network consisting of 24 cliques, each made of 5 nodes.
(a) The Intuitive Partition The best partition should correspond to the configuration where each cluster is a separate community. This partition has M=0.867.
(b) The Optimal Partition If we combine the clusters into pairs, as illustrated by the node colors, we obtain M=0.871, higher than M obtained for the intuitive partition (a).
(c) Random Partition Partitions with comparable modularity tend to have rather distinct community structure. For example, if we assign each cluster randomly to communities, even clusters that have no links to each other, like the five highlighted clusters, may end up in the same community. The modularity of this random partition is still high, M=0.80, not too far from the optimal M=0.87.
(d) Modularity Plateau The modularity function of the network (a) reconstructed from 997 partitions. The vertical axis gives the modularity M, revealing a high-modularity plateau that consists of numerous low-modularity partitions. We lack, therefore, a clear modularity maximum; instead the modularity function is highly degenerate. After [40].

In summary, modularity offers a first principle understanding of a network's community structure. Indeed, (9.12) incorporates in a compact form a number of essential questions, like what we mean by a community, how we choose the appropriate null model, and how we measure the goodness of a particular partition. Consequently modularity optimization plays a central role in the community finding literature.

At the same time, modularity has several well-known limitations: First, it forces together small weakly connected communities. Second, networks lack a clear modularity maximum, developing instead a modularity plateau containing many partitions with hard to distinguish modularity. This plateau explains why numerous modularity maximization algorithms can rapidly identify a high M partition: They identify one of the numerous partitions with close to optimal M. Finally, analytical calculations and numerical simulations indicate that even random networks contain high modularity partitions, at odds with the random hypothesis H3 that motivated the concept of modularity [41-43].

Modularity optimization is a special case of a larger problem: Finding communities by optimizing some quality function Q. The greedy algorithm and the Louvain algorithm described in ADVANCED TOPICS 9.C assume that Q = M, seeking partitions with maximal modularity. In ADVANCED TOPICS 9.C we also describe the Infomap algorithm, which finds communities by minimizing the map equation L, an entropy-based measure of the partition quality [44-46].
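The resolution limit and the modularity values quoted for the ring of cliques of Figure 9.18 can be reproduced with a short calculation. The Python sketch below is illustrative only: the function names `ring_of_cliques_modularity` and `delta_m` are assumptions, and the ring is assumed to have one link between each pair of neighboring cliques, with the number of cliques divisible by (and larger than) the group size.

```python
import math

def ring_of_cliques_modularity(n_cliques, clique_size, group):
    """M from (9.12) when `group` consecutive cliques form one community.

    Assumes n_cliques is divisible by group and group < n_cliques.
    """
    links_in_clique = clique_size * (clique_size - 1) // 2
    L = n_cliques * (links_in_clique + 1)       # one ring link per clique
    k_clique = 2 * links_in_clique + 2          # total degree of one clique
    n_comm = n_cliques // group
    Lc = group * links_in_clique + (group - 1)  # ring links inside a community
    kc = group * k_clique
    return n_comm * (Lc / L - (kc / (2 * L)) ** 2)

def delta_m(l_ab, k_a, k_b, L):
    """Modularity change when communities A and B are merged, as in (9.13)."""
    return l_ab / L - k_a * k_b / (2 * L ** 2)

M_single = ring_of_cliques_modularity(24, 5, 1)   # each clique on its own
M_pairs = ring_of_cliques_modularity(24, 5, 2)    # neighboring cliques paired
print(round(M_single, 3), round(M_pairs, 3))

# Each clique has total degree 22 and the ring has L = 264 links, so the
# cliques sit below the threshold sqrt(2L) of (9.14) and merging two of
# them raises M slightly.
print(delta_m(1, 22, 22, 264), math.sqrt(2 * 264))

# Resolution threshold for the WWW sample with L = 1,497,134 links.
print(math.sqrt(2 * 1497134))
```

The two partitions give M ≈ 0.867 and M ≈ 0.871. The total gain of about 0.004 comes from twelve pairwise mergers, each with a small positive ∆M, which shows how flat the modularity landscape is near its maximum.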


SECTION 9.5

OVERLAPPING COMMUNITIES

A node is rarely confined to a single community. Consider a scientist, who belongs to the community of scientists that share his professional interests. Yet, he also belongs to a community consisting of family members and relatives and perhaps another community of individuals sharing his hobby (Figure 9.19). Each of these communities consists of individuals who are members of several other communities, resulting in a complicated web of nested and overlapping communities [36]. Overlapping communities are not limited to social systems: The same genes are often implicated in multiple diseases, an indication that disease modules of different disorders overlap [14].

Figure 9.19 Overlapping Communities
Schematic representation of the communities surrounding Tamás Vicsek, who introduced the concept of overlapping communities. A zoom into the scientific community illustrates the nested and overlapping structure of the community characterizing his scientific interests. After [36].

While the existence of a nested community structure has long been appreciated by sociologists [47] and by the engineering community interested in graph partitioning, the algorithms discussed so far force each node into a single community. A turning point was the work of Tamás Vicsek and collaborators [36,48], who proposed an algorithm to identify overlapping communities, bringing the problem to the attention of the network science community. In this section we discuss two algorithms to detect overlapping communities, clique percolation and link clustering.

CLIQUE PERCOLATION
The clique percolation algorithm, often called CFinder, views a community as the union of overlapping cliques [36]:
• Two k-cliques are considered adjacent if they share k – 1 nodes (Figure 9.20b).
• A k-clique community is the largest connected subgraph obtained by the union of all adjacent k-cliques (Figure 9.20c).
• k-cliques that cannot be reached from a particular k-clique belong to other k-clique communities (Figure 9.20c,d).
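The three rules above translate directly into code. The following Python sketch is a brute-force illustration for small networks, not the CFinder software: the function name `k_clique_communities`, the edge-list input and the toy network are assumptions, and all k-cliques are enumerated by testing every k-node subset, which is only feasible for a handful of nodes.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Overlapping k-clique communities of a small undirected network."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Enumerate all k-cliques by brute force (node labels must be sortable).
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    # Union-find over cliques: adjacent cliques share exactly k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    # A community is the union of the nodes of all mutually reachable cliques.
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return list(groups.values())

# Toy network (an assumption): two chains of triangles that meet at node 4.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4),
         (4, 5), (4, 6), (5, 6), (5, 7), (6, 7)]
print(sorted(sorted(c) for c in k_clique_communities(edges, 3)))
```

For k=3 the toy network yields two communities, {1, 2, 3, 4} and {4, 5, 6, 7}, which overlap in node 4, since the triangles on either side of node 4 share only one node and are therefore not adjacent.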

Online Resource 9.2
CFinder
The CFinder software, allowing us to identify overlapping communities, can be downloaded from www.cfinder.org.

Figure 9.20 The Clique Percolation Algorithm (CFinder)
To identify k=3 clique-communities we roll a triangle across the network, such that each subsequent triangle shares one link (two nodes) with the previous triangle.
(a)-(b) Rolling Cliques Starting from the triangle shown in green in (a), (b) illustrates the second step of the algorithm.
(c) Clique Communities for k=3 The algorithm pauses when the final triangle of the green community is added. As no more triangles share a link with the green triangles, the green community has been completed. Note that there can be multiple k-clique communities in the same network. We illustrate this by showing a second community in blue. The figure highlights the moment when we add the last triangle of the blue community. The blue and green communities overlap, sharing the orange node.
(d) Clique Communities for k=4 The k=4 community structure of a small network, consisting of complete four-node subgraphs that share at least three nodes. Orange nodes belong to multiple communities.
Images courtesy of Gergely Palla.

The CFinder algorithm identifies all cliques and then builds an Nclique × Nclique clique–clique overlap matrix O, where Nclique is the number of cliques and Oij is the number of nodes shared by cliques i and j (Figure 9.39).

A typical output of the CFinder algorithm is shown in Figure 9.21, displaying the community structure of the word bright. In the network two words are linked to each other if they have a related meaning. We can easily check that the overlapping communities identified by the algorithm are meaningful: The word bright simultaneously belongs to a community containing light-related words, like glow or dark; to a community capturing colors (yellow, brown); to a community consisting of astronomical terms (sun, ray); and to a community linked to intelligence (gifted, brilliant). The example also illustrates the difficulty the earlier algorithms would have in identifying communities of this network: they would force bright into one of the four communities and remove it from the other three. Hence communities would be stripped of a key member, leading to outcomes that are difficult to interpret.

Could the communities identified by CFinder emerge by chance? To distinguish the real k-clique communities from communities that are a pure consequence of high link density we explore the percolation properties of k-cliques in a random network [48]. As we discussed in CHAPTER 3, if a random network is sufficiently dense, it has numerous cliques of varying order. A large k-clique community emerges in a random network only if the connection probability p exceeds the threshold (ADVANCED TOPICS 9.D)

$$p_c(k) = \frac{1}{[(k-1)N]^{1/(k-1)}}. \tag{9.16}$$

Figure 9.21 Overlapping Communities Communities containing the word bright in the South Florida Free Association network, whose nodes are words, connected by a link if their meaning is related. The community structure identified by the CFinder algorithm accurately describes the multiple meanings of bright, a word that can be used to refer to light, color, astronomical terms, or intelligence. After [36].

Below pc(k) we expect only a few isolated k-cliques (Figure 9.22a). Once p exceeds pc(k), we observe numerous cliques that form k-clique communities (Figure 9.22b). In other words, each k-clique community has its own threshold:
• For k = 2 the k-cliques are links and (9.16) reduces to pc(k)~1/N, which is the condition for the emergence of a giant connected component in Erdős–Rényi networks.
• For k = 3 the cliques are triangles (Figure 9.22a,b) and (9.16) predicts pc(k)~1/√(2N).

In other words, k-clique communities naturally emerge in sufficiently dense networks. Consequently, to interpret the overlapping community structure of a network, we must compare it to the community structure obtained for the degree-randomized version of the original network.

Computational Complexity
Finding cliques in a network requires algorithms whose running time grows exponentially with N. Yet, the CFinder community definition is based on cliques instead of maximal cliques, which can be identified in polynomial time [49]. If, however, there are large cliques in the network, it is more efficient to identify all cliques using an algorithm with O(e^N) complexity [36]. Despite this high computational complexity, the algorithm is relatively fast, processing the mobile call network of 4 million mobile phone users in less than one day [50] (see also Figure 9.28).
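As a quick numerical check of the k-clique percolation threshold in a random network, p_c(k) = 1/[(k−1)N]^(1/(k−1)), the sketch below evaluates it for the N=20 networks of Figure 9.22. The function name `clique_percolation_threshold` is an assumption made for this illustration.

```python
def clique_percolation_threshold(k, N):
    """Threshold p_c(k) for k-clique percolation in a random network of N nodes."""
    return 1.0 / ((k - 1) * N) ** (1.0 / (k - 1))

N = 20
for k in (2, 3, 4):
    print(k, clique_percolation_threshold(k, N))
```

For N=20 this gives p_c(2) = 0.05 and p_c(3) ≈ 0.16, so a network built with p=0.13 lies above the link percolation threshold but below the triangle percolation threshold, while p=0.22 lies above both.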

Figure 9.22
The Clique Percolation Algorithm (CFinder)
Random networks built with probabilities p=0.13 (a) and p=0.22 (b). As both p's are larger than the link percolation threshold (pc=1/N=0.05 for N=20), in both cases most nodes belong to a giant component.
(a) Subcritical Communities
The 3-clique (triangle) percolation threshold is pc(3)=0.16 according to (9.16), hence at p=0.13 we are below it. Therefore, only two small 3-clique percolation clusters are observed, which do not connect to each other.
(b) Supercritical Communities
For p=0.22 we are above pc(3), hence we observe multiple 3-cliques that form a giant 3-clique percolation cluster (purple). This network also has a second overlapping 3-clique community, shown in green.
After [48].

LINK CLUSTERING
While nodes often belong to multiple communities, links tend to be community specific, capturing the precise relationship that defines a node's membership in a community. For example, a link between two individuals may indicate that they are in the same family, or that they work together, or that they share a hobby, designations that only rarely overlap. Similarly, in biology each binding interaction of a protein is responsible for a different function, uniquely defining the role of the protein in the cell. This specificity of links has inspired the development of community finding algorithms that cluster links rather than nodes [51,52].

The link clustering algorithm proposed by Ahn, Bagrow and Lehmann [51] consists of the following steps:

Step 1: Define Link Similarity
The similarity of a link pair is determined by the neighborhood of the nodes connected by them. Consider for example the links (i,k) and (j,k), connected to the same node k. Their similarity is defined as (Figure 9.23a-c)

$$S((i,k),(j,k)) = \frac{|n_+(i) \cap n_+(j)|}{|n_+(i) \cup n_+(j)|}, \tag{9.17}$$

where n+(i) is the list of the neighbors of node i, including itself. Hence S measures the relative number of common neighbors i and j have. Consequently S=1 if i and j have the same neighbors (Figure 9.23c). The smaller the overlap between the neighborhoods of the two links, the smaller S is (Figure 9.23b).
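The similarity (9.17) can be evaluated directly from adjacency sets. The following is a minimal sketch, assuming an undirected network stored as a dictionary of neighbor sets; the function name `link_similarity` and the data layout are illustrative choices, not part of the original algorithm description.

```python
def link_similarity(adj, i, j, k):
    """Similarity S((i,k),(j,k)) of two links that share node k, after (9.17).

    adj maps each node to the set of its neighbors (assumed layout).
    """
    if k not in adj[i] or k not in adj[j]:
        raise ValueError("both links must be attached to the shared node k")
    n_i = adj[i] | {i}  # n+(i): neighbors of i, including i itself
    n_j = adj[j] | {j}  # n+(j): neighbors of j, including j itself
    return len(n_i & n_j) / len(n_i | n_j)


# Isolated connected triple a - c - b (Figure 9.23b)
triple = {"a": {"c"}, "b": {"c"}, "c": {"a", "b"}}
# Isolated triangle (Figure 9.23c)
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

s_triple = link_similarity(triple, "a", "b", "c")
s_triangle = link_similarity(triangle, "a", "b", "c")
```

For the triple the two inclusive neighborhoods share only node c out of three nodes in total, while for the triangle they coincide, which reproduces the two limiting cases quoted for Figure 9.23b and 9.23c.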


Figure 9.23
Identifying Link Communities
The link clustering algorithm identifies links with a similar topological role in a network. It does so by exploring the connectivity patterns of the nodes at the two ends of each link. Inspired by the similarity function of the Ravasz algorithm [4] (Figure 9.19), the algorithm aims to assign a high similarity S to links that connect to the same group of nodes.
(a) The similarity S of the (i,k) and (j,k) links connected to node k detects if the two links belong to the same group of nodes. Denoting with n+(i) the list of neighbors of node i, including itself, we obtain |n+(i)∪n+(j)|=12 and |n+(i)∩n+(j)|=4, resulting in S=1/3 according to (9.17).
(b) For an isolated (ki=kj=1) connected triple we obtain S=1/3.
(c) For a triangle we have S=1.
(d) The link similarity matrix for the network shown in (e) and (f). Darker entries correspond to link pairs with higher similarity S. The figure also shows the resulting link dendrogram.
(e) The link community structure predicted by the cut of the dendrogram shown as an orange dashed line in (d).
(f) The overlapping node communities derived from the link communities shown in (e).
After [51].

Step 2: Apply Hierarchical Clustering
The similarity matrix S allows us to use hierarchical clustering to identify link communities (SECTION 9.3). We use a single-linkage procedure, iteratively merging communities with the largest similarity link pairs (Figure 9.10).
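Steps 1 and 2 can be combined into a short sketch. It is a simplified illustration, not the reference implementation of [51]: instead of building the full dendrogram, it cuts the single-linkage hierarchy at a fixed similarity `threshold` (a hypothetical parameter), which is equivalent to merging every pair of adjacent links whose similarity (9.17) reaches that value.

```python
from itertools import combinations


def link_communities(edges, threshold):
    """Single-linkage clustering of links, cut at a given similarity."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    links = [tuple(sorted(e)) for e in edges]
    parent = {link: link for link in links}  # union-find forest over links

    def find(link):
        while parent[link] != link:
            parent[link] = parent[parent[link]]  # path halving
            link = parent[link]
        return link

    # Step 1: similarity (9.17) for every pair of links sharing a node k
    pairs = []
    for k, nbrs in adj.items():
        for i, j in combinations(nbrs, 2):
            n_i, n_j = adj[i] | {i}, adj[j] | {j}
            s = len(n_i & n_j) / len(n_i | n_j)
            pairs.append((s, tuple(sorted((i, k))), tuple(sorted((j, k)))))

    # Step 2: merge link pairs in order of decreasing similarity
    pairs.sort(key=lambda p: p[0], reverse=True)
    for s, link_a, link_b in pairs:
        if s < threshold:
            break
        parent[find(link_a)] = find(link_b)

    groups = {}
    for link in links:
        groups.setdefault(find(link), set()).add(link)
    return list(groups.values())


# Two triangles sharing node 2: each triangle forms one link community
bowtie = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
communities = link_communities(bowtie, threshold=0.5)
node_sets = sorted(sorted({n for link in c for n in link}) for c in communities)
```

In this toy network every link belongs to exactly one community, while node 2 appears in both node sets, illustrating how link communities induce overlapping node communities.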

Taken together, for the network of Figure 9.23e, (9.17) provides the similarity matrix shown in (d). The single-linkage hierarchical clustering leads to the dendrogram shown in (d), whose cuts result in the link communities shown in (e) and the overlapping node communities shown in (f).

Figure 9.24 illustrates the community structure of the characters of Victor Hugo's novel Les Miserables identified using the link clustering algorithm. Anyone familiar with the novel can convince themselves that the communities accurately represent the role of each character. Several characters are placed in multiple communities, reflecting their overlapping roles in the novel. Links, however, are unique to each community.

Figure 9.24
Link Communities
The network of characters in Victor Hugo's 1862 novel Les Miserables. Two characters are connected if they interact directly with each other in the story. The link colors indicate the clusters, light grey nodes corresponding to single-link clusters. Nodes that belong to multiple communities are shown as pie-charts, illustrating their membership in each community. Not surprisingly, the main character, Jean Valjean, has the most diverse community membership. After [51].

Computational Complexity
The link clustering algorithm involves two time-limiting steps: similarity calculation and hierarchical clustering. Calculating the similarity (9.17) for a link pair with degrees ki and kj requires max(ki,kj) steps. For a scale-free network with degree exponent γ the calculation of similarity has complexity O(N^{2/(γ-1)}), determined by the degree of the largest node, kmax. Hierarchical clustering requires O(L^2) time steps. Hence the algorithm's total computational complexity is O(N^{2/(γ-1)}) + O(L^2). For sparse graphs the latter term dominates, leading to O(N^2).

The need to detect overlapping communities has inspired numerous algorithms [53]. For example, the CFinder algorithm has been extended to the analysis of weighted [54], directed and bipartite graphs [55,56]. Similarly, one can derive quality functions for link clustering [52], like the modularity function discussed in SECTION 9.4.

In summary, the algorithms discussed in this section acknowledge the fact that nodes naturally belong to multiple communities. Therefore by forcing each node into a single community, as we did in the previous sections, we obtain a misleading characterization of the underlying community structure. Link communities recognize the fact that each link accurately captures the nature of the relationship between two nodes. As a bonus, link clustering also predicts the overlapping community structure of a network.


SECTION 9.6

TESTING COMMUNITIES

Community identification algorithms offer a powerful diagnosis tool, allowing us to characterize the local structure of real networks. Yet, to interpret and use the predicted communities, we must understand the accuracy of our algorithms. Similarly, the need to diagnose large networks prompts us to address the computational efficiency of our algorithms. In this section we focus on the concepts needed to assess the accuracy and the speed of community finding.

ACCURACY
If the community structure is uniquely encoded in the network's wiring diagram, each algorithm should predict precisely the same communities. Yet, given the different hypotheses the various algorithms embody, the partitions uncovered by them can differ, prompting the question: Which community finding algorithm should we use?

To assess the performance of community finding algorithms we need to measure an algorithm's accuracy, i.e. its ability to uncover communities in networks whose community structure is known. We start by discussing two benchmarks, which are networks with predefined community structure, that we can use to test the accuracy of a community finding algorithm.

Girvan-Newman (GN) Benchmark
The Girvan-Newman benchmark consists of N=128 nodes partitioned into nc=4 communities of size Nc=32 [9,57]. Each node is connected with probability pint to the Nc-1 nodes in its community and with probability pext to the 3Nc nodes in the other three communities. The control parameter

$$\mu = \frac{k^{ext}}{k^{ext} + k^{int}} \tag{9.18}$$

captures the density differences within and between communities. We expect community finding algorithms to perform well for small µ (Figure 9.25a), when the probability of connecting to nodes within the same community exceeds the probability of connecting to nodes in different communities. The performance of all algorithms should drop for large µ (Figure 9.25b), when the link density within the communities becomes comparable to the link density in the rest of the network.

Figure 9.25
Testing Accuracy with the GN Benchmark
The position of each node in (a) and (c) shows the planted communities of the Girvan-Newman (GN) benchmark, illustrating the presence of four distinct communities, each with Nc=32 nodes.
(a) The node colors represent the partitions predicted by the Ravasz algorithm for mixing parameter µ=0.40 given by (9.18). As in this case the communities are well separated, we have an excellent agreement between the planted and the detected communities.
(b) The normalized mutual information as a function of the mixing parameter µ for the Ravasz algorithm. For small µ we have In≃1 and nc≃4, indicating that the algorithm can easily detect well separated communities, as illustrated in (a). As we increase µ the link density difference within and between communities becomes less pronounced. Consequently the communities are increasingly difficult to identify and In decreases.
(c) For µ=0.50 the Ravasz algorithm misplaces a notable fraction of the nodes, as in this case the communities are not well separated, making it harder to identify the correct community structure.
Note that the Ravasz algorithm generates multiple partitions, hence for each µ we show the partition with the largest modularity, M. Next to (a) and (c) we show the normalized mutual information associated with the corresponding partition and the number of detected communities nc. The normalized mutual information (9.19), developed for non-overlapping communities, can be extended to overlapping communities as well [59].

Lancichinetti-Fortunato-Radicchi (LFR) Benchmark
The GN benchmark generates a random graph in which all nodes have comparable degree and all communities have identical size. Yet, the degree distribution of most real networks is fat tailed, and so is the community size distribution (Figure 9.29). Hence an algorithm that performs well on the GN benchmark may not do well on real networks. To avoid this limitation, the LFR benchmark (Figure 9.26) builds networks for which both the node degrees and the planted community sizes follow power laws [58].

Figure 9.26
LFR Benchmark
The construction of the Lancichinetti-Fortunato-Radicchi (LFR) benchmark, which generates networks in which both the node degrees and community sizes follow a power law. The benchmark is built as follows [57]:
(a) Start with N isolated nodes.
(b) Assign each node to a community of size Nc, where Nc follows the power law distribution p_{Nc} ~ Nc^{-ζ} with community exponent ζ. Also assign each node i a degree ki selected from the power law distribution p_k ~ k^{-γ} with degree exponent γ.
(c) Each node i of a community receives an internal degree (1-µ)ki, shown as links whose color agrees with the node color. The remaining µki degrees, shown as black links, connect to nodes in other communities.
(d) All stubs of nodes of the same community are randomly attached to each other, until no more stubs are "free". In this way we maintain the sequence of internal degrees of each node in its community. The remaining µki stubs are randomly attached to nodes from other communities.
(e) A typical network and its community structure generated by the LFR benchmark with N=500, γ=2.5, and ζ=2.

Having built networks with known community structure, next we need tools to measure the accuracy of the partition predicted by a particular community finding algorithm. As we do so, we must keep in mind that the two benchmarks discussed above correspond to a particular definition of communities. Consequently algorithms based on clique percolation or link clustering, which embody a different notion of communities, may not fare so well on these.

Measuring Accuracy

To compare the predicted communities with those planted in the benchmark, consider an arbitrary partition into non-overlapping communities. In each step we randomly choose a node and record the label of the community it belongs to. The result is a random string of community labels that follow a p(C) distribution, representing the probability that a randomly selected node belongs to the community C.

Consider two partitions of the same network, one being the benchmark (ground truth) and the other the partition predicted by a community finding algorithm. Each partition has its own p(C1) and p(C2) distribution. The joint distribution, p(C1, C2), is the probability that a randomly chosen node belongs to community C1 in the first partition and C2 in the second. The similarity of the two partitions is captured by the normalized mutual information [38]

$$I_n = \frac{\sum_{C_1,C_2} p(C_1,C_2)\,\log_2 \frac{p(C_1,C_2)}{p(C_1)\,p(C_2)}}{\frac{1}{2}H(\{p(C_1)\}) + \frac{1}{2}H(\{p(C_2)\})}. \tag{9.19}$$

The numerator of (9.19) is the mutual information I, measuring the information shared by the two community assignments: I=0 if C1 and C2 are independent of each other; I equals the maximal value H({p(C1)}) = H({p(C2)}) when the two partitions are identical and

$$H(\{p(C)\}) = -\sum_{C} p(C)\,\log_2 p(C) \tag{9.20}$$

is the Shannon entropy. If all nodes belong to the same community, then we are certain about the next label and H=0, as we do not gain new information by inspecting the community to which the next node belongs. H is maximal if p(C) is the uniform distribution, as in this case we have no idea which community comes next and each new node provides H bits of new information.

Figure 9.27
Testing Against Benchmarks
We tested each community finding algorithm that predicts non-overlapping communities against the GN and the LFR benchmarks. The plots show the normalized mutual information In against µ for five algorithms (Girvan-Newman, Greedy Mod. (Opt), Louvain, Infomap, Ravasz). For the naming of each algorithm, see TABLE 9.1.
(a) GN Benchmark
The horizontal axis shows the mixing parameter (9.18), representing the fraction of links connecting different communities. The vertical axis is the normalized mutual information (9.19). Each curve is averaged over 100 independent realizations.
(b) LFR Benchmark
Same as in (a) but for the LFR benchmark. The benchmark parameters are N=1,000, ⟨k⟩=20, γ=2, kmax=50, ζ=1, maximum community size: 100, minimum community size: 20. Each curve is averaged over 25 independent realizations.
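To make the benchmark test concrete, the sketch below generates a GN benchmark graph, measures its mixing parameter (9.18) and evaluates the normalized mutual information (9.19) with the entropy (9.20). Several choices are assumptions made only for illustration: the average degree of 16 (the text does not fix it), the function names, and the convention that In=1 when both partitions consist of a single community (where (9.19) would be 0/0).

```python
import math
import random
from collections import Counter


def gn_benchmark(mu, avg_degree=16, n_comm=4, size=32, seed=0):
    """Planted-partition graph in the spirit of the GN benchmark."""
    rng = random.Random(seed)
    n = n_comm * size
    label = [v // size for v in range(n)]
    p_int = (1 - mu) * avg_degree / (size - 1)       # within a community
    p_ext = mu * avg_degree / ((n_comm - 1) * size)  # between communities
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if rng.random() < (p_int if label[u] == label[v] else p_ext)]
    return edges, label


def mixing_parameter(edges, label):
    """Network-level version of (9.18): fraction of external links."""
    external = sum(1 for u, v in edges if label[u] != label[v])
    return external / len(edges)


def entropy(counts, n):
    """Shannon entropy (9.20) of a partition given community sizes."""
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def normalized_mutual_information(part1, part2):
    """Normalized mutual information (9.19) of two node labelings."""
    n = len(part1)
    c1, c2 = Counter(part1), Counter(part2)
    joint = Counter(zip(part1, part2))
    mutual = sum(c / n * math.log2(c * n / (c1[a] * c2[b]))
                 for (a, b), c in joint.items())
    denom = 0.5 * entropy(c1, n) + 0.5 * entropy(c2, n)
    return mutual / denom if denom > 0 else 1.0  # assumed convention


edges, planted = gn_benchmark(mu=0.2)
mu_measured = mixing_parameter(edges, planted)
nmi_identical = normalized_mutual_information(planted, planted)
# A labeling that is statistically independent of the planted one
independent = [v % 4 for v in range(len(planted))]
nmi_independent = normalized_mutual_information(planted, independent)
```

In an actual test the second argument of `normalized_mutual_information` would be the partition returned by the community finding algorithm under study, evaluated over a range of µ values.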

In summary, In=1 if the benchmark and the detected partitions are identical, and In=0 if they are independent of each other. The utility of In is illustrated in Figure 9.25b that shows the accuracy of the Ravasz algorithm for the Girvan-Newman benchmark. In Figure 9.27 we use In to test the performance of each algorithm against the GN and LFR benchmarks. The results allow us to draw several conclusions:
• We have In≃1 for µ n + 2 we can combine (9.40) and (9.41) to obtain

$$\ln N_n(H_i) = C'_n - \ln k_i \,\frac{\ln 5}{\ln 4} \tag{9.42}$$

or

$$N_n(H_i) \sim k_i^{-\frac{\ln 5}{\ln 4}}. \tag{9.43}$$

To calculate the degree distribution we need to normalize Nn(Hi) by calculating the ratio

$$p_{k_i} \sim \frac{N_n(H_i)}{k_{i+1} - k_i} \sim k_i^{-\gamma}. \tag{9.44}$$

Using

$$k_{i+1} - k_i = \sum_{l=1}^{i+1} 4^l - \sum_{l=1}^{i} 4^l = 4^{i+1} = 3k_i + 4 \tag{9.45}$$

we obtain

$$p_{k_i} \sim \frac{k_i^{-\frac{\ln 5}{\ln 4}}}{3k_i + 4} \sim k_i^{-1-\frac{\ln 5}{\ln 4}}. \tag{9.46}$$

In other words the obtained hierarchical network's degree exponent is

$$\gamma = 1 + \frac{\ln 5}{\ln 4} = 2.16. \tag{9.47}$$

Clustering Coefficient
It is somewhat straightforward to calculate the clustering coefficient of the Hi hubs. Their $\sum_{l=1}^{i} 4^l$ links come from nodes linked in a square, thus the connections between them equals their number. Consequently the number of links between the Hi's neighbors is

$$\sum_{l=1}^{i} 4^l = k_n(H_i), \tag{9.48}$$

providing

$$C(H_i) = \frac{2k_i}{k_i(k_i - 1)} = \frac{2}{k_i - 1}. \tag{9.49}$$

In other words we obtain

$$C(k) \simeq \frac{2}{k}, \tag{9.50}$$

indicating that C(k) for the hubs scales as k^{-1}, in line with (9.8).
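The identities used in this derivation are easy to check numerically. The short sketch below, with illustrative function names, verifies (9.45) for the first few hub generations and evaluates the exponent (9.47) and the hub clustering (9.49).

```python
import math


def hub_degree(i):
    """Degree of a hub H_i: the sum of 4^l for l = 1..i."""
    return sum(4 ** l for l in range(1, i + 1))


def hub_clustering(i):
    """Clustering coefficient (9.49) of a hub H_i."""
    k = hub_degree(i)
    return 2 * k / (k * (k - 1))


# (9.45): k_{i+1} - k_i = 4^{i+1} = 3 k_i + 4
identity_holds = all(
    hub_degree(i + 1) - hub_degree(i) == 4 ** (i + 1) == 3 * hub_degree(i) + 4
    for i in range(1, 10)
)

# (9.47): degree exponent of the hierarchical model
gamma = 1 + math.log(5) / math.log(4)

# (9.50): C(k) * k approaches 2 for large hubs
scaled_clustering = hub_clustering(8) * hub_degree(8)
```

The product C(k)k approaches 2 as the hub degree grows, which is the k^{-1} scaling stated in (9.50).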

Empirical Results
Figure 9.36 shows the C(k) function for the ten reference networks. We also show C(k) for each network after we applied degree-preserving randomization (green symbols), allowing us to make several observations:
• For small k all networks have an order of magnitude higher C(k) than their randomized counterpart. Therefore the small degree nodes are located in much denser neighborhoods than expected by chance.
• For the scientific collaboration, metabolic, and citation networks with a good approximation we have C(k)~k^{-1}, while the randomized C(k) is flat. Hence these networks display the hierarchical modularity of the model of Figure 9.13.
• For the Internet, mobile phone calls, actors, email, protein interactions and the WWW C(k) decreases with k, while their randomized C(k) is k-independent. Hence while these networks display a hierarchical modularity, the observed C(k) is not captured by our simple hierarchical model. To fit the C(k) of these systems we need to build models that accurately capture their evolution. Such models predict that C(k)~k^{-β}, where β can be different from one [27].
• Only for the power grid we observe a flat, k-independent C(k), indicating the lack of a hierarchical modularity.
Taken together, Figure 9.36 indicates that most real networks display some nontrivial hierarchical modularity.
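The C(k) curves discussed above are obtained by averaging the local clustering coefficient over all nodes of the same degree. A minimal sketch of that measurement follows, assuming an undirected network stored as a dictionary of neighbor sets with sortable node labels; assigning C=0 to nodes with degree below two is a convention chosen here, not something the text prescribes.

```python
from collections import defaultdict


def clustering_by_degree(adj):
    """Average local clustering coefficient C(k) for each degree k."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            c = 0.0  # convention: no neighbor pairs, so no triangles
        else:
            # number of links among the neighbors of this node
            links = sum(1 for u in nbrs for v in nbrs
                        if u < v and v in adj[u])
            c = 2 * links / (k * (k - 1))
        sums[k] += c
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Triangle 0-1-2 with a pendant node 3 attached to node 0
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c_of_k = clustering_by_degree(toy)
```

Plotting the returned values against k on log-log axes, for both the original network and a degree-preserving randomization of it, would give the kind of comparison summarized in the observations above.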

Figure 9.36
Hierarchy in Real Networks
The scaling of C(k) with k for the ten reference networks (purple symbols). The green symbols show C(k) obtained after applying degree preserving randomization to each network, which washes out the local density fluctuations. Consequently communities and the underlying hierarchy are gone. Directed networks were made undirected to measure C(k). The dashed line in each figure has slope -1, following (9.8), serving as a guide to the eye.
Panels (C(k) versus k on logarithmic axes): Power Grid, Internet, Mobile Phone Calls, Scientific Collaboration, Actor, Email, Protein, Metabolic, WWW, Citation.

SECTION 9.11

ADVANCED TOPICS 9.B MODULARITY

In this section we derive the expressions (9.12) and (9.13), characterizing the modularity function and its changes.

Modularity as a Sum Over Communities
Using (9.9) and (9.10) we can write the modularity of a full network as

$$M = \frac{1}{2L}\sum_{i,j=1}^{N}\left(A_{ij} - \frac{k_i k_j}{2L}\right)\delta_{C_i,C_j}, \tag{9.51}$$

where Ci is the label of the community to which node i belongs. As only node pairs that belong to the same community contribute to the sum in (9.51), we can rewrite the first term as a sum over communities,

$$\frac{1}{2L}\sum_{i,j=1}^{N} A_{ij}\,\delta_{C_i,C_j} = \sum_{c=1}^{n_c}\frac{1}{2L}\sum_{i,j\in C_c} A_{ij} = \sum_{c=1}^{n_c}\frac{L_c}{L}, \tag{9.52}$$

where Lc is the number of links within community Cc. The factor 2 disappears because each link is counted twice in Aij.

In a similar fashion the second term of (9.51) becomes

$$\frac{1}{2L}\sum_{i,j=1}^{N}\frac{k_i k_j}{2L}\,\delta_{C_i,C_j} = \sum_{c=1}^{n_c}\frac{1}{(2L)^2}\sum_{i,j\in C_c} k_i k_j = \sum_{c=1}^{n_c}\frac{k_c^2}{4L^2}, \tag{9.53}$$

where kc is the total degree of the nodes in community Cc. Indeed, in the configuration model the probability that a stub connects to a randomly chosen stub is 1/(2L), as in total we have 2L stubs in the network. Hence the likelihood that our stub connects to a stub inside the module is kc/(2L). By repeating this procedure for all kc stubs within the community Cc and adding 1/2 to avoid double counting, we obtain the last term of (9.53). Combining (9.52) and (9.53) leads to (9.12).
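The equivalence between the pairwise sum (9.51) and the sum over communities obtained from (9.52) and (9.53) can be confirmed on a small example. The sketch below assumes a simple undirected graph (no self-loops or multi-links) given as an edge list, with `comm` mapping each node to its community label; both function names are illustrative.

```python
def modularity_pairwise(edges, comm):
    """Modularity as the double sum over node pairs, (9.51)."""
    L = len(edges)
    degree = {n: 0 for n in comm}
    adjacent = set()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    total = 0.0
    for i in comm:
        for j in comm:
            if comm[i] == comm[j]:  # Kronecker delta of (9.51)
                a_ij = 1 if (i, j) in adjacent else 0
                total += a_ij - degree[i] * degree[j] / (2 * L)
    return total / (2 * L)


def modularity_by_community(edges, comm):
    """Modularity as a sum over communities: L_c/L - (k_c/2L)^2."""
    L = len(edges)
    links_in, total_degree = {}, {}
    for u, v in edges:
        total_degree[comm[u]] = total_degree.get(comm[u], 0) + 1
        total_degree[comm[v]] = total_degree.get(comm[v], 0) + 1
        if comm[u] == comm[v]:
            links_in[comm[u]] = links_in.get(comm[u], 0) + 1
    return sum(links_in.get(c, 0) / L - (k_c / (2 * L)) ** 2
               for c, k_c in total_degree.items())


# Two triangles joined by a single link
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
comm = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
m_pairs = modularity_pairwise(edges, comm)
m_comms = modularity_by_community(edges, comm)
```

Here each community has Lc=3 internal links and total degree kc=7 out of L=7 links, so both routes give M = 2(3/7 - 1/4) = 5/14.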


Merging Two Communities
Consider communities A and B and denote with kA and kB the total degree in these communities (equivalent with kc above). We wish to calculate the change in modularity after we merge these two communities. Using (9.12), this change can be written as

$$\Delta M_{AB} = \left[\frac{L_{AB}}{L} - \left(\frac{k_{AB}}{2L}\right)^2\right] - \left[\frac{L_A}{L} - \left(\frac{k_A}{2L}\right)^2 + \frac{L_B}{L} - \left(\frac{k_B}{2L}\right)^2\right], \tag{9.54}$$

where

$$L_{AB} = L_A + L_B + l_{AB}, \tag{9.55}$$

lAB is the number of direct links between the nodes of communities A and B, and

$$k_{AB} = k_A + k_B. \tag{9.56}$$

After inserting (9.55) and (9.56) into (9.54), we obtain

$$\Delta M_{AB} = \frac{l_{AB}}{L} - \frac{k_A k_B}{2L^2}, \tag{9.57}$$

which is (9.13).
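As a quick consistency check, the difference (9.54) and the closed form (9.57) can be compared numerically. The sketch below uses hypothetical helper names and the example of two triangles joined by one link (L=7, LA=LB=3, kA=kB=7, lAB=1).

```python
def community_term(links_in, total_degree, L):
    """Contribution of one community to the modularity (9.12)."""
    return links_in / L - (total_degree / (2 * L)) ** 2


def delta_m_direct(L_A, k_A, L_B, k_B, l_AB, L):
    """Modularity change of merging A and B, evaluated via (9.54)."""
    merged = community_term(L_A + L_B + l_AB, k_A + k_B, L)  # (9.55), (9.56)
    separate = community_term(L_A, k_A, L) + community_term(L_B, k_B, L)
    return merged - separate


def delta_m_closed_form(l_AB, k_A, k_B, L):
    """Modularity change of merging A and B, evaluated via (9.57)."""
    return l_AB / L - k_A * k_B / (2 * L ** 2)


direct = delta_m_direct(L_A=3, k_A=7, L_B=3, k_B=7, l_AB=1, L=7)
closed = delta_m_closed_form(l_AB=1, k_A=7, k_B=7, L=7)
```

Both expressions give a negative change in this example, so merging the two triangles would lower the modularity, as expected for two well separated groups.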


SECTION 9.12

ADVANCED TOPICS 9.C FAST ALGORITHMS FOR COMMUNITY DETECTION

The algorithms discussed in this chapter were chosen to illustrate the fundamental ideas and concepts pertaining to community detection. Consequently they are not guaranteed to be either the fastest or the most accurate algorithms. Recently two algorithms, called the Louvain algorithm and Infomap, have gained popularity, as their accuracy is comparable to that of the algorithms covered in this chapter, yet they offer better scalability. Consequently we can use them to identify communities in very large networks.

There are many similarities between the two algorithms:
• They both aim to optimize a quality function Q. For the Louvain algorithm Q is modularity, M, and for Infomap Q is an entropy-based measure called the map equation or L.
• Both algorithms use the same optimization procedure.
Given these similarities, we discuss the algorithms together.

THE LOUVAIN ALGORITHM
The O(N^2) computational complexity of the greedy algorithm can be prohibitive for very large networks. A modularity optimization algorithm with better scalability was proposed by Blondel and collaborators [2]. The Louvain algorithm consists of two steps that are repeated iteratively (Figure 9.37):

Step I
Start with a weighted network of N nodes, initially assigning each node to a different community. For each node i we evaluate the gain in modularity if we place node i in the community of one of its neighbors j. We then move node i into the community for which the modularity gain is the largest, but only if this gain is positive. If no positive gain is found, i stays in its original community. This process is applied to all nodes until no further improvement can be achieved, completing Step I.


Figure 9.37
The Louvain Algorithm
The main steps of the Louvain algorithm. Each pass consists of two distinct steps:
Step I
Modularity is optimized by local changes. We choose a node and calculate the change in modularity, (9.58), if the node joins the community of its immediate neighbors. The figure shows the expected modularity change ∆M0,i for node 0 (∆M0,2=0.023, ∆M0,3=0.032, ∆M0,4=0.026, ∆M0,5=0.026). Accordingly node 0 will join node 3, as the modularity change for this move is the largest, being ∆M0,3=0.032. This process is repeated for each node, the node colors corresponding to the resulting communities, concluding Step I.
Step II
The communities obtained in Step I are aggregated, building a new network of communities. Nodes belonging to the same community are merged into a single node, as shown on the top right. This process will generate self-loops, corresponding to links between nodes in the same community that are now merged into a single node.
Steps I and II together are called a pass. The network obtained after each pass is processed again (Pass 2), until no further increase of modularity is possible. After [2].

The modularity change ΔM obtained by moving an isolated node i into a community C can be calculated using

$$\Delta M = \left[\frac{\Sigma_{in} + 2k_{i,in}}{2W} - \left(\frac{\Sigma_{tot} + k_i}{2W}\right)^2\right] - \left[\frac{\Sigma_{in}}{2W} - \left(\frac{\Sigma_{tot}}{2W}\right)^2 - \left(\frac{k_i}{2W}\right)^2\right], \tag{9.58}$$

where Σin is the sum of the weights of the links inside C (which is LC for an unweighted network); Σtot is the sum of the link weights of all nodes in C; ki is the sum of the weights of the links incident to node i; ki,in is the sum of the weights of the links from i to nodes in C and W is the sum of the weights of all links in the network.

Note that ΔM is a special case of (9.13), which provides the change in modularity after merging communities A and B. In the current case B is an isolated node. We can use ΔM to determine the modularity change when i is removed from the community it belonged to earlier. For this we calculate ΔM for merging i with the community C after we excluded i from it. The change after removing i is -ΔM.

Step II
We construct a new network whose nodes are the communities identified during Step I. The weight of the link between two nodes is the sum of the weight of the links between the nodes in the corresponding communities. Links between nodes of the same community lead to weighted self-loops.

Once Step II is completed, we repeat Steps I - II, calling their combination a pass (Figure 9.37). The number of communities decreases with each pass. The passes are repeated until there are no more changes and maximum modularity is attained.
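Step I can be sketched in a few lines. This is a simplified illustration rather than the implementation of [2]: it covers only the local moving phase (the aggregation of Step II is omitted), assumes a symmetric weighted adjacency dictionary without self-loops, and uses the simplified gain k_{i,in}/W - Σtot·ki/(2W²), which follows from (9.58) because the Σin terms cancel. The names `delta_m` and `louvain_step_one` are hypothetical.

```python
def delta_m(sigma_in, sigma_tot, k_i, k_i_in, W):
    """Modularity change (9.58) of moving isolated node i into C."""
    after = (sigma_in + 2 * k_i_in) / (2 * W) - ((sigma_tot + k_i) / (2 * W)) ** 2
    before = (sigma_in / (2 * W) - (sigma_tot / (2 * W)) ** 2
              - (k_i / (2 * W)) ** 2)
    return after - before


def louvain_step_one(adj):
    """Local moving phase: adj[i][j] is the weight of link (i, j)."""
    W = sum(w for nbrs in adj.values() for w in nbrs.values()) / 2
    k = {i: sum(nbrs.values()) for i, nbrs in adj.items()}
    comm = {i: i for i in adj}  # every node starts in its own community
    sigma_tot = dict(k)         # total degree of each community
    improved = True
    while improved:
        improved = False
        for i in adj:
            old = comm[i]
            # weight of the links from i to each neighboring community
            w_to = {}
            for j, w in adj[i].items():
                w_to[comm[j]] = w_to.get(comm[j], 0) + w
            sigma_tot[old] -= k[i]  # take i out of its community

            def gain(c):
                # simplified form of (9.58) for inserting i into c
                return w_to.get(c, 0) / W - sigma_tot[c] * k[i] / (2 * W * W)

            best, best_gain = old, gain(old)
            for c in w_to:
                if gain(c) > best_gain + 1e-12:  # move only on a strict gain
                    best, best_gain = c, gain(c)
            comm[i] = best
            sigma_tot[best] += k[i]
            if best != old:
                improved = True
    return comm


# Two triangles joined by the link (2, 3), all weights equal to 1
graph = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
         3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
partition = louvain_step_one(graph)
```

On this toy graph the local moves end with the two triangles as separate communities. A full pass would then collapse each community into a single node with a weighted self-loop, as described in Step II.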

Computational Complexity
The Louvain algorithm is more limited by storage demands than by computational time. The number of computations scales linearly with L for the most time consuming first pass. With subsequent passes over a decreasing number of nodes and links, the complexity of the algorithm is at most O(L). It therefore allows us to identify communities in networks with millions of nodes.

INFOMAP
Introduced by Martin Rosvall and Carl T. Bergstrom, Infomap exploits data compression for community identification (Figure 9.38) [44-46]. It does so by optimizing a quality function for community detection in directed and weighted networks, called the map equation.

Consider a network partitioned into nc communities. We wish to encode in the most efficient fashion the trajectory of a random walker on this network. In other words, we want to describe the trajectory with the smallest number of symbols. The ideal code should take advantage of the fact that the random walker tends to get trapped into communities, staying there for a long time (Figure 9.38c).

Figure 9.38
From Data Compression to Communities
Infomap detects communities by compressing the movement of a random walker on a network.
(a) The orange line shows the trajectory of a random walker on a small network. We want to describe this trajectory with a minimal number of symbols, which we can achieve by assigning repeatedly used structures (communities) short and unique names.
(b) We start by giving a unique name to each node. This is derived using a Huffman coding, a data compression algorithm that assigns each node a code using the estimated probability that the random walk visits that node. The 314 bits under the network describe the sample trajectory of the random walker shown in (a), starting with 1111100 for the first node of the walk in the upper left corner, 1100 for the second node, etc., and ending with 00011 for the last node on the walk in the lower right corner.
(c) The figure shows a two-level encoding of the random walk, in which each community receives a unique name, but the names of nodes within communities are reused. This code yields on average a 32% shorter coding. The codes naming the communities and the codes used to indicate an exit from each community are shown to the left and the right of the arrows under the network, respectively. Using this code, we can describe the walk in (a) by the 243 bits shown under the network in (c). The first three bits 111 indicate that the walk begins in the red community, the code 0000 specifies the first node of the walk, etc.
(d) By reporting only the community names, and not the locations of each node within the communities, we obtain an efficient coarse graining of the network, which corresponds to its community structure.

To achieve this coding we assign:
• One code to each community (index codebook). For example the purple community in Figure 9.38c is assigned the code 111.
• Codewords for each node within each community. For example the top left node in (c) is assigned 001. Note that the same node code can be reused in different communities.
• Exit codes that mark when the walker leaves a community, like 0001 for the purple community in (c).

The goal, therefore, is to build a code that offers the shortest description of the random walk. Once we have this code, we can identify the network's community structure by reading the index codebook, which is uniquely assigned to each community (Figure 9.38c).

The optimal code is obtained by finding the minimum of the map equation

$$L = qH(Q) + \sum_{c=1}^{n_c} p_{\circlearrowright}^{c} H(P_c). \tag{9.59}$$

In a nutshell, the first term of (9.59) gives the average number of bits necessary to describe the movement between communities, where q is the probability that the random walker switches communities during a given step. The second term gives the average number of bits necessary to describe movement within communities. Here H(Pc) is the entropy of within-community movements, including an “exit code” that captures the departure from community c.

Online Resource 9.3
Map Equation for Infomap
For a dynamic visualization of the mechanism behind the map equation, see http://www.tp.umu.se/~rosvall/livemod/mapequation/.

The specific terms of the map equation, and their calculation from the probabilities that capture the movement of a random walker on a network, are somewhat involved. They are described in detail in Refs. [44-46]. Online Resource 9.3 offers an interactive tool that illustrates the mechanism behind (9.59) and its use.

In the end L serves as a quality function that takes a specific value for each partition of the network into communities. To find the best partition, we must minimize L over all possible partitions. The popular implementation of this optimization procedure follows Steps I and II of the Louvain algorithm: We assign each node to a separate community, and we systematically join neighboring nodes into modules if the move decreases L. After each move L is updated using (9.59). The obtained communities are joined into supercommunities, finishing a pass, after which the algorithm is restarted on the new, reduced network.

Computational Complexity
The computational complexity of Infomap is determined by the procedure used to minimize the map equation L. If we use the Louvain procedure, the computational complexity is the same as that of the Louvain algorithm, i.e. at most O(L log L), or O(N log N) for a sparse graph.

In summary, the Louvain algorithm and Infomap offer tools for fast community identification. Their accuracy across benchmarks is comparable to the accuracy of the algorithms discussed throughout this chapter (Figure 9.28).
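To make the role of L as a quality function concrete, the following minimal sketch evaluates the map equation for a given partition. It is an illustration under stated assumptions, not the Infomap implementation: it handles only undirected, unweighted networks, for which a random walker visits a node with probability proportional to its degree, and it uses the term definitions of the published map equation [44-46], which are not spelled out in this section. The names `map_equation` and `entropy` are ours.

```python
from math import log2

def entropy(weights):
    """Shannon entropy (in bits) of the distribution obtained by normalizing weights."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    return -sum((w / total) * log2(w / total) for w in weights if w > 0)

def map_equation(edges, partition):
    """Evaluate L for an undirected, unweighted network.

    edges: list of (u, v) pairs; partition: dict mapping node -> community label.
    Assumption: the walker visits node a with probability k_a / 2L_links and
    leaves community c with probability (links leaving c) / 2L_links per step.
    """
    degree, cut = {}, {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if partition[u] != partition[v]:
            cut[partition[u]] = cut.get(partition[u], 0) + 1
            cut[partition[v]] = cut.get(partition[v], 0) + 1
    two_l = 2 * len(edges)
    communities = set(partition.values())
    exit_prob = {c: cut.get(c, 0) / two_l for c in communities}
    q = sum(exit_prob.values())            # probability of switching communities
    total = q * entropy(list(exit_prob.values()))   # first term: q H(Q)
    for c in communities:
        visits = [degree.get(n, 0) / two_l for n in partition if partition[n] == c]
        p_c = exit_prob[c] + sum(visits)   # usage rate of the codebook of community c
        total += p_c * entropy([exit_prob[c]] + visits)  # second term
    return total

# Hypothetical test network: two triangles joined by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_block = {n: "A" for n in range(6)}
singletons = {n: n for n in range(6)}
for name, part in [("two triangles", two_triangles),
                   ("one community", one_block),
                   ("singletons", singletons)]:
    print(name, round(map_equation(edges, part), 3))
```

On this toy network the partition into the two triangles gives the shortest description length of the three candidates, in line with the intuition that the walker gets trapped inside each triangle.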


SECTION 9.13

ADVANCED TOPICS 9.D THRESHOLD FOR CLIQUE PERCOLATION

In this section we derive the percolation threshold (9.20) for clique percolation on a random network and discuss the main steps of the CFinder algorithm (Figure 9.39). When we roll a k-clique to an adjacent k-clique by relocating one of its nodes, the expectation value of the number of adjacent k-cliques for the template to roll further should equal exactly one at the percolation threshold (Figure 9.20). Indeed, a smaller than one expectation value will result in a premature end of the k-clique percolation clusters, because starting from any k-clique, the rolling would quickly come to a halt. Consequently the size of the clusters would decay exponentially. A larger than one expectation value, on the other hand, allows the clique community to grow indefinitely, guaranteeing that we have a giant cluster in the system. The above expectation value is provided by

$$(k-1)(N-k-1)\,p^{k-1}, \tag{9.63}$$

where the term (k−1) counts the number of nodes of the template that can be selected for the next relocation; the term (N−k−1) counts the number of potential destinations for this relocation, out of which only the fraction p^(k−1) is acceptable, because each of the new k−1 edges (associated with the relocation) must exist in order to obtain a new k-clique. For large N, setting (9.63) equal to one at the threshold p = pc simplifies to

$$(k-1)\,N\,p_c^{\,k-1} = 1,$$

which leads to (9.16).
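Solving the large-N condition for pc gives pc(k) = [(k−1)N]^(−1/(k−1)). As a quick numerical check, the sketch below evaluates this expression; the function name is ours and is used only for illustration. For k = 2 the result reduces to pc = 1/N, the condition for the emergence of the giant component in a random network.

```python
def clique_percolation_threshold(k: int, n: int) -> float:
    """Return p_c(k) = [(k - 1) N]^(-1 / (k - 1)), valid for large N."""
    if k < 2 or n < 1:
        raise ValueError("need k >= 2 and n >= 1")
    return ((k - 1) * n) ** (-1.0 / (k - 1))

# Thresholds for a hypothetical random network with N = 1000 nodes.
for k in (2, 3, 4):
    print(k, clique_percolation_threshold(k, 1000))
```

The threshold grows with k: larger cliques require a denser network before their communities percolate.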


Figure 9.39
CFinder algorithm
The main steps of the CFinder algorithm.
(a) Starting from the network shown in the figure, our goal is to identify all cliques. All five k=3 cliques present in the network are highlighted.
(b) The overlap matrix O of the k=3 cliques. This matrix is viewed as an adjacency matrix of a network whose nodes are the cliques of the original network. The matrix indicates that we have two connected components, one consisting of cliques (1, 2) and the other of cliques (3, 4, 5). The connected components of this network map into the communities of the original network.

O =
      1  2  3  4  5
  1   0  1  0  0  0
  2   1  0  0  0  0
  3   0  0  0  1  0
  4   0  0  1  0  1
  5   0  0  0  1  0

(c) The two clique communities predicted by the adjacency matrix.
(d) The two clique communities shown in (c), mapped on the original network.
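The steps summarized in Figure 9.39 can be sketched in a few lines of code. The version below is a simplified illustration, not the CFinder software itself: it enumerates k-cliques by brute force, which is feasible only for small networks, and it treats two k-cliques as adjacent when they share k−1 nodes. The function name and the example network are hypothetical.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Return the k-clique communities of an undirected network as sets of nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Step (a): find all k-cliques (brute force over node subsets of size k).
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]

    # Step (b): overlap matrix, 1 if two distinct cliques share k-1 nodes.
    n = len(cliques)
    overlap = [[1 if i != j and len(cliques[i] & cliques[j]) == k - 1 else 0
                for j in range(n)] for i in range(n)]

    # Steps (c)-(d): connected components of the overlap matrix,
    # mapped back to the nodes of the original network.
    seen, communities = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            i = stack.pop()
            members |= cliques[i]
            for j in range(n):
                if overlap[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        communities.append(members)
    return communities

# Hypothetical network: triangles abc and bcd share a link, triangle def shares only node d.
example_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("b", "d"),
                 ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
print(k_clique_communities(example_edges, 3))
```

In this example node d belongs to both communities, illustrating that clique percolation naturally produces overlapping communities.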


SECTION 9.14

BIBLIOGRAPHY

[1] B. Droitcour. Young Incorporated Artists. Art in America, 92-97, April 2014.
[2] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. J. Stat. Mech., 2008.
[3] G.C. Homans. The Human Group. Harcourt, Brace & Co, New York, 1950.
[4] S.A. Rice. The identification of blocs in small political bodies. Am. Polit. Sci. Rev., 21:619–627, 1927.
[5] R.D. Luce and A.D. Perry. A method of matrix analysis of group structure. Psychometrika, 14:95–116, 1949.
[6] R.S. Weiss and E. Jacobson. A method for the analysis of the structure of complex organizations. Am. Sociol. Rev., 20:661–668, 1955.
[7] W.W. Zachary. An information flow model for conflict and fission in small groups. J. Anthropol. Res., 33:452–473, 1977.
[8] L. Donetti and M.A. Muñoz. Detecting network communities: a new systematic and efficient algorithm. J. Stat. Mech., P10012, 2004.
[9] M. Girvan and M.E.J. Newman. Community structure in social and biological networks. PNAS, 99:7821–7826, 2002.
[10] L.H. Hartwell, J.J. Hopfield, and A.W. Murray. From molecular to modular cell biology. Nature, 402:C47–C52, 1999.
[11] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabási. Hierarchical organization of modularity in metabolic networks. Science, 297:1551–1555, 2002.
[12] K.-I. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal, and A.-L. Barabási. The human disease network. PNAS, 104:8685–8690, 2007.

[13] J. Menche, A. Sharma, M. Kitsak, S. Ghiassian, M. Vidal, J. Loscalzo, A.-L. Barabási. Uncovering disease-disease relationships through the human interactome. 2014.
[14] A.-L. Barabási, N. Gulbahce, and J. Loscalzo. Network medicine: a network-based approach to human disease. Nature Reviews Genetics, 12:56–68, 2011.
[15] G. W. Flake, S. Lawrence, and C.L. Giles. Efficient identification of web communities. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, 150–160, 2000.
[16] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. PNAS, 101:2658–2663, 2004.
[17] A.B. Kahng, J. Lienig, I.L. Markov, and J. Hu. VLSI Physical Design: From Graph Partitioning to Timing Closure. Springer, 2011.
[18] B.W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell Systems Technical Journal, 49:291–307, 1970.
[19] G.E. Andrews. The Theory of Partitions. Addison-Wesley, Boston, USA, 1976.
[20] L. Lovász. Combinatorial Problems and Exercises. North-Holland, Amsterdam, The Netherlands, 1993.
[21] G. Pólya and G. Szegő. Problems and Theorems in Analysis I. Springer-Verlag, Berlin, Germany, 1998.
[22] V. H. Moll. Numbers and Functions: From a classical-experimental mathematician’s point of view. American Mathematical Society, 2012.
[23] M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69:026113, 2004.
[24] M.E.J. Newman. A measure of betweenness centrality based on random walks. Social Networks, 27:39–54, 2005.
[25] U. Brandes. A faster algorithm for betweenness centrality. J. Math. Sociol., 25:163–177, 2001.
[26] T. Zhou, J.-G. Liu, and B.-H. Wang. Notes on the calculation of node betweenness. Chinese Physics Letters, 23:2327–2329, 2006.
[27] E. Ravasz and A.-L. Barabási. Hierarchical organization in complex networks. Physical Review E, 67:026112, 2003.
[28] S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes. Pseudofractal scale-free web. Physical Review E, 65:066122, 2002.
[29] E. Mones, L. Vicsek, and T. Vicsek. Hierarchy Measure for Complex Networks. PLoS ONE, 7:e33799, 2012.

[30] A. Clauset, C. Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453:98–101, 2008.
[31] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas. Comparing community structure identification. J. Stat. Mech., P09008, 2005.
[32] S. Fortunato and M. Barthélemy. Resolution limit in community detection. PNAS, 104:36–41, 2007.
[33] M.E.J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 69:066133, 2004.
[34] S. Fortunato and M. Barthélemy. Resolution limit in community detection. PNAS, 104:36–41, 2007.
[35] A. Clauset, M.E.J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70:066111, 2004.
[36] G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435:814, 2005.
[37] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas. Self-similar community structure in a network of human interactions. Physical Review E, 68:065103, 2003.
[38] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas. Comparing community structure identification. J. Stat. Mech., P09008, 2005.
[39] J. Ruan and W. Zhang. Identifying network communities with a high resolution. Physical Review E, 77:016104, 2008.
[40] B. H. Good, Y.-A. de Montjoye, and A. Clauset. The performance of modularity maximization in practical contexts. Physical Review E, 81:046106, 2010.
[41] R. Guimerà, M. Sales-Pardo, and L.A.N. Amaral. Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70:025101, 2004.
[42] J. Reichardt and S. Bornholdt. Partitioning and modularity of graphs with arbitrary degree distribution. Physical Review E, 76:015102, 2007.
[43] J. Reichardt and S. Bornholdt. When are networks truly modular? Physica D, 224:20–26, 2006.
[44] M. Rosvall and C.T. Bergstrom. Maps of random walks on complex networks reveal community structure. PNAS, 105:1118, 2008.
[45] M. Rosvall, D. Axelsson, and C.T. Bergstrom. The map equation. Eur. Phys. J. Special Topics, 178:13, 2009.

[46] M. Rosvall and C.T. Bergstrom. Mapping change in large networks. PLoS ONE, 5:e8694, 2010.
[47] A. Perey. Oksapmin Society and World View. Dissertation for Degree of Doctor of Philosophy. Columbia University, 1973.
[48] I. Derényi, G. Palla, and T. Vicsek. Clique percolation in random networks. Physical Review Letters, 94:160202, 2005.
[49] J.M. Kumpula, M. Kivelä, K. Kaski, and J. Saramäki. A sequential algorithm for fast clique percolation. Physical Review E, 78:026109, 2008.
[50] G. Palla, A.-L. Barabási, and T. Vicsek. Quantifying social group evolution. Nature, 446:664–667, 2007.
[51] Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann. Link communities reveal multiscale complexity in networks. Nature, 466:761–764, 2010.
[52] T.S. Evans and R. Lambiotte. Line graphs, link partitions, and overlapping communities. Physical Review E, 80:016105, 2009.
[53] M. Chen, K. Kuzmin, and B.K. Szymanski. Community Detection via Maximization of Modularity and Its Variants. IEEE Trans. Computational Social Systems, 1:46–65, 2014.
[54] I. Farkas, D. Ábel, G. Palla, and T. Vicsek. Weighted network modules. New J. Phys., 9:180, 2007.
[55] S. Lehmann, M. Schwartz, and L.K. Hansen. Biclique communities. Physical Review E, 78:016108, 2008.
[56] N. Du, B. Wang, B. Wu, and Y. Wang. Overlapping community detection in bipartite networks. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, IEEE Computer Society, Los Alamitos, CA, USA: 176–179, 2008.
[57] A. Condon and R.M. Karp. Algorithms for graph partitioning on the planted partition model. Random Struct. Algor., 18:116–140, 2001.
[58] A. Lancichinetti, S. Fortunato, and F. Radicchi. Benchmark graphs for testing community detection algorithms. Physical Review E, 78:046110, 2008.
[59] A. Lancichinetti, S. Fortunato, and J. Kertész. Detecting the overlapping and hierarchical community structure of complex networks. New Journal of Physics, 11:033015, 2009.
[60] S. Fortunato. Community detection in graphs. Physics Reports, 486:75–174, 2010.
[61] D. Hric, R.K. Darst, and S. Fortunato. Community detection in networks: structural clusters versus ground truth. Physical Review E, 90:062805, 2014.

[62] M. S. Granovetter. The Strength of Weak Ties. The American Journal of Sociology, 78:1360–1380, 1973.
[63] J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási. Structure and tie strengths in mobile communication networks. PNAS, 104:7332, 2007.
[64] K.-I. Goh, B. Kahng, and D. Kim. Universal Behavior of Load Distribution in Scale-Free Networks. Physical Review Letters, 87:278701, 2001.
[65] A. Maritan, F. Colaiori, A. Flammini, M. Cieplak, and J.R. Banavar. Universality Classes of Optimal Channel Networks. Science, 272:984–986, 1996.
[66] L.C. Freeman. A set of measures of centrality based upon betweenness. Sociometry, 40:35–41, 1977.
[67] J. Hopcroft, O. Khan, B. Kulis, and B. Selman. Tracking evolving communities in large linked networks. PNAS, 101:5249–5253, 2004.
[68] S. Asur, S. Parthasarathy, and D. Ucar. An event-based framework for characterizing the evolutionary behavior of interaction graphs. KDD ’07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, pp. 913–921, 2007.
[69] D.J. Fenn, M.A. Porter, M. McDonald, S. Williams, N.F. Johnson, and N.S. Jones. Dynamic communities in multichannel data: An application to the foreign exchange market during the 2007–2008 credit crisis. Chaos, 19:033119, 2009.
[70] D. Chakrabarti, R. Kumar, and A. Tomkins. Evolutionary clustering. KDD ’06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, pp. 554–560, 2006.
[71] Y. Chi, X. Song, D. Zhou, K. Hino, and B.L. Tseng. Evolutionary spectral clustering by incorporating temporal smoothness. KDD ’07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, pp. 153–162, 2007.
[72] Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, and B.L. Tseng. Facetnet: a framework for analyzing communities and their evolutions in dynamic networks. WWW ’08: Proceedings of the 17th International Conference on the World Wide Web, ACM, New York, NY, USA, pp. 685–694, 2008.
[73] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. KDD ’06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, pp. 44–54, 2006.

[74] M. E. J. Newman and J. Park. Why social networks are different from other types of networks. Physical Review E, 68:036122, 2003.
[75] B. Krishnamurthy and J. Wang. On network-aware clustering of web clients. SIGCOMM Comput. Commun. Rev., 30:97–110, 2000.
[76] K.P. Reddy, M. Kitsuregawa, P. Sreekanth, and S.S. Rao. A graph based approach to extract a neighborhood customer community for collaborative filtering. DNIS ’02: Proceedings of the Second International Workshop on Databases in Networked Information Systems, Springer-Verlag, London, UK, pp. 188–200, 2002.
[77] R. Agrawal and H.V. Jagadish. Algorithms for searching massive graphs. Knowl. Data Eng., 6:225–238, 1994.
[78] A.Y. Wu, M. Garland, and J. Han. Mining scale-free networks using geodesic clustering. KDD ’04: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, New York, NY, USA, pp. 719–724, 2004.

10 ALBERT-LÁSZLÓ BARABÁSI

NETWORK SCIENCE SPREADING PHENOMENA

ACKNOWLEDGEMENTS

MÁRTON PÓSFAI NICOLE SAMAY ROBERTA SINATRA

SARAH MORRISON AMAL HUSSEINI PHILIPP HOEVEL

INDEX

Introduction 1
Epidemic Modeling 2
Network Epidemics 3
Contact Networks 4
Beyond the Degree Distribution 5
Immunization 6
Epidemic Prediction 7
Summary 8
Homework 9
ADVANCED TOPICS 10.A Microscopic Models of Epidemic Processes 10
ADVANCED TOPICS 10.B Analytical Solution of the SI, SIS and SIR Models 11
ADVANCED TOPICS 10.C Targeted Immunization 12
ADVANCED TOPICS 10.D The SIR Model and Bond Percolation 13
Bibliography 14

Figure 10.0 (cover image)
Bill Smith
An epidemiological model of the perfect infectious disease (evolved growth system) is an artwork by Bill Smith, an Illinois-based artist (2009, mixed media, 84x84x84 inches) (http://www.widicus.org).

This work is licensed under a Creative Commons: CC BY-NC-SA 2.0. PDF V26, 05.09.2014

SECTION 10.1

INTRODUCTION

On the night of February 21, 2003 a physician from Guangdong Province in southern China checked into the Metropole Hotel in Hong Kong. He had previously treated patients suffering from a disease that, lacking a clear diagnosis, was called atypical pneumonia. The next day, after leaving the hotel, he went to the local hospital, this time as a patient. He died there several days later of atypical pneumonia [1].

The physician did not leave the hotel without a trace: That night sixteen other guests of the Metropole Hotel and one visitor also contracted the disease that was eventually renamed Severe Acute Respiratory Syndrome, or SARS. These guests carried the SARS virus with them to Hanoi, Singapore, and Toronto, sparking outbreaks in each of those cities. Epidemiologists later traced close to half of the 8,100 documented cases of SARS back to the Metropole Hotel. With that the physician who brought the virus to Hong Kong became an example of a super-spreader, an individual who is responsible for a disproportionate number of infections during an epidemic.

Figure 10.1
Super-spreaders
One-hundred-forty-four of the 206 SARS patients diagnosed in Singapore were traced to a chain of five individuals that included four super-spreaders. The most important of these was Patient Zero, the physician from Guangdong Province in China, who brought the disease to the Metropole Hotel. After [1].

A network theorist will recognize super-spreaders as hubs, nodes with an exceptional number of links in the contact network on which a disease spreads. As hubs appear in many networks, super-spreaders have been documented in many infectious diseases, from smallpox to AIDS [2]. In this chapter we introduce a network-based approach to epidemic phenomena that allows us to understand and predict the true impact of these hubs. The resulting framework, which we call network epidemics, offers an analytical and numerical platform to quantify and forecast the spread of infectious diseases.

Infectious diseases account for 43% of the global burden of disease, as captured by the number of years of lost healthy life. They are called contagious, as they are transmitted by contact with an ill person or with their secretions. Cures and vaccines are rarely sufficient to stop an infectious disease: it is equally important to understand how the pathogen responsible for the disease spreads in the population, which in turn determines the way we administer the available cures or vaccines.

Figure 10.2
Mobile Phone Viruses
Smart phones, capable of sharing programs and data with each other, offer a fertile ground for virus writers. Indeed, since 2004 hundreds of smart phone viruses have been identified, reaching a state of sophistication in a few years that took computer viruses about two decades to achieve [3]. Mobile viruses are transmitted using two main communication mechanisms [4]:
Bluetooth (BT) Viruses
A BT virus infects all phones found within BT range from the infected phone, which is about 10-30 meters. As physical proximity is essential for a BT connection, the transmission of a BT virus is determined by the owner’s location and the underlying mobility network, connecting locations by individuals who travel between them (SECTION 10.4). Hence BT viruses follow a spreading pattern similar to influenza.
Multimedia Messaging Services (MMS)
Viruses carried by MMS can infect all susceptible phones whose number is in the infected phone’s phonebook. Hence MMS viruses spread on the social network, following a long-range spreading pattern that is independent of the infected phone’s physical location. Consequently the spreading of MMS viruses is similar to the patterns characterizing computer viruses.

The diversity of phenomena regularly described as spreading processes on networks is staggering:

Biological

The spread of pathogens on their respective contact network is the main subject of this chapter. Examples include airborne diseases like influenza, SARS, or tuberculosis, transmitted when two individuals breathe the air in the same room; contagious diseases and parasites transmitted when people touch each other; the Ebola virus, transmitted via contact with a patient's bodily fluids; and HIV and other sexually transmitted diseases, passed on during sexual intercourse. Infectious diseases also include cancers carried by cancer-causing viruses, like HPV or EBV, or diseases carried by parasites like bedbugs or malaria.

Digital
A computer virus is a self-reproducing program that can transmit a copy of itself from computer to computer. Its spreading pattern has many similarities to the spread of pathogens. But digital viruses also have many unique features, determined by the technology behind the specific virus. As mobile phones morphed into hand-held computers, we have also witnessed the appearance of mobile viruses and worms that infect smartphones (Figure 10.2).

Social
The role of the social and professional network in the spread and acceptance of innovations, knowledge, business practices, products, behavior, rumors and memes is a much-studied problem in social sciences, marketing and economics [5, 6]. Online environments, like Twitter, offer an unprecedented ability to track such phenomena. Consequently a staggering number of studies focus on social spreading, asking for example why some messages can reach millions of individuals, while others struggle to get noticed.

The examples discussed above involve diverse spreading agents, from biological to computer viruses, ideas and products; they spread on different types of networks, from social to computer and professional networks; they are characterized by widely different time scales and follow different mechanisms of transmission (Table 10.1). Despite this diversity, as we show in this chapter, these spreading processes obey common patterns and can be described using the same network-based theoretical and modeling framework.

Table 10.1
Networks and Agents
The spread of a pathogen, a meme or a computer virus is determined by the network on which the agent spreads and the transmission mechanism of the responsible agent. The table lists several much studied spreading phenomena, together with the nature of the particular spreading agent and the network on which the agent spreads.

PHENOMENA                  AGENT                       NETWORK
Venereal Disease           Pathogens                   Sexual Network
Rumor Spreading            Information, Memes          Communication Network
Diffusion of Innovations   Ideas, Knowledge            Communication Network
Computer Viruses           Malware, Digital Viruses    Internet
Mobile Phone Virus         Mobile Viruses              Social Network/Proximity Network
Bedbugs                    Parasitic Insects           Hotel-Traveler Network
Malaria                    Plasmodium                  Mosquito-Human Network

SECTION 10.2

EPIDEMIC MODELING

Epidemiology has developed a robust analytical and numerical framework to model the spread of pathogens. This framework relies on two fundamental hypotheses:

i. Compartmentalization
Epidemic models classify each individual based on the stage of the disease affecting them. The simplest classification assumes that an individual can be in one of three states or compartments:
• Susceptible (S): Healthy individuals who have not yet contacted the pathogen (Figure 10.3).
• Infectious (I): Contagious individuals who have contacted the pathogen and hence can infect others.
• Recovered (R): Individuals who have been infected before, but have recovered from the disease, hence are not infectious.
The modeling of some diseases requires additional states, like immune individuals, who cannot be infected, or latent individuals, who have been exposed to the disease, but are not yet contagious.
Individuals can move between compartments. For example, at the beginning of a new influenza outbreak everyone is in the susceptible state. Once an individual comes into contact with an infected person, she can become infected. Eventually she will recover and develop immunity, losing her susceptibility to the particular strain of influenza.

ii. Homogenous Mixing
The homogenous mixing hypothesis (also called fully mixed or mass-action approximation) assumes that each individual has the same chance of coming into contact with an infected individual. This hypothesis eliminates the need to know the precise contact network on which the disease spreads, replacing it with the assumption that anyone can infect anyone else.

Figure 10.3
Pathogens
A pathogen, a word rooted in the Greek words “suffering, passion” (pathos) and “producer of” (genes), denotes an infectious agent or germ. A pathogen could be a disease-causing microorganism, like a virus, a bacterium, a prion, or a fungus. The figure shows several much-studied pathogens, like the HIV virus, responsible for AIDS, an influenza virus and the hepatitis C virus. After http://www.livescience.com/18107-hiv-therapeutic-vaccines-promise.html and http://www.huffingtonpost.com/2014/01/13/deadly-viruses-beautiful-photos_n_4545309.html

In this section we introduce the epidemic modeling framework built on these two hypotheses. To be specific, we explore the dynamics of three frequently used epidemic models, the so-called SI, SIS and SIR models, which help us understand the basic building blocks of epidemic modeling.
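Before turning to the individual models, the two hypotheses can be made concrete with a small stochastic simulation. The sketch below is only an illustration under our own assumptions: time advances in discrete steps, each infectious individual meets one uniformly chosen member of the population per step (homogenous mixing), and the per-step probabilities `p_transmit` and `p_recover` are hypothetical parameters that do not appear in the text.

```python
import random

def simulate(n, n_steps, p_transmit, p_recover, seed=0):
    """Track how many individuals sit in compartments S, I and R over time."""
    rng = random.Random(seed)
    state = ["S"] * n          # compartmentalization: one label per individual
    state[0] = "I"             # a single initially infectious individual
    history = []
    for _ in range(n_steps):
        new_state = list(state)            # synchronous update
        for person, label in enumerate(state):
            if label != "I":
                continue
            other = rng.randrange(n)       # homogenous mixing: anyone can meet anyone
            if state[other] == "S" and rng.random() < p_transmit:
                new_state[other] = "I"
            if rng.random() < p_recover:
                new_state[person] = "R"    # recovered individuals stay immune here
        state = new_state
        history.append({c: state.count(c) for c in "SIR"})
    return history

# Example run with hypothetical parameter values.
history = simulate(n=200, n_steps=50, p_transmit=0.5, p_recover=0.1)
print(history[-1])
```

Setting `p_recover` to zero removes the R compartment from the dynamics, so that infected individuals never leave the infectious state.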

SUSCEPTIBLE-INFECTED (SI) MODEL

Consider a disease that spreads in a population of N individuals. Denote with S(t) the number of individuals who are susceptible (healthy) at time t and with I(t) the number of individuals who have already been infected. At time t=0 everyone is susceptible (S(0)=N) and no one is infected (I(0)=0). Let us assume that a typical individual has ⟨k⟩ contacts and that the likelihood that the disease will be transmitted from an infected to a susceptible individual in a unit time is β. We ask the following: If a single individual becomes infected at time t=0 (i.e. I(0)=1), how many individuals will be infected at some later time t?

Within the homogenous mixing hypothesis the probability that the infected person encounters a susceptible individual is S(t)/N. Therefore the infected person comes into contact with ⟨k⟩S(t)/N susceptible individuals in a unit time. Since I(t) infected individuals are transmitting the pathogen, each at rate β, the average number of new infections dI(t) during a timeframe dt is

$$\beta \langle k \rangle \frac{S(t) I(t)}{N}\, dt .$$

Consequently I(t) changes at the rate

$$\frac{dI(t)}{dt} = \beta \langle k \rangle \frac{S(t) I(t)}{N} , \tag{10.1}$$

where the product β⟨k⟩ is called the transmission rate or transmissibility. Throughout this chapter we will use the variables

$$s(t) = S(t)/N, \qquad i(t) = I(t)/N \tag{10.2}$$

to capture the fraction of the susceptible and of the infected population at time t. For simplicity we also drop the (t) variable from i(t) and s(t), re-writing (10.1) as (ADVANCED TOPICS 10.A)

$$\frac{di}{dt} = \beta \langle k \rangle s i = \beta \langle k \rangle i (1-i) . \tag{10.3}$$

We solve (10.3) by writing

$$\frac{di}{i} + \frac{di}{1-i} = \beta \langle k \rangle dt .$$

Integrating both sides, we obtain

$$\ln i - \ln(1-i) + C = \beta \langle k \rangle t .$$

With the initial condition i0 = i(t=0), we get C = −ln[i0/(1−i0)], obtaining that the fraction of infected individuals increases in time as

$$i = \frac{i_0 e^{\beta \langle k \rangle t}}{1 - i_0 + i_0 e^{\beta \langle k \rangle t}} . \tag{10.4}$$

Equation (10.4) predicts that:
• At the beginning the fraction of infected individuals increases exponentially (Figure 10.4b). Indeed, early on an infected individual encounters only susceptible individuals, hence the pathogen can easily spread.
• The characteristic time required to reach a 1/e fraction (about 36%) of all susceptible individuals is

$$\tau = \frac{1}{\beta \langle k \rangle} . \tag{10.5}$$

Hence τ is the inverse of the speed with which the pathogen spreads through the population. Equation (10.5) predicts that increasing either the density of links ⟨k⟩ or β enhances the speed of the pathogen and reduces the characteristic time.
• With time an infected individual encounters fewer and fewer susceptible individuals. Hence the growth of i slows for large t (Figure 10.4b). The epidemic ends when everyone has been infected, i.e. when i(t→∞)=1 and s(t→∞)=0.

Figure 10.4
The Susceptible-Infected (SI) Model
(a) In the SI model an individual can be in one of two states: susceptible (healthy) or infected (sick). The model assumes that if a susceptible individual comes into contact with an infected individual, it becomes infected at rate β. The arrow indicates that once an individual becomes infected, it stays infected, hence it cannot recover.
(b) Time evolution of the fraction of infected individuals, as predicted by (10.4). At early times the fraction of infected individuals grows exponentially. As eventually everyone becomes infected, at large times we have i(∞)=1.
Panel (b) axes: fraction infected i(t) versus t. Annotations: exponential regime (if i is small, i ≈ i0 e^(β⟨k⟩t)); saturation regime (if i → 1, di/dt → 0).

SUSCEPTIBLE-INFECTED-SUSCEPTIBLE (SIS) MODEL

Most pathogens are eventually defeated by the immune system or by treatment. To capture this fact we need to allow the infected individuals to recover, ceasing to spread the disease. With that we arrive at the so-called SIS model, which has the same two states as the SI model, susceptible and infected. The difference is that now infected individuals recover at a fixed rate μ, becoming susceptible again (Figure 10.5a). The equation describing the dynamics of this model is an extension of (10.3),

$$\frac{di}{dt} = \beta \langle k \rangle i(1-i) - \mu i , \tag{10.6}$$

where μ is the recovery rate and the μi term captures the rate at which the population recovers from the disease. The solution of (10.6) provides the fraction of infected individuals as a function of time (Figure 10.5b),

$$i = \left(1 - \frac{\mu}{\beta \langle k \rangle}\right) \frac{C e^{(\beta \langle k \rangle - \mu)t}}{1 + C e^{(\beta \langle k \rangle - \mu)t}} , \tag{10.7}$$

where the initial condition i0 = i(t=0) gives C = i0/(1−i0−μ/β⟨k⟩).

Figure 10.5
The Susceptible-Infected-Susceptible (SIS) Model
(a) The SIS model has the same states as the SI model: susceptible and infected. It differs from the SI model in that it allows recovery, i.e. infected individuals are cured, becoming susceptible again at rate μ.
(b) Time evolution of the fraction of infected individuals in the SIS model, as predicted by (10.7). As recovery is possible, at large t the system reaches an endemic state, in which the fraction of infected individuals is constant, i(∞), given by (10.8). Hence in the endemic state only a finite fraction of individuals are infected. Note that for high recovery rate μ the number of infected individuals decreases exponentially and the disease dies out.
Panel (a) labels: susceptible (healthy) S, infected (sick) I, infection, recovery. Panel (b) axes: fraction infected i(t) versus t. Annotations: exponential outbreak (if i is small, i ≈ i0 e^((β⟨k⟩−μ)t)); endemic state (i(∞) = 1 − μ/β⟨k⟩).

While in the SI model eventually everyone becomes infected, (10.7) predicts that in the SIS model the epidemic has two possible outcomes:

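Endemic State (μ < β⟨k⟩)
For a low recovery rate the fraction of infected individuals does not reach one, but saturates at a constant value i(∞) < 1 (Figure 10.5b). Setting di/dt = 0 in (10.6) gives

$$i(\infty) = 1 - \frac{\mu}{\beta \langle k \rangle}. \tag{10.8}$$

Disease-free State (μ > β⟨k⟩)
For a sufficiently high recovery rate the exponent in (10.7) is negative, hence i decreases exponentially with time. In this state the number of individuals who recover per unit time exceeds the number of newly infected individuals. Therefore with time the pathogen disappears from the population.

Table 10.2 (caption only)
If R0 > 1 the pathogen will spread and persist in the population. The higher R0 is, the faster the spreading process. The table lists R0 for several well-known pathogens. After [7].

In other words, the SIS model predicts that some pathogens will persist in the population while others die out shortly. To understand what governs the difference between these two outcomes we write the characteristic time of a pathogen as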

$$\tau = \frac{1}{\mu (R_0 - 1)}, \tag{10.9}$$

where

$$R_0 = \frac{\beta \langle k \rangle}{\mu} \tag{10.10}$$

is the basic reproductive number. It represents the average number of susceptible individuals infected by an infected individual during its infectious period in a fully susceptible population. In other words, R0 is the number of new infections each infected individual causes under ideal circumstances. The basic reproductive number is valuable for its predictive power:

• If R0 exceeds unity, τ is positive, hence the epidemic is in the endemic state. Indeed, if each infected individual infects more than one healthy person, the pathogen is poised to spread and persist in the population. The higher R0 is, the faster the spreading process.

• If R0 < 1 then τ is negative and the epidemic dies out. Indeed, if each infected individual infects fewer than one additional person on average, the pathogen cannot persist in the population.
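The two regimes above can be checked numerically. The following is a minimal sketch, not part of the original text, that evaluates (10.7), (10.9) and (10.10) for given parameters. The function names and the parameter values in the usage note are illustrative assumptions.

```python
import math


def sis_summary(beta, k_avg, mu):
    """Return (R0, tau, i_inf) for the homogeneous-mixing SIS model.

    R0 follows Eq. (10.10) and tau follows Eq. (10.9). i_inf is the endemic
    fraction 1 - mu / (beta * <k>), clipped at zero in the disease-free state.
    """
    r0 = beta * k_avg / mu
    # tau diverges exactly at the threshold R0 = 1.
    tau = math.inf if r0 == 1.0 else 1.0 / (mu * (r0 - 1.0))
    i_inf = max(0.0, 1.0 - mu / (beta * k_avg))
    return r0, tau, i_inf


def sis_fraction(t, beta, k_avg, mu, i0):
    """Closed-form i(t) of Eq. (10.7), with C fixed by i(t=0) = i0.

    Assumes i0 differs from 1 - mu / (beta * <k>), otherwise C is undefined,
    and that t is moderate enough for the exponential not to overflow.
    """
    a = 1.0 - mu / (beta * k_avg)
    c = i0 / (a - i0)
    e = c * math.exp((beta * k_avg - mu) * t)
    return a * e / (1.0 + e)
```

With the assumed values β = 0.2, ⟨k⟩ = 5 and μ = 0.5, `sis_summary` gives R0 = 2, a positive τ and an endemic fraction of one half. Raising μ to 2 pushes R0 below one, τ turns negative, and `sis_fraction` decays toward zero.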

Consequently, the reproductive number is one of the first parameters epidemiologists estimate for a new pathogen, gauging the severity of the problem they face. For several well-studied pathogens R0 is listed in Table 10.2. The high R0 of some of these pathogens underlies the dangers they pose: For example, each individual infected with measles causes over a dozen subsequent infections.

SUSCEPTIBLE-INFECTED-RECOVERED (SIR) MODEL

For many pathogens, like most strains of influenza, individuals develop immunity after they recover from the infection. Hence, instead of returning to the susceptible state, they are “removed” from the population. These recovered individuals do not count any longer from the perspective of the pathogen as they cannot be infected, nor can they infect others. The SIR model, whose properties are discussed in Figure 10.6, captures the dynamics of this process.

In summary, depending on the characteristics of a pathogen, we need different models to capture the dynamics of an epidemic outbreak. As shown in Figure 10.7, the predictions of the SI, SIS, and SIR models agree with each other in the early stages of an epidemic: When the number of infected individuals is small, the disease spreads freely and the number of infected individuals increases exponentially. The outcomes are different for large times: In the SI model everyone becomes infected; the SIS model either reaches an endemic state, in which a finite fraction of individuals are always infected, or the infection dies out; in the SIR model everyone recovers at the end. The reproductive number predicts the long-term fate of an epidemic: for R0 > 1 the pathogen persists in the population, while for R0 < 1 it dies out naturally.

The models discussed so far have ignored the fact that an individual comes into contact only with its network-based neighbors in the pertinent contact network. We assumed homogenous mixing instead, which means that an infected individual can infect any other individual. It also means that an infected individual typically infects only ⟨k⟩ other individuals, ignoring variations in node degrees. To accurately predict the dynamics of an epidemic, we need to consider the precise role the contact network plays in epidemic phenomena.

Figure 10.6
The Susceptible-Infected-Recovered (SIR) Model
(a) In contrast with the SIS model, in the SIR model recovered individuals enter a recovered state, meaning that they develop immunity rather than becoming susceptible again. Flu, SARS and Plague are diseases with this property, hence we must use the SIR model to describe their spread. The states are susceptible (healthy), infected (sick) and removed (immune/dead), connected by infection and removal.
(b) The differential equations governing the time evolution of the fraction of individuals in the susceptible s, infected i and the removed r state:

$$\frac{ds}{dt} = -\beta \langle k \rangle i \,[1 - r - i], \qquad \frac{di}{dt} = -\mu i + \beta \langle k \rangle i \,[1 - r - i], \qquad \frac{dr}{dt} = \mu i$$

(c) The time dependent behavior of s, i and r as predicted by the equations shown in (b), plotted as the fraction of the population versus t. According to the model all individuals transition from a susceptible (healthy) state to the infected (sick) state and then to the recovered (immune) state.
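Because the SIR equations of Figure 10.6b have no closed-form solution, they are usually integrated numerically. The sketch below uses a plain forward-Euler step, which is an assumption made for brevity rather than the method used to produce the figure. The function name, step size and parameter values are illustrative.

```python
def simulate_sir(beta, k_avg, mu, i0, t_max, dt=0.01):
    """Integrate the homogeneous-mixing SIR equations with forward Euler.

    Returns a list of (t, s, i, r) tuples. Uses s = 1 - r - i, so
    ds/dt = -beta <k> i s, di/dt = beta <k> i s - mu i, dr/dt = mu i.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(0.0, s, i, r)]
    steps = int(round(t_max / dt))
    for n in range(1, steps + 1):
        new_infections = beta * k_avg * i * s * dt   # flow from S to I
        new_recoveries = mu * i * dt                 # flow from I to R
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((n * dt, s, i, r))
    return history


# Illustrative run with assumed parameters (R0 = beta * <k> / mu = 2).
trajectory = simulate_sir(beta=0.2, k_avg=5, mu=0.5, i0=0.01, t_max=60)
t_end, s_end, i_end, r_end = trajectory[-1]
print(f"t={t_end:.0f}: s={s_end:.3f}, i={i_end:.6f}, r={r_end:.3f}")
```

In such a run the infected fraction first grows roughly exponentially, peaks, and then decays toward zero as individuals move into the recovered state. A smaller `dt` or a higher-order integrator would be the natural refinement if accuracy matters.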

Figure 10.7
Comparing the SI, SIS and SIR Models

Exponential Regime (number of infected individuals grows exponentially). SI: $i = \frac{i_0 e^{\beta \langle k \rangle t}}{1 - i_0 + i_0 e^{\beta \langle k \rangle t}}$; SIS: $i = \left(1 - \frac{\mu}{\beta \langle k \rangle}\right) \frac{C e^{(\beta \langle k \rangle - \mu)t}}{1 + C e^{(\beta \langle k \rangle - \mu)t}}$; SIR: no closed solution.
Final Regime (saturation at t→∞). SI: $i(\infty) = 1$; SIS: $i(\infty) = 1 - \frac{\mu}{\beta \langle k \rangle}$.
Epidemic Threshold (disease does not always spread). SI: no threshold; SIS: $R_0 = 1$.

The plot shows growth of the fraction of infected individuals, i(t), in the SI, SIS and SIR models. Two different regimes stand out:

Exponential Regime The models predict an exponential growth in the number of infected individuals during the early stages of the epidemic. For the same β the SI model predicts the fastest growth (smallest τ, see (10.5)). For the SIS and SIR models the growth is slowed by recovery, resulting in a larger τ, as predicted by (10.9). Note that for sufficiently high recovery rate μ the SIS and the SIR models predict a disease-free state, when the number of infected individuals decays exponentially with time.

Final Regime The three models predict different long-term outcomes: In the SI model everyone becomes infected, i(∞)=1; in the SIS model a finite fraction of individuals are infected i(∞)

spreading on fully connected networks.

Mobile Phone Viruses
Mobile phone viruses spread via MMS and Bluetooth (Figure 10.2). An MMS virus sends a copy of itself to all phone numbers found in the phone's contact list. Therefore MMS viruses exploit the social network behind mobile communications. As shown in Table 4.1, the mobile call network is scale-free with a high degree exponent. Mobile viruses can also spread via Bluetooth, passing a copy of themselves to all susceptible phones with a BT connection in their physical proximity. As discussed above, this co-location network is also highly heterogeneous [4].

In summary, in the past decade technological advances allowed us to map out the structure of several networks that support the spread of biological or digital viruses, from sexual to proximity-based contact networks (see also ONLINE RESOURCE 10.2). Many of these, like the email network, the Internet, or sexual networks, are scale-free. For others, like co-location networks, the degree distribution may not be fitted with a simple power law, yet it shows significant degree heterogeneity with high ⟨k²⟩. This means that the analytical results obtained in the previous section are of direct relevance to pathogens spreading on most networks. Consequently the underlying heterogeneous contact networks allow even weakly virulent viruses to easily spread in the population.


SECTION 10.5

BEYOND THE DEGREE DISTRIBUTION

So far we have kept our models simple: We assumed that pathogens spread on an unweighted network uniquely defined by its degree distribution. Yet, real networks have a number of characteristics that are not captured by pk alone, like degree correlations or community structure. Furthermore, the links are typically weighted and the interactions have a finite temporal duration. In this section we explore the impact of these properties on the spread of a pathogen.

TEMPORAL NETWORKS

Most interactions that we perceive as social links are brief and infrequent. As a pathogen can be only transmitted when there is an actual contact, an accurate modeling framework must also consider the timing and the duration of each interaction. Ignoring the timing of the interactions can lead to misleading conclusions [39-41]. For example, the static network of Figure 10.17b was obtained by aggregating the individual interactions shown in Figure 10.17a. On the aggregated network the infection has the same chance of spreading from D to A as from A to D. Yet, by inspecting the timing of each interaction, we realize that while an infection starting from A can infect D, an infection that starts at D cannot reach A. Therefore, to accurately predict an epidemic process we must consider the fact that pathogens spread on temporal networks, a topic of increasing interest in network science [40-43]. By ignoring the temporality of these contact patterns, we typically overestimate the speed and the extent of an outbreak [42,43].

Figure 10.17
Temporal Networks
Most interactions in a network are not continuous, but have a finite duration. We must therefore view the underlying networks as temporal networks, an increasingly active research topic in network science.
(a) Temporal Network
The timeline of the interactions between four individuals, A, B, C and D, shown along a time axis running from 9:00 to 15:00. Each vertical line marks the moment when two individuals come into contact with each other. If A is the first to be infected, the pathogen can spread from A to B and then to C, eventually reaching D. If, however, D is the first to be infected, the disease can reach C and B, but not A. This is because there is a temporal path from A to D, but not from D to A.
(b) Aggregated Network
The network obtained by merging the temporal interactions shown in (a). If we only have access to this aggregated representation, the pathogen can reach all individuals, independent of its starting point. After [40].
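The asymmetry between A and D can be reproduced with a few lines of code. The sketch below is illustrative only: the contact times are hypothetical, chosen to mimic the situation described in Figure 10.17 rather than read off the figure, and transmission is assumed to succeed at every contact.

```python
from collections import deque

# Hypothetical time-stamped contacts (hour, person, person).
contacts = [(9, "A", "B"), (11, "B", "C"), (13, "C", "D"), (15, "B", "C")]


def temporal_reach(contacts, source):
    """Nodes reachable from source along time-respecting paths.

    A node infected during a contact at time t can only transmit at
    strictly later contacts.
    """
    infected_at = {source: float("-inf")}
    for t, u, v in sorted(contacts):
        for a, b in ((u, v), (v, u)):
            if a in infected_at and infected_at[a] < t and b not in infected_at:
                infected_at[b] = t
    return set(infected_at)


def static_reach(contacts, source):
    """Nodes reachable from source on the aggregated (time-ignoring) network."""
    neighbors = {}
    for _, u, v in contacts:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(temporal_reach(contacts, "A")))  # outbreak seeded at A
print(sorted(temporal_reach(contacts, "D")))  # outbreak seeded at D
print(sorted(static_reach(contacts, "D")))    # same seed, timing ignored
```

With these assumed contacts, an outbreak seeded at A reaches all four individuals, one seeded at D never reaches A, and the aggregated network hides the difference.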

BURSTY CONTACT PATTERNS

The theoretical approaches discussed in SECTIONS 10.2 and 10.3 assume that the timing of the interactions between two connected nodes is random. This means that the interevent times between consecutive contacts follow an exponential distribution, resulting in a random but uniform sequence of events (Figure 10.18a-c). The measurements indicate otherwise: The interevent times in most social systems follow a power law distribution [35,44] (Figure 10.18d-f). This means that the sequence of contacts between two individuals is characterized by periods of frequent interactions, when multiple contacts follow each other within a relatively short time frame. Yet, the power law also implies that occasionally there are very long time gaps between two contacts. Therefore the contact patterns have an uneven, “bursty” character in time (Figure 10.18d,e).

Bursty interactions are observed in a number of contact processes of relevance for epidemic phenomena, from email communications to call patterns and sexual contacts. Once present, burstiness alters the dynamics of the spreading process [43]. To be specific, power law interevent times increase the characteristic time τ, consequently the number of infected individuals decays slower than predicted by a random contact pattern. For example, if the timing of consecutive emails followed a Poisson process, an email virus would decay following i(t)~exp(–t/τ) with a decay time of τ≈1 day. In the real data, however, the decay time is τ≈21 days, a much slower process, correctly predicted by the theory if we use power law interevent times [43].

Figure 10.18
Bursty Interactions
(a) If the pattern of activity of an individual is random, the interevent times follow a Poisson process, which assumes that in any moment an event takes place with the same probability q. The horizontal axis denotes time and each vertical line corresponds to an event whose timing is chosen at random. The observed interevent times are comparable to each other and very long delays are rare.
(b) The absence of long delays is visible if we show the interevent times τi (delay time versus event number) for 1,000 consecutive random events. The height of each vertical line corresponds to the gaps seen in (a).
(c) The probability of finding exactly n events within a fixed time interval follows the Poisson distribution $P(n,q) = e^{-qt}(qt)^n / n!$, predicting that the interevent time distribution follows $P(\tau_i) \sim e^{-q\tau_i}$, shown on a log-linear plot.
(d) The succession of events for a temporal pattern whose interevent times follow a power-law distribution. While most events follow each other closely, forming bursts of activity, there are a few exceptionally long interevent times, corresponding to long gaps in the contact pattern. The time sequence is not as uniform as in (a), but has a bursty character.
(e) The waiting time τi of 1,000 consecutive events, where the mean event time is chosen to coincide with the mean event time of the Poisson process shown in (b). The large spikes correspond to exceptionally long delays.
(f) The delay time distribution $P(\tau_i) \sim \tau_i^{-2}$ for the bursty process shown in (d) and (e), shown on a log-log plot. After [35].
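The contrast between the two kinds of timing can be seen by sampling interevent times directly. The sketch below is an illustration under stated assumptions: it draws exponential waiting times with unit rate for the random case and Pareto waiting times with exponent one, i.e. P(τ) ~ τ⁻² for τ ≥ 1, for the bursty case. Unlike Figure 10.18e it does not match the means of the two samples, and the seed and sample size are arbitrary.

```python
import random
import statistics


def interevent_times(n, bursty, rng):
    """Sample n interevent times.

    bursty=False: exponential waiting times (Poisson process, unit rate).
    bursty=True:  Pareto waiting times with shape 1, so P(tau) ~ tau**-2.
    """
    if bursty:
        return [rng.paretovariate(1.0) for _ in range(n)]
    return [rng.expovariate(1.0) for _ in range(n)]


def max_to_median(times):
    """Ratio of the longest gap to the typical gap; large values signal bursts."""
    return max(times) / statistics.median(times)


rng = random.Random(42)  # fixed seed so repeated runs give the same sample
poisson = interevent_times(10_000, bursty=False, rng=rng)
bursty = interevent_times(10_000, bursty=True, rng=rng)

print("random pattern, longest/typical gap:", round(max_to_median(poisson), 1))
print("bursty pattern, longest/typical gap:", round(max_to_median(bursty), 1))
```

For the exponential sample the longest gap stays within a modest multiple of the typical one, while the power-law sample contains gaps that are orders of magnitude longer. These long gaps are what slow the decay of an outbreak.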

DEGREE CORRELATIONS

As discussed in CHAPTER 7, many social networks are assortative, implying that high degree nodes tend to connect to other high degree nodes. Do these degree correlations affect the spread of a pathogen? The calculations indicate that degree correlations leave key aspects of network epidemics in place, but they alter the speed with which a pathogen spreads in a network:

• Degree correlations alter the epidemic threshold λc: assortative correlations decrease λc and disassortative correlations increase it [45,46].

• Despite the changes in λc, for the SIS model the epidemic threshold vanishes for a scale-free network with diverging second moment, whether the network is assortative, neutral or disassortative [47]. Hence the fundamental results of SECTION 10.3 are not affected by degree correlations.

• Given that hubs are the first to be infected in a network, assortativity accelerates the spread of a pathogen. In contrast disassortativity slows the spreading process.

• Finally, in the SIR model assortative correlations were found to lower the prevalence but increase the average lifetime of an epidemic outbreak [48].

LINK WEIGHTS AND COMMUNITIES

Throughout this chapter we assumed that all tie strengths are equal, focusing our attention on pathogens spreading on an unweighted network. In reality tie strengths vary considerably, a heterogeneity that plays an important role in spreading phenomena. Indeed, the more time an individual spends with an infected individual, the more likely that she too becomes infected. In the same vein, previously we ignored the community structure of the network on which the pathogen spreads. Yet, the existence of communities (CHAPTER 9) leads to repeated interactions between the nodes within the same community, altering the spreading dynamics.

The mobile phone network allows us to explore the role of tie strengths and communities on spreading phenomena [49]. Let us assume that at t=0 we provide a randomly selected individual with some key information. At each time step this “infected” individual i passes the information to her contact j with probability pij ~ βwij, where β is the spreading probability and wij is the strength of the tie, captured by the number of minutes i and j have spent with each other on the phone. Indeed, the more time two individuals talk, the higher is the chance that they will pass on the information. To understand the role of the link weights in the spreading process, we also consider the situation when the spreading takes place on a control network, that has the same wiring diagram but all tie strengths are set equal to w=⟨wij⟩.

As Figure 10.19a illustrates, information travels significantly faster on the control network. The reduced speed observed in the real system indicates that the information is trapped within communities. Indeed, as we discussed in CHAPTER 9, strong ties tend to be within communities while weak ties are between them [50]. Therefore, once the information reaches a member of a community, it can rapidly reach all other members of the same community, given the strong ties between them. Yet, as the ties between the communities are weak, the information has difficulty escaping the community. Consequently the rapid invasion of the community is followed by long intervals during which the infection is trapped within a community. When all link weights are equal (control), the bridges between communities are strengthened, and the trapping vanishes.

The difference between the real and the control spreading process is illustrated by Figure 10.19b,c, which shows the spreading pattern in a small neighborhood of the mobile call network. In the control simulation the information tends to follow the shortest path. When the link weights are taken into account, information flows along a longer backbone with strong ties. For example, the information rarely reaches the lower half of the network in Figure 10.19b, a region always reached in the control simulation shown in (c).

Figure 10.19
Information Diffusion in Mobile Phone Networks
The spread of information on a weighted mobile call graph, where the probability that a node passes information to one of its neighbors is proportional to the strength of the tie between them. The tie strength is the number of minutes two individuals talk on the phone.
(a) The fraction of infected users as a function of time t. The blue circles capture the spread on the network with the real tie strengths; the green symbols represent the control case, when all tie strengths are equal.
(b) Spreading in a small network neighborhood, following the real link weights. The information is released from the red node, the arrow weight indicating the tie strength. The simulation was repeated 1,000 times; the size of the arrowheads is proportional to the number of times the information was passed along the corresponding direction, and the color indicates the total number of transmissions along that link. The background contours highlight the difference in the direction the information follows in the real and the control simulations.
(c) Same as in (b), but we assume that each link has the same weight w=⟨wij⟩ (control). After [49].
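The trapping effect follows directly from the transmission rule pij ~ βwij. The sketch below is a toy illustration, not the simulation of [49]: the tie strengths are hypothetical, the proportionality is assumed to be an equality capped at one, and β is an arbitrary value. It compares how long, on average, a weak bridge takes to transmit in the real and in the control case.

```python
# Hypothetical tie strengths w_ij (minutes of calls) for two tightly knit
# groups {a, b, c} and {d, e, f} joined by a single weak bridge (c, d).
weights = {
    ("a", "b"): 60, ("a", "c"): 75, ("b", "c"): 45,   # community 1
    ("d", "e"): 50, ("d", "f"): 40, ("e", "f"): 64,   # community 2
    ("c", "d"): 2,                                    # weak bridge
}
BETA = 0.01  # assumed spreading probability per minute of talk time


def transmission_probability(beta, weight):
    """Per-step probability p_ij = beta * w_ij, capped at one (assumption)."""
    return min(1.0, beta * weight)


def expected_wait(beta, weight):
    """Mean number of time steps until the link transmits (geometric waiting time)."""
    return 1.0 / transmission_probability(beta, weight)


# Control network: same wiring, every tie set to the mean weight <w_ij>.
mean_weight = sum(weights.values()) / len(weights)

bridge_real = expected_wait(BETA, weights[("c", "d")])
bridge_control = expected_wait(BETA, mean_weight)
strong_real = expected_wait(BETA, weights[("a", "c")])

print(f"bridge, real weights:    {bridge_real:.1f} steps")
print(f"bridge, control weights: {bridge_control:.1f} steps")
print(f"strong tie, real:        {strong_real:.1f} steps")
```

In this toy setting the strong intra-community ties transmit within a step or two, while the weak bridge takes far longer with the real weights than in the control case, which is the mechanism behind the slower spread on the real network.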

COMPLEX CONTAGION

Communities have multiple consequences for spreading, from inducing global cascades [51,52] to altering the activity of individuals [53]. The diffusion of memes, representing ideas or behavior that spread from individual to individual, further highlights the important role of communities [54]. Meme diffusion has attracted considerable attention from marketing [5, 55] to network science [56,57], communications [58], and social media [59-61].

Pathogens and memes can follow different spreading patterns, prompting us to systematically distinguish simple from complex contagion [54,62,63]. Simple contagion is the process we explored so far: It is sufficient to come into contact with an infected individual to be infected. The spread of memes, products and behavior is often described by complex contagion, capturing the fact that most individuals do not adopt a new meme, product or behavioral pattern at the first contact. Rather, adoption requires reinforcement [64], i.e. repeated contact with several individuals who have already adopted. For example, the higher the fraction of a person’s friends that have a mobile phone, the more likely that she also buys one.

In simple contagion communities trap information or a pathogen, slowing the spreading (Figure 10.19a). The effect is reversed in complex contagion: Because communities have redundant ties, they offer social reinforcement, exposing an individual to multiple examples of adoption. Hence communities can incubate a meme, a product or a behavioral pattern, enhancing its adoption.

The difference between simple and complex contagion is well captured by Twitter data. Tweets, or short messages, are often labeled with hashtags, which are keywords acting as memes. Twitter users can follow other users, receiving their messages; they can forward tweets to their own followers (retweet), or mention others in tweets. The measurements indicate that most hashtags are trapped in specific communities, a signature of complex contagion [54]. A high concentration of a meme within a certain community is evidence of reinforcement. In contrast, viral memes spread across communities, following a pattern similar to that encountered in biological pathogens. In general the more communities a meme reaches, the more viral it is (Figure 10.20).

Figure 10.20
Simple vs. Complex Contagion
The community structure of the Twitter follower network. Each circle corresponds to a community and its size is proportional to the number of tweets produced by the respective community. The color of a community represents the time when the studied hashtag (meme) is first used in the community. Lighter colors denote the first communities to use a hashtag, darker colors denote the last community to adopt it.
(a) Simple Contagion
The evolution of the viral meme captured by the #ThoughtsDuringSchool hashtag from its early stage (30 tweets, left) to the late stage (200 tweets, right). The meme jumps easily between communities, infecting many of them, following a contagion pattern encountered in the case of biological pathogens.
(b) Complex Contagion
The evolution of a non-viral meme captured by the #ProperBand hashtag from the early stage (left) to the final stage (65 tweets, right). The meme is trapped in a few communities, having difficulty escaping them. This is a signature of reinforcement, an indication that the meme follows complex contagion. After [54].

In summary, several network characteristics can affect the spread of a pathogen in a network, from degree correlations to link weights and the bursty nature of the contact pattern. As we discussed in this section, some network characteristics slow a pathogen, others aid its spread. These effects must therefore be accounted for if we wish to predict the spread of a real pathogen. While these patterns are of obvious relevance for infectious diseases, they also influence the spread of such non-infectious diseases as obesity (BOX 10.2).
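The distinction between simple and complex contagion discussed in this section can be made concrete with a deterministic threshold rule. The sketch below is a simplified illustration and not the model used in [54]: a node adopts once at least `threshold` of its neighbors have adopted, so `threshold=1` mimics simple contagion and `threshold=2` mimics the need for reinforcement. The network, two cliques joined by one link, and all names are hypothetical.

```python
def two_cliques_with_bridge(size=4):
    """Two fully connected groups of `size` nodes joined by a single link."""
    adjacency = {n: set() for n in range(2 * size)}
    for block in (range(size), range(size, 2 * size)):
        for u in block:
            for v in block:
                if u != v:
                    adjacency[u].add(v)
    adjacency[size - 1].add(size)   # the only tie between the communities
    adjacency[size].add(size - 1)
    return adjacency


def spread(adjacency, seeds, threshold):
    """Synchronous adoption: a node adopts once at least `threshold`
    of its neighbors have adopted. Returns the final set of adopters."""
    adopted = set(seeds)
    while True:
        new = {
            node
            for node, nbrs in adjacency.items()
            if node not in adopted and len(nbrs & adopted) >= threshold
        }
        if not new:
            return adopted
        adopted |= new


network = two_cliques_with_bridge()
simple = spread(network, seeds={0, 1}, threshold=1)
complex_ = spread(network, seeds={0, 1}, threshold=2)
print("simple contagion reaches: ", sorted(simple))
print("complex contagion reaches:", sorted(complex_))
```

With two seeds in the first community, the simple rule crosses the bridge and reaches every node. The reinforced rule saturates the seeded community, where redundant ties supply multiple exposures, but stalls at the bridge, where only a single adopter is visible.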


BOX 10.2 DO OUR FRIENDS MAKE US FAT?

Infectious diseases, like influenza, SARS, or AIDS, spread through the transmission of a pathogen. But could the social network aid the spread of noninfectious diseases as well? Recent measurements indicate that it does, offering evidence that social networks can impact the spread of obesity, happiness, and behavioral patterns, like giving up smoking [65,66].

Obesity is diagnosed through an individual’s body-mass index (BMI), which is determined by numerous factors, from genetics to diet and exercise. The measurements show that our friends also play an important role. The analysis of the social network of 5,209 men and women has found that if one of our friends is obese, the risk that we too gain weight in the next two to four years increases by 57% [65]. The risk triples if our best friend is overweight: In this case, our chances of weight gain jump by 171% (Figure 10.21). For all practical purposes, obesity appears to be just as contagious as influenza or AIDS, despite the fact that there is no "obesity pathogen" that transmits it.

Online Resource 10.3
Spreading in Social Networks
“If your friends are obese, your risk of obesity is 45 percent higher. … If your friend’s friends are obese, your risk of obesity is 25 percent higher. … If your friend’s friend’s friend, someone you probably don’t even know, is obese, your risk of obesity is 10 percent higher. It’s only when you get to your friend’s friend’s friend’s friends that there’s no longer a relationship between that person’s body size and your own body size.”
Watch Nicholas Christakis explaining the spread of health patterns in social networks.

Figure 10.21
The Web of Obesity
The largest connected component of the social network capturing the friendship ties between 2,200 individuals enrolled in the Framingham Heart Study. Each node represents an individual; nodes with blue borders are men, those with red borders are women. The size of each node is proportional to the person's BMI, yellow nodes denoting obese individuals (BMI ≥30). Purple links are friendship or marital ties and orange links are family ties (e.g. siblings). Clusters of obese and non-obese individuals are visible in the network. The analysis indicates that these clusters cannot be attributed to homophily, i.e. the fact that individuals of similar body size may befriend each other. They document instead a complex contagion process, capturing the "spread" of obesity along the links of the social network. After [65].

SPREADING PHENOMENA


SECTION 10.6

IMMUNIZATION

Immunization strategies specify how vaccines, treatments or drugs are distributed in the population. Ideally, should a treatment or vaccine exist, it should be given to every infected individual and to everyone at risk of contracting the pathogen. Yet cost considerations, the difficulty of reaching all individuals at risk, and real or perceived side effects of the treatment often prohibit full coverage. Given these constraints, immunization strategies aim to minimize the threat of a pandemic by distributing the available vaccines or treatments as effectively as possible.

Immunization strategies are guided by an important prediction of the traditional epidemic models: If a pathogen’s spreading rate λ is reduced under its critical threshold λc, the virus naturally dies out (Figure 10.11). Yet the epidemic threshold vanishes in scale-free networks, calling the effectiveness of this strategy into question. Indeed, if the epidemic threshold vanishes, immunization strategies cannot move λ under λc. In this section we discuss how to use our understanding of the network topology to design effective network-based immunization strategies that counter the impact of the vanishing epidemic threshold.

RANDOM IMMUNIZATION

The main purpose of immunization is to protect the immunized individual from an infection. Equally important, however, is its secondary role: Immunization reduces the speed with which the pathogen spreads in a population. To illustrate this effect, consider the situation when a randomly selected fraction g of individuals is immunized in a population [8]. Let us assume that the pathogen follows the SIS model (10.3). The immunized nodes are invisible to the pathogen, and only the remaining (1–g) fraction of the nodes can contract and spread the disease. Consequently, the effective degree of each susceptible node changes from ⟨k⟩ to ⟨k⟩(1–g), which decreases the spreading rate of the pathogen from λ=β/µ to λ'=λ(1–g). Next we explore the consequences of this reduction in both random and scale-free contact networks.
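The rescaling λ'=λ(1–g) can be tried out numerically. The sketch below is only an illustration: the function names and the numerical values are assumptions, and the threshold λc is passed in as a plain number rather than derived from any particular network. Solving λ(1–g)=λc for g gives the smallest immunized fraction that brings the spreading rate down to the threshold.

```python
def effective_spreading_rate(beta: float, mu: float, g: float) -> float:
    """Spreading rate lambda' = lambda * (1 - g) after immunizing a random fraction g."""
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must lie in [0, 1]")
    return (beta / mu) * (1.0 - g)


def critical_immunized_fraction(beta: float, mu: float, lambda_c: float) -> float:
    """Smallest g with lambda * (1 - g) <= lambda_c (0 if lambda is already below it)."""
    lam = beta / mu
    return max(0.0, 1.0 - lambda_c / lam)


if __name__ == "__main__":
    beta, mu = 0.3, 0.1      # illustrative rates, giving lambda = 3
    lambda_c = 0.5           # assumed threshold value, for illustration only
    g_c = critical_immunized_fraction(beta, mu, lambda_c)
    print(f"critical immunized fraction g_c = {g_c:.3f}")
    print(f"lambda' at g_c = {effective_spreading_rate(beta, mu, g_c):.3f}")
```

If the threshold is close to zero, as it is for networks with a very large second moment, the required fraction approaches one, which is the difficulty the rest of this section addresses.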


• Random Networks

If the pathogen spreads on a random network, for a sufficiently high g the spreading rate λ' could fall below the epidemic threshold (10.25). The immunization rate gc necessary to achieve this is calculated by setting

$$\frac{(1-g_c)\beta}{\mu} = \frac{1}{\langle k \rangle + 1},$$

obtaining

$$g_c = 1 - \frac{\mu}{\beta} \frac{1}{\langle k \rangle + 1}. \qquad (10.27)$$

Consequently, if vaccination increases the fraction of immunized individuals above gc, it pushes the spreading rate under the epidemic threshold λc. In this case τ becomes negative and the pathogen dies out naturally. This explains why health officials encourage a high fraction of the population to take the influenza vaccine: The vaccine protects not only the individual, but also the rest of the population by decreasing the pathogen’s spreading rate. Similarly, a condom not only protects the individual who uses it from contracting the HIV virus, but also decreases the rate at which AIDS spreads in the sexual network. Hence for random networks a sufficiently high immunization rate can eliminate the pathogen from the population.

• Heterogeneous Networks

If the pathogen spreads on a network with high ⟨k²⟩, and random immunization changes λ to λ(1–g), we can use (10.26) to determine the critical immunization gc

$$\frac{\beta}{\mu}(1-g_c) = \frac{\langle k \rangle}{\langle k^2 \rangle}, \qquad (10.28)$$

obtaining

$$g_c = 1 - \frac{\mu}{\beta} \frac{\langle k \rangle}{\langle k^2 \rangle}. \qquad (10.29)$$

For a random network (10.29) reduces to (10.27). For a scale-free network with γ […]

BOX 10.3

HOW TO HALT AN EPIDEMIC?

Health safety officials rely on several interventions to control or delay an epidemic outbreak. Some of the most common interventions include:

Transmission-Reducing Interventions

Face masks, gloves, and hand washing reduce the transmission rate of airborne or contact-based pathogens. Similarly, condoms reduce the transmission rate of sexually transmitted pathogens.

Contact-Reducing Interventions

For diseases with severe health consequences officials can quarantine patients, close schools and limit access to frequently visited public spaces, like movie theaters and malls. These make the network sparser by reducing the number of contacts between individuals, hence decreasing the transmission rate.

Vaccinations

Vaccinations permanently remove the vaccinated nodes from the network, as they cannot be infected nor can they […]

[…] τ>0, i.e. when the number of infected nodes grows exponentially with time. This yields the condition for a global outbreak as

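$$\lambda = \frac{\beta}{\mu} > \frac{\langle k \rangle}{\langle k^2 \rangle - \langle k \rangle}, \qquad (10.46)$$

allowing us to write the epidemic threshold for the SIR model as (Table 10.3)

$$\lambda_c = \frac{1}{\frac{\langle k^2 \rangle}{\langle k \rangle} - 1}. \qquad (10.47)$$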
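As a rough numerical illustration of (10.46) and (10.47), the sketch below estimates ⟨k⟩ and ⟨k²⟩ from a list of node degrees and evaluates the SIR threshold. The function names and the two example degree sequences are illustrative assumptions, not taken from the text, and the code assumes ⟨k⟩>0.

```python
from typing import Sequence, Tuple


def degree_moments(degrees: Sequence[int]) -> Tuple[float, float]:
    """Return (<k>, <k^2>) estimated from a list of node degrees."""
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    return k1, k2


def sir_threshold(degrees: Sequence[int]) -> float:
    """Epidemic threshold of Eq. (10.47): 1 / (<k^2>/<k> - 1)."""
    k1, k2 = degree_moments(degrees)
    excess = k2 / k1 - 1.0
    # If <k^2> = <k> (every node has degree 1) the threshold is infinite.
    return float("inf") if excess <= 0 else 1.0 / excess


def sir_outbreak_possible(beta: float, mu: float, degrees: Sequence[int]) -> bool:
    """Condition (10.46): lambda = beta/mu must exceed the threshold."""
    return beta / mu > sir_threshold(degrees)


if __name__ == "__main__":
    homogeneous = [4] * 100              # every node has degree 4
    with_hubs = [3] * 90 + [30] * 10     # a few hubs inflate <k^2>
    for name, degs in [("homogeneous", homogeneous), ("with hubs", with_hubs)]:
        print(name, "lambda_c =", round(sir_threshold(degs), 4))
```

The second sequence has a much larger ⟨k²⟩ than the first, so its threshold is considerably lower even though the two average degrees are of similar size.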

SIS MODEL

In the SIS model the density of infected nodes is given by (10.18),

$$\frac{di_k}{dt} = \beta (1 - i_k) k \Theta - \mu i_k. \qquad (10.48)$$

There is a small but important difference in the density function of the SIS model. For the SI and the SIR models, if a node is infected, then at least one of its neighbors must also be infected or recovered, hence at most (k–1) of its neighbors are susceptible, the origin of the (–1) term in the parenthesis of (10.34). However, in the SIS model the previously infected neighbor can become susceptible again, therefore all k links of a node can be available to spread the disease. Hence we modify the definition (10.34) to obtain

$$\Theta_k = \frac{\sum_{k'} k' p_{k'} i_{k'}}{\langle k \rangle} = \Theta. \qquad (10.49)$$

Again keeping only the first order terms we obtain

$$\frac{di_k}{dt} = \beta k \Theta - \mu i_k. \qquad (10.50)$$

Multiplying the equation with kpk/〈k〉 and summing over k, so that the left-hand side becomes the time derivative of Θ as defined in (10.49), we have

$$\frac{d\Theta}{dt} = \left( \beta \frac{\langle k^2 \rangle}{\langle k \rangle} - \mu \right) \Theta. \qquad (10.51)$$

This again has the solution

$$\Theta(t) = C e^{t/\tau}, \qquad (10.52)$$

where the characteristic time of the SIS model is

$$\tau = \frac{\langle k \rangle}{\beta \langle k^2 \rangle - \mu \langle k \rangle}. \qquad (10.53)$$

A global outbreak is possible if τ>0, which yields the condition for a global outbreak as

$$\lambda = \frac{\beta}{\mu} > \frac{\langle k \rangle}{\langle k^2 \rangle}, \qquad (10.54)$$

and the epidemic threshold for the SIS model as (Table 10.3)

$$\lambda_c = \frac{\langle k \rangle}{\langle k^2 \rangle}. \qquad (10.55)$$
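The SIS results (10.53) and (10.55) can be checked against a direct numerical integration of the linearized equation (10.50). The sketch below is a minimal illustration under stated assumptions: the function names, the forward Euler scheme, the step size and the example degree sequence are all choices made here, not part of the derivation. It integrates (10.50) for every degree class and compares the measured exponential growth rate of Θ with 1/τ.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def degree_moments(degrees: Sequence[int]) -> Tuple[float, float]:
    """Return (<k>, <k^2>) estimated from a list of node degrees."""
    n = len(degrees)
    return sum(degrees) / n, sum(k * k for k in degrees) / n


def sis_threshold(degrees: Sequence[int]) -> float:
    """Epidemic threshold of Eq. (10.55): <k> / <k^2>."""
    k1, k2 = degree_moments(degrees)
    return k1 / k2


def sis_characteristic_time(beta: float, mu: float, degrees: Sequence[int]) -> float:
    """Characteristic time of Eq. (10.53); negative below the threshold."""
    k1, k2 = degree_moments(degrees)
    return k1 / (beta * k2 - mu * k1)


def integrate_linearized_sis(beta: float, mu: float, degrees: Sequence[int],
                             i0: float = 1e-6, dt: float = 1e-3,
                             t_end: float = 1.0) -> Tuple[float, float]:
    """Forward-Euler integration of Eq. (10.50); returns (Theta(0), Theta(t_end))."""
    n = len(degrees)
    pk = {k: c / n for k, c in Counter(degrees).items()}
    k1 = sum(k * p for k, p in pk.items())
    i_k = {k: i0 for k in pk}          # same small seed in every degree class

    def theta() -> float:
        # Eq. (10.49): Theta = sum_k k p_k i_k / <k>
        return sum(k * pk[k] * i_k[k] for k in pk) / k1

    theta_start = theta()
    for _ in range(int(round(t_end / dt))):
        th = theta()
        for k in i_k:
            i_k[k] += dt * (beta * k * th - mu * i_k[k])
    return theta_start, theta()


if __name__ == "__main__":
    degs = [3] * 90 + [30] * 10        # illustrative degree sequence
    beta, mu = 0.2, 1.0
    tau = sis_characteristic_time(beta, mu, degs)
    th0, th1 = integrate_linearized_sis(beta, mu, degs, t_end=1.0)
    print("lambda_c       =", round(sis_threshold(degs), 4))
    print("1/tau (10.53)  =", round(1.0 / tau, 4))
    print("measured rate  =", round(math.log(th1 / th0), 4))
```

The small difference between the two rates comes from the finite Euler step. At exactly λ=λc the denominator of (10.53) is zero, so the helper is meant to be used away from the threshold.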

SECTION 10.12

ADVANCED TOPICS 10.C TARGETED IMMUNIZATION

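In this section we derive the epidemic threshold for the SIS and SIR models on scale-free networks under hub immunization. We start with an uncorrelated network with power law degree distribution $p_k = c \cdot k^{-\gamma}$, where $c \approx (\gamma-1)/k_{\min}^{-\gamma+1}$ and $k \geq k_{\min}$. In SECTION 10.16 we obtained for the critical spreading rate,

$$\lambda_c = \frac{\langle k \rangle}{\langle k^2 \rangle} = \frac{1}{\kappa} \qquad \text{(SIS model)}$$

and

$$\lambda_c = \frac{1}{\frac{\langle k^2 \rangle}{\langle k \rangle} - 1} = \frac{1}{\kappa - 1} \qquad \text{(SIR model)}.$$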

Under hub immunization we immunize all nodes whose degree is larger than k0. From the perspective of the epidemic this is equivalent to removing the high degree nodes from the network. Therefore, to calculate the new critical spreading rate, we need to determine the average degree 〈k'〉 and the second moment 〈k'²〉 after the hubs have been removed. This problem was addressed in ADVANCED TOPICS 8.F, where we studied the robustness of a network under attack. We have seen that hub removal has two effects:

1) The maximum degree of the network changes to k0.

2) The links connected to the removed hubs are also removed, as if we randomly remove an

$$f = \left( \frac{k_0}{k_{\min}} \right)^{-\gamma+2} \qquad (10.56)$$

fraction of links.

The degree distribution of the resulting network is

$$p'_{k'} = \sum_{k=k_{\min}}^{k_0} \binom{k}{k'} f^{k-k'} (1-f)^{k'} p_k.$$

According to (8.39) and (8.40) this yields

$$\langle k' \rangle = (1-f) \langle k \rangle,$$

$$\langle k'^2 \rangle = (1-f)^2 \langle k^2 \rangle + f(1-f) \langle k \rangle,$$

where 〈k〉 is the average and 〈k²〉 is the second moment of the degree distribution before the link removal, but with maximum degree k0. For the SIS model this means

$$\lambda_c' = \frac{(1-f) \langle k \rangle}{(1-f)^2 \langle k^2 \rangle + f(1-f) \langle k \rangle} = \frac{1}{(1-f)\kappa + f}, \qquad (10.57)$$

where, according to equation (8.47), for 2<γ<3

$$\kappa = \frac{\gamma-2}{3-\gamma} k_0^{3-\gamma} k_{\min}^{\gamma-2}. \qquad (10.58)$$

Combining (10.56), (10.57) and (10.58) we obtain

$$\lambda_c' = \left[ \frac{\gamma-2}{3-\gamma} k_0^{3-\gamma} k_{\min}^{\gamma-2} - \frac{\gamma-2}{3-\gamma} k_0^{5-2\gamma} k_{\min}^{2\gamma-4} + k_0^{2-\gamma} k_{\min}^{\gamma-2} \right]^{-1}. \qquad (10.59)$$

For the SIR model a similar calculation yields

$$\lambda_c' = \left[ \frac{\gamma-2}{3-\gamma} k_0^{3-\gamma} k_{\min}^{\gamma-2} - \frac{\gamma-2}{3-\gamma} k_0^{5-2\gamma} k_{\min}^{2\gamma-4} + k_0^{2-\gamma} k_{\min}^{\gamma-2} - 1 \right]^{-1}. \qquad (10.60)$$

For both the SIR and SIS models if k0≫kmin we have

$$\lambda_c' \approx \frac{3-\gamma}{\gamma-2} k_0^{\gamma-3} k_{\min}^{2-\gamma}. \qquad (10.61)$$
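The formulas (10.56) to (10.61) are easy to evaluate numerically, which helps to see how the threshold responds to the immunization cutoff k0. The sketch below is an illustration only: the function names and parameter values are assumptions made here, and the expressions are valid only in the range 2<γ<3 used in the derivation.

```python
def removed_link_fraction(k0: float, k_min: float, gamma: float) -> float:
    """Eq. (10.56): fraction f of links lost when all nodes with k > k0 are removed."""
    return (k0 / k_min) ** (2.0 - gamma)


def kappa_with_cutoff(k0: float, k_min: float, gamma: float) -> float:
    """Eq. (10.58): <k^2>/<k> of the distribution truncated at k0."""
    if not 2.0 < gamma < 3.0:
        raise ValueError("Eq. (10.58) assumes 2 < gamma < 3")
    return (gamma - 2.0) / (3.0 - gamma) * k0 ** (3.0 - gamma) * k_min ** (gamma - 2.0)


def threshold_after_hub_immunization(k0: float, k_min: float, gamma: float,
                                     model: str = "SIS") -> float:
    """Eqs. (10.57)/(10.59) for SIS and (10.60) for SIR."""
    f = removed_link_fraction(k0, k_min, gamma)
    kappa_new = (1.0 - f) * kappa_with_cutoff(k0, k_min, gamma) + f
    if model == "SIS":
        return 1.0 / kappa_new
    if model == "SIR":
        # No outbreak is possible if the bracket in (10.60) is not positive.
        return float("inf") if kappa_new <= 1.0 else 1.0 / (kappa_new - 1.0)
    raise ValueError("model must be 'SIS' or 'SIR'")


def threshold_large_cutoff(k0: float, k_min: float, gamma: float) -> float:
    """Eq. (10.61): approximation for k0 >> k_min."""
    return (3.0 - gamma) / (gamma - 2.0) * k0 ** (gamma - 3.0) * k_min ** (2.0 - gamma)


if __name__ == "__main__":
    gamma, k_min = 2.5, 2.0            # illustrative values
    for k0 in (1000.0, 100.0, 10.0):
        exact = threshold_after_hub_immunization(k0, k_min, gamma, "SIS")
        approx = threshold_large_cutoff(k0, k_min, gamma)
        print(f"k0={k0:6.0f}  lambda_c' (10.59) = {exact:.4f}  (10.61) = {approx:.4f}")
```

Lowering k0, i.e. immunizing more of the hubs, raises the threshold, and the approximation (10.61) drifts away from (10.59) as k0 approaches kmin.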

SECTION 10.13

ADVANCED TOPICS 10.D THE SIR MODEL AND BOND PERCOLATION

The SIR model is a dynamical model that captures the time dependent spread of an infection in a network. Yet, it can be mapped into a static bond percolation problem [103-106]. This mapping offers analytical tools that help us predict the model’s behavior.

Figure 10.35

Mapping Epidemics into Percolation

Consider the contact network on which the epidemic spreads. To map the spreading process into percolation, we leave in place each link with probability $p_b = 1 - e^{-\beta/\mu}$, a probability determined by the biological characteristics of the pathogen. Therefore links are removed with probability $e^{-\beta/\mu}$. The cluster size distribution of the remaining network can be mapped exactly into the outbreak size. For large β/µ we will likely have a giant component, indicating that we could face a global outbreak. A small β/µ corresponds to a virus that has difficulty spreading and we end up with numerous small clusters, indicating that the pathogen will likely die out.

Consider an epidemic process on a network, so that each infected node transmits a pathogen to each of its neighbors with rate β, and recovers after a recovery time τ=1/µ. We view the infection as a Poisson process, consisting of a series of random contacts with average interevent time 1/β. Therefore the probability that an infected node has not yet transmitted the pathogen to a given susceptible neighbor decreases exponentially with the elapsed time t, as $e^{-\beta t}$. The infected node stays infected until it recovers at time τ=1/µ. Therefore the overall probability that the pathogen is passed on is $1 - e^{-\beta\tau}$. This process is equivalent to bond percolation on the same network, where each directed link is occupied with probability $p_b = 1 - e^{-\beta\tau}$ (Figure 10.35). If β and τ are the same for each node, the network can be considered undirected.

Although this mapping loses the temporal dynamics of the epidemic process, it has several advantages:

• The total fraction of infected nodes in the endemic state maps into the size of the giant component of the percolation problem.

• The probability that a pathogen dies out before reaching the endemic state equals the fraction of the nodes in a randomly selected finite component in the percolation problem.

• We can determine the epidemic threshold by exploiting the known properties of bond percolation.

Consider the average number of links outgoing from a node that can be reached by a link. This allows us to retrace the course of the epidemic: If an infected individual infects on average at least one other individual, then the epidemic can reach an endemic state. Since a node can be reached by one of its k links, the probability to be reached is kpk/N〈k〉. The probability of each of its k–1 outgoing links infecting its neighbor is pb. Since the network is randomly connected, as long as the epidemic has not spread yet, the average number of neighbors infected by the selected node is

$$\langle R_i \rangle = p_b \sum_k \frac{p_k k (k-1)}{\langle k \rangle}.$$

An endemic state can be reached only if 〈Ri〉>1, obtaining the condition for the epidemic as [107,108]

$$\left( \frac{\langle k^2 \rangle}{\langle k \rangle} - 1 \right) > \frac{1}{p_b}. \qquad (10.62)$$

Equation (10.62) agrees with the result (10.46) derived earlier from the dynamical models: Scale-free networks with γ≤3 have a divergent second moment, hence such networks undergo a percolation transition even at pb→0. That is, a virus can spread on such a network regardless of how small the transmission rate β or the recovery time τ is.
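The mapping described in this section can be sketched in a few lines. The code below is an illustration under stated assumptions: the function names, the union-find bookkeeping and the toy edge list are choices made here, and links are treated as undirected, which corresponds to the case where β and τ are the same for every node. It computes pb, tests the condition (10.62) from a degree sequence, and performs one bond percolation realization, returning the relative size of the largest surviving cluster.

```python
import math
import random
from typing import List, Sequence, Tuple


def occupation_probability(beta: float, mu: float) -> float:
    """p_b = 1 - exp(-beta * tau) with recovery time tau = 1 / mu."""
    return 1.0 - math.exp(-beta / mu)


def epidemic_possible(degrees: Sequence[int], beta: float, mu: float) -> bool:
    """Condition (10.62): (<k^2>/<k> - 1) > 1 / p_b, written without division."""
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    return occupation_probability(beta, mu) * (k2 / k1 - 1.0) > 1.0


def largest_cluster_fraction(n: int, edges: List[Tuple[int, int]],
                             p_b: float, rng: random.Random) -> float:
    """Keep each link with probability p_b; return largest cluster size / n."""
    parent = list(range(n))
    size = [1] * n

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if rng.random() < p_b:              # the link survives (is "occupied")
            ru, rv = find(u), find(v)
            if ru != rv:
                if size[ru] < size[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru
                size[ru] += size[rv]
    return max(size[find(x)] for x in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(1)                  # fixed seed, for repeatability only
    n = 2000
    # Toy contact network: each node picks 3 random partners (an assumption).
    edges = [(u, rng.randrange(n)) for u in range(n) for _ in range(3)]
    for beta in (0.05, 0.2, 1.0):
        p_b = occupation_probability(beta, mu=1.0)
        frac = largest_cluster_fraction(n, edges, p_b, rng)
        print(f"beta/mu={beta:4.2f}  p_b={p_b:.3f}  largest cluster={frac:.3f}")
```

A single realization gives only one sample of the cluster structure; averaging over many realizations would be needed to estimate outbreak sizes reliably.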

SECTION 10.14

BIBLIOGRAPHY

[1] D. Normile. The Metropole, Superspreaders and Other Mysteries. Science, 339:1272-1273, 2013.

[2] J.O. Lloyd-Smith, S.J. Schreiber, P.E. Kopp, and W.M. Getz. Superspreading and the effect of individual variation on disease emergence. Nature, 438:355-359, 2005.

[3] M. Hypponen. Malware Goes Mobile. Scientific American, 295:70, 2006.

[4] P. Wang, M. Gonzalez, C. A. Hidalgo, and A.-L. Barabási. Understanding the spreading patterns of mobile phone viruses. Science, 324:1071-1076, 2009.

[5] E.M. Rogers. Diffusion of Innovations. Free Press, 2003.

[6] T.W. Valente. Network models of the diffusion of innovations. Hampton Press, Cresskill, NJ, 1995.

[7] History and Epidemiology of Global Smallpox Eradication. From the training course titled "Smallpox: Disease, Prevention, and Intervention". The CDC and the World Health Organization. Slides 16-17.

[8] R.M. Anderson and R.M. May. Infectious Diseases of Humans: Dynamics and Control. Oxford University Press, Oxford, 1992.

[9] R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters, 86:3200–3203, 2001.

[10] R. Pastor-Satorras and A. Vespignani. Epidemic dynamics and endemic states in complex networks. Physical Review E, 63:066117, 2001.

[11] Y. Wang, D. Chakrabarti, C. Wang, and C. Faloutsos. Epidemic spreading in real networks: an eigenvalue viewpoint. Proceedings of 22nd International Symposium on Reliable Distributed Systems, pg. 25-34, 2003.

[12] R. Durrett. Some features of the spread of epidemics and information on a random graph. PNAS, 107:4491-4498, 2010.

[13] S. Chatterjee and R. Durrett. Contact processes on random graphs with power law degree distributions have critical value 0. Ann. Probab., 37:2332-2356, 2009.

[14] C. Castellano and R. Pastor-Satorras. Thresholds for epidemic spreading in networks. Physical Review Letters, 105:218701, 2010.

[15] B. Lewin (ed.). Sex i Sverige. Om sexuallivet i Sverige 1996 [Sex in Sweden. On the Sexual Life in Sweden 1996]. National Institute of Public Health, Stockholm, 1998.

[16] F. Liljeros, C. R. Edling, L. A. N. Amaral, H. E. Stanley, and Y. Åberg. The web of human sexual contacts. Nature, 411:907-8, 2001.

[17] A. Schneeberger, C. H. Mercer, S. A. Gregson, N. M. Ferguson, C. A. Nyamukapa, R. M. Anderson, A. M. Johnson, and G. P. Garnett. Scale-free networks and sexually transmitted diseases: a description of observed patterns of sexual contacts in Britain and Zimbabwe. Sexually Transmitted Diseases, 31:380-387, 2004.

[18] W. Chamberlain. A View from Above. Villard Books, New York, 1991.

[19] R. Shilts. And the Band Played On. St. Martin’s Press, New York, 2000.

[20] P. S. Bearman, J. Moody, and K. Stovel. Chains of affection: the structure of adolescent romantic and sexual networks. Am J Sociol., 110:44-91, 2004.

[21] M. C. González, C. A. Hidalgo, and A.-L. Barabási. Understanding individual human mobility patterns. Nature, 453:779-782, 2008.

[22] C. Song, Z. Qu, N. Blumm, and A.-L. Barabási. Limits of Predictability in Human Mobility. Science, 327:1018-1021, 2010.

[23] F. Simini, M. González, A. Maritan, and A.-L. Barabási. A universal model for mobility and migration patterns. Nature, 484:96-100, 2012.

[24] D. Brockmann, L. Hufnagel, and T. Geisel. The scaling laws of human travel. Nature, 439:462–465, 2006.

[25] V. Colizza, A. Barrat, M. Barthelemy, and A. Vespignani. The role of the airline transportation network in the prediction and predictability of global epidemics. PNAS, 103:2015, 2006.

[26] L. Hufnagel, D. Brockmann, and T. Geisel. Forecast and control of epidemics in a globalized world. PNAS, 101:15124, 2004.

[27] R. Guimerà, S. Mossa, A. Turtschi, and L. A. N. Amaral. The worldwide air transportation network: Anomalous centrality, community structure, and cities' global roles. PNAS, 102:7794, 2005.

[28] C. Cattuto, et al. Dynamics of Person-to-Person Interactions from Distributed RFID Sensor Networks. PLoS ONE, 5:e11596, 2010.

[29] L. Isella, C. Cattuto, W. Van den Broeck, J. Stehle, A. Barrat, and J.-F. Pinton. What’s in a crowd? Analysis of face-to-face behavioral networks. Journal of Theoretical Biology, 271:166-180, 2011.

[30] K. Zhao, J. Stehle, G. Bianconi, and A. Barrat. Social network dynamics of face-to-face interactions. Physical Review E, 83:056109, 2011.

[31] J. Stehlé, N. Voirin, A. Barrat, C. Cattuto, L. Isella, J-F. Pinton, M. Quaggiotto, W. Van den Broeck, C. Régis, B. Lina, and P. Vanhems. High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE, 6:e23176, 2011.

[32] B.N. Waber, D. Olguin, T. Kim, and A. Pentland. Understanding Organizational Behavior with Wearable Sensing Technology. Academy of Management Annual Meeting. Anaheim, CA. August, 2008.

[33] L. Wu, B.N. Waber, S. Aral, E. Brynjolfsson, and A. Pentland. Mining Face-to-Face Interaction Networks using Sociometric Badges: Predicting Productivity in an IT Configuration Task. In Proceedings of the International Conference on Information Systems. Paris, France. December 14-17 2008.

[34] M. Salathé, M. Kazandjieva, J.W. Lee, P. Levis, M.W. Feldman, and J.H. Jones. A high-resolution human contact network for infectious disease transmission. PNAS, 107:22020–22025, 2010.

[35] A.-L. Barabási. The origin of bursts and heavy tails in human dynamics. Nature, 435:207-11, 2005.

[36] V. Sekara and S. Lehmann. Application of network properties and signal strength to identify face-to-face links in an electronic dataset. Proceedings of CoRR, 2014.

[37] S. Eubank, H. Guclu, V.S.A. Kumar, M.V. Marathe, A. Srinivasan, Z. Toroczkai, and N. Wang. Modelling disease outbreaks in realistic urban social networks. Nature, 429:180-184, 2004.

[38] H. Ebel, L-I. Mielsch, and S. Bornholdt. Scale-free topology of e-mail networks. Physical Review E, 66:035103, 2002.

[39] M. Morris and M. Kretzschmar. Concurrent partnerships and transmission dynamics in networks. Social Networks, 17:299-318, 1995.

[40] N. Masuda and P. Holme. Predicting and controlling infectious diseases epidemics using temporal networks. F1000 Prime Rep., 5:6, 2013.

[41] P. Holme and J. Saramäki. Temporal networks. Physics Reports, 519:97-125, 2012.

[42] M. Karsai, M. Kivelä, R. K. Pan, K. Kaski, J. Kertész, A.-L. Barabási, and J. Saramäki. Small but slow world: how network topology and burstiness slow down spreading. Physical Review E, 83:025102(R), 2011.

[43] A. Vazquez, B. Rácz, A. Lukács, and A.-L. Barabási. Impact of non-Poissonian activity patterns on spreading processes. Physical Review Letters, 98:158702, 2007.

[44] A. Vázquez, J.G. Oliveira, Z. Dezsö, K.-I. Goh, I. Kondor, and A.-L. Barabási. Modeling bursts and heavy tails in human dynamics. Physical Review E, 73:036127, 2006.

[45] A.V. Goltsev, S.N. Dorogovtsev, and J.F.F. Mendes. Percolation on correlated networks. Physical Review E, 78:051105, 2008.

[46] P. Van Mieghem, H. Wang, X. Ge, S. Tang, and F. A. Kuipers. Influence of assortativity and degree-preserving rewiring on the spectra of networks. The European Physical Journal B, 76:643, 2010.

[47] M. Boguná, R. Pastor-Satorras, and A. Vespignani. Absence of epidemic threshold in scale-free networks with degree correlations. Physical Review Letters, 90:028701, 2003.

[48] Y. Moreno, J. B. Gómez, and A.F. Pacheco. Epidemic incidence in correlated complex networks. Physical Review E, 68:035103, 2003.

[49] J.-P. Onnela, J. Saramaki, J. Hyvonen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási. Structure and tie strengths in mobile communication networks. PNAS, 104:7332, 2007.

[50] M. S. Granovetter. The strength of weak ties. American Journal of Sociology, 78:1360–1379, 1973.

[51] A. Galstyan and P. Cohen. Cascading dynamics in modular networks. Physical Review E, 75:036109, 2007.

[52] J. P. Gleeson. Cascades on correlated and modular random networks. Physical Review E, 77:046117, 2008.

[53] P. A. Grabowicz, J. J. Ramasco, E. Moro, J. M. Pujol, and V. M. Eguiluz. Social features of online networks: The strength of intermediary ties in online social media. PLOS ONE, 7:e29358, 2012.

[54] L. Weng, F. Menczer, and Y.-Y. Ahn. Virality Prediction and Community Structure in Social Networks. Scientific Reports, 3:2522, 2013.

[55] S. Aral and D. Walker. Creating social contagion through viral product design: A randomized trial of peer influence in networks. Management Science, 57:1623–1639, 2011.

[56] J. Leskovec, L. Adamic, and B. Huberman. The dynamics of viral marketing. ACM Trans. Web, 1, 2007.

[57] L. Weng, A. Flammini, A. Vespignani, and F. Menczer. Competition among memes in a world with limited attention. Scientific Reports, 2:335, 2012.

[58] J. Berger and K. L. Milkman. What makes online content viral? Journal of Marketing Research, 49:192–205, 2009.

[59] S. Jamali and H. Rangwala. Digging digg: Comment mining, popularity prediction and social network analysis. Proc. Intl. Conf. on Web Information Systems and Mining (WISM), 32–38, 2009.

[60] G. Szabó and B. A. Huberman. Predicting the popularity of online content. Communications of the ACM, 53:80–88, 2010.

[61] B. Suh, L. Hong, P. Pirolli, and E. H. Chi. Want to be retweeted? Large scale analytics on factors impacting retweet in twitter network. Proc. IEEE Intl. Conf. on Social Computing, 177–184, 2010.

[62] D. Centola. The spread of behavior in an online social network experiment. Science, 329:1194–1197, 2010.

[63] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. Proc. ACM SIGKDD Intl. Conf. on Knowledge discovery and data mining, 44–54, 2006.

[64] M. Granovetter. Threshold Models of Collective Behavior. American Journal of Sociology, 83:1420–1443, 1978.

[65] N.A. Christakis and J.H. Fowler. The Spread of Obesity in a Large Social Network Over 32 Years. New England Journal of Medicine, 357:370-379, 2007.

[66] N. A. Christakis and J. H. Fowler. The collective dynamics of smoking in a large social network. New England Journal of Medicine, 358:2249-2258, 2008.

[67] R. Pastor-Satorras and A. Vespignani. Evolution and structure of the Internet: A statistical physics approach. Cambridge University Press, Cambridge, 2007.

[68] Z. Dezső and A-L. Barabási. Halting viruses in scale-free networks. Physical Review E, 65:055103, 2002.

[69] R. Pastor-Satorras and A. Vespignani. Immunization of complex networks. Physical Review E, 65:036104, 2002.

[70] R. Cohen, S. Havlin, and D. ben-Avraham. Efficient Immunization Strategies for Computer Networks and Populations. Physical Review Letters, 91:247901, 2003.

[71] F. Fenner et al. Smallpox and its Eradication. WHO, Geneva, 1988. http://www.who.int/features/2010/smallpox/en/

[72] L. A. Rvachev and I. M. Longini Jr. A mathematical model for the global spread of influenza. Mathematical Biosciences, 75:3-22, 1985.

[73] A. Flahault, E. Vergu, L. Coudeville, and R. Grais. Strategies for containing a global influenza pandemic. Vaccine, 24:6751-6755, 2006.

[74] I. M. Longini Jr, M. E. Halloran, A. Nizam, and Y. Yang. Containing pandemic influenza with antiviral agents. Am. J. Epidemiol., 159:623-633, 2004.

[75] I.M. Longini Jr, A. Nizam, S. Xu, K. Ungchusak, W. Hanshaoworakul, D. Cummings, and M. Halloran. Containing pandemic influenza at the source. Science, 309:1083-1087, 2005.

[76] V. Colizza, A. Barrat, M. Barthélemy, A.-J. Valleron, and A. Vespignani. Modeling the world-wide spread of pandemic influenza: baseline case and containment interventions. PLoS Med, 4:e13, 2007.

[77] T. D. Hollingsworth, N.M. Ferguson, and R.M. Anderson. Will travel restrictions control the International spread of pandemic influenza? Nature Med., 12:497-499, 2006.

[78] C.T. Bauch, J.O. Lloyd-Smith, M.P. Coffee, and A.P. Galvani. Dynamically modeling SARS and other newly emerging respiratory illnesses: past, present, and future. Epidemiology, 16:791-801, 2005.

[79] I. M. Hall, R. Gani, H.E. Hughes, and S. Leach. Real-time epidemic forecasting for pandemic influenza. Epidemiol Infect., 135:372-385, 2007.

[80] M. Tizzoni, P. Bajardi, C. Poletto, J. J. Ramasco, D. Balcan, B. Gonçalves, N. Perra, V. Colizza, and A. Vespignani. Real-time numerical forecast of global epidemic spreading: case study of 2009 A/H1N1pdm. BMC Medicine, 10:165, 2012.

[81] D. Balcan, H. Hu, B. Gonçalves, P. Bajardi, C. Poletto, J. J. Ramasco, D. Paolotti, N. Perra, M. Tizzoni, W. Van den Broeck, V. Colizza, and A. Vespignani. Seasonal transmission potential and activity peaks of the new influenza A/H1N1: a Monte Carlo likelihood analysis based on human mobility. BMC Med., 7:45, 2009.

[82] P. Bajardi, et al. Human Mobility Networks, Travel Restrictions, and the Global Spread of 2009 H1N1 Pandemic. PLoS ONE, 6:e16591, 2011.

[83] P. Bajardi, C. Poletto, D. Balcan, H. Hu, B. Gonçalves, J. J. Ramasco, D. Paolotti, N. Perra, M. Tizzoni, W. Van den Broeck, V. Colizza, and A. Vespignani. Modeling vaccination campaigns and the Fall/Winter 2009 activity of the new A/H1N1 influenza in the Northern Hemisphere. EHT Journal, 2:e11, 2009.

[84] M.E. Halloran, N.M. Ferguson, S. Eubank, I.M. Longini, D.A.T. Cummings, B. Lewis, S. Xu, C. Fraser, A. Vullikanti, T.C. Germann, D. Wagener, R. Beckman, K. Kadau, C. Macken, D.S. Burke, and P. Cooley. Modeling targeted layered containment of an influenza pandemic in the United States. PNAS, 105:4639-44, 2008.

[85] G. M. Leung and A. Nicoll. Reflections on Pandemic (H1N1) 2009 and the international response. PLoS Med, 7:e1000346, 2010.

[86] A.C. Singer, et al. Meeting report: risk assessment of Tamiflu use under pandemic conditions. Environ Health Perspect., 116:1563-1567, 2008.

[87] R. Fisher. The wave of advance of advantageous genes. Ann. Eugen., 7:355–369, 1937.

[88] J. V. Noble. Geographic and temporal development of plagues. Nature, 250:726–729, 1974.

[89] D. Brockmann and D. Helbing. The Hidden Geometry of Complex, Network-Driven Contagion Phenomena. Science, 342:1337-1342, 2014.

[90] J. S. Brownstein, C. J. Wolfe, and K. D. Mandl. Empirical evidence for the effect of airline travel on inter-regional influenza spread in the United States. PLoS Med, 3:e40, 2006.

[91] D. Shah and T. Zaman, in SIGMETRICS’10, Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pp. 203–214, 2010.

[92] A. Y. Lokhov, M. Mezard, H. Ohta, and L. Zdeborová. Inferring the origin of an epidemy with dynamic message-passing algorithm. Phys. Rev. E, 90:012801, 2014.

[93] P. C. Pinto, P. Thiran, and M. Vetterli. Locating the Source of Diffusion in Large-Scale Networks. Physical Review Letters, 109:068702, 2012.

[94] C. H. Comin and L. da Fontoura Costa. Identifying the starting point of a spreading process in complex networks. Phys. Rev. E, 84:056105, 2011.

[95] D. Shah and T. Zaman. Rumors in a Network: Who's the Culprit? IEEE Trans. Inform. Theory, 57:5163, 2011.

[96] K. Zhu and L. Ying. Information source detection in the SIR model: A sample path based approach. Information Theory and Applications Workshop (ITA), 1-9, 2013.

[97] B. A. Prakash, J. Vreeken, and C. Faloutsos. Spotting culprits in epidemics: How many and which ones? ICDM’12; Proceedings of the IEEE International Conference on Data Mining, 11:20, 2012.

[98] V. Fioriti and M. Chinnici. Predicting the sources of an outbreak with a spectral technique. Applied Mathematical Sciences, 8:6775-6782, 2012.

[99] W. Dong, W. Zhang, and C.W. Tan. Rooting out the rumor culprit from suspects. Proceedings of CoRR, 2013.

[100] B. Barzel and A.-L. Barabási. Universality in network dynamics. Nature Physics, 9:673, 2013.

[101] A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, 2012.

[102] S. N. Dorogovtsev, A.V. Goltsev, and J. F. F. Mendes. Critical phenomena in complex networks. Reviews of Modern Physics, 80:1275, 2008.

[103] R. Cohen and S. Havlin. Complex Networks - Structure, Robustness and Function. Cambridge University Press, 2010.

[104] P. Grassberger. On the critical behavior of the general epidemic process and dynamical percolation. Mathematical Biosciences, 63:157, 1983.

[105] M. E. J. Newman. The spread of epidemic disease on networks. Physical Review E, 66:016128, 2002.

[106] C. P. Warren, L. M. Sander, and I. M. Sokolov. Firewalls, disorder, and percolation in networks. Mathematical Biosciences, 180:293, 2002.

[107] R. Cohen, K. Erez, D. ben-Avraham, and S. Havlin. Resilience of the Internet to random breakdown. Physical Review Letters, 85:4626–4628, 2000.

[108] D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Network robustness and fragility: percolation on random graphs. Physical Review Letters, 85:5468–5471, 2000.