The Oxford Handbook of Social Networks 9780190251765, 9780197520628, 019025176X

"Social networks fundamentally shape our lives. Networks channel the ways that information, emotions, and diseases


English Pages 697 Year 2020



Table of contents :
Cover
The Oxford Handbook of Social Networks
Copyright
Table of Contents
Acknowledgments
Editor Biographies
Contributor Biographies
Chapter 1: Introduction
The Handbook as a Map
Network Basics and Theories
Network Methods
Network Dimensions
Network Landscape
Conclusions, Concerns, and Future Directions
Note
References
Part I: Network Basics and Theory
Chapter 2: Network Basics: Points, Lines, and Positions
The Building Blocks of Networks
Two General Approaches to Social Network Analysis
Basic Network Forms
Network Building Blocks: Bridging Levels
Boundary Specification
Connectivity, Cohesion, and Community
Statistical Models of Networks
Collecting Social Network Data
Name Generators
Network Sampling
Ethics and Social Network Analysis
Conclusion
Note
References
Chapter 3: Theories of Social Networks
Networks and Theory
Action Theory and Social Capital
Social Structures and Individual Action
Social Capital
Pragmatism and Interactionism
Relational Sociology
Social Networks and Meaning
Extensions
Conclusion
Notes
References
Chapter 4: Networks and Neo-Structural Sociology
Individual and Collective Capacities
Interdependencies in the Organizational Society: Bureaucracy and Collegiality
Relational Infrastructures
Social Processes as Social Capital of the Collective in the Organizational and Market Society
Neo-Structural Institutionalism
Challenges: Longitudinal and Multilevel Network Structures to Navigate Social Processes
Conclusion
References
Chapter 5: Rethinking Social Networks in the Era of Computational Social Science
Four Conceptualizations of Network Ties for Social Network Theory
Social Ties as Access or Opportunity
Social Ties as (Time-Aggregated) Behavioral Interactions
Social Ties as Interpersonal Sentiments
Social Ties as Socially Constructed Role Relations
Comparing These Four Conceptualizations
Treatment of Ties and Null Ties
Temporality
Dilemmas of Mapping Theories to Data across Discrepant Conceptualizations of Networks
Can We Use Role Relation Data to Investigate Theories of Social Interaction, Access, and Sentiments?
Can We Use Aggregated Social Interaction Data to Investigate Theories of Access and Social Sentiments?
A Revolution in Data Collection: Computational Social Science
Computational Social Science and Role Relations
Computational Social Science and Sentiments
Computational Social Science and Behavioral Interactions
Computational Social Science and Structures of Access or Opportunity
A Revolution in Data Analysis: From Aggregating to Modeling Relational Events
Acknowledgments
Notes
References
Chapter 6: Networks, Status, and Inequality
Terminology and Scope
Networks
Status
Inequality
Ascertaining Status in Networks
Esteem and Choice
Visibility and Prominence
Agonism
Status Production and Maintenance in Networks
The Popularity Tournament
The Facebook Effect
Status Diffusion
Asymmetry and Evolution
Topological Implications of Status
Studying Networks in Unequal Environments
Conclusion
Notes
References
Part II: Network Methods
Chapter 7: Strategies for Collecting Social Network Data: Overview, Assessment, and Ethics
What Is the Goal? Theory’s Role in Gathering Network Data
Design Strategies for Sampling and Measurement
The “Boundary Specification” Problem
Name Generators—Which Relationships?
Name Interpreters—Information about Identified Social Ties
Data Quality and Assessment: Did We Capture What We Intended to Capture?
Tie Reliability and Validity
Implications and Quality Assessment
Strategies for Optimizing Data Fidelity
Cognitive Social Structures
Unique Ethical Considerations of Network Data
Ethics in Data Collection
Ethics in Data Analysis and Presentation of Results
Summary
Notes
References
Chapter 8: Social Network Experiments
What Is an Experiment?
Can Social Networks Be Studied with Experiments?
Experimental Manipulations for the Study of Social Networks
Examples of Network Experiments
Homophily and the Spread of Health Behaviors
Networks and the Matthew Effect
Error and Error Correction Process in Network Diffusion
Network Recall and Social Exclusion
Conclusion
Notes
References
Chapter 9: The Network Scale-Up Method
Introduction
Methodology
The Network Scale-Up Estimator
Estimating Degree
Bayesian Approach
Generalized Network Scale-Up
Survey Design
Defining “Know”
The Scaled-Down Condition
Conclusion
References
Chapter 10: The Continued Relevance of Ego Network Data
Advantages and Disadvantages of Ego Network Data
Advantages
Disadvantages
What Can Be Extracted from Ego Network Data?
Applications of Ego Network Data
Using Ego Network Properties to Predict Individual-Level Outcomes
Using Ego Network Data to Measure Social Boundaries
Using Ego Network Data to Improve RDS Estimation
Using Ego Network Data to Infer Full Network Features
Conclusion: Future Uses of Ego Network Data
Notes
References
Chapter 11: Dyadic, Nodal, and Group-Level Approaches to Study the Antecedents and Consequences of Networks: Which Social Network Models to Use and When?
A Framework of Basic Models for Social Network Analysis at Different Levels
Network Antecedents at a Dyadic Level (Model 1.1)
Network Consequences at a Dyadic Level (Model 1.2)
Network Emergence at the Nodal Level (Model 1.3)
Network Consequences at the Nodal Level (Model 1.4)
Network Emergence at a Group Level (Model 1.5)
Network Consequences at a Group Level (Model 1.6)
Variations and Extensions of the Six Basic Models
Network Mediation Models
Network Moderation Models
Network Coevolution Model
Multiple Groups and Multilevel Models for Dyadic and Nodal-Level Analysis
Generalizability
Group-Level Effects
Cross-Level Interaction
Macro-Micro-Macro Models
Conclusion
Acknowledgments
Notes
References
Chapter 12: An Introduction to Statistical Models for Networks
Some History
Some More Notation
Exponential Family of Random Graphs—p*
Statistical Theory
Parameters
Simulation, Estimation, and Goodness of Fit
Other Types of Networks
Bipartite Networks
Multilevel Networks
Multivariate Networks
Longitudinal Models
Longitudinal Networks: Evolution of Structure or Coevolution of Structure and Attributes
Conclusion
Acknowledgements
Notes
References
Chapter 13: Advances in Exponential Random Graph Models
ERGM and ALAAM
Model Constructs
Multilevel ERGMs and ALAAMs
Modeling Techniques
Empirical Examples
Empirical Example 1: Multiple Project Memberships and Advice Seeking in Organizations
Example 2: Common Resource Management Satisfaction and Information Exchanges between Users
Example 3: How Are Individual Accomplishments Shared across the Team?
Discussion and Future Steps
Notes
References
Chapter 14: Modeling Network Dynamics
Conceptualizing Network Dynamics
Network Change Processes
Nodal Effects
Dyadic Effects
Endogenous Structure
Modeling Network Dynamics
The Relational Event Framework
Stochastic Actor-Oriented Framework
The Exponential Random Graph Framework
Model Selection
Empirical Example of Three Approaches
Model Statistics
Attribute Effects: Age
Dyadic Effects: Proximity
Endogenous Effects: Reciprocity
Endogenous Effects: Triadic Closure
Modeling Strategy and Results
Baseline Models
More Practical Models
Extended Models
Subsequent Steps
Outstanding Issues and Future Directions
Notes
References
Chapter 15: Causal Inference for Social Network Analysis
The Influence Process
Randomized Experiments
Observational Studies
Simulation Example of Identification Using OLS
Informing the Inevitable Debate by Quantifying the Robustness of Inferences
The Selection of Interaction Partners
Estimation of Selection Models
Quantifying the Robustness of Inferences from Selection Models
Discussion
Conclusion
Notes
References
Part III: Network Dimensions
Chapter 16: Case Studies in Network Community Detection
Virality Prediction of Social Memes
Congressional Roll Call
Exploratory Analysis of the C. Elegans Neural Network
Comparing Network Architectures of the Human Brain at Different States
A Probabilistic Network Model for Malaria Parasite Genes
Concluding Comments
Acknowledgments
References
Chapter 17: Three Perspectives on Centrality
The Walk Structure Perspective
The Contribution/Induced Centrality Perspective
The Flow Outcomes Perspective
Discussion
Final Note
Acknowledgments
Notes
References
Chapter 18: Network Visualization
Brief History and Motivations
Basic Network Visualization Strategies: Better Sociograms
Advanced Network Visualization Approaches: Moving beyond Sociograms
Conclusions
A Note on Software
References
Chapter 19: The Spatial Dimensions of Social Networks
Micro-Level Networks of People
Meso-Level Networks of Things
Macro-Level Networks of Places
Networks in Latent Space
Frontiers
Notes
References
Chapter 20: Field Experiments of Preferential Attachment
A Novel Application: http://www.ebay.com
Acknowledgment
Notes
References
Chapter 21: Duality beyond Persons and Groups: Culture and Affiliation
Duality in Past and Present Sociology
Limitations of Breiger’s 1974 Formulation
Scope of Duality Discussion
Dualities in the Analysis of Culture
Dualities of Artists and Art Worlds
Duality of Actors and Cultural Forms
Dualities of Networks and Meaning
From Relationality to “Fusion” of Networks and Culture
Dualities in the Analysis of Structures: Affiliation Networks
Affiliation Networks
Revamping Old Ideas? “New” Science of Networks
Actor-Network Theory and “Heterogeneous Networks”
Recent Developments and Future Directions
Duality and Its Extensions toward Multiple Networks
Cultural Analysis: Duality of Documents and Words Yielding Categories
Notes
References
Chapter 22: Networks of Culture, Networks of Meaning: Two Approaches to Text Networks
What Does Meaning Mean?
Network Text Analysis for Meaning Structure
Constructing Text Networks
Results
Computational Narrative Analysis for Embedded Meaning
Constructing Subject-Action-Object Networks
Results
Conclusion
Acknowledgment
Note
References
Chapter 23: Historical Network Research
Cross-Cutting Ties
Informal Social Ties
Associational and Organizational Networks
Narrative Networks
Cohesion
Brokerage and Centrality
Conclusions
References
Part IV: Network Landscape
Chapter 24: Networks in Archaeology
The Added Value of the “Network”? Dyads and Triads
Encounters with Network Thinking in Archaeology
Spatial Network Analysis and “Theory Models”
From Theory Models to Data Models
Entangled Networks of Humans and Things
Acknowledgments
Notes
References
Chapter 25: Networks, Kin, and Social Support
Size
Density
Betweenness
Transitivity
Reciprocity
Embeddedness
Families as Systems of Exchanges
Defining Family Roles through Configurations of Interactions
Caring Roles
Affectionate Roles
Limited Interaction Roles
Entwined Lives Roles
Friendly Roles
References
Chapter 26: Demography and Networks
Demography: Enumeration, Estimation, and Explanation
Network Approaches and Current Contributions to Demography
Future Directions for Network Approaches to Advance Demographic Research
Note
References
Chapter 27: The Neuroscience of Social Networks
The Neuroscience of Social Networks
Fields Collide: The Social Brain Hypothesis
An Emerging New Field
Why the Brain?
How the Brain Encodes Social Relationships
Differential Neural Responses to Friends and Strangers
The Need to Move beyond “Friend versus Stranger”
The Neural Representation of Social Closeness
The Neural Encoding of Indirect Social Relationships
The Importance of Indirect Social Relationships to Everyday Human Thought and Behavior
The Neural Encoding of Social Network Position Characteristics
Distinct but Analogous Facets of Social Status
How the Brain Shapes and Constrains Social Networks
Does the Processing Capacity of the Human Brain Constrain Social Network Size?
How Social Networks Shape the Brain
Summary
References
Chapter 28: Computational Social Science, Big Data, and Networks
Computational Thinking about Social Processes
Challenges in Modeling Social Data
Machine Learning and Social Sciences
Online Experimentation on Interactions
Online Field Experiments
Challenges
Notes
References
Chapter 29: Networks: An Economic Perspective
Why Should We Study Network Structure?
Externalities: A Unifying Theme
Overview
Network Formation
Behavior and Games on Networks
Strategic Complementarities
Financial Networks
Social Learning
Labor Markets
Development Economics
Exchange Theory, Bargaining, and Trade on Networks
Empirical Analyses of Network Models
Concluding Remarks
Notes
References
Chapter 30: Social Capital and Economic Sociology
Social Capital and the Labor Market
Job-Matching Processes
Job-Matching Outcomes
Social Capital and Workplace Outcomes
Antecedents
Individual Performance and Innovation Outcomes
Trust and Collective Outcomes
Power and Influence
Summary
References
Chapter 31: The International Trade Network
Data and Measurement in ITN Studies
World System Classification in the ITN
Topological Properties of the ITN
Explaining the ITN
Effect of Homophily on the ITN
Effect of Systemic Equivalence on the ITN
Effect of Topological Properties on the ITN
Multivariate Regression Quadratic Assignment Procedure
A Future Direction: Exponential Random Graph Model
References
Chapter 32: Maps of Science, Technology, and Education
Map Design
Map Utility
Exemplary Maps of Science, Technology, and Education
Springer Nature SciGraph
NSF Graph Tool DIA2
NIH CTSA Expertise Explorer
NIH Twitter Activity Explorer
Learning LeX Subway Maps
CyberSeek Career Maps
Discussion and Outlook
Scalable, Multilevel Maps
Acknowledgments
Notes
References
Chapter 33: Criminal Networks
Criminal Networks
Measuring Criminal Groups and Groups of Criminals
Criminal Groups
Co-Offending Groups
Criminal Investigations
Criminal Network Data
Theoretical Foundations in Criminal Networks
Organizations
Diffusion
Group Process
Criminal Justice Applications
Moving Criminal Networks Forward
Notes
References
Index


The Oxford Handbook of Social Networks

Edited by Ryan Light and James Moody

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and certain other countries.

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America.

© Oxford University Press 2020

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data
Names: Light, Ryan, editor. | Moody, James, editor.
Title: The Oxford handbook of social networks / edited by Ryan Light and James Moody.
Description: New York, NY : Oxford University Press, [2020] | Series: Oxford handbooks | Includes bibliographical references and index.
Identifiers: LCCN 2020020553 (print) | LCCN 2020020554 (ebook) | ISBN 9780190251765 (hardback) | ISBN 9780197520628 (epub)
Subjects: LCSH: Social networks. | Online social networks. | Social sciences--Network analysis.
Classification: LCC HM741 .O93 2020 (print) | LCC HM741 (ebook) | DDC 302.3--dc23
LC record available at https://lccn.loc.gov/2020020553
LC ebook record available at https://lccn.loc.gov/2020020554

1 3 5 7 9 8 6 4 2

Printed by Sheridan Books, Inc., United States of America.

Table of Contents

Acknowledgments
Editor Biographies
Contributor Biographies

1. Introduction
Ryan Light and James Moody

Part I: Network Basics and Theory

2. Network Basics: Points, Lines, and Positions
Ryan Light and James Moody

3. Theories of Social Networks
Jan Fuhse

4. Networks and Neo-Structural Sociology
Emmanuel Lazega

5. Rethinking Social Networks in the Era of Computational Social Science
James A. Kitts and Eric Quintane

6. Networks, Status, and Inequality
John Levi Martin and James P. Murphy

Part II: Network Methods

7. Strategies for Collecting Social Network Data: Overview, Assessment, and Ethics
Jimi Adams, Tatiane Santos, and Venice Ng Williams

8. Social Network Experiments
Matthew E. Brashears and Eric Gladstone

9. The Network Scale-Up Method
Tyler H. McCormick

10. The Continued Relevance of Ego Network Data
Jeffrey A. Smith

11. Dyadic, Nodal, and Group-Level Approaches to Study the Antecedents and Consequences of Networks: Which Social Network Models to Use and When?
Filip Agneessens

12. An Introduction to Statistical Models for Networks
Valentina Kuskova and Stanley Wasserman

13. Advances in Exponential Random Graph Models
Dean Lusher, Peng Wang, Julia Brennecke, Julien Brailly, Malick Faye, and Colin Gallagher

14. Modeling Network Dynamics
David R. Schaefer and Christopher Steven Marcum

15. Causal Inference for Social Network Analysis
Kenneth A. Frank and Ran Xu

Part III: Network Dimensions

16. Case Studies in Network Community Detection
Saray Shai, Natalie Stanley, Clara Granell, Dane Taylor, and Peter J. Mucha

17. Three Perspectives on Centrality
Stephen P. Borgatti and Martin G. Everett

18. Network Visualization
James Moody and Ryan Light

19. The Spatial Dimensions of Social Networks
Zachary P. Neal

20. Field Experiments of Preferential Attachment
Arnout van de Rijt and Afife Idil Akin

21. Duality beyond Persons and Groups: Culture and Affiliation
Sophie Mützel and Ronald Breiger

22. Networks of Culture, Networks of Meaning: Two Approaches to Text Networks
Ryan Light and Jeanine Cunningham

23. Historical Network Research
Emily Erikson and Eric Feltham

Part IV: Network Landscape

24. Networks in Archaeology
Carl Knappett

25. Networks, Kin, and Social Support
G. Robin Gauthier

26. Demography and Networks
M. Giovanna Merli, Sara R. Curran, and Claire Le Barbenchon

27. The Neuroscience of Social Networks
Carolyn Parkinson, Thalia Wheatley, and Adam M. Kleinbaum

28. Computational Social Science, Big Data, and Networks
Bruno Abrahao and Paolo Parigi

29. Networks: An Economic Perspective
Matthew O. Jackson, Brian W. Rogers, and Yves Zenou

30. Social Capital and Economic Sociology
Steve McDonald and Richard A. Benton

31. The International Trade Network
Min Zhou

32. Maps of Science, Technology, and Education
Katy Börner

33. Criminal Networks
Chris M. Smith and Andrew V. Papachristos

Index

Acknowledgments

This project took years to finish, as one might expect, and required the steadfast commitment of numerous people to push toward publication. First, the Handbook results from the collective commitment and work of the more than 50 scholars who have contributed their expertise to this volume. The extraordinarily high response rate to our participation request was somewhat surprising, but consistent with our observations about the social networks community’s interest in contributing to new intellectual projects and, perhaps more importantly, in building tools for “new” researchers or for “old” researchers interested in thinking about new things. Second, we are especially thankful for the support of James Cook, our editor at Oxford University Press, who initially saw the value of and remained enthusiastic for this project. Emily Mackenzie and the numerous members of the editorial and production team at Oxford were also instrumental in bringing the Handbook to fruition. Third, we are thankful for the broader community of network scholars who have inspired us through the years and motivated our thinking about networks. jimi adams, in particular, has been a frequent sounding board on all issues related to social networks big and small. Our colleagues at Duke and the University of Oregon have also been key audiences, notably the networks-leaning folks, especially the fabulous PhD students Jeanine Cunningham and Nicholas Theis at the University of Oregon who read drafts of numerous chapters. Fourth, we are grateful for the support of the Duke Network Analysis Center for the opportunity the Center provides for network researchers to exchange ideas. Last, we are thankful to Jill Harrison, Henry Harrison Light, and Lisa Keister.

Editor Biographies

Ryan Light is Associate Professor of Sociology at the University of Oregon and the Digital Scholarship Fellow in the Social Sciences at the University of Oregon Libraries. Professor Light’s research focuses on modeling social complexity and dynamics using network analysis. He takes a computational social scientific approach to develop models of culture and science. His work has appeared in Proceedings of the National Academy of Sciences, Annual Review of Sociology, and Social Forces, among others.

James Moody is the Robert O. Keohane Professor of Sociology at Duke University. He has published extensively in the field of social networks, methods, and social theory with over 70 peer-reviewed papers and extensive applied consultation with industry and the Department of Defense. His work has focused theoretically on the network foundations of social cohesion and diffusion, with a particular emphasis on building tools and methods for understanding dynamic social networks. He has used network models to help understand organizational performance, school racial segregation, adolescent health, disease spread, economic development, and the development of scientific disciplines (among others). Moody’s work is funded by the National Science Foundation, the National Institutes of Health, the James S. McDonnell Foundation, and the Robert Wood Johnson Foundation and has appeared in top social science, health, and medical journals. He is winner of the International Network for Social Network Analysis’s Freeman Award for scholarly contributions to network analysis and in 2014 was named a Thomson Reuters “Highly Cited Researcher,” for authorship of papers in the top 1% of citations within the field. He is founding director of the Duke Network Analysis Center, former editor of the online Journal of Social Structure, and cofounding editor of the American Sociological Association’s Open Access journal Socius.

Contributor Biographies

Bruno Abrahao is an Assistant Professor of Information Systems and Business Analytics, NYU Shanghai, and Global Network Assistant Professor, New York University. His research focuses on theoretical and applied aspects of data science and machine learning to investigate social behavior. Abrahao holds a PhD in Computer Science from Cornell University, was a Postdoctoral Fellow at Stanford University, with affiliations in the Computer Science and Sociology departments, and was a Postdoctoral Researcher at Microsoft Research AI, Redmond. jimi adams is Associate Professor in the Department of Health and Behavioral Sciences at the University of Colorado Denver. His work focuses on examining social networks to understand how infectious diseases and novel ideas spread. This has included modeling HIV/AIDS risk in the United States and Sub-Saharan Africa, and the organizational dynamics of interdisciplinary fields. He is the author of Gathering Social Network Data. Filip Agneessens is Associate Professor of Sociology and Social Research at the University of Trento and an Associate Member of the Department of Sociology/Nuffield College at the University of Oxford. He holds a PhD in Sociology from Ghent University. His research centers on social network analysis with a specific focus on methodology and applications in intraorganizational settings. He has coedited a special issue in Social Networks on “Advances in Two-Mode Social Network Analysis” (with Martin Everett) and on “Negative and Signed Tie Networks” (with Nicholas Harrigan and Giuseppe [Joe] Labianca). Afife Idil Akin holds a PhD in Sociology from Stony Brook University (2018). Her research utilizes a combination of comprehensive datasets of social movement action and field experiments to analyze structural and emergent elements of social movement participation, especially in online petitions. 
Her collaborative experimental work with Arnout van de Rijt examines how initial success may be a defining factor for later success in a variety of social and political situations. Richard  A.  Benton is Assistant Professor of Labor and Employment Relations at the University of Illinois. Benton’s research interests include economic sociology, organization theory, social stratification, and social networks. His primary research stream examines board interlock networks, corporate governance, and elite power. Other research examines the dynamics of contention in shareholder activism and how organizational processes affect social stratification. Benton’s research has been supported by the National Science Foundation and the Russell Sage Foundation and has appeared in the American Journal of Sociology, Organization Science, Social Forces, and Social Networks. Stephen  P.  Borgatti is the Paul Chellgren Chair of Management at the University of Kentucky, where he is also department head. He received his PhD in Social Science from

xiv   Contributor Biographies the University of California, Irvine, and his BA in Anthropology at Cornell. His research interests are in social networks, particularly in flow phenomena and cognition about networks. He is a past president of International Network for Social Network Analysis (INSNA) and winner of the INSNA’s Simmel Award for lifetime achievement. He is coauthor of the UCINET software for social network analysis. Katy Börner is the Victor H. Yngve Distinguished Professor of Engineering and Information Science in the Luddy School of Informatics, Computing, and Engineering, Core Faculty of Cognitive Science, and Founding Director of the Cyberinfrastructure for Network Science Center at Indiana University, Bloomington. She is a curator of the international Places & Spaces: Mapping Science exhibit that features large-format maps and interactive data visualizations. She holds an MS in Electrical Engineering from the University of Technology in Leipzig (1991) and a PhD in Computer Science from the University of Kaiserslautern (1997). Börner is a Fellow of the American Association for the Advancement of Science (AAAS), a Humboldt Research Fellow, and an Association for Computing Machinery (ACM) Fellow. Julien Brailly is Associate Professor at the National Institute of Polytechnic of Toulouse (INPT/ENSAT) since 2017. He completed a PhD at the University Paris-Dauphine and a postdoc at the Swinburne University of Technology. He is a sociologist specialized in social network analysis and economic sociology. His recent works concern digital platforms and collective action, whether it concerns TV trade shows, water management in the SubSaharan community, or agriculture. Matthew  E.  Brashears is Associate Professor of Sociology at the University of South Carolina. 
His work integrates ideas from evolutionary theory, social networks, organizational theory, and neuroscience and has appeared in Nature Scientific Reports, the American Sociological Review, the American Journal of Sociology, Social Networks, and Advances in Group Processes, among others. He has received grants from the National Science Foundation, the Defense Threat Reduction Agency, the Army Research Institute, the Army Research Office, and the Office of Naval Research. He is coeditor for Social Psychology Quarterly and is an officer in the American Sociological Association’s Social Psychology Section. Ronald Breiger is a Regents’ Professor and Professor of Sociology at the University of Arizona. He works in the areas of social networks, mathematical models, and measurement in cultural and institutional analysis. Recent publications include “Capturing Distinctions while Mining Text Data: Toward Low-Tech Formalization for Text Analysis” (with Robin Wagner-Pacifici and John Mohr), Poetics (2018), and “Insurgencies as Networks of Event Orderings” (with Julia Smith), Sociological Theory (2018). He is a recipient of the Simmel Award (International Network for Social Network Analysis, 2005) and the James S. Coleman Distinguished Career Achievement Award (Section on Mathematical Sociology, American Sociological Association, 2018). Julia Brennecke is Reader in Innovation Management at the University of Liverpool Management School, United Kingdom, and an adjunct researcher at the Centre for Transformative Innovation at the Swinburne University of Technology in Melbourne, Australia. Her research focuses on networks within and between organizations, with the

Contributor Biographies   xv aim of creating a better understanding of how and why network ties form and exposing the consequences of network connections for innovation. Her work has been published in journals such as Academy of Management Journal, Research Policy, and Human Resource Management. Jeanine Cunningham is a PhD candidate in Sociology at the University  of Oregon. Her research examines the relationships among meaning-making activities, mobilization, and legitimacy claims. She is a mixed-methods researcher who uses network analysis to explore how powerful groups seek  to shape cultural, environmental, and political landscapes through information dissemination, financial contributions, and the creation of alliances. Sara R. Curran is Professor of International Studies, Professor of Sociology, and Professor of Public Policy and Governance at the University of Washington. She researches development and demographic dynamics, migration and immigrant incorporation, and population dynamics and climate change. Emily Erikson is Associate Professor of Sociology and the School of Management (by courtesy) at Yale University and Director of the Fox International Fellowship Program. She is the author of Between Monopoly and Free Trade: The English East India Company, 1600–1757 and several other works on the role of networks in institutional transformation and historical processes. Martin G. Everett is currently codirector of the Mitchell Centre for Social Network Analysis at the University of Manchester and has over 40 years’ experience in research in social network analysis. He graduated in mathematics from Loughborough University and then went on to Oxford to complete a master’s degree and a doctorate. 
He is a past president of the International Network for Social Network Analysis, winner of the Simmel Award for lifetime achievement in social network analysis, coeditor of the journal Social Networks, coauthor of the social network analysis package UCINET, and a Fellow of the Academy of Social Sciences. Malick Faye is an academic staff member at the Method Center for Economic, Social, and Cultural Sciences at the Zeppelin University (Friedrichshafen) and since 2016 a Research Associate at the Centre de Sociologie des Organisations, Sciences Po, Paris. He obtained a PhD at the Carl von Ossietzky University in Oldenburg on the influence of network structures on the management of water provision in an agro-pastoral community in northwestern Senegal. His research focuses on the self-governance of common pool resources in heterogeneous groups, the dynamics of institutions for collective action, and the structural effects of governance processes. Eric Feltham is a PhD student in the Department of Sociology at Yale University, and he holds an MA in Statistics from Yale. Additionally, he is a Graduate Student Researcher in the Yale Institute for Network Science. Kenneth A. Frank is MSU Foundation Professor of Sociometrics; Professor in Counseling, Educational Psychology, and Special Education; and Adjunct (by courtesy) in Fisheries and Wildlife and Sociology at Michigan State University. He received his PhD in measurement, evaluation, and statistical analysis from the School of Education at the University of Chicago in 1993. His substantive interests include the study of schools as organizations, social

structures of students and teachers and school decision making, and social capital. His substantive areas are linked to several methodological interests: social network analysis, sensitivity analysis and causal inference (http://konfound-it.com), and multilevel models. His recent publications include agent-based models of the social dynamics of the implementation of innovations in organizations and the implications of social networks for educational opportunity. Jan Fuhse is currently a Replacement Professor of Sociology at the University of Passau, Germany. After receiving his PhD in Sociology from Universität Stuttgart (Germany) in 2007, he completed a postdoc at Columbia University in 2007–2008. From 2009 to 2013, he was an Assistant Professor of Political Sociology at the University of Bielefeld, completing his Habilitation in 2011. From 2013 to 2018 he had a research fellowship at Humboldt University of Berlin. Fuhse’s research focuses on communication and meaning in social networks, on social networks in inequality, on interethnic relations, and on constellations in political discourse. Colin Gallagher is a Postdoctoral Research Fellow in the Centre for Transformative Innovation at the Swinburne University of Technology in Victoria, Australia. His research interests include the application of statistical methods for social networks to substantive issues in social psychology, education, organizational culture, and mental health. G. Robin Gauthier is Assistant Professor at the University of Nebraska-Lincoln. Her interests are gender, family, health, and social networks. She has three primary areas of research, examining how peer groups reinforce or challenge established gender norms in a social setting, how patterns of social connections affect risk for health outcomes, and how social network models can uncover how social roles are enacted in everyday life.
Since starting as an assistant professor, her research has appeared in journals including PLoS One, the Journal of Interpersonal Violence, Social Sciences, and the Journal of Ethnicity in Substance Abuse. Eric Gladstone is a UX Researcher at Facebook, and an organizational and social network scientist. His work examines human and software interface interactions, and human behaviors and perceptions in and of social and organizational networks. He lists the following in no certain order as life priorities: research, non-human animal pets, human wife, basketball, swimming, biking, sailing, science fiction. Clara Granell is Visiting Assistant Professor in the Department of Computer Science and Engineering in the Universitat Rovira i Virgili, Tarragona, Spain. She obtained her PhD from Universitat Rovira i Virgili. Her past appointments include postdoctoral training at the Department of Mathematics of the University of North Carolina at Chapel Hill and at the Universitat de Barcelona Institute of Complex Systems. Her work is devoted to complex systems, with a special focus on problems suited to be represented with networks. She has experience working in community detection, epidemic spreading, and multiplex networks as well as applying theoretical methods to real data, such as neuronal networks. Matthew O. Jackson is the William D. Eberle Professor of Economics at Stanford University and an external faculty member of the Santa Fe Institute. He was at Northwestern University and Caltech before joining Stanford and received his BA from Princeton University in 1984 and PhD from Stanford in 1988. Jackson’s research interests include game theory, microeconomic theory, and the study of social and economic networks, on which he has published many articles and the books The Human Network and Social and Economic Networks. He

also teaches an online course on networks and coteaches two others on game theory. Jackson is a Member of the National Academy of Sciences, a Fellow of the American Academy of Arts and Sciences, a Fellow of the Econometric Society, a Game Theory Society Fellow, and an Economic Theory Fellow, and his other honors include the von Neumann Award, a Guggenheim Fellowship, the Social Choice and Welfare Prize, the B.E. Press Arrow Prize for Senior Economists, and teaching awards. James A. Kitts is Professor of Sociology and Founding Director of the Computational Social Science Institute at the University of Massachusetts. He earned his PhD from Cornell University in 2001 and previously held faculty appointments at Columbia University, Dartmouth College, and the University of Washington. Bridging computational social science, sociology, and public health, he has worked on methods to detect social interaction from audio signals using wearable sensors, has analyzed the dynamics of patient transfers across hospitals, and directs an NIH-funded longitudinal study of adolescent friendship networks. His work appears in American Sociological Review, American Journal of Sociology, Social Forces, Social Networks, Demography, and Social Psychology Quarterly. Adam M. Kleinbaum is Associate Professor at the Tuck School of Business at Dartmouth. His research examines the antecedents and evolution of social networks in organizations and has shown how formal and informal structures and processes, prior career history, individual personality, and brain structure and function all contribute to advantageous networks. His work is methodologically diverse, ranging from the analysis of electronic communications to neuroimaging to computational linguistics, but thematically focused on the formation and evolution of social networks. He enjoys commuting to campus on his vintage three-speed bicycle.
Carl Knappett holds the Walter Graham/Homer Thompson Chair in Aegean Prehistory in the Department of Art History at the University of Toronto. He is an archaeologist interested both in the micro-processes of meaning making in material culture and in the macro-scale dynamics of interaction within and between communities. To this end he has sought to develop network approaches that have broad applicability for the study of ancient material culture and society. His publications include Thinking through Material Culture, An Archaeology of Interaction, and Network Analysis in Archaeology. He conducts fieldwork at various Bronze Age sites across the Aegean and directs the new excavations at the Minoan town of Palaikastro in east Crete. Valentina Kuskova is the Head of the International Laboratory for Applied Network Research at the National Research University Higher School of Economics (HSE) in Moscow. She is also the Applied Statistics with Network Analysis Program Academic Supervisor at HSE and Deputy First Vice Rector. She received her PhD in 2010 in Organizational Behavior and Decision Sciences from Indiana University as well as an MS in Statistics. She is a faculty member of the Department of Sociology. Her research interests include social networks, longitudinal analysis, and research design. Emmanuel Lazega is Professor of Sociology at Sciences Po, Paris. He is a senior member of the Institut Universitaire de France and the author of several books, among which The Collegial Phenomenon: The Social Mechanisms of Cooperation among Peers in a Corporate Law Partnership and Bureaucracy, Collegiality and Social Change: Redefining Organizations with Multilevel Relational Infrastructures. His research brings together social theory,

sociology of organizations, economic sociology, and social and organizational network analyses. He received the 2018 Simmel Award of the International Network for Social Network Analysis. Claire Le Barbenchon is a PhD candidate in Public Policy and Sociology at Duke University, pursuing a concurrent master’s in Statistical Science. Her research interests lie at the intersection of social networks, migration and economic sociology. Dean Lusher is Professor of Innovation Studies in the Centre for Transformative Innovation at the Swinburne University of Technology. He is a social network analyst with expertise in the theory and application of exponential random graph models (ERGMs). His research focuses on social and technological innovation, organizational culture, knowledge transfer, and network effectiveness. Dean is a Board Member of the International Network for Social Network Analysis, a founding member of the Australian Network for Social Network Analysis, and an editorial board member of the journal Social Networks. He leads the Swinburne node of MelNet. Christopher Steven Marcum is a Staff Scientist for Data Science Policy at the National Institute of Allergy and Infectious Diseases. His research has two arms: network methods development and understanding how health impacts network processes over the life course. He is the recipient of the 2015 Matilda White Riley Early Stage Investigator Award from the Office of Behavioral and Social Science Research at the National Institutes of Health for his work on intergenerational exchange from a network perspective. John Levi Martin is Florence Borchert Bartling Professor of Sociology at the University of Chicago. He is the author of Social Structures, The Explanation of Social Action, Thinking through Theory, Thinking through Methods, and Thinking through Statistics, as well as articles on methodology, cognition, social networks, and theory. Tyler H.
McCormick is Associate Professor of Statistics and Sociology at the University of Washington, where he is also a core faculty member in the Center for Statistics and the Social Sciences. He is also a Senior Data Science Fellow and colead for Data Science Education & Career Development at the eScience Institute, the University of Washington’s data science center. McCormick’s work develops statistical models that infer dependence structure in scientific settings where data are sparsely observed or subject to error. His recent projects include estimating features of social networks (e.g., the degree of clustering or how central an individual is) using data from standard surveys, inferring a likely cause of death (when deaths happen outside of hospitals) using reports from surviving caretakers, and quantifying and communicating uncertainty in predictive models for global health policymakers. He holds a PhD in Statistics (with distinction) from Columbia University and is the recipient of an NIH Career Development (K01) Award, an Army Research Office Young Investigator Program Award, and a Google Faculty Research Award. Tyler currently serves as Editor for the Journal of Computational and Graphical Statistics (JCGS). Steve McDonald is Professor of Sociology and University Faculty Scholar at North Carolina State University. His primary area of study investigates the economic consequences of race and gender inequality in access to social capital. This has taken the form of research on “nonsearching,” the receipt of unsolicited job leads from interpersonal connections, and

natural mentoring relationships. McDonald edited the Research in the Sociology of Work volume on “Networks, Work, and Inequality.” His research has also appeared in the American Journal of Sociology, Social Forces, Social Problems, and Social Networks. M. Giovanna Merli is Professor of Public Policy, Sociology and Global Health at Duke University, where she also directs the Duke Population Research Institute. Her research straddles demography, social networks, and global health with a significant methodological component in the design of population representative surveys of sexual networks and the evaluation of innovative network-based sampling approaches to recruit samples of hidden populations or rare populations. Peter J. Mucha is a Professor of Mathematics and Applied Physical Sciences at the University of North Carolina at Chapel Hill. He obtained his PhD in Applied and Computational Mathematics from Princeton, was a postdoctoral instructor at MIT, and was previously a faculty member at Georgia Tech. His research spans broad interests in applications of networks and network representations of data. James P. Murphy is a postdoctoral fellow at Northwestern University’s Institute for Policy Research. His research focuses on how the foundations and consequences of interpersonal social networks vary across different institutional contexts. Currently, he is engaged in two major research projects. The first examines how the composition and dynamics of police partner networks affect alleged officer misconduct. His second line of research considers how multiplex networks of friendship and conflict affect preadolescents’ perceptions of their school communities and sense of belonging. Sophie Mützel is Professor of Sociology at the University of Lucerne, Switzerland. She works in the areas of economic and cultural sociology, social network analysis, computational text analysis, and sociological theory.
Her recent publications have looked at practices of data-driven companies, studied the organization of creativity, and examined the emergence of a new market. She is currently principal investigator of the research project “Facing Big Data: Methods and Skills Needed for a 21st Century Sociology” funded by the Swiss National Science Foundation. Zachary P. Neal is Associate Professor at Michigan State University, where he studies urban networks in multiple domains and at multiple scales, including social networks in neighborhoods, transportation networks in regions, and economic networks globally. He also conducts research on methods for generating and analyzing bipartite projections. He is the author of more than 60 peer-reviewed publications and four books, and currently serves as an editor at Journal of Urban Affairs, Evidence and Policy, and Global Networks. Andrew V. Papachristos is Professor of Sociology at Northwestern University and Director of the Northwestern Network and Neighborhood Initiative. Paolo Parigi is a researcher at Facebook. He is interested in trust and in the broader area of how technology is impacting relationships. The key insight of his work is that technology is not only accumulating data about people but is transforming lives. We live in a largely engineered space where interactions are often designed by algorithms. A new space for an applied social science has now emerged as a result of the digital transformation. Paolo’s current position in industry allows him to pursue this more applied side of computational

social science. Prior to his current position, Paolo worked as an assistant professor at Stanford, senior data scientist at Uber and lead trust scientist at Airbnb. Carolyn Parkinson is Assistant Professor in the UCLA Department of Psychology, Director of the Computational Social Neuroscience Lab, and a faculty member of the UCLA Brain Research Institute. Eric Quintane is an Associate Professor of Organization Behavior at ESMT Berlin. He received his PhD from the University of Melbourne in Australia. His research work focuses on examining the dynamics of social networks and how they relate to group and individual outcomes. His research has appeared in journals such as American Journal of Sociology, Journal of Applied Psychology, Organization Science, Organizational Research Methods, Social Networks, and Strategic Entrepreneurship Journal. Brian W. Rogers is Professor of Economics at Washington University in St. Louis, Director of the MISSEL lab at Washington University, and an Associate Editor at Mathematical Social Sciences. His research interests are in microeconomic theory, in particular, the fields of network formation, social learning, behavioral game theory, and decision theory. He is interested in developing and applying statistical game theoretic models, which provide theoretical insights into behavior and are often useful for describing and interpreting experimental data. Current work explores the phenomena of ambiguity aversion and rational inattention through lab experiments. Tatiane Santos is a postdoctoral fellow at the Wharton School at the University of Pennsylvania and adjunct faculty at the Colorado School of Public Health. Her research has focused on evaluating the impact of the Patient Protection and Affordable Care Act provisions on population health outcomes, health care utilization, and costs.
She is primarily interested in public health services and systems research specific to policies and reimbursement reforms that encourage institutions in the health care and governmental public health sectors to align efforts to improve population health. She has evaluated Colorado’s Medicaid reform efforts, as well as Colorado’s state innovation model that seeks to integrate primary care and behavioral health. She is interested in applying organization theory and social network methods to explore the role of community social capital in promoting public health. David R. Schaefer is Professor of Sociology at the University of California, Irvine. His research investigates the mechanisms responsible for network formation and change, with an empirical focus on networks in school and prison settings. In addition, he studies how social networks influence various outcomes related to human development and health-related behavior. He is the recipient of the 2012 Freeman Award for Distinguished Scholarship from the International Network for Social Network Analysis. Saray Shai is Assistant Professor of Computer Science at Wesleyan University. She holds dual BS degrees from the Israel Institute of Technology and a PhD from the University of St. Andrews, United Kingdom. She was a postdoc at the University of North Carolina at Chapel Hill. Her research focuses on developing network-based mathematical and computational tools and applying them to data analysis problems arising in a variety of contexts. Chris M. Smith is Assistant Professor of Sociology at the University of Toronto.

Jeffrey A. Smith is Assistant Professor in the Sociology Department at the University of Nebraska-Lincoln (UNL). He joined the faculty at UNL in 2013, shortly after receiving his PhD from Duke University. His work falls at the intersection of network analysis, traditional statistical methods, and social stratification. He has done methodological work on network sampling and missing data, as well as more substantive work on network processes like homophily and status. His work has been published in the American Sociological Review, Sociological Methodology, Social Networks, Social Science & Medicine, and other venues. Natalie Stanley is a Postdoctoral Fellow at Stanford University. Prior to joining Stanford, she finished her PhD at the University of North Carolina at Chapel Hill. Her research interests are community detection in networks, computational immunology, and bioinformatics for multiomics integration. Dane Taylor is Assistant Professor of Mathematics at the University at Buffalo, SUNY. He has a PhD in Applied Mathematics from the University of Colorado at Boulder and has held postdoc positions at the University of North Carolina at Chapel Hill and the Statistical and Applied Mathematical Sciences Institute. Dr. Taylor is interested in network-based models for complex systems and high-dimensional data, and his work is currently supported by the Simons Foundation. Arnout van de Rijt is currently Professor of Sociology at the European University Institute, Florence, Italy. He received his PhD in 2007 in Sociology from Cornell University and has had faculty appointments at the State University of New York at Stony Brook and Utrecht University, the Netherlands. His research interests include collective action, stratification, social networks, computational social science, mathematical sociology, and the sociology of science.
Peng Wang is a Senior Research Fellow in the Centre for Transformative Innovation, the Swinburne University of Technology, and a leading network methodologist for the development of exponential random graph models and autologistic actor attribute models. He is the designer and programmer for the PNet suite of software packages, which are used around the world for the simulation and estimation of network data. He holds a prestigious Discovery Early Career Research Award (DECRA) from the Australian Research Council. Wang was a founding member of MelNet. Stanley Wasserman is the James H. Rudy Professor of Statistics, Psychology, and Sociology at Indiana University. He is also Research Fellow of the International Laboratory for Applied Network Research at the National Research University Higher School of Economics in Moscow, and has had faculty appointments in Minnesota and Illinois. Professor Wasserman was Founding Chair of the Department of Statistics at Indiana and Founding Editor and Coordinating Editor of the journal Network Science. He is coauthor of Social Network Analysis: Methods and Applications, and is an Honorary Fellow of the American Statistical Association, the International Statistical Institute, and the American Association for the Advancement of Science. He received his PhD in 1977 in Statistics from Harvard University. He is a member of the Department of Psychological and Brain Sciences and the Department of Statistics. Thalia Wheatley is Associate Professor in the Psychological and Brain Sciences department at Dartmouth, Director of the Dartmouth Social Intelligence Laboratory, and Director of

the Social Lab consortium at Dartmouth. Her research examines how minds align to transfer, share, and create information and how this alignment scaffolds social connectedness. Her work employs a multidisciplinary approach that includes neuroimaging, natural language processing, cross-cultural research, and social network analyses. Venice Ng Williams is a Post-doctoral Mixed Methods Researcher at the University of Colorado Prevention Research Center for Family and Child Health. She received her PhD in Health Services Research from the Colorado School of Public Health and is trained in program planning, evaluation, and econometrics. Her research focuses on mixed methods, maternal-child health, organizational collaboration, and translating research into practice within the context of prevention programs. She has previously worked in health promotion, tobacco prevention policies, systems change evaluation, health impact assessments, and hospital community-benefit research. Ran Xu is Assistant Professor in the Department of Allied Health Sciences and an Applied Statistician in the Department of Allied Health Sciences at the University of Connecticut. Previously, he was a postdoctoral research associate in the Grado Department of Industrial and Systems Engineering at Virginia Tech. His research expertise includes advanced quantitative methods, computational social science, and social network analysis, with applications to education, science policy, and health sciences. His research has appeared in journals such as Social Networks, the Journal of Policy Analysis and Management, and System Dynamics Review. Yves Zenou is Professor of Economics at Monash University, Melbourne, Australia, and holds the Richard Snape Chair in Business and Economics. He is Elected Fellow of the Econometric Society and Fellow of the Regional Science Association International.
His research interests include social interactions and network theory, urban economics, segregation and discrimination of ethnic minorities, criminality, and education. Min Zhou is Associate Professor of Sociology at the University of Victoria, Canada. He received his PhD from Harvard University in 2011. His research applies a sociological perspective to global market networks, international organizations, global environmentalism, and East Asia relations, while employing interdisciplinary approaches. He has published articles in various sociological journals including Social Forces, Social Networks, International Sociology, Social Science Research, Sociological Quarterly, Sociological Forum, Sociological Perspectives, the Canadian Review of Sociology, and the Journal of East Asian Studies. His recent research projects have been supported by the Social Sciences and Humanities Research Council (SSHRC) of Canada.

The Oxford Handbook of Social Networks

Chapter 1: Introduction

Ryan Light and James Moody

Social network analysis has grown exponentially since the 1970s, tracing the technological and economic changes that have drawn attention to the connectedness central to modern human life. The reason for this growth, as seen in Figure  1.1, is clear: networks are particularly relevant to the 21st century, with some suggesting that we now live in the “Connected Age.” Of course, humans have always been connected, and despite important changes in how people connect with one another, social networks have always been central to our survival and well-being. While online social networks and other forms of “big data” draw attention to how we relate to one another, these large networks function alongside other types of networks—networks of organizations, small local networks, and close personal networks. Together, the structure of our social and economic lives traces these different ways that we are tied to one another. This book aims to emphasize the diverse strategies required to approach the complex variety of ways in which we are embedded within networks. Social network analysis is a deeply interdisciplinary field, and this interdisciplinarity problematizes the easy separation of theory and methods. The growth of large-scale networks has pushed the boundaries of how we conceptualize our social relationships, raising questions about how we define our social spheres: How do our online networks relate to more specific friendship or familial networks? When do we rely on the distant connections that may be more abundant now than in the past versus our ties that serve as stronger bonds? We draw attention to this complication because network analysis is sometimes identified primarily with data and methods. We live in an exciting moment for social scientists as real-time social and behavioral data proliferate in an unprecedented way, and many scholars have spent a tremendous amount of energy on capturing and analyzing these data. 
These techniques require significant specialization and may push theoretical issues into the background. However, social network analysis has always been a deeply theoretical field working through how social structures limit or enable social action. For most of the history of social networks research, methodological developments have been driven by substantive researchers whose aim has been to operationalize theoretical ideas. For example, the seminal works on block modeling (White, Boorman, & Breiger, 1976; Boorman & White, 1976) were developed to capture theoretical models advanced by structural anthropologists. Similarly, ideas about how to capture centrality and cohesion were explicitly linked to theory (Freeman, 1977; Friedkin, 1991). While the contemporary academic division of labor encourages greater specialization and thus threatens substantive and methodological integration, such integration is still largely a hallmark of the social network analysis field.

This Handbook takes this lesson about the relationship between theory and method as a core motivation. We offer significant space to both theory and methods but also note that these pursuits are not easily separated. Many scholars who write primarily on methods implicitly or explicitly engage theory and vice versa. Scholars working in specific fields, such as economics or crime and networks, often bring this relationship to light by connecting the theoretical and methodological advancements. For example, dynamic approaches to understanding social structure require sophisticated methods but also sophisticated theory-building around why structures might change.

In this introduction, while previewing the content of this volume, we also illustrate the importance of theory and method within the broad interdisciplinary field of social network analysis. Data can be an important link in interdisciplinary projects, whether drawn from large online social networks or the widely used networks component of the National Longitudinal Study of Adolescent to Adult Health (Add Health) dataset; however, theory and methods together play a central role in connecting research on social networks.

Figure 1.1  Trend in articles on “Social Network*” topics among all papers indexed in Web of Science Social Science Citation Index, with other keywords for comparison (Inequality, Democracy, Bayesian, Polarization). Horizontal axis: year, 1970 to 2020; vertical axis: count. Full color figures available on Oxford Handbooks Online.


The Handbook as a Map Our broad argument is a simple one: social network analysis is greater than a method or data but serves as a central paradigm for understanding social life. In Exchange and Power in Social Life, Peter Blau (1964) noted, “To speak of social life is to speak of the associations between people” (p. 12). The notion that networks capture a uniquely social ­dimension—­a “who” defined by roles and commitments embedded in exchanges with ­others, as contrasted to the more common collection of “whats” encoded in demographic ­ ­attributes—motivates the linkage between networks, motivations, and actions. The evolution of the field from early theoretical statements on small group dynamics by social theorists like Georg Simmel to mid-20th-century work on families by scholars like Elizabeth Bott illustrates the field’s interdisciplinary early history. The development of the field has only furthered its interdisciplinarity. To illustrate, we pulled all papers from 70 sociology and population science journals that included the term “network” in the abstract, title, or keywords.1 We then employed a text network (Moody & Light, 2006; see Chapter 22 in this volume) to identify the network of papers linked by similar topics within these journals. Figure 1.2 presents an overview of the topic clusters identified in the network. The left panel displays a broad spatial map representation, while the right panel lists the frequencies of the most prominent topic clusters. While there is a concentration of work on tools and methods (lower right of the Topic Concept Map in Figure 1.2) that is distinct from substantive topics, importantly, this distinction is not stark. We see specialized work on methods, such as exponential random graph models, in a less dense region of the graph and closely proximate to substantive issues such as crime, economic issues, and community development. This figure also highlights

Globalization/Trade Social Movements Politics

Diffisuon & Tech Business/ Economic Soc

Social Capital Class

Immigrant/Ethnic Community

Systems, Science Communication Social Exchange Voluntary Association

Migration Health & HIV Family & Kin Jobs & Markets

Community Development Measurement & Centrality

School & Adolescent Crime & Neighborhood

ERGM Sampling/ RDS

Health, HIV Diffusion Innovations & Technology ERGM, Random Graphs Schools, Adolescents & Peers Social Movements Family, Kin & Social Support Scicence, Systems & Communication Exchange and Power Migration, Transnational Solcial Capital, Identity Business, Economic Soc Measurement & Centrality Jobs, Gender & Markets Community Structure Ethnic Communities & Immigration Voluntary Associations & Status Trade & Globalization Neighborhood, Crime & Space Food & Agriculture Trust & Embeddedness Sampling, RDS Politics

0

50

100

150

200

250

Size Size Distribution Topic Concept Map

figure 1.2  Topic distribution of network papers in 70 sociology and population science journals.

the numerous substantive fields using social network analysis. The most popular areas include health and technology, along with mainstays of sociological research such as education and social movements. The rapid evolution of social network analysis has led to the generation of several unique theoretical and methodological streams. For example, scholars differ in terms of whether they use dynamic, time-dependent data or more cross-sectional data. The following chapters speak to this wide diversity by including, for example, discussions on up-to-date statistical modeling of network change and on the relationship between new data and computational challenges and social network analysis. The last part of the book will then focus on substantive applications, attempting to cover the major areas identified in the literature with a bias toward those emerging most recently. As Figure 1.2 illustrates, scholars from a wide variety of substantive areas, including economics, criminology, and demography, have incorporated social network analysis into their methodological and theoretical toolkits. Handbooks serve as maps, albeit imperfect ones, of scholarly fields. While no volume can capture the full variation of a vibrant scholarly community, we can certainly illustrate its breadth, as we do in this volume. Readers will find many of the broad areas that have used social network analysis, from research on trade to research on the neuroscience of networks. Several important areas of network research are not featured, such as work on public health. We simply couldn’t include every important area that has contributed to this field. However, we hope that readers walk away with a sense of the breadth of research on social networks. Next, we will introduce the four major sections of the volume in detail. We conclude by discussing several promising new directions in this field.

Network Basics and Theories Network approaches have always been deeply theoretical, although the theoretical underpinnings of social network analysis have not always been at the fore. Since at least the work of Georg Simmel (1950 [1902]) over a century ago, systematic theorizing around pattern and positioning in small groups has brought clarity to how we understand social interaction and social structure. Yet, the volume and quality of theorizing on social networks have increased dramatically in the past several decades as scholars address network formation, network change, and many other concerns. Part I includes chapters that offer overviews of key aspects of social network analysis with provocative future-oriented frameworks. In the first chapter of Part I, Light and Moody offer an introduction to the basic concepts and logic of social network analysis. This chapter hopes to provide some grounding for those who are new to networks and would like a quick overview. It introduces major concepts and provides a rough outline of the development of the networks perspective. Next, Fuhse provides a general theoretical introduction, overview, and insight into future directions on work in relationality. Fuhse outlines central perspectives from theories focusing on how networks constrain or enable action to relational theories focusing on symbolic networks and networks of meaning. In another forward-thinking chapter, Lazega develops a multilevel approach to understanding the relationship between organizational and social

networks. Kitts and Quintane update network theories in light of the data and methods revolution associated with computational social science. They call for a new analytic approach that redirects network scholars from “structural patterns to social processes.” Building a connection between theory and methods, Martin and Murphy tie the conceptualization and measurement of status within social networks to potential inequalities conditioning network data collection.

Network Methods While social network analysis has grown far beyond (and never really was solely) a set of methods for exploring data on social connectivity, many core discoveries in social network analysis depend on innovations in data collection and methodology. Part II addresses this central area of social network analysis with chapters on data collection and statistical analysis. Collecting data on social networks can be time consuming and expensive, as social network data collection in its “simplest” form requires asking respondents a series of mini-surveys about the people with whom they have social ties. adams, Santos, and Williams provide a clear and concise introduction to numerous major strategies for collecting social network data as well as a discussion of the ethical concerns in social research generally and specific to collecting social network data. Brashears and Gladstone discuss conducting social network experiments and collecting reliable, experimental data for determining causal processes. Illustrating how a social network perspective can contribute to collecting social scientific data generally, McCormick introduces a method that leverages personal networks to capture information about rare or stigmatized groups and/or to develop population estimates. Ego networks, those typically smaller networks centered on a focal actor, may appear out of step with the move toward big data; however, due to recent advances in the analysis and collection of ego network data, Smith convincingly argues, they remain a key area of research on networks. Agneessens extends this work on ego networks in his multilevel conceptualization. This chapter provides a practical discussion of when researchers should focus on the dyadic, nodal, or group-level units in a social network analysis. These two chapters provide useful foundations for thinking about statistical approaches to social networks.
These statistical models for networks are more formally introduced in Kuskova and Wasserman’s insightful introduction. This chapter focuses on the exponential family of random graph distributions. Lusher and coauthors extend this discussion to consider recent advances in exponential random graph models and related models including multilevel extensions. Schaefer and Marcum provide an overview of longitudinal network modeling with a robust discussion of the strengths and weaknesses of approaches from the relational event model to the stochastic actor-oriented model. They illustrate the properties of these models with a fascinating network consisting of dominance interactions among Eurasian red deer. Frank and Xu discuss causality in social network analysis with specific attention to how scholars tease apart selection versus peer influence effects. They conclude with an important call for transparency and sensitivity analysis in the scientific models used in social network analysis.
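
For readers new to the exponential family of random graph distributions mentioned above, the following minimal sketch may help fix ideas. It is an illustration under stated assumptions, not the specification used in any chapter of this volume: the function name `ergm_distribution`, the choice of edge and triangle counts as the only statistics, and the parameter values are all hypothetical. The sketch enumerates every undirected graph on a handful of nodes and assigns each a probability proportional to the exponential of a weighted sum of its statistics, which is only feasible for very small networks.

```python
from itertools import combinations, product
from math import exp


def ergm_distribution(n, theta_edges, theta_triangles):
    """Exact probabilities for every undirected graph on n nodes.

    Hypothetical illustration: each graph's weight is
    exp(theta_edges * edge count + theta_triangles * triangle count),
    normalized over all graphs. Brute force, so only usable for tiny n.
    """
    dyads = list(combinations(range(n), 2))
    weights = {}
    for bits in product((0, 1), repeat=len(dyads)):
        # A graph is encoded as a 0/1 tuple, one entry per dyad.
        edges = {dyad for dyad, present in zip(dyads, bits) if present}
        n_edges = len(edges)
        n_triangles = sum(
            1
            for i, j, k in combinations(range(n), 3)
            if {(i, j), (i, k), (j, k)} <= edges
        )
        weights[bits] = exp(theta_edges * n_edges + theta_triangles * n_triangles)
    normalizer = sum(weights.values())
    return {graph: weight / normalizer for graph, weight in weights.items()}


# With both parameters at zero, all 8 graphs on 3 nodes are equally likely.
dist = ergm_distribution(3, theta_edges=0.0, theta_triangles=0.0)

# A positive triangle parameter shifts probability toward the closed triad.
clustered = ergm_distribution(3, theta_edges=0.0, theta_triangles=1.0)
```

In applied work the normalizing sum cannot be enumerated, so estimation typically relies on simulation; the brute-force version here is meant only to show what the parameters do.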


Network Dimensions The nine chapters in Part III address key dimensions of networks that exist at the intersection of theory and methods. For example, theory informs how we conceive of a group when we are trying to locate communities within a network or how we conceptualize who an important actor is. Moreover, networks do not exist in isolation from other social factors but are embedded in space and time. Social processes also influence social structure and vice versa, drawing attention to how social networks relate to culture, meaning, and group formation. These dimensions, like spatiality and temporality, have both theoretical and methodological implications. The first three chapters in this section address dimensions of social network analysis that are often presented as solely measurement and/or statistical issues, but the authors conceptualize at the intersection of theory and methods: how to locate communities, how to uncover key actors or nodes in a network, and how to visualize networks. Shai and coauthors offer a series of case studies of network community detection across a range of disciplines. These case studies illustrate how community detection methods should be guided by the objectives of the analysis itself and not a one-size-fits-all model. Similar to community detection measures, centrality measures abound. Borgatti and Everett provide an approach to thinking about centrality that focuses on three frameworks: the “walk structure perspective,” the “induced centrality perspective,” and the “flow outcomes perspective.” These perspectives highlight how centrality is more than a measure but, as the authors state, is a “family of concepts on par with ‘demographics’ or ‘personality traits.’ ” In their chapter on visualization, Moody and Light offer a practical overview to constructing useful network visualizations. 
Freeman (2004) includes a deep reliance on visual methods as one of the defining features of the social network analytic tradition, and the reason for this is that visualizations (when done well) provide a multidimensional summary of a social system in ways that collections of single-dimension summary scores simply cannot. While one might be able to intuit a certain level of clustering or hierarchy from modularity or centralization scores, respectively, one can often easily see such in a well-done visualization. Modern visualization tools now allow for deeper exploratory data analysis than ever before, including tools for interactive layering of information and data. These tools make network visualization a central way to communicate network results to both academic audiences and the general public. Next, Neal explores the spatial dimensions of social networks by examining the relationship between distance in geographic space and distance in social space. Van de Rijt and Akin examine processes of preferential attachment—the key network principle that popularity increases as a function of who is already popular—using online field experiments. Mützel and Breiger provide an overview of recent work on duality, another key networks concept. Duality, broadly, is the idea that people form connections through overlapping group membership. So, we can model these connections in networks that contain people and their links through groups. Mützel and Breiger describe new directions for work on duality that center on how duality can help us to understand cultural structures and processes. Building off this idea, Light and Cunningham provide an overview of two strategies for modeling culture and meaning using network analysis to examine patterns in collections of narratives.
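
The duality idea described above, that people are connected through shared groups and groups through shared members, can be sketched in a few lines. This is a minimal illustration with invented data, not an analysis from the chapters: the names, the groups, and the helper functions `invert` and `project` are all hypothetical.

```python
def invert(memberships):
    """Turn a person -> groups mapping into a group -> people mapping."""
    groups = {}
    for person, joined in memberships.items():
        for group in joined:
            groups.setdefault(group, set()).add(person)
    return groups


def project(memberships):
    """Tie weight between two units = number of affiliations they share."""
    units = sorted(memberships)
    ties = {}
    for idx, a in enumerate(units):
        for b in units[idx + 1:]:
            shared = len(set(memberships[a]) & set(memberships[b]))
            if shared:
                ties[(a, b)] = shared
    return ties


# Invented example data: who belongs to which group.
memberships = {
    "Ana": {"choir", "union"},
    "Ben": {"choir"},
    "Cy": {"choir", "union"},
    "Di": {"chess"},
}

# People linked through the groups they share.
person_ties = project(memberships)

# Groups linked through the members they share (the dual view).
group_ties = project(invert(memberships))
```

The same `project` function serves both views because the two-mode data are symmetric in this sense: swapping the roles of people and groups yields the dual network.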

Another key dimension of social networks is their temporality. By exploring historical network research, Erickson and Feltham turn to the relationship between time and social networks. Their chapter focuses on six network phenomena, like cohesion and brokerage, to review the state of historical network research. In conclusion, Erickson and Feltham call for “more precise attention to the particular structural configurations, patterns of association, and dynamic mechanisms driving historical change.”

Network Landscape Social network analysis has become a core analytic technique across a range of social scientific disciplines. The chapters in Part IV illustrate the feedback loop as social network analysis contributes to new insights in a diverse set of fields, while discipline-specific insight often contributes to developments in our general understanding of social networks. While these chapters represent a fraction of the network landscape, these are key areas where social network analysis has been put to use. For example, Knappett provides an overview of the use of networks in archaeology with a particular focus on two different archaeological approaches to networks (those built from theory and those built from data) with specific scalar implications. He highlights the potential of thinking topologically versus geographically about the relationship between humans and artefacts. Two chapters in this section examine aspects of demography and social science on the family. Gauthier’s chapter provides an overview of research on the support derived from family networks. She concludes by offering an alternative way of thinking about family networks as emergent roles. Merli, Curran, and Le Barbenchon discuss the variety of ways that networks have been incorporated into demographic research. They highlight several ways that networks can help advance demographic research. For example, network structures can inform population-level estimates via techniques like the network scale-up method. Research in the neuroscience of networks provides an excellent illustration of the feedback loop between new areas of research and new information about network processes.
This field, thoughtfully outlined in the chapter by Parkinson, Wheatley, and Kleinbaum, examines how networks, as the authors write, “shape and are shaped by the psychological processes of their members.” This chapter explores cutting-edge research on how the intersection of neuroscience and social network analysis extends both areas in fruitful ways. Computational social science is another cutting-edge area of research on social networks. In their chapter on big data and networks, Abrahao and Parigi focus on the broad range of computational social scientific approaches from machine learning to online experiments. They conclude with a concise list of some of the main challenges social scientists face when “creating algorithmic models of human behavior.” The next three chapters in this section examine the relationship between economics and social networks. Jackson, Rogers, and Zenou provide a general overview built around the concept of externalities or, as the authors write, “situations in which the behaviors of some people affect the welfare of others.” Networks inherently contribute to both positive and negative externalities as network actors’ behaviors affect those with whom they share ties. McDonald and Benton turn to the resources derived from social networks in their

discussion of social capital and economic sociology. Zhou examines the international trade network and describes approaches to modeling trade as a network. Börner’s chapter on science and networks highlights practical applications of network visualization. Her work offers insight into the creation of network maps and how these maps can help make data-driven decisions about a range of scientific processes and outcomes from international collaboration to inequality in science. Last, Smith and Papachristos describe the intersection of social network analysis and criminology. They review prior research on criminal organizations and discuss its implications for policy. Together these chapters highlight that the network landscape is diverse and robust. Social network analysis is an exciting interdisciplinary space contributing across the social sciences and beyond.

Conclusions, Concerns, and Future Directions Social scientists had network analysis mostly to themselves for the first half of the last century (Freeman, 2004). While other fields made use of graph theory for particular problems, the wholesale adoption of networks as an interesting substantive object of study in itself (rather than a means to solving, say, a search problem) was comparatively rare. This has changed dramatically with the rise of “network science” as a distinct substantive field drawing researchers from outside the social sciences. This influx of new technical expertise has brought with it a welcome increase in tools and techniques, but perhaps also a strain on the linkage between theory and method. Generative network models make great use of simplifying assumptions about human relational behavior to gain mathematical traction on models. Like other classic simplifications (Schelling, 1971), such approximations often lead to important insights. But we must also take care to avoid substituting simplifying assumptions for research-based insights into complex behavior. The disciplinary broadening of network studies carries with it a natural tendency toward cliques: as investigators discover ideas in their own fields and cite proximate references, what starts as an interdisciplinary field risks becoming a loosely connected multidisciplinary set of fields. Consider as an illustration the cocitation network of all papers published in the journal Social Networks and all papers on “network*” in Physical Review E (Figure 1.3). Without a dedicated examination of how fields employ cross-disciplinary information, one can judge images like this as optimistic or pessimistic.
On the one hand, it’s hard to imagine physicists and social scientists citing each other at all 30 years ago; on the other hand, the clear separation of the fields suggests that we may be at a tipping point where the volume of work within each field no longer necessitates communication across the fields. Our hope is that social, natural, and computational scientists maintain a high level of communication to ensure that substantive knowledge, theoretical expertise, and new methodological developments work in concert to move the networks field forward. With this in mind, we next turn to a set of 10 open problems we hope a vibrant interdisciplinary revision of this handbook 20 years from now will have solved. These are issues that


figure 1.3  Cocitation network for references from Social Networks and network articles in Physical Review E: based on papers published since 2010, displaying only papers with at least three citations, size proportional to total citation count. Full color figures available on Oxford Handbooks Online.

we have noticed emerging at the intersection of multiple chapters and, we think, lend themselves to careful integration of network methodology and network theory. Of course, any list is particular to the authors and there are likely many directions not mentioned here. The first problem is meaningfully integrating dynamics and diffusion. Much of social network analysis is interested in how “bits” diffuse through networks (see Chapter 14 and others), and at least since the early work of Morris and Kretzschmar (1997), we have known that the temporal structure of networks constrains how diffusion proceeds through a network. Recent developments in this area have bifurcated between epidemiological models (Armbruster, Wang, & Morris, 2017; Moody & Benton, 2016; Lee, Moody, & Mucha, 2019) and social models, such as stochastic actor-oriented models. The fundamental problem extending from the epidemiological perspective is that infrequent, temporal contacts shape the underlying set of network paths that bits can flow through, which in turn shapes who is at risk of future diffusion. The fundamental issue at play in dynamic social diffusion models turns on how relations are modified as information or behavior flows through the network: the feedback process on network structure emerging from selection on a bit that simultaneously diffuses through the network. Future work that profitably merges these ideas could lead

to new insights and more accurate prediction, particularly as network behaviors far “upstream” in a diffusion process shape exposure that then rewires networks later. A closely related second direction within the broader diffusion problem space is to further our understanding and modeling of how qualitatively different types of behaviors move through qualitatively different types of relations. The methodological challenge here is likely to be met by extending multilayer network ideas with competing-diffusion ideas: intuitively, modeling secrets among a confidant network and gossip among a status-competition network, for example. The theoretical challenge will be to link the social psychological understanding of issue salience with perceptions of trust, status, and relational position. Echoes of early work on cognitive social structures are clear here, and linking these formally with models of diffusion should be fruitful. Our ability to understand and model the social psychological linkages between structure and diffusion is a necessary next step in the growing body of research using experimental diffusion manipulations. Dramatic effect sizes for community health interventions (Bjorkman & Svensson, 2015) have prompted an increase in network interventions and network field experiments, and experiments are the gold standard for disentangling selection and influence in network effects. But while there is a wide scope and diversity in potential types of network experimental interventions (see Valente, 2012, and Chapter 8), most of the current work focuses on diffusion effects through influential people (Valente, 2017).
There are still deep difficulties in understanding the effects of network intervention and parsing the effects of intent to treat from treatment on the treated, as people usually have volition in their interaction levels within experimentally manipulated relations (e.g., one can choose not to hang out with a randomly assigned roommate). These experimental issues raise fascinating questions about how behavior feeds back onto relational saliency and how actors navigate voluntary and involuntary relations. More experimental evidence bridging the network and social psychological level will shed light on these sets of problems. The recent growth of interest in multilayer networks reignites old questions about multiplexity in general. While the vast majority of social network research has focused on single-relation systems, we are seeing a resurgence in data that simultaneously collects information across multiple relations. The multilayer network approach has characterized these systems in connectivity terms, asking how things flow through different layers differently. But the core insight from early multiplex relations work was that social systems constrain how relations mix: one must be a father before being a grandfather. The rich work on formal relational systems, pioneered by White (1963) and taken up by Pattison (1993), is fertile ground for understanding the basic rules shaping lived social systems. If we can effectively develop estimation-based models that derive such rules, we will have data-grounded network generation processes that respect these fundamental aspects of multirelational interdependence. Such models could then provide a theoretical grounding for predicting how networks evolve over time. The fifth open area continues to be honing relational models for community detection.
Applications and tools abound (see Chapter 16), and we now have a much clearer understanding of how analytic choices—sometimes implicit in things like unrecognized resolution parameters—affect results. We similarly have new methods for community detection in dynamic and multirelational networks. What has been lost in the rapid growth in community detection tools is the tight link with theory, either on what a community is

substantively or on the network process flowing through these meso-level structures. A key problem for future work is for social theorists to provide concrete understandings of how people are embedded in groups and what those groups imply for relational patterns. Freeman’s (1992) original work focused on structural markers (reciprocity, transitivity), but in keeping with the focus on dynamics and multirelations, we would like to have an understanding of groups that provides a collective “life history” of network involvement. The sixth set of open problems is related to bridging the different levels of activities that drive relational activity. A core insight of the exponential random graph model (ERGM) framework in particular, and network generation models in general, is that actors respond to relational configurations and change or form relations in response. Building out multilevel, multirelational versions of how actors respond to “configurations” opens fascinating theoretical questions. So, for example, if communities are “real,” then the relations that people react to should vary in their significance with the community boundary: a friend-of-a-cogroup friend is more important than a friend-of-a-cross-group friend. This sort of activity feeds back, of course, to the saliency of the group boundary itself. At issue is moving beyond local configurations that are fast to calculate and instead building configurations that bridge actor-recognized meso-levels. The rise of online networks, social media, and the internet of things has opened a vast trove of new data for network analysts to play with. At the moment, it seems that much of the work currently done here is either substantively rich and descriptive or analytically thin and predictive. So we find either fascinating and engaging exposés on the more distasteful Reddit communities or new algorithms for suggesting stories people might find fun on their media feed.
For the field to move forward, engaged social theory on the motivations, norms, social rewards, and social sanctions implicit in engagement with these communities will be needed. Analytically this will require a mixed-methods approach, but it will also deeply inform the community detection, role, and dynamic diffusion points described earlier. Any analysis of online networks almost certainly will involve an analysis of free-form generated text: online life is largely a communicative environment, and this is almost always text or video. Linguists recognized long ago that language is fundamentally relational; the meaning of terms is fundamentally linked to their position in a sequence of other terms. Current work linking texts and networks focuses generally on topic extraction (Light, 2014; Moody & Light, 2006; Rule, Cointet, & Bearman 2015; see Chapter 22) or narrative structure (Bearman, Faris, & Moody, 1999; Rule et al., 2015). Our intuition is that a wide-open area for future work is to leverage multimode networks of meaning producers (“speakers”), terms, and syntax. The nexus of these items allows one to situate actors within an interactive context and derive meaning from the situation (see Chapter 21). Models for doing so are difficult but open fascinating doors for new questions. This work will benefit most directly from interaction with machine learning tools, where interest in language extraction and coding is well established. In addition to the multimethodological implications of connecting texts to networks, broader efforts toward incorporating qualitative methods and mixed methods—research strategies that combine quantitative and qualitative network analysis—continue to push for multidimensional understandings of social networks. Networks are obviously more than outcomes or key predictors; they indicate the social groups in which individuals live. The “thick description” of how people make sense of, develop, and maintain their social

relations remains an enduring question (Hollstein, 2014). Approaches that rely solely on quantitative techniques often infer or provide a “thin description” of these processes and fail to address the “how” questions, such as how people form new ties, how people perceive their ties, and exactly how people mobilize their ties in times of need (Small, 2017). Research on historical networks has been the site of some significant interventions in this vein (e.g., McLean, 1998; see Chapter 23). We hope for continued growth in the “thick description” of social networks. The final area we are excited to watch develop over the next 20 years consists of projects that examine collections of similar network relations. In the spirit of old sociological community studies, we now have rich social network data being collected on hundreds of villages, for example. While the current investment is largely in a health intervention framework, we expect there will be many new opportunities across domains, ranging from kids in schools to users on Reddit to commuters linking neighborhoods. New data will open opportunities to examine an ecology of network formation processes. The chapters in this volume will introduce new readers to current best practice while helping experienced users see a bit of the forest in addition to their well-known tree. We have learned a great deal from the chapters in this volume and expect that you will find them similarly engaging. It is an exciting time to be engaged in this field and we look forward to the next phase of research on social networks.

Acknowledgements Special thanks to Kanan Shah for help with preliminary analysis related to this chapter.

Note 1. The focus here on sociology betrays our disciplinary bias as we are both sociologists. Yet, we hypothesize that sociology serves as a good model for the wider landscape as other fields likely exhibit a similar broad incorporation of network theories and methods across subfields.

References
Armbruster, B., Wang, L., & Morris, M. (2017). Forward reachable sets: Analytically derived properties of connected components for dynamic networks. Network Science, 5, 328–354.
Bearman, P., Faris, R., & Moody, J. (1999). Blocking the future: New solutions for old problems in historical social science. Social Science History, 23(4), 501–533.
Bjorkman, M., & Svensson, J. (2015). Power to the people: Evidence from a randomized field experiment on community-based monitoring in Uganda. Quarterly Journal of Economics, 124, 735–769.
Blau, P. (1964). Exchange and power in social life. New York, NY: John Wiley & Sons, Inc.
Boorman, S. A., & White, H. C. (1976). Social structure from multiple networks II. Role structures. American Journal of Sociology, 81, 1384–1446.
Freeman, L. (1977). A set of measures of centrality based on betweenness. Sociometry, 40, 35–41.
Freeman, L. (1992). The sociological concept “group”: An empirical test of two models. American Journal of Sociology, 98, 152–166.
Freeman, L. (2004). The development of social network analysis: A study in the sociology of science. Vancouver, BC: Empirical Press.
Friedkin, N. E. (1991). Theoretical foundations for centrality measures. American Journal of Sociology, 96, 1478–1504.
Hollstein, B. (2014). Mixed methods social networks research: An introduction. In S. Dominguez & B. Hollstein (Eds.), Mixed methods social networks research: Design and applications (pp. 3–34). New York, NY: Cambridge University Press.
Lee, E., Moody, J., & Mucha, P. (2019). Exploring concurrency and reachability in the presence of high temporal resolution. In P. Holme & Saramaki (Eds.), Temporal network theory. Cham, Switzerland: Springer.
Light, R. (2014). From words to networks and back: Digital text, computational social science, and the case of presidential inaugural addresses. Social Currents, 1(2), 111–129.
McLean, P. D. (1998). A frame analysis of favor seeking in the Renaissance: Agency, networks, and political culture. American Journal of Sociology, 104(1), 51–91.
Moody, J., & Benton, R. A. (2016). Interdependent effects of cohesion and concurrency for epidemic potential. Annals of Epidemiology, 26, 241–248.
Moody, J., & Light, R. (2006). A view from above: The evolving sociological landscape. American Sociologist, 37(2), 67–86.
Morris, M., & Kretzschmar, M. (1997). Concurrent partnerships and the spread of HIV. AIDS, 11, 641–648.
Pattison, P. (1993). Algebraic models for social networks. New York, NY: Cambridge University Press.
Rule, A., Cointet, J. P., & Bearman, P. S. (2015). Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014. Proceedings of the National Academy of Sciences, 112(35), 10837–10844.
Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1, 143–186.
Simmel, G. (1950 [1902]). The dyad and the triad. In K. Wolf (Ed.), The sociology of Georg Simmel. Glencoe, IL: Free Press.
Small, M. L. (2017). Someone to talk to. New York, NY: Oxford University Press.
Valente, T. W. (2012). Network interventions. Science, 337, 49–53.
Valente, T. W. (2017). Putting the network in network interventions. Proceedings of the National Academy of Sciences, 114(36), 9500–9501.
White, H. C. (1963). An anatomy of kinship: Mathematical models for structures of cumulated roles. Englewood Cliffs, NJ: Prentice-Hall.
White, H. C., Boorman, S. A., & Breiger, R. L. (1976). Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81, 730–780.

Part I

Network Basics and Theory

Chapter 2

Network Basics: Points, Lines, and Positions

Ryan Light and James Moody

Social networks are a fundamental building block of social life. By making our connections visible, we can observe the structures enabling and constraining the things that we do and the ways we move about the world. Networks provide a powerful metaphor for contemporary life as connectivity seems more relevant now than ever before: people connect to one another via internet technology over vast geographic expanses, workers network to find new jobs, and terrorists form hidden networks to coordinate acts of violence. While social network analysis has contributed to understanding each of these areas (and many others), it has done so through a set of systematic techniques that move beyond the general concept of connectivity to specific and now well-tested empirical, analytic methods. This chapter provides a brief overview of the core building blocks of social network analysis.

The Building Blocks of Networks

Visually, networks consist of points and the lines connecting them. In the same way, social network data consist of two linked classes: nodes (or points) and edges (or lines). Nodes in social network analysis are most often people but can be any other unit capable of being linked to others, such as schools, countries, or ideas. We may also refer to nodes as vertices or actors. Without data on connectivity, the data collected at the node level consist of standard social science data. For example, with teenaged actors we likely would collect demographic and educational information.

Edges in social network analysis consist of the relations among nodes. Also referred to as arcs or ties, edges can be valued or directed. In a valued graph, edges may possess different weights indicating the strength of a connection. For example, ties between people may differ by how much time they spend with one another. Unweighted ties (edges with no specific value) are binary: either “on” or “off.” For example, we may be interested in whether people spend any time at all together without specifying a value. In this case, the tie would exist between two people if they hung out together regardless of how much time they spend together.

Edges can also be directed or undirected. Directed ties are ties “sent” from one node to another. For example, Brian may like Steve, but Steve does not reciprocate. A directed tie is sent from Brian to Steve, but not vice versa. Undirected ties are always symmetrical as they do not indicate the direction in which a tie is sent. In an undirected global network, for example, we might identify an edge between any two countries that belong to the same nongovernmental organization (e.g., Hughes et al., 2009).

Network data are often stored as an adjacency matrix, which we can denote as M_{i,j}. Here, M is the matrix, i indicates the rows of the matrix, and j indicates the columns. Figure 2.1 illustrates how this works. In both panels, we have drawn a five-node network. Panel A captures an undirected, binary network. We can imagine a group of five kindergartners who help each other with reading. In the triad (the three nodes) on the left: Andy (node a) and Bella (node b) help each other; Bella and Carlos (node c) help each other; but Andy and Carlos do not offer each other help. The matrix captures this information: M_{a,b} is 1 and M_{b,a} is 1 because the network is undirected and symmetrical. M_{b,c} and M_{c,b} are also 1 because Bella and Carlos help each other. Yet a tie does not link Carlos to Andy, so these cells (M_{c,a} and M_{a,c}) are blank.

Panel B, on the other hand, captures a directed, binary network. We can imagine the same scenario with a slight variation: in this graph, we take who is helping whom into consideration. For example, Andy helps Bella and Bella returns the favor (M_{a,b} = M_{b,a}), while Carlos helps Bella without reciprocation. This asymmetry is reflected in the matrix: a tie “sent” from Carlos to Bella (M_{c,b}) is included, while a tie sent from Bella to Carlos (M_{b,c}) remains blank.
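The kindergarten example can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' own procedure: the helper names `adjacency_matrix` and `is_symmetric` are hypothetical, only the left-hand triad described in the text (Andy, Bella, Carlos) is encoded, and blank cells are written as 0.

```python
def adjacency_matrix(nodes, ties, directed=False):
    """Return a 0/1 adjacency matrix as a list of rows; rows send, columns receive."""
    index = {name: position for position, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for sender, receiver in ties:
        matrix[index[sender]][index[receiver]] = 1
        if not directed:
            # An undirected tie fills both cells, so the matrix is symmetric.
            matrix[index[receiver]][index[sender]] = 1
    return matrix


def is_symmetric(matrix):
    """True when every cell equals its mirror image across the diagonal."""
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(size) for j in range(size))


nodes = ["a", "b", "c"]  # Andy, Bella, Carlos

# Panel A (undirected): Andy-Bella and Bella-Carlos help each other.
panel_a = adjacency_matrix(nodes, [("a", "b"), ("b", "c")])

# Panel B (directed): Andy and Bella help each other; Carlos helps Bella only.
panel_b = adjacency_matrix(
    nodes, [("a", "b"), ("b", "a"), ("c", "b")], directed=True
)

print(panel_a)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(panel_b)  # [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(is_symmetric(panel_a), is_symmetric(panel_b))  # True False
```

The list of (sender, receiver) pairs passed to the helper is itself an edge list, so the same input serves both storage formats.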
While both of the networks illustrated in Figure 2.1 are binary, indicating the presence or absence of a tie, weighted or valued graphs can be stored in matrices as well. The value of the elements in the matrix of a valued graph will be allowed to vary beyond the simple 1 or 0. Note that network data can be stored as an edgelist or nodelist as well, which is often computationally more efficient.

[Figure 2.1  Adjacency matrices. Panel (a): undirected, binary network. Panel (b): directed, binary network. Each panel shows five nodes, a through e, with the corresponding adjacency matrix.]

We probably most often think about networks as modeling relationships between similar units, such as networks between people, computers, or organizations. This type of network, as seen in Figure 2.1, is called a one-mode network. Two-mode networks, or affiliation networks, model the relationship between two different units. For example, business elites may be connected to one another via overlapping corporate boards (e.g., Burris, 2005). This network consists of two different types of nodes: elites and corporations. In a two-mode network, the adjacency matrix consists of different elements across the rows and columns.

For example, the rows may consist of business elites, while the columns consist of corporate boards. Any connection between groups or events and people can be captured in this way: teenagers and afterschool activities, Wikipedia editors editing different Wikipedia entries, activists attending environmental protests, and so forth.

In addition to variation between types of nodes, edges may also vary by kind. Multiplexity is the property of edge variation in social networks. Some networks may not be multiplex; for example, we may be solely interested in how children help one another. However, we often seek more information about the type of connections we may observe in the social world. For example, in a classroom, children may help one another, play with one another, and be siblings of one another. When considering multiplex relationships, we can include each of these unique edge types together, model each edge type individually, or add them together as a summary of interaction.

Dynamic or temporal aspects of networks are increasingly important to network scholars. In dynamic networks, nodes and edges may be active or inactive at a given moment in time. For example, in a preschool classroom network, a toddler may be sick and therefore not present for a day’s activity. This absence results in an inactive node for this time period. Or toddlers may play with different peers at different times, creating active and inactive ties between this set of classmates. Social network scholars have built increasingly sophisticated tools for examining dynamic networks, including statistical models and network animations. Contemporary work (Mucha et al., 2010, and others) combines multiplexity and dynamics under a unified framework known as “multilayer” networks. Each “layer” in a multilayer network is either a new panel (if temporal) or a new relation (if multiplex), and layers are linked to each other by sharing the same nodes.
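Two of the ideas in this section lend themselves to a short sketch: projecting a two-mode affiliation matrix onto one mode, and adding multiplex relations together as a summary of interaction. Everything here is hypothetical and for illustration only: the elite and board names, the children's names, and the helper functions `project_rows` and `summarize_layers` are assumptions, not data or methods from the chapter.

```python
# Hypothetical two-mode data: rows are elites, columns are corporate boards.
elites = ["elite_1", "elite_2", "elite_3"]
affiliation = [
    [1, 0],  # elite_1 sits on the first board only
    [1, 1],  # elite_2 sits on both boards
    [0, 1],  # elite_3 sits on the second board only
]


def project_rows(matrix):
    """One-mode projection: cell [i][j] counts the columns shared by rows i and j."""
    size = len(matrix)
    return [
        [
            0 if i == j else sum(a * b for a, b in zip(matrix[i], matrix[j]))
            for j in range(size)
        ]
        for i in range(size)
    ]


shared_boards = project_rows(affiliation)
print(shared_boards)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

# Hypothetical multiplex data: one undirected tie list per relation.
layers = {
    "helps": [("ann", "ben"), ("ben", "cat")],
    "plays_with": [("ann", "ben"), ("ann", "cat")],
    "siblings": [("ben", "cat")],
}


def summarize_layers(layers):
    """Add the relations together: count how many layers link each pair."""
    summary = {}
    for ties in layers.values():
        for u, v in ties:
            pair = tuple(sorted((u, v)))  # treat ties as undirected
            summary[pair] = summary.get(pair, 0) + 1
    return summary


tie_counts = summarize_layers(layers)
print(tie_counts[("ann", "ben")], tie_counts[("ann", "cat")])  # 2 1
```

In the projection, elite_1 and elite_3 share no board and are linked only indirectly through elite_2. Keeping the `layers` dictionary intact instead of summing it corresponds to modeling each edge type individually.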
The discussion so far has focused on a few very general concepts for thinking about networks. Next, we discuss the general ways in which networks matter.

Two General Approaches to Social Network Analysis

Social network analysis has contributed to a diverse range of topics across the social sciences. But, as an overview, we can consider two basic ways to conceptualize how networks matter: connectionist approaches and positional approaches.

Connectionist approaches to networks are the most familiar and focus on the set of links between nodes and how those links provide a conduit for diffusion. The primary focus of connectionist approaches to network analysis is on the path structure of a network, where nodes sit within that path structure, the extent to which groups are heavily linked, and the ease with which things diffuse across the network. If your question is about how something travels through a network or how the set of relations in a network binds together a collectivity, you are generally asking a connectionist question. So, for example, classic studies on the diffusion of innovations (Rogers, 2003 [1962]), the nature of groups and cohesion (Freeman, 1992), or how to reach others in a network (Milgram, 1967) all fundamentally turn on how subsets in the population are connected by relational paths in the network. Average path length, or the average of the shortest paths connecting each pair of nodes in a network, is one basic measure of connectivity. Diffusion processes highlight questions about network dynamics, as the temporal ordering of relations constrains when things flow through the network and who is ultimately reached (Moody, 2002), which is especially important for those interested in understanding disease spread.

Perhaps the most well-known connectionist approach to networks arises out of the “small world” research. A small-world network consists of small groups of densely connected nodes weakly connected to one another. Many human networks are small-world networks as people form tight groups of closely connected friends that only share a few connections with other groups. Yet those weak intergroup connections enable information to spread with surprising speed. In one of two versions of his classic study from the 1960s, Stanley Milgram evaluated how quickly a message could be sent between strangers by asking individuals in Omaha, Nebraska, to send a letter to a stockbroker in Boston, Massachusetts, through intermediaries of their choosing. Participants therefore had to choose the acquaintance they judged most likely to be a friend of a friend of the stranger. Milgram discovered that this task required six intermediaries on average, suggesting that strangers are separated by only about six people, or six degrees of separation. This notion that our densely connected subgroups can be quickly traversed has been subject to repeated successful replication, including with various online celebrities and in online small-world games like Six Degrees of Kevin Bacon.

Connectionist questions need data on the full array of relevant relations in a context. So, for example, if one really wants to characterize the flow of information in a community, one needs to know all the channels and players that could be sharing information. This is the primary reason that classic network studies require population-level data.
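Average path length, as defined above, can be computed with a breadth-first search from every node. The sketch below is a minimal illustration under stated assumptions: the two example graphs are invented, the helper names are hypothetical, and the average is taken only over pairs that can reach each other, so it assumes at least one connected pair exists.

```python
from collections import deque


def make_graph(edges):
    """Build an undirected graph as a dict mapping each node to its neighbor set."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def distances_from(graph, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def average_path_length(graph):
    """Mean shortest-path length over all reachable pairs of distinct nodes."""
    total = 0
    pairs = 0
    for source in graph:
        for target, distance in distances_from(graph, source).items():
            if target != source:
                total += distance
                pairs += 1
    return total / pairs


# Two tight triangles joined by a single bridging tie (3-4).
clusters = make_graph([(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)])
# The same six nodes arranged in a single chain.
chain = make_graph([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)])

print(round(average_path_length(clusters), 3))  # 1.8
print(round(average_path_length(chain), 3))     # 2.333
```

Even in this toy case, the clustered graph with one bridging tie has shorter paths on average than the chain, which echoes the small-world intuition that a few intergroup ties keep dense groups within easy reach of one another.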
Positional approaches to network analysis, in contrast, focus less on how the path structure links nodes across a population and more on how patterns of relations characterize roles within networks. The multiplex set of relations in any social system tends to fit together in a sensible way. Consider as an archetypical example a system such as Figure 2.2. This admittedly silly example highlights how roles are defined by a consistent pattern of social exchange (e.g., classic work by Homans & Nadel). Being a “parent” within a family involves a set of accepted relations to everyone else in that family, and one’s role is defined by the pattern of relations one is engaged in. For example, if an elder child starts to provide food and discipline for her siblings, then she starts to take on some of the characteristics of a parent, regardless of any biological constraint. The fundamental insight of positional approaches to network studies is that patterns of relations tend to come in well-bounded configurations that hang together, and those patterns constrain action (White, Boorman, & Breiger, 1976). Questions about hierarchy, relational status, or isolation all focus on the positional features of networks.

[Figure 2.2  Multiplexity and network position. Archetypical family exchanges among two parents and three children; tie types: romantic love, provides food for, bickers with.]

Positional questions need data on enough of the relevant exchange to characterize positions uniquely. Importantly, they do not require that you know how different units in the population are fully connected: one can study varieties of family networks (or business organizations, political structures, etc.) without collecting data on the full kinship network in a nation. As such, positional approaches tend to be more amenable to sampling than connectionist approaches.

The distinction between connectionist and positional approaches is theoretically important for understanding the mechanisms underlying social actions. For example, a classic question in network studies is whether homophily is driven primarily by role constraints or the diffusion of information (Burt, 1987). Homophily is the idea that “birds of a feather flock together” and that we are likely to share much more in common with our friends than with strangers. Theoretically, homophily can arise either because we influence each other (a dynamic selection and influence process resting on connectivity) or because we face similar constraints in our local pattern of ties (Burt, 1987).

Since networks capture the social contexts affecting actors’ behaviors, the formal similarity in network processes can be used to identify similarities across seemingly very different social settings. For example, we can think of a general diffusion process underlying deviant behavior, smoking, buying panics, and disease flow.
While the particular ways in which each of these features diffuses across the network may differ, the underlying network connectivity pattern provides a fundamental limit on the extent of diffusion. Similarly, we might expect peripheral members of organizations to assess opportunities in similar ways, perhaps taking more risks than would otherwise seem rational, regardless of whether these organizations are businesses, nonprofits, or nations.

Knowing which mechanism operates (or the relative weight of both) will shape the ability to intervene in such systems. If deviant homophily is due primarily to diffusion, then interventions with peer leaders will be most effective. If, instead, it is due primarily to one’s position in the peer ecology of a school, as “outsider” or “other,” then systematic treatment for acceptance might be more effective. Or, consider international trade in the world system (see Chapter 31 by Zhou in this volume). In world systems research, countries are stratified into three primary positions (the periphery, semiperiphery, and core), but these positions are defined by the extent of connectivity within and between each position. Each core country is connected to most of the other countries in the core. Semiperiphery countries have many ties to the core but fewer ties to each other. Periphery countries have ties to the semiperiphery and core but no ties to one another. Opportunities, such as those connected to trade, are constrained by a country’s position in this broader global system, with exploitation tending to move from the core to the periphery. If our goal is to help spur national development, are we better off acting on diffusion mechanisms or positional mechanisms?

Basic Network Forms

We can think about how these approaches interpret different basic network forms in different ways. Figure 2.3 depicts four different basic network forms that you have probably encountered in your everyday life.

[Figure 2.3  Some basic network forms (see Christakis & Fowler, 2009). Panel (a): empty graph; panel (b): bucket brigade; panel (c): telephone tree; panel (d): maximally connected graph. Each panel shows ten nodes labeled A through J.]

A network in its most basic form, perhaps, is an empty graph or an unconnected network. As seen in Figure 2.3A, an empty graph contains no edges linking the nodes. A node that is disconnected from a network is called an isolate, and an empty graph consists entirely of isolates. A group of strangers independently working in a library may be represented as an empty graph.

Figure 2.3B depicts another basic network, the bucket brigade. In a bucket brigade, there are no redundancies in ties and very little hierarchy. Every person is tied to no more than two network alters, or connected nodes. The telephone game represents a classic bucket brigade network. In this game, players transmit information through the network from player A to player J. With the exception of the end points, players have equal power to disrupt or change the message. Even when played earnestly, the telephone game often results in a poorly communicated message as players confuse, mishear, or mumble the message from person to person.

Figure 2.3C represents a telephone tree. Before the advent of group messages via text or email, telephone trees were common. Here, a central node would call several associates, who would call more associates, and so on. In this case, hierarchy is noticeable. The most centrally located node, node J, connects the two different clusters of people. Playing a somewhat weaker role, G, H, and I also have a good deal of power, while nodes A through F are the weakest structurally.

Figure 2.3D is a maximally connected graph. In this graph all nodes have N − 1 edges, in this case 10 − 1 = 9; we subtract 1 because we are not considering ties to oneself, which are often called loops. In other words, every node is connected to every other node.

Let’s quickly think about the implications of these simple forms for a network process. We can imagine a group of public health scholars evaluating potential interventions for a disease observed over each of these graphs. In the disconnected graph (Figure 2.3A), we can observe that none of these people share a direct tie with one another. Therefore, we can assume that any disease diffusion in this population occurs through unobserved network ties or through means that don’t require direct contact, such as airborne transmission. In the bucket brigade (Figure 2.3B), tie ordering determines potential risk for the spread of disease. The same can be said of Figure 2.3C; however, we can imagine targeted interventions based on the most central nodes, or the four nodes that connect the graph. In a maximally connected graph, such as Figure 2.3D, the risk of disease diffusion is shared equally between all people, and prevention campaigns likely include engaging every actor, such as a population-wide vaccination program.
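The four forms can be built and compared in a short Python sketch. It is illustrative only: the figure is not reproduced here, so the telephone-tree layout (J calls G, H, and I, who each call two of A through F) is an assumption consistent with the description in the text, and the helper names are hypothetical. Removing a node stands in, very loosely, for a targeted intervention.

```python
from itertools import combinations

labels = list("ABCDEFGHIJ")  # ten nodes, as in the examples above


def make_graph(nodes, edges):
    """Undirected graph as a dict mapping each node to its neighbor set."""
    graph = {node: set() for node in nodes}
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def edge_count(graph):
    return sum(len(neighbors) for neighbors in graph.values()) // 2


def degrees(graph):
    return {node: len(neighbors) for node, neighbors in graph.items()}


def component_sizes(graph, removed=()):
    """Sizes of the connected pieces left after deleting the `removed` nodes."""
    remaining = set(graph) - set(removed)
    sizes = []
    while remaining:
        stack = [remaining.pop()]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in graph[node]:
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    stack.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)


empty = make_graph(labels, [])
brigade = make_graph(labels, zip(labels, labels[1:]))  # A-B, B-C, ..., I-J
# Assumed tree layout: J at the center, G/H/I in the middle, A-F as leaves.
tree = make_graph(labels, [
    ("J", "G"), ("J", "H"), ("J", "I"),
    ("G", "A"), ("G", "B"), ("H", "C"), ("H", "D"), ("I", "E"), ("I", "F"),
])
complete = make_graph(labels, combinations(labels, 2))

print([edge_count(g) for g in (empty, brigade, tree, complete)])  # [0, 9, 9, 45]
print(set(degrees(complete).values()))        # {9}, i.e. N - 1
print(component_sizes(tree, removed=["J"]))   # [3, 3, 3]
print(component_sizes(tree, removed=["G"]))   # [7, 1, 1]
print(component_sizes(complete, removed=["J"]))  # [9]
```

Under the assumed layout, J and G have the same degree, yet removing J splits the tree into three equal groups while removing G only strands two leaves. In the complete graph, removing any single node leaves everyone else connected.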

Network Building Blocks: Bridging Levels

Within most networks, nodes vary by the amount of power or influence they have. Centrality is one way of measuring power as very central nodes may control more of the action in a network than less central nodes. Central nodes are often said to be more “important” to the overall structure of a network. For example, in Figure 2.3C, the central node J controls whether the three groups of triads can communicate with one another. There are numerous ways of measuring centrality, but the simplest measure of centrality is degree. Degree is the number of edges incident to the node, which is equivalent to the number of alters that a node has. Degree is often used to measure popularity within a network, which makes sense in, for example, a friendship network, as degree captures the number of people that have nominated an individual as a friend. As discussed further in Chapter 17 by Borgatti and Everett, network scholars have developed numerous ways to measure centrality beyond degree that emphasize different theoretical interests, such as power, the flow of information, and proximity.

Centrality alone does not determine the importance of a particular node within the network as importance is based, in part, on what an actor hopes to accomplish. A central actor may share a strong relationship with many other people and could therefore spread information quickly to them, yet the information they possess is likely to quickly become redundant: “Did you hear the rumor about Jack? Yeah, Jennifer already told me.” Strong ties in Granovetter’s (1973) classic framework connect two people who spend a lot of time together, while weak ties connect people who pass in and out of each other’s lives. While strong ties may be particularly effective at providing social support, as they are more likely to connect people who trust one another, the people connected by weak ties are more likely to share new information. In a similar vein, Burt (1992) introduces the notion of structural holes. Structural holes indicate the absence of ties between groups of people. To Burt, structural holes represent opportunities to learn new information and/or control its spread. Scholars are often trying to broker structural holes in their research fields, as indicated by the number of research papers trying to fill gaps or bridge chasms. Politicians may also try to broker structural holes within misaligned constituencies.

In general, the basic network forms are aggregates of local network elements, and it is useful to consider the network aggregation process explicitly. The most basic unit of a network is a dyad. A dyad is a pair of nodes. A dyad can be “on” or “off,” reciprocated or not. Dyads are useful in thinking about potentially shared behaviors, for example, when thinking about factors that lead senators to vote the same way on legislation, and are the fundamental units for diffusion: bits traverse dyads from those who hold them to those who do not. Obviously, dyads alone ultimately offer a very limited picture of social relations. The simple addition of another alter, creating a group of three (a triad), generates a far more complicated situation as the pairs are likely to play off one another and individuals may vie for attention given the “audience” of an additional actor. Kadushin (2012, p. 204) calls triads “the molecules of networks.” We can observe the extent to which classic axioms related to triads are true within these small networks: Is a friend of my friend also my friend? Or is an enemy of my enemy my friend?
These related issues address the roles that actors may play in small groups and indicate the tendency toward social balance: Groups of individuals cohere around similar attitudes and beliefs as they attempt to reduce the strain of imbalance in their social groups (Hummon & Doreian, 2003). If we are spending a lot of time with someone yet we do not know many of their friends, we can begin to feel excluded and suspicious. We can think about many different kinds of triads based on whether ties are reciprocated within groups of people. For example, a triad may be fully connected and reciprocated such that all three people know and like one another. Alternatively, a person may know two people who do not know each other. We would expect that they are likely to come to know one another in the future, but in the interim the bridging figure has a lot of power and can play one person off of the other: “You should meet my friend. He is [blank].”

The triad census identifies all possible combinations of connectivity between three nodes and serves as a kind of periodic table of social relations. Figure 2.4 presents the 16 possible triads for any directed binary network on a single relation. Here we have highlighted triads of particular interest from a balance theory point of view. In the four-arc triad with two mutual, no asymmetric, and one null dyad (“201”), we see that a “friend of a friend” is not a friend, so if balance theory is true, then we would expect people to avoid this configuration. In fact, if everyone follows the rule that a friend of a friend is a friend, then we will never see the 201 triad in a (stable) population. This is called an intransitive triad and is comparatively rare. Transitive triads, like “300,” occur when ties connect all three nodes and indicate local clustering. These properties have significant implications for the resulting macro-structure.
For example, a network without 201 triads in its population means that all relations are within clusters, so the resulting network will be a set of complete cliques (Davis & Leinhardt, 1971). In general, if you can specify an interactive social behavior as a process on either dyads or triads, you can operationalize that social process as a restriction on the distribution of triads (see Johnsen, 1985, 1986, for details).

[Figure 2.4  Triad census. The 16 triad types: 003, 012, 102, 021D, 021U, 021C, 111D, 111U, 030T, 030C, 201, 120D, 120U, 120C, 210, 300. Triads are marked as intransitive, transitive, or mixed.]

Social balance is but one model for how networks emerge out of the interdependent actions of many people. If, instead, we posit a preference for like others and (an even minor) avoidance of those who are different, networks can quickly segregate into purely homophilous subsets (Schelling, 1971). Or, if people prefer to connect to those who are already popular (a preference for the 021U triad, for example), then the networks that result will tend to be highly centralized and have a long-tailed degree distribution.
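The three-digit labels in the triad census count the mutual, asymmetric, and null dyads in each triple of nodes. The sketch below tallies those counts for a small invented network. It is a simplified census under stated assumptions: it does not distinguish the letter suffixes (D, U, C, T), so types such as 111D and 111U are pooled as 111, and the function names and example ties are hypothetical.

```python
from collections import Counter
from itertools import combinations


def man_code(ties, trio):
    """Count mutual, asymmetric, and null dyads among three nodes, e.g. '201'."""
    mutual = asymmetric = null = 0
    for u, v in combinations(trio, 2):
        sent = (u, v) in ties
        returned = (v, u) in ties
        if sent and returned:
            mutual += 1
        elif sent or returned:
            asymmetric += 1
        else:
            null += 1
    return f"{mutual}{asymmetric}{null}"


def man_census(nodes, ties):
    """Tally the mutual/asymmetric/null code of every triple of nodes."""
    return Counter(man_code(ties, trio) for trio in combinations(nodes, 3))


nodes = ["a", "b", "c", "d"]
# Directed ties as (sender, receiver): a and b are mutual friends, b and c are
# mutual friends, a and c share no tie, and c names d without reciprocation.
ties = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "d")}

census = man_census(nodes, ties)
print(census["201"])  # 1: the intransitive a-b-c triad

# If a and c become mutual friends, the 201 triad closes into a 300 triad.
closed = man_census(nodes, ties | {("a", "c"), ("c", "a")})
print(closed["201"], closed["300"])  # 0 1
```

A restriction such as "no 201 triads" can then be checked directly against the tally, which is the sense in which a social process can be expressed as a constraint on the distribution of triads.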

Boundary Specification

The fact that networks often overlap in everyday life leads to problems related to where a network ends and where it begins. For example, does my friendship group begin with only my closest friends or does it include my friends’ friends as well? Or maybe we cannot understand an individual’s friendship group unless we consider all possible friends? This is the problem of boundary specification. In social networks boundaries can often logically extend to every human on the planet. As collecting data from the human population is unreasonable, network scholars often construct more tractable and practical networks for analysis. Figure 2.5 illustrates several of the different levels of analysis within a social network. We call the dark node in the center of the graph the ego as it is the focus of this particular analysis. As mentioned earlier, the smallest network forms include dyads or triads.

For example, we can study our ego’s best friends or closest confidants. Moving one step out from the ego, we can also examine all of the actors directly connected to our ego. This is called an ego network. In the ego network identified in Figure 2.5, we can see the five gray actors connected to the ego and we can also observe the connections that they share with one another. A lot of social science research observes networks at the level of the ego network as these data can be collected from individuals in traditional surveys. The primary group, or community, at the top of the graph consists of a collection of densely connected ties. In sociology, primary groups are most often associated with close friends, neighbors, or family members and are often strong sources of social support (Litwak & Szelenyi, 1969).

[Figure 2.5  The relationship between different levels of network analysis. Labeled levels: dyad, ego-net, primary group, 2-step partial network, global network.]

Network scholars may also like to take a step beyond the ego network to the ego’s two-step network, as illustrated by the two-step partial network that encapsulates most of the network illustrated in Figure 2.5. A two-step network includes all of the ego’s immediate connections and their immediate connections. Last, scholars may seek an entire global network, but again even here boundaries matter. The National Longitudinal Study of Adolescent Health (Add Health) collected global networks of hundreds of high schools in the United States, a perfectly reasonable and valid boundary. Yet undoubtedly this network fails to capture all of the friendships shared by the interviewed teenagers as the researchers could only reliably collect data on friendships that occurred within each school: friends who attended other schools were ignored. The enormous increase in the availability of network data, especially big data, has the potential to mesmerize scholars away from thinking about the bounds of data as big data often seems boundless.
Yet boundary specification remains a theoretical and methodological concern even when a network appears vast. Network scholars should continue to acknowledge issues related to boundary specification and think through their implications.
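The ego network and two-step network described in this section amount to keeping every node within one or two steps of the ego, together with the ties among the nodes kept. A minimal sketch follows, assuming an invented five-person network; the function names `k_step_nodes` and `induced_subgraph` are hypothetical.

```python
from collections import deque


def make_graph(edges):
    """Undirected graph as a dict mapping each node to its neighbor set."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def k_step_nodes(graph, ego, steps):
    """All nodes reachable from ego in at most `steps` ties (ego included)."""
    distances = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if distances[node] == steps:
            continue  # do not look past the boundary
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return set(distances)


def induced_subgraph(graph, keep):
    """Keep only the chosen nodes and the ties among them."""
    return {node: graph[node] & keep for node in keep}


# Hypothetical network: ego knows a and b, who know each other; b also knows c,
# and c knows d.
graph = make_graph([("ego", "a"), ("ego", "b"), ("a", "b"), ("b", "c"), ("c", "d")])

ego_net = induced_subgraph(graph, k_step_nodes(graph, "ego", 1))
two_step = induced_subgraph(graph, k_step_nodes(graph, "ego", 2))

print(sorted(ego_net))   # ['a', 'b', 'ego']
print(sorted(two_step))  # ['a', 'b', 'c', 'ego']
```

Note that d falls outside both boundaries, and that b's tie to c disappears from the ego network even though it exists in the full graph. This is the sense in which the choice of boundary shapes what the analyst can see.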


Connectivity, Cohesion, and Community We often seek to observe and characterize how collections of nodes are connected to one another. In other words, we may want to know whether or how people in a social network form groups. This relates to longstanding interest within the social sciences about social solidarity and social order generally (e.g., the classic work of Durkheim, Simmel, etc.). There are several different strategies for finding groups in social networks, and scholars are best to choose a strategy that corresponds with their theoretical framework. These approaches build from the notion of cohesion. Cohesion denotes the extent to which a group of people is glued together. A simple meas­ure of network-wide cohesion is density. Density is the number of ties in a network divided by the total number of possible ties (or network size minus 1). We talk about the cohesion of sports teams with an expectation that teams with less cohesion, or fewer ties to one another, are less likely to succeed. In general, a cohesive social group will have more internal ties than external ties, and group members will likely be close to one another (Moody & White, 2003). Therefore, a subgraph—or a section of a network—that has nodes that are maximally connected is cohesive. This cohesive subgraph is called a clique. You can imagine that cliques in the “real world” are relatively rare and therefore are often less useful for social network analysis (Moody & Coleman, 2015). Relaxing the assumption of maximal connection, scholars have built measures of cohesion based on the extent of connectivity within groups of nodes. For example, Moody and White (2003) build a measure of structural cohesion based on node connectivity. The basic idea is that cohesive groups are groups that are robust to node removal, or disconnection (Moody & Coleman, 2015). Or, as Moody and White (2003, p. 
107), define: “A group is structurally cohesive to the extent that multiple independent relational paths among all pairs of members hold it together.” If a group doesn’t have multiple available paths, it will be vulnerable to node removal and therefore will not be cohesive. This measure of structural cohesion captures one dimension of embeddedness, or the extent to which someone is integrated into dense networks with multiplex ties (see Granovetter, 1985). Using a related logic, computer scientists and physicists, among others, have developed community detection algorithms for locating structure in complex networks based on the robustness of subgraphs to shocks. Where Moody and White (2003) focus on node removal, many of these alternatives focus on edge removal, iteratively deleting edges and examining whether this deletion leads to more in-group than out-group ties (Newman & Girvan, 2004). With this in mind, many community detection algorithms begin with some variation of modularity. Modularity “measures the fraction of the edges in the network that connect vertices of the same type (i.e. within-community edges) minus the expected value of the same quantity in a network with the same community divisions but random connections between vertices” (Newman & Girvan, 2004, p. 7). As with many of the concepts described in this chapter, choosing between the available methods of locating cohesive groups in social networks is a theoretical and substantive concern and should be done with some care. The importance of locating groups within increasingly complex networks is likely to continue pushing statisticians and others to develop optimal community detection strategies. In this volume, Shai and coauthors

provide an overview of the strengths and weaknesses of some of the current methods (see Chapter 16).
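The measures discussed in this section can be illustrated with a minimal Python sketch. The six-person network, the node names, and the two-community partition below are hypothetical, chosen only for illustration; the modularity function follows the verbal definition quoted from Newman and Girvan (2004) for an undirected, unweighted network and is not a community detection algorithm in itself.

```python
from itertools import combinations

# Hypothetical toy network: two triangles joined by one bridging tie (c-d).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
# Assumed partition into two communities, used only for the modularity score.
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}


def density(nodes, edges):
    """Observed ties divided by possible ties, n(n-1)/2, for an undirected network."""
    n = len(nodes)
    possible = n * (n - 1) / 2
    return len(edges) / possible if possible else 0.0


def is_clique(group, edges):
    """True if every pair of nodes in `group` is directly tied."""
    tie_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in tie_set for pair in combinations(group, 2))


def modularity(edges, community):
    """Fraction of within-community ties minus its expected value under random mixing."""
    m = len(edges)
    within = {}      # ties with both ends in the same community
    end_points = {}  # number of tie ends (degree sum) per community
    for u, v in edges:
        for node in (u, v):
            c = community[node]
            end_points[c] = end_points.get(c, 0) + 1
        if community[u] == community[v]:
            within[community[u]] = within.get(community[u], 0) + 1
    return sum(within.get(c, 0) / m - (end_points[c] / (2 * m)) ** 2
               for c in end_points)


print(round(density(nodes, edges), 3))
print(is_clique(["a", "b", "c"], edges), is_clique(["c", "d", "e"], edges))
print(round(modularity(edges, community), 3))
```

In this toy case the density is 7/15, each triangle is a clique while the group spanning the bridge is not, and the modularity of the two-triangle partition is 5/14. Structural cohesion in the sense of Moody and White (2003) would additionally require computing node connectivity, which this sketch leaves out.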

Statistical Models of Networks

Statistical models of networks are increasingly common as scholars move beyond descriptive statistics or the inclusion of network statistics as independent variables to ask causal questions about networks themselves. Three recent trends include (1) models that ask how an observed network came to be or what factors contributed to the structure of an observed network, (2) models that infer diffusion or how something flows over a network, and (3) models that account for network change. Network data inherently violate the independence assumption central to the appropriate use of most common statistical methods in the social sciences, such as ordinary least squares (OLS) regression and its extensions. Network data are not independent but are relational, and therefore, we assume that nodes in one area of the graph are more closely related to each other than to nodes in a distant part of the graph. Fortunately, a well-developed set of tools accounting for network dependencies allows for the statistical modeling of network formation, diffusion, and change and can be implemented in several statistical programs, including R (see Kolaczyk & Csárdi, 2014).

For example, p* or exponential random graph models allow scholars to estimate network effects in addition to typical non-network factors. Goodreau, Kitts, and Morris (2009), for instance, model the factors influencing friendship in high schools. These factors include both typical demographic factors that one might assume affect friendships in high school, such as grade level, sex, and race, and network factors like triadic closure (e.g., the friend of a friend is more likely to be your friend). They find evidence of many of the processes that we might expect: assortative mixing (groups tend to stick together) by grade and sex, as well as triadic closure within groups. Yet they find surprising variation in effects by race, especially influenced by variation in the racial composition of schools.
See Chapter 12 by Kuskova and Wasserman and Chapter 13 by Lusher et al. in this volume for an introduction and overview of statistical approaches to social network analysis. Chapter 14 by Schaefer and Marcum extends these approaches to dynamic contexts, describing methods that allow for the examination of the factors that influence changes to networks over time.
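To make the idea of "network effects in addition to typical non-network factors" concrete, the following Python sketch counts three statistics of the kind an exponential random graph model conditions on: the number of ties, the number of ties between students in the same grade (assortative mixing), and the number of closed triads. The friendship data and names are hypothetical, and the sketch only computes the statistics; it does not estimate a model, which in practice is done with dedicated statistical software.

```python
from itertools import combinations

# Hypothetical undirected friendship ties and a grade attribute for each student.
friends = {("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
           ("cat", "dan"), ("dan", "eve")}
grade = {"ann": 9, "bob": 9, "cat": 9, "dan": 10, "eve": 10}


def tied(u, v, ties):
    """True if u and v are friends, regardless of the order in which the tie is stored."""
    return (u, v) in ties or (v, u) in ties


def network_statistics(ties, attribute):
    """Count ties, same-attribute ties, and triangles in an undirected network."""
    people = sorted(attribute)
    same = sum(1 for u, v in ties if attribute[u] == attribute[v])
    triangles = sum(
        1 for a, b, c in combinations(people, 3)
        if tied(a, b, ties) and tied(b, c, ties) and tied(a, c, ties)
    )
    return {"edges": len(ties), "same_grade": same, "triangles": triangles}


stats = network_statistics(friends, grade)
print(stats)
```

In this toy network four of the five ties connect students in the same grade and there is one closed triad. A statistical model would ask whether such counts are larger than expected by chance given the other terms in the model.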

Collecting Social Network Data

Numerous strategies exist to collect social network data.1 These strategies fall into two broad groups: local networks and global or complete networks. Local network sampling implies either a nongeneralizable convenience sample or a random probability sample, while global networks imply a census. Of course, these two types of networks relate to one another as every ego network is a sample drawn from a population-level global network. This means that for some research questions, such as attribute mixing (e.g., proportion of Black students with white friends), ego network data are sufficient to draw population inference.
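The attribute-mixing example can be sketched in a few lines of Python. The records below are hypothetical ego network responses, and the calculation assumes a simple random sample of egos with no survey weights; a real analysis would need to account for the sampling design.

```python
# Hypothetical ego network records: each ego reports the group of each named friend.
egos = [
    {"ego_group": "Black", "alter_groups": ["Black", "white", "Black"]},
    {"ego_group": "Black", "alter_groups": ["Black"]},
    {"ego_group": "white", "alter_groups": ["white", "white"]},
    {"ego_group": "Black", "alter_groups": ["white"]},
]


def share_with_alter_in_group(records, ego_group, alter_group):
    """Proportion of egos in `ego_group` who name at least one alter in `alter_group`."""
    relevant = [r for r in records if r["ego_group"] == ego_group]
    if not relevant:
        return None  # no egos of this group in the sample
    hits = sum(1 for r in relevant if alter_group in r["alter_groups"])
    return hits / len(relevant)


share = share_with_alter_in_group(egos, "Black", "white")
print(share)
```

Here two of the three Black respondents name at least one white friend, so the sample proportion is 2/3. No information about ties among the alters is needed for this quantity, which is why ego network data suffice.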


Name Generators

Many network data collection strategies start with the construction of a name generator. Name generators are means for procuring a list of network alters. For example, you may ask an interview subject to list the five most recent people with whom she had lunch. Of course, lunch dates may not be interesting, so often name generators capture some element of social capital, like “who would you turn to in times of need.” Marin and Hampton (2007) identify four main approaches to name generation: role relation, interaction, affective, and exchange. Role relation captures network alters from specific domains, such as friends, coworkers, or family; interaction-based generators focus on contact; affective generators capture the network alters an interview subject feels “close to”; and exchange approaches focus on social and financial support and engagement. As Marin and Hampton (2007) state, each of these approaches is both a subset of a respondent’s full personal network and theoretically valid, but can be misused. Specifically, research questions should be consistent with the type of name generator used in the collection of network data.

The most famous, and perhaps most controversial, name generator is the General Social Survey’s (GSS’s) “important matters” generator. To efficiently capture people’s core network alters, the GSS asks the following question: “From time to time, most people discuss important matters with other people. Looking back over the last six months, who are the people with whom you discussed matters important to you?” The interviewer probed the respondent to name up to five network alters. This question captures an interaction (a discussion taking place in the past six months) and hints at the affective dimension: surely someone with whom you discussed something important is someone who is close to you.
Indeed, an extensive body of important research has assumed that the network elicited by this name generator identifies “close” alters. However, the important matters name generator has been subject to some criticism due to variation in how people interpret “important matters” (Bearman & Parigi, 2004). As with any data collection strategy, the research question is key for determining the appropriate name generator. When scholars use network modules from large surveys after data collection, the name generator’s connection to research questions is no less important, as the theoretical validity of the network is contextual.

Network Sampling

When we sketch out plans for a social network analysis, we often envision collecting data on a complete, global network, but this is often not practical. Yet, alternatives have been developed. We can bound our analysis by a specific social location; for example, we may reasonably collect a complete network of coworkers in a modest-sized organization. Yet, our project might not lend itself to a well-bounded setting. For example, if we are interested in drug use within a city, selecting a well-bounded social setting might be difficult or impossible. In this case, we could construct ego networks of known drug users, but these ego networks may overlap in significant ways, complicating any statistical inference we may wish to make about drug use. Thankfully, we are not limited to these two approaches and have several network sampling tools.

Link-tracing methods, such as respondent-driven sampling, have become dominant, especially when seeking information on hidden populations, such as drug users or sex workers. Respondent-driven sampling extends snowball sampling methods where respondents refer alters in their network for recruitment into the study (Wejnert & Heckathorn, 2008). If enough people are recruited, or the “referral chain is long enough,” statistics derived from the sample are independent of the seeds used to start recruitment and can be used for calculating unbiased population estimates (Wejnert & Heckathorn, 2008). This strategy has not been without critics (Verdery et al., 2015) and is likely to remain an area of continued research interest and debate. Recent variations have found that folding in more detailed local network information can lead to much more robust population estimates (Mouw & Verdery, 2012).

The network scale-up method is an additional network sampling strategy that, unlike respondent-driven sampling, does not include the network itself in the sampling process. With the network scale-up method, respondents are asked to report how many people they know in a target population, often people who are otherwise hard to reach directly, such as people who inject drugs (e.g., Bernard et al., 2010). These responses are used in conjunction with reports of network size to construct a population estimate about network characteristics. See Chapter 9 for McCormick’s thorough discussion of the network scale-up method.

While network sampling has been a topic of discussion in social networks for decades, it remains one of the most vibrant areas of work on social networks. This work has only expanded with the incorporation of online social network data.
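The logic of the network scale-up method can be sketched with a short Python example. It uses one commonly used basic form of the estimator, in which the share of respondents' acquaintances who belong to the target population is scaled up to the total population. The responses, field names, and population size are hypothetical, and the sketch ignores known complications such as reporting error and uneven visibility of group membership.

```python
# Hypothetical survey responses: how many people each respondent knows in the
# target population, and each respondent's estimated personal network size.
responses = [
    {"known_in_target": 2, "network_size": 300},
    {"known_in_target": 0, "network_size": 150},
    {"known_in_target": 1, "network_size": 550},
]


def scale_up_estimate(responses, total_population):
    """Basic scale-up estimate: total population times the pooled share of
    acquaintances who belong to the target population."""
    known = sum(r["known_in_target"] for r in responses)
    network = sum(r["network_size"] for r in responses)
    if network == 0:
        raise ValueError("respondents report no acquaintances")
    return total_population * known / network


estimate = scale_up_estimate(responses, total_population=1_000_000)
print(estimate)
```

In this toy sample, 3 of the 1,000 reported acquaintances are in the target population, so the estimate for a city of one million is 3,000 people.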

Ethics and Social Network Analysis

Social network analysts should consider the ethical implications of their research regardless of whether the data they are using is primary (collected by the researcher) or secondary (collected by previous researchers). In the United States, ethical protocol for academic research is organized around the principles of the Belmont Report. The Belmont Report was written in the aftermath of several tragic examples of ethical malfeasance within the global scientific community, including the atrocities performed by scientists in Nazi Germany or in the United States during the racist Tuskegee experiments. The Belmont Report and subsequent protocols focus on several principles of ethical research including informed consent and the maintenance of privacy via anonymity or confidentiality. Informed consent requires that research subjects are able to make informed decisions about their participation in the study; they should understand that their participation is voluntary, and they should know that they can walk away from the study at any moment. Special protocols might be necessary for certain vulnerable populations, such as children or inmates (Singleton & Straits, 2005).

Social network analysis presents some specific ethical challenges that scholars should consider and address. For example, in many network studies individuals who are not research subjects are potentially included in the analysis. Research subjects responding to a name generator will provide names or codes that identify people in their social world who are not research subjects and are therefore not able to provide informed consent. As adams, Santos, and Williams elaborate in Chapter 7 of this volume,

confidentiality helps to protect the privacy of both the research subject and their network alters. Special care should be used to maintain confidentiality within social network analysis as networks are often susceptible to disaggregation, whereby atypical structural patterns can identify otherwise confidential subjects. This is analogous to studies of wealth that include the very wealthy, who may be easy to identify with precise measures of wealth and basic demographic information.

Network analysis is deeply connected to research using big data, the massive datasets built from the digital traces humans leave on the internet or via digitally mediated practices, such as when shopping using credit cards or shopper reward programs. These data are often public or may be procured through research partnerships with organizations. The public nature of data, such as public Twitter data, alleviates some concern about informed consent because users are participating in a public forum, much like quoting someone who has submitted an editorial to a newspaper. Yet, despite the public nature of Twitter and other online social network data, ethics should be a subject of concern as users of these platforms may have a very different sense of how their data may be used and could be subject to abuse.

Research partnerships with corporate owners of massive troves of data present several additional ethical concerns. First, as Lazer et al. (2009) write, conclusions that are drawn from these data can be difficult or impossible to evaluate. On its face, this might appear as a research methods and not an ethics issue, but provocative findings about important social issues drawn from inaccessible data can lead to policy changes with ethics ramifications. A more open science is an ethical principle. Second, Lazer et al. (2009) describe the ethics of having a closed system of research that rewards a scant few with data access while prohibiting access to all others.
This establishes unethical research hierarchies and should be a concern of social network analysts as well. This short overview does not intend to be a comprehensive discussion of ethics and social networks but rather hopes to serve as a launching point for further discussion and thought. More extended discussions can be found in several chapters in this volume, in most textbooks on network analysis (e.g., Kadushin, 2012), and in other edited volumes and/or special issues explicitly dedicated to the topic of ethics and network analysis (e.g., Breiger, 2005).

Conclusion

This chapter has provided a brief overview of some of the concepts that provide the building blocks for understanding social networks. It is by no means comprehensive; the intent has been to introduce several concepts, strategies, and concerns that will assist in future exploration. Subsequent chapters in this volume will dig deeper into many of the topics that were introduced in this overview.

Networks provide the fundamental structure of social life. Individual opportunities and constraints are in part determined by positions within our vast intersecting community of networks. These networks transcend place through virtual worlds (online spaces untethered to our geography), creating opportunities to connect to people globally. Networks also transcend time as we connect through memory to ancestors and other absences. Networks are vast, but these connections are so deeply engrained in our everyday life that they are easy to take for granted. Network analysis challenges this taken-for-grantedness, bringing into resolution the connected world that is always around us.


Note

1. For a more complete discussion of social network data collection, see adams (2019) and Chapter 7 by adams et al. in this volume.

References

adams, J. (2019). Gathering social network data. Thousand Oaks, CA: Sage.
Bearman, P., & Parigi, P. (2004). Cloning headless frogs and other important matters: Conversation topics and network structure. Social Forces, 83(2), 535–557.
Bernard, H. R., Hallett, T., Iovita, A., Johnsen, E. C., Lyerla, R., McCarty, C., . . . Shelley, G. A. (2010). Counting hard-to-count populations: The network scale-up method for public health. Sexually Transmitted Infections, 86(Suppl 2), ii11–ii15.
Breiger, R. L. (2005). Introduction to special issue: Ethical dilemmas in social network research. Social Networks, 27(2), 89–93.
Burris, V. (2005). Interlocking directorates and political cohesion among corporate elites. American Journal of Sociology, 111(1), 249–283.
Burt, R. S. (1987). Social contagion and innovation: Cohesion versus structural equivalence. American Journal of Sociology, 92(6), 1287–1335.
Burt, R. S. (1992). Structural holes: The social structure of competition. Cambridge, MA: Harvard University Press.
Christakis, N. A., & Fowler, J. H. (2009). Connected: The surprising power of our social networks and how they shape our lives. New York, NY: Little, Brown.
Davis, J. A., & Leinhardt, S. (1971). The structure of positive interpersonal relations in small groups. In J. Berger, M. Zelditch, & B. Anderson (Eds.), Sociological theories in progress (Vol. 2, pp. 218–251). Boston, MA: Houghton Mifflin.
Freeman, L. C. (1992). The sociological concept of group: An empirical test of two models. American Journal of Sociology, 98(1), 152–166.
Goodreau, S. M., Kitts, J. A., & Morris, M. (2009). Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks. Demography, 46(1), 103–125.
Granovetter, M. (1973). The strength of weak ties. American Journal of Sociology, 78, 1360–1380.
Granovetter, M. (1985). Economic action and social structure: The problem of embeddedness. American Journal of Sociology, 91(3), 481–510.
Hughes, M. M., Peterson, L., Harrison, J. A., & Paxton, P. (2009). Power and relation in the world polity: The INGO network country score, 1978–1998. Social Forces, 87(4), 1711–1742.
Hummon, N. P., & Doreian, P. (2003). Some dynamics of social balance processes: Bringing Heider back into balance theory. Social Networks, 25(1), 17–49.
Johnsen, E. C. (1985). Network macrostructure models for the Davis-Leinhardt set of empirical sociomatrices. Social Networks, 7(3), 203–224.
Johnsen, E. C. (1986). Structure and process: Agreement models for friendship formation. Social Networks, 8(3), 257–306.
Kadushin, C. (2012). Understanding social networks: Theories, concepts, and findings. New York, NY: Oxford University Press.
Kolaczyk, E. D., & Csárdi, G. (2014). Statistical analysis of network data with R (Vol. 65). New York, NY: Springer.
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A. L., Brewer, D., . . . Jebara, T. (2009). Computational social science. Science, 323(5915), 721–723.
Litwak, E., & Szelenyi, I. (1969). Primary group structures and their functions: Kin, neighbors, and friends. American Sociological Review, 34(4), 465–481.
Marin, A., & Hampton, K. N. (2007). Simplifying the personal network name generator: Alternatives to traditional multiple and single name generators. Field Methods, 19(2), 163–193.
Milgram, S. (1967). The small world problem. Psychology Today, 2(1), 60–67.
Moody, J. (2002). The importance of relationship timing for diffusion. Social Forces, 81(1), 25–56.
Moody, J., & Coleman, J. (2015). Clustering and cohesion in networks: Concepts and measures. In J. Wright (Ed.), International encyclopedia of social and behavioral sciences (2nd ed., pp. 906–912). Amsterdam, Netherlands: Elsevier.
Moody, J., & White, D. R. (2003). Structural cohesion and embeddedness: A hierarchical concept of social groups. American Sociological Review, 68(1), 103–127.
Mouw, T., & Verdery, A. M. (2012). Network sampling with memory: A proposal for more efficient sampling from social networks. Sociological Methodology, 42(1), 206–256.
Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., & Onnela, J. P. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876–878.
Newman, M. E., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2), 026113.
Rogers, E. M. (2003 [1962]). Diffusion of innovations (5th ed.). New York, NY: Free Press.
Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1(2), 143–186.
Singleton Jr., R. A., & Straits, B. C. (2005). Approaches to social research. New York, NY: Oxford University Press.
Verdery, A. M., Mouw, T., Bauldry, S., & Mucha, P. J. (2015). Network structure and biased variance estimation in respondent driven sampling. PLoS One, 10(12), e0145296.
Wejnert, C., & Heckathorn, D. D. (2008). Web-based network sampling: Efficiency and efficacy of respondent-driven sampling for online research. Sociological Methods & Research, 37(1), 105–134.
White, H. C., Boorman, S. A., & Breiger, R. L. (1976). Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81(4), 730–780.

Chapter 3

Theories of Social Networks

Jan Fuhse

Networks and Theory

Network researchers have long lamented a lack of theory of social networks (Granovetter, 1979; Wellman, 1983, p. 179). Social networks were studied empirically, or written about metaphorically, without worrying too much about their substance. Over the last 30 years, a number of researchers have attempted to fill this “theory gap.” This chapter offers an overview of the most important approaches: the theory of action, pragmatist/interactionist approaches, and relational sociology.

Borgatti and Lopez-Kidwell (2011) offer a distinction between “network theory,” on the one hand, and “theory of networks” on the other. They view network theory as being about the effects of social networks (e.g., on social mobility), and theory of networks as covering the impacts on networks (e.g., foci of activity). However, Borgatti and Lopez-Kidwell’s candidates are not really “theory,” but formulations of network mechanisms: systematizing recurrent observable processes involving networks (Wimmer & Lewis, 2010, p. 139ff). A proper theoretical treatment has to take an additional step back and offer answers to the following questions: What are social networks? Why do they constitute important features of the social world? And how do they relate to other features like social categories and other cultural forms, resources and social mobility, formal organizations, and large-scale fields of society?

I focus on social networks because these are still the most important subject area of network research in the social sciences. Also, social relationships seem to be different from relations between symbolic forms, between actors and material objects, and between events. Therefore, we need different theory to cover these respective networks.
The following short overviews of the most important theories of social networks cover the (1) theory of action with the social capital concept (Burt, Coleman, Lin, Hedström); (2) pragmatist/interactionist approaches (Emirbayer, Crossley, Martin); and (3) relational sociology around Harrison White (Tilly, Mische, Padgett, Fuhse). The concluding section offers a table summarizing these overviews and a few points for further discussion.


Action Theory and Social Capital

The first approach comes from authors adhering to theories of action, following the classic formulation by Max Weber. It comprises Ronald Burt’s, James Coleman’s, and Peter Hedström’s accounts of the interplay of networks as social structures with individual action, as well as the notion of social capital (Burt, Coleman, Lin). To varying degrees, these authors advocate modeling action on the basis of rational choice.

Social Structures and Individual Action

According to theories of action, social phenomena result from individual actions. Decisions about these actions arise from subjective considerations of objective external circumstances. Most often, these theories conceptualize action as driven by individual calculations of utility: I do whatever promises the best results (according to my interests) in any given situation (Lindenberg, 1990; Hedström, 2005, p. 38ff). Individual actions then lead to the change or reproduction of social structures that constrain or enable future action. Following this general stance of methodological individualism, Ronald Burt (1982) views networks as social structures resulting from individual action, and channeling it in turn (Figure 3.1).

• In any given situation, the network forms part of the context of action (1). The position of an actor and his or her ties to others are given and influence his or her behavior.
• The position in the network determines the actor’s interests (2).
• These interests lead to particular action (3). However, the network not only affects action through interests but also affords actors different opportunities for action, depending on their position and embeddedness (also 3 in Burt’s figure). Actors in the center can influence processes in the network, for example, by distributing resources (e.g., money and symbolic esteem). Peripheral actors, in contrast, face less pressure to conform to expectations in the network and enjoy more leeway in their behavior. Networks constrain and enable individual action.
• Finally, individual decisions have repercussions for the social structure at hand (4). They can reproduce or change network constellations. Reproduction occurs when action conforms to expectations. Networks change when action deviates from the expected.

Figure 3.1  Action and social structure according to Ronald Burt (1982, p. 9). [Diagram with three elements, “Social structure as the context of action,” “Actor interests,” and “Action,” connected by arrows numbered 1 to 4; the number 3 appears twice.]

James Coleman (1990, p. 8ff) offers a simplified version of Burt’s schema with his well-known three-step model of social explanations. Here, the social structure affects only individual decisions, leading to actions that then get aggregated to changes in social structure. This modeling of networks in action theory has to confront two challenges:

(1) Networks affect individual actions in two ways: (a) the position in networks enables and constrains action, and (b) networks influence individual preferences or interests (through step 2 in Burt’s schema). In theory, models have to incorporate these two mechanisms simultaneously, with any one position differing in both opportunities and interests.

(2) Step 4 in Burt’s schema consists in the aggregation of actions to a new social structure, or to change in social structure. In statistics, we only have to add numbers to look at rates of individuals acting in isolation from each other (like investing time and opportunity costs in education). In networks, this aggregation requires examining the interplay of a number of actors in different positions, with varying opportunities and interests for action. Generally, the aggregation of actions in complex social structures constitutes the biggest problem for action theory.

Peter Hedström (2005, p. 76ff) suggests using simulations to examine the aggregation of actions in network constellations. Agent-based modeling supplies individuals in a population with simple decision rules for tie formation, based on their interests and opportunities. However, Hedström’s (2005, p. 119ff) empirical example only models distributions of youth unemployment in urban neighborhoods. This instance of a statistical aggregation does not consider the interplay of actors in various network positions. The SIENA software now allows for agent-based simulations of empirically observed changes in networks (Snijders, van de Bunt, & Steglich, 2010).
SIENA is based on action-theoretical assumptions, modeling the formation of ties as the succession of individual decisions. It does not predict (explain) developing network structures. Rather, it looks for individual propensities for the formation of particular ties (e.g., interethnic friendships). While Burt’s steps 1, 2, and 3 are modeled with SIENA (to varying extents), step 4 remains out of consideration.

Overall, action theory adopts a naturalistic take on networks and relationships: They are treated as objective structures, without inquiring what they are made of. They result from individual actions, and they structure them in turn. But since multiple social actors participate in their construction, networks and relationships seem to acquire a thing-like quality. Any one individual action has to take them for granted and can only decide to change or to conform to them. In a sense, action theory does not really provide a substantive answer to the question: What are social networks? Rather, it offers a way of modeling their emergence, reproduction, and change. The big advantages of action theory are its intuitive plausibility and its well-established position in the social sciences. Action-theoretical models of networks can build on a host of preliminary work. However, the interplay of actions and networks may be too complex for manageable modeling.
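The idea of supplying individuals with simple decision rules for tie formation can be illustrated with a minimal agent-based sketch in Python. This is not the SIENA model and involves no estimation from data; the agents, group labels, and probabilities are hypothetical. At each step one agent considers one potential partner and forms a tie with a probability that rises if both share a group (a stand-in for interests) or already share a contact (a stand-in for opportunities created by the existing network).

```python
import random

# Hypothetical population: agent name -> group membership.
groups = {"a1": "x", "a2": "x", "a3": "x", "b1": "y", "b2": "y", "b3": "y"}


def simulate_ties(groups, steps, p_base=0.05, p_same=0.30, p_closure=0.30, seed=0):
    """Simulate undirected tie formation as a sequence of individual decisions."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated exactly
    agents = list(groups)
    ties = {agent: set() for agent in agents}
    for _ in range(steps):
        ego, alter = rng.sample(agents, 2)  # ego considers one potential partner
        if alter in ties[ego]:
            continue  # tie already exists; nothing to decide
        p = p_base
        if groups[ego] == groups[alter]:
            p += p_same      # shared group membership raises the chance of a tie
        if ties[ego] & ties[alter]:
            p += p_closure   # a shared contact raises it further
        if rng.random() < p:
            ties[ego].add(alter)
            ties[alter].add(ego)
    return ties


network = simulate_ties(groups, steps=200, seed=1)
within = sum(1 for a in network for b in network[a] if groups[a] == groups[b]) // 2
total = sum(len(alters) for alters in network.values()) // 2
print(total, within)
```

Running the simulation many times with different seeds and comparing the resulting structures is one way to study how individual decisions aggregate into network-level patterns, which corresponds to step 4 in Burt’s schema.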


Social Capital

The notion of social capital is also tied to action theory. After the initial formulation by Pierre Bourdieu, a number of action theorists (Coleman, Burt, Lin) adopted the concept. The concept of social capital implies three general assumptions (Lin, 2001, p. 19ff):

(1) Networks can be seen as a resource for individuals. Advantageous network positions and connections afford more opportunities for action, in particular for upward mobility. Social networks are viewed from the perspective of the individual.

(2) As with economic and human (or cultural) capital, actors possess more or less social capital. Therefore, we have to be able to quantify the value of networks for individuals, or at least to distinguish individuals with more social capital from those with less.

(3) Individuals can maximize the value of their social networks. Acting rationally, they should act to accumulate social capital. Individuals enter social relationships when they promise them net value, and they drop or reject them when expecting more costs than benefits.

These assumptions make for an economic perspective on social networks (Somers, 2005): people “invest” in their relations and “capitalize” on them (e.g., when drawing on social connections in their job searches). However, the assumptions remain rather abstract. The authors of the social capital concept have quite different ideas about which ties and networks are particularly valuable, and what we can use them for.

According to Pierre Bourdieu (1986), social capital consists in the resources that individuals can mobilize through their social relationships. Social capital can be converted into other forms of capital. Individuals use their social relationships to attain educational degrees or cultural refinement (cultural capital) or for upward professional mobility (economic capital).
Since individuals do not really act strategically in Bourdieu’s theory, his concept of social capital only shares the first two assumptions, not the third one. But social capital does not play a systematic role in Bourdieu’s theory. Rather, he disdains network research for a preoccupation with the “surface” of “intersubjective relations” (Bourdieu & Wacquant, 1992, p. 113f). For James Coleman (1988, 1990), the concept refers to the embeddedness of individuals into particular local network structures. Social capital primarily lies in dense networks (with “closure”) that force actors to cooperate with each other. Dense networks solve the “free-rider problem”: recurrent interaction with the same actors allows for sanctioning noncooperation. Therefore, closure in networks leads to the emergence of norms for cooperation and of trust. Ronald Burt (1992, p. 8ff) also uses the concept for specific network structures. Here, social capital stands for the access of actors to different network clusters (following Mark Granovetter’s [1973] notion of “weak ties”). Actors with “bridges” across “structural holes” in network structure get more information. Hence, they have better chances on markets (in particular, the job market), and they become more creative through combining ideas from disparate networks (Burt, 2004). Coleman and Burt thus denote contrasting network structures as advantageous. These concepts are brought together in Nan Lin’s (2001) theory of social capital. According to Lin,

Table 3.1  Overview of Concepts of Social Capital

| | Bourdieu | Coleman | Burt | Lin | Putnam |
| What? | Resources that can be mobilized in relationships | Dense social networks (closure) | Weak ties across structural holes | Instrumental and expressive social capital | Horizontal ties (in associations) |
| What for? | Attainment of economic and cultural capital | Cooperation/solution of free-rider problem | Market opportunities | Social mobility/solidarity and identity | Democratic and economic performance |
| How? | Mobilization of resources in groups | Sanctions of deviance, trust | Channels of information | Access to information/mutual support | Cooperation and solidarity |

actors possess different forms of social capital: (1) Burt’s bridges across structural holes are important for “instrumental action.” They give access to information from distant areas of social structure. Accordingly, individuals form “weak ties” (without closure) preferably to members of different classes. These primarily help for upward social mobility. (2) Strong ties in dense networks (Coleman’s social capital) assist in “expressive action.” Alters share resources in strong ties (solidarity) and support each other in their worldviews.1 Therefore, strong ties with closure are primarily found within social classes.

Political scientist Robert Putnam (1993) does not treat social capital as an individual resource, but as a property of regional social structures. These differ by their traditionally horizontal or hierarchical network structures. According to Putnam, the horizontal patterning of social relations leads to cooperation and solidarity, and makes for better governance and economic development. Putnam’s arguments have become primarily important in the discourse on civil society.

The five approaches sketched here differ substantially in what they term “social capital,” in their aims, and in how exactly they are achieved (Table 3.1). Apart from Putnam, all authors reduce networks to resources of individuals. The concept focuses on the effects of network positions and embeddedness on the individual level. Consequently, it often connects to research on ego-centric networks. The conceptualizations by Coleman and Burt (and their combination by Lin) concern the structure of personal ties, stressing different aspects as key (dense networks vs. bridges across structural holes). In contrast, these approaches bracket the composition of ego-centric networks. Only Bourdieu’s theory points to the attributes of the actors one is connected to, in particular their resources, as important for the value of social relationships.
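The contrast between closure and bridges across structural holes can be made concrete with two ego network measures, sketched here in Python. The ties and names are hypothetical. Density among alters serves as a simple indicator of closure, and effective size is computed with a well-known simplification for binary, undirected ties (number of alters minus the average number of ties each alter has to other alters); this is an illustration, not the full set of measures used by Coleman or Burt.

```python
from itertools import combinations

# Hypothetical ties: "closed" sits in a dense cluster, "broker" links unconnected others.
ties = [
    ("closed", "x"), ("closed", "y"), ("closed", "z"),
    ("x", "y"), ("y", "z"), ("x", "z"),
    ("broker", "p"), ("broker", "q"), ("broker", "r"),
]


def ego_measures(ego, ties):
    """Return number of alters, density among alters, and effective size for one ego."""
    tie_set = {frozenset(t) for t in ties}
    alters = sorted({node for t in tie_set if ego in t for node in t if node != ego})
    n = len(alters)
    # Ties among the alters themselves (the ego's own ties are excluded).
    t = sum(1 for pair in combinations(alters, 2) if frozenset(pair) in tie_set)
    possible = n * (n - 1) / 2
    return {
        "alters": n,
        "alter_density": t / possible if possible else 0.0,
        "effective_size": n - 2 * t / n if n else 0.0,
    }


closed = ego_measures("closed", ties)
broker = ego_measures("broker", ties)
print(closed)
print(broker)
```

Both egos have three alters, but the closed ego has an alter density of 1 and an effective size of 1, while the broker has an alter density of 0 and an effective size of 3. In Coleman’s terms the first network is rich in closure; in Burt’s terms the second spans structural holes.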

Pragmatism and Interactionism

The sociological-psychological school of symbolic interactionism by William Thomas, George Herbert Mead, Herbert Blumer, and Erving Goffman developed out of the philosophical strand of American pragmatism of Charles Sanders Peirce, William James, and John Dewey. Both emphasize the processing of meaning in the mind (pragmatism) and in interaction (interactionism) as the basic layer of social reality. In this vein, Fine and Kleinman (1983) proposed a combination of symbolic interactionism with social network analysis. The most important authors following this lead are Mustafa Emirbayer, Nick Crossley, and John Levi Martin, all three of whom also draw on relational sociology around Harrison White (see Relational Sociology). They are set apart from White’s approach by their pragmatist/interactionist foundations in social theory. This entails four important pushes: (1) toward a focus on subjective meaning, (2) toward a processualization of relations and networks, (3) toward an incorporation of qualitative methods in network research, and (4) toward locating networks in social fields.

(1) In line with the pragmatist roots, the authors of this approach emphasize the importance of subjective meaning for and in social networks. Social relations and networks consist of the subjective meaning that actors hold of them, and this subjective meaning is continuously subject to negotiation and interpretation in interaction (Fine & Kleinman, 1983, p. 101ff). For Crossley (2010a, pp. 28f, 37), the “internalization of the perspective of the other” (the empathy that Mead sees as basic for the process of symbolic interaction) forms part of this subjective construction of relationships. We see ourselves through the eyes of the other and adapt our behavior accordingly. A “common ground” (Merleau-Ponty) or shared “definition of the situation” (Thomas, Goffman) unfolds in the rhythm of interaction. Social relationships build on subjective agreement with regard to how the actors relate to each other. Emirbayer and Goodwin (1994, p.
1437ff) argue that network analysis and Harrison White’s theory of networks fail to take agency and the autonomy of culture into account. Culture is not encapsulated in social networks and determined by them. Rather, individuals agentically draw on different cultural ideas to make sense of their situation, and to engage in interactions and relationships with others. With Ann Mische, Emirbayer offers an account of agency that combines habitual iteration, forward thinking, and practical evaluation of alternative courses of action (Emirbayer & Mische, 1998; see also Erikson, 2013, p. 233ff). In contrast, Martin (2010) builds on recent psychological research to argue that actors are not overly intelligent and creative but follow relatively simple heuristics in their behavior toward others. These heuristics lead actors to connect in particular social network structures: for example, in closed and polarized group structures from heuristics of transitivity in bounded contexts (Martin, 2009), or in ranked prestige hierarchies when people follow others in their choices, as in the sexual field (Martin & George, 2006). The heuristics depend on the relations at hand, on concrete circumstances of the interaction, and on the individual position in relation to others.

(2) All the pragmatist and interactionist approaches focus on subjective meaning as important for social relations and networks. They differ in the modalities of subjective processing of meaning, from creative agency to simple heuristics. In spite of these differences, the authors considered here arrive at similar formulations about the nature of social networks. These are conceptualized as “structures of meaning” that develop continuously over the course of interaction (Fine & Kleinman, 1983; Martin, 2009, p. 9ff; Crossley, 2013, p. 124ff). Relationships as the basic building blocks of networks are processualized: they are dynamic and fluid (Fine & Kleinman, 1983, p. 99ff), “lived trajectories of iterated interaction” (Crossley, 2010a, p. 28f). Emirbayer (1997, p. 286f) terms the processes at play in social networks “trans-action”: both the identities and the relations between them are constructed (in their meaning) through processes of transaction.²

(3) Some authors following the interactionist tradition have called for the use of qualitative methods to study social networks. If social networks are laden with subjective meaning, we have to inquire into this with interpretive methods. Consequently, Crossley (2010b, 2015) combines qualitative tools with the formal techniques of network analysis. Matthew Desmond (2014) lays out “relational ethnography” as a way of examining relational configurations with qualitative interviews and participant observation. In contrast to traditional ethnography, this approach studies “fields rather than places, boundaries rather than bounded groups, processes rather than processed people, and cultural conflict rather than group culture” (Desmond, 2014, p. 548).

(4) The fourth and final push, not directly connected to pragmatist/interactionist roots, locates relations and networks in social fields. Martin (2011, p. 268ff) draws primarily on earlier Gestalt psychology for his theory of fields. Social fields are spaces of mutual orientation characterized by uncertainty of the actors at play. As a result, actors enter social relations and develop institutionalized rules (DiMaggio, 1986). For Martin (2011, p. 191ff), the behavior of the actors follows simple heuristics (see earlier), or a “social aesthetics” of how to relate to each other. In the case of sexual fields, this results in a prestige hierarchy of actors in terms of their perceived attractiveness (“sexual capital”; Martin & George, 2006).
Building firmly on Pierre Bourdieu, Emirbayer and Desmond (2015) develop the concept of the “racial field.” However, they place less emphasis on networks of intersubjective relations than on Bourdieusian “objective relations” (p. 84ff). These are characterized by the relative distribution of various forms of capital, including the newly discovered “racial capital.” In contrast, social networks feature centrally in Bottero and Crossley’s (2011) account of fields. Building on Bourdieu as well as on Howard Becker’s concept of art worlds, they study the concrete webs of relations among different kinds of actors (artists, managers, venues, customers). The fourth push may be coincidentally pursued by the most important authors of the pragmatist/interactionist approaches to social networks. The other three pushes—to subjective meaning, processualization, and qualitative-interpretive methods—follow from interactionism and pragmatism. These approaches to social networks seem to have surged over the last years. It remains to be seen to what extent they are adopted in network research and how they combine with its traditionally formal and quantitative methods.

Relational Sociology

Relational sociology around Harrison White is connected to the pragmatist and interactionist approaches, sometimes interweaving with them (Mische, 2011; Fuhse, 2015a).³ Rather than building on an established theoretical tradition in sociology, relational sociology formulates a specialized theory of social networks. I first present the basic approach and then some recent extensions, including my own.

Social Networks and Meaning

White was central in American structuralist network research of the 1960s and 1970s, developing blockmodel analysis with his students and directly influencing many authors (Scott, 2000, p. 33ff). During the 1980s, White became dissatisfied with a purely structural vision of networks. He wanted to know what a tie, as the basic building block of networks, was, and why and how it meshed with other ties to form larger structures (Mische, 2011, p. 82). White (1992) laid out his answer in the first edition of his book Identity and Control: Social networks are not mere patterns of ties. Rather, they have to be thought of, and studied, as constructions of meaning. Networks consist of identities that are constructed in stories, and thereby related to each other (White, 1992, p. 65ff).

The starting point for these arguments is, as in many social theories, the general uncertainty of social interaction (White, 1992, p. 3ff). Identities strive for “control” and “footing” in social contexts. Social structures, however, emerge only through the observation by others. Control attempts of identities leave a social trace in the form of stories told about them. White’s (1992, p. 6ff) example here is the children’s playground. The collaboration and jousting between children are observed in a web of stories about them and about the relations between them.

White thus views social networks as interwoven with forms of meaning (identities, stories) and structured by them. Unlike in pragmatism/interactionism, we are not dealing with “subjective meaning” in people’s heads here. Rather, stories about identities and relations are communicated, thereby structuring social interaction (White et al., 2007, p. 544). Over and above identities and ties, White views networks as interwoven with domains of cultural forms (Mische & White, 1998, p. 702ff). Domains consist of symbols, language (e.g., specialized argot), norms, etc.
These forms of meaning develop in the interaction in networks and differ from one network context to the next. Cultural forms depend on their development and diffusion in networks, just as networks are imprinted by them. White sees network and domain as inextricably intertwined in “netdoms,” and as only analytically separable. In the two editions of Identity and Control, White (1992, 2008) builds elaborate theoretical architecture on these foundations. Rather than discuss complex but rarely used concepts like “disciplines” or “control regimes,” I concentrate here on insights from White and authors around him on the interrelation between network constellations and forms of meaning. Identities in networks are not prefabricated building blocks, but projection points for their construction in stories. They and their qualities result from interactive control attempts and from their narrative observation. Charles Tilly (2002, pp. 8f, 26f) regards the attribution of subjective dispositions to these identities in stories as key for this process. Observed behavior is traced to relatively stable motivations and capacities for action. This leads to expectations about the future behavior of these identities. In this vein, artists are seen as more or less talented (capacity) based on the quality of their paintings (observed behavior) and as adherents of particular styles (motivations). Padgett and Ansell (1993) show the importance of these processes of attribution for the arrangement of social relations in the ascent of the Medici to power.

Network formations bring about particular styles and are imprinted by them (White, 2008, p. 112ff). First of all, a style is an artifact of observation: the behavior of multiple identities in a network is classified as similar, and the identities are recognized as adhering to a common style. The classification becomes part of social structure if the control attempts of identities follow the style. For example, artists produce paintings with the characteristics of impressionism, fauvism, or abstract expressionism (White, 1993, p. 63ff). This back and forth between observation, classification, and orientation relates identities to each other as similar or dissimilar. Styles often develop in dense networks and demarcate them symbolically. But they can also correlate to structurally equivalent positions in networks. For example, the social movement leaders studied by Ann Mische (2008) display particular communication styles according to their affiliation with one or more movement organizations.

Social categories (like gender or ethnicity) group identities by criteria of similarity and difference. White (2008 [1965]) early on termed categorically separated networks “catnets” (for category and network). Blockmodel analysis reconstructs not only categories dividing dense network clusters from each other but also role categories of structural equivalence. Social categories are connected to rules for interaction within and between them. In White’s theory since 1992, categories become part of the domain of cultural forms in a network. According to Tilly (1998, p. 63f), categories need stories to render them plausible and legitimize them (“boundary stories”). Social categories can emerge as “collective identities” from the cooperative interaction in dense networks and motivate collective action (Gould, 1995). But they can also be used to bar outsiders from resources (Tilly, 1998, pp. 6f, 75ff).
Paul DiMaggio combines White’s theoretical arguments with those from Pierre Bourdieu (DiMaggio & Powell, 1983; DiMaggio, 1986). Institutions (like marriage, shaking hands, or formal organization) are cultural models for interaction. They emerge isomorphically in the mutual orientation of actors and institutionalize through repetition and imitation. Therefore, they differ by network context. Following DiMaggio (1986), some institutions concern the relations between categories of actors. They can be reconstructed by way of blockmodel analysis. White (2008, p. 171ff) conceptualizes institutions as networks that are strongly patterned by cultural models. Overall, relational sociology of and around Harrison White does not really provide a strong and consistent perspective, like action theory. Rather, it offers a creative collection of theoretical arguments about the interplay of meaning and networks. Its strength lies in the fruitful connection to empirical social research, as exemplified by the studies of Mische (2008), Padgett and Ansell (1993), Gould (1995), and DiMaggio (1986).
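The notion of structural equivalence that underlies blockmodel analysis can be made concrete with a minimal sketch. It rests on simplifying assumptions that go beyond the text: it uses a strict criterion (actors occupy the same position only if their sets of outgoing and incoming ties are identical), it ignores the special handling of ties between the two actors being compared, and the function name `positions` and the patron-client toy data are hypothetical. Empirical blockmodel analyses typically work with approximate rather than exact equivalence.

```python
from collections import defaultdict

def positions(edges):
    """Group actors into positions of strict structural equivalence.

    `edges` is a list of directed ties (sender, receiver). Two actors end up
    in the same position only if they send ties to exactly the same actors
    and receive ties from exactly the same actors.
    """
    out_ties = defaultdict(set)
    in_ties = defaultdict(set)
    actors = set()
    for sender, receiver in edges:
        out_ties[sender].add(receiver)
        in_ties[receiver].add(sender)
        actors.update((sender, receiver))

    groups = defaultdict(list)
    for actor in sorted(actors):
        # The tie profile serves as the key that defines a position.
        profile = (frozenset(out_ties[actor]), frozenset(in_ties[actor]))
        groups[profile].append(actor)
    return sorted(groups.values())

# Hypothetical toy data: two patrons who each support the same three clients.
patronage = [("P1", "C1"), ("P1", "C2"), ("P1", "C3"),
             ("P2", "C1"), ("P2", "C2"), ("P2", "C3")]

for block in positions(patronage):
    print(block)
```

In this toy case the procedure separates a position of patrons from a position of clients, even though no actor within either position is tied to another member of the same position. This is the sense in which blockmodels recover role categories rather than dense clusters.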

Extensions

Since the end of the 1990s, relational sociology has been extended in two ways:

(1) Culture is now itself conceptualized and studied as a network of symbols meaningfully connected to each other (Mohr, 1998; Fuchs, 2001). Some symbols or concepts in a culture (or in the “domain” connected to a network) are more central, others in the periphery. Systematic differences between symbolic universes can be analyzed in their network structures. Generally, relational sociology is most interested in the relations of symbols that are connected to the social networks between actors, for example, in reconstructing the meaning of social categories (Mohr, 1994) and relationship frames (Yeung, 2005), or in identifying “weak culture” shared across network clusters (Schultz & Breiger, 2010).

(2) Relational sociology increasingly turns to the communicative events in which networks are negotiated and constructed (Mische, 2003, p. 262; Mützel, 2009). Godart and White (2010) argue that “switchings” between sociocultural contexts make for the emergence of meanings and the change of social structures. Emirbayer (1997, p. 287) and Tilly (2005, p. 6f) refer to “transactions” as the basic process in networks (see Pragmatism and Interactionism), while Padgett (2012) and I (Fuhse, 2015b) build on different notions of “communication.” This second push leads to a concern for studying the dynamics (and the reproduction) of networks in communication, rather than fixed patterns of ties (e.g., by drawing on conversation analysis and/or with relational event models).

Both advancements connect well with the recent turn to big data and computational social science. Relational sociology provides a fruitful backdrop for formal quantitative research on communicative events and on symbolic structures in data compiled from large databases, from journalistic and political discourse, and from the Web 2.0. Authors from the tradition play a prominent role in the theoretical reflection of big data analyses (Mohr, Breiger, & Wagner-Pacifici, 2015).

In my own work, I try to provide a coherent framework for the various threads running through relational sociology (Fuhse, 2009, 2015b, in preparation). Social networks are conceptualized as dual in nature. They consist of (1) observable regularities in communication and (2) the patterns of meaning underlying these regularities.
This “meaning structure” of a social network is made of “relational expectations” about the behavior of particular actors toward particular others. These arise out of the attribution of communicative events to actors, making for definitions of identities and their relations to each other. In turn, relational expectations structure communication by rendering particular events likely and others unlikely. Communicative regularities are influenced, of course, by opportunities for contact at foci of activity. In contrast, the meaning structure of networks builds on cultural models for relationships and network structures. For example, relationship frames like “love,” “friendship,” or “patronage” make for very different relationships and network structures. Communication and meaning structure thus retain a certain relative autonomy, in spite of their close interplay.

Actors can be not only individuals but also formal organizations (e.g., companies, universities, states) and collective actors (e.g., social movements, street gangs). Collective actors are constructed on two levels: (1) internally, the sharp drawing of a boundary of meaning to the outside and the symbolic construction of a collective identity go hand in hand with an increased density of ties and the thinning out of personal ties to the outside, and (2) in the network with other collective/corporate actors (e.g., in the political field), they effectively become actors if communicative events (e.g., protest marches for social movements or homicides for gangs) are attributed to them rather than to individual actors. This leads to “relational expectations” about how collectives behave toward other collective or corporate actors.

These various advancements lead to a complex theoretical architecture in which social networks are patterns of communicative events intertwined with forms of meaning (that can themselves be studied as networks of symbols). Overall, relational sociology is mostly connected to the formal study of full networks, with blockmodel analysis and recently exponential random graph models as methods of choice (Gondal & McLean, 2013). Cultural networks are sometimes studied with topic modeling or Galois lattices. But, like the pragmatist and interactionist approaches, relational sociology also connects to qualitative methods (Fuhse & Mützel, 2011).

Conclusion

The approaches termed here “theories of social networks” are necessarily selective, and their presentation remains sketchy and stylized. Due to space constraints, I had to leave out approaches like network exchange theory (Emerson, Cook) and constructural theory (Carley, Mark). Table 3.2 gives an overview of the three families of action theory, pragmatism/interactionism, and relational sociology.

Action theory views social networks in a naturalist way as objective structures that afford individuals opportunities and constraints, depending on their position and embeddedness. The social capital concept reduces these opportunities and constraints to a resource. Individual actors can have more or less of that resource. With their individualist perspective, action theory and the social capital approaches lend themselves to combination with the statistical analysis of ego-centric networks, as well as with agent-based modeling and with SIENA.

The authors adhering to pragmatist and interactionist perspectives (Fine and Kleinman, Emirbayer, Crossley, Martin) get away from networks as objective structures. They locate them in the minds of actors and in the interaction between them. Social networks consist in the subjective/intersubjective patterns of meaning attached to ties (definitions of the situation). With their focus on meaning, these approaches connect well with the qualitative study of networks in interviews and in “relational ethnography.”

Table 3.2 Theories of Social Networks

| | Theory of Action/Rational Choice | Pragmatism/Interactionism | Relational Sociology |
| Key authors | Burt, Coleman, Lin, Hedström | Emirbayer, Crossley, Martin | White, Tilly, Mische, Padgett, Fuhse |
| What are social networks? | Opportunities, constraints | Interaction patterns, definition of situation | Identities and stories, relational expectations |
| Locus | Objective structure/individual resource | Subjective meaning/interaction | Ties, transactions/communication |
| Methods | Ego-centric networks, SIENA, agent-based modeling | Qualitative interviews, relational ethnography | Full networks, blockmodel analysis, Galois lattices |

In relational sociology by Harrison White and his colleagues and students, networks consist of identities that are related to each other in stories. In spite of its strong connections to pragmatist and interactionist authors, relational sociology avoids the minds of individuals. Instead, it regards social networks as supra-individual structures of meaning, or as patterns of transactions/communication. Relational sociology studies mostly full networks with formal methods like blockmodel analysis. With the extensions to cultural networks and to communicative events, relational sociology also draws on Galois lattices, topic modeling, and sequential and qualitative analyses of communication.

To conclude this overview, I list a number of challenges for theories of social networks:

(1) All the approaches covered here move away from purely structural or formal notions of networks and relations (Erikson, 2013). In different ways, the three theoretical families conceptualize social relations as processual and dynamic. The basic processes at play are termed action, interaction, transaction, or communication, with relations and networks as historical snapshots, results, and determinants. The concepts of action and agency make individuals the sources of these processes, while transactions and communication are distinctly supra-personal. Interaction occupies a middle ground between those two poles. Interactionist and pragmatist approaches point to the importance of subjective meaning (as well as of its interactive negotiation), while relational sociology focuses on meaning that is communicated/transacted. Action theory at times incorporates a similar concern for subjective meaning. But the concept of social capital is devoid of these considerations.

(2) Who, or what, are the actors in networks? Action theory, pragmatism, and interactionism build on individuals as processing meaning/deciding about their actions.
Consequently, they theorize social networks primarily or solely between individuals. Some authors propose their frameworks to be compatible with higher-level actors (like companies or social movements), but without accounting for them theoretically (e.g., Crossley, 2010a, p. 43f). For relational sociologists, networks are bundles of stories/expectations that build on the attribution of events to actors, be they individual, collective, or corporate. This leads to a multilevel architecture of the social world, with social relations located on various levels, and with important repercussions between them (Fuchs, 2001; White, 2008). Material objects or cultural symbols do not feature as actors in any of the theories of social networks surveyed here. They do not take decisions (action theory) or process meaning subjectively (pragmatism/interactionism). Nor are stories told about them or relational expectations attached to them (relational sociology). To address cultural relations between symbols, or sociotechnical relations between objects and human actors, we need different theories.

(3) The theories covered here provide highly general frameworks. However, they do not give much indication of which network configurations we will find empirically or how they will evolve over time. Over the last years, this desideratum has been addressed under the rubric of “network mechanisms” like reciprocity, homophily, foci of activity, transitivity, preferential attachment, brokerage, and contagion. Only a few of the theoretical approaches link to network mechanisms. According to Hedström (2005), all social mechanisms should be modeled on the level of individual decisions. Tilly (2005, p. 23ff) disagrees: mechanisms are chains of transactions that recurrently lead to a change in social configurations, without invoking inner-directed decisions. Martin (2011) argues that actors adopt simple heuristics of interaction depending on circumstances and on their positions in fields. These heuristics correspond to network mechanisms.

(4) The attempts to connect to Bourdieu’s field theory point to the need for combining network research with concepts for macro-structures of society like politics, art, the economy, science, etc. Action theory remains notoriously silent about these macro-phenomena. Pragmatists and interactionists like Crossley, Emirbayer, Desmond, and Martin, and neo-institutionalist DiMaggio pick up on the concept of fields. Relational sociology seems to provide for this macro-connection with its multilevel architecture of networks. This raises the question of how the interrelation between different levels should be modeled: how do processes within and between social units affect each other?

(5) Finally, we have to consider what we understand by “theory” and what we expect from theorizing about social networks (Abend, 2008). Broadly speaking, scientific theories are compounds of meaningfully connected sentences that are conceptually and logically coherent, and that include statements about empirically observable phenomena. They should be highly general and sufficiently abstract to cover a wide range of phenomena (as opposed to theories of the middle range). However, we are looking for theories of social networks as a concept developed out of empirical studies. Therefore, theories should not only be coherent and abstract but also connect to empirical research, to provide theoretical expectations and a framework for the interpretation of results. As pointed out, the theories under consideration connect differently to research methods: action theory and the concept(s) of social capital inform statistical analyses of ego-centric networks.
Interactionist and pragmatist approaches lean toward participant observation and qualitative interviews. And relational sociology is most often combined with formal network analysis. At the same time, the expectations in theorizing differ profoundly, and they sometimes seem to drive the choice for one theory or another. Action theory and pragmatist/interactionist approaches build on a commitment to individual actors. Often, this commitment seems to stem from ontological beliefs and/or from normative impetus. For example, action theory insists that only human beings have “causal powers” (Coleman,  1990, p. 4; Hedström,  2005, pp. 28, 34). In a similar vein, Emirbayer and others emphasize agency as the “moment of freedom” of human actors (Emirbayer & Goodwin, 1994, p. 1142). Relational sociology, in contrast, deliberately strays from everyday thinking and language into its very own realm of concepts, arriving at highly abstract assertions about the social world with individuals as secondary to supra-personal forces. My sympathies lie with this approach, since ontological assumptions and normative considerations do not necessarily lead to sound theoretical statements—even if the epistemic break with everyday conceptions comes at the expense of intuitive plausibility.

Notes

1. Charles Kadushin (2002) derives from the psychological literature a similar distinction between individual motivations conducive to dense network structures and those conducive to bridges across structural holes.
2. In his later book with Desmond, Emirbayer replaces “transactions” with “interactions” as one component of social configurations (rather than their basic units; Emirbayer & Desmond, 2015, p. 188ff). Also, the authors equate dynamics and process with agency, rather than with supra-personal “transactions.”
3. “Relational sociology” is used here in a much narrower sense than in Emirbayer’s (1997) “Manifesto for a Relational Sociology.”

References Abend, G. (2008). The meaning of “theory.” Sociological Theory, 26, 173–199. Borgatti, S., & Lopez-Kidwell, V. (2011). Network theory. In J. Scott & P. Carrington (Eds.), Sage handbook of social network analysis (pp. 40–54). Thousand Oaks, CA: Sage. Bottero, W., & Crossley, N. (2011). Worlds, fields and networks: Becker, Bourdieu and the structures of social relations. Cultural Sociology, 5, 99–119. Bourdieu, P. (1986). The forms of capital. In J.  Richardson (Ed.), Handbook of theory and research for the sociology of education (pp. 241–258). New York, NY: Greenwood Press. Bourdieu, P., & Wacquant, L. (1992). Invitation to reflexive sociology. Cambridge, UK: Polity. Burt, R. (1982). Toward a structural theory of action. New York, NY: Academic Press. Burt, R. (1992). Structural holes: The social structure of competition. Cambridge, MA: Harvard University Press. Burt, R. (2004). Structural holes and good ideas. American Journal of Sociology, 110, 349–399. Coleman, J. (1988). Social capital in the creation of human capital. American Journal of Sociology, 94(Suppl.), S95–120. Coleman, J. (1990). Foundations of social theory. Cambridge, MA: Belknap. Crossley, N. (2010a). Towards relational sociology. Abingdon, UK: Routledge. Crossley, N. (2010b). The social world of the network: Combining qualitative and quantitative elements in social network analysis. Sociologica, 1, 2010. doi: 10.2383/32049. Crossley, N. (2013). Interactions, juxtapositions, and tastes: Conceptualizing “relations” in relational sociology. In C. Powell & F. Dépelteau (Eds.), Conceptualizing relational sociology (pp. 123–143). New York, NY: Palgrave. Crossley, N. (2015). Networks of sound, style and subversion: The punk and post-punk worlds of Manchester, London, Liverpool and Sheffield, 1975–80. Manchester, UK: Manchester University Press. Desmond, M. (2014). Relational ethnography. Theory & Society, 43, 547–579. DiMaggio, P. (1986). 
Structural analysis of organizational fields: A blockmodel approach. Research in Organizational Behavior, 8, 335–370. DiMaggio, P., & Powell, W. (1983). The iron cage revisited: Institutional isomorphism and collective rationality in organizational fields. American Sociological Review, 48, 147–160. Emirbayer, M. (1997). Manifesto for a relational sociology. American Journal of Sociology, 103, 281–317. Emirbayer, M., & Desmond, M. (2015). The racial order. Chicago, IL: University of Chicago Press. Emirbayer, M., & Goodwin, J. (1994). Network analysis, culture, and the problem of agency. American Journal of Sociology, 99, 1411–1154. Emirbayer, M., & Mische, A. (1998). What is agency? American Journal of Sociology, 103, 962–1023. Erikson, E. (2013). Formalist and relationalist theory in social network analysis. Sociological Theory, 31, 219–242.

Fine, G. A., & Kleinman, S. (1983). Network and meaning: An interactionist approach to structure. Symbolic Interaction, 6, 97–110.
Fuchs, S. (2001). Against essentialism: A theory of culture and society. Cambridge, MA: Harvard University Press.
Fuhse, J. (2009). The meaning structure of social networks. Sociological Theory, 27, 51–73.
Fuhse, J. (2015a). Theorizing social networks: Relational sociology of and around Harrison White. International Review of Sociology, 25, 15–44.
Fuhse, J. (2015b). Networks from communication. European Journal of Social Theory, 18, 39–59.
Fuhse, J. (in preparation). Social networks of meaning and communication.
Fuhse, J., & Mützel, S. (2011). Tackling connections, structure, and meaning in networks: Quantitative and qualitative methods in sociological network research. Quality & Quantity, 45, 1067–1089.
Godart, F., & White, H. (2010). Switchings under uncertainty: The coming and becoming of meanings. Poetics, 38, 567–586.
Gondal, N., & McLean, P. (2013). Linking tie-meaning with network structure: Variable connotations of personal lending in a multiple-network ecology. Poetics, 41, 122–150.
Gould, R. (1995). Insurgent identities: Class, community, and protest in Paris from 1848 to the Commune. Chicago, IL: University of Chicago Press.
Granovetter, M. (1973). The strength of weak ties. American Journal of Sociology, 78, 1360–1380.
Granovetter, M. (1979). The theory-gap in social network analysis. In P. Holland & S. Leinhardt (Eds.), Perspectives on social network research (pp. 501–518). New York, NY: Academic Press.
Hedström, P. (2005). Dissecting the social: On the principles of analytical sociology. Cambridge, UK: Cambridge University Press.
Kadushin, C. (2002). The motivational foundation of social networks. Social Networks, 24, 77–91.
Lin, N. (2001). Social capital. Cambridge, UK: Cambridge University Press.
Lindenberg, S. (1990). Homo socio-oeconomicus: The emergence of a general model of man in the social sciences. Journal of Institutional and Theoretical Economics, 146, 727–748.
Martin, J. L. (2009). Social structures. Princeton, NJ: Princeton University Press.
Martin, J. L. (2010). Life’s a beach but you’re an ant, and other unwelcome news for the sociology of culture. Poetics, 38, 228–243.
Martin, J. L. (2011). The explanation of social action. New York, NY: Oxford University Press.
Martin, J. L., & George, M. (2006). Theories of sexual stratification: Toward an analytics of the sexual field and a theory of sexual capital. Sociological Theory, 24, 107–132.
Mische, A. (2003). Cross-talk in movements: Reconceiving the culture-network link. In M. Diani & D. McAdam (Eds.), Social movements and networks (pp. 258–280). New York, NY: Oxford University Press.
Mische, A. (2008). Partisan publics: Communication and contention across Brazilian youth activist networks. Princeton, NJ: Princeton University Press.
Mische, A. (2011). Relational sociology, culture, and agency. In J. Scott & P. Carrington (Eds.), Sage handbook of social network analysis (pp. 80–97). London, UK: Sage.
Mische, A., & White, H. (1998). Between conversation and situation: Public switching dynamics across network domains. Social Research, 65, 695–724.
Mohr, J. (1994). Soldiers, mothers, tramps and others: Discourse roles in the 1907 New York City charity directory. Poetics, 22, 327–357.
Mohr, J. (1998). Measuring meaning structures. Annual Review of Sociology, 24, 345–370.

Mohr, J., Breiger, R., & Wagner-Pacifici, R. (Eds.). (2015). Colloquium: Assumptions of Sociality, Big Data & Society, 2. http://bds.sagepub.com/content/colloquium-assumptions-sociality
Mützel, S. (2009). Networks as culturally constituted processes: A comparison of relational sociology and actor-network theory. Current Sociology, 57, 871–887.
Padgett, J. (2012). From chemical to social networks. In J. Padgett & W. Powell (Eds.), The emergence of organizations and markets (pp. 92–114). Princeton, NJ: Princeton University Press.
Padgett, J., & Ansell, C. (1993). Robust action and the rise of the Medici. American Journal of Sociology, 98, 1259–1319.
Putnam, R. (1993). Making democracy work: Civic traditions in modern Italy. Princeton, NJ: Princeton University Press.
Schultz, J., & Breiger, R. (2010). The strength of weak culture. Poetics, 38, 610–624.
Scott, J. (2000). Social network analysis (2nd ed.). London, UK: Sage.
Snijders, T., van de Bunt, G., & Steglich, C. (2010). Introduction to stochastic actor-based models for network dynamics. Social Networks, 32, 44–60.
Somers, M. (2005). Let them eat social capital: Socializing the market versus marketizing the social. Thesis Eleven, 81, 5–19.
Tilly, C. (1998). Durable inequality. Berkeley, CA: University of California Press.
Tilly, C. (2002). Stories, identities, and political change. Lanham, MD: Rowman & Littlefield.
Tilly, C. (2005). Identities, boundaries and social ties. Boulder, CO: Paradigm.
Wellman, B. (1983). Network analysis: Some basic principles. Sociological Theory, 1, 155–200.
White, H. (1992). Identity and control: Towards a structural theory of action. Princeton, NJ: Princeton University Press.
White, H. (1993). Careers and creativity: Social forces in the arts. Boulder, CO: Westview.
White, H. (2008 [1965]). Notes on the constituents of social structure. Sociologica, 1, 2008. doi:10.2383/26576
White, H. (2008). Identity and control: How social formations emerge. Princeton, NJ: Princeton University Press.
White, H., Fuhse, J., Thiemann, M., & Buchholz, L. (2007). Networks and meanings: Styles and switchings. Soziale Systeme, 13, 543–555.
Wimmer, A., & Lewis, K. (2010). Beyond and below racial homophily: ERG models of friendship network documented on Facebook. American Journal of Sociology, 116, 583–642.
Yeung, K-T. (2005). What does love mean? Exploring network culture in two network settings. Social Forces, 84, 391–420.

Chapter 4

Networks and Neo-Structural Sociology

Emmanuel Lazega

Individual and Collective Capacities

Sociology is often presented as knowledge of regular associations between position in a social structure and behavior, individual and/or collective. A simplified version of 20th-century European structuralism identified position in terms of interdependencies: social phenomena (e.g., language, as in Saussure, 1916, or kinship, as in Lévi-Strauss, 1949, or myths, as in Lévi-Strauss, 1978) were construed as structures, that is, self-contained systems of differences between interdependent entities emerging from chaos. Such complex systems of interdependencies were seen as derivable from invariant and dominant rules (e.g., prohibition of incest, norms of reciprocity, difference between sacred and profane) or variables (e.g., macro-level stratification) providing coordinates for position in this system of differences, but also expressing epiphenomenal variety in agency (Wellman & Berkowitz, 1988). Behavior was defined as varied manifestations of these underlying structures less than as outcomes of human agency (choice and strategy).

Neo-structural sociology (NSS) revisits this strongly deterministic structuralism by opening it to individual and collective agency. Interdependencies between actors are too important in social life to be left unorganized, and actors and institutions struggle to organize them, to build organized collective actors, and to use these organizations to navigate problematic social processes that cannot be ignored or stopped. Social network analysis can be used, together with other methods, for tracking and understanding actors’ positions, embeddedness, and efforts to manage their interdependencies in contexts of cooperation and/or competition where interests often diverge, conflicts flare up, and constraining but often fragile institutions are inherited from the past.
As such, it avoids reification of the notion of structure and helps in further developing a sociological theory of collective action and of the management of the latter’s dilemmas (Weber, 1978 [1920]; Olson, 1965; Wittek & van de Bunt, 2004; Wittek, Schimank, & Groß, 2007). Intentional, reflexive, and strategic behaviors endogenizing the structure, not blind reproduction of the underlying structure, are also parts of the behavioral assumptions of this approach, including the use of organized settings as “tools with a life of their own” in their “dynamic configuring fields” (Selznick, 1949), that is, as political communities.

NSS assumes a form of social rationality: actors themselves articulate these dimensions of individual and collective action by combining identities in reference groups, cultural norms, and authority in their appropriateness judgments (Lazega, 1992, 2014). Reference groups and authority can be methodologically identified with relational infrastructures such as social niches and social status measured with networks. These are also dimensions of agency as identified by the structural branch of symbolic interactionism (Stryker, 1980) or by authors such as Archer (1982) or Donati (2010). In this framework, relationships can be defined as indicators of interdependencies: as channels for the flows and exchanges of resources of all kinds (material, informational, emotional, etc.), but also as moral or symbolic commitments vis-à-vis the exchange partners (Lazega, 2012a). Commitments in particular are based on rhetorical promises and moral conventions that introduce culture and duration in exchanges. They presuppose a form of social control of their acceptability and credibility. Thus, in these micro-foundations of neo-structuralism, agency mobilizes and combines both structure and culture. Relying on appropriateness judgments to guide socially rational action involves endogenization of structure: individuals are endowed with a capacity to perceive vertical and horizontal differentiations, for example, power relationships and social inequalities. They are assumed to be able to combine these perceptions, their own and others’ behavior, and relational choices by using language and culture giving meaning to actions, and to act based on this contextualization.
Breiger’s (2010; Schultz and Breiger, 2010; Breiger & Puetz, 2015) notion of “weak culture,” for example, provides a key link between the normative dimension of appropriateness judgments and relational life. The transition between the old and new structuralism can be traced back to anthropologists such as Mitchell (1969) and their use of social networks to look at structures of opportunity and constraints. Harrison White’s Chains of Opportunity (1970), a seminal book that models the labor market in terms of vacancy chains, represents this transition well. It contributes to the old structuralist tradition by creating a link between position expressed in relational terms and chances of getting a job. Newcomers with relational profiles relatively similar to that of leavers have a higher probability of replacing the leavers in the chain. This led to descriptions of structures of opportunity and constraints using models from which measurements and interpretations of his own concept of “structural equivalence” were later derived in the 1970s. By providing a new formalism clustering actors based on similarities in their relational profiles (blockmodeling) and combining the use of this relational approach with a new way of identifying positions, endogenous role sets, and division of work in human groups, White, Boorman, and Breiger (1976) have enriched structural social sciences with an exceptional wealth of new concepts and intuitions and hypotheses. They have allowed social network methods of analysis to become so generic that they can now be used to both identify systems of opportunities/constraints and study social order/discipline and processes in society. But White’s (1970) link between structure and mobility also introduces the possibility for other sociologists to look for cues about how actors manage both their interdependencies and their mobility to try to switch positions at the meso level. 
This is particularly well illustrated in a chapter on “Mobility in Loops,” which opens the door to a neo-structural sociology where agency, individual and collective, begins to be taken into account as condition and consequence of opportunities and constraints, reflecting also strategies, navigation of social processes and involvement in social and institutional change.
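The idea of clustering actors by the similarity of their relational profiles can be illustrated with a minimal Python sketch. It assumes the strictest criterion, exact structural equivalence (two actors send ties to, and receive ties from, exactly the same third parties); empirical blockmodeling relies on approximate equivalence and dedicated algorithms, which this sketch does not reproduce. The network, the actor names, and the helper functions `profile`, `equivalent`, and `positions` are hypothetical.

```python
# Hypothetical directed ties: sender -> set of receivers.
ties = {
    "ana": {"carl", "dina"},
    "ben": {"carl", "dina"},
    "carl": {"eve"},
    "dina": {"eve"},
    "eve": set(),
}

def profile(actor, ties):
    """Relational profile of an actor: (ties sent, ties received)."""
    sent = frozenset(ties[actor])
    received = frozenset(a for a, out in ties.items() if actor in out)
    return sent, received

def equivalent(a, b, ties):
    """Exact structural equivalence: identical ties to and from all third parties."""
    sent_a, recv_a = profile(a, ties)
    sent_b, recv_b = profile(b, ties)
    others = set(ties) - {a, b}
    return (sent_a & others == sent_b & others
            and recv_a & others == recv_b & others)

def positions(ties):
    """Group actors into positions (blocks) of mutually equivalent actors."""
    blocks = []
    for actor in sorted(ties):
        for block in blocks:
            if all(equivalent(actor, member, ties) for member in block):
                block.append(actor)
                break
        else:
            blocks.append([actor])
    return blocks

print(positions(ties))
```

In this toy network the two actors who send ties to the same colleagues fall into one position and the two who receive those ties fall into another, which is the sense in which a position is defined relationally rather than by attributes.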


Interdependencies in the Organizational Society: Bureaucracy and Collegiality

NSS looks at how actors manage their practice-related interdependencies to deal with cooperation dilemmas in socially organized settings defined as small or large political communities (Reynaud, 1989). It relies on numerous methods, from formalizing to interpretive, especially on the analysis of socioeconomic networks, to understand combined interdependencies and conflicts from an organized collective action perspective. As indicators of such systems of interdependencies and conflicts in organized social milieus, networks are considered as artifacts of methods, not as modes of coordination in themselves. Network methodology helps describe the morphology of those systems, always beginning with a sociology of work and members’ task and functional interdependencies. This leads to a conception of social capital that stems from a general sociological tradition focusing, for example, on social processes supporting and enhancing economic performance, from Durkheim (1893) to Coleman (1990) and Lazega (2009). Social capital is approached as a collective capacity, not so much as an individual capacity (as, e.g., in Burt’s [2005] or Comet’s [2007] approach to “relational” capital maximizing individual performance in competitive arenas). The methodology helps model these social processes (see later) that, provided certain conditions are fulfilled, facilitate collective action (Lazega & Pattison, 2001; Lazega, 2006). For example, recurring structural patterns of specific multiplex ties are assumed to be beneficial to collective action among peers because members use them to solve problems of coordination as much as problems of individual action (Lazega & Pattison, 1999).
Social rationality and social capital understood in that way create a form of social discipline and collective responsibility that is recognized as legitimate by actors, close to what Elias (1991) called the articulation of external constraints and internalized self-control, characterizing both the individual and the collective levels of agency simultaneously.

NSS’s focus on collective action has led to the use of an organizational approach to social life that brings to light a generic meso-social level of society. Sociologists (e.g., Perrow, 1991) assert that contemporary societies are dominated and shaped by large bureaucratic organizations. Beyond this general statement, NSS recenters the study of the organizational society on two kinds of ideal-typical organizational and institutional forms, based primarily on a sociology of work, with an indefinite number of combinations of both forms in real organizational life. At one extreme is the dominant and default form, Weberian bureaucracy, and at the other extreme is an older form, collegiality (as revisited by Waters [1989] and Lazega [2001]). These types can be used to differentiate between two ways of managing cooperation dilemmas. Ideal-typical bureaucracy is meant to carry out routine tasks and mass production, using centralized coordination, hierarchy, and impersonal interactions between affiliated members. Ideal-typical collegiality (not to be confused with congeniality) is meant to carry out nonroutine, innovative tasks, using deliberation and consensus building backed up with collective responsibility that can only be enforced with personalized relationships between (often rival) peers. Governing collective action by impersonal interactions and governing it by personalized relationships are two basic and different models, even in the organizational society where bureaucracy is the default form.

One of the main issues that NSS has explored up to now is how personalized relationships are used to steer collective action among rival peers in organizations (Lazega, 2001), but also coopetition in markets (Lazega & Mounier, 2002; Brailly et al., 2018). Beyond reasoning in terms of “embeddedness” (Granovetter, 1985), participation in nonroutine collective action (for example, for professional brainstorming or commercial or political negotiations) requires personalized cooperation with others, including in struggles with competitors. For actors “embarked” more than “embedded,” this steering is based on navigating social processes, whether within or between organizations, in public administrations, businesses, nonprofit associations, cooperatives, or politics and social movements. This coopetition is always problematic and expressed through personalized transfers/sharing or exchanges of the various kinds of resources mentioned earlier. From a neo-structural perspective, this means that specific local relational infrastructures must emerge from multiplex social exchanges (for example, of coworkers’ goodwill, advice, sometimes role distance, and emotional support) so that members can cooperate and exchange on an ongoing basis, if not in the long run.

This does not mean that personal relationships do not matter in widespread bureaucratized settings (as documented by a literature presented, for example, in Brass, 1984; Kilduff & Tsai, 2003). It means, however, that one has to be particularly careful in looking at how (which particular blend) and where they matter, at which level in the organizational stratigraphy, for actors and for sociological explanation of collective action. In an already bureaucratized society, their systematic use can be considered inappropriate, if not corrupt, as in the Weberian critique of patrimonialism.
Indeed, using social network analysis to look at how bureaucracy and collegiality are brought together by combining strongly personalized ties and impersonal interactions in multilevel structures is one of NSS’s avenues of future development (see the section on dynamic multilevel networks later). Examples include a network study of how bureaucracy rotates rival peers in a carousel system (recall White’s mobility in loops) to counterbalance patronage and clientelism in a corporate law partnership (Lazega, 2000) and a network study of “top-down collegiality” to silence conflicting religious orientations among priests in a Roman Catholic diocese (Lazega & Wattebled, 2011).

Relational Infrastructures

Among the building blocks of NSS, management of interdependencies produces relatively stable relational patterns, called relational infrastructures, that complexify the fundamental structural notion of position. NSS identifies two kinds of relational infrastructures that facilitate the navigation of generic social processes: a system of “social niches” and a system of heterogeneous (and more or less inconsistent) forms of social status that can be both endogenous and exogenous.

A social niche is a dense position in terms of blockmodeling, that is, a subset of members, at the organizational or interorganizational levels, who are approximately structurally equivalent. They both play a similar role in a system of collective action and establish among themselves durable, dense, and multiplex social exchanges and relations. Actors contextualizing their behavior in organized settings have a trained capacity to detect the existence of niches based on the criterion of cohesion that comes attached to a certain social homogeneity: they use similarities (e.g., in terms of office membership or specialty, hierarchical status, gender, culture, or class, i.e., both endogenous and exogenous attributes from the perspective of the organization). A niche is capable of coordination and collective agency and only makes sense in a system of niches that represents a form of division of work (White, Boorman, & Breiger, 1976). There are many empirical examples of identifications of systems of social niches at the intraorganizational level as well as at the interorganizational level. For example, Delarre (2005) looks at groups of French enterprises (1991–1999) as new social entities characterized by dense and multiple exchanges and strategic alliances between the daughter companies that they include in their holdings. Funding, staff, expertise, control, etc., circulate within such niches and form a system that is able to preserve a flexibility that allows these groups to adjust to volatile markets, thus managing the “paradox of embeddedness” (Uzzi, 1997; Varanda, 2005; Grossetti, 2011).

Status, a multidimensional and highly complex notion, refers to a member’s relative “importance” in the group, both in the formal hierarchy and in the networks of exchanges (Merton, 1959; Gould, 2002). It involves a mandate that confers collective recognition of the importance of individual or collective contributions, and the authority that comes attached to this mandate, with responsibilities and benefits from various forms of deference. Members with status, as individuals, are thus granted a license (Hughes, 1945) to legitimate participation in specific forms of leadership.
It can be exogenous in the Weberian tradition, that is, economic (based on control of the production apparatus and revenue), social (based on honor or prestige, not only from birth, but also from human capital [education]), and political (based on administrative and political control of public institutions, particularly the state). From a more endogenous and relational perspective, status can be achieved in many ways, for example, based on various kinds of centrality (Freeman, 1979), or even endorsement by other members who themselves are endowed with status. It is not surprising, therefore, that members of a group compete for status, but also that this competition is shaped by status heterogeneity, inconsistency (Lenski, 1954), and ranking between these dimensions. Analyzing the correlations between all these dimensions of status is a useful contribution of NSS’s explanation of how various social processes work, including regulation.

Relations between niches and status are dynamic. Niches can produce a fragmentation that is not without risk for organized collectives, hence the paradoxical importance of cross-boundary status competition for organizational integration. Collectively, as relational infrastructure built on heterogeneity and inconsistency rules, bundles of dimensions of status conferred by this competition can paradoxically create solidarity and cohesion when systems of niches are subjected to too many centrifugal forces. Relational infrastructures co-evolve and co-constitute each other. Decomposing networks into sub-structures, as with exponential random graph models (Lusher, Koskinen, & Robins, 2013), helps to identify these dynamics.
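Two descriptive quantities behind these relational infrastructures can be sketched in a few lines of Python: the density of ties inside a candidate niche, and indegree as one simple centrality-based indicator of endogenous status. This is an illustration under stated assumptions, not the measurement procedure of any study cited here. The network, the actor labels, and the functions `density` and `indegree` are hypothetical, and indegree is only one of the many kinds of centrality the text refers to.

```python
# Hypothetical directed ties (e.g., advice seeking): seeker -> advisers.
ties = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"c"},
    "e": {"c", "d"},
}

def density(members, ties):
    """Share of possible directed ties among `members` that are present."""
    members = set(members)
    possible = len(members) * (len(members) - 1)
    present = sum(len(ties.get(m, set()) & (members - {m})) for m in members)
    return present / possible if possible else 0.0

def indegree(ties):
    """Number of ties each actor receives (a simple status indicator)."""
    scores = {actor: 0 for actor in ties}
    for receivers in ties.values():
        for receiver in receivers:
            scores[receiver] = scores.get(receiver, 0) + 1
    return scores

print(density({"a", "b", "c"}, ties))  # density inside the candidate niche
print(density(set(ties), ties))        # density of the whole network
print(indegree(ties))                  # ties received per actor
```

Comparing the density inside a subset with the density of the whole network gives a first, purely descriptive indication of whether that subset behaves like a dense position; the indegree scores show how status can concentrate on one member of such a subset.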

Social Processes as Social Capital of the Collective in the Organizational and Market Society

Depending on how members involved in carrying out nonroutine, innovative tasks reflexively invest in relational infrastructures, they facilitate or hinder the deployment and navigation of the social processes on which collective action and coordination are recursively based. Among these processes, NSS has focused on modelling the variable forms of particularistic solidarity (measured, for example, with direct and indirect reciprocities, i.e., the forms of restricted and generalized exchange identified by Claude Lévi-Strauss), exclusion, and desolidarization; socialization and collective learning (assessed, for example, with advice networks); social control (measured, for example, with monitoring and sanctioning networks) and conflict resolution; and regulation and institutionalization of norms and practices (i.e., politics). Each of these processes is at the heart of social life and collective action.

A first category of social process thus involves the creation of these personalized, particularistic solidarities, desolidarizations, and segregations, inside and across social niches. There are many well-known examples of the existence of solidary processes in markets and industries. Ingram and Roberts (2000) provide a case of seriously friendly relationships between otherwise competing managers in the upscale hotel industry in Sydney. They explain this result by the idea that friendly relationships stabilize the norms of exchange between these managers in that industry. Éloire (2010) provides another example of forms of bounded solidarity based on the reconstitution and analyses of social networks among restaurant owners in a city center. He detects a specific form of homophily among members of a social niche of high-end restaurants (i.e., White’s [1981] “paradoxical” market) who are more central, famous, and exclusive than others. The fact that these niches do not seem to exist in all types of Whitian markets reveals the discriminating and strategic nature of this form of bounded solidarity between competitors.
Particularistic solidarity cannot be reduced to a purely relational, reciprocity-, multiplexity-, or cohesion-based phenomenon: it is also made possible by social boundaries and norms, the presence of which is confirmed by introducing in the models, for example, effects for various forms of homophily and attribute similarities between actors.

A second category of social processes consists in collective learning, or even the opposite in construction of ignorance, within and across organized settings. Collective learning is understood here in a broad sense: the way in which we think with others and build common knowledge with them, such as reconstituting the history of the collective itself and its own past and changes, mastering together new techniques and how to implement them, adapting together to new environmental constraints, that is, living with “new” limits and transitions. Neo-structural research has examined collective learning based on the study of advice networks and the theory of appropriateness judgments. Relational infrastructures matter here as well. Actors use status criteria when selecting an adviser (see Blau, 1964; Krackhardt, 1990; Agneessens & Wittek, 2012; Škerlavaj & Dimovski, 2006; Lazega & Van Duijn, 1997; Montes-Linh, 2014, among many others). Recognition of status gratifies the advisers by providing them with an incentive to share their knowledge, experience, and educated judgment. In formally organized contexts, following this status rule, members avoid seeking advice from the colleagues “below” them in the formal hierarchy or in the pecking order. But these asymmetries are not necessarily rigid. The recursive and cyclical dynamics of advice networks, as seen later, creates a structural oscillation as the super-centrality of specific actors (in the core of these networks) fluctuates. In addition, empirical research finds many “infractions” to this avoidance rule.
Actors use several kinds of similarities among themselves to counteract the conflicting effects of these status games in collective learning. The use of homophily in the choice of exchange partners allows members to cut across status boundaries to access advice from “below.” Thus, to the extent that advice networks are structured by status and by the mitigation of status competition in social niches, they tend to become both hierarchical and cohesive, the hierarchical dimension often being stronger than the cohesive dimension. They are also strongly embedded in other types of social networks that also help with mitigating the status rule. Individuals can find social niches to be a safer environment to engage in advice relationships, even sometimes with direct competitors, especially when many high-status players coexist in the social niche and are able collectively to enforce social discipline and rules of protection against opportunistic behavior, turning cutthroat competition into more or less “friendly” competition (Lazega, Bar-Hen, Barbillon, & Donnet, 2016).

Modeling (un)learning processes can be highly heuristic in the study of markets and industries. In markets, the existence of social niches and various forms of status seem to facilitate collective learning between businesspersons and companies. At the interorganizational level, entrepreneurs also seek to learn from each other while still trying to compete on strategic aspects such as market distribution (see, e.g., among many others, Kogut & Zander, 1996; Lomi & Pattison, 2006). As shown by Piña-Stranger and Lazega (2010) in a study of advice networks among biotech entrepreneurs, status games are different at that level from what they are at the intraorganizational level: at the interorganizational level, entrepreneurs do seek advice “below” them in the pecking order. Oubenal (2015) uses the same perspective in his network study of concerted ignorance of risks in the construction of the financial markets for specific products such as exchange traded funds (or trackers).

A third generic process consists in using relationships to exercise social control and bring rival peers back to good order.
When it is confronted with behavior that is deviant or perceived as opportunistic, and before using costly judicial procedures, an organized collective activates a personalized system of monitoring and sanctioning. This system uses reputations and helps in selecting sanctioners who are able to use personalized relationships and access to the deviant members who need to be reminded of their commitments. That process makes it possible to solve the “second-order free-rider” problem (Coleman, 1990; Wittek, 1999) by lowering the cost of control thanks to the use of personal relations between sanctioners and targets of social control. It is also based on the existence of social niches in which the threat of losing one’s personal ties is used as leverage against the targets and on a specialized form of social status, that of informal “police.” This link between relational infrastructures (niches, status) and social control is established by observing regularities in the personalized and informal relational paths through which those sanctions are implemented to protect common resources. Lazega and Krackhardt (2000) provide analyses of a three-way network dataset (Krackhardt, 1987) for the reconstitution of a lateral control regime exposing such effects of relational infrastructures on this process.

Because techniques identifying relational infrastructures are mainly descriptive (blockmodeling, centrality measures, etc.), statistical tests and models combining ties and attributes of actors are needed to confirm the existence and functions of social processes mobilizing relational infrastructures at the more granular level of specific substructures. The p2 model (Van Duijn, Snijders, & Zijlstra, 2004) or exponential random graph models (Wasserman & Faust, 1994; Robins, Woolcock, & Pattison, 2005; Snijders, 2005; Lusher, Koskinen, & Robins, 2013) test for the significant presence of such substructures, for example, of cyclical substructures characterizing indirect reciprocity, and by extension bounded solidarity.
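The substructures mentioned here can be made concrete with a short Python sketch that counts two of them in a small directed network: mutual dyads (direct reciprocity) and cyclic triads (indirect reciprocity, as in generalized exchange). The sketch only produces raw counts. It does not estimate a p2 or exponential random graph model, which is what would be needed to judge whether such configurations occur more often than expected given the other effects in the model. The data and the functions `has_tie`, `mutual_dyads`, and `cyclic_triads` are hypothetical.

```python
from itertools import permutations

# Hypothetical directed exchange ties: giver -> receivers.
ties = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"d"},
    "d": {"b"},
}

def has_tie(i, j, ties):
    """True if a directed tie i -> j is present."""
    return j in ties.get(i, set())

def mutual_dyads(ties):
    """Count reciprocated pairs i <-> j (direct reciprocity)."""
    actors = sorted(ties)
    return sum(
        1
        for idx, i in enumerate(actors)
        for j in actors[idx + 1:]
        if has_tie(i, j, ties) and has_tie(j, i, ties)
    )

def cyclic_triads(ties):
    """Count directed 3-cycles i -> j -> k -> i (indirect reciprocity)."""
    count = 0
    for i, j, k in permutations(sorted(ties), 3):
        if has_tie(i, j, ties) and has_tie(j, k, ties) and has_tie(k, i, ties):
            count += 1
    return count // 3  # each cycle is found once per starting actor

print(mutual_dyads(ties), cyclic_triads(ties))
```

Counts of this kind correspond to the sufficient statistics that such models build on; the modeling step then asks whether the observed counts are larger than would be expected by chance.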


Neo-Structural Institutionalism

Finally, identifying relational infrastructures also helps model the “regulatory” process, that is, the micro-political (re)definition of the rules of the game among members, and institution building that comes attached (as cause or consequence). Classic concepts such as “precarious value” (Selznick, 1957) have already brought together neo-structural and institutional perspectives. Building on such concepts, identifying relational infrastructures in socially organized settings helps model the negotiation of norms and conventions (Reynaud, 1989; Favereau & Lazega, 2002; Lazega, 2016b, 2018) and their institutionalization in stable practices (“living the rules,” as in Glückler, Suddaby, & Lenz, 2018). NSS shows that institutionalization is characterized by specific social dynamics bringing together structure, culture, and agency: that of oligarchical negotiation of precarious values and cultural stabilization (or challenges) of interpretations of the rules en vigueur. In these political dynamics, institutional entrepreneurs with heterogeneous and inconsistent forms of social status (measured also in network terms) can have particular influence. They punch above their weight in exploiting or undermining oppositional solidarities to promote their regulatory interests: in definition of priority rules; in use of rhetorics of relative sacrifice to build legitimacy and manage the losers; in articulation of regulation levels as “vertical linchpins,” that is, members who act simultaneously at different, superposed strata of collective agency; etc. Empirical examples based on network analyses can be found among corporate lawyers (Lazega, 2001) and in the case of institutional capture of a commercial court by lay judges coming from the banking industry (Lazega & Mounier, 2012).
At the interorganizational level, network studies of lobbying, for example, provide precious insights into this process of how relational infrastructures stabilize interpretation of the rules and regulation as a relational process (see, e.g., a tradition of work beginning with Laumann & Knoke, 1987). Studies of “unified” (public/private; top-down/bottom up; national and transnational) institutionalization as a form of “government by relational infrastructures” can be usefully framed from an NSS perspective as well. In contemporary neo-liberal capitalism, joint regulation of markets by business and public authorities is becoming increasingly systematic whether through authoritarian States, or as more “regulatory States” establishing general, vague legal frameworks, leaving the task of defining the substance of rules that are en vigueur for market participants themselves, in particular finance. Penalva-Icher (2010) offers an example of this type of joint regulatory process by examining the social construction, in France, of “socially responsible” finance promoted by “ethical” funds. She uses a network study to show that, even when there are no formal barriers to entering this market, social and informal barriers do exist for participating in its oligarchic regulatory process. Long-term social investments in this milieu (i.e., in personalized friendships) allow financiers to be at the right place at the right time when important decisions about their industry are made. Other case studies illustrate collegial oligarchies using status inconsistencies in conflicts of interests to concentrate power and build/buy legitimacy in the regulation of the economy (e.g., Lazega, 2012c; Lazega, Quintane, & Casenaz, 2016, on the construction of a new transnational intellectual property regime). Network studies of courts specialized in business are also of particular interest here. 
NSS has looked, for example, at how public authorities and private business unlock, capture, and exploit each other’s collective action capacities thanks to common relational infrastructures. It is a promising avenue of research on how powers (fail to) check each other in contemporary organizational societies.

The list of the social processes that are the social capital of the collective, the existence of which depends on a common and underlying relational infrastructure, is indefinite. Each of these processes can be compared in different organized settings. They are also linked in dynamic, recursive ways. They can energize or inhibit the evolution of their own relational infrastructures and thus steer collective action in new directions. New rules can lead to new solidarities and reconfigure a system of niches. Normative beliefs produced by regulation in controversies can influence, for example, choices of advisers and therefore collective learning (Lazega, Mounier, Snijders, & Tubaro, 2012). Social control can encourage the emergence of new forms of social status and modify the principles of status consistency, which in turn can impact regulation. Systematic network research and modeling on the concatenation (Tilly, 2007) of these processes, based on the fact that they draw on the same relational infrastructures, is in its infancy. Studying the evolution of relational infrastructures at each level will help us understand how recursive social processes reinforce, feed back on, transform, or undermine each other using the same or different relational infrastructures (when their dynamics are indeed based on relational infrastructures) to contribute to the emergence of new social orders.

Challenges: Longitudinal and Multilevel Network Structures to Navigate Social Processes

All social processes in forms of organized collective action (in which personalized ties are crucial for coordination) are intrinsically dynamic even if social network analysts have often speculated about them based on static data. “Dynamic invariants” as a basis for organizational resilience can be identified, for example, in advice networks: centrality trajectories of members and analyses of relational turnover in longitudinal datasets show recursive cyclical dynamics in centralization-decentralization-recentralization of these networks as generated by a search for a balance between overload and conflict among super-central advisers (Lazega, Sapulete, & Mounier, 2011). Dynamic and multilevel perspectives can be developed, for example, showing when, in contexts of cooperation among competitors, access to advisers who are “big fish in big ponds” provides competitive advantages for the little fish (Lazega et al., 2008). Extension of opportunity structures by “network lift from dual alters” increases this advantage when members can close multilevel three-paths and when dual alters have complementary resources (Lazega, Jourda, & Mounier, 2013). This is the case for the social control of markets, as shown in neo-structural studies (using Snijders & Nowicki’s [1997] stochastic blockmodeling) of formal judicial institutions exercising social control on the business world (Lazega, Sapulete, & Mounier, 2011), institutions where a centuries-old capture is produced by structural stability regardless of membership turnover. Sociological research increasingly takes into account these dynamic and multilevel dimensions of position, relational infrastructures, and social processes.
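The cyclical pattern of centralization, decentralization, and recentralization described above can be made concrete with a small sketch. It computes a Freeman-style in-degree centralization index for each wave of a longitudinal advice network. The member names, the three waves, and the function name `indegree_centralization` are hypothetical illustrations, not data or code from the studies cited, and the index is only one of several ways such trajectories could be summarized.

```python
def indegree_centralization(nodes, ties):
    """Freeman-style in-degree centralization of one directed network.

    Returns 1.0 for a star in which everybody seeks advice from one person
    and 0.0 when every member is sought out equally often.
    """
    indegree = {node: 0 for node in nodes}
    for seeker, adviser in set(ties):      # ignore duplicated ties
        if seeker != adviser:              # ignore self-nominations
            indegree[adviser] += 1
    n = len(indegree)
    if n < 2:
        return 0.0
    top = max(indegree.values())
    # (n - 1) ** 2 is the largest possible sum of differences (a perfect star)
    return sum(top - d for d in indegree.values()) / (n - 1) ** 2


# Hypothetical three-wave panel: each tie reads (advice seeker, adviser).
members = ["a", "b", "c", "d"]
waves = [
    [("b", "a"), ("c", "a"), ("d", "a")],               # everyone consults "a"
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")],   # advice spread evenly
    [("a", "d"), ("b", "d"), ("c", "d")],               # recentralized on "d"
]

trajectory = [indegree_centralization(members, ties) for ties in waves]
print(trajectory)  # [1.0, 0.0, 1.0]
```

In this toy panel the index falls and rises again while the identity of the most central adviser changes, which is the kind of structural regularity under membership turnover that the text calls a dynamic invariant.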
How to model dynamics of multilevel networks is an important question, for example, in studies of institutional emergence at the transnational level or in studies of increasing digitalization/bureaucratization of exchanges and controls in the organizational society. Especially with studies of regulation, institutionalization, and concatenation of processes, demand for longitudinal and multilevel data increases. Given the complex structure and richness of contemporary big relational datasets (including information on production output, affiliations, careers and trajectories, performance outcomes, etc., in addition to behavior), often in comparative frameworks, new perspectives will emerge to take into account this complexity of multilevel network dynamics. One of the main issues for network analysts today is to design and use robust methods analytically disentangling causal effects to measure, model, and account for social phenomena in different real-life settings, across levels, and over time. Extending existing models, such as Snijders’s (2016) approach to the dynamics of networks, to the dynamics of multilevel networks will allow studying the coevolution of multiple networks and multiple behaviors, where “behavior” is a shorthand for any changeable characteristic of the actors who are the nodes in the network. Position in a dynamic structure is not simple to identify and track. Social dynamics are complex. Individual actors may follow different trajectories and change not only places and positions (see, e.g., Brandes, 2016; Moody et al., 2011; Quintane, 2013) but also behaviors, norms, and relationships. Collective actors to which individuals are affiliated emerge when their social capital (as defined earlier) is sufficient, but they can stabilize or unravel over time. The coevolution of all these dimensions of individual and collective action, especially from a relational perspective, is not well known. Models for longitudinal network analysis such as Snijders’s Siena statistical actor-oriented approach for longitudinal network data (Snijders, 2001, 2005, 2017 for a recent synthesis; Snijders, Steglich, & Schweinberger, 2006; Snijders, Lomi, & Torló, 2013) provide analytical tools and statistical tests for the relative weight of influence and selection effects describing the coevolution of networks and behavior.
Empirical explorations can be found in research on institutional emergence and maintenance (see Moody, 2009; Lazega, Mounier, Snijders & Tubaro, 2016). Position in multilevel network structures is also difficult to specify (Snijders & Bosker, 1999; Snijders, 2016; Lazega & Snijders, 2016). This often requires observing and modeling at least two systems of collective action that are superimposed and partially interlocked in terms of their interdependencies: for example, one interindividual, the other interorganizational. Building on Breiger’s (1974) “dual” approach to bipartite or two-mode networks that co-constitute each other, articulation of distinct levels of collective action can be partly accounted for using a structural linked design (Lazega et al., 2008, 2013; Breiger, 2015), where the unit of analysis is the individual-organization pair, or dual positioning as articulated with strategies of actors. Examples are node sets defined as a set of firms and a set of employees, with firm-firm ties, employee-employee ties, and firm-employee affiliations. Each level is represented with a complete network and examined separately, and then combined with that of the other level thanks to information about the affiliation of each individual in the first network to one of the organizations in the second network. Taking into account such within-level and cross-level effects over time provides a better understanding of processes in which individual effects translate into social effects, as in institutionalization processes. Statistical tests for hypotheses about the significance of specific multilevel effects have been developed (Zijlstra et al., 2006; Wang et al., 2013, 2016) and used in economic sociology.
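As a rough illustration of the linked design just described, the sketch below stores two one-mode networks (employee-employee and firm-firm) together with the affiliation of each employee to one firm, and then looks for “dual alters”. All names and data are hypothetical, and the operationalization is a deliberate simplification rather than the procedure used in the studies cited: a dual alter of ego is taken to be any member of a firm tied to ego's own firm, and a multilevel three-path (ego, ego's firm, partner firm, alter) counts as closed when ego also has a direct interindividual tie to that alter.

```python
from collections import defaultdict

# Hypothetical toy data for a linked design (all ties treated as undirected).
affiliation = {"ann": "F1", "bob": "F1", "cat": "F2", "dan": "F3"}
person_ties = {("ann", "cat"), ("bob", "dan")}   # employee-employee level
firm_ties = {("F1", "F2")}                       # firm-firm level


def neighbours(pairs):
    """Turn a set of undirected pairs into an adjacency mapping."""
    adjacency = defaultdict(set)
    for left, right in pairs:
        adjacency[left].add(right)
        adjacency[right].add(left)
    return adjacency


def dual_alters(person, affiliation, firm_ties):
    """Members of firms that are tied to the person's own firm."""
    partner_firms = neighbours(firm_ties)[affiliation[person]]
    return {other for other, firm in affiliation.items()
            if firm in partner_firms and other != person}


def closed_three_paths(person, affiliation, person_ties, firm_ties):
    """Dual alters to whom the person also has a direct interindividual tie."""
    direct_contacts = neighbours(person_ties)[person]
    return dual_alters(person, affiliation, firm_ties) & direct_contacts


# The unit of analysis is the individual-organization pair.
pairs = sorted(affiliation.items())

for person, firm in pairs:
    print(person, firm,
          sorted(dual_alters(person, affiliation, firm_ties)),
          sorted(closed_three_paths(person, affiliation, person_ties, firm_ties)))
```

Here "ann" and "bob" share the same dual alter through their firm's tie to F2, but only "ann" has closed the three-path with a personal tie to "cat". Keeping the two levels as separate networks joined only by the affiliation mapping follows the text's point that each level is examined on its own before being combined.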
For example, research on trade fairs, and on the cross-level social processes that take place in them between networks of sales representatives and networks of the companies that employ them (Brailly et al., 2015; Favre et al., 2016), explains the conditions under which small firms can resist and survive predatory practices of multinational corporations (Brailly, 2016) or the differences in strategies of collective learning between novices and experienced traders in new marketplaces bringing together regional and global players in the television industry (Favre, 2014). This approach works for all systems that organize themselves around several levels of decision making and power. These levels can be bureaucratic, collegial, or both: they are articulated but benefit also from autonomy (Lazega, 2020a, 2020b). Neo-structural approaches to social processes have been developed employing mixed methods, both qualitative and interpretive (beginning with a sociology of work and ethnography) as well as quantitative and formalizing. Combining dynamic and multilevel network analysis without conflating the levels (in Archer’s [1982] sense) is one of the next frontiers of NSS. Indeed, how do relational infrastructures from personalized relationships help navigate social processes in a bureaucratized, hierarchical world of routines, impersonal interactions and subordinations? To account for this apparent paradox, a more complex, multilevel, and dynamic understanding of the notion of position must be introduced. This is equivalent to saying that the complexity of articulations of bureaucracy and collegiality requires new theoretical approaches that benefit from the adoption of the dynamic multilevel perspectives and methodologies mentioned above. Sociological “stratigraphy” can identify superposed strata of bureaucracy and collegiality in social settings, stressing the vertical dimension of social phenomena in new organizational terms. In this stratigraphy, at least two such multilevel relational infrastructures account for the more complex notion of position. First, multilevel social niches, i.e., subsets of “pairs” of individuals/organizations that occupy a common position in the division of work of at least two strata of collective agency simultaneously (Žiberna, 2014; Žiberna & Lazega, 2016).
One temporary kind of such a niche is the intermediary-level social niche, that is, a collegial pocket that is built in between strata to serve as a foothold for groups of actors who prepare for the reconfiguration of lower or upper levels with new projects, discourses, practices, turnover, and relational rewiring, thus attempting to drive the coevolution of these strata by challenging incumbents, creating new collectives and redefining the division of work. Second, multilevel status, which qualifies individuals who play the role of vertical linchpins, driving this coevolution of strata by being present and active in collective agency at two or more superposed levels/strata simultaneously. In a stratified context, these multilevel relational infrastructures shed additional light on institutional entrepreneurship and joint regulation, whether in public service, business firms or cooperatives, political parties, or civil society associations. For example, when vertical linchpins cluster together, they usually constitute a special kind of intermediary-level social niche, a “collegial oligarchy” (Lazega, Quintane, & Casenaz, 2016). Building on established knowledge of status heterogeneity and inconsistency in collegial settings, we can understand how members of this collegial oligarchy dominate (but do not monopolize) the joint regulatory process. When actors join efforts to build intermediary-level social niches as stepping stones for the establishment of a collective presence at both levels simultaneously, they do so because establishing this presence at the other level requires a redefinition of the division of work at that level. This intermediary-level position is thus meant to eventually reach, beyond the mere function of collective foothold, the quality of a second, cross-level social niche, to acquire a role in a redefined structure, i.e., in a new division of work at both levels simultaneously.
The construction of such positions characterizes extremely competitive lower or upper levels in which mobility and relational turnover are intense and where new and challenging collective actors are not always welcome among incumbent individuals or organizations (Molina et al., 2018). Thus, anticipating the future development of NSS, we argue that these multilevel relational infrastructures make full sense when considered in their dynamic coevolution and environment. This also indicates that complex dynamics of “multilevel synchronization” could be measured and modeled using longitudinal and multilevel network analyses (Lazega, 2016a). Setting in motion the gears of such a multilevel synchronization is also socially costly in time and resources for members with multilevel forms of status who want to be part of the collegial oligarchy in their political system. Not only will such a synchronization prove intrinsically too expensive for many institutional entrepreneurs who cannot spare resources to share at both levels simultaneously, but also the cost of such synchronization could be dumped on lower-level constituencies, for example, first-level social niches. This can backfire in terms of regulation because the latter can also be internally competitive. For institutional entrepreneurship to work at several levels simultaneously, opportunity structures must be extended and mobilized efficiently (for example, through the network lift from dual alters mentioned earlier; Lazega et al., 2013), and specific multilevel Matthew effects must be at work (Lazega & Jourda, 2016). Much remains to be done in NSS to further explore and enrich knowledge of such multilevel dynamics in terms of social inequalities. Indeed, theorizing dual/multilevel opportunity structures, synchronization, and costs of synchronization of levels in such opportunity structures and in the construction or emergence of social systems can contribute more generally to more established bodies of sociological knowledge.
If different forms of adjustment and synchronization between levels take place, for example, in the relational turnover required by mobility and careers, costs (which are often invisible and poorly measured by contemporary sociology) generate still further social inequalities. These costs are almost always incurred by individuals, rarely by the organization and by the actors using them as “tools with a life of their own” in their “dynamic configuring fields” (Selznick, 1949). Therefore, dumping of costs of synchronization on the weakest in society must lead NSS to rethink the contribution of dynamic and multilevel network analyses to measurements of social inequalities in the organizational society as a class society (Lazega, 2012b). Social stratification itself can be better understood with dynamic and multilevel network approaches to phenomena such as opportunity hoarding (Tilly, 1998), which transforms organizations into pawls of ratcheted social stratification. Dynamics well known to the study of social mobility in society are also multilevel: the more open the bottom of social stratification is, the more closed and self-segregated it is at the top (Godechot, 2016; Godechot et al., 2019; Tomaskovic-Devey, 2013), with closure being strongly reinforced by personalized relationships and collegial coordination (Lazega, 2020b). Again, mobility in loops (White, 1970) and organizational rotations create status hierarchies promoting or demoting leaders (Lazega, Lemercier, & Mounier, 2006), but at the same time they can also exclude discreetly, as in giant musical chairs. Exploratory network analytical insights into such developments can be found, for example, in work on social mobility (see Breiger, 1990, to begin with) and schools (Moody, 2001; Vermeij, Van Duijn, & Baerveldt, 2009).
Separate dynamics at different levels of analysis raise new research questions about reassessing the relationships between meso and macro levels of society, especially their co-constitution. The conditions under which the multilevel character of a system of interdependencies drives social processes, rigidifies or destabilizes social structure and inequalities remain to be further measured and modeled.

Understanding social phenomena involving participatory processes, i.e., not only polarizing processes, will directly benefit from these developments in social network analysis. For example, bottom-up versus top-down struggles to shape the institutionalization of new commons and forms of collective responsibility (Lazega, 2017) in bureaucratic societies dominated by digital platforms are an issue for democracies in which bureaucratic regulation meets with collegial self-regulation, and understanding them will require such modeling, as will any kind of “unified” (bringing together bottom-up and top-down dynamics, public and private actors, and multiple national structures and cultures) emergence of transnational institutions in areas such as judiciary, urban development, environmental policy, etc. Between individual responsibilities, state responsibilities, and transnational institutions, there are multiple and superposed bureaucratic and collegial strata of collective agency and responsibility, each with their own social processes changing at their own rhythm and influencing change at the other levels. Thus, accounting for social phenomena over time and across boundaries and levels with a neo-structural approach is both challenging and promising.

Conclusion

NSS is largely concerned with how members manage their social resources to fulfill their commitment to broadly understood collective responsibilities, thus helping collective actors manage dilemmas generated by cooperation. It considers social capital as a set of relational infrastructures and generic social processes, that is, as a collective asset and capacity for collective action taking place at the meso level in the organizational society. This was made possible by new formalisms proposed by generations of social network analysts. As always in science, these new formalisms have helped develop new phenomenologies, intuitions, and hypotheses in sociology, especially about phenomena that are very difficult to observe empirically, such as the dynamics of multilevel forms of collective agency combining social and organizational networks over time. Throughout this exploration, several issues come into view as critical concerns for contemporary organizational societies, and thus for further neo-structural research by the social sciences. Without any claim to completeness, it is possible to count among these concerns the following issues. Making progress in the study of social networks requires awareness that we live in societies of organizations as class societies. Managerial thinking about combinations of bureaucracy and collegiality would tend to be short term and to favor solutions that can be safely implemented quickly, whereas innovation usually requires more time, a different temporality (Bruna, 2013). Synchronizing these temporalities by building dynamic multilevel relational infrastructures is costly in many ways, and thus an issue of social stratification and inequalities. Understanding how synchronization works requires measuring the sedimented vertical or horizontal differentiations of the social world at different levels and analyzing their costs.
The issue of the relative costs of synchronizations and asynchronies between levels, as well as that of the allocation of these costs in the evolution of joint regulation and institutions, is important for the capacity to innovate technically, socially and politically for the many, not just to generate new cooperative institutions for the self-segregated few. If members cannot reshape to some extent their structures of opportunity and constraints (for example, by building intermediary-level relational infrastructures or becoming vertical linchpins), they cannot participate in the redefinition of a recognized schedule, and therefore in the redistribution of the costs and gains of synchronization between the superposed levels of collective agency. More work is thus needed to measure synchronization costs (as approximations of social (in)capacities) in institutional entrepreneurship and new ways of understanding social inequalities that are based on such costs. This increasingly characterizes situations where one level becomes entirely transparent to the other, as in the case of online microworkers (Tubaro, 2019; Tubaro & Casilli, 2019). Modeling and understanding of this social discipline or social capital of the collective, both at the interorganizational and intraorganizational levels of agency, requires rich data and knowledge of the relational dimension of these processes. The study of social capital as a collective capacity is nevertheless confronted with the problem of the organized scarcity of data on interdependences and social discipline that are accessible to public academic research. Indeed, the production of fundamental knowledge on the meso-social level is not the exclusive prerogative of academic organizations. Public administrations (police, military) and private companies (BRT as network data platforms for marketing, strategic consulting, personnel management, and labor markets) keep building and exploiting relational databases that allow them to acquire a sophisticated knowledge of the economic and social interdependences among individual and/or organizational actors.
For example, contemporary social digitalization bureaucratizes social control by combining information from devices such as body sensors/captors with information from online relational profiles. This weakens control regimes based on concrete personal relationships, with possible societal consequences in terms of further limits to welfare protection or to political freedoms and institutional entrepreneurship (where they exist), both being likely to become conditional on acceptance of relational “intervention” (Valente, 2012; Lazega, 2015a). It is part of the responsibility of the public and open social sciences not to abandon to the private actors an increasingly systematic and closed knowledge of personal and organizational interdependences, social processes, and social capital as understood here. NSS and models of the dynamics of multilevel networks could help in understanding the current creation of new institutions (or change in older ones) to manage the coming (demographic, migratory, ecological) transitions and survival of societies in terms of access to vital resources (such as energy, clean water, food, or new technology). But they are also at the heart of digitalization as the latest phase of Weberian bureaucratization of society. Digital network analytical routines fed with data collected from intrusive privacy-killing social media technology will soon allow computer scientists and artificial intelligence to identify collegial settings and pockets in ways that may lead to manipulation or neutralization of the social processes and the relational infrastructures listed previously. This raises the prospect of undermined democratic institutional entrepreneurship and politics altogether (Al-Amoudi & Lazega, 2019; Archer, 2014). NSS and models of dynamics of multilevel networks point to many open questions that need to be addressed and uncharted territories that need to be explored.
Little progress will be made, even methodological, without a sound theoretical foundation. NSS attempts to provide this foundation by contextualizing these models. Models will remain misleading if the social sciences do not uphold a tradition of anthropological social network analyses, empirical research questioning and listening to actors on the ground, uncovering the relational infrastructures and social processes behind the phenomena in which they are interested. Whether it is about redefining commons, collective responsibility, coopetition, government, or even social stratification, exploring dynamic multilevel networks exposes social (in)capacities to build new laboratories for social change, including the issue of privatization of knowledge and the social (in)capacities to steer social change that come attached. For the worst not to be certain, much remains to be done to prevent social network analyses from becoming purely technocratic and bureaucratizing instruments of social engineering, including as a challenge for public sociology and its possible contribution to navigating future transitions.

References

Agneessens, F., & Wittek, R. (2012). Where do intra-organizational advice relations come from? The role of informal status and social capital in social exchange. Social Networks, 34, 333–345.
Al-Amoudi, I., & Lazega, E. (Eds.). (2019). Post-human institutions and organizations: Confronting the matrix. Abingdon: Routledge.
Archer, M. S. (1982). Morphogenesis versus structuration: On combining structure and action. British Journal of Sociology, 33, 455–483.
Archer, M. (Ed.). (2014). Late modernity: Trajectories towards morphogenic society. Dordrecht: Springer.
Blau, P. M. (1964). Exchange and power in social life. New York, NY: John Wiley.
Brailly, J. (2016). Dynamics of networks in trade fairs: A multilevel relational approach to the cooperation among competitors. Journal of Economic Geography, 16, 1279–1301.
Brailly, J., Comet, C., Delarre, S., Eloire, F., Favre, G., Lazega, E., . . . Varanda, M. (2018). Neostructural economic sociology beyond embeddedness: Relational infrastructures and social processes in markets and market institutions. Economic Sociology, the European Electronic Newsletter, 19(3).
Brailly, J., Favre, G., Chatellet, J., & Lazega, E. (2015). Embeddedness as a multilevel problem: A case study in economic sociology. Social Networks, 44, 319–333.
Brandes, U. (2016). Network position. Methodological Innovations, 9, 1–19.
Brass, D. J. (1984). Being in the right place: A structural analysis of individual influence in an organization. Administrative Science Quarterly, 29, 518–539.
Breiger, R. L. (1974). The duality of persons and groups. Social Forces, 53, 181–190.
Breiger, R. L. (Ed.). (1990). Social mobility and social structure. Cambridge, UK: Cambridge University Press.
Breiger, R. L. (2010). Dualities of culture and structure: Seeing through cultural holes. In J. Fuhse & S. Mützel (Eds.), Relationale Soziologie: Zur kulturellen Wende der Netzwerkforschung (pp. 37–47). Berlin: Springer.
Breiger, R. L. (2015). Scaling down. Big Data & Society. https://arizona.pure.elsevier.com/en/publications/scaling-down
Breiger, R. L., & Puetz, K. (2015). Culture and networks. In J. Wright (Ed.), International encyclopedia of social and behavioral sciences (2nd ed., pp. 557–562). Amsterdam: Elsevier.
Bruna, M.-G. (2013). Performance et diversité : une analyse sous l’angle des réseaux sociaux (Doctoral dissertation). Université de Paris–Dauphine.
Burt, R. (2005). Brokerage and closure: An introduction to social capital. New York, NY: Oxford University Press.
Coleman, J. S. (1990). Foundations of social theory. Cambridge, MA: Harvard University Press.
Comet, C. (2007). Capital social et profits des artisans du bâtiment: le poids des incertitudes sociotechniques. Revue française de sociologie, 48, 67–91.
Delarre, S. (2005). La reproduction des groupes d’entreprises comme entités socioéconomiques stables. Revue française de sociologie, 46, 115–150.
Donati, P. (2010). Relational sociology: A new paradigm for the social sciences. Abingdon: Routledge.
Durkheim, É. (1893). De la division du travail social (The division of labour in society) (W. D. Halls, Trans.). Paris, France: Félix Alcan.
Elias, N. (1991). The society of individuals. Oxford: Basil Blackwell.
Éloire, F. (2010). Une approche sociologique de la concurrence sur un marché: Le cas des restaurateurs lillois. Revue française de sociologie, 51, 481–517.
Favre, G. (2014). Des rencontres dans la mondialisation. Réseaux et apprentissages dans un salon de distribution de programmes de télévision en Afrique sub-saharienne (Doctoral dissertation). Université de Paris–Dauphine.
Favre, G., Brailly, J., Chatellet, J., & Lazega, E. (2016). Inter-organizational network influence on long term and short term inter-individual relationships: The case of a trade fair for TV programs distribution in sub-Saharan Africa. In E. Lazega & T. A. B. Snijders (Eds.), Multilevel network analysis: Theory, methods and applications (pp. 295–314). Dordrecht, Netherlands: Springer.
Favereau, O., & Lazega, E. (Eds.). (2002). Conventions and structures in economic organization: Markets, networks, and hierarchies. Cheltenham, UK: Edward Elgar Publishing.
Freeman, L. C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1, 215–239.
Glückler, J., Suddaby, R., & Lenz, R. (Eds.). (2018). Knowledge and institutions. Heidelberg, Germany: Springer.
Godechot, O. (2016). Wages, bonuses and appropriation of profit in the financial industry: The working rich. London: Routledge.
Godechot, O., Horton, J., & Millo, Y. (2019). Structural exchange pays off. Reciprocity in boards and executive compensations in US firms (1990–2015). Paris: MaxPo Discussion Paper, n°19.1.
Gould, R. V. (2002). The origins of status hierarchies: A formal theory and empirical test. American Journal of Sociology, 107, 1143–1178.
Granovetter, M. S. (1985). Economic action and social structure: The problem of embeddedness. American Journal of Sociology, 91, 481–510.
Grossetti, M. (2011). L’espace à trois dimensions des phénomènes sociaux. Echelles d’action et d’analyse. Sociologies. http://sociologies.revues.org/index3466.html
Hughes, E. C. (1945). Dilemmas and contradictions of status. American Journal of Sociology, 50, 353–359.
Ingram, P., & Roberts, P. W. (2000). Friendship among competitors in the Sydney hotel industry. American Journal of Sociology, 106, 387–423.
Kilduff, M., & Tsai, W. (2003). Social networks and organizations. Thousand Oaks, CA: Sage.
Kogut, B., & Zander, U. B. (1996). What firms do: Coordination, identity and learning. Organization Science, 7, 502–518.
Krackhardt, D. (1987). Cognitive social structures. Social Networks, 9, 109–134.
Krackhardt, D. (1990). Assessing the political landscape: Structure, cognition, and power in organizations. Administrative Science Quarterly, 35, 342–369. Ithaca, NY: Cornell University and Sage.
Laumann, E., & Knoke, D. (1987). The organizational state. University of Wisconsin Press.
Lazega, E. (1992). Micropolitics of knowledge. New York, NY: Aldine de Gruyter.
Lazega, E. (2000). Teaming up and out? Cooperation and solidarity in a collegial organization. European Sociological Review, 16, 245–266.
Lazega, E. (2001). The collegial phenomenon: The social mechanisms of cooperation among peers in a corporate law partnership. Oxford, UK: Oxford University Press.
Lazega, E. (2006). Capital social, processus sociaux et capacité d’action collective. In A. Bevort & M. Lallement (Eds.), Capital social: Echanges, réciprocité, équité (pp. 213–225). Paris, France: La Découverte.
Lazega, E. (2009). Theory of cooperation among competitors: A neo-structural approach. Sociologica, Italian Journal of Sociology Online, 1. doi:10.2383/29560
Lazega, E. (2012a). Sociologie néo-structurale. In R. Keucheyan & G. Bronner (Eds.), Introduction à la théorie sociale contemporaine (pp. 113–129). Paris, France: Presses Universitaires de France.
Lazega, E. (2012b). Analyses de réseaux et classes sociales. Revue Française de Socio-Economie, 10, 273–279.
Lazega, E. (2012c). Time to shrink to greatness? Networks and conflicts of interests in large professional firms. Revue für postheroisches Management, 10, 34–43.
Lazega, E. (2014). Appropriateness and structure in organizations: Secondary socialization through dynamics of advice networks and weak culture. In D. J. Brass, G. (J.) Labianca, A. Mehra, D. S. Halgin, & S. P. Borgatti (Eds.), Contemporary perspectives on organizational social networks: Research in the sociology of organizations (Vol. 40, pp. 381–402). Bingley, UK: Emerald Group Publishing.
Lazega, E. (2015a). Body captors and network profiles: A neo-structural note on digitalized social control and morphogenesis. In M. S. Archer (Ed.), Generative mechanisms transforming the social order (pp. 113–133). Dordrecht, Netherlands: Springer.
Lazega, E. (2016a). Synchronization costs in the organizational society: Intermediary relational infrastructures in the dynamics of multilevel networks. In E. Lazega & T. Snijders (Eds.), Multilevel network analysis: Theory, methods and applications (pp. 47–77). Dordrecht, Netherlands: Springer.
Lazega, E. (2016b). Réseaux et régulation: Pour un institutionnalisme néo-structural. Revue de la Régulation. Online. https://journals.openedition.org/regulation/11902
Lazega, E. (2017). Networks and commons: Organizational morphogenesis in the struggles to shape new sharing institutions. In M. S. Archer (Ed.), Morphogenesis and human flourishing (Vol. V). Dordrecht: Springer Verlag.
Lazega, E. (2018). Networks and institutionalization: A neo-structural approach (EUSN 2017 keynote address). Connections, 37, 7–22.
Lazega, E. (2019). Bottom-up collegiality, top-down collegiality, or inside-out collegiality? Research on multilevel and intermediary-level relational infrastructures as laboratories for social change. In G. Ragozini & M.-P. Vitale (Eds.), Challenges in social network research (pp. 17–31). Dordrecht: Springer.
Lazega, E. (2020a). Embarked on social processes in dynamic and multilevel networks (INSNA Sunbelt 2018 keynote address). Connections, 40, 60–76.
Lazega, E. (2020b). Bureaucracy, collegiality and social change: Redefining organizations with multilevel relational infrastructures. Cheltenham, UK: Edward Elgar Publishers.
Lazega, E., Bar-Hen, A., Barbillon, P., & Donnet, S. (2016). Effects of competition on collective learning in advice networks. Social Networks, 47, 1–14.
Lazega, E., & Jourda, M.-T. (2016). The structural wings of Matthew effects: The contribution of three-level network data to the analysis of cumulative advantage. Methodological Innovations, 9, 1–13.
Networks and Neo-Structural Sociology   67 Lazega, E., Jourda, M., & Mounier, L. (2013). Network lift from dual alters: Extended opportunity structures from a multilevel and structural perspective. European Sociological Review, 29, 1226–1238. Lazega, E., Jourda, M., Mounier, L., & Stofer, R. (2008). Catching up with big fish in the big pond? Multi-level network analysis through linked design. Social Networks, 30, 157–176. Lazega, E., & Krackhardt, D. (2000). Spreading and shifting costs of lateral control in a law partnership: A structural analysis at the individual level. Quality & Quantity, 34, 153–175. Lazega, E., Lemercier, C., & Mounier, L. (2006). A spinning top model of formal structure and informal behaviour: Dynamics of advice networks in a commercial court. European Management Review, 3, 113–122. Lazega, E., & Mounier, L. (2002). Interdependent entrepreneurs and the social discipline of their cooperation: A research program for structural economic sociology in a society of organizations. In O. Favereau & E. Lazega (Eds.), Conventions and structures in economic organization: Markets, networks and hierarchies (pp. 147–199). Cheltenham, UK: Edward Elgar. Lazega, E. & Mounier, L. (2012). Networks of institutional capture. In B. Vedres & M. Scotti (eds), Networks in Social Policy Problems (pp. 124–137), Cambridge: Cambridge University Press. Lazega, E., Mounier, L., Snijders, T., & Tubaro, P. (2012). Norms, status and the dynamics of advice networks. Social Networks, 34, 323–332. Lazega, E., & Pattison, P. (2001). Social capital as social mechanisms and collective assets: The example of status auctions among colleagues. In N. Lin, K. Cook, & R. Burt (Eds.), Social capital: Theory and research (pp. 185–208). New York, NY: Aldine-de Gruyter. Lazega, E., & Pattison, P. (1999). Multiplexity, generalized exchange and cooperation in or­gan­ i­za­tions. Social Networks, 21, 67–90. Lazega, E., Quintane, E., & Casenaz, S. (2016). 
Collegial oligarchy and networks of normative alignments in transnational institution building: The case of the European Unified Patent Court. Social Networks, 48, 10–22. Lazega, E., Sapulete, S., & Mounier, L. (2011). Structural stability regardless of membership turnover? The added value of blockmodelling in the analysis of network evolution. Quality & Quantity, 45, 129–144. Lazega, E., & Snijders, T. A. B. (Eds.). (2016). Multilevel network analysis for the social sciences: Theory, methods and applications (Methodos Series). Dordrecht, Netherlands: Springer. Lazega, E., & Van Duijn, M. (1997). Position in formal structure, personal characteristics and choices of advisors in a law firm: A logistic regression model for dyadic network data. Social Networks, 19, 375–397. Lazega, E., & Wattebled, O. (2011). Two definitions of collegiality and their inter-relation: The case of a Roman Catholic diocese. Sociologie du Travail, 53(Suppl. 1), e57–77. Lenski, G. E. (1954). Status crystallization: A non-vertical dimension of social status. American Sociological Review, 19, 405–413. Lévi-Strauss, C. (1949). Les Structures élémentaires de la parenté. Paris, France: PUF. Lévi-Strauss, C. (1978), Myth and Meaning, Toronto: Toronto University Press. Lomi, A., & Pattison, P. (2006). Manufacturing relations: An empirical study of the or­gan­i­za­ tion of production across multiple networks. Organization Science, 17, 313–332. Lusher, D., Koskinen, J., & Robins, G. (Eds.). (2013). Exponential random graph models for social networks: Theory, methods and applications. New York, NY: Cambridge University Press. Merton, R. K. (1959). Social theory and social structure. Glencoe, IL: Free Press. Molina, J. L., Martínez-Cháfer, L., Molina-Morales, F. X., & Lubbers, M. J. (2018). Industrial districts and migrant enclaves: a model of interaction. European Planning Studies, 26(6), 1160‒1180.

68   Emmanuel Lazega Montes-Linh, J. (2014). Apprentissages inter-organisationnels au sein des réseaux inter-­individuels: Le cas de la conversion de viticulteurs à l’agriculture biologique (Doctoral dissertation). Université Paris–Dauphine. Mitchell, J. C. (1969). The concept and use of social networks. In J. C. Mitchell (Ed.), Social Networks in Urban Situations. Analyses of Personal Relationships in Central African Towns (pp. 1–50). Manchester: Manchester University Press. Moody, J. (2001). Race, school integration, and friendship segregation in America. American Journal of Sociology, 107, 679–716. Moody, J. (2009). Static representations of dynamic networks. Duke Population Research Institute Online Working Paper Series, August. Moody, J., Brynildsen, W. D., Osgood, D. W., Feinberg, M. E., & Gest, S. (2011). Popularity trajectories and substance use in early adolescence. Social Networks, 33, 101–112. Olson, M., Jr. (1965). The logic of collective action. Cambridge, MA: Harvard University Press. Oubenal, M. (2015). La légitimation des produits financiers, Le cas des Exchange Traded Funds (ETF) en France. Rennes, France: Presses Universitaires de Rennes. Penalva-Icher, E. (2010). Amitié et régulation par les normes: Le cas de l’investissement socialement responsable. Revue française de sociologie, 51, 519–544. Perrow, C. (1991). A society of organizations. Theory and Society, 20, 725–762. Piña-Stranger, Á., & Lazega, E. (2010). Inter-organizational collective learning: The case of French biotech industry. European Journal of International Management, 4, 602–620. Quintane, E. (2013). Short-term and long term stability in organizational networks: Temporal structures of project teams. Social Networks, 35, 528–540. Reynaud, J.-D. (1989). Les Règles du jeu. Paris, France: Armand Colin. Robins, G. L., Woolcock, J., & Pattison, P. (2005). Small and other worlds: Global network structures from local processes. American Journal of Sociology, 110, 894–936. Saussure, L.-F. 
(1916). Cours de linguistique générale. Paris, France: Payot. Schultz, J., & Breiger, R. L. (2010). The strength of weak culture. Poetics: Journal of Empirical Research on Culture, the Media, and the Arts, 38, 610–624. Selznick, P. (1949). TVA and the grass roots: A study of politics and organization. Berkeley, CA: University of California Press. Selznick, P. (1957). Leadership in administration. Evanston, IL: Row, Peterson & Co. Škerlavaj, M., & Dimovski, V. (2006). Social network approach to organizational learning. Journal of Applied Business Research, 22, 89–97. Snijders, T. A. B. (2001). The statistical evaluation of social network dynamics. In M. Sobel & M. Becker (Eds.), Sociological methodology (pp. 361–395). London, UK: Basil Blackwell. Snijders, T. A. B. (2005). Models for longitudinal network data. In P. Carrington, J. Scott, and S.  Wasserman (Eds.), Models and methods in social network analysis. New York, NY: Cambridge University Press. pp. 215–247. Snijders, T. A. B. (2016). The multiple flavours of multilevel issues for networks. In E. Lazega & T. A. B. Snijders (Eds.), Multilevel network analysis: Theory, methods and applications (Methodos Series). Dordrecht, Netherlands: Springer. pp. 15–46. Snijders, T.  A.  B. (2017). Stochastic actor-oriented models for network dynamics. Annual Review of Statistics and Its Application, 4, 343–363. Snijders, T. A. B., & Bosker, R. (1999). Multilevel analysis. London, UK: Sage. Snijders, T. A. B., & Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14, 75–100.

Networks and Neo-Structural Sociology   69 Snijders, T. A. B., Lomi, A., & Torló, V. J. (2013). A model for the multiplex dynamics of twomode and one-mode networks, with an application to employment preference, friendship, and advice. Social networks, 35(2), 265–276. Amsterdam: Elsevier. Snijders, T. A. B., Pattison, P., Robins, G. L., & Handcock, M. (2006). New specifications for exponential random graph models. Sociological Methodology, 36, 99–153. Snijders, T. A. B., Steglich, C. E. G., & Schweinberger, M. (2007). Modeling the co-evolution of networks and behaviour. In K. van Montfort, H. Oud, & A. Satorra (Ed.), Longitudinal models in the behavioral and related sciences, 31:41–71. London: Routledge. Stryker, S. (1980). Symbolic interactionism: A social structural version. London, UK: Benjamin/ Cummings. Tilly, C. (1998). Durable inequality. Berkeley, CA: University of California Press. Tilly, C. (2007). Democracy. Cambridge, UK: Cambridge University Press. Tomaskovic-Devey, D. (2013). “What Might a Labor Market Look Like?”. Research in the Sociology of Work, 24, 45‒80. Tubaro, P. (2019). Décrypter la société des plateformes: organisations, marchés et réseaux dans l’économie numérique [Decyphering the platform society: organizations, markets and networks in the digital economy]. Mémoire d’habilitation à diriger des recherches, Institut d’Études Politiques de Paris. Tubaro, P., & Casilli, A. (2019). Micro-work, artificial intelligence and the automotive industry. Journal of Industrial and Business Economics, 46(3), 333–345. Uzzi, B. (1997). Social structure and competition in interfirm networks. The paradox of embeddedness. Administrative Science Quarterly, 42, 35–67. Valente, T. W. (2012). Network interventions. Science, 337, 49–53. Van Duijn, M. A. J., Snijders, T. A. B., & Zijlstra, B. J. H. (2004). P2: A random effects model with covariates for directed graphs. Statistica Neerlandica, 58, 234–254. Varanda, M. (2005). 
La réorganisation du commerce d’un centre-ville: résistance et obstacles à l’action collective. Paris, France: L’Harmattan. Vermeij, L., Van Duijn, M. A. J., & Baerveldt, C. (2009). Ethnic segregation in context: Social discrimination among native Dutch pupils and their ethnic minority classmates. Social Networks, 31, 230–239. Wang, P., Robins, G., Pattison, P., & Lazega, E. (2013). Exponential random graph models for multilevel networks. Social Networks, 35, 96–115. Wang, P., Robins, G., Pattison, P., & Lazega, E. (2016). Social selection models for multilevel networks. Social Networks, 44, 96–115. Wasserman, S., & Faust, K. (1994). Social network analysis, theory and applications. Cambridge, MA: Cambridge University Press. Waters, M. (1989). Collegiality, bureaucratization, and professionalization: A Weberian analysis. American Journal of Sociology, 94, 945–972. Weber, M. (1978 [1920]). Economy and society (Ed. G. Roth and C. Wittich). Berkeley, CA: University of California Press. Wellman, B., & Berkowitz, S. (Eds.). (1988). Social structures: A network approach. Cambridge, UK: Cambridge University Press. White, H.  C. (1970). Chains of opportunity: System models of mobility in organizations. Cambridge, MA: Harvard University Press. White, H. C. (1981). Where do markets come from? American Journal of Sociology, 87(3), 517–547.

70   Emmanuel Lazega White, H. C., Boorman, S., & Breiger, R. L. (1976). Social structure from multiple networks I. Blockmodels of roles and positions. American Journal of Sociology, 81, 730–780. Wittek, R. (1999). Interdependence and informal control in organizations. Groningen, Netherlands: University of Groningen Press. Wittek, R., Schimank, U., & Groß, T. (2007). Governance: A sociological perspective. In D. Jansen (Ed.), New forms of governance in research organizations (pp. 71–106). Dordrecht, Netherlands: Springer. Wittek, R., & van de Bunt, G. G. (2004). Post-bureaucratic governance, informal networks and oppositional solidarity in organizations. Netherlands’ Journal of Social Sciences, 40(3), 295–319. Žiberna, A. (2014). Blockmodeling of multilevel networks. Social Networks, 39, 46–61. Žiberna, A., & Lazega, E. (2016). Role sets and division of work at two levels of collective agency: the case of blockmodeling a multilevel (interindividual and interorganizational) network. In E. Lazega & T. A. B. Snijders (Eds.), Multilevel network analysis: theory, methods and applications (pp. 173‒209). Dordrecht: Springer. Zijlstra, B. J. H., Van Duijn, M. A. J., & Snijders, T. A. B. (2006). The multilevel p2 model—A random effects model for the analysis of multiple social networks. Methodology, 2, 42–47.

Chapter 5

Rethinking Social Networks in the Era of Computational Social Science

James A. Kitts and Eric Quintane

Social network analysis and theory has proliferated rapidly across the social sciences, shifting our analytical focus from individuals or groups to social relations or “ties.” Such ties have been conceptualized in four distinct ways. Theories of group process and team dynamics often depict social networks as patterns of sentiments, or thoughts and feelings in the minds of individuals directed at others, such as liking, hatred, or trust. Theorists interested in network positions as sources of power often think of ties as access, an enduring opportunity to obtain resources or information from another (even if the party never uses the opportunity). Theorists interested in social influence or contagion often think of networks as behavioral interaction that actually occurs between actors, such as temporally aggregated communication, support, gifts, sex, or citations (ignoring possible contacts that never occurred). Most classic work on social networks has measured role relations where two parties have a socially constructed relationship associated with distinct norms and expectations for role-related behavior, such as marriage, friendship, or coauthorship. Although these four approaches consider different kinds of theoretical objects, they similarly treat the network as a temporally continuous and enduring latent structure.1 An alternative perspective focuses on time-situated events linking actors, including interactions such as conversations, meetings, or transactions. This focus is growing in social network research because of two revolutions, one in data collection and one in data analysis. First, new telecommunications and sensor technologies allow researchers to collect data on events connecting actors with unprecedented volume and granularity. 
Electronic traces, such as logs of messages sent and received, telephone calls, meetings recorded on electronic calendars, exchanges in online commerce or sharing sites, or sensor data, allow researchers to observe social dynamics in fine time grain. In the second revolution, innovations in statistical methods are uniquely fit to make sense of these streaming data, by explicitly modeling temporal interdependence of relational events (Butts, 2008), including interaction behavior. This confluence of cutting-edge methods and dynamic behavioral data implies exciting frontiers of empirical research. However, it also challenges social network theory, which has been largely predicated on a view of networks as relatively stable configurations of interpersonal relationships.

After describing these two revolutions, we investigate the mapping between streaming interaction data and social network concepts to identify implications for network theories. We show how a deeper understanding of temporal dynamics can also enhance our understanding of traditional social network lenses. In particular, new frontiers inspire rigorous attention to scope conditions on theories developed for alternative network concepts, such as social influence network theory (Friedkin & Johnsen, 2011) for interaction networks, structural balance theory (Cartwright & Harary, 1956) for interpersonal sentiments, or network exchange theory (Cook et al., 1983) for structures of access or opportunity. We also describe nascent efforts to develop dynamic structural theories to fit the new breed of timestamped event data. This often means eschewing the concept of ties to focus on social processes operating in time.

Four Conceptualizations of Network Ties for Social Network Theory

We will elaborate an analytic typology developed by Kitts (2014), distinguishing four basic approaches to defining networks. The four usages of the social network concept offer building blocks for distinct theories, and all have supported decades of fruitful research. This typology certainly does not represent the approaches as mutually exclusive, and they often overlap empirically as illustrated in Figure 5.1. Some empirical ties may involve two or more of these features, but the four types remain distinct in theory because they apply to distinct theoretical mechanisms with corresponding scope conditions. Social network analysts often ignore the differences between these four conceptualizations, as if they are interchangeable, such that we can measure one and apply it to a theory developed for another. For example, they may use a measure of electronic messages sent and received (interaction) and apply a theory such as homophilous attraction or structural balance (which apply to sentiments) or centrality and power in exchange (which apply to structures of access and opportunity).

In briefly discussing these four traditions, we will highlight a couple of analytical issues that prove relevant under revolutionary modes of data collection and analysis. First, how does a network concept represent null ties, such as non-coauthors (role relation), impossible exchange partners (structure of access or opportunity), nonrecipients of phone calls (interaction), or nonliked others (sentiment)? Second, what is the connection of the social network concept to time? In particular, does the conceptualization of network ties assume temporal continuity? By continuity in time, we mean that for a given time interval on which a tie is defined, the tie is assumed to exist from start to finish including all time points within the interval.2 For example, if one person reports being a friend with another person in a given survey wave, this is assumed to be true for all time within the wave.

Figure 5.1  Four conceptualizations of social networks.
Behavioral Interactions: Face-to-face talk, sex, money lending, phone calls, citations, violence, electronic messages
Role Relations: Cultural label with relational norms: friends, teammates, comembers, coauthors, advisor/student, patron/client
Access: Opportunities for exchange, information, support (even if not used)
Interpersonal Sentiments: Individuals’ thoughts or feelings directed at another: liking, love, hatred, respect, trust

Social Ties as Access or Opportunity

A substantial body of theory assumes that social networks are opportunity structures, where a party has an enduring access to resources (Molm, Whitham, & Melamed, 2012) or information from another (even if the party never uses the opportunity). Knowing the set of alters accessible in such a way to any given ego allows a researcher to construct a graph of possible paths by which goods or information might flow among actors. Classic measures of closeness or betweenness centrality (Freeman, 1979) are defined on geodesics, or the shortest paths that can be traversed between any two nodes in a network. An actor connected to others by relatively short paths is assumed to have access to more accurate and timely information, and an actor who lies on many shortest paths of access among peers is assumed to have power to control flows of information in the network (Bonacich, 1987). Researchers in sociological exchange theory have offered valuable insights into the dynamics of exchange given exogenous structures of access or opportunity (Cook et al., 1983).

Although this usage is prevalent in network theories and this interpretation is imposed on social network data of all forms, it has been poorly aligned with observational social network data. There are not many cases where naturally collected data on interpersonal sentiments, role relations, or interaction are interpretable as structures of access or opportunity. Potential interaction partners may be difficult or impossible to observe empirically because most links may never be realized. For example, researchers may observe patterns of electronic messages sent or money lent among actors, but not the set of alters who could have sent messages or lent money. In fact, much of the empirical work that has aimed to investigate social networks as access to resources or information has been in the domain of laboratory experiments, where investigators can control such access. For example, a prominent contemporary online experiment (Centola, 2010) assigns network ties (“health buddies”) as information sent by the investigator to ego about alter’s choice (even where parties are anonymous and will never interact, no sentiments are felt, and ego never reads that information).3
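The geodesic-based measures named in this section can be made concrete with a small sketch. It assumes an undirected, unweighted, connected graph stored as an adjacency dictionary, and it computes closeness as (n - 1) divided by the sum of geodesic distances and betweenness as the number of shortest paths passing through a node. The function names are hypothetical, and the betweenness routine uses a standard dependency-accumulation procedure that is not taken from the chapter. Note that both measures treat every absent edge as a locus where flow is strictly impossible.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic (shortest-path) distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj):
    """Closeness: (number of reached alters) / (sum of geodesic distances)."""
    scores = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        total = sum(dist.values())
        scores[u] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def betweenness(adj):
    """Unnormalized betweenness for an undirected graph."""
    score = {u: 0.0 for u in adj}
    for s in adj:
        order = []                       # nodes in order of non-decreasing distance
        preds = {u: [] for u in adj}     # predecessors on shortest paths from s
        sigma = {u: 0 for u in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {u: 0.0 for u in adj}
        while order:
            w = order.pop()              # farthest nodes first
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted from both ends.
    return {u: b / 2.0 for u, b in score.items()}


# Toy access structure: a chain a - b - c - d - e.
chain = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
print(closeness(chain))
print(betweenness(chain))
```

In this chain the middle actor c has the highest score on both measures. If a and e could in fact contact each other directly, an opportunity that observational data would rarely reveal, those scores would overstate c's advantage.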

Social Ties as (Time-Aggregated) Behavioral Interactions

In many cases, researchers are not interested in where interaction could occur, but instead where two parties actually share overt behavior or one directs behavior at another. Theories of social influence (Friedkin & Johnsen, 2011) and diffusion (Valente, 1995) typically apply to interaction networks. Researchers studying the spread of HIV naturally focus on networks of risk behaviors, such as sex or needle sharing. Generative theories of social networks may also be driven by assumptions about interaction. For example, arguments that spatial and sociometric propinquity leads to formation of dyadic ties (C. C. Liu & Srivastava, 2015) and triad closure by chance (Stephens & Poorthuis, 2015; Lewis, Gonzales, & Kaufman, 2012) are driven by an assumption that ties are composed of behavior that occurs in time and physical or social space.

Classic social network analysis and theory has focused on temporally continuous and stable relationships as measured by sociometric surveys. By contrast, interaction behavior is explicitly rooted in time and so behavioral interaction data are typically ill-suited to this view of networks as timeless abstractions. To develop conventional social network data from interaction records, researchers interpret aggregations of observed or self-reported interaction histories (Heidler et al., 2014; Vargas, 2011) as temporally continuous relationships that may be amenable to social network analysis. Aggregation produces a set of dyadic event counts (such as the number of electronic messages sent within a month), and thus coarsens the observation of interaction, but does not change the nature of the data. They are still events. Dyadic event counts do not naturally imply continuity and thus do not resemble a network of relationships.

To make these data amenable to conventional social network analysis, scholars typically take a step beyond aggregation to convert those dyadic event counts into a set of “ties” that are assumed continuous in time. For example, researchers may define a friend as “a person whom the user has directed at least two posts to” (Huberman et al., 2009) or a communication network as a set of ties represented by at least one reciprocated phone call in a month (Eagle, Macy, & Claxton, 2010) or in 18 weeks (Onnela et al., 2007). Aggregating timestamped events into counts is a methodological choice to reduce the temporal pattern into a static constant (Brashears & Quintane, 2018); interpreting that temporally aggregated event count as a continuous “tie” or relationship is an ontological leap with deep implications for theory. This method may be repeated over successive time periods to derive a series of such structures, turning episodic temporal aggregations into panel network data.
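The two steps described in this section, aggregation into dyadic event counts and conversion of those counts into "ties", can be sketched as follows. This is a minimal illustration under stated assumptions: events are (sender, receiver, time) tuples with numeric times, a tie requires at least `min_each_way` events in both directions within a window (loosely modeled on the reciprocated-call rule mentioned above), and all function and parameter names are hypothetical rather than drawn from the cited studies.

```python
from collections import Counter


def dyadic_counts(events, start, end):
    """Count directed events (sender, receiver, time) with start <= time < end."""
    counts = Counter()
    for sender, receiver, t in events:
        if start <= t < end:
            counts[(sender, receiver)] += 1
    return counts


def reciprocated_ties(counts, min_each_way=1):
    """Turn directed event counts into undirected 'ties' by a threshold rule."""
    ties = set()
    for (i, j), c in counts.items():
        if i != j and c >= min_each_way and counts.get((j, i), 0) >= min_each_way:
            ties.add(frozenset((i, j)))
    return ties


def tie_panels(events, t0, width, n_panels, min_each_way=1):
    """Repeat the aggregation over successive windows to obtain panel data."""
    panels = []
    for k in range(n_panels):
        start = t0 + k * width
        counts = dyadic_counts(events, start, start + width)
        panels.append(reciprocated_ties(counts, min_each_way))
    return panels


# Toy event stream; times are in arbitrary units.
events = [
    ("a", "b", 1), ("b", "a", 3), ("a", "c", 5),
    ("a", "b", 12), ("c", "a", 14),
]
print(tie_panels(events, t0=0, width=10, n_panels=2))
print(tie_panels(events, t0=0, width=20, n_panels=1))
```

With two windows of width 10, only the a-b dyad counts as a tie, and only in the first window. With one window of width 20, both a-b and a-c count as ties, although the a-c messages were nine time units apart. The same event stream therefore yields different "networks" depending on the threshold and window chosen.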

Social Ties as Interpersonal Sentiments

Classic theories of network dynamics interpret social ties as interpersonal sentiments (Homans, 1950), which are thoughts or feelings that social actors have about others. As sentiments exist in the minds of individual perceivers, they are inherently directional. They may be positive (liking or trust), negative (hatred or jealousy), or neutral (awareness or acquaintance), and they may or may not be symmetric with sentiments felt by the other party. Sentiments may be interconnected with but distinct from social interaction. For example, according to theories of homophilous attraction (Byrne, 1961), individuals are drawn to peers who are similar to themselves, grow to like them, then choose them as interaction and relationship partners.

The triadic theory of structural balance (Cartwright & Harary, 1956; Marvel et al., 2011) is built on an assumption that ties represent positively or negatively valenced sentiments. Dissonance arises when positively tied actors disagree (or negatively tied actors agree) in the sign of their tie to a third actor. This dissonance leads to instability in the network, and resulting reconfigurations may lead to a more stable balanced state (where triads are either all positive or have a positively tied pair that is negatively tied to a third). In all of these cases, the “tie” exists in the subjective thoughts or feelings of the actor(s).
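The balance condition stated above has a compact arithmetic form: a triad is balanced when the product of its three signs is positive, which covers the all-positive triad and the triad with one positive pair jointly negative toward a third. The sketch below assumes, as a simplification not made in the text, that sentiments are symmetric and coded +1 or -1 on unordered pairs; the function names are hypothetical. Triads with any unsigned (null) pair are skipped, since the theory as stated here says nothing about them.

```python
from itertools import combinations


def triad_is_balanced(s_ij, s_jk, s_ik):
    """A fully signed triad is balanced when the product of its signs is positive."""
    return s_ij * s_jk * s_ik > 0


def unbalanced_triads(sign):
    """List triads whose three pairs are all signed and whose signs are unbalanced.

    sign maps frozenset({i, j}) to +1 or -1 (symmetric sentiments assumed).
    """
    nodes = sorted({actor for pair in sign for actor in pair})
    found = []
    for i, j, k in combinations(nodes, 3):
        pairs = [frozenset((i, j)), frozenset((j, k)), frozenset((i, k))]
        if all(p in sign for p in pairs):
            if not triad_is_balanced(*(sign[p] for p in pairs)):
                found.append((i, j, k))
    return found


# Toy sentiment network among four actors.
sign = {
    frozenset(("a", "b")): +1,
    frozenset(("b", "c")): +1,
    frozenset(("a", "c")): -1,
    frozenset(("a", "d")): -1,
    frozenset(("b", "d")): -1,
    frozenset(("c", "d")): +1,
}
print(unbalanced_triads(sign))
```

Here the triads (a, b, c) and (b, c, d) are flagged: in each, two positively tied actors disagree in the sign of their tie to the third.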

Social Ties as Socially Constructed Role Relations

Much social network research has depicted networks as sets of role relations, cultural labels assigned to dyads as distinct relationships (such as friendship, kinship, marriage, etc.) or shared involvement in some higher-order social unit (teammates, officemates, housemates, coauthors). These relations operate as relational norms, expectations, and repertoires for how actors behave within the role. For example, research on the meaning of friendship among adolescents (Kitts & Leal, 2020) shows that friendship is typically construed as relational norms, including behavioral prescriptions (e.g., “sticks up for me when others are against me”) and proscriptions (e.g., “would never hurt me”). Although we might thus infer relationships from patterns in role-related behavior, more often researchers simply ask individuals to identify their friends or other relations in surveys or interviews, or find archival data that records these relationships.

There is no theory about the social processes producing role relations generally (applying across disparate categories of friendship, kinship, combatant, plaintiff-defendant, and vendor-client relations), and few theories even for more specific role relations. However, because role relations are simple to measure with surveys or archival data, they are often used as proxies for any of the other three types of ties. Theories developed to explain the dynamics of interpersonal sentiments, access, or behavioral interaction have been applied to role relation data (such as friendship or coauthorship), even though these theories have nothing to say directly about role relations.

Comparing These Four Conceptualizations

Treatment of Ties and Null Ties

Researchers have conflated theories and data across these four distinct conceptualizations of social networks with impunity because they have focused exclusively on ties. It is instructive to redirect our attention to null ties to interrogate the mapping of one network concept to another. Work based on structures of access or opportunity (including theories of structural power and centrality in exchange networks) is predicated on an interpretation of null ties as loci where interaction is strictly impossible. It is rarely recognized that this implies a weak or ambiguous interpretation of ties, where interaction might occur (but might not). If we represent the network as a sociomatrix, most of the analytical leverage comes from a strong interpretation of the zeros. Any uncertainty about those zeros is a fundamental problem: a long path through social ties may be unimportant if the first node could simply contact the last node directly (as if traversing null ties). Uncertainty about the null ties in a network propagates along each step in a path, such that long paths (and metrics that depend on them) become questionable.

In the case of sentiments and especially role relations, the focus is entirely on ties, so null ties are poorly defined and rarely observed in any explicit way. For example, if a survey allows students to identify their best friends among students in their school, we have some insight into the alters who are nominated as best friend but little information about the alters not so nominated. In fact, the non-best friends could be enemies, strangers, siblings, lovers, neighbors, or just very good friends. The problem is not peculiar to friendship and is shared for a great variety of role relations as well as for sentiments. We may make some reasonable assumptions about the behavior of a coauthor, client, or friend-with-benefits relation, but how do we interpret non-coauthor, non-client, or non-friend-with-benefits? Investigating null ties in role relation and sentiment network measures is an open frontier for future research, and we will discuss this frontier later. Until we have more purchase on what null ties mean for a given role relation or sentiment, we should be very careful in applying theories or metrics that depend importantly on null ties when using role relation or sentiment data.

Defining networks as temporally aggregated interaction histories offers a concrete, clear, and strong definition of social ties: we know whether or how much a given mode of interaction (email, phone conversation, sex, lending) has been observed on this time interval. Null ties in temporally aggregated interaction data also have an explicit interpretation: interaction of this form did not occur between these actors during this time interval. However, null ties still cannot carry the theoretical weight that they carry for structures of access. Taking aggregated interaction as a proxy for access requires us to interpret lack of observed interaction on a time interval as observed impossibility of interaction on that interval.

Temporality Ties conceived as role relations, sentiments, and access all have had little relationship to time in previous work. Until recently, almost all social networks research involved a timeless analysis of a fixed structure of relations. All ties were assumed to operate continuously for the duration of the research. This assumption of continuity at the tie level implies two related properties at the graph level, stability and concurrency. By network stability, we mean an absence of network change on the interval. By concurrency, we mean that all ties within the period of continuity are simultaneously active, and thus all paths described by the graph can be traversed for the entire interval. Accordingly, we can employ a centrality or centralization metric, a clustering coefficient, or an index of homophily, modularity, or structural cohesion to describe the network structure. Longitudinal studies were rare until recently and have typically applied the same timeless lenses to repeated cross-sections. Within each time interval, all ties are assumed to be present (or absent) continuously and concurrently, even as the network changes between discrete time steps. It is rarely possible to identify a start or stop time for a role relation, sentiment, or access. An exception, such as legal marriage, may have a well-defined beginning and ending, but those time points may have little to do with the underlying sentiments and behaviors of matrimony. A couple may love each other and live together years before marrying, and they may move out and hate each other a long time before divorcing. Similarly, coauthorship is a role relation that typically implies interaction over time, but interaction involved in developing the article may be separated by years from the observed publication time, coauthors may not even work on the project at the same time, and some may not work on it at all. 
Unlike most interpretations of sentiments, role relations, and access structures, interaction behavior—such as conversations, money lending, sex, and fighting—occurs at specific times. But the canon of social network theory, developed largely for static networks, has little to say about time. Thus, although we see the temporality implicit in interaction data as an opportunity to develop more nuanced processual theories of social networks, scholars have historically treated time as an ignorable nuisance by aggregating interaction data over time, interpreting the resulting data as cross-sectional networks just like those collected through sociometric surveys. More recent research on social interaction has aggregated events into coarse panels, defining a tie as a quantity of interaction exceeding some threshold (e.g., at least n phone calls in a month) and defining a null tie as a lesser quantity of interaction in that time interval. Let us consider the implications of the width of those time intervals, a seemingly innocuous methodological choice. The wider the time interval monitored, the more events may be included in each interval, the greater likelihood that social ties will be identified, and the denser the network. Shrinking intervals of aggregation breaks event

series into separate intervals, reduces within-interval event counts, eliminates ties, and reduces network density. In the limit as intervals shrink, the network disappears altogether. By contrast, temporally continuous relationships will not be eliminated by shrinking the width of panels. Even an instantaneous snapshot of a network of relationships is still a network. Observing interaction event histories situated in continuous time without temporal aggregation may produce sparse and unstable ties with only vacuous null ties. Aggregating events in time gives more stable impressions of ties and more meaningful interpretations of null ties. The tradeoff is that, in developing these robust images of network structure by aggregation, researchers also destroy details of the sequence and timing of interaction. Interestingly, constructing networks from dynamic interaction data necessarily implies these tradeoffs between observing the dynamic process (by dissolving the network) and observing the structure (by obscuring the process). In this chapter, we point to new analytical lenses that resolve this dilemma through explicit dynamic structural analysis of relational events.
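The dependence of network density on the width of the aggregation interval can be illustrated with a minimal Python sketch. The event list, the actor labels, and the function names `panel_densities` and `mean_density` are hypothetical and chosen only for illustration; a tie is assumed to be an undirected dyad with at least `threshold` events inside a panel.

```python
def panel_densities(events, actors, width, threshold=1):
    """Density of each panel network built from timestamped events.

    events: list of (time, actor_i, actor_j) tuples (toy data, assumed).
    width: panel length, in the same units as the timestamps.
    threshold: minimum number of events in a panel for a dyad to count as a tie.
    """
    n_dyads = len(actors) * (len(actors) - 1) // 2
    horizon = max(t for t, _, _ in events)
    panels = [{} for _ in range(int(horizon // width) + 1)]
    for t, i, j in events:
        dyad = frozenset((i, j))              # treat interaction as undirected
        panel = panels[int(t // width)]       # panel that contains this event
        panel[dyad] = panel.get(dyad, 0) + 1
    return [sum(n >= threshold for n in panel.values()) / n_dyads
            for panel in panels]


def mean_density(events, actors, width, threshold=1):
    """Average panel density over the whole observation period."""
    densities = panel_densities(events, actors, width, threshold)
    return sum(densities) / len(densities)


actors = ["A", "B", "C", "D"]
events = [(0.5, "A", "B"), (1.5, "B", "C"), (2.5, "A", "B"),
          (3.5, "C", "D"), (5.5, "A", "C"), (7.5, "B", "D")]

for width in (8, 2, 0.25):
    print(width, round(mean_density(events, actors, width), 3))
```

With these toy events, a single wide panel links five of the six dyads, while narrower panels yield progressively sparser networks, and very narrow panels leave most intervals empty. The numbers carry no empirical meaning; the sketch only makes the direction of the effect concrete.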

Dilemmas of Mapping Theories to Data across Discrepant Conceptualizations of Networks

Given that research has so rarely acknowledged the discrepancies between these four ways of conceptualizing networks, there has been hardly any attention to the conditions under which we can apply a theory developed for one conceptualization to data drawn from another. Consequently, researchers often face a mismatch between the form of their network data and the network theory they aim to employ. Social influence or diffusion theories developed for behavioral interaction have been applied to role relation data, such as modeling contagion on a friendship network or social influence among coworkers. Structural balance theory and homophilous attraction, developed for sentiment structures, have been applied to socially constructed role relations or even behavioral interaction networks such as electronic messages or phone calls. Similarly, theories of network exchange and structural power, developed for structures of access or opportunity, have been applied to role relations or behavioral interaction data. Let us examine some challenges for crossing these boundaries in connecting theory to data and then think about scope conditions for when it may be acceptable to do so. It seems that there are 12 possible ways that theory and data can mismatch across these four conceptualizations (each of the four can supply the theory while any of the other three supplies the data), but we focus on the five ways that are common in the literature. Given the ease of measuring role relations using surveys (and the lack of general theories for role relations), early researchers generally used role relations as proxies to apply theories of interaction, access, and sentiments. In more recent years, especially with the advent of digital trace data (Golder & Macy, 2014; Malik, 2018), interaction data are readily available with little effort or expense, so researchers use them as proxies to apply theories of access and sentiments. 
We focus on these five cases where researchers commonly employ a theory from one network conceptualization to data derived from another.


Can We Use Role Relation Data to Investigate Theories of Social Interaction, Access, and Sentiments?

Interpreting a role relation graph (kin, friend, lover, classmate, teammate, coworker, comember, neighbor) as an interaction graph is making dual assumptions that the specific role relation implies interaction and that the lack of that specific role relation implies noninteraction. For many relations, the former assumption may be questionable, while the latter is often absurd. People do not always interact with their friends, neighbors, or parents, and we cannot assume that non-friends, non-neighbors, or non-kin never interact. Thus, a role relation network will at best offer a subset of the true structure of interaction, and some key role relations may not represent regular interaction at all. If researchers interpret null ties in the role relation graph as null ties in the interaction graph, this likely induces “false negatives” where interaction occurred despite an absent role relation. Rather than conventionally assuming this risk away, let us begin to explore scope conditions where it may be reasonable to assume that a nonrelationship does in fact reflect noninteraction. Consider a small bounded population with a high baseline level of interaction, such as a school classroom. In this context it seems indefensible to assume that non-best-friends never interact with one another. By contrast, if the network boundary is very broad and the baseline level of interaction very low, such as in a large city, then it is more reasonable to assume that non-best-friends never interact. Stated more generally, interpreting a role relation network as an interaction network is more problematic if the network boundary is set around a population with a high expectation of interaction. We avoid repetition here, but all of these points apply at least as strongly for interpreting role relation networks as access networks. 
In light of this challenge, it seems hard to motivate a study of structural power (e.g., depicting a central actor as lying on the shortest paths among peers) for a network of close friends in a classroom or in a small team of employees, because a theory of access places great weight on null ties, which carry little weight when the researcher has only measured a specific role relation in a population with high expectation of interpersonal access. This is a key challenge to a large body of work that has applied such theories, concepts, and metrics for specific role relations in classrooms, teams, or ­organizational units, where access is practically ubiquitous across the population. We need similar scope conditions for applying role relation data to theories of sentiments. Consider applying structural balance theory to friendship networks among youth. Sentiments can be challenging to measure, especially negative sentiments. Many scholars interested in positive and negative sentiments have measured the role relation friendship (taken to represent positive sentiments) and assumed that peers not mentioned as friends or best friends are in fact negative ties. For example, Hallinan (1974) measured best friends among students and implicitly assumed that alters not nominated as best friends operated as negative sentiment ties, to apply structural balance theory. Following this logic, over four decades of studies have appealed to structural balance theory as an explanation for triad closure or transitivity in positive-sentiment relations (such as friendship). For example, Wimmer and Lewis (2010) describe a force to close triads among Facebook friends by structural balance, which requires the assumption that individuals who are not Facebook friends are joined by a negative tie. Much work follows from Holland and Leinhardt (1971), who argued that null ties (and even asymmetric positive ties) in positive sentiment

networks can be interpreted as negative ties for the purpose of deriving transitive closure from balance theory. Later work following from theirs extended this argument to role relations (Hallinan, 1974). To be sure, transitivity and triad closure may be extremely prevalent in role relations of many kinds, but structural balance theory cannot explain this pattern unless null ties in the role relation (e.g., non-coauthor) are interpretable as negative sentiments. Here we may offer an explicit scope condition for applying structural balance theory to role relation networks with an implication of positive mutual affect, like friendship. In a small group of densely interacting friends where nearly everyone is friends with everyone else, an anomalous pair of actors who do not call each other friends may be perceived as having negative sentiments toward one another. Or a pair of actors who have always called each other friends but cease doing so at a particular point in time may be perceived as having negative sentiments. In either case, it is the violated expectation of a positive role relation that implies negative sentiments.

Can We Use Aggregated Social Interaction Data to Investigate Theories of Access and Social Sentiments?

Given the ubiquitous challenges of measuring networks of access or opportunity outside the laboratory, researchers have often interpreted temporally aggregated interaction networks as access to apply theories of structural power or centrality metrics based on path lengths. However, because those theories rely on a strong interpretation of null ties, they are generally difficult to apply to aggregated interaction data. First is the problem that interactions may have occurred in some other mode not measured (say, two students may not send each other emails on the university server because they communicate on Gmail or Instagram instead, so their lack of measured emails does not represent lack of access). A person is unlikely to follow a long path of past email partners to say something to a teammate or classmate who is immediately accessible some other way. Second is the interplay between time aggregation and the interpretation of null ties, which are crucial to theories of access. If aggregated interaction data are already hard to defend as a measure of impossibility of interaction, aggregating interactions into shorter time intervals further weakens this interpretation of null ties and thus makes the interpretation of interaction as access even more untenable. Simply aggregating interaction into brief intervals will make the entire population seem inaccessible, and without aggregation raw interaction event data generally cannot be represented as access at all. We previously discussed how structural balance theory has been inappropriately used to account for triad closure in role relations. Recently researchers have appealed to structural balance theory to explain the phenomenon of triad closure in temporally aggregated behavioral interaction networks (e.g., if i tends to send electronic messages to j, and j sends to k, then i also tends to send to k). 
The link to the classic theory is tempting, but recall that the motivational drive in the theory is the dissonance created by the (negative) valence of ties. Applications of structural balance theory to positive or neutral social interaction data such as email communication networks (Kossinets & Watts, 2009) must assume that noninteraction on a time interval is equivalent to a temporally continuous negative sentiment tie on that interval, an assumption that at least needs to be defended. We believe this slippage from a theory for positive and negative sentiments to an analysis of neutral interaction events is a telling example of why the careful thought in this chapter is needed.

In a contrasting example, Kitts (2010) identifies scope conditions under which structural balance theory may be applied to triad closure in social interaction data (i.e., conditions in which noninteraction might be plausibly interpreted as negative sentiments): envision an interaction form that is socially construed as reflecting positive sentiments (e.g., children playing together, adults having each other over for dinner with family), operating in small groups with a very high baseline level (high expectation) of social interaction. In those conditions, for dyads where such interaction is anomalously missing, the parties may be perceived as avoiding each other and implicitly having a negative tie. Similarly, if two parties interact regularly and then cease interacting together, this could imply or signal a negative sentiment. In either case, it is the tension between the expectation of interaction and surprising lack of interaction that implies a negative tie to social perceivers, and implied or perceived negativity does have implications for structural balance theory. By contrast, for a large university community, the absence of emails exchanged between any two students does not imply a negative sentiment to anyone because it does not violate any expectation. Thus, in those cases we have no justification to use structural balance theory to explain triad closure in networks of positive or neutral interactions. Such patterns could instead be explained more parsimoniously through propinquity (i.e., proximity in physical or social space), through shared memberships in classes, clubs, or other foci (Feld, 1981), as a byproduct of homophily (Goodreau, Kitts, & Morris, 2009) or status hierarchy (Feld & Elmore, 1982). In the next two sections we will introduce a pair of revolutions, in data and methods. We first describe the implications of new sources of streaming data for each of the network conceptualizations defined earlier. 
Then we present a methodological revolution that allows us to go beyond interpreting aggregated interactions as ties, and instead to directly investigate structural-temporal interdependencies in behavior.

A Revolution in Data Collection: Computational Social Science

Decades of previous research on interaction networks were constrained to either manually observe interaction behavior (and then aggregate it into “social ties”) or to ask survey questions about typical interaction partners (such that survey respondents mentally aggregate over events). Tools of computational social science allow direct measurement of social interaction behavior in continuous time, employing a new universe of timestamped interaction data, which form the behavioral substrate of what we call social networks. These data include traces of electronic messaging and interactions using social media, as well as streaming data from location-aware devices, fixed and wearable sensors, biometric monitoring, and the like.4 Several recent articles have highlighted the key theoretical, methodological, or institutional challenges in using such digital trace data in the social sciences (e.g., Golder & Macy, 2014; Kitts, 2014; Lazer & Radford, 2017). Many scholars have represented aggregated interaction event data as social networks, applying the same theories developed for understanding sociometric nomination patterns. In the following section, we ask how these new sources of data correspond to the four traditional conceptualizations in social network theory and analysis.


Computational Social Science and Role Relations

Contact lists such as those on Facebook, Google, or LinkedIn should be classified minimally as role relations in the aforementioned typology because they are temporally continuous dyadic roles defined by the social media platform. Like other role relations, the data are convenient but only weakly correspond to theoretically motivated network concepts. Some links on some platforms may imply interaction, sentiments, or access, but this is likely to be idiosyncratic and inconsistent, so it is difficult to fit such data to scope conditions of any behavioral theory without extensive processing. For example, while some Facebook contacts might indeed interact socially or have relationships with one another deeper than a contact list on the website, many or most such online contacts apparently never interact (even on the website itself), are not mutually acquainted, and some are not even people. If Facebook pages can represent not only persons but also organizations, clubs, babies, cats, or deceased people, then we are unlikely to find a meaningful social or behavioral theory that applies to networks of this kind. Early research on online relationships (e.g., Adamic & Adar, 2003) tended to examine sparse networks where most actors had only one or two connections, which researchers assumed were close friends or colleagues. Connections on social media years later have become less costly (usually just a click of a mouse), so they are both more numerous and less significant (Dunbar, 2016; Golder, Wilkinson, & Huberman, 2007; Huberman, Romero, & Wu, 2009; Yang & Yu, 2014), with some ties connecting complete strangers. To derive an interpretable social network, scholars have used a variety of ways to carve out a subset of a digital contact list that plausibly fits a conventional social tie concept. For example, Golder et al. 
(2007) recommend filtering the set of Facebook friends to identify only those who send each other electronic messages or comment on each other’s materials. Their goal is to remove noninteracting “friends” and thus make these ties more like offline friendships (which they assume include communication). Similarly, Gilbert and Karahalios (2009) use exchange of photos as well as public and private messages to identify tie strength in Facebook relationships. Wimmer and Lewis (2010) aim to restrict the list of Facebook friends to substantive relationships by only selecting friends from one cohort at the same college, and regard the friendship as existing (one-way) only after one party tags the other in a photo posted to Facebook. This restriction implies that they are genuine people who have some face-to-face interaction.5 Xie et al. (2012) similarly propose ways to equate offline relationships with patterns of posting on Twitter. We see that researchers may develop a composite measure intersecting an arguably ambiguous role relation (such as Facebook friends) with temporally aggregated behavioral interactions to find a more interpretable relationship. Even so, such composite measures are problematic for null ties, as they impose a strong equivalence assumption between total strangers and close friends who fail to interact on the platform in a given time interval, which may not always be defensible.
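The logic of such a composite measure, and the equivalence it imposes on null ties, can be made concrete with a short Python sketch. This is not the procedure of any study cited above; the names, the data, and the `min_messages` threshold are hypothetical.

```python
def composite_ties(contacts, messages, min_messages=1):
    """Keep contact-list dyads that also show observed interaction.

    contacts: set of frozenset({a, b}) dyads from a platform contact list.
    messages: list of (sender, receiver) events in one time interval.
    min_messages: assumed threshold of events needed to retain a tie.
    """
    counts = {}
    for sender, receiver in messages:
        dyad = frozenset((sender, receiver))
        counts[dyad] = counts.get(dyad, 0) + 1
    return {dyad for dyad in contacts if counts.get(dyad, 0) >= min_messages}


contacts = {frozenset(p) for p in [("ann", "bo"), ("ann", "cy"), ("bo", "cy")]}
messages = [("ann", "bo"), ("bo", "ann"), ("ann", "dee")]

ties = composite_ties(contacts, messages)
# Only ann-bo survives. ann-cy (listed contacts, no messages in the interval),
# ann-dee (messages, but not listed contacts), and any pair of total strangers
# are all coded identically as null ties.
print(ties)
```

The sketch shows why the null tie is the weak point of the composite: three substantively different kinds of dyads collapse into the same absent edge.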

Computational Social Science and Sentiments

Although most researchers have focused on the wealth of electronic trace data representing social interactions, innovations in computational social science can also provide digital data on interpersonal sentiments. We might naively assume that a tag such as a “friend” or a “like” on

Facebook entails a positive sentiment, but users may employ those tags for a variety of reasons and even the meaning of a “like” is unclear (Sumner, Ruge-Jones, & Alcorn, 2017). Interpersonal sentiments are not commonly available in digital form, although particular social media may invite users to rate each other positively or negatively (Leskovec et al., 2010; State, Abrahao, & Cook, 2016). A challenging but rewarding approach is to analyze text in messages or posts using natural language processing tools for sentiment analysis (Stieglitz & Dang-Xuan, 2013; B. Liu, 2015; Pozzi et al., 2016). For example, text analysis of social media posts allows for identification of emotional expressions and examination of temporal patterns in moods (Dodds et al., 2011; Golder & Macy, 2011) and also for identification of dyadic similarity in sentiments toward social objects to drive recommender systems (Yang, Huang, & Wang, 2017). Sentiment analysis of Facebook status updates (Coviello et al., 2014) has allowed observation of emotional contagion on networks of Facebook contacts. The same tools can be applied to text directed from one actor to another to provide measures of interpersonal sentiments. Fuhse et al. (2020) analyze transcribed verbal and nonverbal reactions to speech (e.g., laughter, applause, objections) as recorded in parliamentary proceedings to identify sentiments of alliance and conflict among parties, and O’Connor, Stewart, and Smith (2013) analyzed news reports of events to infer valence in international relations. An exciting but less developed frontier is collecting electronic trace data on sentiments using biometric sensors. 
For example, researchers may monitor galvanic skin response, pupil dilation, or heart rate (Palaghias et al., 2016; Salah et al., 2011); employ brain imaging (O’Donnell & Falk, 2015); or monitor hormones in saliva or urine samples such as oxytocin (Doom, Doyle, & Gunnar,  2017; Grebe et al.,  2017) or cortisol and testosterone (Ketay, Welker, & Slatcher, 2017; Kornienko et al., 2014; Mehta et al., 2017). They may also automatically monitor sentiment-related nonverbal behavior, such as eye gaze or body posture (Schmid Mast et al., 2015), response latency (Iyengar & Westwood, 2015); analyze speech features in audio recordings (Gu et al.,  2017; Rachuri et al.,  2010); use accelerometers to detect laughter (Hung, Englebienne, & Kools, 2013); use chest bands to monitor breathing patterns during conversations (Rahman et al.,  2011; Ejupi & Menon,  2018); use infrared thermography to infer emotions from facial microexpressions (Clay-Warner & Robinson, 2015); or reflect radio-frequency signals off of the body to detect emotional states through physiological responses (Zhao, Adib, & Katabi, 2016).

Computational Social Science and Behavioral Interactions

Among the most commonly collected data forms attributed to computational social science are records of behavioral interactions: timestamped logs of phone calls (Eagle et al., 2010; Onnela et al., 2007; Raeder et al., 2011), emails (Kleinbaum, Stuart, & Tushman, 2013; Kossinets & Watts, 2009), electronic calendar meetings (Lovett et al., 2010), or credit card purchases (Dong et al., 2018); online arenas for dating (Lin & Lundquist, 2013), gaming (Szell & Thurner, 2010), resource sharing (State, Abrahao, & Cook, 2016), writing reviews and commenting on them (Goldberg, Hannan, & Kovacs, 2016), exchanging questions and answers (Vu, Pattison, & Robins, 2015), and crowdsourced editing (Crandall et al., 2008). Face-to-face interactions have been similarly monitored using wearable sensors that detect conversations (Harari et al., 2019; Nakakura, Sumi, & Nishida, 2009; Wyatt et al., 2008) or

physical proximity (Eagle, Pentland, & Lazer, 2009; Pachucki et al., 2015) and mutual orientation (Jang et al., 2017). The advent of location-aware devices allows for inference of interaction events by colocation (Lee et al., 2013) or mobility patterns as sequences of locations (Cranshaw et al., 2010). In parallel to the emergence of new sources of digital data coming directly from computer-mediated interactions, the digitalization of more traditional sources of interaction data has also enabled access to fine-grained temporal information about persons and organizations offline. For example, radio communication transcripts (Butts, Petrescu-Prahova, & Cross, 2007), citations (Peng, 2015; Shwed & Bearman, 2010; Foster, Rzhetsky, & Evans, 2015), and gang member fatal shootings (Papachristos, Hureau, & Braga, 2013) also provide timestamped event records of interaction. The availability of timestamped interaction data offers rigorous ways to operationalize classic social network concepts such as tie strength. These methods range from simple, such as the total time spent in phone calls (Onnela et al., 2007) or the frequency of two-party face-to-face conversations (Wyatt et al., 2008) during a time interval, to very complex, such as automatically collected measures of interaction intensity, intimacy, reciprocal giving, emotional support, and relational duration (Gilbert & Karahalios, 2009). Similarly, continuous monitoring of interaction using sensors allows new measures of node position in networks. Wyatt et al. (2008) use sensor-recorded speech to develop a temporally sensitive measure of closeness centrality, considering how information flows through paths of conversations (where nodes are “closer” to the extent that they are connected by short paths with more talking time and “farther” to the extent that they are connected by long paths with little talking time). 
It may also allow researchers to link these social positions to fine-grained interaction behavior. For example, Wyatt et al. (2008) find that individuals change their speech patterns (syllable rate, vocal pitch, speaking turn length) when talking with peers who are central in the network of temporally aggregated conversations and when speaking through “weak ties” (infrequent conversations) in that network. Of course, we challenge such literatures to stretch these ideas a step further, beyond ties and networks, to directly theorize, measure, and model the structural-temporal dependencies in social behavior.6
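The simplest of the tie strength measures mentioned above, total talk time and contact frequency within a time interval, can be sketched in a few lines of Python. The call log format and the function name `tie_strength` are assumptions made for illustration and do not reproduce the measures of the cited studies.

```python
from collections import defaultdict


def tie_strength(calls, start, end):
    """Count calls and sum talk time per undirected dyad within [start, end).

    calls: list of (time, caller, callee, seconds) records (assumed format).
    """
    strength = defaultdict(lambda: {"n_calls": 0, "total_seconds": 0})
    for t, caller, callee, seconds in calls:
        if start <= t < end:
            dyad = tuple(sorted((caller, callee)))   # ignore call direction
            strength[dyad]["n_calls"] += 1
            strength[dyad]["total_seconds"] += seconds
    return dict(strength)


calls = [(1, "a", "b", 60), (2, "b", "a", 120), (3, "a", "c", 30),
         (40, "a", "b", 600)]
print(tie_strength(calls, start=0, end=30))
```

Note that the result depends on the chosen interval: the long call at time 40 falls outside the window used here, so the same dyad would look much stronger under a wider window.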

Computational Social Science and Structures of Access or Opportunity

Many core social network concepts, measures, and theories are predicated on the notion of the network as a structure of access or opportunities. This has been easy to implement in experiments, where researchers can control who is able to interact with or receive information from whom, but hardly any observational study actually measures null ties as complete prohibition of relevant interaction. That interpretation is especially untenable in small-scale contexts such as face-to-face interaction among classmates or close colleagues. Thus, implementation of access networks in observational empirical research is a largely open frontier, where computational social science promises to deliver new insights. There are two avenues by which computational social science may lead to important theoretical developments. First, digital data may mitigate the dearth of research on structures of access or opportunity by providing new ways to monitor users’ access to information on a network. Although the form of data may be idiosyncratic and specific to particular social media, in many cases a medium may allow or require users to define privacy preferences, identifying what

information is public or private, and even identify discrete sets or social circles of alters who are privy to different levels of shared information, or ability to exchange messages (Anthony, Campos-Castillo, & Horne, 2017; Golder & Macy, 2014; Lewis, 2011). This could be developed in further work to give natural field observation of networks more directly interpretable as access to information. The second avenue represents a deeper rethinking of the concept of access. In interpreting observed interaction networks as access, researchers have strayed from the crucial scope condition that gives access networks analytical leverage, that is, that relevant interaction is strictly impossible through null ties, requiring information or resources to travel through other actors in the network. Ironically, some of these same researchers have data relevant to the concept of access. For example, they may have measures of the physical proximity of actors in a network, or the fact that some actors share a primary language and others do not, or that some actors may be more available to one another because of their work schedules, or the locations of their offices or homes, or their positions in the organizational chart. These variables constrain availability for interaction and are often a more faithful representation of the access concept than is the observed interaction network. But they are typically treated either as control variables, a nuisance to be ruled out as researchers investigate some other structural force (like assortative mixing or triad closure), or as determinants of the network (such as the effect of physical distance on the odds of forming a tie). These features of the interaction context may instead be seen as composing a latent access network. 
As Kitts (2014) has argued, the next step is to extend the concept of networks-as-access to latent ties with continuous rather than discrete values (where interaction is more or less costly, risky, or difficult) and to refine the theories and measures to reflect this step. Operationalizing the access network as a structure of impediments, challenges, and barriers to interaction could mean building a network without conventional sociometric data like role relations or interaction records, but instead measuring and modeling the structure of access. Researchers may apply theories of access (such as structural power or network exchange) to those data on distance or impediments, rather than using questionable proxies based on role relations or observed interaction. The new developments in computational social science, especially the advent of location-aware devices and geotagging of locations for interaction and routine activities, may often offer such fine-grained measures of relatively exogenous access networks. Rather than implementing null ties as strict prohibition of interaction, we can model a continuum like transaction costs or friction, which makes interaction more or less feasible. This generalization will facilitate applying theories and concepts of network exchange to behavioral interaction outside the laboratory, where impediments to exchange may be relative rather than absolute. Such impediments make some exchanges costly or undesirable, but not strictly impossible.
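One way to picture an access network with continuous rather than binary ties is as a matrix of pairwise interaction costs. The Python sketch below is an illustration under strong assumptions, not a method proposed in the chapter: friction is taken to be physical distance, multiplied by a hypothetical `language_penalty` when two actors share no language, and least-cost routes are found with the standard Floyd-Warshall algorithm.

```python
import math


def access_costs(positions, languages, language_penalty=3.0):
    """Pairwise friction: distance, inflated when no language is shared.

    positions: {actor: (x, y)}; languages: {actor: set of languages}.
    language_penalty is a hypothetical multiplier, not an estimated quantity.
    """
    actors = sorted(positions)
    cost = {}
    for i in actors:
        for j in actors:
            if i == j:
                cost[i, j] = 0.0
                continue
            distance = math.dist(positions[i], positions[j])
            shared = bool(languages[i] & languages[j])
            cost[i, j] = distance if shared else distance * language_penalty
    return actors, cost


def least_cost_paths(actors, cost):
    """Floyd-Warshall: cheapest route for every pair, possibly via others."""
    best = dict(cost)
    for k in actors:
        for i in actors:
            for j in actors:
                if best[i, k] + best[k, j] < best[i, j]:
                    best[i, j] = best[i, k] + best[k, j]
    return best


positions = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}
languages = {"a": {"en"}, "b": {"en", "es"}, "c": {"es"}}

actors, cost = access_costs(positions, languages)
best = least_cost_paths(actors, cost)
print(cost["a", "c"], best["a", "c"])
```

In this toy setting no dyad is strictly prohibited from interacting, yet direct exchange between `a` and `c` is costlier than routing through the bilingual actor `b`. Brokerage thus emerges from relative rather than absolute impediments.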

A Revolution in Data Analysis: From Aggregating to Modeling Relational Events

The emerging continuous measures of interaction challenge us to consider more deeply issues of temporal dynamics in underlying social processes. Preliminary efforts extend previous work on networks composed of temporally aggregated events. Moody, McFarland, and

Bender-deMoll (2005) demonstrate how aggregating behavior in slices of time allows networks to be tracked over time visually. Kossinets and Watts (2009) construct network snapshots based on temporally aggregated email exchanges, with a variety of different thresholds to define ties, then investigate changes in these networks over time. Martin (2009) proposes a “quantitative ethology” in which interpersonal dominance acts are aggregated within discrete periods to describe the development of status hierarchies among adolescents. Statistical modeling approaches also typically operate on temporally aggregated data. For example, researchers use exponential family random graph models (ERGMs) to explore forces that underlie structural patterns in static networks (Butts et al., 2014; Cranmer & Desmarais, 2010; Goodreau et al., 2009; Harris, 2013; Lewis, 2016). Stochastic actor-oriented models (SAOMs) or temporal exponential family random graph models (TERGMs) aim to use discrete time network panel data to make inferences about generative processes that account for the changing structure (Leifeld & Cranmer, 2015; Snijders, Van de Bunt, & Steglich, 2010). However, these methods do not offer a framework for directly modeling fine-grained interdependent dynamics of relational behavior. Temporal aggregation obscures generative dynamics because it necessarily removes a wealth of information about timing or sequence, as well as changes in the composition of nodes in the network. Diffusion paths that appear in aggregated networks may be impossible if the sequencing or timing of interaction prevents transmission. For example, if B interacts with C and then A interacts with B, the path from A to B to C appears in a temporally aggregated network but is meaningless given the constraints of timing and sequence (Moody, 2002). 
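The A, B, C example above can be checked directly with a short Python sketch that compares reachability in the aggregated network with reachability along time-ordered paths. The function names are hypothetical, interactions are treated as undirected, and timestamps are assumed to be distinct.

```python
def static_reachable(events, source):
    """Nodes reachable from `source` once events are aggregated into ties."""
    neighbors = {}
    for _, i, j in events:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    seen, stack = {source}, [source]
    while stack:
        for nxt in neighbors.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {source}


def temporal_reachable(events, source):
    """Nodes reachable from `source` along paths that respect event order.

    Assumes distinct timestamps; simultaneous events would need extra care.
    """
    reached = {source}
    for _, i, j in sorted(events):            # process events in time order
        if i in reached or j in reached:
            reached.update((i, j))
    return reached - {source}


# B interacts with C at time 1, then A interacts with B at time 2.
events = [(1, "B", "C"), (2, "A", "B")]

print(static_reachable(events, "A"))    # aggregated network: A reaches B and C
print(temporal_reachable(events, "A"))  # time-ordered: A reaches only B
print(temporal_reachable(events, "C"))  # C can still reach B and then A
```

The asymmetry is the point: something starting at C can reach A, but something starting at A cannot reach C, and the aggregated network hides that difference.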
Indeed, producing a static network by aggregating over a sequence of interactions, such as sex contacts in a high school (Bearman, Moody, & Stovel, 2004), can severely distort social processes such as contagion and diffusion. Aggregation also assumes that all actors were available during the entire time interval. Hence, aggregating interactions and then using conventional tools to analyze and interpret the networks is likely to be misleading. Notably, aggregating timestamped event data into panel data to use tools like SAOM or TERGM destroys the very fine-grained information that could provide direct insight into the processes researchers are trying to model. By retaining the event data, future work can lead to theoretical insights that illuminate mechanisms of change.7

Recent developments in statistical methodology make it possible to preserve the temporal information present in relational event data. Inspired by event history analysis, the relational event modeling (REM) framework (Butts, 2008) was developed as a way to statistically capture temporal dependencies inherent in behavioral interaction data while respecting coarser social interdependencies between actors. These models predict the occurrence of the next event in a sequence of events, where past events form a context that shapes the propensity for future events. The framework allows specification of statistics that capture social processes, such as reciprocation or transitive closure. Such social interdependencies violate the conventional assumption of independent observations for cross-sectional networks but are not a problem in the REM framework due to the temporal ordering of events. Each event is taken to be independent of the others, conditional on the realized history of events. Butts (2008) employs relational event modeling to describe radio communications in the World Trade Center disaster.
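To illustrate how past events can form a context for the next event, the following Python sketch scores an ordered event sequence. It is a toy under stated assumptions rather than the REM framework itself: it uses only the order of events (ignoring exact timing), treats every ordered pair of distinct actors as a candidate next event, and relies on two hypothetical statistics, inertia and reciprocity, with hypothetical weights `theta`. The function name `sequence_log_likelihood` is likewise invented for this example.

```python
import math
from itertools import permutations


def sequence_log_likelihood(events, actors, theta):
    """Log-likelihood of an ordered event sequence under a toy ordinal model.

    events: list of (sender, receiver) pairs in the order they occurred.
    actors: all actors who could send or receive (assumed fixed here).
    theta:  (inertia_weight, reciprocity_weight), hypothetical parameters.
    """
    past = {}  # (sender, receiver) -> number of earlier events on that dyad
    total = 0.0

    def score(i, j):
        inertia = past.get((i, j), 0)      # how often i has sent to j before
        reciprocity = past.get((j, i), 0)  # how often j has sent to i before
        return theta[0] * inertia + theta[1] * reciprocity

    for sender, receiver in events:
        # Every ordered pair of distinct actors is a candidate next event.
        log_norm = math.log(sum(math.exp(score(i, j))
                                for i, j in permutations(actors, 2)))
        total += score(sender, receiver) - log_norm
        # Only after being scored does the event join the history.
        past[(sender, receiver)] = past.get((sender, receiver), 0) + 1
    return total


events = [("A", "B"), ("B", "A"), ("A", "B")]
actors = ["A", "B", "C"]
ll_null = sequence_log_likelihood(events, actors, (0.0, 0.0))
ll_recip = sequence_log_likelihood(events, actors, (0.0, 1.0))
```

Because each event is scored conditional on the events before it, the log-probabilities simply add up. In this made-up sequence, a positive reciprocity weight fits better than the null model in which all candidate events are equally likely.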
Variants of this model have been applied to a variety of interaction forms: computer-mediated group interactions (Leenders, Contractor, & DeChurch, 2016; Pilny et al., 2016), online discussions in Massive Open Online Courses (Vu et al., 2015) and computer games (Schecter et al., 2018), phone calls among students (Pilny et al., 2017), cosponsorship of bills in Congress (Brandenberger, 2018), international events coded from news reports (DuBois & Smyth, 2010), and patient transfers among hospitals (Kitts et al., 2017).

Alternative frameworks or extensions have been proposed to serve specific needs of researchers in handling sequences of relational events. For example, recent efforts to combine REM and SAOM aim to model the coevolution of event sequences with individual-level attributes or behaviors (Stadtfeld & Block, 2017; Stadtfeld, Hollway, & Block, 2017). Alternatives or extensions incorporate exogenous events and multiple event types at the ego level (Marcum & Butts, 2015), combine multiple event sequences through hierarchical modeling (DuBois, Butts, McFarland, & Smyth, 2013), or integrate stochastic blockmodeling with relational event models (DuBois, Butts, & Smyth, 2013). Finally, slightly different approaches to modeling relational event sequences have been presented through multilevel modeling (de Nooy, 2011) or alternative modifications of linear or logistic regression models (de Nooy, 2015; Doogan & Warren, 2017; Vu et al., 2011).

In traditional dynamic network analysis (Snijders et al., 2010), social processes are assumed to follow the same form of time dependence, which remains stable over historical time. A key advantage of modeling event sequences is that researchers can explore how relational events may depend on past events over distinct time horizons, aligning the analytical method with the time grain of the social processes. Static properties of networks such as reciprocity and transitivity can be recast as dyadic and triadic patterns of history dependence in events (e.g., i may be more likely to give to j again if j reciprocates quickly vs. slowly, or if j has a long history of reciprocating to another actor, k).
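A brief sketch may help show what it means to recast reciprocity as history dependence over distinct time horizons. The Python below is illustrative only: the events, the two-unit boundary between short and long term, the half-life, and the function names `windowed_count` and `decayed_weight` are assumptions made for this example, not values or procedures taken from any cited study. It summarizes j's past giving to i in three ways: a short-term count, a long-term count, and a single score in which older events count for less.

```python
import math

# Hypothetical timestamped directed events: (time, sender, receiver).
events = [(0.0, "j", "i"), (9.0, "j", "i"), (10.0, "i", "j")]


def windowed_count(events, sender, receiver, now, min_age, max_age):
    """Count earlier sender->receiver events with min_age <= age < max_age."""
    return sum(
        1
        for t, s, r in events
        if s == sender and r == receiver and t < now
        and min_age <= now - t < max_age
    )


def decayed_weight(events, sender, receiver, now, half_life):
    """Sum earlier sender->receiver events, halving each weight per half_life."""
    return sum(
        0.5 ** ((now - t) / half_life)
        for t, s, r in events
        if s == sender and r == receiver and t < now
    )


now = 10.0  # the moment at which i decides whether to give to j
short_term = windowed_count(events, "j", "i", now, 0.0, 2.0)
long_term = windowed_count(events, "j", "i", now, 2.0, math.inf)
weighted = decayed_weight(events, "j", "i", now, half_life=1.0)
```

The windowed counts keep recent and distant reciprocation as separate predictors, at the cost of choosing a boundary. The decayed score avoids a boundary but requires choosing a decay rate instead.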
Authors have used different assumptions about history dependence, specifically the retrospective time horizon that may affect current events. Butts (2008) considers that only the most recent event (or the previous event of a certain type) can be used to predict the next event. Kitts et al. (2017) show how short-term and long-term history dependence may reflect distinct processes, moving beyond reciprocity as a property of ties to theorize reciprocation as nuanced patterns of history dependence. Some authors also weight events by their distance in retrospective time, such as by a continuous decay in importance (Amati, Lomi, & Mascia, 2019; Brandes, Lerner, & Snijders, 2009; Leenders et al., 2016). These approaches have benefits and drawbacks: Selecting only the most recent event or a certain kind of event severely constrains the questions that can be asked about history dependence. Distinguishing the short term from the long term allows that processes may operate differently over these time horizons but requires the researcher to distinguish the two horizons for substantive or theoretical reasons. Time weighting avoids this choice, but researchers must instead specify a decay function. (For an exploration of the boundary between discrete-horizon and continuous decay approaches, see Kitts et al., 2017, p. 876.)

The temporal expression of a given social process may be affected by the context. Researchers typically apply the concepts of inertia, reciprocity, or transitive closure in the same way, regardless of the type of underlying relational data. For example, triad closure is applied in different contexts such as interpersonal sentiments (e.g., Newcomb, 1961), radio communication during emergency situations (e.g., Butts et al., 2007), or interorganizational alliances (Gulati & Gargiulo, 1999).
In these cases, the time horizons of history dependence may span years for strategic alliances between firms, months for changes in interpersonal sentiments, and minutes or seconds for radio communications. Recognizing these different time horizons might enable us to theorize more about temporal shapes of dyadic and triadic history dependence for different modes of interaction. For example, in a longitudinal study of the coevolution of multiple modes of interaction (phone calls, coworking, social visits, wearable sensor measures of dyadic and multiparty face-to-face conversations), Kitts (2010) teases apart propinquity and structural balance as explanations for triad closure by exploiting the theorized time horizons of these processes. This analysis compares interaction forms implying colocation, like multiparty face-to-face conversations and team meetings, with isolated interaction, like phone calls and dyadic meetings, and also compares sentiment-laden interaction forms like social visits with affectively neutral interaction forms like working together. The comparison across modes of interaction reveals stronger triad closure on a short time scale to the extent that interaction joins third parties into shared locations, consistent with propinquity theory. Consistent with balance theory, triad closure increases over a long time scale, but only for interaction modes laden with affect.

In closing, the analytical lenses described here can be extended and applied to discover the interplay of dynamic social interaction patterns with coarser socially constructed relationships, interpersonal sentiments, or access over time, all largely unexplored frontiers. With the advent of computational social science, we now see some researchers monitoring and modeling several of these phenomena simultaneously. For example, researchers analyze perceived friendships based on self-reports in tandem with measures of interaction using phone call records (Pilny et al., 2017) or Bluetooth sensors (Oloritun et al., 2013). Further, Eagle et al. (2009) infer friendships based on spatiotemporal patterns of proximity and communication (i.e., phone calls, close proximity off campus in evenings and on weekends) and compare those to self-reported friendships.
Zhang and Butts (2017) infer friendships and group comemberships in the same data using correlations of activity sequences in dyads, as detected by mobile devices. Park and Kim (2017) analyze interaction events (gift exchange) overlaid on a network of “friends” in an online community. West et al. (2014) examine interpersonal evaluations among Wikipedia editors (positive, neutral, or negative votes for each other’s candidacy for administrator) in tandem with sentiment analysis of their comments. Stadtfeld and Pentland (2015) examine the interplay between two forms of role relation (romantic partnerships and friendships) using longitudinal survey data that is paired with mobile phone data, enabling future research on how multiple role relations interoperate with behavioral interaction. Wyatt et al. (2011) combine surveys with automatic conversation detection using audio; Bahulkar et al. (2017) combine surveys with records of calls and text messages and Bluetooth measures of colocation; Matic, Osmani, and Mayora-Ibarra (2012) use a combination of proximity using Wi-Fi signals and relative body orientation using embedded accelerometers, orientation sensors, and gyroscopes. Such research often presents one measure as ground truth for the purpose of evaluating the other as a measure of the true social network. Although that is a fruitful step, we expect new data and methods will lead to fundamental discoveries resulting from investigating how role relations, interactions, sentiments, and access interrelate in time.

Acknowledgments
Research reported in this publication was supported by the National Institutes of Health (NICHD) under award number R01HD086259. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funders. The authors thank Kevin Lewis and Diego Leal for helpful comments.

Notes
1. To bound our discussion, we consider social networks as sets of social actors and links between them (in behavior, sentiments, access, or socially constructed roles). Social network analysis can also be applied to so-called two-mode networks that could be seen as conventional rectangular data, such as persons and their opinions, theaters and the films they show, or documents and their topics. Some such networks may be projected to form conventional social networks (such as partners in a conversation or comembers of a team). In other cases, one-mode projections do not form social networks as we use the term here, such as persons linked by having similar opinions, theaters linked by showing similar films, or documents linked by sharing topics. We focus here on four coherent domains of research, where all nodes represent social actors and all links represent behavior, sentiments, access, or roles connecting actors. We do not consider mere similarity to be a social tie (cf. Bail, 2016; Borgatti et al., 2009).
2. Gross and Jansa (2017) refer to this property of continuity as “persistence.” Our approach follows Kitts (2014) in using continuity because persistence may be confused with another dimension of temporality: the duration of a tie. The assumption of tie continuity is orthogonal to duration, as a tie may exist continuously on a brief interval.
3. That study prohibits contact, communication, observability, or even personal information about the peer to ensure that all network ties are anonymous strangers who will never directly encounter or have feelings for one another; they either receive a single email notice for each peer that joins an online forum (Centola, 2010) or they receive updates about a peer’s ongoing health behavior (Centola, 2011). In this pure application of networks-as-access, interaction and sentiments are eliminated as potentially confounding factors in estimating the causal effect of social ties (as access). Later online experiments (e.g., Centola et al., 2018) induce interaction among research subjects, in the form of one-shot coordination games with a series of different partners. The players never meet, communicate, or learn anything about each other, so once again this design aims to cleanly investigate a single dimension of network ties.
4. The term computational social science is used to represent multiple fields of inquiry that are as yet hardly integrated. One of these fields (see Macy & Willer, 2002) focuses on computational modeling of social processes, aiming to clarify theory through formalization, often without empirical data. Another of these fields focuses on methods derived from computer science and machine learning for analyzing socially relevant data, such as natural language processing of text (Bail, 2016; Evans & Aceves, 2016), often agnostic to social scientific theories or data sources. A third usage (see Lazer et al., 2009; Golder & Macy, 2014; Salganik, 2019) describes new sources of digital trace data, often agnostic to either theory or analytical method. In this chapter we refer to the third movement (defined on data sources and data collection) but suggest that the other computational social science communities could as well add purchase to the challenges identified in this chapter.
5. That said, the lead author just mentioned to one Facebook-using friend that Facebook pages could as well be cats or dead people, and she responded by showing him her dead cat’s Facebook page. Princess has been dead for seven years but still has 28 Facebook friends (including 27 persons and one dead dog), is tagged in photos with them, and continues to send and receive messages, posts, comments, and likes after her death. Conventional social network analysis may be applied to these data, but no general behavioral theory applies to such phenomena.

6. For example, whereas the approach by Wyatt et al. obscures sequential constraints by aggregating over time, Falzon et al. (2018) discuss extensions and applications that explicitly take the sequence of interactions into account.
7. Even when relational events are aggregated, information in the timestamped data may be exploited before aggregation to yield new insights. For example, Oloritun et al. (2012) examine aggregated interaction networks collected using Bluetooth sensors but differentiate structures of short interactions from lengthy interactions, arguing that short and long interactions reflect different generative processes, producing distinct structures.

References
Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211–230.
Amati, V., Lomi, A., & Mascia, D. (2019). Some days are better than others: Examining time-specific variation in the structuring of interorganizational relations. Social Networks, 57, 18–33.
Anthony, D. L., Campos-Castillo, C., & Horne, C. (2017). Toward a sociology of privacy. Annual Review of Sociology, 43(1), 249–269.
Bahulkar, A., Szymanski, B. K., Chan, K., & Lizardo, O. (2017). Coevolution of a multilayer node-aligned network whose layers represent different social relations. Computational Social Networks, 4(1), 11.
Bail, C. A. (2016). Combining natural language processing and network analysis to examine how advocacy organizations stimulate conversation on social media. Proceedings of the National Academy of Sciences, 113(42), 11823–11828.
Bearman, P. S., Moody, J., & Stovel, K. (2004). Chains of affection: The structure of adolescent romantic and sexual networks. American Journal of Sociology, 110(1), 44–91.
Bonacich, P. (1987). Power and centrality: A family of measures. American Journal of Sociology, 92(5), 1170–1182.
Borgatti, S. P., Mehra, A., Brass, D., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323(5916), 892–895.
Brandenberger, L. (2018). Trading favors—Examining the temporal dynamics of reciprocity in congressional collaborations using relational event models. Social Networks, 54, 238–253.
Brandes, U., Lerner, J., & Snijders, T. A. B. (2009). Networks evolving step by step: Statistical analysis of dyadic event data. In International Conference on Advances in Social Network Analysis and Mining (pp. 200–205), IEEE.
Brashears, M. E., & Quintane, E. (2018). The weakness of tie strength. Social Networks, 55, 104–115.
Butts, C. T. (2008). A relational event framework for social action. Sociological Methodology, 38(1), 155–200.
Butts, C. T., Morris, M., Krivitsky, P. N., Almquist, Z., Handcock, M. S., Hunter, D. R., . . . Bender-deMoll, S. (2014). Introduction to exponential-family random graph (ERG or p*) modeling with ERGM. Florence, Italy: European University Institute.
Butts, C. T., Petrescu-Prahova, M., & Cross, B. R. (2007). Responder communication networks in the World Trade Center disaster: Implications for modeling of communication within emergency settings. Journal of Mathematical Sociology, 31(2), 121–147.
Byrne, D. (1961). Interpersonal attraction and attitude similarity. Journal of Abnormal and Social Psychology, 62(3), 713.

Cartwright, D., & Harary, F. (1956). Structural balance: A generalization of Heider’s theory. Psychological Review, 63(5), 277–293.
Centola, D. (2010). The spread of behavior in an online social network experiment. Science, 329(5996), 1194–1197.
Centola, D. (2011). An experimental study of homophily in the adoption of health behavior. Science, 334(6060), 1269–1272.
Centola, D., Becker, J., Brackbill, D., & Baronchelli, A. (2018). Experimental evidence for tipping points in social convention. Science, 360(6393), 1116–1119.
Clay-Warner, J., & Robinson, D. T. (2015). Infrared thermography as a measure of emotion response. Emotion Review, 7(2), 157–162.
Cook, K. S., Emerson, R. M., Gillmore, M. R., & Yamagishi, T. (1983). The distribution of power in exchange networks: Theory and experimental results. American Journal of Sociology, 89(2), 275–305.
Coviello, L., Sohn, Y., Kramer, A. D. I., Marlow, C., Franceschetti, M., Christakis, N. A., & Fowler, J. H. (2014). Detecting emotional contagion in massive social networks. PLoS One, 9(3), e90315.
Crandall, D., Cosley, D., Huttenlocher, D., Kleinberg, J., & Suri, S. (2008). Feedback effects between similarity and social influence in online communities. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 160–168). New York, NY: ACM.
Cranmer, S. J., & Desmarais, B. A. (2010). Inferential network analysis with exponential random graph models. Political Analysis, 19(1), 66–86.
Cranshaw, J., Toch, E., Hong, J., Kittur, A., & Sadeh, N. (2010). Bridging the gap between physical location and online social networks. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing (pp. 119–128). New York, NY: ACM.
de Nooy, W. (2011). Networks of action and events over time. A multilevel discrete-time event history model for longitudinal network data. Social Networks, 33(1), 31–40.
de Nooy, W. (2015). Structure from interaction events. Big Data & Society, 2(2).
Dodds, P. S., Harris, K. D., Kloumann, I. M., Bliss, C. A., & Danforth, C. M. (2011). Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter. PLoS One, 6(12), e26752.
Dong, X., Suhara, Y., Bozkaya, B., Singh, V. K., Lepri, B., & Pentland, A. S. (2018). Social bridges in urban purchase behavior. ACM Transactions on Intelligent Systems and Technology (TIST), 9(3), 33.
Doogan, N. J., & Warren, K. (2017). A network of helping: Generalized reciprocity and cooperative behavior in response to peer and staff affirmations and corrections among therapeutic community residents. Addiction Research & Theory, 25(3), 243–250.
Doom, J. R., Doyle, C. M., & Gunnar, M. R. (2017). Social stress buffering by friends in childhood and adolescence: Effects on HPA and oxytocin activity. Social Neuroscience, 12(1), 8–21.
DuBois, C., Butts, C. T., McFarland, D., & Smyth, P. (2013). Hierarchical models for relational event sequences. Journal of Mathematical Psychology, 57(6), 297–309.
DuBois, C., Butts, C., & Smyth, P. (2013). Stochastic blockmodeling of relational event dynamics. In Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (pp. 238–246).
DuBois, C., & Smyth, P. (2010). Modeling relational events via latent classes. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 803–812). New York, NY: ACM.

Dunbar, R. I. (2016). Do online social media cut through the constraints that limit the size of offline social networks? Royal Society Open Science, 3(1), 150292.
Eagle, N., Macy, M., & Claxton, R. (2010). Network diversity and economic development. Science, 328(5981), 1029–1031.
Eagle, N., Pentland, A. S., & Lazer, D. (2009). Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences, 106(36), 15274–15278.
Ejupi, A., & Menon, C. (2018). Detection of talking in respiratory signals: A feasibility study using machine learning and wearable textile-based sensors. Sensors, 18(8), 2474.
Evans, J. A., & Aceves, P. (2016). Machine translation: Mining text for social theory. Annual Review of Sociology, 42, 21–50.
Falzon, L., Quintane, E., Dunn, J., & Robins, G. (2018). Embedding time in positions: Temporal measures of centrality for social network analysis. Social Networks, 54, 168–178.
Feld, S. L. (1981). The focused organization of social ties. American Journal of Sociology, 86(5), 1015–1035.
Feld, S. L., & Elmore, R. G. (1982). Patterns of sociometric choices: Transitivity reconsidered. Social Psychology Quarterly, 45(2), 77–85.
Foster, J. G., Rzhetsky, A., & Evans, J. A. (2015). Tradition and innovation in scientists’ research strategies. American Sociological Review, 80(5), 875–908.
Freeman, L. C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1(3), 215–239.
Friedkin, N. E., & Johnsen, E. C. (2011). Social influence network theory: A sociological examination of small group dynamics (Vol. 33). New York: Cambridge University Press.
Fuhse, J., Stuhler, O., Riebling, J., & Martin, J. L. (2020). Relating social and symbolic relations in quantitative text analysis. A study of parliamentary discourse in the Weimar Republic. Poetics, 78.
Gilbert, E., & Karahalios, K. (2009). Predicting tie strength with social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 211–220). New York, NY: ACM.
Goldberg, A., Hannan, M. T., & Kovács, B. (2016). What does it mean to span cultural boundaries? Variety and atypicality in cultural consumption. American Sociological Review, 81(2), 215–241.
Golder, S. A., & Macy, M. W. (2011). Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science, 333(6051), 1878–1881.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology, 40, 129–152.
Golder, S. A., Wilkinson, D. M., & Huberman, B. A. (2007). Rhythms of social interaction: Messaging within a massive online network. In Steinfield, C., Pentland, B. T., Ackerman, M., & Contractor, N. (Eds.), Communities and technologies (pp. 41–66). London, UK: Springer.
Goodreau, S. M., Kitts, J. A., & Morris, M. (2009). Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks. Demography, 46(1), 103–125.
Grebe, N. M., Kristoffersen, A. A., Grøntvedt, T. V., Thompson, M. E., Kennair, L. E. O., & Gangestad, S. W. (2017). Oxytocin and vulnerable romantic relationships. Hormones and Behavior, 90, 64–74.
Gross, J. H., & Jansa, J. M. (2017). Relational concepts, measurement, and data collection. In Victor, J. N., Montgomery, A. H., & Lubell, M. (Eds.), The Oxford handbook of political networks (pp. 175–201). New York: Oxford University Press.

Gu, J., Gao, B., Chen, Y., Jiang, L., Gao, Z., Ma, X., . . . Jin, J. (2017). Wearable social sensing: Content-based processing methodology and implementation. IEEE Sensors Journal, 17(21), 7167–7176.
Gulati, R., & Gargiulo, M. (1999). Where do interorganizational networks come from? American Journal of Sociology, 104(5), 1439–1493.
Hallinan, M. T. (1974). A structural model of sentiment relations. American Journal of Sociology, 80(2), 364–378.
Harari, G. M., Müller, S. R., Stachl, C., Wang, R., Wang, W., Bühner, M., . . . Gosling, S. D. (2019). Sensing sociability: Individual differences in young adults’ conversation, calling, texting, and app use behaviors in daily life. Journal of Personality and Social Psychology. Advance online publication. http://dx.doi.org/10.1037/pspp0000245
Harris, J. K. (2013). An introduction to exponential random graph modeling (Vol. 173). Thousand Oaks, CA: Sage Publications.
Heidler, R., Gamper, M., Herz, A., & Eßer, F. (2014). Relationship patterns in the 19th century: The friendship network in a German boys’ school class from 1880 to 1881 revisited. Social Networks, 37, 1–13.
Holland, P. W., & Leinhardt, S. (1971). Transitivity in structural models of small groups. Comparative Group Studies, 2(2), 107–124.
Homans, G. C. (1950). The human group. New York, NY: Harcourt, Brace & World.
Huberman, B. A., Romero, D. M., & Wu, F. (2009). Social networks that matter: Twitter under the microscope. First Monday, 14(1).
Hung, H., Englebienne, G., & Kools, J. (2013). Classifying social actions with a single accelerometer. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing (pp. 207–210). New York, NY: ACM.
Iyengar, S., & Westwood, S. J. (2015). Fear and loathing across party lines: New evidence on group polarization. American Journal of Political Science, 59(3), 690–707.
Jang, H., Choe, S. P., Gunkel, S. N., Kang, S., & Song, J. (2017). A system to analyze group socializing behaviors in social parties. IEEE Transactions on Human-Machine Systems, 47(6), 801–813.
Ketay, S., Welker, K. M., & Slatcher, R. B. (2017). The roles of testosterone and cortisol in friendship formation. Psychoneuroendocrinology, 76, 88–96.
Kitts, J. A. (2010). Dynamics of networks within groups. Paper presented at the Group Process Conference, Atlanta, Georgia.
Kitts, J. A. (2014). Beyond networks in structural theories of exchange: Promises from computational social science. Advances in Group Processes, 31, 263–298.
Kitts, J. A., & Leal, D. F. (2020). What Is(n’t) a friend? Dimensions of the friendship concept among adolescents. Forthcoming in Social Networks.
Kitts, J. A., Lomi, A., Mascia, D., Pallotti, F., & Quintane, E. (2017). Investigating the temporal dynamics of interorganizational exchange: Patient transfers among Italian hospitals. American Journal of Sociology, 123(3), 850–910.
Kleinbaum, A. M., Stuart, T. E., & Tushman, M. L. (2013). Discretion within constraint: Homophily and structure in a formal organization. Organization Science, 24(5), 1316–1336.
Kornienko, O., Clemans, K. H., Out, D., & Granger, D. A. (2014). Hormones, behavior, and social network analysis: Exploring associations between cortisol, testosterone, and network structure. Hormones and Behavior, 66(3), 534–544.
Kossinets, G., & Watts, D. J. (2009). Origins of homophily in an evolving social network. American Journal of Sociology, 115(2), 405–450.

Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A. L., Brewer, D., . . . Jebara, T. (2009). Computational social science. Science, 323(5915), 721–723.
Lazer, D., & Radford, J. (2017). Data ex machina: Introduction to big data. Annual Review of Sociology, 43, 19–39.
Lee, Y., Min, C., Hwang, C., Lee, J., Hwang, I., Ju, Y., . . . Song, J. (2013). Sociophone: Everyday face-to-face interaction monitoring platform using multi-phone sensor fusion. In Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services (pp. 375–388). New York, NY: ACM.
Leenders, R. T. A., Contractor, N. S., & DeChurch, L. A. (2016). Once upon a time: Understanding team processes as relational event networks. Organizational Psychology Review, 6(1), 92–115.
Leifeld, P., & Cranmer, S. J. (2015). A theoretical and empirical comparison of the temporal exponential random graph model and the stochastic actor-oriented model. arXiv preprint arXiv:1506.06696.
Leskovec, J., Huttenlocher, D., & Kleinberg, J. (2010). Signed networks in social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1361–1370). New York, NY: ACM.
Lewis, K. (2011). The co-evolution of social network ties and online privacy behavior. In Privacy online (pp. 91–109). Berlin, Heidelberg, Germany: Springer.
Lewis, K. (2016). Preferences in the early stages of mate choice. Social Forces, 95(1), 283–320.
Lewis, K., Gonzalez, M., & Kaufman, J. (2012). Social selection and peer influence in an online social network. Proceedings of the National Academy of Sciences, 109(1), 68–72.
Lin, K. H., & Lundquist, J. (2013). Mate selection in cyberspace: The intersection of race, gender, and education. American Journal of Sociology, 119(1), 183–215.
Liu, B. (2015). Sentiment analysis: Mining opinions, sentiments, and emotions. New York: Cambridge University Press.
Liu, C. C., & Srivastava, S. B. (2015). Pulling closer and moving apart: Interaction, identity, and influence in the US Senate, 1973 to 2009. American Sociological Review, 80(1), 192–217.
Lovett, T., O’Neill, E., Irwin, J., & Pollington, D. (2010). The calendar as a sensor: Analysis and improvement using data fusion with social networks and location. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing (pp. 3–12). New York, NY: ACM.
Macy, M. W., & Willer, R. (2002). From factors to actors: Computational sociology and agent-based modeling. Annual Review of Sociology, 28(1), 143–166.
Malik, M. M. (2018). Bias and Beyond in Digital Trace Data (Doctoral dissertation, Carnegie Mellon University). Retrieved from http://reports-archive.adm.cs.cmu.edu/anon/isr2018/abstracts/18-105.html.
Marcum, C. S., & Butts, C. T. (2015). Constructing and modifying sequence statistics for relevent using informR in R. Journal of Statistical Software, 64(5), 1.
Martin, J. L. (2009). Formation and stabilization of vertical hierarchies among adolescents: Towards a quantitative ethology of dominance among humans. Social Psychology Quarterly, 72(3), 241–264.
Marvel, S. A., Kleinberg, J., Kleinberg, R. D., & Strogatz, S. H. (2011). Continuous-time model of structural balance. Proceedings of the National Academy of Sciences, 108(5), 1771–1776.
Matic, A., Osmani, V., & Mayora-Ibarra, O. (2012). Analysis of social interactions through mobile phones. Mobile Networks and Applications, 17(6), 808–819.
Mehta, P. H., DesJardins, N. M. L., van Vugt, M., & Josephs, R. A. (2017). Hormonal underpinnings of status conflict: Testosterone and cortisol are related to decisions and satisfaction in the hawk-dove game. Hormones and Behavior, 92, 141–154.

Molm, L. D., Whitham, M. M., & Melamed, D. (2012). Forms of exchange and integrative bonds: Effects of history and embeddedness. American Sociological Review, 77(1), 141–165.
Moody, J. (2002). The importance of relationship timing for diffusion. Social Forces, 81(1), 25–56.
Moody, J., McFarland, D., & Bender-deMoll, S. (2005). Dynamic network visualization. American Journal of Sociology, 110(4), 1206–1241.
Nakakura, T., Sumi, Y., & Nishida, T. (2009). Neary: Conversation field detection based on similarity of auditory situation. In Proceedings of the 10th workshop on Mobile Computing Systems and Applications (p. 14). New York, NY: ACM.
Newcomb, T. (1961). The acquaintance process. New York, NY: Holt, Rinehart & Winston.
O’Connor, B., Stewart, B. M., & Smith, N. A. (2013). Learning to extract international relations from political context. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1094–1104). ACL.
O’Donnell, M. B., & Falk, E. B. (2015). Linking neuroimaging with functional linguistic analysis to understand processes of successful communication. Communication Methods and Measures, 9(1–2), 55–77.
Oloritun, R. O., Madan, A., Pentland, A., & Khayal, I. (2012). Evolution of social encounters in ad-hoc mobile face-to-face interaction networks. In 2012 International Conference on Social Informatics (pp. 192–928). IEEE.
Oloritun, R. O., Madan, A., Pentland, A., & Khayal, I. (2013). Identifying close friendships in a sensed social network. Procedia-Social and Behavioral Sciences, 79, 18–26.
Onnela, J.-P., Saramäki, J., Hyvönen, J., Szabó, G., Lazer, D., Kaski, K., . . . Barabási, A.-L. (2007). Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences, 104, 7332.
Pachucki, M. C., Ozer, E. J., Barrat, A., & Cattuto, C. (2015). Mental health and social networks in early adolescence: a dynamic study of objectively-measured social interaction behaviors. Social Science & Medicine, 125, 40–50.
Palaghias, N., Hoseinitabatabaei, S. A., Nati, M., Gluhak, A., & Moessner, K. (2016). A survey on mobile social signal processing. ACM Computing Surveys (CSUR), 48(4), 57.
Papachristos, A. V., Hureau, D. M., & Braga, A. A. (2013). The corner and the crew: The influence of geography and social networks on gang violence. American Sociological Review, 78(3), 417–447.
Park, P. S., & Kim, Y. H. (2017). Reciprocation under status ambiguity: How dominance motives and spread of status value shape gift exchange. Social Networks, 48, 142–156.
Peng, T. Q. (2015). Assortative mixing, preferential attachment, and triadic closure: A longitudinal study of tie-generative mechanisms in journal citation networks. Journal of Informetrics, 9(2), 250–262.
Pilny, A., Proulx, J. D., Dinh, L., & Bryan, A. L. (2017). An adapted structurational framework for the emergence of communication networks. Communication Studies, 68(1), 72–94.
Pilny, A., Schecter, A., Poole, M. S., & Contractor, N. (2016). An illustration of the relational event model to analyze group interaction processes. Group Dynamics: Theory, Research, and Practice, 20(3), 181.
Pozzi, F. A., Fersini, E., Messina, E., & Liu, B. (2016). Sentiment analysis in social networks. Cambridge, MA: Morgan Kaufmann.
Rachuri, K. K., Musolesi, M., Mascolo, C., Rentfrow, P. J., Longworth, C., & Aucinas, A. (2010, September). EmotionSense: A mobile phones based adaptive platform for experimental social psychology research. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing (pp. 281–290). New York, NY: ACM.

96   James A. Kitts and Eric Quintane Raeder, T., Lizardo, O., Hachen, D., & Chawla, N. V. (2011). Predictors of short-term decay of  cell phone contacts in a large scale communication network. Social Networks, 33(4), 245–257. Rahman, M. M., Ali, A. A., Plarre, K., Al’Absi, M., Ertin, E., & Kumar, S. (2011). mconverse: Inferring conversation episodes from respiratory measurements collected in the field. In Proceedings of the 2nd Conference on Wireless Health (p. 10). New York, NY: ACM. Salah, A. A., Lepri, B., Pianesi, F., & Pentland, A. S. (2011). Human behavior understanding for inducing behavioral change: Application perspectives. In International Workshop on Human Behavior Understanding (pp. 1–15). Berlin, Heidelberg, Germany: Springer. Salganik, M. (2019). Bit by bit: Social research in the digital age. Princeton, NJ: Princeton University Press. Schecter, A., Pilny, A., Leung, A., Poole, M. S., & Contractor, N. (2018). Step by step: Capturing the dynamics of work team process through relational event sequences. Journal of Organizational Behavior, 39(9), 1163–1181 Schmid Mast, M., Gatica-Perez, D., Frauendorfer, D., Nguyen, L., & Choudhury, T. (2015). Social sensing for psychology: Automated interpersonal behavior assessment. Current Directions in Psychological Science, 24(2), 154–160. Shwed, U., & Bearman, P. S. (2010). The temporal structure of scientific consensus formation. American Sociological Review, 75(6), 817–840. Snijders, T. A., Van de Bunt, G. G., & Steglich, C. E. (2010). Introduction to stochastic actorbased models for network dynamics. Social Networks, 32(1), 44–60. Stadtfeld, C., & Block, P. (2017). Interactions, actors, and time: Dynamic network actor models for relational events. Sociological Science, 4, 318–352. Stadtfeld, C., Hollway, J., & Block, P. (2017). Dynamic network actor models: Investigating coordination ties through time. Sociological Methodology, 47(1), 1–40. Stadtfeld, C., & Pentland, A. (2015). 
Partnership ties shape friendship networks: A dynamic social network study. Social Forces, 94(1), 453–477. State, B., Abrahao, B., & Cook, K. (2016). Power imbalance and rating systems. In Proceedings of the 10th International AAAI Conference on Web and Social Media (ICWSM). Stephens, M., & Poorthuis, A. (2015). Follow thy neighbor: Connecting the social and the spatial networks on Twitter. Computers, Environment and Urban Systems, 53, 87–95. Stieglitz, S., & Dang-Xuan, L. (2013). Emotions and information diffusion in social media— Sentiment of microblogs and sharing behavior. Journal of Management Information Systems, 29(4), 217–248. Sumner, E. M., Ruge-Jones, L., & Alcorn, D. (2017). A functional approach to the Facebook Like button: An exploration of meaning, interpersonal functionality, and potential alternative response buttons. New Media & Society, 20(4), 1451–1469. Szell, M., & Thurner, S. (2010). Measuring social dynamics in a massive multiplayer online game. Social Networks, 32(4), 313–329. Valente, T. W. (1995). Network models of the diffusion of innovations. Cresskill, NJ: Hampton Press. Vargas, R. (2011). Being in “bad” company: Power dependence and status in adolescent susceptibility to peer influence. Social Psychology Quarterly, 74(3), 310–332. Vu, D. Q., Hunter, D., Smyth, P., & Asuncion, A. U. (2011). Continuous-time regression models for longitudinal networks. In Advances in Neural Information Processing Systems (pp. 2492–2500). NIPS. Vu, D., Pattison, P., & Robins, G. (2015). Relational event models for social learning in MOOCs. Social Networks, 43, 121–135.

Rethinking Networks   97 West, R., Paskov, H. S., Leskovec, J., & Potts, C. (2014). Exploiting social network structure for person-to-person sentiment analysis. Transactions of the Association for Computational Linguistics, 2, 297–310. Wimmer, A., & Lewis, K. (2010). Beyond and below racial homophily: ERG models of a friendship network documented on Facebook. American Journal of Sociology, 116(2), 583–642. Wyatt, D., Bilmes, J., Choudhury, T., & Kitts, J. A. (2008). Towards the automated social analysis of situated speech data. In Proceedings of the 10th International Conference on Ubiquitous Computing (pp. 168–171). New York, NY: ACM. Wyatt, D., Choudhury, T., Bilmes, J., & Kitts, J. A. (2011). Inferring colocation and conver­ sation networks from privacy-sensitive audio with implications for computational social science. ACM Transactions on Intelligent Systems and Technology (TIST), 2(1), 7. Xie, W., Li, C., Zhu, F., Lim, E. P., & Gong, X. (2012). When a friend in Twitter is a friend in life. In Proceedings of the 4th Annual ACM Web Science Conference (pp. 344–347). New York, NY: ACM. Yang, D., Huang, C., & Wang, M. (2017). A social recommender system by combining social network and sentiment similarity: A case study of healthcare. Journal of Information Science, 43(5), 635–648. Yang, D.  H., & Yu, G. (2014). Static analysis and exponential random graph modelling for micro-blog network. Journal of Information Science, 40(1), 3–14. Zhang, X., & Butts, C. T. (2017). Activity correlation spectroscopy: a novel method for inferring social relationships from activity data. Social Network Analysis and Mining, 7(1), 1. Zhao, M., Adib, F., & Katabi, D. (2016). Emotion recognition using wireless signals. In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking (pp. 95–108). New York, NY: ACM.

Chapter 6

Networks, Status, and Inequality

John Levi Martin and James P. Murphy

Terminology and Scope

The vagueness of our key terms has proven deceptive for previous seafarers, who took this as a promise of calm, while instead the incredible breadth of these terms led to a hazardous shallowness, and thinkers were stranded on the reefs of preconception . . . and, occasionally, torpedoed by extended metaphor. We begin by clarifying terminology.

Networks

The notion that there is a single class of objects, “networks,” has been a great inspiration to new forms of structural thinking. However, any attempt to use general approaches for classes of radically different phenomena (exchange relations between firms, transport relations between cities, friendship relations between children, synaptic connections between neurons) puts a hard limit on the development of plausible theories. Here, we will only be interested in social networks and, most specifically, relations exclusive of formal organization (though they may be in formal organizations). This approach to network analysis draws from an Anglo-American perspective on the world quite at odds with the statist Durkheimianism that guides the conventional sociological imagination. This view, which goes back to Francis Hutcheson, came into its own with later British anthropologists, especially those associated with the Manchester School (Barnes, 1954; Mitchell, 1969). Networks are considered to be a set of largely voluntary ties that often span organizational boundaries. Despite being divorced from formal hierarchies, they make possible other forms of differentiation, such as status.

Status

“Status” also turns out to be a remarkably indistinct notion. Originating in Henry Sumner Maine’s (1888 [1861]: 164f; cf. Sumner, 1906, p. 67) distinction between status and contract as principles of social organization, the former fixed at birth, “status” was then generalized so that it could be “achieved” as well as “ascribed” at birth, and brought into sociology (Linton, 1936; Homans, 1950, p. 11; Merton, 1957). It was then conflated with the idea of status as some sort of prestige ranking, largely coming from Henderson and Parsons’s translation of Weber’s (1947) Theory of Social and Economic Organization (see Martin [2009b] for a somewhat fuller discussion). The resulting synthesis increasingly saw “status” as an attribute of persons pertaining to their position in a single continuous ranking of esteem (see, e.g., Homans, 1961, p. 149; Lenski, 1954; and for a network representation, Whyte, 1981 [1943]). Thus, while “status” might be understood as any position in a social order, allowing for horizontal as well as vertical differentiation, horizontality has rarely been a key theoretical concern and, indeed, usually enters analyses as a frustrating complication, to be ignored if possible as a form of imprecision or error. Status, then, can easily be understood as a form of inequality.

Inequality

Understood in the simplest form, the term inequality seems to refer merely to the absence of equality, and such absence is the norm in most situations. Any organization of statuses other than equality (horizontality) implies inequality. Here, however, we also focus on a different kind of inequality, a nonnetwork inequality in which network structures may be embedded, and which may shape their formation. Given these definitions, we explore in greater detail different ways in which status has been conceptualized in network studies.

Ascertaining Status in Networks

Because of the imprecision in the notion of status, researchers can easily talk at cross-purposes (see Collins [2000] for a critique). For example, there was, with apparently no sense of incongruence, a debate as to whether “status” was the cooperative, task-oriented status envisioned by expectation states/status characteristics theorists (Ridgeway, 1987) or something more akin to agonistic dominance (M. T. Lee & Ofshe, 1981). This confusion is especially likely because in naturally occurring groups, there can be more than one principle of ordering. In studies of children’s groups, for example, “status” has at different times been understood as membership in the leading crowd, popularity, effectiveness in initiative, or authority over others (for examples, see Coleman, 1961, p. 98; Sherif et al., 1988 [1961], pp. 37, 125; Harvey, 1953). The choice of conceptualization can make quite a difference: the popular are not necessarily the most liked (Eder, 1995, pp. 31, 40), and those given the most attention may not be deferred to (see Vaughn & Waters, 1980, p. 371; Abramovitch, 1980). Here we discuss the most common ways of conceptualizing status and their relation to network study.

Esteem and Choice

As we have seen, the most common interpretation of status has dealt with esteem. While some survey instruments directly tap such feelings, more generally, esteem has been inferred from patterns of sociometric choice. This might be assessed via a hypothetical (“if you had to spend a week on a deserted island with one person in your class . . .”) or a seeming matter of fact (“name your three best friends . . .”) item. The notion seems to be that while we recognize that there may be many dyadically particular reasons for one person to choose another, these sorts of factors should cancel out in any aggregation of received choices, which will therefore tap esteem. Using the language of Holland and Leinhardt (1981), we refer to people’s tendency to make choices as their expansiveness, and their tendency to receive choices as their attractiveness. (And we will use popularity to refer to the observed number of realized choices garnered by any person.) In graph-theoretic terms, these correspond, when realized, to out-degree and in-degree, respectively. The in-degree is therefore widely used as a measure of esteem, sometimes, confusingly, called “centrality.”1 Such a sum clearly treats all choices as equal; however, we might imagine that a choice from a very popular alter should boost one’s status more than a choice from an unpopular one.2 Working through this logic leads to the class of eigenvector measures (Bonacich, 1987) also often termed centrality. However, where the logic underlying the measure holds, we expect preferential attachment by degree, which means that the results of such eigenvector measures will usually correlate extremely highly with degree.3 Where there is a central organization but positional centrality is not associated with degree (for example, where centers are of low degree, as in many organizations, or where there is heterophilous matching by degree, as in hub-spoke systems), eigenvector measures may better express central position than mere degree; however, there is presumably a large class of structures where the reverse is true.
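The correspondence between expansiveness, attractiveness, and degree can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy choice data (the name "Betty" included), and the use of plain power iteration on the choice matrix plus the identity, which is a simplified eigenvector-style score and not the parameterized measure of Bonacich (1987).

```python
def degrees(choices):
    """choices maps each chooser to the set of alters he or she names."""
    people = sorted(set(choices) | {a for alters in choices.values() for a in alters})
    out_deg = {p: len(choices.get(p, ())) for p in people}  # realized expansiveness
    in_deg = {p: 0 for p in people}                         # realized attractiveness
    for alters in choices.values():
        for alter in alters:
            in_deg[alter] += 1
    return out_deg, in_deg


def eigenvector_status(choices, iterations=200):
    """Power iteration in which a choice counts for more when the chooser scores highly.

    Iterating on (A + I) rather than A leaves the leading eigenvector unchanged
    but avoids oscillation on periodic structures.
    """
    people = sorted(set(choices) | {a for alters in choices.values() for a in alters})
    score = {p: 1.0 / len(people) for p in people}
    for _ in range(iterations):
        new = dict(score)  # the identity term: each actor keeps its own score
        for ego, alters in choices.items():
            for alter in alters:
                new[alter] += score[ego]  # a choice passes on the chooser's score
        total = sum(new.values())
        score = {p: value / total for p, value in new.items()}
    return score


# Hypothetical sociometric data (chooser -> chosen).
choices = {
    "Archie": {"Reggie"},
    "Jughead": {"Reggie"},
    "Reggie": {"Betty"},
    "Betty": {"Reggie", "Archie"},
}
out_deg, in_deg = degrees(choices)
status = eigenvector_status(choices)
for person in sorted(status, key=status.get, reverse=True):
    print(person, in_deg[person], round(status[person], 3))
```

In this toy network Archie and Betty each receive one choice, yet Betty's comes from the most chosen actor, so the eigenvector-style score separates two people whom in-degree treats as tied.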
In larger assemblages, it may be that we are interested not only in the esteem garnered by any one individual but also in that of cliques of individuals. Investigations here had begun with the logic of Heider’s (1946) balance theory, which implied sets of cliques composed of mutual relations. Yet data gathered by sociometric questionnaire tended to have many asymmetric relations. Investigators using data on children’s friendship nominations within schools (especially Holland, Leinhardt, and Davis) found that they had to account for hierarchical relations between some, but not all, cliques, or clusters, as they were called; consequently, the clusters were partially ordered (Davis & Leinhardt, 1972). However, this model assumed that the only reason that two clusters would be unordered was that they were at the same rank; this did not fit the fact that boys and girls tended to have separate structures, and members of a low-ranked boys’ group would tend not to choose members of a high-ranked girls’ group, and vice versa (Leinhardt, 1972; Davis, 1979, p. 56). Further generalizations allowed any triads compatible with transitivity (Holland & Leinhardt, 1971) and, finally, triads with two mutual relations and one asymmetric (Johnsen, 1985). Thus, horizontality can emerge in a world of vertical relations for different reasons. Two individuals may be at roughly equivalent positions, or they may actually be incomparable. Further, there may be noise in the data, or there may be a “just noticeable difference” required for objective differences in status to be subjectively recognized. Most interestingly, there may be actual disagreements among different persons as to the relevant vertical distinctions (Nakao & Romney, 1993). Most simply, we might have a partition into different “sets,” each of which believes its own members to be of higher status than others (Martin, 2009a).

Despite the presence of horizontal differentiation, researchers generally assume that measures of vertical position (e.g., in-degree) are comparable across all persons. Such approaches share an understanding that asymmetries in choice processes imply status differences (Gould, 2002). If Archie chooses Reggie as a friend, but Reggie does not choose Archie, then, all other things being equal, we expect Reggie to be of higher status than Archie. However, we do not expect all other things to be equal: we expect Jughead, say, to also choose Reggie and not Archie, and for Archie to have a lower popularity. Yet there are three measurement issues that bedevil such conclusions. The first is that persons may have different thresholds at which they decide to name someone as a friend. This becomes very weighty in data-gathering regimes in which there is no constraint on expansiveness, for example, when the respondent makes a potentially independent choice of whom to nominate when considering the set of all possible alters in turn. Someone with a high threshold contributes few ties to the potential status of others and therefore tends to be elevated in relative popularity. Someone with a low threshold contributes to the popularity of others and has many asymmetric ties. Even more, this threshold may be less of a (time invariant) trait than a state into and out of which respondents can pass; those who feel depressed or under threat may withdraw from others by increasing the threshold at which they consider someone a friend (Schaefer, Kornienko, & Fox, 2011; E. Smith, Menon, & Thompson, 2012). In other data-gathering regimes, there is a hard constraint on the number of possible alters that may be nominated (e.g., the five male and five female friends of Add Health). Given that instruction may be understood as proposing a norm (try to think of five . . .), we may expect this problem to be somewhat attenuated.
(It should be noted, however, that there is in fact substantial variation in expansiveness in Add Health, and only 61% are at the maximum for same-sex friendships.) One mathematically elegant way of attempting to take differences in expansiveness into account when estimating popularity is the p1 model, developed by Holland and Leinhardt (1981), which produces estimates of person expansiveness and attractiveness parameters, as well as a global parameter indicating the tendency toward reciprocation of ties. This model has been extended to include covariates by van Duijn, Snijders, and Zijlstra (2004). In recent work on status inequality and high schools, J. A. Smith and Faris (2015) employ the p1 model to estimate each actor’s attractiveness conditional on the network’s overall tendency toward reciprocity. There is, however, still a second problem that comes from missing data. In general, when data are missing completely at random, many nodal measures, especially relative degree and eigenvector centrality, are impressively robust to small to moderate amounts of missing data (Borgatti, Carley, & Krackhardt, 2006; J. A. Smith & Moody, 2013; D. J. Wang et al., 2012). However, much less is known about the effects of nonrandom missingness, which is to be expected in unequal environments. In settings such as schools, nonresponse and nonserious response are correlated with a number of later poor outcomes (e.g., Savin-Williams & Joyner, 2014). If there is systematic nonresponse coming disproportionately from the “clowns” or “cut-ups,” we do not know whether some observed isolates have a certain outcome because they are isolates (no one willing to nominate them as a friend) or because, on the contrary, the problem is that they do have friends: the ones who sit in the back, who carve “Do Bowls” on the desk, and who say their only friend in the school is Batman.4 The latter situation should be of particular concern as missing more popular actors will disproportionately bias measurement of low-status actors’ ranking (J. A. Smith, Moody, & Morgan, 2017). Yet a third issue is the fact that both nodal and graph-level measures’ sensitivity to missingness is systematically related to networks’ structural properties even when data are missing completely at random (J. A. Smith & Moody, 2013; Costenbader & Valente, 2003; D. J. Wang et al., 2012).5 While degree and eigenvector centrality are, as noted, relatively robust, their bias still depends on network size and skewness of degree distributions (centralization). As one would expect, such concerns are aggravated when missingness is correlated with status (J. A. Smith et al., 2017).6
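For reference, the p1 model mentioned earlier in this section is commonly written in the following log-linear form. The notation is an assumption adopted here for illustration and is not taken from the chapter: \(\alpha_i\) denotes the expansiveness of actor \(i\), \(\beta_j\) the attractiveness of actor \(j\), \(\theta\) an overall density effect, \(\rho\) the global reciprocity effect, and \(\lambda_{ij}\) a dyad-specific normalizing term.

```latex
\[
\log P\bigl(Y_{ij} = y_1,\; Y_{ji} = y_2\bigr)
  = \lambda_{ij}
  + y_1\,(\theta + \alpha_i + \beta_j)
  + y_2\,(\theta + \alpha_j + \beta_i)
  + \rho\, y_1 y_2 ,
\qquad y_1, y_2 \in \{0, 1\},
\]
\[
\sum_i \alpha_i = 0, \qquad \sum_j \beta_j = 0 .
\]
```

Read this way, an asymmetric choice from \(i\) to \(j\) contributes \(\theta + \alpha_i + \beta_j\), so a respondent with a low nomination threshold is absorbed into a large \(\alpha_i\) instead of inflating the estimated attractiveness of the alters he or she names, while \(\rho\) enters only when both choices in the dyad are present.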

Visibility and Prominence

A second species of status concerns the distribution of attention to actors, what we might call prominence or visibility. In this sense, high-status actors are those who are most visible in a given social domain. While frequently conflated with esteem/choice, prominence does not carry with it any necessary connotation of approval or disapproval. Whether or not it is recognized as such, visibility is the focus of study in many of the most popular kinds of “status” studies. It underlies studies on scientific citation networks (de Solla Price, 1976), where there is a widespread assumption that prominence is equivalent to esteem; on fame (van de Rijt et al., 2013); and on adolescent peer relations (Eder, 1985). With the ever-increasing use of Internet data and “big data” more generally, visibility on a mass scale becomes more amenable to study. As such studies proliferate, it becomes all the more important to distinguish prominence from other sorts of status lest we generalize naively from the former to the latter.7 As a form of status, visibility has at least two important consequences. Awareness of another person is the first ingredient for processes such as social comparison (Festinger, 1954) that could support social learning (Friedkin, 1993). This includes learning about third parties (e.g., i learns from j that k is “cool”), which supports the recursive dynamic posited by many models of esteem/choice hierarchy (e.g., Gould, 2002; Manzo & Baldassarri, 2014). A second set of consequences involves visibility’s role in the dynamics of other forms of status, particularly esteem/choice. Visibility is a double-edged sword. It provides opportunities to disappoint as well as impress. For example, in a study of online reviews of award-nominated books, Kovács and Sharkey (2014) find that, upon winning an award, a book attracts more readers. Average ratings decline, however, as readership expands beyond the initial audience.
Similarly, in a study of female peer relations in middle school, Eder (1985) finds that from the perspective of the students themselves, a primary meaning of “popularity” is visibility. Girls gain popularity in part through general friendliness, drawing positive sentiments from others. As they rise through the status hierarchy, however, to maintain their position, newly popular girls leave unreciprocated friendship gestures from lower-ranked students, creating the impression of “snobbiness.” Such examples demonstrate the interwoven dynamics of many separate phenomena that sociologists frequently lump together under the same label, “status.”


Agonism

A third type of relation that can induce status differences and leave traces in network data turns not on esteem, but on agonism. Some agonistic processes produce relations that are necessarily antisymmetric: if two children fight and one wins, the other has lost. Such relations can be actual fights, but they may also be nonviolent competitions of verbal or athletic prowess, or a series of bluffs that may establish a differentiation of toughness but only actually lead to a fight when there is disagreement as to who would be the victor, were a fight to take place (Gould, 2003). Because these relations are inherently antisymmetric, they require methods different from those used to study choice processes (for approaches to the quantification of agonistic status, see Boyd & Silk, 1983; Jameson, Appleby, & Freeman, 1999; Kalma, 1991; Landau, 1953; Martin, 1998; Roberts, 1990; Sade et al., 1988). Most important, rather than additive effects of ego’s expansiveness and alter’s attractiveness implying a tie, we often expect a comparative process, in which it is the difference between ego’s and alter’s qualities that leads to the establishment of a relation of “greater than.” The questions that can be asked about this status structure are the same as those asked under choice regimes. The first pertains to the degree of vertical differentiation. The clearest structure is a perfect order, in which all subjects have a rank, with all relations flowing down the rank order. One example is what is often called a “dominance order” or, more familiarly, a “pecking order,” even among the nonfowl (if not the fair) (e.g., Eder, 1995, p. 68). Such dominance orders have been found among most monkeys and apes, as well as in many other species from ants on up (for references and accessible discussions, see Martin, 2009a, Chapter 4; De Waal, 1998; Cheney & Seyfarth, 1990; Hölldobler & Wilson, 1994).
Yet in many cases, the observed structure falls short of a clear order possessing absolute vertical differentiation. As with choice, however, an absence of vertical differentiation does not require horizontal differentiation: there may simply be numerical equality between some actors. Hence, the second question pertains to horizontal differentiation: are there some pairs that lack a relation because they are structurally incomparable? Finally, is an absence of verticality due to a simpler censoring whereby not all relationships are observed? Such partial orders of status relations have been observed in primates in the wild (and lead to measurement approaches discussed by Iverson & Sade, 1990), where, especially in larger groups, many do not interact (a freedom often denied by the caging of zoo animals). Although there have been arguments that for evolutionary reasons we should expect human interaction to involve dominance processes similar to those found among apes (the classic statement here is Tiger [1970]; for more thoughtful discussions see the early review of Mazur [1973] and the contrary arguments of Boehm [1997]), there have been few investigations into comparable processes of vertical organization in small human groups.8 Although there is some evidence of the formation of orders among adult humans (see the review by Chase [1980]), and more regarding children and adolescents (see especially Martin, 2009b; Savin-Williams, 1987; Strayer & Strayer, 1976; and the classic piece by Hanfmann, 1935), still, very little is known about the conditions that would favor, or the processes that would lead to, such a vertical differentiation. The analog from animal studies would suggest that “caging,” metaphorically or literally, will increase the tendency to approximate a linear order, perhaps mostly by decreasing horizontal differentiation.
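One standard way to quantify how closely a set of agonistic relations approximates a perfect order is Landau's linearity index h, from the Landau (1953) work cited above. The Python sketch below is illustrative only: the function name and toy data are hypothetical, and the index as written assumes a complete tournament in which every pair has exactly one observed winner, so it does not handle the unobserved or incomparable pairs discussed in this section.

```python
def landau_h(dominates):
    """Landau's h: 1.0 for a perfect linear order, 0.0 when all actors win equally often.

    dominates maps each actor to the set of actors that he or she dominates.
    Assumes every pair of actors has exactly one dominance relation.
    """
    members = sorted(set(dominates) | {b for losers in dominates.values() for b in losers})
    n = len(members)
    if n < 2:
        raise ValueError("at least two actors are required")
    expected = (n - 1) / 2  # mean number of actors dominated in any complete tournament
    squared_dev = sum((len(dominates.get(m, ())) - expected) ** 2 for m in members)
    return 12.0 * squared_dev / (n ** 3 - n)


# A perfect pecking order: A over everyone, B over C and D, C over D.
pecking_order = {"A": {"B", "C", "D"}, "B": {"C", "D"}, "C": {"D"}, "D": set()}

# A cycle with no vertical differentiation: A beats B, B beats C, C beats A.
cycle = {"A": {"B"}, "B": {"C"}, "C": {"A"}}

print(landau_h(pecking_order))
print(landau_h(cycle))
```

Because the index depends only on how many others each actor dominates, it speaks to the first question raised above (the degree of vertical differentiation) and is silent on whether missing relations reflect incomparability or censoring.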

Status Production and Maintenance in Networks

Thus far, we have concentrated on laying out the types of status that may be examined using network data. We now turn to the dynamics that lead to the emergence and evolution of status differences in networks.

The Popularity Tournament

Above we treated status as if it were a fixed substructure from which or in which observed network relations are generated. However, considering choice processes, sociologists frequently assume that attractiveness is shaped by, if not wholly due to, the pattern of choices itself. If so, there is an important endogeneity to such processes, for they follow Merton’s (1968) “Matthew effect,” namely, that to them who have, more shall be given. Each additional choice a person garners increases his or her attractiveness and thus increases the pressures on others to choose him or her (see, e.g., Waller, 1937, p. 730). This leads to what Martin (2009a, p. 65) calls a “popularity tournament,” a particular social dynamic that arises in situations where two important conditions hold. The first is that each person has some information about the popularity of others. We often assume that such information is freely available, and not always for very good reason. However, in other cases, there are specific institutional structures to distribute such knowledge. The sociology of science quantifies the number of references made to any article as an indicator of that article’s dominance in its field, and researchers find that they must cite the widely cited, but they are free to ignore the ignored, whatever the content. The second condition for a popularity tournament to arise is that any additional choice received by a person increases that person’s attractiveness. If possible, an actor will replace a choice of a less popular with a more popular alter. What may stop this from leading to a complete concentration of choices, other than a lack of information, may be the existence of horizontal differentiation, or a desire for reciprocated choices (Gould, 2002). The possibility of contrarians, those who have antipreferential attachment, is ignored or assumed to be minimal, perhaps rightly.
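The two conditions described above can be turned into a toy simulation. The sketch below is a deliberately simplified assumption, not a model drawn from the cited literature: each actor holds exactly one choice, every actor is assumed to see everyone's current popularity (the first condition), and an actor who reconsiders picks an alter with probability proportional to one plus the number of choices that alter receives from others (a probabilistic version of the second condition). The function name, parameters, and seed are hypothetical.

```python
import random


def popularity_tournament(n_actors, n_rounds, seed=0):
    """Simulate repeated re-choosing under cumulative advantage.

    Returns the final choice of each actor and the resulting in-degrees.
    """
    rng = random.Random(seed)
    # Start from arbitrary choices; nobody may choose himself or herself.
    choice = {
        i: rng.choice([j for j in range(n_actors) if j != i])
        for i in range(n_actors)
    }
    for _ in range(n_rounds):
        ego = rng.randrange(n_actors)
        # Popularity as ego sees it, leaving out ego's own current choice.
        in_deg = [0] * n_actors
        for chooser, alter in choice.items():
            if chooser != ego:
                in_deg[alter] += 1
        candidates = [j for j in range(n_actors) if j != ego]
        weights = [1 + in_deg[j] for j in candidates]  # more chosen, more attractive
        choice[ego] = rng.choices(candidates, weights=weights, k=1)[0]
    final_in_deg = [0] * n_actors
    for alter in choice.values():
        final_in_deg[alter] += 1
    return choice, final_in_deg


final_choice, final_in_deg = popularity_tournament(n_actors=30, n_rounds=2000, seed=1)
print(sorted(final_in_deg, reverse=True))
```

Nothing in this sketch counteracts concentration. Adding horizontal differentiation (for example, restricting candidates to ego's own "set") or a bonus for reciprocated choices would be natural ways to explore the brakes mentioned in the text.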

The Facebook Effect

We noted that in some surveys, we may have complications coming from a ceiling placed on expansiveness. Yet this ceiling has a sort of ecological validity: most of us can remember only so many acquaintances (i.e., those whose names you remember, more or less, when you encounter them). Even more difficult is generating a name in the absence of the alter. Even very, very popular people have a total number of friends and acquaintances more in line with those of medium popularity than they would if, like Montgomery Burns, they were equipped with a Rolodex. In many computer-mediated platforms, most importantly, Facebook, all ties are, by fiat, symmetric. A friend request that is ignored does not transition into a tie. Yet, given the blatant nature of the request and the low cost to accede to it, some of these ties are just as aspirational as the asymmetric ones in questionnaire data. As a result, “popularity” incorporates expansiveness in a way that we do not see in networks where reciprocation is more costly. Further, given that some people have high expansiveness because of a desire to use the infrastructure as a broadcasting platform, snatching their 15 minutes of fame as opposed to waiting for media consecration, we have the possibility of new forms of network status that we may call, honoring our two great visionaries, “structural Warhols.”
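The point that symmetric-by-fiat ties fold expansiveness into “popularity” can be shown with a small Python sketch. The data and function names are hypothetical, and the sketch assumes we happen to know who sent each accepted request, information that the symmetric tie itself no longer carries.

```python
from collections import Counter


def symmetric_degree(accepted_requests):
    """Friend count once every accepted request becomes an undirected tie."""
    degree = Counter()
    for sender, receiver in accepted_requests:
        degree[sender] += 1
        degree[receiver] += 1
    return degree


def requests_received(accepted_requests):
    """Accepted requests that others directed at each person (closer to attractiveness)."""
    return Counter(receiver for _, receiver in accepted_requests)


# Hypothetical accepted friend requests as (sender, receiver) pairs.
accepted = [
    ("broadcaster", "a"), ("broadcaster", "b"), ("broadcaster", "c"),
    ("broadcaster", "d"), ("broadcaster", "e"),
    ("a", "star"), ("b", "star"), ("c", "star"),
]
degree = symmetric_degree(accepted)
received = requests_received(accepted)
print(degree["broadcaster"], received["broadcaster"])
print(degree["star"], received["star"])
```

Here the hypothetical "broadcaster" has the larger friend count even though nobody sought that person out, whereas every one of the "star's" ties was initiated by someone else.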

Status Diffusion

A second set of dynamics that lead to endogenous processes of verticalization involves not the number of ties per se, but the actions of members of a task group. Still, the notion is that each attribution of esteem increases the likelihood of further attributions, because the esteemed are more likely to carry out acts that garner further esteem. This tradition goes back to R. F. Bales (1950, p. 73), who was interested in different aspects of differentiation in small groups, but found such high correlations between different forms of positive evaluation that, “as a first approximation,” he proposed a single ordering of status (Heinicke & Bales, 1953, pp. 18f, 21; also see Berger & Zelditch, 1998, p. 98, for a discussion). This led to a tradition of work that predominantly used laboratory subjects (usually undergraduates) given a task. A widespread notion at the time was that group members should have a status corresponding to their contributions to the solution of collective problems (Blau, 1964, p. 47; Homans, 1961, p. 334). These quasi-functional attributions of statuses could then be generalized to other nonfunctional forms of differentiation. Even if status might indeed come from successful contributions to group goals, one needed to attempt to make a contribution if one was to have any chance of meeting with success. And statuses in Linton’s sense (such as gender) that might be technically irrelevant for the task at hand turned out to predict such attempts. Thus, such irrelevant or “diffuse” statuses could lead to differential participation and hence self-fulfilling prophecies. Initial assumptions by group members about status rankings are, all other things being equal, likely to end up confirmed (Berger et al., 1977; Fişek, Berger, & Norman, 1995; Foddy & Smithson, 1996; Correll & Ridgeway, 2006).
Despite its sharing the unhealthy obsession with status characterizing many network researchers, this tradition made relatively little impact in network analysis other than via coerced citations, in no small part because of its insistence on finding the same thing over and over again. Yet network researchers (e.g., Stewart, 2005) have been increasingly taking findings (or, we should say, the one finding) from this tradition seriously as they begin to explore not only the emergence of status hierarchies but also their persistence or instability. Until recently, there was a widespread presumption that there was no fundamental difference between the processes leading to the emergence of status structures (on the one hand) and their subsequent dynamics. Challenging that assertion has led to more interest in the phenomenology of status—when, and how, higher-status actors seem more (or less)

106   John Levi Martin and James P. Murphy appealing than others (e.g., Zink et al.,  2008; Martin,  2005), more charismatic (e.g., Zablocki, 1980), and less impressible (e.g., Yeung & Martin, 2003). Of course, notions of the popularity tournament do imply that there should be a continuation of cumulative advantage processes. But such processes can lead to the transformation of structures, as well as their stability. Bothner et al. (Bothner, Haynes, Lee, & Smith, 2010; Bothner, Smith, & White, 2010) demonstrate the paradoxical result that if the “halo” surrounding high-status actors is quite strong, those favored by the highest status tend to rise to the top themselves, to the point where they are near equals to their patron, compressing the status hierarchy. In contrast, J.  A.  Smith and Faris (2015) contend that upwardly mobile actors’ status gains are fleeting because their newfound position is perceived as less stable than that of established actors. This self-fulfilling prophecy has the effect of maintaining the integrity of the status structure while allowing variability in individual positions over time.9 Perhaps the most illuminating work on the topic comes from organizational sociology, in particular, its economic applications (see Podolny & Lynn [2009] and Sauder et al. [2012] for discussions). Here we may gain traction when, in addition to the esteem estimated by received choices, we also have a measure of objective quality that we are confident should be taken into account by other actors. This might be the case if, for example, the relations are investments of one firm in another, where we also have information on each firm’s price, earnings, and debt. This notion of objective quality can be extended to the relation between interpersonal networks and individual personality (Flynn et al., 2006; Jensen-Campbell et al., 2002). 
Although it is possible for such exogenous (objective) characteristics to merely jump-start a reflexive process of endogenous status amplification, it is also possible for social choices to be a reasonable heuristic for initial judgments of objective quality, which are then subject to further refinement over time. Thus, the endogenous processes of the popularity tournament may be stronger for some processes at the earlier stages, and for others, stronger in the later stages (also see Salganik & Watts, 2008). Such is life.

Asymmetry and Evolution

Finally, one set of theoretically expected processes that become relevant to the evolution of status orderings pertains to the felt unease of certain local patterns of relations. If intransitive relations are inherently disturbing (De Soto, 1960; Hallinan & Hutchins, 1980; Chase, 1980; although see Doreian & Krackhardt, 2001), we may see the tendency of vertical differentiation to increase. Ego may choose a “locally popular” alter (one who is chosen by those he or she chooses), increasing the number of choices that go to the popular, and, indeed, increases in the popularity of a target may draw other new ties (Martin & Yeung, 2006; Habinek, Martin, & Zablocki, 2015). On the other hand, if asymmetric relations are less stable than symmetric ones, we may see some tendency toward a mitigation of inequality, to the extent that asymmetry itself is an indicator of inequality. And there are indeed good reasons to expect that, in some cases, asymmetries are fragile. Unreciprocated friendship nominations are more likely to be withdrawn than are reciprocated (Adler & Adler, 1995; Gould, 2002; Hallinan, 1979; Hallinan & Kubitschek, 1988). Such instability is less likely to characterize status that turns on visibility. For one thing, some such ties are inherently antisymmetric. (A journal article from the 1980s may cite one from the 1960s, but the older article cannot reciprocate.) In other cases, asymmetry is simply irrelevant to the nature of the tie. Resentment toward a high school’s athletes and cheerleaders (its “leading crowd”) does not make them any less visible for all that (Coleman, 1961; Eder, 1985).

Topological Implications of Status

Such processes have implications for the evolution of network topology. We consider in particular two aspects central to network research past and present: clustering and connectedness. Given that most of the approaches discussed here measure status using in-degree or some weighted version thereof, they are applicable to cases in which the data are choices that tap esteem (as opposed to the result of comparative or agonistic processes). Let us return to the popularity tournament but make the small change that ties are formed sequentially and that, once made, they are not dropped. This might be the case if actors enter the network at different times and choose to allocate their basket of ties among the actors present at that time, a reasonably good model for website links. If these ties are formed proportional to the existing degree of target nodes, the degree distribution of the resulting network soon approximates a power law (Barabási & Albert, 1999; though see Clauset et al., 2009). More importantly for the purposes of this chapter, if preferential attachment is strong enough to overcome any initial horizontal differentiation, high-status actors will attract ties from diverse subgroups, making them bridges (if ties are symmetric; also see Burt [1977]). However, given horizontal differentiation, we may have more complex structures, in which bridges are not necessarily of high degree, or where the network fractures into separate components as opposed to remaining connected. Further, the tendencies toward preferential attachment can be dampened by the tendency (noted earlier) for unreciprocated ties to be withdrawn. The unpopular stop trying to befriend the popular, leading to assortative mixing. In some cases, this may include assortative mixing by degree, producing a network with loosely connected subgroups with high internal density (Newman & Park, 2003).
However, we also may find that, as a result of this very assortativity, the degree distribution equalizes—the unpopular begin to have as many friends as the popular, as they withdraw their aspirational nominations to the popular and establish mutual relations with other unpopular actors. As with (necessarily symmetric) data on marriages, we may find it impossible to recover information about preferences from a distribution based only on realized outcomes (see Logan, Hoff, & Newton, 2008). Thus, the status distribution can lead to different sorts of network structures with different dynamics. But it is also possible for other forms of inequality to affect network dynamics and network data. We close with attention to this.10
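The sequential version of the popularity tournament described above can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' model: each entrant sends a fixed number of ties (`ties_per_entrant`, a hypothetical parameter) to distinct actors already present, targets are drawn with probability proportional to in-degree plus one (the offset is an assumption that lets actors with no ties yet be chosen), and ties are never withdrawn.

```python
import random


def preferential_attachment(n_nodes, ties_per_entrant=2, seed=0):
    """Grow a directed network one entrant at a time.

    Each entrant chooses distinct existing actors with probability
    proportional to (in-degree + 1). Returns the list of in-degrees.
    """
    rng = random.Random(seed)
    in_degree = [0] * n_nodes
    # The urn holds one entry per actor present plus one entry per tie
    # received, so a uniform draw is a draw proportional to in-degree + 1.
    urn = [0]
    for newcomer in range(1, n_nodes):
        n_choices = min(ties_per_entrant, newcomer)
        targets = set()
        while len(targets) < n_choices:
            targets.add(rng.choice(urn))
        for target in targets:
            in_degree[target] += 1
            urn.append(target)
        urn.append(newcomer)  # the newcomer becomes eligible to be chosen
    return in_degree


in_degree = preferential_attachment(2000)
print("mean in-degree:", sum(in_degree) / len(in_degree))
print("largest in-degree:", max(in_degree))
print("share never chosen:", in_degree.count(0) / len(in_degree))
```

Comparing the largest in-degree with the mean, and with the share of actors never chosen, gives a rough sense of how skewed the resulting status distribution is. Whether such a distribution is well described by a power law is a separate estimation question that this sketch does not address.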

Studying Networks in Unequal Environments

While we would all like to be able to reify networks, given the aesthetic pleasure of graph theories, we are repeatedly confronted with the fact that network analysts cannot ignore the quotidian social processes that produce their data, most importantly, questionnaire data that require recall (see, especially, Paik & Sanchagrin, 2013). And these processes turn out to be sensitive to inequality and difference (cf. Carley & Krackhardt, 1996; Menon & Smith, 2014). In particular, even outside of the tautology that people tend to nominate high-status alters, where status is measured as popularity, we still find a tendency for choices to align with status. Sometimes this is an artifact of timing. When you first went to college, you might have had 100 free slots for new acquaintances, but the seniors you met might have had only one or two left free. You remember their names; they don’t remember yours. This often increases the impression of a status gradient where there might not actually be one. But other times, it may be because higher-status people are likely to be easier to remember, if only because they tend to be more visible (see earlier). Finally, there is also a differential that seems to come from what is often called motivated cognition. “Aspirational” friendships may result from the fact that some respondents convince themselves that they are indeed liked by someone by whom they would like to be liked (see, e.g., Karweit & Hansell, 1983; An & McConnell, 2015; though also see Casciaro, 1998). And those of low status are likely to remember their one-time relations with those of higher status better than their one-time friends remember these same ties. This differential tie formation leads to a version of the Lake Wobegon effect; in this case, it is not the (true) fact that, as Scott Feld (1991) charmingly put it, your friends have more friends than you.
Rather, we have findings that are impossible: historians doing genealogical work notice that everyone’s relatives tend to be better off than they, because people tend to remember their successful distant relatives (i.e., those decent folks who might lend you money) and forget the schlumps (i.e., those who might seek to borrow money from you, the cads!). Further, background inequalities lead to other network distortions: some people are less likely to name anyone as a friend because they have limited language fluency, distrust the interviewer, or are not proud of their friends. Finally, a name generator that uses the wording “with whom do you discuss important matters?” may, to some extent, tap into the nonrandom distribution of a subjective sense of what matters count as important (Bearman & Parigi, 2004; Brashears, 2014; Small, 2013) or the vicissitudes of time (B. Lee & Bearman, 2017). As Pierre Bourdieu (1979) said, the privilege of believing that your ideas are important is one that is denied the dominated, perhaps less so in American than in European countries, but to an unexplored extent.11 The recall process itself (and not the choice of targets) may also be shaped by the subjective experience of inequality. E. Smith et al. (2012) find fascinating evidence that when respondents are feeling insecure, they may tend to stay closer to home as they mentally traverse their network. It also may be that cognitive load (e.g., worrying about something else) can lead people to use different, easier strategies, such as selecting names from a group with high triadic closure as opposed to naming a person perhaps liked more but unconnected to the others. Since stress and fear, like the confidence that one’s words are important, are not randomly distributed, there is reason to think that we have nonrandom differences in reporting.
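The (true) result attributed to Feld (1991) above, as distinct from the recall artifacts just described, can be checked by simple arithmetic on any symmetric friendship network. The sketch below is illustrative only: the function name and the toy network are hypothetical, ties are assumed symmetric and listed once each, and the "friends of friends" figure is the aggregate version, the mean degree of the person at the end of a randomly chosen friendship, which equals the sum of squared degrees divided by the sum of degrees.

```python
def friendship_paradox(edges):
    """Return (mean number of friends, mean number of friends of a friend).

    `edges` is an iterable of (a, b) pairs, each symmetric tie listed once.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    n_ends = sum(degree.values())  # every tie has two ends
    mean_degree = n_ends / len(degree)
    # A person with d friends appears as "someone's friend" d times,
    # so the popular are over-represented among friends.
    mean_friend_degree = sum(d * d for d in degree.values()) / n_ends
    return mean_degree, mean_friend_degree


# Hypothetical toy network: one popular actor "A" plus a single tie B-C.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
own, friends = friendship_paradox(toy_edges)
print(own, friends)  # 2.0 2.6
```

The second figure can never fall below the first, and the two are equal only when everyone has the same number of friends. The gap therefore reflects inequality in degree itself, not differential recall.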

Conclusion

Finally, it is worth noting that inequality is something that sociologists uniformly treat as a bad thing, if only because it is a secure business plan for them (a bit like being an exterminator in a tropical climate). But status is different: there seems to be no taboo to talking about striving for status, even if one’s eye teeth show as it happens. It is not simply that there is something a little bit daffy in wishing everyone the best in the striving after status and then expressing ressentiment at the resulting inequality. It is that, to the extent that our thinking about status involves projecting every form of social differentiation onto the real number line (and this extent, though varying across department and institution, is certainly far higher than it should be), we develop a consensually reaffirmed but fundamentally distorted notion of the basis of social differentiation. This distortion might put us in a remarkably bad position to do what many sociologists have been determined to attempt: to generate theories of inequality from theories of status. Social scientists would do better to look at the specifics of different network formation and reporting processes for different types of persons and different types of relations, and here we have pointed to encouraging work along these lines.

Notes

1. We note, parenthetically, the intellectual devastation that has resulted from a misguided effort to rationalize various measures of network position by attempting to treat them all as species of the genus “centrality,” an effort that is akin to attempting to treat all cutlery as varieties of spoons. “Centrality” only is meaningful in a graph that has a center, or an approximation to such. On issues of centrality we refer the reader to Borgatti in this volume.
2. This logic is far from airtight. If you are of such high status, why doesn’t that loser pick you, anyway? (See Rasch, 1960.)
3. We are grateful to Scott Feld for pointing this out in a talk. See Schoch, Valente, and Brandes (2017) for an empirical examination.
4. Recent research on model-based imputation of edges (e.g., C. Wang et al., 2016) offers promising solutions when there is rich information on actor covariates and the sampling frame for nodes is well defined, but much is still unknown about their practical consequences under different forms of measurement error.
5. We refer the reader to the discussion in D. J. Wang et al. (2012) of the easily overlooked issue of false positives in network data collection. This should be a growing concern as researchers increasingly rely on large, automatically collected digital data. Structural properties of the network can have opposite effects on sensitivity depending on the type of error. For example, strategies correcting for missing data may actually increase bias when spurious nodes are falsely sampled (e.g., an out-of-date school roster includes a student who has long since transferred).
6. Smith et al. have produced a helpful calculator for sensitivity analysis for nonrandom missingness: http://www.soc.duke.edu/~jmoody77/missingdata/calculator.htm (accessed December 8, 2017).
7. That said, Kitts (2014) correctly notes that, if used wisely, the recent rise of computational social science, including massive-scale data, will provide network analysts with better opportunities to distinguish between different sorts of relations.
8. One exception is Fararo and Skvoretz (1986, 1988; Fararo, Skvoretz, & Kosaka, 1994), who built upon work in animal studies. But they actually applied their model not to human groups, but to chickens. Skvoretz and Fararo (1996) later wed this model with task-group dynamics but had no longitudinal data with which to examine changes in vertical stratification over time.
9. Of course, similar findings may result from the incorporation of random error in the measure of status; it is not parvenus but miscategorized actors who appear to fall in status. But we see a similar finding with different data in Moody et al. (2011). However, as is often the case, it is far from clear that this should be expected to be a general finding; the Eagles, among others, point to the opposite dynamic, that of “the new kid in town.”
10. Researchers may also be interested in how “networks” (that is, personal relations) affect material inequality. Here we point the reader to a recent review by DiMaggio and Garip (2012). Most work here is not particularly attentive to network structure; the great exception is Burt (e.g., 1992). Here see McDonald in this volume.
11. Casual observation suggests that many professors would have great difficulty choosing which people they discuss important matters with, because of a sincere conviction that the only correct answer would involve all their discussion (or monologue) partners, since everything they say is important.

References

Abramovitch, R. (1980). Attention structures in hierarchically organized groups. In D. R. Omark, F. F. Strayer, & D. G. Freedman (Eds.), Dominance relations: An ethological view of human conflict and social interaction (pp. 381–396). New York, NY: Garland.
Adler, P. A., & Adler, P. (1995). Dynamics of inclusion and exclusion in preadolescent cliques. Social Psychology Quarterly, 58, 145–162.
An, W., & McConnell, W. R. (2015). Origins of asymmetric ties in friendship networks: From status differential to perceived centrality. Network Science, 3, 269–292.
Bales, R. F. (1950). Interaction process analysis. Cambridge, MA: Addison-Wesley.
Barabási, A., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Barnes, J. A. (1954). Class and committees in a Norwegian Island Parish. Human Relations, 7, 39–58.
Bearman, P., & Parigi, P. (2004). Cloning headless frogs and other important matters. Social Forces, 83, 535–557.
Berger, J., Fişek, M. H., Norman, R. Z., & Zelditch, M. (1977). Status characteristics and social interaction. New York, NY: Elsevier.
Berger, J., & Zelditch, M. (1998). Status, power, and legitimacy. New Brunswick, NJ: Transaction.
Blau, P. M. (1964). Exchange and power in social life. New York, NY: Wiley.
Boehm, C. (1997). Egalitarian behaviour and the evolution of political intelligence. In R. W. Byrne & A. Whiten (Eds.), Machiavellian intelligence II (pp. 341–364). Cambridge, UK: Cambridge University Press.
Bonacich, P. (1987). Power and centrality: A family of measures. American Journal of Sociology, 92, 1170–1182.
Borgatti, S. P., Carley, K. M., & Krackhardt, D. (2006). On the robustness of centrality measures under conditions of imperfect data. Social Networks, 28, 124–136.
Bothner, M. S., Haynes, R., Lee, W., & Smith, E. B. (2010). When do Matthew effects occur? Journal of Mathematical Sociology, 34, 80–114.
Bothner, M. S., Smith, E. B., & White, H. C. (2010). A model of robust positions in social networks. American Journal of Sociology, 116, 943–992.

Bourdieu, P. (1979). Public opinion does not exist. In A. Mattelart & S. Siegelaub (Eds.), Communication and class struggle (pp. 124–130). New York, NY: International General.
Boyd, R., & Silk, J. B. (1983). A method for assigning cardinal dominance ranks. Animal Behaviour, 31, 45–58.
Brashears, M. E. (2014). “Trivial” topics and rich ties: The relationship between discussion topic, alter role, and resource availability using the “important matters” name generator. Sociological Science, 1, 493–511.
Burt, R. S. (1977). Positions in multiple network systems, part one: A general conception of stratification and prestige in a system of actors cast as a social topology. Social Forces, 56, 106–131.
Burt, R. S. (1992). Structural holes: The social structure of competition. Cambridge, MA: Harvard University Press.
Carley, K. M., & Krackhardt, D. (1996). Cognitive inconsistencies and non-symmetric friendship. Social Networks, 18, 1–27.
Casciaro, T. (1998). Seeing things clearly: Social structure, personality, and accuracy in network perception. Social Networks, 20, 331–351.
Chase, I. D. (1980). Social processes and hierarchy formation in small groups: A comparative perspective. American Sociological Review, 45, 905–924.
Cheney, D. L., & Seyfarth, R. M. (1990). How monkeys see the world. Chicago, IL: University of Chicago Press.
Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51, 661–703.
Coleman, J. S. (1961). The adolescent society. New York, NY: Free Press of Glencoe.
Collins, R. (2000). Situational stratification: A micro-macro theory of inequality. Sociological Theory, 18, 17–43.
Correll, S. J., & Ridgeway, C. L. (2006). Expectation states theory. In J. Delamater (Ed.), Handbook of social psychology (pp. 29–51). New York, NY: Springer.
Costenbader, E., & Valente, T. W. (2003). The stability of centrality measures when networks are sampled. Social Networks, 25, 283–307.
Davis, J. A. (1979). The Davis/Holland/Leinhardt studies: An overview. In P. W. Holland & S. Leinhardt (Eds.), Perspectives on social network research (pp. 51–62). New York, NY: Academic Press.
Davis, J. A., & Leinhardt, S. (1972). The structure of positive interpersonal relations in small groups. In J. Berger, M. Zelditch, Jr., & B. Anderson (Eds.), Sociological theories in progress, Volume 2 (pp. 218–251). Boston, MA: Houghton Mifflin.
de Solla Price, D. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27, 292–306.
de Soto, C. B. (1960). Learning a social structure. Journal of Abnormal and Social Psychology, 60, 417–421.
de Waal, F. (1998). Chimpanzee politics (rev. ed.). Baltimore, MD: Johns Hopkins University Press.
DiMaggio, P. D., & Garip, F. (2012). Network effects and social inequality. Annual Review of Sociology, 38, 93–118.
Doreian, P., & Krackhardt, D. (2001). Pre-transitive balance mechanisms for signed networks. Journal of Mathematical Sociology, 25, 43–67.
Eder, D. (1985). The cycle of popularity: Interpersonal relations among female adolescents. Sociology of Education, 58, 154–165.

Eder, D., with Evans, C. C., & Parker, S. (1995). School talk: Gender and adolescent culture. New Brunswick, NJ: Rutgers University Press.
Fararo, T. J., & Skvoretz, J. (1986). E-state structuralism: A theoretical method. American Sociological Review, 51, 591–602.
Fararo, T. J., & Skvoretz, J. (1988). Dynamics of the formation of stable dominance structures. In M. Webster & M. Foschi (Eds.), Status generalization: New theory and research (pp. 327–350). Stanford, CA: Stanford University Press.
Fararo, T. J., Skvoretz, J., & Kosaka, K. (1994). Advances in e-state structuralism: Further studies in dominance structure formation. Social Networks, 16, 233–265.
Feld, S. L. (1991). Why your friends have more friends than you do. American Journal of Sociology, 96, 1464–1477.
Festinger, L. (1954). A theory of social comparison processes. Human Relations, 7, 117–140.
Fişek, M. H., Berger, J., & Norman, R. Z. (1995). Evaluations and the formation of expectations. American Journal of Sociology, 101, 721–746.
Flynn, F. J., Reagans, R. E., Amanatullah, E. T., & Ames, D. R. (2006). Helping one’s way to the top: Self-monitors achieve status by helping others and knowing who helps whom. Journal of Personality and Social Psychology, 91, 1123–1137.
Foddy, M., & Smithson, M. (1996). Relative ability, paths of relevance, and influence in task-oriented groups. Social Psychology Quarterly, 59, 140–153.
Friedkin, N. E. (1993). Structural bases of interpersonal influence in groups: A longitudinal case study. American Sociological Review, 58, 861–872.
Gould, R. V. (2002). The origins of status hierarchy: A formal theory and empirical test. American Journal of Sociology, 107, 1143–1178.
Gould, R. V. (2003). Collision of wills: How ambiguity about social rank breeds conflict. Chicago, IL: University of Chicago Press.
Habinek, J., Martin, J. L., & Zablocki, B. (2015). Double-embeddedness: Spatial and relational contexts of tie persistence and re-formation. Social Networks, 42, 27–41.
Hallinan, M. T. (1979). The process of friendship formation. Social Networks, 1, 193–210.
Hallinan, M. T., & Hutchins, E. E. (1980). Structural effects on dyadic change. Social Forces, 59, 225–245.
Hallinan, M. T., & Kubitschek, W. N. (1988). The effects of individual and structural characteristics on intransitivity in social networks. Social Psychology Quarterly, 51, 81–92.
Hanfmann, E. (1935). Social structure of a group of kindergarten children. American Journal of Orthopsychiatry, 5, 407–410.
Harvey, O. J. (1953). An experimental approach to the study of status relations in informal groups. American Sociological Review, 18, 357–367.
Heider, F. (1946). Attitudes and cognitive organization. The Journal of Psychology, 21, 107–112.
Heinicke, C., & Bales, R. F. (1953). Developmental trends in the structure of small groups. Sociometry, 16, 7–38.
Holland, P. W., & Leinhardt, S. (1971). Transitivity in structural models of small groups. Comparative Group Studies, 2, 107–124.
Holland, P. W., & Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76, 33–50.
Hölldobler, B., & Wilson, E. O. (1994). Journey to the ants. Cambridge, MA: Harvard University Press.
Homans, G. C. (1950). The human group. New York, NY: Harcourt, Brace and Company.
Homans, G. C. (1961). Social behavior: Its elementary forms. New York, NY: Harcourt, Brace and World.

Iverson, G. J., & Sade, D. S. (1990). Statistical issues in the analysis of dominance hierarchies in animal societies. Journal of Quantitative Anthropology, 2, 61–83.
Jameson, K. A., Appleby, M. C., & Freeman, L. C. (1999). Finding an appropriate order for a hierarchy based on probabilistic dominance. Animal Behaviour, 57, 991–998.
Jensen-Campbell, L. A., Adams, R., Perry, D. G., Workman, K. A., Furdella, J. Q., & Egan, S. K. (2002). Agreeableness, extraversion, and peer relations in early adolescence: Winning friends and deflecting aggression. Journal of Research in Personality, 36, 224–251.
Johnsen, E. (1985). Network macrostructure models for the Davis-Leinhardt set of empirical sociomatrices. Social Networks, 7, 203–224.
Kalma, A. (1991). Hierarchisation and dominance assessment at first glance. European Journal of Social Psychology, 21, 165–181.
Karweit, N., & Hansell, S. (1983). Sex differences in adolescent relationships: Friendship and status. In J. L. Epstein & N. Karweit (Eds.), Friends in school: Patterns of selection and influence in secondary schools (pp. 115–130). New York, NY: Academic Press.
Kitts, J. A. (2014). Beyond networks in structural theories of exchange: Promises from computational sociology. Advances in Group Processes, 31, 263–298.
Kovács, B., & Sharkey, A. J. (2014). The paradox of publicity: How awards can negatively affect the evaluation of quality. Administrative Science Quarterly, 59, 1–33.
Landau, H. G. (1953). On dominance relations and the structure of animal societies: III. The condition for a score structure. Bulletin of Mathematical Biophysics, 15, 143–148.
Lee, B., & Bearman, P. (2017). Important matters in political context. Sociological Science, 4, 1–30.
Lee, M. T., & Ofshe, R. (1981). The impact of behavioral style and status characteristics on social influence: A test of two competing theories. Social Psychology Quarterly, 44, 73–82.
Leinhardt, S. (1972). Developmental change in the sentiment structure of children’s groups. American Sociological Review, 37, 202–212.
Lenski, G. (1954). Status crystallization: A non-vertical dimension of social status. American Sociological Review, 19, 405–413.
Linton, R. (1936). The study of man. New York, NY: Appleton-Century-Crofts.
Logan, J. A., Hoff, P. D., & Newton, M. A. (2008). Two-sided estimation of mate preferences for similarities in age, education, and religion. Journal of the American Statistical Association, 103, 559–569.
Maine, H. S. (1888 [1861]). Ancient law. New York, NY: Henry Holt.
Manzo, G., & Baldassarri, D. (2014). Heuristics, interactions, and status hierarchies: An agent-based model of deference exchange. Sociological Methods and Research, 44, 329–387.
Martin, J. L. (1998). Structures of power in naturally occurring communities. Social Networks, 20, 197–225.
Martin, J. L. (2005). Is power sexy? American Journal of Sociology, 111, 408–446.
Martin, J. L. (2009a). Social structures. Princeton, NJ: Princeton University Press.
Martin, J. L. (2009b). The formation and stabilization of vertical hierarchies among adolescents: Towards a quantitative ethology of dominance among humans. Social Psychology Quarterly, 72, 241–264.
Martin, J. L., & Yeung, K. (2006). Persistence of close personal ties over a twelve year period. Social Networks, 28, 331–362.
Mazur, A. (1973). A cross-species comparison of status in small established groups. American Sociological Review, 38, 513–530.
Menon, T., & Smith, E. B. (2014). Identities in flux: Cognitive network activation in times of change. Social Science Research, 45, 117–130.

Merton, R. K. (1957). The role-set: Problems in sociological theory. British Journal of Sociology, 8, 106–120.
Merton, R. K. (1968). Social theory and social structure. New York, NY: Free Press.
Mitchell, J. C. (1969). Social networks in urban situations: Analyses of personal relationships in central African towns. Manchester, UK: Manchester University Press.
Moody, J., Brynildsen, W. D., Osgood, D. W., Feinberg, M. E., & Gest, S. (2011). Popularity trajectories and substance use in early adolescence. Social Networks, 33, 101–112.
Nakao, K., & Romney, A. K. (1993). Longitudinal approach to subgroup formation: Re-analysis of Newcomb’s fraternity data. Social Networks, 15, 109–131.
Newman, M. E. J., & Park, J. (2003). Why social networks are different from other types of networks. Physical Review E, 68, 036122.
Paik, A., & Sanchagrin, K. (2013). Social isolation in America: An artifact. American Sociological Review, 78, 339–360.
Podolny, J., & Lynn, F. (2009). Status. In P. Hedström & P. Bearman (Eds.), The Oxford handbook of analytical sociology (pp. 544–565). New York, NY: Oxford University Press.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danish Institute of Educational Research.
Ridgeway, C. L. (1987). Nonverbal behavior, dominance, and the basis of status in task groups. American Sociological Review, 52, 683–694.
Roberts, J. M., Jr. (1990). Modeling hierarchy: Transitivity and the linear ordering problem. Journal of Mathematical Sociology, 16, 77–87.
Sade, D. S., Altmann, M., Loy, J., Hausfater, G., & Breuggeman, J. A. (1988). Sociometrics of Macaca mulatta II. Decoupling centrality and dominance in rhesus monkey social networks. American Journal of Physical Anthropology, 77, 409–425.
Salganik, M. J., & Watts, D. J. (2008). Leading the herd astray: An experimental study of self-fulfilling prophecies in an artificial cultural market. Social Psychology Quarterly, 71, 338–355.
Sauder, M., Lynn, F., & Podolny, J. M. (2012). Status: Insights from organizational sociology. Annual Review of Sociology, 38, 267–283.
Savin-Williams, R. C. (1987). Adolescence: An ethological perspective. New York, NY: Springer-Verlag.
Savin-Williams, R. C., & Joyner, K. (2014). The dubious assessment of gay, lesbian, and bisexual adolescents of Add Health. Archives of Sexual Behavior, 43, 413–422.
Schaefer, D. R., Kornienko, O., & Fox, A. M. (2011). Misery does not love company: Network selection mechanisms and depression homophily. American Sociological Review, 76, 764–785.
Schoch, D., Valente, T. W., & Brandes, U. (2017). Correlations among centrality indices and a class of uniquely ranked graphs. Social Networks, 50, 46–54.
Sherif, M., Harvey, O. J., White, B. J., Hood, W. R., & Sherif, C. W. (1988 [1961]). The robbers cave experiment. Middletown, CT: Wesleyan University Press.
Skvoretz, J., & Fararo, T. J. (1996). Status and participation in task groups: A dynamic network model. American Journal of Sociology, 101, 1366–1414.
Small, M. L. (2013). Weak ties and the core discussion network: Why people regularly discuss important matters with unimportant alters. Social Networks, 35, 470–483.
Smith, E., Menon, T., & Thompson, L. (2012). Status differences in the cognitive activation of social networks. Organization Science, 23, 67–82.

Networks, Status, and Inequality   115 Smith, J. A., & Faris, R. (2015). Movement without mobility: Adolescent status hierarchies and the contextual limits of cumulative advantage. Social Networks, 40, 139–153. Smith, J.  A., & Moody, J. (2013). Structural effects of network sampling coverage I: Nodes missing at random. Social Networks, 36, 652–668. Smith, J. A., Moody, J., & Morgan, J. H. (2017). Network sampling coverage II: The effect of non-random missing data on network measurement. Social Networks, 48, 78–99. Stewart, D. (2005). Social status in an open-source community. American Sociological Review, 70, 823–842. Strayer, F. F., & Strayer, J. (1976). An ethological analysis of social agonism and dominance relations among preschool children. Child Development, 47, 980–989. Sumner, W. G. (1906). Folkways. Boston, MA: Ginn and Company. Tiger, L. (1970). Dominance in human societies. Annual Review of Ecology and Systematics, 1, 287–306. van de Rijt, A., Shor, E., Ward, C., & Steven S. (2013). Only 15 minutes? The social stratification of fame in printed media. American Sociological Review, 78, 266–289. van Duijn, M. A. J., Snijders, T. A. B., & Zijlstra, B. J. H. (2004). p2: A random effects model with covariates for directed graphs. Statistica Neerlandica, 58, 234–254. Vaughn, B. E., & Waters, E. (1980). Social organization among preschool peers: Dominance, attention, and sociometric correlates. In D.  R.  Omark, F.  F.  Strayer, & D.  G.  Freedman (Eds.), Dominance relations: An ethological view of human conflict and social interaction (pp. 359–379). New York, NY: Garland. Waller, W. (1937). The rating and dating complex. American Sociological Review 2, 727–734. Wang, D. J., Shi, X., McFarland, D. A., & Leskovec, J. (2012). Measurement error in network data: A re-classification. Social Networks, 34, 396–409. Wang, C., Butts, C. T., Hipp, J. R., Jose, R., & Lakon, C. M. (2016). 
Multiple imputation for missing edge data: A predictive evaluation method with application to Add Health. Social Networks, 45, 89–98. Weber, M. (1947). The theory of social and economic organization (A.  M.  Henderson & T. Parsons, Trans.). New York, NY: Oxford University Press. Whyte, W. F. (1981 [1943]). Street corner society (3rd ed.). Chicago, IL: University of Chicago Press. Yeung, K., & Martin, J. L. (2003). The looking glass self: An empirical test and elaboration. Social Forces, 81, 843–879. Zablocki, B. (1980). Alienation and charisma. New York, NY: Free Press. Zink, C.  F., Tong, Y., Chen, Q., Bassett, D.  S., Stein, J.  L., & Meyer-Lindenberg, A. (2008). Know your place: Neural processing of social hierarchy in humans. Neuron, 58, 273–283.

Part II

Network Methods

Chapter 7

Strategies for Collecting Social Network Data: Overview, Assessment, and Ethics

Jimi Adams, Tatiane Santos, and Venice Ng Williams

For decades, the field of social network analysis was dominated by a popular set of small-scale datasets.1 Scholars reanalyzed these existing datasets as newly developing methods allowed for refined strategies to answer the original research questions. For example, one of the most widely used datasets, Zachary’s (1977) karate club data, is so well known that it is used to benchmark new approaches to community detection (see Fortunato, 2010).2 Commonly available data were also reanalyzed to evaluate new research questions or theoretical reformulations of previous ideas. A prominent example can be found in the numerous reanalyses of the Coleman, Katz, and Menzel (1957) data on physician prescribing practices. As scholars returned to these data, each subsequent paper reassessed the diffusion pattern of tetracycline among these physicians and offered a modified account of the primary diffusion drivers identified in the studies that came before (Burt, 1987; Kilduff & Oh, 2006; Marsden & Podolny, 1990; Strang & Tuma, 1993; Van den Bulte & Lilien, 2001). As the remainder of this chapter will detail, many best practices for gathering social network data are expensive (in time, effort, and resources), which explains why this and other popular early social network datasets were so frequently reanalyzed.

Over the past few decades, scholars have incorporated network data collection into a rapidly expanding number and type of research projects. Alongside this growth, researchers have also moved to develop and standardize the methods employed in those studies. Fortunately, the field seems to have retained the strong norm of data sharing, meaning that many of these newly emerging datasets are available to researchers beyond those directly engaged in their collection. This expansion has grown to the point that even some individual projects now serve as repositories for hundreds of unique network datasets (Bevc, Retrum, & Varda, 2015).

With the proliferation and availability of new social network data, the field has identified best practices for a more comprehensive set of dimensions that allow researchers to optimize social network data collection. This chapter provides an overview of three main dimensions: design strategies, assessment and implications of design strategies, and ethical considerations. However, before engaging in any discussion of how to gather social network data, it is imperative to identify what exactly researchers aim to capture when soliciting information on social networks. This matters particularly for how we gather social network data, because different conceptualizations of the ties of interest can fundamentally alter strategies for measuring those relationships.

What Is the Goal? Theory’s Role in Gathering Network Data

Borgatti and colleagues (2009) provide a means for classifying the different types of network ties most frequently hypothesized to shape outcomes that interest social scientists. They differentiate among three primary types of ties: (1) social relationships, (2) interactions, and (3) flows.3 Social relationships capture particular relationship-based roles that people occupy (e.g., friendships, kinship ties, affective relationships, etc.).4 These types of ties are frequently thought to exhibit a number of characteristics that enhance our ability to measure them: members of the relationship can easily perceive and report on them, and the ties are often enduring (rather than fleeting), allowing for stable measurement. Interactions capture the types of (frequently momentary) activities in which pairs of people jointly participate (having sex, sharing resources, sending and receiving messages, etc.).5 While social relationships differ from interactions conceptually, they also differ in the approaches taken for their analysis. Compare, for example, models that focus on more stable relationships (e.g., exponential random graph models or stochastic actor-based models) with those that focus on the sequential structural patterns among more momentary interactions (e.g., relational event models). Finally, flows capture the actual spreading or contagion of various things (ideas, diseases, resources, patient transfers) between the nodes in a network that are connected by relational or interaction ties. These varied conceptualizations each carry quite different implications for what types of ties researchers aim to measure, and therefore for how we should go about capturing each type.

Even when a study’s aims clearly align with one of these conceptualizations and clearly indicate an appropriate strategy for gathering the appropriate data (e.g., the transmission of a sexually transmitted infection), practical limitations of data collection often mean that researchers can gather only empirical proxies that approximate the actual processes of interest (e.g., using self-reported sexual partnership data, perhaps combined with biomarkers of disease status at multiple time points). Moreover, even when theory and methods align on the pertinent tie types, the methods of data collection frequently cannot match the precision needed to actually capture all of the relevant dynamics from a population of interest. For example, even if researchers can accurately capture behaviors within sexual partnerships, it likely is not possible to do so with the temporal granularity needed to distinguish the discrete act in which transmission actually takes place from other nontransmitting interactions between the same partners. This requires researchers to carefully identify potential sources of conceptual slippage between the concepts of theoretical interest and their actual data and analytic capacities.6 In addition to these conceptual comparisons, later in the chapter we also describe existing strategies for, and common findings from, directly assessing the quality of network data.

Following longstanding norms in the literature, we orient much of the discussion that follows around data collection strategies developed for survey-based approaches, which primarily address gathering social relationship data (Marsden, 2011). Most of these considerations carry over into gathering data on other tie types, and we describe the necessary adaptation of those concepts for other approaches where appropriate.

Design Strategies for Sampling and Measurement

Researchers are faced with three primary design considerations when gathering social network data, regardless of which type of tie(s) they are interested in capturing. First, which actors should be included? Second, which of those actors’ ties will they aim to capture? Third, how will those relationships be measured?

The “Boundary Specification” Problem

Generally in social science, sampling is addressed solely as a question of which individuals within a population should be targeted for inclusion in a study. When the analytic focus shifts to relationships, which are the foundation of social network analysis (Berkowitz, 1982), this sampling question must simultaneously bound which individuals and which relationships will be included in the analysis, a consideration labeled the “boundary specification problem” (Laumann, Marsden, & Prensky, 1994). Broadly, there are three primary strategies employed to define the boundaries of network-based study designs.

Complete network designs begin by enumerating an entire population of interest, acquire a saturated sample from that population that represents a full census of respondents, then identify all of the relationships that exist among members of that population. Complete network designs typically are most effective in populations that are easily bounded (e.g., formal organizational memberships) and that provide salient boundaries constraining the majority of relationships to exist among members of that bounded population (e.g., schools or classrooms). However, just because boundaries can be clearly defined does not necessarily mean that they provide the social boundaries that constrain social relationships. While some members can be clearly demarcated as within certain boundaries, their relationships may or may not be similarly contained within those boundaries (e.g., out-of-school friendships). This is why network scholars have repeatedly noted that the boundary specification problem is uniquely multilevel in social network studies; we must simultaneously consider how fully the population and the relationships of interest are contained within the demarcated boundary (Laumann et al., 1994; Morris, 2004).

On the other end of the spectrum, ego network designs investigate networks primarily as characteristics of the relationships surrounding particular individuals (see Smith’s Chapter 10 in this volume). Because of this focus on individuals and their relationships, sampling considerations for ego network designs generally follow relatively standard individual-level procedures. As a result, ego network designs can be included in other study designs, adding personal network information to the other individual-level data that are gathered, such as from nationally representative, population-based samples. For example, the General Social Survey has periodically included a network module (Burt, 1984).

Between the opposing extremes described above, many studies take a more blended strategy labeled a “partial network” design (Morris, 2004). This design relies on a link-tracing approach, wherein the researcher first identifies a sample of “seed” respondents, then constructs a strategy to recruit some portion of each ego’s nominated alters as respondents themselves. This pattern can then be repeated as many or as few times as desired. While this strategy has similarities to general snowball sampling, the additional waves of recruitment in link-tracing designs are typically more systematically determined (see McCormick’s Chapter 9 in this volume). For example, epidemiologists developed many of the earliest, and most sophisticated, versions of these designs to trace infectious disease exposure and transmission (Klovdahl, 1985).7

Beyond these different conceptualizations of which people and ties to include, Laumann and colleagues (1994) also distinguish between strategies for who can make those boundary definitions. That is, even if researchers determine that their study requires a “complete” network study of members within an organization, different people may have divergent perspectives on who is appropriately identified as a member. The realist approach to boundary specification relies on people within the population of interest to subjectively determine who should be included in or excluded from consideration as members of an organization (Laumann et al., 1994). A nominalist approach begins with some predetermined identification of the population boundary, for example via a membership roster. It is important to point out that either of these boundary specification approaches merely identifies which members and ties potentially could be included in the study8 and is separate from the question we turn to in the next section: which ties are actually reported on. Each of these approaches comes with its own potential benefits and drawbacks.

The prominent National Longitudinal Study of Adolescent Health (Add Health) demonstrates some of the potential limitations that arise even in seemingly well-bounded networks. Bearman and colleagues’ (2004) study of romantic and sexual networks found that even when they asked students to identify their partners from a roster of other students attending their school,9 approximately half of the indicated partnerships were outside the school. Given these permeable boundaries, those ties outside the school had broad potential implications for the study’s aims to understand how these networks shaped population-level diffusion potential and determinants of individual risk for sexually transmitted infections.
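
The link-tracing logic behind the partial network designs described earlier in this section can be illustrated with a short sketch. The Python function below is a minimal illustration only, not a procedure drawn from any study cited here. The names `link_trace`, `nominations`, and `per_ego` are hypothetical, and recruiting the first `per_ego` alters that each ego names is a simplifying assumption (actual designs often select alters at random or by another systematic rule).

```python
def link_trace(nominations, seeds, waves, per_ego=None):
    """Return the set of people recruited by tracing nominations from seeds.

    nominations: dict mapping each person to the ordered list of alters they name.
    seeds:       initial respondents.
    waves:       number of recruitment rounds after the seeds.
    per_ego:     optional cap on how many of each ego's alters are recruited
                 (assumption: the first `per_ego` named alters are taken).
    """
    recruited = set(seeds)
    frontier = list(seeds)
    for _ in range(waves):
        next_frontier = []
        for ego in frontier:
            alters = nominations.get(ego, [])
            if per_ego is not None:
                alters = alters[:per_ego]
            for alter in alters:
                if alter not in recruited:
                    recruited.add(alter)
                    next_frontier.append(alter)
        frontier = next_frontier  # only newly recruited people are traced next
    return recruited


# Hypothetical nomination data for illustration.
example_nominations = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["a", "e"],
    "d": [],
    "e": ["f"],
}
one_wave = link_trace(example_nominations, ["a"], waves=1)
two_waves = link_trace(example_nominations, ["a"], waves=2)
two_waves_capped = link_trace(example_nominations, ["a"], waves=2, per_ego=1)
```

Varying `waves` and `per_ego` shows how the recruitment rule, and not only the underlying network, determines which actors end up inside the study boundary.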

Name Generators—Which Relationships?

In the context of a social network survey, once researchers have a strategy to identify respondents, they must consider what number and type(s) of ties they will ask about and what information they will gather about those relationships. In network studies, these distinctions typically break down into selecting the name generators and name interpreters that will be used. Simply, a name generator identifies which partners (often referred to as alters) will be elicited within a study, based on a specified relationship type. Traditionally, social network studies have often focused on a single name generator to elicit information on one relationship type at a time. In this case, researchers rely on theory to carefully identify the most appropriate relationship on which to elicit information. Perhaps the single most frequently used name generator is the General Social Survey’s (GSS) “important matters” question.10 Prompts like these allow respondents to name partners with whom they share the specified relationship.

Name generators like the “important matters” question leave substantial room for respondent interpretation, leading some to question the precision of the resulting data (Bearman & Parigi, 2004). Research has repeatedly shown that more specific name generators solicit more reliable, precise, and accurate data (Brewer et al., 2005; Adams & Moody, 2007). However, others note that the GSS “important matters” question was used primarily to elicit respondents’ salient relationships from which they could potentially draw social support and regular interaction. Given these aims, some research has shown that questions open to respondent interpretation can appropriately prompt reports of respondents’ salient personal relationships, even if respondents rely on substantially different mental maps of what the question means while formulating their responses (Brashears, 2014).

Practically, using a single name generator reduces both interviewer and respondent burden but may reduce the specificity and/or the range of relationships successfully captured (Marin & Hampton, 2006). These tradeoffs raise the important question of whether a single name generator is adequate for a study’s aims. Ultimately, there is no blanket answer to this question; the recommended best practices have typically relied on a study’s theoretical motivation. That is, the “ideal” network should include all the individuals or organizations who have a role in the process of interest (e.g., public policy decision making, quality improvement, etc.) while excluding individuals who have no bearing on the process (Burt et al., 2012). An increasing number of studies show that for many questions, a single name generator cannot match the coverage provided by multiple name generators (Klofstad, McClurg, & Rolfe, 2009; Marin & Hampton, 2006). For example, a study of physicians sought to limit respondent burden by identifying the minimal set of name generators needed to sufficiently characterize networks involved in quality improvement (QI) intervention and dissemination (Burt et al., 2012). The researchers found that single name generators insufficiently enumerated the ties that are relevant to QI network processes. Other work highlights the importance of multiple name generators for spanning the different relational domains pertinent to individuals’ lives (McCarty, Killworth, & Rennell, 2007).

Whether employing one or multiple name generators, all studies must consider a number of additional factors regarding how to employ the selected name generator(s). A first consideration is whether respondents are provided with a roster of other members of the population from whom to identify potential alters or are asked to free-recall their names. Employing a free-recall approach has the potential to underrepresent respondents’ personal network size, due to limits on memory and recall (Brewer, 2000). For example, in a longitudinal study of older adults, after respondents named confidant alters via free recall, researchers then prompted respondents with alters they had named in previous waves of the study to follow up on their exclusion from subsequent rounds (Cornwell et al., 2014). This study went on to show that these exclusions represented both real network change (i.e., former confidants who are no longer partners) and memory errors (i.e., people the respondent confirms as still being a confidant, but only after prompting).

A second consideration is how many alters researchers aim to elicit information about for each name generator. Are they going to cap the number of alters about whom they elicit information? Many studies limit respondents to a small number of nominations (e.g., three to five; see, e.g., Merluzzi & Burt, 2013), either because small numbers are thought sufficient to capture the theoretically salient relationships or because of the burdens that each additional named alter implies for follow-up questions (Paik & Sanchagrin, 2013). Caps can either be implicit (i.e., only known to the researcher) or explicit (i.e., mentioned to respondents, perhaps in the question script). Research shows that explicitly identified caps can have a range of unintended consequences. Mentioning a number could introduce floor effects that artificially inflate the number of named partners,11 which could alter a study’s analytic findings. For example, when Add Health asked respondents to name their five best male and female friends (Bearman, Jones, & Udry, 1997), it may have inadvertently reduced the amount of gender-based homophily (Kandel, 1978; McPherson, Smith-Lovin, & Cook, 2001) observed in that population by encouraging respondents to name more opposite-sex friends than they would have without such an explicit numerical anchor in the question prompt. If respondents or interviewers interpret a number included in the prompt as a cap on the number of alters to be elicited, the population degree distribution could be artificially truncated (by stopping reporting or recording nominations above the cap), thus reducing both the observed range and variability within the population.
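
The truncation effect of a nomination cap can be made concrete with a small numerical sketch. The Python snippet below uses invented out-degree values (the list `true_degrees` and the cap of five are assumptions chosen purely for illustration, not figures from Add Health or any other study) to show how capping nominations compresses both the range and the variance of the observed degree distribution.

```python
from statistics import pvariance


def apply_cap(out_degrees, cap):
    """Truncate each respondent's number of nominations at `cap`."""
    return [min(degree, cap) for degree in out_degrees]


# Hypothetical "true" numbers of alters for seven respondents.
true_degrees = [0, 1, 2, 3, 5, 8, 12]
capped_degrees = apply_cap(true_degrees, cap=5)

true_range = max(true_degrees) - min(true_degrees)
capped_range = max(capped_degrees) - min(capped_degrees)

true_variance = pvariance(true_degrees)
capped_variance = pvariance(capped_degrees)
```

In this toy example every respondent with more than five alters is recorded as having exactly five, so high-degree actors become indistinguishable from one another in the observed data.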

Name Interpreters—Information about Identified Social Ties

Most studies follow name generator question(s) with name interpreters. Name interpreters ask respondents to provide additional information about each alter nominated in their responses to the name generator question(s). Typically, these focus on (1) attributes of the nominated alters, (2) details of the relationship between the respondent and each alter (e.g., strength, type, or frequency of the relationship), and (3) the respondent’s estimate of relationships that exist among their alters (e.g., “How well does Person A know Person B?” for each pair of alters).

For most network-based investigations, we are interested in not only the number of ties that individuals have but also attributes of the alters to whom they are connected. These attributes can be used to examine the composition of networks (e.g., how many, what proportion, etc., of one’s alters have particular characteristics) or the (dis)similarity between ego and alter attributes (e.g., the well-known network pattern of homophily; McPherson et al., 2001). This aim leads most network studies to include some strategy to gather information on alters’ attributes. Any time alters are included among a study’s respondents (e.g., in complete network designs, and once recruited into a partial network design), their attributes can be gathered from these respondents themselves. In the case of ego network designs, respondents’ reports on their alters’ attributes are likely the only source for obtaining such information. Moreover, even when those alters are included among a study’s respondents, it can still be valuable to get a respondent’s perception of their partners’ attributes (e.g., as validation, or to model perception vs. reality). However, as with many aspects of network data collection, respondent burden rapidly increases: each additional name interpreter must be asked for each alter identified by the name generator. Therefore, when gathering network data, it is often thought to be even more important to optimize the efficiency and appropriateness of name interpreter items (Young et al., 2016), by avoiding the collection of irrelevant, redundant, costly, and/or time-consuming data (McCarty et al., 2007).

In addition to alters’ attributes, studies also frequently rely on egos to provide additional information about the relationships identified with the name generators. For example, in Add Health, respondents were asked to describe whether they had engaged in a selection of activities (talked on the phone, visited their house, etc.) with each of their named friends (Bearman et al., 2004). These tie-based interpreters can be considered either as separate (behavioral) network questions to be addressed independently (e.g., by analyzing the patterns of phone communication networks among identified friends) or, more frequently, as an opportunity to interpret how the networks elicited by the name generators should be used in analyses (e.g., by considering how phone communication fosters friendship ties). Alternatively, these relationship-based name interpreters can allow researchers to investigate how respondents potentially differentially interpret the same question prompts (see Bearman & Parigi, 2004).

In summary, network data collection design strategies entail the three primary components described: (1) delineating which nodes and which ties will (and will not) be of interest to a study, frequently described as the “boundary specification problem”; (2) determining the set(s) of relationships on which to gather information, referred to as “name generators” in survey approaches; and (3) attaching additional information about the nodes and ties under study, referred to as “name interpreters” in survey approaches. While the previous descriptions focus on survey approaches for eliciting network data (i.e., by asking some egos to report on themselves and the ties they have to a specified set of alters), data collection strategies have rapidly evolved to incorporate a variety of other strategies (e.g., see the chapters in this volume by Kitts and Quintane [Chapter 5] and Brashears and Gladstone [Chapter 8], or Dominguez & Hollstein, 2014). These other strategies can range from relying on external sources of information on individuals (e.g., from archival sources) that can be attached as node attributes to passive behavioral monitoring (e.g., recording behavioral interaction data from automated sensors) (Eagle, Pentland, & Lazer, 2009; Salathe et al., 2010). While these strategies differ in the ways that data are actually collected, all of the considerations described apply equally to any of them. For example, boundary specification is an issue that applies to data collected by survey or by sensor. Moreover, since they were collected for other purposes, archival or passive data sources’ inclusion criteria have the potential to differ in meaningful ways from the analytic aims of any study in which they are used, and researchers should carefully consider how the differences between their aims and the qualities of available data may alter how they interpret any subsequent analytic results.
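
Two of the name interpreter ideas discussed in this section, network composition and respondent burden, lend themselves to a brief worked sketch. The Python functions below are illustrative assumptions rather than established instruments: the names `composition`, `share_same_as_ego`, and `interpreter_items` are hypothetical, and the burden count assumes one question per alter for each attribute item plus, optionally, one question for each pair of alters.

```python
def composition(alter_attrs, value):
    """Proportion of an ego's alters whose attribute equals `value`."""
    if not alter_attrs:
        return None  # undefined for egos who named no alters
    return sum(attr == value for attr in alter_attrs) / len(alter_attrs)


def share_same_as_ego(ego_attr, alter_attrs):
    """Simple ego-alter similarity: share of alters matching ego's attribute."""
    return composition(alter_attrs, ego_attr)


def interpreter_items(n_alters, n_attribute_items, ask_alter_alter_ties=True):
    """Count follow-up questions implied by a set of name interpreters.

    Assumes each attribute item is asked once per alter and, if requested,
    one alter-alter tie question is asked per unordered pair of alters.
    """
    items = n_alters * n_attribute_items
    if ask_alter_alter_ties:
        items += n_alters * (n_alters - 1) // 2
    return items


# Hypothetical ego who named four alters, with a two-category attribute.
alters_attribute = ["f", "m", "f", "f"]
prop_f = composition(alters_attribute, "f")
similarity = share_same_as_ego("f", alters_attribute)
burden_five_alters = interpreter_items(n_alters=5, n_attribute_items=4)
```

Because the pairwise term grows with the square of the number of alters, the sketch also suggests why caps on nominations and parsimonious interpreter sets are often discussed together.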

Data Quality and Assessment: Did We Capture What We Intended to Capture?

Once gathered, social network data often provide unique capabilities for assessing their quality (Wald, 2014). Here we describe (1) the primary ways network data have been evaluated and common descriptive patterns found across these assessments, (2) implications of these types of assessments for the analytic capabilities of network data in other studies, and (3) existing strategies to overcome the limitations identified under the first two points.

Tie Reliability and Validity

Because social network ties, by definition, involve two different actors, frequently the same information can potentially be reported by both actors involved, and as noted earlier, sometimes even by their partners (see an additional perspective on this in the section on cognitive social structures later). This allows direct comparisons of how consistently the same ties are reported across these various reports, providing the capacity to assess the reliability, if not the validity, of the gathered tie information. For example, Adams and Moody (2007) evaluated how consistently partners reported sexual and needle-sharing partnerships in a project examining HIV risk (Potterat et al., 2004). This analysis found broad general agreement among partners when reporting on their own behaviors, and agreement improved when properly accounting for the temporal specificity with which respondents were asked about their partnerships (Adams & Moody, 2007).12 This type of analysis builds on a long research tradition, which raises a number of questions about how reliably people report partnership information (see, e.g., Killworth & Bernard, 1976; Bernard, Killworth, & Sailer, 1979). But as exhibited by the earlier example (Adams & Moody, 2007), ties that are more salient (e.g., strong vs. weak relationships) and more precisely elicited (e.g., with specific time-bounding of relationship reporting windows) are more likely to be reported consistently (see also Brewer, 2000; Brewer & Webster, 1999).
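
One simple way to express the kind of partner-report comparison described in this section is as a concordance rate. The Python sketch below is an assumption-laden illustration, not the measure used by Adams and Moody (2007) or any other cited study: the function name `tie_concordance` is hypothetical, ties are treated as undirected, and only dyads in which both members were interviewed are compared.

```python
def tie_concordance(reports):
    """Share of comparable dyads in which both members report the tie.

    reports: dict mapping each respondent to the set of partners they named.
    A dyad is comparable only if both members appear as respondents.
    Returns None when no comparable dyad was reported by anyone.
    """
    seen = set()
    reported_by_both = 0
    reported_by_either = 0
    for ego, alters in reports.items():
        for alter in alters:
            if alter == ego or alter not in reports:
                continue  # skip self-nominations and non-interviewed alters
            dyad = frozenset((ego, alter))
            if dyad in seen:
                continue  # each dyad is counted once
            seen.add(dyad)
            reported_by_either += 1
            if ego in reports[alter]:
                reported_by_both += 1
    if reported_by_either == 0:
        return None
    return reported_by_both / reported_by_either


# Hypothetical reports: "x" was named but never interviewed.
example_reports = {
    "a": {"b", "c", "x"},
    "b": {"a"},
    "c": set(),
    "d": {"a"},
}
concordance = tie_concordance(example_reports)
```

Here only the a-b tie is confirmed by both members, while a-c and a-d are each reported by one side, giving a concordance of one in three.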

Implications and Quality Assessment

Researchers are increasingly incorporating the types of evaluations described previously into their data collection efforts (Phillips et al., 2017; An & Schramski, 2015). But beyond simply describing patterns of data fidelity, researchers are increasingly demonstrating how any such imprecisions in relational data (e.g., misidentifying characteristics of alters) can influence the interpretation of network patterns from the subsequent data (Young et al., 2016). Different strategies for handling partner disagreements on relationships can lead to altered estimates of a relationship’s existence, duration, and content (Adams & Moody, 2007; Phillips et al., 2017). Such variability between partners’ reports has proven important in some contexts for estimating behavioral implications for population-level HIV-relevant risk behaviors (Helleringer et al., 2011), but other researchers have shown that estimates of an influenza epidemic can be quite robust to how this variability in reporting is incorporated into their models (Potter, Smieszek, & Sailer, 2015). As such, the utility of such reliability and validity assessments is best understood in the context of the data’s intended applications rather than solely as a standalone question.

Rather than treating network reporting inconsistencies only as a nuisance that hampers accurate representation and interpretation, other researchers have attempted to leverage these incongruences as theoretically informative. For example, in populations where preferential attachment processes are at work, we find high correlation between received nominations and most centrality indicators (Smith, Moody, & Smith-Lovin, 2017). However, because of recall limitations and/or alter list truncation, the most popular actors cannot reciprocate all of the nominations they receive. Grippa and Gloor (2009) highlight how this greater likelihood for high-degree actors to have unreciprocated nominations can be leveraged as an independent estimator of these individuals’ centrality (on the underlying complete graph, even if it is not observed) or their reputation and informal leadership within the group.

Strategies for Optimizing Data Fidelity

A variety of strategies have emerged for resolving discrepancies between partners’ reports on the same relationships. The most straightforward possibilities are either to use the intersection set, which counts a relationship as existing only when both reports agree on the nature of the relationship, or to use the union set, which counts a relationship as existing if either member of the relationship describes its presence. When choosing only between these two options, researchers more commonly rely on union-set logic (Brewer & Webster, 1999). The rationale for this stems from two observations: study designs often preclude one partner from reporting on a tie’s presence (e.g., through nonresponse), and concordance comparisons in the empirical literature find that “false negatives” are a much more common reason for discrepancies. That is, researchers choose to include ties that are reported by either partner and do not take the other partner’s failure to confirm the nomination as necessarily indicating that the tie is not present (Brewer, 2000).13

While such simple (and mostly a priori) decisions about tie inclusion/exclusion are common, recent work has suggested the probabilistic estimation of tie likelihoods as an alternative strategy for resolving inconsistencies between tie reports (Butts, 2003). One example of this strategy requires computing a score of each actor’s “credibility,” then applying this credibility as a weight to each respondent’s respective report of a tie (An & Schramski, 2015). This allows for model-based estimation of multiple possible graph representations for a single network, derived from a single set of survey responses.
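
The union and intersection rules, and the general idea of weighting reports, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `reconcile` and `weighted_tie_scores` and the example data are hypothetical, ties are treated as undirected, and the weighting rule (the credibility-weighted share of a dyad’s members who affirm the tie) is a deliberately simple stand-in, not the estimator of An and Schramski (2015) or Butts (2003).

```python
def _reported_pairs(reports):
    """Directed (reporter, named partner) pairs, ignoring self-nominations."""
    return {(ego, alter)
            for ego, alters in reports.items()
            for alter in alters
            if ego != alter}


def reconcile(reports, rule="union"):
    """Return the set of undirected ties kept under the chosen rule."""
    pairs = _reported_pairs(reports)
    if rule == "union":
        return {frozenset(pair) for pair in pairs}
    if rule == "intersection":
        return {frozenset((e, a)) for (e, a) in pairs if (a, e) in pairs}
    raise ValueError("rule must be 'union' or 'intersection'")


def weighted_tie_scores(reports, credibility):
    """Score each reported dyad by the credibility-weighted share affirming it.

    Assumption: `credibility` holds a positive weight for every dyad member.
    """
    pairs = _reported_pairs(reports)
    scores = {}
    for dyad in reconcile(reports, "union"):
        i, j = sorted(dyad)
        affirmed = (credibility[i] * ((i, j) in pairs)
                    + credibility[j] * ((j, i) in pairs))
        scores[dyad] = affirmed / (credibility[i] + credibility[j])
    return scores


# Hypothetical reports and credibility weights.
example_reports = {"a": {"b", "c"}, "b": {"a"}, "c": set()}
example_credibility = {"a": 0.9, "b": 0.6, "c": 0.5}

union_ties = reconcile(example_reports, "union")
intersection_ties = reconcile(example_reports, "intersection")
tie_scores = weighted_tie_scores(example_reports, example_credibility)
```

In this example the a-c tie survives under the union rule, is dropped under the intersection rule, and receives an intermediate score under the weighting rule, which illustrates how the choice of reconciliation strategy changes the network that is ultimately analyzed.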


Cognitive Social Structures

David Krackhardt (1987) introduced a different conceptualization for capturing social network data, known as “cognitive social structures” (CSS), wherein instead of asking people to report only on the ties they are personally involved in, researchers ask all members of a group to estimate the patterns of relationships among the entire population. For example, in Krackhardt’s (1987) initial introduction of the idea, he shows how CSS can provide an improved prediction of individuals’ performance in a small management firm. While CSS is generally thought to provide a different view of networks’ capabilities in a population, it has also been used as a means to fill in missing data or to adjudicate between conflicting reports. Neal (2008) shows how aggregating across the complete set of “perceptual networks” provided by CSS can yield a single aggregate consensus structure in a classroom setting. This consensus structure provides valued information on a tie’s presence and salience. She shows how this consensus structure overcame some of the limitations of relying solely on the self-reported network by more completely capturing the classroom’s clustering into separate groups. In part, this improvement of fit for describing the global pattern arose from smoothing over some of the potential information loss derived from unreciprocated tie reports, particularly by reducing the likelihood of “false negative” reports (Neal, 2008).
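One simple way to build such a consensus structure is sketched below in Python. The perceptual networks are hypothetical, and the majority threshold is an assumed aggregation rule chosen for illustration rather than the specific procedure reported by Neal (2008).

```python
from collections import Counter

# Hypothetical perceptual networks: each perceiver lists the dyads
# they believe are tied, including dyads they are not part of.
perceptions = {
    "ann": {("ann", "bob"), ("cat", "dan")},
    "bob": {("ann", "bob"), ("cat", "dan"), ("bob", "cat")},
    "cat": {("cat", "dan")},
    "dan": {("ann", "bob"), ("cat", "dan")},
}

def consensus_weights(perceptions):
    """Share of perceivers who report each undirected dyad as tied."""
    counts = Counter()
    for ties in perceptions.values():
        # Deduplicate so one perceiver counts at most once per dyad.
        for dyad in {frozenset(tie) for tie in ties}:
            counts[dyad] += 1
    n_perceivers = len(perceptions)
    return {dyad: c / n_perceivers for dyad, c in counts.items()}

def consensus_ties(weights, threshold=0.5):
    """Keep dyads reported by more than `threshold` of perceivers."""
    return {dyad for dyad, w in weights.items() if w > threshold}

weights = consensus_weights(perceptions)
ties = consensus_ties(weights)
```

The weights carry the valued information (how widely a tie is perceived), while the thresholded set gives a single binary network; a tie that one partner failed to report can still enter the consensus if enough observers perceive it.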

Unique Ethical Considerations of Network Data

The same general principles used in the social sciences for the protection of research participants also apply when gathering network data. However, following the intent of these principles, and not just the prescriptions developed to uphold them in practice, requires some additional considerations for network data beyond standard protocols. A special issue of Social Networks (Breiger, 2005) was devoted to identifying and making recommendations for handling these unique considerations. Although this issue is now more than a decade old, and some of the pragmatic solutions to the issues raised in it have since changed, it still provides useful elaboration of the main set of unique ethical factors that network studies should consider. Here, we begin with those considerations that stem directly from the primary ethical protections of the Belmont Report (National Commission for the Protection of Human Subjects, 1979): minimized risk and informed consent.

Ethics in Data Collection

Charged with minimizing the risks to their research participants, social scientists are frequently primarily concerned with providing respondents with assurances of either anonymity or confidentiality. Anonymity entails not gathering information from research participants that could potentially be personally identifying. In practice, for all but the simplest ego network designs, anonymity is often not possible for the analytic aims of social network research; to analyze the connections among members of a population, we need to know which members of the population the data represent (Borgatti & Molina, 2003). Alternatively, confidentiality ensures that while researchers will know (or have the capacity to know) research participants’ identity, they will report all data and analytic findings in a way that does not convey any personally identifying information. Confidentiality is more viable for network studies, but either strategy is simpler in principle than in practice. A burgeoning literature has identified the potential problems of deductive disclosure. This work shows that with enough information, individuals’ identities can be deduced from deidentified data, even in some cases where researchers began with truly anonymized data. Computer scientists have devoted considerable attention to these questions in the domain of “big data.” For example, Narayanan and Shmatikov (2009) gathered an anonymous graph of several thousand Twitter users, which they then aimed to (re)identify using auxiliary information from a complementary dataset of Flickr users. Using only features of the network from the Flickr dataset, they were able to successfully reidentify 30.8% of sampled Twitter user pairs. The vast majority of the 12.1% rate of misidentified false positives from their automated methods matched to a person in the same geographic location and/or were only one step removed from the true match, which could mostly be manually corrected.14 This example highlights how even strict data anonymization protocols do not sufficiently protect participants’ identities, and oftentimes network-based information (which may be readily and publicly available) can facilitate this identification.
In other words, while deductive disclosure is increasingly a concern for all human subjects research, network data may be particularly susceptible to the potential limitations of standard deidentification procedures.

The Belmont Report has also been used to ensure that human subjects’ research participation is voluntary and includes informed consent, requiring that participants know the potential risks involved before agreeing to be involved. This dimension of human subjects’ protections has occasionally proven particularly perplexing for institutional review boards (IRBs) when evaluating social network research. Given that network designs often request participants to report on the identity, characteristics, and even relationships of their partners, IRBs have debated whether these alters should be classified as “secondary subjects” (Marsden, 2011), and therefore whether their informed consent is required to gather and/or retain such data (Morris, 2004; Sönmez et al., 2016). Researchers have devised a number of strategies to address the privacy concerns for such secondary subjects. A first strategy is to assess the actual risks of their inclusion. If it can be established that the inclusion of these alters’ information would provide no more potential risks than they encounter in their daily lives, some IRBs have concluded that it is not necessary to obtain their consent (Klovdahl, 2005). Another approach leverages the fact that many network studies attempt to collect and analyze data on entire populations, not just a sample from them. In this case, recruitment strategies can themselves directly address the consent for everyone in the population. For example, Add Health employed an active consent procedure whereby every student within the recruited schools was asked for consent to be included in the study (Bearman et al., 1997).
The research team then generated the rosters from which respondents nominated friends and romantic partners to include only those who gave their consent to be included.15 Some researchers have successfully argued that active consent is an undue burden on researchers when studies involve minimal risk, leading IRBs to allow passive consent, that is, notifying the entire population of the aims and potential risks of the study, then excluding from data collection only those who opt out (Lorant et al., 2015).16

Network-based sampling approaches, like those that compose the partial network designs described earlier, have devised strategies that allow for the anonymized matching of nominated partners. For example, respondent-driven sampling (RDS) uses a staged procedure that (1) asks respondents to report only on characteristics of their alters (rather than identifying them), (2) then provides respondents with uniquely identified tokens that respondents give to those described partners, (3) who in turn opt in to participating in the study by returning the token to the study team (Gile & Handcock, 2010; Wejnert & Heckathorn, 2008). These RDS recruitment chains can then be linked via the tokens’ unique identifiers, and after the linkage is made, the identifying information can be removed prior to data storage and analysis (Sönmez et al., 2016).
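The token-based linkage in the staged RDS procedure can be illustrated with a minimal Python sketch. The class, method names, study-ID format, and token strings are all hypothetical; a real study would generate unguessable tokens and follow its own approved data-handling protocol.

```python
class RecruitmentLedger:
    """Links recruiters to recruits through one-time tokens, keeping only
    study-assigned IDs once a token has been redeemed."""

    def __init__(self):
        self._next_id = 1
        self._token_owner = {}  # token -> recruiter's study ID
        self.links = []         # (recruiter ID, recruit ID) pairs

    def _new_id(self):
        study_id = f"P{self._next_id:03d}"
        self._next_id += 1
        return study_id

    def enroll_seed(self):
        """Enroll an initial respondent who was not recruited by token."""
        return self._new_id()

    def issue_tokens(self, recruiter_id, tokens):
        """Record which respondent was handed each token."""
        for token in tokens:
            self._token_owner[token] = recruiter_id

    def redeem(self, token):
        """A recruit opts in by returning a token; the token is then
        discarded so that only the ID-to-ID link is retained."""
        recruiter_id = self._token_owner.pop(token)
        recruit_id = self._new_id()
        self.links.append((recruiter_id, recruit_id))
        return recruit_id

ledger = RecruitmentLedger()
seed = ledger.enroll_seed()
ledger.issue_tokens(seed, ["tok-a", "tok-b"])
recruit = ledger.redeem("tok-a")
ledger.issue_tokens(recruit, ["tok-c"])
ledger.redeem("tok-c")
```

After these calls, `ledger.links` holds the recruitment chain as pairs of study IDs, and the redeemed tokens are no longer stored anywhere in the ledger.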

Ethics in Data Analysis and Presentation of Results

Sharing research results with the population under study has become an increasingly common practice. It can simultaneously allow the population to inform the interpretation of study results and facilitate their ability to benefit from its findings (Wallerstein & Duran, 2006). Kadushin (2005) adapts these ideas for social network data. He notes the challenges of potential participant identifiability (especially in smaller networks) and therefore recommends that reports of findings be generalized in a way that removes identifiable features. This deidentification may necessitate going beyond simply removing individuals’ characteristics. For example, presenting a network visualization could in some cases reveal individuals as uniquely identifiable from their position, which could be extrapolated to identify other actors to whom they are (directly or indirectly) connected. As with any research, it is also important to evaluate how findings could be used. Network studies present a number of unique usage considerations. For example, in surveillance of infectious diseases, the ability to track potential transmission through a population may require modest risks stemming from individuals’ partner identification, but these may be outweighed by the potential population-level benefits of slowing its spread (Klovdahl, 2005). Management and business consulting has increasingly incorporated network analysis into evaluating and optimizing firm performance. These types of analyses have been used to determine vital outcomes for employees in those organizations (e.g., promotion, salary, and termination decisions). In these cases, while traditional informed consent procedures can make participants aware of potential risks, assurances of confidentiality, and how voluntary their participation is (Borgatti & Molina, 2003), they may not sufficiently protect participants from detrimental outcomes (or from biases in beneficial ones).
As a result of these potential real-life harms from organizational network analyses, researchers have crafted careful agreements with these partner organizations, which control who will see the data, how it could be shared, and what ramifications may arise if someone else sees a respondent’s information (Borgatti & Molina, 2005). Some organization-based researchers even recommend explicitly executing contractual agreements with the studied organization’s leadership to ensure that the researcher maintains control over all data ownership and presentation, so that individual participants are protected from potential harms (Borgatti & Molina, 2005; Kadushin, 2012). In sum, appropriately protecting human subjects in social networks research designs frequently requires the researchers to take additional steps beyond those common to social science, even to achieve the same goals.

Summary

This chapter has provided an overview of social network data collection strategies, focusing particularly on study design, data assessment, and ethical considerations. As with most of the topics in this handbook, these and related questions could warrant a standalone course, whereas they are often covered in a single module in larger courses on social network analysis. Gathering social network data requires a different set of orienting principles than those that govern general strategies for data collection in the social sciences. For sampling, this requires considering relationships in addition to people. For measurement, this means carefully weighing the variety of theoretical and pragmatic tradeoffs that arise from the number of name generators and name interpreters to be employed. Moreover, as other opportunities for gathering network data become increasingly common (e.g., from passive collection or organic data sources), these same considerations should be used to identify what datasets include. Simply put, “big data” are not immune from the need to carefully factor in how these same choices shape what data represent. Network data, and the design strategies used to gather them, uniquely provide a variety of possibilities for descriptively assessing the quality of data, and in turn how that quality alters the capacity of the analyses in which network data are employed. Simply applying the human subjects’ protections from general social science protocols is not sufficient in network studies to maintain the aims of minimized risk and informed consent that motivate research protections. In particular, this must be carefully evaluated as research fields continue to embrace ideals of open science and data sharing, aims shared historically within the field of social network analysis.

Notes

1. This is not to suggest that other scholars were not also gathering their own data. Many were. It simply points out how few such scholars there were. And as with any burgeoning field, each employed his or her own newly developing strategies for gathering that data.
2. This is so frequently the case that the NetSci annual meetings have adopted the norm of awarding a karate trophy to the first presentation each year that makes use of these data.
3. In the paper, they also present a fourth type of “relationship” often modeled as a social network: similarities (e.g., being the same gender, being from proximate locations, or having similar attitudes). We leave these out of the discussion here, because those are better conceptualized as similarities that are modeled as social relationships, not actually representing social relationships. That is, these can be captured as individual attributes with the represented relationships and then estimated analytically (and are outside the scope of this chapter). Moreover, these types of ideas require a different theoretical framework and subsequent analytic strategies than those that apply to the types of relationships addressed here.
4. Networks can also represent the relationships among entities other than people. Here, we discuss ties connecting people but encourage readers to recognize that this is merely a shorthand to facilitate ease of writing.
5. While social relationships and interactions can involve groups, to gather and represent these collective experiences, we record the collection of dyadic relationships/interactions occurring within that group.
6. Data proxies are often the coin of the realm across the social sciences, so this alone is not unique to social network data. However, the ramifications of such proxies can often be more pronounced in social network data (see, e.g., the discussion of tie reliability and validity later).
7. Adaptations of such link-tracing designs provide the backbone of study designs used to identify and study hard-to-reach populations for whom no appropriate sampling frame exists, for example, respondent-driven sampling (Heckathorn, 1997; Salganik & Heckathorn, 2004). However, while RDS studies sample over networks, they rarely actually include sufficient network data to analyze network structure within those populations (for an exception, see Khan et al., 2015).
8. As is frequently highlighted in social network analysis literature, compared to those that are present, ties that are absent (but possible) are frequently as important (if not more so) for understanding the network structure and its implications within a population.
9. While the prompt focused on in-school partnerships, students had the option to name partners outside the school.
10. Specifically, its prompt asks respondents to name “the people with whom you discussed matters important to you” (Burt, 1984).
11. This was the explicit protocol for the important matters question in the General Social Survey. If respondents named fewer than five alters, interviewers were instructed to probe respondents for any additional alters (Burt, 1984).
12. That is, the fact that two people’s reports differ on whether they shared a particular relationship does not necessarily indicate that either report is inaccurate. In studies where partners are asked to report only on partnerships that occurred within a particular timeframe (Colorado Springs respondents were asked about partnerships occurring in the previous six months; Potterat et al., 2004), if the interview and relationship windows are misaligned, seeming disagreements can both be accurate. For example, suppose I report today about a partnership I had six months ago that only lasted a few weeks. If that partner is interviewed a month from today, he or she could accurately leave me out of the partnerships he or she reports within the six months prior to the interview.
13. This approach is also consistent with the assumptions underlying analyses of ego network data: that individuals can accurately report on their own behaviors, without the possibility for assessing corroboration by their partners.
14. The remaining 57% were not matched to any Twitter users from their sample. The methods used do not allow them to assess whether these were non-Twitter users, Twitter users outside their sample, or Twitter users inside the sample that were simply not matched.
15. In fact, Add Health’s consent procedure was a two-stage active consent, in that both the students themselves and their parents had to agree to the students’ inclusion in the study.
16. However, passive (or even assumed) consent has been criticized as potentially inconsistent with standard interpretations of the “Common Rule.” For example, see the discussion (e.g., Fiske & Hauser, 2014; Kahn, Vayena, & Mastroianni, 2014) regarding a massive Facebook experiment on emotional contagion (Kramer, Guillory, & Hancock, 2014).

References

Adams, J., & Moody, J. (2007). To tell the truth? Measuring concordance in multiply-reported network data. Social Networks, 29, 44–58.
An, W., & Schramski, S. (2015). Analysis of contested reports in exchange networks based on actors’ credibility. Social Networks, 40, 25–33.
Bearman, P. S., Jones, J., & Udry, J. R. (1997). The national longitudinal study of adolescent health: Research design. Chapel Hill, NC: University of North Carolina.
Bearman, P. S., Moody, J., Stovel, K., & Thalji, L. (2004). Social and sexual networks: The National Longitudinal Study of Adolescent Health. In M. Morris (Ed.), Network epidemiology: A handbook for survey design and data collection (pp. 201–220). London, UK: Oxford University Press.
Bearman, P., & Parigi, P. (2004). Cloning headless frogs and other important matters: Conversation topics and network structure. Social Forces, 83(2), 535–557.
Berkowitz, S. D. (1982). An introduction to structural analysis: The network approach to social research. Toronto: Butterworths.
Bernard, H. R., Killworth, P. D., & Sailer, L. (1979). Informant accuracy in social network data IV: A comparison of clique-level structure in behavioral and cognitive network data. Social Networks, 2(3), 191–218.
Bevc, C. A., Retrum, J. H., & Varda, D. M. (2015). Patterns in PARTNERing across public health collaboratives. International Journal of Environmental Research and Public Health, 12(10), 12412–12425.
Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323, 892–895.
Borgatti, S. P., & Molina, J. L. (2003). Ethical and strategic issues in organizational social network analysis. Journal of Applied Behavioral Science, 29(3), 337–349.
Borgatti, S. P., & Molina, J-L. (2005). Toward ethical guidelines for network research in organizations. Social Networks, 27(2), 107–117.
Brashears, M. E. (2014). “Trivial” topics and rich ties: The relationship between discussion topic, alter role, and resource availability using the “important matters” name generator. Sociological Science, 1, 493–511.
Breiger, R. L. (Ed.). (2005). Ethical dilemmas in social network research. Special Issue of Social Networks, 27(2), 89–168.
Brewer, D. D. (2000). Forgetting in the recall-based elicitation of person and social networks. Social Networks, 22, 29–43.
Brewer, D. D., John, J. P., Muth, S. Q., Malone, P. Z., Montoya, P., Green, D. L., . . . Cox, P. A. (2005). Randomized trial of supplementary interviewing techniques to enhance recall of sexual partners in contact interviews. Sexually Transmitted Diseases, 32(3), 189–193.
Brewer, D. D., & Webster, C. M. (1999). Forgetting of friends and its effects on measuring friendship networks. Social Networks, 21, 361–373.
Burt, R. S. (1984). Network items and the General Social Survey. Social Networks, 6, 293–339.
Burt, R. S. (1987). Social contagion and innovation: Cohesion versus structural equivalence. American Journal of Sociology, 92, 1287–1335.

Burt, R., Meltzer, D. O., Seid, M., Borgert, A., Chung, J., Colletti, R. B., . . . Margolis, P. (2012). What’s in a name generator? Choosing the right name generators for social network surveys in healthcare quality and safety research? BMJ Quality and Safety, 21, 992–1000.
Butts, C. T. (2003). Network inference, error, and informant (in)accuracy: A Bayesian approach. Social Networks, 25, 103–140.
Coleman, J. S., Katz, E., & Menzel, H. (1957). The diffusion of an innovation among physicians. Sociometry, 20(4), 253–270.
Cornwell, B., Schumm, L. P., Laumann, E. O., Kim, J., & Kim, Y-J. (2014). Assessment of social network change in a national longitudinal survey. Journals of Gerontology, Series B: Psychological Sciences and Social Sciences, 69(8), S75–82.
Dominguez, S., & Hollstein, B. (2014). Mixed methods social networks research: Design and applications. New York, NY: Cambridge University Press.
Eagle, N., Pentland, A. S., & Lazer, D. (2009). Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences, 106(36), 15274–15278.
Fiske, S. T., & Hauser, R. M. (2014). Protecting human research participants in the age of big data. Proceedings of the National Academy of Sciences, 111(38), 13675–13676.
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486, 75–174.
Gile, K. J., & Handcock, M. S. (2010). Respondent driven sampling: An assessment of current methodology. Sociological Methodology, 40, 285–327.
Grippa, F., & Gloor, P. A. (2009). You are who remembers you: Detecting leadership through accuracy of recall. Social Networks, 31(4), 255–261.
Heckathorn, D. D. (1997). Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems, 44(2), 174–199.
Helleringer, S., Hans-Peter, K., Kalilani-Phiri, L., Mkandawire, J., & Benjamin, A. (2011). The reliability of sexual partnership histories: Implications for the measurement of partnership concurrency during surveys. AIDS, 25, 503–511.
Kadushin, C. (2005). Who benefits from network analysis: Ethics of social network research. Social Networks, 27, 139–153.
Kadushin, C. (2012). Understanding social network analysis: Theories, concepts & findings. Oxford University Press.
Kahn, J. P., Vayena, E., & Mastroianni, A. C. (2014). Opinion: Learning as we go: Lessons from the publication of Facebook’s social-computing research. Proceedings of the National Academy of Sciences, 111(38), 13677–13679.
Kandel, D. B. (1978). Homophily, selection, and socialization in adolescent friendships. American Journal of Sociology, 84, 427–436.
Khan, B., Dombrowski, K., Curtis, R., & Wendel, T. (2015). Estimating vertex measures in social networks by sampling completions of RDS trees. Social Networking, 4(1), 1–16.
Kilduff, M., & Oh, H. (2006). Deconstructing diffusion: An ethnostatistical examination of medical innovation network data reanalyses. Organizational Research Methods, 9, 432–455.
Killworth, P. D., & Bernard, H. R. (1976). Informant accuracy in social network data. Human Organizations, 35, 269–286.
Klofstad, C. A., McClurg, S. D., & Rolfe, M. (2009). Measurement of political discussion networks: A comparison of two “name generator” procedures. American Association for Public Opinion Research, 73(3), 462–483.
Klovdahl, A. S. (1985). Social networks and the spread of infectious diseases: The AIDS example. Social Science & Medicine, 21, 1203–1216.
Klovdahl, A. S. (2005). Social network research and human subjects protection: Towards more effective infectious disease control. Social Networks, 27, 119–137.

Krackhardt, D. (1987). Cognitive social structures. Social Networks, 9, 109–134.
Kramer, A. D. I., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24), 8788–8790.
Laumann, E. O., Marsden, P. V., & Prensky, D. (1994). The boundary specification problem in network analysis. In L. C. Freeman, D. R. White, & A. K. Romney (Eds.), Research methods in social network analysis. New York, NY: Transaction Publishers.
Lorant, V., Soto, V. E., Alves, J., Federico, B., Kinnunen, J., Kuipers, M., . . . Kunst, A. (2015). Smoking in school-aged adolescents: Design of a social network survey in six European countries. BMC Research Notes, 8(1), 91.
Marin, A., & Hampton, K. N. (2006). Simplifying the personal network name generator: Alternatives to traditional multiple and single name generators. Field Methods, 19(2), 163–193.
Marsden, P. V. (2011). Survey methods for network data. In J. Scott & P. J. Carrington (Eds.), The Sage handbook of social network analysis. Thousand Oaks, CA: Sage.
Marsden, P. V., & Podolny, J. (1990). Dynamic analysis of network diffusion processes. In J. Weesie & H. Flap (Eds.), Social networks through time (pp. 197–214). Utrecht, Netherlands: ISOR.
McCarty, C., Killworth, P. D., & Rennell, J. (2007). Impact of methods for reducing respondent burden on personal network structural measures. Social Networks, 29, 300–315.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.
Merluzzi, J., & Burt, R. S. (2013). How many names are enough? Identifying network effects with the least set of listed contacts. Social Networks, 35(3), 331–337.
Morris, M. (Ed.). (2004). Network epidemiology: A handbook for survey design and data collection. Oxford, UK: Oxford University Press.
Narayanan, A., & Shmatikov, V. (2009). De-anonymizing social networks. IEEE Symposium on Security and Privacy, 30, 173–187.
National Commission for the Protection of Human Subjects. (1979). The Belmont Report: Ethical principles and guidelines for the protection of human subjects of research. US Government Printing Office.
Neal, J. W. (2008). “Kracking” the missing data problem: Applying Krackhardt’s cognitive social structures to school-based social networks. Sociology of Education, 81(2), 140–162.
Paik, A., & Sanchagrin, K. (2013). Social isolation in America: An artifact. American Sociological Review, 78, 339–360.
Phillips, G., II, Janulis, P., Mustanski, B., & Birkett, M. (2017). Validation of tie corroboration and reported alter characteristics among a sample of young men who have sex with men. Social Networks, 48, 250–255.
Potter, G. E., Smieszek, T., & Sailer, K. (2015). Modelling workplace contact networks: The effects of organizational structure, architecture, and reporting errors on epidemic predictions. Network Science, 3(3), 298–325.
Potterat, J. J., Woodhouse, D. E., Muth, S. Q., Rothenberg, R. B., Darrow, W. W., Klovdahl, A. S., & Muth, J. B. (2004). Network dynamism: History and lessons of the Colorado Springs study. In M. Morris (Ed.), Network epidemiology: A handbook for survey design and data collection. New York, NY: Oxford University Press.
Salathe, M., Kazandjieva, M., Lee, J. W., Levis, P., Feldman, M. W., & Jones, J. H. (2010). A high-resolution human contact network for infectious disease transmission. Proceedings of the National Academy of Sciences, 107(51), 22020–22025.
Salganik, M. J., & Heckathorn, D. D. (2004). Sampling and estimation in hidden populations using respondent driven sampling. Sociological Methodology, 34, 193–240.

Smith, J., Moody, J., & Smith-Lovin, L. (2017). Network sampling coverage II: The effect of non-random missing data on network measurement. Social Networks, 48(1), 78–99.
Sönmez, S., Apostolopoulos, Y., Tanner, A. E., Massengale, K., & Brown, M. (2016). Ethnoepidemiological research challenges: Networks of long-haul truckers in the inner city. Ethnography, 17(1), 111–134.
Strang, D., & Tuma, N. B. (1993). Spatial and temporal heterogeneity in diffusion. American Journal of Sociology, 99, 614–639.
Van den Bulte, C., & Lilien, G. L. (2001). Medical innovation revisited: Social contagion versus marketing effort. American Journal of Sociology, 106, 1409–1435.
Wald, A. (2014). Triangulation and validity of network data. In S. Dominguez & B. Hollstein (Eds.), Mixed methods networks research: Design and applications (pp. 65–89). New York, NY: Cambridge University Press.
Wallerstein, N. B., & Duran, B. (2006). Using community-based participatory research to address health disparities. Health Promotion Practice, 7(3), 312–323.
Wejnert, C., & Heckathorn, D. D. (2008). Web-based network sampling: Efficiency and efficacy of respondent-driven sampling for online research. Sociological Methods & Research, 37(1), 105–134.
Young, A. M., Rudolph, A. E., Su, A. E., King, L., Jent, S., & Havens, J. (2016). Accuracy of name and age data provided about network members in a social network study of people who use drugs: Implications for constructing sociometric networks. Annals of Epidemiology, 26(11), 802–809.
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4), 452–473.

Chapter 8

Social Network Experiments

Matthew E. Brashears and Eric Gladstone

What is the most effective way of determining causal processes involving social networks? By definition, social networks are collections of social entities (often individuals) who are linked by one or more relationships, including informal interpersonal bonds (e.g., friendship), formal economic transactions (e.g., employment), obligatory connections (e.g., family), and simple copresence or comembership, to name a few. Such assemblages of individuals or other social actors are often quite large and so are typically studied essentially “in the wild,” imposing significant difficulties in data collection. These challenges include identifying the boundaries of the network (if conducting a sociometric analysis; see Laumann, Marsden, & Prensky, 1992), achieving an adequate sample size (e.g., Hipp et al., 2015; McCarty, Killworth, & Rennell, 2007; Merluzzi & Burt, 2013), selecting the best network instrument (e.g., Brashears & Quintane, 2018; Fischer, 1982), and minimizing priming (e.g., Brashears, 2011; Lee & Bearman, 2017), respondent fatigue (Fischer, 2009), interviewer fatigue (Paik & Sanchagrin, 2013), and other sources of bias. Once the data are collected, the inherent dependence between observations (or between dyads in ego network samples; see Brashears, 2014) forces researchers to use complex replacements for conventional multivariate models (e.g., exponential random graph models; see Lusher, Koskinen, & Robins, 2013). Thus, complex data are met with complex methods in hopes of producing straightforward answers. But there is an alternative. In this chapter we provide an overview of experimental designs and give several examples of how experimental methods can be employed in social network research. Experiments provide unsurpassed certainty in causal identification by allowing the researcher to control all aspects of the process while in many cases substantially reducing data collection costs. 
Moreover, with the aid of computers, the internet, and a little creativity, it is often possible to collect highly reliable data (see Kraut et al., 2004; Paolacci, Chandler, & Ipeirotis, 2010; Horton, Rand, & Zeckhauser, 2012) from online samples through crowdsourcing services, such as Amazon Mechanical Turk. In short, the time is right for network experiments.


What Is an Experiment?

Most simply put, an experiment is the deliberate creation of a situation for the purpose of measuring the causal linkages between one or more factors (see also Jackson & Cox, 2013). In all experiments, a treatment is given to subjects to determine how their responses change pre- and posttreatment (i.e., a within-subjects design), different groups of subjects are given different treatments to see how their responses vary from each other (i.e., a between-subjects design), or both. Experiments can be further broken down by whether they are true experiments, quasi-experiments, or natural experiments, proceeding in that order from maximum researcher control over conditions to minimum. In a true experiment, the researcher controls all aspects of the experimental setting and randomly assigns participants to conditions. This type of experiment is typically executed in a laboratory where the physical space is entirely under the researcher’s control, frequently including interactions between the subject and covert research assistants (i.e., “confederates”). Quasi-experiments involve the random assignment of subjects and administration of treatments in a setting not fully under researcher control (e.g., in a public area). Natural experiments are possible when natural events cause an experiment-like treatment to be applied to a population (e.g., enactment of new legislation or an earthquake), permitting pre- and posttreatment measures to be compared. Crucially, experiments of all types allow us to “rule in” causal mechanisms but never to “rule out” causal mechanisms. When a mechanism is confirmed experimentally, it is accurate to say that the mechanism is sufficient to produce the observed outcome but has not been shown to be necessary. That is, there may be other mechanisms that produce the same outcome. Experimental designs simplify causal identification by synthetically producing matched counterfactual conditions.
For example, in a true experiment, researchers design the conditions to differ only on the variable(s) of interest through the use of manipulations (i.e., deliberate changes to the subjects’ experiences). They then assign subjects to each condition at random, ensuring that the distribution of unobserved characteristics (e.g., sexual preference, personality traits, etc.) in each condition is equal, within the limits of probability theory. As a result, the only systematic differences between the groups are those produced by the manipulations, and in effect, each condition represents a particular type of counterfactual relative to the others. For example, if we test a drug by giving one randomly selected group the actual medication and another randomly selected group a placebo (e.g., a sugar pill), then the group receiving the placebo is a counterfactual for the group receiving the drug. In one synthetic universe, individuals receive a drug that (hopefully) works, while in the other synthetic universe, they receive a drug that definitely does not work. Differences between groups can therefore be positively attributed to differences in the manipulations as all other characteristics, including the attributes of the subjects, are identical. Causal identification becomes more challenging as we move from true experiments to quasi-experiments to natural experiments. As the experimental situation is less and less under the control of the researcher, it becomes more likely that outside factors that are unobserved and unmeasured can influence the outcome or moderate any causal relationship. For example, a sudden earthquake can provide the opportunity to examine changes in cortisol levels, but other factors that vary over time (e.g., political events, seasonal weather changes, etc.) are also likely to play a role, complicating efforts to specify a causal link.
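The logic of random assignment described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the subject pool, the unobserved "extraversion" trait, and the `randomize` helper are invented for illustration, and the only claim is that shuffling subjects into conditions tends to balance a trait the researcher never measures.

```python
import random
import statistics


def randomize(subjects, n_conditions=2, seed=0):
    """Shuffle a copy of the subject list and deal it into conditions."""
    rng = random.Random(seed)
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    # Dealing every n-th subject yields groups of (near-)equal size.
    return [shuffled[i::n_conditions] for i in range(n_conditions)]


# Hypothetical pool: each subject carries an unobserved trait that the
# researcher never measures (a made-up "extraversion" score).
pool_rng = random.Random(42)
subjects = [{"id": i, "extraversion": pool_rng.gauss(50, 10)} for i in range(2000)]

treatment, control = randomize(subjects)

mean_t = statistics.mean(s["extraversion"] for s in treatment)
mean_c = statistics.mean(s["extraversion"] for s in control)
print(f"treatment mean: {mean_t:.2f}, control mean: {mean_c:.2f}")
```

With groups of this size the two means should differ only by chance, which is the sense in which the conditions are equal "within the limits of probability theory."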

Figure 8.1 Ease of causal identification/generalizability by research type. [Continuum from true experiments (more identifiable, less generalizable) through quasi-experiments and natural experiments to naturalistic studies (less identifiable, more generalizable).]

However, this decrease in ease of causal identification brings with it an improvement in generalizability. True experiments are most difficult to connect to “real life” events because the causal connection is observed only in strictly controlled circumstances. Quasi-experiments and natural experiments take place in less controlled and artificial environments and are therefore more straightforward to relate to “real life” conditions. For example, it may be possible to induce racial prejudice in the laboratory, but this does not demonstrate that such prejudice impacts actual employment decisions. A quasi-experimental design in which job applications, differing only in the race of the applicant, are sent to real employers to measure the frequency of call-backs is more easily generalized but suffers from more opportunities for confounds. The precise type of design selected thus has critical implications for causal identification and generalizability. In comparison, naturalistic studies of social networks (e.g., collecting network data from a selection of managers; see Burt, 1992) maximize generalizability while minimizing the potential for causal identification. As such, traditional studies of social networks are predominantly found at one extreme of the causal identification/generalizability dimension (see Figure 8.1) and are frequently subject to protracted debates over causal identification (e.g., diffusion of obesity: Christakis & Fowler, 2007; Cohen-Cole & Fletcher, 2008a, 2008b; Fowler & Christakis, 2008; diffusion of antibiotic use: Burt, 1980; Coleman et al., 1966; Van den Bulte & Lilien, 2001). Even when not subject to such debates, naturalistic studies of social networks often suffer from the fact that only a single network is examined. As a result, while many nodes may be contained in the data, there is only one “observation” at the network level.
These studies essentially represent the worst of both worlds: too little researcher control to provide easy causal identification, and too few observations to provide generalizability.

Can Social Networks Be Studied with Experiments?

Social networks are assemblages of parts, often including dozens to millions of distinct individuals with a potentially very large number of relationships between them. Moreover, most network analysts are of the opinion that networks have effects that are not perfectly reducible to the individual or accounted for with higher-level structural influences. Why study a network if all of its behavior can be encapsulated in individuals, on the one hand, or large-scale social trends on the other? As a result, it may appear on the surface as though experimental designs are inappropriate for network research, despite their obvious advantages; a network cannot fit into a laboratory, and thus networks cannot be studied with experiments. Nonetheless, these objections can be overcome by doing one of two things: shrinking the process to fit into the laboratory or enlarging the laboratory to encompass the process. Morris Zelditch (1969) argues for shrinking the process to fit into the laboratory when he asks, “Can you really study an army in the laboratory?” His answer is a resounding “yes” and rests on the point that the goal of most research is not to study a phenomenon per se (e.g., a network of managers), but rather to identify and test the underlying causal mechanisms that enable and empower a given phenomenon (e.g., the effect of redundancy on information movement). Given that this is the case, an experimental design is a valid approach so long as the causal mechanism can be brought into the laboratory, even if the context in which it is thought to operate cannot. For example, if status homophily is thought to produce segregated social networks, then an experiment need only induce an arbitrary status order in naïve subjects (using any of a variety of previously validated methods; see, e.g., Simpson & Walker, 2002) and observe their resulting communications behavior in the laboratory. If the mechanism is observed to work in these artificial conditions, then it is logical to assume that it will also operate in larger, more natural groups (e.g., in natural friendship networks; see Brashears, 2008). Mechanisms are smaller than contexts and thus can be made to fit into many laboratory settings quite handily.
Zelditch’s points remain as germane now as they were when he wrote them, but 40 years later, new solutions to the same problem have become available. Not only can we shrink a process to fit into the laboratory, as Zelditch suggests, but also we can enlarge the laboratory to encompass the process. The combination of the internet and ubiquitous connected devices including smartphones and wearables (see Eagle, Pentland, & Lazer, 2009) allows researchers to construct large, but strictly controlled, networks on demand. Connected devices (including but not limited to personal computers, tablets, and cellular telephones) can be used both to administer experimental treatments and as interfaces for subject interaction. The types of interaction can be as constrained or free as the researcher desires, ranging from something as simple as conventional game theory designs where participants divide a pool of points that are tied to their compensation (e.g., the ultimatum game: Rand et al., 2013) to something as complex as full-motion video with audio monitoring. Moreover, as interaction occurs via connected device, researchers can mislead subjects into believing they are interacting with another human when they are in fact communicating with a computer program (i.e., a simulated interactant), allowing highly precise and controlled administration of interactional stimuli. Moreover, thanks to the internet, interaction can take place between individuals drawn from widely separated geographic locations. The laboratory can thus be enlarged enough to enclose the planet, if such is desirable. 
The final technological innovation that has revolutionized the potential for experiments in social network research is the rise of crowdsourcing platforms like Amazon Mechanical Turk.1 Mechanical Turk, or MTurk, was developed as an online labor market for connecting individuals with time on their hands to groups who are willing to pay money to complete tasks that only humans are capable of (e.g., tagging photos with their contents). MTurk workers peruse a listing of jobs maintained in an online marketplace before selecting the jobs that are most appealing, completing them, and receiving electronic payment. Both the employer and worker are able to rate each other, leveraging reputation effects to ensure good behavior, and workers can be selected on a variety of criteria, including their average ratings.2 Experimentalists have already begun using MTurk as a source of data (e.g., Kraut et al., 2004); workers can be compensated for completing experiments as they would any other type of job. This approach involves no sampling as usually defined and thus cannot be viewed as statistically representative. But as with laboratory studies, statistical representativeness is not usually a goal. Instead, differences in experimental conditions allow the identification of causal effects, and the more diverse pool of research subjects (i.e., not exclusively college students) reduces the risk that observed results are limited to a subset of the population. Comparisons with laboratory data (Buhrmester, Kwang, & Gosling, 2011; Gosling & Johnson, 2010) indicate that results derived from MTurk are more representative than, and at least as reliable as, traditional college student samples. Results from experiments conducted via MTurk are consistent with those collected in the lab, as well as those collected elsewhere on the internet such as survey websites, message boards, and online forums (Paolacci et al., 2010). Finally, research shows that MTurk samples compare favorably with those drawn using more traditional probability sampling methods (Weinberg, Freese, & McElhattan, 2014). Other options for research crowdsourcing continue to emerge, including Northeastern University’s Volunteer Science Initiative3 and the Zooniverse Project,4 both of which serve as online sources of participants and human help for scientific research.
Therefore, it has become possible to construct online networks and to recruit subjects cheaply and easily from a wider cross-section of society than ever before, somewhat escaping the tradeoff between causal identification and generalizability reflected in Figure 8.1. In summary, social networks can be studied in the laboratory by focusing on the critical causal mechanisms and capturing them in reduced form. Alternatively, they can be examined experimentally outside the laboratory by using computers to present highly controlled stimuli to subjects and to manage their interactions. Finally, crowdsourcing platforms enable the rapid and inexpensive recruitment of a geographically and demographically diverse (i.e., less WEIRD; see Henrich, Heine, & Norenzayan,  2010) set of respondents. Thus, we have ample reason to believe that social networks can, and indeed should, be studied with experiments.

Experimental Manipulations for the Study of Social Networks

Experiments of all types derive their power in part from the manipulations used. As the name suggests, manipulations are deliberate changes made by the researcher (or, in the case of natural experiments, the environment) to cast light on a hypothesized causal mechanism. It is effectively impossible to provide a list of the possible manipulations available to researchers as a considerable degree of creativity is frequently involved. However, four broad criteria should generally guide manipulation selection or design: theoretical consistency, fit with scope conditions, availability of validated designs, and manipulation restraint.

First, to provide valuable results, an experimental manipulation must be consistent with the theory it means to test or relies upon. Because experiments work by systematically changing some variable or set of variables thought to have a causal impact on some other variable or variables, they rely on a theoretical model of how those variables are connected. This is most obviously true for formal theory, where the model is specified in exacting (often mathematical; see, e.g., Willer, 1999 or Friedkin, 1998) detail, but it is also true for theories stated as textual rather than mathematical postulates. An appropriate manipulation is one that targets the mechanism specified by the theory and adjusts it in the required manner. However, it is important to keep in mind that even the simplest data-driven research is, fundamentally, rooted in at least an implicit theory. For example, if researchers want to understand the connections between network structure and emotional well-being, they will at a minimum select some aspect or aspects of the network to connect to emotional outcomes. The specific structural features selected reflect an underlying, likely unstated, theory about how networks shape emotional responses. Thus, virtually any serious network study will be guided by some type of theoretical position. Experiments only differ in the extent to which this is true; while a conventional network study might be purely exploratory, involving minimal theory, a network experiment must have identified a specific set of mechanisms, shaped by theory, that are to be reproduced and manipulated in controlled conditions. Second, all theories have boundaries that define the set of conditions under which they apply. These boundaries often indicate the types of entities the theory applies to (e.g., humans) as well as the types of situations it is relevant for (e.g., task groups).
Such boundaries are known more formally as “scope conditions” or “boundary conditions,” and an experiment that fits within a theory’s scope conditions can serve as a test of that theory, while an experiment that exceeds them cannot. For example, theories of network diffusion cascades generally assume that an item can spread from person to person like a biological contagion. As such, an experiment tracking the movement of an item that cannot be duplicated (e.g., money; see Schaefer,  2007) would violate the scope conditions for network diffusion cascades and not provide an effective test. An unstated but often necessary scope condition for experiments in general is that participants are engaged with and paying attention to the experiment itself. As a result, many experiments use what are called attention checks or manipulation checks (Keppel, 1991) to confirm that the subjects are engaged and that the manipulation is working (though not necessarily that the manipulation is having the anticipated effect). Scope conditions can be relaxed by subsequent theoretical development and empirical research, but at any given moment, a manipulation based in, or intended to test, a given theory must be consistent with its scope conditions. Third, while substantial creativity is often involved in the development and deployment of experimental manipulations, it is generally desirable to use manipulations already developed by other researchers, when possible. As with survey items, the behavior of previously used manipulations is more likely to be known and thus the results that they produce can be more easily understood. Likewise, potential pitfalls in the design (e.g., demand characteristics) are more likely to be known, and thus possible to adjust for. 
To date (as we discuss later), experiments have presented networks to participants in textual form (e.g., vignettes), have shown them sociograms, and have depicted interactions between computer-generated avatars, among other approaches. These methods each have their own advantages, but there is little consensus on which is best for a given purpose or what the potential biases may be. Thus, considerable work remains to be done. Finally, manipulations must be applied with a degree of restraint. This doesn’t mean that manipulations must be designed to be small in effect, but rather that they should be applied cautiously to maximize causal identification. For example, an experiment with two manipulations with two levels each (e.g., high vs. low power priming and dense vs. sparse network) can be performed with a simple two-by-two crossing. This results in four conditions, each reflecting a combination of manipulations (e.g., high/dense, high/sparse, low/dense, low/sparse). Critically, however, the experimental conditions form pairs, between which only one factor has changed (e.g., high/dense vs. low/dense), allowing the impact of that specific manipulation to be measured under otherwise constant conditions (see Figure 8.2). If manipulations are applied such that these pairs don’t form and conditions always differ in more than one way at a time, it becomes impossible to identify causal relationships with certainty. The goal is thus not simply to apply manipulations, but to do so to make it straightforward to identify and measure causal links if they indeed exist.

Figure 8.2 Example of a two-by-two factorial design.

                    Power Conditions
Network Conditions  High Power                    Low Power
Sparse Network      High Power x Sparse Network   Low Power x Sparse Network
Dense Network       High Power x Dense Network    Low Power x Dense Network
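The two-by-two crossing described above can be sketched in a few lines. This is a minimal illustration rather than a design tool: the factor names and levels follow the power/network example in the text, while the helper `differing_factors` and the variable names are hypothetical. The sketch enumerates the four conditions and lists the pairs that differ on exactly one factor, which are the comparisons that isolate a single manipulation.

```python
from itertools import combinations, product

# Factors and levels taken from the example in the text.
factors = {
    "power": ["high", "low"],
    "network": ["dense", "sparse"],
}

names = list(factors)
# Full crossing: every combination of one level per factor.
conditions = [dict(zip(names, levels)) for levels in product(*factors.values())]


def differing_factors(a, b):
    """Return the factors on which two conditions take different levels."""
    return [f for f in names if a[f] != b[f]]


# Keep only the pairs of conditions that differ on exactly one factor.
clean_pairs = [
    (a, b, differing_factors(a, b)[0])
    for a, b in combinations(conditions, 2)
    if len(differing_factors(a, b)) == 1
]

for a, b, factor in clean_pairs:
    print(f"{a['power']}/{a['network']} vs. {b['power']}/{b['network']}: isolates {factor}")
```

Of the six possible pairs of conditions, four differ on a single factor; the remaining two (high/dense vs. low/sparse and high/sparse vs. low/dense) change both factors at once and so cannot attribute a difference to either manipulation alone.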

Examples of Network Experiments

While experimental designs do not make up a majority of social networks research, there have been a number of examples, some quite recent, in the literature. Here we review several of these to provide some idea of the types of questions that can be addressed using experimental approaches. These are not chosen because they are necessarily the most important experiments that have been performed, but instead because they provide good examples of true experiments (e.g., Brashears & Gladstone, 2016; O’Connor & Gladstone, 2015) and quasi-experiments (e.g., Centola, 2011; Salganik, Dodds, & Watts, 2006). We do not provide examples of natural experiments for reasons of brevity and because these of necessity are relatively more unique than other types of experiments (but for an example see Godechot, 2016).


Homophily and the Spread of Health Behaviors

Damon Centola (2011) used a quasi-experiment to investigate how homophily (the tendency for individuals to associate with those like themselves or for birds of a feather to flock together) influences the likelihood of adopting a new health-related behavior. Centola argued that individuals are more likely to adopt a new behavior from others who are similar to themselves along a relevant dimension. For example, a healthier person is more likely to adopt a health-related behavior from another healthier person rather than from someone who is less healthy. An experimental design was particularly useful in this context because it isolates network structural topology from frequently associated confounds such as homophily, geographic proximity, and interpersonal affect (Blau & Schwartz, 1984; McPherson, Smith-Lovin, & Cook, 2001). Centola examined the spread of a health behavior through an internet-based social networking environment, created for the experiment, populated by 710 participants recruited from an online fitness website. Individual traits including age, gender, and body mass index (BMI) were used to produce two types of networks: networks generated by allocating individuals at random and networks generated to produce homogeneity on the individual-level traits. These conditions allowed Centola to compare diffusion of behavior in homophilous conditions to diffusion of behavior without homophily, and individuals were assigned at random to one or the other condition. Participants generated an anonymous social profile that displayed their gender, BMI, fitness level, diet, and favorite exercise. Subjects were then matched with other participants in the community (i.e., network ties were imposed). An online dashboard informed both parties about each other and their respective adoption of healthy behaviors. Participants ultimately made decisions regarding the adoption of an internet-based diet diary.
Importantly, only those in the experiment knew of or could access the diet diary, ensuring that the only way to learn about or adopt the behavior was to receive a dashboard notification from a community partner. The experiment began when the public dashboard of a healthy seed (a member of the experimental community with above-average exercise minutes and fitness and a below-average BMI) displayed the adoption of the diet diary. If a neighbor observed the notification via the online dashboard and adopted the behavior as well, then his or her neighbors were alerted in turn. After seven weeks of observation, the experiment concluded and Centola examined the rates of diet diary adoption across conditions. Centola found that the rate of behavioral adoption was significantly higher in the homophilous networks compared to the nonhomophilous networks. The results suggest that the most effective health interventions will be mediated by social influence from similar individuals, which has obvious public policy implications. This research benefited substantially from a quasi-experimental design. While all aspects of the subjects’ environment were not under researcher control (e.g., they were logging in from unknown physical locations), the conditions were identical with the exception of the experimental manipulation. In other words, the interfaces were the same, starting conditions were the same, and community features were identical, aside from the creation of homogeneous or heterogeneous networks. Causal identification was substantially improved by the design as it eliminated selection effects; all individuals in the experiment were identical in selecting into a health-related website and no unobserved variables influenced assignment to a condition. Finally, because the experiment was conducted online, it was relatively more generalizable than either a lab study or a naturalistic study of a single bounded context. That is, the online participant pool is more representative of the general population than is a behavioral laboratory containing, largely, undergraduates. Likewise, the results, generated from multiple artificially induced networks, are more generalizable than a naturalistic study of a single social network.

Networks and the Matthew Effect

Salganik et al. (2006) investigated how social influence impacts the likelihood that a song is viewed favorably. In particular, the authors were interested in whether or not perceptions of successful bands would become a self-fulfilling prophecy (Merton, 1948). In short, a self-fulfilling prophecy means that an initially false state of events evokes a new state of events that makes the originally false situation come true. They explored this question by artificially manipulating the initial popularity of various songs in an online music market populated by 14,341 participants. The authors recruited participants from among those who had been involved in prior studies using email solicitations. The emails contained a link to a custom-designed website, as well as some information regarding the purpose of the study. As was typical, the emails did not reveal the full goals of the study so as to not introduce a confound. The link led participants to a menu containing 48 songs, which they could listen to, rate (from 1 to 5), and download for free, if desired. Participants could only rate songs if they had listened to them first, and could only download them if they had rated them. These three behaviors formed the basis of the authors’ dependent variable. Participants were randomly assigned to one of two conditions: an independent condition and a social influence condition. In the independent condition, songs were randomly presented in a list of 48, and no information about prior downloads by others was provided. In the social influence condition, participants were randomly assigned to one of eight initially identical “worlds,” allowing eight replications of the same process from the same starting conditions. In the social influence condition, each song was accompanied by a number indicating how often songs were downloaded by prior participants. Salganik et al.
also generated an additional condition wherein they manipulated the popularity of a song. Songs that were previously the most popular were now displayed as least popular, and vice versa for the least popular songs. Ultimately, Salganik et al. found that many songs experienced self-fulfilling prophecies in which initially false perceptions of popularity became real over time. In this example, the authors used two manipulations: the presence or absence of social influence and the deliberate modification of popularity scores. Both allowed the authors to show that judgments of song quality depend significantly, though not entirely, on perceptions of the views of others. Further, the authors demonstrated that web-based experiments can yield powerful results in the study of macro-sociological processes that otherwise would be quite difficult to study in brick-and-mortar settings.
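The contrast between the independent condition and the eight social influence "worlds" can be illustrated with a toy simulation. This is not the authors' procedure or model: the choice rule (download probability proportional to a song's appeal times one plus its prior downloads), the appeal values, the number of simulated participants, and the function names are all assumptions made purely for illustration.

```python
import random


def gini(counts):
    """Gini coefficient of non-negative counts (0 means perfectly equal)."""
    ordered = sorted(counts)
    n, total = len(ordered), sum(ordered)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(ordered))
    return 2 * weighted / (n * total) - (n + 1) / n


def run_world(appeal, n_participants, social_influence, rng):
    """Simulate one world; each participant downloads exactly one song."""
    downloads = [0] * len(appeal)
    for _ in range(n_participants):
        if social_influence:
            # Assumed rule: visible download counts amplify a song's appeal.
            weights = [a * (1 + d) for a, d in zip(appeal, downloads)]
        else:
            # Independent condition: choices depend on appeal alone.
            weights = appeal
        song = rng.choices(range(len(appeal)), weights=weights, k=1)[0]
        downloads[song] += 1
    return downloads


rng = random.Random(1)
appeal = [rng.uniform(0.5, 1.5) for _ in range(48)]  # hypothetical song appeal
independent = run_world(appeal, 2000, False, rng)
worlds = [run_world(appeal, 2000, True, rng) for _ in range(8)]

print("independent Gini:", round(gini(independent), 3))
print("social influence Ginis:", [round(gini(w), 3) for w in worlds])
```

Under a rule of this kind, one would typically expect download counts to be more unequal when prior downloads are visible, and the most popular song to vary from world to world even though every world starts from the same songs. Running several worlds from identical starting conditions is what makes that variation observable.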


Error and Error Correction Process in Network Diffusion

Brashears and Gladstone (2016) used experimental methods to study (1) how rapidly errors accumulate during social network contagions, (2) whether some message formats are more likely to accumulate error when compared to others, and (3) what human efforts at correcting errors do to the contagion. The authors designed an experiment wherein participants read, remembered, and retransmitted a series of 10 sentences (13 to 16 words in length) drawn from popular press books, ensuring a standard and relatively undemanding reading level. Reading and retransmitting a series of messages acted as an analog for the more general process of receiving and sending other types of network contagions, all of which involve the movement of information. While the study was administered via computer, as with the prior examples, all participants completed the experiment in a physical laboratory, allowing the authors total control over environmental conditions. A given participant was presented with a sentence for five seconds and was then forced to watch a blank screen for five seconds, before being given unlimited time to enter a new sentence into the computer. The time limits were used to capture real-world time and cognitive constraints, without being excessively demanding. This process was repeated until the participant read and retransmitted all 10 sentences. These rekeyed sentences then acted as the new stimulus sentences for the next participant.5 Brashears and Gladstone introduced two manipulations: a format manipulation and an error correction manipulation. First, the authors hypothesized that the redundancy of the message format (see Shannon, 1950) would influence the accuracy of retransmitted messages. In information theory, redundancy, which varies inversely with entropy, refers to the extent to which a given signal can be identified based on a subset of the elements.
For example, at one extreme, an endless sequence of one character (e.g., “11111111 . . .”) is maximally redundant (minimal entropy) because the entire sequence is known once the first character has been identified. At the opposite extreme, an endless sequence of random numbers is minimally redundant (maximum entropy) because each character identified provides no improvement in the ability to identify the full sequence. The stimulus sentences were thus presented either in high-redundancy standard English (e.g., the format of this chapter) or in low-redundancy text messaging pidgin (e.g., “I see you” becomes “I c u”). Second, the authors argued that humans would attempt to correct errors that they detected, and that these corrections would impact message fidelity. In the No Correction condition, participants were instructed to read and retransmit the message verbatim. In the Correction condition, participants were asked to read and reproduce what they believed to be the intended meaning of the message (i.e., to paraphrase). These manipulations were crossed to produce a two (message format: standard English vs. texting pidgin) by two (attempts to correct messages: Correction vs. No Correction) design, yielding all possible combinations of the manipulations. Participants were randomly assigned to a condition, thus ensuring that individual variation did not affect the results. The results indicated that high-redundancy formats are more robust against error, that error correction efforts are generally effective, and that failed error corrections produce more widely varying descendant messages. Thus, language with more built-in fault tolerance can weather error storms to a greater degree. Further, and somewhat counterintuitively, the (failed) act of correcting errors gives rise to substantially more distinct forms of the original message than do the original errors. Thus, diversity emerges not in spite of attempts to conform, but because of them, producing “innovation from imitation.” The use of a true experimental design in this study allowed the precise identification of causal processes; differences between conditions could be attributed conclusively to message format, or the presence of error correction. Moreover, unlike previous similar research tracking error in online network contagions (e.g., Simmons, Adamic, & Adar, 2011), Brashears and Gladstone were able to monitor the network at all times, ensuring that all predecessor and descendant messages were identified and properly linked. Finally, this study shows how a mechanism (i.e., network contagion) can be reduced to fit into a laboratory even though the context in which it normally unfolds (i.e., an actual network) is not.
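The two extremes of redundancy mentioned above (a constant sequence versus a random one) can be checked numerically. The sketch below is a simplification and not the measure used in the study: it computes only the entropy of single-character frequencies, so it ignores the sequential dependencies that make natural language redundant, and the function names and the ten-digit alphabet are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def entropy_bits_per_char(text):
    """Empirical Shannon entropy of the character frequencies, in bits."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(text, alphabet_size):
    """Redundancy as 1 - H / H_max, where H_max = log2(alphabet_size)."""
    return 1 - entropy_bits_per_char(text) / math.log2(alphabet_size)


# One repeated character: every position is predictable from the first.
constant = "1" * 1000

# Pseudo-random digits: knowing some characters says little about the rest.
rng = random.Random(0)
random_digits = "".join(rng.choice("0123456789") for _ in range(1000))

print("constant sequence redundancy:", redundancy(constant, alphabet_size=10))
print("random digits redundancy:", redundancy(random_digits, alphabet_size=10))
```

The constant sequence yields a redundancy of 1 and the random digits a value close to 0, matching the two extremes in the text. Comparing standard English with texting pidgin would require a model that accounts for dependencies between characters and words, which this frequency-only sketch does not attempt.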

Network Recall and Social Exclusion

O’Connor and Gladstone (2015) used true experimental methods to explore the ways in which people perceive and recall novel network structures. The authors hypothesized that social exclusion would systematically and negatively impact an individual’s ability to decode the network structures embedded in unfamiliar social situations. More specifically, the authors argued that excluded participants would view new social groups as denser than (1) the groups actually were or (2) participants who were not excluded viewed them. The authors conducted three separate studies, only two of which we review here. In the first study reviewed here (Study 1), to determine whether exclusion systematically impacts perceptions of novel networks, they manipulated the description of the network itself (Figure 8.3). Using Amazon Mechanical Turk, O’Connor and Gladstone recruited roughly 140 participants from the United States, Canada, the United Kingdom, and Australia. After providing their informed consent to participate in the experiment, participants were asked to rate the extent to which they felt (1) popular (reverse-scored), (2) part of a social group (reverse-scored), (3) included (reverse-scored), and (4) ostracized. These variables were then merged to form a composite index of overall feelings of social exclusion (see Note 6).
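To make the composite index concrete, the sketch below reverse-scores the three positively worded items, averages the four items, and computes Cronbach’s alpha. The 1 to 7 response scale, the use of a simple mean to merge the items, and all ratings are hypothetical; the source does not report them.

```python
from statistics import mean, variance

SCALE_MIN, SCALE_MAX = 1, 7  # hypothetical response scale (assumption)


def reverse_score(value, low=SCALE_MIN, high=SCALE_MAX):
    """Flip an item so that higher values indicate more exclusion."""
    return low + high - value


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items (each a list of respondent scores)."""
    k = len(items)
    item_variances = sum(variance(scores) for scores in items)
    totals = [sum(answers) for answers in zip(*items)]
    return k / (k - 1) * (1 - item_variances / variance(totals))


# Hypothetical ratings from five respondents.
popular = [6, 5, 2, 3, 7]
in_group = [6, 4, 2, 2, 6]
included = [7, 5, 1, 3, 6]
ostracized = [1, 3, 6, 5, 2]

items = [
    [reverse_score(v) for v in popular],
    [reverse_score(v) for v in in_group],
    [reverse_score(v) for v in included],
    ostracized,  # already keyed so that higher means more excluded
]

# Composite index: mean of the four items per respondent (an assumption).
exclusion_index = [mean(answers) for answers in zip(*items)]
alpha = cronbach_alpha(items)
```

An alpha of at least 0.7, the threshold mentioned in the chapter notes, would suggest that the four items can reasonably be merged.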

Figure 8.3  Stylized social network used in Studies 2 and 3 (sociogram with nodes labeled A through F; image not reproduced).

After providing their perceptions of social exclusion, participants observed a network sociogram containing seven ties (Figure 8.3) for 30 seconds before the software auto-advanced to a new screen containing questions regarding network perceptions. While all participants saw the same network structure, half received a network graphic described as a social group, while the other half received a network graphic described as a road and town map. The dependent variable used by the authors was the number of relationships/roads participants recalled after seeing the networks. The authors found that as participants’ self-reported feelings of social exclusion increased, so too did their tendency to overestimate the number of social relations in the social network graphic (i.e., the social group condition), but not in the road and town network graphic (i.e., the road map condition). The results suggested that social exclusion has a significant impact on the recall of social relations, but not relations in general. In Study 2, O’Connor and Gladstone sought to establish a stronger causal relationship between ostracism and perceptions of novel social networks. In Study 1, a correlate of feeling socially excluded (e.g., taxed cognitive abilities) may have driven the observed effects. To remedy this problem, O’Connor and Gladstone relied on psychological priming, a manipulation used commonly in psychology, social psychology, and economics (e.g., Brashears, Hoagland, & Quintane, 2016; Stevens, 1951). Priming is often used to study a variable of interest without introducing that variable’s correlates into the experiment, for example, to induce a feeling of being powerful without also bringing along high education, wealth, or formal authority.
There are many established ways to prime an individual (for the seminal work on priming, see Meyer & Schvaneveldt, 1971), but the general aim remains the same: to induce a state of the variable of interest without that variable’s correlates being present. After providing their informed consent, participants were directed to a new screen that depicted a game called Cyberball (Williams & Jarvis, 2006). Cyberball is a three-person computer-based ball toss game designed to prime and manipulate participants’ experiences and feelings of social exclusion. Participants in the lab believed, falsely, that they were playing this game with other individuals in the room. The authors created two conditions: inclusion and exclusion. In the exclusion condition, participants were thrown the ball one-fifth of the time, whereas in the inclusion condition, participants were thrown the ball four-fifths of the time. After playing the game for four minutes, participants completed a brief inventory that served as the authors’ manipulation check. Participants then moved on to the second component of the study, where they were presented with a custom-designed video animation showing figures (known as avatars) interacting in a number of social settings. The video was eight minutes long and depicted the avatars talking and dancing across three separate scenes: a beer garden, walking into a club, and dancing to music. The authors used a video rather than a stylized network graph to increase realism and ensure that the findings in Study 1 were consistent across different methods of presentation. Importantly, the underlying network structure of the social interactions depicted in the video remained identical to that of Study 1 (Figure 8.3). Again, the authors found that experiences of ostracism systematically altered the way in which social networks were recalled such that ostracized individuals perceive other networks as denser than they are.
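What it means to recall a network as “denser than it is” can be illustrated with a simple density calculation. The sketch assumes that the six labels A through F in Figure 8.3 are the only nodes and that ties are undirected; the recalled tie count is hypothetical.

```python
def density(n_nodes, n_ties):
    """Density of an undirected network: observed ties / possible ties."""
    possible_ties = n_nodes * (n_nodes - 1) / 2
    return n_ties / possible_ties


TRUE_NODES = 6  # labels A-F in Figure 8.3 (assumed to be all nodes)
TRUE_TIES = 7   # stated in the description of the sociogram

true_density = density(TRUE_NODES, TRUE_TIES)

# Hypothetical participant who recalls nine relationships instead of seven.
recalled_ties = 9
recalled_density = density(TRUE_NODES, recalled_ties)
overestimate = recalled_ties - TRUE_TIES
```

Under these assumptions the true density is 7 of 15 possible ties, so every additional tie a participant "remembers" raises perceived density by one-fifteenth.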
Using true experimental procedures, O’Connor and Gladstone were able to link psychological processes to behaviors that generate macro-level network structures. This experiment demonstrates, first and foremost, how multiple studies can be used to improve causal identification. This is simple and fast in an experimental framework, but substantially more difficult in a naturalistic setting. Second, this experiment showcases how modern experiments can present material in multiple formats (e.g., videos of avatars, sociograms) and can mislead participants into believing they are interacting with others when they are only interacting with a computer program. Finally, this experiment combined a quasi-experimental design (Study 1) with a true experimental design (Study 2), thereby improving both causal identification and generalizability.

Conclusion

In this chapter we have argued that experimental designs are not just feasible for the study of social networks, but in many respects optimal. They provide maximum causal identification and, with the use of modern crowdsourcing services, often require only minimal sacrifices to generalizability. Thus, by either shrinking a process to fit into the lab or expanding the lab to encompass a process, we are able to more precisely study social network mechanisms with experiments than nearly any other method. We additionally provided several examples of social network experiments from the existing literature, though we were by no means exhaustive in our listing (e.g., Muchnik, Aral, & Taylor, 2013). These give an idea of the range of manipulations that are available, the topics that can be studied, and the insights that can be achieved by using experimental designs. As such, we hope that the reader agrees that the time is right for network experiments. While we are enthusiastic about the capabilities of network experiments, we do not mean to suggest that all research questions should be addressed with an experiment. Naturalistic designs must remain a significant part of the repertoire of social network analysts, and we do not mean to imply otherwise. Nevertheless, it is a mistake to assume that the ideal context in which to study a process is one that duplicates the natural environment as precisely as possible. A globe the size of the earth is of no use. The natural environment is complex, messy, and prone to generating incorrect causal models. The laboratory, in contrast, provides a calm and controlled environment in which the hidden workings of the social world may be uncovered. Armed with this detailed knowledge, it is then possible to examine the “real world” for signs of these mechanisms in action and to use deviations from those expectations to identify missing variables.
In short, experiments have a critical role to play alongside natural studies, but neither of the two can replace the other.

Notes

1. https://www.mturk.com/mturk/welcome
2. For sociological research on the community of MTurk workers and the future of crowdsourcing, see Kittur et al. (2013).
3. https://volunteerscience.com/volunteerscience.html
4. https://www.zooniverse.org/
5. The similarity to the children’s game of “telephone” (also known by too many other names to mention here) is intentional.
6. A composite index of at least three items is generally preferable when assessing psychological and social psychological processes and should yield Cronbach’s alpha values of at least 0.7 (Cronbach & Shavelson, 2004).

References

Blau, P. M., & Schwartz, J. E. (1984). Crosscutting social circles. New Brunswick, NJ: Transaction Publishers.
Brashears, M. E. (2008). Sex, society and association: A cross-national examination of status construction theory. Social Psychology Quarterly, 71, 72–85.
Brashears, M. E. (2011). Small networks and high isolation? A reexamination of American discussion networks. Social Networks, 33, 331–341.
Brashears, M. E. (2014). “Trivial topics” and rich ties: The relationship between discussion topic, alter role, and resource availability using the “important matters” name generator. Sociological Science, 1, 493–511.
Brashears, M. E., & Gladstone, E. (2016). Error correction mechanisms in social networks can reduce accuracy and encourage innovation. Social Networks, 44, 22–35.
Brashears, M. E., Hoagland, E., & Quintane, E. (2016). Sex and network recall accuracy. Social Networks, 44, 74–84.
Brashears, M. E., & Quintane, E. (2018). The weakness of tie strength. Social Networks, 55, 104–115.
Buhrmester, M. D., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5.
Burt, R. S. (1980). Innovation as a structural interest: Rethinking the impact of network position on innovation adoption. Social Networks, 2, 327–355.
Burt, R. S. (1992). Structural holes: The social structure of competition. Cambridge, MA: Harvard University Press.
Centola, D. (2011). An experimental study of homophily in the adoption of health behavior. Science, 334, 1269–1272.
Christakis, N. A., & Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357, 370–379.
Cohen-Cole, E., & Fletcher, J. M. (2008a). Is obesity contagious? Social networks vs. environmental factors in the obesity epidemic. Journal of Health Economics, 27, 1382–1387.
Cohen-Cole, E., & Fletcher, J. M. (2008b). Detecting implausible social network effects in acne, height, and headaches: Longitudinal analysis. British Medical Journal, 337, a2533.
Coleman, J. S., Katz, E., Menzel, H., Columbia University, & Bureau of Applied Social Research. (1966). Medical innovation: A diffusion study. Indianapolis, IN: Bobbs-Merrill Co.
Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64(3), 391–418.
Eagle, N., Pentland, A. S., & Lazer, D. (2009). Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences, USA, 106, 15274–15278.
Fischer, C. S. (1982). To dwell among friends: Personal networks in town and city. Chicago, IL: University of Chicago Press.
Fischer, C. S. (2009). The 2004 GSS finding of shrunken social networks: An artifact? American Sociological Review, 74, 657–669.
Fowler, J. H., & Christakis, N. A. (2008). Estimating peer effects on health in social networks: A response to Cohen-Cole and Fletcher; and Trogdon, Nonnemaker, and Pais. Journal of Health Economics, 27, 1400–1405.
Friedkin, N. E. (1998). A structural theory of social influence. New York, NY: Cambridge University Press.
Godechot, O. (2016). The chance of influence: A natural experiment on the role of social capital in faculty recruitment. Social Networks, 46, 60–75.
Gosling, S. D., & Johnson, J. A. (Eds.). (2010). Advanced methods for conducting online behavioral research. Washington, DC: American Psychological Association.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 1–75.
Hipp, J. R., Wang, C., Butts, C. T., Jose, R., & Lakon, C. M. (2015). Research note: The consequences of different methods for handling missing network data in stochastic actor based models. Social Networks, 41, 56–71.
Horton, J. J., Rand, D. G., & Zeckhauser, R. J. (2012). The online laboratory. Experimental Economics.
Jackson, M., & Cox, D. R. (2013). The principles of experimental design and their application in sociology. Annual Review of Sociology, 39, 27–49.
Keppel, G. (1991). Design and analysis: A researcher’s handbook. Upper Saddle River, NJ: Prentice-Hall.
Kraut, R., Olson, J., Banaji, M., Bruckman, A., Cohen, J., & Couper, M. (2004). Psychological research online: Opportunities and challenges. American Psychologist, 59, 105–117.
Laumann, E. O., Marsden, P. V., & Prensky, D. (1992). The boundary specification problem in network analysis. In L. C. Freeman, D. R. White, & A. K. Romney (Eds.), Research methods in social network analysis (pp. 61–79). New Brunswick, NJ: Transaction Publishers.
Lee, B., & Bearman, P. (2017). Important matters in political context. Sociological Science, 4, 1–30.
Lusher, D., Koskinen, J., & Robins, G. (2013). Exponential random graph models for social networks: Theory, methods, and applications. New York, NY: Cambridge University Press.
McCarty, C., Killworth, P. D., & Rennell, J. (2007). Impact of methods for reducing respondent burden on personal network structural measures. Social Networks, 29, 300–315.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.
Merluzzi, J., & Burt, R. S. (2013). How many names are enough? Identifying network effects with the least set of listed contacts. Social Networks, 35, 331–337.
Merton, R. K. (1948). The self-fulfilling prophecy. The Antioch Review, 8(2), 193–210.
Meyer, D. E., & Schvaneveldt, R. W. (1971). Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90(2), 227.
Muchnik, L., Aral, S., & Taylor, S. J. (2013). Social influence bias: A randomized experiment. Science, 341, 647–651.
O’Connor, K. M., & Gladstone, E. (2015). How social exclusion distorts social network perceptions. Social Networks, 40, 123–128.
Paik, A., & Sanchagrin, K. (2013). Social isolation in America: An artifact. American Sociological Review, 78, 339–360.
Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5, 411–419.
Rand, D. G., et al. (2013). Evolution of fairness in the one-shot anonymous Ultimatum Game. Proceedings of the National Academy of Sciences, USA, 110(7), 2581–2586.
Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311, 854–856.
Schaefer, D. R. (2007). Votes, favors, toys, and ideas: The effect of resource characteristics on power in exchange networks. Sociological Focus, 40, 138–160.
Shannon, C. E. (1950). Memory requirements in a telephone exchange. The Bell System Technical Journal, 29(3), 343–349.
Simmons, M. P., Adamic, L. A., & Adar, E. (2011). Memes online: Extracted, subtracted, injected, and recollected. International Conference on Web & Social Media, 2011.
Simpson, B., & Walker, H. A. (2002). Status characteristics and performance expectations: A reformulation. Sociological Theory, 20(1), 24–40.
Stevens, S. S. (1951). Handbook of experimental psychology. Hoboken, NJ: Wiley Publishing.
Van den Bulte, C., & Lilien, G. L. (2001). Medical innovation revisited: Social contagion versus marketing effort. American Journal of Sociology, 106, 1409–1435.
Weinberg, J. D., Freese, J., & McElhatten, D. (2014). Comparing data characteristics and results of an online factorial survey between a population-based and a crowdsource-recruited sample. Sociological Science, 1, 292–310.
Willer, D. (1999). Network exchange theory. Westport, CT: Praeger.
Williams, K. D., & Jarvis, B. (2006). Cyberball: A program for use in research on interpersonal ostracism and acceptance. Behavior Research Methods, 38(1), 174–180.
Zelditch, M. (1969). Can you really study an army in the laboratory? In A. Etzioni (Ed.), Complex organizations (2nd ed., pp. 528–539). New York, NY: Holt, Rinehart and Winston.

Chapter 9

The Network Scale-Up Method

Tyler H. McCormick

Introduction

Say that a researcher wants to estimate the number of people who are members of a particular group and there is no reliable list or administrative data source. A natural approach would be to take a survey of the population and ask individuals whether or not they belong to this group. This approach assumes (1) that individuals are willing to reveal their membership in the group of interest and (2) that members of the group are in the sampling frame of the survey. If the group of interest is highly stigmatized or extremely rare, then one or both of these assumptions are likely violated. In HIV/AIDS research, for example, injection drug users, female sex workers, and men who have sex with men are considered the groups most at risk for contracting the disease (United Nations, 2010). Further, even if both assumptions are satisfied, asking respondents individually if they are a member of the group is highly inefficient for very rare groups. The network scale-up method is one of a series of methods that leverage a respondent’s social network to more effectively capture information about specific groups or about the population as a whole. The network scale-up method works with questions that are known as aggregated relational data (ARD). These questions take the form “How many Xs do you know?” That is, ARD are count data consisting of the number of connections between a respondent and individuals with a specific characteristic. Critically, ARD do not involve observing any links in the network and are collected using standard probability sampling techniques. In this way ARD differ from other methods that use respondents’ networks to reach certain groups in the population. Respondent-driven sampling (RDS), for example, also relies on connections between individuals to access members of a certain population.
In RDS and other snowball sampling schemes, however, additional participants enter the survey as they are recruited from respondents’ networks. This feature, while potentially appealing from an efficiency perspective, means that the sampling process depends fundamentally on the social network. Since ARD can be collected without observing any links, they can be incorporated into standard survey platforms. As we discuss later, the network still plays a key role in the responses that individuals give, but the sampling process itself occurs independently of the population social network. Broadly defined, ARD have been used in two contexts. First, ARD have been used to estimate and learn about the sizes of hard-to-reach groups. Most of this work has been on estimating the sizes of these groups using the network scale-up method (Bernard et al., 1991; Killworth, McCarty, Bernard, Shelley, & Johnsen, 1998; Maltiel et al., 2015; Feehan et al., 2016), though others have used ARD to estimate other demographic features of hard-to-reach groups (McCormick & Zheng, 2012, 2015). Second, given the substantial cost savings of ARD over traditional network sampling tools, researchers have also used ARD to learn about social structure in large populations, not specifically in the case of hard-to-reach groups. DiPrete et al. (2011), for example, examined cleavages in Americans’ social networks based on demographic and political factors. Breza et al. (2017) compare ARD to fully observed graphs in the context of risk-sharing behavior in a savings-monitoring experiment. Feehan, Mahy, and Salganik (2017) used a variant of the scale-up method to estimate mortality from sibling reports, and Chen, Karbasi, and Crawford (2016) adapted the idea to estimate the total size of a network. The main focus of this chapter is estimating the size of a group of individuals using ARD and the network scale-up method. As the name implies, the method uses information from survey respondents’ social networks to “scale up” to an entire population.
As Feehan and Salganik (2016) put it, the “core insight behind the network scale-up method is that ordinary people have embedded in their personal networks information that can be used to estimate the sizes of hidden populations.” The scale-up method has been used in a variety of contexts. Bernard et al. (1991), for example, used the scale-up method to estimate the number of victims from an earthquake in Mexico City. Kadushin et al. (2006) used the method to estimate heroin use in 14 cities in the United States. As mentioned previously, the method has also been used extensively to estimate the size of groups considered most at risk for HIV/AIDS (see, e.g., Ezoe et al., 2012; Scutelniciuc, 2012; Jafar Khounigh et al., 2014; Jing et al., 2014) and drug users (Salganik, Fazito, Bertoni, Abdo, Mello, & Bastos, 2011; Guo et al., 2013; Maghsoudi et al., 2014; Nikfarjam et al., 2016). The method uses two pieces of information: (1) ARD questions about the population of interest and (2) the degree, or total network size, of the respondents. To see how the method works, say that you’d like to estimate the number of individuals who are injection drug users in the population. We use a survey of 1,000 individuals to determine how many injection drug users they know. In the survey, respondents reported a total of 2,500 connections to injection drug users and reported 500,000 total connections (on average, each respondent knew 500 people total). The researcher would then estimate that 0.5% of respondents’ connections are with injection drug users. Multiplying by the total population size, then, gives an estimate of the number of injection drug users in the population. A unique feature of this method is that it relies not only on the individuals selected by the researcher but also on the survey participants’ network members, who are not observed. The rest of the chapter is organized as follows. We discuss the details of the network scale-up method in the next section. 
As alluded to in the previous paragraph, the method relies on several restrictive assumptions; we describe these and then discuss statistical approaches that mitigate violations of them. In the third section we discuss data collection and survey design. The final section is the conclusion.
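The arithmetic behind the injection drug user example above can be written out explicitly. The text does not give a total population size, so the value $N = 1{,}000{,}000$ below is purely hypothetical and serves only to complete the calculation.

```latex
\[
\hat{p} = \frac{2{,}500}{500{,}000} = 0.005 = 0.5\%,
\qquad
\hat{N}_{\text{IDU}} = \hat{p} \times N = 0.005 \times 1{,}000{,}000 = 5{,}000 .
\]
```

Under that assumed population size, the survey would imply roughly 5,000 injection drug users.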


Methodology

This section formally presents the network scale-up estimator. We begin by describing the basic scale-up estimator and maximum likelihood estimation. The discussion continues with the assumptions required for valid estimates under this paradigm and attempts to use more nuanced statistical models to mitigate the impacts of departures from the assumptions.

The Network Scale-Up Estimator

As mentioned previously, the network scale-up method requires two types of information. First, we need to know the number of individuals in the target, often hard-to-reach, population that each sampled respondent knows. Define this first piece of information as $y_{ik}$, the number of people that respondent $i$ knows in group $k$ (e.g., the number of people who are currently incarcerated, or the number of intravenous drug users). The second piece of information is the total number of connections that each respondent has, or the size of their personal network. Since a person’s network size is referred to as “degree” in the networks literature, we use $d_i$ to represent the total number of connections by respondent $i$. Assume each $y_{ik}$ follows a binomial distribution with $d_i$ as the number of trials and success probability $p_k = N_k/N$, where $N_k$ is the size of group of interest $k$ and $N$ is the total population size. Then, Killworth, McCarty, Bernard, Shelley, and Johnsen (1998) present the maximum likelihood estimator starting with the likelihood

\[
L(p_k \mid y_{1k}, \ldots, y_{nk}) = \prod_i \binom{d_i}{y_{ik}} \, p_k^{y_{ik}} (1 - p_k)^{d_i - y_{ik}},
\]

which is maximized when

\[
\hat{p}_k = \frac{\hat{N}_k}{N} = \frac{\sum_i y_{ik}}{\sum_i d_i}
\quad \text{and thus} \quad
\hat{N}_k = N \, \frac{\sum_i y_{ik}}{\sum_i d_i}. \tag{1}
\]

Further, the standard error of $\hat{N}_k$ based on the maximum likelihood estimator is, after a bit more algebra,

\[
\text{s.e.}(\hat{N}_k) = \sqrt{\frac{N \times N_k}{\sum_i d_i}}.
\]

The precision of the estimated population size, therefore, increases with the number of connections made by individuals in the sample. Note, however, that the more connections a respondent must poll to identify members of the group of interest, the more challenging the cognitive task becomes for the respondent.
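A minimal sketch of the basic estimator may help fix ideas. It assumes the square-root form of the standard error, $\sqrt{N N_k / \sum_i d_i}$, which follows from the binomial likelihood when $p_k$ is small enough that the factor $(1 - p_k)$ can be dropped, and it plugs the estimate in for the unknown $N_k$. The function name and all data values are hypothetical.

```python
import math


def scale_up_estimate(y, d, population_size):
    """Basic network scale-up estimate of a hidden group's size.

    y: number of group members known by each respondent (y_ik)
    d: each respondent's degree, i.e., personal network size (d_i)
    population_size: total population size N
    Returns (estimated group size, approximate standard error).
    """
    if len(y) != len(d):
        raise ValueError("y and d must have one entry per respondent")
    total_known = sum(y)
    total_degree = sum(d)
    n_k_hat = population_size * total_known / total_degree
    # The true N_k is unknown, so the estimate is substituted for it here.
    std_error = math.sqrt(population_size * n_k_hat / total_degree)
    return n_k_hat, std_error


# Hypothetical survey of four respondents in a population of one million.
reports = [2, 0, 5, 3]
degrees = [400, 300, 700, 600]
n_k_hat, std_error = scale_up_estimate(reports, degrees, 1_000_000)
```

Holding the reports-to-degree ratio fixed, doubling the summed degrees shrinks the standard error by a factor of $\sqrt{2}$, which is the sense in which precision increases with the number of connections polled.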

Three features of the maximum likelihood estimator are worth further discussion. First, the estimator relies on high-quality estimates of respondents’ degrees. We discuss estimating degree in the next section. Second, it is straightforward to compute the estimator because individuals in the survey are assumed to be sampled independently from one another. That is, we assume that the number of individuals known by one respondent is not influenced by the number known by another respondent. This observation leads to the third point, that the estimator depends on the ratio of sums of individuals known by the respondents. More specifically, while the sampling strategy may be determined by the characteristics of the survey respondents, the scale-up estimates depend on the characteristics of the people known by respondents, or alters. This feature is a critical difference between ARD-based methods and traditional survey designs. Though the maximum likelihood representation is straightforward, it relies on several assumptions that have proven problematic in practice. First, barrier effects occur based on what McCormick, Salganik, and Zheng (2010) call “nonrandom mixing.” That is, the propensity for individuals to know members of the target group varies widely across individuals in the population. This heterogeneity creates additional variation, known as overdispersion, that is not considered in the maximum likelihood framework mentioned previously. Zheng, Salganik, and Gelman (2006) propose a model that accounts for overdispersion by introducing an additional parameter specific to each ARD group. Specifically, Zheng et al. (2006) propose

\[
y_{ik} \sim \text{Negative Binomial}\left(\text{mean} = e^{\alpha_i + \beta_k},\ \text{overdispersion} = \omega_k\right),
\]

where $\alpha_i$ captures respondent $i$’s gregariousness and $\beta_k$ the fraction of contacts with group $k$ (both enter on the log scale, since the mean is $e^{\alpha_i + \beta_k}$), and higher values of the $\omega_k$ parameter correspond to more variation than would be expected if the responses followed a Poisson distribution. Zheng et al. (2006) suggest interpreting overdispersion as a factor of decrease in the fraction of people that know one member of the population compared to the number that know zero. Put another way, in populations with high overdispersion, individuals tend to know either many members of the group or none. Using data from McCarty (2002), Zheng et al. (2006) found that knowing individuals in prison had a high rate of overdispersion, a result that DiPrete et al. (2011) also obtained using the 2006 General Social Survey. In their work on connections to incarcerated individuals, Lee et al. (2015) found that an African American male with an average-size network would be expected to know just under two prisoners. A Caucasian female, in contrast, would be expected to know only about 0.3. Further, about 85% of the Caucasian women in the sample reported knowing no prisoners, while for African American males the percentage was about 63%. These results demonstrate that, along with varying between ARD groups, barrier effects can vary systematically based on respondent characteristics. We return to this issue when discussing degree estimation in a subsequent section. While barrier effects create bias because of the influence of individuals’ social networks on their responses, the next two issues with network scale-up involve cases where respondents’ reports differ from the connections in their network. First, transmission errors occur when a respondent is connected to someone in his or her social network who belongs to a certain group but the respondent does not know the person is a member of the group. If a researcher asks respondents how many individuals they know who use recreational marijuana, for example, a respondent might work with individuals who are users but do not reveal that information because of potential repercussions in the workplace. Ezoe et al. (2012) provide a particularly striking look at the potential impacts of transmission errors in a study to estimate the number of men who have sex with men (MSM) in Japan. In their work, Ezoe et al. (2012) used an internet survey of 1,500 individuals and asked a number of ARD questions, including the number of contacts each respondent had that he or she understands to be MSM. Using data collected from a sample of MSM by Hidaka et al. (2005), Ezoe et al. (2012) concluded that the average MSM “comes out” to only about five other individuals. The average (estimated) network size in the Ezoe et al. (2012) survey was about 360. If MSM have networks that are about the same size as the average of the survey respondents, then “coming out” to only five individuals would mean that fewer than 1.5% of contacts know about the individual’s MSM status. Finally, the scale-up method assumes that respondents can accurately recall the number of individuals that they know in a particular group. In practice, recall errors are common: people seem to underrecall the number of people they know in large subpopulations (e.g., people named Michael) and overrecall the number in small subpopulations (e.g., people who committed suicide) (Killworth et al., 2003; Zheng et al., 2006). Systematic over- or underreporting based on population size can lead to substantial bias in scale-up estimators.
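The claim made earlier in this section, that under high overdispersion respondents tend to know either many group members or none, can be illustrated by comparing the probability of reporting zero under a Poisson model and under a negative binomial model with the same mean. The sketch assumes the parameterization in which the variance equals the overdispersion times the mean; the function names and numeric values are hypothetical.

```python
import math


def prob_zero_poisson(mean):
    """P(y = 0) when counts are Poisson with the given mean."""
    return math.exp(-mean)


def prob_zero_negbin(mean, overdispersion):
    """P(y = 0) for a negative binomial with variance = overdispersion * mean.

    This parameterization is an assumption of the sketch. It implies a
    size parameter of mean / (overdispersion - 1) and a success
    probability of 1 / overdispersion.
    """
    if overdispersion <= 1:
        raise ValueError("overdispersion must exceed 1")
    size = mean / (overdispersion - 1)
    return overdispersion ** (-size)


# Same expected count, increasingly overdispersed responses.
expected_known = 1.0
share_zero_poisson = prob_zero_poisson(expected_known)
share_zero_mild = prob_zero_negbin(expected_known, 1.5)
share_zero_strong = prob_zero_negbin(expected_known, 5.0)
```

With the mean held fixed, a larger overdispersion pushes more respondents to zero, so the remaining respondents must account for the same expected total by knowing many group members each.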

Estimating Degree

The network scale-up estimator requires an estimate of each respondent’s degree. The number of individuals known by the respondent, of course, depends on the definition of “know.” Using a very restrictive definition of “know,” it might be possible to have respondents enumerate their connections directly (e.g., people the respondent has lent a large sum of money to or borrowed a large sum from). Recall, however, that the population of individuals known by the respondents is the group that will be used to “scale up” to the population, so using a very restrictive definition of “know” limits the depth to which respondents poll their social network. A looser definition, however, has the caveat that it is very unlikely that respondents will be able to recall (either in the time allotted for the survey or at all) all of their connections. Given these constraints, most researchers will require an additional estimation procedure to obtain respondents’ degree. The first set of methods attempts to elicit respondents’ networks directly. The diary method required subjects to keep a daily record of all known people encountered over the span of 100 days. This method, while yielding very rich and accurate data, requires too much cooperation and time to be employed in routine sample surveys. Later efforts have attempted to reduce the burden on respondents by using data on contacts that are recorded automatically, for example, Christmas card mailing lists (Hill & Dunbar, 2003), email logs (Kossinets & Watts, 2006), or cell phone records (Onnela et al., 2007). The enumeration methods described previously are difficult to incorporate in a standard survey framework. The reverse small-world method (Killworth & Bernard, 1978; Killworth, Bernard, & McCarty, 1984; Bernard et al., 1990) is a survey-based attempt to help respondents poll a large fraction of their network directly. The method uses the logic from the small-world experiments of Milgram (1967). Interviewers ask respondents to name individuals in their network that they would pass a message to if they were trying to get a message to a specific target. Asking about different targets enumerates different groups within the respondent’s network. The number of targets was typically very large, however, meaning that the process could take hours to complete. An additional procedure that can be embedded in a survey more readily and polls respondents’ networks in aggregate rather than one connection at a time is the summation method (McCarty et al., 2001). In this method, respondents are asked how many people they know in a list of specific relationship types (e.g., immediate family, neighborhood, coworkers), and these responses are then summed to yield an overall estimate. McCarty et al. (2001) propose 16 relation types that when added together should yield the total personal network size. This procedure requires a list that both (1) completely covers all respondents’ networks and (2) contains only mutually exclusive categories. Without the former characteristic, the method will underestimate network size, and violations of the latter will result in double-counting. Finally, the scale-up method is also a viable candidate for degree estimation. To see this, recall that in Equation (1) we assumed that $N_k$, the size of the target group, was unknown. For some groups, though, we have reasonable information about the size of the group from administrative or other sources. If the size of the group is known, the estimator can then be rearranged to estimate degree. Asking about more than one group with known size can mitigate barrier effects. If, for example, we only asked the respondents the number of people they knew named Rachel, then we should expect that female respondents would report knowing more individuals than male respondents because of homophily based on gender.
More formally, to get the maximum likelihood estimator for the respondent degree, we take, following Killworth, Johnsen, McCarty, Shelley, and Bernard (1998) and Killworth, McCarty, Bernard, Shelley, and Johnsen (1998), yik ∼ Binomial(di, Nk/N) and then have the likelihood

\[
\mathcal{L}(d_i \mid N_1, \ldots, N_{K-1}, y_{i1}, \ldots, y_{i,K-1}) = \prod_{k=1}^{K-1} \binom{N_k}{y_{ik}} \left(\frac{d_i}{N}\right)^{y_{ik}} \left(1 - \frac{d_i}{N}\right)^{N_k - y_{ik}},
\]

where there are K total ARD subgroups and the first K − 1 have known size. The maximum likelihood estimator is then

\[
\hat{d}_i = N \cdot \frac{\sum_{k=1}^{K-1} y_{ik}}{\sum_{k=1}^{K-1} N_k}. \tag{2}
\]
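As a minimal sketch of Equation (2), the function below scales a respondent's total reported count across the known-size groups by the share of the population those groups cover. The function name `mle_degree` and all input numbers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mle_degree(y: Sequence[float], known_sizes: Sequence[float], population: float) -> float:
    """Equation (2): d_i = N * sum_k y_ik / sum_k N_k over groups of known size."""
    if len(y) != len(known_sizes):
        raise ValueError("need one reported count per known-size group")
    total_known = sum(known_sizes)
    if total_known <= 0:
        raise ValueError("known group sizes must sum to a positive number")
    return population * sum(y) / total_known


# Hypothetical inputs: three first-name groups in a population of 1,000,000.
reports = [2, 1, 3]              # y_ik: people the respondent knows in each group
sizes = [4_000, 1_000, 5_000]    # N_k: known group sizes
degree = mle_degree(reports, sizes, 1_000_000)
print(degree)
```

Because the estimator pools all known-size groups before dividing, a respondent who happens to know many people in one small group does not dominate the estimate the way a per-group average would allow.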

With an eye toward the three sources of error described previously, McCormick et al. (2010) developed a hierarchical model to estimate degree in a way that controls barrier effects and recall errors. Transmission errors can be controlled by choosing populations with known sizes that are readily observable (e.g., first names). Consider first barrier effects. The Zheng et al. (2006) model presented in the previous section accounts for excess variation that arises from heterogeneity in the propensity for people to know individuals in certain groups. McCormick et al. (2010) extend this concept

by modeling systematic differences in the parts of each respondent’s network that the ARD questions elicit. To see this, consider asking individuals about only populations with known size that are made up predominantly of females (e.g., a set of female first names). Since we expect there to be homophily based on gender, we expect that female respondents will report knowing more individuals in the chosen groups than if we had asked for a set of, for example, male names with the same size. This mismatch will therefore tend to overestimate the degree of female respondents and underestimate the degree of male respondents, with the magnitude of the bias corresponding to the strength of homophily. To see this effect in action, see Figure 9.1, which shows histograms of responses to two ARD questions from McCarty et al. (2001). From the figure, we see that younger female respondents knew, on average, about twice as many people named Christina as did older males. Meanwhile, older male respondents knew (again, on average) about three times as many people named Robert as younger females. These discrepancies arise because the name Christina is most popular with younger females, while Robert is more popular with older males in the US population. McCormick et al. (2010) address barrier effects using a model that adjusts the expected number known by the (estimated) rate of interaction between individuals in different demographic groups. To do this, McCormick et al. (2010) group respondents into ego groups, e, and the people that are connected to respondents, or alters, into a = 1, . . . , A groups. The alter and ego groups are partially determined by the information available

[Figure 9.1: four histograms of density against number known, with panels for younger women and older men; Christina on the left, Robert on the right.]

Figure 9.1 McCarty et al. (2001) asked respondents how many people they knew with each of a set of first names. This figure shows the responses broken down by age and gender. The left two histograms are responses to “How many people named Christina do you know?” and the right two histograms are the same question for Robert. Christina is a more popular name among younger females, while Robert is more common among older males.

about categories with known size, as the model requires that both the size of the group and the distribution of individuals across demographic groups (the number of males under 30 named “Michael,” for example) be known. McCormick et al. (2010) present the following model:

\[
y_{ik} \sim \text{Neg-Binom}(\mu_{ike}, \omega'_k), \quad \text{where } \mu_{ike} = d_i \sum_{a=1}^{A} m(e,a)\, \frac{N_{ak}}{N_a}. \tag{3}
\]
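The mean in Equation (3) can be made concrete with a small sketch. Here d_i is the respondent's degree, m(e, a) is the share of ego group e's ties that go to alter group a, and N_ak/N_a is the fraction of alter group a that belongs to group k. The function name, the two alter groups, and every number below are hypothetical.

```python
import numpy as np


def expected_count(degree, mixing_row, n_ak, n_a):
    """mu_ike = d_i * sum_a m(e, a) * N_ak / N_a for one respondent and one group k."""
    mixing_row = np.asarray(mixing_row, dtype=float)
    share_in_k = np.asarray(n_ak, dtype=float) / np.asarray(n_a, dtype=float)
    return float(degree * np.sum(mixing_row * share_in_k))


# Hypothetical setup: alter groups are (women, men), 500,000 people each,
# and group k is a first name held by 9,000 women and 1,000 men.
n_a = [500_000, 500_000]
n_ak = [9_000, 1_000]

mu_female = expected_count(500, [0.6, 0.4], n_ak, n_a)   # ego group with 60% of ties to women
mu_male = expected_count(500, [0.4, 0.6], n_ak, n_a)     # ego group with 40% of ties to women
mu_random = expected_count(500, [0.5, 0.5], n_ak, n_a)   # random mixing: m(e, a) = N_a / N
print(mu_female, mu_male, mu_random)
```

With random mixing the mean collapses to d_i N_k / N, the quantity the maximum likelihood setup relies on; the two homophilous rows move the expected count above and below that value for the same degree.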

The negative binomial representation allows for group-specific overdispersion, ωk' , as described in the previous section and in Zheng et al. (2006). The mean of the negative binomial, µike, takes a form that corrects for barrier effects that happen between discrete respondent or ego categories, e, and alter categories, a. The expected number of people known in group k, then, depends on the person’s overall degree, di, as in the maximum likelihood setup, but the degree is then modulated by the characteristics of the person (via ego group e) and of people in group k. Specifically, the m(e,a) is the “mixing matrix” that estimates the probability that a person in ego group e interacts with a person in alter group a:

\[
m(e,a) = \mathrm{E}\!\left[ \frac{d_{ia}}{d_i = \sum_{a=1}^{A} d_{ia}} \;\middle|\; i \text{ in ego group } e \right], \tag{4}
\]

where dia is the number of people the respondent knows in alter group a. The final term, Nak/Na, represents the conditional probability of a person being in group k given that he or she is in alter group a (of males under 30, what fraction are named “Michael”). The alter groups partition the set of possible individuals, so summing across alter groups gives the total number known in group k. Estimating the McCormick et al. (2010) model requires choosing the number of alter and ego groups. Choosing a larger number of groups will allow the model to more precisely adjust for social structure, thereby improving degree estimation. More precise alter and ego groups require more information about groups with known size, however, which is likely to be a limitation in practice. McCormick et al. (2010) used six categories for egos and eight for alters (three age by two gender and four age by two gender, respectively). Populations with known size were first names, with data about the distribution of names in the population coming from birth records provided by the US Social Security Administration. Turning to recall error in ARD, note that Equation (3) assumes that the responses are accurate (modulo sampling error). In practice, however, respondents tend to have difficulty recalling accurately the number of individuals they know in a given group if the group size is large. Specifically, individuals tend to underrecall the number of individuals they know in large groups (Killworth et al., 2003; Zheng et al., 2006). To address this issue, McCormick et al. (2010) develop a calibration curve based on the logic depicted in Figure 9.2. In the figure, the left panel represents what would happen if respondents could recall perfectly from their network, with the area of the number known in each ARD group (outer circle) increasing monotonically with the area of the overlap between the circles. What actually

happens, however, is depicted in the right panel. Rather than recalling from their complete network, respondents recall from a subset of their network. McCormick et al. (2010) develop a calibration curve using populations with known size. If respondents recall from their complete network, then we should observe that bk ≈ Nk / N. Both Killworth et al. (2003) and Zheng et al. (2006), however, found that the relationship is nearly linear for groups that were small but increases progressively as the size of the group gets larger. Specifically, the relationship begins as linear and then increases to the recalled graph being approximately the square root of the complete graph. See McCormick and Zheng (2007) for the complete details of the calibration curve derivation. Zheng et al. (2006) as well as Maltiel et al. (2015) also develop adjustments for recall. As we describe next, the Maltiel et al. (2015) adjustment is based on estimating the sizes of all the groups in the McCarty et al. (2001) data and thus includes corrections for larger groups, whereas the McCormick et al. (2010) calibration curve is for the names used for estimating degree, which are substantially smaller.

[Figure 9.2: two panels showing person i's actual network and recalled network, each overlapping Groups A to D (actually known versus recalled known), with a curve of f(% in group k) against % in group k.]

Figure 9.2 ARD questions ask respondents to poll their actual network (left panel), though they actually use the subset of their network that they happen to recall at a given time. The function f(·) adjusts for the fraction of the actual network that a respondent recalls.

Bayesian Approach Maltiel et al. (2015) developed a unified Bayesian approach that attempts to address the three sources of error in scale-up estimates and is specifically designed for estimating group size. Maltiel et al. (2015) present three different models, beginning with a model that assumes random mixing but allows for heterogeneity in degrees and then systematically adding components to deal with recall error, transmission bias, and barrier effects. The final product is the “combined model,” which we describe here. First, recall that the maximum likelihood estimator assumes each respondent’s answer to ARD question k as coming from a binomial distribution where the number of trials is the person’s degree and the probability is the fraction of individuals in the population. Maltiel et al. (2015) take a different approach, by specifying each response as a binomial conditional on additional parameters that vary for each individual. Thus, while the binomial assumption may not be valid for a single set of parameters across all individuals in the population,

the Maltiel et al. (2015) approach assumes that it is valid for a given individual only after accounting for person-specific differences in the model parameters. This allows heterogeneity in degrees, as well as in network characteristics that create barrier effects, to be reflected in each individual’s probability. Specifically, the combined model likelihood takes the form

\[
y_{ik} \mid d_i, \tau_k, q_{ik} \sim \text{Binomial}(d_i,\; q_{ik}\tau_k).
\]

The number of trials in this binomial is di, just as in the maximum likelihood estimator. Maltiel et al. (2015) use a Bayesian framework, meaning that uncertainty in individuals’ degrees will propagate into the final size estimate. Across the degrees, Maltiel et al. (2015) use a log normal prior distribution, which uses a long tail to account for a small number of high-degree individuals and corresponds to estimates of degree distributions in previous work (Zheng et al., 2006; McCormick et al., 2010). Unlike the maximum likelihood estimator, the probability of having a tie is specific to each individual and composed of two components. The first component, qik, accounts for barrier effects by explicitly allowing the probability of a connection to members of group k to vary by respondent. Maltiel et al. (2015) give qik a beta prior distribution, parameterizing the distribution in terms of mean and dispersion (Skellam, 1948; Mielke, 1975; Diggle et al., 2002, Chapter 9). Using this parameterization, the mean parameter of the beta is E[qik] = Nk/N and the dispersion allows for excess variation that arises because of overdispersion. The final piece, τk, controls transmission errors. That is, τk is the average proportion of respondents’ contacts in group k that the respondents report. The transmission bias term in Maltiel et al. (2015) is similar to the corrections developed in Salganik, Mello, Abdo, Bertoni, Fazito, and Bastos (2011), except that the term in Maltiel et al. (2015) combines the transmission and differential network size terms that are combined in Salganik, Mello, Abdo, Bertoni, Fazito, and Bastos (2011). For groups with known size, transmission bias is set to be one, which facilitates estimating degree. For groups with unknown size, the transmission bias has a beta prior. This beta prior is designed to be informative. That is, to appropriately account for transmission effects, the Maltiel et al. 
(2015) model requires strong external evidence to set a reasonable range for the prior. The game of contacts (Salganik, Mello, Abdo, Bertoni, Fazito, & Bastos, 2011), discussed in the next section, is one such source of information. Finally, Maltiel et al. (2015) also developed an adjustment for recall errors. This adjustment is based on the relationship between the (uncorrected) estimates and actual sizes for the known population. Maltiel et al. (2015) used all of the populations in the McCarty et al. (2001) data in their estimation procedure, rather than the comparably smaller first names used to develop the calibration curve in McCormick et al. (2010). The correction is then applied using a postprocessing step that is related to the normalization step introduced in Zheng et al. (2006).
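To make the structure of the combined model concrete, the following sketch simulates ARD responses from its generative pieces: log-normal degrees, a beta-distributed q_ik with mean N_k/N, a transmission term τ_k, and a binomial response. It is only a forward simulation under stated assumptions, not the estimation procedure of Maltiel et al. (2015). In particular, the mapping from mean and dispersion to the two beta shape parameters, the fixed values of τ_k, the log-normal parameters, and all names are assumptions made for illustration.

```python
import numpy as np


def simulate_combined_model(group_sizes, population, tau, n_respondents,
                            dispersion=0.1, log_mean=6.0, log_sd=0.6, seed=0):
    """Draw degrees d_i and responses y_ik ~ Binomial(d_i, q_ik * tau_k)."""
    rng = np.random.default_rng(seed)
    mean_q = np.asarray(group_sizes, dtype=float) / population   # E[q_ik] = N_k / N
    tau = np.asarray(tau, dtype=float)

    # Degrees from a log-normal distribution, rounded to whole contacts (at least 1).
    raw = rng.lognormal(log_mean, log_sd, n_respondents)
    degrees = np.maximum(1, np.rint(raw)).astype(int)

    # Assumed mean/dispersion mapping: alpha + beta = 1 / dispersion - 1.
    scale = 1.0 / dispersion - 1.0
    q = rng.beta(mean_q * scale, (1.0 - mean_q) * scale,
                 size=(n_respondents, mean_q.size))

    # tau_k thins the ties a respondent is actually aware of.
    y = rng.binomial(degrees[:, None], q * tau)
    return degrees, y


# Hypothetical groups: one of known size (tau = 1) and one hidden group (tau = 0.5).
degrees, y = simulate_combined_model(
    group_sizes=[20_000, 5_000], population=1_000_000,
    tau=[1.0, 0.5], n_respondents=200)
print(y.shape)
```

Setting τ_k to one for the known-size group mirrors the convention described above; in the actual model τ_k for an unknown group would carry an informative beta prior rather than a fixed value.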

Generalized Network Scale-Up While methods described to this point rely on variations on the maximum likelihood methodology described at the beginning of this section, Feehan and Salganik (2016) represent a novel conceptualization of sources of bias and uncertainty in the scale-up method.

The so-called generalized scale-up estimator relies on one critical observation: the total number of times individuals report being a member of a group must match the total number of reports of knowing group members recorded. That is, if we interviewed the entire population, then the number of times membership in the group is recorded in answers to an ARD question (out-reports in the parlance of Feehan & Salganik, 2016) must match the total number of times individuals report their status (in-reports or visibility). Put simply, every time someone discloses his or her status, someone must learn about an alter’s status. If the number of out-reports that person i has with hidden population k is yik and the number of in-reports that person i has about his or her status as a member of group k is vik, then the Feehan and Salganik (2016) condition requires that Σi yik = Σi vik. Multiplying both sides by Nk, the number of individuals in group k, and rearranging gives

\[
N_k = \frac{\sum_i y_{ik}}{\sum_i v_{ik} / N_k},
\]

which serves as the basis for the generalized scale-up estimator. The main challenge that arises is in estimating the denominator of the previous equation, the average number of in-reports for the hidden population. This quantity requires direct access to members of the hidden population and, further, requires those individuals to estimate their visibility to other members of the population. Estimating the visibility of hidden group members itself presents challenges. Asking directly would essentially require hidden group members to estimate how many people would include them in their count of number known. Instead, Feehan and Salganik (2016) propose collecting enriched ARD from a probability sample of the hidden group. Obtaining a probability sample is, of course, nontrivial in and of itself. In practice, the most commonly used technique is respondent-driven sampling (Heckathorn, 1997). Enriched ARD involve asking members of the hidden group ARD questions, but then also asking about their visibility within each ARD group. This would involve, for example, asking hidden group members how many people they know who are postal workers and then asking, “How many of these postal workers are aware that you’re an injection drug user?” Salganik, Mello, Abdo, Bertoni, Fazito, and Bastos (2011) propose an interview protocol called the “game of contacts” to elicit enriched ARD. This procedure has been used in multiple settings to date (Salganik, Fazito, Bertoni, Abdo, Mello, & Bastos, 2011; Maghsoudi et al., 2014). The visibility of hidden group members to the enriched ARD groups (e.g., postal workers in the previous example) is then aggregated to estimate the visibility of the hidden group for the whole population, needed for the denominator of the generalized scale-up estimator. This approach assumes that the visibility of the hidden population members to individuals in the enhanced ARD groups is about the same as to the general population. 
As Feehan and Salganik (2016) point out, it is unlikely that injection drug users are more visible to postal workers in this example than they would be to the general population. Choosing medical professionals, however, would likely be a poor choice, as injection drug users would be more visible to these individuals than to the general population. Feehan and Salganik (2016) describe the generalized scale-up estimator as the traditional scale-up estimator with three adjustment factors. First, to preserve in- and out-reports, the

estimator must account for discrepancies between the sampling frame and the population. This is critical for ARD-type questions since the sampling frame is derived from respondents but actually consists of respondents’ alters. Say we have two respondents i and j and that person i is included in a representative survey of adults. Say also that i knows j and j is an injection drug user who is under 18. When polled on the survey, person i will report knowing person j, but person j cannot report being known by person i (since person j is not included in the survey frame), meaning that out-reports will exceed in-reports. The presentation of the generalized scale-up method, then, is for the case when the frame and population overlap perfectly. Second, as has been widely noted, the standard scale-up estimator does not account for discrepancies in which individuals in the sample know about a person’s status. The generalized scale-up estimator addresses this issue directly by using visibility rather than degree as the denominator of the scale-up equation. This adjustment is known as the “true-positive rate” because it represents the fraction of connections to the hidden group that are actually known to the survey respondent. The third adjustment accounts for potential differences in the degrees of individuals in the hidden and frame populations. This issue does not arise in the generalized scale-up estimator since the method uses the in-reports from hidden population members directly. It does arise in the traditional scale-up framework, however, that relies on respondent degrees, the distribution of which may differ substantially from those of the hidden group members. The three adjustments are summarized as follows:

\[
\frac{\sum_i y_{ik}}{\sum_i d_i} \times \left(\text{frame ratio}\right) \times \left(\text{degree ratio}\right) \times \left(\text{true positive rate}\right) = \frac{\sum_i y_{ik}}{\sum_i v_{ik}}.
\]

That is, we arrive at the generalized scale-up estimator by beginning with the standard scale-up estimator, then adjusting for respondents’ alters who are outside the frame, discrepancies in the degree distribution between respondents and the hidden population, and the rate at which hidden population members reveal their status. The two estimators will be the same only when the product of the three adjustment factors is equal to one.
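The in-report/out-report identity behind the generalized estimator can be checked on a toy census. The sketch below assumes a hypothetical population of six people in which everyone is interviewed and awareness of hidden-group membership is recorded exactly; the array `aware` and all of its entries are invented for illustration.

```python
import numpy as np

# Hypothetical census of 6 people; persons 4 and 5 belong to the hidden group.
# aware[i, j] = 1 means person i knows person j AND knows that j is a member.
hidden = np.array([4, 5])
aware = np.zeros((6, 6), dtype=int)
aware[0, 4] = aware[1, 4] = aware[5, 4] = 1   # three people know about person 4
aware[2, 5] = 1                                # one person knows about person 5

out_reports = aware[:, hidden].sum(axis=1)     # y_ik: hidden members each person reports
in_reports = aware[:, hidden].sum(axis=0)      # v_ik: people aware of each hidden member

total_out = out_reports.sum()
mean_visibility = in_reports.mean()            # sum_i v_ik / N_k
estimate = total_out / mean_visibility         # generalized scale-up estimate of N_k
print(total_out, in_reports.sum(), estimate)
```

In a real study neither total is observed for the whole population: the numerator comes from a survey of the frame population and the mean visibility from a separate sample of hidden-group members, which is where the three adjustment factors enter.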

Survey Design This section describes strategies for collecting ARD for the network scale-up method. Researchers should carefully consider the cognitive burden of ARD questions. ARD questions require that respondents recall their connection with a given alter and that they correctly identify the alter as a member of the subgroup of interest. As discussed previously, subgroups with a high degree of stigma will likely have substantial transmission error, making it less likely a respondent will know that an alter is a member of the group. Large groups will make it difficult for respondents to recall accurately the number of individuals known. Asking respondents to think only of their close connections may mitigate some issues but will reduce the number of potential alters that may be members of the target group.


Defining “Know” One of the most fundamental questions in social network analysis is how we define the network of interest. In ARD, defining the network explicitly for respondents is critical, as otherwise each respondent could interpret the question in a different way and poll different aspects of his or her network. A commonly used definition for scale-up studies is from Bernard et al. (1989): two people are connected if they have been in contact during the past two years and both know each other by sight and by name. In some cases researchers alter the definition of “know” intentionally to learn about heterogeneity across different types of connections. DiPrete et al. (2011), for example, used ARD to compare networks based on trust (defined by willingness to lend a sum of money) to broader acquaintanceship networks. For the network scale-up method, however, the focus is on choosing a definition that produces the most accurate estimate for the size of the population of interest. To see how the definition of “know” can influence the quality of scale-up estimates, consider the potential uncertainty that would arise from asking either a strong or a weak tie. With a strong tie, respondents likely have substantially more information about each alter and the cognitive burden is lower since the respondent needs to mentally “poll” fewer alters. A strong tie, however, only reaches a small fraction of each individual’s network. That is, there may be individuals who have the trait of interest that the respondent knows, but not well enough to meet the more restrictive definition. DiPrete et al. (2011) estimated that, using a definition of acquaintanceship similar to Bernard et al. 
(1989), the median respondent had about 550 acquaintanceship ties, whereas the median respondent had only about 17 connections when the prompt asked respondents to think of people that are “good friends, people you discuss important matters with, or trust for advice, or trust with money.” Using the broader definition, therefore, enables the researcher to indirectly access over 30 times more individuals without substantially altering the survey design. This increase in (indirect) sample size, however, could easily be swamped in some cases by even more extreme errors that arise if, for example, individuals only reveal the trait of interest to groups of close friends or if the trait is not sufficiently memorable so that respondents do not accurately recall all members of the group. There is currently very little empirical information to guide researchers to an optimal definition of “know.” One exception is the work by Feehan et al. (2016). Along with a subtle discussion of the implications of choosing a definition of “know,” Feehan et al. (2016) compare estimates for populations with known sizes using two tie definitions. Specifically, Feehan et al. (2016) used data collected from a nationally representative survey in Rwanda in 2011 with an experiment that has two arms. Both arms asked respondents the same set of ARD questions but instructed respondents to use different definitions of a connection. In one arm respondents received a prompt similar to the acquaintanceship definition used in Bernard et al. (1989). Participants in the second arm were asked to consider connections with whom they have shared a meal in the past 12 months. Feehan et al. (2016) reported that the meal definition elicited information about 60% fewer individuals than the acquaintanceship condition. Despite reaching fewer individuals, the meal definition had a smaller mean squared error across 22 populations with known size. 
In this setting having higher-quality information about a smaller number of connections produced more accurate estimates than more

connections with weaker ties. As Feehan et al. (2016) point out, these results do not indicate that the meal definition is optimal. The optimal tie type likely depends both on the definition of the tie (e.g., a meal or an acquaintance) and on the time lag since the last contact. Future work could systematically examine these two factors using diverse populations with known size. Applying the method to groups with unknown size would, then, require assuming that the optimal definition does not depend on properties of the population that differ systematically between populations used for evaluation and unknown populations.

The Scaled-Down Condition As mentioned in the previous section, choosing the groups with known population size carefully can have a major impact on the quality of degree estimates. In cases where there is limited choice about which groups to choose, the McCormick et al. (2010) model provides an option. When there are options, however, choosing names that satisfy what McCormick et al. (2010) call the “scaled down” condition reduces bias and enables researchers to use the simpler maximum likelihood degree estimates. McCormick et al. (2010) give a formal derivation of the scaled-down condition, but the intuition is as follows. First, recall that the mixing matrix, m(e,a), in McCormick et al. (2010) represents the rate of mixing between ego group e to know individuals in alter group a (e.g., the fraction of ties in the network of 21- to 40-year-old males that are with females 41 to 60 years old). If there is no social structure across the alter and ego groups, then the mixing matrix simplifies to Na/N for every alter and ego group. That is, the fraction of ties we would expect with individuals in alter group a is the same across all members of the population and only depends on the number of individuals in alter group a. In practice, we do not expect there to be random mixing. Instead, the scaled-down condition suggests using ARD to construct a set of alters that have the same characteristics as the population. That is, the combined set of groups chosen with known size should have the same breakdown of characteristics as the population overall. If, for example, the population is composed of 20% females between 21 and 40 years of age, then we would like 20% of the alters represented by the known size groups to also be females between 21 and 40 years of age. Note that this condition does not apply to each specific group, but to the aggregate. More formally, the McCormick et al. (2010) scaled-down condition is





\[
\frac{\sum_{k=1}^{K} N_{ak}}{N_a} = \frac{\sum_{k=1}^{K} N_k}{N}, \quad \forall a. \tag{5}
\]

A researcher can ask about relatively homogeneous groups so long as the groups are chosen to balance in aggregate. Under the scaled-down condition, each respondent has (in expectation) the propensity to interact with the same distribution of individuals as in the population. A male respondent may know more individuals that have male names, for example, but this would be offset by knowing fewer individuals who have female names.
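A candidate set of names can be screened against Equation (5) before fielding a survey. The sketch below assumes the alter groups partition the population, so that N is the sum of the N_a and each N_k is the sum of the N_ak over a; the function name and the counts are hypothetical.

```python
import numpy as np


def scaled_down_gap(n_ak, n_a):
    """Per alter group a: sum_k N_ak / N_a minus sum_k N_k / N (zero when Eq. 5 holds)."""
    n_ak = np.asarray(n_ak, dtype=float)   # shape (A, K): alter groups by names
    n_a = np.asarray(n_a, dtype=float)     # shape (A,): alter group sizes
    lhs = n_ak.sum(axis=1) / n_a           # share of each alter group covered by the names
    rhs = n_ak.sum() / n_a.sum()           # share of the whole population covered
    return lhs - rhs


# Hypothetical alter groups (women, men), 500,000 people each.
n_a = [500_000, 500_000]

# One mostly female name and one mostly male name of equal size balance in aggregate.
balanced = scaled_down_gap([[9_000, 1_000],
                            [1_000, 9_000]], n_a)

# The mostly female name alone over-represents women among the alters.
female_only = scaled_down_gap([[9_000],
                               [1_000]], n_a)
print(balanced, female_only)
```

A gap of zero in every alter group means the names satisfy the condition in aggregate even though each individual name is demographically lopsided, which is the point made in the paragraph above.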


Conclusion In this chapter, we examine the network scale-up method to estimate the size of groups that are hard to count using other methods. The social network structure of respondents elevates and frustrates the effectiveness of the method. Connections between survey respondents and members of target groups drive the method, but the same structure can create barrier effects and lead to biased estimates. To this point, most current work on the scale-up estimator has focused on addressing the three issues mentioned at the beginning of this chapter: barrier effects, transmission errors, and recall bias. Recent work (namely Feehan & Salganik, 2016) offers a new direction for understanding the scale-up methods. In both lines of work a clear theme is the importance of marrying statistical tools with data collection. Whether it be using the game of contacts to estimate transmission bias or choosing groups that follow the scaled-down condition when estimating degree, understanding the role of survey design and implementation decisions in the resulting scale-up estimators is paramount to achieving quality population size estimates.  

References
Bernard, H. R., Johnsen, E. C., Killworth, P. D., McCarty, C., Shelley, G. A., & Robinson, S. (1990). Comparing four different methods for measuring personal social networks. Social Networks, 12, 179–215.
Bernard, R. H., Johnsen, E., Killworth, P., & Robinson, S. (1989). Estimating the size of an average personal network and of an event subpopulation. In M. Kochen (Ed.), The small world (pp. 159–175). Norwood, NJ: Ablex Press.
Bernard, R. H., Johnsen, E., Killworth, P., & Robinson, S. (1991). Estimating the size of an average personal network and of an event subpopulation: Some empirical results. Social Science Research, 20(2), 109–121.
Breza, E., Chandrasekhar, A. G., McCormick, T. H., & Pan, M. (2020). Using aggregated relational data to feasibly identify network structure without network data. American Economic Review, 110(8), 2454–2484. https://www.aeaweb.org/articles?id=10.1257/aer.20170861
Chen, L., Karbasi, A., & Crawford, F. W. (2016). Estimating the size of a large network and its communities from a random sample. In Advances in Neural Information Processing Systems (pp. 3072–3080).
Diggle, P., Heagerty, P., Liang, K.-Y., & Zeger, S. (2002). Analysis of longitudinal data. Oxford University Press.
DiPrete, T. A., Gelman, A., McCormick, T., Teitler, J., & Zheng, T. (2011). Segregation in social networks based on acquaintanceship and trust. American Journal of Sociology, 116, 1234–1283.
Ezoe, S., Morooka, T., Noda, T., Sabin, M. L., & Koike, S. (2012). Population size estimation of men who have sex with men through the network scale-up method in Japan. PLoS One, 7(1), e31184.
Feehan, D. M., Mahy, M., & Salganik, M. J. (2017). The network survival method for estimating adult mortality: Evidence from a survey experiment in Rwanda. Demography, 54(4), 1503–1528.
Feehan, D. M., & Salganik, M. J. (2016). Generalizing the network scale-up method. Sociological Methodology, 46(1), 153–186. http://dx.doi.org/10.1177/0081175016665425
Feehan, D. M., Umubyeyi, A., Mahy, M., Hladik, W., & Salganik, M. J. (2016). Quantity versus quality: A survey experiment to improve the network scale-up method. American Journal of Epidemiology, 183(8), 747–757.
Guo, W., Bao, S., Lin, W., Wu, G., Zhang, W., Hladik, W., . . . Wang, L. (2013). Estimating the size of HIV key affected populations in Chongqing, China, using the network scale-up method. PLoS One, 8(8), e71796.
Heckathorn, D. D. (1997). Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems, 44(2), 174–199.
Hidaka, Y., Kimura, H., & Ichikawa, S. (2005). Study group report on HIV prevention and evaluation for MSM, health and labour sciences research grants [in Japanese].
Hill, R. A., & Dunbar, R. I. M. (2003). Social network size in humans. Human Nature, 14(1), 53–72.
Jafar Khounigh, A., Haghdoost, A. A., SalariLak, S., Zeinalzadeh, A. H., Yousefi-Farkhad, R., Mohammadzadeh, M., & Holakouie-Naieni, K. (2014). Size estimation of most-at-risk groups of HIV/AIDS using network scale-up in Tabriz, Iran. Journal of Clinical Research & Governance, 3(1), 21–26.
Jing, L., Qu, C., Yu, H., Wang, T., & Cui, Y. (2014). Estimating the sizes of populations at high risk for HIV: A comparison study. PLoS One, 9(4), e95601.
Kadushin, C., Killworth, P., Bernard, H., & Beveridge, A. (2006). Scale-up methods as applied to estimates of heroin use. Journal of Drug Issues, 36(2), 417.
Killworth, P. D., & Bernard, H. R. (1978). The reverse small-world experiment. Social Networks, 1(2), 159–192.
Killworth, P. D., Bernard, H. R., & McCarty, C. (1984). Measuring patterns of acquaintanceship. Current Anthropology, 23, 318–397.
Killworth, P., Johnsen, E., McCarty, C., Shelley, G., & Bernard, H. (1998). A social network approach to estimating seroprevalence in the United States. Social Networks, 20(1), 23–50.
Killworth, P., McCarty, C., Bernard, H., Shelley, G., & Johnsen, E. (1998). Estimation of seroprevalence, rape, and homelessness in the United States using a social network approach. Evaluation Review, 22(2), 289–308.
Killworth, P. D., McCarty, C., Bernard, H. R., Johnsen, E. C., Domini, J., & Shelley, G. A. (2003). Two interpretations of reports of knowledge of subpopulation sizes. Social Networks, 25, 141–160.
Kossinets, G., & Watts, D. J. (2006). Empirical analysis of an evolving social network. Science, 311(5757), 88–90.
Lee, H., McCormick, T., Hicken, M. T., & Wildeman, C. (2015). Racial inequalities in connectedness to imprisoned individuals in the United States. Du Bois Review: Social Science Research on Race, 12(2), 269–282.
Maghsoudi, A., Baneshi, M. R., Neydavoodi, M., & Haghdoost, A. (2014). Network scale-up correction factors for population size estimation of people who inject drugs and female sex workers in Iran. PLoS One, 9(11), e110917.
Maltiel, R., Raftery, A. E., McCormick, T. H., & Baraff, A. J. (2015). Estimating population size using the network scale up method. Annals of Applied Statistics, 9(3), 1247–1277. http://dx.doi.org/10.1214/15-AOAS827
McCarty, C. (2002). Structure in personal networks. Journal of Social Structure, 3(1).
McCarty, C., Killworth, P. D., Bernard, H. R., Johnsen, E. C., & Shelley, G. A. (2001). Comparing two methods for estimating network size. Human Organization, 60, 28–39.
McCormick, T., Salganik, M., & Zheng, T. (2010). How many people do you know?: Efficiently estimating personal network size. Journal of the American Statistical Association, 105(489), 59–70.
McCormick, T. H., & Zheng, T. (2007). Adjusting for recall bias in “How many X’s do you know?” surveys. In Proceedings of the Joint Statistical Meetings.
McCormick, T. H., & Zheng, T. (2012). Latent demographic profile estimation in hard-to-reach groups. Annals of Applied Statistics, 6(4), 1795.
McCormick, T. H., & Zheng, T. (2015). Latent surface models for networks using aggregated relational data. Journal of the American Statistical Association, 110(512), 1684–1695.
Mielke, P., Jr. (1975). Convenient beta distribution likelihood techniques for describing and comparing meteorological data. Journal of Applied Meteorology, 14, 985–990.
Milgram, S. (1967). The small world problem. Psychology Today, 1, 62–67.
Nikfarjam, A., Shokoohi, M., Shahesmaeili, A., Haghdoost, A. A., Baneshi, M. R., Haji-Maghsoudi, S., Rastegari, A., Nasehi, A. A., Memaryan, N., & Tarjoman, T. (2016). National population size estimation of illicit drug users through the network scale-up method in 2013 in Iran. International Journal of Drug Policy, 31, 147–152.
Onnela, J., Saramaki, J., Hyvonen, J., Szabo, G., Lazer, D., Kaski, K., . . . Barabasi, A. (2007). Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences, USA, 104(18), 7332–7336.
Salganik, M., Fazito, D., Bertoni, N., Abdo, A., Mello, M., & Bastos, F. (2011). Assessing network scale-up estimates for groups most at risk of HIV/AIDS: Evidence from a multiple-method study of heavy drug users in Curitiba, Brazil. American Journal of Epidemiology, 174(10), 1190–1196.
Salganik, M. J., Mello, M. B., Abdo, A. H., Bertoni, N., Fazito, D., & Bastos, F. I. (2011). The game of contacts: Estimating the social visibility of groups. Social Networks, 33(1), 70–78.
Scutelniciuc, O. (2012). Network scale-up method experiences: Republic of Kazakhstan. In Consultation on estimating population sizes through household surveys: Successes and challenges. New York, NY. Skellam, J. (1948). A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. Journal of the Royal Statistical Society Series B (Methodological), 10(2), 257–261. United Nations. (2010). Joint programme on HIV/AIDS (UNAIDS) report. Zheng, T., Salganik, M., & Gelman, A. (2006). How many people do you know in prison? Journal of the American Statistical Association, 101(474), 409–423.

Chapter 10

The Continued Relevance of Ego Network Data

Jeffrey A. Smith

Ego network data have a long history in the social sciences, acting as a kind of bridge between traditional statistical techniques and network analysis (Perry, Pescosolido, & Borgatti, 2018). Ego network data continue to be collected and analyzed and are widely used in studies of health, organizations, and stratification (Cornwell, Laumann, & Schumm, 2008; Kadushin, 2012; Smith, McPherson, & Smith-Lovin, 2014). Researchers continue to use ego network data because (1) the data are (relatively) easy to collect and (2) the data provide a surprisingly large amount of relational, or network, information “on the cheap” (Smith, 2012). Ego network data are also quite flexible, with potential applications varying greatly in scale (individual to full networks) and type. For example, ego network data are often used to measure the social support available to individuals (Fischer, 1982; Wills & Shinar, 2000; Cornwell et al., 2009); the same basic data structure has, however, also been used as a purely methodological tool to correct the estimates of nontraditional sampling schemes (Lu, 2013). Given their ease of collection, depth of information, and flexibility in use, the view here is that ego network data are, and will continue to be, useful for network scholars, even in the face of new and more widely available complete network data sources, such as cell phone databases and online platforms.

Ego network data are generally defined in contrast to full network data (Marsden, 2011). With full network data, a researcher has information on all actors in a setting: one can identify each actor and determine whether a tie exists between each pair of actors. Under such ideal conditions, it is straightforward to calculate global measures of interest, such as group divisions and cohesion, as well as individual measures, like centrality (an individual’s position in the network) (Freeman, 1979; Moody & White, 2003).
Often, however, it is impractical to collect full network data on the population of interest (Frank, 1971; Smith, 2015). The network may be too large and the resources available too scant to interview everyone in the population, and an appropriate electronic data source may not exist. A researcher may be interested in the entire US population, for example, where collecting a full census on a particular relationship of interest (which may not exist in online sources) is not generally possible. It would also be difficult (although not impossible) to collect full network data across many different contexts, such as schools or neighborhoods.

Ego network data offer an alternative, basing the network information on a sample of individuals (e.g., McPherson, Smith-Lovin, & Brashears, 2006). The survey randomly samples individuals from a known population (generally assumed to have a known sampling frame), gathering information about the respondents and their local social network (Marsden, 1990). The data yield personal networks, where the focus is on ego, the focal node/respondent, and ego’s immediate social neighbors, or alters (i.e., the people that ego is connected to). For example, a researcher may ask respondents whom they talk with on a regular basis, collecting data on each named alter; we would know the number of alters as well as the characteristics of each alter. The data will also often (although not always) include the social connections between alters.

Figure 10.1 depicts a set of example ego networks. Ego represents the sampled respondent, from which the rest of the network information is derived. The dark black lines represent ties between ego and alter, while the dotted lines represent ties between alters. We can see in the top left corner that ego has two alters, who are not tied to each other. In row 2, the far-right ego has three alters, each of whom is connected to every other alter. It is important to remember that the ego networks are sampled separately, with reports on immediate contacts only. There is no information on the alters of the named alters. Similarly, ego network surveys do not collect identifying information on the named alters and there is no (simple) way of knowing if the sample respondents themselves are connected. Thus, the sampled parts of the network (collected within an ego network survey) cannot be connected. In that sense, one can think of ego network sampling as capturing pieces of the whole network, with each ego network representing a different piece of the puzzle (Smith, 2012; Smith et al., 2014).

[Figure 10.1  Example ego networks. Legend: ego; alter; ties between ego and alter (solid lines); ties between alters (dotted lines).]

Advantages and Disadvantages of Ego Network Data

There are many positive characteristics associated with personal network data, as well as a number of potential drawbacks. I begin with the advantages of ego network data before turning to the disadvantages.

Advantages

The data are, first and foremost, easy to collect. Rather than require information on every node and every tie between nodes, ego network data are based only on each sampled individual’s personal network. This means that we do not need a census of all cases but can, instead, simply collect a sample of respondents. One need not have data on all Americans to collect useful, representative network data on the country (e.g., Marsden, 1987). Moreover, the ego networks are treated as independent, with no identifying information needed to link the ego networks. This greatly reduces the collection burden on the researcher.

Second, ego network data are easy to embed in traditional surveys. Ego network data are based on a random sample of the population, which makes them easy to add to an existing, typical survey. A researcher would still begin by taking a random sample of individuals and asking questions about demographics, socioeconomic status (SES), health, and the like. The only difference is that one adds network questions, eliciting information about the respondents’ social contacts. In fact, national surveys, like the General Social Survey (GSS), will periodically include ego network sections within the larger questionnaire (Burt, 1984). This embedding of ego network questions within larger surveys means that network data can be collected alongside (and thus used as a predictor of) measures of health, income, and the like.

Third, ego network data are easy to combine with alternative data collection strategies. For example, it is possible (and often very useful) to pair respondent-driven sampling (RDS) data with ego network surveys (Lu, 2013). With RDS, individuals from hidden populations (such as female sex workers) recruit others into the study, thus creating a chain of referrals that serves as the base sample (Salganik & Heckathorn, 2004). Additional ego network data can be collected on the recruited individuals, supplementing the RDS chain data in important ways (Lu, 2013; Verdery et al., 2015). Ego network data can also supplement more automated data sources like cell phone data, colocation Bluetooth data, or online data, fleshing out the local network characteristics of the sampled respondents (Golder & Macy, 2011; Oliver, Matic, & Frias-Martinez, 2015).

Fourth, ego network data can be collected on sensitive relations, where it is difficult or impossible to have respondents identify their alters on the relation of interest. For example, it may be difficult to ask respondents to name their sexual partners, making it hard to collect complete networks (Krivitsky & Morris, 2015). Respondents may be reluctant, and no list (or roster) may exist from which they could identify the partner. Similarly, given the confidential nature of the data, such collection efforts (on a third party, the named alter) may be met with institutional review board (IRB) complications. An ego network survey offers a nice alternative, as it does not require identifying the names of partners, only the number of partners and their characteristics. The researcher can thus collect network information without broaching difficult confidentiality issues, making it easier to elicit network information on sensitive relations like sexual contact and drug use (Morris & Kretzschmar, 2000; Merli, Moody, Mendelsohn, & Gauthier, 2015).

Fifth, ego network surveys contain a great deal of information, including the characteristics of ego, the characteristics of the alters (or the people ego interacts with), the structural features of the personal network (e.g., size), and even something about the nature of the relationship between ego and each alter (closeness, frequency of contact, etc.). More subtly, ego network data offer information at both the micro and macro levels. At the micro level, there is information about individual respondents and the people they interact with. This local information can then be aggregated to infer characteristics of the population-level network structure from which the respondents were sampled, thus offering information about the macro level (Smith, 2012; Gjoka, Smith, & Butts, 2014).

Disadvantages

Such advantageous characteristics come at a cost, however. First, the data are based (typically) on self-reports. Ego (the focal respondent) reports on the characteristics of the alters and the ties between alters. There could be biases in such reporting, as respondents may be unable to report on their alters with complete accuracy; while demographic characteristics like gender and education are unlikely to cause many problems, respondents may, for example, be unable to accurately report on the political attitudes of their alters (Marsden, 1990). There may also be cognitive biases in the reporting of the alter-alter ties; for example, respondents may assume their alters know each other based on a cognitive desire to minimize conflict or divides within their ego network (Krackhardt & Kilduff, 1999).

Second, respondents must be treated as independent. Ego network data capture purely personal networks, based on an independent sample of respondents. The egos themselves are thus assumed to be independent and, in fact, must be treated so, as there is (generally) no way to tell which egos are tied to each other. Such assumptions work well in nationally representative surveys, where it is extremely unlikely that any two selected respondents know each other (Smith et al., 2014). They work less well in smaller, bounded settings where a large number of respondents within a school or organization are sampled. If sampled egos are, in fact, tied together, then models that assume they are independent may have biased estimates and deflated standard errors (Erbring & Young, 1979).

Third, the data do a rather poor job of capturing asymmetries in social relations, as the relations are based only on ego’s report of a tie. Many social connections (such as friendship) must be assumed to be symmetric, even if an underlying asymmetry actually exists (where A likes B much more than B likes A) (Gould, 2002; Smith & Faris, 2015). Either additional data must be collected or the relation of interest must be concrete enough to allow for clear asymmetries in the interactions (e.g., Did you borrow money from each alter? Did they borrow money from you?).

What Can Be Extracted from Ego Network Data?

Table 10.1 summarizes the information available from ego network data. While one must weigh the potential drawbacks of using sampled data, the upside, in terms of information collected, is higher than one might expect. The first column in Table 10.1 describes the information collected from the survey itself. The second column lists the measures associated with each piece of information, while the third column offers example applications.

Respondents are first asked to list a set of social contacts, the alters. Practically, studies will often ask respondents to list a limited number of alters (e.g., up to 10), but it is generally best not to truncate the data in any way (Smith, 2015). Similarly, it is generally a good idea to ask more than one question to elicit the initial list of alters (Marin & Hampton, 2007). The person you hang out with may be different from the person you discuss health matters with. Multiple relations capture a fuller picture of the local personal network. The exact relations depend on the question of interest. Questions are typically general (Looking back over the last six months, who are the people with whom you discussed an important personal matter?), behavioral (Who have you slept with in the last six months?), or focused on support relations (If you were sick, who would be willing to accompany you to the hospital?).

Table 10.1  Information, Measures, and Applications Associated with Ego Network Data

| Information Collected | Example Measures | Example Substantive Applications |
|---|---|---|
| 1. List of alters | Degree | Social support; risk factors |
| | Degree distribution | Global network inference |
| | Differential degree | Global network inference |
| 2. Alter characteristics | Proportion same as ego | Social boundaries |
| | Mixing matrix | Social boundaries; global network inference |
| | Distributional summaries (e.g., proportion female) | Adjust RDS estimates; social support; risk factors |
| 3. Tie characteristics | Distributional summaries (e.g., proportion kin) | Social support |
| 4. Alter-alter ties | Density | Social support; risk factors |
| | Efficiency | Information flow and brokerage |
| | Ego network configurations | Global network inference |

Such data, at the individual level, offer an estimate of the number of contacts for ego, or individual degree. More generally, the listing of alters (aggregated over all respondents) offers an estimate of the degree distribution, that is, the distribution of the number of alters per respondent. The alter list also offers information on differential degree, or the mean degree across demographic groups. This is inferable as there is information on both degree and the demographic characteristics of the respondents. For example, those with higher levels of education may have more social ties than those with less education (e.g., Lizardo, 2006; McPherson et al., 2006).

The second row in Table 10.1 summarizes the information surrounding the alter characteristics. A researcher may ask the respondent to report on the race, gender, and age of the named alters (often only a subset of the total list). This information can be used in a number of ways. For example, we can ask if the respondents are similar to their social contacts. The respondent provides information on both their own characteristics and their alters’ characteristics; this makes it possible to ask whether ego is the same as or different from the alters (e.g., what proportion of the alters are the same gender as ego?). We can also ask if the alters are themselves diverse or homogeneous (Marsden, 1988). For example, what proportion of the alters are female? What proportion of the alters (or alter-alter dyads) are the same gender? Similarly, we can ask how ego network composition is associated with different outcomes, for example, the strength of an ethnic identity (Lubbers, Molina, & McCarty, 2007).

Ego network data also include information about the tie itself. This is summarized in row 3 of Table 10.1. The survey may ask respondents to report on the frequency of contact between themselves and the alter (daily, weekly, monthly, etc.), as well as the closeness and duration of the relationship. A survey may also capture whether the named alter is kin (mother, father, brother, etc.). A detailed survey could even ask where and under what circumstances the two met. Such information is useful in fleshing out the picture of an individual’s social network (e.g., showing proportion of kin, how often ego sees their alters, etc.).

The fourth row summarizes the information surrounding the alter-alter ties. A researcher will ask the respondent to report on the ties that exist between alters 1 and 2, 1 and 3, 2 and 3, etc. (typically limited to a subset of all alters to curtail respondent burden). The alter-alter tie data capture the local structural patterns in the ego network. Are individuals socially connected to ego (the respondent) also connected to each other? Put differently, are two friends of the respondent also friends with each other? The simplest structural measure based on the alter-alter ties is density, the proportion of ties that exist relative to the total number possible, but more complicated measures are possible (see Smith, 2012). Substantively, density shows to what extent people tied to ego are also tied to each other.
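The measures in Table 10.1 are simple to compute once the survey responses are in a structured form. The following is a minimal sketch in Python, assuming a hypothetical record layout (the field names `alters`, `alter_ties`, `gender`, and `education` are illustrative and not taken from any particular survey). The two toy respondents mirror the cases described for Figure 10.1: one ego with three fully interconnected alters and one ego with two unconnected alters.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one record per sampled respondent (ego).
egos = [
    {
        "id": "r1", "gender": "F", "education": "college",
        "alters": [{"id": "a1", "gender": "F"},
                   {"id": "a2", "gender": "M"},
                   {"id": "a3", "gender": "F"}],
        # Alter-alter ties as reported by ego (undirected).
        "alter_ties": [("a1", "a2"), ("a1", "a3"), ("a2", "a3")],
    },
    {
        "id": "r2", "gender": "M", "education": "high school",
        "alters": [{"id": "a1", "gender": "F"},
                   {"id": "a2", "gender": "M"}],
        "alter_ties": [],
    },
]

def degree(ego):
    """Number of alters named by ego."""
    return len(ego["alters"])

def degree_distribution(egos):
    """Share of respondents at each observed degree."""
    counts = Counter(degree(e) for e in egos)
    return {k: v / len(egos) for k, v in sorted(counts.items())}

def mean_degree_by(egos, attribute):
    """Differential degree: mean degree within each ego category."""
    groups = defaultdict(list)
    for ego in egos:
        groups[ego[attribute]].append(degree(ego))
    return {g: sum(d) / len(d) for g, d in groups.items()}

def proportion_same(ego, attribute):
    """Share of alters matching ego on an attribute (None if no alters)."""
    alters = ego["alters"]
    if not alters:
        return None
    return sum(a[attribute] == ego[attribute] for a in alters) / len(alters)

def density(ego):
    """Reported alter-alter ties divided by the number possible."""
    n = degree(ego)
    possible = n * (n - 1) // 2
    if possible == 0:
        return None  # density is undefined with fewer than two alters
    return len(ego["alter_ties"]) / possible

for ego in egos:
    print(ego["id"], degree(ego), proportion_same(ego, "gender"), density(ego))
print(degree_distribution(egos))
print(mean_degree_by(egos, "education"))
```

With real survey data, sampling weights would normally enter the aggregated quantities (the degree distribution and differential degree); the sketch omits them for brevity.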

Applications of Ego Network Data

Ego network data thus contain information about the number of alters, the characteristics of the alters, and the structure of the ego network. Different research traditions employ this information in very different ways, depending on the question of interest and the scale of the analysis.


Using Ego Network Properties to Predict Individual-Level Outcomes

Ego network data are commonly used to measure individual-level network properties, which are then used as predictors of various outcomes, like health, mental health, job placement, cultural consumption, and the like (House, Landis, & Umberson, 1988; DiMaggio & Louch, 1998). Scholars in this tradition are most likely to collect and analyze ego network data as part of a traditional, (often) national-level survey.

The most common usage of ego network data is still as a means of measuring social support (Luke & Harris, 2007; Umberson & Montez, 2010). Social support is then used to predict health outcomes, such as mental health, health status, recovery from illness, etc. (Cornwell & Laumann, 2015; Perry & Pescosolido, 2015). The simplest measure of social support is the degree of the respondent: for example, the number of people the respondent can rely on for aid, specific health information, or emotional support. Generally, the more people one can rely on, the better the health outcomes (although this has its limits and having too many associates may be a stressor in itself; see Falci & McNeely, 2009). It is also possible to use the characteristics of the alters to measure social support. Here, different alters offer different resources from which ego can potentially draw. Ego networks composed of alters with higher income (financial resources), more education (informational resources), and more kin (emotional resources) may lead to better outcomes.

Ego network data are useful, in part, because of the wide range of relational information that can be collected. For example, it is clear that not all connections are “good” or act as resources for ego. Social connections can also represent risks and potential stressors, and ego network data can just as easily capture such negative influences (Portes & Landolt, 1996).
For example, if the relation of interest is drug partners, then those with higher degree (i.e., more drug partners) are actually at higher risk for engaging in risky behavior than those with lower degree (Lovell, 2002; Bailey et al., 2007). Similarly, the alter characteristics can capture potential stressors and risks that ego faces. For example, the distribution of (chronic) illness among named alters can be an important predictor of one’s own health status, as dealing with an ill relative or friend may be a stressor, both physically and emotionally (Moreman, 2008).

The structural characteristics of the ego network (or the alter-alter ties) can also be used to predict individual-level outcomes. The basic idea is that norms are easier to maintain in a tight-knit network, where everyone knows everyone else (as social control is easier to maintain). For example, past work by Haynie (2001) explored adolescent delinquent behavior. She predicted delinquent behavior as a function of personal network features, including both composition and structure. Delinquency increased as friends’ delinquency increased, but this relationship was clearly dependent on the density of the friendships: the rate of increase in delinquency was considerably higher in ego networks with high density. In such cases, a tight-knit network made it easier to maintain social control, making friends’ delinquent behavior seem normative.

Ego network data not only provide information on risks and support but also can be used to proxy the availability and diversity of information available to ego. Here the focus is on the flow of information from outside sources to the focal respondent. The basic notion underlying such work is a powerful but simple one: that an individual who is receiving a diverse, nonredundant set of information is in an advantageous position (Burt, 1992). They would (potentially) hear about a wider range of job opportunities, new health information, new consumption goods, etc., while, importantly, having information that others do not have (Lin, 1999; Burt, 2004).

Past work has used ego network data as a means of measuring the information flow surrounding ego. We might expect individuals who have larger ego networks to receive more diverse information, as they are pulling from more sources. For example, more highly educated people tend to have larger networks and more diverse cultural consumption patterns (DiMaggio, 1987) (although more recent work has also argued that individuals with diverse taste join many groups, thus creating larger, more diverse networks; see Lizardo, 2006). A long literature has alternatively focused on the relationships between ego and the named alters. Here, the question is whether those with more weak connections are more likely to find better positions, have more diverse consumption patterns, etc. (Granovetter, 1974). The argument is that weak ties are more likely to bridge loosely connected parts of the network and thus provide more unique information compared to strong, “redundant” connections; this is the strength-of-weak-ties hypothesis (Granovetter, 1973), although note that the evidence is mixed. Work in the organizational literature has focused more on the strategic aspects of such processes, arguing that an individual who fills a “structural hole” is likely to have informational and brokerage advantages, as they connect two groups with few ties between them (Burt, 1992). Burt has suggested a number of ego network measures to capture these features, including effective size, efficiency, and constraint (Burt, 1992). See Table 10.1.
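To make the idea of nonredundant contacts concrete, the sketch below computes effective size and efficiency for a single ego network. It assumes binary, undirected ties and uses a commonly cited simplification of Burt's effective size for that case: the number of alters minus the average number of ties each alter has to other alters. The function names are illustrative, and the general weighted definitions in Burt (1992), as well as the constraint measure, are not covered here.

```python
def effective_size(n_alters, n_alter_ties):
    """Effective size for binary, undirected ties (simplified form).

    n_alters: number of alters named by ego.
    n_alter_ties: number of ties among those alters (ego excluded).
    """
    if n_alters == 0:
        return 0.0
    # Each alter-alter tie adds redundancy for both alters it connects.
    return n_alters - 2 * n_alter_ties / n_alters

def efficiency(n_alters, n_alter_ties):
    """Effective size as a share of actual network size."""
    if n_alters == 0:
        return None  # undefined for an ego with no alters
    return effective_size(n_alters, n_alter_ties) / n_alters

# Three alters who all know each other: highly redundant contacts.
print(effective_size(3, 3), efficiency(3, 3))
# Three alters with no ties among them: ego spans structural holes.
print(effective_size(3, 0), efficiency(3, 0))
```

Under this simplification, an ego whose alters are all disconnected has an efficiency of 1, while efficiency falls toward 1/n as the n alters become fully interconnected.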

Using Ego Network Data to Measure Social Boundaries

Ego network data can also be used to study the social boundaries that exist in a population of interest, in addition to measuring individual-level predictors. Ego network data contain information on the demographic characteristics of both the respondents and alters, for example, offering information on education, gender, age, etc. Taken together, the respondent and alter characteristics can be used to explore homophily, or the tendency for similar individuals to interact (McPherson, Smith-Lovin, & Cook, 2001; Smith et al., 2014). Aggregating over all respondents yields the frequency of social ties between different demographic groups. One can ask how many ties exist between people of the same characteristics, or in-group ties (male-male, female-female), versus ties between people of different characteristics, or out-group ties (male-female). For example, how frequently do individuals form social connections with someone of a different race, education, or gender?

Ego network data thus offer a means of measuring the salience of a demographic dimension (Laumann, 1966; Blau & Schwartz, 1984). A low-salience, or unimportant, dimension will exhibit low levels of homophily, as the social boundaries are porous and individuals are able to freely form close social ties with members of a different social group. One can then use ego network data to tease out which demographic dimensions are more salient or are more important in organizing a social system (Marsden, 1987). Importantly, this can be done at a national level (as the data are easy to collect as part of a traditional national survey). One can ask, for example, if race/ethnicity is a more important demographic divide than religion, age, or gender (Marsden, 1988; Rosenfeld, 2008). If the data are collected over time, it is possible to ask how the salience of different dimensions increases or decreases in light of larger macro structural shifts, such as demographic changes (e.g., the country is more racially diverse), increasing inequality, and changes to residential segregation (Smith et al., 2014). It is also possible to make cross-national comparisons, where different countries, with very different political and economic systems, may be socially divided in different ways (Kalmijn, 1994).

Ego network data also make it possible to look at the specific divides between social groups. Are college graduates particularly unlikely to make out-group ties? Similarly, are ties between college graduates and high school graduates more or less likely than ties between college graduates and PhD holders (Smith et al., 2014)? This can be represented by a frequency table, or mixing matrix, capturing the frequency of ties between each pair of categories (e.g., Merli, Moody, Mendelsohn, & Gauthier, 2015). Such analyses can shed light on which social divides are particularly salient.

Analytically, a researcher can examine the raw, or absolute, rate of contact between demographic groups, as well as the rate of contact relative to some baseline model of chance expectations, for example, based on the demographic composition of the population of interest. The raw rates show the actual level of contact between groups (Blau, Beeker, & Fitzpatrick, 1984). How likely is someone with a high school degree to know someone with a college degree? The relative rates capture the salience of the demographic dimension more directly, by asking the same question net of what we expect by chance, just based on the size of different groups (i.e., we would expect more ties between two large groups just by chance). A number of models can be employed to handle the relative-to-chance analysis, including traditional log-linear models, case-control models, and exponential random graph models (Koehly, Goodreau, & Morris, 2004; Krivitsky, Handcock, & Morris, 2011; Smith et al., 2014).
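The distinction between raw and relative rates can be illustrated with a small mixing matrix. The sketch below is a simplified stand-in for the models just listed: it tabulates hypothetical ego-alter ties by education and divides each observed count by the count expected if ego and alter categories were independent, with expectations taken from the table's own margins. The data and function names are illustrative; a baseline built from population composition would substitute external group shares for the margins.

```python
from collections import Counter

# Hypothetical (ego category, alter category) pairs, one per reported tie.
ties = [
    ("college", "college"), ("college", "college"), ("college", "college"),
    ("college", "high school"),
    ("high school", "high school"), ("high school", "high school"),
    ("high school", "high school"),
    ("high school", "college"),
]

def mixing_matrix(ties):
    """Raw tie counts for each (ego category, alter category) cell."""
    return Counter(ties)

def relative_rates(ties):
    """Observed / expected counts under independence of ego and alter category."""
    counts = mixing_matrix(ties)
    total = len(ties)
    ego_margin = Counter(e for e, _ in ties)
    alter_margin = Counter(a for _, a in ties)
    rates = {}
    for e in ego_margin:
        for a in alter_margin:
            expected = ego_margin[e] * alter_margin[a] / total
            rates[(e, a)] = counts[(e, a)] / expected
    return rates

print(mixing_matrix(ties))
for cell, ratio in relative_rates(ties).items():
    print(cell, ratio)
```

Ratios above 1 on the diagonal indicate more in-group ties than chance alone would produce, which is the pattern expected for a salient demographic dimension.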

Using Ego Network Data to Improve RDS Estimation

Ego network data have also recently been applied to more technical problems, serving as a useful corrective for other sampling traditions, most notably as a means of adjusting RDS estimates. RDS is a widely used sampling strategy, appropriate for conditions where no clear sampling frame exists, such as with hidden, vulnerable populations like drug users and female sex workers (Salganik & Heckathorn, 2004). RDS begins with a small number of seeds (often selected based on convenience); the initial seeds then pass on a coupon inviting other drug users, for example, into the study. These invited cases are brought into the study and the process is repeated. Typically, each respondent will recruit two or three other people into the study. The sampling process is thus based on personal referrals, and a researcher must make strict assumptions about this recruitment process to arrive at valid estimates (e.g., proportion of female sex workers with syphilis) (McCreesh et al., 2012; Merli, Moody, Smith, Li, Weir, & Chen, 2015). For example, one must assume that respondents refer new participants into the study by making a random selection among their network alters; or one must assume that there is no preferential recruitment. When the assumptions of the estimators do not hold, the estimates may be biased and have inflated variances (Goel & Salganik, 2010; Lu, 2013). In practice, the assumptions often fail, as participants differentially recruit certain types of people into the study (e.g., female sex workers may be more or less likely to pass the coupon to people in the same physical venue). The problem is that such biases are difficult to adjust for, as traditional estimators only use information from the RDS chain itself.

Work by Lu (2013) suggests that traditional RDS estimators can be greatly improved by incorporating ego network information from the recruited respondents. The basic idea is to collect ego network data on the individuals recruited into the study. The alter characteristics (from the ego network survey) are then used to adjust for any violations of the assumptions, such as differential peer recruitment. This is possible because one can compare the characteristics of the recruited peers (i.e., those who were passed the coupon) to the characteristics of the full list of alters, including those who did not receive the coupon. One can then adjust for any biases in recruitment. Lu found that using the ego network data greatly improved the estimates, with lower bias and variance than estimates based only on the RDS data itself (see also Verdery et al., 2015). More generally, this hints at the potential payoff from combining ego network data with other data sources.
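The comparison between recruited peers and the full alter list can be sketched as a simple diagnostic. The records, the venue categories, and the function names below are hypothetical, and the ratio computed here is only an illustration of the comparison described above; it is not the estimator developed by Lu (2013).

```python
from collections import Counter

# Hypothetical RDS records: each respondent lists alters as
# (alter's venue type, whether that alter was passed a coupon).
respondents = [
    {"alters": [("street", True), ("street", True), ("venue", False)]},
    {"alters": [("venue", True), ("street", False),
                ("venue", False), ("street", True)]},
    {"alters": [("street", True), ("venue", False)]},
]


def composition(alters):
    """Share of alters falling into each category."""
    counts = Counter(group for group, _ in alters)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def recruitment_ratio(records):
    """Share of each category among recruited alters divided by its
    share among all named alters. A ratio of 1 is what random
    recruitment from the ego network would produce on average."""
    all_alters = [a for r in records for a in r["alters"]]
    recruited = [a for a in all_alters if a[1]]
    network_share = composition(all_alters)
    recruit_share = composition(recruited)
    return {g: recruit_share.get(g, 0.0) / network_share[g]
            for g in network_share}


ratio = recruitment_ratio(respondents)
```

Ratios far from 1 flag categories that are over- or under-recruited relative to their presence in respondents' networks. The inverse of such a ratio is one natural candidate for a reweighting factor, although how the adjustment enters the final estimator is left open here.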

Using Ego Network Data to Infer Full Network Features

Ego network data can also be used to infer global network features when full network data cannot be collected (Smith, 2012, 2015). A researcher may be interested in the network features of a population, such as distance or cohesion, where it is infeasible to collect information on all actors and all ties between actors (e.g., the network may be too large or the relationship of interest too sensitive to collect directly). It may, however, be possible to collect sampled ego network data in cases where a census cannot be collected. If it is possible to make inferences about the full network structure from independently sampled ego network data, then there is a potentially radical shift in the way we think about doing network analysis, as one could collect a sample (instead of a census) and still measure the network features of a given population (e.g., Handcock & Gile, 2010; Krivitsky et al., 2011). This would, for example, make it easier to move beyond small, institutionally bounded populations.

Using sampled network data makes data collection easier but also raises difficult inference problems. It is difficult to capture network-level measures like cohesion, group structure, or diffusion potential if we cannot map out the direct and indirect connections among all actors in a network. The sampled data offer only independent pieces of the network, thus offering no direct way of measuring global network features. Instead, a researcher must infer what the full network structure looks like just based on the local information found in the ego network data. Past work has used simulation as a solution, where the goal is to generate complete networks that are consistent with the local information found in the sampled data (Morris et al., 2009; Smith, 2012). As the ego networks are drawn randomly from the population, any network consistent with the sampled information is a possible realization of the true network.
The basic idea is to gather information from the ego networks, generate full networks that have those properties, and use the generated networks to answer substantive questions about the population of interest.

Much of the literature using ego network data to generate full networks has focused on sexual networks (Morris et al., 2009; Merli, Moody, Mendelsohn, & Gauthier, 2015). Sexual networks are a prime candidate for such analyses: it is difficult to collect full network data on sexual relations,¹ but it is necessary to know something about the global network features to characterize the epidemic risk in a population. A typical study will take information on the degree distribution, differential degree, and homophily (i.e., sexual mixing between groups) and generate a full network that is consistent with what is observed in the actual ego network data. This is generally accomplished using exponential random graph models (ERGMs) or similar approaches. Once the network is generated, it is possible to explore the potential for infectious diseases such as HIV or hepatitis C virus to spread through the network (Morris & Kretzschmar, 2000).

Recent work has also used ego network data and simulation to explore the properties of spatially embedded networks. An unbounded network (i.e., not bounded by school walls or organizational affiliation) can range over a wide geographic area while still having properties that are shaped by geography; for example, interaction decreases as physical distance between two individuals increases (Spiro, Almquist, & Butts, 2016). Ego network data are useful for capturing the spatial network properties of unbounded populations. A researcher could randomly sample respondents, collecting information on the neighborhood, city, state, etc., of the respondents and alters. If physical location cannot be collected, it is possible to ask how far away each alter is from ego, in distance or in travel time. This information could then be used to inform the simulation of the full network, simulating ties based on the observed physical distance between ego-alter pairs in the data (e.g., Butts et al., 2012; Merli, Moody, Smith, Li, Weir, & Chen, 2015).
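A stripped-down version of the generate-from-local-information idea is sketched below. It is an assumption-laden illustration, not an ERGM: target degrees are drawn from the degrees reported by sampled egos, and a stub-matching procedure pairs nodes while accepting cross-group ties only with some probability to mimic homophily. The function name `simulate_network`, the parameter `cross_group_accept`, and all input values are hypothetical.

```python
import random


def simulate_network(target_degrees, groups, cross_group_accept=0.3,
                     seed=0, max_tries=10_000):
    """Pair up degree 'stubs' at random, rejecting self-ties and
    duplicate ties, and accepting cross-group ties only with
    probability `cross_group_accept` (a crude homophily control)."""
    rng = random.Random(seed)
    # One stub per tie that a node still needs.
    stubs = [node for node, k in enumerate(target_degrees) for _ in range(k)]
    edges = set()
    tries = 0
    while len(stubs) >= 2 and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(stubs)), 2)
        a, b = stubs[i], stubs[j]
        edge = (min(a, b), max(a, b))
        if a == b or edge in edges:
            continue
        if groups[a] != groups[b] and rng.random() > cross_group_accept:
            continue
        edges.add(edge)
        # Remove the two used stubs, higher index first.
        for idx in sorted((i, j), reverse=True):
            stubs.pop(idx)
    return edges


# Hypothetical degrees reported by sampled egos.
observed_degrees = [1, 1, 2, 2, 2, 3]
rng = random.Random(1)
n = 40
target = [rng.choice(observed_degrees) for _ in range(n)]
groups = ["A" if i < n // 2 else "B" for i in range(n)]

edges = simulate_network(target, groups)
realized = [0] * n
for a, b in edges:
    realized[a] += 1
    realized[b] += 1
```

The generated edge list could then feed a disease-spread simulation. A real application would also need to check how closely the realized degree distribution and mixing rates match the sampled ego network data, which this sketch does not enforce.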
It is important to note that studies in the sexual network tradition (as well as the spatial tradition) typically do not incorporate the alter-alter ties into the analysis: the alters are unlikely to have sexual relations with each other and it is difficult to ask about the sexual partners of the alters. A number of recent studies have developed models that make explicit use of the alter-alter ties (Smith, 2012, 2015). This will be particularly important in settings where one’s associates are likely to know each other, the case for most (nonsexual) network ties. The goal of such work is to incorporate the structure of the ego networks into a simulation that infers global network features from sampled data. The approach of Smith (2012), for example, offers a unique means of measuring (local) structural features from the alter-alter tie data and then generating networks based on those local features. Specifically, the alter-alter tie data are used to construct a distribution of ego network configurations. An example ego network configuration distribution is plotted in Figure 10.2 (limited to configurations with four or fewer alters for space considerations). The x-axis plots the different configurations that ego can fall into, based on the size of the ego network and the pattern of ties between alters. The y-axis captures the proportion of egos in a hypothetical sample that fall into each configuration. The basic idea is then to look for full networks with the same distribution of ego network configurations as the sampled data (the simulation is also conditioned on the degree distribution, differential degree, and homophily found in the sample). A network consistent with this local information is likely to have global features similar to those of the actual (unknown) network. The simulation is heavily constrained by the empirical data, making it more likely that the generated network approximates the true network.
In a test of the method, Smith (2012, 2015) found that the simulation approach offers estimates of global network features (like distance and modularity) that closely approximate the values from the true network.

[Figure 10.2  Example ego network configuration distributions. The x-axis shows the ego network type (starting with the isolate); the y-axis shows the proportion in the network. Notes: This figure is based on a hypothetical ego network configuration distribution. Ego is not included in the ego network types. Only ego network types of size four or less are included to make the figure legible.]

This suggests that a researcher could collect sampled ego network data and still make plausible claims about the structure of the full network.
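The configuration distribution described above can be tabulated with a few lines of code. In this sketch each ego network is labeled by its number of alters and the sorted degree sequence among those alters (ego excluded). For networks with four or fewer alters that label distinguishes every distinct tie pattern, though it would not for larger ego networks. The labeling scheme, the sample data, and the function names are assumptions made for illustration and are not taken from Smith (2012).

```python
from collections import Counter


def configuration(alters, alter_ties):
    """Label an ego network by (number of alters, sorted alter degrees),
    counting only ties among the alters themselves."""
    degree = {a: 0 for a in alters}
    for a, b in alter_ties:
        degree[a] += 1
        degree[b] += 1
    return (len(alters), tuple(sorted(degree.values())))


def configuration_distribution(egos):
    """Proportion of sampled egos falling into each configuration."""
    counts = Counter(configuration(e["alters"], e["alter_ties"]) for e in egos)
    total = len(egos)
    return {config: n / total for config, n in counts.items()}


# Hypothetical sample of four egos.
sample = [
    {"alters": [], "alter_ties": []},                    # isolate
    {"alters": ["a", "b"], "alter_ties": [("a", "b")]},  # two connected alters
    {"alters": ["a", "b"], "alter_ties": []},            # two unconnected alters
    {"alters": ["a", "b", "c"],
     "alter_ties": [("a", "b"), ("b", "c"), ("a", "c")]},  # closed triad
]

dist = configuration_distribution(sample)
```

A simulation in this spirit would compute the same distribution on each candidate full network and favor networks whose distribution is close to the sampled one.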

Conclusion: Future Uses of Ego Network Data

Ego network data have a long history in the social sciences, traditionally acting as a means to measure social support and social boundaries, and more recently being employed to infer global network features from a sample, as well as acting as a corrective to other sampling techniques. This long and varied history suggests something about the utility of this seemingly “too simple” network data structure. The evolution of use also suggests that ego network data might be used in very different ways in the future. We should expect innovations in methods, approaches, and substantive concerns. Here, I highlight a number of possible avenues for future work.

First, there is an opportunity to push the ego network sampling/simulation approach much further, making it a general option for network scholars. One possible application is estimating contextual network effects; contextual network effects capture the effect of global network properties, like cohesion (i.e., percent in the largest bicomponent), on individual outcomes, like mental health (Bearman & Moody, 2004). For example, do more socially cohesive schools have lower rates of depression? The question is whether one can estimate such relationships using sampled data rather than having to collect census data in every context. Researchers would first collect sampled ego network data in different contexts (schools, organizations, etc.). They would then use the sampled data to infer global network structure, yielding an estimate of cohesion, for example, across contexts. Cohesion would then be used as a contextual-level predictor of mental health, suicidality, and so on (Wray, Colen, & Pescosolido, 2011). The larger goal would be to make it easier to incorporate global network structure into traditional analyses in the social sciences, with data limitations no longer acting as an obstacle. The open question, of course, is whether sampled data will actually yield empirically valid, accurate estimates of contextual network effects.

In a similar way, there is an opportunity to combine network sampling with agent-based models, with the goal of specifying agent-based models that are more fully grounded in empirical data. Agent-based models are based on simulated worlds where actors interact based on a set of prespecified rules; these models are used to explore and test social theories under controlled conditions. Agent-based models are (generally) based on a stylized hypothetical world, where the conditions of the simulation are not informed by empirical data.² Purely theoretical models, while potentially useful, are also easier to dismiss, and recent work has pushed for agent-based models to be more empirically grounded (Bruch & Atwell, 2015; Hedström & Manzo, 2015). One way of satisfying this call would be to incorporate ego network data into agent-based models, most obviously in models based on network processes, such as diffusion (e.g., diffusion of culture through a network).
A researcher would use sampled ego network data to generate realistic full networks, which could then be used within the agent-based model as the base substrate for the simulation. The advantage of such an approach is that the data are easy to collect but the simulations are still conditioned on a realistic network structure.

There is also room to push the modeling of ego network data. For example, Krivitsky and Morris (2015) use ego network data to estimate ERGMs (where the researcher predicts the network of interest, the ties between all i,j pairs, as a function of different network counts). Thus, one would only need to collect sampled data to estimate these increasingly popular statistical network models. More work on the statistical properties of these models, as well as more practical applications, is needed.

Future work should also grapple more explicitly with the limitations of ego network data. For example, respondents may forget to name alters that should have been part of their ego network (Marin, 2004). This can distort the number of alters listed (degree), as well as the characteristics of the alters. More work is needed to describe the biases that exist. More work is also needed on the survey side, showing what survey protocols can limit threats to validity. For example, Marin and Hampton (2007) suggest using multiple name generators, as certain questions may prime different associations for the respondent. It would also be useful to explore the benefits of using specific exchanges (borrow money, get a ride from, etc.) instead of more general name generators (who do you discuss important matters with?). Similarly, there is much work to be done on fully incorporating multiple relations into the analysis of ego network data. Ego network data based on multiple relations have many advantages: methodologically, the data are less prone to recall bias (as the respondent has more than one chance to name an alter); substantively, multiple relational data open up new avenues for analyses, for example, making it possible to enumerate social roles. Future work should also continue to explore the costs and benefits of using different collection technologies when eliciting contacts.

Finally, there is much work to be done on combining ego network data with other data sources. We saw earlier how ego network data can improve RDS estimates, but there are a number of other ways that ego network data can be paired with additional data sources to create a unique, robust set of information. For example, it is possible to supplement ego network data with a very limited snowball sample. With a snowball sample, a researcher will contact a random subsample of all named alters, interviewing those selected alters in a separate interview. Even going out one step into the network offers a great deal of supplemental information. By interviewing the alters, one receives firsthand reports on their characteristics; this serves as a robustness check on the original reports from ego on the alters. The alter interviews also make it possible to measure asymmetry in relationships, as one can see whether the named alter reports the tie as being as strong as the respondent did. Another possibility is to supplement the ego network data by matching the named alters to the interviewed respondents, using nicknames and descriptions (but not actual IDs) as a means of identifying the alter in the list of respondents (Mouw et al., 2014). This matching process would make it possible to fill in some of the ties between the isolated ego networks. Similarly, ego network data can be combined with automated data sources, like cell phone records and social media websites. For example, recent studies have used Bluetooth signals on cell phones to generate colocation networks (i.e., are person i and person j in the same location at the same time?) (Oliver et al., 2015).
Such data have the potential to reveal the social network structure of the population of interest. Unfortunately, unless the researcher can give cell phones to everyone in the population of interest, the data only reveal a subgraph sample of the network (i.e., only showing the ties between the sampled respondents). The problem with this approach is the likely sparseness of the data. Any two individuals in the network are unlikely to interact. A subgraph sample of a large network may yield very few ties between the sampled respondents unless the sample is quite large or the network is very dense. A subgraph sample with few ties indicates that the network is not very dense but is otherwise uninformative. A researcher can add to the cell phone (colocation) data by asking the respondents to directly describe their social contacts or by eliciting their ego networks. Such data can serve as a means of fleshing out whatever network structure emerges from the colocation data.

In short, ego network data are easy to collect and useful for a wide variety of substantive and methodological problems. Given this flexibility and ease of use, there is every reason to believe that ego network data will continue to be a useful option for network scholars.
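The sparseness problem noted above for subgraph samples can be illustrated with a short calculation. Under the assumption of a simple random sample of nodes, the expected number of ties observed among the sampled respondents equals the network density times the number of sampled pairs. The function name and the population figures are hypothetical.

```python
def expected_sample_ties(population_size, mean_degree, sample_size):
    """Expected number of ties among randomly sampled nodes:
    density of the full network times the number of sampled pairs."""
    density = mean_degree / (population_size - 1)
    sampled_pairs = sample_size * (sample_size - 1) / 2
    return density * sampled_pairs


# Hypothetical setting: 10,000 people, 10 ties each on average,
# and 200 of them carrying study phones.
ties_seen = expected_sample_ties(10_000, 10, 200)
```

With these illustrative numbers, fewer than 20 ties would be expected among the 19,900 sampled pairs, which shows how little structure a subgraph sample can reveal on its own and why supplementing it with ego network reports is attractive.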

Notes

1. This is the case for a number of reasons: first, respondents may be unwilling to name their sexual partners; second, IRB approval may be difficult to acquire; third, the network is unlikely to be bounded; and fourth, no clear list of possible partners may exist.
2. There is already a large literature employing empirical data to inform simulations of disease spread (see earlier), but the literature on agent-based modeling and nonhealth outcomes, such as culture, inequality, neighborhood mobility, etc., has been much slower to incorporate empirical data into its simulations.


References

Bailey, S. L., Ouellet, L. J., Mackesy-Amiti, M. E., Golub, E. T., Hagan, H., Hudson, S. M., . . . DUIT Study Team. (2007). Perceived risk, peer influences, and injection partner type predict receptive syringe sharing among young adult injection drug users in five US cities. Drug and Alcohol Dependence, 91, S18–S29.
Bearman, P. S., & Moody, J. (2004). Suicide and friendships among American adolescents. American Journal of Public Health, 94(1), 89–95.
Blau, P. M., Beeker, C., & Fitzpatrick, K. M. (1984). Intersecting social affiliations and intermarriage. Social Forces, 62(3), 585–606.
Blau, P. M., & Schwartz, J. E. (1984). Crosscutting social circles. Orlando, FL: Academic Press.
Bruch, E., & Atwell, J. (2015). Agent-based models in empirical social research. Sociological Methods & Research, 44(2), 186–221. doi:10.1177/0049124113506405
Burt, R. S. (1984). Network items and the General Social Survey. Social Networks, 6(4), 293–339.
Burt, R. S. (1992). Structural holes: The social structure of competition. Cambridge, MA: Harvard University Press.
Burt, R. S. (2004). Structural holes and good ideas. American Journal of Sociology, 110(2), 349–399. doi:10.1086/421787
Butts, C. T., Acton, R. M., Hipp, J. R., & Nagle, N. N. (2012). Geographical variability and network structure. Social Networks, 34(1), 82–100.
Cornwell, B., & Laumann, E. O. (2015). The health benefits of network growth: New evidence from a national survey of older adults. Social Science & Medicine, 125, 94–106.
Cornwell, B., Laumann, E. O., & Schumm, P. L. (2008). The social connectedness of older adults: A national profile. American Sociological Review, 73(2), 185–203.
Cornwell, B., Schumm, L. P., Laumann, E. O., & Graber, J. (2009). Social networks in the NSHAP study: Rationale, measurement, and preliminary findings. Journals of Gerontology Series B: Psychological Sciences and Social Sciences, 64(Suppl. 1), i47–i55.
DiMaggio, P. (1987). Classification in art. American Sociological Review, 52, 440–455.
DiMaggio, P., & Louch, H. (1998). Socially embedded consumer transactions: For what kinds of purchases do people most often use networks? American Sociological Review, 63, 619–637.
Erbring, L., & Young, A. A. (1979). Individuals and social structure: Contextual effects as endogenous feedback. Sociological Methods and Research, 7, 396–430.
Falci, C., & McNeely, C. (2009). Too many friends: Social integration, network cohesion and adolescent depressive symptoms. Social Forces, 87(4), 2031–2061.
Fischer, C. S. (1982). To dwell among friends: Personal networks in town and city. Chicago, IL: University of Chicago Press.
Frank, O. (1971). Statistical inference in graphs (Doctoral dissertation). Stockholm University, Stockholm, Sweden.
Freeman, L. C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1, 215–239.
Gjoka, M., Smith, E., & Butts, C. (2014). Estimating clique composition and size distributions from sampled network data. In 2014 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) (pp. 837–842). Toronto, ON: IEEE.
Goel, S., & Salganik, M. J. (2010). Assessing respondent-driven sampling. Proceedings of the National Academy of Sciences, 107(15), 6743–6747.

Golder, S. A., & Macy, M. W. (2011). Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science, 333(6051), 1878–1881.
Gould, R. V. (2002). The origins of status hierarchies: A formal theory and empirical test. American Journal of Sociology, 107, 1143–1178.
Granovetter, M. (1974). Getting a job: A study of contacts and careers. Cambridge, MA: Harvard University Press.
Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 78, 1360–1380.
Handcock, M. S., & Gile, K. J. (2010). Modeling social networks from sampled data. Annals of Applied Statistics, 4, 5–25.
Haynie, D. L. (2001). Delinquent peers revisited: Does network structure matter? American Journal of Sociology, 106(4), 1013–1057.
Hedström, P., & Manzo, G. (2015). Recent trends in agent-based computational research: A brief introduction. Sociological Methods & Research, 44(2), 179–185.
House, J. S., Landis, K. R., & Umberson, D. (1988). Social relationships and health. Science, 241(4865), 540–545.
Kadushin, C. (2012). Understanding social networks: Theories, concepts, and findings. New York, NY: Oxford University Press.
Kalmijn, M. (1994). Assortative mating by cultural and economic occupational status. American Journal of Sociology, 100(2), 422–452.
Koehly, L., Goodreau, S. M., & Morris, M. (2004). Exponential family models for sampled and census network data. Sociological Methodology, 34, 241–270.
Krackhardt, D., & Kilduff, M. (1999). Whether close or far: Social distance effects on perceived balance in friendship networks. Journal of Personality and Social Psychology, 76, 770–782.
Krivitsky, P. N., Handcock, M. S., & Morris, M. (2011). Adjusting for network size and composition effects in exponential-family random graph models. Statistical Methodology, 8(4), 319–339. http://dx.doi.org/10.1016/j.stamet.2011.01.005
Krivitsky, P. N., & Morris, M. (2015). Inference for social network models from egocentrically sampled data, with application to understanding persistent racial disparities in HIV prevalence in the U.S. University of Wollongong, National Institute for Applied Statistics Research Australia. http://niasra.uow.edu.au/publications/UOW190187
Laumann, E. O. (1966). Prestige and association in an urban community. Indianapolis, IN: Bobbs-Merrill.
Lin, N. (1999). Social networks and status attainment. Annual Review of Sociology, 25, 468–487.
Lizardo, O. (2006). How cultural tastes shape personal networks. American Sociological Review, 71, 778–807.
Lovell, A. M. (2002). Risking risk: The influence of types of capital and social networks on the injection practices of drug users. Social Science & Medicine, 55(5), 803–821.
Lu, X. (2013). Linked ego networks: Improving estimate reliability and validity with respondent-driven sampling. Social Networks, 35(4), 669–685.
Lubbers, M. J., Molina, J. L., & McCarty, C. (2007). Personal networks and ethnic identifications: The case of migrants in Spain. International Sociology, 22(6), 721–741.
Luke, D. A., & Harris, J. K. (2007). Network analysis in public health: History, methods, and applications. Annual Review of Public Health, 28, 69–93.
Marin, A. (2004). Are respondents more likely to list alters with certain characteristics?: Implications for name generator data. Social Networks, 26(4), 289–307.

Marin, A., & Hampton, K. N. (2007). Simplifying the personal network name generator. Field Methods, 19(2), 163–193. doi:10.1177/1525822x06298588
Marsden, P. V. (1987). Core discussion networks of Americans. American Sociological Review, 52, 122–131.
Marsden, P. V. (1988). Homogeneity in confiding relations. Social Networks, 10(1), 57–76.
Marsden, P. V. (1990). Network data and measurement. Annual Review of Sociology, 16, 435–463.
Marsden, P. V. (2011). Survey methods for network data. In P. J. Carrington & J. Scott (Eds.), The Sage handbook of social network analysis (pp. 370–388). London: SAGE Publications.
McCreesh, N., Frost, S., Seeley, J., Katongole, J., Tarsh, M. N., Ndunguse, R., . . . Johnston, L. G. (2012). Evaluation of respondent-driven sampling. Epidemiology (Cambridge, Mass.), 23(1), 138.
McPherson, M., Smith-Lovin, L., & Brashears, M. E. (2006). Social isolation in America: Changes in core discussion networks over two decades. American Sociological Review, 71(3), 353–375.
McPherson, M. J., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.
Merli, M. G., Moody, J., Mendelsohn, J., & Gauthier, R. (2015). Sexual mixing in Shanghai: Are heterosexual contact patterns compatible with an HIV/AIDS epidemic? Demography, 52(3), 919–942.
Merli, M. G., Moody, J., Smith, J., Li, J., Weir, S., & Chen, X. (2015). Challenges to recruiting population representative samples of female sex workers in China using respondent driven sampling. Social Science & Medicine, 125, 79–93.
Moody, J., & White, D. R. (2003). Structural cohesion and embeddedness. American Sociological Review, 68, 103–127.
Moremen, R. D. (2008). The downside of friendship: Sources of strain in older women's friendships. Journal of Women & Aging, 20(1–2), 169–187.
Morris, M., & Kretzschmar, M. (2000). A micro-simulation study of the effect of concurrent partnerships on HIV spread in Uganda. Mathematical Population Studies, 8(2), 109–133.
Morris, M., Kurth, A. E., Hamilton, D. T., Moody, J., & Wakefield, S. (2009). Concurrent partnerships and HIV prevalence disparities by race: Linking science and public health practice. American Journal of Public Health, 99(6), 1023–1031.
Mouw, T., Chavez, S., Edelblute, H., & Verdery, A. (2014). Binational social networks and assimilation: A test of the importance of transnationalism. Social Problems, 61(3), 329–359.
Oliver, N., Matic, A., & Frias-Martinez, E. (2015). Mobile network data for public-health: Opportunities and challenges. Frontiers in Public Health, 3, 189. doi:10.3389/fpubh.2015.00189
Perry, B. L., & Pescosolido, B. A. (2015). Social network activation: The role of health discussion partners in recovery from mental illness. Social Science & Medicine, 125, 116–128.
Perry, B. L., Pescosolido, B. A., & Borgatti, S. P. (2018). Egocentric network analysis: Foundations, methods, and models. Cambridge, UK: Cambridge University Press.
Portes, A., & Landolt, P. (1996). The downside of social capital. American Prospect, 26, 18–22.
Rosenfeld, M. J. (2008). Racial, educational and religious endogamy in the United States: A comparative historical perspective. Social Forces, 87(1), 1–31. doi:10.1353/sof.0.0077
Salganik, M. J., & Heckathorn, D. D. (2004). Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology, 34(1), 193–240.
Smith, J. A. (2012). Macrostructure from microstructure: Generating whole systems from ego networks. Sociological Methodology, 42(1), 155–205. doi:10.1177/0081175012455628

Smith, J. A. (2015). Global network inference from ego network samples: Testing a simulation approach. Journal of Mathematical Sociology, 39(2), 125–162. doi:10.1080/0022250X.2014.994621
Smith, J. A., & Faris, R. (2015). Movement without mobility: Adolescent status hierarchies and the contextual limits of cumulative advantage. Social Networks, 40, 139–153. http://dx.doi.org/10.1016/j.socnet.2014.10.004
Smith, J. A., McPherson, M., & Smith-Lovin, L. (2014). Social distance in the United States: Sex, race, religion, age, and education homophily among confidants, 1985 to 2004. American Sociological Review, 79(3), 432–456. doi:10.1177/0003122414531776
Spiro, E. S., Almquist, Z. W., & Butts, C. T. (2016). The persistence of division: Geography, institutions, and online friendship ties. Socius: Sociological Research for a Dynamic World, 2. doi:10.1177/2378023116634340
Umberson, D., & Karas Montez, J. (2010). Social relationships and health: A flashpoint for health policy. Journal of Health and Social Behavior, 51(1 Suppl.), S54–S66.
Verdery, A. M., Merli, M. G., Moody, J., Smith, J. A., & Fisher, J. C. (2015). Brief report: Respondent-driven sampling estimators under real and theoretical recruitment conditions of female sex workers in China. Epidemiology, 26(5), 661–665. doi:10.1097/ede.0000000000000335
Wills, T. A., & Shinar, O. (2000). Measuring perceived and received social support. In S. Cohen, L. G. Underwood, & B. H. Gottlieb (Eds.), Social support measurement and intervention: A guide for health and social scientists (pp. 86–135). New York, NY: Oxford University Press.
Wray, M., Colen, C., & Pescosolido, B. (2011). The sociology of suicide. Annual Review of Sociology, 37(1), 505–528. doi:10.1146/annurev-soc-081309-150058

Chapter 11

Dyadic, Nodal, and Group-Level Approaches to Study the Antecedents and Consequences of Networks: Which Social Network Models to Use and When?

Filip Agneessens

Social network analysis has become an increasingly popular way of looking at the social world. In fields such as sociology, anthropology, psychology, epidemiology, political science, management, and educational research, social network analysis has helped generate novel insights into a variety of social phenomena. Examples of topics being studied with social network analysis include friendship among students in classrooms (e.g., Van de Bunt, Van Duijn, & Snijders, 1999; Moody, 2001), advice between employees in organizations (e.g., Lazega et al., 2012; Agneessens & Wittek, 2013), cosponsorship of legislative bills by politicians (e.g., Fowler, 2006), interlocking directorates among firms (e.g., Burt, 1980; Mizruchi & Stearns, 1988), and trade relations between countries (e.g., D. A. Smith & White, 1992; Kim & Shin, 2002). This network approach has not only enabled social scientists to tackle existing research questions in an innovative way but also allowed scholars to develop and answer new exciting research questions (cf. Wellman, 1983, 1997; Burt, Kilduff, & Tasselli, 2013). As a result, social network analysis has become a well-established approach to study social phenomena, bringing together a range of theories and methods (cf. Wellman, 1997; Borgatti & Halgin, 2011).¹

Methodologically, social network analysis encompasses a variety of methods to study the social relations and social interactions between individual units or nodes² in a particular social setting or group. To ensure that the research questions posed in a specific study are answered correctly, it is crucial that the correct (causal) model is chosen, the appropriate network data are collected, the right network measures are used, and the most suitable statistical methods are applied. While specific social network studies might focus on fundamentally different areas of research, they may nevertheless use similar types of network methods and models. Focusing on research questions that aim to explain (rather than merely describe), this chapter offers an overview of (causal) models for common and less common types of questions that can be answered with social network analysis and discusses appropriate statistical methods and network sampling approaches to answer such questions.

The chapter starts with a classification of the main types of basic models based on two criteria: (1) whether the researchers are interested in the antecedents of networks and/or their consequences and (2) the appropriate level of analysis, in particular the dyadic, nodal, or group level.³ This results in six basic types of models: dyadic-, nodal-, and group-level models that either aim to explain why specific network structures emerge (i.e., where the dependent variable is a network) or aim to understand the consequences of such network structures (i.e., where the network is an independent variable). The second part of the chapter focuses on models that are extensions and variations of these basic models. In particular, the focus will be on models where networks take on the role of mediator or moderator, as well as models that incorporate multiple levels of analysis and models that integrate network antecedents and network consequences. The examples discussed will primarily be drawn from the fields of management and educational research.

A Framework of Basic Models for Social Network Analysis at Different Levels

The main components of any social network analysis are relational data, that is, network data that capture the social relations between two nodes (i.e., a dyad). These relational data can be directed or undirected; they can be valued, categorical, or binary; they can be positive, neutral, or negative (Yang, Trincado, Labianca, & Agneessens, 2019; Harrigan, Labianca, & Agneessens, 2020); and they can represent information about interactions, flows, relational roles, and interpersonal evaluations (cf. Borgatti et al., 2009). In addition to network data, supplementary information can be collected regarding group-level characteristics (e.g., information about group-level attributes such as size, location, or type of group), nodal-level characteristics (e.g., information about individual-level attributes such as demographics, norms, values, or behavior in the case of people), and dyadic-level transmissions of specific characteristics (e.g., information about the transfer of behavior, norms, values, or resources between individual nodes).4 While dyadic social relations are the basic building blocks for any social network analysis, the actual analysis can occur at a number of different levels. The three most common levels of analysis are the dyadic level, the nodal (or individual) level, and the group level

Table 11.1  Examples of Network Properties at a Dyadic, Individual, and Group Level

Group-level network properties: Network structure
-  Density of a group
-  Level of centralization in a group
-  Level of homophily in a group
-  Level of transitivity/clustering in a group
-  Core-periphery structure of a group
-  Number of cliques in a group

Nodal-level network properties: Network position
-  Degree centrality
-  Closeness centrality
-  Betweenness centrality
-  Constraint index
-  Number of cliques a node is part of

Dyadic-level network properties: Network connectedness
-  Direct (network) connection between two nodes
-  Geodesic distance between two nodes
-  Level of structural equivalence between two nodes
-  Number of clique memberships shared between two nodes

(cf.  Marsden, 1990; Contractor, Wasserman, & Faust, 2006; Mizruchi & Marquis, 2006; Brass,  2012).5 Analysis at a particular level requires network measures or properties and statistical methods at that particular level. Table 11.1 provides a list of examples of network properties that are commonly used at each of these three main levels. At a dyadic level, the emphasis is on how two nodes are related (i.e., connected) to each other. At this level the focus could be on the “direct connectedness” between these two nodes, for example, how frequently there is a direct interaction between two nodes or how much (e.g., how strongly or closely) two nodes are directly related. However, dyadic analysis can also focus on the “indirect connectedness” between two nodes, such as the geodesic distance between two nodes or their level of structural equivalence (Lorrain & White, 1971; see Table 11.1). At the nodal (individual) level, the focus is on the position of the individual node (unit) in the network, that is, the relation of the individual node to some or all other nodes in the group. Popular measures of position include degree, closeness, betweenness centrality (Freeman, 1979), and measures to capture structural holes (Burt, 1992). Finally, at the group level, the analysis focuses on the network structure of the group as a whole. This includes group-level network properties, such as the density or centralization of the group (Freeman, 1979), but also more complex structural aspects, such as the extent to which the group exhibits a core-periphery structure (Borgatti & Everett, 2000). As in standard social science research, the research questions in quantitative social network studies can be either descriptive or explanatory. Descriptive network research might aim to (1) identify specific properties of a group (e.g., by calculating the density or level of centralization in a group), (2) detect the network position of an individual node in a group

(e.g., identify the most central node), or (3) define the relation between two nodes (e.g., describe how far removed two nodes are from each other). In descriptive network research, attributes can feed into these network constructs. For example, nodal-level attributes are needed to identify homophily6 and to calculate the heterogeneity (diversity) index for the ego network of an individual node (Blau, 1977). Explanatory network research, on the other hand, concentrates on the antecedents of networks and/or on their consequences (cf. Brass, 2012; Brass et al., 2004), either aiming to uncover why specific dyadic-, nodal-, or group-level network properties (such as those described in Table 11.1) emerge or attempting to uncover the effects of such network properties. Hence, explanatory studies focus on causal mechanisms. When the focus is on network emergence, the relational data take the role of the dependent variable in the (causal) model, with attributes and/or other network relational data as independent variables. When the focus is on network consequences, relational data are in the independent part of the model, with (nodal or group-level) attributes as dependent (i.e., outcome) variables. As the focus can be on either network emergence or consequences at a dyadic, nodal, or group level, six basic types of (causal) models can be identified. Figure 11.1 provides a graphical representation of the generic format of these six types. The three models on the left side are concerned with network emergence at a dyadic, nodal, and group level (Models 1.1, 1.3, and 1.5, respectively), while those models concerned with the effects of networks at each specific level are on the right side (Models 1.2, 1.4, and 1.6, respectively). This classification provides the basic framework for this chapter and is a starting point for the discussion about the more complex models later in the chapter.
In the remaining part of this section, these six models are discussed in more detail.
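
The three levels of network properties listed in Table 11.1 can be illustrated with a short sketch. The code below is only an illustration under stated assumptions: the five-node friendship network and all function names are hypothetical and do not come from the chapter. It computes one dyadic-level property (geodesic distance), one nodal-level property (degree centrality), and one group-level property (density) for the same undirected network.

```python
from collections import deque

# Hypothetical toy friendship network (undirected); not taken from the chapter.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its direct contacts."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def geodesic_distance(adj, source, target):
    """Dyadic level: length of the shortest path, or None if unreachable."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for neighbour in adj[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return None


def degree_centrality(adj, node):
    """Nodal level: number of direct ties of one node."""
    return len(adj[node])


def density(adj):
    """Group level: observed ties divided by possible ties (undirected)."""
    n = len(adj)
    ties = sum(len(neighbours) for neighbours in adj.values()) / 2
    return ties / (n * (n - 1) / 2)


adj = build_adjacency(EDGES)
dyadic = geodesic_distance(adj, "B", "E")  # shortest path is B-A-D-E
nodal = degree_centrality(adj, "A")
group = density(adj)
print(dyadic, nodal, group)
```

The same relational data feed all three measures; only the unit for which the measure is defined changes (a pair of nodes, a single node, or the whole group).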

Network Antecedents at a Dyadic Level (Model 1.1)

One of the most fundamental questions that social network researchers have raised is how or when social network relations emerge. Model 1.1 incorporates studies that aim to answer such a question by focusing on the direct or indirect connectedness between two nodes (i.e., by focusing on the dyadic level). In principle, this can encompass research questions that aim, for example, to understand the reasons that two nodes are structurally equivalent or two nodes have a specific geodesic distance (cf. Table 11.1). However, in practice, most studies that are concerned with the emergence of networks from a dyadic perspective have focused on the direct relations between those two nodes.7 Typical types of explanations for the presence of a direct relation between two nodes include (1) the presence of such a tie between both nodes in the past; (2) surrounding structural mechanisms such as reciprocity, transitivity, and cyclicality; (3) the presence of other network relations; (4) the physical distance between two nodes; and (5) explanations involving attributes of these nodes, such as homophily. For example, the presence of a friendship tie between two students might be explained by a number of different factors, such as both students (1) having been friends in the past, (2) having many common friends (and friends of friends becoming friends), (3) having collaborated on a group project together in the past, (4) being neighbors, and (5) sharing an interest in the same type of music.

[Figure 11.1, panel contents. Network antecedents: Models 1.5, 1.3, and 1.1. Network consequences: Models 1.6, 1.4, and 1.2.]

GROUP LEVEL

MODEL 1.5: Group-Level Network Antecedents
  X: Group-level characteristics (arrow a)
  X: Group-level network structure (arrow b)
  Y: Group-level network structure
  Can the network structure of a group be explained by a) group-level characteristics? b) other network structural properties of that group?

MODEL 1.6: Group-Level Network Consequences
  X: Group-level characteristics (arrow a (*))
  X: Group-level network structure (arrow b)
  Y: Group-level outcome
  Can a group-level outcome be explained by a) group-level characteristics? (*) b) network structural properties of that group?

NODAL LEVEL

MODEL 1.3: Nodal-Level Network Antecedents
  X: Nodal-level characteristics (arrow a)
  X: Nodal-level network position (arrow b)
  Y: Nodal-level network position
  Can the network position of a node be explained by a) characteristics of that individual node? b) another nodal-level property for that unit?

MODEL 1.4: Nodal-Level Network Consequences
  X: Nodal-level characteristics (arrow a (*))
  X: Nodal-level network position (arrow b)
  Y: Nodal-level outcome
  Can a nodal-level outcome be explained by a) characteristics of that individual node? (*) b) that node’s network position?

DYADIC LEVEL

MODEL 1.1: Dyadic-Level Network Antecedents
  X: Dyadic-level characteristics (arrow a)
  X: Dyadic-level connectedness (arrow b)
  Y: Dyadic-level connectedness
  Can the connectedness between 2 nodes be explained by a) dyadic-level characteristics? b) other dyadic-level network connectedness?

MODEL 1.2: Dyadic-Level Network Consequences
  X: Dyadic-level characteristics (arrow a (*))
  X: Dyadic-level connectedness (arrow b)
  Y: Dyadic-level outcome
  Can a specific dyadic transmission be explained by a) dyadic-level characteristics? (*) b) the network connectedness between those nodes?

(*) non-network part of the model

Figure 11.1  Social network models for network antecedents (left) and network consequences (right) at the dyadic, nodal, and group levels.

The first four types of explanations focus on independent variables capturing some sort of dyadic-level connectedness (corresponding to arrow b in Model 1.1), while the last type focuses on dyadic-level characteristics, that is, the attributes of either or both nodes (arrow a in Model 1.1). A more thorough overview of some of these mechanisms and related theoretical arguments can be found in Contractor et al. (2006) and Rivera, Soderstrom, and Uzzi (2010). When network data are collected among all nodes in a group (i.e., complete network analysis), exponential random graph models (ERGMs; Lusher, Koskinen, & Robins, 2013) and multivariate regression quadratic assignment procedures (MRQAPs; Krackhardt, 1987) have been the most common methods used to answer such questions, while stochastic actor-oriented models (SAOMs; Snijders, van de Bunt, & Steglich, 2010) and relational event models (Butts, 2008; Stadtfeld, Hollway, & Block, 2017) are frequently used when network data in a group are collected over time. Network data for a single group with a sufficient number of nodes can, in principle, suffice to perform such statistical tests, provided the researcher aims to only draw conclusions about the social processes taking place in that specific group. However, as will be discussed in the section on “Multiple Groups and Multilevel Models”, data among a set of groups are obviously needed to make more generalizable statements regarding these social processes beyond a specific context (i.e., beyond a single group).8
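
The permutation logic behind quadratic assignment procedures can be sketched in a few lines. The code below is a deliberately simplified illustration and not the multivariate regression version (MRQAP) referred to above: it tests a single dyadic predictor using a matrix correlation. The six students, their music preferences, the friendship ties, and all function names are hypothetical. The key idea is that rows and columns of one matrix are permuted together, so that the dependence structure within each network is preserved while the link between the two matrices is broken.

```python
import random


def offdiag_pairs(n):
    """All ordered pairs (i, j) with i != j; the diagonal is ignored."""
    return [(i, j) for i in range(n) for j in range(n) if i != j]


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def matrix_correlation(y, x, order=None):
    """Correlate y[i][j] with x[p[i]][p[j]] over all off-diagonal dyads."""
    n = len(y)
    p = order if order is not None else list(range(n))
    pairs = offdiag_pairs(n)
    return pearson([y[i][j] for i, j in pairs],
                   [x[p[i]][p[j]] for i, j in pairs])


def qap_test(y, x, n_perm=2000, seed=0):
    """Return the observed correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = matrix_correlation(y, x)
    n = len(y)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)  # relabel nodes: rows and columns move together
        if matrix_correlation(y, x, perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: music preference per student and undirected friendships.
GENRE = [0, 0, 0, 1, 1, 1]
TIES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
N = len(GENRE)
SAME_MUSIC = [[int(i != j and GENRE[i] == GENRE[j]) for j in range(N)]
              for i in range(N)]
FRIENDSHIP = [[0] * N for _ in range(N)]
for i, j in TIES:
    FRIENDSHIP[i][j] = FRIENDSHIP[j][i] = 1

observed, p_value = qap_test(FRIENDSHIP, SAME_MUSIC)
print(round(observed, 3), p_value)
```

In this toy example the dyadic predictor corresponds to arrow a in Model 1.1 (a shared attribute); replacing SAME_MUSIC with a second network matrix would correspond to arrow b.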

Network Consequences at a Dyadic Level (Model 1.2)

Model 1.2 concentrates on how the direct or indirect connectedness between two nodes might contribute to the transmission of a particular nodal attribute, such as a particular idea, practice, behavior, or resource between those nodes (e.g., Travers & Milgram, 1969; Stevenson & Gilly, 1991). This transmission process is sometimes also referred to as social contagion (Burt, 1987). Examples of transmissions include the diffusion of innovation (i.e., the transfer of an idea, practice, or object that is new to the recipient) via a communication or friendship network (Rogers, 2003; Coleman, Katz, & Menzel, 1966; Valente, 1995, 1996; Burt, 1987), as well as the spread of a specific sexually infectious disease via a sexual contact network (Klovdahl, 1985; Keeling & Eames, 2005). Salancik and Pfeffer (1978) used a social information processing approach to explain why employees who share information would have more similar attitudes about their job. When social contagion among nodes is due to the presence of a direct network relation among those nodes, the result will be a high level of homophily in that group. However, some researchers (e.g., Friedkin, 1984; Burt, 1987; Borgatti & Li, 2009) have argued that the level of structural or regular equivalence (rather than the presence of a direct link) between two nodes might be more relevant for the adoption of innovation or transmission of specific behavior. Those in structurally equivalent positions may adjust their behavior to each other because they feel in competition with each other (Fujimoto & Valente, 2012), or these nodes might feel more related to each other because they take on a similar role in the network.
Methodologically, the study of the effects of networks on specific transmissions requires the explicit recording of the transmission of a specific, unique characteristic (attribute) between two nodes and the time at which this happens, as well as network information among these nodes. Data collection can involve recording specific transmissions and network data among a random sample of dyads or among all dyads in a group. When sampling dyads randomly, standard techniques can be used, for example, to examine whether the transmission of a disease among two nodes is more likely to take place when they are friends. However, if the focus is on the sequential process of a disease flowing through a friendship network, the friendship relation and the transmission of a specific attribute would need to be recorded among all nodes in a bounded group, so the analysis becomes more complex.9
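
The data requirements described above (who transmitted what to whom, and when, within a bounded group) can be made concrete with a small sketch. The code below is only an illustration: the friendship ties, the names, the transmission probability, and the discrete time steps are all assumptions introduced here, not part of the chapter. It simulates an attribute flowing through a friendship network and records each transmission as a (time, source, target) event, which is the kind of record a dyadic analysis of Model 1.2 would need.

```python
import random

# Hypothetical friendship ties in one bounded group (undirected).
FRIENDS = {
    "ann": ["bob", "cas"],
    "bob": ["ann", "dee"],
    "cas": ["ann", "dee"],
    "dee": ["bob", "cas", "eli"],
    "eli": ["dee"],
}


def simulate_transmissions(ties, seed_node, p_transmit, max_steps, rng):
    """Spread an attribute along ties.

    Returns a list of (time, source, target) events and a dict with the
    time at which each node acquired the attribute.
    """
    adopted_at = {seed_node: 0}
    events = []
    for t in range(1, max_steps + 1):
        # Only nodes that acquired the attribute before this step can pass it on.
        carriers = [n for n, when in adopted_at.items() if when < t]
        for source in sorted(carriers):
            for target in ties[source]:
                if target not in adopted_at and rng.random() < p_transmit:
                    adopted_at[target] = t
                    events.append((t, source, target))
    return events, adopted_at


# With p_transmit = 1.0 the process is deterministic, which makes it easy to check.
events, adopted_at = simulate_transmissions(
    FRIENDS, "ann", p_transmit=1.0, max_steps=5, rng=random.Random(1)
)
for event in events:
    print(event)
```

When every tie transmits with certainty, the time at which a node acquires the attribute equals its geodesic distance from the starting node, which links this sequential process back to the dyadic-level properties in Table 11.1.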

While many of the theoretical arguments made in this subsection in principle could be approached from a dyadic perspective (as in Model 1.2), most studies have actually focused on the aggregated effect of “implicitly” measured transmissions from multiple connections,10 that is, a focus on the effects at the nodal level (as in Model 1.4). The reason for this is that the focus is often on the aggregated result, such as a person exhibiting specific behavior, holding specific beliefs, or having a certain amount of aggregated knowledge (i.e., the cumulative effect of contagion processes from being connected to multiple alters), rather than a focus on a single transmission from one node to another. For example, the amount that a person smokes is generally considered as the aggregated effect of social influence from all his or her friends (and their behavior or attitudes), and therefore models similar to Model 1.4 might be more appropriate in such a case.

Network Emergence at the Nodal Level (Model 1.3)

At the nodal level, both the antecedents and consequences of nodal network positions have been widely studied. Focusing specifically on humans as units of analysis, the social support and the social capital literature has been particularly influential (e.g., Wellman, 1979; Lin, Vaughn, & Ensel, 1981; Sarason et al., 1983; Burt, 1984; Campbell, Marsden, & Hurlbert, 1986; Marsden, 1987, 1990; Borgatti, Jones, & Everett, 1998; Lin, Cook, & Burt, 2001). With regard to the antecedents of network positions (Model 1.3), classic studies primarily concentrated on explaining the size of a person’s network (i.e., degree centrality) or the characteristics of a person’s direct contacts (e.g., how dissimilar or diverse the alters of a person are with regard to a specific attribute). Demographic and other individual (nodal) characteristics of the person, such as age, gender, socioeconomic status, educational level, ethnicity, and religion, have been commonly used to explain differences in size, in the level of homophilous connections, and in the heterogeneity of a person’s alters with a particular focus on different types of emotional and instrumental support (Wellman, 1979; Burt, 1984; Sarason et al., 1983; Moore, 1990; Ibarra, 1992; van Emmerik, 2006; Fischer, 1982). A more recent stream of research that fits within this general framework has focused on the reasons that some individuals take on a brokerage role in a network, that is, why they tend to connect with others who are themselves not connected (Burt, 1992). In this respect the focus has been especially on the impact of personality (e.g., Burt, Jannotta, & Mahoney, 1998; Oh & Kilduff, 2008) and strategic orientations (e.g., tertius gaudens and tertius iungens; Obstfeld, 2005).
For example, Kalish and Robins (2006) have studied the effects of the Big Five personality traits on the preference to be surrounded by open versus closed triadic structures, while Mehra, Kilduff, and Brass (2001) have studied the effects of self-monitoring on the level to which a person is high on betweenness centrality.11 These studies fit perfectly with Model 1.3 (arrow a), whereas Model 1.3 (arrow b) focuses on explanations where the network position of a node in one network might impact its position in another network. For example, the centrality in the collaboration network might impact the centrality in the friendship network.12 Methodologically, both ego network data and complete network data have been widely used to explain why nodes end up in a specific network position. While the generalizability of the conclusions that can be drawn from sampled ego networks is a clear benefit of such a design, only information about ego’s direct contacts tends to be available and this information tends to be based on self-reports. Hence, only ego network–based measures such as degree and ego betweenness can generally be calculated (see Perry, Pescosolido, & Borgatti, 2018). Conversely, when network information is collected about the ties between all nodes in a group (i.e., complete network data), the indirect ties between ego and all the other nodes in a group can be calculated and therefore more complex network measures of position involving geodesic distance or walks (e.g., closeness and betweenness centrality) can be used as a dependent variable. However, because the network position is calculated for all nodes in a specific group, generalization beyond the group could be problematic (see the section on “Multiple Groups and Multilevel Models”). The choice between an ego network and a complete network approach also has important implications for the choice of statistical method. With ego networks, standard statistical techniques, such as linear regression, are often applied. However, since ties are nested in nodes, a multilevel approach might be more appropriate to disentangle within-egos and between-egos variance when studying the impact of nodal attributes and prior network position on outcomes (van Duijn, van Busschbach, & Snijders, 1999). Similarly, in case complete network data are used, it might, on certain occasions, be more appropriate to perform a dyadic analysis (as proposed by arrow b in Model 1.1) rather than simple nodal-level analysis, especially when interested in explaining measures of nodal position such as degree centrality. This is because many of these measures of position are actually based on some additive aggregation of the direct and indirect ties between a focal node and all other nodes (Bloch, Jackson, & Tebaldi, 2017).13

Network Consequences at the Nodal Level (Model 1.4)

Regarding the consequences of an individual node’s network position (Model 1.4), the main measures being considered again include degree centrality, closeness centrality, betweenness centrality, and the constraint index.14 To explain how nodal-level network properties impact outcomes, three broad types of arguments can roughly be identified (cf. Brass, 1984). One main argument focuses on how networks provide access to specific resources, attitudes, and values (Brass, 1984).15 Such an approach tends to focus on nodal-level properties that concentrate on reach (cf. Borgatti, 2005; Agneessens, Borgatti, & Everett, 2017). Measures can focus on direct reach, such as degree centrality (e.g., when focusing on emotional support; cf. Haines & Hurlbert, 1992; Thoits, 1982), or on indirect flows, such as closeness centrality (e.g., when focusing on easily transferable knowledge about where to find a job; cf. Granovetter, 1973). Measures can also focus on shortest paths only (degree and closeness) or incorporate all walks (Borgatti, 2005). A second broad approach focuses on the power and control that result from being in a brokerage position between others (e.g., Brass, 1984; Burt, 1992). Such brokerage roles or structural hole positions have been captured by a number of measures, including the constraint index (Burt, 1992), that is, the amount of open triadic structures one is part of, and betweenness centrality (Freeman, 1979). For example, in one classic study Burt (2000) showed how managers who are in such a brokerage position tend to get promoted faster, while in another study Burt showed how such positions are related to higher creativity (Burt, 2004).

Finally, a third major stream of research, already discussed when covering Model 1.2, focuses on social contagion of attitudes and behavior, such as the contagion of smoking behavior (Mercken et al., 2009), criminal behavior (Baerveldt, Völker, & Van Rossem, 2008), or job satisfaction (Agneessens & Wittek, 2008). To explain how a specific network position affects nodal-level outcomes, standard statistical techniques, such as linear regression, are quite popular. For ego network data, such an approach might seem reasonable since the dependent variable is measured at the nodal level (McCallister & Fischer, 1978; Burt, 1984; Marsden, 1987, 1990; see Perry et al., 2018 for more on ego networks). However, when complete network data are used, the network position is calculated for all nodes in a group and used to predict their nodal-level outcome. Since these outcomes are not independent of each other, the regression model needs to take this complex interdependence into account. For example, when wanting to predict how the number of friends impacts one’s happiness, the value for ego’s happiness (Yego) might be dependent on the happiness of ego’s connections (Yalter1, Yalter2, etc.), while at the same time the happiness for ego’s connections (Yalter1, Yalter2, etc.) might be dependent on the value of happiness for ego (Yego). Network autoregression models have been proposed to incorporate such recursive effects explicitly (Doreian, Teuter, & Wang, 1984; Marsden & Friedkin, 1993; Leenders, 2002). Alternatively, longitudinal models (see later in this chapter) and a number of other influence models have been proposed (see Valente, 2005; cf. Mouw, 2006).
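
The recursive dependence between ego's and alters' outcomes can be illustrated with the commonly used network effects form y = ρWy + βx + ε, where W is a row-normalized adjacency matrix and Wy is the average outcome of each node's alters. The sketch below is an illustration under stated assumptions and not an estimation routine: the four-node friendship network, the parameter values, and the function names are hypothetical, and the error term is set to zero. It shows only how each node's happiness depends on its friends' happiness and vice versa, by solving the system through repeated substitution.

```python
def row_normalize(adj):
    """Divide each row by its sum, so W[i][j] is alter j's share of ego i's ties."""
    w = []
    for row in adj:
        total = sum(row)
        w.append([v / total if total else 0.0 for v in row])
    return w


def exposure(w, y):
    """Network lag term (W y): the average outcome of each ego's alters."""
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]


def solve_by_iteration(w, base, rho, tol=1e-12, max_iter=10_000):
    """Solve y = rho * W y + base by repeated substitution (assumes |rho| < 1)."""
    y = list(base)
    for _ in range(max_iter):
        wy = exposure(w, y)
        new_y = [rho * a + b for a, b in zip(wy, base)]
        if max(abs(a - b) for a, b in zip(new_y, y)) < tol:
            return new_y
        y = new_y
    raise RuntimeError("did not converge")


# Hypothetical friendship network among four people (symmetric, binary).
ADJ = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
N_FRIENDS = [sum(row) for row in ADJ]  # nodal predictor: degree centrality

# Assumed parameter values, chosen only for illustration.
INTERCEPT, BETA, RHO = 2.0, 1.0, 0.5

W = row_normalize(ADJ)
base = [INTERCEPT + BETA * x for x in N_FRIENDS]
happiness = solve_by_iteration(W, base, RHO)
print([round(v, 3) for v in happiness])
```

Because every outcome appears on both sides of the equation, an ordinary regression of happiness on the number of friends would ignore part of the dependence; estimating ρ and β from observed data requires dedicated methods such as those in the literature cited above.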

Network Emergence at a Group Level (Model 1.5)

At the group level, studies focusing on the antecedents of network structures (Model 1.5) have primarily concentrated on one of three types of structures: (1) cohesion as measured by density, (2) centralization and hierarchy, and (3) fragmentation and subgroups. Research on teams and in school classes has often used aggregated nodal-level characteristics, such as age, gender, educational level, and expertise, to explain the emergence of such network structures (e.g., by taking the average or standard deviation of such nodal-level attributes; see Harrison & Klein, 2007). For example, Reagans, Zuckerman, and McEvily (2004) found an effect of demographic diversity on network density, while Balkundi et al. (2007) concentrated on the effect of age diversity on the proportion of structural holes in a team. Besides aggregated nodal-level characteristics, actual group-level properties, such as the size of the group, and properties of the group leader or the teacher in a school class could also be used to explain the group-level structure (arrow a in Model 1.5). Finally, studies might also incorporate other networks as an independent variable to explain, for example, how a centralized collaboration network leads to a centralized friendship network, or how a high level of subgroup formation based on friendship cliques might result in a higher number of conflict ties, especially between such subgroups (arrow b in Model 1.5). In practice, most studies have tried to explain basic structural properties, such as density, while far less attention has been paid to the antecedents of more complex group-level structural properties, such as core-periphery structures. Part of the reason for this is that the emergence of such group-level structures is hard to understand when the focus is solely on group-level antecedents and when no attention is given to dyadic- or nodal-level mechanisms (see the section on “Macro-Micro-Macro Models”).
From a methodological point of view, groups are the basic units of analysis in such group-level studies, and therefore intragroup network data need to be collected among a sufficiently large number of groups to be able to explain the occurrence of a specific group-level network structure. Provided a sufficiently large random sample of groups is selected and individual nodes are not members of multiple groups,16 standard statistical techniques, such as linear regression, can be used with group-level network constructs as the dependent variable.

Network Consequences at a Group Level (Model 1.6)

Similar to the studies on group-level network antecedents, the prevalent structural properties found in studies investigating the consequences of network structure on group-level outcomes (Model 1.6) are cohesion (density), centralization, and fragmentation. Using a social capital perspective, the concept of group cohesion (e.g., Mullen & Copper, 1994) has been widely studied, linking a high network density with a range of positive outcomes. As an example, Sparrowe et al. (2001) found that a high density for advice in groups had a positive effect on their performance. In one intriguing classic network study Bavelas (1950) and Leavitt (1951) showed how a more centralized communication structure had a positive effect on efficiency, but a negative impact on the team members’ average satisfaction with the task (cf. Shaw, 1954; Shore, Bernstein, & Lazer, 2015). Besides density and centralization, studies have considered subgroup formations and the amount of brokerage in a group as an explanation for group outcomes (Balkundi et al., 2007).17 To study such models, standard statistical techniques, similar to those discussed previously for Model 1.5, can be used provided enough groups have been randomly sampled and there is no overlap in membership across groups. While the antecedents and especially the consequences of networks have been widely studied from a group-level perspective, it is worth noting that the network characteristics that scholars have focused on in such cases have been predominantly restricted to constructs such as density, centralization, and, to a lesser extent, fragmentation. However, because of the aggregated nature of many group-level properties (such as density being an aggregate of dyadic network relations and diversity being an aggregate of nodal characteristics), group-level analysis may not always be the correct level to analyze the antecedents and consequences of network structure.
Given the danger of “fallacies of the wrong level” (Rousseau, 1985) and in particular of the ecological fallacy, in some cases these macro- (group-level) processes can be better understood from a dyadic or nodal perspective. For example, a centralized friendship network might emerge from a centralized collaboration network because the central node in the collaboration network also becomes central in the friendship network, which requires a nodal-level or even dyadic-level analysis. This concern is discussed in more detail in the section on “Macro-Micro-Macro Models”.

Variations and Extensions of the Six Basic Models

While a considerable amount of social network research might fit one of the six basic types of models, a growing number of studies have incorporated network data in a more advanced and complex way, by building on and extending the six basic models discussed before. In this section some of these more complex types of models are discussed.

Network Mediation Models

One important way in which the aforementioned models have been extended is by incorporating social networks as a mediator. Such network mediation models can provide insights into the social processes and interactions that take place in a group, that is, how specific initial states might lead to particular outcomes. Researchers interested in group composition and diversity (in terms of demographic characteristics, personality, values, and expertise) have become particularly interested in the mediating role of social relations (e.g., K. G. Smith et al., 1994; Pfeffer, 1997; Reagans et al., 2004). By focusing on the communication, trust, (dis)liking, and conflict ties that emerge between group members, the social network approach helps the researcher uncover exactly how the composition of the group affects outcomes, such as well-being, performance, or the generation of innovative ideas. In team research this line of inquiry is commonly referred to as the intervening model (K. G. Smith et al., 1994; Pfeffer, 1997) or the input-process-output model (see Marks, Mathieu, & Zaccaro, 2001; cf. Palardy, 2008, for a similar argument concerning educational research). For example, teams with low age diversity might exhibit a greater proportion of structural holes, and such structural holes might in turn have important performance implications (Balkundi et al., 2007). Similarly, transformational leadership in a group might affect team performance through the advice-seeking behavior that emerges among team members (Zhang & Peterson, 2011). Such studies are examples of group-level network mediation models (Model 2.6, arrow a in Figure 11.2). Other examples of studies that fit with Model 2.6 might focus on how prior network structures (arrow b), such as the level of centralization of the workflow network, might impact the centralization or density of the (focal) friendship network, which in turn might inhibit or enhance group performance.
At the nodal level (Model 2.4), studies such as those by Fang et al. (2015) have explored how personality traits of employees impact job performance and career success through the centrality and brokerage role they achieve in the expressive and instrumental networks of these organizations (arrow a). Similarly, the position of employees in the formal workflow network might affect their centrality in the (focal) friendship network, and this in turn might impact their happiness with their job (arrow b). Core to both these examples is the idea that the impact of individual-level attributes or prior network positions on individual outcomes is mediated through the position these nodes acquire in a focal network. An example of network mediation at a dyadic level (Model 2.2) could focus on the similarity in values between two nodes (arrow a) or, alternatively, the level to which both nodes are required to work together (arrow b), and how this might generate specific network relations (such as friendship), which then result in the transmission of specific outcomes (e.g., the spread of specific gossip). Whichever the level of analysis, what is noticeable is that these network mediation models tend to assign a rather passive and even deterministic role to networks (i.e., being a product of specific attributes or prior ties), which subsequently produces a specific outcome.18
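
A group-level mediation model of the kind described for Model 2.6 can be sketched with the familiar product-of-coefficients logic: path a (group characteristic X to network mediator Z), path b (Z to outcome Y, controlling for X), and the indirect effect a × b. The code below is only an illustration. The six teams and their values for diversity, density, and performance are invented toy numbers, the function names are hypothetical, and the sketch ignores inference (for example, standard errors for the indirect effect) as well as the sampling requirements discussed earlier for group-level models.

```python
def mean(values):
    return sum(values) / len(values)


def cross_products(u, v):
    """Sum of centered cross-products of two equally long lists."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def slope(x, y):
    """OLS slope of y on a single predictor x (with intercept)."""
    return cross_products(x, y) / cross_products(x, x)


def partial_slopes(x, z, y):
    """OLS slopes of y on x and z jointly (two-predictor normal equations)."""
    sxx, szz, sxz = cross_products(x, x), cross_products(z, z), cross_products(x, z)
    sxy, szy = cross_products(x, y), cross_products(z, y)
    det = sxx * szz - sxz ** 2
    b_x = (sxy * szz - szy * sxz) / det
    b_z = (szy * sxx - sxy * sxz) / det
    return b_x, b_z


# Invented toy values for six teams (one value per team).
DIVERSITY = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]    # X: group-level characteristic
DENSITY = [0.80, 0.70, 0.75, 0.50, 0.55, 0.40]      # Z: group-level network structure
PERFORMANCE = [7.9, 7.0, 7.6, 5.8, 6.1, 5.0]        # Y: group-level outcome

a = slope(DIVERSITY, DENSITY)                        # path X -> Z
direct, b = partial_slopes(DIVERSITY, DENSITY, PERFORMANCE)  # X -> Y and Z -> Y
indirect = a * b                                     # effect of X on Y through Z
total = slope(DIVERSITY, PERFORMANCE)                # equals direct + indirect

print(round(a, 3), round(b, 3), round(indirect, 3), round(direct, 3))
```

In ordinary least squares the total effect decomposes exactly into the direct and indirect parts, which gives a simple internal check on the calculation. The same logic applies at the nodal level (Model 2.4), subject to the interdependence concerns raised for complete network data.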

[Figure 11.2, panel contents. Network antecedents: Models 2.5, 2.3, and 2.1. Network consequences: Models 2.6, 2.4, and 2.2. Z: Mediator.]

GROUP LEVEL

MODEL 2.5: Mediation Model for Group-Level Network Antecedents
  X: Group-level characteristics (arrow a)
  X: Group-level network structure (arrow b)
  Z: Group-level network structure
  Y: Group-level network structure

MODEL 2.6: Mediation Model for Group-Level Network Consequences
  X: Group-level characteristics (arrow a)
  X: Group-level network structure (arrow b)
  Z: Group-level network structure
  Y: Group-level outcome

NODAL LEVEL

MODEL 2.3: Mediation Model for Nodal-Level Network Antecedents
  X: Nodal-level characteristics (arrow a)
  X: Nodal-level network position (arrow b)
  Z: Nodal-level network position
  Y: Nodal-level network position

MODEL 2.4: Mediation Model for Nodal-Level Network Consequences
  X: Nodal-level characteristics (arrow a)
  X: Nodal-level network position (arrow b)
  Z: Nodal-level network position
  Y: Nodal-level outcome

DYADIC LEVEL

MODEL 2.1: Mediation Model for Dyadic-Level Network Antecedents
  X: Dyadic-level characteristics (arrow a)
  X: Dyadic-level connectedness (arrow b)
  Z: Dyadic-level connectedness
  Y: Dyadic-level connectedness

MODEL 2.2: Mediation Model for Dyadic-Level Network Consequences
  X: Dyadic-level characteristics (arrow a)
  X: Dyadic-level connectedness (arrow b)
  Z: Dyadic-level connectedness
  Y: Dyadic-level outcome

Figure 11.2  Network mediation models at the dyadic, nodal, and group levels.

Network Moderation Models While network mediation models might tend to minimize the autonomous role of social interactions, some researchers have taken a somewhat different approach by considering the moderating role of a focal network. Instead of considering network relations as simple consequences of a specific initial state that subsequently generates specific outcomes, in network moderation models nodes and their social relations play a far more active role in shaping and changing the way the initial state has an impact on specific outcomes, that is, by moderating this relationship (see Figure 11.3). At a group level (Model 3.6), the moderation in such a model can involve a simple interaction between, for example, the gender diversity in a group and some network structure (arrow b) or an interaction between two network structural properties (arrow a). For example, Tröster, Mehra, and Van Knippenberg (2014) show that in culturally diverse groups, more than in culturally homogeneous groups, a higher average workflow between group members (i.e., a high network density) predicts greater team success (arrow b). Considering density and centralization, Grund (2012) observed that in English soccer teams high intensity of interaction and low centralization were associated with better team performance (arrow a). However, this interaction can also take a more complex form, for example, capturing the number of ties crossing faultlines (Ren, Gray, & Harrison, 2015) or the level to which the most central actor in a centralized network possesses specific attributes (e.g., leadership qualities). At an individual level (Model 3.4), the position in a network can interact with either other networks (arrow a) or individual characteristics (arrow b). For example, following the buffering hypothesis, being surrounded by a large number of friends (i.e., social support) might buffer the negative effect of stressful events on well-being (Cohen & Wills, 1985). Focusing on the position of employees in both the formal and the informal network (and the overlap between both), Soltis et al. (2013) found that being approached for advice by many colleagues with whom they are also required to work increased employees’ turnover intentions, while being able to seek advice from colleagues with whom they are not required to work decreased their turnover intentions. Finally, at a dyadic level (Model 3.2), a friendship relation might be more likely to lead to the transfer of specific information between two students when these students hold similar background characteristics. Network moderation models could also aim to explain the emergence of networks at a dyadic, individual, or group level, while using other network properties or network relations as moderators (Models 3.1, 3.3, or 3.5, respectively). For example, at a dyadic level, Grosser, Lopez-Kidwell, and Labianca (2010) found that a gossip tie is more likely to emerge between employees who are required to work together while also being friends (arrow b in Model 3.1).

[Figure 11.3  Network moderation models at the dyadic, nodal, and group levels. Panels, arranged by level and by a focus on network antecedents or network consequences: Model 3.1, moderation model for dyadic-level network antecedents; Model 3.2, moderation model for dyadic-level network consequences; Model 3.3, moderation model for nodal-level network antecedents; Model 3.4, moderation model for nodal-level network consequences; Model 3.5, moderation model for group-level network antecedents; Model 3.6, moderation model for group-level network consequences. In each panel, M denotes the moderator, X the independent variable (a characteristic or a network property), and Y the outcome.]
What unites these moderation models is the idea that networks may play a critical role in when and how a prior state has an impact on specific outcomes by accentuating, buffering, or even reversing the relationship between two variables. Hence, compared to the mediation models focusing on the why question, this approach allows for a more independent role of networks.
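The idea of accentuating, buffering, or reversing a relationship can be made concrete with a single interaction term. The sketch below assumes a linear model of the form y = b0 + b_x·x + b_m·m + b_xm·x·m, where m is the network moderator. The function name and the coefficient values are hypothetical and serve only to illustrate the buffering example (stressful events, number of friends, well-being); they are not estimates from any cited study.

```python
def conditional_slope(b_x, b_xm, m):
    """Slope of the outcome on x when the moderator equals m,
    for the model y = b0 + b_x*x + b_m*m + b_xm*x*m."""
    return b_x + b_xm * m

# Hypothetical coefficients: stress lowers well-being (b_x < 0), and each
# additional friend weakens that effect (b_xm > 0).
b_stress = -0.8
b_stress_x_friends = 0.2

for friends in (0, 2, 4, 6):
    slope = conditional_slope(b_stress, b_stress_x_friends, friends)
    print(friends, round(slope, 3))
```

Under these assumed values the negative slope shrinks as the number of friends grows (buffering) and changes sign for large enough values of the moderator (reversal). A negative `b_xm` would instead accentuate the effect.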

Network Coevolution Model The models discussed so far assume that a clear causal direction can be identified between the focal network and the other networks or attribute variables; that is, either the focal network is an independent variable (i.e., a focus on network consequences) or it is a dependent variable (i.e., a focus on network antecedents). However, in many research settings the causal direction might be open for discussion and instead longitudinal models might be needed to try to uncover the true direction of the causality. To illustrate this, consider smoking behavior among students in a school. Smokers might be more likely to nominate other smokers as friends, while nonsmokers might be inclined to nominate nonsmokers. However, the existence of a high level of homophily can be the result of (1) a tendency of friendship relations emerging between students who are similar with regard to their smoking behavior (i.e., homophily as a result of social selection) or (2) a tendency for students to be influenced in their smoking behavior by their friends (i.e., homophily as a result of social contagion) (e.g., Mercken et al., 2010; see also McPherson, Smith-Lovin, & Cook [2001] and Shalizi & Thomas [2011] for a broader discussion regarding homophily). In such a case network emergence and network consequence need to be modeled simultaneously (Snijders, 2011) and data about smoking and friendship ties need to be collected at multiple time points.

From a network perspective, two important types of models can be identified: (1) models focusing on the coevolution of a network and an attribute (Figure 11.4, right) and (2) models focusing on the coevolution of two networks (Figure 11.4, left). At a group level, the question could be whether a high level of communication in a group (i.e., high density) generates more innovative ideas in this group or whether, on the other hand, a significant number of new ideas in the group lead to a higher communication density in the group (Model 4.6). Focusing on the coevolution of two networks (networks A and B), the question could be whether the network structure for A impacts the network structure for B, or vice versa (cf. Model 4.5). Provided a sufficient number of groups are available, standard longitudinal models could be used to model the coevolution where the group is the unit of analysis (e.g., Duncan, Duncan, & Strycker, 2006; Preacher et al., 2008).

[Figure 11.4  Coevolution models at the dyadic, nodal, and group levels. Left column, coevolution of two networks: Model 4.1 (dyadic level), Model 4.3 (nodal level), and Model 4.5 (group level), each relating two distinct network relations A and B at time 1 and time 2. Right column, coevolution of network and attribute: Model 4.2 (dyadic level), Model 4.4 (nodal level), and Model 4.6 (group level), each relating structure and attribute at time 1 and time 2. Model 4.7 combines the dyadic and nodal levels for the coevolution of network and attribute.]
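For the group-level case (Model 4.6), one common way to write such a longitudinal model is a two-wave cross-lagged specification. The notation below is an assumption introduced for illustration, with D for communication density, I for the number of new ideas, and g indexing groups. It sketches the general idea and is not taken from the sources cited above.

```latex
\begin{aligned}
D_{g,2} &= \alpha_D + \rho_D\, D_{g,1} + \lambda_{ID}\, I_{g,1} + \varepsilon_{g} \\
I_{g,2} &= \alpha_I + \rho_I\, I_{g,1} + \lambda_{DI}\, D_{g,1} + \eta_{g}
\end{aligned}
```

In this notation, a nonzero cross-lagged coefficient \lambda_{DI} would be consistent with density preceding new ideas, a nonzero \lambda_{ID} with new ideas preceding density, and both may hold at once.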

At a dyadic or nodal level, two distinct types of models could be identified that combine networks and attributes (Models 4.2 and 4.4). However, in practice, stochastic actor-oriented models (SAOMs; Snijders et al., 2010; Steglich, Snijders, & Pearson, 2010; Ripley et al., 2019) have become the most popular approach to test such coevolution models between network and behavior (i.e., Model 4.7). These models tend to use a combination of (1) a nodal-level focus to model whether changes in behavior and attitudes are the result of network position (Model 1.4) and (2) a dyadic-level type of focus to model whether behavior and attitudes drive tie formation (Model 1.1).19 Studies in education research, for example, have looked at the coevolution of friendship networks and alcohol use (Mundt, Mercken, & Zakletskaia, 2012), delinquent behavior (Baerveldt et al., 2008), and many other factors (see Veenstra et al., 2013, for an overview). Organizational scholars have considered the coevolution of perceived psychological safety and advice/friendship relations among team members (Schulte, Cohen, & Klein, 2012) or how attempts to control others’ behavior are impacted by competence and affect-based status (de Klepper et al., 2017). For the coevolution of two networks, a dyadic-level type of focus between two or more networks (Model 4.1) using SAOMs is most widely used. Examples of models focusing on multiple networks include the coevolution of friendship and gossip in organizations (Ellwardt, Steglich, & Wittek, 2012) and the coevolution between the bullying network and the defending network among pupils in elementary school (Huitsing et al., 2014). As insights into the causal mechanisms are crucial in social sciences, network coevolution models have become a useful vehicle for uncovering the underlying causality.

Multiple Groups and Multilevel Models for Dyadic and Nodal-Level Analysis One important question that has so far not been systematically addressed in this chapter is whether to collect intragroup network data from a single group or from a set of groups when performing dyadic- or nodal-level network analysis. Obviously, group-level models always require intragroup network data from a sufficiently large number of groups to perform any statistical analysis. However, for dyadic- or nodal-level models, network data from a single group with a sufficient number of nodes, such as a single school class or a single organization, will be perfectly suitable if one wants to solely draw conclusions about that specific group (e.g., whether in that specific school class, the students central in the friendship network have higher grades than the less central students). However, data from multiple groups are needed in any of the following conditions: (1) to make generalizable statements beyond a specific group about dyadic-level or nodal-level processes, (2) to test whether group-level variables might actually explain dyadic-level or nodal-level outcomes, and/or (3) to test whether group-level variables moderate dyadic-level or nodal-level processes (cf. Snijders, 2016). Each of these arguments is discussed in more detail next.20

Generalizability First, scholars often aim to make generalizable statements about network emergence and/or its consequences beyond a specific context (Entwisle et al., 2007; Snijders, 2016). To ensure that the conclusions made are not an idiosyncratic result of a single group-specific context, one needs to examine a random sample of groups or even the full population of groups to make some statement about them. One approach to find general patterns across a set of groups is to merge all data into one dataset and perform one analysis (e.g., an ERGM, SIENA, or autoregressive analysis) taking into account that only relations within groups exist (Wang, Robins, & Pattison, 2009; Ripley et al., 2019). This approach of merging data from multiple groups into one analysis is particularly useful when the size of each group is very small, and therefore these groups do not have enough power to be analyzed separately. However, the consequence is that only an overall pattern emerges and no information is available regarding any potential variation in results between groups (i.e., homogeneity in social processes across groups is assumed). Alternatively, when the size of each group is large enough to perform a separate analysis, each group could be analyzed in turn (e.g., using ERGM) and a classic meta-analysis can be used to test for an overall effect, as well as to test for potential variation in effects across groups (e.g., Snijders & Baerveldt, 2003). This approach has been widely used in educational research aimed at finding overall patterns across schools or school classes (e.g., Lubbers, 2003; Knecht et al., 2011; Schaefer et al., 2011; Mercken et al., 2012; Huitsing et al., 2014). Again, the focus of this approach is on finding overall effects across groups, and while the approach does allow testing for differences in results across groups, these models generally do not incorporate any group-level effects, nor do they try to explain these potential differences (see An [2015] for extensions of this approach).
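The second strategy, analyzing each group separately and then combining the results, can be sketched with a generic fixed-effect, inverse-variance pooling of per-group estimates. This is a textbook simplification and not the specific procedure used in the studies cited above. The function name and the example estimates and standard errors are hypothetical.

```python
def pool_estimates(estimates, std_errors):
    """Fixed-effect (inverse-variance) pooling of per-group estimates.

    Returns the pooled estimate, its standard error, and Cochran's Q,
    a statistic for variation in the effect across groups.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = (1.0 / total) ** 0.5
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, q

# Hypothetical homophily estimates from four school classes.
class_estimates = [0.20, 0.35, 0.10, 0.40]
class_std_errors = [0.10, 0.15, 0.12, 0.20]

pooled, pooled_se, q = pool_estimates(class_estimates, class_std_errors)
print(round(pooled, 3), round(pooled_se, 3), round(q, 3))
```

Under the assumption that the effect is the same in every group, Q is usually compared against a chi-square distribution with one fewer degree of freedom than the number of groups. A large Q suggests that the effect varies between groups, which is the situation the following sections address.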

Group-Level Effects However, the dyadic- or nodal-level outcomes found across a set of groups might in fact be due to group-level network or attribute characteristics. For example, when studying students across a set of school classes, students’ friendship network centrality might seem to be related to their grades, whereas in reality the true differences in grades might be between school classes, with all students in more dense school classes having higher grades. In other words, there might be a group-level contextual variable (i.e., a group-level network structure, rather than the nodal-level network position) that impacts the nodal-level outcomes (Agneessens & Koskinen, 2016). In such a case, a proper multilevel approach is required with group-level variables as additional explanatory variables and a large number of groups (Snijders, 2016; Tasselli, Kilduff, & Menges, 2015). The main types of multilevel network models can be found in Figure 11.5. In these models, arrow b represents the effect at the dyadic or nodal level, while arrow c represents the effect of the group level on the dyadic-level or nodal-level outcome. Finally, arrow a represents the potential link between the dyadic-level or nodal-level independent variable and the group-level independent construct, since the group-level construct might be an aggregation of dyadic-level and nodal-level constructs (such as density being the aggregate of degree or the ethnic diversity of a group being the aggregate of the ethnicity of its members). For example, combining dyadic-level and group-level factors to predict dyadic-level network ties, Tolsma et al. (2013) found that bullying between pupils is not more prevalent among ethnically mixed pairs of students than it is among nonmixed pairs. However, they did find that bullying ties are more likely to emerge when the classroom is more ethnically diverse. This example fits within Model 5.3, with arrow b representing the effect of similarity in ethnicity and arrow c representing the effect of diversity on network emergence. Note that arrow a represents the correspondence between the independent variables at two levels: similarity in ethnicity between two pupils (dyadic level) and ethnic diversity (group level). Similarly, Model 5.2 could consider how a dyadic friendship tie might be the result of either the strength of the workflow between both employees (arrow b) or the workflow density in the group (arrow c), with arrow a reflecting the fact that density is the aggregation of the dyadic workflow strength across all pairs of employees in the group. To include group-level factors in dyadic-level models, a more integrated multilevel approach has been proposed (Models 5.2 and 5.3) (Ripley et al., 2019; Snijders, 2016), while multilevel autoregressive models have been developed that can combine group-level (e.g., centralization) and nodal-level structures (e.g., degree) to predict nodal outcomes, such as job satisfaction (Model 5.6) (Agneessens & Koskinen, 2016). These are some examples of the six types of multilevel models.

[Figure 11.5  Multilevel network models at the dyadic, nodal, and group levels. Panels: Model 5.1, multilevel model with group and nodal-level network to network; Model 5.2, multilevel model with group and dyadic-level network to network; Model 5.3, multilevel model with group and dyadic-level antecedents for network; Model 5.4, multilevel model with group and dyadic-level consequences for network; Model 5.5, multilevel model with group and nodal-level antecedents for network; Model 5.6, multilevel model with group and nodal-level consequences for network. Each panel links a group-level X and Y and a dyadic- or nodal-level X and Y through arrows a, b, c, d, e, and g.]
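The aggregation that arrow a stands for can be made explicit with a small sketch: dyadic ties add up to nodal degrees, degrees add up to group density, and a nodal attribute adds up to a group-level diversity score. The functions below assume an undirected, unweighted network given as a list of ties. Blau's index is used as one common diversity measure, which is a choice made here for illustration and not one prescribed by the chapter. All names and data are hypothetical.

```python
from collections import Counter

def degrees(n, ties):
    """Degree of each of n nodes, given undirected ties as (i, j) pairs."""
    deg = [0] * n
    for i, j in ties:
        deg[i] += 1
        deg[j] += 1
    return deg

def density(n, ties):
    """Density as the sum of degrees over the maximum possible sum, n*(n-1)."""
    return sum(degrees(n, ties)) / (n * (n - 1))

def blau_index(categories):
    """Blau's diversity index: 1 minus the sum of squared category shares."""
    n = len(categories)
    return 1.0 - sum((count / n) ** 2 for count in Counter(categories).values())

# Hypothetical group of four members connected in a line: 0 - 1 - 2 - 3.
friendship = [(0, 1), (1, 2), (2, 3)]
ethnicity = ["a", "a", "b", "b"]

print(degrees(4, friendship))  # nodal level
print(density(4, friendship))  # group level, aggregated from degrees
print(blau_index(ethnicity))   # group level, aggregated from a nodal attribute
```

Because the group-level quantities are computed directly from the lower-level ones, an association found at the group level can reflect either a genuine group-level process or the sum of dyadic or nodal processes.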


Cross-Level Interaction Dyadic-level or nodal-level processes might also turn out to be dissimilar for different groups, in which case the question could be asked if these effects might be dependent on the group-level context. This requires a multilevel model that incorporates a cross-level interaction effect, represented by arrow d in the multilevel models in Figure 11.5. In such cases, the size and potentially even the direction of the dyadic-level or nodal-level effect(s) in such models (arrow b) are assumed to be contingent on the group-level context. For example, studying the implementation success of a new system by employees in an organization, Sasidharan et al. (2012) found that the new system was more likely to be implemented among employees in decentralized groups. However, the study also found that in centralized groups the central employees were most likely to apply the new system (Model 5.6; arrow d). Similarly, combining a dyadic and a group level, Mehra, Kilduff, and Brass (1998) have studied the effect of the size of the minority on ethnic homophily (Model 5.3, arrow d). Hence, such multilevel approaches allow one to not only make statements that are valid across groups (generalizability) but also consider how, by incorporating group-level effects, these contextual factors might affect nodal-level and dyadic-level outcomes, and how these contextual factors might change the direction of nodal-level and dyadic-level effects (cross-level interaction effects).
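In conventional multilevel notation, which is an assumption introduced here and not the chapter's own, the three effects discussed above can be written for node i in group j, with a nodal-level predictor x (e.g., centrality) and a group-level predictor w (e.g., centralization):

```latex
\begin{aligned}
y_{ij} &= \beta_{0j} + \beta_{1j}\, x_{ij} + \varepsilon_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\, w_{j} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\, w_{j} + u_{1j}
\end{aligned}
```

Read this way, \gamma_{10} corresponds roughly to the nodal-level effect (arrow b), \gamma_{01} to the group-level contextual effect (arrow c), and \gamma_{11} to the cross-level interaction (arrow d). This standard form treats nodes within a group as conditionally independent, which network data typically violate; the multilevel network models cited above are designed to address that dependence.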

Macro-Micro-Macro Models A final major extension concerns the group-level processes, i.e., Models 1.5 and 1.6, and the possible risk of ecological fallacy (Rousseau, 1985; Hitt et al., 2007). Problems of ecological fallacy in network research might occur when an assumed group-level process (i.e., a macro-process) should be analyzed at a dyadic or nodal level (i.e., using a micro-process) (cf. Coleman, 1986; Alexander, 1987). This problem is most apparent in studies where the group-level network properties and the group-level attribute constructs are, in fact, aggregates of dyadic-level or nodal-level data (cf. Tasselli et al., 2015; Brass & Borgatti, 2019). To illustrate this, consider a situation where groups with low gender diversity were found to exhibit a higher density for friendship than high-diversity groups (Model 1.5). Since the group-level diversity is an aggregate construct of a nodal attribute (cf. Harrison & Klein, 2007) and the network density is an aggregate of dyadic friendship ties, two plausible reasons can be put forward to explain the macro-results (Figure 11.6). The emerging network density in low-diversity groups might indeed come about because groups with high diversity on gender tend to generate fewer friendship ties (i.e., a group-level effect in line with the basic Model 1.5). On the other hand, the differences in density between different groups might also be the result of dyadic-level or nodal-level processes, such as a higher tendency for friendship ties to emerge between people of the same gender (i.e., the result of a dyadic-level homophily effect in line with Model 1.1). In the case of a group-level effect, there would be a higher tendency for tie formation among both homophilous and heterophilous dyads in low-diversity groups, and a lower tendency for ties to form among homophilous and heterophilous dyads in high-diversity groups as illustrated in Situation 1 in Figure  11.6. 
[Figure 11.6  Different effects of group diversity on network density: group-level diversity versus dyadic-level homophily effect. Panels: Situation 1, group-level effect; Situation 2, dyadic-level effect.]

However, in the case of a dyadic-level homophily effect, ties would be primarily formed among nodes with the same attribute and less among heterophilous dyads. Since groups that are less diverse have more opportunities to form homophilous dyads, more ties will once again be formed in low-diversity groups than in high-diversity groups, but in this case the real reason is a preference to form homophilous ties, as Situation 2 in Figure 11.6 illustrates. While in both cases low levels of diversity would generate a higher density at the group level, what differentiates the two cases is that the ties emerge in different places in the respective groups. A pure group-level model (Model 1.5) would not be able to distinguish between the two processes as it uses aggregate constructs and therefore would be unable to uncover a potential dyadic or nodal micro-level process. Instead, a macro-micro-macro step needs to be added to these models. The earlier discussed multilevel models can be further extended to provide the general framework for such a macro-micro-macro approach, where an extra step is added to capture the “aggregation” of micro-processes to macro-outcomes (arrow e), while arrow g represents the macro-to-macro effect (Models 1.5 and 1.6). This approach is in line with the macro-micro-macro model proposed by James Coleman (Coleman, 1986, 1990; cf. Hedström & Swedberg, 1998; Raub, Buskens, & Van Assen, 2011; Brass & Borgatti, 2019). Considering Model 5.3 and the example in Figure 11.6, the group composition is disaggregated to the dyadic level, where the two nodes in the dyad might or might not have the same value on gender (macro-micro or group-dyadic link, arrow a). Second, at a dyadic level, the homophily effect is considered a potential driver for network formation (the micro or dyadic-level effect represented by arrow b).
And, finally, as a result of the independent dyadic homophily processes, a group network structure emerges (micro-macro or dyadic-group effect, arrow e). If the dyadic-level homophily effect is driving the network formation (Situation 2 in Figure 11.6), then the micro-level effect (arrow b) would be found to be important and the group-level effect (arrow g) could be explained away completely. However, if there is a true group-level effect (Situation 1), then the micro-effect (arrow b) would not be important, while the group-level effect (arrow g) would be. This example illustrates the need to consider dyadic-level and/or nodal-level processes when aiming to understand changes at the group level (cf. Wellman & Frank, 2001; Raub et al., 2011; Brass & Borgatti, 2019). A comparable argument can be made (see Models 5.4 and 5.6) when focusing on group outcomes as a result of the group’s network structure (cf. Model 1.6). For example, when the group outcome is the result of aggregation (such as the average satisfaction in a group or total number of creative ideas generated by individuals in a group), this might be better studied by incorporating a dyadic or nodal perspective. First, specific individual nodes are embedded in a specific way in the network structure; that is, they have a specific position (macro-micro link, arrow a). This network position might lead to individual outcomes (micro-effect, arrow b), such as a specific level of satisfaction or level of creative ideas, which then translate (aggregate) into group outcomes such as average satisfaction or total number of creative ideas (micro-macro link, arrow e). In practice, the situation may be more complex, as (1) both dyadic/nodal-level and group-level processes might work to some extent, (2) macro-structures might impact the micro-processes (arrow d in Figure 11.5), (3) the independent and/or dependent variables might not be simple aggregations (but rather configurational, Klein & Kozlowski, 2000), and (4) multiple micro-processes (e.g., reciprocity, transitivity, homophily) might work together to create a macro-structure. Simulation approaches such as agent-based models are promising for contributing to the understanding of such macro-outcomes (e.g., Buskens & Van de Rijt, 2008; Macy & Flache, 2009; Hamill & Gilbert, 2009; Corten & Buskens, 2010; Mäs et al., 2013; Snijders & Steglich, 2015; Stadtfeld, Takács & Vörös, 2020).
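The contrast between Situation 1 and Situation 2 discussed above can be sketched with expected densities. The example assumes groups with two gender categories, independent dyads, and tie probabilities chosen purely for illustration; the function names and numbers are hypothetical. It shows why density alone cannot separate the two processes, while tie rates within same-gender and mixed-gender dyads can.

```python
def dyad_counts(n_a, n_b):
    """Number of same-category and mixed dyads in a two-category group."""
    same = n_a * (n_a - 1) // 2 + n_b * (n_b - 1) // 2
    mixed = n_a * n_b
    return same, mixed

def expected_density(n_a, n_b, p_same, p_mixed):
    """Expected density when same and mixed dyads form ties independently
    with probabilities p_same and p_mixed."""
    same, mixed = dyad_counts(n_a, n_b)
    return (same * p_same + mixed * p_mixed) / (same + mixed)

low_diversity = (9, 1)   # nine members of one gender, one of the other
high_diversity = (5, 5)  # evenly split group

# Situation 1: group-level effect. All dyads in a group share one tie
# probability, which is lower in the more diverse group.
s1_low = expected_density(*low_diversity, p_same=0.4, p_mixed=0.4)
s1_high = expected_density(*high_diversity, p_same=0.2, p_mixed=0.2)

# Situation 2: dyadic homophily. The same two probabilities apply in every
# group; only the mix of same-gender and mixed-gender dyads differs.
s2_low = expected_density(*low_diversity, p_same=0.4, p_mixed=0.1)
s2_high = expected_density(*high_diversity, p_same=0.4, p_mixed=0.1)

print(round(s1_low, 3), round(s1_high, 3))
print(round(s2_low, 3), round(s2_high, 3))
```

In both situations the low-diversity group ends up denser. Only a comparison of tie rates between same-gender and mixed-gender dyads inside each group reveals which process is at work, which is the disaggregation step (arrow a) followed by the dyadic-level effect (arrow b).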

Conclusion Social network analysis offers a powerful technique to model social relations between nodes in a group. However, to correctly answer the research questions of a study, it is important to choose (1) the appropriate (causal) model, (2) the appropriate network data collection, (3) the right network measures, and (4) the most suitable statistical method. This chapter provides an overview of the main models. In a first part, the basic explanatory models were discussed, focusing either on antecedents of networks or on the consequences of networks at a dyadic, nodal, and group level. The resulting six models (Figure 11.1) can be seen as the basic archetypical models. From these six basic models, different variants and extensions can be developed. The most common extensions, discussed in this chapter, are (1) approaches where a focus on network emergence is combined with a focus on network consequences (the network mediation model and the coevolution model), (2) approaches where networks take on the role of moderator (the network moderation model), and (3) approaches that combine distinct levels of analysis (the multilevel model and macro-micro-macro models). Based on this overview, a number of noteworthy gaps and future directions for research can be identified. First, of the six basic types of models, the “transmission model” (Model 1.2) has arguably been one of the least studied within the field. Often network relations have been assumed to be proxies for such transitions (e.g., friendship or communication ties are expected to imply the flow of specific information and other resources) without there necessarily being sufficient empirical evidence. 
While tracking transitions of specific ideas, practices, behaviors, or resources between specific individual nodes has traditionally been challenging, the increasing use of online (communication) data and the growing availability of technology (e.g., to record behavior and speech or to code conversations) could make such research more conceivable.

Second, in these classic models attributes and networks are traditionally seen as separate entities. The network moderation approach offers a strategy to combine network and attributes in a simple or more complex way. Further research can benefit from considering the effects of a combination of both. However, rather than, for example, simply combining network density and attribute diversity, future research, especially at the group level, might benefit from more complex, configurational approaches to combine both networks and attributes. Another way in which networks and attributes can be combined is by modeling their coevolution over time. SAOMs have been a particularly fruitful approach that has generated considerable insights in the last two decades. The recent development of relational event-type models (Butts, 2008; Stadtfeld et al., 2017), allowing the emphasis on both states (e.g., friendship) and events (e.g., an email), provides another important direction for future research. However, one challenge to their general implementation is the need for high-quality longitudinal network data as well as attribute data. A fourth major issue relates to the generalizability and contextual factors. Recent years have seen a considerable increase in data collection across multiple groups, especially in schools and teams. Such data allow for more generalizable statements across contexts but also open up opportunities to incorporate macro-contextual effects into the approach. From a methods perspective, proper multilevel network models are required, which allow incorporating macro-contexts into dyadic- and nodal-level analysis (Snijders, 2016; Lazega & Snijders, 2016; Lomi, Robins, & Tranmer, 2016). A fifth issue relates to the micro-macro debate as outlined in the last section of this chapter.
Social network analysis is well placed to provide more detailed insights into the processes linking micro-level processes with macro-level changes. In many cases, to understand macro changes (i.e., group-level effects), a focus on micro-processes (i.e., dyadic-level or nodal-level effects) might be required, as well as an understanding of the effects of micro-processes on macro-outcomes (Coleman,  1986). More research is needed on how micro-processes (under specific macro-conditions) could generate specific macro-outcomes. Agent-based models and other simulation methods (e.g., Buskens & Van de Rijt,  2008; Snijders & Steglich, 2015; Stadtfeld et al., 2020) may be useful in linking the micro-network processes with such macro-structural outcomes.

Acknowledgments

I would like to thank Joe Labianca, Steve Borgatti, Tom Valente, Johan Koskinen, and Francisco Trincado Munoz for helpful comments on earlier versions of this chapter.

Notes

1. See, for example, Wasserman and Faust (1994); Scott (2017); Carrington, Scott, and Wasserman (2005); or Robins (2015) for some excellent general theoretical and methodological overviews on social network analysis.
2. These units can be people, animals, companies, nongovernmental organizations, countries, etc. The more technical term node will be used in the rest of this chapter.
3. A group is defined as a clearly delineated set of nodes within a formally and a priori (pre)defined boundary (Marsden, 1990). The term group is preferred over the more commonly used term network.
4. In this chapter the transfer of a specific nodal attribute is considered a dyadic-level outcome but does not, in and of itself, constitute a network relation. However, the (persistent) transfer of specific characteristics between two nodes might (theoretically) be assumed to happen because of the presence of a (latent) tie between those nodes, and therefore (under such an assumption) these transfers might be used as a proxy for the presence of a hidden network relation between those nodes (see the section discussing Model 1.2).
5. From a network perspective, the dyadic level can be seen as being nested in the individual (node) level, while the individual level is nested in the group level. For example, the degree of a node is the aggregate of the dyadic network relations between that node and all other nodes, while the density for a group is the aggregate of the degree of all nodes in the group. Similarly, individual attribute data can be used at a group level, for example, by calculating the average or variation for a specific individual attribute (e.g., Harrison & Klein, 2007).
6. Homophily can be defined as the tendency for units to be more likely to connect to others who are similar in terms of specific attributes, compared to others who are dissimilar.
7. Instead, structural equivalence and geodesic distance between two nodes are most of the time simply seen as a consequence of the emergence of a series of direct connections between a set of nodes.
8. A different approach has been to randomly sample dyads in a group and then perform a dyadic analysis (Kenny, Kashy, & Cook, 2006). However, while a random sample of dyads offers the possibility to make statements regarding the social mechanisms in the larger group from which the dyads are sampled, this standard design is not well suited to considering the broader structural environment (e.g., whether friendship emerges as a result of having common friends).
9. While there is some similarity with relational event models, the latter models tend to focus on more abstract versions of events (e.g., whether a call happened at a specific time) rather than a specific attribute flowing through the network (e.g., specific information being transmitted).
10. This means that what is measured is a network relation (which is assumed to imply a transmission) rather than measuring a specific transmission explicitly (cf. footnote 5) combined with a nodal-level outcome.
11. Some studies have also focused on the effect of personality on other measures of position, such as degree centrality (e.g., Klein et al., 2004). See Fang et al. (2015) for a more in-depth discussion.
12. Note that such a design does not answer the question of whether ego’s collaboration ties and friendship ties are with the same alters or with different alters. This would require a dyadic analysis (arrow b in Model 1.1).
13. For example, degree centrality is the number of direct contacts between a focal actor and others in the group. Hence, nodes with a specific nodal-level attribute generating a higher in-degree can easily be translated into a dyadic-level analysis focusing on a tendency for other nodes to choose nodes with such an attribute.
14. However, see Borgatti, Jones, and Everett (1998) for a more extensive list of potential individual- and group-level social capital measures.
15. See Podolny (2001) for an interesting different approach.
16. If some nodes are members of multiple groups (e.g., project teams) or links within and between groups are of interest, other approaches, such as the use of a two-mode network analysis, might be more appropriate (cf. Agneessens & Everett, 2013).
17. Some research has pointed at nonlinear effects of a specific group-level property. For example, performance might be highest when there is a relatively high level of trust among team members, but too high a level of trust might be counterproductive as it generates groupthink (Chung & Jackson, 2013).
18. In this respect it is worth noting that Fang et al. (2015) also found that the network position had an important independent effect on work outcomes after controlling for personality.
19. The reason for this is that attitudes and behavioral variables are measured at a nodal level and networks at a dyadic level. Note that since these are actor-oriented models, they rely on changes in attributes and network ties being made by an actor and hence might not be seen as purely dyadic approaches (cf. Snijders & Koskinen, 2013, p. 138).
20. The data can be either complete (intragroup) network data or a sample of ego network data or dyadic network data from a set of groups.

References

Agneessens, F., Borgatti, S. P., & Everett, M. G. (2017). Geodesic based centrality: Unifying the local and the global. Social Networks, 49, 12–26.
Agneessens, F., & Koskinen, J. (2016). Modelling individual outcomes using a Multilevel Social Influence (MSI) model. Individual versus team effects of trust on job satisfaction in an organisational context. In T. A. B. Snijders & E. Lazega (Eds.), Multilevel network analysis (pp. 81–105). Berlin: Springer.
Agneessens, F., & Wittek, R. (2008). Social capital and employee well-being: Disentangling intrapersonal and interpersonal selection and influence mechanisms. Revue Française de Sociologie, Special Issue on Social Networks, 49, 617–637.
Agneessens, F., & Wittek, R. (2013). Where do intra-organizational advice relations come from? Social Networks, 34, 333–345.
Alexander, J. C., Giesen, B., Munch, R., & Smelser, N. J. (Eds.). (1987). The micro-macro link. Berkeley: University of California Press.
An, W. (2015). Multilevel meta network analysis with application to studying network dynamics of network interventions. Social Networks, 43, 48–56.
Baerveldt, C., Völker, B., & Van Rossem, R. (2008). Revisiting selection and influence. An inquiry into friendship networks of high school students and their association with delinquency. Canadian Journal of Criminology and Criminal Justice, 50(5), 559–587.
Balkundi, P., Kilduff, M., Barsness, Z. I., & Michael, J. H. (2007). Demographic antecedents and performance consequences of structural holes in work teams. Journal of Organizational Behavior, 28, 241–260.
Bavelas, A. (1950). Communication patterns in task-oriented groups. Journal of the Acoustical Society of America, 22, 725.
Blau, P. M. (1977). Inequality and heterogeneity: A primitive theory of social structure (Vol. 7). New York, NY: Free Press.
Bloch, F., Jackson, M. O., & Tebaldi, P. (2017). Centrality measures in networks. https://arxiv.org/pdf/1608.05845.pdf
Borgatti, S. P. (2005). Centrality and network flow. Social Networks, 27(1), 55–71.
Borgatti, S. P., & Everett, M. G. (2000). Models of core/periphery structures. Social Networks, 21(4), 375–395.
Borgatti, S. P., & Halgin, D. (2011). On network theory. Organization Science, 22, 1168–1181.
Borgatti, S. P., Jones, C., & Everett, M. G. (1998). Network measures of social capital. Connections, 21(2), 27–36.
Borgatti, S. P., & Li, X. (2009). On network analysis in a supply chain context. Supply Chain Management, 45(2), 5–22.
Borgatti, S. P., Mehra, A., Brass, D., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323(5916), 892–895.
Brass, D. J. (1984). Being in the right place: A structural analysis of individual influence in an organization. Administrative Science Quarterly, 29, 518–539.
Brass, D. J. (2012). A social network perspective on organizational psychology. In S. W. J. Kozlowski (Ed.), The Oxford handbook of organizational psychology (pp. 667–695). New York: Oxford University Press.
Brass, D. J., & Borgatti, S. P. (2019). A practical guide to multilevel social network research. In S. E. Humphrey & J. M. LeBreton (Eds.), The handbook of multilevel theory, measurement, and analysis. American Psychological Association.
Brass, D. J., Galaskiewicz, J., Greve, H. R., & Tsai, W. (2004). Taking stock of networks and organizations: A multilevel perspective. Academy of Management Journal, 47(6), 795–817.
Burt, R. S. (1980). Cooptive corporate actor networks: A reconsideration of interlocking directorates involving American manufacturing. Administrative Science Quarterly, 25, 557–582.
Burt, R. S. (1984). Network items and the General Social Survey. Social Networks, 8, 149–174.
Burt, R. S. (1987). Social contagion and innovation: Cohesion versus structural equivalence. American Journal of Sociology, 92, 1287–1335.
Burt, R. S. (1992). Structural holes: The social structure of competition. Cambridge: Harvard University Press.
Burt, R. S. (2000). The network structure of social capital. Research in Organizational Behavior, 22, 345–423.
Burt, R. S. (2004). Structural holes and good ideas. American Journal of Sociology, 110, 349–399.
Burt, R. S., Jannotta, J. E., & Mahoney, J. T. (1998). Personality correlates of structural holes. Social Networks, 20(1), 63–87.
Burt, R. S., Kilduff, M., & Tasselli, S. (2013). Social network analysis: Foundations and frontiers on advantage. Annual Review of Psychology, 64, 527–547.
Buskens, V., & Van de Rijt, A. (2008). Dynamics of networks if everyone strives for structural holes. American Journal of Sociology, 114(2), 371–407.
Butts, C. T. (2008). A relational event framework for social action. Sociological Methodology, 38(1), 155–200.
Campbell, K. E., Marsden, P. V., & Hulbert, J. S. (1986). Social resources and socioeconomic status. Social Networks, 8, 97–117.
Carrington, P. J., Scott, J., & Wasserman, S. (Eds.). (2005). Models and methods in social network analysis (Vol. 28). New York: Cambridge University Press.
Chung, Y., & Jackson, S. E. (2013). The internal and external networks of knowledge-intensive teams. The role of task routineness. Journal of Management, 39(2), 442–468.
Cohen, S., & Wills, T. A. (1985). Stress, social support, and the buffering hypothesis. Psychological Bulletin, 98(2), 310–357.
Coleman, J. S. (1990). Foundations of social theory. Cambridge, MA: Belknap Press of Harvard University Press.
Coleman, J. S. (1986). Social theory, social research, and a theory of action. American Journal of Sociology, 91(6), 1309–1335.
Coleman, J. S., Katz, E., & Menzel, H. (1966). Medical innovation. New York, NY: Bobbs-Merrill.
Contractor, N. S., Wasserman, S., & Faust, K. (2006). Testing multitheoretical, multilevel hypotheses about organizational networks: An analytic framework and empirical example. Academy of Management Review, 31(3), 681–703.
Corten, R., & Buskens, V. (2010). Co-evolution of conventions and networks: An experimental study. Social Networks, 32(1), 4–15.
de Klepper, M. C., Labianca, G., Sleebos, E., & Agneessens, F. (2017). Sociometric status and peer control attempts: A multiple status hierarchies approach. Journal of Management Studies, 54(1), 1–31.
Doreian, P., Teuter, K., & Wang, C. H. (1984). Network autocorrelation models: Some Monte Carlo results. Sociological Methods & Research, 13(2), 155–200.
Duncan, T. E., Duncan, S. C., & Strycker, L. A. (2006). An introduction to latent variable growth curve modeling: Concepts, issues, and application. New York: Routledge Academic.
Ellwardt, L., Steglich, C., & Wittek, R. (2012). The co-evolution of gossip and friendship in workplace social networks. Social Networks, 34(4), 623–633.
Entwisle, B., Faust, K., Rindfuss, R. R., & Kaneda, T. (2007). Networks and contexts: Community variation in the structure of social ties. American Journal of Sociology, 112(5), 1495–1533.
Fang, R., Landis, B., Zhang, Z., Anderson, M. H., Shaw, J. D., & Kilduff, M. (2015). Integrating personality and social networks: A meta-analysis of personality, network position, and work outcomes in organizations. Organization Science, 26(4), 1243–1260.
Fischer, C. S. (1982). To dwell among friends: Personal networks in town and city. Chicago, IL: University of Chicago Press.
Fowler, J. H. (2006). Legislative cosponsorship networks in the US House and Senate. Social Networks, 28, 454–465.
Freeman, L. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1, 215–239.
Friedkin, N. E. (1984). Structural cohesion and equivalence explanations of social homogeneity. Sociological Methods & Research, 12, 235–261.
Fujimoto, K., & Valente, T. W. (2012). Social network influences on adolescent substance use: Disentangling structural equivalence from cohesion. Social Science & Medicine, 74(12), 1952–1960.
Grosser, T. J., Lopez-Kidwell, V., & Labianca, G. (2010). A social network analysis of positive and negative gossip in organizational life. Group & Organization Management, 35(2), 177–212.
Grund, T. U. (2012). Network structure and team performance: The case of English Premier League soccer teams. Social Networks, 34(4), 682–690.
Haines, V. A., & Hurlbert, J. S. (1992). Network range and health. Journal of Health and Social Behavior, 33(3), 254–266.
Hamill, L., & Gilbert, G. N. (2009). Social circles: A simple structure for agent-based social network models. Journal of Artificial Societies and Social Simulation, 12(2), 1–3.
Harrigan, N. M., Labianca, G. J., & Agneessens, F. (2020). Negative ties and signed graphs research: Stimulating research on dissociative forces in social networks. Social Networks, 60, 1–10.
Harrison, D. A., & Klein, K. J. (2007). What’s the difference? Diversity constructs as separation, variety, or disparity in organizations. Academy of Management Review, 32(4), 1199–1228.
Hedström, P., & Swedberg, R. (1998). Social mechanisms. Cambridge: Cambridge University Press.
Hitt, M. A., Beamish, P. W., Jackson, S. E., & Mathieu, J. E. (2007). Building theoretical and empirical bridges across levels: Multilevel research in management. Academy of Management Journal, 50(6), 1385–1399.
Huitsing, G., Snijders, T. A. B., Van Duijn, M. A. J., & Veenstra, R. (2014). Victims, bullies, and their defenders: A longitudinal study of the co-evolution of positive and negative networks. Development and Psychopathology, 26, 645–659.
Ibarra, H. (1992). Homophily and differential returns: Sex differences in network structure and access in an advertising firm. Administrative Science Quarterly, 37(3), 422–447.
Kalish, Y., & Robins, G. (2006). The relationship between individual predispositions, structural holes and network closure. Social Networks, 28(1), 56–84.
Keeling, M. J., & Eames, K. T. D. (2005). Networks and epidemic models. Journal of the Royal Society Interface, 2(4), 295–307.
Kenny, D. A., Kashy, D. A., & Cook, W. L. (2006). The analysis of dyadic data. New York, NY: Guilford.
Kim, S., & Shin, E. H. (2002). A longitudinal analysis of globalization and regionalization in international trade: A social network approach. Social Forces, 81(2), 445–468.
Klein, K. J., & Kozlowski, S. W. (2000). From micro to meso: Critical steps in conceptualizing and conducting multilevel research. Organizational Research Methods, 3(3), 211–236.
Klein, K. J., Lim, B. C., Saltz, J. L., & Mayer, D. M. (2004). How do they get there? An examination of the antecedents of centrality in team networks. Academy of Management Journal, 47(6), 952–963.
Klovdahl, A. S. (1985). Social networks and the spread of infectious diseases: The AIDS example. Social Science & Medicine, 21(11), 1203–1216.
Knecht, A. B., Burk, W. J., Weesie, J., & Steglich, C. (2011). Friendship and alcohol use in early adolescence: A multilevel social network approach. Journal of Research on Adolescence, 21(2), 475–487.
Lazega, E., Mounier, L., Snijders, T., & Tubaro, P. (2012). Norms, status and the dynamics of advice networks: A case study. Social Networks, 34, 323–332.
Lazega, E., & Snijders, T. A. B. (Eds.). (2016). Multilevel network analysis for the social sciences (Methodos Series: Methodological Prospects in the Social Sciences). Cham, Germany: Springer.
Leavitt, H. J. (1951). Some effects of certain communication patterns on group performance. Journal of Abnormal and Social Psychology, 46(1), 38–50.
Leenders, R. T. A. J. (2002). Modeling social influence through network autocorrelation: Constructing the weight matrix. Social Networks, 24(1), 21–47.
Lin, N., Cook, K. S., & Burt, R. S. (2001). Social capital: Theory and research. New York: Aldine de Gruyter.
Lin, N., Vaughn, J. C., & Ensel, W. M. (1981). Social resources and occupational status attainment. Social Forces, 59, 1163–1181.
Lomi, A., Robins, G., & Tranmer, M. (2016). Introduction to multilevel social networks. Social Networks, 44, 266–268.
Lorrain, F., & White, H. C. (1971). Structural equivalence of individuals in social networks. Journal of Mathematical Sociology, 1(1), 49–80.
Lubbers, M. (2003). Group composition and network structure in school classes: A multilevel application of the p* model. Social Networks, 25, 309–332.
Lusher, D., Koskinen, J., & Robins, G. (2013). Exponential random graph models for social networks. Structural analysis in the social sciences. New York, NY: Cambridge University Press.
Macy, M. W., & Flache, A. (2009). Social dynamics from the bottom up: Agent-based models of social interaction. In P. Hedström & P. Bearman (Eds.), The Oxford handbook of analytical sociology (pp. 245–268). Oxford, UK: Oxford University Press.
Marks, M. A., Mathieu, J. E., & Zaccaro, S. J. (2001). A temporally based framework and taxonomy of team processes. Academy of Management Review, 26(3), 356–376.
Marsden, P. V. (1987). Core discussion networks of Americans. American Sociological Review, 52, 122–131.
Marsden, P. V. (1990). Network data and measurement. Annual Review of Sociology, 16, 435–463.
Marsden, P. V., & Friedkin, N. E. (1993). Network studies of social influence. Sociological Methods & Research, 22, 125–149.
Mäs, M., Flache, A., Takács, K., & Jehn, K. A. (2013). In the short term we divide, in the long term we unite: Demographic crisscrossing and the effects of faultlines on subgroup polarization. Organization Science, 24(3), 716–736.
McCallister, L., & Fischer, C. S. (1978). A procedure for surveying personal networks. Sociological Methods & Research, 7(2), 131–148.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.
Mehra, A., Kilduff, M., & Brass, D. J. (1998). At the margins: A distinctiveness approach to the social identity and social networks of underrepresented groups. Academy of Management Journal, 41(4), 441–452.
Mehra, A., Kilduff, M., & Brass, D. J. (2001). The social networks of high and low self-monitors: Implications for workplace performance. Administrative Science Quarterly, 46(1), 121–146.
Mercken, L., Snijders, T. A. B., Steglich, C., & de Vries, H. (2009). Dynamics of adolescent friendship networks and smoking behavior: Social network analyses in six European countries. Social Science & Medicine, 69(10), 1506–1514.
Mercken, L., Snijders, T. A., Steglich, C., Vartiainen, E., & De Vries, H. (2010). Dynamics of adolescent friendship networks and smoking behavior. Social Networks, 32(1), 72–81.
Mercken, L., Steglich, C., Sinclair, P., Holliday, J., & Moore, L. (2012). A longitudinal social network analysis of peer influence, peer selection, and smoking behavior among adolescents in British schools. Health Psychology, 31(4), 450.
Mizruchi, M. S., & Marquis, C. (2006). Egocentric, sociocentric, or dyadic? Identifying the appropriate level of analysis in the study of organizational networks. Social Networks, 28(3), 187–208.
Mizruchi, M. S., & Stearns, L. B. (1988). A longitudinal study of the formation of interlocking directorates. Administrative Science Quarterly, 33, 194–210.
Moody, J. (2001). Race, school integration, and friendship segregation in America. American Journal of Sociology, 107, 679–716.
Moore, G. (1990). Structural determinants of men’s and women’s personal networks. American Sociological Review, 55(5), 726–735.
Mouw, T. (2006). Estimating the causal effect of social capital: A review of recent research. Annual Review of Sociology, 32, 79–102.
Mullen, B., & Copper, C. (1994). The relation between group cohesiveness and performance: An integration. Psychological Bulletin, 115(2), 210–227.
Mundt, M. P., Mercken, L., & Zakletskaia, L. (2012). Peer selection and influence effects on adolescent alcohol use: A stochastic actor-based model. BMC Pediatrics, 12, 115.
Obstfeld, D. (2005). Social networks, the tertius iungens orientation, and involvement in innovation. Administrative Science Quarterly, 50(1), 100–130.
Oh, H., & Kilduff, M. (2008). The ripple effect of personality on social structure: Self-monitoring origins of network brokerage. Journal of Applied Psychology, 93(5), 1155–1164.
Palardy, G. J. (2008). Differential school effects among low, middle, and high social class composition schools: A multiple group, multilevel latent growth curve analysis. School Effectiveness and School Improvement, 19(1), 21–49.
Perry, B. L., Pescosolido, B. A., & Borgatti, S. P. (2018). Egocentric network analysis: Foundations, methods, and models. New York: Cambridge University Press.
Pfeffer, J. (1997). New directions for organization theory. New York: Oxford University Press.
Podolny, J. M. (2001). Networks as the pipes and prisms of the market. American Journal of Sociology, 107(1), 33–60.
Preacher, K. J., Wichman, A. L., MacCallum, R. C., & Briggs, N. E. (2008). Latent growth curve modeling (No. 157). Thousand Oaks, CA: Sage.
Raub, W., Buskens, V., & Van Assen, M. A. (2011). Micro-macro links and microfoundations in sociology. Journal of Mathematical Sociology, 35(1–3), 1–25.
Reagans, R., Zuckerman, E., & McEvily, B. (2004). How to make the team: Social networks vs. demography as criteria for designing effective teams. Administrative Science Quarterly, 49(1), 101–133.
Ren, H., Gray, B., & Harrison, D. A. (2015). Triggering faultline effects in teams: The importance of bridging friendship ties and breaching animosity ties. Organization Science, 26(2), 390–404.
Ripley, R., Snijders, T. A. B., Boda, Z., Vörös, A., & Preciado, P. (2019). Manual for SIENA version 4.0. Oxford: University of Oxford, Department of Statistics. http://www.stats.ox.ac.uk/~snijders/siena/
Rivera, M. T., Soderstrom, S. B., & Uzzi, B. (2010). Dynamics of dyads in social networks: Assortative, relational, and proximity mechanisms. Annual Review of Sociology, 36, 91–115.
Robins, G. (2015). Doing social network research: Network-based research design for social scientists. London: Sage.
Rogers, E. M. (2003). Diffusion of innovations (5th ed.). New York, NY: Free Press.
Rousseau, D. M. (1985). Issues of level in organizational research: Multi-level and cross-level perspectives. In L. L. Cummings & B. M. Staw (Eds.), Research in organizational behavior (Vol. 7, pp. 1–37). Greenwich, CT: JAI.
Salancik, G. R., & Pfeffer, J. (1978). A social information processing approach to job attitudes and task design. Administrative Science Quarterly, 23(2), 224–253.
Sarason, I. G., Levine, H. M., Basham, R. B., & Sarason, B. R. (1983). Assessing social support: The social support questionnaire. Journal of Personality and Social Psychology, 44, 127–139.
Sasidharan, S., Santhanam, R., Brass, D. J., & Sambamurthy, V. (2012). The effects of social network structure on enterprise systems success: A longitudinal multilevel analysis. Information Systems Research, 23(3-part-1), 658–678.
Schaefer, D. R., Simpkins, S. D., Vest, A. E., & Price, C. D. (2011). The contribution of extracurricular activities to adolescent friendships: New insights through social network analysis. Developmental Psychology, 47(4), 1141–1152.
Schulte, M., Cohen, N. A., & Klein, K. J. (2012). The coevolution of network ties and perceptions of team psychological safety. Organization Science, 23(2), 564–581.
Scott, J. (2017). Social network analysis. London: Sage.
Shalizi, C. R., & Thomas, A. C. (2011). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2), 211–239.
Shaw, M. E. (1954). Some effects of unequal distribution of information upon group performance in various communication nets. Journal of Abnormal and Social Psychology, 49(4), 547–553.
Shore, J., Bernstein, E., & Lazer, D. (2015). Facts and figuring: An experimental investigation of network structure and performance in information and solution spaces. Organization Science, 26(5), 1432–1446.
Smith, D. A., & White, D. R. (1992). Structure and dynamics of the global economy: Network analysis of international trade 1965–1980. Social Forces, 70(4), 857–893.
Smith, K. G., Smith, K. A., Olian, J. D., Sims, H. P., O’Bannon, D. P., & Scully, J. A. (1994). Top management team demography and processes. The role of social integration and communication. Administrative Science Quarterly, 39, 412–438.
Snijders, T. A. B. (2011). Statistical models for social networks. Annual Review of Sociology, 37, 131–153.
Snijders, T. A. B. (2016). The multiple flavours of multilevel issues for networks. In E. Lazega & T. A. B. Snijders (Eds.), Multilevel network analysis for the social sciences (pp. 15–46). Cham, Germany: Springer.
Snijders, T. A. B., & Baerveldt, C. (2003). A multilevel network study of the effects of delinquent behavior on friendship evolution. Journal of Mathematical Sociology, 27, 123–151.
Snijders, T. A. B., & Koskinen, J. (2013). Longitudinal models. In D. Lusher, J. Koskinen, & G. Robins (Eds.), Exponential random graph models for social networks (pp. 130–140). New York: Cambridge University Press.
Snijders, T. A. B., & Steglich, C. E. (2015). Representing micro–macro linkages by actor-based dynamic network models. Sociological Methods & Research, 44(2), 222–271.
Snijders, T. A. B., van de Bunt, G. G., & Steglich, C. E. G. (2010). Introduction to stochastic actor-based models for network dynamics. Social Networks, 32, 44–60.
Soltis, S. M., Agneessens, F., Sasovova, Z., & Labianca, G. (2013). A social network perspective on turnover intentions: The role of distributive justice and social support. Human Resource Management, 52(4), 561–584.
Sparrowe, R. T., Liden, R. C., Wayne, S. J., & Kraimer, M. L. (2001). Social networks and the performance of individuals and groups. Academy of Management Journal, 44, 316–325.
Stadtfeld, C., Hollway, J., & Block, P. (2017). Dynamic network actor Markov models: Investigating coordination ties through time. Sociological Methodology, 47(1), 1–40.
Stadtfeld, C., Takács, K., & Vörös, A. (2020). The emergence and stability of groups in social networks. Social Networks, 60, 129–145.
Steglich, C., Snijders, T. A. B., & Pearson, M. (2010). Dynamic networks and behavior: Separating selection from influence. Sociological Methodology, 40(1), 329–393.
Stevenson, W. B., & Gilly, M. (1991). Information processing and problem solving: The migration of problems through formal positions and networks of ties. Academy of Management Journal, 34, 918–928.
Tasselli, S., Kilduff, M., & Menges, J. I. (2015). The microfoundations of organizational social networks. A review and an agenda for future research. Journal of Management, 41, 1361–1387.
Thoits, P. A. (1982). Conceptual, methodological, and theoretical problems in studying social support as a buffer against life stress. Journal of Health and Social Behavior, 23, 145–159.
Tolsma, J., van Deurzen, I., Stark, T. H., & Veenstra, R. (2013). Who is bullying whom in ethnically diverse primary schools? Exploring links between bullying, ethnicity, and ethnic diversity in Dutch primary schools. Social Networks, 35(1), 51–61.
Travers, J., & Milgram, S. (1969). An experimental study of the small world problem. Sociometry, 32, 425–443.
Troster, C., Mehra, A., & Van Knippenberg, D. (2014). Structuring for team success: The interactive effects of network structure and cultural diversity on team potency and performance. Organizational Behavior and Human Decision Processes, 124, 245–255.
Valente, T. W. (1995). Network models of the diffusion of innovations. Cresskill, NJ: Hampton Press.
Valente, T. W. (1996). Social network thresholds in the diffusion of innovations. Social Networks, 18(1), 69–89.
Valente, T. W. (2005). Models and methods for innovation diffusion. In P. J. Carrington, J. Scott, & S. Wasserman (Eds.), Models and methods in social network analysis (pp. 98–116). Cambridge, UK: Cambridge University Press.
Van de Bunt, G. G., Van Duijn, M. A. J., & Snijders, T. A. B. (1999). Friendship networks through time: An actor-oriented dynamic statistical network model. Computational and Mathematical Organization Theory, 5(2), 167–192.
Van Duijn, M. A. J., Van Busschbach, J. T., & Snijders, T. A. B. (1999). Multilevel analysis of personal networks as dependent variables. Social Networks, 21, 187–209.
Van Emmerik, I. J. H. (2006). Gender differences in creation of different types of social capital: A multilevel study. Social Networks, 28(1), 24–37.
Veenstra, R., Dijkstra, J., Steglich, C., & Van Zalk, M. (2013). Network–behavior dynamics. Journal of Research on Adolescence, 23(3), 399–412.
Wang, P., Robins, G., & Pattison, P. (2009). PNet: Program for the simulation and estimation of exponential random graph models. Melbourne School of Psychological Sciences, The University of Melbourne. http://www.melnet.org.au/pnet
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications (Vol. 8). Cambridge: Cambridge University Press.
Wellman, B. (1979). The community question: The intimate networks of East Yorkers. American Journal of Sociology, 84(5), 1201–1231.
Wellman, B. (1983). Network analysis: Some basic principles. Sociological Theory, 1, 155–200.
Wellman, B. (1997). Structural analysis: From method and metaphor to theory and substance. Contemporary Studies in Sociology, 15, 19–61.
Wellman, B., & Frank, K. (2001). Network capital in a multi-level world: Getting support in personal communities. In N. Lin, K. Cook, & R. Burt (Eds.), Social capital: Theory and research (pp. 233–273). Chicago, IL: Aldine DeGruyter.
Yang, S. W., Trincado, F., Labianca, G., & Agneessens, F. (2019). Negative ties at work. In D. Brass & S. P. Borgatti (Eds.), Social networks at work. SIOP Organizational Frontiers Series. New York: Routledge.
Zhang, Z., & Peterson, S. J. (2011). Advice networks in teams: The role of transformational leadership and members’ core self-evaluations. Journal of Applied Psychology, 96(5), 1004.

Chapter 12

An Introduction to Statistical Models for Networks

Valentina Kuskova¹ and Stanley Wasserman²

We begin with a graph (or a directed graph), a single set of nodes N, and a set of lines or arcs L. It is common to use this mathematical concept to represent a network; we also follow the notation of Wasserman and Faust (1994), especially Chapters 13 and 15. We first note that we view statistical models for networks as having three generations of development:

1. Uniform distribution: the “random” graph
2. Conditional uniform distributions
3. Exponential family of random graph distributions

Uniform and conditional uniform distributions take a space of graphs (or directed graphs), possibly restricted by conditioning on particular graph characteristics that are given fixed values (such as L = 10 lines, or a dyad census specifying the counts of mutual, asymmetric, and null dyads), and assign equal probabilities to all graphs in the space. The completely uniform distribution conditions on nothing, while the Erdös-Rényi conditional uniform distribution fixes the number of lines in the graph (or arcs in the digraph). A large segment of the network science community works exclusively with this conditional uniform distribution. These distributions, as well as Holland and Leinhardt’s p1 distribution (the independent multinomial dyad distribution), are discussed at length in Chapters 13 and 15 of Wasserman and Faust (1994).

Our focus here will be on the exponential family of random graph distributions, p*, because of its inclusiveness. It includes the conditional uniform distributions as special cases. There are extensions of these distributions, posited first for single networks, to a wide range of situations, including multiple relations, affiliation relations, valued relations, and social influence and selection studies (in which information on attributes of the nodes is available). All of these extensions can be found in the chapters of Carrington, Scott, and Wasserman (2005) and Lusher, Koskinen, and Robins (2013). We also mention some of these extensions at the end of this chapter, as do Chapters 13 and 14 of this volume.

The purpose of this short chapter is to discuss the developments in statistical models for networks that have occurred over the past 10 years, since the publication of the statistical chapters (Chapters 8, 9, 10, and 11) of Carrington, Scott, and Wasserman (which were written in 2002) and since the Wasserman, Robins, and Steinley (2006) review paper on statistical models for networks. New developments include, for instance, longitudinal models for the coevolution of networks and behavior (Snijders, Steglich, & Schweinberger, 2007) and latent space models for social networks (Hoff, Raftery, & Handcock, 2002). More recent research has introduced similar models for more complicated network data. We refer the reader to the excellent expositions by Kolaczyk (2009) and Lusher et al. (2013). Attention here is given to models for small to moderately large networks, most commonly gathered by social and behavioral scientists, rather than to big networks, which are usually modeled by simple models based on few substantive assumptions.

At the beginning of this chapter, we assume that the researcher has complete or whole networks, which allows us to take a global view of social structure, including information on all actors in the “population” and all existing ties among these actors. Data collection for these networks begins from a list of included nodes and records the presence or absence of relations between every pair of nodes. Relations are assumed to be dichotomous. There are many examples of such networks in the chapters of this volume.
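As a small illustration of the uniform and conditional uniform distributions introduced at the start of this chapter, the following Python sketch draws nondirected graphs from each. It is only a minimal sketch: the function names, the choice of g = 6 nodes and 10 lines, and the representation of a graph as a set of node pairs are assumptions made here for illustration, not notation from the chapter.

```python
import itertools
import random


def uniform_graph(g, rng):
    """Draw a nondirected graph on g nodes uniformly from all possible graphs.

    Assigning equal probability to every graph is equivalent to including
    each dyad independently with probability 1/2.
    """
    dyads = list(itertools.combinations(range(g), 2))
    return {d for d in dyads if rng.random() < 0.5}


def conditional_uniform_graph(g, n_lines, rng):
    """Draw uniformly from the graphs on g nodes with exactly n_lines lines."""
    dyads = list(itertools.combinations(range(g), 2))
    # Every subset of n_lines dyads is equally likely under rng.sample.
    return set(rng.sample(dyads, n_lines))


rng = random.Random(0)
x = conditional_uniform_graph(6, 10, rng)   # conditions on L = 10 lines
print(len(x), sorted(x))
```

Conditioning on richer characteristics, such as a dyad census, restricts the space of graphs further but keeps the same principle of equal probabilities within the restricted space.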
For complete network studies, network data may be collected based on documentary or electronic records (see Robins, 2015), but for unipartite networks, actors are commonly asked to respond to a survey. In that case, actors are presented with one or more name generator items, where they are asked to identify other individuals with whom they have a relationship consistent with the name generator. For instance, in a study of workplace structure, respondents might be presented with a name generator item such as:

In this organization, with whom do you work closely to ensure that you complete your work-based tasks?

Respondents might then provide names based on simple recall or—if the organization is small enough—might be given a list of people in the organization and asked to tick off those consistent with the name generator. The name generator item then measures network ties with that particular content. Alternatives to complete networks include sampled or ego-centered networks. Statistical models for such data are quite simple and underdeveloped (Crossley et al., 2015). There is a growing literature on network sampling, but the models based on sampled data are also in need of further research.

Some History

Statistical models for social networks are essentially probability distributions for graphs. Early work on distributions for graphs was quite limiting, forcing researchers to adopt independence assumptions that were not terribly realistic (see Chapters 13 to 16 of Wasserman & Faust, 1994). It is hard to accept the standard assumption, common in much of the literature and especially in physics, of complete independence and then to adopt the misnamed and overly simplistic Erdös-Rényi “random graph” distribution (there are, of course, an infinite number of random graph distributions). The physicists’ random graph distribution, usually referred to as a Bernoulli graph, assumes no dependencies at all among the random components of a graph. Equally hard to believe as true representations of social behavior are the many conditional uniform distributions and p1, which assumes independent dyads.

The breakthrough in statistical modeling of networks was first exposited by Frank and Strauss (1986), who termed their model a Markov random graph. Further developments, especially commentary on the estimation of distribution parameters, were given by Strauss and Ikeda (1990). Wasserman and Pattison (1996) elaborated on the model, describing a more general family of distributions. Pattison and Wasserman (1999); Robins, Pattison, and Wasserman (1999); and Anderson, Wasserman, and Crouch (1999) further developed this family of models, showing how a Markov parametric assumption gives just one of many possible sets of parameters. This family, with its variety and extensions, was named p*, a label by which it has come to be known. The p* label (first used by Wasserman & Pattison, 1996) derives from the research on statistical modeling commenced by Holland and Leinhardt (1981) with their dyadic independence p1 model. The parameters (which are determined by the hypothesized dependence structure) reflect structural concerns, which are assumed to govern the probabilistic nature of the underlying social and/or behavioral process.
This family, with its variety and extensions, is alternatively known as the exponential family of random graph models. The early, pre-2000 work extended p* in a variety of ways and laid the foundation for recent work on the estimation problems inherent in the early formulations. This research also was an important forerunner of the new parametric specifications that gave wider usage to the family. Wasserman and Robins (2005) offer a review of p* circa 2003, while Robins et al. (2006) review the 2003–2006 period. A more thorough history of this family of distributions, including a discussion of its roots in spatial modeling and statistical physics, can be found in Lusher et al. (2013).

The work of Frank and Strauss (1986) did indeed begin a new era for statistical modeling of networks, although it took 10 years for Markov random graphs to be discussed at more length by network methodologists. We briefly describe the highlights of the past decade here. More information, especially recent advances in these models, can be found in Chapter 13 in this volume.

An area of rapid growth in network analysis has been the development and use of new statistical models for social networks. A recent, comprehensive review of this research can be found in Kolaczyk (2009). In this chapter we concentrate on two statistical models that we consider of interest to sociology researchers: the exponential family of random graph models for cross-sectional network data known as p*, and stochastic actor-oriented models for longitudinal network data. Both models permit inferences about the structure of an empirical social network, including associations between individual attributes and network relational tie variables. Using these models, statistical inferences can be made about research questions such as:

• Do relationships of trust among humans tend to be reciprocated and/or are they hierarchical?
• Are shared job attitudes associated with work collaboration?
• Does common drinking behavior among adolescents lead to friendship, or do adolescent binge drinkers learn their drinking behavior from their friends?

Network statistical models differ in important ways from the standard statistical approaches typically used by social and behavioral scientists. Standard approaches assume independent observations, an assumption that, technically, enables simple likelihood functions and relatively straightforward estimation from empirical data. Networks, however, imply dependence among observations. To make the point more generally, a network-based social system implies dependence among the various patterns of ties in systematic ways. Clearly, one of the issues facing the development of network statistical models has been how to formulate models that express dependence among network tie variables.

Some More Notation

As mentioned at the beginning of this chapter, a network is a set of g actors and a collection of r relations that specify how these actors are related to one another. As defined by Wasserman and Faust (1994, Chapter 3), a network can also contain a collection of attribute characteristics, measured on the actors. We let N = {1, 2, . . . , g} denote the set of actors and χ denote a particular relation defined on the actors (here, we let r = 1). Specifically, χ is a set of ordered pairs recording the presence or absence of relational ties between pairs of actors. This binary relation can be represented by a g × g matrix X, with elements

$$
X_{ij} =
\begin{cases}
1 & \text{if } (i, j) \in \chi, \\
0 & \text{otherwise.}
\end{cases}
$$
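The construction of X from the set of ordered pairs can be sketched in a few lines of Python. This is a minimal illustration only: the function name `sociomatrix` and the example relation `chi` on g = 4 actors are hypothetical, and plain nested lists stand in for the matrix.

```python
def sociomatrix(g, ties):
    """Return the g x g binary matrix X with X[i][j] = 1 iff (i, j) is in ties.

    Actors are labeled 1..g as in the text, so Python indices are shifted by one.
    """
    X = [[0] * g for _ in range(g)]
    for i, j in ties:
        if i == j:
            raise ValueError("self-ties are not defined for this relation")
        X[i - 1][j - 1] = 1
    return X


# Hypothetical directed relation on g = 4 actors (illustrative data only).
chi = {(1, 2), (2, 1), (2, 3), (4, 1)}
X = sociomatrix(4, chi)
for row in X:
    print(row)
```

Because the pairs are ordered, X need not be symmetric: here actor 2 chooses actor 3, but actor 3 does not choose actor 2.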

We will use a variety of graph characteristics and statistics throughout; such quantities are defined in the early chapters of Wasserman and Faust (1994). We assume throughout that X and its elements are random variables. Typically, these variables are assumed to be interdependent, given the interactive nature of the social processes that generate and sustain a network. Much of the work over the past two decades has been on the explicit hypotheses underlying different types of interdependencies. In fact, one of the newer ideas for network analysis utilized by the p* family of models is a dependence graph, a device that allows a researcher to consider which elements of X are independent. Wasserman and Robins (2005) discuss such graphs at length. A dependence graph, which we illustrate in the next section, is also the starting point for the Hammersley-Clifford theorem, which posits a very general probability distribution for network random variables using the postulated dependence graph. The exact form of the dependence graph depends on the nature of the substantive hypotheses about the network under study.

As outlined by Robins et al. (2006), a statistical model for a network can be constructed using this approach through a series of five steps:

1. Regard each relational tie as a random variable.
2. Specify a dependence hypothesis, or hypotheses, leading to a dependence graph.
3. Generate a specific model from the p* family from the specified dependence graph.
4. Simplify the parameter space through homogeneity or other constraints.
5. Estimate, assess, and interpret model parameters.

We now introduce p* via a dependence graph.

Exponential Family of Random Graphs: p*

Any observed single relational network may be regarded as a realization x = [xij] of a random two-way binary array X = [Xij]. The dependence structure for these random variables is determined by the dependence graph D of the random array X. D is itself a graph whose nodes are elements of the index set {(i, j); i, j ∈ N, i ≠ j} for the random variables in X, and whose edges signify pairs of the random variables that are assumed to be conditionally dependent (given the values of all other random variables). More formally, a dependence graph for a univariate network has node set

$$
N_D = \{(i, j);\ i, j \in N,\ i \neq j\}.
$$

The edges of D are given by

$$
E_D = \{((i, j), (k, l)) : X_{ij} \text{ and } X_{kl} \text{ are not conditionally independent}\}.
$$

Consider now a general dependence graph with an arbitrary edge set. Such a dependence graph yields a very general probability distribution for a (di)graph, which we term p* and focus on later. For an observed network, considered to be a realization x of a random array X, we assume the existence of a dependence graph D. The edges of D are crucial here; consider the set of edges, and determine whether there are any complete subgraphs, or cliques, in the dependence graph. (For a general dependence graph, a subset A of the set of relational ties ND is complete if every pair of nodes in A, i.e., every pair of relational ties, is linked by an edge of D. A subset composed of a single node is also regarded as complete.) These cliques specify which subsets of relational ties are all pairwise conditionally dependent on each other.
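To make the idea of a dependence graph and its complete subsets concrete, the following Python sketch builds D for a small nondirected network and enumerates its complete subsets by brute force. Several choices here are assumptions for illustration only: the network is nondirected (so each possible tie is an unordered pair), g = 4, and two tie variables are taken to be dependent when they share an actor, which is the Markov hypothesis of Frank and Strauss (1986) mentioned in the history above. The function names are hypothetical.

```python
from itertools import combinations


def markov_dependence_graph(g):
    """Nodes of D are the possible ties {i, j}; edges link ties sharing an actor."""
    nodes = list(combinations(range(1, g + 1), 2))
    edges = {
        frozenset((a, b))
        for a, b in combinations(nodes, 2)
        if set(a) & set(b)          # the two ties have an actor in common
    }
    return nodes, edges


def complete_subsets(nodes, edges):
    """All nonempty subsets A of the node set of D that are complete in D.

    Singletons count as complete, as in the text. Brute force over all
    subsets, so this is only feasible for very small g.
    """
    found = []
    for size in range(1, len(nodes) + 1):
        for A in combinations(nodes, size):
            if all(frozenset(pair) in edges for pair in combinations(A, 2)):
                found.append(A)
    return found


nodes, edges = markov_dependence_graph(4)
cliques = complete_subsets(nodes, edges)

# Among complete subsets of three ties, triangles involve exactly three actors,
# whereas 3-stars involve four.
triangles = [A for A in cliques
             if len(A) == 3 and len({v for tie in A for v in tie}) == 3]
print(len(nodes), len(edges), len(cliques), len(triangles))
```

Under these assumptions the complete subsets are exactly single ties, pairs of ties forming 2-stars, and triples forming 3-stars or triangles, which anticipates the configurations discussed for Markov dependence.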

Statistical Theory

As mentioned, we assume throughout that X and its elements are random variables, so that any observed single relational network can be regarded as a realization x = [xij] of X. Typically, these variables are assumed to be interdependent, given the interactive nature of the social processes that generate and sustain a network. Much of the work over the past decade has been on explicit hypotheses underlying different types of dependencies among the {Xij}. Dependence graphs are devices that allow a researcher to consider which elements of X are independent. A dependence graph is also the starting point for the Hammersley-Clifford theorem, which posits a very general probability distribution for network random variables using the postulated dependence graph. The exact form of the dependence graph depends on the nature of the substantive hypotheses about the network under study. After presenting the basic model, we come back to our discussion of dependence graphs. The full description of dependence graphs is beyond the scope of this chapter, but further details can be found in Robins and Pattison (2005).

The Hammersley-Clifford theorem (see Wasserman & Robins, 2005, for a summary) establishes that a probability model for X depends only on the cliques of the dependence graph D. In particular, application of the Hammersley-Clifford theorem yields a characterization of Pr(X = x) in the form of an exponential family of distributions:

$$
\Pr(X = x) = \frac{1}{\kappa} \exp\Bigg( \sum_{A \subseteq N_D} \lambda_A \prod_{(i,j) \in A} x_{ij} \Bigg), \qquad (1)
$$

where:

• $\kappa = \sum_{x} \exp\big\{ \sum_{A \subseteq N_D} \lambda_A \prod_{(i,j) \in A} x_{ij} \big\}$ is a normalizing quantity;
• D is the dependence graph for X; the summation is over all subsets A of nodes of D;
• $\prod_{(i,j) \in A} x_{ij}$ is the sufficient statistic corresponding to the parameter $\lambda_A$; and
• $\lambda_A = 0$ whenever the subgraph induced by the nodes in A is not a clique of D.
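As a worked special case of equation (1), consider the simplest possible dependence graph, one with no edges at all, which corresponds to the Bernoulli graph mentioned in the history above. This derivation is a sketch under that assumption; the shorthand $\lambda_{ij}$ for $\lambda_A$ with $A = \{(i, j)\}$ is introduced here for convenience and is not notation from the chapter. With no edges in D, the only cliques are single tie variables, so every $\lambda_A$ with more than one element in A is zero and (1) reduces to

$$
\Pr(X = x) = \frac{1}{\kappa} \exp\Big( \sum_{i \neq j} \lambda_{ij} x_{ij} \Big)
= \prod_{i \neq j} \frac{\exp(\lambda_{ij} x_{ij})}{1 + \exp(\lambda_{ij})},
\qquad
\kappa = \prod_{i \neq j} \big( 1 + \exp(\lambda_{ij}) \big).
$$

The probability factorizes over tie variables, so each $X_{ij}$ is an independent Bernoulli variable with $\Pr(X_{ij} = 1) = \exp(\lambda_{ij}) / (1 + \exp(\lambda_{ij}))$. Any richer dependence graph adds product terms over larger cliques, and the normalizing quantity $\kappa$ then no longer factorizes in this way.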

A variety of dependence graphs are well known in the literature. One very general and simple member of the p* family is the Bernoulli graph, in which ties are assumed to be independent of each other (see earlier). For directed networks, dyadic independence assumes that there is dependence only within but not between dyads: Holland and Leinhardt’s (1981) p1 is an example. This model has, as a dependence graph, connections only between Xij and Xji. Markov dependence as introduced by Frank and Strauss (1986) assumes that ties involving the same actor are dependent: precisely, Xij and Xkl are conditionally independent if and only if {i, j} ∩ {k, l} = ∅. Because of the similarity to the dependence inherent in a Markov spatial process, such a random graph was labeled “Markov” by Frank and Strauss (1986).

One can also formulate dependence graphs when data on attribute variables measured on the nodes are available. If the attribute variables are taken as fixed, with network ties varying depending on the attributes, then models for social selection arise (Robins, Elliott, & Pattison, 2001). If, on the other hand, the network is assumed fixed, with the distribution of attributes dependent on the pattern of network ties, the outcomes are models for social influence (Robins, Pattison, & Elliott, 2001).

The set of nonzero parameters in this probability distribution for Pr(X = x) depends on the maximal cliques of the dependence graph (a maximal clique is a complete subgraph that is not contained in any other complete subgraph). Any subgraph of a complete subgraph is also complete (but not maximal), so that if A is a maximal clique of D, then the probability distribution for the (di)graph will contain nonzero parameters for A and all of its subgraphs. Each clique, and hence each nonzero parameter in the model, corresponds to a configuration, a small subgraph of possible network ties. Different dependence assumptions result in different types of configurations. For instance, Frank and Strauss (1986) showed that configurations for Markov dependence (described earlier) were edges, stars of various types (a single node with arcs going in and/or out), and triangles for nondirected graphs. The model in effect supposes that the observed network is built up from combinations of these various configurations, and the parameters express the presence (or absence) of the configurations in the observed network. For instance, a strongly positive triangle parameter is evidence for more triangulation in the network, implying that networks with large numbers of triangles have larger probabilities of arising. All models from this family, which we refer to as p*, have this form. Some recent literature refers to these models as ERGMs (exponential random graph models).
It is of course uninformative to refer to these distributions as “exponential random graphs”: almost any probability distribution for a graph can be made “exponential.” Further, strictly speaking, the model is not exponential but, in the statistical sense, an exponential family, which conveys a special meaning in statistical theory (and has important implications for some of the estimation procedures described later; see Hunter, 2007). Hence, we much prefer the more informative moniker p*, and the descriptor an exponential family of distributions for random graphs. The p* label (first used by Wasserman & Pattison, 1996) derives from the research on statistical modeling commenced by Holland and Leinhardt with their dyadic independence p1 model. As for the details, the probability of a particular realization of a random graph depends on the cliques of the dependence graph and, through them, on the sufficient statistics (arising from the configurations) specified by the hypothesized dependencies. The sufficient statistics are the counts of these configurations arising in the realization being modeled.

Parameters

One should limit the number of parameters by either postulating a simple dependence graph or making restrictive assumptions about the parameters. The usual assumption is homogeneity, in which parameters for isomorphic configurations of nodes are equated. Even with homogeneity imposed, models may not be identifiable or estimable. Typically, parameters for higher-order configurations (e.g., higher-order stars or triads) are set to zero (equivalent to setting higher-order interactions to zero in general linear models) to ease model interpretation and simplify model fit. However, such parameter limitations can create degenerate models (see later).
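The effect of homogeneity can be illustrated with a brute-force computation on a very small network. The sketch below assumes a nondirected Markov model in which, after imposing homogeneity and setting higher-order star parameters to zero, only three parameters remain, attached to the counts of edges, 2-stars, and triangles. The function names, the choice of g = 4, and the parameter values are assumptions for illustration; full enumeration of the normalizing quantity is feasible only for tiny graphs.

```python
from itertools import combinations, product
from math import comb, exp


def statistics(g, edges):
    """Counts of edges, 2-stars, and triangles in a nondirected graph."""
    degree = [0] * g
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    n_edges = len(edges)
    n_two_stars = sum(comb(d, 2) for d in degree)
    n_triangles = sum(
        1
        for a, b, c in combinations(range(g), 3)
        if {(a, b), (a, c), (b, c)} <= edges
    )
    return n_edges, n_two_stars, n_triangles


def pstar_distribution(g, theta):
    """Exact probabilities Pr(X = x) for every graph on g nodes.

    theta = (edge, 2-star, triangle) parameters. The normalizing quantity
    is obtained by summing over all 2^(g choose 2) graphs.
    """
    dyads = list(combinations(range(g), 2))
    weights = {}
    for pattern in product([0, 1], repeat=len(dyads)):
        edges = frozenset(d for d, present in zip(dyads, pattern) if present)
        s = statistics(g, edges)
        weights[edges] = exp(sum(t * z for t, z in zip(theta, s)))
    kappa = sum(weights.values())
    return {x: w / kappa for x, w in weights.items()}


dist = pstar_distribution(4, theta=(-1.0, 0.0, 1.0))
triangle = frozenset({(0, 1), (0, 2), (1, 2)})
path = frozenset({(0, 1), (1, 2), (2, 3)})
print(dist[triangle], dist[path])
```

Both example graphs have three edges, but with a positive triangle parameter the closed triangle receives the larger probability, in line with the interpretation of a positive triangle parameter given in the text.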

As mentioned, Markov random graph models were indeed a breakthrough in moving toward more realistic dependence assumptions. Markov dependence is often inadequate, however, in handling typical social network data. Occasionally, parameters arising from Markov dependence assumptions are consistent with either complete or very sparse networks, which are of course unhelpful in modeling realistic data. In other words, Markov random graphs can yield degenerate models. Several authors have provided technical demonstrations of this degeneracy problem (Handcock, 2002; Park & Newman, 2004; Robins, Pattison, & Woolcock, 2005; Robins et al., 2007; Snijders, 2002; Snijders et al., 2006). Other authors have written on the mathematical aspects of this problem (Yin, Rinaldo, & Fadnavis, 2016; Karwa, Petrovic, & Baji, 2016). Fortunately, solutions have been forthcoming.

One approach is the specification of new parameters and the addition of these parameters to the “mix.” For example, Snijders et al. (2006) proposed a method of combining counts of all the Markov star configurations into one statistic, with geometrically decreasing weights on the higher-order star counts so that they do not come to dominate the calculation. The resulting parameter is termed a geometrically weighted degree parameter, or an alternating k-star parameter (the term alternating comes from alternating signs in the calculation of the statistic). Various versions of this degree-based parameter have been proposed (see Hunter, 2007, who shows the linkages between them). Such parameters increase the “fittability” of models.

Another suggestion of Snijders et al. (2006) was k-triangles, configurations with k separate triangles sharing one edge, the base of the k-triangle. These configurations also introduce a new distribution of graph features (alongside the degree distribution and the geodesic distribution): the edge-wise shared partner distribution (see Hunter, 2007; Hunter & Handcock, 2006). Counts of the k-triangle configurations are combined into one statistic just as for the geometrically weighted degree parameter, producing a new statistic and associated parameter for alternating k-triangles. Snijders et al. (2006) also proposed k-paths, configurations identical to k-triangles except that the edge at the base of the k-triangle is not necessarily present. This configuration quantifies multiple independent paths between pairs of nodes. As with the other parameters just described, these counts can be combined into one parameter, alternating k-paths. There is an associated distribution across the graph, the dyad-wise shared partner distribution (i.e., shared partners based on dyads, not just on edges; see Hunter, 2007; Hunter & Handcock, 2006).

There is also social circuit dependence, a special, and rather different, dependence assumption. Social circuit dependence explicitly permits the emergence of dependence through existing observations; specifically, the presence of certain ties creates dependencies that otherwise would not exist. While the simpler Markov dependence can also be interpreted in terms of such self-organizing qualities, social circuit dependence enables the appearance of higher-order structures (e.g., “clumps” or dense regions of triangles) that are expressly implied by the model, rather than arising as a simple chance accumulation of basic Markov configurations.

We note that these new parameters and specifications do not resolve all the problems of degeneracy and nonconvergence. There are other forms of higher-order dependence assumptions that might also be necessary for a particular dataset. However, the new specifications have proven quite adequate. Robins et al. (2007) show that the models containing these new dependence parameters perform dramatically better than Markov models in terms of convergence when applied to a number of classic small-scale network datasets. Goodreau (2007) fits the new specifications to a network of over 1,000 nodes and shows how to assess model fit across many graph features. As mentioned earlier, statisticians have been working on the mathematical properties of these models (Chatterjee & Diaconis, 2013; Wang & Bickel, 2017); as of 2017, the jury is still out with respect to the tractability of their features and their problems.
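The alternating k-star statistic described above can be sketched in Python as follows. The sketch assumes the commonly cited form of the statistic from Snijders et al. (2006), in which the count of k-stars S_k enters with sign (-1)^k and weight 1/λ^(k-2) for k = 2, ..., n - 1, with λ > 1 a fixed damping constant; the function names and the default λ = 2 are choices made here for illustration. A closed form in terms of the degrees follows from the binomial theorem and serves as a check on the direct sum.

```python
from math import comb


def alternating_k_star(degrees, lam=2.0):
    """Direct alternating sum of k-star counts with geometric down-weighting."""
    n = len(degrees)
    total = 0.0
    for k in range(2, n):                        # k = 2, ..., n - 1
        s_k = sum(comb(d, k) for d in degrees)   # number of k-stars in the graph
        total += (-1) ** k * s_k / lam ** (k - 2)
    return total


def alternating_k_star_closed_form(degrees, lam=2.0):
    """Equivalent expression using only the degree sequence.

    For a node of degree d, summing the binomial series gives
    lam^2 * ((1 - 1/lam)^d - 1) + lam * d; the degrees sum to twice
    the number of edges.
    """
    n_edges = sum(degrees) / 2
    weighted = sum((1 - 1 / lam) ** d - 1 for d in degrees)
    return lam ** 2 * weighted + 2 * lam * n_edges


star_degrees = [3, 1, 1, 1]   # a 3-star on four nodes (illustrative)
print(alternating_k_star(star_degrees), alternating_k_star_closed_form(star_degrees))
```

The closed form makes the link to a geometrically weighted degree distribution visible: high-degree nodes contribute with geometrically shrinking increments, so the higher-order star counts cannot dominate the statistic.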

Simulation, Estimation, and Goodness of Fit

It is relatively straightforward to simulate p* models, and to estimate their parameters (as discussed later), using long-established statistical approaches such as a Metropolis algorithm implementation of Markov chain Monte Carlo (Snijders, 2002). As first noted by Anderson et al. (1999), if the model is not degenerate, the algorithm will “burn in” to a stationary distribution of graphs reflecting the parameter values in the model. The length of the burn-in depends on the starting graph for the simulation, the complexity of the model, and the size of the network. For small networks of 30 nodes, for instance, nondegenerate models can burn in within a few tens of thousands of iterations, which can be achieved within seconds on a fast enough computer. It is then possible to sample a number of graphs from this distribution and look at their typical features, for instance, the density, the geodesic distribution, the frequencies of various triads, and so on (Robins et al., 2007). In other words, although the model is based on certain configurations, the graphs from the distribution typically will exhibit certain other features of interest that can be investigated.

These models are especially appealing not only because they are readily simulated but also because the parameters can be estimated from available data. In the past, p* models were fitted using pseudo-likelihood estimation based on logistic regression procedures (Strauss & Ikeda, 1990; see Anderson et al., 1999, for a review). Although pseudo-likelihood can provide information about the data, especially in terms of identifying major effects (Robins et al., 2007), when models are close to degeneracy or when dependency is strong, the precise pseudo-likelihood parameter estimates are likely to be misleading. A more reliable way to fit the models is through Markov chain Monte Carlo maximum likelihood estimation (MCMCMLE). There are various algorithms for this (see Hunter & Handcock, 2006; Snijders, 2002). While the technical details are complicated, the underlying conceptual basis is straightforward. MCMCMLE is based on simulation (hence the MCMC part of the acronym). A distribution of graphs is simulated from an initial guess at parameter estimates. A sample from the resulting graph distribution is compared to the observed graph to see how well the observed graph is reproduced by the modeled configurations. If it is not well reproduced, the parameter estimates are appropriately adjusted, iteration by iteration. If the model is well behaved, this procedure usually results in increasingly refined parameter estimates, until finally the procedure stops after satisfying some convergence criterion. We do note one large difference between Markov models and models containing parameters from the new specifications: the new specifications are more likely to be well behaved and result in converged parameter estimates.
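The simulation step that underlies both burn-in and MCMCMLE can be sketched with a simple Metropolis sampler. This is a minimal illustration under stated assumptions, not the algorithm of any particular package: the model is nondirected with only an edge and a triangle parameter, each proposal toggles one randomly chosen dyad, and the function names, network size, parameter values, and number of iterations are all hypothetical.

```python
import random
from itertools import combinations
from math import exp


def change_statistics(adj, i, j):
    """Change in (edges, triangles) if the tie {i, j} were switched from 0 to 1."""
    shared = len(adj[i] & adj[j])   # each common neighbor would close one triangle
    return 1, shared


def metropolis(g, theta, n_iter, rng):
    """Simulate a graph from a p* model with edge and triangle parameters.

    adj[i] is the set of neighbors of node i; the chain starts from the
    empty graph, and proposals are symmetric single-dyad toggles.
    """
    adj = [set() for _ in range(g)]
    dyads = list(combinations(range(g), 2))
    for _ in range(n_iter):
        i, j = rng.choice(dyads)
        d_edges, d_tri = change_statistics(adj, i, j)
        log_ratio = theta[0] * d_edges + theta[1] * d_tri
        present = j in adj[i]
        if present:
            log_ratio = -log_ratio  # the proposal deletes the tie instead
        # Metropolis rule: accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0 or rng.random() < exp(log_ratio):
            if present:
                adj[i].discard(j)
                adj[j].discard(i)
            else:
                adj[i].add(j)
                adj[j].add(i)
    return adj


rng = random.Random(42)
adj = metropolis(10, theta=(-1.0, 0.2), n_iter=20000, rng=rng)
n_edges = sum(len(a) for a in adj) // 2
print("edges in the simulated graph:", n_edges)
```

In an estimation setting, statistics of many such simulated graphs would be compared with those of the observed graph and the parameter values adjusted accordingly; that outer loop is deliberately omitted here.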

Once estimates have been obtained, the model can be simulated and assessed. The assessment can be accomplished by comparing a statistic calculated from the observed graph to the distribution of the statistic generated by the model. The chosen statistics should not be those that are “fixed” by the model (i.e., do not use the sufficient statistics for the fitted parameters for assessment). This approach yields goodness-of-fit statistics for the fitted model, even though it is rather demanding. Goodreau (2007) illustrates this approach well, showing how it can be used to improve models by the addition of extra effects. It also permits judgments about how well competing models might represent the network.

We note that there are currently three programs publicly available for the simulation, estimation, and goodness of fit of p* models:

• The StOCNET suite of programs from the University of Groningen, http://stat.gamma.rug.nl/stocnet/ (especially RSiena)
• The statnet program from the University of Washington, written in R, http://csde.washington.edu/statnet
• The pnet program from the University of Melbourne, http://www.sna.unimelb.edu.au

As mentioned, most of this software is written in R (see Kolaczyk & Csárdi, 2014, and Luke, 2015). There are also Bayesian approaches to estimation (see Bouranis, Friel, & Maire, 2017), most notably Bayesian exponential random graph models (BERGMs) (Caimo & Friel, 2014).

Other Types of Networks

We mention other types of networks to which p* models have been extended. In fact, the literature on p* has become vast: a Google search of the “misleading” phrase exponential random graph model yielded hundreds of thousands of webpages. Wasserman and Robins (2012) also present a review of other types of networks; we note that this section is based, in part, on that review.

Bipartite Networks

A social system usually does not consist of simply one type of actor. For instance, individuals may be associated with organizations, and the system could include both individuals and organizations as actors. Clearly, networks exist with more than one set of actors. A bipartite network is a representation of ties linking actors in one set with actors in another. Such networks are often referred to as two-mode networks. There are ties between nodes of different types but not among nodes of the same type. Similarly, when there are three types of nodes, we will have tripartite networks or, more generally, k-partite networks. There are extensions of the statistical models discussed here to multiple-mode networks.


Multilevel Networks

Multilevel networks arise when one set of actors is nested within another set of actors (such as managers within departments, both of which can generate relational ties). Recent statistical research on such networks can be found in Lazega and Snijders (2016).

Multivariate Networks

A network often contains more than one type of relationship. As Robins and Pattison (2005) observed, relationships between actors are complex and multifaceted, and relationships between two individual actors can take many forms and serve many different purposes. It is usually wrong to aggregate different types of relational ties into a single relationship for analysis. If actors are people, it can be useful to distinguish between ties that involve some level of positive affect, such as friendship and trust, and ties that are more instrumental, such as information exchange or work collaboration. Recent research has labeled such networks multilayer. Multilayer networks include multivariate networks as a special case and are a “hot” topic in recent literature (see Dickison, Magnani, & Ross, 2016).

Longitudinal Models We spend a bit more space discussing such models here, because of their large literature and their importance. Specifically, when data have been collected in longitudinal panel designs with appropriately selected actor attribute and network tie variables, the possible coevolution of network structure and attributes can be examined. Because of the longitudinal data, social influence and social selection effects can be differentiated using stochastic actor-oriented models (SAOMs) for network evolution (Snijders et al., 2007; Snijders, van de Bunt, & Steglich, 2010). The models are based on Markov chains of latent changes in attributes given the network structure, and latent changes in network ties given the attributes, using simulation procedures. Parameter estimates are adjusted to compare with the observed panel data, so that the estimates produce the most likely series of changes consistent with the effects in the model and the panel data. The final parameter estimates in effect relate to those series of changes that are probabilistically most likely within this simulation. The models are actor-oriented in the sense that each actor is assumed to wish to change the social environment to a more optimal form: such a change occurs in attributes or in structure, contingent on the structure and other attributes in the actor’s local social network. For instance, in a study of the social effects of obesity, there might be a parameter for actors to change their eating habits based on those of their friends, and to change their friendships to those who share their eating habits. With both these parameters in the model, it is possible to distinguish whether people choose friends with similar eating behaviors or adapt their eating behaviors to conform with those of their friends (or indeed both). We do not provide full details of SAOMs in this review chapter. Interested readers should consult Snijders et al. (2010), who provide a tutorial-style article that explains recent developments. When longitudinal network data are available, these models should be of interest to sociologists. For instance, Selfhout et al. (2010) used an SAOM to determine how the personality traits of adolescents affected the development of social relationships within schools. There has already been extensive work on the coevolution of social structure among school children and health behaviors, such as smoking and drinking, with a focus on identifying both selection and influence effects (e.g., Mercken et al., 2010). Another approach arises when one has continuous records of changes of relational ties. Such event histories can be analyzed with special event history models (see Butts, 2008; Brandes, Lerner, & Snijders, 2009); such methodology is an alternative to SAOMs, when allowed by the data under study.

Longitudinal Networks: Evolution of Structure or Coevolution of Structure and Attributes Of course, social systems are never stagnant and exhibit ongoing change across time as relationships come in and out of existence. In this sense, social systems are stochastic and do not evolve to some “optimal point” where they settle into a fixed structure. This is not to say that the systems are random or that they are not stable; rather, they may operate quite systematically across time according to their own internal logics whereby ties change based on certain stochastic “rules.” Overall global structures may be reasonably stable and consistent across time, even when underpinning local structures exhibit change as ties appear and disappear. So it is often desirable to observe networks across time in a panel study, together with measures of attributes. This enables the study of coevolution of ties and attributes. Selection and influence are two processes that involve both network ties and individual attributes, but the sequencing of events differs. In selection, the presence of individuals with similar attributes leads to a network tie; in influence, the presence of a network tie leads to changes of attributes toward similarity. There is no reason to suppose that both processes may not proceed in parallel, along with network self-organizing processes. The network then may exhibit ongoing change through stable processes, as both network ties and individual attributes evolve across time in ways that are possibly linked; in other words, we may see the coevolution of network structure and attributes (Snijders et al., 2007).

Conclusion Network theoretical and analytic approaches have reached a new level of sophistication, accompanied by a rapid growth of interest in adopting these approaches in social science research generally. Of course, much social and behavioral science focuses on individuals, but there are often situations where the social environment—the social system—affects individual responses. In these circumstances, to treat individuals as isolated social atoms, a necessary assumption for the application of standard statistical analysis, is simply incorrect. Network methods should be part of the theoretical and analytic arsenal available to sociologists.


Acknowledgements Both authors are members of the Laboratory for Applied Network Research at National Research University Higher School of Economics. This handbook chapter was prepared within the framework of a subsidy granted to the Higher School of Economics, Moscow, by the Government of the Russian Federation for the implementation of the Global Competitiveness Program.

Notes 1. National Research University Higher School of Economics, Moscow 2. Indiana University, Bloomington

References
Anderson, C. J., Wasserman, S., & Crouch, B. (1999). A p* primer: Logit models for social networks. Social Networks, 21, 37–66.
Bouranis, L., Friel, N., & Maire, F. (2017). Efficient Bayesian inference for exponential random graph models by correcting the pseudo-posterior distribution. Social Networks, 50, 98–108.
Brandes, U., Lerner, J., & Snijders, T. A. B. (2009). Networks evolving step by step: Statistical analysis of dyadic event data. In 2009 International Conference on Advances in Social Network Analysis and Mining (pp. 200–205). IEEE.
Butts, C. T. (2008). A relational event framework for social action. Sociological Methodology, 38, 155–200.
Caimo, A., & Friel, N. (2014). BERGM: Bayesian exponential random graphs in R. Journal of Statistical Software, 61, 1–25.
Carrington, P. J., Scott, J., & Wasserman, S. (Eds.). (2005). Models and methods in social network analysis. New York, NY: Cambridge University Press.
Chatterjee, S., & Diaconis, P. (2013). Estimating and understanding exponential random graph models. Annals of Statistics, 41, 2428–2461.
Corander, J., Dahmstrom, K., & Dahmstrom, P. (2002). Maximum likelihood estimation for exponential random graph models. In J. Hagberg (Ed.), Contributions to social network analysis, information theory and other topics in statistics: A Festschrift in honour of Ove Frank (pp. 1–17). University of Stockholm Press.
Crossley, N., Bellotti, E., Edwards, G., Everett, M. G., Koskinen, J., & Tranmer, M. (2015). Social network analysis for ego-nets. London, UK: Sage.
Dickison, M. E., Magnani, M., & Rossi, L. (2016). Multilayer social networks. New York, NY: Cambridge.
Frank, O., & Strauss, D. (1986). Markov graphs. Journal of the American Statistical Association, 81, 832–842.
Goodreau, S. (2007). Applying advances in exponential random graph (p*) models to a large social network. Social Networks, 29, 231–248.
Handcock, M. S. (2002). Statistical models for social networks: Degeneracy and inference. In R. Breiger, K. Carley, & P. Pattison (Eds.), Dynamic social network modeling and analysis (pp. 229–240). Washington, DC: National Academies Press.
Holland, P. W., & Leinhardt, S. (1977). Notes on the statistical analysis of social network data.
Holland, P. W., & Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76, 33–65 (with discussion).
Hoff, P., Raftery, A., & Handcock, M. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97, 1090–1098.
Hunter, D. R. (2007). Curved exponential family models for social networks. Social Networks, 29, 216–230.
Hunter, D., & Handcock, M. (2006). Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics, 15, 565–583.
Karwa, V., Petrovic, S., & Baji, D. (2016). DERGMs: Degeneracy restricted exponential random graph models. Unpublished manuscript.
Kolaczyk, E. D. (2009). Statistical analysis of network data. New York, NY: Springer.
Kolaczyk, E. D., & Csárdi, G. (2014). Statistical analysis of network data with R. New York, NY: Springer.
Lazega, E., & Snijders, T. A. B. (Eds.). (2016). Multilevel network analysis for the social sciences: Theory, methods and applications. New York, NY: Springer.
Luke, D. A. (2015). A user’s guide to network analysis in R. New York, NY: Springer.
Lusher, D., Koskinen, J., & Robins, G. (Eds.). (2013). Exponential random graph models for social networks. New York, NY: Cambridge.
Mercken, L., Snijders, T. A., Steglich, C., Vartiainen, E., & De Vries, H. (2010). Dynamics of adolescent friendship networks and smoking behavior. Social Networks, 32(1), 72–81.
Park, J., & Newman, M. (2004). Solution of the 2-star model of a network. Physical Review E, 70, 066146.
Pattison, P. E., & Wasserman, S. (1999). Logit models and logistic regressions for social networks: II. Multivariate relations. British Journal of Mathematical and Statistical Psychology, 52, 169–193.
Robins, G. L. (2015). Doing social network research. London, UK: Sage.
Robins, G. L., Elliott, P., & Pattison, P. E. (2001). Network models for social selection processes. Social Networks, 23, 1–30.
Robins, G., & Pattison, P. (2005). Interdependencies and social processes: Dependence graphs and generalized dependence structures. In Models and methods in social network analysis, 28.
Robins, G. L., Pattison, P. E., & Elliott, P. (2001). Network models for social influence processes. Psychometrika, 66, 161–190.
Robins, G. L., Pattison, P. E., Kalish, Y., & Lusher, D. (2006). An introduction to exponential random graph (p*) models for social networks. Social Networks, 29, 173–191.
Robins, G. L., Pattison, P. E., & Wasserman, S. (1999). Logit models and logistic regressions for social networks, III. Valued relations. Psychometrika, 64, 371–394.
Robins, G. L., Pattison, P. E., & Woolcock, J. (2005). Social networks and small worlds. American Journal of Sociology, 110, 894–936.
Robins, G. L., Snijders, T. A. B., Wang, P., Handcock, M., & Pattison, P. E. (2007). Recent developments in exponential random graph (p*) models for social networks. Social Networks, 29, 192–215.
Selfhout, M., Burk, W., Branje, S., Denissen, J., Van Aken, M., & Meeus, W. (2010). Emerging late adolescent friendship networks and Big Five personality traits: A social network approach. Journal of Personality, 78(2), 509–538.
Snijders, T. A. B. (2002). Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 3, 2.
Snijders, T. A. B., Pattison, P. E., Robins, G. L., & Handcock, M. (2006). New specifications for exponential random graph models. Sociological Methodology, 36, 99–153.
Snijders, T. A. B., Steglich, C., & Schweinberger, M. (2007). Modeling the co-evolution of networks and behavior. In K. van Monfort, H. Oud, & A. Satorra (Eds.), Longitudinal models in the behavioral and related sciences (pp. 41–71). New York, NY: Erlbaum.
Snijders, T. A. B., van de Bunt, G. G., & Steglich, C. E. (2010). Introduction to stochastic actor-based models for network dynamics. Social Networks, 32, 44–60.
Strauss, D., & Ikeda, M. (1990). Pseudolikelihood estimation for social networks. Journal of the American Statistical Association, 85, 204–212.
Wang, Y. X. R., & Bickel, P. J. (2017). Likelihood-based model selection for stochastic block models. Annals of Statistics, 45, 500–528.
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. New York, NY: Cambridge University Press.
Wasserman, S., & Pattison, P. E. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov random graphs and p*. Psychometrika, 60, 401–426.
Wasserman, S., & Robins, G. L. (2005). An introduction to random graphs, dependence graphs, and p*. In P. J. Carrington, J. Scott, & S. Wasserman (Eds.), Models and methods in social network analysis (pp. 148–161). New York, NY: Cambridge University Press.
Wasserman, S., & Robins, G. (2012). Social network research: The foundation of network science. In APA handbook of research methods in psychology, Vol 3: Data analysis and research publication (pp. 451–469). American Psychological Association.
Wasserman, S., Robins, G., & Steinley, D. (2006). Statistical models for networks: A brief review of some recent research. In ICML Workshop on Statistical Network Analysis (pp. 45–56). Berlin, Heidelberg: Springer.
Yin, M., Rinaldo, A., & Fadnavis, S. (2016). Asymptotic quantization of exponential random graphs. Annals of Applied Probability, 26, 3251–3285.

CHAPTER 13

Advances in Exponential Random Graph Models
Dean Lusher, Peng Wang, Julia Brennecke, Julien Brailly, Malick Faye, and Colin Gallagher

ERGM and ALAAM Exponential random graph models (ERGMs) are statistical models for social network structure. ERGMs assume that social networks are composed of various network substructures (or network configurations) like reciprocity, brokerage, or transitive closure, which, combined together, explain how the network came into being. Network configurations therefore represent social processes. By including a selection of network configurations within an ERGM, a researcher can test theoretical assertions about the presence (or absence) of such substructures. The selection of various network configurations within an ERGM therefore reflects theoretical statements about the formation of network ties. A fundamental assumption of ERGMs is that network ties depend on one another. A key breakthrough for ERGMs with respect to this issue of dependency came from Frank and Strauss (1986), who proposed the idea of a Markov dependence assumption. In short, rather than assuming that network ties are independent, as most standard statistical methods do, Frank and Strauss proposed that two ties are independent unless they share a node, creating the notion of conditional dependence between tie variables. As an example, if Abraham knows Betty, and Betty knows Charlie, then we would expect that Abraham and Charlie are more likely to get to know one another because they share a friend. ERGMs capture the interdependences among network ties and other attributes for nodes and ties involved in a given network setting (Robins, Pattison, Kalish, & Lusher, 2007; Wasserman & Pattison, 1996). The network ties are treated as random variables, and the assumptions on how these tie variables may be dependent on one another imply various network configurations or subnetworks that can be interpreted as the local social processes or building blocks for the overall network structure. By definition, the social is relational and implies dependency. ERGM therefore aligns methods more closely with theory by specifically incorporating dependency. For an extensive discussion of the theory, method, and range of applications of ERGM, we recommend Lusher, Koskinen, and Robins (2013). For a shorter introduction, we suggest Robins, Pattison, Kalish, and Lusher (2007) or indeed other papers that provide detail into various aspects of ERGM (Robins, Pattison, & Wang, 2009; Robins, Snijders, Wang, Handcock, & Pattison, 2007; Wang, Robins, Pattison, & Lazega, 2013, 2016; Wang et al., 2009). A second key assumption of ERGMs is that no single process ever explains the formation of network ties. For instance, reciprocity is an important social process, but no one would suggest that ties only form because of this. Further, preferential attachment, or the rich get richer as far as network ties go, is also a very useful social process, yet it is unlikely that this alone explains why any or all networks are formed. Indeed, there is a range of processes that occur simultaneously to make a network the way it is, and ERGM offers a way of analyzing this range of possible social processes all at the same time. So, an ERGM analysis can include network effects for reciprocity, preferential attachment, brokerage, transitive closure, homophily, and more, all in one analysis, unpacking statistically which of these are actually driving the formation of social ties. This makes it a rather powerful way of analyzing social networks and of comparing competing theories about tie formation all at the same time, one against the other. There have been several generations of ERGMs developed for one-mode networks, from the Bernoulli models (Erdos & Renyi, 1960) to the new specification models (Pattison & Robins, 2004; Snijders et al., 2006).
Extensions of ERGMs for the cases of two-mode or bipartite networks (Agneessens & Roose, 2008; Skvoretz & Faust, 1999; Wang, Robins, & Pattison, 2013; Wang et al., 2009), for multiplexed networks (Huitsing et al., 2012; Wang, 2013), and for multilevel networks (Wang, Robins, Pattison, & Lazega, 2013) were also presented. By treating nodal-level attribute measures as covariates, Robins, Elliott, and Pattison (2001) introduced social selection models that enable tests of how nodal attributes may affect social structures, based on the dependence assumption that the emergence of various social configurations is conditionally dependent on the attribute values of the nodes involved. So ERGMs allow us to capture network self-organization (i.e., ties come about due to the presence or absence of other ties, such as transitive closure) as well as the idea of social selection (i.e., ties come about because of the presence of actor attributes, such as homophily). In this chapter, we will look at recent developments of ERGMs, which examine networks for self-organizing properties and for the effects of actor attributes, and are therefore explicitly about predicting the presence of network ties. However, we also wish to demonstrate recent developments of models that are highly related to ERGMs, which focus on node-level outcomes. These models are auto-logistic actor attribute models (ALAAMs), and they test how nodal attributes may be affected by individuals’ network positions and attributes of their connected neighbors. ALAAMs are also known as social influence models and were introduced by Robins, Pattison, and Elliott (2001) and extended by Daraganova et al. (2012), Daraganova and Robins (2013), and Bryant et al. (2016). Together, these two models, ERGM and ALAAM, represent two sides of the same coin, social selection and social influence.
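The network configurations discussed in this section (reciprocity, transitive closure, cyclic closure) are simply counts of small substructures. The sketch below, on a hypothetical directed network, shows one way such counts could be computed; the function names and the counting conventions (ordered triples for transitive triads, each 3-cycle counted once) are assumptions made for illustration and need not match the definitions used by any particular ERGM software.

```python
from itertools import permutations

# Hypothetical directed network as a set of (sender, receiver) arcs.
arcs = {("a", "b"), ("b", "a"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "b")}
nodes = sorted({v for arc in arcs for v in arc})


def count_reciprocity(arcs):
    """Number of dyads with a tie in both directions."""
    return sum(1 for (i, j) in arcs if i < j and (j, i) in arcs)


def count_transitive_triads(arcs, nodes):
    """Ordered triples (i, j, k) with i->j, j->k and the shortcut i->k."""
    return sum(1 for i, j, k in permutations(nodes, 3)
               if (i, j) in arcs and (j, k) in arcs and (i, k) in arcs)


def count_cyclic_triads(arcs, nodes):
    """3-cycles i->j->k->i; each cycle appears in 3 rotations, so divide by 3."""
    hits = sum(1 for i, j, k in permutations(nodes, 3)
               if (i, j) in arcs and (j, k) in arcs and (k, i) in arcs)
    return hits // 3


print(count_reciprocity(arcs),
      count_transitive_triads(arcs, nodes),
      count_cyclic_triads(arcs, nodes))
```

In an ERGM, counts like these play the role of the graph statistics whose parameters are estimated.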
An ALAAM approach originates from auto-correlational methods long employed within spatial statistics for understanding how outcomes or attributes of interest are correlated with themselves across geospatial dimensions (Cliff & Ord, 1972), and more recently adapted for examining diffusion across social network ties (Doreian, 1989; Doreian, Teuter, & Wang, 1984). Like ERGMs, ALAAMs are based on a set of assumptions regarding the conditional dependence of dependent variables. Also like ERGM, ALAAM employs Markov chain Monte Carlo simulation-based methods of estimation and goodness-of-fit tests (Snijders, 2002). However, whereas ERGMs are models for predicting network ties given node-level attributes, and stochastic actor-oriented models (e.g., as implemented in RSiena) are for modeling the coevolution of node-level outcomes and network ties, ALAAMs are for predicting a node-level attribute given the network structure (and other nodal attributes), using cross-sectional (directed or undirected) network data. Therefore, ALAAM allows researchers to examine how individual behaviors, attitudes, or other outcomes may be constrained and enabled by the structure of surrounding social network relationships, by an individual’s own behaviors, attitudes, or outcomes, and by the behaviors, attitudes, or outcomes of other individuals to whom one is relationally tied. ALAAMs take a superficial form similar to that of a conventional binary logistic regression. However, ALAAM (as well as other network approaches, such as ERGMs) can be differentiated from conventional statistical techniques through the assumption of interdependencies of observations. We introduce the concept of multilevel networks, which have been the subject of a recent special issue of Social Networks (Lomi, Robins, & Tranmer, 2016) and a book (Lazega & Snijders, 2016). Theoretically, multilevel networks allow for the possibility of one level to influence or affect the other in both directions.
This means that you can use these multilevel network models to delineate bottom-up and top-down processes, making them quite different from traditional multilevel models, which aim to identify and explain the variances within and between nodes grouped into macro-level units, and between-unit ties cannot be considered. This means with multilevel ERGMs, we can understand how, for instance, organizational ties at one level may shape individuals at another level, but also how network ties between individuals within and across organizations may shape organizational ties.1 In what follows, we focus on the general methods of such models, and then on three applications of the recent multilevel ERGMs for the analysis of network structure, and multilevel ALAAMs and directed single-level ALAAMs for the prediction of nodal outcomes.

Model Constructs We briefly introduce some terminology and model definitions here, and in the following sections we provide examples that work through each of the recent developments in these models. Through the applications, the reader will get an understanding of the sorts of questions that these new models can answer, as well as the general theoretical assumptions made when using these models. Let X denote the random variable representing networks with n nodes, and Y denote the various attributes at the nodal level. The network random variable is a collection of network tie variables, that is, $X = \{X_{ij}\}$, where i and j are indices for nodes, and $X_{ij} = 1$ if there is a tie between nodes i and j; otherwise $X_{ij} = 0$. The attribute variable $Y = \{Y_i\}$ is a vector of attribute values, and $Y_i$ denotes the attribute variable for node i. There are $n(n-1)/2$ tie variables in a nondirected network, where $X_{ij} = X_{ji}$, and $2^{n(n-1)/2}$ possible networks in the graph space. For directed networks, there are $n(n-1)$ tie variables and $2^{n(n-1)}$ possible networks, and $X_{ij}$ represents a tie sent from node i to node j. Actor attributes can be binary, continuous, or categorical. For binary attribute measures, there are n variables and $2^n$ possible binary attribute vectors in the attribute space. The large numbers of possible graphs or attributes in the sample space make model parameter estimation difficult for traditional analytical methods. We briefly describe the techniques after describing the model constructs.
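To make the size of these sample spaces concrete, here is a small sketch that counts tie variables and possible outcomes, taking the graph-space size as 2 raised to the number of tie variables and the attribute-space size as 2 raised to the number of nodes. The function names are hypothetical.

```python
def n_tie_variables(n, directed=False):
    """Number of tie variables X_ij among n nodes (no self-ties)."""
    return n * (n - 1) if directed else n * (n - 1) // 2


def n_possible_networks(n, directed=False):
    """Each tie variable is binary, so the graph space has 2**(tie variables) members."""
    return 2 ** n_tie_variables(n, directed)


def n_binary_attribute_vectors(n):
    """One binary attribute per node gives 2**n possible attribute vectors."""
    return 2 ** n


for n in (4, 10, 20):
    print(n, n_possible_networks(n), n_possible_networks(n, directed=True),
          n_binary_attribute_vectors(n))
```

Even at n = 10 the nondirected graph space already has 2^45 members, which illustrates why exhaustive enumeration is impractical beyond very small networks.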

Multilevel ERGMs and ALAAMs ERGMs treat the network tie variables as the dependent variables and nodal attributes as covariates. They can be expressed as

$$\Pr(X = x \mid Y = y) = \frac{1}{\kappa} \exp\Big\{ \sum_{Q} \theta_Q \, z_Q(x, y) \Big\},$$

where
• Q defines configurations derived from various dependence assumptions about network tie variables and attribute covariates; that is, all ties and attributes involved in a configuration are considered dependent on one another given the rest of the network structure.
• $z_Q(x, y)$ are graph statistics counting the number of configurations of type Q. The graph statistics in general can be expressed as

$$z_Q(x, y) = \sum_{i,j,k} \; \prod_{\{Y_k, X_{ij}\} \in Q} x_{ij} \, y_k .$$

• $\theta_Q$ is the parameter associated with the statistic and the configuration of type Q.
• $\kappa$ is a normalizing constant that sums over the entire graph space $\mathcal{X}$, that is,

$$\kappa = \sum_{x \in \mathcal{X}} \exp\Big\{ \sum_{Q} \theta_Q \, z_Q(x, y) \Big\}.$$
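For a very small network the ERGM form above can be evaluated exactly by enumerating the whole graph space. The following sketch does this for a nondirected network with two illustrative configurations, edges and triangles, and with attribute covariates omitted for brevity. The choice of statistics, the parameter values, and the function names are assumptions made for illustration only, and enumeration is feasible only for a handful of nodes.

```python
from itertools import combinations, product
from math import exp


def graph_statistics(edges, n):
    """z_Q(x) for two configurations Q: number of edges and number of triangles."""
    edge_set = set(edges)
    n_triangles = sum(
        1 for i, j, k in combinations(range(n), 3)
        if (i, j) in edge_set and (j, k) in edge_set and (i, k) in edge_set
    )
    return (len(edge_set), n_triangles)


def ergm_distribution(n, theta):
    """Exact ERGM probabilities over all nondirected graphs on n nodes."""
    dyads = list(combinations(range(n), 2))
    weights = {}
    for indicator in product((0, 1), repeat=len(dyads)):
        edges = tuple(d for d, x in zip(dyads, indicator) if x)
        z = graph_statistics(edges, n)
        weights[edges] = exp(sum(t * s for t, s in zip(theta, z)))
    kappa = sum(weights.values())  # normalizing constant over the graph space
    return {g: w / kappa for g, w in weights.items()}, kappa


# Hypothetical parameters: negative edge effect, positive triangle effect.
probs, kappa = ergm_distribution(4, theta=(-1.0, 0.5))
print(len(probs), round(kappa, 4))
```

With all parameters set to zero every graph is equally likely and the normalizing constant equals the number of graphs, which is a convenient sanity check on the enumeration.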



Nodal attributes (Y) are the dependent variables in ALAAMs, and network structures or tie variables are considered as exogenous and affecting the formation of attributes. ALAAMs have the following general form:

$$\Pr(Y = y \mid X = x) = \frac{1}{\kappa} \exp\Big\{ \sum_{Q} \theta_Q \, z_Q(y, x) \Big\},$$

where Q, $z_Q(y, x)$, $\theta_Q$, and $\kappa$ follow similar definitions as in ERGMs, except that the dependent variable is now Y (so that $\kappa$ sums over the attribute space rather than the graph space), and the dependence assumptions are about how attributes are conditionally dependent on the network structures and other attributes. For nondirected networks, Erdos and Renyi (1960), Frank and Strauss (1986), and Snijders et al. (2006) proposed the most commonly used ERGM specifications in increasing order of complexity; for directed networks, there are much larger numbers of possible configurations where directions of ties have different interpretations from the perspective of the senders and receivers. Higher-order configurations represent different social processes, such as transitive closure, which reflects a path-shortening effect, or local social hierarchy (as some only receive ties, some only give, and others do both) as opposed to cyclic closure (in which everybody both gives and receives ties), which may represent a generalized exchange of information in the network (Robins et al., 2009). The most commonly used directed ERGMs follow the specifications proposed by Holland and Leinhardt (1981); van Duijn, Snijders, and Zijlstra (2004); and Robins et al. (2009). For ALAAMs, Daraganova and Robins (2013) presented a comprehensive nondirected specification, while Daraganova et al. (2012) applied such model specifications in a study of unemployment and spatial and network processes. ALAAMs for directed networks are demonstrated in Bryant et al. (2016) in the context of mental health and social networks in postdisaster situations. We present a directed ALAAM example later in this chapter. ERGMs or ALAAMs for multilevel networks follow the same constructs, except the multilevel network consists of two within-level one-mode networks (macro- and micro-level depending on contexts) and a single two-mode meso-level network (see Figure 13.1 for a visual representation of this multilevel framework). Attributes for nodes from different levels can be used as covariates in ERGMs and as dependent variables in ALAAMs. Wang et al. (Wang, Robins, Pattison, & Lazega, 2013, 2016) highlighted how network structure or attributes of nodes from one level may affect the structures or attributes of nodes at a different level through the dependencies among network ties from the micro-, macro-, and meso-level networks. Multilevel ERGMs have also been applied to different fields of research, such as international trade and policy (Hollway & Koskinen, 2016), organizational network studies (Zappa & Lomi, 2016), social-ecological systems (Bodin et al., 2016), and evaluations of network interventions (Matous & Wang, 2016). The general framework of multilevel network data we use here, and as used in MPNet (Wang, Robins, Pattison, & Koskinen, 2016), is depicted in Figure 13.1, where the macro-level (A) is used for organizations or groups, the micro-level (B) is people, and the meso-level (X) represents the affiliations between levels A and B.
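The three-part data structure described above (macro-level network A, micro-level network B, and meso-level affiliations X) can be sketched as three sets of ties. The example below uses hypothetical data and a hypothetical cross-level summary that classifies each person-to-person tie by how the two people's organizations relate. It is meant only to show how the levels interlock, and it is not an MPNet effect or data format.

```python
# Hypothetical multilevel data.
A = {("org1", "org2")}                                   # macro: organization ties
B = {("ana", "ben"), ("ben", "cai"), ("ana", "dee")}     # micro: person ties
X = {("ana", "org1"), ("ben", "org1"),                   # meso: (person, organization)
     ("cai", "org2"), ("dee", "org3")}


def orgs_of(person, X):
    """Organizations a person is affiliated with."""
    return {o for p, o in X if p == person}


def undirected(ties):
    """Treat ties as nondirected by adding the reversed pairs."""
    return ties | {(v, u) for u, v in ties}


def classify_micro_ties(A, B, X):
    """Split person-person ties by how the two people's organizations relate."""
    A_sym = undirected(A)
    counts = {"same_org": 0, "linked_orgs": 0, "unlinked_orgs": 0}
    for p, q in B:
        orgs_p, orgs_q = orgs_of(p, X), orgs_of(q, X)
        if orgs_p & orgs_q:
            counts["same_org"] += 1
        elif any((a, b) in A_sym for a in orgs_p for b in orgs_q):
            counts["linked_orgs"] += 1
        else:
            counts["unlinked_orgs"] += 1
    return counts


print(classify_micro_ties(A, B, X))
```

Counts of this kind, which involve ties from more than one level at once, are the sort of cross-level configuration that a multilevel model can include alongside within-level effects.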

Figure 13.1 Graphic representation of a multilevel network, showing macro level A (organizations), meso level X (affiliations), and micro level B (people).

We note that for ALAAMs, the number of data points is equal to n, the number of nodes in the network. For ERGMs, the number of data points is n(n – 1), which represents the number of possible ties between n actors in a directed network (excluding loops, or self-nominations). ALAAMs are therefore susceptible to issues with small networks, unlike ERGMs, but it does mean that ALAAMs can deal with rather large networks (say, 1,000 nodes), whereas ERGMs have trouble with networks of that size.
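Because an ALAAM has one binary outcome per node, a tiny example can be enumerated over all 2^n attribute vectors with the network held fixed. The sketch below uses two illustrative statistics, the number of nodes with the attribute and the number of ties joining two nodes that both have it. These statistics, the labels "density" and "contagion", the parameter values, and the network are all assumptions for illustration rather than a specification taken from the chapter.

```python
from itertools import product
from math import exp


def alaam_statistics(y, edges):
    """Two illustrative statistics for attribute vector y on a fixed network."""
    density = sum(y)                                           # nodes with y_i = 1
    contagion = sum(1 for i, j in edges if y[i] == 1 and y[j] == 1)
    return (density, contagion)


def alaam_distribution(n, edges, theta):
    """Exact ALAAM probabilities over all 2**n binary attribute vectors."""
    weights = {
        y: exp(sum(t * s for t, s in zip(theta, alaam_statistics(y, edges))))
        for y in product((0, 1), repeat=n)
    }
    kappa = sum(weights.values())  # normalizing constant over the attribute space
    return {y: w / kappa for y, w in weights.items()}


# Hypothetical exogenous network (a path on 4 nodes) and parameter values.
edges = [(0, 1), (1, 2), (2, 3)]
probs = alaam_distribution(4, edges, theta=(-0.5, 1.0))

# Conditional odds for node 0 given that its only neighbor (node 1) has the attribute.
odds_node0 = probs[(1, 1, 0, 0)] / probs[(0, 1, 0, 0)]
print(round(odds_node0, 4))
```

The conditional odds for a single node reduce to the exponential of a linear predictor, which is the sense in which the model superficially resembles a binary logistic regression while still allowing outcomes to depend on one another through the ties.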

Modeling Techniques The number of possible graphs or attribute vectors can be intractable with even a small number of nodes, which makes parameter estimation infeasible with traditional analytical methods. Snijders (2002) proposed several Markov chain Monte Carlo (MCMC) simulation strategies to generate graph samples representing a given ERGM. These simulated samples can then be used iteratively in ERGM parameter estimations. We applied the MCMC maximum likelihood estimation (MLE) algorithm proposed by Snijders (2002) based on Robbins and Monro (1951). The algorithm is implemented in MPNet (Wang, Robins, Pattison & Koskinen, 2016), which is used for the examples presented in this chapter. Other estimation methods including importance sampling (Geyer & Thompson, 1992) are implemented in statnet (Handcock et al., 2008), and Bayesian estimation methods proposed by Caimo and Friel (2011) and Koskinen, Robins, and Pattison (2010) are implemented in the R estimation package Bergm (Caimo & Friel, 2014).2 At the end of the MCMCMLE, model convergence is checked by comparing simulated graph samples and the empirical network by t-ratios. A model is considered as converged if all t-ratios are smaller than 0.1 in absolute values. Once a converged model is obtained, we test the significance of the parameter estimates by comparing the estimates and their estimated standard errors. If the ratio between an estimate and its standard error is greater than 2.0 in absolute value, we consider the parameter to be significantly different from zero, and the corresponding effect is statistically significantly stronger or weaker than what we expect from random depending on the signs of the estimates. To test the model goodness of fit (GOF), a large number of simulated graph samples are generated using the converged model while network statistics included as well as not included in the original model are collected. 
The goal is to test whether the model can capture features of the network that are not included in the model. For graph statistics not modeled in the model, a t-ratio value smaller than 2.0 indicates the model gives an adequate fit for the statistics. The models presented in this chapter provide adequate fits to most of the graph statistics implemented in MPNet. More detailed discussion on ERGM model selection and GOF are presented in Hunter, Goodreau, and Handcock (2008) and Wang, Robins, Pattison & Lazega (2016).
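The decision rules in this section (convergence when all t-ratios are below 0.1 in absolute value, significance when an estimate exceeds twice its standard error, and adequate fit when a goodness-of-fit t-ratio is below 2.0) can be sketched as follows. The t-ratio is taken here as the observed statistic minus the mean of the simulated statistics, divided by their sample standard deviation; that definition, the function names, and the numbers are assumptions for illustration.

```python
from statistics import mean, stdev


def t_ratio(observed, simulated):
    """(observed - mean of simulated values) / sample SD of simulated values."""
    return (observed - mean(simulated)) / stdev(simulated)


def is_converged(t_ratios, threshold=0.1):
    """Converged if every modeled statistic has |t-ratio| below the threshold."""
    return all(abs(t) < threshold for t in t_ratios)


def is_significant(estimate, std_error, threshold=2.0):
    """Estimate treated as different from zero if |estimate / SE| exceeds the threshold."""
    return abs(estimate / std_error) > threshold


def fits_adequately(t, threshold=2.0):
    """Adequate fit for a non-modeled statistic if |t-ratio| is below the threshold."""
    return abs(t) < threshold


# Hypothetical simulated counts of one graph statistic, and two observed values.
simulated_counts = [9, 10, 11, 10, 10]
print(is_converged([t_ratio(10.05, simulated_counts)]))
print(fits_adequately(t_ratio(12, simulated_counts)))
print(is_significant(0.9, 0.4))
```

In practice the simulated values would come from the MCMC graph samples produced by the fitted model rather than from a hand-written list.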

Empirical Examples We present three empirical examples that highlight the range of applications of such models and the theoretical questions one can answer using directed ALAAMs and multilevel ERGMs and ALAAMs. The first empirical example demonstrates the application of multilevel ERGMs to network data collected in an organization. The second empirical example uses multilevel ALAAMs to investigate the impact of social networks on individual satisfaction through social influence mechanisms in an agro-pastoralist Senegalese community. The third and final empirical example uses single-level directed ALAAMs to model the personal investment of identity by scientists in their research labs. All models were fitted using the MPNet software (Wang, Robins, Pattison & Koskinen, 2016), available on the MelNet website (http://www.melnet.org.au). Additionally, there is a user manual that provides details of the range of network parameters for these various new models. Examples of how to structure the data are given in the manual, and example datasets are provided on the website.

Empirical Example 1: Multiple Project Memberships and Advice Seeking in Organizations

For the first example, we draw on multilevel network data initially presented by Brennecke and Rank (2016) on 68 individuals working on 37 research and development (R&D) projects in a German high-tech firm. The research question is how membership in various work projects relates to interpersonal advice seeking, which has implications for effective knowledge transfer and project allocation. Effective knowledge sharing is a key concern for organizations (Kogut & Zander, 1992), particularly for innovation (Fleming, 2001). The multilevel network in this example consists of a bipartite network reflecting individuals’ memberships in multiple project teams as well as an advice-seeking network between individuals. There are no ties between projects (i.e., the red macro-level ties in the example in Figure 13.1 are not included). Individuals were asked to select from a list all projects that they worked on. In addition, they were asked to indicate to whom they turned for work-related advice, selecting from a roster of all employees working in the same division. We also take into account individual attribute data on hierarchical status (binary attribute), membership in stable work units (categorical attribute), and an individual’s number of inventions (continuous attribute). All networks are binary, because MPNet requires dichotomized network data (see Note 3). We fitted a multilevel ERGM that included single-level structural and attribute-based effects for the bipartite project membership network and the advice network following established standards. The model yields a good fit. Table 13.1 shows the model estimation results, including the effects specified in the model, a graphical representation of each network configuration, the estimates, and standard errors.
Table 13.1  Results of the Multilevel ERGM for Project Memberships and Advice Seeking
(The Configuration column of network diagrams is not reproduced.)

Effect                                    Estimate   Standard Error

Effects relating to the advice network (micro-level)
ArcA                                      –2.760*    1.030
ReciprocityA                               2.478*    0.296
TwoPathA                                  –0.014     0.009
AinSA                                     –0.870     0.492
AoutSA                                     0.343     0.208
ATA-T                                      1.349*    0.116
ATA-C                                     –0.307*    0.051
A2PA-T                                    –0.103*    0.014
Hierarchical status SenderA                0.092     0.107
Hierarchical status ReceiverA              0.078     0.121
Hierarchical status InteractionA           0.721*    0.220
# of inventions SenderA                   –0.020*    0.008
# of inventions ReceiverA                  0.001     0.009
# of inventions DifferenceA                0.012     0.010
Unit membership MatchA                     1.160*    0.119

Effects relating to the project-researcher affiliation network (meso-level)
XEdge                                     –3.660*    0.753
XASA                                       0.528     0.494
XASB                                       1.281*    0.264
XACA                                      –0.419*    0.064
XACB                                      –0.247*    0.092

Multilevel effects (macro-level)
ABoutS1X                                  –0.005     0.010
ABinS1X                                   –0.031*    0.009
ATXBXarc                                   0.951*    0.113
ATXBXreciprocity                          –0.990*    0.217
# of inventions Star2BXSender              0.003*    0.001
# of inventions Star2BXReceiver           –0.001     0.001
Unit membership TXBXMatchArc               0.231*    0.048
Unit membership L3XBXMatchArc             –0.017*    0.006

We first interpret the effects capturing the structure of the micro-level advice network. The arc effect (ArcA) can be understood as a constant term in a regression and is not interpreted. We find tendencies for reciprocity and transitivity (ATA-T) as well as a tendency against cyclic closure (ATA-C). Thus, advice transfer is characterized by direct reciprocation as well as by hierarchical structuring: while a tendency for cyclic closure would indicate that all individuals are equally involved in advice transfer, positive transitive closure represents a form of informal hierarchy in the sense that there is only one individual in a triad that two others turn to for advice. The tendency for transitive closure is additionally confirmed by the negative alternating two-paths (A2PA-T) parameter, indicating a tendency against open paths. Regarding actor attributes, we find a tendency for homophily based on individuals’ hierarchical status (Hierarchical status InteractionA). In other words, individuals with the same hierarchical status are more likely to seek advice from each other. We find a negative effect of individuals’ number of inventions on their tendency to seek advice (# of inventions SenderA). Finally, individuals are more likely to seek advice from each other if they belong to the same organizational unit (Unit membership MatchA).

Just like the arc effect in the micro-level advice network, the edge effect (XEdge) in the meso-level project membership network is akin to the constant term in a regression. The positive (XASB) effect indicates heterogeneity in activity: some project members belong to a high number of projects, while others only work on a few. The negative clustering parameters, (XACA) and (XACB), show that individuals tend not to share memberships in the same project teams and that project teams tend not to be connected by multiple individuals. Thus, the project membership network is characterized by tendencies against local clustering.

Concerning the multilevel effects, the (ABoutS1X) effect captures the interaction between knowledge workers’ number of project memberships and their tendency to seek advice. It is negative, indicating that the more projects individuals work on, the less they seek advice from their colleagues. The positive (ATXBXarc) effect indicates that individuals working on the same project tend to seek advice from one another. However, they tend not to reciprocate advice, as indicated by the negative (ATXBXreciprocity) effect. The positive multilevel effect for individuals’ number of inventions (# of inventions Star2BXSender) indicates a three-way interaction between individuals’ number of project memberships, their number of inventions, and their advice-seeking behavior. While, in general, an individual’s number of inventions negatively influences their tendency to seek advice, as shown by the micro-level sender effect, the number of inventions positively influences their advice-seeking tendency if they work on a high number of projects. Finally, we also find an influence of individuals’ unit membership on the interaction between their project memberships and their advice ties: individuals belonging to the same unit and working on the same project are more likely to transfer advice (Unit membership TXBXMatchArc), while individuals belonging to the same unit and working on different projects are less likely to have an advice relationship with each other (Unit membership L3XBXMatchArc). We do not go into detailed interpretations of the implications of these results here. Collectively, these findings give us a rich and detailed picture of the structure of advice seeking by R&D workers and its connection to project membership. The study of effective R&D units like this one allows us to go beyond truisms such as “you should collaborate more” and instead gain detailed insights into particular structures of interaction that may be beneficial to an organization.
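Because ERGM estimates are conditional log-odds, one way to read the magnitudes in Table 13.1 is to exponentiate them. The sketch below does this for three micro-level estimates; the helper name is hypothetical, and the resulting odds ratios hold only conditionally on the rest of the network and for a tie whose presence adds exactly one instance of the configuration in question.

```python
import math

def conditional_odds_ratio(estimate, change=1.0):
    """Multiplier on the conditional odds of a tie when the tie adds
    `change` instances of a configuration, all else held fixed."""
    return math.exp(estimate * change)

# Estimates copied from the micro-level block of Table 13.1.
estimates = {
    "ReciprocityA": 2.478,
    "Hierarchical status InteractionA": 0.721,
    "Unit membership MatchA": 1.160,
}

for name, estimate in estimates.items():
    ratio = conditional_odds_ratio(estimate)
    print(f"{name}: conditional odds multiplied by about {ratio:.2f}")
```

For the alternating (geometrically weighted) effects such as ATA-T, a single tie usually changes the statistic by a non-integer amount, so the `change` argument would need to be computed from the network rather than set to one.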

Example 2: Common Resource Management Satisfaction and Information Exchanges between Users

The second example presents a multilevel ALAAM investigating individual satisfaction with the collective water supply management in a Senegalese village community. As ALAAM is a social influence model, we are concerned with predicting node-level outcomes (as opposed to ERGMs, where we are trying to predict network tie formation). This rural community is composed of 462 individuals divided into two ethnic groups, the Fulani and the Wolof, in latent conflict about water and land issues. In 2004, an American nongovernmental organization funded the construction of a water tower carrying water to every household in that community and allocated the management of this common-pool resource to an elected board composed of 33 villagers representing all social groups in the community. The dependent variable studied here corresponds to individual satisfaction (i.e., whether the village interests are satisfied or not) regarding the management of the common resource by the board. Two important individual attributes are taken into consideration, ethnicity and gender, and implemented as binary independent variables. Satisfaction is highly dependent on ethnicity: generally, while Wolof villagers are satisfied, Fulani villagers who are livestock owners largely consider that their needs are not taken into account by the board and so are less satisfied. Satisfaction also depends on gender, with women being more satisfied than men. Men, who are principally responsible for animal watering, are more exposed to water pressure problems at critical times of the day, unlike women, who use water at off-peak times, mainly for cooking. These differences are, nonetheless, weakened by the common experience of problems related to water distribution and inequalities in institutional arrangements. Indeed, as shown by many researchers in the social sciences (e.g., Coleman, 1988), social networks are crucial for understanding the diffusion of opinions in a specific milieu. Hence, this study examines satisfaction with water management as coming from attribute effects and from social influence mechanisms that permeate through networks. Because of the organization of this community into different social foci (Feld, 1981), individuals’ satisfaction needs to be investigated through a multilevel network approach, distinguishing and articulating at the same time both villagers and the board members who are in charge of the common resource. This multilevel network is composed of the following:

1) Information exchanges among villagers (429 × 429)
2) Information exchanges among board members (33 × 33)
3) Information exchanges between villagers and their board members (429 × 33)

We fitted a multilevel ALAAM including these three networks using the MPNet software. The results are presented in Table 13.2. We first interpret the model effects concerning the villagers. The density parameter (DensityB) is similar to the constant in a logistic regression analysis and is not interpreted. Second, no degree effects are significant: villagers who ask many others for information (SenderB) or provide information to many others (ReceiverB) are neither more nor less likely to be satisfied. The contagion effect is significant and positive (ContagionB). This effect corresponds to a social influence mechanism: that is, individuals who obtain information and advice from satisfied others are more likely to be satisfied.
Regarding the actor attribute effects, only ethnicity (EthnicityB) is significant and negative: Fulani are less satisfied than Wolof. At the villager level, this finding confirms the view that Fulani think their needs are not taken into account by the board. The reasons for this are infrastructural and institutional. While the largest Fulani village possesses more livestock than all other villages together, it is connected to the water distribution network by a secondary pipe that provides very low water pressure, especially when the villagers water their animals in the morning and in the evening. Institutionally, Fulani are underrepresented on the board and occupy only deputy positions. Next, we interpret the effects regarding the board members (the macro level, level A). Here, none of the structural effects is significant. The same holds for the attribute effects. These nonsignificant results reflect, interestingly, an important aspect underlined by Faye (2011, 2013): structural aspects that could undermine collective action, like subgroup homophily, are not reflected in the board as a regulating institutional unit. On the one hand, significant structural and compositional aspects at the village community level are not observed as such in the board. On the other hand, Fulani board members consider, contrary to their principals, that their villages’ interests are satisfied. These results raise the question of whether the board as an institution fulfills a social-integrative function or whether it has built an entity that stands apart from the rest of the villagers and has its own internal social processes (see Note 4). Another reason for the nonsignificant effects could be that the second level consists of only 33 individuals: this small number of observations could make it difficult to obtain any significant effect.
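The villager-level effects discussed above correspond to simple counts over the network and the outcome vector. The sketch below computes four such statistics on a toy directed network; it assumes the commonly used definitions (density as the number of actors with the outcome, sender and receiver as out- and in-degree summed over those actors, contagion as the number of ties linking two actors who both have the outcome), which may differ in detail from MPNet's parameterization. The function name and the data are hypothetical.

```python
def alaam_statistics(adj, y):
    """Single-level ALAAM statistics for a directed network.

    adj[i][j] == 1 means actor i sends a tie to actor j;
    y[i] == 1 means actor i has the outcome (e.g., is satisfied).
    """
    n = len(y)
    out_deg = [sum(adj[i]) for i in range(n)]
    in_deg = [sum(adj[j][i] for j in range(n)) for i in range(n)]
    return {
        "density": sum(y),
        "sender": sum(y[i] * out_deg[i] for i in range(n)),
        "receiver": sum(y[i] * in_deg[i] for i in range(n)),
        "contagion": sum(
            adj[i][j] * y[i] * y[j] for i in range(n) for j in range(n)
        ),
    }

# Toy network with ties 0->1, 1->0, 1->2, 3->2; actors 0, 1, 3 are satisfied.
toy_adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
toy_y = [1, 1, 0, 1]

stats = alaam_statistics(toy_adj, toy_y)
print(stats)
```

A positive contagion estimate means that outcome vectors with more ties between satisfied actors are more probable than chance, net of the density and degree effects.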

Table 13.2  Results of the Multilevel ALAAM for Common Resource Management and Interpersonal Information Sharing on Individual-Level Satisfaction
(The Configuration column of network diagrams is not reproduced.)

Effect                                    Estimate   Standard Error

Effects relating to the network between villagers (micro-level B)
DensityB                                   1.664*    0.434
SenderB                                   –0.153     0.093
ReceiverB                                 –0.139     0.096
ContagionB                                 0.224*    0.102
EthnicityB (1 = Fulani) ego               –2.27*     0.278
GenderB (1 = female) ego                   0.029     0.288

Effects relating to the network of board members (macro-level A)
DensityA                                  –0.022     1.187
SenderA                                    0.305     0.208
ReceiverA                                  0.019     0.194
ContagionA                                –0.394     0.34
EthnicityA (1 = Fulani) ego                1.239     1.263
GenderA (1 = female) ego                  –1.586     0.996

Effects relating to affiliation network of villagers and board members (meso-level X)
ActivityXA                                –0.248     0.168
ActivityXB                                –0.572*    0.175
ContagionX-AB                              0.499*    0.246

These results do not mean that board members do not have an impact on individual satisfaction through social influence mechanisms. The last part of Table 13.2 shows precisely how interactions between villagers and board members are important. First, we find a significant (ActivityXB) effect: the more a villager obtains information from or provides information to board members, the less he or she is satisfied. This effect raises the question of causality, that is, whether individuals, because of their dissatisfaction, try to interact with board members to complain or correct them, or whether they are dissatisfied because they interact with many board members and are aware of splits and dissensions in the institution. The inverse effect, (ActivityXA), is not significant. A board member’s satisfaction is not related to how many villagers he or she interacts with, irrespective of those villagers’ satisfaction. The (ContagionX-AB) effect shows that there is social influence between board members and villagers: board members interacting with satisfied villagers are more satisfied, and the inverse is also true. This shows opinion assortativity between levels, and how satisfied board members may use other satisfied villagers to whom they have social ties as a voice or messenger to explain board decisions. This short example shows that it is important to take into account the social relationships among individuals to explain individual opinion. Furthermore, it also emphasizes the multilevel dimension of the study of collective action and the management of a common resource, and how such social structures operating at a different but related level may also have an impact. Given the claim that we live in an organizational society (Lazega, 2012; Perrow, 1991), this multilevel ALAAM network approach could be used in many frameworks involving multiple levels of agency for studying social influence.
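The cross-level effects in the last block of Table 13.2 can be illustrated in the same counting style. The sketch below assumes that the activity statistics sum cross-level degree over actors who have the outcome at the respective level, and that cross-level contagion counts villager-to-board ties whose two endpoints both have the outcome. These definitions, the function name, and the toy data are illustrative assumptions rather than MPNet's documented parameterization.

```python
def cross_level_statistics(x, y_a, y_b):
    """Cross-level ALAAM statistics for a bipartite network.

    x[a][b] == 1 means board member a (level A) and villager b (level B)
    exchange information; y_a and y_b are binary satisfaction outcomes.
    """
    n_a, n_b = len(y_a), len(y_b)
    deg_a = [sum(x[a]) for a in range(n_a)]
    deg_b = [sum(x[a][b] for a in range(n_a)) for b in range(n_b)]
    return {
        "ActivityXA": sum(y_a[a] * deg_a[a] for a in range(n_a)),
        "ActivityXB": sum(y_b[b] * deg_b[b] for b in range(n_b)),
        "ContagionX-AB": sum(
            x[a][b] * y_a[a] * y_b[b]
            for a in range(n_a)
            for b in range(n_b)
        ),
    }

# Toy data: 2 board members, 3 villagers.
toy_x = [
    [1, 1, 0],  # board member 0 is tied to villagers 0 and 1
    [0, 1, 1],  # board member 1 is tied to villagers 1 and 2
]
toy_y_a = [1, 0]      # only board member 0 is satisfied
toy_y_b = [1, 0, 1]   # villagers 0 and 2 are satisfied

cross_stats = cross_level_statistics(toy_x, toy_y_a, toy_y_b)
print(cross_stats)
```

Under these definitions, a negative ActivityXB estimate combined with a positive ContagionX-AB estimate says that contact with the board goes with lower villager satisfaction in general, but less so when the board members contacted are themselves satisfied.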

Example 3: How Are Individual Accomplishments Shared across the Team?

In this final example, we return to a single-level network analysis using a directed ALAAM to investigate how network position predicts researchers’ self-identification with their research labs. Other analyses using directed ALAAMs could focus on predicting individual performance from one’s position in an organization’s informal network of trust or advice, or player performance on a sports team. There are myriad possible applications for this single-level ALAAM. To illustrate the utility of ALAAM, we compare an ALAAM with a conventional logistic regression model to show the extra benefits that including social network ties can have for understanding individual behaviors. As ALAAM is a logistic regression model, the outcome to be predicted is dichotomous. In cases where an outcome is based on a continuous measure, it is necessary to assign a cut-off score, above which the individual is considered to have the outcome, and below which the individual is considered not to. While many simple solutions can be used (e.g., a median cut-off), these are sometimes criticized as arbitrary. Thus, we took a statistical approach using latent class analysis (LCA), which groups individual cases into probable (latent) groups or classes. Based on responses to the item “If someone were to criticize my research team, it would feel like a personal insult,” we dichotomized our dependent variable into “1” (= highly invested) or “0” (= not highly invested). Future improvements to ALAAM may include the prediction of continuous attributes, which would avoid these issues. In regard to the predictors in the model, we included a set of network effects for various dyadic (one-to-one) configurations of directed network ties. These included a sender effect (outgoing trust ties predict the outcome), receiver effect (incoming trust ties predict the outcome), reciprocity effect (bidirectional trust ties), contagion effect (co-occurring outcomes across a trust tie), and contagion-reciprocity effect (co-occurring outcomes across a reciprocated trust tie). Additionally, a set of network-attribute interaction effects was specified. These effects consisted of interactions between the network effects mentioned previously (sender, receiver, reciprocity, contagion) and several individual attributes, including the number of patents, the number of publications, and team leadership role (indicating seniority) (see Note 5). Additional Markov-based parameters involving three nodes are possible within ALAAMs. However, we restricted our analyses to dyadic configurations to maintain adequate statistical power given the relatively small number of network members.
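The reciprocity-based configurations and the network-attribute interactions described above can be sketched as follows. Everything here is an illustrative assumption: the counting conventions (for example, counting each mutual dyad once for contagion-reciprocity), the use of log(1 + count) as the transformation that weights the first patent most heavily, the function name, and the toy data are not taken from MPNet.

```python
import math

def reciprocity_statistics(adj, y, counts):
    """Reciprocity-based ALAAM statistics for a directed trust network.

    adj[i][j] == 1 means i trusts j; y[i] == 1 means i is highly invested;
    counts[i] is a raw count attribute (e.g., number of patents).
    """
    n = len(y)
    # Assumed transformation: diminishing weight for each additional patent.
    attr = [math.log(1 + c) for c in counts]
    mutual = [
        [1 if adj[i][j] == 1 and adj[j][i] == 1 else 0 for j in range(n)]
        for i in range(n)
    ]
    mutual_deg = [sum(mutual[i]) for i in range(n)]
    return {
        # Reciprocated ties held by actors with the outcome.
        "reciprocity": sum(y[i] * mutual_deg[i] for i in range(n)),
        # Mutual dyads in which both actors have the outcome (counted once).
        "contagion_reciprocity": sum(
            mutual[i][j] * y[i] * y[j]
            for i in range(n)
            for j in range(i + 1, n)
        ),
        # Interaction of ego's (transformed) attribute with reciprocated ties.
        "attr_ego_reciprocity": sum(
            y[i] * attr[i] * mutual_deg[i] for i in range(n)
        ),
    }

# Toy trust network with ties 0->1, 1->0, 1->2, 3->2.
trust = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
invested = [1, 1, 0, 1]
patents = [0, 3, 1, 0]

recip_stats = reciprocity_statistics(trust, invested, patents)
print(recip_stats)
```

The ego-attribute interaction shows why a count such as patents can matter only in combination with network position: an actor with many patents but no reciprocated ties contributes nothing to this statistic.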

Model specification and estimation for the ALAAM proceeded in three general steps. First, the full original model was estimated and converged with good model fit. Second, for the sake of model parsimony, network-attribute interactions were removed if they were nonsignificant (except if they were a nested component of another significant interaction effect). The revised model was re-estimated and found to be convergent; however, the goodness-of-fit test showed that one out-of-model effect, involving self-identification and trust ties among support role workers, was not adequately reproduced by this model. Thus, in a third step, we included the support role effects in a respecified model, which subsequently converged with excellent fit characteristics. This final ALAAM is presented in Table 13.3.

Table 13.3  ALAAM Directed Model—Outcomes: Self-Identification with Research Lab   Effect

  Configuration

ALAAM

Binary Logisticb B

SE

95% CI

Density

.14

.31

-.43

.85

.02

.35

Trust nominations/sender

.06

.09

-.12

.21

–.04

.19

Receiver

.02

.18

Reciprocity

.75

.63

Contagion

.13

.25

–1.19

.84

Contagion × reciprocity Publicationsb (ego) Patents (ego) b

Θ

SE

–.14

.14

–.40

.09

–.35

.20

.54

.42

–.28

1.67

2.47

1.28

Patents (ego) sender

–2.61

Patents (ego) receiver

.04

Patents (ego) reciprocity

3.41

Sig.

1.09 * .15 1.66 *

Patents (alter) sender

.11

.13

Patents (alter) receiver

–.86

.41

*

Patents (alter) reciprocity

1.72

.70

*

Team leader (ego) Support role (ego)

–.06 2.10

2.13

2.93 –1.53 20.81

–2.73 1.56 1.70

1.39

Support role (ego) sender

.24

.82

Support role (alter) sender

.54

.69

Support role (ego + alter) sender

.22

–1.88

–5.69 2.65 *

* Wald > 2. a Based on 1,000 bootstrap samples. b Logarithmic transformation of count.

For comparative purposes, this final ALAAM is accompanied by a conventional logistic regression in which probable self-identification is regressed onto individual-level factors, as well as a simple count of outgoing trust nominations, as one might find in a rudimentary egocentric network study. As seen in the conventional binary logistic regression, in which all effects are purely individual effects, no single (individual-level) effect is significant, providing no indication as to what underlies a perception of personal investment in the research lab. This includes the individual’s number of trust nominations (outgoing trust ties), which also bears no direct relationship to the outcome. Turning to the ALAAM provides a markedly different picture. Self-identification with the research lab is predicted by the number of patents, not as an individual attribute, but in terms of its interaction with reciprocal trust ties between those named on the patent(s) and their colleagues. In particular, reciprocated trust ties that involved individuals with a higher number of patents were predictive of a greater likelihood of self-identification for both the individual named on the patents and the trusted network partner. By contrast, we see an exact mirror image with respect to unreciprocated trust ties. When we observed an individual named on patents placing trust in other individuals, who in turn did not reciprocate that trust, we observed a lower likelihood of self-identification for both those individuals. Finally, in cases where we see one support worker placing trust in another, we tend not to observe self-identification with the research lab. While achievements usually carry the names of specific individuals, they are nonetheless often the result of a collective effort on the part of a wider team. Understanding the relational structure of research teams or other work groups allows us to gain insight into how achievements accrued by relatively few individuals might support collective benefits, such as psychological cohesion across a group of coworkers. In this vein, this study offers us an unsurprising result with a few important twists. As one might expect, mutually acknowledged (i.e., reciprocal) trust relationships lie at the core of self-identification with the research lab. Interestingly, however, reciprocal trust ties by themselves are not a predictor of psychological cohesion with the group; rather, it is (reciprocal) trust ties that exist between named patent holders and other workers that matter most for personally investing oneself in one’s research lab. Crucially, the converse is also true: in the presence of unreciprocated trust ties, there is a dearth of self-identification with the research lab. This may suggest a deleterious effect on group cohesion when untrusted individuals (individuals who claim to trust others but are not trusted back) hold or win individual accolades. From a methodological standpoint, these results illustrate the usefulness and limitations of the ALAAM for analyzing cross-sectional sociocentric network data to predict an individual outcome. As with any cross-sectional model, the ALAAM cannot disentangle cause and consequence, and thus both selection and influence interpretations are possible. While our discussion here favors a view of reciprocal network ties leading to self-identification, our results could likewise suggest a different causal process whereby self-identification with the group (in combination with individual achievement) leads to the formation of reciprocal trust ties. Naturally, longitudinal network data and associated statistical models (e.g., RSiena) would allow for firmer conclusions regarding the exact social processes at play.
Nevertheless, as seen in this study, a single-level ALAAM has clear advantages over conventional regression methods in investigating how individual-level outcomes may be contingent on one’s network connections to other individuals and, by extension, their attributes and characteristics.
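The contrast between the two models can be made concrete through the conditional form of the ALAAM: given everyone else's outcomes, the log-odds that one actor has the outcome is a weighted sum of change statistics, which reduces to an ordinary logistic intercept when all network parameters are zero. The sketch below is a simplified illustration with four effects only; the parameter values are hypothetical and are not the estimates in Table 13.3.

```python
import math

def alaam_conditional_probability(adj, y, i, theta):
    """Conditional log-odds and probability that actor i has the outcome.

    Change statistics (outcome of i switched from 0 to 1, others fixed):
    density +1, sender +out-degree, receiver +in-degree, and contagion
    +number of ties (either direction) linking i to actors with the outcome.
    """
    n = len(y)
    out_deg = sum(adj[i])
    in_deg = sum(adj[j][i] for j in range(n))
    contagion_change = sum(
        (adj[i][j] + adj[j][i]) * y[j] for j in range(n) if j != i
    )
    log_odds = (
        theta["density"]
        + theta["sender"] * out_deg
        + theta["receiver"] * in_deg
        + theta["contagion"] * contagion_change
    )
    return log_odds, 1.0 / (1.0 + math.exp(-log_odds))

# Toy network with ties 0->1, 1->0, 1->2, 3->2; actors 0, 1, 3 have the outcome.
net = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
outcome = [1, 1, 0, 1]

# Hypothetical parameter values, for illustration only.
theta_network = {"density": -1.0, "sender": 0.1, "receiver": 0.2, "contagion": 0.5}
theta_no_network = {"density": -1.0, "sender": 0.0, "receiver": 0.0, "contagion": 0.0}

with_ties = alaam_conditional_probability(net, outcome, 2, theta_network)
without_ties = alaam_conditional_probability(net, outcome, 2, theta_no_network)
print("with network effects:", with_ties)
print("intercept only:", without_ties)
```

Actor 2 receives ties from two actors who have the outcome, so the network terms raise its conditional log-odds above the intercept; a model without network terms assigns every actor with the same attributes the same probability.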

Discussion and Future Steps

The examples presented in this chapter demonstrate how ERGMs and ALAAMs can be applied in multilevel network contexts, as well as how single-level ALAAMs can be applied to directed networks. The utility of directed ALAAMs is that they retain the focus on individual outcomes of conventional logistic regression but add the effects of social network ties. Further, multilevel network models, both ERGMs and ALAAMs, open up great possibilities for understanding how processes that occur at different levels shape the formation of social ties and individual outcomes. There are several technical frontiers for the development and application of ERGMs. Pattison and Snijders (2013) and Wang, Robins and Pattison (2013) described a hierarchy of dependence assumptions that can be used as a guide for future model specification development. Pattison et al. (2013) and Stivala et al. (2016) proposed conditional estimation methods for snowball-sampled network data such that ERGM parameter approximations for large population networks may be obtained from relatively small snowball samples. Koskinen et al. (2013) proposed Bayesian augmentation methods for analyzing networks with a potentially large amount of missing data. Snijders and Koskinen (2013) also proposed methods for estimating longitudinal models using ERGMs. Other technical developments are in progress. These include the comodeling of network structure and individual outcomes for cross-sectional data (Fellows & Handcock, 2012; Wang, Robins, Koskinen, & Lusher, 2016), where node-attribute associations are treated as two-mode networks and modeled together with the interpersonal one-mode network, and ALAAMs for multilevel networks, where the outcomes of nodes at one level may be affected by the attributes of nodes at a different level through the multilevel network structure (Wang, Brennecke, Lusher, & Robins, under review). We encourage readers to check the MelNet website (http://www.melnet.org.au/) for updates regarding new developments for ERGMs and ALAAMs in MPNet, as well as the statnet website for new developments in this space (https://statnet.csde.washington.edu/).

Notes

1. We note that with cross-sectional network models this is difficult to delineate, but at least at the level of interpretation we do not assume that one level necessarily controls the other.
2. statnet also implements the Robbins-Monro algorithm, while MPNet also implements the Bayesian estimation algorithm.
3. Currently MPNet can only deal with binary networks, though valued networks can be added as dyadic covariate networks to predict the observed network under study.
4. In this sense, the question of whether the regulating institution is composed of members of the subgroup elites could be relevant.
5. For both, a log-linear transformation was used to weight the first publication/patent most heavily, with each additional publication/patent diminishing in weight.


References

Agneessens, F., & Roose, H. (2008). Local structural properties and attribute characteristics in 2-mode networks: p* models to map choices of theater events. Journal of Mathematical Sociology, 32(3), 204–237. doi:10.1080/00222500802148685
Bodin, O., Robins, G., McAllister, R. R. J., Guerrero, A. M., Crona, B., Tengo, M., & Lubell, M. (2016). Theorizing benefits and constraints in collaborative environmental governance: A transdisciplinary social-ecological network approach for empirical investigations. Ecology and Society, 21(1), Article 40. doi:10.5751/ES-08368-210140
Brennecke, J., & Rank, O. N. (2016). The interplay between formal project memberships and informal advice seeking in knowledge-intensive firms: A multilevel network approach. Social Networks, 44, 307–318. doi:10.1016/j.socnet.2015.02.004
Bryant, R. A., Gallagher, H. G., Waters, E., Gibbs, L., Pattison, P., MacDougall, C., . . . Lusher, D. (2016). Mental health and social networks following disaster. American Journal of Psychiatry, 174(3), 277–285.
Caimo, A., & Friel, N. (2011). Bayesian inference for exponential random graph models. Social Networks, 33, 41–55.
Caimo, A., & Friel, N. (2014). Bergm: Bayesian Exponential Random Graphs in R. Journal of Statistical Software, 61(2), 1–25.
Cliff, A., & Ord, K. (1972). Testing for spatial autocorrelation among regression residuals. Geographical Analysis, 4(3), 267–284.
Coleman, J. (1988). Social capital in the creation of human capital. American Journal of Sociology, 94(Supplement: Organizations and Institutions: Sociological and Economic Approaches to the Analysis of Social Structure), S95–S120.
Daraganova, G., Pattison, P. E., Koskinen, J. H., Mitchell, B., Bill, A., Watts, M., & Baum, S. (2012). Networks and geography: Modelling community network structures as the outcome of both spatial and network processes. Social Networks, 34(1), 6–17.
Daraganova, G., & Robins, G. L. (2013). Autologistic actor attribute models. In D. Lusher, J. Koskinen, & G. Robins (Eds.), Exponential random graph models for social networks: Theory, methods and applications (pp. 102–114). New York, NY: Cambridge University Press.
Doreian, P. (1989a). Network autocorrelation models: Problems and prospects. In D. A. Griffith (Ed.), Spatial statistics: Past, present, future. Ann Arbor, MI: Michigan Document Services.
Doreian, P., Teuter, K., & Wang, C. (1984). Network autocorrelation models. Sociological Methods & Research, 13(2), 155–200.
Erdos, P., & Renyi, A. (1960). On the evolution of random graphs. Bulletin of the International Statistical Institute, 38(4), 343–347.
Faye, M. (2011). Biased net model and subgroup relations: Social integration in heterogeneous groups. Procedia - Social and Behavioral Sciences, 100, 2–20.
Faye, M. (2013). Soziale Netzwerke im kollektiven Handeln und Entscheiden: Das Allmende-Problem einer Dörfergemeinschaft im Nordwesten Senegals (PhD thesis). Carl von Ossietzky Universität Oldenburg.
Feld, S. L. (1981). The focused organization of social ties. American Journal of Sociology, 86(5), 1015–1035. doi:10.1086/227352
Fellows, I., & Handcock, M. (2012). Exponential-family random network models. https://arxiv.org/abs/1208.0121v1
Fleming, L. (2001). Recombinant uncertainty in technological search. Management Science, 47(1), 117–132. doi:10.1287/mnsc.47.1.117.10671

Frank, O., & Strauss, D. (1986). Markov graphs. Journal of the American Statistical Association, 81(395), 832–842.
Geyer, C. J., & Thompson, E. (1992). Constrained Monte Carlo maximum likelihood for dependent data. Journal of the Royal Statistical Society, Series B, 54, 657–699.
Handcock, M., Hunter, D. R., Butts, C., Goodreau, S., & Morris, M. (2008). statnet: Software tools for the representation, visualization, analysis and simulation of network data. Journal of Statistical Software, 24(1), 1–11.
Holland, P. W., & Leinhardt, S. (1981). An exponential family of probability-distributions for directed-graphs. Journal of the American Statistical Association, 76(373), 33–50.
Hollway, J., & Koskinen, J. (2016). Multilevel embeddedness: The case of the global fisheries governance complex. Social Networks, 44, 281–294. doi:10.1016/j.socnet.2015.03.001
Huitsing, G., van Duijn, M. A. J., Snijders, T. A. B., Wang, P., Sainio, M., Salmivalli, C., & Veenstra, R. (2012). Univariate and multivariate models of positive and negative networks: Liking, disliking, and bully-victim relationships. Social Networks, 34(4), 645–657. doi:10.1016/j.socnet.2012.08.001
Hunter, D. R., Goodreau, S. M., & Handcock, M. S. (2008). Goodness of fit of social network models. Journal of the American Statistical Association, 103(481), 248–258.
Kogut, B., & Zander, U. (1992). Knowledge of the firm, combinative capabilities, and the replication of technology. Organization Science, 3(3), 383–397. doi:10.1287/orsc.3.3.383
Koskinen, J. H., Robins, G. L., & Pattison, P. E. (2010). Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation. Statistical Methodology, 7(3), 366–384.
Koskinen, J. H., Robins, G. L., Wang, P., & Pattison, P. E. (2013). Bayesian analysis for partially observed network data, missing ties, attributes and actors. Social Networks, 35(4), 514–527. doi:10.1016/j.socnet.2013.07.003
Lazega, E. (2012). Sociologie néo-structurale. In G. Bronner & R. Keucheyan (Eds.), Théories sociales contemporaines (pp. 113–128). Paris, France: Presses Universitaires de France.
Lazega, E., & Snijders, T. A. B. (Eds.). (2016). Multilevel network analysis for the social sciences: Theory, methods and applications. Switzerland: Springer.
Lomi, A., Robins, G., & Tranmer, M. (2016). Introduction to multilevel social networks. Social Networks, 44, 266–268. doi:10.1016/j.socnet.2015.10.006
Lusher, D., Koskinen, J., & Robins, G. (Eds.). (2013). Exponential random graph models for social networks: Theory, methods and applications. New York, NY: Cambridge University Press.
Matous, P., & Wang, P. (2016). Measuring outcomes of social networking event at distant locations. Paper presented at the 1st Australian Social Network Analysis Conference, Melbourne, Australia.
Pattison, P., & Snijders, T. A. B. (2013). Statistical models for social networks: Future directions. In D. Lusher, J. Koskinen, & G. Robins (Eds.), Exponential random graph models for social networks: Theory, methods and applications (pp. 287–301). New York, NY: Cambridge University Press.
Pattison, P. E., & Robins, G. L. (2004). Building models for social space: Neighbourhood-based models for social networks and affiliation structures. Mathematics and Social Sciences, 42(168), 11–29.
Pattison, P. E., Robins, G. L., Snijders, T. A. B., & Wang, P. (2013). Conditional estimation of exponential random graph models from snowball sampling designs. Journal of Mathematical Psychology, 57(6), 284–296.

Perrow, C. (1991). A society of organizations. Theory and Society, 20(6), 725–762. doi:10.1007/bf00678095
Robbins, H., & Monro, S. (1951). A stochastic approximation method. Annals of Mathematical Statistics, 22, 400–407.
Robins, G., Pattison, P. E., Kalish, Y., & Lusher, D. (2007). An introduction to exponential random graph (p*) models for social networks. Social Networks, 29(2), 173–191.
Robins, G. L., Elliott, P., & Pattison, P. E. (2001). Network models for social selection processes. Social Networks, 23(1), 1–30.
Robins, G. L., Pattison, P. E., & Elliott, P. (2001). Network models for social influence processes. Psychometrika, 66(2), 161–189.
Robins, G. L., Pattison, P. E., & Wang, P. (2009). Closure, connectivity and degree distributions: Exponential random graph (p*) models for directed social networks. Social Networks, 31(2), 105–117.
Robins, G. L., Snijders, T. A. B., Wang, P., Handcock, M., & Pattison, P. E. (2007). Recent developments in exponential random graph (p*) models for social networks. Social Networks, Special Section: Advances in Exponential Random Graph (p*) Models, 29(2), 192–215.
Skvoretz, J., & Faust, K. (1999). Logit models for affiliation networks. Sociological Methodology, 29, 253–280.
Snijders, T. A. B. (2002). Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 3(2).
Snijders, T. A. B., & Koskinen, J. (2013). Longitudinal models. In D. Lusher, J. Koskinen, & G. Robins (Eds.), Exponential random graph models for social networks: Theory, methods and applications (pp. 130–140). New York, NY: Cambridge University Press.
Snijders, T. A. B., Pattison, P. E., Robins, G. L., & Handcock, M. (2006). New specifications for exponential random graph models. Sociological Methodology, 36, 99–153.
Stivala, A. D., Koskinen, J. H., Rolls, D. A., Wang, P., & Robins, G. L. (2016). Snowball sampling for estimating exponential random graph models for large networks. Social Networks, 47, 167–188. doi:10.1016/j.socnet.2016.05.11.003
van Duijn, M. A. J., Snijders, T. A. B., & Zijlstra, B. (2004). A random effects model with covariates for directed graphs. Statistica Neerlandica, 58, 234–254.
Wang, P., Brennecke, J., Lusher, D., & Robins, G. (under review). Social influence analysis using multilevel ALAAMs: Individual performance, inter and intra organizational network structures. Organizational Research Methods.
Wang, P., Sharpe, K., Robins, G. L., & Pattison, P. E. (2009). Exponential random graph (p*) models for affiliation networks. Social Networks, 31(1), 12–25.
Wang, P. (2013). Exponential random graph model extensions: Models for multiple networks and bipartite networks. In D. Lusher, J. Koskinen, & G. Robins (Eds.), Exponential random graph models for social networks: Theory, methods and applications (pp. 130–140). New York, NY: Cambridge University Press.
Wang, P., Robins, G., & Pattison, P. (2013). Exponential random graph model specifications for bipartite networks: A dependence hierarchy. Social Networks, 35(2), 211–222.
Wang, P., Robins, G., Pattison, P., & Lazega, E. (2013). Exponential random graph models for multilevel networks. Social Networks, 35(1), 96–115. doi:10.1016/j.socnet.2013.01.004
Wang, P., Robins, G., Koskinen, J., & Lusher, D. (2016). Duality of social selection and social influence. Paper presented at the XXXVI Sunbelt Social Network Analysis Conference, Newport Beach, CA.

Wang, P., Robins, G., Pattison, P., & Koskinen, J. (2016). MPNet: A program for the simulation and estimation of exponential random graph models. Melbourne, Australia: Swinburne University of Technology.
Wang, P., Robins, G., Pattison, P., & Lazega, E. (2016). Social selection models for multilevel networks. Social Networks, 44, 346–362.
Wasserman, S., & Pattison, P. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika, 61(3), 401–425.
Zappa, P., & Lomi, A. (2016). Knowledge sharing in organizations: A multilevel network analysis (Vol. 12) Role Sets and Division of Work at Two Levels of Collective Agency: The Case of Blockmodeling a Multilevel (Inter-individual and Inter-organizational) Network (pp. 333–353).

Chapter 14

Modeling Network Dynamics

David R. Schaefer and Christopher Steven Marcum

One of the great lessons from the last half century of research on social networks is that relationships are constantly in flux. While much social network analysis focuses on static relationships between actors, there is also a rich tradition of work extending back to foundational studies in network science focused on the notion that network change is an indelible aspect of social life for human and nonhuman actors alike (e.g., Bott, 1957; Heider, 1946; Newcomb, 1961; Rapoport, 1949; Sampson, 1969). Today, social network researchers benefit from this history in that a host of methods to collect and analyze such dynamic network data have been developed. Among them, the methods based on stochastic process theory have given rise to a paradigm where inferences and predictions can be made on the mechanisms that drive changes in social structure. Over 40 years ago, Holland and Leinhardt (1977) published a seminal paper detailing a straightforward approach for social scientists to model change in small networks using stochastic process theory based on Bernoulli graph Markov chains, which would ultimately lead to statistical approaches to network dynamics. Then, over 20 years ago, Suitor, Wellman, and Morgan (1997) edited a special issue of Social Networks that sought to raise this issue to the forefront of the field by featuring papers on the theoretical underpinnings of how, why, and when social networks change. Since that issue, a quickly developing literature on methodology for collecting and analyzing network dynamics has emerged, each method addressing one or more of the many challenges researchers face in modeling network dynamics. In this chapter, we review three contemporary approaches commonly employed to model network dynamics. We begin by defining network dynamics as the process by which features of networks change over time. These features consist of the basic units of networks: namely, vertices, edges, and their respective covariates. 
Network dynamic modeling seeks to understand this process of change, on the one hand, and to predict future network states, on the other. We pay particular attention to the temporal resolution under study (i.e., the order of the time scale), as this shapes which methods are appropriate for a given research question. Throughout, our emphasis will be on network dynamics involving state changes in edges, and to a lesser extent actors and covariates. We organize the balance of this chapter loosely

around the dichotomy of temporal resolutions commonly found in network science. We discuss one approach for event-level network data (focused on the moments at which relations between entities transpire) and two approaches for state-level network data (i.e., relations with some degree of persistence, measured in cross-section). To further illuminate these approaches, we draw from an empirical example using historical network data in the public domain. We conclude with a discussion of open questions and challenges for the future of network dynamic methods and research development.

Conceptualizing Network Dynamics

A network is defined as a set of actors (represented as vertices, sites, or nodes in a graph) and the relations between them (represented by edges, arcs, or ties) (Wasserman & Faust, 1994). Both actors and relations may be associated with corresponding characteristics or attributes. Two types of network-related changes have received the most attention. First is change in nodal attributes driven, at least in part, by their pattern of connections to other nodes, which is termed dynamics on networks. Examples include influence on behaviors or attitudes (Friedkin, 1998) and diffusion (e.g., disease; Morris & Kretzschmar, 1995). This is not our immediate focus, though we point the reader toward a rich discussion of such approaches in Porter and Gleeson (2016). Second, and the focus of this chapter, is change in the state of ties among a set of nodes, which is termed network dynamics. Examples include a sequence of email exchanges in a company, the formation of friendships in schools, or the waxing and waning of conflict in families. Our focus on network dynamics emphasizes how the relationships between actors change across time. While some of the methods we review can accommodate fluctuating actor sets, we make the simplifying assumption that the set of actors is constant. Of course, the network and nodes may change in response to one another (i.e., network-behavior coevolution; Steglich, Snijders, & Pearson, 2010). Some of the approaches we review are capable of modeling such endogenous dynamics; however, we restrict our focus to network change aspects, treating all other effects as covariates.

Given our focus on relational change, it is helpful to begin with an overview of how relations can be classified. Borgatti and colleagues (2009) distinguished four types of relations: similarities, “social relations,” interactions, and flows. Similarities refer to dyadic measures of whether actors have common attributes (e.g., whether two people are the same gender or work in the same office). Similarity relations can change, but only as a consequence of change in the node attribute that is the basis of the similarity, and, as such, they are not an outcome of interest in this chapter. That said, similarity is one of the primary drivers of other kinds of relations (e.g., homophily, spatial stratification, etc.; McPherson, Smith-Lovin, & Cook, 2001; Festinger, Back, & Schachter, 1950) and is an important predictor of tie formation and dissolution that we consider. “Social relations” include affective or cognitive states (e.g., liking someone), as well as roles that define a relationship (e.g., coworkers, spouses). These types of relations have the quality of being more or less enduring over time. That is, once formed, these relationships are relatively durable in nature. While these kinds of relations do change (a fight may lead friends to stop liking one another and the friendship to end), such changes mark a shift from one relationship state to another.

Contrast the durability of ties in such social relations to the relatively ephemeral nature of “interactions” and “flows,” which represent social actions taken by one actor and directed toward another (or the movement of something from one actor to another), respectively. These kinds of relations can be measured in two ways. They can be recorded at the level of the discrete transmission (e.g., an event such as a text message) or summarized within a dyad (e.g., “i talks to j”). When interactions and flows are captured at the level of the discrete transmission, they can be conceptualized as events. Network events are typically of short duration, if not instantaneous. By contrast, when interactions and flows are summarized for a dyad, they come to resemble relationship states. Admittedly, underlying many state representations of social relations is a series of continuous events. Friendships, for instance, develop and are maintained through a series of micro-interactions (Fischer, 1982), as are relations of animosity (Tita & Radil, 2011) and ambiguity (Uchino, 2004). What may appear to be a stable friendship in the cross-section might therefore coincide with a series of ongoing social exchanges that help sustain the relationship. Thus, the same relationship can be understood and represented in different ways, with implications for how network change is modeled.

This distinction between relations in the form of either states, with some level of persistence, or instantaneous events has implications for the nature of interdependence between relations and the subsequent modeling approach. Butts and Marcum (2017) distinguish these by whether autocorrelation between actors is sequential or simultaneous. Events are instantaneous actions occurring periodically over time within dyads. As such, ties occur in sequence, and much of their autocorrelation is across time.
With the relational event framework, we seek to model series of observations of those micro-interactions. By contrast, the persistent nature of states means that relations overlap in time (i.e., exist simultaneously), making them autocorrelated at a given time point. Both stochastic actor-oriented and exponential random graph frameworks seek to model state changes between discrete slices of the aggregation of those ties over time.
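The relationship between the two representations can be illustrated with a short sketch. It is not a procedure taken from the chapter: the function name `aggregate_events`, the event tuples, and the window cut points are all hypothetical, and a dyad is simply coded as tied in a window if at least one event occurred in it.

```python
def aggregate_events(events, n_actors, cut_points):
    """Collapse (time, sender, receiver) events into one binary
    adjacency matrix per time window.

    cut_points = [t0, t1, ..., tk] defines the half-open windows
    [t0, t1), [t1, t2), ..., [t(k-1), tk).
    """
    n_windows = len(cut_points) - 1
    panels = [
        [[0] * n_actors for _ in range(n_actors)]
        for _ in range(n_windows)
    ]
    for time, sender, receiver in events:
        for w in range(n_windows):
            if cut_points[w] <= time < cut_points[w + 1]:
                # Assumption: any event in the window yields a tie state.
                panels[w][sender][receiver] = 1
                break
    return panels


# Hypothetical event history: (time, sender, receiver)
events = [(0.5, 0, 1), (1.2, 1, 0), (1.9, 0, 1), (3.4, 2, 0), (3.8, 0, 1)]
panels = aggregate_events(events, n_actors=3, cut_points=[0, 2, 4])
for wave, panel in enumerate(panels, start=1):
    print("wave", wave, panel)
```

The event list retains ordering and timing, which is the input a relational event model works from, whereas the panels retain only which ties exist in each window, which is the form of data that state-based models take as input.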

Network Change Processes

Although the manifestation of network changes differs between networks of events versus states, several common relational patterns or interdependencies are present in both types of dynamic networks. Thus, we provide a brief overview of common relational processes. This classification is only general, and more complicated interactive effects are possible, either within or across categories.

Nodal Effects

Actors characterized by particular attributes, or higher or lower values on an attribute, may be more or less likely to form or dissolve ties over time. Nodal effects can be distinguished for the sender and receiver of a tie, representing a distinction between the gregariousness and attractiveness of actors based on their attributes, respectively.

Dyadic Effects

Dyadic-level processes take several forms. (1) Attribute based. Actors’ attributes may interact to create a dyad-level effect on tie likelihood. A frequent example is homophily (McPherson et al., 2001), where the likelihood of a tie is greater in dyads characterized by nodes that are similar on an attribute. Much less common or studied is heterophily, or elevated tie likelihood between dissimilar nodes (though gender heterophily in romantic relations is a notable exception). Tendencies for actors with a particular attribute to select alters based on the alter’s value on a different attribute are also possible (e.g., youth with poor social bonds befriending substance-using peers; Schaefer, 2018). (2) Propinquity based. Some nodes are closer together than others in space (e.g., geography) and time, which elevates their likelihood of coming into contact, interacting, and sharing a tie. Propinquity can also be achieved through common involvement in foci, such as activities or organizations (Feld, 1981). Propinquity is often measured at the dyadic level, though other specifications are possible (e.g., affiliations can be treated as a two-mode network). (3) Entrainment. One type of tie (possibly formed earlier in a relationship) may increase the likelihood of another type of tie (Lusher, Koskinen, & Robins, 2013). For instance, a relationship characterized by trust can beget friendship. In contrast to attribute-based effects and propinquity, the predictor is strictly a relational attribute.

Endogenous Structure

Much of the attention in social network modeling has focused on the various forms of interdependence between nodes, most of which are described as resulting from tendencies for certain types of structures to transform into others. Common forms include the following. (1) Degree based. Some nodes may have more incoming/outgoing ties. In the longitudinal case, nodes with higher degree may tend to send/receive more ties in the future (e.g., sociality and popularity). (2) Preferential attachment. Actors with high/low degree may have an elevated likelihood of connecting to alters with a similar high/low degree (i.e., an interaction between the degree of two nodes). (3) Reciprocity. This is a dyadic-level process whereby an outgoing tie (from i to j) is matched by an incoming tie (from j to i) in the future. (4) Triadic closure. These are sets of three nodes in which a tie exists within each dyad (though not necessarily in both directions within a dyad). Triadic closure can take the form of transitivity, where the presence of a two-path from i to k via j (i.e., i → j and j → k are present) coincides with a direct tie from i to k (i → k), or of a cycle, where a two-path from i to k via j coincides with a direct tie from k to i (k → i).
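The reciprocity and triadic closure configurations above can be made concrete by counting them in a small directed network. The sketch below is illustrative only: the adjacency matrix and the function names `reciprocated_dyads` and `triad_counts` are hypothetical, and `adj[i][j] == 1` is assumed to denote a tie i → j.

```python
def reciprocated_dyads(adj):
    """Number of dyads {i, j} in which both i -> j and j -> i are present."""
    n = len(adj)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if adj[i][j] and adj[j][i]
    )


def triad_counts(adj):
    """Count two-paths i -> j -> k that are closed by i -> k (transitive)
    or by k -> i (cyclic)."""
    n = len(adj)
    transitive = 0
    cyclic = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue  # the three nodes must be distinct
                if adj[i][j] and adj[j][k]:
                    transitive += adj[i][k]
                    cyclic += adj[k][i]
    # Each three-cycle is found once from each of its three starting nodes.
    return transitive, cyclic // 3


# Hypothetical network with ties 0->1, 0->2, 1->0, 1->2, 2->3, 3->1
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
print(reciprocated_dyads(adj))  # 1
print(triad_counts(adj))        # (2, 1)
```

In a longitudinal model these counts would be compared across time points (for example, whether an open two-path at one observation is closed at the next) rather than tallied in a single cross-section.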

Modeling Network Dynamics

The models we cover in this chapter are statistical, that is, stochastic and estimable, in nature. We would be remiss if we failed to point out that there are alternative approaches to studying network dynamics, some even in nonstatistical frameworks. These include a host of physical models of interaction (Barzel & Barabási, 2013), epidemiological approaches (e.g., diffusion), qualitative approaches (Jack, 2005), and deterministic methods. While each of these alternatives offers its own contribution to the study of network dynamics, we exclude them from our review here for brevity and relevance (some models are appropriate only in special cases). In this chapter we assume that ties are directed (i.e., an i → j tie is distinct from a j → i tie) and that relations are dichotomous (either present or absent).

We discuss the capacity for each approach to handle other types of relations. We also assume “whole” network data, where relations between all nodes within a specified context are measured, though studies with other types of samples, such as ego-centered network data, have adopted the methods we review here (Krivitsky & Morris, 2015; Marcum & Butts, 2015; Smith, 2012, 2015; Lubbers et al., 2010). As with any method, whether these approaches are the ideal choice depends on one’s research question and data. Lastly, we focus on one-mode networks, but each of these approaches is suitable for two-mode networks.

Holland and Leinhardt (1977) established the first approach to studying network dynamics with a firm foundation in stochastic process theory. Their method was revolutionary in the sense that it provided a model of social structure that could be employed to understand network change with longitudinal network data at the same time that it could be used to understand the processes that gave rise to a single cross-sectional network observation (using the stationary distribution under a given model, for instance). One underappreciated aspect of their seminal model was the assumption that the basis of change in social networks occurs not at the level of higher-order structure (triads, four-cycles, etc.) but rather at the level of what they characterized as “choice” (that is, the level of the edges incident between a dyad), and that those choices (as determined by the actors sending and receiving ties on those dyads) were made conditionally independently and nonsimultaneously (i.e., in continuous time). Thus, dynamic network observations under their model could take the form of a list of events transpiring between actors over a fixed period of time, a set of cross-sections of network states, or a panel of longitudinal network states.
Taken together, their model advanced a perspective that network dynamic models are as much about characterizing change as they are about describing processes, that the process by which change in networks occurs is stochastic in nature, and that the basic units of change in networks are the edges themselves. It is from this perspective that we orient the balance of the present chapter. Different aspects of this perspective are foundational for the three approaches we review in the following sections. In the first case, Butts’s (2008) relational event model for social action focuses on ties as the units of change, adopting the sender-receiver and nonsimultaneity perspective employed by Holland and Leinhardt (1977). In the second, Snijders’s (1996) stochastic actor-oriented model examines units of change in the form of actors’ choices of whether or not to add or remove ties to others. In the third and final case we review here, the exponential random graph framework (Robins et al., 2007; Krivitsky & Handcock, 2014) treats state change at the level of the dyad as the unit of analysis. Thus, the three statistical models for the analysis of network dynamics of interest here constitute a hierarchy with respect to their temporal resolution and units of analysis. Table 14.1 provides a brief summary of these approaches. The remainder of this chapter offers a more detailed discussion of each, roughly organized with respect to this hierarchy.

The Relational Event Framework

We begin our review of network dynamic models with a statistical approach for the least expansive temporal unit: that is, the moment when interactions transpire between actors. Traditionally, social networks (even dynamic social networks) have been thought of as more or less fixed entities over short periods of time, with relationships changing on presbyopic scales. While this may be true for many types of social relationships (e.g., friendships

Table 14.1  Summary of the Three Dynamic Network Models Reviewed in this Chapter

Types of questions
REM: Suitable for research questions involving the sequences, patterns, timing, and likelihood of discrete social actions. Examples: 1. What’s the expected waiting time for a tie sent in the past to be reciprocated in the future? 2. Do patterns of relational dynamics change after exogenous shocks to the network?
TERGM: Research questions about the effect of actor, tie, and network structural covariates on the formation, duration, and dissolution of ties are all appropriate. Examples: 1. Are the structural processes involved in tie formation different from those involved in tie maintenance? 2. How do actor attributes affect relational stability?
SAOM: Can address the same types of questions as TERGMs. Able to simultaneously model multiple types of relations. In addition, questions about how network position and ties to others affect nodal attributes are suitable. Examples: 1. Does one type of tie lead to another, or vice versa (e.g., does friendship beget status, and/or status beget friendship)? 2. How much homophily on an attribute is due to actors selecting similar others vs. actors influencing one another?

Nature of data
REM: Time-ordered series of discrete, instantaneous, social actions (i.e., relational events) directed from sending entities to receiving entities.
TERGM: Whole network data consisting of longitudinal panels or cross-sectional observations of network states.
SAOM: Whole network data consisting of longitudinal panels or cross-sectional observations of network states.

Key references
REM: Butts (2008); Marcum and Butts (2015); Butts and Marcum (2017)
TERGM: Krivitsky and Handcock (2014); Leifeld et al. (2017); Robins and Pattison (2001)
SAOM: Snijders (2001); Snijders, van de Bunt, and Steglich (2010); Steglich, Snijders, and Pearson (2010)

Key software
REM: R: relevent, informR
TERGM: R: statnet, xergm; PNet
SAOM: R: RSiena

Additional summaries comparing advantages and disadvantages of these models (and others) and the methods used to estimate them can be found in Silk et al. (2017).

and confidants) and indeed for many types of social interactions (e.g., coauthorship, support provision), there are myriad micro-behaviors in the form of directed interactions that occur between members of a network in real time as a series of discrete events. For example, utterances in conversations, acts of bullying, and exchanges in financial transactions all consist of micro-level interactions occurring effectively instantaneously between actors in a network. When research questions probe dynamic network processes unfolding at the event level, it is appropriate to adopt a model that embraces that unit of analysis. Butts (2008) introduced the relational event framework to advance a perspective for thinking about, analyzing, and learning from micro-level dynamic network data. He defines

a relational event as a social action emitted by one entity and directed toward another in its environment. Thus, the nature of interaction appropriate for study under the relational event framework is necessarily directed. A series of such social actions, ordered as they transpire in time, constitutes an event history. The goal of the relational event framework for social action is to provide a unified theoretical and statistical approach to learn from event histories in a network with an emphasis on how models of social behavioral dynamics unfold in time. With this framework, we might ask questions such as how the past history of interactions unfolding between a set of actors affects future behavior. The basic approach is relatively straightforward. First, the framework posits that each relational event consists of a sender of action, a receiver of action, an action type, and the time (recorded as either an event order or timestamp) that the action transpired. The events are assumed to transpire nonsimultaneously in the event history and the interevent waiting times are assumed to be governed by a latent piecewise constant hazard model (which is akin to the proportional hazards family of models). As Brandes, Lerner, and Snijders (2009) point out, the dependent variable under this framework is the next event and the goal is to model how interactions captured by the event history (and optionally, dynamic or static sender/receiver/event covariates) predict what happens next (Marcum & Butts, 2015). The support of each next event, that is, the set of possible actions assuming no endogenous constraints, is constructed from the sets of senders, receivers, and action types (when multiple types of events can transpire).
When interest is in single relations (i.e., one type of action), the number of events that could possibly occur at any time, assuming that all actors are available for both sending and receiving ties at all times in the observation period, is simply N * (N – 1) (otherwise it is S * R * A, for the total number of senders, receivers, and action types, respectively). Thus, at every moment in time, there are effectively N * (N – 1) possible social actions in this framework, all other things constant. Moreover, the model assumes the first event initiates the relational event process (i.e., it is treated as exogenously null), which continues until the last event is observed. Several tutorials have been published (Butts & Marcum, 2017), including the freely available documentation of the relevent and informR packages for R. Those packages are currently the only widely available software for constructing sufficient statistics and modeling relational event data. Models are specified through the use of sufficient statistics that capture covariate and endogenous dynamic effects of interest. For instance, one may wish to capture the overall tendency for the receipt of past actions to affect an actor’s future rate of sending ties to others; a statistic equal to the normalized in-degree up to a particular moment in time as a predictor of the next event is sufficient for that process. For such an effect the model counts the number of times an actor has been a receiver of ties up to time t (normalizes it) and uses that to predict sending a tie at t + 1. For example, in dominance relations, if being victimized increases future dominance behavior, then the model parameter associated with that sufficient statistic will be positive, indicating that the hazard for sending domination ties increases as a function of past victimization. Or, one may wish to know whether pairs of actors are involved in sequential exchanges that can be characterized by immediate reciprocation. 
Returning to the dominance example, the sufficient statistic would be whether i dominated j at time t, which is used to predict whether j dominates i at time t + 1 (or not). Thus, during estimation, each sufficient statistic is evaluated against what could possibly happen next (i.e., each possible event) and modifies the hazard of the next event according to whether that statistic increases or decreases its relative propensity to

transpire. In practice, the relevent package parameterizes these effects as log rate multipliers on the hazard of the next event. Previously published papers provide empirical examples of these and other types of sufficient statistics and extend the model in several ways. For instance, in the original paper, Butts (2008) derived two versions of the model likelihood (one for temporally ordinal and one for temporally exact timing information) and used the framework to model the conversational dynamics of radio communications between police responding to the 9/11 terrorist attacks on the World Trade Center. In a separate but closely related model, Brandes et al. (2009) modeled the valence (positive or negative) of interactions unfolding between countries in political event networks. In another separate but related model, de Nooy (2011) introduced temporal hierarchy into the relational event process. DuBois, Butts, McFarland, and Smyth (2013) further extended work on hierarchical models to incorporate hierarchical relational event data structures and modeled educational discussions occurring in multiple classrooms, and developed stochastic blockmodels of online communication (DuBois, Butts, & Smyth, 2013). Marcum and Butts (2015) modified the likelihood to incorporate exogenous events and demonstrated a generalized application of the model for fitting to ego-centered event histories with many types of events. This paper also illustrated the use of dynamic support constraints for the action set, which are useful when not every action is possible at all times. Finally, actor-oriented approaches to relational event models are in development for one-mode (Stadtfeld, Hollway, & Block, 2017) and two-mode data (Stadtfeld & Geyer-Schulz, 2011).
These advances are notable because their approach relates both to the ego-centered approach described in Marcum and Butts (2015) and to the stochastic actor-oriented models we discuss in the next section.
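The logic of evaluating each sufficient statistic against every event that could happen next can be sketched in a few lines. The following is a toy illustration under stated assumptions, not the relevent package's implementation or interface. It assumes a single action type and ordinal timing, so that the probability of a candidate event being next is its rate divided by the sum of rates over the support. The function names, the two statistics, and the parameter values in `theta` are all hypothetical.

```python
import math


def support(n_actors):
    """All N * (N - 1) directed sender-receiver pairs (single action type)."""
    return [(s, r) for s in range(n_actors) for r in range(n_actors) if s != r]


def statistics(history, sender, receiver):
    """Two illustrative sufficient statistics for a candidate event."""
    n_past = len(history)
    # Normalized in-degree of the sender: share of past events it received.
    if n_past:
        in_share = sum(1 for _, r in history if r == sender) / n_past
    else:
        in_share = 0.0
    # Immediate reciprocation: the most recent event was receiver -> sender.
    recip = 1.0 if history and history[-1] == (receiver, sender) else 0.0
    return [in_share, recip]


def next_event_probabilities(history, n_actors, theta):
    """Log-linear rates, normalized over the support of the next event."""
    rates = {}
    for s, r in support(n_actors):
        stats = statistics(history, s, r)
        # Each parameter acts as a log rate multiplier on the event hazard.
        rates[(s, r)] = math.exp(sum(t * x for t, x in zip(theta, stats)))
    total = sum(rates.values())
    return {event: rate / total for event, rate in rates.items()}


# Hypothetical history of (sender, receiver) events and assumed parameters
history = [(0, 1), (2, 1)]
probs = next_event_probabilities(history, n_actors=3, theta=[0.5, 1.0])
print(len(support(3)))               # 6
print(max(probs, key=probs.get))     # (1, 2)
```

With both parameters positive, the most likely next event is actor 1 (who received both past events) replying to actor 2 (who sent the most recent one). Estimation would reverse this logic, searching for the parameter values under which the observed sequence of events is most probable.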

Stochastic Actor-Oriented Framework

Snijders (1996, 2001, 2005) introduced the stochastic actor-oriented model (SAOM) as a means to evaluate change across two or more network observations in which ties are considered to be states. Much of the research adopting this model has focused on friendships, typically among schoolchildren (Veenstra et al., 2013). Other research has examined the recurrent flow of advice (Snijders, Lomi, & Torló, 2013), gossip (Ellwardt, Steglich, & Wittek, 2012), and power attributions (Labun, Wittek, & Steglich, 2016). Moving beyond humans as actors, researchers have investigated, for instance, interorganizational cooperation in estuary management (Berardo & Scholz, 2010) and trade flows between nations (Prell & Feng, 2016). For each of these networks, the “state” relations are measured at a given point in time (e.g., a survey item asking “who are your five closest friends?”) or aggregated within a window of time (e.g., volume of trade in a given year). With this approach, the question is what processes are responsible for (or consistent with) the observed changes in the network across observation waves. Although the SAOM considers relational states observed at two or more cross-sections in time, it assumes that ties change in continuous time through a series of micro-steps interspersed between observations. Each micro-step consists of no more than one change in the network. Key to understanding and interpreting an SAOM is that its estimation is achieved through an actor-oriented (i.e., agent-based) simulation. Each actor is in control of its outgoing ties and makes decisions about whether and how to change those ties. The

262   David R. Schaefer and Christopher Steven Marcum decision process is specified through an evaluation function that takes the form of a series of effects, such as homophily or reciprocity. The goal of estimation is to obtain a reasonable “weight” to give each effect (i.e., parameter estimate) to represent observed network change. In practice, the model begins by calculating the values of several target statistics based on the effects included in the model. Effects are based on the network, actor characteristics, and possible dyadic attributes. For instance, a model with a reciprocity effect would have a reciprocity target statistic, calculated as the sum across all actors of their number of reciprocated ties. The model conditions on the time 1 observation, and thus target statistics are calculated for time 2 and beyond. Estimation treats the first observation as given and simulates future network change (incidentally, using a latent relational event process). Changes are prompted from the perspective of the actor. Each micro-step begins with the random selection of one actor. Typically, actors have an equal chance of being selected, although differential opportunities across actors is allowed (specified through the rate function). The chosen actor has the opportunity to change the state of the tie in one of its dyads—by either adding a tie, dissolving a tie, or making no change. Which change is made is based on the current state of the network and parameter estimates as summarized through the evaluation function. Actors use the evaluation function to calculate the “value” of every possible tie, then make the change that produces the highest value of the evaluation function (though with some error introduced to ensure stochasticity). Once a change is made, the algorithm repeats—an actor is randomly chosen and the network is evaluated from its perspective. This process repeats a large number of times. 
Periodically the algorithm compares the target statistics to statistics generated through the simulated network change and adjusts parameter estimates. For example, if the current set of parameter estimates was leading to too many reciprocated ties, then the reciprocity parameter would be adjusted downward. This adjustment results in actors giving reciprocity less weight when considering which ties to add, keep, or dissolve. The model has converged once it can recreate the set of target statistics. The model considers two types of changes: whether a nonexistent tie forms (vs. not), and whether an existing tie dissolves (vs. persisting). The model constrains effects to be equal for formation and maintenance by default, though this constraint can be tested for suitability and relaxed. Most SAOM applications to date maintain this constraint, and thus effects typically are interpreted as their effect on actors’ tendencies to form and/or maintain ties versus failure to form and/or dissolve ties. Several good reviews of SAOMs are available (Snijders, 2001; Snijders, van de Bunt, & Steglich, 2010). The model has stimulated an active and ongoing research agenda that has led to several extensions of the network evolution model. These include modeling the coevolution of networks and nodal attributes (Steglich et al., 2010; Niezink & Snijders, 2017), such as through peer influence (Veenstra et al., 2013), but can also be extended to questions of diffusion (Greenan, 2015) and separating influence on behavior adoption from behavior cessation (Haas & Schaefer,  2014). Other developments include weighted ties (Elmer et al., 2017) and nondirected ties (Ferligoj et al., 2015), though the latter are complicated by the actor-oriented basis of the model, and multilevel models (Hollway et al.,  2016). Additionally, applications of SAOMs are featured in special issues of Social Networks (Vol. 32 [1]) and the Journal of Research on Adolescence (Vol. 23 [3]). 
SAOMs are estimable using the RSiena package within R, and indeed the manual and webpage for RSiena are excellent resources (Ripley et al., 2020).
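The micro-step logic described for the SAOM can be sketched in Python. This is a minimal illustration under stated assumptions, not RSiena's implementation: the function names are hypothetical, the evaluation function contains only out-degree and reciprocity effects, all actors are equally likely to be selected, and the "error" in the actor's choice is taken to be a multinomial logit over the candidate changes.

```python
import math
import random

def evaluation(net, ego, beta_outdeg, beta_recip):
    """Ego's evaluation function: weighted out-degree plus weighted reciprocity."""
    n = len(net)
    outdeg = sum(net[ego][j] for j in range(n) if j != ego)
    recip = sum(net[ego][j] * net[j][ego] for j in range(n) if j != ego)
    return beta_outdeg * outdeg + beta_recip * recip

def micro_step(net, beta_outdeg, beta_recip, rng):
    """One micro-step: a randomly chosen actor changes at most one outgoing tie."""
    n = len(net)
    ego = rng.randrange(n)  # equal selection chances (assumed rate function)
    options = [None] + [j for j in range(n) if j != ego]  # None = no change
    values = []
    for alter in options:
        if alter is not None:
            net[ego][alter] = 1 - net[ego][alter]  # tentatively toggle the tie
        values.append(evaluation(net, ego, beta_outdeg, beta_recip))
        if alter is not None:
            net[ego][alter] = 1 - net[ego][alter]  # undo the tentative toggle
    # Multinomial logit choice: higher-valued changes are more likely,
    # but any option can be chosen.
    top = max(values)
    weights = [math.exp(v - top) for v in values]
    choice = rng.choices(options, weights=weights, k=1)[0]
    if choice is not None:
        net[ego][choice] = 1 - net[ego][choice]
    return ego, choice

def simulate(net, n_steps, beta_outdeg, beta_recip, seed=0):
    """Run n_steps micro-steps on a copy of the starting network."""
    rng = random.Random(seed)
    net = [row[:] for row in net]
    for _ in range(n_steps):
        micro_step(net, beta_outdeg, beta_recip, rng)
    return net
```

An estimation routine would wrap such a simulator in a loop that compares simulated statistics with the target statistics and adjusts the parameters; that step is not shown here.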


The Exponential Random Graph Framework Our final approach builds upon the exponential random graph model (ERGM) family (Wasserman & Pattison, 1996), which is a very general approach to the statistical modeling of networks (Robins et al.,  2007). ERGMs were developed to model a single network as observed at one point in time. ERGMs assume the given network is only one from a distribution of possible networks. The goal in model estimation is to derive parameter values and evaluate their ability to reproduce the observed network (defined by a set of sufficient statistics). Although the ERGM models at the network level, under the assumption the network is at equilibrium (i.e., Markov dependence), the model can be interpreted as providing the probability of a tie conditional on the rest of the network. Much has been written about ERGMs and applications are widespread (see Lusher et al., 2013). The dependencies modeled by an ERGM often imply a temporal process, but this is not explicit in their specification per se (Robins & Pattison,  2001). Network change is made explicit in several recent extensions to the model (discussed later). Like the SAOM, these extensions assume networks are observed at two or more discrete time points. In applying the ERGM to longitudinal data, we follow Snijders (2011) and refer to this general class of model as dynamic ERGMs. One such extension is the temporal ERGM (TERGM) introduced by Robins and Pattison (2001) with subsequent treatments in Hanneke, Fu, and Xing (2010) and Desmarais and Cranmer (2012). In general, these approaches distinguish dependencies as either within or across time (Robins & Pattison, 2001). For instance, the simplest case TERGM pools time slices; hence, all dependency is within time and effects are represented by simple change statistics in the same form as the cross-sectional ERGM (see Czarna et al., 2016). Such a model makes the questionable assumption that time slices are independent of one another. 
Cross-time dependency can be introduced in the form of one or more lagged statistics (i.e., the effect of the network at t – 1 on the network at time t). A good start is to introduce the relations in the immediately preceding network as a dyadic covariate (edge-wise). This is often referred to as a stability parameter and captures the tendency for ties to persist over time. Additional within-time effects then represent the likelihood of observing particular configurations of ties (e.g., mutual ties) net of the preceding observation. With this specification temporal dependence is assumed to exist only at the dyad level, while dependence between dyads is assumed to occur within time. An alternative TERGM specification takes additional cross-time dependencies into account. For instance, instead of modeling whether mutual dyads are likely to appear over time (as noted earlier), one can specify a cross-time effect to test whether asymmetric ties at time t tend to become mutual ties at t + 1 (Desmarais & Cranmer, 2012; Hanneke et al., 2010; Robins & Pattison, 2001). Additional cross-time dependencies could be specified to test for higher-order processes, such as those related to triad closure. Such a model makes the assumption that dependence between ties exists across time, though not necessarily within time (i.e., ties are conditionally independent at time t given the network at time t – 1).

The TERGM has been criticized on two counts. First, its parameters can be challenging to interpret. For instance, models with a stability term require that other effects be interpreted as the tendency toward a particular configuration net of network structure at the preceding time point. This complicates interpretation of the strength or tendency toward particular patterns. For instance, a within-time mutuality parameter would be interpreted as the tendency for mutual ties to appear at t + 1 conditioned on the rest of the network and any tendencies toward mutuality inherent in the preceding time point (Block et al., 2017). Second, as Krivitsky and Handcock (2014) point out, TERGMs risk conflating tie formation with tie persistence unless care is taken to distinguish the two (e.g., Robins & Pattison, 2001). Failure to parameterize the dynamic process in this way implies that formation and dissolution are effectively complements of one another, when in reality these dynamics may be driven by different processes.

Tie creation is distinguished from tie stability in the separable temporal ERGM (STERGM; Krivitsky & Handcock, 2014). Like the SAOM, the STERGM is appropriate when ties are states and, hence, their initiation and duration can be modeled. However, unlike the SAOM, STERGMs make a default assumption that the processes of tie formation and dissolution are distinct (and, aptly, separable). Thus, a STERGM is really two models: one to predict which ties form by time t, conditioned on their absence at t – 1, and the other to predict which ties dissolve (via their longevity) by time t, conditioned on their presence at t – 1. The two submodels are specified independently and estimated sequentially. Thus, for instance, the edges term in a STERGM would be interpreted in the formation submodel as the log odds of a null tie gaining an edge and, in the dissolution model, as the log odds of an edge persisting to the next time point. Krivitsky and Handcock (2014) argue that this specification has the advantage of offering clarity to the network processes responsible for tie formation distinct from their dissolution (and indeed from their duration). The STERGM and the TERGM with across-time dependence are discrete-time models that assume conditional independence in the interval between observations (Koskinen, Caimo, & Lomi, 2015).
The TERGM does not allow newly formed or dissolved ties from t to t + 1 to inform the dependence structure at t + 1. Within the STERGM, creation is conditionally independent of dissolution, and hence “separable from each other within a time step” (Krivitsky & Handcock, 2014). It is possible for a TERGM to model both within- and cross-time dependencies. However, specifying the full set of across- and within-time dependencies for a given structure can require many parameters to represent the various configurations of ties across time. For instance, estimating density requires two parameters: one representing new ties at t + 1 and a second for ties that persist over time. Reciprocity requires 5 parameters, while transitivity requires 26.¹ This approach better represents the complex interdependencies within and across time, though at the expense of interpretability (see Robins & Pattison, 2001).

An alternative ERGM approach that assumes continuous time is the longitudinal ERGM (LERGM) (Snijders & Koskinen, 2013). Here, ties are assumed to change one at a time between adjacent network observations, making model specification similar to the SAOM (Snijders, 2017). Modeled effects can thus be interpreted as the probability of a tie change dependent on the rest of the network in a given state.

Applications using dynamic ERGMs are fewer in number than those using SAOMs and REMs, in part due to their relatively nascent stage of development. Examples of TERGMs exist with cross-time dependence limited to tie stability (McFarland et al., 2014) and more extensive models with both cross- and within-time dependence (Papachristos, Hureau, & Braga, 2013; Schaefer et al., 2011). Mousavi and Gu (2015) used a STERGM to analyze discrete network changes in time aggregations of online interactions between politicians in the US House of Representatives as a function of party and demographic homophily. STERGMs have also been used to evaluate the efficacy of proposed HIV-mitigating health interventions in a simulation of men who have sex with men (Delaney et al., 2015) and to examine postdiagnosis behavior among the same population (Khanna et al., 2014). Koskinen et al. (2015) used a LERGM to model dynamics of investment relations among foreign state actors while simultaneously controlling for initial network conditions arising as a function of spatial propinquity.

One advantage that the dynamic ERGM approach has over both REM and SAOM is that it is equally suited for directed and nondirected ties without much respecification effort. Additionally, as ERGM methodological development is an active area, newly emerging extensions may be adapted to dynamic networks. For instance, there is work on ERGMs for valued networks (Krivitsky, 2012) and hierarchical data (Wang et al., 2013) that can be incorporated into dynamic ERGMs. ERGMs are estimable within the statnet package in R, which also has a comprehensive overview (Hunter et al., 2008b), and PNet (Wang, Robins, & Pattison, 2009). As TERGMs and STERGMs are longitudinal extensions of the ERGM, they can be estimated using the same software, although user-friendly packages have also been developed (Krivitsky & Goodreau, 2016; Leifeld, Cranmer, & Desmarais, 2017).
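The separation of tie formation from tie persistence described for the STERGM can be illustrated by sorting ordered dyads according to their state at two time points. The Python sketch below is purely descriptive and does not use the statnet interface: the function names and the three-actor example networks are hypothetical. Formation is assessed only among dyads that were empty at t – 1, and persistence only among dyads that held a tie at t – 1.

```python
def tie_transitions(y_prev, y_curr):
    """Classify every ordered dyad by its state at t - 1 and t."""
    n = len(y_prev)
    counts = {"formed": 0, "persisted": 0, "dissolved": 0, "stayed_empty": 0}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-ties
            before, after = y_prev[i][j], y_curr[i][j]
            if before == 0 and after == 1:
                counts["formed"] += 1
            elif before == 1 and after == 1:
                counts["persisted"] += 1
            elif before == 1 and after == 0:
                counts["dissolved"] += 1
            else:
                counts["stayed_empty"] += 1
    return counts

def formation_rate(counts):
    """Share of previously empty dyads that gained a tie."""
    return counts["formed"] / (counts["formed"] + counts["stayed_empty"])

def persistence_rate(counts):
    """Share of previously existing ties that survived."""
    return counts["persisted"] / (counts["persisted"] + counts["dissolved"])

# Hypothetical three-actor networks at t - 1 and t.
y_prev = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
y_curr = [[0, 1, 1], [0, 0, 0], [1, 0, 0]]
counts = tie_transitions(y_prev, y_curr)
```

In a model containing only an edges term in each submodel, the log odds implied by these two rates are what the formation and dissolution edges parameters would be expected to capture.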

Model Selection Naturally, the level of temporal expansiveness of one’s network data matters for the choice of an appropriate candidate model. If observations of relationships between actors are nearly instantaneous and can be captured in a nonaggregated manner, then the relational event framework may be preferred over alternatives. However, if the data are at least somewhat temporally expansive, or even extemporaneous, then either the SAOM or dynamic ERGM may be more suited to the problem. Given that SAOMs and dynamic ERGMs both model panel network data, it is reasonable to ask when one method is preferable to the other. Leifeld and Cranmer (2019) demonstrate that the models behave similarly under certain conditions, and where they differ is a matter of model specification. However, anything but a trivial model is necessarily going to differ between the two approaches, as they involve different estimation routines, modeling frames, assumptions, and sufficient statistics. Block et al. (2017) point out that in many ways, the sufficient statistics in any given model may be postulated to tap into the same underlying social processes (i.e., popularity, homophily, triadic closure), but their direct comparability ends there. While we take no position on which model should be preferred across the board, a handful of studies have focused on articulating key differences between the approaches and implications for inference. Many of the differences are in the assumptions that form the backbone of the model. One key difference is that the SAOM is actor based, while the dynamic ERGM is edge based. Block, Stadtfeld, and Snijders (2016) examine the model at the micro-level of the tie decision to explain how this difference affects the relative probability of a tie. They show that models containing seemingly the same effects (i.e., the same sufficient statistics) can give certain types of ties different probabilities. One example is a density-only model. 
By nature of ties being nested in actors, the SAOM version of the model will consider not only the density of ties but also their distribution across actors (Block et al., 2016).

A second difference is in the treatment of time and, in particular, when ties are allowed to change. The SAOM and LERGM are continuous time models that allow a change in one dyad to immediately affect the dependence structure for other dyads. By contrast, the TERGM and STERGM are discrete-time models where tie changes only affect the dependence structure at the next observed time point. In other words, the models assume conditional independence of tie changes within a change interval. Often this assumption is not warranted and its violation leads to erroneous inferences, though shorter intervals between observations may lessen the problem (Lerner et al., 2013). This leads into a final issue surrounding time. In discrete-time models like the TERGM and STERGM, parameter estimates are sensitive to the duration between observations as manifested in the amount of network change (Leifeld & Cranmer, 2019; Koskinen et al., 2015; Block et al., 2017). With larger intervals between t and t + 1 (i.e., more network change), the time t network explains less of the time t + 1 network, while within-time effects increase in magnitude. Indeed, as the interval between observations approaches infinity, all dependence is within time.² This feature of the model also implies that the interval between observations should be considered when choosing within- versus cross-time dependencies. By contrast, with the SAOM only the rate parameter is dependent on interval length; other parameter estimates are independent of the interval between observations (Block et al., 2017). This issue becomes relevant when attempting to parse the processes that lead to patterns, such as transitivity. Given a triad composed of three directed edges, one may ask which of the three edges formed last to complete the structure. Models that take a continuous time approach and model the network edge by edge are able to address that question.
Thus, the SAOM and LERGM (and REM) can all disentangle the order in which edges appear. By contrast, discrete-time models (e.g., TERGM and STERGM) have less capacity to capture such dynamics, unless they occur across the discrete time points. This points to the importance in such models of measurement waves coinciding with the theorized change process. The issue of time also affects interpretability of the model. One common way of interpreting an ERGM is as the probability of a tie conditional on the rest of the network. However, this interpretation is only justified if the network can be assumed to be in equilibrium (i.e., Markov dependence). In a discrete-time dynamic ERGM, the assumption of equilibrium no longer holds (Block et al., 2017), and thus interpretation must be at the graph level. Comparisons agree that the decision between an SAOM and ERGM would be best informed by the research question and assumptions that the modeler holds or is ultimately willing to accept. For instance, Block et al. (2017, p. 26) recommend the SAOM if one is more interested in questions of process versus structure. And Block et al. (2016) suggest that an SAOM is better if actors are assumed to have a limited number of ties; otherwise, an ERGM may be preferred. Leifeld and Cranmer (2019) point out that the SAOM assumption of no simultaneous changes is violated in instances of collective action, and in such cases an ERGM-based approach may be preferable. When an a priori theoretical justification of one model over the other is not possible, Leifeld and Cranmer (2019) suggest that the TERGM may be preferred over the SAOM for its predictive performance (but see the response in Block et al. [2019], and Block et al. [2017] for another view on the value of predictive performance).


Empirical Example of Three Approaches We draw our empirical example from a historical network of dominance relations between 68 members of a herd of Eurasian red deer (Cervus elaphus) collected observationally by Appleby (1980). Each directed relation in the network represents the outcome of a dominance challenge between two stags. Appleby collected these data between January 20 and April 24, 1978. He recorded the unique IDs of the stags in battle, who won and lost, and a set of both dynamic and fixed covariates on the stags and their environment. The dominance hierarchy arising from the network of these battle bucks has previously been studied in aggregate by Freeman, Freeman, and Romney (1992). For our purposes, we focus on two versions of the data: for the relational event framework, we use all 2,008 dominance events, and for the SAOM and ERGM, we use two equally spaced union-rule aggregated networks (representing an aggregation of events 1 through 961 and 962 through 2,008). Figure 14.1 displays the network in several instantiations. The top-row plots illustrate the first three events and the bottom-row plots illustrate the two time slice aggregations along with the total temporal aggregation. All vertices have been fixed to the same set of coordinates (produced by a custom variation on the Fruchterman-Reingold algorithm) and are colored by k-core membership in the total aggregation network. Edge weights and shading are proportional to the total number of times an edge was active within each time window (and are thus equal to unity in the first row). A very dense core (n = 39 [57% of stags], k = 12) is at the center of this network, indicating that most domination occurs among a common set of actors. Indeed, roughly 84% of all interactions occur within this core; these high-degree actors are highly homophilous as they send ties to each other around 92% of the time.

Figure 14.1  Domination networks at the event level (top half) and aggregate level (bottom half). Panels: Event 1; Event 2; Event 3; Time Aggregation 1 (Events 1–961); Time Aggregation 2 (Events 962–2008); Total Aggregation (Events 1–2008). Legend: K-Core Membership (1–10, 12).

We assume a fixed actor set at all times (i.e., all deer were available at all times for challenge). While this does not necessarily reflect reality, the assumption is made for simplicity and could be relaxed under any of these models in practice (e.g., by introducing time-varying support constraints on the actor set). We also make the simplifying assumption that all actions are observed and, for the relational event model, that the exact timing of the events was separated by fractions of a minute (as Appleby’s temporal resolution could not distinguish between the onset of about 18% of immediately proximal events). Lastly, we assume that the two aggregated networks represent an underlying dominance hierarchy where a directed tie stemming from buck i to buck j at time t implies that i holds a meaningful dominance position over j in that particular cross-section. As the cross-sections capture events in distinct windows of time versus dominations accumulating from time 1 to time 2, the two time points represent different manifestations of the hierarchy. In this regard, note that the Jaccard coefficient is 0.21, indicating that 21% of the ties observed at either time point were observed at both time points (Table 14.2). Our goal in this example is to demonstrate how to model a set of analogous effects with each of the three approaches.
Effects are comparable in that they reflect common relational processes expected to occur over time. However, some of these effects necessarily differ in their functional form and operationalization across models in accordance with the nature of the model. As observed by Block et al. (2016, pp. 26–27), “there are no truly equivalent model specifications.”³

Table 14.2  Descriptive Statistics for Stags (N = 68) and Cross-Sectional Networks

                              Time 1        Time 2
Age
  M                           5.79
  SD                          2.71
  Min                         2
  Max                         13
Degree (in, out)
  M                           5.88          6.57
  SD                          6.37, 6.85    6.62, 6.14
  Min                         0, 0          0, 0
  Max                         25, 29        26, 31
Isolates (in, out, both)      20, 21, 15    10, 7, 3
Mutuality (M / (M + A))       .02           .03
Transitivity                  .46           .39
Jaccard index                 .21
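The aggregation of events into two cross-sections and the Jaccard index reported in Table 14.2 can be sketched as follows. This Python example is a hedged illustration rather than the authors' code: it assumes that "union-rule" means a tie from i to j is present if at least one event in the window has i dominating j, the function names are hypothetical, and the six-event sequence is invented toy data rather than Appleby's.

```python
def aggregate_union(events, n_actors, first, last):
    """Binary network for events first..last (1-indexed, inclusive).

    Assumed union rule: tie i -> j is 1 if any event in the window
    has i dominating j, regardless of how often it happened.
    """
    net = [[0] * n_actors for _ in range(n_actors)]
    for winner, loser in events[first - 1:last]:
        net[winner][loser] = 1
    return net

def jaccard(net_a, net_b):
    """Ties present in both networks divided by ties present in either."""
    both = either = 0
    n = len(net_a)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a, b = net_a[i][j], net_b[i][j]
            both += 1 if (a == 1 and b == 1) else 0
            either += 1 if (a == 1 or b == 1) else 0
    return both / either if either else float("nan")

# Toy event list of (winner, loser) pairs among three actors.
events = [(0, 1), (0, 1), (1, 2), (0, 1), (2, 0), (1, 2)]
time1 = aggregate_union(events, 3, 1, 3)
time2 = aggregate_union(events, 3, 4, 6)
overlap = jaccard(time1, time2)
```

Here two ties appear in both windows and three ties appear in at least one, so the toy Jaccard index is 2/3.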

Model Statistics Our empirical demonstration illustrates each of the major classes of effect described previously. These are common processes and the motivation for including them is fairly intuitive. However, given the nature of our network (i.e., dominance hierarchy), we briefly reflect on the theoretical meaning behind each effect. Note that each of these effects takes the form of a statistic calculated for each event or network that follows the initial observation (e.g., a statistic for events 2 to 2,008 in the REM and at time 2 for the SAOM and dynamic ERGMs).

Attribute Effects: Age Older bucks are larger and have more experience than younger bucks, which may steer younger bucks away from becoming involved in dominance challenges with their older counterparts. Alternatively, to secure a high position in the dominance hierarchy, a younger stag may need to engage in fights with opponents of a different age (which he is likely to lose). To capture such possibilities, we add three age-related statistics to each model. The first two are a pair of statistics for the age of winners (i.e., the tie sender, or ego) and losers (i.e., the tie receiver, or alter) of dominance challenges. The third, which assesses homophily (or heterophily) on age, is a dyadic-level statistic that considers the joint value of ego’s and alter’s age. For REMs and ERGMs the statistic is a function of the absolute value of the age difference; for SAOMs this difference is standardized to range 0–1 and reverse-coded (higher values = more similarity; Ripley et al., 2020).
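The two forms of the dyadic age statistic can be compared in a short Python sketch. The function names are hypothetical, and the similarity score is assumed here to be one minus the absolute difference divided by the observed age range (2 to 13 in Table 14.2); software may additionally center this score, which is not shown.

```python
AGE_MIN, AGE_MAX = 2, 13  # observed range reported in Table 14.2

def abs_age_difference(age_ego, age_alter):
    """Form used for REMs and ERGMs: larger values = less similar."""
    return abs(age_ego - age_alter)

def age_similarity(age_ego, age_alter, age_min=AGE_MIN, age_max=AGE_MAX):
    """Form used for SAOMs (assumed, uncentered): scaled to 0-1, reverse-coded."""
    return 1 - abs(age_ego - age_alter) / (age_max - age_min)

# Same-aged stags are maximally similar; the youngest and oldest are not.
same_age = age_similarity(5, 5)
extremes = age_similarity(2, 13)
two_years_apart = age_similarity(4, 6)
```

Because the two codings run in opposite directions, a negative absolute-difference coefficient and a positive similarity coefficient both indicate age homophily.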

Dyadic Effects: Proximity It is very difficult for stags to engage in dominance challenges with potential partners who are not within striking range. Appleby’s data do not include a precise measure of proximity but do indicate whether the two deer engaged in a dominance event were previously observed to be grazing on the same foliage. Unfortunately, this is effectively endogenous with the event itself as they were cograzing 97% of the time before engaging in battle. Hence, we use an aggregate measure of overall preferences, namely the sum of the number of times each buck was grazing on the same type of food (with the assumption that similar foodstuffs are locally proximal).

Endogenous Effects: Reciprocity In a dominance hierarchy, reciprocity represents each actor in a dyad dominating the other. We do not expect such patterns to exist; if anything, we expect a tendency away from reciprocity. Accordingly, we add statistics to each model that capture reciprocal domination over time. For the REM, with finer temporal resolution, we can distinguish between two types of reciprocity: (1) immediate (i dominates j and then the very next thing that happens is j dominates i) and (2) lagged (at some point in the past, i dominated j and j “remembers” these past experiences and dominates i at a later point). ERGMs can also accommodate multiple forms of reciprocity (Robins & Pattison, 2001), of which we examine (1) reciprocated dyads that emerge net of the preceding network and (2) lagged reciprocity in the form of an asymmetric tie at time 1 that prompts the return tie at time 2. For the SAOM, reciprocity takes only one form: the number of reciprocated ties at time 2.
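The forms of reciprocity distinguished above can be made concrete with a few counting functions. This Python sketch uses hypothetical function names and invented toy data; the exact normalizations used by each software package may differ from these raw counts.

```python
def immediate_reciprocations(events):
    """REM-style immediate reciprocity: event (j, i) directly follows (i, j)."""
    return sum(
        1
        for prev, curr in zip(events, events[1:])
        if curr == (prev[1], prev[0])
    )

def reciprocated_ties(net):
    """Sum across actors of their number of reciprocated outgoing ties."""
    n = len(net)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and net[i][j] == 1 and net[j][i] == 1
    )

def lagged_reciprocity(y_prev, y_curr):
    """Asymmetric tie i -> j at time 1 followed by a return tie j -> i at time 2."""
    n = len(y_prev)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and y_prev[i][j] == 1 and y_prev[j][i] == 0 and y_curr[j][i] == 1
    )

# Toy data: an event sequence and two cross-sections for three actors.
events = [(0, 1), (1, 0), (1, 2), (0, 1)]
y_prev = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
y_curr = [[0, 0, 0], [1, 0, 1], [0, 1, 0]]
```

In the toy event sequence only the second event reverses the one immediately before it, while both asymmetric time 1 ties are returned at time 2.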

Endogenous Effects: Triadic Closure In dominance relations, dynamics involving three parties are an important aspect of the process giving rise to the resulting dominance hierarchy. Transitive ties, in particular, represent a three-party manifestation of a strict global hierarchy (Chase, 1980). Other triadic configurations may also be important. Triadic closure of a three-cycle (i dominates j, who dominates k, who then turns the tides by dominating i), for instance, may represent the process by which a previously low-ranking stag overcomes his station to rise quickly to the top. As with reciprocity, the capacity to investigate the ordering of triadic relations varies across modeling approaches. We illustrate this with an example that tests which of the three possible edges complete a transitive triad (see Figure 14.2). The continuous time models readily allow for separate effects representing each edge, though for the REM this is only through the use of the “memory” statistics (i.e., not “p-shift” statistics, which reference the one, immediately prior event). The TERGM also allows for separate transitivity statistics, but only when modeling “lagged” transitivity that occurs across the discrete time points (see Robins & Pattison, 2001). Within a given time point, transitivity is represented by one statistic and the three paths that would close a triad all contribute to that same statistic.

Figure 14.2  Processes of triadic closure resulting in 030T triad. Panels: Outbound Shared-Partners; Inbound Shared-Partners; Outbound Two-Path. Key: Past Domination; Future Domination.

Modeling Strategy and Results Our primary purpose is to illustrate similarities and differences in the sufficient statistics (i.e., model terms) and interpretations of effects for the different modeling approaches. To accomplish this, we estimate two models with similar types of statistics (baseline and a full comparative model), then specify additional models that highlight distinctive features of each approach.

Baseline Models We begin by fitting the simplest regimes of network change using a reduced specification for each modeling approach. In these models, network change is completely stochastic (change happens at random given the model parameters in the appropriate frame). In the exact timing relational event model, this model has a single effect: a pacing constant “intercept” term that captures the overall tendency for bucks to send ties in a piecewise constant manner (Butts & Marcum, 2017). This is a simple exponential waiting time model where the estimated coefficient of –8.543 represents the log hazard of event occurrence. If nothing else in this ever-changing world mattered, we would expect about one dominance challenge per hour to occur ((N * N – 1) * exp(–8.543) = 0.901) and to wait about an hour and six minutes between events (1 / ((N * N – 1) * exp(–8.543)) = 1.109). The baseline SAOM contains two parameters: one for the rate function and the other for out-degree. With two time points there is one period of change. The rate parameter estimate of 14.03 indicates that actors were given 14 opportunities on average to make a change in one of their outgoing relations (either adding or dropping a tie, though no change is also possible during this period). The out-degree term captures the overall tendency for actors to send ties to others. In its absence (i.e., a rate-only model), the odds of a tie in any dyad are represented by a Bernoulli process with tie probability of 0.5 (resulting in a network with density of 0.5). The estimated out-degree term is negative (b = –1.194), which is common in social networks, and adjusts the probability of a tie downward from 0.5. The baseline TERGM also contains two effects: a stability term predicting the effect of a time 1 tie on the likelihood of a time 2 tie and an edges term that predicts the likelihood of a time 2 tie, net of the time 1 network.
The estimated stability term is positive (b = 2.436), suggesting that the odds of a tie from i to j were 11.4 (exp[2.436]) times greater if a tie from i to j was present at time 1. The negative edges term (–2.667) must be interpreted net of the stability term. In dyads where a tie existed at time 1, the odds of a tie existing versus not existing are exp(2.436 – 2.667) = 0.79. In terms of probability, time 1 ties have a 0.44 probability (0.79 / 1.79) of persisting into wave 2. For dyads that did not exhibit a tie at time 1, the odds of a tie forming versus not forming are exp(–2.667) = 0.07. Thus, new ties have a 0.065 probability (0.07 / 1.07) of appearing.

Finally, the STERGM specification of a baseline model contains two edge terms (one each for tie formation and dissolution) that govern the rates of change between observations. In this simple example, the results of the STERGM are equivalent to the TERGM. The edges parameter in the formation submodel equals –2.667, the same as the edges term in the TERGM that, on its own, governs the likelihood of new tie formation. The edges parameter in the dissolution submodel equals –0.231, which equates to 0.79 odds (exp[–0.231]) of ties persisting across waves.
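The arithmetic behind the baseline relational event, TERGM, and STERGM estimates can be checked with a short Python sketch. The coefficients are those reported in the text; the variable and function names are hypothetical. The relational event expression is evaluated exactly as printed, with the multiplier N * N – 1, which is what reproduces the reported values.

```python
import math

def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1 + odds)

# Relational event model: pacing constant on the log hazard scale.
N = 68
log_hazard = -8.543
multiplier = N * N - 1                       # 4623, as printed in the text
events_per_hour = multiplier * math.exp(log_hazard)
hours_between_events = 1 / events_per_hour

# For comparison only: the number of ordered dyads, N * (N - 1) = 4556,
# would give a slightly lower rate.
rate_ordered_dyads = N * (N - 1) * math.exp(log_hazard)

# TERGM: stability and edges coefficients.
stability, edges = 2.436, -2.667
stability_odds_ratio = math.exp(stability)       # odds multiplier for a prior tie
odds_persist = math.exp(stability + edges)       # dyads with a time 1 tie
odds_form = math.exp(edges)                      # dyads without a time 1 tie
p_persist = odds_to_probability(odds_persist)
p_form = odds_to_probability(odds_form)

# STERGM: the dissolution-submodel edges term equals the sum of the
# TERGM stability and edges terms in this baseline case.
stergm_dissolution_edges = stability + edges
```

The last line shows why the two baseline specifications coincide: the persistence log odds in the TERGM, 2.436 – 2.667, is the –0.231 reported for the STERGM dissolution submodel.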

More Practical Models

We now present fuller model specifications that are closer to what might be done in practice. Our full comparative models containing analogous terms (e.g., related to covariates, degree effects, reciprocity, and triadic closure) are presented in the top half of Table 14.3. The results from the exact timing relational event model are reported as M1. The pacing constant representing the overall rate that any buck successfully dominates another is relatively unchanged from the baseline model. We observe a small positive effect for food preference homophily: bucks who graze on the same type of food more often are more likely to exhibit a tie. Introducing age effects reveals that a unit increase in age multiplies the hazard of successful domination by exp(0.138) = 1.148 while also multiplying the hazard of being dominated by exp(–0.182) = 0.834, all else remaining constant. Meanwhile, the negative dyadic event covariate effect for age differences reduces the hazard of two bucks engaging in battle as their age difference increases (each one-unit increase in the difference multiplies it by exp(–0.288) = 0.750). Considered separately, these results suggest that older deer more quickly accumulate successful dominations (sender effect) and are slower to be dominated in the future (receiver effect), and the rate at which two deer engage in future trials is inversely proportional to the difference in their respective ages (homophily). However, as age homophily is a form of interaction between senders and receivers, all else is not equal and these effects must be considered in tandem. This is achieved by calculating the predicted multiplicative change in hazard at the dyad level, considering the joint ages of the sender and receiver. As shown in Figure 14.3, the hazard of domination increases the most for younger bucks, who are most likely to dominate other younger bucks.
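The dyad-level calculation just described can be sketched as follows. The sketch assumes that the three age terms enter the log hazard additively, uses the M1 coefficients quoted in the text (Figure 14.3 itself is labeled as Model 3, so its cell values may differ), and assumes ages run from 1 to 13 as on the figure's axes. Rescaling by the largest cell is also an assumption, made to mimic the figure's 0 to 1 range.

```python
import math

# REM Model 1 age coefficients as quoted in the text.
B_SENDER = 0.138     # age of the dominating (sending) buck
B_RECEIVER = -0.182  # age of the dominated (receiving) buck
B_ABS_DIFF = -0.288  # absolute age difference within the dyad

AGES = range(1, 14)  # assumed age range: 1 through 13


def age_multiplier(sender_age: int, receiver_age: int) -> float:
    """Joint multiplicative change in hazard for one sender-receiver age pair."""
    log_multiplier = (B_SENDER * sender_age
                      + B_RECEIVER * receiver_age
                      + B_ABS_DIFF * abs(sender_age - receiver_age))
    return math.exp(log_multiplier)


# Raw multipliers for every (sender age, receiver age) cell.
raw = {(s, r): age_multiplier(s, r) for s in AGES for r in AGES}

# Rescale so the largest cell equals 1.
largest = max(raw.values())
relative = {pair: value / largest for pair, value in raw.items()}

for pair in [(1, 1), (13, 13), (13, 1), (1, 13)]:
    print(f"sender {pair[0]:>2}, receiver {pair[1]:>2}: {relative[pair]:.4f}")
```

Under these assumptions the largest cell is the youngest same-age pair, and the cell for a 1-year-old dominating a 13-year-old is far smaller than the cell for the reverse pairing, which matches the asymmetry around the diagonal discussed in the text.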
Older bucks increase their dominations at a somewhat slower rate, and when they do, it tends to be against fellow older stags. Comparing slopes of each side of the diagonal “ridge” demarcating homophily reveals an asymmetry: the hazards of younger bucks dominating older bucks are quite low (steep dropoff), while the odds of an older buck dominating a younger buck are less extreme (a relatively more moderate slope). We now turn to structural effects, which in this specification of the REM represent how accumulated patterns of events up to the current time point affect the rate of future events. The first two structural terms capture how accumulated normalized in-degree and out-degree affect sending and receiving future ties, respectively. The large positive coefficients for both suggest that bucks with a larger share of the domination in the past are likely to continue to dominate in the future, while those who have been previously dominated more often are likely to continue to fail their challenges, net of all else in the model. This is consistent with the evidence captured by the negative reciprocity term: as the fraction of i’s past dominations coming from a particular deer increases, i’s rate of dominating that deer in the future decreases. Finally, the two triadic effects have coefficients that are consistent with

dominance relations: the three-cycle coefficient is negative, suggesting a reduction in the hazard of the k→i domination given previous i→j, j→k events. The positive generalized transitivity effect supports the expectation that strict dominance hierarchy relations unfold over time in this data.

Model 4 reports coefficients from analogous terms in the SAOM. The rate function is larger than in the baseline model, with actors having 23.15 opportunities on average to change their ties. This increase compared to the baseline model reflects the larger number of network changes required for the model to recreate the distributions of the more extensive set of statistics included in this model. The out-degree term is also of greater magnitude than in the baseline model. This is an indication that ties are even less likely to exist unless they are facilitated through one of the processes represented by other effects in the model. Turning to the newly introduced effects, the dyadic effect of same grazing type is significant. Its positive valence indicates that stags are more likely to send a tie to (dominate) someone who more often grazes on the same type of food. In contrast to the REM, none of the effects of age are significant, suggesting that age is not playing a role in long-term dominance outcomes. Next we interpret the endogenous network effects. The nonsignificant out-degree activity effect suggests there is no tendency for stags who dominate many others to continue to have more outgoing (domination) ties over time. The significant in-degree popularity effect reveals that stags dominated by many others will continue to be dominated by many others over time. The reciprocity effect is nonsignificant, which suggests that stags show no signs of successfully dominating or not dominating stags who have dominated them. In other words, reciprocity occurs at a chance rate given other model parameters.
The contrast of this null effect with the negative effect in the REM is noteworthy and may be due to the loss of information from dichotomizing ties in the SAOM. At the triadic level, the effects of cycles (negative) and transitivity (positive) are expected and imply a hierarchy among the stags. In interpreting these, we note that the likelihood of a stag dominating another stag depends on whether that tie would contribute to these patterns: ties contributing to cycles are less likely, while those contributing to transitivity are more likely. Model 6 reports on a rather similar TERGM. Compared to the baseline TERGM, the stability term is smaller in magnitude, implying that part of the durability in ties is explained by effects newly introduced to the model. The edges term is of greater magnitude than previously, suggesting that ties are even less likely to exist at time 2, unless supported by the newly introduced effects. The pattern of results for the node and dyad attribute effects is the same as the SAOM: only grazing on the same type of foodstuffs matters for network structure. Ties between stags grazing on the same foodstuffs are more likely than chance, net of the structure of dominance relations in the preceding wave (which may also be partially explained by similar grazing behavior). The structural terms in the TERGM represent the probability of observing certain types of local structures in the time 2 network, net of the time 1 network. The effects for in-degree and out-degree distributions are both negative, indicating that stags with higher numbers of ties are increasingly unlikely. Note that the valence of these terms departs from the SAOM, which is reflective of the fundamental difference in the approaches. For instance, the positive popularity parameter in the SAOM indicates that at the actor level, stags are more likely to extend ties to those alters with greater versus fewer incoming ties. By

contrast, the TERGM captures patterns at the network level, which in this case is a skewed in-degree distribution. The resulting pattern is the same, but the process of achieving it differs. The remaining three structural effects (reciprocity, three-cycles, transitivity) are similar in valence and significance as the SAOM, though with expected differences in magnitude.

The final modeling approach is the STERGM, where effects are differentiated for tie creation versus tie persistence. Beginning with the new tie formation submodel, the same foodstuffs effect is again positive. The age homophily term (coded as absolute difference) indicates that bucks of different age were less likely to engage in a battle. The positive age ego effect hints that older bucks may be more likely to dominate others than younger bucks. In combination, these two age effects suggest that older bucks are more likely to engage in battles than younger bucks and that bucks are most likely to battle someone close in age, but when age-heterophilous battles occur, older bucks are more likely to be victorious. This STERGM submodel contains the same structural terms as the TERGM and has the same pattern of results. The two models differ, however, in interpretation. For example, the reciprocity term in the STERGM solely represents newly emerging structures, whereas in the TERGM it captures the tendency toward mutuality, in new or persistent ties, net of the tendency for ties to persist over time (as represented by the stability term). Results are very different in the tie persistence submodel, where, outside of edges, the only significant effect is transitivity. Transitive relations were more likely to form and, once formed, persist across waves. By contrast, remaining effects only had an effect on the initiation of dominance relations. For instance, bucks grazing on the same foodstuffs are more likely to initiate a battle but no more likely to continue battling than bucks not on the same foodstuffs. Similarly, the patterns for age and the remaining structural terms only hold for the initiation of new ties, not for the persistence of domination across waves.

Table 14.3  Estimated Coefficients
Columns: REM (1), (2), (3), each reporting –λ and SD; SAOM (4), (5); TERGM (6), (7); STERGM (8a, initiation) and (8b, persistence), each reporting b and SE.
Rows: Base (Rate/stability; Edges/out-degree). Node and Dyad Attributes (Grazing on same food type; Age ego; Age alter; Age homophilyᵃ). Structure (Out-degree dist./activity; In-degree dist./popularity; Reciprocity (w/ memory); Three-cycle; Transitivity). Alternative Structureᵇ (Reciprocity; Three-cycle; OSPSnd; ISPSnd; OTPSnd). P shifts (Reciprocity (immediate); Two-path; Mixed two-star; Popularity/pref. attach.; Two-star; Two-component). BIC.
* p < .05; ** p < .01; *** p < .001.
ᵃ Homophily is indicated by a negative coefficient in the REMs and dynamic ERGMs, which measure the absolute difference, but a positive coefficient in the SAOMs, which reverse-code and normalize this statistic.
ᵇ In predicting an event, REM effects represent whether the conditional pattern of ties existed aggregated across all previous events. SAOM effects represent whether the respective sequence of ties occurred during the sequence of micro-steps between observation waves. TERGM effects represent whether a tie at time 2 was predicted by the respective pattern at time 1. Whereas these effects are disaggregations of the terms above in the REM and SAOM, in the TERGM they are conditioned differently (on the time 1 network) versus the effects above, which are conditioned on the time 2 network (with the exception of stability).

Figure 14.3  Effects of age from REM Model 3. Cells represent the relative hazard of an event in a dyad contingent on the ages of the sender and receiver. Relative hazards range from 0 (white cells) to 1 (black cells). Axes: Receiver Age and Sender Age, each running from 1 to 13.

Extended Models

We now turn to two forms of more specialized model specifications. First is a set of models that attempt to capture more nuanced transitivity effects. In particular, we specify dynamic effects for closing three types of transitive ties (represented in Figure 14.2). In terms of the dominance hierarchy process, we might characterize these effects in the following manner. The first represents a future challenge among lesser bucks who previously lost to the same, stronger deer. The second represents a future challenge among equals who were previously successful against the same lesser deer. And the third represents the strict dominance hierarchy process that we’ve already discussed: one buck successfully rises to the top as he successfully dominates the other two, regardless of their previous history. These effects are possible with each approach in principle, though their instantiations differ. The REM and SAOM simply disaggregate the transitivity statistic to isolate the specific tie, while the ERGM approach requires a shift in temporal perspective (discussed later).⁴

From the REM (M2), the evidence suggests that only one of these forms increases the hazard of future triadic closure. The number of times two bucks dominated the same third stag in the past multiplies the hazard that one of them will dominate the other in the future by 1.041, net of other effects. The other two forms of transitivity reduce the hazard of their respective event, though the magnitudes of these effects are relatively smaller (close to one unit). With the disaggregation of transitivity, other model effects are substantively unchanged, though some magnitudes shift. This model also fits better than the simpler model as indicated by the Bayesian information criterion (BIC) statistic.

The SAOM results (M5) provide a slightly different story. Here, the only significant effect indicates that bucks who have been dominated by the same buck will engage in their own dominance challenge. The other two forms of triadic closure are not significant. Unlike the REM, notice that with the more detailed transitivity representations, other model effects have shifted in magnitude and significance compared to M4. The out-degree activity effect is now positive and significant, suggesting that stags who dominate many others will continue to do so. The reciprocity effect is now negative, implying even less reciprocity than expected by chance, while the three-cycle effect is much stronger.

Triadic closure can also be disaggregated in a dynamic ERGM; however, this requires shifting the timeframe under consideration. The transitivity term in M6 cannot disentangle these separate processes because it relies on the count of transitive triads observed at time 2. The only way for an ERGM to detect these processes is if their manifestation coincides with the timing of observations. That is, the structural precursors are present at time t (e.g., two of the three ties) and the third tie forms at time t + 1. Thus, in the ERGM, these three effects take the form of cross-time or lagged effects. Cross-time effects can be created for many kinds of effects and, to illustrate, we also include lagged reciprocity and three-cycle effects in place of their within-time representations. Cross-time effects have been discussed for the TERGM, but not the STERGM, though this is a natural extension.⁵

The extended TERGM (M7) reveals a significant effect for each of the lagged statistics. Lagged reciprocity and three-cycles were both negative. The reciprocity effect indicates that ties at time 1 were less likely than chance to be reciprocated at time 2 (net of other model effects). And two-paths were unlikely to close to form a cycle. Turning to the transitivity effects, results suggest that the strongest effect was for ties to form among bucks who had previously dominated the same buck, as found in the REM, though other forms of transitivity were also positive and significant. The fit of this model over the previous version is improved based on BIC, suggesting that either capturing cross-time dependence or this more nuanced version of transitivity is a better representation of this network’s dynamics.

Our final extended model (M3) is a variant of the REM including specialized terms to capture the dynamics involved in dyadic participation shifts (Gibson, 2003). These coefficients modify the hazard of a particular focal event occurring, given that a specific micro-sequence of events has just transpired (relative to anything else possible happening, ceteris paribus). With the exception of the two-component effect, these effects represent future action that includes at least one member of the preceding action. The negative effects all indicate that the hazards for the focal events decrease (and the exponential waiting time leading to those events increases) as a result of the preceding domination challenges. Thus, the model expects to wait a long time for these types of events to transpire relative to waiting for some other event to occur, given their differing respective possibilities. For example, the negative two-path effect indicates that a stag who has just dominated another stag is unlikely to be the next stag to dominate another (i.e., the next event); or, put another way, we expect to wait a long time between such events. The only nonsignificant introduced effect was reciprocity (a participation shift of the form i→j, j→i). Indeed, the estimate is absolutely large at less than –9 and the posterior standard deviation is likewise large, indicating that immediately successful retaliation over one’s prior dominator is almost certainly nonexistent in this social system. Unsuccessful bucks are unlikely to have the reserve capacity to rally their just deserts. Indeed, the fact that all the dyadic micro-behavioral coefficients are negative suggests that the dynamics of this social process are more aptly characterized as a series of random competitions, rather than series involving the same two or three actors in succession. Indeed, the two-component effect, representing action in nonoverlapping dyads, has the shortest waiting time of all participation shifts at around 3.1 hours (1 / ((66 * 65) * exp(–3.4 + –6.1))), assuming all other effects constant at 0. Moreover, we observe a massive reduction in the base rate and accumulated versions of in-degree, out-degree, and reciprocity coefficients in this model, suggesting that at least some of those effects are entailed by the participation shifts (probably due to the fact that high-degree actors are more likely to be repeatedly involved in micro-sequences of events). Finally, we notice a sizeable reduction in BIC for this model over the others. This suggests that including dyadic participation shifts, despite the fact that they are negatively informative, improves the fit of this model to these data.
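The waiting-time figure for the two-component effect can be reproduced with a short calculation. The sketch below assumes, as the text does, that all other effects are held at 0, so every ordered dyad shares the same hazard; the herd size of 66 is read off the 66 * 65 term in the text, and the function name is illustrative.

```python
import math

N_BUCKS = 66  # herd size implied by the 66 * 65 term in the text


def expected_wait_hours(log_rate: float, n_actors: int = N_BUCKS) -> float:
    """Expected wait until the next event when every ordered dyad has hazard exp(log_rate)."""
    n_ordered_dyads = n_actors * (n_actors - 1)
    total_rate = n_ordered_dyads * math.exp(log_rate)
    return 1.0 / total_rate


# Base rate (-3.4) plus the two-component participation shift (-6.1), as in the text.
two_component_wait = expected_wait_hours(-3.4 + -6.1)
print(f"two-component waiting time: {two_component_wait:.1f} hours")
```

Because the hazard enters through an exponential, a more negative participation-shift coefficient lengthens the expected wait multiplicatively, which is why the other, more strongly negative shifts imply much longer waits.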

Subsequent Steps

Upon completion of modeling, a number of additional steps are prudent. We do not have space to discuss these but point the reader to useful sources. A question for any model is how well it is able to recreate the observed data (i.e., goodness of fit). For REMs, Butts and Marcum (2017) describe how the deviance residuals can be used to summarize the extent to which the fitted model is “surprised” by the observed event data (see also statnet workshop materials). Dyadic REMs can also be evaluated for predictive value using supplied statistics. REMs and ERGMs offer likelihood-based measures of fit (BIC, Akaike information criterion, etc.) that can be used for model comparison purposes. However, for ERGMs and SAOMs, overall estimates of fit and predictive value are still in development. Reasonable approaches have been developed that use a fitted model to simulate network features (either included in or withheld from the fitted model) and compare those simulations to the observed network. Discussions of this approach are available for ERGMs (Goodreau, Kitts, & Morris, 2009; Hunter, Goodreau, & Handcock, 2008a) and SAOMs (Lospinoso & Snijders, 2019).

Once a model of network change is developed, it can be used as the basis for simulations to address different types of questions. This is straightforward with SAOMs and dynamic ERGMs as their estimation already involves a simulation algorithm, but more difficult in REMs as sampling an event history given a set of model parameters is not currently supported by software (though it can be done using custom scripts; see the appendix in Marcum & Butts [2015] along with the relevent R package documentation for simple examples). Simulations are currently the best approach for evaluating goodness of fit as mentioned earlier. Simulations can also be used as a means to decompose the processes responsible for observed network patterns. For instance, Steglich et al. (2010) estimate how much of the homophily observed in a network is due to peer influence versus selecting into homophilous friendships. Simulations can also be used as a means to evaluate network interventions, such as mitigating peer influence on smoking (adams & Schaefer, 2016), or to construct artificial networks, such as needed to model disease transmission (Jenness et al., 2016).
Such usages have the advantage of grounding the parameters and/or initial conditions in a real-world context. Simulations can be used to predict the exact structure of the network at future time points, though accuracy may be poor (Block et al., 2017).

Our examples only contained two waves of panel data. However, the methods for network panel data can be extended to model much larger sequences. In such a case, it is worthwhile to consider whether the modeled processes vary in magnitude over time (i.e., temporal heterogeneity; see Lospinoso et al., 2011). This is especially important with dynamic ERGMs, where parameter estimates are influenced by the amount of time/network change between observations (Block et al., 2017).
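The simulation-based check of fit described in this section can be illustrated with a deliberately simplified sketch. Here a directed Bernoulli tie model stands in for a fitted SAOM or ERGM (an assumption made only to keep the example self-contained), the statistic is the number of mutual dyads, and the observed value, network size, and tie probability in the usage lines are hypothetical numbers, not results from the red deer data.

```python
import random


def simulate_digraph(n, p, rng):
    """Directed Bernoulli graph: each ordered pair i != j gets a tie with probability p."""
    return [[1 if i != j and rng.random() < p else 0 for j in range(n)]
            for i in range(n)]


def mutual_dyads(adj):
    """Count dyads in which ties run in both directions."""
    n = len(adj)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if adj[i][j] and adj[j][i])


def simulated_share_at_least(observed_stat, n, p, n_sims=200, seed=42):
    """Share of simulated networks whose statistic is at least the observed value."""
    rng = random.Random(seed)
    sims = [mutual_dyads(simulate_digraph(n, p, rng)) for _ in range(n_sims)]
    return sum(s >= observed_stat for s in sims) / n_sims


# Hypothetical usage: 20 actors, tie probability 0.1, 3 observed mutual dyads.
share = simulated_share_at_least(observed_stat=3, n=20, p=0.1)
print(f"share of simulations with at least 3 mutual dyads: {share:.2f}")
```

A share near 0 or 1 would suggest that the stand-in model reproduces the observed statistic poorly. With a real fitted model, the simulated networks would instead be drawn from that model using the estimation software's own simulation facilities.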

Outstanding Issues and Future Directions

Although this chapter has given ties prominence in modeling network change, research questions often extend to node changes (cf., the September 2017 issue of Network Science). Much of the research on network-behavior coevolution is driven by questions regarding “peer influence” on an outcome of interest, where modeling endogenous network change is necessary to control for selection into relationships based on the outcome (Steglich et al., 2010). By this same logic, however, when estimating how nodal attributes contribute to network selection, it is important to control for endogenous change in those attributes; otherwise, estimates of network change may be biased. At present, node change functionality is far less developed than network change modeling in the approaches presented. For instance, the SAOM (Steglich et al., 2010) is the most used coevolution model (see https://www.stats.ox.ac.uk/~snijders/siena/siena_applications.htm for a fairly comprehensive list), yet SAOMs have been restricted to modeling discrete behavior change measured as ordinal integers (though extensions to continuous measures are being developed). ERGMs and REMs do not have the functionality to estimate coevolutionary models. Discrete-time models may have particular difficulty modeling coevolution: ERGMs because of the assumption that changes between observations are independent, and REMs because the dependent variable is the next endogenous event (and thus the model is not amenable to endogenous changes in actor attributes as a function of the event history). However, all three approaches can incorporate dynamic covariate effects, which may provide a first approximation to coevolution. Given these limitations for ERGMs and REMs, continuous-time versions of these models may be a more promising route (e.g., Snijders & Koskinen, 2013; Stadtfeld & Geyer-Schulz, 2011).

This chapter has described methods for modeling networks in the form of either “event” or “state” relations. However, dyadic relationships in the natural world are characterized by both: events, such as interaction and self-disclosure, cumulate to form relationships, such as friendship. Modeling how micro-level interactions contribute to relationships promises great insight to network dynamics. For instance, what patterns and types of interactions lead to friendship formation and persistence over time? What interactions contribute to relationship dissolution (e.g., a breakdown of interactional reciprocity; Schaefer, 2012)? What leads to asymmetric friendship relations?
And is there an identifiable point at which relationships begin (and end)? Models have been developed to account for multiplex relations (Huitsing et al., 2012; Snijders et al., 2013); however, these are effectively restricted to state-level data. Models are needed that include forces at both event and state levels. Such an exercise is data demanding, requiring network data at both the event and state level over time. With the widespread accumulation of electronic trails, such data is within reach (Kitts, 2014; Bahulkar et al., 2017). Moreover, recent advances in modeling hierarchical network data may be generalizable to incorporate such temporal nesting data structures (Lazega & Snijders, 2016). As longitudinal network modeling becomes more widespread, it will be important to pay greater attention to the ways that tie formation differs from tie dissolution. Relatively few studies have modeled formation distinct from dissolution, but those that do typically report effects that differ in strength (Cheadle et al., 2013; Moody, 1999; van Workum et al., 2013; van Zalk et al.,  2010). These two processes are so often treated as complements of one another that it may not even be a modeling consideration that occurs to the researcher in the first place. Thus, solid theoretical treatment that establishes when dissolution and formation differ and should be treated as separate models is strongly needed in this arena. For instance, although homophily is one of the most common patterns of human association, its effects on relationship formation and durability, versus dissolution, are woefully understudied (McPherson et al., 2001). Yet, there are good reasons to believe that the processes that bring people together are distinct from those that keep people together or ultimately tear them apart. From a practical, modeling perspective, this is most readily accomplished using an SAOM or STERGM, the latter of which explicitly prompts users to consider this issue. 
As the relational event framework assumes that events are instantaneous, dissolution cannot be specified explicitly; however, recent work by Marcum and Butts (2015) demonstrates how the approach can be used to model spell data (e.g., with distinct effects for the onset and duration/termination of events).

New models are emerging regularly and more work is needed to better understand how to sift through the alternative modeling approaches available for seemingly the same data structure. Comparisons of ERGMs and SAOMs for state data are beginning to appear (Block et al., 2016, 2017; Leifeld & Cranmer, 2019), though on the whole they are far from conclusive. Part of the lack of consensus is attributable to questions over how a longitudinal model should be evaluated. While there is agreement that a good model should reproduce “other network properties not explicitly modeled,” there are differing opinions on whether models should articulate the “mechanisms” responsible for change and the value of “out of sample” prediction (Block et al., 2017). There is also debate over the proposed actor-oriented relational event model (Stadtfeld et al., 2017) and how it relates to existing models (Butts, 2017; see the Symposium on Dynamic Network Models in the 2017 issue of Sociological Methodology). Much work is needed to offer a more thorough understanding of the implications of different approaches to modeling event data (e.g., Butts vs. de Nooy vs. Stadtfeld). Certainly, no single “canned” solution exists to evaluate which model is most appropriate for a given dataset: researchers are urged to scrupulously vet their choice of model in the context of their research question, the underlying assumptions they are willing to make, and, naturally, the structure of their longitudinal network data.

One connection between the ERGM and SAOM frameworks is their implicit reliance on the Markov stationarity assumption: that future states of the network depend only on the current state and not on the sequence of states preceding it.
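In symbols, and writing X_t for the state of the network at observation t (notation assumed here for illustration, not taken from the chapter), the Markov assumption can be stated as:

```latex
\Pr\left(X_{t+1} = x \mid X_t, X_{t-1}, \ldots, X_1\right)
  = \Pr\left(X_{t+1} = x \mid X_t\right)
```

Conditioning on the full history on the left adds nothing beyond the current state on the right, which is what allows each successive wave to be modeled from the wave immediately before it.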
As Snijders (2011) points out, the Markov assumption facilitates use of continuous-time Markov processes in modeling network dynamics one tie change at a time. This specification follows directly from Holland and Leinhardt (1977), who demonstrated that such an assumption simplifies network dynamics to a series of basic events, even if observations are made at a time-aggregated level. In fact, the SAOM assumes such a continuous-time Markov process unfolding latently between observations, during which actors only consider the current state of the network when evaluating whether to add or remove a tie in the latent event process. Likewise, in both discrete-time and continuous-time ERGMs, cross-time dynamics are assumed to arise from structures found only in the immediate past (again, the current network state predicts the next). In both frameworks, when more than two waves of data exist, each successive wave is evaluated in this manner and integrated according to the Markov assumption. The real world, however, may involve long-term time dependence on the past or future states of the network, thus violating the Markov assumption. For instance, from an actor-oriented perspective, one may strategically time action to fill longstanding structural holes to take advantage of slowly emerging market conditions in a trade exchange network. Or, returning to our empirical example, an early established dominance hierarchy may restrict opportunities for competition between high-degree partners in the short term while they recover from battle: serial network observations made in the interim may appear to exhibit regime change under a Markov process and would be a poor predictor of the state of future interactions once those actors recuperate. While longer lagged effects can be introduced into a dynamic ERGM to incorporate information from the distant past into the model, this is still an open state of development for SAOMs. 
The relational event framework, however, has no such stationarity assumption. Conditional independence between events is assumed, allowing one to construct and fit event-to-event Markov transition models to data. This is a considerable advantage of using REM for event data: it supports the use of sufficient statistics that model how the next event (a future state) depends on a series of prior actions (perhaps transpiring relatively long ago in the event history). In the absence of event data, however, careful consideration should be given to stationarity when interpreting a dynamic model of network panel data.

In this chapter, we reviewed three commonly used approaches to modeling network dynamics. These methods are not single models per se. Rather, they are modeling frameworks that provide a host of possibilities for addressing research questions about network change. We included the relational event framework for event data, the stochastic actor-oriented framework for panel data from the actors’ perspectives, and the dynamic exponential random graph framework also for panel data but from a systems perspective. We focused on these frameworks in part because of their popularity in the literature and their accessibility in software packages, and in part because of their relatively nascent state of development. As such, new features, improvements, special cases, and software are frequently introduced into the field. As a result, some of the limitations and caveats discussed previously may already be on a path toward resolution. Our empirical example was drawn from relational data on dominance challenges in a herd of red deer collected by Appleby (1980). The results highlight how each framework can be employed to take advantage of unique modeling features, while sharing analogous aspects of the underlying structure. As we noted, effects estimated from these models are not quantitatively comparable: their assumptions, estimation routines, parameters, and even respective motivating theories are different.
Estimates from a relational event model, for example, have a proportional hazards interpretation at the level of the event (i.e., the effects modify the hazard of the next event). Those from a stochastic actor-oriented model have a utility interpretation at the level of the actor (e.g., effects modify an actor’s utility of making a local change). And, of course, these differ from dynamic exponential random graph models, which have an edge formation/dissolution interpretation in terms of probability (e.g., effects modify the odds of tie formation in the next state). Thus, these models are not always appropriate for all types of dynamic network data and research questions. Happily, these three frameworks are adaptable enough within their own space to encompass a variety of dynamic network research questions.
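To make the contrast between the three interpretations concrete, the following minimal Python sketch applies one hypothetical coefficient value under each reading. The function names, the value 0.5, and the two-option choice set are illustrative assumptions; none of them come from the chapter's fitted models or from the relevent, RSiena, or statnet packages.

```python
import math


def hazard_multiplier(theta, change_in_statistic=1.0):
    """Relational event reading: exp(theta) scales the hazard of the next event."""
    return math.exp(theta * change_in_statistic)


def saom_choice_probabilities(objective_values):
    """Actor-oriented reading: an actor picks among candidate changes with
    multinomial-logit probabilities based on the objective function values."""
    highest = max(objective_values)  # subtract the maximum for numerical stability
    weights = [math.exp(v - highest) for v in objective_values]
    total = sum(weights)
    return [w / total for w in weights]


def tie_probability(theta, change_in_statistic, other_log_odds=0.0):
    """Dynamic ERGM reading: theta shifts the conditional log-odds of a tie
    in the next state, holding the rest of the network fixed."""
    log_odds = other_log_odds + theta * change_in_statistic
    return 1.0 / (1.0 + math.exp(-log_odds))


theta = 0.5  # hypothetical estimate, used only for illustration
print("hazard multiplier:", hazard_multiplier(theta))
print("actor choice probabilities:", saom_choice_probabilities([0.0, theta]))
print("conditional tie probability:", tie_probability(theta, 1.0))
```

The same number yields a rate ratio, a choice probability, and a conditional tie probability, which is one way to see why estimates should not be compared across frameworks.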

Notes
1. The full set of configurations representing a structure is limited to those where at least one tie exists at t + 1 and a tie is present in each relation composing the structure at least once. For reciprocity, each relation can have three patterns of presence (t_1, t_2, t_{1,2}), which combine to make eight configurations (2^3). Combining isomorphic configurations and removing the configuration with only t_1 ties reduces the set to 5. For transitivity, there are three ties that can combine in 27 ways (3^3). None of these are isomorphic in a transitive triad, and thus only the set with three t_1 ties is removed, leaving 26 configurations available as sufficient statistics.
2. At the opposite extreme, as the interval between observations approaches 0, all dependence is across time, and may be best represented as a relational event process (Butts & Marcum, 2017).
3. We use the relevent, RSiena, and statnet packages to estimate models. We focus on the TERGM and STERGM rather than LERGM as they depart from the SAOM in notable ways worth illustrating (e.g., by being discrete time) and are readily estimable in R.
4. Currently, these triadic effects can only be disentangled in the SAOM using maximum likelihood estimation, and thus all SAOMs were estimated using maximum likelihood.
5. Lagged effects that span multiple time periods are also possible (see Leifeld & Cranmer, 2019).

References
adams, j., & Schaefer, D. R. (2016). How initial prevalence moderates network-based smoking change: Estimating contextual effects with stochastic actor oriented models. Journal of Health and Social Behavior, 57, 22–38.
Appleby, M. C. (1980). Social rank and food access in red deer stags. Animal Behaviour, 74(3), 294–309.
Bahulkar, A., Szymanski, B., Chan, K., & Lizardo, O. (2017). Co-evolution of two networks representing different social relations in NetSense. Complex Networks & Their Applications V: Proceedings of the 5th International Workshop on Complex Networks and their Applications (COMPLEX NETWORKS), 693, 423–434.
Barzel, B., & Barabási, A. L. (2013). Universality in network dynamics. Nature Physics, 9(10), 673–681.
Berardo, R., & Scholz, J. T. (2010). Self-organizing policy networks: Risk, partner selection, and cooperation in estuaries. American Journal of Political Science, 54(3), 632–649.
Block, P., Hollway, J., Stadtfeld, C., Koskinen, J., & Snijders, T. (2019). “Predicting” after peeking into the future: Correcting a fundamental flaw in the SAOM–TERGM comparison of Leifeld and Cranmer (2019). arXiv:1911.01385.
Block, P., Koskinen, J., Hollway, J., Steglich, C., & Stadtfeld, C. (2017). Change we can believe in: Comparing longitudinal network models on consistency, interpretability and predictive power. Social Networks, 52, 180–191.
Block, P., Stadtfeld, C., & Snijders, T. A. B. (2016). Forms of dependence: Comparing SAOMs and ERGMs from basic principles. Sociological Methods & Research, 48, 202–239.
Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323(5916), 892–895.
Bott, E. (1957). Family and social network. London, UK: Tavistock Publications.
Brandes, U., Lerner, J., & Snijders, T. A. (2009, July). Networks evolving step by step: Statistical analysis of dyadic event data. Paper presented at the 2009 International Conference on Advances in Social Network Analysis and Mining. http://dx.doi.org/10.1109/ASONAM.2009.28
Butts, C. T. (2008). A relational event framework for social action. Sociological Methodology, 38, 155–200.
Butts, C. T. (2017). Actor orientation and relational event models. Sociological Methodology, 47, 47–56.
Butts, C. T., & Marcum, C. S. (2017). A relational event approach to modeling behavioral dynamics. In A. Pilny & M. S. Poole (Eds.), Group processes: Data-driven computational approaches (pp. 51–92). Switzerland: Springer International Publishing.
Chase, I. D. (1980). Social process and hierarchy formation in small groups: A comparative perspective. American Sociological Review, 45, 905–924.
Cheadle, J. E., Stevens, M., Williams, D. T., & Goosby, B. J. (2013). The differential contributions of teen drinking homophily to new and existing friendships: An empirical assessment of assortative and proximity selection mechanisms. Social Science Research, 42, 1297–1310.
Czarna, A. Z., Leifeld, P., Śmieja, M., Dufner, M., & Salovey, P. (2016). Do narcissism and emotional intelligence win us friends? Modeling dynamics of peer popularity using inferential network analysis. Personality and Social Psychology Bulletin, 42(11), 1588–1599.

De Nooy, W. (2011). Networks of action and events over time. A multilevel discrete-time event history model for longitudinal network data. Social Networks, 33(1), 31–40.
Delaney, K. P., Rosenberg, E. S., Kramer, M. R., Waller, L. A., & Sullivan, P. S. (2015). Optimizing human immunodeficiency virus testing interventions for men who have sex with men in the United States: A modeling study. Open Forum Infectious Diseases, 2(4), ofv153.
Desmarais, B. A., & Cranmer, S. J. (2012). Micro-level interpretation of exponential random graph models with application to estuary networks. Policy Studies Journal, 40(3), 402–434.
DuBois, C., Butts, C. T., McFarland, D., & Smyth, P. (2013). Hierarchical models for relational event sequences. Journal of Mathematical Psychology, 57, 297–309.
DuBois, C., Butts, C. T., & Smyth, P. (2013). Stochastic blockmodeling of relational event dynamics. Journal of Machine Learning Research, 31, 238–246.
Ellwardt, L., Steglich, C., & Wittek, R. (2012). The co-evolution of gossip and friendship in workplace social networks. Social Networks, 34(4), 623–633.
Elmer, T., Boda, Z., & Stadtfeld, C. (2017). The co-evolution of emotional well-being with weak and strong friendship ties. Network Science, 5, 278–307.
Feld, S. L. (1981). The focused organization of social ties. American Journal of Sociology, 86(5), 1015–1035.
Ferligoj, A., Kronegger, L., Mali, F., Snijders, T. A. B., & Doreian, P. (2015). Scientific collaboration dynamics in a national scientific system. Scientometrics, 104(3), 985–1012.
Festinger, L., Back, K. W., & Schachter, S. (1950). Social pressures in informal groups: A study of human factors in housing (Vol. 3). Palo Alto: Stanford University Press.
Fischer, C. S. (1982). What do we mean by “friend”? An inductive study. Social Networks, 3(4), 287–306.
Freeman, L. C., Freeman, S. C., & Romney, A. K. (1992). The implications of social structure for dominance hierarchies in red deer, Cervus elaphus L. Animal Behaviour, 44, 239–245.
Friedkin, N. E. (1998). A structural theory of social influence. Cambridge, UK: Cambridge University Press.
Gibson, D. R. (2003). Participation shifts: Order and differentiation in group conversation. Social Forces, 81, 1335–1381.
Goodreau, S. M., Kitts, J. A., & Morris, M. (2009). Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks. Demography, 46, 103–125.
Greenan, C. C. (2015). Diffusion of innovations in dynamic networks. Journal of the Royal Statistical Society: Series A (Statistics in Society), 178(1), 147–166.
Haas, S. A., & Schaefer, D. R. (2014). With a little help from my friends? Asymmetrical social influence on adolescent smoking initiation and cessation. Journal of Health and Social Behavior, 55, 126–143.
Hanneke, S., Fu, W., & Xing, E. P. (2010). Discrete temporal models of social networks. Electronic Journal of Statistics, 4, 585–605.
Heider, F. (1946). Attitudes and cognitive organization. Journal of Psychology, 21, 107–112.
Holland, P. W., & Leinhardt, S. (1977). A dynamic model for social networks. Journal of Mathematical Sociology, 5(1), 5–20.
Hollway, J., Lomi, A., Pallotti, F., & Stadtfeld, C. (2016). Multilevel social spaces: The network dynamics of organizational fields. Network Science, 5, 187–212.
Huitsing, G., van Duijn, M. A. J., Snijders, T. A. B., Wang, P., Sainio, M., Salmivalli, C., & Veenstra, R. (2012). Univariate and multivariate models of positive and negative networks: Liking, disliking, and bully-victim relationships. Social Networks, 34(4), 645–657.

Hunter, D. R., Goodreau, S. M., & Handcock, M. S. (2008a). Goodness of fit of social network models. Journal of the American Statistical Association, 103(481), 248–258.
Hunter, D. R., Handcock, M. S., Butts, C. T., Goodreau, S. M., & Morris, M. (2008b). ergm: A package to fit, simulate and diagnose exponential-family models for networks. Journal of Statistical Software, 24(3).
Jack, S. L. (2005). The role, use and activation of strong and weak network ties: A qualitative analysis. Journal of Management Studies, 42(6), 1233–1259.
Jenness, S. M., Goodreau, S. M., Morris, M., & Cassels, S. (2016). Effectiveness of combination packages for HIV-1 prevention in Sub-Saharan Africa depends on partnership network structure. Sexually Transmitted Infections, 92(8), 619–624.
Khanna, A. S., Goodreau, S. M., Gorbach, P. M., Daar, E., & Little, S. J. (2014). Modeling the impact of post-diagnosis behavior change on HIV prevalence in Southern California men who have sex with men (MSM). AIDS and Behavior, 18(8), 1523–1531.
Kitts, J. A. (2014). Beyond networks in structural theories of exchange: Promises from computational social science. Advances in Group Processes, 31, 263–298.
Koskinen, J., Caimo, A., & Lomi, A. (2015). Simultaneous modeling of initial conditions and time heterogeneity in dynamic networks: An application to foreign direct investments. Network Science, 3(1), 58–77.
Krivitsky, P. N. (2012). Exponential-family random graph models for valued networks. Electronic Journal of Statistics, 6, 1100.
Krivitsky, P. N., & Goodreau, S. M. (2016). STERGM—Separable temporal ERGMs for modeling discrete relational dynamics with statnet. https://cran.r-project.org/web/packages/tergm/vignettes/STERGM.pdf
Krivitsky, P. N., & Handcock, M. S. (2014). A separable model for dynamic networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 29–46.
Krivitsky, P. N., & Morris, M. (2015). Inference for social network models from egocentrically-sampled data, with application to understanding persistent racial disparities in HIV prevalence in the US. National Institute for Applied Statistics Research Australia, University of Wollongong, Working Paper 05–15, 45.
Labun, A., Wittek, R., & Steglich, C. (2016). The co-evolution of power and friendship networks in an organization. Network Science, 4, 364–384.
Lazega, E., & Snijders, T. A. (Eds.). (2016). Multilevel network analysis for the social sciences: Theory, methods and applications (Vol. 12). Switzerland: Springer International Publishing.
Leifeld, P., & Cranmer, S. J. (2019). A theoretical and empirical comparison of the temporal exponential random graph model and the stochastic actor-oriented model. Network Science, 7, 20–51.
Leifeld, P., Cranmer, S. J., & Desmarais, B. A. (2017). btergm. Temporal exponential random graph models by bootstrapped pseudolikelihood. R package version 1.8.2.
Lerner, J., Indlekofer, N., Nick, B., & Brandes, U. (2013). Conditional independence in dynamic networks. Journal of Mathematical Psychology, 57(6), 275–283.
Lospinoso, J. A., & Snijders, T. A. B. (2019). Goodness of fit for stochastic actor-oriented models. Methodological Innovations, 12. https://doi.org/10.1177/2059799119884282
Lospinoso, J. A., Schweinberger, M., Snijders, T. A., & Ripley, R. M. (2011). Assessing and accounting for time heterogeneity in stochastic actor oriented models. Advances in Data Analysis and Classification, 5(2), 147–176.
Lubbers, M. J., Molina, J. L., Lerner, J., Brandes, U., Ávila, J., & McCarty, C. (2010). Longitudinal analysis of personal networks. The case of Argentinean migrants in Spain. Social Networks, 32(1), 91–104.
Lusher, D., Koskinen, J., & Robins, G. (2013). Exponential random graph models for social networks: Theory, methods, and applications. Cambridge, UK: Cambridge University Press.

Marcum, C. S., & Butts, C. T. (2015). Constructing and modifying sequence statistics for relevent using informR in R. Journal of Statistical Software, 64, 1–34.
McFarland, D. A., Moody, J., Diehl, D., Smith, J. A., & Thomas, R. J. (2014). Network ecology and adolescent social structure. American Sociological Review, 79(6), 1088–1121.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1), 415–444.
Moody, J. W. (1999). The structure of adolescent social relations: Modeling friendship in dynamic social settings (Doctoral dissertation). Retrieved from UMI: 9954682.
Morris, M., & Kretzschmar, M. (1995). Concurrent partnerships and transmission dynamics in networks. Social Networks, 17(3–4), 299–318.
Mousavi, R., & Gu, B. (2015). The effects of homophily in Twitter communication network of US House of Representatives: A dynamic network study. Available online at SSRN: http://dx.doi.org/10.2139/ssrn.2666052
Newcomb, T. M. (1961). The acquaintance process. New York, NY: Holt, Rinehart & Winston.
Niezink, N. M. D., & Snijders, T. A. B. (2017). Co-evolution of social networks and continuous actor attributes. Annals of Applied Statistics, 11, 1948–1973.
Papachristos, A. V., Hureau, D. M., & Braga, A. A. (2013). The corner and the crew: The influence of geography and social networks on gang violence. American Sociological Review, 78(3), 417–447.
Porter, M. A., & Gleeson, J. P. (2016). Dynamical systems on networks. Switzerland: Springer.
Prell, C., & Feng, K. (2016). The evolution of global trade and impacts on countries’ carbon trade imbalances. Social Networks, 46, 87–100.
Rapoport, A. (1949). Outline of a probabilistic approach to animal sociology i. Bulletin of Mathematical Biophysics, 11, 183–196.
Ripley, R. M., Snijders, T. A. B., Boda, Z., Vörös, A., & Preciado, P. (2020). Manual for RSiena. Oxford, UK: University of Oxford, Department of Statistics; Nuffield College.
Robins, G., & Pattison, P. (2001). Random graph models for temporal processes in social networks. Journal of Mathematical Sociology, 25(1), 5–41.
Robins, G., Pattison, P., Kalish, Y., & Lusher, D. (2007). An introduction to exponential random graph models for social networks. Social Networks, 29, 173–191.
Sampson, S. (1969). Crisis in a cloister (Doctoral dissertation). Cornell University.
Schaefer, D. R. (2012). Homophily through non-reciprocity: Results of an experiment. Social Forces, 90, 1271–1295.
Schaefer, D. R. (2018). A network analysis of factors leading adolescents to befriend substance-using peers. Journal of Quantitative Criminology, 34, 275–312.
Schaefer, D. R., Simpkins, S. D., Vest, A. E., & Price, C. D. (2011). The contribution of extracurricular activities to adolescent friendships: New insights through social network analysis. Developmental Psychology, 47, 1141–1152.
Silk, M. J., Croft, D. P., Delahay, R. J., Hodgson, D. J., Weber, N., Boots, M., & McDonald, R. A. (2017). The application of statistical network models in disease research. Methods in Ecology and Evolution, 26, 1026–1041.
Smith, J. A. (2012). Macrostructure from microstructure: Generating whole systems from ego networks. Sociological Methodology, 42(1), 155–205.
Smith, J. A. (2015). Global network inference from ego network samples: Testing a simulation approach. Journal of Mathematical Sociology, 39(2), 125–162.
Snijders, T. A. B. (1996). Stochastic actor-oriented models for network change. Journal of Mathematical Sociology, 21(1–2), 149–172.

Snijders, T. A. B. (2001). The statistical evaluation of social network dynamics. Sociological Methodology, 31(1), 1–33.
Snijders, T. A. B. (2005). Models for longitudinal network data. In P. Carrington, J. Scott, & S. Wasserman (Eds.), Models and methods in social network analysis (pp. 215–247). New York, NY: Cambridge University Press.
Snijders, T. A. B. (2011). Statistical models for social networks. Annual Review of Sociology, 37, 131–153.
Snijders, T. A. B. (2017). Stochastic actor-oriented models for network dynamics. Annual Review of Statistics and Its Application, 4, 343–363.
Snijders, T. A. B., & Koskinen, J. (2013). Longitudinal models. In D. Lusher & G. Robins (Eds.), Exponential random graph models for social networks: Theory, methods and applications (pp. 130–139). New York, NY: Cambridge University Press.
Snijders, T. A., Lomi, A., & Torló, V. J. (2013). A model for the multiplex dynamics of two-mode and one-mode networks, with an application to employment preference, friendship, and advice. Social Networks, 35(2), 265–276.
Snijders, T. A. B., van de Bunt, G., & Steglich, C. E. G. (2010). Introduction to stochastic actor-based models for network dynamics. Social Networks, 32(1), 44–60.
Stadtfeld, C., & Geyer-Schulz, A. (2011). Analyzing event stream dynamics in two-mode networks: An exploratory analysis of private communication in a question and answer community. Social Networks, 33(4), 258–272.
Stadtfeld, C., Hollway, J., & Block, P. (2017). Dynamic network actor models: Investigating coordination ties through time. Sociological Methodology, 47, 1–40.
Steglich, C. E. G., Snijders, T. A. B., & Pearson, M. (2010). Dynamic networks and behavior: Separating selection from influence. Sociological Methodology, 40, 329–393.
Suitor, J. J., Wellman, B., & Morgan, D. L. (1997). It’s about time: How, why, and when networks change. Social Networks, 19, 1–7.
Tita, G. E., & Radil, S. M. (2011). Spatializing the social networks of gangs to explore patterns of violence. Journal of Quantitative Criminology, 27(4), 521–545.
Uchino, B. N. (2004). Social support and physical health: Understanding the health consequences of relationships. New Haven, CT: Yale University Press.
van Workum, N., Scholte, R. H., Cillessen, A. H., Lodder, G., & Giletta, M. (2013). Selection, deselection, and socialization processes of happiness in adolescent friendship networks. Journal of Research on Adolescence, 23(3), 563–573.
van Zalk, W. M. H., Kerr, M., Branje, S. J., Stattin, H., & Meeus, W. H. (2010). It takes three: Selection, influence, and de-selection processes of depression in adolescent friendship networks. Developmental Psychology, 46(4), 927.
Veenstra, R., Dijkstra, J. K., Steglich, C., & Van Zalk, M. H. (2013). Network–behavior dynamics. Journal of Research on Adolescence, 23(3), 399–412.
Wang, P., Robins, G., & Pattison, P. (2009). PNet: Program for the simulation and estimation of exponential random graph models. Melbourne, Australia: Melbourne School of Psychological Sciences, The University of Melbourne.
Wang, P., Robins, G., Pattison, P., & Lazega, E. (2013). Exponential random graph models for multilevel networks. Social Networks, 35(1), 96–115.
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge, UK: Cambridge University Press.
Wasserman, S., & Pattison, P. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika, 61(3), 401–425.

Chapter 15

Causal Inference for Social Network Analysis

Kenneth A. Frank and Ran Xu*

Social networks can have powerful effects on a multitude of outcomes, ranging from health (Coker et al., 2002) to education (Frank, Muller & Mueller, 2013) to work (Burt, 2005; Granovetter, 2005). Networks may have especially important effects on changes in behaviors or beliefs because networks can deliver resources or convey norms through an immediate social context (Coleman, 1994). But at least three of the characteristics that give networks their potential effects on individual behaviors also create challenges in estimating those effects. First, people may assimilate the behavior of their network members or seek interactions with similar others. This potentially conflates the process of influence with the selection of network members based on homophily (birds of a feather flock together) (Leenders, 1995; McPherson, Smith-Lovin, & Cook, 2001). Second, a particular relationship can be especially powerful or unstable (Simmel, 1950) depending on the actions of a mutual third party. Thus, pairs of relations are not independent of one another. Third, because networks can be powerful forces in conveying and filtering the effects of social context, it can be challenging to differentiate the effects of networks from those of the shared context (Feld, 1981; Penuel et al., 2013).

Given the importance of network effects and the challenges in estimating them, there have been deep and historical debates about the relevant effects of networks (Doreian, 2001; Shalizi & Thomas, 2011; Wellman & Berkowitz, 1988). For example, Burt’s (1987) critique of Coleman, Katz, and Menzel (1966) triggered an enduring debate about whether influence operates through direct contacts or structural similarity (with Leenders [2002] re-expressing the debate in terms of direct versus indirect networks), and Reagans and Zuckerman (2008) in turn critiqued Burt on the grounds of potential omitted variable bias. Christakis and Fowler’s findings concerning the contagion of health behaviors and attributes, especially obesity, triggered extensive debate about the specification of the models to account for explanations based on the choice of network members, not their influences (see Lyons, 2011, and Cohen-Cole & Fletcher’s [2008a, 2008b] critique of Christakis & Fowler’s [2007, 2008] models of the contagion of obesity; see also Gelman, 2011, and Leenders, 1995). Most recently some have challenged the ability to identify network effects at all (Shalizi & Thomas, 2011).

In this chapter we will formally describe two fundamental processes associated with networks: the influence of network members on beliefs or behaviors and the selection of network members with whom to interact or establish a tie. Understanding how these processes are different but potentially intertwined helps us characterize the strength of potential threats to making inferences about networks as well as identify techniques that can be used to mitigate potential bias. The core of our enterprise is that causal inference in any science is about differentiating among alternative explanations. To discern among these explanations we need good theory (Wellman & Berkowitz, 1988), good data, and a model that can support scientific discourse about the mechanisms of an effect.

* These authors contributed equally to this work.

The Influence Process

We begin with the theoretical explanations for why influence might occur. One must ask deeply, why would an actor be influenced by the behaviors or beliefs of others in his or her network? What happens if he or she chooses not to be influenced? For example, an adolescent, Ashley, may conform to the mean behavior of those in her network to avoid losing emotional support or to align her behaviors with her social identity; to deviate would be to deny her identity as a member of a particular social category (e.g., Akerlof & Kranton, 2002). Common reasons for influence include conformity to norms, informational influence, imitation, identification, and competition (Burt, 2009; Deutsch & Gerard, 1955; Frank & Fahrbach, 1999; Petty & Cacioppo, 1986). But the theory of influence should be tailored to the particular context and behavior (Doreian, 2001). For example, fishermen might conform in their fishing practices to access the local knowledge and expertise of others (Frank, Maroulis, Belman & Kaplowitz, 2011). This sets up a social exchange of knowledge for conformity among members of a community who share similar problems and who benefit from coordinated action. Such an exchange would take on a different dynamic between fishermen in different communities whose local contexts differ and who are not mutually invested in a shared outcome.

Drawing on theory, one might also authentically challenge whether influence occurs at all. Why would an actor expecting to exit a social system respond to norms within that system if there are few consequences for failure to conform? For example, high school seniors may be less likely to conform to norms in their schools as they anticipate divergent social contexts in the near future, and alternatively certified teachers (e.g., Teach for America) expecting to leave a school after a short time may not conform to teaching norms within that school. Similarly, itinerant fishermen may not conform to the fishing norms of a particular locale in which they are fishing (Frank et al., 2007).

Given that the context may alter the extent of influence, the challenge, then, is to estimate network influence given the unusual features of network processes. For example, network effects may be estimated as large not because an adolescent is influenced by the behaviors of others in her network, but because the adolescent chooses to interact with others who engage in similar behaviors. Therefore, in this section we discuss the challenges generated by the potential for network influence and selection and some possible estimation approaches to mitigate those challenges.
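The concern that selection can masquerade as influence can be illustrated with a small simulation. This is a hedged sketch under invented assumptions (500 actors, 20 candidate partners, 3 friends, fixed behaviors), not an analysis from the chapter: nobody's behavior ever changes, yet when friends are chosen for similarity, an actor's behavior correlates strongly with the mean behavior of his or her friends.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ego_alter_correlation(homophily, n=500, candidates=20, friends=3, seed=7):
    """Correlate each actor's behavior with the mean behavior of chosen friends.

    Behaviors are drawn once and never updated, so there is no influence.
    With homophily=True each actor befriends the most similar candidates;
    otherwise friends are an arbitrary subset of the candidates.
    """
    rng = random.Random(seed)
    behavior = [rng.gauss(0.0, 1.0) for _ in range(n)]
    exposure = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        pool = rng.sample(others, candidates)
        if homophily:
            pool.sort(key=lambda j: abs(behavior[j] - behavior[i]))
        chosen = pool[:friends]
        exposure.append(mean([behavior[j] for j in chosen]))
    return pearson(behavior, exposure)


print("selection on similarity:", ego_alter_correlation(homophily=True))
print("arbitrary partners:     ", ego_alter_correlation(homophily=False))
```

A cross-sectional association of this kind therefore cannot by itself distinguish influence from selection, which motivates the designs discussed in this section.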

Randomized Experiments

One key approach to minimizing bias in the estimation of any effect is to employ randomization. Applied to network studies, Sacerdote (2000) utilized the random assignment of students to roommates to identify possible influences of roommates on academic and social behaviors, finding positive effects of roommates on subjects’ grade point average and on decisions to join social groups such as fraternities. Similarly, Kremer and Levy (2008) found that males who were randomly assigned to roommates who drank alcohol prior to college obtained a lower grade point average than those assigned to nondrinking roommates. In each case, given the random assignment of people to networks, there is no reason to believe that there will be a priori differences between those embedded in different networks. Correspondingly, any final differences can be attributed to the network effect rather than pre-existing factors. Of course, there may be differences simply due to the happenstance of sampling, but sampling variability is reflected in conventional statistical procedures (e.g., standard error or confidence interval).

A second type of experiment preserves subjects’ pre-existing social contacts but randomly assigns subjects to treatment (intervention) conditions. For example, to study peers’ influence on product adoption, Aral and Walker (2012) randomly manipulated whether Facebook users received notifications that their friends had adopted the product, finding that older and married individuals were less susceptible to influence. In a similar study Kramer, Guillory, and Hancock (2014) manipulated the positive and negative expressions Facebook users received from their friends and showed that emotions are contagious via social networks. For a more thorough review, see VanderWeele and An (2013).

While randomized experiments provide a sound basis for inference, there are several limitations to conducting randomized experiments to study network processes. First, randomized experiments can be difficult to conduct because of ethical concerns, logistical challenges, or constraints on resources (Cook, 2002, 2003; Rubin, 1974). Second, concern about violation of the stable unit treatment value assumption (SUTVA), often interpreted as no spillover effects, can be especially great for studies of network effects that presumably focus on the phenomena of transfer from one person to another (e.g., Sun et al., 2013).¹ A third concern is that most experiments assume that manipulated ties are static, when in fact most ties are continuously dynamic as people deliberately seek to create new interactions and sever existing ones. For example, Carrell, Sacerdote, and West (2013) found a negative treatment effect for the students they intended to help because the students avoided the peers with whom the designed group intended them to interact and instead formed more homogeneous subgroups. Finally, virtually all human randomized experiments are conducted on nonrandom populations. Thus, what one gains in internal validity by employing randomization one might lose in external validity or representativeness by selecting a specific sample (Frank, Maroulis, Duong & Kelcey, 2013).
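The logic of random assignment described in this section can be sketched with a short simulation. The setup is loosely patterned on the roommate designs, but every number here (sample size, effect of -0.2 grade points, noise levels) is an invented assumption rather than a published estimate. Because assignment is independent of prior ability, a simple difference in means recovers the assumed effect up to sampling variability, which the standard error summarizes.

```python
import math
import random
import statistics


def difference_in_means(treated, control):
    """Return the difference in group means and its standard error."""
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff, se


def simulate_roommate_experiment(n=2000, true_effect=-0.2, seed=1):
    """Simulate grade point averages under random roommate assignment."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        prior_ability = rng.gauss(3.0, 0.5)  # pre-existing factor
        # Assignment is a coin flip, so it cannot depend on prior ability.
        has_drinking_roommate = rng.random() < 0.5
        effect = true_effect if has_drinking_roommate else 0.0
        gpa = prior_ability + effect + rng.gauss(0.0, 0.3)
        (treated if has_drinking_roommate else control).append(gpa)
    return difference_in_means(treated, control)


estimate, standard_error = simulate_roommate_experiment()
print("estimated effect:", estimate)
print("approximate 95% interval:",
      (estimate - 1.96 * standard_error, estimate + 1.96 * standard_error))
```

The sketch does not address the limitations listed above: it assumes no spillover between units and a roommate tie that stays fixed after assignment.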


Observational Studies

Given the limitations of randomized experiments, observational studies are an important source of understanding about network influence. But in observational studies network effects may be especially difficult to identify because inferred influence or contagion (Besag, 1974; Doreian, 1980; Friedkin, 1998; Oetting & Donnermeyer, 1998; Ord, 1975) could instead be due to selection of network partners, for example, based on homophily of race, gender, or specific behaviors (Lazarsfeld & Merton, 1954; Matsueda & Anderson, 1998; McPherson et al., 2001). Similarities in behaviors can also be attributed to selection of social setting, as people with prior similarities can select themselves into common social settings (Feld, 1981; Kalmijn & Flap, 2001), and it is the settings that have the effects on behavior, not the exposure to others in those settings. Manski (1993) also describes simultaneous influence, which he refers to as the “reflection problem”: does the mirror image cause the person’s movements or reflect them? Manski concludes that identification is difficult, if possible at all, when there is simultaneous influence, requiring strong assumptions to make statistical and causal inferences (see Online Appendix A for a formal representation of the reflection problem).

Our first response to concerns about inferring network effects is to emphasize the importance of longitudinal data to disentangle network influence from selection. Use of longitudinal data, in which one controls for prior behaviors of subjects, has been shown to approximate estimates from randomized experiments in a growing number of studies and settings (e.g., Bifulco, 2012; Cook, Shadish, & Wong, 2008; Steiner et al., 2010). Using longitudinal data, one might specify the influence on an actor at a given time point as a function of the actor’s interactions with others in a preceding interval and the others’ prior behaviors as well as the actor’s own prior behaviors.
For example, let \(\text{delinquency}_{it}\) represent the extent to which adolescent i engaged in delinquency behaviors at time t. This can be modeled as

\[
\text{delinquency}_{it} = \beta_0 + \beta_1\,(\text{delinquency behaviors at time } t-1 \text{ of others } i \text{ interacted with between } t-1 \text{ and } t)_i + \beta_2\,\text{delinquency behaviors}_{it-1} + e_{it}, \tag{1}
\]

where the error terms (\(e_{it}\)) are assumed independently distributed, \(N(0, \sigma^2)\). The term “delinquency behaviors at time t – 1 of others i interacted with between t – 1 and t” is the network exposure term. It can be simply the mean or sum of the behaviors of those with whom adolescent i interacted between t – 1 and t. Using the mean as an example, if adolescent Ashley interacted with Kim and Sam in the last year and Kim engaged in 5 acts of delinquency last year and Sam engaged in 10, then Ashley is exposed to a norm of 7.5 ((5 + 10) / 2) through her network (see Note 2). Correspondingly, the term \(\beta_1\) indicates the normative influence of others on adolescent i. If \(\beta_1\) is positive, then the more the members of Ashley’s network engaged in delinquency behaviors, the greater her delinquency behaviors relative to her prior behaviors.

Note that the influence model allows one to infer influence based not on self-report (which might be fraught with bias due to socially desirable responses or perception bias) but on the change in behaviors in the direction of those with whom one interacts (Friedkin, 1998). As such, it is optimally estimated entirely from observational data (Frank, Muller, Schiller, Riegle-Crumb, Strassman-Mueller, Crosnoe, & Pearson, 2008), but it can also be estimated from direct self-reports on beliefs and behaviors without asking the respondent to infer sources of influence. Note also that defining the exposure term in terms of prior beliefs eliminates the correlation of errors and observed variables that are simultaneously measured. Manski (1993) himself recognized much of the potential of longitudinal data to mitigate the reflection problem by accounting for the dynamics of reciprocated influence (see also Sims [1980] for the use of dynamic models and longitudinal data in vector autoregression).

Now consider the selection counterargument to influence in light of the model. In particular, one might argue that Ashley has friends who are delinquent because of her prior tendency to be delinquent, not because she was influenced by them. But this prior tendency is represented through Ashley’s prior behavior and is controlled in the model. In its simplest form, if delinquency behavior at time t – 1 takes a value of 1 for the presence of such behavior and 0 for the absence, then inclusion of prior delinquency behavior in the model amounts to estimating the network effect relative to those who engaged in similar prior behavior. The model as it is presented does assume a linear effect of prior behaviors, but this can be modified by considering nonlinear forms (e.g., taking the log of prior behaviors) as well as considering the interaction of prior behavior and network members (friends may have more influence on those who are already strongly inclined toward the behavior; Frank, Zhao, Penuel, Ellefson, & Porter, 2011).

Even with longitudinal data there are challenges to estimating network influence.
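Returning to the network exposure term in model (1), a minimal sketch of how it might be computed is given here. The function name `network_exposure`, the dictionary layout, and the choice to assign zero exposure to an actor with no alters are assumptions made for illustration; only the Ashley, Kim, and Sam values come from the example in the text.

```python
from statistics import mean

def network_exposure(ego, ties, prior_behavior, how="mean"):
    """Exposure of `ego` to the prior (t-1) behavior of the alters
    ego interacted with between t-1 and t.

    ties: dict mapping each ego to a list of alters (hypothetical layout).
    prior_behavior: dict mapping each actor to behavior measured at t-1.
    """
    alters = ties.get(ego, [])
    if not alters:
        return 0.0  # assumption: an actor with no alters has zero exposure
    values = [prior_behavior[a] for a in alters]
    return mean(values) if how == "mean" else sum(values)

# Values from the Ashley example in the text.
ties = {"Ashley": ["Kim", "Sam"]}
prior_behavior = {"Kim": 5, "Sam": 10}

ashley_mean = network_exposure("Ashley", ties, prior_behavior)         # 7.5
ashley_sum = network_exposure("Ashley", ties, prior_behavior, "sum")   # 15
print(ashley_mean, ashley_sum)
```

The resulting value would enter model (1) as the regressor multiplied by \(\beta_1\), alongside the actor's own prior behavior.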
Shalizi and Thomas (2011), for example, express the challenges to estimation of network effects as an omitted variable problem (see Note 3). Even with longitudinal data, if there is an omitted variable that codetermines both individual outcomes and network ties, contagion or influence effects are generally unidentifiable. Consider Figure 15.1, in which selection (homophily) is confounded with influence and represented as an “omitted variable bias” problem (see Note 4). To understand the sources of bias in Figure 15.1, we have to know that to obtain consistent and unbiased estimates of influence using ordinary least squares (OLS) (in which the long-run expected value of the estimate equals the population value), the unobserved errors must be uncorrelated with the observed variables. In this case, if either the idiosyncratic error \(e_{it}\) or omitted variable risk-taking tendency is correlated with observed variables, the estimates will be inconsistent. For now we only focus on the unobserved variable risk-taking tendency and assume that conditional on the unobserved variable the \(e_{it}\) are not correlated with the observed variables in the model (see Note 5).

Figure 15.1  Influence and selection are confounded through the unobserved variable risk-taking tendency. [Diagram not reproduced. Nodes: risk-taking tendency of i and of j (unobserved); delinquency of i and of j at t – 1 and delinquency of i at t (observed); \(e_{it}\) (error). Arrows are labeled A, Bi, Bj, C, and D.]

Assuming that delinquency is a function of unobserved risk-taking tendency (arrow Bi), and when there is homophily-based selection that operates through this unobserved variable, (1) person i will select person j who is similar on the unobserved risk-taking tendency (arrow A in the figure); (2) person j’s delinquency behavior is a function of person j’s risk-taking tendency (arrow Bj), which is similar to person i’s risk-taking tendency through selection; (3) because of (1) and (2), the risk-taking tendency for person i will be correlated with person j’s delinquency behavior (arrow C in the figure). As risk-taking tendency is unobserved, this violates the key assumption of OLS so that estimates will be inconsistent, and the contagion (exposure) effect is unidentifiable. For an analogous algebraic argument see Online Appendix A.

Given our description of the sources of bias in OLS estimation of influence in terms of omitted variables, we can describe situations in which OLS estimates will not be biased. Each of these leverages the formal model as in (1) to articulate alternative explanations and then control using appropriate measures.

First, consider that the omitted variable affects only the selection of network members, but not the attribute of the individuals as in Figure 15.2a. For example, this would be the case if people select friends based on similar risk taking, but risk taking did not affect delinquency (arrows Bi and Bj are removed from Figure 15.1). Thus, the variables affecting the behavioral outcome of interest are all observed. As a result, the unobserved errors are no longer correlated with observed variables so that the contagion effect is identified. Note that in this case there still could be strong homophily in the selection process, and homophily can still depend on the unobserved variable, but in this case selection is no longer confounded with influence as the unobserved factors that affect selection do not affect influence.

Figure 15.2  Situations under which the influence effect can be identified using OLS. [Diagram not reproduced. Panels (a) and (b); node labels include \(c_i\), \(X_i\), \(X_j\), \(Y_{it-1}\), \(Y_{jt-1}\), \(Y_{it}\), and \(e_{it}\); the key distinguishes unobserved variables, observed variables, and error.]

Second, there may be a variable that codetermines selection and influence, but the variable is observed and controlled in the influence model (see Figure 15.2b). Since the variable is observed and controlled, the OLS assumptions are satisfied, and thus the contagion effect is identified. In this case there still could be strong homophily in the selection process, and homophily may depend on the same variable that appears in the influence model. But the homophily does not affect estimation because it is accounted for by controlling for what is common in selection and influence. This argument also extends to observed variables, such as pretests (Shadish et al., 2008; Steiner et al., 2010; Bifulco, 2012) that serve as proxies for an unobserved variable, satisfying the strong assumption of ignorability.

Simulation Example of Identification Using OLS

While the previous arguments are grounded in the assumptions for estimating model (1), here we give a simulated example where contagion effects are identified. Specifically, we simulate data in which a common trait codetermines selection and influence, but the common trait is observed. In this case there is strong homophily in network selection but all variables affecting the behavioral outcome are observed as in Figure 15.2b (see Online Appendix B for details of the simulation). Using simulated data from four time points for a network of 40 nodes with a density of ties of 0.2, we estimated the influence model in (1) using OLS. Figure 15.3 shows the mean bias for estimates of the previous behavior and network exposure terms. There is essentially zero bias using OLS when controlling for the time invariant trait that codetermines influence and selection, despite the fact that there is strong homophily in the selection process.

Figure 15.3  Simulation examples where a common trait that codetermines selection and influence is observed. Both the prior term and contagion effects in the influence model are identified and unbiased using OLS. The blue line represents the bias of mean estimates where each point is a result of 500 simulations. Dashed lines represent the 95% confidence intervals. [Plot not reproduced. Panels: “Bias of Network Exposure” and “Bias of Prior”; vertical axes: Bias (–0.4 to 0.4); horizontal axes: true coefficient of network exposure and true coefficient of prior (0.0 to 0.8).]
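The identification argument behind this simulation can be illustrated with a much smaller sketch. This is not the authors' simulation from Online Appendix B (which uses a 40-node network over four time points); it is a deliberately simplified version with one alter per actor, and the data-generating process, parameter values, and function names are all assumptions. It contrasts OLS estimates of the exposure coefficient when the trait that drives both selection and behavior is controlled versus omitted.

```python
import random

def ols(X, y):
    """OLS coefficients via the normal equations (X'X)b = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def simulate(n=5000, b_exposure=0.3, b_prior=0.5, b_trait=1.0, seed=1):
    """Return (coefficients with trait controlled, coefficients with trait omitted)."""
    rng = random.Random(seed)
    rows_full, rows_omit, y = [], [], []
    for _ in range(n):
        trait_i = rng.gauss(0, 1)
        trait_j = trait_i + rng.gauss(0, 0.5)  # homophily: alter resembles ego on the trait
        prior_i = trait_i + rng.gauss(0, 1)    # ego's behavior at t-1
        prior_j = trait_j + rng.gauss(0, 1)    # alter's behavior at t-1 (the exposure)
        y.append(b_exposure * prior_j + b_prior * prior_i
                 + b_trait * trait_i + rng.gauss(0, 1))
        rows_full.append([1.0, prior_j, prior_i, trait_i])
        rows_omit.append([1.0, prior_j, prior_i])
    return ols(rows_full, y), ols(rows_omit, y)

controlled, omitted = simulate()
print("exposure estimate, trait controlled:", round(controlled[1], 3))
print("exposure estimate, trait omitted:   ", round(omitted[1], 3))
```

Under these assumed parameters the controlled estimate should sit close to the true value of 0.3, while the estimate that omits the trait should be noticeably inflated, mirroring the contrast between Figure 15.2b and Figure 15.1.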

Furthermore, note that OLS recovered the true parameters given that the selection process was accounted for but not necessarily directly modeled (e.g., as in SIENA; Steglich, Snijders, & Pearson, 2010). This is because bias is induced by unobserved confounding variables, not by information that is accounted for in the model (such as captured in \(X\) and \(Y_{it-1}\)). We note here that our approach to obtain unbiased estimates heavily leverages the specification of timing in the model. The outcome is modeled as a function of the individual’s own prior characteristic as well as the prior characteristics of those with whom the individual interacted in the preceding time interval. OLS will not generally yield unbiased estimates for models of belief or behavior specified as a function of concurrent measurements (Doreian, 1980). This is part of the basis of criticism of Christakis and Fowler’s (2007, 2008) models of obesity, which include concurrent measurements on the right-hand side (e.g., Lyons, 2011; see Note 6).

There are three current techniques that attempt to reduce the bias in estimating contagion or network effects. Although each potentially leverages extra information in the data to reduce bias, none can claim to eliminate all sources of bias.

First is the instrumental variable (IV) method. Here an instrument must be identified such that it is correlated only with the endogenous explanatory variables such as the exposure term (arrow C in Figure 15.1) but not directly correlated with the outcome (arrow D in Figure 15.1 is removed). This assumption is known as the exclusion restriction. Using such an instrument, one can obtain unbiased and consistent estimates of the influence parameter (e.g., using a two-stage least squares method; see Wooldridge, 2010). Although there are considerable concerns about the use of instrumental variables for estimation (Wooldridge, 2010, p. 108), there are a handful of studies that leverage either substantive knowledge (Duncan, Haller, & Portes, 1968; Angrist & Lang, 2004; O’Malley et al., 2014; An, 2015; see Note 7) or specific network structure (Bramoullé, Djebbari, & Fortin, 2009; see Note 8) to identify instrumental variables. However, these methods require a strong theoretical argument regarding the assumption of no correlation with the omitted variable (and the outcome), which is essentially untestable. Furthermore, standard errors are severely inconsistent for weak instruments, which is especially a concern for small samples (Bound, Jaeger, & Baker, 1995; Wooldridge, 2010). As a possible extension (see Xu, 2018), econometricians have used multiple lagged premeasures as instruments for the premeasure immediately preceding a change score. For example, if one wanted to model the change in academic effort during the senior year of high school controlling for the change during the junior year, one could use the change measured during the sophomore year as an instrument for the change during the junior year. Such an approach could be applied to models of network influence, modeling change in behavior as a function of change in exposure, using prior states as instruments for the prior term.

Second, propensity score methods have been used to match actors based on their observed characteristics (Aral, Muchnik, & Sundararajan, 2009). By matching on such characteristics, one intends to remove bias due to observed characteristics, reduce the assumptions of linearity, and estimate differential network effects by propensity. (Fletcher [2010] uses a sophisticated matching approach combined with instrumental variables to leverage variation of age within grade levels of adolescents.)

Third, stochastic actor-oriented models (e.g., SIENA) repeatedly simulate influence and selection processes simultaneously. The key contribution here is that by modeling selection, the estimates of influence are conditional on the dynamics of the social structure. Details of this method can be found in Chapter 14 of this handbook.
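To make the first of these techniques concrete, the following sketch contrasts a naive OLS slope with an instrumental variable estimate when an unobserved confounder drives both exposure and outcome. With a single instrument and a single endogenous regressor, the two-stage least squares estimate reduces to the ratio cov(z, y) / cov(z, x), which is what the code uses. The data-generating process, the parameter values, and the function names are assumptions chosen for illustration, and the instrument is valid here only because the simulation builds in the exclusion restriction by construction.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)

def simulate_iv(n=5000, true_effect=0.3, seed=2):
    """Return (naive OLS slope, IV slope) for a confounded exposure."""
    rng = random.Random(seed)
    z, exposure, outcome = [], [], []
    for _ in range(n):
        instrument = rng.gauss(0, 1)   # shifts exposure, no direct path to the outcome
        confounder = rng.gauss(0, 1)   # unobserved; affects exposure and outcome
        x = instrument + confounder + rng.gauss(0, 1)
        y = true_effect * x + confounder + rng.gauss(0, 1)
        z.append(instrument)
        exposure.append(x)
        outcome.append(y)
    ols_slope = cov(exposure, outcome) / cov(exposure, exposure)
    iv_slope = cov(z, outcome) / cov(z, exposure)
    return ols_slope, iv_slope

ols_slope, iv_slope = simulate_iv()
print("naive OLS slope:", round(ols_slope, 3))
print("IV slope:       ", round(iv_slope, 3))
```

In real network data the exclusion restriction cannot be verified this way, which is why the text stresses the need for a strong theoretical argument for any proposed instrument.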

Critically, both propensity score techniques and stochastic actor-oriented models depend on conceptualization and measurement of confounding variables. For example, while propensity score approaches have become fairly popular in economic and public policy research, they cannot directly account for bias due to unobserved factors (Heckman, 2005; Morgan & Harding, 2006, p. 40; Rosenbaum, 2002, p. 297; Shadish, Cook, & Campbell, 2002, p. 164). Similarly, Steglich et al. (2010) note that SIENA estimation depends on inclusion of variables confounded with influence and selection (see also Shalizi & Thomas, 2011). Therefore, good science still depends on the specification and measurement of alternative explanations to network effects.

Finally, it is worth noting that there are some recently developed techniques that have potential to correctly identify network effects when there is an omitted variable that codetermines both individual outcomes and network ties. Specifically, Xu (2018) and Shalizi and McFowland (2018) leverage the fact that nodes are more likely to have ties and be near each other in the network space when they share latent homophily based on the omitted variable, and show that latent positions estimated from the latent space model (Hoff, Raftery, & Handcock, 2002) in the selection process can be a good proxy for the same omitted variable that is also present in the influence process. Controlling for these estimated latent positions in the traditional influence model (e.g., OLS) shows promise to obtain unbiased estimates of the true network effects.

Informing the Inevitable Debate by Quantifying the Robustness of Inferences

While use of timing, multiple pretests, instrumental variables, propensity scores, latent factors, and simulations can all improve estimation, each has its own set of assumptions that cannot be directly verified. Therefore, no inference made from network models will go undisputed, especially if the interpretation has policy implications. In response, we advocate using sensitivity analysis to quantify the conditions necessary to invalidate an inference (DiPrete & Gangl, 2004; Frank, 2000; Lin, Psaty, & Kronmal, 1998; Rosenbaum, 1983; 1986). For example, Frank and Min (2007) and Frank et al. (2013) quantify how much bias there must be in an estimated effect to invalidate an inference. Frank, Maroulis, Duong, & Kelcey (2013) then draw on Rubin’s causal model to interpret the percentage of bias to invalidate an inference in terms of replacing observed cases with null hypothesis cases. This leads to statements such as “To invalidate the inference, yy% of the estimate would have to be due to bias. Therefore, one would have to replace yy% of the observed cases with null hypothesis cases (in which network exposure had no effect on the outcome) to invalidate the inference of a network effect.” These techniques have already been applied to network models (Frank, Zhao, & Borman, 2004; Frank, Zhao, Penuel, Ellefson, & Porter, 2011). Furthermore, the calculations can be conducted using statistical significance as a threshold, an effect size of a particular value, or a combination as in a nonzero null hypothesis. The calculations are straightforward (available at http://www.konfound-it.com; see Xu et al., 2019) and examples can be found in Frank, Maroulis, Duong, & Kelcey (2013) and Dietz et al. (2015) as well as in Online Appendix C (see Note 9).

Importantly, the sensitivity analyses described earlier do not change the initial causal inference. What they do is quantify the language for discourse about causal inferences. For example, instead of debating whether there is a variable omitted from the model, discourse is in terms of how strong the omitted variable would have to be to invalidate an inference. Quantifying the robustness of causal inferences should help anchor discourse in theory and the full body of empirical results, ultimately making discourse more scientific.
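The kind of statement quoted above (“yy% of the estimate would have to be due to bias”) can be sketched with a short calculation. The formulation used here, one minus the ratio of the threshold to the estimate with the threshold taken as a critical value times the standard error, is one common way to express the percentage of bias needed to invalidate an inference. The default critical value of 1.96, the function name, and the example estimate and standard error are assumptions, and this is not the implementation behind the konfound-it tool.

```python
def percent_bias_to_invalidate(estimate, std_error, critical_value=1.96):
    """Share of an estimate that would have to be due to bias for the
    estimate to fall to the significance threshold.

    Assumes the threshold is critical_value * std_error (a large-sample
    two-sided test at the .05 level by default).
    """
    threshold = critical_value * std_error
    if abs(estimate) <= threshold:
        raise ValueError("estimate does not exceed the threshold; "
                         "there is no inference to invalidate")
    return 1.0 - threshold / abs(estimate)

# Hypothetical network exposure coefficient and standard error.
share = percent_bias_to_invalidate(estimate=0.30, std_error=0.10)
print(f"{share:.1%} of the estimate would have to be due to bias")
```

With these hypothetical values, roughly a third of the estimated exposure effect would have to be due to bias before the inference changed, and that fraction is the quantity the surrounding discussion proposes to debate.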

The Selection of Interaction Partners

While the influence model represents how actors change behaviors or beliefs in response to others around them, the selection model represents how actors choose with whom to interact. For example, Frank, Muller, and Mueller (2013) describe how adolescents identify new friends in school as a function of shared courses. The basic model for friendship nomination was

\[
\log\left(\frac{p(\text{friendship nomination}_{ij})}{1 - p(\text{friendship nomination}_{ij})}\right) = \theta_0 + \theta_1\,\text{course overlap}_{ij}, \tag{2}
\]

where \(\text{friendship nomination}_{ij}\) took a value of 1 if either adolescent i nominated adolescent j or j nominated i as a friend at time 2 (spring 1996), and 0 otherwise. The term course overlap might represent the number of courses adolescents i and j took together in academic year 1995–1996, perhaps weighted for size of course. Frank et al. also defined course overlap as membership in a common course-taking cluster (e.g., the subset of students who took English 3, Woodworking, and Desktop Computers together; see their Figure 15.1). The basic hypothesis was that adolescents would be more likely to make new friends with those with whom they took courses than with others in the schools, as the courses attracted students with similar interests and provided opportunities for exposure and interaction, as well as the presence of third parties who could broker introductions.

Common factors that have been considered important for selection include individual preference for similar others (e.g., homophily), network structure (e.g., reciprocity and transitivity, popularity [or preferential attachment]), and social context (e.g., proximity and social foci) (Rivera, Soderstrom, & Uzzi, 2010). And as was the case for influence models, researchers must engage theory deeply to consider why an actor would interact with another. For example, why is homophily such a strong driver of network ties? Does it decrease transaction costs, draw on triadic closure, or both? Is status gained or lost based on with whom one interacts? Such questions drive back to the fundamental goals of the actor and how those goals can be accomplished by engaging in specific network interactions (Frank, Muller, & Mueller, 2013).
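To show how model (2) translates course overlap into a probability of friendship nomination, the sketch below inverts the log-odds. The coefficient values are hypothetical and are not estimates from Frank, Muller, and Mueller (2013); the function name is likewise an assumption. The code only evaluates the model for given coefficients and does not estimate them.

```python
import math

def nomination_probability(theta0, theta1, course_overlap):
    """Probability of a friendship nomination implied by model (2):
    log(p / (1 - p)) = theta0 + theta1 * course_overlap."""
    log_odds = theta0 + theta1 * course_overlap
    return 1.0 / (1.0 + math.exp(-log_odds))

theta0, theta1 = -3.0, 0.5  # hypothetical coefficients for illustration only

for overlap in (0, 2, 4):
    p = nomination_probability(theta0, theta1, overlap)
    print(f"courses shared: {overlap}  p(nomination) = {p:.3f}")

# Each additional shared course multiplies the odds of nomination by exp(theta1).
odds_ratio_per_course = math.exp(theta1)
```

Under these assumed coefficients, the probability of nomination rises with each shared course, which is the pattern the basic hypothesis predicts.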

Estimation of Selection Models

There are dependencies in network data that compromise standard estimation approaches for models such as (2). In particular, maximum likelihood approaches such as might be commonly used for logistic regression do not apply because one cannot easily specify the likelihood as a function of independent observations. See Chapters 12 and 13 for discussions of the history and recent developments in the specification and estimation of models of network relations that account for the dependencies inhering in network structure. We believe that techniques that control for dependencies through random effects (Baerveldt, Van Duijn, & van Hemert, 2004; Lazega & Van Duijn, 1997; Hoff, 2005) as well as latent spaces (Hoff, Raftery, & Handcock, 2002; Minhas, Hoff, & Ward, 2019; Sweet, Thomas, & Junker, 2013) have encouraging potential. But even after accounting for dependencies inhering in network structure, a key source of dependencies remains due to the influence process. For example, if adolescents interact with others who are similar in terms of homework effort, one might not know whether they chose to interact with similar others or they influenced one another to exert similar effort on homework (see Note 10).

Quantifying the Robustness of Inferences from Selection Models

As was the case for the influence models, concerns about causality can be mitigated by carefully conceptualizing the timing of the phenomenon and analyzing the data accordingly. In particular, if one controls for prior relationships, then factors affecting current relationships are associated with a change in relationship that is a stronger basis for a causal inference. For example, one might model adolescent friendship in 11th grade using homework efforts at the beginning of 11th grade and controlling for the presence of friendship in 10th grade. The control for the prior relationship can take the form of a simple variable representing the prior state or separating the sample by whether the pair had a relationship at time 1. The advantage of the latter approach is that it allows one to separately estimate the formation of new relationships versus the dissolution of existing relationships (see Steglich et al., 2010; Frank, Muller, & Mueller, 2013). One can also leverage the chronology to estimate effects of network structure at time 1 on network relations at time 2, as is done in relational event modeling (Butts, 2009; De Nooy, 2011).

As was the case for influence models, even after accounting for dependencies in network data using theory, sophisticated estimation techniques, and longitudinal data, there may still be concerns about alternative explanations or omitted variables. Following our argument for influence models, we advocate that researchers inform scientific discourse by quantifying the conditions necessary to invalidate an inference from a selection model. Here, one can use similar techniques as applied to the influence models to obtain rough calculations. More refined approaches would use simulation and analytic techniques used in estimation to calculate the percentage of relations that must be reassigned at random to invalidate the inference (Xu & Frank, forthcoming).
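The idea of separating the sample by whether a pair had a relationship at time 1 can be sketched as follows. The dyad records, field names, and helper functions are hypothetical. The code only partitions the dyads and reports raw formation and dissolution rates, whereas in practice a selection model such as (2) would be estimated within each subsample.

```python
def split_by_prior_tie(dyads):
    """Separate dyads by whether a tie existed at time 1."""
    no_prior = [d for d in dyads if d["tie_t1"] == 0]
    prior = [d for d in dyads if d["tie_t1"] == 1]
    return no_prior, prior

def share_with(dyads, key, value):
    """Proportion of dyads whose `key` equals `value` (NaN if the list is empty)."""
    if not dyads:
        return float("nan")
    return sum(1 for d in dyads if d[key] == value) / len(dyads)

# Hypothetical dyad-level records at two time points.
dyads = [
    {"pair": ("A", "B"), "tie_t1": 0, "tie_t2": 1},
    {"pair": ("A", "C"), "tie_t1": 0, "tie_t2": 0},
    {"pair": ("B", "C"), "tie_t1": 1, "tie_t2": 1},
    {"pair": ("B", "D"), "tie_t1": 1, "tie_t2": 0},
    {"pair": ("C", "D"), "tie_t1": 0, "tie_t2": 0},
]

no_prior, prior = split_by_prior_tie(dyads)
formation_rate = share_with(no_prior, "tie_t2", 1)   # new ties among pairs with no prior tie
dissolution_rate = share_with(prior, "tie_t2", 0)    # lost ties among pairs with a prior tie
print(formation_rate, dissolution_rate)
```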

Discussion

Most of the methodological advances in the analysis of social networks over the last 20 years have focused on the modeling of how actors select with whom to interact. These statistical techniques have drawn on simulation, statistical assumptions such as Markov chains, and latent spaces. While these are important, the sociological importance of networks depends in part on the capacity of networks to predict behavior or beliefs. Therefore, we have attended carefully to the bases for statistical and causal inferences from network influence models.

Our synthesis of the current state of causal inference from network models is encouraging. If one attends carefully to the complex dependencies and processes in network data, there are many circumstances under which network effects can be identified well enough to inform science. In particular, longitudinal data and carefully measured covariates can be used to mitigate the bias induced when influence and selection are both present. Dependencies in selection models can be controlled either through the terms of an exponential random graph model or with the consolidated terms of a latent space. Given these controls, one can estimate substantive effects such as those associated with homophily.

Actor-oriented models (e.g., SIENA) also have potential to help social scientists understand systems in which influence and selection occur. By extending into agent-based models, theories based on actor-oriented models can investigate systemic implications of network dynamics beyond the timeframe of observation and can explore the implications of underlying mechanisms for network-related behaviors (Hedström & Ylikoski, 2010). But the power of such models demands careful and intense consideration by the social scientist. First, one should begin by carefully theorizing each mechanism (selection or influence) independently and specifying the model accordingly. For example, one might focus on selection as a function of gender homophily and influence accentuated by seniority or age.
Second, actor-oriented models typically pertain to processes that play out over multiple time points. As such, one should carefully consider lagged effects and the factors that diminish behavior or disrupt social ties, as well as those that cause increases in behavior and the formation of social ties. Third, the simultaneous modeling of influence and selection does not ensure that dependencies and alternative explanations have been accounted for.

Regardless of the statistical techniques used to estimate network effects, the value of the enterprise will ultimately inhere in the theoretical understanding of the phenomenon (Platt, 1964; Box, 1976; Hedström & Swedberg, 1996). In a world of limited capacities, actors will not be able to engage in all possible ties or exert effort in all possible directions. Therefore, social scientists must ask deeply, why would a person seek ties with A over B? What is gained by engaging A, and what is lost by not engaging B? And what is gained by conforming to the norms of one group over those of another? The theory behind the answers may come from social capital (Frank, Muller, Schiller, Riegle-Crumb, Strassman-Mueller, Crosnoe, & Pearson, 2008), social-psychological balance (Heider, 1946), or transaction costs of economics (Zeng & Xie, 2008). Such theories have the capacity to direct data collection, integrate findings across studies, and direct further research.

As an example of a theory-driven model used in analytic sociology (Hedström & Bearman, 2009), Frank, Muller, Schiller, Riegle-Crumb, Strassman-Mueller, Crosnoe, & Pearson (2008) built influence models based on a theory of adolescent utility, in which the starting point is not why an adolescent chooses certain friends or responds to norms, but the fundamental goals of the adolescent (Akerlof & Kranton, 2002). In particular, following Akerlof and Kranton, Frank et al.
specified an adolescent’s utility as a balance of the desire to learn or advance in education versus the desire to fit in socially. Drawing on this utility, Frank et al. derived an influence model in which adolescents conform to the expectations of potential friends (clusters of students who took the same courses) as well as those who were already friends, thus representing multiple layers of social context (Hedström, Sandell, & Stern, 2000). The theory also accommodated classic descriptions of the pursuit of human capital and socioeconomic advantage. Frank et al. tested the theory with Adolescent Health and Academic Achievement (AHAA) data, which included measures of academic effort, course-taking data, and indicators of the value of human capital. Ultimately, Frank et al. concluded that clusters of course mates had important effects on girls’ future math course taking (but not on that of boys), even controlling for the alternative explanations attributed to direct friends, human capital, and socioeconomic background.

There are likely to be concerns about causal inference from network models even in the best of scenarios when one has developed strong theory, obtained data that aligned with the theory, and carefully specified and estimated the corresponding network model. This holds partly because network processes are complex and interwoven such that identification of network effects can be challenged. Furthermore, such challenges are especially likely when the results of research could have implications for policy and the allocation of resources (e.g., education or public health).

Given the almost inevitable debates about causal inferences (Abbott, 1998), especially from network models, we advocate a shift in the scientific process. Instead of the lone scientist or set of scientists (including coauthors) making meaning of network models, we expect a community of scientists and sometimes policymakers to contribute to the sense-making. As such, researchers must provide the information and frame the discourse for sense-making. As with other science, this includes a careful description of methods and data, making them as transparent as possible.
Beyond advocating for conventional transparency, we implore social scientists to discuss the robustness of inferences from network models in quantitative terms. For example, researchers can report the percentage of bias that must be present to invalidate an inference. The application of sensitivity analysis allows the research community to recognize the potential for alternative causes in a nondeterministic system while still inferring that there is enough evidence for action.

In the end, researchers will have to carefully consider both the transparency of models and the preferred properties of estimation (e.g., limiting bias). But this need not be zero sum. Sometimes the more transparent model can eliminate much of the bias, as we show in our analysis and simulations. Furthermore, one could interpret a conventional and transparent model by quantifying the robustness of any inference to bias as well as estimate a more sophisticated model. In some cases the robustness analysis can even be applied to the more sophisticated model (e.g., Frank, Sykes, Anagnostopoulos, Cannata, Chard, Krause, & McCrory, 2008), although one must take care in considering corresponding null hypotheses.

We have focused on concerns about the internal validity of inferences from network models in terms of dependencies and omitted variables. But there are other factors that can affect inferences from network models. As in all science, one must attend carefully to measurement issues. For example, in network data there has been consistent evidence of bias in recall (Marsden, 2005; Chapter 7 in this handbook), although this may be mitigated by using rosters or prompting with categories (e.g., departments in a firm; Henry, Lubell, & McCoy, 2012). There can also be bias in overgeneralization of triadic closure (Freeman et al., 1987), which can be mitigated by asking respondents about their behaviors directly,

and not perceptions of ties among others, especially those who are more than two steps removed in the network.

There are also concerns about the external validity of network effects. Technically, if one analyzes the network data on all the members of a system, then one has the population, and statistical inference is not necessary. But even in these cases one can consider the population as all realizations of the system of which one only has a sample at a single point in time. This is an example of Fisher’s idea of a superpopulation or Cronbach & Shapiro’s (1982) description of the broadest populations (Units, Treatments, Outcomes, Settings; UTOS). Therefore, issues of statistical inference and generalization apply. See Chapter 12 in this handbook for further discussion.

As in other settings, one must carefully consider how well the example analyzed represents other conditions. Networks among adolescents in the United States in the mid-1990s (pre-cell phone, Facebook, Twitter, and Instagram) may not represent current adolescents. In these cases one can explore variation in network effects within the study to anticipate the generality of the inferences. Furthermore, one can interpret the bias necessary to invalidate an inference in terms of the mechanism for sampling cases rather than the mechanisms affecting selection or influence (Frank & Min, 2007; Frank, Maroulis, Duong, & Kelcey, 2013). Such an analysis could also inform the interpretations of missing data.

Conclusion

Causal inference in any context is about differentiating among alternative explanations. Therefore, in social network analysis the competing mechanisms of influence and selection create inherent challenges for causal inference. Here we have identified two critical steps social network researchers can take to respond to the challenge of causal inference. First, we implore the research community to develop careful and comprehensive theory about the specifics of the phenomenon. For example, influence within a high school building is different than at a party on the weekend, and influence likely draws more on identity in adolescence than in adulthood. Drawing on these theories, we can develop hypotheses and clear and testable alternative explanations, which is at the heart of causal inference.

Second, social network analysts should gather and analyze longitudinal data and rich covariates whenever possible. This allows the researcher to focus on those factors that cause a change in behavior or in network ties. Studying change mitigates the potential for spurious inferences based on differences at baseline.

Using good theory and longitudinal data, researchers should be better able to transparently estimate models that can then be discussed by the research and other communities. It is in such communities that causal inference should be adjudicated, and it is through such communities that causal inference will be translated into policy or practice. It is with this discourse in the research and policy communities in mind that we have presented techniques for quantifying the robustness of causal inferences.

While newer techniques may improve estimation, theory and relevant (longitudinal) data should still be used in evaluating inferences when these techniques are applied (Wellman & Berkowitz, 1988). The same holds true for techniques on new frontiers such as those developed for multilayered (two-mode) networks or the synthesis of existing techniques with multilevel models accounting for context. What will endure is that good science will start and end with well-developed theory and data that can directly inform that theory.

Notes

1. One solution is to conduct randomized experiments on units at higher levels of aggregation which can be assumed to be independent (e.g., schools), but such experiments can become very expensive as treatments must be implemented at the organizational level and therefore degrees of freedom are defined at the organizational level (Slavin, 2008).
2. In this sense, the exposure term extends basic conceptualizations of centrality (e.g., Freeman, 1978) because the exposure term is a function of the characteristics of the members of a network, whereas centrality is a function only of the structure of the network.
3. The difficulty of identification caused by entanglement between contagion effects and other confounding variables (shared social context, ego’s and alter’s attributes, for example) can be easily framed as an omitted variable bias problem. What is less obvious is that the dilemma caused by the co-evolution of influence and selection processes can essentially be framed as an omitted variable bias problem as well.
4. Unobserved variables that affect only behavior but not selection cause estimation problems as well, but that is not the focus here.
5. Different exogeneity assumptions must hold for different estimation methods; for more details, see Wooldridge (2010).
6. The limitation applies equally to generalized linear models such as the logistic regression used by Christakis and Fowler (2007, 2008).
7. Duncan et al. (1968) used friend’s intelligence as an IV to study peer effects on occupational and educational aspirations. Angrist and Lang (2004) used the predicted number of transferred-in disadvantaged students as an IV to study effects on the academic performance of students in the receiving schools. O’Malley et al. (2014) used genetic alleles as IVs to estimate peer effects on weight status. An (2015) used friends’ family smoking status to estimate peer effects on smoking.
8. Bramoullé et al. (2009) argued that if there are intransitive triads, for example i → j → k but i and k are not connected, which assumes all the influence from i to k is through j and captured by j’s behavior, then j’s outcome can be used as an instrument for i to estimate contagion effects for k’s outcome since k is not directly influenced by i.
9. Frank (2000, 2008) also quantifies how strongly an omitted variable would have to be correlated with the exposure term and with the resultant behavior to invalidate an inference of an effect of network exposure on the resultant behavior. This leads to statements such as “an omitted variable would have to be correlated at xxx with both the exposure term and the outcome” to invalidate the inference. See the spreadsheet available at https://msu.edu/~kenfrank/KonFound-it!.xlsx. See also a recent extension of this approach specific to network models (VanderWeele, 2011). Xu (2016) has considered the sensitivity of network inferences in terms of the extent to which network relations would have to be “rewired” to invalidate an inference.
10. To address this, one could use third parties as instruments in models of selection. For example, in modeling an adolescent’s choice of network partners one could use the characteristics of potential friends’ parents as an instrument for characteristics of potential friends. This satisfies the exclusion restriction by reasoning that the parents’ characteristics are known only through what is observed in the potential friend (An, 2015).
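The structural condition described in note 8 can be made concrete with a small enumeration. The sketch below only lists the intransitive triads in a directed network, that is, ordered triples in which i sends a tie to j and j sends a tie to k while i and k share no tie in either direction. It does not perform the instrumental variables estimation itself, and the function name and the toy edge list are hypothetical.

```python
from itertools import permutations


def intransitive_triads(edges):
    """Return ordered triples (i, j, k) with i->j and j->k but no tie between i and k."""
    edge_set = set(edges)
    nodes = sorted({node for edge in edge_set for node in edge})
    triads = []
    for i, j, k in permutations(nodes, 3):
        has_path = (i, j) in edge_set and (j, k) in edge_set
        unconnected = (i, k) not in edge_set and (k, i) not in edge_set
        if has_path and unconnected:
            triads.append((i, j, k))
    return triads


# Toy directed network: a->b->c is closed by a->c, so it is not intransitive.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(intransitive_triads(edges))
```

Enumerating all ordered triples is cubic in the number of nodes, which is adequate for illustration but would need a neighbor-based search in a large network.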


References

Abbott, A. (1998). The causal devolution. Sociological Methods & Research, 27(2), 148–181.
Akerlof, G. A., & Kranton, R. E. (2002). Identity and schooling: Some lessons for the economics of education. Journal of Economic Literature, 40(4), 1167–1201.
An, W. (2015). Instrumental variables estimates of peer effects in social networks. Social Science Research, 50, 382–394.
Angrist, J. D., & Lang, K. (2004). Does school integration generate peer effects? Evidence from Boston’s Metco Program. American Economic Review, 94(5), 1613–1634.
Aral, S., Muchnik, L., & Sundararajan, A. (2009). Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544–21549.
Aral, S., & Walker, D. (2012). Identifying influential and susceptible members of social networks. Science, 337(6092), 337–341.
Baerveldt, C., Van Duijn, M. A. J., & van Hemert, D. A. (2004). Ethnic boundaries and personal choice: Assessing the influence of individual inclinations to choose intraethnic relationships on pupils’ networks. Social Networks, 26(1), 55–74.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B (Methodological), 36(2), 192–236.
Bifulco, R. (2012). Can nonexperimental estimates replicate estimates based on random assignment in evaluations of school choice? A within-study comparison. Journal of Policy Analysis and Management, 31(3), 729–751.
Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90(430), 443–450.
Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791–799.
Bramoullé, Y., Djebbari, H., & Fortin, B. (2009). Identification of peer effects through social networks. Journal of Econometrics, 150(1), 41–55.
Burt, R. S. (1987). Social contagion and innovation: Cohesion versus structural equivalence. American Journal of Sociology, 92(6), 1287–1335.
Burt, R. S. (2005). Brokerage and closure: An introduction to social capital. Oxford, UK: Oxford University Press.
Burt, R. S. (2009). Structural holes: The social structure of competition. Cambridge, MA: Harvard University Press.
Butts, C. T. (2008). A relational event framework for social action. Sociological Methodology, 38(1), 155–200.
Carrell, S. E., Sacerdote, B. I., & West, J. E. (2013). From natural variation to optimal policy? The importance of endogenous peer group formation. Econometrica, 81(3), 855–882.
Christakis, N. A., & Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357(4), 370–379.
Christakis, N. A., & Fowler, J. H. (2008). The collective dynamics of smoking in a large social network. New England Journal of Medicine, 358(21), 2249–2258.
Cohen-Cole, E., & Fletcher, J. M. (2008a). Detecting implausible social network effects in acne, height, and headaches: Longitudinal analysis. BMJ, 337, a2533.

Cohen-Cole, E., & Fletcher, J. M. (2008b). Is obesity contagious? Social networks vs. environmental factors in the obesity epidemic. Journal of Health Economics, 27(5), 1382–1387.
Coker, A. L., Smith, P. H., Thompson, M. P., McKeown, R. E., Bethea, L., & Davis, K. E. (2002). Social support protects against the negative effects of partner violence on mental health. Journal of Women’s Health & Gender-Based Medicine, 11(5), 465–476.
Coleman, J. S. (1994). Foundations of social theory. Cambridge, MA: Harvard University Press.
Coleman, J. S., Katz, E., & Menzel, H. (1966). Medical innovation. New York, NY: Bobbs-Merrill.
Cook, T. D. (2002). Randomized experiments in educational policy research: A critical examination of the reasons the educational evaluation community has offered for not doing them. Educational Evaluation and Policy Analysis, 24(3), 175–199.
Cook, T. D. (2003). Why have educational evaluators chosen not to do randomized experiments? Annals of the American Academy of Political and Social Science, 589(1), 114–149.
Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. Journal of Policy Analysis and Management, 27(4), 724–750.
Cronbach, L. J., & Shapiro, K. (1982). Designing evaluations of educational and social programs. San Francisco, CA: Jossey-Bass.
De Nooy, W. (2011). Networks of action and events over time. A multilevel discrete-time event history model for longitudinal network data. Social Networks, 33(1), 31–40.
Deutsch, M., & Gerard, H. B. (1955). A study of normative and informational social influences upon individual judgment. Journal of Abnormal and Social Psychology, 51(3), 629.
Dietz, T., Frank, K. A., Whitley, C. T., Kelly, J., & Kelly, R. (2015). Political influences on greenhouse gas emissions from US states. Proceedings of the National Academy of Sciences, 112(27), 8254–8259.
DiPrete, T. A., & Gangl, M. (2004). Assessing bias in the estimation of causal effects: Rosenbaum bounds on matching estimators and instrumental variables estimation with imperfect instruments. Sociological Methodology, 34(1), 271–310.
Doreian, P. (1980). Linear models with spatially distributed data: Spatial disturbances or spatial effects? Sociological Methods & Research, 9(1), 29–60.
Doreian, P. (2001). Causality in social network analysis. Sociological Methods & Research, 30(1), 81–114.
Duncan, O. D., Haller, A. O., & Portes, A. (1968). Peer influences on aspirations: A reinterpretation. American Journal of Sociology, 74(2), 119–137.
Feld, S. L. (1981). The focused organization of social ties. American Journal of Sociology, 86(5), 1015–1035.
Fletcher, J. M. (2010). Social interactions and smoking: Evidence using multiple student cohorts, instrumental variables, and school fixed effects. Health Economics, 19(4), 466–484.
Frank, K. A. (2000). Impact of a confounding variable on the inference of a regression coefficient. Sociological Methods & Research, 29(2), 147–194.
Frank, K. A., & Fahrbach, K. (1999). Organizational culture as a complex system: Balance and information in models of influence and selection. Organization Science, Special issue on Chaos and Complexity in Organization, 10(3), 253–277.
Frank, K. A., Maroulis, S., Belman, D., & Kaplowitz, M. D. (2011). The social embeddedness of natural resource extraction and use in small fishing communities. In W. W. Taylor, A. J. Lynch, & M. G. Schechter (Eds.), Sustainable fisheries: Multi-level approaches to a global problem (pp. 309–332). Bethesda, MD: American Fisheries Society.

Frank, K. A., Maroulis, S., Duong, M., & Kelcey, B. (2013). What would it take to change an inference?: Using Rubin’s causal model to interpret the robustness of causal inferences. Educational Evaluation and Policy Analysis, 35, 437–460.
Frank, K., & Min, K. S. (2007). Indices of robustness for sample representation. Sociological Methodology, 37(1), 349–392.
Frank, K. A., Mueller, K., Krause, A., Taylor, W., & Leonard, N. (2007). The intersection of global trade, social networks, and fisheries. In W. Taylor, M. G. Schechter, & L. Wolfson (Eds.), Globalization: Effects on fisheries resources (pp. 385–423). New York, NY: Cambridge University Press.
Frank, K. A., Muller, C., & Mueller, A. S. (2013). The embeddedness of adolescent friendship nominations: The formation of social capital in emergent network structures. American Journal of Sociology, 119(1), 216.
Frank, K. A., Muller, C., Schiller, K., Riegle-Crumb, C., Strassman-Muller, A., Crosnoe, R., & Pearson, J. (2008). The social dynamics of mathematics course taking in high school. American Journal of Sociology, 113(6), 1645–1696.
Frank, K. A., Sykes, G., Anagnostopoulos, D., Cannata, M., Chard, L., Krause, A., & McCrory, R. (2008). Extended influence: National board certified teachers as help providers. Educational Evaluation and Policy Analysis, 30(1), 3–30.
Frank, K. A., Zhao, Y., & Borman, K. (2004). Social capital and the diffusion of innovations within organizations: Application to the implementation of computer technology in schools. Sociology of Education, 77, 148–171.
Frank, K. A., Zhao, Y., Penuel, W. R., Ellefson, N. C., & Porter, S. (2011). Focus, fiddle and friends: Sources of knowledge to perform the complex task of teaching. Sociology of Education, 84(2), 137–156.
Freeman, L. C. (1978). Centrality in social networks: Conceptual clarification. Social Networks, 1(3), 215–239.
Freeman, L. C., Romney, A. K., & Freeman, S. C. (1987). Cognitive structure and informant accuracy. American Anthropologist, 89(2), 310–325.
Friedkin, N. E. (1998). A structural theory of social influence. New York, NY: Cambridge University Press.
Gelman, A. (2011). Controversy over the Christakis-Fowler findings on the contagion of obesity. http://themonkeycage.org/2011/06/1-lyonss-statistical-critiques-seem-reasonableto-me-there-could-well-be-something-important-that-im-missing-but-until-i-hearotherwise-for-example-in-a-convincing-reply-by-christakis-and-f/
Granovetter, M. (2005). The impact of social structure on economic outcomes. Journal of Economic Perspectives, 19(1), 33–50.
Heckman, J. (2005). The scientific model of causality. Sociological Methodology, 35, 1–99.
Hedström, P., & Bearman, P. (Eds.). (2009). The Oxford handbook of analytical sociology. Oxford, UK: Oxford University Press.
Hedström, P., Sandell, R., & Stern, C. (2000). Mesolevel networks and the diffusion of social movements: The case of the Swedish Social Democratic Party. American Journal of Sociology, 106(1), 145–172.
Hedström, P., & Swedberg, R. (1996). Social mechanisms. Acta Sociologica, 39(3), 281–308.
Hedström, P., & Ylikoski, P. (2010). Causal mechanisms in the social sciences. Annual Review of Sociology, 36, 49–67.
Henry, A. D., Lubell, M., & McCoy, M. (2012). Survey-based measurement of public management and policy networks. Journal of Policy Analysis and Management, 31(2), 432–452.

Heider, F. (1946). Attitudes and cognitive organization. Journal of Psychology, 21(1), 107–112.
Hoff, P. D. (2005). Bilinear mixed-effects models for dyadic data. Journal of the American Statistical Association, 100(469), 286–295.
Hoff, P. D., Raftery, A. E., & Handcock, M. S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460), 1090–1098.
Kalmijn, M., & Flap, H. (2001). Assortative meeting and mating: Unintended consequences of organized settings for partner choices. Social Forces, 79(4), 1289–1312.
Kiviet, J. F. (1995). On bias, inconsistency, and efficiency of various estimators in dynamic panel data models. Journal of Econometrics, 68(1), 53–78.
Kramer, A. D., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24), 8788–8790.
Kremer, M., & Levy, D. (2008). Peer effects and alcohol use among college students. Journal of Economic Perspectives, 22(3), 189.
Lazarsfeld, P. F., & Merton, R. K. (1954). Friendship as a social process: A substantive and methodological analysis. Freedom and Control in Modern Society, 18(1), 18–66.
Lazega, E., & van Duijn, M. (1997). Position in formal structure, personal characteristics and choices of advisors in a law firm: A logistic regression model for dyadic network data. Social Networks, 19, 375–397.
Leenders, R. T. A. (2002). Modeling social influence through network autocorrelation: Constructing the weight matrix. Social Networks, 24(1), 21–47.
Leenders, R. T. A. J. (1995). Structure and influence: Statistical models for the dynamics of actor attributes, network structure and their interdependence (Doctoral dissertation). University of Groningen.
Lin, D. Y., Psaty, B. M., & Kronmal, R. A. (1998). Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics, 54(3), 948–963.
Lyons, R. (2011). The spread of evidence-poor medicine via flawed social-network analysis. Statistics, Politics, and Policy, 2(1), 2151–7509.
Manski, C. F. (1993). Identification of endogenous social effects: The reflection problem. Review of Economic Studies, 60(3), 531–542.
Marsden, P. V. (2005). Recent developments in network measurement. Models and Methods in Social Network Analysis, 8, 30.
Matsueda, R. L., & Anderson, K. (1998). The dynamics of delinquent peers and delinquent behavior. Criminology, 36(2), 269–308.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.
Minhas, S., Hoff, P. D., & Ward, M. D. (2019). Inferential approaches for network analysis: AMEN for latent factor models. Political Analysis, 27(2), 208–222.
Morgan, S. L., & Harding, D. J. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods & Research, 35, 3–60.
Oetting, E. R., & Donnermeyer, J. F. (1998). Primary socialization theory: The etiology of drug use and deviance. I. Substance Use & Misuse, 33(4), 995–1026.
O’Malley, A. J., Elwert, F., Rosenquist, J. N., Zaslavsky, A. M., & Christakis, N. A. (2014). Estimating peer effects in longitudinal dyadic data using instrumental variables. Biometrics, 70(3), 506–515.
Ord, K. (1975). Estimation methods for models of spatial interaction. Journal of the American Statistical Association, 70(349), 120–126.

Penuel, W. R., Frank, K. A., Sun, M., Kim, C., & Singleton, C. (2013). The organization as a filter of institutional diffusion. Teachers College Record, 115(1), 306–339.
Petty, R. E., & Cacioppo, J. T. (1986). The elaboration likelihood model of persuasion (pp. 1–24). New York, NY: Springer.
Platt, J. R. (1964). Strong inference. Science, 146(3642), 347–353.
Reagans, R. E., & Zuckerman, E. W. (2008). Why knowledge does not equal power: The network redundancy trade-off. Industrial and Corporate Change, 17(5), 903–944.
Rivera, M. T., Soderstrom, S. B., & Uzzi, B. (2010). Dynamics of dyads in social networks: Assortative, relational, and proximity mechanisms. Annual Review of Sociology, 36, 91–115.
Rosenbaum, P. R. (1986). Dropping out of high school in the United States: An observational study. Journal of Educational and Behavioral Statistics, 11(3), 207–224.
Rosenbaum, P. (2002). Observational studies. New York, NY: Springer.
Rosenbaum, P. R., & Rubin, D. B. (1983). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society, Series B (Methodological), 45(2), 212–218.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688.
Sacerdote, B. (2000). Peer effects with random assignment: Results for Dartmouth roommates (No. w7469). Cambridge, MA: National Bureau of Economic Research.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. Journal of the American Statistical Association, 103(484), 1334–1344.
Shalizi, C. R., & McFowland, E., III. (2018). Controlling for latent homophily in social networks through inferring latent locations. arXiv preprint arXiv:1607.06565.
Shalizi, C. R., & Thomas, A. C. (2011). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2), 211–239.
Simmel, G. (1950). The sociology of Georg Simmel (Kurt H. Wolff, Ed.). Glencoe, IL: Free Press.
Sims, C. A. (1980). Macroeconomics and reality. Econometrica: Journal of the Econometric Society, 48(1), 1–48.
Slavin, R. E. (2008). Perspectives on evidence-based research in education: What works? Issues in synthesizing educational program evaluations. Educational Researcher, 37, 5–14.
Steglich, C., Snijders, T. A., & Pearson, M. (2010). Dynamic networks and behavior: Separating selection from influence. Sociological Methodology, 40(1), 329–393.
Steiner, P. M., Cook, T. D., Shadish, W. R., & Clark, M. H. (2010). The importance of covariate selection in controlling for selection bias in observational studies. Psychological Methods, 15(3), 250.
Sun, M., Penuel, W. R., Frank, K. A., Gallagher, H. A., & Youngs, P. (2013). Shaping professional development to promote the diffusion of instructional expertise among teachers. Educational Evaluation and Policy Analysis, 35(3), 0162373713482763.
Sweet, T. M., Thomas, A. C., & Junker, B. W. (2013). Hierarchical network models for education research: Hierarchical latent space models. Journal of Educational and Behavioral Statistics, 38(3), 295–318.
VanderWeele, T. J. (2011). Sensitivity analysis for contagion effects in social networks. Sociological Methods & Research, 40(2), 240–255.

VanderWeele, T. J., & An, W. (2013). Social networks and causal inference. In S. L. Morgan (Ed.), Handbook of causal analysis for social research (pp. 353–374). Dordrecht, Netherlands: Springer Netherlands.
Wellman, B., & Berkowitz, S. D. (1988). Social structures: A network approach (Vol. 2). Cambridge, UK: CUP Archive.
Wooldridge, J. (2010). Econometric analysis of cross section and panel data (2nd ed.). Cambridge, MA: MIT Press.
Xu, R. (2016). Sensitivity analysis for observation errors in social networks (Doctoral dissertation). Michigan State University.
Xu, R. (2018). Alternative estimation methods for identifying contagion effects in dynamic social networks: A latent-space adjusted approach. Social Networks, 54, 101–117.
Xu, R., & Frank, K. A. (forthcoming). Sensitivity analysis for network observations with applications to inferences of social influence effects. Network Science.
Xu, R., Frank, K. A., Maroulis, S. J., & Rosenberg, J. M. (2019). konfound: Command to quantify robustness of causal inferences. The Stata Journal, 19(3), 523–550.
Zeng, Z., & Xie, Y. (2008). A preference-opportunity-choice framework with applications to intergroup friendship. American Journal of Sociology, 114(3), 615.

Part III

Network Dimensions

Chapter 16

Case Studies in Network Community Detection

Saray Shai, Natalie Stanley, Clara Granell, Dane Taylor, and Peter J. Mucha

Most networks representing real-world systems display community structure, and many visualizations of networks lend themselves naturally to observations about group-level interactions in the network. The most commonly studied pattern of group-level interactions is the assortative community structure, where groups of nodes appear to be more connected to each other than to the rest of the network. Other types of community structures exist (and we will briefly mention them later), but the focus of this chapter is on assortative community structures. One might be reasonably curious about why this is such a common feature across a great variety of real networks, and even more intriguingly, what do the groups mean? Considering examples from different disciplines, one can observe that these groups (or communities) often have important roles in the organization of a network. For example, in a social network where nodes represent individuals and edges describe friendships between them, communities can correspond to groups of people with shared interests (Granovetter, 1973; McPherson, Smith-Lovin, & Cook,  2001; Moody & White,  2003; Zachary,  1977). In the graph of the World Wide Web, where a directed edge between web pages represents a hyperlink from one to the other, communities often correspond to webpages with related topics (Flake, Lawrence, & Giles, 2000). In brain networks of interconnected neurons or cortical areas, communities can correspond to specialized functional components such as visual and auditory systems (Sporns & Betzel, 2016). In networks representing interactions among proteins, communities can group together proteins that contribute to the same cellular function (Spirin & Mirny,  2003). 
Across each of these examples, the communities provide a new level of description of the network, and this intermediate (i.e., “mesoscopic”) perspective between the microscopic (nodes) and macroscopic (the whole network) domains proves to be very useful in understanding the essential functionality and organizational principles of a network.

One of the motivations to identify communities in applications is to learn about the relationship between structure and attributes such as age, location, interests, health, race, sex, and so on. In particular, when structure aligns with data attributes, it suggests that we may be onto something, and the observed alignment immediately starts planting seeds for hypotheses and future studies. However, congruent with most community detection algorithms, we refer to structural communities in which there is a prevalence of edges between nodes in the same community versus those between communities. Importantly, this notion is a topological property of the network and is agnostic to attributes. In principle, one can choose other definitions for what constitutes a community, and we note that for attributed (also called annotated) networks there is growing interest in developing community detection algorithms that utilize both structural and attribute information (Binkiewicz, Vogelstein, & Rohe, 2017; Bothorel et al., 2015; Newman & Clauset, 2016; Peel, Larremore, & Clauset, 2017; Yang, McAuley, & Leskovec, 2013). While here we do not explore these possibilities, and focus our attention on communities in the topological sense, it is important to note that there is often positive correlation between community structure and attribute information due to homophily (Aral, Muchnik, & Sundararajan, 2009; McPherson et al., 2001); that is, edges exist preferentially between nodes with similar attributes. Generally speaking, studying the interplay between attribute information and network structure is complicated due to confounding effects (Shalizi & Thomas, 2011).

Detecting communities in an automated manner is not a simple pursuit, first, because although the qualitative notions of communities may be intuitive, translating such ideas into an appropriate modeling framework can be challenging. 
In particular, various applications call for different notions of a community, each producing a different mesoscopic description of a network. Second, the computational complexity of community detection can be a fundamental issue; for example, the number of possible partitions of nodes into nonoverlapping groups is nonpolynomial in the size of the network (and allowing overlapping communities increases the number of possibilities), motivating important work on different heuristics for efficiently identifying communities. Such challenges make community detection one of the most complex—yet fascinating—areas of network science, with a huge and ever-increasing number of different algorithms available in the literature. We only indicate a few classes of community detection methods here, referring the reader to comprehensive community detection reviews by Porter, Onnela, and Mucha (2009); Fortunato (2010); and Fortunato and Hric (2016) (see also a recent review by Schaub et al. [2017] on the conceptual differences between different perspectives on community detection). While the ideas of community detection have been around in sociology for decades (see, e.g., the discussions in Coleman, 1964; Freeman, 2004; Moody & White, 2003), the field has benefited from significant contributions across numerous disciplines proposing a variety of methods and algorithms for automating community detection. Graph partitioning (e.g., Barnes,  1982; Fiedler,  1973; Kernighan & Lin,  1970; Mahoney, Orecchia, & Vishnoi, 2012) spans a large literature across computer science and mathematics, aiming to divide a network into a specified number of groups so that some selected quantity is optimized, such as the number of edges between the groups (i.e., cut size). 
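Two of the quantities mentioned above lend themselves to a brief illustration. The sketch below, written for a hypothetical toy network of two triangles joined by a single edge, counts the cut size of a given division and computes the number of possible partitions of n labeled nodes into nonoverlapping groups (the Bell number). The function names and the example network are assumptions made for illustration only.

```python
def cut_size(edges, group_of):
    """Count the edges whose endpoints are assigned to different groups."""
    return sum(1 for u, v in edges if group_of[u] != group_of[v])


def bell_number(n):
    """Number of partitions of n labeled nodes into nonoverlapping groups.

    Computed with the Bell triangle: each row starts with the last entry of
    the previous row, and each next entry adds the entry above to the left.
    """
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


# Hypothetical toy network: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
uneven = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B", 5: "B"}

print(cut_size(edges, by_triangle), cut_size(edges, uneven))
print([bell_number(n) for n in (5, 10, 20)])
```

Dividing the network along the two triangles cuts one edge, whereas the uneven division cuts two. The Bell numbers grow so quickly that exhaustive enumeration is impractical even for networks with a few dozen nodes, which is the motivation for the heuristics discussed in this chapter.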
Modularity maximization (Newman & Girvan, 2004), a different optimization approach for graph partitioning originating in the physics literature, aims to find the partition with the largest difference between the total weight of within-community edges and that expected under a null model, that is, a random-network model with selected properties. Modularity maximization typically leads to more balanced community sizes, can account for degree heterogeneity in the network, and does not require a priori specification of the number of communities. However, it is well known to suffer from a resolution limit (Fortunato & Barthelemy, 2007), and it is not at all clear how to best interpret the different numbers of communities that can be obtained by varying resolution parameters (Arenas, Fernandez, & Gomez, 2008b; Reichardt & Bornholdt, 2006).

Statistical inference (e.g., Ball, Karrer, & Newman, 2011; Hastings, 2006; Karrer & Newman, 2011; Peixoto, 2013, 2014), arising from the statistics literature, typically aims to identify a parametrized generative model that describes the network (e.g., with maximum likelihood). For example, stochastic blockmodels (Fienberg & Wasserman, 1981; Holland, Laskey, & Leinhardt, 1983; Snijders & Nowicki, 1997) assume for a given partition that the edge probability between nodes depends on their community memberships (see more details in a recent note by Abbe [2017] on the current developments in community detection in the context of stochastic blockmodels). Note that according to this description, a community can be any group of nodes that interact in a stochastically equivalent way with other groups of nodes, thus allowing the detection of nonassortative communities such as core-periphery structures and bipartite blocks.

Cut size, modularity, and likelihood all define objective functions that measure the “goodness” of the partitions (or, in some cases, sets of communities that may or may not cover the network) and are generally computationally hard problems: in most cases finding the conclusively best community assignment is effectively equivalent to computing a nonvanishing fraction of all possibilities, which grows exponentially with system size. 
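As a concrete instance of one of these objective functions, the sketch below computes modularity for an unweighted, undirected network. It assumes the null model in which the expected number of edges between two nodes is proportional to the product of their degrees, and it includes an optional resolution parameter that multiplies the null-model term. The function name, the `resolution` argument, and the toy network are hypothetical choices for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of, resolution=1.0):
    """Modularity of a partition of an unweighted, undirected network.

    For each community c, adds (internal edges / m) minus
    resolution * (total degree of c / 2m) squared, where m is the edge count.
    """
    m = len(edges)
    internal = defaultdict(int)    # number of edges inside each community
    degree_sum = defaultdict(int)  # summed degree of each community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy network: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

In this example the division into the two triangles scores 5/14, while placing every node in a single community scores zero, since the observed and expected within-community edge weights then coincide.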
In the case of modularity maximization, this problem has been proved to be NP-complete (Brandes et al., 2008), and there are an exponential number of competitive local optima that often have a wide structural diversity among them (Good, de Montjoye, & Clauset, 2010). Indeed, this ruggedness in the partition landscape also appears in other objective functions as well, and one could overcome it in various ways (e.g., by using prior domain knowledge to constrain the optimization; Sohn et al., 2011). Fortunately, many algorithms have been developed to efficiently provide good solutions in practice, including a variety of iterative (Blondel et al.,  2008; Kernighan & Lin,  1970; Peixoto,  2014), spectral (Barnes,  1982; Fiedler, 1973; Newman, 2006), and convex optimization (Ames & Vavasis, 2011, 2014; Cai et al., 2015; Chen, Sanghavi, & Xu, 2012; Oymak & Hassibi, 2011) methods. At the same time, numerous heuristics have been developed for community detection that do not necessarily optimize a global objective function but nonetheless have proven to be useful. These often fall into two categories: agglomerative methods, which are akin to hierarchical clustering (Hastie, Tibshirani, & Friedman, 2001), and divisive methods, such as iteratively partitioning a network by some local measure (such as edge betweenness; Girvan & Newman, 2002). A number of other community detection methods stem from analyses of dynamical systems on a network, including the Potts model for spin systems (Reichardt & Bornholdt, 2004; Wu, 1982), random walks (Delvenne, Yaliraki, & Barahona, 2010; Jeub et al., 2015; Pons & Latapy,  2005; Rosvall & Bergstrom,  2008; Zhou,  2003), and oscillator synchronization (Arenas, Díaz-Guilera, & Pérez-Vicente, 2006; Li et al., 2008). 
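To give a sense of how an agglomerative heuristic operates, the sketch below starts with every node in its own community and repeatedly merges the pair of communities that most increases modularity, stopping when no merge helps. It is a deliberately naive illustration that recomputes modularity from scratch for each candidate merge; it is not an implementation of any of the algorithms cited above, and the function names and toy network are hypothetical.

```python
from itertools import combinations


def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected network."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


def greedy_agglomerative(edges):
    """Merge communities greedily while modularity keeps increasing."""
    nodes = sorted({node for edge in edges for node in edge})
    community_of = {node: node for node in nodes}  # every node starts alone
    best_q = modularity(edges, community_of)
    while True:
        labels = sorted(set(community_of.values()))
        best_merge = None
        for a, b in combinations(labels, 2):
            # Candidate partition in which community b is absorbed into a.
            trial = {n: (a if c == b else c) for n, c in community_of.items()}
            q = modularity(edges, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:
            return community_of, best_q
        community_of = best_merge


# Hypothetical toy network: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition, q = greedy_agglomerative(edges)
print(partition, q)
```

On this toy network the procedure recovers the two triangles. Because each step is only locally optimal, a greedy procedure of this kind can stop at one of the many competitive local optima described above rather than at the global maximum.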
Such approaches are directly applicable for studying these respective dynamical systems and in some cases are closely related or even equivalent to one of the aforementioned quality functions (Delvenne et al., 2010; Fiedler, 1973; Rosvall & Bergstrom, 2008). Conversely, community structure can have a profound effect on dynamics taking place on networks (e.g., the spread of information across social networks [Aral et al., 2009; McPherson et al., 2001; Melnik et al., 2014; O’Sullivan et al., 2015; Ugander et al., 2012; Weng, Menczer, & Ahn, 2013], random walks and heat flow [Delvenne et al., 2010; Mucha et al., 2010], cascades [Galstyan & Cohen, 2007; Gleeson, 2008], and synchronization [Arenas et al., 2006; Skardal & Restrepo, 2012]), and adopting a community-based perspective provides a useful vantage point to study these dynamics.

These are just a small sample of the many community detection methods that have been developed, and we in no way intend this chapter to be a comprehensive review of all methods. Rather, here we present examples from different scientific disciplines demonstrating the useful application of community detection. In particular, we aim to emphasize community detection as a tool for studying networks. Identifying communities is often just a first step in data analysis as it opens up many possibilities for further study.

We illustrate this idea with a well-known example shown in Figure 16.1, the Zachary karate club (Zachary, 1977). The karate club network developed by Zachary (1977), through observing the interactions between members of a club during the two-year period from 1970 to 1972, represents the friendships between 34 of the club members as an aggregated, weighted network. During this period, there was a club division (indicated by node colors and shapes in Figure 16.1) due to a conflict between the club instructor and the president (nodes 1 and 34, respectively). Due in part to this “ground truth” division and the network’s small size and simple structure, the Zachary karate club has become a common example for demonstrating community detection algorithms. Zachary demonstrated that most of the members chose to be in the subgroup best associated with their friends.
Specifically, his use of a cut algorithm to define a split of the network into two subgroups reproduced the real-life split for all but one of the members. Node 9 did not side with the president’s faction despite having the larger number of ties to it, apparently because he was only three weeks away from completing a four-year quest for a black belt, requiring his allegiance to the instructor.

Figure 16.1  Zachary karate club network (Zachary, 1977). Node colors and shapes indicate the club division that occurred, with the instructor (node 1) and the president (node 34) shown in bold.

This seemingly odd behavior of node 9 highlights three important lessons: (1) adopting a community-based approach to network analysis provides a vantage point to ask new research questions; (2) one must be cautious when comparing the output of a community detection algorithm to known information on the network (frequently referred to as “ground truth”), as the latter might include important additional information not captured by the network topology (Peel et al., 2017); and finally (3) applied community detection should incorporate domain knowledge to choose appropriate methodologies, develop application-specific techniques, and address domain-driven questions. The detection of structural communities provides us with a lens to look at network data that often results in nontrivial findings beyond the recovery of “nodes classes” (Hric, Darst, & Fortunato, 2014).

With these lessons in mind, the rest of this chapter is organized by the following case studies. In the first section, we describe how communities have been used to help predict which memes go viral on Twitter. In the second section, we highlight political polarization in the US Congress, demonstrating the use of communities to quantify polarization and identify node roles, such as US senators that bridge the legislative space between political parties. In the third section, we present a study of the neuronal network of Caenorhabditis elegans in which multiresolution communities uncover groups of neurons with similar biological function. In the fourth section, we turn to a different neuroscience application that uses communities to compare human brain networks under different tasks and rest states. Finally, in the fifth section, we provide an example of how communities can help explain the evolution of genes important to malaria.
We selected these case study examples to highlight the utility of a community-driven approach to network analysis, drawing from these creative applications in which the modeling assumptions and algorithm choices elucidate important aspects of the data. We hope that our discussion will be thought provoking for those previously unfamiliar with this area and inspire further use of community detection for network analysis.

Virality Prediction of Social Memes

Community structure affects social contagions and epidemics through structural trapping, meaning that a meme or virus spreads readily within a community (or communities, if the contagion arises in clusters) and tends to not spread (as quickly, if at all) from one community to another (Aral et al., 2009; Fisher, 2017; Hewstone, Rubin, & Willis, 2002; McPherson et al., 2001; Melnik et al., 2014; Onnela et al., 2007; O’Sullivan et al., 2015; Ugander et al., 2012; Weng et al., 2013). That is, the contagion exhibits “community concentration” in which it is localized (i.e., concentrated) within one or more communities. In the context of epidemics, structural communities (which often reflect geographic constraints) can be represented by metapopulation models (Colizza & Vespignani, 2008; Melnik et al., 2014) that partition the human population into subgroups (broadly defined). Social contagions and epidemics share many mathematical and modeling similarities (Dietz, 1967; Goffman & Newill, 1964); however, their differences are also important. One crucial distinction is that social contagions are typically better modeled as complex contagions (Centola & Macy, 2007) in which a node’s (i.e., person’s) adoption of the contagion requires social reinforcement, for example, as modeled by threshold criteria (Granovetter, 1978; Watts, 2002). Whereas a biological epidemic can be transmitted through a single exposure, a person can require a certain amount of “contagion exposure” (e.g., number or fraction of contacts who have already adopted it) before adopting a social contagion themselves. Although subtle, this difference between social contagions and epidemics can significantly impact spreading patterns on networks (Centola, 2010; Centola & Macy, 2007; Melnik et al., 2013; O’Sullivan et al., 2015; Taylor et al., 2015; Weng et al., 2013).

Weng et al. (2013) study the spread of memes across the Twittersphere, concluding that homophily and social reinforcement collectively boost community concentration. Interestingly, they find this effect to differ for viral memes (those that spread vastly in the population) versus nonviral memes (those that do not reach high levels of popularity and are only shared by a small fraction of the population). The three main findings of their work are that (1) communities allow us to estimate how much the spreading pattern of a meme deviates from that of infectious diseases, (2) viral memes tend to spread more like epidemics than nonviral memes, and finally (3) the virality of memes can be predicted based on early spreading patterns in terms of community structure. We now describe each of these results further.

The authors built an unweighted, undirected network from Twitter data, encoding reciprocal following relationships between users. This network provided evidence of structural trapping for memes, defined as unique hashtags, that spread through tweets and retweets.
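The distinction between simple and complex contagions drawn above can be made concrete with a small sketch. It uses a deterministic, synchronous fractional-threshold rule in the spirit of the threshold criteria cited in the text; the function `spread`, the toy graph `TWO_TRIANGLES`, and the specific threshold values are our own illustrative assumptions, not the models used by Weng et al. (2013). A threshold of 0 stands in for a simple contagion, where a single exposure suffices.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]

# Toy network (an assumption for illustration): two triangles {0, 1, 2} and
# {3, 4, 5} joined by a single bridge edge 2-3, i.e., two communities.
TWO_TRIANGLES: Graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}


def spread(graph: Graph, seeds: Set[int], threshold: float,
           max_steps: int = 100) -> Set[int]:
    """Return the final adopters under a synchronous threshold rule.

    A node adopts once at least one neighbor has adopted and the fraction of
    its neighbors that have adopted is at least `threshold`.
    """
    adopted = set(seeds)
    for _ in range(max_steps):
        new_adopters = set()
        for node, neighbors in graph.items():
            if node in adopted or not neighbors:
                continue
            exposed = len(neighbors & adopted)
            if exposed > 0 and exposed / len(neighbors) >= threshold:
                new_adopters.add(node)
        if not new_adopters:
            break
        adopted |= new_adopters
    return adopted


if __name__ == "__main__":
    print("single exposure suffices:", sorted(spread(TWO_TRIANGLES, {0, 1}, 0.0)))
    print("reinforcement required:  ", sorted(spread(TWO_TRIANGLES, {0, 1}, 0.5)))
```

In this toy case the single-exposure contagion crosses the bridge and reaches every node, whereas the contagion that requires reinforcement stays inside the seed community, which is the structural trapping described in the text.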
Weng et al. (2013) identified communities using two community detection methods: Infomap (Rosvall & Bergstrom, 2008), an information-theoretic algorithm, and link clustering (Ahn, Bagrow, & Lehmann, 2010), which identifies overlapping communities by clustering edges. By analyzing the flow of information, they found that memes are much more likely to spread across intracommunity edges than across intercommunity edges. Given that a variety of factors (e.g., homophily, social reinforcement, and use history) can contribute to this phenomenon, it is important to recognize that this feature of community structure alone is able to differentiate how important different edges might be in fostering the spread of memes. To demonstrate that the local phenomenon of preferential spreading across intracommunity edges contributes to the mesoscopic phenomenon of community concentration, the authors developed an entropy-based measure to quantify the extent to which the spreading of memes concentrates into communities. They compared this measure for their dataset to that of four null models for social contagions: random spreading, a simple epidemic, a social reinforcement model, and an epidemic with homophily. By drawing this comparison, the authors observed community concentration for nonviral memes to more closely resemble complex contagions, whereas the spreading of viral memes more closely resembled simple epidemics. In particular, viral memes exhibited less structural trapping (similar to epidemics), whereas nonviral memes exhibited stronger structural trapping (similar to complex contagions). To further distinguish viral and nonviral memes, Weng et al. (2013) focused on the early stages of contagions and studied the average contagion exposure (i.e., the number of social contacts who are already adopters) for each adopter of a contagion.
The authors compared their Twitter dataset to the same four null models and again observed viral memes to more closely resemble simple epidemics; namely, less exposure is required for transmission of a viral meme.
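As a rough illustration of what an entropy-based concentration measure can look like, the sketch below computes the Shannon entropy of how a meme's adopters are distributed over communities. This is one natural choice and is an assumption on our part; it is not necessarily the exact definition used by Weng et al. (2013), and the names `community_entropy` and `MEMBERSHIP` are hypothetical. Lower entropy means the meme is concentrated in fewer communities.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Mapping


def community_entropy(adopters: Iterable[Hashable],
                      membership: Mapping[Hashable, Hashable]) -> float:
    """Shannon entropy (in bits) of the adopters' community distribution."""
    counts = Counter(membership[user] for user in adopters)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one adopter is required")
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical community assignment for six users.
MEMBERSHIP = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

if __name__ == "__main__":
    print("trapped in one community:", community_entropy([0, 1, 2], MEMBERSHIP))
    print("spread over two:         ", community_entropy([0, 1, 3, 4], MEMBERSHIP))
```

Comparing such a value for observed memes against the same value under null spreading models is the kind of comparison the paragraph above describes.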

Motivated by the observation that community concentration and contagion exposure are informative features to gauge the virality of a meme, they then implemented a classification algorithm using random forests to predict whether or not a meme will go viral. To frame virality prediction as a classification problem, they partitioned the set of memes into two classes (viral vs. nonviral), specifying the fraction of memes that are nonviral (considering virality both in terms of the number of retweeters and the total number of retweets). To study the benefit of using community structure information to improve virality prediction, they compared the resulting classification precision and recall scores for three classifiers: random guessing, community-blind prediction, and community-based prediction. They found that incorporating information about community structure can greatly improve the prediction accuracy for the virality of memes (see also a follow-up paper with expanded results, Weng, Menczer, & Ahn, 2014).
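For readers less familiar with the two scores used in this comparison, the following sketch computes precision and recall for a binary viral/nonviral labeling. It does not reproduce the random forest classifier or any result of the study; the function name `precision_recall` and the toy labels are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def precision_recall(actual: Sequence[bool],
                     predicted: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall for the positive ("viral") class."""
    if len(actual) != len(predicted):
        raise ValueError("label sequences must have the same length")
    true_pos = sum(1 for a, p in zip(actual, predicted) if a and p)
    false_pos = sum(1 for a, p in zip(actual, predicted) if not a and p)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a and not p)
    # Precision: of the memes predicted viral, the fraction that were viral.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of the memes that were viral, the fraction that were predicted.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = [True, True, False, False, True]
    guess = [True, False, True, False, True]
    print(precision_recall(truth, guess))
```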

Congressional Roll Call

While the representation of Twitter following as a network is straightforward, direct connectivity is only one of many data types that can be represented by a network. Other common networks encode the similarity between, for example, people, text documents, or protein sequences. Here we consider communities found in network representations of roll-call voting similarity in the US Congress, as constructed and studied by Waugh et al. (2009). These networks connect two members in a selected Congress (i.e., the two-year period starting in early January following the biennial congressional elections) according to the similarity in their voting patterns. Waugh et al. (2009) defined edge weights equal to the fraction of bills on which the two members voted the same way, yea or nay, among the total number of bills for which they were both present and voted (after removing nearly unanimous votes). This definition yields weighted edges in a dense network; indeed, every member of Congress is connected in this definition to every other member in the same chamber with some positive weight unless they managed to never once vote the same way, while two members who always voted identically are connected with an edge of weight 1. Because the self-loops connecting each member of Congress to him- or herself do not provide additional information, these were removed. This undirected roll-call similarity network is a selected projection of the underlying bipartite (and signed) data that connects legislators with the bills that they voted on. This projection is useful for describing legislative activity because the community structures group together members of Congress who vote similarly, independent of the political or policy content of the bills, providing relatively accessible and intuitive examples of communities (see, e.g., Figure 16.2). Waugh et al.
(2009) studied community structure for these networks, providing a framework for thinking about the large-scale structure of congressional legislative action in terms of political allegiances, whether or not those allegiances are well aligned with the nominally declared party memberships. In particular, they considered modularity, which measures the difference between the total weight of within-community edges and that expected under a given null model (e.g., random network), to quantify legislative polarization. They found a curious relationship between the modularity of a chamber (House or Senate) in a Congress (i.e., the largest value of modularity found maximizing over partitions) and turnover of its majority party in the elections leading into the next Congress: while periods of very high or very low polarization (as measured by modularity) appeared to be relatively stable in terms of re-electing the majority party, they found that middle levels of polarization more frequently led to majority party turnover at the subsequent election (controlling for various other hypothesized factors). In so doing, Waugh et al. (2009) not only used community detection as an exploratory tool for intuitively understanding large-scale structure but also used modularity as a useful quantity describing an important global feature of these networks. Additional intuition about these networks and their changes over time can be obtained from visualizations, as demonstrated by the force-directed layouts of Andris et al. (2015) and the community-focused figure of Moody and Mucha (2013).

Figure 16.2  Roll-call similarity adjacency matrices in the US Senate as defined by Waugh et al. (2009) for the (left) 85th and (right) 108th Congresses, after reordering indices (senators) with reorderMAT from the Brain Connectivity Toolbox (Rubinov & Sporns, 2010). The 85th Congress, January 3, 1957, to January 3, 1959, included the first federal civil rights legislation passed by Congress since Reconstruction (Wikipedia, 2017a). The modularity of this weighted network (i.e., the maximum modularity obtained across observed partitions) is 0.091. In contrast, the modularity of the 108th Senate, January 3, 2003, to January 3, 2005, is 0.273, one of the highest values in any Senate. For comparison, two equal-sized blocks with perfect agreement within and zero agreement between blocks yield a modularity of 1/2 (up to a 1/N factor from removal of self-loops). Full color figures available on Oxford Handbooks Online.
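The roll-call similarity weights and the idealized two-block comparison quoted in the Figure 16.2 caption can be sketched as follows. This is a minimal illustration under stated assumptions: votes are coded as 1 (yea), 0 (nay), or None (absent), the removal of nearly unanimous votes is not implemented, and modularity is computed with the standard Newman-Girvan null model for a given partition rather than maximized. The function names and the toy vote table are ours.

```python
from typing import List, Optional, Sequence

Vote = Optional[int]  # 1 = yea, 0 = nay, None = absent or not voting


def agreement_matrix(votes: Sequence[Sequence[Vote]]) -> List[List[float]]:
    """votes[i][b] is member i's vote on bill b.

    The weight between two members is the fraction of bills, among those on
    which both voted, where they voted the same way. The diagonal is left at
    zero, i.e., self-loops are removed.
    """
    n = len(votes)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            both = [(a, b) for a, b in zip(votes[i], votes[j])
                    if a is not None and b is not None]
            if both:
                same = sum(1 for a, b in both if a == b)
                weights[i][j] = weights[j][i] = same / len(both)
    return weights


def modularity(weights: Sequence[Sequence[float]],
               groups: Sequence[int]) -> float:
    """Newman-Girvan modularity of a weighted, undirected network."""
    n = len(weights)
    strength = [sum(row) for row in weights]
    total = sum(strength)  # equals twice the total edge weight
    if total == 0:
        raise ValueError("network has no edge weight")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if groups[i] == groups[j]:
                q += weights[i][j] - strength[i] * strength[j] / total
    return q / total


# Toy chamber: two blocs of three members that always vote against each other.
TOY_VOTES = [[1, 0, 1, 1]] * 3 + [[0, 1, 0, 0]] * 3
TOY_PARTY = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    w = agreement_matrix(TOY_VOTES)
    print(modularity(w, TOY_PARTY))
```

With this particular null model and the diagonal set to zero, the two-bloc toy chamber gives a modularity of 1/2, the benchmark against which the reported Senate values of 0.091 and 0.273 can be read.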
Looking at the Senate roll call from 1975 to 2012, Moody and Mucha combined modularity for system-wide polarization of a Congress with groups of senators in each Congress identified by a modified version of “convergence of iterated correlations” (CONCOR; Breiger, Boorman, & Arabie, 1975; White, Boorman, & Breiger, 1976). A feature of this grouping is that by construction it leaves some senators in the political center unaffiliated with the party-centric groups, allowing for easy visualization of the hollowing out of legislative activity in the political center over time, along with increasing polarization. This simultaneous use of modularity and CONCOR in the visualization demonstrates the value of using multiple methods for identifying communities.

Because of the temporal nature of the roll-call networks, community detection methods that explicitly utilize the identifications across time have also been usefully applied. Whereas the Waugh et al. (2009) analysis and Moody and Mucha (2013) visualization detected communities within each two-year Congress independently (and then identified common senators from one Congress to the next in the visualization), a “multilayer networks” framework can be used for studying networks that change dynamically over time, as well as a variety of other network generalizations (see, e.g., Kivela et al., 2014). Mucha et al. (2010) used the properties of Laplacian dynamics to generalize the original definition of modularity to multilayer networks, using the Senate roll call as an instructive example for temporal networks, processing the data into the two-year single-Congress waves (called “slices” then but now more commonly thought of as “layers”). Naively, one could start by maximizing the modularity of each layer independently, but connecting those communities between layers then requires selection of a matching procedure that often leads to ambiguities. In contrast, the multilayer version directly allows for continuation of communities from one layer to the next and characterizing their flow across layers.

In the simplest setting, the idea behind multilayer community detection introduces an interlayer coupling parameter, ω, describing the weight of the identity arcs linking corresponding nodes across layers. The multilayer modularity and the partitions found under fixed parameters then depend on ω. For ω = 0, the single-layer modularity of each network layer is optimized independently. As ω is increased, the coupling between layers encourages finding partitions that include greater spanning of communities across layers. The partition highlighted and visualized in Mucha et al. (2010) includes communities that span multiple Congresses, with most of the single-Congress layers containing only two communities. The handful of layers with more than two communities mark key transitions in the two-party system, often with one group fading in favor of another (whether or not they name themselves differently).
While the start of the American Civil War in 1861 is particularly obvious in the data, these transitions also occur near other major political moments or, in some cases, near the boundaries of the recognized “party systems” of the United States as studied in political science (see Wikipedia, 2017b). Alternative partitions of the data corresponding to different interlayer coupling parameter values were visualized by Mucha and Porter (2010), demonstrating how different features are highlighted by exploring the space of community detection parameters. We note that similar network constructions have been used to study voting in the congresses in Peru (Lee, Magallanes, & Porter, 2017) and Brazil (Levorato & Frota, 2016), as well as the United Nations General Assembly (Macon, Mucha, & Porter,  2012). Community detection has also been used to study committee assignments (Porter et al., 2005, 2006, 2007) and cosponsorship (Zhang et al., 2007) in the US Congress, and multilayer modularity in the multiplex setting was used by Cranmer, Menninga, and Mucha (2015) to measure the level of “fractionalization” in international relations.
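For reference, the multilayer modularity discussed in this section can be written in the following form, given here as we recall it from Mucha et al. (2010); readers should consult the original for the derivation and conventions. The symbols are: A_{ijs} is the weight of the edge between nodes i and j in layer s, k_{is} the strength of node i in layer s, m_s the total edge weight of layer s, γ_s a per-layer resolution parameter, C_{jsr} the coupling between node j in layer s and itself in layer r (equal to ω for coupled layers and 0 otherwise), and g_{is} the community of node i in layer s.

```latex
Q_{\text{multilayer}}
  = \frac{1}{2\mu} \sum_{i,j,s,r}
    \left[
      \left( A_{ijs} - \gamma_s \frac{k_{is}\, k_{js}}{2 m_s} \right) \delta_{sr}
      + \delta_{ij}\, C_{jsr}
    \right]
    \delta\!\left( g_{is}, g_{jr} \right),
\qquad
2\mu = \sum_{j,r} \Bigl( k_{jr} + \sum_{s} C_{jrs} \Bigr).
```

Setting every C_{jsr} to zero removes the second term and leaves a sum of independent single-layer modularities, which matches the ω = 0 case described above.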

Exploratory Analysis of the C. elegans Neural Network

Many community detection methods, including but not limited to many traditional modularity optimization algorithms, provide a user with a single partition of the network into communities along with the corresponding value of the objective function (e.g., modularity). The value of modularity itself can be valuable as in the example of the previous section and is frequently interpreted by users as an assessment of the meaningfulness of that partition, although caution is strongly recommended (see Bassett et al., 2013; Guimera, Sales-Pardo, & Amaral, 2004). There are two immediate problems with analyzing a network with a fixed-resolution community detection algorithm. First, some meaningful structures could remain undetected (e.g., small cliques lumped together into one community) under modularity optimization at a single resolution (Reichardt & Bornholdt, 2006) due to resolution limits of modularity (Fortunato & Barthelemy, 2007), as well as detectability limits that apply to all polynomial-time community detection methods (Nadakuditi & Newman, 2012). Second, when the purpose of community detection is data exploration, studying a single resolution (or scale) of community structure might lead to the conclusion that there is only one good way to partition that data (which is often misleading). Instead, being able to access multiple scales of resolution of the data can be crucial for identifying and understanding interesting phenomena that otherwise would have been unexplored.

One example that illustrates the importance of multiresolution community detection is a study of the neural network of the nematode C. elegans. C. elegans is a free-living, transparent nematode that has become one of the most widely studied living organisms in biology. C. elegans was the first multicellular organism to have its whole genome sequenced and is currently still the only organism for which we have access to its whole connectome. The structural anatomy of C. elegans is approximately a cylinder of diameter 0.1 mm and length 1 mm. The structure of its neuronal wiring can be found in the Wormatlas database (Altun et al., 2002), consisting of 302 neurons, their locations, and the synapses between them as determined by serial section electron microscopy. The database also describes different functions in which each neuron is involved.
Arenas, Fernandez, and Gomez (2008a) and Granell, Gomez, and Arenas (2011) studied the structure of the nematode from a complex networks perspective, illustrating that community detection can help discern the interplay between the topology and functionality of neural networks. The network abstraction describes the nervous system of C. elegans as a directed, weighted network, where nodes represent neuronal cell bodies and edges represent synapses. The resulting network was analyzed via modularity optimization (using the original formulation of Newman & Girvan, 2004), yielding a partition that divided the neurons into five communities corresponding mainly to locations on the worm’s body. This result is not entirely surprising, as it indicates that synapses occur more often within identifiable spatially contiguous and determined regions as compared to a corresponding random-graph model (which is independent of spatial location). However, the authors were interested in analyzing the network at further resolution levels, in the hope that this would reveal new interesting features. To this end, Arenas, Fernandez, and Gomez (2008b) proposed an algorithm using a modified version of the original modularity formulation, incorporating a tuning parameter to detect communities across the whole mesoscale. This was done by adding a self-loop of equal weight r to all nodes in the network, a modification that only affects the diagonal of the adjacency matrix and therefore keeps the network connectivity unchanged (cf. the different resolution parameter approach introduced by Reichardt & Bornholdt, 2006). When the weight r takes its minimum value, the maximum-modularity partition for this modified network is a single community including all nodes (the macroscale). Conversely, when the weight of the self-loop is tuned to its maximum value, the corresponding partition separates each node into its own community (the microscale).
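The effect of the self-loop weight r can be seen on a toy example. The sketch below adds r to the diagonal of an undirected adjacency matrix and evaluates the usual modularity expression for three hand-picked partitions; it does not optimize modularity and is not the authors' implementation. The toy graph (two triangles joined by one edge), the function name, and the particular values of r are assumptions chosen for illustration, and each self-loop is counted once in its node's strength.

```python
from typing import List, Sequence


def modularity_with_self_loops(adj: Sequence[Sequence[float]],
                               groups: Sequence[int], r: float) -> float:
    """Modularity of `groups` after adding a self-loop of weight r to each node."""
    n = len(adj)
    w = [[adj[i][j] + (r if i == j else 0.0) for j in range(n)] for i in range(n)]
    strength = [sum(row) for row in w]
    total = sum(strength)
    if total <= 0:
        raise ValueError("r is too negative for this network")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if groups[i] == groups[j]:
                q += w[i][j] - strength[i] * strength[j] / total
    return q / total


def two_triangles() -> List[List[float]]:
    """Triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adj = [[0.0] * 6 for _ in range(6)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1.0
    return adj


ONE_BLOCK = [0, 0, 0, 0, 0, 0]    # macroscale: everything together
TWO_BLOCKS = [0, 0, 0, 1, 1, 1]   # the two triangles
SINGLETONS = [0, 1, 2, 3, 4, 5]   # microscale: every node alone

if __name__ == "__main__":
    adj = two_triangles()
    for r in (-2.0, 0.0, 100.0):
        scores = {name: round(modularity_with_self_loops(adj, part, r), 3)
                  for name, part in (("one", ONE_BLOCK), ("two", TWO_BLOCKS),
                                     ("singletons", SINGLETONS))}
        print(r, scores)
```

Among these three candidates, the single block scores highest at the most negative r, the two triangles at r = 0, and the singletons at large r, mirroring the macroscale-to-microscale sweep described in the text.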
By tuning r between these two extreme values, one can explore community structure at different resolutions. It is worth noting that as each modularity optimization is independent from the others, the obtained structure is not forced to follow a hierarchical structure.

Figure 16.3  Results of a multiresolution community detection algorithm for the C. elegans neural network. Panel A shows the number of detected communities for the modularity-optimizing partition at every value of the topological scale defined by the log(r − rmin), where rmin is the value of r that maximizes the modified modularity measure for the macroscale partition (i.e., the partition obtained with r = 0). Panel B visualizes the frequency matrix of the mesoscales of the C. elegans, thresholded at a value of 0.6.

To apply this algorithm to the C. elegans neural network, Granell et al. (2011) discretized the self-loop weight range into 1,000 logarithmically spaced intervals, spanning r ∈ [0,rmax]. By considering r > 0, they tunably identified a greater number of communities (whose sizes decreased) with increasing r. The mesoscale is depicted in Figure 16.3A, where we can observe multiple important resolution scales. The most persistent scale of community structure is highlighted by a circle in the figure, providing evidence that at this scale the communities are robustly detected. To simultaneously extract information across scales, they built a frequency matrix (or “consensus matrix”) encoding the number of times that two neurons were placed in the same community for the different r values. By thresholding these frequencies, they were able to unravel substructural scales corresponding to groups of neurons involved in different functionalities at different scales. Figure 16.3B shows the frequency matrix thresholded at 0.6, a value chosen by fixing the sizes of the groups to be analyzed to 10 neurons or fewer.
The figure highlights the five large communities corresponding to optimizing the original modularity measure (i.e., r = 0), as well as the substructures within these five communities. In particular, the highlighted scales in Figure 16.3A contributed most to the frequency matrix. Trying to classify the functional role of neurons in C. elegans is extremely delicate because of their multifunctional aspects; that is, many neurons participate in different synaptic pathways, resulting in different functionalities. However, with the previously obtained partition and the extensive description of each neuron in the Wormatlas database, Granell et al. (2011) proposed a tentative classification of some groups of neurons. The task involved assigning functions to groups of nodes that are persistently coclustered across many scales of resolution. They identified nine groups of neurons that were both strongly persistent and small (specifically, they contained fewer than 10 neurons) and found these communities to be strongly associated with the following functional roles: (1) nose/head orientation movement; (2) head-withdrawal reflex, related to dorsal relaxation; (3) head-withdrawal reflex, related to ventral relaxation; (4) olfactory and thermosensation reflex; (5) chemotaxis to lysine reflex; (6) backward sinusoidal movement of the worm, related to touch stimulus; (7) forward and backward autonomous sinusoidal movement of the worm; (8) relaxation state related to a sleep state; and (9) a group containing neurons with functions that remain unknown. Their classification is not intended to be exact or final, but rather to provide biologists with useful information for future research.

As we have seen, the application of community detection algorithms is a powerful approach to exploratory data analysis. Moreover, the use of a multiresolution approach identified structures beyond the expected grouping of neurons in different locations and allowed discovery of groups of neurons that contribute to the same neurological function, providing a takeoff point for further research.
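The frequency (or consensus) matrix used in this section to combine partitions across resolutions can be sketched as follows. We assume here that counts are normalized by the number of partitions so that a threshold such as 0.6 is a fraction of co-assignments; that normalization, the function names, and the toy partitions are our assumptions rather than details taken from Granell et al. (2011).

```python
from typing import List, Sequence


def frequency_matrix(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of partitions in which each pair of nodes shares a community."""
    if not partitions:
        raise ValueError("at least one partition is required")
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same nodes")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def threshold(freq: Sequence[Sequence[float]], cutoff: float) -> List[List[float]]:
    """Zero out entries below `cutoff`, keeping only persistent co-assignments."""
    return [[v if v >= cutoff else 0.0 for v in row] for row in freq]


# Toy input: one partition of four nodes per resolution value.
TOY_PARTITIONS = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]

if __name__ == "__main__":
    for row in threshold(frequency_matrix(TOY_PARTITIONS), 0.6):
        print([round(v, 2) for v in row])
```

Groups of nodes that remain connected after thresholding are the persistently coclustered groups to which functions were then assigned.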

Comparing Network Architectures of the Human Brain at Different States

Another type of neural connectivity data is functional brain connectivity, which describes the statistical patterns of dynamic interactions among neurons or brain regions (Bullmore & Sporns, 2009). Unlike the “structural network” in the previous section (where the network represents the actual wiring between neurons), “functional networks” can be measured with a variety of neuroimaging or electrophysiological recording methods and can be measured while the brain is in a resting state or under stimulus (Sporns, 2013). The structural and functional brain networks of various model organisms (such as C. elegans mentioned in the previous section) and humans have been shown to organize into communities (usually called modules in this context) that often correspond to specialized functional components (Sporns & Betzel, 2016). Such a modular organization has been suggested as evolutionarily advantageous for several reasons. For instance, it conserves the wiring cost involved in anatomically connecting neurons to constitute circuits or networks, since the connections inside communities are often shorter (Bullmore & Sporns, 2012). Moreover, changes in the modular organization of the human brain have been recently shown to associate with aging and clinical disorders (Fornito, Zalesky, & Breakspear, 2015).

However, community detection applied to a static single network can fail to capture more realistic situations where the data is temporal, originates from multiple sources, or spans multiple spatial and/or temporal scales. To address this shortcoming, some community detection techniques have been recently extended for multilayer networks, in which multiple networks form a multilayer stack as shown in Figure 16.4A (see also the discussion in the second section on the use of multilayer networks in studying temporal Senate roll-call networks).
In general, these layers can represent different time windows in an experiment, different individuals, or different experimental conditions. Cole et al. (2014) applied multilayer community detection to characterize the relationship between resting-state (i.e., subjects were asked to do nothing) and task-evoked (i.e., subjects were asked to perform a specific task such as pressing a button or answering a logic question) functional connectivity in the human brain. Subjects were asked to perform different kinds of tasks while functional magnetic resonance imaging (fMRI) was used to measure the temporal changes in brain activity across hundreds of brain regions. Then, for each task, they constructed a layer in a multilayer network using the Pearson correlations between the fMRI time series of all pairs of brain regions.

Figure 16.4  Multilayer community detection applied to resting-state and task-evoked functional brain networks. Each layer in the multilayer network (schematic shown in panel A; task connectivity matrices, 64 or 7 in total) represents the functional connectivity between brain regions under different tasks. The layers are coupled by identity arcs of weight ω connecting each node (brain region) in a given layer to itself in all other layers (dashed lines). Panels C and D show the similarity (measured by the standardized Rand coefficient) of each task partition to the resting-state partition reported in Power et al. (2011) (shown in panel B) as a function of ω. Both panels plot partition similarity with the Power et al. (2011) resting-state partition (z-score) against the coupling parameter (0.2 to 2.0; higher = consensus), one panel for the consensus across 64 tasks and one for the consensus across 7 tasks. As ω increases, the task partitions converge to a consensus partition similar to the resting-state community partition. Full color figures available on Oxford Handbooks Online. Reprinted figure with permission from Cole et al. (2014), © 2014 Elsevier Inc.

The authors hypothesized that networks obtained from resting-state fMRI would reveal an intrinsic architecture that would also be present across a wide variety of task states (i.e., across networks obtained from fMRI measurements under different tasks), but also that some task-evoked connectivity changes unique to each task state would be evident. To estimate both intrinsic and evoked architectures simultaneously, Cole et al. (2014) used the multilayer generalization of modularity (Mucha et al., 2010) to uncover communities spanning across layers. In this setting, the multilayer formulation across subjects
connects every brain region in a given layer to itself in each of the other layers with an identity arc of edge weight ω, called the coupling parameter. Note that in the temporal setting, as described in the second section, each layer represents a different time window and each node is coupled to its appearances in consecutive ordered layers; in contrast, here the layers are categorical with all-to-all interlayer (intertask) connections. The authors used small values of the coupling parameter ω to identify network communities elicited differentially across tasks, and large values of ω to identify consensus communities present across tasks. For a given ω, they applied multilayer community detection and compared the partition obtained for each task layer with the resting-state partition reported by Power et al. (2011) (which used the Infomap community detection method by Rosvall & Bergstrom, 2008). In particular, for every value of ω, they performed 100 random optimizations and chose the one that was most similar on average to the other 99 optimizations as the representative partition. This is one example of a consensus algorithm, which is used to find stable results from a set of partitions delivered by stochastic methods, as encountered with some of the computational heuristics for modularity optimization. The similarity between task-specific communities and resting-state communities is reported in Figure 16.4C and D, shown as a function of weight ω. Similarity was quantified by the z-score of the Rand coefficient, which counts the fraction of node pairs identified the same way by both partitions (either together in both or separate in both; see Traud et al., 2011).

To ensure the robustness of the results, two datasets were used. One dataset consisted of 64 tasks (each performed by 15 individuals) defined as distinct cognitive processes with minimal perceptual changes across tasks.
The second dataset consisted of seven tasks (each performed by 118 individuals) that were chosen to elicit the involvement of all major cognitive domains and brain systems. In both datasets, it was found that as ω increases, a single architecture emerges with high similarity to the resting-state network architecture. This result is not built into the method: while multilayer community detection does encourage a single consensus partition at high coupling parameters, there is no guarantee that this partition will resemble the resting-state partition. In other words, the network architecture present across many task states is also present during rest, implying an intrinsic network architecture.

Upon further examination, the authors identified a set of small (but likely functionally important) task-evoked connectivities that differed from the resting-state connectivities. To quantify these network changes, they calculated the percentage of connections that changed significantly (as assessed by t-tests) from the rest state, revealing a prominent pattern of decreased within-community connectivity and increased between-community connectivity during task performance. This pattern suggests a partial breakdown of network communities during task performance so that activity can better flow between systems with diverse functions.

Providing a mesoscale perspective on the organization of brain networks, multilayer community detection employed at different coupling parameters can be useful for network comparison. Here, the authors compared connectivity patterns between brain regions (representing the functional dependencies between their fMRI time series) under different tasks and a rest state, revealing an intrinsic community structure that was present across brain states as well as small (but consistent) changes in the community structures that were common across tasks.
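As a rough illustration of the pipeline described in this section, the following Python sketch builds one Pearson-correlation layer per task, couples the layers all-to-all with weight ω, computes the (unstandardized) Rand coefficient, and selects a representative partition from repeated runs. The function names, the toy data, and the use of NumPy are assumptions made for illustration; this is not the implementation of Cole et al. (2014). The modularity optimization itself and the z-score standardization of Traud et al. (2011) are omitted.

```python
import itertools

import numpy as np


def correlation_layer(time_series):
    """Pearson correlations between regions; time_series is (regions, timepoints)."""
    layer = np.corrcoef(time_series)
    np.fill_diagonal(layer, 0.0)  # drop self-correlations
    return layer


def supra_adjacency(layers, omega):
    """Stack L layers of size N x N into one (N*L) x (N*L) matrix.

    Categorical (all-to-all) coupling: every node is linked to its own copy
    in every other layer by an identity arc of weight omega.
    """
    n_layers = len(layers)
    n_nodes = layers[0].shape[0]
    supra = np.zeros((n_nodes * n_layers, n_nodes * n_layers))
    for s, layer in enumerate(layers):
        block = slice(s * n_nodes, (s + 1) * n_nodes)
        supra[block, block] = layer
    for s, r in itertools.permutations(range(n_layers), 2):
        rows = slice(s * n_nodes, (s + 1) * n_nodes)
        cols = slice(r * n_nodes, (r + 1) * n_nodes)
        supra[rows, cols] = omega * np.eye(n_nodes)
    return supra


def rand_coefficient(labels_a, labels_b):
    """Fraction of node pairs treated the same way (together or apart) by both partitions."""
    pairs = list(itertools.combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def representative_partition(partitions):
    """Return the partition that is most similar, on average, to all the others."""
    mean_similarity = [
        np.mean([rand_coefficient(p, q) for k, q in enumerate(partitions) if k != idx])
        for idx, p in enumerate(partitions)
    ]
    return partitions[int(np.argmax(mean_similarity))]


# Toy data (hypothetical): 3 task layers over 4 regions, 50 time points each.
rng = np.random.default_rng(0)
layers = [correlation_layer(rng.normal(size=(4, 50))) for _ in range(3)]
supra = supra_adjacency(layers, omega=0.5)

# Three hypothetical optimization runs; the first two agree with each other.
runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
best = representative_partition(runs)
```

In this toy setting, small values of `omega` leave the layers nearly independent, while large values tie each region's copies together, which mirrors the role of the coupling parameter described in the text. Any multilayer modularity optimizer could then be run on `supra`; that step is deliberately left out here.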


A Probabilistic Network Model for Malaria Parasite Genes

In addition to the analysis of neuroscience data, community detection can be useful for analyzing other biological data, where it can help develop a mechanistic understanding of the underlying system. Here we highlight the work of Larremore, Clauset, and Buckee (2013), which used community detection to develop and computationally investigate a hypothesis about the nature of recombination in the sequences of the genes (called var genes) encoding proteins in the human malaria parasite Plasmodium falciparum genome. This work is novel and interesting because the authors used a network representation of their data, along with the communities found in this representation, to formulate and validate biological hypotheses.

Rich genetic diversity in the var genes of the human malaria parasite has been shown to contribute to the complexity of the epidemiology of the infection and disease. From generation to generation the parasite can change which of the var genes are expressed as proteins and exported to the surface of the red blood cells. This prevents existing antibodies from recognizing (and thus resisting) whichever new var-encoded protein is on the cell surface, prolonging infection. One diversity-generating mechanism is recombination, which is the exchange and shuffling of genetic information during mitosis and meiosis (Barry et al., 2007). The ability to understand genetic diversity is complicated by inadequate tools to uncover the phylogeny, or genetic relationship between sequences resulting from recombination events, in a scalable and statistically rigorous way. The typical analyses for evolutionary data assume a tree-like relationship between events, which does not accommodate recombination data. To address this challenge, Larremore et al. (2013) use a novel approach: they cast their problem in terms of a collection of networks.
Then, they apply community detection to each of the networks and use the properties of the communities to generate hypotheses of the mechanisms behind the recombination process. More specifically, to investigate the heterogeneity and the corresponding possible patterns in recombination events across a set of 307 sequences from the var genes, the authors restricted their analyses to nine particular “highly variable regions” (HVRs) within each of the 307 sequences. Then for each HVR, they constructed a network, where the nodes represented the 307 sequences and an edge was placed between a pair of nodes if they had evidence of a recombinant relationship, based on a notion of sequence similarity within the particular HVR.

Communities were then identified in each of the nine networks using a degree-corrected stochastic blockmodel (SBM) approach (Karrer & Newman, 2011). In the SBM, the probability of an edge existing between a pair of nodes depends on their community assignments, and hence nodes within a community are connected to each other and to other communities in a characteristic way. For a network with N nodes and K communities, the SBM is parametrized by an N-length array z, where z_i gives the community assignment for node i, and a K × K matrix θ, where θ_rs (together with the node degrees) specifies the probability of an edge existing between nodes in communities r and s. In the process of fitting the SBM, one learns the parameters θ and z that are most likely to describe the data, and hence these parameters can then be used to sample networks from the model. In this analysis, sampling from the model was useful because it allowed the authors to create synthetic networks to computationally validate their hypotheses about the constraints influencing recombination.

Figure 16.5  Visualization of the community assignments inferred through the stochastic blockmodel for HVR6. Panel A shows the HVR6 network, with the nodes colored by the inferred community assignment. Panel B gives the community-colored adjacency matrix, with the rows and columns (node index, 0 to 300) sorted by the inferred community assignments. Full color figures available on Oxford Handbooks Online. Reprinted figure with permission from Larremore et al. (2013).

After identifying communities within each HVR network, as shown in Figure 16.5, the authors used two summary statistics to formulate their biological hypothesis. First, the variation of information (Meila, 2005) was used to compare the community assignments of nodes (i.e., each of the 307 sequences) across the nine HVR networks. They observed that six of the nine networks had a prominent community structure (i.e., far from random). Of those, in four networks the community assignments were distinct, while the other two were correlated. These observations motivated the hypothesis that recombination events occur in constrained ways, leading to a strong community structure, and that one should analyze HVR networks individually instead of building a consensus network that aggregates the HVR networks.

Next, they used assortativity (see, e.g., Newman, 2002) to overlay the network structure with various known biological features of the sequences, such as var gene length. Specifically, assortativity quantifies the tendency of nodes of the same type (e.g., same gene length) to be connected in the network. They observed that three HVR networks had community structure correlating strongly with two biological features (i.e., nodes of the same biological label tend to group together), while three other HVR networks with highly heterogeneous community structure were unaligned with any of the known biology. These observations allowed for the formulation of the hypothesis that the HVRs that are unrelated
to each other also promote recombination under unrelated constraints and are responsible for fostering the genetic diversity that enables immune evasion. Given the ability to find communities within each HVR network and the lack of similarity in community structure between HVR networks, Larremore et al. (2013) were able to formulate and test hypotheses for the diversity-generating mechanisms of var genes. This would have been difficult using standard phylogenetic approaches or without adopting a community-based perspective. The application of the stochastic blockmodel to this task provided a statistically grounded approach for testing the plausibility of the model.
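To make two of the ingredients above concrete, the following Python sketch samples a synthetic network from a blockmodel given z and θ, and computes the variation of information between two community assignments. It is a minimal sketch under stated assumptions: it uses the plain Bernoulli SBM rather than the degree-corrected variant used by Larremore et al. (2013), and the function names and toy parameters are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def sample_sbm(z, theta, rng):
    """Sample an undirected simple graph from a plain (not degree-corrected) SBM.

    z[i] is the community of node i; theta[r][s] is the probability of an
    edge between a node in community r and a node in community s.
    """
    n = len(z)
    adjacency = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < theta[z[i]][z[j]]:
                adjacency[i, j] = adjacency[j, i] = 1
    return adjacency


def variation_of_information(labels_a, labels_b):
    """VI = H(A|B) + H(B|A), in nats; 0 means the partitions are identical."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        vi -= p_ab * (
            math.log(p_ab / (count_a[a] / n)) + math.log(p_ab / (count_b[b] / n))
        )
    return vi


# Toy parameters (hypothetical): two communities that are fully connected
# internally and never connected to each other.
z = [0, 0, 0, 1, 1, 1]
theta = [[1.0, 0.0], [0.0, 1.0]]
synthetic = sample_sbm(z, theta, np.random.default_rng(1))

# Relabeling communities does not change the partition, so VI is zero.
vi_same = variation_of_information([0, 0, 1, 1], [1, 1, 0, 0])
vi_diff = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

With fitted values of z and θ in place of the toy ones, repeated calls to `sample_sbm` would produce synthetic networks of the kind used to check hypotheses computationally, and `variation_of_information` applied to pairs of HVR partitions gives one number per pair, with smaller values indicating more similar community assignments.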

Concluding Comments

Through five representative case studies from diverse application domains, we have demonstrated the utility of community detection in data analysis tasks such as prediction (see the first section), node role classification and temporal evolution (see the second section), multiscale functional analysis (see the third section), network comparison (see the fourth section), and data representation for probabilistic model construction (see the fifth section). Our goal here was to provide the reader with an application-driven perspective on the various uses of community detection while highlighting application-specific goals and motivations for identifying communities in networks. These examples cover only a tiny fraction of the activity in community detection, and many others could have been used (see, e.g., recent applications in Hi-C data analysis [Cabreros, Abbe, & Tsirigos, 2016], network security [Ding et al., 2012], and understanding of animal societies [Rubenstein et al., 2015]). We hope that our presentation encourages readers to think about how community detection might be useful in their own work.

Acknowledgments

We are grateful to the many people who gave us feedback on a draft version of this chapter, with special thanks to Dan Larremore and Aaron Clauset. The perspective on community detection presented in this work has been heavily influenced by the authors’ research activities with many collaborators (far too many to properly list here) across a variety of supported projects and training grants from different agencies, most recently including the James S. McDonnell Foundation (grants #220020315 and #220020457), the National Institutes of Health (Award Numbers R01HD075712, R56DK111930, and T32CA201159), and the National Science Foundation (ECCS1610762). The present content is solely the responsibility of the authors and does not necessarily represent the official views of any funding agencies.

References

Abbe, E. (2017). Community detection and stochastic block models: Recent developments. The Journal of Machine Learning Research, 18(1), 6446–6531.
Ahn, Y.-Y., Bagrow, J. P., & Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature, 466(7307), 761–764.
Altun, Z., Herndon, L., Wolkow, C., Crocker, C., Lints, R., & Hall, D. (2002). Wormatlas. A database featuring behavioral and structural anatomy of Caenorhabditis elegans. http://www.wormatlas.org/
Ames, B. P., & Vavasis, S. A. (2011). Nuclear norm minimization for the planted clique and biclique problems. Mathematical Programming, 129(1), 69–89.
Ames, B. P., & Vavasis, S. A. (2014). Convex optimization for the planted k-disjoint-clique problem. Mathematical Programming, 143(1–2), 299–337.
Andris, B., Lee, D., Hamilton, M. J., Martino, M., Gunning, C. E., & Selden, J. A. (2015). The rise of partisanship and super-cooperators in the U.S. House of Representatives. PLoS One, 10(4), e0123507.
Aral, S., Muchnik, L., & Sundararajan, A. (2009). Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544–21549.
Arenas, A., Díaz-Guilera, A., & Perez-Vicente, C. J. (2006). Synchronization reveals topological scales in complex networks. Physical Review Letters, 96(11), 114102.
Arenas, A., Fernandez, A., & Gomez, S. (2008a). A complex network approach to the determination of functional groups in the neural system of C. elegans. Berlin, Heidelberg, Germany: Springer Berlin Heidelberg.
Arenas, A., Fernandez, A., & Gomez, S. (2008b). Analysis of the structure of complex networks at different resolution levels. New Journal of Physics, 10(053039).
Ball, A., Karrer, B., & Newman, M. E. J. (2011). Efficient and principled method for detecting communities in networks. Physical Review E, 84(3), 036103.
Barnes, E. R. (1982). An algorithm for partitioning the nodes of a graph. SIAM Journal on Algebraic Discrete Methods, 3(4), 541–550.
Barry, A. E., Leliwa-Sytek, A., Tavul, L., Imrie, H., Migot-Nabias, F., Brown, S. M., . . . Day, K. P. (2007). Population genomics of the immune evasion (var) genes of Plasmodium falciparum. PLOS Pathogens, 3(3), e34.
Bassett, D. S., Porter, M. A., Wymbs, N. F., Grafton, S. T., Carlson, J. M., & Mucha, P. J. (2013). Robust detection of dynamic community structure in networks. Chaos, 23(1), 013142.
Binkiewicz, N., Vogelstein, J. T., & Rohe, K. (2017). Covariate assisted spectral clustering. Biometrika, 104(2), 361–377.
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
Bothorel, C., Cruz, J. D., Magnani, M., & Micenkova, B. (2015). Clustering attributed graphs: Models, measures and methods. Network Science, 3(3), 408–444.
Brandes, U., Delling, D., Gaertler, M., Gorke, R., Hoefer, M., Nikoloski, Z., & Wagner, D. (2008). On modularity clustering. IEEE Transactions on Knowledge and Data Engineering, 20(2), 172–188.
Breiger, R. L., Boorman, S. A., & Arabie, P. (1975). An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. Journal of Mathematical Psychology, 12(3), 328–383.

Bullmore, E., & Sporns, O. (2009). Complex brain networks: Graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(3), 186–198.
Bullmore, E., & Sporns, O. (2012). The economy of brain network organization. Nature Reviews Neuroscience, 13(5), 336–349.
Cabreros, I., Abbe, E., & Tsirigos, A. (2016). Detecting community structures in Hi-C genomic data. In 2016 Annual Conference on Information Science and Systems (CISS) (pp. 584–589). doi:10.1109/CISS.2016.7460568
Cai, T. T., Li, X., et al. (2015). Robust and computationally feasible community detection in the presence of arbitrary outlier nodes. Annals of Statistics, 43(3), 1027–1059.
Centola, D. (2010). The spread of behavior in an online social network experiment. Science, 329(5996), 1194–1197.
Centola, D., & Macy, M. (2007). Complex contagions and the weakness of long ties. American Journal of Sociology, 113(3), 702–734.
Chen, Y., Sanghavi, S., & Xu, H. (2012). Clustering sparse graphs. In Advances in neural information processing systems (pp. 2204–2212).
Cole, M. W., Bassett, D. S., Power, J. D., Braver, T. S., & Petersen, S. E. (2014). Intrinsic and task-evoked network architectures of the human brain. Neuron, 83(1), 238–251.
Coleman, J. S. (1964). Introduction to mathematical sociology. London, UK: Collier-Macmillan.
Colizza, V., & Vespignani, A. (2008). Epidemic modeling in metapopulation systems with heterogeneous coupling pattern: Theory and simulations. Journal of Theoretical Biology, 251(3), 450–467.
Cranmer, S. J., Menninga, E. J., & Mucha, P. J. (2015). Kantian fractionalization predicts the conflict propensity of the international system. Proceedings of the National Academy of Sciences, 112(38), 11812–11816.
Delvenne, J.-C., Yaliraki, S. N., & Barahona, M. (2010). Stability of graph communities across time scales. Proceedings of the National Academy of Sciences, 107(29), 12755–12760.
Dietz, K. (1967). Epidemics and rumours: A survey. Journal of the Royal Statistical Society, Series A (General), 505–528.
Ding, Q., Katenka, N., Barford, P., Kolaczyk, E., & Crovella, M. (2012). Intrusion as (anti)social communication: Characterization and detection. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’12) (pp. 886–894). New York, NY: ACM. http://doi.acm.org/10.1145/2339530.2339670
Fiedler, M. (1973). Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23, 289–305.
Fienberg, S. E., & Wasserman, S. S. (1981). Categorical data analysis of single sociometric relations. Sociological Methodology, 12, 156–192.
Fisher, J. C. (2017). Exit, cohesion, and consensus: Social psychological moderators of consensus among adolescent peer groups. Social Currents, 2329496517704859.
Flake, G. W., Lawrence, S., & Giles, C. L. (2000). Efficient identification of web communities. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’00) (pp. 150–160). New York, NY: ACM.
Fornito, A., Zalesky, A., & Breakspear, M. (2015). The connectomics of brain disorders. Nature Reviews Neuroscience, 16(3), 159–172.
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3), 75–174.
Fortunato, S., & Barthelemy, M. (2007). Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104(1), 36–41.
Fortunato, S., & Hric, D. (2016). Community detection in networks: A user guide. Physics Reports, 659, 1–44.

Freeman, L. C. (2004). The development of social network analysis: A study in the sociology of science. Vancouver, Canada: Empirical Press.
Galstyan, A., & Cohen, P. (2007). Cascading dynamics in modular networks. Physical Review E, 75(3), 036109.
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826.
Gleeson, J. P. (2008). Cascades on correlated and modular random networks. Physical Review E, 77(4), 046117.
Goffman, W., & Newill, V. (1964). Generalization of epidemic theory. Nature, 204(4955), 225–228.
Good, B. H., de Montjoye, Y.-A., & Clauset, A. (2010). Performance of modularity maximization in practical contexts. Physical Review E, 81(4).
Granell, B., Gomez, S., & Arenas, A. (2011). Mesoscopic analysis of networks: Applications to exploratory analysis and data clustering. Chaos, 21(1), 016102.
Granovetter, M. (1978). Threshold models of collective behavior. American Journal of Sociology, 1420–1443.
Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 78(6), 1360–1380.
Guimera, R., Sales-Pardo, M., & Amaral, L. (2004). Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70(2).
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Berlin, Germany: Springer.
Hastings, M. B. (2006). Community detection as an inference problem. Physical Review E, 74(3), 035102.
Hewstone, M., Rubin, M., & Willis, H. (2002). Intergroup bias. Annual Review of Psychology, 53(1), 575–604.
Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109–137.
Hric, D., Darst, R. K., & Fortunato, S. (2014). Community detection in networks: Structural communities versus ground truth. Physical Review E, 90(6), 062805.
Jeub, L. G. S., Balachandran, P., Porter, M. A., Mucha, P. J., & Mahoney, M. W. (2015). Think locally, act locally: Detection of small, medium-sized, and large communities in large networks. Physical Review E, 91(1), 012821.
Karrer, B., & Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Physical Review E, 83(1), 016107.
Kernighan, B. W., & Lin, S. (1970). An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2), 291–307.
Kivela, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y., & Porter, M. A. (2014). Multilayer networks. Journal of Complex Networks, 2(3), 203–271.
Larremore, D. B., Clauset, A., & Buckee, C. O. (2013). A network approach to analyzing highly recombinant malaria parasite genes. PLoS Computational Biology, 9(10), e1003268.
Lee, S. H., Magallanes, J. M., & Porter, M. A. (2017). Time-dependent community structure in legislation cosponsorship networks in the Congress of the Republic of Peru. Journal of Complex Networks, 5(1), 127–144.
Levorato, M., & Frota, Y. (2016). Brazilian congress structural balance analysis. arXiv:1609.00767.

Li, D., Leyva, I., Almendral, J. A., Sendina-Nadal, I., Buldu, J. M., Havlin, S., & Boccaletti, S. (2008). Synchronization interfaces and overlapping communities in complex networks. Physical Review Letters, 101(16), 168701.
Macon, K. T., Mucha, P. J., & Porter, M. A. (2012). Community structure in the United Nations General Assembly. Physica A, 391(1–2), 343–361.
Mahoney, M. W., Orecchia, L., & Vishnoi, N. K. (2012). A local spectral method for graphs: With applications to improving graph partitions and exploring data graphs locally. Journal of Machine Learning Research, 13(1), 2339–2365.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.
Meila, M. (2005). Comparing clusterings: An axiomatic view. In Proceedings of the 22nd International Conference on Machine Learning (ICML ’05) (pp. 577–584). New York, NY: ACM.
Melnik, S., Porter, M. A., Mucha, P. J., & Gleeson, J. P. (2014). Dynamics on modular networks with heterogeneous correlations. Chaos, 24(2), 023106.
Melnik, S., Ward, J. A., Gleeson, J. P., & Porter, M. A. (2013). Multi-stage complex contagions. Chaos, 23(1), 013124.
Moody, J., & Mucha, P. J. (2013). Portrait of political party polarization. Network Science, 1(1), 119–121.
Moody, J., & White, D. R. (2003). Structural cohesion and embeddedness: A hierarchical concept of social groups. American Sociological Review, 68(1), 103–127.
Mucha, P. J., & Porter, M. A. (2010). Communities in multislice voting networks. Chaos, 20(4), 041108.
Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., & Onnela, J.-P. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876–878.
Nadakuditi, R. R., & Newman, M. E. J. (2012). Graph spectra and the detectability of community structure in networks. Physical Review Letters, 108, 188701.
Newman, M. E. J. (2002). Assortative mixing in networks. Physical Review Letters, 89(20), 208701.
Newman, M. E. J. (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3).
Newman, M. E. J., & Clauset, A. (2016). Structure and inference in annotated networks. Nature Communications, 7, 11863.
Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69, 026113.
Onnela, J.-P., Saramaki, J., Hyvönen, J., Szabo, G., Lazer, D., Kaski, K., . . . Barabási, A.-L. (2007). Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences, 104(18), 7332–7336.
O’Sullivan, D. J., O’Keeffe, G. J., Fennell, P. G., & Gleeson, J. P. (2015). Mathematical modeling of complex contagion on clustered networks. Frontiers in Physics, 3, 71.
Oymak, S., & Hassibi, B. (2011). Finding dense clusters via “low rank + sparse” decomposition. arXiv preprint arXiv:1104.5186.
Peel, L., Larremore, D. B., & Clauset, A. (2017). The ground truth about metadata and community detection in networks. Science Advances, 3(5), e1602548.
Peixoto, T. P. (2013). Parsimonious module inference in large networks. Physical Review Letters, 110(14), 148701.

Peixoto, T. P. (2014). Hierarchical block structures and high-resolution model selection in large networks. Physical Review X, 4(1).
Pons, P., & Latapy, M. (2005). Computing communities in large networks using random walks. Berlin, Heidelberg, Germany: Springer Berlin Heidelberg.
Porter, M. A., Friend, A. J., Mucha, P. J., & Newman, M. E. J. (2006). Community structure in the U.S. House of Representatives. Chaos, 16(4), 041106.
Porter, M. A., Mucha, P. J., Newman, M. E. J., & Warmbrand, C. M. (2005). A network analysis of committees in the US House of Representatives. Proceedings of the National Academy of Sciences, 102(20), 7057–7062.
Porter, M. A., Mucha, P. J., Newman, M. E. J., & Friend, A. J. (2007). Community structure in the United States House of Representatives. Physica A, 386(1), 414–438.
Porter, M. A., Onnela, J.-P., & Mucha, P. J. (2009). Communities in networks. Notices of the AMS, 56(9), 1082–1097, 1164–1166.
Power, J. D., Cohen, A. L., Nelson, S. M., Wig, G. S., Barnes, K. A., Church, J. A., . . . Petersen, S. E. (2011). Functional network organization of the human brain. Neuron, 72(4), 665–678.
Reichardt, J., & Bornholdt, S. (2004). Detecting fuzzy community structures in complex networks with a Potts model. Physical Review Letters, 93(21), 218701.
Reichardt, J., & Bornholdt, S. (2006). Statistical mechanics of community detection. Physical Review E, 74(1), 016110.
Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4), 1118–1123.
Rubenstein, D. I., Sundaresan, S. R., Fischhoff, I. R., Tantipathananandh, C., & Berger-Wolf, T. Y. (2015). Similar but different: Dynamic social network analysis highlights fundamental differences between the fission-fusion societies of two equid species, the Onager and Grevy’s zebra. PLoS One, 10(10), 1–21. doi:10.1371/journal.pone.0138645
Rubinov, M., & Sporns, O. (2010). Complex network measures of brain connectivity: Uses and interpretations. NeuroImage, 52(3), 1059–1069.
Schaub, M. T., Delvenne, J.-C., Rosvall, M., & Lambiotte, R. (2017). The many facets of community detection in complex networks. Applied Network Science, 2(1), 4.
Shalizi, C. R., & Thomas, A. C. (2011). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2), 211–239.
Skardal, P. S., & Restrepo, J. G. (2012). Hierarchical synchrony of phase oscillators in modular networks. Physical Review E, 85(1), 016208.
Snijders, A. T., & Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14(1), 75–100.
Sohn, Y., Choi, M.-K., Ahn, Y.-Y., Lee, J., & Jeong, J. (2011). Topological cluster analysis reveals the systemic organization of the Caenorhabditis elegans connectome. PLoS Computational Biology, 7(5), e1001139.
Spirin, V., & Mirny, L. A. (2003). Protein complexes and functional modules in molecular networks. Proceedings of the National Academy of Sciences, 100(21), 12123–12128.
Sporns, O. (2013). Structure and function of complex brain networks. Dialogues in Clinical Neuroscience, 15(3), 247–262.
Sporns, O., & Betzel, R. F. (2016). Modular brain networks. Annual Review of Psychology, 67, 613–640.
Taylor, D., Klimm, F., Harrington, H. A., Kramár, M., Mischaikow, K., Porter, M. A., & Mucha, P. J. (2015). Topological data analysis of contagion maps for examining spreading processes on networks. Nature Communications, 6, 7723.

Traud, A. L., Kelsic, E. D., Mucha, P. J., & Porter, M. A. (2011). Comparing community structure to characteristics in online collegiate social networks. SIAM Review, 53(3), 526–543.
Ugander, J., Backstrom, L., Marlow, C., & Kleinberg, J. (2012). Structural diversity in social contagion. Proceedings of the National Academy of Sciences, 109(16), 5962–5966.
Watts, D. J. (2002). A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences, 99(9), 5766–5771.
Waugh, A. S., Pei, L., Fowler, J. H., Mucha, P. J., & Porter, M. A. (2009). Party polarization in Congress: A network science approach. arXiv:0907.3509.
Weng, L., Menczer, F., & Ahn, Y.-Y. (2013). Virality prediction and community structure in social networks. Scientific Reports, 3, 2522.
Weng, L., Menczer, F., & Ahn, Y.-Y. (2014). Predicting successful memes using network and community structure. In Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media (ICWSM).
White, H. C., Boorman, S. A., & Breiger, R. L. (1976). Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81(4), 730–780.
Wikipedia. (2017a). Civil Rights Act of 1957. http://en.wikipedia.org/w/index.php?title=Civil_Rights_Act_of_1957. Page Version ID: 761682759.
Wikipedia. (2017b). Political parties in the United States. https://en.wikipedia.org/w/index.php?title=Political_parties_in_the_United_States. Page Version ID: 768751459.
Wu, F. Y. (1982). The Potts model. Reviews of Modern Physics, 54(1), 235–268.
Yang, J., McAuley, J., & Leskovec, J. (2013). Community detection in networks with node attributes. In 2013 IEEE 13th International Conference on Data Mining (pp. 1151–1156). IEEE.
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4), 452–473.
Zhang, Y., Friend, A. J., Traud, A. L., Porter, M. A., Fowler, J. H., & Mucha, P. J. (2007). Community structure in congressional cosponsorship networks. Physica A, 387(7), 1705–1712.
Zhou, H. (2003). Network landscape from a Brownian particle’s perspective. Physical Review E, 67(4), 041908.

Chapter 17

Three Perspectives on Centrality

Stephen P. Borgatti and Martin G. Everett

Social network analysis is blessed with a large number of measures of centrality. We know these are node-level measures and that they are in a very general sense about measuring node importance. But beyond that it isn’t easy to define what counts as a centrality measure and what doesn’t. Part of the problem, of course, is that people can call anything they want “centrality,” and they do. They can also use much more fanciful names instead of centrality, such as influence, prominence, status, power, and so on, which they also do, even though it is an empirical question how a graph-theoretic concept like centrality relates to these variables. Finally, it is an evolving category, so things that today we might not call centrality still carry the name.

This chapter is about three perspectives on centrality, which we refer to as the “walk structure perspective,” the “induced centrality perspective,” and the “flow outcomes perspective.” They are presented not as competing frameworks (the first is so general as to easily coexist with any other) but as conceptual tools that provide insight into centrality and perhaps into the enterprise of social network analysis generally. Before getting into these, however, a few prefatory notes are in order.

It is often stated that centrality measures are structural measures. The idea here is to distinguish a centrality measure from, say, a node’s wealth or social class. The measures are constructed from the network (i.e., “endogenous”), and not from qualities of the actors that one would need to look beyond the pattern of ties to obtain (i.e., “exogenous”). This is a useful distinction but overstated. It is not that one cannot incorporate financial status into a centrality measure, but that a centrality measure must incorporate ties as a key feature.
So, for example, in a study of organizations, we might define a variant of degree centrality (i.e., the number of ties a node has) by differentially weighting ties by the organizational rank of the person being connected to. Thus, my ties to high-ranking managers count more than my ties to ordinary workers. Such a measure incorporates both a structural aspect (the ties) and an exogenous attribute of the nodes (their rank in the organization). This is a legitimate and, indeed, useful measure of centrality.

It is also worth noting that centrality concepts are universally described as “measures.” This causes all kinds of confusion. For most social scientists, the word measure indicates that centrality measures fall under methodology and are best described in the methods section of an empirical paper. Thus, the hypothesis in the main section of the paper is going to be something like “More central managers will be more likely to receive promotion in the next year.” The specific centrality concept is left unstated. What these researchers don’t understand is that “centrality” is just a family of concepts on par with “demographics” or “personality traits.” Who writes a hypothesis like “Demographics will be associated with postsurgical survival”? No one would accept this because it is understood that “demographics” is just a category of constructs. Rather, with personality, a theory would argue, with appropriate reasoning, that a specific personality characteristic, such as neuroticism, should contribute to survival. That trait would be part of the hypothesis, as in “Higher levels of neuroticism will be associated with reduced five-year survival rates.” How neuroticism is actually measured is left for the methods section, because there are multiple ways to measure the same psychological construct. Similarly, a good network hypothesis would state that, say, closeness centrality will be associated with being perceived to be a good source of gossip. The methods section would then discuss exactly how closeness was being calculated (and there are indeed multiple ways).

Finally, it should be mentioned that there have been some attempts to define centrality measures, or at least to lay down characteristics of “well formedness.” For example, Sabidussi (1966) suggested five properties that every centrality measure should observe. These were things like adding a tie to a node should always increase its centrality and adding a tie anywhere else in the network should never reduce a node’s centrality. Unfortunately, the only measure to pass all five criteria was his own, a form of closeness centrality.
Freeman (1978/1979) argued that any centrality measure worth its salt should achieve its highest value with the node at the center of a star graph. Specifically, if we consider all possible graphs of a given size (defined by number of nodes), no node in any graph should get a higher score than the node at the center of a star graph. This is an appealing criterion but does leave out measures, such as the PN measure of Everett and Borgatti (2014), that intuitively seem as qualified to be called centrality measures as any other.

The one property that does make sense to require all purely structural centrality measures to obey is the isomorphism rule: if two nodes are isomorphic, they must receive the same score. In a sense, this is what we mean by “structural”: if the values are calculated entirely from the pattern of ties in the network, structurally identical nodes will necessarily receive the same scores. Of course, if the measure is a hybrid that deliberately incorporates an exogenous node attribute such as personality, then we simply modify the isomorphism rule correspondingly: if two nodes are isomorphic in the network and have the same personality, then they must receive the same score.¹

Schoch and Brandes (2016) have examined a simple property that is shared by nearly all centrality measures. The open neighborhood of a node v is the subgraph induced by all the nodes adjacent to v and is denoted by $N(v)$. That is, $N(v)$ is the graph that consists of all nodes adjacent to v and all the edges in the original graph that connect these vertices but does not include v itself. If we include v, it is called the closed neighborhood and is denoted by $N[v]$. A node u is said to dominate a node v if $N[u] \supseteq N(v)$, meaning that the open neighborhood of v is included in the closed neighborhood of u. A centrality measure c is said to be neighborhood preserving if whenever u dominates v, then $c(u) \geq c(v)$.
This implies that, in a graph where some pairs of nodes satisfy neighborhood inclusion, the relative centrality ranking of those pairs is predetermined. Of the centrality measures commonly used in social networks, the only one that does not obey this property is Bonacich’s (1987) beta centrality (also known as Bonacich power), but only when beta is negative. It should be noted that there exist graphs for which every pair of nodes satisfies neighborhood inclusion, and so the ranking of nodes by almost every centrality measure is completely determined regardless of which measure is used. On the other hand, it is possible to have graphs for which no pairs of nodes satisfy the inclusion property, in which case the centrality rankings across measures can be quite independent.

In reading this chapter, one theme that the reader might keep in mind is the tension between using standard measures that the field as a whole has a great deal of experience using and constructing new measures tailor-made for a specific application. The downside of the former is that no off-the-shelf measure may be quite right for a given application, while the downside of the latter is that the new measure has not been vetted by the community and could have serious flaws. Moreover, consumers of research utilizing new measures may wonder if the measure was somehow ginned up specifically because it would yield significant results for the data in question, a kind of overfitting problem.
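The neighborhood-inclusion property described above can be checked mechanically. The following is a minimal sketch, assuming an undirected graph stored as a dictionary of neighbor sets; the helper names `dominates` and `is_neighborhood_preserving` and the small star graph are hypothetical illustrations, not taken from Schoch and Brandes (2016).

```python
from itertools import permutations


def dominates(adj, u, v):
    """Return True when N[u] contains N(v), i.e. u dominates v."""
    closed_u = adj[u] | {u}      # closed neighborhood N[u]
    return adj[v] <= closed_u    # open neighborhood N(v) is a subset of N[u]


def is_neighborhood_preserving(adj, score):
    """Check that score[u] >= score[v] for every ordered pair where u dominates v."""
    return all(
        score[u] >= score[v]
        for u, v in permutations(adj, 2)
        if dominates(adj, u, v)
    )


# Hypothetical star graph: center "c" tied to leaves "x", "y", "z".
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
degree = {node: len(neighbors) for node, neighbors in star.items()}

print(dominates(star, "c", "x"))                  # the center dominates a leaf
print(dominates(star, "x", "c"))                  # a leaf does not dominate the center
print(dominates(star, "x", "y"))                  # leaves dominate one another
print(is_neighborhood_preserving(star, degree))   # degree respects the ordering here
```

In this star every pair of nodes satisfies inclusion in at least one direction, so it is an instance of a graph whose ranking is fixed in advance for any neighborhood-preserving measure.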

The Walk Structure Perspective

In social network analysis, our central object is the graph: a set of nodes together with a set of dyadic ties² that connect them. The definition makes it sound like the ties are independent of each other, just a big bucket of connected dyads, like a sample of mother-daughter pairs. But what makes networks interesting is that sets of dyads interlink by sharing nodes, forming paths through the network. Because of paths, almost any node can affect any other, creating what otherwise might be seen as “spooky action at a distance,” as Einstein described quantum mechanics. The crux of the walk structure perspective is that some nodes will, because of the structure of paths, be more or less influential in the network as a whole.

But let us back up and discuss paths more carefully. A path is a kind of traversal through a network that begins at a node, ends at a node, and moves only between adjacent (i.e., tied) nodes. By convention, the length of a traversal is defined as the number of links traversed. For simplicity, we assume that ties have no direction so that the A-B tie is the same object as the B-A tie. But the term path has a very specific meaning in graph theory, namely a traversal that never repeats a node. Hence the sequence A-B-C-D-E is a path, but the sequence A-B-C-D-B-E is not. A traversal that allows revisiting of nodes (but not links) is a trail. The sequence A-B-C-D-B-E is a trail, but A-B-C-D-B-C-E is not, because it repeats the B-to-C move. An unrestricted traversal, which can repeat nodes and links, is a walk. The sequence A-B-C-D-B-C-E is a walk, as is the sequence A-B-A-B-A-B-C-B, etc. Obviously walks can be infinite in length. Also obviously, trails are a kind of walk, and paths are a kind of trail. Among paths between a given pair of nodes, the shortest path is called a geodesic. Note there can be multiple geodesics linking the same two nodes.
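The distinctions among paths, trails, and walks can be made concrete with a short classifier. This is a minimal sketch: the function name `classify` and the set of ties are hypothetical, chosen only so that the example sequences from the text are valid traversals.

```python
def classify(sequence, ties):
    """Classify a node sequence on an undirected graph as 'path', 'trail', or 'walk'.

    Returns None if some consecutive pair of nodes is not tied.
    """
    steps = [frozenset(pair) for pair in zip(sequence, sequence[1:])]
    if not steps or any(step not in ties for step in steps):
        return None                       # not a traversal of this graph
    if len(set(sequence)) == len(sequence):
        return "path"                     # no node is repeated
    if len(set(steps)) == len(steps):
        return "trail"                    # nodes repeat, but no link does
    return "walk"                         # links are repeated as well


# Hypothetical undirected ties that make the sequences in the text traversable.
ties = {frozenset(t) for t in ("AB", "BC", "CD", "DE", "DB", "BE", "CE")}

for seq in ("ABCDE", "ABCDBE", "ABCDBCE", "ABABABCB"):
    # The length of a traversal is the number of links traversed.
    print(seq, classify(seq, ties), "length", len(seq) - 1)
```

Because the checks run from most to least restrictive, each sequence is labeled with the narrowest category it satisfies, which mirrors the nesting of paths within trails within walks.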
The walk structure perspective says that what a centrality measure measures is the involvement of a node in the traversal structure of a graph (which is to say the walk structure, since walks are the most general kind of traversals). The more involvement, the more central.

How measures differ is in what kinds of traversals they pay attention to (unrestricted walks, trails, paths, geodesics) and what property of the traversals they focus on (e.g., their number or their length). The perspective also pays attention to the role of the node in these traversals. Are they starting/ending points? Or interior nodes in the sequence? We shall refer to measures focusing on the former as radial measures and measures focusing on the latter as medial measures.

To illustrate, let us look at various measures of centrality in light of these different traversals, starting with degree. Degree is just the number of ties a node has. For our purposes, it is useful to think of a tie as a path (indeed, a geodesic) of length 1. So degree centrality counts the number of paths of length 1 that emanate from (or terminate at) a node. Note that degree is a radial measure: we are counting paths that emanate from (or terminate at) a node.

Another well-known measure is closeness. Sabidussi (1966) and Freeman (1978/1979) both define closeness as the sum of graph-theoretic distances from a node to all others, where distance is defined as the length of a geodesic connecting two nodes. While the sum of distances is simple, it is more intuitive to think in terms of the average distance. More central nodes are closer, on average, to all others than peripheral nodes. Since closeness is an inverse measure of centrality (larger numbers indicate less centrality), a popular alternative is reciprocal closeness. Whereas classical average closeness for node i can be defined as $(n-1)^{-1} \sum_j d_{ij}$, where $d_{ij}$ is the distance from i to j, reciprocal closeness averages the reciprocals of distances: $(n-1)^{-1} \sum_j d_{ij}^{-1}$. Both measures of closeness are radial measures like degree but differ from degree in that, instead of counting qualified paths, they assess their typical length.

This distinction is important, but we must be careful to understand what it is not. Both degree and closeness make use of length, but in different ways. Degree uses length to bound the set of paths to be considered (specifically, paths of length 1), but then counts them rather than assesses their length. Consider the many obvious variants of degree. As early as 1966, Donald Sade proposed the k-reach measure, which counts the number of paths (emanating from a node) of length less than or equal to k, a parameter to be determined by the researcher. Or we could count all of the geodesics, regardless of length, that emanate from a node. In both cases, length is used to identify suitable paths, but the property measured is the number or frequency of qualifying paths.³

Agneessens, Borgatti, and Everett (2017) offer a variant of reciprocal closeness that yields a weighted average of distances. This is accomplished by adding a negative exponent to the distances: when the exponent is large, the measure weights short distances more than long ones (see Equation 1). When the exponent is 1, the measure reduces to reciprocal closeness. Of interest in this context is that, while the measure is clearly about assessing typical length rather than counting paths, Agneessens et al. note that when the exponent is set to infinity, the formula yields degree centrality. The point is that our distinction between length and count measures can break down in special cases (see also Brandes, Borgatti and Freeman, 2016).

$$c_i(\delta) = \frac{\sum_j d_{ij}^{-\delta}}{n-1} \tag{1}$$
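The behavior of Equation 1 at different exponents can be checked numerically. The sketch below assumes an unweighted, undirected graph stored as a dictionary of neighbor sets and uses breadth-first search for geodesic distances; the function names and the five-node line graph are hypothetical, and unreachable nodes are simply treated as contributing zero.

```python
from collections import deque


def distances_from(adj, source):
    """Geodesic distances from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_distance(adj, i):
    """Classical average closeness: (n - 1)^-1 * sum_j d_ij (smaller is more central)."""
    dist = distances_from(adj, i)
    return sum(d for j, d in dist.items() if j != i) / (len(adj) - 1)


def generalized_closeness(adj, i, delta):
    """Equation 1: sum_j d_ij^(-delta) / (n - 1)."""
    dist = distances_from(adj, i)
    return sum(d ** -delta for j, d in dist.items() if j != i) / (len(adj) - 1)


# Hypothetical line graph A-B-C-D-E.
line = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}

for node in ("A", "C"):
    print(
        node,
        mean_distance(line, node),              # average distance
        generalized_closeness(line, node, 1),   # delta = 1: reciprocal closeness
        generalized_closeness(line, node, 60),  # large delta: approaches degree / (n - 1)
    )
```

With a large exponent every distance greater than 1 contributes almost nothing, so only direct ties count and the score approaches degree divided by n - 1, which is the limiting case noted by Agneessens et al.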

Betweenness centrality is defined by the formula in Equation 2 (Freeman, 1978/1979). In the formula, $g_{ikj}$ is the number of geodesic paths from i to j that pass through k. The summation is across all possible pairs i and j.

$$b_k = \sum_{i,j} \frac{g_{ikj}}{g_{ij}} \tag{2}$$

Ignoring the denominator inside the sum, the measure counts the number of shortest paths from anywhere to anywhere that pass through k. In short, the measure defines geodesics as the traversals of interest, and then looks at how often k is an interior node in these traversals. It is a medial measure of centrality, and the property measured is frequency. Of course, the actual formula divides $g_{ikj}$ by $g_{ij}$, which is the total number of geodesics from i to j. So a more accurate description of the measure is that it sums the share of best paths that k sits on. It counts all of the geodesics that include k as an interior node, but it discounts those for which there is an alternative geodesic between the same pair. Other betweenness measures have been devised that don’t rely on geodesics. For example, Newman (2005) proposes random walk betweenness, which counts all walks, rather than just geodesics.
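Equation 2 can be evaluated directly from geodesic counts. The following is a minimal, unoptimized sketch for small undirected graphs, summing over unordered pairs; it relies on the fact that k lies on a geodesic from i to j exactly when d(i,k) + d(k,j) = d(i,j). The function names and the four-node cycle are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Distances from source and the number of geodesics to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]   # geodesics arriving via node
    return dist, sigma


def betweenness(adj):
    """Equation 2: for each k, sum over pairs i, j of g_ikj / g_ij."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {k: 0.0 for k in adj}
    for i, j in combinations(adj, 2):
        dist_i, sigma_i = info[i]
        if j not in dist_i:
            continue                             # i and j are not connected
        for k in adj:
            if k in (i, j) or k not in dist_i:
                continue
            dist_k, sigma_k = info[k]
            if j in dist_k and dist_i[k] + dist_k[j] == dist_i[j]:
                # g_ikj = (geodesics from i to k) * (geodesics from k to j)
                score[k] += sigma_i[k] * sigma_k[j] / sigma_i[j]
    return score


# Hypothetical four-node cycle A-B-C-D-A: opposite corners have two geodesics.
square = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
print(betweenness(square))
```

In the cycle, each node sits on one of the two geodesics between its two neighbors, so it receives a share of one half rather than a full count, which illustrates the discounting role of the denominator.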

An interesting betweenness measure is flow betweenness (Freeman, Borgatti, & White, 1991). For nonvalued graphs, the measure attempts to count the number of edge-independent paths between all pairs of nodes that pass through a given node. This measure introduces a new descriptive element in characterizing nodes’ involvement in the traversals of a graph. The individual paths being counted are true paths in that nodes are not repeated. However, the unit of analysis is the set of paths from a source node to a target, which are required to be edge independent.⁴ Thus, the subset of paths used for flow betweenness have the property that they do not share edges. In addition, there is no requirement that the paths be geodesics. One reason for considering whole sets of paths from one node to another is to model traversals in which something flowing through a network may be in multiple places at once. This happens when what is flowing is copied rather than transferred across ties. Thus, multiple paths are followed at the same time. This issue is discussed further in the section on the third perspective.

Beta centrality, also known as Bonacich power (Bonacich, 1987; Bonacich & Lloyd, 2001), is defined in Equation 3, where R is the adjacency matrix of the graph, and $\mathbf{1}$ is a column vector of ones so that $R\mathbf{1}$ is just the row sums of R. The measure draws on the earlier work of Katz (1953) and Hubbell (1965). Beta centrality is just Katz’s measure divided by β, and Hubbell’s measure is Katz’s plus 1.

$$b = (I - \beta R)^{-1} R \mathbf{1} \tag{3}$$

Note that $(I - \beta R)^{-1}$ is just a transformation of the data matrix R. Under the right conditions, $(I - \beta R)^{-1}$ can be written as the infinite sum shown in Equation 4. Following Equation 3, we can then multiply the infinite sum by R and get a matrix we will call W, as shown in Equation 5.

$$(I - \beta R)^{-1} = \beta^0 R^0 + \beta^1 R^1 + \beta^2 R^2 + \cdots \tag{4}$$

$$W = (I - \beta R)^{-1} R = \beta^0 R^1 + \beta^1 R^2 + \beta^2 R^3 + \cdots \tag{5}$$

The matrix W is revealing. Ignoring the betas, it is a sum of powers of the adjacency matrix. These have a meaning. The (i,j)th cell of the $R^1$ matrix (which is just the adjacency matrix) gives the number of walks of exactly length 1 from i to j. The $R^2$ matrix gives the number of walks of exactly length 2. The $R^3$ matrix gives the number of walks of exactly length 3, and so on. This means that, summed up, W gives the total number of walks of all lengths from every node to every other, and if we take the row sums, beta centrality is then a frequency measure of the radial type.⁵

Of course, without the betas, every cell of W would be infinity, since walks can be of infinite length. So let’s consider the betas. For clarity of exposition, let’s temporarily assume that beta is less than 1, say 0.5. Then walks of length 1 will be weighted $0.5^0 = 1$, which means they are given full weight. Walks of length 2 will be weighted $0.5^1 = 0.5$, which means they will be weighted only half as much as walks of length 1. The $R^3$ matrix will be weighted $0.5^2$ or 0.25, and so on. Walks of even the modest length of 10 will be weighted only $0.5^9 \approx 0.00195$, and walks of infinite length will be weighted 0. Thus, beta centrality counts the number of walks of all lengths emanating from a node, weighted inversely by their length.

Now, must beta be less than 1 for this to work? No. For the infinite sum to converge, the absolute value of beta must be less than 1/λ, where λ is the largest eigenvalue of R. But λ can always be rescaled to be 1 by simply dividing every value in R by λ. For example, if the largest eigenvalue of a matrix is 45, we can divide the cells of the matrix by 45, and the λ of the new matrix will be 1. As a result, if we prefer to think of beta as a fraction like 0.5, we can do so by rescaling the matrix so that λ is 1.

It is useful to observe that when beta is zero, beta centrality is just degree. We can see in Equation 3 that if beta is zero, the equation reduces to $R\mathbf{1}$, which is simply degree. As beta increases, longer walks are included, but weighted by (shortness of) length.
Therefore, we can think of this as a radial frequency measure that counts walks weighted (rather than selected) by length. When the absolute value of beta approaches 1/λ closely (from below), the resulting centrality scores become indistinguishable from eigenvector centrality. The formula for an eigenvector is shown in Equation 6, where v is the eigenvector. Rewriting it as Equation 7 makes it clear that, with eigenvector centrality, a node’s score is proportional to the sum of the scores of its neighbors, and 1/λ is the proportionality constant.

$$Rv = \lambda v \tag{6}$$

$$v_i = \frac{1}{\lambda} \sum_j r_{ij} v_j \tag{7}$$
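The relationship among degree, beta centrality, and eigenvector centrality can be illustrated numerically. This is a sketch under stated assumptions: beta centrality is computed by truncating the series in Equation 5 (rather than inverting the matrix), the eigenvector is found by simple power iteration, and the four-node graph (a triangle with one pendant) is hypothetical. The function names are illustrative only.

```python
def mat_vec(R, x):
    """Multiply the square matrix R (list of rows) by the vector x."""
    return [sum(r * xj for r, xj in zip(row, x)) for row in R]


def beta_centrality(R, beta, terms=3000):
    """Row sums of W: beta^0 R^1 1 + beta^1 R^2 1 + ... (truncated series)."""
    term = mat_vec(R, [1.0] * len(R))      # R1: walks of length 1, i.e. degree
    total = term[:]
    for _ in range(terms - 1):
        term = [beta * t for t in mat_vec(R, term)]   # next power, one more beta
        total = [a + b for a, b in zip(total, term)]
    return total


def largest_eigenpair(R, iterations=1000):
    """Power iteration; assumes a connected, non-bipartite graph so it converges."""
    v = [1.0] * len(R)
    lam = 0.0
    for _ in range(iterations):
        w = mat_vec(R, v)
        lam = max(abs(x) for x in w)       # v is scaled so its largest entry is 1
        v = [x / lam for x in w]
    return lam, v


def rescale(x):
    """Scale a vector so that its largest entry equals 1."""
    top = max(x)
    return [xi / top for xi in x]


# Hypothetical graph: triangle 0-1-2 plus pendant node 3 tied to node 0.
R = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

lam, eigenvector = largest_eigenpair(R)
print(beta_centrality(R, 0.0))                      # beta = 0 reproduces degree
print(rescale(beta_centrality(R, 0.99 / lam)))      # beta just below 1 / lambda
print(eigenvector)                                  # compare with the line above
```

The series only converges when the absolute value of beta is below 1/λ, so the sketch picks beta as a fraction of 1/λ rather than rescaling R; the two approaches are equivalent in effect.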

More measures could be discussed, but it seems reasonable to stop here and present a three-dimensional typology of centrality measures (Borgatti & Everett, 2006). The dimensions are clear. The first dimension is what kind of traversals the measure pays attention to. The choices discussed here have been geodesics, paths, trails, and unrestricted walks. The second dimension is the node’s position along these traversals: is it an endpoint⁶ or an interior node? We have referred to measures relying on the former as radial and measures relying on the latter as medial. The third dimension is what property of traversals we are assessing: their frequency or their length? This is perhaps the fuzziest of the dimensions, since many measures appear to take account of both. However, there is a subtle distinction between assessing the typical length of a path, as in closeness or information centrality (Stephenson & Zelen, 1989), and using length to bound the set of paths to be counted, as in degree and betweenness. We see the latter as frequency measures.

The beta centrality family of measures is harder to classify. Effectively, the scores are the sum of products of the number of walks and (an inverse function of) the lengths of the walks. This is an ambiguous number. If normalized by the number of walks, we would call it the average (inverse) length of walks emanating from a node, weighted by frequency. If normalized by the sum of inverse lengths, we would call it the average number of walks, weighted by inverse length. Since we don’t normalize by either, the nature of the quantity is indeterminate. Table 17.1 presents the typology, with the beta centrality measures occupying a mixed category of length and frequency. The table includes some measures that have not been discussed here.

Table 17.1  Three-Dimensional Typology of Centrality Measures

Radial measures

| Property | Geodesics | Paths | Trails | Walks |
|---|---|---|---|---|
| Length | Closeness (a), ARD (b), generalized closeness (c) | Information (d) | | IEC (k), Markov FPT (e) |
| Length × Frequency | | | | Katz, Hubbell, beta (f), PN (g), eigenvector, TEC (k) |
| Frequency | Degree, PII (h) | k-Reach | GPI (i) | |

Medial measures

| Property | Geodesics | Paths | Trails | Walks |
|---|---|---|---|---|
| Length | | | | |
| Length × Frequency | | | | |
| Frequency | Betweenness, generalized betweennesses (j) | Flow betweenness | | RWB (l), MEC (k) |

(a) Freeman (1978/1979). (b) Average reciprocal distance. (c) Agneessens et al. (2017). (d) Stephenson & Zelen (1989). (e) First passage time. (f) Bonacich (1987). (g) Everett and Borgatti (2014). (h) Smith et al. (2014). (i) Markovsky, Willer, and Patton (1988). (j) Brandes (2008). (k) Friedkin (1991). (l) Newman (2005).

The Contribution/Induced Centrality Perspective

The contribution perspective follows directly from the walk structure perspective presented in the last section but is in a certain sense more disciplined. The idea is to quantify the extent to which a node (or set of nodes) contributes to a notable characteristic of the graph as a whole. To do this, we identify a structural characteristic of the network as a whole (known as a graph invariant), then remove a given node, recalculate the graph invariant, and take the difference. This difference captures the node’s contribution to the graph invariant and is taken to be its centrality score. We do the same thing for each node in turn to calculate every node’s centrality. Centralities calculated in this way are known as induced centralities (Everett & Borgatti, 2010) or vitalities (Koschützki et al., 2005).
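The remove-recalculate-difference recipe can be written as one generic function that takes the graph invariant as a parameter. The sketch below is illustrative only: the function names and the small graph are hypothetical, and the invariant chosen (the number of edges) is the simplest case, for which the induced score of a node equals its degree.

```python
def remove_node(adj, node):
    """Return a copy of the graph with the node and its ties deleted."""
    return {u: neighbors - {node} for u, neighbors in adj.items() if u != node}


def induced_centrality(adj, invariant):
    """Score each node by how much the graph invariant drops when it is removed."""
    whole = invariant(adj)
    return {node: whole - invariant(remove_node(adj, node)) for node in adj}


def edge_count(adj):
    """A simple graph invariant: the number of undirected edges."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2


# Hypothetical graph: triangle a-b-c with pendant d tied to b.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

scores = induced_centrality(graph, edge_count)
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
print(scores)    # contribution of each node to the edge count
print(degrees)   # identical to the scores above
```

Swapping `edge_count` for any other function of the graph yields a different induced centrality without changing the surrounding code, which is the sense in which the invariant acts as a user-selected parameter.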

A well-known example of an induced centrality is flow centrality (Freeman et al., 1991). The idea behind flow centrality is that a network is a system of pipe-like channels through which things flow. The valued adjacency matrix gives the capacities of all the pipes. The classical problem, posed by Ford and Fulkerson (1956), is: given the selection of a source s and a target t, what is the maximum amount of stuff that can flow from s to t? For graphs with unit capacities on all pipes, the maximum flow can be shown to equal the number of edge-independent paths from s to t, where a set of edge-independent paths is a set in which no two paths share a link. The idea of flow centrality is to measure how much of the flow from s to t (and, indeed, from any node to any node) passes through a given node k. Because flow solutions are not necessarily unique, the method Freeman et al. devised to calculate this flow was an induced centrality. First they calculated maximum flows from every node to every other, summing these up. Then they removed a node from the network and recalculated the sum of maximum flows. The difference between these sums was interpreted as the amount of flow passing through that node. In hindsight, it is not clear that the method exactly captures the conceptualization, but it is in any case a clear example of an induced centrality.

It should be obvious that degree centrality is an induced centrality. If we take as the graph invariant the number of edges in the graph and now remove a node, the change in the number of edges in the graph is exactly the number of edges incident on the node, which is to say, its degree. However, it can be shown that not all accepted measures of centrality are induced centralities. Everett and Borgatti (2010) present the case of the graph in Figure 17.1 (left).
Let us assume we will be measuring the centrality of nodes a and b, in which case we will be removing them, one at a time, from the graph. The graphs without a and without b are shown in the middle and right-hand sections of Figure 17.1. Note that these two graphs are isomorphic. Whatever graph invariant we calculate on them, the result will be exactly the same, which means that a and b will have the same induced centrality score for any invariant we can imagine. Interestingly, when we calculate off-the-shelf measures of centrality on this graph, we find that all measures give identical scores to a and b except for closeness measures. For example, the standard Freeman closeness centrality gives b a slightly better score. So does information centrality (Stephenson & Zelen, 1989), as well as three-step reach. As a result, none of the closeness measures can be conceptualized as induced centralities.

Figure 17.1  Removing node a or node b yields identical residual graphs, but nodes a and b have different closeness scores. The panels show G and the residual graphs G-{a} and G-{b}.

This result has implications for the question raised earlier about what properties a centrality measure must conform to. Given the principle that isomorphic nodes must have the same score, we might have been tempted to make a stronger statement, namely that any centrality measure must give the same score to two nodes whose removal yields isomorphic residual graphs. Clearly, this fails for closeness. The reverse fails as well: if two nodes have the same centrality score, it is not necessarily the case that their residual graphs G-{a} and G-{b} are isomorphic.

An interesting set of induced centralities can be built on conventional centralities. For example, suppose we take betweenness centrality and calculate it for every node.⁷ Now we sum the betweennesses to obtain a single number for the entire graph, a graph invariant. Suppose we use this to define an induced centrality. It might seem like the results would be trivial: each node contributed their betweenness centrality to the sum, and so removing the node, recalculating every remaining node’s betweenness, and recalculating the sum would perhaps subtract from this sum precisely that node’s betweenness, just like degree. But it’s not so. Consider the network of marriage ties (Padgett & Ansell, 1993) shown in Figure 17.2. Notice that part of the Medici’s betweenness comes from enabling Pazzi and Salviati to connect to the rest of the network. But more than that, the Medici’s presence gives Salviati betweenness, because the Salviati family connect Pazzi to the Medici, and
from there, the world. So, removing the Medici doesn’t just remove their direct contribution to the sum of betweennesses, but also Salviati’s contribution.

Figure 17.2  Marriage ties among Florentine families during the Renaissance. Annotations in the figure: Salviati give more than they receive; Ridolfi actually reduce the betweenness of other nodes; Lamberteschi have no betweenness of their own, but contribute to that of Guadagni.

Thus, when the graph invariant is defined as the sum of betweennesses, the induced centrality can be decomposed into two parts: the direct contribution of a node’s own betweenness and the indirect contribution of increasing others’ betweenness. Table 17.2 shows, for each node, the induced or total contribution, their direct betweenness, and the induced minus the direct, which we label “indirect.”

Table 17.2  Direct and Indirect Contributions to the Total Betweenness of the Network

| Family | Total | Direct | Indirect |
|---|---|---|---|
| Medici | 73 | 48 | 25 |
| Guadagni | 38 | 23 | 15 |
| Albizzi | 43 | 19 | 24 |
| Salviati | 57 | 13 | 44 |
| Ridolfi | 9 | 10 | –1 |
| Bischeri | 11 | 10 | 1 |
| Strozzi | 14 | 9 | 5 |
| Barbadori | 14 | 9 | 5 |
| Tornabuoni | 13 | 8 | 5 |
| Castellani | 18 | 5 | 13 |
| Peruzzi | 24 | 2 | 22 |
| Pazzi | 35 | 0 | 35 |
| Ginori | 28 | 0 | 28 |
| Acciaiuoli | 24 | 0 | 24 |
| Lamberteschi | 29 | 0 | 29 |
| Pucci | 0 | 0 | 0 |

It is interesting to compare node profiles on their direct and indirect contributions. The Medici make a strong indirect contribution, but it is small relative to other nodes. Several nodes, from Pazzi down to Lamberteschi, make no direct contribution of their own, but make strong indirect contributions. That is, they have no betweenness themselves, but they enable other nodes to have betweenness. Pendants (nodes of degree 1) nearly always have this quality. Perhaps the most interesting node in the network is the Ridolfi family. They have a direct betweenness score of 10, but when removed from the network we find that their contribution to the sum of all betweenness is 9, meaning that their indirect contribution is actually negative. In other words, their presence reduces the betweenness of the nodes around them, unlike every other node in the graph.

There are a number of appealing things about induced centralities. One is that we needn’t limit ourselves to measuring the centrality of individual nodes. We can do groups of nodes. For example, to measure the centrality of the marketing department in an organization, we can simply delete the whole department and see how much the graph invariant changes. Moreover, following Borgatti (2003, 2006), we can combine the induced centrality with a
combinatorial optimization algorithm to identify optimal sets of nodes that, as an ensemble, contribute the most to the graph invariant.

Another advantage of induced centralities is that we can work with two-mode networks just as comfortably as one-mode networks. It simply doesn’t matter what kind of network we calculate the graph invariant on. In the two-mode Davis, Gardner, and Gardner (1941) data, we can use this approach to measure the centrality of each person and each event, although we might still be interested in separate normalizations for each mode, à la Borgatti and Everett (1997).

Yet another advantage is that induced centralities can be calculated on edges instead of nodes: we simply calculate the invariant, remove an edge, recalculate the invariant, and call the difference the centrality of that edge. It is its contribution to the network characteristic as a whole.

Finally, we can generalize the induced centrality approach to make use of multiple graph invariants. For instance, suppose we have different invariants that we think capture different aspects of the cohesion of a graph. We want to measure each node’s contribution to this multivariate construct of cohesion. An obvious approach is to take the sum of squared differences between the vector of invariants for the graph as a whole and the vector derived from the graph with the node removed.

Above all, though, the great benefit of induced centralities is that they can be generated on the fly, enabling researchers to use a measure of centrality that is precisely appropriate for a given research context. We needn’t limit ourselves to off-the-shelf measures that may have been designed with assumptions and purposes very different from the current situation. At the same time, because induced centralities are all constructed in the same way, we needn’t treat a measure based on a different invariant as an entirely new measure with unknown properties.
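As an illustration of building an induced centrality on the fly, the decomposition into direct and indirect betweenness contributions discussed earlier can be sketched as follows. The tie list is an assumption: it is the version of the Padgett marriage data commonly distributed with network software (with the isolate Pucci omitted), not something given in the chapter, and the function names are hypothetical. Betweenness is summed over unordered pairs.

```python
from collections import deque
from itertools import combinations

# Assumed tie list for the Florentine marriage network (isolate Pucci omitted).
TIES = [
    ("Acciaiuoli", "Medici"), ("Castellani", "Peruzzi"), ("Castellani", "Strozzi"),
    ("Castellani", "Barbadori"), ("Medici", "Barbadori"), ("Medici", "Ridolfi"),
    ("Medici", "Tornabuoni"), ("Medici", "Albizzi"), ("Medici", "Salviati"),
    ("Salviati", "Pazzi"), ("Peruzzi", "Strozzi"), ("Peruzzi", "Bischeri"),
    ("Strozzi", "Ridolfi"), ("Strozzi", "Bischeri"), ("Ridolfi", "Tornabuoni"),
    ("Tornabuoni", "Guadagni"), ("Albizzi", "Ginori"), ("Albizzi", "Guadagni"),
    ("Bischeri", "Guadagni"), ("Guadagni", "Lamberteschi"),
]


def build(ties):
    """Turn a list of undirected ties into a dictionary of neighbor sets."""
    adj = {}
    for u, v in ties:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, source):
    """Distances from source and number of geodesics to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]
    return dist, sigma


def betweenness(adj):
    """Share of geodesics between each pair that pass through each node."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {k: 0.0 for k in adj}
    for i, j in combinations(adj, 2):
        dist_i, sigma_i = info[i]
        if j not in dist_i:
            continue
        for k in adj:
            if k in (i, j) or k not in dist_i:
                continue
            dist_k, sigma_k = info[k]
            if j in dist_k and dist_i[k] + dist_k[j] == dist_i[j]:
                score[k] += sigma_i[k] * sigma_k[j] / sigma_i[j]
    return score


def remove_node(adj, node):
    return {u: neighbors - {node} for u, neighbors in adj.items() if u != node}


families = build(TIES)
direct = betweenness(families)
whole_sum = sum(direct.values())          # graph invariant: sum of betweennesses

rows = {}
for family in families:
    residual_sum = sum(betweenness(remove_node(families, family)).values())
    total = whole_sum - residual_sum      # induced (total) contribution
    rows[family] = (total, direct[family], total - direct[family])

for family, (total, own, indirect) in sorted(rows.items(), key=lambda r: -r[1][1]):
    print(f"{family:13s} total={total:6.2f} direct={own:6.2f} indirect={indirect:6.2f}")
```

The same loop works for a group of nodes or for an edge: only the removal step changes, while the invariant and the differencing stay as they are.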
In fact, we can think of induced centrality as a single but parameterized measure. In this it is like beta centrality, in which the user chooses the value of beta. In the case of induced centrality, the user-selected parameter is the graph invariant. Following this thread of customizing measures, it is worth considering the “key player” line of thought presented in Borgatti (2003, 2006). These papers point out that when we consider specific reasons for detecting key players, we find that off-the-shelf measures are not optimal. Two classes of reasons or aims are considered, labeled KPP-Pos and KPP-Neg. KPP-Pos refers to the general aim of aiding a network by increasing cohesion and facilitating flows. KPP-Neg refers to the aim of fragmenting a network and preventing the nodes from coordinating or sharing. Consider the network in Figure 17.3. The most central node by all standard measures is node 1. Yet removing node 1 from the network would not fragment it. If that were the goal, node 8 would be the best choice, even though it is not the most central.

Figure 17.3  Node 1 is most central, but removing node 8 is more damaging.
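The induced-centrality recipe described above (compute a graph invariant, delete a node or a group of nodes, recompute, and take the difference) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Borgatti (2003, 2006). The invariant used here, the number of node pairs that can still reach one another, is only one possible stand-in for a fragmentation goal. The function names are hypothetical, and the ten-node toy graph is invented for illustration; it is not the network in Figure 17.3.

```python
from collections import deque


def build_graph(edges):
    """Undirected graph as {node: set of neighbors} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def components(adj, removed=frozenset()):
    """Connected components of the graph, ignoring the nodes in `removed`."""
    seen, parts = set(removed), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, part = deque([start]), [start]
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    part.append(nbr)
                    queue.append(nbr)
        parts.append(part)
    return parts


def reachable_pairs(adj, removed=frozenset()):
    """Graph invariant: number of unordered node pairs joined by some path."""
    return sum(len(p) * (len(p) - 1) // 2 for p in components(adj, removed))


def induced_centrality(adj, invariant, group):
    """Change in `invariant` when every node in `group` is deleted."""
    return invariant(adj) - invariant(adj, frozenset(group))


# Invented toy graph: node 1 has the highest degree, node 8 is the only bridge.
toy = build_graph([
    (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 3), (3, 4), (4, 5), (5, 6),
    (4, 8), (5, 8), (8, 9), (8, 10), (9, 10), (9, 11), (10, 11),
])

print(induced_centrality(toy, reachable_pairs, [1]))     # 9
print(induced_centrality(toy, reachable_pairs, [8]))     # 27
print(induced_centrality(toy, reachable_pairs, [4, 5]))  # 33
```

In this toy graph, deleting node 1 costs only the nine pairs that node 1 itself took part in, whereas deleting node 8 splits the graph in two. The last call treats nodes 4 and 5 as a group: neither is a cut point alone, but together they separate the graph. Passing a different function as `invariant` gives a different induced centrality without changing the rest of the code.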

Three Perspectives on Centrality   345

Table 17.3  Reasons for Identifying Key Players

DISRUPT    We want to remove them, to maximally disrupt the network.
ENHANCE    We want to help them, to make the network as a whole function better (diffuse info; coordinate well).
INFLUENCE  We want to identify key opinion leaders, to influence the network.
LEARN      We want to know who is in the know, so we can question or surveil them.
REDIRECT   We want to remove/prune them, to redirect flows in the network toward our preferred players.

We can extend this line of thinking by considering additional or more specific goals. Table 17.3 lists several possibilities. The “disrupt” and “enhance” aims correspond to KPP-Neg and KPP-Pos. The “influence” goal refers to identifying nodes that could be seeded with an attitude, belief, behavior, or virus that we want to spread maximally through the network. The “learn” goal is, in a sense, the inverse of the “influence” goal. We want to identify nodes that are likely to have received the most (non-redundant) flow from the network as a whole. Finally, the “redirect” goal is about pruning the network to redirect flows in the network to different nodes. In a police/defense setting this might be to redirect flows to nodes that we can more conveniently surveil. Alternatively, we might want to prune the network to make it more compact. For the sake of brevity we omit examples, but suffice it to say that, for each goal, we can construct networks where the standard measures of centrality perform worse than bespoke measures.

The Flow Outcomes Perspective

The first perspective we presented, involvement in the walk structure of the graph, was about the different kinds of traversals that could be made on a graph. At the time, we didn’t ask what these traversals represented: what was traversing and why? Our third perspective on centrality takes this as its starting point. Consider a coin moving through an economy. A person gives it to a store clerk to pay for gum. Later, the clerk gives it to another customer in change. The customer gives it to a friend who will use it in a vending machine. The coin keeps moving through the economy until eventually it is lost or taken out of service by the government. We might think of the coin as traversing a very large network. In reality, this is a bit of a stretch: in all likelihood, no tie existed between the person and the store clerk before the coin passed between them. It is more as if the movement of the coin created the illusion of a tie on the spot. On the other hand, the coin can’t just go anywhere from anywhere. While it is possible to mail a coin to a far-off land, in general the coin will pass from hand to hand, and when it is in the hands of a given individual, there is a limited set of others that have any probability of receiving it. Within that set, it may be exceedingly difficult to predict who it will go to, and so we might consider that it could go to any of them with uniform probability. So we might model the coin’s movement as a random walk through a network and observe that at any given point in time there are millions of coins taking random walks through the same network.
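The random-walk model of the coin can be made concrete with a short simulation. The sketch below is only illustrative: it assumes an undirected, connected network stored as a dictionary of neighbor lists, moves the coin to a uniformly chosen neighbor at every hop, and estimates how many hops the coin needs on average before it first reaches a chosen person. The function name and the three-person chain are hypothetical.

```python
import random


def mean_first_passage_time(adj, source, target, runs=20000, seed=0):
    """Estimate the expected number of hops before a walk from source first hits target."""
    rng = random.Random(seed)
    total_hops = 0
    for _ in range(runs):
        node, hops = source, 0
        while node != target:
            # The coin passes to one of the current holder's contacts,
            # each with the same probability.
            node = rng.choice(adj[node])
            hops += 1
        total_hops += hops
    return total_hops / runs


# Hypothetical chain of three people: a - b - c.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

print(mean_first_passage_time(chain, "a", "c"))
print(mean_first_passage_time(chain, "b", "c"))
```

For this chain the exact expectations can be worked out by hand: writing h(a) and h(b) for the expected hops to reach c, we have h(a) = 1 + h(b) and h(b) = 1 + h(a)/2, so h(a) = 4 and h(b) = 3. The simulated averages should land close to those values.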

Viewed this way, it seems natural to pose a variety of questions about the traversals, which we refer to as flow outcomes. For example, when a coin reaches a given node, we might ask what the probability distribution is of the number of hops it took to get there. We might be especially interested in the mean of that distribution, that is, the expected number of hops from the mint to the first time it reaches a given person (known as mean first passage time or FPT in the Markov chain literature). Or we might ask how often we expect the coin to visit a given node. Similarly, we might ask the probability of reaching a node before reaching another (such as a terminal node indicating exit from the system, as when the coin is retired). Underlying these questions is an assumption that the answers will be different for different nodes, depending on their position in the network structure. This suggests conceiving of centrality measures as related to flow outcomes, and indeed perhaps the measures could be defined in terms of estimating flow outcomes. This is the essence of the flow perspective on centrality (Borgatti, 2005). To start, let’s consider a trivial example. Suppose we define the following flow regime, which we will call the package regime. In this regime, we require that things can only be in one place at one time, and that they only flow along geodesics (i.e., shortest paths). On average, how long does it take for something to go from a given node to any other, randomly chosen, node in the network? If we assume that it takes one unit of time to traverse each link, the answer is Freeman’s closeness centrality divided by N – 1, which is to say, the average geodesic distance from the node to all others. Let us continue with this particular flow regime and add one item. If there are multiple equally short paths from one node to another, the package chooses uniformly at random among them. 
Now suppose we run a simulation with every pair of nodes serving as a source and a target, and we count the number of times the package passes through each node. The expectation of this quantity is exactly betweenness centrality. This means that, across thousands of simulations, the average number of times something passes through a given node converges on its betweenness centrality. The point here is that both the closeness and betweenness formulas give the expected values of two different flow outcomes in a network, given a particular flow regime. Closeness gives the expected time until first arrival, and betweenness gives the frequency of arrival.8 What is immediately obvious when we take this perspective is how unrealistic the flow regimes underlying betweenness and closeness are. The concept of shortest paths involves both having a target and complete knowledge of the network. Neither of these is likely for most flows that we encounter. Consider, for example, the flow of information—such as gossip—through an organization. In transmitting the information, a given node might have a distal target in mind but has little control over whether or how the information gets there. A look at empirical gossip flows suggests a regime very different from the package regime that betweenness is consistent with. Gossip does not travel along shortest paths, nor is it limited to actual paths at all. Paths do not repeat nodes, but a bit of gossip can easily reach the same node multiple times. It is less likely, however, that node A tells node B the same piece of gossip more than once, suggesting graph-theoretic trails. Moreover, unlike the package regime, there’s no reason to expect that gossip can only be in one place at one time. If I tell you something interesting, setting off a chain of transmission, I still have the gossip and can send it to someone else as well. Similar considerations apply to contagious diseases. 
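The two claims about the package regime can be checked numerically on a small example. The sketch below assumes an undirected, connected graph and counts each unordered pair of nodes once as source and target. It computes the average geodesic distance from a node (closeness divided by N – 1), computes betweenness exactly by counting geodesics, and then simulates packages that pick uniformly among equally short paths. All function names and the five-node toy graph are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def bfs(adj, source):
    """Geodesic distance to every node and the number of geodesics reaching it."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def mean_distance(adj, node):
    """Closeness (sum of geodesic distances) divided by N - 1."""
    dist, _ = bfs(adj, node)
    return sum(dist.values()) / (len(adj) - 1)


def betweenness(adj):
    """Exact betweenness: share of s-t geodesics passing through each node."""
    info = {v: bfs(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        for v in adj:
            if v not in (s, t) and dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


def simulate_packages(adj, rounds=2000, seed=0):
    """Average pass-through count per node when packages follow random geodesics."""
    rng = random.Random(seed)
    info = {v: bfs(adj, v) for v in adj}
    visits = {v: 0 for v in adj}
    for _ in range(rounds):
        for s, t in combinations(adj, 2):
            dist_t, sigma_t = info[t]
            node = s
            while dist_t[node] > 1:
                # Step one hop closer to t; weighting by the number of remaining
                # geodesics makes every complete geodesic equally likely.
                steps = [w for w in adj[node] if dist_t[w] == dist_t[node] - 1]
                node = rng.choices(steps, weights=[sigma_t[w] for w in steps])[0]
                visits[node] += 1
    return {v: visits[v] / rounds for v in adj}


# Hypothetical toy graph: a four-cycle a-b-d-c-a with a pendant e attached to d.
toy = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}

print(mean_distance(toy, "d"))   # 1.25
print(betweenness(toy))          # a: 0.5, b: 1.0, c: 1.0, d: 3.5, e: 0.0
print(simulate_packages(toy))    # averages close to the exact scores
```

The simulated averages are random and will differ slightly from run to run with other seeds, but they settle near the exact betweenness scores as the number of rounds grows.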
Table 17.4 presents a number of flow regimes that readily come to mind. One question that arises is what happens if we apply a measure consistent with one flow regime to a

Table 17.4  A Sample of Flow Regimes

Name          Traversal  Contagion  Description
Used book     Trail      Move       I read a paperback. When finished, I pass it along to a friend, who does the same, until someone keeps it. A person might receive it twice but (normally) doesn’t pass it to the same person as before.
News/gossip   Trail      Copy       I hear some gossip and pass it on. Then I tell someone else. Each node does the same but normally doesn’t tell the same person multiple times.
Itinerant     Path       Move       I am an itinerant mathematician like Erdős. I live and work with you until your spouse kicks me out. Then I go to someone else’s house. Each time, I burn my bridges and can’t return.
Virus         Path       Copy       A virus infects a host, who then infects others. Infected people develop antibodies so they can’t get it again.
Coin          Walk       Move       A coin moves through the economy until it is lost or retired. It can only be in one place at a time.
Attitude      Walk       Copy       My views affect yours, which in turn affects others. We are all continually affecting each other.
Travel        Geodesic   Move       People traveling by air to some destination prefer to make the fewest number of stops.

research context that is more consistent with a different flow regime. For example, if we are looking at the flow of information and calculate Freeman’s betweenness centrality, how poorly would it estimate the number of times a given bit of information flowed over a certain node? Borgatti (2005) provides an example where it does quite poorly. The data are the marriage ties among Florentine families shown in Figure 17.2. A simulation is programmed with the gossip regime and run thousands of times. The average number of times something flows over a node is calculated and compared with betweenness centrality. Both betweenness and the simulation give the Medici the highest score. But whereas betweenness puts the Strozzi family right in the middle of the pack, the gossip simulation puts the Strozzi in second place. This result seems more in keeping with historical reality in the sense that the Strozzi were the previously dominant family and still a major power at the time of the Medici rise.
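A simulation of this kind can be sketched as follows. This is not a reconstruction of the simulation in Borgatti (2005); it is one possible reading of the news/gossip row of Table 17.4, with every rule below an assumption: each informed node tells one randomly chosen neighbor per round (copy), a given piece of gossip never crosses the same tie twice in either direction (trail), a run ends when no informed node has an unused tie left, and every node serves as the source equally often. The function names and the five-node toy graph are hypothetical, and the Florentine data are not reproduced here.

```python
import random


def gossip_arrivals(adj, source, rng):
    """One run of the assumed trail-and-copy regime; returns arrivals per node."""
    arrivals = {v: 0 for v in adj}
    informed = [source]   # nodes that hold the gossip, in order of first hearing
    known = {source}
    used = set()          # ties the gossip has already crossed
    active = True
    while active:
        active = False
        # Snapshot the list so newly informed nodes start telling next round.
        for teller in list(informed):
            options = [w for w in adj[teller] if frozenset((teller, w)) not in used]
            if not options:
                continue
            hearer = rng.choice(options)
            used.add(frozenset((teller, hearer)))
            arrivals[hearer] += 1   # repeat arrivals at the same node are counted
            if hearer not in known:
                known.add(hearer)
                informed.append(hearer)
            active = True
    return arrivals


def mean_arrivals(adj, runs=2000, seed=0):
    """Average number of arrivals per node over many runs and all sources."""
    rng = random.Random(seed)
    totals = {v: 0 for v in adj}
    for _ in range(runs):
        for source in adj:
            for node, count in gossip_arrivals(adj, source, rng).items():
                totals[node] += count
    return {v: totals[v] / (runs * len(adj)) for v in adj}


# Hypothetical toy graph: a four-cycle a-b-d-c-a with a pendant e attached to d.
toy = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}

print(mean_arrivals(toy))
```

The resulting averages can then be set beside betweenness scores for the same graph to see how far a geodesic-based measure departs from the simulated flow. Changing any of the assumed rules (for example, capping the number of rounds) defines a different regime and may change the ranking.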

Discussion

The three perspectives we have presented are clearly related. All are about traversals through a graph, and which ones tend to involve a given node. The walk structure perspective is perhaps the most agnostic. Yes, it claims that a node is more central to the extent it participates in the walk structure of the graph, but beyond that it is just a typology of measures based on the kinds of traversals the measures pay attention to, and what aspect of the traversals (number or length or both) they record. The main benefit of this perspective is that it enables us to put all centrality measures into a single common framework. It is the one characterization that leaves no measure out. It is also the only one that is means based rather than outcome based, meaning that we define measures in terms of how they are calculated and what kinds of math are used.

The second perspective takes from the first perspective the notion that centralities measure involvement in the network, but the focus on traversals is incidental. Rather, the focus is on a kind of goal, a graph invariant, and measuring the extent that a given node contributes to the goal. Most of the goals we can readily imagine have something to do with the cohesion of a network, such as the average distance among all pairs of nodes. (And most cohesion measures have something to do with the number and shortness of paths linking pairs.) It seems likely that the first perspective includes the second as a special case. That is, all induced centralities can be located in the typology of the first perspective, but not all measures discussed in the context of the walk perspective can be derived as induced centralities. As a general principle, any measure that gives different values to points whose removal yields isomorphic residual graphs cannot be an induced centrality, since any induced centrality must give the same value to nodes with isomorphic neighborhoods.

The walk perspective is useful for imposing some order on the set of proposed centrality metrics but excludes none. As a result, we may define centrality measures that defy interpretation. For example, suppose we count the number of paths emanating from a node in which adjacent nodes alternate in degree, for example, a path like A-b-C-d-E, where lowercase letters indicate low-degree nodes and capital letters indicate high-degree nodes. It’s a measure, and we can classify it, but we don’t know what to do with it. 
In contrast, induced measures carry at least one interpretation built into the measure: it is the extent to which each node contributes to some graph property, such as fragmentation. However, the meaningfulness of a measure depends on the meaningfulness of the graph invariant. If we choose a silly graph invariant, we will get a highly interpretable, but still silly, measure. For example, if the graph invariant is the number of nodes with degree 5, nodes with high centrality will be those that contribute to that property, and nodes with low centrality will be those that take away from that property. The flow perspective takes a step beyond the induced perspective in ensuring meaningfulness. The flow perspective restricts measures to just those that represent a flow outcome, such as time of first arrival, frequency of arrival, probability of arrival as a function of time, and so on. An area for future research is to ask, as we did with induced centralities, whether every measure of centrality can be seen as measuring the outcome of a flow process whose rules merely need to be uncovered. Depending on how richly we define our flow regime, we can define more complicated outcomes such as the quality of whatever is flowing when it arrives, given that distortions are introduced along the way. If the observed signal equals true signal plus noise, how much of the true signal can be inferred by averaging the signal observed from many sources? A key advantage of the flow approach is that we are modeling an actual process. To the extent the process matches the reality of some situation, we obtain a measure that not only ranks nodes but also estimates a known quantity, such as the number of unique bits of information each node can be expected to receive by a certain time. 
As discussed earlier in the context of betweenness, estimating such quantities from measures that are inconsistent with the actual social process at hand can give very poor results.9 Moreover, through simulation, we can easily examine unique flow regimes that no one has studied before (and for which no measure has been constructed). Ideally, we would use simulation results to guide the development of a closed-form equation that would capture the concept exactly. This is unlikely in practice, as only measures based on shortest paths and unrestricted walks are computationally easy. But it really doesn’t matter: the stable simulation results averaged over thousands of runs give us our measures directly. This may be costly with big data, but not nearly as costly as graph-theoretic measures that involve tracing all paths.

Final Note

A leitmotif of this chapter, particularly evident in the discussions of the induced and flow perspectives, is the construction of centrality measures that are appropriate to a given theoretical task and/or research design.10 In the section on key player approaches, it was noted that off-the-shelf measures, such as betweenness or beta centrality, are almost always suboptimal for any given purpose, simply because they were not designed specifically for that purpose. The inadequacy is compounded exponentially when the job requires identifying a set of nodes working together. Thus, this chapter is very much a vote for bespoke measures constructed specifically for the study at hand. On the one hand, this may seem obvious. If you are studying a new construct such as cyber-bullying, you wouldn’t use a survey from the 1950s about bullying in the playground. You might start with that, but you would customize the instrument for the job. On the other hand, we know (from working with countless workshop participants) that the main reason people choose a measure of centrality is that somebody famous in their field used that measure, indemnifying the present research against criticism. The second major reason is that they tried all the measures in their software package and used the one that happened to yield a significant result. This is obviously a bad approach because, with dozens of named measures on the market, the risk of spurious correlation is extremely high. Yet in an age in which researchers refer to their publications as “hits,” both of these practices are extremely hard to argue against, especially in the case of young assistant professors looking for tenure. All we can say is that for a paper to have legs, it needs to have integrity. If you want your paper to be cited for years to come, use the right measure for the job.11

Acknowledgments

This paper began as a presentation to the 12th Applied Statistics International Conference (2015) in Ribno, Slovenia. We are grateful to the organizers for having given us the opportunity to elaborate these ideas.

Notes

1. As discussed in the section on network flows, we could devise a measure that contains a random element, perhaps calculated via simulation. In that case it is the expected value that must be the same for isomorphic nodes.
2. Although not strictly necessary, we typically use binary relations, meaning that the ties connect pairs of nodes, as opposed to triples, quads, etc. Graphs with nonbinary ties are called hypergraphs. Note that we use tie, line, link, and edge interchangeably, just as we treat nodes, vertices, points, and actors as synonyms.
3. Of course, there are also count measures that don’t make use of length in any way. For example, we could just count the number of all paths or trails emanating from a node, although this would be computationally prohibitive.
4. Two paths are edge independent if they don’t contain any of the same links. For example, the path A-B-C-D is not edge independent of A-E-B-C-F-D because the B-C link is used in both.
5. We can also take column sums. When R(i,j) can be viewed as the amount of influence that i has on j, then the row sums are a measure of the total direct and indirect influence that i has. When R(i,j) can be viewed as the amount of respect that i accords j, then the column sums can be viewed as a measure of the status of j.
6. Again, for simplicity, we consider only undirected graphs, so the term endpoint suffices and refers to either end of a traversal. But for directed graphs we would distinguish origins from termini.
7. Instead of the sum of betweenness scores, we can also think of this quantity as the sum of all entries in the Freeman dependency matrix (Freeman, 1980), which is also related to the Wiener index (Brandes, Borgatti, and Freeman, 2016).
8. More accurately, it is the frequency with which something passes through a node on its way somewhere else. If we add in the number of times that the node is the target of the traversal, then we must add N to betweenness.
9. However, there are some cases where extreme flow regimes, such as those calling for shortest paths, can be useful. For example, in an epidemiological setting, closeness centrality gives a kind of worst-case scenario for how quickly a sexually transmitted disease could reach a given node. This is certainly useful for planning.
10. Here we echo the call by Friedkin (1991), who argues for deriving centrality measures from a theoretical model of a given social process.
11. Note that in supporting customized measures, we are not necessarily calling for new standard measures to be added to the shelf. For example, if someone tweaks betweenness centrality for a specific application (e.g., by ignoring paths longer than k), there is no need to name it so-and-so centrality and publish it as a new measure. Part of the point of the walk structure perspective is to make family resemblances obvious and to attempt to discourage the need to name each of an infinite number of variations on the same concept. Similarly, if someone selects a novel graph invariant and bases an induced centrality on it, there is generally no need to name it and publish it as a standalone measure. The whole point of induced centrality is that it is a parameterized measure, the parameter being the choice of invariant.

References

Agneessens, F., Borgatti, S. P., & Everett, M. G. (2017). Geodesic based centrality: Unifying the local and the global. Social Networks, 49, 12–26.
Bonacich, P. (1987). Power and centrality: A family of measures. American Journal of Sociology, 92(5), 1170–1182.
Bonacich, P., & Lloyd, P. (2001). Eigenvector-like measures of centrality for asymmetric relations. Social Networks, 23(3), 191–201.
Borgatti, S. P. (2003). The key player problem. In R. Breiger, K. Carley, & P. Pattison (Eds.), Dynamic social network modeling and analysis: Workshop summary and papers (pp. 241–252). Washington, DC: National Academy of Sciences Press.
Borgatti, S. P. (2005). Centrality and network flow. Social Networks, 27(1), 55–71.
Borgatti, S. P. (2006). Identifying sets of key players in a network. Computational and Mathematical Organization Theory, 12(1), 21–34.
Borgatti, S. P., & Everett, M. G. (1997). Network analysis of 2-mode data. Social Networks, 19(3), 243–269.
Borgatti, S. P., & Everett, M. G. (2006). A graph-theoretic perspective on centrality. Social Networks, 28(4), 466–484.
Brandes, U. (2008). On variants of shortest-path betweenness centrality and their generic computation. Social Networks, 30(2), 136–145.
Brandes, U., Borgatti, S. P., & Freeman, L. C. (2016). Maintaining the duality of closeness and betweenness centrality. Social Networks, 44, 153–159.
Davis, A., Gardner, B. B., & Gardner, M. R. (1941). Deep South: A social anthropological study of class and caste. Chicago, IL: University of Chicago Press.
Everett, M. G., & Borgatti, S. P. (2010). Induced, endogenous and exogenous centrality. Social Networks, 32(4), 339–344.
Everett, M. G., & Borgatti, S. P. (2014). Networks containing negative ties. Social Networks, 38, 111–120.
Ford, L. R., & Fulkerson, D. R. (1956). Maximal flow through a network. Canadian Journal of Mathematics, 8(3), 399–404.
Freeman, L. C. (1978/1979). Centrality in social networks: Conceptual clarification. Social Networks, 1(3), 215–239.
Freeman, L. C. (1980). The gatekeeper, pair-dependency and structural centrality. Quality and Quantity, 14(4), 585–592.
Freeman, L. C., Borgatti, S. P., & White, D. R. (1991). Centrality in valued graphs: A measure of betweenness based on network flow. Social Networks, 13(2), 141–154.
Friedkin, N. E. (1991). Theoretical foundations for centrality measures. American Journal of Sociology, 96(6), 1478–1504.
Hubbell, C. H. (1965). An input-output approach to clique identification. Sociometry, 28(4), 377–399.
Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18(1), 39–43.
Koschützki, D., Lehmann, K. A., Peeters, L., Richter, S., Tenfelde-Podehl, D., & Zlotowski, O. (2005). Centrality indices. In U. Brandes & T. Erlebach (Eds.), Network analysis: Methodological foundations (pp. 16–61). Berlin: Springer Science & Business Media.
Markovsky, B., Willer, D., & Patton, T. (1988). Power relations in exchange networks. American Sociological Review, 53, 220–236.
Newman, M. E. (2005). A measure of betweenness centrality based on random walks. Social Networks, 27(1), 39–54.
Padgett, J. F., & Ansell, C. K. (1993). Robust action and the rise of the Medici, 1400–1434. American Journal of Sociology, 98(6), 1259–1319.
Sabidussi, G. (1966). The centrality index of a graph. Psychometrika, 31(4), 581–603.
Schoch, D., & Brandes, U. (2016). Re-conceptualizing centrality in social networks. European Journal of Applied Mathematics, 27(6), 971–985.
Smith, J., Halgin, D., Kidwell, V., Labianca, G., Brass, D., & Borgatti, S. P. (2014). Power in politically charged networks. Social Networks, 36, 162–176.
Stephenson, K., & Zelen, M. (1989). Rethinking centrality: Methods and examples. Social Networks, 11(1), 1–37.

Chapter 18

Network Visualization

James Moody and Ryan Light

If we ever get to the point of charting a whole city or a whole nation, we would have . . . a picture of a vast solar system of intangible structures, powerfully influencing conduct, as gravitation does in space. Such an invisible structure underlies society and has its influence in determining the conduct of society as a whole. J. L. Moreno, New York Times, April 13, 1933

Network analysis has been deeply visual since Moreno first introduced the sociogram. Sociograms literally make visible the “invisible structure” of social relations by plotting people and their relationships in a graph. This simple function, to illuminate patterns of relations that are otherwise difficult to see, has arguably been one of the core contributions of network research. Many times, we need simply to identify who’s connected to whom to make sense of political and social alliance structures. In general, network visualizations, when they are good, are compelling data objects in themselves that draw viewers in for further exploration and insight. Unfortunately, when visualizations are bad, they are often little more than distracting depictions of densely overlapping lines and points that are impossible to interpret, or “hairballs.” Our goal in this chapter is to help identify the methods underlying network visualization with an eye toward helping users produce more effective figures. Despite the compelling intuition of the importance of network visualization, visualization methodology has progressed largely ad hoc alongside researchers trying to solve particular substantive or methodological problems. The fundamental challenge of network visualization is to represent a multidimensional data structure in two (or, rarely, three) dimensions, which necessarily distorts the underlying data. As such, the task before analysts is to figure out what needs to be emphasized and how to do so in a manner that adheres most faithfully to the data. Researchers must often balance the theoretical or heuristic value of network visualizations for understanding underlying social structures with the empirical or realist depiction of social relations.

In this chapter, we first briefly review key historical moments in the development of network visualization tools, then turn our focus toward best practices and recent innovations.

Brief History and Motivations

Moreno’s hand-drawn sociograms (1934) describing social organization are the obvious stepping-off point for a history of network visualization, but his work did not emerge out of a vacuum (for a full history, see Freeman, 2000). Rather, there had been similar sorts of ideas expressed in kinship diagrams and organizational charts. But most of these historical precedents were specific solutions for the topical area of interest, not a general solution to modeling social relations. In terms of visualization, Moreno made at least two advances. First, he consciously focused on developing a symbolic system for coding social relations. For example, red lines indicated attraction, while black lines indicated rejection, with further notation distinguishing mutual or symmetrical from directed or asymmetrical relations. Moreno visualized gender with circles for girls and triangles for boys. The intention here was to develop a syntax from the symbol system, to encode a visual language. Second, he was the first to specify in a general way the notion that proximity maps onto relational closeness, a theme that will echo through layout approaches going forward. That is, he suggested that there was an implicit social field being represented and that position on that field is meaningful. For Moreno, of course, this positioning was largely art: there were no computational or analytic tools yet to do anything else. This line of thinking was quickly taken up by early community sociologists, particularly folks like Lundberg and Steele (1938), who provide some truly beautiful maps of small-town elite networks. Network visualizations were a perfect fit for the community study approach of the time, which aimed to make clear status and community differences that were implicitly known to members of the community. 
By simply drawing out these relations, sources of power and alliances become visible (see also Loomis, 1943), an insight that remains important to research on power in organizations or small groups (see Carroll & Sapinski, 2010). As sociology generally became more quantitatively rigorous in the 1940s and 1950s, authors sought to move network visualization from art to science. Northway (1940) developed a convention of ordering nodes by centrality scores in rings, with the most central nodes at the center ring and peripheral nodes at the edges, with placement around the circle used to help avoid line crossing or to highlight clustering (see also Bronfenbrenner, 1944). This development is the first in a theme of adding quantitative information to the layout, usually as a dimension/axis for plotting or for sizing symbols (for grid versions, see Bjerstedt, 1952). The instinct of adding dimensions to the placement of nodes on the page took on new computational meaning with the development of factor analysis and computational tools in the 1950s and 1960s (see Laumann & Guttman, 1966, as an example). While these calculations were increasingly done by computer, most of these early drawings were still plotted by hand. In the early 1970s, Alba (1972) worked with Kadushin (1974) to produce some of the first fully computer-generated network diagrams, using ideas related to multidimensional scaling and spring embedding (see Tutte, 1963). The field has exploded since then, with “graph drawing” as a core computational challenge in computer science.

Substantively, the motivation for visualizing networks remains the same: to make the structure of a social system readily visible. The challenge is that network structure is multidimensional but visualizations are limited to two or three dimensions, so some data loss is unavoidable. Programmers have developed a series of heuristic criteria to optimize, such as limiting the variance in edge length or minimizing edge crossings, that aim to capture the core meaningful features of a network. Unfortunately, no heuristic is perfect for every situation, and authors need to choose among alternative tradeoffs depending on the aspect of a given social structure they wish to highlight. Moreover, algorithmic approaches to these optimization problems are often computationally intensive, so authors seek to find not just a good drawing, but a good drawing that can be computed efficiently. We turn next to these general layout strategies.

Basic Network Visualization Strategies: Better Sociograms

First, it should go without saying that network visualizations need to follow the same principles that guide any good quantitative data visualization (see Healy & Moody, 2014; Tufte, 2001 for review). The main idea here is that data visualizations are faithful to the underlying quantitative data: they avoid graphical chart junk or misleading scaling that would exaggerate or generally distort the underlying reality of the data. Because there is no perfect layout heuristic, there is often more room for author discretion than in many quantitative data visualizations. Similarly, colors and size features should be chosen to highlight readability and ease of use. When in doubt, use well-vetted color schemes (see Healy, 2019) and node shapes that are generally easy to see (circles and squares are better than triangles, stars, or emojis, for example). Element size should accurately reflect the quantities of interest. As this is a well-trodden area of basic graphic tradecraft, we will not belabor the point here, other than to say we recommend keeping the focus on high data-to-ink ratios and treating network visualizations in much the same way cartographers treat maps.

The starting point for network visualization is classic sociograms: points represent nodes and lines between points represent relations in a two-dimensional representation of the network. We will start there and generalize from these. Within this framework one needs to decide how to represent points and lines (size, shade, shape, etc.) and how to position nodes. Node size is usually used to encode common centrality metrics, such as degree or closeness, while color and shape are used to encode node attributes. Edge width or shading (e.g., grayscale) is usually used to encode the strength of the tie. Edge color may indicate different types of relationships. 
User choices with these features are wide, and generally the hardest part of selecting among them is simply avoiding visual clutter. Sometimes less is more. Most of the computational effort invested in network visualization is aimed at identifying the optimal placement of nodes in two-dimensional space, where optimal is relative to a given visual feature of the graph. Such features include maintaining a consistent edge length (edges of the same value should be the same length), avoiding edge crossings or closely parallel lines (both make relations harder to trace), having screen distance represent the underlying graph distance (nodes that can reach each other quickly should be close together), or focusing on a key exogenous feature of the setting (such as explicit hierarchy
in a single dimension). The number of such heuristics is huge, and each can have a significant effect on the resulting layout. As an example, consider Figure 18.1, which applies 10 commonly available layout routines to the exact same underlying network. Recognizing this heterogeneity highlights that layout algorithms are generally problem specific and/or theoretically motivated. Layouts that optimize ordering for hierarchy may not apply well to graphs without a strongly hierarchical structure.

Figure 18.1. Algorithmic layout heterogeneity.

We find it useful to consider three basic approaches to node placement in point-and-line networks: distance-based layouts, hierarchical layouts, and fixed-coordinate layouts.

Distance-based layouts aim to preserve underlying graph-theoretic distance in the placement of points onscreen and are often intuitively similar to the original hand-drawn sociograms. The two algorithmic approaches generally taken here are based on “spring embedder” ideas or dimensional reduction techniques (e.g., singular value decomposition [SVD] or multidimensional scaling [MDS]). Spring-embedding routines originated with Tutte (1963) as a way to efficiently draw planar graphs. The idea is that each node has a spring pushing it away from all other nodes in the graph, but also a countervailing spring attracting it to connected neighbors. The field of forces implied by these two sets balances out in a way that pushes nodes near their neighbors and away from those they are disconnected from (this balancing of forces is why these routines are also referred to as energy-minimization routines). Consider a simple two-dimensional grid (Figure 18.2): the algorithm nicely goes from any arbitrary starting point (here a circle) to a uniform placement of nodes across the screen.
The simplest way to get a sense of how these sorts of embedders work is to play with interactive versions, where you can tug on a node and watch the springs “stretch” and the system react. See, for example, Healy’s (2013) interactive co-citation network for philosophy.
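For readers who prefer code to interactive demos, the balance of repulsive and attractive springs can be sketched in a few lines. What follows is a deliberately simplified force-directed routine in the spirit of the spring embedders described above. It is not the implementation in any particular package, and the force formulas, cooling rate, iteration count, and random starting positions (rather than a circle) are assumptions chosen for brevity.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, width=1.0, seed=0):
    """Toy force-directed layout: all pairs repel, tied pairs attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, width)] for n in nodes}
    k = width / math.sqrt(len(nodes))  # preferred spacing between nodes
    temp = width / 10.0                # largest allowed move per iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsive "springs" between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = k * k / dist
                disp[u][0] += dx / dist * push
                disp[u][1] += dy / dist * push
                disp[v][0] -= dx / dist * push
                disp[v][1] -= dy / dist * push

        # Attractive springs along each tie.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = dist * dist / k
            disp[u][0] -= dx / dist * pull
            disp[u][1] -= dy / dist * pull
            disp[v][0] += dx / dist * pull
            disp[v][1] += dy / dist * pull

        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            step = min(length, temp)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temp *= 0.97  # cool down so the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}


# A 3x3 grid, loosely echoing the grid example in Figure 18.2.
nodes = [(r, c) for r in range(3) for c in range(3)]
edges = [((r, c), (r, c + 1)) for r in range(3) for c in range(2)]
edges += [((r, c), (r + 1, c)) for r in range(2) for c in range(3)]
layout = spring_layout(nodes, edges)
```

Fixing the random seed makes the result repeatable, which matters when a layout is later adjusted by hand or compared across runs.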

Figure 18.2. Self-organizing layout via spring embedder. Full color figures available on Oxford Handbooks Online.

While these types of interactive layouts are fun to play with, and instructive from an algorithmic standpoint, they rarely convey any more information than the simple static graph would. There are many variants on the spring embedder idea, the most common being those of Fruchterman and Reingold (1991) and Kamada and Kawai (1989). Variants differ in how they deal with weights on the edges, postplacement adjustments for node overlap, relative weight factors, and so forth.

An alternative approach toward the same goal is to use an explicit dimensional reduction technique, such as SVD or MDS. These techniques are popular ways to locate patterns in network and non-network data. If we consider the adjacency matrix (or, alternatively, the distance matrix) as a general n-dimensional data array, we can identify a small number of dimensions (usually two or three) that capture the majority of the variance. This was the original idea behind using principal components analysis, another common dimensional reduction technique (Laumann & Guttman, 1966).

Because each of these techniques broadly seeks to represent social space as physical space onscreen, a simple metric for layout “fit” is the correlation between Euclidean distance on the screen and geodesic distance in the network (Moody, McFarland, & Bender-deMoll, 2005). Figure 18.3 applies six common distance-based layouts to the same network of high school friendships. Notice that the fit varies here from 0.47 to 0.73, and this largely has to do with the stacking of nodes on top of each other (and the arbitrariness of the circular layout). Because Kamada-Kawai has a heuristic for avoiding node overlap, two nodes that should be in the exact same place will be at slightly different spots onscreen, which necessarily lowers the correlation with graph-theoretic distance.
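The fit statistic just described can be sketched as follows. This is a minimal illustration, assuming an undirected, unweighted network and a Pearson correlation computed over all reachable pairs; the function names and the four-node example are hypothetical and not taken from the chapter's analyses.

```python
import math
from collections import deque


def geodesic_distances(nodes, edges):
    """Shortest path length, in edges, between every reachable pair."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    dist = {}
    for source in nodes:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        for target, d in seen.items():
            if target != source:
                dist[(source, target)] = d
    return dist


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def layout_fit(pos, nodes, edges):
    """Correlate on-screen Euclidean distance with geodesic distance."""
    geo = geodesic_distances(nodes, edges)
    screen, graph = [], []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if (u, v) in geo:  # skip pairs with no connecting path
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                screen.append(math.hypot(dx, dy))
                graph.append(geo[(u, v)])
    return pearson(screen, graph)


# Hypothetical four-node chain a-b-c-d drawn two ways.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
in_order = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
scrambled = {"a": (0, 0), "b": (3, 0), "c": (1, 0), "d": (2, 0)}
fit_in_order = layout_fit(in_order, nodes, edges)
fit_scrambled = layout_fit(scrambled, nodes, edges)
```

Drawing the chain in path order reproduces the graph distances exactly, while the scrambled drawing places tied nodes far apart and scores much lower.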
For small graphs where node identities might matter, though, a slightly lower fit is probably a reasonable price to pay to avoid overlaps. Similarly, with a distance-based layout you might have theoretical or empirical reasons to emphasize some distances differently than others. For example, Frank and Yasumoto (1998) use a group-based layout that weights ties within a group higher than ties between groups, which helps to illustrate the group structure of the network. In the fifth panel of Figure 18.3, we apply the Fruchterman-Reingold algorithm to a version of the network that weights ties within communities at triple the value of ties between communities, while the last panel collects each community into a circular layout centered on the group’s position within a similar layout space. These are the sorts of tradeoffs one has to make in balancing different layout goals, and there is no objectively superior metric. Rather, one needs to be clear about the choices made and data manipulations used to generate the visualization.

[Figure 18.3 panels: Multi-Dimensional Scaling (fit=0.69); Fruchterman-Reingold (fit=0.73); Visualization of Similarities (VOS) (fit=0.59); Fruchterman-Reingold, in-group x3 (fit=0.64); Kamada-Kawai (fit=0.56); Fruchterman-Reingold, group-circular (fit=0.47).]
Figure 18.3. Variants of distance-based graph layouts.

If the network has a strong ordering to the relations, such as command rank in a formal hierarchy or distance from a seed node, then embedding that information in the layout can be quite helpful. In these ordered graphs, one axis is reserved for the explicit ordering and the other dimension is reserved for minimizing edge crossings. Good algorithms for laying out hierarchies are often much more computationally intensive, as one has to search across many edges to identify crossings and then propagate ordering up or down the tree. For very large trees, it is often useful to use polar coordinates, such that the root of the tree is at the center of the figure and each branch radiates out from that. General social science applications of tree-based layouts are not as common as they are in management or corporate applications. The most common use for tree-based layouts tends to be to map diffusion results or trace the progress of respondent-driven sampling (RDS). Figure 18.4 provides an example of an RDS chain (Fisher & Merli, 2014).

[Figure 18.4 legend. Recruited survey wave. Vertex color: high tier, middle tier, low tier, non-venue based. Size of sex work venue: 100+, 50-100, 20-49, 10-19 people, 5-9 people, fewer than 5 people, none/DN/NA. Edge color: recruited from different venue, recruited from same venue.]
Figure 18.4. Respondent-driven sampling trees among Shanghai sex workers (Fisher & Merli, 2014).

The final common representation consists of fixed-coordinate layouts. As the name implies, in these layouts the coordinates are determined exogenously to the structure of the graph. The most common forms are maps, where points are the geographic locations of the nodes, and geometric layouts (typically circular, sometimes grids), where the placement is constrained to a shape but otherwise arbitrary (see, e.g., trade networks as described in Chapter 31 in this volume). Circular layouts are minimally useful on their own. Their chief advantage is that you can easily see who each node is connected to, since the layout ensures that no line crosses completely under a node. If you have a small graph and want to know exactly how each person in that network is connected to every other, a circular layout may be useful, but by and large circular layouts take up a lot of display space without any reinforcing information in the coordinate system itself. The most recent exemplar of the circular layout paradigm is the chord diagram, which typically exploits explicit edge bundling to decrease visual clutter (Holten, 2006). Chord diagrams are usually visually stunning due simply to the splatter of colors but often only weakly convey the connectivity pattern of a network. Circular layouts can be a useful way to highlight the embeddedness of groups in a larger network, where the macro-structure of a network is determined by a distance-based layout, and then each group is placed as a circle centered on the group’s centroid, such as the last panel in Figure 18.3.

Maps provide a way of bringing information external to the network into the layout and are very effective when the purpose of the map is to highlight geographic exchanges. The primary problem with geographic coordinates is that people are not randomly distributed across space, so the structure of the map-based network often simply reflects heterogeneity in population density. While there are area-based solutions to this problem, such as population density-equalizing cartograms (Gastner & Newman, 2004), these are difficult to implement in point-and-line sociograms (but see Moody, 2018).

While the variance in point-and-line layout algorithms leads to more cross-platform heterogeneity than might be expected in, say, scatterplots or histograms, one should not take this variance as discouraging or antiscientific. Rather, the fact that there is no single “right” layout simply means that authors have the discretion necessary to highlight the core features of a given network that correspond best with their research questions.
Automatic layout algorithms are merely attempts to maximize a particular layout heuristic to help in that task. There is nothing scientifically sacred in a layout heuristic. While we would be reluctant to move points around on a scatterplot, we should not be so reluctant to adjust network layouts by hand if the underlying heuristic fails to provide a clear representation of the data.

As an example, consider Figure 18.5, an extension of our running example of a high school friendship network. Here we start with the best-fitting distance-based layout from Figure 18.3, then layer over additional information to highlight core features of the network. The first adjustment is to simplify the representation of the edges by removing the duplicates generated by reciprocity and weighting each edge by the maximum value in either direction (weight is the number of activities students report doing with each other). We then color the nodes by the primary organizing feature in this network, race, in panel 3 and adjust node size by popularity in panel 4. We then turn to the actual placement of the nodes and add a small in-group weight to the base Fruchterman-Reingold layout. While this highlights natural clustering and removes some overlap between nodes, there is still a great deal of occlusion. So in the final panel we alleviate this in two ways: the main treatment is to move by hand nodes that were stacked on top of each other, and the second is to add a little transparency to the nodes so one can see where edges pass under (rather than to) the nodes.

[Figure 18.5 panels: 1. Default layout (fit=0.73); 2. Adjust edge weight (fit=0.73); 3. Color nodes by race (fit=0.73); 4. Size nodes by popularity (fit=0.73); 5. Adjust by community structure (fit=0.68); 6. Adjust for overlap and spacing (fit=0.64). Node color legend: White, Black, Hispanic, Asian.]
Figure 18.5. Informational embellishments to a baseline layout.

Densely connected graphs may also require editing to illustrate core structures. For example, correlational networks where the edge weight is a similarity score based on correlation will likely have a maximally connected structure where every node is connected to all others.
This kind of network will be difficult to interpret using traditional layout algorithms and may be best illustrated after thresholding weakly connected edges or removing edges with correlation scores below a certain value. Importantly, while these reduced networks may help illustrate the backbone of strong connections within a graph, they should be used primarily as heuristics or theoretical models and with caution, if at all, in any statistical analyses.

Sociograms can be amazingly effective, and once exposed to the general approach, audiences quickly learn how to read them. Variants can effectively bundle edges to highlight micro/macro flow or layer colors to convey differences in relational types. They are amenable to strong data augmentation when displayed online, allowing tooltips, hyperlinks, or other dynamic exploration. This flexibility makes sociograms particularly useful for consulting or explaining networks to lay audiences, as they can then explore the network themselves. As an example, consider recent interactive sociograms for Trump alliances (Graph Commons, 2018) or hidden funds in the Panama Papers (International Consortium of Investigative Journalists, 2019).
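Two of the edge-editing steps described above, collapsing reciprocated ties to a single edge carrying the larger weight and thresholding weak edges in a dense network, are simple enough to sketch directly. The example below is illustrative only: the function names, the dictionary-based edge representation, and the toy weights are assumptions, not the procedure used to produce Figure 18.5.

```python
def symmetrize_max(directed):
    """Collapse reciprocated arcs into one undirected edge with the larger weight."""
    undirected = {}
    for (u, v), weight in directed.items():
        key = tuple(sorted((u, v)))
        undirected[key] = max(weight, undirected.get(key, weight))
    return undirected


def threshold(weighted, cutoff):
    """Keep only edges whose weight is at least the cutoff."""
    return {edge: w for edge, w in weighted.items() if w >= cutoff}


# Hypothetical nominations: weight = number of shared activities reported.
directed = {("ann", "bob"): 2, ("bob", "ann"): 3, ("bob", "cat"): 1}
simplified = symmetrize_max(directed)
backbone = threshold(simplified, 2)
```

As the text cautions, a thresholded network of this kind is best treated as a visual aid; the cutoff is a choice that should be reported alongside the figure.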

Advanced Network Visualization Approaches: Moving beyond Sociograms

While sociogram visualizations are common and effective for small static networks, they tend to become unreadable when the network is very large, is dynamic, or contains many relations. Techniques for solving these problems usually turn on sidestepping the visual clutter introduced by traditional layout techniques.

We have treated the problem of network dynamics in depth elsewhere (Moody et al., 2005), but for networks with dynamic nodes and edges it is often useful to generate a movie of the network. Movies allow one to see the tempo and relational dynamics of a setting in abstracted form and can be very effective at demonstrating change over time. The primary algorithmic challenge with network movies is chaining layouts such that movement is due solely to change in the network, rather than random layout noise. Bender-deMoll (Bender-deMoll, 2018; Bender-deMoll, Morris, & Moody, 2008) has updated the earlier work by Bender-deMoll and McFarland (2006) with the network dynamic temporal visualization (NDTV) R package. Network movies cannot be shown in static media such as print papers or books, and people tend to have poor memories for spatial change. So while movies are great for getting a qualitative sense of setting dynamics, it is often better to have comparative “slices” or small multiples (Tufte, 2001) to allow people to examine change over time in detail.

One approach to account for temporal change in a static representation is to plot the network as a multislice graph using a time-space representation. In these representations, each node is connected to its future self via an “identity” arc, and within each time slice relations are displayed as normal. Figure 18.6 describes the basic approach applied to a small sample network. Figure 18.7 presents the first five waves of the Newcomb fraternity data using this approach.
For similar flow problem networks see applications of Gantt diagrams to network connectivity, and for a simple multislice graph see the sociograms in Light and adams (2016).
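The time-space construction can be sketched as a small data transformation: every node is copied once per time slice in which it appears, ties are kept within slices, and identity arcs connect each copy of a node to its next appearance. The sketch below rests on assumptions that are ours rather than the chapter's. In particular, a node is treated as present in a slice only if it has at least one tie there, and identity arcs skip over slices in which the node is absent.

```python
def time_space_graph(slices):
    """Build a multislice graph from a list of edge lists (index = time).

    Returns (relation_arcs, identity_arcs), where each endpoint is a
    (node, time) pair.
    """
    relation_arcs = []
    identity_arcs = []
    last_seen = {}
    for t, edges in enumerate(slices):
        present = set()
        for u, v in edges:
            relation_arcs.append(((u, t), (v, t)))
            present.update((u, v))
        for node in sorted(present):
            if node in last_seen:
                # Link this appearance to the node's previous appearance.
                identity_arcs.append(((node, last_seen[node]), (node, t)))
            last_seen[node] = t
    return relation_arcs, identity_arcs


# Hypothetical three-wave network among actors A, B, and C.
slices = [[("A", "B")], [("B", "C")], [("A", "C")]]
relation_arcs, identity_arcs = time_space_graph(slices)
```

In a drawing, the time index would typically fix one screen axis, so only the other coordinate needs to be chosen by a layout routine.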

[Figure 18.6 panels: (a) Aggregate network structure; (b) Full time-space representation; (c) Condensed time-space representation. Nodes are labeled A through F and the time axis runs from 0 to 10.]

Figure 18.6. Visualizing network change.

Traditional sociograms are most likely to fail social scientists as network size increases. The big data revolution has provided us access to truly enormous networks, but a traditional sociogram with many thousands of nodes or very dense tie patterns tends to be uninformative (e.g., a “hairball”). The general solution to this problem is to abstract from the traditional representation to some level of aggregation, focusing on the macrostructural patterns and minimizing or outright removing references to particular nodes.

A simple example of this idea is to remove nodes but focus on relations, adding a great deal of transparency to lines and coloring to emphasize communities. Figure 18.8 uses this technique to describe the network of over 19,000 papers written by scientists involved in a policy debate (see Edelmann, Moody, & Light, 2017). Panel A illustrates the distribution of topics across the network with virology squarely in the middle as the debate centers on experiments involving viruses. Panel B illustrates the two sides of the debate with proponents more densely connected and overlapping with virology and the opponents of these experiments dispersed across the graph.

[Figure 18.7 panels: time-slice representation of the Newcomb fraternity data, Waves 1 through 5.]
Figure 18.7. Multislice temporal representation of Newcomb fraternity data.

[Figure 18.8 panels (a) and (b). Labels: Immunology; Scientists for Science; BioChem; HIV Vaccines & Drugs; Virology; Evolutionary Genetics; Genetic Sequencing; Social Aspects of Health; Public Health; Cambridge Working Group.]
Figure 18.8. Abstracting from sociograms: removing nodes.

When we expand to even larger networks where the primary goal is identifying community memberships, we have found heat map and contour sociogram (Moody & Light, 2006) approaches to be quite effective.

The contour sociogram approach aims to represent the aggregate structure of the network by exploiting the fact that algorithms like Fruchterman-Reingold or MDS place nodes that are strongly tied to each other or otherwise structurally equivalent on top of (or very near) each other. Figure 18.9 describes the process using cocitation networks among all social science journals. Panel 1 is a somewhat crowded traditional representation. We can highlight the clustering in the network by layering a contour surface over the distribution of nodes (panel 2), but since we care little about any particular node, we can simply remove the nodes and replace them entirely with the two-dimensional density surface, which captures the probability of finding a node at each xy point in the display plane (panel 3). We have found these representations to be quite effective in bibliometric analysis, and a number of such graphs are to be found throughout the volume. See also the VOS software tool (Waltman, van Eck, & Noyons, 2010), which implements this sort of layout for bibliometric data.

[Figure 18.9 field labels include Economics, Geography, Political Science, Law, Anthropology, Business, Library Science, Sociology, Management, and Psychology.]
Figure 18.9. Abstracting from sociograms: contour shading.

A slightly different approach, which is very effective at displaying dense networks, is to treat the adjacency matrix as a heat map. The core task associated with matrix-based heat maps is finding an optimal ordering of the rows/columns of the adjacency matrix; the most common approach is to order by observed subgroups, yielding a block-diagonal matrix. Figure 18.10 illustrates this heat map method. Here, based on matrices from Waugh et al. (2009), Shai and coauthors (see Chapter 16 in this volume) illustrate increases in congressional polarization from the late 1950s to the early 2000s based on voting similarity.

If one is instead interested in highlighting a particular subset of nodes, it is often effective to collect equivalent nodes and display relations among the collective set. Simple reductions include removing nodes with low degree (i.e., focusing only on highly connected nodes) or placing them within a fixed orbit of their dominant neighbors.
One might, for example, collapse all nodes within a cluster who have no ties outside, focusing one’s attention on between-cluster connections. Figure 18.11 provides one solution to this problem when faced with trying to capture the overall extent of political polarization in the US Senate (Moody & Mucha, 2013). Here, we collapsed all nodes by structural equivalence, which naturally results in many senators who occupy a “party loyalist” position being grouped together, allowing us to highlight those rare senators who cross the aisle. Combining many of the techniques described earlier, we then root each party loyalist position at a fixed coordinate defined by the modularity score, use a spring-embedder-based placement of the nonloyalist positions, which nicely spaces them between their contacts in proportion to their balance of similarity, and add identity arcs to link senators to themselves over time.
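Collapsing nodes by structural equivalence can be illustrated with a strict toy version of the idea: nodes with exactly the same set of neighbors are merged into a single position, and ties are redrawn between positions. This is a sketch under stated assumptions, not the method behind Figure 18.11. Real applications usually rely on similarity scores rather than exact matches, and the node names below are hypothetical.

```python
def collapse_equivalent(nodes, edges):
    """Merge nodes that have identical neighbor sets into one position.

    Returns (members, reduced_edges): a mapping from position name to its
    member nodes, and the sorted list of ties between positions.
    """
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Nodes sharing the same neighbor set occupy the same position.
    by_profile = {}
    for n in nodes:
        by_profile.setdefault(frozenset(neighbors[n]), []).append(n)

    position_of = {}
    members = {}
    for group in by_profile.values():
        name = "+".join(sorted(group))
        members[name] = sorted(group)
        for n in group:
            position_of[n] = name

    reduced = sorted(
        {tuple(sorted((position_of[u], position_of[v]))) for u, v in edges}
    )
    return members, reduced


# Hypothetical chamber: loyalists tied only to their own party leader,
# plus one member tied to both leaders.
nodes = ["d1", "d2", "d3", "r1", "r2", "swing", "leadD", "leadR"]
edges = [
    ("d1", "leadD"), ("d2", "leadD"), ("d3", "leadD"),
    ("r1", "leadR"), ("r2", "leadR"),
    ("swing", "leadD"), ("swing", "leadR"),
]
members, reduced = collapse_equivalent(nodes, edges)
```

The size of each position (the length of its member list) can then be encoded as node size, so that the collapsed drawing still conveys how many actors each position represents.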


Figure 18.10. Senate covoting similarity heat maps.
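The row and column ordering that underlies a heat map such as Figure 18.10 can be sketched as a simple permutation of the matrix by group membership. The example below is a minimal illustration with made-up similarity values and hypothetical group labels; it assumes the groups are already known, whereas in practice they would usually come from a community detection step.

```python
def block_order(matrix, groups):
    """Reorder a square matrix so members of the same group are adjacent.

    Returns (order, reordered), where order lists the original row indices
    in their new sequence.
    """
    order = sorted(range(len(matrix)), key=lambda i: (groups[i], i))
    reordered = [[matrix[i][j] for j in order] for i in order]
    return order, reordered


# Hypothetical voting-similarity matrix for four legislators.
similarity = [
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
]
groups = ["D", "R", "D", "R"]
order, blocked = block_order(similarity, groups)
```

After reordering, the high within-group similarities sit in two blocks along the diagonal, which is the pattern a heat map color scale makes visible.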

[Figure 18.11 title: US Senate Voting Similarity Networks, 1975-2012. Legend: polarization (modularity); group size; vote similarity (> 0.6); within-group vote similarity; senators crossing time. Timeline: president, Senate party balance, date (through June 7, 2012).]

Figure 18.11. Senate voting similarity.

Conclusions

Network visualizations provide unique opportunities to visualize social structure and have been a key aspect of the field since its inception in the 1930s. The core challenge of network visualization has always been one of scale: the item of interest has more dimensions than the display space, and as such we always have to accept tradeoffs in what we emphasize in the visualization. The workhorse of network visualization is the sociogram (nodes connected by links), and the central advances in network visualization have been ways to either generalize the sociogram or layer on new sorts of information. Automatic layout procedures have become quite sophisticated, but researchers should always remember that any achieved layout is the result of multiple heuristics aimed at aiding the conveyance of information, and they should not be afraid to adjust or modify layouts to help convey substantive information or to highlight theoretical concerns. A primary growth area in network research recently has been the move from simple point-and-line diagrams to visualization tools that describe a field of social interaction, network dynamics, or interactive online information exploration.

A Note on Software

Here we have deliberately avoided extensive summaries of, or instruction on, any particular software package. All original work in this chapter was done using the PAJEK network visualization system, with output edited in Adobe Illustrator. Common alternatives include Gephi, the VOS Viewer, Netdraw (included with UCINET), and the packages associated with iGraph or Statnet in R. In the interactive space JavaScript tools dominate, including D3, Cytoscape, sigma.js, and vis.js. Python has similar tools. Since all tools are heuristic approximations, we strongly recommend tools that allow for by-hand user exploration and adjustments (particularly for small to medium-sized networks), which is often not easy with scripted tools such as iGraph or Statnet. As such, the choice between software tools is generally less important than developing proficiency with the available options in your preferred program.

References

Alba, R. (1972). SOCK. Behavioral Science, 17, 326–327.
Bender-deMoll, S., Morris, M., & Moody, J. (2008). Prototype packages for managing and animating longitudinal network data: Dynamic network and rSoNIA. Journal of Statistical Software, 24, 7.
Bender-deMoll, S., & McFarland, D. A. (2006). The art and science of dynamic network visualization. Journal of Social Structure, 7(2). http://www.cmu.edu/joss/content/articles/volume7/deMollMcFarland/
Bender-deMoll, S. (2018). Package vignette for NDTV: Network dynamic temporal visualizations. https://cran.r-project.org/web/packages/ndtv/vignettes/ndtv.pdf
Bjerstedt, A. (1952). Chess-board sociogram for sociographic representation of choice directions and for the analysis of sociometric locomotions. Sociometry, 15, 244–262.
Bronfenbrenner, U. (1944). The graphic presentation of sociometric data. Sociometry, 7, 283–289.
Carroll, W. K., & Sapinski, J. P. (2010). The global corporate elite and the transnational policy-planning network, 1996–2006: A structural analysis. International Sociology, 25(4), 501–538.
Edelmann, A., Moody, J., & Light, R. (2017). When scientists take a stand on contentious issues: Disparate foundations of scientists’ policy positions on contentious biomedical research. Proceedings of the National Academy of Sciences, 114(24), 6262–6267. doi:10.1073/pnas.1613580114
Fisher, J. C., & Merli, M. G. (2014). Stickiness of respondent-driven sampling recruitment chains. Network Science, 2, 298–301.
Frank, K. A., & Yasumoto, J. Y. (1998). Linking action to social structure within a system: Social capital within and between subgroups. American Journal of Sociology, 104, 642–686.
Freeman, L. C. (2000). Visualizing social networks. Journal of Social Structure, 1, 1.
Fruchterman, T. J. J., & Reingold, E. (1991). Graph drawing by force-directed placement. Software—Practice and Experience, 21, 1129–1164.
Gastner, M. T., & Newman, M. E. J. (2004). Diffusion-based method for producing density-equalizing maps. Proceedings of the National Academy of Sciences, 101, 7499–7504.
Graph Commons. (2018). The Trump network. https://graphcommons.com/graphs/ee4a43a23189-4f82-879c-960344332ea6
Healy, K. (2013). A co-citation network for philosophy. https://kieranhealy.org/philcites/
Healy, K. (2019). Data visualization: A practical introduction. Princeton, NJ: Princeton University Press.
Healy, K., & Moody, J. (2014). Data visualization in sociology. Annual Review of Sociology, 40, 105–128.
Holten, D. (2006). Hierarchical edge bundles: Visualization of adjacency relations in hierarchical data. IEEE Transactions on Visualization and Computer Graphics, 12, 741–748.
International Consortium of Investigative Journalists. (2019). Offshore leaks database. https://offshoreleaks.icij.org/
Kadushin, C. (1974). The American intellectual elite. Boston, MA: Little, Brown.
Kamada, T., & Kawai, S. (1989). An algorithm for drawing general undirected graphs. Information Processing Letters, 31, 7–15.
Laumann, E. O., & Guttman, L. (1966). The relative associational contiguity of occupations in an urban setting. American Sociological Review, 31, 169–178.
Light, R., & Adams, J. (2016). Knowledge in motion: The evolution of HIV/AIDS research. Scientometrics, 107(3), 1227–1248.
Lundberg, G. A., & Steele, M. (1938). Social attraction-patterns in a village. Sociometry, 1(3/4), 375–419.
Loomis, C. P. (1943). Ethnic cleavages in the Southwest as reflected in two high schools. Sociometry, 6(1), 7–26.
Moody, J. (2018). Multiple sources of interdisciplinary training. http://www.soc.duke.edu/~jmoody77/S@D/2017Viz/DNAC_Poster_2018.pdf
Moody, J., McFarland, D. A., & Bender-deMoll, S. (2005). Dynamic network visualization: Methods for meaning with longitudinal network movies. American Journal of Sociology, 110, 1206–1241.
Moody, J., & Light, R. (2006). A view from above: The evolving sociological landscape. American Sociologist, 38, 67–86.
Moody, J., & Mucha, P. J. (2013). Portrait of political party polarization. Network Science, 1, 119–121.
Moreno, J. (1934). Who shall survive: A new approach to the problem of human interrelations. Washington, DC: Nervous and Mental Disease Publishing Co.
Northway, M. L. (1940). A method for depicting social relationships by sociometric testing. Sociometry, 3(2), 145–150.
Tufte, E. R. (2001). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Tutte, W. T. (1963). How to draw a graph. Proceedings of the London Mathematical Society, 13, 743–768.
Waltman, L., van Eck, N. J., & Noyons, E. C. M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4, 629–635.
Waugh, A. S., Pei, L., Fowler, J. H., Mucha, P. J., & Porter, M. A. (2009). Party polarization in Congress: A network science approach. SSRN. Retrieved from http://ssrn.com/abstract=1437055

Chapter 19

The Spatial Dimensions of Social Networks

Zachary P. Neal

The first law of geography holds that “everything is related to everything else, but near things are more related than distant things” (Tobler, 1970, p. 236). This law captures the phenomenon of spatial dependence that is the essence of research in all areas of geography: where things are matters. For geographers, the concepts of “near” and “distant” that appear in the law unambiguously refer to relative locations in physical or topographical space, which is measured in meters. For example, I am physically near the president of Michigan State University (i.e., about 0.5 km) but physically distant from the president of the United States (i.e., more than 800 km).

A nearly identical law exists in network science. The first law of networks, if such a thing exists, would also be that everything is related to everything else, but near things are more related than distant things. This law captures the phenomenon of structural dependence that is the essence of research on all types of networks: where things are matters. The only difference between the first laws of networks and of geography is the meaning of the keywords. For network analysts, the concepts of “near” and “distant” that appear in the law refer to relative locations in network or topological space, which is measured in edges. For example, I am topologically near the university president (i.e., we are separated by two edges because I know someone who knows the university president) but topologically distant from the US president (i.e., we are separated by more than two edges because I don’t know anyone who knows the US president).

The topographical space of geographers and topological space of network analysts are often closely related (adams, Faust, & Lovasi, 2012), and as a result, their two laws often collide: everything is related to everything else, but topographically and topologically near things are more related than topographically and topologically distant things.
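The two notions of distance can be placed side by side in a short sketch: topographical distance as great-circle distance in kilometers, and topological distance as the number of edges on the shortest path. The code below is illustrative only. The tie list and node names are hypothetical, the haversine formula assumes a spherical Earth with a mean radius of 6,371 km, and no real locations are implied.

```python
import math
from collections import deque

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (topographical) distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def edge_distance(edges, source, target):
    """Topological distance: edges on the shortest path, or None if no path."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return seen[u]
        for w in adj.get(u, ()):
            if w not in seen:
                seen[w] = seen[u] + 1
                queue.append(w)
    return None


# Hypothetical acquaintance ties mirroring the example in the text.
ties = [("me", "colleague"), ("colleague", "university_president")]
steps_to_university_president = edge_distance(ties, "me", "university_president")
steps_to_us_president = edge_distance(ties, "me", "us_president")
quarter_equator_km = haversine_km(0.0, 0.0, 0.0, 90.0)
```

Computing both quantities for the same pairs of actors is the starting point for asking which kind of nearness matters more for a given outcome.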
The focus of the spatial study of social networks lies in exploring a series of questions embedded in this combined law of geography and networks.¹ First, which is more important: topographical or topological location? For example, when it comes to spreading new ideas, is it more important that two people live near one another or that they know one another? Second, which came first: topographical or topological location? For example, nearby businesses often work together, but do they work together because they are located nearby or are they located nearby because they need to work together? Finally, what are the entities that are simultaneously located in topographical and topological space, and what is the scale of that space? The possible answers to these last questions provide a structure for this chapter, which examines three different kinds of things that are simultaneously embedded in different scales of topographical space and in topological space: people at the micro-scale (e.g., neighbors in a neighborhood), things at the meso-scale (e.g., roads in a city), and places at the macro-scale (e.g., cities in the world).

Micro-Level Networks of People

Most studies of social networks examine networks in which the nodes represent people and the edges represent the range of relationships that exist between people, including friendship, romantic partnership, and conflict. Just as people are embedded in networks of different types of relationships, they are also embedded in physical space: they live somewhere, they work somewhere, and they form relationships somewhere. However, despite the vast body of research on people’s connections to each other developed initially by sociologists and anthropologists and the vast body of research on people’s connections to places by geographers, these two lines of work have remained largely separate. Past and continuing attempts to bridge this gap involve exploring the relationship between people’s physical location relative to one another and their pattern of relationships with one another.

How did it start?

Network analysis in general and spatial network analysis in particular are often seen as relatively new techniques, but they actually have quite a long history. In the early 1930s, the Hudson School for Girls experienced an unusual increase in the number of runaways. Jacob Moreno, an Austrian-born psychologist, was hired to investigate using his new technique of sociometry (Borgatti et al., 2009; Moreno, 1934). His data collection efforts included mapping the relationships among the school’s 435 girls and staff, as well as each girl’s residence in one of 16 cottages.² He found that most relationships among the girls were within cottages, while a few were between cottages, suggesting that each girl’s cottage of residence shaped her social network.
Moreover, this cottage-based network helped to explain how the runaway epidemic developed: the within-cottage network allowed one runaway girl to convince one or more of her cottage mates to also run away, while the between-cottage ties allowed the idea of running away to spread more widely. Drawing on his findings at the Hudson School and other research sites, Moreno argued that networks of attractions and repulsions among a group of individuals could be used to sociometrically plan an entire community. A few years later, these ideas were put into practice in the planning of a subsistence homesteading community, a type of rural public housing project for low-income families. The community’s 35 families were assigned houses based on each family’s position in a network of attraction toward other families. The goal was to “promote more harmonious neighborhood structures” by housing families that liked each other nearby, while housing families that disliked or were indifferent to each other farther apart (cf. Festinger, Schachter, & Back, 1950; Wolman, 1937).

The earliest studies and applications of spatial network analysis to networks of people had the character of social engineering but operated on a common premise that the distance between two people (or families) was related to the type or intensity of social tie between them. However, changes in the spatial organization of human settlements from rural to urban and suburban, and changes in modes of communication from face-to-face to electronic, increasingly called this premise into question, leading scholars to ask whether physical distance still mattered for the structure of social relations. Calling this the “Community Question,” Wellman (1979) identified three answers that have been offered. First, although city living had brought people into closer physical proximity, long solitary commutes and reliance on computers and smartphones may have led to community being lost and personal social networks disappearing. Alternatively, densely populated urban neighborhoods may have made it possible for community and personal social networks to be saved despite changes in how we communicate. Finally, and most consistent with Wellman’s observations, these changes may have liberated communities from the constraints formerly imposed by the friction of spatial distance, allowing personal social networks to include individuals both near and far.

What do we know now?

Driven by these early studies of the role of space in social networks, much contemporary research in this area has sought to probe this basic premise further by asking: how does the probability of a social tie between two people decline as the physical distance between them increases? Perhaps the simplest specification is a social adaptation of Isaac Newton’s law of gravity called a gravity model, $M_1 \times M_2 / D^2$, where $M_1$ and $M_2$ represent the “social mass” of the two people (e.g., their gregariousness) and $D$ is the distance between them.
This specification suggests that two people located next to each other in space (i.e., one unit of distance apart) are four times more likely to form a social tie than if they were separated by two units of distance, and nine times more likely than if they were separated by three units of distance. Of course, reality is not this simple, but empirical investigations have still confirmed that social tie formation declines at greater distances. For example, in one southern US neighborhood, each additional unit of distance between two people led to a 52% reduction in the likelihood of a weak tie and a 62% reduction in the likelihood of a strong tie (Hipp & Perrin, 2009). Additionally, physical distance influenced the formation of social networks as much as social differences. For example, having children has the same effect on one’s ties to child-free friends as moving 50% farther away, while getting married has the same effect on ties to single friends as moving twice as far away. These findings suggest that even in the early 21st century, physical distance still powerfully shapes the formation of social networks.

Both the inverse-squared function in the gravity model and the logarithmic function used by Hipp and Perrin (2009) suggest that for each additional unit of distance between two people, the probability of a social tie between them declines or, more precisely, that the relationship between distance and tie formation is monotonic. Others have recently questioned this assumption, exploring the actual relationship between distance and the existence, creation, and maintenance of friendships among adolescents (Preciado et al., 2012). They found that, consistent with the approach used by Hipp and Perrin (2009), the probability of a tie can be accurately predicted by the logarithm of distance (i.e., $\operatorname{logit}(p) = \beta_0 + \beta_1 \times \log(\mathrm{dist})$), but also that this implies that the probability of a tie follows a power law (i.e., $p \approx \exp(\beta_0) \times \mathrm{dist}^{\beta_1}$).
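The step from the logit specification to the power law can be written out explicitly. The sketch below assumes that log denotes the natural logarithm and that tie probabilities are small enough for the odds p/(1 − p) to approximate p; both are assumptions added here for illustration rather than statements from the cited studies.

```latex
\begin{align*}
\operatorname{logit}(p) = \log\frac{p}{1-p} &= \beta_0 + \beta_1 \times \log(\mathrm{dist}) \\
\frac{p}{1-p} &= \exp(\beta_0) \times \mathrm{dist}^{\beta_1} \\
p &\approx \exp(\beta_0) \times \mathrm{dist}^{\beta_1} \qquad \text{when } p \ll 1
\end{align*}
```

With a negative exponent, the probability falls as distance grows; the gravity model's $1/D^2$ term has the same form with the exponent fixed at −2.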
In this power law, the value of the exponent ($\beta_1$) may range between –0.7 and –0.2 depending on contextual factors like the scale of distance under consideration. Interestingly, they also found some evidence that although the probability of tie formation does generally decrease as the distance between adolescents increases, it does not always decline monotonically. Instead, there appear to be three regimes: the probability declines for greater but still nearby distances (e.g., farther away but still in the same school zone), then levels off for intermediate distances (e.g., outside the school zone but still in the same school district), and finally declines again for longer distances (e.g., outside the school district).

While some have sought greater precision in the functional form of the distance–tie probability relationship, others have focused on alternative ways of measuring distance. Most research in this area focuses on as-the-crow-flies or Euclidean distance; however, this is not actually how people move through space, and thus not the kind of distance that likely influences how they form relationships. The measurement concepts from the theory of space syntax offer one alternative, viewing distance as a kind of “cost” of movement that includes the actual distance traveled (e.g., number of steps), the number of segments required (e.g., turns), and the complexity of the path (e.g., angular change in turns). In office settings where interaction and tie formation must contend with cubicle barriers, long hallways, and hidden photocopier rooms, space syntax measures of distance offer a much better explanation of tie formation than a simple Euclidean conception of distance (Sailer & McCulloh, 2012). In neighborhood settings, an even simpler conception of distance seems to work: how far can you walk without getting hit by a car? Tertiary or T-communities are defined as the maximum region a person can access on foot using only small tertiary streets and without crossing a primary (i.e., high-traffic) street.
Regardless of actual Euclidean distance, social ties are more likely to form between residents of the same T-community than between residents of different T-communities, simply because chance walking encounters are safe within but not between these regions (Grannis, 2009; Hipp, Faris, & Boessen, 2012).

How can we study it?

Increased attention on the relationship between physical distance and social tie formation has spurred recent development in methods for studying this phenomenon. Many of the most promising modeling strategies rely on simulation, including equation-based models, agent-based models, and exponential random graph models. Equation-based models (e.g., Butts et al., 2012) begin by specifying a hypothesized function describing the relationship between distance and tie formation probability, for example, like those identified by Hipp and Perrin (2009) or Preciado et al. (2012). Next, the researcher constructs a simulated social network that might form if every person in a given population formed ties with every other person with the probability specified by the function. Finally, the simulated social network is examined either in comparison to an empirical social network to evaluate whether the hypothesized function accurately reproduces the type of social network observed in reality or to simulated social networks generated by other hypothesized functions to understand how different functions yield different types of network structures. Agent-based models (e.g., Neal, 2015; Neal & Neal, 2014) are closely related but aim to simulate relationship formation as an organic and endogenous process that unfolds over time. Such models begin by specifying a series of simple behavioral rules that guide how agents (often people) interact with each other and with their local environments.
These rules may include instructions about when to form or dissolve a tie with another agent (i.e., selection) but also about when to change one’s own behaviors based on the behaviors of other agents (i.e., influence), as well as internal processes like belief formation and external processes like movement through the environment. Once defined, each agent in the model is allowed to autonomously interact with other agents and the environment following these rules, while the researcher observes the social network that emerges as a result. Similar to equation-based models, the emergent simulated networks can be compared to empirical networks or to simulated networks that emerge under different sets of behavioral rules.

Finally, exponential random graph models (ERGMs) include a broad class of statistical models that resemble regression models but are designed to predict the structure of a network, and which have recently been extended to include physical distance (e.g., Daraganova et al., 2012). In essence, ERGMs are like equation- and agent-based models in reverse. They begin with an empirically measured social network and a set of covariates hypothesized to be responsible for the network’s observed structure. These covariates can include measures of physical distance between the people, as well as other dyad-level attributes (e.g., relationship strength), node-level attributes (e.g., gender, degree), and other structural patterns (e.g., transitivity). The goal of an ERGM is to answer the question: what role would each of these covariates need to have been playing when this network formed for it to have the structure we observe? Thus, whereas equation- and agent-based models begin with hypothesized network formation processes and match them to empirical data, ERGMs begin with an empirical network and attempt to derive the network formation process most likely to have been responsible for its formation.
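The three steps of an equation-based model described above (specify a distance function, simulate ties pair by pair, inspect the result) can be sketched in a few lines. This is a minimal illustration, not the procedure of any cited study: the function names, the default coefficients `beta0` and `beta1`, and the random point layout are all hypothetical, and pairs at zero distance are simply skipped because the logarithm is undefined there.

```python
import math
import random


def tie_probability(dist, beta0=-1.0, beta1=-0.5):
    """Tie probability implied by logit(p) = beta0 + beta1 * log(dist)."""
    log_odds = beta0 + beta1 * math.log(dist)
    return 1.0 / (1.0 + math.exp(-log_odds))


def simulate_network(coords, beta0=-1.0, beta1=-0.5, seed=0):
    """Draw one undirected network in which each pair is tied independently."""
    rng = random.Random(seed)
    edges = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            # Co-located points (d == 0) are skipped: log(0) is undefined.
            if d > 0 and rng.random() < tie_probability(d, beta0, beta1):
                edges.add((i, j))
    return edges


def density(edges, n):
    """Share of the n * (n - 1) / 2 possible undirected ties that are present."""
    return len(edges) / (n * (n - 1) / 2)


if __name__ == "__main__":
    layout_rng = random.Random(42)
    people = [(layout_rng.uniform(0, 10), layout_rng.uniform(0, 10)) for _ in range(50)]
    # Compare two hypothesized functions: weak versus strong distance decay.
    for beta1 in (-0.2, -0.7):
        net = simulate_network(people, beta1=beta1)
        print(f"beta1={beta1}: density={density(net, len(people)):.3f}")
```

Comparing the densities (or any other structural summary) across hypothesized functions, or against an observed network, corresponds to the final evaluation step described in the text.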

Meso-Level Networks of Things

Because people are typically the nodes in social networks, the most common form of a spatially embedded social network is a network of people embedded in specific locations in neighborhoods, schools, or offices. However, networks of inanimate objects like roads, pipes, power lines, and other forms of infrastructure can also be viewed as social networks because their structures are socially meaningful. Research on how pathways in offices (Sailer & McCulloh, 2012) or through neighborhood streets (Grannis, 2009) affect social relations offers examples of how infrastructure networks have social consequences. Moreover, infrastructure networks are often even more spatially bound than networks of social relations because, unlike people, infrastructure mostly does not move around. Of course, not all research on infrastructure networks is spatial. For example, Watts and Strogatz (1998) used the US power grid to demonstrate the features of a small-world network but did not make reference to the fact that generators, transformers, and substations are all located in specific places. However, increasingly the spatially embedded nature of infrastructure is being incorporated into network analyses.

How did it start?

The history of examining spatial infrastructure networks is as old as graph theory itself and begins with a puzzle that confronted the 18th-century villagers of Königsberg (now Kaliningrad in present-day Russia). The Pregolya River runs through the village and contains two small islands that were connected to each other and to the north and south banks of the village by a series of seven bridges (see Figure 19.1). Several, likely apocryphal, stories exist about the exact nature of the puzzle. One holds that the villagers, after a few pints at the tavern, would attempt to take a stroll through town using each bridge only once, then return to the tavern for another refreshment. Another holds that the king wished to hold a celebratory parade through town that crossed each bridge, but only once. However, it proved impossible to cross each bridge only one time. In 1741, Swiss mathematician Leonhard Euler explained why by focusing only on the space’s most important elements, the four land masses (nodes) and seven bridges (edges). From this abstract graph representation, Euler demonstrated that the availability of the path the villagers sought (one that uses each edge exactly once) depends on the nodes’ degrees. Such a path exists only when none of the nodes, or when exactly two of the nodes, have an odd degree. In Königsberg, all four land masses were served by an odd number of bridges, and thus no such path is possible. It was in his proof that spatial networks and graph theory were born.

Figure 19.1  The bridges of Königsberg. Reprinted from Neal (2013, p. 70).

What do we know now?

Since the 18th century, analysis of spatially embedded infrastructure networks has turned from plotting drunken village walks to somewhat more pressing matters. In the past decade, three broad issues have received particular attention: robustness, wayfinding, and universality. The issue of networks’ robustness under conditions of failure (e.g., a node or edge is removed) initially rose to prominence with Albert, Jeong, and Barabási’s (2000) observation that networks with scale-free degree distributions are robust against random failure (e.g., accidents) but vulnerable to targeted failure (e.g., attacks). In the more concrete context of street networks, research on robustness has turned to measuring and developing street systems that facilitate travel even when certain roads get
congested or closed. Although relying on existing network analytic tools, this represents a significant departure from traditional approaches to planning transportation systems, which have relied primarily on the volume-to-capacity ratio of single links, while ignoring these links’ role in the wider network (Scott et al., 2006). Moreover, recent work in this area has demonstrated that thoughtful network planning requires not only understanding how individual road segments fit into the wider network’s topology but also understanding how the spatial organization of the network overlays onto other spatially distributed features like where people live and where they want to go (Jenelius, 2009).

Identifying geodesic paths, that is, the shortest path between two nodes in a network, is a complex task for which the development of better and faster algorithms is ongoing. In the context of spatially embedded networks like road networks, identifying the geodesic path means finding the shortest route from point A to point B, which is also difficult and helps explain the widespread adoption of in-car GPS navigation systems. Although theoretical research on shortest-path algorithms in abstract graphs is helpful, unique features of wayfinding problems in road networks have spurred a separate line of inquiry. First, queries about the shortest path are typically made by drivers in moving cars, and thus need to be answered quickly on small computers with limited computational capacity (i.e., practically efficient algorithms), for very large networks (i.e., asymptotically efficient algorithms). Some solutions to this challenge have involved incorporating the same hierarchical features of street networks noted by Grannis (2009): focus the algorithm on getting close to the destination by searching primary arterial roads first, then zero in on the destination by searching tertiary neighborhood streets (Zhu et al., 2013).
Second, unlike abstract graphs, the nodes in street networks feature traffic lights, which may not block movement, but can certainly slow it down. Thus, while in graph theory attention has been on developing shortest-path algorithms, attention in infrastructure network wayfinding has instead been on developing shortest-time and shortest-cost-path algorithms, which are hard, indeed NP-hard (Ahuja et al., 2002). Finally, from the perspective of the driver (and sometimes from the GPS computer), there can be significant uncertainty about features of the road network, including whether certain segments are congested, whether certain segments are closed, and indeed about the structure of the network beyond what the driver can see out the windshield at the moment. Thus, while most shortest-path algorithms begin with a known network, algorithms for wayfinding in road networks must incorporate elements of uncertainty (Chen et al., 2013).

Although spatially embedded infrastructure networks may be different from nonspatial abstract graphs, and thus require different analytic approaches, contemporary research in this area has also turned to examining these networks’ similarity to nonspatial networks and to deriving universal laws. Spatially embedded infrastructure networks including roads (Jiang, 2007), airlines (Neal, 2014), and power lines (Watts & Strogatz, 1998) all have the same small-world structure and scale-free degree distributions that are observed in many other nonspatial networks. Likewise, features of road networks including the number of nodes (i.e., intersections), number of edges (i.e., road segments), and edge weights (i.e., road capacity) are proportional to the geographic and population size of the city they are in, and in roughly the same way as analogous features of cardiovascular (Samaniego & Moses, 2008) and neural (Changizi & Destefano, 2010) networks are related to the size of the organisms they are in.
These apparent universalities have led some to speculate that the structure of spatially embedded networks that perform distribution functions is deterministic (Bejan, 1996; West, Brown, & Enquist, 1997).
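The contrast drawn earlier between shortest-distance and shortest-time wayfinding can be illustrated on a toy road network. The sketch below uses Dijkstra's algorithm with fixed, known edge weights, which is a deliberate simplification: the time-dependent, traffic-light, and uncertain variants discussed in the text are harder problems and are not what this code solves. The network, the attribute names `km` and `min`, and the helper `add_road` are all hypothetical.

```python
import heapq


def add_road(graph, a, b, km, minutes):
    """Add a two-way road segment with a length and a fixed travel time."""
    graph.setdefault(a, {})[b] = {"km": km, "min": minutes}
    graph.setdefault(b, {})[a] = {"km": km, "min": minutes}


def dijkstra(graph, source, target, weight):
    """Return (total cost, path) minimizing the chosen edge attribute."""
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, attrs in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (cost + attrs[weight], neighbor, path + [neighbor]))
    return float("inf"), []


roads = {}
# Short but slow route through neighborhood streets.
add_road(roads, "A", "B", km=2.0, minutes=10.0)
add_road(roads, "B", "D", km=2.0, minutes=10.0)
# Longer but faster route along an arterial road.
add_road(roads, "A", "C", km=5.0, minutes=4.0)
add_road(roads, "C", "D", km=5.0, minutes=4.0)

if __name__ == "__main__":
    print("shortest distance:", dijkstra(roads, "A", "D", "km"))
    print("shortest time:", dijkstra(roads, "A", "D", "min"))
```

On this toy network the two criteria select different routes, which is the basic reason road wayfinding cannot rely on geometric distance alone.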

How can we study it?

Spatially embedded infrastructure networks are studied using all the same tools and methods as other networks, but a few methodological concepts are particularly relevant. First, nearly all road networks and many other types of infrastructure networks are examples of planar graphs. Planar graphs are a special subset of graphs that can be embedded (i.e., drawn) on a two-dimensional plane such that no edges intersect. The relationship between the features of a planar graph and a spatially embedded infrastructure network is easy to see: the two-dimensional plane on which the graph is embedded is the earth’s surface, over which the roads or other infrastructure runs, while the requirement that no edges intersect corresponds to the fact that any intersection of two roads is represented by a node.³ Recognizing that road networks are planar graphs is important because planar graphs have different statistical properties compared to all graphs (Masucci et al., 2009). For example, a graph with $N$ nodes that is not required to be planar can contain a maximum of $(N^2 - N)/2$ undirected edges, but a planar graph can contain a maximum of only $3(N - 2)$ edges, which in turn has implications for evaluating the density of planar graphs.

When studying road and other infrastructure networks, it is also useful to recognize that there are at least two distinct ways of representing them as graphs. The primal approach, which is the most common and familiar, represents roads as edges and intersections as nodes, much like a map. The left panel of Figure 19.2 illustrates the Bridges of Königsberg network using the primal approach: land masses are numbered nodes, while bridges are lettered edges. Primal representations are planar graphs and indicate which locations are accessible from other locations via segments of infrastructure. For example, in this figure, the centrality of node 2 highlights that one can reach all other land masses from this island.
In contrast, the dual approach represents roads as nodes and intersections as edges. The right panel of Figure 19.2 illustrates the same Bridges of Königsberg network, but using the dual approach: the bridges are lettered nodes, while land masses are numbered edges. Dual representations are typically nonplanar and more complex than their primal counterparts. They indicate which infrastructure segments provide access to other segments via intersections. For example, in this figure, the centrality of node F highlights that one can reach all other bridges from this bridge. Thus, the primal and dual approaches provide different ways of considering questions of access and accessibility in spatially embedded infrastructure networks (Porta, Crucitti, & Latora, 2006).

Figure 19.2  Primal and dual representations of the bridges of Königsberg.

Finally, for many decades traffic engineers have relied on notions of hierarchy when designing urban traffic systems. A road’s location in the hierarchy depends on its function in the overall system, which in turn is closely associated with its capacity. At the top of the hierarchy are freeways, which serve to rapidly distribute individuals to general areas of the city, while progressively lower-capacity, slower roads (e.g., arterial surface streets, collector streets, local streets, cul-de-sacs) are located at lower levels of the hierarchy. These hierarchical levels are also reflected in the structure of the network. For example, a given freeway will be connected to most other freeways in the area, but to only a small number of arterial streets, and never to a cul-de-sac. However, only recently have tools for the description and analysis of multilevel, multilayer, or hierarchical networks emerged (Boccaletti et al., 2014; Wang et al., 2013). These methodological developments open up new possibilities for analyzing the distinctively hierarchical structure of road networks.
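The primal and dual representations of the seven bridges, Euler's odd-degree condition, and the planar edge bound discussed earlier in this section can be checked together in a short sketch. The land-mass names and bridge labels `b1` to `b7` are hypothetical and are not meant to match the numbering or lettering in Figure 19.2; the bridge layout itself is the standard one (two bridges from each bank to the central island, one from each bank to the eastern island, and one between the islands).

```python
from collections import Counter
from itertools import combinations

# Primal representation: land masses are nodes, bridges are edges.
bridges = {
    "b1": ("north", "island"),
    "b2": ("north", "island"),
    "b3": ("south", "island"),
    "b4": ("south", "island"),
    "b5": ("island", "east"),
    "b6": ("north", "east"),
    "b7": ("south", "east"),
}


def degrees(pairs):
    """Count how many edges touch each node."""
    deg = Counter()
    for a, b in pairs:
        deg[a] += 1
        deg[b] += 1
    return deg


def has_euler_path(pairs):
    """Euler's condition for a connected graph: zero or two odd-degree nodes."""
    odd = sum(1 for d in degrees(pairs).values() if d % 2 == 1)
    return odd in (0, 2)


def dual(edges):
    """Dual representation: bridges become nodes, linked if they share a land mass."""
    links = set()
    for (e1, ends1), (e2, ends2) in combinations(edges.items(), 2):
        if set(ends1) & set(ends2):
            links.add((e1, e2))
    return links


def max_edges(n):
    """Maximum undirected edges among n nodes: (n^2 - n) / 2."""
    return (n * n - n) // 2


def max_planar_edges(n):
    """Maximum edges in a planar graph with n >= 3 nodes: 3(n - 2)."""
    return 3 * (n - 2)


if __name__ == "__main__":
    print("primal degrees:", dict(degrees(bridges.values())))
    print("Euler path exists:", has_euler_path(bridges.values()))
    dual_links = dual(bridges)
    print("dual degrees:", dict(degrees(dual_links)))
    # A simple graph with more links than the planar bound cannot be planar.
    print("dual links:", len(dual_links), "planar bound:", max_planar_edges(len(bridges)))
```

In this layout the bridge joining the two islands is adjacent to every other bridge in the dual graph, which mirrors the kind of centrality the text attributes to one bridge in the dual panel.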

Macro-Level Networks of Places

Spatially embedded networks of people often focus on networks within micro-scale spaces like neighborhoods, while networks of things often focus on networks within meso-scale spaces like whole cities (e.g., for road networks) or countries (e.g., for train or power grid networks). Shifting to a still higher scale, it is possible to examine spatially embedded networks of entire places, for example, networks of cities and countries that span the globe. Because there are many different ways that places around the globe might be linked, it can be helpful to classify these types of networks by the form and function of the linkages (see Table 19.1, adapted from Smith & Timberlake, 1995). Whether the interactions between places take a human, information, or material form, and whether they serve an economic, political, cultural, or social function, defines a unique type of spatially embedded network with a distinctive structure. For example, a network capturing material-form and economic-function exchanges might focus on intermodal commodity movements, and thus be centralized around the world’s key ports like Shanghai and Rotterdam. In contrast, a network capturing human-form and social-function exchanges might focus on leisure passenger travel, and thus be seasonally centralized around places with good weather.

Table 19.1  Typology of Networks of Places

Function     Form: Human        Form: Information      Form: Material
Economic     Labor migration    Teleconference         Commodity
Political    Diplomatic visit   Policy collaboration   Mutual aid
Cultural     Student exchange   Music                  Artwork
Social       Vacation           Phone call             Care package

Figure 19.3  Early global urban networks.

How did it start?

Although the formal analysis of spatially embedded networks of places at the global scale is relatively recent, these networks have a long history. Figure 19.3 illustrates three early global urban networks. The ancient Silk Road that linked Eastern cities supplying spices and textiles to the West highlights the simultaneous topographical and topological betweenness of Constantinople (now Istanbul), which helps explain its role as a trading outpost and powerful gateway city. The medieval trade networks illustrate the beginnings of powerful trading empires (e.g., Venice and Genoa) and alliances (e.g., the Hanseatic League, including London, Amsterdam, and Stockholm). More recently, the submarine telegraph cables that permitted “high speed” transoceanic communication confirm that even by 1903 New York and London, sometimes abbreviated NYLON in the context of global urban networks, were key nodes. There is also a long history of theorizing about these types of networks. As early as 1927, sociologist Roderick McKenzie observed that “the world is fast becoming a closed region . . . in which centers and routes are gaining precedence over boundaries and political
areas” (p. 28). Several decades later, Manuel Castells (1996) echoed this observation, noting that the world was transitioning from a “space of places” to a “space of flows.” Both McKenzie and Castells were describing the development of networks that are simultaneously spatially embedded and spatially unconstrained: they link specific places located in physical space but do so over long distances and across borders. As a result, cities’ roles in the world system have become more a function of their position in topological space than in topographical space (Neal, 2011). Drawing on these ideas, Friedmann (1986) offered an initial but heuristic view of such a topological space: a portrait of what a global urban network might look like if it were to be formally measured. Since this time, much of the scholarship on spatially embedded networks of places has focused on developing methodologies for measuring them and on describing their basic structural properties.

How can we study it?

In the case of spatial networks of places, it is useful to address the question “How can we study it?” first because this is where much of the effort has been devoted. Most studies of aggregate units like cities and countries rely on official statistics (e.g., the census) collected by government agencies and other organizations. However, these official statistics rarely contain structural or network information; they report details about specific places (e.g., the population of city X, the gross domestic product of country Y) but not about how those places interact with one another (e.g., where city X residents are from, where country Y sends its exports). This has led global-scale spatial network researchers to look for alternate data sources and measurement strategies. Among the solutions that have emerged, the most widespread are those that rely on the movement of people or the location of firms to make inferences about interactions between cities.
The movement of people between places represents an obvious way to conceptualize and measure a network of places. These movements represent not only the exchange of the people themselves but also exchanges of the ideas in those people’s heads, and of the money in their wallets. Thus, networks of human movement can also be viewed as indirect reflections of networks of information and economic exchange. The most promising approach to measuring the global-scale movement of people between cities is to focus on airline travel, both because this is the preferred mode of long-range travel and because a range of potential data sources exist. However, studying global urban networks using airline passenger travel requires consideration of a number of measurement subtleties (Neal, 2014). First, should the nodes in the network represent the actual airports providing transportation service or a spatial aggregation like a city or region? The former may be more useful for studies of transportation infrastructure, while the latter may be more useful for studies of urban phenomena. Second, should the edges in the network represent passengers’ actual travel routes including layovers or only passengers’ initial origin and final destination regardless of the route taken? The former may be more useful for research on things diffused by mere physical presence (e.g., germs), while the latter may be more useful for research on things diffused only through more intense engagement (e.g., ideas). Finally, should the network include the movement of all people or only people traveling for a specific purpose (e.g., business vs. leisure)? This measurement decision is analogous to the selection of a broad (e.g., “Who do you know?”) or narrow (e.g., “Who have you had sex with in the last week?”) name generator in traditional social network data collection. 
THE Spatial Dimensions of Social Networks   379

A second strategy for measuring global-scale spatial networks relies on firm locations to make inferences about the locations of economic interactions. One approach focuses on the fact that most multinational corporations have a hierarchical structure: a single global headquarters in one city interacts with several strategically located regional offices, which in turn interact with specific production or retail sites scattered globally. By aggregating the place-based hierarchies of multiple firms, it is possible to construct a network that identifies the cities where corporate control originates and the cities where corporate control is exerted (Alderson & Beckfield, 2004). An alternative approach focuses on the fact that cross-border transactions are complicated for corporations due to differences in laws, currencies, and customs, and that these complications lead corporations to rely on advanced producer services (APSs) like international law firms and management consultancies to facilitate their global transactions. These APS-facilitated transactions are maximized when the contracted APS firm maintains a branch location in both the origin and destination cities. By identifying pairs of cities that host branches of the same APS firms (a special case of a two-mode network projection), it is possible to construct a network that identifies dyads with the highest capacity for economic interaction (Taylor, 2001; Neal, 2017).

What do we know now?

Because the initial research on spatially embedded networks of places has focused on overcoming measurement challenges, much of the analysis of these networks to date has been descriptive. For example, a large body of research has suggested that intercity air traffic networks at the global, continental, and national scales are examples of small-world, scale-free, modular networks (cf. Kaluza et al., 2010; Neal, 2014). Similarly, more than 15 years of research by the Globalization and World Cities (GaWC) research network has found that in firm-based networks of places, New York and London are consistently the most central, while there is significant instability below this top tier (Taylor & Derudder, 2016).
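The firm-based networks referred to here rest on a two-mode projection: a city-by-firm table is collapsed into city-by-city ties weighted by shared firms. The following is a minimal sketch of that idea only. The firm names and branch locations are invented, the function name is hypothetical, and published world city network analyses typically weight branch offices by their importance rather than simply counting co-presence.

```python
from itertools import combinations

# Hypothetical branch locations: city -> set of APS firms with an office there.
BRANCHES = {
    "New York": {"LawCo", "ConsultCo", "AuditCo"},
    "London": {"LawCo", "ConsultCo"},
    "Tokyo": {"ConsultCo", "AuditCo"},
    "Lagos": {"AuditCo"},
}


def project_to_cities(branches):
    """One-mode projection: weight each city pair by the number of shared firms."""
    edges = {}
    for a, b in combinations(sorted(branches), 2):
        shared = len(branches[a] & branches[b])
        if shared:  # keep only pairs that host at least one common firm
            edges[(a, b)] = shared
    return edges


if __name__ == "__main__":
    for pair, weight in project_to_cities(BRANCHES).items():
        print(pair, weight)
```

Pairs with no firm in common, such as Lagos and London in this toy data, receive no tie, so the projected network records capacity for interaction rather than observed transactions.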
Moving beyond such descriptive findings, research on the causes and consequences of these types of networks remains more limited but has begun to examine issues of accessibility and disease spread. The structure of the global air transport network privileges some cities and regions as highly accessible while simultaneously marginalizing other cities and regions as inaccessible. Moreover, the structure of this network, and the accessibility of places, is not a simple function of their location or distance from one another. For example, cities in Australia and New Zealand are the most spatially distant from major hub airports, but they are not the least accessible. Instead, cities throughout the global south are the least accessible, having the highest absolute and per-kilometer time and fare costs (Zook & Brunn, 2006). These places’ relative lack of accessibility is socially significant because it suggests the network structure ensures their continued marginality in the larger system of world cities. This same type of network also has important consequences for the spread of disease, including seasonal influenza (Colizza et al., 2006). In early 2009, individuals in Mexico were exposed to a new form of influenza A (H1N1), and international visitors to Mexico unwittingly spread this virus globally upon their return home. By examining the network of international air travel patterns, researchers were able to predict with more than 92% accuracy the locations where this virus was imported. This successful use of the spatially embedded air transportation network to predict the outcome of a public health crisis led to the formation of the Bio.Diaspora program in Toronto (Khan et al., 2009), which in 2012 used similar network data to evaluate epidemic risks of travel to London during the summer Olympics (Khan et al., 2012).
This program has since developed into BlueDot, a Certified B Corporation that uses spatial networks at the global scale to predict and prepare for epidemics.


Networks in Latent Space

For each of the types of networks discussed in the preceding sections, “space” refers to physical or geographic space. However, another type of space is also important for social networks: latent space (Hoff, Raftery, & Handcock, 2002). Latent space is a conceptual tool and mathematical abstraction rather than an actual location. In the context of social networks, the dimensions of latent space are typically social characteristics including demographics and attitudes. For example, a 20-year-old female Democrat would occupy a different, and perhaps quite distant, location in a three-dimensional (i.e., age ✕ gender ✕ party affiliation) latent space than a 40-year-old male Republican. Most latent space approaches to network analysis are rooted in an assumption that social ties are more likely between people who are more socially similar (i.e., homophily), or stated in explicitly spatial terms: social ties are local in social space (McPherson, 2004). In some cases, the analysis begins with data on each individual’s characteristics (i.e., their positions within a social space) and attempts to infer the probability of a social tie between dyads. In other cases, the analysis begins with an observed network and attempts to infer the dimensions of the latent space within which it can be embedded, and the nodes’ positions within that space. This is achieved, for example, by applying multidimensional scaling to the set of topological distances between pairs of nodes in a network, which yields a set of topographical coordinates for each node along one or more dimensions. Following such an analysis, the dimensions may be interpreted as representing specific social characteristics (e.g., age), while nodes’ positions may be interpreted as representing individuals’ values of those characteristics.
Node coordinates obtained using a latent space approach can also be useful for node placement in network visualizations because they yield sociograms in which topologically close nodes are located near one another, while topologically distant nodes are located farther apart.
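The multidimensional scaling step described above can be sketched as follows. This is an illustrative implementation of classical (Torgerson) scaling applied to shortest-path distances, assuming a connected, undirected network and the NumPy library; it is one simple variant and not the statistical latent space model of Hoff et al. The function names are hypothetical.

```python
import numpy as np


def shortest_path_lengths(adj):
    """All-pairs hop distances (Floyd-Warshall) from a 0/1 adjacency matrix."""
    n = len(adj)
    dist = np.where(np.asarray(adj) > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        # Allow paths that pass through node k.
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist


def classical_mds(dist, dims=2):
    """Embed a finite distance matrix in `dims` dimensions (classical MDS)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dims]  # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


if __name__ == "__main__":
    # Toy network: a chain of five nodes, 0-1-2-3-4.
    adj = np.zeros((5, 5))
    for i in range(4):
        adj[i, i + 1] = adj[i + 1, i] = 1
    print(classical_mds(shortest_path_lengths(adj), dims=2).round(3))
```

For the five-node chain, the first coordinate places the nodes at equal intervals along a line and the second is approximately zero, so topologically close nodes end up near one another. Interpreting a recovered dimension as a social characteristic remains a substantive judgment that the computation itself cannot make.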

Frontiers

The exploration of physical spaces has always been about looking for frontiers, whether in the depths of the ocean or at the edges of the solar system. Spatially embedded social networks also have a number of frontiers that are worth exploring further. The previous discussion has identified at least three frontiers for spatial network science that are just opening up. First, although these sections focus on specific types of networks (people, things, and places) and levels of analysis (micro, meso, and macro), these distinctions are artificial. There is a need to find linkages between networks of different types and scales. For example, the global air transportation network may shape how diseases spread among the world’s cities, but how does this network intersect with meso-level road networks that shape how diseases spread within a region (e.g., Balcan et al., 2009)? Second, geography and econometrics offer a wealth of modeling techniques that facilitate spatial analysis, while network science and mathematics offer a wealth of modeling techniques that facilitate the analysis of networks and graphs. Now there is a need to combine these modeling traditions and to develop new models that can explicitly and simultaneously account for both topographical and topological distances (e.g., Daraganova et al., 2012). Finally, networks can be modeled as mathematical abstractions, but social networks are actually lived in, and their members’ perceptions of the network matter. Likewise, physical space can be modeled as a geometric abstraction, but social spaces are actually lived in, and their occupants’ perceptions of the space matter. Thus, there is a need to find ways to conceptualize and measure distance and space in more socially meaningful ways (e.g., Sailer & McCulloh, 2012; Zook & Brunn, 2006).

Notes

1. This chapter focuses on research that explicitly involves the (theory about the) measurement of both physical distance in geographic space and topological distance in network space. For a more general introduction to place-based networks, see Neal (2013). For more technical introductions to the mathematical foundations of spatial networks, see Blanchard and Volchenkov (2009) and Barthélemy (2011).
2. Interestingly, Moreno’s time at the school overlapped with that of jazz singer Ella Fitzgerald, then 16 years old, who was serving a sentence because she was “ungovernable and will not obey the just and lawful commands of her mother.” She may have been among the runaways Moreno was hired to investigate. Russ Immarigeon, “The Ungovernable Ella Fitzgerald,” http://www.prisonpublicmemory.org.
3. The obvious exception is the overpass or underpass, where roads intersect but on different planes, and thus where the intersection does not constitute a node.

References

Adams, J., Faust, K., & Lovasi, G. S. (2012). Capturing context: Integrating spatial and social network analyses. Social Networks, 34, 1–5.
Ahuja, R. K., Orlin, J. B., Pallottino, S., & Scutellà, M. G. (2002). Minimum time and minimum cost-path problems in street networks with periodic traffic lights. Transportation Science, 36, 326–336.
Albert, R., Jeong, H., & Barabási, A.-L. (2000). Error and attack tolerance of complex networks. Nature, 406, 378–382.
Alderson, A. S., & Beckfield, J. (2004). Power and position in the world city network. American Journal of Sociology, 109, 811–851.
Balcan, D., Colizza, V., Gonçalves, B., Hu, H., Ramasco, J. J., & Vespignani, A. (2009). Multiscale mobility networks and the spatial spreading of infectious diseases. Proceedings of the National Academy of Sciences, 106, 21484–21489.
Barthélemy, M. (2011). Spatial networks. Physics Reports, 499, 1–101.
Bejan, A. (1996). Street network theory of organization in nature. Journal of Advanced Transportation, 30, 85–107.
Blanchard, P., & Volchenkov, D. (2009). Mathematical analysis of urban spatial networks. Berlin, Germany: Springer.
Boccaletti, S., Bianconi, G., Criado, R., del Genio, C. I., Gómez-Gardeñes, J., Romance, M., . . . Zanin, M. (2014). The structure and dynamics of multilayer networks. Physics Reports, 544, 1–122.

Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323, 892–895.
Butts, C. T., Acton, R. M., Hipp, J. R., & Nagle, N. N. (2012). Geographical variability and network structure. Social Networks, 34, 82–100.
Castells, M. (1996). The rise of the network society. Cambridge, MA: Blackwell.
Changizi, M. A., & Destefano, M. (2010). Common scaling laws for city highway systems and the mammalian neocortex. Complexity, 15, 11–18.
Chen, B. Y., Lam, W. H. K., Sumalee, A., Li, Q., Shao, H., & Fang, Z. (2013). Finding reliable short paths in road networks under uncertainty. Networks and Spatial Economics, 13, 123–148.
Colizza, V., Barrat, A., Barthélemy, M., & Vespignani, A. (2006). The role of the airline transportation network in the prediction and predictability of global epidemics. Proceedings of the National Academy of Sciences, 103, 2015–2020.
Daraganova, G., Pattison, P., Koskinen, J., Mitchell, B., Bill, A., Watts, M., & Baum, S. (2012). Networks and geography: Modeling community network structures as the outcome of both spatial and network processes. Social Networks, 34, 6–17.
Festinger, L., Schachter, S., & Back, K. (1950). Social pressures in informal groups. New York, NY: Harper & Brothers.
Friedmann, J. (1986). The world city hypothesis. Development and Change, 17, 69–83.
Grannis, R. (2009). From the ground up: Translating geography into community through neighbor networks. Princeton, NJ: Princeton University Press.
Hipp, J. R., Faris, R. W., & Boessen, A. (2012). Measuring “neighborhood”: Constructing network neighborhoods. Social Networks, 34, 128–140.
Hipp, J. R., & Perrin, A. J. (2009). The simultaneous effect of social distance and physical distance on the formation of neighborhood ties. City and Community, 8, 5–25.
Hoff, P. D., Raftery, A. E., & Handcock, M. S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97, 1090–1098.
Jenelius, E. (2009). Network structure and travel patterns: Explaining the geographical disparities of road network vulnerability. Journal of Transport Geography, 17, 234–244.
Jiang, B. (2007). A topological pattern of urban street networks: Universality and peculiarity. Physica A, 384, 647–655.
Kaluza, P., Kölzsch, A., Gastner, M. T., & Blasius, B. (2010). The complex network of global cargo ship movements. Journal of the Royal Society Interface, 7, 1093–1103.
Khan, K., Arino, J., Hu, W., Raposo, P., Sears, J., Calderon, F., . . . Gardam, M. (2009). Spread of a novel influenza A (H1N1) virus via global airline transportation. New England Journal of Medicine, 361, 212–214.
Khan, K., McNabb, S. J. N., Memish, Z. A., Eckhardt, R., Hu, W., Kossowsky, D., . . . Brownstein, J. S. (2012). Infectious disease surveillance and modelling across geographic frontiers and scientific specialties. Lancet, 12, 222–230.
Masucci, A. P., Smith, D., Crooks, A., & Batty, M. (2009). Random planar graphs and the London street network. European Physics Journal B, 71, 259–271.
McKenzie, R. D. (1927). The concept of dominance and world-organization. American Journal of Sociology, 33, 28–42.
McPherson, M. (2004). A Blau space primer: Prolegomenon to an ecology of affiliation. Industrial and Corporate Change, 13, 263–280.
Moreno, J. L. (1934). Who shall survive? A new approach to the problem of human interrelations. Washington, DC: Nervous and Mental Disease Publishing Co.

Neal, Z. P. (2011). From central places to network bases: A transition in the US urban hierarchy, 1900–2000. City and Community, 10, 49–74.
Neal, Z. P. (2013). The connected city: How networks are shaping the modern metropolis. New York, NY: Routledge.
Neal, Z. P. (2014). The devil is in the details: Differences in air traffic networks by scale, species, and season. Social Networks, 38, 63–73.
Neal, Z. P. (2015). Making big communities small: Using network science to understand the ecological and behavioral requirements for community social capital. American Journal of Community Psychology, 55, 369–380.
Neal, Z. P. (2017). Well connected compared to what? Rethinking frames of reference in world city network research. Environment and Planning A, 49, 2859–2877.
Neal, Z. P., & Neal, J. W. (2014). The (in)compatibility of diversity and sense of community. American Journal of Community Psychology, 53, 1–12.
Porta, S., Crucitti, P., & Latora, V. (2006). The network analysis of urban streets: A dual approach. Physica A, 369, 853–866.
Preciado, P., Snijders, T. A. B., Burk, W. J., Stattin, H., & Kerr, M. (2012). Does proximity matter? Distance dependence of adolescent friendships. Social Networks, 34, 18–31.
Sailer, K., & McCulloh, I. (2012). Social networks and spatial configurations: How office layouts drive social interaction. Social Networks, 34, 47–58.
Samaniego, H., & Moses, M. E. (2008). Cities as organisms: Allometric scaling of urban road networks. Journal of Transport and Land Use, 1, 21–39.
Scott, D. M., Novak, D. C., Aultman-Hall, L., & Guo, F. (2006). Network robustness index: A new method for identifying critical links and evaluating the performance of transportation networks. Journal of Transport Geography, 14, 215–227.
Smith, D. A., & Timberlake, M. T. (1995). Cities in global matrices: Toward mapping the world-system’s city system. In P. L. Knox & P. J. Taylor (Eds.), World cities in a world-system (pp. 79–97). New York, NY: Cambridge University Press.
Taylor, P. J. (2001). Specification of the world city network. Geographical Analysis, 33, 181–194.
Taylor, P. J., & Derudder, B. (2016). World city network: A global urban analysis. New York, NY: Routledge.
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46, 234–240.
Wang, P., Robbins, G., Pattison, P., & Lazega, E. (2013). Exponential random graph models for multilevel networks. Social Networks, 35, 96–115.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393, 440–442.
Wellman, B. (1979). The community question: The intimate network of East Yorkers. American Journal of Sociology, 84, 1201–1231.
West, G. B., Brown, J. H., & Enquist, B. J. (1997). A general model for the origin of allometric scaling laws in biology. Science, 276, 122–126.
Wolman, S. (1937). Sociometric planning of a new community. Sociometry, 1, 220–254.
Zhu, A. D., Ma, H., Xiao, X., Luo, S., Tang, Y., & Zhou, S. (2013). Shortest path and distance queries on road networks: Towards bridging theory and practice. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (pp. 857–868). New York, NY: Association for Computing Machinery.
Zook, M. A., & Brunn, S. D. (2006). From podes to antipodes: Positionalities and global airline geographies. Annals of the Association of American Geographers, 96, 471–490.

Chapter 20

Field Experiments of Preferential Attachment

Arnout van de Rijt and Afife Idil Akin

What has come to be known as the “preferential attachment” hypothesis is that nodes with many ties increase their connectivity faster than nodes with few ties (Albert & Barabási, 2002; Barabási & Albert, 1999). Preferential attachment provides an explanation for extremity in the distribution of ties across nodes, as has been observed in many networks (Newman, 2005): a few nodes are tied to many others, while most nodes are tied to few others. In this explanation, differentiation occurs even if nodes do not have attributes that render them a priori more popular such as quality, talent, gender, race, class, status, or price. Extreme variation in connectivity emerges simply because of feedback (more ties lead to even more ties) and aging (older nodes have had more time to become better connected).

The preferential attachment hypothesis became popular around the turn of the 21st century with the emergence of complex systems as a new interdisciplinary field of scientific inquiry, but was preceded by many similar hypotheses in earlier literatures. A prominent target of sociological inquiry is “cumulative advantage” (Allison & Stewart, 1974; Allison, Long, & Krauze, 1982; Cole & Cole, 1973; Faia, 1975; Merton, 1988; Powell et al., 2005; Reskin, 1977; Salganik, Dodds, & Watts, 2006; Willson, Shuey, & Elder, 2007), defined as “any temporal process in which a favorable relative position becomes a resource that produces further gain” (DiPrete & Eirich, 2006), and the related “Matthew effect” (Merton, 1968), whereby inequality grows with time. Work on cumulative advantage spilled over into bibliometry, where Price formulated the “cumulative advantage distribution” of citations (Price, 1976) and, later, Huber empirically investigated cumulative advantage in bibliometric time patterns by looking for trends of acceleration in scientists’ and inventors’ rates of production (Huber, 1998, 2002; Huber & Wagner-Döbler, 2001).
Elsewhere, positive feedback in resource accumulation is also referred to as “rich-get-richer processes” or “success breeds success” (van de Rijt et al., 2014). Barabási & Albert’s account of scale-free distributions of connectivity as the result of a preferential attachment process is grounded in earlier contributions by Yule (1925), Simon (1955), and others on the relationship between feedback processes and skewed distributions.

Observational studies have identified longitudinal patterns of link accumulation consistent with preferential attachment in network data from a wide range of contexts such as science citation networks, academic coauthorship, online content popularity, and career success (e.g., Jeong, Néda, & Barabási, 2003; Newman, 2001; Pham, Sheridan, & Shimodaira, 2015; Perc, 2014; Petersen et al., 2011; Ratkiewicz et al., 2010). A key challenge in evidencing preferential attachment is that confounding temporal processes whose functional forms are unknown may generate longitudinal patterns that are reminiscent of but not actually produced by preferential attachment. For example, scientific careers may naturally exhibit increases in productivity resulting from experience and career-related changes in the environment and available resources, falsely suggesting a success-breeds-success mechanism. Moreover, unobserved node-specific qualities may produce spurious correlations between past and future attachment that are easily mistaken for preferential attachment. For example, we just cited a number of very well-cited studies in the literature on preferential attachment, but did we do that because many others cite them, or because they stand out in quality and importance? And did the increasing speed at which these studies accumulated citations stem from positive feedback or from a delay in recognition, or perhaps also from overall growth of this field of inquiry?
An answer to such questions requires data on counterfactuals such as “Would the studies have been cited as frequently today had they not initially gained many citations?” Such counterfactuals are not available in historical records.

To overcome this obstacle to evidencing preferential attachment, we developed a field experimental strategy that we applied to a range of network contexts. We later discovered a reference in Salganik and Watts (2008) to a study by Hanson and Putler (1996), who had apparently used this same strategy many years earlier to reveal positive feedback in the downloading of freeware. In this strategy the experimenter acts as a node bestowing links upon random other nodes to see if these largesses result in an uptick in connectivity vis-à-vis the control group of nonrecipients. Apart from delivering the necessary counterfactual for causal inference, the in vivo nature of the intervention provides external validity often lacking in laboratory experiments. The applications of this strategy in four real-world domains can be seen in the first four rows of Table 20.1 (see Note 1).

In our first application of this experimental strategy we selected crowdfunding projects on http://www.kickstarter.com that had not yet raised any money five days after launch. We matched pairs of projects with the same funding goal and made a first donation of 1% of the funding goal to one (experimental condition), while withholding funding from the other (control condition). With donors and projects as nodes and donations as directed ties, the preferential attachment hypothesis predicts greater numbers of donations from third parties by the funding deadline in the experimental condition.

A second area of application is the consumer review site http://www.epinions.com, where users post product reviews that are rated for their usefulness by others.
We selected newly posted unrated reviews that we found useful and gave a random subset of reviews a positive rating (experimental condition) while withholding a rating from the rest (control condition). Preferential attachment would predict more subsequent positive ratings (links) given to reviews (nodes) by third parties in the experimental condition.

Our third application involved the peer-produced encyclopedia http://www.wikipedia.org, where editors reward one another’s volunteer efforts by placing virtual editing awards called “barnstars” on each other’s user pages. We selected random editors from among the 1% most productive editors, as measured by their total number of edits during the past three months, and awarded them a barnstar. The preferential attachment prediction is that editors (nodes) in the experimental condition receive more barnstars (links) from third parties.

The final area where we have previously tested for preferential attachment through field experimental intervention is the petitioning website http://www.change.org. We selected recently posted petitions that had accumulated under 15 signatures, that had not been signed in the previous 24 hours, and that we were comfortable supporting. Our research team added a dozen signatures to a random half of these selected petitions (experimental condition) and withheld signatures from the remaining petitions (control condition). The preferential attachment hypothesis predicts that petitions (nodes) in the experimental condition will accumulate more signatures (links) from third parties during the following days.

Table 20.1  Applications of the Proposed Experimental Design for Testing Preferential Attachment

Experimental Application      Node                    Incoming Link
http://www.kickstarter.com    Crowdfunding project    Financial contribution
http://www.epinions.com       Product review          Positive rating
http://www.wikipedia.org      Editor                  Editing award
http://www.change.org         Petition                Signature
http://www.ebay.com           Item for sale           Eyeball

[Figure 20.1: line plot of incoming ties (vertical axis, normalized from 0 to 1) against time (horizontal axis, normalized from 0 to 1), with separate curves for funding, ratings, awards, and signatures in the experiment and control conditions.]

Figure 20.1  Accumulation of four types of incoming ties in four field experiments. The curves represent running numbers of donations (blue), positive ratings (red), awards (yellow), and campaign signatures (green) in the experimental condition (solid lines) and the control condition (dashed lines). The horizontal axis is normalized so that 0 marks the time of experimental intervention and 1 marks the end of the observation period. The vertical axis is normalized so that for each system a value of 1 equals the maximum across time and conditions. Full color figures available on Oxford Handbooks Online.

Figure 20.1, adapted from van de Rijt et al. (2014), shows the accumulation of links by application domain and condition over the course of the observation period. In each domain, nodes in the experimental condition accumulated on average more links than in the control condition, providing experimentally controlled and contextually broad support for the preferential attachment hypothesis.
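The logic of the design, in which random bestowal of an initial link supplies the missing counterfactual, can be illustrated with a toy simulation. Everything in the sketch is an assumption made for illustration: the function name, the arrival probabilities, the exponent `alpha`, and the uniform distribution of unobserved quality are hypothetical and are not estimated from the experiments reported here.

```python
import random


def mean_later_links(n_nodes, bestowed, alpha=0.5, rounds=30, base=0.05, seed=0):
    """Average number of third-party links gained per node in one condition.

    Assumed toy dynamics: in each round a node gains one link with probability
    quality * base * (1 + k) ** alpha, where k counts every link the node holds
    so far, including links bestowed by the experimenter.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_nodes):
        quality = rng.uniform(0.5, 1.5)  # unobserved node-specific quality
        later = 0
        for _ in range(rounds):
            k = bestowed + later
            if rng.random() < min(1.0, quality * base * (1 + k) ** alpha):
                later += 1
        total += later
    return total / n_nodes


if __name__ == "__main__":
    # With feedback (alpha > 0), the bestowed link raises later accumulation.
    print(mean_later_links(5000, bestowed=0, seed=1),
          mean_later_links(5000, bestowed=1, seed=2))
    # Without feedback (alpha = 0), the two conditions should look alike.
    print(mean_later_links(5000, bestowed=0, alpha=0.0, seed=3),
          mean_later_links(5000, bestowed=1, alpha=0.0, seed=4))
```

Because quality is drawn independently of the treatment, any gap between conditions in this sketch reflects the feedback term alone, which is the inference that random assignment is meant to license in the field.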

A Novel Application: http://www.ebay.com

In addition to the previous four experiments, we report on a novel application of the in vivo design to http://www.ebay.com (see Note 2). eBay is an online auction service that allows users to make bids on a variety of listed items or simply purchase them through a “Buy it now” option. Each listing is designed to serve two functions: one is showing interest for a product by bidding on it, and the other is showing interest by following or “watching” a product. To avoid effects of our treatment on bidding, our study population excludes items that can be bid on, only including listings that can be bought instantly at the listed price. A plausible mechanism producing preferential attachment in this setting is that higher numbers of watchers suggest a more worthwhile item. Before watching an item, users can observe how many other users are already watching.

From our population of “Buy it now” listings, we randomly picked 600 items that had no initial watchers, which we randomly assigned to six conditions. In these six conditions we intended to introduce respectively zero, one, two, three, four, and five initial watchers. Because of a logistical error, the condition with four initial watchers was accidentally assigned five initial watchers. As a result, conditions 0, 1, 2, and 3 each ended up with the intended 100 listings, while condition 4 had 0 listings and condition 5 had 200 listings. In compliance with the website’s terms of use we employed real eBay accounts that belonged to members of our research team. We then tracked the daily number of watchers over the course of a month or until the item was sold, whichever came first. The preferential attachment hypothesis predicts that listings (nodes) in experimental conditions with more initial watchers (links) accumulated greater numbers of additional watchers.

Figure 20.2 shows the number of later watchers as a function of time (days since intervention), by experimental condition. The results in Figure 20.2 are not unambiguously in favor of preferential attachment. While the number of later watchers monotonically increases from one to two to three to five initial watchers, as predicted, the number of later watchers for listings with zero initial watchers lies between the values for three and five initial watchers. Inspection of the data reveals that the averages in Figure 20.2 are heavily driven by a few outliers that accumulate very large numbers of watchers.

[Figure 20.2: line plot of the number of later watchers (vertical axis) against time (horizontal axis), with one curve each for 0, 1, 2, 3, and 5 initial watchers.]

Figure 20.2  Number of watchers over time, by condition.

Figure 20.3 compares the experimental conditions in terms of the percentage of cases with at least one later watcher. Apart from being robust to outliers, the quantity in Figure 20.3 isolates the instantaneous effect of our intervention on the propensity for watchers to join, net of any exacerbating (or counteracting) second-order effects that later watchers may have on the joining of yet later watchers. Figure 20.3 shows that, consistent with the preferential attachment hypothesis, higher numbers of initial watchers are associated with a higher likelihood of attracting later watchers. An exact test finds evidence for differences across experimental conditions at the 95% confidence level (N = 600; p = .024). The only deviation from a monotonic relationship between initial and later watchers in Figure 20.3 is that the percentage for listings with five initial watchers is lower than that for three initial watchers. Indeed, a logistic regression of any later watchers on the number of initial watchers finds no statistically significant positive trend (p = .074). One possibility is that differences between higher categories are smaller. Indeed, in van de Rijt et al. (2014, p. 6937), we identified decreasing marginal effects of initial links on later links, suggesting for the present context a sublinear relationship between initial watchers and the log odds of later watchers. A logistic regression model with the square root of the number of initial watchers yields some support for this interpretation, finding a significant effect of initial watchers on later watchers (p = .047). In sum, our results suggest a concave relationship between the number of existing watchers and the rate at which subsequent watchers join, consistent with a sublinear preferential attachment process.

[Figure 20.3: percentage of listings with any later watchers (vertical axis) by number of initial watchers (horizontal axis: 0, 1, 2, 3, 5).]

Figure 20.3  Evidence for preferential attachment on eBay. Percentage of later watchers by number of initial watchers bestowed through field experimental intervention.

We conclude that across a wide range of theoretical contexts, network growth exhibits preferential attachment. Our field experimental approach was able to establish a causal relationship between early and later links accumulated by nodes in a variety of empirical domains. The feedback effect appears to be stronger for early links than for later ones, with sublinear functional forms providing a better fit to the experimental data. This functional form is of critical importance in the assessment of systemic consequences of preferential attachment. With feedback effects leveling off as ties accumulate, cohorts of nodes may counterintuitively not exhibit Matthew effects (Merton, 1968). Despite the operation of preferential attachment, cohorts may then experience decreases in scale-invariant measures of inequality, as less connected nodes gradually catch up with their better-connected counterparts. A promising line of inquiry suggested by these findings may involve the identification of contextual conditions that moderate the shape of the feedback effect and thus determine whether relative differences in connectivity shrink or grow with time. Perhaps when multiple preferential attachment mechanisms are combined, the joint impact may lead the rate of new ties to be superproportional to the number of past ties, setting in motion a dynamic of ever-growing network inequality.
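Why the shape of the feedback effect matters for relative inequality can be seen in a deterministic sketch. It assumes, purely for illustration, that a node with k ties gains ties at rate k ** alpha and compares two nodes that start with 1 and 10 ties; the functional form, the function name, and the parameter values are hypothetical and are not the models fitted in this chapter.

```python
def connectivity_ratio(k_low, k_high, alpha, steps=200, dt=0.01):
    """Euler-integrate dk/dt = k ** alpha for two nodes; return k_high / k_low."""
    for _ in range(steps):
        k_low += dt * k_low ** alpha
        k_high += dt * k_high ** alpha
    return k_high / k_low


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 1.2):  # sublinear, linear, superlinear feedback
        print(alpha, round(connectivity_ratio(1.0, 10.0, alpha), 2))
```

Under these assumptions the better-connected node keeps gaining ties faster in absolute terms in all three cases, yet the initial tenfold ratio shrinks when alpha is below 1, is preserved when alpha equals 1, and widens when alpha exceeds 1. This is the sense in which preferential attachment can operate without producing a Matthew effect.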

Acknowledgment

We thank James Moody and Ryan Light for all their efforts toward putting together this handbook. We also would like to thank Michael Claffey, Demi Ajao, and Jasmit Walia for support in execution of the experiment. This work was supported by National Science Foundation Grants SES-1340122 and SES-1303522 to the first author.

Notes

1. Our application of this strategy in four real-world domains was approved by the Stony Brook University Human Subjects Committee (ID numbers 373335, 366647, 230771, and 442574) and conducted in compliance with the terms of use of the respective websites.
2. Our field experiment was approved by the Stony Brook University Human Subjects Committee (ID number 663902) and was conducted in compliance with eBay’s terms of use.

References

Albert, R., & Barabási, A. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1), 47–97.
Allison, P. D., Long, J. S., & Krauze, T. K. (1982). Cumulative advantage and inequality in science. American Sociological Review, 47(5), 615–625.
Allison, P. D., & Stewart, J. A. (1974). Productivity differences among scientists: Evidence for accumulative advantage. American Sociological Review, 39(4), 596–606.
Barabási, A., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
Cole, J. R., & Cole, S. (1973). Social stratification in science. Chicago, IL: University of Chicago Press.
Diprete, T. A., & Eirich, G. M. (2006). Cumulative advantage as a mechanism for inequality: A review of theoretical and empirical developments. Annual Review of Sociology, 32(1), 271–297.
Faia, M. A. (1975). Productivity among scientists: A replication and elaboration. American Sociological Review, 40(6), 825–829.
Hanson, W. A., & Putler, D. S. (1996). Hits and misses: Herd behavior and online product popularity. Marketing Letters, 7(4), 297–305.
Huber, J. C. (1998). Cumulative advantage and success-breeds-success: The value of time pattern analysis. Journal of the American Society for Information Science, 49(5), 471–476.
Huber, J. C. (2002). A new model that generates Lotka’s law. Journal of the American Society for Information Science and Technology, 53(3), 209–219.
Huber, J. C., & Wagner-Döbler, R. (2001). Scientific production: A statistical analysis of authors in mathematical logic. Scientometrics, 50(2), 323–337.
Jeong, H., Néda, Z., & Barabási, A. (2003). Measuring preferential attachment in evolving networks. Europhysics Letters, 61(4), 567–570.
Merton, R. K. (1968). The Matthew effect in science. Science, 159, 56–63.
Merton, R. K. (1988). The Matthew effect in science, II: Cumulative advantage and the symbolism of intellectual property. Isis, 79(4), 606–623.
Newman, M. E. (2001). Clustering and preferential attachment in growing networks. Physical Review E, 64(2), 025102.
Newman, M. E. (2005). Power laws, Pareto distributions, and Zipf’s Law. Contemporary Physics, 46(5), 323–351.
Perc, M. (2014). The Matthew effect in empirical data. Journal of the Royal Society Interface, 11(98), 20140378.
Petersen, A. M., Jung, W., Yang, J., & Stanley, H. E. (2011). Quantitative and empirical demonstration of the Matthew effect in a study of career longevity. Proceedings of the National Academy of Sciences, 108(1), 18–23.
Pham, T., Sheridan, P., & Shimodaira, H. (2015). PAFit: A statistical method for measuring preferential attachment in temporal complex networks. PLoS One, 10(9), e0137796. doi:10.1371/journal.pone.0137796
Powell, W., White, D., Koput, K., & Owen-Smith, J. (2005). Network dynamics and field evolution: The growth of interorganizational collaboration in the life sciences. American Journal of Sociology, 110(4), 1132–1205.
Price, D. D. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306.
Ratkiewicz, J., Fortunato, S., Flammini, A., Menczer, F., & Vespignani, A. (2010). Characterizing and modeling the dynamics of online popularity. Physical Review Letters, 105(15), 158701.
Reskin, B. F. (1977). Scientific productivity and the reward structure of science. American Sociological Review, 42(3), 491–504.
Restivo, M., & van de Rijt, A. (2012). Experimental study of informal rewards in peer production. PLoS One, 7(3), e34358. doi:10.1371/journal.pone.0034358
Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311, 854–856.
Salganik, M. J., & Watts, D. J. (2008). Leading the herd astray: An experimental study of self-fulfilling prophecies in an artificial cultural market. Social Psychology Quarterly, 71(4), 338–355.
Simon, H. A. (1955). On a class of skew distribution functions. Biometrika, 42(3/4), 425–440.
van de Rijt, A., Kang, S. M., Restivo, M., & Patil, A. (2014). Field experiments of success-breeds-success dynamics. Proceedings of the National Academy of Sciences, 111(19), 6934–6939.
Willson, A., Shuey, K., & Elder, J. G. (2007). Cumulative advantage processes as mechanisms of inequality in life course health. American Journal of Sociology, 112(6), 1886–1924.
Yule, G. U. (1925). A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S. Philosophical Transactions of the Royal Society of London, Series B, 213, 21–87.
Zipf, G. K. (1935). The psychobiology of language. Boston, MA: Houghton-Mifflin.

Chapter 21

Duality beyond Persons and Groups
Culture and Affiliation

Sophie Mützel and Ronald Breiger

The concept of duality speaks to the core of the relational approach in the social sciences, which understands that essences imply relations. As a general concept, duality vastly enlarges the types of data and the types of phenomena network analysis can examine. Substantively, duality proves fundamental for the analysis of how the social is structured and how sociologists can go about analyzing such structuring. One of Georg Simmel’s (1955) fundamental insights into social structure was to see the intersection of circles,1 that is, to understand a group as the union of individuals who belong to it and, in turn, to understand an individual as an intersection of the groups to which the individual belongs (p. 141). In Simmel’s understanding, individuals and groups are of the same contents, but two different categories. In his article “The Duality of Persons and Groups,” Ronald Breiger (1974) formalizes Simmel’s idea (pp. 181–182):

Consider a set of individuals and a set of groups such that the value of a tie between any two individuals is defined as the number of groups of which they are both members. The value of a tie between any two groups is defined conversely as the number of persons who belong to both of them.

A set of actors thus affiliates with a set of groups and vice versa. Information about these two sets of data can be represented together: persons and the groups they belong to can be expressed as a two-mode incidence matrix or affiliation matrix. In turn, the relations between persons and the groups they belong to can be visually represented as a bipartite graph, in which the two types of nodes are simultaneously shown, connected by their affiliation. The 1974 article on duality shows the formal, mathematical transformation from a two-mode incidence matrix to a one-mode matrix of comembership, or, more generally, of co-occurrences. In this matrix transformation, the original two-mode dataset is transformed into a one-mode weighted matrix of persons connected to other persons by the groups they belong to as well as a second one-mode weighted matrix of groups connected to other groups by the people that constitute them. Or if, considering a visual representation, one graph shows the persons as the nodes, then their joint memberships in groups are the connections between them. And, dually, in a second graph groups are the nodes and the persons they share are the ties between them. Technically, the transformation from a rectangular affiliation matrix thus results in two square comembership matrices. They each yield information that is significantly different than information obtained from interaction data or survey tools (e.g., who talks to whom) on social ties. Comembership ties of persons may suggest a potential for direct social interaction. However, networks derived from transformation using the principle of duality do not show direct interactions. Rather, the ties indicate relations between elements on the basis of additional structural information folded in (e.g., group membership between persons). Indeed, one of the qualities of the concept of duality and its matrix transformation is to allow for an empirical analysis of the structuring of the social world without having information about direct interactions (just “metadata”; Healy, 2013) and without assigning a priori categories of the groupiness of groups or people (Stegbauer, 2019). This technical feature based on theoretical insights and mathematical transformation has presented network analysts with the possibility to work with data sources other than direct interactional data that typically originate from observations or questionnaires.
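The matrix transformation described above can be made concrete with a small worked example. The following is a minimal sketch in plain Python, assuming a hypothetical binary affiliation matrix A with persons as rows and groups as columns. The names, groups, and memberships are invented for illustration, and the helper functions are not taken from any network library. Multiplying A by its transpose gives the person-by-person comembership matrix. Multiplying the transpose by A gives the group-by-group matrix of shared members.

```python
# Hypothetical two-mode (affiliation) data: rows are persons, columns are groups.
persons = ["Ann", "Bob", "Cat"]
groups = ["choir", "union", "club"]
A = [
    [1, 1, 0],  # Ann belongs to choir and union
    [1, 0, 1],  # Bob belongs to choir and club
    [1, 1, 1],  # Cat belongs to all three groups
]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


def matmul(left, right):
    """Plain matrix product of two lists of lists."""
    right_columns = transpose(right)
    return [
        [sum(a * b for a, b in zip(row, column)) for column in right_columns]
        for row in left
    ]


# Person-by-person matrix: entry (i, j) counts the groups shared by persons i and j.
P = matmul(A, transpose(A))
# Group-by-group matrix: entry (g, h) counts the persons shared by groups g and h.
G = matmul(transpose(A), A)

for name, row in zip(persons, P):
    print(name, row)
for name, row in zip(groups, G):
    print(name, row)
```

In this toy example Ann and Cat share two groups, while the union and the club share one member. The diagonal of P gives each person's number of memberships and the diagonal of G gives each group's size. Both square matrices come from the same rectangular matrix, which is the sense in which each one-mode view contains information about the other.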
Data sources range from vast arrays of historical data from archives (Ventresca & Mohr, 2002) to large-N datasets of trace data (Shi et al., 2017), particularly in forms of texts. Moreover, these technical aspects have shown broad usage in formal modeling of what are called two-mode or bimodal or affiliation networks in the sociological social network literature (e.g., Agneessens & Everett, 2013) and bipartite graphs in the “new” science of networks, with a reference to graph theory (e.g., Watts, 2004). Utilizing the formal modeling properties, research from a variety of disciplines has shed light on substantive areas, including the study of taste (e.g., Lizardo, 2006), flavors (e.g., Ahn et al., 2011), social movements (e.g., Bearman & Everett, 1993; Diani & Kousis, 2014), historical change (e.g., Tilly, 1997), scientific fields (e.g., Moody, 2004), organizations (e.g., Mizruchi & Galaskiewicz, 1993), markets (e.g., Brailly et al., 2016), and the global economy (e.g., Hidalgo & Hausmann, 2009), to name only a few.

Substantively, duality can be understood as a structural mechanism that links one type of social structure to another type (e.g., persons and the groups they belong to), effectively linking multiple levels of social structure. These social structures may represent different orders of social phenomena, potentially linking some lower level (e.g., persons) to higher levels (e.g., groups). How persons are linked to each other through their shared membership ties yields of course a different network than how groups are linked to each other via the persons that constitute them. The structural mechanism of duality, formally linking one type of social structure to another, has become a building block of social network analysis. It has been prominently used by literatures on interlocking directorates (Mizruchi, 1996), collaboration networks of teams (Wuchty, Jones, & Uzzi, 2007), terrorist networks (Breiger et al., 2014; Elzinga et al., 2010), the structure of scientific fields (Moody & Light, 2006) and the “‘new’ science of networks” (Watts, 2004) alike.

Considerations of duality also go farther than technical insights into a structural mechanism. Breiger (1974) suggests a duality, in which the structure of the intergroup network (groups are the nodes and shared persons are the ties among groups) and the structure of the interpersonal network (persons are the nodes and their joint memberships in groups are the connections among the persons) are distinct but also mutually constitutive. To be sure, identities of persons as well as identities of groups are understood as fundamentally relational. A person’s identity is not defined by his or her attributes; a group’s identity is not defined in terms of assigned functions. Rather, they co-constitute each other: the structural logic of one type of social structure is revealed to be constitutive of the structural logic of the other and vice versa. Duality in this sense is a “relational device” (Mohr & White, 2008, p. 490) to explore the mutual constitution of social elements.

Moreover, research has also shown that properties of duality extend to linkages among different kinds of social elements. This literature has linked social and cultural elements (e.g., categories and practices, organizations and their interpretation of issues, persons and cultural items, actors and stories). It has thus opened new avenues to map and analyze dual relations (e.g., Mohr, 1998).2 “In each case, the analytic task is to illuminate how one order of social phenomena is relationally linked to a different order of social experience in such a fashion that their structural inter-dependency is mapped” (Mohr & White, 2008, p. 491, emphasis added). Employing this analytic principle of duality has thus allowed moving from studying interrelations among social actors to analyzing the underlying structure of interests, tastes, styles, and categories.

Both the mutual constitutiveness and the possibility to link different kinds of social elements to each other have been fundamental for connecting social network analysis to cultural analysis since the 1990s. The general principle of duality has thus provided substantive insights into the social structuring of the world in addition to the technical possibility of formally modeling dual constitutions. It has been integral to social network analysis over the past 40 years. What is more, it has also been enabling the formal analysis of culture, particularly using texts as data, as well as analyses of large-N datasets.

Our aim in this chapter is not to provide an overview of formal modeling considerations (see, e.g., Borgatti, Everett, & Johnson, 2013; Borgatti & Halgin, 2011; Lee & Martin, 2018; Martin & Lee, 2018; Robins, 2015; Wasserman & Faust, 1994) or a sense of the theoretical possibilities for elaborating the duality concept (Puetz, 2017). Instead, we take the general principle of duality as a lens to group together literatures that have been and will continue to be influential for the analysis of the social.3

The chapter is structured as follows: the next section locates the concept of duality within past and present sociology and points to its limitations. The third section discusses four strands of literature that analyze culture, differentiating between works that use the principle of duality as a mechanism of linkages and as an analytical principle. The third section closes with an argument that research needs to also consider moving “beyond relationality” (Pachucki & Breiger, 2010). The fourth section discusses research on affiliation networks that typically are interested in the structural mechanism that the principle of duality offers. The chapter closes with a discussion of recent developments; it calls for the study of “how culture prods, evokes, and constitutes social networks in ways that may be envisioned and modeled by new analytic methods” (Pachucki & Breiger, 2010, p. 207).


Duality in Past and Present Sociology

Duality is a classic concept in explanations of the social. References to a concept of co-constitution may be traced back at least as far as the philosophical writings of Baruch Spinoza in the 17th century. It has been argued that important aspects of the “classical” sociologies of both Durkheim and Simmel were influenced in part by their distinctive readings of, and references to, Spinoza (Breiger, 2011). The duality concept has also been noted in the work of sociologists Cooley and Goffman, as well as the social anthropologist Nadel (Breiger, 1974).

Limitations of Breiger’s 1974 Formulation

Breiger’s (1974) formal model of Simmel’s intersection of social circles (Kreuzung der sozialen Kreise) conceptualized relationships among actors (albeit actors at two different levels) only in terms of membership relations. Ties among individuals based on the groups to which they both belong, or (conversely) ties among groups based on shared memberships, might usefully be compared or contrasted to social relations among persons or among groups (respectively), but the membership ties are not social relations; the distinction has been well recognized by Whitham (2012). The concept of duality captures and formalizes one central idea of Simmel’s intersecting circles. Others have picked up Simmel’s idea differently and have focused on the social cohesion of a group as a social circle of individuals (Kadushin, 1976) or on the fact that shared attendance at an event can establish a social relation (Feld, 1981). Breiger’s formulation of duality does not make a direct contribution to extending these important concepts.

Scope of Duality Discussion

It is useful to distinguish a number of different representations of insights about duality. One approach is to convert a two-mode dataset to two one-mode datasets, in which each one-mode dataset contains information about the other. Dual projection approaches (Everett & Borgatti, 2013) take the two-mode dataset into account. Especially when working with visual representations, research has been taking multiple or heterogeneous network relations into account (Cambrosio et al., 2006; Powell et al., 2005). Tripartite structural analysis (Fararo & Doreian, 1984) (“nested dualities”) provides an extension of the duality idea to formal models of multiple levels of social structure. A suggestive application (Cornwell, Curry, & Schwirian, 2003) is to the study of community conflict, where actors, issues, and “games” (referring to strategic action within and between sectors of local communities such as land development, finance, and political bureaucracies) are all related to one another in a tripartite analysis within which community conflict is specified as an “ecology of games.” McPherson (1983) developed an “ecology of affiliation,” which extends the duality idea to a multidimensional property space within which the overlapping attributes of members of different types of groups (such as labor unions and fraternal organizations) provide measures of the competition between the groups. The agenda-setting volume of Lazega and Snijders (2016) provides a model (along with applications to the study of institutions) in which agency exists at the level of individuals (such as scientists in laboratories who have social and co-citation relations with others) and at the level of groups (in the form of mobility of scientists among laboratories based on shifting research identities of the labs as well as the laboratories’ prestige rankings); moreover, affiliation ties (such as the affiliation of a scientist with a laboratory) connect these two levels of agency. A recent pair of papers (Lee & Martin, 2018; Martin & Lee, 2018) provides the most complete effort at algebraic formalization and generalization of the duality concept to encompass cultural structures and structural elements of culture, while also building upon principles of duality and those of formal concept analysis.

The substantive focus of this chapter is on two general strands of research that have utilized the duality principle and have contributed greatly to network analysis: studies of culture and of affiliation networks. The next section discusses dualities in the analysis of culture.

Dualities in the Analysis of Culture

In recent years, several overviews have astutely highlighted the fundamental role of network analysis for the analysis of culture (e.g., Breiger & Puetz, 2015; Crossley, 2016; DiMaggio, 2011; McLean, 2017; Mische, 2011; Pachucki & Breiger, 2010; Rule & Bearman, 2016). As DiMaggio (2011, p. 286) points out, “network analysis is the natural methodological framework for empirically developing insights from leading theoretical approaches to cultural analysis.” At the same time, research has also been “showing how culture prods, evokes, and constitutes social networks in ways that may be envisioned and modeled by new analytic methods” (Pachucki & Breiger, 2010, p. 207). Duality serves as a fundamental concept for the empirical analysis of culture from a network perspective. This section provides overviews of the role of duality in the analysis of networks and culture. In closing, it argues for going beyond dualities and toward the fusion of network structures and culture.

Dualities of Artists and Art Worlds

Studies on art worlds and the organizational field of the arts have used the mechanism of the duality of persons and their groups to study patterns of these fields using network analysis and other relational approaches: for example, relations of artists and their galleries (Giuffre, 1999), musicians and their studios (Faulkner, 1986), musicians and the clubs they performed in (Crossley, 2009), writers and their group memberships (Anheier, Gerhards, & Romo, 1995), actors and the films they participated in (Watts, 1999), or musical team members and the musicals they participated in (Uzzi & Spiro, 2005). In addition to serving as a structural mechanism for studying aspects of cultural production, duality has also been a central concept in analyzing other notions of culture, which directly address the limitations of a mere structural notion of duality and offer alternatives. An important development has been the articulation of a duality of social and semantic networks, arising from the comparative study of artistic communities (Basov et al., 2017; Basov et al., 2019).

Duality of Actors and Cultural Forms

One focus considers cultural aspects necessary to sustain relations, identities, and groups. Groups, as we saw earlier, are interpersonal networks, but they also exhibit group styles (Eliasoph & Lichterman, 2003), which provide stylized templates for interpersonal interaction. Cultural and relational aspects of social structure become inseparable as “culture in interaction.” This echoes DiMaggio’s (1992) quest to develop a focus in network analysis that takes both cultural and relational aspects into account. One focus of research has been on the study of a culture and its identity and boundaries, through shared objects, symbols, or expressions of taste. For example, relating consumer goods to households yields an analysis of a culture and its subculture patterning (Schweizer, 1993); ceramics and their location found in archeological settlement excavations provide insights into the culture of groups that once used and shared the objects (Mills et al., 2013). Moreover, shared tastes constitute the identity of social groups (e.g., Lizardo, 2006) and, dually, social groups shape cultural taste (e.g., DiMaggio, 1987).

In this sense, the focus of the dual relation is no longer on persons and the groups they belong to as a mechanism that links one level of social structure to another level. Rather, “duality is also a property that extends to linkages among different kinds of social orders” (Mohr & White 2008, p. 490). The principle of duality thus allows relating cultural entities, including objects, practices, categories, interpretations, styles, and tastes, to social actors to study aspects of a culture and its identity and boundaries. For example, the study of political actors and their ideology yields insights on the political elite structure (Bearman, 1993); the study of illustrated animal species and their occupations provides insights into the culture of the division of labor in a society (Martin, 2000); and bringing styles and fashion houses into a dual relation yields an analysis of the fashion market (Godart, 2018). DiMaggio (2011, p. 290) points to the idea of duality as “the recognition that each mode in a two-mode network constitutes the identity of the other,” which applies to persons and groups as much as to cultural entities and actors that share them. Cultural entities then constitute the actors who share them, and, dually, actors constitute the cultural entities.

Particularly formative for using duality as a linkage among different kinds of social levels in cultural analysis are the works of John Mohr. In a series of articles on the US social welfare system at the turn of the 20th century, Mohr shows how practices and organizational identities are dual and can be studied to capture “institutional logics” (Friedland & Alford, 1991). Using directory data from social welfare charities, Mohr (1994) brings status identities of people described as in need of relief (e.g., “blind men”) and categories of relief services by a particular organization (e.g., “given asylum by state-supported non-profit organization”) into a dual relation. Mohr then turns to blockmodel analysis to show the patterns of moral order. Mohr and Duquenne (1997) show how the larger categories organizations use for people in need of relief (e.g., “deserving”) and the relief practices they employ in attending to the needy (e.g., “give food”) are also dually related and can be studied as such. Using Galois lattices4 to capture the mutual constitutiveness of categories and practices, they can show shifts in welfare discourse over the period of several decades. In another article, Mohr and Guerra-Pearson (2010) study the institutional space of welfare organizations, using relief practices, status identities, social problems, and organizational types to show the duality of organizational niche and organizational form. In yet another series of articles, focusing on the duality of categories and practices, Mohr and coauthors study changes in the institutional logic of affirmative action (Breiger & Mohr, 2004; Mohr & Lee, 2000).5

Building on these works on practices and identities, Breiger (2000) studies the dual relation between US Supreme Court justices authoring special opinions and their issue areas using both correspondence analysis and Galois lattices comparatively. Mische and Pattison (2000) analyze how Brazilian youth activists representing different ideological perspectives were linked to social movement organizations and to political events (also Mische, 2008). This use of duality as an analytic principle linking together different kinds of social levels inquires into the underlying logics of particular settings and salient identities, and how they change (Mohr, 1998). To be sure, co-constitution is also possible among nonhuman actors, as categories, objects, practices, and skills may also be brought into relations with each other to understand underlying identities and dynamics.
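For readers unfamiliar with the Galois (concept) lattices mentioned above, a minimal sketch may help show how such a lattice pairs sets of categories with sets of practices. The data below are entirely hypothetical and are not drawn from the studies cited here. The labels `category_a`, `food`, and so on, as well as the helper functions, are illustrative assumptions. Each element of the lattice is a pair in which the practices are exactly those shared by all the listed categories, and the categories are exactly those to which all the listed practices apply.

```python
from itertools import combinations

# Hypothetical two-mode data: which relief practices are applied to which categories.
context = {
    "category_a": {"food", "shelter"},
    "category_b": {"food", "money"},
    "category_c": {"shelter"},
}
all_practices = set().union(*context.values())


def shared_practices(categories):
    """Practices applied to every category in the set (all practices if the set is empty)."""
    shared = set(all_practices)
    for category in categories:
        shared &= context[category]
    return shared


def categories_receiving(practices):
    """Categories to which every practice in the set is applied."""
    return {c for c, applied in context.items() if practices <= applied}


# Brute-force enumeration, adequate only for very small examples:
# close every subset of categories into a (categories, practices) pair.
concepts = set()
for size in range(len(context) + 1):
    for subset in combinations(sorted(context), size):
        practices = shared_practices(subset)
        categories = categories_receiving(practices)
        concepts.add((frozenset(categories), frozenset(practices)))

for categories, practices in sorted(concepts, key=lambda pair: len(pair[0])):
    print(sorted(categories), sorted(practices))
```

In this toy example the practice `shelter` groups `category_a` with `category_c`, while `food` groups `category_a` with `category_b`. Each pair is defined by both modes at once, which is the dual reading of categories and practices that the lattice makes visible.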

Dualities of Networks and Meaning

Mohr’s work also points to a third focus of research using duality as an analytic principle when studying culture: it allows tapping into the making of meaning. The study of meaning and meaning making is central for the analysis of culture. Formal ways to measure meaning and to model meaning making have aided this quest (Mohr, 1998). As indicated earlier, research of the past decades has urged network analysis to take both cultural and relational aspects into account. In particular, interactionists have pointed to the role of meaning within networks (Fine & Kleinman, 1983); others have argued that structuralist analyses were already providing cultural interpretations without acknowledging it (Brint, 1992); yet again others argued for the role of discourse as foundational for networks (Emirbayer & Goodwin, 1994; Somers, 1994).

White’s Identity and Control (1992) provides a fundamental shift in network analytic thinking toward taking the role of meaning and discourse into account. White uses the notions of identity and control to describe how actors in interaction with others find their social footing and are able to move between social positions. To establish their social footing, actors use stories to which they themselves and others attribute meaning. These efforts to establish social footing occur in network domains (netdoms) that are realms of interaction characterized by bundles of relations and associated sets of stories, entangled with each other. In netdoms, structural and cultural dimensions of relations coalesce. White’s suggestions continue to be influential in interrelating network and cultural analysis, what has been called “relational sociology” (e.g., Mische, 2011; Mützel, 2009; Mützel & Fuhse, 2010). Two prominent strands of research are, one, on the role of meaning in social networks and, two, on how to study meaning making, with a particular focus on the role of stories as the ties in networks.

For one, White’s approach to reconsider the duality of structure and culture, networks and meaning has spurred further considerations on how to think about social relationships and their ties. Interactionists point to the fluidity of both social relationships and the identities of persons making up the relationship (Ikegami, 2005; Salvini, 2010). Others indicate that social relationships can be understood as “intersubjective constructs of expectations and cultural forms” (Fuhse, 2009, p. 52) between dyadic sets of actors and depend on the attribution of meaning. Moreover, the meaning structure of relationships also translates into the “network culture” overall. The meaning of ties, it turns out, is important for understanding the cultural form of a social network (Fuhse, 2009). McLean (1998) shows the importance of the meaning of ties in a network of political actors when bringing the relations of letter writers to their addressees with the types of requests into a dual analysis, using multidimensional scaling. In a study on relationships in communes, Yeung (2005), using Galois lattices, illustrates that meanings of relationships depend on social ties and the network culture.

Another strand of work extracts meaning from bimodal networks of cultural elements. Mohr’s work on categories and practices was pioneering to get at the meaning of institutional settings. Others use texts as data in similar fashion, to uncover networks of meaning that emerge from relations among discursive elements. Mützel (2002) analyzes the market of competing German newspapers, their editorial and stylistic decisions on political reporting, and their evaluative principles during the period when the German political capital city relocated from Bonn to a unified Berlin. Using optimal matching and blockmodel analyses on large corpora of texts, the study shows newspapers competing economically and in meaning making, while also collaboratively creating a new German political narrative. Kennedy (2005) also studies a market. His work examines the nascent market of the new product “computer workstation” by looking at large corpora of media reports on workstation firms and their co-mentioned competitors. The analysis focuses on associations between companies, which can be found as co-occurrences in media reports as well as in companies’ own press releases. These discursive associations “construct a macro-sociocognitive structure” that dually “embodies category meaning and reflects its social order” (p. 215). In turn, the new label “workstation” can become a legitimate product category.

Indeed, the study of categories and in particular of category emergence over time has been a prominent application using the principle of duality to unearth networks of meaning among cultural elements (for recent overviews on category research see Durand & Khaire, 2017; Negro, Kocak, & Hsu, 2010; Vergne & Wry, 2014). Jones et al. (2012) study the formation of a new category, the new architectural style “modern architecture,” by relating information gathered from texts on architectural logics and material features used in buildings. Analyzing networks of concept and term co-occurrences in architectural literature, this work shows formation and also contestation over the new category. Typically working with textual data, this line of work is especially attuned to the role of intermediaries or third-party evaluators, such as media analysts (e.g., Kennedy, 2008) or restaurant reviewers (e.g., Rao, Monin, & Durand, 2005), in the emergence of new categories of meaning.

From Relationality to “Fusion” of Networks and Culture

While seminal works view the relation of structure and culture as “interdependent though autonomous” (Emirbayer & Goodwin, 1994) and research has predominantly shown effects of network structure on culture and, much less so, of culture on network structure, current works suggest possibilities for understanding structure and culture as “intertwined and interdependent sets of social formations” (Godart & White, 2010, p. 581). The analytical quest is to move beyond relationality to a fusion of networks and culture. This could allow the bridging of “cultural holes” that currently permeate research both on networks and on culture, thus gaining analytical rewards (Pachucki & Breiger, 2010). In this move, the concept of duality and Harrison White’s works are central features. Looking at White’s work of the past decades, it becomes evident that some of his main ideas relate to duality. For example, vacancy chains can be understood as the duality of vacancies and job categories; markets and their upstream-downstream relations of producers, buyers, and sellers can be seen as the duality of decoupling and embeddedness (Breiger, 2005, p. 885). In Identity and Control, White begins to rework the idea of the duality of structure and culture (see note 6). Instead of “interdependent though autonomous,” structure and culture, or social structure and stories, become “second-order processes that need to be accounted for by the dynamics of identity and control among network domains” (Pachucki & Breiger, 2010, p. 208). In this sense White’s notions of identity and control are “a deeper set of forces” that play out generating both structure (social networks) and culture (local rules, practices, stories) (Breiger, 2010, p. 38). Culture is not shared; rather, analysts need to take into account switchings of identities across netdoms, which themselves are temporary settings in which relations and stories are entangled. Identities emerge rather than remain fixed in networks. Networks are “multiple, cross-cutting sets of relations sustained by conversational dynamics, shared story-lines and shifting definitions of social settings” (Mische, 2003, p. 258). Meaning, as the previous section has already indicated, emerges from the structure of relations among a plurality of cultural elements.
There is no primary causal effect between network structure and culture, but rather both are intertwined and entangled with each other: “stories describe the ties in networks” (White, 1992, p. 65) and also “a social network is a network of meanings” (p. 67). Pachucki and Breiger (2010) review empirical works that pick up this quest for the intermingled nature of culture and network structure. They also suggest four analytical avenues into how “cultural holes” (i.e., the cultural contingencies that enable network structures) can be analyzed to avoid prioritizing culture over structure or vice versa (pp. 215–218). The idea is to focus further on how network structure and cultural content are clustered, that is, on where and why there are holes. For example, Lizardo (2014) develops a strategy to measure cultural holes and identifies omnivorous cultural taste as a way to bridge cultural holes. Goldberg (2011) shows the spanning of cultural holes and overlaps in culture, not on the basis of homophily in attitudes per se, but rather on the basis of homophily in the relational structure among attitudes. Again, others show the central role of ambiguity and multivocality in the making of categories that may serve to bridge cultural holes (Mützel, 2010).

Dualities in the Analysis of Structures: Affiliation Networks

The principle of duality also lies at the core of the study of affiliation networks that have a structural two-mode setup to explain particular social outcomes. Typically, studies on affiliation networks work with large-N datasets and focus on structural explanations, without further interest in cultural explanations or the meaning for the actors involved. Affiliation networks come under many rubrics in different empirical fields. Moreover, fundamental work has developed in several distinct centers of research, which are discussed below.

Affiliation Networks

In the past decades, the general concept of duality of persons and groups has been applied to different empirical fields. Without formalization, the dual relationship of persons and groups has been used in studies on elite group formation (Laumann & Pappi, 1976). It has also been instrumental in developing research on interlocking directorates, which occur “when a person affiliated with one organization sits on the board of directors of another organization” (Mizruchi, 1996, p. 271). Interlocking directorates thus present a classic application of the duality of persons and groups. Interlocking directorates have been shown to serve as communication systems, to affect control and influence (e.g., Mintz & Schwartz, 1985; Mizruchi, 1992; Useem, 1984), as well as to increase the imitation of corporate practices (e.g., Ahuja, 2000; Galaskiewicz & Burt, 1991). Interlocking directorates have also shown structural features of entire politico-economic systems (e.g., Stokman, Ziegler, & Scott, 1985; Windolf, 2002). Substantively, the idea of interlocks has been criticized because the structural mechanism assumes relationships between members of a group, while the potential complexity of relations gets reduced to simple and, by construction, symmetrical relations (see note 7). Other examples of affiliation networks prominently include research on knowledge production focusing on the structure of collaboration. Using the dual structure between authors and their publications, collaboration networks between authors can be constructed and analyzed to gain insights on the structure of scientific fields (Moody, 2004; Moody & Light, 2006). Similarly, using authors, their publications, and their patents as data, others have focused on the structure of collaboration in the form of teams in different scientific fields (Wuchty et al., 2007).

Revamping Old Ideas? “New” Science of Networks

Another center of research using affiliation networks is the “‘new’ science of networks” (Watts, 2004) that captured physicists’ interest in social networks at the beginning of the 2000s. This line of research uses large-N datasets, is strong in theoretical modeling, and does not necessarily relate to the sociological ancestry of the concept of duality and its formalization. Moreover, with references to graph theory instead of sociological theory, this line of research now refers to two-mode networks of persons and groups as “bipartite affiliation networks.” A single-mode projection of persons connected by the groups they belong to is called an “actor affiliation network” and, analogously, the single-mode projection of groups connected by the people that share membership is the “group interlock network” (Watts, 2004, pp. 248–249). Watts and Strogatz (1998) use, as one of three empirical examples, a network of actors and the movies they have appeared in, to show the existence of small worlds. Newman (2001) studies the structure of scientific collaboration using coauthorship measured by authors and the publications they cowrote as a proxy for “human acquaintances.” Watts, Dodds, and Newman (2002) recognize the role social groups play in network structures when they model search processes in large-scale networks. They suggest a model of generalized affiliation networks “in which distance between groups is defined according to some number of social dimensions (e.g., geography and occupation), and individuals are characterized by the coordinates of the groups to which they belong” (Watts, 2004, p. 248). Others have followed this interest and have studied bipartite affiliation networks beyond persons and groups that are wide-ranging in empirical substance (see Barabási, 2016, for a current summary). For example, to quantify the economic complexities of nations and thus to contribute to economic development theory, Hidalgo and coauthors (Hidalgo & Hausmann, 2009; Hidalgo et al., 2007) have studied affiliation networks in which countries are connected to each other by the types of products they export. Others have examined global culinary practices, using affiliation networks of flavor compounds and cooking ingredients, to map flavor networks (Ahn et al., 2011).
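The two single-mode projections named in this section can be illustrated with a small sketch. The persons, groups, and memberships below are invented for illustration, and the helper names `transpose` and `shared_counts` are hypothetical. The sketch multiplies the two-mode incidence matrix by its own transpose, which is one common way to obtain the projections, not necessarily the procedure used in any of the studies cited.

```python
# Hypothetical two-mode data: rows are persons, columns are groups.
persons = ["Ana", "Ben", "Cem", "Dee"]
groups = ["GroupX", "GroupY", "GroupZ"]
affiliation = [
    [1, 1, 0],  # Ana belongs to GroupX and GroupY
    [1, 0, 0],  # Ben belongs to GroupX
    [0, 1, 1],  # Cem belongs to GroupY and GroupZ
    [0, 0, 1],  # Dee belongs to GroupZ
]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def shared_counts(matrix):
    """Multiply a matrix by its transpose.

    Cell (i, j) counts the columns in which rows i and j both have a 1;
    the diagonal counts each row's total number of affiliations.
    """
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in matrix]
        for row_i in matrix
    ]


# Persons connected by the groups they share ("actor affiliation network").
actor_affiliation = shared_counts(affiliation)

# Groups connected by the persons they share ("group interlock network").
group_interlock = shared_counts(transpose(affiliation))

for name, row in zip(persons, actor_affiliation):
    print(name, row)
for name, row in zip(groups, group_interlock):
    print(name, row)
```

Both projections are symmetric by construction, because a shared membership carries no direction. This is the property behind the criticism of interlock studies noted earlier in the chapter.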

Actor-Network Theory and “Heterogeneous Networks”

Recent works within the larger perspective of actor-network theory, with its tradition of the study of science and technology and in particular taking nonhuman entities into account, have similarly started working with affiliation networks (Mützel, 2009). This presents yet another center of research on using affiliation networks. Crucial in this line of work is the move toward network visualizations of what is here called “heterogeneous relational data” (Cambrosio et al., 2006, p. 3141; see also Cambrosio et al., 2013), namely two-mode data. Based on text-analytic and network-analytic measures on the relations between workshops and institutions (Cambrosio, Keating, & Mogoutov, 2004), authors and their area of research (Bourret et al., 2006), or journals and research areas (Cambrosio et al., 2006), network maps represent entire fields of science. Latour et al. (2012) similarly use the power of visualization of affiliation networks connecting keywords, authors, and institutions to argue that this collapsing of levels (into one visual representation) and possibly ensuing reconfigurations of perspectives is what Gabriel Tarde meant by his notion of monad. “A monad is not a part of a whole, but a point of view on all the other entities taken severally and not as a totality” (p. 598). Indeed, bipartite networks allow for such a point of view. Others have been working on building bridges between the literatures on affiliation networks, duality, and heterogeneous networks: Roth’s work (Roth, 2013; Roth & Cointet, 2010) shows multilevel dynamics between social and semantic structures.

Recent Developments and Future Directions

This concluding section discusses two noteworthy developments using and extending the general principle of duality (see note 8). One concerns the general principle of duality and its extensions beyond the two-mode to multiple levels of affiliation. The other relates to developments in semantic network analyses, which use the duality of documents and words to show shifts and drifts in trajectories of political or scientific discourse over time.

Duality and Its Extensions toward Multiple Networks

In the past decade, the concept of duality has been extended beyond the two-mode to multiple levels of affiliation. After all, social life happens in and across multiple networks. Organizational research has long been aware of multiple relations and their interplay between interpersonal, interunit, and interorganizational ties (e.g., Brass et al., 2004). Yet only recently has research zoomed in on the linkages and interdependencies between these different levels. Padgett and Powell (2012) propose to consider the multilayeredness of social life and its dynamics to explain how newness emerges. In particular, they focus on multiple network domains and study large-scale, long-term processes. On the basis of a diverse set of case studies, Padgett and Powell argue that organizational novelty comes from transpositions and recombinations of and across multiple network domains. An organization is the reproduction and recombination of persons and rules, which actors from different networks bring along. In turn, people are collections and results of prior networks and their interaction rules. “In other words, both organizations and people are shaped, through network co-evolution, by the history of each flowing through the other” (Padgett, 2012, p. 171). Lazega et al. (2008) suggest a multilevel network analysis via linked design to examine a “dual positioning” (p. 161) of actors in a system of multiple interdependencies. This approach first analyzes the complete networks of interpersonal and interorganizational ties separately. Then it links these two networks to each other on the basis of information about the membership of each individual in the interpersonal network in one of the organizations in the interorganizational network. In the empirical study, French cancer labs have ties to each other via, for example, the joint usage of equipment and recruitment of researchers, which come to form a network of interorganizational ties.
Scientists also have interpersonal ties with other scientists in seeking advice (p. 163). Providing a link between the two sets of networks, each scientist is affiliated with a cancer lab. Bringing information about interpersonal and interorganizational networks together in one research design underscores the co-constitution of social positioning. More recently, the basic idea of duality has also been incorporated within the powerful exponential random graph model (ERGM) family of statistical models for networks, which now extends to multilevel ERGMs (see Wang et al., 2013, 2016, for the analytical framework of multilevel ERGMs). For example, Zappa and Lomi (2016) use such a framework to study the interdependencies between formal and informal relations within organizations, which are shaped by the joint membership of persons in subunits and the presence of relations between subunits (p. 337), that is, by the intersection of interpersonal ties and organizational subunits. The study examines how both social and structural conditions affect the likelihood that network ties crosscut formal organizational boundaries. It shows that informal interpersonal ties are sustained and shaped by hierarchical relations linking subunits in which organizational participants are located.
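The logic of linking an interpersonal network to an interorganizational network through membership can be sketched in a few lines. Everything below is an invented toy example: the scientists, labs, and ties are hypothetical, and the classification rule is only an assumption about how such a link might be used, not the procedure of Lazega et al. (2008) and not a multilevel ERGM.

```python
from collections import Counter

# Hypothetical membership: each scientist is affiliated with exactly one lab.
membership = {"s1": "LabA", "s2": "LabA", "s3": "LabB", "s4": "LabC"}

# Hypothetical interpersonal network: directed advice ties (seeker, adviser).
advice = [("s1", "s2"), ("s1", "s3"), ("s3", "s4"), ("s4", "s1")]

# Hypothetical interorganizational network: undirected ties between labs.
lab_ties = {frozenset(("LabA", "LabB")), frozenset(("LabB", "LabC"))}


def classify(tie):
    """Place one interpersonal tie within the interorganizational structure."""
    seeker, adviser = tie
    lab_from, lab_to = membership[seeker], membership[adviser]
    if lab_from == lab_to:
        return "within_lab"
    if frozenset((lab_from, lab_to)) in lab_ties:
        return "across_linked_labs"
    return "across_unlinked_labs"


# How many advice ties stay inside a lab, follow a lab-level tie, or do neither?
tie_types = Counter(classify(tie) for tie in advice)

# Scaling up: advice ties aggregated to the level of lab pairs.
lab_level_advice = Counter(
    (membership[seeker], membership[adviser])
    for seeker, adviser in advice
    if membership[seeker] != membership[adviser]
)

print(tie_types)
print(lab_level_advice)
```

Reading the counts in both directions, from persons to labs and from labs to persons, is what makes the positioning "dual" in this toy setting.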

Such multilevel network analyses make it possible to explain how lower-level features and processes result from features of linked higher-level structures, and vice versa. Breiger (2015) suggests that more research approaches are needed that allow for a connection of macro-level findings to lower-level, local behavioral processes and identifies this as a “duality of scaling up and down” (p. 3).

Cultural Analysis: Duality of Documents and Words Yielding Categories

Another area of development continues and expands the tradition of formal cultural analysis using newly available textual datasets, both old and new (Bail, 2014; Bearman, 2015; Mohr & Rawlings, 2015), and new techniques of large-scale textual analysis (Evans & Aceves, 2016; Ignatow & Mihalcea, 2016; Mohr & Bogdanov, 2013). Such big data analyses of textual data relations rely on the general principle of duality between documents and words to analyze patterns of co-occurrences and to reduce complexities. Empirical studies investigate entire fields of public, political, economic, or scientific discourse over time and are able to show their changes and evolution. Light and adams (2016) use the method of topic modeling to explore how HIV/AIDS research has evolved over 20 years. The latent Dirichlet allocation (LDA) algorithm they apply is a probabilistic topic model estimated by unsupervised learning. LDA assumes that documents consist of a distribution of themes or topics; topics, in turn, consist of a distribution of words. But rather than simply looking at the distribution of words across documents, the LDA algorithm focuses on the co-occurrence of words in each document to endogenously identify a specified number of topics that consist of these clusters of words. In effect, a word can be part of several topics. This method thus considers a word’s different meanings that result from its co-occurrence with other words (see Blei, 2012; Blei, Ng, & Jordan, 2003; DiMaggio, Nag, & Blei, 2013). On the basis of estimated topics, Light and adams then construct networks of topics that take into account topics’ similarity and difference based on the overlap of words. They are thus able to show the rise and fall of particular topics as well as how the structure of knowledge production in scientific discussions changed over time.
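The duality of documents and words that these analyses rely on can be made concrete with a toy example. The sketch below is not LDA and estimates no topics; it only shows, under the assumption of three invented documents, how one document-by-word incidence structure yields both a word-by-word co-occurrence count and a document-by-document overlap count. All names in the code are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus: document id -> text.
docs = {
    "d1": "treatment trial drug",
    "d2": "drug trial policy",
    "d3": "policy funding",
}

# Two-mode incidence: which words occur in which documents.
incidence = {doc_id: set(text.split()) for doc_id, text in docs.items()}

# Word-by-word view: in how many documents does each word pair co-occur?
word_cooccurrence = Counter()
for words in incidence.values():
    for pair in combinations(sorted(words), 2):
        word_cooccurrence[pair] += 1

# Document-by-document view: how many words does each document pair share?
doc_overlap = {
    (doc_a, doc_b): len(incidence[doc_a] & incidence[doc_b])
    for doc_a, doc_b in combinations(sorted(incidence), 2)
}

print(word_cooccurrence.most_common(3))
print(doc_overlap)
```

In this toy corpus the pair ("drug", "trial") co-occurs in two documents, and d1 and d2 share two words. Regularities of this kind are the raw material that topic models and semantic network analyses work from in much larger corpora.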
Rule, Cointet, and Bearman (2015) suggest an alternative approach to topic modeling using semantic network analytic techniques. They base their analysis of State of the Union addresses over 227 years on frequently occurring noun terms, including multiword phrases, and translate the co-occurrence of these terms into a measure of proximity that captures the relatedness of each word pair. This yields a semantic network of co-occurrences with terms as nodes connected by edges weighted by their similarity. A community detection algorithm then identifies cohesive subsets that can be understood as discursive categories identified in the documents. These categories serve sorting purposes similar to those of the topics mentioned earlier and allow examining local and global semantic network structures of specific time periods. Compared to topic modeling, this approach permits insights into how each category is semantically and relationally structured by the terms it consists of, and into how semantic categories evolve over a long period of time. In another large-scale textual analysis and using a similar approach, Hoffman et al. (2018) show the semantic structure of the Protestant Bible, including semantic kernels of dissent and conformity for later competing interpretative usages. Mützel (2016) combines qualitative, topic modeling, and semantic network analyses using tens of thousands of public stories over 22 years to show how a new category in the field of breast cancer therapeutics emerges across the network domains of science, commerce, industry, and journalism. Zooming in at different analytical levels, the analyses delineate shifts and drifts in the semantic trajectories of therapeutic treatments, the vast majority of which failed in development. The study of texts shows that after a series of contestations and rejections, narratives from science, commerce, industry, and journalism eventually come to agree on a new medical category. This categorization, in turn, laid the ground for a shift in market structure and, subsequently, the emergence of a new market. Such studies of large text corpora using novel, computational methods of text analyses, also in combination with qualitative approaches, permit insights into meaning-making processes over time. The formal modeling of culture using texts, relying on the foundational principle of duality, and finding patterns that, in turn, need interpretation can also be seen as an alternative to the assumed divide between qualitative and quantitative approaches in cultural analysis. We assume that future research will continue to work on understanding the dualities of culture and structure. We also expect that future work will examine “how culture prods, evokes, and constitutes social networks in ways that may be envisioned and modeled by new analytic methods” (Pachucki & Breiger, 2010, p. 207). In effect, research will focus less on dualities and more on their intermingled nature and interdependencies.

Notes

1. The German reference is Die Kreuzung der sozialen Kreise (Simmel, 1992 [1908]). In a 1955 translation into English, Bendix chose the title “The web of group-affiliations” (Simmel, 1955).
2. Mohr’s formative insights have recently been expanded in a collection of articles brought together in a Poetics special issue on the formal study of culture (Edelmann & Mohr, 2018).
3. This chapter thus links to several other subjects covered in this handbook, for example, on theory, aspects of culture and meanings, or multilevel networks, though it is structured around the concept of duality.
4. On Galois lattices see also Freeman and White (1993).
5. By providing a formal model of aggregation, Breiger and Mohr (2004) ratchet up the study of categories and identities.
6. Arguably, one of White’s early concepts, catnets, which relates to the intersection of network structure and social categories (White, 2008), prefigures this.
7. Although research on interorganizational networks is typically understood to analyze relationships between actors of the same level (i.e., organizations of different types linked to each other), we can also consider organizations to be linked to other organizations by elements of another level, for example, contracts and investments (see, e.g., Powell, Koput, & Smith-Doerr, 1996, p. 125, for a list of possible relations), as well as by the members they share with others from previous job posts.
8. We note that there have been interesting extensions regarding the formal modeling of dual relations as well as in the usage of the principle of duality for modeling purposes. This necessitates another discussion. Here we can also point to works that show close affinity between two-mode network analyses and correspondence analysis (Breiger, 2000; de Nooy, 2003) as well as qualitative comparative analysis (Breiger, 2009). Moreover, Breiger and Melamed (2014) “formulate a ‘dual’ to regression analysis in which a network among the cases, as well as a two-mode cases-by-variables array can be seen to generate key analytical outcomes” (p. 265). Turning regression analysis inside out using network analytic techniques yields more insight into the cases than classic regression analysis (see also Breiger et al., 2014).

References

Agneessens, F., & Everett, M. G. (2013). Introduction to the special issue on advances in two-mode social networks. Social Networks, 35(2), 145–147.
Ahn, Y.-Y., Ahnert, S. E., Bagrow, J. P., & Barabási, A.-L. (2011). Flavor network and the principles of food pairing. Scientific Reports, 1(196), 1–7. doi:10.1038/srep00196
Ahuja, G. (2000). Collaboration networks, structural holes, and innovation: A longitudinal study. Administrative Science Quarterly, 45(3), 425–455.
Anheier, H. K., Gerhards, J., & Romo, F. (1995). Forms of capital and social structure in cultural fields: Examining Bourdieu’s social topography. American Journal of Sociology, 100(4), 859–903.
Bail, C. A. (2014). The cultural environment: Measuring culture with big data. Theory and Society, 43(3–4), 465–482.
Barabási, A.-L. (2016). Network science. Cambridge, UK: Cambridge University Press.
Basov, N., de Nooy, W., & Nenko, A. (2019). Local meaning structures: Mixed-method sociosemantic network analysis. American Journal of Cultural Sociology, 1–44. doi:10.1057/s41290-019-00084-9
Basov, N., Lee, J.-S., & Antoniuk, A. (2017). Social networks and construction of culture: A socio-semantic analysis of art groups. In H. Cherifi, S. Gaito, W. Quattrociocchi, & A. Sala (Eds.), Complex networks & their applications V: Complex networks 2016 (pp. 785–796). Cham: Springer.
Bearman, P. S. (1993). Relations into rhetorics: Local elite social structure in Norfolk, England, 1540–1640. New Brunswick, NJ: Rutgers University Press.
Bearman, P. S. (2015). Big data and historical social science. Big Data & Society, 2(2), 1–5. doi:10.1177/2053951715612497
Bearman, P. S., & Everett, K. D. (1993). The structure of social protest, 1961–1983. Social Networks, 15(2), 171–200.
Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
Blei, D. M., Ng, A. Y., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Borgatti, S. P., Everett, M. G., & Johnson, J. C. (2013). Analyzing social networks. Los Angeles, CA: Sage.
Borgatti, S. P., & Halgin, D. S. (2011). Analyzing affiliation networks. In J. Scott & P. J. Carrington (Eds.), The Sage handbook of social network analysis (pp. 417–433). London, UK: Sage.
Bourret, P., Mogoutov, A., Julian-Reynier, C., & Cambrosio, A. (2006). A new clinical collective for French cancer genetics: A heterogeneous mapping analysis. Science, Technology, & Human Values, 31(4), 431–464.
Brailly, J., Favre, G., Chatellet, J., & Lazega, E. (2016). Market as a multilevel system. In E. Lazega & T. A. B. Snijders (Eds.), Multilevel network analysis for the social sciences: Theory, methods and applications (pp. 245–271). Heidelberg, Germany: Springer.

Brass, D. J., Galaskiewicz, J., Greve, H. R., & Tsai, W. (2004). Taking stock of networks and organizations: A multilevel perspective. Academy of Management Journal, 47(6), 795–817.
Breiger, R. L. (1974). The duality of persons and groups. Social Forces, 53(2), 181–190.
Breiger, R. L. (2000). A tool kit for practice theory. Poetics, 27(2–3), 91–115.
Breiger, R. L. (2005). White, Harrison. In G. Ritzer (Ed.), Encyclopedia of social theory (pp. 884–886). Thousand Oaks, CA: Sage.
Breiger, R. L. (2009). On the duality of cases and variables: Correspondence analysis (CA) and qualitative comparative analysis (QCA). In D. Byrne & C. Ragin (Eds.), The Sage handbook of case-based methods (pp. 243–259). Thousand Oaks, CA: Sage.
Breiger, R. L. (2010). Dualities of culture and structure: Seeing through cultural holes. In J. Fuhse & S. Mützel (Eds.), Relationale Soziologie (pp. 37–47). Wiesbaden, Germany: VS Verlag.
Breiger, R. L. (2011). Baruch Spinoza: Monism and complementarity. In C. Edling & J. Rydgren (Eds.), Sociological insights of great thinkers (pp. 255–262). Santa Barbara, CA: Praeger.
Breiger, R. L. (2015). Scaling down. Big Data & Society, 2(2), 1–4. doi:10.1177/2053951715602497
Breiger, R. L., & Melamed, D. (2014). The duality of organizations and their attributes: Turning regression modeling “inside out.” Research in the Sociology of Organizations, 40, 263–275.
Breiger, R. L., & Mohr, J. W. (2004). Institutional logics from the aggregation of organizational networks: Operational procedures for the analysis of counted data. Computational & Mathematical Organization Theory, 10, 17–43.
Breiger, R. L., & Puetz, K. (2015). Culture and networks. In J. D. Wright (Ed.), International encyclopedia of the social & behavioral sciences (2nd ed., Vol. 5, pp. 557–562). Oxford, UK: Elsevier.
Breiger, R. L., Schoon, E., Melamed, D., Asal, V., & Rethemeyer, R. K. (2014). Comparative configurational analysis as a two-mode network problem: A study of terrorist group engagement in the drug trade. Social Networks, 36, 23–39.
Brint, S. (1992). Hidden meanings: Cultural content and context in Harrison White’s structural sociology. Sociological Theory, 10, 194–208.
Cambrosio, A., Cottereau, P., Popowycz, S., Mogoutov, A., & Vichnevskaia, T. (2013). Analysis of heterogenous networks: The ReseauLu Project. In B. Reber & C. Brossaud (Eds.), Digital cognitive technologies: Epistemology and the knowledge economy (pp. 137–152). Hoboken, NJ: John Wiley & Sons.
Cambrosio, A., Keating, P., Mercier, S., Lewison, G., & Mogoutov, A. (2006). Mapping the emergence and development of translational cancer research. European Journal of Cancer, 42(18), 3140–3148.
Cambrosio, A., Keating, P., & Mogoutov, A. (2004). Mapping collaborative work and innovation in biomedicine: A computer-assisted analysis of antibody reagent workshops. Social Studies of Science, 34(3), 325–364.
Cornwell, B., Curry, T. J., & Schwirian, K. (2003). Revisiting Norton Long’s ecology of games: A network approach. City & Community, 2(2), 121–142.
Crossley, N. (2009). The man whose web expanded: Network dynamics in Manchester’s post/punk music scene 1976–1980. Poetics, 37(1), 24–49.
Crossley, N. (2016). Social network analysis. In D. Inglis & A.-M. Almila (Eds.), The Sage handbook of cultural sociology (pp. 282–293). London, UK: Sage.
de Nooy, W. (2003). Fields and networks: Correspondence analysis and social network analysis in the framework of field theory. Poetics, 31(5–6), 305–327.
Diani, M., & Kousis, M. (2014). The duality of claims and events: The Greek campaign against the Troika’s memoranda and austerity, 2010–2012. Mobilization: An International Quarterly, 19(4), 387–404.

DiMaggio, P. (1987). Classification in art. American Sociological Review, 52(4), 440–455.
DiMaggio, P. (1992). Nadel’s paradox revisited: Relational and cultural aspects of organizational structure. In N. Nohria & R. Eccles (Eds.), Networks and organizations: Structure, form and action (pp. 118–142). Boston, MA: Harvard Business School Press.
DiMaggio, P. (2011). Cultural networks. In J. Scott & P. J. Carrington (Eds.), The Sage handbook of social network analysis (pp. 286–300). London, UK: Sage.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics, 41(6), 570–606.
Durand, R., & Khaire, M. (2017). Where do market categories come from and how? Distinguishing category creation from category emergence. Journal of Management, 43(1), 87–110.
Edelmann, A., & Mohr, J. W. (2018). Formal studies of culture: Issues, challenges, and current trends. Poetics, 68, 1–9.
Eliasoph, N., & Lichterman, P. (2003). Culture in interaction. American Journal of Sociology, 108(4), 735–794.
Elzinga, P., Poelmans, J., Viaene, S., Dedene, G., & Morsing, S. (2010). Terrorist threat assessment with formal concept analysis. Paper presented at the 2010 IEEE International Conference on Intelligence and Security Informatics (ISI), Vancouver, Canada.
Emirbayer, M., & Goodwin, J. (1994). Network analysis, culture, and the problem of agency. American Journal of Sociology, 99(6), 1411–1454.
Evans, J. A., & Aceves, P. (2016). Machine translation: Mining text for social theory. Annual Review of Sociology, 42, 21–50.
Everett, M. G., & Borgatti, S. P. (2013). The dual-projection approach for two-mode networks. Social Networks, 35(2), 204–210.
Fararo, T. J., & Doreian, P. (1984). Tripartite structural analysis: Generalizing the Breiger-Wilson formalism. Social Networks, 6(2), 141–175.
Faulkner, R. (1986). Hollywood studio musicians: Their work and careers in the recording industry. Lanham, MD: University Press of America.
Feld, S. L. (1981). The focused organization of social ties. American Journal of Sociology, 86(5), 1015–1035.
Fine, G. A., & Kleinman, S. (1983). Network and meaning: An interactionist approach to structure. Symbolic Interaction, 6, 97–110.
Freeman, L. C., & White, D. R. (1993). Using Galois lattices to represent network data. Sociological Methodology, 23, 127–146.
Friedland, R., & Alford, R. (1991). Bringing society back in: Symbols, practices and institutional contradictions. In W. Powell & P. DiMaggio (Eds.), New institutionalism in organizational analysis (pp. 232–263). Chicago, IL: University of Chicago Press.
Fuhse, J. (2009). The meaning structure of social networks. Sociological Theory, 27(1), 51–73.
Galaskiewicz, J., & Burt, R. S. (1991). Interorganization contagion in corporate philanthropy. Administrative Science Quarterly, 36(1), 88–105.
Giuffre, K. (1999). Sandpiles of opportunity: Success in the art world. Social Forces, 77(3), 815–832.
Godart, F. (2018). Culture, structure, and the market interface: Exploring the networks of stylistic elements and houses in fashion. Poetics, 68, 72–88.
Godart, F., & White, H. C. (2010). Switchings under uncertainty: The coming and becoming of meanings. Poetics, 38(6), 567–586.

Goldberg, A. (2011). Mapping shared understandings using relational class analysis: The case of the cultural omnivore reexamined. American Journal of Sociology, 116(5), 1397–1436.
Healy, K. (2013). Using metadata to find Paul Revere. https://kieranhealy.org/blog/archives/2013/06/09/using-metadata-to-find-paul-revere/
Hidalgo, C. A., & Hausmann, R. (2009). The building blocks of economic complexity. Proceedings of the National Academy of Sciences, 106(26), 10570–10575.
Hidalgo, C. A., Klinger, B., Barabási, A.-L., & Hausmann, R. (2007). The product space conditions the development of nations. Science, 317(5837), 482–487.
Hoffman, M. A., Cointet, J.-P., Brandt, P., Key, N., & Bearman, P. (2018). The (Protestant) bible, the (printed) sermon, and the word(s): The semantic structure of the conformist and dissenting bible, 1660–1780. Poetics, 68, 89–103.
Ignatow, G., & Mihalcea, R. (2016). Text mining: A guidebook for the social sciences. London, UK: Sage.
Ikegami, E. (2005). Bonds of civility: Aesthetic networks and the political origins of Japanese culture. New York, NY: Cambridge University Press.
Jones, C., Maoret, M., Massa, F. G., & Svejenova, S. (2012). Rebels with a cause: Formation, contestation, and expansion of the de novo category “modern architecture,” 1870–1975. Organization Science, 23(6), 1523–1545.
Kadushin, C. (1976). Networks and circles in the production of culture. American Behavioral Scientist, 19(6), 769–784.
Kennedy, M. T. (2005). Behind the one-way mirror: Refraction in the construction of product market categories. Poetics, 33(3–4), 201–226.
Kennedy, M. T. (2008). Getting counted: Markets, media, and reality. American Sociological Review, 73(2), 270–295.
Latour, B., Jensen, P., Venturini, T., Grauwin, S., & Boullier, D. (2012). The whole is always smaller than its parts: A digital test of Gabriel Tarde’s monads. British Journal of Sociology, 63(4), 590–615.
Laumann, E. O., & Pappi, F. U. (1976). Networks of collective action: A perspective on community influence systems. New York, NY: Academic Press.
Lazega, E., Jourda, M.-T., Mounier, L., & Stofer, R. (2008). Catching up with the big fish in the big pond? Multi-level network analysis through linked design. Social Networks, 30, 159–176.
Lazega, E., & Snijders, T. A. B. (Eds.). (2016). Multilevel network analysis for the social sciences. Heidelberg, Germany: Springer.
Lee, M., & Martin, J. L. (2018). Doorway to the dharma of duality. Poetics, 68, 18–30.
Light, R., & adams, j. (2016). Knowledge in motion: The evolution of HIV/AIDS research. Scientometrics, 107(3), 1227–1248.
Lizardo, O. (2006). How cultural tastes shape personal networks. American Sociological Review, 71(5), 778–807.
Lizardo, O. (2014). Omnivorousness as the bridging of cultural holes: A measurement strategy. Theory and Society, 43(3), 395–419.
Martin, J. L. (2000). What do animals do all day?: The division of labor, class bodies, and totemic thinking in the popular imagination. Poetics, 27(2–3), 195–231.
Martin, J. L., & Lee, M. (2018). A formal approach to meaning. Poetics, 68, 10–17.
McLean, P. D. (1998). A frame analysis of favor seeking in the Renaissance: Agency, networks, and political culture. American Journal of Sociology, 104(1), 51–91.
McLean, P. D. (2017). Culture in networks. Cambridge, UK: Polity Press.
McPherson, M. (1983). An ecology of affiliation. American Sociological Review, 48, 519–532.

Mills, B. J., Clark, J. J., Peeples, M. A., Haas, W. R., Roberts, J. M., Hill, J. B., . . . Shackley, M. S. (2013). Transformation of social networks in the late pre-Hispanic US Southwest. Proceedings of the National Academy of Sciences, 110(15), 5785–5790. Mintz, B., & Schwartz, M. (1985). The power structure of American business. Chicago, IL: University of Chicago Press. Mische, A. (2003). Cross-talk in movements: Reconceiving the culture-network link. In M. Diani & D. McAdam (Eds.), Social movements and networks (pp. 258–280). Oxford, UK: Oxford University Press. Mische, A. (2008). Partisan publics: Communication and contention across Brazilian youth activist networks. Princeton, NJ: Princeton University Press. Mische, A. (2011). Relational sociology, culture, and agency. In J. Scott & P. J. Carrington (Eds.), The Sage handbook of social network analysis (pp. 80–97). London, UK: Sage. Mische, A., & Pattison, P. (2000). Composing a civic arena: Publics, projects, and social settings. Poetics, 27, 163–194. Mizruchi, M. (1992). The structure of corporate political action: Interfirm relations and their consequences. Cambridge, MA: Harvard University Press. Mizruchi, M. (1996). What do interlocks do? An analysis, critique, and assessment of research on interlocking directorates. Annual Review of Sociology, 22, 271–298. Mizruchi, M., & Galaskiewicz, J. (1993). Networks of interorganizational relations. Sociological Methods & Research, 22(1), 46–70. Mohr, J. W. (1994). Soldiers, mothers, tramps and others: Discourse roles in the 1907 New York City Charity Directory. Poetics, 22(4), 327–357. Mohr, J. W. (1998). Measuring meaning structures. Annual Review of Sociology, 24, 345–370. Mohr, J. W., & Bogdanov, P. (2013). Introduction—Topic models: What they are and why they matter. Poetics, 41(6), 545–569. Mohr, J. W., & Duquenne, V. (1997). The duality of culture and practice: Poverty relief in New York City, 1888–1917. Theory and Society, 26, 305–356. Mohr, J. W., & Guerra-Pearson, F. (2010). The duality of niche and form: The differentiation of institutional space in New York City, 1888–1917. In G. Hsu, Ö. Kocak, & G. Negro (Eds.), Categories in markets: Origins and evolution (Vol. 31, pp. 321–368). Bingley, UK: Emerald. Mohr, J. W., & Lee, H. (2000). From affirmative action to outreach: Discourse shifts at the University of California. Poetics, 28, 47–71. Mohr, J. W., & Rawlings, C. (2015). Formal methods of cultural analysis. In J. D. Wright (Ed.), International encyclopedia of the social & behavioral sciences (2nd ed., Vol. 5, pp. 357–367). Oxford, UK: Elsevier. Mohr, J. W., & White, H. C. (2008). How to model an institution. Theory and Society, 37(5), 485–512. Moody, J. (2004). The structure of a social science collaboration network: Disciplinary cohesion from 1963 to 1999. American Sociological Review, 69(2), 213. Moody, J., & Light, R. (2006). A view from above: The evolving sociological landscape. American Sociologist, 37(2), 67–86. Mützel, S. (2002). Making meaning of the move of the German capital: Networks, logics, and the emergence of capital city journalism. Ann Arbor, MI: UMI. Mützel, S. (2009). Networks as culturally constituted processes: A comparison of relational sociology and actor-network theory. Current Sociology, 57(6), 871–887. Mützel, S. (2010). Koordinierung von Märkten durch narrativen Wettbewerb. Kölner Zeitschrift für Soziologie und Sozialpsychologie, Sonderheft, 49, 87–106.

Mützel, S. (2016). Markets from stories. Ms. Habilitation: Humboldt-University of Berlin, Germany. Mützel, S., & Fuhse, J. (2010). Einleitung: Zur relationalen Soziologie. Grundgedanken, Entwicklungslinien und transatlantische Brückenschläge. In J. Fuhse & S. Mützel (Eds.), Relationale Soziologie: Zur kulturellen Wende der Netzwerkforschung (pp. 7–35). Wiesbaden, Germany: VS Verlag. Negro, G., Kocak, Ö., & Hsu, G. (2010). Research on categories in the sociology of organizations. In G. Hsu, Ö. Kocak, & G. Negro (Eds.), Categories in markets: Origins and evolution (Vol. 31, pp. 3–35). Bingley, UK: Emerald. Newman, M. E. (2001). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2), 404–409. Pachucki, M. A., & Breiger, R. L. (2010). Cultural holes: Beyond relationality in social networks and culture. Annual Review of Sociology, 36, 205–224. Padgett, J. F. (2012). Transposition and refunctionality. The birth of partnership systems in Renaissance Florence. In J. F. Padgett & W. W. Powell (Eds.), The emergence of organizations and markets (pp. 168–207). Princeton, NJ: Princeton University Press. Padgett, J. F., & Powell, W. W. (2012). The problem of emergence. In J. F. Padgett & W. W. Powell (Eds.), The emergence of organizations and markets (pp. 1–29). Princeton, NJ: Princeton University Press. Powell, W. W., Koput, K. W., & Smith-Doerr, L. (1996). Interorganizational collaboration and the locus of innovation: Networks of learning in biotechnology. Administrative Science Quarterly, 41, 116–145. Powell, W. W., White, D., Koput, K., & Owen-Smith, J. (2005). Network dynamics and field evolution: The growth of interorganizational collaboration in the life sciences. American Journal of Sociology, 110(4), 1132–1205. Puetz, K. (2017). Fields of mutual alignment: A dual-order approach to the study of cultural holes. Sociological Theory, 35(3), 228–260. Rao, H., Monin, P., & Durand, R. (2005). Border crossing: Bricolage and the erosion of categorical boundaries in French gastronomy. American Sociological Review, 70(6), 968–991. Robins, G. (2015). Doing social network research: Network-based research design for social scientists. Los Angeles, CA: Sage. Roth, C. (2013). Socio-semantic frameworks. Advances in Complex Systems, 16(4–5), 1–26. Roth, C., & Cointet, J.-P. (2010). Social and semantic coevolution in knowledge networks. Social Networks, 32(1), 16–29. Rule, A., & Bearman, P. S. (2016). Networks and culture. In L. Hanquinet & M. Savage (Eds.), Routledge international handbook of the sociology of art and culture (pp. 161–173). Milton Park, UK: Routledge. Rule, A., Cointet, J.-P., & Bearman, P. S. (2015). Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014. Proceedings of the National Academy of Sciences, 112(35), 10837–10844. Salvini, A. (2010). Symbolic interactionism and social network analysis: An uncertain encounter. Symbolic Interaction, 33, 364–388. Schweizer, T. (1993). The dual ordering of actors and possessions. Current Anthropology, 34(4), 469–483. Shi, F., Shi, Y., Dokshin, F. A., Evans, J. A., & Macy, M. W. (2017). Millions of online book co-purchases reveal partisan differences in the consumption of science. Nature Human Behaviour, 1, 1–9. doi:10.1038/s41562-017-0079

Simmel, G. (1992 [1908]). Die Kreuzung sozialer Kreise. In O. Rammstedt (Ed.), Soziologie. Untersuchungen über die Formen der Vergesellschaftung (Vol. 11, pp. 456–511). Frankfurt a. M., Germany: Suhrkamp. Simmel, G. (1955). The web of group-affiliations. In K. H. Wolff & R. Bendix (Eds.), Conflict & The Web Of Group Affiliations (pp. 125–195). New York, London: The Free Press; Collier-Macmillan Limited. Somers, M. R. (1994). The narrative constitution of identity: A relational and network approach. Theory and Society, 23, 605–649. Stegbauer, C. (2019). Breiger (1974): The duality of persons and groups. In B. Holzer & C. Stegbauer (Eds.), Schlüsselwerke der Netzwerkforschung (pp. 83–53). Wiesbaden, Germany: Springer VS. Stokman, F., Ziegler, R., & Scott, J. (1985). Networks of corporate power. Cambridge, UK: Polity Press. Tilly, C. (1997). Parliamentarization of popular contention in Great Britain, 1758–1834. Theory & Society, 26, 245–273. Useem, M. (1984). The inner circle: Large corporations and the rise of political activity. New York, NY: Oxford University Press. Uzzi, B., & Spiro, J. (2005). Collaboration and creativity: The small world problem. American Journal of Sociology, 111(2), 447–504. Ventresca, M., & Mohr, J. W. (2002). Archival research methods. In J. Baum (Ed.), The Blackwell companion to organizations (pp. 805–828). Oxford, UK: Blackwell. Vergne, J. P., & Wry, T. (2014). Categorizing categorization research: Review, integration, and future directions. Journal of Management Studies, 51(1), 56–94. Wang, P., Robins, G., Pattison, P., & Lazega, E. (2013). Exponential random graph models for multilevel networks. Social Networks, 35, 96–115. Wang, P., Robins, G., Pattison, P., & Lazega, E. (2016). Social selection models for multilevel networks. Social Networks, 44, 346–362. Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge, UK; New York, NY: Cambridge University Press. Watts, D. J. (1999). Networks, dynamics, and the small-world phenomenon. American Journal of Sociology, 105(2), 493–527. Watts, D. J. (2004). The “new” science of networks. Annual Review of Sociology, 30, 243–270. Watts, D. J., Dodds, P. S., & Newman, M. E. J. (2002). Identity and search in social networks. Science, 296, 1302–1305. Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393(6684), 440–442. White, H. C. (1992). Identity and control: A structural theory of social action. Princeton, NJ: Princeton University Press. White, H. C. (2008). Notes on the constituents of social structure. Soc. Rel. 10–Spring ‘65. Sociologica, 1, 1–15. doi:10.2383/26576 Whitham, M. M. (2012). Community connections: Social capital and community success. Sociological Forum, 27(2), 441–457. Windolf, P. (2002). Corporate networks in Europe and the United States. Oxford, UK: Oxford University Press. Wuchty, S., Jones, B. F., & Uzzi, B. (2007). The increasing dominance of teams in production of knowledge. Science, 316(5827), 1036–1039.

Yeung, K.-T. (2005). What does love mean? Exploring network culture in two network settings. Social Forces, 84(1), 391–420. Zappa, P., & Lomi, A. (2016). Knowledge sharing in organizations: A multilevel network analysis. In E. Lazega & T. A. B. Snijders (Eds.), Multilevel network analysis for the social sciences (pp. 333–353). Heidelberg, Germany: Springer.

Chapter 22

Networks of Culture, Networks of Meaning: Two Approaches to Text Networks

Ryan Light and Jeanine Cunningham

A music critic describes a new artist by referring to an old one. An anthropologist signals her theoretical framework by citing a classic text. A writer searches for just the right word given the words they have already written and the personal dictionary they carry in their mind. An organization frames its statement of purpose to connect to other like organizations. We understand the world through a network of meanings. Sense-making is networked. Meaning networks are inherently social: they connect individuals through shared understanding. Miscommunication occurs when meaning is obscured or doesn’t make sense. The classic telephone game reveals how meaning changes as schoolchildren pass a message from one classmate to the next through a classroom network. The telephone game’s punchline is based on how meaning is in part subject to the whim of a network of fidgety and, perhaps, mischievous children.

This chapter focuses on the relationship between meaning and networks. Meaning is affected by network processes, but it does not simply drift passively on social networks; it helps to constitute those networks as well. We see this in linguistic networks and also in research on how culturally embedded values contribute to network formation (White, 2008). Previous scholarship has focused on how meaning is often structured as a network. Here, symbols such as words are connected to one another to model patterns or structures of meaning. These network models of text have proven particularly useful given the massive proliferation of text data via the internet. This data abundance results from online conversations taking place via websites, social media platforms, and digital media publications, and from the digital preservation of historical materials, such as official government data and other material as well as books, diaries, and so forth. While previous generations of scholars may have suffered from the relative difficulty of finding or accessing data, contemporary scholars often must address having access to an abundance of data (Bail, 2014; Light, 2014). Network approaches to text offer one way to make sense of large collections, or corpora, of text.

In this chapter, we focus on two network forms: text or semantic networks and subject-action-object networks. Each approach offers a different way of thinking about text data. We proceed through the chapter as follows: First, we introduce recent conversations over meaning, culture, and networks within the social sciences with a bias toward the literature in sociology—the discipline with which we are most familiar. Next, we describe the two aforementioned approaches and provide illustrations drawn from historical interviews of formerly enslaved people. Last, we situate these approaches alongside other ways of computationally analyzing patterns across texts. We argue for the continued relevance of thinking about and modeling the relationship between networks and meaning.

What Does Meaning Mean?

Meaning has long been considered by some to be a key to sociological understanding (Fuhse, 2009; White, 2008; White et al., 2007). We cannot develop systematic knowledge about a group of people without considering how they make sense of the world around them. However, scholars rarely provide a definition of meaning, which incites confusion. Like many concepts in the social sciences, meaning is the subject of debate and beset by ideological fights that rehash seemingly ancient methodological complaints. In this case, the argument centers upon whether local meaning or global meaning is the proper level of analysis. At the level of text, for example, is meaning located individually in each text or is it located globally across multiple texts? As we will see, network scholars have largely focused on the more structural approach—that is, meaning as a global property—but have not entirely abandoned its local construction and negotiation. From this perspective, local meanings aggregate to global meaning structures, which also recursively feed back into local meaning making.

This recursive process presents a challenge to how we think about meaning in everyday life. We often attach deep psychological or emotional weight to the concept of meaning. We might ask what a painting “means” or think about the “meaning” of life. We also may think about less lofty senses of meaning when we try to remember what a difficult word or a rarely seen street sign means. Research on meaning often differs along these lines, with meaning understood as values—“culturally embedded normative explanations” (Orbuch, 1997, p. 460)—or more basic symbolic forms of representation. Of course, these two ways of thinking about meaning are related to one another and this relationship forms one of the key motivations of many social scientists working on the relationship between networks and meaning: symbolic representations both profound and mundane can flow through and shape networks.

Debates on the relationship between meaning and social life often center upon whether meanings operate as systems versus more ad hoc, unconscious perceptual “images.” The former, associated with the “strong program” in sociology, conceives of meaning as a structured system. Building from classic 20th-century structuralism, this approach emphasizes how meaning exists relationally—often in a binary fashion—and emphasizes totalities. Describing structuralism across the social sciences, Lane (1970, p. 14) expressed the networked character of deep societal structures: “The essential quality of the structuralist method, and its fundamental tenet, lies in its attempt to study not the elements of the whole, but the complex network of relationships that link and unite those elements.” Meaning, in this framework, occurs globally as a networked web—again in its strictest form consisting of binaries, such as good/evil, sacred/profane, and so forth—and these “webs of significance” play a central role in guiding social action (Alexander, 2003, p. 22). This meaning system often is embedded in our linguistic practice and can be drawn from it.

Critics of this approach, following general critiques of structuralism, view it as underspecified. Building from practice theory, research in cognitive sociology, for example, has developed an alternative understanding of meaning distinguishing people’s habitual actions from more conscious ones. Within the framework of cognitive sociology, especially the “dual process” mode, people mostly rely on symbols as simplifying functions capturing engrained, but rarely reflexively examined, lines of action (Lizardo et al., 2016). Symbols may or may not be embedded in unconscious values or beliefs that shape everyday action. A recent formal extension of this approach illustrates how shared beliefs and values can be modeled as “belief networks.” In Boutyline and Vaisey (2017), political beliefs are conceived as a network, where beliefs (here, political attitude survey responses) form the network’s nodes. One belief is connected to another based on the extent of overlapping responses to national survey questions (here, correlation scores). In this view action derives less from a complex system of meaning and more from relatively simple heuristics, like political ideology.

We take a comprehensive approach to conceptualizing meaning as operating both locally and globally and in both simple and more complex ways. Yet, most research on networks and meaning—for either practical or theoretical reasons—can be categorized by the extent to which it falls on the continuum along these two dimensions from the simple to the complex.
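The belief-network construction described above, with survey items as nodes and correlations between responses as ties, can be illustrated with a minimal sketch. The items, the responses, and the use of the absolute Pearson correlation as the tie weight are all assumptions made here for illustration; they are not the measures or data of the study cited.

```python
from itertools import combinations
from math import sqrt

# Hypothetical attitude items and responses (1 = strongly disagree,
# 5 = strongly agree). Each row is one respondent.
ITEMS = ["taxes", "welfare", "immigration", "guns"]
RESPONSES = [
    [1, 2, 1, 5],
    [2, 1, 2, 4],
    [4, 5, 4, 2],
    [5, 4, 5, 1],
    [3, 3, 2, 3],
]


def pearson(x, y):
    """Pearson correlation between two equally long sequences of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def belief_network(items, responses):
    """Return {(item_a, item_b): weight}; beliefs are nodes, and the weight
    is the absolute correlation between the two items (an assumption)."""
    columns = list(zip(*responses))  # one tuple of scores per item
    return {
        (items[i], items[j]): abs(pearson(columns[i], columns[j]))
        for i, j in combinations(range(len(items)), 2)
    }


edges = belief_network(ITEMS, RESPONSES)
for pair, weight in sorted(edges.items(), key=lambda kv: -kv[1]):
    print(pair, round(weight, 2))
```

Taking the absolute value treats strongly opposed beliefs as just as tightly linked as strongly aligned ones; a signed weight would be the natural alternative if the direction of association matters.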
Simple meanings are more likely to be habituated sources of action or those beliefs that may be less conscious or explicit. Belief networks and networks based on structured datasets, such as surveys, may capture these more simple forms of meaning, while more complex ideas can be articulated through unstructured data derived from writing and speech. These articulations may reference simple meaning but together form a more complex meaning structure.

Formal approaches to meaning have attended to these complex meaning structures. Mohr (1998, p. 364) identifies three advantages of the “shift toward quantitative relational methodologies” to understanding meaning: (1) These models are “iconic.” Consistent with network methods generally, these approaches promote visualization and “look like what they represent.” (2) They provide a “rich conceptual vocabulary.” Network concepts may help generate new ways of thinking about culture. (3) They provide tools to quantify meaning structures. While interpretation remains a key goal in cultural analysis, quantification encourages systematic pattern searching analogous to and alongside human modes of interpretation. These techniques are complementary to hermeneutic approaches and not antagonistic toward them. Quantification and formal measurement also encourage replication and reproducibility. Interpretive approaches do not formally replicate, but advance through the accumulation of evidence: my understanding plus new evidence extends previous understanding. Structural approaches to meaning allow for the specific evaluation of the exact same techniques used to generate analysis, especially within the best practices of open science (e.g., shared code and data when possible).

In this chapter we focus on how meaning can be extracted from text data. The move to locate patterns in collections of text follows from the structuralist tradition. Several classic works in structural anthropology and sociology use basic strategies of text modeling in ways that preview the approaches described here (see Carley & Palmquist, 1992; Franzosi, 1990). However, we see less tension between the structural and cognitive approaches described previously. One reason for this relaxed structural approach arises from the texts that we use here and have used elsewhere (see Light, 2014; Light & Cunningham, 2016; Moody & Light, 2006), especially the interviews with formerly enslaved people. Unlike political speech, qualitative interviews with people about their everyday lives often display characteristics resonant with the cognitive approach to meaning. People are not explicit about their values; they contradict themselves describing tension between global values and local reasoning. For example, some formerly enslaved interview subjects express passionate disdain for the horrors of slavery while defending the actions of the person who enslaved them. These contradictions are the stuff of everyday life, but even the contradictions themselves may exist in structured ways across individual texts. Meaning can be both contradictory and structured, and balance between these characteristics can be the subject of empirical investigation.

While the theory behind networks and meaning remains the subject of some debate, the effort to use network structures to analyze large sets of text data has practical advantages. First, large sets of texts are unwieldy, and the physical reading of texts in a linear fashion is time-consuming and inefficient. Given even modest-sized collections of several thousand texts, reading word for word may be impossible because of practical constraints. Second, the use of networks to uncover meaning within text data addresses the problem of the bias accumulated through the successive reading of texts.
This bias is endemic to hermeneutical approaches to text insomuch as the reading of one text influences the reading of the next. This bias is not entirely negative as the nuances across texts are an important aspect of learning about them, yet the problem can be pernicious, especially when biases accrue systematically as a reader selects one text after another. This is related to the issue of biased entry into text collections as the order in which collections are read may matter for sense-making. Biased entry occurs when the entrance into a collection of texts is determined by characteristics of the text. For example, texts that are most popular may be those scholars are most likely to read first. And, of course, the most popular texts may be outliers relative to the corpus as a whole. In fact, it seems likely that the characteristics that contribute to the popularity of a text would be exactly correspondent with what would make a text an outlier: exceptional writing and storytelling, compelling narrative action, etc. Of course, these advantages do not preclude the human reading of text. Indeed, these network text approaches are often best used in conjunction with other types of “reading” (see Nelson, 2017 and Karrell & Freedman, 2019, for productive techniques). We provide two examples of how networks can be used to explore collections of texts. These are tractable solutions for network scholars both to classify the relationships between texts and to explore the content of texts themselves. We use interview data of formerly enslaved Americans collected in the 1930s by the US Works Progress Administration (WPA) to illustrate two different types of networks. First, building from linguistic understandings of co-occurrence or how terms appear together in texts, we will use networks as a relatively simple way to classify social text data. 
In this case, the documents serve as nodes, and the relationship between the documents based on overlapping content serves as edges. Second, we will use networks to model the content of the corpus. Based on the work of Franzosi and others, and using parsing techniques from natural language processing, we will extract subject-action-object triads and model these words as networks. In this case, words comprise the nodes, and edges indicate coappearance in sentences. The network is directed as subjects send to verbs and verbs send to objects. This is an efficient and useful means of summarizing the action described in a large set of text data. Together these approaches are tractable networks for making sense of complex text data consistent with structural approaches to language and social life.
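The directed subject-action-object structure just described can be sketched in a few lines. The sketch assumes the triads have already been extracted by a parser; the triads below are hypothetical, loosely echoing the examples that open this chapter, and the function names are illustrative only.

```python
from collections import Counter

# Hypothetical subject-action-object triads. In practice these would come
# from a dependency parser run over each sentence of the corpus.
TRIADS = [
    ("critic", "describe", "artist"),
    ("critic", "describe", "album"),
    ("anthropologist", "cite", "text"),
    ("organization", "frame", "statement"),
]


def sao_network(triads):
    """Build a weighted directed edge list in which each subject sends a tie
    to its verb and each verb sends a tie to its object."""
    edges = Counter()
    for subject, action, obj in triads:
        edges[(subject, action)] += 1
        edges[(action, obj)] += 1
    return edges


def out_neighbors(edges, node):
    """Nodes that receive a tie from `node`, in alphabetical order."""
    return sorted(target for (source, target) in edges if source == node)


sao_edges = sao_network(TRIADS)
print(sao_edges[("critic", "describe")])     # tie weight: times the critic describes
print(out_neighbors(sao_edges, "describe"))  # everything that gets described
```

Edge weights count how often a subject-verb or verb-object pair recurs, so frequently repeated actions stand out when the network is summarized or drawn.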

Network Text Analysis for Meaning Structure

Network text analysis, or semantic network analysis (Doerfel, 1998; Rice & Danowski, 1993), has been used by network scholars for several decades as part of a broad program of formalizing cultural analysis (see Edelmann & Mohr, 2018; Fuhse & Mützel, 2011; Mohr, 1998). The initial intuition remains: texts with more overlapping words are more likely to be similar than texts with few overlapping words. It is based on a simplification of texts that reduces them to their most basic parts: grammarless words, otherwise known as a “bag of words” technique. This unordered string of words appears as a kind of genetic code for a text. We can compare these complex patterns across texts to observe the distance between texts based on the extent to which these patterns—as a view of the content itself—overlap. The technique, then, begins with a matrix of documents-by-words, where each document is assigned to a column and each word in the entire corpus is assigned to a row. This two-mode matrix is transformed into a one-mode network—either document-document or word-word—using the logic established in Breiger (1974; see Lee & Martin, 2018, and Mützel & Breiger’s Chapter 21 in this volume).

Network text analytic techniques are used to analyze an array of topics from historical phenomena (Light, 2014; Rule, Cointet, & Bearman, 2015) to social media (Bail, 2016) to science (Moody & Light, 2006). Earlier approaches use semiautomated techniques to identify words clustered into themes across texts. For example, Kathleen Carley’s path-breaking work on the sociological analysis of text combines networks and semiautomated text analysis. In her work exploring the relationship between cognition and culture, Carley (1994) presents a multistep method for mapping the social knowledge of culture by identifying linguistic patterns. Examining a variety of text sources, such as science fiction literature wherein the characteristics of robots are described, Carley’s method necessitates that researchers manually identify concepts (i.e., descriptors) and their relationships within texts before turning to computer-assisted techniques to construct maps of social knowledge.

More recent efforts by Carley and others are more fully automated (see Diesner & Carley, 2005). For example, Rule et al. (2015) examine changes in political speech in the United States using automated text analysis. Building co-occurrence networks of terms in State of the Union speeches delivered by US presidents, the authors find that the content of these formal speeches is stable and only changes gradually over time. In other words, State of the Union speeches are a “stable cultural form” (p. 10841). Looking at the contemporary case of advocacy organizations, Bail (2016) examines Facebook posts from autism spectrum disorder advocacy organizations to analyze the discursive themes and patterns that certain groups use to develop more user interaction with their posts. Using natural language processing techniques combined with network analysis, Bail (2016, p. 11824) finds that advocacy groups that create “cultural bridges” in their posts, or messages “linking discursive themes within an advocacy field that are seldom discussed together,” are able to interact with a wider audience than those that mostly post overly familiar discussions.

Network text analytic techniques prove particularly effective at document classification or the process of finding patterns across texts. For example, Evans (2016) describes how network-based text analysis proves useful in observing patterns of interdisciplinarity in science. She writes, “Text is ripe with possibility for measuring interdisciplinarity” (p. 3). She uses a distance measure based on overlapping words to identify how proximate disciplines are to one another with implications for the issues on which individual scientists work. Texts are structured and patterned in ways that summarize their more complex and nuanced layers. This simplification motivates qualitative efforts at corpus summarization and their computational counterparts.

The data for our examples consist of interviews of formerly enslaved people collected during the Great Depression in the United States. Under the auspices of the jobs program, from 1936 to 1938 interviewers from the Federal Writers’ Project of the WPA conducted over 2,000 interviews of formerly enslaved people. These interviews vary in length from a few dozen to nearly 7,000 words, which provides some evidence of the variation within the corpus. These data have several known limitations, but should be considered valuable. First, the data were collected 70 years after emancipation. Many of the interviewees were very young when they experienced slavery and were elderly when interviewed.
This can lead to several biases related both to the process of aging and memory and to the differences between the way children and adults experienced slavery. For example, this collection reflects the greater likelihood for enslaved children to work in the household than adults (Escott, 1979). The interviews, mostly conducted by white journalists and historians, also were subject to racial stereotyping, which can be seen in internal debates in the Federal Writers’ Project over how speech patterns should be reflected in the interview transcripts. Respondents likely differed in how they responded to interviewers based on race as well (see Stewart, 2016). Given that the interviews were conducted largely in the Jim Crow South, the Black respondents may have been reluctant to discuss their experiences with white interviewers and may have painted an optimistic picture of race relations in the pre–Civil War period to please them or to otherwise avoid negative sanctioning. These interviews also exhibit signs of interviewer bias as some white interviewers transcribed particularly outlandish depictions of Black speech patterns—a practice that was negatively sanctioned by the WPA office in Washington, DC. Even still, these interviews remain one of the greatest sources for understanding American slavery and remain widely used by historians (see Baptist, 2014; Plath, 2017).
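Before the analysis itself, the bag-of-words matrix and its one-mode projections introduced at the start of this section can be made concrete with a small sketch. It follows the orientation given there (words as rows, documents as columns) and uses simple presence or absence rather than weighted counts. The three short "documents" are hypothetical, and the helper names are illustrative only.

```python
from collections import Counter

# Hypothetical toy corpus (not drawn from any real collection).
DOCS = {
    "d1": "the critic describes the artist",
    "d2": "the artist cites the critic",
    "d3": "the organization frames a statement",
}


def bag_of_words(text):
    """Reduce a text to unordered word counts, discarding grammar and order."""
    return Counter(text.lower().split())


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


doc_ids = sorted(DOCS)
bags = {d: bag_of_words(DOCS[d]) for d in doc_ids}
vocab = sorted(set().union(*bags.values()))

# Two-mode incidence matrix: one row per word, one column per document,
# with 1 where the word appears in the document and 0 otherwise.
A = [[1 if word in bags[d] else 0 for d in doc_ids] for word in vocab]

# One-mode projections in the spirit of Breiger (1974).
doc_doc = matmul(transpose(A), A)    # distinct words shared by two documents
word_word = matmul(A, transpose(A))  # documents in which two words co-occur

print(doc_doc)
```

The diagonal of each projection holds a count rather than a tie (the number of distinct words in a document, or the number of documents containing a word), so it is usually set aside when the matrix is read as a network.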

Constructing Text Networks

For the first example, a network text analysis, we analyze patterns within the WPA data. We ask, “How does the structure of the WPA corpus differ by emotional valence?” We use the structural connections between interviews based on overlapping content to build the interview network. We start with a file consisting of the interviews and some metadata, such as the interviewer name and the US state in which the interview took place. Following Bail (2016), we preprocess the interview text in several ways. First, we annotate and tokenize the data, which means we split the text into individual words and identify the part of speech of each word. We retain nouns for this analysis as they have been shown to effectively summarize the content of text data, although this likely depends on the length and the context of the corpus (see Hoffman et al., 2018). As is common in natural language processing, we weight each word across each document by tf × idf and remove very common words within the corpus and words that are rarely used (n = 284,275 total tokens consisting of 5,939 nouns; see Note 1). This results in a text-by-word matrix where the rows are texts or documents (here, WPA interviews) and the columns are words. The cells are the weighted score of each word that appears in a particular document, with the majority of cells being empty as the vocabulary is large and the documents are relatively short. In other words, this is a two-mode network where one mode consists of texts and the other mode consists of words.

While we could accomplish a lot with this two-mode network, and text-by-word networks deserve more research, we take the common route of projecting this network into a single mode (Breiger, 1974). In this case, we project onto the text mode, or the text-by-text matrix, by multiplying the text-by-word matrix by its transpose (i.e., the word-by-text matrix). The cells of this matrix consist of the sum of overlapping tf × idf scores and indicate the extent to which two interviews overlap. Last, we use the Louvain method to locate communities within the interview network where each interview is assigned to a single community. These communities consist of interviews that are more like one another based on overlapping content.
To summarize the methodological approach to constructing these networks, we (1) preprocess the text and retain nouns, (2) weight each word by tf × idf, (3) project the resulting text-by-word matrix to a text-by-text interview network, and (4) locate communities within this interview network using the Louvain method.
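Steps (2) and (3) can be illustrated with a minimal sketch in Python. This is not the authors' code (the chapter's analyses rely on R packages); the function names `tfidf_rows` and `project_to_texts` and the three toy documents are hypothetical. The sketch assumes a plain tf × idf weighting (relative term frequency times the natural log of the inverse document frequency) and a literal matrix product for the projection, so each text-by-text cell is the sum, over shared words, of the product of the two texts' weights. Community detection in step (4) is left out; in practice a Louvain routine from a network library would be applied to the projected matrix.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Weight each word in each document by tf x idf.

    docs: list of token lists (here, the nouns retained from each text).
    Returns one dict per document mapping word -> weight, i.e. a sparse
    row of the text-by-word matrix.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document

    rows = []
    for tokens in docs:
        counts = Counter(tokens)
        total = sum(counts.values())
        rows.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return rows


def project_to_texts(rows):
    """Project the two-mode text-by-word matrix onto the text mode.

    Equivalent to multiplying the matrix by its transpose, with the
    diagonal left at zero so that texts have no self-ties.
    """
    n = len(rows)
    overlap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = rows[i].keys() & rows[j].keys()
            weight = sum(rows[i][w] * rows[j][w] for w in shared)
            overlap[i][j] = overlap[j][i] = weight
    return overlap


# Hypothetical toy corpus: three "interviews" reduced to their nouns.
docs = [
    ["cotton", "farm", "cotton"],
    ["cotton", "church"],
    ["church", "song"],
]
rows = tfidf_rows(docs)
text_by_text = project_to_texts(rows)
```

In this toy corpus the first and third texts share no nouns, so their cell is zero, while the first two texts are tied through "cotton". A word that appears in every document receives an idf of zero and therefore contributes nothing to any tie, which is the discounting of very common words that tf × idf is meant to achieve.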

Results

The interview-to-interview network is densely connected. This connectivity is consistent with the boundedness of the WPA corpus. Interview subjects were asked to report on similar material, which increases the likelihood of overlapping language. The average degree for the network is 1,477, while the network size is 1,481 nodes. Most nodes are connected to all others. Community detection identified four communities within this dense network. As can be seen in Figure 22.1, the network is relatively evenly divided between three communities of 418, 479, and 578 nodes. Community detection also identified a small community of five nodes that have been removed from all subsequent analyses.

[Figure 22.1. WPA interview network.]

Does the interview structure capture differences in the interviews outside of the relationship between nouns themselves? For example, are the identified communities associated with characteristics of the interview setting, like the region in which the interview took place? We examine these questions by comparing communities across several variables that identify potential differences. For example, the interviews also may differ by geographic region. Both the experiences of those who were enslaved in the Deep South and race relations in the Deep South may have differed from the experiences of those living in other parts of the South. These differences may lead to differences in the content of interviews and, thus, shape the interview structure. Figure 22.2 depicts the relationship between the interview network structure and interviewer race. While each of the communities has a majority of interviews that were conducted by white interviewers, they significantly differ in terms of proportion. Nearly 50% of the interviews in community 1 were conducted by Black interviewers, while community 2 consists of 9% of interviews conducted by Black interviewers, and community 3 consists of only 5% of interviews conducted by Black interviewers. This provides initial indication that the interview structure differs by race.

[Figure 22.2. Race by communities in the WPA interview network. Horizontal axis: Community (1: Most Diverse; 2: Most Deep South; 3: Least Diverse). Vertical axis: Proportion (0.00 to 1.00). Legend: Interviewer Race (White, Black).]

The extent to which the interviews reflect the norms instituted by the WPA’s Washington, DC office also may contribute to differences in the structure of the interviews. The WPA instituted several guidelines about how interviews should be transcribed to avoid, at least somewhat, the overtly racist transcription offered by some interviewers, who each transcribed their own interviews. These guidelines offered specific ways of transcribing colloquialisms and phrases and words that were to be avoided, including racist terms among others. These words, which do not contribute to the formation of the noun-based interview structure, provide insight into racialized interviewing and transcription practices. Figure 22.3 provides initial indication of how these communities differed by use of these WPA identified terms. We can see that both communities 1 and 2 almost entirely consist of interviews that do not use the restricted terms. On the other hand, community 3 consists of interviews that are more likely to include terms on the restricted list. In sum, the interview structure is associated with differences in region and with differences related to the use of restricted terms.

[Figure 22.3. Problematic words by communities in the WPA interview network. Horizontal axis: Community (1, 2, 3). Vertical axis: Number of Words (0.0 to 12.5).]

Next, we turn to the relationship between the interview structure and the sentiment or emotional valence of the interviews. To evaluate how the communities relate to emotional valence, we construct the mean sentiment of the adjectives and adverbs within each narrative interview. Terms were assigned sentiment using the “SocialSent” method that trains sentiment over historical lexicons (Hamilton et al., 2016). We specifically construct sentiment scores as the average of scores for terms in the 1930s and 1940s lexicons. Each interview is assigned a sentiment score based on the average sentiment score of each adjective and adverb matched in the historical lexicons. Results offer initial insight into how the structure of a corpus can relate to an outcome like sentiment or valence. As can be seen in Table 22.1, these communities differ based on their emotional valence.
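The interview-level sentiment score described above can be sketched as follows. This is an illustrative Python sketch rather than the authors' implementation: the function names, the part-of-speech tags ("ADJ", "ADV"), and the toy lexicon values are hypothetical, and the code assumes that a term found in only one of the two decade lexicons keeps that single score, a detail the text does not specify.

```python
def term_score(term, lexicons):
    """Average a term's score across the decade lexicons that contain it.

    Returns None when the term is not matched in any lexicon.
    """
    scores = [lexicon[term] for lexicon in lexicons if term in lexicon]
    return sum(scores) / len(scores) if scores else None


def interview_sentiment(tagged_tokens, lexicons, keep_tags=("ADJ", "ADV")):
    """Mean sentiment of the matched adjectives and adverbs in one interview.

    tagged_tokens: list of (word, part_of_speech) pairs from an annotator.
    Returns None when no adjective or adverb is matched.
    """
    scores = []
    for word, tag in tagged_tokens:
        if tag not in keep_tags:
            continue
        score = term_score(word.lower(), lexicons)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None


# Hypothetical lexicon fragments standing in for the 1930s and 1940s lexicons.
lexicon_1930s = {"good": 1.0, "hard": -0.5}
lexicon_1940s = {"good": 0.5, "kindly": 0.8}
lexicons = [lexicon_1930s, lexicon_1940s]

tokens = [
    ("good", "ADJ"),
    ("master", "NOUN"),
    ("hard", "ADJ"),
    ("kindly", "ADV"),
    ("run", "VERB"),
]
score = interview_sentiment(tokens, lexicons)
```

Here "good" averages to 0.75 across the two lexicons, "hard" and "kindly" each keep their single score, and the noun and verb are ignored, so the interview score is the mean of the three matched terms. Scores computed this way would then serve as the outcome in a model such as the one summarized in Table 22.1.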
Community 2, almost entirely consisting of interviews that took place in the Deep South, is negatively associated with mean sentiment when compared to community 1, the most diverse community in terms of the race of the interviewers and the reference group, while community 3, which consists of interviews most likely to use restricted terms and is the least diverse, is positively associated with sentiment.

Table 22.1  Sentiment by Noun Network Communities

                                Mean Sentiment
1: Most Diverse (reference)     -
2: Most Deep South              −0.031* (0.012)
3: Least Diverse                0.042*** (0.012)
Deep South = 1                  0.008 (0.012)
Problematic Words               0.002 (0.004)
Constant                        0.290*** (0.012)
Observations                    1,476

Note: Values in parentheses are standard errors. *p < .05; **p < .01; ***p < .001.

These findings provide initial evidence of how the content structure of interviews relates to their emotional valence. The composition of the communities suggests the ways in which this relationship may be shaped in part through sociological factors, like region and propensity for racist terminology. Future analysis can more deeply explore how the structure of these seminal texts on American slavery relates to other factors, like racial bias within the interview process itself. Nonetheless, this brief illustration highlights one method for constructing text networks and describes how this network structure relates to emotional valence. Next, we use annotation to dig deeper into the content of the narratives themselves.

Computational Narrative Analysis for Embedded Meaning

We can also leverage grammar to describe the narrative action of texts. Quantitative narrative analysis seeks to uncover the action driving the narrative or the operationalization and measurement of agency (Franzosi, De Fazio, & Vicari, 2012). Specifically, this approach identifies who does what to whom in narrative text. The typical bag-of-words approach employed in the first example loses some information, such as the grammatical relationships between words that we also use to construct meaning. In this example, we merge the grammatical relationships between words with network analysis. We build subject-verb-object networks emphasizing the syntactical structure of how people narrate their lived experience. Subject-verb-object triads connect the subject of sentences to objects through verbs or actions.

In sociology, Roberto Franzosi recognized the narrative importance of this triad for understanding texts (see Franzosi, 1990, 2004). Using semiautomated techniques, Franzosi et al. (2012) locate patterns of coercion and violence within historical newspaper stories covering lynching events in the United States. Aggregating subjects, verbs, and objects, they offer a skeletal image of this coverage in the form of a directed network drawing specific attention to how white “mobs” attacked their Black neighbors, but also coerced local law enforcement in an effort to administer their extrajudicial form of racial oppression. This approach is consistent with the three advantages of formal approaches to text analysis outlined by Mohr (1998). De Fazio (2013) illustrates the promise of this approach for historical analysis. Using these “computer-assisted story grammars,” De Fazio identifies four distinct phases in the political strife in Northern Ireland between 1968 and 1972. The approach systematically reveals the organization of violence as the insurgency and counterinsurgency quickly evolved, resolving to less violent forms of protest by the end of the four-year period.

We develop a computational narrative analysis following quantitative narrative analysis techniques described in Franzosi et al. (2012). Like this work, we focus on agency within the WPA formerly enslaved interviews. Asking “Who does what to whom in the WPA corpus?” we draw specific attention to the skeletal dimensions of action within these texts. Prior work describes, for example, differences in what men and women interview subjects describe within the WPA corpus. Our technique enables a systematic view of the structure of action within the formerly enslaved interviews.

Constructing Subject-Action-Object Networks

For this illustration we use the same WPA interview data as described earlier to build subject-action-object (S-A-O) networks. We process the data based on the S-A-O triad. First, we annotate and tokenize each sentence in each interview using the cleanNLP package in R (Arnold, 2017). Annotation identifies the part of speech and the syntactical role of each word. We retain each S-A-O triad and sum them. Next, we create a sender-receiver edge-list where subjects send to verbs and verbs send to objects. This edge-list is used for creating our directed networks. In this particular analysis, we are looking for a skeletal image of the thematic structure, so we home in on the most common S-A-O triplets for the interview corpus and construct this network using the igraph package in R (Csardi & Nepusz, 2006). We use the Louvain method for identifying communities within this S-A-O network and then manually inspect these subgraphs.
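The edge-list construction can be sketched in a few lines of Python. This is a hedged illustration, not the authors' R code: it assumes the S-A-O triplets have already been extracted by an annotator, and the function names and toy triplets are hypothetical. It also assumes that a word keeps a single node regardless of its grammatical role, so "he" as a subject and "he" as an object are the same node.

```python
from collections import Counter


def top_triplets(triplets, k):
    """Sum identical (subject, action, object) triplets and keep the k most common."""
    return Counter(triplets).most_common(k)


def sao_edge_list(triplet_counts):
    """Build a weighted, directed sender-receiver edge list.

    Subjects send to verbs and verbs send to objects; an edge's weight is
    the summed count of the retained triplets that pass through it.
    """
    edges = Counter()
    for (subject, action, obj), count in triplet_counts:
        edges[(subject, action)] += count
        edges[(action, obj)] += count
    return edges


# Hypothetical lemmatized triplets, one per extracted clause.
triplets = [
    ("they", "whip", "he"),
    ("they", "whip", "he"),
    ("they", "catch", "he"),
    ("he", "run", "way"),
    ("i", "tell", "you"),
    ("i", "tell", "you"),
    ("i", "tell", "you"),
]
edges = sao_edge_list(top_triplets(triplets, k=4))
```

The resulting dictionary maps each (sender, receiver) pair to a weight and can be handed to a network library to build the directed graph and run community detection. Keeping only the most common triplets is what yields the skeletal image of the corpus described above.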

Results

The WPA S-A-O network consists of 255 terms and 642 edges. Community detection identifies 12 communities that summarize the structure of action within the WPA corpus. Figure 22.4 depicts the entire 255-term network. While this network is somewhat difficult to parse, we see several clear features of interest. For example, the network is bounded by a cluster of terms closely linked with the pronoun “I” in the lower part of the graph and by terms associated with the verb “have” in the upper part of the graph: the I-have connection is strong, as indicated by the thick edge connecting the two terms. The pronoun “he” and connected terms are located at the center of the graph. The words in this cluster (e.g., “whip,” “beat,” “kill”) tend toward violent action terms consistent with the frequent discussion of violence within these narratives.

[Figure 22.4. S-A-O WPA narrative network.]

The size of the communities varies, with numerous small communities alongside four larger communities. Four of the 12 communities are dyads that, nonetheless, suggest important themes: read-write (literacy), song-sing (song), ticket-vote (politics), mile-walk (travel). Figure 22.4 displays four additional midsize communities within the S-A-O network. Again, these suggest important themes within the corpus, including religion, the agentic joining of church and army, and farming practice with the central role of cotton.

The larger communities are structured around key aspects of the narrative, primarily the pronouns and verbs that drive how the formerly enslaved interviewees describe their lives. For example, Figure 22.5 is centered on the masculine pronoun, “he,” and the plural pronoun, “they.” This community, as mentioned previously, is connected to work and to violence and other active characteristics of the lives of the enslaved. “They-whip-he” and “they-catch-he” summarize the violence of slavery, while the “he-run-way” triplet identifies the common narrative description of runaway slaves. For example, the transcription of Martha Bradley’s interview reports: “Dere was a slave whut lived in Macon county. He run ’way and when he was catched dey dug a hole in de ground and put him crost it and beat him nigh to death.” Many of the interview respondents report similar tragedies around this theme of running away with some ending in successful escape. Frances Bateson of Nashville, Tennessee, describes how his brother attempted at least one unsuccessful escape that resulted in brutal torture, but eventually he ran away and never was found. Much of the action, especially violent action, occurs around the pronouns “he” and “they,” consistent, perhaps, with the age of these respondents prior to emancipation as most were very young and encountered the stories they related to the WPA interviewers by observing or listening to the experiences of those older than themselves.

[Figure 22.5. S-A-O WPA narrative network: “he” and “they.”]

As personal narratives, key aspects of action revolve around the pronouns “I” and “you.” Figure 22.6 depicts the community of S-A-O terms most closely associated with “I,” “you,” and “she.” These terms are closely linked as indicated by the thicker or stronger lines connecting them as narrators describe wanting to report or being unable to report particular details of their experience in the “i-tell-you” S-A-O triplet. We can see further evidence of the narrative process in the connection between “I” and “remember” as the narrators report on their past. For example, Frank Greene of Pine Bluff, Arkansas, was 78 when he spoke with WPA interviewer Bernice Bowden, which means he was a small boy at the time of emancipation, yet he states that he remembers certain parts of this period well. He says, “Yes'm, I can remember the Civil War and the Yankees, too.” After describing his “boss” being run off from the plantation, he captures a small but powerful image of life at the moment of emancipation: “I remember the Yankees would grab up us little folks and put us on the mules, just for fun you know. I can remember that just as well as if 'twas yesterday, seems like.”

[Figure 22.6. S-A-O WPA narrative network: “I” and “she.”]

The S-A-O approach moves beyond bag-of-words techniques to incorporate grammatical structure into the creation of word networks. This brief discussion of S-A-O networks constructed from the WPA narrative provides some initial insight into how networks can help us to understand the content and the structure of text collections. In this case we can see how pronouns structure major aspects of how these narratives were told, but also how concepts were more likely to occur depending on which pronouns they were most closely linked with.

Conclusion

The structure of language plays an important role in meaning making in the everyday world. The network of words that tie sentences to paragraphs to conversations to histories provides a key element of the meaning structure that bounds the field of human interaction and constitutes the contexts of culture. The illustrations in this chapter capture two distinct approaches to thinking about the relationship between texts and networks: In the first example, we developed a text network of the WPA interview data to understand corpus structure and its relationship to emotional valence. In the second illustration, we used S-A-O networks to understand the narrative themes in the WPA data. Both approaches allow us to extract different features of the semantic structures that compose the content and meaning of the texts.

The quantitative analysis of text data inherently loses depth and nuance relative to more interpretivist, qualitative approaches. Computational “distant reading” approaches nonetheless provide a different view and a different way of engaging texts, especially a collection of texts that would be difficult to engage systematically in a qualitative way (Moretti, 2013). For example, while one could read several thousand interviews of formerly enslaved Americans, this reading will be necessarily cumulative and to some extent linear in that one reading will influence the next. This traditional “close” reading has obvious advantages, but some weaknesses as well: regularities or patterns may be obscured in close reading, while deviant cases may either appear overrepresented due to cognitive bias or be difficult to locate, especially across large corpora. By building formal models of meaning, we are able to take advantage of massive contemporary computing power to gain understanding of large collections of individual texts and the people who have written them.
Text network approaches are, however, not the only techniques for finding meaning across texts: computationally elaborate methods that operate outside of or adjacent to network analysis provide an alternative way of examining text structure. For example, the family of topic modeling techniques, perhaps most prominently represented by latent Dirichlet allocation (see Blei, Ng, & Jordan, 2003), enables researchers to uncover pronounced clusters of co-occurring words, or “topics,” found within a text corpus. Certain topic modeling techniques, such as structural topic models (Roberts et al., 2014), can be used to analyze the proportion of documents that include a particular topic and the frequency with which certain words occur within a topic as distributed across document-level information (e.g., publication date, source origin, author). Another family of models, known as word embeddings, maps words and phrases from a corpus onto vector spaces. Word vectors can be used to illustrate, for example, which words share common linguistic contexts throughout a corpus or shifts in language, such as descriptors of social class, over time (Kozlowski, Taddy, & Evans, 2019; Mikolov et al., 2013). Many of these approaches benefit from network thinking and methods at the visualization or analysis stage. For example, the relationships between topics derived from topic models are often visualized as a network (e.g., Light & Cunningham, 2016; Light & Odden, 2017). Topic models, word embeddings, and other natural language processing techniques offer compelling strategies for uncovering context and meaning within texts, complementing the ways in which network-centric methods enable us to visualize text structures and the relationality of meaning.

Acknowledgment

The authors thank Jim Moody for over 15 years of productive conversation about text networks. This chapter was funded in part by the University of Oregon Foundation Board of Trustees’ and Office of Research and Innovation’s 2017 Interdisciplinary Award in the Humanities and Social Sciences, the College of Arts and Sciences, and the University of Oregon Libraries’ Digital Scholarship Fellowship.

Note

1. tf × idf, or term frequency by inverse document frequency, is a common weighting technique within text analysis and information retrieval that discounts very common words. The intuition is that very common words in a text collection are less informative than less common words.

References

Alexander, J. C. (2003). The meanings of social life: A cultural sociology. New York, NY: Oxford University Press.
Arnold, T. B. (2017). A tidy data model for natural language processing using cleanNLP. R Journal, 9(2).
Bail, C. A. (2014). The cultural environment: Measuring culture with big data. Theory and Society, 43(3–4), 465–482.
Bail, C. A. (2016). Combining natural language processing and network analysis to examine how advocacy organizations stimulate conversation on social media. Proceedings of the National Academy of Sciences, 113(42), 11823–11828.
Baptist, E. E. (2014). The half has never been told: Slavery and the making of American capitalism. New York, NY: Basic Books.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(January), 993–1022.
Boutyline, A., & Vaisey, S. (2017). Belief network analysis: A relational approach to understanding the structure of attitudes. American Journal of Sociology, 122(5), 1371–1447.
Breiger, R. L. (1974). The duality of persons and groups. Social Forces, 53(2), 181–190.
Carley, K. (1994). Extracting culture through textual analysis. Poetics, 22(4), 291–312.
Carley, K., & Palmquist, M. (1992). Extracting, representing, and analyzing mental models. Social Forces, 70(3), 601–636.
Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695.
De Fazio, G. (2013). The radicalization of contention in Northern Ireland, 1968–1972: A relational perspective. Mobilization: An International Quarterly, 18(4), 475–496.
Diesner, J., & Carley, K. M. (2005). Revealing social structure from texts: Meta-matrix text analysis as a novel method for network text analysis. In V. K. Narayanan & D. J. Armstrong (Eds.), Causal mapping for research in information technology (pp. 81–108). Hershey, PA: Idea Group Publishing.
Doerfel, M. L. (1998). What constitutes semantic network analysis? A comparison of research and methodologies. Connections, 21, 16–26.
Edelmann, A., & Mohr, J. W. (2018). Formal studies of culture: Issues, challenges, and current trends. Poetics, 68, 1–9.
Escott, P. D. (1979). Slavery remembered: A record of twentieth-century slave narratives. Chapel Hill, NC: UNC Press Books.
Evans, E. D. (2016). Measuring interdisciplinarity using text. Socius, 2, 2378023116654147.
Franzosi, R. (1990). Computer-assisted coding of textual data: An application to semantic grammars. Sociological Methods & Research, 19(2), 225–257.
Franzosi, R. (2004). From words to numbers: Narrative, data, and social science. New York, NY: Cambridge University Press.
Franzosi, R., De Fazio, G., & Vicari, S. (2012). Ways of measuring agency: An application of quantitative narrative analysis to lynchings in Georgia (1875–1930). Sociological Methodology, 42(1), 1–42.
Fuhse, J. A. (2009). The meaning structure of social networks. Sociological Theory, 27(1), 51–73.
Fuhse, J., & Mützel, S. (2011). Tackling connections, structure, and meaning in networks: Quantitative and qualitative methods in sociological network research. Quality & Quantity, 45(5), 1067–1089.
Hamilton, W. L., Clark, K., Leskovec, J., & Jurafsky, D. (2016, November). Inducing domain-specific sentiment lexicons from unlabeled corpora. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (Vol. 2016, p. 595). NIH Public Access.
Hoffman, M. A., Cointet, J. P., Brandt, P., Key, N., & Bearman, P. (2018). The (Protestant) bible, the (printed) sermon, and the word(s): The semantic structure of the conformist and dissenting bible, 1660–1780. Poetics, 68, 89–103.
Karell, D., & Freedman, M. (2019). Rhetorics of radicalism. American Sociological Review, 84, 726–753.
Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). The geometry of culture: Analyzing the meaning of class through word embeddings. American Sociological Review, 84, 905–949.
Lane, M. (1970). Introduction to structuralism. New York, NY: Basic Books.
Lee, M., & Martin, J. L. (2018). Doorway to the dharma of duality. Poetics, 68, 18–30.
Light, R. (2014). From words to networks and back: Digital text, computational social science, and the case of presidential inaugural addresses. Social Currents, 1(2), 111–129.
Light, R., & Cunningham, J. (2016). Oracles of peace: Topic modeling, cultural opportunity, and the Nobel Peace Prize, 1902–2012. Mobilization: An International Quarterly, 21(1), 43–64.
Light, R., & Odden, C. (2017). Managing the boundaries of taste: Culture, valuation, and computational social science. Social Forces, 96, 877–908.
Lizardo, O., Mowry, R., Sepulvado, B., Stoltz, D. S., Taylor, M. A., Van Ness, J., & Wood, M. (2016). What are dual process models? Implications for cultural analysis in sociology. Sociological Theory, 34(4), 287–310.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Mohr, J. W. (1998). Measuring meaning structures. Annual Review of Sociology, 24(1), 345–370.
Moody, J., & Light, R. (2006). A view from above: The evolving sociological landscape. American Sociologist, 37(2), 67–86.
Moretti, F. (2013). Distant reading. London, UK: Verso Books.
Nelson, L. K. (2017). Computational grounded theory: A methodological framework. Sociological Methods & Research, 49(1), 3–42.
Orbuch, T. L. (1997). People’s accounts count: The sociology of accounts. Annual Review of Sociology, 23(1), 455–478.
Plath, L. (2017). “My master and Miss . . . warn’t nothing but poor white trash”: Poor white slaveholders and their slaves in the antebellum South. Slavery & Abolition, 38(3), 475–488.
Rice, R. E., & Danowski, J. A. (1993). Is it really just like a fancy answering machine? Comparing semantic networks of different types of voice mail users. The Journal of Business Communication, 30, 369–397.
Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). Structural topic models for open-ended survey responses. American Journal of Political Science, 58, 1064–1082.
Rule, A., Cointet, J. P., & Bearman, P. S. (2015). Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014. Proceedings of the National Academy of Sciences, 112(35), 10837–10844.
Stewart, C. A. (2016). Long past slavery: Representing race in the federal writers’ project. Chapel Hill, NC: UNC Press Books.
White, H. C. (2008). Identity and control: How social formations emerge. Princeton, NJ: Princeton University Press.
White, H., Fuhse, J., Thiemann, M., & Buchholz, L. (2007). Networks and meaning: Styles and switchings. Soziale Systeme, 13(1–2), 543–555.

Chapter 23

Historical Network Research

Emily Erikson and Eric Feltham

Regime change, market expansion, revolution, political polarization, increasing inequality, ethnic factionalization, and the rise and fall of institutions of governance and civil discourse are all major social transformations. Historical social network research begins with the presumption that important historical transitions are based on transformations in the patterns of relationships between people. For example, markets are created when social exchange patterns become more heterogeneous, a clustering of social relations produces clannish politics, and informal and closed networks of exchange increase inequality. This basic insight echoes and extends the animating impulse of the Annales school and arguments drawn from core canonical works in the sociological tradition, including Tönnies, Marx, Durkheim, and Simmel, all of whom maintained that changes in patterns of social interaction drive significant historical phenomena. Our need to understand the drivers of social change has perhaps never been greater. Fortunately, the potential for illuminating network analysis of large-scale social change has never been as promising. Due to the massive digitization drive that has been occurring for the past decade or so in the humanities, the resources for historical social network analysis have grown at an exponential rate. As a result, new materials, archives, and corpora are available that illuminate all kinds of dark corners and crevices, as well as broad swathes and sweeping vistas, that were previously inaccessible to network researchers. Traditionally, historical network research has been plagued by the painstaking process of locating systematic historical data and then converting it into a text-readable format prior to analysis. The digitization drive has greatly reduced these obstacles.
Researchers are now able to realize the goal of conducting analyses on the impact of relational and transformational patterns across many times and places, thereby adding invaluable cases for research into relatively infrequent large-scale macro-historical shifts. In the following, we provide a rough overview of overlapping stages of the development of historical network research with the goal of providing clues as to the most profitable avenues for further development now that more data is available.


Cross-Cutting Ties

Much of the earlier work in the historical social network tradition was conceptually grounded in the idea of cross-cutting ties. Cross-cutting networks wend in and out of powerful bases of group membership, such as class, race, ethnic identity, occupational group, and regional location. As such, they capture a unique and important characteristic of social networks for historical research. They are interstitial mechanisms for coordination. Because they exist outside of or are only partially encased within existing institutional bases of power, they have the capacity to raze or transform those institutions, in many cases by linking marginalized individuals across different arenas who are interested in pursuing change (White, 1992, 2008).

Cross-cutting networks, for example, have been found to have played an essential role in mobilizing peasant rebellion. In a comparison of 17th-century peasant revolts in France and the Ottoman Empire, Barkey (1991) found that the existence of cross-cutting ties linking peasants and nobles in France was crucial to the emergence of sustained, violent collective action. Their absence in the Ottoman Empire doomed the prospects for resistance. Two years later, Foran and Goodwin (1993) similarly found that cross-cutting ties, this time across individuals in different occupations and professions, were essential to fomenting uprisings in Nicaragua and Iran.

In the first two monographs devoted to historical social network analysis, the authors were able to expand and further specify the role of mobilizing cross-cutting networks. Bearman (1993) considered the determinants of the English Civil War. He showed that the structure of relations between gentry in 17th-century England shifted away from a kin-based pattern into a patronage system that articulated its claims for legitimacy and authority within abstract religious terms ambiguous enough to accommodate the emerging, new elite coalitions. The resulting cleavages set the stage for the English Civil War. Gould’s (1995) work on the Paris Commune revolts charted a similar path to explaining widespread mobilization. Cross-cutting networks provided a crucial link and coordinating mechanism that bound individuals into recognizable communities with the potential for violent insurgency and mobilization. Gould showed how overlaps in different cross-cutting networks created pockets of individuals who saw themselves as similar and were therefore willing to act together, creating a context for revolutionary activity. All of these works indicate that the way in which the network of social relations maps onto and connects different existing social groups can have an effect on the potential for historical transformation that is independent of other causes.

After a short hiatus, the theme of cross-cutting networks re-emerged with new analytical strategies to support its investigation in economic settings. Trapido and Hillmann (2010) examined economic actors’ propensity to transact and cooperate within, rather than across, socially meaningful groups by constructing merchant cooperation networks in 18th-century Bristol. Taking as their context the political discord between the Tories and Whigs, they estimated the probability of a tie between merchants and found that cross-party ties were significantly less likely than in-party ties, indicating that political affiliations and views shaped market transactions. Trapido (2013) carried this theme forward in an analysis of 844 commercial voyages undertaken between 1690 and 1813 by privateering Bristol merchants. By again modeling the propensity of tie formation, he found that merchants’ cross-party partnerships were characterized by inequality; specifically, more established merchants only sought cross-party ties with less established merchants. Hillmann and Aven (2011) returned to the theme in research on new enterprises in the late imperial era of Russian industrialization (1869–1913). Using a network constructed by linking entrepreneurs by joint participation in the founding of a company, they found that the more economically successful core of the network was also more ethnically diverse than peripheral clusters, thus linking cross-cutting ties to success in capital mobilization.

The concept was also imported into the study of state power. Cline’s (2012) work brought the importance of cross-cutting networks even further back in history. Using prosopographical methods, she constructed three social networks: Philip II of Macedon and his generals, Pericles’s network, and Alexander the Great’s network. She applied cluster analysis to Philip II’s network and observed that network centrality seemed, counterintuitively, less important than the multicultural nature of political affiliations and alliances. Philip II and his generals did not occupy a core position in their network, and Alexander the Great’s network encompassed relationships across the empire, which contrasted with the more homogeneous networks of his generals.

While cross-cutting ties benefited Alexander, evidence has also indicated that too much overlap across groups can lead to political stasis and partisan politics. Parigi and Bearman (2008) used Italian electoral reform in 1993 to study political alliance formation. They showed that the electoral reform instigated structural changes to the party networks: previously isolated and oppositional peripheral parties were shifted into an overlapping and densely connected set of clusters. While the reforms were intended to reduce the number of parties and decrease factional interests, they ultimately had the opposite effect, as the integration of parties led toward the creation of even larger national-level political cleavages.

Informal Social Ties

Since the birth of the social sciences, the growth of markets and the transition to capitalism have been associated with the dissolution of closed social groups and increasing trust and exchange between strangers and acquaintances (Tönnies, 2011 [1887]). This vein of historical research is largely concerned with the importance of acquaintanceship, friendship, and “weak” ties. Because they allow for relationships to develop among heterogeneous actors, weak ties are conceptually similar to cross-cutting ties but theorized and measured at the level of the individual rather than the level of social groups, communities, and nations. Weak ties capture another important dimension of the historical potency of social networks: their dynamism. We cannot pick our families, but we can pick and choose our friends. Thus, weak ties not only expand the knowledge horizons of individuals, as famously pointed out by Granovetter (1985), but also add significantly to social fluidity. The ability to create entirely new relationships is a prima facie necessary element for historical changes rooted in the transformation of existing patterns of relations.

Social network researchers have documented this type of relational transformation in studies of rapid trade growth and economic development. The existence and maintenance of cooperation and trust over weak and “arm’s length” ties have been shown to be crucial to the miraculous commercial success of early-modern Genoese merchants (Van Doosselaere, 2009), the expansion of English trade in the Indian Ocean (Erikson & Samila, 2018), and the development of 18th-century Bristol (Trapido, 2013).

The creation of informal, extra-institutional networks of alliance and patronage also appears to be a significant factor in the process of state formation. As early as 1974, Blok concluded that the Italian state expanded its reach by activating and absorbing the network of connections linking Italian mafiosi. Kettering (1988) made a similar structural argument by showing that the expansion of state powers in 17th-century France relied on the personal ministerial network created by Cardinal Richelieu in his quest for power. Alexander and Danowski (1990) coded “who to whom” relations pulled from 280 of Cicero’s letters and observed that Roman knights and senators had almost identical interaction patterns. McLean found that informal social networks were central to the consolidation of legitimate authority in Florence (2007) and in the Polish political reform of 1791 (2004). And in his book Trust and Rule (2005), Tilly made the more general argument that all processes of state expansion take place through an amoeba-like process of phagocytosis: social networks create pockets of coordinated, cooperative activity that are enveloped and assimilated into the state apparatus. However, these claims must be subject to an important caveat: the lack of hierarchical organization and rational bureaucratic processes in most social networks tends to limit state expansion after a certain point. Such appears to have been the case in Florence after the Medicis (McLean, 2005) and in the United Provinces after the Gouden Eeuw (Adams, 1996).

The influence of informal networks further extends to political decision making. Parigi and Bergemann (2016) constructed a network of US congressmen from the Jeffersonian era by boardinghouse coresidence. By examining voting patterns, they found that the politicians’ political activities were influenced by their boarding mates, whether or not they belonged to the same political party. And in revolutionary and social movement contexts, weak ties have acted as a pathway for recruitment. Social networks have been important to the creation of the Salvadoran guerrilla army (Viterna, 2006), civil rights protests (Fernandez & McAdam, 1988; McAdam, 1986, 1988), and collective action in East Germany (Pfaff, 1996).

Associational and Organizational Networks

Network ties may link not only individuals but also groups. In particular, civic institutional connections have been shown to play a crucial role in state and national development. In the early modern Atlantic empire, Olson (1992) showed that intercontinental linkages across nongovernmental interest groups provided the sinews upon which the imperial state took shape. In Japan, networks linking various arts groups passionately pursuing their interests in haiku, kimono fashions, tea ceremonies, and other aesthetic pastimes formed a skeletal network of horizontal associations crucial to national unification (Ikegami, 2005). Network ties linking bureaucratic offices in the 20th-century US government allowed for the development of an autonomous corps of civil servants (Carpenter, 2001). In a series of papers, Wimmer (2011, 2012) established the central importance of cross-national networks of ties linking civic associations to the emergence of the nation state. And Somers established the importance of the pattern of ties between local and national institutions in the development of the rights of the citizenry and the idea of citizenship in 18th-century England.

Similarly, interorganizational ties have played a significant role in democratization, religious insurgency, and military outcomes. Using a network autocorrelation model tracking changes in democracy among the world’s countries from 1815 to 2000, Torfason and Ingram (2010) found that the diffusion of democracy depends significantly on the network structure of intergovernmental organizations. Wurpts, Corcoran, and Pfaff (2018) constructed a weighted network of northern Hanseatic towns based on joint participation in assembly meetings and found that measures of betweenness and degree centrality predict the adoption of Protestantism. Lehmann and Zhukov (2019) analyzed “battle dyads” composed of enemy combatants by using a database of battles stretching from 1939 to 2011 and concluded that the decision of an army to surrender is contagious across proximate battles.

Sectoral and organizational networks also play a crucial role in market expansion and economic transitions. Lachmann (1987, 2002) formally analyzed the network structure of resource flows in early modern England and identified how the historically determined pattern had an unintended effect of pushing elites into market-making activities, thus paving the way to capitalism. Hungary’s successful transition into capitalism was supported by the dynamic and fluid capacities of informal networks across firms (Stark & Bruszt, 1998; Stark & Vedres, 2006). The shipbuilding industry on the River Clyde benefited from interbuilder ties that reduced the risk of firm failure (Ingram & Lifschitz, 2006). And Asia’s transition into capitalism seems to have been made possible by the information distribution and risk-sharing within informally linked business groups (Keister, 2001; Lincoln, Gerlach, & Takahashi, 1992; Lincoln, Gerlach, & Ahmadjian, 1996). More descriptive work by Buchnea (2014) has identified three different relational stages in the evolution of the Liverpool–New York trade in the period stretching from 1763 to 1833.
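Several of the studies in this section build a network by linking actors who took part in the same event, such as towns attending the same assembly or entrepreneurs founding the same company. The following is a minimal sketch of that general construction, not a reproduction of any cited author's procedure. The function names and the attendance records are hypothetical, and weighted degree stands in for the richer centrality measures used in the literature.

```python
from collections import defaultdict
from itertools import combinations


def co_participation_network(events):
    """Project event attendance onto a weighted actor-by-actor network.

    `events` maps an event label to the list of actors present.
    The weight of a tie is the number of events two actors share.
    """
    weights = defaultdict(int)
    for attendees in events.values():
        # Sort so that each undirected pair has one canonical key.
        for a, b in combinations(sorted(set(attendees)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_degree(weights):
    """Sum the tie weights incident to each actor."""
    strength = defaultdict(int)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)


# Hypothetical attendance records, invented purely for illustration.
meetings = {
    "meeting_1": ["A", "B", "C"],
    "meeting_2": ["A", "B"],
    "meeting_3": ["A", "D"],
}

edges = co_participation_network(meetings)
strength = weighted_degree(edges)
print(edges)
print(strength)
```

In a real application the attendance records would come from archival sources, and the resulting measures would enter a statistical model alongside contextual covariates rather than being read off directly.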

Narrative Networks

Networks have also been used by researchers to link events together into larger narratives and can thereby be made into history-making machines in their own right. Bearman, Faris, and Moody (1999) extracted 14 life stories from inhabitants of a rural Chinese village who lived through an agrarian revolt, a counterrevolution, and a revolution between 1920 and 1950 and used network analysis to map overlap between accounts, thereby creating a new and compelling group-level history of the events.

Narrative networks have shed light on several different social and cultural phenomena. Mohr (1994) analyzed categorical descriptions of types of eligible clients from the 1907 New York City Charity Directory to study the assignment of moral status to types of charity recipients. The result of blockmodeling the data is a glimpse into the moral universe of early 20th-century urban space. Bearman and Stovel (2000) used a previously published dataset of over 600 stories from National Socialist Workers’ Party members to explain how “ordinary men and women became Nazis.” They treated events as nodes and narrative clauses as arcs, which enabled them to observe structural differences in accounts of identity formation. Franzosi (2004) has provided a broad overview and technical introduction to these methods.
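The idea of treating events as nodes and narrative clauses as arcs can be sketched in a few lines. The example below is only an illustration under stated assumptions: the event labels, the two accounts, and the function names are all hypothetical, and the overlap count is a deliberately simple stand-in for the structural comparisons made in the studies above.

```python
from collections import Counter


def narrative_arcs(clauses):
    """Return the set of directed arcs (source_event, target_event) in one account."""
    return {(src, dst) for src, dst in clauses}


def pooled_network(accounts):
    """Count, for each arc, how many accounts contain it."""
    counts = Counter()
    for clauses in accounts.values():
        counts.update(narrative_arcs(clauses))
    return counts


def in_degree(arc_counts):
    """Number of distinct arcs pointing at each event in the pooled network."""
    degree = Counter()
    for _, dst in arc_counts:
        degree[dst] += 1
    return degree


# Hypothetical accounts: each clause links one event to the event it leads to.
accounts = {
    "narrator_1": [("e1", "e2"), ("e2", "e3"), ("e3", "e4")],
    "narrator_2": [("e1", "e2"), ("e2", "e4")],
}

pooled = pooled_network(accounts)
shared = {arc for arc, n in pooled.items() if n > 1}
print(shared)
print(in_degree(pooled))
```

Arcs that recur across narrators mark the parts of the story on which accounts agree, while events with high in-degree are those that many narrative paths lead toward.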

Cohesion

While many of the early articles analyzing historical social networks were theoretically groundbreaking, only a few had data of sufficient quality or quantity to fully analyze the impact of the structural characteristics of their networks, as opposed to the impact of the existence of informal or cross-cutting ties. One dimension of network structure that has been explored is cohesion and density, and in particular its importance to collective action efforts. The importance of cohesion in stimulating and sustaining revolutionary movements has been documented across a wide variety of historical sites including Iran, the Ottoman Empire, and British India (Barkey & Van Rossem, 1997; Goldstone, 2003; Rao & Dutta, 2012; Skocpol, 1982). Ahnert and Ahnert (2015) analyzed an epistolary network of Protestants under the reign of Mary I, a monarch who sought to reverse the Reformation. They observed that pockets of resistance were extremely well connected and robust to the loss of many central figures. Such a structure was likely a necessity given the threats of exile and death levied upon these groups. Density also seems to have insulated American mafia figures from investigation. DellaPosta (2017) analyzed the tradeoff between decentralization and security, on one side, and integration and efficiency, on the other, in a criminal network spanning 700 members of 24 American mafia families in the mid-20th century and found high levels of modularity.

In another study of religious patterns, Parigi (2012) probed into the relationship between social movements and formal organizations through an examination of the changing process through which saints were canonized in the Roman Catholic Church. The practice emerged out of conflict between Roman authorities and local religious activists, who often ignored the church’s dismissal of claims over miracles. Parigi used network analysis to show how the church used sainthood as a way to adjudicate these claims without ignoring local interests. Canonization, it turns out, depended on social cohesion and density at the level of the community of supporters for the proto-saints.

Cohesion has also dampened the possibility for collective resistance in numerous sites, including southern Italy, the Philippines, Boston, and Pennsylvania (Banfield, 1958; Goodwin, 1997; Safford, 2009; Small, 2009), which points to the need for greater refinement of measures or more attention to the interaction between context and network structure. The kind of detailed data or analysis needed to ascertain whether these contrary findings are due to a curvilinear relationship between efficacy and cohesion or are instead related to finer-grained changes in network structure not captured by cohesion has not yet been gathered or performed.

Newer work has incorporated promising aspects of dynamic mechanisms and multiplex relations. Böhm and Hillmann (2015) employed a dynamic version of the social cohesion concept by investigating the role of social closure in the spread of abolitionism in 18th-century Bristol. Abolition was against the economic interests of English merchants, yet opposition to it failed to find a foothold because elite merchants began to exclude slave traders from the denser portion of their professional networks. As the trade grew more disreputable, the remaining slave traders were forced to the margins of the larger merchant community, which proved to be both a deterrent and a dampener on any political action in support of slavery. Smith and Papachristos’s (2016) analysis of Al Capone’s known associates database links criminal power to multiplexity. By constructing a layered network of multiple tie types, the authors were able to observe multiple types of relational ties increasing the cohesion of Capone’s inner circle, that is, “the most criminally elite” (Smith & Papachristos, 2016, p. 662).

Brokerage and Centrality

Another structural element of networks that has received considerable attention in historical work is the impact of high centralization and the presence of brokerage opportunities. One of the most influential works on historical social network analysis is John Padgett and Christopher Ansell’s (1993) classic article “Robust Action and the Rise of the Medici.” Padgett and Ansell show that marital and economic ties across different Florentine factions of the 15th century created a significant brokerage opportunity for Cosimo de’ Medici. Using blockmodeling methods across family groups, the authors demonstrate the existence of a structural vacuum or hole at the center of the larger network that Medici was able to bridge to his lasting advantage. The structural benefit was so significant that Medici’s consolidation of power was a crucial moment of state centralization for Florence, thus linking centralization and state emergence.

Adams (1996) has further shown that brokerage opportunities conferred structural advantages important to the imperial expansion of the Dutch East India Company in the 17th century. Barkey (2008) argued that the hub-and-spoke nature of Ottoman rule was crucial to sustained imperial control. Hillmann showed that a pattern of horizontal brokerage was essential to the emergence of national unity in 18th-century revolutionary Vermont (2008a) and parliamentary opposition to Charles I (2008b). In another realm entirely, Yue, Luo, and Ingram (2013) showed that banks with high centrality in late 19th- and early 20th-century New York finance worlds appear to have successfully insulated themselves from failure.

Conclusions

The ideas that motivate social network analysis have a long history. In 1533, Thomas More drew an analogy that he took from Paul the Apostle. He posited that ideas, such as heresies, can spread through the social body as a canker spreads through the body of an individual (Ahnert & Ahnert, 2015). For centuries, historical social network analysis was largely metaphorical in its implementation. As can be seen in this brief overview, however, historical network research has advanced significantly over the last 30 or so years, yet many questions remain open and challenges persist.

Every area of research needs to justify its existence, and much of the earlier work on networks was characterized by the need to show that networks matter to historical processes and large-scale events. This point has been established, and the progress of the field requires more precise attention to the particular structural configurations, patterns of association, and dynamic mechanisms driving historical change.

Although new sources of data are continually being made available, the historical record is fundamentally limited. Are recorded moments of interaction and exchange sufficient to produce a map of the interactional basis of social life for any given community, state, organization, or nation? Such analyses require compiling many different types of information, inventive use of data, and careful attention to the existing history of the areas under investigation.

Even when an accurate picture of community structure is constructed, the challenge remains that network mechanisms and structure may not have a consistent effect across contexts. One of the central tensions of historical network research is balancing general theoretical inquiry with contextually sensitive analysis, a problem heightened by the radical differences in social organization, cultural expectations, and environmental constraints that arise in different epochs and areas. There may well be a significant amount of causal heterogeneity. But this of course is exactly why historical network research is so necessary. How can we explore the interaction of cultural and institutional context with network structure without the variation that history provides? The same feature that makes historical analysis challenging is also what makes history such an important laboratory for understanding the effect of network structure and dynamics on society.

References

Adams, J. (1996). Principals and agents, colonialists and company men: The decay of colonial control in the Dutch East Indies. American Sociological Review, 61, 12–28.
Ahnert, R., & Ahnert, S. E. (2015). Protestant letter networks in the reign of Mary I: A quantitative approach. ELH, 82(1), 1. doi:10.1353/elh.2015.0000
Alexander, M. C., & Danowski, J. A. (1990). Analysis of an ancient network: Personal communication and the study of social structure in a past society. Social Networks, 12(4), 313–335. doi:10.1016/0378-8733(90)90013-Y
Banfield, E. C. (1958). The moral basis of a backward society. New York, NY: Free Press.
Barkey, K. (1991). Rebellious alliances: The state and peasant unrest in early seventeenth-century France and the Ottoman empire. American Sociological Review, 56, 699–715.
Barkey, K. (2008). Empire of difference: The Ottomans in comparative perspective. Cambridge, UK: Cambridge University Press.
Barkey, K., & Van Rossem, R. (1997). Networks of contention: Villages and regional structure in the seventeenth-century Ottoman empire. American Journal of Sociology, 102, 1345–1382.
Bearman, P. (1993). Relations into rhetorics: Local elite social structure in Norfolk, England, 1540–1640. New Brunswick, NJ: Rutgers University Press.
Blok, A. (1974). The mafia of a Sicilian village, 1860–1960: A study of violent peasant entrepreneurs. Oxford, UK: Basil Blackwell.
Böhm, T., & Hillmann, H. (2015). A closed elite? Bristol’s society of merchant venturers and the abolition of slave trading. In E. Erikson (Ed.), Political power and social theory (Vol. 29, pp. 147–175). Bingley, UK: Emerald Group Publishing Limited. doi:10.1108/S0198871920150000029007
Buchnea, E. (2014). Transatlantic transformations: Visualizing change over time in the Liverpool–New York trade network, 1763–1833. Enterprise & Society, 15(4), 687–721. doi:10.1017/S1467222700016086
Cline, D. H. (2012). Six degrees of Alexander: Social network analysis as a tool for ancient history. Ancient History Bulletin, 26, 1–2.
Erikson, E., & Samila, S. (2018). Networks, institutions, and uncertainty: Information flow in early modern markets. Journal of Economic History, 78(4), 1034–1067.
Fernandez, R. M., & McAdam, D. (1988). Social networks and social movements: Multiorganizational fields and recruitment to Mississippi freedom summer. Sociological Forum, 3, 441–460.
Foran, J., & Goodwin, J. (1993). Revolutionary outcomes in Iran and Nicaragua: Coalition fragmentation, war, and the limits of social transformation. Theory and Society, 22, 209–247.
Franzosi, R. (2004). From words to numbers: Narrative, data, and social science. Cambridge, UK: Cambridge University Press.
Goldstone, J. A. (2003). Revolutions: Theoretical, comparative, and historical studies. New York, NY: Cengage Learning.
Goodwin, J. (1997). The libidinal constitution of a high-risk social movement: Affectual ties and solidarity in the Huk rebellion, 1946 to 1954. American Sociological Review, 62, 53–69.
Gould, R. V. (1995). Insurgent identities: Class, community, and protest in Paris from 1848 to the Commune. Chicago, IL: University of Chicago Press.
Granovetter, M. (1985). Economic action and social structure: The problem of embeddedness. American Journal of Sociology, 91(3), 481–510.
Hillmann, H. (2008a). Localism and the limits of political brokerage: Evidence from revolutionary Vermont. American Journal of Sociology, 114, 287–331.
Hillmann, H. (2008b). Mediation in multiple networks: Elite mobilization before the English civil war. American Sociological Review, 73, 426–454.
Hillmann, H., & Aven, B. L. (2011). Fragmented networks and entrepreneurship in late imperial Russia. American Journal of Sociology, 117(2), 484–538. doi:10.1086/661772
Ikegami, E. (2005). Bonds of civility: Aesthetic networks and the political origins of Japanese culture. Cambridge, UK: Cambridge University Press.
Ingram, P., & Lifschitz, A. (2006). Kinship in the shadow of the corporation: The interbuilder network in Clyde River shipbuilding, 1711–1990. American Sociological Review, 71(2), 334–352. doi:10.1177/000312240607100208
Keister, L. A. (2001). Exchange structures in transition: Lending and trade relations in Chinese business groups. American Sociological Review, 66, 336–360.
Kettering, S. (1988). The historical development of political clientelism. Journal of Interdisciplinary History, 18(3), 419–447. doi:10.2307/203895
Lachmann, R. (1987). From manor to market: Structural change in England, 1536–1640. Madison, WI: University of Wisconsin Press.
Lachmann, R. (2002). Comparisons within a single social formation: A critical appreciation of Perry Anderson’s Lineages of the absolutist state. Qualitative Sociology, 25, 83–92. doi:10.1023/A:1014308324923
Lehmann, T., & Zhukov, Y. (2019). Until the bitter end? The diffusion of surrender across battles. International Organization, 73(1), 133–169.
Lincoln, J. R., Gerlach, M. L., & Ahmadjian, C. L. (1996). Keiretsu networks and corporate performance in Japan. American Sociological Review, 61, 67–88.
Lincoln, J. R., Gerlach, M. L., & Takahashi, P. (1992). Keiretsu networks in the Japanese economy: A dyad analysis of intercorporate ties. American Sociological Review, 57, 561–585.
McAdam, D. (1986). Recruitment to high-risk activism: The case of freedom summer. American Journal of Sociology, 92, 64–90.
McAdam, D. (1988). Freedom summer. Oxford, UK: Oxford University Press.
McLean, P. D. (2004). Widening access while tightening control: Office-holding, marriages, and elite consolidation in early modern Poland. Theory and Society, 33, 167–212.
McLean, P. D. (2005). Patronage, citizenship, and the stalled emergence of the modern state in Renaissance Florence. Comparative Studies in Society and History, 47, 638–664.
McLean, P. D. (2007). The art of the network: Strategic interaction and patronage in Renaissance Florence. Durham, NC: Duke University Press.
Mohr, J. (1994). Soldiers, mothers, tramps and others: Discourse roles in the 1907 New York City charity directory. Poetics, 22(4), 327–357. doi:10.1016/0304-422X(94)90013-2
Olson, A. G. (1992). Making the empire work: London and American interest groups, 1690–1790. Cambridge, MA: Harvard University Press.
Padgett, J. F., & Ansell, C. (1993). Robust action and the rise of the Medici, 1400–1434. American Journal of Sociology, 98, 1259–1319.
Parigi, P. (2012). The rationalization of miracles. Cambridge, UK: Cambridge University Press.
Parigi, P., & Bearman, P. S. (2008). Spaghetti politics: Local electoral systems and alliance structure in Italy, 1984–2001. Social Forces, 87(2), 623–649. doi:10.1353/sof.0.0147
Parigi, P., & Bergemann, P. (2016). Strange bedfellows: Informal relationships and political preference formation within boardinghouses, 1825–1841. American Journal of Sociology, 122(2), 501–531. doi:10.1086/688606
Pfaff, S. (1996). Collective identity and informal groups in revolutionary mobilization: East Germany in 1989. Social Forces, 75(1), 91–117. doi:10.1093/sf/75.1.91
Rao, H., & Dutta, S. (2012). Free spaces as organizational weapons of the weak: Religious festivals and regimental mutinies in the 1857 Bengal native army. Administrative Science Quarterly, 57, 625–668.
Safford, S. (2009). Why the garden club couldn’t save Youngstown: The transformation of the Rust Belt. Cambridge, MA: Harvard University Press.
Skocpol, T. (1982). Rentier state and Shi’a Islam in the Iranian revolution. Theory and Society, 11(3), 265–283.
Small, M. L. (2009). Villa Victoria: The transformation of social capital in a Boston barrio. Chicago, IL: University of Chicago Press.
Smith, C. M., & Papachristos, A. V. (2016). Trust thy crooked neighbor: Multiplexity in Chicago organized crime networks. American Sociological Review, 81(4), 644–667. doi:10.1177/0003122416650149
Stark, D., & Bruszt, L. (1998). Postsocialist pathways: Transforming politics and property in East Central Europe. Cambridge, UK: Cambridge University Press.
Stark, D., & Vedres, B. (2006). Social times of network spaces: Network sequences and foreign investment in Hungary. American Journal of Sociology, 111(5), 1367–1411.
Tilly, C. (2005). Trust and rule. Cambridge, UK: Cambridge University Press.
Tönnies, F. (2011 [1887]). Community and society. Mineola, NY: Dover Publications.
Torfason, M. T., & Ingram, P. (2010). The global rise of democracy: A network account. American Sociological Review, 75, 355–377.
Trapido, D. (2013). Counterbalances to economic homophily: Microlevel mechanisms in a historical setting. American Journal of Sociology, 119(2), 444–485. doi:10.1086/673971
Trapido, D., & Hillmann, H. (2010). Relational counterbalances to economic endogamy: A theory and a historical example. Academy of Management Proceedings, 2010(1), 1–6. doi:10.5465/ambpp.2010.54493442
Van Doosselaere, Q. (2009). Commercial agreements and social dynamics in medieval Genoa. Cambridge, UK: Cambridge University Press.
Viterna, J. S. (2006). Pulled, pushed, and persuaded: Explaining women’s mobilization into the Salvadoran guerrilla army. American Journal of Sociology, 112, 1–45.
White, H. C. (1992). Identity and control: A structural theory of social action. Princeton, NJ: Princeton University Press.
White, H. C. (2008). Identity and control: How social formations emerge. Princeton, NJ: Princeton University Press.
Wimmer, A. (2011). A Swiss anomaly? A relational account of national boundary-making. Nations and Nationalism, 17, 718–737.
Wurpts, B., Corcoran, K. E., & Pfaff, S. (2018). The diffusion of Protestantism in northern Europe: Historical embeddedness and complex contagions in the adoption of the Reformation. Social Science History, 42(2), 213–244. doi:10.1017/ssh.2017.49
Yue, L. Q., Luo, J., & Ingram, P. (2013). The failure of private regulation: Elite control and market crises in the Manhattan banking industry. Administrative Science Quarterly, 58(1), 37–68. doi:10.1177/0001839213476502

Part IV: Network Landscape

Chapter 24: Networks in Archaeology

Carl Knappett

A clandestine, underground group diffuse and hard to locate, and a family support group, dense, clustered, and conspicuous: for both organizations, the term network seems to come naturally (Kadushin,  2012). It is remarkable how wide the net falls, encompassing social groups with widely varying negative or positive connotations. And this is just for groups in human societies: we also see the term readily used to describe the relations between animals and plants in an ecosystem, not to mention bacteria, molecules, and enzymes. As well as such organismal communities, network also describes physical infrastructure. It can seem at times as if one can hardly talk or even think about any kind of connectivity—material, social, or metaphorical—without the idea of the network. This multiplication of usages is arguably confusing and misleading, so much so that we might have been better off abandoning the word altogether; and yet “the language of networks shows no sign of abating anytime soon” (Jagoda, 2016, p. 4). The figure of the network must have a flexible quality rendering it useful across a range of colloquial settings. Its utility has become quite apparent in scholarship too, as the exponential growth of “network science” over the last 15 years indicates. This is not to say that an academic conceptualization of the network was absent prior to this point—network analysis in sociology had by then become well established (see Scott, 2013, pp. 29–38, for historical overview). Nonetheless, the discovery of the mathematical properties of small worlds in 1998 had a revolutionary impact (Watts & Strogatz, 1998), to the extent that today we see network science across a vast array of domains: from neuroscience (Sporns, 2014) to evolutionary biology (Dagan, 2011) to biochemistry (Padgett & Powell, 2012), the list goes on. And, most relevantly for this review, it has also impacted upon archaeology. 
Indeed, from the rapidly growing archaeological literature one could easily gain the impression that it is futile discussing connectivity without networks. However, a striking feature of archaeological network analysis is that it typically focuses on dyadic links—while it is triadic links that are usually considered key in social network analysis. In this review I will explore some of the reasons for this disjuncture, and what a focus on dyads or triads might tell us about the nature of social network analysis in general. First, I briefly review dyads and triads in the social network analysis (SNA) literature, before looking at how network analysis has evolved in archaeology.

The Added Value of the “Network”? Dyads and Triads

Relations generate networks, and the relationship between two people (in a social network) is an evident starting point, as it is the simplest kind of network. The relationship may be of various kinds—it may be simple, symmetric, or asymmetric, for example (Kadushin, 2012, pp. 14–15). But social networks tend not to be made of such dyads alone. A may like B, but B will probably have another friendship, with C. If there are then two dyads, such that A likes B, and B likes C, then in social networks it is very common for there also to be friendly relations between A and C. This means we must think not just in terms of dyads, but a triad—a simple network of three people (or units of any kind). As Kadushin (2012, p. 16) puts it, “this simple network turns out to be the building block of more complex relations.” This basic structure is also sometimes described in terms of “network dependence,” because the pairs of relationships A–B and A–C depend to some degree on B–C (Brandes et al., 2013, p. 10; see also Collar et al., 2015, in an archaeological setting). As the same authors argue, “without dependence among ties, there is no emergent network structure” (Brandes et al., 2013, p. 10). There are quite a lot of possible configurations of triads, as Kadushin very usefully illustrates (Kadushin, 2012, Fig. 2.2). These are built up following different “rules,” such as balance and transitivity. Going back to our three actors A, B, and C, this triad is transitive (Figure 24.1).

Figure 24.1  A transitive triad.

But then “whole” networks tend to stretch beyond just three actors. Even in a relatively small network, the various configurations of dyads and triads can become quite complicated. Kadushin uses the classic example of the karate club analyzed by Zachary (1977). The club membership of 34 makes for quite a small network, one might think, but the various relations are nonetheless complex (especially as Zachary recorded different kinds of relations among the members). As one might expect, the karate club network is quite dense—but even in a network of high density, there are only 156 connections out of a possible 1,122 (Kadushin, 2012, p. 29). The network shows not only the clustering typical of such social networks (with many transitive triads) but also the gaps that can easily occur in such networks too, with some individuals much less connected than others. One might expect with a larger network, say of hundreds of individuals, that cliques would form, but even here, within a small community, there are subcommunities. The study of these gaps, where triads do not form, is an important part of SNA, with such situations described as “structural holes” (Burt, 1992; Kadushin, 2012, pp. 29–30). These are the places where networks thin out and become less dense. So, for example, A may link to B, and B to C, but A, for whatever reason, does not link to C. If we then add two more actors, D and E, each of which links to B, but not to each other, then B comes to take on quite an important connecting role, thanks to the lack of transitivity between these other nodes. It has four connections, but the others each only have two (Figure 24.2). Kadushin (2012, p. 30) makes an interesting connection here, suggesting that the famous study on “the strength of weak ties” by Granovetter (1973) also focuses on network gaps. Here, though, the focus shifts from a node (in Figure 24.2, it is node B that has the “broker” role) to a link. We then have to show a slightly different diagram, one that focuses on a key “weak” tie between clusters, here the tie between A and B (Figure 24.3; from Granovetter, 1973, Figure 2). This tie has importance in the network overall because it connects the two dense clusters (or what Granovetter called “clumps”)—and it is only in this position because of the lack of links between clusters (i.e., B does not connect to D, A does not connect to I, etc.).
That a link exists at all between A and B is perhaps against the odds—usually in social networks the connections are transitive. Granovetter explains this by saying that this kind of link is typically quite weak—just a passing acquaintance, rather than a friendship. It is unlikely with such an acquaintance between A and B that B will also know D. So it is a weak tie, perhaps not very important as a “dyad” in and of itself—but what the dyad does structurally in terms of the network overall is act as a conduit through which useful information might flow. Within any given cluster there is a lot of redundancy, so its members tend to hear the same information over and over; while this may have its advantages, it hinders the spread of new ideas across a wider community. Hence the counterintuitive idea in Granovetter’s term—these ties might be weak dyadically, but they are strong in terms of the overall network structure.1

Figure 24.2  A simple network of five nodes, A–E, with B having four links while the others only have two.

Figure 24.3  A network composed of two “clumps,” with one weak tie (between A and B) connecting them (after Granovetter, 1973).
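The measures discussed in this section (density, closed and open triads, the gaps a broker spans, and the single tie joining two clumps) can be made concrete with a short sketch. It is illustrative only: the function names and the toy network are hypothetical, the node labels only loosely echo Figure 24.3, and the density calculation assumes that each tie is counted in both directions, so that 34 members give 34 × 33 = 1,122 possible connections.

```python
from collections import deque
from itertools import combinations


def neighbours(edges):
    """Build an undirected adjacency map from a list of (a, b) ties."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs


def density(n_nodes, n_ties):
    """Observed ties over possible ties, counting each direction separately."""
    return n_ties / (n_nodes * (n_nodes - 1))


def triad_counts(edges):
    """Return (closed, open): triples with three ties vs. exactly two ties."""
    nbrs = neighbours(edges)
    closed = open_ = 0
    for a, b, c in combinations(sorted(nbrs), 3):
        ties = (b in nbrs[a]) + (c in nbrs[a]) + (c in nbrs[b])
        closed += ties == 3
        open_ += ties == 2
    return closed, open_


def open_pairs(edges, node):
    """Pairs of a node's contacts with no tie between them (gaps it spans)."""
    nbrs = neighbours(edges)
    return [(u, v) for u, v in combinations(sorted(nbrs[node]), 2)
            if v not in nbrs[u]]


def bridges(edges):
    """Ties whose removal leaves their two endpoints with no connecting path."""
    nbrs = neighbours(edges)
    found = []
    for a, b in edges:
        seen, queue = {a}, deque([a])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if {u, v} == {a, b} or v in seen:
                    continue  # skip the tie under test and visited nodes
                seen.add(v)
                queue.append(v)
        if b not in seen:
            found.append((a, b))
    return found


# Hypothetical data: two fully connected clumps joined by one tie, A-B.
clumps = [("A", "C"), ("A", "D"), ("C", "D"),
          ("B", "G"), ("B", "H"), ("G", "H"),
          ("A", "B")]
print(density(34, 156))         # karate club figures quoted in the text
print(triad_counts(clumps))     # closed and open triads
print(open_pairs(clumps, "A"))  # gaps that A spans
print(bridges(clumps))          # ties that hold the clumps together
```

In this toy network the only bridge is the A–B tie, and every open triad involves it, which is one way of seeing why a dyadically weak tie can be structurally important.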

This section on dyads, triads, and structural holes may to a sociologist seem quite redundant in a handbook devoted to social networks. But for the archaeological reader, even such simple accounts of triads are relatively foreign. It is probably the scale of analysis that is responsible for this lacuna, as archaeological network analysis is often regional—a scale at which dense networks rarely exist, and what we find instead are sparse dyadic connections between separate social clusters. Thus, even though some archaeologists employing network analysis are certainly very much in tune with the tenets of SNA—as we will see later—there has not been much exploration of the dense local clusters where triads form. To some extent, this is perhaps understandable. Archaeologists might be forgiven for assuming that social actors in close proximity will have transitive relations, for seemingly obvious reasons of providing support and resources. As a corollary, it is logical to invest analytically in those rarer, perhaps more unpredictable (dyadic) “weak ties” between clusters. Which sites in any given cluster are able and willing to reach out beyond that cluster to establish more far-flung links? And what sites in other clusters are sought out in such exchanges? Since it is these less common, longer-range dyads that have captured the attention of archaeologists, network analysis in this domain has found itself naturally drawn to spatial networks at the macro-scale, with a corresponding interest in techniques drawn from physics and physical geography. Nevertheless, the influence of SNA on archaeological network analysis, particularly in the North American tradition, is increasingly showing the salience of local-scale social networks and the potential of multiscalar approaches (Blair, 2016; Mills, 2017; Peeples, 2018). Let us now look to some of the historical developments in network analysis in archaeology to uncover these trends.

Encounters with Network Thinking in Archaeology

Some of the earliest examples of network analysis in archaeology can be found in the 1970s and 1980s among a group of researchers—John Terrell, Geoff Irwin, and Terry Hunt—working in the Pacific (e.g., Hunt, 1988; Irwin, 1978; Terrell, 1977). As Terrell himself notes, “this research focus had developed apart from the SNA tradition then beginning to flourish at Harvard and elsewhere” (Terrell, 2013, p. 20). Influences came instead from geography and population genetics, and the basis for these early network approaches lies in graph theory (Peeples, 2019, p. 455). Archaeology’s early exposure to networks via geography is seen in the use of spatial interaction models in Aegean contexts too in the following decades, from the use of proximal point analysis (PPA) to understand site centrality in the Cyclades (Broodbank, 2000; Davis, 1982) to Rihll and Wilson’s collaboration on the locational logic of Greek city-states (Rihll & Wilson, 1987, 1991). These applications were rather sporadic, however, and it was only some 10 years ago that network analysis in archaeology started to become a more systematic form of enquiry. In some cases, the elaboration of spatial interaction models inspired by physics and graph theory implied a concern with connections across physical space (e.g., T. Evans, Knappett, & Rivers, 2009; Isaksen, 2008; Knappett, Evans, & Rivers, 2008; Sindbæk, 2007), though other early adopters drew from SNA too (e.g., Coward, 2010; Graham, 2006a; Munson & Macri, 2009). Just a little later, archaeologists working across a range of periods and places engaged themselves in network analysis. One might mention a couple of initiatives. A session at the Annual Meeting of the Society for American Archaeology in 2010 brought together various scholars working on networks, which became the edited volume Network Analysis in Archaeology (Knappett, 2013), focusing on uses of network techniques to explore aspects of regional connectivity. Then Tom Brughmans, Anna Collar, and Fiona Coward collaborated in setting up a group called “The Connected Past,” again to share views and approaches across the wider community. They have organized meetings, a special issue of the Journal of Archaeological Method and Theory in 2015, and a volume with Oxford University Press (Brughmans, Collar, & Coward, 2016). Other recent collections dedicated to networks and connectivity in archaeology include editions of the Archaeological Review from Cambridge (S. Evans & Felder, 2014) and Les Nouvelles de l’Archéologie (Knappett, 2014). Book-length treatments with network thinking and/or analysis at their heart include Ruffini (2008), Malkin (2011), Knappett (2011), Blake (2014a), and Peeples (2018). Increasingly, SNA is very much a part of the archaeological conceptualization of networks, as exemplified in the Southwest Social Networks Project, the result of a close collaboration between sociologists and archaeologists (e.g., Mills et al., 2013, 2015; Peeples & Haas, 2013). But, as with any new trends that take off quite suddenly, initially leaving some people behind, there is the accusation that network analysis in archaeology is just a fad that will not last.
This has been playfully covered by Anna Collar and colleagues in a recent review, with reference to Gartner’s “hype cycle,” describing the sudden “peak of inflated expectations” with which technological innovations are often met, only to be followed by a “trough of disillusionment” (Collar et al., 2015, p. 2). Some scholars are, implicitly or explicitly, reluctant to jump on what they see as a network bandwagon. Others are rather more agnostic and may not have anything against the term, even happily using it colloquially without really seeing any great need to adopt a more formal usage. And it has to be said that even though “networks” and “connectivity” may at times seem inseparable, one can perfectly well discuss connectivity without networks. Broadly speaking, the humanities have undergone a “relational turn,” with a heavy emphasis on relations and connections—but rarely is the network utilized as a formal method. There are also plenty of examples in archaeology. Cyprian Broodbank in his 2013 Making of the Middle Sea does an extraordinary job of presenting complex, long-term connectivities with little recourse to networks. Kristian Kristiansen presents a case for the massive connectivity of Bronze Age Europe (and see others who use world systems theory) without much more than a very loose use of the word network—certainly not an analytical usage (Kristiansen & Larsson,  2005; see also Kristiansen, Lindkvist, & Myrdal, 2018). Justin Jennings has a most useful book on “globalizations” in archaeology (Jennings,  2011) while only touching upon network thinking (though see his very useful 2006 paper against radial thinking). 
One could very well mention the Handbook on Archaeology and Globalization edited by Tamar Hodos (Hodos, 2017) and a recent paper by Anthony Harding—welcoming network ideas, while able to discuss connectivities quite effectively without them (Harding, 2013).2 Some of these scholars are quite positively disposed toward network analysis—indeed, Broodbank (2000) has put PPA to good use in his analysis of connectivity in the Early Bronze Age Cyclades. But it is, quite rightly, not seen as the be-all and end-all of any approach to connectedness.

So, what does this bifurcation mean? Is it just a fad that will pass? Or are we seeing the early stages of a paradigm shift, with some early adopters and equally some laggards (Collar et al., 2015, p. 2)? Will all archaeological analysis of connectivity at some point employ networks? There do appear to be some striking advantages, such as the capacity it provides for handling relational data, or the heuristic potential it offers for putting connections center stage—an important corrective to the archaeological tendency for “dots on the map.” However, the quite distinct directions of network influence on archaeology—from geography and physics, on the one hand, to SNA on the other—do point to some underlying tensions that, if not resolved or at least recognized, may undo some of the progress made thus far. In the use of spatial interaction models, there is little recognition of the triads that are seemingly so fundamental to social networks (see Amati, Shafie, & Brandes, 2018), and yet the triads in social networks may not be that identifiable at archaeological scales anyway. So, is the search for regional dyads that we see often in network analysis in archaeology really at loggerheads with SNA and its triads? Or can we use archaeology to show, in a way that even SNA has perhaps not recognized, that searching for dyads and triads in social and geographical networks is perfectly reasonable? We will see now, using examples, how network analysis using spatial interaction models tends to pick out dyads, and that this procedure is tied to the “top down” nature in which links are formulated. Bottom-up approaches, in which links are created on the basis of data, are more in keeping with the edicts of SNA, but these are not necessarily much more successful in identifying triads. These two approaches we will designate, following Rivers (2016), as “theory” and “data” models, respectively.

Spatial Network Analysis and “Theory Models”

Archaeologists are not uninterested in local social relations. Choosing sites rooted in physical space as the typical actors in networks would seem to suggest the opposite, but they are in effect a kind of agglomerated proxy for all the messy, mobile, social interactions of their inhabitants. Given the limited and patchy nature of the evidence, archaeologists have to create an averaged physical proxy for all the low-level individual social activity that eludes them. Some examples of seemingly physical networks can show this process at work, each with rather different assumptions about the social parameters affecting relations between actors. In our first scenario—of a series of small communities leading a somewhat marginal existence with few local resources—we confront a situation where we assume actors are concerned primarily with security and support. Furthermore, in such a scenario we might also assume that actors do not have access to technologies allowing them to connect over very long distances. This was Cyprian Broodbank’s starting point in then choosing a particular network model—PPA—for the Early Bronze Age Cyclades (Figure 24.4), where communities were small (200 to 300 people), scattered across an archipelago in which maritime transport was key, but only with rowing technology, making long journeys arduous and risky (Broodbank, 2000). With PPA, each site laid out on the map is simply connected up to its three or four nearest neighbors. The network thus generated implicitly favors local interactions, thus reflecting the “social” assumptions thought to be relevant—even though the network emerges because of the sites’ positions in physical space. It is of further interest in this case that Broodbank simulates increasing population over time, adding more and more notional sites to the space—this effectively makes the Cyclades more and more inward looking, as it allows for the formation of more and more local triads.

Figure 24.4  Four versions of a proximal point analysis for the Early Bronze Age Cyclades (after Broodbank, 2000).

A second scenario concerns this same space in the Aegean but takes a wider view to incorporate surrounding areas such as Crete, the Dodecanese, the Greek mainland, and western Anatolia (Figure 24.5). It also concerns a later period, in the later Middle Bronze Age (about a millennium later than the Broodbank case study), by which time sail technology had appeared, allowing more reliable travel over long distances (Knappett et al., 2008). The archaeological evidence also points to more thoroughgoing connections across quite long distances, so we have to assume a different kind of social setting, in which local support networks are not the overriding reason for interaction for all sites. Thus, PPA is not the most appropriate technique, and so for this analysis a different method—which we called “ariadne”—was devised to permit both local interactions and more far-flung connections, according to their relative costs and benefits.3 Again, the network is treated like a physical system, although indirectly these physical qualities are related to social circumstances. If, as in the first case, we reckon that sites are actors concerned above all with social support, then we might predict clustered networks with lots of triads. If, as in the second case, we think that sites are also seeking efficient access to resources, even if at some distance, then we might expect ties that break out of these clusters—ties that are longer distance, locally weaker,4 and dyadic.

Figure 24.5  An output of the “ariadne” network model, showing hypothetical connectivity across the southern Aegean during the later Middle Bronze Age.

It is perhaps no accident that these two examples employing spatial network models are in maritime, archipelago settings. In archaeology, maritime networks have attracted more analyses of this kind (see Leidwanger et al., 2014) than have terrestrial or riverine networks. However, quite the opposite seems to be the case in other disciplines, such as geography and maritime history, where César Ducruet argues that “maritime transport has been much less studied than any other transport mode, especially from a network perspective” (Ducruet, 2015, p. 3). One of the reasons he puts forward for this discrepancy is the “vague geographic distribution and morphology of maritime flows due to the absence of a track infrastructure” (p. 3). This could be the very reason why the maritime has, conversely, been attractive for archaeological network modeling. Archaeology generally lacks the kind of direct evidence for transport from site to site that both geographers and historians can access. So rather than work with direct evidence for connections, the archaeologist models connectivity between sites—and the apparent vagueness of the “morphology of maritime flows” makes seascapes much more amenable to modeling exercises.
Although in reality a sea voyage out can be very different from the voyage back, with winds and currents quite critical (Leidwanger, 2011; Tartaron, 2013), archaeologists do typically consider the sea in such exercises as a uniform space in which travel in any direction is theoretically possible—and hence there exist many different possibilities for those “cluster breaking” (dyadic) links that are so key to their analyses. When it comes to terrestrial networks, however, such links seem much less freely available; if settlements follow a river valley, how can one settlement leapfrog or bypass another immediately upstream or downstream to reach a more distant cluster, with new resources? It is harder to envisage how any given settlement can avoid its local clusters. We can clearly see the rather more predictable linearity of these routes in the few examples of archaeological network analysis on terrestrial networks, such as the work of Jenkins on the Inka road network (Jenkins, 2001, Figs. 2 and 3; see also Santley, 1991, on the Aztec transport network), or that of Peregrine on the position of Cahokia in relation to riverine networks (Peregrine, 1991). An interesting case in this regard is the work by Menze and Ur (2012) on settlement locations and intersite road networks in Syria during the third millennium BC—in their Figure 8 (see Figure 24.6), it looks as if the intersite transportation network they identify is similarly dyadic and linear, with few indications of transitivity (Menze & Ur, 2012).

Perhaps this is a question of scale: the local networks are transitive and the regional ones are dyadic, irrespective of maritime or terrestrial transport. In each of these terrestrial examples it is not the local network scale that is being examined, but the regional. If local networks are transitive regardless of sea or land, then where sea and land meet, one might expect the network spaces to be fundamentally unchanged—which is not what one might typically predict, given that coastlines are usually considered as important thresholds. If local, short-distance interactions are possible though, there seems no reason to differentiate too radically between land and sea. This is seen quite neatly in Tartaron’s work in the Saronic Gulf (Tartaron, 2013), where the sea is enclosed by land masses (Argolid, Corinthia, Attica), and there are many coastal and island settlements (Figure 24.7).
In such a space, it seems quite possible for communities to maintain locally transitive networks that straddle both land and sea.
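The proximal point analysis described earlier in this section, in which each site is joined to its few nearest neighbors, is simple enough to sketch in a few lines. What follows is a minimal illustration under stated assumptions, not Broodbank’s own procedure: the function name, the coordinates, and the use of straight-line distance are all hypothetical choices made here, and a real application would need a more careful measure of travel cost.

```python
from math import dist


def proximal_point_network(sites, k=3):
    """Link every site to its k nearest neighbours (straight-line distance).

    `sites` maps a site name to an (x, y) position. Ties are undirected,
    so a link is kept if either endpoint counts the other among its k
    nearest. Sorting on (distance, name) makes equal distances
    deterministic.
    """
    links = set()
    for name, position in sites.items():
        others = sorted((o for o in sites if o != name),
                        key=lambda o: (dist(position, sites[o]), o))
        for other in others[:k]:
            links.add(tuple(sorted((name, other))))
    return links


# Hypothetical coordinates: two small groups of sites, far apart.
sites = {"a": (0, 0), "b": (1, 0), "c": (0, 1),
         "x": (10, 10), "y": (11, 10), "z": (10, 11)}

local = proximal_point_network(sites, k=2)
wider = proximal_point_network(sites, k=3)
print(sorted(local))
print(sorted(wider - local))  # ties that only appear when k is raised
```

With k = 2 every tie in this toy layout stays inside its own group, producing two closed triads and no contact between groups; raising k to 3 forces each site to reach across. The choice of k therefore carries the “social” assumption about how local interaction is expected to be.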

Figure 24.6  Network of settlements in third millennium BC Syria (from Menze & Ur, 2012, Figure 8, courtesy of Bjoern Menze and Jason Ur). Full color figures available on Oxford Handbooks Online.

Figure 24.7  Map of the Saronic Gulf, Greece, and surrounding areas (from Tartaron, 2013, courtesy of Tom Tartaron).

Much of the Aegean is like this, but not all of it—the island of Crete, for example, is like a mini-continent but has fairly open expanses of sea all around (except, arguably, for Antikythera and Kythera to the northwest and Kasos to the northeast). In this scenario, local networks are inevitably terrestrial, and along coastlines of course, while any maritime connection goes beyond the local (Thera, for example, is some 100 km distant). Unlike the Saronic Gulf, here one can perhaps see the coast as an important threshold between local triadic clusters and regional dyadic links. This is quite well expressed in a paper by Donald Haggis on settlement structure in Minoan Crete, showing very neatly the idea that one might move over time from a clustered, more localized structure to one that is more dyadic, essentially, with longer-distance connections (see Figure 24.8, from Haggis, 2002, Figure 7.4).

Interestingly, some recent efforts to develop spatial interaction models using archaeological data have been directed at terrestrial networks (Bevan & Wilson, 2013; Davies et al., 2014). This may seem odd given the predominance of maritime spaces as the preferred arena for network modeling in archaeology. However, the terrestrial focus can in part be explained by the involvement in both of these publications, on the island of Crete and the Khabur triangle in Syria, respectively, of geographer Alan Wilson. In the late 1960s and 1970s he had developed spatial interaction models for understanding various urban dynamics, such as retail sales in shopping centers (Wilson, 1971). He also collaborated with ancient historian Tracey Rihll using similar modeling techniques to understand settlement hierarchy and the location of city-states in Attica in the eighth and seventh centuries BC (Rihll & Wilson, 1987, 1991).
This confluence of a terrestrial focus and the discipline of geography makes perfect sense when one bears in mind Ducruet’s observation, noted earlier, that in this discipline land transport has received far more attention than maritime connectivity. It thus constitutes a fascinating shift in emphasis within archaeological modeling toward the terrestrial, although the maritime still receives its share of attention (see Leidwanger et al., 2014; Leidwanger & Knappett, 2018). This work also differs from the examples mentioned earlier, such as Jenkins (2001) and Peregrine (1991), in that these models involve very little input of actual data. Links between sites are not made on the basis of known roads (in contrast to Menze & Ur, 2012, for example) but are simply drawn across physical space. These are top-down “theory models” (see Rivers, 2016, pp. 124–125) that put space and distance first in their reckoning of the overall structure that interactions are likely to take.

Figure 24.8  Network models of settlement distributions for hypothesized prestate and state conditions in Bronze Age Crete (after Haggis, 2002). [Panel labels: prestate condition (heterarchical, integrated); state condition (hierarchical, non-integrated); low connectedness; high connectedness.]

A distinctive feature that emerges from these spatial interaction models is the radial pattern, seen in the illustrative figures in Rihll and Wilson (1991), Bevan and Wilson (2013), and Davies et al. (2014). There is very little if any transitivity in these networks, because they are not concerned with triads. Let us consider briefly, and perhaps oversimplistically, one of the main original targets of these kinds of models—the location of shopping centers in relation to consumers. In these networks there are evident motivations—acquiring groceries and so forth—and clear directionality, with households resource poor and supermarkets resource rich. In Wilson’s model, if household A and household B both shop at supermarket C, no connection is then drawn between households A and B—except in the sense that they have a kind of epistemic connection, in that they have both been affected by gravity similarly, and so both probably have similar spatial characteristics.5
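The shopping example can be sketched with a simple production-constrained gravity model, in which each household’s spending is shared out among centers in proportion to their attractiveness discounted by distance. This is offered only as a rough illustration of the general family of models the text refers to: the exact functional form, the parameter names alpha and beta, the straight-line distances, and all the numbers are assumptions made here rather than details taken from the studies cited.

```python
from math import dist, exp


def retail_flows(households, centres, alpha=1.0, beta=0.5):
    """Share each household's spending among centres.

    households: name -> ((x, y), spending)
    centres:    name -> ((x, y), size)
    A centre's pull on a household is size**alpha * exp(-beta * distance);
    the household's spending is divided in proportion to those pulls, so
    the flows leaving each household always sum to its spending.
    """
    flows = {}
    for h, (h_xy, spending) in households.items():
        pull = {c: (size ** alpha) * exp(-beta * dist(h_xy, c_xy))
                for c, (c_xy, size) in centres.items()}
        total = sum(pull.values())
        for c, p in pull.items():
            flows[(h, c)] = spending * p / total
    return flows


# Hypothetical layout: two households, one nearby centre, one distant one.
households = {"A": ((0, 0), 100.0), "B": ((1, 0), 100.0)}
centres = {"C": ((0, 1), 1.0), "D": ((5, 5), 1.0)}

flows = retail_flows(households, centres)
for (h, c), value in sorted(flows.items()):
    print(h, "->", c, round(value, 2))
```

Every tie in the output runs from a household to a center; nothing in the procedure can generate a tie between A and B, which is why networks produced this way look radial and contain no triads.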

From Theory Models to Data Models

These theory models derive largely from physics and network science. They differ from “data models” (Read, 2008; Rivers, 2016, pp. 124–125), which draw “real” connections between nodes based on empirical evidence, rather than hypothesized interactions in space. In archaeology, these generally take their inspiration from sociology, and specifically SNA.6

Figure 24.9  Networks of type 1 and type 2 (after Östborn & Gerding, 2014, courtesy of Per Östborn). [Panel labels: Network type 1, geographical space, linking contexts 1–3: “The relation between the values of several attributes is the same in contexts 1 and 3.” Network type 2, topological space, linking attributes 1–3: “The relation between the values of attributes 2 and 3 is the same in several contexts.”]

Östborn and Gerding describe only the latter (i.e., data models) in terms of network analysis, suggesting that the former—what we are here calling theory models—only uses network theory to generate network modeling (Östborn & Gerding, 2014, p. 75). Within such network analysis (which, contra Östborn and Gerding, still deals in models, though of the data rather than the theory variety), Östborn and Gerding then distinguish between two kinds of network, types 1 and 2 (Figure 24.9). Networks of type 1 are spatial. Links between nodes can be drawn in various ways, perhaps even using actual known road connections between sites, but often in archaeology, in the absence of such direct evidence, some proxy is used, typically shared attributes (which could be architectural, ceramic, lithic, etc.). These shared attributes need not originate in any of the network contexts but exist as imports—which then implies a separation between producing and consuming sites, though the former may not actually be included in the network. In other words, there could be resource-poor sites obtaining products from resource-rich sites (which introduces directionality). So, two sites may be “connected” through their shared consumption of certain artifacts—though we might not be able to see any evidence for direct contact, these similarities act as indirect signs of their likely contact. Networks built up in this way have been classed as “general similarity networks” (Östborn & Gerding, 2014).

Östborn and Gerding do make some quite pertinent criticisms of some existing studies of this kind in archaeology. One of the earliest examples is in Sindbæk (2007), who, in his study of connectivity in early Viking Age southern Scandinavia, uses copresence of any among 31 artifact types to connect the 71 settlements from which these artifact finds are taken. They have two criticisms.
One is that he fits the data to a preconceived notion of “small-world networks,” although this expectation actually seems largely derived from another kind of data, in the form of the Vita Anskarii text from the ninth century, which records various journeys across the region in question, and as far south as Rome. So Sindbæk constructs a network on the basis of 22 sites visited by 55 named individuals, resulting in 116 relations, with sites linked when there is a co-occurrence of a person or group. This network derived from the ninth-century text has the structure of a small world, according to Sindbæk, one in which there are a few hubs that give a scale-free pattern (one way in which small-world networks can be formed, he says), which is why he then conducts his material analysis to see if the artifactual data also indicate such a structure. Therefore, his “assumption” of a particular network structure is not really a “theory model,” because it derives from data; yet Östborn and Gerding suggest that the problem lies in the inability of this analysis to discriminate between alternative assumptions (Östborn & Gerding, 2014, p. 78; see similar critique in Brughmans, 2010, p. 278).

Östborn and Gerding have a second criticism, which is that allowing connections between sites on the basis of any single copresence among 31 possible types creates the problem of false positives. Östborn and Gerding (2014) then assess other archaeological implementations of general similarity networks, arguing that Coward does something similar in her work on early Neolithic networks in the Near East, though her results are more robust because she performs statistical tests on her findings (Coward, 2010, 2013); and Mills et al., in their work on patterns of connectivity and population movement during the period AD 1200–1500 in the US Southwest, add a further level of sophistication by considering not just presence but relative abundance of ceramic types at sites (e.g., Mills et al., 2013). This is one of the most striking examples of “general similarity networks” implemented in archaeology, because of the huge ceramic dataset they have assembled. With many different ceramic ware groups variably represented from site to site, when two sites share many of the same wares, this becomes a proxy for saying that they are probably connected quite strongly to each other (and more so than sites that share fewer wares).
On the basis of this measure of similarity, sites are connected to form social networks that may or may not respect geographical distance; indeed, their results show that sites that are strongly related socially (based on artifactual similarity) are often separated by quite long distances (Mills et al., 2013).

A good example of a similarity network elaborated on the basis of an imported material distributed across “consumer” sites is found in the work of Mark Golitko and colleagues on obsidian (Golitko & Feinman, 2015; Golitko et al., 2012). In their earlier study (Golitko et al., 2012) they focus on the Maya world, identifying the differential use of obsidian sources from site to site (Figure 24.10). But rather than simply link these sites through the copresence of obsidian (from a given source), they group sites together on the basis of similarities in their frequencies of types of obsidian source (using Brainerd-Robinson coefficients of similarity, as do Mills et al.). Their later work uses the same method but extended to cover all of Mesoamerica.7 Freund and Batist have performed the same kind of network analysis on obsidian distributions in the Neolithic west Mediterranean (Freund & Batist, 2014).

The Brainerd-Robinson coefficient of similarity is also used in studies of Iroquoian pottery decoration (Hart & Engelbrecht, 2012; Hart et al., 2016). It is a statistic defined within archaeology to assess how similar the material assemblage of one site is to that of another. The analyst defines the assemblages on the basis of a range of types that are considered key. If two assemblages are said to have high similarity, this means that they “both have very similar proportions of the relevant types” (Cowgill, 1990, p. 513; see also Weidele et al., 2016). The comparisons between assemblages are conducted pairwise.

Östborn and Gerding (2014) suggest that such studies typically use just one attribute (i.e., ceramics, or obsidian).
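The pairwise comparison of assemblages described above can be sketched in a few lines of Python. This is only an illustration, under stated assumptions, of how a Brainerd-Robinson similarity network might be computed; it is not the procedure of any of the cited studies. The coefficient is taken in its usual form, 200 minus the summed absolute differences in percentage shares across types. The site names, source labels, counts, and the cut-off of 150 are all hypothetical, as are the function names.

```python
from itertools import combinations


def brainerd_robinson(counts_a, counts_b):
    """Brainerd-Robinson similarity on the conventional 0-200 scale.

    Each argument maps a type name to a raw count for one assemblage.
    200 means identical proportions; 0 means no types in common.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each assemblage needs at least one counted item")
    types = set(counts_a) | set(counts_b)
    # Sum the absolute differences between percentage shares, type by type.
    diff = sum(
        abs(100.0 * counts_a.get(t, 0) / total_a
            - 100.0 * counts_b.get(t, 0) / total_b)
        for t in types
    )
    return 200.0 - diff


def similarity_edges(assemblages, threshold):
    """Compare every pair of sites; keep pairs at or above the threshold."""
    edges = {}
    for a, b in combinations(sorted(assemblages), 2):
        score = brainerd_robinson(assemblages[a], assemblages[b])
        if score >= threshold:
            edges[(a, b)] = score
    return edges


# Hypothetical obsidian counts by source; not data from any cited study.
assemblages = {
    "Site A": {"source_1": 80, "source_2": 20},
    "Site B": {"source_1": 40, "source_2": 10},
    "Site C": {"source_1": 5, "source_3": 95},
}

# The cut-off of 150 is an arbitrary assumption made for illustration.
edges = similarity_edges(assemblages, threshold=150.0)
print(edges)
```

Sites A and B differ fivefold in sample size but have identical proportions, so they are tied at the maximum similarity, while Site C, which shares only a small fraction of one source with them, is left unconnected. Because the measure works on proportions rather than on simple copresence, a single shared type does not on its own create a link.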
However, they do note that some combine different criteria, picking out a paper by Emma Blake on networks in pre-Roman Italy, in which she not only links up contexts that share certain artifacts but also insists that they have to be no more than 50 km apart (Blake, 2013).8 One might argue too that the work of Mills et al. (2013) also incorporates more than one attribute, as they include obsidian as well as ceramic data.9 In Östborn and Gerding’s own work on Hellenistic fired bricks, they argue for the use of multiple attributes to ascertain similarity. Although “fired brick” might itself seem to constitute a single category, it is a very broad kind of building material that covers many functions, used in walls, columns, and pavements.

Figure 24.10  A network of Maya sites connected by their choice of obsidian sources (from Golitko et al., 2012, courtesy of Mark Golitko).

These general similarity networks, then, are principally concerned with dyads. Is there some way in which archaeological network studies of this genre might pay closer attention to triads, and thereby make fuller use of some of the key insights of SNA? An argument along these lines has recently been made by Amati et al. (2018), critiquing spatial interaction models for their narrow conceptualization of links as dyadic, and proposing exponential random graph models as a means for incorporating tie dependencies and hence triadic connections. Emma Blake addresses this same issue from a different angle, saying that archaeologists have emphasized “macrolevel studies of entire networks” at the expense of local structure (Blake, 2014b, p. 28). She shows that local networks in Bronze Age Italy have quite different structures, with some seemingly more locally based and with greater transitivity. Even though she uses a triad census to assess local structure, she recognizes, following Goodreau, Kitts, and Morris (2009), that “transitivity tells more about dyadic relations, even if we study it through triads” (Blake, 2014b, p. 30). Does this perhaps then mean that we can use dyads across all scales, from the micro-level to the macro-level, and that there is, after all, no fundamental incompatibility between scales of analysis? Triads might form more readily at the local level and be a useful tool for assessing local structure, but arguably they are “byproducts of social dynamics in dyads” (see Blake, 2014b, p. 31, citing Kitts & Huang, 2010).
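To make the notions of transitivity and a triad census more tangible, the following Python sketch computes both for a small undirected network. It is a minimal illustration rather than a reconstruction of Blake’s analysis: the node labels and ties are invented, the function names are hypothetical, and the census shown is the simple undirected version that classes each set of three nodes by how many ties it contains (0 to 3).

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn a list of undirected ties into a node -> set-of-neighbours map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def transitivity(adj):
    """Share of connected triplets (two-paths) that are closed into triangles."""
    closed = 0
    triplets = 0
    for centre, neighbours in adj.items():
        # Every pair of neighbours of a node forms a two-path through it.
        for a, b in combinations(sorted(neighbours), 2):
            triplets += 1
            if b in adj[a]:
                closed += 1
    return closed / triplets if triplets else 0.0


def undirected_triad_census(adj):
    """Count triples of nodes by the number of ties among them (0 to 3)."""
    census = {0: 0, 1: 0, 2: 0, 3: 0}
    for a, b, c in combinations(sorted(adj), 3):
        ties = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        census[ties] += 1
    return census


# Hypothetical local network: a closed triangle A-B-C with a pendant site D.
adj = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
score = transitivity(adj)
census = undirected_triad_census(adj)
print(score, census)
```

Note that the triangle count is assembled entirely from checks on pairs of nodes, which is one way of seeing the point quoted above that transitivity is informative about dyadic relations even when it is studied through triads.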


Entangled Networks of Humans and Things

Nonetheless, we are still left with a bifurcation between type 1 and type 2 networks, following the distinction made by Östborn and Gerding (2014). All of the aforementioned data models of type 1 networks use the contexts of attributes as the network nodes (see Figure 24.9). These contexts are usually sites, and the networks thus occupy geographical space. However, in type 2 networks, the attributes become the nodes, and the networks occupy topological space. In other words, an attribute might be a brick stamp, with brick stamps then connected in a network by some feature thereof, such as the individual named in the stamp. The network thus generated would not occupy a geographical space, but rather a conceptual one. This is actually a case study by Graham (2006b), one of the earliest of its kind in archaeology; Östborn and Gerding cite Brughmans (2010) as another, with pottery forms connecting in a “relational network of co-presence.”

Another interesting example of a type 2 network is that generated in a study of Hellenistic sculptural production (Larson, 2013). Using epigraphic evidence of signatures on sculptures (a total of 493 inscriptions), which typically contain both patronyms and toponyms, Larson is able to trace both familial associations, on the one hand, and geographic associations on the other. While the former do not appear to have much of a role in the organization of sculptural production, “geographic connections among sculptors demonstrate a cohesive structure” (Larson, 2013, p. 245). While this may sound like a physical network of type 1, the two-mode affiliation networks that Larson produces can very easily be projected in topological space, with actual geography irrelevant.
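The projection of a two-mode affiliation network mentioned above can be sketched as follows. This is a hedged illustration only: the sculptor and toponym labels are invented placeholders, not Larson’s data, and the function name is hypothetical. Two actors are tied in the projected network whenever they share at least one affiliation, with the tie weighted by the number of affiliations shared.

```python
from collections import defaultdict
from itertools import combinations


def project_affiliations(memberships):
    """Project a two-mode (actor x affiliation) network onto the actors.

    memberships maps each actor to the set of affiliations it holds.
    Returns {(actor_a, actor_b): number_of_shared_affiliations}.
    """
    by_affiliation = defaultdict(set)
    for actor, affiliations in memberships.items():
        for aff in affiliations:
            by_affiliation[aff].add(actor)

    weights = defaultdict(int)
    for actors in by_affiliation.values():
        # Every pair of actors sharing this affiliation gains one unit of tie.
        for a, b in combinations(sorted(actors), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical signatures: sculptors (one mode) and toponyms (the other mode).
signatures = {
    "Sculptor 1": {"Toponym A", "Toponym B"},
    "Sculptor 2": {"Toponym A"},
    "Sculptor 3": {"Toponym B", "Toponym C"},
    "Sculptor 4": {"Toponym C"},
}

ties = project_affiliations(signatures)
print(ties)
```

The resulting network links sculptors to sculptors without any coordinates being involved: the toponyms act only as shared labels, which is why such a network occupies topological rather than geographical space. The same function could be applied with the modes reversed to link toponyms through the sculptors they share.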
This use of two-mode networks with epigraphic evidence is also discussed in a further paper by Graham (2014), who argues that epigraphy offers “one of the richest skeins to unpick for network analysis in archaeology,” as it is a form of data that offers rich social relationships of a kind that are normally quite elusive in our discipline (Graham, 2014, p. 39). He uses some of the different data (including epigraphy) on Roman stamped bricks to create affiliation networks, for example, between different social roles in the brick industry (how domini link to other domini via shared officinatores).10

One could argue that a lot of the potential of type 2 networks for archaeologists lies in their capacity to link up people and things, each treated as a different “mode” in an affiliation network. This is of particular interest because a lot of the discussion of “materiality” in recent archaeological theory has struggled to incorporate a human presence, as if things are merely interacting with other things somewhat independently of human involvement. One significant attempt to work with this problem in material culture studies can be found in Ian Hodder’s recent work on “entanglement theory” (Hodder, 2012). What is quite interesting in this theory is the development of a powerful focus on the relations of dependence and dependency between people and things; it is evidently an approach strongly oriented to relations, without being network oriented per se (despite clear resonance with “dependence” in Brandes et al., 2013; see earlier). Entanglement theory does flirt with network analysis, although Hodder eventually rejects it for being insufficiently “sticky,” which we should take to mean a lack of capacity to convey the sense in which one gets more and more entangled in a network of people and things over time, as one might in a spider’s web. Since the 2012 book Entangled, however, in which this critique was voiced, Hodder has collaborated with Angus Mol on a paper that works out from his initial “tanglegrams” to conceive of them in more formal network terms (Hodder & Mol, 2016). The attributes that are connected in topological space are quite varied in character, and their interconnections are somewhat vague, but it is nonetheless a useful exercise in thinking in slightly more formal terms about how one might devise type 2 networks that manage to incorporate the diverse material culture assemblages with which we surround ourselves. It is perhaps, in effect, a kind of micro-scale theory modeling.

Whether this move will allow archaeologists to move toward the kind of complementarity of agency-based and structural approaches that Schortman (2014) hopes for is another matter.11 But it does not address the continuing question of how to resolve the fundamental differences between type 1 and type 2 networks (the former with contexts as nodes, the latter with attributes). It currently remains very difficult to work with both together, to the extent that most computer applications do not allow such complementarity.12 If some progress could be made on this front, then we might finally be better placed to talk about local social triads and regional spatial networks together, theory and data models, and both top-down and bottom-up dynamics (see Knappett, 2011, and Mills et al., 2015, on the potential of networks for multiscale analysis).

Acknowledgments

I am extremely grateful to Tim Evans and Ray Rivers for everything they have taught me in our decade-long collaboration and friendship. Barbara Mills and Ryan Light generously offered invaluable feedback on earlier drafts. I would like to thank Bjoern Menze, Jason Ur, Tom Tartaron, Per Östborn, and Mark Golitko for graciously providing images used in some of the figures.

Notes

1. One might note here the important rejoinder that weak ties only have this strength when the information to be transferred can be spread with relative ease, in cases of what have been called “simple contagion.” Where “complex contagion” is concerned, however, adopters of an innovation may need to hear the same information repeatedly, from different trusted sources, and so the redundancy found through repeated strong ties can in such cases be more beneficial than less dense weak ties (Centola & Macy, 2007; see also Centola, 2018).
2. See also Kristiansen (2014); Nakoinz (2013). There are further studies that seem more computationally network based, such as Pearce and Moutsiou (2014), on late Pleistocene hunter-gatherers, and Ossa (2013) on network expectations for exchange systems in ancient Mesoamerica, though they do not generate network visualizations.
3. See, for example, Knappett, Evans, and Rivers (2008, 2011).
4. Though note that it was a key insight of Granovetter (1973) that the local weakness of a tie could be the very reason for its broader, structural strength; see Centola and Macy (2007, p. 703).
5. See also Fulminante (2012) for another example of a theory model, in this case used to predict the emergence of central places in protohistoric peninsular Italy (using Delaunay triangulation).
6. Although, perhaps counterintuitively, they do not seem to be as focused on triads as one might then expect.
7. For further network analysis in ancient Mesoamerica, based on epigraphic evidence and site locations, see Munson and Macri (2009); Scholnick, Munson, and Macri (2013).
8. So this, in a sense, combines aspects of theory and data models.
9. Consider also Mizoguchi (2009), who takes both tomb types and pottery distributions into account, though the manner of their combination is not especially clear in his analysis.
10. For another study of networks using epigraphic data, see Ruffini (2008).
11. See also interesting work by Astrid van Oyen assessing complementarities of actor-network theory and SNA (e.g., van Oyen, 2016). In a recent review of network approaches in archaeology, Peeples (2019, p. 463) also underlines the potential for archaeology to contribute to wider debates concerning the effects of culture and history on network processes and structure.
12. Here one might mention Orengo and Livarda (2016), which sets out to combine the two, to some degree, but which largely remains with type 1 networks. They suggest a way of combining spatial network modeling with archaeological data (distribution of imported exotic plants in Roman Britain), though the “social” links they suggest are indicated by imports that are still very spatialized.

References

Amati, V., Shafie, T., & Brandes, U. (2018). Reconstructing archaeological networks with structural holes. Journal of Archaeological Method and Theory, 25(1), 226–253.
Bevan, A., & Wilson, A. (2013). Models of settlement hierarchy based on partial evidence. Journal of Archaeological Science, 40(5), 2415–2427.
Blair, E. (2016). Glass beads and constellations of practice. In A. P. Roddick & A. B. Stahl (Eds.), Knowledge in motion: Constellations of learning across time and place (pp. 97–125). Tucson, AZ: University of Arizona Press.
Blake, E. (2013). Social networks, path dependence, and the rise of ethnic groups in pre-Roman Italy. In C. Knappett (Ed.), Network analysis in archaeology: New approaches to regional interaction (pp. 203–221). Oxford, UK: Oxford University Press.
Blake, E. (2014a). Social networks and regional identity in Bronze Age Italy. Cambridge, UK: Cambridge University Press.
Blake, E. (2014b). Dyads and triads in community detection: A view from the Italian Bronze Age. Les Nouvelles de l’Archéologie, 135, 28–32.
Brandes, U., Robins, G., McCranie, A., & Wasserman, S. (2013). What is network science? Network Science, 1(1), 1–15.
Broodbank, C. (2000). An island archaeology of the early Cyclades. Cambridge, UK: Cambridge University Press.
Broodbank, C. (2013). The making of the middle sea: A history of the Mediterranean from the beginning to the emergence of the Classical World. London, UK: Thames and Hudson.
Brughmans, T., Collar, A., & Coward, F. (Eds.). (2016). The connected past: Challenges to network studies in archaeology and history. Oxford, UK: Oxford University Press.
Burt, R. S. (1992). Structural holes: The social structure of competition. Cambridge, MA: Harvard University Press.
Centola, D. (2018). How behavior spreads: The science of complex contagions. Princeton, NJ: Princeton University Press.
Centola, D., & Macy, M. (2007). Complex contagions and the weakness of long ties. American Journal of Sociology, 113(3), 702–734.
Collar, A., Coward, F., Brughmans, T., & Mills, B. J. (2015). Networks in archaeology: Phenomena, abstraction, representation. Journal of Archaeological Method and Theory, 22, 1–32.
Coward, F. (2010). Small worlds, material culture and ancient Near Eastern social networks. Proceedings of the British Academy, 158, 453–484.
Coward, F. (2013). Grounding the net: Social networks, material culture and geography in the Epipalaeolithic and Early Neolithic of the Near East (21,000–6,000 cal BCE). In C. Knappett (Ed.), Network analysis in archaeology: New approaches to regional interaction (pp. 247–280). Oxford, UK: Oxford University Press.
Cowgill, G. L. (1990). Why Pearson’s r is not a good similarity coefficient for comparing collections. American Antiquity, 55(3), 512–521.
Dagan, T. (2011). Phylogenomic networks. Trends in Microbiology, 19, 483–491.
Davies, T., Fry, H., Wilson, A., Palmisano, A., Altaweel, M., & Radner, K. (2014). Application of an entropy maximizing and dynamics model for understanding settlement structure: The Khabur Triangle in the Middle Bronze and Iron Ages. Journal of Archaeological Science, 43, 141–154.
Davis, J. L. (1982). Thoughts on prehistoric and Archaic Delos. Temple University Aegean Symposium, 7, 23–33.
Dormann, C. F., & Strauss, R. (2013). Detecting modules in quantitative bipartite networks: The QuaBiMo algorithm. http://arxiv.org/abs/1304.3218
Ducruet, C. (2015). Maritime flows and networks in a multidisciplinary perspective. In C. Ducruet (Ed.), Maritime networks: Spatial structures and time dynamics (pp. 3–26). London, UK: Routledge.
Evans, S., & Felder, K. (2014). Making the connection: Changing perspectives on social networks. Archaeological Review from Cambridge, 29(1), 9–17.
Evans, T., Knappett, C., & Rivers, R. (2009). Using statistical physics to understand relational space: A case study from Mediterranean prehistory. In D. Lane, D. Pumain, S. van der Leeuw, & G. West (Eds.), Complexity perspectives on innovation and social change (pp. 451–480). Berlin, Germany: Springer.
Freund, K. P., & Batist, Z. (2014). Sardinian obsidian circulation and early maritime navigation in the Neolithic as shown through social network analysis. Journal of Island and Coastal Archaeology, 9, 364–380.
Fulminante, F. (2012). Social network analysis and the emergence of central places. A case study from Central Italy (Latium Vetus). BABESCH, 87, 27–53.
Golitko, M., & Feinman, G. M. (2015). Procurement and distribution of pre-Hispanic Mesoamerican obsidian 900 BC–AD 1520: A social network analysis. Journal of Archaeological Method and Theory, 22, 206–247.
Golitko, M., Meierhoff, J., Feinman, G. M., & Williams, P. R. (2012). Complexities of collapse: The evidence of Maya obsidian as revealed by social network graphical analysis. Antiquity, 86, 507–523.
Goodreau, S., Kitts, J., & Morris, M. (2009). Birds of a feather or friend of a friend? Using exponential random graph models to investigate adolescent friendship networks. Demography, 46(1), 103–126.

Graham, S. (2006a). Networks, agent-based models and the Antonine Itineraries: Implications for Roman archaeology. Journal of Mediterranean Archaeology, 19, 45–64.
Graham, S. (2006b). Ex figlinis, the network dynamics of the Tiber Valley brick industry in the hinterland of Rome. Oxford, UK: BAR Int. Ser. 1486.
Graham, S. (2014). On connecting stamps—Network analysis and epigraphy. Les Nouvelles de l’archéologie, 135, 39–44.
Granovetter, M. (1973). The strength of weak ties. American Journal of Sociology, 78(6), 1360–1380.
Haggis, D. C. (2002). Integration and complexity in the late prepalatial period: A view from the countryside in eastern Crete. In Y. Hamilakis (Ed.), Labyrinth revisited: Rethinking Minoan archaeology (pp. 120–142). Oxford, UK: Oxbow Books.
Harding, A. F. (2013). World systems, cores, and peripheries in prehistoric Europe. European Journal of Archaeology, 16, 378–400.
Hart, J. P., & Engelbrecht, W. (2012). Northern Iroquoian ethnic evolution: A social network analysis. Journal of Archaeological Method and Theory, 19(2), 322–349.
Hart, J. P., Shafie, T., Birch, J., Dermarkar, S., & Williamson, R. F. (2016). Nation building and social signaling in southern Ontario: A.D. 1350–1650. PLoS One, 11(5), e0156178.
Hodder, I. (2012). Entangled: An archaeology of the relationships between humans and things. Malden, MA: Wiley-Blackwell.
Hodder, I., & Mol, A. (2016). Network analysis and entanglement. Journal of Archaeological Method and Theory, 23(4), 1066–1094.
Hodos, T. (Ed.). (2017). The Routledge handbook of archaeology and globalization. London, UK: Routledge.
Hunt, T. L. (1988). Graph theoretic network models for Lapita exchange: A trial application. In P. V. Kirch & T. L. Hunt (Eds.), Archaeology of the Lapita cultural complex: A critical review (pp. 135–155). Research Report 5. Seattle, WA: Thomas Burke Memorial Washington State Museum.
Irwin, G. J. (1978). Pots and entrepôts: A study of settlement, trade and the development of economic specialization in Papuan prehistory. World Archaeology, 9, 299–319.
Isaksen, L. (2008). The application of network analysis to ancient transport geography: A case study of Roman Baetica. Digital Medievalist, 4. http://www.digitalmedievalist.org/journal/4/isaksen/
Jagoda, P. (2016). Network aesthetics. Chicago, IL: University of Chicago Press.
Jenkins, D. (2001). A network analysis of Inka roads, administrative centers, and storage facilities. Ethnohistory, 48(4), 655–687.
Jennings, J. (2006). Core, peripheries, and regional realities in Middle Horizon Peru. Journal of Anthropological Archaeology, 25, 346–370.
Jennings, J. (2011). Globalizations and the ancient world. Cambridge, UK: Cambridge University Press.
Kadushin, C. (2012). Understanding social networks: Theories, concepts and findings. Oxford, UK: Oxford University Press.
Kitts, J. A., & Huang, J. (2010). Triads. In A. G. Barnett (Ed.), Encyclopedia of social networks. New York, NY: Sage.
Knappett, C. (2011). An archaeology of interaction: Network perspectives on material culture and society. Oxford, UK: Oxford University Press.
Knappett, C. (Ed.). (2013). Network analysis in archaeology: New approaches to regional interaction. Oxford, UK: Oxford University Press.
Knappett, C. (Ed.). (2014). Special edition of Les Nouvelles de l’Archéologie: Les réseaux sociaux en archéologie.

Knappett, C., Evans, T., & Rivers, R. (2008). Modelling maritime interaction in the Aegean Bronze Age. Antiquity, 82, 1009–1024.
Knappett, C., Evans, T., & Rivers, R. (2011). The Theran eruption and Minoan palatial collapse: New interpretations gained from modelling the maritime network. Antiquity, 85, 1008–1023.
Kristiansen, K. (2014). Towards a new paradigm? The third science revolution and its possible consequences in archaeology. Current Swedish Archaeology, 22, 11–34.
Kristiansen, K., & Larsson, T. B. (2005). The rise of Bronze Age society: Travels, transmissions and transformations. Cambridge, UK: Cambridge University Press.
Kristiansen, K., Lindkvist, T., & Myrdal, J. (Eds.). (2018). Trade and civilisation: Economic networks and cultural ties, from prehistory to the early modern era. Cambridge, UK: Cambridge University Press.
Larson, K. A. (2013). A network approach to Hellenistic sculptural production. Journal of Mediterranean Archaeology, 26(2), 235–260.
Leidwanger, J. (2011). Maritime archaeology as economic history: Long-term trends of Roman commerce in the northeast Mediterranean (Unpublished doctoral dissertation). University of Pennsylvania.
Leidwanger, J., & Knappett, C. (Eds.). (2018). Maritime networks in the ancient Mediterranean world. Oxford, UK: Oxford University Press.
Leidwanger, J., Knappett, C., Arnaud, P., Arthur, P., Blake, E., Broodbank, C., . . . Van de Noort, R. (2014). A manifesto for the study of ancient Mediterranean maritime networks. Antiquity+.
Malkin, I. (2011). A small Greek world: Networks in the ancient Mediterranean. Oxford, UK: Oxford University Press.
Menze, B. H., & Ur, J. A. (2012). Mapping patterns of long-term settlement in Northern Mesopotamia at a large scale. Proceedings of the National Academy of Sciences, 109, E778–787.
Mills, B. J. (2017). Social network analysis in archaeology. Annual Review of Anthropology, 46, 379–397.
Mills, B. J., Clark, J. J., Peeples, M. A., Haas, W. R., Jr., Roberts, J. M., Jr., Hill, J. B., . . . Shackley, M. S. (2013). Transformations of social networks in the late pre-Hispanic US Southwest. Proceedings of the National Academy of Sciences, 110(15), 5785–5790.
Mills, B. J., Peeples, M. A., Haas, W. R., Jr., Borck, L., Clark, J. J., & Roberts, J. M., Jr. (2015). Multiscalar perspectives on social networks in the Late Prehispanic Southwest. American Antiquity, 80, 3–24.
Mizoguchi, K. (2009). Nodes and edges: A network approach to hierarchisation and state formation in Japan. Journal of Anthropological Archaeology, 28, 14–26.
Mol, A. A. A. (2014). The connected Caribbean: A socio-material network approach to patterns of homogeneity and diversity in the pre-colonial period Caribbean. Leiden, Netherlands: Sidestone Press Dissertations.
Munson, J., & Macri, M. J. (2009). Sociopolitical network interactions: A case study of the Classic Maya. Journal of Anthropological Archaeology, 28, 424–438.
Nakoinz, O. (2013). Räumliche interaktionsmodelle. Praehistorische Zeitschrift, 88, 226–257.
Orengo, H. A., & Livarda, A. (2016). The seeds of commerce: A network analysis-based approach to the Romano-British transport system. Journal of Archaeological Science, 66, 21–35.
Ossa, A. (2013). Using network expectations to identify multiple exchange systems: A case study from postclassic Sauce and its hinterland in Veracruz, Mexico. Journal of Anthropological Archaeology, 32, 415–432.

Östborn, P., & Gerding, H. (2014). Network analysis of archaeological data: A systemic approach. Journal of Archaeological Science, 46, 75–88.
Östborn, P., & Gerding, H. (2015). The diffusion of fired bricks in Hellenistic Europe: A similarity network analysis. Journal of Archaeological Method and Theory, 22, 306–344.
Padgett, J. F., & Powell, W. W. (2012). The emergence of organizations and markets. Princeton, NJ: Princeton University Press.
Pearce, E., & Moutsiou, Y. (2014). Using obsidian transfer distances to explore social network maintenance in late Pleistocene hunter-gatherers. Journal of Anthropological Archaeology, 36, 12–20.
Peeples, M. (2018). Connected communities: Social networks, identity, and social change in the ancient Cibola world. Tucson, AZ: University of Arizona Press.
Peeples, M. (2019). Finding a place for networks in archaeology. Journal of Archaeological Research, 27(4), 451–499.
Peeples, M., & Haas, W. R., Jr. (2013). Brokerage and social capital in the prehispanic U.S. Southwest. American Anthropologist, 115(2), 232–247.
Peregrine, P. (1991). A graph-theoretic approach to the evolution of Cahokia. American Antiquity, 56, 66–76.
Read, D. W. (2008). A formal explanation of formal explanation. Structure and Dynamics, 3(2), 1–16.
Rihll, T. E., & Wilson, A. G. (1987). Spatial interaction and structural models in historical analysis: Some possibilities and an example. Histoire et Mésure, 2(1), 5–32.
Rihll, T. E., & Wilson, A. G. (1991). Modelling settlement structures in ancient Greece: New approaches to the polis. In J. Rich & A. Wallace-Hadrill (Eds.), City and country in the ancient world (pp. 59–96). London, UK: Routledge.
Rivers, R. (2016). Can archaeological models always fulfill our prejudices? In T. Brughmans, A. Collar, & F. Coward (Eds.), The connected past: Challenges to network studies in archaeology and history (pp. 123–147). Oxford, UK: Oxford University Press.
Ruffini, G. (2008). Social networks in Byzantine Egypt. Cambridge, UK: Cambridge University Press.
Santley, R. S. (1991). The structure of the Aztec transport network. In C. Trombold (Ed.), Ancient road networks and settlement hierarchies in the new world (pp. 198–210). Cambridge, UK: Cambridge University Press.
Scholnick, J. B., Munson, J. L., & Macri, M. J. (2013). Positioning power in a multi-relational framework: A social network analysis of Classic Maya political rhetoric. In C. Knappett (Ed.), Network analysis in archaeology: New approaches to regional interaction (pp. 95–124). Oxford, UK: Oxford University Press.
Schortman, E. M. (2014). Networks of power in archaeology. Annual Review of Anthropology, 43, 167–182.
Scott, J. (2013). Social network analysis (3rd ed.). London, UK: Sage.
Seland, E. H. (2013). Networks and social cohesion in ancient Indian Ocean trade: Geography, ethnicity, religion. Journal of Global History, 8, 373–390.
Sindbæk, S. M. (2007). The small world of the Vikings: Networks in early Medieval communication and exchange. Norwegian Archaeological Review, 40(1), 59–74.
Sindbæk, S. M. (2013). Broken links and black boxes: Material affiliations and contextual network synthesis in the Viking world. In C. Knappett (Ed.), Network analysis in archaeology: New approaches to regional interaction (pp. 71–94). Oxford, UK: Oxford University Press.

Sporns, O. (2014). Contributions and challenges for network models in cognitive neuroscience. Nature Neuroscience, 17(5), 652–660.
Tartaron, T. F. (2013). Maritime networks in the Mycenaean world. Cambridge, UK: Cambridge University Press.
Terrell, J. (1977). Human biogeography in the Solomon Islands. Chicago, IL: Field Museum of Natural History.
Terrell, J. (2013). Social network analysis and the practice of history. In C. Knappett (Ed.), Network analysis in archaeology: New approaches to regional interaction (pp. 17–41). Oxford, UK: Oxford University Press.
Van Oyen, A. (2016). Networks or work-nets? Actor-network theory and multiple social topologies in the production of Roman terra sigillata. In T. Brughmans, A. Collar, & F. Coward (Eds.), The connected past: Challenges to network studies in archaeology and history (pp. 35–56). Oxford, UK: Oxford University Press.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393, 440–442.
Weidele, D., van Garderen, M., Golitko, M., Feinman, G. M., & Brandes, U. (2016). On graphical representations of similarity in geo-temporal frequency data. Journal of Archaeological Science, 72, 105–116.
Wilson, A. G. (1971). A family of spatial interaction models, and associated developments. Environment and Planning, 3, 1–32.
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4), 452–473.

Chapter 25

Networks, Kin, and Social Support

G. Robin Gauthier

Family relationships and the emotional and instrumental exchanges embedded within them are among the most important sources of social support available to individuals throughout the life course. This chapter provides an overview of three approaches to family research that conceptualize families as networks. The three approaches explore how family relationships provide differential access to social support. The basic idea throughout the chapter is that family structure (e.g., single parent, stepparent, blended stepfamily) is not a sufficient proxy for a person’s access to social support, but network approaches offer a potential solution. Family researchers recognize the heterogeneity of families and the experiences people have within the same family structure (Ganong & Coleman,  2004; Stewart,  2007). They have long understood families as systems of interdependent relationships with their own emergent properties. Sibling alliances, rivalry, and parental and spousal dynamics all emerge from the interactions that take place within the wider relational context of the family system (see Cox & Paley, 1997; Cox & Paley, 2003, for overviews). Because these properties are emergent, they (and the support or opposition embedded within them) cannot be reduced to family structure. However, if social support cannot be inferred from family structure directly, formally measuring supportive relationships is a difficult conceptual and methodological problem. We begin the chapter with a discussion of current approaches that render the language of families as systems into configurations of measurable network structures, with a focus on how to measure the capacity for social support embedded within them. We then turn the discussion to research that employs a more inclusive understanding of family, widening the definition of what “counts” as kin (Powell, 2017). We discuss approaches that conceptualize family as a system of exchanges. 
We end the chapter with a discussion of a new approach to measuring family relationships rooted in a network theory of social roles, defining family roles by the kinds of activities people do with each other at home. This definition allows us to measure exchanges of social support embedded within relationships directly.

We now turn our discussion to work that has applied network measures to family ties. The examples we discuss depart from traditional measurements of families, which take traditionally named roles as a proxy for the support available within the family unit. Instead, the approaches we focus on use family networks to measure emergent, system-level properties and their capacities to provide social support. The measures discussed may, but need not, use traditional kinship labels. These approaches answer recent calls to measure families under a more expansive definition that admits nontraditional kin, to better reflect what some scholars have called the more voluntary character of contemporary families (Aeby, 2016). For example, Widmer (2010) argues for widened criteria for family forms, beyond cohabitation: families need not be bounded by households, consanguinity, marriage, or, indeed, any other set criteria. Instead, the criterion for family membership is simply that a person has been designated a family member.

The structural properties of family networks collected through these methods can be used to gauge their capacity for providing access to resources through the personal relationships that constitute social networks. Size, reciprocity, density, transitivity, betweenness, and embeddedness into wider networks are properties of family networks that contribute to the family’s capacity to provide social support to its members. Network methods are useful because measuring these properties (except for network size) is impossible within a traditional accounting of kinship. Traditional kinship models assume that relationships are reciprocal (if individual A is related to B, then B is related to A) and transitive (if A is related to B and A is related to C, then B is related to C). A network approach is flexible enough to measure family systems when families don’t conform to these principles.

Data requirements vary across these measures because they occur at different scales. Network size can be measured using data collected from a single individual, but a measure of reciprocity requires data to be collected from the focal individual and his or her alters. Density, transitivity, betweenness, and embeddedness all require the researcher to measure relationships between every pair of individuals. In this section, we discuss several examples that have applied these measures to family networks.

Size

The number of people an individual considers kin is one of the most fundamental characteristics of his or her family network and the simplest measure of bonding social capital. Bonding social capital encourages trust among members, the enforcement of local norms, and conformity. Aeby (2016) found that Swiss individuals’ family networks included, on average, 3.9 people (with a range from 0 to 31). In general, the more family relationships a person has, the more access they have to resources in times of need (Widmer, 2010), but isolation (no family ties) is likely to be particularly deleterious. People who have no contact with their family members are more likely to suffer suicidal ideation (de Catanzaro, 1995), and those who live alone are more likely to be socially isolated (Cacioppo & Hawkley, 2003).

Density

Density is another potential indicator of a network’s capacity for bonding social capital. A network is denser when network alters (or family members) have more relationships with each other. When networks are denser, these interconnections make possible rapid coordination among network alters to mobilize support and facilitate strong emotional bonds (Cornwell, 2012).

Gauthier and Moody (2013) examined the range of genetic density across households in the United States. The household membership rosters were obtained by pooling 10 years of data collected as part of the General Social Survey. The rosters were transformed into networks of genetic relationships weighted by the proportion of shared genetics implied by the kinship term used to represent the relationship. Relationships were recorded as directed (parents to children) or bidirectional (relations between children) and assigned a weight according to the proportion of genetic material shared (on average) by the pair. Finally, the genetic density of the household was calculated by dividing the total weight of the edges by the maximum possible weight. The distribution of genetic density across household types ranges from 0 (no one is related) to 1 (a single parent lives with the children).

They used this measure to differentiate household types that were obtained by subjecting the household membership roster to a clustering algorithm that distinguished types of relationships: legal (married/unmarried), lineal (is a child/parent of), and collateral (is a sibling of). These distinctions yielded 19 observed kinship classes, which were further divided by each member’s gender. They then compared the genetic density within households headed by cohabiting parents with children present to those headed by married parents with children present. They found that genetic density was higher among households with married couples and children than among those with cohabiting couples and children. This finding suggests that households headed by married parents might be best suited to provide bonding capital (Widmer, 2010).
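The genetic density calculation described above (total edge weight divided by the maximum possible weight) can be illustrated with a minimal sketch. This is not the authors' code. The relatedness values, the treatment of every tie as an undirected pair, and the choice of 0.5 as the maximum weight per pair are assumptions made for illustration; they were chosen so that a single parent living with children scores 1, as in the range reported above.

```python
from itertools import combinations

# Assumed average relatedness per kinship term (illustrative values only).
RELATEDNESS = {
    "parent-child": 0.5,
    "full-sibling": 0.5,
    "half-sibling": 0.25,
    "spouse": 0.0,
    "unrelated": 0.0,
}
MAX_WEIGHT = 0.5  # assumption: the closest possible household tie shares half its genes


def genetic_density(members, ties):
    """Total tie weight divided by the maximum possible weight over all pairs.

    members: list of household member ids
    ties: dict mapping frozenset({a, b}) to a kinship term; missing pairs are unrelated
    """
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0  # a person living alone has no ties to weigh
    total = sum(RELATEDNESS[ties.get(frozenset(pair), "unrelated")] for pair in pairs)
    return total / (MAX_WEIGHT * len(pairs))


single_parent = genetic_density(
    ["parent", "child1", "child2"],
    {
        frozenset({"parent", "child1"}): "parent-child",
        frozenset({"parent", "child2"}): "parent-child",
        frozenset({"child1", "child2"}): "full-sibling",
    },
)
married_with_child = genetic_density(
    ["mother", "father", "child"],
    {
        frozenset({"mother", "father"}): "spouse",
        frozenset({"mother", "child"}): "parent-child",
        frozenset({"father", "child"}): "parent-child",
    },
)
roommates = genetic_density(["a", "b", "c"], {})
```

Under these assumptions the single-parent household scores 1, the married couple with one child scores two thirds (the spousal tie carries no genetic weight), and the unrelated roommates score 0.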

Betweenness

The extent to which individuals are instrumental in holding together various components of their own social network is an indicator of their bridging social capital (Aeby, 2016; Widmer, 2010). Bridging capital provides access to new information and new social ties. Bonding social capital may be mobilized more quickly than bridging social capital, but it may also be more heavily constraining. Bridging social capital encourages autonomy and it may be preferred when an individual needs more flexible sources of support (Aeby, Widmer, & Carlo, 2014). If an individual is the only person that joins multiple social worlds (as children do after their parents’ divorce), he or she has high bridging social capital and lower bonding capital (Widmer, 2006). This bridging property has been operationalized as betweenness centrality. For example, Aeby (2016) found that women had higher betweenness centrality in their emotional support networks, suggesting that they tend to provide others with social support.
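To make the bridging idea concrete, the sketch below computes unnormalized betweenness centrality with Brandes' algorithm for a small undirected, unweighted network. The network itself is hypothetical: a child who is the only link between the mother's and the father's households after a divorce. None of the names or ties come from the studies cited above.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes' algorithm)."""
    scores = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                scores[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {v: c / 2 for v, c in scores.items()}


# Hypothetical post-divorce network: the child belongs to both households.
family = {
    "child": ["mother", "stepfather", "father", "stepmother"],
    "mother": ["child", "stepfather"],
    "stepfather": ["child", "mother"],
    "father": ["child", "stepmother"],
    "stepmother": ["child", "father"],
}
centrality = betweenness(family)
```

In this toy network every shortest path between the two households runs through the child, so the child has a betweenness of 4 (one for each cross-household pair) and every adult has 0.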

Transitivity

The other side of being in a position to bridge family networks is the increased potential for boundary ambiguity. Boundary ambiguity is an important concept in family research that lends itself to analysis through the lens of family configurations. Boundary ambiguity occurs when there is disagreement among family members about who is (and who is not) part of the family system and/or what role each member should perform (Carroll, Olson, & Buckmiller, 2007). Any given individual’s family configuration may (and frequently does) diverge from those of the people he or she considers family. Such divergence is impossible in a traditional accounting, in which family relationships are transitive: a relationship between A and B and between B and C implies a relationship between A and C. For example, if A is a sibling of B and B is a child of C, then A is a child of C. Thus, the concept of boundary ambiguity can be operationalized as intransitivity. The property is thus strongly linked to growing rates of stepfamilies and cohabiting partners, as boundary ambiguity is more likely to be found among stepfamilies compared to adoptive or biologically based families (Hobart, 1988; Stewart, 2005).

In one study where boundary ambiguity was operationalized as intransitivity, Brown and Manning (2009) compared household rosters that had been collected independently from parents and children to determine whether the rosters expressed agreement on the family’s living arrangements. Disagreements between parent and child reports were most pronounced when one report included a cohabiting stepfather. Castren and Widmer (2015) also explored the concept of boundary ambiguity among stepfamilies by comparing mothers’, partners’, and children’s descriptions of the family. They found that mothers perceived the family boundary more exclusively, on the whole, than stepfathers or their children. These disagreements concerning whether the stepparent is really part of the family (or even a resident of the household) can result in strained relationships that ultimately provide the child with less social support and the adult partnership with lower relational quality (Stewart, 2005; Whitsett & Land, 1992).
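One way to picture intransitivity as a marker of boundary ambiguity is to list the triples in which A names B as family and B names C, but A does not name C. The sketch below is only an illustration under that reading. The nomination lists are invented, and the function name is hypothetical rather than drawn from any of the studies cited.

```python
def intransitive_triples(names):
    """Return (a, b, c) where a names b, b names c, but a does not name c."""
    found = []
    for a, named_by_a in names.items():
        for b in named_by_a:
            for c in names.get(b, set()):
                if c != a and c not in named_by_a:
                    found.append((a, b, c))
    return found


# Hypothetical stepfamily: who each person lists as a family member.
names = {
    "child": {"mother", "father"},
    "mother": {"child", "stepfather"},
    "stepfather": {"mother", "child"},
    "father": {"child"},
}
ambiguous = intransitive_triples(names)
```

Here the child names the mother, and the mother names the stepfather, but the child does not name the stepfather, so that triple is flagged. A family in which everyone names everyone else would produce an empty list.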

Reciprocity

Family relationship terms are also reciprocal by definition. If X is a sibling of Y, then Y is a sibling of X, or if X is a parent of Y, then Y is a child of X. However, these relationships may not be reciprocal in terms of either affection or support. Recent studies have shown that older generations report more affection for younger generations than the younger generation does for the older generation (Giarrusso, Stallings, & Bengtson, 1995; Giarrusso et al., 2001; Swartz, 2009). Instrumental support is lagged across time, with the younger generation receiving support when they are young and providing support as their parents age (Silverstein et al., 2002). These findings demonstrate that there is a wide range of reciprocity in family relationships rooted in long-term exchanges of instrumental support and affection as well as social norms of familial responsibility. Divorce and remarriage, in particular, can disrupt these important intergenerational exchanges (Ganong, Coleman, & Rothrauff, 2009).
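A simple way to quantify reciprocity in support, as opposed to reciprocity in kinship terms, is the share of directed support ties that are returned. The sketch below assumes that definition (one of several in use) and an invented set of "gives support to" ties. It is a static snapshot and does not capture the lagged exchange across the life course described above.

```python
def reciprocity(ties):
    """Share of directed ties (giver, receiver) that are returned by the receiver."""
    ties = set(ties)
    if not ties:
        return 0.0
    returned = sum(1 for giver, receiver in ties if (receiver, giver) in ties)
    return returned / len(ties)


# Hypothetical "gives support to" ties at one point in time.
support = {
    ("grandmother", "mother"),
    ("mother", "grandmother"),
    ("mother", "child"),
    ("grandmother", "child"),
}
share_returned = reciprocity(support)
```

In this example the mother and grandmother support each other, while support to the child flows one way, so half of the ties are reciprocated.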

Embeddedness

The embeddedness of a particular dyadic relationship into the larger network has potentially important implications for that dyad (Simmel, 1908). Duo-centric networks combine information about spouses’ personal social networks to understand how their relationship is embedded within a wider social network (Kennedy et al., 2015). Cornwell (2012) argues that the cost of dissolving the union is expected to be lower if spouses’ social networks do not overlap. In addition, overlapping networks may strengthen the spousal dyad by allowing each spouse to gain insight into the other and by facilitating coordinated support. His findings support these arguments: shared social contacts resulted in more open dialogue between spouses and stronger perceptions that the spouse could be relied on for support (Cornwell, 2012). Felmlee (2001) argued that the opinions of friends and family could have their own influence on spousal relationship stability. She found that relationship stability is bolstered by friends’ and family members’ approval of the spouse. However, she did not find that friendship overlap contributed to relationship stability. Widmer (2010) likewise found that partnerships that were supported by others had higher quality than those that were embedded in sparse or unsupportive networks. Support from the wider network also improved the quality of parent-child dyads.

The preceding discussion showed how researchers have successfully applied many traditional network measures to understanding the social support embedded in family networks. Both traditional and nontraditional definitions of family have been employed, and family researchers have used several different network boundaries, from the household to wider, extrafamilial networks. All the examples discussed earlier emphasize that the pattern of family ties helps determine whether the relationships are likely to have the capacity to provide support to the individuals embedded within them.

Families as Systems of Exchanges

The idea that family relationships are important because they are a system of exchanges, not because of their essential characteristics, underlies much of the push for a reconceptualization of how to measure families. Cluster analysis has recently been used to capture the emergence of configurations of family relations within some prespecified boundary (Widmer & La Farga, 2000). Many different clustering routines have been employed, but they have been used with the same purpose: to reveal relational patterns that have thus far remained hidden, to better grasp the underlying principles of contemporary family life. As in the previous examples, the relationships under study can, but need not, employ a traditional definition of family.

In their analysis of household configurations, Gauthier and Moody (2013) used cluster analysis to describe the living arrangements of US households using data from the General Social Survey (GSS) pooled over 10 years, between 2000 and 2010. The GSS asked respondents detailed questions about the relationships between household members including the relationship between the respondent and each household member, and the relationship between his or her spouse or partner and each household member. Gauthier and Moody characterized each relationship within the household according to three properties: legal status (married/unmarried), biological lineage (is a child/parent of), and collateral kinship status (sibship). They found 19 types of relationships, which they further divided by gender, yielding 38 distinct terms. They then recorded the presence or absence of each of these 38 relationships for each household. Finally, the authors subjected these household records to a K-means clustering model.

They found that US households could be partitioned into 10 types. Five clusters contained newer/nontraditional household configurations, including living alone or with unrelated roommates, living with children but with no spouse, and cohabiting with and without children. The rest of the household configurations contained more traditional family forms, including married couples with and without children and a small number of households that include extended kin. They found differences in cluster membership based on gender and the marital status of adults. Single women were more likely to live with children than single men. Single women who live with their children were distinct enough to be placed into a separate cluster. Single men who live with their children, however, were not. They, in fact, have household membership profiles that are more similar to the patterns found among other single men. Single men living both with and without children tend to live with roommates and other relatives, much like single, childless women. However, when single women live with their children, they are less likely to live with other family members (except their own mothers). Households containing cohabiting couples were found to be more diverse than households headed by a married couple, whether or not there are children present. The authors argue that this result suggests that no single profile has yet emerged for cohabiters: the boundaries around their households are more permeable than those around married households. This study is an example of how a researcher might explore the interplay between traditional constituents of kinship and a more inductive approach to understanding how people arrange themselves into households.
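The clustering step described above, in which each household is a presence/absence vector of relationship terms partitioned by K-means, can be sketched with a plain implementation of Lloyd's algorithm. Everything specific here is an assumption for illustration: the four column names stand in for the study's 38 terms, the six households are invented, the starting centroids are fixed by hand so the run is deterministic, and two clusters are requested rather than ten.

```python
def kmeans(points, centers, iterations=20):
    """Plain Lloyd's algorithm; `centers` holds the starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(len(centers)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(list(old))  # keep an empty cluster's centroid
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical columns: spouse present, child present, roommate present, parent present.
households = [
    [1, 1, 0, 0],  # married couple with a child
    [1, 1, 0, 0],
    [1, 0, 0, 0],  # married couple, no children
    [0, 0, 1, 0],  # unrelated roommates
    [0, 0, 1, 0],
    [0, 1, 0, 1],  # single parent living with a child and own parent
]
labels, centers = kmeans(households, centers=[households[0], households[3]])
```

With these inputs the roommate households form one cluster and all households containing kin form the other. A real analysis would also need a principled choice of the number of clusters and of the starting centroids, which this sketch sidesteps.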
In a series of studies, Widmer (2010) asked college students to list their “significant family members.” He emphasized that respondents could feel negatively about the people they named, and they need not have frequent face-to-face interaction so long as they were an influential person. In one study of 25 students, 19 types of kin were named. Almost all the students named their mother, most named their siblings and father, and many named other, more distant types of kin; only seven included a nontraditional family member (their romantic partners). Another study in the series uncovered eight family configurations defined through a hierarchical clustering routine performed on data collected from 229 college students. In general, traditionally defined family members were most frequently named, but a significant proportion of the sample included a nontraditional family member. Friends (30% named a female friend, 20% included a male friend) and romantic partners (39% included a partner) were especially common. Only 10% included a stepfather and stepmother, and perhaps most surprisingly 7% included their partner’s mother.

Following a similar approach, Aeby (2016) also uncovered eight family configurations using cluster analysis from data obtained in a nationally representative sample of 803 Swiss respondents. The inclusion criterion for family membership was once again broadly defined, with no reference to consanguinity or coresidence. Respondents were asked: “Who are the individuals who, over the past year, have been very important to you, even if you have not gotten along well with them?” Two configurations included unconventional elements alongside conventional ones, one contained only nonkin, and three contained only conventional kin (the last group listed no one). Together, 40% of the sample belonged to a configuration that included someone who would not be traditionally defined as family.

These studies begin to provide a picture of the variety of family forms that exist today. They also highlight the importance of individuals outside the traditional definitions of family. Family researchers are also interested in how the contents of family exchanges relate to and mutually reinforce each other and in the affective or emotional content of family ties. In general, two types of exchanges can be distinguished: expressively oriented and instrumentally oriented exchanges.

Interactions are expressively oriented when youth spend time with their family members for the primary purpose of being together. Examples of expressively oriented actions include watching TV, playing catch, or discussing the day’s events. Offer (2013) found that while they were engaged in leisure activities with their families, adolescents reported positive emotions. Likewise, Padilla-Walker, Coyne, and Fraser (2012) found that family cell phone use, watching movies and TV, and playing video games all brought adolescents closer to their family.

Instrumentally oriented tasks bind individuals to groups by inducing positive sentiment toward the group as a social unit. An action is instrumentally oriented if its primary purpose is to accomplish a tangible end, like getting the dishes done or a baby’s diaper changed. Experiments have provided evidence of a causal connection between participating in task-oriented exchanges and positive affect. Lawler (2001) argued that successful dyadic exchange produces positive emotion that the actor attributes to that particular relationship. When exchanges take place within the context of a group, the positive emotion generated from successful exchanges becomes attributed to the social unit. However, observational research provides counterevidence of the effect of doing routine maintenance with family members.
Offer (2013) found that while adolescents were engaging in routine household maintenance (chores) with both parents, they reported lower well-being; when only one parent was present, there was no detectable difference in their affect. It thus remains unclear how instrumentally oriented ties cumulate into positive affect within relationships.

In a working paper, Gauthier (2018) takes steps toward addressing this gap by using patterns of parent-child interactions to describe relationships from the child’s perspective. She clusters patterns of time-use data collected within the third wave of the Panel Study of Income Dynamics, Child Development Supplement (PSID-CDS-III) to measure the association (emotionally oriented interactions) and resource-sharing (instrumentally oriented) components of intergenerational solidarity. The sample is the subset of PSID households participating in 1997 with children aged 0 to 12 years old. Eighty-eight percent (2,380) of eligible households provided information on 3,563 children. In 2007 the supplement was administered again to those 1,506 then-adolescents who were still younger than 18 years old. Up to two adolescents from each studied household were asked to record their activities and their duration for the previous 24 hours for one weekday and one weekend day. Ultimately time diaries were collected from 1,442 adolescents, and of those, 1,298 rated the quality of their relationships with their family members. Each social or task-oriented activity the child participated in as the helper or the recipient was coded into one of four interaction types: (1) expressive, mutually oriented activities; (2) assistance received; (3) productive (cooperative) exchange; and (4) solitary household task. The results reveal six unique relational patterns that were found to have different associations with the focal child’s feelings of closeness to his or her family members.
The youth in the four most relationally integrated groups have higher closeness on average, they have statistically indistinguishable levels of closeness compared with each other, and they are statistically significantly closer to their family members than youth in the final two categories. The description of the six clusters proceeds from the cluster reporting the highest average closeness to the least.

Twenty adolescents (1.15% of the sample) were found to have a dependent type of relational integration. They are most clearly differentiated from the others based on the relatively high amount of assistance directed toward them. These adolescents are both expressively and instrumentally integrated. They receive instrumental assistance, but they contribute less labor to the household than expected based on the frequency of their expressive activities. They report the most positive relations with their family members. One hundred and ten adolescents (8.5%) experience a balanced type of relational integration: they engage in both instrumental and expressive activities with their family members. They engage in about as many expressively oriented activities with their family members as the dependent group, but they also contribute to one more cooperative effort and more solitary household tasks compared with the dependent group. They have strong positive family relations, higher than all other groups except the dependent group. Three hundred and twenty-two adolescents (26.2%) have family relations that are mainly characterized by expressive integration. They have moderate levels of mutually oriented activities, and they are less integrated through instrumental ties than others. Twenty-one adolescents (2.36%) are overintegrated; they are more integrated through expressive and instrumental ties than adolescents in any other cluster. Half of the adolescents in the sample (and the largest group by far; n = 662) have family relationships that are relatively less integrated than all of the other groups. They contribute little instrumental assistance to the family unit and are tied through leisure activities.
For the final 11.6% of the sample (n = 163), family interactions are limited, and solitary household tasks are far more common instead. These findings show that cooperation is a crucial characteristic of family interactions. The completion of routine household tasks threatens family relationships unless balanced out with mutually oriented activities and cooperative tasks. Examining clusters of family interactions showed that adolescents with a more balanced set of interactions, rather than those with the highest volume, had the closest relationships with their family members, although the difference did not reach statistical significance.

Defining Family Roles through Configurations of Interactions

Contemporary family forms are difficult to study because their boundaries frequently shift. Children link their parents’ households after divorce (or nonmarital birth), and repartnering creates many relationships that did not exist four decades ago. Structural changes have outpaced society’s cultural repertoire of available terminology, which is often inadequate to describe these new family types. Widening the net of family terminology by expanding the number of categories that survey respondents can choose from to enumerate their families will reduce some measurement problems but will not solve the underlying problem. A longer list will not provide respondents or researchers an adequate vocabulary as there is no consensus on what terms should be used or what terms are missing.

In response to the gap between traditional labels and real family relationships, researchers have called for studies that move away from asking what the family is and ask instead what families do (Allen, Blieszner, & Roberto, 2011; Nelson, 2006; Scanzoni & Marsiglio, 1991). Family relationships can be classified and social roles identified through the operating principles of “does for” and “does with” rather than the traditional principles of “is a parent of” and “is married to.” Gauthier (2014) illustrates the possibility of deriving family roles from interaction data using time-use diaries from the third wave of the PSID-CDS-III. The basic idea is that two children who interact in the same way with their “family members” occupy the same kind of role relation, no matter how (or whether) they are related using formal (possibly outdated) kinship rules.

Gauthier uses a modified measurement of role equivalence (Winship & Mandel, 1983). Winship and Mandel’s method was developed specifically to have a measure of role equivalence that can be used to compare roles across networks. For each pair of actors in the data, they generate a binary vector, which they call the role relation, recording the presence or absence of all the possible relations the two could have. The aggregate set of actors’ role relations is their role set, and actors who have the most similar role sets are considered to be role equivalent. In the following analyses, two individuals are role equivalent when they share the same set of direct relations with their respective focal children. The interactions recorded within each child’s time diary were transformed into a matrix with up to 10 rows (alter types) and 64 columns (activity types), with each cell recording the presence or absence of that activity with that alter. The resulting 6,873 ego networks were stacked to create three matrices, one for each wave.
In 1997, 11,423 alters were reported, 9,112 in 2002, and 5,332 in 2007. The distance between the rows (child-alter interactions) within each of the three matrices was calculated using the matching coefficient. The distance matrix was then subjected to a hierarchical clustering method. Eighteen roles emerged in total: 16 in the first wave and 8 in both the second and third waves. Two new roles that emerged in the second and third waves were not present in the first. Gauthier classified the roles into five substantive categories based on their shared content (not structure or integration): caring, affectionate, limited, entwined, and friendly.
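The two steps named above, a matching-coefficient distance between binary interaction rows followed by hierarchical clustering, can be sketched as follows. The simple matching coefficient is the share of positions on which two binary vectors agree, and the distance used here is one minus that share. The linkage rule is not stated in the text, so average linkage is an assumption, as are the four activity columns and the five alter rows.

```python
def matching_distance(u, v):
    """One minus the simple matching coefficient of two equal-length binary vectors."""
    agreements = sum(1 for a, b in zip(u, v) if a == b)
    return 1 - agreements / len(u)


def agglomerate(rows, k):
    """Naive agglomerative clustering (average linkage, assumed) down to k clusters."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        total = sum(matching_distance(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical activity columns: shared meals, play, physical care, talk.
alters = [
    [1, 1, 1, 0],  # alter 0
    [1, 1, 1, 1],  # alter 1
    [1, 0, 0, 0],  # alter 2: meals only
    [1, 0, 0, 0],  # alter 3: meals only
    [0, 1, 0, 1],  # alter 4: play and talk
]
roles = agglomerate(alters, k=3)
```

In this toy example the two meals-only alters merge first (distance 0), then the two alters who provide physical care, leaving the play-and-talk alter as its own role. The labels attached to such groups afterwards, such as "limited" or "caring", are substantive interpretations rather than output of the algorithm.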

Caring Roles

Three kinds of roles that closely resemble “mothering” were found in the 1997 sample. These “mothering” roles were distinguishable from the others because of their inclusion of physical care. Twice as many mothers, compared to fathers, took on these roles. For example, 17% of married mothers and 18% of cohabiting mothers enacted one of these care-based roles with the focal child versus 10% of married fathers and 15% of cohabiting fathers. In addition, a nontrivial proportion of siblings (13%) and grandparents (8%) took on these caring roles. The care-based roles are differentiated from one another by the diversity of active play and whether or not they watch TV.

Affectionate Roles

Three affectionate roles closely resemble what we might think of as traditionally “sibling” roles. The three roles were defined by high-affect, playful interactions with open displays of affection. In fact, siblings were twice as likely as parents (15% vs. about 6%) to enact one of these roles with the focal child in 1997, and the difference continues as the sample ages.

Limited Interaction Roles

Three roles were defined by shared mealtimes but otherwise limited interaction with the child. From a child’s perspective, these limited roles resemble the role of a traditional breadwinning father. However, many married, coresident fathers (32%) and married, coresident mothers (25%) took on this role with their child in the 1997 sample. Resident stepparents were even more likely to enact this kind of role. For example, 41% of married stepfathers and 33% of cohabiting stepfathers, and 27% of married stepmothers and 47% of cohabiting stepmothers were found to have limited relations with their stepchild. On the other hand, very few siblings had this kind of relationship with the focal child.

Entwined Lives Roles

Individuals whose roles are within the “entwined lives” category go about their daily lives without constant togetherness but share many domestic aspects of their days: they play with, talk with, do some chores with, and show some affection toward each other in addition to time spent apart. These are by far the most common parent-child relationships. Between a quarter and half of all parent types have this kind of relationship with their child in the 1997 sample. As the children age, those with involved family relationships tend to move into these roles. A higher proportion of all family relationships take on this character in the later samples (69% of fathers, 64% of mothers, 57% of stepfathers, and 50% of stepmothers).

Friendly Roles Individuals whose roles are one of the five within the friendly class interact with children more as peers than do parents in the other three classes, but they have more interaction than parents in the limited class. Two of the friendly roles also incorporate a few instances of physical care. In general, all five relationship types that make up the friendly class are characterized by either play or communication. These roles resemble how we might think of stepparent relationships. In fact, over a third of stepmothers and stepfathers are friendly with the focal child. About half of grandparents are also friendly with their grandchildren. Unsurprisingly, over 80% of friends in the 1997 sample were classified as friendly. Just over half of friends were classified as friendly among the older children (children in the 2002–2003 and 2007 samples), with many also being classified as “entwined.” This finding is in line with research that shows peers becoming more central to adolescents’ lives as they mature.

Gauthier used network methods to uncover family roles without reference to their traditional labels. She then mapped the emergent family roles back onto traditional family terms to assess the overlap between the two. She found that mothers who are biologically related to their child tend to “look like” mothers regardless of their residential status, while fathers’ type of relationship depends more on their residential status. Stepparents look more like distant relatives (not a strongly hierarchical relationship) than friends (peer based). Despite these trends, however, she stresses that the relational overlap between different traditionally labeled kin types is vast. Fathers, social fathers, siblings, and grandparents were all found to enact these “mothering” roles while many biological mothers did not. The promise of this work is that it allows the researcher to study the potential for support embedded in family roles as they are actually experienced without making assumptions about how (or whether) these roles map onto traditional family relationships.

In this chapter, we have reviewed three different perspectives that address the difficulty of measuring family processes on a large scale. The first section reviewed work that applied traditional network measures to family networks. The second explored the family as a system of directed exchanges, employing cluster analyses to classify patterns of exchanges into discrete types. Finally, the chapter discussed an inductive approach to uncovering family roles. The common theme throughout the chapter was the idea that supportive relations cannot be inferred through traditional kinship terms. This idea is very relevant in contemporary society where there are many ways to “do family,” and new tools to measure families are needed.

References
Aeby, G. (2016). Who are my people? Strengths and limitations of ego-centered network analysis: A case illustration from the Family tiMes Survey. FORS Working Paper Series, paper 2016–2. Lausanne, Switzerland: FORS.
Aeby, G., Widmer, E. D., & Carlo, I. D. (2014). Bonding and bridging social capital in step- and first-time families and the issue of family boundaries. Interpersona, 8(1), 149.
Allen, K. R., Blieszner, R., & Roberto, K. A. (2011). Perspectives on extended family and fictive kin in the later years: Strategies and meanings of kin reinterpretation. Journal of Family Issues, 32, 1156–1177.
Brown, S. L., & Manning, W. D. (2009). Family boundary ambiguity and the measurement of family structure: The significance of cohabitation. Demography, 46, 85–101.
Cacioppo, J. T., & Hawkley, L. C. (2003). Social isolation and health, with an emphasis on underlying mechanisms. Perspectives in Biology and Medicine, 46(3), S39–52.
Carroll, J. S., Olson, C. D., & Buckmiller, N. (2007). Family boundary ambiguity: A 30-year review of theory, research and measurement. Family Relations, 56, 210–230.
Castren, A.-M., & Widmer, E. D. (2015). Insiders and outsiders in stepfamilies: Adults’ and children’s views on family boundaries. Current Sociology, 63(1), 35–56.
Cornwell, B. (2012). Spousal network overlap as a basis for spousal support. Journal of Marriage and Family, 74(2), 229–238.
Cox, M. J., & Paley, B. (1997). Families as systems. Annual Review of Psychology, 48, 243–267.
Cox, M. J., & Paley, B. (2003). Understanding families as systems. Current Directions in Psychological Science, 12, 193–196.
de Catanzaro, D. (1995). Reproductive status, family interactions, and suicidal ideation: Surveys of the general public and high risk groups. Ethology and Sociobiology, 16, 385–394.
Felmlee, D. H. (2001). No couple is an island: A social network perspective on dyadic stability. Social Forces, 79(4), 1259–1287.
Ganong, L. H., & Coleman, M. (2004). Stepfamily relationships: Development, dynamics, and interventions. New York, NY: Kluwer Academic/Plenum Publishers.
Ganong, L. H., Coleman, M., & Rothrauff, T. (2009). Patterns of assistance between adult children and their older parents: Resources, responsibility and remarriage. Journal of Social and Personal Relationships, 26, 161–178.
Gauthier, G. R. (2014). Anatomies of kinship: Diversity in the formal structures of American families (PhD thesis). Duke University, Durham NC.
Gauthier, G. R. (2018). Whose turn is it? Patterns of co-operation, leisure, and family closeness. Paper presented at the 2018 ASA, Philadelphia, PA.
Gauthier, R., & Moody, J. (2013). Chapter 5 Anatomies of kinship: Change and diversity in the formal structure of American families. In N. Landale, S. McHale, & A. Booth (Eds.), Families and child health (pp. 73–94). New York, NY: Springer Publishing Co.
Giarrusso, R., Feng, D., Silverstein, M., & Bengtson, V. L. (2001). Grandparent-adult grandchild affection and consensus: Cross-generational and cross-ethnic comparisons. Journal of Family Issues, 22(4), 456–477.
Giarrusso, R., Stallings, M., & Bengtson, V. L. (1995). The intergenerational stake hypothesis revisited: Parent–child differences in perceptions of relationships 20 years later. In V. L. Bengtson, K. W. Schaie, & L. M. Burton (Eds.), Adult intergenerational relations: Effects of societal change (pp. 227–296). New York, NY: Springer Publishing Co.
Hobart, C. (1988). The family system in remarriage: An explanatory study. Journal of Marriage and the Family, 50, 649–661.
Kennedy, D. P., Jackson, G. L., Green, H. D., Bradbury, T. N., & Karney, B. R. (2015). The analysis of duocentric social networks: A primer. Journal of Marriage and the Family, 77, 295–311.
Lawler, E. (2001). An affect theory of social exchange. American Journal of Sociology, 107(2), 321–352.
Nelson, M. K. (2006). Single mothers “do” family. Journal of Marriage and Family, 68, 781–795.
Offer, S. (2013). Family time activities and adolescents’ emotional well-being. Journal of Marriage and Family, 75(1), 26–41.
Padilla-Walker, L. M., Coyne, S. M., & Fraser, A. M. (2012). Getting a high-speed family connection: Associations between family media use and family connection. Family Relations, 61(3), 426–440.
Powell, B. (2017). Changing counts, counting change: Toward a more inclusive definition of family. Journal of the Indiana Academy of the Social Sciences, 17(1), 1–14.
Scanzoni, J., & Marsiglio, W. (1991). Wider families as primary relationships. Marriage and Family Review, 17, 117–134.
Silverstein, M., Conroy, S. J., Wang, H., Giarrusso, R., & Bengtson, V. L. (2002). Reciprocity in parent-child relations over the adult life course. Journals of Gerontology: Series B, 57(1), S3–13.
Simmel, G. (1908). Soziologie. Leipzig, Germany: Duncker & Humblot.
Stewart, S. D. (2005). Boundary ambiguity in stepfamilies. Journal of Family Issues, 26, 1002–1029.
Stewart, S. D. (2007). Brave new stepfamilies. Thousand Oaks, CA: Sage Publications.
Swartz, T. T. (2009). Intergenerational family relations in adulthood: Patterns, variations and implications in the contemporary United States. Annual Review of Sociology, 35, 191–212.
Whitsett, D. P., & Land, H. M. (1992). Role strain, coping, and marital satisfaction of stepparents. Families in Society, 73(2), 79–92.
Widmer, E. D. (2006). Who are my family members? Bridging and binding social capital in family configurations. Journal of Social and Personal Relationships, 23, 979–998.
Widmer, E. D. (2010). Family configurations: A structural approach to family diversity. Burlington, VT: Ashgate Publishing.
Widmer, E. D., & La Farga, L.-A. (2000). Family networks: A sociometric method to study relationships in families. Field Methods, 12, 108–128.
Winship, C., & Mandel, M. (1983). Roles and positions: A critique and extension of the blockmodeling approach. Sociological Methodology, 14, 314–344.

Chapter 26

Demography and Networks

M. Giovanna Merli, Sara R. Curran, and Claire Le Barbenchon

This chapter provides a review of the ways in which social network concepts, data, and tools have been incorporated in demography and reflects on ways in which they can be further leveraged for advancing demographic research. We first provide an overview of the three major methodological and theoretical domains of demography. These include efforts (1) to accurately enumerate and describe the size and composition of populations; (2) to estimate past, present, and future population size and parameters in the absence of complete information; and (3) to explain the dynamics of population systems, particularly the linkages between fertility, migration, and mortality and population structures since these relations are predictive of population growth rates and population composition. Population growth rates and compositional structure prominently figure into explanations of demographic behavior at the individual level. They also provide crucial proximate conditions related to cultural, economic, political, and social change. We draw upon two major explanations in the field of demography to illustrate our points throughout the chapter, namely demographic transition theory and cumulative causation of migration. The second part of our chapter reviews how network concepts, data, and tools are implicated and explicated within each explanatory domain. Social network concepts and measures have been increasingly incorporated into contemporary demographic research as determinants of demographic behaviors, such as fertility, migration, and mortality. In addition, a growing number of demographers are adopting network data and exploiting the structural properties of networks to improve the descriptions of hard-to-reach populations, especially those that are hidden or rare. Finally, network data and measures are being used to estimate population size and parameters under conditions of sparse information. The third part of our chapter suggests productive ways for advancing demographic research that rely on the adoption of a wider array of network data and tools.


Demography: Enumeration, Estimation, and Explanation Demography encompasses a vast conceptual and substantive territory that is organized by a unique demographic conceptual framework. This framework is dynamic and operates at multiple levels. Demography characterizes the ways in which vital rates—mortality, fertility, and migration—shape aggregate population structures (their composition, distribution, and change) through states, inflows, outflows, and transitions between states. The main objective of demographic analysis is to understand the connection between aggregate population structure, composition, and change. This connection implies a micro-macro analytic link. At the aggregate level, demographers identify regularities through models that describe the relationship between vital rates and population structures; at the individual level, they seek to understand how behaviors shape vital rates and how these behaviors, through feedbacks, are altered by aggregate properties of the population (Lee,  2001; Palloni,  2002).1 This blending of macro-demographic structures and micro-processes of human behaviors has importantly contributed to the domains of fertility and family, infectious disease epidemiology, aging, marriage, migration, and residential mobility and segregation. Such linkages require developing system dynamic modeling to represent these endogenous processes (Preston, Heuveline, & Guillot, 2001). Naturally, demographers have invoked network concepts to describe such a multilevel dynamic model. In this chapter, we elaborate some of these conceptual invocations. Demographers conceptualize populations as abstract agglomerations of individuals defined by geographic boundaries and shared attributes and characteristics. These agglomerations are typically referenced in terms of ecological space (e.g., a country, a region, people with certain characteristics living within geographic boundaries). 
One goal of demographic research is the precise, accurate description of populations. Typically demographers rely on information sources such as census data, which completely enumerate a population, or detailed population surveys, which rely on samples drawn from complete sampling frames, designed according to probabilistic sampling principles, from which generalization and inference to the underlying population can be drawn. This goal is challenged by the difficulty of fully enumerating or accurately describing “hidden populations” for which sampling frames are nonexistent or incomplete. Populations might be hidden for any number of reasons, including remote locations, politics, or cultural or social stigmas that make revelations difficult. Furthermore, the description of populations may be complicated by missing information, such as key vital events like births and deaths that are forgotten or never recorded. Demographers have developed numerous techniques for revealing gaps in population enumeration and parameter estimation (Preston et al., 2001; United Nations, 1983). They use generalized tools and approaches, such as the life table, cohort analyses, and stable population models that link individuals to cohorts and across the life course, to search for structural regularities that are assumed to govern populations based on a set of relationships linking vital rates to aggregate structures. Similarly, the estimation of population parameters in the absence of direct information on births and deaths has animated a variety of indirect estimation techniques that generate demographic parameters from incomplete data and rely on stable population theory and its associated assumptions. These approaches relate estimates based on observed data to a suitable standard life table to, for example, generate a child mortality rate; they rely on survivorship reports by kin, generally siblings, to estimate adult mortality (Brass & Coale, 1977; Rutenberg & Sullivan, 1991; Timaeus, 1991). The reliance on reports by kin to efficiently estimate demographic parameters in the absence of complete registration of vital events easily lends itself to the use of network concepts and tools, since these methods collect information about mortality in kin networks. In fact, recent developments in demographic methodology have seen the use of network tools to enumerate population size, describe population characteristics, estimate population parameters, and, by doing so, fruitfully integrate demographic and social network approaches to generate products that pass the test of rigor and representation (Feehan, Mahy, & Salganik, 2017; Maltiel et al., 2015; McCormick & Zheng, 2012; Merli et al., 2016, 2019; Mouw & Verdery, 2012). Explanations in demography range from individual-level behavioral ones to population-level ones. We focus here on two population-level explanations to organize our conceptual discussion: demographic transition theory and the cumulative causation of migration. These theories describe systems of relationships among demographic variables to predict future demographic behavior and population-level demographic outcomes. Demographic transition theory emerged in the early 20th century among both American and French demographers (Myrskylä, Kohler, & Billari, 2009) to counter the classic Malthusian explanation of boom-and-bust population cycles.
Demographers noticed that as nations progressed through industrialization, population growth tended to follow a pattern of structural change that, far from following a boom-and-bust cycle, eventually rebalanced and restabilized to slower growth rates. Formalized by Notestein in 1945, demographic transition theory is a generalized description of the changing pattern of mortality, fertility, and growth rates as societies move from one demographic regime to another, involving four or five idealized stages of demographic change (Notestein, 1945). First, preindustrial societies exhibit high birth and death rates that are roughly in balance, and population growth rates are slow. In the second stage, with industrial development, food supplies and sanitation improve, while death rates decline and birth rates climb. This leads to rapid population growth accompanied by a widening of the base of the age structure. By the third stage, birth rates begin to fall as women’s education rises and they increasingly control their fertility decisions, families’ and communities’ reliance on subsistence agriculture diminishes, and institutional capacities increase with social welfare supplementing family support systems. Birth and death rates move toward a closer balance, while population growth tends to level off. In the fourth stage, birth and death rates are low and the population age structure elongates. In a fifth stage, if the society transitions to below-replacement fertility levels, the population size shrinks and succumbs to rapid aging. Although the negative correlation between fertility and industrialization or development is a widely accepted social fact (Myrskylä et al., 2009), there is considerable variability in the rates of transition, and many societies even exhibit slowed or stalled progression through these idealized stages (Bongaarts, 2006, 2017; Grace & Sweeney, 2013).
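The stage sequence can be made concrete with a small numerical sketch. The crude birth and death rates below are hypothetical values chosen only to mimic the qualitative pattern of the idealized stages; they are not estimates for any population, and the names `Stage`, `STAGES`, and `growth_profile` are illustrative assumptions rather than part of any demographic library. The sketch uses the standard identity that the rate of natural increase equals the crude birth rate minus the crude death rate, and it ignores migration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    crude_birth_rate: float  # births per 1,000 population per year (hypothetical)
    crude_death_rate: float  # deaths per 1,000 population per year (hypothetical)

    @property
    def natural_increase_pct(self) -> float:
        # Rate of natural increase in percent per year: (CBR - CDR) / 10.
        return (self.crude_birth_rate - self.crude_death_rate) / 10.0


# Hypothetical rates, chosen only to reproduce the qualitative stage pattern.
STAGES = [
    Stage("1: preindustrial", 40.0, 38.0),
    Stage("2: early industrial", 42.0, 20.0),
    Stage("3: falling fertility", 25.0, 12.0),
    Stage("4: low birth and death rates", 12.0, 10.0),
    Stage("5: below replacement", 9.0, 11.0),
]


def growth_profile(stages):
    """Return (stage name, natural increase in percent per year) pairs."""
    return [(s.name, s.natural_increase_pct) for s in stages]


if __name__ == "__main__":
    for name, rate in growth_profile(STAGES):
        print(f"{name:32s} r = {rate:+.1f}% per year")
```

Under these assumed inputs, growth is slow in the first stage, fastest in the second, levels off through the third and fourth, and turns negative in the fifth, which is the shape the idealized account describes. Real transitions, as noted above, vary considerably in pace and may stall.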
As Caldwell (1976) illustrates, underlying demographic transition theory are behavioral models about fertility that propose how crucial influencing factors such as technology, information and knowledge, social identities and roles, kinship and family relations, and the progression of life course events shape the abilities and capacities of women to control the number of children they bear. These explanations often rely on social network concepts to propose how ideas and resources are diffused through a community and reach an individual woman. Demographic transition theory is also informed by epidemiological models of the spread of disease, which offer contagion metaphors to explain disease transition and population change (Omran, 2005). Similarly, demographic transition theory, as it is described at later stages, explains how aging processes both elongate and narrow kinship networks, creating challenges for social support networks (Reher, 2011; Verdery, 2015). In fact, some demographers have expressed concerns that the unraveling of kinship networks through the second demographic transition has had profoundly negative effects for some children (McLanahan, 2004). And, at stages when there is an age structure imbalance, at the end of stage 2 and the beginning of stage 3, there may be a demographic surplus (a higher proportion of young, working-age population), which then yields high levels of migration and reshapes the cultural meanings of kinship ties (Caldwell, 1976). As we will demonstrate in the next section, the models that explain how individual behaviors aggregate to the population level rely to a great extent on network concepts to elucidate the mechanisms connecting population structures to individuals and their nearest neighbors. The demographic surplus, as well as other factors that push some to migrate, has also yielded a well-accepted social fact in demography, namely that migration cumulatively causes future migration.
The idea that migration behavior is integrally linked not just to an individual’s characteristics and inclinations, but also to a system of linkages between migrants and nonmigrants and between origin and destination communities has its roots in the concept of chain migration (MacDonald & MacDonald, 1964), which later yielded a rich literature describing migrant networks (Boyd, 1989; Fawcett, 1989), creating challenges for neoclassical economic explanations of migration (Greenwood, 1975). As evidence accumulated about the role of migrant networks for influencing individual migration, a new theory emerged to better explain migration behavior and the emergence and sustenance of migration systems. By the early 1990s, Massey and colleagues (Massey, 1988; Massey, Arango et al., 1994) proposed a cumulative causation of migration theory that drew upon Myrdal’s explanations for economic development trajectories (Myrdal, 1958). This theory claims that with each new migrant between origin and destination, systems, and even institutions, emerge that facilitate the flow of information and resources about destination opportunities, reduce the costs of migration, minimize risks, and fuel imaginations about the possibilities of migration (Massey, Arango et al.,  1994). The mechanism at the heart of these systems and institutions is social networks; social networks connect individuals between origins and destinations and to nearest relationships in either the origin or destination. Just as with demographic transition theory, scholars of migration frequently invoke social networks to explain the relationships between individual behavior and social structural conditions. The theory argues that each move propels the next move (either by the same individual or a related individual), accumulating like a snowball moving down a snowy hill. 
Notwithstanding the empirical challenges of observing these phenomena—given the requirements of prospectively and longitudinally observing migration from origins and then in destinations—a growing body of evidence supports the predictions associated with the theory (Castles, De Haas, & Miller, 2014). For example, with growing migration momentum and accumulated migration experiences within origin communities, migration rates increase, migration becomes a less selective process, and migrant streams include a greater diversity of individuals (Fussell,  2010; Garip,  2012; Massey, Goldring, & Durand,  1994).

Similarly, the cumulative causation of migration theory explains how migrants are incorporated into destination economies and societies via enclaves and the inherent networks within those communities and back to origin communities (Portes, 1997). Social network concepts are also invoked by migration scholars to explain numerous transnational remittances of economic and sociocultural resources from destination communities back to hometown or origin communities (Smith, 2006; Waldinger, 2008; Waldinger & Fitzgerald, 2004).

Network Approaches and Current Contributions to Demography While demographers’ paradigms, analytical strategies, and data generally focus on individuals and their attributes, in social networks actors are described by their relations, not by their attributes. Social networks scholars adhere to what has been termed an “anticategorical imperative” (Emirbayer & Goodwin, 1994; Emirbayer, 1997), rejecting the premise of attributional categories and other substantives (Emirbayer, 1997). Thus, at least at first glance, demography and networks are conceptually opposite: the first, to borrow Emirbayer’s (1997) terms, is governed by substantialism and categorical approaches, the second by relationalism and transactions through which “resources, goods and even positions [dynamically] flow through particular configurations of ties” (Emirbayer, 1997, p. 298). It is an axiom in sociology that individual behavior “is organized through and motivated not by attributes and categorical affiliations but by the structure of tangible social relations in which persons are embedded” (Bearman,  1993, as cited by Emirbayer,  1997, p. 299). Similarly, demographic behavior, a fundamental component of the demographic framework, is embedded in social structure. This idea derives from theories of social interaction and behavior that are central to the demographic transition theory that were developed in response to the limited power that individual attributes were found to hold for the European fertility decline (Watkins, 1991), as well as for fertility declines in less developed countries (Bongaarts & Watkins, 1996). They rest on the insight that individuals do not make decisions about demographic and other social behaviors in isolation, because information, resources, and norms that lead to the adoption of these behaviors are transmitted across individuals (Kohler et al., 2015; Montgomery & Casterline, 1996). 
Demographic theories and models of demographic behaviors or outcomes have variously relied on social networks as a crucial mechanism for more complete explanations of the components of population change (fertility, migration, and mortality) as well as the close correlates of those components (e.g., marriage, family building and living arrangements, morbidity). The social network concept of “diffusion” (Rogers, 2003) has enabled demographers to elucidate the mechanisms that connect individual behaviors and aggregate population properties by elaborating the idea that changes in individual behaviors accounting for aggregate shifts can be the result of “social processes” whereby individual actions track behaviors embedded in social structures. Rogers (2003) describes the diffusion of family planning programs to curb fertility rates as a preventative innovation: “an idea that an individual adopts at one point in time in order to lower the probability that some future unwanted event may occur” (p. 69). This type of innovation typically diffuses more slowly, implying a more gradual slope of the diffusion curve. Because it is hard to count events that have not yet occurred (in this case, births), many diffusion studies in demography have focused on the diffusion of contraception as the main proximate determinant of fertility decline (Rosero-Bixby & Casterline, 1993; Valente et al., 1997; Entwisle et al., 1996). Early literature in this area showed that contraceptive diffusion occurs among people who have similar attributes (Rogers & Kincaid, 1981) or are spatially proximate (Knodel & van de Walle, 1979; Watkins, 1986), and that social pressure from network partners plays a role in contraceptive adoption (Udry, 1982) because networks can function as a medium for the enforcement of social norms. In cases where networks are heterophilous, meaning dissimilar on attributes, opinion leaders are particularly important for transmitting messages and are looked to as a source of information and experience, particularly due to their often higher status and educational achievement (Rogers, 2003). This may lead to a situation where different villages adopt different means of family planning, despite receiving similar external messaging, an idea that, as indicated later in this review, was further explored by Entwisle et al. (1996). Later advances in understanding fertility decline have drawn heavily on network theories of diffusion. Rosero-Bixby & Casterline (1993) generated a model of fertility decline that includes a contagion model for birth control adoption, wherein the number of network partners who practice family planning affects ego’s own behavior, alongside cost and access to birth control, finding evidence that social interaction contributed to the diffusion of contraceptive use in Costa Rica.
Montgomery & Casterline (1993) employed a multivariate model controlling for economic and social conditions with fixed effects for township to estimate the effects of a diffusion process on fertility decline in Taiwan. Their results point to within-township diffusion, bringing additional quantitative evidence to previous findings by Watkins (1986,  1991) that fertility patterns spread within geographic boundaries, from communities to the national level, indicative of a process of diffusion through networks. While the aforementioned studies do not untangle the precise mechanisms through which family planning diffuses, later studies, mainly relying on qualitative data, introduced the concepts of social learning and social influence to elucidate the “black box” of the diffusion process. Watkins & Danzi (1995) found that Italian and Jewish women in New York and Philadelphia who gave birth between 1920 and 1940 acquired knowledge and understanding of family planning mainly from other women in their network, and that contraceptive use diffused more widely in larger, more heterogeneous networks. Entwisle et al. (1996) combined both qualitative and quantitative data to show that, in Nang Rong district, Thailand, diffusion of family planning occurred within villages, and that early adopters tended to dictate which type of contraceptive was used. These results lend evidence to a social influence hypothesis, whereby the take-up by others in your community drives your own. Watkins (2000) showed that in Nyanza Province, Kenya, women discuss family size and family planning options among themselves, though rarely with women of higher social standing, pointing to the importance of reducing uncertainty in the diffusion process through discussion with network partners, indicative of a social learning process. 
Bernardi (2003) described social learning as a process by which couples learn firsthand from their childbearing friends what having children is like, contributing additional qualitative evidence to social learning as a determinant of fertility at the individual level. A similar process was shown to operate in the course of mortality and fertility change. Sandberg et al. (2012) used egocentric social network data linked to a demographic surveillance system in Niakhar, Senegal, to find that the level of infant mortality in one’s network affected perception of infant mortality change over time, leading individuals to over- (or under-) estimate mortality risks. Sandberg (2006), using comprehensive network data, found that in a small, isolated community in the Nepalese mountains, the perception of higher mortality among respondents’ sociometric network was linked to higher levels of fertility, pointing to a social learning mechanism that leads individuals to change their fertility behavior based on information about infant mortality levels acquired from socially proximate others. With the recent collection of large-scale egocentric social network data in omnibus demographic and health surveys, demographers have undertaken the empirical testing of various family planning diffusion mechanisms identified in the previous literature and predicted by demographic theory. Kohler, Behrman, & Watkins (2001) collected egocentric network panel data among women living in South Nyanza District, Kenya, to show that where social learning is the main mechanism leading to contraceptive uptake, the proportion of contraceptive users among a respondent’s network partners exerts a positive influence on contraceptive use, while density (the extent to which network partners know each other, measured by the ratio of actual links to possible links in a network) has no effect on take-up because dense networks provide redundant information. However, where social influence is relevant, the proportion of ego’s alters using family planning is a strong predictor of her contraceptive use in a dense network that exerts normative pressures.
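The two egocentric measures discussed here, network density and the share of network partners who use contraception, can be illustrated with a minimal sketch. It assumes that density is computed over ties among the alters only (excluding ego) and that ties are undirected; the function name `ego_network_measures` and its arguments are hypothetical and do not reproduce the measurement procedure of any cited study.

```python
def ego_network_measures(alters, alter_ties, users):
    """Return (density, share_using) for one respondent's network partners.

    alters     -- identifiers of the alters named by the respondent
    alter_ties -- (a, b) pairs of alters reported to know each other
    users      -- set of alters reported to use contraception

    Assumption: density is the ratio of observed to possible undirected
    ties among alters, with ego excluded.
    """
    alters = list(dict.fromkeys(alters))  # drop duplicate names, keep order
    n = len(alters)
    if n == 0:
        raise ValueError("respondent named no alters")
    alter_set = set(alters)
    # Keep each undirected tie once; ignore self-ties and unknown names.
    ties = {
        frozenset(pair)
        for pair in alter_ties
        if len(set(pair)) == 2 and set(pair) <= alter_set
    }
    possible = n * (n - 1) // 2
    density = len(ties) / possible if possible else 0.0
    share_using = sum(a in users for a in alters) / n
    return density, share_using


if __name__ == "__main__":
    # Hypothetical respondent with four alters, three ties, two users.
    density, share = ego_network_measures(
        ["A", "B", "C", "D"],
        [("A", "B"), ("B", "C"), ("A", "C")],
        {"A", "B"},
    )
    print(f"density = {density:.2f}, share using = {share:.2f}")
```

In the hypothetical example, three of six possible ties among four alters are present and two of the four alters are users, so both measures equal 0.5. In an analysis along the lines described above, the two quantities would enter a model of ego's contraceptive use separately and in interaction, so that the role of the share of users can differ between sparse and dense networks.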
In a second paper using the same Kenyan data, Behrman, Kohler, and Watkins (2002) employed panel data fixed effects and controls to tease out the relative contribution of social learning and social influence on changes in contraceptive adoption over time. They found that each additional network partner who uses contraception after the first one has increasingly smaller effects on adoption by ego, producing a nonlinear effect on take-up. This suggests that social learning is more important than social influence, as network alters provide increasingly redundant information about contraceptives. Furthermore, this follows the S-curve shape of diffusion explained in Rogers (2003), where initial adopters can spark adoption among network partners quickly, as they interact with individuals who learn from them about the benefits of family planning. However, over time, individuals can count increasing numbers of adopters among their network alters and the social learning mechanism loses prominence, leading to a slowdown of adoption. A third paper by Kohler, Behrman, and Watkins (2000), also using Kenyan data, shows that there can be multiple equilibria in a diffusion process, and thus increased social interaction can be status quo enhancing. This suggests that areas where there are strong social norms against birth control may never reach high adoption levels. This is echoed by Munshi & Myaux (2006), who find that, in a Bangladeshi village, relevant social interactions are restricted to the individual’s religious group, with cross-religion effects entirely absent, despite universal village reception of the same family planning inputs, and that religion-specific norms can stem the diffusion of family planning.
In Germany’s very low fertility setting, Lois & Arránz Becker (2014) relied on the measurement of ego’s network characteristics (e.g., network size and composition) in a rich panel dataset spanning five waves from 1988 to 2002 to disentangle the portion of decisions regarding transition to first birth attributable to social learning, social pressure, and the social opportunity costs of parenthood. All three mechanisms exerted a positive impact on both fertility intentions and behavior, with strong evidence for social learning, where individuals with more friends who have children also have more positive views of children. Furthermore, evidence of social influence from peers with children (termed social pressure in the paper) existed only among childless women older than 28, while weaker evidence for opportunity cost of children was found across age groups. Most of the analyses of social interactions and demographic behavior are unable to address the causal effects of social networks because unobserved factors directly affecting attitudes and behavior may also affect choices of network partners, suggesting endogenous network formation. Bernardi, Keim, & von der Lippe (2007, p. 24) note that the decision to have a child can affect the parent’s social networks, as couples with newborn children may selectively form their networks with others who are also parents. The papers by Behrman et al. (2002) and Lois & Arránz Becker (2014), which rely on repeated measures of aggregate network properties reported by a panel of respondents over time, are a first step toward addressing this problem. While the aforementioned studies have illustrated the effects of networks on demographic outcomes, Verdery (2015) uniquely studied the effects of the demographic transition on the formation of kinship networks. He simulates three different demographic transition scenarios (an early mortality decline, an early fertility transition, and a late fertility transition) to show that small shifts in the timing of the demographic transition can have important implications for the number of kin an individual is connected to: from gains in kin network members at the start of the transition when mortality declines to drops in network members in early fertility transition societies and amplified and enduring networks in societies experiencing late fertility transition, such as those in Sub-Saharan Africa.
As noted earlier, social network concepts were, and continue to be, integral to the development of explanations for migration behavior and population-level characterizations of migration systems. Measuring and observing the influence of migration networks based in origin communities has grown from counts of people from ego’s same community who have migrated (anywhere) (Massey, Goldring, & Durand, 1994; Massey & Aysa-Lastra, 2011; Zhao, 2003) to accumulated counts of people from origins to particular destinations (Curran & Rivero-Fuentes, 2003) and the demographic composition of migrant networks (Curran & Rivero-Fuentes, 2003; Curran et al., 2005). Other research has captured the content of network ties by counting family members with migration experience or friends who have migrated, and by decomposing these effects by their gendered content (Curran et al., 2005; Davis, Stecklov, & Winter, 2002; Liu, 2013; Stecklov et al., 2010). There have also been attempts to capture the intensity of connections within a network through the measurement of spatial and temporal distance away from an origin community, numbers of return visits, and the amount of resources transmitted through a network back to an origin community or household (Garip, Burak, & Snyder, 2015; Manchin & Orazbayev, 2018; Tilly, 2007). In our assessment of the literature, we found only one set of studies that formalizes a network structure in an origin community to explain migration (Jampaklay, Korinek, & Entwisle, 2007; Verdery et al., 2012). In that case, the data for the study were designed to collect network information that could be used to compute structural relationships to explain a number of different kinds of outcomes.
These data draw upon complete household and individual censuses within more than 50 villages, gathering information about the contemporary and historic ties between households based on various types of kinship relations and resource exchanges (labor and equipment sharing). Measures of network kinship density were then related to the distance of residential settlement, including migration, and longitudinal analyses showed how, as kinship ties attenuated, residential distances elongated. The literature on immigrant incorporation in destinations, while centrally implicating social networks in the form of migrant social capital, does no more than observe the proportion of co-ethnics (e.g., Toma, 2016; Vacca et al., 2018) or the count of friendship and familial ties (e.g., Korinek, Entwisle, & Jampaklay, 2005; Chang, Wen, & Wang, 2011). These explanations have been used to better understand immigrant economic outcomes, as well as the durability of ethnic enclaves and residential segregation (e.g., Jampaklay et al., 2007; Skop et al., 2006). More recently, network data that map the structural properties of the networks have enabled efforts to accurately describe hidden and rare populations. Network data have also made possible the estimation of population parameters in settings where suitable data for direct estimation are scarce. The accurate description of both hidden and rare populations is crucial in demography. Populations are “hidden” if they are effectively impossible to sample using conventional survey methods that require predefined sampling frames because they are defined with reference to invidious status characteristics not likely to be revealed by omnibus survey research. They are “rare” if they are too small to show up in standard probability sampling designs, which implies high costs because of the large number of screening interviews required to recruit a sample of sufficient size with a basis for inferring representation. Both these concerns apply to the study of migration. Immigrant groups are often too small to show up in standard probability sampling designs. When immigrants have illegal status, they are difficult to identify and include in conventional survey strategies due to fear of repatriation.
Approaches that rely on respondents’ social networks to recruit samples of hidden and rare populations have recently been developed and applied to migrant populations (Mouw & Verdery, 2012; Merli et al., 2016; Merli et al., 2019). These approaches capitalize on the actual network structure of the target population to identify and interview multiple waves of respondents through a process of peer referral over the network to generate estimates of population proportions and a postrecruitment weighting of cases to correct biases toward sampling popular individuals. The network is mapped by collecting egocentric network data from respondents together with minimally identifying information about their alters, and this information is used to guide the referral process over the network. Besides allowing a visualization of the network structure, the egocentric network data collected in this way can also enable an understanding of the crucial role of the structural properties of social networks in migration decisions and immigrant adaptation, which is often hampered by the absence of data on migrants’ social networks. Because this method may offer better and more cost-effective coverage of immigrant populations while peer referral reduces respondents’ confidentiality concerns and increases response rates, demographers are spearheading efforts to empirically test and evaluate this approach against known population samples (Merli, Moody, Smith, et al., 2015; Merli et al., 2019). Other network-based methods are currently being developed to estimate the size of hidden populations.
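To make the idea of postrecruitment weighting concrete, the sketch below applies inverse-degree weights, one common choice in the respondent-driven sampling literature. This is an illustrative assumption rather than the specific weighting used in the studies cited above, and the function name and toy data are hypothetical. Because well-connected people are more likely to be reached through peer referral, each respondent is down-weighted in proportion to reported network size.

```python
def inverse_degree_weighted_proportion(degrees, has_trait):
    """Estimate a population proportion from a referral sample.

    degrees   -- reported personal network size of each respondent (> 0)
    has_trait -- parallel list of booleans for the trait of interest
    Each respondent is weighted by 1/degree to offset the higher chance
    of recruiting popular individuals (an assumed, generic weighting).
    """
    if len(degrees) != len(has_trait):
        raise ValueError("degrees and has_trait must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every respondent needs a positive degree")
    weights = [1.0 / d for d in degrees]
    weighted_trait = sum(w for w, t in zip(weights, has_trait) if t)
    return weighted_trait / sum(weights)


# Toy sample: the two respondents with the trait are also the most popular.
degrees = [10, 10, 2, 2]
has_trait = [True, True, False, False]

raw_share = sum(has_trait) / len(has_trait)
weighted_share = inverse_degree_weighted_proportion(degrees, has_trait)
print(raw_share, round(weighted_share, 3))  # 0.5 0.167
```

In the toy sample the unweighted share is one half, but once popular respondents are down-weighted the estimate falls to about one sixth, which shows how strongly recruitment bias toward well-connected people can distort a raw sample proportion.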
Based on the intuition that individuals’ social networks are, on average, representative of the population, the basic network scale-up method and its variants (Maltiel et al., 2015) generate population size estimates using survey data collected about aggregate relational data, that is, the number of connections to members of the hidden population asked of respondents in a random sample of the general population. Finally, Feehan et al. (2017) have developed a novel network-based approach to estimate the age and sex distribution of adult deaths in the absence of adequate vital registration data using information on the number of deaths in the network in which respondents are embedded (in Rwanda, where the method was tested, the name generator is people with whom respondents have shared a meal within the last 12 months). This method builds on the sibling survival method, demographers’ classic tool of mortality estimation in the absence of death registration, but it is more efficient because it can be derived from samples of moderate size, produces more information per interview, and can plausibly generate estimates of deaths in the previous year more quickly than data collected in omnibus surveys such as the Demographic and Health Survey.
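The intuition behind the basic network scale-up method described above can be written as a short calculation: if respondents' networks are on average representative, the share of hidden-population members among all reported contacts estimates their share in the population. The sketch below assumes that each respondent's personal network size is already known, whereas in practice it must itself be estimated. The function name and the numbers are hypothetical.

```python
def basic_scale_up_estimate(hidden_contacts, network_sizes, population_size):
    """Basic scale-up estimate of the size of a hidden population.

    hidden_contacts -- per respondent, number of known hidden-population members
    network_sizes   -- per respondent, personal network size (assumed known)
    population_size -- size of the general population that was sampled
    """
    if len(hidden_contacts) != len(network_sizes):
        raise ValueError("inputs must have the same length")
    total_contacts = sum(network_sizes)
    if total_contacts <= 0:
        raise ValueError("total network size must be positive")
    # Share of hidden-population members among all reported contacts,
    # scaled up to the whole population.
    return population_size * sum(hidden_contacts) / total_contacts


# Toy survey of four respondents in a population of one million.
hidden_contacts = [0, 1, 0, 2]
network_sizes = [200, 300, 250, 250]

estimate = basic_scale_up_estimate(hidden_contacts, network_sizes, 1_000_000)
print(estimate)  # 3000.0
```

Here 3 of the 1,000 reported contacts belong to the hidden population, so the estimated size is 0.3 percent of one million, or 3,000 people.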

Future Directions for Network Approaches to Advance Demographic Research
Demographic research has relied on network concepts to explain variation in demographic behavior. It has focused on diffusion of information and behaviors related to family planning to explain fertility transitions in pretransition settings or diffusion of behaviors related to family formation in lowest-low fertility settings. To illustrate the mechanisms through which diffusion occurs and how networks work to affect demographic behavior, this research has relied on a limited number of network measures, such as the proportion of network partners with a given attribute or behavior measured cross-sectionally or, more rarely, longitudinally, and structural measures of the degree of social connectedness in a network. Similarly, migration theory has relied on network concepts related to the idea of diffusion to explain both individual migration behavior and changes in the population composition of migrant streams. Even though social network concepts are centrally implicated within migration theorizing, the structural features of social networks within origin and destination sites or the structural features of migrant networks connecting origin and destination are not frequently observed, and the application of formal network tools for empirically evaluating and completely elaborating migration theories is rare. Migration scholars continue to observe that there is considerable variability in migration rates, directionality of flows, the unevenness of immigrant incorporation in destinations, and patterns of transnationality that might be more completely explained with the application of more sophisticated network concepts and techniques (Waldinger, 2008; Zhou & Portes, 2012). More research is needed that explicitly links structural characteristics of networks to the mechanisms involved in affecting demographic behavior.
Although we know that networks matter because they shape diffusion of demographic behaviors, it is not known which structural features of the network, in addition to network density, are most relevant to diffusion and whether the introduction of new measures can help confirm or disprove previous findings. For example, in fertility research, because of the importance of opinion leaders for the diffusion of birth control, or of the role of early adopters in one’s network, demographers could ask what positional features of the network increase the likelihood of acquiring information. Network concepts that can be relevant to the spread of information but also to the generation or reinforcement of norms that constrain behavior are structural features of the network such as centrality, which describes to what extent certain nodes in a network are prominent or influential, or structural cohesion, operationalized by Moody & White (2003) through the graph theoretical property of node connectivity and the independent paths in a network that keep the group connected. The understanding of the role of network structure in determining how behaviors or resources are encouraged or constrained could also be aided by the mapping and comparison of different network structures and their covariation with demographic outcomes. This approach was taken, for example, by Merli, Moody, Mendelsohn, and Gauthier (2015) to describe variation in sexual network structure and in HIV prevalence across two different contexts of Sub-Saharan Africa and China. As for research that describes how demographic behaviors structure networks, except for rare work considering the effect of the demographic transition on network formation, we do not know much about how individual demographic behavior changes network structure. One domain where the influence of behavior on structure is particularly relevant is migration. Obviously, the removal (addition) of a person from (to) a spatially defined place via migration has implications for the structure of social networks at either origin or destination. Furthermore, the network structure of relationships linking an origin to a destination will also have variable properties that influence migration behavior. Future research investigations might examine how the density of ties accelerates or slows migration rates out of origin communities or eases incorporation at destinations, or whether a positive association of centrality and prior migration experience yields a faster diffusion of migration through a community than does migration experience that is located in peripheral network nodes.
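The two structural measures named above, centrality and node connectivity, can be illustrated on a toy network. The sketch uses degree centrality as one simple centrality measure and a brute-force search for node connectivity that is suitable only for very small graphs. Both functions and the example networks are hypothetical illustrations, not the procedure used by Moody & White (2003).

```python
from itertools import combinations


def degree_centrality(adj):
    """Share of the other nodes to which each node is directly tied."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def is_connected(adj, removed=frozenset()):
    """Check whether the graph stays connected once `removed` nodes are dropped."""
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:  # depth-first search over the remaining nodes
        v = stack.pop()
        for w in adj[v]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def node_connectivity(adj):
    """Smallest number of nodes whose removal disconnects the graph.

    Brute force over all node subsets; a complete graph on n nodes
    is assigned n - 1 by convention.
    """
    n = len(adj)
    if not is_connected(adj):
        return 0
    for k in range(1, n - 1):
        for cut in combinations(adj, k):
            if not is_connected(adj, frozenset(cut)):
                return k
    return n - 1


# Two triangles joined only through node "c", and a five-node ring.
bowtie = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d", "e"},
          "d": {"c", "e"}, "e": {"c", "d"}}
ring = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}

print(degree_centrality(bowtie)["c"])  # 1.0
print(node_connectivity(bowtie))       # 1
print(node_connectivity(ring))         # 2
```

In the first network, node "c" is tied to everyone and is the only link between the two triangles, so it has the highest centrality while the group as a whole has low cohesion: removing that single node splits it. The ring has no prominent node, but it takes two removals to disconnect it.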
The reciprocal dynamics between individual migrant behavior and network structure could also prove scientifically advantageous. Additional advances in accounting for how migration is both cause and consequence of networked relationships might provide exciting empirical leverage for observing the mechanisms influencing structural changes in networks. Another concern in current research is that most studies that implicate networks in demography focus on static networks, usually described (albeit rarely fully mapped) with cross-sectional egocentric network data. The absence of data that would allow for the complete tracing of the evolution of networks over time imposes significant challenges for drawing inference regarding the causal effects of social networks. The longitudinal mapping of changes in social networks including shifts in the strengths of network ties, the addition and subtraction of network members, and changes in relevant interpreters would allow demographers to more rigorously disentangle the causal mechanisms of diffusion in fertility and family formation research. Such temporal and spatial depth in relational data would also valuably capture the structure and nature of networks related to mobility or migration, improving predictive capacities since mobility is so dynamic and variably conditioned by the changing contexts in origin and destination. As noted, demography has relied on network data and tools to enumerate and describe populations in the absence of sampling frames and to estimate population parameters in the absence of direct data on mortality. Reliance on social network data and tools has allowed demographers to do this more efficiently and cost-effectively than with conventional approaches to enumeration, estimation, and description, by recruiting small but precise samples as well as network information that can be easily collected in omnibus social surveys. 
Recruitment of samples by relying on respondents’ social networks is also relevant to demographers interested in seeking alternatives to conventional data collection schemes (e.g., Brick, 2011). Because response rates to government and privately sponsored household surveys for demographic research are falling throughout the world’s high-income countries (De Leeuw & De Heer, 2002), including the United States (National Research Council, 2013), approaches where the interaction between interviewer and respondent is mediated by a member of the respondent’s social network are promising and should be further investigated, especially as they could pertain to the recruitment of samples of a wider range and type of populations of interest to demographers, including the general population.

Note
1. A finer distinction within demography is one between demographic methods or formal demography, consisting of demographers’ unique framework and technical toolkit, and population studies, which encompass research from multiple disciplinary perspectives on the causes and consequences of population change (Preston, 1993; Xie, 2000). Here we refer to demography as an all-encompassing term.

References Bearman, P. S. (1993). Relations into rhetorics: Local elite social structure in Norfolk, England, 1540–1640. New Brunswick, NJ: Rutgers University Press. Behrman, J. R., Kohler, H.-P., & Watkins, S. C. (2002). Social networks and changes in contraceptive use over time: Evidence from a longitudinal study in rural Kenya. Demography, 39(4), 713–738. Bernardi, L. (2003). Channels of social influence on reproduction. Population Research and Policy Review, 22(5/6), 427–555. Bernardi, L., Keim, S., & von der Lippe, H. (2007). Social influences on fertility. Journal of Mixed Methods Research, 1(1), 23–47. Bongaarts, J. (2006). The causes of stalling fertility transitions. Studies in Family Planning, 37(1), 1–16. Bongaarts, J. (2017). Africa’s unique fertility transition. Population and Development Review, 43(S1), 39–58. Bongaarts, J., & Watkins, S. C. (1996). Social interactions and contemporary fertility transitions. Population and Development Review, 22(4), 639–682. Boyd, M. (1989). Family and personal networks in international migration: Recent developments and new agendas. International Migration Review, 23(3), 638–670. Brass, W., & Coale, A. (1977). Methods of analysis and estimation. In D. P. Smith, & N. Keyfitz (Eds.), Mathematical Demography. Biomathematics (Vol 6, pp. 307–313). Berlin, Heidelberg: Springer. Brick, M. (2011). The future of survey sampling. Public Opinion Quarterly, 75(5) Special Issue 2011, 872–888. Caldwell, J. C. (1976). Toward a restatement of demographic transition theory. Population and Development Review, 2(3/4), 321–366. Castles, S., De Haas, H., & Miller, M. J. (2014). The age of migration: International population movements in the modern world (5th Ed). London, UK: Palgrave Macmillan. Chang, K.  C., Wen, M., & Wang, G. (2011). Social capital and work among rural-to-urban migrants in China. Asian Population Studies, 7(3), 275–293.

Curran, S. R., Garip, F., Chung, C. Y., & Tangchonlatip, K. (2005). Gendered migrant social capital: Evidence from Thailand. Social Forces, 84(1), 225–255. Curran, S. R., & Rivero-Fuentes, E. (2003). Engendering migrant networks: The case of Mexican migration. Demography, 40(2), 289–307. Davis, B., Stecklov, G., & Winter, P. (2002). Domestic and international migration from rural Mexico: Disaggregating the effects of network structure and composition. Population Studies, 56, 291–309. De Leeuw, E. D., & De Heer, W. (2002). Trends in household survey nonresponse: A longitudinal and international comparison. In R. M. Groves, D. A. Dillman, J. L. Eltinge, & R. J. A. Little (Eds.), Survey nonresponse (pp. 41–54). New York, NY: Wiley. Emirbayer, M. (1997). Manifesto for a relational sociology. American Journal of Sociology, 103(2), 281–317. Emirbayer, M., & Goodwin, J. (1994). Network analysis, culture, and the problem of agency. American Journal of Sociology, 99(6), 1411–1454. Entwisle, B., Rindfuss, R. R., Guilkey, D. K., Chamratrithirong, A., Curran, S. R., & Sawangdee, Y. (1996). Community and contraceptive choice in rural Thailand: A case study of Nang Rong. Demography, 33(1), 1–11. Fawcett, J. T. (1989). Networks, linkages, and migration systems. International Migration Review, 23(3), 671–680. Feehan, D. M., Mahy, M., & Salganik, M. J. (2017). The network survival method for estimating adult mortality: Evidence from a survey experiment in Rwanda. Demography, 54(4), 1503–1528. Fussell, E. (2010). The cumulative causation of international migration in Latin America. Annals of the American Academy of Political and Social Science, 630(1), 162–177. Garip, F. (2012). Discovering diverse mechanisms of migration: The Mexico–US Stream 1970–2000. Population and Development Review, 38(3), 393–433. Garip, F., Burak E., & Snyder, B. (2015). Network effects in migrant remittances: Evidence from household, sibling, and village ties in Nang Rong, Thailand. American Behavioral Scientist, 59(9), 1066–1082. Grace, K., & Sweeney, S. H. (2013). Understanding stalling demographic transition in high-fertility countries: A case study of Guatemala. Journal of Population Research, 30(1), 19–37. Greenwood, M. J. (1975). Research on internal migration in the United States: A survey. Journal of Economic Literature, 13(2), 397–433. Jampaklay, A., Korinek, K., & Entwisle, B. (2007). Residential clustering among Nang Rong migrants in urban settings of Thailand. Asian and Pacific Migration Journal, 16(4), 485–510. Knodel, J., & Van de Walle, E. (1979). Lessons from the past: Policy implications of historical fertility studies. Population and Development Review, 5(2), 217–245. Kohler, H.-P., Behrman, J., & Watkins, S. (2000). Empirical assessments of social networks, fertility and family planning programs. Nonlinearities and their implications. Demographic Research, 3(7). www.jstor.org/stable/26348012. Accessed on: 26-02-2020. Kohler, H.-P., Behrman, J. R., & Watkins, S. C. (2001). The density of social networks and fertility decisions: Evidence from South Nyanza District, Kenya. Demography, 38(1), 43–58. Kohler, H.-P., Helleringer, S., Behrman, J. R., & Watkins, S. C. (2015). The social and the sexual. Networks in contemporary demographic research. In P. Kreager, B. Winney, S. Ulijaszek, & C. Capelli (Eds.), Population in the human sciences: Concepts, models, evidence (pp. 196–237). Oxford, UK: Oxford University Press. Korinek, K., Entwisle, B., & Jampaklay, A. (2005). Through thick and thin: Layers of social ties and urban settlement among Thai migrants. American Sociological Review, 70(5), 779–800.

Lee, R. (2001, June 29). Demography abandons its core (Unpublished manuscript). University of California, Berkeley. Liu, M. M. (2013). Migrant networks and international migration: Testing weak ties. Demography, 50(4), 1243–1277. Lois, D., & Arránz Becker, O. (2014). Is fertility contagious? Using panel data to disentangle mechanisms of social network influences on fertility decisions. Advances in Life Course Research, 21, 123–134. MacDonald, J. S., & MacDonald, L. D. (1964). Chain migration ethnic neighborhood formation and social networks. Milbank Memorial Fund Quarterly, 42(1), 82–97. Maltiel, R., Raftery, A. E., McCormick, T. H., & Baraff, A. J. (2015). Estimating population size using the network scale up method. Annals of Applied Statistics, 9(3), 1247. Manchin, M., & Orazbayev, S. (2018). Social networks and the intention to migrate. World Development, 109, 360–374. Massey, D. S. (1988). Economic development and international migration in comparative perspective. Population and Development Review, 14(3), 383–413. Massey, D. S., Arango, J., Hugo, G., Kouaouci, A., Pellegrino, A., & Taylor, J. E. (1994). An evaluation of international migration theory: The North American case. Population and Development Review, 20(4), 699–751. Massey, D. S., & Aysa-Lastra, M. (2011). Social capital and international migration from Latin America. International Journal of Population Research, 2011, 834145. doi:10.1155/2011/834145. Massey, D. S., Goldring, L., & Durand, J. (1994). Continuities in transnational migration: An analysis of nineteen Mexican communities. American Journal of Sociology, 99(6), 1492–1533. McLanahan, S. (2004). Diverging destinies: How children are faring under the second demographic transition. Demography, 41(4), 607–627. McCormick, T. H., & Zheng, T. (2012). Latent demographic profile estimation in hard-to-reach groups. Annals of Applied Statistics, 6(4), 1795. Merli, M. G., Moody, J., Mendelsohn, J., & Gauthier, R. (2015). Heterosexual mixing in Shanghai: Are heterosexual contact patterns in China compatible with an HIV/AIDS epidemic. Demography, 52(3), 919–943. Merli, M. G., Moody, J., Smith, J., Li, J., Weir, S., & Chen, X. S. (2015). Challenges to recruiting representative samples of female sex workers in China using respondent driven sampling. Social Science & Medicine, 125, 79–93. Merli, M. G., Mouw, T., Stolte, A., & Le Barbenchon, C. (2019, April). Using Multiple Modes of Data Collection to Recruit Migrant Samples With Network Sampling With Memory: The Chinese Immigrants in Raleigh-Durham (ChIRDU) Study. Paper presented at the Population Association of America, Austin, TX. Merli, M. G., Verdery, A., Mouw, T., & Li, J. (2016). Sampling migrants from their social networks: The demography and social organization of Chinese migrants in Dar es Salaam, Tanzania. Migration Studies, 4(2), 182–214. Montgomery, M. R., & Casterline, J. B. (1993). The diffusion of fertility control in Taiwan: Evidence from pooled cross-section time-series models. Population Studies, 47(3), 457–479. Montgomery, M. R., & Casterline, J. B. (1996). Social learning, social influence, and new models of fertility. Population and Development Review, 22, 151–175. Moody, J., & White, D. R. (2003). Structural cohesion and embeddedness: A hierarchical concept of social groups. American Sociological Review, 68(1), 103–127. Mouw, T., & Verdery, A. M. (2012). Network sampling with memory: A proposal for more efficient sampling from social networks. Sociological Methodology, 42(1), 206–256.

Munshi, K., & Myaux, J. (2006). Social norms and the fertility transition. Journal of Development Economics, 80(1), 1–38. Myrdal, G. (1958). Economic theory and underdeveloped regions. London, UK: Gerald Duckworth Publishers. Myrskylä, M., Kohler, H., & Billari, F. C. (2009). Advances in development reverse fertility declines. Nature, 460(7256), 741–743. National Research Council. (2013). Nonresponse in social science surveys: A research agenda (R. Tourangeau & T. J. Plewes, Eds.). Panel on a Research Agenda for the Future of Social Science Data Collection, Committee on National Statistics. Division of Behavioral and Social Sciences and Education. Washington, DC: National Academies Press. Notestein, F. W. (1945). Population—The long view. In T. W. Schultz (Ed.), Food for the world (pp. 36–57). Chicago, IL: University of Chicago Press. Omran, A. R. (2005). The epidemiologic transition: A theory of the epidemiology of population change. Milbank Quarterly, 83(4), 731–757. Palloni, A. (2002). Rethinking the teaching of demography: New challenges and opportunities. Genus, 58(3/4), 35–70. Portes, A. (1997). Immigration theory for a new century: Some problems and opportunities. International Migration Review, 31(4), 799–825. Preston, S. H. (1993). The contours of demography: Estimates and projections. Demography, 30(4), 593–606. Preston, S., Heuveline, P., & Guillot, M. (2001). Demography: Measuring and modeling population processes. Malden, MA: Blackwell Publishing. Reher, D. S. (2011). Economic and social implications of the demographic transition. Population and Development Review, 37, 11–33. Rogers, E. M. (2003). Diffusion of innovations (5th ed.). New York, NY: Free Press. Rogers, E. M., & Kincaid, D. L. (1981). Communication networks: Toward a new paradigm for research. New York, NY: Free Press. Rosero-Bixby, L., & Casterline, J. (1993). Modelling diffusion effects in fertility transition. Population Studies, 47(1), 147–167. Rutenberg, N., & Sullivan, J. (1991, August). Direct and indirect estimates of maternal mortality from the sisterhood method. Paper presented at the Demographic and Health Surveys World Conference, Washington, DC. http://www.popline.org/node/316144 Sandberg, J. (2006). Infant mortality, social networks, and subsequent fertility. American Sociological Review, 71(2), 288–309. Sandberg, J., Rytina, S., Delaunay, V., & Marra, A. S. (2012). Social learning about levels of perinatal and infant mortality in Niakhar, Senegal. Social Networks, 34(2), 264–274. Skop, E., Peters, P. A., Amaral, E. F., Potter, J. E., & Fusco, W. (2006). Chain migration and residential segregation of internal migrants in the metropolitan area of São Paulo, Brazil. Urban Geography, 27(5), 397–421. Smith, R. (2006). Mexican New York: Transnational lives of new immigrants. Berkeley and Los Angeles, CA: University of California Press. Stecklov, G., Carletto, C., Azzarri, C., & Davis, B. (2010). Gender and migration from Albania. Demography, 47(4), 935–961. Tilly, C. (2007). Trust networks in transnational migration. Sociological Forum, 22(1), 3–24. Timaeus, I. (1991). Measurement of adult mortality in less developed countries: A comparative review. Population Index, 57, 552–568.

Toma, S. (2016). The role of migrant networks in the labour market outcomes of Senegalese men: How destination contexts matter. Ethnic and Racial Studies, 39(4), 593–613. Udry, J. R. (1982). The effect of normative pressures on fertility. Population and Environment, 5(2), 109–122. United Nations. (1983). Manual X: Indirect techniques for demographic estimation. Department of International Economic and Social Affairs Population Studies, No. 81. New York, NY: United Nations. Vacca, R., Solano, G., Lubbers, M. J., Molina, J. L., & McCarty, C. (2018). A personal network approach to the study of immigrant structural assimilation and transnationalism. Social Networks, 53, 72–89. Valente, T. W., Watkins, S. C., Jato, M. N., Van Der Straten, A., & Tsitsol, L. P. M. (1997). Social network associations with contraceptive use among Cameroonian women in voluntary associations. Social Science & Medicine, 45(5), 677–687. Verdery, A. M. (2015). Links between kinship and demographic transitions. Population and Development Review, 41(3), 465–484. Verdery, A. M., Entwisle, B., Faust, K., & Rindfuss, R. R. (2012). Social and spatial networks: Kinship distance and dwelling unit proximity in rural Thailand. Social Networks, 34(1), 112–127. Waldinger, R. (2008). Between “here” and “there”: Immigrant cross-border activities and loyalties. International Migration Review, 42(1), 3–29. Waldinger, R., & Fitzgerald, D. (2004). Transnationalism in question. American Journal of Sociology, 109(5), 1177–1195. Watkins, S. (1986). Conclusions. In A. J. Coale & S. Watkins (Eds.), The decline of fertility in Europe. Princeton, NJ: Princeton University Press. Watkins, S. C. (1991). Market, states, nations and bedrooms in Western Europe, 1870–1960. In J. Huber (Ed.), Macro-micro linkages in sociology (pp. 262–279). Newbury Park, CA: Sage Publications. Watkins, S. C. (2000). Local and foreign models of reproduction in Nyanza Province, Kenya, 1930–1998. Population and Development Review, 26(4), 725–759. Watkins, S. C., & Danzi, A. D. (1995). Women’s gossip and social change. Gender & Society, 9(4), 469–490. Xie, Y. (2000). Demography: Past, present, and future. Journal of the American Statistical Association, 95(450), 670–673. Zhao, Y. (2003). The role of migrant networks in labor migration: The case of China. Contemporary Economic Policy, 21(4), 500–511. Zhou, M., & Portes, A. (2012). The new second generation: Segmented assimilation and its variants. In C. Suarez-Orozco, M. Suarez-Orozco, & D. B. Qin-Hilliard (Eds.), The new immigration (pp. 99–116). New York, NY: Routledge.

Chapter 27

The Neuroscience of Social Networks

Carolyn Parkinson, Thalia Wheatley, and Adam M. Kleinbaum

The Neuroscience of Social Networks

From its beginning, the study of networks has drawn on a variety of disciplinary perspectives. For much of its history, research on social networks has assumed that social networks behave like other similarly large, interconnected structures. However, the nodes that make up social networks (human beings) think and behave in flexible, complex, and often seemingly irrational ways. A deep understanding of social networks, therefore, requires not only analysis at the network level but also an understanding of how such networks shape and are shaped by the psychological processes of their members. In recent years, psychology has begun to make inroads into the network literature, but while neuroscience is an increasingly important area of psychology, research on the neuroscience of social networks remains scarce. In this chapter, we review the extant research pertaining to the neuroscience of social networks and sketch a research agenda to augment this already interdisciplinary field with insights from neuroscience.

Fields Collide: The Social Brain Hypothesis

Research on the neuroscience of social networks traces its origins to the work of the anthropologist Robin Dunbar. Dunbar began with the observation that as the size of a group increases, the social complexity (that is, the number of potential dyadic ties within that group) increases far faster than the group itself: a group of n members contains n(n − 1)/2 potential dyads, so the count grows quadratically with group size. Combining field-based observations of social primates with neuroanatomical data, he noted a correlation between the average size of the brain’s neocortex in a primate species and the sociality of that species (Figure 27.1).
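To give a sense of scale for the growth in potential dyadic ties, the following minimal sketch counts the unordered pairs in groups of various sizes. The function name `potential_dyads` is hypothetical, and the example sizes are chosen only to roughly match the camp, clan, and tribe sizes mentioned in the caption of Figure 27.1.

```python
def potential_dyads(group_size: int) -> int:
    """Number of unordered pairs (potential dyadic ties) among group_size members."""
    if group_size < 0:
        raise ValueError("group_size must be non-negative")
    # Each of the n members can pair with n - 1 others; halve to avoid double counting.
    return group_size * (group_size - 1) // 2


# Illustrative group sizes (assumed for the example, not data from the chapter).
for size in (5, 50, 150, 1500):
    print(f"{size} members -> {potential_dyads(size)} potential dyads")
```

Tripling group size from 50 to 150 multiplies the number of relationships that could in principle be tracked by roughly nine, which is the sense in which social complexity outpaces group size.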

[Figure 27.1. Panel (a): mean group size plotted against neocortex ratio, with humans labeled. Panel (b): mean group size for individual societies, with camps, clans, and tribes labeled.]

Figure 27.1  Predicting human social group size from brain structure. (A) The relationship between mean social group size and neocortex ratio [i.e., (neocortex volume) / (total brain volume – neocortex volume)] in primates (white triangles = prosimians; black triangles = New and Old World Monkeys; white squares = apes; black square = modern humans; dashed lines depict, from left to right, separate regression lines for prosimians, monkeys, and apes). By extrapolating the relationship between group size and neocortex ratio in other primates to predict the average human social group based on the characteristic human neocortex ratio, Dunbar (1998) predicted an average social group size for humans of approximately 150 individuals. This number corresponds closely to the observed mean group size in modern humans (black square). Reproduced from Dunbar (2018). (B) Average social group sizes across three contemporary samples from the United States (black triangles), as well as traditional human societies from Africa, Asia, Australia, North America, and South America, including hunter-gatherer and horticultural communities. While hunter-gatherers tend to form small, relatively unstable, overnight camps of 30 to 50 individuals (white circles) and larger tribes of 500 to 2,500 individuals defined by a common cultural identity (white squares), they also consistently form clans or villages of approximately 150 individuals (black circles) whose members interact with one another regularly enough to form bonds based on direct and specific knowledge about each other (Dunbar, 1993). The predicted social group size (i.e., 150) extrapolated from the relationship shown in (A) is depicted by the solid black horizontal line; dashed horizontal lines indicate 95% confidence intervals. Reproduced from Dunbar (1998).

Extrapolating from a regression model relating neocortical volume and social group size in primates, Dunbar (1993) predicted that humans should have an average social group size of 150 individuals (Figure 27.1). This number, now known as “Dunbar’s number,” turns out to be a surprisingly common group size for humans. Dunbar found 150 to be the average clan size in traditional hunter-gatherer societies characterized by anthropologists (Dunbar, 1993). Similarly, although modern industrialized societies are much larger than 150 individuals, 150 appears to be the limit on the number of individuals (e.g., relatives, friends, acquaintances) with whom we maintain regular contact on at least an annual basis, and with whom we maintain defined social relationships (for a review, see Dunbar, 2008). In the corporate world, the company behind the Gore-Tex brand is well known for its policy of building plants to house 150 employees, with subsequent growth requiring the addition of a new building. “We’ve found again and again that things get clumsy at one hundred and fifty,” founder Bill Gore said (quoted in Gladwell, 2000).

Dunbar’s idea, known as the social brain hypothesis, posits that humans’ exceptional intelligence and correspondingly large brains evolved to meet the pressures associated with surviving and reproducing in large, complexly bonded groups (Byrne & Whiten, 1988; Dunbar, 1993). In many other species, interactions with unrelated others are limited to aggressive and reproductive encounters. Even among the relatively small subset of species whose members live peacefully in groups alongside nonkin with whom they have no reproductive ties, social groups are often composed of fluid, anonymous aggregations (Dunbar & Shultz, 2010). By contrast, as humans, we spend our lives almost entirely in the company of unrelated others with whom we forge lasting, intense bonds of the sort typically reserved for reproductive relationships in most other species (Dunbar & Shultz, 2007). Successfully navigating groups composed of very intense and varied social relationships characterized by shifting loyalties and rivalries, coalition formation, tactical deception, and strategic betrayals requires a brain with considerable computing power, since each member must keep track of his or her own relationships with others, relationships between third parties, and how best to use this information to his or her own benefit.

A considerable body of neuroscience evidence has accumulated in support of the social brain hypothesis by systematically relating social network size to brain size, and in particular, to the relative volume of neocortex (i.e., a component of the brain involved in higher-order mental functions, such as conscious thought and language), across species. In line with the notion that the cognitive demands of surviving and thriving in large, complexly bonded social groups selected for the unusually large human neocortex, average social group size is positively correlated with relative neocortical volume across primate species (Dunbar, 1993). The brain, in short, appears to have evolved to enable life in our social networks. If so, understanding how the structure and function of the brain affect, and are affected by, our networks is an important area for research.
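The extrapolation described above can be illustrated with a short sketch of a log-log regression prediction. The coefficients (intercept 0.093, slope 3.389) and the human neocortex ratio of 4.1 are values commonly cited from Dunbar’s original analysis; they are not stated in this chapter and should be treated as assumptions to be checked against the primary source. The function name `predict_group_size` is hypothetical.

```python
import math


def predict_group_size(neocortex_ratio: float,
                       intercept: float = 0.093,
                       slope: float = 3.389) -> float:
    """Predict mean group size from a log-log regression:
    log10(group size) = intercept + slope * log10(neocortex ratio).

    The default coefficients are assumed values (commonly attributed to
    Dunbar's primate regression), not figures taken from this chapter.
    """
    if neocortex_ratio <= 0:
        raise ValueError("neocortex_ratio must be positive")
    return 10 ** (intercept + slope * math.log10(neocortex_ratio))


# Assumed human neocortex ratio; see the caveat above.
human_neocortex_ratio = 4.1
predicted = predict_group_size(human_neocortex_ratio)
print(f"Predicted human group size: about {predicted:.0f}")
```

Under these assumed inputs the prediction lands close to the figure of 150 discussed in the text. Because the regression is fit on logarithms, small changes in the slope or in the assumed neocortex ratio shift the predicted group size substantially, which is one reason the confidence interval shown in Figure 27.1 is wide.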

An Emerging New Field

Humans’ distinctive sociality, enabled by our large neocortex, is thought to reflect an evolutionary advantage: coordinating with otherwise would-be strangers likely enhanced our ancestors’ abilities to survive, thrive, and reproduce. However, while inhabiting large, complexly bonded social groups confers substantial benefits to individual group members, it is also extremely cognitively demanding: as group size increases, each group member must monitor and remember an ever-increasing amount of social information (e.g., Who is friends with whom? Who is in conflict with whom?) to maintain harmony and avoid conflict within the group. Thus, social complexity and human brain evolution are thought to be tightly linked (Dunbar & Shultz, 2007). Understanding this relationship (how the brain supports and constrains our sociality, and how our social networks impact brain structure and function) is the topic of an emerging new field at the intersection of neuroscience, anthropology, and sociology: the neuroscience of social networks. In this chapter, we explore this new field and how an understanding of the brain may shed light on how we shape and are shaped by the networks in which we are embedded.

By integrating approaches from the fields of neuroscience and social network analysis, we can begin to ask questions like: What kinds of social network information does the brain track and encode? How do situational factors shape the kinds of social network information that are encoded, and how does such information modulate subsequent thought and behavior? How do biological factors, such as brain structure, influence the kinds of social network positions that individuals occupy? And how do the network positions that we occupy affect subsequent brain development? Although we do not yet have complete answers to these questions, they are well within reach of the combined expertise of these fields.

Why the Brain?

A question often posed to neuroscientists studying social behavior is: Why go to the brain at all? That is, what explanatory power does a neuroscientific explanation provide over and above a behavioral one? The candid answer is that right now, neuroscientific explanations for social behavior are limited. The field of social neuroscience is in its infancy. However, even inchoate explanations are beginning to bear fruit and these explanations reveal two answers. The first is that a deep understanding of how people connect requires an understanding of the tools the brain uses to support that connectivity. Moreover, it requires an understanding of the limitations of that biological endowment. The second answer is that a behavioral approach requires behavior to observe. In contrast, brain activity offers a window into mental processing and can even predict behavior before it occurs, thereby providing both a predictive model of future behavior and the possibility of intervention. Furthermore, by decoding thought, including patterns of thought that exist under the threshold of conscious awareness (Soon et al., 2008), neuroscience can reveal how people respond to the social world in ways that may not be directly reportable by the persons involved or that may lack overt behavioral corollaries. For example, a recent functional magnetic resonance imaging (fMRI) study found that people whose social network positions afford more brokerage opportunities recruit brain regions that support considering others’ points of view to a greater extent when updating their own opinions following exposure to divergent peer feedback (i.e., peers’ opinions that disagreed with their own). Yet, no differences were identified between high- and low-brokerage individuals in behavioral performance (i.e., the extent to which people changed their own opinions following divergent peer feedback) on the same task (O’Donnell et al., 2017).

More generally, functional neuroimaging can provide an information-rich measure of diverse aspects of how people attend to, mentally respond to, and interpret the world around them. These characterizations can be compared across members of the same social networks, for example, to investigate homophily and social influence effects in a finer-grained manner than might otherwise be possible (Parkinson, Kleinbaum, & Wheatley, 2018). In addition, as discussed later in this chapter, characterizing neural response patterns evoked when people view personally familiar others can provide insight into what aspects of social knowledge people track and retrieve during social encounters (e.g., traits, characteristics of their social network position), and mapping out what brain systems encode such knowledge can inform testable hypotheses regarding impact on downstream thoughts and behaviors (Parkinson, Kleinbaum, & Wheatley, 2017; Zerubavel et al., 2015). Social neuroscience may be in its infancy, but its potential to add signal to models of human behavior should not be underestimated. Here, we provide examples of how this potential is currently being realized to advance our understanding of how individuals encode, shape, and are shaped by their social environment and suggest directions for future research.

How the Brain Encodes Social Relationships

In this section, we highlight psychological and neuroscientific research on how people think about, and are affected by, social relationships between themselves and others.

Differential Neural Responses to Friends and Strangers

The majority of psychological and neuroscientific research examining how individuals’ real-world social relationships impact their thoughts, emotions, and behaviors has been limited to contrasting behavioral and neural responses to friends and strangers. This growing body of literature suggests marked differences in how the human brain responds to strangers and personally familiar others (Deaner, Shepherd, & Platt, 2007; Fareri et al., 2012; Gobbini et al., 2013; Visconti di Oleggio Castello et al., 2014). For example, merely viewing familiar faces (cf. strangers’ faces) engages brain systems involved in affective processing and theory of mind (i.e., thinking about other people’s thoughts), purportedly reflecting emotional responses and the activation of person knowledge (e.g., traits, intentions, attitudes), respectively (Gobbini & Haxby, 2007). The automatic activation of knowledge about familiar individuals when encountering them is thought to assist the perceiver in appropriately “shifting gears” depending on whom the perceiver has encountered (e.g., an old friend, an acquaintance, an employer). Thus, our brains automatically distinguish between familiar and unfamiliar others when encountering them, and differential neural responses to strangers and familiar others likely serve to facilitate effective, beneficial social interactions.

The Need to Move beyond “Friend versus Stranger”

Perhaps reflecting the logistical challenges of bringing real-world social relationships into the lab, very little research has extended the study of how personal relationships are represented in the brain and/or how they impact neural processing beyond the relatively crude distinction between familiar others and complete strangers. Therefore, with few exceptions (e.g., mother-infant bonds; Case, Repacholi, & Stevenson, 2006; Leibenluft et al., 2004; E. E. Nelson & Panksepp, 1998), extremely little is known about how the human brain encodes information about the nature and quality of our relationships with personally familiar others, or the neural mechanisms through which such information influences cognition and behavior. Yet, given that many of our everyday interactions take place with people who are already familiar to us (Sun et al., 2013), it seems likely that these interactions are influenced by more nuanced social relationship information than the simple distinction between those we have encountered before and those we have not. Better understanding how social relationship information, such as social closeness, is encoded in the brain, and how such information impacts downstream neural processing (and thus subsequent thoughts, emotions, and actions), is an important direction for future research.

The Neural Representation of Social Closeness

We recently sought to address this gap in understanding by investigating how the brain encodes social closeness (i.e., tie strength) between perceivers and individuals with whom they are familiar. We hypothesized that social closeness would be represented in the brain using neural mechanisms also involved in encoding proximity to oneself in other domains (e.g., spatial and temporal frames of reference). This prediction was rooted in the observation that converging theories from cognitive linguistics, neuroscience, and social psychology suggest that different domains of psychological distance (i.e., removal from one’s own current, firsthand experience) are encoded similarly. Conceptual metaphor theory (Lakoff & Johnson, 2008) suggests that we use spatial language to describe social relationships (e.g., “close friend,” “distant relative”) because we mentally represent this information in spatial terms. Neuroscientists have suggested that over the course of evolution, mechanisms devoted to spatial processing may have been redeployed to “plot” information in increasingly abstract (e.g., social, temporal) frames of reference (Parkinson & Wheatley, 2013, 2015; Yamazaki, Hashimoto, & Iriki, 2009). Mounting evidence from social psychology supports these assertions and suggests an explanation for the overlap in the language and brain areas used to represent spatial and social distance. The degree to which information is removed from our current experience in time or space, or the extent to which it refers to someone else (i.e., social distance), carries a common psychological meaning with important implications for the perceiver, namely its relevance to the self in the here and now and, thus, the level of detail at which such information should be construed (Liberman & Trope, 2008; Vallacher & Wegner, 1985).

We scanned participants using fMRI as they viewed trials consisting of sequentially presented pairs of objects photographed at different egocentric distances (spatial distance trials), phrases referring to the immediate or more remote future (temporal distance trials), and names and photographs of familiar others and acquaintances (social distance trials). In each trial, participants saw two images sequentially and were asked to judge how much closer or farther, sooner or later, or more or less familiar the second image was relative to the first for spatial, temporal, and social distance trials, respectively (Figure 27.2). Thus, in effect, the progression of stimuli over time within each trial was analogous to “movement” either toward or away from the participant in a spatial, temporal, or social frame of reference. Using statistical pattern recognition techniques, we found that a region of parietal cortex with a long-established role in encoding spatial distance in humans and other animals also underpins mental representations of social and temporal distances. The pattern of activity in this region for nearer versus farther objects was similar to the pattern evoked by more familiar versus less familiar others and the pattern for sooner versus later time (Parkinson, Liu, & Wheatley, 2014), suggesting a common neural mechanism for distinguishing social, spatial, and temporal distances from the self.
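The logic of testing for a common code across domains can be sketched as cross-domain classification: a classifier is trained to separate “nearer” from “farther” patterns in one domain and then tested on another. The toy simulation below is only an illustration of that logic under stated assumptions; it is not the analysis pipeline of the study described above. The simulated voxel patterns, the nearest-centroid classifier, and all function names are hypothetical, and the shared code is built into the synthetic data by construction, so the result demonstrates the method rather than providing evidence.

```python
import random


def simulate_trials(distance_axis, n_trials, noise_sd, rng):
    """Simulate voxel patterns; label 1 = nearer to the self, 0 = farther."""
    patterns, labels = [], []
    for trial in range(n_trials):
        label = trial % 2
        sign = 1.0 if label == 1 else -1.0
        patterns.append([sign * w + rng.gauss(0.0, noise_sd) for w in distance_axis])
        labels.append(label)
    return patterns, labels


def class_centroids(patterns, labels):
    """Mean voxel pattern for each label."""
    centroids = {}
    for label in set(labels):
        members = [p for p, l in zip(patterns, labels) if l == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def classify(centroids, pattern):
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], pattern))


def cross_domain_accuracy(train, test):
    """Train on one domain's trials, then score predictions on another domain's trials."""
    centroids = class_centroids(*train)
    test_patterns, test_labels = test
    hits = sum(classify(centroids, p) == l for p, l in zip(test_patterns, test_labels))
    return hits / len(test_labels)


rng = random.Random(0)
n_voxels = 50
# Assumption: both domains share one underlying "distance from self" axis.
shared_axis = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
spatial_trials = simulate_trials(shared_axis, 40, 1.0, rng)
social_trials = simulate_trials(shared_axis, 40, 1.0, rng)

accuracy = cross_domain_accuracy(spatial_trials, social_trials)
print(f"Train on spatial, test on social: accuracy = {accuracy:.2f}")
```

Above-chance accuracy when training and testing on different domains is what a shared representation predicts; if each domain used an unrelated code, a classifier trained on one would not be expected to generalize systematically to the other.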

[Figure 27.2. Example trial sequences shown over time, labeled Closer/Farther (spatial), Sooner/Later (temporal; phrases “IN A FEW SECONDS” and “YEARS FROM NOW”), and More Familiar/Less Familiar (social; placeholders “FRIEND” and “ACQUAINTANCE”).]

Figure 27.2  Evidence for shared neural mechanisms for representing spatial, temporal, and social closeness. (A–C) In an fMRI study, participants viewed sequentially presented stimuli such that stimulus change over time was analogous to “movement” either toward or away from the observer in spatial, temporal, or social frames of reference. (A) Spatial distance trials consisted of objects photographed at different egocentric distances. (B) Temporal distance trials consisted of phrases referring to the immediate or more remote future. (C) Social distance trials consisted of names and photographs of four friends and four acquaintances of the participant. Experimental stimuli contained individuals’ actual first and last names rather than the words friend and acquaintance. (D) In a large cluster within the right inferior parietal cortex, a brain region consistently implicated in spatial cognition, neural response patterns encoded relative distance from the self, irrespective of whether that distance was social, spatial, or temporal in nature. Adapted from Parkinson et al. (2014). Full color figures available on Oxford Handbooks Online.

This finding suggests that encoding social closeness to oneself (i.e., tie strength) relies on an evolutionarily ancient computation for representing distances from the self in the physical world. Our ability to track how socially close we are to an individual at any moment is possible in part because we represent the strength of a social bond as “distance from self.” These findings also support speculation that brain circuitry originally devoted to spatial computations was “recycled” to perform analogous operations in increasingly abstract frames of reference (Parkinson & Wheatley, 2013, 2015; Yamazaki et al., 2009). More generally, the current results are consistent with suggestions that neural mechanisms supporting higher-order cognition may often be best understood in terms of the computations, rather than the domains of knowledge, that they involve (Mitchell, 2008). Although cognition is often studied according to common-sense categories, it would be inefficient for the brain to represent spatial, social, and temporal distances using entirely separate mechanisms if they carry a common psychological meaning (proximity to the self in the here and now), as suggested by their strikingly similar effects on predictions, evaluations, and behavior (Liberman & Trope, 2008).

By combining the characterization of individuals’ real-world social relationships with neuroimaging methods, we can begin to understand how social relationship information, such as the strength of a social tie, is encoded in the brain. Using similar approaches, we are hopeful that future research will shed light on the neural mechanisms through which aspects of our direct social relationships modulate cognitive, emotional, and behavioral responses to other people (e.g., attention to social cues: Deaner et al., 2007; reactions to others’ pain: Martin et al., 2015). Continued progress on this front will require that researchers continue to combine information about real-world social relationships (beyond the friend vs. stranger distinction) with methods for characterizing neural information processing.

The Neural Encoding of Indirect Social Relationships

In the following section, we consider how people think about and are affected by social relationships between others, and patterns thereof.

The Importance of Indirect Social Relationships to Everyday Human Thought and Behavior

One of the key insights of the social network perspective is that relationships between third parties shape behavior (Brent, 2015; Massen, Pašukonis, Schmidt, & Bugnyar, 2014; Massen, Szipl, Spreafico, & Bugnyar, 2014). Knowledge about third-party relationships (e.g., who is friends with whom) and patterns of social ties (e.g., who has many friends) can be useful for managing our own reputations and for tracking the reputations of others. For example, cooperation and trust between otherwise unfamiliar individuals are facilitated when those individuals share mutual friends (Ferrin, Dirks, & Shah, 2006), presumably because shared social ties heighten the potential reputation costs and benefits posed by an interaction (Coleman, 1988). Many everyday behaviors, such as predicting the potential consequences of a recent social misstep or determining how best to seek or spread a particular piece of information, depend on the ability to track and encode not only the states of our own relationships but also patterns of ties between third parties in our social groups.

Despite the apparent importance to individual cognition and behavior of relationships between third parties in our social networks (Krackhardt, 1990), extremely little is known about how, and under what circumstances, such information is encoded in the brain and how third-party relationship knowledge may come to influence cognition, emotions, and behavior during social interactions (Weaverdyck & Parkinson, 2018). Given that neuroscientists have historically paid very little attention to even direct social relationship information, beyond the friend versus stranger distinction, the dearth of research investigating the neural encoding and consequences for downstream neural processing of indirect social relationship information is perhaps not altogether surprising.


The Neural Encoding of Social Network Position Characteristics

In a recent study, we sought to gain insight into how the human brain tracks and encodes patterns of social relationships, specifically where others sit in one’s real-world social network (Parkinson et al., 2017). We first characterized the friendship network of a graduate student cohort (N = 275) and recruited a subset of these students for an fMRI study. A customized stimulus set was created for each fMRI participant to ensure that he or she viewed individuals who varied in terms of at least two aspects of social network position that we predicted would be behaviorally relevant: geodesic distance from the participant and eigenvector centrality. Accordingly, each participant’s stimulus set consisted of brief videos of the two highest and lowest eigenvector centrality individuals at geodesic distances of 1, 2, and 3 from him or her in the friendship network.

During the fMRI study, participants were instructed to simply watch these videos and press a button if the same video was displayed twice in a row (to maintain their attention on the screen). After exiting the scanner, participants saw the same classmates again and rated them in terms of perceived social closeness, eigenvector centrality, and brokerage. These subjective ratings were highly positively correlated with the individuals’ actual proximity to the participant in the friendship network, eigenvector centrality, and brokerage, respectively, suggesting that participants had relatively accurate knowledge of familiar others’ social network positions. In addition, the fMRI results suggested that this knowledge had been spontaneously retrieved in the students’ brains when viewing one another, even in the absence of a related task. In other words, information about social distance from the participant, brokerage, and eigenvector centrality was reliably carried in distributed patterns of neural responses evoked when network members merely saw one another’s faces.

Geodesic distance from the participant was encoded in the same region of parietal cortex that we previously found contained a common neural code for social, spatial, and temporal distances from oneself (Parkinson et al., 2014), consistent with suggestions that brain regions with an evolutionarily old role in encoding physical space may be redeployed to encode where other people sit in a mental map of “social space” (Parkinson & Wheatley, 2013, 2015). Brokerage information was encoded in brain areas (e.g., superior temporal and supplementary motor regions) widely implicated in action understanding. Future work will hopefully clarify if this pattern of results is attributable to brokers imbuing more social meaning into their gestures or commanding differential amounts of attention from perceivers to their actions and gestures (e.g., because of perceivers’ knowledge of their brokerage status or of qualities related to this aspect of network position). Finally, eigenvector centrality in the friendship network was encoded in brain regions critical for inferring others’ mental states and intentions (e.g., dorsomedial prefrontal cortex, posterior cingulate cortex) and visual attention (e.g., extrastriate visual cortex), and for assessing the value of stimuli (e.g., ventromedial frontal cortex). Interestingly, a related study that focused on identifying brain regions that track in-degree centrality reported a similar pattern of results, as described in more detail below.

Zerubavel et al. (2015) investigated the neural mechanisms involved in tracking sociometric popularity, operationalized as in-degree centrality (i.e., the sum of liking ratings received from fellow group members). The authors first characterized the social networks of two student groups (i.e., 13-member on-campus clubs), then measured group members’ brain activity while they viewed photographs of one another in an fMRI scanner. When high in-degree centrality individuals’ photographs were shown, greater activity was observed in brain regions that have previously been implicated in tracking the value of rewards (e.g., ventromedial prefrontal cortex, ventral striatum), as well as in brain systems involved in understanding others’ mental states (e.g., dorsomedial prefrontal cortex, the temporoparietal junction). Moreover, Zerubavel et al. (2015) found that activity in reward-related brain areas mediated the relationship between the sociometric popularity of the individual being viewed and the engagement of areas involved in social cognition (e.g., inferring others’ mental states) in the perceiver’s brain. These results suggest that brain systems involved in monitoring the value of stimuli in our surroundings may assign increased motivational relevance to highly popular individuals, which may in turn trigger the engagement of brain regions involved in understanding the mental states of those individuals.
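For readers less familiar with the network measures used in these studies, the following minimal sketch computes geodesic distance, eigenvector centrality, and in-degree centrality (as a sum of ratings received) on a small toy network. The five-person friendship network, the liking ratings, and all function names are hypothetical and invented for illustration; this is not the code or data used in the studies described above, and in practice a dedicated network analysis library would normally be used.

```python
from collections import deque


def geodesic_distances(adjacency, source):
    """Breadth-first search: shortest path length (number of ties) from source."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def eigenvector_centrality(adjacency, iterations=200):
    """Power iteration on (A + I). Adding the identity leaves the leading
    eigenvector unchanged but prevents oscillation on bipartite graphs."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        updated = {
            node: scores[node] + sum(scores[nb] for nb in adjacency[node])
            for node in adjacency
        }
        norm = sum(value * value for value in updated.values()) ** 0.5
        scores = {node: value / norm for node, value in updated.items()}
    return scores


def in_degree_centrality(ratings):
    """Sum of ratings each person receives; ratings maps (rater, target) -> value."""
    totals = {}
    for (_rater, target), value in ratings.items():
        totals[target] = totals.get(target, 0) + value
    return totals


# Hypothetical undirected friendship network (names are invented).
friends = {
    "Ana": {"Ben", "Cy"},
    "Ben": {"Ana", "Cy"},
    "Cy": {"Ana", "Ben", "Dee"},
    "Dee": {"Cy", "Eli"},
    "Eli": {"Dee"},
}
# Hypothetical directed liking ratings.
liking = {("Ana", "Cy"): 5, ("Ben", "Cy"): 4, ("Cy", "Ana"): 3}

distances = geodesic_distances(friends, "Ana")
centrality = eigenvector_centrality(friends)
popularity = in_degree_centrality(liking)

print("Geodesic distances from Ana:", distances)
print("Most central by eigenvector centrality:", max(centrality, key=centrality.get))
print("Ratings received:", popularity)
```

In this toy network, a friend of a friend sits at geodesic distance 2, and eigenvector centrality rewards being tied to others who are themselves well connected, which is why it can differ from a simple count of ties.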

Distinct but Analogous Facets of Social Status

Interestingly, the pattern of results described previously concerning the neural encoding of social network centrality closely mirrors what has been observed in studies of the neural encoding and cognitive consequences of dominance-based social status in our close primate relatives. For example, rhesus macaques ascribe value to viewing the faces of high-status (i.e., dominant) conspecifics and attend more to cues to dominant/high-ranking individuals’ mental states (Deaner, Khera, & Platt, 2005; Klein & Platt, 2013; Shepherd, Deaner, & Platt, 2006). Thus, in humans, sociometric popularity appears to exert strikingly similar effects on neural and cognitive processing to those exerted by dominance-based social status in other primates.

More research is needed to better understand the neural mechanisms through which sociometric status is encoded and impacts the processing of other domains of information, as the vast majority of psychological and neuroscientific research on the perception, antecedents, and consequences of social status in humans has centered on the status conferred by physical dominance and, to a lesser degree, prestige (i.e., respect based on expertise; Cheng et al., 2013). Indeed, whereas sociological research has investigated the social status that individuals receive through their patterns of social connections and its influence on interpersonal interactions (e.g., Ellwardt, Labianca, & Wittek, 2012), the overwhelming majority of psychological and neuroscientific literature on social status has operationalized social status in terms of physical dominance and the associated capacity to inflict physical violence on others (Cheng et al., 2013). Given that for modern humans, success in everyday life is increasingly dependent on affiliative social relationships and reputation management (Tennie, Frith, & Frith, 2010) rather than the need to display or avoid physical violence (Pinker, 2011), the support and capacity for influence associated with an individual’s social network position (e.g., being connected to other highly influential individuals) is likely a highly behaviorally relevant facet of social status. Yet, the neural encoding and cognitive consequences of this aspect of social status are only beginning to be understood.

For humans, tracking and encoding relationships and interactions between third parties account for a large proportion of what we speak, and likely think, about every day. Roughly two-thirds of human conversations are centered on social topics about third parties (i.e., on gossip; Dunbar, 2004). Consistent with the suggested importance of patterns of third-party relationships to individual cognition and behavior, this preponderance of gossip is thought to allow information about interactions and relationships between third parties to percolate efficiently through social groups, allowing individuals’ knowledge about other group members to extend well beyond what would be possible for them to observe firsthand (Dunbar, 2004; Mullins, Whitehouse, & Atkinson, 2013). Managing our own reputations and monitoring those of others not only figures prominently in modern human life but also has been suggested to be a pressure that drove the evolution of language (Knight, Studdert-Kennedy, & Hurford, 2000; Tennie et al., 2010). Thus, monitoring relationships and information flow between third parties appears to be central to the evolution, and everyday deployment, of human cognition. However, we are only beginning to map out the neural mechanisms involved in monitoring and encoding information about relationships between third parties (e.g., whether an individual is a friend, a friend of a friend, or further removed from oneself in social ties; structural characteristics of an individual’s social network position, such as how well connected he or she is, or whether or not he or she presents a brokerage opportunity). Further research integrating approaches from cognitive neuroscience and social network analysis is needed to better understand these phenomena.

How the Brain Shapes and Constrains Social Networks

Until recently, research relating brain size to social network size had only examined this relationship across species. Researchers have now begun to relate brain structure to social network characteristics in humans. The first study of this kind found that social network size and complexity (as indexed by the Number of People in Social Network and Number of Embedded Networks subscales of the Social Network Index [SNI], respectively; Cohen et al., 1997) were correlated with the volume of the amygdala, a brain region involved in social and emotional processing (Bickart et al., 2011). Subsequent studies replicated and extended this work by demonstrating that amygdala volume is positively associated with the size of both face-to-face and online (i.e., Facebook) social networks (Kanai et al., 2012; Von Der Heide, Vyas, & Olson, 2014) and have highlighted positive associations between social network size and the volume of other brain regions within the frontal and temporal lobes that are implicated in social information processing (Kanai et al., 2012; Lewis et al., 2011; Powell et al., 2012; Von Der Heide et al., 2014). However, there remains some inconsistency across studies in the particular brain regions that have been associated with social network size in humans. This may be due in part to variability across studies in the indices of social network size that have been used (e.g., number of Facebook friends; number of people an individual has had social contact with during the past month; the SNI: Cohen et al., 1997; the Norbeck Social Support Questionnaire: Norbeck, Lindsey, & Carrieri, 1981).
Although more work is needed to better understand exactly how and why various indices of social engagement are differentially related to brain structure, it is striking that studies using a wide range of methodologies and samples consistently find positive associations between social network size (i.e., ego degree) and the size of brain structures involved in social cognition.
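To make the notion of ego degree concrete, the following minimal Python sketch counts each person's distinct alters from a list of undirected ties. It is an illustration only: the function name and the toy friendship data are hypothetical and are not drawn from any of the studies cited above, which relied on instruments such as the SNI or on counts of online friends.

```python
from collections import defaultdict


def ego_degrees(ties):
    """Map each person to the number of distinct alters they are tied to.

    `ties` is an iterable of (person_a, person_b) pairs, treated as
    undirected; repeated pairs and self-nominations are ignored.
    """
    alters = defaultdict(set)
    for a, b in ties:
        if a == b:
            continue  # a tie to oneself does not add an alter
        alters[a].add(b)
        alters[b].add(a)
    return {person: len(others) for person, others in alters.items()}


# Hypothetical toy data: the ("ana", "ben") tie is reported twice.
ties = [
    ("ana", "ben"),
    ("ana", "cal"),
    ("ana", "dee"),
    ("ben", "cal"),
    ("ana", "ben"),
]
degrees = ego_degrees(ties)
print(degrees)
```

Storing alters in sets means that a tie reported more than once is counted a single time, which is one reason different indices of network size can disagree for the same person.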

Notably, Lewis et al. (2011) found that the cortical volume of two regions of the medial prefrontal cortex was positively correlated with both social cognitive competence (indexed by the ability to engage in higher-order reasoning about mental states, e.g., “I believe that you suppose that she thinks . . .”) and social network size. Later work by the same group demonstrated that individual differences in social cognitive skills mediate the relationship between prefrontal cortical volume and social network size (Powell et al., 2012). These studies provide an important source of support for the social brain hypothesis: for evolution to work, there must be within-species variability upon which natural selection can operate, and if the social brain hypothesis is correct, then variability in neocortical volume should relate to both social cognitive competencies and social network size. Taken together with the extant body of research relating aspects of brain size (e.g., neocortical volume) to social group size within and across species, these results strongly suggest that the human brain increased in size over the course of evolution to meet the cognitive demands of navigating large, complexly bonded social networks.

Does the Processing Capacity of the Human Brain Constrain Social Network Size?

Modern technologies, such as the internet, would seem to provide us with the tools necessary to forge and maintain social relationships on a far larger scale than would otherwise be possible. Yet, the same average social community size (150 individuals) that characterizes social networks in both modern industrial and hunter-gatherer societies (Hamilton et al., 2007; Hill & Dunbar, 2003; Roberts et al., 2009; Zhou et al., 2005) also characterizes the number of relationships that people maintain online (e.g., on Facebook: Dunbar, 2016; on Twitter: Gonçalves, Perra, & Vespignani, 2011; via email communications: Haerter, Jamtveit, & Mathiesen, 2012). In addition, use of social networking sites does not appear to meaningfully impact face-to-face social network size (Christakis & Fowler, 2009) or feelings of emotional closeness to members of one’s offline network (Pollet, Roberts, & Dunbar, 2011). Thus, even though modern technological innovations allow us to “friend” thousands of individuals, the number with whom we can manage significant relationships is constrained by limits on both our time and the processing capacity of our brains.

Of course, an individual’s social effort (e.g., time, emotional investment) is not distributed equally across his or her alters. There appear to be sharp, consistent breakpoints in an ego’s level of investment in his or her alters, such that our social networks are composed of a series of layers, with each successive layer containing approximately three times the number of alters in the previous layer (i.e., 5, 15, 45, ~150), and with relationships in each successive layer characterized by decreasing levels of intimacy and frequency of interaction (Dunbar, 2008).
Evidence for this hierarchical structure has been found across diverse cultural contexts (e.g., industrial and hunter-gatherer societies) and in both online and face-to-face networks (Dunbar et al., 2015; Dunbar & Spoors, 1995; Hamilton et al., 2007; Hill & Dunbar, 2003; Zhou et al., 2005). It is possible that universal and biologically predisposed limits on our social cognitive abilities constrain not only the size of our personal social networks but also the distribution of tie strength within them. For instance, for each individual with whom one maintains a strong social tie, one must maintain an exceptionally comprehensive set of memories (e.g., intimate details about that person and the relationships with oneself and others). Cross-cultural consistency in the number of relationships within each concentric “layer” of one’s personal network may reflect limitations on the capacity to remember and manage specific relationship information (Sutcliffe et al., 2012; Zhou et al., 2005). In the same vein, it has even been suggested that the innermost circle of one’s social network is limited to an average of five individuals because humans, on average, can only simultaneously represent the mental states of five individuals (Stiller & Dunbar, 2007). Alternatively, it is possible that human social networks evince a universal “layered” structure across cultures and contexts because of constraints on the amount of time required to forge and maintain a bond of a given strength. Furthering our understanding of precisely how the processing power of the human brain constrains the size and structure of the social networks that we inhabit is a promising avenue for future research.
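The layered structure described in this section can be illustrated with a short arithmetic sketch. It assumes, purely for illustration, an innermost layer of 5 alters and a constant scaling ratio of exactly 3; the function name and its parameters are hypothetical. The code also computes the ratios implied by the approximate layer sizes quoted in the text (5, 15, 45, ~150).

```python
def layer_sizes(innermost=5, ratio=3, n_layers=4):
    """Cumulative layer sizes under a constant scaling ratio (an assumption)."""
    return [innermost * ratio ** k for k in range(n_layers)]


# Sizes under strict tripling from an innermost layer of 5.
idealized = layer_sizes()

# Approximate layer sizes as quoted in the text.
reported = [5, 15, 45, 150]

# Ratio of each reported layer to the one before it.
reported_ratios = [
    outer / inner for inner, outer in zip(reported, reported[1:])
]

print(idealized)        # [5, 15, 45, 135]
print(reported_ratios)  # the last ratio is slightly above 3
```

Strict tripling yields 135 for the outermost layer, close to but not identical with the figure of roughly 150, which is consistent with the text's wording that each layer is only approximately three times the size of the previous one.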

How Social Networks Shape the Brain

An intriguing new line of research suggests that just as biologically predisposed characteristics of brain structure and function shape our social networks, our social networks themselves can alter our brains. Indeed, recent evidence of the brain’s neuroplasticity has overturned the prior scientific consensus that the brain, once developed, remains largely static throughout adulthood (Pascual-Leone et al., 2005). The prevailing view now holds that brain structure and function remain changeable in response to experience throughout the life course. But very little research to date has examined how the brain adapts in response to the social networks within which we are embedded.

When relating brain structure to sociality in humans, it is often difficult to ascertain the direction of causality in the relationship between neural and social network variables: do people whose brains are already especially well suited to advanced social cognition go on to forge more social connections, or do the cognitive demands of managing a larger number of social relationships engender changes in the structure and function of brain regions involved in social cognition? The answer is very likely that both phenomena occur, given that many aspects of social network position appear to be heritable (Fowler, Dawes, & Christakis, 2009) and given the large and growing body of evidence for neuroplasticity (C. A. Nelson, 1999). However, because researchers lack control of human research participants’ social context, until recently, there was no evidence that individuals’ social networks can shape the structure and function of their brains. Fortunately, laboratory studies of our close primate relatives provide one way to address this issue. In a landmark study, Sallet et al.
(2011) assessed the relationship between social network size and brain structure in adult rhesus macaques that were randomly assigned to live in social groups of varying sizes in a research colony. Living in larger social groups caused the macaques to develop more gray matter (i.e., more neural cell bodies) in areas of the frontal and temporal lobes known to be involved in social and emotional processing (e.g., superior temporal sulcus, temporal pole, amygdala, rostral prefrontal cortex), and also caused increased functional coupling (i.e., connectivity) between these brain regions, as measured by correlations of fMRI time series across brain regions. In other words, manipulating social network size exerted a causal effect on the structure and functional response profile of brain regions involved in social and emotional processing. Thus, the structure and function of brain regions implicated in navigating the social environment, rather than being fixed or purely genetically predetermined, appear to remain labile even into adulthood.

Observational evidence in humans appears to be consistent with the results of the macaque research. One study showed that relative to bus drivers, who repeatedly drive a fixed route, London taxi drivers, who must learn several thousand streets to successfully navigate the city, have increased gray matter volume in a brain region involved in encoding mental maps of space, the posterior hippocampus, and the volume of this brain structure is positively correlated with years of taxi-driving experience (Maguire, Woollett, & Spiers, 2006). Thus, increased experience with mentally representing and reasoning about complex maps of space (i.e., engaging in expert navigation) increases gray matter volume in a brain region supporting spatial navigation. In the same way, inhabiting a complex social environment (e.g., being embedded in a larger social group) may demand a significant degree of expert social cognition, thereby shaping the structure and function of brain regions supporting the underlying mental processes.

In addition to their importance to furthering our understanding of the relationship between brain structure and the social environment, these findings have several practical implications.
For example, many clinical disorders associated with alterations in social engagement (e.g., depression, autism) are also associated with neuroanatomical differences (e.g., cortical thinning) in brain regions involved in social cognition and emotion regulation (Hadjikhani et al., 2005). If the complexity of one’s social environment exerts a causal effect on the structure of brain regions involved in social and emotional processing, then relationships between the structure of these brain regions and clinical disorders characterized by altered social interactions may at least partially reflect the consequences, rather than the causes, of concomitant alterations in social functioning.

The finding that social network size shapes brain structure also has important implications for efforts to identify potential risk factors for, and protective factors against, cognitive decline in older adults. The same brain regions (e.g., prefrontal cortex) that increase in volume to support navigating large social networks (Sallet et al., 2011) also support more general high-level cognitive functions, such as working memory, planning, attention, and language. This has led some researchers to suggest that maintaining a high level of social engagement is analogous to “exercise” for these brain structures, which may provide some degree of protection against the functional impairments associated with normal aging and with the onset of neurodegenerative diseases (Wald, 2016). Consistent with such speculation, longitudinal studies have found evidence that maintaining an extensive social network protects older individuals against the development of dementia (Fratiglioni et al., 2000), and against memory loss more generally (Ertel, Glymour, & Berkman, 2008).
Relatedly, while cognitive degeneration and old age typically entail decreased long-distance connectivity between brain regions, a recent study demonstrated that greater social network embeddedness is associated with higher levels of long-distance brain connectivity in older adults (Joo et al., 2017). The relationship between social engagement and cognitive function is not limited to older adults; recent evidence also points to associations between cognitive abilities (e.g., memory, executive functioning) and the size of one’s social network among adults ranging in age from 35 to 85 (Seeman et al., 2011). In addition, a recent study found that older adults (i.e., 80 years of age on average) who participated in a social engagement intervention consisting of daily, 30-minute web-enabled face-to-face conversations improved on tests of memory and executive function over a six-week period (Dodge et al., 2015). Thus, mounting evidence suggests that social engagement, like cognitive and physical exercise, can aid individuals in staving off the cognitive decline associated with aging. Interestingly, socially focused interventions have been shown to have very high adherence rates (Dodge et al., 2015), possibly because individuals tend to view them as less effortful or aversive than interventions involving cognitive training or physical exercise. Given that social interactions tend to be relatively effortless and enjoyable, encouraging individuals to maintain social ties throughout the lifespan is a promising way to promote healthy brain aging.

Summary

Human cognition, behavior, success, hardship, and opportunity are all embedded within the social networks that we build and inhabit. Characteristics of our own relationships in these groups, such as their nature and intimacy, have wide-ranging effects on how we interact with one another. The relationships that shape our social behavior are not limited to our direct social ties, but also include the webs of contacts possessed by each of our interaction partners. Researchers are only beginning to understand how our brains track and encode information about the complex webs of social relationships that we inhabit and how this information is used to shape subsequent mental processing and behavior. We are also only in the early stages of understanding how the evolved structure and function of the human brain impacts how we construct and navigate our social networks and how the social networks we inhabit influence brain structure and function. An exciting new body of interdisciplinary research is beginning to shed light on questions central to our understanding of a fundamental facet of human nature: our sociality. Psychologists are realizing that a deep understanding of the mind requires understanding human connectedness. Conversely, social network analysis can benefit from understanding how brain function constrains and shapes that connectedness.

Research in cognitive neuroscience and psychology has provided considerable insight into the processes underlying individual human thought and action. Yet, this research has often stripped human perception and behavior of much of its social nature, either studying individuals in isolation or studying them in artificial social contexts. Although these paradigms can afford experimental control and robust results, their ability to enhance our understanding of real-world social behavior is in many ways limited.
In contrast, parallel research on social networks consistently demonstrates that both direct and indirect social ties powerfully shape our behavior (Christakis & Fowler, 2009) and, increasingly, that the behavior of humans and other social animals is informed by our knowledge of third-party relationships and by the structure of the social networks we inhabit (e.g., Ellwardt et al., 2012; Ferrin et al., 2006; Fuong, Maldonado-Chaparro, & Blumstein, 2015). By combining these two separate fields, we can begin to understand how larger-scale, emergent social phenomena arise from the constraints and connectivity of individual minds.


References

Bickart, K. C., Wright, C. I., Dautoff, R. J., Dickerson, B. C., & Barrett, L. F. (2011). Amygdala volume and social network size in humans. Nature Neuroscience, 14(2), 163–164. doi:10.1038/nn.2724
Brent, L. J. N. (2015). Friends of friends: Are indirect connections in social networks important to animal behaviour? Animal Behaviour, 103, 211–222. doi:10.1016/j.anbehav.2015.01.020
Byrne, R., & Whiten, A. (1988). Machiavellian intelligence: Social expertise and the evolution of intellect in monkeys, apes, and humans. Oxford, UK: Clarendon Press. doi:10.1002/(SICI)1520-6505(1996)5:53.0.CO;2-H
Case, T. I., Repacholi, B. M., & Stevenson, R. J. (2006). My baby doesn’t smell as bad as yours. Evolution and Human Behavior, 27(5), 357–365. doi:10.1016/j.evolhumbehav.2006.03.003
Cheng, J. T., Tracy, J. L., Foulsham, T., Kingstone, A., & Henrich, J. (2013). Two ways to the top: Evidence that dominance and prestige are distinct yet viable avenues to social rank and influence. Journal of Personality and Social Psychology, 104(1), 103–125.
Christakis, N. A., & Fowler, J. H. (2009). Connected: The surprising power of our social networks and how they shape our lives. New York, NY: Little, Brown and Company.
Cohen, S., Doyle, W. J., Skoner, D. P., Rabin, B. S., & Gwaltney, J. M. (1997). Social ties and susceptibility to the common cold. JAMA: The Journal of the American Medical Association, 277(24), 1940–1944. doi:10.1001/jama.1997.03540480040036
Coleman, J. S. (1988). Social capital in the creation of human capital. American Journal of Sociology, 94(S1), S95. doi:10.1086/228943
Deaner, R. O., Khera, A. V., & Platt, M. L. (2005). Monkeys pay per view: Adaptive valuation of social images by rhesus macaques. Current Biology, 15(6), 543–548. doi:10.1016/j.cub.2005.01.044
Deaner, R. O., Shepherd, S. V., & Platt, M. L. (2007). Familiarity accentuates gaze cuing in women but not men. Biology Letters, 3(1), 65–68. doi:10.1098/rsbl.2006.0564
Dodge, H. H., Zhu, J., Mattek, N., Bowman, M., Ybarra, O., Wild, K., . . . Kaye, J. A. (2015). Web-enabled conversational interactions as a means to improve cognitive functions: Results of a 6-week randomized controlled trial. Alzheimer’s and Dementia: Translational Research and Clinical Interventions, 1(1), 1–12. doi:10.1016/j.trci.2015.01.001
Dunbar, R. I. M. (1993). Coevolution of neocortical size, group size and language in humans. Behavioral and Brain Sciences, 16, 681–735. doi:10.1017/S0140525X00032325
Dunbar, R. I. M. (1998). The social brain hypothesis. Evolutionary Anthropology: Issues, News, and Reviews, 6(5), 178–190. doi:10.1002/(SICI)1520-6505(1998)6:53.0.CO;2-8
Dunbar, R. I. M. (2004). Gossip in evolutionary perspective. Review of General Psychology, 8(2), 100–110. doi:10.1037/1089-2680.8.2.100
Dunbar, R. I. M. (2008). Cognitive constraints on the structure and dynamics of social networks. Group Dynamics: Theory, Research, and Practice, 12(1), 7–16. doi:10.1037/1089-2699.12.1.7
Dunbar, R. I. M. (2016). Do online social media cut through the constraints that limit the size of offline social networks? Royal Society Open Science, 3(1), 150292. doi:10.1098/rsos.150292
Dunbar, R. I. M. (2018). The anatomy of friendship. Trends in Cognitive Sciences, 22(1), 32–51. doi:10.1016/j.tics.2017.10.004
Dunbar, R. I. M., Arnaboldi, V., Conti, M., & Passarella, A. (2015). The structure of online social networks mirrors those in the offline world. Social Networks, 43, 39–47. doi:10.1016/j.socnet.2015.04.005

Dunbar, R. I. M., & Shultz, S. (2007). Evolution in the social brain. Science, 317(5843), 1344–1347. doi:10.1126/science.1145463
Dunbar, R. I. M., & Shultz, S. (2010). Bondedness and sociality. Behaviour, 147(7), 775–803. doi:10.1163/000579510X501151
Dunbar, R. I. M., & Spoors, M. (1995). Social networks, support cliques, and kinship. Human Nature, 6(3), 273–290. doi:10.1007/BF02734142
Ellwardt, L., Labianca, G., & Wittek, R. (2012). Who are the objects of positive and negative gossip at work? Social Networks, 34(2), 193–205. doi:10.1016/j.socnet.2011.11.003
Ertel, K. A., Glymour, M. M., & Berkman, L. F. (2008). Effects of social integration on preserving memory function in a nationally representative US elderly population. American Journal of Public Health, 98(7), 1215–1220. doi:10.2105/AJPH.2007.113654
Fareri, D. S., Niznikiewicz, M. A., Lee, V. K., & Delgado, M. R. (2012). Social network modulation of reward-related signals. Journal of Neuroscience, 32(26), 9045–9052. doi:10.1523/JNEUROSCI.0610-12.2012
Ferrin, D. L., Dirks, K. T., & Shah, P. P. (2006). Direct and indirect effects of third-party relationships on interpersonal trust. Journal of Applied Psychology, 91(4), 870–883. doi:10.1037/0021-9010.91.4.870
Fowler, J. H., Dawes, C. T., & Christakis, N. A. (2009). Model of genetic variation in human social networks. Proceedings of the National Academy of Sciences of the United States of America, 106(6), 1720–1724. doi:10.1073/pnas.0806746106
Fratiglioni, L., Wang, H. X., Ericsson, K., Maytan, M., & Winblad, B. (2000). Influence of social network on occurrence of dementia: A community-based longitudinal study. Lancet, 355(9212), 1315–1319. doi:10.1016/S0140-6736(00)02113-9
Fuong, H., Maldonado-Chaparro, A., & Blumstein, D. T. (2015). Are social attributes associated with alarm calling propensity? Behavioral Ecology, 26(2), 587–592. doi:10.1093/beheco/aru235
Gladwell, M. (2000). The tipping point: How little things can make a big difference. New York, NY: Little, Brown and Company.
Gobbini, M. I., Gors, J. D., Halchenko, Y. O., Rogers, C., Guntupalli, J. S., Hughes, H., & Cipolli, C. (2013). Prioritized detection of personally familiar faces. PLoS One, 8(6), e66620. doi:10.1371/journal.pone.0066620
Gobbini, M. I., & Haxby, J. V. (2007). Neural systems for recognition of familiar faces. Neuropsychologia, 45(1), 32–41. doi:10.1016/j.neuropsychologia.2006.04.015
Gonçalves, B., Perra, N., & Vespignani, A. (2011). Modeling users’ activity on twitter networks: Validation of Dunbar’s number. PLoS One, 6(8), e22656. doi:10.1371/journal.pone.0022656
Hadjikhani, N., Joseph, R. M., Snyder, J., & Tager-Flusberg, H. (2005). Anatomical differences in the mirror neuron system and social cognition network in autism. Cerebral Cortex, 16(9), 1276–1282. doi:10.1093/cercor/bhj069
Haerter, J. O., Jamtveit, B., & Mathiesen, J. (2012). Communication dynamics in finite capacity social networks. Physical Review Letters, 109(16), 168701. doi:10.1103/PhysRevLett.109.168701
Hamilton, M. J., Milne, B. T., Walker, R. S., Burger, O., & Brown, J. H. (2007). The complex structure of hunter-gatherer social networks. Proceedings of the Royal Society B: Biological Sciences, 274(1622), 2195–2202. doi:10.1098/rspb.2007.0564
Hill, R. A., & Dunbar, R. I. M. (2003). Social network size in humans. Human Nature, 14(1), 53–72. doi:10.1007/s12110-003-1016-y
Joo, W., Kwak, S., Youm, Y., & Chey, J. (2017). Brain functional connectivity difference in the complete network of an entire village: The role of social network size and embeddedness. Scientific Reports, 7(1), 4465. doi:10.1038/s41598-017-04904-1

Kanai, R., Bahrami, B., Roylance, R., & Rees, G. (2012). Online social network size is reflected in human brain structure. Proceedings of the Royal Society B: Biological Sciences, 279(1732), 1327–1334. doi:10.1098/rspb.2011.1959
Klein, J. T., & Platt, M. L. (2013). Social information signaling by neurons in primate striatum. Current Biology, 23(8), 691–696. doi:10.1016/j.cub.2013.03.022
Knight, C., Studdert-Kennedy, M., & Hurford, J. (Eds.). (2000). The evolutionary emergence of language: Social function and the origins of linguistic form. Cambridge, UK: Cambridge University Press. doi:10.1017/CBO9780511606441
Krackhardt, D. (1990). Assessing the political landscape: Structure, cognition, and power in organizations. Administrative Science Quarterly, 35(2), 342–369. doi:10.2307/2393394
Lakoff, G., & Johnson, M. (2008). Metaphors we live by. Chicago, IL: University of Chicago Press.
Leibenluft, E., Gobbini, M. I., Harrison, T., & Haxby, J. V. (2004). Mothers’ neural activation in response to pictures of their children and other children. Biological Psychiatry, 56(4), 225–232. doi:10.1016/j.biopsych.2004.05.017
Lewis, P. A., Rezaie, R., Brown, R., Roberts, N., & Dunbar, R. I. M. (2011). Ventromedial prefrontal volume predicts understanding of others and social network size. NeuroImage, 57(4), 1624–1629. doi:10.1016/j.neuroimage.2011.05.030
Liberman, N., & Trope, Y. (2008). The psychology of transcending the here and now. Science, 322(5905), 1201–1205. doi:10.1126/science.1161958
Maguire, E. A., Woollett, K., & Spiers, H. J. (2006). London taxi drivers and bus drivers: A structural MRI and neuropsychological analysis. Hippocampus, 16(12), 1091–1101. doi:10.1002/hipo.20233
Martin, L. J., Hathaway, G., Isbester, K., Mirali, S., Acland, E. L., Niederstrasser, N., . . . Mogil, J. S. (2015). Reducing social stress elicits emotional contagion of pain in mouse and human strangers. Current Biology, 25(3), 326–332. doi:10.1016/j.cub.2014.11.028
Massen, J. J. M., Pašukonis, A., Schmidt, J., & Bugnyar, T. (2014). Ravens notice dominance reversals among conspecifics within and outside their social group. Nature Communications, 5, 3679. doi:10.1038/ncomms4679
Massen, J. J. M., Szipl, G., Spreafico, M., & Bugnyar, T. (2014). Ravens intervene in others’ bonding attempts. Current Biology, 24(22), 2733–2736. doi:10.1016/j.cub.2014.09.073
Mitchell, J. P. (2008). Activity in right temporo-parietal junction is not selective for theory-of-mind. Cerebral Cortex, 18(2), 262–271. doi:10.1093/cercor/bhm051
Mullins, D. A., Whitehouse, H., & Atkinson, Q. D. (2013). The role of writing and recordkeeping in the cultural evolution of human cooperation. Journal of Economic Behavior & Organization, 90, S141–151. doi:10.1016/j.jebo.2012.12.017
Nelson, C. A. (1999). Neural plasticity and human development. Current Directions in Psychological Science, 8(2), 42–45. doi:10.1111/1467-8721.00010
Nelson, E. E., & Panksepp, J. (1998). Brain substrates of infant–mother attachment: Contributions of opioids, oxytocin, and norepinephrine. Neuroscience & Biobehavioral Reviews, 22(3), 437–452. doi:10.1016/S0149-7634(97)00052-3
Norbeck, J. S., Lindsey, A. M., & Carrieri, V. L. (1981). The development of an instrument to measure social support. Nursing Research, 30(5), 264–269. doi:10.1097/00006199-198109000-00003
O’Donnell, M. B., Bayer, J. B., Cascio, C. N., & Falk, E. B. (2017). Neural bases of recommendations differ according to social network structure. Social Cognitive and Affective Neuroscience, 12(1), 61–69. doi:10.1093/scan/nsw158
Parkinson, C., Kleinbaum, A. M., & Wheatley, T. (2017). Spontaneous neural encoding of social network position. Nature Human Behaviour, 1, 72. doi:10.1038/s41562-017-0072

Parkinson, C., Kleinbaum, A. M., & Wheatley, T. (2018). Similar neural responses predict friendship. Nature Communications, 9, 332. doi:10.1038/s41467-017-02722-7
Parkinson, C., Liu, S., & Wheatley, T. (2014). A common cortical metric for spatial, temporal, and social distance. Journal of Neuroscience, 34(5), 1979–1987. doi:10.1523/JNEUROSCI.2159-13.2014
Parkinson, C., & Wheatley, T. (2013). Old cortex, new contexts: Re-purposing spatial perception for social cognition. Frontiers in Human Neuroscience, 7(October), 645. doi:10.3389/fnhum.2013.00645
Parkinson, C., & Wheatley, T. (2015). The repurposed social brain. Trends in Cognitive Sciences, 19(3), 133–141. doi:10.1016/j.tics.2015.01.003
Pascual-Leone, A., Amedi, A., Fregni, F., & Merabet, L. B. (2005). The plastic human brain cortex. Annual Review of Neuroscience, 28, 377–401. doi:10.1146/annurev.neuro.27.070203.144216
Pinker, S. (2011). Decline of violence: Taming the devil within us. Nature, 478(7369), 309–311. doi:10.1038/478309a
Pollet, T. V., Roberts, S. G. B., & Dunbar, R. I. M. (2011). Use of social network sites and instant messaging does not lead to increased offline social network size, or to emotionally closer relationships with offline network members. Cyberpsychology, Behavior and Social Networking, 14(4), 253–258. doi:10.1089/cyber.2010.0161
Powell, J., Lewis, P. A., Roberts, N., García-Fiñana, M., & Dunbar, R. I. M. (2012). Orbital prefrontal cortex volume predicts social network size: An imaging study of individual differences in humans. Proceedings of the Royal Society B: Biological Sciences, 279(1736), 2157–2162. doi:10.1098/rspb.2011.2574
Roberts, S. G. B., Dunbar, R. I. M., Pollet, T. V., & Kuppens, T. (2009). Exploring variation in active network size: Constraints and ego characteristics. Social Networks, 31(2), 138–146. doi:10.1016/j.socnet.2008.12.002
Sallet, J., Mars, R. B., Noonan, M. P., Anderson, J., O’Reilly, J. X., Jbabdi, S., . . . Rushworth, M. F. S. (2011). Social network size affects neural circuits in macaques. Science, 334(6056), 697–700. doi:10.1126/science.1210027
Seeman, T. E., Miller-Martinez, D. M., Stein Merkin, S., Lachman, M. E., Tun, P. A., & Karlamangla, A. S. (2011). Histories of social engagement and adult cognition: Midlife in the U.S. study. Journals of Gerontology Series B: Psychological Sciences and Social Sciences, 66B(Suppl. 1), i141–152. doi:10.1093/geronb/gbq091
Shepherd, S. V., Deaner, R. O., & Platt, M. L. (2006). Social status gates social attention in monkeys. Current Biology, 16(4), R119–120. doi:10.1016/j.cub.2006.02.013
Soon, C. S., Brass, M., Heinze, H.-J., & Haynes, J.-D. (2008). Unconscious determinants of free decisions in the human brain. Nature Neuroscience, 11(5), 543–545. doi:10.1038/nn.2112
Stiller, J., & Dunbar, R. I. M. (2007). Perspective-taking and memory capacity predict social network size. Social Networks, 29(1), 93–104. doi:10.1016/j.socnet.2006.04.001
Sun, L., Axhausen, K. W., Lee, D.-H., & Huang, X. (2013). Understanding metropolitan patterns of daily encounters. Proceedings of the National Academy of Sciences of the United States of America, 110(34), 13774–13779. doi:10.1073/pnas.1306440110
Sutcliffe, A., Dunbar, R., Binder, J., & Arrow, H. (2012). Relationships and the social brain: Integrating psychological and evolutionary perspectives. British Journal of Psychology (London, England: 1953), 103(2), 149–168. doi:10.1111/j.2044-8295.2011.02061.x
Tennie, C., Frith, U., & Frith, C. D. (2010). Reputation management in the age of the worldwide web. Trends in Cognitive Sciences, 14(11), 482–488. doi:10.1016/j.tics.2010.07.003

THE Neuroscience of Social Networks   515 Vallacher, R. R. R., & Wegner, D. M. D. M. (1985). A theory of action identification. Hillsdale, NJ: Lawrence Erlbaum Associates. Visconti di Oleggio Castello, M., Guntupalli, J. S., Yang, H., & Gobbini, M. I. (2014). Facilitated detection of social cues conveyed by familiar faces. Frontiers in Human Neuroscience, 8, 678. doi:10.3389/fnhum.2014.00678 Von Der Heide, R., Vyas, G., & Olson, I. R. (2014). The social network-network: Size is pre­ dicted by brain structure and function in the amygdala and paralimbic regions. Social Cognitive and Affective Neuroscience, 9(12), 1962–1972. doi:10.1093/scan/nsu009 Wald, C. (2016). Social networks: Better together. Nature, 531(7592), S14–15. doi:10.1038/531S14a Weaverdyck, M.  E., & Parkinson, C. (2018). The neural representation of social networks. Current Opinion in Psychology, 24, 58-66. https://doi.org/10.1016/j.copsyc.2018.05.009 Yamazaki, Y., Hashimoto, T., & Iriki, A. (2009). The posterior parietal cortex and non-spatial cognition. F1000 Biology Reports, 1, 74. doi:10.3410/B1-74 Zerubavel, N., Bearman, P. S., Weber, J., & Ochsner, K. N. (2015). Neural mechanisms tracking popularity in real-world social networks. Proceedings of the National Academy of Sciences of the United States of America, 112(49), 15072–15077. doi:10.1073/pnas.1511477112 Zhou, W.-X., Sornette, D., Hill, R. A., & Dunbar, R. I. M. (2005). Discrete hierarchical organi­ zation of social group sizes. Proceedings of the Royal Society B: Biological Sciences, 272(1561), 439–444. doi:10.1098/rspb.2004.2970

CHAPTER 28

Computational Social Science, Big Data, and Networks

Bruno Abrahao and Paolo Parigi1

The web is transforming the way we interact with each other and learn about ourselves. Profound technological innovations are driving major changes in our cultural and economic landscapes. As people increasingly rely on online systems to find information, to make and maintain social connections, to collaborate, to make purchases, or to find ways to commute, our traces of digital activity produce data on a scale unimaginable even a few years ago. As a result, our social space has become more quantifiable than ever before. Together with the dramatic increase in computational capabilities and the development of powerful algorithms, this gives us a first opportunity to deeply understand our own behavior through the vast amounts of data that our technology generates. The coupling of a quantified social space with new technologies and algorithms has prompted the emergence of a new science, computational social science (CSS): the investigation of social phenomena through a combination of data, powerful computational capabilities, and algorithmic approaches to human behavior, in order to address decades-old, as well as new, sociological questions (Giles, 2012, pp. 22, 25). CSS is fueled by new sources of data, collected at massive scales, mainly from the internet, popularly known as “big data.”2 These data are very different in nature from what we have been able to collect in the past. Traditionally, social scientists relied on survey results, small-scale laboratory experiments, and population-level aggregated measures. In recent years, not only has the volume of digital information become astounding, but we are also now able to collect and analyze detailed traces of human activity at the individual level. As we communicate by email, phone calls, and messaging applications, connect to others on social media platforms, and participate in buyer-seller markets, a great deal of the data we now collect from the web can be naturally modeled as network structures.
Big data is fundamentally about interactions, and networks are intuitive abstractions of our online life mediated by technology. The new possibilities that big data brings about come with new forms of complexity. Accordingly, data analysis faces new challenges, as we need to address more complex structures giving rise to correlations, as well as unstructured, high-dimensional, and heterogeneous data. Massive amounts of digital data frequently exhibit amplified noise and sampling imbalances, such as severe underrepresentation of parts of the population. In addition, as the information on the web is mostly textual, data now encode new levels of ambiguity, because natural language data reflect crucial information about human activity. Last, due to the accelerated rate and volume of data production, the scalability of our computational methods for data analysis becomes a major concern. In the case of networks, modeling the entities and relationships encoded in big data with this abstraction allows us to inherit powerful graph-theoretical models and algorithms from computer science, which allow for complex analysis of structural properties (Abrahao, Soundarajan, Hopcroft, & Kleinberg, 2013) and of dynamics in timestamped data (Abrahao, Chierichetti, Kleinberg, & Panconesi, 2013) efficiently at massive scales (Dean & Ghemawat, 2008; Owens et al., 2008). On the other hand, statistically, networks represent new types of dependencies in data, as connected entities are correlated in complex ways that our current statistical tools are not fully able to model. These challenges and opportunities call for the development of novel methodologies and principled approaches to advance our knowledge of social phenomena. In this context, CSS figures as an emerging field whose goals include (1) the use of computational thinking, algorithms, and optimization to reason about social processes; (2) the use of data at fine-grained levels and new methodological and computational capabilities to complement tools that social scientists traditionally employed; (3) the use of data at the individual level; and (4) the development of online experiments with large populations, which incorporate the role of context into experimentation.
As a community we have made progress on all four goals in recent years (Giles, 2012; Lazer et al., 2009; Mann, 2016), but CSS is currently in its early stages—its foundations are still to be established. As data are becoming increasingly social, coming from diverse domains and reflecting complex systems, to lay out a solid basis for the field and to enable further discoveries, we need to approach CSS using insights that transcend disciplinary boundaries from fields as diverse as sociology, computer science, statistics, economics, psychology, and several others. In what follows, we present a discussion that illustrates the goals and scope of this new field and how it relates to social networks and machine learning. This chapter is organized in three broad sections. We first situate the theoretical foundations of CSS in the early 1950s and beyond. We then cover new methodologies, focusing in particular on applications and limitations of machine learning and social network analysis. From there we discuss the role of randomized online experimentation using internet users as subjects. We conclude with a list of challenges confronting CSS practitioners.

Computational Thinking about Social Processes

Although CSS is emerging in response to advancements in computing and availability of data, its foundation can be traced back to the 1950s. Herbert Simon and his collaborators were the first social scientists who attempted to model human behavior through an algorithmic approach (Simon, 2010; Simon & Newell, 1971). Simon introduced several seminal ideas for understanding how humans make decisions. Key to Simon’s approach to human behavior is his notion of bounded rationality, that is, the idea that human decision making is constrained by the complexity of the environment and the cognitive capacity of the actor. Simon wrote: “Human rational behavior . . . is shaped by a scissors whose two blades are the structure of the task environment and the computational capacity of the actor” (Simon, 1990, p. 7). In situations where the task at hand exceeds the cognitive abilities of the actor, individuals use various heuristics to arrive at a decision about the appropriate behavior. These heuristics were the first attempt to describe human behavior as a computational process. The work of Milgram in the late 1960s (Milgram, 1967; Travers, 1969) is a perfect illustration of how a computational approach to behavior directly links to social networks. Milgram’s experiment aimed at measuring how many steps it would take for a letter given to a random person in the United States to reach a target person (in this particular experiment, a banker in Boston, Massachusetts) when people can only forward the message to contacts whom they know on a first-name basis. The results of this research are popularly known as the “six degrees of separation,” because the experiment revealed that the median number of steps it took for people to route messages to the target was six. Even though this seems like a straightforward measurement of how many steps of indirection separate individuals, the most fascinating aspect of this work was the realization (a few decades later) that there is a natural computational process at work underlying this type of social interaction.
That is, people are able to deliver messages using a decentralized routing algorithm that they apply implicitly, and largely unconsciously, in a social network that they cannot fully observe. Several researchers have replicated the experiment and measured the diameter of large human populations using data from large-scale social network platforms (“It’s a small world after all,” 1979; Backstrom et al., 2012; Dodds, Muhamad, & Watts, 2003). To understand the more fundamental question about short paths and the ability of people to find them in a decentralized way, Kleinberg in 2000 formally described a simple computational model that led to an elegant structural observation of the process (Kleinberg, 2000). Kleinberg starts with a model consisting of a lattice in d dimensions. If messages were routed using only the links of a lattice, they would take a very long time to reach their destination. Accordingly, some of the links that people took to reach a far-away target must span large distances (Watts & Strogatz, 1998). For example, consider someone on the West Coast of the United States trying to find a path to a target located in Boston. Instead of taking a path through all states in between the coasts to route the letter, it is easier to simply contact someone from a city on the East Coast and send them the letter. After the letter arrives on the East Coast, the links will tend to become shorter, and when it arrives in Boston, they will become even shorter until the letter reaches the target. Kleinberg modeled the formation of these long-range links as occurring with a probability that decays as an inverse power of distance, governed by a clustering exponent α. Small α values induce long-range links, whereas large values make short-range, community-like links more abundant. He then observes that even though real social networks have small diameters, it is not clear that a message would find its way in a few steps without the knowledge of a path built optimally using a global view of the network.
Thinking further about the structure of the network, Kleinberg asked what kind of network would allow for the effective discovery of short paths by agents who only have a partial local view of the link structure. The surprising finding of this research is the discovery of a fundamental connection between α and d. There is a unique α for which agents with only a partial and local view of the network are able to efficiently find short paths to a target: this is possible exactly when α = d, for any d. The clean mathematical relationship established by Kleinberg between the two structural properties, and the consequences for the global dynamic of social processes taking place in networks, is an illustration of the power of CSS to leverage computational and mathematical methods to model social processes with the goal of illuminating fundamental sociological questions.
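The decentralized routing process described above can be sketched in a few lines of code. The following is a minimal illustration under stated assumptions, not Kleinberg's own implementation: it assumes a two-dimensional n × n grid (d = 2) with Manhattan distance, gives each visited node a single long-range contact drawn with probability proportional to distance raised to the power −α, and forwards the message greedily to whichever known contact is closest to the target. The function names, the grid size, and the number of trials are hypothetical choices made for the example.

```python
import random


def manhattan(a, b):
    """Lattice (Manhattan) distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def long_range_contact(u, n, alpha, rng):
    """Sample one long-range contact v != u with P(v) proportional to dist(u, v) ** -alpha.

    Assumes n >= 2 so that at least one other node exists.
    """
    others = [(x, y) for x in range(n) for y in range(n) if (x, y) != u]
    weights = [manhattan(u, v) ** -alpha for v in others]
    return rng.choices(others, weights=weights, k=1)[0]


def greedy_route(source, target, n, alpha, rng):
    """Count the hops greedy routing needs, using only local information.

    At each step the current holder knows its grid neighbors and one
    long-range contact, and forwards to whichever is closest to the target.
    Greedy routing never revisits a node, so sampling each contact lazily
    on first visit is equivalent to fixing all contacts in advance.
    """
    current, hops = source, 0
    while current != target:
        x, y = current
        candidates = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < n
        ]
        candidates.append(long_range_contact(current, n, alpha, rng))
        current = min(candidates, key=lambda v: manhattan(v, target))
        hops += 1
    return hops


def average_hops(n, alpha, trials, seed=0):
    """Average hop count over random source/target pairs (illustrative only)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        source = (rng.randrange(n), rng.randrange(n))
        target = (rng.randrange(n), rng.randrange(n))
        total += greedy_route(source, target, n, alpha, rng)
    return total / trials


# Compare a few clustering exponents on a small grid.
for alpha in (0.0, 2.0, 4.0):
    print(alpha, average_hops(n=15, alpha=alpha, trials=50))
```

Because some grid neighbor is always one step closer to the target, the loop is guaranteed to terminate. On grids this small the differences between exponents are modest; the advantage of α = d is an asymptotic statement about how delivery time grows with network size, so the printed averages should be read as illustrative only.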

Challenges in Modeling Social Data

Given the elegance and accuracy of Kleinberg’s model, it is tempting to think that we can describe and understand social behavior as processes governed by simple rules or laws, in the same spirit as we can understand the movements of planets or the workings of gravity. However, social behavior is the confluence of many different forces that are impossible to describe mathematically or by utilitarian views, such as simple objective functions in optimization problems.3 More fundamentally, human behavior is simply too complex to capture, to model, and to predict. Building on the work of Simon, Tversky and Kahneman’s research on how people make decisions helps explain the complexity of human behavior (Tversky & Kahneman, 1974). They showed that individuals use mental shortcuts to make quick decisions when facing information overload. These shortcuts are not irrational but often generate systematic errors that lead to inaccurate outcomes. Tversky and Kahneman went on to describe three common shortcuts, or heuristics (representativeness, availability, and anchoring), that they saw as prevalent in social behavior. Representativeness occurs when we estimate the likelihood of an event on the basis of how well it fits an existing prototype; availability refers instead to estimating the likelihood of an event by the ease with which it comes to mind; finally, anchoring refers to the tendency to rely heavily on the first piece of information available. Since the publication of this work, researchers have identified other heuristics responsible for systematically biasing decisions (Kahneman, 2011). The complexity generated by the interaction of individuals’ mental processes for decision making means that it is challenging to describe collective outcomes with mathematical constructs. Consider the example of finding communities in networks.
Community structure captures the tendency of entities in a network to group together in meaningful subsets whose members have a distinctive relationship to one another. A great deal of previous work has been devoted to identifying communities computationally, purely by analyzing the link structure of a network (Girvan & Newman, 2002; Palla et al., 2005; Ahn, Bagrow, & Lehmann, 2010; Newman & Girvan, 2004; Clauset & Newman, 2008; Wasserman & Anderson, 1987). Revealing the community structure of a network allows for the analysis of networks at different levels of detail. For example, in marketing applications, entities that belong to the same community may share common interests, which makes them the best targets for specific advertising campaigns, or in a political setting, communities may reflect the organizational landscape of party members and elected officials (Parigi & Bearman, 2008).

Since the beginnings of network analysis in anthropology and sociology, the question of how best to identify communities among groups of individuals interacting with each other has been a vexing one. Communities are directly related to the bread and butter of social sciences: social norms, group cohesion, and identity (Moody & White, 2003). Sociologists have been researching methods for community identification and detection for several years, but they have been less concerned about the computational aspects of methods to find communities (see, e.g., White, Boorman, & Breiger, 1976). On the other hand, computer scientists have produced a large variety of efficient algorithms to detect communities. Nonetheless, the algorithmic detection of communities based on mathematical properties of the network partially ignores the “messy” social processes that gave rise to a community in the first place. Abrahao and colleagues (Abrahao, Soundarajan, Hopcroft, & Kleinberg, 2013; Abrahao et al., 2012) rigorously analyzed the output of multiple community detection algorithms through a structural separability framework and concluded that they exhibit high structural variability across the collection. Different community detection algorithms output communities with significantly different structural properties, and it is unclear to a nonexpert user which algorithm would produce the intended structure when applied to a particular context of interest. Figure 28.1 shows an illustration of the structure of communities extracted by different algorithms, as well as that of communities manually identified and annotated by experts. More importantly, their research demonstrated that community detection algorithms are far from capturing “real” communities. They compared the structure of communities

Figure 28.1  Six different communities of 100 nodes each, identified on the LiveJournal network through different methods, namely Metis, Annotated Community, Random Walk, Infomap, Newman-Modularity, and Louvain. The communities comprise different node sets of the network, which were displayed by applying the same network layout algorithm. The visual diversity of the collection provides a rough and ready illustration of the structural variability that can be produced by the different methods. To aid the identification of structural nuances, the lightness of the red node colors reflects node degree, from fully illuminated (low degree) to dark (high degree).

detected by algorithms with that of communities extracted and annotated by domain experts. They showed that it is possible to train an automated classification algorithm to distinguish, with high accuracy, the structure of the annotated communities from the outputs that community detection algorithms produce. The problem lies in the way the community detection literature has evolved: instead of trying to understand the structure of communities empirically, many attempts to define communities are grounded on the notion of mathematical optimization. That is, starting with an a priori expectation about what a community should look like, researchers specify an objective function for a search problem whose solution provides the desired communities. This process has given rise to a large collection of community detection algorithms, each aiming at optimizing a particular objective function through a particular heuristic. Modularity optimization (Newman, 2006) is perhaps the best-known example of this class of algorithms. Modularity aims to evaluate the density of connections within a group of nodes, compared to a null model, in this case, a random network (which by definition possesses no community structure). Because it matches the notion of graph partitioning, the construct that defines communities as subsets of nodes that are densely connected internally but sparsely connected to nodes in other (disjoint or overlapping) sets is perhaps the most popular goal shared by community detection algorithms. The difficulty in capturing the notion of communities in networks exemplifies a typical challenge with social data, where the design of algorithms faces issues beyond scalability and computational complexity. Instead, the problem itself lacks clear definitions and goals.
In addition to the aforementioned multitude of driving forces underlying human behavior, due to the diverse nature of networks, communities are necessarily context dependent, involving interpretations and expectations. Therefore, it has proved challenging to come up with a universal definition and, consequently, algorithms that capture the notion of community structure. Attempts to model a multitude of driving forces using a clean mathematical construct or tractable objective functions in optimization problems often fail to even approximate the phenomena we intend to capture in data. Machine learning offers a way to overcome this limitation by finding patterns in data and generalizing them via general-purpose models.
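To make the modularity objective discussed in this section concrete, the following minimal sketch computes the modularity score of a given partition. It assumes the standard Newman-Girvan form for an undirected, unweighted graph, in which each community contributes the fraction of edges that fall inside it minus the fraction expected in a random network with the same node degrees. The function name, the edge-list input format, and the toy graph are hypothetical choices made for the example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # number of edges inside each community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    degree_sum = defaultdict(int)  # total degree of each community
    for node, k in degree.items():
        degree_sum[community[node]] += k

    # Observed internal edge fraction minus the fraction expected at random.
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

The sketch only scores a partition that is handed to it. A detection algorithm of the kind discussed above would search over partitions for one that makes this score large, which is where the heuristics, and the structural variability among their outputs, come in.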

Machine Learning and Social Sciences

Machine learning is a subfield of artificial intelligence that is mostly concerned with learning models from data that generalize, so that they can then be used to sift through more data and make predictions.4 The algorithms propelling modern machine learning do not have hardcoded rules determining the behavior of the machine. Rather, the algorithms allow for learning from the responses Y observed in the environment X and for the machine to adapt by fitting the parameters of a prescribed model F from a family of models, known as a hypothesis space, to best capture the relationship Y = F(X). Modern machine learning has become ubiquitous and its applications range from finance (Nevmyvaka, Feng, & Kearns, 2006) to natural language processing (Young, Hazarika, Poria, & Cambria, 2018) to robotics and autonomous vehicles (Mnih, Kavukcuoglu, Silver, et al., 2015).
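As a minimal illustration of fitting a model F from a hypothesis space, consider the following sketch. It assumes a deliberately tiny hypothesis space, the family of threshold rules F(x) = 1 if x ≥ t and 0 otherwise, and selects the threshold t that makes the fewest errors on the observed (X, Y) pairs. The function names and the toy data are hypothetical; real applications use far richer model families.

```python
def fit_threshold(xs, ys):
    """Pick the threshold t whose rule (x >= t) makes the fewest training errors.

    The hypothesis space is the set of thresholds taken from the observed xs.
    """
    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))

    return min(sorted(set(xs)), key=errors)


def predict(t, x):
    """Apply the fitted rule F(x) to a new observation."""
    return 1 if x >= t else 0


# Observed environment X and responses Y (toy data).
X = [1, 2, 3, 6, 7, 8]
Y = [0, 0, 0, 1, 1, 1]

t = fit_threshold(X, Y)
print(t, predict(t, 10), predict(t, 4))  # 6 1 0
```

No rule about where the boundary lies is hardcoded; the threshold is determined entirely by the data, which is the sense of "learning" used in the passage above.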

Machine learning can be incorporated substantively in the work of social scientists (Molina & Garip, 2019). Traditionally, social scientists have focused on developing accurate causal explanations while paying little attention to the predictive capacity of the models they built (Watts, 2011). Causal explanations were linked to the mathematical properties of a model, in a process that aims to discover the structural causes of the phenomenon at hand. The newly discovered structural property was then used for wide generalizations. Thus, it was not that practitioners were unaware of the predictive power of their models; rather, predictions were deemed unnecessary once the cause of the phenomenon or the generative model had been unearthed. As we illustrate next, machine learning algorithms that use big data take a different approach. The availability of massive amounts of data propels the success of machine learning techniques. Consequently, the relatively small datasets traditionally available to social scientists have historically hindered the wide adoption of machine learning in the community, due to generalization problems, as the following example illustrates. Previous research suggests that Facebook users give more attention to posts about physical exercise when those posts come from others who are similar to them along demographic characteristics: gender, age, ethnicity. As with other social science processes, concerns about physical activity appeared to be shaped by the tendency of people to like others who are similar to themselves, a tendency that has been labelled homophily. Yet, despite the common structural mechanism producing the observed finding, the data the researchers used to make this claim came from a small sample of self-selected participants: 232 participants and 30 days’ worth of Facebook posts (Burke & Rains, 2018).
This is a perfect example, among many possible, of social science research that uses a small sample to find a structural property that is taken to generalize beyond the case. Hofman, Sharma, and Watts (2017) suggest that studies based on a small number of observations, like the one above, are good for generating hypotheses but have low predictive power and are thus unlikely to replicate. That is, the findings from this case study are unlikely to generalize to other samples, despite the fact that homophily is a common mechanism for the diffusion of social processes, because the estimated model overfits the data. The low predictive power creates a situation where scientific discovery does not increase the general understanding of the social world. A simple solution would be to add more data. Yet this is not enough. Hofman, Sharma, and Watts suggest introducing a clear distinction between exploratory research and predictive research. In exploratory research, the results of all the models the researchers fitted before arriving at the final one are reported, to “avoid creating a false impression of having confirmed a hypothesis rather than simply having generated one” (p. 487). The best model from the exploratory phase is then selected and used in a confirmatory study with new data not used for generating the hypotheses. The confirmatory study specifies the research design the researcher plans to use, and the researcher tests and uses only one model to make predictions in the confirmatory stage.
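The two-stage workflow just described can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of any cited study: the data are synthetic, with an outcome that depends on the first of two features only; the candidate models are single-feature least-squares lines; and all function and variable names are hypothetical. In the exploratory stage every candidate is fitted and reported; in the confirmatory stage only the selected model, with its parameters frozen, is evaluated on data it has never seen.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(n, rng):
    """Synthetic sample: the outcome depends on feature 0, not on feature 1."""
    rows = [(rng.random(), rng.random()) for _ in range(n)]
    ys = [2.0 * r[0] + rng.gauss(0, 0.1) for r in rows]
    return rows, ys


rng = random.Random(0)
explore_rows, explore_y = make_data(200, rng)
confirm_rows, confirm_y = make_data(100, rng)  # set aside, untouched until the end

# Exploratory stage: fit every candidate and keep the full report.
report = {}
for j in (0, 1):
    xs = [r[j] for r in explore_rows]
    model = fit_line(xs, explore_y)
    report[j] = (model, mse(model, xs, explore_y))
best = min(report, key=lambda j: report[j][1])

# Confirmatory stage: one model, frozen parameters, new data only.
confirm_mse = mse(report[best][0], [r[best] for r in confirm_rows], confirm_y)

print("exploratory report:", {j: round(err, 4) for j, (_, err) in report.items()})
print("selected feature:", best, "confirmatory MSE:", round(confirm_mse, 4))
```

The key design choice is that the confirmatory sample plays no role in choosing or fitting the model, so its error is an honest estimate of predictive power rather than a by-product of the search over candidates.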
Going back to the previous example, if the exploratory phase suggested gender and age to be the most relevant features influencing weight concerns generated by Facebook posts, a researcher could either manipulate the features experimentally (e.g., by creating synthetic networks with different levels of age and gender that participants can voluntarily join [Centola, 2010]) or perform a multivariate analysis and evaluate the impact of the two features on an out-of-sample group. The confirmatory analysis will use methodologies borrowed from machine learning to arrive at accurate predictions. Predictions will be based on new data not used in the exploratory phase. Barbosa et al. (2020) recently employed a similar design for developing a predictive model about trust between hosts and guests in Airbnb. In the exploratory part, the authors calibrated a model using data on roughly four thousand users who played an online investment game aiming to measure interpersonal trust, similar to that described in Abrahao et al. (2017). They used these data to formulate hypotheses about the user behavior that correlated with investment decisions in the game. Finally, they tested their hypotheses by confirming the predictions the model produces on a different dataset not used during the modeling phase. An example of the power of leveraging massive amounts of longitudinal data and machine learning to make causal arguments is the work of Saha et al. (2019), who analyzed millions of Twitter posts to identify the long-term effects of psychiatric medications on their users. Using natural language processing (NLP), the researchers trained a machine learning model to capture the language of people who self-report medication intake in discussion forums. This model was then used to identify Twitter users who were taking one of the 49 psychiatric medications that the researchers considered, effectively finding 30,000 of them in a record of all Twitter data over a period of two years. After extracting the entire Twitter history of these users over the two-year period, which accounted for over 100 million Tweets in total, the researchers took the date of the specific Tweet that allowed the model to identify each user as the treatment date. Machine learning was then employed once again to find patterns in these users’ pretreatment language. The researchers then formed a control group of 300,000 users who posted over 700 million Tweets in total using the same pretreatment language as the users in the preceding group, but did not mention any medication intake.
The control group members match those in the treatment group in their language features, as well as in their features and behavior as Twitter users, such as the frequency and time of their posts, geographic locations, and demographic characteristics. An NLP analysis revealed that, compared to the control group, posts by those who self-reported taking an antidepressant medication showed significant, consistent (based on the drug) differences in emotional and cognitive outcomes reflected in their language before and after they started taking it. Their work showed that healthcare providers can benefit from this kind of social media data analysis, whose reach and scale were unthinkable even a few years ago. Accordingly, this approach has the potential to extend the information provided by clinical trials and to improve treatment choices. An area of social sciences where machine learning has already been applied successfully is the discovery of network structure. For example, the work of Clauset and Newman (2008) aims to explore hierarchical community structure (Girvan & Newman, 2002) by fitting a hierarchical model to observed network data and using maximum likelihood (Casella & Berger, 2001), a method at the heart of machine learning, to guide the search for the most probable hierarchical model that generated the data. The authors then further leverage the extracted hierarchy to predict missing links in partially known networks: actors that share the same group in the hierarchical structure are likely to be linked. Their methodology produces predictions with high precision (i.e., the fraction of relevant instances among the retrieved instances) and recall (i.e., the fraction of relevant instances that have been retrieved over the total number of relevant instances).
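The two evaluation measures defined above are straightforward to compute for a link prediction task. In the following minimal sketch, the "retrieved" instances are the links a method predicts and the "relevant" instances are the links that are truly missing from the partially observed network. The helper names and the toy node labels are hypothetical, and undirected links are represented as frozensets so that (a, b) and (b, a) count as the same link.

```python
def link(u, v):
    """Undirected link: the order of the endpoints does not matter."""
    return frozenset((u, v))


def precision_recall(predicted, actual):
    """Precision and recall of a set of predicted links.

    precision = correctly predicted links / all predicted links
    recall    = correctly predicted links / all truly missing links
    """
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Toy example: three predicted links, two truly missing links, one overlap.
predicted_links = [link("a", "b"), link("a", "c"), link("b", "d")]
missing_links = [link("b", "a"), link("c", "d")]

precision, recall = precision_recall(predicted_links, missing_links)
print(precision, recall)  # one hit out of three predictions, one out of two missing
```

Reporting both numbers matters because either one can be inflated on its own: predicting very few links tends to raise precision at the expense of recall, and predicting almost every possible link does the opposite.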
Although machine learning represents a powerful framework for making predictions in networks, as illustrated by the preceding example, the flip side of analyzing networks using machine learning is that network structure in itself imposes intrinsic computational hardness on the learning methods. One such example is the problem of network inference.

Oftentimes data that reflect processes taking place over networks do not explicitly reveal the underlying network. For example, in the blogosphere, “memes” spread through an underlying social network of bloggers, personal contacts influence people’s purchases, and innovations diffuse by word of mouth. Even though we observe the nodes’ identities and the time at which they get “informed,” rarely do we observe the network of diffusion and influence in which the information cascades take place. As reasoning about relationships is central to the social sciences, the goal of the network inference problem (Abrahao, Chierichetti, Kleinberg, & Panconesi, 2013) is to turn one or more chronological sequences of events, in which nodes got informed (or infected, in the language of epidemiology), into reconstructed networks. This problem is computationally intractable, and even approximation algorithms based on machine learning methods have shown disappointing accuracy and have not scaled to networks with more than a few hundred nodes. Abrahao, Chierichetti, Kleinberg, and Panconesi (2013) analyzed the network inference problem using an information-theoretic approach. They considered the amount of resources (i.e., data and their quality) required in this machine learning task to extract knowledge from data. In the case of network inference, they call this the trace complexity of the problem.5 The authors provide optimal inference algorithms with rigorous upper bounds on their trace complexity, along with information-theoretic lower bounds. They established that the task requires an exorbitant amount of resources. That is, to be able to reconstruct a hidden network, we need to observe a number of information cascades spreading through the whole network that is quadratic in the number of actors, an unrealistic number in practice.
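A small sketch helps convey why so many cascades are needed. In a trace, the second node to be informed can only have been informed by the seed, so that pair is certain to be connected; every later node could have been informed by any of the nodes before it. The following minimal illustration, whose function name and toy traces are hypothetical and which is not the authors' implementation, keeps only this certain piece of each trace.

```python
def first_edges(traces):
    """Collect the undirected edge between the first two nodes of every trace.

    traces: iterable of sequences of node identifiers, each listing nodes
    in the order in which they were informed (the first one is the seed).
    """
    edges = set()
    for trace in traces:
        if len(trace) >= 2:
            edges.add(frozenset(trace[:2]))
    return edges


# Three cascades over the nodes a, b, and c, each started at a different seed.
traces = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["c", "b", "a"],
]

inferred = first_edges(traces)
print(sorted(sorted(edge) for edge in inferred))  # [['a', 'b'], ['b', 'c']]
```

Each trace contributes at most one edge no matter how long it is, and different traces may reveal the same edge more than once, as the first two traces do here. Recovering every edge of a general network in this way therefore takes a very large number of cascades.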
Because so many cascades are required, a Bayesian analysis would fail to produce a posterior distribution that could identify, between two given options, the network that best explains the cascade data. Moreover, the length of the traces plays little role in exact inference. In fact, the only information that can be used is the fact that the second actor infected (or informed) could only have been influenced by the seed of the cascade; the rest of the trace can be ignored. This shows that this kind of network data provides very little structure for machine learning to reveal. In other words, the task turns big data, consisting of long records of the chronological sequence of infections, into a small dataset, where only a piece of each trace is actually useful. Interestingly, while the previous result applies to general networks, we can use exponentially fewer traces to reconstruct social networks if we can assume properties of the network structure that real-world social networks often reflect (e.g., bounded degree or tree-like networks). We can also recover network properties, such as the degree distribution, using a number of traces that is linear in the number of actors, without reconstructing the network itself. For these special cases, the tail of the traces, which does not contribute any useful signal to solving the problem in the general case, now becomes useful. This shows that, by changing the task, we can make most of the data useful again, thereby restoring the big data property of the information available. As a final example, we can apply machine learning to guide social platforms to achieve optimal outcomes in markets. Consider the case of a network of buyers and sellers where we observe only a sparse combination of past interactions between buyers and sellers.
We expect that, over time, the exchange of reputation information, reflecting the intrinsic quality of actors, can lead to better matching, either through an organic formation of links driven independently by the actors or through a platform that recommends interactions. From the perspective of a platform, the challenge of building such an algorithm is that, to achieve a global optimum, we need a way to predict all possible relationships that may form. In real systems, however, we often observe only a sparse combination of past interactions between buyers and sellers. Moreover, we have access to only partial information about actors’ reputations. Therefore, an adequate training dataset would require a rich subset of observations from the combinatorial explosion of all possible matchings among all actors, an unrealistic expectation.

Lin et al. (2014) present a machine learning algorithm that integrates ideas from combinatorial multiarmed bandits (Cesa-Bianchi & Lugosi, 2006) and finite partial monitoring games (Bartók, Zolghadr, & Szepesvári, 2012) to handle all of the aforementioned issues. Using a regression argument, the algorithm requires feedback on only a small set of matchings to rapidly learn optimal ones, thereby finding a system-wide configuration of transactions that maximizes global utility. The authors isolate offline optimization tasks from online learning and avoid explicit enumeration of all matchings, which allows them to learn and compute optimal global matchings. This implies that we can build optimal markets efficiently while relying on more realistic assumptions, that is, only partial reputation data for at least a small set of matchings.

Last, a great wealth of innovation in computer science in recent years has propelled the research and application of deep neural networks (Goodfellow, Bengio, & Courville, 2016), which, powered by gradient-descent optimization methods (Robbins & Monro, 1951), exhibit remarkable prediction performance in specific tasks, such as recognizing pictures (Nixon & Aguado, 2019); natural language processing (Young, Hazarika, Poria, & Cambria, 2018), including machine translation; playing games (Sutton & Barto, 2018); and graph (or network) analysis (Scarselli, Gori, Tsoi, Hagenbuchner, & Monfardini, 2008). Although these methods can provide accurate predictions to social scientists, one of their drawbacks is that they are black-box systems.
In other words, it is not entirely possible to interpret why particular predictions were made. As the social sciences are predicated on explaining social phenomena, we believe the community will particularly benefit from a growing research literature on “interpretable machine learning” (Rudin, 2019), which aims to overcome the limitations of black-box methods while retaining the same level of accuracy.

As the above discussion suggests, predictions have not been part of the work of social scientists. The challenge of making predictions in the social sciences is best summarized by Duncan Watts (2011), who writes: “In simple systems . . . it is possible to predict with high probability what will actually happen—for example when the Halley’s Comet will next return or what orbit a particular satellite will enter. For complex systems, by contrast, the best that we can hope for is to correctly predict the probability that something will happen” (p. 143). Social systems create complex interactions among their own internal parts (individuals, groups, etc.) that give rise to unpredictable outcomes. Yet making predictions is a fundamental part of the scientific method (Watts, 2011). We think that machine learning may drive social scientists’ attention toward predictions and contribute to creating an applicable body of knowledge.
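To make the network inference discussion earlier in this section more concrete, the following minimal Python sketch illustrates the observation that, in the general case, only the first two actors of each trace carry usable information: the second actor informed can only have been influenced by the seed, so each trace certifies at most one edge. The function name `first_edge_inference`, the trace format (a list of actor and time pairs with distinct times), and the toy traces are illustrative assumptions; this is not presented as the published implementation of Abrahao et al. (2013).

```python
from typing import Hashable, Iterable, List, Set, Tuple, FrozenSet

# Hypothetical trace format: a list of (actor id, time informed) pairs.
Trace = List[Tuple[Hashable, float]]


def first_edge_inference(traces: Iterable[Trace]) -> Set[FrozenSet[Hashable]]:
    """Collect one undirected edge per trace: {seed, second actor informed}.

    Assumes that times within a trace are distinct, so the seed and the
    second informed actor are well defined. The rest of each trace is ignored.
    """
    edges: Set[FrozenSet[Hashable]] = set()
    for trace in traces:
        ordered = sorted(trace, key=lambda pair: pair[1])  # chronological order
        if len(ordered) < 2:
            continue  # a trace with a single actor certifies no edge
        seed, second = ordered[0][0], ordered[1][0]
        edges.add(frozenset((seed, second)))
    return edges


# Toy data (made up for illustration): two cascades over actors a, b, c.
toy_traces = [
    [("a", 0.0), ("b", 1.2), ("c", 1.9)],
    [("c", 0.0), ("b", 0.4), ("a", 2.0)],
]
inferred_edges = first_edge_inference(toy_traces)
```

Because each trace contributes at most one edge, the number of traces needed grows with the number of candidate edges, which is consistent with the quadratic requirement described in the text.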

Online Experimentation on Interactions

A fast-growing area of CSS uses the web to engage large numbers of volunteers in randomized online experiments. By using the web, researchers are now able to recruit large samples of specific populations to participate in giant online laboratories. Industry researchers use the web as a place for experiments as a matter of daily routine. The simplest and most commonly used type of experiment is the A/B test, in which the researcher exposes users to different versions of a platform, effectively splitting them into two groups at random: a treatment group, usually consisting of a fraction of the user population, and a control group, consisting of those who keep experiencing the currently deployed version of the service. This type of experiment is very effective for capturing the impact of product changes (e.g., the use of new fonts or the implementation of new rankings) but is less effective for answering theoretically motivated questions. The reason is that the A/B test remains constrained to a given platform that cannot easily be altered to isolate the relevant factors shaping behavior from confounds. Because of this fundamental limitation, researchers who use the A/B test are limited in what they can freely manipulate.

Nevertheless, theory-driven experiments conducted using computers, as opposed to A/B tests, predate the emergence of CSS. The field of human-computer interaction, with foundations in World War II–era cybernetic experimentation (Hauben & Hauben, 1998), quickly adopted applied online experiments with the goal of improving the design of computer interfaces. As online communities grew in population and diversity, social scientists began exploring the rich datasets on social behavior that these sites generated.

More recently, online experiments have been able to leverage big data in a promising direction that allows for causal arguments, as well as for increasing both the explanatory and predictive power of models. The work of Abrahao et al. (2017) on trust in online markets of the sharing economy is one such case. Social biases have been observed in interactions among users in the sharing economy (State, Abrahao, & Cook, 2016), a factor that often hinders the growth and healthy functioning of these services.
The study investigated whether, and to what extent, features artificially engineered by sharing economy platforms, such as the reputation system, counteract natural behavioral tendencies that lead to social biases. The paper focuses on homophily, the common tendency to trust others who are similar, which results in the social selection of exchange partners simply as a byproduct of specific socio-demographic characteristics. The tendency to form ties with similar others is one of the few regularities of social behavior that has been observed in many social settings. For instance, James Moody studied how homophily and balancing forces generated segregated friendship patterns in high schools with low levels of race heterogeneity (Moody, 2001).

Abrahao and collaborators tested the impact of homophily vis-à-vis the impact of reputation using a randomized online experiment with 8,906 users of Airbnb. The experiment was based on the widely used investment game, a variation of the prisoner’s dilemma. It is a single-shot game in which participants decide how many of their endowed credits to invest in recipients, who in turn may cooperate or defect when deciding how many of the inflated (tripled) credits invested in them to return. The authors generated synthetic profiles of simulated recipients with various demographic characteristics and presented them as other Airbnb users to participants, who played the role of investors.

Abrahao et al. established that homophily systematically interferes with interactions. Participants invested more credits in profiles whose characteristics matched theirs. The researchers then measured the extent to which the platform has the power to alter users’ perceptions of trust through reputation. They found that any positive reputation signal is enough to offset homophily. That is, a dissimilar other who exhibits high reputation tends to be perceived as more trustworthy than a more familiar counterpart with lower reputation. Finally, the study also presented evidence that the key finding from the experiment, namely that reputation offsets homophily, generalizes to interaction dynamics in the real world, through analyses of one million hospitality interactions observed in Airbnb’s internal database (Figure 28.2).

Figure 28.2  Experiment using an investment game with Airbnb users as subjects playing the role of investors. We plot the average savings and investment in profiles at different socio-demographic distances from the subject. We present the farthest profile to the subjects as having better reputation than the closest ones. (Vertical axis: average invested; horizontal axis: savings and distance from the participant; legend: reputation, baseline versus higher.)
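The payoff structure of the investment game described above can be sketched in a few lines of Python. The tripling of invested credits comes from the text; the function name `investment_game_payoffs`, the endowment of 100 credits, and the idea of expressing the recipient's decision as a fraction returned are assumptions made only for illustration and are not taken from the study's materials.

```python
def investment_game_payoffs(endowment, invested, fraction_returned, multiplier=3):
    """Return (investor_payoff, recipient_payoff) for one single-shot game.

    endowment:         credits the investor starts with (hypothetical value)
    invested:          credits the investor sends to the recipient
    fraction_returned: share of the inflated credits the recipient sends back
                       (0.0 = full defection, higher = more cooperation)
    multiplier:        inflation factor applied to invested credits (tripled)
    """
    if not 0 <= invested <= endowment:
        raise ValueError("invested must be between 0 and the endowment")
    if not 0.0 <= fraction_returned <= 1.0:
        raise ValueError("fraction_returned must be between 0 and 1")

    inflated = multiplier * invested           # credits received by the recipient
    returned = fraction_returned * inflated    # credits sent back to the investor
    investor_payoff = endowment - invested + returned
    recipient_payoff = inflated - returned
    return investor_payoff, recipient_payoff


# Illustrative scenarios with a made-up endowment of 100 credits.
cooperative = investment_game_payoffs(100, 40, 0.5)
defecting = investment_game_payoffs(100, 40, 0.0)
```

In this sketch the amount invested serves as a behavioral measure of trust: investing pays off for the investor only if the recipient returns more than one third of the inflated credits.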
The structured variation of the profiles represented a significant design innovation relative to standard randomized experiments. Each profile had random and predetermined elements that created statistical dependencies, which the authors accounted for in the analysis. Profiles were built on the basis of the participants’ demographic characteristics, collected at the beginning of the experiment and later verified against the Airbnb database. For example, the most similar profile a female participant would see was different from the most similar profile a male participant would see. Each more distant profile was built by randomly varying one, then two, and eventually all demographic characteristics. This design has some similarities with conjoint analysis and vignette studies, in which multiple attributes are clustered together (Louviere, 1988). Finally, the partnership between the authors of the study and Airbnb allowed for the creation of a random sample of users, which reduced the overall bias of the estimates.

Central to this design is the idea that data on individual-level interactions do not simply mean that we can recruit more people; they also mean that the social world has become subject to more quantifiable manipulations. The next section builds on this intuition and suggests a new experimental design that is a hybrid of a laboratory experiment and a field experiment.

Online Field Experiments

The quantification of everyday life means that experimentation can leave the laboratory and enter the social world. This possibility opens the door to ethical concerns that we address in the next section. It also creates new opportunities for field experimentation, i.e., experiments that occur in the field and without randomization into treatment or control. Online field experiments (OFEs) are an example of how the quantification of life is making it possible to estimate the causal effect of complex treatments.

Before exploring what we mean by complex treatment, we illustrate how an OFE works using the previously introduced Airbnb project. The experimental manipulation designed to measure how reputation affects social bias and trust represented phase 1 in a within-subjects pre-post design. Participants in the pre-measurement were invited to come back for a post-measurement, or phase 2. Between the two waves, some of the participants were expected to travel and interact with a stranger. Using data from advance bookings, the researchers knew who was going to experience the treatment (i.e., staying at the house of a total stranger) between the two waves and randomly selected participants for the experiment accordingly. While this task would have been impossible just a few years ago, as it required knowledge of the future, it is now feasible because of the available data. Thus, while participants are not assigned to treatment at random, the resulting bias is reduced by the randomization of the participants included in the sample. Furthermore, the ability to estimate the average effect on the treated is one of the main advantages of the OFE.

Staying at the house of a total stranger is a complex treatment that, we argue, cannot be abstracted from the context in which it occurs (Ashmore, Deaux, & McLaughlin-Volpe, 2004). This goes beyond the argument that context provides the raw material for what happens during a social interaction and the relevant identities that get enacted (e.g., the context provides the stereotypes in research about stereotypes). We argue that context often directly affects and mediates social interactions. Treatment complexity describes the degree to which a treatment entails embeddedness of the social interaction in its environment.
Treatment complexity does not imply statistical interactions across multiple factors (though these may occur in some settings). A treatment is complex when the social interaction cannot be isolated from the context in which it happens without losing or significantly altering the overall effect of the experience. A complex treatment is one in which the interactions between participants are intrinsically shaped by the context in which they take place.

As previously stated, treatment is not randomized. In the case of the Airbnb project, participants decided where and when to travel (or host guests in their homes). Nevertheless, the selection of participants for the experiment was randomized, and the sample was built in accordance with certain characteristics: first trip, gender, etc. These procedures help reduce the various sources of selection bias, provided they are known a priori to the researchers (Parigi, Santana, & Cook, 2017). Furthermore, because of randomization into the sample, it becomes possible to estimate response bias and to use statistical adjustments, such as inverse probability weights, to correct the main estimates.
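As a rough illustration of the inverse probability weighting mentioned above, the sketch below reweights respondents by the inverse of an estimated response probability. It assumes, purely for illustration, that response propensities are estimated within known strata (for example, first-time versus repeat travelers) as the share of invited users who responded; the function name `ipw_mean`, the stratum labels, and all numbers are hypothetical and do not come from the Airbnb study.

```python
from collections import Counter


def ipw_mean(outcomes, strata, invited_counts):
    """Inverse-probability-weighted mean of respondents' outcomes.

    outcomes:       outcome value for each respondent
    strata:         stratum label for each respondent (same order as outcomes)
    invited_counts: number of users randomly invited in each stratum

    The response probability in a stratum is estimated as
    responders / invited, so each respondent's weight is invited / responders.
    """
    responders = Counter(strata)
    weights = [invited_counts[s] / responders[s] for s in strata]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)


# Made-up example: repeat travelers respond less often than first-time travelers.
outcomes = [10.0, 10.0, 20.0]
strata = ["first_trip", "first_trip", "repeat"]
invited = {"first_trip": 4, "repeat": 6}

naive_estimate = sum(outcomes) / len(outcomes)
adjusted_estimate = ipw_mean(outcomes, strata, invited)
```

The adjustment pulls the estimate toward the under-responding stratum, which is the intended effect when response rates differ across groups that were randomly sampled.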

Challenges

As we mentioned in the introduction, CSS is a science in its infancy. Many broad, unresolved questions are currently roadblocks stunting the growth of CSS. We conclude this chapter by highlighting current challenges in the field.

• Privacy: With the increasing amount of data collected by online platforms, new challenges to privacy have emerged. For instance, it is now possible to reconstruct relevant characteristics of a user (gender, political preferences, sexual orientation, etc.) by looking at the pattern of Facebook likes (Kosinski, Stillwell, & Graepel, 2013). The large amount of data about individual choices has opened the door to micro-targeted marketing efforts and to political propaganda. More importantly, it has brought to the fore the issue of how best to protect sensitive data and privacy. The recent implementation of the General Data Protection Regulation (GDPR) by the European Union is a first step toward creating binding guidelines for how best to collect, store, and protect the personal information of users. The key contributions of the GDPR are a broader definition of personal data and consistency across Europe in how personal data can be processed, used, and exchanged securely. While the law formally applies to European member states, it covers all companies worldwide that collect and/or store data in Europe. Because of this broad mandate, the GDPR has been labeled the first international privacy law of a new era. Yet, in a sense, even the GDPR appears obsolete. For instance, the previously mentioned information about Facebook likes is not considered private data by the GDPR. The challenge of protecting user privacy is therefore fundamentally one of evolving societal norms and technology.

• Ethics and user consent: In 2014, Jeff Hancock and collaborators conducted a study to see whether emotions expressed by people in the Facebook newsfeed affected a Facebook user’s own use of emotional language (Kramer et al., 2014). Evidence showed that an emotional contagion occurred among Facebook users: their own emotional language reflected what they saw in their newsfeeds. The reaction to the publication of the study surprised the researchers.
First, many users complained about becoming part of research without any explicit consent; second, many users were surprised to learn that their newsfeed had been manipulated. Over time this study has come to symbolize the ethical challenges of doing research using online data. Particularly when using data at scale, CSS researchers need to be careful to inform subjects that they are participating in an experiment. Many researchers assume that because tech companies run hundreds of A/B tests per week and manipulate every aspect of the user experience, users will be even more willing to accept manipulation done in the name of science. This assumption has proven wrong many times already.

• Representativeness and algorithmic fairness: In June 2016, Eric Loomis was sentenced to six years in prison because he was judged to present a high risk to the community. Part of the decision, the New York Times reported, resulted from COMPAS, a private algorithm developed by Northpointe Inc. to predict the likelihood of recidivism. The algorithm is proprietary, a fact that limits how much the defense and the general public can learn about how it works. Further, an investigation by ProPublica into machine learning tools applied to sentencing showed systematic bias in the training set used to teach COMPAS how to make predictions (Larson, Mattu, Kirchner, & Angwin, 2016), including predictions in the case of Loomis. It is not only that machine learning algorithms can be biased. Rather, the data on which the algorithms are trained contain all sorts of biases because the people who curated the data in the first place are biased. Addressing how machine learning can be used to remove bias from results will be a growing area of CSS research in the near future (Kleinberg, Ludwig, Mullainathan, & Rambachan, 2018).

• Engaging industry partners and experimental replication: Industry produces data whose volume and quality are virtually impossible to find in a university setting; it is of paramount importance to collaborate with technology companies to increase the value of CSS work. Models for setting up collaboration between academic researchers and companies are now emerging, but significant work still needs to be done. Legal barriers, for instance, often prohibit companies from sharing data with third parties. Further, users may have concerns about the privacy of their data. Finally, using proprietary data raises the issue of replication of findings. Because the data are private and protected by strict privacy laws (see the GDPR on this), granting access to researchers is difficult and sharing data with the larger scientific community is virtually impossible. A few years ago, when Netflix shared data for its second data challenge, a group of researchers reverse-engineered the data and proved that sparse datasets with many features can be effectively de-anonymized (Narayanan & Shmatikov, 2008). How, then, is the scientific community supposed to reproduce published results? One developing collaboration format involves having a third party (usually a university) certify the accuracy of the results. We think that within universities this guarantor role is going to grow in the near future and is likely to be incorporated into institutional review board mandates.

• Providing guarantees to data science: Most of the current developments in data analysis, such as machine learning methods, focus on improving efficiency, predictive performance, and fine-tuning. However, these concerns represent only the later parts of the CSS pipeline. As a result, most of our recent progress consists of methods that are often oblivious to error rates and to the assessment of business value in the context of testing many different hypotheses.
More importantly, they often overlook other interdependent steps of the cycle, from early stages, such as data collection, cleaning, and sampling, to more advanced stages, such as translating business problems into learning tasks and evaluating model fit. The process of connecting these steps is mostly done manually, which makes the practice of CSS exploratory at best. Moving forward, CSS will need to develop principled methods to provide error guarantees and best practices throughout the entire chain of data analysis.

• Addressing challenges that come with big data: Finally, collecting large amounts of data in an uncontrolled way has its perils. First, with more data comes more noise. In fact, deciding whether we are observing signal or noise in big data is a hard problem. Amplified levels of noise in large datasets raise the chances that the data exhibit repeating patterns of noisy fluctuations that may be misinterpreted as signal. Accordingly, this might lead to an arbitrary increase in the rate of false positives. Second, big data may exceed even the gigantic amounts of storage that have been made available by current technology. Therefore, algorithms that process large amounts of data in streams with small storage requirements, as well as algorithms for compression and dimensionality reduction, will be fundamental. Third, big data is going to require “fair data sampling” or automatic model correction to guarantee that all populations involved are uniformly represented in the data, so as to reduce potential sources of bias (Wang, Abrahao, & Kamar, 2020).
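The concern about false positives in the last challenge can be illustrated with a small calculation. The sketch below assumes, as a simplification not made in the chapter, that many independent hypothesis tests are each run at the same significance level; it then computes the probability of at least one false positive and shows the effect of a Bonferroni correction, a standard adjustment named here only as one familiar option. The function names and the numbers are illustrative.

```python
def familywise_error_rate(alpha, num_tests):
    """Probability of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent,
    each run at significance level alpha.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / num_tests


# Illustrative numbers: 100 tests at the conventional 0.05 level.
uncorrected = familywise_error_rate(0.05, 100)
corrected = familywise_error_rate(bonferroni_threshold(0.05, 100), 100)
```

Under these assumptions, a chance finding becomes almost certain when many patterns are tested without correction, whereas the corrected procedure keeps the overall rate below the nominal level.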

Notes

1. Abrahao was partially supported by National Natural Science Foundation of China (NSFC) grant #61850410536. Both authors contributed equally to the chapter and are listed in alphabetical order.
2. The term big data is often interpreted to refer to the amount of storage necessary to record the data. Others may classify data as “big” if their structure is complex, such as high-dimensional and heterogeneous. Closer to our discussion, however, is the notion that whether or not data are “big” depends on the question we ask. For example, if we ask what the average human height is, we have big data in terms of storage, as we have more than seven billion data points, but not with respect to the question, as we can sample 1,000 such data points and answer the question with high statistical precision.
3. As the popular anecdote goes, in physics, if we cannot explain more than 95% of the data with a simple model, there is something wrong with the model, whereas in sociology, if we can explain more than 5% of the data with a simple model, there is something wrong with the model.
4. Modern machine learning methods also allow for interpretable outcomes that shed light on the structure of the data.
5. The term trace complexity deviates from the traditional statistical nomenclature of sample complexity. As opposed to samples, whose quantity is measured along a single dimension (their number), traces possess two dimensions of measure, namely the number of different traces and their length (where each element of a trace consists of the informed actor’s identity and the time at which he or she was informed).

References

Abrahao, B., Chierichetti, F., Kleinberg, R., & Panconesi, A. (2013). Trace complexity of network inference. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’13) (pp. 491–499). New York, NY: Association for Computing Machinery.
Abrahao, B., Parigi, P., Gupta, A., & Cook, K. S. (2017). Reputation offsets trust judgments based on social biases among Airbnb users. Proceedings of the National Academy of Sciences, 114(37), 9848–9853. doi:10.1073/pnas.1604234114
Abrahao, B., Soundarajan, S., Hopcroft, J., & Kleinberg, R. (2012). On the separability of structural classes of communities. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’12) (pp. 624–632). New York, NY: Association for Computing Machinery.
Abrahao, B., Soundarajan, S., Hopcroft, J., & Kleinberg, R. (2013). A separability framework for analyzing community structure. ACM Transactions on Knowledge Discovery and Data Mining, 8(1).
Ahn, Y., Bagrow, J. P., & Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature, 466(7307), 761–764.
Ashmore, R. D., Deaux, K., & McLaughlin-Volpe, T. (2004). An organizing framework for collective identity: Articulation and significance of multidimensionality. Psychological Bulletin, 130(1), 80–114.
Backstrom, L., Boldi, P., Rosa, M., Ugander, J., & Vigna, S. (2012). Four degrees of separation. In Proceedings of the 4th Annual ACM Web Science Conference (WebSci ’12) (pp. 33–42). New York, NY: ACM.
Barbosa, N. M., Sun, E., Antin, J., & Parigi, P. (2020). Designing for trust: A behavioral framework for sharing economy platforms. In Proceedings of The World Wide Web Conference (WWW '2020), April 20–24, 2020, Taipei, Taiwan. New York, NY: ACM. https://doi.org/10.1145/3366423.3380279
Bartók, G., Zolghadr, N., & Szepesvári, C. (2012). An adaptive algorithm for finite stochastic partial monitoring. arXiv preprint arXiv:1206.6487.
Burke, T. J., & Rains, S. A. (2018). The paradoxical outcomes of observing others’ exercise behavior on social network sites: Friends’ exercise posts, exercise attitudes, and weight concern. Journal of Medical Internet Research, 20(12), e11439.
Casella, G., & Berger, R. (2001, June). Statistical inference. Duxbury Resource Center.
Centola, D. (2010). The spread of behavior in an online social network experiment. Science, 329(5996), 1194–1197. doi:10.1126/science.1185231
Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. New York, NY: Cambridge University Press.
Clauset, A., Moore, C., & Newman, M. E. J. (2008). Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191), 98–101.
Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113.
Dodds, P. S., Muhamad, R., & Watts, D. J. (2003). An experimental study of search in global social networks. Science, 301(5634), 827–829.
Giles, J. (2012). Computational social science: Making the links. Nature, 488(7412), 448–550.
Girvan, M., & Newman, M. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Boston, MA: MIT Press.
Hauben, M., & Hauben, R. (1998). Netizens: On the history and impact of Usenet and the Internet. First Monday, 3(7).
Hofman, J. M., Sharma, A., & Watts, D. J. (2017). Prediction and explanation in social systems. Science, 355, 486–488.
It’s a small world after all. (1979). Current Contents, 43, 5–10.
Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.
Kleinberg, J. M. (2000). Navigation in a small world. Nature, 406(6798), 845–845.
Kleinberg, J., Ludwig, J., Mullainathan, S., & Rambachan, A. (2018). Algorithmic fairness. In AEA Papers and Proceedings (Vol. 108, pp. 22–27).
Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15), 5802–5805.
Kramer, D. I., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24), 8788–8790.
Larson, J., Mattu, S., Kirchner, L., & Angwin, J. (2016). How we analyzed the COMPAS recidivism algorithm. ProPublica (5 2016), 9.
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., . . . Van Alstyne, M. (2009). Computational social science. Science, 323(5915), 721–723.
Lin, T., Abrahao, B., Kleinberg, R., Lui, J., & Chen, W. (2014). Combinatorial partial monitoring game with linear feedback and its applications. In Proceedings of the 31st International Conference on Machine Learning (ICML ’14) (pp. 901–909). Proceedings of Machine Learning Research.
Louviere, J. J. (1988). Conjoint analysis modelling of stated preferences: A review of theory, methods, recent developments and external validity. Journal of Transport Economics and Policy, 22(1), 93–119.
Mann, A. (2016). Core concept: Computational social science. Proceedings of the National Academy of Sciences, 113(3), 468–470.
Milgram, S. (1967). The small world problem. Psychology Today, 2, 60–67.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., . . . & Petersen, S. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
Molina, M., & Garip, F. (2019). Machine learning for sociology. Annual Review of Sociology, 45, 27–45. https://www.annualreviews.org/doi/abs/10.1146/annurev-soc-073117-041106
Moody, J. (2001). Race, school integration, and friendship segregation in America. American Journal of Sociology, 107(3), 679–716.
Moody, J., & White, D. R. (2003). Structural cohesion and embeddedness: A hierarchical concept of social groups. American Sociological Review, 68(1), 103–127.
Narayanan, A., & Shmatikov, V. (2008). Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy (SP 2008) (pp. 111–125). Oakland, CA.
Nevmyvaka, Y., Feng, Y., & Kearns, M. (2006, June). Reinforcement learning for optimized trade execution. In Proceedings of the 23rd International Conference on Machine Learning (pp. 673–680).
Newman, M., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2), 026113.
Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577–8582.
Nixon, M., & Aguado, A. (2019). Feature extraction and image processing for computer vision. Cambridge, MA: Academic Press.
Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., & Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96(5), 879–899.
Palla, G., Derenyi, I., Farkas, I., & Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043), 814–818.
Parigi, P., & Bearman, P. S. (2008). Spaghetti politics: Local electoral systems and alliance structure in Italy, 1984–2001. Social Forces, 87(2), 623–649.
Parigi, P., Santana, J. J., & Cook, K. S. (2017). Online field experiments: Studying social interactions in context. Social Psychology Quarterly, 80(1), 1–19.
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215.
Saha, K., Sugar, B., Torous, J., Abrahao, B., Kıcıman, E., & Choudhury, M. (2019). A social media study on the effects of psychiatric medication use. In Proceedings of the International Association for the Advancement of Artificial Intelligence Conference on Web and Social Media (pp. 440–451). ISSN: 2334-0770.
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 20(1), 61–80.
Simon, H. A. (1990). Invariants of human behavior. Annual Review of Psychology, 41, 1–19.
Simon, H. A. (2010). A behavioral model of rational choice. Competition Policy International, 6(1), 241–258.
Simon, H. A., & Newell, A. (1971). Human problem solving: The state of the theory in 1970. American Psychologist, 26(2), 145–159.
State, B., Abrahao, B., & Cook, K. (2016). Power imbalance and rating systems. In Tenth International AAAI Conference on Web and Social Media.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Travers, J., & Milgram, S. (1969). An experimental study of the small world problem. Sociometry, 32(4), 425–443.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
Wang, Z., Abrahao, B., & Kamar, E. (2020). Supervised discovery of unknown unknowns through test sample mining. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence. New York, NY.
Wasserman, S., & Anderson, C. (1987). Stochastic a posteriori blockmodels: Construction and assessment. Social Networks, 9(1), 1–36.
Watts, D. (2011). Everything is obvious once you know the answer. Crown Publishing Group.
Watts, D., & Strogatz, S. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393, 440–442. https://doi.org/10.1038/30918
White, H. C., Boorman, S. A., & Breiger, R. L. (1976). Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81(4), 730–780.
Young, T., Hazarika, D., Poria, S., & Cambria, E. (2018). Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine, 13(3), 55–75.

Chapter 29

Networks: An Economic Perspective

Matthew O. Jackson, Brian W. Rogers, and Yves Zenou

Why Should We Study Network Structure?

Why should social scientists care about the patterns of interactions in a society? An easy answer is that science is about explaining what we see in the world, and since social structures exhibit regularities, we should catalog those regularities and understand their origins. A deeper answer is that, as humans, we should care about the welfare of our societies, and social structure is a primary driver of behaviors and ultimately of people’s well-being.

This perspective suggests a series of more directed questions organized around an important agenda. We want to understand not only which regularities social structures exhibit but also the mechanisms by which social structures impact human opportunities, beliefs, behaviors, and the resulting welfare. This also means that it becomes critical to understand why social structures take the forms that they do, and when it is that they take forms that are “better” versus “worse” in terms of the resulting behaviors. These perspectives lead to questions such as: What is the role of social networks in driving persistent differences between races and genders in education and labor market outcomes? What is the role of homophily in such differences? Why do we see such homophily even if it ends up with negative consequences in terms of labor markets?

Externalities: A Unifying Theme

The fundamental reason that we need to look at social structure to understand human opinions and behaviors is that there are externalities involved. Externalities is a term that economists use to refer to situations in which the behaviors of some people affect the welfare of others. The concept of an externality is very general and, correspondingly, externalities can take many forms. Second-hand smoke is a negative externality imposed by a smoker on those around him or her, as are the CO2 emissions of factories around the world; the numerous activities with positive externalities include acquiring information as well as innovating and inventing. An important aspect of an externality is that, often, the welfare effects on others are not fully taken into account by the decision maker, leading to choices that are inefficient for society.

One reason that understanding externalities is important is that they have consequences for policy. For example, as our medical understanding of the effect of smoking has improved, we have learned that the externalities from smoking are significant, which motivated changes to laws that restrict smoking. In the same vein, the recognition of pollution-related externalities is the prime motivator for regulations such as emissions taxes, which are mostly intended to offset the externalities imposed on others.

Networks of social interactions are important to study and understand precisely because they involve many externalities, both positive and negative. If one person becomes an expert at using a particular piece of software, that knowledge can benefit his or her colleagues, who are then more easily able to learn to use that software themselves. If a person gets a job in a growing company, that position may benefit his or her friends, who may then learn quickly about new opportunities at that company and have an advantage in obtaining interviews. If one business has a partnership with another business that ends up making excessively risky investments in its other dealings and goes bankrupt, then that will damage the partnership and, if these are large investment banks, for instance, could even be a catalyst for a contagion of financial distress.

To summarize, networks are of paramount interest because they drive behavior in contexts that involve systematic externalities. There are two basic themes that arise from this observation. The first theme involves network formation: people do not necessarily consider the full societal impact of their decisions to form or maintain relationships. This results in suboptimal network formation. For example, a person may not keep in touch with an old friend, even though the information that he or she might have learned from that old friend could end up being of substantial value to the person’s current friends, as may be the case in word-of-mouth learning about job vacancies. There is a basic sense in which information networks are systematically underconnected from a societal welfare perspective (e.g., see the discussion of the “connections model” later).

The second theme concerns behaviors that involve peer interactions across a given network, such as choices to become educated, to undertake criminal activities, or to adopt a new technology. In these cases external effects lead behaviors involving positive externalities to be taken at too low a level and behaviors involving negative externalities to be taken at too high a level, relative to what would maximize societal welfare.

Overview

The scientific study of social networks was initiated more than a century ago and has grown into a central field of sociology over the past 50 years (see, e.g., Wasserman & Faust, 1994; Freeman, 2004). Although the importance of embeddedness of economic activity in social structures is fundamental, with a few exceptions, economists largely ignored it until the 1990s. Indeed, studies of networks from economic perspectives and using game theoretic modeling techniques have emerged mainly over the last 20 years (for overviews, see Benhabib, Bisin, & Jackson, 2011; Bramoullé, Galeotti, & Rogers, 2016; Goyal, 2007; Jackson, 2008, 2014, 2019; Jackson, Rogers, & Zenou, 2017; Jackson & Zenou, 2015). By now the study of social networks has also become important in anthropology, political science, computer science, and statistical physics. Network analysis has become one of the few truly multi- and interdisciplinary sciences.

Along with such a variety of fields converging on a common topic come challenges in bridging gaps in terminology, technique, and perspective. Here, we aim to describe some of the perspectives and techniques that have come from the economics literature and how they interplay with those from other fields.

Networks are mathematically complicated objects that often involve large groups of people, and thus have daunting combinatorics in terms of their possible connections. Moreover, behaviors and relationships are typically highly interdependent and nonlinear, making causal inference problematic. Thus, a scientific understanding of networks and their effects requires the use of models and testing of hypotheses. Which frameworks should we use to generate those hypotheses?

One important technique that has emerged from the economic perspective has been the use of game theory. There are two distinct veins of game theoretic approaches to understanding social networks. The first is analyzing the formation of the network itself. What is the structure of networks that form when people choose relationships based on costs and benefits? How do the externalities play a role in the extent to which social welfare is optimized? The second vein incorporates an analysis of network structure to help us understand the behavior of agents in a society. This approach includes contagions, diffusion, social learning, and various forms of peer influence. All of these applications involve externalities, and a variety of studies have modeled these various behaviors.

Network Formation

A presumption underlying the economic modeling of network formation is that people have some choice regarding with whom they interact.1 Individuals form relationships and contribute effort to maintain them to the extent that they find them beneficial and avoid or drop relationships that are not beneficial. This presumption is sometimes captured through equilibrium notions of network formation, but is also modeled through various dynamics, as well as agent-based models where certain heuristics are specified that govern behavior. This “choice” perspective traces the structure and the properties of networks back to the costs and benefits that they bestow upon their participants. It is important to emphasize that this does not always presume that the individuals are fully rational, informed, or cognizant of their potential options, or even that they are “individuals” rather than groups or organizations or nations. Nonetheless, the analyses generally embody the idea that networks in which there are substantial gains from forming new relationships or terminating old ones are more ephemeral than networks where no such gains exist.

Regardless of what drives people to form networks, as social scientists we should still care about the welfare implications of the network that forms for a society in question. This interest can be understood from two directions. On the one hand, an analysis of the costs and benefits that networks offer naturally leads us to the question of how well the system performs. Since costs and benefits need to be carefully spelled out, there is a natural metric with which to evaluate the impact of a social network or changes in a social network. On the other hand, some of the interest in social surroundings comes from the fact that models that ignore those surroundings are unable to explain some observed phenomena, such as persistence of inequality among different groups in terms of their employment or wages. The interest in these issues often emerges from some fairness criteria or from an overall welfare perspective, where there is a question of whether or not a society is operating as efficiently as it should or could. For example, when labor markets are viewed in a social context in which opportunities depend in part on one’s position in a network, it changes predictions regarding how jobs will be filled and incomes will be determined. This then leads back to a welfarist evaluation of the impact of the network and things like homophily.

Network formation models that account for people’s choices predict different structures from benchmark models in which relationships are governed purely by some stochastic process that produces a random graph (see Jackson [2008], Newman [2010], and Pin & Rogers [2016] for overviews). There are models that involve choice and chance (e.g., Chandrasekhar & Jackson, 2018; Currarini, Jackson, & Pin, 2009, 2010; König, Tessone, & Zenou, 2014; Mele, 2013), but the important underlying aspect to modeling choices is that it enables us to account for the externalities present and say something about societal welfare.
Strategic network formation models build on the premise that the payoff or net benefit of a relationship depends potentially on the larger network and not just on the dyad in question. For instance, people obtain value from friends of friends, and so have an interest in the way in which the network is connected that extends beyond their immediate neighborhood. Game theory provides a set of tools to analyze such a situation. Strategic network formation analysis borrows ideas from this literature and has also contributed to its recent enlargement. Nash equilibrium is a central concept in game theory,2 but it turns out to be a clumsy tool for modeling network formation as there often needs to be mutual consent in link creation. A person cannot simply decide to be a friend or partner of another; it requires both to agree. To get around the fact that Nash equilibrium embodies only unilateral strategic adjustments, one could try to model a process of proposals and acceptances, but it becomes needlessly complicated and hard to analyze. To handle such mutual consent in network formation, Jackson and Wolinsky (1996) introduced an alternative solution concept for network formation games, which they referred to as pairwise stability.3 To be pairwise stable, a network has to satisfy two conditions. First, no agent can benefit from severing any of his or her existing links. Second, no pair of agents should both benefit from creating a new link between them. Pairwise stability is a relatively weak equilibrium or stability concept that embodies mutual consent. Jackson and Wolinsky (1996) also introduced an illustrative model that they called the connections model, which captures values of indirect connections and externalities in a network. People benefit from friendships, but they also benefit from friends of friends, and from friends of friends of friends, and longer paths in the network.
For instance, having more friends of friends may give a person more indirect access to information as it flows to the person’s friends. In particular, any connection, direct or indirect, is valuable, although more distant connections contribute less to an agent’s well-being. Direct relationships also involve maintenance costs. People weigh the costs and benefits of their friendships, including the indirect benefits that they bring. They form valuable relationships and not ones that would be of a net negative value. A variety of network structures can be pairwise stable, and, quite naturally, that set depends on the costs and benefits of forming relationships, as well as the values of indirect relationships.

[Figure 29.1  A star network with four agents: agent 1 at the center, linked to agents 2, 3, and 4.]

To see some of the basic ideas underlying the literature, consider the star network shown in Figure 29.1, with a center (agent 1) connected to three peripheral agents (agents 2, 3, and 4). In any model in which agents benefit from indirect connections, such as the connections model, the network that forms might not be the one that results in the highest social welfare. For instance, in Figure 29.1, there may be some indirect benefit that agent 2 gets from having agent 3 as a “friend of a friend.” Agent 1 will generally not take this indirect benefit for agent 2 fully into account (even if partly altruistic) when deciding whether to maintain a relationship with agent 3. This positive externality means that private incentives and overall total social welfare are misaligned. In the case in which there are positive externalities, unless all agents are exactly and fully altruistic, the networks that form will generally not be the ones that are best from a societal point of view.4

There are also many settings in which ties result in negative externalities. For instance, it could be that agent 1 is collaborating on a research project with agent 3 and that distracts agent 1 from spending time with agent 2.
In that case, there is a negative externality on agent 2 that agent 1 may not fully internalize when forming the relationship with agent 3. Such analyses reveal that, in general, individual and social incentives are not congruent, and that decentralized behavior typically results in an inefficient arrangement of social ties. A basic intuition is that with positive externalities, too few social ties are formed, as people do not fully internalize the indirect benefits to others when forming links; and conversely, with negative externalities, too many social ties are formed, as people do not fully internalize the indirect harm that their relationships do to others. Indeed, for a wide variety of settings this is the case (e.g., see the discussion in Jackson, 2003, 2008).

It is not easy to overcome the tension between individual incentives and societal welfare due to the pervasive externalities in network settings. To some extent, people who provide indirect benefits to others may expect to be compensated for their behaviors. For example, Burt’s (1992, 2004) theory of structural holes relates to this idea: people who connect otherwise disparate groups benefit from those connections. This is not solely because they benefit personally from their key position, but also because they serve as an important conduit and coordinator, providing externalities to those for whom they serve as a connector. They may be able to leverage that key position, for instance, by expecting favors in return for the unique information that they are able to relay to others. Although such rewards may help induce people to provide critical connections and positive externalities, the rewards generally cannot reflect the full value of all of the externalities present in a network at the same time, and thus fall short of providing the full incentives needed to result in networks that are best from society’s perspective. This can be shown mathematically.5

This tension between individual incentives and overall societal efficiency has been examined in a wide variety of models of costs and benefits, ranging from stylized versions of costs and benefits from friends of friends to more explicit models of things like research and development networks. One avenue pursues the case in which ties are directed and can be formed unilaterally (e.g., see Bala & Goyal, 2000), for instance, as in citations or in linking to or following on various social media. For such settings, externalities are again present and similar results apply as in the case with mutual consent in tie formation. A second avenue has considered heterogeneous versions of the connections model, wherein the costs and benefits of forming links can vary across agents (and pairs of agents) (see, e.g., Carayol & Roux, 2009; De Marti & Zenou, 2017; Galeotti, Goyal, & Kamphorst, 2006; Jackson & Rogers, 2005; Johnson & Gilles, 2000). In all of these models, characterizing the full set of equilibria can be challenging given the combinatorics.

Cabrales, Calvó-Armengol, and Zenou (2011) (see also Currarini et al., 2009, 2010) propose an alternative approach where agents only decide on the socialization effort they exert. For example, researchers go to workshops and congresses to listen, to be listened to, and to meet other researchers.
In this approach, socializing is not equivalent to elaborating a nominal list of intended relationships, and, more importantly, meeting and talking to someone does not necessarily imply forming a link with that person. Indeed, if we take the example of conferences, then individuals decide how often they go to conferences. The probability that two people form a link then depends positively on how often they attend the same conferences. But even if two people attend the same conferences very often and talk to each other a lot, it does not necessarily imply that they will write a paper together (i.e., form a tie). Cabrales et al. (2011) can then perform an equilibrium analysis that equates marginal costs and benefits of socialization, characterize the equilibrium, and examine its welfare properties. They show, again, that externalities lead the equilibrium socialization efforts to be inefficient.

More recently, researchers have been developing richer dynamic and statistical models of network formation (e.g., Badev, 2015; Chandrasekhar & Jackson, 2014, 2018; Christakis et al., 2010; De Paula et al., 2015; Graham, 2016; Jackson & Rogers, 2007; Jackson & Watts, 2002; König et al., 2014; Leung, 2015; Mele, 2013) that can incorporate incentives to form relationships and are suited to empirical application and statistical analysis.
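The two pairwise-stability conditions and the star network of Figure 29.1 can be made concrete with a small sketch. It assumes the symmetric form of the connections model, in which an agent at network distance d contributes a benefit of delta**d and each direct link costs its holder a fixed amount; the parameter values and function names are illustrative assumptions, not taken from the chapter.

```python
from collections import deque
from itertools import combinations


def distances(links, agents, source):
    """Breadth-first shortest-path lengths from source; unreachable agents are omitted."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in agents:
            if v not in dist and frozenset((u, v)) in links:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def utility(links, agents, i, delta, cost):
    """Assumed payoff: sum of delta**distance over reachable agents, minus cost per direct link."""
    dist = distances(links, agents, i)
    benefit = sum(delta ** d for j, d in dist.items() if j != i)
    degree = sum(1 for link in links if i in link)
    return benefit - cost * degree


def welfare(links, agents, delta, cost):
    """Total societal welfare: the sum of all agents' payoffs."""
    return sum(utility(links, agents, i, delta, cost) for i in agents)


def is_pairwise_stable(links, agents, delta, cost):
    """Check that no agent gains by cutting a link and no pair gains by adding one."""
    def u(network, i):
        return utility(network, agents, i, delta, cost)

    for i, j in combinations(agents, 2):
        ij = frozenset((i, j))
        if ij in links:
            cut = links - {ij}
            # Severing is unilateral: one agent who gains is enough to break stability.
            if u(cut, i) > u(links, i) or u(cut, j) > u(links, j):
                return False
        else:
            added = links | {ij}
            gain_i = u(added, i) - u(links, i)
            gain_j = u(added, j) - u(links, j)
            # Adding needs mutual consent: one strictly gains, the other does not lose.
            if (gain_i > 0 and gain_j >= 0) or (gain_j > 0 and gain_i >= 0):
                return False
    return True


agents = [1, 2, 3, 4]
star = frozenset(frozenset((1, j)) for j in (2, 3, 4))  # Figure 29.1
empty = frozenset()

print(is_pairwise_stable(star, agents, delta=0.5, cost=0.3))
print(is_pairwise_stable(star, agents, delta=0.5, cost=0.6))
print(is_pairwise_stable(empty, agents, delta=0.5, cost=0.6))
print(welfare(star, agents, 0.5, 0.6), welfare(empty, agents, 0.5, 0.6))
```

Under these assumed parameters, the star is pairwise stable when the link cost is 0.3. At a cost of 0.6 the center would rather drop a link, so the star is no longer stable, while the empty network is stable even though the star would yield strictly higher total welfare. This is one instance of the underconnection that positive externalities produce.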

Behavior and Games on Networks

Thus far our discussion has focused on network formation. In this regard, the analysis typically builds from a given specification of how a network, once formed, translates into the costs and benefits for each agent. For the most part, those models do not derive the payoffs to the people involved in the network from a comprehensive look at how people behave once a network is in place. For example, the connections model posits that friends of friends are valuable but does not model that value as coming from a specific social learning or diffusion process. One can imagine different kinds of interactions that may ultimately correspond roughly to the payoffs specified in the connections model, but the interactions themselves lie outside the description of that model. This leads to a question of how people will behave once a network is formed, and how that depends on the setting and the externalities that are generated by their behaviors.

In the present section, we discuss the literature that provides a complementary view to strategic network analysis, by explaining how behavior is driven by incentives and externalities on a given network (and so does not attempt to model network formation). This is then more explicit about the details of how agents interact along their links and what drives those behaviors. Again, a main question of interest is how externalities distort behaviors away from what might be socially optimal.

There are many simple interactions, including basic forms of contagion and diffusion processes that are studied in this way without an explicit consideration of incentives. In fact, there is a very large literature discussing such processes on networks. For example, there is a wide class of models developed for epidemiologic questions concerning how a disease may spread through a population (see, e.g., Bailey, 1975; Diekmann, Heesterbeek, & Britton, 2013). Important questions there include understanding when a disease will take hold in a society once introduced versus when it will eventually die out. In the latter case, questions include how much (and which members) of a society will be infected, and so forth.
It has been shown that answers to these questions depend in important and systematic ways on the structure of the contact network as well as how the spreading depends on the level of exposure (see, e.g., Granovetter, 1978; Jackson & Lopez-Pintado, 2013; Jackson & Yariv,  2007,  2011; Pastor-Satorras & Vespignani,  2001; Lopez-Pintado,  2008). The role of externalities in such settings is also readily apparent. For example, a choice by one agent to become immunized against a disease confers a positive externality on those with whom she interacts. For many behaviors, however, choices of people interact with the decisions of their friends. For instance, a decision of whether to adopt a new technology is usually driven by whether those around the decision maker are using compatible technologies. This means that such a diffusion process is more complex than the propagation of a disease. Correspondingly, these models tend to be naturally considered through game theoretic approaches, wherein each agent is presumed to take an action from a specified set and receives a payoff that depends both on his or her own action and the actions chosen by other agents (e.g., friends and acquaintances in the network). Here externalities are pervasive and this applies to most peer-effect settings. Note that, even in the usual case in which payoffs depend on the actions only of direct neighbors in the network, agents will care about, and have to make inferences regarding, the actions of farther-away agents. Friends of friends may not matter directly, but they do influence the behavior of one’s friends. This means that indirect effects can be of prominent importance in equilibrium characterizations. We refer to this literature as that of “games on networks” (for an overview see Jackson & Zenou, 2015). 
We emphasize that a very wide set of applications fall under this umbrella, including the diffusion of any good that involves complementarities, local public goods production, and any peer-effect application. While the payoffs can generally take an unrestricted form, two important classes of network games have been identified that give rise to important general insights. The classes are, respectively, games with strategic complementarities and with strategic substitutes.

To understand the essence of these classes of games, consider a game where each agent chooses some level of behavior: for instance, how much education to obtain, how many hours to study, how hard to work, or how much effort to put into learning how to do something. Strategic complementarity describes the case in which an increase in the behavior of a friend increases the marginal benefit from increasing one’s own action. For example, the marginal returns to working on a collaborative project may increase in the time that one’s partners put into the project due to synergies. The equilibria of games with strategic complements are generally those for which, as one agent increases his or her action, his or her neighbors find it in their interest to increase their actions, which then feeds back to create an additional incentive for higher actions to the original agent, and so on. Note that even with such positive feedback, this does not necessarily lead to socially optimal outcomes, since each person is still choosing a level of action based at least partly on his or her own well-being and not based on the overall impact on the society.

Games with strategic substitutes are, in contrast, those in which an increase in the action of a friend decreases the marginal incentive to raise one’s own action. For example, as one’s contacts vaccinate themselves against a disease, the incentive to immunize oneself decreases, as the chance of contracting the disease is lowered by the actions of one’s neighbors. Roughly speaking, these games tend to have a less “extreme” nature, since higher actions by some agents tend to produce lower actions in connected agents. There are still feedback effects, but here, instead of amplifying behaviors, the feedback is to mitigate the behaviors.
Again, there are still important externalities, since any one person’s decision is still impacting many others, and the individual is not always weighing the full social impact of his or her own decision.
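The contrast between amplifying and mitigating feedback can be illustrated with a minimal sketch. It assumes a stylized linear best reply in which each agent would choose an action of 1 in isolation and adjusts it by beta times the sum of his or her neighbors' actions, with beta > 0 for complements and beta < 0 for substitutes. The functional form, the values of beta, and the function name are assumptions made for illustration, not a specification taken from the literature discussed here.

```python
def best_response_equilibrium(neighbors, beta, rounds=200):
    """Iterate simultaneous best replies x_i = max(0, 1 + beta * sum of neighbors' actions).

    For |beta| small relative to the network's connectivity the iteration is a
    contraction and settles on the unique equilibrium of this assumed game.
    """
    actions = {i: 1.0 for i in neighbors}  # start from the stand-alone action
    for _ in range(rounds):
        actions = {
            i: max(0.0, 1.0 + beta * sum(actions[j] for j in neighbors[i]))
            for i in neighbors
        }
    return actions


# Four-agent star: agent 1 is the center, agents 2-4 are peripheral.
star = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}

complements = best_response_equilibrium(star, beta=0.2)
substitutes = best_response_equilibrium(star, beta=-0.2)

for label, actions in (("complements", complements), ("substitutes", substitutes)):
    print(label, {i: round(x, 3) for i, x in actions.items()})
```

In this sketch every agent ends up above the stand-alone action of 1 under complements (the center near 1.82, peripheral agents near 1.36) and below it under substitutes (the center near 0.45, peripheral agents near 0.91). The best-connected agent moves furthest in both cases, which is the feedback logic described above.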

Strategic Complementarities

To illustrate the characteristics of network games with strategic complements, let us examine a specific and very tractable game in which links represent the exertion of influence between two peers in a social network. To that end, consider a set of agents who each choose how intensely to pursue a social activity. The returns to investing in the activity depend on the agent’s action and the actions of his or her neighbors. A fixed network governs who influences each given agent (e.g., his or her friends). The game is one of complements, in which an agent’s marginal payoff from the social activity is an increasing function of the choices of his or her neighbors. We focus on an interpretation of the activity as criminal behavior, so that as one’s friends increase their level of criminal activity, the returns to increasing one’s own criminal activity increase as well (although other interpretations are possible as well, such as investment in education).

Ballester, Calvó-Armengol, and Zenou (2006) analyze such a game where the utility functions take a particular “linear quadratic” form. They characterize the Nash equilibrium of this peer-effect game. A central finding is that an agent’s criminal activity is proportional to how central he or she is in the underlying network, where centrality is measured according to a particular formulation known as “Katz-Bonacich” centrality (a well-known measure introduced by Katz, 1953, and Bonacich, 1987). The relationship between the peer effect and Katz-Bonacich centrality is quite intuitive, as they both involve iterative interactions. A person is influenced by his or her friends. They are in turn influenced by their friends. Thus, friends of friends have an indirect effect and that effect can also be quantified. Iterating, there are effects of friends of friends of friends, and so on. Katz-Bonacich centrality looks at this limit, where there is a decaying effect as the distance increases, and this is exactly how such games with complementarities work as well, and so it is very natural that this centrality measure plays a prominent role in understanding peer effects.

What kinds of insights can be taken from this model? As just mentioned, the analysis predicts that a key determinant of one’s (e.g., criminal) activities is the position in the network: the more central (in terms of Katz-Bonacich centrality) a person is in a network, the higher is his or her level of criminal activity. The centrality captures the direct and indirect influences on a given individual. This gives a causal prediction between network position and criminal activity, and here Katz-Bonacich centrality embodies the externalities and how they operate through the network.

Analyses of such models also yield further predictions. For example, one can also use them for policy experiments, seeing how changing the network or intervening to influence behaviors can change the overall behaviors in the society. In particular, in the context of the model mentioned previously, it suggests a way of formulating policies aimed at reducing crime. Indeed, Ballester, Calvó-Armengol, and Zenou (2006, 2010) propose a new measure of network centrality: intercentrality. This measure can be used to identify key players in a network, that is, those people whose removal from the network would lead to the greatest aggregate reduction in criminal activity.
Intercentrality ranks individuals according to this criterion and allows one to identify the most effective targets when one’s objective is to reduce overall crime. Importantly, Katz-Bonacich centrality and intercentrality need not rank individuals in the same way. One captures how a given person is influenced by others, while the other captures the overall influence of a person on the society’s activity. It can be that the most active criminal is not the most effective target for reduction of overall crime. This potential difference derives from the complicated set of externalities implicit in peer-effect settings, and the fact that how a person is influenced is not the same as how influential that person is on others.

[Figure 29.2  A bridge network with 11 agents.]

To illustrate this finding, consider the network of 11 criminals shown in Figure 29.2. Let us determine who the most active criminals and the key players are. Actually, these depend on a factor 0 < δ < 1 that discounts how influence decays with distance (e.g., how much influence do my friends’ actions have on my activity?). Similar to the connections model discussed earlier, the influence of people at a distance k from an individual on that individual is proportional to δ^k, so that criminal 4’s influence on criminal 11 is proportional to δ^3. The calculations of Katz-Bonacich centrality and intercentrality both make use of this discount factor. It turns out that, whatever the value of δ, individuals 2, 6, 7, and 11 display the highest Katz-Bonacich centrality. These are the individuals who have the highest number of direct connections and, importantly, they are directly connected to the “bridge individual” 1, which gives them relatively close access to a large range of indirect connections. Altogether, they are the most central criminals and thus, in equilibrium, those who engage most heavily in crime.

For small values of δ (e.g., δ = 0.1), the key player is also the most active player (i.e., an individual of the same type as criminal 2), but for higher values of δ (e.g., δ ≥ 0.2), the key player is criminal 1, the bridge player. Even though that person does not have the highest Katz-Bonacich centrality, removing that person from the network would reduce crime the most, since he or she is the most essential in transmitting behavior across the network. Indeed, because indirect effects are pronounced, eliminating criminal 1 has the highest joint direct and indirect effect on aggregate crime reduction. This is only true when there are enough externalities from removing criminal 1. When δ is small enough, indirect effects are small and direct links drive behavior. As a result, criminal 2, who has more direct links than criminal 1, becomes the key player.6

Generally, in networked peer-effects and games on network settings, whether the behaviors are larger or smaller than socially optimal depends on two aspects.
The first is whether behaviors are complements (e.g., studying, criminal behavior, cheating) or substitutes (e.g., local public goods, vaccinations), as that dictates whether behaviors are amplified by the behavior of others or muted by the behavior of others. The second is whether the externalities are positive or negative. That is, do the behaviors of others impact a given person positively or negatively? It is important to emphasize that this is very different from whether behaviors are complements or substitutes. The fact that behavior is a strategic complement in the case of both education and crime leads to similar effects in terms of their feedback but very different policy implications. In the case of studying, there is suboptimal behavior. By subsidizing behavior, or making it easier for students to study, a government could increase social welfare. In contrast, in the case of criminal behavior it is exactly the opposite: even though the game is still one of complements, the overall external effects are negative, because criminal behavior ultimately harms others. The literature in this area has been successful in outlining the impact of externalities in games of both complements and substitutes and how these apply to a variety of settings (again, see the survey by Jackson & Zenou, 2015). Some related areas in which people’s decisions interact are discussed later. These are just a few of the application areas and provide some illustrations of the insights that emerge.
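The bridge-network example of Figure 29.2 can be checked numerically. The sketch below is illustrative only and rests on two assumptions that the text does not spell out: the edge list (two fully connected groups, {2, ..., 6} and {7, ..., 11}, joined through agent 1, who is linked to 2, 6, 7, and 11), and the usual closed form for intercentrality, b_i^2 / m_ii, where M = (I - δG)^(-1) and b = M·1 is the vector of Katz-Bonacich centralities. The function names are hypothetical. Note that M is well defined only when δ is below the reciprocal of the largest eigenvalue of G, which is roughly 0.227 for this assumed network.

```python
import numpy as np

def bridge_network():
    """Adjacency matrix for the ASSUMED edge list of the 11-agent bridge network."""
    G = np.zeros((11, 11))

    def link(i, j):  # agents are labeled 1..11
        G[i - 1, j - 1] = G[j - 1, i - 1] = 1.0

    for members in (range(2, 7), range(7, 12)):  # two fully connected groups
        for a in members:
            for b in members:
                if a < b:
                    link(a, b)
    for j in (2, 6, 7, 11):  # agent 1 bridges the two groups
        link(1, j)
    return G

def katz_bonacich(G, delta):
    """Return (b, M) with M = (I - delta*G)^(-1) and b = M @ 1."""
    if delta * np.linalg.eigvalsh(G).max() >= 1:
        raise ValueError("delta must be below 1 / (largest eigenvalue of G)")
    M = np.linalg.inv(np.eye(len(G)) - delta * G)
    return M.sum(axis=1), M

def intercentrality(G, delta):
    """Assumed closed form: c_i = b_i^2 / m_ii."""
    b, M = katz_bonacich(G, delta)
    return b ** 2 / np.diag(M)

def top_agents(values):
    """1-based labels of all agents tied for the maximum."""
    return [int(i) + 1 for i in np.flatnonzero(np.isclose(values, values.max()))]

def most_active(G, delta):
    return top_agents(katz_bonacich(G, delta)[0])

def key_players(G, delta):
    return top_agents(intercentrality(G, delta))
```

Under these assumptions, `most_active` returns agents 2, 6, 7, and 11 at both δ = 0.1 and δ = 0.2, while `key_players` returns the same four agents at δ = 0.1 and agent 1 at δ = 0.2, in line with the discussion above.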

Financial Networks

Financial markets are a setting in which networks of relationships play a central role, and in which there are relevant and timely problems for which network analysis is essential. Systemic risk in financial settings is a network phenomenon. There are many dimensions on which network externalities drive financial fragilities, and here we mention two of them. One relates to panics and bank runs, and the other relates to counterparty risk, which has to do with the chance that a business partner will default, which is driven by his or her other investments. The role of networks in financial panics is well illustrated by a fascinating historical study of Kelly and Ó Gráda (2000), who look at the behavior of Irish depositors in a New York bank during two bank runs in the 1850s. Because the depositors were recent immigrants, their place of origin in Ireland largely determined where they settled in New York, and thus their social networks. During both panics this social network was the prime determinant of if and when a depositor pulled money from the bank. In those times, even some financial market information was transmitted largely by word of mouth, and so the panic spread according to classic diffusion patterns, with basic insights applying from the classic contagion literature. Although classic network analysis applies to such word-of-mouth panic spreading, there are additional subtleties that drive systemic risk in financial networks. There is a rapidly growing literature on this subject that merges classic network analysis with details from financial settings (e.g., see Jackson & Pernoud, 2020, for a survey). The starting point is the obvious one that the health of a financial institution derives, in part, from the health of other financial institutions. Links, representing contracts and liabilities between institutions, allow for the possibility of contagion, in that if one institution becomes insolvent, this can cause otherwise healthy partner institutions to become unhealthy. This can cascade and have a large effect on the economy. Even a relatively small shock to the system can become amplified and have profound effects on the performance of a whole economy.
To understand this issue, consider banks in different regions (e.g., as in Allen & Gale, 2000) that hold part of their deposits in other regions for some diversification. In the case of a banking crisis in one region, the claims of other banks on this region lose value and can cause a banking crisis in the adjacent regions. The crisis can gain momentum as losses in those adjacent regions spread in a contagion to more regions. Here, however, there is an important tradeoff that makes such contagions different from the standard ones that we are used to from epidemiology and from marketing studies of diffusion. In financial networks there are countervailing effects. As financial institutions and/or regions become more interconnected or globalized, the possible channels of contagion multiply. On the other hand, as the institutions or regions become more interconnected, they also become more diversified and less sensitive to shocks to any particular partner. Which of these effects dominates depends on the specific context. As long as the magnitude of shocks affecting financial institutions is sufficiently small, denser networks enhance the stability of the system (Allen & Gale, 2000; Freixas, Parigi, & Rochet, 2000). However, as shocks become larger, such interconnections serve as a mechanism for propagation of shocks and lead to a more fragile financial system. Which sorts of shocks matter depends on both the topology of the network and the size of the contracts that are present in each link (Acemoglu, Ozdaglar, & Tahbaz-Salehi, 2015; Elliott, Golub, & Jackson, 2014). This is important because it helps us understand the 2007–2008 financial crisis, in which there was a relatively large shock to the system (a crisis in the mortgage market). The well-connected network of swaps and other contracts led financial distress to spread widely.
This led too-big-to-fail considerations to evolve into too-interconnected-to-fail concerns and increased attention to the understanding of how networks matter in financial contagions. The full extent of the network of relationships was underestimated, as were the correlation in the investments and how far the mortgage crisis could reach. Here, again, externalities play a central role. One bank that has a financial contract with another cannot control what risks the second bank takes with the rest of its portfolio and may not even be able to assess that counterparty risk. The second bank is making investments based on its own tolerance for risk and may not pay attention to the consequences it will have for the larger economy if it is forced to default (e.g., see Jackson & Pernoud, 2019, 2020). The managers of Lehman Brothers were not necessarily paying attention to the future costs to shareholders of other institutions and to taxpayers that their bankruptcy would trigger, as hundreds of billions of dollars needed to be pumped into the system to prevent a major contagion. Understanding such financial networks and systemic risks is an important area of active research.
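The tradeoff between diversification and contagion described above can be illustrated with a deliberately crude toy model. It is not the model of Allen and Gale (2000) or of Acemoglu, Ozdaglar, and Tahbaz-Salehi (2015); the capital buffer, the size of interbank liabilities, the pass-through rule, and all function names are assumptions made for this sketch. A shock hits one bank; any bank whose losses exceed its capital defaults and passes the excess (capped at its interbank liabilities) to its creditors in proportion to their claims.

```python
import numpy as np

def complete_network(n):
    """Each bank's interbank liabilities are held equally by all other banks."""
    return (np.ones((n, n)) - np.eye(n)) / (n - 1)

def paired_network(n):
    """Banks form isolated pairs; each bank's only creditor is its partner."""
    W = np.zeros((n, n))
    for i in range(0, n, 2):
        W[i, i + 1] = W[i + 1, i] = 1.0
    return W

def count_defaults(W, shock, capital=1.0, exposure=10.0, rounds=500):
    """Count banks whose losses end up above their capital buffer.

    W[j, i] is the share of bank j's interbank liabilities held by bank i.
    All parameter values are illustrative assumptions.
    """
    direct = np.zeros(len(W))
    direct[0] = shock  # the initial shock hits bank 0 only
    losses = direct.copy()
    for _ in range(rounds):
        # A bank passes on the part of its loss that exceeds its capital,
        # capped at the size of its interbank liabilities.
        passed = np.clip(losses - capital, 0.0, exposure)
        losses = direct + W.T @ passed
    return int(np.sum(losses > capital))
```

With six banks and these assumed parameters, a small shock of 3 produces one default in the complete network but two in the paired network, because the complete network spreads the loss thinly. A large shock of 50 reverses the ranking: all six banks default in the complete network, while the damage stays confined to one pair in the sparser network.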

Social Learning

Much of what we understand about the world is learned from observing others. These inferences depend on what others know about the world and also potentially on what decisions they make. We rely heavily on others’ reviews of products and we make inferences about quality from the popularity of goods and services. These outcomes then involve externalities, as others’ decisions influence our inferences and decisions, and ultimately our welfare. While this simple observation has always been relevant, it is becoming increasingly so as more information about others’ likes and dislikes is communicated via online systems. There is a vast literature on social learning spanning a number of disciplines, but a couple of key papers are very useful in pointing out the externalities in such settings and their potential impact: Banerjee (1992) and Bikhchandani, Hirshleifer, and Welch (1992). These seminal papers study a simple framework within which individuals have privately held information about the quality of some goods (e.g., the quality of a restaurant). While this information cannot be directly communicated to others, individuals sequentially take publicly observed actions based, in part, on their private information (e.g., which restaurant to patronize). If properly aggregated, the private information would reveal the quality of the good almost perfectly in a large population, but the externalities are such that this social learning may fail. A central message of these papers was to emphasize that, under certain conditions, herds are inevitable: after some point, all individuals take the same action, even if it is an inefficient action. The basic mechanism is fairly simple, so we illustrate the intuition and refer the reader to the papers for more details. Consider people deciding whether to buy a new product or not. Consider a very simple case in which everyone would actually agree that the product is not worth buying if they had a chance to try it out.
People look at the product and get some impression of it and based on these impressions have some information about its quality. The key assumption is that any one person could be wrong, but on average he or she would be right: if we pooled the population, the majority of people would correctly infer that the product is not a good one. However, people move one by one. Consider a case in which the first person has a good impression and buys the product. Suppose, by chance, that the second person also happens to have a good impression and buys the product. Now consider the third person. Even if she has a bad impression, she sees that the first two people bought the product and so infers that they both had good impressions. She then realizes that the majority of observable impressions are positive and so she buys the product, regardless of her own impression. The fourth person therefore learns nothing about the third person’s impression and thus faces the same decision problem as the third person and also ignores his own impression. A herd forms and everyone buys the product even though it is a dud. Clearly this herding effect is the result of a very stylized model, and whether it holds or not depends on details of the setting (e.g., see Smith & Sorensen, 2000), including the quality of information, heterogeneity of the population, whether people are altruistic (and, for instance, who posts reviews online), and who sees whom. These models are set up to highlight the role of informational externalities. The payoff to each individual depends only on his or her own decision and the quality of the product, and not on the action taken by anyone else. However, people’s actions are still important for others as they carry information. People do not account for the informational content of their actions for others when making their decisions. If the third and fourth people in our example’s queue had followed their own impressions, it would have given more information to the subsequent society. Instead, the selfish act of following the herd, while individually rational, imposes an informational externality on subsequent decision makers, since they suffer from not being able to learn from many of the actions before them. Again, although this model is stylized, it is clear that people are influenced by others when making decisions. For example, in their classic study of the 1940 Roosevelt versus Willkie presidential campaign, Lazarsfeld, Berelson, and Gaudet (1944) found that voters were more influenced by friends and colleagues than by the mass media.
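The herding argument above can be traced step by step with a simple counting rule. This is a minimal sketch rather than the full Bayesian model of Banerjee (1992) or Bikhchandani, Hirshleifer, and Welch (1992): it assumes that all impressions are equally informative, that each person sides with the majority of the impressions he or she can infer plus his or her own, and that ties are broken in favor of one’s own impression. The function name and the tie-breaking rule are assumptions.

```python
def simulate_herd(signals):
    """Return purchase decisions for people who move one by one.

    signals: iterable of booleans, True for a good private impression.
    """
    lead = 0        # inferred good impressions minus inferred bad impressions
    decisions = []
    for good in signals:
        if lead >= 2:
            buy = True      # own impression cannot overturn the majority
        elif lead <= -2:
            buy = False
        else:
            buy = good      # the action follows, and so reveals, the impression
            lead += 1 if good else -1
        decisions.append(buy)
    return decisions
```

For example, `simulate_herd([True, True] + [False] * 8)` returns ten purchases: after two early buyers with good impressions, the eight people with bad impressions all buy, and none of their impressions is ever revealed to those who follow.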
Given the widespread applications and the fact that the externalities become even more complex in richer settings, the literature on social learning has expanded in many directions. Social learning can fail in many ways, and this has led to many models of social learning in networks, in which the structure of who observes the actions of others, and how their actions are sequenced, is more elaborate. There are, in fact, two main paradigms in the literature, often referred to as rational (Bayesian) learning and naïve (boundedly rational) learning, respectively. The main idea behind the Bayesian learning model is that agents are fully rational, in the sense that they accurately process any information that becomes available to them, either through communication or through the observation of the actions and the payoffs of their peers. As (Bayesian) consistency would suggest, under some regularity conditions, agents in large networks will converge to the same beliefs and/or actions, but not always to the optimal ones given the externalities (e.g., see Acemoglu et al., 2010; Bala & Goyal, 1998; Golub & Sadler, 2016; Molavi et al., 2016; Mossel, Neeman, & Tamuz, 2014; Mossel, Sly, & Tamuz, 2015; Tahbaz-Salehi & Jadbabaie, 2008). In the naïve (boundedly rational) learning model, introduced by DeGroot (1974), agents start with some initial beliefs and then update their beliefs by taking a weighted average of their neighbors’ beliefs. Agents are boundedly rational in the sense that an agent’s new belief is just the (weighted) average of his or her neighbors’ beliefs from the previous period, which, for example, can lead to “overcounting” information from some individuals, relative to Bayesian updating. Over time, provided that the network is strongly connected (so there is a directed path from any agent to any other) and satisfies a weak aperiodicity condition, beliefs converge to a consensus.
It can be shown (Golub & Jackson, 2010) that society can learn efficiently only if the influence of prominent individuals goes to zero as the network grows.
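The naïve updating rule lends itself to a short numerical illustration. In the sketch below, the three-agent trust matrix `T` and the function names are hypothetical; each row of `T` gives the weights one agent places on everyone’s current beliefs (including his or her own) and sums to one. The consensus is a weighted average of initial beliefs, with weights given by the left unit eigenvector of `T`, which is one way to see how a heavily weighted individual’s information can be overcounted.

```python
import numpy as np

def degroot(T, beliefs, steps=200):
    """Iterate b(t+1) = T @ b(t) for a row-stochastic trust matrix T."""
    b = np.asarray(beliefs, dtype=float)
    for _ in range(steps):
        b = T @ b
    return b

def influence_weights(T):
    """Left unit eigenvector of T: each agent's weight in the consensus."""
    vals, vecs = np.linalg.eig(T.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

# Hypothetical strongly connected, aperiodic network in which the middle
# agent is listened to by both of the others.
T = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

consensus = degroot(T, [1.0, 0.0, 0.0])  # every entry approaches 0.25
weights = influence_weights(T)           # approximately [0.25, 0.5, 0.25]
```

Here the middle agent carries half of the total influence even though he or she holds only one of three signals. In a growing society, weights of this kind would have to shrink toward zero for the consensus to aggregate information accurately.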

The Bayesian approach hinges on the assumption that agents possess the mental capacity to optimally extract and aggregate information in the aforementioned way, or at least along similar lines. Although in some cases this assumption may be plausible, empirical evidence suggests that even in very small and simple network structures people are far from rational in their processing of others’ information (Choi, Gale, & Kariv, 2008). In fact, observations from recent field experiments (Chandrasekhar, Larreguy, & Xandri, 2020; Grimm & Mengel, 2015; Mobius, Phan, & Szeidl, 2014) are compatible with the assumption that subjects exhibit non-Bayesian behavior that is more consistent with the naïve (boundedly rational) learning model. There are several ways in which externalities manifest themselves here; let us mention a few of the ways in which externalities lead learning to fail. First, as we have already seen, it may be that agents are not taking into account the information value of their actions to others. Also, it can be that agents don’t experiment as much as would be optimal. For instance, consider our example of buying the new product earlier, but suppose that people also see previous people’s satisfaction, and not only their decisions: for instance, seeing reviews of previous purchasers. Here it can be that too few people try the product for society to end up learning the product’s quality. Even if it is not in some particular person’s interest to try the product because the current information suggests that the product is not worthwhile, it could be best for the overall society to have that person experiment and try the product to get better information and be more confident that the product is not worthwhile.
This follows because, if the product turned out to be good, it would generate positive value for many other people, and so society would like to see more experimentation than any given potential consumer would. Another possible failure in social learning comes not from the particular decisions of the agents, but from the structure of the network. For instance, it might be that people pay too much attention to just a few people (Bala & Goyal, 2000; Golub & Jackson, 2010) or have excessive homophily (Golub & Jackson, 2012). Indeed, homophily (the tendency of individuals to associate with individuals with similar characteristics; see McPherson, Smith-Lovin, & Cook, 2001) can reduce the speed of social learning: opinions may converge quickly within a group but may move slowly across groups. Individuals may prefer to form ties with similar others, but this could be harmful to the overall ability of society to process information.

Labor Markets

A wide set of studies documents that the filling of most jobs involves, at least in part, personal contacts (e.g., see Bayer, Ross, & Topa, 2008; Brown, Setren, & Topa, 2016; Granovetter, 1973, 1974; Holzer, 1988; Montgomery, 1991; Ioannides & Datcher-Loury, 2004; Kramarz & Nordström Skans, 2014; Pellizzari, 2010; Rees, 1966; Rees & Shultz, 1970; Topa, 2011; Wahba & Zenou, 2005). Indeed, networks of personal contacts are critical conduits of information about employment opportunities, which flow via word of mouth and, in many cases, constitute a valid alternative source of employment information to more formal methods. Networks of contacts have the advantage that they can be relatively less costly to use to identify appropriate workers than sorting through endless numbers of applications, and they can provide more reliable information about jobs and the workers, and hence lead to better matches, than other methods.

Given that networks place constraints on who hears about job opportunities, the classical “frictionless” supply-and-demand analyses of labor markets have the potential to be quite misleading. Rather, which firms hire which workers, and how those workers are compensated, depends in important ways on how information about job vacancies is passed through society. Thus, there is an immediate sense that being “well connected” confers advantageous opportunities to learn about attractive jobs and improves an individual’s chances of employment, the quality of his or her jobs, and his or her compensation. Indeed, such an intuition is borne out in models that account for word-of-mouth learning about job opportunities. Better-connected individuals have shorter periods of unemployment and higher wages on average. In fact, modeling such a setting, Calvó-Armengol and Jackson (2004, 2007) explore wider implications of social networks in labor markets. For example, they show that if there is some homophily or segregation in a society and one begins with any disparities in employment across segments of the population, then those disparities can be amplified and become persistent. If one’s social contacts are poorly employed, then they are a less reliable source of information for several reasons, including the fact that they are less likely to hear about job opportunities at their employer (since they frequently have none) and also they become competitors for some job opportunities that do become available. Thus, people who have better-employed friends and relatives have greater incentives to stay in the labor market and to invest in education and human capital. Given this prediction, if there are segments of the population that are poorly employed and others that are well employed, those disparities can be robust and persistent.
In terms of social well-being, to the extent one is concerned with reducing inequality (at least of opportunities if not also of the outcomes themselves), an important message is that policies aimed at distorting labor market outcomes must account for the correlations and persistence in labor markets, and associated education decisions, induced by job contact networks. In this framework, externalities are again important in understanding the conclusions. For instance, when one worker drops out of the labor force, he or she imposes a negative externality on friends and relatives, who could have benefited from having an (employed) friend as a contact. These frictions further lead to mismatches in the labor market: there is no guarantee that the best worker for a given job will be matched to that job, since the optimal worker may never learn about the opening or acquire the skills needed (e.g., see Bolte, Immorlica, & Jackson, 2020). Recent empirical work examines a variety of aspects of the distortions in labor markets (see, e.g., Beaman, 2016, for an overview).

Development Economics

When formal institutions do not exist or do not work very well, informal, network-based relationships serve as substitutes. For example, when insurance markets are missing and access to banking is limited, shocks to incomes are often smoothed via borrowing from family and friends (see, e.g., Fafchamps & Lund, 2003; Kinnan & Townsend, 2012). Further, new technology adoption is critical for growth and development (see, e.g., Alvarez, Buera, & Lucas, 2013; Perla & Tonetti, 2014), and social relationships are the means by which many individuals learn about, and are then convinced to adopt, new technologies. These and other vital functions of social networks have made them a main focus of development studies.7

For example, the adoption of new technologies and programs has been the focus of many studies (e.g., Banerjee et al., 2013, 2019; Beaman et al., 2016; Conley & Udry, 2010; Duflo & Saez, 2003; Foster & Rosenzweig, 1995). Banerjee et al. (2013) use network data to analyze the diffusion of microfinance in a set of rural Indian villages. One of their key questions is: how do the social networks in a village affect the diffusion of microfinance loans? They develop a model of information diffusion through a social network that discriminates between information passing (individuals must be aware of the product before they can adopt it, and they are made aware by their friends) and endorsement (the decisions of informed individuals to adopt the product might be influenced by their friends’ decisions). Their findings suggest that households who participate in microfinance are much more likely to inform their friends that microfinance is available than households who choose not to participate. Thus, households who have a high fraction of friends who are participating are much more likely to hear about microfinance than those with a low fraction, all else held constant. Their analysis suggests that much of the peer interaction in this setting involves people making each other aware of microfinance and that peer influences beyond that play an insignificant role in participation decisions. Moreover, they trace the wide variation in adoption across villages to the centrality of the first-informed households in the villages and show that how one measures centrality is essential to predicting diffusion. Beaman et al. (2016) study a similar issue via a carefully designed randomized controlled trial of a program that incentivizes farmers to adopt a productive new agricultural technology in 200 villages in Malawi. Their aim is to identify seed farmers to train to use the new technology.
They find that some farmers need to learn about the technology from multiple people before they adopt it themselves. Thus, in that context, simply changing the patterns of who is trained in a village on a technology on the basis of social network theory can increase the adoption of new technologies compared to the ministry’s existing extension strategy.8 Failures of diffusion in some of these settings relate back to the externalities that we discussed in the context of social learning. Here, the choice of whether to adopt a new technology by one household ends up affecting both whether their neighboring households become aware of the technology and whether those neighbors become convinced to adopt it. Diffusion can fail for both of these reasons, and these studies together show that it can be vital to distinguish and account for both effects as they differ across products. In some cases a new technology is easy to understand and clearly beneficial and so failures in diffusion come from network structures and failures in the spread of information, while in others additional learning is required and additional externalities are present as people learn from the experiences and behaviors of others.

Exchange Theory, Bargaining, and Trade on Networks

Another obvious area in which network structure impacts outcomes, and in which people exercising power in key network positions impact the terms of trade that others obtain, is that of the exchange of goods and services. Such trade, along with the associated bargaining processes, rarely occurs in the context of a centralized market; much more often it occurs via a series of bilateral agreements. When individuals who are critical to certain transactions (e.g., see Burt, 1992) exercise their bargaining power to obtain favorable terms, this can lead to substantial frictions in trade and to inefficient transactions and allocations in the system. Early experiments, such as those by Cook and Emerson (1978) and Bienenstock and Bonacich (1993), show how position in a network affects which bargains are struck and how gains from trade are distributed across a network (see Cook & Cheshire, 2013, for a recent survey). Game theory has played an important role in such analyses, including early studies by Aumann and Myerson (1988) and Bonacich and Bienenstock (1993), and has driven the rapidly expanding literature that spans theory and experiments. Indeed, in many markets a buyer and a seller must make an investment before they can (feasibly or profitably) trade. This may involve setting up accounts, opening channels of communication, making personal contacts, or making goods and systems compatible. This is true of a variety of markets, from the production of clothing (e.g., Uzzi, 1996) to the trading of securities (e.g., Wang, 2016). The market is networked due to the constraints imposed by the (lack of) investments. The network structure of possible trading pairs then becomes relevant because it determines the outside options of a given buyer and a seller bargaining over a trade surplus. For example, a buyer who has invested with only one seller may have little bargaining power in that relationship, since he or she has no alternative partners with which to profitably trade. Kranton and Minehart (2000); Jackson (2003); Corominas-Bosch (2004); Charness, Corominas-Bosch, and Frechette (2005); Blume et al. (2009); Manea (2011); Elliott (2015); Nava (2015); Condorelli, Galeotti, and Renou (2016); and Wang (2016), among others, analyze various market structures and models.
For example, in addition to pure buyer-seller networks, trade can more generally involve intermediaries, dealers, brokers, and retailers, whose motives to trade are pecuniary rather than deriving from final consumption of the traded goods. The presence of intermediate traders raises a set of questions: How do different network structures affect the efficiency of trading outcomes? How does the position of a trader in a network affect his or her profit? (For overviews, see Cook & Cheshire, 2013; Condorelli & Galeotti, 2016; and Manea, 2016.) There is also a large literature on inefficient investment due to hold-up problems9 and overinvestment in outside options. For example, Elliott (2015) shows that there are inefficiencies in these investments when long-term contracts cannot be enforced. One insight that emerges here is that a given agent can have a strong incentive to make costly investments in additional trading partners, so as to increase his or her bargaining power with existing partners. But, as these incentives apply also to the other agents, it can well be that these additional investments, while costly, roughly offset each other, leading to wasteful overexpenditures in relationships that are not used in equilibrium. In the case of relationship-specific investments, a hold-up problem represents a different sort of inefficiency that can lead to underinvestment in relationships. Depending on the structure of the network, once the investment cost has been sunk, an agent may not be able to recoup that sunk cost. Anticipating this possibility of being held up, the agent may fail to invest in a relationship that would be efficient from an aggregate point of view. A further set of inefficiencies that can arise in intermediated trade derive from the decentralized nature of decision making, which can lead to miscoordination in the routing of goods or of the prices implicit in trade terms. 
For example, a given buyer may value a final product only when it can be combined with the purchase of a complementary product that he or she buys through a different supply chain. The agents along these chains may therefore be unable to perfectly anticipate final demand for their product, as it will depend on prices and routing that take place in other paths of the network. This can lead, for example, to reduced incentives to trade, even if demand would be strong in a perfectly coordinated world. The fact that an agent’s incentive to invest in a relationship is influenced by the decisions of others is a manifestation of externalities. As noted, it is not generally clear-cut if such externalities will lead to a net under- or overinvestment in relationships. But what is generally true, outside of a few highly special circumstances, is that there tends to be an inefficient set of transactions, relative to that which would maximize total value across society.

Empirical Analyses of Network Models

When analyzing the causal determinants of behavior, where behavior is potentially influenced both by peers (connected agents) and by personal characteristics, particular care must be taken. A broad observation relevant to any empirical analysis can already be seen in our exposition. In the second section, we discussed the modeling of network formation, and in the third section, we discussed the modeling of behaviors across a given network. In most applications, both of these dimensions are simultaneously at work: the network itself and the behaviors on that network are jointly determined by the people being studied. That is, both the network and behavior should be treated as endogenous observations, and so understanding which decisions have causal impacts on other decisions is challenging. As friends often have similar characteristics, they are predisposed to make similar decisions, even absent any direct causal effects of one friend’s decision on another’s. It is quite likely that only some of the characteristics that are relevant to the friendship decision will be observable to the analyst. For example, two students may be friends in part because they have well-educated and affluent parents (observables), but it may also be that their friendship reflects compatible personalities or attitudes toward study beyond those instilled by their parents (unobservables). As a result, if the individuals choose to become friends, and their behaviors on some dimension are correlated (e.g., both have good grades), it may not be possible to sort out the explanation. Is it because they share the same observable characteristics (educated parents) that they take similar actions? Is it that they face a common favorable environment (e.g., a good school and good teachers)?
Or is it because the individuals had similar unobservable characteristics (attitudes toward study) that both caused them to be likely to form a friendship and, in parallel, caused them to behave similarly? Disentangling such possible explanations is a major challenge that is common to empirical work dealing with the effects of social networks (e.g., see Aral & Walker, 2012). Some such studies, for instance, in the enormous literature on peer effects in education, take advantage of random assignments of students to classes or as roommates (see, e.g., Algan et al., 2015; Carrell, Sacerdote, & West, 2013; Sacerdote, 2001). However, in many settings, we do not have the luxury of controlling the network or having it randomly assigned. Moreover, overcoming these challenges is crucial to our understanding of how networks influence (and are influenced by) behavior.

One obstacle to empirically identifying peer and social network effects is what Manski (1993) termed the reflection problem. If one assumes a standard linear formulation in which a person’s behavior, for instance, his or her study habits, is influenced by the average of peers’ behaviors (their study habits) and his or her background (parents’ education level) and peers’ backgrounds (their parents’ education level), then the impact of peers’ behavior on an individual’s behavior cannot be distinguished from the impact of peers’ background characteristics on the individual’s behavior. This problem applies most strongly when individuals are connected in groups (e.g., students into classrooms). If, instead, one has network data such that the peers who matter for some individuals differ from those who matter for some others in a sufficiently rich way, then it is possible to identify the impact of peers’ behaviors, as shown by Bramoullé, Djebbari, and Fortin (2009); Calvó-Armengol, Patacchini, and Zenou (2009); Lee, Liu, and Lin (2010); Liu and Lee (2010); Liu et al. (2012); and others.10

As mentioned, a different obstacle to identifying peer influences is that similar behaviors may be driven by a common factor that is not observed by the analyst. Consider, for example, a teen’s decision to use illegal drugs. Is it because the teen’s friend initiated drug use, or is it due to some common influence, such as a substance-abusing parent or teacher who caused both children to adopt the same behavior? The distinction between these explanations is important for policy purposes. When peer contagion effects operate, intervening to alter one child’s behavior may have the additional effect of changing several other children’s behavior. In other cases, such as when children initiate substance use because adults in their neighborhood provide opportunities to do so, these so-called multiplier effects may not exist or may involve other factors (Brock & Durlauf, 2001). Correctly distinguishing endogenous from exogenous social effects is necessary for any effort to gauge the true net impact or social benefits of any behavioral intervention. Bramoullé et al. (2009) show how asymmetries in a peer network can be sufficient to ensure that endogenous and exogenous peer effects can be effectively identified.
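The role of network asymmetry can be illustrated with a small numerical check. The sketch below assumes the linear formulation is written in matrix form as y = α1 + βGy + γx + δGx + ε, where G is the row-normalized adjacency matrix; this notation is introduced here for illustration and does not appear in the chapter. Under that assumption, the condition usually associated with Bramoullé et al. (2009) is that the matrices I, G, and G² be linearly independent (together with βγ + δ ≠ 0). The function names and the two toy networks are hypothetical.

```python
import numpy as np


def row_normalize(adj):
    """Turn a 0/1 adjacency matrix into a peer-averaging matrix G."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    # Rows with no peers stay all-zero instead of dividing by zero.
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)


def identity_g_g2_rank(adj):
    """Rank of the set {I, G, G^2}, each flattened to a vector."""
    g = row_normalize(adj)
    n = g.shape[0]
    stacked = np.column_stack([np.eye(n).ravel(), g.ravel(), (g @ g).ravel()])
    return int(np.linalg.matrix_rank(stacked))


# Assumed toy case 1: two classrooms of three students, everyone linked
# to everyone else in the same classroom (pure group structure).
classroom = np.ones((3, 3)) - np.eye(3)
groups = np.block([[classroom, np.zeros((3, 3))],
                   [np.zeros((3, 3)), classroom]])

# Assumed toy case 2: a four-person line 0-1-2-3, so person 2 is a
# friend of a friend of person 0 without being person 0's friend.
line = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])

groups_rank = identity_g_g2_rank(groups)  # G^2 is a mix of I and G
line_rank = identity_g_g2_rank(line)      # G^2 adds new information
```

In the classroom case, friends of friends are the same people as friends (here G² = ½I + ½G), so the rank is 2 and the reflection problem bites. In the line network, G² reaches people who are not direct friends, the rank is 3, and their characteristics can in principle serve as a source of identifying variation.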
This brings us to the third major challenge outlined previously: the endogeneity of networks.11 To address this, one either has to have some fortuitous exogeneity in the network (Algan et al., 2015; Beaman, 2012; Carrell et al., 2013; Lindquist, Sauermann, & Zenou, 2015; Munshi, 2003) or has to model the network formation directly. Recent progress on this front includes a series of models of network formation that can be taken to data (e.g., Badev, 2015; Chandrasekhar & Jackson, 2018; Christakis et al., 2010; Currarini, Jackson, & Pin, 2009, 2010; De Paula et al., 2015; Goldsmith-Pinkham & Imbens, 2013; Graham, 2016; Jackson & Rogers, 2007; Leung, 2015; Mele, 2013; Patacchini, Rainone, & Zenou, 2016), as well as some approaches using instrumental variables (Bifulco, Fletcher, & Ross, 2011; Patacchini & Zenou, 2016).12
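To see why endogenous network formation matters, consider a simulation in which there is, by construction, no peer effect at all. The sketch below is purely illustrative: the sample size, the link rule (befriend anyone with a sufficiently similar unobserved trait), the noise level, and all names are assumptions made here, not features of any study cited in this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Unobserved trait (e.g., attitude toward study); the analyst never sees it.
trait = rng.normal(size=n)

# Outcome depends only on the own trait plus noise: no peer effect by design.
outcome = trait + 0.5 * rng.normal(size=n)


def friends_mean_corr(adj, y):
    """Correlation between own outcome and the mean outcome of friends."""
    deg = adj.sum(axis=1)
    has_friends = deg > 0  # drop isolated individuals
    peer_mean = (adj @ y)[has_friends] / deg[has_friends]
    return float(np.corrcoef(y[has_friends], peer_mean)[0, 1])


# Homophilous network: link i and j when their unobserved traits are close.
gap = np.abs(trait[:, None] - trait[None, :])
homophily_adj = (gap < 0.3).astype(float)
np.fill_diagonal(homophily_adj, 0.0)

# Benchmark: links assigned at random with the same overall density.
density = homophily_adj.sum() / (n * (n - 1))
upper = np.triu(rng.random((n, n)) < density, k=1)
random_adj = (upper | upper.T).astype(float)

corr_homophily = friends_mean_corr(homophily_adj, outcome)
corr_random = friends_mean_corr(random_adj, outcome)
```

Under these assumptions, the correlation between own and friends' outcomes is large in the homophilous network and close to zero in the randomly assigned one, even though behavior is never transmitted between friends. A naive regression on friends' outcomes would misread the former as contagion, which is why one needs either exogenous variation in links or an explicit model of link formation.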

Concluding Remarks

There are many important research frontiers in social network analysis, especially at junctures between sociology, economics, anthropology, and other fields. These frontiers include building a richer understanding of the dynamics of networked relationships and behaviors, and their coevolution. Perhaps most importantly, as social network analysis increasingly makes its way into policy arenas, we need to understand potential inefficiencies both in network structure and in the behaviors that it drives. Our main emphasis in this survey has been that the concept of externalities is a unifying principle that can be incorporated into network analysis more generally, as it helps frame many networked interactions and is the defining characteristic of an interesting network. In the absence of externalities, one could simply examine bilateral relationships independently of each other and divorced from their social context. What makes the social context interesting is that a given relationship is influenced by, and influences, the more general social structure. Viewing that context through the lens of the underlying externalities focuses our attention on exactly how people’s behaviors are interrelated and what it is that drives inefficiencies and imperfections in human behaviors and societies.

Beyond the importance of externalities, there are increasingly rich network analyses involving economic applications, ranging from development economics (e.g., Banerjee et al., 2013, 2019; Breza, 2016; Cai, de Janvry, & Sadoulet, 2015; Cai & Szeidl, 2018; Feigenberg, Field, & Pande, 2013; Karlan et al., 2009, 2010; Kinnan & Townsend, 2012; Munshi & Rosenzweig, 2009), to labor economics (e.g., Montgomery, 1991; Calvó-Armengol & Jackson, 2004; Ioannides & Datcher-Loury, 2004; Beaman, 2016), to international trade (e.g., Chaney, 2016), to interstate alliances and wars (e.g., Jackson & Nei, 2015; König et al., 2017), to politics (e.g., Cohen & Malloy, 2014; Canen, Jackson, & Trebbi, 2020), and to crime (e.g., Calvó-Armengol & Zenou, 2004; Glaeser, Sacerdote, & Scheinkman, 1996; Zenou, 2016). Theoretical work is proceeding in tandem with empirical studies across these, and other, varied contexts, which makes clear that the interests of economists, sociologists, and other social scientists will continue to overlap in substantial ways.
As this research progresses, and as our data become ever more complete and integrated, it will become increasingly important to marry the insights from different scientific fields to gain a more complete understanding of social context and how it drives human behavior.

Notes

1. With respect to “rational choice,” the emphasis is on the “choice” and not on “rationality.” Many of the models, in fact, assume that people are quite myopic. The common feature is that people have some discretion in their relationships and make some sort of choices, whether they be biased, myopic, subject to systematic errors, governed by norms, or rational.
2. Indeed, in game theory, Nash equilibrium (named after John F. Nash) is the canonical solution concept of a game. It is a list of a strategy for each player such that no player has anything to gain by changing his or her own strategy (i.e., unilaterally). For a brief introduction to the basic definitions and workings of game theory, see Jackson (2011).
3. This is related to, but distinct from, a similar concept in matching theory.
4. This applies with various measures of overall social welfare, including a utilitarian measure, as well as Pareto efficiency. See Jackson (2003) for a discussion.
5. See Jackson and Wolinsky (1996), Dutta and Mutuswami (1997), Currarini and Morelli (2000), Watts (2001), and Bloch and Jackson (2007) for more discussion, and see Jackson (2003, 2008) for overviews.
6. Using data on American teenagers and Swedish adult criminals, Liu et al. (2012) and Lindquist and Zenou (2014) show, respectively, that, in the real world, 20% to 30% of the most active criminals are not the key players. For an overview on the literature on key players, see Zenou (2016).
7. For recent overviews, see Fafchamps (2011) and Breza (2016).
8. This is related to the experiments of Centola (2010, 2011), who finds that the clustering in network structures can affect the adoption of some products.
9. The hold-up problem is a situation where two parties may be able to work most efficiently by cooperating but refrain from doing so because of concerns that they may give the other party increased bargaining power, and thereby reduce their own profits. When an agent A has made a prior commitment to a relationship with agent B, the latter can “hold up” the former for the value of that commitment. The hold-up problem leads to severe economic cost and might also lead to underinvestment.
10. One can also identify the peer effects if the formulation is nonlinear, or with some other formulations, but then the identification is really dependent on the deviation from linearity.
11. Unfortunately, the sort of instrumental variable strategy described previously is valid only if the network can be treated as (conditionally) exogenous, which is not usually the case unless one has access to an appropriately controlled field experiment, not only in terms of observed characteristics but also in terms of unobserved characteristics that could influence behaviors. It could then be that behaviors of friends of friends are correlated in ways that invalidate instrumenting on friends of friends as a method to identify peer effects.
12. We refer to the literature surveys provided by Blume et al. (2011), Jackson (2014), Graham (2015), Jackson et al. (2016), Chandrasekhar (2016), and Fortin and Boucher (2016) for a detailed treatment of these econometric issues.

References

Acemoglu, D., Dahleh, M. A., Lobel, H., & Ozdaglar, A. (2010). Bayesian learning in social networks. Review of Economic Studies, 78, 1201–1236.
Acemoglu, D., Ozdaglar, A., & Tahbaz-Salehi, A. (2015). Systemic risk and stability in financial networks. American Economic Review, 105, 564–608.
Algan, Y., Dalvit, N., Do, Q.-A., Le Chapelain, A., & Zenou, Y. (2015). How social networks shape our beliefs: A natural experiment among future French politicians (Unpublished manuscript). Sciences Po.
Allen, F., & Gale, D. (2000). Financial contagion. Journal of Political Economy, 108, 1–33.
Alvarez, F. E., Buera, F. J., & Lucas, R. E., Jr. (2013). Idea flows, economic growth, and trade. National Bureau of Economic Research Working Paper No. 19667.
Aral, S., & Walker, D. (2012). Identifying influential and susceptible members of social networks. Science, 337(6092), 337–341.
Aumann, R., & Myerson, R. (1988). Endogenous formation of links between players and coalitions: An application of the Shapley value. In A. Roth (Ed.), The Shapley value (pp. 175–191). Cambridge, UK: Cambridge University Press.
Badev, A. (2015). Discrete games in endogenous networks: Theory and policy (Unpublished manuscript). Washington, DC: Federal Reserve Board.
Bailey, N. T. J. (1975). The mathematical theory of infectious diseases and its applications. London, UK: Griffin.
Bala, V., & Goyal, S. (1998). Learning from neighbors. Review of Economic Studies, 65, 595–621.
Bala, V., & Goyal, S. (2000). A non-cooperative model of network formation. Econometrica, 68, 1181–1231.
Ballester, C., Calvó-Armengol, A., & Zenou, Y. (2006). Who’s who in networks. Wanted: The key player. Econometrica, 74, 1403–1417.

Ballester, C., Calvó-Armengol, A., & Zenou, Y. (2010). Delinquent networks. Journal of the European Economic Association, 8, 34–61.
Banerjee, A. (1992). A simple model of herd behaviour. Quarterly Journal of Economics, 107, 797–817.
Banerjee, A., Chandrasekhar, A. G., Duflo, E., & Jackson, M. O. (2013). Diffusion of microfinance. Science, 341(6144), 1236498.
Banerjee, A., Chandrasekhar, A. G., Duflo, E., & Jackson, M. O. (2019). Gossip: Identifying central individuals in a social network. Review of Economic Studies, 86(6), 2453–2490.
Bayer, P., Ross, S. L., & Topa, G. (2008). Place of work and place of residence: Informal hiring networks and labor market outcomes. Journal of Political Economy, 116, 1150–1196.
Beaman, L. (2012). Social networks and the dynamics of labour market outcomes: Evidence from refugees resettled in the U.S. Review of Economic Studies, 79, 128–161.
Beaman, L. (2016). Social networks and the labor market. In Y. Bramoullé, B. W. Rogers, & A. Galeotti (Eds.), The Oxford handbook of the economics of networks (pp. 649–671). Oxford, UK: Oxford University Press.
Beaman, L., BenYishay, A., Magruder, J., & Mobarak, A. M. (2016). Can network theory-based targeting increase technology adoption? (Unpublished manuscript). Northwestern University.
Benhabib, J., Bisin, A., & Jackson, M. O. (2011). Handbook of social economics (Vols. 1A, 1B). Amsterdam, Netherlands: Elsevier.
Bienenstock, E. J., & Bonacich, P. (1993). Game-theory models for exchange networks: Experimental results. Sociological Perspectives, 36, 117–135.
Bifulco, R., Fletcher, J. M., & Ross, S. L. (2011). The effect of classmate characteristics on post-secondary outcomes: Evidence from the Add Health. American Economic Journal: Economic Policy, 3, 25–53.
Bikhchandani, S., Hirshleifer, D., & Welch, I. (1992). A theory of fads, fashion, custom, and cultural change as information cascades. Journal of Political Economy, 100, 992–1026.
Bloch, F., & Jackson, M. O. (2007). The formation of networks with transfers among players. Journal of Economic Theory, 133, 83–110.
Blume, L. E., Brock, W. A., Durlauf, S. N., & Ioannides, Y. M. (2011). Identification of social interactions. In J. Benhabib, A. Bisin, & M. O. Jackson (Eds.), Handbook of social economics. Amsterdam, Netherlands: Elsevier Science.
Blume, L. E., Easley, D., Kleinberg, J., & Tardos, E. (2009). Trading networks with price-setting agents. Games and Economic Behavior, 67, 36–50.
Bolte, L., Immorlica, N., & Jackson, M. O. (2020). The role of referrals in inequality, immobility, and inefficiency in labor markets. SSRN paper # 3512293.
Bonacich, P. (1987). Power and centrality: A family of measures. American Journal of Sociology, 92, 1170–1182.
Bonacich, P., & Bienenstock, E. J. (1993). Assignment games, chromatic number, and exchange theory. Journal of Mathematical Sociology, 17, 243–259.
Bramoullé, Y., Djebbari, H., & Fortin, B. (2009). Identification of peer effects through social networks. Journal of Econometrics, 150, 41–55.
Bramoullé, Y., Galeotti, A., & Rogers, B. (2016). The Oxford handbook of the economics of networks. Oxford, UK: Oxford University Press.
Breza, E. (2016). Field experiments, social networks, and development. In Y. Bramoullé, B. W. Rogers, & A. Galeotti (Eds.), The Oxford handbook of the economics of networks (pp. 412–439). Oxford, UK: Oxford University Press.
Brock, W., & Durlauf, S. N. (2001). Discrete choice models with social interactions. Review of Economic Studies, 68, 235–260.

Brown, M., Setren, E., & Topa, G. (2016). Do informal referrals lead to better matches? Evidence from a firm’s employee referral system. Journal of Labor Economics, 34, 161–209.
Burt, R. S. (1992). Structural holes: The social structure of competition. Cambridge, MA: Harvard University Press.
Burt, R. S. (2004). Structural holes and good ideas. American Journal of Sociology, 110, 349–399.
Cabrales, A., Calvó-Armengol, A., & Zenou, Y. (2011). Social interactions and spillovers. Games and Economic Behavior, 72, 339–360.
Cai, J., de Janvry, A., & Sadoulet, E. (2015). Social networks and the decision to insure. American Economic Journal: Applied Economics, 7(2), 81–108.
Cai, J., & Szeidl, A. (2018). Interfirm relationships and business performance. Quarterly Journal of Economics, 133(3), 1229–1282.
Calvó-Armengol, A., & Jackson, M. O. (2004). The effects of social networks on employment and inequality. American Economic Review, 94, 426–454.
Calvó-Armengol, A., & Jackson, M. O. (2007). Networks in labor markets: Wage and employment dynamics and inequality. Journal of Economic Theory, 132, 27–46.
Calvó-Armengol, A., Patacchini, E., & Zenou, Y. (2009). Peer effects and social networks in education. Review of Economic Studies, 76, 1239–1267.
Calvó-Armengol, A., & Zenou, Y. (2004). Social networks and crime decisions. The role of social structure in facilitating delinquent behavior. International Economic Review, 45, 939–958.
Canen, N., Jackson, M. O., & Trebbi, F. (2019). Endogenous networks and legislative activity. SSRN paper # 2823338.
Carayol, N., & Roux, P. (2009). Knowledge flows and the geography of networks: A strategic model of small world formation. Journal of Economic Behavior & Organization, 71, 414–427.
Carrell, S. E., Sacerdote, B. I., & West, J. E. (2013). From natural variation to optimal policy? The importance of endogenous peer group formation. Econometrica, 81, 855–882.
Centola, D. (2010). The spread of behavior in an online social network experiment. Science, 329, 1194–1197.
Centola, D. (2011). An experimental study of homophily in the adoption of health behavior. Science, 334, 1269–1272.
Chandrasekhar, A. G. (2016). Econometrics of network formation. In Y. Bramoullé, B. W. Rogers, & A. Galeotti (Eds.), The Oxford handbook of the economics of networks (pp. 303–357). Oxford, UK: Oxford University Press.
Chandrasekhar, A. G., & Jackson, M. O. (2014). Tractable and consistent random graph models. NBER Working Paper No. 20276.
Chandrasekhar, A. G., & Jackson, M. O. (2018). A network formation model based on subgraphs. http://ssrn.com/abstract=2660381
Chandrasekhar, A. G., Larreguy, H., & Xandri, J. P. (2020). Testing models of social learning on networks: Evidence from a lab experiment in the field. Econometrica, 88(1), 1–32.
Chaney, T. (2016). Networks in international trade. In Y. Bramoullé, B. W. Rogers, & A. Galeotti (Eds.), The Oxford handbook of the economics of networks (pp. 754–775). Oxford, UK: Oxford University Press.
Charness, G., Corominas-Bosch, M., & Frechette, G. R. (2005). Bargaining on networks: An experiment. Journal of Economic Theory, 136, 28–65.
Choi, S., Gale, D., & Kariv, S. (2008). Sequential equilibrium in monotone games: A theory based analysis of experimental data. Journal of Economic Theory, 143, 302–330.
Christakis, N. A., Fowler, J. H., Imbens, G. W., & Kalyanaraman, K. (2010). An empirical model for strategic network formation. NBER Working Paper No. 16039.

Cohen, L., & Malloy, C. J. (2014). Friends in high places. American Economic Journal: Economic Policy, 6, 63–91.
Condorelli, D., & Galeotti, A. (2016). Strategic models of intermediation networks. In Y. Bramoullé, B. W. Rogers, & A. Galeotti (Eds.), The Oxford handbook of the economics of networks (pp. 733–753). Oxford, UK: Oxford University Press.
Condorelli, D., Galeotti, A., & Renou, L. (2016). Bilateral trading in networks. Review of Economic Studies, 84(1), 82–105.
Conley, T., & Udry, C. (2010). Learning about a new technology: Pineapple in Ghana. American Economic Review, 100, 35–69.
Cook, K. S., & Cheshire, C. (2013). Social exchange, power and inequality in networks. In R. Wittek, T. A. B. Snijders, & V. Nee (Eds.), The handbook of rational choice social research. Stanford, CA: Stanford University Press.
Cook, K. S., & Emerson, R. M. (1978). Power, equity and commitment in exchange networks. American Sociological Review, 43, 721–739.
Corominas-Bosch, M. (2004). On two-sided network markets. Journal of Economic Theory, 115, 35–77.
Currarini, S., Jackson, M. O., & Pin, P. (2009). An economic model of friendship: Homophily, minorities, and segregation. Econometrica, 77, 1003–1045.
Currarini, S., Jackson, M. O., & Pin, P. (2010). Identifying the roles of race-based choice and chance in high school friendship network formation. Proceedings of the National Academy of Sciences, 107, 4857–4861.
Currarini, S., & Morelli, M. (2000). Network formation with sequential demands. Review of Economic Design, 5, 229–250.
De Groot, M. H. (1974). Reaching a consensus. Journal of the American Statistical Association, 69, 118–121.
De Marti, J., & Zenou, Y. (2017). Segregation in friendship networks. Scandinavian Journal of Economics, 119, 656–708.
De Paula, A., Richards-Shubik, S., & Tamer, E. (2015). Identification of preferences in network formation games. CeMMAP Working Paper 29/15.
Diekmann, O., Heesterbeek, H., & Britton, T. (2013). Mathematical tools for understanding infectious disease dynamics. Princeton, NJ: Princeton University Press.
Duflo, E., & Saez, E. (2003). The role of information and social interactions in retirement plan decisions: Evidence from a randomized experiment. Quarterly Journal of Economics, 118, 815–842.
Dutta, B., & Mutuswami, S. (1997). Stable networks. Journal of Economic Theory, 76, 322–344.
Elliott, M. L. (2015). Inefficiencies in networked markets. American Economic Journal: Microeconomics, 7, 43–82.
Elliott, M. L., Golub, B., & Jackson, M. O. (2014). Financial networks and contagion. American Economic Review, 104, 3115–3153.
Fafchamps, M. (2011). Risk sharing between households. In J. Benhabib, A. Bisin, & M. O. Jackson (Eds.), Handbook of social economics (Vol. 1A). Amsterdam, Netherlands: Elsevier.
Fafchamps, M., & Lund, S. (2003). Risk-sharing networks in rural Philippines. Journal of Development Economics, 71, 261–287.
Feigenberg, B., Field, E., & Pande, R. (2013). The economic returns to social interaction: Experimental evidence from microfinance. Review of Economic Studies, 80(4), 1459–1483.

Fortin, B., & Boucher, V. (2016). Some challenges in the empirics of the effects of networks. In Y. Bramoullé, B. W. Rogers, & A. Galeotti (Eds.), The Oxford handbook of the economics of networks (pp. 277–302). Oxford, UK: Oxford University Press.
Foster, A. D., & Rosenzweig, M. R. (1995). Learning by doing and learning from others: Human capital and technical change in agriculture. Journal of Political Economy, 103, 1176–1209.
Freeman, L. (2004). The development of social network analysis. A Study in the Sociology of Science, 1, 687.
Freixas, X., Parigi, B., & Rochet, J. (2000). Systemic risk, interbank relations and liquidity provision by the central bank. Journal of Money, Credit and Banking, 32, 611–638.
Galeotti, A., Goyal, S., & Kamphorst, J. (2006). Network formation with heterogeneous players. Games and Economic Behavior, 54, 353–372.
Glaeser, E., Sacerdote, B., & Scheinkman, J. (1996). Crime and social interactions. Quarterly Journal of Economics, 111, 507–548.
Goldsmith-Pinkham, P., & Imbens, G. W. (2013). Social networks and the identification of peer effects. Journal of Business and Economic Statistics, 31, 253–264.
Golub, B., & Jackson, M. O. (2010). Naïve learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics, 2, 112–149.
Golub, B., & Jackson, M. O. (2012). How homophily affects the speed of learning and best-response dynamics. Quarterly Journal of Economics, 127, 1287–1338.
Golub, B., & Sadler, E. (2016). Learning in social networks. In Y. Bramoullé, B. W. Rogers, & A. Galeotti (Eds.), The Oxford handbook of the economics of networks (pp. 504–542). Oxford, UK: Oxford University Press.
Goyal, S. (2007). Connections: An introduction to the economics of networks. Princeton, NJ: Princeton University Press.
Graham, B. S. (2015). Methods of identification in social networks. Annual Review of Economics, 7, 465–485.
Graham, B. S. (2016). An econometric model of link formation with degree heterogeneity (Unpublished manuscript). University of California, Berkeley.
Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 78, 1360–1380.
Granovetter, M. S. (1974). Getting a job. Cambridge, MA: Harvard University Press.
Granovetter, M. S. (1978). Threshold models of collective behavior. American Journal of Sociology, 83, 489–515.
Grimm, V., & Mengel, F. (2015). An experiment on belief formation in networks (Unpublished manuscript). University of Essex.
Holzer, H. J. (1988). Search method use by the unemployed youth. Journal of Labor Economics, 6, 1–20.
Ioannides, Y. M., & Datcher-Loury, L. (2004). Job information networks, neighborhood effects and inequality. Journal of Economic Literature, 42(4), 1056–1093.
Jackson, M. O. (2003). The stability and efficiency of economic and social networks. In B. Dutta & M. O. Jackson (Eds.), Networks and groups: Models of strategic formation. Heidelberg, Germany: Springer-Verlag.
Jackson, M. O. (2008). Social and economic networks. Princeton, NJ: Princeton University Press.
Jackson, M. O. (2011). A brief introduction to the basics of game theory. SSRN paper 1968579. http://ssrn.com/abstract=1968579.

Jackson, M. O. (2014). Networks in the understanding of economic behaviors. Journal of Economic Perspectives, 28, 3–22.
Jackson, M. O. (2019). The human network. New York, NY: Pantheon Press.
Jackson, M. O., & Lopez-Pintado, D. (2013). Diffusion and contagion in networks with heterogeneous agents and homophily. Network Science, 1, 49–67.
Jackson, M. O., & Nei, S. (2015). Networks of military alliances, wars, and international trade. Proceedings of the National Academy of Sciences, 112, 15277–15284.
Jackson, M. O., & Pernoud, A. (2019). Distorted investment incentives, regulation, and equilibrium multiplicity in a model of financial networks. SSRN paper # 3311839.
Jackson, M. O., & Pernoud, A. (2020). Systemic risk in financial networks: A survey. Mimeo, Stanford University. SSRN paper # 3651864.
Jackson, M. O., & Rogers, B. (2005). The economics of small worlds. Journal of the European Economic Association, 3, 617–627.
Jackson, M. O., & Rogers, B. (2007). Meeting strangers and friends of friends: How random are socially generated networks? American Economic Review, 97, 890–915.
Jackson, M. O., Rogers, B., & Zenou, Y. (2017). The economic consequences of social network structure. Journal of Economic Literature, 55, 49–95.
Jackson, M. O., & Watts, A. (2002). The evolution of social and economic networks. Journal of Economic Theory, 106(2), 265–295.
Jackson, M. O., & Wolinsky, A. (1996). A strategic model of social and economic networks. Journal of Economic Theory, 71, 44–74.
Jackson, M. O., & Yariv, L. (2007). Diffusion of behavior and equilibrium properties in network games. American Economic Review (Papers and Proceedings), 97, 92–98.
Jackson, M. O., & Yariv, L. (2011). Diffusion, strategic interaction, and social structure. In J. Benhabib, A. Bisin, & M. O. Jackson (Eds.), Handbook of social economics (Vol. 1A, pp. 645–678). Amsterdam, Netherlands: Elsevier Science.
Jackson, M. O., & Zenou, Y. (2015). Games on networks. In P. Young & S. Zamir (Eds.), Handbook of game theory (Vol. 4, pp. 91–157). Amsterdam, Netherlands: Elsevier.
Johnson, C., & Gilles, R. P. (2000). Spatial social networks. Review of Economic Design, 5, 273–299.
Karlan, D., Mobius, M., Rosenblat, T., & Szeidl, A. (2009). Trust and social collateral. Quarterly Journal of Economics, 124(3), 1307–1361.
Karlan, D., Mobius, M., Rosenblat, T., & Szeidl, A. (2010). Measuring trust in Peruvian shantytowns (Unpublished manuscript). Yale University.
Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18, 39–43.
Kelly, M., & Ó Gráda, C. (2000). Market contagion: Evidence from the panics of 1854 and 1857. American Economic Review, 90, 1110–1124.
Kinnan, C., & Townsend, R. (2012). Kinship and financial networks, formal financial access, and risk reduction. American Economic Review, 102, 289–293.
König, M., Rohner, D., Thoenig, M., & Zilibotti, F. (2017). Networks in conflict: Theory and evidence from the great war of Africa. Econometrica, 85(4), 1093–1132.
König, M., Tessone, C., & Zenou, Y. (2014). Nestedness in networks: A theoretical model and some applications. Theoretical Economics, 9, 695–752.
Kramarz, F., & Nordström Skans, O. (2014). When strong ties are strong: Networks and youth labour market entry. Review of Economic Studies, 81, 1164–1200.

Kranton, R. E., & Minehart, D. F. (2000). Competition for goods in buyer-seller networks. Review of Economic Design, 5, 301–332.
Lazarsfeld, P., Berelson, B., & Gaudet, H. (1944). People’s choice: How the voter makes up his mind in a presidential campaign (3rd ed.). New York, NY: Columbia University Press.
Lee, L. F., Liu, X., & Lin, X. (2010). Specification and estimation of social interaction models with network structures. Econometrics Journal, 13, 145–176.
Leung, M. (2015). A random-field approach to inference in large models of network formation (Unpublished manuscript). University of Southern California.
Lindquist, M. J., Sauermann, J., & Zenou, Y. (2015). Network effects on worker productivity. CEPR Discussion Paper No. 10928.
Lindquist, M. J., & Zenou, Y. (2014). Key players in co-offending networks. CEPR Discussion Paper No. 9889.
Liu, X., & Lee, L. F. (2010). GMM estimation of social interaction models with centrality. Journal of Econometrics, 159, 99–115.
Liu, X., Patacchini, E., Zenou, Y., & Lee, L. F. (2012). Criminal networks: Who is the key player? CEPR Discussion Paper No. 8772.
Lopez-Pintado, D. (2008). Diffusion in complex social networks. Games and Economic Behavior, 62, 573–590.
Manea, M. (2011). Bargaining on networks. American Economic Review, 101, 2042–2080.
Manea, M. (2016). Models of bilateral trade networks. In Y. Bramoullé, B. W. Rogers, & A. Galeotti (Eds.), The Oxford handbook of the economics of networks (pp. 698–732). Oxford, UK: Oxford University Press.
Manski, C. F. (1993). Identification of endogenous effects: The reflection problem. Review of Economic Studies, 60, 531–542.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.
Mele, A. (2013). A structural model of segregation in social networks (Unpublished manuscript). Johns Hopkins University, Carey Business School.
Mobius, M., Phan, T., & Szeidl, A. (2014). Treasure hunt: Social learning in the field. NBER Working Paper No. 21014.
Molavi, P., Eksin, C., Ribeiro, A., & Jadbabaie, A. (2016). Learning to coordinate in social networks. Operations Research, 64, 605–621.
Montgomery, J. D. (1991). Social networks and labor-market outcomes: Toward an economic analysis. American Economic Review, 81(5), 1408–1418.
Mossel, E., Neeman, J., & Tamuz, O. (2014). Majority dynamics and aggregation of information in social networks. Journal of Autonomous Agents and Multi-Agent Systems, 3, 408–429.
Mossel, E., Sly, A., & Tamuz, O. (2015). Strategic learning and the topology of social networks. Econometrica, 83, 1755–1794.
Munshi, K. (2003). Networks in the modern economy: Mexican migrants in the U.S. labor market. Quarterly Journal of Economics, 118, 549–597.
Munshi, K., & Rosenzweig, M. (2009). Why is mobility in India so low? Social insurance, inequality, and growth (No. w14850). National Bureau of Economic Research.
Nava, F. (2015). Efficiency in decentralized oligopolistic markets. Journal of Economic Theory, 157, 315–348.
Newman, M. E. J. (2010). Networks: An introduction. Oxford, UK: Oxford University Press.

Pastor-Satorras, R., & Vespignani, A. (2001). Epidemic dynamics and endemic states in complex networks. Physical Review E, 63, 066117.
Patacchini, E., Rainone, E., & Zenou, Y. (2016). Heterogenous peer effects in education (Unpublished manuscript). Cornell University.
Patacchini, E., & Zenou, Y. (2016). Social networks and parental behavior in the intergenerational transmission of religion. Quantitative Economics, 7, 969–995.
Pellizzari, M. (2010). Do friends and relatives really help in getting a good job? Industrial and Labor Relations Review, 63, 494–510.
Perla, J., & Tonetti, C. (2014). Equilibrium imitation and growth. Journal of Political Economy, 122, 52–76.
Pin, P., & Rogers, B. W. (2016). Stochastic network formation and homophily. In Y. Bramoullé, B. W. Rogers, & A. Galeotti (Eds.), The Oxford handbook of the economics of networks (pp. 138–166). Oxford, UK: Oxford University Press.
Rees, A. (1966). Information networks in labor markets. American Economic Review, 56, 559–566.
Rees, A., & Schultz, G. P. (1970). Workers and wages in an urban labor market. Chicago, IL: University of Chicago Press.
Sacerdote, B. (2001). Peer effects with random assignment: Results from Dartmouth roommates. Quarterly Journal of Economics, 116, 681–704.
Smith, L., & Sorensen, P. (2000). Pathological outcomes of observational learning. Econometrica, 68, 371–398.
Tahbaz-Salehi, A., & Jadbabaie, A. (2008). A necessary and sufficient condition for consensus over random networks. IEEE Transactions on Automatic Control, 53, 791–795.
Topa, G. (2011). Labor markets and referrals. In J. Benhabib, A. Bisin, & M. O. Jackson (Eds.), Handbook of social economics (Vol. 1B, pp. 1193–1221). Amsterdam, Netherlands: Elsevier Science.
Uzzi, B. (1996). The sources and consequences of embeddedness for the economic performance of organizations: The network effect. American Sociological Review, 61, 674–698.
Wahba, J., & Zenou, Y. (2005). Density, social networks and job search methods: Theory and applications to Egypt. Journal of Development Economics, 78, 443–473.
Wang, C. (2016). Core and periphery trading networks (Unpublished manuscript).
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge, UK: Cambridge University Press.
Watts, A. (2001). A dynamic model of network formation. Games and Economic Behavior, 34, 331–341.
Zenou, Y. (2016). Key players. In Y. Bramoullé, B. W. Rogers, & A. Galeotti (Eds.), The Oxford handbook of the economics of networks (pp. 244–274). Oxford, UK: Oxford University Press.

CHAPTER 30

Social Capital and Economic Sociology

Steve McDonald and Richard A. Benton

Social relationships are critical for understanding work and workplaces. Classical sociological research emphasizes the importance of worker solidarity (Marx & Engels, 1955), intragroup dynamics (Homans, 1951; Roethlisberger & Dickson, 1939; Simmel, 1950), and community relations (Tocqueville, 1889; Tonnies, 1887) as key determinants of economic outcomes. Contemporary scholarship on social networks and work tends to view interpersonal relationships as a resource that promotes the opportunities and productivity of workers. These resources are conceptualized as social capital (McDonald & Benton, 2013). Some scholars view social capital as a collective resource that derives from group membership (Coleman, 1990; Portes, 1998; Putnam, 2001), whereas others measure social capital at the individual level (Lin, 2001). The resources can be (1) transmitted or conferred through interpersonal relationships, as in the case of information, influence, or status (Lin, 2001), or (2) derived from abstract properties associated with network structure, as in the case of trust, brokerage, and social closure (Burt, 1992; Coleman, 1990). In this chapter, we review recent research on the economic consequences associated with social capital in the labor market and within work organizations.

Social Capital and the Labor Market

Researchers have long debated (1) how people learn about and are hired into job openings through their network contacts and (2) how the use of job contacts affects the character and quality of the positions for which they are hired. Scholarship offers complex answers to these relatively simple questions, answers that reveal how social context shapes employment opportunities.


Job-Matching Processes

From an economic perspective, a job search is a rational process of investing in information about employment opportunities. The equilibrium search model assumes that increased job search intensity yields higher-quality job information and offers (Mortensen & Pissarides, 1999). However, search is a costly activity, and therefore people will continue to search only as long as the expected gain from further search exceeds its cost (i.e., the optimal stopping rule: Devine & Kiefer, 1991; Lippman & McCall, 1976). Social network connections serve as one important source of job information. People can invest in this source by developing and maintaining their connections and by mobilizing that information through inquiries for job-finding assistance.

The economic process of network-based job finding was elaborated through an explicit modeling of the interplay between the social networks of workers and the matching of those individuals to jobs (Calvó-Armengol & Jackson, 2004). In this model, employed and unemployed actors are connected via network relations. The job-matching process unfolds over a series of discrete sequential time periods. In each time period, employed actors randomly lose jobs and information about job openings arrives at random for both employed and unemployed individuals. Unemployed actors use the information they receive by applying for a job. Employed actors pass the information along to their unemployed friends. This model of network-based job finding laid the groundwork for further theoretical elaborations and extensions (for recent examples, see Galenianos, 2014; Merlino, 2014; Zaharieva, 2015; Zenou, 2015).

While these economic models provide a useful starting point for understanding the interplay between social networks and employment, they remain limited, particularly with regard to their assumptions about how job information and assistance are received.
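The discrete-time matching process described above can be made concrete with a small simulation. The following is only a minimal sketch under stated assumptions, not the published model: the function name, the parameters `loss_rate` and `arrival_rate`, the ordering of events within a period, and the ring-shaped example network are all hypothetical choices made for illustration.

```python
import random


def simulate_job_matching(neighbors, periods, loss_rate, arrival_rate, seed=0):
    """Toy simulation of network-based job finding (illustrative assumptions only).

    neighbors: dict mapping each actor to a list of network contacts.
    loss_rate: assumed per-period probability that an employed actor loses a job.
    arrival_rate: assumed per-period probability that an actor hears of an opening.
    Returns the share of actors employed at the end of each period.
    """
    rng = random.Random(seed)
    employed = {actor: False for actor in neighbors}
    history = []
    for _ in range(periods):
        # Job information arrives at random for employed and unemployed actors.
        # Simplification: statuses update one actor at a time within the period.
        for actor in neighbors:
            if rng.random() >= arrival_rate:
                continue
            if not employed[actor]:
                # Unemployed actors use the information themselves.
                employed[actor] = True
            else:
                # Employed actors pass the lead to a randomly chosen unemployed friend.
                idle_friends = [n for n in neighbors[actor] if not employed[n]]
                if idle_friends:
                    employed[rng.choice(idle_friends)] = True
        # Employed actors randomly lose their jobs.
        for actor in neighbors:
            if employed[actor] and rng.random() < loss_rate:
                employed[actor] = False
        history.append(sum(employed.values()) / len(employed))
    return history


# Hypothetical example: ten actors arranged in a ring, each tied to two neighbors.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
rates = simulate_job_matching(ring, periods=50, loss_rate=0.1, arrival_rate=0.2)
print(f"Employment rate after 50 periods: {rates[-1]:.2f}")
```

In this sketch every actor has the same chance of hearing about an opening, which is exactly the random-arrival assumption that the surrounding discussion treats as a limitation of such models.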
Rather than being random, information receipt is endogenous to background and relational characteristics. Figure 30.1 illustrates these processes, with the top-left portion of the figure showing how network features impact the quality of social capital resources provided to potential job changers. For example, social network size and diversity are positively associated with the receipt of job leads (McDonald, Lin, & Ao, 2009). Network alter and contact characteristics also affect the rate at which individuals learn about employment opportunities. People with higher-prestige ties, for instance, tend to hear about more job leads than others (Lin & Ao, 2008).

Figure 30.1  Illustration of job-finding assistance processes. [Diagram not reproduced. Recoverable elements: Background Characteristics (gender, race, class, status, religion, age, etc.); Network Features (degree, centrality, density, diversity, homophily, transitivity, etc.); Mobilization (willing/able to ask; willing/able to provide); unsolicited assistance (invisible hand of social capital); Job Finding Assistance (Ego). Arrow labels: background characteristics shape network features; network features affect the quality of social capital resources; background characteristics affect alter perceptions of ego and moderate the form and extent of assistance.]

Network features are themselves influenced by background characteristics (see the bottom-right portion of Figure 30.1). Proximity and homophily are among the most important determinants of social network connectivity (McPherson, Smith-Lovin, & Cook, 2001; Rivera, Soderstrom, & Uzzi, 2010). Proximity is a reflection of the physical closeness of individuals (e.g., residential distance) and the extent to which people participate in similar social institutions (e.g., schools, workplaces, voluntary associations). Homophily refers to the tendency for individuals to connect with similar others. The proximity and homophily mechanisms generate segregated social networks based on social and ecological factors, especially gender, race/ethnicity, and class. Social network segregation contributes to distinctive network structures across social groups, which can be highly consequential for the receipt of job-finding assistance. For example, women and minority groups tend to maintain a social capital access deficit relative to male and white workers (Lin, 2000). Women, minority ethnic group members, working-class individuals, and people living in disadvantaged neighborhoods tend to have smaller networks and fewer employed and high-status contacts in their professional networks (Chua, 2013; Cornwell & Cornwell, 2008; McDonald, 2011b; McDonald et al., 2009; Moren-Cross & Lin, 2008; Rankin & Quane, 2000; Smith, 2000).
Women also have less access than men to social capital through voluntary association memberships (McPherson & Smith-Lovin, 1986; Rotolo & Wharton, 2003; Rotolo & Wilson, 2007), which serve as important resources for social and career advancement (Benton, 2016b; Fong & Shen, 2016; Rotolo & Wilson, 2003; Ruiter & De Graaf, 2009; Son, 2015; Son & Lin, 2008). Job-finding assistance can also vary depending on how individuals mobilize their social network resources (see the middle portion of Figure 30.1). Psychologists have emphasized the importance of personality and goal orientation in the mobilization of network resources (Totterdell, Holman, & Hukin,  2008; Van Hoye, van Hooft, & Lievens,  2009; Wanberg, Kanfer, & Banas, 2000). According to this perspective, a job search is viewed as a process of self-regulation, with employment outcomes linked to deliberate decisions about how to search for a job. Networking is a strategic process that involves contacting others to obtain information, job leads, and job-finding assistance (Porter & Woo, 2015). Unemployed individuals who score higher on extraversion survey items tend to report networking more intensely than others, and networking intensity is positively associated with re-employment (Wanberg et al., 2000). Mobilization strategies can also vary across social groups. Workers may be unable to effectively mobilize their social capital because their contacts lack access to the kinds of contacts who can help them to gain quality employment (MacLeod, 2010; Royster, 2003). Some workers may be less willing to pursue assistance. For example, black residents in high-poverty neighborhoods sometimes employ a “defensive individualist” ideology that reduces their likelihood of pursuing job-finding assistance among their friends and relatives (Smith, 2005, 2007). The ability and willingness of network alters to provide assistance are additional determinants of job assistance receipt (Marin, 2012). 
The decision to provide (or not provide) job-finding assistance is linked to many factors, but most notably the reputation of the target individual. To the extent that a person has been fired, quit, or failed to meet work requirements in the past, job contacts are unlikely to provide them with information about new job opportunities (Trimble O’Connor, 2013). The process of providing job leads is complex, requiring (1) knowledge about openings, (2) knowledge of individuals who might be good fits for those openings, and (3) deciding whether or not to reveal those openings to them (Marin, 2012; Trimble O’Connor, 2013). Weaker and more diverse network connections are associated with knowledge about openings and target individuals, whereas stronger ties increase the chance of assisting a job seeker (Kim & Fernandez, 2017; Marin, 2012, 2013). In general, weak tie contacts are best for accessing information, whereas strong tie contacts exert greater influence on the hiring process (Barbulescu, 2015; Bian, Huang, & Zhang, 2015; McDonald, 2011b; Yakubovich, 2005). Race/ethnic minorities are, on average, more likely than whites to assist others with a job search (Fernandez & Fernandez-Mateo, 2006; Hamm & McDonald, 2015). Distinct cultural logics appear to contribute to variation in job-finding assistance among different race/ethnic subgroups. Smith and Young (2017) find that African American contacts are primarily motivated by job seekers’ desire to work, Latinx contacts by job seekers’ necessity for finding employment, and white contacts by a perceived fit between the job seeker and the job opening.

Importantly, mobilization of network resources is not required to receive job-finding assistance: social capital is often received in the form of unsolicited assistance. Lin (2000) has referred to this unsolicited form of network assistance as the “invisible hand of social capital” (see also Lin & Ao, 2008; McDonald & Day, 2010).
For example, roughly a quarter of the working population has changed jobs without engaging in a job search, and the share is nearly half among workers who are hired into executive and management occupations (McDonald & Elder, 2006). These “nonsearch” transitions are mostly mediated by informal recruitment through social network connections (McDonald, 2010).

Finally, background characteristics moderate the form and extent of assistance provided by job contacts by impacting their perceptions of fit, competence, and risk (see the upper-right portion of Figure 30.1). For example, network alters tend to steer female workers away from male-dominated workplaces into female-dominated workplaces (and the reverse for male workers) to improve perceived gender matches (Kmec, McDonald, & Trimble, 2010). Background characteristics can also affect the extent of assistance received. As a brief illustration, we present evidence from the 2005 Social Capital USA survey (for details see McDonald et al., 2009) to display gender and race variation in the predicted probabilities of receiving different types of job-finding assistance (net of education, age, employment status, job tenure, occupation type, occupation prestige, and supervisory authority). Respondents who reported receiving job-finding assistance indicated whether job contacts (1) provided information (to the respondent or to the company about the respondent), (2) vouched (“put in a good word”) for the respondent, or (3) provided information and vouched for the respondent. Figure 30.2 shows how African American males differ from the other groups. They are overwhelmingly likely to receive only information assistance from their job contacts, which is generally considered to be the least extensive form of support. By contrast, white men are most likely to receive both information and vouching assistance.

Figure 30.2  Predicted probabilities of receiving different types of job-finding assistance by race of recipient. Source: Social Capital USA Survey, 2005. [Chart not reproduced. Vertical axis ticks: 0.00 to 1.00 in steps of 0.10. Groups: White male, White female, Black male, Black female, Hispanic male, Hispanic female. Legend: Information and vouching; Vouching only; Information only.]

Job-Matching Outcomes

From the perspective of job seekers, contacts can provide information about a potential job opening that is both better than and distinct from the information that might otherwise be available to them (Castilla, Lan, & Rissing, 2013a). From an employer’s perspective, people who are referred by contacts tend to come from a “richer pool” of job candidates (Fernandez, Castilla, & Moore, 2000), in that referrers deem them to be competent, trustworthy, and worth risking their reputation on. In this way, network contacts serve as a signal of the superior quality of connected job candidates (Castilla et al., 2013a). Consequently, many argue that interpersonal connections yield better employment outcomes relative to more formal job search means (Montgomery, 1991; Dustmann et al., 2016).

However, the empirical evidence suggests that the causal benefits of network contacts in the job-matching process are more varied, depending in large part on the economic outcome of interest. Research consistently shows that personal contact use increases the odds of becoming employed and reduces spells of unemployment (Bonoli & Turtschi, 2015; Cappellari & Tatsiramos, 2015; Cingano & Rosolia, 2012; Fernandez et al., 2000; Petersen, Saporta, & Seidel, 2000). The more contested question is whether job contacts lead to better jobs. Granovetter’s (1974) seminal analysis showed a positive association between wages and the use of personal contacts to find jobs in the United States. Subsequent research has been hard pressed to replicate that finding net of controls and in more broadly generalizable samples (Bridges & Villemez, 1986; Green, Tigges, & Diaz, 1999; Mouw, 2003; Pellizzari, 2010; but see Dustmann et al., 2016; Shen & Bian, 2018).

One important reason that network effects on income are so elusive is that most research is limited to contact effects among workers who are looking for employment rather than all workers who change jobs. Economic analyses tend to model job lead receipt as a positive function of job search intensity (e.g., Christensen et al., 2005; Fontaine, 2008; Zaharieva, 2015), which has led to the exclusive focus on the employment benefits associated with active networking. Yet these benefits are more modest than the benefits of being informally recruited into new jobs without engaging in a job search (Chua, 2014; Elliott, 2000; McDonald, 2015; McDonald & Elder, 2006), highlighting the key distinction between supply networks (populated by contacts who are mobilized by job seekers) and recruitment networks (populated by contacts who are mobilized by employers: Granovetter & Tilly, 1988). Whereas active reliance on supply networks is common for gaining access to many low-wage and low-status jobs, passive search strategies and hiring through informal recruitment are far more common among professionals, managers, and executives (Fernandez-Mateo & Fernandez, 2016; Hamori, 2010; McDonald, 2015; Stupnytska & Zaharieva, 2015).

The employment benefits associated with network-based job finding are also strongly related to the job contact’s status and relational characteristics. People tend to be hired into prestigious jobs when they receive help from contacts that hold high-prestige jobs (Bian et al., 2015; Davern, 1999; Kim, 2009; Lin, 1999). Job-finding assistance from weak-tie connections also tends to be beneficial for employment outcomes (D. W. Brown & Konrad, 2001; Granovetter, 1973; Kim, 2009; Lin, Ensel, & Vaughn, 1981; Smith, 2000; Van Hoye et al., 2009; Yakubovich, 2005; Zenou, 2015; but see Huang & Western, 2011; Obukhova, 2012). Obtaining job referrals from company employees (especially hiring authorities) increases the chances that an individual will gain an interview, receive a job offer, receive higher wages, and stay in a job longer (M. Brown, Setren, & Topa, 2015; Di Stasio & Gërxhani, 2015; Fernandez et al., 2000; Fernandez & Weinberg, 1997; Kmec & Trimble, 2009).

The efficacy of network-based job finding also varies by the ascriptive characteristics of the job seeker and contact. White males tend to earn higher wages when they access their jobs through personal contacts relative to what women and racial minorities earn when using contacts (Huffman & Torres, 2002; Marmaros & Sacerdote, 2002; Parks-Yancy, 2006; Parks-Yancy, DiTomaso, & Post, 2006; Petersen et al., 2000). When people rely on white and male connections, independent of their own personal gender and race characteristics, they tend to be hired into higher-quality jobs (Belliveau, 2005; Elliott, 1999; Green et al., 1999; Huffman & Torres, 2002; Parks-Yancy, 2006; Royster, 2003; Smith, 2000; Son & Lin, 2012; Stainback, 2008). Research suggests that racial minorities only benefit from job contacts when employers do not know the race/ethnicity of their job contacts (Kmec & Trimble, 2009) or when they are referred by white contacts and job evaluators express low levels of racial prejudice (Silva, 2018). Network effects also appear to be stronger for midcareer workers than for early-career workers (McDonald & Elder, 2006). This is likely due to a shifting in network composition across the life course: as workers gain experience, more and more work-based acquaintances become incorporated into their networks, which results in better assistance for job finding (McDonald, 2011a; McDonald, Chen, & Mair, 2015; McDonald & Mair, 2010).

All of this suggests that some workers experience a social capital returns deficit. Economic returns may vary across groups even when those groups have similar amounts of social capital resources mobilized on their behalf. Returns deficits derive from the ways that organizational actors may perceive social capital differently depending on the characteristics of ego.
For example, employers have been known to downplay specific hiring criteria for some race/ethnic groups while highly valuing the same criteria for other groups (Pager, Western, & Bonikowski, 2009). Previous research suggests that employers view network connectivity as a more important criterion for the hiring and promotion of white men than it is for women and racial minorities (see Wilson, 1997; McDonald, 2011a).

Job-matching processes are “dual” or “double” embedded (Baker & Faulkner, 2009; Habinek, Martin, & Zablocki, 2015; McDonald, Benton, & Warner, 2012; Tian & Liu, 2018), in that distinctive cultural and institutional contexts moderate the outcomes of network-mediated processes such as job matching. Most international studies show that networks tend to increase the odds of finding a job (Chua, 2011; Obukhova & Lan, 2013; Yakubovich, 2005), but the wage and job quality benefits of networks are more variable. In a comparative study of advanced industrialized nations, Pellizzari (2010) found that searching for jobs via contacts results in wage premiums only in a small number of countries (Belgium and the Netherlands). Contacts to weak ties tend to be beneficial in some contexts (e.g., Russia: Yakubovich, 2005), while strong-tie relations afford greater opportunities in others (e.g., China: Obukhova, 2012; but see Obukhova & Zhang, 2017). The variable benefits of tie strength have been linked to cultural and institutional differences in how labor market resources are allocated, such as guanxi in China (Bian, 1997; Shen, 2013; Burt, Bian, & Opper, 2018). Variation in contact effects can also be influenced by the extent of credentialism within an economic sector. Personal contact use yields lower wage returns relative to formal search methods when explicit training requirements and certification are common, while networks yield greater wage premiums under conditions of qualification uncertainty (Chua, 2011). Demand-side recruitment also tends to vary across institutional contexts.
In liberal market economies such as the United States, high-wage jobs are often filled through informal recruitment of passive job seekers, whereas active formal search is a more common way of filling high-wage jobs in coordinated market economies and socialist economies (Benton et al., 2015; McDonald et al., 2012). This line of research implies that policy implementation of “free market” principles may increase the economic benefits associated with network-based job finding. The shift from coordinated to market economies may also enhance the informational value of social networks, as opposed to their influence value (Chen & Volker, 2016).

Social Capital and Workplace Outcomes

Social relations within workplaces and organizations can serve as important resources for strategic action. An intraorganizational network refers to network relations bounded within an organization or organizational unit. Network relations can include ties related to trust, friendship, advice, workflow, and many others. Network scholars commonly distinguish between instrumental ties (such as advice and workflow networks) and informal ties (such as friendships and trust relations), which can have distinct effects on workplace outcomes. These social structures provide a set of opportunities and constraints that contextualize individual action. Individuals can draw on resources embedded in their personal networks, such as advice or information, as they pursue strategic goals. However, these networked contexts also help establish the norms, varied political interests, and socially constructed goals that structure and constrain action. More specifically, relations can provide social capital in the form of individual resources useful for pursuing individual performance and competitive advantage (Burt, 2001; Lin, 2001) or collective resources useful for sustaining trust and cooperation within work organizations (Coleman, 1990).

Early work on networks within workplaces emerged as an important corrective to the rational design perspectives that viewed organizational behavior through a strict efficiency and strategic design lens. Prior to the 1930s, Taylor’s scientific management perspective dominated management theory, focusing on process engineering and work design. However, with the development of the human relations school, practitioners and analysts increasingly saw how work contexts were similar to other human groups (Mayo, 1933). This latter set of perspectives was informed, in part, by early network thinking. A series of famous studies conducted at the Hawthorne factory of Western Electric Company (Roethlisberger & Dickson, 1939) discovered that dense networks of informal relations developed among a group of workers wiring and assembling telephone switchboard banks. These informal relations had powerful effects on employee productivity. Despite being paid through a piece rate system designed to incentivize individual productivity, network relations helped the group of workers set informal work quotas well below their maximum productivity. The workers used informal sanctioning tactics, such as ridicule and insults, to enforce group norms restricting output. Consequently, the network provided a collective infrastructure for enforcing group norms and discouraging individual defection at the group’s expense.
Although group cohesion and social solidarity dynamics are widely acknowledged today (Benton, 2016a; Moody & White, 2003), scholars of the time were able to draw new lessons about human motivation and behavior at work that had lasting effects in the scholarship of work organizations. As the Hawthorne case demonstrates, informal relations form a “shadow structure” that can affect individuals’ knowledge, beliefs, and behaviors within a work organization (Kanter, 1977; Krackhardt & Hanson, 1993; McGuire, 2002; Stevenson, 1990). Scholars have elaborated how distinct network structures foster unique individual and collective social capital resources. On the one hand, access to diverse network resources fosters unique learning opportunities and chances to broker information and influence across disparate regions of a network (Burt, 1992). These network positions form a structural basis for individual competitive advantage and thus can be seen as an individual form of social capital within the workplace. On the other hand, tightly knit and cohesive networks can support norms of trust and reciprocity, thus supporting a collective form of social capital available to group members engaged in collective action (Coleman, 1990; Ferrin, Dirks, & Shah, 2006; Moody & White, 2003; Benton, 2017). Each form of social capital has a number of notable antecedents and outcomes within workplace contexts, as illustrated in Figure 30.3.

Antecedents

Network structures within organizations evolve as individuals form new ties and selectively maintain or discontinue existing ties. These network dynamics can reflect a complex set of factors including individual traits and endogenous structural determinants. A number of individual attributes play a particularly prominent role in these network dynamics and their outcomes. Individual status traits such as gender, race, personality, and functional or task roles all affect social network formation and outcomes within work organizations (Berger, Cohen, & Zelditch, 1972). In many organizations, women and racial and ethnic minorities are at a disadvantage when it comes to acquiring and mobilizing network resources, as compared to men and members of the majority group (Brass, 1985; McGuire, 2002). This problem is magnified by homophily, the tendency for similar individuals to form ties with one another (McPherson et al., 2001), which leaves underrepresented groups with fewer opportunities to build helpful relationships. This inequality in the relational opportunity structure can be particularly consequential for disparate career development trajectories, as women and minority group members gain fewer opportunities to acquire resources and power or to become socialized into the informal aspects of work organizations.

Figure 30.3  Diagram of intraorganizational network effects. [Diagram not reproduced. Recoverable elements: Relation types (communication, trust, advice, friendship, workflow); Demographic antecedents (race, gender, homophily); Structural antecedents (reciprocity, preferential attachment, transitivity); Network position (structural holes, critical positions, centrality); Network topology (closure, cohesion); Individual outcomes (performance, innovation, power imbalance); Collective outcomes (trust, cooperation, consensus, team performance, power balance).]

In a study of help and assistance networks among employees of a financial services firm, McGuire (2000) found that women’s and minorities’ network contacts had less formal authority as compared to their white male peers. This was largely because the formal structure of the organization limited women’s and minorities’ access to powerful contacts. However, even when majority and minority group members exhibit similar network resources, underrepresented groups are typically less able to draw on these social resources for help (McGuire, 2002) or convert them into material gains, such as promotion (Burt, 1998). As a consequence, underrepresented groups tend to suffer from deficits in both social capital access and mobilization that can help explain career discrepancies.

A considerable body of research also considers endogenous, or purely structural, antecedents to network formation within work organizations.
Intraorganizational social networks, particularly advice and friendship ties, are thought to reflect self-organizing dynamics as actors confront the existing structure of the network as a set of opportunities and constraints informing their own future network action. The classic example of this is network reciprocity (Gouldner, 1960). In a longitudinal study of advice relations within a Dutch housing corporation, Agneessens and Wittek (2012) find that workers regularly seek advice from coworkers to whom they have previously given advice. Reciprocated relations may reflect norms of cooperation and generalized exchange within organizations that reduce status competition (Lazega et al., 2012; Lazega & Pattison, 1999). Other notable structural effects within work organizations concern the emergence of social status and network hierarchies (Lazega et al., 2012), as well as closed triads that aggregate into larger community structures (Davis, 1970).
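Two of the structural antecedents discussed above, reciprocity and triadic closure, have simple descriptive counterparts that can be computed from a list of directed ties. The sketch below is illustrative only: the function names and the small advice network are hypothetical, self-ties are assumed to be absent, and these descriptive ratios are not the statistical models used in the studies cited.

```python
from itertools import permutations


def reciprocity(ties):
    """Share of directed ties i -> j that are returned by a tie j -> i."""
    ties = set(ties)
    if not ties:
        return 0.0
    returned = sum(1 for (i, j) in ties if (j, i) in ties)
    return returned / len(ties)


def transitivity(ties):
    """Share of two-step paths i -> j -> k that are closed by a direct tie i -> k."""
    ties = set(ties)
    nodes = {node for tie in ties for node in tie}
    two_paths = 0
    closed = 0
    for i, j, k in permutations(nodes, 3):
        if (i, j) in ties and (j, k) in ties:
            two_paths += 1
            if (i, k) in ties:
                closed += 1
    return closed / two_paths if two_paths else 0.0


# Hypothetical advice network: a tie (x, y) means "x seeks advice from y".
advice = [("ana", "ben"), ("ben", "ana"), ("ben", "cy"), ("ana", "cy")]
print(reciprocity(advice), transitivity(advice))
```

Higher values of the first ratio correspond to more returned advice ties, and higher values of the second to more closed triads of the kind that aggregate into cohesive groups.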

Individual Performance and Innovation Outcomes

Intraorganizational networks are important for performance and innovation outcomes, particularly at the individual level (Flap, Bulder, & Volker, 1998). A well-developed theory of structural advantage explains how distinct network positions facilitate access to unique resources and information that can be leveraged into superior innovation and performance. In one sense, organizations are themselves networks of task flows designed to achieve collective goals. Positions that are central and critical to the workflow network tend to house more significant tasks, and occupants of these positions tend to accumulate more varied experiences (Brass, 1981; De Vaan, Vedres, & Stark, 2015). Individuals in these positions are critical for organizational performance outcomes, but these positions can also be leveraged into individual advancement (Benton, 2019). Similar effects accrue to individuals in central and critical positions in informal networks. Individuals who are centrally positioned in advice-sharing networks are better able to engage in cooperative problem solving, accumulate specialized knowledge, and become experienced with alternative solutions. In general, people with larger and more diverse advice networks tend to have higher job performance ratings (Sparrowe et al., 2001) and more career advancement opportunities (Burt, 2005).

Perhaps the most widely supported theory of structural advantage is Burt’s (1992, 2004) theory of structural holes. Burt argues that because opinions, behaviors, and information tend to be more homogenous within dense groups than between groups, individuals who inhabit network positions bridging otherwise disconnected groups are best situated to access diverse opinions, knowledge, or resources necessary for effective strategic action, particularly when action requires innovation and creativity.
For instance, occupying key bridging positions in information networks improves individual performance in knowledge-intensive work because people inhabiting these positions have better opportunities to draw on others’ expertise (Cross & Cummings, 2004).
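The idea of a bridging position can be illustrated with a simplified version of the effective-size measure associated with structural holes. For unweighted, undirected ties, effective size reduces to the number of ego's contacts minus the average number of ties each contact has to ego's other contacts. The sketch below assumes that simplification; the function name and the two small example networks are hypothetical.

```python
def effective_size(adjacency, ego):
    """Simplified effective size of ego's network (unweighted, undirected ties).

    adjacency: dict mapping each actor to the set of actors they are tied to;
    ties are assumed to be symmetric (if b is in adjacency[a], a is in adjacency[b]).
    """
    alters = adjacency[ego] - {ego}
    n = len(alters)
    if n == 0:
        return 0.0
    # Each tie between two alters is counted once from each side, giving 2t.
    twice_alter_ties = sum(len(adjacency[a] & alters) for a in alters)
    return n - twice_alter_ties / n


# Hypothetical broker: three contacts who are not tied to one another.
broker = {
    "ego": {"a", "b", "c"},
    "a": {"ego"},
    "b": {"ego"},
    "c": {"ego"},
}

# Hypothetical closed group: the same three contacts all know each other.
clique = {
    "ego": {"a", "b", "c"},
    "a": {"ego", "b", "c"},
    "b": {"ego", "a", "c"},
    "c": {"ego", "a", "b"},
}

print(effective_size(broker, "ego"), effective_size(clique, "ego"))
```

With the same number of contacts, the broker's network scores higher because none of the contacts are redundant with one another, which is the intuition behind the advantage of spanning otherwise disconnected groups.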

Trust and Collective Outcomes

Whereas structural holes and brokerage positions facilitate individual performance outcomes and innovation, network cohesion is more useful for generating collective resources and enhancing group performance (Benton, 2016a; Burt, 2001, 2005; Moody & White, 2003). Closed triads and larger cohesive groups buttress shared norms of trust and reciprocity among members (Ferrin et al., 2006; Simmel, 1950). Closed social networks increase monitoring potential within groups, making it easier for group members to observe others’ behavior and socially sanction those who defect from group interests. Consequently, cohesive network structures can reduce free riding or defection, making it easier for group members to draw on and contribute to collective norms of generalized trust, reciprocity, and consensus. Consensus increases the predictability and cooperation that work group members seek out when selecting future group members, further reinforcing network closure and producing densely tied communities (Hinds et al., 2000). More broadly, trust is an important social resource that improves a number of performance outcomes, including openness to interpersonal communication, organizational citizenship behaviors, open negotiation processes, and reduced intraorganizational conflict (Dirks & Ferrin, 2001).

Densely knit groups also spread information and knowledge more efficiently than sparse groups or groups that depend on individual brokers to distribute knowledge. In a study of informal relations among employees of a research and development firm, Reagans and McEvily (2003) find that cohesive networks improve knowledge transfer in dyads: two actors are better able to share information when they share multiple ties to mutual third parties. They reason that mutual embeddedness within dense groups increases actors’ willingness to invest the time and effort necessary to achieve effective knowledge transfers. Dense network structures may be particularly adaptable when work tasks require coordination and collaboration or organizational “buy-in” (Podolny & Baron, 1997). Hansen (2002) finds that cohesive network structures support knowledge sharing among work groups, which increases teamwork efficiency and expedites project completion.

Of course, overreliance on cohesive networks involves tradeoffs. Reliance on cohesive networks may reduce uncertainty and foster collective trust but can obstruct access to weak ties that are particularly important for individual performance (Gargiulo & Benassi, 2000; Mizruchi & Stearns, 2001).
For instance, cohesive networks may improve one individual’s performance at the expense of another’s. In a study of investment bankers, Gargiulo, Ertug, and Galunic (2009) find that network closure improves performance for bankers who regularly seek out and acquire information from others; however, closure can harm individual performance for bankers who regularly serve as sources for information. This highlights the point that networks within work organizations may support social norms that facilitate exploitative exchange relations or dependency asymmetries. Whereas some scholars explore the ways cohesion and brokerage facilitate distinct resources, others emphasize how diverse network structures can complement one another. While brokerage positions facilitate access to unique resources and information necessary for innovation, dense groups support coordination and recombination necessary to implement innovative strategies under uncertain conditions. Consequently, when multiple cohesive groups overlap in membership, this network structure, sometimes called intercohesion or structural folds (Vedres & Stark, 2010), builds on the advantages of both network properties in supporting creative innovation (de Vaan et al., 2015).
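
The closure arguments in this section rest on two countable quantities: how many mutual third parties a pair of actors shares, and how dense the group is overall. The following is a minimal sketch under stated assumptions, not a reproduction of any cited study's measures. The five-person advice network and the function names `shared_third_parties` and `density` are hypothetical.

```python
# Hypothetical advice network among five employees (undirected ties).
ties = {
    "ana": {"ben", "cat", "dev"},
    "ben": {"ana", "cat", "dev"},
    "cat": {"ana", "ben"},
    "dev": {"ana", "ben", "eli"},
    "eli": {"dev"},
}


def shared_third_parties(network, i, j):
    """Contacts that i and j have in common, excluding i and j themselves."""
    return (network[i] & network[j]) - {i, j}


def density(network):
    """Share of all possible undirected ties that are actually present."""
    n = len(network)
    # Each undirected tie appears in two adjacency sets, so halve the total.
    present = sum(len(neighbors) for neighbors in network.values()) / 2
    return present / (n * (n - 1) / 2)


embedded_pair = shared_third_parties(ties, "ana", "ben")
peripheral_pair = shared_third_parties(ties, "dev", "eli")
print(sorted(embedded_pair), sorted(peripheral_pair), density(ties))
```

Here the tie between `ana` and `ben` is embedded in two closed triads, while the tie between `dev` and `eli` has no third-party support; on the closure argument, the first dyad would be the one expected to sustain trust and costly knowledge transfer more easily.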

Power and Influence

Network-based resources can be important antecedents of power and influence within work organizations (Pfeffer, 2010). Power and influence accrue to individuals best positioned to benefit from dependency asymmetries and to broker information and influence across the network. A dependency relationship exists when person A depends on person B but this dependency is not mutual. Such a dependency relationship gives B power over A (Cook et al., 1983; Emerson, 1962). Positions in a broader network of task interdependencies, advice relations, or information flows can introduce complex interpersonal dependencies. In a study of workflow, communication, and friendship networks among employees of a newspaper publisher, Brass (1984) finds that centrally positioned individuals as well as individuals in critical workflow positions are attributed greater influence by both supervisors and nonsupervisors and have greater chances of being promoted. Critical positions are similar to structural holes: a proportion of input/output workflows must pass through a critical position with few alternative pathways. Others in the network depend on individuals in these critical positions to access information or workflow, presenting opportunities to broker information and providing a structural source of power.

Not only do network positions provide structural sources of power, but one’s knowledge and understanding of the broader network context can also provide advantages. An accurate perception of one’s social and political landscape is a necessary prerequisite for gaining and maintaining power within organizations (Pettigrew, 1973). For example, Krackhardt (1990) examines how accurately individuals perceive the friendship and advice networks within a technology firm. People with more accurate perceptions of the advice network tended to be viewed as more powerful by others within the organization. Perhaps unsurprisingly, network perceptions are themselves influenced by network position. Krackhardt and Kilduff (2002) find that dense and closed network structures support agreement about who is tied to whom within firms. Just as cohesive networks support consensus and reciprocity while reducing brokerage opportunities, they also support consensus about the broader network structure.
This may, in turn, reduce individuals’ capacity to leverage unique knowledge about the network for personal gain.
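
The idea of a critical workflow position, one that work must pass through because few alternative pathways exist, can be illustrated with a small reachability check. This is a deliberately simplified, all-or-nothing sketch: it flags positions whose removal leaves no path at all from input to output, whereas the discussion above concerns the proportion of flows passing through a position. The workflow and the names `reachable` and `critical_positions` are hypothetical.

```python
from collections import deque

# Hypothetical workflow: each position passes work to the positions listed.
workflow = {
    "intake": ["editor_a", "editor_b"],
    "editor_a": ["layout"],
    "editor_b": ["layout"],
    "layout": ["press"],
    "press": [],
}


def reachable(flow, start, goal, removed=None):
    """Breadth-first search: can work get from start to goal without `removed`?"""
    if removed in (start, goal):
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in flow[node]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def critical_positions(flow, start, goal):
    """Intermediate positions whose removal leaves no path from start to goal."""
    return sorted(
        p for p in flow
        if p not in (start, goal) and not reachable(flow, start, goal, removed=p)
    )


print(critical_positions(workflow, "intake", "press"))  # prints ['layout']
```

Either editor can be bypassed through the other, so neither is critical in this sense; everyone depends on `layout`, which is the kind of asymmetric dependence that the power-dependence argument treats as a structural source of power.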

Summary

Considerable research effort has sought to examine the interplay between social networks and economic life, yielding many important insights about how interpersonal resources contribute to labor markets and workplace outcomes. Our review highlights both the regularity and complexity of these processes (for other recent reviews, see Castilla, Lan, & Rissing, 2013a, 2013b; McDonald et al., 2013; Trimble & Kmec, 2011). We see several promising avenues for further exploration of these lines of inquiry.

First, new research is revealing important insights into the contextual contingencies associated with returns to social capital (e.g., Larquier & Marchal, 2016; Zhang & Lin, 2016). Whereas most previous research emphasizes ways that social capital use leads to inequality and segregation in employment outcomes, Rubineau and Fernandez (2015) suggest that there is a tipping point at which gender differences in the propensity to use network-based referrals may lead to job desegregation. Consequently, organizations could rely on network-based hiring practices to help correct rather than exacerbate ascriptive imbalances. Other research shows how institutional change may impact these organizational practices. Obukhova and Rubineau (2016) recently showed that employers respond to marketization policies by adopting practices that encourage employee referrals. Future research should continue to examine the organizational and institutional mechanisms driving variation in returns to social capital.

Second, changes in internet and communication technologies have fundamentally transformed the labor market, organizations, and how work is done. Future research should continue to examine how the mobilization of online connections impacts employment opportunities (see Garg & Telang, 2017; Gee, Jones, & Burke, 2016; Gershon, 2017) as well as how organizational actors use online networks to recruit passive talent via professional social media websites such as LinkedIn (Sharone, 2017). Finally, the proliferation of crowdsourced online work platforms offers an entirely new horizon to examine how networks matter for productivity and recognition (Bianchi, Kang, & Stewart, 2010). Further research along these lines will extend foundational ideas about the link between social capital and work outcomes.

References

Agneessens, F., & Wittek, R. (2012). Where do intra-organizational advice relations come from? The role of informal status and social capital in social exchange. Social Networks, 34(3), 333–345.
Baker, W., & Faulkner, R. R. (2009). Social capital, double embeddedness, and mechanisms of stability and change. American Behavioral Scientist, 52(11), 1531–1555.
Barbulescu, R. (2015). The strength of many kinds of ties: Unpacking the role of social contacts across stages of the job search process. Organization Science, 26(4), 1040–1058.
Belliveau, M. A. (2005). Blind ambition? The effects of social networks and institutional sex composition on the job search outcomes of elite coeducational and women’s college graduates. Organization Science, 16(2), 134–150.
Benton, R. A. (2016a). Corporate governance and nested authority: Cohesive network structure, actor-driven mechanisms, and the balance of power in American corporations. American Journal of Sociology, 122(3), 661–713.
Benton, R. A. (2016b). Uniters or dividers? Voluntary organizations and social capital acquisition. Social Networks, 44, 209–218.
Benton, R. A. (2017). The decline of social entrenchment: Social network cohesion and board responsiveness to shareholder activism. Organization Science, 28(2), 262–282.
Benton, R. A. (2019). Brokerage and closure in corporate control: Shifting sources of power for a fractured corporate board network. Organization Studies, 40(11), 1631–1656.
Benton, R. A., McDonald, S., Manzoni, A., & Warner, D. F. (2015). The recruitment paradox: Network recruitment, structural position, and East German market transition. Social Forces, 93(3), 905–932.
Berger, J., Cohen, B. P., & Zelditch, M. (1972). Status characteristics and social interaction. American Sociological Review, 37(3), 241–255.
Bian, Y. (1997). Bringing strong ties back in: Indirect ties, network bridges, and job searches in China. American Sociological Review, 62, 366–385.
Bian, Y., Huang, X., & Zhang, L. (2015). Information and favoritism: The network effect on wage income in China. Social Networks, 40, 129–138.
Bianchi, A. J., Kang, S. M., & Stewart, D. (2010). The organizational selection of status characteristics: Status evaluations in an open source community. Organization Science, 23(2), 341–354.
Bonoli, G., & Turtschi, N. (2015). Inequality in social capital and labour market re-entry among unemployed people. Research in Social Stratification and Mobility, 42, 87–95.

Brass, D. J. (1981). Structural relationships, job characteristics, and worker satisfaction and performance. Administrative Science Quarterly, 26(3), 331–348.
Brass, D. J. (1984). Being in the right place—A structural-analysis of individual influence in an organization. Administrative Science Quarterly, 29(4), 518–539.
Brass, D. J. (1985). Men’s and women’s networks: A study of interaction patterns and influence in an organization. Academy of Management Journal, 28(2), 327–343.
Bridges, W. P., & Villemez, W. J. (1986). Informal hiring and income in the labor market. American Sociological Review, 51, 574–582.
Brown, D. W., & Konrad, A. M. (2001). Granovetter was right: The importance of weak ties to a contemporary job search. Group & Organization Management, 26(4), 434–462.
Brown, M., Setren, E., & Topa, G. (2015). Do informal referrals lead to better matches? Evidence from a firm’s employee referral system. Journal of Labor Economics, 34(1), 161–209.
Burt, R. (1992). Structural holes: The social structure of competition. Cambridge, MA: Harvard University Press.
Burt, R. (1998). The gender of social capital. Rationality and Society, 10(1), 5–46.
Burt, R. (2004). Structural holes and good ideas. American Journal of Sociology, 110(2), 349–399.
Burt, R. S. (2001). Structural holes versus network closure as social capital. In N. Lin, K. Cook, & R. S. Burt (Eds.), Social capital: Theory and research. New Brunswick, NJ: Transaction Publishers.
Burt, R. S. (2005). Brokerage and closure: An introduction to social capital. Oxford, UK: Oxford University Press.
Burt, R. S., Bian, Y., & Opper, S. (2018). More or less guanxi: Trust is 60% network context, 10% individual difference. Social Networks, 54, 12–25.
Calvó-Armengol, A., & Jackson, M. O. (2004). The effects of social networks on employment and inequality. American Economic Review, 94(3), 426–454.
Cappellari, L., & Tatsiramos, K. (2015). With a little help from my friends? Quality of social networks, job finding and job match quality. European Economic Review, 78, 55–75.
Castilla, E. J., Lan, G. J., & Rissing, B. A. (2013a). Social networks and employment: Mechanisms (part 1). Sociology Compass, 7(12), 999–1012.
Castilla, E. J., Lan, G. J., & Rissing, B. A. (2013b). Social networks and employment: Outcomes (part 2). Sociology Compass, 7(12), 1013–1026.
Chen, Y., & Volker, B. (2016). Social capital and homophily both matter for labor market outcomes—Evidence from replication and extension. Social Networks, 45, 18–31.
Christensen, B. J., Lentz, R., Mortensen, D. T., Neumann, G. R., & Werwatz, A. (2005). On-the-job search and the wage distribution. Journal of Labor Economics, 23(1), 31–58.
Chua, V. (2011). Social networks and labour market outcomes in a meritocracy. Social Networks, 33(1), 1–11.
Chua, V. (2013). Categorical sources of varieties of network inequalities. Social Science Research, 42(5), 1236–1253.
Chua, V. (2014). The contingent value of unmobilized social capital in getting a good job. Sociological Perspectives, 57(1), 124–143.
Cingano, F., & Rosolia, A. (2012). People I know: Job search and social networks. Journal of Labor Economics, 30(2), 291–332.
Coleman, J. S. (1990). Foundations of social theory. Cambridge, MA: Belknap Press.
Cook, K. S., Emerson, R. M., Gillmore, M. R., & Yamagishi, T. (1983). The distribution of power in exchange networks—Theory and experimental results. American Journal of Sociology, 89(2), 275–305.

Cornwell, E. Y., & Cornwell, B. (2008). Access to expertise as a form of social capital: An examination of race- and class-based disparities in network ties to experts. Sociological Perspectives, 51(4), 853–876.
Cross, R., & Cummings, J. N. (2004). Tie and network correlates of individual performance in knowledge-intensive work. Academy of Management Journal, 47(6), 928–937.
Davern, M. (1999). Social networks and prestige attainment. American Journal of Economics and Sociology, 58(4), 843–864.
Davis, J. A. (1970). Clustering and hierarchy in interpersonal relations: Testing two graph theoretical models on 742 sociomatrices. American Sociological Review, 35, 843–851.
de Vaan, M., Vedres, B., & Stark, D. (2015). Game changer: The topology of creativity. American Journal of Sociology, 120(4), 1144–1194.
Devine, T. J., & Kiefer, N. M. (1991). Empirical labor economics: The search approach. Oxford, UK: Oxford University Press.
Di Stasio, V., & Gërxhani, K. (2015). Employers’ social contacts and their hiring behavior in a factorial survey. Social Science Research, 51, 93–107.
Dirks, K. T., & Ferrin, D. L. (2001). The role of trust in organizational settings. Organization Science, 12(4), 450–467.
Dustmann, C., Glitz, A., Schönberg, U., & Brücker, H. (2016). Referral-based job search networks. Review of Economic Studies, 83(2), 514–546.
Elliott, J. R. (1999). Social isolation and labor market insulation. Sociological Quarterly, 40(2), 199–216.
Elliott, J. R. (2000). Class, race, and job matching in contemporary urban labor markets. Social Science Quarterly, 81, 1036–1052.
Emerson, R. M. (1962). Power-dependence relations. American Sociological Review, 27(1), 31–41.
Fernandez, R. M., Castilla, E. J., & Moore, P. (2000). Social capital at work: Networks and employment at a phone center. American Journal of Sociology, 105(5), 1288–1356.
Fernandez, R. M., & Fernandez-Mateo, I. (2006). Networks, race, and hiring. American Sociological Review, 71(1), 42–71.
Fernandez, R. M., & Weinberg, N. (1997). Sifting and sorting: Personal contacts and hiring in a retail bank. American Sociological Review, 62, 883–902.
Fernandez-Mateo, I., & Fernandez, R. M. (2016). Bending the pipeline? Executive search and gender inequality in hiring for top management jobs. Management Science, 62(12), 3636–3655.
Ferrin, D. L., Dirks, K. T., & Shah, P. P. (2006). Direct and indirect effects of third-party relationships on interpersonal trust. Journal of Applied Psychology, 91(4), 870–883.
Flap, H., Bulder, B., & Volker, B. (1998). Intra-organizational networks and performance: A review. Computational & Mathematical Organization Theory, 4(2), 109–147.
Fong, E., & Shen, J. (2016). Participation in voluntary associations and social contact of immigrants in Canada. American Behavioral Scientist, 60, 617–636.
Fontaine, F. (2008). Why are similar workers paid differently? The role of social networks. Journal of Economic Dynamics and Control, 32(12), 3960–3977.
Galenianos, M. (2014). Hiring through referrals. Journal of Economic Theory, 152, 304–323.
Garg, R., & Telang, R. (2017). To be or not to be linked: Online social networks and job search by unemployed workforce. Management Science, 64(8), 3469–3970.
Gargiulo, M., & Benassi, M. (2000). Trapped in your own net? Network cohesion, structural holes, and the adaptation of social capital. Organization Science, 11(2), 183–196.
Gargiulo, M., Ertug, G., & Galunic, C. (2009). The two faces of control: Network closure and individual performance among knowledge workers. Administrative Science Quarterly, 54(2), 299–333.

Gee, L. K., Jones, J., & Burke, M. (2016). Social networks and labor markets: How strong ties relate to job finding on Facebook’s social network. Journal of Labor Economics, 35(2), 485–518.
Gershon, I. (2017). Down and out in the new economy: How people find (or don’t find) work today. Chicago, IL: University of Chicago Press.
Gouldner, A. W. (1960). The norm of reciprocity: A preliminary statement. American Sociological Review, 25(2), 161–178.
Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 78(6), 1360–1380.
Granovetter, M. S. (1974). Getting a job: A study of contacts and careers. Cambridge, MA: Harvard University Press.
Granovetter, M., & Tilly, C. (1988). Inequality and labor process. In N. J. Smelser (Ed.), Handbook of sociology (pp. 175–221). Newbury Park, CA: Sage.
Green, G. P., Tigges, L. M., & Diaz, D. (1999). Racial and ethnic differences in job-search strategies in Atlanta, Boston, and Los Angeles. Social Science Quarterly, 80(2), 263.
Habinek, J., Martin, J. L., & Zablocki, B. D. (2015). Double-embeddedness: Spatial and relational contexts of tie persistence and re-formation. Social Networks, 42, 27–41.
Hamm, L., & McDonald, S. (2015). Helping hands: Race, neighborhood context, and reluctance in providing job-finding assistance. Sociological Quarterly, 56(3), 539–557.
Hamori, M. (2010). Who gets headhunted—and who gets ahead? The impact of search firms on executive careers. Academy of Management Perspectives, 24(4), 46–59.
Hansen, M. T. (2002). Knowledge networks: Explaining effective knowledge sharing in multiunit companies. Organization Science, 13(3), 232–248.
Hinds, P. J., Carley, K. M., Krackhardt, D., & Wholey, D. (2000). Choosing work group members: Balancing similarity, competence, and familiarity. Organizational Behavior and Human Decision Processes, 81(2), 226–251.
Homans, G. C. (1951). The human group (Vol. xxxiv). Piscataway, NJ: Transaction Publishers.
Huang, X., & Western, M. (2011). Social networks and occupational attainment in Australia. Sociology, 45(2), 269–286.
Huffman, M. L., & Torres, L. (2002). It’s not only “who you know” that matters: Gender, personal contacts, and job lead quality. Gender & Society, 16(6), 793–813.
Kanter, R. M. (1977). Men and women of the corporation. New York, NY: Basic Books.
Kim, H. H. (2009). Networks, information transfer, and status conferral: The role of social capital in income stratification among lawyers. Sociological Quarterly, 50(1), 61–87.
Kim, M., & Fernandez, R. M. (2017). Strength matters: Tie strength as a causal driver of networks’ information benefits. Social Science Research, 65, 268–281.
Kmec, J. A., McDonald, S., & Trimble, L. B. (2010). Making gender fit and “correcting” gender misfits: Sex segregated employment and the nonsearch process. Gender & Society, 24(2), 213–236.
Kmec, J. A., & Trimble, L. B. (2009). Does it pay to have a network contact? Social network ties, workplace racial context, and pay outcomes. Social Science Research, 38(2), 266–278.
Krackhardt, D. (1990). Assessing the political landscape—Structure, cognition, and power in organizations. Administrative Science Quarterly, 35(2), 342–369.
Krackhardt, D., & Hanson, J. R. (1993). Informal networks: The company behind the charts. Harvard Business Review, 71(4), 104–111.
Krackhardt, D., & Kilduff, M. (2002). Structure, culture and Simmelian ties in entrepreneurial firms. Social Networks, 24, 279–290.

Larquier, G. de, & Marchal, E. (2016). Does the formalization of practices enhance equal hiring opportunities? An analysis of a French nation-wide employer survey. Socio-Economic Review, 14(3), 567–589.
Lazega, E., Mounier, L., Snijders, T., & Tubaro, P. (2012). Norms, status and the dynamics of advice networks: A case study. Social Networks, 34(3), 323–332.
Lazega, E., & Pattison, P. E. (1999). Multiplexity, generalized exchange and cooperation in organizations: A case study. Social Networks, 21(1), 67–90.
Lin, N. (1999). Social networks and status attainment. Annual Review of Sociology, 25, 467–487.
Lin, N. (2000). Inequality in social capital. Contemporary Sociology, 29(6), 785–795.
Lin, N. (2001). Social capital: A theory of social structure and action. Cambridge, UK: Cambridge University Press.
Lin, N., & Ao, D. (2008). The invisible hand of social capital: An exploratory study. In N. Lin & B. H. Erickson (Eds.), Social capital: An international research program (pp. 107–132). Oxford, UK: Oxford University Press.
Lin, N., Ensel, W. M., & Vaughn, J. C. (1981). Social resources and strength of ties: Structural factors in occupational status attainment. American Sociological Review, 46(4), 393–405.
Lippman, S. A., & McCall, J. (1976). The economics of job search: A survey. Economic Inquiry, 14(2), 155–189.
MacLeod, J. (2010). Ain’t no makin’ it (3rd ed.). Boulder, CO: Westview Press.
Marin, A. (2012). Don’t mention it: Why people don’t share job information, when they do, and why it matters. Social Networks, 34(2), 181–192.
Marin, A. (2013). Who can tell? Network diversity, within-industry networks, and opportunities to share job information. Sociological Forum, 28(2), 350–372.
Marmaros, D., & Sacerdote, B. (2002). Peer and social networks in job search. European Economic Review, 46(4), 870–879.
Marx, K., & Engels, F. (1955). Capital. Chicago, IL: Encyclopædia Britannica.
Mayo, E. (1933). The human problems of an industrial civilization. New York, NY: Macmillan.
McDonald, S. (2010). Right place, right time: Serendipity and informal job matching. Socio-Economic Review, 8(2), 307–331.
McDonald, S. (2011a). What you know or who you know? Occupation-specific work experience and job matching through social networks. Social Science Research, 40(6), 1664–1675.
McDonald, S. (2011b). What’s in the “old boys” network? Accessing social capital in gendered and racialized networks. Social Networks, 33(4), 317–330.
McDonald, S. (2015). Network effects across the earnings distribution: Payoffs to visible and invisible job finding assistance. Social Science Research, 49, 299–313.
McDonald, S., & Benton, R. A. (2013). Social capital. In V. Smith (Ed.), Sociology of work encyclopedia (pp. 791–793). Thousand Oaks, CA: Sage.
McDonald, S., Benton, R. A., & Warner, D. F. (2012). Dual embeddedness: Informal job matching and labor market institutions in the United States and Germany. Social Forces, 91(1), 75–97.
McDonald, S., Chen, F., & Mair, C. A. (2015). Cross-national patterns of social capital accumulation: Network resources and aging in China, Taiwan, and the United States. American Behavioral Scientist, 59(8), 914–930.
McDonald, S., & Day, J. C. (2010). Race, gender, and the invisible hand of social capital. Sociology Compass, 4(7), 532–543.
McDonald, S., & Elder, G. H. (2006). When does social capital matter? Non-searching for jobs across the life course. Social Forces, 85(1), 521–549.

McDonald, S., Gaddis, S. M., Trimble, L. B., & Hamm, L. (2013). Frontiers of sociological research on networks, work, and inequality. In S. McDonald (Ed.), Networks, work and inequality (Vol. 24, pp. 1–41). Bingley, UK: Emerald.
McDonald, S., Lin, N., & Ao, D. (2009). Networks of opportunity: Gender, race, and job leads. Social Problems, 56(3), 385–402.
McDonald, S., & Mair, C. A. (2010). Social capital across the life course: Age and gendered patterns of network resources. Sociological Forum, 25(2), 335–359.
McGuire, G. M. (2000). Gender, race, ethnicity, and networks: The factors affecting the status of employees’ network members. Work and Occupations, 27(4), 501–524.
McGuire, G. M. (2002). Gender, race, and the shadow structure: A study of informal networks and inequality in a work organization. Gender & Society, 16(3), 303–322.
McPherson, J. M., & Smith-Lovin, L. (1986). Sex segregation in voluntary associations. American Sociological Review, 51(1), 61–79.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.
Merlino, L. P. (2014). Formal and informal job search. Economics Letters, 125(3), 350–352.
Mizruchi, M. S., & Stearns, L. B. (2001). Getting deals done: The use of social networks in bank decision-making. American Sociological Review, 66(5), 647–671.
Montgomery, J. D. (1991). Social networks and labor-market outcomes: Toward an economic analysis. American Economic Review, 81(5), 1408–1418.
Moody, J., & White, D. R. (2003). Structural cohesion and embeddedness: A hierarchical concept of social groups. American Sociological Review, 68(1), 103–127.
Moren-Cross, J., & Lin, N. (2008). Access to social capital and status attainment in the United States: Racial/ethnic and gender differences. In N. Lin & B. H. Erickson (Eds.), Social capital: An international research program (pp. 364–379). Oxford, UK: Oxford University Press.
Mortensen, D. T., & Pissarides, C. A. (1999). New developments in models of search in the labor market. In O. C. Ashenfelter & D. Card (Eds.), Handbook of labor economics (pp. 2567–2627). Amsterdam, Netherlands: North-Holland.
Mouw, T. (2003). Social capital and finding a job: Do contacts matter? American Sociological Review, 68, 868–898.
Obukhova, E. (2012). Motivation vs. relevance: Using strong ties to find a job in urban China. Social Science Research, 41(3), 570–580.
Obukhova, E., & Lan, G. (2013). Do job seekers benefit from contacts? A direct test with contemporaneous searches. Management Science, 59(10), 2204–2216.
Obukhova, E., & Rubineau, B. (2016). Democratizing referrals: Market transition and labor market networks in China. Academy of Management Proceedings, 2016(1), 17414.
Obukhova, E., & Zhang, L. (2017). Social capital and job search in urban China: The strength-of-strong-ties hypothesis revisited. Chinese Sociological Review, 49(4), 340–361.
Pager, D., Western, B., & Bonikowski, B. (2009). Discrimination in a low-wage labor market: A field experiment. American Sociological Review, 74(5), 777–799.
Parks-Yancy, R. (2006). The effects of social group membership and social capital resources on careers. Journal of Black Studies, 36(4), 515–545.
Parks-Yancy, R., DiTomaso, N., & Post, C. (2006). The social capital resources of gender and class groups. Sociological Spectrum, 26(1), 85–113.
Pellizzari, M. (2010). Do friends and relatives really help in getting a good job? Industrial & Labor Relations Review, 63(3), 494–510.
Petersen, T., Saporta, I., & Seidel, M. L. (2000). Offering a job: Meritocracy and social networks. American Journal of Sociology, 106(3), 763–816.

Pettigrew, A. (1973). The politics of organizational decision-making. London, UK: Tavistock.
Pfeffer, J. (2010). Power. New York, NY: HarperCollins Publishers.
Podolny, J. M., & Baron, J. N. (1997). Resources and relationships: Social networks and mobility in the workplace. American Sociological Review, 62(5), 673–693.
Porter, C. M., & Woo, S. E. (2015). Untangling the networking phenomenon: A dynamic psychological perspective on how and why people network. Journal of Management, 41(5), 1477–1500.
Portes, A. (1998). Social capital: Its origins and applications in modern sociology. Annual Review of Sociology, 24, 1–24.
Putnam, R. D. (2001). Bowling alone: The collapse and revival of American community. New York, NY: Simon and Schuster.
Rankin, B. H., & Quane, J. M. (2000). Neighborhood poverty and the social isolation of inner-city African American families. Social Forces, 79(1), 139–164.
Reagans, R., & McEvily, B. (2003). Network structure and knowledge transfer: The effects of cohesion and range. Administrative Science Quarterly, 48(2), 240–267.
Rivera, M. T., Soderstrom, S. B., & Uzzi, B. (2010). Dynamics of dyads in social networks: Assortative, relational, and proximity mechanisms. Annual Review of Sociology, 36(1), 91–115.
Roethlisberger, F. J., & Dickson, W. J. (1939). Management and the worker. Cambridge, MA: Harvard University Press.
Rotolo, T., & Wilson, J. (2003). Work histories and voluntary association memberships. Sociological Forum, 18(4), 603–619.
Rotolo, T., & Wharton, A. (2003). Living across institutions: Exploring sex-based homophily in occupations and voluntary groups. Sociological Perspectives, 46(1), 59.
Rotolo, T., & Wilson, J. (2007). Sex segregation in volunteer work. Sociological Quarterly, 48(3), 559–585.
Royster, D. (2003). Race and the invisible hand: How white networks exclude black men from blue-collar jobs. Berkeley, CA: University of California Press.
Rubineau, B., & Fernandez, R. M. (2015). Tipping points: The gender segregating and desegregating effects of network recruitment. Organization Science, 26(6), 1646–1664.
Ruiter, S., & De Graaf, N. D. (2009). Socio-economic payoffs of voluntary association involvement: A Dutch life course study. European Sociological Review, 25(4), 425–442.
Sharone, O. (2017). LinkedIn or LinkedOut? How social networking sites are reshaping the labor market. Research in the Sociology of Work, 30, 1–31.
Shen, J. (2013). How can one get ahead in the contemporary labour market of China?—Examining the changing stratification mechanisms through job-attainment patterns. Journal of Management and Sustainability, 3(2), 132–144.
Shen, J., & Bian, Y. (2018). The causal effect of social capital on income: A new analytic strategy. Social Networks, 54, 82–90.
Silva, F. (2018). The strength of whites’ ties: How employers reward the referrals of black and white jobseekers. Social Forces, 97(2), 741–768.
Simmel, G. (1950). The sociology of Georg Simmel. (K. H. Wolf, Trans.). New York, NY: Free Press.
Smith, S. S. (2000). Mobilizing social resources: Race, ethnic, and gender differences in social capital and persisting wage inequalities. Sociological Quarterly, 41(4), 509–537.
Smith, S. S. (2005). “Don’t put my name on it”: Social capital activation and job-finding assistance among the black urban poor. American Journal of Sociology, 111(1), 1–57.
Smith, S. S. (2007). Lone pursuit: Distrust and defensive individualism among the black poor. New York, NY: Russell Sage Foundation.

582   Steve McDonald and Richard A. Benton Smith, S. S., & Young, K. A. (2017). Want, need, fit: The cultural logics of job-matching assistance. Work and Occupations, 44(2), 171–209. Son, J. (2015). Institutional affiliation as a measure of organizational social capital: A case study of Korea. Social Indicators Research, 129(2), 1–18. Son, J., & Lin, N. (2008). Social capital and civic action: A network-based approach. Social Science Research, 37(1), 330–349. Son, J., & Lin, N. (2012). Network diversity, contact diversity, and status attainment. Social Networks, 34(4), 601–613. Sparrowe, R. T., Liden, R. C., Wayne, S. J., & Kraimer, M. L. (2001). Social networks and the performance of individuals and groups. Academy of Management Journal, 44(2), 316–325. Stainback, K. (2008). Social contacts and race/ethnic job matching. Social Forces, 87(2), 857–886. Stevenson, W. B. (1990). Formal structure and networks of interaction within organizations. Social Science Research, 19(2), 113–131. Stupnytska, Y., & Zaharieva, A. (2015). Explaining U-shape of the referral hiring pattern in a search model with heterogeneous workers. Journal of Economic Behavior & Organization, 119, 211–233. Tian, F. F., & Liu, X. (2018). Gendered double embeddedness: Finding jobs through networks in the Chinese labor market. Social Networks, 52, 28–36. Tocqueville, A. de. (1889). Democracy in America (J. P. Mayer, Ed., M. Lerner, Trans.). New York, NY: Harper & Row. Tonnies, F. (1887). Community and society (C. P. Loomis, Trans.). East Lansing, MI: Michigan State University Press. Totterdell, P., Holman, D., & Hukin, A. (2008). Social networkers: Measuring and examining individual differences in propensity to connect with others. Social Networks, 30(4), 283–296. Trimble, L. B., & Kmec, J. A. (2011). The role of social networks in getting a job. Sociology Compass, 5(2), 165–178. Trimble O’Connor, L. (2013). Ask and you shall receive: Social network contacts’ provision of help during the job search. 
Social Networks, 35(4), 593–603. Van Hoye, G., van Hooft, E. A. J., & Lievens, F. (2009). Networking as a job search behaviour: A social network perspective. Journal of Occupational and Organizational Psychology, 82(3), 661–682. Vedres, B., & Stark, D. (2010). Structural folds: Generative disruption in overlapping groups. American Journal of Sociology, 115(4), 1150–1190. Wanberg, C.  R., Kanfer, R., & Banas, J.  T. (2000). Predictors and outcomes of networking intensity among unemployed job seekers. Journal of Applied Psychology, 85(4), 491–503. Wilson, G. (1997). Pathways to power: Racial differences in the determinants of job authority. Social Problems, 44(1), 38–54. Yakubovich, V. (2005). Weak ties, information, and influence: How workers find jobs in a local Russian labor market. American Sociological Review, 70(3), 408–421. Zaharieva, A. (2015). Social contacts and referrals in a labor market with on-the-job search. Labour Economics, 32, 27–43. Zenou, Y. (2015). A dynamic model of weak and strong ties in the labor market. Journal of Labor Economics, 33(4), 891–932. Zhang, Y., & Lin, N. (2016). Hiring for networks: Social capital and staffing practices in transitional China. Human Resource Management, 55(4), 615–635.

CHAPTER 31

The International Trade Network

Min Zhou

Remarkable expansion of international trade has made the world more tightly connected. The international trade network (ITN) is one of the most prominent manifestations of economic globalization. It reflects complex interconnection and interdependence among national economies. It has thus become a frontier of scholarship at the intersection of social network analysis (SNA) and economic globalization. This chapter discusses key contributions SNA studies have made to our knowledge about international trade. Generally speaking, the existing literature applies SNA to the ITN in three distinct directions. The first line of inquiry originates from world system theory and attempts to substantiate the hierarchical structure of the international community envisioned by that theory. The second line of inquiry describes the topological structure and evolution of the ITN. It reveals that the ITN possesses typical properties of a complex, rather than random, network. The third line of inquiry employs various modeling techniques to explain the ITN. Its key question is why the ITN takes the shape that is observed. The gravity model of international trade popular among economists has identified various factors affecting the formation and evolution of the ITN. With a functional form reminiscent of the law of gravity in physics, the simplest version of the gravity model predicts bilateral trade flows based on the economic sizes of two countries (often using gross domestic product as a proxy) and the distance between them. More sophisticated versions of the gravity model have been widely employed to evaluate various determinants of bilateral trade flows. The SNA modeling strategies largely build upon the conventional gravity model borrowed from international economics but make important improvements. They incorporate network characteristics such as social homophily, systemic factors, and topological properties into the gravity model. 
They also make use of estimating methods developed for network data such as the multivariate regression quadratic assignment procedure (MRQAP), which leads to better statistical estimation of the gravity model. In future research, instead of relying on the gravity model, it is also promising to directly use SNA models such as the exponential random graph model (ERGM) to explain the ITN.


Data and Measurement in ITN Studies

Before elaborating on the three lines of inquiry, I first introduce the network data and measurement commonly used in SNA studies of the ITN. Almost all versions of data on international trade used by SNA scholars are derived from two major sources: the United Nations Comtrade database and the International Monetary Fund Direction of Trade Statistics (DOTS) database. Due to the relational nature of SNA, data usually consist of bilateral trade between every pair of countries, and the dyad-year (i.e., bilateral trade between a country pair in a particular year) is the conventional unit of measurement. Countries are represented by nodes (or vertices), and trade relations between countries are denoted by ties (or edges) connecting nodes.

While some studies look at total trade between two countries and treat the ITN as undirected, more and more studies see the ITN as directed and distinguish imports from exports. Despite some correlation between them, imports and exports operate by different mechanisms and have different impacts on the country pair, so it is better to distinguish them. Lumping them together also entails loss of information.

Earlier studies sometimes treat the ITN as binary and simply consider the presence or absence of a tie between two countries. For example, a tie is either present or absent depending on whether the value of bilateral trade is greater than a given threshold (zero, one million, or other designated values). This simplification of the ITN into a binary network loses information and does not generate much insight into the strength of a tie. For instance, a 1-million trade relationship is treated the same as a 100-billion trade relationship. More and more studies treat the ITN as a valued network. Instead of simply examining whether there is a tie between two countries, they measure the strength of the tie, usually by actual trade volume. 
As discussed later, the literature making use of the gravity model borrowed from international economics is particularly keen on explaining the strength (trade volume) of a tie.

There are two more variations of ITN data. One variation aims to simplify ITN data and highlight important relations in the ITN. Either binary or valued ITN data can be huge and sometimes pose challenges for SNA, especially visualization. Some trade relations are important, while others are insignificant or even negligible for international trade overall. Scholars have developed different methods that extract the backbone of the ITN without losing much information. There are three ways to construct such ITN data (Zhou, Wu, & Xu, 2016). First, we can set a threshold for the volume of bilateral trade and keep only those trade relations above this threshold in the network. The selection of a higher threshold results in a network made up of more important trade ties. Second, we can calculate the proportion of each trade relation relative to the country’s total trade and keep only those relations whose proportion is greater than a designated threshold. Only those trade ties above this threshold are considered relatively important to individual countries and are thus kept in the network. The selection of a higher proportion as the threshold results in a network with more important trade relations. The third method is to use the ranking of a country’s trade relations. For each country, we rank its trade relations with other countries by volume and then keep only the top-ranked trade relations. We can set the standard as top 1, top 2, and so on. The top 1 network consists only of each country’s topmost trade relation, the top 2 network consists of each country’s top two trade relations, and so on.

The last two methods (the selection of a proportion threshold or a top-ranking threshold) are particularly good at highlighting relatively important trade relations of individual countries. For instance, the two-billion trade relationship between the United States and Haiti should not be treated as equally important for the United States and for Haiti. Obviously this trade relationship matters much more for Haiti than for the United States. Setting a proportion threshold or a top-ranking threshold can address this concern. When the research focuses on Haiti’s trade network, this relationship would remain in the data. When the focus is on the trade network of the United States, it may not pass the designated threshold and would thus be omitted. The proportion or top-ranking threshold helps construct an ITN that takes relative importance into account.
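The three backbone-extraction rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in any cited study: the flow values are invented, the function names are hypothetical, and the proportion rule is computed here relative to each exporter's total exports, which is only one possible reading of "the country's total trade".

```python
from collections import defaultdict

# Hypothetical directed export flows: (exporter, importer) -> value.
# The numbers are invented purely for illustration.
flows = {
    ("USA", "CHN"): 120000.0,
    ("USA", "HTI"): 1200.0,
    ("CHN", "USA"): 450000.0,
    ("HTI", "USA"): 900.0,
    ("HTI", "DOM"): 60.0,
    ("DOM", "USA"): 5000.0,
    ("DOM", "HTI"): 800.0,
}


def absolute_backbone(flows, min_value):
    """Method 1: keep ties whose trade volume reaches an absolute threshold."""
    return {pair: v for pair, v in flows.items() if v >= min_value}


def proportion_backbone(flows, min_share):
    """Method 2: keep ties that make up at least `min_share` of the
    exporter's total exports (an assumed definition of 'total trade')."""
    totals = defaultdict(float)
    for (exporter, _), v in flows.items():
        totals[exporter] += v
    return {
        (exporter, importer): v
        for (exporter, importer), v in flows.items()
        if v / totals[exporter] >= min_share
    }


def top_k_backbone(flows, k):
    """Method 3: keep each exporter's k largest ties."""
    by_exporter = defaultdict(list)
    for (exporter, importer), v in flows.items():
        by_exporter[exporter].append((v, importer))
    kept = {}
    for exporter, partners in by_exporter.items():
        for v, importer in sorted(partners, reverse=True)[:k]:
            kept[(exporter, importer)] = v
    return kept
```

With these invented numbers, an absolute threshold of 1,000 drops Haiti's exports to the United States, whereas a 10 percent proportion threshold keeps that tie for Haiti and drops the reverse tie for the United States, which mirrors the asymmetry in relative importance discussed above.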

World System Classification in the ITN

Among the three distinct directions of SNA research on international trade, the first line of inquiry, inspired by world system theory, is arguably the earliest. World system theory portrays the ITN as consisting of a dense and cohesive core and a sparsely connected periphery (and sometimes a semiperiphery somewhere in between). Within the ITN, core countries are capable of trading with many countries, whereas peripheral countries are constrained to develop trade only with core countries, not among themselves (Chase-Dunn & Grimes, 1995; Wallerstein, 1974). The difference in world system status provides core countries with structural advantages while disadvantaging peripheral ones and making them economically dependent on core countries.

Guided by this theory, Snyder and Kick (1979) and Breiger (1981) were among the first to apply SNA to the ITN. This line of inquiry almost uniformly attempts to assess the extent to which the ITN exhibits a core-periphery interaction pattern, using various types of blockmodeling analysis (Breiger, 1981; Clark & Beckfield, 2009; Mahutga, 2006; Mahutga & Smith, 2011; Nemeth & Smith, 1985; Reichardt & White, 2007; Smith & White, 1992; Snyder & Kick, 1979; Van Rossem, 1996). The key motivation of these network studies is to empirically classify countries into different structural positions, usually three: the core, the semiperiphery, and the periphery. Some studies identify more than three positions or treat world system status as a continuum rather than as discrete positions.

To classify countries into various world system statuses, this literature uses SNA to examine how similar or different countries are from each other in terms of trade relations with other countries. Early studies (e.g., Snyder & Kick, 1979) employed structural equivalence: two countries are structurally equivalent if they trade with exactly the same other countries. 
Later studies switched to role equivalence, where two countries are seen as role equivalent if they trade with similar types of countries based on position. Despite the different measures in these studies, they all apply some form of blockmodeling to an adjacency matrix of bilateral trade. They permute the rows and columns of the matrix to generate blocks representing world system status. Specific techniques may vary. For instance, classifications within the matrix can be generated by different procedures such as maximizing the density of the core and making the core have a density close to 1 (or a “1-block” in the terminology of blockmodeling), minimizing the density of the periphery (close to a “0-block”), maximizing the difference between the density in the core and that in the periphery, or maximizing the correlation between the observed matrix and the ideal core-periphery matrix.

The use of SNA by this literature is limited, however. Prior research in this vein mainly employs SNA, especially blockmodeling, to reveal a core-periphery hierarchy among countries while paying little attention to other topological features of the ITN. It is not very interested in explaining the strength of trade ties between specific country pairs either.
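To make the fit criteria just listed concrete, the following Python sketch scores one candidate core-periphery partition of a small binary, undirected network. It reports the core density, the periphery density, and the correlation between the observed ties and an idealized pattern. The function names and the toy network are hypothetical, and the ideal pattern assumed here (a tie is expected whenever at least one of the two countries is in the core) is only one of several conventions; a full blockmodeling routine would also search over partitions instead of scoring a given one.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def core_periphery_fit(adj, core):
    """Score a candidate partition of a binary undirected network.

    adj  : dict mapping each node to the set of its trade partners
    core : set of nodes assigned to the core (all others are periphery)
    """
    observed, ideal = [], []
    core_ties = core_pairs = peri_ties = peri_pairs = 0
    for a, b in combinations(sorted(adj), 2):
        tie = 1 if b in adj[a] else 0
        members_in_core = (a in core) + (b in core)
        observed.append(tie)
        # Assumed ideal pattern: a tie wherever the core is involved.
        ideal.append(1 if members_in_core >= 1 else 0)
        if members_in_core == 2:
            core_pairs += 1
            core_ties += tie
        elif members_in_core == 0:
            peri_pairs += 1
            peri_ties += tie
    return {
        "core_density": core_ties / core_pairs if core_pairs else 0.0,
        "periphery_density": peri_ties / peri_pairs if peri_pairs else 0.0,
        "correlation": pearson(observed, ideal),
    }


# Toy network: A, B, C trade with one another; D, E, F each trade with one of them.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "E"), ("C", "F")]
adj = {n: set() for n in "ABCDEF"}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

good_fit = core_periphery_fit(adj, core={"A", "B", "C"})
bad_fit = core_periphery_fit(adj, core={"D", "E", "F"})
```

In this toy case the partition with A, B, and C as the core yields a 1-block core and a 0-block periphery, and its correlation with the ideal pattern is higher than that of the reversed partition.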

Topological Properties of the ITN

The second line of inquiry examines various topological properties of the ITN. With the recent rise of “the new science of networks” across many disciplines (Barabási, 2002; Watts, 2004), more scholars in economics, mathematics, and even physics have started adopting a network perspective on economic activities (Easley & Kleinberg, 2010; Goyal, 2007; Jackson, 2008; Schweitzer et al., 2009). This new approach sees international trade as a complex network and sheds light on the topology of the ITN (Baskaran et al., 2011; Bhattacharya et al., 2008; Fagiolo, 2010; Fagiolo, Reyes, & Schiavo, 2008, 2009; Garlaschelli & Loffredo, 2004, 2005; Garlaschelli et al., 2007; Li, Jin, & Chen, 2003; Picciolo et al., 2012; Serrano & Boguna, 2003; Serrano, Boguna, & Vespignani, 2007; Squartini, Fagiolo, & Garlaschelli, 2011a, 2011b; Wilhite, 2001).

Some of the key topological properties in which scholars are interested include the node degree distribution (the number of nonzero or substantial trade relations of a country), average nearest-neighbor degree (the average number of partners of the neighbors of a given country), clustering coefficient (the fraction of a country’s partners who are themselves partners), and degree-degree correlation (the correlation between a country’s degree and the average degree of its nearest neighbors). Overall this literature finds that the ITN possesses typical properties of complex networks, including the “small world” property (a highly clustered network with small path lengths), a scale-free (or power law-shaped) degree distribution, a high clustering coefficient, and the presence of degree-degree correlation. All these topological properties indicate that international trade cannot be simply reduced to a random-network description.

It is worth noting that topological properties of the ITN may vary depending on how the ITN is measured. 
Topological properties of the ITN measured as a valued network differ from those of its binary version (Fagiolo et al., 2008). Selecting a different importance threshold when constructing the ITN data also affects observed topological properties (Zhou et al., 2016). Revealed topological properties may also depend on standard determinants of international bilateral trade such as the economic size of the countries involved, the geographic distance between the two countries, and so on. However, even after all factors known to affect bilateral trade flows have been removed, the “residual” ITN still follows a complex-network pattern. The residual ITN can be seen as the variance that remains unexplained after all standard determinants of bilateral trade flows have been accounted for. The residual ITN displays even more impressive signatures of complex networks, such as a power-law-shaped degree distribution of nodes. This suggests that some endogenous self-organizing mechanisms are at work.
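The node-level measures defined in this section are simple to compute for a binary, undirected network. The sketch below is illustrative only: the toy network (one hub and four spokes) and all function names are hypothetical, and the degree-degree correlation is taken as the Pearson correlation between each node's degree and its average nearest-neighbor degree, following the definition given above.

```python
from itertools import combinations
from math import sqrt


def degree(adj, i):
    """Number of trade partners of node i."""
    return len(adj[i])


def avg_nearest_neighbor_degree(adj, i):
    """Average degree of the partners of node i."""
    return sum(len(adj[j]) for j in adj[i]) / len(adj[i])


def clustering(adj, i):
    """Fraction of pairs of i's partners that are themselves partners."""
    nbrs = sorted(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Toy network: hub H trades with four spokes; spokes S1 and S2 also trade with each other.
edges = [("H", "S1"), ("H", "S2"), ("H", "S3"), ("H", "S4"), ("S1", "S2")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

nodes = sorted(adj)
degrees = [degree(adj, n) for n in nodes]
annd = [avg_nearest_neighbor_degree(adj, n) for n in nodes]
degree_degree_corr = pearson(degrees, annd)
```

In this toy network the hub has the highest degree but the lowest average nearest-neighbor degree and a low clustering coefficient, so the degree-degree correlation is negative. The example shows only how the quantities are computed; it says nothing about their values in actual trade data.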

Despite differences in topological properties revealed by different scholars using differing data and measurement, some consensus exists. Key topological properties can be summarized as follows (Garlaschelli & Loffredo, 2004, 2005; Serrano & Boguna, 2003).

In terms of first-order topological properties, notably, the node degree distribution is scale invariant and follows a power-law shape. The ITN is both robust and vulnerable as a result. Randomly removing some countries would not have a big impact on the scale-free network, but if a fraction of well-connected countries (trade hubs) are targeted and removed, the network would collapse easily. Countries are highly heterogeneous in their importance for the whole ITN. The scale-free network would not be easily destabilized, as most countries have no such impact. On the other hand, its stability is also heavily dependent on a few primary nodes. The dynamics (such as crises or other economic shocks) of these hubs can easily diffuse to the whole network and thus dominate the dynamics of the whole ITN.

In terms of second-order topological properties, the degree-degree correlation is usually negative, suggesting a disassortative tendency (i.e., in the network, nodes of a low degree are more likely to connect with nodes of a high degree). Countries with many trade partners are on average connected with countries holding fewer partners. The more poorly connected a country is, the more likely it is to be tied to (and thus dependent on) much better-connected countries.

In terms of third-order topological properties, the correlation between a country’s degree and its clustering coefficient is negative, so partners of well-connected countries are less interconnected than those of poorly connected ones. This property implies that well-connected countries are more likely to be “brokers” that occupy structural hole positions. 
A well-connected country’s partners are less likely to be connected among themselves without this well-connected country serving as an intermediary. Nevertheless, in studies using the valued ITN, the correlation is positive: countries with more intense trade relations are more likely to form strongly connected trade triangles. This property reveals many strongly connected trade cliques in the ITN.

The correlations between the first-order property (especially the node degree) and higher-order properties (such as the degree-degree correlation and clustering coefficient) suggest a hierarchical structure in the ITN. If there were no hierarchy, these higher-order properties should be close to a random quantity independent of the node degree. In practice, they are conditional on the node degree. Finally, all these topological properties have been stable over time. Individual trade relations may change over time, but the ITN is a highly stable system overall (Fagiolo et al., 2009; Zhou et al., 2016).

This line of inquiry is often technical and uses sophisticated computational procedures. Most scholars in this literature are from physics, mathematics, or information sciences. Nevertheless, it is also highly descriptive and not sufficiently theory informed. More effort should be put into connecting revealed topological properties with economic and sociological theories. Questions that go beyond description in future research can include: What are the determinants of these topological properties? Are there relevant nodal, dyadic, or systemic characteristics (including those outside the ITN itself) that can explain the peculiar topological properties observed in the ITN? What are the consequences and implications of these topological properties for international trade or economic development? How do these properties support or challenge theories of international trade?


Explaining the ITN

The third line of inquiry models the structure and evolution of the ITN through various SNA modeling strategies. The key motivation of this literature is to make use of SNA to better explain observed trade patterns among countries. This literature is thus explanatory and sheds light on why the ITN takes on the shape as observed. Most models in this literature are closely related to the gravity model of international trade popular in international economics. There have been fruitful efforts in bridging SNA with the gravity model. Some identify major problems with the gravity model and make important improvements. These improvements include highlighting the importance of social homophily in international trade, incorporating network characteristics (such as systemic factors and topological properties) when modeling international trade, and using SNA-based techniques that generate better statistical estimation.

Effect of Homophily on the ITN

It is well established in international economics that bilateral trade is well described empirically by the gravity model. Its most basic version relates trade between two countries positively to their economic size and negatively to the geographic distance between them. Similar to the Newtonian law of gravity in physics, this basic form of the gravity model can be typically specified as:

$\mathrm{Trade}_{ij} = A \, (\mathrm{GDP}_i \, \mathrm{GDP}_j) / \mathrm{Dist}_{ij}$,

where $\mathrm{Trade}_{ij}$ is the value of bilateral trade flows between country $i$ and country $j$, the gross domestic products (GDPs) are their respective economic sizes, $\mathrm{Dist}_{ij}$ is a measure of the distance between them, and $A$ is a constant. The conventional procedure is to take the logarithms of the original multiplicative gravity equation. The dependent variable becomes $\ln(\mathrm{Trade}_{ij})$, and then the gravity model can be estimated in its linear form. Extensions to this basic gravity model have been widely employed to discern the marginal explanatory power of other trade-related variables of interest. The gravity model has solid theoretical foundations (Anderson, 1979; Bergstrand, 1985; Deardorff, 1998; Feenstra, Markusen, & Rose, 2001) and successful empirical applications (Frankel, 1997; Rose, 2004, 2005; Ingram, Robinson, & Busch, 2005; Zhou, 2010, 2011). It is seen as “one of the great success stories in empirical economics” (Feenstra et al., 2001, p. 431).

SNA scholars have long been interested in understanding why actors develop strong ties with some partners but not others. The most reliable principle that shapes formation of ties in various social networks is homophily (McPherson, Smith-Lovin, & Cook, 2001). Homophily can be broadly defined as attraction generated by similarity in two actors’ characteristics. Simply put, similar actors are more likely to develop stronger ties. 
Homophily is found to operate in the ITN as well: similarities between two countries, especially geographic, political, and sociocultural proximity, generate stronger bilateral trade ties (Frankel, 1997; Zhou, 2010, 2011). Adopting a homophily perspective facilitates incorporation of seemingly noneconomic factors into the gravity model and enhances its explanatory power.
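The log-linearization mentioned above can be written out explicitly. The first line below follows directly from taking logarithms of the basic gravity equation. The second line is a sketch of a commonly used estimating form and is an assumption added for illustration, not a specification taken from this chapter: the coefficients $\beta_0,\dots,\beta_3$, the coefficient vector $\boldsymbol{\delta}$, the error term $\varepsilon_{ij}$, and the vector $\mathbf{H}_{ij}$ of dyadic homophily covariates (for example, indicators of political or cultural similarity) are all hypothetical notation.

```latex
\begin{align}
\ln \mathrm{Trade}_{ij} &= \ln A + \ln \mathrm{GDP}_i + \ln \mathrm{GDP}_j - \ln \mathrm{Dist}_{ij}, \\
\ln \mathrm{Trade}_{ij} &= \beta_0 + \beta_1 \ln \mathrm{GDP}_i + \beta_2 \ln \mathrm{GDP}_j
  + \beta_3 \ln \mathrm{Dist}_{ij} + \boldsymbol{\delta}^{\top} \mathbf{H}_{ij} + \varepsilon_{ij}.
\end{align}
```

The basic model corresponds to the special case $\beta_0 = \ln A$, $\beta_1 = \beta_2 = 1$, $\beta_3 = -1$, and $\boldsymbol{\delta} = \mathbf{0}$. In the extended form, a homophily effect would show up as nonzero elements of $\boldsymbol{\delta}$, with the sign depending on whether a covariate measures similarity or difference.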

Geography is “the most basic source of homophily” (McPherson et al., 2001, p. 429). Proximity in geographic distance brings about more opportunities and less effort for social interactions. There is a higher chance of establishing a tie with those close in geographic distance than those who are far away. International trade is a sort of social relation linking territorially distinct countries across space through economic exchanges. Geographic homophily exists in the ITN too. It is common sense that trading with neighboring countries is less costly than with distant ones. More interestingly, geographic homophily has not diminished over recent decades, despite remarkable progress in transportation and communications technologies (Zhou, 2010, 2011). Theories such as new economic geography stress an increasing importance of geographical homophily as the real force driving the high regional concentration of trade (Krugman, 1991; Fujita, Krugman, & Venables, 1999). Regionalization has become a prominent trend in the ITN as a result (Kim & Shin, 2002).

Politics is found to be another source of homophily in the ITN. Similarity in political systems between two countries promotes affinity and helps them to carry out more liberal trade policies toward each other (Mansfield, Milner, & Rosendorff, 2000; Milner & Kubota, 2005). An extreme example is the deep divide in international trade along political ideologies during the Cold War era. Due to political homophily, two politically similar countries are more likely to develop stronger trade ties (Bliss & Russett, 1998; Dixon & Moon, 1993; Oneal & Russett, 1997). This political homophily is still discernible in today’s ITN, but it has not increased in recent years (Zhou, 2010, 2011).

Culture also generates salient homophily in the ITN. Deep-rooted cultural differences remain noticeable among countries despite globalizing trends (Hofstede, 2001; Inglehart & Baker, 2000). 
Similar cultural background facilitates communication and transaction in economic exchange (Bandelj, 2002; Elsass & Veiga, 1994; Neal, 1998). Practices of economic exchange are culturally specific, and different cultural practices hamper business activities across borders. Different national cultures make it difficult to understand and predict the behavior of others, thereby impeding mutual trust and complicating effective interactions. The risks and uncertainties inherent in international trade lead countries to trade more with their culturally similar counterparts. Inertia or path dependence also plays a role. Cultural ties forged within former colonial empires also act as trade-promoting cultural homophily. These historical colonial ties give rise to relatively stable trade networks, such that trading preference between former colonizers and colonies is still discernible (Louis & Robinson, 1994). Recent studies have confirmed the continuous or even increasing importance of cultural homophily in the ITN (Zhou, 2010, 2011). This literature suggests that countries may increasingly favor their geographically and culturally proximate counterparts in international trade, thereby giving rise to intensified homophily in the overall ITN. Some studies try to explain this intensified homophily. For instance, an analysis of international bilateral trade data at the sector level produces such an explanation (Zhou, 2011). Disintegration of productive activities and product differentiation are two increasingly pronounced trends in international economic activities. First, firms are increasingly able to divide the production process into different stages and locate them over the globe according to comparative advantages. To produce a commodity, many intermediate components are brought together from various countries, and they contribute greatly to the expansion of today’s international trade. 
This increasing disintegration of productive activities expands the homophily-sensitive intermediate input sector in international trade. Second, there are many varieties of the same commodity in the global market made by different countries, and a country imports and exports the same commodity but of different varieties. Facing product differentiation, consumers choose a product not simply for its basic function, but also for its cultural and symbolic content. This increasing product differentiation expands the homophily-sensitive finished good sector in international trade. Expansion of international trade in homophily-sensitive sectors such as the intermediate input and finished good sectors outpaces that in less homophily-sensitive sectors such as raw materials. Consequently, this differential expansion of trade across sectors shifts the composition of overall international trade and makes it more subject to homophily.

Overall, this observed homophily along geographic, political, and cultural lines reveals multiple dimensions within international trade (Zhou, 2010). Instead of becoming autonomized or “socially decontextualized” (Hirst & Thompson, 1999, p. 10), the ITN is still embedded in various international sociocultural networks. International trade should not be reduced to pure economic markets. The presence of these various forms of homophily reconfirms the importance of an SNA perspective on international trade.

Effect of Systemic Equivalence on the ITN

It has been noticed that most studies on international trade view bilateral trade as simply the business of the two countries involved and thus isolate the dyad from the overall international trade system. Notably, the conventional gravity model sees bilateral trade flows as a product of mutual attraction (or so-called gravity) generated by the two countries themselves, without considering the broader systemic context the dyad is embedded in. Even the addition of the homophily idea to the gravity model remains at the dyadic level. This dyadic view of international trade has been criticized, and some network scholars go beyond the dyadic level and incorporate higher-level network factors into the study of international trade. This SNA literature draws attention to countries’ structural locations within the broader ITN and examines how these systemic positions affect their trade with each other. In particular, it finds that similarity in systemic positions influences bilateral trade ties.

Two types of structural similarity have been examined at the higher systemic level. They are structural equivalence and role equivalence. Two countries are seen as perfectly structurally equivalent if they trade with identical other countries in the network (Wasserman & Faust, 1994, p. 356). One hundred percent structural equivalence is rare, but we can measure the degree of structural equivalence. In contrast, two role-equivalent countries have trade ties with the same types of other countries. Role equivalence does not require the two countries to have identical ties with identical others as required by structural equivalence. Two role-equivalent countries relate in the same ways with other countries who are themselves in the same positions (Wasserman & Faust, 1994, p. 473). 
Despite this conceptual difference, both structural equivalence and role equivalence are at the systemic level because they are based on the comparison of two countries’ respective trade relations with others in the system. They do not presuppose a direct dyadic tie between two countries (Zhou, 2013). Both types of systemic equivalence have been applied to the ITN and are found to influence bilateral trade ties. The influence of role equivalence is stressed in SNA studies inspired by world system theory. Role equivalence, operationalized by blockmodeling, is used by these studies to identify world system status. Two countries share the same world system status if they are role equivalent; that is, they maintain ties to the same types of others in the world system.

World system theory contends that world system status in the overall ITN matters for bilateral trade ties. Two core countries have large bilateral trade flows, while bilateral trade ties are weak between two peripheral countries. Peripheral countries often have their foreign trade relations concentrated in those with a few core countries (such as their former colonizers).

On the other hand, structural equivalence is employed to capture the common third-party effect (Kim & Skvoretz, 2010; Zhou & Park, 2012). The SNA literature suggests a positive influence of structural equivalence on tie formation between actors (Burt, 1987, 1988; Friedkin, 1984; Mizruchi, 1990, 1993). Several studies reveal the trade-promoting effect of structural equivalence. Kim and Skvoretz’s (2010) study on the “third-party effect” and Zhou and Park’s (2012) study on the cohesion effect of structural equivalence both find that two countries develop more bilateral trade when they share more common third trade partners. According to them, the trade-promoting effect of structural equivalence operates through three mechanisms. First, countries sharing the same third partners are more exposed to common sociocultural values and market preferences. Second, common third partners provide ideal locales where two countries meet and discover business opportunities in each other’s markets. Third, structural equivalence exposes the two countries to common formal and informal institutions, which facilitates their interaction with each other and even promotes convergence of their institutions. Taken together, structural equivalence promotes common social values, information flows, and institutional convergence between two countries, thereby generating more bilateral trade. 
This third-party effect reflects the presence of triadic closure in the ITN, a Simmelian property found in many other social networks (e.g., two strangers have a higher probability of becoming friends if they share common friends).
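The notion of "common third trade partners" can be made concrete with a small computation. The following is only a minimal sketch, assuming a hypothetical four-country trade matrix and a deliberately simple binarization rule (a tie exists if any trade flows in either direction); it is not the operationalization used by Kim and Skvoretz (2010) or Zhou and Park (2012).

```python
import numpy as np

def shared_third_partners(trade: np.ndarray) -> np.ndarray:
    """For every pair (i, j), count the third countries that trade with both."""
    # Assumption: a tie exists if any trade flows in either direction.
    ties = ((trade + trade.T) > 0).astype(int)
    np.fill_diagonal(ties, 0)
    # (ties @ ties)[i, j] sums ties[i, k] * ties[k, j] over all k,
    # i.e. the number of countries k tied to both i and j.
    common = ties @ ties
    np.fill_diagonal(common, 0)
    return common

# Hypothetical trade matrix: rows export to columns.
trade = np.array([
    [0, 0, 5, 2],
    [0, 0, 3, 1],
    [4, 2, 0, 0],
    [1, 0, 0, 0],
])
common = shared_third_partners(trade)
print(common[0, 1])  # countries 0 and 1 both trade with countries 2 and 3
```

A count of this kind could enter a gravity-type regression as a dyadic covariate, which is one way to read the third-party effect described above.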

Effect of Topological Properties on the ITN

Also responding to limitations of the traditional gravity model, some scholars have begun incorporating topological properties into modeling the ITN. The traditional gravity model is limited to the dyadic level and neglects the fact that international trade occurs in a complex (not a random) network. For instance, the work by Baskaran et al. (2011) recognizes the ITN as a scale-free network and thus brings the network parameter, gamma, into the gravity model as a key control variable. The node degree distribution is an essential topological characteristic of a scale-free network. Gamma is a parameter created to characterize the degree distribution. Roughly speaking, gamma captures how evenly or unevenly trade connections are distributed across countries: a smaller gamma indicates that the ITN is more even and that trade relations are less concentrated. Hence, gamma is a network variable used as a proxy for the topological structure of the ITN. Incorporating this network variable into the gravity model improves the modeling of international trade. Failure to control for the network structure of the ITN may cause omitted variable bias and contaminate estimation of the gravity model.

The importance of controlling for topological properties when modeling international trade is also confirmed by Fagiolo (2010) using the framework of the gravity model residual ITN. Fagiolo (2010) examines the estimated residuals of the gravity model of international trade. In this residual ITN, all dyadic factors known to the gravity model literature that affect bilateral trade flows have been removed. It is found that this residual ITN still displays topological properties of a complex network, rather than a random network. Hence, all those standard determinants of bilateral trade flows in the gravity model literature cannot fully account for the observed pattern of the ITN. This finding again suggests that the traditional gravity model may actually have omitted variable problems and that incorporation of topological properties is required for a more accurate model of international trade.
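The idea of a residual ITN can be illustrated with a short sketch. It assumes the common log-linear form of the gravity equation (log trade regressed on the logs of both countries' GDP and of their distance) and uses synthetic data; the function name and variables are hypothetical, and Fagiolo's (2010) actual specification includes more covariates.

```python
import numpy as np

def gravity_residuals(flow, gdp, dist):
    """Fit ln(flow_ij) = b0 + b1 ln(gdp_i) + b2 ln(gdp_j) + b3 ln(dist_ij)
    by OLS and return the coefficients and the matrix of residuals."""
    n = flow.shape[0]
    i, j = np.where(~np.eye(n, dtype=bool))  # all ordered pairs with i != j
    X = np.column_stack([
        np.ones(i.size),
        np.log(gdp[i]),
        np.log(gdp[j]),
        np.log(dist[i, j]),
    ])
    y = np.log(flow[i, j])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = np.zeros((n, n))
    resid[i, j] = y - X @ beta  # the "residual ITN": what gravity leaves unexplained
    return beta, resid

# Synthetic example data (not real trade figures).
rng = np.random.default_rng(0)
n = 6
gdp = rng.uniform(1.0, 10.0, size=n)
xy = rng.uniform(0.0, 1.0, size=(n, 2))
dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
np.fill_diagonal(dist, 1.0)  # placeholder; the diagonal is never used
flow = np.outer(gdp, gdp) / dist * np.exp(rng.normal(0.0, 0.1, size=(n, n)))

beta, resid = gravity_residuals(flow, gdp, dist)
print(beta)
```

The residual matrix can then be treated as a weighted network in its own right and examined for the topological properties discussed above.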

Multivariate Regression Quadratic Assignment Procedure

In addition to neglecting systemic influences and network properties in the modeling, another limitation of the gravity model literature lies in its statistical estimation. When studying bilateral trade, because the same country appears in multiple dyads, the assumption of observation independence is violated. This dyadic dependence may affect the effectiveness of standard statistical tests. SNA scholars have developed the MRQAP to address possible interdependence across observations often seen in dyadic data, or dyadic autocorrelation (Dekker, Krackhardt, & Snijders, 2007; Krackhardt, 1987, 1988). Conventional statistical analysis can be combined with the MRQAP to better estimate dyadic data. The MRQAP especially helps produce more reliable tests of statistical significance.

Here is how the MRQAP works. For the sake of simplicity I use its combination with ordinary least squares (OLS) regression as an example. The procedure begins by producing a random permutation of the adjacency matrix (reordering the rows and columns together, which preserves the original structure of the matrix) and then runs multiple (often hundreds of) iterations of the OLS model with randomly permuted matrices. The coefficients obtained from these iterations are compared with the coefficients in the original OLS model. The proportion of permutations in which the permutation coefficient exceeds the observed OLS coefficient is called an MRQAP probability. It can be seen as a pseudo t-test and indicates the statistical reliability of the original OLS results. This test can be interpreted like conventional tests of significance: a result of less than 5% provides evidence that the original OLS coefficient is statistically significant. The MRQAP does not require an assumption of independent observations. It is robust against dyadic autocorrelation and allows us to assess the efficiency of the OLS results.
Some scholars have already adopted the MRQAP in estimating the gravity model of international trade (Zhou, 2011; Zhou & Park, 2012). This procedure generates more reliable estimation of significant variables influencing international bilateral trade. Nevertheless, it is worth noting that the MRQAP itself does not account for higher-level effects, such as systemic factors or network topologies, so using it alone does not solve potential omitted variable problems and thus may still produce inappropriate results.
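The permutation logic described above can be sketched in a few lines. This is a simplified illustration that permutes only the dependent matrix and compares absolute coefficient values (a two-sided variant, which is an assumption of this sketch); it is not the double semi-partialling procedure of Dekker, Krackhardt, and Snijders (2007), and the function names and simulated data are hypothetical.

```python
import numpy as np

def ols_coefs(y_mat, x_mats, mask):
    """OLS on the off-diagonal cells of the matrices (intercept first)."""
    X = np.column_stack([np.ones(mask.sum())] + [x[mask] for x in x_mats])
    beta, *_ = np.linalg.lstsq(X, y_mat[mask], rcond=None)
    return beta

def qap_regression(y_mat, x_mats, n_perm=500, seed=0):
    """Return observed OLS coefficients and permutation-based probabilities."""
    n = y_mat.shape[0]
    mask = ~np.eye(n, dtype=bool)  # ignore self-ties
    observed = ols_coefs(y_mat, x_mats, mask)
    rng = np.random.default_rng(seed)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Apply the same reordering to rows and columns, so each
        # country keeps its whole profile of ties.
        permuted = y_mat[np.ix_(p, p)]
        beta = ols_coefs(permuted, x_mats, mask)
        exceed += np.abs(beta) >= np.abs(observed)
    return observed, exceed / n_perm

# Simulated dyadic data: y depends on x with a true slope of 2.
rng = np.random.default_rng(1)
n = 15
x = rng.normal(size=(n, n))
y = 2.0 * x + rng.normal(scale=0.5, size=(n, n))

observed, pvals = qap_regression(y, [x], n_perm=500, seed=0)
print(observed[1], pvals[1])
```

Because whole rows and columns move together, the dependence among dyads that share a country is retained in every permuted dataset, which is why the resulting reference distribution does not rely on independent observations.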

A Future Direction: Exponential Random Graph Model

To sum up the SNA modeling of international trade discussed earlier, building on the conventional gravity model, SNA scholars so far have made two major contributions. First, they incorporate key network variables such as social homophily, systemic equivalence, and topological properties into the gravity model, in the hope of solving omitted variable problems in the conventional gravity model. Second, they develop better statistical procedures, such as the MRQAP, that accommodate dyadic interdependence inherent in international bilateral trade data and produce more reliable statistical tests. Taken together, these modeling strategies are still under the gravity model framework.

Another promising modeling strategy is to directly use SNA models such as the ERGM (Holland & Leinhardt, 1981; Snijders et al., 2006; Strauss & Ikeda, 1990; Wasserman & Pattison, 1996). The ERGM models social networks parametrically based on random graph theory. It detects how the structural characteristics of the observed network differ from those of random networks and estimates how different network formation processes influence network characteristics differently. Very few studies have actually made use of the ERGM to explore the ITN, but this modeling technique has considerable potential. First, the ERGM allows interdependence among observations. As mentioned previously, due to the relational nature of bilateral trade data, observations are interdependent rather than independent. The trade tie between two countries is conditional on their ties with others in the network. As a result, statistical modeling of ITN data should differ from conventional statistical methods. The ERGM is a model developed specifically for network data. Second, the ERGM estimates the influence of network formation processes, including both processes endogenous to the network and processes stemming from exogenous attributes, on network structure.
On the one hand, similar to the gravity model framework, the ERGM can accommodate exogenous factors, including characteristics of the countries, such as attributes of the sender and the receiver, dyadic or higher-level factors, or variables derived from the external network environment. More importantly, it can control for or even reveal the endogenous self-organizing effects of the ITN such as density, reciprocity, and transitivity. For instance, the density effect captures the baseline tendency for ties to occur when all relevant explanatory variables are held constant. Even if all relevant explanatory variables have been considered, the ITN would still have some ties and display a certain density just by chance. The reciprocity effect captures the reciprocal tendency between countries in the directed ITN (i.e., the exporter is also likely to be an importer in a bilateral trade relationship). The transitivity effect captures the common third-party effect discussed previously: countries sharing more common third trade partners are more likely to trade intensively with each other. Many other possible endogenous processes are worth exploring too, including isolates, out two-star or in two-star, simple two-path, cyclic closure, and transitive closure, to name a few. These endogenous processes cannot be fully accounted for by exogenous variables in the gravity model. Usually after the gravity model has been estimated, the residuals still display patterns of a complex network that remain unexplained (Fagiolo, 2010). These patterns unaccounted for in the gravity model are very likely due to endogenous self-organizing processes. In this sense, the ERGM is superior to the gravity model as it better accounts for these endogenous processes. In future research the ERGM is a promising tool that will generate more insight into the ITN.
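An ERGM expresses the probability of a network as proportional to the exponential of a weighted sum of network statistics, so the endogenous effects named above correspond to counts that can be computed directly from a directed adjacency matrix. The sketch below computes three of them for a hypothetical three-country network; it only illustrates the statistics and does not estimate a model, for which dedicated software would be needed.

```python
import numpy as np

def ergm_statistics(adj):
    """Count basic endogenous configurations in a directed binary network."""
    a = (np.asarray(adj) > 0).astype(int)
    np.fill_diagonal(a, 0)
    n = a.shape[0]
    edges = int(a.sum())
    # Mutual dyads: i -> j and j -> i (each dyad counted once).
    mutual = int((a * a.T).sum() // 2)
    # Transitive triples: i -> j, j -> k, and i -> k.
    transitive = int(((a @ a) * a).sum())
    return {
        "edges": edges,
        "density": edges / (n * (n - 1)),
        "mutual": mutual,
        "transitive_triples": transitive,
    }

# Hypothetical directed trade ties: rows export to columns.
adj = [
    [0, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
]
stats = ergm_statistics(adj)
print(stats)
```

In an estimated ERGM, each such count would receive a coefficient indicating whether the configuration occurs more or less often than expected given the other terms in the model.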

References

Anderson, J. E. (1979). A theoretical foundation for the gravity equation. American Economic Review, 69(1), 106–116.
Bandelj, N. (2002). Embedded economies: Social relations as determinants of foreign direct investment in Central and Eastern Europe. Social Forces, 81(2), 411–444.
Barabási, A.-L. (2002). Linked: The new science of networks. Cambridge, MA: Perseus Publishing.
Baskaran, T., Blöchl, F., Brück, T., & Theis, F. J. (2011). The Heckscher-Ohlin model and the network structure of international trade. International Review of Economics & Finance, 20(2), 135–145.
Bergstrand, J. H. (1985). The gravity equation in international trade: Some microeconomic foundations and empirical evidence. Review of Economics and Statistics, 67(3), 474–481.
Bhattacharya, K., Mukherjee, G., Saramäki, J., Kaski, K., & Manna, S. (2008). The international trade network: Weighted network analysis and modeling. Journal of Statistical Mechanics Theory and Experiment, 2008, P02002. doi:10.1088/1742-5468/2008/02/P02002
Bliss, H., & Russett, B. (1998). Democratic trading partners: The liberal connection, 1962–1989. Journal of Politics, 60(4), 1126–1147.
Breiger, R. L. (1981). Structures of economic interdependence among nations. In P. Blau & R. Merton (Eds.), Continuities in structural inquiry (pp. 353–380). Beverly Hills, CA: Sage.
Burt, R. S. (1987). Social contagion and innovation: Cohesion versus structural equivalence. American Journal of Sociology, 92, 1287–1335.
Burt, R. S. (1988). Some properties of structural equivalence measures derived from sociometric choice data. Social Networks, 10, 1–28.
Chase-Dunn, C., & Grimes, P. (1995). World-systems analysis. Annual Review of Sociology, 21, 387–417.
Clark, R., & Beckfield, J. (2009). A new trichotomous measure of world-system position using the international trade network. International Journal of Comparative Sociology, 50, 5–38.
Deardorff, A. V. (1998). Determinants of bilateral trade: Does gravity work in a neoclassical world? In J. A. Frankel (Ed.), The regionalization of the world economy (pp. 7–22). Chicago, IL: University of Chicago Press.
Dekker, D., Krackhardt, D., & Snijders, T. A. B. (2007). Sensitivity of MRQAP tests to collinearity and autocorrelation conditions. Psychometrika, 72(4), 563–581.
Dixon, W. J., & Moon, B. E. (1993). Political similarity and American foreign trade patterns. Political Research Quarterly, 46(1), 5–25.
Easley, D., & Kleinberg, J. (2010). Networks, crowds, and markets: Reasoning about a highly connected world. New York, NY: Cambridge University Press.
Elsass, P. M., & Veiga, J. F. (1994). Acculturation in acquired organizations: A force-field perspective. Human Relations, 47(4), 431–453.
Fagiolo, G. (2010). The international-trade network: Gravity equations and topological properties. Journal of Economic Interaction and Coordination, 5(1), 1–25.
Fagiolo, G., Reyes, J. A., & Schiavo, S. (2008). On the topological properties of the world trade web: A weighted network analysis. Physica A: Statistical Mechanics and Its Applications, 387(15), 3868–3873.
Fagiolo, G., Reyes, J., & Schiavo, S. (2009). World-trade web: Topological properties, dynamics, and evolution. Physical Review E, 79(3), 036115.
Feenstra, R. C., Markusen, J. R., & Rose, A. K. (2001). Using the gravity equation to differentiate among alternative theories of trade. Canadian Journal of Economics, 34(2), 430–447.
Frankel, J. A. (1997). Regional trading blocs in the world economic system. Washington, DC: Institute for International Economics.
Friedkin, N. E. (1984). Structural cohesion and equivalence explanations of social homogeneity. Sociological Methods and Research, 12, 235–261.
Fujita, M., Krugman, P., & Venables, A. J. (1999). The spatial economy: Cities, regions, and international trade. Cambridge, MA: MIT Press.
Garlaschelli, D., Di Matteo, T., Aste, T., Caldarelli, G., & Loffredo, M. I. (2007). Interplay between topology and dynamics in the world trade web. European Physical Journal B, 57, 159–164.
Garlaschelli, D., & Loffredo, M. I. (2004). Fitness-dependent topological properties of the world trade web. Physical Review Letters, 93(18), 188701. doi:10.1103/PhysRevLett.93.188701
Garlaschelli, D., & Loffredo, M. I. (2005). Structure and evolution of the world trade network. Physica A: Statistical Mechanics and Its Applications, 355(1), 138–144.
Goyal, S. (2007). Connections: An introduction to the economics of networks. Princeton, NJ: Princeton University Press.
Hirst, P., & Thompson, G. (1999). Globalization in question: The international economy and the possibilities of governance (2nd ed.). Cambridge, UK: Polity Press.
Hofstede, G. (2001). Culture’s consequences: Comparing values, behaviors, institutions, and organizations across nations (2nd ed.). Thousand Oaks, CA: Sage Publications.
Holland, P. W., & Leinhardt, S. (1981). An exponential family of probability-distributions for directed-graphs. Journal of the American Statistical Association, 76(373), 33–50.
Inglehart, R., & Baker, W. E. (2000). Modernization, cultural change, and the persistence of traditional value. American Sociological Review, 65(1), 19–51.
Ingram, P., Robinson, J., & Busch, M. L. (2005). The intergovernmental network of world trade: IGO connectedness, governance, and embeddedness. American Journal of Sociology, 111, 824–858.
Jackson, M. O. (2008). Social and economic networks. Princeton, NJ: Princeton University Press.
Kim, S., & Shin, E.-H. (2002). A longitudinal analysis of globalization and regionalization in international trade: A social network approach. Social Forces, 81, 445–468.
Kim, S., & Skvoretz, J. (2010). Embedded trade: A third party effect. Social Science Quarterly, 91(4), 964–983.
Krackhardt, D. (1987). QAP partialling as a test of spuriousness. Social Networks, 9(2), 171–186.
Krackhardt, D. (1988). Predicting with networks: Nonparametric multiple-regression analysis of dyadic data. Social Networks, 10(4), 359–381.
Krugman, P. (1991). Geography and trade. Cambridge, MA: MIT Press.
Li, X., Jin, Y. Y., & Chen, G. (2003). Complexity and synchronization of the world trade web. Physica A: Statistical Mechanics and Its Applications, 328(1–2), 287–296.
Louis, W. R., & Robinson, R. (1994). The imperialism of decolonization. Journal of Imperial and Commonwealth History, 22(3), 462–511.
Mahutga, M. C. (2006). The persistence of structural inequality? A network analysis of international trade, 1965–2000. Social Forces, 84, 1863–1889.
Mahutga, M. C., & Smith, D. A. (2011). Globalization, the structure of the world economy and economic development. Social Science Research, 40, 257–272.
Mansfield, E. D., Milner, H. V., & Rosendorff, B. P. (2000). Free to trade: Democracies, autocracies, and international trade. American Political Science Review, 94(2), 305–321.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.
Milner, H. V., & Kubota, K. (2005). Why the move to free trade? Democracy and trade policy in the developing countries. International Organization, 59(1), 107–143.
Mizruchi, M. S. (1990). Cohesion, structural equivalence, and similarity of behavior: An approach to the study of corporate political power. Sociological Theory, 8, 16–32.
Mizruchi, M. S. (1993). Cohesion, equivalence, and similarity of behavior: A theoretical and empirical assessment. Social Networks, 15, 275–307.
Neal, M. (1998). The cultural factor: Cross-national management and the foreign venture. London, UK: MacMillan Press.
Nemeth, R. J., & Smith, D. A. (1985). International trade and world-system structure: A multiple network analysis. Review (Fernand Braudel Center), 8(4), 517–560.
Oneal, J. R., & Russett, B. (1997). The classic liberals were right: Democracy, interdependence, and conflict, 1950–1985. International Studies Quarterly, 41(2), 267–294.
Picciolo, F., Squartini, T., Ruzzenenti, F., Basosi, R., & Garlaschelli, D. (2012). The role of distances in the world trade web. arXiv preprint, arXiv: 1210.3269. doi:10.1109/SITIS.2012.118
Reichardt, J., & White, D. R. (2007). Role models for complex networks. European Physical Journal B, 60, 217–224.
Rose, A. K. (2004). Do we really know that the WTO increases trade? American Economic Review, 94(1), 98–114.
Rose, A. K. (2005). Which international institutions promote international trade? Review of International Economics, 13(4), 682–698.
Schweitzer, F., Fagiolo, G., Sornette, D., Vega-Redondo, F., Vespignani, A., & White, D. R. (2009). Economic networks: The new challenges. Science, 325, 422–425.
Serrano, M. A., & Boguna, M. (2003). Topology of the world trade web. Physical Review E, 68, 015101(R). doi:10.1103/PhysRevE.68.015101
Serrano, M. A., Marian, B., & Vespignani, A. (2007). Patterns of dominant flows in the world trade web. Journal of Economic Interaction and Coordination, 2(2), 111–124.
Smith, D. A., & White, D. R. (1992). Structure and dynamics of the global economy: Network analysis of international trade 1965–1980. Social Forces, 70, 857–893.
Snijders, T. A. B., Pattison, P. E., Robins, G., & Handcock, M. S. (2006). New specifications for exponential random graph models. Sociological Methodology, 36, 99–153.
Snyder, D., & Kick, E. L. (1979). Structural position in the world system and economic growth, 1955–1970: A multiple-network analysis of transnational interactions. American Journal of Sociology, 84, 1096–1126.
Squartini, T., Fagiolo, G., & Garlaschelli, D. (2011a). Randomizing world trade: A binary network analysis. Physical Review E, 84, 046117. doi:10.1103/PhysRevE.84.046117
Squartini, T., Fagiolo, G., & Garlaschelli, D. (2011b). Randomizing world trade II: A weighted network analysis. Physical Review E, 84, 046118. doi:10.1103/PhysRevE.84.046118
Strauss, D., & Ikeda, M. (1990). Pseudolikelihood estimation for social networks. Journal of the American Statistical Association, 85(409), 204–212.
Van Rossem, R. (1996). The world system paradigm as general theory of development: A cross-national test. American Sociological Review, 61(3), 508–527.
Wallerstein, I. (1974). The modern world-system. New York, NY: Academic Press.
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge, UK: Cambridge University Press.
Wasserman, S., & Pattison, P. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika, 61(3), 401–425.
Watts, D. J. (2004). The “new” science of networks. Annual Review of Sociology, 30(1), 243–270.
Wilhite, A. (2001). Bilateral trade and “small-world” networks. Computational Economics, 18(1), 49–64.
Zhou, M. (2010). Multidimensionality and gravity in global trade, 1950–2000. Social Forces, 88(4), 1619–1643.
Zhou, M. (2011). Intensification of geo-cultural homophily in global trade: Evidence from the gravity model. Social Science Research, 40(1), 193–209.
Zhou, M. (2013). Substitution and stratification: The interplay between dyadic and systemic proximity in global trade, 1993–2005. Sociological Quarterly, 54(2), 302–334.
Zhou, M., & Park, C. (2012). The cohesion effect of structural equivalence on global bilateral trade, 1948–2000. International Sociology, 27(4), 502–523.
Zhou, M., Wu, G., & Xu, H. (2016). Structure and formation of top networks in international trade, 2001–2010. Social Networks, 44, 9–21.

CHAPTER 32

Maps of Science, Technology, and Education

Katy Börner

For centuries, cartographic maps of earth and water have guided human exploration. They have marked the border between the known and the unknown, firing the imagination and fueling the desire for new knowledge and new experience. Today, science maps serve as visual interfaces to immense amounts of data, depicting people, objects, and their (social) relationships in ways that allow us to effectively discern apparent outliers, clusters, and trends. The Places & Spaces: Mapping Science exhibit (http://scimaps.org), for instance, features more than 100 maps of science and technology (S&T) together with several interactive data visualizations, called macroscopes. As an example, Figure 32.1 shows a world map with an overlay of the research collaboration network created by Olivier H. Beauchesne (2011a, 2011b). Elsevier’s Scopus publication data was used to compute the number of times two authors in different cities appear on one paper together during the years 2005–2009. As can be seen, research collaborations are truly global. There is a high density of collaborations within Europe and, to a lesser extent, within North America. Africa and South America often collaborate with the countries that formerly colonized them. Background information on the history and utility of science maps together with many examples can be found in the Atlas of Science (Börner, 2010), Atlas of Knowledge (Börner, 2015), and science mapping review by Chen (2017). An overview of popular science mapping tools can be found in Cobo et al. (2011). Hands-on tutorials for diverse tools are available in the Information Visualization MOOC (http://ivmooc.cns.iu.edu).

In this chapter, we review the general process by which maps of S&T are created using the theoretical visualization framework introduced in Börner (2015). We showcase the power of maps to not only help locate us in physical space but also help us understand the social networks in which we operate, the extent and structure of our collective science and technology knowledge, and the learning pathways individuals and cohorts of students take. Sample maps of science, technology, and education will be used to illustrate the creation and usage of (interactive) data visualizations to make data-driven decisions. We conclude with a discussion and outlook.

Figure 32.1  Scientific collaborations between world cities by Olivier H. Beauchesne. Full color figures available on Oxford Handbooks Online.
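The counting step behind a collaboration map like Figure 32.1 (how often two authors in different cities appear on the same paper) can be sketched as follows. The input format, city names, and function name are assumptions made for illustration; this is not Beauchesne's actual pipeline, which worked from Scopus records.

```python
from collections import Counter
from itertools import combinations

def city_collaborations(papers):
    """Count author pairs located in different cities, per city pair.

    `papers` is a list of papers; each paper is a list holding the
    city of each of its authors (hypothetical input format).
    """
    counts = Counter()
    for author_cities in papers:
        for a, b in combinations(author_cities, 2):
            if a != b:  # only collaborations across cities
                counts[tuple(sorted((a, b)))] += 1
    return counts

# Made-up example records.
papers = [
    ["Bloomington", "Paris", "Paris"],
    ["Paris", "Bloomington"],
    ["Tokyo", "Tokyo"],
]
counts = city_collaborations(papers)
print(counts[("Bloomington", "Paris")])
```

Each resulting count would correspond to one line on the map, with the count mapped to a graphic variable such as line brightness or width.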

Map Design

Making sense of data by designing appropriate visualizations is a complex process that involves human perception and cognition (Palmer, 1999; Ware, 2012), but also data mining, visualization algorithms, and user interfaces. Different conceptualizations of the overall process have been developed to understand and optimize this process and to improve human decision-making capabilities. Among others, process models focus on key sense-making leverage points (Pirolli & Card, 2005), the match between preconceptualizations and expectations of visualization designers and visualization readers (Hook & Börner, 2005), major data transformation and visual mappings (Chi, 2000), or visualization design and interpretation to support workflow optimization and tool design.

When visualizing the structure and dynamics of science or technology, the data that needs to be represented is high dimensional and inherently complex. Many different types of visualizations can be used, and many different “mappings” of data attributes to visual attributes are possible. To ease the design of effective visualizations, different visualization frameworks (also called taxonomies or classifications) have been proposed in statistics, information visualization, and graphic design (Harris, 2000; Keim, 2001; Kosslyn, 1989; Mackinlay, 1986; Munzner, 2014; Shneiderman, 1996; Wilkinson, 2005). Börner (2015) provides predefined types for the process of data visualization including common “task types” and “insight need types” (see Table 32.1, left column). In general, maps address one or more task types in Börner (2015) that answer different questions. There are four general task types and associated questions: (1) temporal, answering “when” questions; (2) spatial, “where”; (3) topical, “what”; and (4) trees and network layouts, “with whom.”

Given well-defined general task types and specific insight need types, the final visualization will also depend on the type of data (see “data scale types,” Table 32.1, column 2), the available “visualization types” (Table 32.1, column 3), “graphic symbol types” (Table 32.1, column 4), and “graphic variable types” (Table 32.1, column 5; each type is further detailed in Börner, 2015; e.g., “retinal: form” includes size, shape, rotation, curvature, angle, and closure) that can be used, and the level of interaction required by the final visualization (Table 32.1, column 6). Each type is well defined and exemplified; see Table 32.2 on visualization types. Any visualization can theoretically be analyzed and interpreted as a path along the columns of Table 32.1. For example, given a scientific question, the question type and detailed insight need are identified, and then data of different scale(s) are acquired, a visualization type is selected, and relevant geometric symbol types are chosen and visually modified (e.g., color-coded) using different graphic variable types. Last but not least, different interaction types might be implemented to facilitate the interactive exploration of the visualization (see examples discussed in later sections).

Table 32.1  Visualization Framework Designed to Ease the Selection and Design of Data Visualizations

Insight Need Types (column 1)
• categorize/cluster
• order/rank/sort
• distributions (also outliers, gaps)
• comparisons
• trends (process and time)
• geospatial
• compositions (also of text)
• correlations/relationships

Data Scale Types (column 2)
• nominal
• ordinal
• interval
• ratio

Visualization Types (column 3)
• table
• chart
• graph
• map
• network layout

Graphic Symbol Types (column 4)
• geometric symbols: point, line, area, surface, volume
• linguistic symbols: text, numerals, punctuation marks
• pictorial symbols: images, icons, statistical graphs

Graphic Variable Types (column 5)
• spatial: position
• retinal: form, color, optics, motion

Interaction Types (column 6)
• overview
• zoom
• search and locate
• filter
• details on demand
• history
• extract
• link and brush
• projection
• distortion

Adapted from Börner (2015).
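The reading of a visualization as "a path along the columns of Table 32.1" can be expressed as a small data structure. The sketch below is a hypothetical encoding, not part of Börner's (2015) framework; the class names are invented here, and the example values are one possible reading of the collaboration map in Figure 32.1.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class DataScale(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"

@dataclass
class VisualizationPath:
    """One choice (or several) from each column of the framework table."""
    insight_need: str
    data_scales: List[DataScale]
    visualization_type: str
    graphic_symbols: List[str]
    graphic_variables: List[str]
    interactions: List[str] = field(default_factory=list)

    def describe(self) -> str:
        steps = [
            self.insight_need,
            ", ".join(s.value for s in self.data_scales),
            self.visualization_type,
            ", ".join(self.graphic_symbols),
            ", ".join(self.graphic_variables),
            ", ".join(self.interactions) or "none",
        ]
        return " -> ".join(steps)

# Illustrative reading of a static collaboration map (an assumption).
path = VisualizationPath(
    insight_need="geospatial",
    data_scales=[DataScale.RATIO],
    visualization_type="map",
    graphic_symbols=["line"],
    graphic_variables=["position", "color"],
)
print(path.describe())
```

Writing the choices down in this way makes it easier to compare two visualizations column by column, or to check that a design has addressed every column.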

Table 32.2  Visualization Types and Examples

Name: Tables
Description: Ordered arrangements of rows and columns in a grid. Grid cells may contain geometric, linguistic, or pictorial symbols.
Examples: Figures 32.6–32.9, 32.12

Name: Charts
Description: Depict quantitative and qualitative data without using a well-defined reference system.
Examples: Examples are pie charts in which the sequence of “pie slices” and the overall size of a “pie” are arbitrary, or word clouds.

Name: Graphs
Description: Plot quantitative and/or qualitative data variables to a well-defined reference system, such as coordinates on a horizontal or vertical axis.
Examples: Bar graphs in Figures 32.6–32.10, 32.12

Name: Maps
Description: Display data records visually according to their physical (spatial) relationships and show how data are distributed spatially.
Examples: Figures 32.1, 32.9

Name: Network layouts
Description: Use nodes to represent sets of data records, and links connecting nodes to represent relationships between those records.
Examples: Treemap in Figures 32.4, 32.5; tree layout in Figures 32.11, 32.12; network layouts in Figures 32.3, 32.6–32.8, 32.10

Map Utility

Maps of science can be used to explore, understand, and communicate social and scholarly networks and their interdependence, as well as the expertise profiles of institutes or nations; to chart career trajectories; and to identify emerging research frontiers, among other uses. They can show homogeneity versus heterogeneity, cause and effect, and relative speed of progress. They allow us to track the emergence, evolution, and disappearance of topics and help to identify the most promising areas of research.

Maps can be created for (interactive) exploration or the communication of insights. Maps might be created at the individual (micro) level or the global (macro) level to support different levels of decision making (see Figure 32.2). They may address one or more task types and associated questions such as (1) temporal, answering “when” questions; (2) spatial, “where”; (3) topical, “what”; and (4) trees and network layouts, “with whom.” Maps are used by policymakers, industry, scholars, or children to make more informed personal or professional decisions.

User and needs analysis studies are used to identify the best task type(s) and level (micro, meso, macro) of analysis and visualization design (see Figure 32.2). Studies at the micro-level might be possible by hand. Most tools support micro-to-meso-level studies. Macro-level studies might require highly scalable algorithms and advanced supercomputing infrastructures. The best approaches and tools depend on the type of analysis; for example, major geospatial tools are developed in cartography, while topical analysis tools are developed by linguists. Ultimately, user studies and/or validation studies should be conducted to ensure that the maps can be used by the intended stakeholder group and results are correct and easy to understand. Subsequently, we introduce maps that are designed for different stakeholder groups, address different insight needs, and view science, technology, and education data at different levels of analysis.

Figure 32.2  Type of analysis versus level of analysis. Full color figures available on Oxford Handbooks Online.

Exemplary Maps of Science, Technology, and Education

For many, science is rather abstract and nebulous. Maps of science give scholarly activity a physical space, also called a basemap; data overlays can be used to indicate the ever-changing structure and dynamics of social networks, diffusion pathways, scholarly and societal impact, or bursts and trends. Maps of science might be created using publication, funding, or social media data; maps of technology typically show patent, trademark, or stock market data; and maps of education use student learning activity data collected from student information systems or learning management systems.

Four interactive maps of science and technology are discussed. The Springer Nature SciGraph shows the multimodal network of journal articles, books and chapters, organizations, institutions, funders, research grants, patents, clinical trials, substances, conference series, events, citations, reference networks, altmetrics results, and links to research datasets. The NSF Graph Tool DIA2 visualizes National Science Foundation (NSF) funding data so that collaboration networks and funding portfolios can be better understood. The NIH CTSA Expertise Explorer helps biomedical researchers understand what expertise and resources different National Institutes of Health (NIH) research centers offer. The NIH Twitter Data Explorer shows the retweeting networks of NIH-relevant tweets. We also discuss two online visualization services that are intended to provide guidance for not only learners and teachers/curriculum designers but also employers and counselors interested in keeping up with the increasing speed of science and technology progress.

Springer Nature SciGraph

The Springer Nature SciGraph comprises 1.5 to 2 billion triples.1 Each triple has the format “subject→predicate→object” and may connect any subject or concept via a predicate (verb) to any other object to show the type of relationship existing between the subject and the object. An example would be “Smith→coauthored→Paper” or “Paper→acknowledges funding by→Award.” SciGraph links metadata extracted from journals and articles, books and chapters, organizations, institutions, funders, research grants, patents, clinical trials, substances, conference series, events, citations, reference networks, and altmetrics results. Other linked open data from trusted, high-quality sources such as Springer Nature are added continuously. SciGraph visualizations reveal how the rich semantic descriptions are related, overcoming former boundaries by relating comprehensive information about the research landscape. See Figure 32.3 for a rendering of the linked open data cloud by Evangelos Kalampokis. The overall goal is to increase discoverability of high-quality data as larger parts of the SciGraph data will be made freely available in various formats (CSV, JSON, XML) under a CC BY-NC 4.0 license.2
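The “subject→predicate→object” format can be illustrated with a toy in-memory triple store. This is a minimal sketch with invented helper names; the first two triples are the examples given in the text, the third is hypothetical, and nothing here reflects the actual SciGraph schema or query interface.

```python
from typing import List, NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

triples = [
    Triple("Smith", "coauthored", "Paper"),
    Triple("Paper", "acknowledges funding by", "Award"),
    Triple("Jones", "coauthored", "Paper"),  # hypothetical extra triple
]

def objects_of(store: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects linked to `subject` via `predicate`."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]

def subjects_of(store: List[Triple], predicate: str, obj: str) -> List[str]:
    """All subjects linked to `obj` via `predicate`."""
    return [t.subject for t in store if t.predicate == predicate and t.obj == obj]

# Who coauthored the paper, and which award funded it?
authors = subjects_of(triples, "coauthored", "Paper")
funders = objects_of(triples, "Paper", "acknowledges funding by")
print(authors, funders)
```

Chaining such lookups (author to paper to award) is what lets a triple collection connect people, publications, and funding in a single network.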

NSF Graph Tool DIA2

DIA2 (short for Deep Insights Anytime, Anywhere) was designed as a central resource for researchers, educators, and learners who are transforming undergraduate education in science, technology, engineering, and mathematics (STEM) (Madhavan et al., 2014). Using data on 246,902 NSF funding awards made between 1995 and 2016, it renders diverse visualizations in support of six well-defined user needs: (1) understanding the NSF organizational structure and the number of awards distributed across directorates (see Figure 32.4), where clicking on a specific directorate brings up a treemap with all programs, and clicking on a specific program results in a listing of awards and their (co-)principal investigators, institutions, program officers, and geolocations; (2) searching using the thesaurus concept to find out which program to submit a proposal to (see Figure 32.5); (3) exploring the network of funded investigators via the People Explorer to understand existing scholarly networks and identify collaborators and competitors (see query result for “Borner” in Figure 32.6); (4) exploring the network of institutions to identify potential collaborators at a specific institution (see query result for “Indiana University” in Figure 32.7); (5) examining NSF programs in DIA2 (Madhavan et al., 2014);

604   Katy Börner

figure 32.3  Major structure of the #LinkedData cloud by Evangelos Kalampokis. Full color figures available on Oxford Handbooks Online. Interactive version is available at http://lod-cloud.net/versions/2017-01-26/cloudImage2017.svg.

and (6) searching for funded projects on specific topics (see query result for “visualization” in Figure 32.8). Each user-requested visualization is added to the user's personal dashboard. Dashboards can be easily optimized by adding, modifying, or deleting visualizations or by changing their placement. DIA2 is easy to use and addresses important user needs. It was designed for a user group characterized by high domain expertise yet not necessarily high data visualization literacy. By adhering to user experience standards, providing an easy-to-use and self-instructive interface, and selecting easy-to-read visualizations that can be progressively refined, DIA2 minimizes visual complexity and enjoys wide usage. In 2016 alone, there were 1,000,000 hits and 145,000 unique queries.

Maps of Science, Technology, and Education   605

figure 32.4  DIA2 NSF Org Structure view.

figure 32.5  DIA2 Thesaurus Concepts view.


figure 32.6  DIA2 People Explorer view with query result for “Borner.” Full color figures available on Oxford Handbooks Online.

figure 32.7  DIA2 Institution Explorer view with query result for “Indiana University.” Full color figures available on Oxford Handbooks Online.


figure 32.8  DIA2 Topic Explorer view with query result for “visualization.” Full color figures available on Oxford Handbooks Online.

NIH CTSA Expertise Explorer

The CTSA Explorer supports the interactive exploration of expertise available via the NIH/National Center for Advancing Translational Sciences (NCATS)-funded Clinical and Translational Science Awards (CTSAs) using data provided by NIH RePORTER (NIH Reporter, n.d.).3 Users can search for a keyword (e.g., disease, drug, or gene names) and explore and compare the number and topical coverage of clinical trials (CTs), publications, and funding awards, and the expertise held by the different geospatially distributed CTSA centers. The visualization is interactive, allowing users to select one CTSA and explore all its CTs, publications, awards, and expertise profiles. Links in the respective listings lead to full-text documents with information on CTs, publications, and award details. Figure 32.9 shows a screenshot of the interface filtered for “malaria.” In addition to helping non-CTSA-funded researchers benefit from the NCATS-funded centers, the visualization also makes it possible to gain a global overview of potential overlap with other privately funded biomedical research efforts to identify opportunities for collaboration or leveraging resources.


figure 32.9  Geospatial map of CTSA Hub expertise showing symbols for clinical trials (triangles), publications (squares), and funding awards (diamonds) with size coded by the number of trials/publications/awards. Below the map is a sorted listing of CTSAs by total number of trials/publications/awards. Shown on right are listings of major trials/publications/awards with active hyperlinks that lead to full-text publications, funding awards, etc., for closer study. Full color figures available on Oxford Handbooks Online. Interactive version for “malaria” is available at http://demo.cns.iu.edu/client/iai/expertise.html?set=malaria.

NIH Twitter Activity Explorer

This interactive visualization shows Twitter data related to usernames and hashtags identified as relating to NIH efforts or goals (e.g., health). The purpose of this visualization is to understand how NIH- and health-related information diffuses and to identify “super spreaders” that can be used to speed up the dissemination of NIH/health-relevant information to experts and the public. Twitter data was collected using Twitter’s public API for hashtags and usernames identified in advance. Shown in Figure 32.10 is data for 57 Twitter accounts associated with the NIH’s CTSA and CTSA Hubs collected since May 18, 2015. The data was processed to produce networks of Twitter retweeting activity by identified accounts and hashtags and visualized using the Sci2 and Gephi tools (Mathieu, Heymann, & Jacomy, 2009; Sci2 Tool, 2009) (see Figure 32.10, top left). In the network, each node represents a Twitter user account and each directed link denotes retweets. Nodes are size coded by the


number of mentions and colored by user type. Links are directed from accounts that retweeted a post to the account that was retweeted; they are sized and colored proportionally to the number of times one user retweeted another. Clicking on any node (or vertical bar) brings up user account details such as major sources and followers for a given account (see Figure 32.10, top right). A list of all usernames is given on the right, sorted by number of tweets. The networks can help to identify communities of Twitter users that are interested in the various projects associated with the NIH and the users that have the most influence and ability to spread information in and outside various information networks, and determine the reach of various NIH social media campaigns. The visualization can help the NIH form and encourage specific social media strategies to help CTSA social media accounts to engage with other Twitter users. The network supports examination of accounts that are hubs of information broadcast to the social network and accounts that act as authorities within the network; the relationships between CTSA hub accounts and other users; and the various social media and translational strategies used by CTSA hubs on Twitter.

figure 32.10  Directed retweet networks of Twitter data for accounts associated with NIH and NCATS grant programs (left) together with ego-centric details (top right) and sorted listing of most active users (lower right). Full color figures available on Oxford Handbooks Online. Interactive version is available at http://demo.cns.iu.edu/client/iai/twitter.html.
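The directed, weighted retweet network described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Sci2 or Gephi workflow used for Figure 32.10: the account names and retweet pairs are hypothetical, and weighted in-degree (retweets received) is used as a simple stand-in for the mention counts and hub/authority scores mentioned in the text.

```python
from collections import Counter

# Hypothetical (retweeter, retweeted_account) pairs; not real NIH data.
RETWEETS = [
    ("user_a", "hub_1"),
    ("user_b", "hub_1"),
    ("user_a", "hub_1"),
    ("hub_1", "hub_2"),
    ("user_c", "hub_2"),
]


def build_retweet_edges(retweets):
    """Count how often each account retweeted each other account.

    Keys are directed (retweeter, retweeted) pairs, matching the link
    direction described in the text; values are the edge weights.
    """
    return Counter(retweets)


def times_retweeted(edges):
    """Weighted in-degree: total retweets received by each account."""
    received = Counter()
    for (_, target), weight in edges.items():
        received[target] += weight
    return received


def times_retweeting(edges):
    """Weighted out-degree: total retweets made by each account."""
    made = Counter()
    for (source, _), weight in edges.items():
        made[source] += weight
    return made


if __name__ == "__main__":
    edges = build_retweet_edges(RETWEETS)
    # Accounts ranked by how often they were retweeted.
    print(times_retweeted(edges).most_common())
```

Accounts with a high weighted in-degree are candidates for the "super spreaders" discussed above, although a fuller analysis would also consider how far their posts travel beyond direct retweeters.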

Learning LeX Subway Maps

Nesbitt used a subway metaphor to communicate his PhD plans to his adviser (Nesbitt, 2003, 2004). In the hand-drawn map, interconnecting ideas running through the

PhD thesis are represented by different colors. Related ideas correspond to category stations along that track. Overlapping ideas are shown as connected stations. The familiarity of metro maps makes it possible to understand the many different “trains of thought” and their complex interdependencies. Microsoft’s Subway Maps LeX4 uses subway maps to provide visual overviews and easy-to-use interfaces to online courses released within the official Microsoft Learning eXperiences (LeX) program, which was designed to help individuals and organizations maximize the use of Microsoft products. An exemplary map for cloud productivity is shown in Figure 32.11. Each stop is one online course that consists of four modules with a total duration of 8 to 16 hours. Courses with a green filling are released; those with white filling are unscheduled. The map is read from left to right: indicated by the track at the center of the map, the Fundamental IT Pro Skills course should be taken first, followed by Office 365 Administrator courses. There are three specializations as seen by the three-pronged “fork” in this track: Communication Professional, Messaging Administrator, and SharePoint Administrator. In addition, courses on general Collaboration Skills (the lower track) are offered. The map is interactive: clicking on a released course brings up the course description, information on what students will learn, and a link to the course on edX.
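The left-to-right reading order of the central track can be thought of as a prerequisite graph. The sketch below uses Python's standard `graphlib` module to derive one valid course order. The prerequisite links are an interpretation of the map's description (the fork after the Office 365 Administrator courses), not data taken from the LeX service, and the separate Collaboration Skills track is omitted.

```python
from graphlib import TopologicalSorter

# Each course maps to the set of courses assumed to come before it.
# These links are an interpretation of the map's central track,
# not an official prerequisite list.
PREREQUISITES = {
    "Fundamental IT Pro Skills": set(),
    "Office 365 Administrator": {"Fundamental IT Pro Skills"},
    "Communication Professional": {"Office 365 Administrator"},
    "Messaging Administrator": {"Office 365 Administrator"},
    "SharePoint Administrator": {"Office 365 Administrator"},
}


def course_order(prerequisites):
    """Return one valid order in which every course follows its prerequisites."""
    return list(TopologicalSorter(prerequisites).static_order())


if __name__ == "__main__":
    for position, course in enumerate(course_order(PREREQUISITES), start=1):
        print(position, course)
```

The three specializations may appear in any relative order, since the fork places no ordering among them; only the shared trunk is constrained.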

CyberSeek Career Maps

In 2016, there were 128,000 openings for information security analysts in the United States but only 88,000 workers currently employed in those positions; that is, 40,000 jobs remained unfilled according to CyberSeek,5 putting digital privacy and infrastructure at risk. There were also 220,000 additional openings requesting cybersecurity-related skills, but employers were struggling to find workers who possess these skills. CyberSeek joined with Burning Glass to create career pathways for those interested in becoming cybersecurity workers and protecting important and private information, from bank accounts to sensitive military communications. The maps show detailed pathways from entry- to advanced-level jobs (see Figure 32.12, top). Clicking on a node, for example, Cyber Crime Analyst/Investigator, brings up details on average salary, required education, top skills, certifications, and more (see Figure 32.12, bottom). The maps are intended to serve the needs of three stakeholder groups: (1) employers, by answering questions such as “How large is the cybersecurity workforce in my and/or neighboring regions?” or “How much does it cost on average to hire cybersecurity workers in my region?”; (2) educators and career counselors interested in answering questions such as “Should I offer a cybersecurity training program and what skills/certificates should be taught?” or “What entry-level jobs should students target?”; and (3) students interested in knowing the demand for cybersecurity jobs in their region or salary levels given certain skills and educational credentials.

Discussion and Outlook

In an age of information overload, the ability to make sense of vast amounts of data and to render insightful visualizations is as important as the ability to read and write.

figure 32.11  Subway map for courses on Cloud Productivity.


figure 32.12  CyberSeek career pathways map. Interactive version is available at http://cyberseek.org/pathway.html.

Scalable, Multilevel Maps

Going forward, there is a need to scale up visualizations so they do the following:
• Cover multiple record types (see previous discussion of the Springer Nature SciGraph and visualization of linked open data in Figure 32.3)
• Add new data in real time, that is, data is added dynamically as papers are published or funding awards are made
• Show overlapping areas of research (see work on using sparse Markov chains to efficiently reveal overlapping and hierarchically nested community structure in citation flow networks: Bae et al., 2017)
• Support mining and exploration of diagrams, visualizations, and photographs featured in scholarly records (see recent work on Viziometrics: Lee, West, & Howe, 2017)
• Support exploration or communication at multiple levels of aggregation (micro, meso, macro) (see recent work on multilevel graphs: Lazega & Snijders, 2016; Schreiber et al., 2014)
• Build on and expand standardization efforts such as the University of California, San Diego (UCSD) map of science (Börner et al., 2012) to provide a scientifically validated “basemap” reference system that is widely used instead of hundreds of maps with limited validation and utility
• Are easy to read and use by not only experts but also general audiences

Acknowledgments

This work is partially supported by the National Science Foundation under an NCN CP Supplement to 1553044, AISL-1713567, DGE-1735095, DMS-1839167. CA-FW-HTF: Convergence Accelerator 1936656, and the National Institutes of Health under awards P01AG039347, U01CA198934, and OT2OD026671. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Notes

1. http://www.springernature.com/gp/researchers/scigraph?countryChanged=true
2. https://github.com/springernature/scigraph/wiki
3. https://projectreporter.nih.gov
4. http://mslexsubways.azurewebsites.net/#/
5. https://www.cyberseek.org/index.html#aboutit

References

Bae, S.-H., Halperin, D., West, J. D., Rosvall, M., & Howe, B. (2017). Scalable and efficient flow-based community detection for large-scale graph analysis. ACM Transactions on Knowledge Discovery from Data, 11(3), 1–30. doi:10.1145/2992785
Beauchesne, O. H. (2011a). Map of scientific collaborations from 2005 to 2009. http://collabo.olihb.com.
Beauchesne, O. H. (2011b). Stream of scientific collaborations between world cities. Courtesy of Science-Metrix, Inc. In K. Börner & M. J. Stamper (Eds.), 7th iteration (2011): Science maps as visual interfaces to digital libraries, Places & spaces: Mapping science. http://scimaps.org.
Börner, K. (2010). Atlas of science: Visualizing what we know. Cambridge, MA: MIT Press.
Börner, K. (2015). Atlas of knowledge: Anyone can map. Cambridge, MA: MIT Press.
Börner, K., Klavans, R., Patek, M., Zoss, A. M., Biberstine, J. R., Light, R. P., . . . Boyack, K. W. (2012). Design and update of a classification system: The UCSD map of science. PLoS One, 7(7), e39464. doi:10.1371/journal.pone.0039464
Chen, C. (2017). Expert review. Science mapping: A systematic review of the literature. Journal of Data and Information Science, 2(2), 1–40. doi:10.1515/jdis-2017-0006
Chi, E. H. (2000). A taxonomy of visualization techniques using the data state reference model. Paper presented at the Proceedings of the IEEE Symposium on Information Visualization 2000.
Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). Science mapping software tools: Review, analysis, and cooperative study among tools. Journal of the American Society for Information Science and Technology, 62(7), 1382–1402. doi:10.1002/asi.21525
Cyberseek. (n.d.). https://www.cyberseek.org/index.html#aboutit
Harris, R. L. (2000). Information graphics: A comprehensive illustrated reference. New York, NY: Oxford University Press.
Hook, P. A., & Börner, K. (2005). Educational knowledge domain visualizations: Tools to navigate, understand, and internalize the structure of scholarly knowledge and expertise. In A. Spink & C. Cole (Eds.), New directions in cognitive information retrieval (pp. 187–208). Amsterdam, Netherlands: Springer-Verlag.
Keim, D. A. (2001). Visual exploration of large data sets. Communications of the ACM, 44(8), 38–44. doi:10.1145/381641.381656
Kosslyn, S. M. (1989). Understanding charts and graphs. Applied Cognitive Psychology, 3(3), 185–225. doi:10.1002/acp.2350030302
Lazega, E., & Snijders, T. A. B. (Eds.). (2016). Multilevel network analysis for the social sciences: Theory, methods and applications (Vol. 12). Berlin, Germany: Springer-Verlag.
Lee, P.-s., West, J. D., & Howe, B. (2017). Viziometrics: Analyzing visual information in the scientific literature. IEEE Transactions on Big Data, 4(1), 117–129. doi:10.1109/TBDATA.2017.2689038
LeX subway maps. (n.d.). http://mslexsubways.azurewebsites.net/#/
Mackinlay, J. (1986). Automating the design of graphical presentations of relational information. ACM Transactions on Graphics, 5(2), 110–141. doi:10.1145/22949.22950
Madhavan, K., Elmqvist, N., Vorvoreanu, M., Chen, X., Wong, Y., Xian, H., . . . Johri, A. (2014). Dia2: Web-based cyberinfrastructure for visual analysis of funding portfolios. IEEE Transactions on Visualization and Computer Graphics, 20(12), 1823–1832. doi:10.1109/TVCG.2014.2346747
Mathieu, B., Heymann, S., & Jacomy, M. (2009). Gephi: An open source software for exploring and manipulating networks. Paper presented at the International AAAI Conference on Weblogs and Social Media.
Munzner, T. (2014). Information visualization: Principles, techniques, and practice. Natick, MA: AK Peters.
Nesbitt, K. V. (2003). Multi-sensory display of abstract data (Doctoral dissertation). University of Sydney.
Nesbitt, K. V. (2004). Getting to more abstract places using the metro map metaphor. Paper presented at the Proceedings of the 8th International Conference on Information Visualisation, Washington, DC.
NIH Reporter. (n.d.). https://projectreporter.nih.gov
Palmer, S. E. (1999). Vision science: Photons to phenomenology. Cambridge, MA: MIT Press.
Pirolli, P., & Card, S. (2005). The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis. In Proceedings of the International Conference on Intelligence Analysis (pp. 2–4). https://www.e-education.psu.edu/geog885/sites/www.e-education.psu.edu.geog885/files/geog885q/file/Lesson_02/Sense_Making_206_Camera_Ready_Paper.pdf. Accessed January 6, 2019.
Schreiber, F., Kerren, A., Börner, K., Hagen, H., & Zeckzer, D. (2014). Heterogeneous networks on multiple levels. In A. Kerren, H. C. Purchase, & M. O. Ward (Eds.), Multivariate network visualization (pp. 175–206). Berlin, Germany: Springer International Publishing.
Science of Science (Sci2) Tool. (2009). Indiana University and SciTech Strategies. https://sci2.cns.iu.edu.
SciGraph dataset downloads. (n.d.). https://github.com/springernature/scigraph/wiki
Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. Paper presented at the Proceedings of the IEEE Symposium on Visual Languages, Washington, DC.
Springer nature scigraph. (2017). http://www.springernature.com/gp/researchers/scigraph
Ware, C. (2012). Information visualization: Perception for design. Amsterdam, The Netherlands: Elsevier.
Wilkinson, L. (2005). The grammar of graphics (statistics and computing). New York, NY: Springer-Verlag.

CHAPTER 33

Criminal Networks

Chris M. Smith and Andrew V. Papachristos

The origin story of social network analysis (SNA) often begins with social psychologist Jacob Moreno’s development of “sociometry” in the 1930s. Moreno founded a journal and a scientific community around the theory and study of individuals and their relationships within social groups (Marineau, 2007; Moreno, 1953 [1934]). Moreno believed that the visualization of social networks using sociograms was equivalent to putting social structures “under the microscope” (Moreno, 1953 [1934], p. 96). Among his earliest test cases for sociometry, Moreno (1953 [1934]) mapped out 4,350 relationships of attraction, rejection, and indifference between 435 delinquent girls across 16 cottages at the Hudson School for Girls, a juvenile detention facility. When the school experienced an epidemic of runaways, Moreno found that the girls who ran away were not a random subset of residents. Rather, the runaway girls represented a particular pattern of social ties among the residents: the runaway girls were part of a weakly linked chain of mutual friendships that crossed four cottages (Moreno, 1953 [1934], p. 441). In this classic analysis, Moreno established how delinquent actions were a consequence of position in social networks, and with this, the analysis of crime and networks began. Scholars of crime and delinquency arrived a bit late to the network turn in the sciences that has exploded over the last 30 years in disciplines ranging from anthropology to theoretical physics (Papachristos, 2011). However, since the early 2000s, research on crime and networks has increased dramatically, especially in the field of peer network effects and juvenile delinquency (Haynie & Kreager, 2013). 
Empirical studies employing formal network methods and measures to study criminal groups and organizations—such as street gangs, smuggling rings, terrorist organizations, and organized crime syndicates—have been less frequent but can be of great consequence for the development of social scientific theory and criminal justice practice and policy. Advances in criminal networks research are leading scholars into uncharted and innovative territories, often without a guiding theory or even a consistent set of empirical findings. Crime and network research gathers, compares, and analyzes dozens of network properties at the individual, group, or system level to advance fresh perspectives to old criminological debates. The majority of theories explaining deviant and delinquent behavior are inherently relational, and network approaches have the potential to push past criminology’s definitional bulwarks and typologies to reveal how the relationships and behaviors of

individuals and groups might inform theory, research, and practice (McGloin & Kirk, 2010; Papachristos, 2011, 2014). Criminal opportunity, learning, labeling, and control theories all rely on the influence and structure of interpersonal relationships in their explanations of the etiology of crime and delinquency. For example, classic learning theories stress how people learn and adopt criminal values, attitudes, behaviors, and techniques from their friends and associates over time (Burgess & Akers, 1966; Sutherland, 1947). Similar to conventional behaviors or values, people first learn how to commit criminal acts and to embrace criminal values through meaningful social relationships and in social groups. Social control theory posits that people are less likely to commit crimes when they form bonds to conventional institutions such as peers, families, and schools (Hirschi, 1969) or when the networks among neighborhood residents are capable of collectively monitoring youth behavior and mobilizing community resources (Bursik & Grasmick, 1993). Criminological theories have been tested and debated through the network analysis of extensive school datasets, such as the National Longitudinal Study of Adolescent Health (Add Health), Promoting School-Community-University Partnerships to Enhance Resilience (PROSPER), and the North Carolina Context of Adolescent Substance Use Study. Overall, this mounting crime and network research finds consistent support for the group nature of delinquency: having delinquent friends is one of the strongest and most persistent predictors of delinquency (Haynie, 2001; Haynie et al., 2005; Haynie & Kreager, 2013; Kreager, Rulison, & Moody, 2011; Young, 2011; Young et al., 2014). Like much of criminology more generally, this emerging crime and network scholarship is largely based on studies of youth and their less serious acts of delinquency, such as underage drinking, substance abuse, and schoolyard fights. 
As a result, many of our current theories of networks and crime are actually theories of relationships and delinquency. However, do the same learning theories developed during adolescent friendships—often beginning in school—explain adult criminal relationships? Do findings from delinquent activities such as smoking, drinking, or fighting unilaterally transpose on adults engaging in robbery, assault, financial conspiracies, or murder? Or, for that matter, do self-reported acts of delinquency from students align with the network influences on youth or adults engaged in more serious crimes? This chapter takes an important step in trying to unpack what we know about criminal networks, taking as our point of departure some of the key findings from delinquency and extending to networks of serious crimes and crimes of financial and violent consequence. Most reviews of crime and networks weave together research on peer effects and juvenile delinquency with research on criminal networks and organizations (McGloin & Kirk, 2010; Morselli,  2014; Papachristos,  2011,  2014). However, the extensive coverage of this volume across a variety of SNA topics allows us to delineate the criminal networks research from peer and delinquency research, survey the field of criminal networks, and assess the methodological and theoretical challenges that have been slowing down the study of criminal networks.

Criminal Networks

The study of criminal networks most often starts with a pair of basic empirical questions: do criminal networks really exist, and, if so, can we measure them? Locating criminal groups, let alone criminal networks, is complicated by the clandestine nature of most crime, a

problem that plagues qualitative as well as quantitative scholarship (e.g., Watters & Biernacki, 1989; Whyte, 1993 [1943]; Wright et al., 1992). While ethnographers have produced a wealth of knowledge guided by their access to criminal organizations and dealings, network scholars largely rely on data collected for other functions that they repurpose for relational analysis. Criminal network scholars face a particularly daunting task of deciphering who gets included in criminal networks and what constitutes a criminal relationship. These scholars’ innovations and creativity illuminate crime patterns and organizational structures that might otherwise remain hidden even to the members of the criminal networks themselves. To assess the state of criminal networks research, we conducted a systematic review of the literature published in social science journals over the last 25 years. This review excluded studies that focus solely on delinquency and studies that discuss networks only conceptually. Inclusion in our review required studies to use formal network data, methods, or models. Our review produced a total of 49 unique studies, from which we selected the 29 criminal network studies shown in Table 33.1 as a sample representing the key domains and findings of the field.1 Much of the work in the field of criminal networks is from the last decade and much stems from small professional working groups that have brought together international groups of scholars; in other words, the work itself was produced by its own network.2 Our systematic literature review summarized in Table 33.1 reveals three overall trends in the criminal network scholarship: (1) innovations in measuring the boundaries of criminal groups and group crimes, (2) a reliance on data drawn mostly from official law enforcement and court records of closed criminal cases, and (3) the lack of a unifying criminological or sociological theoretical foundation across these studies.

Measuring Criminal Groups and Groups of Criminals

Criminal Groups

Criminal networks research by and large focuses its attention on defining the boundaries of criminal groups and criminal enterprises. The topics shown in Table 33.1 cover a variety of serious criminal networks including street gangs, mafias, drug trafficking rings, smugglers, and white-collar conspiracies. Only one-third of the 29 studies focus on defined criminal groups such as organized crime syndicates or street gangs. For these topics, criminal networks scholars begin with some criminal group boundary or label but then use SNA to determine the boundary, effect, or organization of these groups. This is an important break from past research on criminal groups, which often superimposed a structure on entities like street gangs or the mafia without considering relational data whatsoever. For example, some organized crime groups are actually much less organized when put under the SNA microscope (Calderoni, 2012), and powerful individuals in gang networks are not always the known gang leaders (Morselli, 2009a). Thus, one of the underlying themes of criminal networks research is to “seek, rather than assume, structure” (Morselli, 2009b, p. 18). Our

Table 33.1  Criminal Networks Research

| No. | Researchers | Criminal Network | Data Sources | Nodes | Relationships | Network Measurements |
|---|---|---|---|---|---|---|
| 1 | Athey and Bouchard (2013) | BALCO steroid scandal | Archives, legal documents | 97 individuals | Personal | Community detection |
| 2 | Baker and Faulkner (1993) | White-collar price-fixing schemes | Legal documents | 33 individuals; 21 individuals; 24 individuals | Criminal | Centrality, density |
| 3 | Baker and Faulkner (2003) | Fountain Oil and Gas Company | Legal documents, interviews, phone survey | 230 individuals | Communication, personal | Diffusion |
| 4 | Bichler and Malm (2013) | Global gun trafficking | Small Arms Survey | 120 nations | Small arms transfers | Stochastic actor-oriented models |
| 5 | Breiger et al. (2014) | Global terrorist organizations | News sources | 395 organizations | Drug trade engagement | Two-mode networks |
| 6 | Bright, Hughes, and Chalmers (2012) | Australian methamphetamine market | Legal documents | 36 individuals | Criminal | Centrality |
| 7 | Calderoni (2012) | Calabrian ‘Ndrangheta mafia groups | Legal documents | 39 individuals; 48 individuals | Communication, meetings | Centrality, density |
| 8 | DellaPosta (2017) | American mafia groups | Law enforcement records, legal documents | 707 individuals | Criminal | Clustering, modularity |
| 9 | Hughes (2013) | Chicago 1960s street gangs | Archives | 248 individuals | Friendship | Centrality |
| 10 | Krebs (2002) | 9/11 hijackers | News sources | 37 individuals | Prior contacts, meetings | Distance, neighborhoods |
| 11 | Malm, Bichler, and Van De Walle (2010) | Canadian co-offenders | Law enforcement records | 2,197 individuals | Criminal, legitimate, personal | Exponential random graph models |
| 12 | Mancuso (2014) | Nigerian sex trafficking network | Law enforcement records | 86 individuals | Communication | Centrality, density |
| 13 | Mastrobuoni and Patacchini (2012) | American mafia groups | Law enforcement records | 800 individuals | Criminal, personal | Centrality, density |
| 14 | McGloin (2005) | Newark street gangs | Law enforcement records | 736 individuals | Criminal, personal | Cohesion, cut-points |
| 15 | McGloin and Piquero (2010) | Philadelphia juvenile co-offenders | Law enforcement records | 218 individuals | Criminal | Redundancy |
| 16 | Morselli (2009a) | Quebec Hells Angels | Law enforcement records | 174 individuals | Criminal, communication | Centrality |
| 17 | Morselli and Giguere (2006) | Global drug traffickers | Legal documents | 110 individuals | Communication | Key players, reciprocity |
| 18 | Morselli and Roy (2008) | Montreal port auto theft rings | Law enforcement records | 44 individuals; 33 individuals | Criminal, communication | Centrality, cut-points |
| 19 | Nash, Bouchard, and Malm (2013) | Eron mortgage fraud | Survey | 559 individuals | Investments | Diffusion |
| 20 | Natarajan (2006) | New York City heroin trafficking organization | Legal documents | 294 individuals | Communication | Centrality, cohesion |
| 21 | Obert (2014) | 19th-century western gun fighters | Secondary sources | 255 individuals | Alliances, conflicts | Community detection, spatial analysis |
| 22 | Papachristos (2009) | Chicago street gangs | Law enforcement records | 48 gangs; 66 gangs | Homicides | Centrality, contagion, reciprocity |
| 23 | Papachristos, Hureau, and Braga (2013) | Boston and Chicago street gangs | Law enforcement records | 57 gangs; 46 gangs | Shootings, homicides | Exponential random graph models, reciprocity |
| 24 | Papachristos and Wildeman (2014) | Chicago co-offenders | Law enforcement records | 3,718 individuals | Homicides | Distance |
| 25 | Pedahzur and Perliger (2006) | Palestinian suicide attacks | News sources | 22 individuals; 36 individuals; 36 individuals; 49 individuals | Personal | Centrality |
| 26 | Schaefer (2012) | Arizona juvenile co-offenders | Law enforcement records | 10,629 individuals | Criminal | Distance, exponential random graph models |
| 27 | Smith and Papachristos (2016) | Chicago Prohibition-era organized crime | Archives | 1,030 individuals | Criminal, legitimate, personal | Exponential random graph models, multiplexity |
| 28 | Tenti and Morselli (2014) | Italian drug co-offenders | Legal documents | 242 individuals | Criminal | Centrality, density |
| 29 | Tita and Radil (2011) | Los Angeles street gangs | Law enforcement records | 120 block groups; 29 gangs | Rivalries | Spatial analysis |

own work on Prohibition-era Chicago provides another example of the ways criminal networks research pushes beyond predefined or assumed group boundaries, such as public enemy lists, and in so doing produces insights on how organized crime develops and integrates with legitimate society (Smith & Papachristos, 2016).

Co-Offending Groups

A second research innovation pushing criminal networks beyond defined criminal groups is research on co-offending. Rather than start from a defined criminal group or organization, research on co-offending looks at the informal associations of offenders or groups of offenders and links individuals through incidents of coarrest or coassociation in crime. This developing line of research, recently summarized by McGloin and Nguyen (2014), seeks to better understand the group nature of crime in general and, more specifically, the influence that networks and co-offending have on criminal careers, desistance, and other individual trajectories. In these studies, groups are not generated through formal affiliation via a named entity, as one might view street gangs based on nonrelational research. Instead, groups emerge as clusters of ties within a criminal network, often identified using formal network metrics such as ego density or clustering scores. This research does not assume groups are quite so neatly defined, and SNA provides a foundational tool for understanding group processes writ large. As a case in point, a series of studies by Papachristos and colleagues generates coarrest networks for entire cities and neighborhoods, not only revealing a foundational network structure but also providing evidence that such networks facilitate the social contagion of gun violence (e.g., Papachristos, Hureau, & Braga, 2013; Papachristos & Wildeman, 2014).
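To make the construction concrete, the following Python sketch links individuals who share an arrest event and then computes ego density, the share of possible ties among a person's contacts that are actually present. The records and the names `arrest_events`, `coarrest_network`, and `ego_density` are illustrative assumptions, as is the decision to treat repeated coarrests as a single tie; none of them come from the studies cited above.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical arrest records: event id -> people arrested together (invented data).
arrest_events = {
    "e1": ["A", "B", "C"],
    "e2": ["C", "D"],
    "e3": ["A", "B"],
    "e4": ["E"],
}


def coarrest_network(events):
    """Link every pair of people who share at least one arrest event."""
    neighbors = defaultdict(set)
    for people in events.values():
        for person in people:
            neighbors[person]  # touching the key registers solo arrestees as isolates
        for u, v in combinations(sorted(set(people)), 2):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return dict(neighbors)


def ego_density(neighbors, ego):
    """Share of possible ties among ego's contacts that are actually present."""
    alters = sorted(neighbors[ego])
    k = len(alters)
    if k < 2:
        return 0.0  # assumption: density is reported as 0 when no pair of alters exists
    present = sum(1 for u, v in combinations(alters, 2) if v in neighbors[u])
    return present / (k * (k - 1) / 2)


network = coarrest_network(arrest_events)
for person in sorted(network):
    print(person, sorted(network[person]), round(ego_density(network, person), 3))
```

In an undirected network, ego density computed this way equals the local clustering coefficient. Real coarrest data would also require decisions about time windows and tie weights that the sketch leaves out.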

Criminal Investigations

Lastly, on the topic of innovations in bounding and defining criminal groups and group crimes, the majority of the studies in Table 33.1 focus on defining groups or revealing networks within particular investigations or cases in which the group structure may have been less clear. Among the first scholars to demonstrate this point were sociologists Wayne Baker and Robert Faulkner, who revealed the field of criminal networks and its potential. In their classic study of white-collar conspiracies in the heavy electrical equipment industry, Baker and Faulkner (1993) dug into archival legal documents and secondary sources to create a criminal network based on the federal investigation of price-fixing schemes. Following Baker and Faulkner, many of the criminal network studies use SNA to identify groups and their structure following investigations, and these studies cover a range of serious crimes from local drug markets to international gun markets to large-scale frauds to terrorist activities. For example, using two-mode analysis, Breiger and colleagues discover the multiple ways in which terrorist organizations participate in the international drug trade, and in turn they establish a link between illicit global economies and ideologically driven violence (Breiger et al., 2014). This example and the other criminal network studies in Table 33.1 are at the cutting edge of defining local, territorial, and global structures of criminalized goods and violence markets.


Criminal Network Data

Although there is a range of data sources in Table 33.1, from history books to global government surveys, the second trend we observe is that criminal network scholarship relies most heavily on data from official law enforcement and court records of closed criminal cases. Over 70% of the 29 studies in Table 33.1 rely on criminal justice records. Hagedorn (1990) criticizes the use of official law enforcement and court data as “courthouse criminology” because the data collection introduces institutional and environmental biases. Namely, the results emerging from such data might more accurately reflect the activities and opinions of criminal justice agencies, since official criminal justice data can only measure and model what is known to and recorded by institutions responsible for investigating, charging, and prosecuting crimes.

Table 33.1 shows that, when not using law enforcement data, criminal networks research relies next on newspapers, books, and archives. Law enforcement data and court records are not always available to the public and require a data-sharing agreement with a particular institution, whereas newspapers, books, and archives are mostly open source and less bound by a single institution. Although more available to the public, open-source materials require extensive coding and organizing for relational material and introduce different forms of bias than “courthouse criminology.” The studies in Table 33.1 using open sources must contend with journalists’ and archivists’ decisions on who is important, what should be reported and recorded, and what should be omitted. Breiger and colleagues (2014) caution that one form of selection bias in open-source coding of criminal networks is better completeness on larger criminal groups than smaller criminal groups. A spotlight effect on infamous individuals is also common in criminal networks from open sources. For example, Sageman’s (2004) database on the Global Salafi Jihad is biased toward leaders and members caught during investigations. Our own research on Chicago historical organized crime is biased toward Al Capone and his top associates who were under the investigative spotlight of the Chicago Crime Commission, the Chicago Tribune, and the Internal Revenue Service (Papachristos & Smith, 2014; Smith & Papachristos, 2016).

In addition to the conventional data collection challenges found in most criminological research, criminal network data also confront the issue of dark networks. Hidden populations within a social network framework are sometimes called “dark networks,” meaning that outsiders (and sometimes even insiders) do not know the total structure of a network (Xu & Chen, 2008). Networks are not dark because they are criminal; they are dark when they are clandestine and members prefer to conceal their identities and their activities. A major challenge in the analysis of dark networks is that incomplete data can occur at either the individual level or the relationship level. The consequence of missing data in dark networks is that analyses of observed networks tend to underestimate the full network structure. Research on dark networks must accept the limitation that the data are at risk of being incomplete and can produce unstable predictions in modeling (Breiger et al., 2014; Malm, Bichler, & Van De Walle, 2010; Smith & Papachristos, 2016; Xu & Chen, 2008).

What is evident across Table 33.1 is that none of the data sources used were intentionally designed for SNA purposes. In contrast, many of the studies of peer effects and juvenile delinquency leverage extensive school surveys that include well-constructed peer network roster questions. Criminal network scholars have to approach conventional arrest records, court transcripts, books, and archives with clever relational lenses. The advancement in criminal networks research to date has not been new survey instruments containing network questions, but rather looking at existing conventional data in new ways. For example, Papachristos and colleagues’ research on homicides in crime networks analyzes commonly used arrest and victimization records (Papachristos, 2009; Papachristos et al., 2013; Papachristos & Wildeman, 2014). The innovation is not the collection of original survey data, but rather exploiting the inherent relational nature of arrest data that link co-offenders through arrest events. Similarly, across several studies Morselli and colleagues tap into the relational nature of investigative wiretap logs to map out communication networks spanning criminal organizations and markets (Morselli, 2009a; Morselli & Giguere, 2006). Even though the investigative method of wiretaps has been around since Prohibition, computational methods to extract the relational data from thousands of pages of transcripts are new to the field of criminology.

Theoretical Foundations in Criminal Networks

Not always finding a home in classic criminological theories, criminal network scholars are promiscuously piecing together sociological and organizational theories to test hypotheses and interpret empirical findings. The third dominant theme we identify in Table 33.1 is the lack of a shared conceptual or theoretical approach across the criminal network scholarship. While all of the studies in Table 33.1 share a foundational belief in applying network methods to empirical puzzles, each draws from a different theoretical starting point or, at times, no theoretical foundation at all. In our review of the studies in Table 33.1, we identify three broad approaches that fit most, but not all, of the research: (1) organizations, (2) diffusion, and (3) group process.

Organizations

The most prevalent theoretical thread crossing the criminal networks research in Table 33.1 is organizational theory. Borrowing heavily from theories of legitimate organizations, scholars focus less on definitional debates of organized crime or legitimate organizations and instead test how well organizational concepts, such as trust, hierarchy, patronage, or brokerage, work in criminal network contexts. Specifically, criminal networks scholars draw on the organizational theoretical insights of Burt’s (1992, 2007) theory of structural holes and brokerage, Granovetter’s (1973, 1985) theory of the strength of weak ties and theory of structural embeddedness, Lin’s (2001) theory of social capital, and Uzzi’s (1996, 1999) theoretical work on embeddedness and trust. This is an illuminating endeavor as scholars show that illicit networks require breaking many organizational rules in order to persist, such as prioritizing concealment over coordination (Baker & Faulkner, 1993), requiring multiplex noncriminal relationships (Smith & Papachristos, 2016), adopting a horizontal rather than hierarchical organizational structure (McGloin, 2005; Pedahzur & Perliger, 2006), or contending with the riskiness of brokerage (Baker & Faulkner, 1993; Morselli, 2010; Morselli & Roy, 2008). In some instances, the criminal networks research generates its own theory of illicit organizations as a contrast to legitimate organizations. Specifically, 11 of the 29 studies in Table 33.1 cite Baker and Faulkner’s (1993) groundbreaking study on the structure of conspiracies.

Diffusion

A less common theoretical thread across the criminal networks research is diffusion, understood as the processes through which crime and violence spread. This classification has developed in three directions in recent years. First, some scholars draw upon studies of the diffusion of innovations and technology (Coleman, Katz, & Menzel, 1957; Valente, 1995) to understand how information and crime spread through illicit networks. For example, Baker and Faulkner (2003) and Nash, Bouchard, and Malm (2013) examine the diffusion of white-collar fraud to locate which agents and which actions are most responsible for spreading criminal information through legitimate corporations. Second, research integrates network models of diffusion with spatial diffusion models. Spatial and social network models are complementary in large part because the underlying statistical modeling of autoregressive terms is identical (Leenders, 2002). Scholars in this area build upon models of the spatial diffusion of crime (e.g., Cohen & Tita, 1999; Morenoff, Sampson, & Raudenbush, 2001) by integrating additional nonspatial networked data, such as gang conflicts (Papachristos et al., 2013; Tita & Radil, 2011) and co-offending (Schaefer, 2012). Third, research looks to epidemiological models of disease transmission and diffusion in an attempt to understand how person-to-person interactions might facilitate the diffusion of gun violence within populations and networks. For example, recent studies analyze how exposure to gunshot violence within one’s networks fosters the social contagion of violence and raises individual risk of gunshot injury (Papachristos & Wildeman, 2014; Tracy, Braga, & Papachristos, 2016).
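The shared autoregressive form can be illustrated with a short Python sketch. In network autocorrelation models of the kind Leenders (2002) discusses, an outcome vector y enters the model through a lag term Wy, where W is a weight matrix; the same computation applies whether W encodes shared borders or social ties. The four gangs, their violence counts, and both matrices below are invented for illustration, and row normalization is only one of several possible weighting choices.

```python
def row_normalize(weights):
    """Scale each row to sum to 1; rows with no ties stay all zero."""
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([w / total if total else 0.0 for w in row])
    return normalized


def lag(weights, y):
    """Return W y: each unit's weighted average of its neighbors' outcomes."""
    w = row_normalize(weights)
    return [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in w]


# Hypothetical outcome: violent incidents attributed to each of four gangs.
violence = [4.0, 0.0, 2.0, 6.0]

# Hypothetical spatial weights: 1 if two gangs' turfs share a border.
spatial_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Hypothetical social weights: 1 if two gangs are rivals.
rivalry_ties = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
]

# The same function produces the spatial lag and the network lag.
spatial_lag = lag(spatial_adjacency, violence)
network_lag = lag(rivalry_ties, violence)
print(spatial_lag)
print(network_lag)
```

Because the two lags differ only in the weight matrix, both can enter the same regression as separate terms, which is one way to read the integration of spatial and networked data described above.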

Group Process

Our third classification is a bit of a catch-all that focuses on the theoretical orientations and understandings of groups outside of formal organizations and diffusion. This classification of group process ranges from theories of violence and dominance (e.g., Collins, 2008; Gould, 1999) to theories of gangs (e.g., Thrasher, 1927; Whyte, 1993 [1943]) and theories of organized crime (e.g., Boissevain, 1974; Gambetta, 1993; Ianni & Reuss-Ianni, 1976) to theories of homophily and dyads (e.g., McPherson, Smith-Lovin, & Cook, 2001; Simmel, 1955). Criminal network scholars in this theoretical classification explore theories that fit and do not fit the network empirics, detecting and tracking criminal group processes, often in contrast to conventional groups or by building on classic studies of criminal groups with relational and structural insights. For instance, Papachristos’s (2009) study on gang violence examines how reciprocity within a gang conflict network fuels subsequent homicides between groups. Likewise, Gould’s (1999) underlying theory of conflict pivots on the ways individuals and groups address basic collective action problems, such as the threat to group solidarity or the assertion of dominance among others. As another example, Hughes (2013) employs SNA to test some of the classic theories of group process in gangs and finds that, contrary to Thrasher’s (1927) theory of strong friendships among gang members leading to increased violence, the least cohesive gangs are the most violent.
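As a descriptive illustration of reciprocity in a conflict network, the Python sketch below computes the fraction of directed ties that are returned. The gang labels and homicide ties are hypothetical, and this simple proportion is an assumption chosen for clarity; it is not the modeling strategy of Papachristos (2009), which this chapter does not detail.

```python
# Hypothetical directed homicide ties: (aggressor gang, victim gang). Invented data.
homicides = [
    ("G1", "G2"),
    ("G2", "G1"),
    ("G1", "G3"),
    ("G3", "G4"),
    ("G4", "G3"),
    ("G2", "G4"),
    ("G1", "G2"),  # a repeat incident; it collapses into the existing G1 -> G2 tie
]


def reciprocity(edges):
    """Fraction of distinct directed ties whose reverse tie is also present."""
    ties = {(u, v) for u, v in edges if u != v}  # drop self-ties and duplicates
    if not ties:
        return 0.0
    mutual = sum(1 for u, v in ties if (v, u) in ties)
    return mutual / len(ties)


print(round(reciprocity(homicides), 3))
```

Here four of the six distinct ties are returned, so two thirds of the observed conflict is reciprocal. A study of how reciprocity fuels later violence would also need the timing of each incident, which this static measure ignores.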

Criminal Justice Applications

A relational approach to offenders and criminal groups is not new to law enforcement, but the application of network science is. In the early 1990s, Sparrow (1991) pointed out the applicability of SNA in criminal justice practice, specifically tracing the flows of drugs and cash. Criminal justice organizations continue this interest in the analysis of criminal networks in their violence reduction efforts, problem-oriented policing, focused deterrence, and targeting of key players.3

One developing area of criminal networks in practice is the mapping of rivalries and alliances between criminal groups to pinpoint where violent exchanges are most likely to occur. The National Network for Safe Communities (2016) created an entire process for collecting such data called a group audit. A group audit is a tool that facilitates data collection on street gangs, crews, and groups with basic social network visualization techniques. Neighborhood patrol, homicide detectives, gang experts, and probation and parole officers come together with a project manager to combine and map their knowledge and experience of the criminal groups in the area. Gathered around maps of the city or maps of specific areas, experts identify group names, members, and territories on the map as well as conflicts and alliances between groups (National Network for Safe Communities, 2016; Sierra-Arevalo & Papachristos, 2015). As part of a crime reduction initiative, Kennedy, Braga, and Piehl (1997) mapped out gang rivalries and alliances in Boston. Violence intervention and prevention efforts relied on these network maps to direct attention toward gangs that were actively involved in shootings and to target scarce resources to those gangs. This targeted group-based intervention strategy, of which gang network mapping was a key part, produced significant reductions in youth homicide and nonfatal shootings (Braga et al., 2001).

Research findings on criminal networks and violence networks argue against sweeping policies and practices based on categorical distinctions such as gang membership, race, or neighborhood and instead favor intervention and prevention efforts that consider the observable and risky behavior of individuals (Bichler & Malm, 2015; Papachristos & Wildeman, 2014). Using network techniques to pinpoint groups and individuals at risk for victimization provides more direct points of intervention and a more efficacious use of limited resources. Criminal networks pose several questions for criminal justice: Can fringe groups or individuals be identified before the commission of a crime? Can criminal networks predict and target future violence in a way that is ethical and not overly deterministic? These questions are coming into play for criminal networks scholars. For example, the Chicago Police Department’s former Strategic Subject List of Chicago residents who were most closely connected to gun violence was informed by criminal networks research. Human rights groups and concerned citizens criticize these types of lists as lacking transparency and as generating a new form of profiling. During a political moment when distrust of police is high and high-profile police shootings of young black men have kept the nation’s attention for years, it is possible to see how the applied use of criminal networks for criminal justice purposes can generate controversy, especially when the focus of the network analysis is to identify offenders rather than prioritize saving potential victims.4

Moving Criminal Networks Forward

Not to put too fine a point on it, criminal networks scholars need to hurry up methodologically while simultaneously slowing down to attend to the theoretical orientation of the work. As the network science world plows forward computationally and methodologically, we urge criminal networks scholars to keep pace with the innovations, create new datasets, and test new measures, but also improve the theoretical underpinnings of our research questions and make broader contributions to our disciplines. Scholars should continue to explore the depths of existing datasets, but we should, at the same time, begin to consider the design and collection of new data sources, particularly those outside of police departments and courtrooms. Criminal network data have almost entirely been collected post hoc from closed criminal cases or completed arrests. We need to be critical of the data and honest about how predictive our research can be, given the enthusiasm for using SNA to stop terrorists and lower crime rates. Newspapers and archives introduce a different set of selection biases of preserved historical memories, but relational databases from these sources permit new research questions and theoretical engagements, such as how violence markets are generated and spread (Obert, 2014).

Moving forward, criminal networks researchers should consider collaborating with ethnographers to maximize the relationality of past and future field notes. Hughes (2013) provides a notable example of reviving ethnographic field notes from the historical gang research of Short and Strodtbeck (1965) with fresh criminal network perspectives. Another direction forward is to include network-related questions in future survey collection efforts. As an excellent case in point, research by Kreager and colleagues (2016, 2017) introduces SNA to prisons and is positioned to move forward theories of inmate networks and their consequences for health, safety, support, and recidivism. Their work shows how criminogenic prison settings foster the formation of social, rather than criminal, networks via the concentration of convicted persons. A similar ethos should be expanded to other domains of empirical investigation.

We are convinced that the field of criminal networks is missing a unified theoretical approach. Upon completing our literature review for this chapter and in light of the recent controversies around the application of criminal networks, we are considering what moving our scholarship toward a relational theory of criminal networks would require. What are the theories that should be driving our questions and empirics? What are the elements of the theory, the theory of the nodes, and the theory of relationships? Scholars are making multiple decisions on how and when relationships matter to a criminal network, but when should criminal networks operate as the dependent variable, an independent variable, or a measure of a feedback loop or social process? We are not the first to note the lack of theoretical engagement in criminal networks; in fact, this is a common critique of SNA more broadly. At the same time, we encourage crime researchers not simply to borrow from SNA, but also to contribute to its methodological and theoretical developments. We must consider how our own expertise might improve scientific thinking and inquiry more broadly and not be afraid to insert ourselves into those debates.

Not everyone worries about social network methods and applications outpacing theory. Even Jacob Moreno is remembered more for his method than for his theory. But it is worth remembering that our colleagues conducting peer networks and delinquency research are engaging some of the most classic questions in criminology and gaining incredible insights on how and why youth commit delinquent acts. We hope that criminal network scholarship returns to some of the classic questions about the etiology of adult crimes and serious crime, as we are in a position to make lasting impacts on our fields.

Notes

1. The full table of the 49 studies using SNA on criminal groups is available from the authors via request. Including only 29 of the 49 studies in Table 33.1 mostly reduces the number of entries by a single author.
2. The Illicit Networks Workshop has coordinated research conferences and publications on this subject since 2009, resulting in several edited volumes (see Bichler & Malm, 2015; Morselli, 2014) and special issues of journals like Global Crime and Trends in Organized Crime.
3. Bichler and Malm (2015) have an entire edited volume dedicated to the issue of network analysis and crime prevention.
4. See Papachristos (2016) for additional commentary on Chicago’s Strategic Subject List.

References Athey, N. C., & Bouchard, M. (2013). The BALCO scandal: The social structure of a steroid distribution network. Global Crime, 14(2–3), 216–237. Baker, W. E., & Faulkner, R. R. (1993). The social organization of conspiracy: Illegal networks in the heavy electric equipment industry. American Sociological Review, 58(6), 837–860. Baker, W. E., & Faulkner, R. R. (2003). Diffusion of fraud: Intermediate economic crime and investor dynamics. Criminology, 41(4), 1173–1206. Bichler, G., & Malm, A. E. (2013). Small arms, big guns: A dynamic model of illicit market opportunity. Global Crime, 14(2–3), 261–286. Bichler, G., & Malm, A. E. (Eds.). (2015). Disrupting criminal networks: Network analysis in crime prevention. Boulder, CO: Lynne Rienner Publishers. Boissevain, J. (1974). Friends of friends: Networks, manipulators, and coalitions. Oxford, UK: Basil Blackwell. Braga, A. A., Kennedy, D. M., Waring, E. J., & Piehl, A. M. (2001). Problem-oriented policing, deterrence, and youth violence: An evaluation of Boston’s Operation Ceasefire. Journal of Research in Crime and Delinquency, 38(3), 195–225. Breiger, R. L., Schoon, E., Melamed, D., Asal, V., & Rethemeyer, R. K. (2014). Comparative configurational analysis as a two-mode network problem: A study of terrorist group engagement in the drug trade. Social Networks, 36(1), 24–39. Bright, D. A., Hughes, C. E., & Chalmers, J. (2012). Illuminating dark networks: A social network analysis of an Australian drug trafficking syndicate. Crime, Law & Social Change, 57(2), 151–176.

Criminal Networks   629 Burgess, R. L., & Akers, R. L. (1966). A differential association-reinforcement theory of criminal behavior. Social Problems, 14(2), 128–147. Bursik, R. J., & Grasmick, H. G. (1993). Neighborhoods and crime: The dimensions of effective community control. Lanham, MD: Lexington Books. Burt, R.  S. (1992). Structural holes: The social structure of competition. Cambridge, MA: Harvard University Press. Burt, R.  S. (2007). Brokerage and closure: An introduction to social capital. New York, NY: Oxford University Press. Calderoni, F. (2012). The structure of drug trafficking mafias: The ‘Ndrangheta and cocaine. Crime, Law & Social Change, 58(3), 321–349. Cohen, J., & Tita, G. (1999). Diffusion in homicide: Exploring a general method for detecting spatial diffusion processes. Journal of Quantitative Criminology, 15(4), 451–493. Coleman, J., Katz, E., & Menzel, H. (1957). The diffusion of an innovation among physicians. Sociometry, 20(4), 253–270. Collins, R. (2008). Violence: A micro-sociological theory. Princeton, NJ: Princeton University Press. DellaPosta, D. (2017). Network closure and integration in the mid-20th century American mafia. Social Networks, 51, 148–157. Gambetta, D. (1993). The Sicilian mafia: The business of private protection. Cambridge, MA: Harvard University Press. Gould, R. V. (1999). Collective violence and group solidarity: Evidence from a feuding society. American Sociological Review, 64(3), 356–380. Granovetter, M.  S. (1973). The strength of weak ties. American Journal of Sociology, 78(6), 1360–1380. Granovetter, M. S. (1985). Economic action and social structure: The problem of embeddedness. American Journal of Sociology, 91(3), 481–510. Hagedorn, J. M. (1990). Back in the field again: Gang research in the nineties. In C. R. Huff (Ed.), Gangs in America (pp. 240–259). Newbury Park, CA: Sage. Haynie, D. L. (2001). Delinquent peers revisited: Does network structure matter? American Journal of Sociology, 106(4), 1013–1057. 
Haynie, D. L., Giordano, P. C., Manning, W. D., & Longmore, M. A. (2005). Adolescent romantic relationships and delinquency involvement. Criminology, 43(1), 177–210. Haynie, D. L., & Kreager, D. A. (2013). Peer networks and crime. In F. T. Cullen & P. Wilcox (Eds.), Oxford handbook of criminological theory (pp. 257–273). New York, NY: Oxford University Press. Hirschi, T. (1969). Causes of delinquency. Berkeley, CA: University of California Press. Hughes, L. A. (2013). Group cohesiveness, gang member prestige, and delinquency and violence in Chicago, 1959–1962. Criminology, 51(4), 795–832. Ianni, F. A. J., & Reuss-Ianni, E. (1976). The crime society: Organized crime and corruption in America. New York, NY: Plume. Kennedy, D. M., Braga, A. A., & Piehl, A. M. (1997). The (un)known universe: Mapping gangs and gang violence in Boston. In D.  Weisburd & T.  McEwen (Eds.), Crime mapping and crime prevention (pp. 219–262). New York, NY: Criminal Justice Press. Kreager, D. A., Rulison, K., & Moody, J. (2011). Delinquency and the structure of adolescent peer groups. Criminology, 49(1), 95–127. Kreager, D. A., Schaefer, D. R., Bouchard, M., Haynie, D. L., Wakefield, S., Young, J. T. N., & Zajac, G. (2016). Toward a criminology of inmate networks. Justice Quarterly, 33(6), 1000–1028.

630   Chris M. Smith and Andrew V. Papachristos Kreager, D. A., Young, J. T. N., Haynie, D. L., Bouchard, M., Schaefer, D. R., & Zajac, G. (2017). Where “old heads” prevail: Inmate hierarchy in a men’s prison unit. American Sociological Review, 82(4), 685–718. Krebs, V. E. (2002). Mapping networks of terrorist cells. Connections, 24(3), 43–52. Leenders, R.  T.  A.  J. (2002). Modeling social influence through network autocorrelation: Constructing the weight matrix. Social Networks, 24(1), 21–47. Lin, N. (2001). Social capital: A theory of social structure & action. New York, NY: Cambridge University Press. Malm, A. E., Bichler, G., & Van De Walle, S. (2010). Comparing the ties that bind criminal networks: Is blood thicker than water? Security Journal, 23, 52–74. Mancuso, M. (2014). Not all madams have a central role: Analysis of a Nigerian sex trafficking network. Trends in Organized Crime, 17(1), 66–88. Marineau, R. F. (2007). The birth and development of sociometry: The work and legacy of Jacob Moreno (1889–1974). Social Psychology Quarterly, 70(4), 322–325. Mastrobuoni, G., & Patacchini, E. (2012). Organized crime networks: An application of network analysis techniques to the American mafia. Review of Network Economics, 11(3), 1–41. McGloin, J. M. (2005). Policy and intervention considerations of a network analysis of street gangs. Criminology & Public Policy, 4(3), 607–636. McGloin, J.  M., & Kirk, D.  S. (2010). An overview of social network analysis. Journal of Criminal Justice Education, 21(2), 169–181. McGloin, J. M., & Nguyen, H. (2014). The importance of studying co-offending networks for criminological theory and policy. In C.  Morselli (Ed.), Crime and networks (pp. 13–27). New York, NY: Routledge. McGloin, J. M., & Piquero, A. R. (2010). On the relationship between co-offending network redundancy and offending versatility. Journal of Research in Crime and Delinquency, 47(1), 63–90. McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). 
Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1), 415–444. Moreno, J. L. (1953 [1934]). Who shall survive?: Foundations of sociometry, group psychotherapy and sociodrama. Beacon, NY: Beacon House. Morenoff, J. D., Sampson, R. J., & Raudenbush, S. W. (2001). Neighborhood inequality, collective efficacy, and the spatial dynamics of homicide. Criminology, 39(3), 517–558. Morselli, C. (2009a). Hells Angels in springtime. Trends in Organized Crime, 12(2), 145–158. Morselli, C. (2009b). Inside criminal networks. New York, NY: Springer. Morselli, C. (2010). Assessing vulnerable and strategic positions in a criminal network. Journal of Contemporary Criminal Justice, 26(4), 382–392. Morselli, C. (Ed.). (2014). Crime and networks. New York, NY: Routledge. Morselli, C., & Giguere, C. (2006). Legitimate strengths in criminal networks. Crime, Law & Social Change, 45, 185–200. Morselli, C., & Roy, J. (2008). Brokerage qualifications in ringing operations. Criminology, 46(1), 71–98. Nash, R., Bouchard, M., & Malm, A. E. (2013). Investing in people: The role of social networks in the diffusion of a large-scale fraud. Social Networks, 35(4), 686–698. Natarajan, M. (2006). Understanding the structure of a large heroin distribution network: A quantitative analysis of qualitative data. Journal of Quantitative Criminology, 22(2), 171–192. National Network for Safe Communities. (2016). Group violence intervention: An implementation guide. Washington, DC: US Department of Justice, Office of Community Oriented Policing Services.

Obert, J. (2014). The six-shooter marketplace: 19th-century gunfighting as violence expertise. Studies in American Political Development, 28, 49–79. Papachristos, A. V. (2009). Murder by structure: Dominance relations and the social structure of gang homicide. American Journal of Sociology, 115(1), 74–128. Papachristos, A. V. (2011). The coming of a networked criminology? In J. MacDonald (Ed.), Measuring crime and criminality: Advances in criminological theory (Vol. 27, pp. 101–140). New Brunswick, NJ: Transaction Publishers. Papachristos, A. V. (2014). The network structure of crime. Sociology Compass, 8(4), 347–357. Papachristos, A. V. (2016, July 29). Commentary: CPD’s crucial choice: Treat its list as offenders or as potential victims? Chicago Tribune. http://www.chicagotribune.com/news/opinion/commentary/ct-gun-violence-list-chicago-police-murder-perspec-0801-jm20160729-story.html Papachristos, A. V., Hureau, D. M., & Braga, A. A. (2013). The corner and the crew: The influence of geography and social networks on gang violence. American Sociological Review, 78(3), 417–447. Papachristos, A. V., & Smith, C. M. (2014). The embedded and multiplex nature of Al Capone. In C. Morselli (Ed.), Crime and networks (pp. 97–115). New York, NY: Routledge. Papachristos, A. V., & Wildeman, C. (2014). Network exposure and homicide victimization in an African American community. American Journal of Public Health, 104(1), 143–150. Pedahzur, A., & Perliger, A. (2006). The changing nature of suicide attacks: A social network perspective. Social Forces, 84(4), 1987–2008. Sageman, M. (2004). Understanding terror networks. Philadelphia, PA: University of Pennsylvania Press. Schaefer, D. R. (2012). Youth co-offending networks: An investigation of social and spatial effects. Social Networks, 34(1), 141–149. Short, J. F., & Strodtbeck, F. L. (1965). Group process and gang delinquency. Chicago, IL: University of Chicago Press.
Sierra-Arevalo, M., & Papachristos, A. V. (2015). Applying group audits to problem oriented policing. In G. Bichler & A. E. Malm (Eds.), Disrupting criminal networks: Network analysis in crime prevention (pp. 27–46). Boulder, CO: Lynne Rienner Publishers. Simmel, G. (1955). Conflict & the web of group affiliations (K. H. Wolff & R. Bendix, Trans.). New York, NY: Free Press. Smith, C. M., & Papachristos, A. V. (2016). Trust thy crooked neighbor: Multiplexity in Chicago organized crime networks. American Sociological Review, 81(4), 644–667. Sparrow, M. K. (1991). The application of network analysis to criminal intelligence: An assessment of the prospects. Social Networks, 13(3), 251–274. Sutherland, E. H. (1947). Principles of criminology (4th ed.). Chicago, IL: J. B. Lippincott. Tenti, V., & Morselli, C. (2014). Group co-offending networks in Italy’s illegal drug trade. Crime, Law & Social Change, 62(1), 21–44. Thrasher, F. M. (1927). The gang: A study of 1,313 gangs in Chicago. Chicago, IL: University of Chicago Press. Tita, G. E., & Radil, S. M. (2011). Spatializing the social networks of gangs to explore patterns of violence. Journal of Quantitative Criminology, 27(4), 521–545. Tracy, M., Braga, A. A., & Papachristos, A. V. (2016). The transmission of gun and other weapon-involved violence within social networks. Epidemiologic Reviews, 38(1), 70–86. Uzzi, B. (1996). The sources and consequences of embeddedness for the economic performance of organizations: The network effect. American Sociological Review, 61(4), 674–698.

Uzzi, B. (1999). Embeddedness in the making of financial capital: How social relations and networks benefit firms seeking financing. American Sociological Review, 64(4), 481–505. Valente, T. W. (1995). Network models of the diffusion of innovations. Cresskill, NJ: Hampton Press. Watters, J. K., & Biernacki, P. (1989). Targeted sampling: Options for the study of hidden populations. Social Problems, 36(4), 416–430. Whyte, W. F. (1993 [1943]). Street corner society: The social structure of an Italian slum (4th ed.). Chicago, IL: University of Chicago Press. Wright, R. T., Decker, S. H., Redfern, A. K., & Smith, D. L. (1992). A snowball’s chance in hell: Doing field research with residential burglars. Journal of Research in Crime and Delinquency, 29(2), 148–161. Xu, J., & Chen, H. (2008). The topology of dark networks. Communications of the ACM, 51(10), 58–65. Young, J. T. N. (2011). How do they “end up together”? A social network analysis of self-control, homophily, and adolescent relationships. Journal of Quantitative Criminology, 27(3), 251–273. Young, J. T. N., Rebellon, C. J., Barnes, J. C., & Weerman, F. M. (2014). Unpacking the black box of peer similarity in deviance: Understanding the mechanisms linking personal behavior, peer behavior, and perceptions. Criminology, 52(1), 60–86.

Index

Note: Tables and figures are indicated by an italic “t”, “f ” respectively, following the page number. For the benefit of digital users, indexed terms that span two pages (e.g., 52–53) may, on occasion, appear on only one of those pages. Abbe, E.  313 Abdo, A. H.  162, 163 Abrahao, B.  517, 520, 523, 524, 526–527 A/B test  525–526, 529 access 71 social ties  73–74, 73f structures, computational social science  84–85 access deficit, social capital  565 access theories, aggregate social interaction data for  80–81 act 35 action aggregations 36 context  35, 35f instrumental 38 opportunities  35, 35f action theory  35–36, 35f, 44, 44t social capital  37–38, 38t, 44 social structures and individual action  35–36, 35f actor. See also node bridges 37 collective 43 dualities 397–398 interdependencies 50 interests  35, 35f actor affiliation networks  401 actor-network theory  402 actor-oriented models  299 adams, j.  126, 360, 404, 438 adjacency matrix  18, 18f Adolescent Health and Academic Achievement (AHAA)  300 advanced producer services (APSs)  379

advice networks  55–56 intraorganizational  569, 571–572, 571f, 574 exponential random graph model  240–242, 241t neo-structural sociology  55–56, 58 transfer, empirical examples  240–242, 241t advice seeking organizations  240–243, 241t Aeby, G.  468, 472 affectionate roles, family  475–476 affiliation. See also duality ecology 395–396 affiliation matrix  392–394. See also duality, beyond person and groups affiliation networks  400–402 actor 401 actor-network theory and “heterogeneous networks,” 401 background 401 bipartite 401–402 collaboration, structure  401 generalized 402 group interlock network  401 interlocking directorates  188, 393, 401 “new” science  393, 401–402 African Americans, social capital access deficit  565 job-finding contacts  566, 567f prisoners, knowing  156 workplace  565, 566, 568–569, 571 age, Eurasian red deer dominance modeling 269 agency 39 collective 54 neo-structural sociology  51

agent-based models  36. See also specific types causal inferences  299 SIENA  36, 44, 44t spatial dimensions  371–372 agents, connected  542, 552 agglomerative methods  313 aggregated relational data (ARD)  153–154 barrier effects  156–162, 159f, 167 enriched 163 generalized scale-up estimator  162–164 recall error  160–161, 161f scale-up estimators  167 standard scale-up estimator  163–164 aggregate social interaction data  80–81 Agneessens, F.  337, 571–572 agonism 103–104 Ahnert, R.  437 Ahnert, S. E.  437 Airbnb experiment  526–527, 526f air transport network, global  379, 380 Akerlof, G. A.  299 Alba, R.  353 Albert, R.  373–374, 384–385 Alexander, M. C.  435 alter-alter ties  173, 174t, 175, 176, 180 alternating k-paths  226 alternating k-star parameter  226 alters, ego network data  174–175, 174t, 177, 179, 180 alters, named ego network data  171–172, 173, 174t, 175, 176, 177, 182–183 name generators  124 Amati, V.  458 Amazon Mechanical Turk  140–141 An, W.  290 Anderson, C. J.  228 Andris, B.  318 Annotated Community  520, 520f annotated networks  312 Ansell, C.  41, 42, 438 antecedents, social network dyadic level  191–193, 192f, 208 social capital and workplace outcomes  570–572, 571f antecedents, workplace outcomes and social capital  570–572, 571f anticategorical imperative  484

Appleby, M. C.  267, 268, 282 Aral, S.  290 Arango, J.  483 archaeology 445–460 Bronze Age Crete, settlements distribution network models  454, 455f Early, Cyclades  450–452, 451f Middle, ariadne network model  451–452, 452f “The Connected Past,”  449 dyads and triads  445, 446–448, 446f, 447f entangled networks, of humans and things 458–459 network fundamentals  445 network thinking in  448–450 obsidian sources, Maya network  457, 458f proximal point analysis  448–451 Saronic Gulf area, locally transitive networks  453, 454f spatial network analysis and “theory models,”  450–455, 452f–455f Syria settlements network  3rd mill. BC 453, 453f theory models to data models  455–458, 456f, 458f type 1 and 2 networks  456, 456f Archer, M. S.  51 arcs. See edges Arenas, A.  320–321 ariadne network model  451–452, 452f Arránz Becker, O.  486–487 articles cocitation network  8, 9f interdisciplinarity  3–4, 3f trends  1, 2f artists, dualities  396–397 art worlds, dualities  396–397 aspirational friendships  108 associational networks  435–436 assortative community structure  311 assortative mixing  28, 85 assortativity  107, 245–246, 326 asymmetry, evolution and  106–107 Atlas of Knowledge (Börner)  598 Atlas of Science (Börner)  598 attractiveness  100, 101 attribute-based dyadic effects  256–257

attributed networks  312 Aumann, R.  551 auto-logistic actor attribute models (ALAAMs) 235–236 multilevel  237–239, 238f autoregression models  196 Aven, B. L.  434 average nearest-neighbor degree  586 bag-of-words technique  418, 423–424 Bail, C. A.  418–419, 420 Baker, W. E.  622, 625 balance theory Heider’s 100 triads 24 Bales, R. F.  105 Balkundi, P.  196 Ballester, C.  542–543 Banerjee, A.  546, 550 Barabási, A.-L.  373–374, 384–385 Barbosa, N. M.  522–523 bargaining 550–552 Barkey, K.  433, 438 barrier effects, aggregated relational data  156–162, 159f, 167 basics, network  17–31 boundary specification  25–26, 26f building blocks  17–19, 18f bridging levels  22f, 23–25, 25f cohesion  27, 197 community 27 connectionist approaches  19–21, 20f connectivity 27 data collection  28 ethics 30–31 forms, basic network  21–23, 22f name generators  29 positional approaches  20–21, 20f sampling 29–30 statistical models  28 Baskaran, T.  591 Bastos, F. I.  162, 163 Batist, Z.  457 Bavelas, A.  197 Bayesian approach, network scale-up method 161–162 Bayesian exponential random graphs (BERGMs) 228

Bayesian learning  547–548 Beaman, L.  550 Bearman, P. S.  122, 404, 418, 433, 434, 436–437 Beauchesne, O. H.  598, 599f Becker, Howard  40 behavior. See also specific types definition 50 quality, on behavior relations  10 behavior, games and  540–552 development economics  549–550 exchange theory, bargaining, and trade  550–552 financial networks  544–546 games on networks  541–542 labor markets  548–549 social learning  546–548 strategic complementarities  542–544, 543f strategic substitutes  542 behavioral interactions (time-aggregated), social ties as  73f, 74–75 computational social science and  83 Behrman, J. R.  486, 487 belief networks  416 beliefs, social networks on  288 Belmont Report  30, 128, 129 Bender de Moll, S.  86, 360 Berelson, B.  547 Bergemann, P.  434 Bergm 239 Berkowitz, S. D.  288, 289 Bernard, H. R.  154, 155, 158, 165 Bernardi, L.  485, 487 Bernoulli graph  224 Bertoni, N.  162, 163 beta centrality  336, 338–340, 344, 349 betweenness  73, 190 contribution/induced centrality perspective  342–343, 342f family relationships  469 flow outcomes perspective  346–349 walk structure perspective  337–339, 340t between-subjects design  138 Bevan, A.  455 bias, causal inference limiting 300 mitigating potential  289 identification, ordinary least squares  294–296, 294f

bias, causal inference (Continued) longitudinal data and carefully measured covariates 299 methods 295–296 randomization 290 omitted variable  288, 292–293, 292f perception 291 quantifying 296 recall 300 self-report 291 triadic closure overgeneralization  300–301 bias, successive textual readings  417 bibliometry 384 Bienenstock, E. J.  551, 552 big data  516–517. See also computational social science challenges 530 definition 516 machine learning and social sciences  521–525 network visualization  361 online experiments, on interactions  526 textual analyses  404 Bikhchandani, S.  546 Billari, F. C.  482 bipartite networks  228 affiliation 401–402 exponential random graph models  235 birth control, adoption  484–487, 489–490 Blake, E.  449, 457–458 Blau, P., Exchange and Power in Social Life 3 Block, P.  265, 266 blockmodel analysis international trade network  585–586 moral order patterns  397 block modeling  51 seminal works  1–2 social niche  53–54 Blume, L. E.  551 Blumer, H.  38, 38t Böhm, T.  437–438 Bonacich, P.  336, 551, 552 Bonacich power  336, 338–340, 344, 349 bonding social capital  468–469 Boorman, S.  51 Borgatti, S. P.  120, 255, 335, 337, 341, 343–344 Börner, K.  598, 600 Bothner, M. S.  105

Bott, Elizabeth  3 Bottero, W.  40 bottom-up dynamics  62 Bouchard, M.  625 boundary ambiguity, intransitivity  469–470 conditions 142 specification  25–26, 26f specification problem  121–123, 125 boundedly rational learning  547 bounded solidarity  55 Bourdieu, P.  37, 38t, 40, 46 Boutyline, A.  416 Braga, A. A.  626 Brailly, J.  53 brain network architecture, different states  322–324, 323f processing capacity, on social network size 507–508 shaping and constraining social networks  506–507 size, on social network size  496–498, 497f social behavior and, rationale for study of 499–500 social brain hypothesis  496–498, 497f social networks shaping  508–510 Brainerd-Robinson coefficient for similarity 457 Bramoullé, Y.  553 Brandes, U.  260, 335 Brashears, M. E.  146–147 Brass, D. J.  194, 206, 574 Breiger, R. L.  51, 59, 60, 392, 394, 395, 398, 400, 403, 418, 585, 622, 623 Brennecke, J.  240 Breza, E.  154 bridges 37–38 actors with  37 Bridges, of Königsberg  372–375, 373f, 375f bridging levels  22f, 23–25, 25f bridging social capital  469 brokerage criminal networks  624 historical network research  438 positions 572–573 role 194 Bronfenbrenner, U.  353

Bronze Age Crete, settlements distribution network models  454, 455f Early, Cyclades  450–452, 451f Middle, ariadne network model  451–452, 452f Broodbank, C.  450–452, 451f, 452f Making of the Middle Sea 449 Brown, S. L.  470 Brughmans, T.  449, 459 Bryant, R. A.  235, 238 Buchnea, E.  436 Buckee, C. O.  325, 327 bucket brigade  22, 22f, 23 building blocks  17–19, 18f bridging levels  22f, 23–25, 25f bureaucracy 52–53 Burt, R. S.  24, 35–38, 35f, 38t, 44, 44t, 195, 288, 539, 572, 624 Butts, C. T.  86–87, 88, 256, 258–261, 259t, 278, 279 Cabrales, A.  540 Caenorhabditis elegans neural network  319–322, 321f caging 103–104 Caimo, A.  239 Caldwell, J. C.  482–483 Calvó-Armengol, A.  540, 542–543, 549, 553 Cambrosio, A.  402 capacities, individual and collective  50–51 career maps, CyberSeek  610, 612f Carell, S. E.  290 caring roles, family  475 Carley, K.  418 case studies, network community detection  311–327 agglomerative methods  313 assortative community structure  311 attributed networks  312 community concentration  315–317 congressional roll call  317–319, 318f contagion exposure  316–317 divisive methods  313 dynamical systems  313–314 “goodness” of partitions  313 graph partitioning  312 ground truth  314–315

groups (communities)  311 heuristics 313 hierarchical clustering  313 homophily 312 karate club network, Zachary’s  314–315, 314f malaria genes, probabilistic network model  325–327, 326f mesoscopic perspective  311 modularity  313, 317–319, 318f, 319–320 modularity maximization  312–313, 319–322, 321f, 323–324 network architecture, human brain at different states  322–324, 323f neural network, Caenorhabditis elegans  319–322, 321f social memes, virality prediction  315–317 statistical inference  313 structural communities  312 Castells, M.  378 Casterline, J. B.  485 Castren, A.-M.  470 catnets 42 causal explanations, machine learning and 522 causal identification latent space  298, 299 by research type  138–139, 139f ruling in and ruling out  138 causal inference  288–302 behavior, social networks on  288 historical debates  288–289 influence process  289–290 interaction parameters, selection  297 observational studies  290–294, 292f, 293f randomized experiments  290 robustness of inferences, quantifying and informing debate  296–297 robustness of inferences, quantifying from selection models  298 selection models, estimation  297–298 simulation example, ordinary least squares identification  294–296, 294f causal mechanisms  191 causal models. See models, social network; specific models causes. See also specific topics structural 522 Centola, D.  143–144

centrality (centralization) capturing, theory and  2 computational social science  72, 76, 77, 80, 84 definition 100 flow 341 group level  23, 190–191, 190t cross-level interaction  206 multilevel models  204, 205 network consequences  197 network emergence  196 historical network research  438 information  339, 342 intercentrality 543–544 Katz-Bonacich  542–545, 543f network mediation models  198 network moderation models  200–201 nodal level network consequences  195 network emergence  194–195 centrality measures  334–349 beta centrality (Bonacich power)  336, 338–340, 344, 349 betweenness (see betweenness) closeness (see closeness) contribution/induced centrality perspective  334, 340–345, 341f–342f, 343t, 344f, 345t, 348 endogenous 334 exogenous 334 flow outcomes perspective  334, 345–347, 347t, 348–349 fundamentals  334–335 (See also specific types) graph invariant  340–344, 347–348 isomorphism rule  335 key player approaches  344–345, 345t, 349 neighborhood preserving  335 PN measure  335, 340t walk structure perspective  334, 336–340, 340t, 347–348 well formedness  335 centralized communication structure  197 central nodes  23. See also centrality Chains of Opportunity (White)  51 Charness, G.  551 Chen, C.  598 Chen, L.  154

Chierichetti, F.  524, 526–527 choice attractiveness 100 esteem 99–102 expansiveness 100 level 258 popularity 100 rational  35, 537–538 chord diagrams  358 Christakis, N. A.  288–289, 295 circles, intersection  392 circular layouts  357–358 civic association networks  436 classic learning theories  617 Clauset, A.  325, 327, 523 Cline, D. H.  434 clique  27, 100 closeness centrality  73, 84, 190, 335 contribution/induced centrality perspective  341–342, 341f flow outcomes perspective  346 walk structure perspective  337, 339, 340t neural representation  501–503, 502f reciprocal 337 close reading approaches  428 clumps  226, 447, 447f cluster breaking links  452 clusters 100 Cobo, M. J.  598 cocitation network  8, 9f coevolutionary models, stochastic actor-oriented models  280 cognitive social structures  128 Cohen-Cole, E.  288 cohesion 437–438 historical network research  437–438 population 179 social capital  27, 197 theory and  2 Cointet, J.-P.  404, 418 Cole, M. W.  322, 323–324 Coleman, J.  36, 37–38, 38t, 44t, 46, 52, 119, 207, 288 Collar, A.  449 collective actors and action  43, 50 capacities 50–51

identities 42 learning, socialization and  55–56 outcomes, social capital and  572–573 social capital, social processes as  52, 54–56, 63 collegiality 52–53 collegial oligarchy  60–61 comembership  88, 137. See also duality one-mode matrix  392–393 ties 393 Comet, C.  52 commitments, neo-structural sociology  51, 311 common ground  39 common resource management satisfaction and information exchange between users  243–246, 245t communication network  74, 80 observable regularities  43 communicative events  43 community  26, 26f, 27. See also primary group concentration 315–317 community detection, network  311–313 algorithms  27, 312, 314–315, 314f, 320–322, 321f, 404, 520–521, 520f (See also specific types) case studies  311–327 (see also case studies, network community detection) literature 521 relational models  10–11 Community Question  370 Compass 529 competition 289 complete network designs all nodes  192 “boundary specification” problem  121–122 data collection and sources  28, 220 ego network  170, 173, 179 multilevel network analysis  403 multilevel network structures  59 nodal level network consequences  195–196 network emergence  194–195 respondent recall  160–161 sampling 29 time alters  125 computational narrative analysis, for embedded meaning  423–427

approach basics  423–424 results  424–427, 425f–427f subject-action-object network construction 424 computational social science (CSS) challenges and opportunities  516–517 definition 516 graph theoretical models and algorithms  517 role relations  82 sentiments 82–83 computational social science (CSS), big data, and networks  516–530 challenges 528–530 community detection algorithms  520–521, 520f computational thinking, on social processes  517–519 ethics and user consent  528 General Data Protection Regulation  528 Kleinberg 518–519 machine learning and social sciences  521–525 Milgram 518 modeling social data, challenges  519–521, 520f online experiments field 527–528 on interactions  525–527, 526f privacy 527–528 Simon 517–518 Tversky and Kahneman  519 computational social science (CSS), social networks 71–88 access 71 data analysis revolution  85–88 data collection revolution  81–85 access/opportunity structures  84–85 behavioral interactions  83 role relations  82 sentiments 82–83 interactions 71 mapping theories to data, discrepant conceptualizations 78–81 aggregate social interaction data, for access and social sentiments theories  80–81 role relation data, for social interaction, access, and sentiments theories  79–80

computational social science (CSS), social networks (Continued) network ties, conceptualizations, comparing 76–78 temporality 77–78 ties and null ties, treatment  76–77 role relations  71 sentiments 71 ties, social, conceptualizations  72–76, 73f access or opportunity  73–74, 73f (time-aggregated) behavioral interactions  73f, 74–75 interpersonal sentiments  73f, 75 socially constructed role relations  73f, 75–76 computational thinking, on social processes  517–519 computer-assisted story grammars  424 conceptual metaphor theory  501 concurrency 77 conditional uniform distributions  219 Condorelli, D.  551 configurations, exponential family of random graphs—p* 225 conformity/norms 289 connected agents  542, 552 connectedness 190 “The Connected Past,”  449 connectionist approaches  19–21, 20f connections measuring social, ego network data  176 model  538–541, 539f strong “redundant,”  23, 177 connectivity 27 kinds 445 scale-free distributions  384–385 consensus algorithm  324 consensus matrix  321, 321f consequences, social network  208 dyadic level  192f, 193–194, 208 group level  197 nodal level  195–196 contagion  45, 71. See also disease spread banks and financial distress  536 birth control adoption  485 case study, network community detection  316–317 children’s behavior  553

data analysis  86 emotional  83, 133, 529 error and error correction  146 exposure 316–317 externalities 541 financial networks  545–546 flows 120 friendship network  78 gun violence  622 health behaviors and attributes  288 obesity 289 social  193–194, 196, 201, 315–316 social memes, virality prediction  315–317 violence 625 contagion effect causal inference observational studies  291–294 ordinary least square identification, example  294–295, 294f individual accomplishments, sharing across team  246 information exchanges  244 contagion-reciprocity effect  246 continuity, computational social science  72–73, 77–78, 80–82, 84–85 social ties as (time-aggregated) behavioral interactions 73f, 74–75 continuous decay  87 contour shading  362–364, 363f Contractor, N. S.  192 contribution perspective  334, 340–345, 341f–342f, 343t, 344f, 345t, 348 control 195 “convergence of iterated correlations” (CONCOR) 318 Cook, K. S.  551 Corcoran, K. E.  436 Cornwell, B.  471 Corominas-Bosch, M.  551 cottage-based network  369 coupling parameter  324 Coward, F.  449 Coyne, S. M.  473 Cranmer, S. J.  263, 265, 266, 319 Crawford, F. W.  154 criminal networks  616–628 courthouse criminology bias  623 criminal justice applications  626–627

data  618, 619t–621t, 623–624 Katz-Bonacich centrality, intercentrality, and key players  543–544, 543f measuring 618–622 co-offending groups  622 criminal investigations  622 groups  618–622, 619t–621t moving forward  627–628 research literature review  618, 619t–621t locating groups and networks  617–618 origins and history  616–617 theoretical foundations  624–626 diffusion 625 group process  622, 625–626 organizations 624–625 theories 617 Cronbach, L. J.  301 Crosnoe, R.  299 cross-cutting ties  433–434 cross-level interaction effect  205f, 206 Crossley, N.  39–40, 44, 44t, 45, 46 cross-national networks  436 crowdsourcing platforms, experimental research 140–141 CTSA Expertise Explorer, NIH  602–603, 607, 608f culture  39, 42–43 forms (See also specific types) dualities 397–398 holes 400 homophily 589 meaning  414–428 (see also meaning, culture networks) models 43 networks and, fusion  398–399 symbols  42–43, 45 weak 51 culture, duality in analysis of  396–400 actors and cultural forms  397–398 artists and art worlds  396–397 networks and meanings  398–399 networks–culture fusion  399–400 cumulative advantage  384 cumulative causation of migration  482, 483–484 Curran, S. R.  487

CyberSeek career maps  610, 612f cyclic closure  240, 241t Danowski, J. A.  435 Danzi, A. D.  485 Daraganova, G.  235, 238 dark networks  623 data. See also specific topics relational 28 sharing 119–120 theory mismatch  78 data analysis computational social science  85–88 ethical considerations  130–131 data collection  28 complete network studies  220 computational social science  81–85 (see also under computational social science) access/opportunity structures  84–85 behavioral interactions  83 role relations  82 sentiments 82–83 ethical considerations  128–130 name generators  29 ties, reliability and validity  126 data collection strategies  119–131 data quality and assessment  126–128 (see also data quality and assessment) ethical considerations  128–131 data analysis and presentation of results 130–131 data collection  128–130 flows 120 gathering, theory in  120–121 interactions 120 sampling measurement, design strategies “boundary specification” problem  121–123, 125 name generator  123–124, 125 name interpreters  124–126 social relations  120 data quality and assessment  126–128 cognitive social structures  128 data fidelity, optimization strategies  127 implications and quality assessment  126–127 tie reliability and validity  126

Davies, J. L.  455 Davis, A.  344 De Fazio, G.  424 degeneracy problem  226 degree  190, 337 definition 23 estimating, network scale-up method  157–161, 159f, 161f degree based  257 degree-degree correlation  586–587 De Groot, M. H.  547 Delarre, S.  54 DellaPosta, D.  437 demographic transition theory  482–484, 487, 489, 490 demography 480–491 behavior on network structure  490 behaviors 484 demographic transition theory  482–484, 487, 489, 490 diffusion 484–485 enumeration, estimation, and explanation  481–484 future directions  489–491 hidden and rare populations  480, 481, 488–489 immigration 488 micro-macro analytic link  481 migration 481 network approaches and current contributions 484–489 network evolution  490 populations 481 relationalism and transactions  484 scope 481 social networks  483–484 social structure  484 structural characteristics, mechanisms  489 substantialism and categorical approaches 484 de Nooy, W.  261 density  42, 190, 190t family relationships  468 group-level effects diversity  206, 207, 207f multilevel network models  204, 205 network consequences  197 network emergence  196

macro-micro-macro models  206 network coevolution model  202 network mediation models  198 network moderation models  200–201 social capital  27, 197 density effect, international trade network  593 dependence graphs  222–225 exponential family of random graphs—p* 223 maximal cliques  225 dependency, exponential random graph models 235 descriptive network research  190–191 Desmarais, B. A.  263 desolidarization 55 deterministic structuralism  50. See also neo-structural institutionalism development economics  549–550 Dewey, John  39 DIA2, NSF graph tool  602, 603–604, 605f–607f Institutional Explorer  603, 606f NSF Org Structure  603, 605f People Explorer  603, 606f Thesaurus Concepts  603, 605f Topic Explorer  604, 606f Dietz, T.  296 diffusion criminal networks  625 demography 484–485 dynamics  9–10, 19–20 error and error correction process  146–147 family planning  484–485 social psychological linkages  10 structure and  10 DiMaggio, P.  42, 396, 397 dimensional reduction techniques  355, 356f DiPrete, T. A.  154, 156, 165 direct connectedness  190 directed edges  18 directed ties  18 Dirichlet allocation (DA), latent  428 topic modeling algorithm  404 disease spread  20, 21–23, 22f, 122, 180 air transportation, global  379–380 artificial network models  279 centrality 346 connectionist approach  20 dynamics 255

ethics 130 externalities 541 flow outcomes perspective  346 flows 120 malaria genes  325–327, 326f sexually transmitted infections  153, 193 spatial dimensions  379, 380 virality prediction, social memes  316 dissonance 75 distance demography 487–488 geodesic  190, 190t, 191 graph-theoretic  100, 334, 337, 346, 349, 355, 356 Katz-Bonacich centrality, decay  543–544 neural encoding, social network position 504 neural representations, social closeness  501–502, 502f node pairs  348 population 179 residential proximity  565 retrospective time, event weighting by  87 role, emotional support  53 social dimensions defining  402 visualization, network  354–356, 355f–357f, 358, 359, 359f distance, archaeology long  451–452, 454, 457 short, local interactions  453 theory model  455 distance, computational social science Airbnb investment game  526f, 527 long-range links  518 distance, geographic criminal networks  619t, 621t international trade networks  583, 586, 588, 589 social distance and  6 distance, physical ego network data  180, 190, 195 global scale  378 micro-level networks  370–372 social tie on  370 tie formation  85 topological  379, 381 distance-based layouts  355–358, 357f, 359 distant reading approaches, computations  428

divisive methods  313 Djebbari, H.  553 documents duality  404–405 Dodds, P. S.  402 domain 41 dominance relations, Eurasian red deer  267–279 Donati, P.  51 Doreian, P.  288 dual alters  58, 61 dual approach  375–376, 375f dual embeddedness, job matching  569 duality concept 392 discussion, scope  395–396 mutually constitutive  394 relational device  394 structural mechanism  393 duality, beyond person and groups  392–406 circles, intersection  392 comembership, one-mode matrix  392–393 concept 392 culture, analysis of  396–400 actors and cultural forms  397–398 artists and art worlds  396–397 networks and culture, “fusion,”  399–400 networks and meanings  398–399 linkages, different social element types  394 matrix transformations  392–393 mutually constitutive  394 past and present sociology  395–396 recent developments and future directions  402–405 cultural analysis, documents duality and words yielding categories  404–405 duality and extensions toward multiple networks 403–404 relationality to “fusion” of networks and culture 399–400 structural mechanism  393 structure analysis, affiliation networks  400–402 actor-network theory  402 affiliation networks  401 “heterogeneous networks,”  402 “new” science  393, 401–402 two-mode incidence matrix (affiliation matrix) 392–394

dual positioning  403 dual process sociology  416 DuBois, C.  261 Ducruet, C.  452, 454–455 Dunbar, R. I. M.  496–498, 497f Dunbar’s number  497–498, 497f Duong, M.  296 Duquenne, V.  397–398 Durkheim, É.  52, 395 dyad (dyadic level)  24, 25–26, 26f, 189, 190, 190t, 436 archaeology  445, 446–448, 447f battling 436 definition 24 event counts  74 homophily effect  192f, 206–207 multiple groups analysis  203–206 network antecedents  191–193, 192f, 208 network coevolution model  202f, 203 network consequences  192f, 193–194, 208 network mediation models  198, 199f network moderation models  200f, 201 ties  74, 75, 81, 87–88 transmission 189 dyadic effects attribute-based 256–257 change processes  256–257 entrainment 257 propinquity-based 257 proximity, Eurasian red deer dominance modeling 269 dyad-wise shared partner distribution  226 dynamical systems  313–314 dynamic dimensions  58–59 dynamic invariants  58 dynamic networks  19 dynamic relationships  45 dynamics diffusion with  9, 10 network, modeling  254–282 (see also network dynamics modeling) dynamic social diffusion models  9 Eagle, N.  88 ebay.com  386–389, 386t, 388f ecological space  481 ecology of affiliation  395–396 of games  395

economics 535–554 behavior and games  540–552 development economics  549–550 exchange theory, bargaining, and trade 550–552 financial networks  544–546 games on networks  541–542 labor markets  548–549 social learning  546–548 strategic complementarities  542–544, 543f strategic substitutes  542 externalities 535–536 history and overview  536–537 Nash equilibrium  538, 542 network formation  537–540, 539f network models, empirical analyses  552–553 studying network structure, rationale  535 economic sociology, social capital and  563–575 Eder, D.  102 edge effect, empirical examples  241t, 242 edges  17–19, 22, 22f definition 17 degree 23 directed and undirected  18 removal 27 edge-wise shared partner distribution  226 education maps  598–613. See also maps, science, technology, and education ego-centric networks  38 ego network  5, 26, 26f designs  122, 125, 129 ego network data  29, 170–183 advantages 172–173 applications  171, 175–181 full network features, inferring  179–181, 181f individual-level outcomes, predicting  176–177 RDS estimation, improving  178–179 social boundaries, measuring  177–178 definition 170 disadvantages 173–174 examples  171–172, 171f full 170–171 future uses  181–183 history 170 information  174–175, 174t ego node  25, 26f Elias, N.  52

Ellefson, N. C.  292 Elliot, P.  235 Elliott, M. L.  551 Éloire, F.  55 embedded meaning, computational narrative analysis  423–427 approach basics  423–424 results  424–427, 425f–427f subject-action-object network construction 424 embeddedness 27 criminal networks  624 dyadic relationship  470–471 family relationships  470–471 neo-structural sociology  53 paradox 54 social capital  37 embedding, word  428 Emerson, R. M.  551 Emirbayer, M.  39–40, 43, 44, 44t, 46, 484 empathy 39 empty graph  22 endogeneity  104, 553 measures, centrality  334 network effects, Eurasian red deer dominance  273–276, 274t structure 257 endorsement 550 Entangled (Hodder)  460 entanglement theory  458–459 entrainment 257 entwined lives roles, family  476 Entwisle, B.  485 en vigueur 57 eopinions.com  385, 386t epidemiological perspective  9 equation-based models  371 equivalence individuals 100 structural 51 Erdos, P.  237 Erdös-Rényi conditional uniform distribution 219 Erdös-Rényi “random graph” distribution 221 error and error correction process, network diffusion 146–147 Ertug, G.  573

esteem centrality 100 choice 99–102 estimable models  257 ethics 30–31 computational social science  528 data collection strategies  128–131 data analysis and presentation of results 130–131 data collection  128–130 ethnic minorities, social capital access deficit  565 workplace  565, 566, 568–569, 571 Euler, L.  373 Eurasian red deer, dominance relations modeling 267–279 attribute effects: age  269 dyadic effects: proximity  269 endogenous effects reciprocity 269–270 triadic closure  270–271, 270f modeling strategy and results  270–279 baseline 270–271 extended 277–278 more practical  272–277, 274t, 275f subsequent steps  278–279 model statistics  269 Evans, E. D.  419 Everett, M. G.  335, 337, 341, 344 evolution, asymmetry and  106–107 exchange, systems of, family relationships  471–474 parent-child interactions  470, 471, 473–474, 476 studies 471–474 Exchange and Power in Social Life (Blau)  3 exchange theory  73–74, 550–552 exclusion 55 social, network recall and  147–149, 147f exogenous measures, centrality  334 expansiveness  100, 101 expectations, relational  43 experiments. See also specific types boundary conditions  142 crowdsourcing platforms  140–141 definition 138 designs 138 goal, causal mechanism identification  140–141

implicit theory  142 natural 138 quasi-  138–139, 139f scope conditions  142 true 138 two-by-two factorial design  143, 143f within-subjects and between-subjects  138 experiments, social networks  137–149 causal identification/generalizability, by research type  138–139, 139f challenges 137 definition  138–139, 139f definition and types  138–139, 139f examples 143–149 error and error correction process, network diffusion  146–147 homophily and health behavior spread 144–145 Matthew effect and networks  145 network recall and social exclusion  147–149, 147f interventions 10 manipulations  141–143, 143f study, feasibility  139–141 explanatory network research  191 exponential family of distributions  225 exponential family of random graphs—p*  219, 221, 223–228 Bernoulli graph  224 complete subgraphs  223 degeneracy problem  226 dependence graphs  222–225 Hammersley-Clifford theorem  222, 224 Markov chain Monte Carlo maximum likelihood estimation  227, 239 Markov dependence  224–226 Markov random graph  221, 226 parameters 225–227 simulation, estimation, and goodness of fit 227–228 statistical theory  223–225 exponential random graph models (ERGMs)  3, 5, 11, 137, 219, 221, 225 applications  28, 192 definition 234 dependency 235 duality 403

ego networks  178, 180 goal 372 Hammersley-Clifford theorem  224 international trade network  583, 592–593 key assumptions  234–235 multiplexed and multilevel networks  235 network decomposition, substructure  54, 56 network dynamics  258, 263–265 network tie, interdependences  234 one-mode networks  235 relational sociology  44, 120 relative-to-chance analysis  178 spatial dimensions  371–372 structural patterns, static networks  86 temporal 86 two-mode/bipartite networks  235 exponential random graph models (ERGMs), advances 234–249 empirical examples  239–249 common resource management and information exchange  243–246, 245t individual accomplishment sharing, team  246–248, 247t multiple project memberships and advice seeking, organizations  240–243, 241t ERGM and ALAAM fundamentals  234–236 future steps  249 model constructs  236–239 modeling techniques  239 multilevel ERGMs and ALAAMs  237–239, 238f exponential random graphs, Bayesian  228 externalities  535–536, 547 external validity, network effects  290, 301, 385 Ezoe, S.  157 Facebook brain and social networks  506, 507 computational social science  82, 83, 522, 529 network text analysis  418–419 randomized experiments, causal inference 290 triad closing  79 Facebook effect  104–105

Fagiolo, G.  591–592 false negatives  79, 127, 129 false positives  129, 457, 530 family planning, diffusion  484–487, 489–490 family relationships  467–477. See also kin archetypical exchanges, multiplexity and network position  20–21, 20f betweenness 469 density 468–469 dependent, child  474 “does for” and “does with,”  475 embeddedness 470–471 emergent 467 exchanges, systems  471–474 heterogeneity 467 measuring 468 parent-child interactions  470, 471, 473–474, 476 reciprocity 470 role equivalence measure  475 roles, interaction  474–477 affectionate 475–476 caring 475 entwined lives  476 friendly 476–477 limited interaction  476 size 468 transitivity 469–470 voluntary character  468 Fang, R.  198 Faris, R.  101, 106, 436 Faulkner, R. R.  622, 625 Faust, K.  219 Faye, M.  244 Fazito, D.  162, 163 feedback (loop) demography 481 dynamic social diffusion models  9 games on networks  542 peer, divergent  499 peer-effect game  544 preferential attachment, field experiments  384–385, 389 feedback loop  7 Feehan, D. M.  154, 163–166, 488–489 Feld, S. L.  108 Felmlee, D. H.  471 Fernandez, A.  320

Fernandez, R. M.  575 fertility decline, mechanisms  484–487, 489–490 fight club data  119 financial networks  544–546 Fine, G. A.  39–40 first law of geography  368 first law of networks  368 fixed-coordinate layouts  357–358 fixed-resolution community detection algorithm 320 Fletcher, J. M.  288 flow centrality  341 flow outcomes perspectives  334, 345–347, 347t, 348–349 flows  120, 255–256 Foran, J.  433 formal relational systems  10 forms, basic network  21–23, 22f Fortin, B.  553 Fortunato, S.  312 Fowler, J. H.  288–289, 295 Frank, K. A.  292, 296, 297, 299–300, 356 Frank, O.  221, 224–225, 234, 237 Franzosi, R.  417, 424, 437 Fraser, A. M.  473 Frechette, G. R.  551 freedom, moment of  46 free-form generated text  11 Freeman, L. C.  11, 267, 335, 337, 341–342, 353 Freeman, S. C.  267 free-rider problem  56 frequency matrix  321, 321f Freund, K. P.  457 Friedmann, J.  378 Friel, N.  239 friends defining 74 family role  476–477 of a friend  24 vs. strangers, brain encoding  500–501 friendships aspirational 108 micro-interactions 256 Fruchterman, J. T.  356 Fruchterman-Reingold algorithm  267, 356, 357, 357f, 359, 362 Fu, W.  263 Fuhse, J.  40, 43–44, 44t, 83, 399

functional networks  322 human brain network architecture, different states  322–324 Galeotti, A.  551 Galunic, C.  573 games, behavior and  540–552 development economics  549–550 exchange theory, bargaining, and trade 550–552 financial networks  544–546 games on networks  541–542 labor markets  548–549 social learning  546–548 strategic complementarities  542–544, 543f strategic substitutes  542 games on networks  541–542 game theory  538 Gantt diagrams  360, 362f Gardner, B. B.  344 Gardner, M. R.  344 Gargiulo, M.  573 Gaudet, H.  547 Gauthier, R.  469, 471, 473, 475, 476–477, 490 Gelman, A.  156, 289 General Data Protection Regulation (GDPR) 528 generalizability models 203–204 by research type  138–139, 139f, 206 generalized network scale-up  162–164 generalized scale-up estimator  162–164 general similarity networks  456–458 General Social Survey (GSS), “important matters” generator  29 generated text, free-form  11 generative network models  8 generative theories  74 geodesics distance between two nodes  73, 190, 190t, 191, 195, 226, 356, 504 distribution  226, 227 flow outcomes perspective  346, 347t paths 374 walk structure perspective  336–339, 340t geography first law of  368 homophily 589

geometrically weighted degree parameter  226 Gerding, H.  456, 456f, 459 Gestalt psychology  40 Gilbert, E.  82 Gladstone, E.  146–149 Gleeson, J. P.  255 globalization, economic, international trade network 583. See also international trade network (ITN) global network  26, 26f global-scale spatial networks, measuring  376–379, 377f Gloor, P. A.  127 Goffman, E.  38, 38t Goldberg, A.  400 Golder, S. A.  82 Golitko, M.  457–458, 458f Gomez, S.  320–321 goodness of fit (GOF) exponential family of random graphs—p* 227–228 exponential graph models  239 “goodness” of partitions  313 Goodreau, S.  28, 227, 228, 239, 458 Goodwin, J.  433, 484 Gould, R. V.  42, 433, 625–626 Graham, S.  459 grandparents caring roles  475 friendly roles  476 Granell, B.  320–321 Grannis, R.  374 Granovetter, M.  23–24, 434, 447, 567, 568, 624 graph. See also specific types dependence 222–225 empty 22 invariant  340–344, 347–348 maximally connected  22f, 23 partitioning  312, 521 theoretical models and algorithms  517 graph-theoretic distance  100, 334, 337, 346, 349, 355, 356 gravity model international trade network  583, 590–593 future direction  592–593 homophily 588 multivariate regression quadratic assignment procedure  583, 592

systemic equivalence  590 topological properties  591–592, 593 M1 × M2/D² 370 tie strength  584 topological properties  586–587 Grippa, F.  127 Grosser, T. J.  201 ground truth  314–315 group interlock network  401 group level  189, 190, 190t, 311 models 189 dyadic and nodal-level analysis  204–205, 205f network coevolution model  202, 202f network consequences  192f, 197 network mediation  198, 199f network mediation models  198, 199f network moderation models  200–201, 200f group process  71 criminal networks  622, 625–626 Grund, T. U.  200 Gu, B.  264 Guerra-Pearson, F.  398 Guillory, J. E.  290 Hagedorn, J. M.  623 Haggis, D. C.  454, 455f Hallinan, M. T.  79 Hammersley-Clifford theorem  222, 224 Hampton, K. N.  29 Hancock, J. T.  290, 529 Handbook on Archaeology and Globalization (Hodos) 449 Handcock, M. S.  239, 264 Hanneke, S.  263 Hansen, M. T.  573 Hansen, W. A.  385 Harding, A. F.  449 Hawthorne factory, Western Electric Company 570 Haynie, D. L.  176 health behavior spread, homophily  144–145 Healy, K.  355–356 heat maps  364, 364f, 365f Hedström, Peter  35, 36, 44t, 45–46 Heider, F.  100

herds economics 546–547 Eurasian red deer  267–279, 282 (see also Eurasian red deer, dominance relations modeling) “heterogeneous networks,”  402 heterogeneous relational data  402 heterophily 257 matching, degrees  100 heuristics 39 Hidaka, Y.  157 Hidalgo, C. A.  402 hidden populations  480, 481, 488–489 hierarchical clustering  313 highly variable regions (HRVs), malaria genes 325–327 Hillmann, H.  433, 434, 437–438 Hipp, J. R.  370–371 Hirshleifer, D.  546 historical network research  432–439 associational and organizational networks 435–436 brokerage and centrality  438 cohesions 437–438 cross-cutting ties  433–434 informal social ties  434–435 narrative networks  436–437 presumption, fundamental  432 Hodder, I.  459 Hodos, Tamar, Handbook on Archaeology and Globalization 449 Hoffman, M. A.  404–405 Hofman, J. M.  522 hold-up problem  551 holes cultural 400 structural (see structural holes) Holland, P. W.  79, 100, 101, 219, 221, 224, 238, 254, 258, 281 homogeneity, exponential family of random graphs—p* 225 homophilous attraction  72, 75, 77, 81 homophily  21, 25, 34, 256–257, 288, 312 culture 589 definition  565, 588 deviant 21 empirical examples  241t, 242 exchange partner choice  55

geography 589 health behavior spread  144–145 index 77 on interactions  527 international trade network  588–590 latent 220 meme spread  316 politics 589 relational structure, among attitudes  400 social learning  548 social network connectivity  565 trust 522–523 Hopcroft, J.  517, 520 horizontality 100–101 horizontal patterning  38, 38t Hric, D.  312 Hubbell, C. H.  338 Huber, J. C.  384 hub-spoke systems, centrality  100 Hudson School for Girls, runaways  369, 616 Hughes, L. A.  626 humans. See also neuroscience brain network architecture, different states  322–324, 323f entangled networks  458–459 micro-level networks  369–372 sociality 498 Hunt, T. L.  448 Hunter, D. R.  239 Hutcheson, Francis  98 identification 289 identities 41 collective 42 Identity and Control (White)  41, 398–399, 400 Ikeda, M.  221 imitation 289 implicitly measured transmissions  194 importance sampling  239 “important matters” generator 29 question  29, 108, 123, 182 income, network effects  567–568 indirect connectedness  190 indirect social relationships  503

individual-level accomplishments, sharing across team  246–248, 247t action, social structures and  35–36, 35f capacities 50–51 level (see nodal level) models 189 outcomes, ego network data prediction  176–177 resources  37, 38, 38t induced centrality perspective  334, 340–345, 341f–342f, 343t, 344f, 345t, 348 inequality  99, 107–108 inertia  87, 589 inferences, causal, quantifying robustness informing debate  296–297 selection models  298 influence 573–574. See also causal inference; specific types process 289–290 simultaneous 291 influenza spread, air travel patterns on  379 Infomap  316, 324, 520, 520f informal social ties  434–435 information centrality  339, 342 exchange, between users  243–246, 245t externalities 547 influence based in  289 passing 550 theoretic algorithm  316, 524 informed consent  30 informR (R package)  260 Ingram, P.  55, 436, 438 innovation outcomes, social capital  572 input-process-output model  198 institutionalism, neo-structural  57–58 institutional logics  397–398 institutional review boards  129–130 institutions  42. See also specific types instrumental action  38 instrumental variable (IV) method  295 interactionism  38–40, 44, 44t interaction parameters, selecting  297 interactions 255–256 interactions, computational social science  71, 120

aggregate social data, access and social sentiments theories  80–81 behavioral 83 (time-aggregated) behavioral, social ties 73f, 74–75 intercentrality  543–544, 543f interdependencies actors 50 organizational society  52–53 position 50 practice-related 52 relationships as indicators  51 interdisciplinarity  1, 3–4, 3f interlocking directorates  188, 393, 401 International Monetary Fund Direction of Trade Statistics (DOTS) database  584 international trade network (ITN)  583–593 binary studies, early  584 data and measurement  584–585 exponential random graph model  583, 592–593 multivariate regression quadratic assignment procedure  583, 592 social network analysis modeling strategies  588–592 homophily 588–590 systemic equivalence (gravity model)  583, 590–591 topological properties  591–592 topological properties  586–587 interorganizational ties  436 intersection set  127 intervening model  198 intransitive triad  24, 25f intransitivity, boundary ambiguity  469–470 intraorganizational networks definition 569 effects  571–572, 571f (see also social capital, economic sociology) performance and innovation outcomes  571f, 572 Irwin, G. J.  448 isolate  22, 22f isomorphism rule  335 Jackson, M. O.  538–539, 541, 549, 551 James, William  39

Jenkins, D.  455 Jennings, J.  449 Jeong, H.  373–374 job-matching, social capital and outcomes 567–569 processes  564–566, 564f, 567f Johnsen, E.  155, 158 Jones, C.  399 Kadushin, C.  24, 130, 154, 353, 445, 446–447 Kahneman, D.  519 Kalish, Y.  194, 235 Kamada, T.  356 Kamada Kawai  356, 357f Karahalios, K.  82 karate club network  314–315, 314f, 446–447 Karbasi, A.  154 Katz, E.  119, 288 Katz, L.  338 Katz-Bonacich centrality  542–545, 543f Kawai, S.  356 Keim, S.  487 Kelcey, B.  296 Kelly, M.  545 Kennedy, D. M.  626 Kennedy, M. T.  399 Kettering, S.  435 key players criminal  543–544, 543f measures, off-the-shelf  344–345, 345t, 349 Kick, E. L.  585 kickstarter.com  385, 386t Kilduff, M.  194, 206, 574 Killworth, P. D.  155, 158, 161 Kim, S.  591 Kim, Y. H.  88 kin, social support and  467–477. See also family relationships Kitts, J. A.  28, 72, 81, 85, 87, 88, 458 Kleinbaum, A. M.  499 Kleinberg, R.  517, 518–519, 520, 524, 526–527 Kleinman, S.  39–40 Knappett, C., Network Analysis in Archaeology 449 “know,”  157, 165–166 Kohler, H.-P.  482, 486 Kolaczyk, E. D.  220, 221 Königsberg, bridges of  372–375, 373f, 375f

Koskinen, J. H.  220, 235, 239, 249, 265 Kossinets, G.  86 Kovács, B.  102 k-partite networks  228 k-paths 226 Krackhardt, D.  56, 128, 574 Kramer, A. D.  290 Kranton, R. E.  299, 551 Kreager, D. A.  626 Kretzschmar, M.  9 Kristiansen, K.  449 Krivitsky, P. N.  264 k-triangles 226 Labianca, G.  201 labor market, social capital and  563–569 job-matching outcomes  567–569 job-matching processes  564–566, 564f, 567f labor markets  548–549 Lachmann, R.  436 Lane, M.  415–416 Larremore, D. B.  325, 327 latent class analysis (LCA)  246 latent Dirichlet allocation  428 topic modeling algorithm  404 latent homophily  220 latent space  220, 298, 299, 380 latent space model  220, 296, 298 Latour, B.  402 Laumann, E. O.  122 Lawler, E.  473 layers 319 Lazarsfeld, P.  547 Lazega, E.  53, 56, 229, 239, 296, 403 Lazer, D.  31 learning collective, socialization  55–56 naïve (boundedly rational)  547 rational (Bayesian)  547–548 theories, classic  617 Leavitt, H. J.  197 Lee, H.  156 Lee, L. F.  553 Leenders, R. T. A.  289 Lehmann, T.  436 Leifeld, P.  265, 266

Leinhardt, S.  79, 100, 101, 219, 221, 224, 238, 254, 258, 281 Lerner, J.  260 levels. See also specific types analysis  25–26, 26f bridging 22f, 23–25, 25f Lévi-Strauss, Claude  55 Lewis, K.  79, 82 Lewis, P. A.  507 LeX subway maps, learning  609–610, 611f Light, R.  360, 404 limited interaction roles, family  476 Lin, N.  37–38, 38t, 44t, 566 Lin, T.  525 Lin, X.  553 linchpins, vertical  57, 60, 63 lines 17. See also edges linkages. See also specific types social elements, different kinds  394 social psychological, diffusion  10 link-tracing methods  30, 122 Linton, R.  105 Liu, X.  553 “living the rules,”  57 Lizardo, O.  400 Lois, D.  486–487 Lomi, A.  403 longitudinal data, network influence vs. selection 291 longitudinal exponential family random graph models (LGERM)  264–266 longitudinal networks models  229–230, 258, 259t, 263, 264, 265, 280 (See also specific types) structure evolution/structure and attributes coevolution 230 longitudinal structures, for social processes 58–62 Lopez-Kidwell, V.  201 Louvain  520, 520f Lu, X.  170, 172, 178–179 Lundberg, G. A.  353 Luo, J.  438 Lusher, D.  220, 235, 249 Lyons, R.  288

machine learning, social sciences  521–525 macro-level networks, of places  376–379, 376t, 377f macro-micro-macro models  206–208, 207f macroscale 320 macroscopic 311 macro-structures 46 Mahy, M.  154 Maine, Henry Sumner  98–99 Making of the Middle Sea (Broodbank)  449 malaria genes, probabilistic network model  325–327, 326f Malkin, I.  449 Malm, A. E.  625 Maltiel, R.  161–162 Manchester School  98 Mandel, M.  475 Manea, M.  551 manipulations, experimental  141–143, 143f Manning, W. D.  470 Manski, C. F.  291, 292, 552–553 maps  357, 358 criminal rivalries and alliances  626 research collaboration network  598, 599f maps, science, technology, and education  598–613 CyberSeek career maps  610, 612f design  599–600, 600t, 601t exemplary 602–603 history 598 LeX subway maps, learning  609–610, 611f NIH CTSA Expertise Explorer  602–603, 607, 608f NIH Twitter Data (Activity) Explorer  603, 608–609 NSF graph tool DIA2  602, 603–604, 605f–607f Institutional Explorer  603, 606f NSF Org Structure  603, 605f People Explorer  603, 606f Thesaurus Concepts  603, 605f Topic Explorer  604, 606f research collaboration network  598, 599f scalable, multilevel maps  612–613 Springer Nature SciGraph  603, 604f utility  601–602, 602f Marcum, C. S.  256, 259t, 260, 261, 278, 279

Marin, A.  29 maritime transport flows  450–452, 451f, 452f Markov chain Monte Carlo maximum likelihood estimation (MCMCMLE) 227 Markov chain Monte Carlo (MCMC) simulation 239 methods, ALAAMs  236 Markov chains  299 Markov dependence  224–226 Markov random graph  221, 226 Markov stationarity assumption  281 Maroulis, S.  296 Martin, J. L.  39–40, 44, 44t, 46, 86 Massey, D. S.  483 material objects  45 Matic, A.  88 matrix adjacency  18, 18f affiliation 392–394 consensus  321, 321f frequency  321, 321f mixing  160, 166 one-mode, comembership  392–393 transformations 392–393 Matthew effect  61, 104, 145, 384, 389 maximally connected graph  22f, 23 maximum likelihood estimator  155–156, 158, 161–162 Maya sites, obsidian sources  457, 458f Mayora-Ibarra, O.  88 McCarty, C.  155, 156, 158–159, 159f, 161, 162 McCormick, T. H.  156, 158–162, 166 McEvily, B.  196, 288, 573 McFarland, D.  86, 261, 360 McFowland, E. III  296 McGloin, J. M.  622 McGuire, G. M.  571 McKenzie, R.  377–378 McLean, P. D.  399, 434 McPherson, M.  395 Mead, G. H.  38, 38t meaning belief networks  416 cognitive sociology  416–417 complex structures  416 dualities 398–399

forms 41 meaning of  415–418 simple 416 social life  415–416 social networks  41–42 structuralism 415–417 structures  43, 414 subjective 41 meaning, culture networks  414–428 definition 415–418 embedded, computational narrative analysis for  423–427, 425f–427f (see also  computational narrative analysis, for embedded meaning) examples 414 structures, network text analysis for  418–423 (see also semantic network analysis, for meaning structure) measures. See also specific types centrality 334–335 definition 334–335 Mechanical Turk (MTurk)  140–141 mediation group-level network  198, 199f network models  198, 199f Medici  41, 342–343, 342f, 343t, 347, 438 Mehra, A.  194, 200, 206 Mello, M. B.  162, 163 MelNet 249 memes machine learning  524 virality prediction  315–317 Mendelsohn, J.  490 Menninga, E. J.  319 Menze, B. H.  453, 453f Menzel, H.  119, 288 Merli, M. G.  490 Merton, R. K.  104 meso-level networks, of things  372–376, 373f, 375f mesoscopic 311 methodology  188–189. See also specific topics developments, theory and  1–2 (see also specific theories) Metis  520, 520f micro-interactions, friendships  256 micro-level networks, people  369–372

microscale 320 microscopic 311 Microsoft Learning eXperiences  610 Microsoft LeX subway maps, learning  609–610, 611f migration, demography  484, 489 cumulative causation  482, 483–484 future directions  489–490 social network concepts  487–488 Milgram, S.  20, 158, 518 Mills, B. J.  457–458 Min, K. S.  296 Minehart, D. F.  551 minorities, social capital access deficit  565 workplace  565, 566, 568–569, 571 Mische, Ann  39, 40–43, 44t missingness 101–102 Mitchell, J. C.  51 mixing matrix  160, 166 nonrandom 156 random 166 Mizruchi, M.  401 mobility in loops  51, 53, 61 mobilization strategies, job-finding  565–566 models, social network  188–209. See also specific models and topics levels of analysis  189–191 dyadic  189, 190, 190t group  189, 190, 190t nodal (individual)  189, 190, 190t methodological advances  298–299 network consequences  208 six basic models, social network analysis  189–197, 192f causal mechanisms  191 descriptive network research  190–191 dyadic level antecedents  191–193, 192f, 208 consequences 192f, 193–194 transmission 189 examples, by level  189–190, 190t explanatory network research  191 group level attributes 189 consequences 192f, 197 emergence 192f, 196–197

individual level, attributes  189 nodal level consequences 192f, 195–196 emergence 192f, 194–195 social relations  189 six basic models, variations and extensions 197–208 macro-micro-macro models  206–208, 207f multiple groups and multilevel models  203–206 multiple groups and multilevel models, cross-level interaction  205f, 206 multiple groups and multilevel models, generalizability 203–204 multiple groups and multilevel models, group-level effects  204–205, 205f network coevolution model  201–203, 202f network mediation models  198, 199f network moderation models  199–201, 200f social capital and workplace outcomes  570–572, 571f social data, challenges  519–521, 520f modularity  27, 313, 319–320 congressional roll call case study  317–319, 318f modularity maximization  312–313 C. elegans neural network  319–322, 321f human brain network architecture, different states  323–324 modularity optimization  319–321, 324, 521 Mohr, J. W.  397–398, 416, 424, 436 Mol, A. A. A.  460 Monro, S.  239 Montgomery, M. R.  485 Moody, J.  27, 86, 126, 318–319, 436, 469, 471, 490, 523 Moore, C.  523 More, T.  438 Moreno, J.  352, 353, 369, 616, 627 Morgan, D. L.  254 Morris, M.  9, 28, 458 mothering roles  475 Mounier, L.  53 Mousavi, R.  264 MPNet  238, 239–240, 244, 249

Mucha, P. J.  312, 318–319 Mueller, A. S.  297 Muller, C.  297, 299 multidimensional scaling (MDS) technique  355, 356, 357f, 362 multilayer networks  10, 19, 229, 319, 322–323, 323f multilevel models auto-logistic actor attribute models  237–239, 238f group level centrality  204, 205 network analysis and duality 403–404 dyadic and nodal-level analysis  203–206 multilevel networks  229 analysis, complete network designs  403 exponential random graph models  235 structures, social processes  58–62, 63 multilevel synchronization  61 multiple groups, dyadic and nodal-level analysis 203–206 multiple project memberships and advice seeking in organizations  240–243, 241t multiplexity 10 definition 19 network, exponential random graph models 235 network position  20–21, 20f multiplier effects  553 multislice graph  360, 361f, 362f Gantt diagrams  360, 362f Newcomb fraternity data  360, 362f multivariate networks  229 multivariate regression quadratic assignment procedure (MRQAP)  192, 583, 592 Munshi, K.  486 mutual consent  538 mutually constitutive  394 Mützel, S.  399, 405 Myaux, J.  486 Myerson, R.  551 Myrdal, G.  483 Myrskylä, M.  482 naïve learning  547 named alters, ego network data  171–173, 171f, 174t, 175–177, 182–183

name generators  124 definition 123 “important matters” question  29, 108, 123, 182 relationships, choosing  123–124, 125 name interpreters, social ties identification  124–126 nonrandom missingness  101 Narayanan, A.  129 narrative networks  436–437 Nash, R.  625 Nash equilibrium  538, 542 National Longitudinal Study of Adolescent to Adult Health (Add Health) dataset  2, 26 National Network for Safe Communities  626 natural experiment  138–139, 139f natural language processing  83, 417–420, 428, 517, 521, 523, 535 nature, social networks  39 Nature SciGraph, Springer  603, 604f Nava, F.  551 Neal, J. W.  128 neighborhood preserving centrality measures 335 neo-structural institutionalism  57–58 neo-structural sociology  50–64 advice networks  55–56, 58 bureaucracy and collegiality  52–53 fundamentals 50–51 individual and collective capacities  50–51 longitudinal and multilevel structures, for social processes  58–62, 63 neo-structural institutionalism  57–58 relational infrastructures  53–54 social processes, as social capital of collective  52, 54–56, 63 nested dualities  395 netdoms  41, 398 network(s) autoregression models  196 community detection, case studies  311–327 (see also case studies, network community detection) consequences group level  192f, 197 nodal level  192f, 195–196 culture fusion with  398–399

definition  255, 445 designs complete 121 ego network  122, 125, 129 dynamics  254, 255 exchange theories  72, 77, 78, 85 first law of  368 formation, economics  537–540, 539f inference problem  524 recall, social exclusion  147–149, 147f sampling based on  130 (See also specific types) science 8 social (see specific types) social dimension  3 text analysis (see semantic network analysis) theory 34 Network Analysis in Archaeology (Knappett) 449 network change processes  256–257 dyadic effects  256–257 endogenous structure  257 nodal effects  256 network coevolution model  201–203, 202f network dynamics modeling  254–282. See also stochastic actor-oriented models (SAOMs) approaches 258 conceptualization 255–257 change processes  256–257 change processes, dyadic effects  256–257 change processes, endogenous structure 257 change processes, nodal effects  256 fundamentals 255–256 Eurasian red deer empirical example  267–279 (see also Eurasian red deer, dominance relations modeling) exponential random graph framework  263–265 fundamentals 257–258 history 254 longitudinal exponential family random graph models  264–266 model selection  265–266 outstanding issues and future directions  279–282

purpose 254 relational event framework  258–261, 259t separable temporal exponential family random graph models  264–266, 272, 274t, 276, 277, 280 stochastic and estimable models  257 temporal exponential family random graph models  86, 259t, 263–266, 271–272, 273–277, 274t network dynamic temporal visualization (NDTV) R package  360 network mediation models  198, 199f network moderation models  199–201, 200f network scale-up estimator  155–157 network scale-up method  30, 153–167 aggregated relational data  153–154 core insight  154 group size, estimating  154 information used  154 maximum likelihood estimator  155–156, 158, 161–162 methodology 155–164 Bayesian approach  161–162 degree, estimating  157–161, 159f, 161f generalized network scale-up  162–164 network scale-up estimator  155–157 respondent-driven sampling  153 survey design  164–166 “know,” defining  165–166 scaled-down condition  166 stigma, subgroups  153, 164 network ties, conceptualizations  72–76, 73f. See also tie(s), conceptualizations neural encoding. See also neuroscience indirect social relationships  503–506 importance, everyday thought and behavior 503 position characteristics  504–505 social status, distinct but analogous facets 505–506 neural network Caenorhabditis elegans  319–322, 321f human brain at different states  322–324, 323f neuroscience 496–510 brain and social behavior, study of  499–500

brain encoding, social relationships  500–503 friends vs. strangers  500–501 social closeness  501–503, 502f brain processing capacity, on social network size  507–508 brain shaping and constraining, of social networks 506–507 brain size, on social network size  496–498, 497f disciplines 496 emerging field  498–499 indirect social relationships, neural encoding 503–506 importance, everyday thought and behavior 503 social position characteristics  504–505 social status, distinct but analogous facets 505–506 social brain hypothesis  496–498, 497f social networks shaping brain  508–510 Newcomb fraternity data  360, 362f Newman, M. E.  338, 401–402, 523 Newman-Modularity  520, 520f “new” science, networks  393, 401–402 affiliation networks  400–401 international trade network  586 new technologies, adoption  549–550 Nguyen, H.  622 niche coordination and collective agency  54 neo-structural sociology  53–54 social 53–54 NIH CTSA Expertise Explorer  602–603, 607, 608f NIH Twitter Data (Activity) Explorer  603, 608–609, 609f nodal level  189, 190, 190t network coevolution model  202f, 203 network consequences  192f, 195–196, 208 network mediation models  198, 199f network moderation models  200f, 201 node central  23 (see also centrality) definition 17 degree distribution  586–587, 591 one-mode network  18, 18f two-mode networks  18–19, 59

node-level analysis antecedents and consequences  188–209 change processes  256 multiple groups  203–206 nominalist approach, boundary specification 122 nonplanar graph  375 nonrandom mixing  156 norms 289 Notestein, F. W.  482 Northway, M. L.  353 NSF graph tool DIA2  602, 603–604, 605f–607f Institutional Explorer  603, 606f NSF Org Structure  603, 605f People Explorer  603, 606f Thesaurus Concepts  603, 605f Topic Explorer  604, 606f null 72 null ties  72 computational social science theories  76–77 observational studies, network influence  290–294, 292f, 293f obsidian sources, Maya network  457, 458f Obukhova, E.  575 O’Connor, B.  83 O’Connor, K. M.  147–149 Offer, S.  473 Ó Gráda, C.  545 oligarchy, collegial  60–61 Olson, A. G.  435 one-mode matrix, comembership  392–393 one-mode network  18, 18f exponential random graph models  235 online experiments field 527–528 on interactions  525–527, 526f online field experiments (OFEs)  527–528 Onnela, J.-P.  312 opportunity hoarding 61 social ties  73–74, 73f structures  51, 61 structures, extension  58 ordinary least squares (OLS) causal inference observational studies  292–294, 292f, 293f simulation example, of identification  294–296, 294f

multivariate regression quadratic assignment procedure with  592 regression 27 organizations. See also specific types criminal networks  624–625 networks 435–436 society, interdependencies  52–53 Osmani, V.  88 Östborn, P.  456, 456f, 459 Oubenal, M.  56 p*  27, 223–228. See also exponential family of random graphs-p* p1 distribution  219 Pachucki, M. A.  400 Padgett, J.  41, 42, 43, 403, 438 Padilla-Walker, L. M.  473 pairwise stability  538 Panconesi, A.  524, 526–527 Papachristos, A. V.  438, 616–617, 621t, 622–626 Par, P. S.  88 paradoxical market  55 parent-child interactions  470, 471, 473–474, 476 Parigi, P.  434, 437 Park, C.  591, 592 Parkinson, C.  499, 501–502, 502f, 503–504 partial network design  26, 26f, 122, 125, 130 participatory processes  62. See also specific types particularistic solidarity  55 path 336 path dependence  589 Patacchini, E.  553 patterns. See also specific types horizontal  38, 38t moral order  397 relations 20–21 speech pattern interviews  419 structural, static networks  86 Pattison, P. E.  10, 221, 229, 235, 239, 249, 263. See also exponential random graph models (ERGMs) p* distributions  219–221. See also statistical models, network Pearson, J.  299 Peeples, M.  449 peer-effect settings, economics  541, 542–544, 543f, 552, 553

Peirce, Charles Sanders  39 Pellizzari, M.  569 Penalva-Icher, E.  57 Pentland, A.  88 Penuel, W. R.  292 perception bias  291 perceptual networks  128 Peregrine, P.  455 performance individual, social capital  572 innovation outcomes  571f, 572 Perrin, A. J.  370–371 personality 194 Pfaff, S.  436 Pfeffer, J.  193 physician prescribing practices  119 Piehl, A. M.  626 Piña-Stranger, Á.  56 places, macro-level networks of  376–379, 376t, 377f planar graph  355, 375 Plasmodium falciparum genome, probabilistic network model  325–327, 326f pnet 228 PN measure  335, 340t point 17. See also node politics, homophily  589 popularity 99–102 already popular on  6 definition 100 degree  23, 257 ego on  106 Facebook effect  105 in-degree popularity effect  273, 274t Matthew effect  145 sociometric, tracking  504–505 status measurement  105, 108 stochastic actor-oriented models  273 tournament  104, 106, 107 populations demography 481 hidden and rare  480, 481, 488–489 Porter, M. A.  255, 312, 319 Porter, S.  292 position brain 499 brokerage 572–573 dual positioning  403 family exchanges  20–21, 20f

interdependencies 50 multiplexity  20–21, 20f neural encoding  504–505 power 71 structural equivalent  42 positional approaches  20–21, 20f Powell, W. W.  403 power  195, 573–574 computational social science  72 network positions  71 structural 76 Power, J. D.  323f, 324 power-law-shaped degree distribution of nodes  586–587 pragmatism  38–40, 44, 44t precarious value  57 Preciado, P.  371 preferential attachment (hypothesis)  257, 384–389 applications 384 cumulative advantage  384 ebay.com  386–389, 386t, 388f eopinions.com  385, 386t evidencing 385 history 384 kickstarter.com  385, 386t Matthew effect  384 observational studies  385 positive feedback  384–385 proposed experimental design, applications  384–385, 386t wikipedia.org  386, 386t presentation, results, ethical considerations  130–131 Price, D. D.  384 primal approach  375–376, 375f primary group  26, 26f. See also community privacy, computational social science  527–528 probabilistic network model latent Dirichlet allocation topic modeling algorithm 404 malaria genes  325–327, 326f processualized relationships  40 processual relationships  45 project memberships, multiple  240–243, 241t prominence, visibility and  102 propensity score methods  295–296 propinquity-based dyadic effects  257 proximal point analysis  448–451

proximity  23, 565 closeness, social  353, 565 neural representation  501, 502, 504 computational social science  81, 84, 85, 88 connectivity 565 dyadic effects, modeling  269 Eurasian red deer dominance modeling  269 interaction partner selection  297 micro-level networks  370 trade ties, bilateral  588–589 transitive relations  448 pseudo-likelihood estimation, p* models  227 Putler, D. S.  385 Putnam, R.  38, 38t qualitative methods  40 quantitative ethology  86 quasi-experiment  138–139, 139f racial capital  40 racial field  40 racial minorities, social capital access deficit  565 workplace  565, 566, 568–569, 571 random graph Markov  221, 226 models, exponential family (see exponential random graph models (ERGMs)) random graph distribution Erdös-Rényi 221 exponential family  5, 219, 221 (see also specific types) randomized experiments, network influence 290 random mixing  166 Random Walk  520, 520f Rank, O. N.  240 rare populations  480, 481, 488 rational choice  35, 537–538 rationality, social  51 rational (Bayesian) learning  547–548 Reagans, R. E.  196, 288, 573 realist approach  182 boundary specification  122 recall bias 300 error, aggregated relational data  160–161, 161f social exclusion  147–149, 147f

reciprocal closeness  337 reciprocity (reciprocation)  18, 24, 74, 86–87, 235, 257 empirical examples  240, 241t Eurasian red deer dominance modeling  269–270 family relationships  470 red deer, Eurasian  267–279, 267f, 268t. See also Eurasian red deer, dominance relations modeling “redundant” connections, strong  23, 177 reflection problem  291–292, 552–553 reinforcement, social, meme spread  316 Reingold, E.  356 relational ethnography  40, 44 relational event, defined  259–260 relational event model (REM)  86–87, 258–261, 259t applications  120, 193 Eurasian red deer dominance relations  272–273, 274t, 276f, 277 interactions 120 relational expectations  43 relational history, defined  260 relational infrastructures  53–54 relational sociology  40–44, 44t, 45, 398 extensions 42–44 social networks and meaning  41–42 relational structure  400 among attitudes, homophily  400 relational systems. See also specific types formal 10 relations. See also specific types flows 255–256 interactions 255–256 similarities 255 “social relations,”  255 types 255–256 relationships. See also family relationships dynamic 45 interdependencies, as indicators  51 neo-structural sociology  51 processual 45 processualized 40 social 120 relevent (R package)  260, 261 Renou, L.  551 Renyi, A.  237

repercussions 35 research collaboration network map  598, 599f descriptive network  190–191 explanatory network  191 resources, individual  37, 38, 38t respondent-driven sampling (RDS)  30, 130, 153, 163, 172, 357, 358f ego network data  172, 174t ego network data, improving estimation  178–179 visualization, network  357, 358f returns deficit, social capital  568–569 rich-get-richer processes  384 Riegle-Crumb, C.  299 Rihll, T. E.  448, 454, 455 risk-taking tendency  292f, 293 Rivera, M. T.  192 Rivero-Fuentes, E.  487 Rivers, R.  450 roads 455 city 369 networks, Syria  453 spatial dimensions  372–376, 375f Robbins, H.  239 Roberts, P. W.  55 Robins, G. L.  194, 220–222, 227–229, 235, 239, 249, 263 robustness, quantifying causal inferences informing debate and  296–297 from selection models  298 Rogers, E. M.  484–485, 486 role equivalence  590–591 role relations, computational social science  71, 82 data, for social interaction, access, and sentiments theories  79–80 socially constructed, social ties as  73f, 75–76 Romney, A. K.  267 Rosero-Bixby, L.  485 Roth, C.  402 RSiena (R package)  228, 236, 248, 259t, 262 Rubin, D. B.  296 Rubineau, B.  575 Ruffini, G.  449 Rule, A.  404, 418 ruling in/out causal mechanisms  138

Sabidussi, G.  335, 337 Sacerdote, B.  290 Sade, D. S.  337 Sageman, M.  623 Saha, K.  523 Salancik, G. R.  193 Salganik, M. J.  145, 154, 156, 162–164, 385 Sallet, J.  508 sampling 29–30 network-based 130 network scale-up method  30 respondent-driven  30, 130, 163, 172, 357, 358f sampling measurement, design strategies “boundary specification” problem  121–123, 125 name generators  123–124, 125 name interpreters  124–126 Sandberg, J.  486 Saronic Gulf area, locally transitive networks  453, 454f Sasidharan, S.  206 scaled-down condition  166 scale-free distributions, connectivity  384–385 scale-up estimators  167 generalized 162–164 network 155–157 standard (traditional)  163–164 Schiller, K.  299 Schoch, D.  335 Schortman, E. M.  460 science, technology, and education maps 598–613. See also maps, science, technology, and education scope conditions  142 Scott, J.  220 second-order free-rider problem  56 sectorial networks  436 segregation, social network  565 selection models. See also specific types estimation 297–298 robustness of inference quantification from 298 Selfhout, M.  230 self-report bias  291 semantic network analysis  404–405

semantic network analysis, for meaning structure 418–423 applications 418 fundamentals 418–419 results  420–423, 421f–422f, 423t techniques 418–419 text networks, construction  419–420 Senate voting, heat maps  364, 364f, 365f Senegalese village community, common resource management and information exchange  243–246, 245t sensitivity analysis  5, 296–297, 300 sentiments, computational social science  71, 82–83 interpersonal, social ties as  73f, 75 sentiments theories, data aggregate social interaction  80–81 role relation  79–80 separable temporal exponential family random graph models (STERGM)  264–266, 272 Eurasian red deer dominance relations 274t, 276, 277, 280 sexual capital  40 Shai, S.  364 Shalizi, C. R.  288, 289, 296 Shapiro, K.  301 sharing economy  526 Sharkey, A. J.  102 Sharma, A.  522 Shelley, G.  155, 158 Shmatikov, V.  129 Short, J. F.  626 shortest-cost-path algorithms  374 shortest paths  73, 349, 374. See also geodesics flow outcomes perspective  346, 347t nodal level  195 structural power  79 walk structure perspective  336–339, 340t shortest-time algorithms  374 Shultz, S.  498 sibling roles  475–476 Siena models, Snijders  59 SIENA software  36, 204. See also actor-oriented models agent-based modeling  36, 44, 44t causal inference estimation  295–296, 299 longitudinal network analysis  59

similarities 255 similar network relations  12 Simmel, G.  3, 4, 392, 395 Simon, H. A.  385, 517–518 simultaneous influence  291 Sindbæk, S. M.  456–457 singular value decomposition (SVD) technique  355, 356 six degrees of separation  518 size, kin  468 size, social network brain processing capacity on  507–508 brain size on  496–498, 497f Skvoretz, J.  591 slices 319 small world  20, 586 small-world, scale-free, modular networks 379 Smith, C. M.  438, 621t, 623, 624–625 Smith, E.  108 Smith, J. A.  101, 106, 170, 173–175, 177–180 Smith, N. A.  83 Smith, S. S.  566 Smyth, P.  261 Snijders, T. A. B.  59, 101, 226, 229, 237–239, 249, 258, 260, 261, 263, 265, 266, 296. See also stochastic actor-oriented models (SAOMs) Snyder, D.  585 social aesthetics  40 social balance  24–25 social boundaries, measuring, from ego network data  177–178 social brain hypothesis  496–498, 497f social capital  37–38, 38t, 44, 194. See also specific types bonding 468–469 bridging 469 collective, social processes as  52, 54–56, 63 definition 563 density  27, 197 “invisible hand,”  566 trust and, criminal networks  624 women and minority groups, access deficit 565 social capital, economic sociology  563–575 future directions  574–575 interpersonal relationships  563

labor market  563–569 job-matching outcomes  567–569 job-matching processes  564–566, 564f, 567f returns deficit  568–569 social capital, defined  563 tipping point  574 workplace outcomes  569–574 antecedents  570–572, 571f fundamentals and history  569–570 individual performance and innovation outcomes 571f, 572 power and influence  573–574 trust and collective outcomes  572–573 social categories  42 social circuit dependence  226 social contagion  196, 315–316 social control  55, 56, 58, 63 social control theory  617 social dynamics  57, 58–59 social exclusion, network recall and  147–149, 147f social fields  40 social influence network theory  72 social information processing approach  193 social interaction data aggregate, access/social sentiments theories 80–81 role relation, theories  79–80 sociality, human  498 social learning  546–548 social mass  370 social memes, virality prediction  315–317 social network  46. See also specific topics anticategorical imperative  484 applications 445 basics and theories  4–5 close personal  1 conclusions and concerns  8 definition  98, 137, 222 dimensions 6–7 domain 41 ego 5 future directions  8–12 landscape 7–8 meaning 41–42 methods 5 organizations 1

physical infrastructure  445 (see also  archaeology) small local  1 social network analysis (SNA)  445. See also specific types archaeology  445–460 (see also  archaeology) dyads and triads  446–448, 446f, 447f general approaches  19–21, 20f (See also specific types) history and articles  1, 2f interdisciplinarity  1, 3–4, 3f method, theory  1–2 origins 369 theoretical and methodological streams  4 “theory models,”  450–455, 452f–455f social network analysis (SNA) modeling strategies international trade network  588–592 homophily 588–590 systemic equivalence (gravity model)  583, 590–591 topological properties  591–592 topological structure and evolution  583 world system classification  583, 585–586, 590–591 social network models. See models, social network social niche  53–54, 60 first-level 61 intermediary-level 60 social pressure  487 social processes. See also specific types as social capital of collective  52, 54–56, 63 social rationality  51, 52 social reinforcement, meme spread  316 social relationships  120, 255. See also family relationships; kin; specific types indirect, thought and behavior  503 models 189 social sentiments theories, aggregate social interaction data for  80–81 social spheres  1 social status. See status, social social structures. See also specific types cognitive 128 individual action and  35–36, 35f

social support kin  467–477 (see also family relationships) measuring, ego network data  176 social ties. See under tie(s) sociogram 365 abstractions, node removal  361–362, 362f beyond 360–364 chord diagrams  358 classic 354 computer-generated, earliest  353 contour shading  362–364, 363f Gantt diagrams  360, 362f heat maps  364, 364f, 365f improving, strategies  354–360 informational embellishments  358–359, 359f layouts distance-based  355–358, 357f, 359 fixed-coordinate 357–358 maps and circular  357–358 Moreno’s  352, 353 multislice graph  360, 361f, 362f Newcomb fraternity data  360, 362f node ordering, Northway’s  353 scale challenge  365 small-town elite networks  353 time-space representation  360, 361f, 362f sociological exchange theory  73–74, 550–552 sociological stratigraphy  60 sociometric popularity, tracking  504–505 sociometry  369, 616 Soderstrom, S. B.  192 solidarity bounded 55 particularistic 55 Soltis, S. M.  201 Soundarajan, S.  517, 520 space definition 380 syntax, theory  371 Sparrow, M. K.  626 Sparrowe, R. T.  197 spatial dimensions  368–381 agent-based models  371–372 equation-based models  371 exponential random graph models  372 first law of geography  368 first law of networks  368 frontiers 380–381

latent space  380 macro-level networks, places  376–379, 376t, 377f meso-level networks, things  369–372 micro-level networks, people  369–372 origins 369–370 space syntax, theory  371 spatial infrastructure networks, theory  372–373, 373f tertiary (T-) communities  371 topographical space  368–369 topological space  368–369 spatial infrastructure networks, theory of 372–373 spatially embedded infrastructure networks  372–376, 373f, 375f Spinoza, B.  395 spring embedder techniques  355–356, 356f Springer Nature SciGraph  603, 604f stability 77 stable treatment unit value assumption (SUTVA) 290 Stadtfeld, C.  88, 265, 266 star network  539, 539f statistical inference  313 statistical models, network  28, 219–230. See also specific types bipartite networks  228 construction, steps  223 data collection, complete network studies 220 development, generations  219 exponential family of random graphs-p* 223–228 Bernoulli graph  224 complete subgraphs  223 degeneracy problem  226 dependence graphs  222–225 Hammersley-Clifford theorem  222, 224 Markov chain Monte Carlo maximum likelihood estimation  227 Markov dependence  224–226 Markov random graph  221, 226 parameters 225–227 simulation, estimation, and goodness of fit  227–228 statistical theory  223–225 history 220–222

longitudinal models  229–230 longitudinal networks, structure evolution/ structure and attributes coevolution 230 multilevel networks  229 multivariate networks  229 notation 222–223 p1 distribution  219 p* distributions  219–221 uniform and conditional uniform distributions 219 statnet  228, 239, 249 status, network conceptualization 99 definition 98–99 production and maintenance  104–107 asymmetry and evolution  106–107 Facebook effect  104–105 popularity tournament  104, 106, 107 status diffusion  105–106 topological implications  107 status, network, ascertaining inequality  99–104 agonism 103–104 esteem and choice  99–102 visibility and prominence  102 status, social ascertaining 99–104 diffusion 105–106 hierarchies emergence 105 persistence and instability  105 inequality 99 neo-structural sociology  54 neural encoding, distinct but analogous facets 505–506 unequal environments  107–108 visibility and prominence  102 Steele, M.  353 Steglich, C. E. G.  279, 296 Steinley, D.  220 stepparent roles entwined lives  476 friendly 476 limited interaction  476 Stewart, B. M.  83 stigma, network scale-up method  153, 164 Stivala, A. D.  249

stochastic actor-oriented models (SAOMs)  5, 9, 11, 86, 261–262 applications  120, 193, 203, 236, 256 causal inference  295–296 criminal networks research  619t Eurasian red deer dominance relations  269–270, 271, 273–276, 274t, 277, 279 longitudinal models  229–230 Markov stationarity assumption  281 network dynamics  256, 257–258, 259t, 282 node-level outcomes and network ties coevolution 236 relationships 120 REM with  87 timestamped event data–panel data integration 86 use, as coevolutionary model  280 stochastic blockmodel (SBM) approach  313 degree-corrected  325, 326f, 327 judicial institutions  58 online communication  261 relational event models with  87 stochastic models  257 stochastic process theory methods  254 StOCNET 228 Stovel, K.  436–437 Strassman-Muller, A.  299 strategic complementarities  542–544, 543f strategic orientations  194 strategic substitutes  542 stratigraphy, sociological  60 Strauss, D.  221, 224–225, 234, 237 strength-of-weak-ties hypotheses  177 Strodtbeck, F. L.  626 Strogatz, S. H.  372, 401 strong “redundant” connections  23, 177 strong ties  23, 38 structural balance theory  72, 77, 78, 79–81 triadic 75 structural causes  522 structural communities  312 structural equivalence  51, 590, 591 structural holes  24, 624 actor-oriented perspective  281 archaeology 447–448 Burt’s theory  539, 572, 624 definition 447

economics 539 measures  24, 447–448, 539, 574, 624 social capital  37–38, 38t economic sociology  571f, 572, 574 individual performance outcomes and innovation 571f, 572 social network analysis  190, 196, 198 structuralism 415–417 deterministic  50 (see also neo-structural institutionalism) structural mechanism, duality  393 structural power theories  76, 78, 79–80, 85 structure analysis, affiliation networks actor-network theory  402 affiliation networks  401 “heterogeneous networks,”  402 “new” science  393, 401–402 structure analysis, duality  400–402 structures. See also specific types definition 50 diffusion 10 meaning 39 opportunity  51, 58, 61 social processes, longitudinal and multilevel  58–62, 63 subject-action-object (S-A-O) network construction 424 subjective meaning  39, 41, 45 success breeds success  384 Suitor, J. J.  254 switchings 43 symbolic interactionism  51 symbols, cultural  42–43, 45 synchronization 60–63 costs and gains  62–63 Syria road networks  453 settlements network 3rd mill. BC  453, 453f systemic equivalence international trade network  590–591 role equivalence  590–591 structural equivalence  590, 591 Tartaron, T. F.  453, 454f team, sharing individual accomplishments across  246–248, 247t

technology maps  598–613. See also maps, science, technology, and education telephone tree  22–23, 22f temporal exponential family random graph models (TERGMs)  86, 259t, 263–266, 271–277, 274t Eurasian red deer dominance relations  273–276, 274t, 278 temporality, computational social science  72, 77–78 Terrell, J.  448 tertiary (T-) communities  371 texts 11 network approaches  414–415 network approaches, semantic network analysis 404–405 theories  34–46, 44t. See also specific theories action theory  35–36, 35f, 44, 44t social capital  37–38, 38t, 44 social structures and individual action  35–36, 35f actors, network  45 Bourdieu’s field theory  46 data mismatch  78 definition 46 expectations 46 material objects or cultural symbols  42–43, 45 meaning 45 method 1–2 networks 34 network theory vs. theory of networks  34 pragmatism and interactionism  38–40, 44, 44t processual and dynamic  45 relational sociology  40–44, 44t, 45 extensions 42–44 social networks and meaning  41–42 theory of networks  34 theory-driven experiments, computer  526 theory-driven models  299–300. See also specific types theory models. See also specific types data models from  455–458, 456f, 458f spatial network analysis  450–455, 452f–455f Theory of Social and Economic Organization (Weber) 99

thick description  11–12 thin description  12 things entangled networks of humans  459–460 meso-level networks  372–376, 373f, 375f third-party effect  591, 593 third-party relationships  399, 501, 503, 506, 510 Thomas, A. C.  289 Thomas, W.  38, 38t Thrasher, F. M.  626 tie(s) 41. See also edges computational social science theories  76–77 cross-cutting 433–434 directed 18 dyadic  74, 75, 81, 87–88 horizontal  38, 38t informal social  434–435 interorganizational 436 ordering 23 reliability and validity, data collection  126 social relationships  120 strong  23, 38 structure  38, 38t unweighted 17–18 weak  23, 38, 38t, 447, 447f tie(s), conceptualizations  72–76, 73f comparing 76–78 temporality  72, 77–78 ties and null ties, treatment of  76–77 social ties access or opportunity  73–74, 73f (time-aggregated) behavioral interactions 73f, 74–75 interpersonal sentiments  73f, 75 socially constructed role relations  73f, 75–76 strength 84 tie(s), dissolution  280 longitudinal network modeling  280 similarity 255 STERGM  264, 272 TERGM 259t, 264 vs. tie formation  280 tie(s), formation longitudinal network modeling  280

similarity 255 STERGM  264, 272, 276 TERGM 259t, 264, 272, 276 vs. tie dissolution  280 Tilly, C.  41–42, 43, 568 Trust and Rule 435 time (temporality)  72, 77–78 time-space representation  360, 361f, 362f Tolsma, J.  204 top-down dynamics  62 topic modeling  44, 45, 404, 405, 428 topographical space  368–369. See also spatial dimensions topological space  368–369 topology properties, international trade network  586–587, 591–592 status 107 Torfason, M. T.  436 trace complexity  524 trade intermediated 551–552 international network  583–593 (see also international trade network) on networks  550–552 transactions  43, 45–46. See also specific types transitive closure, empirical examples  241t, 242 transitive triad  24, 25f, 270, 270f, 446, 446f transitivity archaeology  446–447, 446f, 453, 455, 458 empirical examples  240, 241t family relationships  469–470 transitivity effect, international trade network 593 transmission implicitly measured  194 model  193–194, 208 transparency  5, 300, 359, 361 Trapido, D.  433–434 triad  24, 25 archaeology  446–448, 446f, 447f balance theory  24 census  24, 25f intransitive  24, 25f transitive  24, 25f, 270, 270f, 446, 446f

triadic closure  74, 79–81, 85, 87–88, 257, 265 Eurasian red deer dominance relations modeling  270–271, 270f, 272, 277 overgeneralization, bias in  300–301 triadic theory of structural balance  75 triangles, clumps of  226, 447, 447f tripartite structural analysis  395 Tröster, C.  200 true experiment  138–139, 139f trust embeddedness and  624 homophily 522–523 social capital and  572–573 Trust and Rule (Tilly)  435 Tutte, W. T.  355 Tversky, A.  519 Twitter Data Explorer, NIH  603, 608–609, 609f two-by-two factorial design  141–143, 143f two-mode incidence matrix (affiliation matrix) 392–394. See also duality, beyond person and groups two-mode networks  18–19, 59, 228 exponential random graph models  235 two-step network  26, 26f type 1/2 networks  456, 456f undirected edges  18 uniform distributions  219 conditional 219 Erdös-Rényi 219 union set  127 United Nations Comtrade database  584 unweighted ties  17–18 Ur, J. A.  453, 453f user consent, computational social science  528 Uzzi, B.  192, 624 Vaisey, S.  416 VanderWeele, T. J.  290 van Duijn, M. A. J.  101, 238 Van Knippenberg, D.  200 var genes  325–327, 326f Verdery, A. M.  487 vertical linchpins  57, 60, 63 vertices. See node violence networks  626 virality prediction, social memes  315–317

visibility, prominence and  102 visualization, network  352–365 advanced approaches  360–364 abstracting, node removal  361–362, 362f contour shading  362–364, 363f heat maps, Senate voting similarity  364, 364f, 365f multislice graphs  360, 361f, 362f time-space representation  360, 361f, 362f advantages and disadvantages  352 computer-generated, earliest  353 factor analysis and computational tools  353 fundamentals 352–353 history 353 methodology progression  352 motivations 354 node ordering, Northway’s  353 scale challenge  365 small-town elite networks  353 software 364–365 strategies, better sociograms  354–360 chord diagrams  358 classic sociograms  354 computational effort  354–355 informational embellishments  358–359, 359f layouts, algorithm, heterogeneity  355, 355f layouts, distance-based  355–358, 357f, 359 layouts, fixed-coordinate  357–358 layouts, maps and circular  357–358 multidimensional scaling technique  355, 356, 357f, 362 principles 354 respondent-driven sampling  357, 358f singular value decomposition technique  355, 356 spring embedder techniques  355–356, 356f Volunteer Science Initiative  141 von der Lippe, H.  487 Walker, D.  290 walk structure perspective  334, 336–340, 340t, 347–348 Wang, C.  551 Wang, P.  235, 238, 239, 249 Wasserman, S.  219, 221, 222, 228. See also exponential random graph models (ERGMs)

Watkins, S. C.  485, 486 Watts, D. J.  86, 372, 385, 401, 402, 522, 525 Waugh, A. S.  317–319, 318f, 364 weak culture  51 weak ties  23–24, 38, 38t criminal networks  624 dyadic, archaeology  447–448, 447f strength  447, 447f Weber, M.  35 Theory of Social and Economic Organization 99 Welch, I.  546 well formedness  335 Wellman, B.  254, 288, 289, 370 Weng, L.  316–317 West, J. E.  290 West, R.  88 Western Electric Company, Hawthorne factory 570 Wheatley, T.  499, 501–502, 502f, 504 White, D. R.  27, 490 White, H. C.  10, 39, 40–44, 44t, 45, 51 Chains of Opportunity 51 Identity and Control  41, 398–399, 400 Whitman, M. M.  395 Widmer, E. D.  468, 469, 470, 471, 472 wikipedia.org  386, 386t Wilson, A. G.  448, 454, 455 Wimmer, A.  79, 82, 436 Winship, C.  475 within-subjects design  138 Wittek, R.  571–572 Wolinsky, A.  538–539 women, social capital access deficit  565 workplace  565, 568–569, 571 Wooldridge, J.  295 words categories from  404–405 embedding 428 vectors 428 workplace outcomes, social capital and  569–574 antecedents  570–572, 571f individual performance and innovation outcomes 571f, 572

power and influence  573–574 trust and collective outcomes  572–573 workplace outcomes  569–570 Works Progress Administration (WPA), Federal Writers’ Project approach basics  423–424 computational narrative analysis, for embedded meaning  424–427 approach basics  423–424 results  424–427, 425f–427f subject-action-object network construction 424 meaning structure, results  420–423, 421f–422f, 423t speech pattern interviews  419 text networks, constructing  419–420 world system theory, international trade network  583, 585–586, 590–591 Wu, G.  584, 586, 587 Wurpts, B.  436 Wyatt, D.  84, 88 Xie, W.  82 Xing, E. P.  263 Xu, H.  584, 586, 587 Xu, R.  296 Yasumoto, J. Y.  356 Yeung, K.-T.  399 Young, K. A.  566 Yue, L. Q.  438 Yule, G. U.  385 Zachary, W. W.  119, 314–315, 314f, 446–447 Zappa, P.  403 Zelditch, M.  140 Zenou, Y.  540, 541, 542–543, 553 Zerubavel, N.  504–505 Zhang, X.  88 Zhao, Y.  292 Zheng, T.  156, 158, 160–162 Zhou, M.  584, 586, 587, 588–590, 591, 592 Zhukov, Y.  436 Zijlstra, B. J. H.  101, 238 Zooniverse Project  141 Zuckerman, E. W.  196, 288