Multilevel Modeling 1544310307, 9781544310305

Multilevel Modeling is a concise, practical guide to building models for multilevel and longitudinal data. Author Dougla


English Pages [129] Year 2020


Table of contents :
Cover
CONTENTS
Praise for the Second Edition
About the Author
Series Editor's Introduction
Preface
Chapter 1. The Need for Multilevel Modeling
Background and Rationale
Theoretical Reasons for Multilevel Models
Statistical Reasons for Multilevel Models
Scope of This Book
Online Book Resources
Chapter 2. Planning a Multilevel Model
The Basic Two-Level Multilevel Model
The Importance of Random Effects
Classifying Multilevel Models
Chapter 3. Building a Multilevel Model
Introduction to Tobacco Voting Data Set
Assessing the Need for a Multilevel Model
Model-Building Strategies
Estimation
Level 2 Predictors and Cross-Level Interactions
Hypothesis Testing
Chapter 4. Assessing a Multilevel Model
Assessing Model Fit and Performance
Estimating Posterior Means
Centering
Power Analysis
Chapter 5. Extending the Basic Model
The Flexibility of the Mixed-Effects Model
Generalized Models
Three-Level Models
Cross-Classified Models
Chapter 6. Longitudinal Models
Longitudinal Data as Hierarchical: Time Nested Within Person
Intraindividual Change
Interindividual Change
Alternative Covariance Structures
Chapter 7. Guidance
Recommendations for Presenting Results
Useful Resources
References
Index


MULTILEVEL MODELING Second Edition

Quantitative Applications in the Social Sciences
A SAGE PUBLICATIONS SERIES

1. Analysis of Variance, 2nd Edition Iversen/Norpoth
2. Operations Research Methods Nagel/Neef
3. Causal Modeling, 2nd Edition Asher
4. Tests of Significance Henkel
5. Cohort Analysis, 2nd Edition Glenn
6. Canonical Analysis and Factor Comparison Levine
7. Analysis of Nominal Data, 2nd Edition Reynolds
8. Analysis of Ordinal Data Hildebrand/Laing/Rosenthal
9. Time Series Analysis, 2nd Edition Ostrom
10. Ecological Inference Langbein/Lichtman
11. Multidimensional Scaling Kruskal/Wish
12. Analysis of Covariance Wildt/Ahtola
13. Introduction to Factor Analysis Kim/Mueller
14. Factor Analysis Kim/Mueller
15. Multiple Indicators Sullivan/Feldman
16. Exploratory Data Analysis Hartwig/Dearing
17. Reliability and Validity Assessment Carmines/Zeller
18. Analyzing Panel Data Markus
19. Discriminant Analysis Klecka
20. Log-Linear Models Knoke/Burke
21. Interrupted Time Series Analysis McDowall/McCleary/Meidinger/Hay
22. Applied Regression, 2nd Edition Lewis-Beck/Lewis-Beck
23. Research Designs Spector
24. Unidimensional Scaling McIver/Carmines
25. Magnitude Scaling Lodge
26. Multiattribute Evaluation Edwards/Newman
27. Dynamic Modeling Huckfeldt/Kohfeld/Likens
28. Network Analysis Knoke/Kuklinski
29. Interpreting and Using Regression Achen
30. Test Item Bias Osterlind
31. Mobility Tables Hout
32. Measures of Association Liebetrau
33. Confirmatory Factor Analysis Long
34. Covariance Structure Models Long
35. Introduction to Survey Sampling Kalton
36. Achievement Testing Bejar
37. Nonrecursive Causal Models Berry
38. Matrix Algebra Namboodiri
39. Introduction to Applied Demography Rives/Serow
40. Microcomputer Methods for Social Scientists, 2nd Edition Schrodt
41. Game Theory Zagare
42. Using Published Data Jacob
43. Bayesian Statistical Inference Iversen
44. Cluster Analysis Aldenderfer/Blashfield
45. Linear Probability, Logit, and Probit Models Aldrich/Nelson
46. Event History and Survival Analysis, 2nd Edition Allison
47. Canonical Correlation Analysis Thompson
48. Models for Innovation Diffusion Mahajan/Peterson
49. Basic Content Analysis, 2nd Edition Weber
50. Multiple Regression in Practice Berry/Feldman
51. Stochastic Parameter Regression Models Newbold/Bos
52. Using Microcomputers in Research Madron/Tate/Brookshire
53. Secondary Analysis of Survey Data Kiecolt/Nathan
54. Multivariate Analysis of Variance Bray/Maxwell
55. The Logic of Causal Order Davis
56. Introduction to Linear Goal Programming Ignizio
57. Understanding Regression Analysis, 2nd Edition Schroeder/Sjoquist/Stephan
58. Randomized Response and Related Methods, 2nd Edition Fox/Tracy
59. Meta-Analysis Wolf
60. Linear Programming Feiring
61. Multiple Comparisons Klockars/Sax
62. Information Theory Krippendorff
63. Survey Questions Converse/Presser
64. Latent Class Analysis McCutcheon
65. Three-Way Scaling and Clustering Arabie/Carroll/DeSarbo
66. Q Methodology, 2nd Edition McKeown/Thomas
67. Analyzing Decision Making Louviere
68. Rasch Models for Measurement Andrich
69. Principal Components Analysis Dunteman
70. Pooled Time Series Analysis Sayrs
71. Analyzing Complex Survey Data, 2nd Edition Lee/Forthofer
72. Interaction Effects in Multiple Regression, 2nd Edition Jaccard/Turrisi
73. Understanding Significance Testing Mohr
74. Experimental Design and Analysis Brown/Melamed
75. Metric Scaling Weller/Romney
76. Longitudinal Research, 2nd Edition Menard
77. Expert Systems Benfer/Brent/Furbee
78. Data Theory and Dimensional Analysis Jacoby
79. Regression Diagnostics, 2nd Edition Fox
80. Computer-Assisted Interviewing Saris
81. Contextual Analysis Iversen
82. Summated Rating Scale Construction Spector
83. Central Tendency and Variability Weisberg
84. ANOVA: Repeated Measures Girden
85. Processing Data Bourque/Clark
86. Logit Modeling DeMaris
87. Analytic Mapping and Geographic Databases Garson/Biggs
88. Working With Archival Data Elder/Pavalko/Clipp
89. Multiple Comparison Procedures Toothaker
90. Nonparametric Statistics Gibbons
91. Nonparametric Measures of Association Gibbons
92. Understanding Regression Assumptions Berry
93. Regression With Dummy Variables Hardy
94. Loglinear Models With Latent Variables Hagenaars
95. Bootstrapping Mooney/Duval
96. Maximum Likelihood Estimation Eliason
97. Ordinal Log-Linear Models Ishii-Kuntz
98. Random Factors in ANOVA Jackson/Brashers
99. Univariate Tests for Time Series Models Cromwell/Labys/Terraza
100. Multivariate Tests for Time Series Models Cromwell/Hannan/Labys/Terraza
101. Interpreting Probability Models: Logit, Probit, and Other Generalized Linear Models Liao
102. Typologies and Taxonomies Bailey
103. Data Analysis: An Introduction Lewis-Beck
104. Multiple Attribute Decision Making Yoon/Hwang
105. Causal Analysis With Panel Data Finkel
106. Applied Logistic Regression Analysis, 2nd Edition Menard
107. Chaos and Catastrophe Theories Brown
108. Basic Math for Social Scientists: Concepts Hagle
109. Basic Math for Social Scientists: Problems and Solutions Hagle
110. Calculus Iversen
111. Regression Models: Censored, Sample Selected, or Truncated Data Breen
112. Tree Models of Similarity and Association Corter
113. Computational Modeling Taber/Timpone
114. LISREL Approaches to Interaction Effects in Multiple Regression Jaccard/Wan
115. Analyzing Repeated Surveys Firebaugh
116. Monte Carlo Simulation Mooney
117. Statistical Graphics for Univariate and Bivariate Data Jacoby
118. Interaction Effects in Factorial Analysis of Variance Jaccard
119. Odds Ratios in the Analysis of Contingency Tables Rudas
120. Statistical Graphics for Visualizing Multivariate Data Jacoby
121. Applied Correspondence Analysis Clausen
122. Game Theory Topics Fink/Gates/Humes
123. Social Choice: Theory and Research Johnson
124. Neural Networks Abdi/Valentin/Edelman
125. Relating Statistics and Experimental Design: An Introduction Levin
126. Latent Class Scaling Analysis Dayton
127. Sorting Data: Collection and Analysis Coxon
128. Analyzing Documentary Accounts Hodson
129. Effect Size for ANOVA Designs Cortina/Nouri
130. Nonparametric Simple Regression: Smoothing Scatterplots Fox
131. Multiple and Generalized Nonparametric Regression Fox
132. Logistic Regression: A Primer Pampel
133. Translating Questionnaires and Other Research Instruments: Problems and Solutions Behling/Law
134. Generalized Linear Models: A Unified Approach, 2nd Edition Gill/Torres
135. Interaction Effects in Logistic Regression Jaccard
136. Missing Data Allison
137. Spline Regression Models Marsh/Cormier
138. Logit and Probit: Ordered and Multinomial Models Borooah
139. Correlation: Parametric and Nonparametric Measures Chen/Popovich
140. Confidence Intervals Smithson
141. Internet Data Collection Best/Krueger
142. Probability Theory Rudas
143. Multilevel Modeling, 2nd Edition Luke
144. Polytomous Item Response Theory Models Ostini/Nering
145. An Introduction to Generalized Linear Models Dunteman/Ho
146. Logistic Regression Models for Ordinal Response Variables O’Connell
147. Fuzzy Set Theory: Applications in the Social Sciences Smithson/Verkuilen
148. Multiple Time Series Models Brandt/Williams
149. Quantile Regression Hao/Naiman
150. Differential Equations: A Modeling Approach Brown
151. Graph Algebra: Mathematical Modeling With a Systems Approach Brown
152. Modern Methods for Robust Regression Andersen
153. Agent-Based Models, 2nd Edition Gilbert
154. Social Network Analysis, 3rd Edition Knoke/Yang
155. Spatial Regression Models, 2nd Edition Ward/Gleditsch
156. Mediation Analysis Iacobucci
157. Latent Growth Curve Modeling Preacher/Wichman/MacCallum/Briggs
158. Introduction to the Comparative Method With Boolean Algebra Caramani
159. A Mathematical Primer for Social Statistics Fox
160. Fixed Effects Regression Models Allison
161. Differential Item Functioning, 2nd Edition Osterlind/Everson
162. Quantitative Narrative Analysis Franzosi
163. Multiple Correspondence Analysis LeRoux/Rouanet
164. Association Models Wong
165. Fractal Analysis Brown/Liebovitch
166. Assessing Inequality Hao/Naiman
167. Graphical Models and the Multigraph Representation for Categorical Data Khamis
168. Nonrecursive Models Paxton/Hipp/Marquart-Pyatt
169. Ordinal Item Response Theory Van Schuur
170. Multivariate General Linear Models Haase
171. Methods of Randomization in Experimental Design Alferes
172. Heteroskedasticity in Regression Kaufman
173. An Introduction to Exponential Random Graph Modeling Harris
174. Introduction to Time Series Analysis Pickup
175. Factorial Survey Experiments Auspurg/Hinz
176. Introduction to Power Analysis: Two-Group Studies Hedberg
177. Linear Regression: A Mathematical Introduction Gujarati
178. Propensity Score Methods and Applications Bai/Clark
179. Multilevel Structural Equation Modeling Silva/Bosancianu/Littvay
180. Gathering Social Network Data Adams
181. Generalized Linear Models for Bounded and Limited Quantitative Variables Smithson/Shou
182. Exploratory Factor Analysis Finch
183. Multidimensional Item Response Theory Bonifay
184. Argument-Based Validation in Testing and Assessment Chapelle

Sara Miller McCune founded SAGE Publishing in 1965 to support the dissemination of usable knowledge and educate a global community. SAGE publishes more than 1000 journals and over 800 new books each year, spanning a wide range of subject areas. Our growing selection of library products includes archives, data, case studies and video. SAGE remains majority owned by our founder and after her lifetime will become owned by a charitable trust that secures the company’s continued independence. Los Angeles | London | New Delhi | Singapore | Washington DC | Melbourne

MULTILEVEL MODELING Second Edition

Douglas A. Luke Washington University in St. Louis

FOR INFORMATION:

SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: [email protected]

SAGE Publications Ltd.
1 Oliver’s Yard
55 City Road
London EC1Y 1SP
United Kingdom

SAGE Publications India Pvt. Ltd.
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India

SAGE Publications Asia-Pacific Pte. Ltd.
3 Church Street
#10-04 Samsung Hub
Singapore 049483

Copyright © 2020 by SAGE Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Printed in the United States of America

ISBN: 978-1-5443-1030-5

This book is printed on acid-free paper.

Acquisitions Editor: Helen Salmon
Editorial Assistant: Megan O’Hefferman
Production Editor: Karen Wiley
Copy Editor: QuADs
Typesetter: Integra
Proofreader: Jen Grubba
Indexer: Naomi Linzer
Cover Designer: Candice Harman
Marketing Manager: Shari Countrymen

20 21 22 23 24 10 9 8 7 6 5 4 3 2 1


PRAISE FOR THE SECOND EDITION

Because of the author’s pedagogically masterful presentation of multi-level modeling, the otherwise challenging journey to this topic now becomes not only smooth but also enjoyable.
—Lin Ding, Ohio State University

The book offers insights and explanations from which both newcomers and seasoned experts can find benefit.
—Timothy Ford, Ohio University

This is a thorough and accessible introduction to multilevel models. Through extensive examples, the author expertly guides the reader through the material addressing interpretation, graphical presentation, and diagnostics along the way.
—Jennifer Hayes Clark, University of Houston

The Second Edition is even better than the first. The models presented are closely linked to an extended example that students can readily identify with.
—Richard R. Sudweeks, Brigham Young University

ABOUT THE AUTHOR

Douglas A. Luke is Professor and Director of the Center for Public Health Systems Science at the Brown School at Washington University in St. Louis. Dr. Luke is a leading researcher in the areas of public health policy, implementation science, and systems science. Most of the work that Dr. Luke directs at the Center focuses on the evaluation, dissemination, and implementation of evidence-based public health policies. During the past decade, Dr. Luke has worked on applying systems science methods to important public health problems, especially social network analysis. He has published two systems science review papers in the Annual Review of Public Health, and the first study to employ new statistical network modeling techniques on public health data was published in the American Journal of Public Health in 2010. He was also a member of a National Academy of Sciences panel that produced a recent report, “Assessing the Use of Agent-Based Models for Tobacco Regulation,” which provided the FDA and other public health scientists with guidance on how best to use computational models to inform tobacco control regulation and policy.

Dr. Luke directs the doctoral program in Public Health Sciences at the Brown School, where he also teaches doctoral courses in multilevel and longitudinal modeling, social network analysis, and philosophy of social science. Dr. Luke received his PhD in clinical and community psychology in 1990 from the University of Illinois at Urbana-Champaign.

SERIES EDITOR’S INTRODUCTION

How people see and act in the world reflects the contexts within which they function. Social scientists would not hesitate to agree with this statement. But they differ in the contexts that they emphasize. In sociology, these contexts might be neighborhoods or communities; in political science, voting precincts, legislative districts, or states; in economics, markets of various types; in education, classrooms, schools, and school districts. In each instance, level-one units, e.g., individuals, operate within and are constrained by level-two units, e.g., the contexts just enumerated. Multilevel models are designed for such situations, where independent variables measured at two (or more) levels are hypothesized to affect a level-one outcome.

Multilevel Modeling, 2nd edition, by Douglas Luke, provides a step-by-step introduction to multilevel statistical models. As in the first edition, the volume targets users already familiar with linear regression but new to multilevel concepts and analysis. Professor Luke lays a firm foundation, giving the most attention to two-level hierarchical linear models where level-one units are neatly nested within level-two units. He explains how to conceptualize, build, and assess these models in Chapters 2, 3, and 4, respectively. Chapters 5 and 6 take up various extensions of the two-level hierarchical linear model, including nonlinear multilevel models, three-level and cross-classified models, and longitudinal models that allow for intra-individual change and inter-individual variability. Chapter 7 provides guidance on the presentation of results as well as a curated list of references for those wanting to learn more.

Examples are critical to the pedagogy of the volume. One involves influences on tobacco-related legislation by members of Congress. Members of Congress are nested in states (two levels), with the outcome measured as the percentage of time that members voted in a “protobacco” direction over a four-year period. Professor Luke uses this example to illustrate different model specifications, explain measures of fit and model performance, and discuss centering and its impact. A related binary model predicts the outcome of votes on particular bills, for or against. Professor Luke uses an extension of this example to introduce the generalized linear mixed-effects model. This version of the model nests votes on particular bills within members, nested within states. A different example is used to illustrate the multilevel approach to latent trajectory modeling. This example draws on the Longitudinal Study of Aging to investigate change in the activities of daily living as an elderly population ages. There are other examples as well. Data and software code (in R and Stata) are provided in an online appendix so that readers can gain experience with the methods by reproducing the examples. Professor Luke provides lots of practical general advice about multilevel modeling as well as advice specific to these examples.

Since the first edition of Multilevel Modeling was published 15 years ago, it has become straightforward to link sample survey data to measures of relevant contexts. Indeed, in some instances, ancillary data for this purpose has already been created and made available to users. For example, the Health and Retirement Survey (HRS) has assembled and makes available to qualified users measures of sociodemographic characteristics, the built environment, health care, the food environment, physical hazards, and social stressors at multiple levels of census geography with links to HRS respondents. The possibilities are limitless, with much still to be done. With Multilevel Modeling, 2nd edition in hand, graduate students and others are well equipped to begin their journey.

Barbara Entwisle
Series Editor

PREFACE

Since the first edition of this monograph was published in 2004, there have been numerous developments in the statistical and computational methods used in multilevel and longitudinal modeling. Mixed-effects modeling has been solidified as a primary means for accurately and efficiently estimating a wide variety of multilevel and longitudinal models. More complex models that include cross-level interactions, cross-classified random effects, alternative covariance structures, and the like appear much more frequently in the health and social sciences research literature. Sophisticated mixed-effects modeling procedures are now incorporated in most comprehensive statistical software packages (including R, Stata, and SAS), and thus there is less need for specialized multilevel software.

During this same period, I have taught graduate multilevel classes and trainings over a dozen times, and I have learned a thing or two about how to think about mixed-effects models, how to correctly interpret their results, and, maybe most important, how to communicate those results to interested audiences. My students in these classes have always been patient with me and have been my most important collaborators in my own statistical training.

The second edition of Multilevel Modeling has been improved and expanded in ways too numerous to list in detail. However, the major changes in this new version are as follows:

• Longitudinal methods are expanded and get their own new chapter.
• Diagnostic procedures are expanded with an emphasis on influence statistics.
• Coverage of models of counts (Poisson) has been added.
• A short new section on power analysis has been added.
• Cross-classified models are now discussed.
• The coverage of centering has been updated to reflect current statistical knowledge and practices.
• A new section has been added that makes recommendations for presenting modeling results.
• A new support website has been developed for the book that provides the data and the statistical code (both R and Stata) used for all of the presented analyses.

I hope that with these changes, this book will remain useful and relevant for students and researchers for many years to come.

In developing this second edition, I would like to particularly thank the following people, who all provided extremely detailed and helpful reviews of the earlier edition and of drafts of this second edition.

• Edward Brent, Department of Sociology, University of Missouri
• Brian V. Carolan, Department of Educational Foundations, Montclair State University
• Timothy Ford, Department of Curriculum, Instruction, and Learning, University of Louisiana
• Jennifer Hayes Clark, Department of Political Science, University of Houston
• Changjoo Kim, Department of Geography, University of Cincinnati
• David LaHuis, Department of Psychology, Wright State University

I dedicated the first edition of this book to my parents. I would like to dedicate this updated version to my daughter, Alina Luke. Alina was in first grade when I started work on the original volume. As time passes, daughters grow up. She recently received her MPH with a concentration in biostatistics and epidemiology, and is herself a skilled analyst with training in mixed-effects models. In fact, she has helped with this new edition by providing the Stata mixed-effects modeling code and in developing the support website for the book. For all of this, I get to thank her for her professional skills and for the joy she brings to my life.

CHAPTER 1. THE NEED FOR MULTILEVEL MODELING

I should venture to assert that the most pervasive fallacy of philosophic thinking goes back to neglect of context.
—John Dewey, 1931

Background and Rationale

When one considers almost any phenomenon of interest to social and health scientists, it is hard to overestimate the importance of context. For example, we know that the likelihood of developing depression is influenced by social and environmental stressors. The psychoactive effects of drugs can vary based on the social frame of the user. Early childhood development is strongly influenced by a whole host of environmental conditions: diet, amount of stimulation in the environment, presence of environmental pollutants, quality of relationship with mother, and so on. Physical activity is shaped by neighborhood environment; people who live in neighborhoods with sidewalks are much more likely to walk. The probability of teenagers engaging in risky behavior is related to being involved in structured activities with adult involvement. A child’s educational achievement is strongly affected by classroom, school, and school system characteristics.

These examples can be extended to situations beyond where individuals are being influenced by their contexts. The likelihood of couples avoiding divorce is strongly related to certain types of religious and cultural backgrounds. Group decision-making processes can be influenced by organizational climate. Hospital profitability is strongly affected by reimbursement policies set by government and insurance companies.

What all these examples have in common is that characteristics or processes occurring at a higher level of analysis are influencing characteristics or processes at a lower level. Constructs are defined at different levels, and the hypothesized relations between these constructs operate across different levels.

Table 1.1 presents an example of the interdependence among levels of analysis, here with an example from the area of tobacco control. Research programs on tobacco control exist at all levels of analysis, from the genetic up to the sociocultural and political (i.e., “from cells to society”). Moreover, although research can occur strictly within any of these levels, much of the most important research will look at the links between the levels. For example, as we learn more about the genetic basis of nicotine dependence, we may be able to tailor specific preventive interventions to particular genotypes. These types of multilevel theoretical constructs require specialized analytic tools to properly evaluate. These multilevel tools are the subject of this book.

Table 1.1 Levels of analysis in health research with examples from tobacco control

Level of Analysis           Examples From Tobacco Control
Cultural/political          Measuring elasticity of the effect of cigarette taxation on population smoking rates
Social/environmental        Measuring the relative importance of family and peer environment on teen smoking incidence
Behavioral/psychological    Designing effective smoking prevention and cessation programs
Organ systems               Designing ways to block tumor formation in smokers
Cellular                    Tracing metabolic pathways of nicotine uptake
Molecular/genetic           Examining the genetic basis of nicotine dependence

Despite the importance of context, throughout much of the history of the health and social sciences, investigators have tended to use analytic tools that could not handle these types of multilevel data and theories. In earlier years, this was due to the lack of such tools. However, even after the advent of more sophisticated multilevel modeling approaches, practitioners have continued to use more simplistic single-level techniques (Luke, 2005).

Theoretical Reasons for Multilevel Models

The simplest argument, then, for multilevel modeling techniques is this: Because so much of what we study is multilevel in nature, we should use theories and analytic techniques that are also multilevel. If we do not do this, we can run into serious problems, including making incorrect causal claims. For example, it is very common to collect and analyze health and behavioral data at the aggregate level. Epidemiologic studies, for example, have shown that in countries where fat is a larger component of the diet, the death rate from breast cancer is also higher (Carroll, 1975). It might seem reasonable to then assume that women who eat a lot of fat would be more likely to get breast cancer. However, this interpretation is an example of the ecological fallacy, where relationships observed in groups are assumed to hold for individuals (Freedman, 1999). Recent health studies, in fact, have suggested that the link between fat intake and breast cancer is not very strong at the individual level (Holmes et al., 1999).

This type of problem can also work the other way. It is very common in the behavioral sciences to collect data from individuals and then aggregate the data to gain insight into the groups to which those individuals belong. This can lead to the atomistic fallacy, where inferences about groups are incorrectly drawn from individual-level information (Diez-Roux, 1998). It is possible to be successful assessing ecological characteristics from individual-level data; for example, see Moos’s (1996) work on social climates. However, as Shinn and Rapkin (2000) have argued, this approach is fraught with danger and a much more valid approach is to assess group and ecological characteristics using group-level measures and analytic tools.

It is useful here to consider the sociological distinction between properties of collectives and members (Lazarsfeld & Menzel, 1961). Members belong to collectives, but various properties (variables) of both collectives and their members may be measured and analyzed at the same time. Lazarsfeld and Menzel identify analytical, structural, and global properties of collectives. Analytical properties are obtained by aggregating information from the individual members of the collective (e.g., proportion of Hispanics in a city). Structural properties are based on the relational characteristics of collective members (e.g., friendship density in a classroom).
Finally, global properties are characteristics of the collective itself that are not based on the properties of the individual members. Presence of an antismoking policy in a school would be a global property of the school, for example. Using this framework, it becomes clear that fallacies are a problem of inference, not of measurement. That is, it is perfectly admissible to characterize a higher level collective using information obtained from lower level members. The types of fallacies described above come about when relationships discovered at one particular level are inappropriately assumed to occur in the same fashion at some other (higher or lower) level.

There is broad interest in social and physical context across the social sciences, and this can be seen most clearly in the ecological richness of various social science theories and conceptual frameworks. In sociology and criminology, the theory of neighborhood disorder proposes that various physical and social indicators of environmental disorder are related to a variety of individual and relational outcomes, including crime, violence, policing styles, and depression (Sampson, Morenoff, & Gannon-Rowley, 2002). Political scientists have consistently viewed political participation (e.g., voting, petitioning, contacting elected officials) as being driven by a number of contextual processes and factors, including organizational culture, media exposure, and peer influence (Uhlaner, 2015). Policy science generally views policy development and implementation as a process embedded in local, regional, and national political and geographic contexts. For example, the political stream of Kingdon’s influential multiple streams framework is defined by referring to predominately contextual structures and processes including national mood, legislative body makeup, and interest group activities (Béland & Howlett, 2016). An alternative model of the policy process, the advocacy coalition framework, more explicitly positions policy activity as an output from the policy subsystem, which is in turn made up of three types of collectives: (1) coalitions, (2) governmental bodies, and (3) institutions (Sabatier & Weible, 2007).

Finally, implementation science (sometimes called dissemination and implementation science) is a relatively new social science that focuses on how evidence-based programs, practices, and policies can be better disseminated, implemented, and maintained to benefit population health. Early frameworks used in implementation science view implementation processes and outcomes as situated within social and organizational contexts.
For example, Rogers’s diffusion of innovations theory has been used extensively to study how new discoveries are passively diffused and actively disseminated across social and organizational networks and systems (Rogers, 2003; Valente, 1996). Figure 1.1 presents an expanded social–ecological framework for implementation science (Luke, Morshed, McKay, & Combs, 2017). This framework, adapted from Glass and McAtee (2006), suggests that relational, organizational, and social contexts are important for studying and understanding implementation outcomes.

Figure 1.1. A social–ecological model for implementation science.
[Figure: nested levels of analysis surrounding the program, policy, or practice being implemented. Global: political/economic climate and dynamics. Macro: national, state, and large area dynamics. Mezzo: setting of program implementation (e.g., organization, hospital, neighborhood). Micro: professional or personal social networks (e.g., organizational staff or clients). Psychological: program-relevant knowledge, attitudes, and beliefs. Biological: program-relevant physiological, biological, and genetic characteristics. A D&I timeline runs through the stages Translation, Dissemination, Implementation, and Sustainability.]
Source: Adapted from Glass and McAtee (2006).

Statistical Reasons for Multilevel Models

Despite the contextual richness of the theories and frameworks used by social scientists, they have often tended to utilize traditional individual-level statistical tools for their data, even if their data and hypotheses are multilevel in nature. One approach has been to disaggregate group-level information to the individual level so that all predictors in a multiple regression model are tied to the individual unit of analysis. This leads to at least two problems. First, all of the unmodeled contextual information ends up pooled into the single individual error term of the model (Duncan, Jones, & Moon, 1998). This is problematic because individuals belonging to the same context will presumably have correlated errors, which violates one of the basic assumptions of multiple regression (i.e., no autocorrelation of residuals). This is often called clustering, and it implies that observations within some larger unit are related to one another. For example, you might see clustering of student test scores within a classroom. If students are not randomly assigned to classes, if teacher ability varies across classrooms, or if classroom environments (e.g., class size) vary systematically, then you would expect to see clustering of scores.

The second statistical problem is that by ignoring context, the model assumes that the regression coefficients apply equally to all contexts, “thus propagating the notion that processes work out in the same way in different contexts” (Duncan et al., 1998, p. 98).

One partial solution to these statistical problems is to include an effect in the model that corresponds to the grouping of the individuals. This leads to an analysis of variance (ANOVA) or analysis of covariance (ANCOVA) approach to modeling. Unfortunately, there are still a number of issues with this approach. First, in the case where there are many groups, these models will have many more parameters, resulting in greatly reduced power and parsimony. Second, these group parameters are often treated as fixed effects, which ignores the random variability associated with group-level characteristics. Finally, ANOVA methods are not very flexible in handling missing data or greatly unbalanced designs.
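The clustering problem described above can be made concrete with a small simulation. The sketch below is only illustrative: the number of classrooms, the class sizes, and the two standard deviations are invented values, not estimates from any data set in this book. It generates test scores that share a classroom effect, takes residuals around the grand mean (as a single-level model with no predictors would), and compares the average product of residuals for pairs of students in the same classroom with that for pairs in different classrooms.

```python
import random

def simulate_scores(n_classes=200, n_students=20, sd_class=5.0,
                    sd_student=10.0, seed=1):
    """Return a list of classrooms, each a list of simulated test scores.

    All numeric settings are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    classes = []
    for _ in range(n_classes):
        class_effect = rng.gauss(0.0, sd_class)  # shared by every student in the class
        classes.append([70.0 + class_effect + rng.gauss(0.0, sd_student)
                        for _ in range(n_students)])
    return classes

def pair_sum(values):
    """Sum of a*b over all distinct pairs: ((sum a)^2 - sum a^2) / 2."""
    s = sum(values)
    q = sum(v * v for v in values)
    return (s * s - q) / 2.0

def n_pairs(n):
    return n * (n - 1) // 2

def pair_covariances(classes):
    """Average residual product for same-class pairs and different-class pairs."""
    scores = [y for c in classes for y in c]
    grand_mean = sum(scores) / len(scores)
    resid = [[y - grand_mean for y in c] for c in classes]

    within_sum = sum(pair_sum(r) for r in resid)
    within_n = sum(n_pairs(len(r)) for r in resid)

    all_resid = [v for r in resid for v in r]
    between_sum = pair_sum(all_resid) - within_sum
    between_n = n_pairs(len(all_resid)) - within_n
    return within_sum / within_n, between_sum / between_n

within_cov, between_cov = pair_covariances(simulate_scores())
print(f"same-classroom pairs:      {within_cov:.2f}")
print(f"different-classroom pairs: {between_cov:.2f}")
```

Under these assumptions, the same-classroom value should sit well above zero (near the classroom variance) while the different-classroom value should be close to zero, which is the correlated-error pattern that a single-level regression ignores.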

Scope of This Book

Based on the previous discussion, the purpose of this monograph is to provide a relatively nontechnical introduction to multilevel modeling statistical techniques for social and health scientists. After this introduction, the book is split into two major sections. Chapters 2 through 4 discuss how to plan, build, and assess the basic two-level multilevel model, and describe the steps in fitting a multilevel model, including data preparation, model estimation, model interpretation, hypothesis testing, testing of model assumptions, centering, and power analysis. Chapters 5 through 7 constitute the second section of the book, covering extensions and more advanced topics. Chapter 5 covers useful extensions to the basic multilevel model, including modeling noncontinuous and nonnormal dependent variables, building three-level models, and building cross-classified models. Chapter 6 covers the use of mixed-effects models for longitudinal data. Finally, Chapter 7 provides some guidance on a couple of general multilevel modeling topics, including tips for presenting multilevel model results and general resources for analysts. The presentation of the topics covered in this book only assumes familiarity with multiple regression, and the text makes extensive use of example data and analyses.

The primary focus of this book is the application of mixed-effects modeling techniques for multilevel and longitudinal models. Mixed-effects models are powerful and flexible approaches that can handle a number of theoretical and statistical challenges. In particular, mixed-effects models can deal with the type of clustered data that more traditional multiple regression models cannot handle. Multilevel and longitudinal data sets are typified by clustered data (e.g., students nested in classrooms or observations nested within individuals), which makes a mixed-effects modeling approach ideal in these situations.

A useful definition to serve as a basis for the rest of the presentation is as follows: A multilevel model is a statistical model applied to data collected at more than one level in order to elucidate relationships at more than one level.

The statistical basis for multilevel modeling has been developed over the past several decades from a number of different disciplines and has been called various things, including hierarchical linear models (Raudenbush & Bryk, 2002), random coefficient models (Longford, 1995), mixed-effects models (Pinheiro & Bates, 2000), covariance structure models (Muthén, 1994), growth curve models (Rogosa, Brandt, & Zimowski, 1982), as well as multilevel models. All these specific types of multilevel models fall into one of two broad statistical categories: a mixed-effects multiple regression approach and a structural equation modeling (SEM) approach. This book will focus on mixed-effects, regression-based multilevel modeling. Generally, I will use “multilevel” or “longitudinal” models when I am talking about the purposes of a model; I will tend to use “mixed-effects” models when I am talking about the broad class of statistical models that can be used for multilevel or longitudinal analyses. For a good introduction to the SEM-based approach, see Chapter 6 in Heck and Thomas (2009).
Multilevel and longitudinal models are starting to appear more frequently in every area of the social and health sciences, as the techniques have become more widely known and integrated into the major statistical packages. There are as many specific types of multilevel models as there are scientific questions. However, there are certain types of overarching models that can be seen across the different research disciplines, as indicated in Table 1.2. This table is not meant to be exhaustive but to provide a catalyst for the reader and indicate the extremely wide applicability of multilevel methods. In particular, it is not immediately obvious to the new analyst that mixed-effects models can be extremely useful for modeling longitudinal data (where multiple observations are nested within an individual) and also for meta-analytic studies (where multiple effect statistics are nested within individual studies).

Table 1.2. Types of multilevel models and structures found in the health and social sciences

| Type | Multilevel Structure | Examples |
|---|---|---|
| Physical | Entities nested within the immediate physical environment, including biological, ecological, and built environments | Diez-Roux, et al. (2001); Lottig, et al. (2014); O’Campo, et al. (2015) |
| Social | Entities nested within social structures, including families, peer groups, and social networks | Buka, et al. (2003); Koster, et al. (2015); Sampson, et al. (2005) |
| Organizational | Individuals and small groups nested within organizational and institutional contexts | Beidas, et al. (2015); Maes and Lievens (2003); Masuda, et al. (2018) |
| Political | Individuals, groups, and communities nested within specific sociopolitical, cultural, or historical contexts | Boehmer, et al. (2008); Luke and Krauss (2004); Skiple, et al. (2016) |
| Temporal | Multiple observations of a single entity taken over time | Boyle and Willms (2001); Curran, et al. (1997); Howe, et al. (2016) |
| Analytic | Multiple effect measures nested within individual studies (i.e., meta-analysis) | Goldstein, et al. (2000); Weisz, et al. (2017) |

Online Book Resources

Finally, a variety of resources are available online to support the material covered in this book. These resources include all the data sets used for the examples, R and Stata code to reproduce most of the models, results, and figures, and an occasionally updated mixed-effects modeling resource list. These materials can be found at https://www.douglasluke.com.

CHAPTER 2. PLANNING A MULTILEVEL MODEL

It’s turtles all the way down! —Stephen Hawking, 1988

The Basic Two-Level Multilevel Model

The goal of a multilevel model is to predict values of some dependent variable based on a function of predictor variables at more than one level. For example, we might want to examine how a child’s score on a standardized reading exam is influenced both by characteristics of the child (e.g., amount of study time) as well as characteristics of the child’s classroom (e.g., size of class). In this example, we consider the child to be measured and modeled at Level 1, and the classroom at Level 2. This simple two-level structure can be seen in the following multilevel model, with one predictor variable each at Level 1 and Level 2:

$$
\begin{aligned}
\text{Level 1:} \quad & Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
\text{Level 2:} \quad & \beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j} \\
& \beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}
\end{aligned}
\tag{2.1}
$$

This system of equations not only lists all the predictor and dependent variables but also clearly delineates the multilevel nature of the model. The Level 1 part of the model looks similar to a typical ordinary least squares (OLS) multiple regression model. However, the j subscripts tell us that a different Level 1 model is being estimated for each of the j Level 2 units (classrooms). Using the above example, each classroom in the study may have a different average reading score (β0j ) and a different effect of study time on reading score (β1j ). Thus, we are allowing the intercept and slope to vary across the Level 2 units. This leads to the critical conception in multilevel modeling—we can treat intercepts and slopes as outcomes of Level 2 predictors.
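One way to get a feel for Equation 2.1 is to generate data from it. The following is a minimal sketch, not code from the book's online resources: the function name, the parameter values, the ranges used for the predictors, and the choice to draw u0j and u1j independently are all assumptions made for illustration (in practice the two Level 2 errors are usually allowed to covary).

```python
import random

def simulate_two_level(n_groups, n_per_group, gamma00, gamma01, gamma10, gamma11,
                       sd_u0=1.0, sd_u1=0.5, sd_r=2.0, seed=0):
    """Generate rows (j, W_j, X_ij, Y_ij) following the structure of Equation 2.1.

    Hypothetical helper: predictor ranges and error SDs are illustrative only,
    and u0j and u1j are drawn independently for simplicity.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        w_j = rng.uniform(15, 35)        # Level 2 predictor, e.g., class size
        u0_j = rng.gauss(0.0, sd_u0)     # Level 2 error for the intercept
        u1_j = rng.gauss(0.0, sd_u1)     # Level 2 error for the slope

        # Level 2 equations: intercept and slope as outcomes of W_j
        beta0_j = gamma00 + gamma01 * w_j + u0_j
        beta1_j = gamma10 + gamma11 * w_j + u1_j

        for _ in range(n_per_group):
            x_ij = rng.uniform(0, 10)    # Level 1 predictor, e.g., study time
            r_ij = rng.gauss(0.0, sd_r)  # Level 1 error
            # Level 1 equation
            y_ij = beta0_j + beta1_j * x_ij + r_ij
            rows.append((j, w_j, x_ij, y_ij))
    return rows

data = simulate_two_level(n_groups=5, n_per_group=4,
                          gamma00=50.0, gamma01=-0.2, gamma10=1.5, gamma11=0.1)
print(len(data), "Level 1 records in", len({row[0] for row in data}), "Level 2 units")
```

Because every child in classroom j shares the same beta0_j and beta1_j, each classroom gets its own regression line, which is the point the j subscripts are making.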


A note about notation

Students (and others) often find it tedious to navigate notation systems for statistical equations. However, understanding the notation used for multilevel and other mixed-effects models will pay off in at least three ways. First, given the potential complexity of the structure of these models, a clear and consistent notation system will help in designing the appropriate model. Second, it will make it less likely that the user will make a mistake when translating a model design into statistical programming code. Finally, the notation system can help when interpreting the results of a model analysis.

The notation system used here is based on Raudenbush and Bryk (2002) and has five basic components. First, the uppercase, italicized Roman letters (e.g., W, X, Y, Z) are used for the observed variables. Second, the italicized Greek letters (e.g., β, γ) are used for the model parameters. Third, the lowercase, italicized Roman letters (e.g., r, u) are used for the variance components at each level. Fourth, the letter subscripts (i, j, etc.) provide information about the nesting structure in the model. There will be one letter for each level, and earlier letters imply nesting within the later letters. So, for example, in a study of student achievement, Yij might refer to the score of the dependent variable for the ith student nested in the jth classroom. Finally, the number subscripts are used to map out the model structure. At the lowest level of a multilevel model, the numbers refer to the order of the Level 1 parameters: 0 for the intercept, 1 for the first covariate, 2 for the second, and so on. For each higher level in the model, one additional number entry is used to distinguish the parameters at that level. So in a two-level model (see Equation 2.1), the second number in the pair will designate which parameter is referenced at that second level. Again, 0 would indicate the intercept at that level, 1 the first covariate, and so on.

The first number will always tell you information about the first level, the second number about the second level, and so on. Looking at Equation 2.1, the subscripts for γ00 are two zeroes. The first zero tells you that the parameter is connected to the Level 1 intercept, and the second zero tells you that it is also the intercept at Level 2. For an example of how this notation can be extended to handle three-level models, see Equation 5.9 in Chapter 5.

The Level 2 part of the model listed in Equation 2.1 indicates how each of the Level 1 parameters is a function of Level 2 predictors and variability: β0j is the Level 1 intercept in Level 2 unit j; γ00 is the mean value of the Level 1 dependent variable, controlling for the Level 2 predictor, Wj; γ01 is the effect (slope) of the Level 2 predictor, Wj; and u0j is the error, or unmodeled variability, for unit j. The interpretation of the second equation is similar, but here we are modeling the Level 2 effects on the slope of Xij: β1j is the Level 1 slope in Level 2 unit j; γ10 is the mean value of the Level 1 slope, controlling for the Level 2 predictor, Wj; γ11 is the effect of the Level 2 predictor, Wj; and u1j is the error for unit j.

Instead of using a system of equations to specify the multilevel model, we can substitute the Level 2 parts of the model into the Level 1 equation. After substituting and rearranging the terms, we get the following:

$$
Y_{ij} = \underbrace{[\gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} W_j + \gamma_{11} W_j X_{ij}]}_{\text{fixed}} + \underbrace{[u_{0j} + u_{1j} X_{ij} + r_{ij}]}_{\text{random}}
\tag{2.2}
$$

This single prediction equation form of the multilevel model is sometimes called the mixed-effects model; it is more compact, but it is harder to quickly discern the multilevel structure of the underlying model. However, this form has two advantages. First, as shown above, the single prediction equation clearly indicates which part of the model is composed of fixed effects (the γs) and which part is composed of random effects (u and r). This illustrates why multilevel models are also called mixed models or mixed-effects models; they are always made up of both fixed and random effects. The other advantage of the single prediction equation form is that it often closely corresponds to the syntax and output of multilevel modeling software. In addition, this form makes it clear that the Level 1 parameters (i.e., β0j, β1j) are not directly estimated but are indirectly estimated through the Level 2 gammas (γ).
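For readers who want to see the substitution spelled out, the steps from Equation 2.1 to Equation 2.2 can be written as follows. This is only the algebra implied by the two equations; no new assumptions are introduced.

```latex
\begin{align*}
Y_{ij} &= \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
       &= (\gamma_{00} + \gamma_{01} W_j + u_{0j})
          + (\gamma_{10} + \gamma_{11} W_j + u_{1j}) X_{ij} + r_{ij} \\
       &= \gamma_{00} + \gamma_{01} W_j + u_{0j}
          + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij} + u_{1j} X_{ij} + r_{ij} \\
       &= [\gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} W_j + \gamma_{11} W_j X_{ij}]
          + [u_{0j} + u_{1j} X_{ij} + r_{ij}]
\end{align*}
```

The product term in the third line arises from multiplying the Level 2 slope equation by Xij, which is where both the cross-level interaction (γ11 Wj Xij) and the random slope term (u1j Xij) come from.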

The Importance of Random Effects

Most analysts, especially those who utilize ANOVA methods, are familiar with fixed effects. Random effects, on the other hand, may be a little less familiar. In the context of ANOVA, random effects are often defined as independent factors whose levels have been randomly selected from a larger potential population of factor levels. In multilevel modeling, however, it is more useful to think of random effects as additional error terms or sources of variability. In our classroom example, we have the traditional individual-level error term: rij. However, our multilevel model has two additional error terms: u0j is the variability of reading scores between classrooms, and u1j is the variability of the relationship of study time to reading scores between classrooms. So a multilevel model will generally have random effects that are tied to Level 1 and Level 2 units.

More specifically, random effects arise in a model when a random factor is included in that model. The traditional definition specifies that a factor is random when its levels are a random sample from a larger population. (Note that “levels” here is referring to the units in a random factor, not the levels in a multilevel model.) For example, if a sample of neighborhoods was selected from a larger universe of neighborhoods in a city, that factor would be considered random. However, in practice the distinction between fixed and random factors is often fuzzy, especially once you realize that you can treat a factor as either fixed or random in your statistical model. It is more helpful to consider random factors as satisfying one or more of the following conditions:

• The levels of the factor represent some subset of a larger universe of possible levels (as suggested by the above definition).
• The levels of the factor can be considered replaceable with other potential levels (e.g., different neighborhoods could be included in the factor without distorting the meaning of the model).
• The interest of the model is in the variability across levels of the factor, rather than apportioning variability to specific levels.
• The factor has a relatively large number of levels.

So taking this approach, we can see that a treatment condition factor from a randomized controlled trial (with three levels: control, traditional treatment, new treatment) would never be an appropriate random factor. The three levels are the only levels that exist, they are not replaceable with other possible levels, the interest is in the variability of specific levels (i.e., the new treatment effect), and it has a small number of levels. In comparison, when we analyze the congressional tobacco voting data later in this book, we will treat the 50 U.S. states as a random factor. Even though we include all possible levels in this factor (all 50 states), it is an appropriate random factor because of the relatively large number of levels, and our modeling interest is more in including variability across states than examining the effects of any specific state.

Classifying Multilevel Models

Equation 2.1 illustrates a fairly typical multilevel model, but there are many types of multilevel models that may be estimated depending on the situation. To get a handle on the bewildering array of possible models, it helps to see that the final form of a model will depend on the following decisions:

1. How many levels are in your data, and how many of these levels do you want to model? Although it is possible to include more than three levels in a model, most published examples in the social science literature are of two or three levels.
2. How many predictors at each level do you want to consider?
3. Do you want to model Level 1 intercepts, slopes, or intercepts and slopes as a function of Level 2 characteristics? Figure 2.1 shows the difference between (a) intercepts-as-outcomes and (b) slopes and intercepts-as-outcomes. The left side of the figure shows a model that has constant slope across the Level 2 units, but varying intercepts. The right side shows both intercepts and slopes as varying. The analyst should use both theory and evidence from the data to help guide this decision.
4. Finally, which parts of the model should include random effects?

Figure 2.1. Example of (a) intercepts-as-outcomes and (b) intercepts and slopes-as-outcomes for three Level 2 units.
[Figure: two panels, (a) Intercepts as outcomes and (b) Intercepts and slopes as outcomes. x-axis: Level 1 Predictor (0 to 5); y-axis: Dependent variable (0 to 5).]

Table 2.1. Three classes of multilevel models

| Class | Description | System of Equations | Mixed-Effects Equations | Notes |
|---|---|---|---|---|
| 1. Unconstrained | One-way random effects ANOVA | L1: $Y_{ij} = \beta_{0j} + r_{ij}$; L2: $\beta_{0j} = \gamma_{00} + u_{0j}$ | $Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$ | Often used as a null model to estimate between-groups effects with an intraclass correlation. |
| 2. Random intercepts | Means as outcomes | L1: $Y_{ij} = \beta_{0j} + r_{ij}$; L2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$ | $Y_{ij} = \gamma_{00} + \gamma_{01} W_j + u_{0j} + r_{ij}$ | Here the emphasis is on L2 predictors. |
| 2. Random intercepts | One-way random effects ANCOVA | L1: $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$; L2: $\beta_{0j} = \gamma_{00} + u_{0j}$, $\beta_{1j} = \gamma_{10}$ | $Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + u_{0j} + r_{ij}$ | |
| 3. Random intercepts and slopes | Random coefficients regression model | L1: $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$; L2: $\beta_{0j} = \gamma_{00} + u_{0j}$, $\beta_{1j} = \gamma_{10} + u_{1j}$ | $Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + u_{0j} + u_{1j} X_{ij} + r_{ij}$ | Intercepts and slopes of L1 model are allowed to vary across L2 units, but we are not modeling that variability with L2 predictors. |
| 3. Random intercepts and slopes | Intercepts and slopes as outcomes | L1: $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$; L2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$, $\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$ | $Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij} + u_{0j} + u_{1j} X_{ij} + r_{ij}$ | L1 intercept and slopes are modeled using L2 predictor(s). Note the cross-level interaction component: ($\gamma_{11} W_j X_{ij}$). |

Note: L1 and L2 represent Level 1 and Level 2, respectively. ANOVA = analysis of variance; ANCOVA = analysis of covariance.

Despite the large number of different multilevel models that a researcher may wish to use, it is helpful to think about three broad classes of multilevel models, listed in Table 2.1. The first type of model is the simplest possible multilevel model that has no Level 1 or Level 2 predictors. This unconstrained, or null model, is often used as a starting point for building more complex models. It is particularly useful for calculating the intraclass correlation coefficient (ICC; see Chapter 3). The second class of models assumes that Level 1 intercepts vary across Level 2 units, but not the Level 1 slopes. We would use this class of model in our example if we believed that different classrooms had different average reading scores, but believed that the effects of individual study time on reading ability were the same across classrooms. The final class of models assumes that both intercepts and slopes vary across Level 2 units. We would use this model if we believed that there was a cross-level interaction between classroom characteristics and individual study time on reading scores. For example, some teachers may get better results only with students who study a great deal, whereas other teachers may have a positive effect on all students regardless of study time. Note that this type of cross-level interaction effect is made explicit by the γ11 Wj Xij term in the single prediction equation form of the multilevel model (Equation 2.2).

CHAPTER 3. BUILDING A MULTILEVEL MODEL

Look, I made a hat . . . Where there never was a hat. —Stephen Sondheim, Sunday in the Park With George, 1984

Introduction to Tobacco Voting Data Set

To illustrate how to develop, test, and interpret a typical multilevel model, I will use an example data set taken from a tobacco control policy study (all data used here are available for download at https://www.douglasluke.com). The main goal of this study was to identify the important influences on voting on tobacco-related legislation by members of Congress from 1997 to 2000 (Luke & Krauss, 2004). The dependent variable is Voting %, the percentage of time that a senator or representative voted in a “protobacco” direction during those 4 years. As an example, consider the 1998 Senate Bill S1415–144, which was an amendment proposed by Senator Ted Kennedy (D-Mass.) to raise federal cigarette taxes by $1.50 per pack over 3 years. For our purposes, a “No” vote is considered to be a protobacco vote because the tobacco industry opposed this legislation. This particular bill was defeated, 40–58. Voting % was calculated for each member of Congress by adding up the total number of times that he or she voted in a proindustry direction and dividing by the total number of tobacco-related bills that he or she voted on. The variable can range from 0.0 (never voted protobacco) to 1.0 (always voted protobacco). (There are challenges in modeling a dependent variable that is a percentage or proportion. This is discussed in more detail in Chapter 5.) Party records the political party of each legislator. Past research has shown that political party is an important predictor of voting pattern: Republicans tend to vote more often in the protobacco industry direction. The other important individual-level variable in which we are interested is Money, the amount of money that the member of Congress received from tobacco industry political action committees (PACs). Our hypothesis is that the more PAC money a legislator receives, the more often that person will vote protobacco.
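The Voting % calculation described above can be sketched in a few lines. This is an illustration of the stated rule (proindustry votes divided by tobacco-related bills actually voted on), not the study's own code. The function name and the bills other than S1415-144 are hypothetical; only the coding of a “No” on S1415-144 as protobacco comes from the text.

```python
def voting_pct(industry_position, votes):
    """Proportion of tobacco-related votes cast in the proindustry direction.

    industry_position: bill id -> "Yes" or "No" (the vote the industry favored)
    votes: bill id -> "Yes" or "No", only for bills this legislator voted on
    Returns None when the legislator cast no tobacco-related votes.
    """
    cast = [bill for bill in votes if bill in industry_position]
    if not cast:
        return None
    pro = sum(1 for bill in cast if votes[bill] == industry_position[bill])
    return pro / len(cast)

# Hypothetical bill list; "bill_B" and "bill_C" are made up for illustration.
industry_position = {"S1415-144": "No", "bill_B": "Yes", "bill_C": "No"}

# This legislator voted on two of the three bills (no vote on bill_C).
legislator_votes = {"S1415-144": "No", "bill_B": "No"}

print(voting_pct(industry_position, legislator_votes))  # 0.5
```

Bills a legislator did not vote on are left out of the denominator, which is why the resulting proportion stays between 0.0 and 1.0 regardless of how many votes were missed.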
In addition to these Level 1 variables, we will also have information about the Level 2 units. First of all, we will need to know which state each member of Congress represents. Then, we will need to obtain any Level 2 variables that we want to use in our model. The most important Level 2 characteristic in our model is the state tobacco farm economy. We assess this with Acres, the number of harvested acres of tobacco in 1999, measured in thousands of acres.

Although many multilevel software packages allow you to combine the Level 1 and Level 2 information into a single data file (e.g., R, SAS, and Stata), some packages such as HLM have you start out with two data files, one for each of the levels in your model. Tables 3.1 and 3.2 show how the data sets for our tobacco study may be structured, with one data table for each level in the multilevel model. An important element of the Level 1 data table is the linking or index variable that connects each Level 1 case to the appropriate Level 2 unit. The state abbreviation serves as the link variable for this data set. Note that the minimal data requirement for a multilevel analysis is a dependent variable and a link to a Level 2 unit. However, in most cases, the data sets will also include a variety of Level 1 and Level 2 predictor variables.

Table 3.1. Structure of Level 1 tobacco data set

| Name | Branch | State | Voting % | Party | Money ($K) |
|---|---|---|---|---|---|
| Murkowski | Senate | AK | 0.84 | Republican | 9.2 |
| Young | House | AK | 0.57 | Republican | 23.5 |
| Shelby | Senate | AL | 0.64 | Republican | 24.2 |
| Cramer | House | AL | 0.89 | Democrat | 14.0 |
| ... | | | | | |

Note: N = 527.

Table 3.2. Structure of Level 2 tobacco data set

| State | Acres |
|---|---|
| AK | 0.0 |
| AL | 0.0 |
| AR | 0.0 |
| ... | |

Note: N = 50.

For software that requires only one data file, the file will be organized at the lowest level of analysis. Any Level 2 (or higher) predictors will be disaggregated and included in each Level 1 record. So, for example, to analyze the tobacco data set using lme4 in R, one data set is prepared with 527 records, one for each member of Congress. In addition to the index variable (State) and the Level 1 predictors, each case will also have a value for Acres. This is a disaggregated Level 2 predictor; every member of Congress from the same state will have the same value for this variable. (Merging multiple data tables is a common data management task and will often be required for multilevel and longitudinal data analysis. There are good data management resources available for most statistical platforms. R for Data Science [Wickham & Grolemund, 2017] and Data Management Using Stata [Mitchell, 2010] are two good examples.)
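The disaggregation step can be illustrated with a small merge on the link variable. The sketch below uses only the rows shown in Tables 3.1 and 3.2; the field names and the `disaggregate` helper are hypothetical, and a real analysis would more likely use the merge facilities of the statistical platform at hand.

```python
# Level 1 records (values taken from the rows shown in Table 3.1).
level1 = [
    {"name": "Murkowski", "state": "AK", "voting_pct": 0.84},
    {"name": "Young", "state": "AK", "voting_pct": 0.57},
    {"name": "Shelby", "state": "AL", "voting_pct": 0.64},
    {"name": "Cramer", "state": "AL", "voting_pct": 0.89},
]

# Level 2 records keyed by the link variable (values from Table 3.2).
level2 = {"AK": {"acres": 0.0}, "AL": {"acres": 0.0}, "AR": {"acres": 0.0}}

def disaggregate(level1_rows, level2_by_key, key="state"):
    """Copy each Level 2 unit's variables onto every Level 1 record in that unit."""
    merged = []
    for row in level1_rows:
        unit = row[key]
        if unit not in level2_by_key:
            # A Level 1 case with no matching Level 2 unit signals a linking error.
            raise KeyError(f"no Level 2 record for {key}={unit!r}")
        merged.append({**row, **level2_by_key[unit]})
    return merged

combined = disaggregate(level1, level2)
for record in combined:
    print(record["name"], record["state"], record["voting_pct"], record["acres"])
```

The result has one record per legislator, and both Alaska legislators carry the same Acres value, which is what a disaggregated Level 2 predictor looks like in a single-file layout.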

Assessing the Need for a Multilevel Model

The first step in building a multilevel model is to decide whether a multilevel model is even needed in the first place. In general, we can use three types of justification for a multilevel model: (1) empirical, (2) statistical, and (3) theoretical. Following along with our tobacco policy example, each of these three justifications will be discussed in turn.

Figure 3.1 displays the average protobacco voting percentage for members of Congress for each state. This map shows that there is considerable variability from state to state on voting behavior. In particular, the southeastern and Plains states appear to vote most often for tobacco industry interests, whereas New England tends to vote against the industry.

Figure 3.1. Average protobacco vote percentage by Congress members: 1997 to 2000.
[Figure: U.S. map with states shaded by Average Vote %; legend values 0.25, 0.50, 0.75.]

Further evidence of state variability is shown in Figure 3.2, a scatterplot of the relationship between PAC contributions received and voting behavior for four of the largest states. For all four of the states, there appears to be a positive relationship between PAC contributions and protobacco voting percentage. New York has the lowest protobacco average score of the group. Also, two of the states (California and Illinois) appear to have a much stronger relationship (steeper slope) between money and voting than the other states.

Figure 3.2. Relationship of PAC money and voting % for four large states.
[Figure: scatterplot by state (CA, FL, IL, NY). x-axis: PAC Money ($K), 0 to 50; y-axis: Vote %, 0% to 100%.]
Note: PAC = political action committee.

A much more detailed graphical examination is provided in Figure 3.3, which is a scatterplot matrix of the within-state linear fits. This graph also shows that voting percentage increases with PAC money.
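The within-state linear fits behind a display like Figure 3.3 amount to running a separate simple regression for each Level 2 unit. The following is a minimal sketch of that idea, assuming records of the form (state, money, vote); the state labels and values are invented for illustration and are not taken from the tobacco data set.

```python
from collections import defaultdict

def ols_fit(x, y):
    """Return (intercept, slope) of the least squares line.

    Returns None when a slope cannot be estimated (fewer than two
    points, or no variation in x).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if n < 2 or sxx == 0:
        return None
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fits_by_group(rows):
    """Fit a separate OLS line for each group in rows of (group, x, y)."""
    grouped = defaultdict(lambda: ([], []))
    for group, x, y in rows:
        grouped[group][0].append(x)
        grouped[group][1].append(y)
    return {group: ols_fit(xs, ys) for group, (xs, ys) in grouped.items()}

# Hypothetical records: (state label, PAC money in $K, voting proportion).
rows = [
    ("S1", 0, 0.2), ("S1", 10, 0.4), ("S1", 20, 0.6),
    ("S2", 5, 0.5), ("S2", 15, 0.5),
    ("S3", 12, 0.7),  # a single legislator: no slope can be estimated
]

for state, fit in fits_by_group(rows).items():
    print(state, fit)
```

Comparing the per-state intercepts and slopes side by side gives an informal look at how much the Level 1 relationship varies across Level 2 units, and states with very few legislators show up as fits that are missing or poorly determined.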

Figure 3.3. Ordinary least squares fits of voting % and money for 50 states.
[Figure: grid of small-multiple scatterplots with within-state linear fits, one panel per state. x-axis: PAC Money ($K), 0 to 90; y-axis: Vote %, 0.00 to 1.25. Panels, in order: MA, RI, ND, CT, HI, VT, ME, SD, NV, MD, NJ, OR, WV, NY, DE, WA, MN, WI, MI, CA, PA, OH, IL, IA, NM, MT, FL, UT, TX, AR, MO, NE, IN, AZ, CO, VA, GA, LA, MS, KS, TN, AK, NH, SC, AL, NC, ID, OK, WY, KY.]

However, we can also see here the extent of the state-to-state variability. For example, even though most states show a positive relationship between PAC money and protobacco voting, some states, such as Oklahoma and Michigan, do not. Also, we can see that some states (the small states with few representatives) are contributing relatively less information than other larger states. A scatterplot matrix such as we see in Figure 3.3 takes advantage of the principle of small multiples as espoused by Edward Tufte (2001). This is particularly useful here, where we can compare and contrast the patterns of money and voting across the 50 states. Graphical techniques such as these can be extremely useful for gathering empirical evidence of the need for multilevel modeling. A more

21 formal piece of empirical evidence is provided by the ICC. The ICC measures the proportion of variance in the dependent variable that is accounted for by groups (i.e., Level 2 units): ρ=

σu20

(3.1)

(σu20 + σr2 )

where σu20 and σr2 are estimates of the Level 2 and Level 1 variance components, respectively, and are obtained by fitting a null model. The null model is a multilevel model with no Level 1 or Level 2 predictors: Level 1: Level 2:

Yij β0j

= β0j + rij = γ00 + u0j

(3.2)

The mixed-effects form of this model is then Yij = γ00 + u0j + rij

(3.3)

Yij is the voting percentage for a particular legislator within a particular state. The only fixed effect (γ00 ) is the grand mean across all members of Congress. The error terms are split into two components: (1) the variability between states (u0j ) and (2) the variability between legislators within a state (rij ). We can see that this null model is the same as a one-way, random-effects ANOVA model. As Table 3.3 shows, the estimate for the Level 2 variance is 0.035 and for Level 1 is 0.093. Using Equation 3.1, the ICC is then 0.035/(0.093 + 0.035) = 0.273. This means that states account for 27% of the variability of voting behavior among legislators. This moderately high ICC value suggests that a multilevel model incorporating states and state characteristics may be useful. Note that because there are no Level 1 or Level 2 predictor variables in the null model, there is only one fixed effect estimated. The estimate for γ00 is 0.53, which is interpreted as the expected value of the dependent variable across all subjects. Thus, a typical senator or representative is expected to vote protobacco slightly more than half of the time. Table 3.3

Parameter estimates for null model

| Parameter | Estimate |
|---|---|
| Intercept (γ00) | 0.531 |
| Level 1 variability (rij) | 0.093 |
| Level 2 variability (u0j) | 0.035 |
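The ICC arithmetic from Equation 3.1 can be reproduced with a few lines of code. The following is a minimal sketch in Python; the function name is illustrative and not part of any multilevel package, and the inputs are the rounded variance components from Table 3.3.

```python
def intraclass_correlation(level2_var, level1_var):
    """Share of total variance attributable to Level 2 units (Equation 3.1)."""
    total = level2_var + level1_var
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return level2_var / total


# Null-model variance components as reported in Table 3.3
icc = intraclass_correlation(level2_var=0.035, level1_var=0.093)
print(f"ICC = {icc:.3f}")
```

With these rounded inputs the result matches the value of 0.273 given in the text.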

The second justification for using a multilevel model arises from the statistical or structural properties of the data. A major assumption of single-level, OLS models is that the observations (and hence the error terms) are independent of each other. Whenever there is a nested structure in a data set, there is a good chance that the independence assumption is violated. The moderately large ICC in our example already suggests that our observations are not independent. This makes sense given the clustered nature of our data. Senators and representatives from North Carolina, for example, are probably more similar to each other than they are to Congress members from other states, such as Massachusetts. As the saying goes, all politics is local, and it is reasonable to assume that state characteristics can shape political behavior as much as other individual factors, such as political party. Multilevel modeling relaxes this independence assumption and allows for correlated error structures. If OLS is used inappropriately for clustered data with correlated errors, the resulting standard errors may be smaller than they should be, resulting in a greater chance of committing Type I errors. Multilevel models, on the other hand, are more likely to produce consistent estimates of the variance components.

The final, and most important, justification for using multilevel models is theoretical. Any time a researcher utilizes a theoretical framework or poses hypotheses that are composed of constructs operating and interacting at multiple levels, then the researcher should use multilevel statistical models. This may seem obvious, but as discussed in Chapter 1, it is quite common to see published work where the theoretical approach is multilevel, but the data are collected or the analyses are performed at a single level.
For our tobacco voting study, it is reasonable to hypothesize that characteristics of the state of a legislator may influence his or her voting on tobacco-related bills. Specifically, one hypothesis that we will test is that legislators from states with strong tobacco farming economies will be more likely to vote protobacco than legislators from states with no tobacco economy.

So, using our example, we have seen the three types of justification for building a multilevel model. The empirical justification comes from examining the data using graphical tools, and seeing that voting (Level 1) varies strongly by state (Level 2). This is also confirmed with the substantial ICC. The statistical justification comes from recognizing that the cases in our study are not independent, are clustered by state, and are likely to exhibit correlated errors. Finally, the theoretical justification comes from our interest in a multilevel model that will examine how state-level characteristics influence individual voting behavior.

Model-Building Strategies

There is no single best way to build a multilevel model. The individual steps that a researcher should take in building the model are based on the investigator’s research questions; whether the analysis is exploratory or confirmatory; and whether the analytic emphasis is on parameter estimation, model fit, prediction, or even causal inference (Harrell, 2015). That being said, the goal of building a multilevel model is the same as any other statistical model: to end up with the simplest model that is informative and provides the best fit to the observed data (West, Welch, Gałecki, & Gillespie, 2015). Model building is an iterative process, and many analysts will follow one of two broad approaches that can help manage the many empirical and conceptual decisions that need to be made at each step: a top-down or a bottom-up strategy. By iterative, I don’t mean haphazard. Rather, an iterative model-building process proceeds in a small number of steps, always guided by theory and domain knowledge. It is important to keep the model-building process systematic; otherwise, the research has no hope of being reproducible (Peng, Dominici, & Zeger, 2006).

Top-Down Strategy

In a top-down strategy, the model will be built by iteratively working through the following steps:

1. Start with a fully specified fixed-effects model. This should include all covariates and their interactions for all of the levels in the model.
2. Select appropriate random effects for the model. Retain random effects that improve the fit of the model and, preferably, are meaningful.
3. If desired, select a covariance structure for the remaining residuals in the model. Depending on the goals of the model, this may or may not be necessary.
4. Reduce the model by dropping unnecessary fixed-effects parameters.

Bottom-Up Strategy

In contrast, a bottom-up strategy proceeds as follows:

1. Start with an unconditional, null model. This has no fixed-effect covariates, but does include the Level 1 intercept random effect.
2. Add Level 1 covariates to the model. Also, consider adding random effects for these Level 1 parameters. This produces an intercepts-and-slopes-as-outcomes model (see Table 2.1).
3. Add higher level covariates to the model. In a three-level model (see Chapter 5), any covariate added in Level 2 may have a random effect added across the levels of Level 3.

For more in-depth treatment of these model-building strategies, see West et al. (2015) as well as Snijders and Bosker (2011). For the remaining examples, we will loosely follow a bottom-up model building strategy. Our first step is to build a simple model with only Level 1 predictors that we will call Model 1.

$$\begin{aligned} \text{Voting \%} &= \beta_{0j} + \beta_{1j}(\text{Party})_{ij} + \beta_{2j}(\text{Money})_{ij} + r_{ij} \\ \beta_{0j} &= \gamma_{00} + u_{0j} \\ \beta_{1j} &= \gamma_{10} + u_{1j} \\ \beta_{2j} &= \gamma_{20} \end{aligned} \tag{3.4}$$

The two Level 1 predictors are Party (0 = Democrat; 1 = Republican) and Money (PAC money received, in thousands of dollars). There are no Level 2 predictors included in this first step. In addition to the fixed effects, we are allowing the effects of Party on voting to vary from state to state. That is, we are treating Party as a random factor, with respect to State. We could also allow the effects of Money to vary randomly, but exploratory analyses (not shown here) indicate that the Money random effect is very close to 0, so it is not included in the model. Therefore, Model 1 has three fixed effects and three random effects (i.e., one intercept, one slope, and the Level 1 error). This can be seen more clearly with the mixed-effects form of the model:

$$\text{Voting \%} = \gamma_{00} + \gamma_{10}(\text{Party})_{ij} + \gamma_{20}(\text{Money})_{ij} + u_{0j} + u_{1j}(\text{Party})_{ij} + r_{ij} \tag{3.5}$$

Table 3.4

Fixed- and random-effect parameter estimates for Model 1

| Parameter | Estimate |
|---|---|
| Intercept (γ00) | 0.213 |
| Party (γ10) | 0.489 |
| Money (γ20) | 0.004 |
| Level 2 variability (u0j) | 0.013 |
| Party slope variability (u1j) | 0.008 |
| Level 1 variability (rij) | 0.027 |

Table 3.4 shows estimated coefficients for Model 1. The essential components of the fitted multilevel model are the statistical parameters: the fixed-effects regression parameters (the gammas) and the variance components for the random effects. It is important to keep in mind that the Level 2 errors listed in the random-effects sections are not statistical parameters, per se, but are latent random variables, with an expected mean of 0 and variance $\sigma_u^2$. The estimate for γ00 is 0.21. This is no longer interpreted as the grand mean of voting percentage; instead, it is the expected value of voting percentage when the predictor values are all 0. So according to these data, Democratic legislators who have received no money from tobacco industry PACs are expected to vote protobacco only 21% of the time. The estimate of γ10 is 0.49, which tells us that the “effect” of being Republican is to increase the expected rate of protobacco voting by 49 percentage points, compared with Democrats. Finally, γ20 = 0.004, which tells us that for every $1,000 received by a legislator, we would expect to see an increase in protobacco voting of approximately 0.4 percentage points.

The random effects part of the model is concerned with the variance components. These should not be interpreted as effects in the model. Instead, nonzero variance components are evidence of unmodeled variability. This information can be used to help decide either to add more variables to the model or, conversely, to stop adding variables at a particular level. The relatively large variance components for Level 1 (.027) and Level 2 (.013) can thus be interpreted as evidence that we might want to consider adding more predictors to the model.
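The interpretation of the fixed effects can be checked by plugging values into the fixed part of Equation 3.5. The sketch below is illustrative only: the function name is hypothetical, the coefficients are the rounded estimates from Table 3.4, and the random effects are set to their expected value of 0, so the results are population-average predictions rather than predictions for any particular state.

```python
# Fixed-effect estimates for Model 1 as reported in Table 3.4 (rounded)
GAMMA_00 = 0.213  # intercept: Democrat with no PAC money
GAMMA_10 = 0.489  # Party (0 = Democrat, 1 = Republican)
GAMMA_20 = 0.004  # Money (per $1,000 of PAC contributions)


def expected_voting(party, money_thousands):
    """Fixed part of Equation 3.5; u0j, u1j, and rij are held at their mean of 0."""
    return GAMMA_00 + GAMMA_10 * party + GAMMA_20 * money_thousands


# Illustrative legislator profiles (the $10K amount is an arbitrary example)
profiles = [
    ("Democrat, no PAC money", 0, 0.0),
    ("Democrat, $10K", 0, 10.0),
    ("Republican, $10K", 1, 10.0),
]
for label, party, money in profiles:
    print(f"{label}: {expected_voting(party, money):.3f}")
```

A Democrat with no PAC money returns the intercept (0.213), and each additional $1,000 shifts the prediction by 0.004, which is the reading of γ20 given in the text.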

Estimation

It is most common to use some type of maximum-likelihood (ML) estimation when fitting basic multilevel models. As noted in the above output, our Level 1 model was fit using ML. Another closely related method that can be used is restricted maximum likelihood (REML). What are the differences between these two methods, and how can you choose which method to use? The main practical difference between the two is how the random-effects variance components are calculated. Both methods, in fact, produce very similar fixed-effects estimates. REML takes into account the degrees of freedom of the fixed effects when estimating the variance components. This results in random-effects estimates that tend to be less biased than with full ML. However, these differences are usually quite small, especially when there is a relatively large number of Level 2 units, of the order of 30 or more (Snijders & Bosker, 2011). An important advantage of ML is that the deviance statistic produced by full ML can be used to compare the fixed and random effects of two models (see Chapter 4 for more details). Table 3.5 shows the random-effects part of the Level 1 model using both ML and REML. As can be seen, in our data set with 50 Level 2 units, the differences between the two methods are trivial and would not lead to any important difference in model building or interpretation. For most purposes, then, you can use full ML unless you have a small number of Level 2 units and the two methods produce large differences.

Table 3.5

Comparison of random effects estimates for ML and REML

| Random effect | ML Variance | ML 95% CI | REML Variance | REML 95% CI |
|---|---|---|---|---|
| Intercept | 0.0135 | [0.007, 0.027] | 0.0141 | [0.007, 0.028] |
| Party slope | 0.0079 | [0.003, 0.023] | 0.0085 | [0.003, 0.024] |
| Level 1 | 0.0274 | [0.024, 0.031] | 0.0275 | [0.024, 0.031] |

Note: ML = maximum likelihood; REML = restricted maximum likelihood; CI = confidence interval.
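The degrees-of-freedom adjustment that distinguishes REML from ML is easiest to see in a much simpler setting than a multilevel model. The sketch below is a single-level analogy, not a mixed-model fit: for an intercept-only linear model, the ML estimate of the residual variance divides the residual sum of squares by n, whereas the REML estimate divides by n minus the number of fixed effects. The data values and function name are hypothetical.

```python
def variance_estimates(y):
    """Return (ML, REML) residual-variance estimates for an intercept-only model."""
    n = len(y)
    n_fixed = 1  # one fixed effect: the mean
    if n <= n_fixed:
        raise ValueError("need more observations than fixed effects")
    mean = sum(y) / n
    rss = sum((value - mean) ** 2 for value in y)
    ml = rss / n                # ignores the degree of freedom spent on the mean
    reml = rss / (n - n_fixed)  # adjusts for the estimated fixed effect
    return ml, reml


# Hypothetical data, for illustration only
small = [0.2, 0.4, 0.5, 0.9]
ml_small, reml_small = variance_estimates(small)

large = small * 25  # the same values repeated, giving 100 observations
ml_large, reml_large = variance_estimates(large)

print(f"n = {len(small)}: ML = {ml_small:.4f}, REML = {reml_small:.4f}")
print(f"n = {len(large)}: ML = {ml_large:.4f}, REML = {reml_large:.4f}")
```

The gap between the two estimates shrinks as the number of observations grows. This mirrors, in simplified form, the point that ML and REML variance components differ little when the number of Level 2 units is large.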

Level 2 Predictors and Cross-Level Interactions The random-effects variance components in our model are all greater than zero, suggesting that there is potentially substantial unmodeled variability. This leads to our next two models (Models 2 and 3), where we add a Level 2 predictor, the number of acres of tobacco harvested in a state. First, in Model 2, we will have Acres influence only the intercept (β0j ) of the Level 1 prediction equation:

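$$\begin{aligned} \text{Voting \%} &= \beta_{0j} + \beta_{1j}(\text{Party})_{ij} + \beta_{2j}(\text{Money})_{ij} + r_{ij} \\ \beta_{0j} &= \gamma_{00} + \gamma_{01}(\text{Acres})_j + u_{0j} \\ \beta_{1j} &= \gamma_{10} + u_{1j} \\ \beta_{2j} &= \gamma_{20} \end{aligned} \tag{3.6}$$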

The intent of this intercept model is to assess the extent to which the interstate variability of voting behavior can be explained by a simple measure of the extensiveness of the tobacco economy within a state. That is, Acres is treated as a main effect, where it directly affects voting. Model 3 extends the previous model to allow tobacco acreage to influence the slopes of the two Level 1 predictors, Party and Money:

$$\begin{aligned} \text{Voting \%} &= \beta_{0j} + \beta_{1j}(\text{Party})_{ij} + \beta_{2j}(\text{Money})_{ij} + r_{ij} \\ \beta_{0j} &= \gamma_{00} + \gamma_{01}(\text{Acres})_j + u_{0j} \\ \beta_{1j} &= \gamma_{10} + \gamma_{11}(\text{Acres})_j + u_{1j} \\ \beta_{2j} &= \gamma_{20} + \gamma_{21}(\text{Acres})_j \end{aligned} \tag{3.7}$$

Note that to be consistent with our earlier models, we are allowing the effects of Party on voting to vary across states, but not Money. That is, Party is entered as a random effect, and Money as a fixed effect. This slopes model (Model 3) works by including two cross-level interactions. That is, it will test not only whether tobacco acreage influences average voting behavior in a state but also whether it interacts with either of the two Level 1 predictors. Thus, the parameters γ11 and γ21 are serving as indications of cross-level interactions where a Level 2 characteristic may influence a Level 1 relationship. The mixed-effects form of the model is as follows:

$$\text{Voting \%} = \gamma_{00} + \gamma_{01}(A)_j + \gamma_{10}(P)_{ij} + \gamma_{11}(A)_j(P)_{ij} + \gamma_{20}(M)_{ij} + \gamma_{21}(A)_j(M)_{ij} + u_{0j} + u_{1j}(P)_{ij} + r_{ij} \tag{3.8}$$

where A is Acres, P is Party, and M is Money. Cross-level interactions are one of the most important features of mixed-effects multilevel modeling. Computationally, cross-level interactions are not particularly noteworthy; they are treated much like any other interaction term by most mixed-effects software. However, they are extremely important conceptually: they allow us to think clearly about micro- and macrolevels of analysis (Aguinis, Boyd, Pierce, & Short, 2011; Shinn & Rapkin, 2000). A typical application is to study how the effects of lower level variables change as a function of higher order moderator variables. For example, do the positive effects of motivation on test results among school children vary by classroom size?

There are three particular challenges that are important to keep in mind when incorporating cross-level interactions. First, in the presence of multiple lower and higher order covariates, the number of possible cross-level interactions can be overwhelming (and this is even more challenging when moving to three-level models and beyond). Although you can proceed analytically to identify important interactions that are present in the data, this can easily lead to a “fishing expedition” approach that is likely to inflate Type I error rates. Instead, as in all other aspects of statistical model building, it is wiser to use theory and past experience to guide model building and to focus on a relatively small number of important potential cross-level interactions.

Second, the number of higher order units in the study (e.g., classrooms, neighborhoods) will drive the power of your model to detect real cross-level interactions. See Chapter 4 for a more detailed discussion of power in multilevel models; however, a simple principle is useful to keep in mind here. A data set with a small number of Level 2 units will always have somewhat limited power to detect Level 2 main effects and interactions, as well as cross-level interactions. Many multilevel studies in the health and social sciences are restricted to relatively small numbers of Level 2 units; therefore, this is a common constraint on model building and interpretation. So once again, it will be beneficial to think carefully about exactly which cross-level interactions you want to include.

Finally, including numerous cross-level interactions may cause estimation problems. This is more likely to occur when the Level 1 predictor is allowed to randomly vary across the Level 2 units and the estimate of the random effect is small or close to zero. One sign of these estimation problems is when you get messages about “failing to converge” from your statistical estimation procedure.
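To see why the cross-level interactions appear as ordinary product terms, it can help to write out the substitution that takes the level-by-level form of Model 3 (Equation 3.7) to its mixed-effects form (Equation 3.8). The following worked step is a sketch that uses the same symbols as those two equations, with A, P, and M abbreviating Acres, Party, and Money:

```latex
\begin{aligned}
\text{Voting \%}
  &= \beta_{0j} + \beta_{1j} P_{ij} + \beta_{2j} M_{ij} + r_{ij} \\
  &= (\gamma_{00} + \gamma_{01} A_j + u_{0j})
     + (\gamma_{10} + \gamma_{11} A_j + u_{1j}) P_{ij}
     + (\gamma_{20} + \gamma_{21} A_j) M_{ij} + r_{ij} \\
  &= \gamma_{00} + \gamma_{01} A_j + \gamma_{10} P_{ij} + \gamma_{11} A_j P_{ij}
     + \gamma_{20} M_{ij} + \gamma_{21} A_j M_{ij}
     + u_{0j} + u_{1j} P_{ij} + r_{ij}
\end{aligned}
```

The terms $\gamma_{11} A_j P_{ij}$ and $\gamma_{21} A_j M_{ij}$ arise only because Acres enters the Level 2 equations for the Party and Money slopes. The Money slope has no $u_{2j}$ term, so no random Money component appears in the final line.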

Hypothesis Testing

Table 3.6 presents the important results from each of these last three models. To understand these results, we need to understand how hypothesis testing of individual parameter estimates and model comparisons are handled in multilevel models. Significance tests for fixed-effects parameters (the gammas) are similar to those for multiple regression. Most multilevel software programs use the typical ML Wald test to test for the significance of fixed-effect parameters (Hox et al., 2017). The Wald test is interpreted as a Z-statistic from a standard normal distribution. The Wald Z-test is similarly used for testing variance components in multilevel software other than HLM. HLM, assuming that variances may not be normally distributed, instead uses a chi-square test of the residuals. One should be cautious in interpreting any significance tests of variance components. First, variances are bounded at zero, so their distributions are not normal. More important, it is not clear exactly what the meaning of a significant variance component should be; after all, we generally expect variances to be nonzero. Thus, variances are like effect-size statistics; one can perform a significance test or form confidence intervals, but it is usually more fruitful to interpret their sizes rather than their significance. (This line of reasoning is why lme4 does not currently provide standard errors or significance tests for the variance components in R. See the GLMM FAQ at https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html.)

Examination of Table 3.6 shows that political party and PAC contributions are highly significant Level 1 predictors (p < .001) for all three models. Model 2 shows that tobacco acreage is not quite significant and the effect appears to be rather small: for every additional 1,000 acres of tobacco harvested in a state, we would expect to see an increase in protobacco voting of about 0.06%. Model 3 provides a more complex picture of the effects of party, money, and tobacco economy. The results of this model show that tobacco acreage not only significantly affects the average voting level in a state (the intercept effect), but there are also highly significant cross-level interactions. The negative coefficients for the two interactions indicate that the presence of tobacco farming in a state acts to reduce the effects of being Republican, on one hand, and receiving tobacco industry money, on the other.

Finally, all three of the models include party as a random effect but not PAC contributions. Exploration of alternative models reveals that the effects of money on voting are relatively constant across states; therefore, money is kept as a fixed effect. Although we do see some variability of voting behavior across states (u0j) as well as variability of the effects of party on voting behavior (u1j), these variance components are reduced across the three models as we add more covariates. That is, some of the variability of voting and of the party effect across states ends up being accounted for by the inclusion of acres as a main effect and cross-level interaction.
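One way to read a negative cross-level interaction is to compute the conditional effect of the Level 1 predictor at several values of the Level 2 moderator. The sketch below does this for the Party effect, using the Model 3 coefficients for γ10 and γ11 from Table 3.6. It assumes that Acres is scaled in thousands of acres (as the per-1,000-acre interpretation in the text implies), holds the random slope u1j at its mean of 0, and uses illustrative acreage values; the function name is hypothetical.

```python
# Model 3 fixed effects from Table 3.6 (Acres assumed to be in thousands of acres)
GAMMA_10 = 0.5098   # Party effect in a state with no tobacco acreage
GAMMA_11 = -0.0016  # cross-level interaction: Acres x Party


def party_effect(acres_thousands):
    """Expected Republican-Democrat gap in protobacco voting at a given acreage."""
    return GAMMA_10 + GAMMA_11 * acres_thousands


# Illustrative acreage levels: none, moderate, and very high
for acres in (0, 33, 200):
    print(f"{acres:>3}K acres: party effect = {party_effect(acres):.3f}")
```

The expected party gap shrinks as acreage rises, which is what the negative sign of γ11 expresses. The same calculation for the Money slope would need more decimal places for γ21 than the table reports.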

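Table 3.6

Parameter estimates and model fits for three models of voting behavior

Fixed Effects

| Model | For | Parameter | Coefficient | SE | T-Ratio | p |
|---|---|---|---|---|---|---|
| Model 1 | Intercept (β0j) | Intercept (γ00) | 0.2125 | 0.0222 | 9.59 | .000 |
| Model 1 | Party slope (β1j) | Party (γ10) | 0.4887 | 0.0219 | 22.29 | .000 |
| Model 1 | Money slope (β2j) | Money (γ20) | 0.0042 | 0.0005 | 9.07 | .000 |
| Model 2 | Intercept (β0j) | Intercept (γ00) | 0.2057 | 0.0209 | 9.85 | .000 |
| Model 2 | Intercept (β0j) | Acres (γ01) | 0.0006 | 0.0003 | 1.78 | .074 |
| Model 2 | Party slope (β1j) | Party (γ10) | 0.4885 | 0.0222 | 21.96 | .000 |
| Model 2 | Money slope (β2j) | Money (γ20) | 0.0041 | 0.0005 | 8.22 | .000 |
| Model 3 | Intercept (β0j) | Intercept (γ00) | 0.1804 | 0.0197 | 9.17 | .000 |
| Model 3 | Intercept (β0j) | Acres (γ01) | 0.0027 | 0.0005 | 5.01 | .000 |
| Model 3 | Party slope (β1j) | Party (γ10) | 0.5098 | 0.0206 | 24.70 | .000 |
| Model 3 | Party slope (β1j) | Acres (γ11) | −0.0016 | 0.0005 | −3.50 | .000 |
| Model 3 | Money slope (β2j) | Money (γ20) | 0.0048 | 0.0006 | 8.70 | .000 |
| Model 3 | Money slope (β2j) | Acres (γ21) | −0.0000 | 0.0000 | −2.82 | .005 |

Random Effects

| Model | Component | Variance | SE | 95% CI |
|---|---|---|---|---|
| Model 1 | Intercept (u0j) | 0.0135 | 0.0048 | [0.007, 0.027] |
| Model 1 | Party slope (u1j) | 0.0079 | 0.0042 | [0.003, 0.023] |
| Model 1 | Covariance (u0j, u1j) | −0.0085 | 0.0041 | [−0.016, −0.000] |
| Model 1 | Level 1 (eij) | 0.0274 | 0.0019 | [0.024, 0.031] |
| Model 2 | Intercept (u0j) | 0.0108 | 0.0043 | [0.005, 0.024] |
| Model 2 | Party slope (u1j) | 0.0085 | 0.0045 | [0.003, 0.024] |
| Model 2 | Covariance (u0j, u1j) | −0.0069 | 0.0039 | [−0.014, 0.001] |
| Model 2 | Level 1 (eij) | 0.0273 | 0.0019 | [0.024, 0.031] |
| Model 3 | Intercept (u0j) | 0.0083 | 0.0032 | [0.004, 0.018] |
| Model 3 | Party slope (u1j) | 0.0044 | 0.0030 | [0.001, 0.017] |
| Model 3 | Covariance (u0j, u1j) | −0.0042 | 0.0026 | [−0.009, 0.001] |
| Model 3 | Level 1 (eij) | 0.0278 | 0.0018 | [0.024, 0.031] |

Model Fit

| Model | Log-likelihood | Deviance | Parameters | AIC |
|---|---|---|---|---|
| Model 1 | 162.6 | −325.2 | 7 | −311.2 |
| Model 2 | 163.8 | −327.6 | 8 | −311.6 |
| Model 3 | 174.9 | −349.8 | 10 | −329.8 |

Note: SE = standard error; CI = confidence interval; AIC = Akaike information criterion.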

Given the complexity of mixed-effects models, with the possibility of multiple main effects and interactions at each level, cross-level interactions, and slope random effects, it has become more common to use likelihood ratio tests to hierarchically compare complete models with one another, instead of focusing on the significance of individual fixed and random effects. This fits nicely with a theory-driven modeling approach, where blocks of covariates and/or random effects may be considered together. Likelihood ratio tests are also commonly used to ascertain the necessity of random slopes, by comparing models with random slopes with a model with fixed slopes. For example, Table 3.7 shows the results of a set of likelihood ratio tests that compare three versions of Model 3. Model 3a is an intercept-only random model (all covariates are fixed). Model 3b allows the effects of party to vary across states (equivalent to Model 3 in Table 3.6). Model 3c allows both party and money to vary. The likelihood ratio test, measured as a χ2, compares each model with the previous one. So we can see that allowing party to vary across states results in a better fitting model. However, adding money as a random effect does not improve our model. In addition to likelihood ratio tests, other techniques focusing on Akaike information criterion (AIC), Bayesian information criterion (BIC), and other Bayesian approaches are sometimes used (Posada, Buckley, & Thorne, 2004). In particular, information-theoretic statistics such as AIC can be used to compare models with one another even when they are not nested; the likelihood ratio approach, by contrast, requires nested models. See the next section for further details.

Table 3.7

Likelihood ratio tests of random effects in Model 3

| Model | Random Effects | df | Deviance | AIC | χ2 | χ2 df | p |
|---|---|---|---|---|---|---|---|
| 3a | Intercept | 8 | −343.6 | −327.6 | | | |
| 3b | Party | 10 | −349.9 | −329.9 | 6.3 | 2 | .042 |
| 3c | Party + Money | 13 | −352.0 | −326.0 | 2.1 | 3 | .558 |

Note: df = degrees of freedom.

Sometimes, it can be difficult to correctly trace all the effects in a complex multilevel model. It is useful to plot simple prediction equations based on the estimated parameters. As an example, Figure 3.4 shows the expected voting percentages for Democratic legislators for three different levels of tobacco acreage. The solid line at the bottom shows the relationship between tobacco industry money and voting in a state such as Illinois, where there is no tobacco farming. The dotted line in the middle shows the same relationship for a state such as Georgia, which has a moderate amount of tobacco farming (33,000 acres in 1999). Finally, the dashed line at the top shows an extreme case from North Carolina, with more than 200,000 acres of tobacco harvested. This figure shows that, in general, the more money accepted by Democratic legislators, the more likely they are to vote protobacco. However, this figure shows that the state tobacco economy moderates this relationship. States with more tobacco acreage are likely to show higher protobacco voting rates (higher intercepts), but at the same time, the effect of contributions to individual legislators is lessened (shallower slopes).

Figure 3.4

Predicted protobacco voting percentage for Democrats by amount of tobacco acreage.

[Line plot. Horizontal axis: PAC contributions ($K), 0 to 100. Vertical axis: Predicted voting, Democrats (%), 0% to 100%. Separate lines for Acres: None, Moderate, High.]

Note: PAC = political action committee.

CHAPTER 4. ASSESSING A MULTILEVEL MODEL

There is no need to ask the question “Is the model true?” If “truth” is to be the “whole truth” the answer must be “No.” The only question of interest is “Is the model illuminating and useful?”
George Box, 1978

In the previous chapter, we saw how to build and estimate a mixed-effects multilevel model. Once you have a fitted model, it is important to look at the model closely to determine if it is working as intended and assess how well the model is explaining the data. It is also usually important to move beyond the simple parameter estimates of the model, and use the full model to examine the important patterns implied by that model. For example, you might want to produce profile plots to better illustrate interaction effects in the model. In this chapter, we cover a number of topics related to understanding and assessing a fitted multilevel model.

Assessing Model Fit and Performance

Assessing Model Fit–Deviance and $R^2$

Another important aspect of model specification and testing is examining how closely the model fits the data. One of the products of a model fit using ML is the likelihood statistic. In fact, the likelihood is what is maximized in ML estimation of multilevel models. A transformation of the likelihood called the deviance is obtained by multiplying the natural log of the likelihood by minus two (−2LL). The deviance is a measure of the lack of fit between the data and the model. The deviance for any one model cannot be interpreted directly, but it can be used to compare multiple models with one another. This model comparison can be done between two models fit to the same data set, where one of the models is a subset (and therefore has fewer parameters) of the other. For many hypotheses, and with large samples, the difference of the deviances from each model is distributed as a χ2 statistic with degrees of freedom equal to the difference in the number of parameters estimated in each model.

For example, consider Table 3.6. The deviance for Model 1 is −325.2. The deviance for Model 2 is −327.6. A lower deviance always implies better fit, and with nested models, the model with more parameters will always have a lower deviance. The difference between the two deviances is 2.4, which is compared with a χ2 distribution with 1 (8 parameters − 7 parameters) degree of freedom (df). The difference is not significant (p = .126), so there is no evidence that Model 2 provides a better fit to the data than Model 1. However, the Level 2 slopes model (Model 3) is significantly better than Model 2: (−327.6 − (−349.8) = 22.2, df = 2, p < .001). So if we are going to use amount of tobacco acreage as a Level 2 predictor, then these results suggest that tobacco acreage be used to predict slopes as well as intercepts of the Level 1 predictors.

One disadvantage of the deviance (−2LL) is that a model fit to the same data with more parameters will always have smaller deviance. Smaller deviance is good, in that we want to maximize the fit of the model to the data. However, we also generally want to have models that are parsimonious, or able to explain the data with as few parameters as possible. Two other fit indexes have been developed that are based on the deviance but incorporate penalties for a greater number of parameters: the Akaike information criterion (AIC) and Schwarz’s Bayesian information criterion (BIC; Akaike, 1987; Schwarz, 1978). Most mixed-effects modeling packages report the AIC or BIC as part of the model estimation, but they can be calculated easily using Equation 4.1:

$$\begin{aligned} \text{AIC} &= -2LL + 2p \\ \text{BIC} &= -2LL + p \ln(N) \end{aligned} \tag{4.1}$$
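The deviance comparisons and the information criteria in Equation 4.1 are simple enough to compute by hand. The sketch below uses only the Python standard library; the function names are illustrative, and the chi-square tail probability is computed from closed-form series that hold for integer degrees of freedom. Because the inputs are the rounded deviances from Table 3.6, the p-values it produces can differ slightly from those reported in the text, which were presumably computed from unrounded values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with positive integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
        return math.exp(-half) * sum(
            half**i / math.factorial(i) for i in range(df // 2)
        )
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{m} (x/2)^(i - 1/2) / Gamma(i + 1/2)
    series = sum(
        half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


def likelihood_ratio_test(dev_small, k_small, dev_large, k_large):
    """Compare nested models: returns (chi-square, df, p-value)."""
    chi2 = dev_small - dev_large
    df = k_large - k_small
    return chi2, df, chi2_sf(chi2, df)


def aic(deviance, n_params):
    return deviance + 2 * n_params


def bic(deviance, n_params, n_obs):
    return deviance + n_params * math.log(n_obs)


# Deviances and parameter counts from Table 3.6
chi2_12, df_12, p_12 = likelihood_ratio_test(-325.2, 7, -327.6, 8)
chi2_23, df_23, p_23 = likelihood_ratio_test(-327.6, 8, -349.8, 10)
print(f"Model 1 vs 2: chi2 = {chi2_12:.1f}, df = {df_12}, p = {p_12:.3f}")
print(f"Model 2 vs 3: chi2 = {chi2_23:.1f}, df = {df_23}, p = {p_23:.6f}")
print(f"AIC, Model 1: {aic(-325.2, 7):.1f}")
```

The `bic` helper needs a sample size, which Equation 4.1 leaves to the analyst; following the text, the Level 1 sample size would be passed as `n_obs`.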

where p is the number of parameters in the model and N is the sample size. The BIC was not developed with multilevel models in mind, so it is not totally clear which sample size should be used. Singer and Willett (2003) suggest that the Level 1 sample size should be used, and we follow their advice. As with the deviance, lower AIC or BIC values indicate a better model. One advantage that AIC and BIC have is that they can be used to compare two nonnested models. Examination of the AIC values in Table 3.6 confirms our earlier decision to choose Model 3 as the best model, which has the lowest AIC.

In regular OLS regression, the fit of a model is typically assessed by calculating $R^2$ and interpreting it as the percentage of variance of the outcome accounted for by the predictors in the model. In multilevel modeling, the use of $R^2$ is more complicated for a number of reasons. First, a separate $R^2$ can be calculated for each level of the multilevel model. Second, the calculation and interpretation of these various $R^2$ statistics will vary based on other elements of the underlying mixed-effects model, particularly the number of included random effects and whether group or grand-mean centering has been used (see the section on centering later in this chapter). However, based on the work of Snijders and Bosker and others (Hox et al., 2017; Snijders & Bosker, 1994, 2011), we can utilize an approach that is relatively straightforward to calculate and yields interpretable measures of $R^2$ for each level. Instead of interpreting $R^2$ as a simple percentage of variance accounted for, we will interpret $R^2$ in a multilevel model as the proportional reduction of prediction error (Equation 4.2). This general formulation tells us that more complex models should do a better job of explaining predicted outcomes and variability compared with a simpler, baseline model with fewer covariates. This baseline model will often be a null model, but not always. For example, we might want to see the improvement in a multilevel model after adding some substantive predictors to a model that already includes basic demographic variables.

$$R^2 = 1 - \frac{\text{unexplained variance from full model}}{\text{unexplained variance from baseline model}} \tag{4.2}$$

Given that residuals in a model indicate lack of fit between a model and the data, a better fitting model is one where the residuals are smaller than those in another comparison model. In a two-level model, then, we will have two ways to assess the model fit. First, for Level 1, $R_1^2$ will assess the proportional reduction of error for predicting an individual outcome. Then, for Level 2, $R_2^2$ will assess the proportional reduction of error for predicting a group (Level 2 unit) mean. These statistics are relatively easy to calculate using the output from a fitted model. First, for Level 1 our starting point is the variance of the Level 1 residuals:

$$\text{var}(\text{residuals})_1 = \sigma_r^2 + \sigma_{u0}^2 \tag{4.3}$$

Next, we calculate our estimate of this variance for two models: a baseline model and a fuller comparison model. As stated above, the baseline model is often a null or fully unconstrained model with no Level 1 or Level 2 predictors. The proportional reduction of prediction error for Level 1 is then

$$R_1^2 = 1 - \frac{(\hat{\sigma}_r^2 + \hat{\sigma}_{u0}^2)_{\text{Full}}}{(\hat{\sigma}_r^2 + \hat{\sigma}_{u0}^2)_{\text{Baseline}}} \tag{4.4}$$

If the comparison model is a better fit to the data, then the variance of the Level 1 residuals will be smaller, leading to a larger $R_1^2$.

The process for Level 2 is similar. First, we start with the formula for the variances of the Level 2 residuals:

$$\text{var}(\text{residuals})_2 = \frac{\sigma_r^2}{n} + \sigma_{u0}^2 \tag{4.5}$$

where n is the expected number of Level 1 units in each Level 2 unit in the population. We calculate sample-based estimates of this variance for the baseline and comparison models and compute the Level 2 proportional reduction of prediction error:

$$R_2^2 = 1 - \frac{(\hat{\sigma}_r^2/\tilde{n} + \hat{\sigma}_{u0}^2)_{\text{Full}}}{(\hat{\sigma}_r^2/\tilde{n} + \hat{\sigma}_{u0}^2)_{\text{Baseline}}} \tag{4.6}$$

Here again, $\hat{\sigma}_r^2$ and $\hat{\sigma}_{u0}^2$ are provided as part of the basic output from a multilevel model. However, the user will have to provide a value for $\tilde{n}$, which should be the typical number of Level 1 units in any Level 2 unit. This value can come from theory or from the expectation of what the sample size should be in the population. Lacking theoretical guidance, and if there are varying group sizes in the data set, then you can use the harmonic mean of the Level 2 unit sample sizes:

$$H = \frac{k}{\sum_{j=1}^{k} (1/n_j)}, \tag{4.7}$$

where k = the number of Level 2 units. If the sample sizes of the Level 2 units are not too unbalanced, then the simple mean of the group sizes will be close to the harmonic mean and can be used instead. The only wrinkle to this procedure for calculating R21 and R22 is handling a slopes-as-outcomes model. In a slopes model, in addition to the Level 2 variability for the intercept (σu2 ), there will also be a 0

variance for each of the slopes: (σu21 ), (σu22 ), and so on. However, the above equations use only the intercept variability. How should the slope variabilities be handled for R21 and R22 ? Snijders and Bosker (2011) suggest that the model should be refit without the random slopes. This will result in only the two variance components that are required for the calculations and usually does not change the actual parameter estimates very much. Turning to the tobacco voting example, we would like to see how good our final Level 2 slopes model (Model 3) is at predicting individual voting behavior and state-level average voting behavior. Table 4.1 presents the estimated variance components and harmonic means needed for the calculations. (Following Snijders and Bosker’s, 2011, recommendations

Table 4.1 Values used to calculate $R_1^2$ and $R_2^2$

| Model | $\hat{\sigma}_r^2$ | $\hat{\sigma}_{u_0}^2$ | H |
| Baseline: Fully unconstrained | 0.093 | 0.035 | 6.24 |
| Full comparison: Level 2 slopes | 0.028 | 0.005 | 6.24 |

described above, we reran the slopes model with the Party and Money slopes as fixed rather than random. This results in the model having only two variance component estimates: $\hat{\sigma}_r^2$ and $\hat{\sigma}_{u_0}^2$.) Using these data, $R_1^2$ is 1 − (0.028 + 0.005)/(0.093 + 0.035) = 0.74. For Level 2, $R_2^2$ is 1 − (0.028/6.24 + 0.005)/(0.093/6.24 + 0.035) = 0.81. So by including two Level 1 predictors (Party and Money) and one Level 2 predictor (Acres), we are able to improve the predictive ability of the model compared with a null model by approximately 74% at Level 1 and 81% at Level 2.

The previous examples and calculations follow those laid out by Snijders and Bosker (2011). There are a couple of limitations in this approach. First, as seen here, if the model of interest has random slopes, it must be refit as a simpler intercepts-only model. Second, due to model misspecification or random variability, multilevel $R^2$ values can sometimes be negative, which is clearly not desirable (Hox et al., 2017). Recent work by Rights and Sterba (2019) addresses these limitations by laying out a more comprehensive framework for quantifying explained variance in multilevel models. They define 17 different measures of explained variance that can be used to assess how well Level 1 or Level 2 predictors do in explaining total, within-cluster, or between-cluster outcome variability.
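The Snijders and Bosker calculations for $R_1^2$ and $R_2^2$ are simple enough to check by hand or with a few lines of code. The following is a minimal Python sketch, not part of the original analysis: the function names are illustrative, the variance components are the Table 4.1 values, and the group sizes passed to `harmonic_mean` are hypothetical.

```python
def harmonic_mean(group_sizes):
    """Harmonic mean of the Level 2 unit sample sizes (Equation 4.7)."""
    k = len(group_sizes)
    return k / sum(1.0 / n for n in group_sizes)


def r2_level1(var_r_full, var_u0_full, var_r_base, var_u0_base):
    """Level 1 proportional reduction of prediction error."""
    return 1.0 - (var_r_full + var_u0_full) / (var_r_base + var_u0_base)


def r2_level2(var_r_full, var_u0_full, var_r_base, var_u0_base, n_typical):
    """Level 2 proportional reduction of prediction error (Equation 4.6)."""
    full = var_r_full / n_typical + var_u0_full
    base = var_r_base / n_typical + var_u0_base
    return 1.0 - full / base


# Variance components and harmonic mean taken from Table 4.1.
BASE_VAR_R, BASE_VAR_U0 = 0.093, 0.035   # baseline (fully unconstrained) model
FULL_VAR_R, FULL_VAR_U0 = 0.028, 0.005   # full comparison model
H = 6.24

r2_1 = r2_level1(FULL_VAR_R, FULL_VAR_U0, BASE_VAR_R, BASE_VAR_U0)
r2_2 = r2_level2(FULL_VAR_R, FULL_VAR_U0, BASE_VAR_R, BASE_VAR_U0, H)
print(round(r2_1, 2), round(r2_2, 2))

# Hypothetical group sizes, only to show how Equation 4.7 behaves.
print(harmonic_mean([2, 4, 4]))
```

Because the harmonic mean is pulled toward the smaller groups, it is never larger than the simple mean of the same group sizes.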

Evaluating the Model: Diagnostics

An important part of determining the adequacy of a multilevel model is checking whether the underlying assumptions of the model appear valid for the data. Two of the most important assumptions that can be empirically checked in a multilevel model are (1) that the Level 1 (within-group) errors are independent and normally distributed with a mean of zero and (2) that the random effects are normally distributed with a mean of zero and are independent across groups. These assumptions can be assessed using the Level 1 and Level 2 residuals produced during the modeling process. Although the graphical residual analysis techniques discussed here can be done with any major statistical package, the lme4 package in R makes the process especially easy and flexible. Residuals from the multilevel models are stored in model fit objects, and these objects can be used by graphing procedures that automatically know how to treat residuals. See, for example, Section 6.9.2 in West et al. (2015).

We first turn to examination of the Level 1 residuals. For this section, we will focus on our final model (Model 3 in Table 3.6). A potentially useful residual plot to consider is the boxplot of residuals by state (Figure 4.1). This plot can be used to determine whether the residuals are centered at 0 (the vertical dotted line) and whether the variances are constant across groups. The residuals do seem to be centered at 0, albeit with a fair amount of variability. It does appear that variability is not constant

Figure 4.1 Boxplots of the within-state residuals for Model 3. (Vertical axis: State; horizontal axis: Level 1 residuals for Model 3.)

across states. However, many of the states have very small sample sizes, so we cannot rely too heavily on the individual boxplots for assessment of the within-group variances. Another common diagnostic plot is a scatterplot of standardized residuals against fitted values. Fitted values in a statistical model are the predicted values of the dependent variable based on the estimated model. The residuals are then the difference between the observed values of the dependent variable and these fitted values. This type of plot is particularly useful to assess problems with heteroscedasticity (Cook & Weisberg, 1982). Figure 4.2 shows the standardized residuals by political party.

Figure 4.2 Scatterplot of standardized residuals by fitted values for Model 3 by Party. (Panels: Democrat, Republican; horizontal axis: Fitted values; vertical axis: Level 1 residuals. Labeled points: CA − Baca, IL − Phelps, LA − John, MO − Danner, AL − Cramer, TX − Stenholm, MI − Barcia, MN − Peterson, IL − Porter, CA − Bilbray, PA − Specter, MD − Morella, CA − Campbell.)

A noticeable pattern in the figure is the straight-edge diagonal seen in the bottom left of the Democrat panel and the top right in the Republican panel. This is an indication of both a floor effect (many Democrats always vote against the industry, and Voting % cannot be less than zero) and a ceiling effect (many Republicans always vote for the industry, and Voting % cannot be greater than one). Floor and ceiling effects are not fatal flaws in a model, but we shall see in the first section of Chapter 5 how to build a model that avoids this problem with the data. The residuals appear to be centered at 0, and there do not appear to be major problems with heteroscedasticity (although there is some hint of reduced variability for the largest fitted values for Republicans). Furthermore, the variability of the residuals seems to be roughly the same for both parties. Finally, a number of outlying values are marked by the state from which they come. These data points are senators and representatives who tended to vote “against their party,” and thus do not seem to fit the model well. The most interesting point about these outliers is that they tend to come from the larger states.

The final plots to consider for the Level 1 residuals are normal quantile–quantile plots, or QQ plots (Cleveland, 1993). As the name suggests, a normal QQ plot can be used to assess the normality of the distribution of a variable. If the data being plotted are normally distributed, they will be arrayed along a straight line in the QQ plot. Figure 4.3 shows that for our tobacco voting data, the Level 1 residuals are fairly normal. These Level 1 residual QQ plots can also be constructed for subsets of the data, which allows for further exploration of covariate effects on model fit. See Figure 4.4 for an example, where we examine the residual distributions for each party. There are a few extreme points at the upper and lower ends for both parties, but there is some suggestion that the model is not doing as good a job at predicting voting for Republicans; note how the Republican residuals are higher than the fit line for the quantiles above 1.

The same types of plots can be used to check the Level 2 random-effects assumptions. The random effects are also assumed to be normally distributed with zero mean. Because our model has two random effects (intercept and party slope), we will need to check each of them. Figure 4.5 shows a set of QQ plots for each of the random effects. The effects for Party appear to be the closest to normal. The lines are not as smooth as before, because there are only 50 Level 2 residuals, whereas there were 527 Level 1 residuals.
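To make the construction of a normal QQ plot concrete, the sketch below computes the plotted coordinates using only the Python standard library. It is illustrative rather than a reproduction of the figures in this chapter: the residual values are hypothetical, the function name is an assumption, and the plotting positions (i − 0.5)/n are one common convention among several.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Pair each ordered residual with its theoretical standard normal quantile."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i - 0.5) / n for the i-th smallest residual.
    return [
        (std_normal.inv_cdf((i - 0.5) / n), r)
        for i, r in enumerate(ordered, start=1)
    ]


# Hypothetical Level 1 residuals, not taken from the tobacco voting data.
example_residuals = [0.15, -0.31, 0.04, 0.42, -0.05, 0.09, -0.12, 0.00]
points = normal_qq_points(example_residuals)
for theoretical, observed in points:
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

If the residuals are approximately normal, the printed pairs fall close to a straight line when plotted; the same function can be applied to a set of Level 2 random effects.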

Figure 4.3 Normal quantile–quantile plot of Level 1 residuals. (Horizontal axis: Normal quantiles; vertical axis: Level 1 residuals.)

Evaluating the Model: Influence Statistics

Influence statistics are formal techniques allowing for the diagnosis or identification of observations that influence various aspects of a fitted statistical model. They can be used to help model interpretation and model improvement, either by identifying observations that are outliers or by identifying observations that contribute relatively more than other observations to the predicted (fitted) values of the dependent variable. Although used more often in traditional multiple regression applications, these techniques have been extended to work with mixed-effects models (Van der Meer, Te Grotenhuis, & Pelzer, 2010). Cook’s distance is a popular composite influence statistic, in that it provides an overall measurement of the change in all parameter

Figure 4.4 Normal quantile–quantile plot of Level 1 residuals by Party. (Panels: Democrat, Republican; horizontal axis: Normal quantiles; vertical axis: Level 1 residuals.)

estimates in a regression model when a particular observation is dropped from the model. For mixed-effects models, Cook’s distance is particularly useful for detecting influential Level 2 units (Nieuwenhuis, Te Grotenhuis, & Pelzer, 2012). That is, how much does a fitted multilevel model change when an entire Level 2 unit is removed? Figure 4.6 presents the Cook’s distance statistics for each state based on Model 3. Four states lie outside of the general smoothly increasing influence curve, and three states are highlighted based on their scores being greater than 4/k, where k is the number of Level 2 units (states). (There are a number of commonly used cutoff criteria, including 4/k; in point of fact, any outlying influential observation should be examined.) Here, we see that North Carolina, Kentucky, California, and possibly

Figure 4.5 Normal quantile–quantile plots for random effects of Model 3. (Panels: Intercept random effect, Party random effect; horizontal axis: Normal quantiles; vertical axis: Level 2 random effects.)

New Jersey appear to be influential states for our multilevel model. The top two states are in fact the largest tobacco-growing states; they each harvested more than 200,000 acres of tobacco at the time of the study. Therefore, they may be influential in this model through the way the Acres covariate affects congressional voting behavior. California and New Jersey are both large states with a high number of members of Congress, so it may make sense that larger states are more influential.

Instead of a composite influence diagnostic, we can also examine how particular Level 2 units influence the values of particular parameters in a model. DFBETAS, or the standardized difference of the beta, is a measure of the difference in an estimated parameter coefficient

Figure 4.6 Influence plot for states in Model 3. (Vertical axis: State, ordered by influence; horizontal axis: Cook's distance.)

when an observation (or here a Level 2 unit) is dropped. Table 4.2 presents the DFBETAS for each parameter for the four states with the highest Cook’s distances. Here we can see that, as suspected, the overall influence of North Carolina and Kentucky operates through the acres variable. California, on the other hand, is more influential through the money predictor, and New Jersey through the political party covariate. The DFBETAS values are standardized, and the direction of the value indicates whether inclusion of the Level 2 unit increases or decreases the particular parameter estimate. So, for example, including California in the model tends to increase the size of the money parameter coefficient. A common step for further exploring the role of influential cases on estimated models is to fit the same models after removing those

Table 4.2 DFBETAS for most influential states in Model 3

| State | Cook’s Distance | DFBETAS Party | DFBETAS Money | DFBETAS Acres |
| NC | 0.334 | −0.244 | 0.103 | −1.278 |
| KY | 0.109 | 0.083 | 0.038 | 0.730 |
| CA | 0.102 | 0.290 | 0.644 | 0.054 |
| NJ | 0.073 | −0.496 | 0.199 | 0.003 |

Note: DFBETAS = standardized difference of the beta.

cases. This can help reveal whether the estimated model parameters are driven primarily or even exclusively by those influential cases. In our example we could refit Model 3 after dropping the cases from the four most influential states: New York, Kentucky, California, and New Jersey. Table 4.3 compares the model parameter estimates from the full Model 3 (left-hand side) with the results of the refitted model after dropping these four states. The results suggest that while these states might be influential, they are not entirely driving the results. The parameter estimates are similar across models, and in only one

Table 4.3 Comparison of two models after dropping influential states

| Fixed Effects | All States: Coefficient | All States: SE | All States: T-Ratio | Dropped NY, KY, CA, NJ: Coefficient | Dropped: SE | Dropped: T-Ratio |
| Intercept | 0.180 | 0.020 | 9.17 | 0.195 | 0.021 | 9.18 |
| Acres | 0.003 | 0.001 | 5.01 | 0.002 | 0.001 | 2.99 |
| Party | 0.510 | 0.021 | 24.70 | 0.501 | 0.020 | 25.06 |
| Party by Acres | −0.002 | 0.000 | −3.50 | −0.001 | 0.000 | −2.63 |
| Money | 0.005 | 0.001 | 8.70 | 0.004 | 0.001 | 7.54 |
| Money by Acres | −0.000 | 0.000 | −2.82 | −0.000 | 0.000 | −1.26 |

| Random Effects | All States: SD | All States: Variance | Dropped: SD | Dropped: Variance |
| Intercept | 0.091 | 0.008 | 0.097 | 0.009 |
| Party slope | 0.067 | 0.004 | 0.041 | 0.002 |
| Level 1 | 0.164 | 0.027 | 0.160 | 0.026 |

| Model Fit | All States | Dropped NY, KY, CA, NJ |
| AIC | −329.9 | −273.1 |

Note: SE = standard error; AIC = Akaike information criterion.

case are the statistical tests different (for the Money by Acres cross-level interaction term). The random effects are also similar for the two models.

In some studies, overly influential cases may indicate problems with the underlying data or the original study design. In these situations it might make sense to drop these cases from further analyses. However, in many situations (particularly in observational studies), dropping cases is not a good idea, even if they are influential. In our data, for example, it would be hard to argue for dropping any state, since we want to understand the influence of money on politics for the entire U.S. Identifying influential observations helps you understand your model better; it does not always mean that you have to fix your model by dropping data.
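The screening logic used in this section (flag Level 2 units whose Cook's distance exceeds 4/k, then look at which parameter each unit moves the most) can be sketched in a few lines of Python. The numbers below are copied from Table 4.2; the dictionary layout and variable names are illustrative assumptions, and the sketch does not compute the influence statistics themselves, which come from refitting the model with each state removed.

```python
K_STATES = 50              # number of Level 2 units (states)
cutoff = 4 / K_STATES      # the 4/k rule of thumb, here 0.08

# Cook's distance and DFBETAS values copied from Table 4.2.
influence = {
    "NC": {"cooks_d": 0.334, "Party": -0.244, "Money": 0.103, "Acres": -1.278},
    "KY": {"cooks_d": 0.109, "Party": 0.083, "Money": 0.038, "Acres": 0.730},
    "CA": {"cooks_d": 0.102, "Party": 0.290, "Money": 0.644, "Acres": 0.054},
    "NJ": {"cooks_d": 0.073, "Party": -0.496, "Money": 0.199, "Acres": 0.003},
}

# States whose Cook's distance exceeds the cutoff.
flagged = [state for state, v in influence.items() if v["cooks_d"] > cutoff]

# For each state, the parameter with the largest absolute DFBETAS.
dominant = {
    state: max(("Party", "Money", "Acres"), key=lambda p: abs(v[p]))
    for state, v in influence.items()
}

print(flagged)
print(dominant)
```

This reproduces the reading given in the text: three states exceed 4/k, North Carolina and Kentucky act mainly through Acres, California through Money, and New Jersey through Party.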

Estimating Posterior Means

Although much of the time we are most interested in the estimation of the fixed part of a multilevel model, there are a number of reasons why we might want to examine the random part, in particular to explore in more detail the variability between Level 2 units on the intercepts and slopes of the Level 1 model. The procedure to do this is known as empirical Bayes (EB) estimation (Gelman et al., 2014). Consider the fully unconstrained model listed at the top of Table 2.1. $\gamma_{00}$ is the fixed effect that determines the intercept, and it is simply the grand mean of $Y_{ij}$. However, $u_{0j}$ tells us that groups vary around that grand mean. If we want to estimate $\beta_{0j}$ for a particular Level 2 unit, we could simply calculate $\bar{Y}_{.j}$, the mean for the particular group. This is the same value that you would get if you simply performed an OLS regression for that particular group by itself. However, we would trust this as an estimate for the particular group only if Y were measured with no error. Moreover, the less reliable our measure of Y is in a particular group, the more we will want to use the grand mean across all the groups as our estimate. This is essentially how the empirical Bayes estimate works:

$$\hat{\beta}_{0j}^{EB} = \lambda_j \hat{\beta}_{0j}^{OLS} + (1 - \lambda_j)\hat{\gamma}_{00} \qquad (4.8)$$

where $\lambda_j$ is the reliability of Y in group j. If the reliability is high (close to 1), then the Bayesian estimate of the intercept will be very close to the simple OLS estimate from the individual group. If the reliability is low, then the Bayesian estimate will approach the fitted value for the grand mean of Y. (Remember that $\hat{\gamma}_{00}$ is our estimate of the grand mean of Y.) Note that the Bayesian estimate will always lie between the group mean and the grand mean, and, as reliability worsens, the estimate will “shrink” toward the grand mean. Thus, EB estimates are sometimes called shrinkage estimates. Therefore, the Bayesian estimate of the multilevel regression parameters represents a balance between the information obtained from a specific group and information obtained from the entire data set. The formula for the reliability,

$$\lambda_j = \frac{\sigma_{u_0}^2}{\sigma_{u_0}^2 + \sigma_r^2/n_j} \qquad (4.9)$$

indicates that a major determinant of high reliability for a particular group is the number of Level 1 units in that group ($n_j$). This makes sense in that a group with a relatively large sample size is contributing more information to the model than a group that has only a small number of members. In our example, California will have a more reliable estimate of voting percentage (with 55 members of Congress) than will Oregon (with only 7).

The same approach is used to determine the empirical Bayes estimates of any random Level 1 slopes in a multilevel model. Similar to Equation 4.8, the shrinkage estimate for a random slope is a mixture of information from the specific Level 2 unit and information from the entire data set:

$$\hat{\beta}_{1j}^{EB} = \lambda_j \hat{\beta}_{1j}^{OLS} + (1 - \lambda_j)\hat{\gamma}_{10} \qquad (4.10)$$

Table 4.4 lists the EB estimates for each state for Model 3. The table also displays the number of legislators in each state and the amount of harvested acres of tobacco. By examining these numbers, you can see that those states with the smallest sample size and with no tobacco acreage tend to have EB estimates that are closest to the fitted gammas, thus demonstrating Equation 2.17. For example, consider the EB estimates for Alaska (EB-Intercept = 0.18, EB-Party = 0.51) and for New York (EB-Intercept = 0.08, EB-Party = 0.58), and compare each of these with the general estimates taken from Model 3 (Intercept = 0.18, Party = 0.51). Alaska, with only three members of Congress, has a much lower reliability of its estimates; therefore, the EB estimates for Alaska are close to the overall Model 3 estimates. New York, on the other hand, has a sample size of 33. With its higher reliability, New York’s EB estimates are relatively further away from the general model estimates.

These EB estimates can be used in a number of ways. First, the shrinkage estimates can be used to examine or identify individual Level 2 units of interest. For example, perusing Table 4.4, we can quickly

Table 4.4 Empirical Bayes (EB) estimates for 50 states (based on Model 3)

| State | Size | EB-Intercept | EB-Party | Tobacco Acres (Thousands) |
| AK | 3 | 0.1838 | 0.5100 | 0.0 |
| AL | 9 | 0.3496 | 0.4194 | 0.0 |
| AR | 6 | 0.2726 | 0.4708 | 0.0 |
| AZ | 8 | 0.1936 | 0.5012 | 0.0 |
| CA | 55 | 0.1580 | 0.5685 | 0.0 |
| CO | 8 | 0.2019 | 0.5295 | 0.0 |
| CT | 8 | 0.0670 | 0.5391 | 3.0 |
| DE | 3 | 0.1119 | 0.5051 | 0.0 |
| FL | 25 | 0.1927 | 0.4912 | 5.8 |
| GA | 13 | 0.1463 | 0.5435 | 33.0 |
| HI | 4 | 0.1865 | 0.5067 | 0.0 |
| IA | 7 | 0.1379 | 0.4983 | 0.0 |
| ID | 4 | 0.2111 | 0.5118 | 0.0 |
| IL | 22 | 0.1891 | 0.5268 | 0.0 |
| IN | 12 | 0.2512 | 0.5133 | 6.5 |
| KS | 6 | 0.2402 | 0.4854 | 0.0 |
| KY | 8 | 0.2168 | 0.4821 | 221.7 |
| LA | 10 | 0.2785 | 0.4901 | 0.0 |
| MA | 12 | 0.0592 | 0.5712 | 1.3 |
| MD | 10 | 0.1075 | 0.5225 | 6.5 |
| ME | 4 | 0.1214 | 0.4981 | 0.0 |
| MI | 18 | 0.1489 | 0.5436 | 0.0 |
| MN | 10 | 0.1932 | 0.4763 | 0.0 |
| MO | 11 | 0.3130 | 0.4348 | 2.3 |
| MS | 7 | 0.3022 | 0.4667 | 0.0 |
| MT | 3 | 0.1432 | 0.5160 | 0.0 |
| NC | 14 | 0.1200 | 0.5516 | 207.8 |
| ND | 3 | 0.1455 | 0.5275 | 0.0 |
| NE | 5 | 0.1605 | 0.5364 | 0.0 |
| NH | 4 | 0.1848 | 0.5101 | 0.0 |
| NJ | 15 | 0.1534 | 0.4560 | 0.0 |
| NM | 5 | 0.1823 | 0.5338 | 0.0 |
| NV | 4 | 0.1436 | 0.5280 | 0.0 |
| NY | 33 | 0.0773 | 0.5851 | 0.0 |
| OH | 21 | 0.1222 | 0.5102 | 9.8 |
| OK | 8 | 0.2513 | 0.5143 | 0.0 |
| OR | 7 | 0.1686 | 0.5264 | 0.0 |
| PA | 23 | 0.1646 | 0.4846 | 6.2 |
| RI | 5 | 0.0791 | 0.5478 | 0.0 |
| SC | 8 | 0.2541 | 0.5040 | 39.0 |
| SD | 3 | 0.1308 | 0.5323 | 0.0 |
| TN | 11 | 0.2009 | 0.4821 | 63.2 |
| TX | 32 | 0.3176 | 0.4326 | 0.0 |
| UT | 5 | 0.1289 | 0.5066 | 0.0 |
| VA | 12 | 0.2386 | 0.4694 | 38.3 |
| VT | 2 | 0.1198 | 0.5180 | 0.0 |
| WA | 11 | 0.1211 | 0.5433 | 0.0 |
| WI | 11 | 0.1276 | 0.5769 | 1.2 |
| WV | 5 | 0.2394 | 0.4800 | 1.6 |
| WY | 2 | 0.2108 | 0.5117 | 0.0 |

ascertain the state with the highest intercept (Alabama) or the states that show the strongest relationship between party and protobacco voting (New York, Wisconsin, and Massachusetts). The second use is to examine all the EB prediction equations to explore the variability of the random effects across the Level 2 units. Figure 4.7 plots the posterior mean prediction equations for each state, with separate lines for Democrats and Republicans. The figure clearly shows the strong effect for political party and the consistent positive effect of money on protobacco voting. The figure shows that most states fall within a fairly tight range, with Republicans showing less variability than Democrats, especially with larger amounts of money. The small number of lines that appear to be outliers are identified as the states with large tobacco economies. At first glance, this graph may seem to suggest that there is an interaction between party and money on voting. However, Model 3 does not include this interaction term. Instead, the different money slopes here are simply the indication of how the money effect varies from state to state in our mixed-effects model. Because there is no interaction, the prediction lines for Democrats and Republicans in each state will be parallel. In fact, there is an important interaction present between party and money on voting, as can be seen by fitting the interaction model or by considering Luke and Krauss (2004). Finally, another way to use the EB prediction equations is to see how they fit to the original data. Figure 4.8 shows a trellis plot of each state’s data with the shrinkage estimates. Party is ignored for this graph. This

Figure 4.7 Bayesian estimated regression lines for Model 3 by Party. (Horizontal axis: PAC contributions ($K); vertical axis: Predicted protobacco voting (%); separate lines for Democrat and Republican.)

graph is useful to see which states have prediction equations that lie close to the original data. In general, states with larger sample sizes and more variability on Money and Voting % show greater agreement between the data and the predictions. So states such as Virginia and Ohio show a better match between their data and the Bayesian prediction estimates than a state such as Oregon (which had no members of Congress who accepted any tobacco industry PAC money). Compare this graph with Figure 3.3, which displays the same data points, but with the individual, state-specific regression lines. The Bayesian posterior mean estimates do not fit the state-specific data as well, but they represent the best estimates of voting behavior, using both state-specific information as well as information from all 50 states.
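The way reliability drives shrinkage (Equations 4.8 and 4.9) can be illustrated numerically. The sketch below is a simplified illustration, not the estimation routine used for Table 4.4: it plugs in the variance components of the fully unconstrained model from Table 4.1 rather than those of Model 3, the state sizes come from Table 4.4, and the group mean of 0.40 and grand mean of 0.20 passed to `eb_intercept` are hypothetical.

```python
def reliability(var_u0, var_r, n_j):
    """Reliability of the group mean for a group of size n_j (Equation 4.9)."""
    return var_u0 / (var_u0 + var_r / n_j)


def eb_intercept(group_estimate, grand_estimate, lam):
    """Shrinkage estimate: reliability-weighted mix of group and grand estimates (Equation 4.8)."""
    return lam * group_estimate + (1 - lam) * grand_estimate


# Variance components of the fully unconstrained model (Table 4.1).
VAR_U0, VAR_R = 0.035, 0.093

# Number of legislators per state, from Table 4.4.
sizes = {"CA": 55, "OR": 7, "AK": 3}
lams = {state: reliability(VAR_U0, VAR_R, n) for state, n in sizes.items()}
for state, lam in lams.items():
    print(state, round(lam, 2))

# Hypothetical group mean (0.40) and grand mean (0.20) to show the shrinkage.
eb_small_state = eb_intercept(0.40, 0.20, lams["AK"])
eb_large_state = eb_intercept(0.40, 0.20, lams["CA"])
print(round(eb_small_state, 3), round(eb_large_state, 3))
```

With the same hypothetical group mean, the small state is pulled much further toward the grand mean than the large state, which is the pattern described for Alaska and New York above.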

Figure 4.8 Scatterplots of voting data with empirical Bayes estimates. (One panel per state, AK through WY; horizontal axis: PAC contributions ($K); vertical axis: Vote %.)

Centering

An important issue in multilevel modeling that we have ignored until now is that of centering Level 1 predictor variables. There are two primary advantages of centering predictors in mixed-effects models: (1) obtaining parameter estimates that are easier to interpret and (2) increasing the stability of the model by removing high intercorrelations among the random effects (Kreft & de Leeuw, 1998). Centering is simply the process of linearly transforming a variable X by subtracting a meaningful constant, often some type of mean of X. For example, consider centering X on its grand mean:

$$X_{ij} = (X_{ij} - \bar{X}_{..}) \qquad (4.11)$$

$X_{ij}$ is now interpreted as a deviation away from the grand mean rather than a raw score. For example, if we perform grand-mean centering on Money, it is now interpreted as how much more or less PAC money an individual member of Congress receives compared with the average for all members of Congress. Furthermore, a score of 0 on the transformed variable is now interpreted as receiving the average amount of money from the tobacco industry. This may be a more meaningful reference point than the untransformed variable, where 0 represents a person who received no money from the industry. In fact, an important use of centering is to come up with a meaningful zero-point for a predictor variable.

Centering (sometimes called reparameterization) is often done in ordinary multiple regression. However, centering is not a major issue in ordinary regression because the important elements of a multiple regression model (e.g., parameter estimates, standard errors, model fit) are not changed when a predictor variable is centered. This is not the case for multilevel modeling. If a multilevel model has random slopes, then centering a Level 1 predictor variable can change some elements of the model and not just the interpretation of the transformed variable. To see why this is so, consider Figure 4.9, adapted from Hox (2017). This figure shows an example where the slopes of Y on X vary across three groups. This would suggest the need for a random-slopes multilevel model. Two zero-points on the x-axis are highlighted. One is the zero-point for the original X variable, whereas the other (X*) is the shifted zero-point for some centered version of X. Notice that the variability of the intercepts of the regression lines is different at these two points; it is larger for X* than it is for X. This tells us that the variance of the Level 1 intercept is not constant in a mixed-effects model and will change if we center X. If we did not fit a random-slopes model, then the slopes of the three lines would be the same and the variance would not change when centering X (see Figure 2.1).

Table 4.5 shows how this works for the tobacco data. The left side of the table presents our final model from before (Model 3), where neither of the Level 1 predictors is centered. That is, a 0 for Money means that the legislator received no money from a tobacco PAC source. Remember that we interpret the parameter coefficient for the intercept as the expected value when all the predictors are zero. In this case, we predict that a Democrat who receives no PAC money (Party = 0 and Money = 0) will vote protobacco 18.0% of the time. The middle part of the table shows the results for the same model, except that Money has now been grand-mean centered. The estimate for the intercept has now

Figure 4.9 Example showing varying intercept variability due to centering. (Horizontal axis: X, with the zero-points X = 0 and X* = 0 marked; vertical axis: Y.)

changed to 0.24. We interpret this differently because of the centered Money predictor. Now we say that a Democrat who receives the average amount of money for Congress will vote protobacco approximately 24% of the time. Notice that the only parts of the model that have changed are those parameter estimates and variance components associated with the intercept. The slopes and variance components for Money and Party are unaffected, as is the overall fit of the model.

There is another common way to center a Level 1 predictor. Instead of adjusting by the grand mean, we can adjust for the group mean:

$$X_{ij} = (X_{ij} - \bar{X}_{.j}) \qquad (4.12)$$

A group-centered variable is interpreted as a deviation away from the mean within a particular group. Thus, the parameter estimate is interpreted as the within-groups effect of the Level 1 (centered) covariate. In contrast, the coefficient of an uncentered Level 1 covariate is a weighted average of both within- and between-groups effects (Bell, Jones, & Fairbrother, 2018). Group-mean centering is thus a little more complicated than centering with the grand mean. The interpretation of the estimated coefficient is different, and the effects on the multilevel model are more pervasive. This can be seen on the right side of Table 4.5. Now, we interpret the intercept as indicating that a Democrat who receives the average amount of PAC money received in his or her state is likely to vote protobacco 22.9% of the time. Notice that unlike grand-mean centering, group-mean centering has resulted in different parameter and variance estimates throughout the entire model. In our example, they are not that different, but in other research situations they could differ more dramatically.

Given these complexities, one should use group-mean centering when there are specific theoretical reasons to do so. One type of relevant research situation is when you want to be able to distinguish between within-group and between-group regressions. This is often the case in growth curve modeling using longitudinal data (see below). Another research area where group-mean centering is useful is when you are interested in “frog–pond” effects, where the focus is more on the fit of an individual to his or her environment rather than on how individual scores affect some outcome (Enders & Tofighi, 2007).

The following guidelines about centering may be helpful for the practitioner:

1. Always base centering decisions on theoretical grounds. Although centering can have statistical consequences, these should be of secondary concern compared with the scientific goals of the analyses.

2. If any of the predictor variables do not have meaningful zero-points, they should be centered so that the intercepts in the multilevel model will be interpretable. For example, a Likert-type variable scored from 1 to 7 should not be used in its raw form. If it were, then the intercept would be interpreted as the expected value when the scale is 0, which is a meaningless value.

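Table 4.5 Comparison of centered and noncentered models

Columns: Not Centered (0 = No PAC $ received); Grand-Mean Centered (0 = Mean PAC $ across all states); Group-Mean Centered (0 = Mean PAC $ within a state).

| Fixed Effects | Not Centered: Coefficient | Not Centered: SE | Grand-Mean: Coefficient | Grand-Mean: SE | Group-Mean: Coefficient | Group-Mean: SE |
| For Intercept ($\beta_{0j}$): Intercept ($\gamma_{00}$) | 0.1804 | 0.0197 | 0.2424 | 0.0195 | 0.2294 | 0.0222 |
| For Intercept ($\beta_{0j}$): Acres ($\gamma_{01}$) | 0.0027 | 0.0005 | 0.0025 | 0.0005 | 0.0028 | 0.0005 |
| For Party slope ($\beta_{1j}$): Party ($\gamma_{10}$) | 0.5098 | 0.0206 | 0.5098 | 0.0206 | 0.5135 | 0.0211 |
| For Party slope ($\beta_{1j}$): Acres ($\gamma_{11}$) | −0.0016 | 0.0005 | −0.0016 | 0.0005 | −0.0017 | 0.0005 |
| For Money slope ($\beta_{2j}$): Money ($\gamma_{20}$) | 0.0048 | 0.0006 | 0.0048 | 0.0006 | 0.0043 | 0.0006 |
| For Money slope ($\beta_{2j}$): Acres ($\gamma_{21}$) | −0.0000 | 0.0000 | −0.0000 | 0.0000 | −0.0000 | 0.0000 |

| Random Effects | Not Centered: SD | Not Centered: Variance Component | Grand-Mean: SD | Grand-Mean: Variance Component | Group-Mean: SD | Group-Mean: Variance Component |
| Intercept ($u_{0j}$) | 0.091 | 0.008 | 0.091 | 0.008 | 0.115 | 0.013 |
| Party slope ($u_{1j}$) | 0.067 | 0.004 | 0.067 | 0.004 | 0.070 | 0.005 |
| Covariance ($u_{0j}$, $u_{1j}$) | −0.691 | −0.004 | −0.691 | −0.004 | −0.667 | −0.005 |
| Level 1 ($e_{ij}$) | 0.164 | 0.027 | 0.164 | 0.027 | 0.163 | 0.027 |

| Model Fit | Not Centered | Grand-Mean Centered | Group-Mean Centered |
| Deviance | −349.9 | −349.9 | −333.5 |
| Parameters | 10 | 10 | 10 |

Note: PAC = political action committee; SE = standard error.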

3. Binary or indicator variables can also be centered. By adjusting for the grand mean of a binary variable you are, in effect, removing the effects of that variable when interpreting the intercept. For example, by centering on Party and Money in the tobacco data set, the resulting intercept estimate is 0.52. This indicates that a typical legislator, regardless of political party, votes protobacco about half of the time.

4. Grand-mean centering of a Level 1 predictor affects only the parts of the model associated with the intercept.

5. Group-mean centering can be useful in certain situations, especially when the research question is more focused on within-group effects, or person–environment relationships.
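The two centering transformations (Equations 4.11 and 4.12) differ only in which mean is subtracted. The following Python sketch shows both on a tiny, hypothetical data set; the variable names, the six Money values, and the two state labels are made up for illustration and are not from the tobacco voting data.

```python
from collections import defaultdict
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall mean from every value (Equation 4.11)."""
    grand_mean = mean(values)
    return [x - grand_mean for x in values]


def group_mean_center(values, groups):
    """Subtract each observation's own group mean (Equation 4.12)."""
    by_group = defaultdict(list)
    for x, g in zip(values, groups):
        by_group[g].append(x)
    group_means = {g: mean(xs) for g, xs in by_group.items()}
    return [x - group_means[g] for x, g in zip(values, groups)]


# Hypothetical PAC contributions ($K) for six legislators in two states.
money = [0.0, 10.0, 20.0, 30.0, 50.0, 70.0]
state = ["A", "A", "A", "B", "B", "B"]

money_grand = grand_mean_center(money)
money_group = group_mean_center(money, state)
print(money_grand)
print(money_group)
```

After grand-mean centering, a value of 0 marks the overall average; after group-mean centering, a value of 0 marks the average within the legislator's own state, so all between-state differences in Money are removed from the centered variable.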

Power Analysis

Ever since Jacob Cohen published his seminal book, Statistical Power Analysis for the Behavioral Sciences (1977), analysts in the social and health sciences have included power analysis as one of the essential tools in the statistical arsenal. The power of a statistical test is the probability of rejecting its null hypothesis when the alternative hypothesis is true. Put another way, the power of a statistical test based on a particular study design is the ability of that test and design to detect real effects.

Two Types of Power Analysis

Power is related to four other pieces of information: (1) the significance criterion of the test (alpha); (2) the size of the effect of interest in the population; (3) the variability of that effect, again in the population; and (4) the sample size that will be used to detect the effect. In fact, if you have knowledge about any four of these five pieces of information, then the fifth can be determined. This leads to two basic types of power analysis: a priori and post hoc.

In a priori power analysis, the goal is to determine an appropriate sample size for a study and analysis that has not been conducted yet, given a particular desired level of power. Typically, social science researchers will want to achieve a power of at least 80%. Using theory as well as statistical and empirical knowledge (e.g., from previous studies of the same phenomenon of interest), the investigator will set reasonable estimates for alpha, power, effect size, and variability, and then the necessary sample size will be calculated. This is such an integral part of study planning that it is very hard to get any behavioral or health experimental/intervention research funded without this type of a priori power analysis.

For post hoc power analysis, the temporal order is reversed. For a study and analysis that has already been conducted, the actual alpha, sample size, observed effect size, and variability are used to assess what level of power was actually achieved in the study. Many statisticians and social scientists have argued that post hoc power analysis is not very helpful and may lead to misinterpretation of study results (Goodman & Berlin, 1994). After all, determining that power is low for a statistic that was nonsignificant (p > .05) is very uninteresting and almost tautological. However, post hoc power analyses can be used effectively to plan future, prospective studies using information from the current study. For example, a pilot study might be conducted using a convenience sample, where the expectation is that power might be low. The pilot data (observed effect sizes and variability) and the achieved power can be examined to help plan the next study that will have the appropriate sample size to achieve the desired power. (Of course, in this way you are really doing a priori power analysis using pilot data to produce more reliable and valid effect and variability estimates.)
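As a rough illustration of the a priori calculation described above, the following Python sketch solves for the per-group sample size of a simple two-group comparison of means, and then checks the power that sample size would achieve. It is not taken from the book: the function names are hypothetical, and it assumes a two-sided test, equal group sizes, a standardized effect size (mean difference divided by the standard deviation), and a normal approximation in place of an exact t-based calculation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-group comparison of means.

    effect_size is the standardized mean difference (difference / SD).
    The normal approximation slightly understates the n that an exact
    t-based calculation would give.
    """
    if effect_size == 0:
        raise ValueError("effect_size must be nonzero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def achieved_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of the same test for a given per-group n."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect_size) * (n / 2) ** 0.5
    # Ignores the negligible chance of rejecting in the wrong direction.
    return z.cdf(shift - z_alpha)
```

Under these assumptions, `n_per_group(0.5)` gives 63 participants per group for a medium standardized effect, alpha of .05, and 80% power. Fixing any four of the five quantities and solving for the fifth follows the same logic, which is why `achieved_power` is just the same relationship rearranged.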

Power Analysis for Mixed-Effects Models

Power analysis can be conducted for longitudinal and multilevel mixed-effects models. However, for multilevel or longitudinal designs the power analysis is more complicated than for simpler single-level cross-sectional designs. Power analysis for these types of mixed-effects models needs to take into account fixed effects at multiple levels, the clustering that occurs in multilevel models, and the random effects. For example, a power analysis for a two-level model might focus on the power needed to detect a Level 1 predictor, a Level 2 predictor, or a cross-level interaction covariate.

Despite the complexity, power analysis is extremely useful for multilevel designs. In particular, when planning a multilevel design, investigators who want to maximize power will often have different sample size options available to them. Consider, for example, a cluster randomized trial design (Murray, 1998) of health outcomes of patients who are nested within health clinics. To increase power, the investigators could increase the number of observed patients within each clinic, they could increase the number of clinics in the study, or they could do both. A multilevel power analysis for the primary outcome can help determine which of these approaches will be most effective at achieving the desired power. Note, however, that the results of the power analysis can only tell you about statistical efficiency. The investigators will also have to assess the cost of a particular sampling approach, where cost means economic, time, and feasibility requirements. In the cluster randomized health clinic example, the power analysis might suggest that doubling the number of patients at each clinic site would be the most efficient way to achieve the desired power. In reality, it might be hard to increase the recruitment rate within the clinics, and adding more clinics may be more feasible. In any case, the results of the power analysis are an important design tool for many types of multilevel studies.

There are two approaches for power analysis in mixed-effects models. Traditionally, power could be calculated using statistical theory that produced closed-form solutions and approximate formulas. These formulas produce estimates of standard errors of effects based on multiple inputs (i.e., sample sizes at each level, size of the variance components, or ICC; Arend & Schäfer, 2019). The main limitation of this approach is that the formulas only apply to certain designs: adding predictors, covariates, random effects, or levels would make them inapplicable or much harder to use.

A more flexible approach for estimating power and sample sizes in multilevel models is simulation (Arnold, Hogan, Colford, & Hubbard, 2011). Simulation uses Monte Carlo methods to take parameter estimates (i.e., alpha, desired power, effect sizes, clustering, variance components) and repeatedly build data sets that assume these parameters are true in the population. The resulting power is the proportion of the simulated data sets where the hypothesized effect is detected as significant.
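The Monte Carlo logic just described can be sketched in Python for the cluster randomized clinic example. This is only a minimal illustration under stated assumptions, not the procedure used later in this chapter: all function names and parameter values are hypothetical, clusters are equal in size, and the analysis step is a two-sample z-test on cluster means that stands in for refitting a mixed-effects model to every simulated data set. Because it uses a normal critical value, the test is somewhat liberal when there are few clusters.

```python
import random
from statistics import NormalDist, mean, variance


def simulate_trial(rng, n_clusters, n_per_cluster, effect, sd_between, sd_within):
    """Simulate one cluster randomized trial; return True if the effect is significant.

    Clusters alternate between treatment and control. The analysis is a
    two-sample z-test on cluster means (a stand-in for a mixed-effects fit).
    """
    if n_clusters < 4:
        raise ValueError("need at least two clusters per arm")
    treated_means, control_means = [], []
    for c in range(n_clusters):
        treated = c % 2 == 0
        u = rng.gauss(0.0, sd_between)  # cluster-level random intercept
        ys = [(effect if treated else 0.0) + u + rng.gauss(0.0, sd_within)
              for _ in range(n_per_cluster)]
        (treated_means if treated else control_means).append(mean(ys))
    se = (variance(treated_means) / len(treated_means)
          + variance(control_means) / len(control_means)) ** 0.5
    z = (mean(treated_means) - mean(control_means)) / se
    return abs(z) > NormalDist().inv_cdf(0.975)  # two-sided test, alpha = .05


def simulated_power(n_clusters, n_per_cluster, effect=0.5, sd_between=0.5,
                    sd_within=1.0, n_sims=500, seed=1):
    """Proportion of simulated data sets in which the effect is detected."""
    rng = random.Random(seed)
    hits = sum(
        simulate_trial(rng, n_clusters, n_per_cluster, effect, sd_between, sd_within)
        for _ in range(n_sims)
    )
    return hits / n_sims
```

Calling `simulated_power(10, 20)` and `simulated_power(40, 20)` compares designs that differ only in the number of clusters, while `simulated_power(10, 80)` increases the number of patients per cluster instead. In practice the analysis step would be replaced by the mixed-effects model that will actually be fit to the real data.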
The main advantage of simulation is that any model that could be fit by mixed-effects methods can be the subject of the power analysis. See Arend and Schäfer (2019) for a very nice overview of simulation of statistical power in two-level models. For general power considerations, see Kain, Bolker, and McCoy (2015). More specific discussion of power with cross-level interactions is provided by Mathieu and colleagues (2012).

To see the utility of multilevel power analysis, we will consider a simple variation of the tobacco voting models from Chapter 3. For this example, we are interested in one Level 1 predictor (Money), and one Level 2 predictor (Acres). We would like to explore the power of our model for explaining variability of the outcome variable (Voting %). This simplified model is represented in the following mixed-effects equation:

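$$
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} M_{ij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} A_{j} + u_{0j} \\
& \beta_{1j} = \gamma_{10} + \gamma_{11} A_{j} + u_{1j}
\end{aligned}
\tag{4.13}
$$

where Mij is the money Level 1 predictor, and Aj is the acres Level 2 predictor. The single equation form of the same model makes it clearer that there are actually three fixed effects that are relevant for power analysis, as well as two Level 2 random effects:

$$
Y_{ij} = \gamma_{00} + \gamma_{10} M_{ij} + \gamma_{01} A_{j} + \gamma_{11} A_{j} M_{ij} + u_{0j} + u_{1j} M_{ij} + r_{ij}
\tag{4.14}
$$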

In this example, we are using data from the tobacco voting data set but only from five members of Congress randomly selected from each of five states: Tennessee, Alabama, Missouri, Minnesota, and Maryland. This approximates using results from a limited pilot study to inform planning for a future larger scale study. So using data from these five states, we can explore the effects of increasing Level 1 sample size (members of Congress) or Level 2 sample size (number of states). Furthermore, given our model here, we can calculate power for our Level 1 predictor (Money), our Level 2 predictor (Acres), or the cross-level interaction (Money by Acres). For the power analysis simulation, we followed the outline provided by Green and MacLeod (2016).

The results of the fitted two-level model using the subsetted data (five members in each of the selected five states) are presented in Table 4.6. Although the parameter estimates are comparable with those we have seen in previous analyses, the results are not significant. Given the small Level 1 and Level 2 sample sizes, it is not surprising to see that the power for the three covariates is quite low. However, we can use the information from this fitted model to do a fuller prospective power analysis. That is, how many members of Congress and how many states would we need to have to detect the effects of money, acres, and their interaction on voting behavior?

Table 4.6 Results and power for five-state subset (TN, AL, MO, MN, and MD)

Fixed Effects      Coefficient   SE       T-Ratio   Power
Intercept               0.4817   0.1265      3.81
Money                   0.0057   0.0042      1.36    0.26
Acres                   0.0046   0.0046      1.01    0.15
Money by Acres         −0.0001   0.0001     −1.09    0.10

Figure 4.10 Simulated power for increasing number of states and representatives within states.

[Figure: six panels of estimated power (0% to 100%). Rows: L1 − Money, L2 − Acres, and CLI − Money by Acres. Left column: number of states (10 to 50). Right column: number of representatives within states (10 to 50).]

Note: CLI = cross-level interaction; Reps = representatives.

The prospective power analysis results are presented in Figure 4.10. The figure shows six sets of power analyses. Each row of the figure presents power for one of the covariates: the top row for money (Level 1), the middle row for acres (Level 2), and the bottom row for the interaction of money with acres (cross-level interaction). The left-hand column shows the results of the simulations when the number of groups (states) is increased. The right-hand column shows the power when the number of representatives within each state is increased. The points in each graph panel are the estimated power values, and the bars around each point are the empirical 95% confidence ranges based on the multiple simulations.

The results show that in general, increasing the number of states or the number of representatives within states will increase the power of the model. The one exception is the effect of acres; here, we see that only increasing the number of states improves power. This makes sense given that acres is a Level 2 predictor, so the number of Level 2 units is more important here. This example shows that the simulation approach for estimating power in multilevel models can provide a lot of useful information that can help study designers. In particular, when cost, time, and feasibility constraints affect how easy it is to increase sample sizes at different levels, power analyses like these can be critical.
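One way to see why the number of Level 2 units matters so much for a Level 2 predictor is the standard design-effect approximation for equal-sized clusters, in which clustering inflates the sampling variance by 1 + (m − 1) × ICC for clusters of size m. The Python sketch below is not part of the simulations reported above; the function names and the ICC value of 0.20 are hypothetical, and the approximation is most directly relevant to means and cluster-level effects.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering, assuming equal-sized clusters."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: int, icc: float) -> float:
    """Number of independent observations the clustered sample is roughly worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


if __name__ == "__main__":
    icc = 0.20  # hypothetical intraclass correlation
    for n_clusters, cluster_size in [(5, 5), (5, 50), (50, 5)]:
        n_eff = effective_sample_size(n_clusters, cluster_size, icc)
        print(f"{n_clusters:>2} clusters x {cluster_size:>2} members: n_eff = {n_eff:.1f}")
```

With these illustrative values, going from 5 to 50 members in each of 5 clusters raises the effective sample size only from about 14 to about 23, and it can never exceed 5 / 0.20 = 25 however many members are added. Going from 5 to 50 clusters of 5 members raises it to about 139.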

CHAPTER 5. EXTENDING THE BASIC MODEL

It’s a dangerous business, Frodo, going out your door. You step onto the road, and if you don’t keep your feet, there’s no knowing where you might be swept off to.
-- J. R. R. Tolkien, 1954

The Flexibility of the Mixed-Effects Model

The previous chapter focused on building, estimating, checking, and interpreting the basic multilevel model using mixed-effects techniques. By basic, we mean that the model was limited to two levels, and the dependent variable was continuous. However, mixed-effects modeling is quite a bit more flexible than implied by the basic models explored in Chapter 3. Most important, we can extend the model to handle a wide variety of noncontinuous dependent variables. In addition, multilevel models are not limited to just two levels, although there are practical and theoretical reasons for not including too many levels in a multilevel model. Finally, mixed-effects models can handle both nested and crossed random effects when two or more random effects are included in a model. The rest of this chapter describes these useful extensions to the basic model.

Generalized Models

As suggested above, there is an important limitation in modeling the congressional voting behavior with the basic hierarchical linear model. Our dependent variable is a proportion, and thus it violates the general linear model assumptions of normality and homoscedastic errors. Also, because the voting proportion is bounded at 0 and 1, we may find that predictions of voting behavior based on the fitted models lead to estimated values outside of that range. It is difficult to know what to make of a prediction that says a particular type of Congress member will vote protobacco 120% of the time!

Fortunately, multilevel modeling can be extended to handle a wide variety of different types of noncontinuous or nonnormal dependent variables, including binary, proportion, count, and ordinal variables. To do this, we use what is called a generalized linear mixed-effects model (GLMM). A GLMM works by building a suitable transformation and an appropriate error distribution for the dependent variable into the statistical model. For a good overview of generalized models, see Gill and Torres (2019).

Binary Outcomes

As an example, consider a binary (dichotomous) dependent variable. This untransformed variable is bounded by 0 and 1 and is highly nonnormal. We can assume that the underlying probability distribution is binomial with mean μ. Our estimate of μ is p, which we interpret as the probability of the event occurring. That is, the expected value of our binary dependent variable is the probability that its value is 1:

$$
E(Y) = p(Y = 1)
\tag{5.1}
$$

A typical transformation for a binomial model is the logit:

$$
\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right)
\tag{5.2}
$$

Figure 5.1 illustrates why the logit transformation is so useful. Although p is bounded, the logit of p is unbounded, and the density of logit(p) is much closer to normal. With generalized mixed-effects models, this type of transformation is called a link function. First, we set up a transformational link that connects the untransformed dependent variable p to a new transformed variable η. So in the case of binary data, our link function is

$$
\eta = \operatorname{logit}(p)
\tag{5.3}
$$

We then set up a traditional Level 1 prediction model as described in Chapter 2:

$$
\eta = \beta_{0} + \beta_{1} X_{1} + \cdots + \beta_{k} X_{k}
\tag{5.4}
$$

for k predictor variables. Notice that there is no term for the Level 1 error variance. For binary (and binomial) variables, the variance is completely determined by the mean, and thus is not a separate term to be estimated. Level 2 models that predict the Level 1 betas can then be constructed as before.

There are certain link functions and associated probability distributions that are typically used for the various types of nonnormal data. These so-called canonical link functions are listed in Table 5.1 (Raudenbush & Bryk, 2002). In this table, μ is the expected value of the dependent variable; for example, that would be p for a binary outcome. One of the advantages of GLMM is that you can explicitly pick alternative link functions and/or probability distributions that are appropriate for your data and theories.

Figure 5.1 Illustration of logit transformation.

[Figure: logit(p), ranging from about −4 to 4, plotted against p from 0.00 to 1.00.]

We can see how GLMM works by applying it to our tobacco voting data. In Chapter 3, we built a model that predicted the proportion of the total votes that were in the protobacco industry direction. However, this proportion was actually obtained by aggregating across the set of individual votes in which a member of Congress participated during the 105th and 106th Congresses (from 1997 to 2000). So instead of aggregating to obtain a proportion, we can instead use a GLMM to build a model that predicts the likelihood of voting protobacco for any individual bill or amendment.

Table 5.1 Canonical link functions for GLMMs

Dependent Variable   Link Function      Formula               Distribution
Binary               Logit              η = ln(μ / (1 − μ))   Binomial
Proportion           Logit              η = ln(μ / (1 − μ))   Binomial
Count                Log                η = ln(μ)             Poisson
Ordinal              Cumulative logit   η = ln(μ / (1 − μ))   Proportional odds

Source: Adapted from Raudenbush and Bryk (2002), Table 10.8, p. 333. Note: GLMM = generalized linear mixed-effects model.
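To make the idea of a link function and its inverse concrete, the logit and log links from Table 5.1 can be written as small Python functions. This is only an illustrative sketch: the dictionary name and its keys are hypothetical, and the cumulative logit for ordinal outcomes is left out because it also requires a set of category thresholds.

```python
import math


def logit(mu: float) -> float:
    """Logit link: maps a probability in (0, 1) to the whole real line."""
    return math.log(mu / (1.0 - mu))


def logistic(eta: float) -> float:
    """Inverse of the logit link, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# (link, inverse link) pairs keyed by type of dependent variable.
LINKS = {
    "binary": (logit, logistic),
    "proportion": (logit, logistic),
    "count": (math.log, math.exp),
}

if __name__ == "__main__":
    link, inverse = LINKS["binary"]
    for p in (0.05, 0.50, 0.95):
        eta = link(p)
        print(f"p = {p:.2f}  eta = {eta:+.3f}  back-transformed = {inverse(eta):.2f}")
```

The model is fit on the η scale, where predictions are unbounded, and the inverse link returns any prediction to the original scale of the dependent variable.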

In this model, the dependent variable (Vote) is binary, and is coded 0 for a vote that was against the tobacco industry’s interests, and coded 1 for a proindustry vote. We still wish to build a multilevel model, but now the levels have changed. At Level 1, we are measuring and predicting votes on individual bills. These individual votes are nested within a particular member of Congress, so Level 2 is now the individual. We wish to model the likelihood (probability) of voting protobacco on any individual bill. This can be seen more explicitly in the formal model:

$$
\begin{aligned}
\text{Level 1:}\quad & \eta_{ij} = \operatorname{logit}(p_{ij}) \\
& \eta_{ij} = \pi_{0j} \\
\text{Level 2:}\quad & \pi_{0j} = \beta_{00} + \beta_{01}\,\text{Party}_{j} + \beta_{02}\,\text{Money}_{j} + u_{0j}
\end{aligned}
\tag{5.5}
$$

with the logit link function specified, and where pij is the probability that the vote of the jth member of Congress on the ith bill is in the proindustry direction. The only predictors in this model are the Level 2 individual predictors of political party and amount of PAC money received. If we had relevant bill level predictors (e.g., whether the vote was for a full bill or just an amendment, or which party sponsored the bill), then they could be included in the Level 1 submodel. In this case, we are modeling only the Level 1 intercept, which is interpreted as the average probability that an individual member of Congress would vote protobacco on a bill. Notice again that there is no error term for the Level 1 submodel. This model also has only one random effect, which is the variability of the individual votes across persons (u0j).

The role of the link function can be seen more clearly with the single-equation form of the model. The right-hand side of the equation shows our traditional prediction model, but the left-hand side makes it clear that the model produces estimates of the transformed outcome, in this case the logit of the probabilities (sometimes called the log-odds).

$$
\operatorname{logit}(p_{ij}) = \beta_{00} + \beta_{01}\,\text{Party}_{j} + \beta_{02}\,\text{Money}_{j} + u_{0j}
\tag{5.6}
$$

Table 5.2 Results of two-level GLMM of binary voting data

Fixed Effects      Coefficient   SE       T-Ratio   p
Intercept (β00)        −1.855    0.0814   −22.80    .000
Party (β01)             2.717    0.1068    25.43    .000
Money (β02)             0.035    0.0030    11.58    .000

Random Effects     SD      Variance Component
Intercept (u0j)    0.842   0.7082

Model Fit          Deviance   Parameters   AIC      BIC
                   7117.0     4            7125.0   7152.7

Note: GLMM = generalized linear mixed-effects model; SE = standard error; AIC = Akaike information criterion; BIC = Bayesian information criterion.

Table 5.2 presents the results of the GLMM of the binary voting data. As before, all the coefficients are significant. However, before their values can be interpreted correctly, the coefficients need to be transformed back into their original underlying units. This is done by using the appropriate inverse function of the original link function. The inverse of the logit function is the logistic function:

$$
p = \operatorname{logistic}(\beta_{0} + \beta_{1} X_{1} + \beta_{2} X_{2}) = \frac{e^{(\beta_{0} + \beta_{1} X_{1} + \beta_{2} X_{2})}}{1 + e^{(\beta_{0} + \beta_{1} X_{1} + \beta_{2} X_{2})}}
\tag{5.7}
$$

So to find the predicted probability that a Democrat receiving no PAC funds would vote protobacco (Party = 0 and Money = 0), we would calculate logistic(−1.855) = e^(−1.855) / (1 + e^(−1.855)) = 0.135. For a Republican who receives no funds (Party = 1, Money = 0), we calculate logistic(−1.855 + 2.717), which gives the predicted probability of .703. Finally, for a Republican who received $10,000 (Party = 1, Money = 10), logistic(−1.855 + 2.717 + 10 × 0.035) = 0.771. Notice that these probability values are similar to the results of Model 3 listed in Table 3.6.

It is often easier to interpret the results of a GLMM by plotting the transformed coefficients to get a prediction graph. Figure 5.2 shows the predicted voting probabilities for the Democratic and Republican legislators. These probabilities are obtained by using the logistic transformation (the inverse of the logit link function) on the estimated model parameters. This plot also highlights the nonlinear form of these predictions. In particular, by using a GLMM, we get predictions that make sense and do not go outside the boundaries of a probability.

Figure 5.2 Predicted two-level GLMM binary model voting probabilities.

[Figure: predicted protobacco voting probabilities (0.0 to 1.0) plotted against PAC contributions ($K, 0 to 120), with separate curves for Republicans and Democrats.]

Note: GLMM = generalized linear mixed-effects model.

One should be cautious about interpreting the deviance, Akaike information criterion (AIC), and Bayesian information criterion (BIC) for generalized mixed-effects models. The likelihood of a GLMM generally has no closed form, so GLMMs are not estimated with exact maximum-likelihood techniques. Instead, many packages use a penalized quasi-likelihood (PQL) estimation procedure, or other similar approaches. PQL produces an asymptotic approximation to the likelihood. Therefore, the deviance and the information theoretic statistics such as the AIC, which are based on the deviance, may be less reliable, especially for small sample sizes. The technical details of PQL and other nonlinear estimation techniques are described in Raudenbush and Bryk (2002) and Pinheiro and Bates (2000).
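The back-transformation of the Table 5.2 coefficients described earlier in this section can be reproduced with a few lines of Python. The sketch below uses the fixed-effect estimates from Table 5.2; the constant and function names are hypothetical, and the predictions set the random effect u0j to 0, so they describe a typical legislator rather than any particular one.

```python
import math

# Fixed-effect estimates from Table 5.2 (logit scale).
B_INTERCEPT = -1.855
B_PARTY = 2.717   # Party: 0 = Democrat, 1 = Republican
B_MONEY = 0.035   # Money: PAC contributions in thousands of dollars


def logistic(eta: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(party: int, money_k: float) -> float:
    """Predicted probability of a protobacco vote, with u0j set to 0."""
    return logistic(B_INTERCEPT + B_PARTY * party + B_MONEY * money_k)


if __name__ == "__main__":
    cases = [
        ("Democrat, no PAC money", 0, 0),
        ("Republican, no PAC money", 1, 0),
        ("Republican, $10K PAC money", 1, 10),
    ]
    for label, party, money_k in cases:
        print(f"{label}: {predicted_probability(party, money_k):.3f}")
```

Run as a script, this prints 0.135, 0.703, and 0.771 for the three cases. Evaluating `predicted_probability` over a grid of Money values for each party gives the kind of curves shown in Figure 5.2.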

Count Outcomes

Another type of data that require the use of generalized models is count data. Counts, such as the number of emergency room visits per day or the number of gunshot victims, are commonly encountered by social and health scientists. Counts require a generalized approach because they can never be negative and their error distributions typically follow a Poisson distribution instead of a normal distribution.

As an example, consider a model for predicting the frequency of use of alcohol among adolescents. These data are taken from the National Longitudinal Survey of Youth (NLSY), where participants reported the number of days in the past month they used alcohol. Participants are interviewed multiple times each year, so the observations are nested in person (for this example, the longitudinal nature of the data is ignored). Figure 5.3 shows the distribution of this count variable for a random subset of 5,000 adolescents. The distribution is highly skewed, with most participants reporting 0 days of alcohol use. If we were to ignore the distributional challenges of this variable and fit a simple linear mixed-effects model, we would likely obtain an invalid, poorly fitting model.

Figure 5.3 Distribution of number of days using alcohol in the past month.

[Figure: histogram with density (0.0 to 0.6) on the vertical axis and number of alcohol days (0 to 30) on the horizontal axis.]

For count data, a generalized model that assumes a Poisson distribution can be fit. Typically, the link function that would be used is the natural logarithm (see Table 5.1). The first set of columns in Table 5.3 presents the results of a generalized mixed-effects model assuming a Poisson distribution. The results suggest that White males are likely to use alcohol more frequently, and adolescents who smoke frequently are more likely to use alcohol. Remember that this is a generalized model, so that we must use an inverse function of the underlying link function to properly interpret the fixed-effect parameter estimates. For example, the expected number of days of alcohol use for a White male who also smokes 10 days out of the month is Ŷ = exp(−0.352 + 10 × 0.047 + 0.285), which comes out to 1.5 days per month.

Notice that a Level 1 variance component is not provided by the model. For a Poisson model, the variance is assumed to be equal to the mean. This is often not the case for real-world data sets. Examining our NLSY data, we can see for example that the mean of alcohol days is 2.46, while its variance is 23.11. When the variance is larger than the mean, the data are said to show overdispersion. Overdispersion is assessed by looking at the ratio of the residual deviance D (here, the sum of squared Pearson residuals) to the residual degrees of freedom, n − p:

$$
\hat{\phi} = \frac{D}{n - p}
\tag{5.8}
$$

If the data follow a Poisson distribution with no overdispersion, then we would expect this ratio to be close to 1. Using procedures suggested in the GLMM FAQ (see Chapter 7), we can estimate φ for a fitted model and test the null hypothesis of no overdispersion. For the basic Poisson model of the NLSY data as described above, φ̂ is 11.2, which is highly significant. This suggests that we have a problem with overdispersion.

Table 5.3 Comparison of three models of alcohol use counts

                       Poisson              Zero-Inflated        Negative Binomial
Fixed Effects          Coefficient   p      Coefficient   p      Coefficient   p
Intercept                  −0.352   .00          0.253   .01         −0.046   .66
Smoking                     0.047   .00          0.030   .00          0.054   .00
Gender (male)               0.285   .01          0.366   .00          0.423   .00
Race (non-White)           −0.478   .00         −0.468   .00         −0.392   .00

Random Effects         Variance Component   Variance Component   Variance Component
Intercept                   1.876                1.583                1.518
Zero-inflation (pz)                              0.349                0.172
Dispersion                                                            1.074

Model Fit              AIC                  AIC                  AIC
                       22713.6              19685.3              17212.0

Note: AIC = Akaike information criterion.

There are many reasons why count data may exhibit overdispersion. One common issue is zero inflation, where a count variable has more (or sometimes fewer) zeros than expected. If zero inflation is observed, one way to handle this is to build a two-part model. A two-part model is a mixture of a binomial model (predicting whether an observation is positive) and a positive distribution model (predicting the actual value of the observation if not zero). These models can be somewhat complicated but allow for establishing differential sets of predictors for each of the two processes. A somewhat simpler approach is to include a zero-inflation parameter in a Poisson or similar generalized mixed-effects model. (Not all mixed-effects modeling packages allow for zero-inflation estimation. The results shown here are from the glmmADMB package in R.)

The second model in Table 5.3 is a Poisson model with a zero-inflation parameter. The zero-inflation parameter pz is an estimate of the probability that any observation of Y is an excess 0, that is, a zero over and above those expected from the Poisson process; it also follows that 1 − pz is the probability that Y comes from a Poisson distribution (Bohning, Dietz, Schlattmann, Mendonca, & Kirchner, 1999). For our data, the zero-inflation parameter is 0.349, suggesting a high probability that any observation is 0, which corresponds to what we know about the alcohol day variable. This second model, with the zero-inflation parameter, is a better fit to the data as indicated by the lower AIC value. Also, the overdispersion ratio for this second model is now 5.896. It is still significant, but much lower than for the first simple Poisson model.

Finally, if there are still distributional problems for your count data, then you can move away from a Poisson model with its strict variance assumptions. A negative-binomial model is a generalization of the Poisson model. That is, it can handle nonnegative count data but does not assume that the mean of the distribution is equal to its variance. It does this by including a dispersion parameter that accounts for extra variability, such as unobserved heterogeneity, that would otherwise show up as overdispersion. The third model in Table 5.3 shows the results of a negative-binomial model that also includes a zero-inflation term. The fixed-effects parameter estimates are interpreted similarly to the previous models. However, we now have a better handle on the distributional challenges. In fact, the overdispersion ratio for the negative-binomial model is now 0.808, which is not significantly different from 1.
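The three calculations used in this section (back-transforming a log-link prediction, the overdispersion ratio of Equation 5.8, and the role of the zero-inflation parameter) can be sketched in Python as follows. The function names are hypothetical, and the dispersion function assumes that fitted means from a Poisson model are already available; it is a simplified stand-in for the procedures in the GLMM FAQ, not a reproduction of them.

```python
import math


def expected_count(intercept, coefs, values):
    """Inverse log link: exp of the linear predictor gives the expected count."""
    eta = intercept + sum(b * x for b, x in zip(coefs, values))
    return math.exp(eta)


def pearson_dispersion(observed, fitted, n_params):
    """Sum of squared Pearson residuals over residual df (Poisson: variance = mean)."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / df


def zip_prob_zero(pz, lam):
    """P(Y = 0) under a zero-inflated Poisson: an excess zero or a Poisson zero."""
    return pz + (1.0 - pz) * math.exp(-lam)


if __name__ == "__main__":
    # White male (Gender = 1, Race = 0) who smokes 10 days a month, Poisson model.
    days = expected_count(-0.352, [0.047, 0.285], [10, 1])
    print(f"Expected alcohol days per month: {days:.1f}")
```

The call in the last block reproduces the 1.5 days per month reported for the Poisson model in Table 5.3. A `pearson_dispersion` value well above 1 points to overdispersion, and `zip_prob_zero` shows how a zero-inflation parameter raises the share of zeros above what the Poisson part alone would produce.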

Three-Level Models

Up to now, we have considered only two-level models. First, in Chapter 3, we considered voting behavior at the person level influenced by state-level characteristics. Second, in the first section of this chapter, we modeled votes on individual bills influenced by person-level characteristics. As discussed in the previous section, there were numerous statistical advantages to modeling the binary voting behavior using GLMM. However, by using only a two-level model that models individual votes as a function of person attributes, we lost the ability to see how these attributes (i.e., Party and Money) were random effects at the state level. Fortunately, mixed-effects models can be extended to handle more than two levels.

It is not uncommon to come across three-level modeling situations in the social and health sciences. For example, in education, it is common to see data collected at three levels: students nested in classrooms nested in schools. All the same considerations for building two-level models apply to three-level models; in particular, the hypothesized effects at each level should be clearly defined and appropriately measured.

Although the extension of the two-level statistical model to three levels is relatively straightforward, the specification of the random effects can become confusing. In a two-level model, remember that the Level 1 intercepts and slopes may be considered random at Level 2. In a three-level model, Level 1 intercepts and slopes may be random at Level 2 and Level 3. By random, we mean that the Level 1 intercepts and slopes may vary across Level 2 units and Level 3 units. Suppose, for example, that we are interested in modeling reading achievement by students in various classrooms and schools. We may hypothesize that average reading scores (Level 1 intercepts) and the effects of socioeconomic status (SES) on reading scores (Level 1 slope) vary across classrooms (Level 2) and across schools (Level 3). In addition, any predictors that we include in a Level 2 submodel may be considered random at Level 3. So if we include a measure of teacher experience at Level 2, the effects of this predictor variable may be hypothesized to vary across schools, and thus will be entered as a random effect at Level 3 in the three-level model. To help clarify this example, Equation 5.9 displays one possible three-level statistical model:

$$
\begin{aligned}
\text{Level 1:}\quad & Y_{ijk} = \pi_{0jk} + \pi_{1jk}(\text{SES})_{ijk} + \varepsilon_{ijk} \\
\text{Level 2:}\quad & \pi_{0jk} = \beta_{00k} + \beta_{01k}(\text{Exp})_{jk} + r_{0jk} \\
& \pi_{1jk} = \beta_{10k} + \beta_{11k}(\text{Exp})_{jk} + r_{1jk} \\
\text{Level 3:}\quad & \beta_{00k} = \gamma_{000} + u_{00k} \\
& \beta_{01k} = \gamma_{010} + u_{01k} \\
& \beta_{10k} = \gamma_{100} + u_{10k} \\
& \beta_{11k} = \gamma_{110} + u_{11k}
\end{aligned}
\tag{5.9}
$$

Here, Yijk is the reading score for the ith student in the jth class in the kth school. SES is a Level 1 predictor, and teacher experience (Exp) is a Level 2 predictor. The Level 1 and Level 2 intercepts and slopes are all modeled as random effects. As usual, this set of equations can be reduced to a single mixed-model equation:

$$
\begin{aligned}
Y_{ijk} ={} & \gamma_{000} + \gamma_{010}(\text{Exp})_{jk} + \gamma_{100}(\text{SES})_{ijk} + \gamma_{110}(\text{Exp})_{jk}(\text{SES})_{ijk} \\
& + u_{00k} + u_{01k}(\text{Exp})_{jk} + u_{10k}(\text{SES})_{ijk} + u_{11k}(\text{Exp})_{jk}(\text{SES})_{ijk} \\
& + r_{0jk} + r_{1jk}(\text{SES})_{ijk} + \varepsilon_{ijk}
\end{aligned}
\tag{5.10}
$$

Whereas Equation 5.9 makes it clear that there are three levels in the model with one predictor at Level 1 and one predictor at Level 2, Equation 5.10 makes it easier to see that this model has four fixed effects and seven variance components (random effects). In particular, there is an important fixed-effect cross-level interaction (γ110) that is easy to miss if you were looking only at Equation 5.9.

GLMMs can also be extended to three levels. Our voting data set provides a good example of just such a three-level binary multilevel model. Previously, we have fit two different two-level models to these data: In the first, members of Congress were nested within states; in the second, votes on individual bills were nested within each member of Congress. We can combine these two approaches into a single three-level model where individual votes are nested within members of Congress who are in turn nested within states. Equation 5.11 presents the formal three-level statistical model. As before, we are using a binomial model with a logit link function. Political party and amount of PAC money received are the predictors at the level of the Congress member (now Level 2). Acres is the only Level 3 predictor, and this time it is hypothesized to affect only the Level 2 intercept and not the Level 2 slopes.

$$
\begin{aligned}
\text{Level 1:}\quad & \eta_{ijk} = \operatorname{logit}(Y_{ijk}) \\
& \eta_{ijk} = \pi_{0jk} \\
\text{Level 2:}\quad & \pi_{0jk} = \beta_{00k} + \beta_{01k}(\text{Party})_{jk} + \beta_{02k}(\text{Money})_{jk} + r_{0jk} \\
\text{Level 3:}\quad & \beta_{00k} = \gamma_{000} + \gamma_{001}(\text{Acres})_{k} + u_{00k} \\
& \beta_{01k} = \gamma_{010} + u_{01k} \\
& \beta_{02k} = \gamma_{020} + u_{02k}
\end{aligned}
\tag{5.11}
$$

Equation 5.12 shows the single-equation, mixed-model formula and indicates that our model has four fixed effects (and four random effects). Notice that there is no cross-level interaction effect, because Acres is allowed to affect only the intercept.   γ000 + γ001 (Acres)k + γ010 (Party)jk + γ020 (Money)jk Yijk = logistic +u00k + u01k (Party)jk + u02k (Money)jk + r0jk (5.12) The results of fitting this model appear in Table 5.4. The results are essentially consistent with the previous models. Party, money, and tobacco acreage are all significant predictors of voting behavior. The effect of money is slightly smaller in the three-level model than in the two-level model, presumably because some of the effects of money are captured by the tobacco economy in a state—legislators from states with

Table 5.4 Results of three-level GLMM of binary voting data

Fixed Effects         Coef.     SE       T-Ratio   p
Intercept (γ000)     −1.789    0.1319   −13.56    .000
Acres (γ001)          0.006    0.0022     2.57    .010
Party (γ010)          2.627    0.1271    20.67    .000
Money (γ020)          0.025    0.0032     7.79    .000

Random Effects        SD       Variance Component
Intercept-1 (r0j)     0.652    0.4255
Intercept-2 (u00)     0.663    0.4395
Slope-Party (u01)     0.450    0.2029

Model Fit             Deviance   Parameters   AIC      BIC
                      7064.4     8            7080.4   7135.7

Note: SE = standard errors; AIC = Akaike information criterion; BIC = Bayesian information criterion.

tobacco acreage are receiving more money from the tobacco industry than are legislators from states with no tobacco harvest.

Figure 5.4 plots the prediction equations for the fitted model, including the effects of three different levels of state tobacco acreage. The shapes of the curves (i.e., the effects of Party and Money) are quite similar to those in the two-level model. This plot shows us that although tobacco acreage is a significant effect in the model, its effect is not noticeable until acreage is large, on the order of 100,000 acres. Only two states (Kentucky and North Carolina) have a tobacco harvest of more than 100,000 acres, so, for most states, only political party and PAC contributions need to be taken into account.

The other important part of the fitted model is the random effects. In the two-level model, the standard deviation of the person-level random effect was 0.842. In the three-level model, the person-level variability is now lower (0.652) and is, in fact, a little smaller than the interstate variability (0.663). This tells us that there is as much variability in voting behavior between states as there is between legislators within the states. The random effects of the two person-level slopes are smaller than those for the intercepts, which supports the decision to not include Acres as a slope predictor.
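To make the fixed part of Equation 5.12 concrete, the following minimal sketch plugs the coefficients from Table 5.4 into the logistic function, with all random effects set to zero. The coding of the predictors is an assumption, not something stated in the table: Party is taken to be 1 for Republicans and 0 for Democrats, Money is taken to be in thousands of dollars (as on the axis of Figure 5.4), and Acres is taken to be in thousands of acres. The function name is hypothetical.

```python
import math

# Fixed-effect estimates copied from Table 5.4.
GAMMA_000 = -1.789  # intercept
GAMMA_001 = 0.006   # Acres (Level 3)
GAMMA_010 = 2.627   # Party (Level 2)
GAMMA_020 = 0.025   # Money (Level 2)


def logistic(eta):
    """Inverse of the logit link: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(party, money_k, acres_k):
    """Predicted protobacco voting probability with all random effects at zero.

    Assumed coding (not confirmed by the source): party is 1 for Republican
    and 0 for Democrat; money_k is PAC money in $1,000s; acres_k is state
    tobacco acreage in 1,000s of acres.
    """
    eta = (GAMMA_000
           + GAMMA_001 * acres_k
           + GAMMA_010 * party
           + GAMMA_020 * money_k)
    return logistic(eta)


# Compare the three acreage levels shown in Figure 5.4 at $20K of PAC money.
for party, label in ((1, "Republican"), (0, "Democrat")):
    for acres_k in (0, 10, 100):
        p = predicted_probability(party, 20, acres_k)
        print(f"{label:10s} {acres_k:>3d}K acres: {p:.3f}")
```

Under these assumed units, moving from 0 to 10 thousand acres changes the log-odds by only 0.06, which is consistent with the observation that acreage matters little until it is very large.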

Figure 5.4 Predicted voting probabilities from three-level binary model.
[Line plot. x-axis: PAC contributions ($K), 0 to 120. y-axis: predicted protobacco voting probabilities, 0.0 to 1.0. Six curves: Republicans and Democrats, each at 0K, 10K, and 100K acres.]

Cross-Classified Models

In many cases, multilevel models with two or more higher level factors are nested, where lower level units are nested within the levels of the higher level units. For example, children are nested in classrooms, which are further nested in schools. Many types of studies or data sets, however, may exhibit more complicated structures. For example, children are nested in both schools and neighborhoods, but schools and neighborhoods are not nested with each other. Some children from the same neighborhood may go to a specific school, while other children from the same neighborhood may go to a different school. In this case, we would consider schools and neighborhoods as cross-classified factors. The children are actually nested within the various combinations of school–neighborhood pairs. If you were to ignore the cross-classification, by

Table 5.5 Partial cross-tabulation of pupcross data

        1   2   3   4   5   6
 1      3   0   1   0   1   0
 2      1   1   1   0   1   2
 3      1   0   0   0   3   1
 4      3   1   0   1   2   0
 5      2   1   0   2   1   0
 6      0   2   2   1   0   2
 7      0   2   1   2   0   2
 8      0   1   2   2   1   2
 9      0   1   1   2   1   1
10      0   0   0   0   0   0
11      0   0   1   1   0   1
12      0   0   0   1   1   1
13      0   0   0   0   0   0
14      1   1   3   0   2   0
15      2   1   1   1   0   2

fitting one of the traditional two-level nested models (children nested in neighborhoods or children nested in schools), it is possible that the models would produce biased estimates of the individual random effects (Dunn, Richmond, Milliren, & Subramanian, 2015).

This type of cross-classification structure can be seen in a classic educational data set from England. The pupcross data set, taken from Hox (2017), contains a number of educational achievement and sociodemographic variables from 1,000 children who are nested within both 50 primary and 30 secondary schools, which are crossed with one another. This cross-classification can be seen with a simple cross-tabulation. The cross-tabulation in Table 5.5 shows that children who attend one primary school (rows) are spread across multiple secondary schools (columns). It also reveals many empty cells in the data matrix. This type of sparse data caused problems with older types of mixed-effects modeling approaches, but modern techniques (e.g., those incorporated in R’s lme4 or in Stata) can easily handle cross-classified data.

Table 5.6 shows the results of a set of cross-classified models. First, we fit a null model. Note that we get a separate variance component estimate for each random factor, which is exactly what we want. The results show that the children’s achievement scores vary across both primary and secondary schools, although the variability is higher for primary schools. Building and interpreting more interesting models with various Level 1 and higher level predictors proceeds in much the same fashion as with our earlier models. The only difference here is that we need to keep in mind that there is not a simple nesting interpretation of the variance components when cross-classified factors are included in the model. The second model presented in Table 5.6 adds two Level 1 predictors: student gender and SES.
The results show that girls and children with higher SES have significantly higher achievement scores. The final model

Table 5.6 Cross-classified models of student achievement

                        Null                  Level 1               Levels 1 + 2
Fixed Effects           Coefficient   p       Coefficient   p       Coefficient   p
Intercept               6.35          .00     5.76          .00     5.52          .00
Gender: Female                                0.26          .00     0.26          .00
SES                                           0.11          .00     0.11          .00
Primary Den.                                                        0.20          .11
Secondary Den.                                                      0.18          .07

Random Effects          SD     Variance       SD     Variance       SD     Variance
Intercept: Primary      0.41   0.17           0.41   0.17           0.40   0.16
Intercept: Secondary    0.26   0.07           0.25   0.06           0.24   0.06
Residual                0.72   0.51           0.69   0.47           0.69   0.47

Model Fit               Parameter   AIC       Parameter   AIC       Parameter   AIC
                        4           2,326     6           2,255     8           2,253

Note: SES = socioeconomic status; Den. = denominational; AIC = Akaike information criterion.

adds two additional Level 2 predictors: whether the student attended a denominational (religious) primary or secondary school. Being in a denominational school may positively affect achievement, but neither estimate individually is quite significant. However, likelihood ratio tests (not shown here) indicate that the third model fits the data significantly better than the second. The random effects parts of the models show that achievement scores vary from school to school. Further model exploration suggests that the effects of student SES on achievement also vary across schools, but this variability is several orders of magnitude smaller than for the achievement intercept. In any case, this example demonstrates that mixed-effects model estimation and interpretation for cross-classified models is similar to that for nested multilevel models.
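One way to read the variance components of the null model in Table 5.6 is as shares of the total variance in achievement. The short sketch below does that arithmetic. The function name is hypothetical, and treating the three components as an additive partition of the total is an assumption that applies to the null model only.

```python
# Variance components of the null model, copied from Table 5.6.
NULL_MODEL_VARIANCES = {
    "primary school": 0.17,
    "secondary school": 0.07,
    "residual": 0.51,
}


def variance_shares(components):
    """Return each variance component as a proportion of their sum."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


shares = variance_shares(NULL_MODEL_VARIANCES)
for name, share in shares.items():
    print(f"{name:17s} {share:.1%}")
```

The primary school share comes out larger than the secondary school share, which mirrors the statement that achievement varies more across primary schools than across secondary schools.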

CHAPTER 6. LONGITUDINAL MODELS

They always say time changes things, but you actually have to change them yourself.
(Andy Warhol, 1975)

Longitudinal Data as Hierarchical: Time Nested Within Person

When we consider multilevel models, it is not unusual to think first of individual objects nested within a physical or social context, such as persons in neighborhoods or clinics in hospitals (see Table 1.2). However, as we saw earlier when we used a generalized linear mixed-effects model (GLMM) to model votes on individual bills, mixed-effects models can be applied to multiple observations nested within a single object. This opens up multilevel modeling to a wide variety of useful applications. In particular, mixed-effects models can be applied to longitudinal data where the primary interest is in modeling the structure and predictors of change over time.

Mixed-effects modeling has a number of important advantages when applied to longitudinal data compared with traditional analytic methods. Consider Table 6.1, which illustrates different types of “messy” longitudinal data that may appear in a typical longitudinal study. Subject 001 represents the ideal situation where a participant is enrolled, given a baseline interview within the first month, and then is given four follow-up interviews every 6 months thereafter. Subject 002 is an example of somebody who drops out of the study or is lost to follow-up. Subject 003 has missing data for one or more time points. Subject 004 has only one time point with valid data. Subject 005 has complete data for the baseline and four follow-up interviews, but the interviews occur either earlier or later than planned. Finally, Subject 006 has missing data and uneven time points.

Many traditional longitudinal modeling approaches, such as repeated-measures ANOVA, are unable to easily handle longitudinal data that are unbalanced, have missing data, or have uneven time points. Mixed-effects modeling, on the other hand, is much more flexible and efficient. It will use whatever data are available, and it can model change patterns even for data that are collected at varying time points.
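The flexibility described above depends on storing the data in long form, with one row per completed interview rather than one row per person. The sketch below illustrates the idea with invented values: the subject IDs echo the types of cases described above, but the months and scores are hypothetical and are not taken from any real data set. The function name is also hypothetical.

```python
from collections import Counter

# Hypothetical interview records: subject ID -> list of (month, score) pairs.
# Missed interviews are simply absent; interview months need not line up.
INTERVIEWS = {
    "001": [(1, 3), (7, 4), (13, 4), (19, 5), (25, 6)],  # complete, on schedule
    "002": [(1, 2), (7, 2)],                             # dropped out
    "004": [(1, 5)],                                     # a single observation
    "005": [(2, 1), (6, 1), (14, 2), (18, 2), (24, 3)],  # uneven timing
}


def to_long_format(interviews):
    """Flatten per-subject records into one row per completed interview."""
    rows = []
    for subject, observations in interviews.items():
        for month, score in observations:
            rows.append({"subject": subject, "month": month, "score": score})
    return rows


long_rows = to_long_format(INTERVIEWS)
rows_per_subject = Counter(row["subject"] for row in long_rows)

print(len(long_rows), "rows in total")
for subject, n_rows in sorted(rows_per_subject.items()):
    print(subject, n_rows)
```

In this layout an unbalanced design just means that subjects contribute different numbers of rows, and the month column carries the actual timing of each interview.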

Table 6.1 A “messy” longitudinal data set
[Grid with one row per subject (001 to 006) and one column per month (1 to 25); an X marks each month in which that subject completed an interview.]

Intraindividual Change

To use a mixed-effects model with longitudinal data, the data must have multiple observations nested within a higher level object, often an individual. In addition, there needs to be some type of time variable that can be used to order the observations across time. The first step in longitudinal modeling is often to examine the form of that change over time, or as it is sometimes called, intraindividual change (Singer & Willett, 2003). Intraindividual change analysis can answer the following questions: (1) Is change occurring, (2) what direction is it going in, and (3) is there evidence that there is more than simple linear change? Consider the following equation, which describes a basic longitudinal model:

\[
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j}(\text{Time})_{ij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j} \\
& \beta_{1j} = \gamma_{10}
\end{aligned}
\tag{6.1}
\]

This model would allow us to examine how the dependent variable changes over time, for all persons (or higher level objects) in the sample. Note that the individual intercept is allowed to vary across persons, but the effect of time here is treated as a fixed effect (and thus no variance component for time is included). However, time can be treated as a random effect in the model. This leads to a model that estimates how much the time effect (slope) on the dependent variable varies from person to person. This version of the basic longitudinal model would then appear as follows:

\[
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j}(\text{Time})_{ij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j} \\
& \beta_{1j} = \gamma_{10} + u_{1j}
\end{aligned}
\tag{6.2}
\]

These models are called unconditional growth models. They are growth models in that they are used to assess how the dependent variable grows (positively or negatively) over time. And they are unconditional because other than time they have no other predictors or covariates. (They are not true null models, because they have time as a Level 1 predictor.)

To illustrate a multilevel model of longitudinal data, we will use data taken from the Longitudinal Study of Aging (LSOA), which is a national biennial panel study of persons 70 years or older conducted from 1984 to 1990 by the National Center for Health Statistics in collaboration with the National Institute on Aging. The data used here were extracted from public use data files available at ICPSR (https://www.icpsr.umich.edu/icpsrweb/ICPSR/series/40). One of the major goals of the LSOA was to measure change in the functional status and living arrangements of older people, which makes it an ideal candidate for longitudinal modeling. Participants who were at least 70 years old were enrolled and interviewed in 1984, and then reinterviewed every 2 years until 1990. These data are available in the online support materials for this book, available at https://www.douglasluke.com. Table 6.2 shows the structure of the data file and variables used in this example. CaseID is the participant ID that uniquely identifies

Table 6.2 LSOA longitudinal data (long form)

CaseID   Wave   Wave2   NumADL   Married   Sex      Age84
1        0      0       0        No        Female   70
2        0      0       0        No        Male     87
2        1      1       0        No        Male     87
2        2      4       0        No        Male     87
2        3      9       1        No        Male     87
3        0      0       0        No        Female   71
3        3      9       0        No        Female   71
4        0      0       0        No        Male     78
4        1      1       5        No        Male     78
4        2      4       0        No        Male     78
4        3      9       1        No        Male     78

Note: LSOA = Longitudinal Study of Aging.

each study participant. NumADL is the number of types of activities of daily living with which the participant was having difficulties. This number could range from 0 to 7 and covered problems with bathing or showering, dressing, eating, getting in or out of bed or chair, walking, getting outside, and using or getting to the toilet. NumADL thus can be interpreted as a measure of functional status, with higher scores indicating lower functioning. Married is a binary variable that records whether the participant was married at the time of the interview. Because marital status can change, Married is considered to be a time-varying covariate. Wave indicates which interview is being done, and in this data set can be used as a time variable to model linear change in NumADL. It has been reparameterized by subtracting 1, so the first interview is at Wave = 0. Wave2 is the square of Wave, and will be used to test for quadratic change (see below). Age84 is the age of the participant at the beginning of the study, and Male is a binary variable indicating the gender of the participant. Both age and gender are measured once in the study, at the baseline interview. Therefore, they are (at least initially) treated as time-invariant (or constant) covariates.

Our primary purpose in this example will be to model the trajectory of problems of daily living over a 6-year period. The sample means presented in Table 6.3 suggest that problems of daily living increase during the observational period of the LSOA study. We will also be interested in how age, gender, and marital status influence these trajectories. The data used here include 20,283 interviews of 7,417 different participants. This is an average of 2.73 interviews per person. Given that every participant could have been interviewed four times, there is almost 32% missing data. In fact, only 2,330 participants (31.4%) have all four interviews.
In other words, it would not be advisable to use repeated-measures ANOVA on these data; multilevel modeling, on the other hand, will be able to analyze all the data without losing any cases or information. As suggested above, our first step in analyzing these data is to assess the form or structure of the change in NumADL over time. Table 6.4 presents the results of three basic intraindividual models. The first

Table 6.3 Sample means of number of ADLs over interview waves

                        Baseline   2 Years   4 Years   6 Years
Mean number of ADLs     0.72       1.21      1.19      1.32

Note: ADLs = activities of daily living.

Table 6.4 Comparison of three intraindividual change models

                  Null                  Linear                Quadratic
Fixed Effects     Coefficient   p       Coefficient   p       Coefficient   p
Intercept         1.10          .00     0.76          .00     0.73          .00
Wave                                    0.30          .00     0.46          .00
Wave2                                                         −0.06         .00

Random Effects    SD     Variance       SD     Variance       SD     Variance
Intercept         1.32   1.75           1.40   1.95           1.40   1.95
Residual          1.30   1.70           1.22   1.49           1.22   1.49

Model Fit         Parameter   AIC       Parameter   AIC       Parameter   AIC
                  3           77,821    4           76,515    5           76,477

Note: AIC = Akaike information criterion.

set of columns shows the true null model, which is included only for comparison. The middle set of columns presents an unconditional linear growth model. By linear here, we are referring to the type of change allowed in the model. By only including Wave as the time variable, we are able to assess the linear, straight-line, relationship between time and NumADL. For our data, we can see that there is a significant relationship, and that for any LSOA participant we would expect to see about a 0.30 increase in NumADL at each interview time point. That is, LSOA participants report modest increases in problems in daily living as they progress through the study.

Although informative, the linear model ignores a major advantage of having longitudinal data with multiple (three or more) time points: the ability to explore nonlinear (curvilinear) patterns of change. Curvilinear models can be extremely useful; they can suggest underlying mechanisms that correspond to particular theories of change, they can help identify cut points or threshold values where there are important shifts in the change patterns, and in general, they provide much richer longitudinal information than simpler linear growth models. One straightforward way to explore curvilinear models is to build polynomial change models, by including higher order transformations of the time variable (i.e., squared, cubed, etc.). One caution here is that if you enter these simple polynomial terms in the model, you are likely to end up with highly intercorrelated fixed-effect time parameters. This may

lead to problems with multicollinearity and model robustness. For this reason, it is often better to build polynomial models using orthogonal transformations of the polynomial coefficients (Smith & Sasaki, 1979). Although it is possible to create orthogonal transformations by hand (Robson, 1959), it is much more common to use built-in procedures that are available in mixed-effects and other statistical modeling software packages (e.g., R’s “poly” function, which can be used inside an lme4 model formula). With only four time points in the LSOA data, we are limited to quadratic, or possibly cubic change models. The last set of columns in Table 6.4 presents a quadratic change model of NumADLs. This model demonstrates that there are, in fact, nonlinear elements of the pattern of change over time.

Polynomial models can be hard to interpret correctly based on the raw parameter coefficients (especially if orthogonal contrasts have been used). It is useful, therefore, to examine graphs of predicted outcomes to better understand the change patterns implied by the model. Figure 6.1 shows the predicted trajectory of NumADLs over the lifetime of the LSOA study. We can see that NumADLs increase over time, but the increase starts to flatten out after about Year 4.

The choice of the time variable is critical for producing a meaningful longitudinal model. This is a particularly important decision because many longitudinal studies or data sets will provide more than one option for a time variable. Consider longitudinal studies of educational interventions. One time option could be a “wave” variable like we have seen here, where the variable is operationalized as amount of time since study start or since baseline interview. Or time could be measured as age, which could be operationalized as days, months, years, and so on. Alternatively, time could be measured in units that are more relevant for the setting; here it might be grade level or semester.
Given multiple options, how should you choose the most appropriate approach? Although you may proceed empirically, exploring what results you get with different time variables, it is always a good idea to let theory be your guide. Specifically, the time variable you choose should be consistent with the underlying theory of change that you are using in your study. For example, if the educational intervention you are studying is designed to work by influencing cognitive abilities, which have a strong developmental aspect, you might be better off using an age-based measure of time. Alternatively, if you are assessing a social support intervention that focuses on social systems within schools, then a grade-level operationalization of time might be more valid.
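The multicollinearity concern with raw polynomial terms, and the orthogonal alternative, can be illustrated with the four wave values used in the LSOA example. The sketch below builds orthogonal linear and quadratic columns by Gram-Schmidt orthogonalization in plain Python. This is only one way to construct such contrasts and is not necessarily the algorithm any particular package uses; the function names are hypothetical. The resulting columns should correspond, up to sign and scaling, to what built-in orthogonal polynomial routines return.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ca = [x - mean_a for x in a]
    cb = [y - mean_b for y in b]
    return dot(ca, cb) / math.sqrt(dot(ca, ca) * dot(cb, cb))


def orthogonal_polynomials(x, degree):
    """Gram-Schmidt on the columns 1, x, x^2, ..., x^degree.

    Returns `degree` unit-length columns (the constant column is dropped),
    each orthogonal to the constant and to every lower-order column.
    """
    basis = [[1.0] * len(x)]
    columns = []
    for power in range(1, degree + 1):
        column = [float(value) ** power for value in x]
        for previous in basis:
            weight = dot(column, previous) / dot(previous, previous)
            column = [c - weight * p for c, p in zip(column, previous)]
        norm = math.sqrt(dot(column, column))
        column = [c / norm for c in column]
        basis.append(column)
        columns.append(column)
    return columns


wave = [0, 1, 2, 3]
wave_squared = [w ** 2 for w in wave]
linear, quadratic = orthogonal_polynomials(wave, 2)

print("correlation of Wave and Wave squared:", round(correlation(wave, wave_squared), 3))
print("orthogonal linear column:   ", [round(v, 4) for v in linear])
print("orthogonal quadratic column:", [round(v, 4) for v in quadratic])
print("inner product of the two:   ", round(dot(linear, quadratic), 10))
```

With only four waves, the raw linear and squared terms are almost perfectly correlated, while the orthogonal columns are uncorrelated by construction. The price is that the fitted coefficients are no longer on the raw time scale.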

Figure 6.1 Quadratic change model for number of ADLs over time.
[Line plot. x-axis: years after study enrollment, 0 to 6. y-axis: predicted number of ADLs, 0.0 to 2.0.]
Note: ADLs = activities of daily living.

We can see the impact of this choice by returning to the LSOA data. The LSOA study is observational, so we do not have any theory that suggests that changes should be directly driven by the amount of time since study start. Instead, we might assume that decrease in functioning for older persons is, in general, a function of age. Age84 is a time-invariant covariate, because it is the age of each participant at study start in 1984. However, it is straightforward to calculate a time-varying version of age, by simply adding 2 years to Age84 for each wave. When we refit the quadratic model of NumADL using age instead of Wave, we still find significant linear and quadratic effects of time (b = −0.59 and 0.0044, respectively). Now when we plot the growth curve for this model, we get a very different picture of how NumADLs change over time (Figure 6.2).
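The time-varying age variable described above is simple to construct. The sketch below applies the rule of adding 2 years per wave to the rows for CaseID 2 in Table 6.2; the function name and the derived column names Age and AgeSq are hypothetical.

```python
YEARS_BETWEEN_WAVES = 2  # LSOA interviews were 2 years apart


def age_at_wave(age84, wave):
    """Age at a given interview, from baseline age and the 0-based wave number."""
    return age84 + YEARS_BETWEEN_WAVES * wave


# Rows for CaseID 2, taken from Table 6.2 (only the columns needed here).
rows = [
    {"CaseID": 2, "Wave": 0, "Age84": 87},
    {"CaseID": 2, "Wave": 1, "Age84": 87},
    {"CaseID": 2, "Wave": 2, "Age84": 87},
    {"CaseID": 2, "Wave": 3, "Age84": 87},
]

for row in rows:
    row["Age"] = age_at_wave(row["Age84"], row["Wave"])
    row["AgeSq"] = row["Age"] ** 2  # squared term for a quadratic age model
    print(row)
```

Age84 stays constant within a person, whereas the derived Age changes from row to row and so can serve as the Level 1 time variable.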

Figure 6.2 Quadratic change model for number of ADLs over age.
[Line plot. x-axis: age, 70 to 100. y-axis: predicted number of ADLs, 0 to 7.]
Note: ADLs = activities of daily living.

This model shows that decrease in functioning actually accelerates over time as people get older. The stark difference in the shape of the growth curves in the two models can be explained by the underlying distribution of ages in the LSOA sample. Most participants are in their early 70s, and age is right-skewed. This means that when we use “wave” as our time variable, even in the last wave most people are in their late 70s and early 80s. However, when we use actual age, we can see the shape of the time relationship over a larger age range, from 70 to more than 100 years. Given that this is an observational study, age turns out to be a more useful time variable than wave. Because time is an explanatory variable in a mixed-effects longitudinal model, it could be centered like any other covariate. As suggested earlier in Section “Centering,” decisions about centering should be based

on theoretical or interpretive grounds. We have seen in this section that changes in how time is measured or parameterized in a longitudinal model can have dramatic effects on how the effects of time are interpreted. Centering a time variable to reflect the average time in a study often makes it harder to interpret the growth curve, and it may also make it harder to compare results across studies. For these reasons, centering the time variable as a default analytic step is not recommended (Biesanz, Deeb-Sossa, Papadakis, Bollen, & Curran, 2004). For longitudinal models, time is really of the essence! Careful consideration of how time is measured and how it should be entered into a longitudinal model is critical, and analysis of other predictor effects should only proceed once you are satisfied with how you will handle time.

Interindividual Change

Once we understand the basic shape of change over time, we can move on to modeling the effects of other predictors. When we add new (nontime) predictors, these models are called interindividual models of change, because they are often focusing on how different types of people (or other objects) change over time. For example, if we put gender into a longitudinal model, we can explore how men change over time on some specific dependent variable compared with women. Covariates that do not themselves change over time are properly characteristics of the person; that is, they are entered as Level 2 predictors. For example, we can add gender as a Level 2 predictor in our quadratic change model:

\[
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j}(\text{Age})_{ij} + \beta_{2j}(\text{Age}^2)_{ij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01}(\text{Male})_{j} + u_{0j} \\
& \beta_{1j} = \gamma_{10} \\
& \beta_{2j} = \gamma_{20}
\end{aligned}
\tag{6.3}
\]

Notice that here both time variables are fixed effects. Gender here is entered as a Level 2 main effect: It operates in the model by simply shifting the overall level of the dependent variable up or down. So we can see if males have higher or lower levels of the dependent variable compared with females. This implies that women and men will have the same growth curve as each other. If we want to test the hypothesis that

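the two genders have different patterns of change over time, then we need to enter gender as a cross-level interaction with time:

\[
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j}(\text{Age})_{ij} + \beta_{2j}(\text{Age}^2)_{ij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01}(\text{Male})_{j} + u_{0j} \\
& \beta_{1j} = \gamma_{10} + \gamma_{11}(\text{Male})_{j} \\
& \beta_{2j} = \gamma_{20} + \gamma_{21}(\text{Male})_{j}
\end{aligned}
\tag{6.4}
\]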

Finally, time-varying covariates are entered as Level 1 predictors. The estimated coefficients are interpreted as the expected change in the dependent variable when a change in the predictor occurs. So, for example, if marital status is measured at every interview, then the effect of being married is interpreted as the effect of that status when it occurs (Equation 6.5). Thus, while estimating the fixed effects of a time-varying explanatory variable is straightforward in mixed-effects longitudinal models, interpreting these effects requires some care. Instead of thinking of how a covariate may distinguish effects between subgroups (e.g., males and females), the effect of a time-varying covariate is more properly viewed as a shift in the dependent variable when a change occurs in the covariate. For example, in a study of school performance over time, the type of school (elementary, middle, high school) could be entered as a time-varying explanatory variable. The estimated effect on school performance is not interpreted simply as the difference between middle school and high school students, but it is the effect on school performance when any student shifts from being a middle school student to being a high school student. See Singer and Willett (2003) for more details on the various ways to build main effects and interaction tests in longitudinal models.

\[
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j}(\text{Age})_{ij} + \beta_{2j}(\text{Age}^2)_{ij} + \beta_{3j}(\text{Married})_{ij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01}(\text{Male})_{j} + u_{0j} \\
& \beta_{1j} = \gamma_{10} + \gamma_{11}(\text{Male})_{j} \\
& \beta_{2j} = \gamma_{20} + \gamma_{21}(\text{Male})_{j} \\
& \beta_{3j} = \gamma_{30}
\end{aligned}
\tag{6.5}
\]

Table 6.5 presents the results of three different interindividual longitudinal models of NumADL, which demonstrate how both time-varying and time-invariant covariates can be used. In Model 1, gender is entered as a Level 2 (time-invariant) main effects predictor. The negative and significant parameter estimate indicates that males, on average, have lower (healthier) NumADL trajectories compared with women. The second model adds marital status, a Level 1

Table 6.5 Comparison of three interindividual change models

                          Model 1               Model 2               Model 3
Fixed Effects             Coefficient   p       Coefficient   p       Coefficient   p
Intercept (β0j)
  Intercept (γ00)         1.18          .00     1.20          .00     1.20          .00
  Male (γ01)              −0.19         .00     −0.15         .00     −0.16         .00
Age slope (β1j)
  Age (γ10)               100.27        .00     99.03         .00     103.03        .00
  Male (γ11)                                                          −12.76        .00
Age² slope (β2j)
  Age² (γ20)              26.04         .00     26.01         .00     27.74         .00
  Male (γ21)                                                          −6.26         .10
Married slope (β3j)
  Married (γ30)                                 −0.08         .02     −0.08         .02

Random Effects            SD     Variance       SD     Variance       SD     Variance
Intercept (u0j)           1.30   1.68           1.30   1.68           1.30   1.68
Level 1 (rij)             1.22   1.48           1.22   1.48           1.22   1.48

Model Fit                 Parameter   AIC       Parameter   AIC       Parameter   AIC
                          6           75,581    7           75,578    9           75,573

Note: AIC = Akaike information criterion.

(time-varying) predictor. Being or becoming married also provides a significant benefit to functioning. Model 3 tests a cross-level interaction hypothesis; that is, do men and women show different growth curves of NumADL? This is done by including an interaction of gender with both the linear and the quadratic age components. The results suggest that men and women do differ in their patterns of change, but only in the linear component. Men have a reduced growth rate over time, compared with women. The lower AIC and a significant likelihood ratio test (not shown) both indicate that Model 3 is the best fitting model. (Note that the raw parameter coefficients shown here are not comparable with those shown in Table 6.4 because the models presented here used a set of orthogonal polynomial contrasts for the quadratic age components.)
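It can help to see where the cross-level interactions in Model 3 come from. Substituting the Level 2 equations of Equation 6.5 into its Level 1 equation gives a single-equation form, in the same spirit as Equation 5.12. The grouping of terms below is an illustrative rearrangement; it is not an equation that appears in the text.

```latex
\begin{aligned}
Y_{ij} ={} & \gamma_{00} + \gamma_{01}(\text{Male})_{j}
           + \gamma_{10}(\text{Age})_{ij} + \gamma_{20}(\text{Age}^2)_{ij}
           + \gamma_{30}(\text{Married})_{ij} \\
         & + \gamma_{11}(\text{Male})_{j}(\text{Age})_{ij}
           + \gamma_{21}(\text{Male})_{j}(\text{Age}^2)_{ij} \\
         & + u_{0j} + r_{ij}
\end{aligned}
```

The first line holds the main effects, the second line holds the two cross-level interactions of gender with the linear and quadratic age terms, and the third line holds the random part. Counting seven fixed effects plus the variances of u0j and rij gives nine parameters, which agrees with the parameter count listed for Model 3 in Table 6.5.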

Figure 6.3 Predicted ADL trajectories over time by gender.
[Line plot. x-axis: age, 70 to 110. y-axis: predicted number of ADLs, 0 to 7. Two curves: unmarried women and unmarried men.]
Note: ADLs = activities of daily living.

Once again it is helpful to produce a prediction graph to accurately interpret the model results. Figure 6.3 shows that the predicted number of problems in daily living increases almost exponentially as older, unmarried persons age. Women in their 70s have about the same functioning level as men do, but over time a gap develops. However, the gap remains relatively small compared with the increased risk as aging occurs. It is probably important to not overinterpret the predictions at the far end of the growth curve; there are relatively few people in the data set who survived past their 90s.

Alternative Covariance Structures

Many of the model-building steps for longitudinal data are the same as those we have used in the previous multilevel models. However, there is one additional consideration that is important when dealing with longitudinal models. In nonlongitudinal models, we typically assume that errors (residuals) are normally distributed and independent. This independence assumption is often not appropriate for longitudinal data. In particular, the measurements of the outcome variable across occasions (within persons) may have certain types of temporal structure that are not captured in the basic mixed-effects model. Therefore, the analyst may want to choose an appropriate alternative error covariance structure for the longitudinal data. Most multilevel modeling software will allow specification of alternative covariance structures, although packages vary in user-friendliness.

Table 6.6 shows how variance and covariance structures are defined for four common types of longitudinal models. Table 6.7 then presents the fit indices for the LSOA longitudinal Model 1 with these four different covariance structures. This table also lists the number of estimated random-effects parameters (excluding the Level 1 residual) that are produced for each specific model. The covariance matrices listed here are based on our LSOA data, which can have up to four repeated measures for each participant. So the dimensions are 4 by 4, to correspond with four time points.

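Table 6.6 Common alternative covariance structures used in longitudinal models

Default:

\[
\sigma^2
\]

Compound symmetry (homogeneous):

\[
\begin{bmatrix}
\sigma^2 & \tau^2 & \tau^2 & \tau^2 \\
\tau^2 & \sigma^2 & \tau^2 & \tau^2 \\
\tau^2 & \tau^2 & \sigma^2 & \tau^2 \\
\tau^2 & \tau^2 & \tau^2 & \sigma^2
\end{bmatrix}
\]

Autoregressive:

\[
\begin{bmatrix}
\sigma_T^2 & r_T^1\sigma_T^2 & r_T^2\sigma_T^2 & r_T^3\sigma_T^2 \\
r_T^1\sigma_T^2 & \sigma_T^2 & r_T^1\sigma_T^2 & r_T^2\sigma_T^2 \\
r_T^2\sigma_T^2 & r_T^1\sigma_T^2 & \sigma_T^2 & r_T^1\sigma_T^2 \\
r_T^3\sigma_T^2 & r_T^2\sigma_T^2 & r_T^1\sigma_T^2 & \sigma_T^2
\end{bmatrix}
\]

Unstructured correlations:

\[
\begin{bmatrix}
\sigma_T^2 & \sigma_{T12} & \sigma_{T13} & \sigma_{T14} \\
\sigma_{T21} & \sigma_T^2 & \sigma_{T23} & \sigma_{T24} \\
\sigma_{T31} & \sigma_{T32} & \sigma_T^2 & \sigma_{T34} \\
\sigma_{T41} & \sigma_{T42} & \sigma_{T43} & \sigma_T^2
\end{bmatrix}
\]

Source: Adapted from Figure 4.2 in Hoffman (2015).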

Table 6.7 LSOA Model 1 fit with different error covariance structures

Name                               Random Effects Parameter   Deviance   AIC      BIC
Default                            1                          75,574     75,586   75,633
Compound symmetry (homogeneous)    2                          75,574     75,588   75,643
Autoregressive                     2                          74,971     74,985   75,041
Unstructured correlations          7                          74,907     74,931   75,026

Note: LSOA = Longitudinal Study of Aging; AIC = Akaike information criterion; BIC = Bayesian information criterion.
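The fit indices in Table 6.7 can be related to one another with the usual definitions, AIC = deviance + 2k and BIC = deviance + k ln(n). The sketch below rests on two assumptions that are not stated in the table: that the total parameter count k is the four fixed effects of Model 1 plus the Level 1 residual variance plus the listed random-effects parameters, and that n is the number of interviews (20,283). Under those assumptions the tabled AIC values are reproduced exactly and the BIC values to within rounding.

```python
import math

N_INTERVIEWS = 20283   # assumed sample size for BIC (number of interviews)
N_FIXED = 4            # Model 1: intercept, Male, linear age, quadratic age
N_RESIDUAL = 1         # Level 1 residual variance (excluded from the table's count)

# name: (random-effects parameters, deviance, AIC, BIC), copied from Table 6.7
TABLE_6_7 = {
    "Default": (1, 75574, 75586, 75633),
    "Compound symmetry": (2, 75574, 75588, 75643),
    "Autoregressive": (2, 74971, 74985, 75041),
    "Unstructured correlations": (7, 74907, 74931, 75026),
}


def total_parameters(n_random):
    """Assumed total parameter count: fixed effects + residual + random effects."""
    return N_FIXED + N_RESIDUAL + n_random


def aic(deviance, k):
    """Akaike information criterion from a deviance and parameter count."""
    return deviance + 2 * k


def bic(deviance, k, n):
    """Bayesian information criterion from a deviance, parameter count, and n."""
    return deviance + k * math.log(n)


for name, (n_random, deviance, table_aic, table_bic) in TABLE_6_7.items():
    k = total_parameters(n_random)
    print(f"{name:26s} k={k:2d}  AIC={aic(deviance, k)} (table {table_aic})  "
          f"BIC={bic(deviance, k, N_INTERVIEWS):.1f} (table {table_bic})")
```

Because both indices add a penalty to the same deviance, a structure with more parameters has to lower the deviance by more than its extra penalty to be preferred.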

The first type of model, called Default, estimates no covariance parameters for the error terms, and so it does not allow any pattern of correlated errors across occasions. This default will always have the fewest random parameters, because no covariances across occasions are estimated. The only random-effects parameter estimated is the variance of the intercepts (σ²). For longitudinal models, we often assume there will be some correlated errors across occasions, and these may be theoretically or substantively interesting. Therefore, the default model can be used as a baseline model against which other models are compared.

Instead of not estimating any covariances, you can assume that the covariances are constant across all occasions, or, equivalently, that there is a single value for all correlations between time points. This fairly restrictive assumption is called compound symmetry, or sometimes homogeneous error. This assumption is attractive because of its parsimony: the model requires only one additional random-effects parameter. This parameter (τ²) captures the constant covariance between observations at two different time points within the same person. However, compound symmetry is a restrictive assumption that is not usually met by real-world data. For example, it assumes that the correlation between two observations close to each other in time would be exactly the same as the correlation between two much more widely spaced observations.

An alternative covariance structure that is often used for longitudinal data is the autoregressive structure, also sometimes called first-order autoregressive. This error structure lies between a fully unrestricted error structure and the highly restrictive homogeneous error. An autoregressive covariance structure assumes that error terms are correlated across

93 first-order lags. So if the lag correlation is estimated at .30, this means that errors at Time 1 and Time 2 are correlated at .30, Time 2 and Time 3 at .30, and so on. The implication of this is that longer lags have smaller correlations. That is, the errors at Time 1 and Time 3 will be correlated less than .30. The autoregressive structure is as parsimonious as homogeneous error; only one additional parameter needs to be estimated: rho, which is the first-order correlation. The autoregressive covariance structure is just as parsimonious as compound symmetry, but it is based on a covariance pattern that often matches real-world longitudinal data more closely. Finally, it is possible to fit a covariance structure that makes no assumptions about the patterns of covariances across observations. The unstructured correlations structure will estimate a separate covariance for all possible time observation pairs. So, for example, the correlation between Time 1 and Time 2 observations could be different from Time 3 and Time 4. This type of unstructured model will often fit the data better, at the cost of dramatically higher number of parameters in the model. There is usually little theoretical justification for exploring an unstructured model, and by fitting all the possible correlations you run the risk of overfitting the data. For these reasons, as well as for its lack of parsimony, it may not be as attractive as an autoregressive model. The model fit indices in Table 6.7 show that for the LSOA model, the autoregressive and unstructured covariance structures provide better fit to the data compared with the default or compound symmetry models. The autoregressive model results are probably preferred, given our previous concerns about parsimony, lack of theory, and risk of overfitting. This section just touches on the most important topics of using multilevel models with longitudinal data. 
For more details on such issues as additional error covariance structures, how to model nonlinear and discontinuous change, the relationship to time-series analysis, and software issues, the reader should seek out one of these excellent advanced volumes: Hoffman (2015); Little, Schnabel, and Baumert (2000); Moskowitz and Hershberger (2002); or Singer and Willett (2003).
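The difference between the compound symmetry, autoregressive, and unstructured options is easiest to see in the correlation matrices they imply. The following is a minimal Python sketch, not taken from the book's materials: the function names and the choice of four occasions are illustrative assumptions, and the value .30 simply reuses the lag correlation from the example above.

```python
def compound_symmetry(n_times, rho):
    """Correlation matrix with one constant correlation between all occasions."""
    return [[1.0 if i == j else rho for j in range(n_times)]
            for i in range(n_times)]


def autoregressive(n_times, rho):
    """First-order autoregressive matrix: correlation decays as rho ** lag."""
    return [[rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


def unstructured_pairs(n_times):
    """Number of distinct occasion pairs, each with its own covariance."""
    return n_times * (n_times - 1) // 2


if __name__ == "__main__":
    # Four occasions and rho = .30 are illustrative choices only.
    for name, matrix in (("compound symmetry", compound_symmetry(4, 0.30)),
                         ("autoregressive", autoregressive(4, 0.30))):
        print(name)
        for row in matrix:
            print("  " + "  ".join(f"{value:.3f}" for value in row))
    print("unstructured covariances to estimate:", unstructured_pairs(4))
```

With four occasions, compound symmetry puts .30 in every off-diagonal cell, the autoregressive matrix gives .30 at lag 1, .09 at lag 2, and .027 at lag 3, and the unstructured option would need six separate covariances. Both structured options use a single extra parameter, which is why they are equally parsimonious.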

CHAPTER 7. GUIDANCE

Learning had to be digested. You didn’t just have to know, you have to comprehend.
—Terry Pratchett, Unseen Academicals, 2009

Recommendations for Presenting Results

Given the complexities of building, testing, and interpreting multilevel and other types of mixed-effects models, it is sometimes challenging to figure out the best way to present model results. Every analytic situation will have its own challenges, so it is impossible to have a set of specific directions that can always be followed to produce effective model results tables and figures. However, if the following general recommendations are kept in mind while designing the tables and figures, you will be more likely to produce effective analytic dissemination products.

1. Make sure that the random effects and multilevel structure of the model are clear.
2. Treat a model summary table as an infographic.
3. Design the table to emphasize the most important point(s) of the model results.
4. Make good use of prediction figures to illustrate the most important fixed and random effects.

The first recommendation is most specific to mixed-effects models. The main thing that distinguishes a multilevel (mixed-effects) model is the inclusion of one or more random effects. Therefore, it is important to include some information about those effects in the summary table. The actual estimates of those effects may not be substantively interesting (e.g., compared with the fixed-effects parameter estimates), but the viewer needs to be reminded of the mixed-effects structure of the model. Similarly, it is often helpful to highlight the multilevel structure of the model, usually by indicating which predictors are modeled at which level. This both aids general understanding and makes it less likely that complicated parts of the model will be misinterpreted.
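One way to make the first recommendation concrete is to build the summary table from results that are already grouped by level and by fixed versus random effects. The sketch below is a hypothetical helper, not part of any statistical package: the function name `format_results`, the tuple layout, the column widths, and all example values are assumptions made purely for illustration.

```python
def format_results(fixed, random_effects, digits=3):
    """Return a plain-text summary grouped by level, fixed effects first.

    fixed: list of (level, term, coefficient, standard_error) tuples.
    random_effects: list of (term, variance) tuples.
    """
    lines = ["Fixed Effects"]
    lines.append(f"{'':<22}{'Coefficient':>12}{'SE':>10}")
    # Group predictors under a heading for the level at which they are modeled.
    for level in sorted({row[0] for row in fixed}):
        lines.append(f"  Level {level}")
        for lvl, term, coef, se in fixed:
            if lvl == level:
                lines.append(
                    f"    {term:<18}{coef:>12.{digits}f}{se:>10.{digits}f}")
    # A blank line (white space, not a rule) separates the random effects.
    lines.append("")
    lines.append("Random Effects")
    lines.append(f"{'':<22}{'Variance':>12}{'SD':>10}")
    for term, variance in random_effects:
        sd = variance ** 0.5
        lines.append(f"    {term:<18}{variance:>12.{digits}f}{sd:>10.{digits}f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values, used only to show the layout.
    fixed = [
        (1, "Intercept", 0.25, 0.02),
        (1, "Predictor A", 0.40, 0.03),
        (2, "Predictor B", 0.10, 0.05),
    ]
    random_effects = [("Intercept", 0.04), ("Level 1 residual", 0.09)]
    print(format_results(fixed, random_effects))
```

The design choice here is that the level headings and the separate random-effects block are produced by the table-building code itself, so the multilevel structure cannot be dropped by accident when the table is regenerated.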

The second recommendation may sound a little paradoxical, but good summary tables should be designed as if they were data graphics. Model summary tables act as look-up tables where specific numeric information can be easily found, but they also serve as infographics where important aspects of model results can be concisely presented. As such, following basic data visualization design principles (e.g., those found in Edward Tufte’s 2001 work) can produce more effective tables. These principles include the following: remove all vertical lines, minimize horizontal lines, use white space for table structure, use consistent alignment, remove unneeded detail (e.g., too many digits), minimize repeated text, use graphical elements within the table, include both macro and micro details, use meaningful row sorts, and so on. For a nice dynamic presentation of some of these principles, see www.darkhorseanalytics.com/blog/clear-off-the-table.

Graphical displays also play an important role in communicating the results of a multilevel or longitudinal model. (See Chapter 6 for some examples of longitudinal model results figures.) Graphical exploration is, of course, highly useful for assessing model diagnostics and fit even if these types of graphics will not end up being used in presentations or publications. However, even with a clearly designed model results table and a statistically savvy audience, it can be challenging to quickly and accurately interpret model results tables, especially for models that have numerous interaction terms and complicated random-effect structures. For this reason, it is often quite helpful to produce and display model prediction figures similar to those used here (see especially Figures 3.4, 4.7, 5.4, and 6.3).

Finally, keep in mind that a model results table should not simply display all the numeric results of a model or all the numbers produced in the statistical output.
Instead, it should be designed to emphasize the most important aspects of the results. These are rarely the p values for the individual parameter estimates. More important may be the overall fit of the model, how nested models compare with one another, how blocks of variables contribute to overall model fit, the effect sizes of the covariates, the precision of the parameter estimates, the pattern of cross-level interactions, and so on.

Table 7.1 is an example that helps illustrate some of these recommendations. In particular, this table uses consistent alignment, white space, and a small number of horizontal rules (lines) to help structure the information and make it easier to see which parts of the model results are in which part of the table (e.g., by clearly separating the fixed and random effects). This table, in particular, is designed to facilitate model comparison. Various model fit statistics are included so that the reader can compare models with each other. The structure also helps the reader see which blocks of variables are introduced in which model. (Also note how we do not include “NAs” or dashes to indicate the variables that are not included in the earlier models. White space makes it easier to see this.)

Table 7.1  Example results table for mixed-effects model comparisons

                   Model 0 (Null)              Model 1 (Level 1)           Model 2 (Levels 1 and 2)
Fixed Effects      Coefficient  SE      p      Coefficient  SE      p      Coefficient  SE      p
Level 1
  Intercept        0.5309       0.0311  .000   0.2125       0.0222  .000   0.2023       0.0246  .000
  Party                                        0.4887       0.0219  .000   0.4884       0.0222  .000
  Money                                        0.0042       0.0005  .000   0.0041       0.0005  .000
Level 2
  Acres                                                                    0.0006       0.0003  .089
  GSP                                                                      0.0000       0.0001  .770

Random Effects     Variance Component   SD     Variance Component   SD     Variance Component   SD
  Intercept        0.035                0.188  0.013                0.116  0.011                0.105
  Party slope                                  0.008                0.089  0.008                0.092
  Level 1          0.093                0.304  0.027                0.166  0.027                0.165

Model Fit
  Deviance         313.7                       −325.3                      −327.7
  Parameters       3                           7                           9
  AIC              319.7                       −311.3                      −309.7

Note: SE = standard error; AIC = Akaike information criterion.
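The model fit rows of Table 7.1 are linked by simple arithmetic that a reader can check. The sketch below assumes the usual definition, AIC = deviance + 2 × (number of parameters), and uses the deviance and parameter counts as read from the table. The likelihood ratio comparison of Models 1 and 2 is an added illustration rather than a result reported in the table; it relies on the closed-form chi-square tail probability for 2 degrees of freedom and is meaningful only if both models were fitted by full maximum likelihood.

```python
import math

# Deviance and parameter counts as read from Table 7.1.
models = {
    "Model 0 (Null)": {"deviance": 313.7, "parameters": 3},
    "Model 1 (Level 1)": {"deviance": -325.3, "parameters": 7},
    "Model 2 (Levels 1 and 2)": {"deviance": -327.7, "parameters": 9},
}


def aic(deviance, n_parameters):
    """AIC under the usual definition: deviance plus twice the parameter count."""
    return deviance + 2 * n_parameters


def lrt_p_value_2df(deviance_smaller_model, deviance_larger_model):
    """Chi-square tail probability for a deviance difference on 2 df.

    For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    """
    difference = deviance_smaller_model - deviance_larger_model
    return math.exp(-difference / 2)


for name, fit in models.items():
    print(f"{name}: AIC = {aic(fit['deviance'], fit['parameters']):.1f}")

# Model 2 adds two Level 2 fixed effects (Acres, GSP) to Model 1.
p = lrt_p_value_2df(models["Model 1 (Level 1)"]["deviance"],
                    models["Model 2 (Levels 1 and 2)"]["deviance"])
print(f"Model 1 vs. Model 2: p = {p:.3f}")
```

The recomputed AIC values (319.7, −311.3, and −309.7) match the table. The deviance drops by only 2.4 when the two Level 2 predictors are added, giving p of about .30, which agrees with the slightly lower AIC for Model 1 and with the nonsignificant p values for Acres and GSP.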

Useful Resources

In the intervening years since the publication of the first edition of this book, mixed-effects modeling has become more mainstream and used more widely by social and health scientists. Mixed-effects modeling procedures are available in all the major software packages. There are a wide variety of mixed-effects, multilevel, and longitudinal modeling textbooks available, covering both general statistical theory of these methods and software-specific applications. Graduate training in statistical methods is more likely to include mixed-effects models, either as part of more general advanced modeling classes or in their own more focused courses. Possibly because of this increased familiarity, there are actually fewer specialized training opportunities available, and somewhat fewer online resources dedicated specifically to these types of models. In the rest of this section, I provide a short, curated list of what I think are the most useful book, web-based, and software resources that support multilevel and longitudinal modeling best practices. This information is also available at the online support website for this book: https://www.douglasluke.com.

Books

• Hox, J. J., Moerbeek, M., & Van de Schoot, R. (2017). Multilevel analysis: Techniques and applications. New York, NY: Routledge.
• Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1). Thousand Oaks, CA: Sage.
• Snijders, T. A. B., & Bosker, R. J. (2011). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). Thousand Oaks, CA: Sage.
• West, B. T., Welch, K. B., & Gałecki, A. T. (2015). Linear mixed models: A practical guide using statistical software (2nd ed.). Boca Raton, FL: CRC Press.
• Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2011). Applied longitudinal analysis (2nd ed.). Hoboken, NJ: John Wiley.
• Hoffman, L. (2015). Longitudinal analysis: Modeling within-person fluctuation and change. New York, NY: Routledge.
• Menard, S. (Ed.). (2008). Handbook of longitudinal research: Design, measurement, and analysis. Burlington, MA: Academic Press.
• Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford, England: Oxford University Press.

Software Specific Resources

• R
  – Zuur, A. F., Ieno, E. N., Walker, N., Saveliev, A. A., & Smith, G. M. (2009). Mixed effects models and extensions in ecology with R. New York, NY: Springer. (Great in-depth coverage of mixed-effects models in R, with detailed coding examples. Worth looking at even if you are not an ecologist.)
  – Gałecki, A., & Burzykowski, T. (2013). Linear mixed-effects models using R. New York, NY: Springer. (Somewhat technical, but comes with a number of detailed analytic examples.)
• Stata
  – Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and longitudinal modeling using Stata (3rd ed.). College Station, TX: StataCorp LP. (This actually comes in two volumes, and you will want both of them.)
• SAS
  – Littell, R. C., Milliken, G. A., Stroup, W. W., Wolfinger, R. D., & Schabenberger, O. (2006). SAS for mixed models (2nd ed.). Cary, NC: SAS Institute.
  – Stroup, W. W., Milliken, G. A., Claassen, E. A., & Wolfinger, R. D. (2018). SAS for mixed models: Introduction and basic applications. Cary, NC: SAS Institute.
• SPSS
  – Heck, R. H., Thomas, S. L., & Tabata, L. N. (2013). Multilevel and longitudinal modeling with IBM SPSS (2nd ed.). New York, NY: Routledge.

Online Resources

• GLMM FAQ: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html
  Although focused particularly on mixed-effects models in R, this is in fact one of the best repositories of current thinking, challenges, and software implementation of general and generalized mixed-effects models.
• UCLA IDRE: https://stats.idre.ucla.edu/
  UCLA’s Institute for Digital Research and Education maintains an in-depth repository of general statistical education resources. This site includes a number of helpful resources for mixed-effects modeling, including data sets and code examples.
• Centre for Multilevel Modelling: http://www.bristol.ac.uk/cmm/
  Website for the Centre for Multilevel Modelling at the University of Bristol, England. Some good resources are listed in the training section; the software section focuses particularly on MLwiN.
• Cross Validated: https://stats.stackexchange.com/
  Part of the Stack Exchange network, this is a great general-purpose question-and-answer site focusing on statistics, data analysis, and data visualization. A great resource for information on mixed-effects models, as well as guidance for using specific statistical platforms (e.g., lme4 in R).

REFERENCES

Aguinis, H., Boyd, B. K., Pierce, C. A., & Short, J. C. (2011). Walking new avenues in management research methods and theories: Bridging micro and macro domains. Journal of Management, 37(2), 395–403.
Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52(3), 317–332.
Arend, M. G., & Schäfer, T. (2019). Statistical power in two-level models: A tutorial based on Monte Carlo simulation. Psychological Methods, 24(1), 1–19.
Arnold, B. F., Hogan, D. R., Colford, J. M., & Hubbard, A. E. (2011). Simulation methods to estimate design power: An overview for applied research. BMC Medical Research Methodology, 11(1), Article 94.
Beidas, R. S., Marcus, S., Aarons, G. A., Hoagwood, K. E., Schoenwald, S., Evans, A. C., . . . Mandell, D. S. (2015). Predictors of community therapists’ use of therapy techniques in a large public mental health system. JAMA Pediatrics, 169(4), 374–382.
Béland, D., & Howlett, M. (2016). The role and impact of the multiple-streams approach in comparative policy analysis. Journal of Comparative Policy Analysis: Research and Practice, 18(3), 221–227.
Bell, A., Jones, K., & Fairbrother, M. (2018). Understanding and misunderstanding group mean centering: A commentary on Kelley et al.’s dangerous practice. Quality & Quantity, 52(5), 2031–2036.
Biesanz, J. C., Deeb-Sossa, N., Papadakis, A. A., Bollen, K. A., & Curran, P. J. (2004). The role of coding time in estimating and interpreting growth curve models. Psychological Methods, 9(1), 30–52.
Boehmer, T. K., Luke, D. A., Haire-Joshu, D. L., Bates, H. S., & Brownson, R. C. (2008). Preventing childhood obesity through state policy. American Journal of Preventive Medicine, 34(4), 333–340.
Bohning, D., Dietz, E., Schlattmann, P., Mendonca, L., & Kirchner, U. (1999). The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. Journal of the Royal Statistical Society: Series A (Statistics in Society), 162(2), 195–209.
Boyle, M. H., & Willms, J. D. (2001). Multilevel modelling of hierarchical data in developmental studies. Journal of Child Psychology and Psychiatry, 42(1), 141–162.
Buka, S. L., Brennan, R. T., Rich-Edwards, J. W., Raudenbush, S. W., & Earls, F. (2003). Neighborhood support and the birth weight of urban infants. American Journal of Epidemiology, 157(1), 1–8.
Carroll, K. K. (1975). Experimental evidence of dietary factors and hormone-dependent cancers. Cancer Research, 35(11 Pt 2), 3374–3383.
Cleveland, W. S. (1993). Visualizing data. Summit, NJ: Hobart Press.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. Burlington, MA: Academic Press.
Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. New York, NY: Chapman & Hall.
Curran, P. J., Stice, E., & Chassin, L. (1997). The relation between adolescent alcohol use and peer alcohol use: A longitudinal random coefficients model. Journal of Consulting and Clinical Psychology, 65(1), 130–140.
Diez-Roux, A. V. (1998). Bringing context back into epidemiology: Variables and fallacies in multilevel analysis. American Journal of Public Health, 88(2), 216–222.

Diez-Roux, A. V., Merkin, S. S., Arnett, D., Chambless, L., Massing, M., Nieto, F. J., . . . Watson, R. L. (2001). Neighborhood of residence and incidence of coronary heart disease. New England Journal of Medicine, 345(2), 99–106.
Duncan, C., Jones, K., & Moon, G. (1998). Context, composition and heterogeneity: Using multilevel models in health research. Social Science & Medicine, 46(1), 97–117.
Dunn, E. C., Richmond, T. K., Milliren, C. E., & Subramanian, S. V. (2015). Using cross-classified multilevel models to disentangle school and neighborhood effects: An example focusing on smoking behaviors among adolescents in the United States. Health & Place, 31, 224–232.
Enders, C. K., & Tofighi, D. (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12(2), 121–138.
Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2011). Applied longitudinal analysis (2nd ed.). Hoboken, NJ: John Wiley.
Freedman, D. A. (1999). Ecological inference and the ecological fallacy. In N. J. Smelser & P. B. Baltes (Eds.), International encyclopedia of the social & behavioral sciences (Vol. 6, pp. 4027–4030). Amsterdam, Netherlands: Elsevier.
Gałecki, A. T., & Burzykowski, T. (2013). Linear mixed-effects models using R: A step-by-step approach. New York, NY: Springer.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis. Boca Raton, FL: CRC Press.
Gill, J., & Torres, M. (2019). Generalized linear models (2nd ed.). Thousand Oaks, CA: Sage.
Glass, T. A., & McAtee, M. J. (2006). Behavioral science at the crossroads in public health: Extending horizons, envisioning the future. Social Science & Medicine, 62(7), 1650–1671.
Goldstein, H., Yang, M., Omar, R., Turner, R., & Thompson, S. (2000). Meta-analysis using multilevel models with an application to the study of class size effects. Journal of the Royal Statistical Society: Series C (Applied Statistics), 49(3), 399–412.
Goodman, S. N., & Berlin, J. A. (1994). The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Annals of Internal Medicine, 121(3), 200–206.
Green, P., & MacLeod, C. J. (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498.
Harrell, F. E. (2015). Regression modeling strategies: With applications to linear models, logistic and ordinal regression, and survival analysis (2nd ed.). New York, NY: Springer.
Heck, R. H., & Thomas, S. L. (2009). An introduction to multilevel modeling techniques. New York, NY: Routledge.
Heck, R. H., Thomas, S. L., & Tabata, L. N. (2013). Multilevel and longitudinal modeling with IBM SPSS (2nd ed.). New York, NY: Routledge.
Hoffman, L. (2015). Longitudinal analysis: Modeling within-person fluctuation and change. New York, NY: Routledge.
Holmes, M. D., Hunter, D. J., Colditz, G. A., Stampfer, M. J., Hankinson, S. E., Speizer, F. E., . . . Willett, W. C. (1999). Association of dietary intake of fat and fatty acids with risk of breast cancer. JAMA Journal of the American Medical Association, 281(10), 914–920.
Howe, L. D., Tilling, K., Matijasevich, A., Petherick, E. S., Santos, A. C., Fairley, L., . . . Lawlor, D. A. (2016). Linear spline multilevel models for summarising childhood growth trajectories: A guide to their application using examples from five birth cohorts. Statistical Methods in Medical Research, 25(5), 1854–1874.

Hox, J. J., Moerbeek, M., & van de Schoot, R. (2017). Multilevel analysis: Techniques and applications (3rd ed.). New York, NY: Routledge.
Institute of Medicine, Gebbie, K., Rosenstock, L., & Hernandez, L. M. (2003). Who will keep the public healthy?: Educating public health professionals for the 21st century. Washington, DC: National Academies Press.
Kain, M. P., Bolker, B. M., & McCoy, M. W. (2015). A practical guide and power analysis for GLMMs: Detecting among treatment variation in random effects. PeerJ, 3, e1226.
Koster, J., Leckie, G., Miller, A., & Hames, R. (2015). Multilevel modeling analysis of dyadic network data with an application to Ye’kwana food sharing. American Journal of Physical Anthropology, 157(3), 507–512.
Kreft, I. G. G., & de Leeuw, J. (1998). Introducing multilevel modeling. Thousand Oaks, CA: Sage.
Lazarsfeld, P. F., & Menzel, H. (1961). On the relation between individual and collective properties. In A. Etzioni (Ed.), Complex organizations: A sociological reader (pp. 422–440). New York, NY: Holt, Rinehart & Winston.
Littell, R. C., Milliken, G. A., Stroup, W. W., Wolfinger, R. D., & Schabenberger, O. (2006). SAS for mixed models (2nd ed.). Cary, NC: SAS Institute.
Little, T. D., Schnabel, K. U., & Baumert, J. (2000). Modeling longitudinal and multilevel data: Practical issues, applied approaches, and specific examples. Mahwah, NJ: Lawrence Erlbaum.
Longford, N. T. (1995). Random coefficient models. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 519–570). Boston, MA: Springer.
Lottig, N. R., Wagner, T., Norton Henry, E., Spence Cheruvelil, K., Webster, K. E., Downing, J. A., & Stow, C. A. (2014). Long-term citizen-collected data reveal geographical patterns and temporal trends in lake water clarity. PLoS One, 9(4), e95769.
Luke, D. A. (2005). Getting the big picture in community science: Methods that capture context. American Journal of Community Psychology, 35(3–4), 185–200.
Luke, D. A., & Krauss, M. (2004). Where there’s smoke there’s money: Tobacco industry campaign contributions and U.S. Congressional voting. American Journal of Preventive Medicine, 27(5), 363–372.
Luke, D. A., Morshed, A. B., McKay, V. R., & Combs, T. B. (2017). Systems science methods in dissemination and implementation research. In R. C. Brownson, G. A. Colditz, & E. K. Proctor (Eds.), Dissemination and implementation research in health: Translating science to practice (2nd ed., pp. 157–174). Oxford, England: Oxford University Press.
Maes, L., & Lievens, J. (2003). Can the school make a difference? A multilevel analysis of adolescent risk and health behaviour. Social Science & Medicine, 56(3), 517–529.
Masuda, Y. J., Liu, Y., Reddy, S. M. W., Frank, K. A., Burford, K., Fisher, J. R. B., & Montambault, J. (2018). Innovation diffusion within large environmental NGOs through informal network agents. Nature Sustainability, 1(4), 190–197.
Mathieu, J. E., Aguinis, H., Culpepper, S. A., & Chen, G. (2012). Understanding and estimating the power to detect cross-level interaction effects in multilevel modeling. Journal of Applied Psychology, 97(5), 951–966.
Menard, S. W. (2008). Handbook of longitudinal research: Design, measurement, and analysis. Burlington, MA: Academic Press.
Mitchell, M. N. (2010). Data management using Stata: A practical handbook. College Station, TX: Stata Press.
Moos, R. H. (1996). Understanding environments: The key to improving social processes and program outcomes. American Journal of Community Psychology, 24(1), 193–201.

Moskowitz, D. S., & Hershberger, S. L. (2002). Modeling intraindividual variability with repeated measures data: Methods and applications. Mahwah, NJ: Lawrence Erlbaum.
Murray, D. M. (1998). Design and analysis of group-randomized trials. Oxford, England: Oxford University Press.
Muthén, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22(3), 376–398.
Nieuwenhuis, R., Te Grotenhuis, M., & Pelzer, B. (2012). influence.ME: Tools for detecting influential data in mixed effects models. R Journal, 4(2), 38–47.
O’Campo, P., Wheaton, B., Nisenbaum, R., Glazier, R. H., Dunn, J. R., & Chambers, C. (2015). The Neighbourhood Effects on Health and Well-being (NEHW) study. Health & Place, 31, 65–74.
Peng, R. D., Dominici, F., & Zeger, S. L. (2006). Reproducible epidemiologic research. American Journal of Epidemiology, 163, 783–789.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-PLUS. New York, NY: Springer.
Posada, D., Buckley, T. R., & Thorne, J. (2004). Model selection and model averaging in phylogenetics: Advantages of Akaike Information Criterion and Bayesian approaches over likelihood ratio tests. Systematic Biology, 53(5), 793–808.
Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and longitudinal modeling using Stata (3rd ed.). College Station, TX: Stata Press.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Thousand Oaks, CA: Sage.
Rights, J. D., & Sterba, S. K. (2019). Quantifying explained variance in multilevel models: An integrative framework for defining R-squared measures. Psychological Methods, 24(3), 309–338.
Robson, D. S. (1959). A simple method for constructing orthogonal polynomials when the independent variable is unequally spaced. Biometrics, 15(2), 187–191.
Rogers, E. M. (2003). Diffusion of innovations. New York, NY: Free Press.
Rogosa, D., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 92(3), 726–748.
Sabatier, P. A., & Weible, C. M. (2007). The advocacy coalition framework: Innovations and clarifications. In P. A. Sabatier (Ed.), Theories of the policy process (2nd ed., pp. 189–217). Boulder, CO: Westview Press.
Sampson, R. J., Morenoff, J. D., & Gannon-Rowley, T. (2002). Assessing "neighborhood effects": Social processes and new directions in research. Annual Review of Sociology, 28(1), 443–478.
Sampson, R. J., Morenoff, J. D., & Raudenbush, S. (2005). Social anatomy of racial and ethnic disparities in violence. American Journal of Public Health, 95(2), 224–232.
Schwarz, G. (1978). Estimating the dimension of a model (Vol. 6). Bethesda, MD: Institute of Mathematical Statistics.
Shinn, M., & Rapkin, B. D. (2000). Cross-level research without cross-ups in community psychology. In Handbook of community psychology (pp. 669–695). Boston, MA: Springer.
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford, England: Oxford University Press.
Skiple, J. K., Grendstad, G., Shaffer, W. R., & Waltenburg, E. N. (2016). Supreme Court justices’ economic behaviour: A multilevel model analysis. Scandinavian Political Studies, 39(1), 73–94.

Smith, K. W., & Sasaki, M. (1979). Decreasing multicollinearity. Sociological Methods & Research, 8(1), 35–56.
Snijders, T. A. B., & Bosker, R. J. (1994). Modeled variance in two-level models. Sociological Methods & Research, 22(3), 342–363.
Snijders, T. A. B., & Bosker, R. J. (2011). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). Thousand Oaks, CA: Sage.
Stroup, W. W., Milliken, G. A., Claassen, E. A., & Wolfinger, R. D. (2018). SAS for mixed models: Introduction and basic applications. Cary, NC: SAS Institute.
Tufte, E. R. (2001). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Uhlaner, C. J. (2015). Politics and participation. International Encyclopedia of the Social & Behavioral Sciences, 504–508.
Valente, T. (1996). Network models of the diffusion of innovations. Computational and Mathematical Organization Theory, 2(2), 163–164.
Van der Meer, T., Te Grotenhuis, M., & Pelzer, B. (2010). Influential cases in multilevel modeling: A methodological comment. American Sociological Review, 75(1), 173–178.
Weisz, J. R., Kuppens, S., Ng, M. Y., Eckshtain, D., Ugueto, A. M., Vaughn-Coaxum, R., . . . Fordwood, S. R. (2017). What five decades of research tells us about the effects of youth psychological therapy: A multilevel meta-analysis and implications for science and practice. The American Psychologist, 72(2), 79–117.
West, B. T., Welch, K. B., Gałecki, A. T., & Gillespie, B. W. (2015). Linear mixed models: A practical guide using statistical software (2nd ed.). Boca Raton, FL: CRC Press.
Wickham, H., & Grolemund, G. (2017). R for data science. Sebastopol, CA: O’Reilly Media.
Zuur, A. F. (2009). Mixed effects models and extensions in ecology with R. New York, NY: Springer.
