
Applied Quantitative Analysis in Education and the Social Sciences

To say that complex data analyses are ubiquitous in the education and social sciences might be an understatement. Funding agencies and peer-reviewed journals alike require that researchers use the most appropriate models and methods for explaining phenomena. Univariate and multivariate data structures often require the application of more rigorous methods than basic correlational or analysis of variance models. Additionally, though a vast set of resources may exist on how to broadly execute a statistical analysis, difficulties may be encountered when explicit direction is not provided as to how one should run a model and interpret the results. The mission of this book is to expose the reader to advanced quantitative methods as they pertain to individual-level analysis, multilevel analysis, item-level analysis, and covariance structure analysis. Each chapter is self-contained and follows a common format so that readers can run the analysis and correctly interpret the output for reporting.

Yaacov Petscher is Director of Research at the Florida Center for Reading Research. Christopher Schatschneider is Associate Director of the Florida Center for Reading Research and Professor of Psychology at Florida State University. Donald L. Compton is Professor of Special Education and John F. Kennedy Center Investigator at Vanderbilt University.


Edited by

Yaacov Petscher, Florida State University, Florida Center for Reading Research

Christopher Schatschneider, Florida State University, Florida Center for Reading Research

Donald L. Compton, Vanderbilt University

First published 2013 by Routledge, 711 Third Avenue, New York, NY 10017
Simultaneously published in the UK by Routledge, 2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2013 Taylor & Francis

The right of the editors to be identified as the authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
Applied quantitative analysis in education and the social sciences / edited by Yaacov Petscher, Florida State University, Florida Center for Reading Research; Christopher Schatschneider, Florida State University, Florida Center for Reading Research; Donald L. Compton, Vanderbilt University.
p. cm.
Includes bibliographical references and index.
1. Regression analysis. 2. Mathematical statistics. I. Petscher, Yaacov M. II. Schatschneider, Christopher. III. Compton, Donald L., 1960– IV. Petscher, Yaacov M. Extending conditional means modeling.
QA278.2.A67 2013
519.5'36—dc23
2012030573

ISBN: 978-0-415-89348-0 (hbk)
ISBN: 978-0-415-89349-7 (pbk)
ISBN: 978-0-203-10855-0 (ebk)

Typeset in Sabon by Apex CoVantage, LLC

YP—To my wife, Erin, your faith, persistence, and love in the midst of trial are a continued inspiration. To Abigail, your sweet spirit and adoration of life remind me to slow down. To Naomi, your dancing and independence prompt me to enjoy these traits while you’re a toddler, as I won’t when you’re a teenager. To my Abba, for always believing that academia was the ministry. CS—To my mom, who is the strongest person I know. To my brother, who taught me to never give up. To Amy and Thomas, the most important people in my life. DC—To my children, Rosie and Harry. Remember, statistics affects every decision in life to some extent. Dad.

Contents

Preface  ix
List of Contributors  xi

PART I  Individual-Level Analysis  1

1  Extending Conditional Means Modeling: An Introduction to Quantile Regression  3
   Yaacov Petscher, Jessica A. R. Logan, and Chengfu Zhou

2  Using Dominance Analysis to Estimate Predictor Importance in Multiple Regression  34
   Razia Azen

3  I Am ROC Curves (and So Can You)!  65
   Christopher Schatschneider

PART II  Multilevel Analysis  93

4  Multilevel Modeling: Practical Examples to Illustrate a Special Case of SEM  95
   Lee Branum-Martin

5  Linear and Quadratic Growth Models for Continuous and Dichotomous Outcomes  125
   Ann A. O’Connell, Jessica A. R. Logan, Jill M. Pentimonti, and D. Betsy McCoach

PART III  Item-Level Analysis  169

6  Exploratory and Confirmatory Factor Analysis  171
   Rex Kline

7  Factor Analysis with Categorical Indicators: Item Response Theory  208
   R. J. de Ayala

PART IV  Covariance Structure Analysis  243

8  Introduction to Structural Equation Modeling  245
   Richard Lomax

9  Latent Growth Curve Modeling Using Structural Equation Modeling  265
   Ryan P. Bowles and Janelle J. Montroy

10  Latent Class/Profile Analysis  304
    Karen Samuelsen and Katherine Raczynski

11  n-Level Structural Equation Modeling  329
    Paras Mehta

Index  363

Preface

The growth of applied research methodologies in the last 40 years has led to numerous important advances in quantitative analysis. Strike that. A more appropriate statement might be that technological advances in the last 20 years have facilitated the development of software packages, making traditionally complex quantitative approaches seem reasonable, or even possible, to implement. Such progress has given us methods and software to account for data that exist in hierarchies, procedures for relating the complexity of items in an assessment to an individual’s performance on the assessment, and ways of using scores from multiple assessments to describe hypothetical constructs. Subsequently, academic researchers are now often faced with the challenge of developing expertise not only in a substantive area of interest but also in the types of analytic techniques and associated software packages that are most appropriately suited to their particular data structures and research questions.

Our goal in putting together this book was to present advanced quantitative analyses at a level such that a reader with an understanding of regression could reasonably work through most of the presented chapters. As such, although the works were independently completed by the authors, there are a number of common threads throughout. First, each chapter is designed to introduce the reader to the analysis and why it is important to consider as a methodological tool. Second, the contributed works highlight how the analysis might be differentiated from more commonly used approaches in the literature. Third, each chapter devotes space to discussing the assumptions pertaining to the analysis, sample size considerations, and thoughts on power analyses for planning a study. Last, and most importantly, each author works through an applied example using commercially available software packages (e.g., SAS, Mplus, HLM, LISREL, SPSS, IRTPro) or freeware packages (e.g., R, xxM). The chapters typically include the steps necessary to run the analysis, and all contain the resulting output, with interpretation and explication, from the software package, either as a formatted table or as a direct screenshot from the software package.

The edited volume is broken into four parts. Part I includes Chapters 1 through 3 and focuses on methods not often used in education and the social sciences, where the unit of analysis is the individual. Chapter 1 introduces the reader to quantile regression, which has been used often in the econometrics literature but has high utility when one is interested in how the relationship between a predictor and a criterion varies according to performance on the criterion. Chapter 2 provides an introduction to dominance analysis, a regression-based analysis that allows one to meaningfully compare the relative importance of predictors of an outcome, and Chapter 3 discusses how one is able to use receiver operating characteristic (ROC) curves for screening individuals’ likelihood of developing a future problem (e.g., failing a later high-stakes achievement test in math).

Part II presents multilevel models, whereby a hierarchy exists in the data structure (e.g., students are nested within classrooms). Chapter 4 shows the formal relations between


multilevel models and structural equation models by highlighting how a multilevel model yields the same results across programs traditionally used for multilevel models (e.g., SAS) and ones for structural equation models (e.g., Mplus), and demonstrates that a broader framework (i.e., n-level SEM) might provide a more flexible approach for modeling such data. Chapter 5 illustrates linear and nonlinear growth models in multilevel contexts and shows the flexibility of such models when the outcome is continuous over time or when it is dichotomously scored.

Part III includes two chapters where the unit of the analysis is at the item level. Chapter 6 is focused on exploratory and confirmatory factor analysis, and Chapter 7 looks at a special case of confirmatory factor analysis when the items are categorical in nature (i.e., item response theory).

Finally, Part IV delves into covariance structure analysis (also known as structural equation modeling). Chapter 8 provides a broad introduction to the basics of this class of models, and Chapter 9 illustrates how such models may be used in a growth curve framework. Different from Chapter 4, this chapter highlights how the structural equation model context can model change in both observed and latent factors, as well as how latent factors can be used for difference score modeling. Chapter 10 provides another use for latent variables by demonstrating how a latent class or profile analysis may be run, and describes its improvements upon a traditional cluster analysis. Chapter 11 concludes this volume with a perspective that brings a number of the methodologies presented in this book (i.e., multilevel, factor analysis, growth) into a framework termed n-level SEM. This viewpoint expands on the theoretical components presented in Chapter 4 and more specifically describes the intricacies of a fully latent multilevel structural equation model. Though it is more technically complex than some of the other chapters, it provides insight into an analytic framework that may more reasonably handle data sets with complex data structures (e.g., latent cross-classified multilevel models).

The preparation of an edited volume like this could not have been done without the love of our friends and family, who differentially supported us through positive and negative reinforcement, as well as some punishment. Special thanks to Lane Akers, at Routledge, who guided us in its preparation, and Andrew Weckenmann, also at Routledge, who took on the task of assisting us with formatting and final preparation and always kindly responded to many, many e-mails at the last minute. Thanks also to Dr. Adrea Truckenmiller, who served as our target audience “guinea pig” by reading several of the early drafts and providing feedback. It is our hope that each person reading this book will learn as much from the chapters as we did preparing them!

Contributors

R. J. de Ayala, Chair and Professor, University of Nebraska–Lincoln

Razia Azen, Assistant Professor, University of Wisconsin–Milwaukee

Ryan P. Bowles, Assistant Professor, Michigan State University

Lee Branum-Martin, Associate Professor, Georgia State University

Rex Kline, Professor, Concordia University

Jessica A. R. Logan, Senior Researcher, Children’s Learning Research Collaborative, Ohio State University

Richard Lomax, Professor, Ohio State University

D. Betsy McCoach, Associate Professor, University of Connecticut

Paras Mehta, Associate Professor, University of Houston

Janelle J. Montroy, Graduate Student, Michigan State University

Ann A. O’Connell, Professor, Ohio State University

Jill M. Pentimonti, Senior Researcher, Children’s Learning Research Collaborative, Ohio State University

Yaacov Petscher, Director of Research, Florida Center for Reading Research, Florida State University

Katherine Raczynski, Director, Safe and Welcoming Schools, University of Georgia

Karen Samuelsen, Associate Professor, Piedmont College

Christopher Schatschneider, Professor, Florida State University

Chengfu Zhou, Senior Statistics Specialist, Florida Center for Reading Research

Part I

Individual-Level Analysis

1

Extending Conditional Means Modeling: An Introduction to Quantile Regression

Yaacov Petscher, Jessica A. R. Logan, and Chengfu Zhou

1.1 Why Consider Quantile Regression?

The purpose of this chapter is to introduce the reader to quantile regression, a generalization of median regression analysis. Summarizing behaviors and observations in the education and social sciences has traditionally been captured by three well-known measures of central tendency: the average of the observations, the median value, and the mode. Extensions of these descriptive measures to an inferential process are generally focused on the mean of the distribution. Traditional multiple regression analysis asks a question in the vein of, “How does X (e.g., weight) relate to Y (e.g., height)?” and implicit in this question is that the relationships among phenomena are modeled in terms of the average. Subsequently, regression may be thought of as a conditional means model, in that for any given value of an independent variable, we elicit a predicted mean on the dependent variable. Since the theoretical development of least squares by Adrien-Marie Legendre in 1805, conditional means modeling has become a universal method for model-based inference. By extension, simple analysis of variance (ANOVA) models may be viewed as a special case of regression, and the more complex multilevel and structural equation models are also forms of conditional means models.

Although regression is a very flexible analytic technique, it does have some weaknesses when applied to (a) non-normal distributions of scores and (b) testing specific theory concerned with differential relations between constructs in a population. Related to the first point, regression is not well equipped to handle data with non-normal distributions (e.g., data with skewed or kurtotic distributions), because of its assumption that the errors are normally distributed. When this assumption is violated, the parameter estimates can be strongly biased, and the resulting p values will be unreliable (Cohen, Cohen, West, & Aiken, 2003). Within developmental and educational research, researchers are often interested in measuring skills whose commonly used measures frequently yield non-normal distributions. Literacy research, for example, shows strong floor effects in the measurement of alphabet knowledge in pre-kindergarten (e.g., Paris, 2005), as well as oral reading fluency in first and second grades (e.g., Catts, Petscher, Schatschneider, Bridges, & Mendoza, 2009). Behavioral research has similarly identified floor effects when measuring dopamine activity in the brains of individuals with attention-deficit/hyperactivity disorder when differential interventions are applied (Johansen, Aase, Meyer, & Sagvolden, 2002), and researchers studying gifted populations typically observe ceiling effects in the distributions of academic achievement and aptitude (McBee, 2010). When such distributions are encountered (for either independent or dependent variables), researchers regularly follow conventional wisdom to either transform the data to


bring them closer to a normal distribution or ignore the violation of normality. Each approach may have intuitive merit; for example, when data are transformed, one is able to produce a more nearly normal distribution of scores. Conversely, when the violation is ignored, the original metric of the score is preserved in estimating the model coefficients. Each of these choices carries limitations as well. By transforming one’s data, the original metric is lost, and the interpretation of the coefficients may not have a straightforward application; yet keeping the data in the original metric and ignoring the presence of a floor or ceiling may be misleading, as the true mean relation between X and Y may be underestimated.

A second instance in which multiple regression may be insufficient to illuminate a relationship is when a theory is being tested where subgroups exist in a population on the selected outcome, and a differential relationship exists between X and Y for those different subgroups. As an example, suppose we examine the U.S. normative height (stature) growth charts for girls, ages 2 to 20 (Figure 1.1). The darkest black line represents the average height (y-axis) for girls at a given age (x-axis). Note that the line is very steep between ages 2 and 12, when height is still developing.

Figure 1.1 Stature-for-age percentiles, girls, 2 to 20 years: U.S. normative growth chart for height (stature) with age. From National Center for Chronic Disease Prevention and Health Promotion (2000).
For persons over age 12, height is relatively stable, and so a much weaker association between height and age is observed. Testing the heteroscedastic relationship evidenced by this illustration would be difficult using conditional means models, as a single estimate of association would cloud both the lack of a relationship above age 12 and the very strong correlation up to age 12.

In response to these basic limitations of conditional means regression, a growing body of literature has begun to use quantile regression to gain more specific inferential statistics from a data set (e.g., Catts et al., 2009). The distinguishing feature of quantile regression compared to conditional means modeling is that it allows for the estimation of relations between a dependent and an independent variable at multiple locations (i.e., quantiles) of the dependent variable within a continuous distribution. That is, rather than describing the mean relation between X and Y, quantile regression estimates the relations between X and Y at multiple points in the distribution of Y. Quantile regression uses every data point when estimating a relation at a given point in the distribution of Y, asymmetrically weighting the residual errors based on the distance of each data point from the given quantile. Because of this method of estimation, quantile regression makes no assumptions about the variance of the residual error terms and is robust to non-normally distributed data (Koenker, 2005). Given this lack of assumptions regarding the normality of the underlying distributions of scores, we believe that quantile regression lends itself well to questions in the education sciences. The goal of this chapter, therefore, is to familiarize the reader with quantile regression, its application, and the interpretation of its parameters.

1.1.1 Conceptual Introduction

Regression analyses are often referred to as ordinary least squares (OLS) analyses, because they estimate a line of best fit that minimizes the squared errors between the observed and model-estimated values of Y at each value of X. OLS regression provides a predicted score for the outcome based on the score of the predictor(s) and is designed to estimate the relation of Y with X. Quantile regression also examines the relation of Y with X, but rather than produce coefficients that characterize the relationship in terms of the mean, as in the OLS approach, quantile regression may be viewed as extending the conditional median regression model. As such, the relationship between X and Y is examined conditional on the score of Y, referred to as the pth quantile. The notion of a quantile is similar to that of a percentile, whereby a score at the pth quantile reflects the place in the distribution where the proportion of the population below that value is p. For example, the .50 quantile would be akin to the 50th percentile, where 50% of the population lies below that point. In quantile regression, the .50 quantile is the median, which describes the center of the distribution in a set of data points. Quantile regression, then, predicts Y based on the score of X at multiple specific points in the distribution of Y.
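For readers who want to see the estimator behind this description, the objective function below is the standard quantile regression criterion (Koenker, 2005); it is included here as a sketch rather than drawn from this chapter. The fit at the pth quantile minimizes asymmetrically weighted absolute residuals:

\hat{\beta}(p) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_p\left(y_i - x_i^{\top}\beta\right), \qquad \rho_p(u) = u\left(p - I(u < 0)\right),

so that residuals above the fitted line receive weight p and residuals below it receive weight 1 - p. At p = .50, both sides are weighted equally and the criterion reduces to median regression (i.e., minimizing the sum of absolute deviations).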
As an example, suppose we were to examine the relations between children’s vocabulary scores (Y) as predicted by parental education (X). A regression of vocabulary on parental education would result in two pieces of information: (1) the intercept, which is an estimate of the vocabulary score when parental education is zero, and (2) the slope coefficient, which represents the incremental change in vocabulary score (Y) for a one-unit change in parental education (X). This analytic approach answers the research question: Does parental education significantly predict children’s vocabulary scores? Quantile regression asks a similar question to OLS but further extends it to ask whether the relation of parental education with children’s vocabulary scores differs for children


with higher or lower vocabulary scores. Entering parental education into a quantile regression equation predicting vocabulary would also result in an intercept coefficient and a slope coefficient, with the intercept representing the predicted vocabulary score when parental education is zero and the slope coefficient representing the incremental change in vocabulary score for a one-unit change in parental education. However, because quantile regression estimates these relations at multiple points in the distribution of Y, the intercept and slope are uniquely estimated at several points in the distribution of vocabulary scores. Because quantile regression allows the relation between X and Y to change depending on the score of Y, it produces unique parameter estimates for each quantile it is asked to examine (e.g., the .25, .50, and .90 quantiles).
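As a minimal sketch of how such a model might be specified, the following SAS PROC QUANTREG step fits the intercept and slope at the three quantiles just mentioned; the data set and variable names (vocab, vocabulary, parent_ed) are hypothetical stand-ins, not taken from this chapter:

/* quantile regression of vocabulary on parental education */
proc quantreg data=vocab ci=resampling;
   /* a separate intercept and slope are estimated at each requested quantile */
   model vocabulary = parent_ed / quantile=0.25 0.50 0.90;
run;

Each requested quantile produces its own parameter table, so the parent_ed slope at the .25 quantile can be compared directly with the slopes at the .50 and .90 quantiles.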

1.2 Quantile Regression Illustration in the Literature: Reading Fluency

In the previous section, we noted that OLS regression may be a limited methodological approach when the data are non-normally distributed or when the underlying theory pertains to differential effects between subgroups. In educational research, these two areas have recently been explored using quantile regression. For example, researchers studying literacy skill acquisition for young students have been questioning the appropriate time in one’s development to assess oral reading fluency skills. Oral reading fluency has been used as a proxy for an individual’s level of reading comprehension, because it combines automatic word retrieval with the ability to accurately decode words (Fuchs, Fuchs, Hosp, & Jenkins, 2001). The recommended period for measuring fluency skills is in the primary grades, beginning in the winter of first grade, as its relationship to reading comprehension is strongest in elementary school and tapers off by high school (Jenkins & Jewell, 1993). Winter of first grade has been recommended as the most appropriate time point to begin assessing oral reading fluency, because students’ reading skills are theoretically expected not to be adequately developed until this time point (Hasbrouck & Tindal, 2006). Despite this recommendation, Catts et al. (2008) found that 61% of the data fell in the lowest quarter of the distribution of fluency scores for the winter of first grade, nearly 80% of the data were at the lowest quarter of the distribution for the fall assessment of first grade, and 55% of the data were at the lowest quarter for the spring as well, all of which demonstrated the presence of a strong floor effect. Using approximately the same sample, Kim, Petscher, Schatschneider, and Foorman (2010) estimated the correlation between these oral reading fluency scores and reading comprehension and found that it ranged from r = .61 to r = .73 from fall to spring. Taking these two findings together, it may be inferred that the general correlation may not be a comprehensive estimate of the relation between fluency and comprehension at each time point, because the estimates are based on data that exhibit strong non-normality.

Figure 1.2 displays a scatter plot of the data using the sample from Kim et al. (2010). The OLS regression fit line for this relationship can be quantified as a correlation of r = .63; however, based on the overlain histograms, it can be seen that this is not an efficient index of the association. Although the distribution of reading comprehension displays some skew, oral reading fluency presents with very strong floor effects. If we were to show the scatter plot from Figure 1.2 in a different light, whereby we plot not only the OLS fit line but also add fit lines to represent the relationship between fluency and reading comprehension at the 10th, 25th, 50th, 75th, and 90th percentiles of oral reading fluency (Figure 1.3), we could observe that the regression lines vary significantly in slope between the 10th and 90th percentiles.


Figure 1.2 Scatter plot of oral reading fluency and reading comprehension scores, with OLS fit line and overlain histograms (sample from Kim et al., 2010).

Label            Pr > |t|   Alpha   Lower     Upper
a (ORF)          < .0001    0.05    1.1108    1.3187
b (ORF)          < .0001    0.05    0.7860    0.9106
AUC (ORF)        < .0001    0.05    0.8037    0.8420
a (VOC)          < .0001    0.05    0.9700    1.2784
b (VOC)          < .0001    0.05    0.9671    1.2279
AUC (VOC)        < .0001    0.05    0.7479    0.8031
AUC difference   0.0028     0.05    0.01628   0.07839

Figure 3.14 Output from the PROC NLMIXED comparing two binormal curves.


Figure 3.15 Comparison of Oral Reading Fluency and Vocabulary binormal curves.

we see the a and b parameters and the AUCs for both the ORF and VOC screens. Additionally, there is a statistical test for the difference in AUCs between the ORF and VOC screens. The difference between the two AUCs is .047, and this difference is significant, t(1999) = 2.99, p = .0028. There is also a second, two-degree-of-freedom test of whether the a and b parameters of the two ROC curves differ. Two ROC curves that intersect could potentially have the same AUC but different a and b parameters, which would imply that the screens perform differentially over the range of potential false-positive rates. This test appears in the Contrasts section of the output and is labeled “Equality of the ROC curves.” A graph of the two binormal ROC curves appears in Figure 3.15. Contrary to the empirical ROC curves for the two screening assessments, which showed a fairly uniform difference between the two curves, the binormal graph indicates that most of the difference between the two curves is at the lower and middle range of the false-positive rate, and little difference exists at the higher end of the false-positive rate.
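For reference, the binormal ROC curve being fit here can be written in its standard form; this is the general binormal result, consistent with the probnorm/probit expression used in Appendix 3.2, rather than a formula printed in this chapter’s output:

ROC(t) = \Phi\left(a + b\,\Phi^{-1}(t)\right), \qquad AUC = \Phi\left(\frac{a}{\sqrt{1 + b^{2}}}\right),

where t is the false-positive rate and \Phi is the standard normal cumulative distribution function. As a check, plugging the ORF parameters from Appendix 3.2 (a = 1.2151, b = .848) into the AUC expression gives \Phi(1.2151/\sqrt{1 + .848^{2}}) \approx .82, matching the AUC reported for the ORF screen.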

3.9 Power for Comparing Empirical ROC Curves

Power for comparing two AUCs from ROC curves (or power to compare one AUC to a known population value) follows the same rules of thumb as power calculations for any parametric procedure. All else being equal, more subjects, larger effects, a larger alpha, and a larger correlation among the measures being compared will yield greater power. Power for


comparing AUCs can be done using the PASS 11 software (PASS) or in SAS using the %ROCPOWER macro available at http://www.bio.ri.ccf.org/doc/rocpower.sas, with a good description of how to use it at http://www.bio.ri.ccf.org/doc/rocpower_help.txt. The macro uses the approach outlined in Hanley and McNeil (1983), which uses a critical ratio z test for the difference between two AUCs that also takes into account the correlation between the AUCs induced by the correlation between the two screening instruments being compared. The %ROCPOWER macro will give a power estimate given the ns for the two groups, the AUCs for the two screens, and the correlation between the two AUCs. Note, however, that the correlation between the two AUCs is not the same as the correlation between the two screening assessments. To compute the correlation necessary to use the macro, the correlation between the two screening instruments must be computed separately for the two groups being compared, and these two correlations must be averaged. Additionally, the AUCs for the two screening instruments must also be computed and averaged. Then a lookup table provided by Hanley and McNeil (1983) can be used to obtain the correlation needed to run the macro. To compute the power to compare the ORF screen and the VOC screen in this sample, the correlation between ORF and VOC was obtained for the students who failed the reading comprehension test (rf = .12) and for those passing the test (rp = .30), and these two correlations were averaged (rave = .21). Additionally, the AUCs for ORF (.82) and VOC (.78) were averaged (AUCave = .80), and these two results were used in the lookup table provided by Hanley and McNeil (1983) to get an estimated r = .18 (obtained through interpolation). With this information, the power to test the difference between these two curves can be obtained by using the following SAS syntax:

%include 'c:\yourdirectory\rocpower.sas'; %rocpower(T1=.82, T2=.78, T0=.82, NA=846, NN=1154, r=.18, alpha=.05, tails=2);

The output from this run indicated that this particular analysis had power above .80 to detect the difference between the ORF screen and the VOC screen. To conduct a power analysis for only one AUC (to see what the power would be compared to chance, AUC = .50, or compared to some other AUC of interest), the same macro can be used. For example, if we wanted to know the power to detect a difference between an AUC of .78 and chance levels of AUC, the following syntax could be used:

%include 'c:\yourdirectory\rocpower.sas'; %rocpower(T1=.78, T0=.50, NA=846, NN=1154, alpha=.05, tails=2);

Using this syntax, the power to detect a statistically significant AUC of .78 is above .99. In order to compute power a priori, an estimate of the correlation between the two screens being compared will be necessary, as well as an estimate of the size of the AUCs. Then, by trial and error, the number of subjects needed to achieve power of .80 (or any desired level of power) can be obtained. PASS 11, however, can compute the needed number of subjects directly, without the need for trial and error.
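For orientation, the critical ratio test that the macro implements has the standard Hanley and McNeil (1983) form; the formula below is supplied here as a sketch of that approach rather than reproduced from this chapter:

z = \frac{AUC_1 - AUC_2}{\sqrt{SE_1^{2} + SE_2^{2} - 2\,r\,SE_1\,SE_2}},

where SE_1 and SE_2 are the standard errors of the two AUCs and r is the correlation between the AUCs obtained from the lookup table. A larger r shrinks the denominator, which is why comparing two screens measured on the same cases yields more power than comparing screens from independent samples.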


3.10 Conclusion

The goal of this chapter was to introduce various classification indices and to demonstrate how to construct empirical and binormal ROC curves. The need for statistical techniques to assist in the classification and prediction of observations cuts across numerous fields, and ROC curves are an important statistical tool in the development of screening and diagnostic assessments. This chapter provided an illustrative example of how to perform many of these analyses using SAS, but the reader should note that numerous statistical packages are also capable of performing some or all of these analyses, including Stata, SPSS, and R, among others. In addition to the routines available in these general purpose statistical packages, numerous specialty programs specifically designed for ROC curve analysis are available.

This chapter also covered only the binormal approach to estimating ROC curves, but it should be noted that this is not the only model that can be fit. For example, the Lehmann estimation procedure (Gonen & Heller, 2010) was not covered here, nor were instances in which the data obtained may be censored. An excellent treatment of these approaches, as well as others, is given by Gonen (2003), along with SAS code and examples. More research needs to be done on particular applications and extensions of ROC curves. For example, it is currently unknown what effect fitting ROC curves to data that are nested (for example, students within classrooms) has on the estimation of the standard errors obtained for the AUC from an empirical ROC curve. Overall, however, the use of ROC analysis to investigate the performance of screening instruments has proved to be a valuable tool.

References

Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12, 387–415.
Barlow, D. H. (1991). Introduction to the special issue on diagnosis, dimensions, and DSM-IV: The science of classification. Journal of Abnormal Psychology, 100, 243–244.
Gonen, M. (2003). Analyzing receiver operating characteristic curves with SAS. Cary, NC: SAS Institute.
Gonen, M., & Heller, G. (2010). Lehmann family of ROC curves. Medical Decision Making, 30, 509–517.
Good, R. H., Wallin, J., Simmons, D. C., Kameenui, E. J., & Kaminski, R. A. (2002). System-wide percentile ranks for DIBELS benchmark assessment (Technical Report No. 9). Eugene: University of Oregon.
Hanley, J. A., & McNeil, B. J. (1983). A method of comparing the areas under the receiver operating characteristic curves derived from the same cases. Radiology, 148, 839–843.
Krzanowski, W. J., & Hand, D. J. (2009). ROC curves for continuous data (Monographs on Statistics and Applied Probability). Boca Raton, FL: Taylor and Francis.
Metz, C. E., & Pan, X. (1999). Proper ROC curves: Theory and maximum-likelihood estimation. Journal of Mathematical Psychology, 43, 1–33.
PASS (Version 11) [Computer software]. Kaysville, UT: NCSS Statistical Software.
Pintea, S., & Moldovan, R. (2009). The receiver-operating characteristic (ROC) analysis: Fundamentals and applications in clinical psychology. Journal of Cognitive and Behavioral Psychotherapies, 9, 49–66.
SAS (Version 9.13) [Computer software]. Cary, NC: SAS Institute.
Schatschneider, C., Petscher, Y., & Williams, K. M. (2008). How to evaluate a screening process: The vocabulary of screening and what educators need to know. In L. Justice & C. Vukelich (Eds.), Achieving excellence in preschool language and literacy instruction (pp. 304–316). New York, NY: Guilford Press.
Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.


Appendix 3.1: SAS Code to Generate Empirical ROC Curve Graph (Figure 3.2)

/***************************************************************
Creating a data set called rocdata that contains the sensitivity,
specificity, false-positive rate, and false-negative rate for
every possible cut point
***************************************************************/
proc logistic data=sub;
   model fail(event='1')=orf /outroc=rocdata;
run;

/************************************************************
These two data steps are used to create a variable called
zero_one that will serve as our diagonal line for the ROC curve.
*************************************************************/
data tot(keep=totobs);
   set rocdata end=bob;
   totobs + 1;
   if bob;
run;

data rocdata2;
   if _n_=1 then set tot;
   set rocdata;
   obs+1;
   zero_one=obs/totobs;
run;

/***********************************************************
This section creates the graph and saves it as a .gif file.
***********************************************************/
filename scatter 'C:\yourdirectory\figure3_2.gif';
goptions reset=all device=gif gsfname=scatter gsfmode=replace;
axis1 label=(angle=90 font="arial" H=1.5 'Hit Rate')
   order=0 to 1 by .1 minor=(n=1) offset=(0,0)
   major=(h=1.0) value=(h=1.0 font="arial") length=4in;


axis2 label=(H=1.5 font="arial" 'False Positive Rate')
   order=0 to 1 by .1 minor=(n=1) major=(h=1.0) offset=(0,0)
   value=(h=1.0 font="arial") length=4in;
title1 font="arial" H=1.5 move=(25,38) 'ROC Curve for ORF Predicting Reading Failure';
symbol1 interpol=join v=dot color=blue H=.5;
symbol2 interpol=spline v=none color=black H=1.5;
proc gplot data=rocdata2;
   plot _sensit_*_1mspec_ zero_one*zero_one
      /vaxis=axis1 haxis=axis2 overlay;
run;
quit;


Appendix 3.2: SAS Code to Generate the Empirical and Binormal ROC Curve Graph

/***************************************************************
Creating a data set called rocdata that contains the sensitivity,
specificity, false-positive rate, and false-negative rate for
every possible cut point
***************************************************************/
proc logistic data=sub;
   model fail(event='1')=orf /outroc=rocdata;
run;

/************************************************************
These three data steps are used to create a variable called
zero_one that will serve as our diagonal line for the ROC curve
and will create a variable called orfsens, which will be used
to plot the binormal ROC curve. Please note that the a and b
parameters from the binormal ROC analysis must be supplied.
*************************************************************/
data tot(keep=totobs);
   set rocdata end=bob;
   totobs + 1;
   if bob;
run;

data rocdata2;
   if _n_=1 then set tot;
   set rocdata;
   obs+1;
   zero_one=obs/totobs;
run;

data rocdata3;
   set rocdata2;
   /* binormal sensitivity at each false-positive rate,
      with a = 1.2151 and b = .848 supplied from the ORF binormal fit */
   orfsens=probnorm(1.2151 + .848*probit(_1mspec_));
   label _sensit_='Empirical' orfsens='Binormal';
run;

/***********************************************************
This section creates the graph and saves it as a .png file.
***********************************************************/
filename scatter 'C:\projects\roc_chapter\out\figure3_3.png';
goptions reset=all device=png gsfname=scatter gsfmode=replace;


axis1 label=(angle=90 font="arial" H=1.5 'Hit Rate')
   order=0 to 1 by .1 minor=(n=1) offset=(0,0)
   major=(h=1.0) value=(h=1.0 font="arial") length=4in;
axis2 label=(H=1.5 font="arial" 'False Positive Rate')
   order=0 to 1 by .1 minor=(n=1) major=(h=1.0) offset=(0,0)
   value=(h=1.0 font="arial") length=4in;
title1 font="arial" H=1.5 move=(22,48) 'Empirical and Binormal ROC Curve for ORF Predicting Reading Failure';
symbol1 interpol=join v=dot color=blue H=.5;
symbol2 interpol=spline v=none color=black H=1.5;
symbol3 interpol=spline v=none color=purple H=1.5;
symbol4 interpol=spline v=none color=orange H=1.5;
symbol5 interpol=spline v=none color=black H=1.5;
legend1 order=('Empirical' 'Binormal') label=none frame offset=(.3in,.5in);
proc gplot data=rocdata3;
   plot _sensit_*_1mspec_ zero_one*zero_one orfsens*_1mspec_
      /vaxis=axis1 haxis=axis2 overlay legend=legend1;
run;
quit;

Part II

Multilevel Analysis

4

Multilevel Modeling: Practical Examples to Illustrate a Special Case of SEM

Lee Branum-Martin

4.1 Introduction

Multilevel models are becoming commonplace in education to account for the clustering of students within classrooms. Multilevel models have been developed in numerous fields to account for the clustering of observations within other units, such as repeated measures within students, or classrooms within schools (see overview by Kreft & de Leeuw, 1998). In this chapter, it is assumed that readers have a general familiarity with multilevel models. Practical examples of how to fit multilevel models are numerous (e.g., West, Welch, & Gałecki, 2007), with one of the best and most cited being Singer (1998). An excellent conceptual introduction to multilevel methods with connections to different historical traditions can be found in Kreft and de Leeuw (1998). More complete technical accounts of multilevel models can be found in Raudenbush and Bryk (2002), Goldstein (2003), and Snijders and Bosker (1999).

There is a growing recognition that multilevel models are models of covariance, or dependence among individuals nested within social clusters (Curran, 2003; Mehta & Neale, 2005). A structural equation model (SEM) is a highly flexible method for handling covariance among variables (Bollen, 1989; Lomax, this volume). This chapter exploits this overlap in modeling covariance in both multilevel models and SEM through the use of applied examples. Most approaches to multilevel models and SEM are tightly tied to a single version of equations, diagrams, and software code, with the connections among these three representations often becoming obscured to the user. In addition, some software contains inherent limitations to the number of levels or the ways in which those levels may be connected. In order to overcome these limitations of other approaches, I present a new framework for fitting multilevel models that is based in SEM, has a natural connection to our substantive notions of nested levels, and helps to make explicit how equations, diagrams, and software code represent the same substantive phenomena we wish to test in our empirical models. The current chapter does this by introducing xxM (Mehta, 2012), a package for the free statistical software, R (R Development Core Team, 2011).

The purpose of the current chapter is to provide practical, step-by-step examples of fitting simple multilevel models, with equations, diagrams, and code. Singer (1998) and West, Welch, and Gałecki (2007) provided practical examples of multilevel models with code for SAS PROC MIXED. Mehta and Neale (2005) provided code and technical details of a complete translation between PROC MIXED and the more general approach of fitting a SEM. The current chapter extends these two previous sources in order to emphasize conceptual and practical commonalities across multilevel models and SEMs.


The current chapter will focus on two issues: (a) connections among the parameters represented in equations, diagrams, and code and (b) an introduction to the new but highly extendable programming code of xxM. It is hoped that the current chapter may facilitate connections between multilevel approaches (e.g., mixed effects vs. multilevel SEM) and software (e.g., SAS PROC MIXED, Mplus, HLM, lme4 in R). Three software packages are used in this chapter: SAS PROC MIXED, Mplus 6.1, and xxM. SAS PROC MIXED is a general purpose mixed effects program (Littell, Milliken, Stroup, Wolfinger, & Schabenberger, 2006), coupled with the highly general data management and statistics software, SAS (SAS Institute Inc., 2010). Mplus 6.1 fits numerous kinds of latent variable and two-level models (B. O. Muthén, 2001, 2002; L. K. Muthén & Muthén, 2010). xxM is a package for the free software, R (R Development Core Team, 2011). xxM is designed for fitting SEMs with essentially no limit to the number of levels of nesting, including complicated structures of cross-classification and multiple membership. Demonstrations and updates for xxM can be found at http://xxm.times.uh.edu. A user’s guide with several worked examples of multilevel models and the features of the program can be downloaded from there (see also Mehta, this volume).

4.1.1 An Example Data Set: Passage Comprehension Scores

The purpose of this chapter is to demonstrate connections between multilevel model parameters in equations, diagrams, and code. To be sure, multilevel analysis opens conceptual and statistical issues important for careful consideration. Some of these issues are considered here in the context of an applied data analysis. Generally speaking, multilevel models are used in education for examining how scores vary across shared environments, such as how variation in student reading or math scores might differ across classrooms or schools (or how repeated measures may differ across persons, who also may share classrooms and schools). Such variation at the cluster level is important to understand: it could not only obscure student-level effects such as background characteristics or the influence of other skills (an issue of power or of the appropriateness of standard errors), but it may also indicate environmental effects, such as cohort/classmate influence, instruction, or other potentially ecological effects (Raudenbush & Sampson, 1999). Some of these issues are commented on in the following applied examples. Excellent overviews of the broader conceptual issues can be found in introductory sources (Hox, 1995; Kreft & de Leeuw, 1998; Snijders & Bosker, 1999), with detailed technical accounts for more advanced readers (Goldstein, 2003; Raudenbush & Bryk, 2002).

In order to keep the discussion applied, let us consider an example data set of 802 students in 93 classrooms (ignoring schools for the current example) measured in the spring of their first grade year on reading passage comprehension. The measure was taken from the Woodcock Language Proficiency Battery–Revised, a test developed on a nationally representative sample (Woodcock, 1991). The test is scored in W-units, a Rasch-based metric. A substantive question for this data set is: to what extent do classrooms vary in their English passage comprehension scores? Figure 4.1 shows box plots of student passage comprehension scores for each of the 93 classrooms, sorted in order of ascending means.
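A plot of this kind can be produced in a few lines of SAS; the sketch below assumes a hypothetical data set named read with variables pc (the passage comprehension W-score) and classroom (the cluster identifier), names that are not given in the chapter:

/* box plots of the outcome, one box per classroom */
proc sgplot data=read;
   vbox pc / category=classroom;
   refline 443 / axis=y;   /* the overall sample mean reported in the text */
run;

Sorting the classrooms by their means, as in Figure 4.1, would additionally require computing the classroom means first (e.g., with PROC MEANS) and ordering the category values accordingly.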
Figure 4.1 Box plots of student passage reading comprehension scores within classrooms.

The horizontal line at W-score 443 represents the overall sample mean across all students. Each box represents one classroom, with a diamond representing the classroom mean, the middle line within each box representing the classroom median, the upper and lower ends of the box representing the upper and lower quartiles (25th and 75th centiles), and the whiskers extending to 1.5 times the range between quartiles. Open circles represent student scores outside the whiskers.

Figure 4.1 represents the essential problem for nested data: clusters such as classrooms differ in their average level of performance. If we were to randomly assign students to a reading intervention versus regular instruction, variability due to classrooms would weaken power to detect a true treatment difference (Raudenbush, Bryk, & Congdon, 2008; Raudenbush, Martínez, & Spybrook, 2007; Spybrook et al., 2011). That is, we might worry that noisy variation among classrooms might overwhelm our ability to find a true average treatment effect for students. Even small amounts of cluster differences have been shown to reduce the power to detect mean differences (Barcikowski, 1981; Walsh, 1947). This issue of increasing precision is a major motivation for using multilevel models in evaluating educational programs.

On a less technical note, Figure 4.1 illustrates the problem of clustering in a compelling way: classrooms differ systematically. That is, students in a high-performing classroom are more likely to score higher than are students in a low-performing classroom. If we put this lack of independence another way, we can say that student scores in the same classroom are likely to be related, perhaps due to the shared influence of their environment (i.e., partly due to instruction, partly due to selection or assignment to particular teachers, as well as partly due to influence of higher levels of nesting at the school, district, and community). A fundamental assumption of most statistical models is that conditional on all variables used in the model (predictors), student scores (residuals) are independent. In the case of clustered data, as Figure 4.1 suggests, residual scores are not likely to be


independent, unless classroom is used in the model. Insofar as a multilevel model handles this dependence across students due to classrooms, a multilevel model is a model of covariance (Mehta & Neale, 2005). Because SEM is a model of covariance, connecting the concepts between SEM and multilevel modeling can be helpful to increase our understanding of the connections among our theories, data, and models. Explaining this covariance and connecting SEM to multilevel models is one of the goals of this chapter.

4.1.2 Analysis: How to Handle Classrooms?

Analyzing these data ignoring classrooms would clearly leave out a sizeable systematic factor. Alternatively, analyzing the means of each of 93 classrooms would vastly reduce the amount of information. In a case where there are other variables, such as predictors, the student-level relations might be different from those at the classroom level. Collapsing across levels might result in distorted estimates of the relations among variables—an ecological fallacy (Burstein, 1980; Cronbach, 1976; Robinson, 1950; Thorndike, 1939). For the present case with a single outcome and no predictors (yet), it would be substantively interesting to simultaneously investigate how much variability there is in passage comprehension due to students and due to classrooms.

There is a long history of discussions and alternative models to account for multilevel data (Kreft & de Leeuw, 1998; Raudenbush & Bryk, 2002). One approach would be to add a set of predictors for each individual classroom, such as 92 dummy variables in a regression or an analysis of variance (ANOVA) model. Such propositions raise substantive issues regarding sampling as well as issues regarding statistical estimation. If the classrooms in the data do not constitute a random sample from a universe of possible classrooms—that is, if we wish to estimate their differences and not generalize beyond this current set of classrooms—then fixed effects represented by dummy indicators or an ANOVA may be appropriate. That is, the estimates are fixed because the sample is fixed. However, if the sample is drawn at random from a universe of classrooms (or at least plausibly so), and we can assume that the outcome variable is normally distributed (or distributed appropriately according to the kind of model we are using), then it can be efficient to estimate the overall mean and variance rather than to fit separate models for each classroom: we can estimate two parameters instead of one parameter for each classroom.

4.2 A Univariate Model: Students Nested within Classrooms

The most common and widely known model for the nesting of students within classrooms is for a single outcome. Such models have been referred to as multilevel regression models and mixed effects models, as they combine fixed with random regression coefficients (Kreft & de Leeuw, 1998). For a single outcome, Y, with no predictors, for i students in j classrooms, we can fit a univariate random intercept model:

4.2 A Univariate Model: Students Nested within Classrooms The most common and widely known model for the nesting of students within classrooms is for a single outcome. Such models have been referred to as multilevel regression models and mixed effects models, as they combine fixed with random regression coefficients (Kreft & de Leeuw, 1998). For a single outcome, Y, with no predictors for i students in j classrooms, we can fit a univariate random intercept model: Y ij = β 0 j + e ij ,

(4.1)

where Y_{ij} is the model-predicted outcome score for student i in classroom j, \beta_{0j} is the regression intercept for classroom j, and e_{ij} is the deviation (residual, sometimes noted as r_{ij}) for that student. The residuals are assumed to be distributed normally over students


with mean zero and variance \sigma^2. At level 2, for classrooms, the intercept is decomposed into two parts:

\beta_{0j} = \gamma_{00} + u_{0j},    (4.2)

where \gamma_{00} is the grand intercept (fixed) and u_{0j} is the random deviation from the grand intercept for that classroom. These fixed and random components are the reason such models are often referred to as “mixed effects models.” The random classroom deviations, u_{0j}, are assumed to be distributed normally with variance \tau_{00}. Most readers will recognize that Equations 4.1 and 4.2 can be combined:

Y_{ij} = \gamma_{00} + u_{0j} + e_{ij}.    (4.3)

Equation 4.3 compactly states that each individual score is the sum of a grand intercept, a classroom deviation, and a student deviation. The distributional assumptions are worth examining more closely. At the student level, residual scores are assumed to be distributed normally with mean zero and variance \sigma^2, noted N[0, \sigma^2]. The mean is zero because these are deviations from the overall model-predicted mean, \gamma_{00}. At the classroom level, classrooms deviate from this grand mean with a mean of zero and variance \tau_{00}, noted N[0, \tau_{00}]. It can be confusing to beginners that one variance is squared and the other is not (but some authors choose to use \tau^2; Snijders & Bosker, 1999). We will later change this notation in order to reinforce that these parameters are both variances and to emphasize the connections with SEM.

Equation 4.3 can be represented graphically, with the helpful addition that variances are shown, in Figure 4.2. Standard rules of path tracing apply, with a strict correspondence to each parameter in Equations 4.1 through 4.3. That is, we can follow the arrows additively to calculate the model-predicted score for any given child in any given classroom. Additionally, the hierarchical structure of the data and the equation for students nested within classrooms is visually reinforced by vertical stacking. Figure 4.2 is a full, explicit SEM representation using typical HLM notation. This representation is useful for bringing together Equations 4.1 through 4.3 along with their variance components, as is typical in SEM.
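Although not drawn in Figure 4.2, one standard summary of these two variance components is the intraclass correlation, which gives the proportion of total variance attributable to classrooms:

\rho = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}.

For instance, with hypothetical values \tau_{00} = 20 and \sigma^{2} = 80 (illustrative numbers, not estimates from this data set), \rho = 20/100 = .20, meaning 20% of the variance in scores would lie between classrooms.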


Figure 4.2 Diagram of a univariate two-level model for students within classrooms (typical notation, with full, explicit representation).


Figure 4.2 also reinforces the idea that random effects are latent variables, usually represented in Equations 4.1 through 4.3 with implicit (omitted) unit regression weights. That is, these latent variables, u_{0j} and e_{ij}, have fixed factor loadings (Mehta & Neale, 2005). I explore the notion of factor loadings in more detail later. From an SEM perspective, we also notice that \beta_{0j} is a phantom variable: a latent variable with no variance. In this sense, it serves only as a placeholder in the notation (no separate values are estimated in the model). We have included it here only for explicit correspondence to the typical equations.

More importantly, however, we may wish to fit different kinds of models for students and teachers. In the same way that mixed effects models can be represented by equations at each level, I introduce a few changes in order to open interesting possibilities later. First, I separate the levels into two submodels (one for students and one for classrooms) and explicitly link them. Second, following SEM notation at the classroom level, I adopt \eta as our symbol for latent classroom deviations, with mean \alpha and variance \Psi. Third, at the student level, I use SEM notation for the residual variance: \theta. Figure 4.3 shows the results of these changes, making the diagram correspond closely to standard SEM. The most important parameters, the variances of the random effects (\Psi and \theta), are shown with the fixed effect (\alpha). In this model, we are interested in estimating three parameters: the latent mean across classrooms (\alpha), the variance of classroom deviations (\Psi), and the variance of the student residuals (\theta). The cross-level link is fixed at unity and not estimated (and, as we will see, is usually handled automatically by software). For the sake of completeness, we can now compactly rewrite Equation 4.3 using SEM notation from Figure 4.3:

Y_{ij} = 1\eta_{j} + 1e_{ij},    (4.4)

(4.4)

where the values of ηj are distributed N[α, Ψ] and the eij are distributed N[0, θ]. The first term, 1ηj, shows the linking of each student score, Yij, by a unit-weight design matrix: Each student gets counted in his or her respective classroom. Thus, each student can be thought of as a variable indicating the latent classroom mean, with a fixed factor loading of unity (Mehta & Neale, 2005). Each classroom has a latent mean, ηj, with grand mean² α and variance Ψ. Additionally, the individual residual, eij, is shown with an explicit unit link. The parameters of this model (α, Ψ, θ) respectively represent those of conventional mixed-effects models (γ00, τ00, σ²), as estimated in computer programs such as HLM (Hierarchical Linear Modeling), SAS PROC MIXED, or other multilevel software (Mehta & Neale, 2005).

Figure 4.3 Diagram of a univariate two-level model for students within classrooms.


I have chosen to represent the unit links explicitly in order to invoke the notion of a factor loading from SEM: Random effects in multilevel models are latent variables in SEM. In order to understand the connection between Equation 4.4 and Figure 4.3, it can be helpful to visualize the data structure implied by the model. I then illustrate this model in an expanded, explicit way to demonstrate how a multilevel model is a model of covariance. First, we can imagine a simple data set of three classrooms of two students each:

Student    Classroom    Outcome
   1           1          Y11
   2           1          Y21
   3           2          Y32
   4           2          Y42
   5           3          Y53
   6           3          Y63

Each outcome score in the table shows an individual student score, Yij, where the subscripts indicate the student (i) and the respective classroom (j). Then, in matrix form, Equation 4.4 would match the dataset of six students in three classrooms in the following way:

$$\begin{bmatrix} Y_{11} \\ Y_{21} \\ Y_{32} \\ Y_{42} \\ Y_{53} \\ Y_{63} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \eta_1 \\ \eta_2 \\ \eta_3 \end{bmatrix} + \begin{bmatrix} e_{11} \\ e_{21} \\ e_{32} \\ e_{42} \\ e_{53} \\ e_{63} \end{bmatrix} \qquad (4.5)$$

Equation 4.5 is the matrix version of Equation 4.4 and shows the outcome scores on the left side (Yij). On the right side, the full matrix of unit links for each classroom's deviation (ηj) is actually an i × j design matrix, allocating each block of two students to their respective classroom (Mehta & Neale, 2005). This matrix is usually constructed automatically by software, simply by identifying the classroom cluster variable (e.g., teacher ID). The second matrix is a column vector of latent variables to be estimated for each classroom. In this case, there is one latent η for each of the three classrooms. Finally, each student has a residual score, deviating from the model prediction for the classroom. In the next sections, we will see how this model can be specified in software syntax.

Figure 4.4 completes the representation of Equation 4.5. Each classroom has a latent factor, indicated by its students. At the top of Figure 4.4, a single mean is estimated to be common across the latent classroom factors (the separate latent means are constrained equal); the classroom latent factors deviate from this mean. Similarly, the variances of the classroom factors are constrained equal, because we are interested in estimating the population variability from this sample of classrooms.
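The covariance structure implied by Equation 4.5 and these equality constraints can be spelled out directly; the short derivation below is a restatement for reference, using the same symbols, rather than anything new in the model:

$$\operatorname{Var}(Y_{ij}) = \Psi + \theta, \qquad \operatorname{Cov}(Y_{ij}, Y_{i'j}) = \Psi \;\; (i \neq i'), \qquad \operatorname{Cov}(Y_{ij}, Y_{i'j'}) = 0 \;\; (j \neq j'),$$

so that for the two students in any one classroom, the implied covariance matrix is

$$\Sigma_j = \begin{bmatrix} \Psi + \theta & \Psi \\ \Psi & \Psi + \theta \end{bmatrix}.$$

Students in the same classroom covary only through the shared classroom factor, which anticipates the intraclass correlation, Ψ / (Ψ + θ), computed from the estimates later in this section.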

Figure 4.4 Structural equation modeling (SEM) diagram of two students each in three classrooms (expanded, multivariate layout).

At the bottom of Figure 4.4, each student in the model from Equation 4.5 is represented explicitly. Residual variances are constrained to a single value (θ), because the model assumes that, conditional on the classroom factors, student scores behave as if drawn randomly from a homogeneous population. The cross-level linking matrix in the middle of Figure 4.4 is represented as classroom-specific paths from three latent variables, one per classroom. This representation conforms to the linking matrix from Equation 4.5, with blocks of ones in three columns. The factor loadings are fixed to unity for each student within the respective classroom. The links between students and the classrooms to which they do not belong are constrained to zero and are therefore not shown in the diagram.

Figure 4.4 emphasizes that the multilevel latent variable for each classroom is a model of covariance between persons. That is, because people's scores within a classroom are expected to covary due to their shared environment, the latent variable for each classroom is fit to them. Conditional on the classroom latent variable, students are independent of each other. This is akin to a model of parallel tests, where people in a univariate multilevel model act as variables in a single-level, multivariate model (Mehta & Neale, 2005).

The representation in Figure 4.4 can be extended to any number of persons per classroom and any number of classrooms. However, this expanded version can be quite cumbersome to specify. It is shown here only to emphasize that SEM is a model of covariances, that multilevel regression is a special case of SEM (Curran, 2003; Mehta & Neale, 2005), and that the cross-level linking matrix is a matrix of factor loadings. Subsequently in this chapter, I return to the more compact notation (Figure 4.3). It is worth noting that this same shift from univariate to multivariate layout is conceptually the same shift used for multiple time points within persons, allowing individual growth models to be fit as latent variables in SEM (McArdle & Epstein, 1987; Mehta & West, 2000; Meredith & Tisak, 1990; B. O. Muthén, 1997).

4.2.1 PROC MIXED Estimation of the Univariate Model

Let us return to the sample of 802 first-grade students in 93 classrooms. In SAS PROC MIXED, the univariate random-intercepts model would be specified as follows:

proc mixed data=PCWA method=ML;        /* name of dataset and estimation method */
  class teacher;                       /* teacher ID as classification variable */
  model PC = / solution;               /* outcome with no predictors */
  random intercept / subject=teacher;  /* intercept is random at teacher level */
run;

Similar to the examples provided in Singer (1998), this code specifies that teacher is a classification variable for the clustering of students in classrooms. The model statement gives the regression equation, in which passage comprehension, PC, is specified without predictors. The solution option of the model statement requests the estimate of the fixed effect: the latent mean across classrooms (α). The residual variance (θ) is estimated implicitly by default. The intercept is specified to be random across classrooms (teacher). The random statement produces the variance component (Ψ) of the random effect (the intercept, ηj) implicitly by default. The method=ML option is specified in order to be comparable to typical SEM estimation (other estimators can be chosen; PROC MIXED defaults to REML). The first main result of this PROC MIXED code is the table of covariance parameter estimates:

Covariance Parameter Estimates

Cov Parm     Subject    Estimate
Intercept    teacher    89.8343
Residual                410.00

These show that the variance across classroom intercepts (Ψ, with teacher as the subject) = 89.8343 and the residual variance (θ) = 410.00. If we add these variance components, the total variance of the model is 499.8 (89.8 + 410.0). An important concept in multilevel models is the intraclass correlation (ICC), or proportion of variability due to clustering. The ICC is simply the ratio of cluster-level variance to total variance (89.8 / 499.8 = 0.18). In this case, the model suggests that 18% of the variance in the student-level outcome is due to classroom clustering. The second important part of the output shows the fixed effect of the intercept (the latent mean, α) as 444.00 W-score units. This implies that in a perfectly average classroom, we would expect the classroom mean to be 444 W-score units. If we take the square roots of the preceding variance components, then classrooms have a standard deviation (SD) of 9.5 W-score units around this mean, and students differ within their classrooms with an SD of 20.2 W-score units.
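For reference, the arithmetic behind these summaries, using the estimates reported above:

$$\widehat{\mathrm{ICC}} = \frac{\hat{\Psi}}{\hat{\Psi} + \hat{\theta}} = \frac{89.8343}{89.8343 + 410.00} \approx 0.18, \qquad \sqrt{89.8343} \approx 9.5, \qquad \sqrt{410.00} \approx 20.2.$$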

Solution for Fixed Effects

Effect       Estimate    Standard Error    DF    t Value    Pr > |t|
Intercept    444.00      1.2806            92    346.70
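If these two tables are to be carried into a report, one optional convenience, not shown in the chapter, is to route them to SAS datasets with an ODS OUTPUT statement. A minimal sketch, assuming the standard PROC MIXED ODS table names CovParms and SolutionF and using the hypothetical dataset names cov_est and fixed_est:

proc mixed data=PCWA method=ML;
  class teacher;
  model PC = / solution;
  random intercept / subject=teacher;
  /* save the printed covariance-parameter and fixed-effects tables as datasets */
  ods output CovParms=cov_est SolutionF=fixed_est;
run;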