Applying the Rasch Model in Social Sciences Using R [1 ed.] 1138500771, 9781138500778


English Pages 302 Year 2019


Table of contents :
Dedication
Contents
List of Figures
List of Tables
List of Equations
List of Abbreviations
Acknowledgments
Preface
1 Introduction
2 “Have You Assaulted a Politician?” Measuring Political Engagement Using the Simple Rasch Model
3 Expectations and Residuals
4 “All We Need Is Love (and Money)!” An Introduction to the Rating Scale Rasch Model
5 Relaxing the Common Scale: The Partial Credit Model
6 Observations or Subjective Interpretations? A Many-facets Rasch Model
7 The Rasch Model as a Generalized Linear Mixed Model
8 Publishing a Paper: Detection of Online Hate Speech
References
Index


“A fantastic introduction to the Rasch model from the novice to the experienced researcher, with self-assessment questions, code, exercises and additional readings. Iasonas Lamprianou took on the challenging task of making accessible information that [is] not readily understood. His step-by-step approach builds upon prior knowledge and enables the reader to follow through more challenging concepts. Concepts and terms are made easy using a very gentle introductory approach for individuals with little background in mathematics and statistics. The available code enables the reader to easily apply each methodology and visualize results and obtain the necessary statistics. It is a must have for graduate students and researchers who plan on utilizing the Rasch model for scale development and evaluation.”
Georgios Sideridis, Boston Children’s Hospital, Harvard Medical School, US

“This book is an extremely useful introduction to a very important but still relatively little-known approach to carrying out quantitative research in education, psychology, economics or social sciences, to name just a few areas of study. All these fields benefit from the measurement of constructs such as language proficiency, political engagement, quality of life, extroversion or parental support that cannot be observed directly but have to be constructed from concrete indicators. Lamprianou discusses the principles and practice of building measures with Rasch analyses in a very accessible way by walking the reader through the key concepts with the help of published studies. He encourages the readers to conduct their own analyses with the provided data samples by using the freely available software BlueSky Statistics, which is based on the R platform. The book has a lot to offer to those who are involved in constructing measures across a wide range of areas of studies.”
Ari Huhta, University of Jyväskylä, Finland

“This is an ideal book for the beginner and the experienced researcher interested [in exploring or enhancing] their knowledge and skills on the ‘magic world of the Rasch models’, making the transition to the R platform. The author has made an excellent work in organising the book in such a way to effectively guide the reader through simple steps from conducting a Rasch analysis to more contemporary and advanced methods such as the Generalized Linear Mixed Models. The last chapter is a real innovation: it provides a fully blown example of how to construct a manuscript for submission to a journal, using the Rasch model. Applying the Rasch Model in Social Sciences using R is a well-written book with real life applications that could be a great resource for anyone interested in learning and applying Rasch modelling in analysing research data.”
Panagiotis Antoniou, University of Cambridge, UK

“Applying the Rasch Model in Social Sciences using R and BlueSky Statistics is an intrinsically statistical book where jargon and acronyms abound. However, Iasonas Lamprianou, with his years of experience in the field and refreshing sense of humour, takes R and Rasch modelling to the next level. Even though not a primer for the general inexperienced reader, the book is carefully organised, introduces R in an accessible way and offers much inspiration through the vivid examples and breezy language. The book will make an absorbing read that offers independence but clear signposting and very many exciting venues of research based on substantial guidance and input for those who want to make the extra mile.”
Dina Tsagari, OsloMet University, Norway

“If you are new to Item Response Theory (IRT) and Rasch measurement, this book is an excellent introduction to how to use R (via the user-friendly platform of BlueSky) to perform statistical analyses within this framework. The book covers key concepts and resources needed to use basic and more advanced Rasch techniques to analyse data from various disciplines, including education, psychology and political science. Its style is very accessible; each chapter really drives home the conceptual understanding before proceeding to the formulae, the statistical procedures and the examples. It is written in an enjoyable way and can be read through cover to cover. Moreover, I found the wealth of clear examples, the nice computer outputs and, most importantly, the intuitive explanations an excellent educational way to preview both R and Rasch analysis. Applying the Rasch Model in Social Sciences Using R and BlueSky Statistics is an essential resource for anyone wishing to begin, or expand, their learning of Rasch measurement techniques using R.”
Ioannis Tsaousis, University of Crete, Greece

Applying the Rasch Model in Social Sciences Using R and BlueSky Statistics

This unique text provides a step-by-step beginner’s guide to applying the Rasch model in R, a probabilistic model used by researchers across the social sciences to measure unobservable (“latent”) variables. Each chapter is devoted to one popular Rasch model, ranging from the least to the most complex. Through a freely available and user-friendly package, BlueSky Statistics, Lamprianou offers a range of options for presenting results, critically examines the strengths and weaknesses of applying the Rasch model in each instance, and suggests more effective methodologies where applicable.

With a focus on simple software code that does not assume extensive mathematical knowledge, the reader is initially introduced to the so-called simple Rasch model to construct a “political activism” variable out of a group of dichotomously scored questions. In subsequent chapters, the book covers everything from the Rating Scale to the Many-facets Rasch model. The final chapter even showcases a complete mock manuscript, demonstrating what a Rasch-based paper on the identification of online hate speech should look like. Combining theoretical rigor and real-world examples with empirical datasets from published papers, this book is essential reading for students and researchers alike who aspire to use Rasch models in their research.

Iasonas Lamprianou is Assistant Professor of Quantitative Methods at the University of Cyprus. He specializes in the operationalization and measurement of latent variables in the social sciences. Iasonas has a longstanding interest in the application of Rasch models on empirical data and has published widely in diverse disciplines such as education, political science, sociology and psychology.

Quantitative Methodology Series
George A. Marcoulides, Series Editor

This series presents methodological techniques to investigators and students. The goal is to provide an understanding and working knowledge of each method, with a minimum of mathematical derivations. Each volume focuses on a specific method (e.g., Factor Analysis, Multilevel Analysis, Structural Equation Modeling).

Proposals are invited from interested authors. Each proposal should consist of: a brief description of the volume’s focus and intended market; a table of contents with an outline of each chapter; and a curriculum vita. Materials may be sent to Dr. George A. Marcoulides, University of California, Santa Barbara, [email protected].

Duncan/Duncan/Strycker • An Introduction to Latent Variable Growth Curve Modeling: Concepts, Issues, and Applications, Second Edition
Cardinet/Johnson/Pini • Applying Generalizability Theory Using EduG
Creemers/Kyriakides/Sammons • Methodological Advances in Educational Effectiveness Research
Heck/Thomas/Tabata • Multilevel Modeling of Categorical Outcomes Using IBM SPSS
Heck/Thomas/Tabata • Multilevel and Longitudinal Modeling with IBM SPSS, Second Edition
McArdle/Ritschard • Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences
Heck/Thomas • An Introduction to Multilevel Modeling Techniques: MLM and SEM Approaches Using Mplus, Third Edition
Hox/Moerbeek/van de Schoot • Multilevel Analysis: Techniques and Applications, Third Edition
Lamprianou • Applying the Rasch Model in Social Sciences Using R and BlueSky Statistics

Applying the Rasch Model in Social Sciences Using R and BlueSky Statistics IASONAS LAMPRIANOU

First published 2020
by Routledge
52 Vanderbilt Avenue, New York, NY 10017
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2020 Taylor & Francis

The right of Iasonas Lamprianou to be identified as author of this work has been asserted by him in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book has been requested

ISBN: 978-1-138-50077-8 (hbk)
ISBN: 978-1-138-50078-5 (pbk)
ISBN: 978-1-315-14685-0 (ebk)

Typeset in Charter by Apex CoVantage, LLC

Visit the eResources: www.routledge.com/9781138500785

I dedicate this book to you, Thekla, because you have always supported all my endeavors.

Contents

List of Figures
List of Tables
List of Equations
List of Abbreviations
Acknowledgments
Preface

1 Introduction
  Quantitative Research in Social Sciences
  The Need for Reliable and Valid Measurement
  From Observations to Measures
  Basic Information on R and BlueSky Statistics
  How to Read This Book
  Summary
  Brief Summary of Key Concepts
  Additional Reading

2 “Have You Assaulted a Politician?” Measuring Political Engagement Using the Simple Rasch Model
  Case Study I: Measuring Political Engagement Using the Rasch Model
  Application of the Rasch Model on Empirical Data: A Black Box Approach
  The Building Blocks of Information: Response Patterns and Raw Scores
  Estimates of Activity Popularity/Unpopularity
  Estimates of Person Activism
  Missing Responses and Their Effect on Rasch Estimates
  Putting Everything Together
  Case Study II: The Probabilistic Thinking of Pupils
  Rasch Estimates and the Person-Item Map
  Item Information Curve
  Sample Size (Persons) Needed
  Number of Items Needed
  Summary
  Self-Assessment Questions
  Brief Summary of Key Concepts
  Additional Reading

3 Expectations and Residuals
  A Gentle, but Formal, Presentation of the Rasch Model
  Expected Responses
  Visualizing Expectations With Item Characteristic Curves
  Residuals: When Reality Fails Our Expectations!
  Visual Assessment of Residuals
  Model-Data Fit Statistics as Aggregations of Residuals
  How to Evaluate Model-Data Fit in Practice
  Person-Fit Indices
  Other Available Fit Statistics
  Rasch Model Assumptions and Violations: A General Discussion on Model-Data Misfit
  Summary
  Self-Assessment Questions
  Brief Summary of Key Concepts
  Additional Reading

4 “All We Need Is Love (and Money)!” An Introduction to the Rating Scale Rasch Model
  So, What Do University Students Want From Their Families?
  Application of the RSM
  Item Characteristic Curve
  The Mathematical Approach to RSM
  Expected Scores Curve
  Distribution of Raw Scores and Estimation of Thetas
  Investigation of Person Fit (Expectations vs Observations)
  Investigation of Item (Statement) Fit
  Further Investigation of Model-Data Fit
  Item-Person Map
  Category Information Curves
  Some General Guidelines When Applying the RSM
  Summary
  Self-Assessment Questions
  Brief Summary of Key Concepts
  Additional Reading

5 Relaxing the Common Scale: The Partial Credit Model
  A Mathematical Formulation of the PCM
  Application of the PCM
  Item-Person Map
  Goodness of Fit
  Summary
  Self-Assessment Questions
  Brief Summary of Key Concepts
  Additional Reading

6 Observations or Subjective Interpretations? A Many-facets Rasch Model
  Applying the MFRM
  A Formal Presentation of the MFRM
  Relaxing the Assumptions of a Common Rating Scale
  Individual Raters Implementing Their Own Rating Scale
  Investigating Model-Data Fit
  The Problem of Disjoint (Disconnected) Subsets
  Summary
  Self-Assessment Questions
  Brief Summary of Key Concepts
  Additional Reading

7 The Rasch Model as a Generalized Linear Mixed Model
  Introducing the Rasch Model as a GLMM
  Applying the Rasch Model as a GLMM
  The Data
  A Simple Logistic Regression Model
  Applying the Rasch Model as a GLMM
  Rasch as a Latent Regression Model Using the Lme4 Package
  Rasch as a Latent Regression Model Using the TAM Package
  Summary
  Self-Assessment Questions
  Brief Summary of Key Concepts
  Additional Reading

8 Publishing a Paper: Detection of Online Hate Speech
  Personal and Group Effects in a Crowdsourcing Experiment to Detect Online Hate Speech
  Acknowledgments
  Introduction: The Problem of Online Hate Speech
  The Complexities of Hate Speech Detection
  Aims and Objectives
  Methodology
  Results
  Discussion
  Appendix: Operational Definition of Hate Speech

References
Index

Figures

1.1 The Introductory Screen of R
1.2 The Data Window of BlueSky Statistics
1.3 The Output and Syntax Window of BlueSky Statistics
2.1 Different Operational Definitions of the Construct “Political Participation”
2.2 “During the last 12 months, have you . . . . Yes/No”
2.3 An Extract of the Rectangular Response Matrix Holding the Responses of 5025 Persons to 13 Questions Regarding Political Activities
2.4 Another Extract of the Dataset (activities and persons have not been sorted)
2.5 A “Magic Black Box” Analogy of a Probabilistic Model Predicting Whether a Person Gets Involved With a Specific Political Activity
2.6 A Magic Box Estimating the Probability of a Person (of known tendency towards political activities) to Engage in a Specific Political Activity (with known popularity)
2.7 A Judgment of a Person’s Activism and a Judgment of an Activity’s Popularity Govern the Probability that a Person Will Engage With an Activity
2.8 Percentage of Positive Responses: Some Activities are Very Difficult to Endorse
2.9 The Frequency of Respondents Who Got Involved in a Different Number of Activities
2.10 A Correlogram (correlations among all pairs of activities)
2.11 Available Menu and Frequency Tables for Some of the Variables of the Engagement Dataset (BlueSky Statistics)
2.12 A Simple Rasch Model Analysis of Political Engagement
2.13 The Estimates of the Simple Rasch Model Parameters, for the Political Engagement Dataset
2.14 The Non-linear Relationship Between Activity Popularity Expressed as a Percentage of Positive Responses and Activity Popularity Expressed as a Rasch Measure
2.15 The Conversion From Popularity Percentages to Popularity Measures
2.16 A Histogram of Person Activism Estimates
2.17 How to Load the Scripts Window
2.18 How to Run Syntax in the Scripts Window
2.19 Activity Popularity Expressed as a Count of Positive Responses (vertical axis) and as a Rasch Measure (horizontal axis)
2.20 The Same Count of Positive Responses Can Correspond to Different Popularity Measures (a magnification of the lower left part of Figure 2.19)
2.21 The Conversion From Counts to Measures
2.22 The Estimates of the Simple Rasch Model Parameters, for the Probabilistic Thinking Dataset
2.23 Person-Item Map for the 13 Items of the Probabilistic Thinking Scale
2.24 Item and Test Information Plots
2.25 The Standard Error of Item Estimates (upper part) and the Distribution of Ability Estimates (lower part)
2.26 A Summary of All Steps, From the Conceptualization of the Main Variables, to the Analysis of Relevant Data on Political Engagement
3.1 ICC Plots for Item 2: a Simple Plot (left), With an Empirical Curve (middle) and With 95% Confidence Intervals of Observed Points (right)
3.2 ICC Plots for Item 4, With an Empirical Curve and 95% Confidence Intervals
3.3 A Graphical Representation of Andersen’s Likelihood-Ratio Test
4.1 A Visualization of the Agreement vs Disagreement Responses to the Five Family Involvement Statements
4.2 Two Different Visualizations of the “Agreement Ladder” Where the Verbal Descriptions of the Rating Scale (AD to AA) Have Been Mapped to Different Sets of Arbitrary Numerals: 1–4 (left) and 0–3 (right)
4.3 For Every Given Respondent, We Wish to Express His or Her Overall Attitude Towards Family Involvement With a Single Numerical Value, Which Can Be Used Later for Further Analysis
4.4 MML Estimation for the Rating Scale Rasch Analysis of the Family Involvement Data
4.5 The RSM Output (MML estimation) for the Family Involvement Scale
4.6 Left Plot: the Theoretical Probability (according to the model estimated parameters) for Each Response to be Observed at Each Point of the Latent Variable. Right Plot: Same as the Left Plot but With the Empirical Probabilities to Facilitate Model-Data Fit Comparisons
4.7 Left: the Shaded Area Shows the Interval of the Latent Variable Where Cat1 is the Most Likely Outcome for the Academic Statement. Right: the Shaded Area Shows the Interval of the Latent Variable Where Cat2 is the Most Likely Outcome for the Academic Statement
4.8 A Total of 41% of the Respondents Chose Options Other Than the Most Likely One, Which Was Option D for the Academic Statement
4.9 ICCs for the Academic Statement
4.10 ESC for the Academic Statement (left) and the Emotional Statement (right)
4.11 The Distribution of the Raw Scores and of Rasch Estimates for the Respondents
4.12 The Raw Score Is a Sufficient Statistic for Rasch Estimates
4.13 The Probability of a Student With an Estimate of θ = 1.277 Logits to Choose Each of the Four Options on the Five Statements
4.14 Expected Responses to the Five Statements According to the RSM
4.15 Expected Responses and Residuals to the Five Statements According to a Student Whose Response Pattern Was {Abs. Disagree, Agree, Agree, Abs. Agree, Abs. Agree}
4.16 Expected Responses and Residuals for a Student With a Total Score of 10 on the Five Statements
4.17 The Distribution of the Person Estimates (left part) and of the Scale Category Estimates (right part)
4.18 A Category Information Curves Plot for the Academic Statement
5.1 The PCM Output (MML estimation) for the Family Involvement Scale
5.2 The Item-Person Map for the PCM
5.3 A Comparison of the ICCs of the Academic and Emotional Statements
5.4 A Comparison of the ICCs of Six Items With Different Category Thresholds
6.1 MML Estimation for the Many-facets Rating Scale Rasch Analysis
6.2 The RSM Output (MML estimation) for the MFRM
6.3 Category Probability Curves for Rater 1, Question 1 (MFRM, RSM)
6.4 MML Estimation for the Many-facets Partial Credit Rasch Analysis
6.5 The Code Regulating the Partial Credit MFRM
6.6 CPC for Raters 4 (left plot) and 9 (right plot); Parameter Estimates From an MFRM Allowing Rater-Specific Category Estimates
7.1 Long-format Data for a GLMM
7.2 Use the “Subset” Menu to Create a New Dataset, Filtering Out Some of the Data
7.3 Use the Syntax type==“Which” to Filter Out All Rows in the Dataset Where the Value in the Variable Type is “Which”
7.4 A New Dataset Will Be Generated According to the Criteria type==“Which”
7.5 To Convert a Numerical Variable to Factor, Go to the Menu and Choose the “Make Factor Variable” Option
7.6 Go to the Model Fitting Menu to Run a Logistic Regression Model
7.7 The Logistic Regression Window
7.8 The Output of the “Naïve” Logistic Regression Model
7.9 MML Estimation for the RSM for the “Reading for Pleasure” Scale
7.10 MML Estimates of the RSM for the “Reading for Pleasure” Scale
7.11 Infit and Outfit Statistics for the RSM for the “Reading for Pleasure” Scale
7.12 Parts of the RLR Output for the “Reading for Pleasure” Scale, Using Gender and Year of Study as Explanatory Variables
7.13 The Standard Error of the Regression Coefficients of the Explanatory Variables
8.1 Interaction Between Attendance and Group (the impact of absenteeism is different for journalism and sociology students)
8.2 Distribution of Coding Latencies
8.3 Mean Coding Latency per Tweet (for different “packages” of tweets), for Male and Female Coders
8.4 Mean Percentage of Speeded Coding for Different “Packages” of Tweets, for Male and Female Coders

Tables

2.1 Examples of Probabilities of Engagement (assuming known activity popularity and person activism)
2.2 Estimated Probabilities of Engagement
2.3 Response Patterns and Missing Responses by Five Individuals
2.4 Descriptive Statistics of Items
3.1 Item-fit Statistics for Each of the 13 Items on the Test
3.2 Reasonable Question Infit and Outfit Mean-square Ranges
3.3 Fit Statistics for Each of the Pupils Who Took the Test
4.1 Five Likert-type Statements for Familial Involvement in Students’ Lives
4.2 Frequency Distribution for Each of the Four Categories of the Likert Scale, for Each of the Five Questions
4.3 The Goodness of Fit Statistics of the Response Patterns of Four Students
4.4 The Goodness of Fit Statistics of the Statements (and the categories of the scale); Based on Simulations
4.5 The Goodness of Fit Statistics for Each of the Statements on the Test (not based on simulations)
5.1 The Goodness of Fit Statistics for Each of the Statements on the Test (not based on simulations)
6.1 Parts of the Output of the Rasch Analysis Where Each of the Raters Applies Their Own Rating Scale
7.1 The Output of the GLMM1 (pupils are random effects and items are fixed effects)
7.2 The Output of the GLMM2 (pupils are nested into classes)
7.3 The Output of the GLMM3 (latent regression with grade as an explanatory variable)
8.1 Rasch Latent Regression for Online Hate Speech Detection (tweet deltas are fixed effects; coder thetas are random effects)
8.2 Rasch Latent Regression for Online Hate Speech Detection, Using Thematic Code as Predictors (tweet deltas are fixed effects; coder thetas are random effects)

Equations

1 Log-odds of a correct response
2 The simple Rasch model as an exponential equation
3 Standardized residuals
4 Outfit Mean Square statistic
5 Infit Mean Square statistic
6 The Rating Scale Rasch model
7 Expected score
8 Residuals
9 Standardized residuals
10 Expected variance for observed response
11 Infit Mean Square (polychotomous case)
12 Outfit Mean Square (polychotomous case)
13 Infit Mean Square (for items)
14 Outfit Mean Square (for items)
15 The Partial Credit model
16 A many-facets approach to the log-odds of a positive response
17 The simple Many-facets Rasch model
18 The Many-facets Rating Scale Rasch model

Abbreviations

ABBREVIATIONS FOR MODELS
RSM: Rating Scale model
PCM: Partial Credit model
MFRM: Many-facets Rasch model
GLMM: Generalized Linear Mixed models

ABBREVIATIONS FOR ESTIMATION METHODS AND ESTIMATES
CML: Conditional Maximum Likelihood
EAP: Expected A Posteriori (Estimate)
JML: Joint Maximum Likelihood
MLE: Maximum Likelihood Estimate
MML: Marginal Maximum Likelihood
UCON: Unconditional Maximum Likelihood
WLE: Weighted Likelihood Estimate

ABBREVIATIONS FOR VARIOUS STATISTICS
95% CI: 95% Confidence Interval
df: Degrees of Freedom
IMS: Infit Mean Square
OMS: Outfit Mean Square
SE: Standard Error

ABBREVIATIONS FOR DIAGNOSTIC AND EXPLORATORY RASCH FIGURES
CIC: Category Information Curve
CPC: Category Probability Curve
ESC: Expected Score Curve
ICC: Item Characteristic Curve (for Rasch graphs; Chapters 2–6); Intra-class Correlation (for GLMMs; Chapters 7 and 8)
IIC: Item Information Curve
IRF: Item Response Function
TCC: Test Characteristic Curve
TCF: Test Characteristic Function
TIC: Test Information Curve

ABBREVIATIONS FOR LIKERT SCALE (USED IN CHAPTERS 4, 5 AND 8)
AD: Absolutely Disagree
D: Disagree
A: Agree
AA: Absolutely Agree
DK: Don’t Know
HS: Hate Speech
NHS: Not Hate Speech

Acknowledgments

I would like to express my thanks to everyone on the Routledge team who helped me with this book. Also, thanks to my colleagues at the Department of Social and Political Sciences, University of Cyprus, for granting me the sabbatical leave which allowed me to invest enough time for this book. My sincere thanks to my colleague Antis Loizides, who had the patience to read the book and give me such detailed feedback. I am also indebted to Alexander Robitzsch, IPN—Leibniz Institute for Science and Mathematics Education at Kiel University, who had the patience and the perseverance to engage with the book in order to give me thoughtful comments. Alexander, I truly appreciate your spontaneous contribution! Special thanks to the Quantitative Methodology Series Editor, Professor George A. Marcoulides, who gave me the opportunity for this book. George, without your encouragement, I would probably have hesitated to undertake this project. Finally, my warmest thanks to Professor Julian Williams, University of Manchester, UK, for being an inspirational teacher and a reliable friend.

Preface

It seems like it was only yesterday that I was studying for my master’s degree in educational assessment, at the University of Manchester, UK. Being young and thirsty for knowledge, I complained to the course director that the course was not quantitative enough. He stared at me for a moment, leaned back in his chair, and hesitantly redirected me to “that mathematics professor on the fourth floor”! That was when I first met Professor Julian Williams, who introduced me to the magical world of the Rasch models. And the rest is history, as people often say. Back in those days, “running” a Rasch analysis was not very straightforward. One had to master the use of command-based software, with all its peculiar syntax and computational idiosyncrasies. Social sciences students typically detest this kind of software, so I quickly developed my own Windows-based Rasch software, using the equations so generously provided by Professors Geoff Masters and Benjamin Wright in two amazing books, Best Test Design and Rating Scale Analysis. Today, however, we need not worry too much about software. BlueSky Statistics, the software used in this book, offers a very intuitive and friendly environment, yet full of powerful tools, to make our Rasch experience pleasant and time efficient. This is important, because a researcher should spend more time reflecting on the theoretical and methodological challenges of quantitative research rather than on the use of the appropriate syntax. I hope that young and enthusiastic students—but also experienced researchers who would like to make the transition to the R world—will find this book useful. More people should be given the opportunity to enter the beautiful world of Rasch analysis, as I did, so many years ago, in a rainy city in the northwest of England.


NOTATION AND TYPOGRAPHY

{ Important notes will be presented to the left of the page. }

You need no familiarity with R to engage productively with this book. However, you are expected to understand some fundamental statistical concepts (such as the mean and the standard deviation) and to have at least some instrumental understanding of mathematical concepts such as the logarithm. Although we typically use words rather than symbols (e.g., we use “mean” rather than x̄), be prepared to encounter some of the most common mathematical symbols, such as e for the base of the natural logarithm and sigma (Σ) for summation.

Every chapter gives you the opportunity for practice on your computer, with datasets from published papers. This helps you to gradually build confidence and seamlessly enrich your practical analysis repertoire. The learning curve is not steep, and you should be able to produce and interpret Rasch output quickly.

To help you navigate easily through different pieces of text, we use the Courier New font to signify R commands/code, variables and functions. Most of the code presented in Courier New font is also available on the book’s webpage and can be copied/pasted for convenience. All the datasets can be downloaded from the book’s webpage, but the relevant academic papers have to be downloaded from the publishers’ web site.

At various places in the book, you may encounter an image like the one shown on the left. The text in the image will alert you to significant details or important information that you need to take note of.

1 Introduction

This chapter provides virtually no information about the Rasch model. Yet it is the most important chapter of the book, as it presents the fundamental epistemological foundations upon which the Rasch model resides. The first section introduces the reader to the concept of theoretical constructs which are measured through observable indicators. The second section discusses the need for reliable and valid measurement, especially in light of the spreading replication crisis in the wider field of social sciences. The third section presents a concise description of each chapter and discusses how the reader can make the best possible use of the book. Finally, the last section presents some brief information about BlueSky Statistics (www.blueskystatistics.com/), the software we use to conduct the analyses. We also discuss briefly some of the basic mechanics of using the R platform.

QUANTITATIVE RESEARCH IN SOCIAL SCIENCES

Quantitative research has gained a lot of traction in social sciences in the past decades. The advent of faster (and cheaper) computers, the availability of state-of-the-art open-source software such as R (www.r-project.org/), the ease of access to big datasets and the greater speeds (and lower prices) of internet connection have contributed to the proliferation of quantitative research. It is not, any more, the Secret Garden of the few. A fundamental underpinning of quantitative research is that we observe the empirical world around us to collect numerous cues relevant to our research questions. These observations are coded in a disciplined way, so as
to become data which can then be processed through software, to produce information. This information will finally be evaluated and used to answer our research questions or to test our theory-driven hypotheses about the social phenomenon under investigation. Social researchers often wish to quantify physically unobservable traits as diverse as ability, ideology or creativity. It is not possible to directly observe the required information; instead, the researchers first define these traits theoretically (e.g., “what is creativity?”) and then identify specific observable indicators. The synthesis of the indicators, using appropriate techniques, will allow them to quantify the trait under investigation. The Rasch model is typically used to aggregate a set of indicators in order to express an unobservable trait as a single quantitative variable. The Rasch model has served the research community faithfully for decades, and has been very popular across many disciplines, from the social sciences to business and medicine. The unobservable traits to be measured are often called constructs, in the sense that you cannot observe them in the physical world, but they are socially or politically determined by us. For example, a political scientist may wish to measure the degree to which the citizens of a country are politically engaged. We collectively agree that political engagement, as a concept, represents the degree to which the people of a society participate in the commons. However, measuring the political engagement of individuals is not a trivial activity. It cannot be measured directly, as one would do when measuring the height or the weight of a person. Also, political engagement could mean different things to different people, so it needs to be theoretically defined, preferably with the help of the relevant literature. 
After agreeing on the theoretical definition, a researcher needs to define the construct operationally through a number of observable indicators (this is the operational definition of the construct). It is likely that different researchers within the same discipline will formulate slightly different operational definitions of the same variable, even if their research has similar aims and objectives. To elaborate, assume that a sociologist wishes to study political engagement. Most researchers would typically define the construct operationally by asking the citizens questions such as “Did you vote in last year’s national elections?” or “Have you, in the last 12 months, signed a petition?” In some cases, however, it may be theoretically appropriate to ask citizens whether they have participated in activities such as assaulting politicians, destroying public property, clashing with police and the
like. A synthesis of these indicators using the Rasch model will allow the researcher to measure the degree of political engagement of each of the individuals participating in the study. These individual measures can then be used for other purposes, for example, to compare the average political activity of groups (e.g., male vs female citizens) or to identify important determinants of political engagement (e.g., age or educational level).

THE NEED FOR RELIABLE AND VALID MEASUREMENT
In the past decades, we have witnessed the proliferation of quantitative research across the social sciences. However, during the same period, the academic community has also been plagued by the unfolding scourge of the “replication crisis”, that is, by the lack of replicability of research findings (King, 2003). Although the replication crisis may have many causes, using measurements of high quality can arguably increase our chances of producing replicable research findings. In psychometric nomenclature, there are two fundamental components of the quality of measurement: validity and reliability.
Validity is a fundamental concept of any measurement and has been at the center of academic discussions for decades (Kane, 2006, 2011). In the everyday use of the term, it refers to the degree to which the operational definition approximates the theoretical definition of a construct. In other words, validity is the degree to which what we measure approximates what we claim to measure. The higher the validity, the more confident we are about our interpretations and uses of the measures. Building on the concept of validity, researchers often talk about validation, which is a cyclic procedure of collecting, organizing and evaluating evidence about the interpretation and use of the outcome of measurement. Although validity is a very complex concept and hundreds of articles and books have been written about it, for this book we do not need to elaborate further. We will meet the concept of validity in other parts of the book as it is the cornerstone of every measurement endeavor.
The twin sibling of validity is reliability. Reliability is a technical concept which may be seen as another word for “precision”. The higher the reliability, the smaller the error and the more replicable the measures.
Although validity is intrinsically a theoretical endeavor, often backed by statistical evidence, assessing the reliability of measurement is profoundly a statistical
exercise (Haertel, 2006). The Rasch model provides useful methodological tools to investigate the validity and to quantify the reliability of measurement. Note the deliberate use of two different verbs: we often say that validity is investigated but reliability is quantified.
To elaborate on our example about political engagement, our measures are valid to the degree that we do not use irrelevant indicators of engagement. For example, it would arguably be a threat to the validity of our measures to ask questions about political efficacy (e.g., “I feel that I understand politics”). Understanding politics is not an activity, although one might suggest that efficacy and activity could be, somehow, related.
Validity, like ice cream, comes in many different flavors! One of the most discussed flavors is face validity. Face validity refers to the degree to which an instrument (e.g., a questionnaire) appears at first glance to be appropriate in terms of its declared aim. In the previous example, including a question about political efficacy in a questionnaire measuring political engagement would harm the face validity of our research. The term face validity is often used to denote the opinion of non-experts regarding an instrument. For example, one might ask: “Does this questionnaire look as if it measures political engagement, in the eyes of the average person who will be invited to participate in the survey?” The term face validity is sometimes avoided by researchers because there is no single, universally accepted definition. For example, in a footnote of a widely cited paper in the political sciences, Adcock and Collier (2001) state that “We have found so many different definitions of face validity that we prefer not to use this label” (p. 538).
On the other hand, the degree to which the content of the instrument (i.e., the items) represents all the desired aspects of the theoretical construct is often called content validity. Content validity mainly refers to the opinion of experts, who are in the best position to ensure that we have not omitted something important from our instrument. It is possible for an instrument to have high face validity (according to the average person on the street) but low content validity (according to the experts). Most of the disciplines in the social sciences draw on the psychometric literature to discuss concepts such as content validity (for an example from the political sciences, see Adcock and Collier, 2001).
Finally, the consequential validity of our research refers to its possible social and/or political effects, which arise mainly from the use of our findings (see, e.g., Brewer et al., 2015). For research to be consequentially valid,
it must not have negative consequences for individuals or for society in general. The consequential validity of our study is important, and it is our duty to disseminate our findings in such a way that the possibility of misuse or misinterpretation is minimized.
In summary, any measurement process should yield both reliable and valid measures. It is often convenient to consider the two as sides of the same coin, and it is not always possible to distinguish unambiguously between them. This is sometimes a problem, as some people wrongly use the terms as if they were interchangeable. In subsequent chapters, we will have the opportunity to discuss how the Rasch model allows us to use different tools to quantify the reliability and investigate the validity of our measurements.

From Observations to Measures
It has been said that “[researchers] . . . are confronted with the problem of making sense out of a large mass of data. To handle this problem the data may be ‘reduced’ to a few summary measures. Techniques of data reduction . . . allow us to understand, to some degree, certain characteristics of . . . variables”. This is not a quote from a recent paper, echoing the anxiety of modern researchers dealing with voluminous data downloaded by the gigabyte from the internet. Instead, it is a quote from Labovitz (1967, p. 151), written more than half a century ago! It seems that aggregating data has kept researchers busy for decades.
So, how do we make the transition from data to measurement? As has already been hinted at in previous sections, we often wish to use the directly observable data as indicators of a latent variable. In the context of this book, we deal with the special case of the Rasch model, which is one of the simplest models for latent measurement (for a more general perspective, see Marcoulides and Moustaki, 2002). The essence of the Rasch model is that, under certain conditions, it enables researchers to aggregate data from multiple observed indicators (e.g., from a number of attitudinal questions) and to express the outcome as a single variable. Thus, instead of many indicator variables, the researcher only needs to consider a single latent variable for purposes of further analysis. In subsequent chapters, we discuss how we can practically apply the Rasch model to empirical data, using the freely available software BlueSky Statistics, which is based on the R platform.
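As a minimal sketch of what this aggregation means in the simplest (dichotomous) case, the Rasch model can be written as shown next. The notation is assumed here for illustration only and may differ from the symbols used in later chapters: X_ni equals 1 if person n endorses indicator i and 0 otherwise, θ_n is the position of person n on the latent variable, and δ_i is the position (difficulty) of indicator i on the same scale.

```latex
\[
  P(X_{ni} = 1 \mid \theta_n, \delta_i)
    = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
\]
```

Under this form of the model, the raw score of a person (the number of indicators endorsed) carries all the information the data hold about θ_n, which is the sense in which many indicator variables can be reduced to a single latent variable.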
BASIC INFORMATION ON R AND BLUESKY STATISTICS
There is a lot of freely accessible material on the internet regarding the basic mechanics of using the R platform. Most of this material is of very good quality and is easy to read and understand. It is also possible to find tons of very useful introductory material regarding R on YouTube. There are also many recent and well-written books on R; you can search for them on the internet or through your favorite bookseller (e.g., search for “R package” at Amazon). These resources, videos and books will give you a very good idea about how to download, install and run R. When successfully installed and run, the initial screen should look like Figure 1.1.
R is a very rich environment which can be used for data processing and statistical analysis, but can also be used as an ordinary programming language. Thus, how you can use R is practically limited only by your imagination. Previous experience with programming is not necessary, but it always helps. For example, it is interesting to know that all the entities that R creates or manipulates are called objects. Objects can be anything, for example, numbers, collections of numbers, graphs, datasets etc. We communicate with R using commands such as “objects()”, which will list all the objects currently loaded in R’s memory (warning: using a capital “O” will lead to an error as R is case sensitive; thus R understands “Objects()” and “objects()” to be different commands). You can add two numbers and assign the sum to an object (let us call the object MySum) by giving the command “MySum