Systemic Racism and Educational Measurement: Confronting Injustice in Testing, Assessment, and Beyond 9781032128818, 9781032132020, 9781003228141

English Pages 413 [415] Year 2023


Author / Uploaded
Michael Russell

Categories
Education

Table of contents :
Cover
Half Title
Title Page
Copyright Page
Table of Contents
Preface
Notes on Terminology, Phrasing, and Conventions
Notes
References
Acknowledgements
Key Terms and Definitions
Introduction
The White Racial Frame: A Brief Introduction
Power and Apparatus of Oppression
Educational Measurement as Apparatus of Oppression
Directing Focus on Racism Rather than on Racist(s)
Educational Measurement: My Working Definition
Structure of This Book
Notes
References
Part I: Race, Racism, and the White Racial Frame
Notes
References
Chapter 1: The Origins of Race
The Origins of Race
Race and the Enlightenment
Flies in the Ointment
Buffon and the Degeneration Hypothesis
Genesis of the Human Race
The Haunting of Blumenbach’s Skulls
The Supremacy of the White Caucasian Race
Notes
References
Chapter 2: Molding Race in the United States
Molding Race in the United States
Molding Race in the 17th Century
Molding Status as a Free and Whole Human
Molding Race through Sexual Relations and a Child’s Status
Social Changes Following U.S. Independence
U.S. Supreme Court Cases
The One-Drop Rule
The U.S. Census
Molding the Social Construction of Race
Notes
References
Chapter 3: The Systemic Structure of Racism
Building a Foundation for Understanding Racism as Systemic
Race
Racialization, Racial Formation, and Racial Projects
Racialized Social System
Racialism, Racialist, and Racist
Race versus Ethnicity
Theories of Racism
Biological Racism
Cultural Assimilation as Racism
Individual Racism
Institutional Racism
Structural Racism
Social Structuring of the United States
Structuring of Social Systems and Physical Spaces
Interactions Among Institutions
Systemic Racism
Notes
References
Chapter 4: The White Racial Frame
Molding Racialism
Individualism and Individual Merit
Social Darwinism and Genetic Determinism
Scientific Discovery
Utilitarian View of Justice
The White Racial Frame and Systemic Racism
Notes
References
Part II: The White Racial Frame and the Development of Educational Measurement
Background: Francis Galton, Henry Goddard, and the Eugenics Record Office
Chapter 5: Heredity and Family Traits
Family Studies
Francis Galton’s Hereditary Genius
The Jukes
Henry Goddard and The Kallikak Family
The White Racial Frame and Family Studies
Notes
References
Chapter 6: The Birth of Tests of Mental Ability
Precursors to Intelligence Testing
Skulls and Mental Functions
Psychophysics and Mental Abilities
Galton and the Anthropometric Laboratory
Standardized Tests of Mental Ability
Importing Binet-Simon to the United States
Intelligence Testing and the Army Alpha
Impact of Intelligence Testing on Educational Measurement
The Influence of the White Racial Frame
Notes
References
Chapter 7: The Rise of Educational Testing and Test Bias
College Admissions Testing
Bias in Mental Measures
The Bias of Test Content
Test Norm Bias
Biased Interpretation of Test Scores
The Influence of the White Racial Frame
Notes
References
Chapter 8: The Rise of Statistics in Educational Measurement
A Brief History of the Early Development of Statistics
Galton’s Statistical Contributions
Karl Pearson and Ronald Fisher
Charles Spearman
Influences on Educational Measurement
Differences Among Populations
Regression Effects and Deficit Narratives
Opportunity Costs
Silencing Bayesian Statistics
The Replication Crisis
Notes
References
Chapter 9: Educational Measurement as Apparatus for Systemic Racism
Connecting Racial Projects and Apparatus of Oppression
Educational Measurement and Systemic Racism
Individual Productions as Apparatus for Systemic Racism
Test Development Practices as Apparatus for Systemic Racism
Admission, Graduation, and Scholarship Decisions as Apparatus for Systemic Racism
Notes
References
Part III: Alternate Lenses for Educational Measurement
Notes
References
Chapter 10: Critical Theory
The Idea of Critical Theory
Critique of Positivism
Historicity
Criticality
Reflexivity
Ideology
Mass Media
Implications of Critical Theory for Educational Measurement
Notes
References
Chapter 11: Critical Race Theory and QuantCrit
Critical Race Theory
Tenets of Critical Race Theory
Interest Convergence
Storytelling and Counter-Storytelling
The Emergence of QuantCrit
Tenets of QuantCrit
Centrality of Racism and Quantification
Critically Evaluating Categories
Using Numbers for Social Justice
Numbers Are Not Neutral—Voice Is Vital
The “Effect of Race” Problem
Implications of QuantCrit for Educational Measurement
Notes
References
Chapter 12: Intersectionality Theory
Multiplicity and Intersectionality
Intersectionality as a Provisional Concept
Intersectionality as a Critical Theory
Identity and Social Position
Intersectionality Metaphors
Intersectionality as a Heuristic Device
Concerns About Intersectionality as Heuristic
Practical Challenges for Quantitative Research
Reliance on Identity Categories
Limitations of Current Statistical Methods
Additive Models
Interaction Models
Multilevel Models
Implications for Educational Measurement
Notes
References
Chapter 13: Educational Measurement and the Pursuit of Racial Justice
Utilitarianism and Educational Measurement
Differential Item Functioning
Context-Neutral Item Content
Mean Effect
Rawls’s Justice as Fairness
Differential Item Functioning
Context-Neutral Item Content
Mean Effect
Mills’s Rectificatory Justice
Differential Item Functioning
Context-Neutral Content
Mean Effect
Justice Through Measurement
Notes
References
Chapter 14: Forging a Path Toward Anti-Racism in Educational Measurement
Reflection
“Race” as a Variable in Analyses
Item Bias
College Admission Test Use
Research
Socially Constructed Positions
Item Bias
College Admission Decisions
Analytic Methods
Diverse Representation
A Final Thought
Notes
References
Index

SYSTEMIC RACISM AND EDUCATIONAL MEASUREMENT

Systemic Racism and Educational Measurement provides a theoretical and historical reckoning with racism and oppression produced through educational measurement and research methodology. As scholars and professionals in the testing, measurement, and assessment of human learning and performance work to exorcise race sciences, white supremacy, and other injustices from the field’s research and practice, new insights are needed into their root causes. This book is the first to posit that the theory of the White Racial Frame was and continues to be applied to the foundations, process, dissemination, and use of educational measurement, leading to instruments, findings, and decisions that perpetuate the racialized social structure of our nation. Even among well-meaning stakeholders who aim to improve humanity and address inequities, the White Racial Frame shapes the field’s research questions, the methods utilized, the data valued, the interpretations made, and the language used throughout. Students and scholars of educational measurement, testing, and psychometrics will find invaluable clarifications of terminology, concepts, and theories integral to understanding systemic barriers in the field; explications of educational measurement’s core purposes and its influence by the White Racial Frame; and a series of alternate frames, theories, and epistemologies intended to guide educational measurement toward anti-racism and increased fairness.

Michael Russell is Professor of Measurement, Evaluation, Statistics, and Assessment in the Lynch School of Education and Human Development at Boston College, USA. He currently serves on the Technical Advisory Committees for several state assessment and accountability programs.

Systemic Racism and Educational Measurement
Confronting Injustice in Testing, Assessment, and Beyond

Michael Russell

Designed cover image: © SEAN GLADWELL / Getty Images

First published 2024
by Routledge
605 Third Avenue, New York, NY 10158

and by Routledge
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2024 Michael Russell

The right of Michael Russell to be identified as author of this work has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-12881-8 (hbk)
ISBN: 978-1-032-13202-0 (pbk)
ISBN: 978-1-003-22814-1 (ebk)

DOI: 10.4324/9781003228141

Typeset in Times New Roman
by SPi Technologies India Pvt Ltd (Straive)

Contents

Preface
Acknowledgements
Key Terms and Definitions
Introduction

PART I
Race, Racism, and the White Racial Frame

1 The Origins of Race
2 Molding Race in the United States
3 The Systemic Structure of Racism
4 The White Racial Frame

PART II
The White Racial Frame and the Development of Educational Measurement

5 Heredity and Family Traits
6 The Birth of Tests of Mental Ability
7 The Rise of Educational Testing and Test Bias
8 The Rise of Statistics in Educational Measurement
9 Educational Measurement as Apparatus for Systemic Racism

PART III
Alternate Lenses for Educational Measurement

10 Critical Theory
11 Critical Race Theory and QuantCrit
12 Intersectionality Theory
13 Educational Measurement and the Pursuit of Racial Justice
14 Forging a Path Toward Anti-Racism in Educational Measurement

Index

Preface

The seed for this book was sown in 2015 while I was developing a seminar examining seminal publications that shaped the field of educational measurement. Having read Stephen J. Gould’s The Mismeasure of Man a decade earlier, I was aware of the racial bias that afflicted early efforts to quantify mental ability.1 But it wasn’t until I read the works of several pioneers in the field that I began to see how deeply rooted racialized and racist thinking was and how these early developments influence practices today.

My position as a middle-aged White man had allowed me to avoid considering the ways in which race and racism influenced my work and the development of my field. To address this void, I immersed myself in the literature on race and racism. I attended workshops and trainings. I developed a seminar focused on racism and research methodology. I made the decision to capitalize on my tenured White privilege to retire a 20-year research agenda focused on technology and assessment and instead concentrate on reckoning racism in the field of educational measurement in which I work.

My effort began by investigating why race is described as a social construct and trying to understand why that conception was important for framing educational measurement. Early readings and discussions deepened my understanding of the ways race was used as a tool for oppression. This recognition led to an understanding of the various forms and theories of racism. In turn, this work introduced the White Racial Frame and its role in sustaining systemic racism. Reflecting on the history of educational measurement, I questioned the ways in which the White Racial Frame influenced the field. Somewhere in this journey, I encountered QuantCrit. To understand it, I stepped back to learn about Critical Race Theory; and to understand the critical in Critical Race Theory, I stepped back further to Critical Theory.
Kimberlé Crenshaw, Gloria Anzaldúa, and Patricia Hill Collins introduced Intersectional Theory, which led me to anti-racism, Justice as Fairness, and finally Charles Mills’s conception of Rectificatory Justice.

This journey through literature, concepts, and conversations unveiled many ways in which the White Racial Frame has directed and continues to direct the field of educational measurement to function as apparatus within the system of racism. This journey also revealed provisional insight into ways alternate frames offered by Critical Theory, Critical Race Theory, QuantCrit, Intersectionality Theory, Justice as Fairness, and Rectificatory Justice might be applied to support educational measurement’s function as apparatus for an anti-racist endeavor.

The chapters of this book serve as pages in a folio populated since I initiated this project eight years ago. For those of you who began your journey years before me, I acknowledge this book may retread understandings already developed. For those of you who have either just begun your journey or may only be stepping onto the platform, I offer this book as a provisional guidebook.

I share this background, not as a celebration, but with dismay. Dismay that I allowed myself to be blind to this issue for so long; dismay that I work in a field that allowed me to remain blind to this issue; and dismay that the many people whose writing has shaped my recent understandings remain largely unacknowledged by the field of educational measurement.

Reckoning racism will not end in my lifetime. It is a long journey and a painful process. This book aims to unveil roles the White Racial Frame has played and continues to play in conscripting educational measurement to serve as apparatus for systemic racism. I write this book with the hope that the experiences, feelings, and understandings I have gained grappling with this topic will help others begin to see what the White Racial Frame blinds.

Notes on Terminology, Phrasing, and Conventions

Given the racist, patriarchal, sexist, and ableist views of many authors discussed in this book, readers will find some direct quotes offensive and potentially triggering. In the text I author, with a few exceptions described here, I employ language that is current with today’s terminology and phrasing.
When quoting other authors, I employ the exact language they employed. Inclusion of these direct quotes is necessary to understand the ideas held by many contributors to educational measurement and the influence these views had on their work. I do not use “[sic]” to indicate that the terminology employed by the quoted author is potentially offensive or no longer appropriate. I do use “[sic]” when a misspelling or other grammatical error exists in the original text. In a few places, I use terminology employed by another author when discussing their ideas and alternate phrasing might create confusion.

I use the term racialized rather than race to emphasize the social processes that position people into a group based on characteristics and traits that have been ascribed racialized meaning. By appending -ized or -izing to racial, I emphasize the active process of creating categories into which humans are membered. Throughout the text, I use the phrase membered similarly to the way in which Rogers Brubaker and Frederick Cooper use the term identification separately from identity. Those in power and who control the state have the “material and symbolic resources to impose the categories, classificatory schemes, and modes of social counting and accounting with which bureaucrats, judges, teachers, and doctors must work and to which nonstate actors must refer.”2 In this way, being membered is a process in which a person is placed or allowed entry by the State into a category that is defined by the State.

Throughout the text, I use the phrase membered not-White to refer to anyone whose racialized identity is anything other than White. Given the argument I develop in Chapters 2 and 3, I prefer this phrasing to minority/minoritized, person of color, or BIPOC because it both highlights the social process of membering people into racialized groups and reflects the denial of access to the advantages of whiteness. I use the phrase people membered White or people membered Black, etc., to emphasize that it is human beings/people that are placed into racialized groups.

In using the term member/membered/membering, I acknowledge that many people membered into racialized categories that are not-White embrace their identity as Black, African American, Latinx, Asian, Indigenous, Brown, and so on. Many people have also worked to reclaim their racialized identity and the narratives associated with that identity. My use of the term member/membered/membering is in no way intended to compete with or otherwise challenge such identification or projects aimed at reclaiming an identity. Rather, the analysis I present focuses on the ways in which social structures have worked to establish and, over time, mold racialized categories to produce, preserve, and, at times, extend advantage to people membered White through the oppression of people membered into not-White racialized categories.
I capitalize all names of racialized groups to emphasize that placing people into a racialized group is a political process, and that the name given to each racialized group is a formal title conferred to members of a racialized group. I include White as a formal racialized group given that people are accepted or denied membership to this group, which in turn provides or denies access to various forms of power, resources, and opportunity within our racialized society.

Notes

1 Gould (1996).
2 Brubaker and Cooper (2000).

References

Brubaker, R. & Cooper, F. (2000). Beyond “identity”. Theory and Society, 29(1), 1–47.
Gould, S.J. (1996). The Mismeasure of Man. WW Norton & Company.

Acknowledgements

Although I did not know it at the time, work on this book began eight years ago. Over that time, many people have contributed to the learning that informed the writing that became this book. I thank Liana Kish, my wife, for supporting this effort, listening as I worked through ideas, and providing feedback on drafts of this book. I thank Larry Ludlow, our department’s chair, for supporting and, at times, defending my work on this topic. I greatly appreciate the critical feedback provided on papers that evolved into sections of this book by Henry Braun, Eduardo Bonilla-Silva, Steve Sireci, Jennifer Randall, Emi Iwanti, Drew Gitomer, Josh Lederman, Mya Poe, and those who performed blind reviews. To Derek Briggs, Charlie DePascale, Nicole Garcia, Brooke Beaird, and Barry Goldman, I am grateful for your detailed and, in some cases extensive, comments, critiques, and suggestions on drafts of this book. I want to acknowledge the impact several scholars had on my understanding of race, racism, and critical social theories through their writing and, in some cases, conversations, among whom are Eduardo Bonilla-Silva, Nancy López, Tukufu Zuberi, Patricia Hill Collins, Ezekiel Dixon-Román, Dorothy Roberts, John Stanfield, Nell Irvin Painter, Michael Omi, Kimberlé Crenshaw, Howard Winant, Janet Helms, Charles Mills, Michelle Fine, Joe Feagin, Lisa Bowleg, Ibram Kendi, Ana Carastathis, Ian Haney López, Vivian May, David Gillborn, Greta Bauer, Paul Holland, Helen Neville, Jerry Rosiek, Jennifer Simms, Lee McBride, Lisa Spanierman, Nolen Cabrera, and George Dei. I express thanks to the recent cohorts of graduate students that worked through various ideas reflected in this book during seminars and office conversations. I thank Daniel Schwartz for agreeing to work with me to bring this work to publication. 
Finally, I thank Walt Haney, George Madaus, and Joe Pedulla for their mentorship during various stages of my career and for encouraging critical reflection on our field in order to identify opportunities to improve practice.

Key Terms and Definitions

Educational Measurement

The design, development, psychometric analysis, and application of instruments to collect evidence about cognitive, affective, and/or psychological constructs valued by an educational system, and the use of scores from these instruments to inform educational decisions for individual students and groups of students, and/or to examine the effects of educational programs, interventions, policies, and practices.

Race

A social construct that functions as a master scheme for categorizing people based on ocular corporeal characteristics that fundamentally organize society in a stratified, hierarchical manner with implied specious meanings of superiority and inferiority, and that has both material and social consequences manufactured by the distribution of power and resources.

Racialization

The process of creating racial categories.

Racial Formation

The extension of racial meaning to a previously racially unclassified relationship, social practice, or group.

Racialism

The shaping of ideas and actions in ways that use race to inform everyday thinking, decisions, and actions.

Racialist

An idea or action that is shaped or otherwise informed by racialism.

Racist

Ideas, thoughts, and actions of individuals as well as institutional policies and practices, laws, and regulations that function to produce harm, directly or indirectly, for person(s) membered into a nondominant racialized group.

Project

Any effort undertaken by an individual, group of individuals, organization, institution, or political body to advance ideas, policies, practices, regulations, and/or laws intended to inform and/or influence the functioning and evolution of a society.

Racial Project

A project that gives meaning to a racialized identity or that produces racial significance for a social structure.

Racist Project

A racial project that develops a new racialized category, maintains a racialized structure within society, and/or establishes a new racial structure.

Anti-Racist Project

A racial project that challenges, resists, or otherwise works to undo structures that produce advantage through oppression based on racially stratified categorization.

Apparatus

Initiatives, policies, practices, and/or actions undertaken by individuals, organizations, institutions, and governmental bodies to inform and influence the functioning and evolution of a society.

Apparatus for Oppression

An apparatus that is applied by an individual, institution, and/or a government agency (i.e., the State) in a way that oppresses one or more groups of people.

Apparatus for Systemic Racism

A form of apparatus that is adopted by and integrated into the system of racism to maintain or advance racialized ideology and advantage produced for dominant members of society through the racialized oppression of non-dominant members.

Apparatus for Anti-Racism

A form of apparatus that is applied to resist and/or undo racialized ideology and oppression.

Individual Racism

Thoughts, behaviors, and actions of individual members that are influenced by racialized ideology and which produce harm to individuals or groups of people who are members of nondominant racialized groups.

Institutional Racism

Policies and practices enacted within an organization or field of specialization that discriminate based on racialized categorizations and produce disparate outcomes across racialized groups for people working in the institution or who are served by the institution.

Structural Racism

Social, political, and economic arrangements that produce racial inequities through the interaction of institutional policies and practices, some of which alone do not produce inequities, but which when combined with other institutional policies and practices produce disparate outcomes across racialized groups of people.

Systemic Racism

Connects individual, institutional, and structural racism to sustain and, at times, increase power for the dominant elite and to provide unjust economic, political, and social advantage for people membered into the dominant racialized group through the oppression of nondominant racialized groups of people.

White Racial Frame

A racialized ideology developed and refined over time to justify and perpetuate systemic racism and its production of disparate outcomes across racialized groups of people.

Introduction

Breonna Taylor, Philando Castile, Freddie Gray, Michael Brown, Eric Garner, Trayvon Martin, Aaron Campbell, and too many more people killed over the past decade fissured blinders formed by a White Racial Frame.1 But it was the video of an officer membered White, kneeling on George Floyd, that unveiled a White Racial Frame that blinds too many of us to the systemic racism pervading our nation.

Since May 25, 2020, individuals and organizations across the United States have launched numerous self-education, awareness, and reflection activities. Reading groups emptied the shelves, putting Ibram X. Kendi’s How to Be an Antiracist, Ijeoma Oluo’s So You Want to Talk About Race, Robin DiAngelo’s White Fragility, and many more books on back-order. Demand spiked for workshops by the Racial Equity Institute, the People’s Institute for Survival and Beyond, Learning for Justice, Diversity Works, and dozens more. Poster boards dotted lawns; signs taped to office doors, T-shirts, lapel pins, buildings, and barn walls proclaimed Black Lives Matter. Hundreds of thousands of people marched.

Like most companies and organizations, the educational measurement community was prompted to rethink its values. This call began during Steve Sireci’s 2020 Presidential Address to the National Council on Measurement in Education (NCME), an organization to which many educational measurement specialists belong.
Sireci described the field of educational measurement as “an altruistic profession” and implored the educational measurement community to rethink its values.2 Responding to a growing outcry against educational testing, Sireci reasoned that “if we want the public to value educational tests, and if we want educational tests to have value in helping students learn, then we must establish professional values to support those goals.”3 Among the episodes in the history of educational measurement that have sown public mistrust are the (mis)uses of educational tests to channel students membered Black and Brown into special education tracks, a failure to provide a level playing field through standardization, and the disparate outcomes produced by selection and admissions tests. Sireci explicitly acknowledged the “dominant White culture that has permeated our field, and if continued, will prevent us from understanding and acknowledging the diversity of talent in our community.”4 And he wondered, “how can we sleep when we are being called out for supporting a system that perpetuates the obstruction of access to higher education for so many of our children who come from Black and Brown cultural groups?”5

Implicit in Sireci’s comments is recognition that racism has been and remains systemic in the United States and other regions of the world. Unacknowledged is the role that a dominant racialized ideology, termed the White Racial Frame, plays in sustaining systemic racism.6 With the exception of those whose mission explicitly combats racism, institutions operating in the United States are influenced by this White Racial Frame and, as a result, serve as apparatus for systemic racism.

DOI: 10.4324/9781003228141-1

The White Racial Frame: A Brief Introduction

Since its inception, the field of educational measurement has been and continues to be influenced by a single dominant frame. Core to this dominant frame is what sociologist Eduardo Bonilla-Silva terms racial ideology.7 Key features of this racial ideology are an understanding of race as a natural trait inherent to each individual, a hierarchical ordering of racialized groups that places the “White race” at the top, and narratives that assert that race is a key factor in influencing life outcomes. Sociologist Joe Feagin adds components to this racial ideology and terms this dominant frame the White Racial Frame.8 Produced and refined over centuries, the White Racial Frame is defined by several tenets. Among these tenets are a belief in the biological heredity of physical, mental, and psychological traits, universal laws of nature and society, and the scientific quantification of those laws.
The White Racial Frame embraces the idea that merit is earned by individuals and should be justly rewarded. It accepts a utilitarian form of justice in which net-benefit for society is maximized regardless of the distribution of that benefit or the harm inflicted on some people in or outside that society. It views social-economic-political structures first conceived in Europe as superior to those developed by other cultures and societies, and it views these structures as best suited for a just society. And it is the marriage of this last tenet with biological heredity that gives rise to the notion of whiteness and the superiority of culture, institutions, customs, norms, behaviors, beliefs, and ways of knowing associated with White northern-European heritage.

Historically, the tenets forming the White Racial Frame are the cloth that shades eyes to the monstrosity of slavery, the injustice of Jim Crow, and the systemic racism that still plagues the United States and the world beyond. In this way, the White Racial Frame provides cover for the disparate outcomes systemic racism produces. Put simply, it is the White Racial Frame that both justifies and perpetuates racism.

At first glance, the core tenets of the White Racial Frame and the systemic racism it enables may seem unrelated to educational measurement. What do European-based social-political-economic structures or a utilitarian form of justice have to do with measuring learning? How do universal laws of nature, biological heredity, and individual merit impact the study of school effectiveness? And how do these tenets lead educational measurement to function as apparatus for systemic racism?

To address these questions, this book explores how the White Racial Frame influenced early developments in the field of educational measurement and how these early developments influence practices today. Because the White Racial Frame operates as an ideological tool for systemic racism, this book also considers how the persistent influence of the White Racial Frame allows educational measurement to function as apparatus for systemic racism. Finally, this book considers implications that alternate frames (including Critical Theory, Critical Race Theory, Quant Critical Race Theory (QuantCrit), Intersectionality Theory, Justice as Fairness, and Rectificatory Justice) have for educational measurement. I argue that, when applied to educational measurement, these alternate frames hold potential to modify practices and applications of educational measurement in ways that can allow the institution to serve as apparatus for an anti-racist endeavor that challenges the current system of racism. But before diving into these topics, it is important to understand the relationships between power, oppression, and the White Racial Frame.

Power and Apparatus of Oppression

The relationship between power and oppression has been and continues to be the focus of attention for political scientists, philosophers, sociologists, and others interested in the production of disparate life outcomes. The analysis I present in this book is influenced considerably by the French philosopher Michel Foucault’s ideas regarding power and oppression. Foucault observes that most analyses of power apply one of two frames. The first frame, which Foucault refers to as sovereign power, views power as emanating from a single source that resides above the citizens of a society, with a sovereign leader and their court serving as that source. Eighteenth-century philosophers viewed sovereign-based social structures as a product of a (hypothetical) contractual arrangement in which power as an original right is given up by individuals forming a society through the establishment of sovereignty. Foucault observes that “a power so constituted risks becoming oppression whenever it over-extends itself, whenever—that is—it goes beyond the terms of the contract.”9 In this frame, power is conceived as “contract-power,” with oppression the product of transgression(s) of the limits of the contract. The second frame views power as a product of war or a war-like state. In this frame, the victors of war seize power and exert that power through their continued domination of people residing within society. In this way, oppression is a product of “a perpetual relationship of force” resulting from continual struggle to control the conquered and submission of the conquered to elements of that control.10 Foucault argues that, while these two frames were useful for examining power in societies that operated prior to the Enlightenment, the complex arrangements of modern societies require a more complex frame. Rather than conceiving power as emanating from above the people who form a society, Foucault’s third frame conceives of power “as something that circulates” and “functions in the form of a chain.” In this way, power “is never localized here or there, never in anybody’s hands” but instead “is employed and exercised through a net-like organization.” In such a system of power, all individuals are “always in a position of simultaneously undergoing and exercising power.” In this way, “individuals are the vehicles of power, not its points of application.”11 This conception of power arrangements recognizes that no one person or body is responsible for, or even able to create, a master plan. Rather, the various levels at which power resides and is executed lead to the production of an array of behaviors, practices, services, organizational structures, and institutions, some of which come and go. Those that are sustained and eventually embraced by an oppressive social system are those that support, and at times extend, the interests of the dominant group of people who operate the top echelon of society and the organizations and the institutions that operate within that society.
By focusing attention on the many levels at which power resides and operates, Foucault does “not mean in any way to minimise the importance and effectiveness of State power.”12 Although the State is seen as “superstructural in relation to a whole series of power networks,”13 Foucault believes that excessive insistence on its [the State] playing an exclusive role leads to the risk of overlooking all the mechanisms and effects of power which don’t pass directly via the State apparatus, yet often sustain the State more effectively than its own institutions, enlarging and maximizing its effectiveness.14 In Foucault’s words, one doesn’t have here a power which is wholly in the hands of one person who can exercise it alone and totally over the others. It’s a machine in which everyone is caught, those who exercise power just as much as those over whom it is exercised.15

Within Foucault’s conception of power arrangements in modern societies, the various policies, practices, behaviors, and knowledges developed, implemented, and maintained within families, communities, industries, institutions, and the State serve as apparatus of the machine. It is this third concept of power and oppression that explains how the White Racial Frame, and the individuals, institutions, policies, and practices influenced by it, function as apparatus for racialized oppression in U.S. society.

Educational Measurement as Apparatus of Oppression

Within Foucault’s third conception of power and oppression, not all apparatus have an oppressive effect when they are first introduced. Over time, however, an apparatus can be refined or repurposed in ways that become oppressive. As one example, consider the prison system. As Foucault recounts, the prison system was initially introduced as a vehicle for transforming those who violate the law into law-abiding citizens. In this way, prisons were intended to be an instrument similar to schools and hospitals which aim to improve the quality and health of humankind. However, very soon after the prison-as-reform project was in operation, it became obvious that rather than transforming prisoners into honest citizens, prisons were “serv[ing] only to manufacture new criminals and to drive existing criminals even deeper into criminality.”16 By this time, however, prisons were functioning as a source of employment. Further, it was realized that prisons could be repurposed to function as a tool to control the general population by serving as a form of deterrence to criminal behavior. In this way, their function shifted from one of reform to one of repression: repressing undesired behavior.
Over time the function of prisons interacted with other apparatus, including social support systems, voting regulations, and institutional employment policies, to operate as an apparatus of racialized oppression to deny access to social supports, alter political representation, deny voting rights, and decrease employment opportunities and the economic benefits that flow from employment.17 In this way, what was initially introduced as apparatus for transforming a subgroup of people was itself transformed into apparatus for racialized oppression.
Since the field’s inception, advances in educational measurement were introduced to improve the technology of educational measurement and, in turn, to provide social benefit. Over time, however, some of these advancements have been applied in ways that function as apparatus for systemic racism, limiting access to economic and social opportunities for people membered not-White and in turn advantaging people membered White. Among these functions are the reproduction of disparities through the use of tests that award opportunities based on individual merit absent full consideration of the social conditions that contribute to the production of test scores, the reproduction of deficit narratives that elevate cultural norms associated with people membered White and pathologize the culture, beliefs, and behaviors of people membered not-White, and the underestimation of the presence and causes of bias in test scores.

Directing Focus on Racism Rather than on Racist(s)

I want to make clear that the central argument made in this book is NOT that those who practice educational measurement are racists. As explored in greater detail in Chapter 3, the concepts of race, racism, racialism, and racist are complex and tightly entwined. While those among us who lay claim to being an overt racist are thankfully few, it is an unfortunate reality that one does not need to engage in racist activity to sustain or contribute to systemic racism. Eduardo Bonilla-Silva has dedicated his career to studying racism, and he makes clear that the apparatus that form the system of racism that operates within the United States and throughout the world would persist even if no people who are racist existed.18 The apparatus of racism developed and refined over the last four centuries operates today independent of the people who engage in racist activity. The apparatus of racism are now so deeply rooted in our social-political-economic system that systemic racism operates on its own. By simply following the laws, regulations, customs, and dominant cultural norms of U.S. society, the people residing in the United States sustain systemic racism and the disparate outcomes that systemic racism produces.

Educational Measurement: My Working Definition

Over the past century, the meaning of the term measurement has sparked debate among social scientists.
Joel Michell’s critical analysis of the history of psychological measurement documents the considerable attention psychologists directed to identifying the characteristics that make a measure a measure during the 1920s and 1930s.19 This debate receded shortly after psychologist Stanley S. Stevens wrote that “measurement, in the broadest sense, is defined as the assignment of numerals to objects or events according to rules.”20 More recently, Michell presented a much narrower definition of measurement that focuses on “the discovery or estimation of the ratio of a magnitude of a quantity to a unit of the same quantity.”21 Establishing a middle ground between Stevens’s broad conception and Michell’s strict definition, a task force charged by NCME to identify foundational competencies in educational measurement defines measurement as “a systematic process of data collection using instrumentation that results in a quantity supporting inferences about an attribute or property of an object, event, or phenomenon.” Applying the concept of measurement to education, the task force wrote that “educational measurement involves measurement of knowledge, skills, dispositions, and abilities for some educational purpose, such as supporting learning, certifying learning, or identifying policies and practices that improve learning.”22 The conception of educational measurement I use throughout this book is closest to that of the NCME task force. Specifically, I begin by defining educational measurement as the design, development, psychometric analysis, and application of instruments to collect evidence about cognitive, affective, and/or psychological constructs valued by our educational system. Here, the term instrument refers to educational tests, survey instruments, and other tools designed to systematically collect evidence supporting inferences about knowledge, skills, dispositions, or abilities that are of interest to educators and/or educational leaders. Expanding on the final clause of the task force’s definition, my definition of educational measurement also includes the use of scores from these instruments to inform educational decisions for individual students and groups of students, and to examine the effects of educational programs, interventions, policies, and practices. Because a variety of statistical methods may be used to estimate effects on learning as measured by tests and other scale scores, I consider the use of statistical methods to examine effects of educational programs, interventions, policies, and practices as a component of educational measurement. My embrace of this broad definition is influenced by at least two factors. First, this broad definition reflects both my doctoral training, which addressed educational research and measurement jointly, and the program in which I currently teach, which covers measurement, statistics, and assessment as a collective.
Second, my definition is influenced by The Standards for Educational and Psychological Testing, which include inferences and interpretations, as well as uses and consequences of use, as topics for consideration when examining validity. This broad definition of educational measurement permits the historical developments that influence practices employed today to include both the development of mental measures and educational tests, and the development and applications of statistical methods used to analyze scores produced by tests and other measurement instruments. Given this broad conception of educational measurement, I consider the field of educational measurement to comprise specialists who: design and develop instruments; conduct studies to examine the validity of inferences, interpretations, and uses of scores produced by these instruments; design and implement accountability, admissions, and other programs that use test scores as a source of evidence of student achievement, ability, readiness, placement, etc.; and/or employ scores provided by instruments to examine factors that impact educational outcomes. This broad set of specialists, as well as quantitative social scientists more generally, are the primary intended audience for this book. I acknowledge that some readers operate with a narrower definition of educational measurement that may consider various uses of scores produced by a measurement instrument and/or the consequences of those uses as facets of educational testing or educational assessment rather than aspects of educational measurement. In addition, some readers may consider statistical analyses of scores as quantitative educational research or quantitative social science rather than educational measurement. In such cases, I encourage readers to substitute their term of choice when engaging with sections of this book that focus on use of scores, consequences of use, and/or statistical methods employed to examine the impact of educational programs, interventions, policies, or practices.

Structure of This Book

This book is divided into three parts. Part I examines the construction of race, the subsequent (re)molding of race, and select theories of racism developed by leading sociologists since the mid-19th century. Because race and racism are under-theorized within the field of educational measurement, I invest considerable space detailing the creation of the concept of race and the molding of racialized categories over centuries to maintain advantage for people membered White. I also examine several theories of racism which were developed and took hold for periods of time over the last century. From these theories, I combine key ideas into a model of systemic racism that operates in the United States today. Part I concludes by introducing the White Racial Frame and describing its central role in the ideology that sustains systemic racism. In doing so, four tenets of the White Racial Frame are explored in detail, namely racialism, individualism, scientific discovery, and utilitarianism.
Some readers may wonder why I present such a detailed account of race, racism, and the White Racial Frame in a book focused on educational measurement. I do so because these issues are complex but have not garnered sufficient attention within the field. Drawing on my own experience over the past several years, it was not until I developed a deep and detailed understanding of race and racism that I could begin to see how my thinking and work have been unconsciously influenced by the White Racial Frame.
Part II examines specific ways in which the White Racial Frame influenced the early development of educational measurement. These developments include the use of family studies to document the heritability of mental traits and dispositions, the birth of tests of mental ability, the rise of educational testing, and the development of modern statistical methods. The analysis presented for each of these topics describes the role one or more facets of the White Racial Frame played in shaping each development. In addition, the influence each development has had on subsequent developments and practices in the field is considered. In limiting focus to only these four topics, I acknowledge that this analysis addresses only a small number of advances to theory and practice that occurred during and since the 75 years that are the focus of analysis. My goal in presenting this analysis is not to provide a comprehensive historiography of educational measurement. Rather, my aim in using these examples is threefold. First, these examples allow me to explore how a given frame, in this case the White Racial Frame, can influence developments in our field. Second, these examples serve to reveal how practices and beliefs developed over a century ago can influence practices today. And third, by revealing how one frame can influence past developments and current practices, I hope to open opportunities to explore how the adoption of alternate frames might influence future developments in the field.
Part III shifts focus to alternate frames that have emerged and hold potential to influence the future of educational measurement. Among the frames examined are Critical Theory, Critical Race Theory, Quantitative Critical Race Theory (a.k.a. QuantCrit), and Intersectionality Theory. This section also considers educational measurement’s past and future role in supporting social justice. Here, I argue that educational measurement has operated within a frame of justice that prioritizes utility and consider how a shift to Justice as Fairness or Rectificatory Justice might help rectify educational measurement’s role in the systemic racism that operates in the United States. The book ends by considering several actions the field of educational measurement might take to transform its function as apparatus for systemic racism to instead function as apparatus for an anti-racist endeavor.

Notes

1 See #SayTheirNames for an extended list of “Black lives stolen due to police brutality” since the early 2000s. https://www.saytheirnamesmemorials.com.
2 Sireci (2021), p. 1.
3 Sireci (2021), p. 1, italics in original.
4 Sireci (2021), p. 4.
5 Sireci (2021), p. 5.
6 Feagin (2013).
7 Bonilla-Silva and Baiocchi (2001).
8 Feagin (2013).
9 Foucault (1980), p. 91.
10 Foucault (1980), p. 92.
11 Foucault (1980); all quotes located on p. 98.
12 Foucault (1980), p. 72.
13 Foucault (1980), p. 122.
14 Foucault (1980), pp. 72–73.
15 Foucault (1980), p. 157.
16 Foucault (1980), p. 40.
17 Alexander (2012).
18 Bonilla-Silva (2018).
19 Michell (1999, 2014). See also Briggs (2022) who considers several different ways in which measurement has been defined.
20 Stevens (1946), p. 677. When presenting this definition, Stevens states that he is paraphrasing N.R. Campbell’s discussion of measurement in the Final Report (1940) of the Committee of the British Association for the Advancement of Science’s debate on the problem of measurement.
21 Michell (1997), p. 358.
22 Presidential Task Force (2023), p. 7.

References

Alexander, M. (2012). The New Jim Crow: Mass Incarceration in the Age of Colorblindness. The New Press.
Bonilla-Silva, E. (2018). Racism without Racists: Color-Blind Racism and the Persistence of Racial Inequality in the United States (5th ed.). Rowman & Littlefield Publishers.
Bonilla-Silva, E. & Baiocchi, G. (2001). Anything but racism: How sociologists limit the significance of racism. Race and Society, 4(2), 117–131.
Briggs, D.C. (2022). Historical and Conceptual Foundations of Measurement in the Human Sciences: Credos and Controversies. Routledge.
Feagin, J.R. (2013). The White Racial Frame: Centuries of Racial Framing and Counter-framing. Routledge.
Foucault, M. (1980). Power/Knowledge: Selected Interviews & Other Writings 1972–1977. Pantheon Books.
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88(3), 355–383.
Michell, J. (1999). Measurement in Psychology: A Critical History of a Methodological Concept (Vol. 53). Cambridge University Press.
Michell, J. (2014). An Introduction to the Logic of Psychological Measurement. Psychology Press.
Presidential Task Force. (2023). Foundational Competencies in Educational Measurement. National Council on Measurement in Education.
Sireci, S.G. (2021). NCME presidential address 2020: Valuing educational measurement. Educational Measurement: Issues and Practice, 40(1), 7–16.
Stevens, S.S. (1946). On the theory of scales of measurement. Science, 103(2684), 677–680.

Part I

Race, Racism, and the White Racial Frame

Race, thus, has always simultaneously involved struggles over meaning and struggles over resources … Understanding this history is crucial to any scholar exploring racial dynamics. Being clear on the “genealogy of the idea of race” ensures, that we are clear on “what is ‘race’ and what it is not.”1

DOI: 10.4324/9781003228141-2

Cord Whitaker, a professor of English, uses mirage as an analogy when examining race.2 As he describes, a mirage is both unreal yet real. A mirage is an illusion produced by the bending of light by water molecules suspended in air. The image of pond water on desert sand is unreal. But the light that is distorted emanates from a sky that is very much real. The same duality holds for race. Race is both unreal and yet real. As a scientifically backed construct, race is unreal: there is no biological or genetic evidence that supports the division of the human species into separate and distinct racialized categories.3 Yet, people have seen and continue to see and use race as a tool to provide advantage for people membered into a dominant racialized group through the oppression of people membered into nondominant racialized groups.
The lack of scientific backing for race is evidenced by the ways in which race, and the racialized stratification that flows from it, was and continues to be molded to maintain power. When first introduced as a biological term, race divided the human species into four categories based on geography and physical traits, what are called phenotypical characteristics. Over time, new racialized categories were added and some were removed. Depending on who was believed, there were as few as two races or dozens of racialized groups.4 When convenient, race was and still is molded to define whiteness. At times, it was molded to produce hierarchies within whiteness. Race has been expanded to embrace ethnic, cultural, and religious differences. It has been contracted to simply distinguish White from not-White. People in power modify race and apply it to maintain their economic, political, social, and cultural dominance.5 In an ever-evolving society, it is the unrealness of race that allows it to be bent to architect social structures that sustain power and produce the very real effects generated by the execution of that power.
When used by individuals, race allows them to see differences between racialized groups of people. Seeing race evokes stereotypes and deficit narratives engrained by society into each of us. We hear family members talk about people placed into different racialized groups and note differences in the adjectives, pronouns, and tone used. We see family members’ behavior shift in response to the racialized composition of those in close proximity. Images from television programs, news articles, and the history we learn tell us there are differences among people assigned to different racialized groups. We come to understand the coded phrases used by politicians, news anchors, talk radio, and, now, social media posts to reference members of racialized groups. Some within the dominant group act on racialized prejudice with intent to produce harm. It is these acts we are taught to define as “racist.” Naming these intended, racially targeted actions “racist” serves to protect the majority of the dominant group from accusations of “being racist” or engaging in racism. Requiring a “racist” to act with intent, to act on their prejudice knowing they will produce harm, permits all others to be the “good people,” who “don’t have a racist bone in their body.” The impacts individual racism has on people membered into nondominant racialized groups conjoins with the generational harm produced by racist policies built into our institutions and systems.
Within every institution, the same patterns occur: opportunities and outcomes are lower for people membered into nondominant groups; school suspension and expulsion rates are higher, while advanced coursework and admission rates are lower; financial institutions issue loans requiring higher interest rates; criminal conviction rates are higher and sentencing is longer; doctors are less prone to prescribe pain medication, complications during pregnancy and birth are higher, detection of disease occurs later, and health outcomes are lower. The web weaving these institutions into a socio-political-economic system exacerbates the disparate experiences and outcomes experienced by dominant and nondominant grouped members. Historic housing policies forced segregation and concentrated people membered into nondominant groups into densely populated regions. Policing practices applied more frequently in densely populated segregated communities produce higher stop rates. Higher stop rates produce more criminal charges. More charges create the impression of higher crime rates. The perception of higher crime rates justifies more policing. Coupled with higher conviction rates and longer sentences, mandatory sentencing for repeat offenders compounds, increasing imprisonment of people membered into nondominant groups. Prison records complicate employment, voting rights, and eligibility for housing and other social support programs. In this way, institutions are structured into a system that provides advantage for people membered into the dominant group through the oppression of people membered into nondominant groups.
Most people membered into the dominant group think of themselves as good. Many have strong religious convictions. They believe in equality. They value equal opportunity for all members of society. How is it, then, that dominant group members permit this system of oppression to operate? In Chapter 4, I argue it is the White Racial Frame that allows dominant group members to permit racialized oppression through a sleight of hand. Dominant group members value equality and equal opportunity, but are conditioned to believe outcomes are the product of individual effort. Outcomes are merited based on individual talent and hard work invested to develop those talents to their fullest potential. Good decisions lead to good outcomes, bad decisions to bad outcomes. There are other components of the White Racial Frame that shape educational measurement. But this belief in individualism, individual merit and agency, is the core of the White Racial Frame. It allows too many members of the dominant group to overlook a history that produced today’s ongoing segregation. It produces blindness to segregation’s designed production of disparate outcomes and instead shines light on the illusion that outcomes are merited by each individual’s effort (or lack thereof) to cultivate their natural talents. The relationship among race, racism, and the White Racial Frame is simple. But how it developed is complex and complicated. As unpacked in the next four chapters, the integrated influence of social, economic, and scientific developments produces complexity. The adherence to equality and equal opportunity applied today in an ahistorical manner creates complication.
As the quote that opens this section emphasizes, to understand how the White Racial Frame shapes educational measurement so that it functions as apparatus for systemic racism, it is critical to understand how race was born, how it is molded over time, how it is applied to produce and maintain power for the dominant racialized group, and how the unjustness of these applications is shielded by the White Racial Frame. I begin this unraveling by examining the birth of race as a scientific concept during a period in European history termed the Enlightenment. Next, I consider several ways in which the concept of race as a tool for defining subgroups of people has been molded by social, political, and legal forces. Next, I examine how race has and continues to be applied as a concept to produce various forms of racism: individual, institutional, structural, and systemic. This section ends by detailing the White Racial Frame and the ways in which this veiled ideology enables people membered into the dominant group to discount the disparities designed by social, political, and economic systems. Through these understandings, we will be positioned to focus a critical lens on the ways the White Racial Frame influenced early developments in educational measurement and how these developments allow educational measurement to function as apparatus for systemic racism.

Notes

1 Lewis et al. (2019), p. 30, quoting Golash-Boza (2016) p. 131, emphasis added.
2 Whitaker (2019), p. 5.
3 Roberts (2011).
4 Brace (2005) notes that the French physician Julien-Joseph Virey divided the human species into two species, the first of which included four races. Colonel Jean Baptiste Bory de Saint-Vincent identified 15 groupings of human beings. Louis-Antoine Desmoulins divided Homo sapiens into 25 races.
5 Kendi (2016); López (2006).

References

Brace, C.L. (2005). “Race” is a Four-Letter Word: The Genesis of the Concept. Oxford University Press.
Golash-Boza, T. (2016). A critical and comprehensive sociological theory of race and racism. Sociology of Race and Ethnicity, 2(2), 129–141.
Kendi, I.X. (2016). Stamped from the Beginning: The Definitive History of Racist Ideas in America. Nation Books.
Lewis, A.E., Hagerman, M.A. & Forman, T.A. (2019). The sociology of race and racism: Key concepts, contributions and debates. Equity and Excellence in Education, 52(1), 29–46.
López, I.H. (2006). White by Law: The Legal Construction of Race. NYU Press.
Roberts, D. (2011). Fatal Invention: How Science, Politics, and Big Business Re-Create Race in the Twenty-first Century. New Press.
Whitaker, C. (2019). Black Metaphors: How Modern Racism Emerged from Medieval Race-Thinking. University of Pennsylvania Press.

1

The Origins of Race

Race is a way of “making up people.”1

In the quote opening this chapter, race is recognized as a tool constructed by society to “make up” groups of people. Accompanying this social construction of groups are narratives “made up” to differentiate groups in “meaningful” ways. It is this “making up” of groups and associated narratives that makes race a social construction. Yet, for too many people the social construction of race is ignored, and the concept of race is instead reduced to the color of one’s skin. And because skin color is influenced by genetics, race is misunderstood as a biological trait. Over the past three years, my colleagues and I have engaged in a project that examines the understandings people hold about race and racism. As part of this project, we surveyed approximately 2,000 people across the nation, all of whom had completed high school in the United States. Participants were presented with a series of statements about race and racism and asked to indicate whether the statement reflected their understanding. A few statements focused on biology and included:

• “Because biology is an important factor that affects life outcomes, there is not much we can do to improve the situation for people who are Black.”
• “Racism is the product of biological differences.”
• “Natural biological differences between people who are Black and those who are White explain why so many of our best athletes are Black and our top business leaders are White.”
• “The inequities that exist among people of different races are largely the result of natural biological differences.”

Many readers will find these statements problematic and offensive. Yet, 30% to 40% of respondents indicated that these statements were consistent with their understanding of race and racism.

DOI: 10.4324/9781003228141-3

Dorothy Roberts is a professor of American sociology and law whose research focuses on the intersection of the science of medicine, race, and politics. Her book, Fatal Invention, examines recent efforts to link genetics to race in order to produce race-based pharmaceutical and medical interventions. Drawing on the vast body of genetics research conducted as part of the Human Genome Project, Roberts evidences no scientific foundation for a biological definition of race. There is no race gene. No genetic markers. No genetic test for White or Black, Asian, or any other racialized categories. In fact, Roberts points to genetic analyses that show greater variation among people with recent African heritage and among people with recent European heritage than there is variation between these two groups of people. That is to say, there is nothing in the genetic structure of human beings that supports categorizing people into racialized groups based on biology. Instead, Roberts argues, “race itself is an invented political grouping. Race is not a biological category that is politically charged. It is a political category that has been disguised as a biological one.”2 But if science does not back race as biological, how did this idea become so engrained in our thinking?

The Origins of Race

The Merriam-Webster Dictionary provides three entries for the word race. One defines race as a competition between people, animals, vehicles, etc., to determine which is faster, to win something, or to do something first. The second definition applies this same notion when race is used as a verb—to compete in a race or to move forward at great speed. The third addresses race when used to categorize members of a species into subgroups. The first two uses of the term derive from the word ràs, which the Norse used to describe rapid forward movement and which Old English expanded to reference a contest of speed.
This use dates as far back as the 1300s. The third use is believed to stem from the Italian term razza, which was used to describe objects with common characteristics. The term razza appears in Marco Polo’s travel narratives in the late 1200s, in which he described people living in different regions. Etymologists trace this third use in English to the 1500s, when race was used to describe wines with a characteristic flavor. Over time, race was also used to reference a group of people, such as those with a common occupation or who were of a specific generation. Shakespeare used the term race in Macbeth when describing “Duncan’s horses—a thing most strange and certain— Beauteous and swift, the minions of their race.”3 Gradually, use of the term race shifted to reference a tribe, nation, members of a distinct familial lineage, or people of a common stock, particularly those of nobility.4 This use is seen in 1572 in reference to “the English race.”5 Used in these ways, race

referred simply to lineage rather than fixed physical traits associated with broadly categorized groups of people.6 Reviewing 11 editions of the Dictionary of English Language published between 1756 and 1799, Bronwen Douglas, a historian of archeology and anthropology, notes that this genealogical definition of race remained largely unchanged. Douglas also notes similar meaning and consistency in definitions in the first through fifth editions of the French Dictionnaire de l’Académie published between 1694 and 1798. In the sixth edition, published in 1835, a notable change occurs: race was now defined as “a multitude of men who originate from the same country, and resemble each other by facial traits and by exterior conformity. The Caucasian race. The Mongol race. The Malay race.” At about the same time, a similar use appears in English in which race was defined as “any of the major groupings of mankind, having in common distinct physical features or having a similar ethnic background.”7 Since then, race has been and continues to be used in a way that associates the term with the stratification of people based on (specious) biological characteristics. Given the relative stability in the definition of race in both England and France throughout the 18th century, why did the meaning of race suddenly shift during the early years of the 19th century? And how did examples of its use contract from applying to a variety of vines or creatures to a narrow focus on “varieties” of humans?

Race and the Enlightenment

The biological concept of race is a product of the Enlightenment.8 Spanning the late 1600s into the early 1800s, the philosophical and scientific developments that occurred in Europe during this period had a profound influence on economic and political structures then and now. 
Prior to the Enlightenment, people living in Europe relied on religious revelation to explain their world—all flora, fauna, and natural events were understood as parts of God’s plan. During the Enlightenment, trust in human reason and evidence provided through human senses supplanted religious revelation as a vehicle for understanding the natural world.9 Among the many interests explored during this Age of Reason was natural history. At the time, natural history focused largely on cataloging and giving order to the earth’s flora and fauna. It was this effort to give structure to the living elements that birthed the biological concept of race. In his exploration of the genesis of race, Loring Brace, a late professor of anthropology, identifies three factors that he believes contributed to the creation of biological races during the Enlightenment.10 First, Brace notes that European world exploration just prior to and during the Enlightenment differed notably from travel that occurred previously. During the Middle Ages,

people rarely traveled more than 25 miles from their homes. Those who did—most famously Italy’s Marco Polo (1254–1324) and Morocco’s Ibn Battutah (1304–1368)—traveled relatively short distances each day, rarely covering more than 25–50 miles. While their travels made contact with people whose phenotypical features and customs varied widely, Brace speculates that the short movements each day exposed travelers gradually to these variations. This gradual movement through a varied landscape of pantones, physical features, and customs trivialized these variations. Although written records reveal physical characteristics were noted—Polo often detailed the pantone of the people he encountered during his travels—people were not separated into different groups based on these physical differences.11 Just prior to and during the Enlightenment, advances in seafaring vessels and navigation techniques allowed travelers to cover vast distances before contacting people residing in lands distant from Europe.12 Brace posits that such travel obscured the subtle shifts in the physical characteristics of people living in relatively close proximity. Instead, these expansive voyages created impressions of dramatic differences among people living in distant geographic regions of the world.

A second factor that contributed to race as a biological category is the Great Chain of Being, a creationist concept that lingered from the pre-Enlightenment era. The Great Chain of Being draws on the belief that God is the creator of all things and that the world consists of “fixed and separate species whose perfect representations were to be found only in the mind of God.” As such, the world is understood as being arranged hierarchically “in a series of steps running from God at the top down through the various entities of the living world to the inorganic—‘base’ metals—at the bottom.”13 It was this order that defined the scala naturae. 
Made in the image of God’s son, humans were seen as one step below God, yet above all other creatures which God created to serve humankind. The Great Chain of Being established belief in distinct nonoverlapping categories of flora and fauna. Growing interest in applying scientific observation and reason to explain the natural world is the third factor that contributed to the birth of race as a biological concept. A first step in explaining the world was documenting the elements that comprise the world. Early natural historians invested considerable energy in observing and describing natural beings, hoping their descriptions captured the characteristics that distinguished one type of being from all others.14 Given the great number and wide variety of flora and fauna that exists across the earth, a system was needed to organize the documentation of organic life. In 1735, Carl Linnaeus introduced just such a system.15 In the first of his 12 editions of Systema Naturae [System of Nature], the Swedish botanist presented a system for classifying and structuring all known flora and fauna.16 Wedded to the Great Chain of Being, Linnaeus’s classification system mirrored the structure of the scala naturae

and “arranged the living world into named units in descending order of increasing distinctiveness,” starting with class, then proceeding through order, genus, species, and variety. Linnaeus applied his system to classify and arrange more than 12,000 plants and animals into an ordered structure. In his first edition, published in 1735, Linnaeus placed human beings within the order Anthropomorpha under the class Quadrupeds. In doing so Linnaeus was the first to classify human beings as part of the animal kingdom. His tenth edition, published in 1758, renamed the class mammals and the order primates, and classified humans as primates accompanying apes, monkeys, and bats. Collectively, these factors—distant travel, the Great Chain of Being, and observation and reason as tools of science—worked together to influence Linnaeus’s formation of the System of Nature. Linnaeus’s system replicated the hierarchical, nonoverlapping structure defined in the Great Chain of Being. He relied on observed differences to distinguish among species. And the radically different appearance of people residing in lands separated by great distances revealed differences that required explanation. Varieties of human beings were the explanation Linnaeus provided. In addition to placing human beings within the order Primate, Linnaeus’s first nine editions of Systema Naturae also separated human beings into four distinct varieties. Linnaeus termed his varieties Europaeus (Europe), Asiaticus (Asia), Americanus (America), and Afer (Africa). In these nine editions, Linnaeus included only one descriptor that he believed distinguished among the varieties, namely color. To each of the four varieties of humans he appended the name of a color—Europaeus albus (white), Americanus rubescens (reddish), Asiaticus fuscus (tawny), and Africanus niger (black). Linnaeus’s tenth edition, published in 1758, included two notable changes. 
First, Linnaeus added two new varieties, Monstrosus (Monsters) and Ferus (Wild). Among the Monstrosus he placed the Hottentots residing in what is now South Africa, the so-called Patagonian giants, and alleged “dwarfs” of the Alps. The Ferus included “wild children” from various regions of the world. Second, and most importantly, Linnaeus expanded his descriptions to include five characteristics associated with each of the four major varieties. These characteristics focused on skin color, features of the head, behavior/disposition, clothing, and form of government.17 Linnaeus did not assign a hierarchical structure to his variations of humans—Ferus was listed first, followed by Americanus, Europaeus, Asiaticus, Afer, and then Monstrosus. His descriptors, however, clearly indicated a discursive ordering. Europaeus was described as “light, wise and inventive.” In contrast, Asiaticus was “stern, haughty and greedy”; Africanus was “sly, sluggish and neglectful.” It is interesting to note that Linnaeus’s description of Americanus—“unyielding, cheerful and free”—stands in stark contrast to

the characterizations put forward by conquistadors, colonists, and early citizens of the United States to justify the holocaust of people indigenous to the American continents. The denigration of people residing in Asia and Africa explicit in Linnaeus’s descriptors laid the foundation for racist characterizations of people who did not meet the criteria for White. Linnaeus, however, did not introduce the term race as a category of people. In none of his editions did he use the term race. And, while his classification system clearly separated humans into distinct groups, he did not view these distinctions as meaningful from a biological perspective. He viewed the varieties as members of the same species and, while he noted differences in traits among his varieties, he did not view these varieties as subspecies—all varieties were consistently classified as the same species. Writing in 1737, Linnaeus stated clearly this belief:

[God] created one human, as the Holy Scripture teaches; but if the slightest trait [difference] was sufficient, there would easily stick out thousands of different species of man: they display, namely, white, red, black and grey hair; white, rosy, tawny and black faces; straight, stubby, crooked, flattened, and aquiline noses; among them we find giants and pygmies, fat and skinny people, erect, humpy, brittle, and lame people etc. etc. But who with a sane mind would be so frivolous as to call these distinct species?18

As we will see shortly, it was his contemporary Georges-Louis Leclerc, better known as Comte de Buffon, who turned Linnaeus’s varieties into race.

Flies in the Ointment

The role Linnaeus plays in producing the biological concept of race is recounted by the many authors who examine race. Although Linnaeus’s classification system set the cogs in motion, three aspects of the aforementioned narrative overstate the novelty of Linnaeus’s contribution and the role world travel plays in it. 
The first wrinkle pertains to the denigration of people membered Black, explicit in Linnaeus’s 1758 descriptors. As documented by Whitaker, negative associations with the word black and depictions of people membered Black extend back much further than the Enlightenment. Although Whitaker’s analysis does not find evidence that the term race was used to differentiate among groups of people based on phenotypical characteristics, he traces the associating of black and white with impurity and purity back to at least the 13th century. At the time, white European Christians were suffering losses in their Crusades to convert “heathens” to Christianity. Of particular

concern was the loss of the Levant of Acre in 1291, which was seized by an Arab army of Islamic faith. Acre was viewed as the gateway to Jerusalem for the Christian European conquest, and its loss marked a slow end to the Crusades. The King of Tars, authored circa 1330, draws on the conflict at Acre. In this tale, further conflict is avoided when a White Christian princess agrees to marry a dark-skinned sultan. During the ceremony, the sultan is to be baptized. Having been called to God by the priest, the sultan’s skin turns from black to white, and at that moment the sultan is said to recognize God as almighty. Whitaker’s translation recounts the denigration of black skin and glorification of white that existed as far back as the early 14th century: “His skin, that was black and hateful, became all white through God’s grace, and bright without blemish and when the sultan saw that sight, then he believed in God almighty.”19 Whitaker’s analysis traces the use of similar imagery that associates blackness with impurity and whiteness as pure in literature produced during the late Middle Ages and argues that these metaphors helped set the stage for the denigration of people membered Black during the Enlightenment and up to this day. Whitaker’s analysis of the negative connotations associated with the word black is supported by the meanings given to the word black prior to the Enlightenment. 
As the late Winthrop Jordan, a professor of history, describes, the meaning of black before the sixteenth century included, “Deeply stained with dirt; soiled, dirty, foul … Having dark or deadly purposes, malignant; pertaining to or involving death, deadly; baneful, disastrous, sinister … Foul, iniquitous, atrocious, horrible, wicked” … Embedded in the concept of blackness was its direct opposite—whiteness … White and black connoted purity and filthiness, virginity and sin, virtue and baseness, beauty and ugliness, beneficence and evil, God and the devil.20 Jordan goes on to describe the special significance whiteness had in defining the beauty of females, noting the heavy use of cosmetics to whiten already-white skin. A second wrinkle in the common narrative regarding the origins of race is presented by Nell Irvin Painter, a professor of American history. In her analysis, titled The History of White People, Painter details the role that the color white played in defining beauty in the 16th, 17th, and 18th centuries. As an example, she documents the influence Johann Winckelmann, known as the father of art history, had on European conceptions of beauty as whiteness just prior to Linnaeus’s 10th edition. Winckelmann defined “the Greek profile [as] the first character of great beauty.”21 Having studied replicas of various ancient Greek sculptures, unknowingly cast a much purer

white than the Greek originals, Winckelmann also emphasized the value white skin plays in establishing the beauty of bodies depicted in art. The centuries-old negative connotations associated with the word black conditioned Linnaeus and Enlightenment scientists that followed to perceive the dark skin of people residing in Africa in similarly negative terms. Likewise, the glorification of whiteness, particularly in artistic forms, conditioned Enlightenment thinkers to perceive the light skin of people residing in Europe as superior.22 As we will see shortly, adoration for the Greek facial form similarly conditioned these thinkers to believe that levels of intelligence declined as facial forms deviate from the Greek ideal.

François Bernier creates a third wrinkle to the credit given to Linnaeus for introducing a racial frame to the field of science.23 Bernier was a French physician and world traveler who lived in South Asia for 12 years, during which he served as the practicing physician for a Mughal ruler in Agra (India).24 In 1684, Bernier authored a brief 13-paragraph essay, titled A New Division of the Earth, According to the Different Species or Races of Men Who Inhabit It. The opening sentence noted that, until his publication, geographers had “divided the earth according to its different countries and regions.” Bernier proposed a different approach that divided the earth into “four or five” sections based on the “species or race of men” residing in each region.25 Over five paragraphs, Bernier identifies four distinct “races of men” that included those residing in what we now term sub-Saharan Africa, East and Southeast Asia, the northern regions of Finland known as Lapland, and a hodge-podge region spanning Europe (except a part of Muscovy [Moscow]), the Middle East, parts of Northern Africa, and much of South Asia. 
His justification for separating sub-Saharan Africa rested largely on skin color, facial structures, and characteristics of people’s hair. The physical characteristics of the people residing in Lapland were deemed sufficiently different from those in other regions of Europe to justify forming a separate racialized grouping. Curiously, for the remaining two groups, Bernier noted variation in the pantone of skin among people residing in these regions, but he attributed those variations to exposure to the sun. He believed that limited sun exposure would reveal common coloration that was “truly white.” What separated the Asia–Southeast Asian group were broad shoulders and facial features, including “three hairs of beard.” Bernier also invested a paragraph describing people residing in the American continents. Although he observed some differences in physical traits, the differences were not “so large as to warrant making them a special type distinct from our own [the hodge-podge].”26 Bernier’s view of people residing in Lapland was clearly negative— describing the Lapps as “very ugly” and “wretched animals.” His descriptions of people residing in Africa and East Asia were generally neutral,

although for each, at least one description creates offense—the hair of people residing in Africa was compared to that of dogs, and the eyes of people residing in Asia were compared to those of pigs. Although one can infer an ordering to the four groups, the second half of Bernier’s essay clearly indicates he saw beauty within each of his racialized groups. In fact, the final seven paragraphs of his essay focus specifically on the beauty of the women residing in each region—an exhibition of gross sexist objectification that, perhaps, has more to say about the patriarchal sexism of the time than about racialized thinking. It is unclear how widely Bernier’s essay was read. Its use of the term race and the separation of the earth into distinct racialized regions clearly evidences seeds for a racialized hierarchy germinating long before Linnaeus, Buffon, and the “scientists” that followed positioned white Europeans second only to God in the Great Chain of Being.

Buffon and the Degeneration Hypothesis

In 1684, Bernier divided the earth into four regions based on the characteristics of people residing in each region. Four decades later, Linnaeus divided humans, first into four varieties, then into six. Shortly thereafter, Georges-Louis Leclerc, later known as Comte de Buffon, imposed a hierarchy on those varieties, placing white Europeans at the top. Born into a wealthy French family, Buffon pursued a variety of intellectual interests including mathematics, cosmology, and natural history. It was his contributions as a naturalist, however, for which he is best known. In 1749, Buffon published the first of what became 36 volumes of Histoire Naturelle (Natural History), a monumental effort to provide a rational explanation for the entirety of natural history.27 In it, Buffon was critical of two aspects of Linnaeus’s work. 
First, Buffon believed Linnaeus’s classification system had made “the language of science more difficult than science itself” and argued that Linnaeus’s system merely classified natural objects without any insight into how they came to be. Foreshadowing Dorothy Roberts’s conception of race as a social construct applied for political purposes, Buffon observed that “in nature there are only individuals: genera, orders, and classes exist only in our imagination.” The names Linnaeus gave to natural objects were merely “human creations for the sake of human convenience.”28 Buffon’s second critique addressed Linnaeus’s belief in varieties within a species. Here, Buffon makes perhaps the most significant contribution to the biological conception of race. Linnaeus’s classification system was founded on his belief in distinct categories of living organisms. Each category was defined by a set of permanent characteristics that were common across all members of that category. Although there may be variations among individual members of the category, the defining characteristics are both common and permanent.

Buffon, however, recognized substantial variation among individuals within a group. In most cases, a given variation does not carry forward with consistency into future generations. As an example, Buffon recognized that traits characteristic of an “albino,” “dwarfs,” or “giants” were “a kind of disease” or “accidental varieties, not as permanent differences able to be produced by stable races.”29 On occasion, however, a variation persists, is passed generationally, and becomes a genealogical trait. As a set of variations becomes a stable characteristic of a subgroup within a species, Buffon believed a race within that species is formed. As Claude-Olivier Doron, professor of history and philosophy of science, explains, Buffon’s conception of race was motivated by his belief in a single origin or genesis of human beings—what is termed monogenesis.30 To account for the differences in physical characteristics of people residing in different regions of the world, Buffon required a logical explanation for how such differences developed if all human beings had the same origin. Race, understood as variations that become stable traits passed down genealogically within a subgroup within a species, was Buffon’s solution.31 Buffon’s conception of race allowed multiple subgroups, each with a set of distinguishing characteristics, to exist within a species.32 For a given race, the distinguishing characteristics were the product of individual variation that eventually became a shared trait among all members of the race. Given that the shared characteristics defining a race are the product of variation, Buffon’s theory raised the question: prior to the formation of a new race, what characteristics were shared across members of the species? It is here that Buffon’s thinking produced a lasting impact on racialized thinking that persists to this day. Buffon believed white Europeans were the origin. 
From white Europeans, degenerative variations were passed genealogically through generations and became stable traits within subgroups of the human species. By becoming a set of stable degenerative traits within a subgroup, a new race of human beings was formed. In addition to setting course for racial science, Buffon defined the term species as “a constant succession of similar individuals that can reproduce together.”33 Although both the genealogical conception of race and species required passing of stable traits down through generations, requiring ongoing reproduction within a species allowed for reproduction between races. Buffon’s definition, then, enabled human beings to be separated into races while also maintaining common origin. Before we move from Buffon to the variety of racial categorizations that followed, one final note is required. Most texts that describe Buffon’s contributions to the biological concept of race claim he defined six categories of race. Depending on who is consulted, however, the names of these categories vary. Moreover, closer reads of Buffon’s works reveal that when discussing what some interpret as a specific racial category, he then lists several

additional racial categories. As an example, he separates “Nubians” from “Ethiopians” and “Hottentots” from “Negroes.”34 Given this confusion, and the absence of a clearly presented classification system within Buffon’s works, I refrain from making a definitive statement about the number of races Buffon conceived or the names assigned to them. It is clear, however, that Buffon believed the species Homo sapiens comprised multiple races, which degenerated from what he believed was the original and superior race of white Europeans.

Genesis of the Human Race

Buffon developed his degeneration theory to explain how dramatic differences in physical characteristics of people residing in different parts of the world came to be, given that there was a single source of all human life. His monogenistic framing of the problem also required him to identify the original source: white Europe. To explain variation in physical characteristics, other Enlightenment intellectual leaders offered an alternate explanation: polygenism. Rather than a single source of human life, polygenism maintained multiple origins, each occurring in different regions of the earth. Among the most well-known polygenists were Voltaire, David Hume, Christoph Meiners, and Georges Cuvier. Although Darwin’s theory of evolution gradually put to rest polygenist thinking, the written works produced by these and other scholars endorsing polygenist logics had a lasting impact on the negative characterization and deficit narratives about people membered Black that persist to this day. 
Voltaire questioned whether people residing in Africa “descended from monkeys or whether the monkeys from them.”35 In his 1785 Outline of the History of Humanity, the German philosopher Christoph Meiners divided humans into two races, “the beautiful White race” and the “ugly Black race.” And Hume added and later revised a statement to his essay “Of National Characters,” in which he described “negroes and in general all other species of men (for there are four or five different kinds) to be naturally inferior to the whites” and specifically noted the absence of “any symptoms of ingenuity” in “negroe slaves dispersed all over Europe.”36 Georges Cuvier, a French naturalist whose work provided the foundation for comparative anatomy and paleontology, believed there were three races: Caucasian, Mongolian, and Ethiopian. Building on the Dutch physician Petrus Camper’s efforts to apply measures of facial angle to distinguish among his racial groups, Cuvier undertook extensive study of human skulls taken from different regions of the earth. Most notably, Camper developed a method for measuring facial angle.37 Finding that “the proportion of the cranium to the face [cranio-facial ratio], the projection of the muzzle [facial angle], the breadth of the cheekbones, [and] the shape of the eye-sockets”

differed among his three races, Cuvier concluded that these differences in skull structure accounted for differences in the “moral and intellectual faculties” among his three races.38 In addition to establishing the Caucasian as the intellectually and morally superior of his three races, Cuvier’s linking of features of the skull with intellect paved the way for the field of phrenology that blossomed in the mid-1800s and, as explored in Chapter 6, was a precursor for tests of intelligence.

The Haunting of Blumenbach’s Skulls

Cuvier’s development and application of comparative anatomy to classify fauna extended Johann Friedrich Blumenbach’s analysis of human skulls to classify human beings into distinct and separate races. A German scholar whose interests spanned medicine, physiology, and anthropology, Blumenbach subscribed to Buffon’s degenerative theory of monogenism. As a newly appointed professor at the University of Göttingen, Blumenbach attended a lecture during which Petrus Camper described his use of facial angle measures to examine differences among the skulls of humans membered into different races.39 Although Blumenbach gradually grew skeptical of the facial angle method, Camper’s analyses of skulls to document differences among humans inspired him to begin collecting skulls for similar analyses. Over time, Blumenbach’s collection included 245 skulls seized from various regions of the world. With these, Blumenbach developed a variety of measures including the size of the forehead, eye socket, jawbone, and the angle of teeth, jawbone, and nasal bone, as well as Camper’s facial angle. These many measures were combined to form what he termed the norma verticalis, which presented a view of the skull from above.40 From this view, a line drawn across the highest point of the skull (maxillary level) allows the protrusion of the forehead to be compared with that of the face. 
Blumenbach believed differences between these two protrusions varied systematically among races.41 Combining his norma verticalis with skin color, Blumenbach defined first four and then five categories of race: Negro, Mongolian, Malay, American Indian, and Caucasian.42 It was the name given to this last category—Caucasian— that is the most lasting impact Blumenbach had on the categorization of race. Adhering to the theory of degeneration, Blumenbach’s ordering of his racial categories positioned Caucasians first. Reflecting the cosmetic value of pure-white skin accented by rosy red cheeks in vogue at the time, Blumenbach wrote, The white colour holds the first place, such as is that of most European peoples. The redness of the cheeks in this [Caucasian] variety is almost peculiar to it: at all events it is but seldom to be seen in the rest.43

Blumenbach opted to term the white European race Caucasian because he believed modern humans originated in the Caucasus Mountains, an area in what is now the nation of Georgia.44 It is in the Caucasus Mountains that Noah’s ark was thought to land, and it is the people (specifically the women) residing in the Caucasus Mountains who were known at the time as possessing the greatest beauty. As Blumenbach describes:

I have taken the name of this variety from Mount Caucasus, both because its neighborhood, and especially its southern slopes, produces the most beautiful race of men, I mean the Georgian; and because all physiological reasons converge to this, that in that region, if anywhere, it seems we ought with the greatest probability to place the autochthones of mankind. For in the first place, that stock displays … the most beautiful form of the skull, from which, as from a mean and primeval type, the others diverge by most easy gradations on both sides … white … we may fairly assume to have been the primitive colour of mankind.45

Although Blumenbach clearly defined five races, his views on variation among human beings were similar to those of Linnaeus. Both men noted considerable variation in physical traits within and across their racialized categories. 
For example, Blumenbach understood that infant skulls in other civilizations had been purposefully reshaped by constriction and that the shape of German skulls was influenced by “keeping infants on their backs with their heads usually flat against a firm surface.”46 Blumenbach also recognized the arbitrariness of his categorizations, writing, “it is very clear they are all related, or only differ from each other in degree [yet] even among these arbitrary kinds of divisions, one is said to be better and preferable to another.”47 Despite holding what is believed to be the largest collection of skulls at the time, Blumenbach also warned against drawing conclusions about a whole group of people based on a small set of observations.48 Blumenbach gave hierarchy to his racialized categories of human beings. But he also recognized that slavery, lack of access to education by those enslaved, and the resources availed to elites in European society all played roles in creating an image of European superiority. Blumenbach’s few interactions with people of African heritage led him to conclude “in respect of their natural mental capacity and abilities, [they] certainly do not appear inferior to the other human races.”49 Despite these observations, Blumenbach’s arbitrary categorizations, particularly that of Caucasian, and his measures of the human skull had lasting impacts on the racialization of human beings.

The Supremacy of the White Caucasian Race

In 1444, 235 people captured in Africa were unwillingly brought to the port of Lagos, Portugal. While observing this event, Prince Henry’s chronicler Gomes Eanes de Azurara (Zurara) described the enslaved people of Africa in both humane and inhumane ways. Zurara first chronicles the agony these enslaved people surely felt as they disembarked the ship:

And now these Moors, because of the long time we have been at sea; as well as for the great sorrow that you must consider they have at heart, at seeing themselves away from the land of their birth, and placed in captivity … and moreover because they have not been accustomed to a life on shipboard—for all these reasons are poorly and out of condition.50

This compassion, however, gives way to conditioning produced by the then-current meaning of black:

For amongst them were some white enough, fair to look upon, and well proportioned; others were less white like mulattoes; others again were as black as Ethiops, and so ugly, both in features and in body, as almost to appear (to those who saw them) the images of a lower hemisphere.51

In this description we see both the variation in pantones that existed among the enslaved humans and the negative description given to those with the darkest complexions. The negative characteristics associated with the word black persisted through the Enlightenment and similarly influenced negative portrayals of people with skin of dark pantones. The essays and books penned by Bernier, Linnaeus, Buffon, Camper, Blumenbach, Cuvier, Meiners, Voltaire, Kant, and many other Enlightenment scholars all contain sections that elevate whiteness and denigrate blackness, some with restraint and others without. However, it is a stretch too far to attribute the shift in the meaning of race that occurred during the Enlightenment and the accompanying denigration of not-White races to the meaning of black.
A more reasonable explanation for this shift is the African slave trade, the early phase of which Zurara recounts. The history of the African slave trade is too long and complex to summarize here. What is well documented, however, is the speed with which it grew and the economic dependency on cheap labor it produced, particularly in the lands taken in the American continents by European conquistadors, plantation owners, and colonial settlers. The economic prosperity produced by enslaved forced labor required moral justification. Fabricating hierarchy in racial categorizations provided Europeans and colonialists a rationale for enslavement.

Examining the writings produced by Enlightenment philosophers, Emmanuel Chukwudi Eze unveils the white supremacist thinking that emerged as the African slave trade reached its highest volumes. In the writings of Kant, Hume, and Hegel, “‘reason’ and ‘civilization’ became almost synonymous with ‘white’ people and northern Europe, while unreason and savagery were conveniently located among the non-whites, the ‘black,’ the ‘red,’ the ‘yellow,’ outside Europe.”52 Focusing specifically on Kant, Eze notes that the largest portion of his career explored anthropology and physical geography. Kant’s interest in the twin sciences is evidenced in his teaching—72 courses on anthropology and geography as compared to 54 on logic, 49 on metaphysics, and 28 on moral philosophy. Kant wrote voluminously about race, publishing at least five extended essays and two books that presented his racial theories. It was Kant who first described race as a “fixed natural entity.”53 Hume went further than his contemporary philosophers, “hitching superiority to complexion.”54 In doing so, Hume ignored empirical evidence available to him that falsified his assertion of white European superiority, dismissing the poems and other writings of Francis Williams, a former Jamaican plantation slave, as mere parroting. In doing so, Hume ignored Williams’s graduation from Cambridge University, his work as an accomplished Latin and mathematics teacher, and his public refutation of Hume’s characterizations of his intellect. Hume similarly ignored the academic accomplishments of Jacques-Elisa-Jean Capitein, a young man from West Africa who mastered several languages while studying at the University of Leiden (Holland), who published a dissertation in Latin and Dutch arguing that slavery contradicted Christianity, and who was placed in charge of education for Calvinist missionaries in Ghana.
And Hume ignored the accomplishments of Anton Wilhelm Amo, a professor at the University of Jena (Germany), who produced several scholarly publications in Latin.55 Despite these counter-examples, a confluence of developments throughout the Enlightenment gave rise to the biological and hierarchical conception of race. Advances in seafaring technologies enabled travel over great distances and accented the differences among people living in distant lands. Embrace of the human senses as a source of empirical evidence and a hunger to understand the natural world through observation permitted humankind to be separated into distinct categories based on ocular physical features. Lingering religious ideas about the genesis of man and the Great Chain of Being provided a foundation for classification systems and explanations for how observable variations developed. Many of these explanations centered the genesis of modern humankind in the Caucasus Mountains, home to the people possessing physical white traits closely reflecting those of Greek gods glorified by the art of the day. Each of these pillars was linked to the rapidly expanding African slave trade and the need for moral justification for enslavement. Together these developments conspired to enable Enlightenment philosophers and scientists to divide humankind into racialized groups and to position the “well-reasoned,” “civilized” white Caucasians of Europe on top and the negatively connotated, black-skinned enslaved people from Africa below. As the English colonies became the United States, tension over slavery welcomed importation and aggrandizement of the racialized conceptions developed by Enlightenment scientists and philosophers. As we will see in the next chapter, these imported ideas were molded over the next two centuries to protect the power of those membered into the White race.

Notes

1 Omi and Winant (2015), p. 105.
2 Roberts (2011), p. 4, italics in original.
3 Act 2, Scene 4, Lines 16–17.
4 This summary of the origins of the term race is based on the Merriam-Webster Dictionary and the Online Etymology Dictionary (https://www.etymonline.com/word/race). See also Doron (2012).
5 Douglas (2008), p. 34.
6 Boulle (2003) notes that despite the lack of the biological meaning that was soon introduced, the term race was not value free. In France, Race replaced more neutral terms describing noble lineage, such as maison (household) or famille, precisely because it distinguished between good breeding and the absence of breeding … From the first, therefore, the term focused on natural—what we would now call biological—differences and placed great value on the possession of inherited character traits. [But u]nlike the modern advocates of race, however, noble theorists of the sixteenth century did not see such qualities as fixed or inevitable. (p. 12)
7 Douglas (2008), p. 34.
8 Eze (2001), p. 5, writes: Enlightenment philosophy was instrumental in codifying and institutionalizing both the scientific and popular European perceptions of the human race. The numerous writings on race by Hume, Kant, and Hegel played a strong role in articulating Europe’s sense not only of its cultural but also racial superiority. (italics in the original)
9 Brace (2005), p. 36, suggests that Martin Luther’s challenges to the Church, which promoted the idea that individuals were autonomous and fully capable of making decisions on their own that allowed them to live a Christian life, laid the foundation for accepting human senses as sufficient for understanding the natural world.
10 Brace (2005).
11 Brace (2005) notes that Polo’s records occasionally use the term razze, but while the English word race is a cognate, the implications of the Italian word are rather different from the accepted connotations in English. Polo mentioned the presence of ‘three races’—tre razze … The three are Turks, Armenians, and Greeks. Brace also notes that Polo used various terms when referring to people residing in various locations. He used the term popoli (peoples) when describing encounters with Armenians and Greeks. In other places, he used the term uomini (humans). Translations of his work use the word classes (English), genera (Latin), and gens (French, people). Despite these variations, Brace believes “the context of Polo’s occasional use of the term razze shows that it does not have the implications that are at the core of the concept ‘race’ in the present world” (p. 20). See Drake (1987) and Snowden (1983) on early interactions among people from Europe and Africa, including during the Greek and Roman empires, which indicate that, while there is some mention of physical characteristics of people from Africa in European accounts, the primary focus was on cultural and technological features of their societies.
12 The invention of the caravel, with its aerodynamic hull and triangular lateen sails, is credited for allowing the Portuguese to explore and soon after exploit the settled regions of the West African coastline.
13 Brace (2005), pp. 28–29.
14 Doron (2012), p. 82.
15 Margócsy (2010) describes how botanists communicated about specific flora prior to Linnaeus’s classification system.
16 Linnaeus’s first edition launched his effort to document and describe organic life on earth. It was not until his second edition that his formal system of classification was introduced.
17 Although the many writings on the origins of the biological concept of race describe this expansion from a single descriptor to these five characteristics, Linnaeus’s text does not present descriptions of these various characteristics in such a clean and organized manner. Pages 20–22 of his 1758 edition list the then six varieties of Homo sapiens. For Americanus, he includes the descriptors for hair, nose, face, and chin (Pilis nigris, rectis, crassis; Naribus patulis; Facie ephelitica; Mento subimberbi). For Europaeus and Asiaticus, he adds a description for eyes and omits everything else except hair. For Afer, he describes hair, skin, nose, and lips, as well as features of women and mothers.
18 Carl Linnaeus (1737), p. 153, quoted in Müller-Wille (2015), p. 196.
19 Whitaker (2019). In his analysis of the King of Tars, Whitaker notes that it was not the baptism itself that caused the sultan’s skin to turn white; rather, it is the priest’s bestowing his own name onto the sultan; “The Christian priest [was] called Cleophas; He called the sultan of Damascus after his own name. His skin, that was black and hateful, became all white” (p. 23). Whitaker infers from this ordering that the sight of God turning his skin from black to white suddenly allowed the sultan to understand the almightiness of God and to then be truly baptized into Christianity. It was this external conversion from blackness to white that allowed his internal conversion from the “impurity” of the Islam faith to the “purity” of Christianity.
20 Jordan (2009), p. 39, quoting from the Oxford English Dictionary.
21 Painter (2010), p. 61.
22 Golash-Boza (2016) contrasts the association of white skin with beauty that emerged during the early period of European colonialism with colorism that developed in China during precolonial time, noting that in China, not all distinctions among skin tone had racialized meaning (pp. 129–130).
23 Feagin (2013) documents that Sir William Petty, an English anatomist and philosopher, introduced the idea of “‘blacks’ being physically and culturally inferior to ‘whites’” (p. 50) a decade before Bernier’s publication.
24 Boulle (2003).
25 Bernier (1864).
26 Bernier (1864).
27 The volumes were a collaborative project to which other naturalists and artists contributed. After Buffon’s death in 1788, his collaborators continued to add to Histoire Naturelle, producing eight more volumes.
28 Brace (2005), p. 31.
29 Buffon, quoted in Doron (2012), p. 97.
30 Doron (2012).
31 Doron (2012), p. 96, argues that Buffon understood race “as a relatively constant succession of varieties transmitted along generations inside the human species.”
32 As an example, Buffon points to the race he termed Lapps and notes that despite variation in some characteristics of the Borandians, Zembians, and Samoyeds, they are of the same race because of shared physical features and customs. Buffon writes, “if these peoples differ, it is only a question of more and less.” A similar argument is presented for the race Buffon defined as the Tatars: despite some variation they “share so many similarities that we have to consider them as being part of the same race … the essential characters of their race always remain.” Doron (2012), p. 95, quoting Buffon’s 1749 Variétés dans l’espèce humaine, pp. 371, 379.
33 Roberts (2011), p. 31. Brace (2005, p. 31) similarly attributes sustained reproduction as a required condition of species, quoting a different passage by Buffon: “We should regard two animals as belonging to the same species if, by means of copulation, they can perpetuate themselves and preserve the likeness of the species” (1799, II:10).
34 Bindon (2017) notes that Tuttle (1866, p. 35) lists Buffon’s races as “Polar, Negro, Tartar, American, Australian, Asiatic, European” while Brewer (1890, p. 117) lists Buffon’s six races as “(1) the Caucasian (2) the Mongolian (3) the American (4) the Malay (5) the African and (6) the Australian” and Goldsmith (1774) lists them as “Laplanders, Tartars, Southern Asiatics, Africans, Americans, Europeans.”
35 Voltaire, Les Lettres d’Amabed (1769), Septième Lettre d’Amabed.
36 Quoted by Immerwahr (1992). In later versions of the essay, Hume modified this statement in several ways, one of which was removing reference to “all other species of men (for there are four or five different kinds).” There is some controversy as to why Hume made these changes. Some authors interpret these changes as indicating that Hume’s prejudice was targeted specifically at people membered Black and as having been made in response to criticism of his position that pointed to major accomplishments of people indigenous to the American continents, specifically the Mayans and Incas. Others observe that this note contrasts with his stance on slavery and suggest it may have been added to combat an argument current at the time in favor of slavery (see Asher, 2022). In addition to Immerwahr (1992), see Eze (2001) and Garrett (2000).
37 Penn Museum (2020), p. 2.
38 Cuvier, quoted in Douglas (2008), p. 46.
39 Painter (2010); see footnote on p. 66.
40 Painter (2010).
41 Montandon (2017); Painter (2010).
42 Blumenbach added the Malay as a separate race after acquiring a skull collected by Captain Cook during an expedition to the South Seas sponsored by Sir Joseph Banks, then president of London’s Royal Society. See Painter (2010), p. 74, who implies the addition of this fifth race was inspired more by a desire to win further favor with Banks than by empirical evidence.
43 Painter (2010), p. 80.
44 Richards (2018); Roberts (2011); Brace (2005).
45 Brace (2005), p. 45, quoting page 269 of a translation produced by Bendyshe in 1865.
46 Richards (2018), p. 160.
47 Brace (2005), p. 46, quoting page 264 of a translation produced by Bendyshe in 1865.
48 Painter (2010), see p. 76.
49 Richards (2018), p. 162, quoting Blumenbach (1787), p. 4.
50 Whitaker (2019), p. 184.
51 Whitaker (2019), p. 186; Kendi (2016), p. 24.
52 Eze (2001), p. 5.
53 Whitaker (2019), p. 13, referencing Jablonski (2012), p. 251. See also Eze (2001), who presents several of Kant’s writings about the origins of race on pages 38–49.
54 Jordan (1968), p. 253.
55 Popkin (1992, 1993).

References

Asher, K. (2022). Was David Hume a racist? Interpreting Hume’s infamous footnote (Part I). Economic Affairs, 42(2), 225–239.
Bernier, F. (1864). A new division of the earth, according to the different species or races of men who inhabit it. Journal des Sçavans, 12, 133–140.
Bindon, J. (2017). Darwin’s borrowed allegory and the apocryphal six races of Buffon. http://jbindon.people.ua.edu/race-and-human-variation/darwins-borrowed-allegory-and-the-apocryphal-six-races-of-buffon
Blumenbach, J.F. (1787). Einige naturhistorische Bemerkungen bey Gelegenheit einer Schweizerreise, von den Negern. Magazin für das Neueste aus der Physik und Naturgeschichte, 4(3), 1–12.
Boulle, P.H. (2003). François Bernier and the origins of the modern concept of race. In The Color of Liberty. Duke University Press.
Brace, C.L. (2005). “Race” Is a Four-Letter Word: The Genesis of the Concept. Oxford University Press.
Brewer, W.H. (1890). Warren’s New Physical Geography. E.H. Butler & Co.
Coles, K.A., Bauer, R., Nunes, Z. & Peterson, C.L. (2015). The Cultural Politics of Blood, 1500–1900. Springer.
Doron, C.O. (2012). Race and genealogy. Buffon and the formation of the concept of “race”. Humana Mente Journal of Philosophical Studies, 5(22), 75–109.
Douglas, B. (2008). Climate to Crania: Science and the racialization of human difference. In Foreign Bodies: Oceania and the Science of Race. ANU Press.
Drake, S. (1987). Black Folk Here and There: An Essay in History and Anthropology (Vol. 7). University of California Center for Afro-American Culture and Society.
Eze, E.C. (2001). Race and the Enlightenment: A Reader. Blackwell Publishers.
Feagin, J.R. (2013). The White Racial Frame: Centuries of Racial Framing and Counter-framing. Routledge.
Garrett, A. (2000). Hume’s revised racism revisited. Hume Studies, 26(1), 171–177.
Golash-Boza, T. (2016). A critical and comprehensive sociological theory of race and racism. Sociology of Race and Ethnicity, 2(2), 129–141.
Goldsmith, O. (1774/1854). An History of the Earth and Animated Nature. Willam Sprent, Neville Street.
Immerwahr, J. (1992). Hume’s revised racism. Journal of the History of Ideas, 53(3), 481–486.
Jablonski, N.G. (2012). Living Color. University of California Press.
Jordan, W.D. (1968). White over Black: American Attitudes toward the Negro, 1550–1812. University of North Carolina Press.
Jordan, W.D. (2009). First impressions. In Theories of Race and Racism: A Reader. Routledge.
Linnaeus, C. (1737). Critica botanica in qua nomina plantarum generica, specifica, & variantia examini subjiciuntur, selectiora confirmantur indigna rejiciuntur simulque doctrina circa denominationem plantarum traditur. De Necessitate Historiae Naturalis Discursus: Seu Fundamentorum Botanicorum pars IV.
Margócsy, D. (2010). “Refer to folio and number”: Encyclopedias, the exchange of curiosities, and practices of identification before Linnaeus. Journal of the History of Ideas, 71(1), 63–89.
Montandon, D. (2017). Head shape configuration over the centuries. Journal of Craniofacial Surgery, 28(8), 1890–1900.
Müller-Wille, S. (2015). Linnaeus and the Four Corners of the World. In The Cultural Politics of Blood, 1500–1900. Springer.
Omi, M. & Winant, H. (2015). Racial Formation in the United States. Routledge.
Painter, N.I. (2010). The History of White People. WW Norton & Company.
Penn Museum. (2020). A history of craniology in race science and physical anthropology. https://www.penn.museum/sites/morton/craniology.php
Popkin, R.H. (1992). Hume’s racism reconsidered. In The Third Force in Seventeenth-Century Thought. Brill.
Popkin, R.H. (1993). The High Road to Pyrrhonism. Hackett Publishing.
Richards, R. (2018). The beautiful skulls of Schiller and the Georgian girl: Quantitative and aesthetic scaling of the races, 1770–1850. In Johann Friedrich Blumenbach. Routledge.
Roberts, D. (2011). Fatal Invention: How Science, Politics, and Big Business Re-Create Race in the Twenty-First Century. New Press.
Snowden, F.M. (1983). Before Color Prejudice: The Ancient View of Blacks. Harvard University Press.
Tuttle, H. (1866/1896). Origin and Antiquity of Physical Man. Copley’s Press.
Whitaker, C. (2019). Black Metaphors: How Modern Racism Emerged from Medieval Race-Thinking. University of Pennsylvania Press.

2

Molding Race in the United States

Race and racism are centrally about seeking, or contesting, power … racial systems use appearances and ancestry as weapons in violent struggles over group position in material and social status.1

For five centuries the phrase “the American people” has been understood as an implicitly white designation.2

Oppression of humans through enslavement dates back at least three millennia. The Greeks, Romans, Egyptians, Mayans, Ottomans, Vikings, and the many dynasties that controlled what is now known as China all relied on slavery to support their agricultural systems, provide domestic labor, and develop and maintain infrastructure. For ancient societies, slavery was a byproduct of war, with winners enslaving the defeated. Once enslaved, humans were also traded. During the Roman Empire, an extensive slave trade operated across North Africa. Later, the Moors traded extensively for enslaved humans with tribal communities throughout North and Central Africa. Similar routes existed in sections of Asia, with victors of armed conflict trading their enslaved captives for financial gain. Ransoming enslaved people conquered through war also occurred and was a means for recovering the financial costs of war incurred by the victors. What sets apart the enslavement and subsequent trade of humans residing in West Africa, which the Portuguese began in 1444, is the absence of war as the catalyst for enslavement and the use of one’s genealogy as a justification for enslavement.3 In his account of the Portuguese’s very first enslavement of “Black Moors,” Prince Henry’s chronicler, Zurara, describes the act of enslavement as one that was

in accordance with ancient custom … after the flood, Noah laid upon his son Ham, cursing him in this way: that his race should be subject to all the other races of the world. And from his race these [enslaved Black Moors] are descended.4

DOI: 10.4324/9781003228141-4

In Zurara’s account of this first enslavement of people from Africa by Europeans, we see the seed of capital benefit from the accumulation of enslaved people.5 After capturing two people, the commander of one of the two ships that were the first to bring enslaved Africans to Portugal persuaded his colleague to seek more captives:

Although you are carrying off these two souls through whom the Prince may come to learn something, that does not prevent what is better still, namely for us to carry off many more. For besides the knowledge which the Lord Prince will gain through them, profit will also accrue to him from their service or ransom.6

And so it was that the Portuguese captured and returned with an additional ten enslaved people. The next voyage returned with 29 enslaved people, motivating merchant sailors to petition the prince for licenses to travel to West Africa for the purposes of capturing and enslaving increasing numbers of people. The granting of such licenses marked the shift of enslavement from a military spoil to a mercantile commercial enterprise. The enslavement of people from Africa expanded rapidly. Within a century, more than 135,000 enslaved people from Africa labored in Europe.7 As the Portuguese and Spanish seized land from the people residing in the American continents, they brought large numbers of enslaved people from Africa to clear and work land for agricultural purposes, extract minerals from mines, and perform domestic labor. Less than two centuries after the Portuguese’s initial enslavement of people from Africa, the number of enslaved people laboring in what is now known as Central and South America exceeded a half million. Enslaved people of Africa, however, were absent in the British settlements established on lands taken along the eastern coast of North America—that is, until 1619.
As the New York Times 1619 Project has helped make common knowledge, the first enslaved Africans were shipped into the British colonial port of Point Comfort in August 1619.8 The 20 humans were originally enslaved by the Portuguese in what is now known as Angola and were then purloined by two English-owned pirate ships that traded the enslaved people to the British colony in exchange for food and supplies.9 This exchange not only marked the start of nearly 250 years of slavery in the British colonies and the United States,10 it also sparked a 400-year history of (re)molding the categorization of human beings in the United States, a process that continues today.

Molding Race in the United States

Molding is both the process of forming an object out of a malleable material and the act of influencing the formation or development of an idea, theory, or object.

I use the term molding here to reflect the intentional and carefully selected expansion and contraction of racialized categories, and the subsequent formation of subgroups of people based on those categories, that has occurred and continues to occur in the United States as well as other areas of the world. As the two quotes opening this chapter observe, in the United States, the molding and remolding of race has functioned and continues to function to define who is membered White in order to preserve and maintain political, social, and economic power for those membered White. This chapter documents the many ways in which race has been and continues to be molded in the United States. The chapter begins by examining the molding of race during the first century of British colonial rule in what became the United States. The chapter then explores the social and economic developments that influenced the (re)molding of race during the 150 years following U.S. independence from British rule. The analysis presented in this chapter documents the many ways in which race has been (re)molded to preserve whiteness for a limited collection of people residing in the United States. This analysis also makes clear why race is a social construct that has been and continues to be reconstructed to preserve advantage for those granted whiteness. As we will see in Parts II and III, the molding of race has at least two important implications for educational measurement. First, the fact that racialized categorizations are molded to fit social, economic, and political needs that evolve over time exemplifies the social rather than biological construction of race. This social construction locates racialized identity outside of the individual—it is not traits of an individual that define one’s race, but rather the social needs of society that direct the placement of individuals into racialized groups based on a specific subset of traits.
In other words, race is not an individual trait, but rather a socially architected categorization of people. Second, the purpose of this categorization is to provide advantage for some through the oppression of others. The study of race and racialized categorization is not of individuals and their traits, but rather of systems that produce advantage and oppression, and how these systems impact opportunity, lived experiences, and outcomes for the people membered into socially constructed categorizations. For this reason, the study of education that considers race is not the study of individuals and their experience in the educational system, but rather the study of the impact that these constructions have on individuals and groups of individuals. The remainder of this chapter aims to shed light on the molding of race in the United States with the goal of reorienting considerations of race in educational measurement from that of an individual trait to that of a social construction of disparate opportunities and outcomes.

Molding Race in the 17th Century

Shortly after the first enslaved people taken from Africa were disembarked in Point Comfort, colonial courts and legislative acts molded human categorizations to advantage White English property owners (who were male) through the oppression of newly arrived people of African descent. During this early colonial period, the molding of racialized categorizations occurred even though the concept of race hinted at by Bernier in 1684 and made scientific by Linnaeus, Buffon, and others was still a century off from its making. The biological conception of race introduced in the mid-18th century was then laid neatly over the status categories operating at the time in the British colonies. During the United States’ colonial period, the molding of what became racialized categories occurred through legislative acts and court cases that addressed three main issues: clarifying one’s status as a free and whole human, controlling who is permitted to engage in intercourse and marriage, and modifying the inheritance of status. Although there is not room here to present the full volume of cases specific to each of these three issues, the court rulings and legislative acts shared in this chapter demonstrate how the concept of human categorizations in the British colonies was carefully molded and applied to (re)produce and preserve advantage for the empowered White property owners.

Molding Status as a Free and Whole Human

In 1619, the colony of Virginia was populated by a small number of landowners and a substantially larger number of indentured servants. Residing on land taken from the Powhatan confederation, the colonists interacted regularly with the Indigenous people who lived on the lands abutting the Chesapeake Bay. Although the English colonists came for commercial purposes, their religious beliefs influenced their categorization of people. The term Christian was applied to all English colonists and heathen to people who were indigenous to the taken land or who were otherwise not baptized into the Christian faith. Although the English colonists interacted regularly with the Indigenous “heathens,” they did not consider them members of the colony.11 The arrival of “20 and odd” enslaved humans taken from West Africa introduced a new group of people to the Virginia colony: dark-skinned people of Africa. At the time, the English reserved the term “Christian” for themselves and used it to distinguish themselves from “Negroes” and “Indians.” During the 1680s the English colonists introduced the term “white” to distinguish themselves from “Negroes.” Gradually, the colonists also began referring to “Negroes” as “blacks” or “Africans.”12 From this point forward, I use the term “White” rather than “Christian” regardless of the time period.

As the late Judge Leon Higginbotham describes in his analysis of race and law in the British colonies, when the first people from Africa were disembarked in Virginia, “there was not as yet a statutory process to especially fix the legal standing of blacks.”13 Historical analyses conducted during the late 19th century through the 1960s suggest the Virginia colonists treated people of African descent as indentured servants. More recent analyses, however, maintain that an indentured status applied only to those people of African descent who came voluntarily to the colonies by way of England, rather than forcibly through the Middle Passage.14 The case of John Punch in 1641, however, clearly demarcated the status of people taken from Africa from those who were White. Born in West Africa, John Punch was caught attempting to escape his servitude while running away with two White indentured servants. For punishment, the White Scotsman and White Dutchman both had their service extended by four years. For Punch, the court ruled “being a negro named John Punch shall serve his master or his assigns for the time of his natural Life here or elsewhere.”15 This ruling distinguished Punch from his accomplices, membering him “a negro.” It was Punch’s status as a person of dark skin taken from Africa that led the courts to sentence him to enslavement for life. And so it became that dark-skinned people taken from Africa were understood, commonly and legally, as enslaved servants for life. Nineteen years later, the lifelong enslavement of people taken from Africa was affirmed by Act XXII, which threatened any English—that is, White—servant who ran away with an enslaved person taken from Africa with serving the slave’s master for a period of time equivalent to the slave’s absence.
The punishment for running away with an enslaved person taken from Africa was lengthened shortly thereafter, sentencing the White accomplice to lifelong servitude should the Black slave succeed in their escape or die trying to do so. While these punishments deterred people who were White from assisting the escape of enslaved people, they did little to dissuade the enslaved from attempting to escape. To provide such dissuasion, a 1669 act, titled “An Act about the casual killings of slaves,” recognized the need for harsh corporal punishment as a form of deterrence and excused death that occurred as a result of such punishment.16 Collectively, this set of court rulings and acts made clear the distinction between White indentured servants and the people taken from Africa, and in the process cemented the absence of human rights for people taken from Africa, permitting their murder during punishment and equating their servitude with enslavement for life.

At about this same time, an enslaved man, named Fernando, similarly sued for his freedom. His claim focused on his baptism by the Portuguese prior to being sold into slavery in Virginia. At the time, it was understood that only non-Christians could be enslaved. Fernando argued that his baptism changed his status from a heathen to a Christian, and thus he could no longer be enslaved. Concerned that conversions to Christianity, and subsequent freedom, would jeopardize the rapidly expanding growth of the colony that was becoming increasingly dependent on the labor of enslaved people of African descent, the General Assembly modified the significance of baptism for one’s free state, ruling that “conferring of baptism doth not alter the condition of the person as to his bondage or freedom.” In effect, this act affirmed that it was not one’s status as a heathen, but rather the color of one’s skin, that justified enslavement.17

Virginia was not the only colony in which the legal system distinguished between White people of Europe and dark-skinned people of African descent. During the latter half of the 17th century, the colony of Massachusetts Bay passed a series of edicts, acts, and court decisions that differentiated punishment and rights based on the color of one’s skin. People taken from Africa received harsher sentences, including hangings and immolations, than people who were membered White received for similar offenses. Curfews were established preventing people of dark skin from being out at night. And taxes were levied on the importation and ownership of people taken from Africa.18 Like those in Virginia, these acts document an understanding of people taken from Africa as less-than-human chattel.

The “less than” status of people taken from Africa was affirmed eight decades later when the U.S. Constitution defined the value of an enslaved human as two-fifths less than a free person for the purposes of representation and taxation. The Dred Scott case, heard by the U.S. Supreme Court in 1857, solidified the status of people taken from or otherwise descended from Africa as less than those of European descent. As Justice Roger B. Taney wrote in the majority opinion:

The question is simply this: Can a negro, whose ancestors were imported into this country, and sold as slaves, become a member of the political community formed and brought into existence by the Constitution of the United States, and as such become entitled to all the rights, and privileges, and immunities, guaranteed by that instrument to the citizen? One of which rights is the privilege of suing in a court of the United States in the cases specified in the Constitution. [The Court’s answer:] We think they [people of African descent] are not, and that they are not included, and were not intended to be included, under the word “citizens” in the Constitution, and can therefore claim none of the rights and privileges which that instrument provides for and secures to citizens of the United States.

To further emphasize the “less than” status of people of African descent, Justice Taney continued,

On the contrary, they were at that time considered as a subordinate and inferior class of beings, who had been subjugated by the dominant race, and, whether emancipated or not, yet remained subject to their authority, and had no rights or privileges but such as those who held the power and the Government might choose to grant them.

What started as an unclear status in 1619 evolved rapidly into a “less than” status that was affirmed as that of an “inferior” noncitizen 230 years later. The 14th Amendment to the U.S. Constitution, ratified three years after the end of the Civil War, attempted to redress this classification. Nonetheless, by that time the common understanding and associated narratives denigrating people descended from Africa were firmly entrenched among many people of European descent.

Molding Race through Sexual Relations and a Child’s Status

When British colonists first arrived on the lands taken from the Powhatan confederation, a child’s status was defined by the status of their father. As such, a child born out of wedlock was understood to inherit their father’s status. The arrival of enslaved people of Africa and the subsequent sexual relations that occurred between (White) Christian colonists and enslaved women from Africa complicated the inheritance of status. Although a series of rulings and regulations dissuaded sexual relationships across racialized groups, such relationships continued. Most often it was a male Christian colonist who impregnated an enslaved female of African descent, often through rape. Under British law, the child produced from such a relationship inherited their father’s status as a free person. Given the growing need for laborers and the financial investment property owners had made in the purchase and keeping of enslaved people, passing the father’s status to a child of partial African descent was problematic for White male property owners. To address this conflict, two sets of court and legislative actions occurred during the early colonial period. The first aimed to deter sexual relations across racialized lines. The second redefined the rules governing a child’s status. The first known case in colonial Virginia that focused on interracial sexual relations occurred in 1630 when a White male, named Hugh Davis, was sentenced “to be soundly whipt before an assembly of negroes and others for abusing himself to the dishonor of God and shame of Christianity by defiling his body in lying with a negro.”19 Ten years later, Robert Sweat was similarly caught engaging in intercourse with a woman of African descent. 
In this case, Sweat was sentenced to “do public penance for his offense” while the woman was sentenced to “be whipt at the whipping post.”20 Three decades later, marriage between an “English or other white man or woman” and a “Negro, mulatto, or Indian” was forbidden and punished by lifelong banishment from the colony. This act also punished English women who had a child by a person of African descent with a fine of 15 pounds or five years of service to the church. The child was similarly punished with service to the church until the age of 30. In effect, these laws banned interracial marriage and punished White women for engaging in sexual relations with non-White men. Although White men who engaged in sexual relations with a woman of African descent were fined, they were allowed to continue living in the colony as long as they remained unwed. An exception, however, existed for a slaveowner who engaged in sexual relations with a female slave; for this, neither the man nor the woman was punished, but if they had a child together, the child was enslaved for life.

The 1656 case of Elizabeth Key sheds light on the role race and slavery played in reshaping the heredity of status. Key was born in 1630 to an enslaved woman taken from Africa who had been impregnated by an English man. After her father’s death, Key’s White godfather sold her to a judge of English descent. When the judge died, Key sued for her freedom, claiming that under British law, she should have inherited her father’s status in society, namely as a free person. Initially, a local jury ruled in her favor. An appeal to the General Court reversed this ruling and affirmed her enslaved status.21 This ruling flipped the British law from conferring status based on a child’s father to that of the mother.
In 1663, Act XII affirmed this shift in status definition, specifying that “children got by an Englishman upon a Negro woman shall be bond or free according to the condition of the mother.”22 A year later, this reversal was also affirmed in Maryland, where questions were raised about whether “children that are slaves by birth … should by virtue of their baptism be made free.”23 This question was answered by the Lower House, which created “an Act obliging negroes to serve durante vita [during life].”24 Through this act, the law now clearly specified that all children born of a slave were to be enslaved for life. These rulings and acts (re)molded the relationship between parent and child to provide economic advantage to slave owners, replacing the English practice of status following that of the father with a rule that conferred the mother’s status on the child. In his historical analysis of British colonial law, Judge Higginbotham observes that the reversal of English doctrine created a perverse incentive for White slaveowners to engage in illicit relations with women they enslaved: by doing so, “a white male could eliminate the cost of purchasing an infant slave; by agreeing to enslave his progeny he became a breeder of slaves.”25

As this brief history of the U.S. colonial period documents, racialization of people residing in the British colonies was molded through a series of laws and court cases that regulated who was a free person and who was not, who could engage in intercourse and who could not, and how one’s racialized identity was passed from parent to child. Much of this early molding occurred prior to the redefinition of race as a biological trait, first hinted at by Bernier and made explicit by Linnaeus, Buffon, Blumenbach, and other intellectual leaders of the Enlightenment. As we explore next, the many social and economic changes that occurred following independence from colonial rule furthered the molding of race in the United States.

Social Changes Following U.S. Independence

Following independence, the molding of race in the United States occurred through three channels: legislative acts regulating immigration, naturalization, and citizenship; legal challenges based on those acts; and modifications to the U.S. census. Although the first census taken in 1790 did not collect information about a person’s national heritage, of the 80% of the population membered White, approximately 60% were of English descent and 8% were of Scottish descent.26 Of those remaining, the majority were enslaved people of African descent. Those Indigenous to the land were seen as foreign aliens. Over the next century, the many waves of immigrants and people of Mexican descent who remained on their land increased the diversity of national and regional origins populating the United States and its territories. This diversity, however, was not dispersed evenly across the United States and its western territory. Still enslaved until 1865, people of African descent were vastly overrepresented in the southern states. Having come to work in mines and on railways, people of Chinese, and later Japanese, descent resided primarily in California. People of Mexican descent resided in Texas and the territories of the Southwest.
Many European immigrants, initially from Ireland and later from eastern and southern Europe, swelled northeastern cities, while those from the German and Nordic regions tended to settle in the northern areas of the Midwest. During this time, England also experienced rapid growth in its urban areas, and with it came increases in crime and poverty. To limit the challenges and social pressures produced by these conditions, England engaged in an active campaign that expelled prisoners and “paupers,” sending many to English colonies in Australia and others to northern U.S. cities. The geographical concentration of these several groups of new residents produced localized pressures specific to the people new to the United States. In the South, growing tension about enslavement heightened focus on the racialization of people of African descent. Perhaps most notable was the specious racial science promulgated by Samuel Morton, Josiah Nott, and George Gliddon undertaken to denigrate people of African descent in order to justify their continued enslavement.

In the Southwest, tensions developed between White settlers and residents of Mexican descent who exercised the option to remain on their land as new U.S. citizens. This tension spawned racialized narratives that denigrated people of Mexican descent, sparked violent attacks, and were used to justify nearly 600 documented lynchings, including those of women and children. These narratives also provided a pretext for the expulsion campaigns that forced nearly 1.5 million people of Mexican descent, many of them U.S. citizens, out of the country during the 1950s.

As the Gold Rush came to an abrupt end, the Panic of 1873, the subsequent depression of 1877, and severe drought produced economic instability in California. Together, these events increased competition for employment between a growing number of settlers of European descent and migrants from China and Japan. In response, racialized narratives about people of Chinese and Japanese descent emerged in the western territories. These narratives bolstered support for laws that initially prevented immigration of people from China and later led to expulsion. These narratives were also used to justify acts of physical violence targeting the communities in which people of Chinese and Japanese descent resided, violence that produced the lynching of hundreds and the murder of thousands of people of Chinese and Japanese descent.27

Racialized narratives denigrating people of African and Asian descent fit neatly with the racialized categories introduced by Linnaeus, Blumenbach, and others in the 18th century. The racialization of people of Mexican descent relied on a key facet of the “biological categorizations,” namely differences in skin color and facial features.
These differences in phenotypical features provided an ocular mechanism to reference people of Mexican descent as a new and distinct racial category—a category that resided below that of White Caucasians, but above people of African descent. For the European immigrants swelling northeastern cities, the biological conception of race was not sufficient. There was some variation in the facial features and skin tones of European immigrants, particularly those from southern Europe, but the differences were not sufficiently stark to produce new racialized categories based on phenotypical features. Moreover, Blumenbach’s definition of Caucasian included people from across Europe, through regions of the Middle East, and into South Asia. Instead, the concept of race reverted to pre-biological conceptions of genealogy and tribal (i.e., national) affiliation to distinguish between the “superior” people of Anglo-Saxon descent and those of Celtic, Norman, and southern and eastern European origins.28 As Mae Ngai, an American historian, describes, it was in this “constellation of reconstructed racial categories, in which race and nationality—concepts that had been loosely conflated [in] the nineteenth century—disaggregated and realigned in new and uneven ways” that the laws and court rulings regulating immigration, naturalization, and citizenship (re)molded race.29

The blending of national origin with race in response to increasing social and economic pressures in the U.S. Northeast region during the 19th and early 20th centuries is evidenced in the writing and lectures of Ralph Waldo Emerson and the political rhetoric of Theodore Roosevelt. In her History of White People, Nell Irvin Painter details a 30-year period during which Emerson segments the racial conception of Caucasian into discrete racialized categories based on regional descent. In doing so, Emerson elevates the status of people of Anglo-Saxon descent, establishing them as the superior race. Below reside people of Celtic and Norman descent. This expansion of racialized categorizations from White-Caucasian to Anglo-Saxon, Celtic, Norman and, to a lesser extent, other lines of European nationhood, began shortly after northeastern cities—Boston and New York in particular—experienced a growing influx of immigrants following the Irish Potato Famine that occurred during the late 1840s. This separation of people membered White into the Anglo-Saxon and Celtic races first emerged in a series of lectures Emerson delivered in the 1830s and culminated in his 1856 book titled English Traits. In it, Emerson glorifies the English:

The long habitation of a powerful and ingenious race has turned every rood of land to its best use, has found all the capabilities … so that England is a huge phalanstery, where all that man wants is provided within the precinct. … What are the elements of that power which the English hold over other nations? If there be one test of national genius universally accepted, it is success; and if there be one successful country in the universe for the last millennium, that country is England … Is this power due to their race, or to some other cause? Men hear gladly of the power of blood or race.
Everybody likes to know that his advantages cannot be attributed to air, soil, sea, or to local wealth, as mines and quarries, nor to laws and traditions, nor to fortune, but to superior brain … It is race, is it not? that puts the hundred millions of India under the dominion of a remote island in the north of Europe. Race avails much, if that be true, which is alleged, that all Celts are Catholics, and all Saxons are Protestants; that Celts love unity of power, and Saxons the representative principle. Race is a controlling influence in the Jew, who, for two millenniums, under every climate, has preserved the same character and employments. Race in the negro is of appalling importance … In Ireland, are the same climate and soil as in England, but less food, no right relation to the land, political dependence, small tenantry, and an inferior or misplaced race.30

In these and the nearly 100 other sentences that address race, Emerson divides the Caucasian race into separate races, placing the Anglo-Saxons of England above all others. Being of English stock, Emerson notes that “The American is only the continuation of the English genius into new conditions, more or less propitious.”31 By “American,” however, Emerson means only those of English descent.

Emerson’s notions regarding the separation of the Caucasian race are visible in the ideas presented by several leading figures in American political history. In his Short History of the English Colonies in America published in 1881, Henry Cabot Lodge argues that the success of the United States was due to its founders being of “English stock.”32 In The Races of Europe published in 1899, William Ripley, an esteemed professor of economics at Harvard, mimicked Linnaeus’s 1758 table presenting the characteristics that distinguished the four races of the world. In Ripley’s table, however, people of Europe were divided into three racialized categories—Teutonic (among which were the Anglo-Saxons), Alpine (including the Celts), and the Mediterranean (among which were the Italians). For each he listed general characteristics of the head, face, hair, eyes, stature, and nose that distinguished among them. Although last alphabetically, he listed and labeled the Teutonic number one, followed by the Alpine and then the Mediterranean.33

During the final decades of the 19th century, the number of immigrants entering the United States from Mediterranean nations increased sharply and census records documented higher birthrates for these newly arrived immigrants compared to those of English stock. Concerned by these developments, Francis Amasa Walker, a former president of MIT, a highly honored economic statistician of his time, and director of the 1870 U.S. census, raised alarm that “the decay of reproductive vigor” of native white Americans—that is, of English stock—posed a major threat to the future of the nation. Fear that the U.S. Anglo-Saxon heritage was being diluted by the influx of immigrants of the Mediterranean race became known as “race suicide.” In 1902, fears of the negative effects of race suicide on the United States were viewed by President Theodore Roosevelt as “fundamentally infinitely more important than any other question in this country.”34 This fear of race suicide, and the need to segregate White Europeans into separate races, manifested in the authoring of the Dictionary of Races or Peoples in 1911 by the 61st Congressional Commission on U.S. immigration. In it, the people of Europe were divided into 28 races; most along national lines.35 As Painter describes, Roosevelt saw race suicide as a:

kind of race war pitting the higher races of his native Americans [i.e., of English-speaking Anglo-Saxon descent] against two groups deemed inferior by dint of their heredity: “degenerate” poor white families of native descent and immigrant workers from southern and eastern Europe.36

It is in this idea that Mediterranean immigrants and “degenerate” people of Anglo-Saxon descent formed inferior racialized categories that we see the broadest expansion of race in the United States. Not only were European Caucasians divided into multiple races, but the Anglo-Saxon race itself was separated into “superior” native Americans (again, meaning of English-speaking Anglo-Saxon descent) and those who had degenerated into an “inferior” race. This separation of the Anglo-Saxon race is a direct outgrowth of Darwinian thinking applied to society—an idea that manifests itself in social competition and the concept of survival of the fittest.37 And it is this separation of the races that sets the stage for the eugenics movement, and its influence on the early pioneers of educational measurement detailed in Chapter 6.

Separation of Europe into distinct racialized categories also formed the foundation for the rapid changes in immigration, naturalization, and citizenship policies enacted between the 1880s and the 1920s. The social pressures produced by rapid industrialization, urbanization, and immigration created conditions ripe for the molding of race that aimed to protect the status and privilege of people of White Anglo-Saxon descent. Three sets of changes in immigration policy that occurred during this period evidence legislative efforts to protect this restricted conception of whiteness. The first of these efforts occurred through the 1875 Page Act and the Chinese Exclusion Act of 1882, which first limited and then banned immigration from China, Japan, and other regions of Asia. These legislative acts were designed to protect the economic and political interests of “white persons” residing in the western regions of the nation. The second set of efforts to protect Anglo-Saxon whiteness involves the Naturalization Act of 1906 and the Immigration Act of 1907, both of which were signed into law by President Roosevelt.
Together, these laws required immigrants to learn English before becoming eligible for citizenship and excluded entry of people deemed “idiots, imbeciles, feebleminded persons, epileptics, insane persons … paupers; persons likely to become a public charge; persons … being mentally or physically defective.” The English-language requirement introduced by the Naturalization Act of 1906 was strengthened by the Immigration Act of 1917, which required passage of a literacy test prior to entry into the United States—a requirement that clearly favored immigrants from English-speaking nations or those who had greater access to education in their homelands. Although these acts did not explicitly reference any racialized groups, they were used to exclude a new racialized group of people, namely those from southern and eastern Europe. As we will see in Chapter 6, tests of intelligence were used to classify immigrants from these regions into one of these categories in order to deny them entry into the United States.

A third set of legislative actions designed to protect whiteness involved the Emergency Immigration Act of 1921 and the Immigration Act of 1924, which established formal quotas limiting the number of immigrants from specific regions to 2% of their recorded population in the United States based on the 1890 census. Use of the 1890 census to calculate quotas severely restricted immigration from southern and eastern Europe relative to that from northern European nations. This division intentionally aligned with the belief that people of Anglo-Saxon descent were a “superior” race of White people propagated by Emerson, Roosevelt, and many other leaders during the previous half-century.

Although outside of this time period, one additional federal act focused on the racialization of people of Mexican descent is noteworthy. In 1954 the Immigration and Naturalization Service launched what it termed “Operation Wetback.” Similar to efforts in the 1850s to repulse people of Mexican descent who were made U.S. citizens following the end of the Mexican-American War, this federal operation removed more than one million immigrants of Mexican descent who were in the country either legally or illegally. Although the circumstances that prompted the program were complicated, the operation solidified the racialization of people of Mexican—and, gradually, Latine—descent that had begun a century earlier.

With this brief history in mind, the next sections examine three additional tools used to mold race in the United States: U.S. Supreme Court cases focused on racialized identity; the one-drop rule; and the (re)molding of racialized categories by the U.S. census.

U.S. Supreme Court Cases

Immigration, naturalization, and citizenship are closely connected processes used to control who is allowed legal entry into the United States and who is permitted the full rights of citizenship. The laws and regulations governing these processes also serve as a powerful tool for molding race. Although the original U.S. Constitution did not address immigration, it did grant Congress the power to regulate naturalization. In 1790, Congress did so in the first of a long series of Naturalization Acts.
This Act limited those eligible to become nationals to “free white person[s]” (and their children under the age of 21) who have resided within the jurisdiction of the United States for two or more years.38 While the many additional Naturalization Acts passed over the next half-century modified specific requirements for naturalization, the “free white” status requirement remained intact until the Civil Rights Act of 1866 and the Naturalization Act of 1870 extended citizenship rights to people of African descent. Limiting citizenship rights to people membered White or Black set the stage for legal challenges to the racialized classifications of people interested in seeking or protecting U.S. citizenship. As such, preserving whiteness became both a social goal and a judicial one.

Ian Haney López is the Chief Justice Earl Warren Professor of Public Law at the University of California, Berkeley. In his book, White by Law, Haney López analyzes the rationales provided by federal judges and, on two occasions, Supreme Court justices for a string of federal court rulings focused on clarifying who in the United States qualifies as being membered White.39 Until the Immigration and Nationality Act of 1952, being membered White was essential for becoming a naturalized citizen. Whiteness was a form of property which, when possessed by an immigrant, opened the door to citizenship. Until naturalization laws were reformed in 1952, essential questions debated in the courts focused on who qualified as White, and why. Haney López identifies 52 cases in state and federal courts and the U.S. Supreme Court that deliberated whiteness. The descent of the petitioners in these cases varied widely and included people from Hawaii, China, Japan, Burma, the Philippines, Mexico, Armenia, Syria, India, and Arabia, as well as people of mixed descent. In some cases, the plaintiff contested the loss of citizenship that resulted from a change in regulations while they were traveling abroad. In a few cases, the debate centered on the right to own property—a right reserved for people membered White. Most cases, however, focused on acquiring whiteness in order to gain citizenship.

Throughout these cases, the courts twisted and contorted logic to deny whiteness for people of non-European descent. This contortion was enabled by two competing methods for defining whiteness: a biological definition of racial categorizations, and common knowledge. The biological definition relied on the early work of Linnaeus and Blumenbach and updated racialized categorizations introduced by 19th-century racial “scientists.” When convenient, the courts relied on the scientific definition of racial groups to exclude people whose light-colored skin was as white as that of southern Europeans, arguing that the scientific classification did not include their region of origin within the category of Caucasian.
When the scientific definition of Caucasian failed to exclude a person whose complexion was notably darker, the courts turned to common understanding of Whiteness, proclaiming that any person on the street who encountered the darker-skinned plaintiff would conclude they were not-White.

Of these 52 cases, three are most illustrative of the efforts the courts made to mold race to protect whiteness. The first of these cases, In re Ah Yup, exemplifies the ways in which the court applied common understanding and scientific understanding to protect whiteness. Heard by the federal district court in California in 1878, this case focused on a man of Chinese descent seeking citizenship. In its ruling, the court wrote,

The words “white person” … in this country, at least, have undoubtedly acquired a well settled meaning in common popular speech, and they are constantly used in the sense so acquired in the literature of the country, as well as in common parlance.

To bolster the rationale for its denial of citizenship, the court reasoned:

In speaking of the various classifications of races, Webster in his dictionary says, “The common classification is that of Blumenbach, who makes five [races]. 1. The Caucasian, or white race …; 2. The Mongolian, or yellow race …; 3. The Ethiopian or Negro (black) race …; 4. The American, or red race …; 5. The Malay, or Brown race” … This division was adopted from Buffon, with some changes in names, and is founded on the combined characteristics of complexion, hair, and skull … no one includes the white, or Caucasian, with the Mongolian or yellow race.40

Given the common understanding of white skin and the scientific classification of people from China as Mongolian, not Caucasian, the federal court concluded that Ah Yup was not White. As a result, he did not meet the “white person” requirement for citizenship that existed at the time.

In the second case, Ozawa v. United States, Takao Ozawa challenged the denial of his application for citizenship. Ozawa had graduated from high school in Berkeley, California, studied at the University of California, attended an “American church,” and was raising his children speaking English in their home. Having been born in Japan and with both of his parents being Japanese, Ozawa was denied citizenship because he did not meet the “white person” requirement. Ozawa appealed the lower court’s decision, and his case was eventually heard by the U.S. Supreme Court in 1922.
Justice Sutherland penned the court’s opinion: The language of the Naturalization Laws from 1790 to 1870 had been uniformly such as to deny the privilege of naturalization to an alien unless he came within the description “free white person” … the Federal and state courts, in an almost unbroken line, have held that the words “white person” were meant to indicate a person of what is popularly known as the Caucasian race … The appellant in the case now under consideration, however, is clearly of a race which is not Caucasian, and therefore belongs entirely outside the zone on the negative side … These decisions are sustained by numerous scientific authorities … We think these decisions are right, and so hold. The opinion makes specific reference to “scientific authorities” who defined the races and it places specific emphasis on “the Caucasian race.” There are two points of interest in these emphases. First, some of the “scientific authorities” referred to by Justice Sutherland struggled to locate the Japanese within any of the major racialized categories and noted the light—nearly white rather than yellow—skin of people of Japanese descent. The court, however, avoids this complication by focusing on “the Caucasian race,”

which nearly all “scientists” note as being of white skin. Second is the court’s focus on the scientific authorities’ definition of Caucasian, a definition that incorporated people beyond Europe into the White race. Bernier’s separation of the earth into four racialized regions grouped the Middle East, parts of North Africa, and South Asia with Europe, categorizing them all within what he termed the first type of race. Blumenbach’s conception of Caucasian similarly extended to regions outside of Europe. In fact, he introduced the term “Caucasian” as an outgrowth of his admiration for the beauty of people residing in the region of the Caucasus Mountains, a chain that divided the European and Asian continents. This focus on the scientific definition of Caucasian expanded the boundaries of who qualified as White beyond the borders of Europe to include people of Aryan descent (i.e., a group of people descended from what is now Iran and the northern Indian subcontinent).

Capitalizing on a conception of White Caucasian that extends beyond Europe, Bhagat Singh Thind challenged the denial of his status as a U.S. citizen. Thind was born in India and was a practicing Sikh. As such, he identified himself as an Aryan. Like Ozawa, Thind had resided in the United States for several years, during which he worked on his graduate studies before enlisting in the U.S. military. Following his return from battle, he was granted citizenship in the state of Washington. Shortly thereafter, however, his citizenship was rescinded for failing to meet the requirement of being a “white person.” He then went to Oregon, where he was again granted citizenship. This decision was again challenged by the federal Bureau of Naturalization. Reaching the U.S.
Supreme Court just one year after Ozawa’s case was heard, Thind’s legal team drew on the logic applied in Ozawa’s case to argue that, according to scientific definitions, being from South Asia and of Aryan descent, Thind was a Caucasian. Further, as a Caucasian, he was White and therefore met the qualification of being a “white person” required for citizenship. The Supreme Court, however, disagreed. Again writing the court’s opinion, Justice Sutherland rejected the scientific definition of racialized categories and held firmly to common knowledge as justification for denying Thind whiteness:

If the applicant is a white person, within the meaning of this section, he is entitled to naturalization; otherwise not … The intention was to confer the privilege of citizenship upon that class of persons whom the fathers knew as white, and to deny it to all who could not be so classified … the conclusion that the phrase ‘white persons’ and the word ‘Caucasian’ are synonymous does not end the matter. … ‘Caucasian’ is a conventional word of much flexibility, as a study of the literature dealing with racial questions will disclose, and while it and the words ‘white persons’ are treated as synonymous for the purposes of that case, they are not of

identical meaning … In the endeavor to ascertain the meaning of the statute we must not fail to keep in mind that it does not employ the word ‘Caucasian,’ but the words ‘white persons,’ and these are words of common speech and not of scientific origin … It is in the popular sense of the word, therefore, that we employ it as an aid to the construction of the statute, for it would be obviously illogical to convert words of common speech used in a statute into words of scientific terminology when neither the latter nor the science for whose purposes they were coined was within the contemplation of the framers of the statute or of the people for whom it was framed … The word ‘Caucasian’ is … at best a conventional term, with an altogether fortuitous origin, which under scientific manipulation, has come to include far more than the unscientific mind suspects … The various authorities are in irreconcilable disagreement as to what constitutes a proper racial division. For instance, Blumenbach has 5 races; Keane following Linnaeus, 4; Deniker, 29 … It is a matter of familiar observation and knowledge that the physical group characteristics of the Hindus render them readily distinguishable from the various groups of persons in this country commonly recognized as white. The children of English, French, German, Italian, Scandinavian, and other European parentage, quickly merge into the mass of our population and lose the distinctive hallmarks of their European origin. On the other hand, it cannot be doubted that the children born in this country of Hindu parents would retain indefinitely the clear evidence of their ancestry.

Justice Sutherland’s opinion winds through an argument that negates previous reliance on scientific definitions and instead embraces observation by the common man to limit “whiteness” to those of European descent, whose immigrants can bear children who blend into the White melting pot of the United States.
As Haney López thoroughly documents, the U.S. federal courts and Supreme Court were key players in the molding of race that occurred during the late 19th and early 20th centuries to protect the political and economic interests of people membered White.

The One-Drop Rule

Since the arrival of the first enslaved people of Africa, people engaged in sexual relations across racialized lines. Over two centuries, these interracial sexual relations, whether forced or consensual, produced a substantial number of people of “mixed race.” During the mid-1800s, the population growth of people of mixed race outpaced that of people membered Black.41 Although children born of mixed race were rarely, if ever, membered White, their status as people membered Black was unclear. For nearly a century, state laws and courts attempted to provide clarity.

At one point, Kentucky and Oregon used the one-quarter rule to member a person of mixed race Black. By this definition, a person who had one grandparent membered Black was also considered to be membered Black. North Carolina, Missouri, Nebraska, Florida, Indiana, and North Dakota employed a one-eighth rule, pushing Black heritage back to one’s great-grandparents.42 People meeting this definition were referred to as octoroons, a term that became common following Dion Boucicault’s controversial 1859 play titled The Octoroon.43

The most extreme approach to membering a person Black applied a “one-drop rule.” This logic was foundational for Virginia’s 1924 Racial Integrity Act and the 1930 Code of Virginia. The Racial Integrity Act defined a person as White if they had “no trace whatsoever of any blood other than Caucasian,” while the Code of Virginia stated that “[e]very person in whom there is ascertainable any negro blood shall be deemed and taken to be a colored person.”44 Reflecting on the one-drop rule, Booker T. Washington declared:

It is a fact that, if a person is known to have one percent of African blood in his veins, he ceases to be a white man. The ninety-nine percent of Caucasian blood does not weigh by the side of the one percent of African blood. The white blood counts for nothing. The person is a Negro every time.45

Through these legal definitions of Black and White, race was again remolded to limit access to the privileges and rights granted to people membered White. The desire to protect the purity and power of whiteness through the law was glaringly evident in the initial decision in Virginia’s case against Mildred and Richard Loving. Found guilty of miscegenation under the 1924 Racial Integrity Act, the Lovings were sentenced to a year in prison.
The Lovings appealed their conviction, and in responding to their motion, Judge Leon Bazile declared,

Almighty God created the races white, black, yellow, malay and red, and he placed them on separate continents. And but for the interference with his arrangement there would be no cause for such marriages. The fact that he separated the races shows that he did not intend for the races to mix.46

And so it was that the law reverted to Bernier, Linnaeus, and Blumenbach’s racial science to deny the Lovings’ appeal and, in the process, further evolved the molding of race in the United States.

The U.S. Census

The U.S. census is another legislative tool used to mold race. As specified by the U.S. Constitution, the federal government is required to conduct a

national census every ten years. The purposes of the census are multiple and include determining the number of representatives (and electoral votes) for each state, informing the collection of taxes and allocation of resources, and documenting the composition and distribution of people residing within the United States.47 Since the 1960s civil rights legislation, census results have also been used to monitor inequities in U.S. society.48 Among the information collected by the census is the number of people living within each household along with their age, sex, and racialized identity. It is through the question focused on racialized identity, and for a period of time language spoken and ancestry, that race has been and continues to be molded.49

Until 1970, national census data was collected in person by enumerators. Through questions put to one or more members of a household and through personal observation, the enumerator determined and recorded racialized information about members of each household. As we will see in Chapter 5, reliance on human judgment to categorize people’s racial identity, and other characteristics, was fraught with error and prejudice. Nonetheless, it was the standard practice from 1790 through 1960.

The first national census was conducted in 1790 and focused on two racial groups divided into four categories: free White males, free White females, all other free persons, and slaves. Slaves and “all other free persons” were membered Black, while the two other categories were clearly membered White. These four categories recurred for the next five decades, with a minor modification in language occurring in 1820 when all other free persons was clarified to reference free colored males and females. The 1850 census is the first in which major changes to racialized categories were made.
The first change dropped the distinction between free White males and free White females and instead membered both groups of people White. This single White category persists today. For those membered Black, four new categories were created: Black, mulatto, Black slaves, and mulatto slaves. This modification was driven by rising tensions about the abolition of slavery and by a narrative, manufactured by racial scientists in the southern states and gaining popular acceptance, that the mixing of races resulted in people of inferior biological traits. Specifically, mulattoes were believed to be less healthy, shorter-lived, and less fertile.50 After passage of the 14th Amendment, the distinction between free and enslaved was dropped, but the separation between Black and mulatto persisted. To further support the theory regarding the biological degradation of people of mixed race, in 1890 two additional categories, Quadroon and Octoroon, were added to distinguish among people membered Black. These two new categories lasted for only one census. Nonetheless, they demonstrate the ways in which politics and racial science influenced the racialized categories employed by the census. Although the census itself did not produce these racialized ideas and resulting categories, it “provide[s] the concepts, taxonomy, and substantive

information by which a nation understands its component parts as well as the contours of the whole.”51

For censuses collected between 1790 and 1860, the absence of any racialized people other than White and Black is notable. During the latter half of the 19th century, social tensions in the West and Northeast impacted the racialized categorizations collected through the census. In the West, settlers found themselves in competition with the people indigenous to the land and with immigrants from China. At the same time, rapid growth in immigration from non-Anglo-Saxon regions of Europe produced social and economic tensions in the Northeast. These tensions led the census to abandon its binary racialized categorization. In 1860, Indigenous people were added to the census.52 A decade later, Chinese was added as the first category representing the Asian race.53 In 1890, Japanese was added as a second category of Asian race. Additional categories were added over the next several decades.54 The additional categories shifted over the decades, some being added and then dropped. Most interesting were the categories of “Hindoo [sic]”55 between 1920 and 1940, “Hawaiian” from 1950 to 1990, and “Part Hawaiian” in 1950. “Hawaiian” eventually morphed into a new racial category, “Hawaiian/Pacific Islander,” in 2000.

How to distinguish between native Americans (that is, the English-speaking people of Anglo-Saxon descent) and the increasing numbers of people who had arrived or were descended from southern and eastern Europe presented a challenge for the census. Although the non-Anglo-Saxon people were deemed to be inferior races, they were nonetheless considered White. Differentiating among the racialized categories of White people, however, was an impossible task for enumerators, and a topic of considerable controversy for political leaders. The initial solution, first applied in the 1850 census, was to collect information about one’s place of birth.
Birthplace, then, served as an indicator for membering a person White.56 In 1890, additional questions regarding citizenship and years of residence were added to further corroborate one’s racialized identity. Together, concerns about the inaccuracy of classifications based on this information and a belief that immigration from southern and eastern Europe was degrading the mental and physical qualities of the nation prompted the U.S. Senate to pass an amendment to the 13th Census Act in 1910 that aimed to classify foreign-born residents by their race. At the time, racialized categories expanded greatly to incorporate national and regional distinctions. This expansion is reflected in the 28 racialized categories listed in the Dictionary of Races or Peoples.57 This amendment was eventually dropped from the legislation and replaced by a requirement for the census to collect information about the “mother tongue” for all foreign-born and native-born of foreign parents. Jennifer Leeman, a professor of Spanish whose research focuses on the sociopolitics of language, argues, “The use of mother tongue as a racial indicator was clear … [and] constructed mother tongue as a

hereditary characteristic passed from one generation to the next, regardless of actual language use.”58 Relying on mother tongue as a proxy for racialized categorization of people otherwise deemed White served two purposes. First, it avoided dividing the White census category into 28 or more parts, thus maintaining the appearance of a nation that was overwhelmingly White. Second, birthplace and mother tongue allowed census takers and politicians to document the expanding influx of people deemed “inferior” and, in the 1920s, severely restrict the entry of these “undesirable” races into the United States.

The final racialized group of people that emerges in the census are those of Spanish-speaking descent. The admission of Texas as a state in 1845 and the ensuing Mexican-American War greatly increased the number of people of Mexican descent residing in the United States. Although there were observable differences in the skin tone and facial features of many people of Mexican descent, they were typically treated as White by the census into the 21st century. Although some Mexican Americans were occasionally marked as mulattoes by census enumerators, a racial category specific to people of Mexican descent did not appear until 1930.59 With increasing numbers of people born in Mexico immigrating to the United States in the late 1920s and early 1930s, leaders and the populace membered White in states bordering Mexico became antagonistic toward these newly arrived immigrants. In 1930, for the first time, the U.S. Census Bureau added “Mexican” as a racial category, justifying this addition by claiming,

practically all Mexican laborers are of a racial mixture difficult to classify, though usually well recognized in the localities where they are found.
In order to obtain separate figures for this racial group, it has been decided that all persons born in Mexico, or having parents born in Mexico, who are definitely not white, Negro, Indian, Chinese, or Japanese, should be returned [i.e., counted] as Mexican.60

Of note in these instructions is the focus on laborers. As with all groups of immigrants from a given region, many became laborers once they entered the United States. However, some came with or quickly accumulated wealth. For wealthy people of Mexican descent, their racial classification was generally White, and the Census Bureau’s instructions did not challenge this assertion.

Inclusion of Mexican as a racialized census category was short-lived. Lawyers for the League of United Latin American Citizens challenged the classification. Similar protests were made by the Mexican consul general and the Mexican ambassador. Among the arguments presented was concern that the new racial classification was designed to “discriminate between the Mexicans themselves and other members of the white race, when in truth and

fact we are not only part and parcel but as well the sum and substance of the white race.” The argument also noted that there was no need to separate Mexicans from the White race since “Jim Crow did not apply to us.”61 Nonetheless, concern over immigration from Mexico coupled with the 1924 immigration law provided motivation to designate Mexican as something other than White, particularly laborers who competed for jobs with U.S. citizens membered White. Eventually, political pressure combined with concerns about the inaccurate classifications of some people of Mexican descent led the Census Bureau to remove Mexican as a racialized category.62 It was not until 1970 that categories for Spanish-speaking people reappeared on the U.S. census, but this time classifications focused on region of origin.63

To be clear, the census did not produce race. Rather, “the inclusion and naming of categories … requires an a priori determination of the categories’ importance while simultaneously reinforcing that importance.”64 We see in the next chapter how these categories have important consequences for the lived experiences of people membered into a given race. As the political scientists Jennifer Hochschild and Brenna Marea Powell observe,

Together, beliefs and practices determine what the meaningful group categories are, how they are bounded, who belongs in each, and where each group’s status is situated in relation to the others. The racial order helps to guide the polity’s and individuals’ choices about the distribution of goods and resources, and does a great deal to shape each person’s life chances.65

In this way, the census served to support the molding of race by communicating the taxonomy of racialized categories of interest during specific periods of time, an interest that aimed to protect the advantage for citizens of White Anglo-Saxon descent, and gradually of citizens membered White more broadly.
Molding the Social Construction of Race

Shortly after the first people taken from Africa were delivered to Virginia, colonial leaders began developing a legal structure for racialized categorization. This initial molding of racialized categories focused largely on distinguishing free White people from enslaved people of African descent. Laws and court rulings always favored interpretations that either maintained or guaranteed the enslaved status of people membered Black. This molding of enslaved status brought benefit to property owners membered White by both protecting their investment in enslaved people and providing increased enslaved labor through the birth of enslaved children, at times including their own offspring. The molding of racialized categorizations during the

17th century occurred prior to the introduction of the biological conception of race and conditioned colonists, and eventually the U.S. populace, to welcome these more formal scientific racialized taxonomies, which placed people membered White above all others.

Once it became an independent nation, the United States continued to mold race in efforts to preserve or further produce advantage for people membered White. In some cases, laws and court rulings focused on miscegenation were passed to protect the purity of whiteness. At other times, the purity of whiteness was protected by distinguishing the Anglo-Saxon race from people from southern and eastern Europe. Similarly, court rulings twisted logic to exclude fair-skinned Japanese and scientifically classified Caucasians from the White race. When immigration from “non-White” regions of the world increased, and in turn increased competition for employment, laws and policies were enacted to stem the flow of people from these regions. To assist in monitoring this flow, new racialized categories were added to the census. Through these means, racialized categories expanded from the binary White and Black divide (with the people Indigenous to the land largely ignored in early classification schemes) to the nearly 40 racialized categorizations of the early 20th century, a set of categories that mixed “scientific” notions with nationality, language, and religion.

This long history of molding race is what makes race a social construction that serves economic and political purposes intended to protect advantage for people membered White. As I explore in the next chapter, it is this (re)molding to protect advantage that makes race a product of, rather than the foundation for, systemic racism.
And it is the social construction of race that creates challenges for those in the field of educational measurement who employ race as if it is a characteristic of each person rather than an oppressive construction of society.

Notes

1 Haney López (2006), p. xvi.
2 Omi and Winant (2015), p. 75.
3 Referencing Finley (1969) and Davis (1969), Hall (1980, p. 337) observes that “though slavery in the Ancient World was articulated through derogatory classifications which distinguished between the enslaved and enslaving peoples, it did not necessarily entail the use of specifically racial categories, whilst plantation slavery almost everywhere did.” Hall argues that the time in history and physical location in which people of Africa were captured, enslaved, and then forced to perform labor for plantations established on taken land produced a unique set of conditions that led this form of slavery to become the foundation for anti-Black racism.
4 Zurara quoted in Wolf (1994), p. 465. Note the term “race” is used in translation to refer to a group of people with a common genealogy rather than as the biological meaning that arose in the 18th century.

5 The first people residing in Africa enslaved by the Portuguese were captured by Antão Goncalves in 1441 while he was hunting for monk seals in the mouth of what is now known as the Niger River. Goncalves subsequently captured an additional ten people in search of someone with whom their Arabic-speaking interpreter could communicate in hopes of learning about a passage to India. One of the ten spoke Arabic. At about the same time, the Portuguese learned about the Arabian slave-trading business and the Arabic-speaking slave negotiated his freedom, promising to bring the Portuguese additional enslaved people in exchange for his release. This exchange, which occurred in 1444, is believed to be the beginning of the African slave trade and led to Zurara’s account of 235 enslaved people from Africa entering the Portuguese port of Lagos. Wolf (1994); Newitt (2010).
6 Zurara quoted in Wolf (1994), p. 460.
7 Wolf (1994) notes that the Black Plague, which was nearing its end, had created a considerable labor shortage across Europe and implies that the enslavement of humans from Africa was motivated, in part, by a desire to fill this shortage.
8 Henry Louis Gates Jr. (2012) notes that the first person from Africa came to what is now the state of Florida in 1513, not as an enslaved person but as a conquistador operating as a fully free person. The first enslaved person from Africa is thought to have been brought to what is now Florida by the Spanish in 1528 (Parish, 1974).
9 For an informative description of the legal structure undergirding the privateering (pirating) and the complex relationship among financiers and European ports supporting the privateering system of the 16th century, and how these structures came into play with the exchange of enslaved people from Africa for supplies in Port Comfort, see Austin (2019), pp. 10–12.
10 There is disagreement over whether the first people from Africa brought to Virginia in 1619 and the decade immediately following were considered by the British colonists as enslaved or indentured servants. Austin (2019) cites a compelling body of evidence that suggests the people brought from Africa were enslaved. During this early period, some people of African descent who had spent time in England also arrived in the colony of Virginia, and evidence suggests their knowledge of the English language and laws may have allowed them to negotiate indentured status that eventually allowed them to live in a state of freedom and have property rights.
11 There are three records of an Indigenous woman marrying a male colonist in Virginia during the 17th century, and these women were considered members of the colonies. Rountree (2020).
12 Feagin (2006), p. 15, summarizing Jordan (1968).
13 Higginbotham (1978), p. 20.
14 Austin (2019).
15 Quoted in Higginbotham (1978), p. 28.
16 Higginbotham (1978), pp. 34–36.
17 Higginbotham (1978), see p. 37.
18 The first of these statutes, established in 1646, required the “owner” to pay taxes for enslaved people taken from Africa. A series of acts passed in the 1690s specified similar taxes on enslaved people of Africa. And in 1705, an import tax was established for ship captains who brought enslaved people to the colony.
19 The Virginia Assembly, September 17, 1630, quoted in Mumford (1999), p. 280. Higginbotham (1978) notes that this ruling does not reference Davis’s legal status or racialized identity, nor does it reference the gender of the person with whom he slept, but the reference to Christianity suggests Davis was a White Englishman.

20 The General Court of Colonial Virginia, October 17, 1640, quoted by Higginbotham (1978), p. 23. Mumford (1999) notes that Sweat’s race is not identified but the fact that his penance was to be served at the James City Church, the members of which were of English descent, suggests he was of White English descent.
21 Elizabeth Key was eventually granted freedom, not by the courts, but by those who oversaw her ownership while the case was being contested (Austin, 2019).
22 Quoted in Higginbotham (1978), p. 43.
23 The General Assembly as quoted by Austin (2019), p. 19.
24 As quoted by the Maryland State Archives (2000).
25 Higginbotham (1978), p. 44.
26 Gibson and Jung (2002); McDonald and McDonald (1980).
27 Ting (1994).
28 As an example of the importance of sociohistorical factors on the construction of racialized categories, Zuberi and Bonilla-Silva (2008, p. 58) reference Linda Gordon’s The Great Arizona Orphan Abduction, in which orphaned children of Irish descent are relocated from the Northeast to Arizona. They quote Gordon, They [the orphaned Irish children] did not grasp that this trip was to offer them not only parents but upward mobility—even less did they know that mobility took the form of a racial transformation unique to the American Southwest, that the same train ride had transformed them from Irish to White.
29 Ngai (1999), p. 69.
See also Leeman (2004), who writes, The growing preoccupation with race evident in discourses of American expansion and progress, and the ideological coupling of national identity and Anglo-Saxon racial identity, can also be seen in the changing construction of immigrant difference, and in the use of race-based arguments in calls for immigration restriction … The mid 19th century marks a shift in which differences among European groups began to be portrayed and perceived as based on race, rather than ‘simply’ nationality and cultural or political tradition, with American values and identities portrayed as rooted in the Anglo-Saxon racial characteristics of the early settlers. (pp. 514–515)
30 Emerson (1856), first two phrases, page xviii; third and fourth phrase, page xxiii; and fifth phrase, page xxvi.
31 Emerson (1856), p. xviii.
32 Painter (2010). See pp. 31, 66, 72, 273 in Lodge (1881).
33 Ripley (1899), p. 121.
34 Quoted in Painter (2010), p. 250.
35 United States Immigration Commission, 61st Congress (1911), see p. 7.
36 Painter (2010), p. 250.
37 In Descent of Man, Darwin also extends the racialization of people to social-economic class. As Claeys (2000, p. 237) describes, Darwin warns of the ‘degeneration of a domestic race,’ because the human species allowed its worst members, ‘the very poor and reckless,’ to breed so wantonly and injuriously, ‘whilst the careful and frugal, who are generally otherwise virtuous, marry later in life,’ with a consequent ‘retrograde’ effect on human progress. This sense of the poor as a ‘race,’ genus, type, or species apart would continue in much of the discourse on poverty of the 1880s.

The extension of racialization to class was more pervasive in England than in the United States and, as I explore in greater detail in Chapter 6, was a main concern of Francis Galton and the eugenics movement he launched.
38 Women are not mentioned, presumably because the architects of the act assumed coverture, a legal status in which the obligations and loyalty of a woman were to their spouse rather than their nation of origin.
39 Mills (1997) performs a play on words with the title of Haney López’s book, arguing that it was the invention of “Black” that created “White,” writing: ‘White’ people do not preexist but are brought into existence as ‘whites’ by the Racial Contract [a part of which defined Blackness]—hence the peculiar transformation of the human population that accompanies this contract. The white race is invented, and one becomes ‘white by law’. (p. 63)
40 In re Ah Yup, as quoted in Haney López (2006), p. 4.
41 Cruz and Berson (2001) document that between 1760 and 1860, “the mulatto slave population increased by 67 percent; in contrast the black slave population increased by only 20 percent” (p. 81).
42 Browning (1951).
43 The play stirred controversy for several reasons. First, it debuted just three days after the hanging of John Brown, who led the raid on Harper’s Ferry intended to spark a slave revolt in the southern states and starred a female of Black descent. Second, although Boucicault tried to remain noncommittal about his stance on abolition in order to attract audiences from both sides of the issue, many interpreted his message as siding with abolition. See Richardson (1982) and Kaplan (1951).
44 Virginia Racial Integrity Act, 1924 and Code of Virginia, 1930, Section 67 Colored persons and Indians defined.
45 Quoted in Mencke (1976), p. 37.
46 Quoted in Cruz and Berson (2001), p. 81.
47 Hochschild and Powell (2008); Leeman (2004).
48 Strmic-Pawl et al. (2018).
49 Leeman (2004, p.
517) notes that [because] language had been ideologically linked to national identities at least since the Romantic period (Horsman, 1981), the construction of national identities as racial, as well as political and cultural, allowed language to take on a role as an index of race. 50 Hochschild and Powell (2008); Humes and Hogan (2009); Strmc-Pawl et al. (2018). 51 Hochschild and Powell, (2008), p. 60. 52 The census used the term “Indian” until 1950 when it specified “American Indian.” In 1960 it added “Aleut” and “Eskimo,” dropped them in 1970, then readded them in 1980. In 2000 and 2010, a single category labeled “American Indian or Alaska Native” was employed. Pew Research Center (2020). 53 Hochschild and Powell (2008) note that reference to people of Chinese descent first appeared in California’s extra state census in 1852 when a footnote identified a portion of the White population as actually being of Chinese descent. 54 Among these categories were Filipino, Korean, Hindoo, Hawaiian, Part Hawaiian, Asian Indian, Vietnamese, Samoan, Guamanian, Other Asian or Pacific Islander, and simply Other Asian. Pew Research Center (2020).

62 Race, Racism, and the White Racial Frame 55 Hochschild and Powell (2008) notes that the “Hindoo” race stood out because it referenced a religious affiliation rather than national or regional. The authors also note that most people of South Asian/Indian descent in the United States at the time practiced Sikhism—a religion distinct from Hinduism. This difference, however, was overlooked by the Census Bureau and the general public. They also note that the Census Bureau provided clarification on why people who practice Hinduism were enumerated as a type of Asian: Pure-blood Hindus belong ethnically to the Caucasian or white race and in several instances have been officially declared to be white by the United States courts in naturalization proceedings. In the United States, however, the popular conception of the term ‘white’ is doubtless largely determined by the fact that the whites in this country are almost exclusively Caucasians of European origin and in view of that fact that the Hindus, whether pure-blood or not, represent a civilization distinctly different from that of Europe, it was thought proper to classify them with non-white Asians. (U.S. Bureau of the Census, Population 1910, p. 126) 56 Leeman (2004). 57 Leeman (2004), who references the 1910 U.S. Senate Immigration Commission: pp. 18–19. See also United States Immigration Commission, 61st Congress (1911). Dictionary of Races of People. 58 Leeman (2004), pp. 518–519, italics in original. 59 Hochschild and Powell (2008) note that western states, particularly Colorado, reported unusually high numbers of mulattoes in the late-19th-century censuses and speculate this was due to classifying people of Mexican descent as mulattoes. 60 United States Immigration Commission, 61st Congress (1911). Dictionary of Races of People. Government Printing Office. https://archive.org/details/dictionaryof race00unitrich/page/6/mode/2up?ref=ol&view=theater. (1930), quoted in Hochschild and Powell (2008), p. 80. 
61 Quoted in Hochschild and Powell (2008), p. 81, with attribution to Benjamin Màrquez (1993), pp. 32–33. 62 Hochschild and Powell (2008). 63 Omi and Winant (2015) contend that by the 1960s, the idea of race being a biological construct was widely discredited in the academic and scientific communities and would have been eliminated from the 1970 census had it not been for civil rights legislation and the subsequent need to monitor patterns of discrimination and disparate impacts. 64 Leeman (2004), p. 509. 65 Hochschild and Powell (2008), p. 61.

References

Austin, B. (2019). 1619: Virginia’s First Africans. Hampton History Museum. https://hampton.gov/DocumentCenter/View/24075/1619-Virginias-First-Africans?bidId=
Browning, J.R. (1951). Anti-miscegenation laws in the United States. Duke Bar Journal, 1(1), 26–41.
Claeys, G. (2000). The “survival of the fittest” and the origins of social Darwinism. Journal of the History of Ideas, 61(2), 223–240.
Cruz, B.C. & Berson, M.J. (2001). The American melting pot? Miscegenation laws in the United States. Organization of American Historians Magazine of History, 15(4), 80–84.
Davis, D.B. (1969). The comparative approach to American history: Slavery. In Slavery in the New World. Prentice Hall.
Emerson, R.W. (1856). English Traits. James R. Osgood and Company.
Feagin, J. (2006). Systemic Racism: A Theory of Oppression. Routledge.
Finley, M.I. (1969). The idea of slavery. In Slavery in the New World: A Reader in Comparative History. Prentice-Hall.
Gates, H.L., Jr. (2012). Who was the first African American? The Root. Accessed August 20, 2021 at https://www.theroot.com/who-was-the-first-african-american1790893808
Gibson, C. & Jung, K. (2002). Historical Census Statistics on Population Totals by Race, 1790 to 1990, and by Hispanic origin, 1790 to 1990, for the United States, Regions, Divisions, and States. US Census Bureau.
Hall, S. (1980). Race articulation and societies structured in dominance. In Sociological Theories: Race and Colonialism. UNESCO.
Haney López, I. (2006). White by Law: The Legal Construction of Race. NYU Press.
Higginbotham, A.L. (1978). In the Matter of Color: Race and the American Legal Process. The Colonial Period (Vol. 608). Oxford University Press.
Hochschild, J.L. & Powell, B.M. (2008). Racial reorganization and the United States Census 1850–1930: Mulattoes, half-breeds, mixed parentage, Hindoos, and the Mexican race. Studies in American Political Development, 22(1), 59–96.
Humes, K. & Hogan, H. (2009). Measurement of race and ethnicity in a changing, multicultural America. Race and Social Problems, 1(3), 111–131.
Kaplan, S. (1951). The octoroon: Early history of the drama of miscegenation. The Journal of Negro Education, 20(4), 547–557.
Leeman, J. (2004). Racializing language: A history of linguistic ideologies in the US Census. Journal of Language and Politics, 3(3), 507–534.
Lodge, H.C. (1881). A Short History of the English Colonies in America. Harper & Brothers.
Màrquez, B. (1993). LULAC: The Evolution of Mexican American Political Organization. University of Texas Press.
Maryland State Archives. (2000). Blacks before the law in colonial Maryland. Freedom or Bondage—The Legislative Record. https://msa.maryland.gov/msa/speccol/sc5300/sc5348/html/chap3.html
McDonald, F. & McDonald, E.S. (1980). The ethnic origins of the American people, 1790. The William and Mary Quarterly: A Magazine of Early American History, 181–199.
Mencke, J.G. (1976). Mulattoes and Race Mixture: American Attitudes and Images, 1865–1918. UMI Research Press.
Mills, C.W. (1997/2014). The Racial Contract. Cornell University Press.
Mumford, K. (1999). After Hugh: Statutory race segregation in Colonial America, 1630–1725. The American Journal of Legal History, 43, 280–305.
Newitt, M. (2010). The Portuguese in West Africa, 1415–1670: A Documentary History. Cambridge University Press.
Ngai, M.M. (1999). The architecture of race in American immigration law: A reexamination of the Immigration Act of 1924. The Journal of American History, 86(1), 67–92.
Omi, M. & Winant, H. (2015). Racial Formation in the United States. Routledge.
Painter, N.I. (2010). The History of White People. WW Norton & Company.
Parish, H.R. (1974). Estebanico. Viking Press.
Pew Research Center. (2020). What Census Calls Us. https://www.pewresearch.org/interactives/what-census-calls-us/
Richardson, G.A. (1982). Boucicault’s “The Octoroon” and American law. Theatre Journal, 34(2), 155–164.
Ripley, W.Z. (1899). The European Races. Appleton & Company.
Rountree, H. (2020). Marriage in early Virginia Indian Society. In Encyclopedia Virginia. https://encyclopediavirginia.org/entries/marriage-in-early-virginia-indian-society
Strmic-Pawl, H.V., Jackson, B.A. & Garner, S. (2018). Race counts: Racial and ethnic data on the US Census and the implications for tracking inequality. Sociology of Race and Ethnicity, 4(1), 1–13.
Ting, J.C. (1994). Other than a chairman: How US immigration law resulted from and still reflects a policy of excluding and restricting Asian immigration. Temple Political & Civil Rights Law Review, 4, 301.
U.S. Bureau of the Census, Population 1910: Volume 1, General Report and Analysis. Government Printing Office.
United States Immigration Commission, 61st Congress. (1911). Dictionary of Races of People. Government Printing Office. https://archive.org/details/dictionaryofrace00unitrich/page/6/mode/2up?ref=ol&view=theater
Wolf, K.B. (1994). The “Moors” of West Africa and the beginnings of the Portuguese slave trade. Journal of Medieval & Renaissance Studies, 24(3), 449–469.
Zuberi, T. & Bonilla-Silva, E. (2008). White Logic, White Methods: Racism and Methodology. Rowman & Littlefield Publishers.

3

The Systemic Structure of Racism

Racism is productive, not in the sense of being good, but in the literal capacity of racism to produce things of value to some, even as it wreaks havoc on others.1

DOI: 10.4324/9781003228141-5

The scientific process has a successful track record of correcting misconceptions introduced by earlier scientists. Greek astronomers established the earth as the center of the universe. Astronomical observations collected several centuries later led Copernicus to postulate that the earth and the other planets in our solar system revolved around the sun. While Copernicus’s model was initially rejected, over time, scientific evidence led astronomers to abandon the geocentric model in favor of the Copernican model. Until the 1800s, the medical field believed the human body contained four types of bodily fluids, termed humors. The quality of a person’s health was influenced by the balance of humors. When a person fell seriously ill, bloodletting was used to rebalance the humors. As understanding of the human body evolved in the 19th century, the ideas of humors and bloodletting were replaced by treatments designed for specific ailments—aspirin to reduce fevers, antivenoms to treat poisonous bites, and antimalarial pharmaceuticals to treat malaria. In the late 17th century, alchemists observed that objects lost weight when burned. To explain this weight loss, they theorized that a substance, termed phlogiston, was released from an object as it burned. For example, they theorized that wood was composed of two substances, ash and phlogiston. When burned, phlogiston was released and only the ash remained. Over time, a subset of experiments revealed that some metals did not lose, but rather gained, weight when burned. This finding directly challenged the theory undergirding phlogiston and eventually led to the discovery of oxygen and the process of oxidization.2 Phlogiston, bloodletting, and the geocentric model are all examples of specious scientific concepts that were eventually abandoned when the fallacies upon which they were based were unveiled.

Like phlogiston and the geocentric model, race as a biological concept seems sensical based on observation. Human beings do seem to have some physical traits that differ in appearance, and at the time the biological conception of race was formed, these differences were strongly associated with geographic locale. Yet, since this biological conception of race was formed, observations, evolutionary theory, and detailed analyses of the human genome combined to provide overwhelming scientific evidence that a biological conception of race is specious.3 Like phlogiston, it is simply incorrect. Yet, unlike errant theories that were abandoned, the concept of race persists. Why is this? The stickiness of race as a concept to which our society clings has nothing to do with its value to science. Rather, the failure of societies—the United States specifically—to abandon the concept of race is due to the social, political, and economic value race provides to those in power.4 Race capitalizes on corporeal characteristics to mark people oppressed in a social, economic, and political system. As the quote opening this chapter states, race is then used as a tool to control and exploit those people marked by race. This control and exploitation functions to produce, maintain, and further advantage those in power.5 It is the relationship among power, oppression, advantage, and the racialized membering of people within a society that is the essence of racism. Despite its specious scientific backing, the concept of race is retained for the simple reason that it enables the production of advantage through oppression by making ocular—clearly visible to the naked eye—those who are advantaged and those whom the advantaged oppress.6

Like race, how racism is theorized has experienced considerable development over the past 150 years. Much of this development is the work of sociologists, with W.E.B. Du Bois making the first contributions in the late 19th and early 20th centuries. Although issues of racism are an important focus of educational research and, to a more limited extent, educational measurement, few contributions to the theory of racism have derived from the fields of educational research or educational measurement. In understanding how racism has operated and continues to operate to produce material advantage for people membered White at the expense of people membered Black, Brown, Asian, Latine, and/or Indigenous, I draw heavily on the work of several sociologists who have advanced theories of racism.7 Through their advances, a variety of terms are introduced that refine specific ideas about race and racism. Among these terms are racist, anti-racist/racism, racialization, racialized social system, overt racism, covert racism, aversive racism, racialism, institutional racism, structural racism, and systemic racism. This chapter begins by examining many of these terms in order to provide a foundation for exploring several theories of racism that have taken hold at different points in time. The chapter ends by integrating elements of a subset of these theories to present a theory of systemic racism. It is the ideology—the worldview that undergirds and justifies the disparate outcomes produced by this integrated theory of systemic racism—that provides the foundation for racism and forms the White Racial Frame that is the focus of the remainder of this book.

Building a Foundation for Understanding Racism as Systemic

Like all fields of study, theorizing about racism has developed a specialized body of language. This language is intended to refine, extend, and introduce new concepts that deepen the understanding of racism and how it operates within societies. Although not comprehensive, this section introduces and aims to develop a common understanding of several terms that are employed to explicate specific theories of racism. Although the previous two chapters traced the origins and subsequent molding of the term race, given its centrality in theories of racism, I begin with the term race.

Race

In analyses of racism, social scientists offer numerous and varied perspectives on the concept of race. Common across conceptions is reliance on phenotypical characteristics to categorize people. However, modern discussions of race recognize that racial categorizations are social constructions. As an example, sociologists Michael Omi and Howard Winant observe that “[a]lthough the concept of race invokes seemingly biologically based human characteristics (so-called phenotypes), selection of these particular human features for purposes of racial signification is always and necessarily a social and historical process.”8 Most discussions also acknowledge that the construction of race is a product of, rather than a precursor to, racism. As an example, sociologist Amanda Lewis links the existence of race to racism, arguing that “race as a set of identities, discursive practices, cultural forms, and ideological manifestations would not exist without racism.”9

Although racial categorizations rely on ocular phenotypical characteristics, differences among racialized groups are commonly understood among the populace as signifying differences in psychological, cognitive, and behavioral characteristics of people membered into racialized groups.10 “Perceived differences in skin color … are understood as the manifestations of more profound differences that are situated within racially identified persons: difference in such qualities as intelligence, athletic ability, temperament, and sexuality, among other traits.”11

Most discussions also recognize the use of race to differentiate those who play a dominant role in society from those who are oppressed or otherwise “othered” through racialized categorization. Bonilla-Silva posits that “races are the effect of racial practices of opposition (“we” versus “them”) at the economic, political, social, and ideological levels.”12 Omi and Winant contend further that

[b]ecause race is located on the body, it has proved a convenient means of rule, a political technology through which power can be both exercised and naturalized … The attachment of this process of ‘othering’ to immediately visible corporeal characteristics facilitated the recognition, surveillance, and coercion of these people, these ‘others.’ This phenotypical differentiation helped render certain human bodies exploitable and submissible.13

Although race itself is a specious biological concept, some scholars recognize the rational and strategic purpose to which race is applied to perform the ideological and political work necessary to justify the exploitation and oppression of people “othered” through their racialized categorization or advantaged by being membered White.14 Because of its role in operationalizing racism, some scholars also perceive race as “the most powerful and persistent group boundary in American history.”15

Given these and many more observations about race, I suggest that race be considered a social construct that functions as a master scheme for categorizing people16 based on ocular corporeal characteristics,17 that fundamentally organizes society in a stratified, hierarchical manner18 with implied specious meanings of superiority and inferiority,19 and that has both material and social consequences20 manufactured by the distribution of power and resources.21

Racialization, Racial Formation, and Racial Projects

As seen in Chapters 1 and 2, the number and definition of racialized categories vary over time. Bernier first introduced four categories. Linnaeus fluctuated between four and six. During the early 20th century, demographers in the United States defined more than 30 racialized categories. This process of creating racialized categories is termed racialization. Put simply, racialization is the production of racial categories through the assignment of “meaning to a previously racially unclassified relationship, social practice or group.”22 As Bonilla-Silva observes, racialized categorization of people has been and continues to be a strongly political act. Historically, the act of racialization followed the conquest, colonization, and enslavement of people. Today, racialization is associated with immigration policies and what is termed by some politicians as terrorism.

Categories such as “Indians” and “Negroes” were invented (Allen 1994; Berkhoffer 1978; Jordan 1968) in the sixteenth and seventeenth centuries to justify the conquest and exploitation of various peoples. The invention of such categories entails a dialectical process of construction; that is, the creation of a category of “other” involves the creation of a category of “same.” If “Indians” are depicted as “savages,” Europeans are characterized as “civilized”; if “Blacks” are defined as natural candidates for slavery, “Whites” are defined as free subjects (Gossett 1963; Roediger 1991, 1994; Todorov 1984) … although the racialization of peoples was socially invented … [it] generated new forms of human association with definite status differences. After the process of attaching meaning to a “people” is instituted, race becomes a real category of group association and identity.23

Omi and Winant term the process of racialization—the extension of racial meaning to a previously racially unclassified relationship, social practice, or group—racial formation.24 They also posit that racial formation occurs through racial projects. In its most basic form, a racial project is any effort that shapes “the ways in which human identities and social structures are racially signified, and the reciprocal ways that racial meaning becomes embedded in social structures.”25 As Omi and Winant describe, a racial project can be classified as racist or anti-racist. A racist project is one that develops a new racialized category, maintains a racialized structure within society, or establishes a new racial structure. An anti-racist project challenges, resists, or otherwise works to undo structures that produce advantage through oppression based on racially stratified categorization. As explored in Chapters 1 and 2, racialized categories and structures are constructed and maintained to produce advantage for a dominant racialized group through the oppression of nondominant racialized groups. Since the explicit aim of an anti-racist project is to undo oppression produced through racialized categorizations and structures, a racist project cannot simultaneously function as an anti-racist project, and vice versa.

Racist projects serve as the building blocks for racial formation. Applying the terminology of Michel Foucault, racist projects also function as apparatus that contribute to the manufactured distribution of power and resources.26 Mortgage regulations that manufactured segregated urban communities (a.k.a. redlining) are one example of a racist project functioning as apparatus that structured residential patterns along racialized lines.27 Additional examples of racist projects include policing practices that concentrate attention on communities in which large numbers of people membered Black or Latine reside to structure patterns of arrest, prosecution, and incarceration along racialized lines. Similarly, the use of test scores for college admission decisions operates as apparatus that structures patterns of college attendance along racialized lines.28 These and many more practices function as racist projects that structure segments of our society along racialized lines.

As Omi and Winant write,

these projects are taking place all the time, whenever race is being invoked or signified, wherever social structures are being organized along racial lines. Racial formation is thus a vast summation of signifying actions and social structures, past and present, that have combined and clashed in the creation of the enormous complex of relationships and identities that is labeled race.29

Further, Omi and Winant explain that

[r]ather than envisioning a single, monolithic, and dominant racist project … [racial] projects exist in a dense matrix, operating at varying scales, networked with each other in formally and informally organized ways, enveloping and penetrating contemporary social relations, institutions, identities, and experiences.30

It is through this complex system of racial projects that racial formation both produces racial identities and creates racialized structures that give meaning to one’s racialized categorization—meaning that impacts people in material, psychological, and corporeal manners.

Racialized Social System

Bonilla-Silva introduced the idea of racialized social systems in response to conceptions of racism that focused on individual ideas and actions. Understanding racism as the ideas and actions of individuals—“bad characters”—implied that racism could be addressed by changing individuals. Framing racism as a product of individual ideas and actions postulated that ridding people of racist ideas would end racist actions, which in turn would eliminate racism. Bonilla-Silva argues that this focus on the individual ignores the ingrained racialized structures that exist in a given society. He views society as a set of social systems. In the United States, in particular, these social systems are structured by race—that is, they are racialized. Whether one focuses on residential patterns, schools, employment patterns, wealth, health, political districting, and so on, a racialized structure exists.31 As Bonilla-Silva describes, the first step in producing a racialized social system is racialization—the placement of people into racialized categories. Once people are membered into a racialized category, sectors of a society are structured such that access to and participation in specific sectors of society differ based on racialized membering.

In all racialized social systems the placement of people in racial categories involves some form of hierarchy that produces definite social relations between the races. The race placed in the superior position tends to receive greater economic remuneration and access to better occupations and/or prospects in the labor market, occupies a primary position in the political system, is granted higher social estimation (e.g., is viewed as “smarter” or “better looking”), often has the license to draw physical (segregation) as well as social (racial etiquette) boundaries between itself and other races, and receives what DuBois (1939) calls a “psychological wage” (Marable 1983; Roediger 1991). The totality of these racialized social relations and practices constitutes the racial structure of a society.32

Although racialization is an essential process in the formation of a racialized social system, the racialized categories produced for a racialized social system do not necessarily have the same meaning across all racialized social systems.33 As an example, being membered Black in the United States has a different meaning from being membered Black in Haiti, Brazil, or South Africa.34 As witnessed in South Africa since the 1980s, in some sectors, a series of anti-racist projects focused on ending apartheid shifted the location of people membered into racialized categories in the racial hierarchy. For example, the location of people membered Black shifted markedly within South Africa’s political system.

To support a racialized social structure, ideology is developed by those who hold a dominant position within the social structure. Bonilla-Silva maintains that this ideology serves to justify the division of people into different racialized groups. Ideology also justifies the hierarchical racialized structuring of social systems, and the resulting disparate material, corporeal, and psychological impacts that the structure and functioning of the social system have on the lived experiences of people membered into different racialized groups.35 Most importantly, since ideology is developed and promulgated by those who hold a dominant position in the social structure, the ideology serves and protects the interests of that dominant group.36 Just as racialization and the subsequent formation of racialized groups follow racism, the development of ideology follows, and is a product of, the formation of a racialized social system.

Racialism, Racialist, and Racist

The term racist is broadly applied to the ideas, thoughts, and actions of individuals as well as institutional policies and practices, laws and regulations, and society more generally. In effect, anything that functions to produce harm, directly or indirectly, for a person membered into a not-White racialized group is commonly termed racist.

Social scientists whose work focuses on race and racism, however, present a more nuanced perspective on the meaning of the term racist. Social scientists recognize that the narratives—stories that are told within a society (as well as those that are not)—are shaped by the racialized ideology constructed by those who hold a dominant position in society. In turn, these narratives, and the ideology that informs them, shape the everyday thinking and subsequent actions of all members of that society. This shaping of ideas and actions is termed racialism. As John Stanfield explains:

[R]acialism is the routine, the everyday, taken-for-granted ways in which we are taught to use race in making normal and extraordinary decisions such as where to live and where not to live, who to befriend and who to fear … who to trust and who not to trust, who to hire and promote first, who is smart and who is dumb, who would make a good spouse and who would not, who can dance and who cannot, etc., etc. Racialism cognitively triggers mental images of the Racialized Other that are connected to the one-to-one presumptions about phenotype and behavior or social or cultural characteristics.37

The effect of racialism is the often-unconscious production of racist ideas—that is, concepts that position one racialized group as inferior or superior to another racialized group.38 Racialism conditions us to see and think of people membered into different racialized groups in specific ways. Because all members of a society are exposed to the dominant ideology and resulting narratives produced through that ideology, we are all, to some extent, filled with racialized ideas and understandings. This enculturation of racialized ideas means that all members of a racialized society—a racialized social system, to use Bonilla-Silva’s terminology—operate, in varying degrees, in a manner that is racialist. Having been “socialized in a society in which everything from language to whom we befriend and whom we trust, regardless of racialized categorization, is routinized in race mythology,” our thinking has been shaped in ways that make us all hold ideas that are racialist.39 In turn, holding ideas that are racialist results in each of us, to varying degrees, functioning as racialists.

Having been molded to hold racialist ideas, however, does not make one racist. Rather, being a racist involves blatant, purposeful, conscious action aimed at causing harm or advantage for members of a racialized group or which is intended to introduce, extend, or maintain a structure, policy, or practice that differentially impacts members of racialized groups.40 In this way, a racist is understood as one who takes action that intentionally produces or maintains material, corporeal, and/or psychological harm for member(s) of a racialized group or which maintains advantage for the dominant racialized group.

In this definition of racist, it is important to distinguish between the actor who engages in a racist act and the victim. Because this definition of a racist considers purposeful intent, an action performed by a racialist can be experienced as harmful materially, corporeally, or psychologically by others who are recipients of that action. In this way, a racialist may produce a racist impact—that is, one that causes harm to a member of a racialized group, or advantage to a member of the dominant racialized group—without being racist themselves. It is in this way that many racial enactments inflict harm on members of nondominant racialized groups. Moreover, as we explore shortly, the existing structuring of society along racialized lines allows members of society to simply follow current rules, regulations, and practices as well-meaning citizens in ways that (re)produce racist ideas and impacts.

Race versus Ethnicity

Race and ethnicity are commonly conflated by social scientists and the general public. This conflation downplays the significance of racism as the cause of disparate economic, health, educational, political, and other outcomes.41 Those who conflate race and ethnicity point to ethnic groups, such as people of Irish descent or who practice Judaism, who have encountered various forms of discrimination yet successfully integrated into U.S. society and, in turn, gained economic, social, and political benefits similar to those people membered White who once discriminated against them.42 Given the successful integration and subsequent economic and social gains achieved by various ethnic groups, those who conflate ethnicity and race place blame on racialized groups that continue to experience disparate outcomes—placing responsibility for outcome disparities on the racialized group itself rather than on the historic and current policies and practices that negatively impact those groups. Social scientists offer three challenges to the conflation of race and ethnicity. First, the few examples of successful integration in the United States to which advocates point focus on ethnic groups whose phenotypical characteristics are indistinguishable from those already membered White. Although the language and cultural practices of immigrants of various White ethnicities made them stand out during the years immediately following immigration, these differences were only observable through direct interactions. Moreover, once these immigrants and/or their offspring adopted the American English language, the degree to which they stood out was reduced notably. And, as they and their offspring gradually replaced some/many of their cultural practices with those common to the dominant White race in the United States, their ethnic differentiation disappeared.43 Second, conflating race and ethnicity ignores the corporeal ocular markers upon which racialized categorization relies. 
As previously noted, nearly all examples of successful ethnic integration focus on ethnic groups whose members pass as White. In part, this White passing is one reason why efforts in the late 19th and early 20th centuries to racialize southern and eastern Europeans failed: census enumerators could not observe differences between those deemed White and those who were members of a specific southern or eastern European ethnic group. In contrast, people membered into phenotypically based racialized groups cannot modify their corporeal characteristics that member them into a given racialized category. Despite the extent to which they adopt the language and cultural practices of the dominant White society, they continue to be perceived as members of their phenotypically based racialized group.44 As Omi and Winant observe, when race is conflated with ethnicity, “the race-concept is thus reduced to something like a preference, something variable and chosen, in the way one’s religion or language is chosen.”45 In turn, those policies and practices that produce racialized social systems and the resultant disparate impacts continue to impact people membered into not-White racialized groups. Finally, race and racialized oppression rely on relations between people deemed superior and those who are subordinate. Although there are examples in which dominant ethnic groups subordinate other ethnic groups (the Tutsi and Hutu in late 20th-century Rwanda; the Serbs, Croatians, and Bosnians in late 20th-century Yugoslavia), there are many more examples of different ethnic groups that reside in a balanced power relationship. The purpose of racialization, however, is to form groups with different power arrangements and to produce “rules” for membering people into a given racialized group. In this way, ethnicity and race are fundamentally different bases for group association. Ethnicity is based on cultural differences that have evolved over time.
Race is based on differences in phenotypical characteristics that are manufactured to signify superior and inferior cognitive, psychological, and behavioral traits.46 As American historian Herbert Aptheker observes, belief in the superiority of one’s particular culture, or nation or class or sex is not the same as belief in the inherent, immutable, and significant inferiority of an entire physically characterized people, particularly in mental capacity, but also in emotional and ethical features.47 Although oppression based on ethnicity or racialized membering are both problematic, the manner in which race (and racialized categories) is manufactured to member people into hierarchically structured groups for the purpose of providing advantage to the dominant group through oppression of nondominant groups is fundamentally different from ethnicity.

Theories of Racism

The capture and subsequent commercial enslavement of people residing in Africa by the Portuguese in the mid-15th century marks the beginning of anti-Black racism. In what is now the United States, the first instances of racism occurred with the taking of land from the Indigenous people who had cared for that land for centuries, the slaughtering of those people to protect and expand control of taken land, and the development of narratives that justified these actions. The arrival of enslaved people of Africa in 1619 and the rapid expansion of slavery that followed birthed anti-Black racism in what became the United States. Over the next four centuries, the ways in which racism was enacted in the United States evolved from the brutally enforced, subjugated forced labor and denial of basic human rights that was American slavery to the overtly discriminatory laws, practices, and behaviors rampant during the Jim Crow era, and then again to the more subtle “race neutral” practices of today that continue to (re)produce profoundly disparate material, psychological, and corporeal impacts on people membered into different racialized groups. Despite the long evolution of racism in the United States, it was not until 1943 that the term racism was introduced as a formal scholarly concept by Ruth Benedict. An anthropologist at Columbia University and a former student of Franz Boas, Benedict was an outspoken anti-racist who challenged the concept of race and the then-dominant theory of racism. In her book, Race and Racism, Benedict defined racism in this way:

Racism is the dogma that one ethnic group is condemned by nature to congenital inferiority and another group is destined to congenital superiority. It is the dogma that the hope of civilization depends upon eliminating some races and keeping others pure.
It is the dogma that one race has carried progress with it throughout human history and can alone ensure future progress. It is a dogma rampant in the world today and which a few years ago was made into a principal basis of German polity. Racism is not, like race, a subject the content of which can be scientifically investigated. It is, like a religion, a belief which can be studied only historically. Like any belief which goes beyond scientific knowledge, it can be judged only by its fruits and by its votaries and its ulterior purposes. Of course, when it makes use of facts, racist interpretation can be checked against those facts, and the interpretation can be shown to be justified or unjustified on the basis of history and of scientific knowledge … But the literature of racism is extraordinarily inept and contradictory in its use of facts. Any scientist can disprove all its facts and still leave the belief untouched. Racism, therefore, like any dogma that cannot be scientifically demonstrated, must be studied historically. We must investigate the conditions under which it arises and the uses to which it has been put.48

Setting aside the use of the term ethnic, Benedict’s definition characterized racism as an ideology: a set of ideas that maintained the supremacy of one racialized group over others. This race-based/racist ideology was designed to support the production of outcomes (progress) for the dominant racialized group at the expense of other racialized groups. Benedict’s description also recognizes the shaky “facts” regarding race upon which the ideology backing racism rests. Benedict recognizes the need to situate understandings of how racism operates in sociohistorical contexts. At its core, however, Benedict’s conception centers racism as an ideology. Prior to and following Benedict’s introduction of the term racism, several social scientists have presented theories about racism. As Omi and Winant explain, “Theory is driven by demand; by a necessity to explain, account for, and manage (as well as to resist) socio-historical changes.”49 For racism, the various theories developed since the early 19th century have sought to explain the disparities in the material, corporeal, and psychological experiences of people membered into different racialized groups. In this section, I focus on a subset of the theories that either gained favor at one point in time or have influenced current thinking on racism as it operates in the United States. Specifically, I summarize key features of five theories of racism: biological, cultural assimilation, individual, institutional, and structural. I then focus on the theory of systemic racism and present a model that connects elements of individual, institutional, and structural racism to form what I term a comprehensive theory of systemic racism.
It is the ideology (the White Racial Frame) developed to drive this system that is the focus of the next chapter, and which, I argue in Part II, has informed (too) much of the work performed by the educational measurement community.

Biological Racism

When first introduced, race was speciously understood as a biological concept manifested through a specific subset of phenotypical characteristics. The impacts of “one’s race” were commonly understood to extend beyond physical characteristics to affect cognitive and psychological traits of an individual as well as the social functioning of groups of individuals. Perceived differences in physical, cognitive, psychological, and societal characteristics associated with racialized categories of humans provided the foundation for ideology developed by the European and colonial American elites to justify enslavement of people from Africa, forced labor of Indigenous people residing in what is now termed Central and South America, and the slaughter and taking of Indigenous land across what is now the American continents, as well as Australia, New Zealand, and other regions of the world. During this period of colonization, the enslavement and subjugation of people membered Black or Indigenous was rationalized based on an ideology of Eurocentric white supremacy. Biological racism developed a Eurocentric white supremacist ideology that fostered an understanding of race as biological to explain the condition of people membered into not-White racial categories: this ideology posited that their condition was the natural outcome of inferior biological characteristics. There is no way of knowing what proportion of the colonial American and/or the European population embraced the ideology of biological racism during the 17th and 18th centuries. Without question, there were some who did not: Sir Thomas Browne (1646), Richard Baxter (1673), Thomas Tryon (1684), John Hepburn (1713), Ralph Sandiford (1729), John Woolman (1774), James Otis (1764), and Benjamin Rush (1773), to name a few.50 Nonetheless, during this period, a science birthed from this ideology emerged that endeavored to collect evidence that backed the superiority of people membered White and the inferiority of those membered into other racialized categories, particularly those membered Black. This field of study, now termed scientific racism, focused on several alleged differences between racialized groups: brain capacity, pain tolerance, life expectancy, and later, mental “intelligence.” Those who engaged in these “scientific” endeavors did so already knowing the conclusion their work would support: superiority of the White race and inferiority of people membered into other racialized groups. And the impetus for engaging in scientific racism was to provide evidence to support the Eurocentric white supremacist ideology that justified the subjugation of people membered not-White.
As we will see in Chapters 5–8, the core ideas undergirding the theory of biological racism had profound impacts on the research of early pioneers in educational measurement and continue to influence discourse about disparate educational (and other material, corporeal, and psychological) outcomes today. As Benedict notes in her introduction to the concept of racism, these historical and lasting impacts occurred despite the “extraordinarily inept,” “contradictory,” and “disproved” “facts” that undergird the biological theory of racism.51

Cultural Assimilation as Racism

The massive influx of immigrants from southern and eastern Europe and the northern migration of people membered Black during the late 19th and early 20th centuries sparked tensions among the populace residing in urban areas. The social relations produced by this tension captured the attention of Robert Park, a sociologist at the University of Chicago from 1914 to 1933. Of particular interest were differences in the way social dynamics shifted over time for different groups of people who flocked to the cities. To explain differences in the experiences of these urban immigrants, Park and his colleague Ernest Burgess developed a theory that centered on assimilation. During this time period, nativism ran strong in American politics and in the psyches of a great many citizens membered White, due in large part to the rapidly changing demographics of immigrants entering the nation. As part of this nativism, efforts were made to racialize people immigrating to the United States from southern and eastern Europe. These efforts led Park and his colleagues to perceive the rise and, for some groups, gradual decrease in tensions as a “race relations cycle.” This cycle (or, more accurately, progression) involved four stages that moved sequentially and irreversibly from competition through conflict, into accommodation, and ended ultimately with assimilation.52 As Park states:

The race problem has sometimes been described as a problem in assimilation … There is a process that goes on in society by which individuals spontaneously acquire one another’s language, characteristic attitudes, habits, and modes of behavior. There is also a process by which individuals and groups of individuals are taken over and incorporated into larger groups. Both processes have been concerned in the formation of modern nationalities … The growth of modern states exhibits the progressive merging of smaller, mutually exclusive, into larger and more inclusive social groups. This result has been achieved in various ways, but it has usually been followed, or accompanied, by a more or less complete adoption, by the members of the smaller groups, of the language, technique, and mores of the larger and more inclusive ones … In America it has become proverbial that a Pole, Lithuanian, or Norwegian cannot be distinguished, in the second generation, from an American born of native parents.
There is no reason to assume that this assimilation of alien groups to native standards has modified to any great extent fundamental racial characteristics. It has, however, erased the external signs which formerly distinguished the members of one race from those of another.53

It is important to note Park’s use of people from Poland, Lithuania, and Norway as examples of immigrant groups that have “successfully” adopted dominant cultural practices and thus assimilated into U.S. society. Similarly, note Park’s reference to race: the assimilation process central in Park’s race relations cycle does not impact race itself. Instead, adoption of the dominant cultural values and practices changes the behaviors and interactions of the assimilated people such that they blend with the dominant social group (in this case, U.S. citizens membered White). Of course, for those European immigrant groups that Park, and later Park and Burgess, hold central to assimilation theory, this blending is facilitated by the absence of ocular markers of modern racialized groups. Critics of the race relations cycle/assimilation theory note the conflation of ethnicity with race as a clear weakness. As discussed earlier, the process of racialization to produce a racialized group marked by ocular corporeal characteristics is fundamentally different from the gradual production of an ethnic group.54 Whereas a person can modify the language with which they communicate, adopt a different religion, and adapt their cultural values and behaviors, one cannot modify their corporeal features in ways that blend with a dominant racialized group whose corporeal features differ. It is the conflation of ethnicity and race that more modern theorists of racism mark as a fundamental flaw in assimilation theory. A second criticism of assimilation theory focuses on the one-way adaptation of cultural values and practices. Stanfield, for example, argues:

The assimilation ideology holds that Blacks and other racial minorities cannot possibly advance in bureaucratic hierarchies and still identify explicitly with the subordinate group. This ideology, of course, preserves Anglo-Saxon dominance, for it minimizes the risk that diverse values, worldviews, and behavior patterns will compete with those of the elite.55

Central, but unstated, in the assimilation theory is an understanding that the culture to which racialized groups assimilate is that of the dominant White race.
Moreover, the assimilation theory accepts the proposition that people of [Anglo-Saxon] European descent (whites) possess the highest level of society and civilization and non-Europeans (people of color) need to assimilate fully to whites’ norms and social institutions if they are to become ‘civilized’ and avoid even greater oppression from whites than they currently endure … assimilation theory emphasizes one-way assimilation to white norms, ideals, behaviors, standards, social worlds, framing, institutions, and culture.56 This one-way assimilation ignores the many ways in which elements of nondominant culture influence and evolve dominant culture. Together, failure to differentiate the externally imposed racialized ascriptions from the development of ethnic culture and reliance on one-way adaptation of cultural values and practices led to the development of alternate theories of racism that decoupled race from ethnicity.57

Individual Racism

Many social scientists, and certainly the general public, conceive of racism as an ideological phenomenon acted out by individual members of a racialized society.58 To solve problems of racism, then, efforts are needed to change the ideology held by individuals which, in turn, will change the acts they perform such that they are no longer racist. This theory locating racism within individuals was made popular in the mid-1940s by the Swedish economist and sociologist Gunnar Myrdal through his influential book An American Dilemma: The Negro Problem and Modern Democracy.59 Myrdal presented racism “as a set of erratic beliefs that may lead racist actors to develop ‘attitudes’ (prejudice) against the group(s) they conceive as inferior which may ultimately lead them to ‘act’ (discriminate) against the stereotyped group(s).”60 Myrdal perceived racism in the United States as a dilemma that pitted egalitarian principles of equality against discrimination that was pervasive throughout society at the time. As sociologists Eduardo Bonilla-Silva and Gianpaolo Baiocchi observe, because Myrdal located racism within the individual, he viewed the US racial problem as solvable in principle because it was mainly a matter of whites overcoming their failure to fully live up to this American creed. That is, change white prejudices about blacks, and racial relations will significantly improve.61 The focus on racism as individual has led to the parsing of individual racism into at least three broad categories: overt or dominative racism, implicit bias, and covert or aversive racism.
Overt individual racism is commonly understood as the actions performed by what are termed “racists” or “bigots.”62 Use of racial slurs and hate crimes are perhaps the most common examples of overt racism today.63 Some analysts also classify various racial enactments (commonplace insults and racial slights) as forms of overt individual racism, particularly when these insults and slights draw on ideas that are well-understood as prejudiced.64 Implicit bias is the product of racialism: the routine, everyday, taken-for-granted ways in which we are taught to use race in making decisions. Ideas about racialized groups are promulgated throughout society via the media, politicians, and casual conversations and behaviors. From a very young age, members of a racialized society absorb these messages which involuntarily influence cognition. In turn, our responses (thoughts, words, and actions) are triggered as automatic processes. As psychologist Patricia Devine explains,

Automatic processes involve the unintentional or spontaneous activation of some well-learned set of associations or responses that have been developed through repeated activation in memory. They do not require conscious effort and appear to be initiated by the presence of stimulus cues in the environment.65

Triggered automatically and based on subconscious thought, implicit bias impacts decisions in an unrecognized manner. A growing body of research documents the role implicit bias plays in various fields. As an example, research on job hiring practices has found that modifying the name on an application from a “White sounding” name to a “Black sounding” name impacts the chances a person will be offered an interview.66 Other research documents the role implicit bias has on the chances a person receives emergency medical treatment or appropriate pain medication.67 Still other research identifies ways in which implicit bias impacts educational experiences, and in particular behavioral management, of students membered into different racialized groups.68 In all these examples, discriminatory actions occur without intent and are understood as the product of unconsciously triggered ideas and associations engrained by society from a very young age.69 Aversive racism is an extension of implicit bias. Introduced by psychoanalyst Joel Kovel, aversive racism is typically enacted by people who are aware of their prejudices and are concerned about wrongdoing when interacting with people in interracial situations.70 As psychologists Samuel Gaertner and John Dovidio explain, people who exhibit aversive racism “sympathize with victims of past injustice, support the principle of racial equality, and regard themselves as nonprejudiced, but, at the same time, possess negative feelings and beliefs about Blacks, which may be unconscious.”71 Although aversive racists hold egalitarian beliefs and may resist taking action induced by their prejudices, they nonetheless discriminate at times.
As an example, Gaertner and Dovidio conducted an experiment in which participants believed to exhibit aversive racism were exposed to a situation in which they witnessed a staged emergency and their response (whether or not they provided assistance) was observed.72 The experiment manipulated two features of the staged emergency: (a) the racialized identity of the victim (White or Black); and (b) the number of other people witnessing the emergency (no one, or multiple observers). The findings showed that when no one else observed the emergency, the research participants provided assistance 85% of the time, regardless of the racialized identity of the victim. When other observers were present, the participants provided assistance to a White victim 75% of the time, but only 37.5% of the time when the victim was Black. This finding was consistent with similar experiments. Gaertner and Dovidio explain that with aversive racism, discrimination by Whites against Blacks occurs primarily when norms for appropriate behavior are weak or ambiguous … and tend to be more pronounced when the interaction involves potential threats to the traditionally superior status of Whites relative to Blacks.73

As a special case of implicit racism that is enacted by well-meaning, egalitarian people who often characterize themselves as “not racist,” aversive racism can impact job hiring and promotion decisions, college admission decisions, criminal justice sentencing decisions, and so on.74 Whether a product of overt racism, implicit bias, or aversive racism, social scientists acknowledge the material, corporeal, and psychological harm caused by individual racism. Two ideas are core to these three facets of individual racism. First, ideology is believed to precede racism. This a priori ideology is believed to shape understanding of a superior race and inferior races. This understanding of superiority then influences individual thought and creates prejudice against people membered into nondominant (“inferior”) racialized groups. Second, actions performed by individuals that cause the disparate impacts of racism are understood as being induced by the prejudiced thoughts held by individuals.75 Most modern analysts of race and racism acknowledge that European colonial administrators used science to justify both their enslavement of people of Black African descent and their slaughter of Indigenous people whose land they took.76 However, they challenge the belief that an ideology of White European supremacy predates both the enslavement of people from Africa and encounters with Indigenous people. As the late Stuart Hall, a British sociologist of Jamaican descent, argues,

It might be better to start from the opposite end – by seeing how slavery … produced those forms of juridical racism which distinguish the epoch of plantation slavery.
The elaboration of the juridical and property forms of slavery, as a set of enclaves within societies predicated on other legal and property forms, required specific and elaborate ideological work.77

From this perspective, the ideology was developed in response to actions that caused harm to enslaved people of Africa and people Indigenous to taken land. Moreover, the ideology was developed to justify actions that had already occurred and to back laws, regulations, and judicial decisions that codified slavery. Critics holding this view acknowledge that the manufactured ideology influenced future understandings and actions, but they maintain that the ideology followed, rather than preceded, the initial racist actions that required justification. Critics of the individual theory of racism also challenge the belief that the discriminatory actions of individuals (induced by prejudice) are the primary driver of disparate material, corporeal, and psychological outcomes caused by racism. Although these critics recognize that individual actions, whether the product of overt racism, implicit bias, or aversive racism, produce harm, they argue that forces larger than those produced by individuals are primarily responsible for disparate outcomes.78 This view inspired the lead title of Bonilla-Silva’s influential book, Racism without Racists, in which he argues the effects of racism, and racism itself, will persist even if the ideas and actions of individuals are modified such that they are not influenced by racial prejudice and are no longer discriminatory.79 In effect, what is questioned by more modern theories of racism is whether the theory of individual racism is sufficient for explaining the full and recurring impacts of racism.

Institutional Racism

The theory of institutional racism was introduced by the political activist Kwame Ture (whose work is published under his former name Stokely Carmichael) and political scientist Charles Hamilton in their book Black Power: The Politics of Liberation.80 Like W.E.B. Du Bois and other social scientists before them, Carmichael and Hamilton recognized that, although pernicious and harmful, the powerful effects of racism did not come from the acts of individuals. Instead, they argued that the most impactful effects of racism are the product of discriminatory policies and practices that occur within the many institutions that operate within the United States. As Carol Yeakey, a professor of education and urban studies, summarizes, racism operates on both an overt individual level and a covert institutional level, “where racism as a normative, societal ideology operates within and among the organizations, institutions, and processes of the larger society … the overt acts of individual racism and the more covert acts of institutional racism have a mutually reinforcing effect.”81 At the institutional level, racism operates by combining prejudice and power to create policies and practices that produce, either intentionally or unintentionally, discriminatory outcomes across racialized groups.82 According to James Scheurich and Michelle Young, specialists in educational leadership and administration,

Institutional racism exists when institutions or organizations, including educational ones, have standard operating procedures (intended or unintended) that hurt members of one or more races in relation to members of the dominant race … Institutional racism also exists when institutional or organizational cultures, rules, habits, or symbols have the same biasing effect.83

Once established within an institution, these racialized policies and practices have effects that outlast those who first established them.
In this way, the policy and practice become a part of the institution—they are institutionalized—and have effects that last well beyond the individual prejudice that contributed to the creation of the policy or practice.

Social scientists point to many examples of institutional racism. As an early example, Ibram Kendi, historian and Director of the Center for Antiracist Research at Boston University, points to a book by Frederick Hoffman authored in 1896 titled Race Traits and Tendencies of the American Negro, in which the health and welfare of people membered Black at the time they were first emancipated from slavery are contrasted with their health and welfare at the then-current time. Hoffman’s core argument is that since emancipation, the health and welfare of people membered Black had decreased precipitously such that they were at risk of extinction. This specious observation was used by insurance companies to deny life insurance policies to people membered Black due to alleged high risk of death. Once in place, the misinformed racialized policy established an institutional practice that produced disparities in life insurance offerings (and, in turn, intergenerational transfer of wealth) which outlasted the company decision-makers who first introduced the policy.84 Richard Rothstein, a senior fellow at the Thurgood Marshall Institute, documents a variety of examples of institutional policies introduced between the 1930s and into the early 21st century that have produced profound disparities in housing and wealth. Among the examples is the development of color-coded maps (a.k.a. redlining) by the New Deal Home Owners Loan Corporation and the Federal Housing Administration, which were used by banks to deny mortgages to people attempting to purchase homes in neighborhoods populated predominantly by people membered Black. Similarly, the Social Security Act provided benefits to people working in many sectors of the economy. Excluded, however, were household and agricultural workers, two sectors in which people membered Black were overrepresented.
Once established, this policy had profound and lasting disparate impacts on the retirement benefits and financial security of people membered Black.85 Similarly, the Servicemen’s Readjustment Act of 1944 (more commonly known as the G.I. Bill) provided educational and mortgage benefits for returning servicemen. At the same time, the federal government made considerable investments in housing development, which led to the creation of large suburban housing tracts. Stipulations in both the G.I. Bill and federal housing development programs established barriers that prevented people membered Black from accessing these benefits and, in some cases, from purchasing these federally financed suburban homes. Again, once established, the enactment of practice based on these policies had lasting impacts.86 More recently, scholars point to policies regarding illegal drug sentencing and the subprime mortgage crisis that occurred in the late 2000s as examples of institutional racism. Specifically, disparities in mandatory sentences for possession of crack cocaine versus powder cocaine established in the 1980s and 1990s are cited as a driving force that greatly increased the number of people membered Black imprisoned by the criminal justice system. Whereas a person possessing only 5 grams of crack cocaine faced a mandatory five-year prison sentence, 500 grams of powder cocaine was required for the same sentence. At the time, crack cocaine was present in neighborhoods populated predominantly by people membered Black, while powder cocaine was used more frequently by people membered White.87 Similarly, disparities in the lending of mortgages at subprime rates also negatively impacted a higher percentage of people membered Black. As Omi and Winant describe, in the 1990s and early 2000s, mortgage companies developed mortgage programs for what is termed “assigned risk” borrowers, those who present a higher credit risk. These loans came with higher interest rates and fees. Like policing practices concentrated in neighborhoods predominantly populated by people membered Black, mortgage companies aggressively sought potential homeowners membered Black and Latine to whom they extended assigned risk mortgages. In many cases, however, the potential homeowners had credit ratings that qualified them for conventional mortgages at prime (i.e., lower) rates and fee structures. In effect, this practice steered a disproportionately high percentage of subprime mortgages to households membered Black and Latine, which produced disparate impacts on wealth and loss of homeownership when the subprime mortgage collapse occurred.88 Yet again, an institutional practice that discriminated based on racialized group membership produced disparate outcomes. In education, a variety of institutional policies and practices similarly contribute to the production of disparate outcomes.
These policies and practices include: choice of curricular materials (particularly books) authored by and/ or that present narratives of characters membered White; course and special education recommendations and decisions that differentially track students membered Black and Latine; security screening practices implemented differentially across schools predominantly serving students membered White versus not-White; use of test scores that are known to differ, on average, across racialized groups to inform admission decisions or the hiring of educators; and the mechanisms applied to fund schools that produce inequities in resources available across schools predominantly serving students membered White versus not-White.89 These and other institutional practices all contribute to differences in the educational outcomes of students membered into different racialized groups. Collectively, these and many more policies and practices established and implemented within the many institutions that operate within the United States (and across the world) produce what some term a “race tax.” This tax operates to increase costs—material, psychological, and physical—for people membered Black, Brown, Asian, and Indigenous and, at the same time, provides benefits to those membered White.90 Although these costs and benefits

are distributed differently among members within each racialized group, they serve as a vehicle that steers resources from nondominant racialized groups to people membered White—particularly those with elite status.

Structural Racism

The terms structural racism and institutional racism are often used interchangeably.91 As I use the terms, institutional racism focuses on the policies and practices enacted within an institution that discriminate based on racialized categorizations and produce disparate outcomes across racialized lines for people working in the institution or who are served by the institution. Put simply, institutional racism focuses on what occurs within an institution that differentially impacts people that interact with that institution. Structural racism focuses more broadly on the structures that control access to institutions, connect institutions, and/or the impacts produced by institutions. As Nancy Krieger, a professor of social epidemiology, describes, structural racism refers to the totality of ways in which societies foster discrimination, via mutually reinforcing systems of discrimination (e.g., in housing, education, employment, earnings, benefits, credit, media, health care, criminal justice, etc.) that in turn reinforce discriminatory beliefs, values, and distribution of resources.92 Whereas individual racism functions at the nano-level to impact individuals, institutional racism operates at the micro-level to impact groups of individuals working within or served by a given institution.93 Structural racism, then, operates at the meso-level and encompasses the socioecological arrangements that connect institutional impacts to produce racial inequities. “As fundamental causes, they [socioecological arrangements] are constantly reconstituting the conditions necessary to ensure their perpetuation.”94 In this way, structural racism acts as a set of funnels, pins, and barriers that both route people into different physical and social spaces and guides them differentially through the system of institutions that form a social, economic, and political system.95 There are three components to structural racism. 
The first component comprises the social structure of the United States, the forming of which dates back to colonial times and the constituting of the nation following the Revolutionary War. The second focuses on the structuring of social systems and physical spaces that have and continue to serve to segregate U.S. society along racialized lines. The third is the interactions between institutions that compound impacts produced within and by individual institutions.

Social Structuring of the United States

Charles Mills, the late professor of philosophy, argues that the social, economic, and political structure of the United States was and continues to be shaped by a racial contract. In developing the concept of a racial contract, Mills both challenges and extends Jean-Jacques Rousseau’s idea of a social contract. The theory of a social contract was developed to explain the transition of humans from a state in which they struggled for survival as individuals in nature to a state formed by a social society in which they struggle collectively. In forming a social society, individuals tacitly agree to cede some individual freedoms as they submit to rules and authority that protect their remaining rights and maintain social order. In effect, every person who is a member of a society is understood as entering a social contract in which they agree to give up some rights in order to gain protection. In return, each member of a society gains an increase in their security and quality of life as compared to their struggle as an individual in nature.96 Mills argues that the social contract that forms the basis of society in the United States includes a racial element. As Mills explains: if we think of human beings as starting off in a “state of nature,” it suggests that they then decide to establish a civil society and a government. What we have, then, is a theory that founds government on the popular consent of individuals taken as equals … But the peculiar contract to which I am referring … is not a contract between everybody (“we the people”), but between just the people who count, the people who really are people (“we the white people”). 
So it is a Racial Contract.97 The general purpose of the Contract is always the differential privileging of the whites as a group with respect to the nonwhites as a group, the exploitation of their bodies, land, and resources, and the denial of equal socioeconomic opportunities to them.98 Whereas the social contract shifts the human from a state of nature (or “natural man”) to a “civil/political” person, the racial contract partitions humans into “White” and “not-White.” The rights, and protection of those rights, provided by the social order (a.k.a. the state) then differ for those membered White and those membered not-White. It is important to note that the Racial Contract is not a contract to which the nonwhite subset of humans can be a genuinely consenting party … Rather, it is a contract between those categorized as white over the nonwhites, who are thus the objects rather than the subjects of the agreement.99

Mills later argues, The “Racial Contract” as a theory puts race where it belongs—at center stage and demonstrates how the polity was in fact a racial one, a white-supremacist state, for which differential white racial entitlement and nonwhite racial subordination were defining, thus inevitably molding white moral psychology and moral theorizing.100 Mills also contends that there is an economic motivation to the Racial Contract. As he explains: the economic dimension of the Racial Contract is the most salient, foreground rather than background, since the Racial Contract is calculatedly aimed at economic exploitation. The whole point of establishing a moral hierarchy and juridically partitioning the polity according to race is to secure and legitimate the privileging of those individuals designated as white/persons and the exploitation of those individuals designated as nonwhite/subpersons.101 Understood in this way, a racial contract provides a useful frame for understanding how the social, legal, and political systems that operate in the United States tolerate the current racialized structure that exists within the United States. Given the system of White domination that existed during the colonial period, the founding of the United States by elite men membered White naturally incorporated assumptions of white supremacy into the structure of the U.S. political, social, and economic system. This white supremacist racialized structure is evident in permitting slavery to function as a legal institution, with guaranteed protections for slave owners, and in the counting of enslaved people as three-fifths of a human. 
It is also evident in the various court rulings that protected slavery, whiteness, the principle of separate-but-equal, gerrymandering, redlining, and, most recently, a focus on de jure versus de facto discrimination.102 Although some people membered White are not signatories to the racial contract, all people membered White are beneficiaries of the contract. In this way, the racial contract forms the foundation for the functioning of various forms of racism, including for a long period of time deadly forms of overt individual racism.

Structuring of Social Systems and Physical Spaces

The membering of people as White and not-White has had profound impacts on the formation of social systems and the physical spaces in which social systems operate. As Bonilla-Silva describes, the United States (and many other regions of the world) is formed by racialized social systems that position

people membered White at different (advantaged) levels than those membered not-White. This hierarchical racialized social structure impacts the roles people play, the interactions among people, and the lived experiences of people located within different levels of the hierarchy. During colonial times and the first 80 years following independence, slavery produced hard lines that separated those who were enslaved from the rest of society. This impacted the work performed by people membered free and those who were enslaved as well as the interactions between those free and those enslaved—one group behaving with authority, the other with forced deference. Enslavement also created physical separation among racialized groups—those enslaved lived in separate quarters from those membered White. Similarly, those membered Indigenous were forced onto separate “reservation” lands. Within the same nation, this physical separation fostered the development of different customs, dialects, values, and cultural norms. Following the abolition of slavery, the persistence of racialized social systems was enabled by a racial contract and the legal system that embraces its core principle of white supremacy. Although people membered Black were granted increased freedom, the racialized social system remained largely segregated at all levels of operation. Residence, occupations, political roles, and cultural practices all produced different experiences and outcomes for people membered into different levels of the racialized social system. Institutional policies and practices such as redlining, social security, college admission testing, and the principle of separate-but-equal maintained and, in some cases, expanded the gulf between the physical, social, and economic spaces in which people membered White and those membered not-White operated. 
Even today, these spaces remain deeply segregated.103 Analyses of the racialized composition of spaces in residential communities and workplaces document the dramatic segregation of residential spaces that exists today. These analyses show that during working hours, some spaces become much more diverse as people commute to places of employment. Deeper analysis of these more diverse workspaces, however, shows that while people commute to produce what appears to be a more diverse collection of people on a horizontal plane, vertically, the workplace remains largely segregated—people membered not-White are overrepresented in lower-paying, less prestigious jobs, while people membered White are overrepresented in higher-paying, white-collar positions.104 The spaces in which we live, learn, socialize, and work are structured in ways that limit contact among people membered White and people membered into other racialized groups.105 As research by sociologist Joe Feagin documents, people membered White “tend to have much more racially homogenous networks than people in other racial groups.”106 And, when people membered White do interact with people membered into other racialized

groups, they often do so from a position of power and in a transactional manner.107 The way in which space has been racially structured interacts with the racialized structuring of social systems to sustain racist ideas that spawn individual racism and enable institutions to function differently for a segregated people.

Interactions Among Institutions

Institutional racism focuses on policies and practices developed and operated within institutions. Structural racism connects disparate impacts produced by various forms of institutional racism to further compound these impacts. As an example, Rothstein documents the way in which redlining prevented many people membered Black from acquiring residential real estate and, in turn, structured segregated communities. Once people membered Black were located in high-density communities with a relatively low volume of rental properties, landlords (predominantly membered White) increased rents. Together, increased cost of living and lack of home ownership manufactured a disparity in wealth accumulation between people membered Black and White. When redlining practices abated, opening a larger segment of the housing market to people membered Black, this wealth disparity interacted with policies enacted by mortgage companies that produced higher interest rates for higher-risk loans and required mortgage insurance for smaller down payments, which disparately impacted the mortgage and insurance costs for people membered Black. In turn, these increased costs influenced the value of homes that were affordable to people membered Black compared to people membered White, even when annual incomes were comparable. The lower home values later impacted the amount of money available to fund college education and retirement savings. Michelle Alexander similarly documents the ways in which the segregating of communities—produced by redlining—interacts with policing practices to focus increased attention on densely populated communities overrepresented by people membered Black and Latine. In turn, increased policing—and increased use of policing practices such as “stop and frisk”— increase arrest rates. These arrests often come with court processing fees and, in some cases, fees for time spent in jail awaiting trial. 
Disparities in conviction rates—in part a function of differences in access to sound legal representation due to differences in wealth—and sentencing produce disparities in the incarceration of people membered Black and Latine compared to those membered White. While imprisoned, often in a predominantly White rural area, the right to vote is restricted and political representation shifts from the prisoner’s home neighborhood to the location of the prison— a practice that decreases representation in communities populated by people membered Black and Latine and overrepresents communities predominantly

populated by people membered White in which a prison is located. Once released, federal and state social support policies often limit or prevent access to housing and food support for former inmates or to people who reside with a former inmate. This policy, in turn, produces challenges for households who require these supports—a pressure that causes some households to separate. Similarly, hiring practices within many institutions inhibit the hiring of people with a prison record—a practice that disproportionately limits employment opportunities for people membered Black and Latine. In this way, the impacts produced by policies and practices of several separate institutions interact to compound the disparate impacts experienced by people membered Black and Latine.108 Collectively, a racial contract structured the United States along racial lines. This racialized structuring enabled social systems and physical spaces to be similarly structured along racialized lines. The segregation of communities and the workplace facilitates interactions among various forms of institutional racism which further compound disparities in outcomes and lived experiences of people membered into different racialized categories. Collectively, the racial contract, organization of racialized social systems and physical spaces, and interactions among institutions operate as structural racism.

Systemic Racism

If structural racism functions at the meso-level, systemic racism operates at the macro-layer. Feagin observes that the word systemic means “to place or stand together,” and he uses the term in reference to “an organized societal whole with many interconnected elements.”109 Systemic racism connects individual, institutional, and structural racism to sustain and, at times, increase power for the dominant elite and to provide unjust economic, political, and social advantage to people membered into the dominant racialized group.110 Essential in this system of racism is the development of ideology that justifies the racialization of people, the hierarchical structuring of racialized groups, and the disparate outcomes experienced by racialized groups. Here, I use the term ideology to mean a world view that stabilizes or legitimizes domination.111 An ideology is a political instrument that creates “meaning in the service of power” and is “central in the production and reinforcement of the status quo.”112 As Bonilla-Silva explains, “ideologies, like grammar, are learned socially and, therefore, the rules of how to speak properly come ‘naturally’ to people socialized in particular societies.”113 An ideology is not designed to be logical or accurate. Rather, “the strength of an ideology lies in its loose-jointed, flexible application.”114 An ideology is used by those in power “to ‘glue’ together contradictory practices and

structures: despotism and democracy, coercion and consent, formal equality and substantive inequality, identity and difference.”115 As an example, President George W. Bush once argued that, because people membered Black have a lower life expectancy, Social Security discriminates against them because they receive fewer payments during their lifetime. To rectify this “injustice,” Bush argued that Social Security should be privatized in order to allow people membered Black greater access to their accounts (while they remain alive).116 In this example, the effects of systemic racism on the health of people membered Black are raised not to justify changes to improve their health, but rather to justify privatization of retirement accounts—a policy change that would increase earnings for the predominantly White and already-wealthy investment managers. Racialized ideology serves as a central schema in a racialized social system by forming “a complex set of background ideas that people draw on but rarely question in their daily affairs … stock ideas and practices that we have absorbed and heavily rel[y] upon but to which we give little thought.”117 Racialized ideology “serve[s] as collective ways of understanding our lives and … provide[s] explanations for both the causes and solutions to personal and social problems.”118 Racialized ideology provides a commonsense understanding of how the world functions which helps people—particularly those membered into the dominant racialized group—“make sense of racial gaps in earnings, wealth, and health such that whites do not see any connection between their gain and others’ loss.”119 Racialized ideologies also provide a foundation for racialized narratives that reinforce understandings fostered by the ideology. 
Narratives are told by many agents, some acting as individuals—family members, friends, political leaders—and others operating within institutions—the press, media, scholarly publications, school textbooks. Like the ideology upon which they are based, narratives are often false. Yet, once told, they form concrete representations of ideological understandings. Like ideology, narratives are shaped and reshaped in response to sociohistorical developments and often present contradictory ideas. In Stamped from the Beginning, Kendi traces the long history of the reshaping of narratives. During the early and mid-19th century, narratives portrayed people membered Black as “naturally servile” because they did not revolt against enslavement. Yet, “whenever they did fight, reactionary commentators, in both North and South, classified them as barbaric animals who needed to be caged in slavery.”120 Similar contradictory narratives portrayed enslaved people membered Black as “naturally docile and well equipped to take orders,” yet during the Civil War, these same people were depicted as “uncontrollable brutes.”121 Feagin similarly documents the shift in narrative about Indigenous people. Following his interactions with Indigenous people in 1584, Arthur

Barlowe wrote, “We found the people most gentle loving and faithful, void of all guile and treason, and such as lived after the manner of the Golden Age.” Not long after, however, Robert Gray justified his attack of the Indigenous to claim their land describing the Indigenous people as “wild beasts,” “unreasonable creatures,” and “brutish savages” who “worshipped the devil.” From this point forward, the Indigenous were termed “savage,” “infidels,” “heathen,” “barbarian,” and “wild animals that needed to be rooted out of their dens, jungles, and swamps.”122 More recent narratives depict “the black man as a thug;” a portrayal that helped justify the rapid increase in the imprisonment of men membered Black during the 1990s and 2000s.123 Ronald Reagan’s narrative depicting a mother membered Black as a “welfare queen” who was alleged to have collected $150,000 in social service benefits created imagery that helped justify changes to public assistance programs.124 Even more recently, narratives promulgated by Donald Trump depicting people immigrating from Mexico and Latin America as rapists, drug lords, and murderers were similarly developed to denigrate people membered Latine in order to justify policy changes that would limit their immigration to the United States. Narratives generated through the media and other outlets do not just create negative images of people membered into nondominant racialized groups; they also elevate images of people membered White. 
In movies and television shows, characters who appear White are often portrayed “with more depth and redeeming qualities [which] work to justify the fact that whites tend to do better on nearly any social measure.”125 Although racialized ideologies and the narratives they inspire are false, they serve a practical role.126 In the United States, racialized ideology capitalizes on the more general ideology of individualism and merit to spawn narratives that explain disparate outcomes in a manner that focuses attention on individuals and “characteristics” of racialized groups, “verifying” racialized stereotypes, and distracting attention from the historical and current institutional policies, practices, and structures that produce those disparities. As Kendi explains, Time and again, powerful and brilliant men and women have produced racist ideas in order to justify the racist policies of their era, in order to redirect the blame for their era’s racial disparities away from those policies and onto Black people … racial discrimination led to racist ideas which led to ignorance and hate … Racially discriminatory policies have usually sprung from the economic, political, and cultural self-interests, self-interests that are constantly changing.127 In this way, racialized ideologies are developed and operate to serve power by providing a defense for the economic, social, and political advantages

manufactured for the dominant racialized group.128 As Frederick Douglass once said, “When men oppress their fellow men, the oppressor ever finds, in the character of the oppressed, a full justification for his oppression.”129 Racialized ideology is developed and applied to provide this justification. Systemic racism is a complex system that connects individual racism, institutional racism, and structural racism to produce advantage for the dominant racialized group and harm for nondominant racialized group members. This system of racism sustains power held by elite members of the dominant racialized group. The powerful elite develop ideology from which racialized narratives derive. Together the ideology and narratives justify, and at times motivate, individual actions, institutional policies, and structures that sustain the production of disparate outcomes. To visualize this system of racism, Figure 3.1 depicts a model of systemic racism. As I show in this figure, systemic racism begins and ends with power. People with power control the production of ideology and the formation of racially stratified categories.130 Ideology is crafted in a manner that justifies the formation of racially stratified categories and supports racialized narratives that produce “commonsense” understandings of how members of racialized categories differ from each other and why these differences contribute to the production of disparate outcomes. This ideology and associated narratives shape individual thoughts and actions that manifest in overt and aversive individual racism. These individual forms of racism cause psychological, corporeal, and material harm for individual victims membered into nondominant racialized groups. Racialized ideology and narratives also influence institutional policies, which are the basis of institutional racism. 
Each form of institutional racism produces a specific subset of disparate outcomes in sectors including economic, residential, educational, health, policing, criminal justice, social services, voting rights, and so on. Historical and current laws and regulations create a structure that connects and compounds the impacts of institutional racism which manufactures disparities in outcomes and lived experiences. Together, these disparate outcomes contribute to psychological, corporeal, and material harm and collectively serve to manufacture economic, social, and political advantage for those membered into the dominant race. Historical and current laws and regulations also (re)molded racialized categories. At times, this (re)molding leads to the manufacturing of a new racialized category—Mulatto in the mid-1880s, Hispanic/Latine in the 1970s, Middle Eastern/Muslim today. At other times, (re)molding modifies a racialized category—protecting Whiteness by excluding Caucasians of South Asian descent in the early 1900s. Perhaps most importantly, disparate outcomes and harm combine with these advantages to bolster power. However, as we will see in Part III,


Figure 3.1 A Model of Systemic Racism.

disparities and harm produced for those membered into nondominant racialized groups, and the advantage manufactured for those membered into the dominant racialized group, can also serve as a motivation for anti-racist resistance. Bonilla-Silva argues that racism will persist without racists. Racists are commonly understood as those who perform overt acts (mis)informed by racialized ideology and false racialized narratives. Some observers extend the conception of racists to those who engage in aversive actions, given that such actions also produce harm for members of nondominant racialized groups. As depicted in the model of systemic racism, removing individual racism—both overt and aversive actions—would remove one source of psychological, corporeal, and material harm. Impacts of institutional racism might also be reduced (slightly). However, removing racists—those who engage in individual racism—would have no direct impact on the many disparities in outcomes manufactured by this system of racism. As sociologists Hayward Horton and Lori Sykes explain, the separation of harms caused by words and actions of racists from the impacts produced by racism occurs because: racial prejudice [is] distinct from racism, the former is neither necessary nor sufficient to infer or indicate the presence of the latter. In practical terms, this means that in contemporary America, racism can continue to be a major force in determining racial inequality although the majority of dominant group members may not themselves be racist. But they don’t have to be. 
The racism embedded into the heart of the social structure is on autopilot.131 It is in this way that one does not need to be racist in order to be part of the system of racism.132 Given the often unseen—or at least unacknowledged—advantages provided to people membered into the dominant racialized group, it is unlikely that the removal of racists—that is, individual racism—would fundamentally alter the maintenance of power for the elite membered into the dominant racialized group. Nor is it likely to add to the resistance to this power. Bonilla-Silva’s observation provides one example of how systemic racism operates today without any one person or group at its helm—the system is established and designed to sustain itself despite setbacks to distinct elements. This model of systemic racism also demonstrates the essential role ideology plays in driving systemic racism. Ideology, and the narratives based on it, serves as both a motivator for action and a justification of disparate outcomes. As the many social scientists quoted in this chapter attest, ideology is manufactured by the powerful elite to justify the outcomes produced by the

social system they control—in the case of the United States, a racialized social system undergirded by a racial contract. As I explore in the next chapter, core to the ideology that operates today is the White Racial Frame—a frame birthed during the Enlightenment and refined ever since. As we will see, this White Racial Frame is the foundation of the racialized ideology that drives and justifies this system of racism, and that has and continues to play a pivotal role in shaping the field of educational measurement.

Notes

1 Benjamin (2019). 2 Partington and McKie (1937, 1938). I credit Borsboom et al. (2009) for introducing me to the story of phlogiston. 3 Roberts (2011). 4 Horton and Sykes (2008). 5 Omi and Winant (2015), p. 3. 6 Omi and Winant (2015), p. 13; see also Zuberi and Bonilla-Silva (2008), p. 34. 7 Lewis et al. (2019, p. 30) suggest that educational researchers might benefit from incorporating some of the sociological research on race and racism into their scholarship as such engagement would help to refine and deepen understandings of what happens in schools and how schools matter for society. The content of this chapter draws heavily on the work of sociologists Eduardo Bonilla-Silva, John Stanfield II, Michael Omi and Howard Winant, and Joe Feagin, and attempts to integrate their thinking into a comprehensive model of systemic racism. 8 Omi and Winant (2015), p. 110. 9 Lewis (2004), p. 625. 10 Zuberi and Bonilla-Silva (2008), p. 34. 11 Omi and Winant (2015), p. 111. 12 Bonilla-Silva (1996), pp. 471–472. 13 Omi and Winant (2015), pp. 245–247. 14 Omi and Winant (2015), p. 111. 15 Cornell and Hartmann (2006), p. 25. 16 Omi and Winant (2015), p. 106. 17 Omi and Winant (2015), p. 13. 18 Omi and Winant (2015), p. 107. 19 Hochschild and Powell (2008), p. 61. 20 Leeman (2004), p. 508, referencing Nobles (2000). 21 Feagin (2006). 22 Omi and Winant (2015), p. 111. 23 Bonilla-Silva (1996), pp. 471–472. 
24 Omi and Winant (2015), p. 111. 25 Omi and Winant (2015), p. 13. 26 Foucault (1980). 27 Rothstein (2017). 28 For policing practices, see Alexander (2012); Butler (2018). For college admissions test score use, see Public Counsel (2019).

29 Omi and Winant (2015), p. 13.
30 Omi and Winant (2015), p. 128.
31 Bonilla-Silva (1996).
32 Bonilla-Silva (1996), pp. 469–470.
33 Lewis et al. (2019), p. 33.
34 Omi and Winant (2015).
35 Bonilla-Silva (1996), p. 474.
36 Golash-Boza (2016), p. 133.
37 Stanfield (2011), p. 105.
38 Kendi (2016), p. 5.
39 Stanfield (2016), p. 71.
40 Stanfield (2016), p. 71; Omi and Winant (2015), p. 128.
41 Omi and Winant (2015).
42 Myrdal (1944); Jacobson (1999).
43 Omi and Winant (2015).
44 López et al. (2018).
45 Omi and Winant (2015), p. 22.
46 Bonilla-Silva (1996).
47 Aptheker (1992), pp. xiii–xiv.
48 Benedict (1943), p. 97. Note that in the first sentence, Benedict uses the term “ethnic” rather than “racial.” Recall the manner in which European ethnic groups were racialized during the later 19th and early 20th centuries. Similarly, people from Mexico and later other regions of Central and South America have been racialized as Hispanic, Latino/a, and/or Latine. More recently, people who practice Islam have similarly been racialized. Here I use the term racialized to mean membering a group of people not-White for the purposes of advantaging people membered White through the oppression of the racialized group(s).
49 Omi and Winant (2015), p. 249.
50 Aptheker (1975).
51 Benedict (1943), p. 97. See also Roberts (2011) for a thorough review of evidence, specifically that focused on genetics, that refutes the “facts” undergirding biological racism.
52 See Lewis et al. (2019), p. 31 for a summary of Park’s theory of assimilation. See Park and Burgess (2019) for the original presentation of the race relations cycle.
53 Park (1914), pp. 606–607.
54 Zuberi and Bonilla-Silva (2008), see p. 40.
55 Stanfield (2016), p. 124.
56 Elias and Feagin (2016), pp. 42–43, italics in the original.
57 Some theories of racism conflated race with class (or, more accurately, class conflict). For a detailed discussion of the shortcomings of the class-based theories of racism, see Omi and Winant (2015).
58 Bonilla-Silva (1996), p. 465.
59 Myrdal (1944).
60 Bonilla-Silva and Baiocchi (2001), p. 118.
61 Elias and Feagin (2016), p. 45.
62 Kovel (1970).
63 Feagin (2013).
64 Golash-Boza (2016). As an example of a slight based on a prejudiced belief, consider the scenario depicted by Claudia Rankine in Citizen: An American Lyric in which a woman casually comments that affirmative action is the reason a student membered Black, whose background is unknown to her, was accepted into an institute of higher education.

65 Devine (1989), p. 6.
66 Bertrand and Mullainathan (2004).
67 Green et al. (2007); Sabin and Greenwald (2012).
68 Brown (2018); Gullo and Beachum (2020); Gullo et al. (2019).
69 Omi and Winant (2015, p. 63) argue that Massey (2007) takes a more extreme position regarding implicit bias in which “prejudice derives from ineluctable features of human biology and evolution, rather than patterns of socialization [racialism] … it suggests that racism are permanent and ineradicable.”
70 Kovel (1970).
71 Gaertner and Dovidio (2005), p. 618.
72 As Gaertner and Dovidio (2005, p. 620) explain, The scenario for the experiment was inspired by an incident in the mid-1960s in which 38 people witnessed the stabbing of a woman, Kitty Genovese, without a single bystander intervening to help. What accounted for this behavior? Feelings of responsibility play a key … If a person witnesses an emergency knowing that he or she is the only bystander, that person bears all of the responsibility for helping and, consequently, the likelihood of helping is high. In contrast, if a person witnesses an emergency but believes that there are several other witnesses who might help, then the responsibility for helping is shared. Moreover, if the person believes that someone else will help or has already helped, the likelihood of that bystander taking action is significantly reduced.
73 Gaertner and Dovidio (2005), p. 621.
74 Dovidio and Gaertner (2000); Hodson, Dovidio, and Gaertner (2002); Sidanius, Levin, and Pratto (1998); Hodson, Hooper, Dovidio, and Gaertner (2005).
75 In his critique of individual racism, Bonilla-Silva (1996) divides the operation into three components: (1) defining racism as a set of ideas or beliefs; (2) those beliefs producing prejudice; and (3) prejudicial attitudes inducing individual actions that discriminate against people membered into nondominant racialized groups.
76 Hall (1980); Bonilla-Silva (1996).
77 Hall (1980), pp. 337–338.
78 Bonilla-Silva (1996); Bonilla-Silva and Baiocchi (2001); Feagin (2006); Elias and Feagin (2016); Omi and Winant (2015); Stanfield (2016). See also Griffith et al. (2007) on individual racism and health disparities, Alexander (2012) on criminal justice disparities, Leonardo and Grubb (2018) on educational disparities, Sharkey (2013) on residential disparities, Oliver and Shapiro (2006) on wealth disparities, and Rothstein (2017) on financial and property disparities.
79 Bonilla-Silva (1996), p. 466. Note that rather than the phrase “individual racism,” Bonilla-Silva employs the term “ideological racism.”
80 Carmichael and Hamilton (1992). Carmichael changed his name to Kwame Ture after publication.
81 Yeakey (1979), p. 200.
82 Bonilla-Silva (1996), p. 466.
83 Scheurich and Young (1997), p. 5.
84 Kendi (2016), p. 281.
85 Rothstein (2017).
86 Kendi (2016); Rothstein (2017).
87 Alexander (2012). See also Kendi (2016).
88 Omi and Winant (2015).
89 Leonardo and Grubb (2018). See also Cummins (1986), Ladson-Billings (1995), and Lee, Lomotey, and Shujaa (1990).

90 Omi and Winant (2015).
91 Bailey et al. (2017).
92 Krieger (2014), p. 650.
93 Gee and Ford (2011).
94 Gee and Ford (2011), p. 117.
95 Krieger (2014), p. 660. In Chapter 9, Francis Galton’s Quincunx is described as a tool he used to explore the idea of distribution, and in particular the normal distribution. Krieger describes how others then modified Galton’s Quincunx such that the distribution produced resembled that of a log-normal distribution. Krieger builds on this idea of modifying the Quincunx by adding funnels that operate at various stages to segregate the shot (i.e., people) that move through the apparatus of a social system to structure outcomes and resulting lived experiences of people membered into different racialized groups.
96 Mills (1997) takes the reason for entering society one step further, writing, “the point of leaving the state of nature is in part to secure a stable environment for the industrious appropriation of the world” (p. 31).
97 Mills (1997), p. 3, italics in the original.
98 Mills (1997), p. 11.
99 Mills (1997), pp. 11–12.
100 Mills (1997), p. 57.
101 Mills (1997), pp. 32–33, italics in the original.
102 For a detailed exploration of de jure versus de facto discrimination, see Rothstein (2017).
103 See Mills (1997), pp. 75–76, for examples of practices within real estate that operate in a covert manner to inhibit home purchases by people membered Black in neighborhoods inhabited predominantly by people membered White.
104 Ellis et al. (2004); Hall et al. (2019); Ferguson and Koning (2018).
105 Zuberi and Bonilla-Silva (2008).
106 Feagin (2013), p. 92.
107 Feagin and O’Brien (2004).
108 Alexander (2012); Butler (2018).
109 Feagin (2006), p. 8.
110 Feagin (2006); Bracey et al. (2017).
111 Geuss (1981); Bonilla-Silva (2018).
112 Thompson (2020); Bonilla-Silva (2018), p. 54.
113 Bonilla-Silva (2018), p. 78.
114 Jackman (1994), p. 69.
115 Omi and Winant (2015), p. 138.
116 Omi and Winant (2015).
117 Haney López (2003) quoted in Gomez (2012), p. 53.
118 Lewis (2004), p. 632.
119 Lewis (2004), p. 633.
120 Kendi (2016), p. 173.
121 Kendi (2016), p. 225.
122 Feagin (2013), pp. 43–44.
123 Golash-Boza (2016), p. 135.
124 Reagan told this story about a woman membered Black in Chicago, whose real name was Linda Taylor, many times and the details listed evolved over time. Although Reagan did not use the phrase “welfare queen,” a newspaper picked up on Reagan’s speech and used the phrase in the headline to its story. As Levin (2019) details, Taylor did deceptively collect various forms of public assistance in a fraudulent manner. However, she was never actually eligible for welfare supports. Moreover, she was an extremely accomplished con artist who engaged in a variety of other forms of fraud and is believed to have also committed kidnapping and murder. Reagan’s use of her story, with specific emphasis on her fraudulent welfare scheme, produced a narrative that depicted people membered Black as lazy, exploitive, and undeserving of welfare supports.
125 Golash-Boza (2016), p. 135.
126 Bonilla-Silva (1996).
127 Kendi (2016), pp. 9–10.
128 Lewis (2004).
129 Kendi (2016), p. 199, quoting Frederick Douglass.
130 See Zuberi and Bonilla-Silva (2008), p. 87.
131 Horton and Sykes (2008), p. 240.
132 Oluo (2018).

References

Alexander, M. (2012). The New Jim Crow: Mass Incarceration in the Age of Colorblindness. The New Press.
Aptheker, H. (1975). The history of anti-racism in the United States. The Black Scholar, 6(5), 16–22.
Aptheker, H. (1992). Anti-Racism in U.S. History: The First Two Hundred Years. Greenwood Press.
Bailey, Z.D., Krieger, N., Agénor, M., Graves, J., Linos, N. & Bassett, M.T. (2017). Structural racism and health inequities in the USA: Evidence and interventions. The Lancet, 389(10077), 1453–1463.
Benedict, R. (1943). Race and Racism. University Press at Edinburgh.
Benjamin, R. (2019). The New Jim Code? Race, Carceral Technoscience, and Liberatory Imagination, Othering and Being Institute. Downloaded from https://belonging.berkeley.edu/video-ruha-benjamin-new-jim-code-race-carceral-technoscience-and-liberatory-imagination
Bertrand, M. & Mullainathan, S. (2004). Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review, 94(4), 991–1013.
Bonilla-Silva, E. (1996). Rethinking racism: Toward a structural interpretation. American Sociological Review, 62(3), 465–480.
Bonilla-Silva, E. (2018). Racism without Racists: Color-Blind Racism and the Persistence of Racial Inequality in the United States (5th ed.). Rowman & Littlefield Publishers.
Bonilla-Silva, E. & Baiocchi, G. (2001). Anything but racism: How sociologists limit the significance of racism. Race and Society, 4(2), 117–131.
Borsboom, D., Cramer, A., Kievit, R.A., Scholten, A.Z. & Franić, S. (2009). The end of construct validity. In The Concept of Validity: Revisions, New Directions, and Applications. Information Age Publishing.
Bracey, G., Chambers, C., Lavelle, K. & Mueller, J.C. (2017). The white racial frame: A roundtable discussion. In Systemic Racism. Palgrave Macmillan.
Brown, A.L. (2018). From subhuman to human kind: Implicit bias, racial memory, and Black males in schools and society. Peabody Journal of Education, 93(1), 52–65.

Butler, P. (2018). Chokehold: Policing Black Men. The New Press.
Carmichael, S. & Hamilton, C.V. (1992). Black Power: The Politics of Liberation in America. Vintage.
Cornell, S. & Hartmann, D. (2006). Ethnicity and Race: Making Identities in a Changing World. Sage Publications.
Cummins, J. (1986). Empowering minority students: A framework for intervention. Harvard Educational Review, 56(1), 18–36.
Devine, P.G. (1989). Stereotypes and prejudice: Their automatic and controlled components. Journal of Personality and Social Psychology, 56(1), 5.
Dovidio, J.F. & Gaertner, S.L. (2000). Aversive racism and selection decisions: 1989 and 1999. Psychological Science, 11, 319–323.
Elias, S. & Feagin, J.R. (2016). Racial Theories in Social Science: A Systemic Racism Critique. Routledge.
Ellis, M., Wright, R. & Parks, V. (2004). Work together, live apart? Geographies of racial and ethnic segregation at home and at work. Annals of the Association of American Geographers, 94(3), 620–637.
Feagin, J. (2006). Systemic Racism: A Theory of Oppression. Routledge.
Feagin, J.R. (2013). The White Racial Frame: Centuries of Racial Framing and Counter-framing. Routledge.
Feagin, J.R. & O’Brien, E. (2004). White Men on Race: Power, Privilege, and the Shaping of Cultural Consciousness. Beacon Press.
Ferguson, J.P. & Koning, R. (2018). Firm turnover and the return of racial establishment segregation. American Sociological Review, 83(3), 445–474.
Foucault, M. (1980). Power/Knowledge: Selected Interviews and Other Writings 1972–1977. Pantheon Books.
Gaertner, S.L. & Dovidio, J.F. (2005). Understanding and addressing contemporary racism: From aversive racism to the common ingroup identity model. Journal of Social Issues, 61(3), 615–639.
Gee, G.C. & Ford, C.L. (2011). Structural racism and health inequities: Old issues, new directions. Du Bois Review, 8(1), 115–132.
Geuss, R. (1981). The Idea of a Critical Theory: Habermas and the Frankfurt School. Cambridge University Press.
Golash-Boza, T. (2016). A critical and comprehensive sociological theory of race and racism. Sociology of Race and Ethnicity, 2(2), 129–141.
Gomez, L. (2012). Understanding law and race as mutually constitutive. Journal of Scholarly Perspectives, 8(01), 47–63.
Green, A.R., Carney, D.R., Pallin, D.J., Ngo, L.H., Raymond, K.L., Iezzoni, L.I. & Banaji, M.R. (2007). Implicit bias among physicians and its prediction of thrombolysis decisions for black and white patients. Journal of General Internal Medicine, 22(9), 1231–1238.
Griffith, D.M., Mason, M., Yonas, M., Eng, E., Jeffries, V., Plihcik, S. & Parks, B. (2007). Dismantling institutional racism: Theory and action. American Journal of Community Psychology, 39(3), 381–392.
Gullo, G.L. & Beachum, F.D. (2020). Does implicit bias matter at the administrative level? A study of principal implicit bias and the racial discipline severity gap. Teachers College Record, 122(3), 1–28.

Gullo, G.L., Capatosto, K. & Staats, C. (2019). Implicit Bias in Schools. Routledge.
Hall, M., Iceland, J. & Yi, Y. (2019). Racial separation at home and work: Segregation in residential and workplace settings. Population Research and Policy Review, 38(5), 671–694.
Hall, S. (1980). Race articulation and societies structured in dominance. In Sociological Theories: Race and Colonialism. UNESCO.
Haney López, I. (2003). Racism on Trial: The Chicano Fight for Justice. Harvard University Press.
Hochschild, J.L. & Powell, B.M. (2008). Racial reorganization and the United States Census 1850–1930: Mulattoes, half-breeds, mixed parentage, Hindoos, and the Mexican race. Studies in American Political Development, 22(1), 59–96.
Hodson, G., Dovidio, J.F. & Gaertner, S.L. (2002). Processes in racial discrimination: Differential weighting of conflicting information. Personality and Social Psychology Bulletin, 28, 460–471.
Hodson, G., Hooper, H., Dovidio, J.F. & Gaertner, S.L. (2005). Aversive racism in Britain: Legal decisions and the use of inadmissible evidence. European Journal of Social Psychology, 35(4), 437–448.
Horton, H.D. & Sykes, L.L. (2008). Critical demography and the measurement of racism. In White Logic, White Methods: Racism and Methodology. Rowman & Littlefield.
Jackman, M.R. (1994). The Velvet Glove. University of California Press.
Jacobson, M.F. (1999). Whiteness of a Different Color: European Immigrants and the Alchemy of Race. Harvard University Press.
Kendi, I.X. (2016). Stamped from the Beginning: The Definitive History of Racist Ideas in America. Nation Books.
Kovel, J. (1970). White Racism: A Psychohistory. Columbia University Press.
Krieger, N. (2014). Discrimination and health inequities. International Journal of Health Services, 44(4), 643–710.
Ladson-Billings, G. (1995). Toward a theory of culturally relevant pedagogy. American Educational Research Journal, 32(3), 465–491.
Lee, C.D., Lomotey, K. & Shujaa, M. (1990). How shall we sing our sacred song in a strange land? The dilemma of double consciousness and the complexities of an African-centered pedagogy. Journal of Education, 172(2), 45–61.
Leeman, J. (2004). Racializing language: A history of linguistic ideologies in the US Census. Journal of Language and Politics, 3(3), 507–534.
Leonardo, Z. & Grubb, W.N. (2018). Education and Racism: A Primer on Issues and Dilemmas. Routledge.
Levin, J. (2019). The Queen: The Forgotten Life Behind an American Myth. Little, Brown.
Lewis, A.E. (2004). “What group?” Studying whites and whiteness in the era of “color-blindness”. Sociological Theory, 22(4), 623–646.
Lewis, A.E., Hagerman, M.A. & Forman, T.A. (2019). The sociology of race and racism: Key concepts, contributions and debates. Equity and Excellence in Education, 52(1), 29–46.
López, N., Vargas, E., Juarez, M., Cacari-Stone, L. & Bettez, S. (2018). What’s your “street race”? Leveraging multidimensional measures of race and intersectionality for examining physical and mental health status among Latinxs. Sociology of Race and Ethnicity, 4(1), 49–66.
Massey, D.S. (2007). Categorically Unequal: The American Stratification System. Russell Sage Foundation.
Mills, C.W. (1997/2014). The Racial Contract. Cornell University Press.
Myrdal, G. (1944). An American Dilemma: The Negro Problem and Modern Democracy (Vol. II). Transaction Publishers.
Nobles, M. (2000). Shades of Citizenship: Race and the Census in Modern Politics. Stanford University Press.
Oliver, M.L. & Shapiro, T.M. (2006). Black Wealth, White Wealth: A New Perspective on Racial Inequality. Taylor & Francis.
Oluo, I. (2018). So You Want to Talk about Race. Hachette UK.
Omi, M. & Winant, H. (2015). Racial Formation in the United States. Routledge.
Park, R.E. (1914). Racial assimilation in secondary groups with particular reference to the Negro. American Journal of Sociology, 19(5), 606–623.
Park, R.E. & Burgess, E.W. (2019). The City. University of Chicago Press.
Partington, J.R. & McKie, D. (1937). Historical studies on the phlogiston theory—I. The levity of phlogiston. Annals of Science, 2(4), 361–404.
Partington, J.R. & McKie, D. (1938). Historical studies on the phlogiston theory—III. Light and heat in combustion. Annals of Science, 3(4), 337–371.
Public Counsel. (October 19, 2019). Letter to the Regents of the University of California.
Roberts, D. (2011). Fatal Invention: How Science, Politics, and Big Business Re-create Race in the Twenty-first Century. New Press.
Rothstein, R. (2017). The Color of Law: A Forgotten History of How Our Government Segregated America. Liveright Publishing.
Sabin, J.A. & Greenwald, A.G. (2012). The influence of implicit bias on treatment recommendations for 4 common pediatric conditions. American Journal of Public Health, 102(5), 988–995.
Scheurich, J.J. & Young, M.D. (1997). Coloring epistemologies: Are our research epistemologies racially biased? Educational Researcher, 26(4), 4–16.
Sharkey, P. (2013). Stuck in Place: Urban Neighborhoods and the End of Progress Toward Racial Equality. University of Chicago Press.
Sidanius, J., Levin, S. & Pratto, F. (1998). Hierarchical group relations, institutional terror, and the dynamics of the criminal justice system. In Confronting Racism: The Problem and the Response. Sage.
Stanfield, J.H. (2011). Historical Foundations of Black Reflective Sociology. Left Coast Press.
Stanfield, J.H. (2016). Black Reflective Sociology: Epistemology, Theory, and Methodology. Taylor & Francis.
Thompson, J.B. (2020). Studies in the Theory of Ideology. University of California Press.
Yeakey, C. (1979). Ethnicity as a dimension of human diversity. In Human Diversity and Pedagogy. Educational Testing Service.
Zuberi, T. & Bonilla-Silva, E. (2008). White Logic, White Methods: Racism and Methodology. Rowman & Littlefield Publishers.

4

The White Racial Frame

White racial framing involves the explanation and construction of social reality from the perspective of dominant whites, one normally steeped in Eurocentrism, thereby creating a long-lasting white racial frame. This white racial frame is a meta-structure that develops and reinforces ideas, actions, networks, institutions, and social structures according to the views and practical racial-group interests of white people.1

DOI: 10.4324/9781003228141-6

Derren Brown is a British mentalist whose performances stage various ways in which he seemingly reads people’s minds. In one situation, he invites two advertising executives into an office and allows them 30 minutes to sketch an advertisement for a taxidermy business. Before setting the two agents to the task, Derren shows them a sample of the business’s product—a stuffed cat—under which he places an envelope before he departs the room. The agents work diligently sketching ideas on a white board. A half-hour later, Derren returns to review their pitch. On the board the executives have drawn a bear playing a harp in front of pearly gates. They named the business Animal Heaven and have created the slogan “The best place for dead animals.” Derren opens the envelope he left under the stuffed cat. In it is a nearly identical logo containing a bear and pearly gates. His business is called Creature Heaven, and his slogan is remarkably similar, “Where the best dead animals go.”2

What is the secret to Derren’s trick? Prior to bringing the two executives into the room, Derren hired a taxi to transport the men from their hotel to the building in which the brainstorming room was located. The driver followed a carefully choreographed route that began with the taxi parked outside the hotel door, beside which was a flyer with an image of a bear posted on the wall the two executives pass as they enter the taxi. The taxi passes the London Zoo, where they stop beside the large gates framing its entrance. They also stop at a walkway across which a group of children pass, all wearing the same sweatshirt advertising the zoo and its gates. As they approach the building, they pass several storefronts that contain images of harps, posters inked with Derren’s slogan, and the words “creature heaven” artistically drawn on a blackboard outside a coffee shop. And in the entrance of the building, a massive stuffed bear stands tall on its hind legs.

Across his many mentalist tricks, Derren carefully seeds people with images. Once planted, these images shape their thoughts, allowing Derren to predict the ideas they share and decisions they make. As the quote opening this chapter describes, the White Racial Frame acts in a similar way, but on a much grander stage. People raised in the United States are seeded with images, ideas, thoughts, and ways of behaving that are the product of the White Racial Frame. The narratives conveyed by family members, educators, news broadcasters, and radio talk show hosts help construct the White Racial Frame. The textbooks, filmstrips, movies, songs, and speeches that form our schools’ curricular material contribute to the White Racial Frame. The television shows broadcast each day shape and reinforce the White Racial Frame. The advertisements on television, radio, magazines, buses, and billboards inundate us with images that mold the White Racial Frame. The morals and values instilled in us by our family, neighbors, faith-based leaders, teachers, and coaches all contribute to forming the White Racial Frame.

As the name makes explicit, race and the white supremacist view of the world that elevates White European customs, practices, beliefs, and institutional arrangements are core to the White Racial Frame. The messages conveyed through racialized images and narratives shape our understanding of race and our implicit beliefs about people membered into specific racialized groups. But there is more to the White Racial Frame than race.
The frame also shapes the ways in which we think about success and failure, what is good behavior and what is not, and what counts as science and what does not. Individualism, merit, and just rewards are important elements of the White Racial Frame. Valuing science, objectivity, quantification, and measurement also form that frame. And, just as twisting a kaleidoscope shifts the ways in which images are blended, the White Racial Frame adjusts the ways in which many of these aspects intersect with race. In this way, the White Racial Frame is manufactured to support contradictions required to “explain” the disparate outcomes and harms produced by the system of racism that operates in the United States. Recognizing the ways in which the White Racial Frame shapes our understanding of racialized people and the disparate outcomes they experience provides insight into “how racial structures are shaped and maintained by an array of seemingly divergent, everyday behaviors, feelings, beliefs, and practices.”3

To position us to explore the ways in which the White Racial Frame has influenced and continues to influence educational measurement, this chapter unpacks four core elements of the White Racial Frame. I begin by focusing on the racialized component of the frame, assembling ideas examined in the previous three chapters to detail their impact on the framing of educational measurement. Three other key elements of the frame are then examined: individualism and individual merit; scientific discovery; and a utilitarian view of justice.

Molding Racialism

Chapters 1 and 2 document the construction and subsequent molding of race. Chapter 3 explores how racialized ideology and narratives function as components of systemic racism to afford economic, social, and political advantage to people membered White. It is this construction and use of race that makes race a social construct. Yet, many people in the United States cling to an individualistic, biological-genetic conception of race. They see race and racism as properties of each individual and not a construction of society. Even those who use the phrase “race is a social construct” remain attracted to the idea that if people who are membered not-White tried harder or instilled better values in their children, the disparate impacts produced by racism could be addressed. And many who describe racism as systemic maintain that the best way to address racism is to change the attitudes and beliefs of individuals membered White.

What explains the stubborn hold of a conception of race as an individual biological-genetic trait and racism as a product of individual prejudice and discrimination? Joe Feagin attributes this hold to the White Racial Frame. Feagin developed the idea of a White Racial Frame during the mid-2000s in an effort to introduce a new paradigm for understanding the persistence of racism in the United States. As he explains,

Traditional social science and other mainstream academic and popular analysis has mostly portrayed U.S.
racism as mainly a matter of racial “prejudice,” “bias,” and “stereotyping”—of racial attitudes directed at outgroups that indicate an ethnocentric view of the world and incline individuals to take part in bigotry-generated discrimination.4

Feagin does not deny that prejudice, bias, and stereotyping influence the persistence of racism. However, he argues this conception focuses the locus of racism on the individual and small groups of individuals. This individual focus fails to present racism as foundational to society—to see that U.S. society is bound by a racial contract that structures society and relationships among institutions to benefit people membered White through the oppression of people membered into other racialized groups. This individual focus implies that racism will gradually disappear as people’s views are “modernized” and obscures consideration of institutional, structural, and ultimately systemic change. Charles Mills attributes this blinding feature of individual racism to the racial contract, writing:

The Racial Contract creates a racialized moral psychology—Whites will then act in racist ways while thinking of themselves as acting morally. In other words, they will experience genuine cognitive difficulties in recognizing certain behavior patterns as racist, so that quite apart from questions of motivation and bad faith they will be morally handicapped simply from the conceptual point of view in seeing and doing the right thing. As I emphasized at the start, the Racial Contract prescribes, as a condition for membership in the polity, an epistemology of ignorance.5

For Feagin, an analysis of racism as it operates in the United States must focus attention on the ideology and narratives that back systemic racism, direct attention to “the deep structural foundation in which such acts of discrimination are imbedded,” and provide insight into the ways bias, prejudice, and discriminatory thinking are reproduced intergenerationally. For Feagin, it is this ideology and associated narratives that form the White Racial Frame. And it is the White Racial Frame that directs attention away from the structural components of racism and instead directs attention to the individual components.

To be clear, the White Racial Frame does more than maintain a biological-genetic conception of race. The White Racial Frame shapes our understanding of people membered into specific racialized groups, maintaining as superior the culture, behavior, knowledge, and ways of gaining knowledge associated with people membered White.
As we saw in Chapter 2, elevating the superiority of White European culture, social structure, and scientific discovery was first planted into the minds of the Enlightened elite through the writings of Bernier, Linnaeus, Buffon, Camper, Blumenbach, Cuvier, Meiners, Voltaire, Kant, Hume, and other Enlightenment scholars.6 These ideas were fertile in the thinking of Jefferson, Jackson, Emerson, Lodge, Ripley, Roosevelt, and many others during the United States’ first century. They permeated the scientific thinking and cultural productions of the late 19th and early 20th centuries. More recently, they are conveyed through the coded rhetoric of our political leaders—Nixon’s “law and order,” Reagan’s “welfare queen” and “war on drugs,” Bush’s Willie Horton, Clinton’s “superpredators,” and Trump’s list too long to present here.

Today, racist images and ideas are broadcast widely through the media. Reflecting on my childhood, I recall the sharp contrast between the well-manicured suburban households that served as the settings for The Brady Bunch, Happy Days, Family Ties, and Leave It to Beaver and the cramped apartment, junkyard, and soiled garage that were the settings for Good Times, Sanford and Son, and Chico and the Man. It was the penthouse apartment that allowed Arnold and Willis Jackson—two orphaned children membered Black—to escape the inferred hardships they would have endured had they not been adopted by the wealthy Mr. Drummond, membered White. While some shows—Fresh Prince of Bel-Air and The Cosby Show—did depict Black households of means, this comfort was preceded by hardship—the Fresh Prince escaping from the rough-and-tumble streets of West Philadelphia and Cosby’s rise from the garbage dump adventures of Fat Albert and his gang. These images were contrasted further through the local evening news that highlighted interviews of notable local leaders—nearly all membered White—and instilled fear through the mugshots and blurred security photos of robbers, rapists, and murderers—nearly all membered Black.7

As Feagin describes, the White Racial Frame is a complex and complicated cog in the system of racism. Like a pair of glasses, the White Racial Frame can be donned or removed depending on context. Like a progressive lens, the White Racial Frame brings into focus different ideas and notions depending on where one’s attention is directed. And like a transition lens, the White Racial Frame sheds light on conceptions that reinforce systemic racism and shades those notions that do not.
Writing before Feagin introduced the paradigm of the White Racial Frame, Mills similarly noted how Whiteness shades people membered White from seeing the structural components of racism:

James Baldwin argues that white supremacy “forced [white] Americans into rationalizations so fantastic that they approached the pathological,” generating a tortured ignorance so structured that one cannot raise certain issues with whites “because even if I should speak, no one would believe me,” and paradoxically, “they would not believe me precisely because they would know that what I said was true.” Evasion and self-deception thus become the epistemic norm … [Hence] Montesquieu’s wry observation about African enslavement: “It is impossible for us to suppose these creatures to be men, because, allowing them to be men, a suspicion would follow that we ourselves are not Christians.”8

It is the White Racial Frame’s ability to be worn or removed, to alter focus, and to brighten and obscure that allows us to operate with clear contradictions. We can blame the “lack of success” of people membered not-White on “their failure” to pull themselves up by their bootstraps, “not valuing education,” “lacking family values,” or simply “not trying hard enough.” Yet, when a person membered White is not offered college admission or a new job, social policies, like affirmative action, are blamed. And when the situation is reversed, it is not the talent and hard work of people membered not-White that accounts for their success; it is a policy like affirmative action that has given them advantage. But, when a person membered White succeeds, the advantages built into our system are not credited; instead it is talent, a good upbringing, and hard work that explains success.9 Similar inconsistencies occur when people run afoul of the law; for a person membered White, their behavior is treated as an exception for an otherwise “good kid”; it is “boys being boys,” “not knowing any better.” In contrast, when a person membered not-White is murdered for passing through a neighborhood at night, jogging in a predominantly White neighborhood, resting on the couch in a wrongly entered apartment, or at the hand or knee of an officer suffocating their breath, their past infractions are highlighted and presented as a justification for the actions that led to their death, even though those infractions have nothing to do with the situation that produced their death.

Racialized ideas are a core component of the White Racial Frame. Other authors—Ibram Kendi, Nell Irvin Painter, Winthrop Jordan—have documented extensively the development and persistence of many of these ideas, so I will not recount them here. This framing was present in those very first definitions of racialized groups, which included demeaning descriptions, and in some cases sketches, of the physical features, psychological characteristics, and social structures that accompanied Linnaeus and Buffon’s racial stratifications.10 Since then, they have permeated written narratives and the sketches accompanying them, political speeches and judicial rulings, and cartoons printed in periodicals in the 19th and early 20th centuries and then broadcast as moving voiced images on national television beginning in the 1950s, and they persist in our modern media.
James Scheurich and Michelle Young similarly explore the various ways in which conceptions of family structure are shaped by our society and the damaging impact this structuring has on people membered into nondominant groups:

the socially promoted idea—through the media, through legal practices, through governmental programs—of what a good family is, is primarily drawn from the dominant culture’s social, historical experience, that is societal racism. The privileging of one view over others, like the favoring of a White middle-class view of families over an African American view of families, results in social practices that have direct negative effects on families that deviate from the dominant norm.11

These many depictions produce conceptions that construct and reinforce the superiority of culture, values, family structures, discourse, and ways of knowing that first emerged in White (Anglo-Saxon) Europe.12

The dominant racial frame becomes implanted in the neural linkages … by the process of constant repetition of its elements … For most whites the dominant frame has become so fundamental that few are able to see it or assess it critically.13

(Re)constructing racialism is a central function of the White Racial Frame. Backing these racialized ideas, however, are “several ‘big picture’ narratives … with morals that are especially important to white Americans. These emotion-laden scenarios include stories about white conquest, superiority, hard work, and achievement.”14 Rugged individualism, social Darwinism, determinism, universalism, objectivity, and merit all play supporting roles in maintaining the racialized/racist conceptions promulgated by the White Racial Frame.15 It is to these supporting components that we now turn.

Individualism and Individual Merit

Individualism and the rugged individual are deeply rooted components of the dominant ideology developed by U.S. society. The idea of people operating as individuals, of their own free will, pursuing life, liberty, and happiness, was understood as an unalienable right of the (White) people residing in what became the United States. To protect the rights of each individual, several amendments have been, and continue to be, added to the U.S. Constitution. As Feagin describes, U.S. history—as told by the White elites who have controlled, and continue to control, publishing agencies, media outlets, and school curricula—is replete with “stories about white conquest, superiority, hard work, and achievement.”16 The very first Europeans who arrived in colonial America are glorified for their individual conquest. As the story is told, “English ‘settlers’ came with little, but drawing on religious faith and hard work they ‘settled’ and made a nearly ‘vacant’ land prosper, against ‘savage’ Indians.”17 This tale of individual conquest was then extended to the pioneers who expanded U.S. borders westward. A counter-narrative, however, suggests these tales may be more myth than legend.
In his book, Reading the Forested Landscape, Tom Wessels, a natural biologist with deep knowledge of New England woodlands, documents a very different landscape to which the early “English settlers” arrived. Whereas the myth produces images of dense, dark primeval forests, thick with undergrowth that required enormous energy to tame, Wessels unveils a landscape that was managed for centuries by the indigenous inhabitants. As Wessels describes:

Fire was the first landscape management tool used by humans … When the first explorers and settlers ventured from Europe to New England, Native American use of fire was well at hand, especially around the coastal regions of southern and central New England … Thomas Morton, who explored the coastal regions of Massachusetts, New Hampshire, and Maine, wrote in his 1632 book, New English Canaan, “The Salvages [sic] are accustomed, to set fire in the Country, in all places where they come; and to burne it, twize a yeare, vixe at the Springe, and the fall of the leafe. The reason that mooves them to doe so, is because it would other wise be so overgrown with underweeds, that it would be all a coppice wood, and the people would not be able to passe through the Country out of a beaten path.”18

Managing the New England forests by controlled burns not only made them easier to navigate, it also controlled mosquito and blackfly populations, made it easier to stalk game quietly, aided in the gathering of nuts, and improved growing conditions for forest berries. Wessels also notes that “just as thousands of years of burning by Native Americans created an extensive prairie in parts of Wisconsin and Illinois … burning in New England … created a coastal savannah” that lined the coast from what is now Portland, Maine, through Connecticut. This open prairieland, extending up to five miles inland from the coast, provided supple nesting grounds for a variety of game fowl which the people indigenous to the land, and later the European conquerors, culled for sustenance.19 Although European settlers undoubtedly encountered a wide range of hardships as they plundered Indigenous lands, taming the forests was not nearly as arduous as the myth of colonial settlement portrays.

The same exaggeration holds for the settling of western lands. The dominant narratives portray rugged individuals who claimed, cleared, and fiercely defended their stolen land. As Woodrow Wilson described, “Steadily, almost calmly, they extended, [across] this great continent, then wild and silent.”20 In many cases, however, the land had already been cleared by the Indigenous people. Moreover, the U.S.
cavalry often arrived long before the settlers to rid the land of its Indigenous people. And the claiming of land was often granted a priori by the U.S. government, which was eager for U.S. citizens to occupy the stolen land. As Greg Grandin describes it:

what we think of the West, since its inception, has been the domain of large-scale power, of highly capitalized speculators, businesses, railroads, agriculture, and mining … These markets were created through federal action, by, among other things, federal gunboats. Western movement required a strong state. The U.S. Army removed Native Americans and Mexicans. Government-backed bonds financed the purchase of Louisiana. Federal surveyors plotted out their baselines and principal meridians well in advance of the settled frontier, and federal engineers laid out roads. Public-works projects, many of them carried out by the Corps of Engineers, irrigated arid lands in the West and drained swampy lands in Florida. And the secretary of war distributed rifles and ammunition to settlers.21

Neither the taming of New England’s “primeval” forests nor the conquest of the West was truly an individual pursuit. Yet, the individualism that forms a pillar of the White Racial Frame requires stories that elevate the individual and downplay the role context and social institutions play in shaping the success of predominantly White colonialists and western pioneers.

Individualism leads us to focus on characteristics of the individual, and most importantly characteristics the individual is believed to have developed (or not developed) for themselves. When theorizing social-economic status, this leads to a focus on income the individual “earns” rather than the wealth one possesses. In turn this leads us to overlook how that wealth was acquired, whether it be through savings, investments, or inheritance. Yet, as Ezekiel Dixon-Román observes,

it is wealth that has been found to have an effect on offspring outcomes such as educational attainment and labor market participation over and above parental permanent income, educational attainment and occupational prestige (Conley 1999). Additionally parental liquid assets accounted for a substantial proportion of the racial differences in offspring outcomes.22

The elevation of individualism directs attention away from social, economic, and structural factors that influence a wide variety of outcomes, and instead suggests that differences in outcomes are a product of individual merit. As Dixon-Román observes, “The United States has rested its values of progress and democracy on ideologies of meritocracy. [And] it has been via ideologies of meritocracy that selection, inequality, and social im/mobility has been legitimized.”23 As far back as the 1832 Reform Act, meritocracy was promoted as a tool to combat a socio-political-economic system based on corruption and family connections.
The elevation of merit led to a “triumph of the entrepreneurial ideal”; an ideal that shifted value away from inherited wealth and social rank to capital acquisition, “with competition being the unbiased arbiter of [individual] effort.”24 Yet, as sociologist Thomas Shapiro, author of The Hidden Cost of Being African American, observes:

American society is of two minds about inheritance and we seem to want it both ways. We take pride in our accomplishments, often marking them in monetary terms, and see nothing wrong in passing on what we earned to our children. Indeed, part of the motivation for working hard and acquiring things includes bettering our family and our children for future generations. This notion, however, collides with the equally strongly held notion of meritocracy because inheritances are unearned, represent a different playing field entirely, and have precious little to do with merit, achievements, or accomplishments. We live with this duality, partly because we deny what inheritances represent, partly because we see it in individual and family terms, and partly because the current political balance heavily favors those with advantages and privileges.25

This conflict is evident in the stories people membered White convey about their own success. Eduardo Bonilla-Silva and his team have interviewed hundreds of people, many of them college students. In these interviews, students routinely attribute their position in college to their hard work and see their status as a product of merit.26 “They refer also to their parents’ hard work that led to them being able to afford to live in exclusive suburbs. They have little to no sense of the hard work millions of others put in that got them nowhere.”27 On the one hand, their position is a product of their individual effort. On the other, they recognize the role their family’s position plays in shaping the opportunities they are afforded. Yet, they fail to recognize factors beyond their household that have created their neighborhoods, schools, and the many other structures that position their effort to yield successful outcomes. And they fail to recognize that the neighborhoods in which most other people live do not provide the same or similar structures designed to provide them with advantage. Ignored in the ideology of meritocracy is the influence opportunity plays in shaping the impact of individual effort.
It is here that the rags-to-riches narratives depicting the rise of individuals whose families were not of means, of people who seem to be immune to the prejudice and discrimination experienced due to their racialized membering, allow the White Racial Frame to ignore the role inequity plays in shaping opportunities and ultimately success. Dixon-Román views the idea of equal opportunity for success regardless of inequality as a fatal assumption in the ideology of meritocracy. From his perspective, “meritocracy has become the ideological and systemic legitimation for the neoconservative political interest in rugged individualism and is complicit in the persistent reproduction of social inequality.”28 The idea that individual effort is responsible for an outcome, regardless of the advantages or disadvantages provided by the context in which one operates, together with an understanding that inherited wealth is a just reward for efforts made by previous generations, creates a contradiction the White Racial Frame manages to support. Reflecting this circular logic, Charles Mills posits, “You are what you are in part because you originate from a certain kind of space, and that space has those properties in part because it is inhabited by creatures like yourself.”29

Social Darwinism and Genetic Determinism

Social Darwinism and genetic determinism are extensions of individualism that shift the idea of merit from a product of individual talent and effort to characteristics and behaviors that are inherent to an individual. The quality of a person’s inherent traits makes success or failure a natural occurrence that results from being fit or unfit for survival in a changing society. As we will explore in Chapters 5 and 6, social Darwinism and genetic determinism served as stepping stones to eugenics. The extension of individualism to social Darwinism and genetic determinism begins with Charles Darwin, and hops forward to Herbert Spencer, but traces back to Thomas Malthus. It is said that no other person had a larger impact on science during the 19th and early 20th centuries than Charles Darwin. Darwin’s articulation of the role natural selection plays in the evolutionary process fundamentally changed the ways in which people thought about nature. Darwin’s ideas about natural selection also had profound impacts on how many people thought about society and the gross inequalities that expanded as Europe and the United States industrialized. Industrialization led many people to abandon agricultural and subsistence farming and to instead flock to rapidly growing cities for employment. Industrial expansion also attracted large numbers of immigrants from regions of Europe that were slower to industrialize. The influx of people to urban areas created a variety of social stressors—overcrowding, exhaustion from long working hours, and health risks due to pollution and poor sanitary conditions. In turn, these stressors fostered increased consumption of alcohol, mental illness, spousal tension, crime, murder, and poverty. Although it was Herbert Spencer, and not Darwin, who introduced the phrase “survival of the fittest,” Darwin came to embrace this idea and added the phrasing to later versions of his seminal work. 
The notion of survival of the fittest transferred Darwin’s concept of natural selection from a natural biological setting and applied it to society. Put simply, Spencer’s logic posited that “if biological organisms evolved gradually by elimination of those individuals least fitted for survival … then social organisms must evolve … by the same process of elimination.”30 On the surface, the concept of survival of the fittest was attractive to those concerned about the growing social tensions produced by industrialization because it provided a simple explanation for the challenges many people encountered—some people were simply not fit to survive in the evolving social-economic context. The broad adoption of social Darwinism during the latter part of the 19th and early 20th centuries, however, has much deeper and darker roots.

Both Darwin and Alfred Wallace, who contemporaneously developed the theory of evolution by natural selection, credit Thomas Malthus’s influence on their thinking. Born a half-century before Darwin, Malthus trained first as a minister before being appointed the first professor of history and political economy in a British university. Malthus was deeply conservative and strongly opposed to the use of taxation to provide social supports. In his influential, yet terrifying, book titled Essay on the Principle of Population, Malthus “disclaim[ed] the right of the poor to support” and instead advocated that “we should facilitate, instead of foolishly and vainly endeavouring to impede, the operations of nature in producing this mortality [of the poor].”31 In addition to not impeding nature’s production of mortality, Malthus advocated policy and actions that hastened the death of the poor:

Instead of recommending cleanliness to the poor, we should encourage contrary habits. In our towns we should make the streets narrower, crowd more people into the houses, and court the return of the plague. In the country, we should build our villages near stagnant pools … But above all, we should reprobate specific remedies for ravaging diseases.32

As Allan Chase, a scholar of social biology and author of The Legacy of Malthus, summarizes,

To Malthus, any measures that eased the lot of the greatest numbers of people—from sanitary reform and medical care to birth control and, above all else, higher wages—were not only immoral and unpatriotic but also against the laws of God and Nature.33

It was these ideas of ridding English society of the poor through natural selection, and the hastening of that selection, that seeded Darwin’s conception of natural selection and Spencer’s survival of the fittest.
Malthus’s more brutal views on population control were also adopted by Spencer, who saw social programs designed to support the poor as “perversions of the Darwinian laws of natural selection by ‘the preservation of those least able to take care of themselves.’”34 In the United States, Spencer was a frequent lecturer and his work was widely read. In turn, his ideas regarding survival of the fittest and social legislation were embraced as a defense for a variety of social, political, and economic views. Writing to Spencer in 1866, Henry Beecher observed, “The peculiar condition of American society has made your writings far more fruitful and quickening here than in Europe.”35 Similarly, Oliver Wendell Holmes Jr. noted of Spencer that there was not “any writer of English except Darwin [who] has done so much to affect our whole way of thinking about the universe.”36 During the first decades of the 20th century, business tycoons John D. Rockefeller, Andrew Carnegie, and James Hill proclaimed that the rapid and expansive growth of their businesses was “merely a survival of the fittest,” “the working out of a law of nature,” “the truth of evolution,” and “determined by the law of the survival of the fittest.”37 Others drew on survival of the fittest to defend U.S. imperial expansion, arguing that U.S. annexation of distant lands was a natural product of survival of the fittest.38

Perhaps the most prolific and influential social Darwinist of the late 19th century was William Graham Sumner, a professor of political science at Yale University. Through his many writings Sumner defended gross inequities in the acquisition of wealth, argued against social welfare programs, and supported nature taking its course unimpeded for those of lesser means. As he saw it, the nation cannot go outside of this alternative: liberty, inequality, survival of the fittest; not-liberty, equality, survival of the unfittest. The former carries society forward and favors all its best members; the latter carries society downwards and favors all its worst members.39

With these dominant views of “fit” and “unfit” molded by Spencer and affirmed by influential leaders in all sectors of society, the table was set for Gregor Mendel’s work on hereditary traits to take hold in the United States.40 Mendel’s famous pea plant experiments that led to his discovery of the heritability of dominant and recessive traits were published in the 1860s. But it was not until the late 19th century that Mendel’s theory of heredity gained widespread attention in the United States. With growing concern about intergenerational poverty, alcoholism, “feeble-mindedness,” and other “traits” of the “unfit,” Mendel’s theory of heredity provided an attractive explanation: traits that make people “fit” or “unfit” are passed down generationally. It was this connection between an individual’s traits and familial genes that led to the idea of scientific determinism.
And it was scientific determinism that further evolved the concept of individualism to incorporate the inheritance of traits, be they natural talents that positioned one for success, or undesirable inherited dispositions that made one “unfit.” Embracing the theory of hereditary social traits also allowed those opposed to social legislation to ignore the influence environmental factors have on the formation of physical ailments and social behaviors. Given the racialized tensions of the time, this notion of heritable traits was also employed to provide a “natural” explanation for racialized inequalities. Whether one focused on racialized superiority/inferiority, social-class differences, deviant social behaviors, physical ills, or mental constructions such as gifted or “feeble-minded,” heredity and genetic determinism provided an attractive explanation that dismissed the role social and political influences played in producing differences.

As this brief history unveils, individualism was a component of the ideology operating in the United States during the early 19th century. Darwin’s theory of biological evolution, Spencer’s extension of survival of the fittest to society, and Mendel’s theory of heredity elevated attention on the individual and the traits passed to and by each individual. Success was the product of one’s effort and inherited exceptionalism. Failure was the product of the inheritance of inferior traits and natural (de)selection. And all of these ideas were important components of the White Racial Frame that was donned to explain the supremacy of the White elite and the struggles of all other racialized groups of people.41

Scientific Discovery

The role scientific discovery plays in knowledge acquisition and constituting legitimate knowledge is also a key component of the White Racial Frame. Acquiring knowledge through scientific discovery is perhaps the most important and enduring contribution the Enlightenment made to modern society. Prior to the Enlightenment, knowledge flowed from religious doctrine and mystical revelation—the world was the way it was because God made it that way, and understanding of the world was documented in religious texts or made known through mystical revelation. Establishing empiricism during the Enlightenment as the dominant epistemology was foundational for the development of the scientific method. Empiricism shifted the acquisition of knowledge from religious orders to human sensory perception and reason. As John Stanfield describes,

Unlike theological cognition, scientific cognition did not freeze conceptions of the world in a priori interpretations. Science was based on human reason and progress. This cognitive style allowed humans to control their destiny and to master their surroundings through the exercise of reason.42

Empiricism provided agency for humans to develop new knowledge and deepen understanding of the natural world.
Establishing human sensory perception as a tool for discovery also elevated inductive reasoning as the primary means of discovery. Observation allowed people to gain knowledge, which was then applied to develop theory. Theory was tested by collecting additional sensory information and comparing what was observed with what was predicted to occur.43 With increased human agency in the discovery of scientifically based knowledge, several outlooks emerged that were influential for the development of educational measurement. Among these outlooks are the positivist paradigm, universalism, objectivity, quantification, measurement, and standardization.

In its classic form, positivism44 is committed to an empiricist epistemology which “holds that knowledge stems from sense-data inputs [and] our ability to observe patterns.”45 Positivism directs the acquisition and application of knowledge to support positive advancements for humans and their society.46 The positivist researcher aims to “observe the essential elements of the phenomena in question (i.e., the ‘essences’) and render them in systematic and explicit (preferably, mathematical or quantitative) form.”47 As philosopher Harry Acton describes, “Positivists believe it is futile to deduce or demonstrate truths about the world from alleged self-evident premises that are not based primarily on sense perception.”48 Positivist research functions as a quest for determinacy through the discovery of laws that govern nature.

Undergirding positivism is a belief in universalism; the notion that “a single reality exists and the epistemological claim that this reality can be known objectively.”49 A key assumption of universalism is that nature is uniform—a discovery made by one individual in one locale generalizes to all locales.50 In this way, “experimental results, which can normally be witnessed by only a few people, came to be accepted as truthful by nearly everyone.”51 Given a single universal reality, positivism embraces the notion that all sensory observations can be reduced to functions and formulas that reflect or provide models for the laws governing nature. Focusing specifically on biological discovery, Dixon-Román observes, “Biology has been assumed to consist of fixed and predetermined characteristics and processes of the natural world … This assumes that there is a precise calculus of the natural world that can be deciphered and ascertained.”52 Under the positivist paradigm, scientific discovery aims to unveil universal laws of nature and to apply those laws for the betterment of humankind.
For two centuries, the positivist scientific paradigm understood this method of discovery as a process of verification; experimentation and observations were made to verify theory. In the mid-20th century, Karl Popper challenged the notion that the “truth” of a theory could be verified. Instead, Popper reoriented the scientific process to falsification; that is, the testing of a theory and tentatively accepting that theory as long as no evidence exists that challenges propositions based on that theory.53 Despite this important reorientation, empiricism and positivism centered humans in the construction of new knowledge. As Stanfield describes, positivism “require[s] human construction of the instruments of data collection and evaluation, human presentation and interpretation of collected data, and human decisions about how the data were to be used.”54

Centering the human in the development of knowledge raised concerns about the objectivity of knowledge produced by humans. In his book Trust in Numbers, Theodore Porter notes, “‘Objectivity’ arouses the passions as few other words can … In most contexts, objectivity means fairness and impartiality. Someone who ‘isn’t objective’ has allowed prejudice or self-interest to distort a judgment.”55 From a philosophical perspective, objectivity is closely aligned with realism—that is, an object or phenomenon exists independent of the mind that observes that object or phenomenon. In contrast, subjectivity “refers to ideas and beliefs that exist only in the mind. When philosophers speak of the objectivity of science, they generally mean its ability to know things as they really are.”56

In the positivist tradition, objectivity is maintained in at least two ways. First, when engaged in research, the researcher maintains objectivity by applying commonly accepted methods in a consistent, rigorous, and well-documented manner—that is, the rules of scientific method are adhered to. Rules serve as a check on personal biases that may influence findings of a study. In this way, “positivistic science facilitated the control and manipulation of environments by developing rules of data selection, collection procedures, and analysis.”57 A second check on subjectivity occurs by developing consensus within the scientific community about the acceptability of the findings and coherence with current understanding. Today, the peer review publication process serves as the predominant mechanism for consensus building. Porter refers to the first check as a mechanical form of objectivity—one mechanically follows established procedures to curb one’s subjectivity. He terms the second check disciplinary objectivity. Within the positivist paradigm, mechanical and disciplinary objectivity are viewed as “rigorous method, enforced by disciplinary peers, canceling the biases of the knower and leading ineluctably to valid conclusion.”58 This perspective differs from the philosophical perspective on objectivity, however: despite mechanical and disciplinary checks, the scientific community may be mistaken in its consensus about something that does not actually reflect reality.
As an example, recall the general acceptance of phlogiston and the four humors by the scientific community discussed in Chapter 2. Paul Feyerabend, a late philosopher of science, similarly believes mechanical objectivity is problematic for the advancement of scientific knowledge. In his book, Against Method, Feyerabend writes, “A complex medium containing surprising and unforeseen developments demands complex procedures and defies analysis on the basis of rules which have been set up in advance and without regard to the ever-changing conditions of history.”59 Feyerabend argues that strict adherence to accepted method impedes discovery because methods impose restrictions on what is knowable and how it comes to be known. In addition, mechanical objectivity excludes the everyday person who is not well steeped in methodological training from participating in the acts of discovery. Yet, some of the most groundbreaking insights come from the non-methodologically based observations of the everyday person.60 Porter similarly questions the process of scientific knowledge development through the positivist paradigm, writing:

Through what specific social processes is scientific knowledge made? How wide a circle of inquirers and judges is involved in the process of deciding what is true? The standard view has long held that in mature sciences, the truth is worked out or negotiated by a community of disciplinary specialists whose institutions are strong enough to screen out social ideologies and political demands [but] the effectiveness of this segregation has been exaggerated … the sciences have been compelled to redefine their proper domain in order to monopolize it, and that much of what passes for scientific method is a contrivance of weak communities, partly in response to the vulnerability of science to pressures from outside.61

Both Feyerabend’s and Porter’s criticisms point to the subjectivity that inevitably creeps into scientific inquiry conducted within an allegedly objective positivist paradigm. Strict adherence to methodological rigor and keeping to tradition limit the types of questions that can be explored. Reliance on the scientific community to police what passes as objective knowledge obtained through rigorous replicable procedures also limits who is permitted to engage in scientific inquiry and what passes as acceptable scientific knowledge. As we see in Chapter 8, this approach to establishing the objectivity of scientific findings contributed to what some have termed a replication crisis in the social sciences and slowed the development of both Bayesian statistical techniques and causal methods of analysis.62 Despite these modern criticisms, positivism provided the methodological foundation of the early social sciences.63 The French philosopher Auguste Comte is commonly recognized as the founder of sociology and is credited with applying the positivist paradigm to the study of social phenomena.
Observing the social disorder that followed the French Revolution, Comte was committed to applying positivist scientific methodology and methods to understand and improve society. Through positivist inquiry, Comte believed the laws governing human social organization were discoverable just as they are in the physical sciences.64 As Celine-Marie Pascale, a sociologist who specializes in language and society, describes, Comte’s positivism articulated a search for laws of social life that could stand as equivalents to the natural laws of the physical sciences. It is anchored to the same ontological premise of the natural sciences: the world exists as an objective entity and is (at least in principle) knowable in its entirety; epistemologically, the tasks of the researcher are first to describe the reality accurately and then to analyze the results.65 From a positivist perspective, understanding is akin to measuring. This focus on measuring created a quantitative imperative within the social sciences.66 For Comte, quantification and mathematics constituted “the most

powerful instrument that the human mind can employ in investigating the laws of natural phenomena.”67 Comte and the social scientists that followed viewed quantitative measurements as value-free “abstract and objective extractions from the natural world … [that] provide information about the ‘truth’ of the world.”68 To this day, measurement and quantification are viewed by many in the social sciences and the public more broadly as an objective, precise tool that provides insight into the “truth” of both natural and social phenomena.69 As Porter explains, The appeal of numbers is especially compelling to bureaucratic officials who lack the mandate of a popular election, or divine right. Arbitrariness and bias are the most usual ground upon which such officials are criticized. A decision made by the numbers (or by explicit rules of some other sort) has at least the appearance of being fair and impersonal. Scientific objectivity thus provides an answer to a moral demand for impartiality and fairness. Quantification lends authority to officials who have very little of their own.70 Decisions made based on mathematical analyses of quantified data create the aura of fair, impersonal, objective precision.71 Increasing reliance on quantifiable measures as a tool of science created a demand for standardization. No doubt, the standardization of measures has a long history and was initially driven by economic and administrative purposes. As an example, the administrative needs of the church and state to regulate the timing of tax payments, reporting for military service, celebration of Lent and Easter, and the daily observance of morning prayer inspired standardizations of calendars and clocks.72 Similarly, the need to document land ownership led to the development of the square grid.
Although mapping square grids required a skilled and well-organized labor force, once square grids were measured, they “permitted land claims to be registered and enforced from hundreds of miles away, with a bare minimum of judgment or local knowledge.”73 For the advancement of science and industry, standardization of measures was requisite. In order to replicate experiments, it was essential that two scientists operated with quantities measured with the same standardized units. Absent standardized measures, it was impossible to compare the consistency of results across replications. To produce technologies that were the product of scientific discovery, standardized measures were similarly essential. As Porter conveys: The standardization of insulin provides a good illustration of the system in action. The Toronto researchers who discovered it initially defined a

unit as the dose required to produce a certain degree of hypoglycemia in rabbits weighing two kilograms. But, as was pointed out by the leading British researcher on biological standardization, Henry H. Dale, such a unit could not “maintain the requisite uniformity when determined in different institutions in a number of different countries, on animals kept under different conditions.”74 Together, the empirical positivist paradigm’s universalist assumption and emphasis on replication led the physical and social sciences to embrace standardization of both the methods and the measures used to discover and apply new knowledge for the betterment of humankind. Once established, both a method and a measure become the norm to which new innovations are compared. And when new innovations produce results and insights that differ from that which is produced by the status quo, the field is quick to reject the potential advances and innovations.75 Similarly, reliance on the physical sciences as a model for investigating social phenomena limits both the range of phenomena that are examined and the extent to which context influences the occurrence and outcomes of those phenomena.76 For the White Racial Frame, obscuring the role context plays in producing outcomes supports narratives that direct attention to the individual, their traits, and the degree to which their lived experiences are merited. Embracing the established scientific method that focuses attention on the individual and/or pathologies of people membered non-White also helps protect these individualistic, racialized storylines. As we will explore in Part III, this emphasis on standardization and replicability also limits both methodological innovations and consideration of the role context plays in affecting processes and outcomes.77

Utilitarian View of Justice

Modern philosophers have introduced several theories of justice. Some consider justice as applied broadly within society (a.k.a. social justice). As examples, John Stuart Mill and, later, Henry Sidgwick propose a utilitarian conception of social justice, Robert Nozick a libertarian conception, and John Rawls a conception of justice centered on fairness.78 Other theories consider justice within a sector or institution of society. For example, within the domain of criminal justice, at least three theories have been introduced, namely restorative justice, redistributive justice, and transformative justice. At the highest level, theories of justice explore principles aimed at balancing individual rights, fairness, and equality within the context of a functioning and sustainable civil society or sector of society. An essential tension explored in theories of social justice focuses on benefits

and harms produced by the distribution of power, property, and duties required for and produced through social cooperation.79 In a capitalist democratic republic such as the United States, the utilitarian philosophy most closely represents the principles of social justice applied to guide and maintain civil society. In its most basic form, the utilitarian view of justice maintains that “society is rightly ordered, and therefore just, when its major institutions are arranged so as to achieve the greatest net benefit of satisfaction summed over all the individuals belonging to it.”80 Here satisfaction focuses on the distribution of a given form of property (e.g., salary, financial wealth, real estate, etc.), power (political, economic, social), or duty (elected office, military service, essential service, parent/guardianship, etc.) and the various benefits and harms produced by that distribution. From a utilitarian perspective, the degree to which satisfaction is distributed evenly or unevenly among individual members or subgroups of a society is of no concern. Similarly, whether and how much one set of people are harmed while another group benefits is not considered. What matters is that the distribution maximizes the sum-total level of satisfaction. In its most basic form, the utilitarian view allows for this maximization to occur through the denial of rights and the production of dissatisfaction for some within society.81 To achieve maximum net satisfaction, the utilitarian view allows people with higher levels of power to establish rules that further maximize their satisfaction, even to the detriment of those out of power, as long as net satisfaction is increased. In this way, the utilitarian view allows one subgroup of society to act in self-interested ways to benefit their level of satisfaction even though harm to other subgroups may result.
A utilitarian view of justice was essential for the justification of slavery, the Jim Crow laws, and the many disparities in outcomes that persist today. From the utilitarian view, not only is the unequal distribution of resources tolerated, but harm to some is supported as long as the nation as a whole advances economically. During each of these eras—slavery, Jim Crow, and today—the nation experienced tremendous economic growth. From its inception and up to the Civil War, the United States rapidly grew from a struggling postcolonial nation-state to a nation that dominated the textile industry. During the Jim Crow era, the United States developed into a leading industrial nation. And during the 1990s and into the 2000s, the United States has prospered through its advancement as a leader in the technology industry. During each of these periods, the United States experienced large increases in its gross domestic product, and the average household income increased notably. These increases, however, were not distributed evenly, as evidenced by the large disparities in wealth accumulated by plantation and textile mill owners, corporate industrial leaders, and, more recently, the entrepreneurs and investors who helped launch what have become tech giants. Moreover, the increased economic benefits

experienced by many U.S. citizens during each of these periods came at the expense of the people enslaved and factory workers who endured both dangerous working conditions and appalling living environments. Today, the harms experienced by some are less visible. But, as an example, consider the rapid and massive growth of the prison industry over the past 30 years. As Michelle Alexander documents, increased imprisonment has caused considerable harm for those imprisoned—many of whom are membered Black and Latine. Yet it is their harm that has brought economic benefits to corporate leaders and stable employment to prison guards, as well as the many others who work within the criminal justice system. During each of these eras, a theory of justice that prioritizes utility has tolerated and continues to tolerate both the unequal distribution of resources and the gain of many through harm to some—most often those who are othered. In Part III, I argue that educational measurement also operates under a utilitarian theory of justice that allows harm to some for net benefit summed across all. In Chapter 13, I explore a shift in orientation from a utilitarian view to one that centers on fairness and prioritizes rectifying disadvantage to some that has been produced while providing benefit to others.

The White Racial Frame and Systemic Racism

The White Racial Frame forms a central component of the ideology that undergirds systemic racism. Like the concept of race, the components forming the White Racial Frame have evolved and expanded over time. Core to the White Racial Frame is a white supremacist view of the world that emerged during the Enlightenment, was imported to the British colonies, and has persisted in the United States ever since. In the United States, the white supremacist view embraces norms that evolved from white European customs, practices, beliefs, and institutional arrangements.
People membered White are understood as intellectually and morally superior to all other racialized groups. Three tentacles of the White Racial Frame function to support and confirm this white supremacist world view. Individualism, individual merit, and the heritability of mental and moral traits are concepts that emerged during the 19th century and were incorporated into the White Racial Frame to direct focus on the individual as the locus of responsibility for success and failure. Scientific pursuit conducted through a quantitative positivist lens similarly emerged during the 19th century and directs attention at developing universal social laws that apply regardless of context. Together, individualism and positivistic inquiry direct the focus of study on individuals and groups of individuals and distract attention from social and environmental structures that influence the production of outcomes. This distraction functions to protect the system of oppression from close scrutiny. Instead,

investigation focuses on exploring differences among individuals and groups of individuals to locate cause within the individual or group, which in turn produces narratives of deficit and pathology. Finally, a utilitarian conception of justice articulated during the first decade of the 20th century was integrated into the White Racial Frame to justify the production of disparate outcomes manufactured through oppression. Gains acquired by the dominant group through the oppression of the othered are understood as just if society, as a whole, benefits or advances. As is explored next in Part II, these components of the White Racial Frame influenced the questions of interest, the methods employed, and the stories told during the early phases of educational measurement’s development. While the racist beliefs that dominated the thinking of the field’s pioneers no longer operate in an overt manner, the framing of questions, methods, and narratives was carried forward to influence educational measurement as it operates today. As we will see come the end of Part II, this lasting impact allows educational measurement to function as apparatus for the system of racism that operates in the United States today.

Notes

1 Elias and Feagin (2016), p. 7. 2 I first learned of Brown’s trick during an NPR broadcast of Hidden Brain. To see a short video documenting Darren’s taxidermy advertising trick, see https://www.youtube.com/watch?v=JZbSctDyG24 3 Bracey et al. (2017), p. 45. 4 Feagin (2013), p. 3. 5 Mills (1997), p. 93. 6 See Mills (1997), pp. 59–60; Eze (2001); Painter (2010). 7 For an extensive analysis of racialized narratives conveyed through the news, see Campbell et al. (2013). 8 Mills (1997), p. 97. 9 Bonilla-Silva (2018). 10 Smith (2012) suggests that the depictions of monsters and half-men produced during medieval times seeded depictions of people membered non-White during and since the Enlightenment.
11 Scheurich and Young (1997), p. 6. 12 Valdivia (2002), among the many feminist scholars, argues the media similarly produce conceptions that construct and reinforce the superiority of men. Valdivia writes that it was the mind-body divide that positioned men as rational and thinking subjects whereas women were driven by emotion, which disqualified women at a conceptual level and rendered them objects to the male subject. In terms of media, have women been taken seriously as autonomous speakers and actors? Until quite recently, for example, most women in the news were described in terms of their relation to men: wife, mother, grandmother, daughter, and so forth. Even somebody as prominent as Indira Ghandi [sic] was often described as a grandmother. (p. 436)

13 Feagin (2013), p. 15. 14 Feagin (2013), p. 13. 15 In addition to Feagin (2013), see also Stanfield (2016) and Omi and Winant (2015) who include among the values “the American ‘civil religion’ of individualism, equality, compensation, opportunity, and the accessibility of ‘the American dream’ to all who strove for it” (p. 254). 16 Feagin (2013), p. 13. 17 Feagin (2013), p. 13. 18 Wessels (1997), p. 34. 19 Wessels (1997), see pp. 34–39. 20 Wilson quoted in Grandin (2019), pp. 122–123. 21 Grandin (2019), pp. 122–123. 22 Dixon-Román (2017), p. XXII. 23 Dixon-Román (2017), p. XXVII. 24 Claeys (2000), p. 235. 25 Shapiro quoted in Dixon-Román (2017), p. 112. 26 Bonilla-Silva and Forman (2000). 27 Lewis (2004), p. 641. 28 Dixon-Román (2017), p. 116. 29 Mills (1997), p. 42. 30 Taylor (1981), p. 451. 31 Malthus quoted in Chase (1975/1980), p. 6. 32 Malthus quoted in Chase (1975/1980), p. 6. 33 Chase (1975/1980), p. 6. 34 Chase (1975/1980), p. 8. Also see Dennis (1995), p. 244. 35 Beecher quoted in Hofstadter (1944/1992), p. 31. 36 Holmes quoted in Hofstadter (1944/1992), p. 32. 37 Hofstadter (1944/1992), p. 45. 38 McHugh (1980), p. 164 and Claeys (2000), p. 237. 39 Sumner quoted in Hofstadter (1944/1992), p. 51. 40 Jackson and Weidman (2005). 41 Recall that during the late 19th and early 20th centuries, racialization of people expanded to include non-Anglo-Saxon Europeans who immigrated to the United States and the poor, chronically ill, and/or “feeble-minded” in Britain. 42 Stanfield (2016), p. 21. 43 Acton (1951), p. 291. 44 When using the term positivism, I refer to the ideas, aims, and ways of knowing introduced by Comte and refined during the 19th century—a form of positivism that has been referred to as social or sociological positivism.
In the broadest sense, this early “school” of positivism aimed to apply methods of discovery similar to those employed in the physical sciences to form laws of nature in order to discover social “facts” and social “laws.” Over time, positivism developed through various stages including logical positivism and post-positivism. These latter developments are not the focus of discussion in this chapter or the critique made in Critical Theory explored in Chapter 10. 45 Cruickshank (2012), p. 72. 46 Cruickshank (2012). 47 McGrath and Johnson (2003), p. 34. 48 Acton (1951), p. 291. 49 Pascale (2010), p. 156. 50 Porter (1995), p. 13.

51 Porter (1995), p. 15. 52 Dixon-Román (2017), p. 21. 53 Popper (1959/2006). 54 Stanfield (2016), p. 21, italics in the original. 55 Porter (1995), pp. 3–4. 56 Porter (1995), p. 3. Karl Popper expanded this dichotomous distinction between objective and subjective by arguing that knowledge (reality) falls into three categories, or worlds as he terms them (see Popper, 1968). 57 Stanfield (2016), p. 21, italics in the original. 58 Porter (1995), p. 4. 59 Feyerabend (1975/2002), pp. 10–11. 60 Feyerabend (1975/2002). 61 Porter (1995), pp. 11–12. 62 See Clayton (2021) and Pearl and Mackenzie (2018). 63 Pascale (2010), p. 156. 64 See Turner (1985), p. 24 and Porter (1995), p. 20. 65 Pascale (2010), p. 156. Turner (1985, p. 24) divides Comte’s application of positivism to sociology into three investigatory stages, writing that Comte believed: First, the social universe is amenable to the development of abstract laws that can be tested through the careful collection of data. Second, these abstract laws will denote the basic and generic properties of the social universe and they will specify their ‘natural relations.’ Third, such laws will not be overly concerned with causality, or functions. 66 See Smith (2012), p. 44, and Dixon-Román (2017), p. 14. 67 Dixon-Román (2017), p. 51. 68 Dixon-Román (2017), p. 64. 69 See Porter (1995), p. 23 and Dixon-Román (2017), p. XXV. 70 Porter (1995), p. 8. 71 See Dixon-Román (2017), pp. 55–56. 72 See Porter (1995), p. 23. 73 Porter (1995), p. 22. It is interesting to note that the use of square grid units as a measure of land ignores two important aspects of land. First, because a square grid is recorded on a single plane, the contour of the land is ignored. The resulting measure allows one to report the number of square units a landowner owns on a singular plane, but it does not represent the actual surface area of land contained within that unit.
As a result, a square unit of land in a flat region contains substantially less workable land than does the same square unit of land in a hilly region. Similarly, the square unit ignores the quality of soil that forms the land. As a result, the productivity of a single square unit of land could vary substantially depending on the quality of soil contained within that unit. In Poland, land measures often varied by soil quality, so that a unit of land would represent more or less equal productive value. This unit was often defined as the territory upon which a certain quantity of seed could properly be sown. (Porter, 1995, p. 24) 74 Porter (1995), p. 31. 75 In educational measurement, consider the development of a new test for a given construct. Often, scores produced by the new test are compared with scores produced by an existing test as a means for examining what was once termed concurrent criterion validity. If the scores produced by the new instrument are not

sufficiently correlated with scores produced by the existing test, the new test is typically interpreted as a poor measure of the targeted construct despite the fact that it may be the existing test that has been providing a poor measure. 76 Pascale (2010), p. 154. 77 As one example outside of race and racism of how standardization limits innovation in educational measurement, consider the resistance to test accommodations and later adoption of principles of universal design that has occurred over the past 40 years. Both accommodations and universal design were introduced to decrease bias in educational measures for specific subgroups of students. Yet, for many years, testing programs flagged scores for students who received accommodations and later resisted adoption of universally designed test delivery tools. Although these practices are now common, concerns about standardization delayed their adoption. 78 See Mill (1861) and Sidgwick (1874) for an explication of a utilitarian view of justice, Nozick (1974) for a libertarian conception, and Rawls (1971/1999) for a conception of justice centered on fairness. 79 See Rawls (1999), p. 6. 80 Rawls (1999), p. 20. 81 Sidgwick (1874); Rawls (1971/1999).

References

Acton, H.B. (1951). Comte’s positivism and the science of society. Philosophy, 26(99), 291–310. Bonilla-Silva, E. (2018). Racism without Racists: Color-Blind Racism and the Persistence of Racial Inequality in the United States (5th ed.). Rowman & Littlefield Publishers. Bonilla-Silva, E. & Forman, T. (2000). “I’m not a racist but …”: Mapping White college students’ racial ideology in the U.S.A. Discourse and Society, 11(1), 53–85. Bracey, G., Chambers, C., Lavelle, K. & Mueller, J.C. (2017). The white racial frame: A roundtable discussion. In Systemic Racism. Palgrave Macmillan. Campbell, C.P., LeDuff, K.M., Jenkins, C.D. & Brown, R.A. (2013). Race and News: Critical Perspectives. Routledge. Chase, A. (1975/1980). The Legacy of Malthus: The Social Costs of the New Scientific Racism. Knopf. Claeys, G. (2000). The “survival of the fittest” and the origins of social Darwinism. Journal of the History of Ideas, 61(2), 223–240. Clayton, A. (2021). Bernoulli’s Fallacy: Statistical Illogic and the Crisis of Modern Science. Columbia University Press. Cruickshank, J. (2012). Positioning positivism, critical realism and social constructionism in the health sciences: A philosophical orientation. Nursing Inquiry, 19(1), 71–82. Dennis, R.M. (1995). Social Darwinism, scientific racism, and the metaphysics of race. Journal of Negro Education, 64, 243–252. Dixon-Román, E.J. (2017). Inheriting Possibility: Social Reproduction and Quantification in Education. University of Minnesota Press. Elias, S. & Feagin, J.R. (2016). Racial Theories in Social Science: A Systemic Racism Critique. Routledge.

Eze, E.C. (2001). Race and the Enlightenment: A Reader. Blackwell Publishers. Feagin, J.R. (2013). The White Racial Frame: Centuries of Racial Framing and Counter-framing. Routledge. Feyerabend, P. (1975/2002). Against Method. Verso. Grandin, G. (2019). The End of the Myth: From the Frontier to the Border Wall in the Mind of America. Metropolitan Books. Hofstadter, R. (1944/1992). Social Darwinism in American Thought. Beacon Press. Jackson, J.P. & Weidman, N.M. (2005). The origins of scientific racism. The Journal of Blacks in Higher Education, 50(50), 66–79. Lewis, A.E. (2004). “What group?” Studying whites and whiteness in the era of “color-blindness”. Sociological Theory, 22(4), 623–646. McGrath, J.E. & Johnson, B.A. (2003). Methodology makes meaning: How both qualitative and quantitative paradigms shape evidence and its interpretation. In Qualitative Research in Psychology: Expanding Perspectives in Methodology and Design. American Psychological Association. McHugh, C. (1980). Social Darwinism: Science and Myth in Anglo-American Social Thought. Temple University Press. Mill, J.S. (1861). Utilitarianism. In Justice: A Reader. Oxford University Press. Mills, C.W. (1997/2014). The Racial Contract. Cornell University Press. Nozick, R. (1974/2007). Anarchy, state, and utopia. In Justice: A Reader. Oxford University Press. Omi, M. & Winant, H. (2015). Racial Formation in the United States. Routledge. Painter, N.I. (2010). The History of White People. WW Norton & Company. Pascale, C.M. (2010). Epistemology and the politics of knowledge. The Sociological Review, 58, 154–165. Pearl, J. & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books. Popper, K. (1959/2006). The Logic of Scientific Discovery. Routledge. Popper, K.R. (1968). Epistemology without a knowing subject. In Studies in Logic and the Foundations of Mathematics. Elsevier. Porter, T.M. (1995). Trust in Numbers. Princeton University Press. Rawls, J. (1971/1999). 
A Theory of Justice (Rev. ed.). Harvard University Press. Scheurich, J.J. & Young, M.D. (1997). Coloring epistemologies: Are our research epistemologies racially biased? Educational Researcher, 26(4), 4–16. Sidgwick, H. (1874/2019). The Methods of Ethics. Good Press. Smith, L.T. (2012). Decolonizing Methodologies: Research and Indigenous Peoples. Zed Books. Stanfield, J.H. (2016). Black Reflective Sociology: Epistemology, Theory, and Methodology. Taylor & Francis. Taylor, C.M. (1981). W.E.B. DuBois’s challenge to scientific racism. Journal of Black Studies, 11(4), 449–460. Turner, J.H. (1985). In defense of positivism. Sociological Theory, 3(2), 24–30. Valdivia, A.N. (2002). bell hooks: Ethics from the margins. Qualitative Inquiry, 8(4), 429–447. Wessels, T. (1997). Reading the Forested Landscape: A Natural History of New England. The Countryman Press.

Part II

The White Racial Frame and the Development of Educational Measurement

Just as concepts of race and racism have evolved, the components forming the White Racial Frame have similarly transformed. The division of humans into racialized groups and the hierarchical ordering of those groups is bedrock for the White Racial Frame. Etched into this bedrock are assumptions that center White European culture, discourse, and behaviors as the norm to which all “others” are expected to assimilate. Glorification of individualism—mythologized in the labor of early colonialists, pioneers who tamed the West, and aggrandized rags-to-riches stories—added heft to the frame. Over time, individualism provided backing for merit, justifying access to opportunities and resources for the White middle and upper classes, and the accumulation of capital and power by the White “elite.” Disparities in the distribution of property, capital, power, and civic duties were further sustained by a utilitarian concept of justice. As discovery through scientific investigation replaced dogmatic religious proclamations and mystical revelations as sources of knowledge, empiricism, positivism, and the associated tenets of objectivity, quantification, and measurement were integrated into the White Racial Frame. Following Darwin’s evolutionary theory and Spencer’s concept of the survival of the fittest, heredity became a welcomed explanation for natural differences among humans and provided a rationale for the disparities produced by merit. Collectively, these core components of the White Racial Frame formed a key pillar in the ideology that dominated U.S. and British societies during the late 19th and early 20th centuries. It was this ideological context into which educational measurement was birthed. The five chapters in this section explore the influences the White Racial Frame had on early developments in educational measurement and how these developments continue to influence practices today.
Chapter 5 examines a series of family studies conducted over a 50-year period during the late 1800s and early 1900s, which set the stage for the development of mental measures. The first of these family studies was conducted by Francis Galton and traced the inheritance of “genius” (intellect) across family generations. DOI: 10.4324/9781003228141-7

Importing Galton’s method of family study to the United States, researchers conducted more than a dozen similar studies to document the heritability of mental traits and the social behaviors believed to be produced by those traits. This collection of studies stimulated interest in the measure of cognitive traits—variously termed genius, intellect, mental ability, intelligence, feeblemindedness, and achievement. These studies were greatly influenced by a hereditary conception of “mental” traits and the role these traits play in shaping lived experience. These studies also supported the development of eugenics—a pseudoscientific progressive social movement that aimed to manipulate the survival of the fittest to maximize the fitness of society. Chapter 6 examines the development of tests of mental ability and the tight relationship this development had with the U.S. eugenics movement. Here again, the heritability of mental traits plays an important role in the production and initial use of measures of cognitive ability. Biological conceptions of race dovetailed with heritability, objectivity, and quantification, blinding developers to bias in these early tests. Coupled with preconceptions of the hierarchical ordering of races, this bias led early test developers to produce narratives speciously extolling differences among racialized groups. Efforts to use these tests to assign recruits to military positions, deny immigrants entry into the United States, place people into institutions, and, in the most extreme cases, force sterilization—all endeavors made to improve the quality of U.S. society—were clear productions of a utilitarian conception of justice. Chapter 7 examines the development of admission testing programs, with specific focus on the rise of the SAT. It is here that merit enters the story, forcefully backing individualism and utilitarianism.
In this chapter, I explore how the initial motivation for the SAT, which aimed to increase the economic diversity of those accessing higher education, has led admission testing to function as apparatus for systemic racism. Combined with the continued embrace of IQ testing, college admission tests also function to support deficit narratives that perpetuate conceptions of biological inferiority of people membered not-White. Chapter 8 explores the role racialized beliefs and objectivity played in the development of statistical methods, many of which are now applied to examine the efficacy of educational programs. This chapter also examines the ways in which current use of statistical methods contributes to the (re)production of deficit narratives and has produced what some social scientists term a replication crisis. This chapter ends by considering how the racist thinking that influenced the development of modern statistical methods hampered the development of alternate statistical techniques. Openness to the development of alternate techniques is revisited in Chapters 11 and 12, where the misalignment of current methods with modern social theories is explored.

The final chapter builds on the first four to consider ways in which educational measurements operate today as apparatus for systemic racism in ways that maintain the racialized structure of U.S. society. This chapter also considers ways in which components of the White Racial Frame continue to influence practices within the field. In this analysis, I consider how some of our best practices might unintentionally serve to preserve advantage for those who embrace the dominant White culture. This chapter also examines ways in which institutional and structural racism operate within educational measurement. This chapter concludes by locating college admission testing in the model of systemic racism presented in Chapter 3 and examining how various uses of educational tests function as apparatus in our social system to reproduce inequities. Collectively, this set of chapters highlights a few of the many ways in which the White Racial Frame has influenced and continues to influence the field of educational measurement. Reflecting on my own experiences over the past several years, I have found that confronting manifestations of White Racial Framing is uncomfortable. Yet, for me, contestation was a necessary step to becoming open to alternate frames that may help advance the altruistic aims of the field. The topics and ideas explored in this section aim to open opportunities for the field to consider alternate frames explored in Part III.

Background: Francis Galton, Henry Goddard, and the Eugenics Record Office

Francis Galton, Henry Goddard, and the Eugenics Record Office play multiple roles in the early development of educational measurement. Several books and dozens of articles have detailed the life story of Galton. Leila Zenderland’s Measuring Minds: Henry Herbert Goddard and the Origins of American Intelligence Testing similarly chronicles Goddard’s life’s work. 
Here, I provide a brief introduction to these two central figures, as well as the Eugenics Record Office, an agency that Galton inspired and that sponsored Goddard.

Francis Galton is often described as the grandfather of both modern statistics and psychometrics—two essential components of educational measurement. Born into a wealthy English family, Galton was identified as an intellectual prodigy at the age of seven and, when only 16, he began studying medicine at London’s King’s College. After two years, the focus of his studies shifted to chemistry, anatomy, and physiology before settling on mathematics. Following four years at Trinity College, Galton experienced a nervous breakdown while preparing for the honors exam and opted instead to accept the equivalent of a bachelor’s degree. Despite this disappointment, six years of advanced study provided Galton with familiarity with a range of subjects.

When Galton was 22, his father’s death left him a generous inheritance that freed him to explore his varied interests. Like a modern-day serial entrepreneur, Galton pursued his many interests from unique and fresh perspectives. And, like a successful serial entrepreneur, Galton developed fresh ideas and methods of discovery that advanced several fields. Among his contributions were the surveying of a large number of weather stations, the data from which revealed new understandings of weather patterns; the collection of data from large samples of people, the analysis of which documented the manner in which a great many traits were normally distributed; the discovery of the uniqueness of each person’s fingerprints and the subsequent development of a method of fingerprint identification; and, as we will explore in detail in Chapter 9, the discovery of statistical correlation and regression. Yet, not all of his inquiries were fruitful. Perhaps most notable was his vast collection of ratings of women residing in various towns with which he attempted to produce a map of the distribution of beauty in England—a project that conjures François Bernier’s objectified descriptions of women membered into different racialized groups. In large part, Galton’s many discoveries were a product of his penchant for collecting large sets of numbers and his keen ability to see patterns in those numbers. While many of his ideas were influential, Galton was rarely successful in fully scaling his ideas. As we will see in Chapters 6 and 8, his efforts to develop methods for measuring intellect and his advances to statistical methods gained widespread use only after others fully developed what his intuition produced.

Two aspects of Galton’s work, however, leave an ugly stain on his legacy. The first is his overtly racist views. Galton was an ardent believer in the supremacy of White Europeans. 
Further, among the White population, Galton believed there were fundamental differences in the genetic qualities that distinguished society’s elite from the working class, the paupers, and “degenerates” below them. His views are expressed clearly throughout his writings and greatly influenced his work. Pushing the boundaries of science’s social value, his belief in the heritability of cognitive and psychological traits inspired Galton to develop the concept of eugenics—by far his most notorious and troubling influence on (pseudo)science during the first half of the 20th century. As explored in greater detail in Chapter 6, eugenics became a “scientific” approach to social engineering that misapplied a nascent understanding of heredity to improve the quality of future generations. Several branches of eugenics emerged in regions across the world, the most extreme applications resulting in forced sterilizations and institutionalizing of tens of thousands of people in the United States and the Holocaust in Nazi Germany. It is important to note that Galton’s vision for eugenics was far less destructive. Rather than applying eugenics to negate or eliminate from society what were believed to be

“negative” human traits, Galton advocated a “positive” form of eugenics that encouraged reproduction by couples displaying “desirable” traits. Whether applied positively or negatively, eugenics is a troubling conception that infringes on basic human rights and stands in direct opposition to any modern conception of justice.

Although less prolific than Galton, Henry Goddard was similarly influential in the early development of educational measurement. Goddard was born and raised in rural Maine. Injured by a bull, his father was unable to work what was once a sizeable farm, passed away when Goddard was nine, and left his mother with few financial resources. For a period of time, Goddard’s family lived in poverty and appreciated assistance from their Quaker society. As a young teenager, his fortunes shifted when he received a scholarship to attend the Friends School in Providence, Rhode Island, and later Haverford College, from which he eventually earned a master’s degree in mathematics. Goddard worked as a teacher and principal before earning a doctorate in psychology from Clark University. After completing his studies, Goddard worked at the State Normal School in West Chester, Pennsylvania, for six years before becoming director of research at the Vineland Training School for Feeble-Minded Girls and Boys, a residential school for children with special needs located in Vineland, New Jersey. Goddard’s work with children with special needs sparked his interest in the heredity of mental traits and inspired his infamous study of the Kallikak family that is detailed in Chapter 5. This interest also led Goddard to Alfred Binet, whose pioneering work in France produced the first standardized instrument for categorizing mental abilities, which Goddard imported to the United States. As explored in Chapter 6, Goddard adapted Binet’s procedure and conducted several studies documenting the mental capabilities of schoolchildren. 
Later, he applied these same techniques to classify the mental ability of immigrants attempting to enter the nation. As a U.S. pioneer in mental measurement, Goddard was a key player in the development of the Army Alpha and Beta tests—the first large-scale standardized tests of mental ability—offering his campus at Vineland as the operation center for this test development project. Like Galton, Goddard was attracted to eugenics and was actively involved in efforts to implement eugenic policy and practices in the United States.

Goddard’s best-known work on the Kallikak family was funded, in part, by the Eugenics Record Office. Located in Cold Spring Harbor on Long Island, New York, the Eugenics Record Office was founded in 1910 by Charles Davenport and directed by Harry Laughlin. Davenport latched onto Mendel’s theory of hereditary genetics and sought to apply this theory to limit the passing of “degenerative” traits to future generations. Funded generously by the Carnegie Institution and the Rockefeller Foundation, the Eugenics Record Office collected

vast records used to document the heredity of various mental traits and social behaviors. Research findings produced by the Eugenics Record Office were used to support several discriminatory and, at times, abhorrent policies, including immigration restrictions, institutionalization, and forced sterilization. In 1935, the Carnegie Institution initiated a review of the Eugenics Record Office that eventually convinced the institution’s new leadership to cease funding for the Cold Spring Harbor office, which led to its demise in 1939. Although the Eugenics Record Office was not directly involved in the development of educational measurement, it provided funding for many of the field’s early pioneers, including Robert Yerkes, Edward Thorndike, and Henry Goddard, and made use of findings from their research and development to promote eugenic solutions.

Much has been written about eugenics. A subset of this literature has documented the embrace of eugenics, and in some cases active advocacy for eugenic solutions, by many of the pioneers of educational measurement. As we will see in Chapter 8, the three people—Francis Galton, Karl Pearson, and Ronald Fisher—who had the strongest influence on the development of statistics during the first half of the 20th century ardently supported eugenics. In Chapter 6, we will see that all of the major players who traveled to the Vineland School to aid in developing the first large-scale standardized test had ties to eugenics, and that many were ardent advocates for eugenic solutions. Reckoning with racism in the field of educational measurement requires that this strong link between the field’s founding members and the eugenics movement be acknowledged. My aim in the chapters that follow, however, is not to further document or expand on this link. Rather, my aim is to shift focus to the role the White Racial Frame played in shaping the work of these influential people. 
Although I do not build the case that it is this same White Racial Frame that led these many leaders to embrace eugenics, the case is there for the making.

5

Heredity and Family Traits

Once a Miller, always a Miller.1

Sociobiological theories become popular for their capacity to comfort groups that have deep-seated needs for reassurance.2

In the mid-1500s, the proverb “the apple does not like to fall far from its tree” entered the German language. Some 200 years later, a similar adage appeared in English.3 Today, it is used to reflect a child whose physical traits, interests, talents, or behaviors are similar to those of a parent. Whether it’s a pronounced jaw line, distinct nose, warm smile, stout frame, athleticism, or clumsiness, similarities across family generations have long been noted. In fact, during the Middle Ages, professions handed down through generations were the basis for English surnames, many surviving until today—Bowman, Butler, Carpenter, Cook, Fowler, Hunter, Miller, Potter, Skinner, Tanner, Weaver, and Wheeler. The English term heredity traces to the 1530s, when it was adapted from the Latin word hereditatem. At the time, heredity was used in reference to “the condition of being an heir.” Initially, the condition of interest focused on social and economic status passed intergenerationally. Over time, heredity’s meaning expanded to the passing of physical, and later psychological, characteristics from parent to child. By controlling procreation, breeders of livestock and other domesticated animals have long capitalized on heredity to modify specific characteristics of a breed. In 1859, Charles Darwin’s theory of evolution gave scientific credence to what was obvious to the untrained eye; traits were passed intergenerationally through heredity. Darwin’s theory of inheritance offered insight into the scientific processes that were responsible for the heredity of traits. Although Darwin’s conception of gemmules was an underdeveloped and inaccurate model of the mechanism for genetic inheritance, it sparked increased interest in the hereditability of human traits.

DOI: 10.4324/9781003228141-8

In both England and the United States, rapid social and economic changes during the late 1800s and early 1900s focused particular attention on the hereditability of desired traits—genius and disposition—believed “fit” for the evolution of humankind, as well as those that were believed “unfit”—criminality, prostitution, alcoholism, pauperism, and insanity, among many more. In short order, these desirable and degenerative traits were branded as “intelligence” and “feeblemindedness.” Believing intelligence and feeblemindedness were the root cause of benefit and harm to society, a research program that spanned a half-century was undertaken to document the hereditability of mental traits, and the behaviors derived from them. This program of study was initiated in England by Darwin’s cousin, Francis Galton, who focused on the heredity of what he termed genius. Shortly thereafter, the study of families was imported to the United States, where the focus shifted from genius to degeneracy.

Family Studies

Over the period spanning 1877 to 1919, at least 15 studies tracing intergenerational traits were undertaken in the United States. Segments from many of these studies are republished in criminologist Nicole Hahn Rafter’s book, White Trash: The Eugenic Family Studies, 1877–1919. 
Rafter’s analysis of these influential studies focuses attention on the “relation between science and society, and of the relation of both to ideology.”4 Like Galton’s work in England, these studies were undertaken during a time when the United States experienced considerable industrial expansion, wealth accumulation by the social elite, the emergence of a technocratic middle class, rapid urban growth, degrading social conditions for the industrial workforce, and dramatic increases in crime and other “degenerate” behaviors.5 The White Racial Frame ruling the ideology of the day directed attention away from these social and economic developments and instead focused particular attention on the role heredity played in producing this degeneration. As Rafter summarizes, “these studies identified tribes [i.e., groups of people] whose inferior heredity was considered the source of alcoholism, crime, feeble-mindedness, harlotry, hyperactivity, laziness, loquacity, poverty, and a host of other ills.”6 These studies also provided a valuable source of evidence of the heredity of “bad germ plasm” and “cacogenic” (bad-gene) families, slowing and, for some, ceasing the spread of which became a primary aim of the eugenics movement.7 As Rafter’s quote that opens this chapter observes, family studies yielded empirical “evidence” that provided comfort to the White elite regarding their position in society and their advocacy for social policies that further harmed those less fortunate in their society. This “scientific” evidence also informed the development of schools and institutions into which people

deemed feebleminded were placed. Studies documenting intergenerational criminal activity also provided a foundation for “defective delinquency,” a theory of criminology that rapidly attracted favor. The theory of defective delinquency not only posited that feeblemindedness produced criminality, but it maintained that the feebleminded were inherently criminal. This tautology permitted preemptive and indefinite (sometimes for life) detention of people classified “feebleminded.”8

Family studies conducted during the later years of the 1800s also inspired the Eugenics Record Office to publish a guidebook titled How to Make a Eugenical Family Study. This guidebook motivated and informed additional efforts to evidence the “physical, mental and moral hereditary traits.”9 To support these further investigations, the Eugenics Record Office offered summer courses designed “to provide lectures and laboratory and clinical studies in human heredity and other eugenical factors, to give special instruction in the principles and practice of making first-hand human pedigree-studies, and to train students to make investigations in eugenics.”10

As Rafter describes, the investigators who conducted family studies shared common ideological and methodological assumptions. Among these ideological assumptions were the heredity, rather than social production, of behaviors and mental capabilities, survival of the fittest through social competition, and, contradictorily, the decline of U.S. society due to higher birthrates in “degenerate” families compared to those of “talent” and “high moral fiber.” These studies also helped fix the idea that the social class structure, and in particular the rise of the wealthy and the struggles of the poor, were products of nature. As Rafter describes, methodologically, the studies conducted in the United States typically began with living representatives of a group that appeared to be characterized by a set of unsavory traits. 
From there the authors worked back in time, using public records, memories of neighbors, and recollections of family members themselves to show that similar disabilities had characterized the group’s ancestors—and thus must be hereditary … Cumulatively [the studies] created a powerful myth about the somatic nature of social problems.11 Interestingly, in each of these studies, the researcher began with a focus on a specific “degenerative” trait, yet through their analysis, multiple social “defects” were often unveiled. As Rafter observes, “that even ‘specialized’ families bore a multitude of stigmata did not disturb their chroniclers; each new defect merely confirmed the underlying assumption that inferiority had many facets.”12 And, for many authors of family studies, feeblemindedness was the root cause of these many inferior facets.

Of the more than a dozen family studies published during the tumultuous period of industrial expansion, three are of particular interest: Francis Galton’s Hereditary Genius, Richard Dugdale and Arthur Estabrook’s The Jukes, and Henry Goddard’s The Kallikak Family. The sections that follow visit each of these studies and examine the ways in which this body of work both was greatly influenced by the White Racial Frame and influenced the development of educational measurement.

Francis Galton’s Hereditary Genius

First published in 1869, Hereditary Genius provides detailed documentation of Galton’s attempt to trace “natural ability” through family lines.13 Galton defined natural ability as “those qualities of intellect and disposition, which urge and qualify a man to perform acts that lead to reputation.”14 To determine one’s reputation, Galton relied on the opinion of contemporaries, revised by posterity—the favourable result of a critical analysis of each man’s character, by many biographers … I do not mean high social or official position … but I speak of the reputation of a leader of opinion, or an originator, or a man to whom the world deliberately acknowledges itself largely indebted.15 In other words, Galton focused on people whose accomplishments were recorded favorably in written form. Given a favorable account, the person was assumed to possess natural ability. Galton’s inquiry combined three methods: tracing eminence within select families, examining traits of adopted children, and employing statistical properties to estimate the distribution of intellect among the general populace. 
Galton began his investigation by examining families of “English Judges since the Reformation” for which he calculated the percentage of members whose status was deemed “eminent.”16 Galton reasoned that “judgeship is a guarantee of its possessor being gifted with exceptional ability.” In addition, “Judges are sufficiently numerous and prolific to form an adequate basis for statistical inductions, and they are the subjects of several excellent biographical treatises.”17 Using Lives of the Judges, published in 1865, Galton identified 286 judges and determined that “109 of them have one or more eminent relations” which occurred in just 85 families.18 Galton further observed that when two or more members of a family served as judges, more often it was a father and son than a father and grandchild or other more distant relation that formed the coupling. In fact, Galton noted that “out of the 286 Judges, more than one in every nine of them have been either father, son, or brother to another judge.” From this, he concluded “there cannot, then,

remain a doubt but that the peculiar type of ability that is necessary to a judge is often transmitted by descent.”19 Following a similar approach, Galton extended his study to other professions often held by the elite within Victorian English society, including military commanders, literary men, men of science, poets, musicians, and painters, as well as the more ordinary wrestlers and oarsmen. For each group, he estimated the probability that two members of a family would achieve similar eminence within the profession. Comparing these calculations, he noted “the general uniformity in the distribution of ability among the kinsmen in the different groups.” Based on these analyses, Galton concluded, “There cannot, therefore, remain a doubt as to the existence of a law of distribution of ability in families.”20

To inform his conclusions further, Galton engaged two other lines of inquiry. The first aimed to address potential criticism that the pattern Galton unearthed was a product of nurture rather than nature. To dispel this argument, Galton reasoned that children adopted into environments that provided educational and social advantages would show similar probabilities of rising to eminence if nurture were the primary cause. To examine this hypothesis, Galton recognized that he could not simply compare “sons of eminent men with those of non-eminent, because much which [he] should ascribe to breed, others might ascribe to parental encouragement and example.”21 To form “a very fair comparison,” Galton turned to the adopted sons of popes and dignitaries of the Catholic Church. Galton reasoned that “if social help is really of the highest importance, the nephews of the Popes will attain eminence as frequently, or nearly so, as the sons of other eminent men; otherwise, they will not.”22 Based on his analysis, Galton concluded that the sons of eminent men were decidedly more likely to rise to eminence than were the adopted sons of popes. 
“The social helps are the same, but hereditary gifts are wanting in the latter case.”23

To support his claim regarding the hereditary nature of traits, Galton also applied the statistical concept of “frequency of error” to estimate the distribution of intellect for all people residing in England. Here Galton extended Adolphe Quetelet’s observation regarding the approximately normal distribution of physical traits to intellect. As described in greater detail in Chapter 8, Quetelet examined measures of physical traits and observed that the distribution of physical traits was similar to the manner in which errors of measurement are typically dispersed. Galton “applied this same law [of errors] to mental faculties, working it backwards in order to obtain a scale of ability.”24 He then divided his theoretically derived distribution of intellect into 14 intervals or “grades,” seven of which were above the average intellect of all English citizens, and the other seven below that average. Based on properties of the normal distribution, he then estimated the number of people residing in England that fell into each grade of intellect.
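The arithmetic behind this kind of grading can be sketched in a few lines. The example below is only an illustration, not a reconstruction of Galton's own table: it assumes a standard normal curve cut into equal-width grades, seven on each side of the mean, and it assumes a grade width of 0.7 standard deviations. That width is a hypothetical value chosen for illustration; the text does not state the width Galton used. The names `GRADE_WIDTH_SD`, `share_within`, and `grade_shares` are likewise invented for this sketch.

```python
from statistics import NormalDist

# Assumption for illustration only: each grade spans 0.7 standard deviations.
# The source text does not give the grade width.
GRADE_WIDTH_SD = 0.7
GRADES_PER_SIDE = 7  # seven grades above the mean, seven below


def share_within(k, width=GRADE_WIDTH_SD):
    """Share of a normally distributed population within k grades of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k * width) - z.cdf(-k * width)


def grade_shares(width=GRADE_WIDTH_SD, grades=GRADES_PER_SIDE):
    """Share of the population in each grade above the mean.

    By symmetry the same shares apply below the mean. The outermost grade
    is treated as open-ended, so it absorbs the whole remaining tail.
    """
    z = NormalDist()
    shares = []
    for g in range(1, grades + 1):
        lower_cdf = z.cdf((g - 1) * width)
        upper_cdf = 1.0 if g == grades else z.cdf(g * width)
        shares.append(upper_cdf - lower_cdf)
    return shares


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} grade(s) of the mean: {share_within(k):.3f}")
    for g, share in enumerate(grade_shares(), start=1):
        print(f"grade {g} above the mean: about {share * 1_000_000:,.0f} per million")
```

Under this assumed width, the two grades nearest the mean hold a little over half the population and the six nearest grades hold more than nineteen-twentieths. Nothing in the calculation draws on measurements of intellect: the shape of the curve and the width of the grades are both supplied by assumption.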

As Galton describes, “it will be seen that more than half of each million [people] is contained in the two mediocre classes” immediately above and below the average; “the four mediocre classes [abutting the average] contain more than four-fifths, and the six mediocre classes more than nineteen-twentieths of the entire population.”25 Thus, Galton reasoned that the intellect of the vast majority of people of England could be described as “mediocre” and fell within four of the seven grades extending in both directions from the average. To provide a sense of the type of person possessing each level of intellect, Galton posited that a person residing within the third grade above the mean “possess abilities a trifle higher than those commonly possessed by the foreman of an ordinary jury.” The fourth grade above the mean “includes the mass of men who obtain the ordinary prizes of life.” The fifth grade “is a stage higher. Then we reach [the sixth grade], the lowest of those yet superior classes of intellect” and the class that is the focus of his analysis of the heredity of intellect. In this description, Galton also notes that if one were to encounter a person residing within the sixth grade below the average, “we are already among the idiots and imbeciles … [for which] there are 400 idiots and imbeciles, to every million of persons living in this country.”26 Galton applied properties of the law of errors to intellect to justify his focus on approximately 400 men of eminence, reasoning that these men reside within the highest levels of intellect in English society.

Galton’s use of the law of errors to estimate the distribution of intellect was also used to describe the difference in intellect among racialized groups. 
Although the vast majority of Hereditary Genius focuses on the inheritance of intellect of English (i.e., White) citizens (i.e., men) who hold various eminent professions, the 19th of his 21 chapters abruptly shifts focus to an exploration of “the comparative worth of different races.”27 To make comparisons between racialized groups, Galton “act[s] upon an assumption … that the intervals between the grades of ability are the same in all the races.”28 Thus, although the average intellect might differ across racialized groups, the distribution of intellect within a given racialized group is similar. This assumption allowed Galton to examine the number of people within different racialized groups who have risen to eminence and to then associate the grade levels across races. Based on this rough estimation, Galton concludes that although “the negro race is by no means wholly deficient in men capable of becoming good factors, thriving merchants, and otherwise considerably raised above the average of whites,” the frequency with which such people reach such levels “points to the conclusion, that the average intellectual standard of the negro race is some two grades below our own.”29 It is interesting to pause here and note a convenient contradiction in Galton’s reasoning that is revealed in his effort to compare people membered White or Black. Prior to estimating the frequency with which people

membered Black rise to eminence, Galton considers the sample of people he might use as a source for his analysis. Recall that Galton’s analysis focuses on intellect (genius)—a trait he believes is hereditary in nature rather than a product of one’s social environment. Initially, Galton directs his attention to people membered Black residing in the United States, many of whom were only recently freed from slavery. He determines, however, that he cannot use this group as his study sample because [i]f the negro race in America had been affected by no social disabilities, a comparison of their achievements with those of the whites in their several branches of intellectual effort, having regard to the total number of their respective populations, would give the necessary information.30 Because a great many people membered Black in the United States had been subjected to enslavement and denied access to civil rights and other social opportunities, the social conditions in which they were raised and later resided as adults imposed considerable disadvantage that prohibited a fair comparison. Oddly, this rationale—which focused on the role nurture played in the intellectual development of enslaved and otherwise oppressed people—did not deter Galton from his insistence that nature was responsible for variations in intellect.

Through these, and other analyses, Galton challenged the then-common practice of improving humankind through public assistance, arguing that success within Victorian society was a matter of genetic inheritance rather than socially supported development. Galton’s emphasis on heredity and scientific naturalism led him to conclude that, rather than focusing resources on improving the “weak,” society should promote the reproduction of those whose hereditary endowments position them for success. It is here that initial wisps of what developed into eugenic theory emerge in Galton’s work. 
When first published, Hereditary Genius received mixed reviews. The scientific community praised Galton’s methods of inquiry and relished the support his conclusions provided for scientific naturalism. Upon reading the first several chapters, Galton’s cousin Charles Darwin wrote to him, I do not think I ever in all my life read anything more interesting and original—and how well and clearly you put every point! … You have made a convert of an opponent in one sense, for I have always maintained that, excepting fools, men did not differ much in intellect, only in zeal and hard work.31 Alfred Wallace, who discovered the theory of natural selection contemporaneously with Darwin, similarly praised Hereditary Genius, noting

that Galton offered “some of the most startling and suggestive ideas to be found in any modern work.”32 In contrast, the religious community was highly critical of Galton’s recommendations regarding the use of marriage to manufacture the production of “positive” hereditary traits—a practice aimed at improving society which stood in sharp contrast to the social welfare programs provided by church and state. Critiques by religious reviewers focused particular attention on the so-called “scientific” method employed by Galton—an effort designed to “cast doubt on the infallibility of truth reached through scientific reasoning … [and] the legitimacy of nineteenth-century formulations of science.”33 A review in the Catholic World similarly questioned Galton’s extension of scientific observations to inform laws of nature. “The danger of science, the article [declared], was not the factual information revealed, but the attempt to ‘induce a law … that will hold good beyond the particulars observed and analyzed.’” Galton’s conclusion that the pattern of genius within select families was a product of heredity was the focus of dispute: All that could be asserted would be the relation of concomitance or of juxtaposition, not the relation of cause and effect. … For, after all, there may be a real cause on which the facts depend, and which demands an entirely different explanation from the one which the scientist offer.34

Neutral reviewers, who applied neither a scientific nor a religious lens to their analysis, generally praised Galton’s work, noting the “sincerity, ingenuity, and intelligence of the author.”35 The few criticisms leveled focused on the arbitrary and selective use of statistics to back his claims and the impracticality of orchestrating marriage and childbearing which Galton advocated to improve society. 
Galton’s overemphasis on heredity and “too little emphasis on education, circumstances, and nepotism” and “that his data were flexible enough to fit any theory” were also concerns raised by neutral reviewers.36 Looking back on Galton’s work, four aspects stand out. First, his method of examining family history was novel at the time and provided a model for examining the heredity of traits that persists to this day. Similarly, his use of adopted children to study the influence nature and nurture may have on the development of “matched” individuals was also novel and influenced future practice. Yet, the arbitrariness with which he selected eminent families and adopted sons for his analyses yielded a sample that misrepresented the populations to which he generalized his findings. By restricting the families he studied to those whose history was recorded in select publications, and by limiting his study of adopted sons to the few embraced by popes and other dignitaries, the objectivity of Galton’s findings is questionable, at best. Second, Galton’s use of statistical methods, not just to describe society but to extrapolate and make inferences about a population, was similarly novel.

Yet, the absence of data from the general population to which he applied his extrapolations raises concerns about the accuracy of his estimations. Instead, Galton relied on two sources: measures of human physical characteristics he had recently collected, which suggested that such traits were normally distributed, and two sets of exam scores for students at Cambridge University. From these he reasoned that cognitive or psychological traits were similarly distributed. This absence of data regarding the intellect of the general population, however, did not deter Galton from assuming normality in the distribution of intellect. The normal distribution of mental ability was a key assumption that influenced the derivation of intelligence test score scales and continues to influence the development of tests today. Third, beliefs Galton held prior to engaging in his investigation had a clear impact on his method, interpretations, and recommendations. Galton held strong racial biases and was an ardent supporter of Darwin’s theory of natural selection. As evidenced by the opening line of the preface to his original edition, Galton concluded that intellect was hereditary before he launched his formal investigation: The idea of investigating the subject of hereditary genius occurred to me during the course of a purely ethnological inquiry into the mental peculiarities of different races; when the fact that characteristics cling to families was so frequently forced on my notice as to induce me to pay especial attention to that branch of the subject. I began by thinking over the dispositions and achievements of my contemporaries at school, at college, and in after life, and was surprised to find how frequently ability seemed to go by descent. 
Then I made a cursory examination into the kindred of about four hundred illustrious men of all periods of history, and the results were such, in my own opinion, as completely to establish the theory that genius was hereditary, under limitations that required to be investigated.37 The very first phrase reveals Galton’s acceptance of the hierarchical ordering of racialized people; but providing scientific support for this hierarchy did not motivate his study. Rather, the second and third sentences reveal the preordained conclusion with which Galton entered into the study—mental traits are passed intergenerationally and thus are hereditary. Fourth, the recommendations Galton reached based on his questionable findings foreshadow the pseudoscientific nature of the eugenics movement that Galton would soon launch. Based on the combination of selective family histories, incomplete analysis of adopted sons, conjectures regarding the distribution of intellect, and the absence of data about environmental factors that might contribute to personal success, Galton enthusiastically embraced a solution to social challenges that, if implemented, would impact civil liberties. Although Galton’s proposed marriage and

childbirth incentives for those deemed to have desirable hereditary traits may seem less troubling than other eugenic solutions that would follow, the willingness to dismiss social conditions and embrace heredity as the cause of disparities in economic, health, and social well-being highlights the individualist, merit-based, and positivistic elements of the White Racial Frame that were gaining dominance at the time.38

The Jukes

The study of the “Jukes” family provides insight into rising concern about degeneracy in the United States and the shift in beliefs about the cause of degeneracy that occurred during the latter years of the 19th century. First published in 1877 under the title The Jukes: A Study in Crime, Pauperism, Disease, and Heredity, and followed by a sequel titled The Jukes in 1915, the Jukes study traces criminality and other degenerative social behaviors within a single family residing in upstate New York. The initial study was launched in 1874 by Richard Dugdale, a member of the executive committee of the Prison Association of New York. Dugdale was charged with conducting an inspection of the state jails. Through a review of 13 jails, Dugdale found six prisoners with ties to the same family. This discovery prompted Dugdale to dig deeper into the prisoners’ family history to explore the apparent proclivity for crime. As Dugdale describes, “the question I am called upon to treat involves an examination into the correlation which exists between physical, biological, and social phenomena.”39 Of particular interest was the hereditary nature of criminality and pauperism. 
Following Galton’s publication of Hereditary Genius, Dugdale observed that the doctrine of heredity [was being] pushed still further by those extremists who believe it is the preponderating factor in psychology, until it is claimed that genius, special intellectual aptitude, and recondite moral qualities are, of necessity, transmitted to posterity.40 To examine the potential heredity of criminality and pauperism, Dugdale found the Jukes a particularly attractive family because of the “many intermarriages within the degrees of first and second cousins that they approximate to what breeders call ‘breeding in and in.’”41 His research traced the Jukes family back to six sisters who were born between 1740 and 1770. Dugdale identified 709 family members, living and dead, 540 of whom were direct descendants of the sisters. Examining the descendants from each sister, Dugdale noted that the lines varied such that some were “distinctively industrious, distinctively criminal, or distinctively pauperized.”42 More specifically, Dugdale noted that “with the Jukes crime preponderates in branches that spring from bastard stock … it favors the male lines of descent, and it is thirty times more frequent than in the community at

large.”43 Given this observation, Elisha Harris, the corresponding secretary for the Prison Association of New York, reached the conclusion that “habitual criminals spring almost exclusively from degenerating stocks.”44 Dugdale, however, did not fully attribute criminality or pauperism to heredity: We have remarked that the law of heredity is much more firmly established in the domain of physiological and pathological conditions than it is as respects the transmission of intellectual and moral aptitudes. In proportion as we approach features which are moulded by education, they are less transmissible, and more completely governed by the laws of variation, which are largely referable to environment … if we compare the proportion of pauperism among the “Jukes” who are diseased to that among those who are healthy, we shall find that fifty-six out of a hundred of the diseased come under public charge, and only seventeen out of a hundred from among those who are healthy … inherited disease preceded, pauperism follows.45 In this way, Dugdale believed that degenerative behavior was the product of the family’s poor environment. Yet, he also claimed that “environment tends to produce habits which may become hereditary.”46 Whereas Dugdale’s 1877 account of the Jukes family hedged on the hereditary nature of crime and pauperism, Arthur Estabrook’s 1915 sequel fully embraced heredity as the principal cause. Like Goddard and several other leading eugenicists, Estabrook was trained at Clark University. After graduating, Estabrook gained employment with the Carnegie Institution and was assigned to work in the newly formed Eugenics Record Office, where he engaged in various efforts to document the hereditary nature of mental traits and degenerative behaviors. Estabrook was a strong proponent of forced sterilization and served as an expert witness in the trial of Carrie Buck—a case that eventually reached the U.S. 
Supreme Court, whose ruling infamously upheld forced sterilization, declaring, “three generations of imbeciles are enough.”47 Prior to extending Dugdale’s study of the Jukes, Estabrook conducted a study of the “Nam” family in which he concluded that the high rates of crime, disease, and poverty experienced by the family’s members were hereditary in nature.48 As Estabrook states, “the primary aim of [his] work [was] to present the facts of the lives of the Jukes.”49 To this end, Estabrook took up where Dugdale left off, tracing the living and since-born members of the Jukes family between 1875 and 1915. His analysis nearly tripled the number of family members examined from Dugdale’s 709 to 2,094, of whom 1,258 remained alive in 1915. Over this 40-year period, several family members moved away from the rural upstate New York enclave that was the focus of Dugdale’s analysis, taking up residence in Connecticut, New Jersey, and Minnesota, as well as

other regions of New York. With this geographic dispersion, intermarriage among family members decreased notably, with some Jukes children marrying into families of greater means, while others maintained a similar economic status. Despite this dispersion, both geographic and familial, Estabrook’s analysis pointed to “the same feeble-mindedness, indolence, licentiousness, and dishonesty, even when not handicapped by the associations of their bad family name and despite the fact of being surrounded by better social conditions.”50 In fact, Estabrook concluded that one-half of the living Jukes were feebleminded. More fully, Estabrook summarized his findings and the implications of those findings as follows: Heredity, whether good or bad, has its complemental factor in environment. The two determine the behavior of the individual. The social reformer and the student of eugenics must see that, no matter what the degree of perfection to which we raise the standard of the environment, the response of the individual will still depend on its constitution and the constitution must be adequate before we can attain the perfect individual, socially and eugenically … The natural question which arises in the reader’s mind is, “What can be done to prevent the breeding of these defectives?” Two practical solutions of this problem are apparent. One of these is the permanent custodial care of the feeble-minded men and all feeble-minded women of child-bearing age. The other is the sterilization of those whose germ-plasm contains the defects which society wishes to eliminate.51 Although Estabrook acknowledges that environmental factors play a role in shaping a person’s lived experience, hereditary traits were understood as the primary factor influencing outcomes. 
This deference to the dominant role played by hereditary traits is similarly seen in Charles Davenport’s interpretation of Estabrook’s study: There is, indeed, no conflict between environment and heredity; each is a factor in all behavior. Environment affords the stimulus; heredity determines largely the nature of the reacting substance; the reaction, or behavior, is the resultant or product of the two … The chief value of a detailed study of this sort lies in this: that it demonstrates again the importance of the factor of heredity.52 Given Estabrook’s and Davenport’s leading roles in the Eugenics Record Office—an office dedicated to documenting the hereditary nature of degenerative traits—their conclusions regarding the dominant role of heredity in determining mental traits and social behaviors are to be expected. Nonetheless, The Jukes in 1915 reflects a profound shift in beliefs that influenced political and social solutions to growing concerns about a declining society. In the minds

of many—particularly those who were primarily responsible for the development of mental measures—hereditary nature all but replaced environmental nurture as the cause of life outcomes.

Henry Goddard and The Kallikak Family

Between 1896 and 1899, Henry Goddard studied under G. Stanley Hall at Clark University, where he earned a PhD in psychology. At that time, the study of pedagogy and the study of psychology were tightly coupled. With few laboratories dedicated to the study of psychology yet established in the United States, Hall was proactive in securing positions for his graduates in normal schools, which functioned as training institutes for aspiring educators. It was in just such an institution that Goddard landed his first position. Yet, his interests in neurology, psychology, and the emerging field of Child-Study were not well aligned with the pedagogical focus of the West Chester Normal School. In a nation experiencing enormous urban growth, whose demographics were evolving rapidly, and which had only recently introduced compulsory education, Child-Study emerged as an important field. In particular, Hall’s research on “the contents of children’s minds” conducted during the years preceding Goddard’s doctoral study produced concerning findings. As an example, Hall found that “60 percent of the six-year old children entering Boston schools had never seen a robin,” “71 percent did not know beans—even in Boston,” many children believed “that good people, when they die, go into the country,” and many students having not seen a cow in person believed “it as big as their thumb or the picture” that they were shown.53 Compulsory education aimed to address these and other faulty conceptions while assimilating the masses into the dominant White Anglo-Saxon Protestant culture of the time. Doing so for all children under a single roof challenged instructional practices. 
As a result, two systems of schools were created: one called common schools that served the “normal” population of students; and another, often termed training schools, focused on the training of students with “special needs.” In 1900, Goddard was invited to visit what was then named the New Jersey Home for the Education and Care of Feeble-Minded Children. Located in Vineland, New Jersey, the school’s name was later changed to the Vineland Training School for Feeble-Minded Girls and Boys. As Leila Zenderland, a professor of American studies, describes, Although a private institution, the Training School also accepted many state-sponsored poor children. Yet whereas state institutions frequently housed more than 1,000 residents in large buildings, this school was unimposing, for it housed 223 children in a set of small, “homelike” cottages grouped around a small administration building.54

In this tranquil and nurturing environment, Goddard saw potential to engage in the type of psychological, Child-Study research for which he had developed a passion. In 1906, the then-director of the Vineland School, Edward Johnstone, extended an invitation for Goddard to join the school as a psychologist and lead research in what he called the school’s “human laboratory.” As Goddard began his research on the feebleminded, he initially saw the issue as psychological and educational. Over time, however, he came to see it as also biological and social; the biological angle was largely one of genetics and heredity, the social one of a drain on society caused by feeblemindedness. As a research psychologist working in a residential school, Goddard worked individually with each student to deepen his understanding of cognitive development and to refine techniques to “diagnose the feebleminded.” As we will see in Chapter 6, this line of investigation led Goddard to Alfred Binet and his subsequent importation of the Binet-Simon test of mental ability. It was also in this context that Goddard began working with a young woman named Emma Wolverton—better known by her pseudonym, Deborah Kallikak. As Goddard describes, Deborah Kallikak came to the Vineland School in 1897 at the age of eight. Deborah was born in an almshouse to a mother who would later marry twice and give birth to at least four more children. Neither of Deborah’s stepfathers would provide support for her. Upon beginning public school she was identified as “feeble-minded” and was granted entry to Vineland. At the age of 20, the Binet scale indicated that Deborah operated with “the mentality of a nine-year-old child with two points over.”55 As part of his investigation into the cause of Deborah’s feeblemindedness, Goddard traced her family genealogy. His initial investigation indicated that Deborah was not the only member of her family who was feebleminded. 
This observation led Goddard to engage in a more extensive study of her family history. To do so, he formed a team of field workers, led by Elizabeth Kite. Like Goddard, Kite was raised as a Quaker and worked in public schools before joining Goddard’s research team. Earlier in her life, Kite also studied in Europe for six years, during which she became fluent in French—a skill that positioned her to translate Alfred Binet’s work for Goddard. As Goddard describes, the field workers were women highly trained, of broad human experience, and interested in social problems … acquainted with the condition of the feeble-minded … [They] acquaint themselves with the method of testing and recognizing them. They then go out with an introduction from the Superintendent to the homes of the children and there ask that all the facts which are

available may be furnished, in order that we can know more about the child and be better able to care for him and more wisely train him.56 On this pretense, the field workers then gathered information about a family’s history, making judgments about the mental ability of family members based on physical appearance and accounts by relatives. To learn more about each family member, the field workers reached out to neighbors and acquaintances, and traced the family tree back several generations. Based on these narratives and other artifacts, including family photographs and journals, the field workers made judgments about the mental faculties of each family member. For Deborah Kallikak, field workers were able to trace her family back to Martin Kallikak. As a young man, Martin served as a soldier in the Revolutionary War. According to Goddard, it was “at one of the taverns frequented by the militia he [Martin] met a feeble-minded girl by whom he became the father of a feeble-minded son … This illegitimate boy was Martin Kallikak Jr., the great-great-grandfather of Deborah.”57 From Martin Jr., Goddard and his team were able to trace 480 descendants. Once out of the military, Martin Sr. failed to support his son born out of wedlock, and instead married a “normal woman” named Rhoda Zabeth. Together they had ten children, of whom eight survived infancy. Goddard similarly traced the descendants of Martin Sr. and Rhoda, finding a total of 496 descendants. 
Relying on accounts, personal observation, and inferences based on photographs, Goddard and Kite determined that of the 480 descendants of Martin Jr, 103 “were or are feeble-minded, while only forty-six have been found normal.” The remaining 347 were classified as “unknown or doubtful” with the caveat that many of these descendants were “people we can scarcely recognize as normal; frequently they are not what we could call good members of society.”58 In contrast, relying primarily on a family tree produced by a single descendant of Martin Sr. and Rhoda, Goddard determined that none of their descendants were feebleminded. This observation led Goddard to conclude: The striking fact of the enormous proportion of feeble-minded individuals in the descendants of Martin Kallikak Jr. and the total absence of such in the descendants of his half brothers and sisters is conclusive on this point. Clearly it was not environment that has made that good family. They made their environment; and their own good blood, with the good blood in the families into which they married … From this comparison the conclusion is inevitable that all this degeneracy has come as the result of the defective mentality and bad blood having been brought into the normal family of good blood, first from the nameless feeble-minded girl

and later by additional contaminations from other sources. The biologist could hardly plan and carry out a more rigid experiment or one from which the conclusions would follow more inevitably.59 Through his study of the Kallikak family, Goddard believed he firmly linked feeblemindedness to heredity. And, like Estabrook, Goddard viewed sterilization and “segregation through colonization” as practical approaches to slowing the spread of feeblemindedness.60

The White Racial Frame and Family Studies

The method of family study introduced by Galton was propagated in the United States to explain why the elite few thrived in a rapidly changing social environment while a growing number struggled for survival.61 For those who conducted family studies, the primary reason for disparate outcomes was preordained: it was the hereditary nature of mental traits that allowed the elite to rise to “eminence” while others descended to a “degenerate” state. Edwin Black, an American historian and investigative journalist, locates interest in the heredity of traits that motivated family studies at the intersection of three influential developments in economic philosophy, social theory, and biological sciences. The first development connected the growth of social welfare supports with the warning Thomas Malthus made about such programs more than a half-century earlier. At the time Galton launched his study of the heredity of genius, England and the United States were both experiencing rapid growth in paupers and the poor—people the elite believed were becoming increasingly dependent on public assistance. Rather than helping people improve their lot, these programs appeared to increase dependence—a pattern consistent with Malthus’s social-economic theory positing that social welfare caused intergenerational poverty (an argument that persists to this day). 
With industrialization and exploitation of the workforce placing greater strain on those with the least financial resources, Malthus’s theory was used to reject the utility of providing assistance to the poor.62 Spencer’s conception of “survival of the fittest” functioned as a second development that provided context for family studies. Building on Malthus’s theory, Spencer argued that traits which made a person “fit” or “unfit” to lead a productive life were inherited. “Through evolution, the ‘fittest’ would naturally continue to perfect society. And the ‘unfit’ would naturally become more impoverished, less educated and ultimately die off.”63 Spencer’s belief in the heritability of traits provided backing to Malthus’s criticism of public assistance, suggesting that such assistance is only slowing the otherwise natural processes that advance humankind. Nearly a decade after Spencer introduced the notion of the survival of the fittest, Darwin published On the Origin of Species. Whereas Spencer’s notion

was a reasoned extension of Malthus’s theory, Darwin’s discovery of evolution through natural selection lent scientific credibility to the theory of survival of the fittest. As noted in Chapter 4, Darwin credited Malthus’s influence on the formation of his theory, writing: It is the doctrine of Malthus applied with manifold force to the whole animal and vegetable kingdoms; for in this case there can be no artificial increase of food, and no prudential restraint from marriage. Although some species may be now increasing, more or less rapidly, in numbers, all cannot do so, for the world would not hold them.64 Upon reading his cousin’s book, Galton wrote to Darwin, Pray let me add a word of congratulation on the completion of your wonderful volume … I have laid it down in the full enjoyment of a feeling that one rarely experiences after boyish days, of having been initiated into an entirely new province of knowledge, which, nevertheless, connects itself with other things in a thousand ways.65 It was in this context, in which the theories of Malthus, Spencer, and Darwin were intertwined, that Galton, Goddard, Dugdale, Estabrook, and a dozen other investigators studied the heritability of family traits.66 Over a 50-year period, the increasing acceptance of the hereditary nature of traits led the investigators of families to narrow their focus to the characteristics and behaviors of individual family members and increasingly dismiss the influence of the social and economic environments in which each individual was reared and subsequently operated. 
This focused attention on the individual is most evident in Goddard’s description of his field researchers—women trained to identify and record mental capacities through mere observations, but whose field work produced few details on the lived experiences of these individuals beyond degenerate behaviors “produced” by their “feeblemindedness.” Although none of the family studies made use of the rudimentary or recently emerged tests of cognitive functions, their tight focus on the heredity of mental traits—understood as innate traits of the individual—functioned recursively to strengthen understanding of mental ability as a fixed attribute of the individual. As we see next, understanding mental ability as an innate, fixed trait of the individual was core to the development and rapid rise of tests of intelligence.

Notes

1 Rotherham United is currently nicknamed the Millers, and this quote is the supporters’ team motto.
2 Rafter (1988), p. 5.

3 See “The apple doesn’t fall far from the tree” at Grammarist (2002), https://grammarist.com
4 Rafter (1988), p. ix.
5 Wiebe (1967); Black (2012); Rafter (1988); Zenderland (1998/2001).
6 Rafter (1988), p. 1.
7 Black (2012); Zenderland (1998/2001).
8 Rogers and Merrill (1919); Rafter (1988).
9 Davenport and Laughlin (1915), p. 3.
10 Summer School (1919), p. 21.
11 Rafter (1988), p. 2.
12 Rafter (1988), p. 6.
13 Hereditary Genius expanded on papers and letters Galton published in 1865, 1868, and 1869 which explored the hereditary nature of mental traits and their relationship with eminence.
14 Galton (1869/1914), p. 33.
15 Galton (1869/1914), p. 33.
16 Galton (1869/1914), p. xiii.
17 Galton (1869/1914), p. 49.
18 Galton (1869/1914), p. 52.
19 Galton (1869/1914), p. 62, italics in the original.
20 Galton (1869/1914), p. 309.
21 Galton (1869/1914), p. 37.
22 Galton (1869/1914), p. 37.
23 Galton (1869/1914), p. 38.
24 Galton (1869/1914), p. xii.
25 Galton (1869/1914), p. 31.
26 Galton (1869/1914), p. 31.
27 Galton (1869/1914), p. 325.
28 Galton (1869/1914), p. 326, italics in the original.
29 Galton (1869/1914), p. 327.
30 Galton (1869/1914), pp. 326–327.
31 Darwin (1869).
32 Wallace (1870) quoted in Gökyiğit (1994), p. 221.
33 Gökyiğit (1994), pp. 222–223.
34 Catholic World, quoted in Gökyiğit (1994), p. 223.
35 Gökyiğit (1994), p. 229.
36 Gökyiğit (1994), p. 229.
37 Galton (1869/1914), p. v.
38 Note that Galton’s analysis of the adopted sons of popes was an effort to compare the rise to eminence of children born to parents of an unknown social class but raised in privileged settings. However, Galton’s analysis relied on a weak assumption regarding the similarity of social environments among the English elite and children adopted into religious residences. Further, Galton limited his analysis to elite social settings absent similar analysis of the rise to eminence of children raised in less privileged settings or the few students from such settings who were granted admission to elite universities such as Cambridge.
39 Dugdale (1877/1988), p. 35.
40 Dugdale (1877/1988), p. 36.
41 Dugdale (1877/1988), p. 36.
42 Dugdale (1877/1988), p. 37.
43 Dugdale (1877/1988), p. 37.
44 Dugdale (1877), p. v, in introductory comments by Elisha Harris.

45 Dugdale (1877/1988), pp. 38–40.
46 Dugdale (1877), p. 66.
47 Justice Oliver Wendell Holmes authored the majority opinion in which the court upheld Virginia’s intent to sterilize Carrie Buck due to her classification as a “mental defective.” See Lombardo (2008) for a detailed account of Buck v. Bell.
48 Estabrook and Davenport (1912).
49 Estabrook (1916), p. 85.
50 Charles Davenport summarizing Estabrook’s findings in the preface to Estabrook (1916), p. iii.
51 Estabrook (1916), p. 85.
52 Charles Davenport summarizing Estabrook’s findings in the preface to Estabrook (1916), p. iv.
53 Hall quoted in Zenderland (1998/2001), p. 57.
54 Zenderland (1998/2001), p. 60.
55 Goddard (1912/1916), p. 10.
56 Goddard (1912/1916), p. 13.
57 Goddard (1912/1916), p. 18.
58 Goddard (1912/1916), p. 19.
59 Goddard (1912/1916), pp. 53, 69.
60 Black (2012); Goddard (1912/1916), p. 117.
61 Rafter (1988). Rafter also argues that family studies provided a body of evidence that provided justification for eugenic conceptions that linked social class to hereditary traits. See also MacKenzie (1981, p. 18), who observed, “The eugenic theory of society is a way of reading the structure of social classes onto nature.”
62 Black (2012).
63 Black (2012), p. 12.
64 Darwin (1859), p. 34.
65 Darwin (1859).
66 Black (2012).

References

Black, E. (2012). War Against the Weak: Eugenics and America’s Campaign to Create a Master Race. Dialog Press.
Darwin, C. (1859). On the Origin of Species. D. Appleton and Company.
Darwin, C. (1869). Letter 410, To Francis Galton, dated December 23.
Davenport, C.B. & Laughlin, H.H. (1915). How to Make a Eugenical Family Study, Bulletin No. 13. Eugenics Record Office, Cold Spring Harbor.
Dugdale, R.L. (1877). “The Jukes”: A Study in Crime, Pauperism, Disease, and Heredity: Also Further Studies of Criminals (No. 14). G.P. Putnam’s Sons.
Estabrook, A.H. (1916). The Jukes in 1915 (No. 240). Carnegie Institution of Washington.
Estabrook, A.H. & Davenport, C.B. (1912). The Nam Family: A Study in Cacogenics (No. 2). Eugenics Record Office.
Galton, F. (1859). Letter 82, Francis Galton to Charles Darwin, dated December 9.
Galton, F. (1869/1914). Hereditary Genius: An Inquiry into its Laws and Consequences. D. Appleton.
Goddard, H.H. (1912). Feeble-mindedness and immigration. The Training School, 9(6), 91–94.

Gökyiğit, E.A. (1994). The reception of Francis Galton’s “Hereditary Genius” in the Victorian Periodical Press. Journal of the History of Biology, 27(2), 215–240.
Lombardo, P.A. (2008). Three Generations, No Imbeciles: Eugenics, the Supreme Court, and Buck v. Bell. JHU Press.
MacKenzie, D. (1981). Statistics in Britain, 1865–1930: The Social Construction of Scientific Knowledge. Edinburgh University Press.
Rafter, N.H. (Ed.). (1988). White Trash: The Eugenic Family Studies, 1877–1919. Northeastern University Press.
Rogers, A.C. & Merrill, M.A. (1919). Dwellers in the Vale of Siddem: A True Story of the Social Aspect of Feeble-Mindedness. RG Badger.
Summer School. (1919). Alumni Roster: Eugenics course. Eugenical News, 4(3), 21–28.
Wallace, A.R. (1870). Review of Hereditary Genius. Nature, 1, 501–503.
Wiebe, R.H. (1967). The Search for Order, 1877–1920. Macmillan.
Zenderland, L. (1998/2001). Measuring Minds: Henry Herbert Goddard and the Origins of American Intelligence Testing. Cambridge University Press.

6

The Birth of Tests of Mental Ability

One of the most important objects of measurement … is to obtain a general knowledge of the capacities of a man by sinking shafts, as it were, at a few critical points.1

Published in 1869, Francis Galton’s Hereditary Genius focused attention on the genetic transmission of mental ability across generations, kindling eugenic solutions intended to manage the evolution of the mental qualities of humankind. Interest in the intellect of humankind, however, predates Galton. As far back as the 4th century BC, Plato contemplated intelligence, positing that intelligence resided in the soul rather than the body.2 The intelligence of characters was also often noted by Shakespeare—in Troilus and Cressida, Thersites described Agamemnon as having “not so much brain as earwax”; and in Twelfth Night, Quinapalus claimed, “better a witty fool than a foolish wit.”3 Mental traits were also noted by Linnaeus in his classifications of humans into racialized categories, describing Europaeus as “wise” and “inventive” and Africanus as “sly” and “neglectful.”4 Following the social construction of race and racialized categories, differences in mental ability became a focal topic of investigation for 19th-century social scientists. Most notable were efforts by Georges Cuvier, Samuel Morton, and Paul Broca to infer mental ability possessed by members of different racialized groups based on the characteristics of their skulls. These initial efforts to infer mental ability based on scientific observation of the deceased sparked interest in developing methods to measure mental traits in the living.

This chapter traces the development of mental measures during the 19th and early 20th centuries, culminating in the standardized tests of intelligence that gained widespread traction in the 1920s. Much has been written about the development of intelligence testing. This chapter highlights key contributions to this development to document the ways in which this work was guided by the White Racial Frame. This tracing uncovers several ways in which the White Racial Frame influenced these early developments and how, in turn, these developments influenced the field as it exists today. To tell this story, the chapter is organized into four sections. The first section focuses on precursors to the development of intelligence tests. The second focuses on early efforts to develop and use measures of mental ability. The third focuses on the development of the first large-scale standardized measure of mental ability. The fourth explores interpretations of scores produced by that first large-scale administration.

DOI: 10.4324/9781003228141-9

Precursors to Intelligence Testing

Skulls and Mental Functions

Early efforts to “measure” intellect were inseparable from the racial science that developed in the 19th century. Initial efforts to make inferences about mental functions and capabilities relied on the analysis of skulls. Interest in skulls began with the French naturalist Georges Cuvier. Much of Cuvier’s work focused on the paleontology of vertebrates. Among his many contributions to science was his discovery and subsequent determination that fossils found in Argentina were not those of elephants, but rather prehistoric mastodons. Cuvier’s work with bones extended to modern-day humans, for whom he observed notable differences in the shape of skulls of people membered into different racialized groups. Perhaps his most notorious analysis was that of Sarah Baartman, a South African Khoikhoi woman exhibited in Europe as the “Hottentot Venus.” Upon her death, Cuvier dissected Baartman’s remains and, making particular note of her skull shape, claimed she more closely resembled a monkey than a human being.5 Cuvier’s crude analysis of skulls was advanced by Petrus Camper, a Dutch physician whose work focused on comparative anatomy. Through his analysis of skull shape, Camper introduced a measure of facial angle which he used to differentiate racialized groups. Although he did not relate measures of facial angle to intelligence, his method of measurement planted the seed for using features of skulls as an indicator of mental functions. Camper’s efforts to differentiate among racialized groups based on measures of facial angle piqued the interest of Johann Blumenbach, a German scholar who eventually collected 245 skulls from various regions of the world. Through his analysis of these skulls, Blumenbach developed several measures of skull features. Combining these measures into what he termed the norma verticalis, Blumenbach claimed his composite measure provided a unique technique for categorizing a skull into a racialized group.
Cuvier, Camper, and Blumenbach each used skulls to differentiate among racialized groups. They each also operated with a hierarchical ordering of racialized groups, with White Europeans at the top and Black Africans at the bottom. Yet, their analysis of skulls did not directly connect measurable differences in skull shape with differences in mental functions. These connections were made explicit by Franz Gall and Samuel Morton.

Franz Gall was a German neuroanatomist and physiologist who pioneered phrenology—a popular pseudoscience during the early 1800s that made inferences about mental and psychological traits based on specific skull and facial features.6 Gall believed that the mind was formed by a collection of independent entities or traits each housed in specific locations within the brain. Over time, Gall developed maps detailing the location of what he termed “organs.” Among the dozens of organs mapped were traits such as courage, ambition, caution, humor, parental love, friendship, and self-reliance. Gall similarly located specific regions of the brain in which number and language were processed and where concentration occurred.7 Using maps that linked specific areas of a skull with each of these organs, Gall introduced methods to “conjure a person’s skills, propensities, and deep-seated personality traits from bumps on the skull.” Over time, application of Gall’s phrenological science “proved to be a lucrative business for those greedy charlatans, who opened shops in the cities and went around the countryside masquerading as knowledgeable professors.”8 Although phrenology was eventually dismissed as pseudoscience, it played an important role in advancing the idea that analysis of the brain can provide insight into cognitive functions, a notion Samuel Morton helped further advance.

Born and raised in Philadelphia as a Quaker, Morton was a polygenist who adhered strongly to the hierarchical ordering of racialized groups.
More significantly, Morton is viewed as the father of the “American School” of anthropology and of scientific racism in the United States.9 At the age of 20, Morton began his studies at the University of Pennsylvania Medical School, and was elected a member of the prestigious Academy of Natural Sciences of Philadelphia, before transferring to the University of Edinburgh Medical School, where he completed his studies. While in Edinburgh, Morton was influenced by Robert Jameson whose work with animal specimens provided a foundation for Morton’s interest in craniometry. Upon returning to the United States, Morton established a medical practice in Philadelphia and became a teacher at the Philadelphia Association of Medical Instruction. Morton also engaged in several paleontology studies, including an analysis of fossils collected by Lewis and Clark.10 During the early phase of his career, Morton developed a strong interest in phrenology.11

Morton’s most notable work focused on the measure of cranial capacity of human skulls. Morton began collecting human skulls during the 1830s, and by his death in 1851 he had amassed nearly 1,000 skulls from around the world.12 In addition to recording the many measures introduced by Camper, Blumenbach, and others, Morton measured the volume of each skull’s cranial capacity. As Morton describes,

In order to measure the capacity of a cranium, the foramina were first stopped with cotton, and the cavity was then filled with white pepper seed poured into the foramen magnum until it reached the surface, and pressed down with the finger until the skull would receive no more.

The seed was then removed and “the capacity of the cranium in cubic inches” was determined based on the seed’s volume.13 Because Morton believed that the size of one’s brain determined one’s mental ability, he used his measure of cranial capacity to represent the level of intellect of the person whose skull was measured. Having collected skulls from around the world, Morton grouped his skulls based on the racialized identity of the person from whom the skull was taken. Morton then calculated the mean cranial capacity of each racialized group. Based on these calculations, Morton concluded that the Caucasian skulls had the largest cranial capacity, and thus Caucasians were of the highest intellect. Mongolian skulls were second in size, followed by Malay, (Indigenous) American, and then Ethiopian.14

The volume of sales of Morton’s book, Crania Americana, was disappointingly small, and some reviewers were critical of Morton’s methods. In particular, critics noted Morton’s failure to separate skulls from males and females and to consider the age of the person for each skull. More recently, Stephen J. Gould, the late paleontologist and evolutionary biologist, raised similar concerns about Morton’s seemingly subjective decisions to include or exclude skulls, leading Gould to conclude that Morton’s “a priori conviction of racial ranking [was] so powerful that it directed his tabulations along preestablished lines.”15 Despite these concerns, Morton’s conclusions had a notable influence on the thinking of emerging medical and scientific experts, most notably Josiah Nott and Louis Agassiz, who played fundamental roles in advancing the racial science that eventually manifested in intelligence testing and the eugenics movement in the United States.16

Psychophysics and Mental Abilities

Whereas Morton and Gall relied on features of the human skull to infer mental traits, Gustav Fechner employed physical stimuli to measure activity of the mind. A German-born physicist and experimental psychologist, Fechner pioneered the development of psychophysics. Reflecting the positivist endeavor to represent the world through mathematical laws that was gaining traction at the time, Fechner sought to discover the mathematical relationship between the strength of stimuli and the sensation of those stimuli. To identify this relationship, Fechner undertook various experiments in which he modified the intensity of a given stimulus and documented a person’s ability to detect the modification.

Fechner’s work built on that of Ernst Heinrich Weber, a German physiologist who developed a mathematical expression that represented the degree to which a stimulus must increase in order for that increase to be just noticeable. In Weber’s research, a person was exposed to two stimuli in succession and asked to indicate whether the stimuli were of the same intensity or of different intensities. As an example, a person might be asked to hold a block weighing 50 grams. The person was then presented with another block weighing 54 grams and asked whether there was a difference in the weight of the two blocks. The experiment was then repeated with blocks of other weights—for example, 100 grams and 104 grams, and so on. Through his many experiments, Weber discovered that the detection of just noticeable differences in stimuli was dependent not on a constant difference in stimuli, but rather on the ratio between the two stimuli. In the example presented, if the 4-gram increase from 50 to 54 was just noticeable (yielding a ratio of 1.08), then the 4-gram increase between 100 and 104 grams would not be sufficient for producing a just noticeable difference. Instead, an 8-gram increase (from 100 to 108 grams, preserving the ratio of 1.08) is required.
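The arithmetic in this example can be restated as a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes, as the example does, that the just noticeable ratio stays constant across base weights.

```python
def weber_fraction(base: float, just_noticeable: float) -> float:
    """Relative increase over the base weight that is just noticeable."""
    return (just_noticeable - base) / base


def just_noticeable_increment(base: float, fraction: float) -> float:
    """Increment needed at a given base weight, assuming a constant fraction."""
    return base * fraction


# From the example in the text: 54 grams is just noticeably heavier than 50 grams.
k = weber_fraction(50.0, 54.0)

# Apply the same fraction to other base weights (200 grams is an added illustration).
for base in (50.0, 100.0, 200.0):
    increment = just_noticeable_increment(base, k)
    print(f"base {base:.0f} g: increase of about {increment:.1f} g needed")
```

Under that assumption, the required increment grows in proportion to the base weight, which is why a 4-gram increase that is detectable at 50 grams goes unnoticed at 100 grams.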
Fechner’s work sought to extend Weber’s law to develop what he described as “an exact theory of the functionally dependent relations of body and soul or, more generally, to the material and the mental, of the physical and the psychological worlds.”17 In effect, Fechner aimed to measure “the mental” through manipulation of “the material.” As Derek Briggs, professor of research and evaluation methodology, describes:

Fechner saw measurement as a matter of finding a unit or standard that could be used to count equalities in some target quantities … Fechner argued that the act of [mental] measurement always involves an estimate based on the mental impression that is made on the measurer … Fechner reasoned that since physical quantities are ultimately understood as measures through a psychological interpretation, it must be possible to invert the relationship such that psychological quantities could be understood through a physical interpretation.18

To this end, Fechner aimed to advance the measure of Weber’s just noticeable difference and deepen understanding of the relationship between stimuli produced in the physical world and the sensations experienced in the psychological. Like Weber’s, Fechner’s technique relied on the presentation of physical stimuli to elicit feedback on the sensations produced by those stimuli. In addition to Weber’s original technique, Fechner introduced a new method in which he presented participants with stimuli of two different magnitudes, one being of greater magnitude than the other, and asked them to judge whether they perceived one as being greater than the other. As an example, to one group of people Fechner might present blocks of 100 grams and 108 grams and ask whether the second block is heavier than the first. To a second group, he might present blocks of 100 grams and 116 grams. In the first group, Fechner might observe that approximately 50% of people respond correctly that the second block is heavier than the first. In the second group, 65% might respond correctly. Modifying this “test item” in a systematic manner, Fechner estimated the relationship between differences in the physical magnitudes of the stimuli and the accuracy of judgments. Knowing this relationship, Fechner then derived the just noticeable difference.

Fechner’s work is important for the development of mental measures for two reasons. First, his method of presenting stimuli to activate psychological reactions was a precursor to the use of test items to stimulate cognitive functions. More specifically, Fechner’s systematic manipulation of the stimuli—altering the magnitude of the physical stimulus—to evidence the impact on a mental process—detection of a noticeable difference—foreshadows the manipulation of the difficulty of items presented to test takers in order to evidence differences in mental ability. Second, Fechner’s efforts to develop mathematical representations that express relationships between external stimuli and internal processes (e.g., evaluation of those sensations) laid a foundation for the development of mathematical models that yield measures of cognitive ability. Prior to Weber and Fechner’s work, measurement was limited to the physical world. Fechner, however, expanded the domain of measurement from the physical to the psychological world.
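The judgment procedure described above can be illustrated with a small calculation. The sketch below is not Fechner’s own computation: the 124- and 132-gram results are hypothetical additions to the two groups in the example, the 75% criterion is an assumed convention, and simple linear interpolation stands in for whatever curve he actually fitted.

```python
def threshold_difference(points, criterion=0.75):
    """Interpolate the stimulus difference at which the proportion of
    correct judgments reaches the criterion.

    points: (difference_in_grams, proportion_correct) pairs.
    """
    ordered = sorted(points)
    for (d0, p0), (d1, p1) in zip(ordered, ordered[1:]):
        # Look for the pair of neighboring observations that brackets the criterion.
        if p0 <= criterion <= p1 and p1 > p0:
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    raise ValueError("criterion is not bracketed by the observed proportions")


# Differences from a 100-gram standard and the share of correct judgments.
# The first two pairs follow the example in the text; the last two are hypothetical.
observations = [(8.0, 0.50), (16.0, 0.65), (24.0, 0.80), (32.0, 0.92)]

estimate = threshold_difference(observations)
print(f"estimated threshold difference: about {estimate:.1f} g")
```

The point of the sketch is the logic rather than the numbers: varying the stimulus systematically and recording the accuracy of judgments yields a curve from which a threshold can be read off.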
As Briggs describes, Fechner’s naturalistic philosophy that rejected the mind-body dichotomy, and his theory that all measurement depends on the specification of a measurement formula … led him to the conviction that a psychological attribute (sensation intensity) was measurable through its relationship to a physical attribute (stimulus magnitude).19

Galton and the Anthropometric Laboratory

It is important to note that Fechner did not apply his efforts to measure psychological sensation to make inferences about mental ability; that leap was left for Francis Galton. As Arthur Jensen, an educational psychologist whose controversial interpretation of racial differences in mental measures we touch upon later, describes, Galton “suggested that individual differences in general ability are reflected in performance on relatively simple sensory capacities and in speed of reaction to a stimulus, variables that could be objectively measured by tests of sensory discrimination and reaction time.”20

To systematically collect measures of human physical and “mental” traits, Galton created what he termed the Anthropometric Laboratory, which he installed at the 1884 London International Health Exhibition. As Galton describes, the laboratory was formed by

a space 36 feet long by 6 feet wide [that] is fenced off from the side of a gallery by open lattice-work. It is entered by a door at one end, and is quitted by a second door at the other. The public can easily see through the lattice work, while they are prevented from crowding too close. A narrow table runs half-way down the side of the laboratory, on which the smaller instruments are placed. The measurements with the larger ones take place beyond the table.21

Among the instruments, two are most relevant to the development of measures of mental ability. The first device was designed to measure one’s ability to judge the squareness of an angle. As Galton describes, the device consisted of

a board including a sector of a circle … an arm movable about the centre of the circle … A black line is drawn across the board. The person tested is desired to set the arm as squarely as he can to the black line.22

Once the person completes the task, an attendant uses a built-in scoring guide to determine the deviation of the angle created from 90 degrees. The second measure focused on the ability to segment lines into sections representing one-half and one-third of the line’s length. Although Galton classified these two tasks as measures of “judgement of eye as regards length [and] as regard squareness,” one can find quite similar items in many mathematics achievement tests administered to elementary school students today. Galton charged three pence for attendees to pass through his laboratory.
Despite this price of admission, during the course of the exhibition, Galton collected measures from 9,377 persons, as well as some 117 pounds sterling, perhaps the first example of payment for a cognitive test.23 Galton argued that his systematic collection of measures was useful for both personal and statistical purposes:

Periodical measurements afford a sure test whether the physical development of the child or youth is proceeding normally … Anthropometric records are treated statistically to discover the efficiency of the nation as a whole and in its several parts, and the direction in which it is changing, whether for better or worse.24

Galton’s anthropometric data provided a model for the data collection efforts undertaken some 25 years later by the Eugenics Records Office to monitor “degeneracy” in the U.S. population. His vision also foreshadows modern-day large-scale testing programs, such as the National Assessment of Educational Progress and the Trends in International Mathematics and Science Study, that are designed to monitor changes in student achievement at the national level. As explored in Chapter 8, data collected by his laboratory also proved useful for Galton’s development of the concepts of statistical correlation and regression.

A separate investigation undertaken by Galton similarly served as a model for the development and use of a standardized survey instrument to collect information from a population sample. In this case, Galton speculated that the ability to produce a clear and detailed mental image of a previously experienced scene was indicative of mental ability. To explore this hunch, Galton created an instrument comprising a single prompt followed by three open-response items. The prompt read as follows:

Before addressing yourself to any of the Questions on the opposite page, think of some definite object—suppose it is your breakfast-table as you sat down to it this morning—and consider carefully the picture that rises before your mind’s eye.

Respondents were then asked the following questions:

1 Illumination.—Is the image dim or fairly clear? Is its brightness comparable to that of the actual scene?
2 Definition.—Are all the objects pretty well defined at the same time, or is the place of sharpest definition at any one moment more contracted than it is in a real scene?
3 Colouring.—Are the colours of the china, of the toast, bread-crust, mustard, meat, parsley, or whatever may have been on the table, quite distinct and natural?25

Galton collected responses from a variety of people in England, other regions of Europe, and the United States.
From this diverse sample, Galton reviewed the submissions and selectively sampled 100 responses, “at least half of whom are distinguished in science and other fields of intellectual work.”26 Through purposeful selection, Galton’s final set of responses represented a wide range in reported ability to produce mental imagery. Ordering the responses from 1 to 100 based on the degree of vividness reflected in the respondent’s descriptions, Galton then formed nine ordered categories ranging from “Highest” to “Lowest” level of vividness of mental imagery. The same process was repeated for “Colour Representation.” Galton then examined the relationship between these rankings and the “intellect” of each respondent, as indicated by whether their work was considered distinguished in science and other fields or whether they were a member of the general public.

Despite his belief that people of high intellect possess higher abilities of mental imagery, Galton’s data failed to support his speculative hunch. Galton, however, resisted this finding, reasoning that people of high intellect must have a habit of suppressing mental imagery … and as the power of dealing easily and firmly with these ideas is the surest criterion of a high order of intellect, we should expect that the visualising faculty would be starved by disuse among philosophers, and this is precisely what I found on inquiry to be the case.27

Among his other musings about the relationship between mental imagery and intellect, Galton recorded the following:

I found as a rule that men have more delicate powers of discrimination than women, and the business experience of life seems to confirm this view. The tuners of pianofortes are men, and so I understand are the tasters of tea and wine, the sorters of wool, and the like. These latter occupations are well salaried, because it is of the first moment to the merchant that he should be rightly advised on the real value of what he is about to purchase or to sell. If the sensitivity of women were superior to that of men, the self-interest of merchants would lead to their being always employed; but as the reverse is the case, the opposite supposition is likely to be the true one.28

The discriminative faculty of idiots is curiously low; they hardly distinguish between heat and cold, and their sense of pain is so obtuse that some of the more idiotic seem hardly to know what it is.29

Since families differ so much in respect to this gift, we may suppose that races would also differ, and there can be no doubt that such is the case.30

Although Galton’s efforts to link response time and mental imagery to intellect were unsuccessful, Galton nonetheless wrote that “the trials I have as yet made on the sensitivity of different persons confirms the reasonable expectation that it would on the whole be highest among the intellectually ablest.”31 In these comments, we see the strong influence that a belief in the hereditary nature of traits coupled with preconceived notions regarding the intellectual superiority of males membered White had on Galton’s questionable conclusions and related speculations. Perhaps more accurate and influential for the development of mental measures is Galton’s conclusion that his work “proved [the] facility of obtaining statistical insight into the processes of other persons’ minds, whatever à priori objection may have been made as to its possibility.”32 It is the facility of gaining insight into mental ability through standardized procedures that Alfred Binet would soon develop, and which Henry Goddard would then help popularize in the United States.

Standardized Tests of Mental Ability

Although Galton’s tests did not measure anything closely related to our current conception of intellect, his investigations inspired scholars in the budding field of psychology to explore new methods for measuring mental abilities.33 Between 1890 and 1905, an expanding number of psychologists engaged in research programs aimed at developing measures of mental abilities. Initially, these efforts resembled techniques employed by psychophysicists. As an example, U.S. psychologist James Cattell developed both a 10-item and a 50-item test designed to “discover the constancy of mental processes, their interdependence, and their variation under different circumstances.”34 On his short form were items that measured dynamometer pressure (“greatest possible squeeze of the hand”), pressure causing pain, least noticeable difference in weight, reaction time, line bisection, and the number of letters listed that could be repeated.35 Hugo Münsterberg, a German-trained professor of applied psychology at Harvard University, similarly attempted to infer mental ability based on measures of the length of time required to complete various tasks such as reading aloud ten monosyllabic words, reading ten names of plants and animals, stating the names of ten simple drawings, adding ten single-digit numbers, and naming three different perfumes. Similar efforts were made by other psychologists, including Joseph Jastrow, Emil Kraepelin, and J.A. Gilbert, whose tests included tasks specific to memory, capacity for exercise, tiredness, absent-mindedness, and natural suggestibility.
In 1896, Alfred Binet and his then-student Victor Henri critiqued each of these efforts, arguing that they focused too much attention on “elementary mental operations” and largely ignored “higher processes.”36 As Binet and Henri write:

If one looks at the series of experiments that have been made—the mental tests, as the English say—one is astonished by the considerable place reserved to the sensations and simple processes, which some [experimenters] neglect completely … The objection will be made that the elementary processes can be determined with much more precision than the superior ones; this is true, but people differ much less in these elementary processes than in the complex ones; there is no need, therefore, for as precise a method of determining the latter as for the former, a point that is often forgotten. Anyway, it is only by applying ourselves to this point that we can approach the study of individual differences … We must expend our attention on superior psychic faculties.37

With this shift in emphasis from elementary to superior processes of the mind, Binet launched his effort to develop a measure of mental ability. Interested in pedagogy as well as psychology, Binet was particularly intent on developing a method to efficiently distinguish children whose cognitive abilities could be developed in typical educational environments from those children who required special instruction. Here again, Binet was critical of the then-current methods used to diagnose children with cognitive challenges. At the time, diagnoses were typically made by medical experts and focused on physical disabilities that often, but not always, accompanied cognitive needs. As Binet argued, a child should be diagnosed as requiring cognitive support

not because the child does not walk, nor talk, has no control over secretions, is microcephalic, has the ears badly formed or the palate keeled … [Instead] the child is judged to be [cognitively challenged] because he is affected in his intellectual development … if we suppose a case presented to us where speech, locomotion, prehension were all nil, but which gave evidence of an intact intelligence, no one would consider that patient [cognitively challenged].38

For Binet, diagnosing cognitive development should be based solely on psychological criteria rather than medical or physical concerns. For nearly a decade, Binet engaged in various lines of research in pursuit of such a method, but his experiments were unsuccessful.39 The issues he grappled with during these efforts foreshadow challenges and disagreements about measures of cognition that persist to this day.
For example, in 1898, Binet asked “if it is a question of measuring the keenness of intelligence, where is the method to be found to measure the richness of intelligence, the sureness of judgement, the subtlety of mind?”40 Shortly thereafter, Binet critiqued his own efforts to develop methods for measuring the mind, “press[ing] the point that the results offered classifications among individuals, not true measurements.”41 Binet argued that unlike physical measures, such as length, for which the difference between six and seven inches is the same as that between seven and eight, it is unknown “whether the difference between a recall of six digits and a recall of seven digits is or is not equal to the difference between the recall of seven digits and the recall of eight.”42

Through his work, Binet came to understand that “intelligence is embedded in the total personality” rather than something that exists independent of personality.43 Further, Binet did not conceive of intelligence as a single unidimensional construct, but rather as something that is composed of many different faculties. And by 1903 Binet had come to conceive “of thought or intelligence as something—an act, a process, a force—that takes in external stimuli, organizes, directs, chooses, adapts them, all in ways that differ greatly among individuals.”44 In this way, intelligence is stimulated and interacts with elements of the external world—an idea that gave rise to the use of items as external stimuli of the mental processes targeted by a test of cognitive abilities.

In 1904 Binet was presented with a unique opportunity to apply his deepened understanding to solve a practical problem. Like the United States, France had adopted universal education that required all children to receive schooling. And like the United States, the French public schools struggled to support the development of students with cognitive challenges. To help address this issue, the Paris Minister of Public Education formed a commission to study the education of students with special needs. One of the initiatives the commission took up focused on establishing standards for placing students with learning challenges into “special classes.” It was this challenge that motivated Binet’s efforts to develop a method for classifying children who were in need of special instruction.

The method Binet and his colleague Théodore Simon developed differed notably from previous efforts. Understanding the issue as one of classification rather than measurement, Binet structured the solution as a comparison between the cognitive functioning of “normal” children with those experiencing challenges.45 Recognizing that the ability to perform various cognitive functions developed at an older age for children with cognitive challenges, Binet sought a method that related a child’s current cognitive abilities with the age when children typically develop those abilities. To this end, Binet and Simon created a set of 30 tasks of increasing difficulty.
Easier tasks asked children to identify common objects, count, and make rhymes. More challenging tasks required children to compose sentences, compare quantities, and perform more advanced arithmetic. To establish norms representing the age at which various cognitive functions typically emerge, this series of tasks was administered verbally in a standardized manner to children attending normal schools. The test administrator recorded the child’s age and the number of tasks successfully completed. With this information, Binet and Simon calculated the typical (average) number of tasks successfully completed by children of different ages. As Binet and Simon’s work advanced, they also developed items that differentiated children of one age from that immediately above. As an example, they noticed that three-year old children were sometimes unable to correctly state their gendered identity, but four-year old children rarely made this mistake. In another task, children were asked what is silly in a sentence that reads “I have three brothers, Paul, Ernest and myself.” As Binet describes, “After finishing with the [cognitively challenged] children we recognized that

it is almost always possible to equate them with normal children who are much younger.”46 Comparing the responses provided by a given child against norms calculated for children ranging from 3 to 13, Binet classified a child’s relative mental age. Comparing a child’s mental age with their actual age was used to identify cognitive challenges.47 Further, Binet applied the relationship between actual age and age of cognitive functioning (a.k.a. grade) to classify children into different levels of cognitive challenge.

Importing Binet-Simon to the United States

One year after Binet and Simon introduced the first version of their test of mental abilities, Henry Goddard became director of research at the Vineland Training School for Feeble-Minded Girls and Boys in Vineland, New Jersey. As director, Goddard launched a program of research aimed at understanding the causes of feeblemindedness. As part of this effort, Goddard and his small staff met regularly with each of the 200 or so children in residence at the school. During these meetings, Goddard probed the children to document the types of cognitive and physical tasks they were able to perform. Over years of data collection, Goddard hoped to chart the developmental path of feebleminded children and to develop more sound methods for classifying children into different levels of feeblemindedness. As Goddard would later recount, it was early in this research endeavor that “somehow there came into my hands a single printed sheet signed by an unknown Belgian by the name of M.C. Schuyten.”48 Engaged in similar Child-Study research in Europe, Schuyten introduced Goddard to a Doctor Ovide Decroly, who directed a research program in Brussels similar to Goddard’s at Vineland. Goddard traveled to Europe to meet Dr. Decroly and learn more about the Child-Study research occurring on that continent. As Goddard tells it, Dr. D. came to the door. I said I am Mr. Goddard from America.
Quick as a flash, he said, “Dr. Henry Goddard? You have written an article on the ideals of German children. My wife has translated it into French.”49 With that warm reception, the two engaged in an extended discussion of their work, during which Decroly mentioned the test newly published in French by Binet and Simon. Goddard did not travel to Paris to meet directly with Binet, but he did return home with a copy of the Binet-Simon test. Although scholars in Europe were skeptical of Binet-Simon’s ability to accurately classify children, after administering the test to a handful of children in residence at Vineland, Goddard recognized its value. As Leila Zenderland, a professor of American studies, describes, “Contained within Binet’s articles [about the test], Goddard quickly realized, was an entirely

new psychological approach toward diagnosing and classifying feeble minds.”50 As he administered the test to additional children, Goddard noted a strong correspondence between the classifications provided by the test and the evaluations of the mental abilities of the children recorded after countless hours of interactions by his staff. Goddard concluded, “It met our needs. A classification of our children based on the Scale agreed with the Institution experience.”51 A year after Goddard began experimenting with the Binet-Simon test, he delivered a presentation at the 1909 American Association for the Study of the Feeble-Minded in which he made a strong case for the test’s utility. Like Binet, Goddard argued that the test was more informative than the classifications made by the medical profession based on physical characteristics of children. Goddard also argued that the test was far more efficient than the then-standard practices employed by educational institutions that required multiple interactions over several months before a classification was made. Conference attendees were similarly frustrated with existing methods, but their response to Goddard’s advocacy for the adoption of the Binet-Simon was lukewarm. Over the next two years, Goddard engaged in further experimentation with the test. In 1910 he published a paper titled, “Four Hundred Feeble-Minded Children Classified by the Binet Method,” in which he presented evidence of the consistency of classifications made by the test compared to that of institutional staff, providing what today is termed concurrent criterion-related validity evidence. The day after he presented his paper, the American Association for the Study of the Feeble-Minded unanimously adopted Goddard’s new classifications of the feebleminded.
As Zenderland summarizes, “An idiot was now defined as one testing below a mental age of three on the Binet scale, and an imbecile as one testing between mental ages three and seven.”52 In addition to using the Binet-Simon routinely with the children at Vineland, Goddard administered the tests to students in normal schools. This research was inspired, in part, by recent findings that many children in normal schools were above age for their current grade of study. This finding prompted questions about the cause of this mismatch—was it that some of these children were feebleminded, or was the pedagogy less effective than believed? To explore these questions, Goddard administered the Binet-Simon test to more than 1,500 elementary schoolchildren in New Jersey normal schools. His findings indicated that the mental age of only 36% of children matched their chronological age, yet 78% were within a range of one year. About 4% were a year or more above and were classified “gifted.” Fifteen percent were two to three years behind their chronological age and were described as “merely backward.” The remaining 3% were three years or more behind and were believed to be feebleminded.

Following Goddard’s study in New Jersey, similar studies were launched in New York. In the three years that followed, use of Goddard’s version of the Binet-Simon test of intelligence spread across the United States, with more than 22,000 copies distributed by Goddard’s research lab at Vineland. As Zenderland describes, Goddard employed results from intelligence tests to several ends: By 1914, Henry Herbert Goddard had used Binet’s ideas to help draft the first special education legislation in the nation. He had challenged school grading policies in New Jersey and diagnostic practices in New York. He had inspired hundreds of teachers from all parts of the country while also training them to administer intelligence tests. And he had designed the curricula for several universities that would train hundreds more. By promoting testing in public schools, Goddard had begun to institutionalize a new role for psychologists as diagnosticians of the normal and the subnormal—a role whose repercussions would be felt for the remainder of the century.53 Goddard’s positive use of intelligence testing, however, was soon overshadowed by his harmful uses. The late 19th and early 20th centuries experienced a dramatic increase in the number of immigrants coming to the United States, particularly from southern and eastern Europe. As immigration increased, many public leaders grew concerned that the “quality” of people coming into the country was declining. In turn, it was this decline in “immigrant quality” that was blamed for the increasing social challenges that occurred—a change that was actually due more to industrialization and rapid urban growth than the quality of immigrants. Nonetheless, to reduce perceived stress on the U.S. social system, several immigration regulations were introduced. One restriction focused specifically on the mental quality of immigrants.
As an example, the 1882 Immigration Act prohibited the entry of persons who were a “convict, lunatic, idiot, or any person unable to take care of himself or herself without becoming a charge.”54 This requirement introduced a notable challenge for immigration officials: how to determine whether a person qualifies as an “idiot.” Prior to 1910, the standard practice at Ellis Island, the largest entry point for immigrants in the United States at the time, was for medical doctors to observe immigrants as they passed through customs and visually identify those who appeared to be of low cognitive functioning. During the first decade of the 1900s, the effectiveness of these efforts was called into question as political leaders argued that the declining quality of U.S. citizenship was due to the continued influx of immigrants of low mental functioning and other social disorders. In 1912, Henry Goddard entered this debate.

Initially, Goddard observed, “Since we have begun to recognize the appallingly large number of mental defectives among us, it is but natural that many people should conclude that these defectives are foreigners and even immigrants.”55 Goddard’s analysis of his schoolchildren data, however, indicated that a relatively small number of students whose test classification indicated low levels of intelligence were in fact immigrants. To Goddard, this initial finding suggested the problem was not one of immigration. Nonetheless, Goddard engaged in a series of studies in which he brought his field research staff, many of whom were instrumental in classifying the intelligence levels of the Kallikak family members based on observation alone, to Ellis Island. During his initial studies, Goddard focused on the accuracy of classifications made by doctors at Ellis Island. For this, he ran experiments in which the doctors applied their standard methods to identify people believed to function as “idiots” while his staff similarly applied their methods. People identified as functioning with low intelligence were then brought to a room and subjected to a modified version of the Binet-Simon test, with modifications that aimed to limit verbal communication and replaced some content with material thought to be more familiar to the immigrants tested. Goddard’s findings indicated that the Ellis Island procedures were highly inaccurate and failed to identify a substantial number of immigrants functioning with low intelligence.
Further, his studies suggested that his field workers were more accurate in their classifications, such that a much higher percentage of immigrants identified through observation had test results confirming the diagnosis.56 Goddard continued with his study of immigrant intelligence for the next four years, during which time his views shifted notably regarding the extent to which immigration contributed to the concerning levels of “a low grade of intelligence” among immigrants. As an article published in the Journal of Heredity in 1917 summarizing Goddard’s findings details, Each test taken by itself seems to indicate a very high percentage of feeblemindedness … With the Binet scale, only two of the 148 immigrants scored as high as twelve years, which is usually taken as the dividing line between feeblemindedness and normal intelligence in adults … it yet appears that more than half of these immigrants test feebleminded. The article does raise a question about the reasonableness of these findings but defers to science, stating “we know it is never wise to discard a scientific result because of its apparent absurdity.” To explain these findings, racism is relied upon: “it should be noted that the immigration of recent years is no longer representative of the respective races … we are now getting the poorest of each race.”57

To be clear, Goddard did not generalize findings from his nonrepresentative samples at Ellis Island to the broader populations of the various “European races.” Nor did Goddard use these findings to advocate for further immigration restrictions. But many people with political power who became aware of Goddard’s findings did.58 As a result, Goddard’s work was instrumental in shaping the Emergency Immigration Act of 1921.

Intelligence Testing and the Army Alpha

Alfred Binet introduced the first modern test of intelligence in 1905. Shortly thereafter, Henry Goddard imported Binet’s test to the United States and demonstrated its utility for identifying the level of cognitive challenge experienced by children.59 Over the next decade, Lewis Terman, Arthur Otis, and Robert Yerkes expanded the use of intelligence tests to the general population of students and adults, creating a craze for the use of tests of mental ability for a wide range of purposes. Shortly after Goddard began using the Binet-Simon test with children attending normal schools, two issues became apparent. First, as a direct translation of the French version, some content on the test did not resonate with children living in the United States. Second, designed to be administered individually to each child, the Binet-Simon was time-consuming and impractical for large groups of students. Recognizing that some of the items in the Binet-Simon did not translate well to an American setting, American psychologists modified items on the test to better align them with the dominant U.S. culture.60 The most impactful of these refinements was that made by Lewis Terman, whose version of the test became known as the Stanford-Binet Intelligence Scales.61 Like Goddard, Terman studied under Hall at Clark University, where Terman first began exploring intelligence testing.
Aware of the failure of Cattell’s sensory-motor approach to measuring intelligence, Terman experimented with tasks that tapped children’s creativity, memory, mathematical ability, and language skills to differentiate among students—an approach similar to that which Binet was developing contemporaneously. In his initial experiments with the 1908 version of the Binet-Simon, Terman noted that the tests seemed too easy for younger age groups and too difficult for older children. To address this misalignment, Terman experimented with items that targeted the ability to generalize from one context to another and the ability to make practical judgments.62 These content modifications expanded the length of the test at each age level and improved the quality of information provided about a child’s cognitive functioning. To inform content modifications, Terman engaged in an extensive review of literature on the testing of mental traits. Through this review, Terman became familiar with translations made by Guy Whipple, a professor of

psychology at Carnegie University also interested in mental measurement. Of particular interest was Whipple’s translation of the work of William Stern, a German psychologist who introduced the idea of calculating the ratio between a child’s chronological age and their mental age to produce a mental quotient. Terman adopted this method to calculate what he called an “intelligence quotient,” or IQ. Henry Minton, a biographer of Terman, observes that Terman’s introduction of the IQ marked an interesting shift in his conception of intelligence. Whereas Terman’s work as a doctoral student treated intelligence as multifaceted, the IQ represented intelligence as a single unitary trait. This conception of intelligence as a unitary trait conveniently fit a hereditary conception of intelligence that undergirded the eugenic effort to engineer procreation by “intelligent” couples and curb that of “degenerates.” As Minton observes, “it seemed far easier to trace the genetic passage of a single trait than of multitraits”—a notion that Terman embraced.63 As a professor at Stanford University, Terman oversaw the study of several students who would later make important contributions to the field of educational measurement, including Arthur Otis. It was Otis’s contribution that addressed the inefficiency resulting from administering the test individually in a one-on-one setting—a procedure that typically required an hour to complete. To increase efficiency, Otis experimented with developing a selected-response version of the test. The selected-response item had recently been introduced by Frederick Kelly, a doctoral student working under Edward Thorndike at Teachers College at Columbia University. In his research, Kelly noted that teachers were spending an increasing amount of their time scoring written tests.
He also noted a growing concern regarding the subjectivity in how teachers marked written tests.64 As a remedy to these problems, Kelly’s doctoral dissertation proposed the idea of standardizing the tests with predetermined answers.65 After earning his degree in 1915, Kelly accepted a position at the University of Kansas and developed the Kansas Silent Reading Test, which was composed of multiple-choice questions. The Kansas Silent Reading Test marked the first timed multiple-choice test.66 Otis explored the use of Kelly’s new item type to adapt Terman’s Stanford-Binet so that it could be administered more efficiently in a group setting.67 To do so, Otis converted items to a multiple-choice format and then developed a stencil to score responses.68 Shortly after Terman introduced the Stanford-Binet Scale and Otis began working on a group-administered version of the scale, the United States entered World War I. At the time, the field of psychology was struggling to establish itself as an accepted science. Recognizing the war as an opportunity to demonstrate the scientific nature of psychology and establish its benefit to the nation, Robert Yerkes proposed to the U.S. Army that a team of

psychologists be convened to develop a psychological test that the Army could use to classify recruits more efficiently and inform their assignment to positions within the military.69 As a professor of psychology at Harvard University, Yerkes was also working to improve the methods used to measure intelligence and had introduced the point scale. Rather than grouping items by the age level and then determining mental age based on patterns of response for items associated with a given age level, Yerkes ordered all items by difficulty and awarded a point for each item responded to correctly. The total number of points earned was divided by the average score for children of the same age as the child. This ratio of earned score to average score was then used to classify a child’s mental functioning.70 Despite skepticism among some military leaders, Yerkes’s proposal was approved, and a group of psychologists who had been experimenting with various approaches to mental measurement met at Goddard’s Vineland School to launch the development of what became known as the Army Alpha.71 Among the team members were Yerkes, Goddard, Whipple, Terman, and Walter Bingham. The committee agreed that its work was to design a test for the purposes of “classification of recruits on the basis of intellectual ability, with special reference to the elimination of the unfit and the identification of exceptionally superior ability.”72 Given the need to test large numbers of Army recruits, Terman advocated for the use of Otis’s multiple-choice version of the Stanford-Binet Scale. Yerkes similarly advocated for his point system. The team members offered additional tasks that complemented Otis’s multiple-choice version of the Binet-Simon items. Working intensely over a two-week period, the team emerged with a first version of what they termed the Army Alpha.
As Yerkes’s 1921 report documenting the Army Alpha attests, the tasks selected for inclusion on the test battery were efficient both in terms of administration time and scoring.73 The initial version was subjected to two weeks of field testing before the team reconvened to make revisions to what became the final version of the test. Recognizing that some recruits would be unable to read English, the team also developed a nonverbal form of the test, termed the Army Beta, that was administered verbally. Several concerns have since been raised about bias produced by many of the items and the validity of classifications that resulted from the test’s results.74 Many of the items required familiarity with the dominant U.S. culture. As an example, one question asked who authored Robinson Crusoe. Others relied on knowledge of “Velvet Joe” (a character in a tobacco advertisement), the food company Nabisco, the location in which the Overland car is produced, baseball batting averages, and knowledge of tennis and military commanders from past U.S. wars. Concerns were also levied against the conditions under which many of the tests were administered. Whereas

the team of developers envisioned the test administered to recruits seated in rows neatly arranged within a barracks building, many of the tests were given in “unfurnished rooms in cramped barracks with inadequate acoustics and lighting … [such that] men sitting in the rear could not hear clearly enough to follow the instructions.”75 Conditions such as these may account for an unusually large number of tests that contained blank scores for one or more subsections. Although problematic administrative conditions were noted in reports summarizing the Army testing program, these issues were ignored when the test scores were later analyzed. These analyses led to several troubling interpretations. Looking across all test scores, Terman, Yerkes, and others who analyzed test results following the end of the war noted with concern the low average score of Army recruits. As Terman reported, the average mental age of recruits membered White was only 13.08—a score that suggested a large percentage of adults in the nation were functioning at or below the score used to identify the “feeble-minded.” Rather than questioning the validity of the scores, Terman concluded, A moron has been defined as anyone with a mental age from 7 to 12 years … almost half of the white draft (47.3) would have been morons. Thus it appears that feeble-mindedness, as at present defined, is of much greater frequency of occurrence than had been originally supposed.76 Scores for other subgroups were even lower, particularly for people membered Black and those with ties to southern Europe. Terman and Yerkes also observed a striking correlation between years of schooling recruits experienced and test scores—a relationship that might suggest the test was, at least in part, a measure of academic achievement rather than native intelligence.
However, Terman resisted this interpretation, instead arguing that higher native intelligence is what caused people to opt to stay in school longer.77 Despite concerns about the cultural bias of the test items, inconsistent and often poor administrative conditions, and the interpretations made of resulting test scores, the Army Alpha was administered to 1.7 million recruits, exposing the nation to large-scale standardized testing. The Army Alpha also demonstrated the efficiency with which people could be tested and sorted based on mental ability.78 Finally, as explored in greater detail in Chapter 7, the Army testing programs blazed the path that led to the rapid adoption of tests of mental ability in schools across the nation.

Impact of Intelligence Testing on Educational Measurement

In her book Measuring Minds, Zenderland observes that “while the intelligence testers may not have transformed the military, the military surely

transformed intelligence testing.”79 Given the need to test more than a million recruits, adapting the Binet-Simon, the Stanford-Binet, and other tests of intelligence demanded an increase in efficiency. This demand forced the adoption of group administration methods. In turn, group administration led to the use of selected-response items. Although the quality of implementation was inconsistent, group administration in a military setting also spurred the development of test administration protocols that established strict guidelines for the arrangement of working spaces, impersonal instructions, and rigid timing regulations. The quick decisions regarding assignment of recruits similarly demanded that all test scores be interpreted in the same manner—a demand that led test developers and test users to ignore the influence culture and educational opportunity have on test performance. Instead of developing different norms based on immigrant or educational status, all scores were considered a measure of the same unitary trait and placed under a single normal distribution (aka bell curve). Interpreting test scores as the product of a unitary mental trait further bolstered the view that intelligence was hereditary. Whereas initial uses of tests of mental ability focused on identifying people in need of special educational services, the Army Alpha shifted the use of tests to identifying recruits with superior intelligence—people whose superior intelligence warranted leadership positions. As Zenderland observes, “Even Yerkes admitted this change in objective with some surprise … for its new goal was to classify everyone.”80 Soon after the end of the war, the measure of mental ability through standardized testing procedures quickly moved far beyond the military, rapidly penetrating schools and industry as a tool for classification.
The adoption of selected-response items, scoring stencils, group administration, and standardized administration conditions clearly influenced educational measurement as it operates today. Despite efforts to introduce performance assessments in the 1990s, technology-enhanced items in the 2000s, and digitally based simulation tasks more recently, the selected-response item remains the most frequently employed item type on educational tests, as well as on most other tests. Although scoring stencils have been replaced by scanning machines and more recently by computer-based processing of digitally recorded responses, the efficiency and automation of scoring similarly persists. In fact, one might argue that recent efforts to apply artificial intelligence and other text analysis techniques to score written responses reflect modern applications of more sophisticated scoring stencils. Group administration under standardized conditions is similarly seen in nearly all major testing programs. Even those that now allow remote administration in the privacy of one’s home establish strict rules about the room in which a test is taken and the orientation of cameras used to monitor the test taker. The point system introduced by Yerkes and the method of arranging

items by difficulty rather than “grade level” or other content characteristics is also evident in today’s scoring methods and test designs. One might argue that adaptive testing techniques have elevated Terman’s and Otis’s arrangements of items by difficulty to an advanced level. And despite increasing concerns about cultural sensitivity and responsiveness and other forms of bias in test scores, the vast majority of scores continue to be treated as unitary measures that fit under a single bell curve.

The Influence of the White Racial Frame

The work of the intelligence testers was deeply influenced by the White Racial Frame as it operated during the 19th and early 20th centuries. Whether termed genius, mental faculties, organs, intellect, or feeblemindedness, intelligence was understood as an innate individual trait. Although some, like Goddard, believed education and environment allowed one to develop their intelligence to its fullest potential, intelligence itself was understood by most investigators as fixed. The innate, fixed, individual nature of intelligence allowed the craniometrists, phrenologists, and, in particular, Samuel Morton to directly link levels of intelligence to specific features of the skull. Similarly, the psychophysicists and the intelligence testers approached intelligence as the direct product of mental processes, the level of which remained stable over time. For developers of intelligence tests during the late 19th and early 20th centuries, intelligence was believed to drive social behavior. For Galton and Goddard, the connection between intelligence and behavior is evident in their family studies—Hereditary Genius and the Kallikaks, respectively. In the former, high levels of intelligence were what led men to positions of high esteem. In the latter, feeblemindedness was the cause of degeneracy.
Terman similarly believed it was intelligence that motivated people to either continue to pursue schooling or to drop out entirely. Understanding intelligence as an innate trait also facilitated hereditary notions. As Galton and Goddard’s family studies attempted to document, levels of intelligence ran in families and were passed down through generations. Ignoring the gross disparities in wealth, health, and access to opportunities for learning that existed among families in both English and U.S. society, this hereditarian conception allowed the intelligence test developers to support, and in several cases advocate for, eugenic policies. Specific to the development of mental tests, this belief in the hereditary nature of intelligence allowed these pioneers of mental measurement to ignore the potential influences that schooling and other social factors had on their measures. Conceiving intelligence, or what is now termed academic achievement, as an individual trait remains foundational in the use of tests, such as the SAT and ACT, to inform college admission decisions that largely ignore the role that opportunity to learn and other social factors have on test performance.

The white supremacist thinking with which these men operated similarly enabled them to ignore questions about the validity of their measures. Despite critics—Walter Lippmann being perhaps the loudest—who raised questions about bias and the infeasibility of results, the intelligence test developers’ belief in the superiority of the White race and the social elite allowed them to ignore the strong correlation found between test scores and education. These beliefs similarly positioned them to unquestioningly accept large differences in test performance between racialized groups. Prior conceptions of the intellectual inferiority of the people membered Black conditioned the intelligence testers to simply accept large differences in scores without questioning similar differences in schooling and access to social supports. In fact, believing that level of intelligence caused years of schooling, Terman pointed to differences in schooling as evidence that supports differences in scores between people membered Black or White. Analyses of the Army Alpha suggested that nearly half of the White race was functioning at or below the cut-point for feeblemindedness—a finding that should surely have raised skepticism about the validity of the scores. Yet, the racialized understanding of White Europeans that operated at the time enabled intelligence test developers to accept these low scores as reasonable given that the “lower races” of Europe were disproportionately among those scoring below the cut-point for feeblemindedness. From Camper to Morton, Galton, Goddard, Terman, and Yerkes, pre-conceptions of racial and class superiority preordained their interpretation of differences as reasonable and natural reflections in their measures. From the early explorations in craniometry to the Army Alpha, efforts to study intelligence were inspired by a positivist endeavor to quantify.
Analyses of skulls—whether Camper’s facial angles, Blumenbach’s norma verticalis, or Morton’s cranial capacity—were efforts to quantify mental processes. Although Galton’s initial analysis of genius did not provide a direct measure of mental functioning, his methods nonetheless relied on quantifying the occurrence of eminence within families. Like psychophysics, Galton’s later efforts endeavored to assign quantity to mental abilities through external actions and reactions. The methods of assigning number to mental functions were advanced first by Binet, who, despite his emphasis on classification, introduced the notion of mental age derived by performance on test items. Building on Stern, Terman advanced quantification of intelligence through his creation of the intelligence quotient. Yerkes’ point-scale system similarly advanced methods for representing mental ability as a quantity. Common across this century-long inquiry into mental functions was an effort to develop a method of quantification that reflects differences among levels of intelligence. As Chapter 7 explores next, these same components of the White Racial Frame combine with individual merit to influence the next major development in educational measurement—college admission testing.

Notes

1 Galton (1890), p. 380.
2 Princiotta and Goldstein (2015).
3 Troilus and Cressida, Act 5, Scene 1; Twelfth Night, Act 1, Scene 5.
4 Linnaeus (1758).
5 Fausto-Sterling (1995).
6 Gall employed the term organologie rather than phrenology. Phrenology, however, is the term that is most commonly associated with his techniques.
7 Finger and Eling (2019).
8 Finger and Eling (2019), p. ix.
9 Painter (2010, p. 191) employs the phrase “‘American School’ of anthropology,” while Fredrickson (1971, p. 74) uses the phrase “new scientific ethnology,” and throughout his work Gould (1996) uses the phrase “scientific racism.”
10 Penn Museum (2020).
11 Painter (2010).
12 Painter (2010) states that at his death, Morton had 918 human skulls in his possession with another 51 in transit (p. 191).
13 Morton (1839), p. 253. In a footnote on this same page, Morton notes that “White pepper seed was selected on account of its spherical form, its hardness, and the equal size of the grains. It was also sifted to render the equality still greater.” In his last study, Morton substituted lead shot for pepper seed because he found inconsistencies when measuring cranial capacity with seed (Morton, 1849).
14 Morton (1839), see p. 260.
15 Gould (1978), p. 509.
16 Painter (2010); Gould (1996).
17 Fechner (1860/1966), p. 7.
18 Briggs (2022), pp. 36–37.
19 Briggs (2022), p. 57.
20 Jensen (2002), p. 148.
21 Galton (1884a), p. 6.
22 Galton (1884a), pp. 9–10.
23 Galton (1884b).
24 Galton (1884a), pp. 3–4.
25 Galton (1883), p. 58.
26 Galton (1883), p. 61.
27 Galton (1883), p. 76.
28 Galton (1883), pp. 20–21.
29 Galton (1883), pp. 19–20.
30 Galton (1883), p. 70.
31 Galton (1883), p. 20.
32 Galton (1883), p. 60.
33 Pearson (1914); Wolf (1973); Zenderland (1998/2001).
34 Cattell (1890/1948), p. 347.
35 Cattell (1890/1948).
36 Binet and Henri (1896).
37 Binet & Henri quoted in Wolf (1973), p. 145, italics in the original.
38 Binet quoted in Zenderland (1998/2001), p. 95.
39 See Wolf (1973), pp. 146–158, for a description of many of Binet’s efforts.
40 Binet quoted in Wolf (1973), p. 149.

41 Wolf (1973), p. 151, italics in the original.
42 Binet quoted in Zenderland (1998/2001), p. 96.
43 Wolf (1973), p. 159.
44 Wolf (1973), p. 160.
45 Although Binet described his procedure as one of classification rather than measurement (see note 41), Binet applied the term measurement scale in reference to his testing procedure. Further, by classifying a child’s performance with respect to mental years, Binet’s use of a measure of time (i.e., years) implied that the resulting classifications placed children on a scale representing a continuum of time. Depending on whether one gives greater weight to Binet’s assertion that his effort was one of classification rather than measurement, or to Binet’s use of a temporal scale, Binet’s work can be interpreted as producing a classification scheme or a rudimentary measurement scale. See Briggs (2022), pp. 164–167, for a deeper exploration of this tension in Binet’s terminology and writings.
46 Binet quoted in Wolf (1973), p. 184.
47 Wolf (1973); Zenderland (1998/2001).
48 Goddard quoted in Zenderland (1998/2001), p. 92.
49 Goddard quoted in Zenderland (1998/2001), p. 92.
50 Zenderland (1998/2001), p. 93.
51 Goddard quoted in Zenderland (1998/2001), p. 98.
52 Zenderland (1998/2001), p. 102.
53 Zenderland (1998/2001), p. 142.
54 See Section 2 of the 1882 Immigration Act.
55 Goddard (1912), p. 91.
56 See Gould (1996) and Zenderland (1998/2001) for detailed descriptions of Goddard’s studies.
57 All quotes taken from N.A. (1917), p. 554.
58 In addition to his research on Ellis Island, Goddard studied immigrants residing in New York City. These studies indicated that, despite their lower intelligence, most of the immigrants were able to survive without public assistance. Goddard attributed this success to the hard efforts these immigrants made to overcome their mental challenges.
59 Following the publication of the 1905 version, two modified versions were published in 1908 and 1911. Goddard imported the 1908 version to the United States.
60 Gallagher (2003); Gregory (1992).
61 Terman (1916).
62 Minton (1988).
63 Minton (1988), p. 50.
64 Kelly (1915).
65 Davidson (2011).
66 Kamenetz (2015).
67 Davidson (2011); Gallagher (2003); Kamenetz (2015).
68 Zenderland (1998/2001).
69 Carson (1993); Monahan (1998).
70 Minton (1988).
71 Zenderland (1998/2001).
72 Yerkes (1921), p. 299.
73 Boake (2002); Yerkes (1921).
74 Gould (1996).
75 Minton (1988), p. 69.

76 See Yerkes (1921), p. 789. This section of the report was authored by Terman.
77 See Yerkes (1921), p. 783. This section of the report was authored by Terman.
78 Reed (1987).
79 Zenderland (1998/2001), p. 290.
80 Zenderland (1998/2001), p. 292.

References

Binet, A. & Henri, V. (1896). La psychologie individuelle. L’Année psychologique, 114(1), 5–60.
Boake, C. (2002). From the Binet-Simon to the Wechsler-Bellevue: Tracing the history of intelligence testing. Journal of Clinical and Experimental Neuropsychology, 24(3), 383–405.
Briggs, D.C. (2022). Historical and Conceptual Foundations of Measurement in the Human Sciences: Credos and Controversies. Routledge.
Carson, J. (1993). Army Alpha, Army brass, and the search for Army intelligence. Isis, 84(2), 278–309.
Cattell, J.M. (1890/1948). Mental tests and measurements. In Readings in the History of Psychology. Appleton-Century-Crofts.
Davidson, C.N. (2011). Now You See It. Penguin Books.
Fausto-Sterling, A. (1995). Gender, race, and nation: The comparative anatomy of “Hottentot” women in Europe, 1815–1817. In Deviant Bodies: Critical Perspectives on Difference in Science and Popular Culture. Indiana University Press.
Fechner, G.T. (1860/1966). Elements of Psychophysics (Vol. 1). Translated by D.H. Howes. Holt, Rinehart and Winston.
Finger, S. & Eling, P. (2019). Franz Joseph Gall: Naturalist of the Mind, Visionary of the Brain. Oxford University Press.
Fredrickson, G.M. (1971/1987). The Black Image in the White Mind. Wesleyan University Press.
Gallagher, C.J. (2003). Reconciling a tradition of testing with a new learning paradigm. Educational Psychology Review, 15(1), 83–99.
Galton, F. (1883). Inquiries into Human Faculty and Its Development. Macmillan.
Galton, F. (1884a). Anthropometric Laboratory. William Clowes and Sons, Limited.
Galton, F. (1884b). An anthropometric laboratory. Science, 5(114), 294–295.
Galton, F. (1890). Remarks. Mind, 15(59), 380–381.
Gould, S.J. (1978). Morton’s ranking of races by cranial capacity: Unconscious manipulation of data may be a scientific norm. Science, 200(4341), 503–509.
Gould, S.J. (1996). The Mismeasure of Man. WW Norton & Company.
Gregory, R.J. (1992). Psychological Testing: History, Principles, and Applications. Allyn & Bacon.
Jensen, A.R. (2002). Galton’s legacy to research on intelligence. Journal of Biosocial Science, 34, 145–172.
Kamenetz, A. (2015). The Test. Public Affairs.
Kelly, F.J. (1915). The Kansas Silent Reading Test. Studies by the Bureau of Educational Measurement and Standards. No. 3, 1–38.
Linnaeus, C. (1758). Systema Naturae. Holmiae.
Minton, H.L. (1988). Lewis M. Terman: Pioneer in Psychological Testing. New York University Press.
Monahan, T. (1998). The Rise of Standardized Educational Testing in the U.S.: A Bibliographic Overview. Rensselaer Polytechnic Institute.
Morton, S.G. (1839). Crania Americana; or, a Comparative View of the Skulls of Various Aboriginal Nations of North and South America: To Which Is Prefixed an Essay on the Varieties of the Human Species. J. Dobson.
Morton, S.G. (1849). Catalogue of Skulls of Man and the Inferior Animals, in the Collection of Samuel George Morton. Merrihew & Thompson, printers.
N.A. (1917). Intelligence of immigrants: Dr. H. H. Goddard finds indications that large part of those who arrive in the steerage are feebleminded—Low grade of intelligence may possibly not be hereditary in this case. Journal of Heredity, 8(12), 554–556.
Painter, N.I. (2010). The History of White People. WW Norton & Company.
Pearson, K. (1914). The Life, Letters and Labours of Francis Galton. Cambridge University Press.
Penn Museum. (2020). A History of Craniology in Race Science and Physical Anthropology. https://www.penn.museum/sites/morton/craniology.php
Princiotta, D. & Goldstein, S. (2015). Intelligence as a conceptual construct: The philosophy of Plato and Pascal. In Handbook of Intelligence. Springer.
Reed, J. (1987). Robert M. Yerkes and the mental testing movement. In Psychological Testing and American Society 1890–1930. Rutgers University Press.
Terman, L.M. (1916). The Measurement of Intelligence: An Explanation of and a Complete Guide for the Use of the Stanford Revision and Extension of the Binet-Simon Intelligence Scale. Houghton Mifflin.
Wolf, T.H. (1973). Alfred Binet. The University of Chicago Press.
Yerkes, R.M. (1921). Psychological Examining in the United States Army. US Government Printing Office.
Zenderland, L. (1998/2001). Measuring Minds: Henry Herbert Goddard and the Origins of American Intelligence Testing. Cambridge University Press.

7

The Rise of Educational Testing and Test Bias

Whatever exists at all exists in some amount. To know it thoroughly involves knowing its quantity as well as its quality.1

The history of educational testing dates to at least the 12th century, when the University of Paris and the University of Bologna introduced formal examinations of their students. These first educational tests took the form of oral theological disputations that assessed students’ knowledge of essential texts of the time, including Thomas Aquinas’s Summa Theologiae, Peter Lombard’s Sententiae, and Petrus Comestor’s Historia Scholastica. Like the catechisms mastered by those entering the priesthood, these first educational tests employed questions known in advance for which students recited memorized responses. In 1927, documents produced during the 1230s were discovered in which answers to questions asked during these exams were recorded—a prelude to the booming test preparation industry that operates today.2 For several centuries, tests similar to those introduced in Paris and Bologna were used to determine students’ readiness for scholarly work. In the late 1700s, Cambridge University modified the use of oral examination from assessing an individual’s readiness to ranking students based on their test performance. To support the ranking of students, judgments of student performance shifted from qualitative summaries to quantitative scores. This shift marked the first known use of quantity to summarize cognitive performance in an educational setting, setting a path that gradually led to the sophisticated quantitative psychometric modeling employed by test developers today. As in Europe, teachers in the United States relied on oral exams to assess student knowledge during the 18th and 19th centuries.3 As industrialization swelled urban populations, student enrollment in public schools increased

DOI: 10.4324/9781003228141-10

rapidly, raising concerns that the quality of instruction varied between schools. To examine this issue, in 1845 Horace Mann directed schools across the city of Boston to administer common tests.4 Mann believed written exams would “provide objective information about the quality of teaching and learning in urban schools, monitor the quality of instruction, and compare schools and teachers within each school.”5 As it turns out, Mann’s standardized testing program did detect differences in performance among Boston’s schools. And, over the next 50 years, his testing program inspired the development of similar common written exams in school systems across the United States.6 As the previous chapter explored, Goddard’s translation of the Binet-Simon test, Terman’s enhancement, and Otis’s development of a selected-response version expanded the focus of testing in schools from the assessment of student achievement to the measurement of intelligence. Prior to World War I, use of intelligence tests in U.S. schools was limited to two purposes: identifying students to be placed in training schools for students with special needs, and conducting research on the intelligence of the general population of students. Following the end of the war, the use of intelligence and other forms of standardized tests changed dramatically in U.S. schools. Although it is unclear how extensively military leaders made use of test scores to classify recruits, the Army Alpha exposed nearly two million adults to standardized testing. For Lewis Terman, Robert Yerkes, Walter Bingham, and others who were involved in the Army Alpha testing program, the efficiency with which measures of mental ability were collected opened eyes to the possibility of applying similar methods in educational settings. Moreover, the bombardment of requests from school leaders to administer the Army Alpha to their students created demand for group-administered standardized tests.
Shortly after the war ended, Terman and Yerkes received funding to form a committee within the National Research Council, titled the Committee on Intelligence Tests for Elementary Schools, that set to work converting the Army Alpha into the National Intelligence Test. As part of the development process, the committee developed norms used to compare and classify test takers. The committee recognized that rural children would likely score lower; yet students from rural regions were excluded from the norming process due to the financial costs required to collect representative samples from sparsely populated regions of the country. The committee also recognized that the performance of students membered Black would likely be lower but feared their inclusion would create political friction, a fear elevated by racialized riots that had recently occurred in Washington, D.C. Instead of forming a representative sample with respect to geographic locale and racialized representation, the committee relied on schools in large urban areas populated by students membered White.

As Henry Minton, a biographer of Lewis Terman, explains:

White, urban, middle-class children were to set the standard of test performance (the same standard Terman had set for his Stanford-Binet). However, this was not viewed as problematic because the committee had made it clear that it was developing methods for testing “native intellectual ability.” Consequently, if poor or black children scored lower on the tests it reflected their lower native intelligence, rather than any disadvantage resulting from a comparison with a more culturally advantaged population.7

It is interesting to note that within their norm group, female students scored, on average, at a mental age six months higher than male students. Yet, “unlike race differences, no conclusions were drawn about innate group differences in the case of sex.”8 Despite the unrepresentative nature of the norms, in 1920 World Book Company published their test, and within two months, more than 200,000 copies were purchased by school systems across the nation. Given this initial success, Terman began work on a similar test for high school students. Within a few years of its completion, more than half a million copies were being sold to schools annually. In 1921, Terman shifted his test development work from the measure of intelligence to the measure of educational achievement. Whereas Terman’s intelligence test was intended to measure an innate trait, the educational achievement test was intended to measure the outcome of student learning in school. Working with colleagues at Stanford University, Terman again partnered with World Book to publish the Stanford Achievement Test.
Like the elementary and high school intelligence tests, Terman’s achievement test rapidly gained traction as a tool for measuring the outcome of student learning in school, selling nearly 1.5 million copies a year by 1925.9 By the early 1930s, 75% of the largest urban school systems across the United States were using intelligence tests to route students into different educational tracks based on their measured mental ability.10 Although the tracking of students was deeply criticized several decades later, its practice was advocated by Terman.11 As Gerard Giordano describes in his history of educational testing:

Terman (1924) advised educators that they should sort children on the basis of their test scores. He explained that this procedure was “absolutely essential if the public school is to be made a real instrument of democracy” … He added that it was as “unjustifiable and dangerous for the educator to prescribe the same educational treatment for all as it would be for a physician to prescribe the same medical treatment for all.”12

Belief in the scientific nature of scores provided by these tests and their utility for shaping the educational experiences of schoolchildren is similarly reflected in the comments of Charles Hubbard Judd, then director of the University of Chicago’s School of Education:

We all understand now in definite scientific terms that children are different from one another … that the best we can hope for is improvement— not absolute achievement of ideals. With the theoretical ideal of perfection overthrown, there is now an opportunity to set up rational demands. We can venture to tell parents with assurance that their children in the fifth grade are as good as the average if they misspell fifty percent of a certain list of words. We know this just as well as we know that a certain automobile engine cannot draw a ton of weight up a certain hill. No one has a right to make unscientific demand of the automobile or of the school.13

During the late 1920s, Ben Wood, a former student of Edward Thorndike and director of Columbia University’s Collegiate Research Bureau, expanded the use of tests from informing decisions made in individual schools to helping states monitor their educational systems. Wood’s first foray into the study of state educational systems began in 1928 when the Carnegie Foundation provided funding for a six-year study of educational conditions in Pennsylvania. As part of this study, Wood and his team at Columbia administered tests composed of selected-response items to roughly 20,000 students across the state. At about the same time, Wood worked with the state of New York to convert its Regents exam to a multiple-choice format. Although the multiple-choice item allowed Wood and his team to administer a larger number of items in a shorter period of time, scoring such a large number of response sheets created delays in their analyses.
To address this inefficiency, Wood reached out to both Reynold Johnson, a high school science teacher who had developed a rudimentary electrical test scoring machine, and Thomas Watson, then president of International Business Machines (IBM), to facilitate the development of the first automated scoring machine. A modified version of Johnson’s automated scoring machine was first used at scale to score the New York Regents exam in 1936. That same year, the machine also rescued Connecticut’s testing program, which had exhausted its budget for printing test booklets. By printing a limited number of test booklets that were shared among students who recorded responses in separate scannable scoring sheets, the scoring machine immediately proved useful in reducing cost while increasing the accuracy and speed of objective scoring. Wood’s work, both in demonstrating the utility of state testing programs and in facilitating the development of automated scoring processes, had impacts on educational testing that persist today.14

Reflecting on the rapid rise of standardized testing in public schools during the 1920s, Ellwood Cubberley, a professor and dean of the Stanford Graduate School of Education, extolled the scientific nature and utility of standardized tests:

the test and measurement movement arose, something like a quarter of a century ago, largely as an attempt, on the part of a few students of education, to find a means for transforming guess work as to school progress into procedures having scientific accuracy.15

Together, the widespread use of these intelligence and achievement tests that emerged in the 1920s and blossomed in the 1930s launched the nation on a program of standardized testing.

College Admissions Testing

As the use of intelligence testing in elementary and high schools increased following World War I, interest in applying intelligence testing in higher education similarly emerged. Initially, the use of intelligence tests within institutes of higher education was limited to documenting the intelligence level of students for research purposes. As an example, at Ohio State University, Ellis Noble and George Arps administered the Army Alpha to 5,950 students, finding that just over half of the students scored at the highest levels.16 Hiram Hunter similarly administered the Army Alpha to students at Southern Methodist University, the University of Illinois, and Dickinson College and compared performance across the three institutes. In 1922, Guy Whipple assembled a report summarizing uses of Alpha across 29 institutes. Across all of these administrations, test scores were used primarily for research purposes and had no impact on the students or institutions themselves. Subsequent adoption of intelligence tests to inform college admission decisions, however, did have a direct impact on students seeking entry into institutes of higher education.
The integration of intelligence testing and college admissions occurred through four critical developments, beginning in 1899 with the formation of the College Entrance Examination Board. Michael Schudson, a sociologist and professor of journalism, describes the late 1800s as a period during which the nation engaged in a variety of efforts to standardize its systems. As one example, locally established railroads, many of which employed different gauge railways, adopted a common gauge which required many local railways to replace their nonstandard gauge tracks. High school education was similarly unstandardized. This lack of standardization was compounded by rapid growth in the number of students attending high school—a number that increased from about 80,000 in 1870

to more than 900,000 in 1910. As the number of schools serving this growing population of students expanded, instruction tended to focus more on preparing students for life as a working adult than for advanced study in college. As a result, students entering elite colleges and universities came with varied exposure to concepts and skills.17 Expectations of students entering higher education similarly varied among colleges. To help standardize curriculum in high schools, particularly at the elite private boarding schools feeding students to Ivy League colleges, and to create common expectations for students entering higher education, the College Entrance Examination Board was formed in 1899.18 As Schudson describes:

The College Board was founded to bring order to the chaos of college entrance requirements in the Eastern states. Its creators hoped it would replace the idiosyncratic examinations of its original clientele—established Eastern colleges—with a single set of examinations of its own. Standardization was its aim and early in its history the Carnegie Foundation praised it as “the most effective agency working toward uniformity in administration of entrance requirements.”19

The College Board was also interested in establishing a common standard that could be used to characterize both students and the high schools in which they were educated. By forming a common measure and a standard against which students were compared, the Board hoped to aid elite colleges in differentiating among students based on a common objective measure.
This intent is reflected in comments made by Charles Eliot, then president of Harvard, just prior to a vote among colleges and universities to form the College Entrance Examination Board:

The College Entrance Examination Board, if constituted, is not to admit students to any college, but so to define the subjects of admission that they will be uniform, to conduct examinations in these subjects at uniform times throughout the world, and to issue to those who take the examinations certificates of performance—good, bad, or indifferent.20

Given the opportunities that study at an elite college provided for future social and economic benefits, the College Board aimed to base admission decisions on merit rather than simply the (private boarding) school one attended. It is here that the White Racial Frame’s embrace of individual merit first appears as a factor influencing the development of educational measurement. Merit-based decisions can be viewed from two different angles. On the one hand, meritocratic decision-making can serve to thwart the allocation of

opportunities and resources based on political patronage, cronyism, and nepotism, and instead award those who are deemed to be most meritorious based on clearly defined criteria—in the case of college admissions, test scores served as the objective criteria of merit. On the other hand, meritocratic decision-making can serve to preserve the values, traditions, and practices of the dominant culture. As Schudson explains, as a tool for preservation, meritocracy can be viewed simply as the metaphysic of an industrial social order, and of the dominant social groups in that order. Civil service reform was intended in part to limit, not to promote democracy, to protect government from the people by insulating within it an educated elite … [meritocracy operates as] a movement of upper class, professional, and business groups to control politics for their own ends. “Achievement” in the school system is defined, controlled, and measured in terms of ability to survive within a culture which reflects the behavior and values of the upper, upper middle, and professional [White] classes.21 From this angle, use of test scores to inform merit-based decisions tends to benefit those whose social settings and educational opportunities align with values and knowledge systems held by those controlling the criteria employed to order merit. For the first 20 years, the College Board exams took the form of a series of extended written tests focused on subject areas the elite colleges determined were essential prerequisites for higher educational studies. By making known the topical areas covered by the exam, the College Board hoped high schools would adjust their curriculum to assure coverage of these topics. At the same time, by establishing a common set of questions to which all students seeking entrance responded, the College Board created a mechanism they believed would classify students based on their intellectual merit.
The second stage in the marriage of intelligence and admission testing occurred following the end of World War I. As noted earlier, during the early 1920s, colleges and universities administered the Army Alpha to their students for research purposes. One line of research focused on the correlation between Alpha scores and college grade point average (GPA). In many cases, the Alpha scores were more strongly correlated with GPA than were the current composition-based admission tests employed by these schools. Recognizing the threat these findings posed to their essay-based entrance exam, the College Board responded by developing its own mental measure, which eventually replaced the more arduous and lengthy series of written exams. This new testing program, termed the Scholastic Aptitude Test (SAT), was first administered in 1926 to approximately 8,000 test takers.22

In addition to providing a measure better able to “predict” performance in college, the first schools adopting the SAT were interested in expanding the geographical representation of their student body. By creating a standardized test that could be administered and scored efficiently, these schools anticipated that the test would be administered nationwide, allowing them to select the most meritorious students from across the nation. Over the next 15 years, Carl Brigham, a professor at Princeton University, worked with the College Board to make several improvements to the SAT. Of particular note was Brigham’s effort to separate the SAT into two distinct sections, one focused on verbal aptitude and the other on mathematical aptitude. During this period, the selected-response portion of the SAT continued to be complemented by an essay portion. Despite improvements introduced by Brigham, during the first ten years of the testing program the number of test administrations remained relatively flat, increasing to just under 9,500 in 1936 when the third major development emerged.23 Shortly after James Conant was appointed president of Harvard College in 1933, he launched an initiative to increase the diversity of the school’s student body. At the time, the vast majority of Harvard’s student body was formed by students who attended private boarding schools located in New England. From Conant’s perspective, most of these students were more interested in the social aspects of college than the academic aspects. Through a scholarship program, Conant hoped to diversify the student body by bringing students to Harvard whose interests were more academically focused. Given the high representation of students from New England and New York, Conant was also interested in attracting students from more western regions of the nation.
To this end, he asked Henry Chauncey, then a dean at Harvard, to work with him to develop a scholarship program that accepted and fully funded students based on their academic merit. At the time, Harvard employed a series of composition-based tests that focused on specific academic domains. Recognizing that performance on these exams was heavily influenced by the instruction provided by a test taker’s high school experience, Conant was concerned that use of these essay-based exams would fail to identify students of “natural ability” whose high school instruction might vary based on socioeconomic conditions. Understanding the Scholastic Aptitude Test as a measure of potential rather than a reflection of preparation, Conant and Chauncey adopted it as the tool for informing their scholarship selections. Although Harvard initially administered the SAT to a small number of students and awarded just ten fully funded scholarships per year, over the next few years Conant and Chauncey observed that those students selected using the SAT typically performed at a higher academic level than the general student body. Together, the success of the scholarship program in attracting a more diverse (geographically and perhaps socioeconomically, but not racially)

student body and the academic success of these students provided backing to Conant’s philosophical interest in advancing society by helping to establish regional and national leadership based on merit rather than the elite cronyism that continued to dominate at the time.24 As historian John Carson explains,

mental philosophers and political theorists [similar to Conant] … argued that if the “false” distinctions of wealth or family background or beauty or any of the other accidents of birth could be eliminated, then the “true” ones, those reflecting fundamental aspects of a person’s nature, could come to the fore. Almost all believed that social differences would not disappear; rather, they would be placed on a new footing—merit—and made to seem legitimate expressions of how individuals manifested those abilities.25

It was this interest in furthering the formation of a meritocratic society that inspired Conant and Chauncey to launch the final step in the ascendence of standardized admissions testing. Conant was an ardent advocate for meritocracy, believing that society should be led by an intellectual elite.
Conant’s ideas regarding merit were consistent with, and likely influenced by, Thomas Jefferson’s distinction between an “artificial aristocracy” and a “natural aristocracy.” As a strong advocate for a public educational system designed to support the development of the most intellectually capable within society, Jefferson believed that “worth and genius” should be “sought out from every condition of life, and compleatly [sic] prepared by education for defeating the competition of wealth and birth for public trusts.”26 Reflecting on the leadership that had dominated the United States since its founding and that was still in place in the 1930s, Conant believed the nation remained controlled by “an artificial aristocracy.” To Conant, the SAT provided a unique and powerful tool that could change the course of the nation’s leadership. By identifying students who were most intellectually capable and prioritizing their entry into higher education, Conant envisioned a future in which the intellectual elite—rather than those born into wealth and well-connected circumstances—would ascend to leadership. Using test scores to establish merit aligned with the utilitarian form of justice that dominated at that time—allowing some to advance while improving the efficiency of the selection process employed by colleges and universities, yet disadvantaging those already disadvantaged. With this vision in mind, Conant collaborated with Chauncey to convince other institutes of higher education to adopt the SAT to inform their admission decisions. Their vision, however, did not end with widescale adoption of the SAT; they also envisioned a single organization under which major testing programs would be consolidated and advanced through a focused program of research. As Nicholas Lemann documents in detail in his book,

The Big Test: The Secret History of the American Meritocracy, Conant and Chauncey worked for nearly a decade to convince leaders at colleges, universities, foundations, and test development organizations to embrace their vision—an endeavor that encountered considerable resistance. Those running other testing programs understandably wanted to preserve control over their programs and reap the financial rewards generated by sales of their tests. Meanwhile, several colleges and universities did not see how a test such as the SAT would add value to their admission process. The end of World War II and the subsequent introduction of the GI Bill changed this perspective. Prior to the war, a relatively small percentage of young adults sought higher education. Returning home from war with uncertain futures, hundreds of thousands of soldiers capitalized on the generous funding for higher education provided by the GI Bill, swelling the number of people seeking entrance to colleges and universities across the nation. At about the same time, the Carnegie Foundation, interested in supporting Conant and Chauncey’s vision, worked behind the scenes to gain support for the formation of a new entity focused on the advancement of educational testing. Meanwhile, Conant worked to bring several Ivy League schools into his fold. Together, these events led to increased adoption of the SAT by institutes of higher education across the nation and the formation of the Educational Testing Service—a new organization headed by Chauncey, whose mission focused on the development and advancement of educational testing.
Several scholars and journalists have examined the many ways in which these two developments have impacted the college admissions process, the rise of the test preparation industry, the dominance of the multiple-choice item for measuring mental abilities, and the reproduction of racialized inequalities in our higher educational system and society more broadly.27 Across these accounts, two themes are shared. First, since Conant and Chauncey’s vision was realized, the use of the SAT, and later the ACT, to inform college admissions has increased dramatically such that more than 3.5 million high school students took these tests annually prior to the COVID-19 pandemic.28 Second, use of the SAT and ACT has increased emphasis on intellectual merit as a key factor influencing admission to college. Despite legitimate concerns regarding the role test preparation plays in inflating scores for students from higher socioeconomic households—whose families are better able to afford the financial cost of preparation services—the role test scores play in shaping decisions has increased as the number of applicants to colleges and universities has similarly swelled. Although growth in the number of people accessing higher education since World War II ironically clashes with Conant’s idea of using merit to restrict higher education to the relatively small percentage of the

population whose merit warranted further education, it has increased focus on intellectual merit as a key criterion for entrance.

Bias in Mental Measures

If the group of men who gathered at Vineland in 1917 to develop the Army Alpha were alive today, they would surely relish the many roles that testing of mental abilities plays in our educational system. Yet, despite the success of these pioneers in launching what has matured into a billion-dollar industry, the use of mental measures for educational purposes has garnered warranted criticism. In many cases, this criticism centers on issues of bias. Within the field of educational measurement, bias is employed as a technical term and is narrowly defined as a systematic source of error associated with the test scores for a given subgroup of test takers. For an educational measurement specialist, observing differences in scores among subgroups of test takers is not sufficient evidence of bias. Instead, bias is said to occur when test takers of the same ability receive scores that differ due to error that systematically advantages one subgroup or disadvantages another subgroup. In the common lexicon, bias is understood more generally as a policy, program, tool, or action that unduly favors or disfavors one or more subgroups of people. Charges of bias levied against tests of mental abilities address both technical and commonsense concerns regarding bias. For simplicity, I group the issues raised regarding bias and tests of mental abilities into three general categories of concern: the content of tests, the norming of tests, and the interpretation of test scores.

The Bias of Test Content

Soon after Goddard imported the Binet-Simon and began administering a translated version in the United States, test administrators observed that some items required knowledge of French culture that was unfamiliar to U.S. students. As a result, these items behaved differently from what Binet reported. To remove this source of bias, Goddard revised a small set of items by changing the context in which a problem was set such that the new context was more familiar to U.S. students. Terman’s modified version of the Binet-Simon similarly replaced content to produce contexts better aligned with the dominant U.S. culture. The efforts made by Goddard and Terman to align the content of items with the experiences of test takers are perhaps the earliest example of test developers modifying a test to reduce cultural bias. Yet, when these same men gathered at Vineland to develop the Army Alpha, they seemed unaware of the cultural bias that operated in many items comprising the Alpha.

Reflecting on the development of the Army Alpha and the subsequent infusion of tests of mental ability into education, numerous observers highlight the impact that the content of items has on test scores for test takers whose background differs from those membered into the dominant White middle and upper classes. As just a few examples, one item forming the Army Alpha asked test takers why soldiers wore wrist watches rather than pocket watches—an item that assumed all examinees were familiar with both types of watches. Another item asked about glass insulators on telegraph wires—again assuming all test takers were familiar with telegraph lines. Other items required test takers to unscramble words to create sentences often conveying idioms. Additional items required familiarity with bowling balls, tennis nets, crabs, phonographs, and playing cards. While most of this content was familiar to many test takers, for those relatively new to the United States, those from households of lesser financial means, or those who were raised in rural areas, this content was unfamiliar. Similar concerns about the bias of test content were raised for the intelligence tests administered to students in elementary and high school, as well as those seeking college admission. Perhaps most highly criticized was the vocabulary employed for items testing knowledge of synonyms and antonyms used on the SAT and tests of intelligence. Whereas tests of intelligence claimed to measure an innate, native ability, the more one is exposed to vocabulary through reading and conversation, the more likely one is to be familiar with many of the words employed for these items—a familiarity that is hardly native. Despite efforts to reduce cultural bias in educational tests, research continues to document ways in which the content of items produces bias in test scores.
As just two examples, consider findings from analyses of vocabulary items on the Graduate Record Examination (GRE) and SAT and similar analyses of science items employed by the Massachusetts Comprehensive Assessment System (MCAS), a set of state tests administered annually to public school students across the state. In their analysis of the GRE, Roy Freedle and Irene Kostin, researchers for the Graduate Record Board, compared the performance of test takers membered White and those membered Black on each item forming the test. As is common practice when examining item bias, their analyses grouped test takers by their ability as estimated by their total score. Comparisons between the two subgroups of test takers were then made for each item within each ability group. Freedle and Kostin identified several vocabulary items for which the percentage of test takers responding correctly differed between the two subgroups of test takers. Interestingly, for some items, test takers membered Black responded correctly at lower rates than did test takers membered White. Yet, for other items, the reverse pattern was noted. Upon closer analysis, Freedle and Kostin observed that test

takers membered Black tended to perform better on the more difficult vocabulary items, while test takers membered White performed better on the easier vocabulary items.29 Conducting separate analyses, researchers at ETS found a similar pattern for the SAT—items containing more difficult vocabulary were answered correctly at higher rates by test takers membered Black than by those membered White, while the reverse pattern occurred for easier items.30 These findings were later corroborated by Maria Santelices and Mark Wilson, who used alternate analytic methods to examine whether these patterns were an aberration resulting from the statistical methods employed by the ETS researchers.31

Through a separate line of research, Tracey Noble and her colleagues at TERC32 have similarly found that the cognitive complexity of sentences employed by science test items can negatively impact the performance of students who are categorized as English Language Learners. In their research, items forming the Massachusetts (MCAS) science tests were analyzed to identify items on which English Language Learners performed lower than expected. This subset of items was then examined to identify characteristics that might account for this lower-than-expected performance. Noble and her team identified two linguistic features common to this subset of items.
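Both lines of research rest on the same basic comparison: match test takers on overall performance, then check whether an item's correct-response rate differs between subgroups at the same performance level. The sketch below illustrates only that matching idea. It is not the procedure used in any of the studies cited here, which relied on formal differential item functioning statistics; the function name, subgroup labels, and response data are hypothetical.

```python
from collections import defaultdict


def item_gap_by_ability(records, item_index, reference, focal):
    """Compare one item's correct-response rate for two subgroups within
    each total-score level. Positive values mean the focal group answered
    the item correctly at a higher rate than the reference group."""
    # Tally [number correct, number of test takers] per (total score, subgroup).
    tallies = defaultdict(lambda: [0, 0])
    for group, scores in records:
        total = sum(scores)  # total score stands in for ability
        cell = tallies[(total, group)]
        cell[0] += scores[item_index]
        cell[1] += 1

    gaps = {}
    for total in sorted({t for t, _ in tallies}):
        ref = tallies.get((total, reference))
        foc = tallies.get((total, focal))
        if ref is None or foc is None:
            continue  # no matched comparison is possible at this score level
        gaps[total] = foc[0] / foc[1] - ref[0] / ref[1]
    return gaps


# Hypothetical responses: (subgroup label, item scores with 1 = correct).
records = [
    ("ref", [1, 1, 0]),
    ("ref", [1, 0, 1]),
    ("foc", [0, 1, 1]),
    ("foc", [1, 1, 0]),
    ("ref", [1, 0, 0]),
    ("foc", [0, 0, 1]),
    ("ref", [1, 1, 1]),
]
gaps = item_gap_by_ability(records, 0, "ref", "foc")
print(gaps)
```

A gap that persists among test takers with the same total score is what separates potential item bias, in the technical sense, from a simple difference in average scores between subgroups.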
As they describe, the first feature, which they term forced comparison, typically (a) used Which of the following, (b) named a category of what was sought: ‘Which of the following drawings …,’ (c) asked students for an end of scale value (e.g., best shows or most likely result), and (d) had a verb or noun associated with the end of scale value.33 The second feature, which they term reference back, asked students to return to a previous sentence in the item to identify information that informed their response.34 Despite these relatively recent findings, the content of tests has evolved considerably since the first tests of intelligence were created. In addition, standards and item development guidelines addressing fairness, many of which focus on avoiding cultural bias and linguistic complexity in test content, now exist and are generally adhered to by test developers. Nonetheless, critical analyses recently presented by Jennifer Randall, a professor of psychometrics and test development, describe how current efforts to reduce bias continue to advantage test takers most familiar with the dominant culture. Specifically, Randall argues that efforts to author items that are “context neutral” and guidelines that advise against content that may be sensitive to some test takers end up advantaging students most familiar with the dominant White culture and, in turn, disadvantaging other subgroups of test takers.35 As explored in greater detail in Chapter 13, Randall’s analysis

highlights the challenge the field continues to encounter developing items that function in an unbiased manner for all test takers.

Test Norm Bias

When developing the Binet-Simon test of mental ability, Binet emphasized that he did not intend for his test to function as a measure. Instead, Binet designed the test to serve as a tool for classifying students. Further, Binet’s classification decision was binary—either a student was positioned to learn through standard educational practices, or the student would benefit from special instruction. Binet’s interest was limited to informing these individual decisions, and he showed no interest in using his tests either to describe a given population or to compare people to others forming a population. Shortly after Goddard imported the Binet-Simon into the United States, use of intelligence tests shifted in an important way; rather than separating test takers into one of two categories as Binet had done, the U.S.-based test creators developed interest in examining variation in intelligence among members of a population of students. Accomplishing this task required two important changes to intelligence tests. First, a test needed to provide a measure capturing standing on a scale of intelligence. To this end, Yerkes introduced the point scale, and Terman developed the intelligence quotient. Second, understanding of how intelligence was distributed was necessary in order to express a given examinee’s standing relative to others in the population. Establishing the distribution of intelligence required that these early test developers collect scores from a sample of students whose performance was used both to establish norms and to examine deviations from that norm. Goddard’s first efforts to examine the distribution of scores from a test of intelligence focused on public school students in New Jersey and later New York. Terman relied on students in a subset of school districts in California with which he and others at Stanford had working relationships. 
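To make concrete how an examinee's standing depends on who forms the norm sample, the following sketch expresses one raw score relative to two different norm groups. The standard (z) score is used here as a simple stand-in for relative standing; it is an assumption for illustration rather than the point scale or intelligence quotient these early developers used, and all scores are hypothetical.

```python
from statistics import mean, pstdev


def standing(raw_score, norm_scores):
    """Return the raw score's distance from the norm-group mean,
    expressed in norm-group standard deviations (a z-score)."""
    return (raw_score - mean(norm_scores)) / pstdev(norm_scores)


# Hypothetical norm samples: one drawn only from higher-scoring schools,
# one drawn from a broader range of students.
selective_norm = [60, 70, 80]
broader_norm = [40, 50, 60, 70, 80]

raw = 60
print(standing(raw, selective_norm))  # below average against this norm group
print(standing(raw, broader_norm))    # exactly average against this norm group
```

The raw score never changes; only the comparison group does. This is why the composition of a norm sample matters for every interpretation built on it.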
Given the schools selected for data collection and the fact that a substantial portion of the population of school-age children did not attend school, these samples were not representative of U.S. school-age children at the time. Those skeptical of claims about both the low level of intelligence of students within U.S. schools and even lower levels of intelligence of specific subgroups of students pointed to the unrepresentativeness of these initial norm groups as problematic.36 As described earlier in this chapter, similar limitations arose when Terman and his colleagues converted the Army Alpha for use in elementary schools. Due to practical challenges, students living in rural areas were excluded from the norm group. For political reasons, students membered Black were also excluded. As a result, the norm group was formed predominantly by urban, middle- and upper-class students membered White. Given the social

and economic advantages experienced by middle- and upper-class students membered White coupled with the cultural bias present in these early tests, performance by test takers forming the norm groups was higher than that of students excluded from the norming process. In turn, reliance on this higher-performing group to form the norm group overestimated the average score against which members of the excluded group were then compared and, in turn, overstated the low performance of those students. This artificial inflation of low comparative performance, however, was ignored when scores were interpreted and used to evidence the inferior intelligence of select subgroups of test takers.

Biased Interpretation of Test Scores

Goddard’s study of students in New Jersey and New York suggested that a great many students were less intelligent than their grade level implied. Analyses of the Army Alpha similarly suggested that the average adult male was operating with a relatively low level of intelligence. During the early 1920s, several pioneers in intelligence testing, most notably Yerkes, Terman, and Brigham, conducted analyses of the large volume of Alpha scores collected for the Army. Without fail, their analyses interpreted differences in the average scores among subgroups of test takers as evidence of the intellectual superiority of people membered White and the inferiority of all other racialized groups. Similar interpretations were made based on the intelligence and achievement tests administered in elementary and high schools during the 1920s and 1930s. Walter Lippmann was among the first to admonish such interpretations. In a series of articles published in The New Republic in 1922, Lippmann acknowledged that intelligence tests might be useful for grouping individuals as the Army did during World War I. But, as Schudson writes, Lippmann also argued:

their use was likely to be abused because so many of the creators and marketers of the tests claimed not only to measure intelligence (whatever that might mean) but asserted as well that what they measured was hereditary. “Intelligence testing in the hands of men who hold this dogma,” Lippmann wrote, “could not but lead to an intellectual caste system in which the task of education had given way to the doctrine of predestination and infant damnation.”37

Carl Brigham, the lead developer of the original version of the SAT, also came to criticize the SAT and similar tests of mental ability. To be clear, until late in his career, Brigham was an ardent eugenicist whose writings employed test scores to defend his a priori conclusions regarding differences in the intellectual ability of racialized groups.
Yet, upon reflecting on his work in 1929, Brigham wrote,

The more I work in this field, the more I am convinced that psychologists have sinned greatly in sliding easily from the name of the test to the function or trait measured. I feel we should all stop naming tests and saying what they measure … if we are to proceed beyond the stage of psycho-phrenology.38

Five years later, he added:

The test movement came to this country some twenty-five or thirty years ago accompanied by one of the most glorious fallacies in the history of science, namely, that the tests measured native intelligence purely and simply without regard to training or schooling. I hope nobody believes that now. The test scores very definitely are a composite including schooling, family background, familiarity with English, and everything else, relevant and irrelevant. The “native intelligence” hypothesis is dead.39

It was not that Brigham came to believe testing was without utility. Rather, foreshadowing arguments put forth by Joel Michell regarding the focus on instrumentation rather than the development of theory, Brigham believed that “practice has always outrun theory,” and as a result, a strong understanding of what test scores represented was lacking.40 Despite critiques by Lippmann, Brigham, and others regarding the use of test scores to support claims about differences in the intellectual ability of racialized groups, these interpretations persisted. In the 1930s and 1940s, Cyril Burt, a British psychometrician, conducted a series of analyses examining the heritability of intelligence.
Evidence from these analyses was used to support claims about innate differences in intelligence among racialized groups.41 In the United States, Arthur Jensen, a professor of psychology at Berkeley, employed tests of intelligence to support claims regarding genetic inheritance of intellect and differences among racialized groups.42 Similar interpretations made headlines in the mid-1990s when Richard Herrnstein and Charles Murray’s book, The Bell Curve: Intelligence and Class Structure in American Life, became a best seller.43 Across these examples, the notion of White superiority and faith in the objective nature of test scores drove interpretations that aligned with pre-existing beliefs and ignored potential score bias and social contexts.

The Influence of the White Racial Frame

As educational researcher Marguerite Clarke and her colleagues describe, “The early decades of the 20th century were characterized by a common faith in the power of technology, quantification, a benign science, a culture of objectivity, and cool reason to solve all manner of social problems.”44

As detailed in Chapter 2, this period was also marked with deep racism that expanded the social construction of race from the four geographically determined racialized groups established by Linnaeus to the more than two dozen racialized European groups. During this period, inherited fixed mental abilities understood as a property of the individual became reified by scores on intelligence tests, whether administered to immigrants passing through Ellis Island, recruits entering the Army, children attending schools, or young adults hoping to access higher education. Collectively, this embrace of objective mental measurement, validated in part by expected differences among racialized groups, was a product of the White Racial Frame. The White Racial Frame preconditioned these early test developers to understand mental abilities as inherited individual traits that determined whether one would become a successful contributor to society or a degenerate drag on society. The White Racial Frame conditioned the early test developers to understand their measures as objective, scientific, empirical observations of mental capabilities at work. And the White Racial Frame primed interpreters of test scores to expect scores to differ among racialized groups. Together, these features of the White Racial Frame enabled test developers and test users to accept test scores as objective, valid reflections of innate individual traits, leaving questions about potential bias or flaws in their instrumentation unasked. In turn, results from these tests documenting differences among racialized groups provided confirmatory evidence that reinforced racist beliefs about the superiority of people membered White and the inferiority of all other racialized groups.
Although a handful of critical observers questioned the objectivity and fairness of these measures, these criticisms were largely dismissed as misguided concerns offered by critics lacking technical expertise sufficient to understand testing. Expansion of test use to inform merit-based decisions also reflects the utilitarian view of justice that dominates the United States. This reflection is seen best in Conant’s use of a standard test to base college admission decisions on intellectual merit in hopes of improving the nation’s leadership. What Conant did not consider, however, was the adverse impact that tests influenced by background and opportunity would come to have on those already disadvantaged by society. Through merit-based admissions testing, good would come to many despite the disadvantage to others. Today, the overt racist views that dominated thought during the late 19th and early 20th centuries are less prevalent within the discourse produced by the educational measurement community. Yet, as we will see in the next chapter, modern discourse nonetheless produces racialized deficit narratives. Moreover, as is explored in Chapter 13, a belief in the objectivity of tests as a measure of mental ability and academic achievement continues to

impede a close and concerted focus on test bias. Perhaps most importantly, as Chapter 9 examines, the use of test scores to inform high-stakes decisions continues to contribute to the reproduction of disparate outcomes among racialized groups.

Notes

1 Thorndike (1918), p. 16.
2 Madaus and O’Dwyer (1999).
3 Madaus et al. (2009).
4 Gallagher (2003); Russell (2006).
5 Gallagher (2003), p. 84.
6 Russell (2006).
7 Minton (1988), p. 93.
8 Minton (1988), p. 283, note 14.
9 Minton (1988).
10 Haney (1984).
11 Oakes (1985, 2008); Oakes and Lipton (1990).
12 Giordano (2005), p. 81.
13 Judd (1918), p. 152, quoted in Clarke et al. (2000), p. 162.
14 See Downey (1965) for a biography recounting the many contributions Ben Wood made to the development of educational testing.
15 Cubberly (1934), p. viii, quoted in Giordano (2005), p. 23.
16 Noble and Arps (1920).
17 Schudson (1972).
18 Schudson (1972).
19 Schudson (1972), p. 36.
20 Elliot quoted in Schudson (1972), p. 44.
21 Schudson (1972), pp. 37–38.
22 Lemann (2000).
23 Lemann (2000).
24 Lemann (2000).
25 Carson (2007), p. 2.
26 Jefferson quoted in Lemann (2000), p. 43.
27 See Sandel (2020), Lemann (2000), and Tough (2019) as a few examples of how the SAT and merit-based testing have impacted the higher education system and society more broadly. See Au (2013) for a discussion of how test score use contributes to inequality in access to higher education.
28 The American College Test (ACT) was introduced in 1959 by Everett Lindquist at the University of Iowa as a competitor to the SAT. Whereas the SAT was an offshoot of intelligence tests and was designed to reflect innate mental abilities, the ACT was designed to reflect a student’s general educational development. Despite the different traits targeted by the SAT and ACT, the use of both to inform college decisions has increased over the past several decades.
29 Freedle and Kostin (1988).
30 Kulick and Hu (1989).
31 Santelices and Wilson (2012).
32 The acronym TERC originally stood for Technical Education Research Centers. That full name has since been replaced by the name TERC.
33 Noble et al. (2014a), p. 5.
34 Noble et al. (2014a, 2014b).
35 Randall (in press).
36 See Zenderland (2001) for criticism of Goddard’s sampling.
37 Schudson (1972), p. 51.
38 Brigham quoted in Lemann (2000), p. 33.
39 Brigham quoted in Lemann (2000), p. 34, italics in the original.
40 Brigham quoted in Lemann (2000), p. 34. See Michell (1999, 2005, 2014) for his critique of the field’s focus on instrumentation rather than the advancement of measurement theory.
41 Burt (1909, 1957, 1963, 1966).
42 Jensen (1969, 1984).
43 Herrnstein and Murray (1994).
44 Clarke et al. (2000), p. 161.

References Au, W. (2013). Hiding behind high-stakes testing: Meritocracy, objectivity and inequality in US education. International Education Journal: Comparative Perspectives, 12(2), 7–19. Burt, C. (1909). Experimental tests of general intelligence. British Journal of Psychology, 3(1), 94. Burt, C. (1957). Distribution of intelligence. British Journal of Psychology, 48(3), 161–175. Burt, C. (1963). Is intelligence distributed normally? British Journal of Statistical Psychology, 16(2), 175–190. Burt, C. (1966). The genetic determination of differences in intelligence: A study of monozygotic twins reared together and apart. British Journal of Psychology, 57(1–2), 137–153. Carson, J. (2007). The Measure of Merit. Princeton University Press. Clarke, M.M., Madaus, G.F., Horn, C.L. & Ramos, M.A. (2000). Retrospective on educational testing and assessment in the 20th century. Journal of Curriculum Studies, 32(2), 159–181. Downey, M.T. (1965). Ben D. Wood: Educational Reformer. Educational Testing Service. Freedle, R. & Kostin, I. (1988). Relationship between item characteristics and an index of Differential Item Functioning (DIF) for the four GRE verbal item types. ETS Research Report 88-29, GRE Board Report No. 85-3P. Gallagher, C.J. (2003). Reconciling a tradition of testing with a new learning paradigm. Educational Psychology Review, 15(1), 83–99. Giordano, G. (2005). How Testing Came to Dominate American Schools: The History of Educational Assessment. Peter Lang. Haney, W. (1984). Testing reasoning and reasoning about testing. Review of Educational Research, 54(4), 597–654. Herrnstein, R.J. & Murray, C. (1994). The Bell Curve: Intelligence and Class Structure in American Life. Simon and Schuster. Jensen, A.R. (1969). Environment, Heredity, and Intelligence. Harvard Educational Review.

The Rise of Educational Testing and Test Bias 203 Jensen, A.R. (1984). Test bias. In Perspectives on Bias in Mental Testing. Springer. Judd, C.H. (1918). A look forward. In The Measurement of Educational Products, 17th Yearbook, Part 2, of the National Society for the Study of Education. Public School Publishing Co. Kulick, E. & Hu, P.G. (1989). Examining the Relationship between Differential Item Functioning and Item Difficulty. College Entrance Examination Board. Lemann, N. (2000). The Big Test: The Secret History of the American Meritocracy. Macmillan. Madaus, G., Russell, M. & Higgins, J. (2009). The Paradoxes of High Stakes Testing: How They Affect Students, Their Parents, Teachers, Principals, Schools, and Society. Information Age Publishing. Madaus, G.F. & O’Dwyer, L.M. (1999). A short history of performance assessment: Lessons learned. Phi Delta Kappan, 80(9), 688. Michell, J. (1999). Measurement in Psychology: A Critical History of a Methodological Concept. Cambridge University Press. Michell, J. (2005). The logic of measurement: A realist overview. Measurement, 38(4), 285–294. Michell, J. (2014). An Introduction to the Logic of Psychological Measurement. Psychology Press. Minton, H.L. (1988). Lewis M. Terman: Pioneer in Psychological Testing. New York University Press. Noble, E.L. & Arps, G.F. (1920). University students’ intelligence ratings according to the Army Alpha test. Journal of Philosophy, Psychology and Scientific Methods, 17(17), 468–469. Noble, T., Kachchaf, R., Rosebery, A., Warren, B., O’Connor, M.C. & Wang, Y. (2014a). Do Linguistic Features of Science Test Items Prevent English Language Learners from Demonstrating Their Knowledge?. Grantee Submission. Noble, T., Rosebery, A., Suarez, C., Warren, B. & O’Connor, M.C. (2014b). Science assessments and English language learners: Validity evidence based on response processes. Applied Measurement in Education, 27(4), 248–260. Oakes, J. (1985). Keeping Track: How Schools Structure Inequality. 
Yale University Press. Oakes, J. (2008). Keeping track: Structuring equality and inequality in an era of accountability. Teachers College Record, 110(3), 700–712. Oakes, J. & Lipton, M. (1990). Tracking and Ability Grouping: A Structural Barrier to Access and Achievement. College Entrance Examination Board. Randall, J. (in press). It ain’t near ’bout fair: Re-envisioning the bias and sensitivity review process from a justice-oriented antiracist perspective. Educational Assessment. Russell, M. (2006). Technology and Assessment: The Tale of Two Interpretations. Information Age Publishing. Sandel, M.J. (2020). The Tyranny of Merit: What’s Become of the Common Good. Farrar, Straus, and Giroux. Santelices, M.V. & Wilson, M. (2012). On the relationship between differential item functioning and item difficulty: An issue of methods? Educational and Psychological Measurement, 72(1), 5–36.

Schudson, M. (1972). Organizing the ‘meritocracy’: A history of the College Entrance Examination Board. Harvard Educational Review, 42(1), 34–69. Thorndike, E.L. (1918). The nature, purposes, and general methods of measurements of educational products. Teachers College Record, 19(7), 16–24. Tough, P. (2019). The Years that Matter Most. Random House. Zenderland, L. (2001). Measuring Minds: Henry Herbert Goddard and the Origins of American Intelligence Testing. Cambridge University Press.

8

The Rise of Statistics in Educational Measurement

Racism, aristocratic class panic, anti-Semitism, xenophobia, and ableism are all original sins of modern statistics.1

DOI: 10.4324/9781003228141-11

In January 2002, President George W. Bush signed the No Child Left Behind Act of 2001 into law. Testing students annually in English language arts and mathematics was one of several components of the legislation. Test scores were used as a part of an accountability system intended to shed light on the quality of education provided by public schools and to inform the distribution of funds to support low-performing schools. Implementation of the program varied across states, in some cases leading to the use of test scores to inform student retention and graduation, school sanctions, and judgments about individual teachers. The program, however, also directed increased attention to instruction for students with individual education plans and, in some school systems, students considered “at risk.” Although the law was intended to improve learning for all students, many observers were justifiably critical of its heavy reliance on standardized tests and the use of test scores to inform punitive decisions in some locales.

During a press conference in 2005, President Bush was asked whether he thought No Child Left Behind was working. He replied,

Listen, the whole theory behind No Child Left Behind is this: If we’re going to spend federal money, we expect the states to show us whether or not we’re achieving, you know, simple objectives—like literacy, literacy in math, the ability to read and write. And, yes, we’re making progress. And I can say that with certainty, because we’re measuring … Instead of just spending money and hoping for the best, we’re now spending money and saying, Measure.2

President Bush’s response is revealing in two ways. First, because No Child Left Behind based its evaluation of school quality solely on student test scores, Bush’s response reflects the lofty level to which educational testing had ascended as an independent, objective, and precise tool for assessing student achievement, at least as viewed by politicians and many in the general public. Second, Bush’s focus on measures reflects the degree to which the quantitative imperative had penetrated the U.S. educational system. No Child Left Behind put faith in a single set of numbers (the average score for students within a school and the percentage of students performing above a minimum level of test performance) to represent the quality of instruction. As George Madaus, late professor of educational measurement and evaluation, and his colleagues observe, “Bush’s use of high-stakes tests to measure the outcomes of education reflects a larger belief in the use of metrics to determine the success of any policy.”3 The necessity of measuring to determine the effectiveness of public policy is similarly reflected in comments made by Ken Mehlman, a former chairman of the Republican National Committee, who argued, “If you can’t measure it, it’s not worth doing, because then you [won’t] know whether you’re being successful. That’s how you avoid hope being your strategy.”4

Both Bush and Mehlman extolled the importance of quantitative measures as indicators of success (or failure). Absent from their comments, yet equally important for evaluating the effectiveness of a policy, program, or intervention, are methods for determining whether changes in measures are meaningful. It is here that statistics enter educational measurement, and social science more generally. Over the past century, statistical techniques have emerged as the dominant tool for analyzing quantitative information provided by educational measures.
Test developers depend heavily on statistical techniques to inform decisions throughout the test development process: descriptive statistics are calculated for individual items, correlations are used to examine item discrimination, fit statistics are calculated to determine how well response patterns fit a measurement model, and differential item functioning statistics are calculated to examine potential item bias. Statistical techniques also play an essential role in producing test scores: the mean and standard deviation are used to create norm-referenced scales and to transform scores between scales, regression techniques are employed to calculate growth percentiles, and maximum likelihood methods are used to estimate item difficulty and test-taker ability in item response theory modeling. Statistical techniques are also essential for the analysis of test scores. T-tests and analyses of variance are employed to examine mean differences in scores among groups of test takers. Correlations are calculated to examine relationships between test scores and a wide assortment of variables. Time series analyses are used to examine trends over time. Regression models are developed to estimate the relationship various factors have with test performance and educational achievement more broadly. And p-values are calculated to determine the statistical significance of the various statistics estimated by each of these methods.

Given the central role statistics play in educational measurement, this chapter explores the history of statistics. In recent years, several scholars have argued that the statistical techniques employed most commonly today are a racist production.5 In forming this argument, critics point to the eugenic beliefs embraced by pioneers in the development of statistical techniques. As the quote by Aubrey Clayton that opens this chapter reflects, the texts produced by Francis Galton, Karl Pearson, Ronald Fisher, and, to a lesser extent, Charles Spearman leave no doubt that these pioneers held deeply racist, classist, and ableist beliefs and embraced eugenic ideas.6 These pioneers also openly advocated eugenic policies and practices that are an abhorrent affront to human decency.

Given the rich and detailed history of statistics conveyed by Stephen Stigler and Theodore Porter, this chapter does not aim to retread their accounts. Instead, I focus on a few key developments that intersect with recent critiques of the racist nature of statistics.7 Similarly, given the detailed accounts of eugenics and the eugenic beliefs proffered by these pioneers of statistics offered by Stephen Gould, Leila Zenderland, and Michael Bulmer, among others, this chapter does not attempt to recount the full depth and breadth of their racist beliefs.8 Instead, this chapter begins with a condensed history of key developments that manifest in statistical practices employed commonly today when developing test instruments and when using test scores in studies of educational practices and outcomes. In this brief history, I highlight how interest in eugenics provided motivation for the development of these techniques.
Next, the chapter explores how the methods produced by these pioneers influence practice today. Here I focus specifically on the conception of a population when examining differences among groups and the discourse employed when discussing coefficients estimated by regression models. The chapter then visits the recent argument offered by Aubrey Clayton, which posits that the eugenic interests pursued by the pioneers in statistics elevated the frequentist school of statistics and left a Bayesian approach dormant for decades. This analysis reveals the ways in which racialized views and white supremacy, beliefs about the hereditary nature of human traits, and quantitative objectivity motivated and shaped the statistical methods developed during the late 19th and early 20th centuries.

A Brief History of the Early Development of Statistics

Aubrey Clayton, a mathematician who specializes in the philosophy of probability and statistics, observes that many of the statistical methods used by social scientists today are based on a frequentist conception of probability.9 The frequentist conception frames probability in terms of the frequency with which an event occurs or is expected to occur over the long run. Statistical inferences, then, focus on the likelihood that an observed event would occur, given the frequency with which the event is known or expected to occur. Clayton observes that the frequentist perspective is just one way to understand probability. The frequentist approach is considered objective in that it bases probability statements solely on the relationship between what is observed and the frequency with which such an observation occurs or is expected to occur, and thus excludes additional (subjective) information or views held by the analyst. As we will see later in this chapter, the frequentist perspective was embraced by the early pioneers who applied probability theory to develop the statistical techniques commonly employed by social scientists today. In part, this embrace was motivated by the type of questions about society that these pioneers were interested in exploring: questions that centered on the heredity of traits and differences among people and racialized groups of people.

The roots of the frequentist perspective of probability theory date to the mid-16th century, when Gerolamo Cardano, an Italian polymath with an interest in gambling, wrote his Book on Games of Chance (published posthumously in 1663). Examining dice throwing, Cardano developed the idea of expressing the odds of winning a throw as the ratio of favorable results to the total number of possible outcomes.10 Motivated by a desire to increase his success in gambling, Cardano limited his exploration of probability to the occurrence of an event given a known set of possible outcomes such as rolling a given number in a game of dice or holding a given card or set of cards.
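Cardano's ratio can be made concrete with a short sketch. The Python below is only an illustration, assuming two fair six-sided dice; the function name `probability` and the two example events are hypothetical and are not drawn from Cardano's text.

```python
from fractions import Fraction
from itertools import product


def probability(event, outcomes):
    """Ratio of favorable outcomes to all equally likely outcomes."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


# All 36 equally likely results of throwing two fair dice (an assumption).
two_dice = list(product(range(1, 7), repeat=2))

# Hypothetical events chosen for illustration.
p_sum_is_seven = probability(lambda roll: sum(roll) == 7, two_dice)
p_double_six = probability(lambda roll: roll == (6, 6), two_dice)

print(p_sum_is_seven, p_double_six)
```

The calculation works only because the full set of possible outcomes is known in advance, which is the restriction on Cardano's approach noted above.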
Nearly a century later, interest in gambling motivated another advance in probability theory. In this case, the focus shifted from determining the chances of winning a game to determining how to fairly divide the pot or stakes for a gambling event when the game is interrupted before its completion. Through written correspondence, Blaise Pascal and Pierre de Fermat developed formal rules for calculating the probabilities of events. Imagine a game of chance, such as flipping a coin, in which the winner is determined when a specified number of outcomes occur (e.g., the first to ten heads or tails). Further, imagine that the game has begun, but after only a few rounds the game is interrupted. The question explored is how to fairly divide the earnings given the observed outcome at the time of the interruption. Fermat and Pascal advanced two different methods to solve this problem. Fermat proposed laying out all possible outcomes if the game were to continue and then dividing the pot based on the odds that each player would win. Pascal approached the problem from the perspective of expanding a binomial expression, which proved to be a more efficient solution for these types of problems.11

The next major advance in probability theory occurred in 1713, when Jacob Bernoulli’s work detailing what has come to be known as the law of large numbers was published. When developing the law of large numbers, Bernoulli’s main concern focused on estimating a distribution based on observations. As Bernoulli describes:

suppose that without your knowledge there are concealed in an urn 3000 white pebbles and 2000 black pebbles, and in trying to determine the numbers of these pebbles you take out one pebble after another (each time replacing the pebble you have drawn before choosing the next, in order not to decrease the number of pebbles in the urn), and that you observe how often a white and how often a black pebble is withdrawn. The question is, can you do this so often that it becomes ten times, one hundred times, one thousand times, etc., more probable (that is, it be morally certain) that the numbers of whites and blacks chosen are in the same 3:2 ratio as the pebbles in the urn, rather than in any other different ratio?12

Bernoulli reasoned that the larger the number of samples drawn, the more accurately the observed ratio will approximate the actual ratio of white to black pebbles, and that this accuracy continues to improve as the number of observations increases. Bernoulli’s contribution to probability theory shifted focus from exploring probability within a space in which the frequency of events is known (e.g., the number of sides on a fair coin and thus the odds that a coin will fall heads or tails side up) to one of inference, that is, inferring an unknown distribution of white and black pebbles in an urn based on a sample of observations.
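Bernoulli's urn lends itself to a small simulation. The sketch below is a rough illustration under stated assumptions, not a reconstruction of Bernoulli's own argument: it draws with replacement from an urn holding 3000 white and 2000 black pebbles and reports the observed share of white pebbles for increasing numbers of draws. The function name `observed_white_fraction`, the fixed seed, and the particular sample sizes are choices made here for illustration.

```python
import random


def observed_white_fraction(n_draws, white=3000, black=2000, seed=0):
    """Fraction of white pebbles seen in n_draws draws with replacement."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    p_white = white / (white + black)  # true share of white pebbles: 0.6
    # Because each pebble is replaced, every draw has the same chance of white.
    whites = sum(1 for _ in range(n_draws) if rng.random() < p_white)
    return whites / n_draws


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000, 100_000):
        fraction = observed_white_fraction(n)
        print(f"{n:>7} draws: observed white fraction = {fraction:.4f}")
```

With few draws the observed fraction can sit well away from 0.6; with many draws it tends to settle close to it. The analyst in Bernoulli's thought experiment does not know the 3:2 ratio and must infer it from fractions like these.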
As Clayton describes, Bernoulli advanced probability theory from sampling probabilities focused on the frequency of occurrences to inferential probabilities that focus on the confidence we have in a given statement based on a sample of observations: Sampling probabilities lend themselves to a frequency-based interpretation; the probability of something measures how frequently it happens. Inferential probabilities require something more subtle; the probability of a statement depends on how much confidence we have in it. Sampling probabilities go from hypothesis to data: Given an assumption, what will we observe, and how often? Inferential probabilities go from data to hypothesis: Given what we observed, what can we conclude, and with what certainty? Sampling probabilities are fundamentally predictive, inferential probabilities are fundamentally explanatory.13

As we will see shortly, it is this shift from predictive to explanatory that Galton, Pearson, Fisher, and other pioneers of modern statistical methods struggled to make as they introduced new statistical methods. But first there is one more important development that is essential to modern statistical methods: the discovery of the normal distribution and its applicability to human characteristics. This discovery occurred in two phases. First, in 1809, Carl Gauss published The Theory of the Motion of Heavenly Bodies Moving about the Sun in Conic Sections, in which he explored mathematical expressions of planetary orbits. One of the expressions Gauss presented became known as the normal distribution. In effect, Gauss’s expression is built on Pierre-Simon Laplace’s work on the instability of astronomical observations. Examining the manner in which astronomical observations varied, Laplace noted that observations (or more accurately, errors in observation) were distributed in a manner resembling a bell curve. Building on the notion of a normal distribution of observations, Gauss derived a mathematical expression for the normal distribution.

Second, in 1835, the Belgian mathematician Adolphe Quetelet extended the applicability of the normal distribution of error to human traits. Analyzing a large set of data documenting various measures of people (arm length, chest circumference, etc.), Quetelet noted that the distribution of these measures closely resembled a normal distribution. Since these measures were collected from a common set of people, Quetelet interpreted this normal distribution as variation within a single population.
Quetelet understood the mean measure as representing the idealized “average” or “normal man,” and variation around this mean as simply error in the reproduction of humans.14 Viewed from this perspective, Quetelet understood human variation fundamentally as error.15 As Porter describes,

Quetelet also introduced the idea that the average man could be treated as the “type” of the nation, the representative of a society in social science comparable to the center of gravity in physics. Hence for the average man “all things will occur in conformity with the mean results obtained for a society. If one seeks to establish, in some way, the basis of a social physics, it is he [average man] whom one should consider, without disturbing oneself with particular cases or anomalies, and without studying whether some given individual can undergo a greater or lesser development in one of his faculties.”16

Clayton similarly notes that

the apparent stability of these average propensities across time and place gave a sense that perhaps there was some order to human society after all. Even if an individual person’s life was subject to chaotic ups and downs, the life of the average man was remarkably predictable.17

It was this predictability that Quetelet believed provided the foundation for applying statistics to create the theory of social physics, which Auguste Comte introduced as social science. Quetelet’s observation regarding the normal distribution of human physical traits and his notion of the average man representing a society or nation strongly influenced the rapid development of statistical methods that occurred during the late 19th and early 20th centuries.

At this point in the brief history of the development of statistical methods, Francis Galton again plays an important role. Whereas Quetelet understood the average man as a description or depiction of a society and viewed the distribution of measures of a given trait as error in the reproduction of the average man, Galton, and later Karl Pearson, expanded applications of error theory and the normal distribution to explore the cause of variance in human traits, behaviors, and outcomes.18 For both Galton and Pearson, as well as other eugenicists, heredity was the cause to which variance in human traits was attributed. And it was the pursuit of evidence of the role heredity played in producing differences among humans that motivated Galton and Pearson’s development of statistical techniques.19 As Clayton describes,

What united them [Galton, Pearson, and Fisher] was an understanding of the power of their new statistical machine to shape society according to their agenda, an agenda that, in turn, led them to shape statistics to be what they needed … quantitative ways to express arguments about human heredity and selective breeding.
Clayton similarly observes that the development of statistical methods was essential in order to assert an authority founded on what they claimed was objective truth … their methods became more cloaked in objectivity as statistics gained more political importance, until by the end the stakes were such that they couldn’t allow any hint of subjectivity.20

As we will see later in this chapter, this embrace of objectivity carried an opportunity cost: it thwarted the development of alternate methods of statistical analysis that may be more useful for estimating educational efficacy and addressing other questions of interest in the field of education. Whereas Quetelet discouraged focus on cases that deviated from the “normal man” and resisted attention to those in a population with greater or lesser standing on a trait, Galton, Pearson, and Fisher were deeply interested in just these types of occurrences. For these three pioneers of modern statistical methods, the normal distribution became a tool for quantifying the magnitude of differences among humans.

Galton’s Statistical Contributions

Galton made several early contributions to educational measurement, many of which focused on introducing statistical concepts that are core to the analytic work conducted in the field today. Here I limit focus to three of these pivotal concepts: statistics by intercomparison, correlation, and regression.

During his early years, Galton traveled extensively, documenting all sorts of information about the groups of people he met. One form of information focused on physical traits. As Galton encountered tribes of people, he would ask that they line up in order of their height. Rather than measure each individual’s height, Galton limited his measurement to the two people at the ends of the line, the person at the middle of the line, and the two people a quarter of the way in from each end. With these five measures, Galton was able to estimate the height of the average person in the tribe while also obtaining a sense of the variation in height among the tribe of people.21 As described in Chapter 5, when contemplating the hereditary nature of genius, Galton similarly divided his theoretically derived distribution of mental ability into 14 equally spaced units to estimate the number of people in England functioning at different levels of cognitive ability.22 Dividing a distribution into ordered segments, which Galton termed statistics by intercomparison, proved useful for describing variation within a sample of observations and informed the development of percentiles, quartiles, and similar divisions of data sets.

Galton’s discovery of correlation has been retold many times, in part due to the manner in which Karl Pearson draws on a passage in Galton’s memoirs to describe an alleged moment of epiphany that occurred as Galton sought refuge from a rain shower under a rocky recess.
Stephen Stigler, a professor and historian of statistics, challenges this account, arguing that Galton’s invention of correlation was a product of his inquiry into the hereditary nature of human traits. As Stigler describes it,

[Galton] was faced with a problem; how to reconcile an empirical fact with a mathematical theorem. The fact was that most physical measurements (such as height for men or diameter for seeds) were approximately normally distributed in the populations he studied. The theorem was the central limit theorem, which stated that the normal distribution should arise when an object is subjected to a large number of independent disturbances, no few of them dominant. The problem of reconciliation that confronted Galton was that he believed his physical measurement to be subject to important, even dominant influences in the process of heredity. How could, for example, the dominant factor of father’s height be reconciled with the appearance of normality in the offspring that seemed to belie the existence of such a single dominant factor?23

Through experimentation with a device he invented and termed a Quincunx, Galton was able to observe how a disturbance that produced a dominant influence yields a subsequent distribution that is normally distributed. Galton had originally created his Quincunx to help lay audiences understand the concept of the central limit theorem. The Quincunx took the form of a board on which pins were arranged in equally spaced columns, with each row offset by one-half of that space. A funnel was placed at the top of the device into which shot was directed to the topmost center pin. The shot then bounced off pins as it worked its way down through the rows, moving randomly to the left or right with each bounce. Shoots or slots were placed at the bottom of the device into which the shot collected after passing the final row of pins. The bouncing of the shot off each pin represented a binary random disturbance or “effect,” the accumulation of which distributed the shot in a manner resembling the normal distribution. Galton’s Quincunx provided a visual representation of the repeated binary outcomes Pascal applied when developing his solution to the interrupted game of chance.

To explore the hereditary question troubling him at the time, Galton modified his Quincunx to add intermediary shoots through which the shot would pass before reaching the bottom row. These intermediary shoots represented the passing of hereditary traits on to offspring. In effect, what Galton showed was that shot released onto the top-most middle pin would become normally distributed through its initial random travel.
Then, as shot was released from each intermediary shoot, random deflections on subsequent pins would similarly produce a normal distribution, but one centered on the location of the intermediary shoot from which the shot was released. The cumulative effect of these many normal distributions was in turn one large single set of shot that was also normally distributed. Through this demonstration, Galton was able to see how data that was normally distributed could give rise to a similarly normally distributed population of “offspring,” yet still maintain similarity in the location of the “parent” and “offspring” within each distribution. It was this relationship between the parent’s location within the initial distribution and their children’s locations in the subsequent distribution that gave rise to the concept of correlation.24

Galton’s third major contribution introduced the statistical idea of regression. Porter describes the motivation that led to Galton’s discovery of regression in this way:

Galton finally gave up theorizing about hereditary models and investigating the inheritance of acquired characteristics about 1873, and turned at last to investigation of the statistical properties of natural inheritance discussed in the conclusion of Hereditary Genius. His inability to apply calculation to his survey of men of science completed in 1874 confirmed what he had earlier suspected, that the laws of heredity could best be investigated experimentally, using the simplest possible materials. He was, in any event, confident that these laws were universal and, once discovered, could be applied to inheritance of intellectual and moral traits.25

With the goal of proving the hereditability of mental and moral traits in mind, Galton first explored the hereditability of traits statistically by studying the inheritance of weight in pea plants. Galton began his investigation by grouping a collection of pea seeds based on their weight. He then distributed batches of peas to various acquaintances with strict instructions to plant the seeds in similar soil and to subject the offspring to similar growing conditions.26 Once the offspring yielded their own seeds, Galton’s acquaintances collected the seed samples, taking care to organize the samples based on the parent seed’s weight. Galton then measured the child seeds for each parent seed, producing a separate distribution of weight for each set of offspring seeds. His analysis of the parent seeds and the many sets of offspring revealed a few interesting patterns. First, Galton noted that the dispersion or error from the average within each set of offspring was similar regardless of the size of the parent. In other words, the amount of variability was similar across all family sets. Yet, the average weight within each offspring set differed systematically, such that the average weight of offspring of larger parent seeds tended to be larger than the average weight of offspring of smaller parent seeds.
Here, Galton made one further observation that was critical for developing his idea of regression: while there was a clear relationship between the average weight of the offspring and that of their parent, the average weight of the offspring tended to be closer to the average weight of all offspring than was the parent’s weight to the average of all parent seeds. In other words, the average offspring of any given parent seemed to move or regress back toward the grand average. This observation gave rise to the concept of regression and soon led to a statistical technique for examining regression in quantitative data sets. As Porter reminds us, “Galton was, it must be remembered, studying heredity, not seeking new methods of statistics.” Galton’s data on the genius of eminent men did not permit the type of analysis enabled by his experiments with peas. Although his measures of peas “did not bear directly on the questions of intellectual and moral inheritance that most concerned him … [t]he results seemed to meet his expectations, and Galton was at last able to apply a close statistical analysis to hereditary materials.”27
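The pattern Galton observed can be reproduced with simulated numbers. The sketch below is an assumption-laden illustration and does not use Galton's pea data: each "parent" and "offspring" value is modeled as a shared component plus independent noise, in arbitrary standardized units. The function names, the equal variances, the seed, and the threshold of 1.5 are all hypothetical choices made here.

```python
import random
import statistics


def simulate_pairs(n=20_000, shared_sd=1.0, noise_sd=1.0, seed=1):
    """Simulate (parent, offspring) values that share one common component."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    pairs = []
    for _ in range(n):
        shared = rng.gauss(0.0, shared_sd)               # common component
        parent = shared + rng.gauss(0.0, noise_sd)       # independent noise
        offspring = shared + rng.gauss(0.0, noise_sd)    # independent noise
        pairs.append((parent, offspring))
    return pairs


def least_squares_slope(pairs):
    """Slope of the least-squares line predicting offspring from parent."""
    mean_x = statistics.fmean(x for x, _ in pairs)
    mean_y = statistics.fmean(y for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


def group_means(pairs, threshold):
    """Mean parent and mean offspring value among parents above threshold."""
    group = [(x, y) for x, y in pairs if x > threshold]
    return (statistics.fmean(x for x, _ in group),
            statistics.fmean(y for _, y in group))


if __name__ == "__main__":
    pairs = simulate_pairs()
    print("slope:", round(least_squares_slope(pairs), 3))
    parent_mean, offspring_mean = group_means(pairs, threshold=1.5)
    print("large parents:", round(parent_mean, 2),
          "their offspring:", round(offspring_mean, 2))
```

Under these assumptions the fitted slope falls below 1, and the offspring of unusually large parents are, on average, larger than the grand mean of zero but closer to it than their parents were. In this simulation the pattern follows from the noise alone, so it appears without any assumption about heredity.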

Karl Pearson and Ronald Fisher

Galton possessed an unusual gift for seeing patterns in data, but his ability to translate his ideas into mathematical expressions was limited. It is here that Karl Pearson played an important role. Like Galton, Pearson held strong eugenics beliefs that motivated his interest in statistics. As just one example of the essential role he saw statistics playing in the eugenics movement, in 1911 Pearson published a book titled The Scope and Importance to the State of the Science of National Eugenics, in which he proclaimed:

Those who have not the courage, or it may be the strength to face life as it is, must avoid Science; or at least the portion of it termed National Eugenics. Those who fear to know humanity in its degradation, as well as in its nobler phases, will scarce reach the standpoint of knowledge from which they can effectively help the progress of our race. They will be ignorant of the essential factors which alone can determine whether a nation shall be sound in mind and body. Disease and Health, Vigour and Impotence, Intelligence and Stupidity, Sanity and Insanity, Conscientiousness and Irresponsibility, Clean Living and Licence,—all things which make for strength and weakness of character—must be studied, not by verbal argument, but be dissected under the statistical microscope, if we are to realize why nations rise and fall, if we are to know whether our own folk is progressing or regressing. Only by such examination can we understand the disease; only by such means can we suggest a valid cure where we find there is that in any community which is making for degeneracy. The study of Eugenics centres round the actuarial treatment of human society in all its phases, healthy and morbid.28

Pearson’s contributions to statistics, however, began years before his treatise on the science of eugenics.
Soon after reading Galton’s work on regression and co-relations during the first half of the 1890s, Pearson set to work developing a mathematical approach for expressing the relationship between two sets of data. As Pearson describes in the opening paragraph of his manuscript published in 1896:

The problems of regression and heredity have been dealt with by Mr. Francis Galton in his epoch-making work on “Natural Inheritance” but, although he has shown exact methods of dealing, both experimentally and mathematically, with the problems of inheritance, it does not appear that mathematicians have hitherto developed his treatment, or that biologists and medical men have yet fully appreciated that he has really shown how many of the problems which perplex them may receive at any rate a partial answer. A considerable portion of the present memoir will be devoted to the expansion and fuller development of Mr. Galton’s ideas, particularly their application to the problem of bi-parental inheritance.29

In his manuscript, Pearson presented his mathematical treatment of Galton’s ideas, introducing a formula for calculating a coefficient that represented the strength of correlation between two data sets. From this effort, Pearson also developed mathematical methods for estimating coefficients for linear regression analyses. Beyond the correlation coefficient, Pearson developed mathematical methods for estimating the standard deviation, an important statistic used in educational measurement to calculate z-scores and to perform normative scale transformations.

In addition to his interest in relationships among variables, Pearson was also interested in questions about differences between observed patterns and expected patterns, as well as differences between groups. Interest in the first topic led Pearson to develop the chi-square test and the p-value as a tool for determining statistical significance.30 As Porter describes in his biography of Karl Pearson, Pearson’s initial use of the chi-square test derived from his interest in distribution curves and determining whether a given set of data fit an expected distribution. In the mid-1910s, Pearson extended his use of the chi-square test to examine causal relationships.
Perhaps most notable was a paper in which Pearson employed the chisquare test to examine the relationship between paternal wages and family size—a question motivated by his eugenic belief that poorer families were procreating at higher rates than the wealthy and thus creating a drag on society.31 Pearson’s method of calculating chi-square was later modified by Ronald Fisher to function more accurately when applied to make comparisons between groups.32 Fisher also made several other contributions to statistical techniques. Perhaps most impactful for applications of statistics in studies of educational outcomes today was Fisher’s work on significance testing and the use of p-values to evaluate the statistical significance of findings. Like Galton and Pearson, Fisher’s work on significance testing was motivated by an interest in documenting differences between groups of people, at times to inform the selection of people. As an example, in an article titled “The Elimination of Mental Defect” published in 1924, Fisher employed statistical analyses to address “a widespread misapprehension of the effectiveness of selection, either by segregation or sterilization, in purging the population of its feebleminded strains.”33 Later, Fisher applied his statistical techniques to provide objective evidence of the benefits forced sterilization of “feeble minded high-grade defectives” would have for British society.34

Charles Spearman

A final contribution of note during this early period of modern statistics was made by Charles Spearman. Like Galton, Pearson, and Fisher, Spearman supported eugenic notions and, as the title of an article he published in 1914 makes clear, he was a firm believer in The Heredity of Ability.35 Early in his career, Spearman introduced an alternate method for calculating a correlation coefficient. Rather than examining correlation between interval data, Spearman’s method supported analysis of the strength of relationships between ranks. Spearman’s interest in correlation later led to his development of factor analytic techniques. Capitalizing on the concept of correlation, factor analysis allowed relationships among many variables to be examined simultaneously and these many variables to be combined into scores representing one or more factors believed to be represented among the variables.

Most notably, Spearman applied factor analysis to examine relationships among responses to items on tests of mental ability. As described in Chapter 6, at the time, tests of mental ability comprised batteries of items, each battery focused on what was believed to be a facet of intelligence, such as vocabulary knowledge, spatial ability, mathematical ability, and so on. Based on factor analysis of intelligence test items, Spearman argued that performance on these tests was the product of two forms of intelligence. The first form was task-specific while the second was a general form of intelligence, which he termed g. As Spearman describes, “ability depends on two factors; the one of these is a specific ability or disposition, different and independent for every different kind of ability; the other is the general energy of the mind, always the same.”36 Although Spearman acknowledged these two forms of intelligence, he gave more weight to the second general form, believing that g influenced all forms of cognition.
Further, it was the heritability of g that was most influential in determining whether one is feebleminded, an average man, or of superior intellect. Spearman’s conception of g was hotly debated for several decades and was foundational to the arguments regarding differences in intelligence among racialized groups posited decades later by Arthur Jensen, Richard Herrnstein, and Charles Murray, among others.37

Influences on Educational Measurement

As practiced today, educational measurement relies heavily on statistical concepts and techniques developed in the 19th and early 20th centuries. Test norms, scale scores, percentile ranks, and growth percentiles developed for many of today’s tests are products of the normal distribution and Galton’s method of inter-comparison. Item discrimination and dimensionality analyses rely on correlation and factor analytic techniques introduced by Galton, Pearson, and Spearman. Analyses of item and test bias draw on regression techniques and significance testing introduced by Pearson and Fisher. And uses of test scores to examine the efficacy of educational interventions similarly depend heavily on regression analyses, significance tests of differences among group means, and a variety of related techniques to examine group differences.

Rather than explore these more obvious applications of statistical techniques, I focus on two less apparent influences these pioneering statistical tools have had on educational measurement—influences which contribute directly to the role educational measurement plays in systemic racism detailed in Chapter 9. The first influence focuses on the conception of population and statistical tests of differences among groups. The second influence focuses on the terminology inherited and still used today when presenting and discussing findings from regression analyses. By exploring these two influences, I hope to show that the techniques themselves are not problematic; what is concerning is the way these techniques encourage us to orient our questions and frame our findings in ways that sustain educational measurement’s role as an apparatus for systemic racism.

Interestingly, concern about the ways in which statistical techniques may influence the questions asked is not new and dates back to Pearson’s initial encounter with Galton’s statistical ideas.
Writing in 1889, Pearson warned:

there is, in my own opinion, considerable danger in applying the methods of exact science to problems in descriptive science … the grace and logical accuracy of the mathematical processes are apt to so fascinate the descriptive scientist that he seeks for sociological hypotheses which fit his mathematical reasoning and this without first ascertaining whether the basis of his hypotheses is as broad as that human life to which the theory is to be applied.38

As I explore in Chapter 12, the limitations of current statistical models present notable challenges to analyses performed through the lens of Intersectionality Theory.

Differences Among Populations

Although brief, the history presented to begin this chapter describes how the questions explored by pioneers in statistics shifted from probability of winning games of chance to making inferences based on a sample and then onto examining differences among people. It is on this landing spot, where differences among groups of people are the primary concern, that the field still stands.

As noted earlier in this chapter, Quetelet’s analysis of measurements of human features led to his conception of the average man from which members of a population disperse. Quetelet thought of this dispersion as error in the production of the “normal man” but understood that all people within the distribution formed a single population, the members of which deviated by various degrees from the “average man.” Galton similarly noted that human physical traits and behaviors tended to be normally distributed. Through his experiments with the Quincunx, Galton also noted that a disturbance to a series of binary random events creates separate populations, the members of which are normally distributed around distinct population means. By considering these sub-distributions produced by a disturbance as separate populations, Galton planted the seed for employing properties of the normal curve (or any expected distribution) to determine whether two groups are drawn from the same population.39 This idea was picked up first by Francis Edgeworth, who developed a method that used information about variation within samples to estimate whether differences in the means and medians of those samples were meaningful.40 As Stigler describes,

The technical apparatus behind the test Edgeworth used was not new; it dated from Laplace. What was novel was the conceptual setting in which the apparatus was employed. Edgeworth, continuing along the path Galton had marked out in 1875, was subdividing populations that might have been considered homogenous by Quetelet’s test of fitting a normal curve and was then testing for differences between subpopulations using estimates of variability internal to the subpopulations.41

These methods were later advanced through William Gosset’s Student’s t-test, Pearson’s chi-square test, and Fisher’s analysis of variance, all of which are commonly applied today to examine differences among subgroups of people.
As an example, one of Pearson’s first applications of his chi-square test compared measurements of skulls collected in southern Germany from graves dug in the 5th through 7th centuries. Through his analysis, Pearson concluded that differences among the skulls evidenced that the area was populated by two separate and distinct races of people. Pearson described the value of employing statistical analysis to compare measures among members of groups, noting that “[t]he asymmetry may arise from the fact that the units grouped together in the measured material are not really homogeneous.”42 As Clayton observes,

That skull measurements could indicate differences between races—and by extension, differences in intelligence or character—was almost axiomatic to eugenicist thinking. Establishing those differences in a way that appeared scientific was a powerful step toward arguing for racial superiority.43

Fisher similarly applied the logic of testing differences among subgroups to determine whether the members of each subgroup represented a single population or formed two different populations. In effect, a t-test and other statistics employed to determine whether differences among groups are statistically significant apply the following logic. First, the groups are assumed to be drawn from the same population distribution. Next, the probability of drawing group members from the same population distribution such that the difference in group means is equal to or larger than what is observed is estimated. When the probability of drawing groups that yield a difference as large or larger than the observed difference is smaller than a predetermined threshold, the groups are understood as having been pulled from two different populations.

On the surface, this logic is sensible. Yet, two aspects of this logic are potentially problematic. First, use of the term population connotes a fundamental difference among people that allows members of one group to be clearly distinguished from another. In part, this implication is an artifact of the colloquial meaning of the word population typically used in reference to the people residing in socially constructed entities defined by clearly established borders such as nations, states, provinces, and towns. Given this colloquial meaning, we are conditioned to think of two populations as separate and distinct. This understanding of populations was clearly evident in the thinking of Galton, Pearson, and Fisher, all of whom understood racialized groups as forming different populations. Fisher similarly seemed to understand economically, racially, and socially defined classes of people as being composed of fundamentally different populations of people.
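The significance-testing logic outlined in this passage can be made concrete with a short sketch. The example is an illustration added here, not a procedure taken from the sources cited. It uses an exact permutation test rather than a t-test, because the permutation version states the "same population" assumption directly: if both groups come from one population, every relabeling of the pooled scores is equally likely. The scores, the function name `permutation_p_value`, and the 0.05 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means.

    Null assumption: both groups are drawn from one population, so every
    way of relabeling the pooled scores is equally likely.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    splits = 0
    # Enumerate every possible assignment of n_a pooled scores to group A.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Count relabelings whose difference is as large or larger than observed.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


# Made-up scores, for illustration only.
scores_a = [1, 2, 3, 4, 5]
scores_b = [6, 7, 8, 9, 10]
ALPHA = 0.05  # conventional threshold, assumed here

p_value = permutation_p_value(scores_a, scores_b)
print(f"p = {p_value:.4f}")
print("below threshold" if p_value < ALPHA else "not below threshold")
```

A p-value below the threshold says only that the observed split would be unusual under random relabeling of the pooled scores. The step from that result to "two different populations" is an interpretation layered on top of the arithmetic.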
In part, their conception of racialized groups of people forming different populations is a product of their biological conception of race.44 Although many scholars no longer embrace the biological conception of race and instead acknowledge race as a social construction, as we will see in a moment, the framing of research questions, interpretation of findings, and discourse employed when presenting results tend to express racialized groups as separate populations of people. As Janet Helms, a professor of research psychology, points out, this conception of separation is misleading, particularly when examining differences among test scores. In a distribution of test scores, the variation within racialized, gendered, or other socially constructed groups of test takers is typically (much) larger than the variation between groups. In addition, despite differences in mean scores, considerable overlap typically exists between the two distributions of scores. In most circumstances, members of each group are found at all points in the distribution. The use of the term population, however, implies that the border between groups is more clearly defined than the data typically reflect.45

A second concern with framing the analysis with respect to population differences is that the term implies that a given person can be discretely placed into one group or another. A notion of discrete placement ignores the fluidity and, in some cases, multiplicity of many socially constructed categories. As an example, at first glance a trait with clearly defined borders such as nationality or residential zip code may seem discrete. But consider people with dual citizenship, children who divide their time between two households, people without permanent homes, and people with multiple residences. When people experiencing each of these situations are included in a study, their group membership is arbitrary.

Membering people into groups becomes even more complex when various forms of identity are simultaneously considered. As explored in Chapters 1 and 2 and examined in greater detail in Chapters 11 and 12, racialized identity categories are complex, evolving, and context-dependent. Similar complexity exists for gender, sexuality, and socioeconomic status, among other socially constructed categories. The attention directed to group and population membership by current methods typically employed to conduct inferential tests of statistical significance encourages educational measurement specialists to essentialize identity and, in many cases, treat it as a stable discrete trait. In addition, the challenge of membering people into racialized categories is already problematic in the United States. As the people residing within the United States become increasingly diverse, the number of children attending schools whose racialized identity crosses more than one socially constructed group will continue to increase, and with it the arbitrary assignment of racialized identity will further confound racialized membering.

Conceiving identity as discrete essentialized group membership also encourages the conception and subsequent treatment of groups as monoliths composed of homogenous members.
This focus on homogeneity is particularly problematic when examining differences among racialized groups. As explored in Chapter 12, Kimberlé Crenshaw, Patricia Collins, Vivian May, Lisa Bowleg, and other Intersectionality theorists emphasize that identity is a multiplicity. Failure to consider the complex and interlocking aspects of identity can produce misleading findings from statistical analyses focused on essentialized monolithic group differences.46 Yet, the existing methods for examining group differences require simplicity. Intersectional conceptions of identity rapidly inflate the number of groups among which differences become a focus of analysis. As groups become more specific through their complex multiplicity, sample sizes decrease. Together, the increased number of groups and smaller sample sizes within groups stress statistical power and inflate Type I error produced by current statistical methods.47

As our understanding of our world increases in complexity, the limitations of current statistical methods force educational measurement specialists to cling to older conceptions of identity and group membership. I began this section by stating that the problem resides not within the methods but in the use of those methods. This opening statement is only partially correct. The challenge today is that understanding of identity and groups has advanced to a state that surpasses that held when these methods were developed. Lacking new and more sophisticated statistical methods aligned with current understandings and questions, educational measurement specialists are forced to employ the existing tools in their toolbox. In turn, either use of these tools requires the questions asked to be adjusted to align with the limitations of the tools, or exploration of today’s questions places stress on the tools—a challenge that will likely produce misleading results. This tension and alternate frames that may provide insight into paths forward are explored further in Part III.

Regression Effects and Deficit Narratives

Regression techniques are commonly applied to examine the relationship between various input or independent variables and an outcome or dependent variable. Because the samples employed for many educational research studies are not randomly assigned to conditions, regression techniques are also employed to adjust estimates of efficacy for differences that may be associated with specific characteristics of students forming each sample group. As an example, a typical regression analysis might be employed to estimate the effect that a new instructional technique has on student achievement as measured by a test score. Teachers in one sample of classrooms are trained on the instructional technique and apply it during their teaching. A separate sample of teachers employs their standard instructional practice. Recognizing there may be differences in prior achievement, the regression analysis may use a pre-test score to account for these differences. In this simple example, both pre-test and exposure to the intervention are used to predict or model a post-test score following instruction.

The researcher may also observe additional differences among the students forming the treatment and comparison groups. As an example, the racialized composition of students may differ among groups. Observing that there tend to be differences in test scores among racialized groups, researchers often add racialized identity to a regression model to adjust for any disparities associated with racialized identity.

Textbooks and instruction in statistics courses emphasize the difference between correlation and causation such that the phrase “correlation is not causation” is commonly conveyed to students as they learn statistical techniques. As the brief history presented in this chapter describes, regression techniques are a direct outgrowth of correlation, and estimates of regression coefficients capitalize on correlation between predictor or input variables and an outcome variable.
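The pre-test and intervention example described above can be sketched in a few lines. What follows is an illustration under stated assumptions, not an analysis from any study cited here: the eight scores are invented and noise-free, the variable names (`pre`, `treated`, `post`) and the helper `ols` are hypothetical, and the coefficients are obtained by solving the ordinary least-squares normal equations directly.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Invented, noise-free scores: the treated classrooms start with higher pre-tests.
pre = [40, 50, 60, 70, 50, 60, 70, 80]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
post = [41, 50, 59, 68, 53, 62, 71, 80]

# Unadjusted comparison: difference in mean post-test scores.
treated_mean = sum(p for p, t in zip(post, treated) if t) / sum(treated)
comparison_mean = sum(p for p, t in zip(post, treated) if not t) / (len(treated) - sum(treated))
raw_difference = treated_mean - comparison_mean

# Adjusted comparison: post ~ intercept + pre + treated.
design = [[1.0, p, t] for p, t in zip(pre, treated)]
b_intercept, b_pre, b_treated = ols(design, post)

print(f"raw difference in means: {raw_difference:.2f}")
print(f"coefficient on treated, adjusting for pre-test: {b_treated:.2f}")
```

In this invented data set the unadjusted gap between groups is larger than the coefficient on `treated`, because part of the gap reflects the groups' different starting points. The coefficient remains an adjusted association; whether it can be read as a causal effect depends on the study design, not on the arithmetic.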
When a study employs a strict experimental design in which participants are randomly assigned to treatments, findings from regression analyses may be interpreted as causal. In many educational studies—particularly those that focus on curricular, instructional, and policy impacts—random assignment of participants is not possible. Yet, the discourse employed when discussing findings from regression analysis often contains a causal flavor. Specifically, the phrasing when discussing a regression coefficient often includes the terms effect and/or impact. As an example, a study published in 2002 titled “The Increasing Significance of Class: The Relative Effects of Race and Socioeconomic Status on Student Achievement” included racialized identity and household socioeconomic status, among additional variables, to predict 8th- and 12th-grade test scores, and considered whether relationships have changed over time. When describing the findings, the following discourse was used:

For 12th grade outcomes (see Model III), African American students performed less well than their white counterparts. However two years later (see Model V), the effect actually reverses—African American students outperform whites … Family size is positively related with 12th grade scores; however, they have no impact two years later.48

In this example, the discourse implies that racialized identity—that is, being membered into a racialized group—has an effect on a test score. As is examined in greater detail in Chapter 11, the notion that membering into a racialized (or gendered or any other socially constructed demographic) group has an effect on an outcome is problematic. Simply assigning a person to a given group does not cause a change in test scores. Rather, it is the lived experiences that systematically differ among racialized groups that contribute to differences in outcomes. Despite the warnings regarding causation, the language of regression nonetheless implies causal effects.
The causal implications of discourse employed when discussing findings from a regression analysis also contribute to the production of deficit narratives about racialized groups. Since the system of racism that operates in the United States produces disparities in educational opportunities and outcomes among racialized groups, performance on measures of educational achievement tends to differ among racialized groups. When findings are presented from regression and similar statistical analyses in which racialized identity (typically phrased as race by authors of the studies) is included, language that implies the existence of a deficit within the members of racialized groups is too often employed. In turn, use of language that implies a deficit within a racialized group contributes to the production of deficit narratives—a powerful tool that operates within systemic racism to support racialized ideology and justify disparate outcomes.

Research by Shaun Harper, a professor of higher education, as well as by my colleagues and me, examined the prevalence of discourse employed when presenting and discussing findings from educational research that contributes to deficit narratives.49 A few examples of discourse taken from articles published in educational research journals since 2008 include:

On the other hand, we found test score disparities across race/ethnic lines during the kindergarten school year, with Black students’ school-year gains lagging behind those of White students.

Even after accounting for children’s reading skills at the start of kindergarten, African American ethnicity continued to affect the average rate of growth.

The race/ethnic gap indicated that the Black–White gap was typically as large as the SES gap and in many grades significant, which suggested that Black students trailed their White peers in mathematics and reading scores by and large.50

In each of these examples, attribution of the outcome focuses attention on the racialized group rather than on the failure of the intervention to support students membered into the racialized group—students membered Black “lagged behind” and “trailed,” as if they were unable to keep up; African American ethnicity “continued to affect,” as if membering into a group affected growth. Attribution of an outcome or effect to a racialized group is a key component of a deficit narrative, particularly when the outcome variable for the racialized group is lower than that of the dominant group or when a regression coefficient is negative for a racialized group. Attribution to a racialized group is problematic because it implies that the cause for disparate outcomes resides within the group rather than the broader social forces produced by systemic racism and other forms of oppression that impact the individual membered into that group.
Review of research published across ten educational research journals indicates that the discourse employed when presenting findings from quantitative analyses in which racialized identity is included as a variable produces deficit narratives with relatively high frequency. As an example, our analysis of articles published since 2008 found that 56% of sentences discussing findings for students membered Black or African American attributed outcomes to the racialized group rather than the intervention. Similarly, 59% of the articles contained at least one sentence that, if quoted directly, conveys a deficit narrative.51

The production of deficit narratives through the discourse of statistical analyses is an outgrowth of the racist beliefs held by the pioneers of the field—they understood race as a biological trait that was responsible for creating differences among racialized groups. While understanding of racialized identity has evolved considerably since these statistical methods were developed, the habits of discourse persist. As explored in greater detail in Chapters 11 and 12, the use of racialized identity in regression and similar analyses may sometimes be intended to reflect larger environmental factors that differ, on average, across racialized groups due to systemic racism. Yet, this simplistic representation of complex socio-cultural-historical-political productions both fails to reflect these environmental factors and muddles attribution by focusing discourse on the racialized group rather than on the contextual conditions. While critiquing industrial-organizational psychology’s focus on intelligence and racialized identity, Helms argues:

a first step in developing broader conceptual questions about race and intelligence would involve more complex analyses of environments than currently exists … [but] attention to racial environmental effects on the persons assessed by intelligence tests and, therefore, intelligence as assessed by such tests has not been a focus of psychologists.52

By substituting “educational achievement” for “intelligence” and “educational researchers” for “psychologists,” Helms’s observation applies to the field of educational measurement.

Opportunity Costs

The brief history presented at the beginning of this chapter covers only a small selection of contributions to modern statistics made during the late 19th and early 20th centuries. The history of statistics and narratives depicting the lives of the pioneers of modern statistics are rich, complex, and revealing. Spanning a period of roughly 50 years, Galton, Pearson, Fisher, and Spearman, among others, had a profound impact on statistical developments. Collectively, their work helped transform statistics from a descriptive to an inferential endeavor.
Their interests and application also expanded use of statistics from the physical world to the study of human traits and broader social issues. While each of these pioneers contributed important conceptual and mathematical advancements, their interest in statistical analytic techniques was driven by their pursuit of evidence supporting their eugenic beliefs.

As noted earlier, today critics of the use of statistical methods to examine issues of race and racism highlight Galton, Pearson, Fisher, and, to a lesser extent, Spearman’s focus on eugenics and their fixation on the heredity of intellectual and moral traits as an undue influence on the development and current use of statistics. There is no doubt that Galton’s obsession and strong desire to demonstrate the hereditability of intellectual and moral traits inspired his pursuit of statistical methods he could apply to use his data to support eugenic positions. Further, critics rightly connect Galton’s interest in the hereditability of intellectual and moral traits with his eugenic ideas. Clearly, these eugenic ideas are troubling and cast a dark shadow on Galton’s legacy. Pearson, Fisher, and Spearman similarly expressed troubling eugenic ideas. Their focus on heredity, innate individual traits, and racial (“race” being broadly defined at the time) differences also had an impact on the focus of statistical inquiry and the vocabulary of statistics.

The important role the term population plays in tests of statistical inference was explored earlier. The term homogeneity plays a similar role in statistics. In statistical terms, homogeneity of a sample is assumed for several statistical techniques. In colloquial terms, however, Pearson employed the term in a eugenics sense. When writing about colonial states comprised of two racialized groups, Pearson argued:

[t]he nation organized for the struggle must be a homogenous whole, not a mixture of superior and inferior races. For this reason every new land we colonize with white men is a source of strength; every land of coloured men we simply rule may be needful as a source of food and mineral wealth, but it is not an element of stability in our community, and must ever be regarded with grave anxiety by our statesmen.53

Reflecting on the use of the term homogeneous in the statistical sense, Clayton observes:

the word homogeneous, linking the purely statistical statement to the one from eugenics, had a particularly charged meaning, with connotations of racial purity and ethnic cleansing. Homogeneity in data and what it indicated about homogeneity of people had racial undertones from the start.54

In presenting Clayton’s observation, I am not suggesting the use of the term today is interpreted in the same manner as Pearson seems to have thought of the word. Rather, I suggest that, like the term population, its use influenced the way in which users of statistical methods thought about the groups they were interested in studying, which in turn focused attention on characteristics of the groups themselves rather than on the conditions that produced the groups in the first place, and which then influenced the formation of characteristics within each group. This focus on homogeneously conceived groups and their characteristics—absent deep analysis or attempts to represent contextual factors in analyses—persists and shapes the questions explored in educational measurement today.

The persistent focus on characteristics of socially constructed groups encouraged by the statistical methods creates an opportunity cost—rather than examining and developing methods to explore contextual impacts on the production of group differences, statistical techniques have been applied largely to examine differences among groups—whether those groups are formed by assignment to conditions, by racialized or other socially constructed categories, or both.

In his book Bernoulli’s Fallacy, Clayton explores a separate opportunity cost that he links to both the eugenic interests and the frequentist conception of statistics embraced by the pioneers of modern statistical methods. This cost focuses specifically on the use of a subset of statistical techniques founded on a frequentist, objective conception of probability theory. Clayton argues that the affinity for and development of methods based on a frequentist conception of probability came at the expense of developing methods based on a “subjective” Bayesian conception of probability estimation. In The Book of Why: The New Science of Cause and Effect, Judea Pearl, an award-winning computer scientist, makes a similar argument about how the frequentist, objective orientation held by Pearson and Fisher thwarted development of causal analytic methods such as path analysis.55 In the section that follows, I focus narrowly on Clayton’s argument. Taken together, though, Clayton and Pearl evidence how dominant ideology coupled with institutional power—both of which Pearson and Fisher controlled in the field of statistics for nearly a half-century—silenced alternate ideas and methods and delayed the arc of progress within the field of educational measurement and the social sciences more broadly.

Silencing Bayesian Statistics

Clayton argues that the frequentist conception of probability that undergirds the statistical methods developed by Pearson and Fisher, coupled with their reliance on objectivity to shield the influence their eugenic beliefs had on their interpretation of findings, produced disdain for a Bayesian orientation to statistics. There are three parts to Clayton’s analysis. The first focuses on Thomas Bayes, an English philosopher, minister, and mathematician, who developed an interest in probability during the mid-1700s. Unlike his contemporaries, who were focused largely on probabilities of events occurring given a known distribution of possible outcomes, Bayes developed an interest in estimating the confidence (or strength) of a belief or the occurrence of a future event based on current knowledge but absent a known distribution or reference class. Clayton describes a simple example of the type of question Bayes was interested in solving:

Your friend rolls a six-sided die and secretly records the outcome; this number becomes your target T. You then put on a blindfold and roll the same six-sided die over and over … your friend … tells you only whether the number you just rolled was greater than, equal to, or less than T. After some number of rolls, say 10, you must guess what the target was. What would be your strategy for guessing, and how confident would you be?56

At the time, methods for calculating probability allowed one to estimate the occurrence of an event given all possible outcomes—drawing three red cards and one blue card given that one-half of the cards are red and one-half are blue. Bayes’ question was the inverse: given that one has drawn three red cards and one blue card, what is the probability that the ratio of red to blue cards is one-to-one? Or, in Clayton’s example, given a sequence of greater than, equal to, or less than observations, what is the probability the target is a given number? To address this type of question, Bayes developed a theorem, known as Bayes’ Theorem, that used existing knowledge to estimate the probability of an unknown.

There are two important aspects of statistical applications of Bayes’ Theorem. First, estimation of the probability of an event makes use of prior information. In the case of the card distribution example, one could begin the estimation procedure with a prior assumption about the probable distribution—say we know that a factory tends to overproduce red cards, and thus the intended 1:1 distribution is off for a small percentage of card packs. In such a case, this prior information can be used to inform the initial estimate of the probable distribution of cards within the actual pack given a set of observed draws. A second important component is the ability to update that prior information based on new information.
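Clayton’s die-guessing example can be worked through numerically. The sketch below is illustrative only and is not drawn from Clayton’s text: it assumes a fair die, a uniform prior over the six possible targets, and an invented sequence of ten responses. The function names are hypothetical.

```python
from fractions import Fraction

FACES = range(1, 7)


def likelihood(response, target):
    """Probability of hearing `response` after one blindfolded roll of a fair
    die, if the secret target were `target`."""
    if response == "greater":
        return Fraction(6 - target, 6)
    if response == "less":
        return Fraction(target - 1, 6)
    if response == "equal":
        return Fraction(1, 6)
    raise ValueError(f"unknown response: {response!r}")


def update(belief, response):
    """One application of Bayes' Theorem: weight each candidate target by how
    well it explains the response, then renormalize."""
    weighted = {t: belief[t] * likelihood(response, t) for t in FACES}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("response is impossible under the current belief")
    return {t: w / total for t, w in weighted.items()}


def posterior_after(responses):
    """Start from a uniform prior (an assumption) and update once per response."""
    belief = {t: Fraction(1, 6) for t in FACES}
    for response in responses:
        belief = update(belief, response)
    return belief


# Hypothetical sequence of ten responses, invented for illustration.
responses = ["greater"] * 6 + ["less"] * 3 + ["equal"]
posterior = posterior_after(responses)
best_guess = max(posterior, key=posterior.get)
print("Best guess:", best_guess)
print("Confidence:", float(posterior[best_guess]))
```

With this invented sequence, targets 1 and 6 are ruled out as soon as a “less” and a “greater” response have both been heard, and most of the remaining belief falls on 2 and 3. Each response revises the belief left by the previous one, which is the updating step described above.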
In contrast, the frequentist approach relies solely on knowledge of a known or imagined distribution to estimate the probability that a given observation (or pattern of observations) would occur. Because the frequentist approach does not employ any information beyond what is known about the distribution and what is observed, it is termed objective probability. Because the Bayesian approach makes use of prior information provided by the investigator, it is referred to as subjective probability.

The second component of Clayton’s argument focuses on the methods developed by Pearson and later Fisher to test the statistical significance of observed outcomes. Their methods applied an objective frequentist approach to probability estimation. Fisher (and Pearson) aimed to develop “the concept of probability as an objective fact, verifiable by observations of frequency”57 such that “[t]he feeling induced by a test of significance has an objective basis in that the probability statement on which it is based is a fact communicable to, and verifiable by, other rational minds.”58 Beyond understanding probability as an observable fact, Fisher also understood probability as a fixed, measurable property of the system under study. This way of thinking was consistent with the positivist pursuit of universal truths that motivated the social sciences of Fisher’s time and was intended to support the formation of conclusions that appeared to be free of bias—a perception that greatly aided analyses conducted in pursuit of the eugenicist goals ingrained in Fisher’s and Pearson’s beliefs.59 This quest for objectivity led Pearson and Fisher to shun what they saw as the subjective nature of prior assumptions on which the Bayesian method capitalized.

As evidence of Pearson and Fisher’s disdain for the subjective Bayesian method, Clayton points to an interesting exchange that occurred between the two in 1917, in which Pearson critiques a maximum-likelihood method developed by Fisher for relying on what Pearson perceived as a Bayesian technique.60 In another context, Fisher was said to erupt into “a boiling cauldron of wrath” at the mere mention of the inverse probability logic upon which Bayes’ method was based.61

Disdain for Bayesian logic was a natural outgrowth of Pearson and Fisher’s frequentist conception of probability and statistics as the study of populations and sampling conducted within that population, a concept they extended to measures. As Clayton describes, any derived quantities such as averages of the sample could also be given probabilities by thinking of those as having been sampled from a theoretical infinite population of measurements: “The idea of a population is to be applied not only to living, or even to material, individuals.
If an observation, such as a simple measurement, be repeated indefinitely, the aggregate of the results is a population of measurements.”62 Conceiving of a measurement or a statistic based on a measurement as a sample from a population enabled Pearson and Fisher to extend the idea of variance within a population of people to that of a population of sample statistics—an extension that was necessary to apply the logic of frequentist probability to estimate error and evaluate the statistical significance of sample statistics.

A third aspect of Clayton’s argument centers on the influence eugenics had on the type of questions of interest to Galton, Pearson, Fisher, and other social scientists at the time. Given eugenic notions of differences among races and populations of people, questions regarding differences among people were really ones of whether groups of people are of the same population. Given the understanding that human traits—physical and mental—were normally distributed, observations of difference were reshaped as questions about the probability that the two groups would be formed within the same known distribution.

Together, eugenic interest in differences among groups and assumptions regarding the normal distribution of human traits yielded attraction to a frequentist conception of probability. Given the dominant voices Pearson and Fisher had at the time, Bayesian thought was effectively silenced. As an example, Clayton observes that Fisher’s text, Statistical Methods for Research Workers, which saw 14 editions published between 1925 and 1970, became such an industry standard that analyses that did not apply one of the statistical methods he detailed in his text had little chance for publication.63 Absent from those methods were Bayesian techniques. As a result, over the next 30 years, frequentist methods came to dominate practice. It was not until the 1950s and 1960s that Bayesian logic was resurrected and applied to address statistical problems. By then, however, the social sciences had firmly embraced the frequentist-based statistical methods introduced during the early 20th century. It is this dominance and silencing that produced an opportunity cost for the advancement of methods and, as we see next, produced what has become known as the replication crisis. Given the increasing ways in which Bayesian logic is being applied today to explore various social questions and advance measurement solutions, one wonders where the field might be otherwise if eugenic notions had not motivated the development of frequentist-based statistical methods.

The Replication Crisis

Since the 1960s, concern has grown about the use of frequentist methods of statistical analysis in the social sciences. There are two components to this concern. The first focuses on the illogic of the frequentist approach when making inferences about a hypothesis. The second raises concern about the (over)reliance on p-values to inform decisions about the acceptance or rejection of statistical hypotheses and the high frequency with which findings based on p-values fail to be replicated.

The logic of statistical hypothesis testing developed through a frequentist conception of probability begins by establishing a null hypothesis and an alternate hypothesis (which is typically the hypothesis in which the researcher is actually interested). The null hypothesis is then tested by estimating the probability that an outcome equal to or more extreme than what is observed would occur if the null hypothesis is true. In effect, this test first imagines a population distribution that would exist if the null hypothesis is true and then estimates the probability of an event given the population distribution. The p-value indicates the probability of observing the event, or of one even more extreme, given that population distribution. As an example, a null hypothesis might establish that there is no difference between the mean heights of two groups of people. The alternate hypothesis might speculate that the mean height does differ between the two groups.

To test the null hypothesis, a distribution of the height of all people in a homogeneous population is imagined. From this distribution, the probability of forming two groups whose difference in mean height equals or exceeds the difference observed is estimated. This estimated probability is defined as the p-value. A criterion is then applied to determine whether the p-value—that is, the probability of observing a mean difference equal to or larger than what was observed—is smaller than a predetermined threshold. For social science research, this probability threshold is typically set at .05, which indicates that, if the null hypothesis is true, such a difference or larger is expected to be observed only 5% of the time if the study is repeated over the long run. If the p-value is at or below the threshold, the null hypothesis is rejected, indicating that the probability of the occurrence, assuming the null hypothesis is true, is lower than we are willing to accept. In turn, the alternate hypothesis is deemed tenable.

The logic of p-value hypothesis testing has been repeatedly questioned since the 1940s. Of particular concern is the misalignment between what a researcher is actually interested in and what a p-value-based hypothesis test focuses on. In most cases, the researcher is most interested in making claims about a given theory or hypothesis that is represented by the alternate hypothesis. The claim they wish to make focuses on the probability that the theory is accurate or that the hypothesis holds in a given context. A p-value-based hypothesis test, however, does not provide any information about the alternate hypothesis beyond its possible tenability. Recall that a p-value is an estimate of the probability that an observed outcome (or one even more extreme) would occur if the null hypothesis holds. The p-value provides no information about the probability of the alternate hypothesis.
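The logic of the height example can be made concrete with a small simulation. The sketch below uses a permutation test, which is one way (not a method described by the sources cited here) to approximate the probability of forming two groups whose mean difference is at least as large as the one observed when all heights are treated as coming from a single pool. The height values, the function name, and the number of permutations are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often random regrouping of the pooled values yields a
    mean difference at least as large as the one observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_difference(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_difference(pooled)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # regroup as if group labels carried no information
        # Small tolerance guards against floating-point rounding in ties.
        if mean_difference(pooled) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_permutations


# Hypothetical heights in centimeters, invented for illustration only.
heights_a = [170, 172, 168, 175, 171, 169]
heights_b = [178, 181, 176, 180, 179, 177]
p_value = permutation_p_value(heights_a, heights_b)
print(f"Estimated p-value: {p_value:.4f}")
print("Reject the null hypothesis at .05:", p_value <= 0.05)
```

The value returned is a statement about the data under the null hypothesis. It does not give the probability that the alternate hypothesis is true.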
As William Rozeboom described, the heart of the problem is that p-value-based hypothesis testing is based on a frequentist conception of probability that is not equipped to provide information about the plausibility of the researcher’s hypothesis of interest:

[the scientist] is fundamentally and inescapably committed to an explicit concern with the problem of inverse probability. What he wants to know is how plausible are his hypotheses, and he is interested in the probability ascribed by a hypothesis to an observed experimental outcome only to the extent he is able to reason backwards to the likelihood of the hypothesis given this outcome.64

Drawing on the work of Paul Meehl, Stanislav Andreski, and Jacob Cohen, Clayton suggests that persistent reliance on p-value-based hypothesis tests is a product of both the long-lasting desire of social scientists to appear quantitatively objective and the power Pearson and Fisher held in maintaining frequentist-based statistics as the standard toolset for the social sciences.65 As John Campbell noted in 1982,

It is almost impossible to drag authors away from their p values, and the more zeros after the decimal point, the harder people cling to them … Perhaps p values are like mosquitos. They have an evolutionary niche somewhere and no amount of scratching, swatting, or spraying will dislodge them.66

Absent efforts to estimate the probability that the hypothesis of interest holds, p-value-based hypothesis testing has led to the publication of several studies of questionable repute. As one example, Clayton points to a study for which the p-value led the researcher to reject a null hypothesis regarding the absence of extrasensory perception (ESP) and instead accept that ESP is activated in college students when they are exposed to pornographic images. More broadly, several studies conducted in the early 21st century have failed to replicate findings for a substantial body of studies. When replication studies do obtain statistically significant findings, the magnitude of the effect is almost always notably smaller than in the original study. Clayton and the various authors referenced posit that reliance on frequentist-oriented, p-value-based hypothesis tests is primarily responsible for both acceptance of highly unlikely hypotheses (e.g., ESP activated by pornography) and the failure to replicate findings from previous studies. In turn, Clayton and others argue that inverse probability techniques enabled by Bayesian statistical methods would limit faulty acceptance of questionable hypotheses. Although the degree to which this argument holds is yet unknown, the more important point is that Pearson and Fisher’s strong embrace of frequentist statistical methods and their corresponding efforts to label Bayesian methods as subjective both directed the field toward p-value-based hypothesis testing and thwarted development and use of inverse probability-based inferences.
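The contrast between a p-value and the inverse probability that Rozeboom describes can be illustrated with Bayes’ Theorem. The sketch below is not taken from Clayton; the prior probabilities, the statistical power, and the function name are hypothetical, chosen only to show how a statistically significant result can leave an implausible hypothesis improbable.

```python
def probability_hypothesis_true(prior, power, alpha=0.05):
    """Probability that the alternate hypothesis is true given a significant
    result, computed with Bayes' Theorem.

    prior: assumed probability of the hypothesis before the study
    power: assumed probability of a significant result if the hypothesis is true
    alpha: probability of a significant result if the null hypothesis is true
    """
    if not (0 <= prior <= 1 and 0 < power <= 1 and 0 < alpha <= 1):
        raise ValueError("prior must be in [0, 1]; power and alpha in (0, 1]")
    true_positive = power * prior
    false_positive = alpha * (1 - prior)
    return true_positive / (true_positive + false_positive)


# Hypothetical inputs: one implausible hypothesis and one plausible hypothesis,
# each tested with the same assumed power and the same .05 threshold.
implausible = probability_hypothesis_true(prior=0.001, power=0.8)
plausible = probability_hypothesis_true(prior=0.5, power=0.8)
print(f"Implausible hypothesis after p <= .05: {implausible:.3f}")
print(f"Plausible hypothesis after p <= .05:   {plausible:.3f}")
```

Under these assumed inputs, the same significant result leaves the implausible hypothesis at under 2% probability while raising the plausible one to roughly 94%. The p-value alone cannot distinguish the two cases, because it carries no information about the prior.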
Taken together, the terminology, discourse, and orientation of the statistical methods commonly employed today remain strongly influenced by the frequentist-oriented methods developed to support the racist, ableist, classist, eugenicist agenda embraced by Galton, Pearson, Fisher, and Spearman. As I explore in Part III, reliance on these methods impedes the exploration of questions that go beyond examining individual and group differences and that instead aim to deepen understanding of the ways in which racialized and other forms of oppression and social structural arrangements contribute to the production of disparate outcomes. As a result, continued embrace of dominant statistical methods functions to support the reproduction of racialized deficit narratives while simultaneously inhibiting focus on the institutional, structural, and systemic causes of inequities in educational opportunities and outcomes.

Notes

1 Clayton (2021), p. 177.
2 Quoted in Madaus et al. (2009), p. 15, italics added.
3 Madaus et al. (2009), p. 15.
4 Mehlman on Frontline, April 12, 2005, https://www.pbs.org/wgbh/pages/frontline/shows/architect/interviews/mehlman.html
5 Zuberi (2001); Zuberi and Bonilla-Silva (2008); Helms (2012); Dixon-Román (2017).
6 Compared to Galton, Pearson, and Fisher, whose writings contain several racist, classist, and eugenicist statements, Spearman rarely addressed these topics in his publications, and it seems the strength with which he held these beliefs was weaker than that of his contemporaries; see Briggs (2022), p. 220. Note that Fisher also held strong anti-Semitic beliefs. Fisher applied his statistical methods in an effort to demonstrate the inferiority of people who practiced Judaism. During World War II, Fisher had close ties with German Nazi scientists. Following the end of the war, Fisher authored public statements intended to help improve the image of Otmar Freiherr von Verschuer, a Nazi geneticist who embraced the Nazi racial hygiene program; see Clayton (2021), p. 158.
7 Porter (1986); Stigler (1986).
8 Gould (1996); Zenderland (2001); Bulmer (2003).
9 Clayton (2021).
10 Cardano’s exact language is as follows: “So there is one general rule, namely, that we should consider the whole circuit, and the number of those casts which represents in how many ways the favorable result can occur, and compare that number to the rest of the circuit, and according to that proportion should the mutual wagers be laid so that one may contend on equal terms.” Cardano quoted in Gorroochurn (2012), p. 14.
11 Ore (1960); Devlin (2010).
12 Bernoulli quoted in Stigler (1986), p. 65.
13 Clayton (2021), p. 9, italics in the original.
14 Eknoyan (2008).
15 Porter (1986).
16 Porter (1986), pp. 52–53, quoting Quetelet (1835), pp. 21–22.
17 Clayton (2021), p. 113.
18 Porter (1986).
19 Clayton (2021).
20 Clayton (2021), pp. 131–132.
21 Galton (1875).
22 Galton (1870).
23 Stigler (1989), p. 74.
24 Stigler (1989); Galton (1889, 1890).
25 Porter (1986), p. 286.
26 By instructing his collaborators to assure growing conditions were similar for each batch of seeds they received, we again see Galton’s awareness that contextual factors have important influences on outcomes, whether they be the size of seeds produced by a plant or, as seen in Chapter 5, the mental ability of formerly enslaved people who were denied access to education and other social supports. Yet, despite this apparent awareness, Galton placed stock in the heredity of traits over the influence of social contextual factors.
27 Porter (1986), p. 287.
28 Pearson (1911), pp. 14–15.
29 Pearson (1896), pp. 254–255.
30 Pearson (1900).
31 Porter (2004).
32 Fisher (1922).
33 Fisher (1924), p. 114.
34 N.A. (1930).
35 Spearman (1914).
36 Spearman (1914), p. 228.
37 Jensen (1980); Herrnstein and Murray (1994).
38 Pearson quoted in Stigler (1986), p. 304.
39 Stigler (1989), p. 75, describes Galton’s conception of disturbances forming separate populations in this way: “But Galton’s use was at a deeper conceptual level—it showed how a normal population (the bottom level or a set of human heights) could be taken apart, dissected into smaller normal populations, each of which could be associated with another measurement (the index of the midlevel compartment or the height of a parent).”
40 Stigler (1986).
41 Stigler (1986), p. 311, italics in the original.
42 Pearson (1894).
43 Clayton (2021), p. 144.
44 Clayton (2021).
45 Helms (1992).
46 In Chapter 13, two examples of misleading results that occur when the multiplicity of identity is ignored are presented in detail. The first focuses on a mis-estimate of differences in physician referrals across racialized and gendered groups of patients (Schulman et al., 1999). The second focuses on under-identification of differential item functioning when the intersections of identity are overlooked (Russell et al., 2022).
47 Put simply, Type I error occurs when one concludes that an observed difference is statistically significant, yet, in reality, no difference actually exists.
48 Battle and Lewis (2002), p. 28.
49 Harper (2012); Russell et al. (2022).
50 Quoted anonymously in Russell et al. (2022), pp. 13–14.
51 Russell et al. (2022). Although Harper (2010) did not quantify the frequency with which deficit narratives are produced in research on higher education, his analyses found similar patterns of common occurrence.
52 Helms (2012), p. 177.
53 Pearson (1905), p. 50.
54 Clayton (2021), p. 145.
55 Pearl and Mackenzie (2018).
56 Clayton (2021), p. 36.
57 Fisher (1956), p. 25.
58 Fisher (1956), p. 43.
59 Clayton (2021).
60 See Clayton (2021), p. 151.
61 See Fred Hoyle quoted in Clayton (2021), p. 164. Also see Porter (2004), p. 256, in which Pearson is quoted poking fun at Bayesian logic, stating “‘Give me enough arbitrary constants and I will describe any past experience,’ cries the mathematician. ‘Possibly,’ replies drily the natural philosopher, ‘but will your description not only suffice to predict, will it predict correctly future experience?’” (italics in the original).
62 Clayton (2021), p. 166, quoting Fisher, Statistical Methods for Research Workers, pp. 2–3, italics in the original.
63 Clayton (2021), see p. 153.
64 Rozeboom (1960), p. 422, quoted in Clayton (2021), pp. 241–242.
65 Meehl (1967, 1990, 1992); Andreski (1972); Cohen (1994).
66 Campbell (1982), p. 698.

References

Andreski, S. (1972). Social Sciences as Sorcery. St. Martin’s Press.
Battle, J. & Lewis, M. (2002). The increasing significance of class: The relative effects of race and socioeconomic status on academic achievement. Journal of Poverty, 6(2), 21–35.
Briggs, D.C. (2022). Historical and Conceptual Foundations of Measurement in the Human Sciences: Credos and Controversies. Routledge.
Bulmer, M.G. (2003). Francis Galton: Pioneer of Heredity and Biometry. JHU Press.
Campbell, J.P. (Ed.). (1982). Editorial: Some remarks from the outgoing. Journal of Applied Psychology, 67(6), 691–700.
Clayton, A. (2021). Bernoulli’s Fallacy: Statistical Illogic and the Crisis of Modern Science. Columbia University Press.
Cohen, J. (1994). The earth is round. American Psychologist, 49, 997–1003.
Devlin, K. (2010). The Unfinished Game: Pascal, Fermat, and the Seventeenth-Century Letter that Made the World Modern. Basic Books.
Dixon-Román, E.J. (2017). Inheriting Possibility: Social Reproduction and Quantification in Education. University of Minnesota Press.
Eknoyan, G. (2008). Adolphe Quetelet (1796–1874)—The average man and indices of obesity. Nephrology Dialysis Transplantation, 23(1), 47–51.
Fisher, R.A. (1922). On the interpretation of χ² from contingency tables, and the calculation of p. Journal of the Royal Statistical Society, 85(1), 87–94.
Fisher, R.A. (1924). The elimination of mental defect. The Eugenics Review, 16(2), 114.
Fisher, R.A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd.
Galton, F. (1870). Hereditary Genius: An Inquiry into its Laws and Consequences. D. Appleton.
Galton, F. (1875). Statistics by intercomparison, with remarks on the law of frequency of error. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 49(322), 33–46.
Galton, F. (1889). Co-relations and their measurement, chiefly from anthropometric data. Proceedings of the Royal Society of London, 45(273–279), 135–145.
Galton, F. (1890). Kinship and correlation. The North American Review, 150(401), 419–431.
Gorroochurn, P. (2012). Classic Problems of Probability. John Wiley & Sons.
Gould, S.J. (1996). The Mismeasure of Man. WW Norton & Company.
Harper, S.R. (2012). Race without racism: How higher education researchers minimize racist institutional norms. The Review of Higher Education, 36(1), 9–29.
Helms, J.E. (1992). Why is there no study of cultural equivalence in standardized cognitive ability testing? American Psychologist, 47(9), 1083–1101.
Helms, J.E. (2012). A legacy of eugenics underlies racial-group comparisons in intelligence testing. Industrial and Organizational Psychology, 5(2), 176–179.
Jensen, A.R. (1980). Bias in Mental Testing. Free Press.
Madaus, G., Russell, M. & Higgins, J. (2009). The Paradoxes of High Stakes Testing: How They Affect Students, Their Parents, Teachers, Principals, Schools, and Society. Information Age Publishing.
Meehl, P.E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.
Meehl, P.E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195–244.
Meehl, P.E. (1992). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
N.A. (1930). Report of committee for legalizing eugenic sterilization. Postgraduate Medical Journal, 6(61), 13.
Ore, O. (1960). Pascal and the invention of probability theory. The American Mathematical Monthly, 67(5), 409–419.
Pearl, J. & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.
Pearson, K. (1894). Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London. A, 185, 71–110.
Pearson, K. (1896). VII. Mathematical contributions to the theory of evolution.—III. Regression, heredity, and panmixia. Philosophical Transactions of the Royal Society of London, 187, 253–318.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50(302), 157–175.
Pearson, K. (1905). National Life from the Standpoint of Science (Vol. 11). Cambridge University Press.
Pearson, K. (1911). The Scope and Importance to the State of the Science of National Eugenics. Dulau and Co.
Porter, T.M. (1986). The Rise of Statistical Thinking, 1820–1900. Princeton University Press.
Porter, T.M. (2004). Karl Pearson: The Scientific Life in a Statistical Age. Princeton University Press.
Quetelet, A. (1835). Sur l’homme et le développement de ses facultés, ou essai de physique sociale.
Russell, M., Oddleifson, C., Russell Kish, M. & Kaplan, L. (2022). Countering deficit narratives in quantitative educational research. Practical Assessment, Research, and Evaluation, 27(1), 14.
Schulman, K.A., Berlin, J.A., Harless, W., Kerner, J.F., Sistrunk, S., Gersh, B.J. & Escarce, J.J. (1999). The effect of race and sex on physicians’ recommendations for cardiac catheterization. New England Journal of Medicine, 340(8), 618–626.
Spearman, C. (1914). The heredity of abilities. The Eugenics Review, 6(3), 219.
Stigler, S.M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press.
Stigler, S.M. (1989). Francis Galton’s account of the invention of correlation. Statistical Science, 4(2), 73–79.
Zenderland, L. (1998/2001). Measuring Minds: Henry Herbert Goddard and the Origins of American Intelligence Testing. Cambridge University Press.
Zuberi, T. (2001). Thicker Than Blood: How Racial Statistics Lie. University of Minnesota Press.
Zuberi, T. & Bonilla-Silva, E. (2008). White Logic, White Methods: Racism and Methodology. Rowman & Littlefield Publishers.

9

Educational Measurement as Apparatus for Systemic Racism

Racist projects exist in a dense matrix, operating at varying scales, networked with each other in formally and informally organized ways, enveloping and penetrating contemporary social relations, institutions, identities, and experiences.1

Part I explored race, racism, and the White Racial Frame, setting a foundation for examining the White Racial Frame’s influence on early developments in educational measurement. In the first four chapters of Part II, we saw how hereditary beliefs about mental traits influenced conclusions reached in family studies and the interpretation of scores produced by tests of intelligence. We also saw how a positivist quantitative orientation influenced the drive to develop and administer tests measuring mental ability. This quantitative imperative also motivated the creation of statistical methods used today to inform the use of test scores and other outcome measures to examine the effects of educational interventions, policies, and programs. Together, understanding intelligence and achievement as an individual trait and a desire to reward individual merit motivated the development of college admission tests. Across each of these developments, belief in the supremacy of white norms and intellect influenced the content of instruments and the interpretations of test scores which were discoursed in ways that produced deficit narratives pathologizing people membered not-White. Integrating Foucault’s notions of power and oppression, Omi and Winant’s conception of racial projects, and the understanding of systemic racism developed in Part I, this chapter shifts focus to consider how the field of educational measurement currently functions as apparatus for systemic racism. Reflecting on the roles educational measurement plays in the system of racism positions us to explore ways in which alternate frames presented in Part III might be applied to educational measurement in ways that support an anti-racist endeavor. Actions the field can take in the near term to facilitate this shift are considered in the final chapter.

DOI: 10.4324/9781003228141-12

Connecting Racial Projects and Apparatus of Oppression

The analysis presented in this chapter is informed by a melding of Omi and Winant’s conception of racist and anti-racist projects with Foucault’s conception of apparatus of oppression. In their analysis of systemic racism, Omi and Winant prefix the adjectives racial, racist, and anti-racist to the term project, defining and elaborating all three types of projects. In doing so, they do not provide a formal definition of the term project. Yet, from the many examples of racial, racist, and anti-racist projects they present, I interpret the term project to mean any effort undertaken by an individual, group of individuals, organization, institution, or political body to advance ideas, policies, practices, regulations, or laws intended to inform and/or influence the functioning and evolution of a society. This definition encompasses a broad array of efforts engaged at all levels of a society. For an individual, a project can include statements made through letters to local newspapers or political representatives, digital posts or tweets spreading an idea about or critique of an aspect of society, a research study engaged to shed light on or to influence practice in a given social institution, or active engagement in a political rally, march, or movement. For organizations and institutions, a project can take the form of employment or promotion policies that alter the composition of its members and/or leadership, initiatives aimed at influencing practices and behaviors of its members, customers, or those it serves, and lobbying and advocacy efforts to influence legislative actions or governmental policies and regulations.
Similarly, projects include the priorities, policies, regulations, decisions, and laws introduced by agents of the government, whether they be members of the judicial system, legislative or executive branches, or the various organizations that provide local, state, or federal governmental functions.

For Omi and Winant, a project becomes racialized (aka a racial project) when it “shape[s] the ways in which social structures are racially signified and the ways that racial meanings are embedded in social structures.”2 In this way, whether intentionally or inadvertently, a racial project influences the meaning of racialized categorizations, the role racialization plays in social structures, and the impact social structures and institutions have on access to and distribution of resources across racialized groups. Thus, “a racial project is simultaneously an interpretation, representation, or explanation of racial identities and meanings, and an effort to organize and distribute resources (economic, political, cultural) along particular racial lines.”3 A project becomes racist when it functions to preserve or extend racialized meanings and/or disparities in the distribution of resources. In contrast, an anti-racist project functions as a challenge to existing racialized meanings and works to disrupt and/or rectify existing disparities between racialized groups.

240 The White Racial Frame and Educational Measurement Omi and Winant’s conception of racial projects aligns with Foucault’s analysis of power and oppression, which was presented in the Introduction to this book.4 For Foucault, power does not operate solely from the top down through a society. Instead, power operates at and through all levels of a society. Individuals exert power through their own actions, and they experience power through their interactions with other individuals. Organizations and institutions similarly exert power through their policies, practices, and actions, and they experience power through the actions of other organizations and institutions. Government (aka the State), similarly exerts power through priorities, policies, regulations, legal decisions, and laws, and it experiences power through the individual and collective actions of the people, organizations, and institutions it serves. At all levels of a society, the exercise of power can advance the interest of individuals, organizations, institutions, and government agencies. But, exertions of power can also produce repression and/or oppression. When the exercise of power at any level within society interacts with racialized meanings, structures, and/or disparities, Foucault’s conception of power and oppression dovetails with Omi and Winant’s notion of racial projects. Whereas Omi and Winant apply the term project to capture the initiatives and actions undertaken by individuals, organizations, institutions, and governmental bodies to inform and influence the functioning and evolution of a society, Foucault applies the term apparatus. In this way, I understand a racist project to be a form of apparatus that is adopted by and integrated into the system of racism to maintain or advance racialized ideology and advantage produced for dominant members of society through the racialized oppression of nondominant members. 
In contrast, an anti-racist project is apparatus that resists and/or works to undo racialized ideology and oppression. With this connection between racial project and apparatus in mind, Foucault’s concept of apparatus can be interpreted as “projects” that become absorbed by a system of oppression because their operation produces oppressive effects that help sustain the system itself. It is important to note that in a modern society, the system is not controlled by any one person or group of people, nor is it a product of a “master plan.”5 Rather, an oppressive system is developed, refined, and sustained over time by those individuals and institutions that benefit from the system through the absorption and ongoing support of projects that maintain and/or further those benefits. As reflected in the quote opening this chapter, Foucault’s conception is consistent with Omi and Winant, who see a large number of projects operating in ways that “converge and conflict, accumulate and interact with one another” to recreate and/or refine narratives, actions, and outcomes that produce disparate effects across racialized groups.6 The manner in which power and the production of oppression is distributed throughout all levels of a social system is reflected in the model of

systemic racism presented in Chapter 3. In this model, racial oppression operates at the level of individuals and institutions and through governmental laws and regulations (both current and historical). Individual actions, institutional policies and practices, and laws and regulations that produce racialized oppression are tolerated, embraced, or otherwise allowed to persist as apparatus for the system. Through the production of oppression, these apparatus function to provide economic, social, and political advantage for the various individuals and institutions that execute their power in ways that help sustain the system. In this system, projects initiated by individuals, organizations, institutions, or governmental agencies that can be applied, or that interact with other apparatus in the system, to sustain or extend racialized oppression and advantage are absorbed into the system, regardless of the project’s original intent. As three examples outside of educational measurement, consider regulations that prohibit voting by people who are incarcerated or have been convicted of a specific class of crimes, requirements to provide identification when voting, and practices that prohibit collecting information on employment applications about prior criminal convictions. In each of these cases, one could present a rationale that the regulation or practice itself is introduced to further deter undesired criminal behavior, further improve the integrity of election results, and decrease racial discrimination, respectively.7 Yet, when each of these regulations or practices interacts with other apparatus in the system, racialized disparities are produced.
In the first example, racialized disparities in arrests, conviction rates, and sentencing interact with voting restrictions to produce racialized disparities in voter eligibility.8 In the second example, voter identification requirements interact with historical segregation of communities and disparities in income and wealth along racialized lines, which in turn contribute to differences in the need for driver’s licenses, as well as access to centers that distribute accepted forms of identification. Collectively, the interactions between these historical and current productions create disparities across racialized groups in who currently holds a valid form of identification and who does not.9 Finally, in the third example, a practice implemented to explicitly reduce employment discrimination produced by racialized disparities in the criminal justice system in turn interacts with forms of implicit bias to inadvertently increase employment disparities for qualified candidates membered Black.10 Again, in the first two of these examples, the regulation or practice is not explicitly designed to produce racialized disparities. In the third example, reducing existing disparities is an explicit aim of the policy. Yet, in all three examples, interactions with other apparatus in the system result in furthering the production of racialized disparities. Since these regulations and practices do not disrupt the advantage produced for the individuals and institutions that gain benefit from systemic racism, their absorption into the system is

permitted. It is in this way that practices and productions of educational measurement are absorbed and function as apparatus for systemic racism.

Educational Measurement and Systemic Racism

When examining power and oppression, Foucault employs an “ascending analysis” that begins at the level of the individual and builds up to the family, community, institutions, and eventually to the State.11 The analysis of educational measurement and systemic racism presented here follows a similar approach, beginning first by examining two ways in which analyses of educational outcomes conducted by individuals and groups of individuals function as apparatus for systemic racism. Next, two practices typically employed by organizations that produce educational tests are considered. The final analysis examines uses of educational tests by other institutions and the State that (unintentionally) function as apparatus that reproduce educational inequities and economic disparities. Several of these examples draw on content presented in other sections of this book. In such cases, I provide a brief summary of that content and point to sections in which additional details are presented. Before beginning this analysis, it is useful to revisit my conception of the field of educational measurement. I employ a broad definition of educational measurement that includes the design, development, psychometric analysis, and application of instruments to collect evidence of cognitive, affective, and/or psychological constructs valued by our educational system, as well as the use of scores from these instruments to deepen understanding of the effects of educational interventions, policies, and practices.
Under this broad definition, the field of educational measurement includes specialists who develop instruments, specialists who design and implement test-based accountability systems, and researchers who employ scores provided by instruments to examine factors that impact educational outcomes. When considering each of the examples, it is also useful to keep in mind that those in the field who developed or engage in the practices described may not intend to serve systemic racism. Nonetheless, these functions have been and continue to be absorbed and/or applied by people and institutions operating outside of educational measurement in ways that support and sustain systemic racism.

Individual Productions as Apparatus for Systemic Racism

Three components of the White Racial Frame play a central role in shaping the questions asked, interpretations made, and discourse employed by individuals and organizations who engage in research focused on educational outcomes. First, the White Racial Frame promotes race as an individual

trait rather than as a social construction. Second, the White Racial Frame centers on practices and experiences of students membered White as the norm and treats cultural beliefs and behaviors associated with groups membered not-White as pathologies. Third, the White Racial Frame directs attention to race and racialized pathologies as a main driver of disparities in opportunities and outcomes, distracting attention from the social, political, and economic structures (past and present) that are the primary cause of these disparities. Collectively, these components of the White Racial Frame influence both the discourse produced when interpreting findings from analyses focused on educational outcomes, and the variables considered in such analyses. In turn, findings from these analyses are employed to provide quantitative evidence that supports narratives which sustain these components of the White Racial Frame. As examined in Chapter 8 and elaborated further in Chapter 11, treating race as an individual trait is reflected in the statistical models employed by analysts to examine educational outcomes. For example, racialized identity is often included as a covariate or predictor in regression models.
As Paul Spector and Michael Brannick observe, and as recent analyses of peer-reviewed quantitative studies of educational outcomes document, (too) many authors describe the coefficient for a racialized demographic variable as “the effect of race,” implying that one’s racialized identity has a causal influence on an outcome variable.12 Being a social construct that serves to segregate advantage and disadvantage along racialized lines within our society, racial identity does not impact educational outcomes; rather, it is the advantage provided to people membered White and the oppression experienced by people membered not-White that has a causal influence.13 Nonetheless, discourse that presents racialized identity as causal functions as apparatus that reinforces the White Racial Frame’s conception of race as a trait inherent to the individual. This discourse also serves as apparatus that provides quantified backing to the White Racial Frame’s notion that race and racialized pathologies are a main driver for disparities in educational outcomes. In this way, the White Racial Frame influences the interpretation given to variables included in statistical analyses of educational outcomes, and in turn, core ideas promulgated by the White Racial Frame are reinforced through the interpretation and discourse produced through those analyses. When analyses of educational outcomes do consider factors beyond racialized identity as potential causal variables, centering on practices and experiences of students membered White as the norm also contributes to the production of deficit narratives about people membered not-White. For example, research on the development of early reading ability has focused on a variety of home factors, including the relationship between the number of books in the home, the variety of vocabulary spoken to children, the

amount of time parents spend reading to their children, the education level of parents, and other parenting practices.14 Research consistently finds that practices and conditions found more often in homes of students membered White correlate with higher early reading outcomes. Recommendations based on this research suggest that adopting the “effective” practices will improve reading skills for all children. Yet, as examined in Chapters 11 and 12, such analyses and recommendations overlook additional contextual factors impacted by the racialized structuring of society that influence home practices. In particular, these studies tend to ignore the social-historical factors that differentially impact families and, in turn, differentially affect the ease and feasibility (and in some cases desirability) of adopting the recommended practices. Focusing narrowly on household practices absent consideration of the social-historical factors centers the structures and supports that flow from the advantages of oppression for members of the dominant group and establishes these advantaged settings as the achievable and expected norm. In turn, discourse produced by such studies again provides backing to core components of the White Racial Frame that maintain White norms as superior and attribute disparate outcomes to racialized pathologies. In presenting this analysis, it is important to acknowledge that some studies employ “context surveys” to collect contextual information. As an example, both the National Assessment of Educational Progress (NAEP) and the Trends in International Mathematics and Science Study (TIMSS) administer a context questionnaire in addition to subject-area achievement tests. The context questionnaires are designed to collect information about each student’s home context, school context, and classroom contexts as well as their attitudes toward learning.
While these sets of contextual information are useful for exploring the relationship between local context factors and student achievement, they do not provide insight into the impacts larger social structures play in shaping the local context. As a result, interpretations of context factors tend to focus on advantages and disadvantages produced by a student’s home and school environment, absent insight into the structures that contributed to the production of those advantages and disadvantages. In turn, discourse on the positive and negative relationships between these structurally decontextualized local factors focuses attention on students, households, and schools as prime contributors to achievement. And, when a given student, household, or school factor is negatively associated with achievement, students, households, or schools are pathologized as if the cause for low achievement resides with themselves. This focus on students, households, and schools as the cause of disparities in outcomes functions as apparatus for systemic racism by distracting attention from the structural elements that are a larger contributor to disparate outcomes.

Test Development Practices as Apparatus for Systemic Racism

As discussed in Chapter 7, the presence of bias was a considerable issue for tests of mental ability developed during the early 20th century. As the field of educational measurement matured, concerns about bias have stimulated efforts to reduce test bias. Test developers now apply item authoring, bias and sensitivity, and accessibility guidelines, all of which are designed to minimize bias in the content, presentation, and response modes of test items. The Standards for Educational and Psychological Testing reference several practices and procedures test developers should take to minimize bias. And a variety of statistical methods have been developed and routinely used to detect potential bias in test items. Collectively, these developments have reduced the presence of bias in tests developed today. Yet, despite these advances, concerns about test bias continue to limit trust in educational tests, particularly for members of nondominant racialized groups. In part, this mistrust is a product of what Steve Sireci described as a “dominant White culture that has permeated our field.”15 The overrepresentation of people membered White and the underrepresentation of people membered Black, Latine, and Indigenous in the field of educational measurement likely also contributes to this mistrust.16 But beyond this mistrust, I posit that this unbalanced representation and the influence of the White Racial Frame function together to unintentionally reduce the effectiveness of bias detection methods and overrepresent content that reflects culture most familiar to test takers membered White. In turn, these unintentional productions continue to yield test scores that underrepresent the performance of students membered into nondominant racialized groups. As detailed in Chapter 13, differential item functioning (DIF) analyses are commonly used to examine potential bias in a test item. 
DIF analyses are typically performed to examine potential bias associated with racialized group, gendered identity, socioeconomic status, English Language Learner status, and special education status. Each analysis examines whether an item functions similarly for two groups, termed a reference group and a focal group. In most cases, the group of test takers who are most advantaged in society serves as the reference group, and each additional group of test takers forms a focal group. As an example, when examining potential racialized bias, test takers membered White typically form the reference group, and test takers membered Black form a focal group. An item is said to function similarly if the probability of responding correctly to the item is the same for each group of test takers when conditioned on estimated ability level. When an item is identified or flagged for performing differently between the reference and focal groups, the item is subject to review by a panel of experts. The review process attempts to identify features of the item that might create bias for the group of interest.

When a cause is identified, the item is either modified or removed. When no cause is identified, the item is typically retained. The treatment of race as a discrete individual trait is a core component of the White Racial Frame. Essentializing traits such as race and gender as discrete traits inherent in an individual leads DIF analyses to focus consideration of bias separately by race, gender, and other variables of interest. As the exploration of Intersectionality Theory in Chapter 12 explicates, power and oppression do not operate independently by race and gender (and other socially constructed categories of identity). Rather, power and oppression operate and are experienced in unique ways at the intersection of social identity constructs. As an example, advantage experienced by males membered White from high-wealth households is typically greater than that of males membered White from low-wealth households or males membered Black from either high- or low-wealth households. Yet, focusing a DIF analysis on only gender treats the advantage experienced by males as similar regardless of their racialized membering or economic status. As a result, focusing DIF analyses separately by gender or race may misrepresent the potential bias experienced by each intersectional grouping. Recent analyses that compare the detection of DIF under the traditional approach to focal and reference group formation with an intersectional approach provide evidence that the traditional approach underestimates the number of items for which potential bias occurs.17 In this way, DIF analyses conducted through a lens that treats socially constructed categories of identity as discrete may reduce the detection of bias. Despite the intent to decrease disparities in test scores produced by bias, reduced detection leads DIF analyses (as currently conducted) to function as apparatus for systemic racism.
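The contrast between single-variable and intersectional group formation can be sketched with the Mantel-Haenszel procedure, one widely used DIF statistic. The sketch below is a minimal illustration under stated assumptions and is not the method of the study cited above: the response counts are invented toy data, and the field names (race_group, gender_group, score, correct) and both helper functions are hypothetical.

```python
from collections import defaultdict


def mh_odds_ratio(records, is_reference, is_focal):
    """Mantel-Haenszel common odds ratio for a single item.

    Test takers are stratified by 'score' (the matching variable that
    stands in for estimated ability). A value near 1 means the item
    functions similarly for the two groups; a value above 1 means the
    item favors the reference group.
    """
    # Per score level: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for r in records:
        if is_reference(r):
            idx = 0 if r["correct"] else 1
        elif is_focal(r):
            idx = 2 if r["correct"] else 3
        else:
            continue  # test taker belongs to neither group in this comparison
        strata[r["score"]][idx] += 1
    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den if den else float("nan")


def cell(race_group, gender_group, score, n_correct, n_incorrect):
    """Build toy response records for one subgroup at one score level."""
    base = {"race_group": race_group, "gender_group": gender_group, "score": score}
    return [dict(base, correct=1)] * n_correct + [dict(base, correct=0)] * n_incorrect


# Invented counts (hypothetical): only the subgroup that is focal on BOTH
# variables answers this item correctly less often at a given score level.
records = (
    cell("reference", "reference", 1, 8, 2) + cell("reference", "focal", 1, 8, 2)
    + cell("focal", "reference", 1, 8, 2) + cell("focal", "focal", 1, 4, 6)
    + cell("reference", "reference", 2, 9, 1) + cell("reference", "focal", 2, 9, 1)
    + cell("focal", "reference", 2, 9, 1) + cell("focal", "focal", 2, 6, 4)
)

# Traditional grouping: compare on the racialized variable alone.
race_only = mh_odds_ratio(
    records,
    lambda r: r["race_group"] == "reference",
    lambda r: r["race_group"] == "focal",
)

# Intersectional grouping: compare the most advantaged subgroup with the
# subgroup that is focal on both variables.
intersectional = mh_odds_ratio(
    records,
    lambda r: r["race_group"] == "reference" and r["gender_group"] == "reference",
    lambda r: r["race_group"] == "focal" and r["gender_group"] == "focal",
)

print(round(race_only, 2), round(intersectional, 2))  # about 2.79 and 6.0
```

In this toy data the single-variable comparison averages over a subgroup for which the item functions no differently, so it yields a smaller odds ratio than the intersectional comparison. Whether an item is flagged depends on the thresholds a testing program adopts, which this sketch does not model.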
In addition to performing DIF analyses, test developers apply item authoring and review guidelines during the item development process to reduce potential bias. Yet, the White Racial Frame’s centering of culture most familiar to people membered White influences these guidelines in ways that allow the resulting tests to function as apparatus for systemic racism. As examined in greater detail in Chapter 13, Jennifer Randall argues that bias and sensitivity guidelines are authored from the perspective of whiteness and as a result promote practices that “implicitly (and perhaps unintentionally), further marginalizes our most marginalized students.”18 As just one example, Randall points to guidelines that discourage the use of the term “junk food” in test items and notes that for many students whose households have been marginalized by society, what the White dominant culture defines as “junk food” is the processed food that is most economically affordable and readily available in the food deserts that have developed in some neighborhoods segregated by historical racist policies. Excluding everyday experiences for a subgroup of test takers acts to erase their experiences and requires engagement with contexts with which they may be less familiar.19

Randall makes a similar argument with respect to the institutional practice of developing test content that is culturally or “color-neutral.”20 She argues that, when a test item includes content designed to place a problem in context, there is no such thing as a cultural/color-neutral context. Further, when test developers attempt to produce culturally neutral content, by default they draw on content believed to be familiar to all students. This “familiar” content, however, derives from media (television, academic learning materials, news broadcasts, political speeches, etc.) that are dominated by whiteness and tend to reflect narratives developed by the dominant group within society. As a result, culturally neutral content creates contexts that are most familiar to students membered White. Collectively, efforts to produce “neutral” content and use of sensitivity and bias review guidelines that inform the production of “neutral” content promote a white frame that functions as apparatus that creates disparate impacts for test takers membered not-White. In both examples, procedures have been put into place to minimize bias in test scores. Yet, components of the White Racial Frame that essentialize race (and other socially constructed categories of identity) and center cultural experiences most familiar to people membered White lead these procedures to inadvertently function as apparatus for systemic racism.

Admission, Graduation, and Scholarship Decisions as Apparatus for Systemic Racism

Some tests developed by the educational measurement community are employed by institutions and state agencies to inform high-stakes decisions for students. As examples, many colleges and universities use the SAT and ACT to inform admissions decisions, some public and private high schools similarly use test scores to inform admissions decisions, some state education agencies use test scores as one criterion for graduation, and test scores are used to inform scholarship decisions. Each of these uses of test scores reflects a desire for quantitative objectivity and an embrace of individual merit promoted by the White Racial Frame. As seen in Chapter 7, the use of test scores to inform college admission decisions was intended to increase the (economic) diversity of students attending elite universities. Nonetheless, each of these uses of test scores interacts with structural racism in ways that reproduce a broad set of disparate outcomes. To visualize how this use of educational tests contributes to this reproduction, Figure 9.1 depicts a simplified representation of a dense matrix of apparatus that connect structurally to produce disparate outcomes.21 This depiction begins with the redline housing and financial policies of the mid-20th century that recreated communities segregated along racialized lines. In turn, segregation of residential communities produced inequities in wealth,


Figure 9.1 Admission Testing Role in Simple Model of a System of Racism.

health, employment opportunities, tax bases, policing policies and practices, and school quality. These many disparities intersect to contribute to disparities in opportunities to learn, which in turn produce disparities in state achievement and admission test scores. Use of these scores to inform admission, graduation, and scholarship decisions then contributes to disparities in access to educational opportunities. And disparities in educational opportunities impact employment opportunities, which creates disparities in socioeconomic mobility. This series of productions cycles back to communities and reproduces the disparities intended when communities were intentionally segregated by historical redline policies. Although the tests themselves are of high technical quality and the disparities in test scores are a reflection of inequities that exist within the public educational system, these uses of test scores by other institutions and state agencies nonetheless function as apparatus that preserves inequities produced by previous policies and practices. When connecting the use of test scores to inform graduation, admission, and scholarship decisions, it is important to recognize that test developers

typically serve as subcontractors for decision-makers and that the decisions themselves are not made by the educational measurement community.22 Further, the primary responsibility for disparate impacts of decisions resides with the decision-makers and not the test developer.23 And, while the test scores employed for these decisions reflect inequities in the educational system produced by systemic racism, increased reliance on other sources of information, such as essays and Advanced Placement Program (AP) coursework, would likely also reflect inequities in educational opportunities. Nonetheless, the tests are designed to reflect differences in academic achievement/college readiness, differences that systemic racism manufactures among racialized groups. Further, it is understood by both the decision-makers and test developers that the test scores are intended to be used to inform graduation, admissions, and scholarship decisions, which in turn impact socioeconomic mobility. By operating within this larger system, a system that produces and preserves racialized disparities, graduation and admission testing function as apparatus for systemic racism. As this analysis reveals, there are several ways in which the products of educational measurement function as apparatus that support the system of racism that operates in the United States today. This functioning as apparatus for systemic racism occurs absent conscious intent by the educational research community. However, this unconscious operation is an intended function of the White Racial Frame. The White Racial Frame directs attention to issues, ideas, priorities, and methods that help justify and sustain the system of racism. At the same time, the White Racial Frame obscures focus on those factors (contexts produced through racialized advantage and oppression) that are the primary drivers of disparate outcomes among racialized groups.
Recognizing the many ways in which educational measurement has been and continues to be influenced by the White Racial Frame is a necessary step in reckoning with the roles that educational measurement plays in this system of racism. Part III explores alternate frames which hold promise to transform educational measurement from its current operation as apparatus for systemic racism to that which supports an anti-racist endeavor.

Notes

1 Omi and Winant (2015), p. 128.
2 Omi and Winant (2015), p. 125.
3 Omi and Winant (2015), p. 125. The full sentence was italicized in the original.
4 Foucault (1980).
5 Foucault (1980); Omi and Winant (2015).
6 Omi and Winant (2015), p. 128.
7 I recognize that one could also argue that these rationales are presented to mask the intended racialized disparities each of these practices produces.
8 Alexander (2012).
9 Barreto et al. (2019); Fraga and Miller (2022); Grimmer and Yoder (2022); Highton (2017).
10 Bertrand and Mullainathan (2004); Doleac and Hansen (2020).
11 Foucault (1980, 1994).
12 Spector and Brannick (2011); Russell et al. (2022a).
13 Holland (2008); Zuberi and Bonilla-Silva (2008).
14 Pace et al. (2017); Stephenson et al. (2008); Yeo et al. (2014).
15 Sireci (2021), p. 4.
16 Women in Measurement (2021); Randall et al. (2021).
17 Russell et al. (2022b).
18 Randall (in press), p. 11.
19 Randall (in press).
20 Randall (in press).
21 Omi and Winant (2015).
22 Albano (2021).
23 Geisinger (2021).

References

Albano, A.D. (2021). Commentary: Social responsibility in college admissions requires a reimagining of standardized testing. Educational Measurement: Issues and Practice, 40(4), 49–52.
Alexander, M. (2012). The New Jim Crow: Mass Incarceration in the Age of Colorblindness. The New Press.
Barreto, M.A., Nuño, S., Sanchez, G.R. & Walker, H.L. (2019). The racial implications of voter identification laws in America. American Politics Research, 47(2), 238–249.
Bertrand, M. & Mullainathan, S. (2004). Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review, 94(4), 991–1013.
Doleac, J.L. & Hansen, B. (2020). The unintended consequences of “ban the box”: Statistical discrimination and employment outcomes when criminal histories are hidden. Journal of Labor Economics, 38(2), 321–374.
Foucault, M. (1980). Power/Knowledge: Selected Interviews & Other Writings 1972–1977. Pantheon Books.
Foucault, M. (1994). Power. Edited by J.D. Faubion. The New Press.
Fraga, B.L. & Miller, M.G. (2022). Who does voter ID keep from voting? Journal of Politics, 84, 1091–1105.
Geisinger, K.F. (2021). Commentary: Social responsibility, fairness, and college admissions tests. Educational Measurement: Issues and Practice, 40(4), 57–60.
Grimmer, J. & Yoder, J. (2022). The durable differential deterrent effects of strict photo identification laws. Political Science Research Methods, 10, 453–469.
Highton, B. (2017). Voter identification laws and turnout in the United States. Annual Review of Political Science, 20, 149–167.
Holland, P.W. (2008). Causation and race. In White Logic, White Methods: Racism and Methodology. Rowman & Littlefield.
Omi, M. & Winant, H. (2015). Racial Formation in the United States. Routledge.
Pace, A., Luo, R., Hirsh-Pasek, K. & Golinkoff, R.M. (2017). Identifying pathways between socioeconomic status and language development. Annual Review of Linguistics, 3, 285–308.
Randall, J. (in press). It ain’t near ‘bout fair: Re-envisioning the bias and sensitivity review process from a justice-oriented antiracist perspective. Educational Assessment.
Randall, J., Rios, J.A. & Jung, H.J. (2021). A longitudinal analysis of doctoral graduate supply in the educational measurement field. Educational Measurement: Issues and Practice, 40(1), 59–68.
Russell, M., Oddleifson, C., Russell Kish, M. & Kaplan, L. (2022a). Countering deficit narratives in quantitative educational research. Practical Assessment, Research, and Evaluation, 27(1), 14.
Russell, M., Szendey, O. & Li, Z. (2022b). An intersectional approach to DIF: Comparing outcomes across methods. Educational Assessment, 27(2), 115–135.
Sireci, S.G. (2021). NCME presidential address 2020: Valuing educational measurement. Educational Measurement: Issues and Practice, 40(1), 7–16.
Spector, P.E. & Brannick, M.T. (2011). Methodological urban legends: The misuse of statistical control variables. Organizational Research Methods, 14(2), 287–305.
Stephenson, K.A., Parrila, R.K., Georgiou, G.K. & Kirby, J.R. (2008). Effects of home literacy, parents’ beliefs, and children’s task-focused behavior on emergent literacy and word reading skills. Scientific Studies of Reading, 12(1), 24–50.
Women in Measurement. (2021). 2021: A year in review. Accessed at https://www.womeninmeasurement.org/assets/files/WIM-Annual-Report2021.pdf
Yeo, L.S., Ong, W.W. & Ng, C.M. (2014). The home literacy environment and preschool children’s reading skills and interest. Early Education and Development, 25(6), 791–814.
Zuberi, T. & Bonilla-Silva, E. (2008). White Logic, White Methods: Racism and Methodology. Rowman & Littlefield Publishers.

Part III

Alternate Lenses for Educational Measurement

New perspectives, new theories, and new empirical information all can enable us to see how things can be different from the ways they first present themselves to us, and how things even could be different from the ways they are.1

The White Racial Frame is hegemonic in U.S. society and other regions of the world.2 At its core, the White Racial Frame understands race as a natural trait inherent to each individual. The White Racial Frame also establishes a hierarchy among racialized categorizations, placing those membered White at the top. Over time, this white supremacist worldview was wrapped with notions about the biological heredity of physical, mental, and psychological traits, universal laws of nature and society, the scientific quantification of those laws, and a utilitarian view of justice. Racialized narratives developed through this White Racial Frame construct White (Anglo-Saxon) European culture, discourse, and behaviors as supreme. At the same time, these narratives pathologize people membered not-White, attributing disparate outcomes to the racialized people themselves. In turn, these narratives function to distract attention from social institutions and structures that function as apparatus for systemic racism. Created, advanced, and applied within societies operated by systemic racism, educational measurement was and continues to be similarly influenced by the White Racial Frame. Early conceptions of mental abilities were influenced by a hereditary conception of mental traits and the role these traits play in shaping lived experience. Belief in the hierarchical ordering of racialized groups distracted early pioneers in educational measurement from the bias in test instruments and instead led them to interpret differences in test scores as further evidence that supports racialized narratives. Statistical methods were developed to provide objective, quantitative tools that were applied to examine individual and group differences. Treating race as an individual trait, interpretations informed by statistical analyses often present

DOI: 10.4324/9781003228141-13

race as a causal variable and, in turn, contribute further to the production of racialized narratives. Finally, the embrace of individual merit and a utilitarian conception of justice influenced the development and eventual widespread adoption of admission tests. Undervaluing the influence social structures have on test performance, interpretations of test scores as individual productions now function as apparatus that contributes to the disparate outcomes produced by systemic racism. Over the past century-plus, several critical theories have been introduced that challenge components of the White Racial Frame. Part III examines several of these theories and considers how the adoption of each critical theory might influence practices within educational measurement as I have broadly defined it. Chapter 10 examines Critical Theory. Critical Theory encourages us to examine dominant ideology and to consider the ways in which it is constructed and operates to sustain current power arrangements. Critical Theory also asks us to understand history as a human production and to place current structures and power arrangements within a historical context. Embracing historicity, Critical Theory challenges positivist attempts to discover universal social laws. Finally, Critical Theory encourages social scientists to reflect on the influences dominant ideology has on their work and to consider how their work might be applied to challenge false or otherwise problematic aspects of that ideology. Chapter 11 explores Critical Race Theory and Quantitative Critical Race Theory (QuantCrit), both of which acknowledge that racism has functioned and continues to function in our society to provide advantage for the dominant racialized group through the oppression of nondominant racialized groups. These theories treat racialized categories as social constructions developed and refined to support systemic racism. 
These theories also recognize the role narratives play in both sustaining the racialized ideology that is core to the White Racial Frame and in challenging that ideology. Tenets of Critical Race Theory and QuantCrit encourage educational measurement specialists to rethink their use of race and other socially constructed identity categories in statistical analyses. In addition, these theories ask us to develop new measures that better represent experiences with oppression and to use these variables in analyses that challenge racialized narratives. Chapter 12 focuses on Intersectionality Theory and the ways in which it challenges dominant conceptions of social constructs. Rather than understanding racialized categories and categories such as gender, socioeconomic status, sexuality, and ableness as discrete single-axis identity categories, Intersectionality Theory directs attention to the unique experiences produced through the intersection of these social constructions. In this way, Intersectionality Theory challenges educational measurement specialists to again reconsider the ways in which variables representing identity are conceptualized.

In addition, Intersectionality Theory highlights opportunities to develop new methods of analysis that better align with modern social theories. Finally, Chapter 13 examines Justice as Fairness and a Rectificatory conception of justice and considers how these theories of justice challenge the utilitarian structure of our society. These conceptions of justice seek to gain benefit for all, particularly those most harmed in the past. When applied to educational measurement, these alternate frames hold potential to modify practices and applications of educational measurement in ways that allow the institution to serve as apparatus for an anti-racist agenda that challenges the current system of racism. As the quote by Calhoun that opens Part III encourages, the frames explored in this Part offer new perspectives, theories, and empirical information that provide opportunities for the field of educational measurement both to reckon with racism as it has operated and continues to operate, and to embrace its altruistic goals by shifting practices to center on those who have been harmed by both past practices and current uses of scores our instruments produce.

Notes
1 Calhoun (1995), p. 2.
2 Bracey et al. (2017), p. 59.

References
Bracey, G., Chambers, C., Lavelle, K. & Mueller, J.C. (2017). The white racial frame: A roundtable discussion. In Systemic Racism. Palgrave Macmillan.
Calhoun, C. (1995). Critical Social Theory: Culture, History, and the Challenge of Difference. Wiley-Blackwell.

10 Critical Theory

Only thought which does violence to itself is hard enough to shatter myths.1

It is said of generals that, based on experience, they are always preparing to fight the last war. One of the roles of theory is to enable us to recognize in what ways our future wars may be different.2

The 1920s were a turbulent period for postwar Germany. By signing the Treaty of Versailles, Germany acknowledged responsibility for loss and destruction in the Allied Nations and agreed to pay reparations that totaled more than $400 billion in 2023 U.S. dollars. Combined with its own costs for fronting the war, these reparations left Germany deeply saddled with debt. Absent a functioning economy, German soldiers returned home without prospects for employment. Pulsing with austere nationalistic pride, thousands of these soldiers bonded with unemployed youth and other discontented citizens and formed paramilitary groups led by former midlevel German military leaders. Known as Freikorps, these unsanctioned militias terrorized Germany and contributed to the assassination of nearly 400 political leaders over a three-year period. Political tumult was soon followed by devastating economic collapse and hyperinflation sparked by Germany’s efforts to renege on its reparation obligations.3 It is said that inflation was so rapid that laborers demanded payment twice each day while restaurant prices rose as patrons waited for their meals to be served.4 The publication of the Protocols of the Learned Elders of Zion, a fictitious narrative that produced false accounts of the ambitions of people who practiced Judaism, added to this economic and political turmoil by strengthening antisemitic feelings among many German nationals and provided a scapegoat for Germany’s hardships. Collectively, this political, economic, and social instability created conditions that enabled the gradual rise and eventual seizure of power by the ultra-nationalistic, antisemitic Adolf Hitler and his fascist National Socialist German Workers’ (a.k.a. Nazi) Party.

DOI: 10.4324/9781003228141-14

For a select group of German intellectuals allured by the idealistic vision of socialist society crafted by Karl Marx during the latter half of the 19th century, the turmoil of the 1920s and eventual embrace of totalitarian fascist leadership was troubling. Forming what became known as the Frankfurt School, this group of intellectuals pondered how citizens of a functioning democracy allowed themselves to become dominated by an anti-democratic repressive authoritarian regime. Over decades, this driving question expanded from Hitler’s repressive rule to a more general failure of Enlightenment thought to liberate humanity, instead entrapping the masses within the oppressive leadership of the few. As Max Horkheimer and Theodor Adorno, two of the Frankfurt School’s most prolific thinkers, wrote,

Enlightenment, understood in the widest sense as the advance of thought, has always aimed at liberating human beings from fear and installing them as masters. Yet the wholly enlightened earth radiates under the sign of disaster triumphant. Enlightenment’s program was the disenchantment of the world.5

This observation raised an essential question:

How can the progress of modern science and medicine and industry promise to liberate people from ignorance, disease, and brutal, mind-numbing work, yet help create a world where people willingly swallow fascist ideology, knowingly practice deliberate genocide, and energetically develop lethal weapons of mass destruction?6

Although the questions explored by members of the Frankfurt School expanded beyond the failure of Enlightenment, democratic, and capitalist ideals to protect humankind from repressive domination, it was the unsettling events that occurred in post–World War I Germany that motivated the development of Critical Theory.7 No one person affiliated with the Frankfurt School is credited with developing the concept of Critical Theory. 
Rather, the combined efforts by members of the Frankfurt School to identify and challenge the ways in which institutions functioned to shape and control society “formulated a unified approach to social investigation and criticism.”8 It is this approach to identifying and challenging the levers of an institution that produce harmful social productions that became known as Critical Theory. Extending Marx’s analysis, which directed attention to the role capitalist modes of production played in subjecting the working person to subordinate positions in the social system, the Frankfurt School pointed to the roles modern technologies,

positivistic social science, and ideology played in similarly subjecting the common person to subordinate positions within society. The work and influence of the Frankfurt School on social research and the development of social theory is divided into two generations. The first generation began in the late 1920s and was led by Theodor Adorno, Max Horkheimer, Erich Fromm, and Herbert Marcuse. A second generation emerged in the 1960s and was led by Jürgen Habermas. Following the rise to power of the German National Socialists in the early 1930s, members of the Frankfurt School sought refuge in the United States, where they observed dominance of positivist thought in social inquiry. Their attention also focused on the role of ideology and culture, particularly the mass production of culture, in conditioning members of society to accept oppressive conditions that favor an elite few. Although Critical Theory was and continues to be applied to identify structures and functions within society that yield negative productions, engagement with Critical Theory also creates a “discourse of possibility.” This discourse of possibility envisions the liberatory experience that emerges from the dismantling and transformation of the oppressive structures that are the focus of analysis. In this way, Critical Theory offers both a critique of negative functions of society and a vision of the positive experiences produced through liberation from those functions. Since its emergence in the late 1920s, Critical Theory has informed the development of an expanding number of social theories including feminist theory, postcolonial theory, critical race theory, critical media studies, and queer theory, among others. Critical Theory is a challenging idea to grasp. 
As we will see, Critical Theory challenges positivism, asks us to engage in critical analysis of the social arrangements that exist in modern societies as well as the ideology that enables those arrangements, and requires us to reflect on our role in those arrangements. For readers grounded in empiricism, positivism, postpositivism, and/or scientific inquiry, ideas presented by Critical Theory may feel foreign. The style of discourse employed by Critical Theorists may also be unfamiliar. And the relevance to educational measurement may at first be obscure. In the sections that follow, I present select aspects of Critical Theory that have relevance to the alternate frames presented in upcoming chapters and which have implications for reducing educational measurement’s function as apparatus for systemic racism. In doing so, I attempt to respect the complexity of ideas that form Critical Theory while attempting to make these ideas accessible for readers. Given the broad and expanding nature and application of Critical Theory, this chapter treats the two generations of the Frankfurt School as a unitary whole and focuses on a select set of core concepts and applications that have direct relevance for modifying the frame(s) used to inform work within the field of educational measurement. Among the topics explored

are: the idea of critical theory; a critique of positivism; the importance of historicity, criticality, and reflexivity; and the role ideology and mass culture play in deceiving those subordinated in society to embrace oppressive conditions. The chapter ends by considering implications these core concepts of Critical Theory have for the field of educational measurement.

The Idea of Critical Theory

For many readers, the word theory likely evokes a connected set of ideas that explain the production of a specific phenomenon, occurrence, or class of occurrences. As Omi and Winant observe, “Theory is driven by demand; by a necessity to explain [and] account for.”9 Johannes Kepler’s laws of planetary motion were developed to explain the movement of the planets around the sun and, by extension, a moon’s movement around a planet. Sir Isaac Newton’s theory of gravity was developed to explain why Kepler’s laws of planetary motion held and provided a physical rationale for the pull, or kinetic impact, objects have on each other. Archimedes’ principle of buoyancy was developed to explain why some objects overcome the pull of what Newton later termed gravity and float, while others do not. These and many more scientific theories provide explanations for specific physical occurrences observed by humans. When applied to social functions, traditional theories are developed to explain the production of events and outcomes that occur within a society. John Maynard Keynes developed theory to explain the relationship between demand and fluctuations in economic output and inflation. In education, Jean Piaget developed theory to explain the cognitive development of children. Theories of action are similarly developed to provide an explanation for how change occurs within an organization, institution, or system. 
As an example, the Smarter Balanced Assessment Consortium introduced a theory of action that explained how establishing common educational standards, developing and administering tests designed to measure the achievement of those standards, and providing supports such as formative feedback and professional development work together to improve the quality of teaching and, in turn, the level of student achievement.10 Common across these and the large corpus of traditional social theories is the formation of a “string of connected statements that represented the nature of the world around us” and explain events that occur within that world.11 In this way, traditional theory “enables us to make observations and thus convert sensory impressions into understandings we can appropriate as facts. Theories [also] offer us ways to think about the empirical world, ways to make observations, and ways to formulate tests.”12 In developing traditional theory, science functions as a mirror reflecting the world, explains how that world functions, and how to function more efficiently

and effectively within that world.13 Although theories are refined and advanced over time, or in some cases are entirely replaced by new theories, traditional theories explain the world as it is, often assuming permanence of that world.14 Critical Theory differs from traditional theory in three fundamental aspects. First, the goal of Critical Theory is not limited to explaining a given phenomenon or occurrence. While Critical Theory provides an explanation for a given phenomenon or occurrence, it also aims to reveal the structures that provide the conditions under which those phenomena and occurrences are produced.15 For Critical Theory, it is the structures that produce occurrences that are of primary interest, not the occurrences themselves. Second, Critical Theory recognizes the social construction of those structures. Because these structures are social constructions, Critical Theory recognizes their impermanence. Current structures are understood as products of historical social development. Both history and the structures that exist at any moment of history are acknowledged as human productions. As human productions, history and existing structures are alterable by humankind. This ability to be altered directs the focus of Critical Theory toward aspects of current history and existing structures that operate to differentially impact members of a social system. Third, recognizing the role historical developments and existing structures play in disparately shaping human experiences, Critical Theory aims to liberate humankind from the oppressive functions of those structures. In these ways, Critical Theory “aims to better society by both understanding and working to change it.”16 As the two quotes that open this chapter reflect, Critical Theory aims to shatter myths and reveal truths about social arrangements in order to form a future that is more just. 
It is with this pursuit in mind that Max Horkheimer writes,

For all its insight into the individual steps in social change and for all the agreement of its elements with the most advanced traditional theories, the critical theory has no specific influence on its side, except concern for the abolition of social injustice.17

Three ideas are essential for understanding how Critical Theory differs from traditional theory. These ideas include historicity, criticality, and reflexivity. Undergirding Critical Theory’s embrace of these three concepts is its critique of positivism.

Critique of Positivism

The Frankfurt School coalesced in the mid-1920s, and its work expanded and evolved over five decades. In the 1920s, nearly all social science was

conducted through the classic positivist frame introduced in the 1830s and 1840s by Auguste Comte.18 At this time, the field of psychology was still struggling for acceptance as a legitimate form of science, group-administered tests of cognitive ability were nearing the end of their first decade of use, Stevens’s four levels of measurement were 25 years from formulation, and a biological conception of race remained largely unchallenged. It was in this context that the Frankfurt School offered Critical Theory as an alternative to a positivist approach to social science. As Ben Agger, a late professor of sociology and humanities, unpacks, members of the Frankfurt School understood the primary aim of positivism as an “attempt to formulate lawful understandings of the social universe … [and] attempt to explain the causal relationships supposedly governing the social world.”19 Positivist theories increasingly relied on statistical methods and mathematical expressions to represent causal relationships among an outcome (dependent) variable of interest and one or more input (independent) variables that contribute to the production of that outcome. Commonly employed statistical models attempt to explain variance in the outcome variable by estimating the relative contribution each input variable has on the production of variability in the outcome variable. As an example, a positivist model might attempt to explain variance in divorce rates that is produced by a person’s social class and racialized identity. 
Positive social theory assumes that social laws govern these sorts of relationships between dependent and independent variables … [and] assembles these findings into general patterns of social explanation, perhaps, in terms of the aforementioned example, elaborating large-scale theoretical understanding of the relationship between family dynamics, reflected particularly in the divorce rate, and class and race.20 In so doing, a positivist social theory accepts the world as it is. In this example, both social class and racialized identity are accepted as natural innate properties of individuals, and are understood to function as causal contributors to enduring marriage or divorce.21 In this way, the positivist perspective views each subject as an individual operating independently and freely within society. As a freely operating agent, it is the characteristics of each individual that influence their development, decisions, and actions. The Frankfurt School rejected this individualist position “for failing to account for how the social context impacts upon agents.” Instead, Critical Theory embraces a structuralist position, “which focus[es] on how social and historical forces shape the behaviour of individuals … social structures emerge from the actions of individuals and then exert a causal influence over individuals.”22 For members of the Frankfurt School, positivist understanding of

individuals operating freely within a world ignores the role humanity plays in shaping that world and instead accepts the world as it is. Accepting the world as it is enabled positivism to treat the laws governing social actions and outcomes as universal, applying across time and space. Members of the Frankfurt School argued that accepting the world as it is not only overlooks the role humanity plays in shaping history and constructing the conditions in which society operates, but also ignores the role ideology and domination play in maintaining these constructions and justifying the outcomes experienced by individuals that are produced by these constructions. A core distinction between positivism and Critical Theory flows from their primary purposes and underlying assumptions. Positivism accepts the social world as it is and aims to describe and discover laws governing that world. Critical Theory understands the social world as a construction of humankind, the alteration of which will produce different outcomes and experiences. As a result, any law governing a social production is impermanent, and its application is limited to the society’s current historical instantiation and structure. A main function of Critical Theory, then, “is to point to the contradictions of the present and to encourage the emergence of needs, patterns of interaction, and struggle which point the way toward a new [and more just] society.”23 Members of the Frankfurt School were also critical of positivist conceptions of objectivity. This criticism is particularly evident in Habermas’s critique of discourse employed by positivist-oriented social scientists. 
Sociologist Kyung-Man Kim suggests that Habermas’s critique of objectivity is an outgrowth of his understanding of the purpose of positivist inquiry: “aiming at the control and prediction of the external world … these sciences are concerned with deciding the truth value of a particular type of statements that are about the external state of affairs.”24 Being external to research, these states of affairs exist independent of the researcher. The research endeavor, then, is monological, with the researcher behaving as “a solitary scientist interacting with the external world to acquire knowledge” in order to describe and make sense of that external world. Habermas is critical of “the positivists’ claim that the analytical-empirical sciences rest on nothing other than themselves—that is, they are capable of explicating themselves—[and thus] must be taken as deceiving themselves.”25 Instead, Habermas argues that the scientific inquiry with which any scientist engages begins with the scientist’s special interest in the topic of inquiry. In addition, any observation or understanding acquired during the process of investigation is interpreted through a lens of pre-understanding: the knowledge and experiences the researcher brings to their current investigation. The acceptance or rejection of findings from the investigation is similarly dependent on the pre-understanding that exists within the broader research community. In this way,

the community decision on which the acceptance or rejection of the basic statements depends is laid down “institutionally” … [and is based on] the existence of a “prior consensus” that enables scientists to distinguish empirically plausible causal hypotheses from those that are not.26

As Habermas describes, “an implicit pre-understanding of the rules of the game guides the discussion of the investigators when they are deciding whether to accept basic statements.”27 Pre-understanding, then, influences the questions asked, the methods employed to gather empirical evidence, the interpretation of observations, and ultimately the acceptance or rejection of findings. Given the role pre-understanding plays throughout the research endeavor, and its location within the researcher and research community, the researcher necessarily acts as a subjective participant in the research endeavor. As an example, consider the interpretation Lewis Terman, Robert Yerkes, and Carl Brigham made based on their analysis of Army Alpha test scores described in Chapter 6. Each analyst noted that the average intelligence of test takers, based on Alpha scores, was alarmingly low. They also noted differences among racialized groups. And they noted that as the years of schooling test takers had completed increased, so too did their test scores. Based on these observations, one might question whether the test was measuring native intelligence or academic achievement. Further, given that newly arrived immigrants, people membered Black, and people from lower economic households might, on average, have fewer years of schooling, one might also wonder if the difference in scores among groups was a product of differences in schooling. These analysts, however, came to their work with an important pre-understanding: intelligence differed among racialized groups, and people membered White were “naturally more intelligent” than people membered not-White. 
As a result, seeing the test scores align with their pre-understanding confirmed their pre-understanding and halted any further inquiry.

Historicity

As we saw in Chapter 4, Comte applied the physical sciences as a metaphor for engaging in social science and aimed to develop sociology into a form of “social physics.” As such, social sciences endeavor to discover laws governing social processes which effectively “[freeze] the present into ontological ice, portraying such historical patterns as capitalism, racism, sexism, and the domination of nature as inevitable and necessary.”28 Critical Theory challenges the notion that historical patterns are naturally occurring events. Instead, social patterns are understood as historically fluid productions of human thought, decisions, and actions. It is the historical fluidity of social patterns that Marxist philosophers term historicity.

For Critical Theory, historicity plays a central role in defining the purpose of social research. Recall the disillusionment which members of the Frankfurt School felt as the promises of Enlightenment, capitalism, and technological advances intended to improve the lot of humanity were eclipsed by the economic, political, and social turbulence experienced in a postwar Germany that gradually descended into totalitarian fascist rule. Social patterns such as those that emerged in postwar Germany were understood by the Frankfurt School as creations of humanity. Understanding these social patterns as natural occurrences leaves those harmed by the evolved conditions helpless to alter their life course or to improve the lot for those who follow. Critical Theorists, however, recognize that “emancipation [from oppressive conditions] is predicated on people’s ability to recognize ‘historicity,’ [and] the impermanence of domination.”29 Failure to recognize historicity both blinds humanity to its agency and enables humanity to shirk responsibility for producing the conditions under which it suffers. Critical Theory views the universal laws produced through positivist social sciences as a fatalistic acceptance of the domination and oppression of the many for the benefit of the few. Critical Theory employs historicity to argue that freedom from oppression is possible through concerted efforts to shape history’s course. Embracing historicity and aiming to shape history by altering structures and power arrangements that produce domination and oppression, Critical Theory is a political project. And when applied to address issues of racism, Critical Theory can be applied as part of an anti-racist project. Here, too, we see another important contrast with positivist-oriented social research. 
Whereas positivist-oriented research aims to make positive change by applying understanding of social laws to maximize outcomes, Critical Theory aims to improve outcomes by making positive social change.30 This effort to identify and alter the structures and power arrangements within society that produce domination and oppression requires criticality and reflexivity.

Criticality

A primary aim of Critical Theory is to empower humanity to improve social conditions. A first step in improving social conditions is to unveil what is not seen and to give voice to those who are unheard. The Frankfurt School believed it is unrecognized fallacies within the dominant ideology and associated narratives that lure the masses into accepting social conditions that bind them to an oppressed state. Unveiling fallacies requires asking questions that have been unasked and questioning assumptions that undergird social constructions in order to explore which assumptions hold and which do not. In this way, critical analysis is a challenge to the status quo that requires constant checking of the assumptions and beliefs that justify social arrangements.31

As Horkheimer describes: The critical attitude of which we are speaking is wholly distrustful of the rules of conduct with which society as presently constituted provided each of its members. The separation between individual and society in virtue of which the individual accepts as natural the limits prescribed for his activity is relativized in critical theory. The latter considers the overall framework which is conditioned by the blind interaction of individual activities (that is, the existent division of labor and the class distinctions) to be a function which originated in human action and therefore is a possible object of planful decision and rational determination of goals.32 Horkheimer questions the blind faith that society currently employs to benefit its members. This questioning requires one to apply a level of distrust in society and, more specifically, the decisions those in power have and continue to make in their structuring and operation of society. “Thinking critically about such rules enables people both to unmask societal rules that foster passivity and to refuse to accept them.”33 Embracing the impact human action can have on the (re)creation of worldviews and (re)structuring of social arrangements motivates critical analysis.34 “Given this, the task of research is not to uncover new truths about reality but to unmask supposedly objective knowledge claims by exposing them as symptoms of underlying power relations.”35 Drawing on the work of political theorist John Dryzek, Kerry Howell points to three implications of critical analysis for a research program.
First, a research program should work to: “understand the ideologically distorted subjective situation of some individual or group, second … explore the forces that have caused that situation and third to show that the forces that have caused this situation can be overcome” through making these forces clear to those groups or individuals that exist within these situations. Consequently, critical theory involves reflective action, specifically the reflective action of those individuals and groups involved in the research programme.36

Reflexivity

Critical theorists question the impact research conducted through a positivist lens has on the improvement of society. This skepticism derives both from the incremental change typically produced through positivist research and, more importantly, the location of positivist research within the very institutional and social arrangements that oppress the masses for the benefit of the few. By supporting incremental change within the existing social arrangements and accepting those arrangements as normative, positivist research is served by those very same social arrangements. In this way, positivist research and existing social arrangements function as a closed loop—positivist research supports slow incremental modifications to the social institutions that fund positivist research, and the social institutions funding positivist research appropriate these gradual modifications in ways that maintain power and control of the status quo. Critical Theory also rejects conceptions of social research as an objective endeavor. Critical Theory does not recognize the researcher as an objective external observer of social phenomena or the social production of outcomes. Instead, Critical Theory acknowledges that the researcher brings pre-understandings to their endeavor. As described previously, the researcher is part of the social system they are examining. Given Critical Theory’s ambition to transform social systems to eliminate social injustices, the researcher is also an integral component in that change process—the critical analyst chooses to what their attention is directed and, when fallacies are unveiled, directs attention to possible alternatives. It is the ways in which institutions influence research and the choices made by the researcher throughout the research process that lead Critical Theory to view social research as subjective rather than objective. Through reflexivity, Critical Theory aims to disrupt the circular arrangement between institutions funding researchers, the researchers’ decisions, and researchers’ position within the social system in which those institutions function. Within Critical Theory, reflexivity operates in multiple integrated ways. Before engaging in a research endeavor, the researcher identifies their own location and lever(s) of power within the social system under study.
They consider the reason(s) for engaging in their research endeavor and reflect on the benefit(s) that may be produced through the endeavor—for the researcher, for those who participate in the program of research, for the institutions supporting and/or involved in the endeavor, and for humanity more broadly. Given the role pre-understanding plays in shaping research questions, selecting methods, and shaping interpretations, the researcher reflects on the pre-understandings they bring to their endeavor and the subjective contributions they provide to that endeavor. Questions are asked about epistemology, methodology, and methods employed for the endeavor, with careful consideration of the influence dominant ideology and practices have on the adoption of these positions. This reflection is not engaged to simply resist dominant practice, but rather to assure that dominant practices are not adopted merely to gain acceptance of the endeavor or findings from that endeavor. Reflexivity also directs focus on the location of the research endeavor and participants in that endeavor within the structures that form the social system in which the project is situated. Here, reflection on the ideology that gave rise to and continues to nourish those structures is necessary to anticipate and confront ideas and narratives that resist findings and implications for change derived from the endeavor. In these ways, “Critical Theory demonstrates reflexivity about both its own practices and how its social location within power relations shape those practices. Reflexivity is part of critical practice, especially when knowledge has potential effects on people’s lives.”37

Ideology

Historicity, criticality, and reflexivity were applied by members of the Frankfurt School to explain how it was that the masses allowed themselves to be subjected to domination. Unlike the enslavement of people of African descent, the colonialization of Indigenous people, and the Jewish Holocaust, all of which required armed domination, the Frankfurt School understood the oppression that developed in free societies to be a product of consent by the masses to subordination through legal democratic processes.38 Ideology was the essential tool that enabled the production of consented subordination. For the Frankfurt School, ideology is more than a world view that is applied to understand and operate within the world as it is. Ideology is a form of consciousness that is used to legitimize oppression and the unequal distribution of power and surplus within a free society. Ideology is a form of consciousness bound to a particular interest or set of interests possessed by those holding power.
As Agger and Baldus note: To be effective in reproducing people’s conformist behaviors, ideologies must not be sheer illusion but must in some respects correspond to “reality” as people experience it … ideologies are subtle attempts to portray the present as both rational and necessary, especially given the apparent alternatives, past and present.39 Raymond Geuss, a professor of philosophy, describes Critical Theory’s interest in ideology as it relates to power distribution in this way: That the distribution of normative power is “unequal” means that more is distributed to some group A, than to some other group B. If the social institutions distribute more normative power to A than to B, it will in general be in A’s (true or real) interest to retain the normative power its members wield; B will be the group to which the critical theory is “addressed.”40 The central concern of Critical Theory’s analysis is to unveil those components of an ideology that are false, assist members of group B in seeing that falseness, and identify ideological and institutional transformations that will produce emancipation of members of group B from their oppressed state. To this end, according to Geuss, a critical theory shows members of group B that: [a] the present social arrangements cause pain, suffering, and frustration; [b] the agents in the society only accept the present arrangements and the suffering they entail because they hold a particular world-picture; [c] that the world-picture is not reflectively acceptable to the agents, i.e., it is one they acquired only because they were in conditions of coercion; [and] [d] the proposed final state will be one which will lack the illusions and unnecessary coercion and frustration of the present state; the proposed final state will be one in which it will be easier for the agents to realize their true interests.41 In order for members of group B to embrace a critical theory, that theory must also make agents aware of the suffering or unhappiness they experience under the current state and create dissatisfaction with limitations produced by the current state of existence.42 The ideology supporting the current state contains fallacies that create illusions of benefit (current or potential) for members of group B, when in fact the realization of these benefits is limited or restricted—these fallacies produce a mirage for the vast majority membered into group B. Examining the way in which fallacies within an ideology are embraced by those harmed by those fallacies, Horkheimer and Adorno observe that: Just as the ruled have always taken the morality dispensed to them by the rulers more seriously than the rulers themselves, the defrauded masses today cling to the myth of success still more ardently than the successful. They, too, have their aspirations. They insist unwaveringly on the ideology by which they are enslaved.
The pernicious love of the common people for the harm done to them outstrips even the cunning of the authorities.43 As an example, in Chapter 4, individualism and merit were defined as key components of the ideology that is the White Racial Frame. Individualism and merit portray success within society as a product of hard work and perseverance. Individualism and merit are used both to establish that any member of society can rise up the ranks and to explain why the distribution of capital and power varies within society. And individualism explains why many people, who presumably do not apply sufficient persistent effort, remain in an oppressed state. Although individual effort is often necessary for success in our society, individualism alone is typically not sufficient for success. Further, for some, success is provided through other mechanisms such as family status, inheritance, legacy, and nepotism absent any meaningful role of individualism. In this way, individualism operates as a fallacy within the ideology of the White Racial Frame and creates the mirage of individual responsibility for ascent or failure within our social system. By embracing individualism and merit as chief determinants of one’s position within the social system, the great many who exist in a repressed or oppressed state within our society unknowingly consent to their present conditions. As Geuss describes, it is the particular insidiousness of ideology that it turns human desires and aspirations against themselves and uses them to fuel repression. These aspirations and desires do find a kind of expression in the ideology … which it is the task of the critical theory to set free.44 Critical Theory aims to refute false ideological beliefs and attitudes by inducing reflection that makes the subordinate population who hold these beliefs aware of how these beliefs were acquired.45

Mass Media

An important topic of analysis for the Frankfurt School focused on how the masses who are repressed within a free society come to accept the fallacies in the ideology that operates in society. Mass media is a prime culprit. In the 1920s and 1930s, both the form(s) of media and the role(s) media played in people’s lives expanded rapidly as motion pictures, radio broadcasts, and magazines spread across the European and North American continents. For members of the Frankfurt School, the rise of mass media brought with it a consolidation of the issues, perspectives, and information served to the masses.
As sociologist Patricia Hill Collins describes: Horkheimer and Adorno posited that technological developments enabled cultural products such as music, film, and art to be distributed on a mass scale … [and] fostered a sameness of cultural experience in that a mass of people could passively consume cultural content rather than actively engage one another. They argued that these new cultural formations fostered political passivity. Mass culture suppressed critical thinking and the social action that it might engender.46 Whereas communication and debate were once localized—occurring in cafes, town squares, town halls, places of worship, and other local gathering spots—mass media broadcasts the same narrowly focused messages across many locales with great rapidity. From Horkheimer and Adorno’s perspective, these new forms of media form a system that infects everything with sameness.47 Further, the one-way communication provided by the broadcast of media limits conversation and debate—there is no mechanism for the ordinary citizen to challenge, contest, or seek greater depth from a broadcast.48 While some variation in cultural productions was offered, the differences among productions were trivial and functioned as pseudo-individualization.49 Just as mass media consolidated the content broadcast across the social system, it also shaped culture. Mass media’s role in developing and shaping mass culture was of particular concern to Adorno. Trained as a musician who authored several concertos and published several musical critiques, Adorno observed how the rise of mass culture—or what he termed the “culture industry”—led to the standardization of artistic productions.50 This standardization was most evident in the production of music that was then broadcast through radio and later extended to television programs, where the same formula was applied across different episodes and shows. In this way, mass media and the culture industry it distributes serve to preserve the status quo and control individual consciousness by ensuring that “the reproduction of mind does not lead to the expansion of mind.”51 In these and other ways, members of the Frankfurt School posited that marketing and mass media are prime vehicles for controlling society, standardizing the expectations and needs of society, and aiding in society’s acceptance of ideological fallacies.
Through control of the media, and by extension cultural productions, those benefitting from unequal power and capital distributions are able to control and develop members of a free society “into malleable and predictable people who without critical analysis accept social situations and consumerism.”52 Through analysis of the media’s role in producing sameness of thought and controlling the status quo, Critical Theory aims to help the spectator recognize one way in which they have come to embrace ideological fallacies that support the dominant and repressive state within the current social structures and power arrangements.

Implications of Critical Theory for Educational Measurement

Critical Theory is the foundation upon which the additional frames explored in this section are built. Critical Race Theory and Intersectionality Theory are critical theories that center analysis of power and oppression on race and the multiplicity of our identities, respectively. QuantCrit is a principled approach to applying quantitative methods in the context of analysis informed by Critical Race Theory. As explored in Chapters 11 and 12, in each of these three frames, assumptions are questioned, historicity is considered, criticality is applied, and reflexivity is engaged. These frames focus attention on power structures and the mechanisms that prop them up in order to target resistance and change. These frames also reveal shortcomings in some of the methods and practices currently employed within the field of educational measurement and point to opportunities to advance those practices. For educational measurement, Critical Theory plays a similar role. Test development and research inquiry are undergirded by an assortment of assumptions, some of which are informed by the White Racial Frame. Ability and achievement have been understood (assumed) to be traits that reside and operate within each individual, and individuals have been understood to be the primary driver for development of these traits. Structuralism, constructivism, and sociocultural theories of knowledge development challenge individualistic assumptions. If elements of these theories better reflect knowledge production, what implications might there be for test and study design and inferences made based on scores and research findings? Objectivity and standardization are also key features of educational measurement assumed necessary for reliable and comparable score production. But might more valid inferences derive from greater flexibility in administrative conditions? For decades, the dimensionality of tests of intelligence was hotly debated. Today, for most cognitive tests, particularly those used to assess academic achievement, unidimensionality is assumed requisite. Yet, given the complexity of knowledge and skills applied in practical contexts, might a measure produced by a single cognitive dimension limit the generalizability of a test score inference? I present these questions merely as examples of how Critical Theory might be applied to explore assumptions that influence the development and use of educational measurements. Context and historicity are also directly relevant to both testing and the use of test scores in educational studies. Standardization in test format, content, and administration ignores differences in context, and assumes that a uniform context is requisite to maximize score comparability.
Recent efforts to develop socioculturally sensitive assessments are beginning to challenge this assumption, but concerns are quietly emerging about the costs and feasibility of scaling these exploratory practices. Historicity and the focus on structures that produce barriers for accelerating academic progress for those membered into oppressed groups also have relevance to research in which test scores function as outcome measures. Such research tends to direct attention to individuals and those who interact with them (families, friends, and teachers), using findings to advocate that individuals change. While individuals and their immediate contacts play a role in the production of outcomes, implications of findings directed at individuals ignore the structural features that may play a larger role in impacting outcomes. Expanding the focus of research to examine structural impediments and enablers may create a considerable challenge for research design and implementation, but it may also reveal changes that can produce greater impact. Examining the role of mass media in educational measurement may seem unproductive. Yet, there are vehicles of communication within the field of educational measurement that do control and influence production and dispersion of knowledge and innovation. Like television programming, refereed journals and conferences control which ideas are distributed and which are restrained. As one example, consider the power the chief editor of a journal such as Educational Measurement: Issues and Practice has in protecting the field from consideration of its racist productions and anti-racist alternatives by deeming such critical analyses as “out of scope” for the journal—and thus, for the field. Agencies funding research and development similarly influence work in the field, setting priorities that direct attention for those seeking financial support. Applying the lens of Critical Theory directs attention to the ideas, practices, and advancements these institutions inhibit, and those they promote, and who gains by these actions. For many in the field of educational measurement, Critical Theory may feel like a direct attack on the positivist elements upon which the field was founded. In advocating that the field incorporate Critical Theory into its lens, I am not suggesting that all elements of an existing frame be discarded. Instead, I suggest that criticality, reflexivity, acknowledgement of historicity, and careful analysis of institutions that control the production and dissemination of knowledge are necessary for shifting educational measurement in ways that redress its function as apparatus for systemic racism.

Notes

1 Horkheimer and Adorno (1947/2002), p. 2.
2 Calhoun (1995), p. 10, italics in the original.
3 With Germany having fallen behind on retribution payments to France and Belgium, the two nations occupied the Ruhr Valley, an industrial region in western Germany. The occupation led to the closure of industrial plants, leaving thousands of Germans without work. Rather than meeting their retribution obligations, the German government opted to pay the now-idle workers. These payments rapidly taxed the German monetary reserves and forced the government to print German marks, the value of which rapidly decreased, causing hyperinflation across the nation.
4 Encyclopedia Britannica (1998/2022).
5 Horkheimer and Adorno (1947/2002), p. 1.
6 Zuidervaart (2015), no page numbering.
7 It is essential to acknowledge that although the Frankfurt School formally introduced the concept of Critical Theory, many of the ideas that form Critical Theory had been introduced and/or applied previously by Karl Marx, Friedrich Engels, and W.E.B. Du Bois, among others.
8 Howell (2012), p. 75.
9 Omi and Winant (2015), p. 249.
10 Smarter Balanced Assessment Consortium (2010).
11 Collins (2019), p. 60.
12 Calhoun (1995), p. 6.
13 Collins (2019).
14 See Kuhn (1962) on the occurrence of paradigm shifts in scientific theory.

15 Collins (2019).
16 Collins (2019), p. 62.
17 Horkheimer (1982), p. 242, as quoted in Collins (2019), p. 62, italics in original.
18 Here I distinguish Comte’s classic positivism from later revisions to positivism introduced as Logical Positivism by the Vienna Circle in the 1920s and 1930s and post-positivism in the late 1930s and 1940s. Although positivism has evolved over time, I maintain many of the core concepts introduced by Comte’s classic positivism, which are the focus of critique by the Frankfurt School’s Critical Theory, have and continue to influence the practice of educational measurement. For this reason, I limit discussion of positivism to Comte’s classic form.
19 Agger and Baldus (1999), p. 24.
20 Agger and Baldus (1999), p. 24.
21 In the next chapter, I challenge this individualistic, biological conception of racialized identity and examine critiques of the use of racialized identity in causal analyses and interpretation of results from statistical analyses that attribute “effects” to racialized identity. I use this example here because it is representative of the manner in which many positivist-oriented social science researchers have (mis)employed racialized identity (and other demographic characteristics) in causal statistical analyses.
22 Cruickshank (2012), p. 73.
23 Benhabib (1986), p. ix.
24 Kim (2015), p. 73.
25 Kim (2015), p. 74.
26 Kim (2015), pp. 74–75.
27 Habermas (1974), p. 203, quoted in Kim (2015), p. 75.
28 Agger and Baldus (1999), p. 5.
29 Agger and Baldus (1999), p. 7, italics in the original.
30 Agger and Baldus (1999).
31 Foucault (1982), p. 778.
32 Horkheimer (1982), p. 207.
33 Collins (2019), pp. 60–61.
34 Howell (2012), p. 81.
35 Cruickshank (2012), p. 75.
36 Howell (2012), p. 83, quoting Dryzek (1995), p. 99.
37 Collins (2019), p. 64, italics in the original.
38 Here, the Frankfurt School was not focused on the oppression experienced by groups like enslaved people from Africa, people Indigenous to colonized land, or people of Jewish faith during Nazi Germany, whose oppression was the direct product of force. Rather, the Frankfurt School was focused on members of a democratic capitalist and technologically advanced society who were subjugated and oppressed within that society through legal democratic processes. Their concern was akin to one asked today when citizens support a political candidate whose policies do not benefit them but instead impede their pursuit of life, liberty, and happiness.
39 Agger and Baldus (1999), p. 8.
40 Geuss (1981), p. 74.
41 Geuss (1981), p. 76.
42 Geuss (1981), p. 84.
43 Horkheimer and Adorno (1947/2002), p. 106.
44 Geuss (1981), p. 88.
45 Geuss (1981), see p. 91.

46 Collins (2019), p. 58.
47 Horkheimer and Adorno (1947/2002), see p. 94.
48 Horkheimer and Adorno (1947/2002), see p. 96. Some might argue that the Internet has provided a mechanism for the audience to respond to broadcasts. The Internet, however, was popularized several decades after Horkheimer and Adorno’s analysis of the mass media’s role in producing sameness.
49 Held (1980), p. 91. From Horkheimer’s (1941, pp. 302–303) perspective, an option offered by a manufacturer or producer of mass media “is not the product of genuine demands; rather, it is the result of demands which are ‘evoked and manipulated’” by those controlling the culture industry.
50 See Müller-Doohm (2005) and Claussen (2008) for detailed biographies that examine Adorno’s musical accomplishments and the role critique of music played in shaping his perspective on mass media’s impact on culture.
51 Horkheimer and Adorno (1947/2002), p. 100. See also Held (1980), p. 90.
52 Howell (2012), p. 83.

References

Agger, B. & Baldus, B. (1999). Critical social theories: An introduction. Canadian Journal of Sociology, 24(3), 426–428.
Benhabib, S. (1986). Critique, Norm, and Utopia: A Study of the Foundations of Critical Theory. Columbia University Press.
Calhoun, C. (1995). Critical Social Theory: Culture, History, and the Challenge of Difference. Wiley-Blackwell.
Claussen, D. (2008). Theodor W. Adorno: One Last Genius. Harvard University Press.
Collins, P.H. (2019). Intersectionality as critical social theory. In Intersectionality as Critical Social Theory. Duke University Press.
Cruickshank, J. (2012). Positioning positivism, critical realism and social constructionism in the health sciences: A philosophical orientation. Nursing Inquiry, 19(1), 71–82.
Dryzek, J.S. (1995). Critical theory as a research program. In The Cambridge Companion to Habermas. Cambridge University Press.
Encyclopedia Britannica. (1998/2022). Years of crisis, 1920–23. https://www.britannica.com/place/Germany/Years-of-crisis-1920-23
Foucault, M. (1982). The subject and power. Critical Inquiry, 8(4), 777–795.
Geuss, R. (1981). The Idea of a Critical Theory: Habermas and the Frankfurt School. Cambridge University Press.
Habermas, J. (1974). Rationalism divided in two: A reply to Albert. In Positivism and Sociology. Heinemann.
Held, D. (1980). Introduction to Critical Theory: Horkheimer to Habermas. University of California Press.
Horkheimer, M. (1941). Art and mass culture. Zeitschrift für Sozialforschung, 9(2), 290–304.
Horkheimer, M. (1982). Traditional and critical theory. In Critical Theory: Selected Essays. A&C Black.
Horkheimer, M. & Adorno, T.W. (1947/2002). Dialectic of Enlightenment: Philosophical Fragments. Stanford University Press.
Howell, K.E. (2012). An Introduction to the Philosophy of Methodology. Sage.
Kim, K.M. (2015). Discourses on Liberation: An Anatomy of Critical Theory. Routledge.
Kuhn, T.S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.
Müller-Doohm, S. (2005). Adorno: A Biography. Polity Press.
Omi, M. & Winant, H. (2015). Racial Formation in the United States. Routledge.
Smarter Balanced Assessment Consortium. (2010). Theory of Action. https://files.eric.ed.gov/fulltext/ED536956.pdf
Zuidervaart, L. (2015). Theodor W. Adorno. The Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/archives/win2015/entries/adorno/

11 Critical Race Theory and QuantCrit

Our theories of society, not our empirical evidence, guide how we interpret racial data.1

Since the murder of George Floyd, Critical Race Theory has garnered more national attention than perhaps any modern social theory. Video documenting Floyd’s murder made the persistence of racism visible (again) to millions of people in the United States. In response, schools across the nation modified curriculum to assist students in developing a deeper understanding of racism and other forms of oppression. Conservative activists and political leaders, the vast majority of whom benefit from the advantages racism bestows upon people membered White, lashed out against these educational efforts. A once obscure social theory familiar to a relatively small group of academics, “critical race theory” became the buzz phrase conservative activists, legislators, and members of the press used to represent any educational effort intended to develop understanding of race, racism, anti-racism, or other forms of oppression.2 These critics reconfigured the term “critical race theory” into their boogeyman, arguing that teaching “critical race theory” “injects racism into what should be, in their view, a colorblind system.”3 U.S. Senate Minority Leader Mitch McConnell and 38 fellow Republican senators complained in a letter to the secretary of education that the teaching of “critical race theory” reflected “a politicized and divisive agenda.”4 Despite acknowledging that he had “never figured out what critical race theory is,” former Fox News commentator Tucker Carlson went so far as to broadcast claims that “critical race theory” is “an overtly racist doctrine” and “so obviously poison, fatal to any society that ingests it.”5 To avoid such “poisoning,” Florida governor Ron DeSantis introduced legislation that took “a stand against the state-sanctioned racism that is critical race theory,” coverage of which he argued teaches “kids to hate our country and to hate each other.”6

DOI: 10.4324/9781003228141-15

Like Florida, seven states implemented bans against “critical race theory,” and 16 additional states have bills in various stages of the legislative process.7

Despite nationwide attention over the past three years, Critical Race Theory is rarely mentioned in scholarship that focuses on educational measurement. This dearth of attention is exemplified by a search for the phrase “Critical Race Theory” in two of the field’s leading scholarly journals. Across all published articles, Critical Race Theory is mentioned only nine times in four articles. In one case, the author mentions the term in relation to the reaction some conservative leaders have had to the modifications of school curriculum.8 In another, the author notes that “[d]espite this clear commitment to equity, relatively few scholars and practitioners have taken an explicit social justice orientation or applied a critical race theory lens to our work in testing and assessment.”9 A third article similarly advocates for the applicability of Critical Race Theory to educational measurement, writing, “We see this need for race-reimage from a critical race theory in measurement methodology as an opportunity to counter the conventional color-blind lens through which constructs are typically examined.”10 The final article is the only one that applies tenets of Critical Race Theory to the author’s analysis.11

This chapter explores Critical Race Theory and QuantCrit. The chapter begins by exploring the early formation of Critical Race Theory by legal scholars and examines its core tenets. The emergence of QuantCrit and its core tenets are then described. The chapter ends by examining specific challenges QuantCrit raises for educational measurement while also outlining opportunities Critical Race Theory and QuantCrit provide for advancing educational measurement as apparatus for an anti-racist endeavor.
Critical Race Theory Legal decisions and legislative acts that occurred in the 1950s and 1960s are hailed for extending the civil rights long enjoyed by a subset of the U.S. population to all citizens, regardless of one’s gender, racialized identity, religion, or national origin. Brown v. Board of Education of Topeka, Kansas (henceforth referred to as Brown) declared segregation in public education unconstitutional. A year later, President Dwight D. Eisenhower issued Executive Order 10590, which established and charged the President’s Committee on Government Policy with enforcing federal nondiscrimination policies in government employment. In 1961, President John F. Kennedy issued Executive Order 10925, introducing affirmative action in the hiring of government employees; and in 1964, President Lyndon B. Johnson signed an order preventing employment discrimination by federal contractors. The 1964 Civil Rights Act extended employment rights by authorizing the federal attorney general to prosecute employment discrimination and segregation based on

Critical Race Theory and QuantCrit 279 racialized identity in schools and public accommodations. The 1968 Act extended the attorney general’s focus to housing discrimination. Various protections of voting rights were introduced by the Civil Rights Acts of 1957, 1960, and 1965. In 1967, the U.S. Supreme Court’s ruling in Loving v. Virginia finally ended various state bans on interracial marriage. And in 1972, Swann v. Charlotte-Mecklenburg Board of Education ruled busing of students a legal approach to desegregating public schools. Collectively, these rulings, orders, and acts profoundly increased the legal backing of civil rights for racialized and other oppressed groups in the United States. For many people, the extension of civil rights was celebrated as a path for rectifying racialized disparities in employment, housing, income and wealth, education, and incarceration that had persisted since (and before) the Civil War. Although it was understood that racist thinking and attitudes of individuals were problematic, segregation and discrimination were viewed as the primary cause of these disparities. During the decade following the civil rights movement, improvements occurred in each of these areas, but change was less dramatic than many people hoped. This gulf between the expansion of the protection of civil rights and persistent disparities, particularly between people membered White or Black, led a small group of legal scholars to theorize why these disparities endured. This legal theorizing is the origin of Critical Race Theory. 
Legal scholars Derrick Bell, Alan Freeman, Richard Delgado, Kimberlé Crenshaw, and Mari Matsuda are recognized as early contributors to the development of Critical Race Theory.12 Law professors Richard Delgado and Jean Stefancic recount Critical Race Theory’s emergence from analyses motivated by concerns that advances in civil rights had stalled and, in some cases, were being reversed.13 As legal scholars theorized how and why these advances were being stalled and reversed, they applied the lens of critical legal studies and radical feminism. An important idea borrowed from critical legal studies focused on legal indeterminacy. Legal indeterminacy recognizes that, in some cases, there is no one correct outcome for a legal case. In such situations, the outcome of a case is driven by the line of authority embraced by a justice or the interpretation made of a fact. As is explored in greater detail in the “Interest Convergence” section of this chapter, Derrick Bell derived the concept of interest convergence as one factor that influenced the direction of a decision when legal indeterminacy is at play in the case. From radical feminism, these scholars extended “insights into the relationship between power and the construction of social roles, as well as the unseen, largely invisible collection of patterns and habits that make up patriarchy and other types of domination.”14 For these pioneering Critical Race Theory scholars, white supremacy and the racism that flows from it was the specific form of domination centered in their legal analysis. Reflecting on the legal analysis that emerged during the late 1970s and into the early 1990s, Cornel West, a professor of African American studies, commended this work for “confront[ing] critically the most explosive issue in American civilization: the historical centrality and complicity of law in upholding white supremacy (and concomitant hierarchies of gender, class, and sexual orientation).”15

Core to their analysis of white supremacy and racism in the law was a shift in focus from racism as a “perpetrator perspective” to racism as systemic. Approaching racism from the perpetrator perspective is akin to conceiving of racism as acts performed by an individual—that is, racial discrimination is the result of “an intentional, albeit, irrational, deviation by a conscious wrongdoer from otherwise neutral, rational, and just ways of distributing jobs, power, prestige, and wealth.”16 Critical Race Theory recognized the power the perpetrator perspective had within U.S. society in that it allowed the existence of racism to be acknowledged while acts of racism were viewed as anomalies that occurred infrequently within an otherwise free and just social system. Embrace of racism as acts performed by an individual within mainstream thinking positioned the legal system to recognize the existence of racism “when—and only when—one can point to specific, discrete acts of racial discrimination, which is in turn narrowly defined as decision-making based on the irrational and irrelevant attribute of race.”17 This narrow focus on decision-making based on race allowed the courts, and the general public, to ignore the institutional and structural configurations that interact to produce systemic racism. Because it was nearly impossible to point to a specific use of race in a decision-making process that produced an institutional or structural policy or practice, the systemic nature of racism was uncontestable in the court of law.
This focus on individual decisions also allowed legal decisions to ignore historicity—that is, the historical acts of humanity that produced conditions the effects of which perpetuate through future generations and which contribute to decisions that produce disparate effects across racialized groups without including the consideration of race during the decision-making process. Critical Race Theory shifted focus from racism being seen as individual and infrequent, to racism as systemic and “normalized” within the U.S. social system.

Critical Race Theory began as a lens through which the U.S. legal system and its decisions were analyzed. However, the lens through which Critical Race Theory positioned scholars to view social challenges extends beyond the law. During the 1990s, core tenets of Critical Race Theory were embraced by many fields of scholarship, including education, political science, women’s studies, sociology, health, philosophy, and American studies. Being a form of critical theory, Critical Race Theory is applied in these various fields both to understand a social situation and, more importantly, to change the situation. In this way, any analysis conducted through the lens of Critical Race Theory is political—it seeks to transform society to eliminate the operation of racial (and other forms of) oppression in a specific situation or set of situations. As sociologist Tanya Golash-Boza observes, “the study of race must be political and politicized because there is no good reason to study race other than working toward the elimination of racial oppression.”18 Although research conducted through any frame is political—even that which works to merely understand the status quo operates to preserve power relationships in that status quo19—the candid embrace of Critical Race Theory as a political project is perhaps one reason some conservative activists, legislators, and members of the press have inflamed concerns about the influence Critical Race Theory may have on institutions that operate the U.S. social system.

Tenets of Critical Race Theory

As a form of critical theory, Critical Race Theory provides a framework for theorizing about the social world. As stated in its title, Critical Race Theory positions race and racism centrally in the theorizing process. Race and racism are acknowledged as prime drivers for social interactions, structures, and productions. Although race and racism are central in all analyses engaged within the framework of Critical Race Theory, other forms of oppression are also examined and are understood to intersect with racism in multiple and complex ways. Critical Race Theory established five tenets that inform the lens through which social analyses are conducted. These tenets include:

1 Racism Is Ordinary: Racism is the usual way U.S. society does business; racism is the common everyday experience of all people in the United States.
2 Race as a Social Construct: Race is a product of social thought; it is not an objective, inherent, fixed, and/or biological trait.
3 Intersectionality and Anti-Essentialism: No person has a single identity; rather, their social location and position of power are influenced by the context in which they operate and the intersection of multiple socially constructed hierarchies of identity.
4 Interest Convergence: Because racism advances the interests of people membered White, and people membered White hold power in the United States, there is little incentive for the dominant group to eradicate racism except in cases where their interests are served.
5 Storytelling and Counter-Stories: The voices and narratives produced by racialized individuals are valued and essential for developing a deep understanding of racialized and other forms of oppression.20

Part I of this book examined tenets 1 and 2 in detail. As Chapters 1 and 2 document, although presented as a natural biological trait, the concept of race was invented and molded to support the operation of a social, economic, and political system that provided advantage for people membered White through the oppression of people membered not-White. As we saw in Chapter 3, Charles Mills’s concept of the racial contract explicates that this system of racism was and remains foundational in the social, political, and economic systems that operate the United States.21 Critical Race Theory “dares to look beyond the popular belief that getting rid of racism means simply getting rid of ignorance or encouraging everyone to ‘get along.’”22 Understanding racism as systemic, rather than the product of individual acts, Critical Race Theory focuses attention on the institutions, structures, ideology, and associated narratives that form and support systemic racism.23 Recognizing race as a social construction and racism as ordinary directs researchers to examine social issues through a lens that illuminates the various ways in which racism produces advantages through oppression, and, more importantly, the changes required of our social institutions and structures to liberate all people from oppression. As education professor Edward Taylor describes, by defining racism as the acts of [the] larger, systemic, structural conventions and customs that uphold and sustain oppressive group relationships, status, income, and educational attainment … in CRT scholarship, the terms ‘White’ and ‘Black’ are not meant to signal individuals or even group identity. Rather, they indicate a particular political and legal structure rooted in the ideology of White European supremacy and the global impact of colonialism.24 Because systemic racism impacts all racialized groups in various ways, Taylor also notes that in scholarship guided by the tenets of Critical Race Theory, “Non-White” is an interchangeable term for “Black.” In my writing, I have opted to employ the phrase “membered not-White” to both emphasize the social construction of racialized categories and to avoid negative connotation associated with “non.”

The third tenet—intersectionality—recognizes that many forms of oppression and advantage operate within our social system. These many forms of oppression and advantage intersect in complex and multiple ways to differentially impact the lived experiences of each individual. To develop a deeper understanding of the roles oppression and advantage play in producing disparate outcomes and lived experiences, the multiple and complex intersections of oppression and advantage must be considered. The concept of intersectionality has developed into its own social theory and is the topic of the next chapter.

Interest convergence and counter-storytelling are two tenets of Critical Race Theory that are explored in greater detail next.

Interest Convergence

The concept of interest convergence was introduced by Derrick Bell through his analysis of Brown. The core idea of interest convergence is that liberal policies aimed at reforming institutions and structures to address racial inequities are instituted only when a benefit occurs for the dominant racialized group. Bell’s analysis focused specifically on the role the 14th Amendment played in shaping judicial decisions. Recall that the 14th Amendment extended constitutional rights to all citizens (regardless of their racialized identity) and prevented states from depriving any person of life, liberty, or property without due process of the law. In his analyses of Brown and other cases brought to the courts based on the rights granted by the 14th Amendment, Bell observed:

The Fourteenth Amendment, standing alone, will not authorize a judicial remedy providing effective racial equality for blacks where the remedy sought threatens the superior societal status of middle- and upper-class whites. It follows that the availability of Fourteenth Amendment protection in racial cases may not actually be determined by the character of harm suffered by blacks or the quantum of liability proved against whites. Racial remedies may instead be the outward manifestations of unspoken and perhaps subconscious judicial conclusions that the remedies, if granted, will secure, advance, or at least not harm societal interests deemed important by middle- and upper-class whites.25

In making this observation, Bell also noted that, in some cases, advancing racial justice (or the appearance of such justice) may be sufficient for providing benefit to the middle- and upper-class people membered White. As an example, in Brown, Bell locates the U.S. Supreme Court’s decision in the larger political and economic world context of the 1950s. At that time, the United States was engaged in the Cold War with the Soviet Union and was actively pursuing alliances with several nations in Africa and Asia.
In the United States, Jim Crow laws were still in effect and enforced in many states. Racialized discrimination, disparities among racialized groups, and rising tensions between racialized groups were making international headlines. Bell argues that a decision to desegregate schools would demonstrate to potential U.S.-allied African nations that the United States was actively addressing racial discrimination and inequities.26 As evidence to support this position, Bell pointed to amicus briefs filed in support of Brown’s appeal to desegregate schools. As an example, one brief filed by U.S. Attorney General Herbert Brownell argued,

It is in the context of the present world struggle between freedom and tyranny that the problem of racial discrimination must be viewed. The United States is trying to prove to the people of the world, of every nationality, race, and color, that a free democracy is the most civilized and most secure form of government yet devised by man. We must set an example for others by showing firm determination to remove existing flaws in our democracy. The existence of discrimination against minority groups in the United States has an adverse effect upon our relations with other countries. Racial discrimination furnished grist for the Communist propaganda mills, and it raises doubts even among friendly nations as to the intensity of our devotion to the democratic faith.27

The attorney general’s brief also included a letter from Secretary of State Dean Acheson, who wrote:

The United States is under constant attack in the foreign press, over the foreign radio, and in such international bodies as the United Nations because of various practices of discrimination against minority groups in this country. … [T]he continuance of racial discrimination in the United States remains a source of constant embarrassment to this Government in the day-to-day conduct of its foreign relations; and it jeopardizes the effective maintenance of our moral leadership of the free and democratic nations of the world.28

In Brownell’s brief and Acheson’s letter, we see that the primary motivation for supporting Brown’s appeal was not that segregation was wrong or that school segregation was producing adverse effects for people membered Black. Rather, the problem with segregated schools (and other forms of discrimination) was the “constant embarrassment” to the U.S. government and the “adverse effect upon relations with other countries.” Improving international relations by removing this embarrassment was the primary motivation for ending the segregation of schools.

Bell also documents ways in which the press responded to the court’s ruling. For example, Bell quotes Time magazine, which wrote, “In many countries, where U.S. prestige and leadership have been damaged by the fact of U.S. segregation, it will come as a timely reassertion of the basic American principle that ‘all men are created equal.’” Similarly, Life magazine stated that the Supreme Court “at one stroke immeasurably raised the respect of other nations for the U.S.” And Newsweek proclaimed that

the psychological effect will be tremendous … segregation in the public schools has become a symbol of inequality, not only to Negroes in the United States but to colored peoples elsewhere in the world. It has also been a weapon of world Communism. Now that symbol lies shattered.29

In the case of Brown, Bell also noted benefit for middle- and upper-class people membered White residing in southern states. Bell argues that segregation was an impediment to efforts in the South to transition further from a rural plantation society to a more industrialized economy. Bell points to the increasing number of elite business leaders membered White who spoke out against racial discrimination in the South during the 1940s and 1950s.30 Like those of Brownell and Acheson, these comments were not motivated by a desire to address inequality. Instead, interest in ending segregation in schools was motivated by a desire to send a strong signal that the South was addressing racial discrimination and thus attract greater investment in its efforts to further industrialize.

As a separate example of interest convergence that occurred nearly a century prior to Brown, Patricia Collins describes the change in response to mob lynchings in New Orleans that was observed by Ida B. Wells-Barnett. Wells-Barnett argued that “‘the killing of a few Negroes’ by mobs” initially generated little response by leaders in Louisiana. Instead, it was only when the “reign of mob law exerts a depressing influence upon the stock market and city securities begin to show unsteady standing in money centers, then the strong arm of the good white people of the South asserts itself and order is quickly brought out of chaos.”31 It was only when the economic interests of the middle- and upper-class people membered White were threatened that action was taken to address mob lynching.
Critical Race Theory recognizes the important role interest convergence plays in motivating efforts by those who hold a dominant racialized position to address racialized discrimination and inequities. In legal cases, interest convergence is particularly relevant when legal indeterminacy is present for a given case. In such cases, indeterminacy allows the interest of the dominant racialized group to influence the line of authority or interpretation of facts that are embraced by the court and thus rule in a manner that protects that interest. The role that interest convergence plays in shaping decisions and actions, however, is not limited to legal decisions. Interest convergence influences all decisions and actions that involve racialized discrimination and inequities—both those that confront racialized inequities and those that preserve the status quo—regardless of whether they occur in education, housing, the workplace, financial sectors, or research focused on educational measurement. When engaged in research conducted through the lens of Critical Race Theory, recognizing interests and the potential convergence or divergence of interests is an important component of reflexivity.

Storytelling and Counter-Storytelling

Critical Race Theory recognizes racism as an ordinary and normalized component of the U.S. social, economic, political, and legal system. Racism, in fact, is so ordinary that many/most people who gain advantage through racialized oppression do not see racism in operation in their social world.32 To help people membered White understand the challenge of seeing the many ways in which racism operates U.S. society, Edward Taylor draws upon an analogy offered by Toni Morrison in her book Playing in the Dark. Morrison explores the difference between focusing closely on the fish moving in and out of a castle and through bubbles percolating up through the water, and then stepping back to see the fishbowl that creates “the structure that transparently (and invisibly) permits the ordered life it contains to exist in the larger world.”33 Suddenly realizing that the fishbowl is the structuring agent for all that occurs within the water it holds is akin to members of the dominant racialized group seeing racism as the structure that operates their social system. As Taylor explains, many/most people membered White in the United States “cannot understand the world that they themselves have made. Their political, economic and educational advantages are invisible to them and many find it difficult to comprehend the non-White experience and perspectives that White domination has produced.”34

As explored in Chapter 4, narratives play an essential role in maintaining the ideology of the White Racial Frame and aid in justifying the production of disparate outcomes. For too many people, these narratives are accepted as truths, and the advantages legitimated through these narratives, although often unrecognized, become what Bell terms “settled expectations.”35 Together, the inability to see racism operating the social system and the embrace of dominant racialized narratives protect and perpetuate the status quo of systemic racism.
To shift focus from the fish and their immediate environment to the fishbowl, and to unveil the mythology of dominant narratives, Critical Race Theory elevates the voice of people membered not-White; people whose lived experiences and perspectives offer counters to dominant storylines. Through counter-stories, Critical Race Theory “demand[s] that racial problems be viewed from the perspective of minority groups, rather than a white perspective.”36 Critical Race Theory employs counter-stories to unveil and challenge unexamined assumptions held by the dominant racialized group.37 Counter-stories also hold power to unseat the influence that fallacies within dominant ideology have on the structuring of efforts aimed at addressing inequities and instead focus attention on the structures themselves that produce those inequities. In these ways, “counter-stories can shatter complacency, challenge the dominant discourse on race, and further the struggle for racial reform.”38

As Daniel Solórzano and Tara Yosso, professors of education and Chicano studies (respectively), detail, counter-stories can take many forms, including personal narratives, narratives of other people, and composites. In some cases, counter-stories can be based on true stories; in other cases, they are fictionalized.39 Faces at the Bottom of the Well: The Permanence of Racism provides an excellent example of a personal narrative in which Derrick Bell recounts select experiences in his life to reveal various ways in which racism impacted his lived experiences.40 As an example of a composite story, Solórzano and Yosso draw on the experiences and conversations between two characters—Professor Leticia Garcia and graduate student Esperanza Gonzalez—to reveal the various ways in which racialized and gender discrimination are experienced in graduate training programs by people membered Chicana or Chicano. Through their story, experiences such as self-doubt, survivor guilt, imposter syndrome, and invisibility are made real for the reader and unveil ways in which the culture of such programs presents unique challenges to students and faculty membered not-White. Through this story, potential ways in which graduate programs can be altered to reduce these challenges are also implied.41 Regardless of the technique employed, counter-stories function as a nontraditional method for theorizing racism and other forms of oppression. Counter-stories also aid people membered White to develop understanding of the role racism plays in shaping lived experiences of those membered not-White.
Storytelling may seem well outside the domain of the quantitatively oriented field of educational measurement. But as we will explore next, storytelling can be a useful tool for unveiling racialized assumptions and operations within educational measurement.

The Emergence of QuantCrit

Given the emphasis Critical Race Theory places on storytelling and counter-stories, it is not surprising that social analyses conducted through the lens of Critical Race Theory rely heavily on qualitative methods.42 As explored in Chapter 8, statistical methods were and continue to be applied to support specious pre-understandings of biologically produced differences among racialized groups. Faulty interpretations for these analyses have been and continue to be used to develop and support a variety of deficit narratives about people membered not-White; in many cases, pathologizing members of racialized groups. As David Gillborn, a professor of critical race studies, observes, “quantitative approaches often encoded particular assumptions about the nature of social processes and the generation of educational inequality reflect a generally superficial understanding of racism.”43 As described in Chapter 8, historical and continued misapplications of statistical methods to support deficit narratives motivated concerns among a small group of scholars about the use of quantitative methods for the study of race and racism. For these scholars, Audre Lorde’s observation that “the master’s tools will never dismantle the master’s house” is apt.44

Since the mid-2000s, however, other scholars have pushed back on resistance to the use of quantitative methods to examine race and racism. As an example, Jenna Sablan, who specializes in higher education access, acknowledges that statistical methods have been used to support racist projects, but notes that these uses are primarily a product of the motivation rather than strictly a production of the method.45 Further, she observes that “qualitative methods are not immune to a critique grounded in perspectives of racism and colonialism, such as the use of ethnography in stereotyping Indigenous and Native communities.”46 These scholars argue that quantitative tools are useful in revealing inequities produced through racism and the structures through which racism operates.47 It is this effort to use the “master’s tools” to reveal and combat racism that gave rise to quantitative Critical Race Theory, or what has popularly been termed QuantCrit.48

The origin of QuantCrit is murky. QuantCrit is clearly informed by Critical Race Theory. Although not well documented, critical quantitative research also likely influenced the emergence of QuantCrit.
Critical quantitative research is a methodology introduced during the early 2000s by quantitatively oriented scholars who endeavored to shift their analytic work from a positivist orientation to a critical perspective. As Frances Stage and Ryan Wells, two scholars who led the development of critical quantitative research, describe:

The term quantitative criticalist was used to describe a researcher who used quantitative methods to represent educational processes and outcomes to reveal inequities and to identify perpetuation of those that were systematic. The term also included researchers who question models, measures, and analytical practices, in order to ensure equity when describing educational experiences. These scholars resisted traditional quantitative research motivations that sought solely to confirm theory and explain processes.49

Over time, the importance of studying people and institutions in a sociocultural and historical context was added as another pillar of critical quantitative research.50

Building on Critical Theory, critical quantitative research is concerned with addressing social injustice by documenting inequities, locating the causes of inequities, and informing changes to social systems to halt the production of inequities. Quantitative criticalists are also interested in unveiling false assumptions and shortcomings in existing analytic methods that challenge applications of quantitative methods in the pursuit of social justice. In this way, they look critically at social problems and at existing methods. While the language employed by quantitative criticalists was less revolutionary than the Frankfurt School of Critical Theory, the aim of their work was focused on liberating people from oppressive conditions.

By integrating Critical Race Theory with the primary aims of critical quantitative research, QuantCrit sharpens the focus of critical quantitative analytic projects on race and racism. Like Critical Race Theory, five tenets guide work conducted through the lens of QuantCrit.

Tenets of QuantCrit

Tenets of QuantCrit build on those established by Critical Race Theory. QuantCrit tenets are designed to guide the focus of research and to avoid the limitations of positivist-oriented research identified by the Frankfurt School. Although five tenets are listed here, each tenet does not stand on its own. Instead, the tenets interact and intersect to form a composite frame through which QuantCrit scholars approach social research. QuantCrit tenets include:

1 The centrality of racism: Racism is a complex and deeply rooted aspect of society that is not readily amenable to quantification.
2 Categories are neither “natural” nor given: Categories themselves, their units, and the forms of analysis that employ categories must be critically evaluated.
3 Using numbers for social justice: Statistical analyses have no inherent value, but they can play a role in struggles for social justice.
4 Numbers are not neutral: Numbers should be interrogated for their role in promoting deficit analyses that serve white racial interests.
5 Voice and insight are vital: Data cannot “speak for itself,” and critical analyses must be informed by the experiential knowledge of marginalized groups.51

Readers steeped in quantitative methods may react to some of these tenets with the feeling that there is nothing new here. Textbooks and scholarly articles on experimental and quasi-experimental designs and the presentation of findings from statistical analyses authored by Donald Campbell, Thomas Cook, William Shadish, John Tukey, Edward Tufte, and others have long emphasized the non-neutrality of numbers and the importance of presenting findings from data analyses in ways that give meaning and voice to data.52 In part, such reactions stem from a (mis)conception that QuantCrit was developed as a counter or an alternative to quantitative methods. This conception is inaccurate. QuantCrit is a response to researchers who are critical of the use of quantitative methods to explore issues specific to racism. While recognizing the legitimacy of concerns raised about the misapplication and misinterpretation of findings from quantitative research, neither QuantCrit nor its tenets stand in opposition to quantitative methods. Rather, the tenets are intended to focus quantitative researchers sharply on key considerations and practices in their work to avoid such misapplications and misinterpretations as they apply and extend quantitative methods in pursuit of racial justice.

It is also important to note that, before the phrase QuantCrit was coined, these tenets were in action in social science research. As noted earlier, W.E.B. Du Bois’s analysis presented in The Philadelphia Negro employed quantitative analyses, along with qualitative data, to present racism as operating structurally to shape the lives of people membered Black residing in Philadelphia. In his work, Du Bois centered the voices of people membered Black and countered the deficit narratives produced by eugenicists and others who held a biologically based conception of race at the time.
More recently, sociologist Tukufu Zuberi engaged in a deep analysis of the many ways in which quantitative methods can be applied by researchers to support racist deficit narratives yet applied by others to unveil the impacts and structures of racism in the pursuit of racial justice.53 Nichole Garcia, a professor of higher education, and her colleagues point to several additional studies that applied the tenets of QuantCrit before QuantCrit was coined, including work performed by Michelle Fine that reframed analyses of high school “dropouts,” David Embrick and Kasey Henricks’s analyses of tax policies, fines, and racial violence, and Daniel Solórzano and his colleagues’ analyses of the educational pipeline.54

Centrality of Racism and Quantification

The first tenet of QuantCrit extends the first tenet of Critical Race Theory by affirming the central role racism plays in current social systems. This first tenet also acknowledges that quantitative methods can present challenges to the study of race and racism in a social system. As one example, the model of systemic racism presented in Chapter 3 depicts the many ways in which racism impacts lived experiences. In some cases, the impacts occur at the level of the individual. For example, both overt and aversive racism occur as individuals interact with each other, producing harm for the individual membered not-White when a person or small group of people membered White exert their social power.

In other cases, effects of racism operate within locales that have been structured by policies such as redlining, the assignment of students to schools, and the processes used to allocate resources to those schools. Within a locale, clusters of individuals are impacted in similar ways by policies and practices enacted within that locale which produce disparities in opportunities and outcomes between locales. As an example, choices about the location of grocery stores by the food industry and waste management facilities by municipalities impact access to high-quality food and exposure to pollutants similarly for people residing within a given locale, yet differently across locales. When considering racism that operates within locales, it is important to focus causal attention on the structures that produce disparate outcomes across locales rather than the racialized composition of people residing within locales. As Zuberi reminds us, “The causal factors in environmental racism are discriminatory practices by institutions in the location of hazardous waste sites, not the racial composition of the communities.”55 The same can be said of the causal factors for disparate access to nutritious food, quality schools, employment opportunities, and other conditions that are the product of decisions made by industry, politicians, and others who exert power.

The model of systemic racism also depicts historical and current laws and regulations, as well as some institutions and structures, that work across locales to similarly impact racialized people in disparate ways regardless of the locale in which they operate. For example, home loan practices in the 1950s and 1960s as well as in the early 2000s were implemented differently for people membered White than for those membered Black, regardless of the locale in which one resided. In these ways, racism operates at different levels within our social system.
At first glance, the multilevel structure by which racism operates may seem well aligned with multilevel statistical modeling techniques. Multilevel statistical methods were introduced to account for the ways in which individuals are clustered within our society. Individual children are clustered within families, which are in turn clustered within neighborhoods. These neighborhoods are further clustered within a city or town. Students are clustered in classrooms, which are clustered in schools that are further clustered within a district. Individual patients are clustered within doctors, who are clustered within hospitals. The clustering of individuals within these larger units tends to produce commonalities among members of a cluster that contribute to correlations among members within a cluster. As an example, redlining policies produced clustering of people membered into different racialized groups within neighborhoods. As noted earlier, various policies and practices in turn differentially impact the resources available among neighborhoods, which in turn impacts health, job, and educational opportunities. These differential impacts contribute to correlations among income, health status, and educational achievement of individuals clustered within those neighborhoods. It is this type of clustering and associated effects that multilevel statistical techniques are adept at modeling.

However, the impacts of systemic racism (that is, racism that cuts across locales and produces similar impacts for people membered into specific racialized groups) cannot be accommodated by existing multilevel modeling techniques. Existing techniques are designed to cluster people within units and to then cluster those units within larger units. The ways in which systemic racism impacts individuals, however, cut across these units, a structuring for which existing models do not currently account. In this way, the structure of existing models does not adequately match the way in which racism is theorized to operate in the model of systemic racism. It is this lack of alignment between existing statistical modeling techniques and the theory of systemic racism that the first tenet raises to consciousness when quantitative methods are applied to examine the social systems in which racism operates. As will be explored in greater detail, the causal frame and associated language that are employed when presenting findings from some statistical modeling techniques, regression modeling in particular, similarly challenge analyses focused on systems in which racism operates. This first tenet directs attention to these types of limitations and challenges that exist in current quantitative analytic and measurement techniques.

Critically Evaluating Categories

Like the second tenet of Critical Race Theory, QuantCrit’s second tenet reminds quantitative researchers that race, like all identity forms, is a socially constructed category. Given the social construction of race, this tenet encourages researchers to be mindful of the meaning given to “race” in their analyses, the technique(s) used to associate a racialized category with an individual, and the interpretation one makes based on analyses in which that category is employed. As noted earlier, Taylor reminds us that the terms White and Black were introduced as identity categories for political and legal purposes. These terms were introduced and then molded to (re)produce advantage for people membered White through the oppression of people membered not-White.56 Although “race” serves as a social identity category, when used in quantitative analysis of an outcome produced by a social system, “race” functions as a proxy for racism.57 This proxy use, however, often goes unnoted by the researcher. As an example, recall the review of peer-reviewed articles summarized in Chapter 8 in which “race” was used as a variable in statistical analyses. None of these articles provided a working definition for the meaning of “race” as used in the analysis.58 As a result, the reader is left to decide what the variable “race” represents. Too often, the default decision is to conceive of “race” as an individual trait rather than as a social construction used for political purposes. Worse yet, some readers understand “race” as an immutable biological production, and it is biological differences among racialized groups that are understood to produce disparate outcomes among racialized groups.

When “race” (read, “racism”) is used in statistical analyses, the units employed to encode the racialized membership of each individual must be carefully considered. In most analyses, a single unit is used to represent each racialized group. As an example, the code “0” might be used to represent people membered White and the code “1” to represent people membered Black. When “race” serves as a proxy for racism, this approach to coding “race” is particularly problematic because it suggests homogeneity of experiences of racism within each racialized category. By assigning the same value to all members of a racialized group, all members of that group are represented as having identical experiences with and resulting impacts of racism. Homogeneity of experiences with racism, however, does not reflect the diversity of experiences lived by people membered into a given racialized group.

Before racialized identity, or any form of social identity, can be assigned a categorical value (e.g., dummy coded) and used in quantitative analyses, participants in the dataset must be placed into a racialized group. The second QuantCrit tenet cautions researchers to reflect on the process used to member people into a social identity group. There are at least four components to this concern. First, the social identity information used in analysis must align with the research question(s) explored through the analysis.
As previously noted, research questions often focus attention on effects of racism rather than those of racialized categorization. In such cases, rather than simply relying on one’s racialized identity, information about experiences with racialized oppression should be used to form a variable. When it is not possible to collect information about exposure to racialized oppression, attention should focus on the ways in which racialized identity is theorized to influence interactions within a social system. As examples, research conducted by sociologist Ellis Monk found a relationship between skin tone and health outcomes, and sociologist Nancy López and her colleagues’ work on “street race” found that for some individuals, how one is identified by others reveals different patterns from when racialized group membership is defined by how one identifies.59 The choice between relying on the racialized group with which one identifies and the racialized group of which others perceive one to be a member should align with the research questions being explored. As an example, for a study focused on stereotype threat, how one identifies is likely most relevant.60 In contrast, experience with racialized enactments might better align with one’s street race or another indicator of the racialized group with which one is perceived to be membered.

The second consideration addresses the racialized groups into which people are membered. Chapter 2 examined the ways in which racialized categorizations were and continue to be molded. The (re)molding of race has altered the groups into which people are membered. The effects of this (re)molding are evident in the many changes to the racialized groups included in the U.S. census explored in Chapter 2. Considerable variation in the terms applied for racialized group membership is also evident in educational research. Dominique Baker, a professor of education policy, and her colleagues examined a large body of educational research published over a ten-year period and found that educational researchers use a variety of terms for racialized groups. In some cases, different terms are used for what many readers might interpret as the same racialized group. For example, some researchers conducting research on students in the United States will use the term “Asian,” while others use the term “Asian American.” Similarly, some researchers use the term “Black,” while others use “African American.” In addition, the level of specificity of terms within an overarching racialized group varies.
For example, in some cases, only the term “Asian” is employed, while in other cases, racialized groups are separated into smaller groupings, including “Pacific Islander,” “Native Hawaiian,” and, in some cases, specific regions (e.g., “East Asian,” “South Asian,” “Southeast Asian”), nationalities (“Filipino,” “Indian,” “Korean”), or cultural groups (e.g., “Hmong” or “Desi”).61

Drawing on Judith Butler’s work on gender and queer theory, Ezekiel Dixon-Román, a professor of social policy, explores the performative nature of research focused on race and racism. Research that is performative functions “as a discursive practice that enacts or produces that which it names.”62 When collecting information about racialized identity, the racialized groups offered for selection operate performatively to produce racialized identity based on the respondents’ selection(s). That is, the participant is provided a list of identity categories and through their selection becomes what that category label represents. This performative process begins when a researcher designs or selects the instrument employed to collect information about one’s racialized identity. During the design/selection process, the researcher decides which racialized groups to include as well as the names given to those groups. The respondent is then presented with the instrument developer’s set of response options, the reading of which triggers pre-understandings held by the respondent about the meaning of each racialized group offered. Whether the meanings held by the respondent match those of the instrument developer is unknown, nor does it matter once the respondent selects an option. Once selected, the respondent becomes a member of that racialized group. Further, all respondents who make the same selection become representatives of that racialized group, and their collective set of data comes to represent the entirety of the racialized group. In this way, selections about one’s racialized identity produce a group that holds that identity. This performative act is of particular concern when a limited number of racialized groups are offered for selection, thus forcing some respondents to place themselves into a group that comes closest to their identity, but is not their actual identity. Even when a substantial range of identity options are offered, selecting from among identities is similarly performative for those respondents who hold multiple identity affiliations, and either respondents are forced to select only one option or, as considered next, the researcher prioritizes one identity selection or collapses separate identity options into a master category.

The third component focuses on collapsing racialized groups into a master group. In many studies, the sample size for a subset of racialized groups is insufficient to support statistical analyses. As a result, two or more racialized groups are combined, or collapsed, to form a master group. As an example, a study in a school system that predominantly serves students membered White may collapse students membered Black, Latine, Asian, and/or Indigenous/American Indian into a master category termed BIPOC. Similarly, when studying sexual identity in a relatively homogenous organization such as an Evangelical religious congregation, people who identify as lesbian, gay, bisexual, transgender, queer, intersex, and/or asexual may be collapsed into a category labeled LGBTQIA. In each case, the research participant’s selected identity is lost and is replaced by the master category. Collapsing racial or other social category groups to create a master group increases statistical power.
This collapsing, however, also implies homogeneity with respect to experiences and other variables of interest that do not reflect the actual experiences of people membered into the master group. Nichole Garcia and Oscar Mayorga observe “that educational researchers typically collapse ethnic and racial populations into one ‘Hispanic’ category and report on data regarding diverse experiences as though groups are one racially and ethnically homogenous population.”63 Robert Teranishi, a professor of social science and comparative education, similarly observes that people whose heritage traces to nations in Asia are often treated as a homogenous group termed “Asian.” Further, he argues that the myth of the “model minority” is then applied to all people placed into the “Asian” racialized group regardless of their origin within the Asian continent. In his analysis, he disaggregates people into racialized groups representing their region of origin within Asia (e.g., Northeast Asia, Southeast Asia) and by nation (e.g., Japan, Korea, Vietnam) or culture (e.g., Hmong) of origin to show large differences in experiences and outcomes among these more refined racialized groupings.64
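The masking that results from collapsing can be illustrated with a minimal sketch. The records, subgroup labels, and scores below are entirely hypothetical and were invented for this illustration; they do not come from Teranishi's analysis or any other study. The sketch only shows the arithmetic by which a single master-category mean can sit in the middle of subgroup means that differ widely from one another.

```python
from statistics import mean

# Hypothetical, made-up records for illustration only:
# (subgroup label, outcome score). No real data are used.
records = [
    ("Subgroup A", 82), ("Subgroup A", 78), ("Subgroup A", 80),
    ("Subgroup B", 61), ("Subgroup B", 59), ("Subgroup B", 60),
    ("Subgroup C", 70), ("Subgroup C", 72), ("Subgroup C", 68),
]


def collapsed_mean(rows):
    """Mean outcome when every subgroup is folded into one master category."""
    return mean(score for _, score in rows)


def disaggregated_means(rows):
    """Mean outcome computed separately for each subgroup label."""
    by_group = {}
    for label, score in rows:
        by_group.setdefault(label, []).append(score)
    return {label: mean(scores) for label, scores in by_group.items()}


master = collapsed_mean(records)
subgroups = disaggregated_means(records)
# Range of subgroup means that the single master-category mean conceals.
spread = max(subgroups.values()) - min(subgroups.values())

print("master category mean:", master)
for label, value in subgroups.items():
    print(label, "mean:", value)
print("spread hidden by collapsing:", spread)
```

In this invented example the master category reports one number while the subgroup means differ from each other by a wide margin. The sketch says nothing about which groupings are appropriate; that remains a theoretical decision for the researcher.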

Garcia and Mayorga also point to Teranishi’s work as an example of challenges that arise when racialized data is collapsed. Yet they critique Teranishi for failing to critically reflect on these finer grained groupings, writing that

there is one notable limitation in Teranishi’s approach as he does not question the assumptions of the quantitative methods stipulating belief in homogenized groupings. In other words, by disaggregating by one racial group (e.g. Asian Americans), he does not challenge the logic of racial categories themselves but reifies the logic, which creates smaller racial sub groups. Teranishi’s goal is to debunk the model minority myth, but the means in achieving that goal contain a tacit acceptance of the underlying logic of racial categories within the dataset.65

Garcia and Mayorga’s critique elevates the challenge this second tenet of QuantCrit raises: while it is necessary to employ racialized groupings to unveil disparities and racialized injustices, such use holds potential to reinforce acceptance of those groupings as real rather than as a social production. It is here that reflexivity and careful attention to the description of such groupings is necessary to justify use of groupings while also emphasizing the political purposes driving the social construction of those groupings.

The final consideration arises when conducting analyses with a secondary dataset. Because data was collected by another research team, consideration must be given to the racialized groupings present in the dataset and the original researcher’s definition of those groupings. As Garcia and Mayorga explain:

a dataset is constructed with a particular theoretical framework that impacts the entire research design and process (e.g. questionnaire, sampling). When a secondary analysis is conducted, the researcher accepts those theoretical decisions made by the initial principal investigator … [The] first step of a critical race scholar in using secondary analysis is the dismantling of the dataset and all the components used to create it. The dismantling is to identify and name the limitations of the dataset.66

As part of this dismantling, secondary data analysts must consider what the racial identity groupings were intended to represent, the groupings offered, the method used to locate participants within those groupings, the units employed to represent groupings in analyses, and the degree to which each of these decisions artificially implies homogeneity within groupings.

Recognizing the social construction of categories that are used to mark identity and form demographic groupings, this second tenet challenges QuantCrit researchers to consider critically the categories used in analyses, with a specific focus on articulating what is represented by a categorical variable, how this representation is reflected or modeled in quantitative analyses, and the interpretation and meaning given to those analyses.

Using Numbers for Social Justice

The third tenet assures researchers that quantitative analyses can be employed in the pursuit of social justice. As the quote by Zuberi that opens this chapter reflects, this third tenet also reminds quantitative researchers that findings from statistical analyses do not have any inherent value absent theory, interpretation, and discourse. As the fourth and fifth tenets detail, theory is essential for informing the ways in which racialized oppression is understood to operate within a system and, in turn, disparately impact outcomes. As seen in the discussion of tenet 2, theorization must inform the approach taken to represent racial oppression and the types of information collected from research participants about their experiences with oppression. Theory is also essential for informing the interpretation and discussion of findings from statistical analyses. As Zuberi describes, Deriving a statistical model of social relationships requires an elaborate theory that states explicitly and in detail the variables in the system, how these variables are causally interrelated, the functional form of their relationships, and the statistical quality and traits of the error terms.67 In addition, interpreting results of a statistical analysis should be connected to an underlying causal theory. The results themselves do not prove anything beyond the statistical relationship between two or more variables. The connection of these variables in the real world requires a causal theory.68 Applying quantitative methods in pursuit of social justice requires researchers to understand and apply theories about social systems and the role socially constructed categories play in those systems. Absent theory, it is too easy to apply findings from statistical analyses to support narratives of inherent, immutable differences among socially constructed groups. As the Frankfurt School of Critical Theory emphasizes, social injustice is a product of humanity and its construction of history. 
The pursuit of social justice is a political project that requires change to the course of our future history—the construction of an alternate, justice-oriented history. Use of quantitative methods in the pursuit of social justice requires conscious and intentional placement of findings in relation to social theory. In doing so, both the non-neutrality of numbers and the importance of giving voice to numbers become essential.

Numbers Are Not Neutral: Voice Is Vital

Tenets 4 and 5 are closely related and interact with each other to remind quantitative researchers that numbers neither are neutral nor should be left to stand on their own without interpretative explanation. These two ideas are not new. Yet, for some researchers, policymakers, and the general public, numbers are seen as objective, impersonal signifiers of meaning. As Theodore Porter observes, decisions made based on numbers have “the appearance of being fair and impersonal. Scientific objectivity thus provides an answer to a moral demand for impartiality and fairness. Quantification is a way of making decisions without seeming to decide.”69 Yet, Zuberi and Bonilla-Silva argue that the results of a statistical analysis “do not prove anything beyond the numerical relationship between two or more lists of numbers or variables. The connection of these variables in the real world requires a causal theory.”70 Elsewhere, Zuberi reminds us that “our theories of society, not our empirical evidence, guide how we interpret racial [and all other] data.” It is this connection between the numbers produced through a statistical analysis and the theory connecting “lists of numbers” that informs the fourth and fifth tenets of QuantCrit.

Several factors contribute to the non-neutrality of numbers. As we saw in our examination of tenet 2 (categories are neither given nor natural), the choices made by a researcher about the way in which racialized categories are constructed impact the meaning of numbers. Teranishi shows clearly how the numbers representing outcomes experienced by people membered into the master category “Asian” change when disaggregated into smaller racialized groupings representing geographical, national, and cultural groupings within the Asian continent. Choices made when developing a measurement scale for a given construct or the method used to display data can similarly impact the meaning inferred from numbers.
As an example, the scale for the SAT is designed such that 100 points separate scores awarded to people whose test performance differs by one standard deviation. In turn, two groups of test takers whose mean scores differ by one-fifth of a standard deviation will differ by 20 points. For members of the general public, a 20-point score difference may seem relatively large and create the impression there is a meaningful difference between the two groups. Yet, if the scale had established 10 points as separating scores differing by one standard deviation, the resulting two-point difference between the groups would appear relatively minor. Regardless of which scaling decision is made, the impression produced by the resulting numbers is hardly neutral.

Gillborn shows how this non-neutrality extends to the statistical techniques employed to establish performance expectations for schools. As an example, Gillborn describes a modeling technique that considers contextual factors, including the racialized composition of the student body, to predict future performance. Using past data, the amount of growth a student typically experiences is estimated. This past data, however, reflects patterns of growth that are impacted by racism. As a result, the data used to estimate expected growth for students membered into different racialized groups reflect disparities in the rate of growth for students membered Black compared to those membered White. In subsequent years, actual growth is compared to the expected growth. Only when actual growth is lower than predicted by the model is concern raised. Although unstated, built into this modeling technique is an assumption that students membered Black are expected to make less growth than students membered White. When this assumption holds, schools are not flagged for underachievement. Only when students membered Black grow even less than expected compared to their counterparts membered White is the situation elevated to concern.71 Although expected growth may seem like a neutral target, it actually reflects racist assumptions that, even in the best of circumstances, would maintain an educational system that holds lower expectations for students membered Black.

The narrative given to numbers and the discourse employed when communicating numbers have profound impacts on the meaning taken from statistical analyses. Relatively minor modifications to the phrasing employed when presenting findings from a statistical analysis can impact whether a deficit narrative or an anti-racist narrative is conveyed. As Zuberi and Bonilla-Silva describe,

some researchers reach beyond the data when they interpret their statistical results. Data do not tell us a story. We use data to craft a story that comports with our understanding of the world. If we begin with a racially biased view of the world, then we will end with a racially biased view of what the data have to say.72

Presentations of findings that manifest as racial biases are particularly problematic when “race” is used in regression modeling and the findings from such analyses are interpreted as the “effect of race.”

The “Effect of Race” Problem

Paul Holland, a research scientist at the Educational Testing Service, observes that Every day, an economist, a sociologist, or a political scientist “runs” a regression analysis in which some variable denoting the race of the person who is the unit of analysis appears as a predictor (along with other

300 Alternate Lenses for Educational Measurement predictors) of some outcome variable. Every day, the analyst interprets the coefficient of this race variable as the “effect of race” on the outcome variable. Is there a “causal interpretation” to this race effect?73 Tenets 1 and 2 remind us that race is a social construction that is a product of racism. As a social construction, race itself is not causal. Zuberi also reminds us that, “According to the causal theory of manipulative causation, an unalterable characteristic cannot be a cause in inferential statistical models.”74 Rather, differences in outcomes among racialized groups are caused by differences in opportunities, access, and treatment that result from discrimination and bias based on one’s racialized membering. Holland elaborates this point, emphasizing that: causes are experiences that units undergo and not attributes that they possess: No causation without manipulation! … [D]iscrimination is a social phenomenon, one that is learned, taught, and fostered by a social system in which it plays a complex part … It is not just that different groups of people have different experiences, which is what statisticians would call the main effect of RACE. It is the statistical interaction of RACE with an appropriate difference in society that makes the original different experiences into discrimination … If discrimination were removed from society, different groups of people should experience this change differently. If, instead, they all experienced the difference in the same way, I would find it hard to say that there was ever “discrimination” in the first place. We don’t call it discrimination that children can’t vote, but adults can. 
It is discrimination when only some children can vote when they become adults.75 Paul Spector and Michael Brannick, professors of psychology, raise similar concerns about the use of race (and other socially constructed demographic variables) as “control variables” in nonexperimental research. As they describe: The distinguishing feature of control variables is that they are considered extraneous variables that are not linked to the hypotheses and theories being tested. Their role is assumed to be confounding, that is, producing distortions in observed relationships … Rather than being included on the basis of theory, control variables are often entered with limited (or even no) comment, as if the controls have somehow, almost magically, purified the results.76 Later, they argue that the idea that control variables, such as racialized identity, purify estimates of the primary causal variable of interest—such as an

educational intervention—operates as a “methodological urban legend.”77 Their concern is not with the use of control variables per se, but with the rationale for including a control variable and the interpretation given to the estimated “effects” of a control variable. Like Holland, Spector and Brannick argue: Attention to demographics should focus on mechanisms that explain relations with demographics rather than on the demographics themselves, which in many cases are used with little apparent concern for the reasons that demographics might relate to variables of interest. Here, we are not suggesting that demographic variables are unworthy of research. Rather, we are suggesting that they be avoided as mere control variables in theory development and testing.78 Zuberi observes that “as a rule, social statisticians ignore the discussions about the meaning of race and the implications this meaning has for their statistical models.”79 This lack of attention is reflected in many studies published in major educational measurement journals that include “race” in their analyses but fail to define what “race” means or represents in those analyses.80 In turn, coefficients that reflect the relationship between racialized group membership—which typically serves as a proxy for exposure to or experiences with racism—are described as effects. Without careful reflection on the meaning of race, describing “race” as an “effect”—or the “effect of race”—allows the reader to infer that race is a causal contributor to the outcome.
By leaving unsaid that race is a proxy for racism and that, if a causal relationship exists, it is racism that contributes to the production of the outcome, “effects of race” enable the construction of “interpretive narratives of pathology, deficiency, or depravity without any acknowledgement of the cultural construction of race and gender or the ideological system they are byproducts of.”81 Dixon-Román similarly notes a failure to consider the many social, economic, legal, and health-related factors that impact educational achievement, and which vary systematically across racialized categorizations of students.82 This shortcoming in quantitative studies used to inform educational policies and practices (re)produces narratives that falsely attribute “race-based” differences to the members of a racialized group rather than to the racialized social and institutional structures that produced advantage for people membered White by disadvantaging people membered into all other racialized groups. Shaun Harper, a professor of education, analyzed 255 articles on higher education research published in seven peer-reviewed journals. His analysis revealed that, when disparate outcomes across racialized groups were reported, racism was rarely discussed as a possible cause of those outcomes.

Instead, the hardships experienced by people membered not-White were presented as a possible factor that contributed to a disparate outcome. Although hardships may contribute, Harper found that the role racism plays in causing hardships was missing from the authors’ speculations. As Harper observes, “reported in several articles were results that showed how persons of color perceived and experienced campus racial climate differently than their White counterparts. Few [articles], however, considered structural/institutional racism as a logical explanation for such differences.”83 Similarly, Harper quotes a study that recommended institutional researchers identify students at high risk of not completing course work and that they target support services to those students. He then observes, “such recommendations seemed to suggest that only individuals, not racialized campus environment, were in need of institutional attention.”84 Harper attributes the lack of engagement with racism as a cause of outcomes or a need for redress to what he terms “an uncritical race theory” in the framing and conduct of most higher education research. Put simply, Holland asserts that quantitative studies in which racialized categorizations are employed as a predictor variable for a given outcome are fundamentally flawed for the simple reason that a causal interpretation cannot be attributed to a nonrandom categorical variable.85 Similar concerns are raised by Zuberi, who argues that “much that is presented as racial statistics has only helped aggravate the problem of racial conflict by making it appear that race causes people to behave or respond in particular ways.”86 Collectively, analyses of the use of “race” in statistical analyses by Holland, Zuberi, Dixon-Román, and Spector and Brannick shed light on the applicability of each of QuantCrit’s tenets to quantitative research.

Implications of QuantCrit for Educational Measurement

Quantitative methods are often used by educational measurement specialists to examine the impact instructional practices, curricular programs, and educational policies have on student learning as indicated by educational test scores and measures of other outcomes. QuantCrit tenets were developed to support the application of quantitative research in the pursuit of racial and other forms of social justice. As the discussion in the previous section highlights, quantitative methods have been and continue to be applied by educational measurement specialists and other social science researchers in ways that essentialize socially constructed categories and allow research findings to be applied to support deficit narratives that pathologize people and communities membered not-White. The tenets were established not as a direct challenge to quantitative research itself, but as a framework that supports application of quantitative methods for the pursuit of racial justice.

QuantCrit tenets have several implications for the use of quantitative methods by educational measurement specialists and other social science researchers. First, the tenets highlight the importance of applying theory to one’s work. Given the centrality of racism in U.S. social systems, social scientists must incorporate a theory of racism into the theory undergirding their analytic work. Second, because race is a socially constructed category that operates in service to racism, careful reflection on the use of “race” in an analysis is requisite. Social scientists must ask what relevance “race” has in the theory informing their work, what role “race” plays in that theory, and what measures are most appropriate for representing that role. As explored previously, in many studies that employ race as a variable for analysis, it is the effects of racism experienced by members of a racialized group that are of interest rather than group membership itself. As explored in detail in the next chapter, abundant opportunity exists to improve the ways in which race and racism are measured and represented in quantitative analyses. A third implication QuantCrit has for quantitative methods resides in the methods themselves. As detailed in this chapter and explored in greater detail in the next chapter, existing quantitative methods were designed to address specific types of questions and to reflect specific social theories. As an example, multilevel modeling was designed to address relationships among people who are clustered within units. Although current theories of racism and other forms of oppression share some structural elements with the theories that guided the development of existing statistical methods, they introduce other structural arrangements that are not reflected in existing methods.
This absence provides opportunities for quantitative methodologists to advance analytic techniques to better align statistical modeling techniques with these more complex structures within social theories. A final implication focuses on discourse employed by quantitative researchers when presenting interpretations of their findings. In particular, phrasing that attributes relationships among variables must take care to avoid furthering deficit narratives, implying homogeneity among members of racialized (or other socially constructed) groups, and/or essentializing socially constructed categories. Instead, discourse that relates findings to social theories that incorporate the role power, ideology, and structures play in producing the discrimination and bias that differentially and disparately shapes lived experiences positions quantitative research to support the pursuit of racial justice. As the next chapter explores in detail, acknowledging and reflecting intersections of oppression in quantitative research further elevates both the challenges encountered when applying quantitative methods and the opportunities to advance quantitative methods in the pursuit of social justice.
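The concern raised in this chapter about reading the coefficient on a racialized-group indicator as an “effect of race” can be written out with a generic linear model. The notation below is an illustrative assumption and is not taken from the authors cited above: $Y_i$ is an outcome, $T_i$ indicates participation in some educational intervention, $R_i$ is a racialized-group indicator, and $D_i$ is a hypothetical measure of exposure to discrimination.

```latex
% Model 1: racialized group membership entered as a control variable.
% \beta_2 is an adjusted difference between group means; under
% "no causation without manipulation" it is not the effect of a cause.
Y_i = \beta_0 + \beta_1 T_i + \beta_2 R_i + \varepsilon_i

% Model 2 (a sketch): a theorized mechanism is measured and modeled directly.
% D_i is a hypothetical measure, introduced here only for illustration.
Y_i = \gamma_0 + \gamma_1 T_i + \gamma_2 D_i + u_i
```

In the first model, $\beta_2$ summarizes an association with group membership and says nothing about why the association exists. The second model is one way to follow the recommendation to attend to mechanisms; whether $\gamma_2$ supports a causal reading would still depend on the study design and on how $D_i$ is measured.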

Notes

1 Zuberi (2001), p. 144.
2 In fact, Critical Race Theory, as a way of theorizing, is not being taught in K-12 schools. Because the phrase “critical race theory” is used to represent something larger than Critical Race Theory itself, I write the phrase “critical race theory” in lowercase and in quotes when it is used in a nonscholarly manner to represent any discourse that touches upon racism and/or other forms of oppression.
3 Meckler and Natanson (2021).
4 Meckler and Natanson (2021).
5 The first two quotes are from Baragona (2021). The third quote is from two broadcasts of Tucker Carlson Tonight that aired on September 8, 2020, and October 5, 2021.
6 DeSantis (2021).
7 In some cases, these bans and legislative bills specifically mention “critical race theory.” In other cases, they reference the teaching of racism and, in some cases, sexism. These figures are as of July 2022 and are based on data provided by the World Population Review (https://worldpopulationreview.com/state-rankings/states-that-have-banned-critical-race-theory).
8 Bennett (2022).
9 Lyons et al. (2021), p. 41.
10 Vo and French (2021), p. 70.
11 Randall (2021).
12 See Delgado and Stefancic (2017), pp. 6–7, for an extended list of scholars who had important early influences on the development of Critical Race Theory. See also Crenshaw et al. (1995) for a collection of publications that informed and shaped the development of Critical Race Theory.
13 Delgado and Stefancic (2017).
14 Delgado and Stefancic (2017), p. 5.
15 West (1995), p. xi.
16 Crenshaw et al. (1995), p. xiii.
17 Crenshaw et al. (1995), p. xv.
18 Golash-Boza (2016), p. 130.
19 See Doucet (2021), p. 13.
20 Delgado and Stefancic (2017). Note that some Critical Race Theory scholars have framed these tenets somewhat differently. See Solórzano and Yosso (2001). Also note that I have altered the order in which the tenets were originally listed by Delgado and Stefancic (2017). The altered ordering presented here is intended to represent the relationships among the tenets.
21 Mills (1997).
22 Delgado and Stefancic (2017), p. xvi.
23 López et al. (2018a).
24 Taylor (2016), p. 3.
25 Bell (1995), p. 22.
26 Bell (1995).
27 Brief for the United States as Amicus Curiae: Oliver Brown, et al. v. Board of Education of Topeka, 347 U.S. 483 (1954).
28 Quoted in a speech by Ruth Bader Ginsburg at the Centre for Human Rights at the University of Pretoria, South Africa on February 7, 2006. See https://www.supremecourt.gov/publicinfo/speeches/viewspeech/sp_02-07a-06
29 Bell (1976), p. 12, footnote 31.
30 Bell (1995), p. 23, see footnote 37.
31 Collins (2019), p. 164 quoting Wells-Barnett (2002), p. 168.
32 Taylor (2016).
33 Taylor (2016), p. 4 quoting Morrison (1992), p. 17.
34 Taylor (2016), p. 4. See also Solórzano and Yosso (2002).
35 Bell (2016), p. 38.
36 Farber (1994) quoted in Bell (2016), p. 39.
37 Bernal and Villalpando (2016), p. 82.
38 Solórzano and Yosso (2002), p. 32.
39 See Gillborn (2010) for an example of a fictionalized counter-story.
40 Bell (2018/1992).
41 Solórzano and Yosso (2001).
42 Huber (2008).
43 Gillborn (2010), p. 253.
44 Lorde (1984), p. 111. For examples of the use of Lorde’s quote to either advocate against the use of existing statistical methods or to implore the development of new methods, see Huber (2008), Bowleg (2008), and Smith (2012).
45 See Zuberi (2001) for a critique of early uses of statistical methods to support racialized deficit narratives.
46 Sablan (2019), p. 182.
47 For example, Cokley and Awad (2013) title their article “In Defense of Quantitative Methods: Using the ‘Master’s Tools’ to Promote Social Justice.” For additional examples, see Garcia and Mayorga (2018), Bailey (2007), and Gillborn (2010).
48 Gillborn et al. (2018, p. 169) note that the phrase “CritQuant” predates that of QuantCrit, and credit “Earnestyne Sullivan and colleagues [who] have used the term ‘CritQuant’ to describe an approach to quantitative policy analyses that seek to embody two ‘CRT tenets,’ namely the ‘permanence of racism and critique of liberalism.’”
49 Stage and Wells (2014), p. 1, italics in the original.
50 Stage and Wells (2014), pp. 2–3.
51 This list of QuantCrit tenets is adapted from Garcia et al. (2018), p. 151 and Gillborn et al. (2018), p. 169. Their lists of QuantCrit tenets present the tenets in a different order.
52 See for example Cook and Campbell (1979), Shadish, Cook, and Campbell (2002), Tukey (1977), Tufte (1985), Tufte, Goeler, and Benson (1990), and Tufte, McKay, Christian, and Matey (1998).
53 Zuberi (2001). See also Zuberi and Bonilla-Silva (2008) for a collection of papers that examine applications of quantitative methods and the study of race and racism.
54 Garcia et al. (2018); Fine (1991); Embrick and Henricks (2015); and Solórzano et al. (2005).
55 Zuberi (2001), p. 138.
56 Taylor (2016), p. 3.
57 Gillborn et al. (2018); Dixon-Román (2017).
58 Russell (in press).
59 Monk (2014); López et al. (2018b).
60 Steele and Aronson (1998); Steele (2003/2018).
61 Baker et al. (2022).
62 Dixon-Román (2017), p. 60 quoting Butler (1993), p. 13.
63 Garcia and Mayorga (2018), p. 248.
64 Teranishi (2007).
65 Garcia and Mayorga (2018), pp. 234–235.
66 Garcia and Mayorga (2018), p. 246.
67 Zuberi (2001), pp. 123–124.
68 Zuberi (2001), p. 104.
69 Porter (1995), p. 8.
70 Zuberi and Bonilla-Silva (2008), p. 9.
71 Gillborn et al. (2018).
72 Zuberi and Bonilla-Silva (2008), p. 7.
73 Zuberi and Bonilla-Silva (2008), p. 95.
74 Zuberi (2001), p. 130.
75 Holland (2003), pp. 8–11, italics in the original.
76 Spector and Brannick (2011), p. 288.
77 Spector and Brannick (2011), p. 288.
78 Spector and Brannick (2011), p. 297.
79 Zuberi (2001), p. 123.
80 Russell (in press).
81 Dixon-Román (2017), p. 80.
82 Dixon-Román (2017).
83 Harper (2012), p. 17.
84 Harper (2012), p. 18.
85 Holland (2003, 2008).
86 Zuberi (2001), p. 120.

References

Bailey, A. (2007). Strategic ignorance. In Race and Epistemologies of Ignorance. SUNY Press.
Baker, D., Karly, S., Ford, S. and Johnston-Guerrero, M. (2022). Racial Category Usage in Education Research: Examining the Publications from AERA Journals (EdWorkingPaper: 22-596). Retrieved from Annenberg Institute at Brown University. https://doi.org/10.26300/r9dg-kd13
Baragona, J. (2021). Tucker admits he’s “never figured out what critical race theory is”. Daily Beast, November 3, https://www.thedailybeast.com/tucker-carlson-admits-hes-never-figured-out-what-critical-race-theory-is?ref=scroll
Bell, D. (1995). Brown v. Board of Education and the interest convergence dilemma. In Critical Race Theory: The Key Writings that Formed the Movement. The New Press.
Bell, D. (1976). Racial remediation: An historical perspective on current conditions. Notre Dame Law, 52, 5.
Bell, D. (2016). Who’s afraid of critical race theory? In E. Taylor, D. Gillborn, & G. Ladson-Billings (Eds.), Foundations of Critical Race Theory in Education (2nd ed.). Routledge.
Bell, D. (2018). Faces at the Bottom of the Well: The Permanence of Racism. Basic Books.
Bennett, R.E. (2022). The good side of COVID-19. Educational Measurement: Issues and Practice, 41(1), 61–63.
Bernal, D.D. & Villalpando, O. (2016). An apartheid of knowledge in academia: The struggle over the “legitimate” knowledge of faculty of color. In Foundations of Critical Race Theory in Education (2nd ed.). Routledge.
Bowleg, L. (2008). When Black+lesbian+woman≠ Black lesbian woman: The methodological challenges of qualitative and quantitative intersectionality research. Sex Roles, 59(5), 312–325.
Butler, J. (1993). Bodies that Matter: On the Discursive Limits of Sex. Routledge.
Cokley, K. & Awad, G.H. (2013). In defense of quantitative methods: Using the “master’s tools” to promote social justice. Journal for Social Action in Counseling & Psychology, 5(2), 26–41.
Collins, P.H. (2019). Intersectionality as critical social theory. In Intersectionality as Critical Social Theory. Duke University Press.
Cook, T.D. & Campbell, D.T. (1979). Quasi-Experimentation: Design & Analysis Issues for Field Settings. Houghton Mifflin.
Crenshaw, K., Gotanda, N., Peller, G. & Thomas, K. (1995). Critical Race Theory: The Key Writings that Formed the Movement. The New Press.
Delgado, R. & Stefancic, J. (2017). Critical Race Theory: An Introduction. New York University Press.
DeSantis, R. (2021). Governor DeSantis announces legislative proposal to stop W.O.K.E activism and critical race theory in schools and corporations. New Release by Staff dated December 21. Available at https://www.flgov.com/2021/12/15/governor-desantis-announces-legislative-proposal-to-stop-w-o-k-e-activism-and-critical-race-theory-in-schools-and-corporations/
Dixon-Román, E.J. (2017). Inheriting Possibility: Social Reproduction and Quantification in Education. University of Minnesota Press.
Doucet, F. (2021). Identifying and testing strategies to improve the use of antiracist research evidence through critical race lenses. WT Grant Foundation. Available at: https://wtgrantfoundation.org/wp-content/uploads/2021/01/Doucet_Digest_Issue-6.pdf
Embrick, D.G. & Henricks, K. (2015). Two-Faced-isms: Racism at work and how race discourse shapes classtalk and gendertalk. Language Sciences, 52, 165–175.
Farber, D.A. (1994). The outmoded debate over affirmative action. California Law Review, 82, 893.
Fine, M. (1991). Framing Dropouts: Notes on the Politics of an Urban Public High School. State University of New York Press.
Garcia, N.M., López, N. & Vélez, V.N. (2018). QuantCrit: Rectifying quantitative methods through critical race theory. Race Ethnicity and Education, 21(2), 149–157.
Garcia, N.M. & Mayorga, O.J. (2018). The threat of unexamined secondary data: A critical race transformative convergent mixed methods. Race Ethnicity and Education, 21(2), 231–252.
Gillborn, D. (2010). The colour of numbers: Surveys, statistics and deficit-thinking about race and class. Journal of Education Policy, 25(2), 253–276.
Gillborn, D., Warmington, P. & Demack, S. (2018). QuantCrit: education, policy, “Big Data” and principles for a critical race theory of statistics. Race Ethnicity and Education, 21(2), 158–179.
Golash-Boza, T. (2016). A critical and comprehensive sociological theory of race and racism. Sociology of Race and Ethnicity, 2(2), 129–141.
Harper, S.R. (2012). Race without racism: How higher education researchers minimize racist institutional norms. The Review of Higher Education, 36(1), 9–29.
Holland, P.W. (2003). Causation and race. ETS Research Report Series, 2003(1), i–21.
Holland, P.W. (2008). Causation and race. In White Logic, White Methods: Racism and Methodology. Rowman & Littlefield.
Huber, L.P. (2008). Building critical race methodologies in educational research: A research note on critical race testimonio. FIU Law Review, 4, 159.
López, N., Erwin, C., Binder, M. & Chavez, M.J. (2018a). Making the invisible visible: Advancing quantitative methods in higher education using critical race theory and intersectionality. Race Ethnicity and Education, 21(2), 180–207.
López, N., Vargas, E., Juarez, M., Cacari-Stone, L. & Bettez, S. (2018b). What’s your “street race”? Leveraging multidimensional measures of race and intersectionality for examining physical and mental health status among Latinxs. Sociology of Race and Ethnicity, 4(1), 49–66.
Lorde, A. (1984). Sister Outsider: Essays and Speeches. Cornell University Press.
Lyons, S., Hinds, F. & Poggio, J. (2021). Commentary: Evolution of equity perspectives on higher education admissions testing: A Call for increased critical consciousness. Educational Measurement: Issues and Practice, 40(4), 41–43.
Meckler, L. & Natanson, H. (2021). As schools expand racial equity work, conservatives see a new threat in critical race theory. Washington Post, May 3.
Mills, C.W. (1997/2014). The Racial Contract. Cornell University Press.
Monk, E.P. (2014). Skin tone stratification among Black Americans, 2001–2003. Social Forces, 92(4), 1313–1337.
Morrison, T. (1992). Playing in the Dark: Whiteness in the Literary Imagination. Harvard University Press.
Porter, T.M. (1995). Trust in Numbers. Princeton University Press.
Randall, J. (2021). “Color-Neutral” is not a thing: Redefining construct definition and representation through a justice-oriented critical antiracist lens. Educational Measurement: Issues and Practice, 40(4), 82–90.
Russell, M. (in press). Shifting educational measurement from an agent of systemic racism to an anti-racist endeavor. Applied Measurement in Education.
Sablan, J.R. (2019). Can you really measure that? Combining critical race theory and quantitative methods. American Educational Research Journal, 56(1), 178–203.
Shadish, W.R., Cook, T.D. & Campbell, D.T. (2002). Experimental and Quasi-experimental Designs for Generalized Causal Inference. Houghton, Mifflin and Company.
Smith, L.T. (2012). Decolonizing Methodologies: Research and Indigenous Peoples. Zed Books.
Solórzano, D.G. & Yosso, T.J. (2001). Critical race and LatCrit theory and method: Counter-storytelling. International Journal of Qualitative Studies in Education, 14(4), 471–495.
Solórzano, D.G., Villalpando, O. & Oseguera, L. (2005). Educational inequities and Latina/o undergraduate students in the United States: A critical race analysis of their educational progress. Journal of Hispanic Higher Education, 4(3), 272–294.
Solórzano, D.G. & Yosso, T.J. (2002). Critical race methodology: Counter-storytelling as an analytical framework for education research. Qualitative Inquiry, 8(1), 23–44.
Spector, P.E. & Brannick, M.T. (2011). Methodological urban legends: The misuse of statistical control variables. Organizational Research Methods, 14(2), 287–305.
Stage, F.K. & Wells, R.S. (2014). New Scholarship in Critical Quantitative Research, Part 1: Studying Institutions and People in Context: New Directions for Institutional Research. John Wiley & Sons.
Steele, C. (2003/2018). Stereotype threat and African-American student achievement. In Social Stratification. Routledge.
Steele, C.M. & Aronson, J. (1998). Stereotype threat and the test performance of academically successful African Americans. In The Black–White Test Score Gap. Brookings Institution Press.
Taylor, E. (2016). The foundations of critical race theory in education: an introduction. In Foundations of Critical Race Theory in Education (2nd ed.). Routledge.
Teranishi, R.T. (2007). Race, ethnicity, and higher education policy: The use of critical quantitative research. New Directions for Institutional Research, 2007(133), 37–49.
Tufte, E.R. (1985). The visual display of quantitative information. The Journal for Healthcare Quality, 7(3), 15.
Tufte, E.R., Goeler, N.H. & Benson, R. (1990). Envisioning Information. Graphics Press.
Tufte, E.R., McKay, S.R., Christian, W. & Matey, J.R. (1998). Visual Explanations: Images and Quantities, Evidence and Narrative. Graphics Press.
Tukey, J.W. (1977). Exploratory Data Analysis. Pearson.
Vo, T.T. & French, B.F. (2021). An ecological framework for item responding within the context of a youth risk and needs assessment. Educational Measurement: Issues and Practice, 40(3), 64–72.
Wells-Barnett, I.B. (2002). On Lynchings. Humanity Books.
West, C. (1995). Foreword. In Critical Race Theory: The Key Writings that Formed the Movement. The New Press.
Zuberi, T. (2001). Thicker Than Blood: How Racial Statistics Lie. University of Minnesota Press.
Zuberi, T. & Bonilla-Silva, E. (2008). White Logic, White Methods: Racism and Methodology. Rowman & Littlefield Publishers.

12 Intersectionality Theory

I find I am constantly being encouraged to pluck out some one aspect of myself and present this as the meaningful whole, eclipsing or denying the other parts of self.1

DOI: 10.4324/9781003228141-16

The Civil Rights Act of 1964 is hailed as landmark legislation designed to address the many discriminatory practices that disenfranchised people based on their racialized identity, sex, religion, or national origin. Title VII of the Act focused specifically on employment, stating, “It shall be an unlawful employment practice for an employer to fail or refuse to hire or to discharge any individual … because of such individual’s race, color, sex or national origin.”2 Twelve years later, General Motors Corporation (GM) initiated layoffs based on seniority that resulted in the discharge of dozens of employees hired after 1970. Among the discharged employees were five women membered Black who brought suit against GM alleging discrimination against Black women.3 The plaintiffs’ core argument focused on the fact that, prior to the 1964 Civil Rights Act, GM did not hire Black women. Further, between 1964 and 1970, GM had hired only one Black woman as a janitor. As a result, GM’s “last hired–first fired” layoff tactic led to the release of a disproportionate number of Black women—an outcome the plaintiffs argued was discriminatory and violated Title VII of the Civil Rights Act. Although the facts of the case were not disputed, the court found that the plaintiffs’ case did not establish a violation of Title VII. The court’s logic focused laser-like attention on the wording of the statute, which prohibited discrimination based on race or gender. Absent from the statute was any mention of race and gender. As the court wrote: Plaintiffs have failed to cite any decisions which have stated that Black women are a special class to be protected from discrimination … they should not be allowed to combine statutory remedies to create a new “super-remedy” … this lawsuit must be examined to see if it states a cause of action for race discrimination, sex discrimination or alternatively either, but not a combination of both.4 In effect, the court ruled that discrimination against women and discrimination against people membered Black are both prohibited by Title VII, but discrimination against Black women was tolerated. The court’s insistence on treating racialized and gender identity as unique and separate, and thus refusing to recognize the ways in which racialized and gendered oppression intersect to uniquely impact Black women, echoes Audre Lorde’s sentiments as expressed in the quote that opens this chapter, as well as words spoken more than a century earlier by Sojourner Truth, a former slave turned abolitionist and women’s rights activist: That man over there says that women need to be helped into carriages, and lifted over ditches, and to have the best place everywhere. Nobody ever helps me into carriages, or over mud-puddles, or gives me any best place! And ain’t I a woman? … Then that little man in black there, he says women can’t have as much rights as men, ’cause Christ wasn’t a woman! Where did your Christ come from? Where did your Christ come from? From God and a woman! Man had nothing to do with Him.5

It is human nature to categorize. Psychologists and cognitive scientists theorize that categorization allows us to engage with our world without becoming overwhelmed by its vastness. Absent categorization, our sensory perceptions would inundate our brains with stimuli and paralyze our cognitive processes. Categorization allows us to organize the objects, ideas, and experiences we encounter in a manner that simplifies our understanding of and encounters with the world.
Categorization systems also position us to prioritize information—to elevate information prior experience categorizes as important and subordinate, or even ignore, other information prior experience indicates is not important. This drive to categorize and establish hierarchy is evident in both the Great Chain of Being and the classification system developed by Linnaeus that gave birth and order to the concept of race in the 18th century. The positivist component of the White Racial Frame further shapes our understanding of categories as essential, unique, and universal. Until very recently, and perhaps still for most people, we are conditioned to categorize humans as “man” or “woman,” “boy” or “girl.”6 In fact, the first question many people ask when they learn of the birth of a baby is, “Is it a boy or a girl?”7 The positivist element of the White Racial Frame teaches us that gender is separate from racialized identity, and both are separate from economic status, sexual orientation, nationality, age, and so on. As a result, when we

consider discrimination, oppression, and advantage, our focus directs attention separately to each category. Further, our normative expectations and the narratives that reinforce these expectations are developed separately within each unique category. And our need to establish hierarchy often leads us to elevate one identity or form of oppression over others. Like Karl Marx, some of us see class as the overarching issue that produces inequality. Pointing to the patriarchal structure of our society, others see gender as the primary driver of oppression. Citing our nation’s long history of slavery, Jim Crow, and other forms of racialized oppression, others elevate race and racism as in most need of remedy. Troubled by the singularity of our focus, legal scholar Kimberlé Crenshaw gave name to our struggle to recognize identity and oppression as multiple. Through critical analysis of three court cases that treated gender and racialized identity separately, Crenshaw introduced the term intersectionality as a tool for understanding identity, social position, oppression, and the impacts of oppression as a multiplicity.

Multiplicity and Intersectionality

The concept of multiplicity challenges the single-discrete-dominant-axis orientation to categorization promoted by the White Racial Frame’s positivist logics. These positivist logics condition us to identify main effects and gravitate towards one form of identity or oppression as dominant. “I am a man first or White first or a husband first or a Christian first.” You are either a “man” or a “woman,” “White” or a “person of color,” “married” or “single,” “Christian” or “heathen.” When the multiple aspects of our identity are considered, they are understood as additive.
“I am a man first, and also White and married.” This additive conception is reflected in the statistical models used first to separately estimate the “effects” of gender, race, marital status, or religious faith on a given outcome and then to add up the weighted effects to predict an outcome. The “variable” with the largest weight is often interpreted as the variable having the largest effect. As explored in greater detail throughout this chapter, multiplicity confronts this single-axis additive conception and is core to intersectionality’s challenge to White Racial Framed analyses of relationships between identity, oppression, and the many policies, practices, and outcomes of interest to the educational measurement community. But more than being core within intersectionality, multiplicity is foundational to a deep understanding of intersectionality itself. Crenshaw introduced the term intersectionality in 1989, describing it as a provisional concept. As a provisional concept, intersectionality challenges and aims to “transform our thinking, … institutionalized practices, our current axiomatic assumptions, cognitive habits, and unreflective premises” to point toward a new way

of understanding and exploring identity and oppression.8 As a provisional concept, intersectionality “is meant to get us to thinking about how ‘we’ think, as opposed to constituting an answer to a problem ‘we’ already know and understand” and to open “the linguistic space for the kind of communicative exchange that is violently foreclosed by essentialist constructions of identities, oppression(s), and experiences … that fragment concepts and discourses of oppression [about] ‘multiply oppressed’ groups.”9 In the time since Crenshaw introduced the term, intersectionality itself has become multiplicative. Intersectionality has developed into a critical social theory, a paradigm, a framework, and a heuristic for quantitative analyses.10 Intersectionality focuses both on identity and positions produced by social structures. Intersectionality has come to challenge single-dominant-axis conceptions of identity and social positions and the discrete (often dichotomous) categorizations of identity and social positions. As Vivian May, a professor of women’s and gender studies, describes, it is the multiple ways in which intersectionality functions to “unsettle dominant imaginaries” that makes intersectionality “for many, hard to grasp or hold on to—the force of conventional ways of thinking (via categorical, either/or logics, for instance) keeps it just out of reach.”11

The remainder of this chapter is divided into two broad sections. The first section examines intersectionality as a provisional concept and separately considers intersectionality as a critical social theory, as a paradigm, and as a heuristic for quantitative analyses. The second section focuses on intersectionality and quantitative analyses with specific attention to the approaches scholars have taken to apply an intersectional lens to their analyses and the challenges intersectionality presents for current analytic methods. 
Like intersectionality itself, the ideas explored in the second part of this chapter are provisional and open opportunities for the field of educational measurement to refine existing methods and develop new methods that are more fully aligned with the theory of intersectionality.

Intersectionality as a Provisional Concept

Kimberlé Crenshaw introduced the term intersectionality in response to the failure of single forms (or axes) of oppression to account for the unique form(s) of oppression experienced by Black women. As Crenshaw observed, addressing gendered oppression was advantageous for White women but left Black women vulnerable to racialized oppression. Addressing racial oppression was beneficial for Black men, but left Black women exposed to gendered oppression. Reflecting on Crenshaw’s legal analyses, sociologist Patricia Collins observes that “racism and sexism not only fostered social inequalities, they marginalized individuals and groups that did not fit comfortably within race-only, gender-only mono-categorical frameworks.

Women of color remained politically marginalized within both movements.”12 Similarly, political theorist Anna Carastathis argues that the failure of either action to address the experiences of Black women occurs because “[n]either agenda is constructed around the experiences, needs, or political visions of women of color; to the extent that antiracism reproduces patriarchy and feminism reproduces racism, women of color are asked to choose between two inadequate analyses, each of which constitutes a denial of a fundamental dimension of our subordination.”13 And Crenshaw observes, “Because of their intersectional identity as both women and of color within discourses that are shaped to respond to one or the other, women of color are marginalized within both.”14 These observations are not new—Sojourner Truth, Julia Cooper, and Frances Beale among others have long highlighted the double/multiple jeopardy of oppression experienced by Black women.15 Meeting periodically over a three-year period during the mid-1970s, a group of Black feminists known as the Combahee River Collective “asserted that it is impossible to separate racial, sexual, and class-based oppression because these systems are experienced simultaneously.”16 By giving this simultaneity a name, Crenshaw intensified focus on “the structural convergence among intersecting systems of power that created blind spots in antiracist and feminist activism.”17 Through her conception of intersecting forms of oppression, Collins also revealed the challenge encountered when one is “situated between two (or more) political movements … pursuing mutually exclusive and often conflicting agendas that conspire to marginalize and fragment experiences of intermeshed oppressions.”18 For Collins, intersectionality reveals ways in which multiple-axis analysis of oppression/discrimination makes visible the experiences of Black women. 
As a way of understanding the production of lived experiences and a tool for analyzing lived experience, intersectionality emerged as both a critical social theory and an analytic heuristic.

Intersectionality as a Critical Theory

As explored in Chapter 10, a critical social theory explains an existing social world, “offering interpretations for how and why things are the way they are … Social theories justify or challenge existing social orders … and both explains and criticizes existing social inequalities, with an eye toward creating possibilities for change.”19 Although initially applied to explain the unique experiences of Black women produced through multiple systems of oppression, intersectionality as a critical social theory has been and continues to be applied to explain and envision possibilities for diverse subpopulations of our society shaped through the intersections of various forms of oppression. As just a few examples, Gloria Anzaldúa focused on the unique experiences of what she termed the new Mestiza—Latine women who occupy borderlands between the United States and Mexico where they confront oppression as Latine immigrant women who neither fit nor are fully excluded from the culture of the United States or Mexico.20 Gerry Veenstra examined inequality in health care through the intersections of gender, racialized, class, and sexual orientation.21 Doug Meyer explored the ways in which violence was differentially experienced at the intersection of race and sexual orientation.22 Through these and many more analyses, scholars have revealed the ways in which lived experiences and outcomes are disparately produced through the intersections of various forms of oppression. Collins argues, however, that a critical theory requires more than criticism. To be critical, an entity must also be “essential, needed, or critical for something to happen.”23 As an example, Collins points to water as being essential for sustaining life. The aforementioned examples engage both aspects of a critical theory. Through their various analyses, they are critical of single-axis analyses because such analyses obscure wide variation in the experiences with oppression of people forming subgroups within the single axis. At the same time, they evidence the need to apply an intersectional lens to develop a full understanding of disparities and the causes of those disparities. A third aspect of critical social theory is political and focuses on change. As Collins observes, critical theories are not simply those that oppose a given condition or practice. 
“Rather, they also have to make the case for what they are for, and why that matters now.”24 It is not enough to reveal differences in outcomes and lived experiences produced by the intersections of oppression; revealing these differences must be accompanied by calls for change and point to specific mechanisms within a social system where change can be executed. Change can be reformist in nature, leaving the social system largely unchanged—altering a practice, such as the questions asked during medical examinations, the response of a police officer in specific situations, tightening of policies within an institution, and so on. Or change can be transformative—elevating disenfranchised people to positions of power, replacing curriculum to alter narratives and alter dominant ideology, modifying mechanisms for funding social services (e.g., schools and workforce retraining) to increase opportunities in communities impoverished by existing practices.25 While reformist projects are fundamental for producing impacts in the near term on the lived experiences of people most affected by intersecting forms of oppression, transformative change holds greater promise for sustainable, large-scale impact that alters lived experiences for current and future generations. Regardless of the type of change advocated, engaging intersectionality as a critical social theory must unveil opportunities for change.

Identity and Social Position

Foundational to intersectionality is the role identity and social position play in the production of lived experiences. The positivist component of the White Racial Frame locates identity within the individual and assumes equivalence between demographic categorizations and identity. Equating demographic categorization with identity ignores the social construction of demographic characteristics, such as race and gender, and treats these socially constructed characteristics as “real” traits inherent in an individual. This framing permits identity to be treated as a discrete, stable, individual trait. As an individual trait, the positivist position understands identity as a key influence on one’s view of and interactions with the world. As we saw in Chapter 11, treating identity as an individual trait allows statistical analyses framed from this positivist perspective to model specific aspects of identity (e.g., gender, racialized, social economic status, etc.) as “causal factors” that have “effects” on outcomes of interest.26 Intersectionality both challenges the conventional conception of identity and expands it in at least three ways. First, identity is understood as coconstituted by the individual, the social position(s) held by the individual, and the social structures that produce those social positions. Second, identity is understood as fluid rather than permanent. Third, the role identity plays in shaping lived experiences is divided between how one identifies and how one is identified. As we saw in Chapters 1 and 2, race was socially constructed and molded over centuries to support and maintain a system of racism. This system enacted policies and practices that structure society in ways that produce benefit for people membered into the dominant racialized group and manufacture obstacles for people membered into subordinate racialized groups. 
Redlining, segregated schools (and associated funding structures), voting laws, and social security regulations, among others, structured society along racialized lines, which in turn influenced demographic characteristics such as economic status, educational credentials, and residential locale. Although this structuring did not impact all members of a racialized group in the same manner, it produced patterns that differed among racialized groups. Concepts specific to gender and sexuality are similarly socially constructed and reinforced through social structures. Although more recent understandings of both gender and sexuality challenge the dichotomous categorizations of the past, for many people, gender is structured as male and female, and sexuality is understood dichotomously as “heterosexual” or “homosexual.” These understandings have been reinforced through numerous policies and practices. The Catholic Church reinforces the dual conception of gender, permitting males and prohibiting females to the priesthood. Until relatively

recently, the U.S. military similarly permitted males to and prohibited females from combat roles. And many industries restricted a subset of jobs to males, implicitly limiting employment opportunities for women. Until the early 2000s, marriage laws, tax policy, and health insurance practices similarly distinguished hetero- and homosexuality, permitting marriage between heterosexual partners, granting tax breaks to married couples, and extending health care coverage to marital partners. These policies and practices produced secondary impacts on social economic status, health status, and educational credentials. In some cases, they also limited the social groups within which individuals felt safe engaging. As psychologists Leah Warner and Stephanie Shields describe, “intersectionality theory emphasizes the importance of understanding identity within a social structural context. That is, rather than being a collection of personality traits or individualized experiences, identity is informed by institutional, political, and societal structures.”27 Intersectionality also recognizes identity as fluid, ever-evolving, and context-bound. This conception of identity as fluid contrasts with “commonsense understandings of identity [that] see a person as basically carrying one essential identity from one social setting to the next.”28 Rather than a permanent trait, sociologist Stuart Hall recognizes identity as “a constantly shifting process of positioning … identity is always a never-completed process of becoming—a process of shifting identifications, rather than a singular, complete, finished state of being.”29 The facets of one’s identity that are shaped and at play vary by setting. Reflecting on my own identity, my educational and professional status are elevated when I am located in an academic setting but are minimized when interacting as a coach on my daughter’s soccer team. 
Similarly, my identity as a White male of moderate wealth is elevated when interacting in financial institutions, but de-emphasized when working in diverse working-class settings. Similarly, over time, facets of my identity have shifted. As a young man, I was less conscious of the patriarchal, heteronormative, White dominance in our society. As a result, those aspects of my identity were nearly always at play. Having developed a deeper understanding of our societal structuring, dominant ideology, and the oppression of nondominant members of our society, my heterosexual identity is subdued, and I work to check my White maleness. As psychologists Nicole Else-Quest and Janet Shibley Hyde write, As context is fundamental to intersectionality, we caution quantitative researchers seeking to apply an intersectional approach to be mindful of the need for context at all stages of their research … as researchers analyze social categories, they should attend to systematic variations among members of those categories.30

The relationship between identity and lived experience is also influenced both by how one identifies and how one is identified. As Collins observes, “people shape their social worlds through the actions they take as well as the experiences that their actions engender.”31 One’s identity plays an essential role in shaping actions and producing experiences. But experiences in a social context are also influenced by those with whom one interacts. The facets of a person’s identity that are at play in a given social context influence how that person engages in that context. However, how the person is identified by those with whom they interact also impacts the experience produced. As Nancy López and her colleagues explore, how one is identified is not always consistent with how one identifies. It is this potential mismatch that inspired the concept of “street race”—how one is identified by others. Although research on street race is relatively new, early findings indicate that, for some individuals, street race is an important factor that affects experience. Moreover, the influence street race has on experience varies by context.32 Together, the influence of social structure on social position, the fluidity of identity, and distinguishing how one identifies from how one is identified operate to influence lived experiences. 
In this way, intersectionality expands focus to “consider the intersection of multiple-level social-structural factors as well as the intersection between multiple microlevel and macrolevel factors.”33 This expanded focus requires consideration of ways in which individual identities and characteristics interact with perceived identities, assumptions, and resulting behaviors, as well as structural elements that shape the contexts in which these interactions occur.34 This multilevel focus is similar to Urie Bronfenbrenner’s ecological systems theory, which recognized the role that micro- (individual), meso- (locale), and macro- (societal) level factors have in shaping human development. As Carastathis describes, oppression operates and is resisted at “the micro-level of personal biography or lived experience; the meso-level of social groups, communities, and cultures; and the macro-level of social institutions.”35 Given the dominant role structure plays in shaping lived experiences, Lisa Bowleg, a professor of applied social psychology, encourages research conducted through an intersectional lens to “privileg[e] a focus on structural-level factors rather than an exclusive focus on the individual [as it] is likely to facilitate the development of structural-level interventions more likely to affect the ‘fundamental causes’ (e.g., poverty, social discrimination) of social inequalities.”36

Intersectionality Metaphors

At first, the concept of intersectionality may feel easy to grasp—lived experiences are influenced by the combined effects of multiple forms of oppression and advantage. A White male is advantaged through both his whiteness and his gendered status. A White woman is similarly advantaged through her whiteness but disadvantaged through her gendered status. A Black man is advantaged through his gender, yet disadvantaged through his blackness. And a Black woman is disadvantaged through both her blackness and her gender. Conceiving intersectionality as the combination of various forms of oppression and advantage, however, is not consistent with Intersectionality Theory. It is not the combined or additive effects of oppression and advantage that produce lived experiences. It is the intersection, the space between multiple forms of oppression and advantage, that produces these experiences. In this conception of intersectional identity, each combination of oppression and advantage acts as a unique and independent form of oppression or advantage. The oppression experienced by Black women is unique to a woman membered Black. The oppression experienced by White women is unique to a woman membered White. In this formulation, gendered and racialized oppression and advantage reconfigure into two unique forms of oppression, one form specific to Black women and the second form specific to White women. For those of us (myself included) conditioned to think of each form of oppression or advantage as operating along a single dimension or axis, the multiplicity of identity is a slippery concept to grasp. Man and woman are understood as gendered productions. Black, White, Asian, and Indigenous are racialized productions. When combined as Black women or White men, these terms are heard separately—there is a female, and that female is membered Black; there is a man, and that man is membered White. For many of us, our conditioning leads us to conceive gender as a single unique axis and racialized identity as a separate single unique axis. To aid in reconceptualizing identity and oppression as multiplicitous, intersectionality scholars have introduced several metaphors. 
In her initial writings, Crenshaw employed the metaphor of an intersection. When an intersection is envisioned as two routes crossing each other—Highway 104 intersecting with Route 7—this metaphor conjures images of multiple axes coming together at a crossing point—an intersection. In this metaphor, however, the static nature of roadways depicts the intersection of identities as the addition of one form of oppression or advantage with another form of oppression or advantage. The pavement of Highway 104 is laid upon that of Route 7, producing an additive effect of oppression or advantage. Highway 104 is gendered oppression/advantage. Route 7 is racialized oppression/advantage. Their intersection is the combined effects of oppression experienced by a Black woman, a Black man, or a White woman, or by the advantage of the White man. An alternate conception of an intersection, however, shifts focus from the roadway itself to the vehicles traveling along each route. In this conception,

the vehicle traveling along Highway 104 represents gendered oppression/advantage, and the vehicle traveling on Route 7 represents racialized oppression/advantage. At the intersection, two vehicles collide, bringing them together. The force of the collision carries passengers off in a new direction, one defined by the complex relationship between the initial directions and speeds of travel, as well as the mass of each vehicle. Rather than laying one form of oppression on top of another, the intersecting of moving vehicles produces an entirely new direction of travel. This new direction comes closer to representing the multiplicitous conception of intersectional identity. Yet, this metaphor still relies on separate vehicles representing distinct forms of oppression/advantage, which collide at an intersection. Thus, by starting with two separate and distinct axes (vehicles) of identity, this metaphor does not fully reflect the multiplicity of intersectionality. Elsewhere in her writings, Crenshaw employs the metaphor of a basement to convey the disparate impact produced by the intersection of oppression. In this metaphor, Crenshaw envisions a basement, the ceiling of which contains a hatch that allows one to access the building’s first floor on which one acquires relief from the dark, dank, oppressive conditions of the basement. She asks the reader to imagine the hatch being out of reach for a person standing on the basement floor. Only those who stand on the shoulders of others are able to grab hold of the hatch and pull themselves out of the basement. Those who have already accessed the first floor—or perhaps were born into that floor—benefit from two forms of advantage. Those standing on the floor of the basement are impacted by both forms of oppression. Those standing on the shoulders, within reach of the hatch, are impacted by only one form of oppression. 
Through the basement metaphor, Crenshaw articulates the ways in which multiple forms of oppression and advantage interact to produce social positions. The metaphor, though, still conjures an additive (or, more accurately in this metaphor, subtractive) combination of effects. Expanding on the metaphor of intersecting routes, Collins uses the metaphor of a “matrix of domination.” The matrix comprises two components. The first are systems of oppression (racism, patriarchy, heterosexism, ableism, etc.). The second is a hierarchical arrangement of power relations within each system of oppression. Within the matrix, vectors point from the origin, with each vector representing a system of oppression, the origin representing maximal power, and the distance from the origin indicating degradation of power. Power is represented by the location within the matrix at which one is positioned by their identity. Being multidimensional, the metaphor of a matrix allows multiple vectors to form the matrix, with each vector representing a unique category of identity. The point (or cell, if one envisions the matrix in tabular form) within the matrix at which locations along

each vector intersect represents the power associated with one’s intersectional identity. While the matrix metaphor supports conceptions of intersectional identity formed by more than two categories of identity, it still conceives of identity being formed by unique vectors (axes), each representing a separate category of identity. Although the matrix metaphor avoids an additive conception, the power associated with a given intersectional identity is still defined by the combined effect of power associated with one’s position along each vector of identity. Other metaphors employ the terms interlocking, intertwined, intermeshed, and woven to represent the integrated effects of oppression. As an example, the Combahee River Collective writes, “the major systems of oppression are interlocking. The synthesis of these oppressions creates the conditions of our lives.”37 Ange-Marie Hancock, a professor of gender studies and political science, also employs the interlocking metaphor, writing, “Intersectionality theory claims that these policy problems are more than the sum of mutually exclusive parts; they create an interlocking prison from which there is little escape.”38 This conception of oppression as interlocking captures the tight coupling that occurs through an intersectional conception of identity and oppression/advantage. Yet, its reliance on a locking mechanism to connect forms of oppression suggests that the intersecting forms of oppression can be unlocked and separated from one another, an implication that again does not fully reflect the multiplicity of intersectionality. 
In their description of intersectionality, Else-Quest and Hyde write, all people are characterized simultaneously by multiple social categories, including, for example, gender, race and ethnicity, class, and sexual orientation; these multiple social categories are interconnected or intertwined, such that the experience of each social category is linked to the other categories.39 Similarly, Chicana feminist theorist Norma Alarcón conceptualizes intersections of identity and oppression as being woven, and feminist philosopher María Lugones uses the term intermeshed.40 These metaphors come closer to capturing the production of unique lived experiences produced by the intersection of identity and oppression. By meshing, twining, and weaving, a wholly new product is formed. It is through this interconnected, intertwining, intermeshed weaving that identity and oppression are experienced simultaneously as a multiplicitous yet singular manifestation.41 By understanding multiple forms of identity and oppression as a singular experience, these metaphors reflect the Combahee River Collective’s assertion “that it is impossible to separate racial, sexual, and class-based oppression because these systems are experienced simultaneously.”42 Yet, within each of these metaphors, the threads of distinct categories of identity and oppression

remain visible. And with patience and care, one can pull on a given thread to separate the product into its distinct parts. To capture the inseparability of intersectional identity and oppression, Gloria Anzaldúa, a late Chicana cultural theorist, describes her borderland identity as “an act of kneading.”43 Lugones similarly employs the term curdling, writing, “[m]ultiplicity follows the logic of curdling … According to the logic of curdling, the social world is complex and heterogenous and each person is multiple, nonfragmented, embodied.”44 Drawing on her interviews with Black gay men, Lisa Bowleg presents the metaphor of blending a cake: Nigel a 37-year-old gay man deftly summarized the intersectionality of his race, gender, and sexual identity, noting the difficulty of separating these identities as if they were independent of each other: “Well it’s hard for me to separate [my identities]. When I’m thinking of me, I’m thinking of all of them as me. Like once you’ve blended the cake you can’t take the parts back to the main ingredients.”45 In these three metaphors, kneading, curdling, and blending produce a new and unique product. Once formed, it is impossible to disassemble the product into its original parts. In addition, once formed, the product is given a new name that is distinct from its constituent parts—bread dough makes no reference to the flour, sugar, water, and yeast that are used to form it; eggs, oil, flour, or water are absent in the word cake. Moreover, the properties of the final product are fundamentally different from those of their constituent ingredients. In these ways, the chemistry that comes into play when curdling, kneading, blending/mixing, and baking allows these metaphors to come closest to representing intersectionality’s concept of multiplicity. 
Yet, despite the closer proximity with which these metaphors come to the concept of multiplicity, they still rely on single-axis conceptions of identity and oppression as the ingredients brought together to produce the end product. To avoid initiating a metaphor with a single-axis conception of identity and oppression, Crenshaw, and later Carastathis, explore multiplicity from the perspective of a coalition.46 In this metaphor, Crenshaw and Carastathis begin with a set of intersectional identities and build out to a single-axis coalition. As an example, one might consider Black women, White women, Asian women, Latine women, and Indigenous women each as a distinct intersectional identity the members of which experience unique forms of oppression and/or advantage. These five intersectional groups, however, might coalesce around their womanhood to form a coalition representing women. Collectively, this coalition might then resist, advocate, and work towards alternate power arrangements that benefit members of all intersectional groupings. The coalition metaphor turns the table on single-axis conceptions of identity by envisioning a single-axis identity as the product of a coming

together of unique intersectional groupings that hold an aspect of their identity in common. As Carastathis posits, “conceptualizing intersectional identities in coalitional terms may help us avoid positivist and essentialist constructions of identity and subjectivity.”47 Critics, however, raise concerns that the coalition metaphor may lead to the production of an infinite number of intersectional groupings, particularizing identity to a point that becomes unmanageable from an analytic perspective.48 Analysis of these many metaphors reveals the challenge intersectionality theorists encounter when trying to shed the single-axis conception of identity and oppression instilled through the White Racial Frame’s positivist logics. Despite the many metaphors introduced, none fully represents the concept of multiplicity embraced by intersectionality theory. This failure to locate a fully satisfactory metaphor for the multiplicity of identity and oppression reflects a shortcoming in our current vocabulary and way of envisioning multiplicity. As legal scholar Devon Carbado and colleagues observe, “the strictures of language require us to invoke race, gender, sexual orientation, and other categories one discursive moment at a time.”49 The challenge of envisioning identity and oppression as multiplicities foreshadows similar challenges our analytic techniques encounter when applied to examine identity and oppression through the lens of multiplicity.

Intersectionality as a Heuristic Device

Analyses conducted through the lens of intersectionality initially employed qualitative methods and theorized ways in which intersectional forms of oppression operate to impact lived experiences of people membered into specific intersectional groups. Crenshaw’s analyses of legal casework focused singularly on Black women. 
Her subsequent analysis focused on violence against women, with a critical eye on the failure of feminist and anti-racist discourses “to consider intersectional identities such as women of color … and how these experiences [of violence] tend not to be represented within the discourses of either feminism or antiracism.”50 Crenshaw’s analysis led her to conclude that strategies to address violence against women will be of limited utility unless they are tailored to the experiences of women with different economic and/or racialized backgrounds. In the years that followed, the focus of intersectional analyses expanded from Black women to a diverse array of intersectional groupings. These analyses focused largely on developing an understanding of the experiences of people membered into a specific intersectional group (e.g., Black gay men, Latine immigrant women, etc.). The introduction of quantitative methods as a tool for engaging in intersectional analyses expanded focus from theorizing about a specific intersectional form of oppression to engaging in comparative analyses. With this shift, intersectionality expanded from a theory
to also serve as a heuristic device. Intersectionality as a heuristic device aided the formation of research questions, study design, and interpretation of findings produced through quantitative analyses. Intersectionality as a heuristic device aids quantitative researchers in reconceiving group formations and the estimation of variation among and within groups. Rather than focusing on single-axis identities and forms of oppression, intersectionality as a heuristic device supports the definition and formation of intersectional groupings and allows impacts of oppression to be estimated separately for each intersectional identity group. When guided by theory about the ways in which oppression and advantage operate differently among intersectional groupings, intersectionality as a heuristic device also guides decisions about which intersectional group might serve as a point of reference.51 By understanding how and why outcomes differ among and within intersectional groupings, remedies can be tailored to address the unique social positions and resulting experiences of a given intersectional group.

Sociologist Leslie McCall identified three approaches commonly employed when examining “multiple, intersecting, complex social relations,” terming these approaches intercategorical, intracategorical, and anticategorical.52 Perhaps most familiar to researchers who apply quantitative methods, the intercategorical approach focuses on differences among intersectional groupings. In the intercategorical approach, each intersectional grouping is understood as a separate and distinct social group (or population), and the focus of analysis is on comparing impacts of experiences across groups. In the intercategorical approach, one group is typically identified as the focal group against which all other intersectional groupings are compared. 
Most often, the intersectional grouping that is most advantaged in society serves as the focal group.53 One danger with establishing a focal group is the potential for consumers of research to infer that the focal group represents normative expectations and that the different impacts experienced by all other intersectional groupings are deviant from the norm.54 Rather than selecting an intersectional grouping as the focal group, using the mean to center estimated impacts may minimize implying that any one intersectional grouping represents normative expectations. The intracategorical approach avoids implying normative comparisons by focusing on variation within a single intersectional grouping. This approach begins by identifying an intersectional group of interest and theorizing reasons why lived experiences might vary among members of the intersectional group. An intracategorical approach “shifts the research focus toward identifying and understanding the mechanisms by which inequalities are created and expressed within those categories.”55 The reasons for variation focus on the differences experienced by members of that group that are produced by
the intersections of additional aspects of their identity and social locations. For example, Hancock applies intracategorical analysis to critique Derrick Bell’s advocacy for establishing “African American independent schools” in “inner city” districts to help address educational inequality that disparately impacts students membered Black. While supportive of efforts to address racialized educational inequality, Hancock points to census data indicating that 94 of the 96 counties in which people membered Black comprise more than 50% of the population are located in the rural South. Only two of these counties contain an urban setting.56 By adding locale to her analysis of the student population membered Black, Hancock unveils an important limitation to interventions that focus attention on urban settings. To avoid such oversights, Else-Quest and Hyde caution quantitative researchers seeking to apply an intersectional approach to be mindful of the need for context at all stages of their research … as researchers analyze social categories, they should attend to systematic variations among members of those categories.57 The intracategorical approach heeds this advice by focusing specifically within a select intersectional group to explore sources of variation among members of that group and to tailor remedies in response to those sources of variation. A limitation of both the intercategorical and the intracategorical approach is the need for researchers to “provisionally adopt existing analytical categories,” particularly when secondary datasets are employed.58 The anticategorical approach avoids this need by combatting the concept of social categories. 
By viewing social categories—whether single-axis or intersectional—as “simplifying social fictions that produce inequalities in the process of producing differences,”59 the anticategorical approach “deconstructs the received categories that construct social group memberships.”60 As McCall describes, in the anticategorical approach, “the premise of this approach is that nothing fits neatly except as a result of imposing a stable and homogenizing order on a more unstable and heterogeneous social reality.” 61 Recognizing that the purpose for establishing social categories—whether single-axis or intersectional—is to support oppression of subordinate group members to produce advantage for dominant group members, McCall argues that the deconstruction of master categories is understood as part and parcel of the deconstruction of inequality itself. That is, since symbolic violence and material inequalities are rooted in relationships that are defined by race, class, sexuality, and gender, the project of deconstructing the normative assumptions of these categories contributes to the possibility of positive social change.62
The goal of anticategorical analyses is to reveal the suspect nature of social categories by demonstrating that they have no inherent or natural foundation.63 By revealing the variation within and incoherent logic required for membering people into social categories, the anticategorical approach challenges “the singularity, separateness, and wholeness of a wide range of social categories,”64 rendering “suspect both the process of categorization itself and any research that is based on such categorization.”65 In this way, the end goal of the anticategorical approach is to eliminate social categories, the oppression based on those categories, and ultimately the need to engage in research focused on social categories. These goals are unlikely to be reached anytime soon. Yet, the goal is a clear reflection of both the liberatory aim of Intersectionality Theory and the care that must be taken by the intracategorical and intercategorical approaches not to reify or imply normativity to the identities and social positions that are the focus of study.

Concerns About Intersectionality as Heuristic

Despite its potential to deepen understanding and inform the development of more nuanced and tailored remedies, applying intersectionality to quantitative research as a heuristic device has sparked concern among intersectionality theorists. At the core of concern is the potential to treat intersectionality as a factorial design in which available identities are crossed and resulting estimates are used to compare the magnitude of effects among the many combinations of identity. There are at least four issues to consider when applying intersectionality as a heuristic device. First, simply combining single-axis identity categorizations available within a dataset to produce intersectional groupings holds potential to reveal differences that lack a theoretical foundation for why or how a difference is produced. In turn, reporting differences absent a theory as to why differences are observed “forces the reader to draw her or his own conclusions for why the differences occur, which often is based, at least in part, on a tendency to essentialize difference and rely on stereotypes”66—an outcome that is in direct conflict with the reformist and transformational aims of Intersectionality Theory. Quantitative analyses that employ intersectional groupings must be guided by an explicit rationale informed by theory as to why the intersections of identity, social location, and resulting oppression/advantage conspire to differentially impact members of the intersectional groupings.67 Second, forming intersectional groupings absent theory and then performing statistical analyses that fail to find statistically significant relationships with outcome variables of interest or among intersectional groupings can lead to faulty conclusions about the interplay of identity, social location,
and oppression/advantage. Similarly, when differences are detected, there is danger that these differences may be employed to identify those groups that are most advantaged or disadvantaged, with the findings serving as a form of “Oppression Olympics.”68 Efforts to compare or rank impacts of oppression and advantage among intersectional groupings absent theory stand in stark contrast to Crenshaw’s emphasis that experiential differences are qualitative, not quantitative. As an example, in Crenshaw’s analysis of violence against women, it was not that women of color experienced more violence; rather, the way in which they experienced violence was qualitatively different from the experiences of White women.69 Simply comparing magnitudes of effect without theorizing about or examining the experiences that produce those magnitudes may lead to the development of faulty remedies. Third, a factorial approach that combines available identity categories into intersectional groupings both limits analysis, and resulting theorizing, to the identity categories available and may produce research that attends to identity categories that have no theoretical relevance to the question at hand. As Collins observes, “Each [identity] is an analytical category that cannot be simply added together and combined with the others. The relationships among these categories lie in their particulars—they must be empirically studied and theorized, not simply assumed for heuristic convenience.”70 This concern is particularly relevant for secondary data analyses, for which the researcher must rely on the identity/social position data that was collected by another entity. When the question at hand implicates an identity, social position, and form of oppression/advantage that is not represented in the dataset, exploring the question absent that representation may produce misleading conclusions. 
Similarly, adding all available identities, social positions, and forms of oppression to the mix—a kitchen sink mentality—may produce findings that are uninterpretable or which similarly result in misleading conclusions about the ways in which identities, social positions, and oppression/advantages interact to impact outcomes of interest.71 To avoid these first three issues, Leah Warner advises researchers to explicitly state why we choose particular intersections rather than simply that we do … [As a researcher] I should ask myself: Have I given thought to why one was included and not another? What is the underlying rationale for making these choices? And how do these (one or two or three) identities together explain something that each identity alone does not?72 Collins similarly warns that “if a heuristic device is applied uncritically, more as a formula than as a tool for invention for critically engaged social problem solving, it may no longer be able to spark innovation” about the reform or transformation of social power structures required to remedy
inequalities.73 Absent theory as to why a given set of identities, social positions, and forms of oppression/advantage interact to impact lived experiences, implications for remedying disparities are obscured. A final issue that may arise when intersectionality is applied as a heuristic device focuses on the degree to which power relations are considered part of the analysis. As a critical social theory, intersectionality assumes that power plays a central role in constructing social categories. As the analysis of systemic racism presented in Chapter 3 reveals, those in power construct and mold the concept of race in order to provide advantage to those membered into the dominant racialized group through the oppression of those membered into subordinate racialized groups. To sustain and refine racialized categories, narratives are developed and spread throughout society, shaping thoughts and experiences for members of that society. Similar power relations produce conceptions and associated narratives specific to gender, sexuality, religious affiliation, age, ableism, and all other socially constructed categorizations. As a critical social theory, a primary objective of intersectionality “is the empowerment of individuals and groups to transcend the constraints imposed by those power relations.”74 To support this transcendence, intersectional analyses cannot be limited to examining differences among members of social categories. Intersectional analyses must explore the role power plays in producing those differences and, ideally, locate positions of intervention to alter the power relations that enable disparate productions. Given the centrality of power relations in intersectionality theory, simply asking questions about demographic difference or comparing different social groups does not constitute intersectionality research. 
Rather, it is the analysis and interpretation of research findings within the sociohistorical context of structural inequality for groups positioned in social hierarchies of unequal power that best defines intersectionality research.75 A recent analysis of published studies that employ intersectional groupings by Lisa Bowleg and her colleagues indicates that “deep engagement with ideas of power” is too often absent from such studies.76 In part, this lack of engagement with power stems from a failure to explicitly link theory to the method and interpretations made based on intersectional analyses. In many cases, studies that fail to engage power and theory take for granted the existence of discrete social categories, treating the categories as inherent to individuals, and simply combining them to form intersectional groupings. Rationale for why the resulting groupings matter to the question at hand and, more specifically, how society may operate to produce lived experiences that differ among the groupings is either absent or underdeveloped. As Bowleg and her colleagues describe,
In order for intersectionality to be clearly understood within quantitative studies, authors must explicitly identify the intersectional positions of interest and how they reflect social power, as well as specify their intersectional approaches, assumptions, and interpretations, making the match between theory and methods clear.77 Elizabeth Cole, a professor of women’s studies and psychology, similarly encourages researchers to “consider how race, gender and other social categories operate as social processes in relation to the research topic, rather than simply treating them as independent variables describing their participants.”78 To this end, Cole suggests that researchers consider shifting away from using membership in social categories as the variables of interest and instead develop and employ more direct measures of the behaviors, attitudes, and experiences that are theorized to differ among members of social identity categories and which are believed to contribute to the production of disparate impacts. As examples, she suggests that an indicator representing “experiences of discrimination” could be used in lieu of racialized group membership, and that “feminist consciousness” might be used instead of gendered identity.79 Bowleg similarly suggests several constructs that might provide more direct indicators of the behaviors, attitudes, and experiences that are inferred when social identity categories are employed as a proxy for oppression/advantage. As Bowleg observes, “concepts such as race and class are socially constructed, and as such, explain virtually nothing in and of themselves.”80 Rather, it is constructs such as stress, prejudice, discrimination, and stereotype threat vulnerability, among others, that are differentially experienced by members of social identity categories and that contribute to the production of disparate outcomes. 
Practical Challenges for Quantitative Research

Intersectionality theory introduces several challenges for quantitative research. These challenges are products of the influence the White Racial Frame continues to have on both quantitative analytic techniques and the methods used to “measure” categories of social identity. Epidemiologist Greta Bauer observes, “Some challenges are conceptual or linguistic, some relate to measurement and specification, and others arise from difficulties or confusion in matching the social theory to the statistical theory underlying particular quantitative analysis methods.” Bauer notes, however, that in addition to limiting the utility of existing methods for intersectional analyses, these challenges also present an opportunity to improve analytic methods, “particularly with regard to [their] potential to more accurately document … inequalities, and to identify causes of these inequalities and their potential solutions.”81 The following
subsections explore two primary challenges to representations of identity and existing analytic techniques when quantitative research is conducted through an intersectional lens.

Reliance on Identity Categories

Intersectionality Theory challenges traditional conceptions of identity, which is often conceptualized as distinct, mutually exclusive variables rather than a fusion of interactive processes.82 In this traditional conception, a person is understood as being composed of a combination of multiple, independent, discrete identities. When the traditional conception is open to the existence of oppression and advantage, each category of identity is understood as being associated with a specific form of oppression and advantage. Under the traditional conception, a person’s lived experience is produced by the added effect of each form of oppression/advantage. Intersectionality Theory challenges three aspects of this traditional conception of identity categories. First, rather than seeing identity categories as discrete and unique, identity categories are mutually constructed. As a result, no one identity category operates independently of any other category of identity. Second, because identity categories are understood as mutually constructed, Intersectionality Theory does not view the impacts of oppression/advantage as additive. Instead, impacts of oppression/advantage are understood as a multiplicitous unique whole. Third, rather than understanding identity as a stable characteristic of an individual, one’s identity and social position shift both over time and based on the social context in which one is located. Further, identity is not understood as discrete, dichotomous nominal categories. Instead, identity more closely resembles a continuous “variable” that contains spaces between the traditional nominal categories. These spaces are produced through the ways in which experiences with oppression/advantage are influenced by one’s self-identity and perceived (street) identity, one’s current social context, and more complex understandings of biological sex, gender, sexuality, economic class,83 racialized membering, and so on. 
This complex conception of identity, social location, and oppression/advantage challenges current approaches to “measuring” identity. It is not sufficient to simply ask a person to select between two gender options (male or female) or to select among a set of racialized identities. Understanding identity, social location, and oppression/advantage as multiplicitous challenges the current vocabulary of identity. The concepts of gender, racialized identity, sexuality, economic status, and so on are deeply etched components of the current dominant ideology, and the vocabulary and associated understandings are engrained in us from a very young age. Lacking a new vocabulary, we fall back on existing terminology.
Similarly, methods for collecting information about identity rely on this vocabulary and associated conceptions. Together, these dependencies create a bind for intersectionality theorists that leads to the use of language that combines terminology for discrete identity categories to create intersectional identities: a person who identifies/is identified as Black and as a woman is termed a Black woman, and the response to one question asking about their gender and the response to a separate question asking about their racial identity are used to place them into the category labeled Black woman. Having been presented with a limited set of options for each category (man or woman; Black, Asian, Latine, Indigenous, or White), the resulting set of intersectionality categories is limited by the combination of response options. Further, being asked to consider each identity in an unspecified context results in each response suggesting one’s discrete identities are universal across time and place. Current data collection methods create challenges capturing the variation within, fluidity of, and influence of social context on identity. The limitations of current conceptions of and methods for collecting information about identity produce a dilemma for intersectionality: accept the current methods and risk perpetuating a positivist, White Racial Framed conception of identity, or forgo quantitative intersectional analyses until new approaches are developed that yield representations of identity that are more compatible with Intersectionality Theory. Until sufficient progress is made on the latter development effort, recent advances in identity data collection techniques may provide a middle path that enables intersectionality researchers to collect more nuanced identity information, albeit still limited by the mutually exclusive identity framework. 
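The constraint described above, that intersectional categories built from separate survey items can never exceed the cross of the offered response options, can be illustrated with a minimal sketch. The option lists come from the example in the text; the function names `cross_categories` and `assign_category` are hypothetical and are not drawn from any instrument discussed in the chapter.

```python
from itertools import product

# Response options taken from the example in the text.
RACE_OPTIONS = ["Black", "Asian", "Latine", "Indigenous", "White"]
GENDER_OPTIONS = ["man", "woman"]


def cross_categories(race_options, gender_options):
    """Return every label produced by crossing the two single-axis items."""
    return [f"{race} {gender}" for race, gender in product(race_options, gender_options)]


def assign_category(race_response, gender_response):
    """Place one respondent into a crossed category.

    Returns None when either response falls outside the offered options,
    which is the limitation the passage describes.
    """
    if race_response not in RACE_OPTIONS or gender_response not in GENDER_OPTIONS:
        return None
    return f"{race_response} {gender_response}"


categories = cross_categories(RACE_OPTIONS, GENDER_OPTIONS)
print(len(categories))                        # 5 race options x 2 gender options
print(assign_category("Black", "woman"))
print(assign_category("Black", "nonbinary"))  # not representable with these items
```

Under these assumptions the crossed set can hold only ten labels, and a respondent whose identity is not among the offered options has no category at all. The sketch is intended only to make the factorial constraint concrete, not to recommend this way of forming groups.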
For researchers who apply an intersectional lens to their work, at least four recent advances hold promise to improve the collection of information about identity and oppression. As discussed earlier, López and her colleagues have developed a method for collecting information about both a person’s identity and their street race.84 Their analyses show how understanding of oppression and lived experiences is enriched through the use of an indicator of street race. Similar efforts are needed to develop and apply measures focused on how one is identified and how that membering influences lived experiences. Once validated, analyses that explore the intersections of one’s identity and how one is identified hold promise to deepen understanding of the complex ways in which identity and social position work to impact lived experiences. Situating identity—both how one identifies and how one is identified—in specific contexts holds similar potential to deepen understanding of the intersections of identity, social position, and lived experiences. As Bauer describes: There is not necessarily concordance between one’s personally held identity and a social position one occupies, as indicated either by objective
measure (e.g. income or wealth) or the way one is perceived and treated by others (e.g. racialization). A woman migrating to the United Kingdom may find herself racialized as black, despite holding no such identity in her home country; a bisexual-identified woman may be assumed by others to be heterosexual based on her male partner; and one does not have to identify as impoverished to live in poverty. … Moreover, identities are context-specific and may shift with regard to place and time, or with the need to align with others around shared identity.85 By establishing the context in which a person visualizes their interactions, instruments may collect more accurate and nuanced information about identity and social position that is better aligned with the lived experiences of interest. Given the influence context can have on identity and social position, Hancock suggests that “fuzzy-set logic” holds potential to provide useful differentiations within identity categories: Fuzzy-set logic can best capture the within-group diversity at stake among categories of race, class, gender, and region … Using fuzzy-set theory allows a scholar to attend to the issue of within-group diversity in each category in a manner that is substantively and theoretically consistent with the claims of intersectionality.86 As an example, in a traditional survey item designed to collect information about “race,” one’s racialized identity may not fit neatly into any one response option. Rather than asking a single item, multiple items might be employed to understand the fluidity of one’s identity membership. Fuzzy-set logic is applied to the responses to these items to represent one’s membership. 
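One way to picture how responses to multiple items might be turned into a fuzzy-set membership value is sketched below. It assumes a five-level membership scale (0, .25, .5, .75, 1) and assumes that each item records whether a person is identified within the category in one context. The rule of snapping the share of contexts to the nearest level is an illustrative assumption, not a procedure given by Hancock, and the function name `fuzzy_membership` is hypothetical.

```python
# Five-level membership scale, from nonmember (0) to full member (1).
MEMBERSHIP_LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]


def fuzzy_membership(in_category_by_context):
    """Map per-context responses to a fuzzy-set membership value.

    in_category_by_context: list of booleans, one per context item,
    True when the person is identified within the category in that context.
    Assumption: the share of True responses is snapped to the nearest level.
    """
    if not in_category_by_context:
        raise ValueError("at least one context response is required")
    share = sum(in_category_by_context) / len(in_category_by_context)
    return min(MEMBERSHIP_LEVELS, key=lambda level: abs(level - share))


print(fuzzy_membership([True, True, True, True]))     # in the category in every context
print(fuzzy_membership([True, True, True, False]))    # almost always in the category
print(fuzzy_membership([True, False]))                # equally in and out
print(fuzzy_membership([False, False, False, True]))  # rarely in the category
print(fuzzy_membership([False, False, False]))        # never in the category
```

A real instrument would need a theoretically grounded rule for weighting contexts; equal weighting is used here only to keep the sketch short.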
As an example, a person deemed fully in a category across contexts would receive a value of 1; a person who is almost always within the category would receive a value of .75; a marginal member who is equally in and out of the group depending on context would receive a value of .5; a person rarely in the group would receive a value of .25; and a nonmember who is never in the group would receive a value of 0. Using graduate students as an example, Hancock shows how the concept of fuzzy-set logic may be particularly applicable for class identity. In some contexts, class identity is based solely on income. In other cases, family wealth defines class. And in still other cases, education level influences class identification. Graduate students can be defined, depending on with whom you speak, as “educated working poor”—making comparatively little money while in the process of acquiring the highest level of education available. Based solely on income, graduate students might be classified in relatively low
income-based class level. However, their education level would place them at much higher-class level.87 Using fuzzy-set logic, a graduate student from a high-wealth background would be given a value closer to the highest level of high social economic status than would a graduate student from a higher-income but lower-wealth background, while the graduate student from both a low-income and a low-wealth background would be assigned the lowest value. Similar logic could be employed to assign values to those with lower levels of education but whose income and wealth vary. Fuzzy-set logic allows multiple sources of information about class to be used in conjunction to assign values representing the strength of one’s class membership. More generally, fuzzy-set logic holds potential to shift away from dichotomous classifications and more accurately represent the variation that exists within identity categories. Intersectionality Theory acknowledges that identity, social position, and the oppression/advantage a person experiences shift over time and context. Intersectionality also understands that oppression/advantage operate at multiple levels within a society’s ecological system: the micro/individual, meso/locale, and macro/systemic. Until recently, researchers too often relied on demographic characteristics, such as racialized identity, gender, and social economic status, as proxies for the impacts of racism, genderism, classism, and so on. This reliance, however, provides a weak representation of experiences with oppression/advantage, treats these experiences as universal across time and place, and flattens the ecological social system. 
To provide a more direct indicator of the oppression/advantage experienced at the meso/locale level of the ecological social system and to capture shifts in the level of exposure to oppression/advantage over time, Sarah LaFave and her colleagues at the Johns Hopkins School of Nursing engaged in a pilot study to develop a “cross-time, cross-space, cross-context instrument” that yields a representation of a person’s exposure to structural racism during different periods of their life.88 Focusing on adults membered Black, their instrument collects information about the locales in which a person resided during specific periods of their life: pre-school, elementary school, high school, young adult, and so on. The research team then connected data about each locale stored in various databases that contain information about census return rates, voting wait times, school quality, employment discrimination lawsuits, wage and job growth rates, adult asthma rates, landfills, firearm dealers, taxes on soda and chips, food access, residential segregation, police-involved deaths by racialized identity, incarceration rates by racialized identity, and so on. This detailed information is used to provide indicators of the level of structural racism to which the person was exposed across nine context domains that include: civics, education, employment, environment, health care, income credit/wealth, media and marketing, neighborhood factors, and
policing. This approach is resource-intensive, in large part because tools that link these various data sources have not yet been developed. Further, producing a robust record of exposure to structural racism across time depends on a person’s ability to recall detailed information about their residence at specific stages of their life. Nonetheless, in their pilot study, LaFave and her team found that they were able to produce complete records for 72% of their sample.89 More importantly, their pilot study provides evidence that the portrait of a person’s exposure to structural racism at the meso-level captures the nuanced variation that occurs among people who would otherwise be membered into the same category using traditional demographic information about one’s racialized identity. Each of these efforts to advance data collection methods (street race, fuzzy-set logic, and situating identity and exposure to meso-level oppression/advantage across time, space, and context) holds potential to better align data with intersectionality’s conception of identity, social position, and oppression/advantage. Yet, as noted earlier in the chapter, many of these methods rely on discrete single-axis representations of identity and oppression/advantage. Clearly, additional efforts are needed to develop new ways of collecting information and representing intersections of identity, social position, and oppression/advantage to better align social issues with Intersectionality Theory.

Limitations of Current Statistical Methods

Just as current data collection methods present challenges for quantitative researchers who wish to explore questions through an intersectional lens, current statistical analytic methods present similar challenges. To a large extent, these challenges are a product of the positivist component of the White Racial Frame. This positivist component privileges research questions that examine differences among groups, provide insight into laws governing the production of outcomes, or both. Intersectionality Theory, by contrast, centers on social power arrangements and the disparate outcomes produced by those arrangements. It directs sharp focus on how multiple forms of oppression/advantage conspire to impact lived experiences at the intersection of identity and social locations. The analytic techniques commonly employed by quantitative researchers, however, were not designed to represent this social arrangement. As Bowleg describes, “many statistical methods often rely on assumptions of linearity, unidimensionality of measures, and uncorrelated error components that are incongruent with the complex tenets of intersectionality.”90

In addition, whereas the White Racial Frame conceives various forms of identity and oppression/advantage as operating separately and uniquely, Intersectionality Theory understands identity and oppression/advantage to function in a simultaneous, multiplicitous manner. However, current statistical methods were designed to treat each form of identity/oppression separately, yielding separate estimates for each distinct variable and then summing the estimated effects to model cumulative effects. As Bowleg observes:

although intersectionality theory provides a conceptually solid framework with which to examine the social location of individuals and groups within “interlocking structures of oppression,” the methodological choices at our disposal to do so are severely limited. Try as we might, it is virtually impossible to escape the additive assumption implicit in the questions we use to measure intersectionality and in our analysis of the phenomenon.91

Unable to escape current statistical methods, intersectional researchers have applied three classes of statistical models: additive, interaction, and multilevel.

Additive Models

Regression models and analysis of variance are the most commonly employed additive models used to examine the relationship among outcomes of interest and the intersections of identity and oppression/advantage. This approach typically begins by identifying a set of demographic characteristics that are theorized to be collectively related to an outcome variable. In the best case, the demographic variables are presented as proxies for forms of oppression/advantage associated with that demographic category. As an example, gender serves as a proxy for genderism, and racialized identity (most often termed race) serves as a proxy for racism. In both regression analyses and analysis of variance, separate estimates of the relationship between each demographic characteristic and the outcome variable are made. In a regression analysis, a function is then produced that contains each of the demographic variables of interest along with a coefficient that represents the contribution that the demographic variable makes toward estimating or predicting the outcome variable of interest above and beyond the influence of other variables included in the model. In an additive model, the coefficient for each demographic variable is known as a main effect and reflects the unique variation that the demographic variable shares with variation in the outcome of interest.

As an example, the following function uses racialized identity as a proxy for exposure to racism and gender as a proxy for exposure to genderism to predict monthly savings. For the sake of simplicity, only two racialized identities are considered in this example, White and Black. Each demographic characteristic is coded as a binary variable. For gender, a person is given a code of 1 if their gender identity is male and 0 if female. For racialized identity, a person is coded 1 if their identity is White and 0 if Black. In this regression model, the coefficient for racialized identity is 10, which indicates that, on average, people who identified as White contributed ten more dollars to savings than did people who identified as Black. Similarly, the coefficient for gender is 4, indicating that people who identified as male contributed four more dollars to savings than did people who identified as female. Based on this model, the summed effects of racialized identity and gender yield an estimate that people who identify as White and male save $14 more on average than people who identify as Black and female. A person who identifies as White and female is estimated to save $10 more a month than a Black female, and a person identifying as Black and male is estimated to save $4 more than a Black female.

$$\text{Savings} = A + 10 \times \text{Racialized Identity} + 4 \times \text{Gendered Identity}$$
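The arithmetic of this additive function can be sketched in a few lines of Python. This is only an illustration of the worked example: the function name `predicted_savings` and the `GROUPS` table are hypothetical, and because the text does not give a value for the constant A, it is treated here as a parameter that defaults to 0 (it cancels out of every group difference).

```python
def predicted_savings(white: int, male: int, intercept: float = 0.0) -> float:
    """Additive prediction: intercept + 10 * white + 4 * male.

    `white` and `male` are the 0/1 codes described in the text.
    `intercept` stands in for the constant A, whose value is not given
    in the source (assumed 0 here; it cancels in group comparisons).
    """
    return intercept + 10 * white + 4 * male


# The four intersections formed by the two binary codes: (white, male).
GROUPS = {
    "Black female": (0, 0),
    "Black male": (0, 1),
    "White female": (1, 0),
    "White male": (1, 1),
}

# Differences relative to the group coded 0 on both variables.
baseline = predicted_savings(*GROUPS["Black female"])
differences = {
    name: predicted_savings(white, male) - baseline
    for name, (white, male) in GROUPS.items()
}

for name, diff in differences.items():
    print(f"{name}: {diff:+.0f} dollars relative to Black female")
```

Each intersectional difference is simply the sum of the two separately estimated coefficients, which is the additive assumption at issue.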

By estimating separate coefficients for each demographic characteristic included in the model, the additive model treats each form of identity (proxy for oppression/advantage) separately. This separate treatment contrasts with Intersectionality Theory’s conception of multiplicitous simultaneity. By adding estimated “effects” to produce an estimate for each intersectional “effect,” the additive model risks misrepresenting the intersectional impact of oppression/advantage. As an example, Bowleg and Bauer critique the interpretation made in an article published in the New England Journal of Medicine that focused on the odds of being referred for cardiac catheterization based on the intersection of racialized identity and gender identity.92 In this study, actors who differed in their racialized and gender identities (Black male, Black female, White male, and White female) reported the same symptoms to doctors who then decided whether or not to refer the acting patient for cardiac catheterization.93 Using an additive model to estimate the effects separately for gender and racialized identity, the researchers estimated that the odds of referral for women were 40% lower than for men and that the odds of referral for patients membered Black were also 40% lower than for those membered White. Combining the two effects, the researchers then concluded that Black women had 60% lower odds of referral compared to White men. Yet, when examining the results more closely, Bowleg and Bauer observed that in reality, Black women were not only the worst off, but the effect was limited entirely to Black women. This resulted in misestimates at two other intersections. White women appeared to have reduced referrals, but only because the main effect for gender also included Black women, the sole driver of the effect. Similarly, Black men appeared to have reduced odds of referral, again, only because the main effect for race included Black women.94 In other words, employing an additive model led the researchers to combine the two separately estimated effects to reach false conclusions for three of the four intersectional groups of interest.

For analyses conducted through an intersectional lens, additive models that rely on single-axis representations of identity treat intersectional effects as the simple addition of effects associated with each unique identity category. As Bauer summarizes, “main effects models violate intersectionality’s core premise that multiple social positions shape experience jointly, rather than independently.”95

Interaction Models

Interaction models come a step closer to reflecting the compound functioning of intersections of identity and oppression/advantage. Interaction models are an extension of an additive model and include estimates for the joint effects of two or more main effects. Interaction models begin by estimating the main effects of each demographic variable included in the model. Once the relationship between each demographic variable and the outcome variable of interest is accounted for, interaction terms are added to the model to estimate the joint relationship between two or more demographic variables and the outcome variable. An interaction term is created by multiplying the values assigned to two or more demographic variables. In the aforementioned example, the racialized identity demographic variable was coded 1 for people who identify as White, and 0 for people who identify as Black. Similarly, gender was coded 1 for people identifying as male, and 0 for people identifying as female. An interaction term representing the intersection of racialized identity and gendered identity is formed by multiplying the values assigned for racialized identity and gender identity. In this example, the interaction term for White males would equal 1, and all other intersecting identities are assigned a value of 0. Adding the interaction term representing the intersection of racialized and gender identities produces an estimate of the relationship between the outcome variable and White male status as compared to all other intersections of racialized and gendered identity. At first glance, interaction models seem more closely aligned with the notion of simultaneity and multiplicity that is at the core of Intersectionality Theory. After all, the interaction term is multiplicative and represents the simultaneous joint relationship between an outcome variable and two or more forms of identity and oppression/advantage. Upon closer inspection, however, we see that the interaction model provides an estimate for only one intersecting identity and compares it to all other intersections of identity. This approach treats all other intersecting identities as if they were one, a concept that runs counter to Intersectionality Theory. In addition, because interaction models estimate the interaction term after accounting for main effects, they privilege single-axis conceptions of identity and oppression/advantage. By estimating the interaction term after having accounted for the main effects of gender and racialized identity, interaction models treat the relationship between an intersecting identity and form of oppression/advantage as residual to the main effects of gender and racialized identity.96 As Bowleg explains:

interactions are contingent on the size of main effects. For example, when significant main effects exist, the probability of finding significant first order (a two-way interaction) or higher order interactions (three, four and n way interactions) decreases because the significant main effects account for the bulk of the variance in the dependent variable … Thus, when the effect sizes of main effects are large, the probability that no interaction effect will be found is greater. When there are no main effects or just a few, predicting whether an interaction will be found and the magnitude of the interaction becomes virtually impossible.97
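To make the construction of the interaction term concrete, the following Python sketch builds the product term for the four intersections of the two binary codes and fits both an additive model and a model with the interaction term by ordinary least squares. The outcome values are hypothetical (one value per intersection, chosen so that only one group differs from the rest) and are not drawn from any study cited in this chapter. The helper names `solve` and `least_squares` are illustrative, and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row, then clear the column in every other row.
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[-1] for row in m]


def least_squares(x, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical outcome per intersection: (white, male, outcome).
# Only the group coded (0, 0) differs from the others.
cells = [(0, 0, 50), (0, 1, 80), (1, 0, 80), (1, 1, 80)]
y = [outcome for _, _, outcome in cells]

# Additive design: intercept, racialized identity, gender.
x_additive = [[1, w, m] for w, m, _ in cells]
# Interaction design: the extra column is the product of the two codes,
# which equals 1 only for the group coded 1 on both variables.
x_interaction = [[1, w, m, w * m] for w, m, _ in cells]

additive = least_squares(x_additive, y)
with_interaction = least_squares(x_interaction, y)

print("additive coefficients:", [float(b) for b in additive])
print("interaction-model coefficients:", [float(b) for b in with_interaction])
```

With these hypothetical values, the additive fit returns an intercept of 57.5 and main effects of 15 each, so its predictions (57.5, 72.5, 72.5, 87.5) miss all four observed values. The fit that includes the product term returns 50, 30, 30, and -30 and reproduces each value, with the interaction coefficient expressed relative to the two main effects.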

Multilevel Models

Multilevel modeling is a third approach used to analyze quantitative data through an intersectional lens. As described in Chapter 11, multilevel modeling recognizes the clustering of individuals that social structures produce and is designed to incorporate contextual factors that are associated with or produced by the clustering of individuals into larger units. Researchers have applied multilevel modeling through an intersectional lens in two ways.

The first approach aims to reflect the ecological structuring of our society by including both individual- (micro-) and cluster- (meso-) level factors in the model. In this ecological approach, multilevel modeling clusters people within units representing a specific meso-level condition to examine how relationships among identity, social position, and oppression/advantage may differentially impact both individuals and clusters of individuals. As sociologists Nicholas Scott and Janet Siltanen describe, “the conceptual underpinning of multilevel modeling is to explicitly account for the social contexts of inequality by animating context itself as a unit of analysis and source of variance.”98 As an example, they present a study that examined the relationship between social position and unpaid housework. In this analysis, the researchers recognized that people are clustered into neighborhoods and that neighborhoods tend to differ with respect to the social economic status of people residing in the neighborhoods. The researchers hypothesized that both an individual’s social economic status and that of those who reside in their neighborhood combine to influence the amount of unpaid housework that is performed. Their analysis found a negative relationship between low income and unpaid housework at the individual level. When they factored in the neighborhood context, they found that individuals with low income who resided in low-income neighborhoods engaged in less unpaid housework than individuals with similarly low income residing in higher-income neighborhoods. As the authors state, neighborhood residence “magnif[ied] the negative impact of low income on average hours of housework.”99 By including gender and income in their analyses, the researchers were also able to estimate how gender and income separately and jointly were related to the amount of unpaid housework performed by the individual.

Multilevel analyses such as this assist researchers in separating the influence of individual-level factors from those of higher levels within an ecological system to deepen understanding of how the social structuring of society contributes to the production of inequities. And when intersections of identity and social position are included in the analyses, multilevel modeling provides insight into how meso-level factors interact to impact experiences at the intersections of identity and social position. In this approach, however, identity and social position are treated as single-axis characteristics that reside within each individual. Thus, while this multilevel modeling approach reflects the ecological structuring of a social system embraced by intersectionality, it does not fully reflect the social structuring of identity, social position, and resulting oppression/advantage.

A second approach to multilevel modeling attempts to address this shortcoming. In this second approach, the social construction of identity and social position is understood to manufacture the clustering of individuals. Rather than understanding the clustering of individuals to occur in a physical context such as a neighborhood, school, or classroom, individuals are understood as clustered within social strata. Further, intersectional social strata are formed by the intersections of two or more single-axis social strata. As discussed earlier, both race and gender are understood as social constructs. Traditionally, these social constructions are understood to produce two different and distinct social strata, one termed gender and the other termed race (racialized identity). The intersection of these two strata yields multiple unique intersectional social strata: White males, Black males, White females, Black females, etc. In this second approach, multilevel modeling is used to estimate the relationship between an outcome of interest and the clustering within social strata, rather than physical clustering. As sociologist Clare Evans describes, this approach “treat[s] intersections of identity and process as more akin to contexts than separate axes … bringing intersectional methods closer into alignment with intersectional theorizing.”100
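The formation of intersectional strata described above can be sketched in Python as the cross-product of the categories on each axis, with individuals then grouped by the stratum they fall into. The category lists, the record layout, and the function names `build_strata` and `cluster_by_stratum` are hypothetical choices made for illustration; an actual study would define its own categories and data structures.

```python
from collections import defaultdict
from itertools import product

# Hypothetical category lists for two axes of identity.
RACIALIZED_IDENTITY = ["Black", "White"]
GENDER = ["female", "male"]


def build_strata(*axes):
    """Return every intersection of the given axes as a tuple."""
    return list(product(*axes))


def cluster_by_stratum(people, keys):
    """Group person ids by the tuple of their values on each axis."""
    clusters = defaultdict(list)
    for person in people:
        stratum = tuple(person[key] for key in keys)
        clusters[stratum].append(person["id"])
    return dict(clusters)


strata = build_strata(RACIALIZED_IDENTITY, GENDER)

# Hypothetical individual records.
people = [
    {"id": 1, "race": "Black", "gender": "female"},
    {"id": 2, "race": "White", "gender": "male"},
    {"id": 3, "race": "Black", "gender": "female"},
    {"id": 4, "race": "White", "gender": "female"},
]
clusters = cluster_by_stratum(people, keys=("race", "gender"))

print(strata)
print(clusters)
```

Adding a further axis (for example, an income grouping) multiplies the number of strata, which is one reason some strata may contain few or no individuals in a given sample.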

This approach begins by building a function that models the relationship between input variables, including social positions, and an outcome of interest. In building this model, both individual variables and clustering within social strata are used to predict the outcome for each individual. The difference between the predicted outcome and the outcome that was actually observed is then calculated for each individual, yielding what is termed a residual, a value representing what is left over and not explained or predicted by the model. A positive residual indicates that the observed outcome was higher than predicted, and a negative residual indicates that it was lower. The residual values for the individuals located within each social stratum are then compared. On average, the residual values are expected to equal zero. If, however, the average residual is less than zero, then the effect of oppression experienced by the social stratum is interpreted as negatively impacting the outcome. Conversely, when the average residual is greater than zero, the advantage associated with the social stratum is said to positively impact the production of the outcome. The more the average residual for a social stratum differs from zero, the larger the influence oppression/advantage experienced by that stratum is estimated to have on the production of the outcome. As an example, Evans and her colleagues applied this multilevel approach to model health inequalities at the intersection of multiple social identities. Specifically, the authors examined the relationship between the intersections of gender, racialized identity, education, income, and age and Body Mass Index (BMI), a common indicator of health risk. For this analysis, social strata were formed by intersecting each of the social identities.
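The residual comparison described above can be sketched as follows. This is a deliberate simplification, not the estimation procedure used in the cited study: the observed and predicted values are hypothetical, the predictions are taken as given rather than produced by a fitted model, and stratum-level residuals are computed by simple averaging, whereas a fitted multilevel model would typically estimate them jointly with the rest of the model. The function name `mean_residual_by_stratum` and the stratum labels are illustrative.

```python
from collections import defaultdict
from statistics import mean


def mean_residual_by_stratum(records):
    """Average (observed - predicted) within each stratum.

    `records` is an iterable of (stratum, observed, predicted) tuples.
    """
    residuals = defaultdict(list)
    for stratum, observed, predicted in records:
        residuals[stratum].append(observed - predicted)
    return {stratum: mean(values) for stratum, values in residuals.items()}


# Hypothetical observed and predicted outcomes for five individuals.
records = [
    ("stratum A", 27.0, 25.0),
    ("stratum A", 29.0, 26.0),
    ("stratum B", 22.0, 24.0),
    ("stratum B", 23.0, 24.0),
    ("stratum C", 25.0, 25.0),
]
averages = mean_residual_by_stratum(records)

for stratum, value in averages.items():
    # A positive average means observed outcomes ran higher than
    # predicted in that stratum; a negative average means lower.
    print(f"{stratum}: average residual {value:+.1f}")
```

How the sign of an average residual is interpreted depends on the outcome being modeled, since a higher-than-predicted value is favorable for some outcomes and unfavorable for others.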
A set of multilevel models was estimated that predicted BMI based on individual characteristics and the social stratum to which individuals were membered. The average residuals for each social stratum were then analyzed. Among the findings was that the lived experiences of people membered as low-income Black females contributed to higher BMI than predicted, while the advantage experienced by people membered as highly educated White females contributed to lower-than-predicted BMI. As the authors note, this application of multilevel modeling “does not explicitly model causal pathways ranging from the societal [macro-level] to the molecular [micro-level]” and thus does not reflect the ecological structuring of society.101 However, this “multilevel approach brings the methods used to study interactions closer into line with what intersectional theorists and social epidemiologists have advocated—the assessment of intersectional effects for all social identities, including those that mix privilege and disadvantage.”102

Implications for Educational Measurement

This chapter explored several ways in which Intersectionality Theory challenges traditional conceptions and existing analytic techniques commonly employed by the field of educational measurement. These challenges provide opportunities for the field to expand its conception of identity from a stable, single-axis, discrete formation inherent to the individual to a multiplicitous, fluid, and context-bound social construction designed to support productions of oppression and advantage. Moreover, these productions of oppression and advantage occur at multiple levels of an ecologically structured society, including the micro/individual, meso/locale, and macro/systemic levels. Further, the forms of oppression and advantage experienced by an individual are both context-bound and specific to their intersectionally defined social location. This complex conception challenges the ways in which identity is typically “measured” and modeled in educational measurement. Responding to these challenges provides opportunities for the field to advance measurement practices and statistical modeling techniques. There are also opportunities to broaden the focus of research to examine intersections of identity, social processes that produce differences in lived experiences, and potential approaches to remedying the production of disparate outcomes through reform and transformation of these social processes.

As just one small early example, recent efforts have applied an intersectional lens to the examination of potential bias in the scores produced by cognitive test instruments. Similar to Bowleg and Bauer’s critique of the New England Journal of Medicine article that confounded experiences of Black women through single-axis estimates of the odds of cardiac catheterization, my colleagues and I speculated that analysis of potential bias in test items employing single-axis conceptions of identity misestimated the level of potential bias. Recognizing the compounding impact various forms of oppression and advantage have at the intersection of identity, we compared findings from two sets of analyses that estimated the degree to which items functioned differently across identity groupings. In the traditional analyses, item performance was examined separately by gender, racialized identity, and economic status. In this traditional approach, a very small number of occurrences of potential bias were detected. In the alternate approach, analyses were conducted at the intersection of these identities, with students identified in the dataset as White males from high economic households serving as the reference group. This intersectional approach detected a much higher level of potential bias, with the intersectional groupings theorized to experience the highest levels of oppression, in society and in the educational system specifically, experiencing higher amounts of potential bias.103 While further research is needed to understand the various social processes that may contribute to this bias, these analyses demonstrate the utility an intersectional lens has for unveiling inequities in educational measurement and directing attention to exploring potential sources of social productions of those inequities.

As has been explored in the second portion of this chapter, Intersectionality Theory points to several opportunities to improve the measurement of identity. These opportunities focus on developing indicators that differentiate how one identifies from how one is identified, how one’s identity shifts over time and place, and what experiences associated with one’s identity are at play in shaping lived experiences. Beyond improving the methods employed to collect indicators of identity and experiences associated with identity, Intersectionality Theory also highlights opportunities to advance statistical modeling techniques to shift focus from single-axis estimates of “effects” and to further represent the ecological structuring of society and the influence of that structuring on lived experiences. No doubt Intersectionality Theory can create cognitive dissonance for a person like me whose understandings were shaped by a White Racial Frame that treats identity categories as discrete individual traits. By shedding this conception, Intersectionality Theory points to important opportunities to reform and transform our practices to expand the field’s efforts to advance social justice.

Notes

1 Lorde (1984), p. 120.
2 Public Law 88-352, July 2, 1964, p. 255.
3 See Crenshaw (1989) for an analysis of this case.
4 DeGraffenreid v. General Motors, 413 F Supp at 143, quoted in Crenshaw (1989), p. 141.
5 Truth (1851).
6 Else-Quest and Hyde (2016, p. 160) observe that “[m]ost researchers in the United States have typically assumed that it is a fact that gender is categorical and that there are two gender categories, not more.”
7 See Else-Quest and Hyde (2016), p. 163, who explore the natural inclination to categorize and dichotomize.
8 Carastathis (2016), p. 108.
9 Carastathis (2016), pp. 111, 114.
10 Collins (2019), p. 3.
11 May (2015), title and p. vii.
12 Collins (2019), p. 26.
13 Carastathis (2016), p. 51.
14 Crenshaw (1991), p. 1244.
15 Truth (1851); Cooper (1988/1892); Beal (1970).
16 Fehrenbacher and Patel (2020), p. 148. See Combahee River Collective (2014/1978) for the full statement.
17 Collins (2019), p. 26.
18 Carastathis (2016), p. 166, italics added.
19 Collins (2019), pp. 4–5.
20 Anzaldúa (2012/1987).
21 Veenstra (2011).
22 Meyer (2012).
23 Collins (2019), p. 9.
24 Collins (2019), p. 89.
25 See Collins (2019) and Bowleg (2012).
26 Zuberi (2001); Holland (2008); Spector and Brannick (2011).
27 Warner (2008), p. 459.
28 Collins (2019), p. 173.
29 Hall (2017), p. 16, quoted in Collins (2019), p. 37, italics in the original.
30 Else-Quest and Hyde (2016), p. 164.
31 Collins (2019), p. 173.
32 López et al. (2018a, 2018b).
33 Bowleg (2012), pp. 1271–1272.
34 Hancock (2007, p. 74) explores the micro-/macro-focus that is an essential component of an intersectional lens.
35 Carastathis (2016), p. 47.
36 Bowleg (2012), pp. 1271–1272.
37 Combahee River Collective (2014), p. 271.
38 Hancock (2007), p. 65, italics added.
39 Else-Quest and Hyde (2016), p. 155, italics added.
40 Alarcón (1990), p. 366; Lugones (2003), p. 223.
41 King (1988); Carastathis (2016).
42 Fehrenbacher and Patel (2020), p. 148.
43 Anzaldúa (2012/1987), p. 103.
44 Lugones (1994), p. 463.
45 Bowleg (2013), p. 758.
46 Crenshaw (1991); Carastathis (2016).
47 Carastathis (2016), p. 11.
48 See Hancock (2007) for an overview of this concern, about which she writes: “The powerful reply that all categories can be fractured into ever-exponentially increasing sub-categories once intersectionality is addressed empirically has led to a rejection of intersectionality by a number of variable-oriented researchers who envision a paralysis emerging from the inclusion of increasing numbers of variables. The rule of parsimony, so the argument goes, would be violated with little to no gain in explanatory power for political problems such as persistent poverty or discrimination” (p. 66).
49 Carbado et al. (2013), p. 816, quoted in Carastathis (2016), p. 127.
50 Crenshaw (1991), pp. 1242–1244.
51 Collins (2007) explores the importance of dynamic centering when conducting intersectional research in the field of sociology.
52 McCall (2005), pp. 1772–1773.
53 Carastathis (2016).
54 May (2015).
55 Else-Quest and Hyde (2016), p. 155.
56 Hancock (2007), see p. 69.
57 Else-Quest and Hyde (2016), p. 164. See also Cole (2008), p. 449 on the importance of attending to differences within groups to shed light on variation within social categories.
58 McCall (2005), p. 1773.
59 McCall (2005), p. 1774.
60 Carastathis (2016), p. 135.
61 McCall (2005), p. 1777.
62 McCall (2005), p. 1777.
63 McCall (2005).
64 McCall (2005), p. 1778.
65 McCall (2005), p. 1777.
66 Warner and Shields (2013), p. 806.
67 Else-Quest and Hyde (2016).
68 Hancock (2007).
69 Carastathis (2016), see pp. 52–53.
70 Collins (2019), p. 40.
71 Warner and Shields (2013).
72 Warner (2008), p. 456, italics in the original.
73 Collins (2019), p. 41.
74 Else-Quest and Hyde (2016), p. 321.
75 Bowleg (2008), p. 323.
76 Bauer et al. (2021), p. 5.
77 Bauer et al. (2021), p. 5.
78 Cole (2008), p. 451.
79 Cole (2008), p. 451.
80 Bowleg (2008), p. 316.
81 Bauer (2014), p. 12.
82 Carastathis (2016), see p. 62; Fehrenbacher and Patel (2020), see p. 146.
83 As an example, recall the differences in wealth between people membered Black or White within income groupings documented by Oliver and Shapiro (2006). Defining social economic status based on income versus wealth will alter the membering of people, particularly those membered Black.
84 López et al. (2018b).
85 Bauer (2014), p. 13.
86 Hancock (2007), p. 71.
87 Hancock (2007), p. 72.
88 LaFave et al. (2022), p. 3.
89 LaFave et al. (2022).
90 Bowleg (2012), p. 1270.
91 Bowleg (2008), p. 322.
92 Bowleg and Bauer (2016).
93 Schulman et al. (1999). In the study, age was also considered, with two age groups represented by the actors. For the sake of simplicity, I focus only on the conclusions reached by the authors based on the intersection of race and gender.
94 Bowleg and Bauer (2016), p. 339.
95 Bauer et al. (2021), p. 2. In making this point, Bauer references Bowleg and Bauer (2016).
96 Scott and Siltanen (2017).
97 Bowleg (2008), p. 319.
98 Scott and Siltanen (2017), p. 380.
99 Scott and Siltanen (2017), p. 381.
100 Evans (2019), p. 96.
101 Evans et al. (2018), p. 70.
102 Evans et al. (2018), p. 70.
103 Russell et al. (2022).

References Alarcón, N. (1990). The theoretical subject(s) of This Bridge Called My Back and Anglo-American feminism. In Making Face, Making Soul/Haciendo Caras: Creative and Critical Perspectives by Feminists of Color. Aunt Lute.

Intersectionality Theory 345 Anzaldúa, G. (2012/1987). Borderlands/La frontera: The New Mestiza (4th ed.). Aunt Lute. Bauer, G.R. (2014). Incorporating intersectionality theory into population health research methodology: challenges and the potential to advance health equity. Social Science & Medicine, 110, 10–17. Bauer, G.R., Churchill, S.M., Mahendran, M., Walwyn, C., Lizotte, D. & VillaRueda, A.A. (2021). Intersectionality in quantitative research: A systematic review of its emergence and applications of theory and methods. SSM-Population Health, 14, 1–11. Beal, F.M. (1970). Double jeopardy: To be Black and female. In The Black Woman: An Anthology. New American Library. Bowleg, L. (2008). When Black + lesbian + woman ≠ Black lesbian woman: The methodological challenges of qualitative and quantitative intersectionality research. Sex Roles, 59(5), 312–325. Bowleg, L. (2012). The problem with the phrase women and minorities: Intersectionality—An important theoretical framework for public health. American Journal of Public Health, 102(7), 1267–1273. Bowleg, L. (2013). Once you’ve blended the cake, you can’t take the parts back to the main ingredients: Black gay and bisexual men’s descriptions and experiences of intersectionality. Sex Roles, 68(11), 754–767. Bowleg, L. & Bauer, G. (2016). Invited reflection: Quantifying intersectionality. Psychology of Women Quarterly, 40(3), 337–341. Carastathis, A. (2016). Intersectionality: Origins, Contestations, Horizons. University of Nebraska Press. Carbado, D.W., Crenshaw, K.W., Mays, V.M. & Tomlinson, B. (2013). Intersectionality: Mapping the movements of a theory. Du Bois Review, 10(2), 303–312. Cole, E.R. (2008). Coalitions as a model for intersectionality: From practice to theory. Sex Roles, 59(5), 443–453. Collins, P.H. (2007). Pushing the boundaries or business as usual? Race, class, and gender studies and sociological inquiry. In Sociology in America. University of Chicago Press. Collins, P.H. (2019). 
Intersectionality as critical social theory. In Intersectionality as Critical Social Theory. Duke University Press. Combahee River Collective. (2014). A black feminist statement. Women’s Studies Quarterly, 42(3/4), 271–280. Cooper, A.J. (1988/1892). A Voice from the South. Oxford University Press. Crenshaw, K. (1989). Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. University of Chicago Legal Forum, 1989(Article 8), 139. Crenshaw, K. (1991). Mapping the margins: Intersectionality, identity politics, and violence against women of color. Stanford Law Review, 43, 1241–1299. Else-Quest, N.M. & Hyde, J.S. (2016). Intersectionality in quantitative psychological research: I. Theoretical and epistemological issues. Psychology of Women Quarterly, 40(2), 155–170. Evans, C.R. (2019). Adding interactions to models of intersectional health inequalities: Comparing multilevel and conventional methods. Social Science & Medicine, 221, 95–105.

Evans, C.R., Williams, D.R., Onnela, J.P. & Subramanian, S.V. (2018). A multilevel approach to modeling health inequalities at the intersection of multiple social identities. Social Science & Medicine, 203, 64–73. Fehrenbacher, A.E. & Patel, D. (2020). Translating the theory of intersectionality into quantitative and mixed methods for empirical gender transformative research on health. Culture, Health & Sexuality, 22, 145–160. Hall, S. (2017). Familiar Stranger: A Life Between Two Islands. Duke University Press. Hancock, A.M. (2007). When multiplication doesn’t equal quick addition: Examining intersectionality as a research paradigm. Perspectives on Politics, 5(1), 63–79. Holland, P.W. (2008). Causation and race. In White Logic, White Methods: Racism and Methodology. Rowman & Littlefield. King, D.K. (1988). Multiple jeopardy, multiple consciousness: The context of a Black feminist ideology. Signs, 14(1), 42–72. LaFave, S.E., Bandeen-Roche, K., Gee, G., Thorpe, R.J., Li, Q., Crews, D. & Szanton, S.L. (2022). Quantifying older Black Americans’ exposure to structural racial discrimination: How can we measure the water in which we swim? Journal of Urban Health, 99, 1–9. López, N., Erwin, C., Binder, M. & Chavez, M.J. (2018a). Making the invisible visible: Advancing quantitative methods in higher education using critical race theory and intersectionality. Race Ethnicity and Education, 21(2), 180–207. López, N., Vargas, E., Juarez, M., Cacari-Stone, L. & Bettez, S. (2018b). What’s your “street race”? Leveraging multidimensional measures of race and intersectionality for examining physical and mental health status among Latinxs. Sociology of Race and Ethnicity, 4(1), 49–66. Lorde, A. (1984). Sister Outsider: Essays and Speeches. The Crossing Press Feminist Series. Lugones, M. (1994). Purity, impurity, and separation. Signs: Journal of Women in Culture and Society, 19(2), 458–479. Lugones, M. (2003). Pilgrimages/Peregrinajes: Theorizing Coalition against Multiple Oppressions. Rowman & Littlefield. May, V.M. (2015). Pursuing Intersectionality, Unsettling Dominant Imaginaries. Routledge. McCall, L. (2005). The complexity of intersectionality. Signs, 30(3), 1771–1800. Meyer, D. (2012). An intersectional analysis of lesbian, gay, bisexual, and transgender (LGBT) people’s evaluations of anti-queer violence. Gender & Society, 26(6), 849–873. Oliver, M.L. & Shapiro, T.M. (2006). Black Wealth, White Wealth: A New Perspective on Racial Inequality. Taylor & Francis. Russell, M., Szendey, O. & Li, Z. (2022). An intersectional approach to DIF: Comparing outcomes across methods. Educational Assessment, 27(2), 115–135. Schulman, K.A., Berlin, J.A., Harless, W., Kerner, J.F., Sistrunk, S., Gersh, B.J. & Escarce, J.J. (1999). The effect of race and sex on physicians’ recommendations for cardiac catheterization. New England Journal of Medicine, 340(8), 618–626. Scott, N.A. & Siltanen, J. (2017). Intersectionality and quantitative methods: Assessing regression from a feminist perspective. International Journal of Social Research Methodology, 20(4), 373–385.

Spector, P.E. & Brannick, M.T. (2011). Methodological urban legends: The misuse of statistical control variables. Organizational Research Methods, 14(2), 287–305. Truth, S. (1851). Ain’t I a Woman? Delivered at the 1851 women’s convention, Akron, OH. Accessed at: https://tag.rutgers.edu/wp-content/uploads/2014/05/AintI-woman.pdf Veenstra, G. (2011). Race, gender, class, and sexual orientation: Intersecting axes of inequality and self-rated health in Canada. International Journal for Equity in Health, 10(1), 1–11. Warner, L.R. (2008). A best practices guide to intersectional approaches in psychological research. Sex Roles, 59(5), 454–463. Warner, L.R. & Shields, S.A. (2013). The intersections of sexuality, gender, and race: Identity research at the crossroads. Sex Roles, 68(11), 803–810. Zuberi, T. (2001). Thicker Than Blood: How Racial Statistics Lie. University of Minnesota Press.

13 Educational Measurement and the Pursuit of Racial Justice

It is time we started paving our own paths and one such trail should be prophetic thinking and reasoning about what a just society and world should be. This is the only way we can assure that we will be the definers of, rather than the victims of our new century.1

Preemptive precautions to prevent injustices from entering the “basic structure” of a society are not the same as rectificatory measures aimed at correcting them once they have already entered. Prevention generally differs from cure.2

DOI: 10.4324/9781003228141-17

Critical social theories—whether shaped through the lens of classism, genderism, racism, queer theory, ableism, or intersectionality—are developed to unveil the operation of oppression within society and liberate humanity from that oppression. Critical social theorists engage this liberatory aim in pursuit of a society in which power, resources, and the quality of life are distributed in a manner free of advantage through oppression. Critical social theories function as foundations for paths to a more just society. For those whose work is guided by critical social theory, the destination is social justice.3

Philosophers, political thinkers, social activists, and, to a lesser extent, social scientists have offered diverse frames for conceiving and pursuing social justice. Some consider justice as applied broadly within society (a.k.a. social justice). Others focus social justice on the distribution of power, property, and position within society (a.k.a. distributive justice). As examples, Jeremy Bentham, John Stuart Mill, and later Henry Sidgwick propose a utilitarian conception of distributive justice, Robert Nozick a libertarian conception, and John Rawls a conception of justice centered on fairness.4

The field of educational measurement plays a critical role in the distribution of power, property, and duties. Tests developed by the field are used to make important and life-altering decisions about entry into private schools, public exam schools, and institutes of higher education. Test scores inform decisions about high school graduation, scholarships, fellowships, internships, and other forms of financial assistance and career training. For many fields of employment, including policing, teaching, nursing, foreign service, and the law, tests function as a gatekeeper, allowing some test takers entry to the field while others are locked out. These and other uses of tests have considerable impact on the distribution of power, property, and duties within our society, and play a critical role in the justness of our society.

Despite the important role tests play in the distributive justice of our society, the concept of justice has received little attention within the field of educational measurement. As an example, a search of all articles published in Educational Measurement: Issues and Practice reveals only one article in which the term justice appears in the title. An additional 45 articles employ the term justice at least once within the article, but in almost all cases, the term is used absent a deep exploration of the role testing plays in impacting justice within society, or of the role justice plays in informing practices within the field of educational measurement. In contrast, the term fairness appears in 14 titles and 476 articles, consequences in 23 titles and 546 articles, validity in 63 titles and 976 articles, and test in 308 titles and 1,524 articles.5

Outside of Educational Measurement: Issues and Practice, there are three notable explorations of the relationship between justice and testing. The first is a book authored by Zachary Stein, titled Social Justice and Educational Measurement: John Rawls, the History of Testing, and the Future of Education. Stein applies Rawls’s theory of Justice as Fairness to examine the ways in which state implementations of federally mandated student testing programs contribute to and infringe on the advancement of justice within the U.S. educational system. Stein also examines efforts by educators, students, and parents to resist high-stakes uses of educational tests through the lens of Rawls’s conception of civil disobedience. Stein’s analysis focuses more broadly on high-stakes testing and justice within the educational system and builds an argument that such testing programs, on balance, yield greater infringements on, rather than benefits to, justice, and thus civil disobedience against these programs is warranted.6

Heinz-Dieter Meyer focuses his analysis of justice on access to higher education. Although Meyer’s exploration focuses on several tensions in the production of the pool of students who engage in higher education studies, use of tests to inform admission decisions is a component of his analysis. Meyer examines various facets of higher education access through the lens of both individuals attempting to gain access to advance their interests and institutions granting access in ways that fulfill their specific missions. Meyer considers these two perspectives through the lenses of libertarian, utilitarian, Justice as Fairness, and communitarian conceptions of justice. His analysis demonstrates how one’s perspective on access to higher education—specifically what should be considered and prioritized in informing admissions decisions—varies across these multiple conceptions of justice.7

Rebecca Zwick similarly applies multiple theories of justice to examine three cases that have shaped the field of educational measurement. Her analysis initially appeared in a chapter coauthored with Neil Dorans, and is revisited in her presidential address to the National Council on Measurement in Education. Zwick and Dorans’s analysis applies three lenses of distributive justice—the Aristotelian concept of virtue, Nozick’s libertarian perspective, and Rawls’s Justice as Fairness theory—to examine how one’s perspective on the fairness of decisions in three cases might vary depending on the frame of justice through which the ruling is viewed. The three cases of interest include: Castaneda v. Regents of the University of California, in which the plaintiff argued test scores were given undue weight over other indicators of merit in admission decisions; Debra P. v. Turlington, which focused on use of test scores to inform graduation decisions; and challenges to the use of different cut-scores on the PSAT to inform National Merit Scholarship eligibility scores.8 Zwick and Dorans’s analysis showed that the issues at play, and the degree to which the resulting rulings and actions support the production of a just distribution of higher education access, high school diplomas, and scholarships, varied depending on the lens of distributive justice through which each case was viewed.9

Most recently, Jennifer Randall applied Rawls’s theory of Justice as Fairness in her analysis of sensitivity and bias guidelines employed by large-scale testing programs to support the fairness of the content of test items. As described in greater detail below, a key principle in Rawls’s theory of Justice as Fairness is the maximization of benefit for those people who are most disadvantaged in a society. Randall applies this principle to argue for a number of modifications to sensitivity and bias guidelines in an effort to prioritize benefit to test takers who are members of groups historically harmed by racialized and other forms of oppression. Through this prioritization, Randall argues test development can shift from a process informed by white supremacy to one that is guided by a justice-oriented, anti-racist perspective.10

In the remainder of this chapter, I build on the preceding analyses to identify ways in which current practices in the field of educational measurement align with a utilitarian conception of justice. I then explore how the field’s embrace of Rawls’s Justice as Fairness might alter practices. This analysis is furthered by examining Charles Mills’s critique of and rectificatory extensions to Justice as Fairness to explicitly address racialized and other forms of oppression, and by exploring additional implications that Mills’s extensions have for our practices.

Utilitarianism and Educational Measurement

At the highest level, theories of justice explore principles aimed at balancing individual rights, fairness, and equality within the context of a functioning and sustainable civil society. An essential tension explored in theories of social justice focuses on benefits and harms produced by the distribution of power, property, and duties required for and produced through social cooperation.11

In its most basic form, utilitarianism aims to maximize net benefit across individuals. As political scientist Michael Sandel states in his introduction to utilitarianism, “one way of thinking about the right thing to do … is to ask what will produce the greatest happiness for the greatest number of people.”12 When considering the formation of a social structure, policy, or practice within a society, according to the 19th-century English philosopher Jeremy Bentham, utilitarianism favors that option which increases the sum of interests of the several members who compose it … A thing is said to promote the interest, or to be for the interest, of an individual, when it tends to add to the sum total of his pleasures: or, what comes to the same thing, to diminish the sum total of his pains.13

For Bentham, pleasure takes various forms including profit, convenience, advantage, emoluments, happiness, and so on. Pain is produced by inconvenience, disadvantage, loss, and unhappiness.14 John Stuart Mill termed this aim to maximize pleasure the Greatest Happiness Principle. Core to the Greatest Happiness Principle is maximizing total happiness across all individuals without regard to the happiness (or pain) of any given individual.15

Within utilitarianism, no consideration is given to the manner in which benefit is distributed among individuals.16 As a result, some individuals can experience considerably greater benefit than others. In addition, maximum net benefit can be produced despite harm to some individuals as long as that harm does not deprive a person of their personal liberty, property, or any other legally obtained possession (unless that person has forfeited those rights in some manner). Further, the assessment of benefit and harm is impartial such that everyone’s level of benefit or harm counts the same.17 As we will see shortly, Rawls is critical of utilitarianism’s tolerance of harm for some in the derivation of maximum net benefit.

It is this tolerance of harm for some that aligns several current practices within the field of educational measurement with utilitarianism. Here I offer three examples of current practices that, while to the benefit of most, permit harm to some. These practices include the use of differential item functioning to identify potential item bias, development of “context neutral” item content, and reliance on statistical means to evaluate efficacy. In presenting these examples, I am not arguing that they are necessarily problematic or should be halted. Rather, I use these examples to show how current practices align with utilitarian principles of distributive justice. Later in the chapter, I use these same examples to explore how practices might differ if a Justice as Fairness or a Rectificatory Justice frame were applied to modify these practices.

Differential Item Functioning

Differential item functioning (DIF) is commonly employed by test developers to identify items that operate in a biased manner. DIF analyses are based on a key assumption in item response theory—a foundational theory used to inform the development of nearly all tests produced since the 1980s. In item response theory, any two groups of test takers who have the same ability are assumed to have the same probability of responding correctly to a given item. This test theory also assumes that, for a given item, a test taker with a higher ability will have a higher probability of responding correctly compared to a test taker with a lower ability. Further, the theory assumes that, for any given test taker regardless of their ability, the probability of responding correctly to an easier item will be higher than the probability of responding correctly to a more difficult item. Differential item functioning analysis serves as an empirical test of this first assumption. In a typical DIF analysis, two groups of test takers are formed, most often based on a demographic characteristic such as racialized identity, gendered identity, socioeconomic household status, disability status, English as a second language status, and so on. One group serves as a reference group (the normative expectation for how the item should function), and the other as a focal group (the group for which potential bias is a concern). Typically, the group that is most advantaged in society serves as the reference group, and the group disadvantaged in society forms the focal group. As an example, when examining DIF by racialized identity, test takers membered White typically form the reference group, and test takers membered Black or Asian or Latine or Indigenous form the focal group(s). Several methods can be used to conduct a DIF analysis. In most methods, members of the reference and focal groups are divided into several subgroups, or bands, based on their estimated ability. 
Typically, the total test score is used to form these ability-based bands. Recall that the assumption tested in a DIF analysis is that test takers with the same ability have the same probability of responding to an item correctly. Forming bands based on ability allows the observed probability of a correct response within a given ability subgroup to be compared between the reference and focal groups. When differences in the probability of responding correctly within each band are minimal, the assumption is supported, and the item is interpreted as operating in an unbiased manner. When differences are detected and those differences exceed a specified tolerance level, the assumption is violated, and the item is classified (or flagged) as potentially biased.
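The banding procedure just described can be made concrete with a small sketch. The Python below is a simplified illustration, not a method used by any particular testing program: it forms bands from total scores, compares the proportion answering correctly in the focal and reference groups within each band, and averages the band-level differences weighted by focal-group counts, in the spirit of standardization-style DIF indices. The function names, the band width of 5 points, and the tolerance of 0.10 are assumptions chosen for illustration; operational analyses typically rely on procedures such as Mantel-Haenszel or model-based tests with their own flagging criteria.

```python
from collections import defaultdict


def band_of(total_score, band_width):
    """Assign a test taker to an ability band using the total test score."""
    return total_score // band_width


def dif_screen(records, band_width=5, tolerance=0.10):
    """Screen one item for differential functioning (illustrative only).

    records: iterable of (group, total_score, item_correct), where group is
    "reference" or "focal". band_width and tolerance are hypothetical values.
    """
    # For each band, track [number of test takers, number answering correctly].
    counts = defaultdict(lambda: {"reference": [0, 0], "focal": [0, 0]})
    for group, total_score, item_correct in records:
        cell = counts[band_of(total_score, band_width)][group]
        cell[0] += 1
        cell[1] += int(item_correct)

    per_band = {}
    weighted_sum = 0.0
    focal_total = 0
    for band, cell in sorted(counts.items()):
        n_ref, right_ref = cell["reference"]
        n_foc, right_foc = cell["focal"]
        if n_ref == 0 or n_foc == 0:
            continue  # no comparison is possible in a band missing one group
        # Difference in observed proportion correct at matched ability.
        diff = right_foc / n_foc - right_ref / n_ref
        per_band[band] = diff
        weighted_sum += n_foc * diff
        focal_total += n_foc

    summary = weighted_sum / focal_total if focal_total else 0.0
    return {
        "per_band": per_band,
        "summary": summary,
        "flagged": abs(summary) > tolerance,
    }
```

A negative summary value indicates that, at matched total scores, focal-group test takers answered the item correctly less often than reference-group test takers. Note that the sketch only flags the item; what happens to a flagged item is a separate decision.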

In most cases, an item flagged for potential bias is then subjected to review by a panel of experts who examine the content of the item to identify a potential reason the item functions differently across the reference and focal group(s). As an example, panel review of a flagged item that functions differently across high and low socioeconomic groups might determine that using a rowing regatta or travel to an overseas location as the context for a mathematics problem might contribute to the differential functioning of the item. In such a case, the item might either be removed from the test or revised by changing the context in which the problem is set to one that is similarly familiar across test takers from higher and lower socioeconomic households.

There are two locations within differential item functioning analyses where a utilitarian perspective is evident. The first occurs in the business rule that establishes review procedures for a flagged item. Recall that an item is flagged when empirical evidence indicates the item is functioning differently for two groups of test takers. This empirical evidence indicates that a core assumption undergirding the theory employed to guide test development is violated. For some DIF detection methods, the empirical evidence comes from statistical analyses, and an item is flagged only when the difference in its functioning across groups is found to be statistically significant. In effect, findings from a DIF analysis that result in the flagging of an item provide empirical evidence that the item functions differently, violating a key assumption of item response theory, and thus is performing in a manner that biases the information provided by the item. Yet, despite the harm produced by the item for a subset of test takers, the item is not automatically removed or modified. Instead, it is forwarded for a panel review. Some in the field may argue that this process is enacted because the findings from a DIF analysis are not definitive. Others may observe that the cost of item development is high, and automatically removing items as a result of an empirical analysis would further inflate test development costs.18 Both positions are reasonable from a utilitarian perspective that aims to maximize benefit within a system, allowing for potential harm to some.

The influence of utilitarianism is further evidenced in the business rules that guide a decision about removing or retaining an item following panel review. In most cases, an item is removed only if a potential cause of differential item functioning is identified by the panel. If the panel fails to identify content or other elements of an item that are suspected of producing differences in item functioning, then the item is retained. Although not well documented in the field, it is understood that most reviews fail to identify a potential cause of differential functioning. As a result, most items are retained for operational use despite empirical evidence indicating that they are functioning differently across groups. Again, the utilitarian priority for maximizing happiness—in this case, for the test developer and the group of students whose probability of responding correctly is higher—despite harm to some, is operative in this business rule.

Context-Neutral Item Content

Jennifer Randall presents a critical analysis of current test development processes that support the development of “bias-free” item content. Randall’s analysis focuses on both the development of item content and the guidelines used to review that content.

A core aim of item development is to author items that measure the targeted construct—the knowledge, skill, and/or ability intended to be measured by the test—in a manner that removes, at best, or minimizes, at worst, bias from the measurement process. Most test items require test takers to apply the targeted construct in a given context. As an example, reading comprehension requires the presentation of a context in which the test taker reads content—such as a passage, poem, or speech—and is then asked about that content. Similarly, many mathematics and science items ask test takers to apply the targeted construct within a given context, such as purchasing food items at a grocery store, calculating distances or driving times between cities, or calculating the distance, maximum height, or speed at which a ball travels under specific conditions.

To reduce bias, many in the field advocate for the use of context-neutral content. A neutral context is one that is equally familiar across subgroups of test takers. Use of neutral context is assumed to eliminate or, at a minimum, reduce potential bias produced by differences in the familiarity with the setting or material used to measure the targeted construct. Randall’s analysis points to at least two issues that arise when test developers attempt to produce context-neutral content. First, Randall argues that current efforts to develop context-neutral content tend to yield content that is most familiar to test takers membered White.
As Randall describes,

Historically, the field of measurement has considered the inclusion of linguistic, communicative, and cultural characteristics in test items as construct irrelevant variants; the Standards state explicitly that “test developers should use language in tests that is consistent with the purposes of the test and that is familiar to as wide a range of test takers as possible” … Without a critical lens, such approaches to test development provide the illusion of fairness, but in reality, these practices serve only to support the current white supremacist hegemony. For example, it can be argued that “familiar to as wide a range of test takers as possible” is simply coded language for “must use White Mainstream English” (often referred to as Standard Edited American English). Put simply, the Standards, however unintentionally, call for test developers to ignore/exclude the linguistic and cultural characteristics associated with, for example, Black culture in favor of/to privilege the linguistic and cultural characteristics associated with whiteness.19

Because White norms are dominant within our society, content that presents a “neutral” context that is familiar to the largest proportion of the test-taking body is likely to align with White norms. This issue is exacerbated by overrepresentation of people membered White in the item authoring process. Although the field has not collected sound data on the demographic characteristics of people who work in various sectors of the field, what little data exist indicate that the field is overrepresented by people membered White and underrepresented by people membered Black, Latine, and/or Indigenous.20 This overrepresentation of people membered White contributes further to employing contexts that are more likely to be familiar to people membered White because it is just such people who are selecting those contexts for test items.

Randall similarly presents a critical analysis of the guidelines used to inform bias and sensitivity review, which is conducted prior to finalizing an item. The review process aims to identify content that may cause bias between specific subgroups of test takers or that might trigger concern or adverse reactions for a subgroup of test takers. Although the aims of bias and sensitivity review are admirable, Randall again highlights the White framing of current guidelines. Randall further argues that the position from which current bias and sensitivity guidelines were developed is one of fear. This fear stems from potential actions that a person or subgroup of people who raise concern about item content may take, such as legal action or a press story, that create harm to the test developer and/or the testing program.
As an example, content of reading passages that focus on racism might elevate concern among parents who then file a lawsuit against the testing program. Randall argues that fear of negative responses and legal actions is the primary driver for many of the sensitivity guidelines. Together, this fear orientation and White hegemony combine to produce bias and sensitivity guidelines that tend to permit content that is familiar and acceptable to people membered White and in turn limit the use of content that may be more familiar or resonate more strongly with people membered not-White.

As examples of content that is restricted, but which may resonate with people membered not-White, Randall points to guidelines that warn against the use of content focused on racial justice, slavery, genocide, experimentation on people, and similar social problems. Randall argues that content specific to human experiments, such as the Tuskegee and Guatemalan syphilis experiments, the mass killing of Indigenous people on the American continents, the role of slavery in the development of the U.S. economy, and various civil rights movements, are made inappropriate content for most test items. For many people membered not-White, exclusion of topics such as these requires engagement with content that may have less meaning and personal resonance when applying the construct targeted by a test or test item.

Randall makes similar arguments about guidelines focused on the use of nonstandard English language and images that accompany items. Here, Randall argues that “these guidelines imply that everything but ‘whiteness’ should be considered a barrier, thereby marginalizing what whiteness has deemed as ‘other.’”21 Guidelines that restrict content specific to the orientation of people similarly elevate one identity over others:

heteronormativity requires that the default assumption one makes about orientation will always be straight and cis-gendered … The issue here goes beyond the acceptance and inclusion of LGBTQ+ content—relevant or irrelevant—on an assessment. The broader problem is bias and sensitivity guidelines leveraging construct irrelevance as a tool of erasure.22

From a utilitarian frame, efforts to produce “context-neutral” content that meets current bias and sensitivity guidelines maximize benefit for most test takers (particularly those membered White, those who embrace White normativity, and/or those who resonate with heteronormativity). By reducing risk of legal action or public protest, these efforts also provide benefit to test developers and testing programs. One might further argue that testing programs are designed to provide information about students, schools, and the educational system that supports improvements to teaching and learning that aim to benefit all students and society more broadly. These benefits, however, come at the expense of content that may be more engaging for subgroups of test takers for whom “controversial” content is both meaningful and impactful—both in its absence and when it is present.

Mean Effect

The preceding two examples focus on test development practices aligned with utilitarian principles that allow harm to some in the pursuit of maximum benefit across all. The next example focuses on a practice employed when using test scores and other outcome indicators to evaluate the efficacy of educational initiatives. In presenting this example, I am not arguing against its soundness as a statistical analytic technique, but rather considering how it aligns with principles of utilitarianism. Later, I will explore how different perspectives on justice might alter this practice.

Most efficacy studies center analyses on the mean effect a given intervention has on an outcome. There are many different designs social scientists implement to examine efficacy of educational interventions, some employing complex and sophisticated sampling techniques, others applying statistical techniques to reflect the clustering of people within units, still others using a variety of covariates to adjust estimates for the lack of randomization in the assignment of participants to conditions, and many studies combining two or more of these methods. Across designs, however, the primary question of interest focuses on the effect an intervention has on participants, on average.

Focusing on average (or mean) effects, whether adjusted for various factors such as clustering (a.k.a. nesting) or nonrandom assignment, treats the sample participating in the study as a monolith—as a single body of people who experience the intervention in the same (or very similar) way. It is this assumption of similar experience that allows the mean to represent the experience of all participants. In turn, the use of the mean to represent the monolith obscures considerable variability that may exist in how participants experienced the intervention and how the intervention impacts each individual.

In making this statement, I acknowledge that nearly all analyses report standard deviations for outcome measures. Similarly, standard errors are reported for statistics calculated to estimate the magnitude of effects, differences among groups, and relationship with covariates. Both standard deviations and standard errors provide information about variability both within the sample and within the population represented by the sample. Yet, as Janet Helms observes, most discussions of findings focus on mean effects, differences among groups, and relationships with covariates absent deep consideration of variability among participants. In addition, rarely do analyses deeply explore such variation.
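A small numerical sketch may help show how a mean can mask this variability. The Python below assumes, hypothetically, that an effect estimate is available for each participant; in practice individual-level effects are rarely observable and must be approximated, for example through subgroup or quantile analyses. The function name and the data are invented for illustration and are not drawn from any study.

```python
from statistics import mean


def effect_summary(effects):
    """Summarize hypothetical per-participant effects beyond the mean."""
    ordered = sorted(effects)
    quarter = max(1, len(ordered) // 4)  # size of the bottom and top quartiles
    return {
        "mean": mean(ordered),
        "bottom_quartile_mean": mean(ordered[:quarter]),
        "top_quartile_mean": mean(ordered[-quarter:]),
        # Share of participants for whom the intervention appears harmful.
        "share_negative": sum(e < 0 for e in ordered) / len(ordered),
    }


# Invented effects for eight participants: the average is positive,
# yet three of the eight are worse off.
example = effect_summary([-4, -3, -2, 0, 2, 4, 7, 12])
```

In this invented example the mean effect is 2, while the bottom quartile averages -3.5 and the top quartile 9.5. A report limited to the mean would describe the intervention as beneficial without revealing that more than a third of participants experienced a negative effect.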
Further, Helms argues that in studies that compare differences across racialized groups, there is considerably greater variation (or differences among individuals) within groups than between groups.23 Similarly, the efficacy of an intervention typically varies considerably among participants, sometimes resulting in positive effects for some students and negative effects for others. Yet, rarely do studies compare the magnitude or direction of the effect for people in the top decile or quartile with those in the bottom decile or quartile. Instead, the mean effect is typically employed as a representation of the group effect. And when the mean effect is statistically significant, the intervention is typically endorsed regardless of its effectiveness for all members of the sample. It is in this way that analyses that focus on the mean effect align with utilitarianism—effects that, on average, are positive and statistically significant are embraced, regardless of whether they work for all or produce potential harm for some.

Rawls’s Justice as Fairness

Rawls’s theory of Justice as Fairness confronts the harm to individuals and subgroups of individuals that may be produced by self-interest. In its simplest form, Justice as Fairness aims to maximize satisfaction (a.k.a. happiness or benefit) for all. Doing so, however, does not require the level of satisfaction to be the same for all members of society. Rather, Justice as Fairness allows for unequal distribution of satisfaction. Any such inequality, however, must result from a policy or practice that increases satisfaction for all without decreasing satisfaction for any. Rawls contrasts the utilitarian view with the Justice as Fairness view as follows:

The question is whether the imposition of disadvantages on a few can be outweighed by a greater sum of advantages enjoyed by others; or whether the weight of justice requires an equal liberty for all and permits only those economic and social inequalities which are to each person’s interests … In the one we think of a well-ordered society as a scheme of cooperation for reciprocal advantage regulated by principles which persons would choose in an initial situation that is fair, in the other as the efficient administration of social resources to maximize the satisfaction of the system of desire.24

In addition to increasing satisfaction for all, Rawls also gives preference to that option which provides the greatest benefit for those most disadvantaged in society. This is not to say that policies should be designed to provide benefit to those disadvantaged in ways that harm those people advantaged in society. Rather, this principle holds that when two or more options are available, each of which provides benefit to all, the option that provides the greatest benefit to those most disadvantaged in society is prioritized.
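The contrast between the two selection rules, as characterized in this chapter, can be sketched in a few lines of Python. This is a deliberately simplified illustration rather than a formalization of Rawls’s theory: the option names, group labels, and numeric changes in satisfaction are hypothetical, and satisfaction is treated as a single comparable number, which neither theory requires.

```python
def utilitarian_choice(options):
    """Pick the option with the largest total change in satisfaction."""
    return max(options, key=lambda name: sum(options[name].values()))


def fairness_choice(options, most_disadvantaged):
    """Pick among options that benefit every group, favoring the one that
    gives the most to the most disadvantaged group (simplified reading)."""
    admissible = {
        name: changes
        for name, changes in options.items()
        if all(change > 0 for change in changes.values())
    }
    if not admissible:
        return None  # no option benefits everyone
    return max(admissible, key=lambda name: admissible[name][most_disadvantaged])


# Hypothetical changes in satisfaction for three groups under three options.
options = {
    "A": {"most_disadvantaged": -1, "middle": 4, "advantaged": 9},
    "B": {"most_disadvantaged": 3, "middle": 2, "advantaged": 1},
    "C": {"most_disadvantaged": 1, "middle": 3, "advantaged": 5},
}
utilitarian_pick = utilitarian_choice(options)
fairness_pick = fairness_choice(options, "most_disadvantaged")
```

With these invented numbers, the utilitarian rule selects option A, which has the largest total (12) even though it harms the most disadvantaged group. The fairness rule excludes A because it does not benefit all, and between B and C it selects B, which offers the largest gain to the most disadvantaged group despite a smaller total.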
In establishing this principle, Rawls permits the adoption of an option that may not provide maximal benefit across society, but instead, while providing some benefit to all, provides maximal benefit (compared to the alternative options) to those most disadvantaged in society.25

An essential component of Rawls’s theory is the concept of fairness. For Rawls, fairness does not focus on the distribution of satisfaction. That is, it is not the distribution of satisfaction that is said to be fair or unfair. Rather, fairness centers on the state in which the rules governing society or an institution within society are established. It is acceptable to establish rules that produce an unequal distribution of a certain form of satisfaction, provided that the satisfaction of any individual or subgroup is not diminished. What makes the production of the rules satisfy the condition of fairness is the state people are in when the rules are produced. Rawls terms the ideal state in which rules governing society are developed the original position. In the fictitious original position, all people forming a rule are equal with respect to their level of satisfaction. In addition, no person has knowledge of how the rule will affect their level of satisfaction

Educational Measurement and the Pursuit of Racial Justice 359 except that their level of satisfaction will increase to some degree. In this way, they act under a veil of ignorance about the position they hold within society—they know nothing about their racialized, gendered, economic, ableness, or other forms of identity, the material resources they hold, or the duties they perform. In the original position, then, the formation of rules is free from the influence of self-interest, for none of the members forming the rules knows how the rule will impact them. “Since all are similarly situated and no one is able to design principles to favor his particular condition, the principles of justice are the result of a fair agreement or bargain.”26 Fairness, then, derives from a lack of knowledge regarding one’s future state and how a given rule impacts the formation of one’s future state. For Rawls’s concept of justice, fairness is essential. Fairness in educational measurement is also an essential component for establishing the validity of inferences and interpretations made based on the scores produced by a test. Issues of fairness appear throughout the Joint Standards, in which an entire chapter focuses on fairness in testing.27 In educational measurement, however, fairness pays little attention to the people who make decisions during the development, administration, scoring, and reporting of tests and their scores, or the state those people are in. Rather, fairness focuses on issues considered and processes employed throughout the testing process to minimize systematic error (a.k.a. bias) in scores produced by the test. The primary concern of fairness in testing is minimizing the influence of constructs that are not the target of measurement on the measure of the targeted construct. 
Given the racial, class, and gender biases present throughout the history of testing, and the well-documented challenges the testing industry has historically had making tests accessible for students with different forms of ableness or who are developing English language fluency, fairness in testing also focuses special attention on processes and procedures that minimize error in test scores for specific subgroups of test takers. In this way, fairness in educational measurement is specific to “equitable treatment of all test takers during the testing process” and “the lack or absence of measurement bias … access to the constructs measured, and … validity of individual test score interpretations for the intended use(s).”28 The aim of fairness in testing is “protecting test takers and test users in all aspects of testing.”29

Common across Rawls’s concept of fairness and fairness as defined by the Joint Standards is a focus on the ultimate impact on people. What differs is twofold. First, Rawls focuses on the conditions under which rules and practices are established, with specific concern about protecting against decisions made in one’s self-interest. The Joint Standards pay no attention to who makes decision rules or how they are made; they are concerned only that the impact of the rules and practices minimizes bias, construct irrelevancy, or negative impacts on validity. Second, Rawls recognizes that rules and practices may

produce differences in the distribution of a given satisfaction, but holds that whatever impact does occur, all must realize some benefit. The Joint Standards strive for equitable treatment but do not define what does and does not satisfy equitable treatment.

To explore the potential relevance of Rawls’s theory of Justice as Fairness, I revisit the three examples of practices in educational measurement, this time contemplating alternate ways in which these practices might be employed if formed from the original position and aimed to maximize benefit for those most disadvantaged. In this thought experiment, three considerations are common across topics. The first focuses on the position from which current business rules and practices were developed. For each topic, it was technical experts in positions of power within the field who developed business rules and practices. Further, status as an expert was known by those experts. Absent, and therefore silent, in the process of developing business rules and practices were people outside of the field who may be impacted by the application of those rules and practices or who may otherwise be impacted by the outcomes of such applications. In this way, the development of business rules and practices was a technical endeavor engaged to guide, support, and in some cases, protect members of the field. The second consideration focuses on changes to business rules and practices that might emerge had those rules and practices been developed from the original position. In other words, the key issue explored in these thought experiments is how those rules and practices might be different had those who developed the rules not known they: (a) would be members of the field; (b) held expertise; and (c) would be the ones implementing the rules rather than those who are impacted by the implementation of the rules. 
The final consideration relates to the intended impact of the rules and practices. From a Justice as Fairness perspective, the adoption of any rule, practice, procedure, or structural arrangement should maximize benefit, particularly for those most disadvantaged within society. Rawls also maintained that benefit should occur for all, or at a minimum, harm should be produced for none. In the analyses that follow, I acknowledge that it is not always possible to produce both benefit for all and maximum benefit for those most disadvantaged within society. In such cases, preference is given to accruing the greatest benefit for those most disadvantaged within society.

Differential Item Functioning

The current business rules guiding differential item functioning analyses employ two explicit checks that limit the removal or revision of items. In addition, the dominant method employed to form reference and focal groups produces a third implicit check on the detection of biased items. The first explicit check focuses on the action taken when an item is flagged for operating

differently between the reference and focal group. Recall that DIF analyses serve as a test of the assumption that test takers with the same ability have the same probability of responding correctly to a given item. An item is flagged for DIF when this assumption is violated. The current business rule forwards such items for panel review, which serves as a check on the removal of an item. The second explicit check occurs during the panel review itself, which is guided by a business rule that preserves an item if no reasonable cause for differential functioning is identified by the panel. In other words, an item is assumed to be acceptable unless the panel identifies a reason for it to be unacceptable. These two checks serve to retain items that violate a key assumption of the theory informing test development and, when the panel fails to observe something that exists, produce harm for a subgroup of test takers. As previously noted, this practice benefits the test developer by saving costs, and it benefits those for whom the item does not negatively bias test scores. From the original position—a position in which one does not know whether they are a test developer, a member of the group favored by the item, a member of the group harmed by the item, or a neutral party—one wonders whether a rule that may produce harm to those the process is intended to protect would be established. If not, is it more reasonable to establish a rule that simply removes all items for which the DIF statistic exceeds a level that indicates a violation of the assumption tested? Or, perhaps, exceeding that threshold should trigger a panel review that operates to identify recommended revisions to the item, and when no reasonable revision is identified prior to readministering the item, the item is removed? 
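The contrast between the current rule and the two alternatives just posed can be made concrete with a brief Python sketch. It rests on several assumptions that are not part of the chapter: the Mantel-Haenszel D-DIF statistic stands in for "the DIF statistic" (it is one common choice among several), the threshold of 1.5 is hypothetical, the function names are invented, and the response counts are made up for illustration.

```python
import math

def mh_d_dif(strata):
    """Mantel-Haenszel D-DIF for a single item.

    `strata` holds one tuple per matched ability level:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n:
            num += a * d / n
            den += b * c / n
    alpha = num / den               # common odds ratio; > 1 favors the reference group
    return -2.35 * math.log(alpha)  # negative values indicate DIF against the focal group

THRESHOLD = 1.5  # hypothetical flagging threshold, chosen for illustration only

def current_rule(d_dif, panel_finds_cause):
    """Flagged items survive unless the panel identifies a cause."""
    flagged = abs(d_dif) >= THRESHOLD
    return "remove or revise" if flagged and panel_finds_cause else "keep"

def automatic_removal_rule(d_dif):
    """Every item that exceeds the threshold is removed."""
    return "remove" if abs(d_dif) >= THRESHOLD else "keep"

def revise_or_remove_rule(d_dif, panel_has_revision):
    """Flagged items are revised when the panel proposes a revision, otherwise removed."""
    if abs(d_dif) < THRESHOLD:
        return "keep"
    return "revise" if panel_has_revision else "remove"

# Invented counts for two matched ability strata.
item = [(40, 10, 30, 20), (30, 20, 20, 30)]
d = mh_d_dif(item)
```

With these invented counts the item is flagged. Under the current rule it is retained whenever the panel identifies no cause, whereas both alternative rules lead to revision or removal. The sketch does not guard against strata in which the denominator is zero.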
When an item is found to produce potential bias against the focal group, either of these rules would provide maximal benefit for those most disadvantaged in society who typically form the focal group of interest. While this decision might increase the costs for the test developer, it might also reduce exposure to claims of test bias and potential legal action in response to that bias. An implicit check on the removal of potentially biased items also operates through the current method for forming reference and focal groups. The current practice typically examines DIF based on racialized identity, gender, and social economic status. In these analyses, each category of identity is examined separately. In doing so, the implicit assumption is that oppression associated with each form of identity operates independently and uniquely. As the exploration of Intersectionality Theory in Chapter 12 reveals, this assumption ignores the intersecting manner in which oppression operates, often producing more severe oppressive impacts for people whose social location is at the intersections of forms of oppression. Further, by focusing separately on each category of identity, impacts of other forms of oppression are conflated. As an example, when gender oppression operates, the

disadvantage produced for people who identify as female may be counterbalanced by the advantage produced for people membered male when racialized identity is used to form reference and focal groups. In turn, the extent to which items function differently for those people most oppressed in our society may be underestimated. Recent DIF analyses that employ an intersectional approach to reference and focal group formation provide preliminary evidence that under-detection of DIF does occur through the traditional approach to group formation. Across six different state tests and three different methods for examining DIF, the intersectional approach to group formation consistently found three to five times the level of DIF compared to the traditional approach to group formation. Moreover, for some intersectional groups, nearly half of the items were found to operate in a potentially biased manner. In contrast, the traditional approach found only one to three items operating differently across gender, racialized identity, or social economic status. Given these initial findings, it is reasonable to conclude that the traditional approach to group formation would not be adopted in the original position.30

Context-Neutral Item Content

Randall argues that efforts to produce item content that is context-neutral, and that abides by current bias and sensitivity guidelines, produce content that aligns with White norms. In turn, the use of White-normed test content produces a disadvantage for test takers membered not-White—particularly those who resist assimilation into the White normative culture. No doubt, efforts to produce content that is free of bias and minimizes triggering psychological and emotional harm have value. However, one wonders whether the business rules that guide content development would be adopted in the original position. The key issue at play in this thought experiment is the restriction on content that is more likely to trigger sensitivity issues for people membered White, and thus the use of content that is less likely to resonate or otherwise be familiar to people membered not-White. In this consideration, the use of any context will likely result in content that is more familiar to, and/or resonates more with, some test takers than others. Clearly, however, context is required for many of the constructs measured by a test. So, the question might morph from one focused on item content to one focused on the range of content employed across all items forming the test. That is, rather than developing guidelines specific to item content that will inevitably produce advantage for some and disadvantage for others, perhaps the original position would shift focus to the range of content used across all items forming the test. Guidelines developed from the original position might then encourage the inclusion of item content that balances the advantage and disadvantage

produced for specific subgroups of test takers across all items those test takers encounter. Further, to support implementation of this guideline, operating procedures might direct item authors to identify subgroups of test takers whom the item content is likely to advantage, whom the content is likely to disadvantage, and for whom the content would have a neutral impact. This information could be used to document the degree to which potential advantage is shared across subgroups of interest and, in the ideal state, document that potential advantage and disadvantage are shared equally among groups.

Mean Effect

The mean effect, or the difference between means, is typically used to evaluate the effect an intervention has on a learning outcome, such as a test score. Variation around the mean exists in nearly all datasets. In some cases, variation around the mean encapsulates participants for whom the intervention had little, no, or potentially a negative impact on the outcome of interest. In addition, for these participants, an alternate intervention might have a substantially larger positive impact. By focusing on mean effects, impacts on different subsets of participants are often overlooked. This oversight is particularly problematic when those participants are disproportionately members of society who are most disadvantaged by society. Although use of mean effects is useful for informing decisions that are likely to produce positive impacts for the majority of the population to whom the findings are generalized, it may also exacerbate inequities when those who are marginally or negatively impacted by the intervention or for whom an alternate intervention would be more effective are members of an already-disadvantaged group. When considered from the frame of Justice as Fairness, reliance on mean effects will likely fail to uphold the principle of providing the greatest benefit to those most disadvantaged in society.

Rather than focusing on mean effects, business rules developed from the original position might focus attention on the direction and magnitude of effects at different bands within the distribution of outcomes. As an example, a series of studies that examined the effect that use of a computer to compose essays on a writing test had on test scores found that there was no mean difference in scores of essays produced via pen-and-paper compared to those produced on a computer. This finding suggests that it does not matter whether the writing test is administered on paper or on a computer. 
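How a null mean effect can conceal opposing effects within bands can be shown with a short Python sketch. The records below are invented for illustration and are not drawn from the studies cited; the banding variable `prior`, the field names, and the function names are likewise assumptions.

```python
from statistics import mean

def effect(records):
    """Mean score on computer minus mean score on paper."""
    computer = [r["score"] for r in records if r["mode"] == "computer"]
    paper = [r["score"] for r in records if r["mode"] == "paper"]
    return mean(computer) - mean(paper)

def effects_by_band(records, key, n_bands=3):
    """Split records into equal-sized bands on `key` and compute the effect in each."""
    ordered = sorted(records, key=lambda r: r[key])
    size = len(ordered) // n_bands
    bands = []
    for i in range(n_bands):
        start = i * size
        # The last band absorbs any remainder.
        end = (i + 1) * size if i < n_bands - 1 else len(ordered)
        bands.append(effect(ordered[start:end]))
    return bands

# Invented data: `prior` is a hypothetical background measure used only to form bands.
scores = [5, 3, 5, 3, 4, 4, 4, 4, 3, 5, 3, 5]
data = [
    {"prior": i + 1, "mode": "paper" if i % 2 == 0 else "computer", "score": s}
    for i, s in enumerate(scores)
]

overall = effect(data)                     # mean effect across the whole sample
by_band = effects_by_band(data, "prior")   # effect within each third
```

In this invented sample the overall effect is zero, yet the lowest band shows a negative effect and the highest band a positive one, which is exactly the kind of pattern a mean-only analysis would miss.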
Further analyses, however, divided the sample into three groups based on the keyboarding speed of test takers, with those in the lowest third forming one group, those in the middle third a second group, and those in the highest third a third group. Differences in performance between essays authored on paper and on a computer were then compared. Results of these analyses found that for students in the

lowest third, authoring essays on paper resulted in significantly higher scores. In contrast, for those in the highest third, authoring essays on a computer resulted in significantly higher scores. For the middle third, no differences in scores were observed.31 This reanalysis produced a very different story regarding the effect that mode of administration had on the essays produced by students. From a Justice as Fairness perspective, analyses such as this may deepen understanding of the distribution of effects and the characteristics of students who benefit from those effects. Such knowledge might then inform whether and for whom a given intervention is adopted.

Mills’s Rectificatory Justice

Rawls employs the original position as a tool for engaging in a thought experiment focused on developing what he terms “ideal theory”—the design of a perfectly just society. The original position is useful for considering the rules, regulations, practices, and structures that support the formation of a society that is “ideally just.” Although Charles Mills is similarly interested in forming an ideally just society, he is critical of Rawls’s use of an original position in which a veil of ignorance obscures existing injustices. As the second quote opening this chapter reflects, Mills was interested in taking action to create a more just society by proactively correcting for past injustices. In his critique of Rawls, Mills differentiates between “‘ideally just’ as meaning a society without any previous history of injustice and ‘ideally just’ as meaning a society with an unjust history that has now been completely corrected for.”32 Mills concludes that Rawls focused only on the former. He further argues that the fictitious, ahistorical position from which Rawls develops his ideal theory makes that theory inapplicable to the racist reality of current U.S. society. 
As Mills reasons,

The Rawlsian ideal, starting from ground zero, is a society with no history of racial (or any other kind of) injustice. So all we need is appropriate anti-discrimination legislation to make sure that this injustice does not enter the basic structure.

Interestingly, Mills extends this logic to argue that,

not only would [anti-discrimination legislation under the Rawlsian ideal] produce a racism-free polity; it would produce a race-free polity … [given that] race is socially constructed, and without systemic discrimination race would not even have come into existence in the first place.33

Such an ideal world is so far from current reality that Mills rejects Rawls’s starting point.

Mills further argues, “Preemptive precautions to prevent injustices from entering the ‘basic structure’ of a society are not the same as rectificatory measures aimed at correcting them once they have already entered. Prevention generally differs from cure.”34 Given the normalcy with which racism operates in U.S. society, Mills wonders, “[h]ow theoretically useful is it then going to be in the philosophical investigation of social justice to start from a raceless ideal so remote from this reality?”35 Instead, Mills advocates for the development of Justice as Fairness “on the terrain of non-ideal theory [such] that the normative project would then no longer be the adjudication of competing versions of an ideally just social order, but, rather, the adjudication of competing policies for redressing social injustice.”36 In his effort to rectify Rawls for racial justice, Mills extends the concept of distributive justice to what he terms Rectificatory Justice that aims to correct wrongful distributions that are the product of past and continuing racialized and other forms of oppression. The primary aim of Rectificatory Justice is consistent with Rawls’s Justice as Fairness. Rectificatory Justice, however, recognizes that the legacy of oppression must first be addressed—to be rectified—before anything close to ideal justice can reasonably be considered. Mills views this as “the transition problem,” which requires careful consideration of the “route to take and by which principles to be guided.”37 What is needed are rules, regulations, practices, and structures for an ill-ordered society rather than the well-ordered society that serves as Rawls’s starting point. Mills passed away before fully elaborating his conception of Rectificatory Justice. 
However, implicit in his thinking is a desire to modify Rawls’s principle of maximizing benefit for those most disadvantaged to a principle that preferences maximizing correction for past injustices experienced by those most disadvantaged in society. Here, the difference between Rawls and Mills occurs in the aims of this prioritizing principle. Operating in an ideal society in which injustice has not yet operated, Rawls’s aim was limited to assuring that the option selected provides maximal benefit for those currently most disadvantaged in a just social system. Mills, on the other hand, recognizes that some degree of disadvantage has been unjustly produced through oppression that operates currently or historically within the social system. Mills seeks to rectify this injustice through the option that maximally closes disparities that are the product of injustice and/or which alters the conditions that allowed racialized and other forms of oppression to operate. As his analysis of affirmative action and white ignorance (see later in this section) evidences, for Mills, closing disparities permits the use of policies, practices, and structures that give preference to those who are disadvantaged through past and ongoing injustice(s), and force society to confront ideological fallacies that enable racialized and other forms of oppression. Whereas the application of Rawls’s theory of Justice as Fairness to the three educational measurement scenarios altered business rules and practices

to maximize benefit for those most disadvantaged in our society while attempting to produce no harm for those holding advantaged positions, Mills’s rectificatory conception of justice permits preferencing people who have been disadvantaged through unjust policies and actions despite the resulting challenges introduced for those who are advantaged, at least in the short term. It is through this preferencing and ideological confrontation that corrective actions aim to rectify past injustice and elevate all to the proverbial “even playing field.” With this aim in mind, I now revisit the three educational measurement scenarios and reconsider the formation of business rules and practices developed through the frame of Mills’s Rectificatory Justice.

Differential Item Functioning

Earlier I suggested that business rules and practices developed through the lens of Rawls’s Justice as Fairness might yield three changes to current practice: (1) automatic removal of items that exceed empirical thresholds that indicate an item functions differently across groups; (2) altering the role of the review panel such that it suggests modifications to remove bias and if no such modifications are identified, removing the item; and (3) employing an intersectional approach to examining DIF. Through the frame of Mills’s Rectificatory Justice, each of these recommendations would likely persist. Taking a more aggressive approach to removing items that are found to behave in ways that violate a key assumption of item response theory and thus perpetuate harm to test takers whose disadvantage in society has been produced by unjust racialized and other forms of oppression is clearly aligned with Mills’s conception of Rectificatory Justice.

To these changes, two additional practices might be considered. The first builds on the idea of racialized enactments and focuses on those items for which a DIF statistic does not exceed a threshold for flagging or removal, but still indicates a small level of differential functioning. As Claudia Rankine’s Citizen poetically expresses, to the (White) outsider, racialized enactments may appear to be small incidents that produce momentary discomfort.38 But in fact, for the people who experience racialized enactments, these enactments function as a series of cuts that accumulate as considerable pain. Test items that contain small degrees of bias—a magnitude that falls short of a critical threshold—nonetheless cause harm to some. And when several items, each of which inflicts a small magnitude of harm, are added together, the cumulative effect can produce notable impact. To account for the cumulative impact of micro-bias, DIF analyses might be expanded from focusing on bias within each individual item to examining cumulative bias. 
This expansion would require the field to explore adaptations to existing techniques and, perhaps, develop new methods to estimate

the cumulative impact small degrees of differential functioning have on bias in the total test score. Whether thresholds are then established to determine when the cumulative effect exceeds an acceptable level or methods for adjusting scores are introduced is an additional topic for consideration.

A second, more radical topic for consideration that the lens of Rectificatory Justice opens focuses on the removal of only those items that produce bias against subgroups whose disadvantage in society is the product of past unjust oppression. Although DIF analyses typically focus on bias that negatively impacts the focal group, some items are found to operate in a manner that favors the focal group. This occurrence produces what is termed positive DIF. In current DIF practices, items identified for either negative or positive DIF are subject to review and potential removal. Given the past harm inflicted through racialized and other forms of oppression, and through undetected bias in tests, Rectificatory Justice might support removing only those items that produce negative DIF and retaining those items that produce positive DIF (perhaps restricted to moderate levels). Given the conservative practice that has long been in place and which continues to produce harm for those most disadvantaged in society, the rectificatory frame might permit correction to occur through the inclusion of items that function in ways that benefit test takers who are members of previously harmed groups.

Context-Neutral Content

In addition to his critique of Rawls’s original position, Mills also explores the concept of white ignorance. Mills argues that white ignorance is manufactured to veil people membered White from racism and the racial injustices it produces. Mills uses the term ignorance “to cover both false belief and the absence of true belief.”39 In developing his conception of white ignorance, Mills extends the concept of ignorance described by Alvin Goldman in Knowledge in a Social World. For Mills, white ignorance encompasses “the ‘spread of misinformation,’ the ‘distribution of error’ (including the possibility of ‘massive error’) within the ‘larger social cluster,’ the ‘group entity,’ of whites and the ‘social practices’ (some ‘wholly pernicious’) that encourage it.”40 Recalling the model of systemic racism presented in Chapter 3, Mills’s white ignorance functions within racialized ideology and narratives to blind people membered White to the institutional and social structures and historical practices that produce racialized oppression and resulting disparities. Further, white ignorance transfers responsibility for those disparities to racialized individuals and cultures. In describing white ignorance, Mills acknowledges that the existence and operation of white ignorance is not limited to people membered White and thus influences some people membered not-White. In addition, white ignorance does not exist and operate

uniformly across the population membered White. Moreover, white ignorance is not the only form of ignorance that exists within society—ignorance serves many forms of advantage gained through oppression. To disrupt the white ignorance that has flourished for centuries, Mills argues that “illuminating blackness or redness” is requisite. “Only by starting to break these rules and metarules can we begin the long process that will lead to the eventual overcoming of this white darkness and the achievement of an enlightenment that is genuinely multiracial.”41

In her analysis of content, bias, and sensitivity guidelines that inform the development and review of item content, Randall argues for the inclusion of content that exposes racial and other forms of oppression. She advocates for contexts that make clear the hardships and struggles encountered, and the actions performed every day to confront and resist oppression. As a concrete example, Randall offers a word problem that situates the mathematics that are the targeted construct in the following context:

Marcellus is cooking hot meals to hand out to a small group of twelve Black Lives Matter protesters demonstrating against separating families held at the U.S./Mexico border. He is making a meal of rice, cornbread, and red beans. He wants to make enough red beans for each person to have more than ¾ cup. Determine whether each inequality or number line correctly models c, the number of [cups of] red beans Marcellus needs to make.42

In addition to presenting a context in which the concept of inequality is assessed, the item presents students with a positive representation of Black Lives Matter activism, exposes them to the hardships experienced by some families attempting to enter the United States, and uses a meal that is familiar to a subgroup of test takers whose culture is typically underrepresented in educational content and White-normed society more generally. 
Although item authoring and sensitivity guidelines advise against the inclusion of images or references to pregnancy or orientation unless they are essential to the targeted construct, Randall “encourage[s] test developers to employ the liberal use of illustrations to rupture negative stereotypes, increase representation of historically minoritized persons, and rupture notions of whiteness as neutral and/or superior.”43 As examples, she encourages the inclusion of images of women who are pregnant in the workplace and in leadership positions, people of color in leadership roles, and people with diverse orientations playing various roles in society. Although Randall does not directly reference Mills’s conception of white ignorance, her recommendations support the disruption of false beliefs and absence of true beliefs that are the foundation of that ignorance. Use of both images and contexts that provide

[m]eaningful representation will require the inclusion of content that elevates the real contributions of marginalized populations (including their historical and contemporary efforts to resist oppressive systems of injustice), draws on their cultural funds of knowledge (Lee, 1998), and employs their linguistic systems in the same way/to the same degree as white-centric linguistic systems are employed.44

The lens of Rectificatory Justice would likely support Randall’s position and advocate for the use of images and contexts that provide meaningful (over)representations.

Mean Effect

A core principle of Rectificatory Justice is correcting for disadvantage produced through oppression. When examining the efficacy of an educational intervention, Rectificatory Justice grants priority to interventions that provide maximal benefit for those who are members of groups harmed by past oppression. Extending the practice of examining effects for specific bands within a larger distribution, Rectificatory Justice might advocate analysis of the composition of members within bands and then recommend only an intervention that is shown to have a similar or larger positive effect for members of a group disadvantaged by past injustice. As an example, if analysis of the distribution of effect (say, at the lowest and highest quartile) found that people membered White disproportionately comprised the higher band while people disadvantaged through prior injustice disproportionately comprised the lower band, Rectificatory Justice might reject implementation of the intervention because it would exacerbate inequity. However, if the situation were reversed, Rectificatory Justice might advocate adoption of the intervention because it would serve to help rectify a disparity. Similarly, when analyses employ demographic characteristics as covariates, Rectificatory Justice might advocate adoption only of interventions for which the coefficient for a characteristic that serves to represent a form of oppression is positive, indicating the intervention was more effective for members of that marginalized group. In contrast, Rectificatory Justice might reject implementation of any intervention for which the coefficient for a demographic covariate is negative.

Justice Through Measurement

As the first quote opening this chapter observes, the pursuit of justice requires reflection on current practices and the modification of those practices to shape our future. The issues presented in this chapter illustrate how the adoption of different conceptions of distributive justice might alter the rules and practices used in the field of educational measurement to develop test instruments and to use test scores to evaluate the efficacy of educational interventions. In offering these perspectives, I am not advocating for the adoption of any one frame. Rather, my intent is to expose the impact adoption of a given frame might have on practices within the field.

As the many topics explored in this book reveal, racial and other forms of oppression have been and continue to be problematic within our society. The continued operation of various forms of oppression limits too many individuals from developing their interests and reaching their full potential. In turn, this limiting extends to the functioning of our society. As shown in Chapter 9, educational measurement, as it is operated today, functions as apparatus for the system of racism. This operation is produced by a White Racial Frame that influences much of the work in the field of educational measurement. The chapters in Part III have explored alternate frames which, if embraced and applied to guide our work, hold promise to shift educational measurement from apparatus for systemic racism to an anti-racist endeavor. The final chapter explores complications and additional considerations that are required if shifts in practice explored in this section are adopted and shares concrete actions the field of educational measurement can initiate now to begin this transition.

Notes

1 Stanfield (2011), p. 92.
2 Mills (2017), p. 140, italics in the original.
3 Horkheimer (1982); Crenshaw (1991), Crenshaw et al. (1995); Collins (2019).
4 See Bentham (1789/2007); Sidgwick (1874/2019); Rawls (1971/1999); Nozick (1974/2007).
5 Based on a search of all articles published in Educational Measurement: Issues and Practice, conducted on August 7, 2022, using the advanced search tool provided at https://onlinelibrary.wiley.com/journal/17453992
6 Stein (2016).
7 Meyer (2013).
8 Criticism was also levied at the content of the test, which produced differences in scores between test takers membered male and female.
9 Zwick and Dorans (2016); Zwick (2019).
10 Randall (in press).
11 See Rawls (1971/1999), p. 6.
12 Sandel (2020), p. 9.
13 Bentham (1789/2007), p. 10, italics in the original.
14 See Bentham (1789/2007), p. 14.
15 Mill (1861/2007), pp. 17–19.
16 See Rawls (1971/1999), p. 23.
17 Mill (1861/2007).
18 Some in the field offer a third reason for not automatically removing an item. This reason focuses on the construct measured by the item and reasons that in some cases there is a legitimate construct-relevant reason why an item may function differently across groups. As an example, NCME and the Joint Standards suggest that differences in opportunity to learn across schools may influence the functioning of the item. Elsewhere I challenge this position, noting that to date no empirical evidence has been produced that supports that differences in opportunity to learn produce differential item functioning that generalizes across all students participating in a testing program who form the focal group for which DIF is detected.
19 Randall (2021), p. 4.
20 Rios et al. (2019); Women in Measurement (2021).
21 Randall (in press), p. 21 of draft.
22 Randall (in press), p. 23 of draft.
23 Helms (1992).
24 Rawls (1971/1999), pp. 29–30.
25 Rawls (1971/1999).
26 Rawls (1971/1999), p. 11.
27 AERA/APA/NCME (2014).
28 AERA/APA/NCME (2014), p. 51.
29 AERA/APA/NCME (2014), p. 49.
30 Russell et al. (2022).
31 Russell and Haney (1997); Russell (1999).
32 Mills (2017), p. 140, italics in original.
33 Mills (2017), p. 157, italics in original.
34 Mills (2017), p. 140, italics in original.
35 Mills (2017), p. 148.
36 Mills (2017), p. 160.
37 Mills (2017), p. 170.
38 Rankine (2014).
39 Mills (2017), p. 52.
40 Mills (2017), p. 52.
41 Mills (2017), p. 71.
42 Randall (2021), p. 2.
43 Randall (in press), currently page 23.
44 Randall (in press), currently page 28.

References

American Educational Research Association, American Psychological Association & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
Bentham, J. (1789/2007). An introduction to the principles of morals and legislation. In Justice: A Reader. Oxford University Press.
Collins, P.H. (2019). Intersectionality as critical social theory. In Intersectionality as Critical Social Theory. Duke University Press.
Crenshaw, K. (1991). Mapping the margins: Intersectionality, identity politics, and violence against women of color. Stanford Law Review, 43, 1241–1299.
Crenshaw, K., Gotanda, N., Peller, G. & Thomas, K. (1995). Critical Race Theory: The Key Writings that Formed the Movement. The New Press.
Helms, J.E. (1992). Why is there no study of cultural equivalence in standardized cognitive ability testing? American Psychologist, 47(9), 1083–1101.
Horkheimer, M. (1982). Traditional and critical theory. In Critical Theory: Selected Essays. A&C Black.
Lee, C.D. (1998). Culturally responsive pedagogy and performance-based assessment. The Journal of Negro Education, 67(3), 268–279.
Meyer, H.D. (2013). Reasoning about fairness in access to higher education: Common sense, normative, and institutional perspectives. In Fairness in Access to Higher Education in a Global Perspective. Brill.
Mill, J.S. (1861/2007). Utilitarianism. In Justice: A Reader. Oxford University Press.
Mills, C.W. (2017). Black Rights/White Wrongs: The Critique of Racial Liberalism. Oxford University Press.
Nozick, R. (1974/2007). Anarchy, state, and utopia. In Justice: A Reader. Oxford University Press.
Randall, J. (2021). “Color-Neutral” is not a thing: Redefining construct definition and representation through a justice-oriented critical antiracist lens. Educational Measurement: Issues and Practice, 40(4), 82–90.
Randall, J. (in press). It ain’t near ‘bout fair: Re-envisioning the bias and sensitivity review process from a justice-oriented antiracist perspective. Educational Assessment.
Rankine, C. (2014). Citizen: An American Lyric. Graywolf Press.
Rawls, J. (1971/1999). A Theory of Justice (Revised edition). Harvard University Press.
Rios, J.A., Randall, J. & Donnelly, M. (2019). An analysis of college choice information provided on graduate program websites: Implications for improving applicant diversity in educational measurement. Educational Measurement: Issues and Practice, 38(4), 67–77.
Russell, M. (1999). Testing writing on computers: A follow-up study comparing performance on computer and on paper. Educational Policy Analysis Archives, 7(20), 1–47.
Russell, M., Szendey, O. & Li, Z. (2022). An intersectional approach to DIF: Comparing outcomes across methods. Educational Assessment, 27(2), 115–135.
Russell, M. & Haney, W. (1997). Testing writing on computers. Education Policy Analysis Archives, 5, 3.
Sandel, M.J. (2020). The Tyranny of Merit: What’s Become of the Common Good. Farrar, Straus, and Giroux.
Sidgwick, H. (1874/2019). The Methods of Ethics. Good Press.
Stein, Z. (2016). Social Justice and Educational Measurement: John Rawls, The History of Testing, and the Future of Education. Routledge.
Women in Measurement. (2021). 2021: A year in review. Accessed at https://www.womeninmeasurement.org/assets/files/WIM-Annual-Report2021.pdf
Zwick, R. (2019). Fairness in measurement and selection: Statistical, philosophical, and public perspectives. Educational Measurement: Issues and Practice, 38(4), 34–41.
Zwick, R. & Dorans, N.J. (2016). Philosophical perspectives on fairness in educational assessment. In Fairness in Educational Assessment and Measurement. Routledge.

14 Forging a Path Toward Anti-Racism in Educational Measurement

When used properly and progressively, science, as a way of thinking and acting, is useful for empowering otherwise oppressed and marginal[ized] people … in their most humane manifestations, the human sciences are supposed to be liberating and empowering, not oppressive or merely steps in a career.1

DOI: 10.4324/9781003228141-18

On the opening page of this book, I recounted Steve Sireci’s 2020 Presidential Address to the National Council on Measurement in Education (NCME), in which educational measurement was described as an altruistic profession.2 Without question, the educational measurement community provides a variety of valuable services to the field of education. Training and professional development provided by members of the educational measurement community to teachers who work in K-12 schools supports improvements in classroom assessment practices. Formative and interim assessment systems developed by test companies provide educators with information they use to tailor instruction to students’ learning needs. Summative assessment programs provide information about student achievement that schools and school systems use to inform modifications to their programs. Research performed by educational measurement specialists helps schools, educational leaders, and educators make informed decisions about curricular programs, instructional practices, and other educational programs. Test developers engage in research that improves the quality of information provided by tools for assessment. Research programs explore new approaches to improve accessibility and reduce other forms of bias in test instruments. These and the many more lines of work engaged by those in the field of educational measurement have a variety of positive impacts on the development of students across all sectors of the educational system.

Working to have a positive impact is an important aspect of altruism. But there is more to altruism than positive impacts. Altruism requires one to engage in work with a selfless concern for the well-being of others. As Sireci observes, “[p]sychometricians and other educational measurement specialists are highly trained in statistical analysis, data management, research design, and evaluation. These are highly valued skills in the more lucrative business world, but we chose to work in education. Why?”3 The implication made through this observation is that the selfless act of those who opt to work in the field of educational measurement derives, in part, from a sacrifice in salary. While there may be some truth in this observation, surely there is more the field of educational measurement can do to selflessly pursue well-being for others.

The analyses presented in this book consider a variety of ways in which the White Racial Frame has influenced the field of educational measurement and how some of the products produced by educational measurement function as apparatus for the system of racism that operates in the United States. These analyses also explore ways in which alternate frames both challenge current practices and provide opportunities to evolve those practices. In this closing chapter, I shift focus from past and current practice to consider our future. To this end, I suggest several projects in which the field of educational measurement can engage in the near term. I group these activities into three broad categories: reflection, research, and diverse representation. As John Stanfield expresses in the quote opening this chapter, it is my hope that activities such as those I present in this chapter can help position educational measurement to serve as an empowering and liberating agent that works for the well-being of those who have been most oppressed by our society.

Reflection

Avoiding problematic components of the White Racial Frame and working to shift educational measurement to function as apparatus for an anti-racist endeavor begins with active reflection on our practices. In this section, I consider reflection on three aspects of educational measurement: “race” as a variable in analyses, item bias, and college admission test use.

“Race” as a Variable in Analyses

Several chapters explored the topic of race, racialized categorization, and the use of racialized categories in statistical analyses. Chapters 1 and 2 documented the social political construction of race. Race was developed and functions as apparatus for oppression, signifying who in society is advantaged and who is oppressed. Chapters 8 and 9 examined challenges that arise when racialized identity is employed as an explanatory variable in statistical analyses of educational outcomes. Too often interpretations and discourse that flow from these analyses do not reflect the social construction of race and instead present deficit narratives that provide backing for the White Racial Frame.4 Chapters 11 and 12 considered the ways in which variables reflecting racialized identity are formed, what these formations represent, and how theory must inform the use of such variables in our analyses. Chapter 12 also challenged the treatment of socially constructed identity categories as discrete single-axis conceptions and instead called us to understand social positions through an intersectional lens. Collectively, the issues explored in these chapters ask us to reflect on the following questions each time we consider including “race” as a part of an analysis:

• Why are we including “race” in our analysis?
• What do we intend “race” to represent in our analysis?
• How is this representation reflected in the variable(s) used to reflect “race”?
• What theory warrants the inclusion of “race” in our analysis?
• Are there other ways of representing “race” that better reflect the theory we are applying to our analysis?
• In what ways does “race” intersect with other socially constructed categories contained by that theory? And how is this intersection reflected in our analysis?
• If “race” is found to have meaning in our analysis, what interpretation will we give to that meaning? And what discourse will we use to communicate that interpretation?
• In what ways will the inclusion, representation, interpretation, and discourse of “race” reflect notions of “race” advanced by the White Racial Frame? In what ways will this use and discourse challenge notions of “race” promulgated by the White Racial Frame?
• How might our discourse about “race” enable our analysis to function as apparatus for systemic racism?
• How will the use of “race” in the analysis serve as apparatus for an antiracist endeavor?

Reflecting on questions such as these will help increase clarity about why “race” is important for an analysis, confront racialized notions promulgated by the White Racial Frame, and better position an analytic project to support anti-racist endeavors.

Item Bias

Racialized bias in tests of “mental abilities” was a topic explored across several chapters. Chapters 6 and 7 described ways in which bias impacted scores and interpretation of scores produced by the first tests of mental ability used in the United States during the early 20th century. Chapters 9 and 13 considered practices employed by test developers to address bias when developing test items. These chapters also examined the use of differential item functioning (DIF) to detect potential bias in items. The use of item writing, bias and sensitivity, and accessibility guidelines, as well as DIF analyses were presented as important advances in practice that have aided in reducing test bias. Yet, ways in which the White Racial Frame continues to influence these practices were also identified. These persistent influences present opportunities for further advancements. As the field of educational measurement continues to advance practice, the issues explored in these chapters open opportunity to reflect on the following questions:

• In what ways does the content of test items reflect culture most familiar to test takers membered White?
• In what ways might our item development guidelines promote the production of content that is most familiar to test takers membered White?
• How are cultures most familiar to test takers membered into nondominant racialized categories reflected in the content of items?
• In what ways does the content of items challenge white ignorance?
• What is the balance of cultural representation across the items forming a test?
• If it is not possible to remove all bias in a test, how might we distribute bias appropriately across subgroups of test takers?
• How are intersections of oppression reflected in our efforts to reduce bias?
• When conducting DIF analyses, how are intersections of oppression represented?
• What criteria are applied to determine whether to remove an item flagged for DIF, and how do those criteria maximize benefit for those most oppressed by our society?

Reflecting on these and similar questions positions developers of tomorrow’s tests to advance item development procedures to further reduce bias in our measures of mental abilities. As I explore next, these questions may also help inform research that can similarly advance efforts to minimize bias in test scores.

College Admission Test Use

Although bias is one factor that contributes to differences in test scores across racialized groups, inequities in educational opportunities produced by systemic racism are the primary cause of disparities in educational outcomes. Yet, by conceiving test performance as a product of an individual, and by embracing individual merit as a basis for social and economic advancement, the White Racial Frame promotes the award of educational opportunities based, in part, on test performance. As explored in Chapters 8 and 9, this promotion has influenced the development of admission test programs and the use of achievement tests to inform graduation and scholarship decisions. Although individuals play an important role in developing knowledge, skills, and abilities assessed through admission and achievement tests, educational opportunities also have a critical influence on each individual’s test performance. As a result, these test scores become a reflection of both individual ability/achievement and inequities in educational opportunities. As depicted in Chapter 9, use of these test scores to inform admission, graduation, and scholarship decisions functions as apparatus for systemic racism by aiding in the reproduction of social and economic disparities. Although this functioning is not intended by test developers or others in the educational measurement community, recognizing this reality provides opportunity to reflect on the following questions:

• In what ways can admission tests be redesigned to reduce the influence that inequities in educational opportunities have on test performance?
• Are such changes useful for informing readiness for advanced study?
• In what ways can test scores be used to identify students who can succeed in advanced study if provided with support that rectifies past inequities in educational opportunities?
• What validity issues are raised by using achievement tests designed to assess school quality to also inform high-stakes decisions about students (e.g., graduation and scholarship decisions)?
• How might admission test scores and achievement test scores be used in conjunction with information about educational opportunities?
• How might information about educational opportunities be collected in an efficient and accurate manner?
• How might college admission decisions be impacted if admission test scores were no longer used as part of the decision-making process?

As noted in Chapter 9, admission, graduation, and scholarship decisions are made by institutions that operate outside of the field of educational measurement. Further, it is the decision-making process itself that contributes to the production of disparities in access to higher education and the social and economic benefits that flow from that access. Nonetheless, through the development of tests used for these purposes, educational measurement plays a role in the production of these disparities. Reflecting on these questions provides opportunities for educational measurement specialists to help decision-makers explore modifications to their procedures in order to reduce future productions of these disparities.

Research

The many questions offered in the previous section have clear implications for a program of research that the educational measurement community is well positioned to engage. In addition to the research implications implicit in those questions, Chapter 12 identified analytic challenges introduced by Intersectionality Theory that require research and development to address. Specifically, two aspects of Intersectionality Theory warrant attention. First, acknowledging that experiences with advantage and oppression are produced through the intersection of what has traditionally been viewed as multiple single-axis categories, Intersectionality Theory requires us to rethink how we construct variables representing identity. Second, this more complex conception of identity and social position results in the formation of subgroups of people that are notably smaller in size than is typically experienced when applying single-axis conceptions of identity. In turn, these smaller sample sizes produce technical challenges for statistical techniques used to examine the relationship between identity/social position and outcomes of interest.

As explored in Chapters 11 and 12, both Intersectionality Theory and the theory of systemic racism pose an additional challenge to existing analytic methods. As detailed in these chapters, multilevel modeling methods were developed to address the many ways in which individuals are clustered in a hierarchical manner. As an example, students are clustered in classrooms, which are clustered in schools, which are in turn clustered within districts and/or geographical locales. Although multilevel models are well equipped to account for this hierarchically nested structure, both Intersectionality Theory and systemic racism conceive of social structures in ways that do not match this hierarchically nested structure. Together, challenges presented by small sample sizes and alternate social structuring of society may require new approaches to statistical modeling.

I organize the many research opportunities implicit in the reflection questions above, together with those introduced by Intersectionality Theory and the theory of systemic racism, into four categories: socially constructed positions (i.e., identity), item bias, college admission decisions, and analytic methods.

Socially Constructed Positions

The creation of socially constructed categories to form subgroups of people presents several challenges for social science research. As explored throughout this book, these challenges fall into at least three categories. First, identity categories are often used in statistical analyses to examine the relationship between identity and outcomes. In such cases, it is not identity itself that is understood to be causal.5 Rather, it is the lived experiences produced in a society for which the identity category has significance that are of interest. In this way, an identity category serves as a proxy for lived experiences. Second, for a given form of identity (e.g., racialized identity or gender), the categories into which a person is membered often shift over time and have imprecise boundaries.6 Further, depending on the topic of focus, participants in research may be categorized either by the category into which they locate themselves (e.g., how one self-identifies) or by how they are identified by others (e.g., street race).7 Third, as detailed in Chapter 12, Intersectionality Theory understands social positions as a multiplicity. This understanding conflicts with single-axis categorization of identity and instead requires new ways to represent one’s intersectional social position. Collectively, these issues and challenges present opportunities for several lines of research that include:

• Developing methods to collect information to form variables that reflect variation in experiences lived by members at specific social positions.8
• Exploring ways fuzzy-set logic might be applied to reflect differentiations within identity and/or lived experience variables.9
• Deepening understanding of how the difference between how one identifies and how one is identified impacts group formation and, in turn, impacts estimation of relationships with various outcome variables.10
• Developing techniques to collect information about intersectional identity that better reflects the theory of intersectionality.

Initial efforts to address some of these issues have already been launched by researchers in the fields of sociology, nursing, and women’s studies. These and other lines of research are needed to better position researchers to represent participants and to model the influence their lived experiences have on outcomes of interest.

Item Bias

Several reflection questions presented earlier address bias produced through the content of test items. As noted, the field of educational measurement has made considerable progress addressing item bias. Chapters 9 and 13, however, highlighted additional actions that might be taken to further address item bias. Some of these actions may introduce new sources of bias. As an example, Jennifer Randall’s advocacy for contexts that raise to consciousness various social issues and the use of content that may be more familiar to test takers with nondominant cultural backgrounds may introduce bias for test takers unfamiliar with these social issues and contexts, or who hold opinions that differ from those presented for a given social issue.11 In Chapter 13, I also suggested coding item content to indicate which subgroups of test takers the content is likely to advantage, which it is likely to disadvantage, and for which it would have a neutral impact. These codes might then be used to inform item assembly to produce a more equitable distribution of anticipated advantage and disadvantage. Chapter 13 also introduced an intersectional approach to differential item analyses and indicated that preliminary evidence suggests this approach may aid in the identification of potential item bias.12 An intersectional approach, however, raises challenges to existing methods for examining DIF due to small sample sizes and an increase in the number of subgroups for which DIF is examined. Together, item authoring priorities and DIF detection methods present opportunities to explore the following research questions:

• To what extent does item content designed to raise social issues to consciousness differentially impact performance for subgroups of students?
• To what extent does item content that elevates culture of a given nondominant group impact performance for other nondominant groups as well as the dominant group?
• Can methods be developed to reliably and accurately code items to indicate the subgroup of students anticipated to be advantaged, disadvantaged, or unaffected by the item content? If so, can these codes be used to balance potential bias across subgroups of test takers and/or to adapt items presented to a student based on the anticipated impact of the content?
• To what extent do the preliminary findings for an intersectional approach to DIF analysis generalize across tests?
• To what extent does the increased number of comparisons produced by an intersectional approach to DIF inflate overidentification of potentially biased items (a.k.a. Type I error)?
• How can DIF methods be advanced to accommodate smaller sample sizes associated with an intersectional approach?
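The question about inflated Type I error can be made concrete with a little arithmetic. The sketch below is an illustration only: it computes the chance of at least one false flag for an item when several subgroup comparisons are each run at the same significance level, and then applies the Holm step-down adjustment as one commonly used correction. The function names, the count of seven comparisons, and the p-values are hypothetical. The error-rate formula assumes the comparisons are independent, which DIF comparisons sharing a reference group are not, so the result is best read as a rough approximation and not as a method proposed in this book.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false flag across n_tests comparisons.

    Assumes the comparisons are independent (an approximation for DIF
    analyses in which focal groups share one reference group).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def holm_flags(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm step-down adjustment; returns flags in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    flags = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            flags[idx] = True
        else:
            break  # once one comparison fails, all larger p-values fail too
    return flags


# Hypothetical setting: 8 intersectional subgroups, 7 compared to a reference.
single_axis = familywise_error_rate(0.05, 1)
intersectional = familywise_error_rate(0.05, 7)
print(f"one comparison: {single_axis:.3f}; seven comparisons: {intersectional:.3f}")

# Hypothetical p-values for four subgroup comparisons on one item.
print(holm_flags([0.001, 0.04, 0.03, 0.20]))
```

Under these assumptions, moving from one comparison to seven raises the chance of at least one false flag from 5 percent to roughly 30 percent, which is why some form of multiplicity adjustment, or a method designed for many small subgroups, would likely be part of any such research program.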

College Admission Decisions

College admission decisions are made by institutions that reside outside of educational measurement. Nonetheless, tests produced by educational measurement specialists provide one source of information for those decisions. As discussed in Chapter 9, scores produced by admission tests reflect individual traits (e.g., academic achievement and college readiness) that are influenced by inequitable educational opportunities. In turn, use of these scores as one criterion for college admission decisions contributes to disparities in higher education and, consequently, to the reproduction of disparities in social and economic outcomes. These productions present several opportunities for research, which include:

• Developing approaches to document opportunity to learn and/or other indicators of advantage/disadvantage.
• Exploring approaches to integrate opportunity to learn and/or advantage/disadvantage indicators with admission test scores to place scores in a social context.13
• Examining the extent to which the use of an integrated indicator to inform admission decisions impacts the demographic composition of admitted students.
• Exploring the use of test scores, in combination with other indicators, to identify students who have excelled in their high school environment but whose environment may have underprepared them for higher education studies.
• Providing support services/supplemental education for students identified through the aforementioned approach.14
• Examining the impact that eliminating the use of test scores as part of the admission decision process has on the demographic composition of admitted students.15

Although the types of questions in this list cannot be explored independently by the educational measurement community, forming partnerships with colleges and universities holds potential to alter college admission decisions in ways that may help reduce the reproduction of social and economic disparities and may serve as one vehicle in the pursuit of rectificatory justice.

Analytic Methods

In addition to opportunities to advance DIF methods, Intersectionality Theory and the theory of systemic racism present opportunities to advance other statistical methods employed to examine relationships between social positions and educational (and other) outcomes. As explored in Chapter 13, estimating effects associated with lived experiences of members of an intersectional social position presents challenges to current statistical methods. In part, these challenges arise from an inconsistency between how intersectionality theorizes an intersectional social position and how statistical models are designed to represent interactions among variables and the nesting of variables.16 For example, regression models are typically applied to first estimate effects for two or more variables (e.g., race and gender). The interaction between these two variables is then added to the model. By treating each identity variable as independent and then examining the added “effect” of

their interaction, this approach conflicts with the conception of a social position being a simultaneous multiplicity. Similarly, multilevel analyses are designed to handle nested data. While one might treat an intersectional social position as nested (that is, gender nested with racialized identity, or vice versa), this approach is inconsistent with the simultaneous multiplicity of a social position.

The theory of systemic racism similarly presents challenges for multilevel models. As described in Chapter 3, racism is experienced by individuals at multiple levels within a social system. Individual racism occurs at a nano level. Institutional and structural racism operate at micro- and meso-levels. Systemic racism is experienced at a macro-level that spans across the full system. At first, this nano-micro-meso-macro structure may appear as a nested arrangement compatible with multilevel models. For example, one can think of individuals experiencing the effects of individual racism to varying degrees and in different directions (positive and negative). Further, individuals might be clustered into locales (e.g., neighborhoods segregated by historical redline policies), with members of each locale sharing similar experiences with structural racism. But, when systemic racism is added to the model, the clustered arrangement no longer holds. Rather than forming an additional level of units, systemic racism operates across the entire system but impacts members within each locale in different ways, depending on their racialized membering. It is this systemic aspect of racism that challenges current approaches to multilevel models.
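The regression example described in this section (main effects for two identity variables, followed by their interaction) can be made concrete with a minimal sketch in Python using NumPy. Everything in it is an assumption made for illustration: the two binary indicators `a` and `b`, the simulated outcome, and the four assumed cell means are hypothetical and are not drawn from the chapter or from any real data set. The sketch fits the conventional specification and, separately, a specification that codes each intersectional social position as its own category.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary indicators for two identity variables (illustration only).
n = 4000
a = rng.integers(0, 2, size=n)
b = rng.integers(0, 2, size=n)

# Assumed outcome means for the four positions, indexed as [a, b].
# These numbers are made up for the sketch.
assumed_means = np.array([[50.0, 48.0],
                          [47.0, 40.0]])
y = assumed_means[a, b] + rng.normal(0.0, 5.0, size=n)

positions = [(0, 0), (0, 1), (1, 0), (1, 1)]


def fit_interaction_model(a, b, y):
    """Conventional specification: intercept, two main effects, interaction."""
    X = np.column_stack([np.ones(len(y)), a, b, a * b])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def fit_position_model(a, b, y):
    """One indicator per position, no intercept: coefficients are cell means."""
    X = np.column_stack(
        [((a == i) & (b == j)).astype(float) for i, j in positions]
    )
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(positions, coef))


coef = fit_interaction_model(a, b, y)
cell = fit_position_model(a, b, y)

# Cell means implied by the conventional specification: each position is
# rebuilt as a sum of separate terms plus an added interaction term.
implied = {
    (i, j): coef[0] + coef[1] * i + coef[2] * j + coef[3] * i * j
    for i, j in positions
}

for pos in positions:
    print(pos, "sum of terms:", round(implied[pos], 3),
          "position as one category:", round(cell[pos], 3))
```

Under these assumptions the two specifications are reparameterizations of the same four means, so they agree numerically. The difference lies in framing: the first expresses a position as separable parts plus an added "effect," while the second treats each position as a single unit. Neither is offered here as a modeling technique that resolves the tension described above.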
Together, the misalignment between existing statistical modeling techniques and core conceptions of Intersectionality Theory and the theory of systemic racism provides opportunities to explore the development of new modeling techniques that reflect the simultaneous multiplicity of social positions and the systemic nature of racism.

Diverse Representation

Combatting the various ways in which educational measurement is co-opted to function as apparatus for systemic racism requires a collaborative approach. To explore many of the topics for research suggested in this chapter, partnerships with colleges and universities are requisite. Similarly, developing new approaches to forming variables representing socially constructed positions will benefit from collaboration with sociologists, demographers, quantitative researchers in the field of nursing, and others with expertise in modern social theories. Advancing statistical methods that align with these social theories will require similar sets of partnerships. Beyond partnerships that diversify the knowledge base applied to inform research, other forms of outreach are needed to increase diversity in the field of educational measurement.

The institutions through which the work of educational measurement is performed (testing companies, research organizations, institutions of higher education, state assessment and accountability offices, and professional organizations) are overrepresented by people membered White and notably underrepresented by people membered Black, Latine, and Indigenous. Although the unbalanced representation is not intentional, it is a product of the history of the field. The field was founded by White men. Starting with Francis Galton in the late 1800s through the early 1930s, the lead authors for nearly all publications specific to educational measurement were White men. It wasn’t until the 1930s that the first cohort of women entered the field. While the number of women working in the field grew over the decades, women remain underrepresented. Entry into the field has been and continues to be even slower for people membered Black and Latine. As an example, today only 20% of full professors in top-tier educational psychology programs and 33% of heads of assessment companies are women. Moreover, only 4% of full professors in top-tier educational psychology programs, and none of the heads of assessment companies, are women of color.17 Focusing on the demographic composition of students in master’s and doctoral programs that feed the field of educational measurement, data indicates that between 1996 and 2016, 6.8% of graduates were membered Black and 4.4% were membered Latine.18 Although a systematic analysis of the institutional practices that contribute to the homogeneity of the field has not yet been conducted, the institution of educational measurement is clearly less representative than the nation as a whole.19 In recent years, several efforts to increase representation have occurred.
Today, the composition of the NCME leadership team is notably more diverse than two decades ago, both in terms of gender and racialized representation. In 2021, Women in Measurement was established as a nonprofit group whose mission is “dedicated to the advancement of gender and racial equity in educational measurement leadership.”20 Shortly thereafter, the Center for Measurement Justice was launched. As part of its mission, it seeks to increase the diversity of measurement specialists in the field. Both of these groups have launched initiatives that provide support for women and members of nondominant racialized groups to enter and advance in the field of educational measurement. During the summer of 2022, the Center for Assessment, a not-for-profit consulting group, also piloted a program, the Strengthening Opportunities in Assessment and Research (SOAR) Program, that worked with Historically Black Colleges and Universities to recruit undergraduate students of color who then worked as paid summer interns with members of the educational measurement community. The aim of the SOAR program is to stimulate interest in educational measurement in hopes that undergraduate students will pursue advanced training in the field.21 These efforts are important steps to help increase the diversity of the

field and can serve as models for the larger body of organizations that form the educational measurement community.

A Final Thought

Diversifying representation in the field is important, but doing so absent the reflection, research, and other efforts to advance practices is destined to reproduce the status quo and thus be insufficient for transforming educational measurement into apparatus for anti-racist endeavors. To address growing concerns about racism and the pursuit of social justice, the field of educational measurement must first acknowledge that, in its current form, it functions within, is influenced by, and contributes to systemic racism. To redress the inequities intended by our racialized system of segregated communities, the educational measurement community must become an active voice in changing narratives, reducing (and ideally eliminating) racialized bias, expanding representation in the field, and working in partnership with other institutions to modify ways in which tests are used to inform decisions that too often (re)produce disparate impacts for racialized groups. In the short term, these investments may increase costs and potentially decrease revenue, a selfless sacrifice the field should absorb in its altruistic pursuit. However, these, and surely many more, are the types of actions the field must be prepared to take if it is to escape its function as apparatus for systemic racism and serve as apparatus for anti-racist endeavors.

Notes

1 Stanfield (2016), pp. 209, 211–212.
2 Sireci (2021).
3 Sireci (2021), p. 7.
4 Spector and Brannick (2011); Russell et al. (2022a).
5 Zuberi (2001); Holland (2003, 2008).
6 Hall (2017); Collins (2019).
7 López et al. (2018a, 2018b).
8 See LaFave et al. (2022) for an example of an effort to form a variable reflecting exposure to systemic racism over the course of one’s life.
9 See Hancock (2007) for ideas on how fuzzy-set logic can help in creating indicators such as these.
10 See López et al. (2018a, 2018b) for initial efforts to explore this topic.
11 Randall (2021).
12 Russell et al. (2022b).
13 González Canché (2019); Hartocollis (2019).
14 Tough (2019) describes this practice at the University of Texas at Austin and indicates the program was successful in increasing graduation rates for students whose SAT score was notably lower than predicted based on their high school grade point average.

15 In recent years, several colleges and universities have eliminated requirements to submit admission test scores, and analyses of the impact these decisions are having on the composition of the body of admitted students are in progress.
16 See Evans (2019) and Scott and Siltanen (2017) for two efforts to apply multilevel modeling to examine intersectionality and the challenges encountered.
17 Women in Measurement (2021).
18 Randall et al. (2021).
19 As I performed final revisions to this manuscript, I was contacted by both NCME and AERA Div-D to participate in a study on diversity in the field of educational measurement being conducted by Women in Measurement. As of this writing, findings from this study were not yet available.
20 Women in Measurement (2021).
21 Marion and Lee (2022).

References

Collins, P.H. (2019). Intersectionality as critical social theory. In Intersectionality as Critical Social Theory. Duke University Press.
Evans, C.R. (2019). Adding interactions to models of intersectional health inequalities: Comparing multilevel and conventional methods. Social Science & Medicine, 221, 95–105.
Evans, C.R., Williams, D.R., Onnela, J.P. & Subramanian, S.V. (2018). A multilevel approach to modeling health inequalities at the intersection of multiple social identities. Social Science & Medicine, 203, 64–73.
González Canché, M.S. (2019). Repurposing standardized testing for educational equity: Can geographical bias and adversity scores expand true college access? Policy Insights from the Behavioral and Brain Sciences, 6(2), 225–235.
Hall, S. (2017). Familiar Stranger: A Life Between Two Islands. Duke University Press.
Hancock, A.M. (2007). When multiplication doesn’t equal quick addition: Examining intersectionality as a research paradigm. Perspectives on Politics, 5(1), 63–79.
Hartocollis, A. (2019). SAT “Adversity Score” is abandoned in wake of criticism. New York Times, August 27.
Holland, P.W. (2003). Causation and race. ETS Research Report Series, 2003(1), i–21.
Holland, P.W. (2008). Causation and race. In White Logic, White Methods: Racism and Methodology. Rowman & Littlefield.
LaFave, S.E., Bandeen-Roche, K., Gee, G., Thorpe, R.J., Li, Q., Crews, D. & Szanton, S.L. (2022). Quantifying older Black Americans’ exposure to structural racial discrimination: How can we measure the water in which we swim? Journal of Urban Health, 99(5), 1–9.
López, N., Erwin, C., Binder, M. & Chavez, M.J. (2018a). Making the invisible visible: Advancing quantitative methods in higher education using critical race theory and intersectionality. Race Ethnicity and Education, 21(2), 180–207.
López, N., Vargas, E., Juarez, M., Cacari-Stone, L. & Bettez, S. (2018b). What’s your “street race”? Leveraging multidimensional measures of race and intersectionality for examining physical and mental health status among Latinxs. Sociology of Race and Ethnicity, 4(1), 49–66.
Marion, S. & Lee, J. (2022). The SOAR scholars take flight: Improving and diversifying the talent pool in educational assessment. Center for Assessment. https://www.nciea.org/blog/the-soar-scholars-take-flight/
Randall, J. (2021). “Color-Neutral” is not a thing: Redefining construct definition and representation through a justice-oriented critical antiracist lens. Educational Measurement: Issues and Practice, 40(4), 82–90.
Randall, J., Rios, J.A. & Jung, H.J. (2021). A longitudinal analysis of doctoral graduate supply in the educational measurement field. Educational Measurement: Issues and Practice, 40(1), 59–68.
Russell, M., Oddleifson, C., Russell Kish, M. & Kaplan, L. (2022a). Countering deficit narratives in quantitative educational research. Practical Assessment, Research, and Evaluation, 27(1), 14.
Russell, M., Szendey, O. & Li, Z. (2022b). An intersectional approach to DIF: Comparing outcomes across methods. Educational Assessment, 27(2), 115–135.
Scott, N.A. & Siltanen, J. (2017). Intersectionality and quantitative methods: Assessing regression from a feminist perspective. International Journal of Social Research Methodology, 20(4), 373–385.
Sireci, S.G. (2021). NCME presidential address 2020: Valuing educational measurement. Educational Measurement: Issues and Practice, 40(1), 7–16.
Spector, P.E. & Brannick, M.T. (2011). Methodological urban legends: The misuse of statistical control variables. Organizational Research Methods, 14(2), 287–305.
Stanfield, J.H. (2016). Black Reflective Sociology: Epistemology, Theory, and Methodology. Taylor & Francis.
Tough, P. (2019). The Years that Matter Most. Random House.
Women in Measurement. (2021). 2021: A year in review. Accessed at https://www.womeninmeasurement.org/assets/files/WIM-Annual-Report2021.pdf
Zuberi, T. (2001). Thicker Than Blood: How Racial Statistics Lie. University of Minnesota Press.

Index

1619 Project 36 1832 Reform Act 113 ableness 254 Acheson, Dean 284–285 achievement 132, 190, 205, 222, 272; academic 200; educational 186, 225 ACT 178, 193, 201, 247 Act XII 42 Act XXII 39 Acton, Henry 119 adopted sons of popes 141 Adorno, Theodor 258–259, 269–271, 275 Advanced Placement Program (AP) 249 Against Method (Feyerabend) 120 Agassiz, Louis 160 Agger, Ben 262, 268 Alacrón, Norma 321 alchemy/alchemist 65 Alexander, Michelle 90, 125 Alpha, Army 135, 175–177, 179, 184, 188, 190, 194–195, 197, 198 altruism/altruistic 1, 133, 255, 373, 374 Amendment, 14th 41, 283 An American Dilemma: The Negro Problem and Modern Democracy (Myrdal) 80 analysis of variance 219 Andreski, Stanislav 231 Anglo-Saxon 44, 45–48, 57, 58, 60, 79, 110 anti-racist xii anti-racist project see apparatus, for anti-racist endeavor antisemitic/antisemitism 257

Anzaldúa, Gloria 315, 322 apparatus xii, 5, 69, 133, 219, 241, 243, 247–248; for anti-racism xii, 255, 278, 370, 374, 384; for anti-racist endeavor vii, 3, 9, 238, 249, 278, 370, 374, 375, 384; for oppression xii, 5–6, 239; for systemic racism xii, 2, 3, 13, 126, 132, 133, 218, 242, 244, 246, 247, 249, 273, 370, 374, 375, 377, 382 Aptheker, Herbert 74 Archimedes: principal of buoyancy 260 Aristocracy: artificial 192; natural 192 Army, U.S. 174–175 Arps, George 188 Aryan 51–52 assimilation 78; cultural 77; theory 79 astronomy/astronomer 65 automated scoring machine 187 Baartman, Sarah 158 Baiocchi, Gianpaolo 80 Baker, Dominque 294 Baldwin, James 109 Banks, Sir Joseph 32 Barlowe, Arthur 92–93 Battutah, Ibn 18 Bauer, Greta 329, 331, 336–337, 341 Baxter, Richard 77 Bayes Theorem 228 Bayes, Thomas 227–228 Bazile, Judge Leon 53 Beale, Frances 314 Beecher, Henry 116 bell curve see normal distribution

388 Index Bell Curve: Intelligence and Class Structure in American Life, The (Herrnstein and Murray) 199 Bell, Derrick 279, 283–287, 325 Benedict, Ruth 75, 77 Bentham, Jeremy 348, 351 Bernier, Francois 22–23, 28, 38, 43, 51, 53, 68, 108, 134 Bernoulli, Jacob 209 Bernoulli’s Fallacy (Clayton) 227 Beta, Army 135, 175 bias 6, 107, 179, 194, 200, 229; cultural 176, 194, 195–196, 198, 354–356; implicit 80, 81, 82, 241; item 195, 206, 360–362, 366–367, 374–376, 379–380; test 132, 175, 178, 199, 200, 217, 245–247, 352–353, 359; test content 195, 355–356, 362–366, 367–368 Big Test: The Secret History of the American Meritocracy, The (Lemann) 193 Binet-Simon test of mental ability 150, 169–170, 172, 173, 175, 176, 185, 194, 197 Binet, Alfred 135, 150, 166–169, 173, 179, 197 binomial expression 209 biological 16, 76, 77, 224 birthplace 56 Black, Edwin 152 Black Lives Matter 1, 368 Black Power: The Politics of Liberation (Ture & Hamilton) 83 Blumenbach, Johann Friedrich 26–27, 28, 43, 44, 49–52, 53, 108, 158, 159, 179 Boas, Franz 75 Bonilla-Silva, Eduardo 2, 67, 68, 70, 71, 72, 80, 83, 88, 91, 96, 114 Book of Why: The New Science of Cause and Effect, The (Pearl) 227 Book on Games of Chance (Cardano) 208 borderland 315, 322 Bosnia 74 Boucicault, Dion 53 Bowleg, Lisa 221, 318, 322, 328–329, 334–338, 341 Brace, Loring 17, 30 Brady Bunch, The 108

Brannick, Michael 243, 300–302 Briggs, Derek 161–162 Brigham, Carl 191, 198, 264 Broca, Paul 157 Bronfenbrenner, Uri 318 Brown v. Board of Education of Topeka, Kansas 278, 283, 285 Brown, Derren 105–106 Brown, Sir Thomas 77 Brownell, Herbert 284–285 Brubaker, Rogers viii Buck, Carrie 147, 155 Buffon, Comte de 20, 23–25, 26, 28, 38, 43, 50, 108, 110 Bulmer, Michael 207 Burgess, Ernest 78 Burt, Cyril 199 Bush, George W. 92, 205–206 Butler, Judith 294 Campbell, Donald 289 Campbell, John 231–232 Camper, Petrus 25, 26, 28, 108, 158, 159, 179 Capitein, Jacques-Elisa-Jean 29 Carastathis, Anna 314, 318, 322–323 Carbado, Devon 232 Cardano, Gerolamo 208 Carlson, Tucker 277 Carnegie Foundation 187, 189, 193 Carnegie Institute 135, 147 Carnegie, Andrew 116 Carson, John 192 Castaneda v. Regents of the University of California 350 Catholic Church 141, 144, 316 Cattell, James 166, 173 Caucasian 26, 27, 30, 44, 45, 47, 49–52, 53, 58, 62, 94, 160 Caucasus Mountains 27, 29, 51 causal analysis 121, 227 causal/causation 222, 223, 243, 254, 262, 264, 274, 291–292, 297–298, 300–302, 316, 340, 379 cavalry, U.S. 112 Celtic 44, 45 Census Act, 13th 55 Census Bureau 57 census, U.S. 43, 48, 53–57 Center for Assessment 383 Center for Measurement Justice 383

Index 389 central limit theorem 212–213 change: reformist 315, 326, 327, 341, 342; transformative 315, 326, 327, 341, 342; phenotypical 11, 76 Chase, Allan 116 Chauncey, Henry 191–193 chi-square test 216, 219 Chico and the Man 108 child-study 149, 150, 169 China/Chinese 43, 44, 47, 49, 50, 55, 56, 61 Chinese Exclusion Act 47 Christian/Christianity 30, 38, 40, 41, 59, 109 Citizen: An American Lyric (Rankine) 98, 366 citizenship 43, 44, 47, 48–53 civil rights 62, 279; 1866 Act 48; 1957 Act 279; 1960 Act 279; 1964 Act 278, 310; 1965 Act 279; Title VII of 1964 Act 310–311 Civil War 92, 124 Clark University 135, 147, 149, 173 Clarke, Marguerite 199 Clayton, Aubrey 207–211, 219, 226–229, 232 Clinton, Bill 108 cocaine 84–85 cognitive ability see traits, mental Cohen, Jacob 231 Cold Springs Harbor 135 Cold War 283 Cole, Elizabeth 329 college admission 69, 82, 85, 89, 133, 178, 188, 189, 192, 195, 238, 247, 349–350, 376–378, 380–381 College Board 188–191 College Entrance Examination Board see College Board Collins, Patricia Hill vii, 221, 270, 285, 313–315, 318, 320, 327 colonial period 88, 89, 111 colonialism 31, 282, 288 colonialists 113, 131; English 30, 38 colonization 77 Combahee River Collective 314, 321 Comestor, Petrus 184 Comte, Auguste 121, 128, 211, 262, 264, 274 Conant, James 191–193, 200 confidence 209

Constitution, U.S. 111 Constitution, U.S. 40, 53 construct/construction: social 67, 68, 254 context neutral 196, 247, 354–356, 362–363, 367–368 context survey 244 context, social 199, 233 conviction rate 90 Cook, Captain 32 Cook, Thomas 289 Cooper, Fredrick viii Cooper, Julia 314 Copernicus 65 correlation 134, 164, 206, 212, 213, 217, 222 Cosby Show, The 108 Crania Americana (Morton) 160 cranial capacity 159–160, 179 craniometry 179 Crenshaw, Kimberlé vii, 221, 279, 312–314, 319–322, 327 criminal justice 85, 125, 241 criminality 139, 146, 147 critical legal studies 279 critical quantitative research 288 Critical Race Theory vii–viii, 9, 254, 259, 271–273, 278–282, 285–289, 290, 292, 304 Critical Theory vii–viii, 9, 127, 254, 258–273, 289, 297 criticality 260, 261, 265, 268, 271, 273 Croatian 74 Cubberley, Ellwood 188 cultural responsiveness 178 cultural sensitivity 178 culturally neutral 247 culture 6, 74, 79, 110, 177, 259, 287, 295, 315, 368, 276, 380; Black 355; dominant 173, 175, 190, 194, 196; mass 260, 270, 271; mass production 259; White 2, 108, 131, 133, 149, 196, 245, 246, 253, 362 Cuvier, George 25–26, 28, 108, 157, 158 Dale, Henry H. 123 Darwin, Charles 115–117, 138, 143, 145, 152; theory of evolution 25, 131 Davenport, Charles 135, 148 Davis, Hush 41, 59 Debra P. v. Turlington 350

390 Index Decroly, Ovide 169 defective delinquency 139 degeneration hypothesis 23–25, 25, 26 degeneration/degenerate/degenerative 24, 46, 60, 134, 135, 138, 139, 147, 148, 151–153, 164, 174, 178, 200, 215 Delgado, Richard 279 DeSantis, Ron 277 determinism 111, 115–118 Devine, Patricia 80 DiAngelo, Robin 1 Dictionary of English Language 17 Dictionary of Races or Peoples 46, 55 Dictionnaire de l’ Acadèmie 17 differential item functioning (DIF) 206, 245–246, 351–353, 360–362, 366–367, 371, 376, 380–381 dimensionality 217, 272 discourse 77, 110, 131, 200, 207, 218, 220, 223–225, 232, 238, 242, 243–245, 253, 259, 263, 287, 297, 299, 303, 304, 374–375; of possibility 259 discrimination/discriminate 62, 73, 80, 81, 83, 108, 114, 240, 300, 303, 310–312, 314, 318, 329, 333, 343, 364; gender 287; housing 279; racial 280, 283–285 disparate/disparities: criminal justice 94; economic 242, 248; educational 94, 223, 248; health 94; impact 62, 240; opportunity 69; outcomes 91, 94, 96, 106, 124, 126, 224, 232, 247, 254; policing 94; racialized 241, 249, 279 Diversity Works 1 Dixon-Román, Ezekiel 113, 114, 119, 294, 301, 302 domination 4, 91, 263, 264, 265, 268, 279; matrix of 320; White 88, 286 Dorans, Neil 350 Doron, Claude-Olivier 24 Douglas, Bronwen 17 Douglas, Frederick 94 Dovidio, John 81 Dovidio, John 175, 185 Dred Scott 40 Dryzek, John 266 Du Bois, W.E.B. 66, 71, 83, 273 Dugdale, Richard 140, 146, 153 dwarfs of the Alps 19

ecological system/structure 318, 333, 338–342 Edgeworth, Francis 219 educational assessment 8 educational measurement 6–8, 131, 133, 135–136, 140, 189, 194, 206, 212, 216, 217–218, 226, 238, 242, 249, 253; community 1, 76, 200, 247, 249, 312, 374, 377, 378, 381, 383–384; field of 97, 133, 136, 174, 194, 225, 227, 238, 242, 245, 255, 259–260, 272–273, 287, 313, 341, 348–351, 369–370, 373–374, 376–377, 379, 382–383, 384; specialist 1, 194, 221–222, 242, 254, 302–303, 373–374, 377, 380, 383 Educational Measurement: Issues and Practices 273, 349 educational outcome 7, 87, 216, 242–243, 374, 376 educational testing 1–2, 8, 184, 193, 194, 206 Educational Testing Service (ETS) 193, 196 Eisenhower, Dwight D. 278 Elimination of Mental Defect, The (Fisher) 216 Eliot, Charles 189 Ellis Island 171–173, 181, 200 Else-Quest, Nicole 317, 321, 325 emancipation 84, 265, 269 Embrick, David 290 Emergency Immigration Act of 1921 47, 173 Emerson, Ralph Waldo 45–46, 48, 108 empiricism/empiricist/empirical 29, 118, 119, 123, 131, 138, 200, 259, 260, 264, 278, 298, 352–353, 366 employment 70 English Language Learner 196 English Traits (Emerson) 45 Enlightenment 4, 17–20, 30, 97, 118, 125, 258, 265, 368 enslavement see slavery enumerator, census 74 environment see context, social equity 114, 278, 288, 383 error 214 errors, frequency of 141 errors, law of 142

Index 391 Essay on the Principle of Population (Malthus) 116 Estabrook, Arthur 140, 147–148, 152, 153 ethnicity/ethnic 73, 76, 79, 98 ethnocentric/ethnocentrism 107 eugenics 47, 132, 134, 138–139, 143, 145–146, 147, 148, 155, 157, 160, 174, 178, 198, 207, 211, 215, 217, 219, 225–227, 228, 229, 232, 290 Eugenics Record Office 133, 135–136, 139, 147, 148, 163 Eurocentric/Eurocentrism 77 Europe/European 55; eastern 43, 44, 46–48, 55, 58, 74, 77, 78, 171; southern 43, 44, 46–48, 49, 55, 58, 74, 77, 78, 171, 176 Evan, Clare 339–340 evolution, theory of 118, 137–138, 153–154 Executive Order: 10590 278; 10925 278 Eze, Emmanuel Chukwudi 29 Faces at the Bottom of the Well: The Permanence of Racism (Bell) 287 facial angle 25, 158, 179 factor analysis 217 fairness 196, 200, 348–350, 354, 358–359 falsification 119 family study 140, 152, 178, 238 Family Ties 108 Fat Albert 108 Fatal Intervention (Roberts) 16 Feagin, Joe 2, 89, 91, 92, 107, 108, 109 Fechner, Gustav 160–162 Federal Housing Authority 84 feebleminded/feeblemindedness 47, 117, 132, 138, 139, 148, 150–153, 169–170, 176, 178, 179, 216, 217 feminism 259, 314, 323; Black 314; radical 279 feminist see feminism Fermat, Pierre de 208 Fernando 39 Feyerabend, Paul 120, 121 Fine, Michelle 290 fingerprint identification 134 Fisher, Ronald 136, 207, 210, 211, 216–218, 219, 220, 225–226, 227, 228–230, 231–232; anti-Semitism

233; support for Otmar Freiherr von Verschuer 233 Floyd, George 1, 278 Foucault, Michel 3–5, 69, 238, 240 Frankfurt School 258–259, 261–265, 268, 270–271, 273, 274, 289, 297 Freedle, Roy 195 Freeman, Allen 279 Freikorp 257 Fresh Prince of Bel-Air 108 From, Erich 259 fuzzy-set logic 332–334, 379 Gaertner, Samuel 81 Gall, Franz 159, 160 Galton, Francis 133–136, 141–145, 152–153, 157, 178, 179, 210, 220, 225–226, 229, 232, 383; anthropometric laboratory 162–166; classism 61; contribution to statistics 207, 212–217 eugenics 134–135, 211; family study 132, 138–139; genius 131; Hereditary Genius 140–146; interest in the normal distribution 211, 219; Quincunx 100 Garcia, Nichole 290, 295–296 Gates, Henry Louis Jr. 59 Gauss, Carl 210 gender 221, 246, 254 General Motors Corporation (GM) 310 genetic 16, 135, 150 genius 131, 132, 138, 179, 192, 212 geocentric model 65–66 Germany/German 75, 257–258, 265; Nazi 134, 233, 257 Geuss, Raymond 268, 270 Gilbert, J.A. 166 Gillborn, David 288, 298 Giordano, Gerard 186 Gliddon, George 43 God 17, 18, 20, 21, 23, 116, 118 Goddard, Henry 131, 135, 153, 175, 178, 179, 185, 197, 198; Army Alpha 173; eugenics 136; immigration and Ellis Island 171–173; Kallikak Family 140, 149–152; use of the Binet-Simon test 166, 169–173, 194 Golash-Boza, Tanya 281 Goldman, Alvin 367 Goncalves, Antão 59

392 Index Good Times 108 Gosset, William 219 Gould, Stephen J. 160, 207 grade point average 190 Graduate Record Examination (GRE) 195 graduation 247, 249, 348, 350, 377 Gray, Robert 93 great chain of being 18, 23, 29, 311 Greek 21–22, 29; focal 245–246, 352–353, 360–362, 367, 371; reference 245, 341, 352 growth percentile 217 guidelines: bias and sensitivity 245–247, 350, 355–356, 362, 368, 376; test development/item authoring 196, 245, 246 Habermas, Jürgen 259, 263–264 Hall, Stanley G. 149, 173 Hall, Stuart 82, 317 Hamilton, Charles 83 Hancock, Ange-Marie 321, 325, 332 Happy Days 108 Harper, Shaun 224, 301–302 Harris, Elisha 147 health 70 heathen 38 Hegel, Georg 29, 30 Helms, Janet 220 Henri, Victor 166 Henrick, Kasey 290 Hepburn 77 Hereditary Genius (Galton) 140, 143, 146, 154, 157, 178, 214 Heredity of Ability, The (Spearman) 217 heredity/hereditability/hereditary 117, 125, 131–132, 134, 135, 138–139, 143, 145–146, 148, 149, 150, 152, 165, 174, 177, 198, 207, 208, 211–214, 217, 225–226, 238, 253 Herrnstein, Richard 199, 217 Hidden Cost of Being African American, The (Shapiro) 113 Higginbotham, Judge Leon 39, 42 higher education 2, 132, 188–189, 192–193, 200, 301–302, 348–350, 377, 380–381, 383 Hill, James 116 hiring/employment 91 Histoire Naturelle (Buffon) 23, 32

Historia Scholastica (Comestor) 184 historicity 254, 260, 261, 264–265, 268, 271–273, 280 History of White People, The (Painter) 21, 45 Hitler, Adolph 257–258 Hochschild, Jennifer 57 Hoffman 84 Holland, Paul 299–302 Holmes, Oliver Wendell Jr. 116, 155 Holocaust 134, 268 homogeneity 221, 226, 230, 293, 295, 296, 303, 383 Horkheimer, Max 258–259, 261, 266, 269–270, 275 Horton Hayward 96 Hottentots 19, 25, 158 housing 84 How to Make a Eugenical Family Study (Eugenics Record Office) 139 Howell, Kerry 266 Hume, David 25, 29, 30, 108 humors 65, 120 Hunter, Hiram 188 Hutu 74 Hyde, Shibley 317, 321, 325 hypothesis testing see statistics/ statistical, significance identification viii; gender 168, 220, 223, 234, 245, 319, 329, 337, 359 identity ix identity ix, 221, 247; racialized 222–223; racialized ix, xii, 37, 43, 48, 54, 55, 59, 81, 160, 221–225, 243, 262, 274, 278, 283, 293, 294, 295, 300, 311, 312, 319, 330–338, 340, 341, 352, 361–362, 374, 375, 382; single-axis category 254, 324–326, 334, 337–342, 375, 378–379 ideology/ideological 66, 71, 76, 79, 80, 82, 91, 94, 96, 97, 108, 111, 113, 114, 117, 121, 125, 131, 139, 227, 254, 263, 265; dominant 72, 111, 227, 254, 265, 267, 287, 315, 317, 330; Eurocentric white supremist 77, 82; fallacy 269–271, 286, 365; racialized 2, 72, 92, 93, 97, 107, 223, 240, 254 idiot 142, 170, 171, 172 ignorance, epistemology of 108 imbecile 142, 170

Index 393 Immigration Act of 1882 171 Immigration Act of 1907 47 Immigration Act of 1924 47, 57 Immigration and Nationality Act of 1952 49 immigration/immigrant 43, 44, 47, 48–53, 55, 56, 60, 68, 73, 77, 78, 132, 135, 171–173, 200, 264; restriction 136 imperial expansion, U.S. 117 In re Ah Yup 49 incarceration/imprisonment 69, 85, 90 Indigenous/“Indian”/Native American 43, 55, 58, 59, 68–69, 76, 77, 82, 89, 91–93, 111–112 individualism 93, 106, 107, 111, 113, 117, 125, 131, 132, 146, 269–270, 272, 274; rugged 111, 112 industrialize/industrialization 115, 171, 184 inequity 15, 54, 85, 114, 117, 133, 247–249, 287, 288–289, 339, 341, 363, 369, 384; educational 232, 242, 249, 376, 377; employment 248; health 248; racial(ized) xiii, 86, 283, 285; wealth 247 inference 6, 7, 151, 158, 159, 162, 208, 209, 218, 230, 232, 272, 359 inherit/inheritance 113–114, 199, 200 institution/institutionalize 132, 134, 136 insurance 84 integration 73; ethnic 74 intellect see traits, mental intelligence 132, 138, 172, 174, 178, 190, 197, 198, 219, 225 interest convergence 279, 281, 283, 285 International Business Machines (IBM) 187 intersectionality 282, 312–318, 331–334, 343, 348; anticategorical 324, 326; coalition 322–323; heuristic device 323–329; intercategorical 324–326; intracategorical 324–326; metaphors 318–323 Intersectionality Theory 218, 246, 254, 317, 319, 321, 323, 326–337, 340–342 IQ (intelligence quotient) 174, 179, 197; testing 132 Irish 45, 60, 73 Item: difficulty 162; discrimination 206, 217

item response theory (IRT) 206, 352–353, 366 Jameson, Robert 159 Japan/Japanese 43, 44, 47, 49, 50, 55, 57, 58 Jastrow, Joseph 166 Jefferson, Thomas 108, 192 Jensen, Arthur 162, 199, 217 Jew/Jewish/Judaism 45, 73, 233, 257, 274 Jim Crow 2, 57, 75, 124, 283, 312 Johnson, Lyndon B. 278 Johnson, Reynold 187 Johnstone, Edward 149 Jordan, Winthrop 21, 110 Judd, Charles Hubbard 187 Jukes in 1915, The (Estabrook) 146, 148 Jukes: A Study in Crime, Pauperism, Disease, and Heredity, The (Dugdale) 140, 146 junk food 246 just noticeable difference 161 justice: as fairness vii–viii, 3, 9, 255, 349–352, 357–360, 363–366; distributive 348–351, 365, 369; libertarian 123, 348; rectificatory vii–viii, 3, 9, 255, 350, 352, 364–369, 381; redistributive 123; restorative 123; social 9, 123–124, 278, 289, 297, 302, 303, 342, 348–349, 351, 365; transformative 123; utilitarian 2, 3, 107, 123–125, 124–125, 126, 131, 132, 192, 200, 253–255, 348–351, 353, 356, 358 Kallikak Family, The (Goddard) 140 Kallikak: family 135, 152, 172, 178; Deborah 150, 151; Martin 151; Martin Jr. 151 Kansas Silent Reading Test 174 Kant, Immanuel 28, 29, 30, 108 Kelly, Fredrick 174 Kendi, Ibram X. 1, 84, 93, 110 Kennedy, John F. 278 Kepler, Johannes: laws of planetary motion 260 Key, Elizabeth 42, 60 Keynes, John 260 Kim, Kyung-Man 263 King of Tars, The 21, 31

Kite, Elizabeth 150 Knowledge in a Social World (Goldman) 367 Kostin, Irene 195 Kovel, Joel 81 Kraepelin, Emil 166 Krieger, Nancy 86 LaFave, Sarah 333–334 Laplace, Pierre-Simon 210, 219 Lapland/Lapps 22, 32 Laughlin, Harry 135 law of large numbers 209 laws, mathematical 160 League of the United Latin American Citizens 56 Learning for Justice 1 Leave It to Beaver 108 Leclerc, Georges Louis see Buffon Leeman, Jennifer 55 Legacy of Malthus, The (Chase) 116 legal indeterminacy 279, 285 Lemann, Nicholas 192 Levant of Acre 21 Lewis, Amanda 67 liberate/liberation 258, 259, 261, 282, 326, 348 Life 284 Lindquist, Everett 201 Linnaeus, Carl 18–20, 22, 23, 27, 28, 31, 38, 43, 44, 46, 49, 52, 53, 68, 108, 110, 157, 200, 311 Lippmann, Walter 179, 198 Lives of Judges 140 Lodge, Henry Cabot 46, 108 Lombard, Peter 184 London International Health Exhibition 163 López, Ian Haney 48–49, 52 López, Nancy 293, 318, 331 Lorde, Audre 288, 311 Loving v. Virginia 279 Loving, Mildred and Richard 53 Lugones, María 321, 322 Madaus, George 206 Malthus, Thomas 115, 152–153 Mann, Horace 185 Marcuse, Herbert 259 Martin Luther 30 Marx, Karl 258, 273

Massachusetts Bay, colony of 40 Massachusetts Comprehensive Assessment System (MCAS) 195, 196 Matsuda, Mari 279 May, Vivian 221, 313 Mayorga, Oscar 295–296 McCall, Leslie 324–325 McConnell, Mitch 277 mean 206, 210 mean effect 356–357, 363, 369 measurement 6, 118, 122, 131, 229 Measuring Minds: Henry Herbert Goddard and the Origins of American Intelligence Testing (Zenderland) 133, 176 media 80, 86, 92, 93, 108, 110, 111, 126, 247, 333; mass 270–271 Meehl, Paul 231 Mehlman, Ken 206 Meiners, Christoph 25, 28, 108 Mendel, Gregor 117–118, 135 mental ability see traits, mental mental measures see test, mental ability merit 2, 93, 106, 111, 114, 115, 131, 146, 189–190, 192, 193–194, 238, 269–270, 350; individual 3, 5, 13, 107, 113, 125, 179, 189, 239, 247, 254, 376; intellectual 200 meritocracy 113–114, 190, 192 Mexican-American War 48, 56 Mexico/Mexican 43, 44, 48, 56–57, 62, 98, 112 Meyer, Doug 315 Meyer, Heinz-Dieter 349 Michell, Joel 6, 199 Middle Eastern/Muslim 94 military, U.S. 317 Mill, John Stuart 123, 348, 351; Greatest Happiness Principle 351 Mills, Charles 87, 88, 108, 109, 114, 282, 350, 364–368; white ignorance 365, 367–368 Minton, Henry 174, 186 miscegenation 53, 58 Monk, Ellis 293 monogenism 24, 25, 26 Moors 28, 35 moron 176 Morrison, Toni 286 mortgage 90, 291; subprime 84

Morton, Samuel 43, 157, 159–160, 178, 179 Morton, Thomas 111–112 “mother tongue” 55–56 mulatto 56, 61, 62, 94 multiple-choice see selected-response multiplicity/multiplicitous 221, 234, 271, 312, 319, 320–323, 330, 335, 337, 341, 379, 382 Murray, Charles 199, 217 Münsterberg, Hugo 166 Myrdal, Gunnar 80 Nam family 147 narratives 72, 74, 92, 94, 96, 107, 108, 132, 240, 243, 253–254, 265, 268, 281, 282, 286, 287, 297, 301, 312, 315, 328, 367, 384; deficit 6, 12, 126, 132, 200, 223–225, 232, 238, 243, 288, 290, 302, 303, 375 National Assessment of Educational Progress 164, 244 National Council on Measurement in Education (NCME) 1, 6–7, 350, 373, 380 National Intelligence Test 185 National Merit Scholarship 350 National Research Council Committee on Intelligence Tests for Elementary Schools 185 National Socialist German Workers Party 257, 259 nativism 78 natural selection 115, 145 naturalization 43, 44, 47, 48–53 Naturalization Act of 1870 48 Naturalization Act of 1906 47 nature nepotism 144, 190, 270 New Deal Home Owners Loan Corporation 84 New Division of the Earth, The (Bernier) 22 New England Journal of Medicine 336, 341 New English Canaan (Morton) 112 New Jersey Home for the Education and Care of Feeble-Minded Children 149 new Mestiza 315 Newsweek 284

Newton, Sir Isaac: theory of gravity 260 Ngai, Mae 44 Nixon, Richard 108 No Child Left Behind Act 205–206 Noah’s ark Noble, Ellis 188 Noble, Tracey 196 norm group 197–198 norma verticalis 26, 158, 179 normal distribution 134, 141, 144, 178, 210–213, 217, 219, 229–230 Norman 44, 45 Norms: cultural 6; test 197, 217; White 79, 238, 244, 355–356, 362, 368 northern migration, the 77 Nott, Josiah 43, 160 Nozick, Robert 123, 348 nurture 141, 143, 149 objective/objectivity 106, 111, 118–122, 131, 132, 199, 200, 206, 208, 211, 216, 227, 228, 263, 272, 298; disciplinary 120; mechanical 120; quantitative 207, 231, 247, 253 octoroon 54 Octoroon, The (Boucicault) 53 Of National Characters (Hume) 25 Oluo, Ijeoma 1 Omi, Michael 67, 68, 69, 70, 74, 76, 85, 238, 239–240, 260 On the Origin of Species (Darwin) 152 one-drop rule 48, 52, 53 “Operation Wetback” 48 oppression 3–5, 66, 68, 69, 125, 143, 225, 232, 238, 242, 243, 246, 249, 254, 265, 268, 271, 274, 277, 281–282, 287, 292, 303, 304, 312–315, 317–340, 348, 350, 361, 365, 366–370, 374, 376, 378; gendered 311, 313, 319, 320, 361; racialized xii, 5, 13, 74, 240–241, 286, 293, 297, 312, 313, 319, 320, 367 Otis, Arthur 173–175, 178 Otis, James 77 Outline of the History of Humanity (Meiners) 25 Ozawa v. United States 50 Ozawa, Takao 50–51 p-value 207, 216, 232 Page Act 47

Painter, Nell Irvin 21, 45, 46, 110 Park, Robert 77–78 Pascal, Blaise 208, 213 Pascale, Celine-Marie 121 Patagonian giants 19 path analysis 227 pathology/pathologize 126, 238, 243–244, 253, 287, 301, 302 pauper/pauperism 134, 147, 152 Pearl, Judea 227 Pearson, Karl 136, 207, 210, 211, 212, 215–218, 219, 220, 225–226, 227, 228, 229, 231–232, 235 peer review publication 120, 273 People’s Institute for Survival and Beyond 1 percentile 212 percentile rank 217 Petty, Sir William 31 phenotype/phenotypical 67, 72, 73, 74 Philadelphia Association of Medical Instruction 159 Philadelphia Negro, The (Du Bois) 290 phlogiston 65–66, 120 phrenology 159, 199 Piaget, Jean 260 pioneer 111, 113, 131 Plato 157 Playing in the Dark (Morrison) 286 Point Comfort 36, 38 point scale 175, 177, 179, 197 policing 69, 85, 90; stop-and-frisk 90 political construction 374 Polo, Marco 16, 18, 30, 31 polygenism 25, 159 Popper, Karl 119 population 207, 210, 218–219, 220, 226, 229 Porter, Theodore 119–121, 122, 207, 210, 213, 214, 298 Portugal/Portuguese 35–36, 59, 74 positivism/positivist 118–121, 123, 125, 127, 131, 146, 160, 179, 229, 238, 254, 259, 262–263, 265–267, 273, 274, 288, 289, 311–312, 316, 323, 331, 334 poverty, intergenerational 152 Powell, Brenna Marea 57 power 3–5, 66, 83, 90, 93, 94, 96, 124, 131, 238, 240, 242, 246, 254, 265–269, 271, 273, 279, 280–281,

286, 290, 291, 295, 303, 314, 315, 320–322, 328–329, 334, 343, 347, 349, 351, 360; sovereign 3; state 4 Powhatan confederation 38, 41 pre-understanding 263–264, 267, 287, 294 prejudice 81, 83, 107, 114 Prince Henry 28, 35 prison 5, 91, 125 probability 141, 207–209, 220, 227–228, 230–232; inferential 209; inverse 229, 231–232; objective 228; sampling 209; subjective 228; anti-racist xii, 69, 71, 238, 239, 249; racial xii, 69–70, 238, 240; racist xii, 69–70, 239, 288 Protocols of the Learned Elders of Zion 257 PSAT 350 psychometric 133; modeling 184 psychophysics 166, 178, 179 Punch, John 39 quadroon 54 QuantCrit vii–viii, 3, 9, 254, 271, 278, 287–303, 305 quantification/quantitative 2, 118, 121, 131, 179, 184, 199, 211, 238, 243; criticalist 288–289; educational research 8 quantitative methods/analysis see statistics Quantitative Critical Race Theory see QuantCrit quantitative imperative 121, 206, 238 quantitative measure 206 quantitative social science 8 quartile 212 Quetelet, Adolphe 210–211, 218–219; “average” or “normal man” 210–211, 218–219 Quincunx 100, 213, 219 race xi, 2, 16, 238, 246, 254; biological 11, 15–18, 20, 23, 24, 29, 30, 31, 37, 38, 43, 44, 49, 54, 58, 62, 66–68, 76, 107–108, 132, 220, 224, 262, 274, 281, 282, 293, 330; “effect of ” 46, 223, 243, 299–301, 312, 338; individual trait 242, 243, 253; origins of 16–17; street 293, 294, 318, 331, 334, 379; suicide 46 Race and Racism (Benedict) 75

race tax 85 Race, Traits and Tendencies of the American Negro (Hoffman) 84 Races of Europe, The (Ripley) 46 racial contract 61, 87, 88, 97, 107–108, 282 Racial Equity Institute 1 racial formation xi, 69–70; hierarchy 2, 28, 253 Racial Integrity Act, 1924 53 racialism xi, 80, 111 racialist xi, 72–73 racialization xi, 68, 69, 70–71, 74, 79, 91, 239 racialized 98; category 221, 239, 254; group 142, 239; social system/structure 70–71, 92, 97 racism 238, 254; aversive 66, 80–82, 94, 96, 290; biological 77, 220; covert 80, 83, 88; dominative 80; individual xii, 79–83, 94, 96; institutional xii, 83–86, 94; scientific 159; structural xii, 86, 91, 133, 247, 302, 333–334, 382; systemic xii, 1–3, 5–6, 8–9, 13–14, 58, 66–67, 76, 91–97, 107–108, 109, 125, 132–133, 218, 223–225, 238, 239, 241, 242, 246–249, 253–255, 280, 282, 286, 290–292, 328, 367, 370, 375–378, 381–384 racist xi, 12, 71–73, 96, 232; overt 6, 80, 82, 94, 96, 200, 290 Rafter, Nicole Hahn 138, 139 Randall, Jennifer 196, 246–247, 350, 354–356, 362, 368–369, 379 Rankine, Claudia 98, 366 Rawls, John 123, 348–351, 357–360, 364–367; ideal state 358, 363; original position 358, 360–363; veil of ignorance 359, 364, 367 Reading the Forested Landscape (Wessels) 111–112 Reagan, Ronald 93, 100, 108 realism 120 redlining 69, 84, 88–90, 247–248, 291, 316, 382 reflexivity 260, 261, 265, 266–268, 271, 273, 286, 296 Regents exam 187 regression 134, 164, 206, 207, 212, 213, 214, 216–217, 222–225, 243, 299, 335–336, 381

replication crisis 121, 132, 230–232 residential 70 Ripley, William 46, 108 Roberts, Dorothy 16, 23 Rockefeller Foundation 135 Rockefeller, John D. 116 Roman Empire 35 Roosevelt, Theodore 45, 46, 47, 48, 108 Rothstein, Richard 84, 90 Rousseau, Jean-Jacques 87 Rozeboom, William 231 Rush, Benjamin 77 Rwanda 74 Sablan, Jenna 288 Sandel, Michael 351 Sandiford, Ralph 77 Sanford and Son 108 Santelices, Maria 196 SAT 132, 178, 190–193, 195–196, 198, 247, 298 scala naturae 18 scale score 217 Scheurich, James 83, 110 scholarship 135, 191, 247–249, 278, 280, 282, 348, 350, 377 Scholastic Aptitude Test see SAT schools 70 Schudson, Michael 188, 198 Schuyten, M.C. 169 science/scientific: discovery 107, 118; scientific method 118, 120, 121, 123; scientific naturalism 143; racial 24 Scope and Importance of the State of the Science of National Eugenics, The (Pearson) 215 scoring sheet, scannable 187 Scott, Nicholas 338 Sedgwick, Henry 123 segregation/segregated 12, 13, 69, 86, 88, 89, 216, 246–248, 278–279, 284–285, 333 selected-response 174, 175, 176, 185, 187, 191, 193 Senate, U.S. 55 sentencing, criminal 82, 84, 90, 241 Sententiae (Lombard) 184 separate-but-equal 89 Serbs 74 Servicemen’s Readjustment Act/GI Bill 84

settlers, English 111–112 sexuality/sexual orientation 221, 254, 280, 295, 311, 314–317, 321–323, 325, 328, 330 Shadish, William 289 Shakespeare 157 Shapiro, Thomas 113 Shields, Stephanie 317 Short History of the English Colonies in America (Lodge) 46 Sidgwick, Henry 348 Siltanen, Janet 338 Simon, Théodore 168, 169 Sireci, Steve 1–2, 245, 374 skulls 25, 26–27, 158, 179, 219 slave trade, African 28, 59 slavery/slave 2, 27, 28, 29, 35, 52, 57, 58, 59, 68–69, 74, 76, 77, 82, 88, 89, 92, 124–125, 143, 233, 268–269, 274, 311, 312, 355 Smarter Balanced Assessment Consortium 260 social contract 61, 100, 131 social Darwinism 111, 115–118 social injustice 261, 289, 297, 365 Social Justice and Educational Measurement: John Rawls, the History of Testing, and the Future of Education (Stein) 349 social conditions 5, 143, 151, 153, 225, 226 social location 268, 281, 325, 326, 330, 334–335, 341, 361 social security 89, 92 Social Security Act 84 social structure 232, 239, 243, 244, 253 social-economic/socio-economic status 113, 221, 223, 245, 246, 249, 254 sociology/sociologist 66, 121, 264 Solórzano, Daniel 287, 290 South Africa 71 Soviet Union 283 Spearman, Charles 207, 217, 225–226, 232, 233; g 217 special education 177 Spector, Paul 243, 300–302 Spencer, Herbert 115, 116, 118, 131, 152–153 square grid, mapping 122, 128 Stage, Francis 288 Stamped from the Beginning (Kendi) 92

standard deviation 206, 216 standardize/standardization 118, 122, 123, 174, 188–189, 271–272 Stanfield, John 61, 100, 131, 374 Stanford Achievement Test 186 Stanford-Binet scale 175, 176, 186 statistics/statistical 7, 132, 133, 134, 140, 206–207, 218, 222, 224, 226–227, 232, 238, 243, 245, 253–254; Bayesian 121, 207, 227, 230, 232, 235; frequentist school of 207–208, 227, 228–232; inferential 221, 225; multi-level model 291–292, 303, 335, 338–340, 378, 382; power 221; significance 216, 221, 228, 229, 230–232 Statistical Methods for Research Workers (Fisher) 230 Statistical model: additive 335–337; interaction 337–338 statistics by intercomparison 212, 217 Stein, Zachary 349 stereotype 12, 80, 93, 107, 288, 326, 368 stereotype threat vulnerability 293, 329 sterilize/sterilization 132, 134, 136, 148, 152, 155, 216 Stern, William 174, 179 Stevens, Stanley S. 6 Stigler, Stephen 207, 212 Storytelling 281, 283, 287; counterstory 283, 286 Strengthening Opportunities in Assessment and Research (SOAR) 383 Student’s t-test 219, 220 subjective/subjectivity 120, 128, 174, 211, 227, 232, 264, 266, 267 Summa Theologiae (Aquinas) 184 Sumner, William Graham 117 Supreme Court, U.S. 40–41, 48–53, 147, 279, 283–284 survival of the fittest 47, 115–117, 131, 132, 139, 152–153 Sutherland, Justice 50–51 Swann v. Charlotte-Mecklenburg Board of Education 279 Sweat, Robert 41, 60 Sykes, Lori 96 Systema Naturae (Linnaeus) 18–20 Taney, Justice Roger B. 40–41 Taylor, Edward 282, 292

Taylor, Linda 100–101 Teranishi, Robert 295–296, 298 TERC 196 Terman, Lewis 173, 175–176, 178, 179, 185–186, 194, 197, 198, 264 test: accommodations 129; achievement 188, 198, 222; admission 132, 179, 190, 192, 200, 238, 249, 253; content 194; educational 7, 242, 245, 247; essay-based 190–191; intelligence 145, 157, 158, 160, 171, 177, 178, 185, 186, 188, 195–200, 217, 238; mental ability 7, 131, 135, 149, 163, 165, 166–169, 175, 176, 193, 195, 217, 245; preparation 184, 193; score use 7, 69, 85, 176–178, 185, 190, 192, 198–201, 205, 206, 220, 222–223, 238, 247–249, 253–254, 272, 348–335, 352, 359, 363, 370; standardized 136, 176, 177, 185, 188, 205 The Standards for Educational and Psychological Testing (Joint Standards) 7, 245, 359–360 theological disputations 184 theory 260; critical 254, 260, 261, 263, 266, 268–270, 280, 281, 315; critical social 313–315, 328, 348; modern social 255, 277; of action 260; social 152, 259, 262, 278, 282, 297, 329 Theory of the Motion of Heavenly Bodies Moving about the Sun in Conic Sections, The (Gauss) 210 Thind, Bhagat Singh 51 Thomas Aquinas 184 Thorndike, Edward 136, 174, 187 Time 284 tracking 186 traits: family 153; mental 132, 135, 138, 145, 147, 157, 160, 162, 164, 194, 198, 200, 212, 238, 253 Treaty of Versailles 257 Trends in International Mathematics and Science Study 164, 244 Trump, Donald 93, 108 Trust in Numbers (Porter) 119 Truth, Sojourner 311, 314 Tufte, Edward 289 Tukey, John 289 Ture, Kwame (Stokely Carmichael) 83 Tutsi 74 Type I error 221, 234, 380

Tyron, Thomas 77 universal design 129 universal social laws see universalism universal truth see universalism universalism 111, 118, 119, 229, 253–254 validity 175, 176, 179 variation/variability 214, 220 Veenstra, Gerry 315 Vineland Training School for Feeble-Minded Girls and Boys 135, 136, 149, 169, 171, 175, 194; colony/colonists 38, 39, 40 Virginia, Code of 1930 53 Voltaire 25, 28, 108 vote/voting 90, 94, 241, 279 Walker, Francis Amasa 46 Wallace, Alfred 115, 143 Warner, Leah 317 Watson, Thomas 187 wealth 70, 84, 113 weather patterns 134 Weber, Ernest Heinrich 161 welfare queen 93, 100 welfare, social 152 Wells-Barnett, Ida B. 285 Wells, Ryan 288 Wessels, Tom 111 West, Cornel 280 Whipple, Guy 173–175, 188 Whitaker, Cord 11, 20–21 White by Law (López) 48 white racial frame xii, 2–3, 4, 13, 76, 97, 105–111, 114, 118, 123, 125, 131, 133, 136, 138, 146, 157, 179, 189, 200, 238, 242–247, 249, 253, 269, 270, 272, 286, 311, 316, 329, 334, 342, 370, 374–377 white supremacy/supremist 77, 88, 125, 207, 253, 354 White Trash: The Eugenic Family Studies, 1877–1919 (Rafter) 138 Wilhem, Anton 29 Williams, Francis 29 Wilson, Mark 196 Wilson, Woodrow 112 Winant, Howard 67, 68, 69, 70, 74, 76, 85, 238, 239–240, 160

Winckelmann, Johann 21–22 Wolverton, Emma 150 Women in Measurement 383, 385 Wood, Ben 187 Woolman, John 77 World Book Company 186 World War I 174, 185, 188, 190, 198, 258 World War II 193 Yeakey, Carol 83 Yerkes, Robert 136, 173–177, 179, 185, 197, 198, 264

Yosso, Tara 287 Young, Michelle 83, 110 Yugoslavia 74 z-score 216 Zabeth, Rhoda 151 Zenderland, Leila 133, 149, 169, 170, 176, 177, 207 Zuberi, Tukufu 290–291, 297–302 Zurara, Gomes Eanes de Azurara 28, 35, 59 Zwick, Rebecca 350